AU777190B2 - Streptococcus pneumoniae polynucleotides and sequences - Google Patents


Info

Publication number
AU777190B2
Authority
AU
Australia
Prior art keywords
protein
pneumoniae
gene
bacillus
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU33351/01A
Other versions
AU3335101A (en
Inventor
Steven C. Barash
Gil H. Choi
Patrick J. Dillon
Brian A. Dougherty
Michael Fannon
Charles A. Kunsch
Craig A Rosen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Human Genome Sciences Inc
Original Assignee
Human Genome Sciences Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU69090/98A external-priority patent/AU6909098A/en
Application filed by Human Genome Sciences Inc filed Critical Human Genome Sciences Inc
Publication of AU3335101A publication Critical patent/AU3335101A/en
Application granted granted Critical
Publication of AU777190B2 publication Critical patent/AU777190B2/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Description

P/00/011 28/5/91 Regulation 3.2
AUSTRALIA
Patents Act 1990
ORIGINAL
COMPLETE SPECIFICATION STANDARD PATENT Name of Applicant: Address for service is: Human Genome Sciences, Inc.
WRAY ASSOCIATES 239 Adelaide Terrace Perth, WA 6000 Attorney code: WR Invention Title: "Streptococcus pneumoniae Antigens and Vaccines" This application is a divisional application by virtue of Section 39 of Australian Patent Application 69090/98 filed on 30 October 1997.
The following statement is a full description of this invention, including the best method of performing it known to me:-
Streptococcus pneumoniae Polynucleotides and Sequences
FIELD OF THE INVENTION
The present invention relates to the field of molecular biology. In particular, it relates to, among other things, nucleotide sequences of Streptococcus pneumoniae, contigs, ORFs, fragments, probes, primers and related polynucleotides thereof, peptides and polypeptides encoded by the sequences, and uses of the polynucleotides and sequences thereof, such as in fermentation, polypeptide production, assays and pharmaceutical development, among others.
BACKGROUND OF THE INVENTION
Streptococcus pneumoniae has been one of the most extensively studied microorganisms since its first isolation in 1881. It was the object of many investigations that led to important scientific discoveries. In 1928, Griffith observed that when heat-killed encapsulated pneumococci and live strains constitutively lacking any capsule were concomitantly injected into mice, the nonencapsulated strains could be converted into encapsulated pneumococci with the same capsular type as the heat-killed strain. Years later, the nature of this "transforming principle," or carrier of genetic information, was shown to be DNA. (Avery, O.T., et al., J. Exp. Med. 79:137-157 (1944)).
In spite of the vast number of publications on S. pneumoniae, many questions about its virulence are still unanswered, and this pathogen remains a major causative agent of serious human disease, especially community-acquired pneumonia. (Johnston, et al., Rev. Infect. Dis. 13(Suppl. 6):S509-517 (1991)). In addition, in developing countries, the pneumococcus is responsible for the death of a large number of children under the age of 5 years from pneumococcal pneumonia. The incidence of pneumococcal disease is highest in infants under 2 years of age and in people over 60 years of age. Pneumococci are the second most frequent cause (after Haemophilus influenzae type b) of bacterial meningitis and otitis media in children. With the recent introduction of conjugate vaccines for H. influenzae type b, pneumococcal meningitis is likely to become increasingly prominent. S. pneumoniae is the most important etiologic agent of community-acquired pneumonia in adults and is the second most common cause of bacterial meningitis behind Neisseria meningitidis.
The antibiotic generally prescribed to treat S. pneumoniae is benzylpenicillin, although resistance to this and to other antibiotics is found occasionally. Pneumococcal resistance to penicillin results from mutations in its penicillin-binding proteins. In uncomplicated pneumococcal pneumonia caused by a sensitive strain, treatment with penicillin is usually successful unless started too late. Erythromycin or clindamycin can be used to treat pneumonia in patients hypersensitive to penicillin, but strains resistant to these drugs exist. Broad spectrum antibiotics (e.g., the tetracyclines) may also be effective, although tetracycline-resistant strains are not rare. In spite of the availability of antibiotics, the mortality of pneumococcal bacteremia in the last four decades has remained stable between 25 and 29%. (Gillespie, et al., J. Med. Microbiol. 28:237-248 (1989)).
S. pneumoniae is carried in the upper respiratory tract by many healthy individuals. It has been suggested that attachment of pneumococci is mediated by a disaccharide receptor on fibronectin, present on human pharyngeal epithelial cells. (Anderson, et al., J. Immunol. 142:2464-2468 (1989)). The mechanisms by which pneumococci translocate from the nasopharynx to the lung, thereby causing pneumonia, or migrate to the blood, giving rise to bacteremia or septicemia, are poorly understood. (Johnston, et al., Rev. Infect. Dis. 13(Suppl. 6):S509-517 (1991)).
Various proteins have been suggested to be involved in the pathogenicity of S. pneumoniae; however, only a few of them have actually been confirmed as virulence factors. Pneumococci produce an IgA1 protease that might interfere with host defense at mucosal surfaces. (Kornfield, et al., Rev. Inf. Dis. 3:521-534 (1981)). S. pneumoniae also produces neuraminidase, an enzyme that may facilitate attachment to epithelial cells by cleaving sialic acid from the host glycolipids and gangliosides. Partially purified neuraminidase was observed to induce meningitis-like symptoms in mice; however, the reliability of this finding has been questioned because the neuraminidase preparations used were probably contaminated with cell wall products. Other pneumococcal proteins besides neuraminidase are involved in the adhesion of pneumococci to epithelial and endothelial cells. These pneumococcal proteins have as yet not been identified.
Recently, Cundell et al. reported that peptide permeases can modulate pneumococcal adherence to epithelial and endothelial cells. It was, however, unclear whether these permeases function directly as adhesins or whether they enhance adherence by modulating the expression of pneumococcal adhesins. (DeVelasco, et al., Micro. Rev. 59:591-603 (1995)). A better understanding of the virulence factors determining its pathogenicity will need to be developed to cope with the devastating effects of pneumococcal disease in humans.
Ironically, despite the prominent role of S. pneumoniae in the discovery of DNA, little is known about the molecular genetics of the organism. The S. pneumoniae genome consists of one circular, covalently closed, double-stranded DNA and a collection of so-called variable accessory elements, such as prophages, plasmids, transposons and the like. Most physical characteristics and almost all of the genes of S. pneumoniae are unknown. Among the few that have been identified, most have not been physically mapped or characterized in detail. Only a few genes of this organism have been sequenced. (See, for instance, current versions of GENBANK and other nucleic acid databases, and references that relate to the genome of S. pneumoniae such as those set out elsewhere herein.)
It is clear that the etiology of diseases mediated or exacerbated by S. pneumoniae infection involves the programmed expression of S. pneumoniae genes, and that characterizing the genes and their patterns of expression would add dramatically to our understanding of the organism and its host interactions. Knowledge of S. pneumoniae genes and genomic organization would improve our understanding of disease etiology and lead to improved and new ways of preventing, ameliorating, arresting and reversing diseases. Moreover, characterized genes and genomic fragments of S. pneumoniae would provide reagents for, among other things, detecting, characterizing and controlling S. pneumoniae infections. There is a need to characterize the genome of S. pneumoniae and for polynucleotides of this organism.
SUMMARY OF THE INVENTION
The present invention is based on the sequencing of fragments of the Streptococcus pneumoniae genome. The primary nucleotide sequences which were generated are provided in SEQ ID NOS: 1-391.
The present invention provides the nucleotide sequence of several hundred contigs of the Streptococcus pneumoniae genome, which are listed in tables below and set out in the Sequence Listing submitted herewith, and representative fragments thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan. In one embodiment, the present invention is provided as contiguous strings of primary sequence information corresponding to the nucleotide sequences depicted in SEQ ID NOS: 1-391.
The present invention further provides nucleotide sequences which are at least 95% identical to the nucleotide sequences of SEQ ID NOS: 1-391.
The nucleotide sequence of SEQ ID NOS:1-391, a representative fragment thereof, or a nucleotide sequence which is at least 95% identical to the nucleotide sequence of SEQ ID NOS:1-391 may be provided in a variety of mediums to facilitate its use. In one application of this embodiment, the sequences of the present invention are recorded on computer readable media. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media.
The present invention further provides systems, particularly computer-based systems, which contain the sequence information herein described stored in a data storage means. Such systems are designed to identify commercially important fragments of the Streptococcus pneumoniae genome.
Another embodiment of the present invention is directed to fragments of the Streptococcus pneumoniae genome having particular structural or functional attributes. Such fragments of the Streptococcus pneumoniae genome of the present invention include, but are not limited to, fragments which encode peptides, hereinafter referred to as open reading frames or ORFs, fragments which modulate the expression of an operably linked ORF, hereinafter referred to as expression modulating fragments or EMFs, and fragments which can be used to diagnose the presence of Streptococcus pneumoniae in a sample, hereinafter referred to as diagnostic fragments or DFs.
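The notion of an open reading frame used above can be illustrated with a minimal sketch. It is not the zorf or GenMark program named later in this specification; the function names, the ATG-only start codon and the minimum length are assumptions made purely for illustration (bacterial genes may also use alternative start codons).

```python
# Illustrative six-frame ORF scan; all names here are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, start, end, orf) tuples for ATG...stop stretches.

    start and end are 0-based, end-exclusive offsets on the scanned strand.
    Assumption: only ATG is treated as a start codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs
```

A region 5' to a reported ORF would, in the terminology above, be a candidate location for an EMF; the sketch does not attempt to identify such regions.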
Each of the ORFs in fragments of the Streptococcus pneumoniae genome disclosed in Tables 1-3, and the EMFs found 5' to the ORFs, can be used in numerous ways as polynucleotide reagents. For instance, the sequences can be used as diagnostic probes or amplification primers for detecting or determining the presence of a specific microbe in a sample, to selectively control gene expression in a host and in the production of polypeptides, such as polypeptides encoded by ORFs of the present invention, particularly those polypeptides that have a pharmacological activity.
The present invention further includes recombinant constructs comprising one or more fragments of the Streptococcus pneumoniae genome of the present invention. The recombinant constructs of the present invention comprise vectors, such as a plasmid or viral vector, into which a fragment of the Streptococcus pneumoniae genome has been inserted.
The present invention further provides host cells containing any of the isolated fragments of the Streptococcus pneumoniae genome of the present invention. The host cells can be a higher eukaryotic host cell, such as a mammalian cell, a lower eukaryotic cell, such as a yeast cell, or a procaryotic cell such as a bacterial cell.
The present invention is further directed to isolated polypeptides and proteins encoded by ORFs of the present invention. A variety of methods, well known to those of skill in the art, routinely may be utilized to obtain any of the polypeptides and proteins of the present invention. For instance, polypeptides and proteins of the present invention having relatively short, simple amino acid sequences readily can be synthesized using commercially available automated peptide synthesizers. Polypeptides and proteins of the present invention also may be purified from bacterial cells which naturally produce the protein. Yet another alternative is to purify polypeptides and proteins of the present invention from cells which have been altered to express them.
The invention further provides methods of obtaining homologs of the fragments of the Streptococcus pneumoniae genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention.
Specifically, by using the nucleotide and amino acid sequences disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.
The invention further provides antibodies which selectively bind polypeptides and proteins of the present invention. Such antibodies include both monoclonal and polyclonal antibodies.
The invention further provides hybridomas which produce the above-described antibodies. A hybridoma is an immortalized cell line which is capable of secreting a specific monoclonal antibody.
The present invention further provides methods of identifying test samples derived from cells which express one of the ORFs of the present invention, or a homolog thereof. Such methods comprise incubating a test sample with one or more of the antibodies of the present invention, or one or more of the DFs of the present invention, under conditions which allow a skilled artisan to determine if the sample contains the ORF or product produced therefrom.
In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry out the above-described assays.
Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers, which comprises: a first container comprising one of the antibodies, or one of the DFs of the present invention; and one or more other containers comprising one or more of the following: wash reagents, reagents capable of detecting presence of bound antibodies or hybridized DFs.
Using the isolated proteins of the present invention, the present invention further provides methods of obtaining and identifying agents capable of binding to a polypeptide or protein encoded by one of the ORFs of the present invention.
Specifically, such agents include, as further described below, antibodies, peptides, carbohydrates, pharmaceutical agents and the like. Such methods comprise steps of: contacting an agent with an isolated protein encoded by one of the ORFs of the present invention; and determining whether the agent binds to said protein.
The present genomic sequences of Streptococcus pneumoniae will be of great value to all laboratories working with this organism and for a variety of commercial purposes. Many fragments of the Streptococcus pneumoniae genome will be immediately identified by similarity searches against GenBank or protein databases and will be of immediate value to Streptococcus pneumoniae researchers and for immediate commercial value for the production of proteins or to control gene expression.
The methodology and technology for elucidating extensive genomic sequences of bacterial and other genomes has enhanced, and will greatly enhance, the ability to analyze and understand chromosomal organization. In particular, sequenced contigs and genomes will provide the models for developing tools for the analysis of chromosome structure and function, including the ability to identify genes within large segments of genomic DNA, the structure, position, and spacing of regulatory elements, the identification of genes with potential industrial applications, and the ability to perform comparative genomics and molecular phylogeny.
DESCRIPTION OF THE FIGURES
FIGURE 1 is a block diagram of a computer system (102) that can be used to implement computer-based systems of the present invention.
FIGURE 2 is a schematic diagram depicting the data flow and computer programs used to collect, assemble, edit and annotate the contigs of the Streptococcus pneumoniae genome of the present invention. Both Macintosh and Unix platforms are used to handle the AB 373 and 377 sequence data files, largely as described in Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, 585, IEEE Computer Society Press, Washington D.C. (1993). Factura (AB) is a Macintosh program designed for automatic vector sequence removal and end-trimming of sequence files. The program Loadis runs on a Macintosh platform and parses the feature data extracted from the sequence files by Factura to the Unix based Streptococcus pneumoniae relational database. Assembly of contigs (and whole genome sequences) is accomplished by retrieving a specific set of sequence files and their associated features using Extrseq, a Unix utility for retrieving sequences from an SQL database. The resulting sequence file is processed by seq_filter to trim portions of the sequences with more than 2% ambiguous nucleotides. The sequence files were assembled using TIGR Assembler, an assembly engine designed at The Institute for Genomic Research (TIGR) for rapid and accurate assembly of thousands of sequence fragments. The collection of contigs generated by the assembly step is loaded into the database with the lassie program. Identification of open reading frames (ORFs) is accomplished by processing contigs with zorf or GenMark. The ORFs are searched against S. pneumoniae sequences from GenBank and against all protein sequences using the BLASTN and BLASTP programs, described in Altschul et al., J. Mol. Biol. 215: 403-410 (1990). Results of the ORF determination and similarity searching steps were loaded into the database. As described below, some results of the determination and the searches are set out in Tables 1-3.
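The trimming step attributed to seq_filter above can be sketched as follows. The specification gives only the 2% threshold, so the sliding-window approach, the window size and the function names below are assumptions for illustration and are not a description of the actual seq_filter program.

```python
# Hypothetical sketch of end-trimming by ambiguity content.
def ambiguous_fraction(seq):
    """Fraction of characters that are not an unambiguous A, C, G or T."""
    if not seq:
        return 0.0
    return sum(base not in "ACGT" for base in seq.upper()) / len(seq)


def trim_ambiguous_ends(seq, window=50, max_fraction=0.02):
    """Trim bases from each end while the terminal window exceeds max_fraction.

    Assumptions: the window size is illustrative, and a read shorter than
    one window is returned unchanged.
    """
    start, end = 0, len(seq)
    # Advance the 5' boundary until the leading window is clean enough.
    while end - start >= window and \
            ambiguous_fraction(seq[start:start + window]) > max_fraction:
        start += 1
    # Retreat the 3' boundary until the trailing window is clean enough.
    while end - start >= window and \
            ambiguous_fraction(seq[end - window:end]) > max_fraction:
        end -= 1
    return seq[start:end]
```

For example, with a 10-base window a read carrying runs of N at both ends is cut back to its unambiguous core, while a read with no ambiguity codes passes through untouched.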
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
The present invention is based on the sequencing of fragments of the Streptococcus pneumoniae genome and analysis of the sequences. The primary nucleotide sequences generated by sequencing the fragments are provided in SEQ ID NOS:1-391. (As used herein, the "primary sequence" refers to the nucleotide sequence represented by the IUPAC nomenclature system.) In addition to the aforementioned Streptococcus pneumoniae polynucleotide and polynucleotide sequences, the present invention provides the nucleotide sequences of SEQ ID NOS:1-391, or representative fragments thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan.
As used herein, a "representative fragment of the nucleotide sequence depicted in SEQ ID NOS: 1-391" refers to any portion of the SEQ ID NOS: 1-391 which is not presently represented within a publicly available database. Preferred representative fragments of the present invention are Streptococcus pneumoniae open reading frames (ORFs), expression modulating fragments (EMFs) and fragments which can be used to diagnose the presence of Streptococcus pneumoniae in a sample (DFs). A non-limiting identification of preferred representative fragments is provided in Tables 1-3. As discussed in detail below, the information provided in SEQ ID NOS:1-391 and in Tables 1-3 together with routine cloning, synthesis, sequencing and assay methods will enable those skilled in the art to clone and sequence all "representative fragments" of interest, including open reading frames encoding a large variety of Streptococcus pneumoniae proteins.
While the presently disclosed sequences of SEQ ID NOS: 1-391 are highly accurate, sequencing techniques are not perfect and, in relatively rare instances, further investigation of a fragment or sequence of the invention may reveal a nucleotide sequence error present in a nucleotide sequence disclosed in SEQ ID NOS: 1-391. However, once the present invention is made available (i.e., once the information in SEQ ID NOS:1-391 and Tables 1-3 has been made available), resolving a rare sequencing error in SEQ ID NOS:1-391 will be well within the skill of the art. The present disclosure makes available sufficient sequence information to allow any of the described contigs or portions thereof to be obtained readily by straightforward application of routine techniques. Further sequencing of such polynucleotides may proceed in like manner using manual and automated sequencing methods which are employed ubiquitously in the art. Nucleotide sequence editing software is publicly available. For example, Applied Biosystem's (AB) AutoAssembler can be used as an aid during visual inspection of nucleotide sequences. By employing such routine techniques potential errors readily may be identified and the correct sequence then may be ascertained by targeting further sequencing effort, also of a routine nature, to the region containing the potential error.
Even if all of the very rare sequencing errors in SEQ ID NOS: 1-391 were corrected, the resulting nucleotide sequences would still be at least 95% identical, nearly all would be at least 99% identical, and the great majority would be at least 99.9% identical to the nucleotide sequences of SEQ ID NOS:1-391.
As discussed elsewhere herein, polynucleotides of the present invention readily may be obtained by routine application of well known and standard procedures for cloning and sequencing DNA. Detailed methods for obtaining libraries and for sequencing are provided below, for instance. A wide variety of Streptococcus pneumoniae strains that can be used to prepare S. pneumoniae genomic DNA for cloning and for obtaining polynucleotides of the present invention are available to the public from recognized depository institutions, such as the American Type Culture Collection (ATCC). While the present invention is enabled by the sequences and other information herein disclosed, the S. pneumoniae strain that provided the DNA of the present Sequence Listing, Strain 7/87 14.8.91, has been deposited in the ATCC, as a convenience to those of skill in the art. As a further convenience, a library of S. pneumoniae genomic DNA, derived from the same strain, also has been deposited in the ATCC. The S. pneumoniae strain was deposited on October 10, 1996, and was given Deposit No. 55840, and the cDNA library was deposited on October 11, 1996 and was given Deposit No. 97755. The genomic fragments in the library are 15 to 20 kb fragments generated by partial Sau3AI digestion and they are inserted into the BamHI site in the well-known lambda-derived vector lambda DASH II (Stratagene, La Jolla, CA). The provision of the deposits is not a waiver of any rights of the inventors or their assignees in the present subject matter.
The nucleotide sequences of the genomes from different strains of Streptococcus pneumoniae differ somewhat. However, the nucleotide sequences of the genomes of all Streptococcus pneumoniae strains will be at least 95% identical, in corresponding part, to the nucleotide sequences provided in SEQ ID NOS: 1-391. Nearly all will be at least 99% identical and the great majority will be 99.9% identical.
Thus, the present invention further provides nucleotide sequences which are at least 95%, preferably 99% and most preferably 99.9% identical to the nucleotide sequences of SEQ ID NOS:1-391, in a form which can be readily used, analyzed and interpreted by the skilled artisan.
Methods for determining whether a nucleotide sequence is at least 95%, at least 99% or at least 99.9% identical to the nucleotide sequences of SEQ ID NOS: 1-391 are routine and readily available to the skilled artisan. For example, the well known fasta algorithm described in Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85: 2444 (1988) can be used to generate the percent identity of nucleotide sequences. The BLASTN program also can be used to generate an identity score of polynucleotides compared to one another.
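The identity thresholds discussed above can be made concrete with a minimal sketch. It assumes the two sequences have already been aligned (for instance by one of the programs named above) and simply counts matching columns; the function names and the convention of counting a base opposite a gap as a mismatch are illustrative assumptions, not the scoring used by fasta or BLASTN.

```python
# Hypothetical helpers: percent identity over a pre-computed pairwise alignment.
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns in which both sequences carry the same base.

    Assumptions: '-' marks a gap, columns that are gaps in both sequences are
    skipped, and a base opposite a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def meets_threshold(aligned_a, aligned_b, threshold=95.0):
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Under this convention a single substitution in a 100-base alignment gives 99% identity, which satisfies the 95% and 99% thresholds but not the 99.9% threshold.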
COMPUTER RELATED EMBODIMENTS
The nucleotide sequences provided in SEQ ID NOS:1-391, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most preferably at least 99.9% identical to a polynucleotide sequence of SEQ ID NOS: 1-391 may be "provided" in a variety of mediums to facilitate use thereof.
As used herein, "provided" refers to a manufacture, other than an isolated nucleic acid molecule, which contains a nucleotide sequence of the present invention; i.e., a nucleotide sequence provided in SEQ ID NOS:1-391, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most preferably at least 99.9% identical to a polynucleotide of SEQ ID NOS:1-391.
Such a manufacture provides a large portion of the Streptococcus pneumoniae genome and parts thereof (e.g., a Streptococcus pneumoniae open reading frame (ORF)) in a form which allows a skilled artisan to examine the manufacture using means not directly applicable to examining the Streptococcus pneumoniae genome or a subset thereof as it exists in nature or in purified form.
In one application of this embodiment, a nucleotide sequence of the present invention can be recorded on computer readable media. As used herein, "computer readable media" refers to any medium which can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories, such as magnetic/optical storage media. A skilled artisan can readily appreciate how any of the presently known computer readable mediums can be used to create a manufacture comprising computer readable medium having recorded thereon a nucleotide sequence of the present invention. Likewise, it will be clear to those of skill how additional computer readable media that may be developed also can be used to create analogous manufactures having recorded thereon a nucleotide sequence of the present invention.
As used herein, "recorded" refers to a process for storing information on computer readable medium. A skilled artisan can readily adopt any of the presently known methods for recording information on computer readable medium to generate manufactures comprising the nucleotide sequence information of the present invention. A variety of data storage structures are available to a skilled artisan for creating a computer readable medium having recorded thereon a nucleotide sequence of the present invention. The choice of the data storage structure will generally be based on the means chosen to access the stored information. In addition, a variety of data processor programs and formats can be used to store the nucleotide sequence information of the present invention on computer readable medium. The sequence information can be represented in a word processing text file, formatted in commercially available software such as WordPerfect and Microsoft Word, or represented in the form of an ASCII file, stored in a database application, such as DB2, Sybase, Oracle, or the like. A skilled artisan can readily adapt any number of data-processor structuring formats (e.g., text file or database) in order to obtain computer readable medium having recorded thereon the nucleotide sequence information of the present invention.
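The two storage options mentioned above, an ASCII text file and a database application, can be sketched as follows. The FASTA layout, the use of SQLite in place of the named database products, and the table, column and function names are all assumptions chosen for illustration.

```python
import sqlite3


def to_fasta(name, seq, width=60):
    """Format one sequence as plain ASCII FASTA text (an assumed layout)."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


def store_sequences(conn, records):
    """Store (name, sequence) pairs in a hypothetical 'contig' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contig ("
        "name TEXT PRIMARY KEY, sequence TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO contig (name, sequence) VALUES (?, ?)", records
    )
    conn.commit()


def fetch_sequence(conn, name):
    """Return the stored sequence for `name`, or None if it is absent."""
    row = conn.execute(
        "SELECT sequence FROM contig WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

As the paragraph above notes, the choice between such structures generally follows from how the stored information will be accessed: a flat text file suits sequential reading by search programs, while a database suits retrieval of individual records by name.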
Computer software is publicly available which allows a skilled artisan to access sequence information provided in a computer readable medium. Thus, by providing in computer readable form the nucleotide sequences of SEQ ID NOS: 1-391, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most preferably at least 99.9% identical to a sequence of SEQ ID NOS: 1-391, the present invention enables the skilled artisan routinely to access the provided sequence information for a wide variety of purposes.
The examples which follow demonstrate how software which implements the BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE (Brutlag et al., Comp. Chem. 17:203-207 (1993)) search algorithms on a Sybase system was used to identify open reading frames (ORFs) within the Streptococcus pneumoniae genome which contain homology to ORFs or proteins from both Streptococcus pneumoniae and from other organisms. Among the ORFs discussed herein are protein encoding fragments of the Streptococcus pneumoniae genome useful in producing commercially important proteins, such as enzymes used in fermentation reactions and in the production of commercially useful metabolites.
The present invention further provides systems, particularly computer-based systems, which contain the sequence information described herein. Such systems are designed to identify, among other things, commercially important fragments of the Streptococcus pneumoniae genome.
As used herein, "a computer-based system" refers to the hardware means, software means, and data storage means used to analyze the nucleotide sequence information of the present invention. The minimum hardware means of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means. A skilled artisan can readily appreciate that any one of the currently available computer-based systems is suitable for use in the present invention.
As stated above, the computer-based systems of the present invention comprise a data storage means having stored therein a nucleotide sequence of the present invention and the necessary hardware means and software means for supporting and implementing a search means.
As used herein, "data storage means" refers to memory which can store nucleotide sequence information of the present invention, or a memory access means which can access manufactures having recorded thereon the nucleotide sequence information of the present invention.
As used herein, "search means" refers to one or more programs which are implemented on the computer-based system to compare a target sequence or target structural motif with the sequence information stored within the data storage means. Search means are used to identify fragments or regions of the present genomic sequences which match a particular target sequence or target motif. A variety of known algorithms are disclosed publicly, and a variety of commercially available software for conducting search means is available and can be used in the computer-based systems of the present invention. Examples of such software include, but are not limited to, MacPattern (EMBL), BLASTN and BLASTX (NCBI). A skilled artisan can readily recognize that any one of the available algorithms or implementing software packages for conducting homology searches can be adapted for use in the present computer-based systems.
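A search means in the sense defined above can be illustrated with a deliberately simple sketch: an exact-match scan of stored sequences for a target on both strands. This is not MacPattern, BLASTN or BLASTX, all of which tolerate mismatches and use far more efficient methods; the function names and the dictionary layout of the stored sequences are assumptions for illustration.

```python
# Hypothetical exact-match search means; names are illustrative only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def search_contigs(contigs, target):
    """Return (contig_name, strand, position) for every exact occurrence.

    contigs maps a name to its sequence. Positions are 0-based offsets on the
    stored (forward) sequence. A target equal to its own reverse complement
    is searched once, so that each site is reported only once.
    """
    target = target.upper()
    rc = reverse_complement(target)
    queries = [("+", target)]
    if rc != target:
        queries.append(("-", rc))
    hits = []
    for name, seq in contigs.items():
        seq = seq.upper()
        for strand, query in queries:
            pos = seq.find(query)
            while pos != -1:
                hits.append((name, strand, pos))
                pos = seq.find(query, pos + 1)  # allow overlapping hits
    return hits
```

In a system of the kind described, the dictionary would be replaced by the data storage means and the exact comparison by a homology search algorithm.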
As used herein, a "target sequence" can be any DNA or amino acid sequence of six or more nucleotides or two or more amino acids. A skilled artisan can readily recognize that the longer a target sequence is, the less likely a target sequence will be present as a random occurrence in the database. The most preferred sequence length of a target sequence is from about 10 to 100 amino acids or from about 30 to 300 nucleotide residues. However, it is well recognized that searches for commercially important fragments, such as sequence fragments involved in gene expression and protein processing, may be of shorter length.
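By way of illustration only, the relationship between target length and chance occurrence can be sketched as follows. The sketch assumes a database of independent, equally likely residues searched on one strand for exact matches, which is a simplification; the function name and the 2,000,000-base database size are hypothetical and are not part of the present disclosure.

```python
def expected_random_hits(target_len: int, db_len: int, alphabet_size: int = 4) -> float:
    """Expected number of exact chance matches of one fixed target.

    Assumes independent, uniformly distributed residues (an idealization)
    and counts start positions on a single strand only.
    """
    positions = max(db_len - target_len + 1, 0)
    return positions * alphabet_size ** (-target_len)


if __name__ == "__main__":
    db_len = 2_000_000  # hypothetical database size in nucleotides
    for target_len in (6, 10, 30):
        hits = expected_random_hits(target_len, db_len)
        print(f"target length {target_len:>2}: about {hits:.3g} chance matches expected")
```

Under these assumptions the expected number of chance matches falls by a factor of four for every nucleotide added to the target, which is consistent with the preference stated above for longer target sequences.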
As used herein, "a target structural motif," or "target motif," refers to any rationally selected sequence or combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration which is formed upon the folding of the target motif. There are a variety of target motifs known in the art.
Protein target motifs include, but are not limited to, enzymic active sites and signal sequences. Nucleic acid target motifs include, but are not limited to, promoter sequences, hairpin structures and inducible expression elements (protein binding sequences).
A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention. A preferred format for an output means ranks fragments of the Streptococcus pneumoniae genomic sequences possessing varying degrees of homology to the target sequence or target motif. Such presentation provides a skilled artisan with a ranking of sequences which contain various amounts of the target sequence or target motif and identifies the degree of homology contained in the identified fragment.
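A ranked output of the kind described above can be sketched, purely for illustration, with a naive ungapped sliding-window comparison. This is not the BLAST algorithm or any other search means named herein; the function name, the scoring by percent identity of equal-length windows, and the example sequences are assumptions made for the sketch.

```python
def rank_windows(genome: str, target: str, top: int = 3):
    """Return (percent_identity, start) pairs for the best ungapped windows.

    Every window of len(target) residues in `genome` is compared position
    by position with `target`; results are sorted best first, with ties
    broken by the lower 0-based start position.
    """
    n = len(target)
    scored = []
    for start in range(len(genome) - n + 1):
        window = genome[start:start + n]
        matches = sum(a == b for a, b in zip(window, target))
        scored.append((100.0 * matches / n, start))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top]


if __name__ == "__main__":
    genome = "AAAACGTAAAACGAAAA"  # hypothetical stored sequence
    for identity, start in rank_windows(genome, "ACGT", top=2):
        print(f"start={start} identity={identity:.0f}%")
```

A practical search means would use a heuristic such as BLAST rather than this exhaustive comparison, but the form of the output (fragments ordered by degree of homology to the target) is the same.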
A variety of comparing means can be used to compare a target sequence or target motif with the data storage means to identify sequence fragments of the Streptococcus pneumoniae genome. In the present examples, software implementing the BLAST and BLAZE algorithms, described in Altschul et al., J. Mol. Biol. 215: 403-410 (1990), is used to identify open reading frames within the Streptococcus pneumoniae genome. A skilled artisan can readily recognize that any one of the publicly available homology search programs can be used as the search means for the computer-based systems of the present invention.
Of course, suitable proprietary systems that may be known to those of skill also may be employed in this regard.
Figure 1 provides a block diagram of a computer system illustrative of embodiments of this aspect of the present invention. The computer system 102 includes a processor 106 connected to a bus 104. Also connected to the bus 104 are a main memory 108 (preferably implemented as random access memory, RAM) and a variety of secondary storage devices 110, such as a hard drive 112 and a removable medium storage device 114. The removable medium storage device 114 may represent, for example, a floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc. A removable storage medium 116 (such as a floppy disk, a compact disk, a magnetic tape, etc.) containing control logic and/or data recorded therein may be inserted into the removable medium storage device 114. The computer system 102 includes appropriate software for reading the control logic and/or the data from the removable medium storage device 114, once it is inserted into the removable medium storage device 114.
A nucleotide sequence of the present invention may be stored in a well known manner in the main memory 108, any of the secondary storage devices 110, and/or a removable storage medium 116. During execution, software for accessing and processing the genomic sequence (such as search tools, comparing tools, etc.) reside in main memory 108, in accordance with the requirements and operating parameters of the operating system, the hardware system and the software program or programs.
BIOCHEMICAL EMBODIMENTS
Other embodiments of the present invention are directed to isolated fragments of the Streptococcus pneumoniae genome. The fragments of the Streptococcus pneumoniae genome of the present invention include, but are not limited to fragments which encode peptides and polypeptides, hereinafter open reading frames (ORFs), fragments which modulate the expression of an operably linked ORF, hereinafter expression modulating fragments (EMFs) and fragments which can be used to diagnose the presence of Streptococcus pneumoniae in a sample, hereinafter diagnostic fragments (DFs).
As used herein, an "isolated nucleic acid molecule" or an "isolated fragment of the Streptococcus pneumoniae genome" refers to a nucleic acid molecule possessing a specific nucleotide sequence which has been subjected to purification means to reduce, from the composition, the number of compounds which are normally associated with the composition. Particularly, the term refers to the nucleic acid molecules having the sequences set out in SEQ ID NOS:1-391, to representative fragments thereof as described above, and to polynucleotides at least 95%, preferably at least 99% and especially preferably at least 99.9% identical in sequence thereto, also as set out above.
A variety of purification means can be used to generate the isolated fragments of the present invention. These include, but are not limited to methods which separate constituents of a solution based on charge, solubility, or size.
In one embodiment, Streptococcus pneumoniae DNA can be enzymatically sheared to produce fragments of 15-20 kb in length. These fragments can then be used to generate a Streptococcus pneumoniae library by inserting them into lambda clones as described in the Examples below. Primers flanking, for example, an ORF, such as those enumerated in Tables 1-3, can then be generated using nucleotide sequence information provided in SEQ ID NOS:1-391. Well known and routine techniques of PCR cloning then can be used to isolate the ORF from the lambda DNA library or Streptococcus pneumoniae genomic DNA. Thus, given the availability of SEQ ID NOS:1-391, the information in Tables 1, 2 and 3, and the information that may be obtained readily by analysis of the sequences of SEQ ID NOS:1-391 using methods set out above, those of skill will be enabled by the present disclosure to isolate any ORF-containing or other nucleic acid fragment of the present invention.
The isolated nucleic acid molecules of the present invention include, but are not limited to single stranded and double stranded DNA, and single stranded RNA.
As used herein, an "open reading frame," ORF, means a series of triplets coding for amino acids without any termination codons and is a sequence translatable into protein.
Tables 1, 2, and 3 list ORFs in the Streptococcus pneumoniae genomic contigs of the present invention that were identified as putative coding regions by the GeneMark software using organism-specific second-order Markov probability transition matrices. It will be appreciated that other criteria can be used, in accordance with well known analytical methods, such as those discussed herein, to generate more inclusive, more restrictive, or more selective lists.
Table 1 sets out ORFs in the Streptococcus pneumoniae contigs of the present invention that over a continuous region of at least 50 bases are 95% or more identical (by BLAST analysis) to a nucleotide sequence available through GenBank in October, 1997.
Table 2 sets out ORFs in the Streptococcus pneumoniae contigs of the present invention that are not in Table 1 and match, with a BLASTP probability score of 0.01 or less, a polypeptide sequence available through GenBank in October, 1997.
Table 3 sets out ORFs in the Streptococcus pneumoniae contigs of the present invention that do not match significantly, by BLASTP analysis, a polypeptide sequence available through GenBank in October, 1997.
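The criteria distinguishing Tables 1, 2 and 3 can be summarized, for illustration only, as a simple decision rule. The function and parameter names below are hypothetical; the thresholds (95% identity over at least 50 bases, and a BLASTP probability of 0.01 or less) are those stated above, and a value of None is assumed to mean that no match was reported.

```python
from typing import Optional


def assign_table(nt_identity: Optional[float],
                 nt_match_len: Optional[int],
                 blastp_probability: Optional[float]) -> int:
    """Return 1, 2 or 3 according to the table criteria described in the text.

    nt_identity        : percent identity of the best nucleotide match, or None
    nt_match_len       : length in bases of that continuous match, or None
    blastp_probability : BLASTP probability score of the best polypeptide
                         match, or None if there was no match
    """
    if (nt_identity is not None and nt_match_len is not None
            and nt_identity >= 95.0 and nt_match_len >= 50):
        return 1  # near-identical nucleotide match over at least 50 bases
    if blastp_probability is not None and blastp_probability <= 0.01:
        return 2  # not in Table 1, but a significant polypeptide match
    return 3      # no significant polypeptide match


if __name__ == "__main__":
    print(assign_table(97.0, 120, None))   # 1
    print(assign_table(96.0, 40, 1e-5))    # 2
    print(assign_table(None, None, 0.5))   # 3
```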
In each table, the first and second columns identify the ORF by, respectively, contig number and ORF number within the contig; the third column indicates the first nucleotide of the ORF (actually the first nucleotide of the stop codon immediately preceding the ORF), counting from the 5' end of the contig strand; and the fourth column, "stop," indicates the last nucleotide of the stop codon defining the 3' end of the ORF.
In Tables 1 and 2, column five lists the Reference for the closest matching sequence available through GenBank. These reference numbers are the database entry numbers commonly used by those of skill in the art, who will be familiar with their denominators. Descriptions of the nomenclature are available from the National Center for Biotechnology Information. Column six in Tables 1 and 2 provides the gene name of the matching sequence; column seven provides the BLAST identity score and column eight the BLAST similarity score from the comparison of the ORF and the homologous gene; and column nine indicates the length in nucleotides of the highest scoring segment pair identified by the BLAST identity analysis.
Each ORF described in the tables is defined by "start" and "stop" nucleotide position numbers. These position numbers refer to the boundaries of each ORF and provide orientation with respect to whether the forward or reverse strand is the coding strand and in which reading frame the coding sequence is contained. The "start" position is the first nucleotide of the triplet encoding a stop codon just 5' to the ORF and the "stop" position is the last nucleotide of the triplet encoding the next in-frame stop codon (i.e., the stop codon at the 3' end of the ORF). Those of ordinary skill in the art appreciate that preferred fragments within each ORF described in the table include fragments of each ORF which include the entire sequence from the delineated "start" and "stop" positions excepting the first and last three nucleotides since these encode stop codons. Thus, polynucleotides set out as ORFs in the tables but lacking the three 5' nucleotides and the three 3' nucleotides are encompassed by the present invention. Those of skill also appreciate that particularly preferred are fragments within each ORF that are polynucleotide fragments comprising polypeptide coding sequence. As defined herein, "coding sequence" includes the fragment within an ORF beginning at the first in-frame ATG (triplet encoding methionine) and ending with the last nucleotide prior to the triplet encoding the 3' stop codon. Preferred are fragments comprising the entire coding sequence and fragments comprising the entire coding sequence, excepting the coding sequence for the N-terminal methionine. Those of skill appreciate that the N-terminal methionine is often removed during post-translational processing and that polynucleotides lacking the ATG can be used to facilitate production of N-terminal fusion proteins which may be beneficial in the production or use of genetically engineered proteins. Of course, due to the degeneracy of the genetic code many polynucleotides can encode a given polypeptide.
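The relationship between the tabulated "start" and "stop" positions and the coding sequence as defined above can be sketched as follows. The sketch is illustrative only: it assumes the ORF lies on the forward strand of the contig, that positions are 1-based and inclusive, and that the standard stop codons TAA, TAG and TGA apply. The function name and the example contig are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def coding_sequence(contig: str, start: int, stop: int) -> str:
    """Extract the coding sequence from a stop-to-stop ORF span.

    `start` is the first nucleotide of the stop codon 5' to the ORF and
    `stop` is the last nucleotide of the 3' stop codon (1-based, inclusive,
    forward strand assumed). The result runs from the first in-frame ATG to
    the last nucleotide before the 3' stop codon, or "" if there is no ATG.
    """
    span = contig[start - 1:stop].upper()
    if len(span) < 6 or len(span) % 3 != 0:
        raise ValueError("span must be a whole number of codons including both stops")
    codons = [span[i:i + 3] for i in range(0, len(span), 3)]
    if codons[0] not in STOP_CODONS or codons[-1] not in STOP_CODONS:
        raise ValueError("span is not bounded by stop codons")
    inner = codons[1:-1]  # drop the three 5' and the three 3' nucleotides
    if any(codon in STOP_CODONS for codon in inner):
        raise ValueError("span contains an internal in-frame stop codon")
    if "ATG" not in inner:
        return ""
    return "".join(inner[inner.index("ATG"):])


if __name__ == "__main__":
    contig = "GG" + "TAA" + "GCT" + "ATG" + "AAA" + "CCC" + "TGA" + "TT"
    print(coding_sequence(contig, start=3, stop=20))  # ATGAAACCC
```

An ORF on the reverse strand would first require taking the reverse complement of the span; that case is omitted here to keep the sketch short.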
Thus, the invention further includes polynucleotides comprising a nucleotide sequence encoding a polypeptide sequence itself encoded by the coding sequence within an ORF described in Tables 1-3 herein. Further, polynucleotides at least 95%, preferably at least 99% and especially preferably at least 99.9% identical in sequence to the foregoing polynucleotides, are contemplated by the present invention.
Polypeptides encoded by polynucleotides described above and elsewhere herein are also provided by the present invention, as are polypeptides comprising an amino acid sequence at least about 95%, preferably at least 97% and even more preferably 99% identical to the amino acid sequence of a polypeptide encoded by an ORF shown in Tables 1-3. These polypeptides may or may not comprise an N-terminal methionine.
The concepts of percent identity and percent similarity of two polypeptide sequences are well understood in the art. For example, two polypeptides 10 amino acids in length which differ at three amino acid positions (e.g., at positions 1, 3 and 5) are said to have a percent identity of 70%. However, the same two polypeptides would be deemed to have a percent similarity of 80% if, for example at position 5, the amino acid moieties, although not identical, were "similar" (i.e., possessed similar biochemical characteristics). Many programs for analysis of nucleotide or amino acid sequence similarity, such as fasta and BLAST, specifically list percent identity of a matching region as an output parameter. Thus, for instance, Tables 1 and 2 herein enumerate the percent identity of the highest scoring segment pair in each ORF and its listed relative. Further details concerning the algorithms and criteria used for homology searches are provided below and are described in the pertinent literature highlighted by the citations provided below.
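The 70% identity and 80% similarity example above can be reproduced with a short sketch. The grouping of "similar" amino acids used here is an illustrative assumption, not the scoring scheme of fasta or BLAST, and the two ten-residue peptides are hypothetical.

```python
# Illustrative groups of biochemically similar residues (an assumption for
# this sketch; real programs use substitution matrices instead).
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("ST"), set("NQ")]


def percent_identity_and_similarity(a: str, b: str):
    """Return (percent identity, percent similarity) for equal-length peptides."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(x == y for x, y in zip(a, b))
    similar = sum(
        x != y and any(x in group and y in group for group in SIMILAR_GROUPS)
        for x, y in zip(a, b)
    )
    n = len(a)
    return 100.0 * identical / n, 100.0 * (identical + similar) / n


if __name__ == "__main__":
    first = "MKLAVDEGST"
    second = "PKGAIDEGST"  # differs at positions 1, 3 and 5; V/I at 5 is "similar"
    print(percent_identity_and_similarity(first, second))  # (70.0, 80.0)
```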
It will be appreciated that other criteria can be used to generate more inclusive and more exclusive listings of the types set out in the tables. As those of skill will appreciate, narrow and broad searches both are useful. Thus, a skilled artisan can readily identify ORFs in contigs of the Streptococcus pneumoniae genome other than those listed in Tables 1-3, such as ORFs which are overlapping or encoded by the opposite strand of an identified ORF in addition to those ascertainable using the computer-based systems of the present invention.
As used herein, an "expression modulating fragment," EMF, means a series of nucleotide molecules which modulates the expression of an operably linked ORF or EMF.
As used herein, a sequence is said to "modulate the expression of an operably linked sequence" when the expression of the sequence is altered by the presence of the EMF. EMFs include, but are not limited to, promoters, and promoter modulating sequences (inducible elements). One class of EMFs are fragments which induce the expression of an operably linked ORF in response to a specific regulatory factor or physiological event.
EMF sequences can be identified within the contigs of the Streptococcus pneumoniae genome by their proximity to the ORFs provided in Tables 1-3. An intergenic segment, or a fragment of the intergenic segment, from about 10 to 200 nucleotides in length, taken from any one of the ORFs of Tables 1-3 will modulate the expression of an operably linked ORF in a fashion similar to that found with the naturally linked ORF sequence. As used herein, an "intergenic segment" refers to fragments of the Streptococcus pneumoniae genome which are between two ORF(s) herein described. EMFs also can be identified using known EMFs as a target sequence or target motif in the computer-based systems of the present invention. Further, the two methods can be combined and used together.
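As an illustration of locating candidate EMF regions by their proximity to ORFs, the following sketch lists the intergenic segments between ORFs on a single contig. It assumes ORF boundaries are given as 1-based inclusive positions, ignores strand, and uses a hypothetical minimum length of 10 nucleotides taken from the lower end of the range mentioned above; the function name and coordinates are likewise hypothetical.

```python
def intergenic_segments(orfs, min_len: int = 10):
    """Return (first, last) positions of gaps between ORFs on one contig.

    `orfs` is an iterable of (start, stop) pairs, 1-based and inclusive.
    Overlapping or nested ORFs are merged before gaps are reported, and
    gaps shorter than `min_len` nucleotides are skipped.
    """
    ordered = sorted((min(s, e), max(s, e)) for s, e in orfs)
    if not ordered:
        return []
    gaps = []
    prev_end = ordered[0][1]
    for begin, end in ordered[1:]:
        if begin - prev_end - 1 >= min_len:
            gaps.append((prev_end + 1, begin - 1))
        prev_end = max(prev_end, end)
    return gaps


if __name__ == "__main__":
    orfs = [(1, 300), (351, 900), (880, 1200), (1205, 1500)]  # hypothetical
    print(intergenic_segments(orfs))  # [(301, 350)]
```

Segments found this way are only candidates; their activity would still need to be confirmed experimentally, for instance with an EMF trap vector.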
The presence and activity of an EMF can be confirmed using an EMF trap vector. An EMF trap vector contains a cloning site linked to a marker sequence. A marker sequence encodes an identifiable phenotype, such as antibiotic resistance or a complementing nutrition auxotrophic factor, which can be identified or assayed when the EMF trap vector is placed within an appropriate host under appropriate conditions. A more detailed discussion of various marker sequences is provided below. A sequence which is suspected as being an EMF is cloned in all three reading frames in one or more restriction sites upstream from the marker sequence in the EMF trap vector. The vector is then transformed into an appropriate host using known procedures and the phenotype of the transformed host is examined under appropriate conditions. As described above, an EMF will modulate the expression of an operably linked marker sequence.
As used herein, a "diagnostic fragment," DF, means a series of nucleotide molecules which selectively hybridize to Streptococcus pneumoniae sequences.
DFs can be readily identified by identifying unique sequences within contigs of the Streptococcus pneumoniae genome, such as by using well-known computer analysis software, and by generating and testing probes or amplification primers consisting of the DF sequence in an appropriate diagnostic format which determines amplification or hybridization selectivity.
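One simple way to shortlist unique sequences by computer analysis is to look for subsequences of fixed length that occur once in the target contig and not at all in a background set of sequences. The sketch below is illustrative only: the function name, the exact-match criterion, and the treatment of a single strand are assumptions, and the default length of 17 is taken from the preferred minimum DF length stated herein.

```python
from collections import Counter


def candidate_dfs(target: str, background: str, k: int = 17):
    """Return k-mers occurring exactly once in `target` and never in `background`.

    Only exact matches on the given strand are considered; a practical
    screen would also check the reverse complement and near matches.
    """
    counts = Counter(target[i:i + k] for i in range(len(target) - k + 1))
    seen_elsewhere = {background[i:i + k] for i in range(len(background) - k + 1)}
    return [kmer for kmer, n in counts.items() if n == 1 and kmer not in seen_elsewhere]


if __name__ == "__main__":
    # Tiny hypothetical sequences with k=4 so the result is easy to follow.
    print(candidate_dfs("ACGTACGA", "TTACGTTT", k=4))  # ['CGTA', 'GTAC', 'ACGA']
```

Candidates produced in this way would still be tested as probes or amplification primers in an appropriate diagnostic format, as described above.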
The sequences falling within the scope of the present invention are not limited to the specific sequences herein described, but also include allelic and species variations thereof. Allelic and species variations can be routinely determined by comparing the sequences provided in SEQ ID NOS:1-391, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most preferably at least 99.9% identical to SEQ ID NOS:1-391, with a sequence from another isolate of the same species. Furthermore, to accommodate codon variability, the invention includes nucleic acid molecules coding for the same amino acid sequences as do the specific ORFs disclosed herein. In other words, in the coding region of an ORF, substitution of one codon for another which encodes the same amino acid is expressly contemplated. Any specific sequence disclosed herein can be readily screened for errors by resequencing a particular fragment, such as an ORF, in both directions (i.e., sequence both strands). Alternatively, error screening can be performed by sequencing corresponding polynucleotides of Streptococcus pneumoniae origin isolated by using part or all of the fragments in question as a probe or primer.
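The point about codon variability can be illustrated with a sketch that translates two coding sequences and checks whether they differ in nucleotide sequence while encoding the same polypeptide. The sketch uses the standard genetic code; the function names and the two nine-nucleotide example sequences are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence codon by codon ('*' marks a stop)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encodes_same_polypeptide(first: str, second: str) -> bool:
    """True if the sequences differ in nucleotides but encode the same polypeptide."""
    return first.upper() != second.upper() and translate(first) == translate(second)


if __name__ == "__main__":
    print(translate("ATGAAACCC"))                              # MKP
    print(encodes_same_polypeptide("ATGAAACCC", "ATGAAGCCG"))  # True
```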
Preferred DFs of the present invention comprise at least about 17, preferably at least about 20, and more preferably at least about 50 contiguous nucleotides within an ORF set out in Tables 1-3. Most highly preferred DFs specifically hybridize to a polynucleotide containing the sequence of the ORF from which they are derived. Specific hybridization occurs even under stringent conditions defined elsewhere herein.
Each of the ORFs of the Streptococcus pneumoniae genome disclosed in Tables 1, 2 and 3, and the EMFs found 5' to the ORFs, can be used as polynucleotide reagents in numerous ways. For example, the sequences can be used as diagnostic probes or diagnostic amplification primers to detect the presence of a specific microbe in a sample, particularly Streptococcus pneumoniae.
Especially preferred in this regard are ORFs such as those of Table 3, which do not match previously characterized sequences from other organisms and thus are most likely to be highly selective for Streptococcus pneumoniae. Also particularly preferred are ORFs that can be used to distinguish between strains of Streptococcus pneumoniae, particularly those that distinguish medically important strains, such as drug-resistant strains.
In addition, the fragments of the present invention, as broadly described, can be used to control gene expression through triple helix formation or antisense DNA or RNA, both of which methods are based on the binding of a polynucleotide sequence to DNA or RNA. Triple-helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Information from the sequences of the present invention can be used to design antisense and triple helix-forming oligonucleotides. Polynucleotides suitable for use in these methods are usually 20 to 40 bases in length and are designed to be complementary to a region of the gene involved in transcription, for triple-helix formation, or to the mRNA itself, for antisense inhibition. Both techniques have been demonstrated to be effective in model systems, and the requisite techniques are well known and involve routine procedures. Triple helix techniques are discussed in, for example, Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991). Antisense techniques in general are discussed in, for instance, Okano, J. Neurochem. 56:560 (1991) and Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, FL (1988).
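The sequence-design step for an antisense oligonucleotide, namely taking the complement of a chosen mRNA region, can be sketched as follows. The sketch only computes the reverse complement of a 20 to 40 base region and returns it as a DNA oligonucleotide; it does not model binding, specificity, or efficacy. The function name, the 0-based start index, and the example mRNA are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(mrna: str, start: int, length: int = 20) -> str:
    """Return a DNA oligonucleotide complementary to a region of `mrna`.

    `start` is a 0-based index into the mRNA (written 5' to 3', with U or T);
    `length` must be 20 to 40 bases, the range given in the text. The result
    is written 5' to 3'.
    """
    if not 20 <= length <= 40:
        raise ValueError("length should be between 20 and 40 bases")
    region = mrna[start:start + length].upper().replace("U", "T")
    if len(region) < length:
        raise ValueError("region runs past the end of the mRNA")
    return region.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    mrna = "AUGGCUAAAGGUCCAUUGAC"  # hypothetical 20-base region
    print(antisense_oligo(mrna, start=0, length=20))  # GTCAATGGACCTTTAGCCAT
```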
The present invention further provides recombinant constructs comprising one or more fragments of the Streptococcus pneumoniae genomic fragments and contigs of the present invention. Certain preferred recombinant constructs of the present invention comprise a vector, such as a plasmid or viral vector, into which a fragment of the Streptococcus pneumoniae genome has been inserted, in a forward or reverse orientation. In the case of a vector comprising one of the ORFs of the present invention, the vector may further comprise regulatory sequences, including for example, a promoter, operably linked to the ORF. For vectors comprising the EMFs of the present invention, the vector may further comprise a marker sequence or heterologous ORF operably linked to the EMF.
Large numbers of suitable vectors and promoters are known to those of skill in the art and are commercially available for generating the recombinant constructs of the present invention. The following vectors are provided by way of example. Useful bacterial vectors include phagescript, PsiX174, pBluescript SK, pBS KS, pNH8a, pNH16a, pNH18a, pNH46a (available from Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (available from Pharmacia).
Useful eukaryotic vectors include pWLneo, pSV2cat, pOG44, pXT1, pSG (available from Stratagene); pSVK3, pBPV, pMSG, pSVL (available from Pharmacia).
Promoter regions can be selected from any desired gene using CAT (chloramphenicol acetyltransferase) vectors or other vectors with selectable markers.
Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, and trc. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late LTRs from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art.
The present invention further provides host cells containing any one of the isolated fragments of the Streptococcus pneumoniae genomic fragments and contigs of the present invention, wherein the fragment has been introduced into the host cell using known methods. The host cell can be a higher eukaryotic host cell, such as a mammalian cell, a lower eukaryotic host cell, such as a yeast cell, or a procaryotic cell, such as a bacterial cell.
A polynucleotide of the present invention, such as a recombinant construct comprising an ORF of the present invention, may be introduced into the host by a variety of well established techniques that are standard in the art, such as calcium phosphate transfection, DEAE-dextran mediated transfection and electroporation, which are described in, for instance, Davis, L. et al., BASIC METHODS IN MOLECULAR BIOLOGY (1986).
A host cell containing one of the fragments of the Streptococcus pneumoniae genomic fragments and contigs of the present invention can be used in conventional manners to produce the gene product encoded by the isolated fragment (in the case of an ORF) or can be used to produce a heterologous protein under the control of the EMF. The present invention further provides isolated polypeptides encoded by the nucleic acid fragments of the present invention or by degenerate variants of the nucleic acid fragments of the present invention. By "degenerate variant" is intended nucleotide fragments which differ from a nucleic acid fragment of the present invention (e.g., an ORF) by nucleotide sequence but, due to the degeneracy of the Genetic Code, encode an identical polypeptide sequence.
Preferred nucleic acid fragments of the present invention are the ORFs and subfragments thereof depicted in Tables 2 and 3 which encode proteins.
A variety of methodologies known in the art can be utilized to obtain any one of the isolated polypeptides or proteins of the present invention. At the simplest level, the amino acid sequence can be synthesized using commercially available peptide synthesizers. This is particularly useful in producing small peptides and fragments of larger polypeptides. Such short fragments as may be obtained most readily by synthesis are useful, for example, in generating antibodies against the native polypeptide, as discussed further below.
In an alternative method, the polypeptide or protein is purified from bacterial cells which naturally produce the polypeptide or protein. One skilled in the art can readily employ well-known methods for isolating polypeptides and proteins to isolate and purify polypeptides or proteins of the present invention produced naturally by a bacterial strain, or by other methods. Methods for isolation and purification that can be employed in this regard include, but are not limited to, immunochromatography, HPLC, size-exclusion chromatography, ion-exchange chromatography, and immuno-affinity chromatography.
The polypeptides and proteins of the present invention also can be purified from cells which have been altered to express the desired polypeptide or protein.
As used herein, a cell is said to be altered to express a desired polypeptide or protein when the cell, through genetic manipulation, is made to produce a polypeptide or protein which it normally does not produce or which the cell normally produces at a lower level. Those skilled in the art can readily adapt procedures for introducing and expressing either recombinant or synthetic sequences into eukaryotic or prokaryotic cells in order to generate a cell which produces one of the polypeptides or proteins of the present invention.
Any host/vector system can be used to express one or more of the ORFs of the present invention. These include, but are not limited to, eukaryotic hosts such as HeLa cells, CV-1 cells, COS cells, and Sf9 cells, as well as prokaryotic hosts such as E. coli and B. subtilis. The most preferred cells are those which do not normally express the particular polypeptide or protein or which express the polypeptide or protein at a low natural level.
"Recombinant," as used herein, means that a polypeptide or protein is derived from recombinant (e.g., microbial or mammalian) expression systems.
"Microbial" refers to recombinant polypeptides or proteins made in bacterial or fungal (e.g., yeast) expression systems. As a product, "recombinant microbial" defines a polypeptide or protein essentially free of native endogenous substances and unaccompanied by associated native glycosylation. Polypeptides or proteins expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation modifications; polypeptides or proteins expressed in yeast will have a glycosylation pattern different from that expressed in mammalian cells.
"Nucleotide sequence" refers to a heteropolymer of deoxyribonucleotides.
Generally, DNA segments encoding the polypeptides and proteins provided by this invention are assembled from fragments of the Streptococcus pneumoniae genome and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a synthetic gene which is capable of being expressed in a recombinant transcriptional unit comprising regulatory elements derived from a microbial or viral operon.
"Recombinant expression vehicle or vector" refers to a plasmid or phage or virus or vector, for expressing a polypeptide from a DNA (RNA) sequence. The expression vehicle can comprise a transcriptional unit comprising an assembly of genetic regulatory elements necessary for gene expression in the host, including elements required to initiate and maintain transcription at a level sufficient for suitable expression of the desired polypeptide, including, for example, promoters and, where necessary, an enhancer and a polyadenylation signal; a structural or coding sequence which is transcribed into mRNA and translated into protein; and appropriate signals to initiate translation at the beginning of the desired coding region and terminate translation at its end. Structural units intended for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of translated protein by a host cell.
Alternatively, where recombinant protein is expressed without a leader or transport sequence, it may include an N-terminal methionine residue. This residue may or may not be subsequently cleaved from the expressed recombinant protein to provide a final product.
"Recombinant expression system" means host cells which have stably integrated a recombinant transcriptional unit into chromosomal DNA or carry the recombinant transcriptional unit extra chromosomally. The cells can be prokaryotic or eukaryotic. Recombinant expression systems as defined herein will express heterologous polypeptides or proteins upon induction of the regulatory elements linked to the DNA segment or synthetic gene to be expressed.
Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells under the control of appropriate promoters. Cell-free translation systems can also be employed to produce such proteins using RNAs derived from the DNA constructs of the present invention. Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1989), the disclosure of which is hereby incorporated by reference in its entirety.
Generally, recombinant expression vectors will include origins of replication and selectable markers permitting transformation of the host cell, e.g., the ampicillin resistance gene of E. coli and the S. cerevisiae TRP1 gene, and a promoter derived from a highly expressed gene to direct transcription of a downstream structural sequence. Such promoters can be derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), alpha-factor, acid phosphatase, or heat shock proteins, among others. The heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences, and preferably, a leader sequence capable of directing secretion of translated protein into the periplasmic space or extracellular medium.
Optionally, the heterologous sequence can encode a fusion protein including an N-terminal identification peptide imparting desired characteristics, e.g., stabilization or simplified purification of expressed recombinant product.
Useful expression vectors for bacterial use are constructed by inserting a structural DNA sequence encoding a desired protein together with suitable translation initiation and termination signals in operable reading phase with a functional promoter. The vector will comprise one or more phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and, when desirable, provide amplification within the host.
Suitable prokaryotic hosts for transformation include strains of E. coli, B. subtilis, Salmonella typhimurium and various species within the genera Pseudomonas and Streptomyces. Others may also be employed as a matter of choice.
As a representative but non-limiting example, useful expression vectors for bacterial use can comprise a selectable marker and bacterial origin of replication derived from commercially available plasmids comprising genetic elements of the well known cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 (available form Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM 1 (available from Promega Biotec, Madison, WI, USA). These pBR322 "backbone" sections-are combined with an appropriate promoter and the structural sequence to be expressed.
Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter, where it is inducible, is derepressed or induced by appropriate means temperature shift or chemical induction) and cells are cultured for an additional period to provide for expression of the induced gene product. Thereafter cells are typically harvested, generally by centrifugation, disrupted to release expressed protein, generally by physical or chemical means, and the resulting crude extract is retained for further purification.
Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts, described in Gluzman, Cell 23:175 (1981), and other cell lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell lines.
Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites may be used to provide the required nontranscribed genetic elements.
Recombinant polypeptides and proteins produced in bacterial culture are usually isolated by initial extraction from cell pellets, followed by one or more salting-out, aqueous ion exchange or size exclusion chromatography steps.
Microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Protein refolding steps can be used, as necessary, in completing configuration of the mature protein. Finally, high performance liquid chromatography (HPLC) can be employed for final purification steps.
The present invention further includes isolated polypeptides, proteins and nucleic acid molecules which are substantially equivalent to those herein described.
As used herein, substantially equivalent can refer both to nucleic acid and amino acid sequences, for example a mutant sequence, that varies from a reference sequence by one or more substitutions, deletions, or additions, the net effect of which does not result in an adverse functional dissimilarity between reference and subject sequences. For purposes of the present invention, sequences having equivalent biological activity, and equivalent expression characteristics are considered substantially equivalent. For purposes of determining equivalence, truncation of the mature sequence should be disregarded.
The invention further provides methods of obtaining homologs from other strains of Streptococcus pneumoniae, of the fragments of the Streptococcus pneumoniae genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention. As used herein, a sequence or protein of Streptococcus pneumoniae is defined as a homolog of a fragment of the Streptococcus pneumoniae fragments or contigs or a protein encoded by one of the ORFs of the present invention, if it shares significant homology to one of the fragments of the Streptococcus pneumoniae genome of the present invention or a protein encoded by one of the ORFs of the present invention. Specifically, by using the sequence disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.
As used herein, two nucleic acid molecules or proteins are said to "share significant homology" if the two contain regions which possess greater than 25 sequence (amino acid or nucleic acid) homology. Preferred homologs in this regard are those with more than 90% homology. Especially preferred are those with 93% or more homology. Among especially preferred homologs those with 95% or more homology are particularly preferred. Very particularly preferred among these are those with 97% and even more particularly preferred among those are homologs with 99% or more homology. The most preferred homologs among these are those with 99.9% homology or more. It will be understood that, among measures of homology, identity is particularly preferred in this regard.
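By way of non-limiting illustration, the preference tiers recited above can be sketched in code. The following Python sketch assumes that identity is the measure of homology used and that the two sequences have already been aligned to equal length, with "-" marking gaps. The function names `percent_identity` and `preference_tier` and the tier labels are hypothetical and introduced here for illustration only.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Assumes `a` and `b` are pre-aligned, equal-length strings with '-' as the
    gap character. Columns that are gaps in both sequences are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(a.upper(), b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("no aligned positions to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# (threshold, inclusive, label), highest tier first. Thresholds are those
# stated in the text; the labels are illustrative shorthand.
TIERS = [
    (99.9, True, "most preferred"),
    (99.0, True, "even more particularly preferred"),
    (97.0, True, "very particularly preferred"),
    (95.0, True, "particularly preferred"),
    (93.0, True, "especially preferred"),
    (90.0, False, "preferred"),  # the text says "more than 90%"
]


def preference_tier(pct: float):
    """Return the highest tier label met by `pct`, or None if below all tiers."""
    for threshold, inclusive, label in TIERS:
        if pct > threshold or (inclusive and pct == threshold):
            return label
    return None
```

For example, `preference_tier(percent_identity(seq_a, seq_b))` would report the highest tier satisfied by two aligned sequences; any real comparison would first require an alignment step that is outside this sketch.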
Region specific primers or probes derived from the nucleotide sequence provided in SEQ ID NOS: 1-391 or from a nucleotide sequence at least particularly at least 99%, especially at least 99.5% identical to a sequence of SEQ ID NOS: 1-391 can be used to prime DNA synthesis and PCR amplification, as well as to identify colonies containing cloned DNA encoding a homolog. Methods suitable to this aspect of the present invention are well known and have been described in great detail in many publications such as, for example, Innis et al., PCR Protocols, Academic Press, San Diego, CA (1990).
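As a minimal sketch of how region specific primers might be derived from a known sequence, the following Python example takes a template strand and the coordinates of a region and returns a forward primer and a reverse primer. The function names, the 0-based half-open coordinate convention and the default primer length of 20 bases are assumptions made for illustration; considerations such as melting temperature and primer-dimer formation are not addressed.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def region_primers(template: str, start: int, end: int, primer_len: int = 20):
    """Return (forward, reverse) primers bounding template[start:end].

    Coordinates are 0-based with `end` exclusive (an assumed convention).
    The forward primer copies the first `primer_len` bases of the region;
    the reverse primer is the reverse complement of the last `primer_len`
    bases, so that it anneals to the given strand and extends back across
    the region.
    """
    if not (0 <= start < end <= len(template)):
        raise ValueError("region lies outside the template")
    if end - start < primer_len:
        raise ValueError("region is shorter than the requested primer length")
    region = template[start:end]
    forward = region[:primer_len]
    reverse = reverse_complement(region[-primer_len:])
    return forward, reverse
```

The same helper could be applied to any sub-region of a sequence of interest by varying `start` and `end`.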
When using primers derived from SEQ ID NOS: 1-391 or from a nucleotide sequence having an aforementioned identity to a sequence of SEQ ID NOS: 1-391, one skilled in the art will recognize that by employing high stringency conditions (e.g., annealing at 50-60°C in 6X SSPC and 50% formamide, and washing at 65°C in 0.5X SSPC) only sequences which are greater than 75% homologous to the primer will be amplified. By employing lower stringency conditions (e.g., hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at 42°C in 0.5X SSPC), sequences which are greater than 40-50% homologous to the primer will also be amplified.
When using DNA probes derived from SEQ ID NOS: 1-391, or from a nucleotide sequence having an aforementioned identity to a sequence of SEQ ID NOS: 1-391, for colony/plaque hybridization, one skilled in the art will recognize that by employing high stringency conditions (e.g., hybridizing at 50-65°C in SSPC and 50% formamide, and washing at 50-65°C in 0.5X SSPC), sequences having regions which are greater than 90% homologous to the probe can be obtained, and that by employing lower stringency conditions (e.g., hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at 42°C in SSPC), sequences having regions which are greater than 35-45% homologous to the probe will be obtained.
Any organism can be used as the source for homologs of the present invention so long as the organism naturally expresses such a protein or contains genes encoding the same. The most preferred organisms for isolating homologs are bacteria which are closely related to Streptococcus pneumoniae.
ILLUSTRATIVE USES OF COMPOSITIONS OF THE INVENTION
Each ORF provided in Tables 1 and 2 is identified with a function by homology to a known gene or polypeptide. As a result, one skilled in the art can use the polypeptides of the present invention for commercial, therapeutic and industrial purposes consistent with the type of putative identification of the polypeptide. Such identifications permit one skilled in the art to use the Streptococcus pneumoniae ORFs in a manner similar to the known type of sequences for which the identification is made; for example, to ferment a particular sugar source or to produce a particular metabolite. A variety of reviews illustrative of this aspect of the invention are available, including the following reviews on the industrial use of enzymes, for example, BIOCHEMICAL ENGINEERING AND BIOTECHNOLOGY HANDBOOK, 2nd Ed., MacMillan Publications, Ltd. NY (1991) and BIOCATALYSTS IN ORGANIC SYNTHESES, Tramper et al., Eds., Elsevier Science Publishers, Amsterdam, The Netherlands (1985). A variety of exemplary uses that illustrate this and similar aspects of the present invention are discussed below.
1. Biosynthetic Enzymes
Open reading frames encoding proteins involved in mediating the catalytic reactions involved in intermediary and macromolecular metabolism, the biosynthesis of small molecules, cellular processes and other functions can be used for industrial biosynthesis. These include enzymes involved in the degradation of the intermediary products of metabolism, enzymes involved in central intermediary metabolism, enzymes involved in respiration, both aerobic and anaerobic, enzymes involved in fermentation, enzymes involved in ATP proton motive force conversion, enzymes involved in broad regulatory function, enzymes involved in amino acid synthesis, enzymes involved in nucleotide synthesis, and enzymes involved in cofactor and vitamin synthesis.
The various metabolic pathways present in Streptococcus pneumoniae can be identified based on absolute nutritional requirements as well as by examining the various enzymes identified in Tables 1-3 and SEQ ID NOS: 1-391.
Of particular interest are polypeptides involved in the degradation of intermediary metabolites as well as non-macromolecular metabolism. Such enzymes include amylases, glucose oxidases, and catalase.
Proteolytic enzymes are another class of commercially important enzymes.
Proteolytic enzymes find use in a number of industrial processes including the processing of flax and other vegetable fibers, in the extraction, clarification and depectinization of fruit juices, in the extraction of vegetable oil and in the maceration of fruits and vegetables to give unicellular fruits. A detailed review of the proteolytic enzymes used in the food industry is provided in Rombouts et al., Symbiosis 21:79 (1986) and Voragen et al. in Biocatalysts In Agricultural Biotechnology, Whitaker et al., Eds., American Chemical Society Symposium Series 389:93 (1989).
The metabolism of sugars is an important aspect of the primary metabolism of Streptococcus pneumoniae. Enzymes involved in the degradation of sugars, such as, particularly, glucose, galactose, fructose and xylose, can be used in industrial fermentation. Some of the important sugar transforming enzymes, from a commercial viewpoint, include sugar isomerases such as glucose isomerase.
Other metabolic enzymes have found commercial use, such as glucose oxidases, which produce ketogulonic acid (KGA). KGA is an intermediate in the commercial production of ascorbic acid using the Reichstein's procedure, as described in Krueger et al., Biotechnology Rhine et al., Eds., Verlag Press, Weinheim, Germany (1984).
Glucose oxidase (GOD) is commercially available and has been used in purified form as well as in an immobilized form for the deoxygenation of beer.
See, for instance, Hartmeir et al., Biotechnology Letters 1:21 (1979). The most important application of GOD is the industrial scale fermentation of gluconic acid.
There is a market for gluconic acids, which are used in the detergent, textile, leather, photographic, pharmaceutical, food, feed and concrete industries, as described, for example, in Bigelis et al., beginning on page 357 in GENE MANIPULATIONS AND FUNGI; Benett et al., Eds., Academic Press, New York (1985). In addition to industrial applications, GOD has found applications in medicine for quantitative determination of glucose in body fluids and, more recently, in biotechnology for analyzing syrups from starch and cellulose hydrolysates. This application is described in Owusu et al., Biochem. et Biophysica. Acta. 872:83 (1986), for instance.
The main sweetener used in the world today is sugar which comes from sugar beets and sugar cane. In the field of industrial enzymes, the glucose isomerase process shows the largest expansion in the market today. Initially, soluble enzymes were used and later immobilized enzymes were developed (Krueger et al., Biotechnology, The Textbook of Industrial Microbiology, Sinauer Associated Incorporated, Sunderland, Massachusetts (1990)). Today, the use of glucose-produced high fructose syrups is by far the largest industrial business using immobilized enzymes. A review of the industrial use of these enzymes is provided by Jorgensen, Starch 40:307 (1988).
Proteinases, such as alkaline serine proteinases, are used as detergent additives and thus represent one of the largest volumes of microbial enzymes used in the industrial sector. Because of their industrial importance, there is a large body of published and unpublished information regarding the use of these enzymes in industrial processes. (See Faultman et al., Acid Proteases Structure Function and Biology, Tang, ed., Plenum Press, New York (1977) and Godfrey et al., Industrial Enzymes, MacMillan Publishers, Surrey, UK (1983) and Hepner et al., Report Industrial Enzymes by 1990, Hel Hepner Associates, London (1986)).
Another class of commercially usable proteins of the present invention are the microbial lipases, described by, for instance, Macrae et al., Philosophical Transactions of the Royal Society of London 310:227 (1985) and Poserke, Journal of the American Oil Chemists' Society 61:1758 (1984). A major use of lipases is in the fat and oil industry for the production of neutral glycerides using lipase catalyzed inter-esterification of readily available triglycerides. Applications of lipases include use as a detergent additive to facilitate the removal of fats from fabrics in the course of the washing procedures.
The use of enzymes, and in particular microbial enzymes, as catalysts for key steps in the synthesis of complex organic molecules is gaining popularity at a great rate. One area of great interest is the preparation of chiral intermediates.
Preparation of chiral intermediates is of interest to a wide range of synthetic chemists, particularly those scientists involved with the preparation of new pharmaceuticals, agrochemicals, fragrances and flavors. (See Davies et al., Recent Advances in the Generation of Chiral Intermediates Using Enzymes, CRC Press, Boca Raton, Florida (1990)). The following reactions catalyzed by enzymes are of interest to organic chemists: hydrolysis of carboxylic acid esters, phosphate esters, amides and nitriles, esterification reactions, trans-esterification reactions, synthesis of amides, reduction of alkanones and oxoalkanoates, oxidation of alcohols to carbonyl compounds, oxidation of sulfides to sulfoxides, and carbon bond forming reactions such as the aldol reaction.
When considering the use of an enzyme encoded by one of the ORFs of the present invention for biotransformation and organic synthesis it is sometimes necessary to consider the respective advantages and disadvantages of using a microorganism as opposed to an isolated enzyme. The pros and cons of using a whole cell system on the one hand or an isolated, partially purified enzyme on the other hand have been described in detail by Bud et al., Chemistry in Britain (1987), p. 127.
Amino transferases, enzymes involved in the biosynthesis and metabolism of amino acids, are useful in the catalytic production of amino acids. The advantage of using microbial based enzyme systems is that the amino transferase enzymes catalyze the stereoselective synthesis of only L-amino acids and generally possess uniformly high catalytic rates. A description of the use of amino transferases for amino acid production is provided by Roselle-David, Methods in Enzymology 136:479 (1987).
Another category of useful proteins encoded by the ORFs of the present invention includes enzymes involved in nucleic acid synthesis, repair, and recombination.
2. Generation of Antibodies
As described here, the proteins of the present invention, as well as homologs thereof, can be used in a variety of procedures and methods known in the art which are currently applied to other proteins. The proteins of the present invention can further be used to generate an antibody which selectively binds the protein. Such antibodies can be either monoclonal or polyclonal antibodies, as well as fragments of these antibodies, and humanized forms.
The invention further provides antibodies which selectively bind to one of the proteins of the present invention and hybridomas which produce these antibodies. A hybridoma is an immortalized cell line which is capable of secreting a specific monoclonal antibody.
In general, techniques for preparing polyclonal and monoclonal antibodies as well as hybridomas capable of producing the desired antibody are well known in the art (Campbell, A. Monoclonal Antibody Technology: Laboratory Techniques In Biochemistry And Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol. Methods 35:1-21 (1980), Kohler and Milstein, Nature 256:495-497 (1975)), the trioma technique, the human B-cell hybridoma technique (Kozbor et al., Immunology Today 4:72 (1983), pgs. 77-96 of Cole et al., in Monoclonal Antibodies And Cancer Therapy, Alan R. Liss, Inc. (1985)). Any animal (mouse, rabbit, etc.) which is known to produce antibodies can be immunized with the pseudogene polypeptide. Methods for immunization are well known in the art. Such methods include subcutaneous or intraperitoneal injection of the polypeptide. One skilled in the art will recognize that the amount of the protein encoded by the ORF of the present invention used for immunization will vary based on the animal which is immunized, the antigenicity of the peptide and the site of injection.
The protein which is used as an immunogen may be modified or administered in an adjuvant in order to increase the protein's antigenicity. Methods of increasing the antigenicity of a protein are well known in the art and include, but are not limited to coupling the antigen with a heterologous protein (such as globulin or galactosidase) or through the inclusion of an adjuvant during immunization.
For monoclonal antibodies, spleen cells from the immunized animals are removed, fused with myeloma cells, such as SP2/0-Ag14 myeloma cells, and allowed to become monoclonal antibody producing hybridoma cells.
Any one of a number of methods well known in the art can be used to identify the hybridoma cell which produces an antibody with the desired characteristics. These include screening the hybridomas with an ELISA assay, western blot analysis, or radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109-124 (1988)).
Hybridomas secreting the desired antibodies are cloned and the class and subclass are determined using procedures known in the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984)).
Techniques described for the production of single chain antibodies (U.S. Patent 4,946,778) can be adapted to produce single chain antibodies to proteins of the present invention.
For polyclonal antibodies, antibody containing antiserum is isolated from the immunized animal and is screened for the presence of antibodies with the desired specificity using one of the above-described procedures.
The present invention further provides the above-described antibodies in detectably labelled form. Antibodies can be detectably labelled through the use of radioisotopes, affinity labels (such as biotin, avidin, etc.), enzymatic labels (such as horseradish peroxidase, alkaline phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.), paramagnetic atoms, etc. Procedures for accomplishing such labeling are well known in the art; for example, see Sternberger et al., J. Histochem. Cytochem. 18:315 (1970); Bayer, E. A. et al., Meth. Enzym. 62:308 (1979); Engval, E. et al., Immunol. 109:129 (1972); Goding, J. J. Immunol. Meth. 13:215 (1976).
The labeled antibodies of the present invention can be used for in vitro, in vivo, and in situ assays to identify cells or tissues in which a fragment of the Streptococcus pneumoniae genome is expressed.
The present invention further provides the above-described antibodies immobilized on a solid support. Examples of such solid supports include plastics such as polycarbonate, complex carbohydrates such as agarose and sepharose, acrylic resins such as polyacrylamide, and latex beads. Techniques for coupling antibodies to such solid supports are well known in the art (Weir, D. M. et al., "Handbook of Experimental Immunology" 4th Ed., Blackwell Scientific Publications, Oxford, England, Chapter 10 (1986); Jacoby, W. D. et al., Meth. Enzym. 34 Academic Press, N. Y. (1974)). The immobilized antibodies of the present invention can be used for in vitro, in vivo, and in situ assays as well as for immunoaffinity purification of the proteins of the present invention.
3. Diagnostic Assays and Kits
The present invention further provides methods to identify the expression of one of the ORFs of the present invention, or homolog thereof, in a test sample, using one of the DFs or antibodies of the present invention.
In detail, such methods comprise incubating a test sample with one or more of the antibodies or one or more of the DFs of the present invention and assaying for binding of the DFs or antibodies to components within the test sample.
Conditions for incubating a DF or antibody with a test sample vary.
Incubation conditions depend on the format employed in the assay, the detection methods employed, and the type and nature of the DF or antibody used in the assay. One skilled in the art will recognize that any one of the commonly available hybridization, amplification or immunological assay formats can readily be adapted to employ the DFs or antibodies of the present invention. Examples of such assays can be found in Chard, An Introduction to Radioimmunoassay and Related Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands (1986); Bullock, G. R. et al., Techniques in Immunocytochemistry, Academic Press, Orlando, FL Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, Practice and Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1985).
The test samples of the present invention include cells, protein or membrane extracts of cells, or biological fluids such as sputum, blood, serum, plasma, or urine. The test sample used in the above-described method will vary based on the assay format, nature of the detection method and the tissues, cells or extracts used as the sample to be assayed. Methods for preparing protein extracts or membrane extracts of cells are well known in the art and can readily be adapted in order to obtain a sample which is compatible with the system utilized.
In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry out the assays of the present invention.
Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers which comprises: a first container comprising one of the DFs or antibodies of the present invention; and one or more other containers comprising one or more of the following: wash reagents, reagents capable of detecting presence of a bound DF or antibody.
In detail, a compartmentalized kit includes any kit in which reagents are contained in separate containers. Such containers include small glass containers, plastic containers or strips of plastic or paper. Such containers allow one to efficiently transfer reagents from one compartment to another compartment such that the samples and reagents are not cross-contaminated, and the agents or solutions of each container can be added in a quantitative fashion from one compartment to another. Such containers will include a container which will accept the test sample, a container which contains the antibodies used in the assay, containers which contain wash reagents (such as phosphate buffered saline, Tris buffers, etc.), and containers which contain the reagents used to detect the bound antibody or DF.
Types of detection reagents include labelled nucleic acid probes, labelled secondary antibodies, or in the alternative, if the primary antibody is labelled, the enzymatic, or antibody binding reagents which are capable of reacting with the labelled antibody. One skilled in the art will readily recognize that the disclosed DFs and antibodies of the present invention can be readily incorporated into one of the established kit formats which are well known in the art.
4. Screening Assay for Binding Agents
Using the isolated proteins of the present invention, the present invention further provides methods of obtaining and identifying agents which bind to a protein encoded by one of the ORFs of the present invention or to one of the fragments and the Streptococcus pneumoniae fragment and contigs herein described.
In general, such methods comprise steps of: contacting an agent with an isolated protein encoded by one of the ORFs of the present invention, or an isolated fragment of the Streptococcus pneumoniae genome; and determining whether the agent binds to said protein or said fragment.
The agents screened in the above assay can be, but are not limited to, peptides, carbohydrates, vitamin derivatives, or other pharmaceutical agents. The agents can be selected and screened at random or rationally selected or designed using protein modeling techniques.
For random screening, agents such as peptides, carbohydrates, pharmaceutical agents and the like are selected at random and are assayed for their ability to bind to the protein encoded by the ORF of the present invention.
Alternatively, agents may be rationally selected or designed. As used herein, an agent is said to be "rationally selected or designed" when the agent is chosen based on the configuration of the particular protein. For example, one skilled in the art can readily adapt currently available procedures to generate peptides, pharmaceutical agents and the like capable of binding to a specific peptide sequence in order to generate rationally designed antipeptide peptides, for example see Hurby et al., "Application of Synthetic Peptides: Antisense Peptides," in Synthetic Peptides, A User's Guide, W. H. Freeman, NY (1992), pp. 289-307, and Kaspczak et al., Biochemistry 28:9230-8 (1989), or pharmaceutical agents, or the like.
In addition to the foregoing, one class of agents of the present invention, as broadly described, can be used to control gene expression through binding to one of the ORFs or EMFs of the present invention. As described above, such agents can be randomly screened or rationally designed/selected. Targeting the ORF or EMF allows a skilled artisan to design sequence specific or element specific agents, modulating the expression of either a single ORF or multiple ORFs which rely on the same EMF for expression control.
One class of DNA binding agents are agents which contain base residues which hybridize or form a triple helix by binding to DNA or RNA. Such agents can be based on the classic phosphodiester, ribonucleic acid backbone, or can be a variety of sulfhydryl or polymeric derivatives which have base attachment capacity.
Agents suitable for use in these methods usually contain 20 to 40 bases and are designed to be complementary to a region of the gene involved in transcription (triple helix: see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991)) or to the mRNA itself (antisense: Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, FL (1988)). Triple helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been demonstrated to be effective in model systems. Information contained in the sequences of the present invention can be used to design antisense and triple helix-forming oligonucleotides, and other DNA binding agents.
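The complementarity requirement for an antisense agent can be illustrated with a short sketch. The following Python example, offered as an assumption-laden illustration rather than a design method, returns a DNA oligonucleotide complementary to a chosen window of an mRNA sequence and enforces the 20 to 40 base length range mentioned above. The function name `antisense_oligo` and its parameters are hypothetical, and no account is taken of target accessibility, secondary structure or backbone chemistry.

```python
# Watson-Crick partners, mapping an RNA (or DNA) target base to a DNA base.
_TARGET_TO_DNA = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def antisense_oligo(mrna: str, start: int, length: int = 20) -> str:
    """Return a DNA oligo, written 5'->3', complementary to
    mrna[start:start + length].

    `start` is a 0-based offset into the mRNA (an assumed convention).
    """
    if not 20 <= length <= 40:
        raise ValueError("length should be between 20 and 40 bases")
    if start < 0:
        raise ValueError("start must be non-negative")
    target = mrna.upper()[start:start + length]
    if len(target) != length:
        raise ValueError("window extends past the end of the mRNA")
    try:
        # Reverse the window so the result reads 5'->3', then complement.
        return "".join(_TARGET_TO_DNA[base] for base in reversed(target))
    except KeyError as exc:
        raise ValueError(f"unexpected base in target: {exc}") from exc
```

Sliding `start` along a transcript would enumerate candidate oligonucleotides of a given length, which could then be evaluated experimentally.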
5. Pharmaceutical Compositions and Vaccines
The present invention further provides pharmaceutical agents which can be used to modulate the growth or pathogenicity of Streptococcus pneumoniae, or another related organism, in vivo or in vitro. As used herein, a "pharmaceutical agent" is defined as a composition of matter which can be formulated using known techniques to provide a pharmaceutical composition. As used herein, the "pharmaceutical agents of the present invention" refers to the pharmaceutical agents which are derived from the proteins encoded by the ORFs of the present invention or are agents which are identified using the herein described assays.
As used herein, a pharmaceutical agent is said to "modulate the growth or pathogenicity of Streptococcus pneumoniae or a related organism, in vivo or in vitro," when the agent reduces the rate of growth, rate of division, or viability of the organism in question. The pharmaceutical agents of the present invention can modulate the growth or pathogenicity of an organism in many fashions, although an understanding of the underlying mechanism of action is not needed to practice the use of the pharmaceutical agents of the present invention. Some agents will modulate the growth by binding to an important protein thus blocking the biological activity of the protein, while other agents may bind to a component of the outer surface of the organism, blocking attachment or rendering the organism more prone to attack by the body's natural immune system. Alternatively, the agent may comprise a protein encoded by one of the ORFs of the present invention and serve as a vaccine. The development and use of vaccines based on outer membrane components are well known in the art.
As used herein, a "related organism" is a broad term which refers to any organism whose growth can be modulated by one of the pharmaceutical agents of the present invention. In general, such an organism will contain a homolog of the protein which is the target of the pharmaceutical agent or the protein used as a vaccine. As such, related organisms do not need to be bacterial but may be fungal or viral pathogens.
The pharmaceutical agents and compositions of the present invention may be administered in a convenient manner, such as by the oral, topical, intravenous, intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal routes. The pharmaceutical compositions are administered in an amount which is effective for treating and/or prophylaxis of the specific indication. In general, they are administered in an amount of at least about 1 mg/kg body weight and in most cases they will be administered in an amount not in excess of about 1 g/kg body weight per day. In most cases, the dosage is from about 0.1 mg/kg to about 10 g/kg body weight daily, taking into account the routes of administration, symptoms, etc.
The agents of the present invention can be used in native form or can be modified to form a chemical derivative. As used herein, a molecule is said to be a "chemical derivative" of another molecule when it contains additional chemical moieties not normally a part of the molecule. Such moieties may improve the molecule's solubility, absorption, biological half life, etc. The moieties may alternatively decrease the toxicity of the molecule, eliminate or attenuate any undesirable side effect of the molecule, etc. Moieties capable of mediating such effects are disclosed in, among other sources, REMINGTON'S PHARMACEUTICAL SCIENCES (1980) cited elsewhere herein.
For example, such moieties may change an immunological character of the functional derivative, such as affinity for a given antibody. Such changes in immunomodulation activity are measured by the appropriate assay, such as a competitive type immunoassay. Modifications of such protein properties as redox or thermal stability, biological half-life, hydrophobicity, susceptibility to proteolytic degradation or the tendency to aggregate with carriers or into multimers also may be effected in this way and can be assayed by methods well known to the skilled artisan.
The therapeutic effects of the agents of the present invention may be obtained by providing the agent to a patient by any suitable means (e.g., inhalation, intravenously, intramuscularly, subcutaneously, enterally, or parenterally). It is preferred to administer the agent of the present invention so as to achieve an effective concentration within the blood or tissue in which the growth of the organism is to be controlled. To achieve an effective blood concentration, the preferred method is to administer the agent by injection. The administration may be by continuous infusion, or by single or multiple injections.
In providing a patient with one of the agents of the present invention, the dosage of the administered agent will vary depending upon such factors as the patient's age, weight, height, sex, general medical condition, previous medical history, etc. In general, it is desirable to provide the recipient with a dosage of agent which is in the range of from about 1 pg/kg to 10 mg/kg (body weight of patient), although a lower or higher dosage may be administered. The therapeutically effective dose can be lowered by using combinations of the agents of the present invention or another agent.
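Purely as an arithmetic illustration of the weight-based range recited above, the following Python sketch converts the stated bounds of about 1 pg/kg and 10 mg/kg into absolute amounts for a given body weight. The function name and its default parameters are hypothetical; the sketch performs unit conversion only and is not a dosing recommendation, since, as noted, lower or higher dosages may be administered depending on patient-specific factors.

```python
PG_PER_MG = 1_000_000_000  # 1 mg = 1e9 pg


def dose_range_mg(body_weight_kg: float,
                  low_pg_per_kg: float = 1.0,
                  high_mg_per_kg: float = 10.0):
    """Return (low, high) absolute amounts in mg for the given body weight.

    Defaults mirror the range stated in the text (about 1 pg/kg to 10 mg/kg).
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    low_mg = low_pg_per_kg * body_weight_kg / PG_PER_MG
    high_mg = high_mg_per_kg * body_weight_kg
    return low_mg, high_mg
```

For a 70 kg patient, the upper bound works out to 70 kg x 10 mg/kg = 700 mg, and the lower bound to 70 pg.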
As used herein, two or more compounds or agents are said to be administered "in combination" with each other when either the physiological effects of each compound, or the serum concentrations of each compound can be measured at the same time. The composition of the present invention can be administered concurrently with, prior to, or following the administration of the other agent.
The agents of the present invention are intended to be provided to recipient subjects in an amount sufficient to decrease the rate of growth (as defined above) of the target organism.
The administration of the agent(s) of the invention may be for either a "prophylactic" or "therapeutic" purpose. When provided prophylactically, the agent(s) are provided in advance of any symptoms indicative of the organism's growth. The prophylactic administration of the agent(s) serves to prevent, attenuate, or decrease the rate of onset of any subsequent infection. When provided therapeutically, the agent(s) are provided at (or shortly after) the onset of an indication of infection. The therapeutic administration of the agent(s) serves to attenuate the pathological symptoms of the infection and to increase the rate of recovery.
The agents of the present invention are administered to a subject, such as a mammal, or a patient, in a pharmaceutically acceptable form and in a therapeutically effective concentration. A composition is said to be "pharmacologically acceptable" if its administration can be tolerated by a recipient patient. Such an agent is said to be administered in a "therapeutically effective amount" if the amount administered is physiologically significant. An agent is physiologically significant if its presence results in a detectable change in the physiology of a recipient patient.
The agents of the present invention can be formulated according to known methods to prepare pharmaceutically useful compositions, whereby these materials, or their functional derivatives, are combined in a mixture with a pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation, inclusive of other human proteins, e.g., human serum albumin, are described, for example, in REMINGTON'S PHARMACEUTICAL SCIENCES, 16th Ed., Osol, Ed., Mack Publishing, Easton PA (1980). In order to form a pharmaceutically acceptable composition suitable for effective administration, such compositions will contain an effective amount of one or more of the agents of the present invention, together with a suitable amount of carrier vehicle.
Additional pharmaceutical methods may be employed to control the duration of action. Controlled release preparations may be achieved through the use of polymers to complex or absorb one or more of the agents of the present invention.
The controlled delivery may be effectuated by a variety of well known techniques, including formulation with macromolecules such as, for example, polyesters, polyamino acids, polyvinylpyrrolidone, ethylenevinylacetate, methylcellulose, carboxymethylcellulose, or protamine sulfate, adjusting the concentration of the macromolecules and the agent in the formulation, and by appropriate use of methods of incorporation, which can be manipulated to effectuate a desired time course of release. Another possible method to control the duration of action by controlled release preparations is to incorporate agents of the present invention into particles of a polymeric material such as polyesters, polyamino acids, hydrogels, poly(lactic acid) or ethylene vinylacetate copolymers. Alternatively, instead of incorporating these agents into polymeric particles, it is possible to entrap these materials in microcapsules prepared, for example, by coacervation techniques or by interfacial polymerization with, for example, hydroxymethylcellulose or gelatin microcapsules and poly(methylmethacrylate) microcapsules, respectively, or in colloidal drug delivery systems, for example, liposomes, albumin microspheres, microemulsions, nanoparticles, and nanocapsules or in macroemulsions. Such techniques are disclosed in REMINGTON'S PHARMACEUTICAL SCIENCES (1980).
The invention further provides a pharmaceutical pack or kit comprising one or more containers filled with one or more of the ingredients of the pharmaceutical compositions of the invention. Associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.
In addition, the agents of the present invention may be employed in conjunction with other therapeutic compounds.
6. Shot-Gun Approach to Megabase DNA Sequencing
The present invention further demonstrates that a large sequence can be sequenced using a random shotgun approach. This procedure, described in detail in the examples that follow, has eliminated the up front cost of isolating and ordering overlapping or contiguous subclones prior to the start of the sequencing protocols.
Certain aspects of the present invention are described in greater detail in the examples that follow. The examples are provided by way of illustration. Other aspects and embodiments of the present invention are contemplated by the inventors, as will be clear to those of skill in the art from reading the present disclosure.
ILLUSTRATIVE EXAMPLES
LIBRARIES AND SEQUENCING
1. Shotgun Sequencing Probability Analysis
The overall strategy for a shotgun approach to whole genome sequencing follows from the Lander and Waterman (Lander and Waterman, Genomics 2:231 (1988)) application of the equation for the Poisson distribution. According to this treatment, the probability, P, that any given base in a sequence of size L, in nucleotides, is not sequenced after a certain amount, n, in nucleotides, of random sequence has been determined can be calculated by the equation $P = e^{-m}$, where m is n/L, the fold coverage. For instance, for a genome of 2.8 Mb, m = 1 when 2.8 Mb of sequence has been randomly generated (1X coverage). At that point, $P = e^{-1} = 0.37$. The probability that any given base has not been sequenced is the same as the probability that any region of the whole sequence L has not been determined and, therefore, is equivalent to the fraction of the whole sequence that has yet to be determined. Thus, at one-fold coverage, approximately 37% of a polynucleotide of size L, in nucleotides, has not been sequenced. When 14 Mb of sequence has been generated, coverage is 5X for a 2.8 Mb genome and the unsequenced fraction drops to 0.0067 or 0.67%. 5X coverage of a 2.8 Mb sequence can be attained by sequencing approximately 17,000 random clones from both insert ends with an average sequence read length of 410 bp.
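Similarly, the total gap length, G, is determined by the equation $G = Le^{-m}$, and the average gap size, g, follows the equation $g = L/n$, where n in this equation is the number of random sequence reads rather than the number of nucleotides sequenced (about 34,000 reads for 17,000 clones sequenced from both ends, giving 2.8 Mb/34,000, or about 82 bp). Thus, 5X coverage leaves about 240 gaps averaging about 82 bp in size in a polynucleotide 2.8 Mb long.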
The treatment above is essentially that of Lander and Waterman, Genomics 2: 231 (1988).
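The figures quoted in this probability analysis can be checked with a short calculation. The following is a minimal sketch in Python, not part of the original disclosure; the function name and the returned field names are illustrative assumptions, and the average gap size is computed as genome length divided by the number of reads, which is the reading of g = L/n that reproduces the stated value of about 82 bp.

```python
import math


def shotgun_stats(genome_bp, n_reads, read_len_bp):
    """Poisson (Lander-Waterman style) estimates for a random shotgun project.

    All names here are illustrative; only the formulas come from the text.
    """
    sequenced_bp = n_reads * read_len_bp
    m = sequenced_bp / genome_bp          # fold coverage
    p_unsequenced = math.exp(-m)          # P = e^-m
    total_gap_bp = genome_bp * p_unsequenced   # G = L * e^-m
    avg_gap_bp = genome_bp / n_reads      # g = L / (number of reads)
    n_gaps = total_gap_bp / avg_gap_bp    # expected number of gaps
    return {
        "coverage": m,
        "p_unsequenced": p_unsequenced,
        "total_gap_bp": total_gap_bp,
        "avg_gap_bp": avg_gap_bp,
        "n_gaps": n_gaps,
    }


# 17,000 clones sequenced from both ends, 410 bp average read, 2.8 Mb genome
stats = shotgun_stats(2_800_000, 2 * 17_000, 410)
for name, value in stats.items():
    print(f"{name}: {value:.4g}")
```

With these inputs the coverage comes out just under 5X, so the unsequenced fraction and gap count are close to, but not identical with, the rounded values quoted for exactly 5X coverage.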
2. Random Library Construction
In order to approximate the random model described above during actual sequencing, a nearly ideal library of cloned genomic fragments is required. The following library construction procedure was developed to achieve this end.
Streptococcus pneumoniae DNA is prepared by phenol extraction. A mixture containing 200 µg DNA in 1.0 ml of 300 mM sodium acetate, 10 mM Tris-HCl, 1 mM Na-EDTA, 50% glycerol is processed through a nebulizer (IPI Medical Products) with a stream of nitrogen adjusted to 35 kPa for 2 minutes. The fragmented DNA is ethanol precipitated and redissolved in 500 µl TE buffer.
To create blunt-ends, a 100 µl aliquot of the resuspended DNA is digested with 5 units of BAL31 nuclease (New England BioLabs) for 10 min at 30°C in 200 µl BAL31 buffer. The digested DNA is phenol-extracted, ethanol-precipitated, redissolved in 100 µl TE buffer, and then size-fractionated by electrophoresis through a 1.0% low melting temperature agarose gel. The section containing DNA fragments 1.6-2.0 kb in size is excised from the gel, the LGT agarose is melted, and the resulting solution is extracted with phenol to separate the agarose from the DNA. DNA is ethanol precipitated and redissolved in 20 µl of TE buffer for ligation to vector.
A two-step ligation procedure is used to produce a plasmid library with 97% inserts, of which >99% are single inserts. The first ligation mixture (50 µl) contains 2 µg of DNA fragments, 2 µg pUC18 DNA (Pharmacia) cut with SmaI and dephosphorylated with bacterial alkaline phosphatase, and 10 units of T4 ligase (GIBCO/BRL) and is incubated at 14°C for 4 hr. The ligation mixture then is phenol extracted and ethanol precipitated, and the precipitated DNA is dissolved in TE buffer and electrophoresed on a 1.0% low melting agarose gel. Discrete bands in a ladder are visualized by ethidium bromide-staining and UV illumination and identified by size as insert (I), vector (v), v+I, v+2I, v+3I, etc. The portion of the gel containing v+I DNA is excised and the v+I DNA is recovered and resuspended into 20 µl TE. The v+I DNA then is blunt-ended by T4 polymerase treatment for 5 min. at 37°C in a reaction mixture (50 µl) containing the v+I linears, 500 µM each of the 4 dNTPs, and 9 units of T4 polymerase (New England BioLabs), under recommended buffer conditions. After phenol extraction and ethanol precipitation the repaired v+I linears are dissolved in 20 µl TE. The final ligation to produce circles is carried out in a 50 µl reaction containing 5 µl of v+I linears and 5 units of T4 ligase at 14°C overnight. After 10 min. at 70°C the following day, the reaction mixture is stored at -20°C.
This two-stage procedure results in a molecularly random collection of single-insert plasmid recombinants with minimal contamination from double-insert chimeras or free vector.
Since deviation from randomness can arise from propagation of the DNA in the host, E. coli host cells deficient in all recombination and restriction functions (Greener, Strategies 3 (1990)) are used to prevent rearrangements, deletions, and loss of clones by restriction. Furthermore, transformed cells are plated directly on antibiotic diffusion plates to avoid the usual broth recovery phase which allows multiplication and selection of the most rapidly growing cells.
Plating is carried out as follows. A 100 µl aliquot of Epicurian Coli SURE II Supercompetent Cells (Stratagene 200152) is thawed on ice and transferred to a chilled Falcon 2059 tube on ice. A 1.7 µl aliquot of 1.42 M beta-mercaptoethanol is added to the aliquot of cells to a final concentration of 25 mM. Cells are incubated on ice for 10 min. A 1 µl aliquot of the final ligation is added to the cells and incubated on ice for 30 min. The cells are heat pulsed for 30 sec. at 42°C and placed back on ice for 2 min. The outgrowth period in liquid culture is eliminated from this protocol in order to minimize the preferential growth of any given transformed cell. Instead the transformation mixture is plated directly on a nutrient rich SOB plate containing a 5 ml bottom layer of SOB agar (SOB agar: 20 g tryptone, 5 g yeast extract, 0.5 g NaCl, 1.5% Difco Agar per liter of media). The 5 ml bottom layer is supplemented with 0.4 ml of 50 mg/ml ampicillin per 100 ml SOB agar. The 15 ml top layer of SOB agar is supplemented with 1 ml X-Gal, 1 ml MgCl2, and 1 ml MgSO4 per 100 ml SOB agar. The 15 ml top layer is poured just prior to plating. Our titer is approximately 100 colonies per 10 µl aliquot of transformation.
All colonies are picked for template preparation regardless of size. Thus, only clones lost due to "poison" DNA or deleterious gene products are deleted from the library, resulting in a slight increase in gap number over that expected.
3. Random DNA Sequencing
High quality double stranded DNA plasmid templates are prepared using a "boiling bead" method developed in collaboration with Advanced Genetic Technology Corp. (Gaithersburg, MD) (Adams et al., Science 252:1651 (1991); Adams et al., Nature 355:632 (1992)). Plasmid preparation is performed in a 96-well format for all stages of DNA preparation from bacterial growth through final DNA purification. Template concentration is determined using Hoechst Dye and a Millipore Cytofluor. DNA concentrations are not adjusted, but low-yielding templates are identified where possible and not sequenced.
Templates are also prepared from two Streptococcus pneumoniae lambda genomic libraries. An amplified library is constructed in the vector Lambda GEM-12 (Promega) and an unamplified library is constructed in Lambda DASH II (Stratagene). In particular, for the unamplified lambda library, Streptococcus pneumoniae DNA (>100 kb) is partially digested in a reaction mixture (200 µl) containing 50 µg DNA, 1X Sau3AI buffer, and 20 units Sau3AI for 6 min. at 23°C. The digested DNA is phenol-extracted and electrophoresed on a 0.5% low melting agarose gel at 2 V/cm for 7 hours. Fragments from 15 to 25 kb are excised and recovered in a final volume of 6 µl. One µl of fragments is used with 1 µl of DASH II vector (Stratagene) in the recommended ligation reaction. One µl of the ligation mixture is used per packaging reaction following the recommended protocol with the Gigapack II XL Packaging Extract (Stratagene, #227711). Phage are plated directly without amplification from the packaging mixture (after dilution with 500 µl of recommended SM buffer and chloroform treatment). Yield is about 2.5 x 10^3 pfu/µl. The amplified library is prepared essentially as above except the Lambda GEM-12 vector is used. After packaging, about 3.5 x 10^4 pfu are plated on the restrictive NM539 host. The lysate is harvested in 2 ml of SM buffer and stored frozen in 7% dimethylsulfoxide. The phage titer is on the order of 10^9 pfu/ml.
Liquid lysates (100 µl) are prepared from randomly selected plaques (from the unamplified library) and template is prepared by long-range PCR using T7 and T3 vector-specific primers.
Sequencing reactions are carried out on plasmid and/or PCR templates using the AB Catalyst LabStation with Applied Biosystems PRISM Ready Reaction Dye Primer Cycle Sequencing Kits for the M13 forward (M13-21) and the M13 reverse (M13RP1) primers (Adams et al., Nature 368:474 (1994)). Dye terminator sequencing reactions are carried out on the lambda templates on a Perkin-Elmer 9600 Thermocycler using the Applied Biosystems Ready Reaction Dye Terminator Cycle Sequencing kits. T7 and SP6 primers are used to sequence the ends of the inserts from the Lambda GEM-12 library and T7 and T3 primers are used to sequence the ends of the inserts from the Lambda DASH II library.
Sequencing reactions are performed by eight individuals using an average of fourteen AB 373 DNA Sequencers per day. All sequencing reactions are analyzed using the Stretch modification of the AB 373, primarily using a 34 cm well-to-read distance. The overall sequencing success rate is approximately 85% for M13-21 and M13RP1 sequences and 65% for dye-terminator reactions. The average usable read length is 485 bp for M13-21 sequences, 445 bp for M13RP1 sequences, and 375 bp for dye-terminator reactions.
Richards et al., Chapter 28 in AUTOMATED DNA SEQUENCING AND ANALYSIS, M. D. Adams, C. Fields, J. C. Venter, Eds., Academic Press, London, (1994) described the value of using sequence from both ends of sequencing templates to facilitate ordering of contigs in shotgun assembly projects of lambda and cosmid clones. We balance the desirability of both-end sequencing (including the reduced cost of lower total number of templates) against shorter read-lengths for sequencing reactions performed with the M13RP1 (reverse) primer compared to the M13-21 (forward) primer. Approximately one-half of the templates are sequenced from both ends. Random reverse sequencing reactions are done based on successful forward sequencing reactions. Some M13RP1 sequences are obtained in a semi-directed fashion: M13-21 sequences pointing outward at the ends of contigs are chosen for M13RP1 sequencing in an effort to specifically order contigs.
4. Protocol for Automated Cycle Sequencing
The sequencing is carried out using ABI Catalyst robots and AB 373 Automated DNA Sequencers. The Catalyst robot is a publicly available sophisticated pipetting and temperature control robot which has been developed specifically for DNA sequencing reactions. The Catalyst combines pre-aliquoted templates and reaction mixes consisting of deoxy- and dideoxynucleotides, the thermostable Taq DNA polymerase, fluorescently-labelled sequencing primers, and reaction buffer. Reaction mixes and templates are combined in the wells of an aluminum 96-well thermocycling plate. Thirty consecutive cycles of linear amplification (i.e., one primer synthesis) steps are performed, including denaturation, annealing of primer and template, and extension (i.e., DNA synthesis). A heated lid with rubber gaskets on the thermocycling plate prevents evaporation without the need for an oil overlay.
Two sequencing protocols are used: one for dye-labelled primers and a second for dye-labelled dideoxy chain terminators. The shotgun sequencing involves use of four dye-labelled sequencing primers, one for each of the four terminator nucleotides. Each dye-primer is labelled with a different fluorescent dye, permitting the four individual reactions to be combined into one lane of the 373 DNA Sequencer for electrophoresis, detection, and base-calling. ABI currently supplies pre-mixed reaction mixes in bulk packages containing all the necessary non-template reagents for sequencing. Sequencing can be done with both plasmid and PCR-generated templates with both dye-primers and dye-terminators with approximately equal fidelity, although plasmid templates generally give longer usable sequences.
Thirty-two reactions are loaded per AB 373 Sequencer each day, for a total of 960 samples. Electrophoresis is run overnight following the manufacturer's protocols, and the data are collected for twelve hours. Following electrophoresis and fluorescence detection, the ABI 373 performs automatic lane tracking and basecalling. The lane-tracking is confirmed visually. Each sequence electropherogram (or fluorescence lane trace) is inspected visually and assessed for quality. Trailing sequences of low quality are removed and the sequence itself is loaded via software to a Sybase database (archived daily to 8mm tape). Leading vector polylinker sequence is removed automatically by a software program. Average edited lengths of sequences from the standard ABI 373 are around 400 bp and depend mostly on the quality of the template used for the sequencing reaction. ABI 373 Sequencers converted to Stretch Liners provide a longer electrophoresis path prior to fluorescence detection and increase the average number of usable bases to 500-600 bp.
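The two editing steps described here, removal of leading vector polylinker and removal of low-quality trailing sequence, can be illustrated with a small sketch. This is not the software used in the disclosure, where trailing sequence was assessed by visual inspection; the function name, the use of ambiguous base calls ("N") as a stand-in for low quality, the window size, and the threshold are all assumptions made for illustration.

```python
def trim_read(bases, vector_prefix="", window=20, max_n_fraction=0.1):
    """Strip a leading vector prefix and a low-quality tail from a read.

    Hypothetical helper: "low quality" is approximated by the fraction of
    ambiguous base calls (N) in a sliding window at the 3' end.
    """
    # Remove leading vector polylinker if the read begins with it.
    if vector_prefix and bases.startswith(vector_prefix):
        bases = bases[len(vector_prefix):]

    # Shorten the read while its last `window` bases contain too many Ns.
    end = len(bases)
    while end >= window and bases[end - window:end].count("N") / window > max_n_fraction:
        end -= 1

    # Drop any ambiguous calls still left at the very end.
    return bases[:end].rstrip("N")


raw = "GAATTC" + "ACGT" * 10 + "N" * 15
print(trim_read(raw, vector_prefix="GAATTC"))
```

A production pipeline would normally use per-base quality values from the base-caller rather than counting ambiguous calls.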
INFORMATICS
1. Data Management
A number of information management systems for a large-scale sequencing lab have been developed. (For review see, for instance, Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, IEEE Computer Society Press, Washington, D.C., 585 (1993)). The system used to collect and assemble the sequence data was developed using the Sybase relational database management system and was designed to automate data flow wherever possible and to reduce user error. The database stores and correlates all information collected during the entire operation from template preparation to final analysis of the genome. Because the raw output of the ABI 373 Sequencers was based on a Macintosh platform and the data management system chosen was based on a Unix platform, it was necessary to design and implement a variety of multi-user, client-server applications which allow the raw data as well as analysis results to flow seamlessly into the database with a minimum of user effort.
2. Assembly
An assembly engine (TIGR Assembler) developed for the rapid and accurate assembly of thousands of sequence fragments was employed to generate contigs. The TIGR Assembler simultaneously clusters and assembles fragments of the genome. In order to obtain the speed necessary to assemble more than 10^4 fragments, the algorithm builds a hash table of 12 bp oligonucleotide subsequences to generate a list of potential sequence fragment overlaps. The number of potential overlaps for each fragment determines which fragments are likely to fall into repetitive elements. Beginning with a single seed sequence fragment, TIGR Assembler extends the current contig by attempting to add the best matching fragment based on oligonucleotide content. The contig and candidate fragment are aligned using a modified version of the Smith-Waterman algorithm which provides for optimal gapped alignments (Waterman, M. Methods in Enzymology 164:765 (1988)). The contig is extended by the fragment only if strict criteria for the quality of the match are met. The match criteria include the minimum length of overlap, the maximum length of an unmatched end, and the minimum percentage match. These criteria are automatically lowered by the algorithm in regions of minimal coverage and raised in regions with a possible repetitive element. Fragments representing the boundaries of repetitive elements and potentially chimeric fragments are often rejected based on partial mismatches at the ends of alignments and excluded from the current contig.
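The first step of this assembly strategy, using a hash table of 12 bp subsequences to list potential fragment overlaps, can be sketched as follows. This is a simplified illustration and not the TIGR Assembler code: the function names are hypothetical, only the forward strand is indexed, and the number of shared 12-mers is used as a rough proxy for the "oligonucleotide content" score mentioned in the text.

```python
from collections import defaultdict

K = 12  # oligonucleotide length used for the hash table, as stated in the text


def build_kmer_index(fragments, k=K):
    """Map every k-mer to the set of fragment ids that contain it."""
    index = defaultdict(set)
    for frag_id, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)
    return index


def candidate_overlaps(fragments, k=K):
    """Count distinct shared k-mers for every pair of fragments sharing any."""
    shared = defaultdict(int)
    for ids in build_kmer_index(fragments, k).values():
        ordered = sorted(ids)
        for pos, a in enumerate(ordered):
            for b in ordered[pos + 1:]:
                shared[(a, b)] += 1
    return dict(shared)


genome = "ACGTTGCAAGCTTAGGCTAACCGTATCGGATCCTAGGTCA"
fragments = {
    "f1": genome[0:25],
    "f2": genome[10:35],   # overlaps f1 by 15 bases
    "f3": "T" * 30,        # unrelated fragment
}
pairs = candidate_overlaps(fragments)
print(pairs)
```

A fragment that appears in an unusually large number of candidate pairs would be flagged as likely to lie in a repetitive element; candidate pairs would then be passed to a gapped alignment step for confirmation.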
TIGR Assembler is designed to take advantage of clone size information coupled with sequencing from both ends of each template. It enforces the constraint that sequence fragments from two ends of the same template point toward one another in the contig and are located within a certain range of base pairs (definable for each clone based on the known clone size range for a given library).
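The both-ends constraint can be expressed as a simple check on two read placements. The sketch below is an assumption-laden illustration rather than the assembler's actual logic: the `Placement` type, its fields, and the function name are hypothetical, and the size limits are passed in by the caller (for the plasmid library described above they might be taken from the 1.6-2.0 kb insert size range).

```python
from dataclasses import dataclass


@dataclass
class Placement:
    start: int      # leftmost contig coordinate covered by the read
    end: int        # rightmost contig coordinate covered by the read (exclusive)
    forward: bool   # True if the read runs left to right in the contig


def mates_consistent(a, b, min_insert, max_insert):
    """Check that two reads from one template face each other at a valid distance."""
    left, right = (a, b) if a.start <= b.start else (b, a)
    # The left read must point right and the right read must point left.
    if not (left.forward and not right.forward):
        return False
    # The outer span of the pair must match the expected clone size range.
    span = right.end - left.start
    return min_insert <= span <= max_insert


fwd = Placement(start=100, end=500, forward=True)
rev = Placement(start=1500, end=1900, forward=False)
print(mates_consistent(fwd, rev, min_insert=1600, max_insert=2000))
```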
The process resulted in 391 contigs as represented by SEQ ID NOs:1-391.
3. Identifying Genes
The predicted coding regions of the Streptococcus pneumoniae genome were initially defined with the program GeneMark, which finds ORFs using a probabilistic classification technique. The predicted coding region sequences were used in searches against a database of all nucleotide sequences from GenBank (October, 1997), using the BLASTN search method to identify overlaps of 50 or more nucleotides with at least a 95% identity. Those ORFs with nucleotide sequence matches are shown in Table 1. The ORFs without such matches were translated to protein sequences and compared to a non-redundant database of known proteins generated by combining the Swiss-prot, PIR and GenPept databases. ORFs that matched a database protein with BLASTP probability less than or equal to 0.01 are shown in Table 2. The table also lists assigned functions based on the closest match in the databases. ORFs that did not match protein or nucleotide sequences in the databases at these levels are shown in Table 3.
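The assignment of each ORF to Table 1, 2 or 3 follows a fixed decision order, which can be summarized in a few lines. The following is a minimal sketch under the thresholds stated above; the function name and the way search results are passed in (a tuple for the best BLASTN hit, a probability for the best BLASTP hit) are assumptions for illustration, and no BLAST software is invoked.

```python
def assign_table(best_blastn=None, best_blastp_prob=None):
    """Return the table number (1, 2 or 3) for one predicted ORF.

    best_blastn: (overlap_nt, percent_identity) of the best nucleotide match,
                 or None if there was no match.
    best_blastp_prob: BLASTP probability of the best protein match, or None.
    """
    # Table 1: nucleotide overlap of 50 nt or more with at least 95% identity.
    if best_blastn is not None:
        overlap_nt, percent_identity = best_blastn
        if overlap_nt >= 50 and percent_identity >= 95.0:
            return 1
    # Table 2: protein match with BLASTP probability of 0.01 or less.
    if best_blastp_prob is not None and best_blastp_prob <= 0.01:
        return 2
    # Table 3: no match at either level.
    return 3


print(assign_table(best_blastn=(624, 99.0)))
print(assign_table(best_blastn=(40, 99.0), best_blastp_prob=1e-20))
print(assign_table(best_blastp_prob=0.5))
```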
ILLUSTRATIVE APPLICATIONS
1. Production of an Antibody to a Streptococcus pneumoniae Protein
Substantially pure protein or polypeptide is isolated from the transfected or transformed cells using any one of the methods known in the art. The protein can also be produced in a recombinant prokaryotic expression system, such as E. coli, or can be chemically synthesized. Concentration of protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can then be prepared as follows.
2. Monoclonal Antibody Production by Hybridoma Fusion
Monoclonal antibody to epitopes of any of the peptides identified and isolated as described can be prepared from murine hybridomas according to the classical method of Kohler, G. and Milstein, C., Nature 256:495 (1975) or modifications of the methods thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the selected protein over a period of a few weeks. The mouse is then sacrificed, and the antibody producing cells of the spleen isolated.
The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the dilution placed in wells of a microtiter plate where growth of the culture is continued. Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described by Engvall, Meth. Enzymol. 70:419 (1980), and modified methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al., Basic Methods in Molecular Biology, Elsevier, New York. Section 21-2 (1989).
3. Polyclonal Antibody Production by Immunization
Polyclonal antiserum containing antibodies to heterogeneous epitopes of a single protein can be prepared by immunizing suitable animals with the expressed protein described above, which can be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and the host species. For example, small molecules tend to be less immunogenic than others and may require the use of carriers and adjuvant. Also, host animals vary in response to site of inoculations and dose, with both inadequate and excessive doses of antigen resulting in low titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab. 33:988-991 (1971).
Booster injections can be given at regular intervals, and antiserum harvested when antibody titer thereof, as determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the antigen, begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental Immunology, Weir, ed., Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 µM).
Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, Chap. 42 in: Manual of Clinical Immunology, second edition, Rose and Friedman, eds., Amer. Soc. For Microbiology, Washington, D. C. (1980).
Antibody preparations prepared according to either protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample. In addition, antibodies are useful in various animal models of pneumococcal disease as a means of evaluating the protein used to make the antibody as a potential vaccine target or as a means of evaluating the antibody as a potential immunotherapeutic or immunoprophylactic reagent.
4. Preparation of PCR Primers and Amplification of DNA
Various fragments of the Streptococcus pneumoniae genome, such as those of Tables 1-3 and SEQ ID NOS: 1-391, can be used, in accordance with the present invention, to prepare PCR primers for a variety of uses. The PCR primers are preferably at least 15 bases, and more preferably at least 18 bases in length. When selecting a primer sequence, it is preferred that the primer pairs have approximately the same G/C ratio, so that melting temperatures are approximately the same. The PCR primers and amplified DNA of this Example find use in the Examples that follow.
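The primer selection preferences stated here (a minimum length and a similar G/C ratio within a pair) lend themselves to a quick screening check. The sketch below is illustrative only: the function names, the example primer sequences, and the tolerance on G/C difference are assumptions, and the melting temperature estimate uses the simple Wallace rule (2°C per A or T, 4°C per G or C), which is a rough approximation for short oligonucleotides and is not taken from the disclosure.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in a primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Approximate melting temperature (deg C) by the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def primers_compatible(forward, reverse, min_len=15, max_gc_diff=0.10):
    """Check minimum length and similar G/C ratio for a primer pair."""
    if len(forward) < min_len or len(reverse) < min_len:
        return False
    return abs(gc_fraction(forward) - gc_fraction(reverse)) <= max_gc_diff


# Hypothetical 18-base primers, not taken from any sequence in the listing
fwd_primer = "ATGCGTACCTGAAGCTTG"
rev_primer = "TTGACCGATCAGGTCAAC"
print(gc_fraction(fwd_primer), wallace_tm(fwd_primer))
print(primers_compatible(fwd_primer, rev_primer))
```

Setting `min_len=18` applies the more preferred length stated in the text.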
5. Gene Expression from DNA Sequences Corresponding to ORFs
A fragment of the Streptococcus pneumoniae genome provided in Tables 1-3 is introduced into an expression vector using conventional technology.
Techniques to transfer cloned sequences into expression vectors that direct protein translation in mammalian, yeast, insect or bacterial expression systems are well known in the art. Commercially available vectors and expression systems are available from a variety of suppliers including Stratagene (La Jolla, California), Promega (Madison, Wisconsin), and Invitrogen (San Diego, California). If desired, to enhance expression and facilitate proper protein folding, the codon context and codon pairing of the sequence may be optimized for the particular expression organism, as explained by Hatfield et al., U. S. Patent No. 5,082,767, incorporated herein by this reference.
The following is provided as one exemplary method to generate polypeptide(s) from cloned ORFs of the Streptococcus pneumoniae genome fragment. Bacterial ORFs generally lack a poly A addition signal. The addition signal sequence can be added to the construct by, for example, splicing out the poly A addition sequence from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector pXT1 (Stratagene) for use in eukaryotic expression systems. pXT1 contains the LTRs and a portion of the gag gene of Moloney Murine Leukemia Virus. The positions of the LTRs in the construct allow efficient stable transfection. The vector includes the Herpes Simplex thymidine kinase promoter and the selectable neomycin gene. The Streptococcus pneumoniae DNA is obtained by PCR from the bacterial vector using oligonucleotide primers complementary to the Streptococcus pneumoniae DNA and containing restriction endonuclease sequences for PstI incorporated into the 5' primer and BglII at the 5' end of the corresponding Streptococcus pneumoniae DNA 3' primer, taking care to ensure that the Streptococcus pneumoniae DNA is positioned such that it is followed by the poly A addition sequence. The purified fragment obtained from the resulting PCR reaction is digested with PstI, blunt ended with an exonuclease, digested with BglII, purified and ligated to pXT1, now containing a poly A addition sequence and digested with BglII.
The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island, New York) under conditions outlined in the product specification. Positive transfectants are selected after growing the transfected cells in 600 µg/ml G418 (Sigma, St. Louis, Missouri).
The protein is preferably released into the supernatant. However, if the protein has membrane binding domains, the protein may additionally be retained within the cell or expression may be restricted to the cell surface. Since it may be necessary to purify and locate the transfected product, synthetic 15-mer peptides synthesized from the predicted Streptococcus pneumoniae DNA sequence are injected into mice to generate antibody to the polypeptide encoded by the Streptococcus pneumoniae DNA.
Alternatively, and if antibody production is not possible, the Streptococcus pneumoniae DNA sequence is additionally incorporated into eukaryotic expression vectors and expressed as, for example, a globin fusion. Antibody to the globin moiety then is used to purify the chimeric protein. Corresponding protease cleavage sites are engineered between the globin moiety and the polypeptide encoded by the Streptococcus pneumoniae DNA so that the latter may be freed from the former by simple protease digestion. One useful expression vector for generating globin chimerics is available from Stratagene. This vector encodes a rabbit globin. Intron II of the rabbit globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. These techniques are well known to those skilled in the art of molecular biology. Standard methods are published in methods texts such as Davis et al., cited elsewhere herein, and many of the methods are available from the technical assistance representatives from Stratagene, Life Technologies, Inc., or Promega. Polypeptides of the invention also may be produced using in vitro translation systems such as the in vitro Express™ Translation Kit (Stratagene).
While the present invention has been described in some detail for purposes of clarity and understanding, one skilled in the art will appreciate that various changes in form and detail can be made without departing from the true scope of the invention.
All patents, patent applications and publications referred to above are hereby incorporated by reference.
Throughout the specification, unless the context requires otherwise, the word "comprise" or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.
•coo e EDITORIAL NOTE FOR 33351/01 PAGES 1402-1403 ARE CLAIM PAGES THEY APPEAR AFTER THE TABLE AND SEQUENCE LISTING T A B L I S pne m a Co in 9ei n con ai in 999w sequ99ces 9 c- 9 9rn t 9a-e 9 9 99rcentl P *I F 1 1 431 S pnuona 100 Coding135 regit~os cnuonann knownd sethuene sufxd ects mr ad1 9 2056 8331SZ Coti OF trtItp athmathgnte naexB percent,,.FG,1,,JK gens ntPrjans OF nt642 ID D I InlI bintthsi gee acesdo ai ident Iegh int 3 ll1 71 100 17gbIU4113SPZ S~nuoe1 x pnemonapertid ,etFGh.iine sulonie dredccasetsAan 92 J 620 624 I I4- ho o er n kinase-- ho o o genes,- complete----cd-- 2 51 61049 5970 IgmbIUO4O47fPZ Strnetoocspemna Snexrnguoiaegn adisrin9l5 2 :52 j 66 ebZ33IPB I~n w oie dexB. ca B.C.D0.E. FD. KI genes. dTDP-rhamnose j 9812 4 6 I Ibiosynthesis genes and allA gene I 19 I3 113 1154 9770 0914 leblZ8351P2 IStrpuoniaeu deuml. capi(A.B.CDE.FG.n.1.J. gene. domplremne an 1 99 6244j 644 1I I 1 1 biosynhesse genesand l gene ata d I II j121 148 39 71 emb 118335 1 5 t IStrpeumoia epea ca l A 8 C DEur FG.IIas lB JKj gene. dcomplret ne s an1 9 1359 1 9 I 113 113446 112039 IgbIU435261 Streptococcus pneumonia neuraminidase B Inane) gene, complete cds, and 99 I1 1 I I I I I neuraminidase InanAl gene, partial cdsI I 3 114 115017 11378 IgbIU43526j IStreptococcus pneumoniae neuraminidase B InanB) gene, complete cds. and 1 9 11 1 1 1 I 1 neurastinidese InanAl gene, partial cdsIII I3 118 1137217 114398 gb1U435261 Streptococcus pneueonia. neuraminidase B (nanBi gene. complete cds. and j 99 16 18 I I I I I neuraminidase (nanAl gene, partial cds III 3 16 1 1 42 111 ImbI43I Streptococcus pneuaoniae nuaminipdas (noan gene ne, omplet cdand I 99 8143 8143 4 1 2 111:92 112829 lgb1431SP Streptococicus pneumonia. nuaminipdas cpan genne, omplet cnd, endS 99 215 1 21511 4 I 1 11ji210 11843 IgbIu41356 Streptococcus pneumonia. 
neurtids BmatnS) gufiene cplee cdsrA and 1 175 1069 1 I I I I erinde kinane ologh gene cds pet 11311 I12 736 I 6 118ImbIYI426IS Stretooccu NeufornertinaG reoDencpo ent and31 O13F2 and 93 2138 I 2143 F 4T 6 8 111327 111473 gebIU41751I IStrpocspneumoniae p o netidenmethionce S fxid redut6s bp)A an58 175 f 2U9 a 6 7 7135 136 795emb1Z7772615P15 IS.pneumoniae DNA for insertion sequence 1S131 (1372 bpl 99 4523 403 4 6 12 1 221 75701 lemblZ7772515P15 ISpneweoniae DNxA for insertion sequence161381K (966s bTp( jmos 96 160 j 24 6 123 1205 16823 lemb1Z83335l5PZ8 S.pneumonlae dexB, capl(A.B.CDE.FGIIi.J,I( genes, dTDP-rhamnose 96 j 4 f I 1 I1 biosynthesis genes and eliA gene IIII *D 11 901 99t I n9~s~ I I dS La egh a 901 06 IbZS3335jSPZ8 S.pneumoniae desfi. capl(A.8.C.D..F.U.1.1.J.Kj genes, dTDP-rhamsnose *I 1 I0, 110 biosynthesis genes and aliA gene 1 1 1 1 I I cd 1 1 51 I i II 2j58j198 IembjZ796SljSOOR jS.pneumoniae yorf(A.B.C.DEJ, ftsL. pbpx and regft genes I 99 16 372 I1 3477 82 18 lembJZ7969ljSOOR jS.pneussoniae yorf(A.B.C.DEj. ftsL. pbpK and regR genes I 99 I 109 108 4 -4 11 I 6 3480 13247 lembIZ7969115008 jS.pneumoniae yort(A.B.C.D.EI. ttsL. pbpX and regR genes I 99 234 234 I 3601 457 1embIZ796911500R jS.pneumoniae yorL(A.B.C.D.EI, itaL. pbpX and regil genes I 98 957) 957I 11 1 a 8 4506 148816 Jembi Z79693I SOOR IS.pneumoniae yorfiA.8.C.D.Ej. ftsL. pbpx and regR genes I 99 le38 381 11I 110 1 71131 1 8124 lembIXi6367ISPP8 IStreptococcus pneumoniae pbpX gene for penicillin binding protein 2x j 98 1 70 993 I 13 111 53 1 1126 IgbIN3I296I Is.pneumoniae recP gene. compiete cds I 99 I 431 1074 14 3 11837 2148 1emb1Z.833351514Z8 jS.pneumoniae deaR. capi(AB.C.D.E.?.0.H.I.J.Kj genes, dTDP-rhamnose 87 j 96 312 V I I I I II biosynthesis genes and eliA geneIIII -4 14 14 12518 2108 jgbIH36l80J IStreptococcus pneumoniae transposase. 
(comA and coe) and SAICAR synthetase 98 411 1 4111 1 1 1 I purCi genes, complete cds I I II is 9 1394 851 IgIU09391 Streptococcus pneumoniae type 19F capsular polysaccitaridea biosynthkesis I I I jpartial cds I I II 17 1 1,3910 13458 1emb1Z1726159l5 jS.pneumoniae DNA for insertion sequence 1118 (1372 bp) 1 98 I 453 I 453 17 18 14304 1 3873 1emb1Z7772715P15 IS.pneumoniae DNA for insertion sequence 1S1318 (823 bpl 96 1 382 j 432 19 1 41 1529 IembIX949O9ISPIG IS.pneumoniae igo gene 1 75 368 489 I 19 12 1554 j 757 IgbILO077S21 IStreptococcus pneumoniee attachment site (attet. DNA sequence I 99 I 167 204 I 19 1 3 1946 j 1827 jgbjL0l7S21 IStreptococcus pnewsonlae attachment site tattSl. DNA sequence 1 94 1 100 1 882 1 1 3 182 gbU311 IStreptococcus pneumoniae orfL, gene, partial cds, competence stimulating 99 756 7561 1 gbj1J33l~j peptide precursor (coed), histidine protein kinase (comD) and responseI I I I 1 I1 1 regulator (comEt genes, complete cds, tRNA-Arg and tRNA-Gln geneIII I 20 I2 2271 1931 gbU31 Streptococcus pneumoniae oriL. gene. partial cds. competence stimulating 1 98 14 ii 1 gbjU333i Pe pLde precursor (comd(, hlstldine protein kinase (comDi and responseI I I I I I regulator (comE) genes, complete cds, tRUA-Arg and tRNA-Gln genes III 4- *0 too 90. 9 0 a 09 TAB LE I S. pneumonise Coding regions containing known sequence&.
IContig IORF j Start Stop match wtch gone name pret S t jOrn 10 lID InL "aacsio identl S l nt I ieFnt I I~b76281 (cos) epnergltr42foolog ComE (comE) genes, complete cds petd1rcro oC j 9 9 4 31 57 IbA-068 treptococcus pneumoniae R801 tRNA-Arg gene, partial sequence, and Putative gg 1206 f 12061 j 332 1 bA'ta subunit of DNA polymerase III lspdnan) genes, complete cds 1 4' pneumoniae R801 tP.NA-Arg gene, partial sequence, and putative J 9 7 771 isrno protease Csphtra), SFSPOJ Iasp3oa), initiator protein Ispdnea) and 771--- I I I j jbbta subunit of DNA polymerase III (updnan) genes, complete cdsI 6 5532 16917 j"lA0068 eSretCCU pneumonlse R801 tRNA-Arg gene, partial jeune an uaivI9 18 138 Sf I I 1 bA~o065~ rie pteaseo Isphtrai, SPSpoJ Ispspoji, initiator protein Ispdneai and I~ I ta subunit of DNA polymorase III Ispdnan) genes, complete cds I 7 6995 8212 1gblAFOOo6s8j 1St reptococcus pneumoniae 11801 tRNA-Arg gene, partial sequence, and putative 99 1218 11 I I I I i rine proteas o sphtral, SPSp J Ispspo J. i t a or p te n s d aa a d SI I I I l eta subunit of DNA polymerase III Ispdnan) genes, complete cdsI
I
1471------05 Sretococcus pneumoniae Real tRNA-Arg gene, partiai sequence. and putative I 20 8 8334 871 jgbAF0O068~ jserine protease Isphtra), SPSpoJ Ispspoal, Initiator protein Ispdna and 28 28 I I lb eta subunit of DNA poiymerase III Ispdnanl genes, complete cdsI I 8534 9670b 1 gb1Asub0ni8 of DNA polymerase III Ispdnsn) ens complete proei I 9 j 14 17 I 2 14 111887 112267 1embIz7772615rl5 lS.pneumoniee DNA for insertion sequence ISi 1131(32 bp) 1 99 226 381 22 115 112708 112256 1embIz777271S815 IS.pneumoniae DNA for insertion sequence 1S1318 (82) bp) 1 97 I 353 I 453-4 22 116 113165 112662 10mb1Z7772615P15 IS.pneumoniae DNA for insertion sequence isi318 (1372 bp) 98 504 504 i 22 j23 11835-8 1189i0 1emb1Z8611215PZ8 IS.pneumonlae genes encoding galacturonosyl trensferase and transposase end 95 1 63 5131 I I I I I~i nsertLon sequence 1S1515II
I
4- genes encoding galacturonosyl transferase end transposase and 994 47 24f9j199 jebz6115z jinsertion sequence is1ss I15 41 4 23 1 5 I 5624 14203 1emb1X5247415PrL IS.pneumoniae ply gene for powumolysin 9 42 12 23 I6063 5629 IgbIHl77j IS.pneumonise pneumolysln gene, complete cds I 98 197 435 I I I5500 I 2 IembIX94909ISPlc IS.pneumoniae Igo gene 1 8 47 1 59 4 1 6 5823 15584 cbU781 Itetccu nuoieimngoui Alpoes ia gee copee9 1520 -63 1 1 7 .158 gbIU476871 Ite cds cu pneumoniae immunoglobulin Al protease ((gal gene, complete87 37 I I I cds 1muoloui Al0 prtes Ii'lencmpeel o "a 0,0 a -0 0 e I TA BLE I S. pneumoniae -Coding regions containing known sequences Contig orF I Start j stop match Imatch gene name Ipercentl HSP nt ORV nt I ID 10 inl I nt) acesslon jdI n length-- l-ength 26 1I 114498 1184 '~bZ8 '51PBS'numna deco, caplIA.B.C.D. ,o,G1,l1,JK genes. dO-hmoe9 3 I .isnte1 genes and &1iA geneIII 26 9 1173:194 :ebz::zz;1P:; ISy2n 5!saed:::e c~lAnd sit; ee G1,1JK genes, dbnP-thamniose 100 9'16 1 1 1 -8I IS, 9leumothei genes and aliA geneJ42 I------Streptococcus------ 26 1 0 1 9 2 1 5 7 9 l O 0 7 r p o o c s pneumoniae S Sn dextran gl c s d s e e a d i s r n1 97 11121 I I I i sequence IS1202 transposase gene, complete cds
II
9 28 1 0 523 lebIO3SGI Z Strnetooccusda3 pnumnaemaltosDe/maltodectri naes dTUaIX an s tw 99 2 1317 ma11 iotesin pernessei and maID) genes copetId 34 1 2 147 50 1 26 gbILO86il( (Streptococcus pneumoniae mS eit/ altdrn uptoi a e geneai~nd ton 96 J 450 89 0 I I I I j j m1 eqtoexncI10 pere aseaCan mD genes, complete cd i 28 34 3 29378 13298 IgbIL0l8S6( (Streptococcus pneumonlae ma)A genea gcoet d mait gene complnert e cd16 446 5182 I04 j 7 124 (gb(L0iB6(1 IStreptococcus pneuieonlae maltogene mletei; tk mal gene, c ot cd 99 1317 I 1447 I4 I 348 446 1bl 85( tetocu peumonae m.ICa ndl a genesplt cs tI ee omplete cd i 96 I 99 9 1 34 1 2 1 7764 1 7507 1 gb(U48135 IStreptococcus pneumonlae peptide/malet in suptoxie reua se a rA ando 93 9 201 158 I II I n e ln ase omog anthro genes, complete cd I
I
L
34 (16- 562 (134025 gebL162so (Strp ocspneumonle m a-oo maAg ncmlt I; aRgn.cmlt 96 238 82 306 I I 34 1 4 1 1760 1249 1ebi283515 Istro c cuneumonle dco c a lA gC~ien, opltecdx gene T cohmplese 98 248 14J 1 3 I 1 5 I48 1 4 1 Ig I 2 8 6 I bs the sico cs g e n e a l t\ g ne co pl t Ids IaR g n c m l t 1 9 9 9 -4 1 9 174 i96O I(gbjU09239( IStreptococcus pneumoniae tpetd ISP io n capulr olyachde biosynhe sis an8d6450 I I I I ho oern iseoCD ooaaz,,saBo genes, complete cd, an all gee (0 I I 1 35 176 11062 11057 1emb1X85787(SPCP IS.pneumoniee deOs idAx cpIB cps2c cplo cps4E cpiP cps06 1 7 I 9 1 35 (3 j 1 17 6 1709 jebZ331PB S.pneumoniae deco, capll;.BC.D.E GiiIJXl genes dTI7P-rixamnose 86 72 26 I I( j I I biosynthesis genes and &11; gene
II
(515 91 IbU931 Streptococcus pneumoniae type 19F capsular poiysaccaride biosynthesis 83 75010 19 17620 l6871 ob~u0939( (operon, Icpsi9ACDEFssrJKLMaOa genes, complete cdi, and all; gene,II I I I j partial6 cds
I
r a aa a. a. a a..
TABLE I S. pneumoniae Coding regions containing known sequences -a-t I CotigOAF Strt ISto wath Imatch gene name I ident lengt length 2 J 1 9061 1164 lmI877SC ~nuoi e d S. cpsl4A. cpsl4e, cpaI4C, cpsi4o, cpsI4E*,cS1F cpsl 120~C514.1 Cp1141. lbxaSBf crI Pf on ed cpaldJ, cpsI4K. Cpsl4L, tasA genes 94 145 1458 36 19 1890 185 IbU4761 Sreptococcus pneumoniae surface antigen A variant precur sor IpseAl and 18 j 99 60 I Ijfk~a protein genes, complete cds, and 0Kv! gene, partial cda I I 120 j- 1 3 9 118966 jIgb-u-53509I---- JS--:ep-t-occus pneumoniae surface adhasin A precursor (psaA) gene, complete 99969 J 969;- 1 4 4 4d 3'7 1 2743 179 1emb1Z677391sePA IS.pneumoniae parC, part and transposaa genes and unknown or!( ~99 2565 2565 37 12 12985 1 2824 IembIZ67739IsppA Is.pneumoniaa parC, pact and tranaposase genea and unkno~wn or! 1 100 1 162 1 162 37 1 3 1 5034 j 3070 IembIJ67739ISrrA Sp umna aC aEadtasoaegns nd nk wn rf1 9 1 195 96 1 3 I 4 1-'5134 1 5790 IembIZ67739ISPrA IS.pneumoniae parC, part and transposase genes and unknown or! 9 657 j 6571 37 1 5 I 6171 15833 IambiZ67739IsprPA Is.pneumoniae parC, part and transposase genes and unknown or! 96 1 39 I 339 38 119 112969 113268 1gb1428679I IS.pneumoniae promoter region DNA 106 0 j2137 19bIU41735I Istreptococcus pneumoniae peptide methionine sulfoxide reductase imsrAi and j 99 8 882 II I homoserine kinase homolog ithrgl genes, Complete cds
LAI
3- -1 3- 2 ins homolog (thril genes, complete cds 96960 J 9 j 5253 1 7208 lgbIN296861 Is.pneumoniae mismatch repair Ihexa) gene, complete cds I 99 I 1956 I 1956 41 j 1 3 I 1037 IembIti73ol1SRaE IS.pneumoniae recA gene encoding AecA -02 -3 41 1 1 2713 fembjZ34 303 ISPcI jStt occu donsemna i necdn the ciMA reA IF yA9 1386 1356 re to c uspneumoniae cm peron ge e co dingt jd 99 966 4 i 9- 41 14 1 32-2 13096 10b1N138121 IS.pneumonlae autolysin (lythl gene, complete cds j 100 j 177 j 177 41 15 136(13 1 3860 IgbIH138l21 IS.pneumoniae autolysin (lytA) gene, complete cds 1 100 258 1 2581 41 16 14755 j 5162 jgbIl, 666oI IStreptococcus pneumoniae OAF, complete cds 1 98 j 408 1 408 41 j 7 j 5270 1 5716 IgbIL366601 IStreptococcus pneumoniae OAF, complete cds 1 98 1 47 1 447I 41 1 8 16112 1 6918 IgbIt.366601 IStreptococcus pneumonlae OAF, complete cds I 98 1 431 j 8071 41 1 9. 1 6916 1 119 IgbIL36660I IStreptococcus pneumoniae ORF, complete cds I 100 204 2041 41 110 1 1082 1 '7660 Igb1L36660I IStreptococcus pneumoniae OAF, complete cds 1 97 552 579 41 111 j 7680 j 7979 IgbjL366601 IStreptococcus pneumonlae OAF, complete cds 1 98 1 i jl 300 41 112 I 9169 j 8737 IembIZ7772l1sPxS IS.pneumoniae DNA for insertion sequence 1S1318 (823 bp) 1 97 I 353 4 53 *a a a aa a a. aa TABLE I S. pneumoniae -Coding regions containing known sequencesI I CotgjR tr tp wtch watch gene name Ipercent LISPgnt I? Fnt 1 10 JID I (nt) I (nt I acession (de nt j ength Iang 'h 41 113 9533 9132 lemb1277725l5P15 jS.pneumoniae DNA for insertion aequenca 1S1381 (966 bp) 95 160 402
I
1 41 j14 1 9669 19475 IembIZ8200IISPz8 IS.pnaumoniae pcph gene and open reading frames I 100 189 195 4 1 44 1 j 7190 1 755S IembIZ8Z0O1iSPZa jS.pneumoniae pcpA gene and open reading frame.s 99 36366 1 44 16 18059 1760) lembl277726l5P15 IS.pneumonise DNA for insertion aequence 151318 (1372 bpi 97 453 I 453I I I8423 8022 1emb117772515P15 IS.pneumoniae DNA for insertion sequence 151381 1966 bp) 95 160 402 f- f- I- 44 1 8 18559 18365 1eamb1Z8200115PZ8 IS.p naumoniae pcpA gene and open reading frames I 100 189 195 f- 4687-- 1gb1L390741 -Straptocccus pnaumoniee pyruvate oxidae (spxB) gene, complete cda 99 1794 1794 1 49 1 2 1 3) 2603 igbIL2056l1 IStroptococcus pneumoniae Exp 7 gene, partial cds j 100 216 j 2373 4- 4 53 1 6 12407 1 2156 IobIU04047I Istreptococcus pneumonia. SSZ dextren glucosidase gene end insertion 9 4 1 9j 24 I 53 I7 1 2566 2405 1emblz8333515P28 s.pneumoniae desa, csplIA.ac..Go)n1.aJet genes, dTDP-rhsmnose I I I I biosynthesis gene. and &Ilk gene 100 162 53 a 2831 12475 'embjZ.8333SjSPZ8 S.pneumoniae dexa 4 cepllk.8.C.D.E.F.Gii.1.J.Ki gene.. dTDP-rhamnose 99 13 '1 I I biosynthesis genes end aliA geneI 54 113 112409 111105 IembIZ8333sISPZ8 S.pneumoniae dexa,. cspliA.B.C.DEFOi* lJf~ genea, dTl)P-rhamnoaa I biosynthesis genes and eliA gene 67" 59) 1 55 122 120488 11994.9 1amb128437911i528 IS.vneumoniae dir gene jisolsta 921 1 99 1 540 1 5401 1 61 111 111864 1 990P IembIZ16082IPNAL IStreptococcus pneumonia. subl acne1 98 1 1965 1965 f 1 63 1 1 3 1 239. IgbIftil729I IS.pnaumonise misatch repair protein (hexAl gene, Compiate cds 10 to j 217 j 37 63 2 233 2611 IgbjIl8729I IS.pneumoniaa mismatch repair protein ihexA) gene, complete cds 1 99 2330 2379 f 1 63 1 3 1 25517 2823 1gb1H187291 IS.pneumonise mismatch repair protein ihexA) gene, complete cds 1 99 1 266 267 1 63 2958 4664 IgbIIMl87291 IS.pneumoniae mismatch repair protein ihexAl gene, complete cds 1 95 1 69 1707) 339 -ft- teptoo -cu- s gen. 
c-pleft- 1f96 1- -ft7 676 7 3--70-6--339T 2 I-tre tococcus pneumonia.----hy---uronid--s---gene,--comp---ete---------96-3 2 372 -ft- ft- ft- occs- -ft- -ft cd 1 9 293 299 -67 7 7161 4171- Stetccu pneumonia hy--urnid-s geecmpet-d----- 93 29 ft- 1 70 1 1 1 I 702 jgb~fIi43401 jS.pneumoniae Dpnl gene region encoding dpnC and dpnl. complete cda I 100 693 1 702 70 2 1678 11160 IgbIftl434Oj js.pneumoniae DpnI gene region encoding dpnc and dpno. complete cds j 100 483 j 483 ft- -ft0-2-0--ft- -ft- -ft- -ft- -ftcomletecds 8 46 128 70 24911-12-0- gb-n4339i peumon--e-D-n---gene-regionencoding---nH--d-nA--d-s----complete-cds-- 98-462-128 70-74 -ft- plee cd 1 9 1 14 19 70 7 423 jT 4424 J-2-4 ;Spemns e doyibnces l- gee coplt cd 47-- 4 ft- 1 70 1 8 1519-11 4316 1gb1J042341 IS.pneumonise exodeoxyr ibonuc Iesse lexoAl gene. complete cdi 1 99 881o 882 a* a *@a TABLE 1 S. pneumoniae -Coding regions containing known sequences Contig 10KV Start Stop I match Imatch gene name Ipercentj NSF nt 0KV nt I D1 IlD Int) Intl I acessLon I i dent length ln 70 113 1 8103 19874 IgbIL2OS62I IStreptp occua pneumoniae Exp8 gene, partial cdi 93 234 1767 -I 4 71 j22 1279641 128341 lambIXG3EO2ISPeo IS.pneumoniae mmsA-Box I 93 1 233 1 3781 72 1 5 1 4607? 3552 lembIZ268S0ISPAT Is.pneumoniae 111222) genes for AT~asa a subunit, AT~ase b subunit and AT~ase j 97 102 11 I~ 1 c subunit
III
71 1 1 471 1133 temblX636O2ISPfo IS.pneumonlaa mmsA-Box 1 91 1 193 f 339 71 I 3 I365111 977 Igb[JO4479j IS.pnmumonlae DNA polymerase I IpolAl gene, complete cdi 1 99 2682 1 26821 I3 1 8 1"8 157 IebIN]61801 Streptococcus pneumoniae transposase, (comA and comB) asd SAICAR aytheta I I I I (purCi genes, complete cdis 1naa 98 j 18 5161 4 4 77 1' 3 622 11999 jemb1Z8333515rZ8 1 S.pneurnoniae dexe. CapIIA.B.C.D.E.FGII.1,J.K) genes, dTDP-rhamnose 1 95 j 624 2 4 I I I I IIbiosynthesis genes and aliA gene 11 8,8191 7 8 1 1 3 41i 3 1emb1X7724915PK6 Is.pnmumoniae (86) ciaRk/ciall genes 1 99 I 339 1 339 1 78 1 2 J 1095 325 jembIX7724915p86 IS.pneumonia. (86) ciaRk/clall genes 1 99 1 771 771
ON
82 110 111436 110816 jgbIu9072l1 Istraptococcus pneumoniae signal peptidase I Isp1) gene, complete cdi I 97 1 621 j 621 C 8 112403 111434 IgbIU93S76j Streptococcus pneumoniae ribonuclease HII IrnhB) gene, complete cdi 1 98 1 953 I 969 6 2 112 j12381 112704 jgbIU935761 IStreptoc6ccus pneumoniae ribonuclease kill Irnhai gene, complete cds 1 100 1 51 324 I 3 31: 350 1eb177275ps S.pneumoniae DNA for insertion sequence 1S1318I 1823 bpl 1 97 j 290 339 1 83 110 14663 6851 1gb111361801 Streptococcus pneumoniae transposase. (comA and comB) and SAICAR syntlietase 991 2190 1901 I I I I I IpurCI genes, complete cdi 1 99 5 136 Ill-I-684--8213--b-------0--Strepococcus-pn-umo--e-transposase,------and-c-mB--an--A-CAR-synthetae-99-I I I I I I purCI genes, complete cdsII I 1 83 j12 1 82361 9090 ~gbIH36i80j IStreptococcus pneumoniae transposase. (comA and comB) and SAICAR synthetase 1 99 j 855 1 1 I 1 I purCl genes, complete cdi 4 1 83 113 j 9283 113017 IgbjLl51901 Istreptococcus pneumoniae SAICAR synthetase IpurC) gene, complete cdi I 100 1 107 3735 3 13 217 123313 1 gbIL369231 Stepoocu pneumoniae beta-N-acetylhexosaminidase (str)) gene, complete 1 98 218 16 1 1 I 11671 1 83 124 1226 j23450 IgbIL369231 Streptococcus pneumoniae beta-tl-acetyihesosamiviidase (sail) gene, complete I 98 172 1833 1 1 I 1 cds a] 8 125 12752) j23505 IgbIL369231 Streptococcus pneumoniae beta-N-acatylltexosaminidase Istrlll gene, Complete 99 38261 02 II I I I I cds IiII 06e as 0. *0 of: eee. 0* 9 t* C C C TABLE I S. pneuwoniae -Coding regions containing known sequences -St- ntch -c 1I) 1I Intl IIntl I acession idant length length 1--L369231----St-e-tococcu---n--m-----e-- 1170 I 1 I cdsI 4 4 64 4 i455-11 6173 1ewbjt8333515Pz8 1S.pneumoniae dexi, cspliAsc,D.e.F.o.HIt.J.KI genes. 
dTOP-rhamnose 1 67 12 I I I I I j biosynthesis genes end ellA gene jI 8 j 67 j 12 I 7 I6 5951 5316 lembIt7772515rrs js.pneuwoniae DNA..for Insertion sequence 151381 1966 bpl 1 96 1 439 j 6361 8- 88 so1 2957 3 511 igbIH36l8Oj Streptococcus pneumoniee transposase, IcomA end comB) and SAICAR ase 941.55 5 1 1 i purC) genes, complete cds I I a 1 88 j 6 3 461 4269 IgbIi36l8Oj Streptococcus pneuwoniae transposase, comA and coma) aqd SAICAR synthetase 94 804 804 I I purC) genes, complete cds 3 89113 987b1 1 10091 gobji36180 1 IStreptococcus pneumoniae transposase, ico.A and comi and SAnAR synthetase 9 1 1 89 j IpurC3 genes, complete cdi s1 I I~ biosynthesis genes end allA geneI I 4----006-2 102 9- 1- 0 I io--5302 14941 -IewbIX636O2ISPSo IS.pneumoniae ewSA-Box 1 6 1 6 97 4 I 17011 I 1520 u- 1 71 50IbU131 Streptococcus pneumoniae peptide methionine sulfoxlde redtictase lmsrAI and 10j 19 I I I I I homoserine kinase homolog ithrill genes, complete cdis 1 1 110 I 199 1 1 89 j 00 IsbIZ8333SjSPz8 S.pneuwonlae dexa, capllA..C.D.E..cnzIJ.Kl genes, dTOP-rhamnosej 93I 52 62- I I a biosynthesis genes and allA gene 9I I 9 I 1 1 99 1 2 1 1771 775 lewbIXl73371SPAji IStreptococcus pneumoniae awl locus conferring aminopterin resistance 1 99 I 998 999 199 1 3 1 2794 11712 1embI173371sPAn IStreptococcue pneumoniae awl locus conferring aminopterin resistance 1 99 16 1083 4 I 99 1 4 1 313;- 1 2788 IembIXl7337ISPAn jStreptococcus pneuwoniae awl locus conferring aminopterin resistance I100 945 945I 199 1 5 5249 37114 1ewblxl713371SPn4 IStreptococcus pneumoniae .ini locus conferring aminopterin resistance 1 00 1516 1536 a F 101 1 -2653 IS22I E ISpneuw-moni&-ae epuA and endAgenes ftor -7 koa protei-n a&ndimembrane 99 j 1323 1 1 1 I endonuclease 1 1 1I 4 1101 12 11492 1719 1 embIXs422SISPENSu pemna epuA a:3 endA genes for 7 kDa protein and membrane j 9 228 228 1I 1 5.emonilaa 9 e I 101 13 11694 1855 IewbIXS422sISPEai IS.pneumoniae epuA and endA genes for 7 kDa protein and membrane j 100 I 162 162 1 
1 1I 1 endonucleaseIII 101 14 I1701 j 2582 IewbIXS422SsPEti Is.pneumonlae epuA and endA genes for I Da protein and membrane I 10 882 882 to I 555 50I mI Z994 I ed o cu s noae sd IgInI -4 5 6 j 5 4 em b 1 2 l 15 8 z IS t e oc u p n eu m o n la e f r i s tio n g qen e jS3 1 (8 23 39 2 6 1 5 16 104 j 2 1341 1556 1ewb1Z777271585 IS----------pneuwoniae DNA for insertion sequence IS13i8 1823-- bpi j 81 206 j ~0 at* C* p p s S. pneumoniae -Coding regions containing known sequences Contig 1097 Start IStop mtch mtch gene name jpercentl HSP nt OR? nt 105 5 5381 j5028 IembIZ67739ISPPA jS.pneurpioniae parC. parE anid transposase genes and unknown ort 98 351 354I 105 6 j 6089 15379 lemb156773915PPA IS.pneumonlae parC. parE and transposase genes and unknown orE 1 98 84 1 711 107 4 2785 1880 1emb1X1602215PPE IS.pneumoniae penA gene 1 98 72 906 -4 1 107 15 12913 14988 IembIXl6022jSPPE IS.pneumoniae penA gene I 99 1 1692 2076 91159 l~iI16SP Streptococcue pneumonlae penA gene for penicillin binding protein 28 11 107 615! (07 J 4981~ Jembx13l3ISPPEI lacking H-term. (penicillin resistant atrain( i I 18 j9 j9068 8718 IembIZ67739ISPPA fS.pneumoniae parC. parE and tranaposase genes and unknoon orf 95 342 351 9- I 0 12 11!1308 110922 lembIZ67739ISPPA IS.pneumoniae parC, parE and transposase genes and unknown orf 1 99 1 199 387 908-- -i I 109 13 12768 12241 1emb1Z7772515P!S IS.pneuinoniae DNA for insertion sequence 1S1381 (966 bp( 96 1 61 528 I 109 14 12688 1 2855 IembIZ77726ISPlS IS.pneumoniae DNA for insertion sequence 1S1318 (1372 bpi 96 1 148 1 168 1 109 15 1 2862 13269 lem()lz771727ISP1S IS.pneumoniae DNA for Insertion sequence 151318 (823 bpi 97' 353 408 9 9 1 09 1 j 5320 1 3584 IgbIHl87291 IS.pneumoniae mismatch repair protein (hexAl gene, complete cds j t00 371 1737 F 113 ;1 1 431 1 1 gb1N361801 Streptococcue pneumonia. transposae. 
Icm an IoB n ACRsnteae 9 1 2 1 1I IpurC) genes complete cdi cm n oa n ACA lnhta 5 490 1 13 110 97E 1 8532 IembIX99400ISPDA IS.pneumoniae dacA gene and OR? 99 1257 1257 9 I 113 Il 9870 110985 lembIX99400ISPDA IS.pneumoniae dacA gene and OR? 99 1116 1116 114 3 125301' 203Q IgbIH36l8Oj IStreptococcus pneumonlae transposase, (comA and comill and SAICAR synthetase I 9 8 I I I I I (purCi genes. complete cds 1 1 "1 I 0 4 1 115 )I 111)0:1 110932 IgbIti040471 j streptococcuu pneumonia. SSZ dextran glucosidase gene and Insertion 97 372 172 I I I I I I sequence 1S1202 transposase gene, complete cds 1(17 1 897 j 3302 1emb1X7296715PNA IS.pneumoniae nanA gene 1 9 2402 2406 1 (17 13 143231 3899 1gb1H361801 Jstreptococcus pneumonia. transposae, (comA and comb) and SAICAR synthetase 98 j 2 429 I II I (purC) genes, complete cds II 9- 121 2 11369 11941 IgbIU72720j Streptococcus pneumonia. heat shock protein 70 idnaK) gene, complete cds 99 202 573 1 1 and DnaJ idnaJI gene, partial cds 1 121 3 2412 14253 gbU2 01 Sreptococcus apneumonia, heat shock protein 70 IdnaKI gene, complete cds j 99 1842 1842 I I I I I land DnaJ (dnaJ) gene, partial cdsIIII 122 1 8 150651 5587 IgbIU0404 'I Istreptococcus pneumonia. SSZ dextran glucosidase gene and insertion 64 j 51 5221 I I1 sequence 1S1202 transposae gene, complete cds 4 4anuo a *oin *ein ca inn knw seq*u*n.ea ato mac *ac gen nam I ae t a1S at as 125 1 1 1 1811 S. 
pnemoia -gI310 Coding reo s cnuonann krnson seqen ces n oB n ACR yte e 9 111 I 2 115 ~12D 9 1120 Intl8331SZ jSpemna (ntl, acessionEF,.HI,,K gees idenrs legh legh I I II I biosynthesis genes and allA% gene 1 1 134 .1 1 1 492 IembIYlO8I8ISPYl IS.pneumoniae aipsA gene 99 203 4921 I 134 12 j 556 j 2652 IgbIAF0199041 IStreptococcus pnaumoniae choline binding protein A IcbpA) gene, partial cds j 86 1 685 1 2097 4- 134 13 11160 1837 IembIYIOB1BISPYi IS.pneumonlee apsA gene 1 86 j 324 j 324 134 14 13952 2882 1gbiAF0l99O41 IStraptococcus pneumonlae choline binding protein A (cbpA) gene, partial cds 1 98 215 1071 134 8 992 9848 IgbIUl2S67j IStreptococcus pneumoniae P13 glycerol-3-phosphate dehydroganase jIpal 99 285 1857 I i I I gene, partial cds, and glycerol uptake facilitator Igipri and OR F3 genes.I I I I I I Icomplete cds 1I 86 102 gI151 Streptococcus pneumoniae P13 glcrl3popaedehydrogenase 'QIlpoi 51017 134 1022 1 g~u257~ gene,. partia cds. and glycerol uptake facilitator (gIpri and ORF] genes., I I I I I Icomplete cds -;gi~jul256a gnprtil cds, and glycerol uptake facilitator igipri and ORF3 genes, 1 734 11 7l 0 844 IgI029 com~pltecu pneumonia. yp P 1 9F gl cr sularposa a idedr oge nase s 9g0D 420 1 4- I I I 1 gbu~229 operon, Icps19LABCDEC111JKL0NO1 genes, compQlt cds, and aliA geneII I I I Ipartial cds 137 114 1 8590 18775 1emblZ83325l5Pz8 IS.pneumoniae dexa, capl(A.B.C.DE.F.G.11I.J,KIc genes, dTOP-rhamirose bisnhssgne n4lAgn 174 16 4 F 137 115 I8773 8967 1emb128333515Pz8 S.pneumoniae dexB, capl(A.8.C.D.E.F,CII.1.J.Kl genes. 
dTOP-rhamnose 19 1 1 I biosynthesis genes and aliA gene I 981 15 j 5 i- 4- I 137 j16 I9222 1968'7 1emb127772615P16 jS.pneumonlae DNA for insertion sequence 1S1318 (1372 (apI 96 446 465 00514 S- A- nsequnceIS1184 pi- 41 137 j17-----9641-1-0051- le-----------------IS--neu--on-ae-DNA--for-insertion--sequence--1-1318--(823----p-----6--j-293-4-- 4 1319 110 11299E 112702 1emblX63802l5P80 1S.pneumoniae mmsA-Box j 90 234 j 297 141 8 17805 18938 lemb124998815P* I Streptococcus pneusioniae MSA gene 1 99 338 1134 I 141 9 8936 110972 IamblZ49988ISPMh4 jStreptococcus pneumoniae mmsA gene 1 99 2037 2037 4---4 14 10 1172 1146 emblZ49988ISPHH jStreptococcus pneumoniae mmsA gene 100 76 996 4 142 2 57 81 jbjlB~lj Streptococcus pnaueoniae uvs4O2 protein gene, complete cds 98 J 174 558 4 14 87 957. 1eb1N802151 jStreptococcus pneueonlae uvs4O2 protein gene, complete cds 100 142 171 14) 8 022 jgbjN802l51 IStreptococcus pneumoniae uvs4Ol protein gene, Complete cds 1 95 1997 2043 F4-- a a a a a. 9%e S. Cnuon Cod n re in ontann know sequences *D aI an at ac o ae I ae a I o alanat* 142 5 3020~ pne5oia -giB25 Coding reos cnuonann kn40own i seqene opeec 0 5 7 l1a Str 1 Stop lmatch15 A match geone namie fo percentk HSPe nt O9? 
nt 181 I 147j 1 302 3594 gbLN8056 IStreptococcus pneumonie uvsOproi gene, completed 9ds j1001 56 14 1 3 1 228 159 ImbIZ3Sl3SISPAL lS.pneumoniaa l ge ne for3A amiBAlike gene s n Irs 99 15 5313 4- 4 I145 1 2 1 9971 196 94 Ibl05561 IStreptococcus pneumoniae pip enellpartnial cdsei 99A gee 1811p1824 1 45 I 5 22087 799 1eb t4 2 10527 IStpe o unumoni n c p e icap il lia nd in protge e n an d ene co pl t 9951 513 9- 145 J J15 1 776 jgbIZ8902S ISzpteoo ccus pndona p enclinbndn prti1pn)gncmlt 99 2156 j 156 9 9 11456 1 2 4 1 8 3J499220 1 gbj319027jsn IS~pteoo ccu nda peiclli-id prtig~tlgene, coplt 99 251 257 1346 1 175 1174 Iemb118200215nZ8 IS-pnsumoniam pcpB and pcpC genes j 5 98716 102 4 7 2 1 1 0 6 1 91 0 0 3 I e m h I Z 2 I 2 S r z e I S p n a u m o n i a e p c p f l e a n d c p c g e n e s jno i g u a i N g y o y a e a d 8 9 8 2 5 5 2 5 5 I 147 J1 110181 110202 1emb1 Z217021I5PUN IS.pneumoniae ung gene and mutX genes encoding uracil-DNA glycosylase and 8- 99 11 66 I 431 I I I I IoxodGTP nucleoside triphosphatase I III 147 132 11118 1 10676 IebIZ2130Ii Stetcu pnaumoniseug ee andeptidges encoing sutacli-ON glucosase and 8 6 61o I i I I l o oaT? n nuc e se t ho spog (htas ee s co pl t Id 1 01 0 9 a 3485 1 1 15 103 140 8835 IgbjU 02l75 iStrpocspneumoniae peptidex mehonn su9oid reuts lmsa-A3and 6- a a 359 131 9048 18521 1gb131361801 IStreptococcus pneaimoniae trauisposase, (comA and come) and SAICAR synthetase I 98 I 526 j 58 I I I I I lpurC3 genes, complete cdsII 1160 1 147 1 embIZ268slISPAT lS;pn:u7rile CR63 genes for ATPase a subunit, ATPase b subunit and ATPase c 100 142 j 147 1160 12 1,79 898 IembjZ268S11ISAT js. pneumoniae (86) genes for ATPase a subunit. ATPase b subunit and ATPase c j 99 I 72072 1 1 1 1 I ubunit II I 160 13 1 906 j 1406 IemblZ26BS0iSPAT jS.pneuasoniae (H222) genes for ATPase a subunit. 
ATPase b subunit and ATPasej 95f 01Si I c subunit 9j J0 0 360 -4I 1313 1942 IoembIz268so1 SPAT IS.pneumoniae M3222) genes for ATPase a subunit, ATPase b subunit and AlPase j 87 j 306 570I I~ I c subunitj 1161 111 1 984 1emb1X7724915PR6 jS.pneumoniae CR61 ciaR/ciati genes j 99 j 984 1 984 1161 17 1 690 17497 IembIX839l7ISPay jS.pneumonise orfigyrs and gyrO gene encoding DNA gyrese B subunit 99 1 437 1 588 1361 j 8 17443 1 9386 IembiX839l7ISPav IS.pneumoniae orfigyre and gyma gene encoding DNA gyrase B subunit 1 98 j1912 1944 a 1163 1 1 1 I 2155 IgbIL205s91 IStreptococcus pneumoniae Exp5 gene, partial cds1 98 327 2154 a *a O. a TABLE I S. pneumonlae Coding regions containing known sequences Contig JOR j Start j Stop match fmatch gene name jperetjlS t On I ID IID int) nt) aceasion I den t length length, 165 I1 j 32 j1618 1 gbIJ017961 IS.pneumoniae malx and maY44 genes encoding membrane protein and 99 1587 1587 amylmaltsecomplete cds, and mair gene encoding phosphorylase1 16 68 3902 IgbIJ017961 IS.pneueonlae maIX and mal genes encoding membrane protein and 1 o 100 I I amylomaltase, complete cds, and maiP gene encoding phosphorylase I1 29 1 166 1 1 1 3I 4 lmbIYXI463ISPDN IStreptococcus pneumoniae dnac, rpoD, cpoA genes and O11I3 and ORFI 1 c 100 375 375 f ft- I166 1 2 11507 1 320 IembIYI146IISPDNl IStreptococcus pneumoniae dnac, rpol, cpoA genes and OR?] 
and OaFS 99 1168e 1188 66 3 32410 I1432 IembIYI463ISPDN Streptococcus pneumoniae dnac, tpoO, cpoA genes and 09173 and OrS 99 1 563 1809 I167 111 10-17 1 328 IembIt7lSS2ISPAo IStreptococcus pneumoniae adcCBA 1 167 1 2 1 1844 1 999 .ebzlS2SA [Streptococcus pneum4;niae adcCBA Ip o 98 155 740 1 167 13 12714 11842 leIZ151 PDI t etccusp emna dCAoeo 98 4 841) I6 39 6 1 embIz7i552lSrPAo Streptococcus pneumoniae adcCBA oparon 1 99 04 759 4, 4 16 4 399 j261 Ie bl7sSIPA Streptococcus pneuoniae adcC8 onpr l d 99 703 259 ft- 1 17 110 1 -13A8 j 7685 1emb1Z7772615P15 IS.pneumoniae DNA for insertion sequence IS13l8 (1172 bp) 95 I 315 348 12 16 12462 1 91 IgbIIJ476251 Streptococcus pneumonia. formate acetyltransferasa CexplZ) gene, partial 1 97 365 2520 1 37:1i 20 1 gbIa4361 801 Streptococcus pneussoniae transposase, (comA and come) and SAICAR synthetase 89 3534I I I IIpu rC) genes complete cdsII 1 175 14 j 1843 1 3621 jembIZ472101SPDs IS.pneumoniae dexB, cap3A, cap3e and cap3c genes and orfa 95 I 89 1779 I 1 176 15 1 39094 12980 IembIZ67Yi9ISPrA IS.pneumoniae parC, parE and transposase genes and unknown onf 1 100 573 J 1005 f 1 178 111 3 1 42 1emb1Z6773915PrA IS.rmneumoniae parC, parE and transposase genes and unknown on I 95 I 423 j 423 I 179 1 1 426 1 70 I embIz83335Isr28 IS.pnaumonise dexa, csplIA.B.C.O.EF.v J.u. KIg genes, dTDP-rhamnose 9 1 I I I I 1 biosynthesis genes and aliA gene 9IJ 3 1 180 1 1 30114 11855 IembIX97l8ISrcY IS.pneumoniae gyrA gene 99 381 j 1230 186 111 714 1 I embIZ7S69l1sooa IS.pneumoniae yorliA.B.C.D.E), ftsL, pbpX and regft genes I 98 J 59 711 1 86 j2 1 2254 1 608 IembIZ79G9l1soaa IS.pneueoniae yorfIA.8.co.Ej. tsL. pbpX and rega genes 1 98 1 15 1647 186 3 1 707 1 880 jembIZ7969ljSGOa IS.pneumoniae yorIIA.8.c.D.EI. 
ftst, pbpX and regft genes I 98 1 174 174 189I 2 259 IgbIU72720I IStreptococcus pneumoniae heat shock protein 70 Idnaa) gene, complete cds 258 28 I I I I Ijand DnaJ Idnaa) gene, partial cds 99 Il S-t- I 189 12 600 385 igbIU727201 St reptococcus pneumoniae heat shock protein 70 (dnaK) gene, complete cds I 982416 I II I land Dna.] IdnaJ) gene, partial cds j I1 I0 I1 T A L I. p n u o a C o i n r e i n *ot i n n .o *eq u e n c e s a .11 .n 9 *t 1 ac 33 o t e .h 4 18 3 .1:018 851 10bIU7272O1 (Streptococcus pneumonia. heat shock protein 70 (dnaK) gene, complete cdis 99 168 16819- I Iand'PDnaJ Idnai)gene, partial cds
18 s9 14 1012 j 14 Ibj77oStreptococcus pneumonia. heat shock protein 70 IdnaK) gene, complete cds 99 1062143 I I I I I land DnaJ (dnaji gene, partial cdi s 4 4 194 1 I 729 lobIH36i8Oj (Streptococcus pneumoniae rransposase. IcorsA end comB) and SAICAR synthetase I 91 72812 I I I I IlpurC) genes, complete cdi I 79 41- 99 j2 1117 1 881 IembIZB333sjspze IS.pneumonise daeB capIIA.BcE.oG1p~awzaKI genes. U1'DP-rharsnose 96 21121 I I I b a bis nthesis genes and aliA geneII
4£99 1499 1 1762 1emnb11833351SPZ8 (S.pneumoniae dexB, capliA.B.C.D.E.r.G.HIx,JX genes, dTD)P-rhsmnose 8 1 6 I I i I I I ~biosynthesis genes and allA gene.89 j 28 J 64 199 15 1 61 I2284 Ieel.I Z83 335S p28 S. pneumon iaea dex B, cap I IA,.B.C. D J .K genes, dTDP- rhamnose I 910 5041 I I Ibiosynthesis genes and aliA geneI 203 1977 1337 igb(L205631 IStreptococcus pneumonia. Exp!) gene, partial cds 1 99 1 342 1 1641 -1L3-6 3 4- I 04 145 3 gbL3l~( Streptococcus pneumoniae explO gene, complete cds, recA gene, 5' end 1 99 1143 1143( 4- 4 I 208 (1 59 (2296 (gblu89711j IStreptococcus pneumonia. pneumococcal surface protein A PspA (pspA) gene, 1 0 1 71 1238 I~ I complete cds 23 j3 (25 2123 IembIZ8333s1spza jS.pneumoniae dexe, capIlA.B.C.0,.tGH uJK! genes. dTI)P-rhamnose I 96 3321 3 I. I jI I biosynthesis genes and aliA geneI
216 1 348 12 (embIz83335(spza (S.pneumoniae demol, capiIjA, B.C,D, Ei'G,)Ii,.1.Kj genes, dTDP-rhamnose I 99 311l 31 -ibiosynthesis genes and aliA -gene 7 4 I 216 (3 1 21;50 1 2327 IgbII428678( IS.pneursoniae promoter sequence DNA 98 86 j 324 222 1 1 417 4 IembIZ8333s1SPZ8 Spneumoniae dae, cap)IlA, B.C,D,E,P.GI 1,JK! genes. dTDP-rhamnose I 94 141 4141 I I I bisnhssgenes and ailA gene ;-227 i 26 I 28 IebAOO36s Itepoocu numna ihgn I 391 804 jgbjM13296( 1Spneu,*moniae recP gene, complete cd 99 48 109 09 4 247 13 1 1625 1807 Igbjn36l8o1 Istreptococcus pneumonia. trasoae cm and c B and 4ACRsnhts 4 18 184 1 i purCi genes, complete cds SIA yteaeJ 9 7 249 (3 9;i 1364 1emhIz833351SPz8 (S.pneumonia. deme, capIAe.C,o.E,r,..I,£aKi genes, dTDP-rhamgaos3e 94 443 444j I I I Ibiosynthesis genes and aliA gene jI 253 (1 (362 1 3 IgbIn3618o1 (streptococcus pneumoniae transposase. IcomA and comB) and SAICAR synthetase j 99 360 360-- I I I I(IpurCi genes, complete cds 36 4 a. 99 TAB LE 1 S. pneumonia. Coding regions containing known sequences IContig IORF Start jStop I match match gene name I iIlI t) Intl (l j acsso jIPercentj lisp t I RFn ident, length length, 253 j6 2065 2512 lembizB33s51spz8 S-pneumoniae dexe. 
caplA.,C.EF'QjjlJKj genes, dl'DP-rhamnose 9 0 0 I I I I j Ibiosy'nchesjs genes and aliA gene 4 4 -504-- I 55 I I j 00 Ieb~S20Isra S~neroniae pcpB and pcpC genes 1 97 531 j 798 F 255 ;2--;798 I1841 IembIz82002IsPz8 IS.pneumoniae pcpfl and pcpC genes I 97 4 72 I 1044 I 25 3 I24913 1969 IembIZ67739IsppA IS.pneumoniae par6, parE and transposase genes and unknown on I 92 j 435 525 I 257 I2 985 770 lemblxl737ISPAn IStreptococcus pneumoniae assi locus conferring aminopterin resistance I 96 117 216 257 5 I7 *907M 36-gbIN36 *Sore Streptococcus pneumonlse transposase.o s (comA ando come) ad--o---)---Lnd-S-AZ-CAR a-nndhe SAXCAR---1-s--y3the---1as-1 1 I 1 1 1 (purCI genes, complete cds 26 ynthetase (sligunsn t rihoph cylhdrls IsI) 95 84s I I I Ipyrophosphokinase IsuIfli genes, coeplete cds 4 26 21 27 911651 Streptococcus pneumoniae dihydropteroste synthase (sulAl. dihydrotolatLe 97 755 987 267 121 227 JbI~h6i5~ Isynthecase IsuiBl). guanosine triphosphate cyclohydrolase (suiCl. aldolase-_ I I I I I pyrophoaphokinase CoulD) genes, complete cdsI
26 21 30 gI116 rpoocspneumoniae dihydropteroate synthase (aulAl, dihydrotolate13134 267 I 262 303 jb~u1lS~j Synh:ts (sule), guanosine triphosphate cyclohydrolase (suiC), aldolase- 98I0 I I I I iIpyrophosphokinase CoulD) genes, complete cds I I 2 I 5 51 16 IbIUl6I561 ISrpoccu fnuoiedhdgtr~eznh~ slJ lyrtl~ 99 f 57657 I I I I I pyrophosphokinase (subD) genes, complete cdsI
1267-- 6-4- I I 1 g~u616 ynthetase (uSqao nstripliosphate cyclohydrolase (suICI. :idoiaae- I I I I j pyrophosphokinaae 13um) genes. complete cds 99 j 78jI 6 267 7J5544 5140 'gb'Ul65 Streptococcus pneusotia dihdrteot sythas CsIA) dibydoolt 300 186 0 I I i synthetase CoulD), guenosine triphosphate cyclohydrolase IsuiC), sidolase-jI I I j jpyrophosphokinase (subD) genes, complete cds I 268 4 1193 11990o Immblx6I6O2ISPaO IS.pneumoniae mmA-Box I 89 1 94 1 198 -27-1 1- ;I--;562 i104 -igbjH29686j S.pneumoniae mismatch repair (hexe) gene, complete cds I- 93 I 160 j 459 1291 I 75452 Streptococcus pneumoniae SSZ dextran glucosidase gene and insertion j 96 j 450 4501 2 jgbiU4047~ Isequence 151202 transposase gene, complete cds 3 21 2J110 2 enI835SZ S.pneumoniae dexa, cap) lABCD .rg JjKj genes, dTDP-rhsmnose 87 205 7 biosynthesis genes and aliA gene 1291 3 807 1559 IembIzsmns51spza S.pneumoniae deaD, caplA8.CD.p.pcl~,lJK genes, dTDP-nIhanu~ose I biosynthesis genes and liA gene SO 170 2"1 291 4 1374 11099 gbM681 Streptococcus pneumoniae transposase. (comA and comB) and SAICAR synthetase 85 26427 IgiC3I0 j j JprC gees 7 cmleej TA L I S. a.uona Coin rein cotinn knwS s S 0 T a- 1I 11 D I Itl ntl acession percientl l nt lengFt In-leg-- 293 1 3 173---I--------S-nemol--gr--gee-nd unnon-rt--8---- 67 670SG Spemna ySgn n nnw 1 9 9 -5 1- -7 296---J-1434-----I--b------j------S-pne--oniae-da---cap3A--cap---and-c-p]C-gees-and 430-128 orf 317 519ebz73~pA I~nuoiepr.prtadtasaaegnsedukono 9 430 1254I 157--t -9 325 2 1237 485 P ISpneumoiiae desh, par c a ndiA.8.Cn.Ero3 rj gee nesnkow dTPrha s 89 299 1541 326 I 32 1 60 6 1 e~z33s~pz S. pneumoniae dash, caPllA.8.C.D.e.POGHIJKI genes. 
dTDP-rhamnose 94 899 751O I~ biosynthesis genes and aliA geneI o 3 6 -5 4I 5 g b I u 4 l 3 s j S-t r e p -t c o-c c u s p n e u e o n i a e p e p t i d e m e t h i o n i n e s u l i a x i d s r e d u c t a s e l e s r A l a n d 8 9 I I I I I IIhomoserine kinase homolog (thral genes, complete cds 7j 3 3 30 1 336 1 3OR 1 9 lm IZ26SOISPAT lS.pneumoniae 1142221 genes for ATrase a subunit. Taebsbntan Tae 1 0 1 c subunit A~s uui n Tae 9 0 V-360 ;1T- i519 1eeb1Z677391SPPAi IS.pneumoniae parC, part and transposase genes and unknown art 95 43 519 411598 j 1960 lebZ331P8Spneumonlae dexB, capIA.aC.EroGH 131(1 genes, droP-rhamnose 1 94 3 53 36 136 genes and aliA gene 36 1 63 embfza335ssPza IS.pneumoniae dexB, caplIA.B.CoEFrGo 14 1351 K genes, dTDP-rhamnase 95 63 6721 I I I I Ijbiosynthesis genes and aliA gene
362 12 118 1 28 gIU04471 Streptococcus Pneumonlae SSZ dextran glucosidase gene and insertion 9 1 1 1 1 sequence 151202 transpasase gene, complete cds 96 441 14, 1 384 j 1 1347? 1l 1 embjx857871srcP IS*pneumoniae deah cpsi4A, cpsi4h. cpsl4C. cpsl4l, cpsl4E, cpsi4F, 1 1 I cpl4I. cpsl4l, cpsl4J, cpsldK, cpsI4L, tasA genes I 9 -9 ti .R St.r Sto am ace *eee .ame aia egh ID aI (r *nt a~s o a a *aa .10 1 4 apr F 06 3 F 0 Itans a el a tio a o au a *te t c c u a l *0 0 8 260 1 2 11 S8 l p neumon6ia.0 -tsati eongareions f ove T -Spr o s la o known proteins1 228 1 2 1605 1002 IpirIF06O63l o traslaon lonation facr Tu Sotraset esstmr rococcus oa 109 10018 9- 9 320 -1 90 2 j 113 j---9rlF60466319F606--- ATra ensl tion e o ni o actor ubtretcocs r alics saia s 99 98 1137 9 9- -i i -18 25 I 86 1 9 i 1 74 9phy po et -a I a o iu sin e sE .5 -La t c c u -ac i -r r 98 96 14991 1 1 2 68 10 02 6 IgiJ310621urci phosphorioy tran sferas a sytm[~ Streptococcus 1aiais 78 8 9 951 II 1 1 274 IiI36159 IAP-dpeen n provte sase po otc sbunit sytmezm(Streptococcus (aOaNulI 989 8 I 29 1 1 0 119288lisina nos pht deydoens (Srpoccu pygees I'l 9880 1 17 8 1 1 5 14 168 gi1153755 lphosphtobtaufacto ias IC3.185 ILactococcus lactis ceors 97 I 9 119 9- 1 128 11 2 110448 36115 19113479987 luailpo hrbsytaslrs (Streptococcus thalivariusl 1 96 1 88 6847 T 1 218 1 1 1 91 1 2741 J1 1 56 lphosphenorpyruaresgar phosphoran eae s yste Sre nzymeU I (reo coccu 1 96 1 92 83 127~ 1(u6 j 5129 lnietiion fconi vLctococu Iats 6 j 8 6 1284 112 110438 111154 191112768 IO73 neonvat omt-yStreptococcus tempi us mtnsl 96 9 234 7 1 186 3 1 316 j 015 g19 4606 Iac poSyeptidecu Cm -36 S tahlcc ue) I 96 80 376 9 218~~ jg I84 11103745 itragenerki cagegainr vn adeiStreptococcus gordonli)I 9 9 3 86 834 9 4 S 19 2 9 641 434 191120856 jea-shock proein82/nomyn I~pho onses e sol rti hp2no 96 6 327 T I~ 43 544 112557 11 862 106Ag11 1107 Pvae formte-lyacs (Stepcocus mtns 954 89 j 12469 9 I l i 2 j 606- 45 8 3 18 1911143 96 laO (La r cepocu s t e 
oc c u py g n s actis- I 9 35 89 -1 06 4 1 46 3 j 87 340 2 04 giJ1 8 5 6 IorYIXH rhdrflt snhts (Streptococcus mutansl 1 94 1 86 664 4-T Se*~ so 0..se s. 0 e TABLE 2 S. pneumonia. Putative coding regions of novel proteins similar to known proteins Contig ORE tr Stop match match gene name Si %i Iident Ilength I ID 111) int Int) a cession t I 65 11l 1 4734 j5120 1g1140150 IL14 protein IkA 1-122) (Bacillus subtilis) 93 I 87 1 381 ;4 S--68 53 11297 19 1141 -341 antitumor protein (Streptococcus pyogenes( 93 j 87 1245 j1 3 299 jgnlIPXDIdI1l166 Iribosomal protein S7 (Bacillus subtllis( 93 I 84 297 27 13 695 11093 1011142462 Iribosonal protein 511 (Bacillus subtilis) I 93 86 399 160 15 J1924 3462 10111173264 IATPase. alpha subunit (Streptococcus mutans( 93 I 85 1539 4- 4 21 5 35 3047 1011535273 leminopeptidase C IStreptococcus thermophilus) I 93 j 82 711 I I 262 1 1 16 1 564 1911149394 Ilaca ILactococcus Isctis(1 9 9 4 4- 4 4 0i1295259 Itryptophan synthase beta subunit (Synechocystis sp.( 93 91 195 j3 1392 j1976 19111574496 Ihrpotheticsl Illemophilus influenzsel 92 so 585 T -i 1 36 121 120781 119927 Igij310632 Ihydrophobic membrane pro tein (Streptococcus gordonill 92 86 855 18 I -F3 ;1265 I 1534 19111l49396 IlacO (Lactococcus lactisl 1 92 I 83 270~ I 181 1 3662 1 4060 1011149410 Iensl'me Ill (Lactococcus lactis( 1 92 83 1 399' 46 32 353 1462 gIonl ID 20 90 Iibnrec ogbniin prtinle proe AStreptococcus gordoili 91 1 85 1 16953 1 65 110 1 4442 1 4726 1pir151786515178 Iribosomal protein 517 Bacillus stearothermophilus I 91 8so 285 I 77 I2 I260 1900 1011287871 loroEL gene product (Lactococcus lactis) j 91 82 1641 -T i 4 4 84g. 
06 111871784 IClp-like ATP-dependent protease binding subunit (Bos taurusi 91 79 1 2055 4 99 1 8 110750 19272 1911153740 Isucrose phosphorylase (Streptococcus mutansl 1 91 84a 1479 I99 i 9 114g110?19153739 Imembrane protein (Streptococcus mutansl 91j 7876 4 -127 -i5- ;206-5- 24-69 IpilS07223IR5BS Iribosomal protein L17 Bacillus stearothermophllus I 91 78 405 I 32 I j959 990 1gj1305 hubst (Bacillus stearothermophilus) 91 89 1501 4--32 4 137 I8 j 4765 j6153 InIjPIDldlOO347 INs. -A'IPase beta subunit (Enterococcus hlrae( 91 79 I 1389 4--F 7 11119 1 9734 10111815634 lolutamine synthetase type 1 (Streptococcus agalectiae( 91 j 82 f 1386 201 2 1798 278 10112208998 Idextran glucosidase DexS (Streptococcus suil(j 91 1 9 1521 4 1 222 1 2 1673 11839 1011153741 IATP-binding protein (Streptococcus mutans( 91 85s 1167 -11-3 *I 4-0O -gi--Il 326166 j6570 lpirIA36933IA369 Idiacylglycerol kinase homolog Streptococcus mutans go 90 I a t a0 a a a a a a a IContig (CR? I Start stop I match match gene name I Sim iet lnt ID JD I (nil Itntl I aceasion jnt dn 4- 33 1 2 1841 1 527 (gij1196921 junknown protein (Insertion sequence 158611 1 90 70 315 4 48 127 120908 119757 1onhiPl01e274705 (lactate oxidasa (Streptococcus iniae) 1 90 1 80 1152 (21 119777 118515 (gnilrxvIa2212l3 (ClpX protein (Bacillus subtilla) 1 90 1 75 1 1263 I 56 12 1 117 1 977 (oh 1710133 (flagellar filament cap (Borrella burgdorteril 1 90 1 50 261 I 65 1 1 1 1 1 606 jgi(1165303 (L3 (Bacillus aubtilia) 1 90 75 606( 11 2 1 98 .1 1 1 1 5 3 5 6 2 I ar:a:e beta-aemialdehyde dehydrogenasa (EC 1.2.1.111 (Streptococcus 90 80 7 120 I1 (1305 827 (gi(4107880 (aanI (Streptococcus equisimilia) 1 90 75 I 5191 I 159 112 (71690 8298 (gi(143012 (cxr synthetasa (Bacillus aubtilis) 90 84 I 6091 166 (4 406 13282 191617: high affinity bnhd chain ain acid transpot protein Strtococu 90 78 795 401 jg(617 I mutana ac a an prpe pccu 4 13 11 1 20 1 1395 Ioi(308858 IATPzpyruvate 2-0-phosphotransferase (Lactococcus lactia) 90 76 1368 I 191 1 3 1 2891 
1 1662 (gi(149521 (tryptophan synthase beta subunit (Lactococcus lactia( 90 78 1230 I 198 1(.2 1 1551 1 436 (gi(2323342 I(AF014460) CcpA (Streptococcus mutans) 1 90 1 76 1 11161 I 30S (I 1 3 1 783 (gi(1573551 (aeparagine synthetase A (asnAl (Hlatmopillus influenza,) 90 j o 80 747 4 I 8 (3 1 22015 1 33,43 1011149434 (putative (Lactococcus (actisi 6 9 78 1059 46 8 75-17 1 73.62 IpirIA45434IA454 uIrbosomal protein L19 Bacillus stearothermophilus 89 76 216 I 49 1 9 1 8361 110342 Jgi1153792 (reoP peptide (Streptococcus pnaumoniae( 1 89 83 1980 1 51 (14 118410 (19447 (gi(3088571 (ATP:D-fructose 6-phosphate 1-phosphiotransferase ltactoccccus lactisl 1 89 1 81 1038 1 57. 111 1 9686 (10669 (gnl(P10(d100932 (1120-forming NADI Oxidase (Streptococcus mutans)( 89 77 984 1 65 1 5 1 2418 1 2786 19111165307 (519 (Bacillus subtilla) 1 89 1 369 -4 4 65 8 1 38036 .14225 Isp(P14S77IRL16- 15So RIBOSOHAL PROTEIN L,16. 89 82 420( 0 (18 1 8219 1 8719 (g11143417 (ribosomal protein SS (Bacillus stearothermophilusi 1 89 76 501o 73 1 9 1 6337 15315 jgi(532204 (prs Il~ateria monocytogenes) 1 89 70 1023
7 6 1 3 1 3360 1 1465 (gnlIrlnje200671 (lepAk gene product (Bacillus subtilil( 89 76 1 1896 1 99 110 (12818 (11919 1911153738 (membrane protein (Streptococcus mutans) 1 89 73 1 900 1 120 1 2 1 35.52 1 1300 (gi(40788l (stringent response-like protein (Streptococcus equislmills) 1 89 1 79 1 2253 1 122 1 5 1 4512 1 27191 (gnlPIDIa280490 (unknown (Streptococcus pneumonlae( 1 89 1 81 1 1722 Pt.~ *a oo 4. p: TABLE 2 S. pneumoniae -Putative coding regions of novel proteins similar to known proteins iContig jIonr -i Start iStop- 1- m a tch Imatch genename -ident--- I ID (II) I (nt) nt) I acession I Int I i I 177 6 13050 (3934 9 i1912423 (putative (Lactococcus lactisl 1 8 I 71 8851 1 181 18 14033 1 5751 19i11494l1 jensy'se III (Lectococcus lactis) 89 80 1719 1 211 14 13149 12793 (giJ535273 jaminopeptidase C (Streptococcus thermophilus) 89 83 357 4 ft 1 361 111 431 1 838 (gi1l196922 lunknown protein (insertion sequence IS8611 1 89 70 408 f 1 34 117 111839 110535 Isp1P300531SYH.5 jHIxSTzovL-RaNA SYNTHETASE (EC 6.1.1.21) (HISTIDII4E--TRNA LIGASE) (1415851. s is 7 1305 1 38 13 11646 12623 Ioil2058544 (putative ABC transporter subunit ComYA (Streptococcus gordonit) 88 78a 978 1 54 1 1 I 3 1227 lgnllPIDldi01320 jYqgU (Bacillus subtilis) j s 88 66 1 225 I 57 1 2 1 611 j 1468 lgnl1P101e134943 [putative reductase I (Saccharomyces cerevisisel s I 88s7 858 f ft- 1 65 113 j 5497 16069 (plrIA29102(RSBS (ribosomal protein LS Bacillus atearothermophilus I 88 1 5 57S3)3 1 65 120 1 9030 1 9500 1giJ2078381 Iribosomal protein LIS (Staphylococcus aureusl 1 88 83 471 I 78 1 3 3636 1 1108 IgnlPr10d100781 Ilysyl-aminopeptidase (Lactococcus lactisi 88 e 80 2529 ft(.
1 106 112 112965 112054 Igil2407215 I(AF017421l putative heat shock protein HtpK (Streptococcus gordonii) I 88 72 1 912 107 1 2 1 219 1 962 (enl(PIDje339862 (putative acylneuraminste lyase (Clostridium tertiuel as 88is 744I I III 1 8 1140713 110420 IgiJ402363 IRIJA polymerase beta-subunit (Bacillus subtilis) I 88 a 74 3654 1 126 1 9 113096 112062 (gnl(PIDje31i468 Iunknown (Bacillus subtiliel 88 74 I 0351 140 (17 119143 118874 (giI(13659 influentam predicted coding region H410659 (Haemophilus intiuenzae( 88 61 270 14 (I (394 (555 (gnl(PIDje274705 (lactate oxidasa (Streptococcus iniae( 88 75 162 S---ft t- -ft- I 48 (4 j2723 (3493 (gij 1591672 (phosphate transport system ATP-binding protein liethanococcus jannasclhiij 1 88 1 68 1 771 I 160 1 8 1 5853 1 6278 (gi( 1773267 (ATrase. epsilon subunit (Streptococcus mutans( 88 65 426 f 1 171 1 4 1 1770 12885 IgiJ149426 (putative (Lactococcus lactis) 1 88 (1 72 1 11161 1 211 1 6 1 4140 2 613 1gi1535273 .(aminopeptidase C IStreptococcus thermophilus) 1 88 j 74 1 528 1 231 1 4 1 580 1 957 (gi(40186 (homologous to Ecoli ribosomal protein L27 (Bacillus subtilis( 88 78 1 378 1 260 1 5S 2387 12998 (gi( 1196922 junknown protein (Insertion sequence 1S8611 88 69 612 1 291 1 6 1 2017 1 3375 jgnljPlDjd100571I (adenylosuccinate syntherase (Bacillus subtiiis( I 88 75 1359 1 319 1 4 1 658 1 317 JgI(603578 (serine/threonine kinase (Phytophthors capsici( 1 88 1 88 342 1 40 1 5 1 435 3 1 4514 Igi(153672 Ilactose repressor (Streptococcus mutansl 1 87 56 162 .nu o s Put ti e *oin a.gi s f no e pr te n a~ .ek ow ro e n 0 e a .1 If I ant a ce s o nt a, 1 4 11 1166 pneuonia 2 utative roding (Irions enel protein simla to knwnprten 654 115 116603~ 1 73 giI11972 Iunkownm protein (Insertin sunc 15611 81 j 3 12 21 4 1 15 1 j 1401 380 665 giII1165309 15 (acoillus Is tilisi ocu mu s 817 13 6 1 806 15 2 6631 7039 gil0495 I~I P O I66 ibeosgo p tri G8 Bacillus subtilli) 81 7 13 11 I 150 8 125411 I259 j nIjl811 4 jganlakne-RA nhse (Srptcccs ta sns t I~cl u 
utl a 81 18 115 82-0- 2 j- F 1034- 2- 8 05 gnljPlD lal l66 -i e ongat in f aor 0 (o g ti n fa to F Bacillus- u t i I subti--a-- 8 71 -1316 1 031 4- 4- S 26 4 j 20964 1 2489 19i1l196921 Iunknown protein (Insertion sequence I586113 81 69 294 31 12 1203 123891 IinIPo' e9 99 30 Iph oomalartn p yr hta b~etaco sbuni (Bcills tll 81 74 1131 22 25 2 1154 147 IgnlIPI j34 58 Ilb proei Sila to elongatin s fatrE1 Bclu tls 871 73 418-9 4- 1 2911 10967 23894 jgLIj196921 junknown protein (Insertion sequence IS8611 I 6 8 52 9 4 4- 6523 109521 160 IgiI89115 130eriooale protsei (Pdiococcu cdlacti cil 86 76 624 1 j 4 1534 j 450 1g11153495 Irai o s l preii n z m II8 (Bcil S t e oc cu subtilia 81 13 1 483- 4 1 1 110270 11445 gj196 IunknD~267 o wn pecrbotein Insertonoc sequence 1561 86 76 5179 4 I 06 6 12 78246 1 68804 IgiISI PlO Il Ire98 xas r ae trept ccuam ls e pneum o i s i hm n iI11 86 1 6 4 1 657 14 15 1 2401 IgnitIO elS9RSB Iruibco a l erote in at 2 ails clstrhermop i l 17671 8273 4 4- 4 I 5 2 110951 1110 IgniIP44014 28 lDA dndna knas BH (Lc rs tetococcus lactials [66 32 i i- 4- 13 10 6430 1 4980 IlilD 26470 IOHP ero xyase Lacococcus actis 673 4- T 4- 8 6 1 6 141 115 1- -10 2 6110 n--D--22B ID de--hosp ende-I ont A Le y d e ase (re tococcus la t s pyogenes- 66 I 804 3-1 2-34 a. 
a1 aI *a anr *a a.t I* acss o Ia *a* 1 161 15 150:15 1 6284 1gi147529 lUnknown (Streptococcus salivarius) 86 j 66 12601 184 11 1 2 1483 IgiI642 661 INADP-dependent glyceraldehyde-3-phosphate dehydrogenase (Streptococcus j 86 71 1 12I i1 1 ImutanslIi I 10 I 3!i 571 191115366) ltranslational initiation factor 112 (Enterococcus fasciumi 86 76 j 2913 36 j4 I244 I30 gi12149909 cell division protein (Enterococcus faecalisl I as 73 1266 38 i4 2-45 I3587 9 11084 pttv AB trnpre suui CmB(Srpoocs odnI 8 j 72 j 13- 0 4 38 5 3577 135 19112058546 IComYC (Streptococcus gordonlil 85 80 339 1 57 1 5 2- 797 13789 IgnllPloIdloln16 JYqtJ (Bacillus subtilisl 85 7293 1 82 15 14915 16054 1911153746 Imannitol-pliosphate dehydrogenase (Streptococcus mutans( j 85 1 68 1140 4- 4- 4- -j 9-1 -110-- 1 87 1 2 1 1417 12388 Igi11184967 IScrR (Streptococcus mutansl 1 85 1 69 1 9721 10g 66 14 111153566 ICR? (19X protein) lEnterococcus faecalis) I as5 67 1 489 *-56T31--i 127 1 12 312 1692 19111044989 Iribosomal protein £13 (Bacillus subtllis( 1 85 72 381 1 17 1 7 1292 14767 IgnljPIDjdIoO347 INa. -AT~ase alpha subunit (Enterococcus hire) e1 i5 1 4 1806 3 70 12 j2622 1709 IgnljI~jI2zo( 01: FUNCTION UNKNOWN. SINILAR PRODUCT IN E.COLI. It. INFLUENZAE AND 85 70 1914j 1 1 1- 1--LISSERI HMlINITIDIS. 
(Bacillus subtilis) 4 1 187 15 1.3710 14386 1gil727436 1putative 20-kos, protein (Lactococcus lactis( as8 65 627 233 2 1728 11873 1gi11163116 IORF-5 (Streptococcus pneunonlae( 85 67 11461 I 234 13 1962 11255 Igil2293155 I1AF008220) YtiA (Bacillus subcilis( 85 61 294 4 1 :0 1 1 1 309 j1 1931 1g11143597 ICTP synthetase (Bacillus aubcilis( as8 70 1623 I 6 1 199 1521 19iJ508979 IG1P-binding protein (Bacillus subtilis( 84 72 j 1323 i4 i 3-5 4- 49-3- 1 14 13 63 1 2093 1gi1520753 IDNA topoisomerase I (Bacillus subrilis( 84 69 j 2031 1 19 14 1 1793 12593 Igi 12352484 t(AFOOSOSS) RNAseH 11 (Lactococcus lacris( 1 84 1 68 j 8011 1--4 4 41- 19-8- 1 22 128 121723 120884 1ig1299163 Islanine dehydrogenase (Bacillus subtilis( 84 1 68 840 a a ce a aeaa. T A L*2S n l a s P u a i v c o i n .e g ao s *f n o e p r t i s s i i a *t 5 n w p o e n S. pnu9na 1 55 1530 Ii 479 Puaptecing regeions o ol prten simla to knwnprten Co 6i 2a2 21 Start 2 g1 3 03 Stp mth Jatch b nigen namtei I~rp oc cu aimi 1 8d n 72 lengt 41 D lID45 84 ja yl s (Srniloc u Intls ac ss o j4 j 7j 1in 5 3 9 j 615 5300 jg11147194 phn prati (Sce ptcui colil nae I 84 71 351 S 62 1 1251 1207 644 11306317 IATP indin p roatiei (Stetococcus gordonji 1 84 j 72 780 4- 12 1 951 j1169 gij445084 Iams e pr tep tococcus bov tis 84 68 I il1 22j~ 162 1gni195105 I3 S IR7 O cra lputativ e Se toccs pneuonaille s pl na m 84 74 32402 5- 4 0 629 1 1 241 64 Igi[806487 jonRili; putative ILactococcus lactiel 1 84 66 6 2 44 4- 4-- 165 14 j 4690 82590 lii 140 Iribosomal) proteinthLI (Bacillus subtilis j 84 j 69 1 291 19 15 11 950 1109 Ig(J440736 Itecy prteins (Lactococcus lactis) 84 68 j 163 4- 9 60 1 5494 22621 IgnilIDOje 9376 Iiabemost-ahospNAe binthasproe i (Latoacilius platais 84 68 47321 159 11 14261 845 Igi18048 j0821 p ttiv Lacetococcus iactia l 1 84 1 63 1034 163 11 1138690 15104 jgnIj2293164 5S IiAF00822 SAM U sytae (Blu ubtilis 8 84 69 1 2215 3 4 1 3348 j 1308 lgij4 94 Itriepridase o2LacToccs 38 asct fis 4 c detcl( gas 843 1 74 716 4- 28 
1 6 1 331522 3505 IgiJl4366 j11 ihr uenv) prECte Bcilu subiois H1 6 9Iam p i u n l e z e 83 5 200 4 4- 3511 183 3417 IgnJ31 7D07 6 singlhetralnuDNA obindein drin (Baceillu sublismai i 83 68 1471 115 117432 84572 IgiJ52o78 iomA-I4 pro en eici (te ococus pnuoie 83 j 66 I 0 103 5120 1185t0 11441 IgnIPID9 d oB uno wn Bacillus subtlis( 1 83 j 69 6285 4- -4 4 o258 Thi 23 a1 2195 is 404 pct1650 idenica iBcilu gaps( o 23 83 1 4 1 34 i 281 668304 I 3 1gijI1736594 j imnglouen prite s codngregioo 310659 laeohlu inl na( 83 1 57 3019 te I S t o ac m a c Ce n n a m *i i I .e n t h 4- *1 *-87 1 5 1 51-47-nI l~ 332 uaiv pZp oenf ail ssbill8 4- 42 96 112 18963 19631 IgiI47394 I5-oxoprolyl-peptidase (Streptococcus pyogenea) I 83 73 I 6691 I 8 1 3 263 0111183885 Iglutamine-binding subunit (Bacillus subtilis) I 83 55 261 I 120 14 171701 5233 IgiI310630 Izinc metalloprotease (Streptococcus gordonlit 1 83 72 j 1938 #4 17 17 129911 14347 191,1150O567 jIM. lannaschii. predicted coding region llJi665 (Hethanococcus jannaschli I 83 72 13501 1 17 11 1 3 1440 Jg1I472918 IV-type Na-ATPase (Enterococcus birael I 83 1 60 1 4381 4 4 1 160 16 1346j; 1 4356 Igill773a6s lAT~ase. 
gamma subunit (Streptococcus mutans) 1 83 I 67 1 8911 I 214 14 122711 12964 jgij663279 Itransposase (Streptococcus pneumoniae( I 83 72 6871 4 4 I 226 3 236-P1 2020 1gI1142154 Ithioredoxin (Synechococcus PCC6301( j 83 1 58 I 348 1 4 4 303 111 3 11049 1gL14O046 Iphosphoglucose isomerase A (AA 1-449) (Bacillus stearothermophilus( 1 83 1 67 1 10471 303 12 111553 1931 Igij289282 Iglutamyl-tauA synthetase (Bacillus subtilisi 1 83 1 67 1 777 1 117 1153741 114318 gi633l47 Iribose-phosphate pyrophosphokinase (Bacillus caldolyticusl 82 1 64 1053 1 4 I I 1 299 1 96 IgiIl43648 Iribosomal protein L28 (Bacillus subtilisi 1 82 1 69 j 204 9 1 3 I 14791 1090 IgiJ385178 junknown (Bacillus subtilisl 24 9 1 9 1 7 I 421:( 1 3899 Ign1IPIl3~d10o576 Iribosomal protein S6 (Bacillus aubtilis( 82 60 391 1 12 1 611 134 IgnlIPIDjdloosl junknown (Bacillus subtilis( 4- 7- 22 117 68 1 2 17 1322 114837 1giI520754 Iputative (Bacillus subtilil 4- -116 22 118 114891 115658 IgniIP1D~dlOl929 luridine monophosphate kins ISncoytss. 82 1 69 146 I 26 6 13 116 11147] 110641 Ign1IPID~d10ll90 jORV4 (Streptococcus mutans) 1 82 1 68 j 831 1 35 1 9 1 7400 j 6255 IgiJ188l543 IUDP-N-acetylglucosamine-2-epimerase (Streptococcus pneumoniae( 1 82 1 68 1146 40 0 I80 753 1 1359 Iriboflavin synthase beta subunIt (ACtinobacillus plouropneusoniae( 82 1 68 471 114 113833 114765 Ig11142521 Ideoxyribodlpyrlmidine photolyase (Bacillus aubtilis) j 82 61 j 933 4- 4 4 14 147371 1 1849 IgnlIPlDjd102221 I(ABOOI6IOI uvrA (Deinococcus radiodurans] j 82 j 66 2889 62 2131 1457 Igi224674g (AF009622) thioredoxin reductase (Liateria monocytogenesl 82 63 f 675 1 71 Ill 116586 117518 Ign1IP1De322063 Iss-1.4-gaactosytransgerase (Streptococcus pneumoniae( 1 82 1 60 1 933 1 3 113 j 9222 j 7837 IgnljP1D~d100586 junknown (Bacillus subtilis( 82 65 1386 4- 444 ~00 4 4 44 4 4 44 0 4.4 TABLE 2 S. pneumonia. 
Putative coding regions of novel Proteins similar to known proteins Contig CliP Start Stop ma.tch' mtch gene neme JIt aim Iidont j lngth 123 ID Int.) Int) acession ma n Iit) I 74 1 1 3771 JgnlIP1DjdIl99 lalkaline amylopullulanase ifacillue spi1 82 68 1771 4---I 83 9 366 383 lnl~De3OS362 lunnamed protein product IStreptococcus thermophilusi 1 82 I 52 288 86 1ll 110716 I9394 Igij683583 j5-enolpyruvylshiklmate-3-phosphate synthase iL&ctOCOCCU. lactisi 82 67 1383 hmlgutoEcl-KlailssbilisI 8 11 137 82in IP~d000 A8003927l phospho-beta-galactosldase I (Lactobacillus gaaaeril 82 74 j 1536 118l I I 1 1 1332 IgnlIP1D~dl0057)9 Iseryl-tIWA synthktesa, (eacillus subtllisi 82 71t 1332 1 151 3 146571 6246 1pirI50609715060 tp Isite-specific deoxyribonucicase (EC 3.1.21.31 CtrA chain S -12 66 1590 III I I Citrobacter treundli
13 6 4183 3503 9g112313836 AEOOOS84I conserved hypothetical protein Il1elicobacter pylon)l 82 68 j 681I 1717 112 j 5481' 17442 IgnlIPID~d10l999 I1AB001341I lNcrB tEscherichia coli) 82 1 58 1 19621 I 193 12 1 178 1 576 IpirISO8S64IR3BS jrlbosomal protein S9 Bacillus stearotharmophilus 82 70 3991 1 245 12 1 258 1845 1viI146402 IEcoA type I restrictIon-modification enzyme S subunit (Escherichia colt) 8 2 68 1 588 9 30 34 gnilPIDIdlOOS76 Iribosomal protein SIB [Bacillus subtilis) 81 66 I 255S 16 T 7484 18413 IglIll00074 itryptophanyl-tRNA synthetase iClostridium longisporumi 81 70 930 4 I* 20 111 110308 113820 IgnIIPIDIdI00583 Itranscription-repair coupling factor [Bacillus subtilisi 81B 63 1 3513 0 1 38 12 11232 11606 Ig1I2058543 Iputative D14A binding protein 15treptococcus gordonili j a8 63 1 375 1 45 2 3061 1 1751 IgIJ460259 lenolase [Bacillus subtilisi 81 671 1 1311 4 1 46 1 1 2 1 1267 Ig1I43l23l luracil permease loacillus caldolytlcusi 81 61 1 12661 48 3 24S]3 1440 ~gn1IPIDIdl00453 Ihannosepllosphate Isomerase IStreptococcus mutan.) 81 70 1014 4 1 54 j 2 1 11t; 336 19i11547S2 jtransport protein iAgrobacterlum tumefaciensi 8 a1 64 7711 1 65 122 110306 110821 jgi 144073 IsecY protein [Lactococcus lactisi j i 81 66 516 89 4 87 203IgI5686 serine hydroxymethyltransferase iBaCillus subtilla) 81 69 1272 I 99 11i6 119126 118929 -gij2313526 IIAEOOOS57I II. 
pl-or1 predicted coding region lIPO4l1i ilicobacter pylon)I j- 751 1981I 1 106 1 1837: 1 7822 1gnl1P10je199384 Ipyrft iLectobaCillus plantarumi 81 1 61 j 5521 1 108 16 1 5051 16877 IgiIl469939 jgroup B oligopeptidase PepS (Streptococcus agalactisa)I 81j 66 1 1824 4 I 113 115 115899 118283 Ipir150941I50S94 IspolIlE protein Bacillus subtilie j B)I 65 2305 1 128 I 5 I 3358 3634 IglIl685l11 Iorf1O9l istreptococcus thermophilusi I 8I1 69 j 2761 321 5e.049 aco typ I retito -s dfcto enym R a uui a ce i *oi at5a28 552 a83 01298 asM aytet3 [ail s a a.tiis at *9 111* 110~S pnuona 73 Puttiv coding~I00 regions8 ofCTO UKONoveacluots siisla to know proein 17oni59RPIStr Stop match52 maythgne sntame apasbnt(atccu at I em 6 5 dnt legt 2 11 1 8390 3119 jgj30486 recvers toIrestcritis ondfcatifon enzyme suit Esriiis coi 8t1 5 282 T 4 1 1 10 4 14159 4508 IgniPID6 dO200 I(AllB00148 FUNsphtION sf s UNKNOWN Ba Illu subils Itaohrohls 81 55 282 4 26 1 1 2 159 188 1si1145227 Itryptophan- synase SLrphasuunt(ctcccus lactis) 81 65 867 ;29-9--i i: 6-643--i- 4 217 4 15 400183 Ig1II46473 tcelioioee pyhoshtenfea sebnzymt I (Bacclus ltstertemohla I t 65 24 I 299 11 1:6605 1 74 Ign1IPIDIe3Ol546 IntySKI motye (Sao nella I etoriccasLhe~ l1 6 81 60 66303 36 2- 11 6 j 835 96 1911149521 Itrytophn sntase bta suunit(oc cclius lat I 8o1 59 62 121 13 1 2I 4 876 9242 191121690 jD /hantotrnpte me tlim fT-inprin prteptoc~aco s u an chl 80 64 j 58 1 8 o 1 28 1 3 424811 15798 1911452360 IvaitMnaa synteditaed (Bcoilu sr binlisl la-phlsiflez I so 69 2670 j I 4 32 1 2 1902 j 1933 IgnlIl0e264499 Idihydroorotate dehydrogenase a (Lactococcus (actis( 80 66 1032 I 9 1 1 1266 lgnIlP101e23407)8 Ihom (Lactococcus iactis( 80o 63 1266 4- 4 52 5 14363 13593 IgI11183884 IP.TP-blnding subunit (Bacillus subtilis( so8 57 I 71 4 I 54 1 1 4550 14744 19112198826 11AF0042251 Cux/CDPIlBlfl Cux/COP homeoprotein (lbs musculus( 80 60 195 I 59 j11 7109 7 486 1gi1951052 IDRP9, putative (Streptococcus pneumoniae( l 68 378 3 1230 11550 
lplrIA02815IJt58S Iribosomal protein L23 Bacillus stearothermophiius 80o 69 j 321 4- 1 65 112 1 5174 15503 IrirIA028l9I858S Iribosomal Protein L24 Bacillus stearothermophilus 80 70 330 1 66 19 19884 110681 19112313836 1(AE000584( conserved hypothetical protein (Nelicobacter pyloril I so0 66 j 804 4 82 12 648 12438 1gi1622991 Imannitol transport protein (ascillus stearothermopiuilus( j 80 65 17911 I 85 11 950 1630 1911528995 jpolyketide synthase (Bacillus subtilis( I so 46 321 89 8 6870 577919g1853776 Ipeptide chain release factor I liacilius subtilial 80s 63 1092 4 93 112 I 81181 1438 IgnlIPIDldl0l959 Ihypothetical protein [Synechocystis sp.( 80 60 1281 0 0a 40 G* -to~eA TABLE 2 S. pneumonlae -Putative coding regions of novel proteins similar to known proteins Contig JORF IStart stop matcm match gene name Siem 6ident length, ID 11 In I Intl ac Sion I Intl' 4 LrumI80 6 11 4 124 9 j4246 j3953 l;gnIjPI~ldlO22S4 j30S ribosomal protein S16 (Bacillus subtilisj so8 65 294 4- I 128 1 8B 5148 16428 1gi12281308 jphosphopentorsutase (Lactococcus lactis cremorial 80 66 1281 I 3 1 12665 j11376 IgiIis9iOS INADP-dependent giutamate dehydrogenase (Giardle intestinalial 80 68 1290 4 4 1 140 119 115699 119457 i9I;17210 putstive transposase (Streptococcus pyoganesi 80 s 70 243 158 12 12474 1984__19111877423 jgalectoae-1-P-uridyl transferase (Streptococcus mutansi 80 65 1491 4- -4 I 111 110 j 7474 1 7728 1911397800 jcyclophiiin C-associated protein t~us eusculual 80 60 255 4 I IRI I1 1 2 16k9 1911149395 llacC (Lactococcus lactisi 80s 66 618 -4 1 313 1 1 27 1 539 1igI1143467 jribosomal protein S4 (Bacillus subtilis] I so 1 70 513 1 329 1 2 11652 j 858. 1911533080 19c? 
protein (Streptococcus pyogeneal 80 63 795 4, 1 371 1 2 1958 1911442360 jClpC adenosina triphosphatase [Bacillus subilis( so 80 58 j 957j 4 a 7 14312 15580 19i1149435 jpurtltve_(Lactococcus lactis) 7 9 64 1 1269 -175- 35- -4 nei 79 1 1 4 23 1175--------1542975-I-bcB-(----o--a-robacteriu-t-e---o-ul--r-gnes- 61-104 1 33 11- 9244 21 1i gn11P1D1e253891 luor-glucose 4-epimerase (Bacillus subtillsl 1 79 1 62 1044 4 1 36 13 11242 2633 lqnl1P101o324218 IftsA (Enterococcus hirael I 79 I 58 1392 2-- 3 13 I715 I8378 1 1053 ace---ate- k as (Bacillus-- -ubtilis- 58-1224 4, I 55 I7 1 9011 1 8229 1gi11146234 idihydrodipicolinate reductase (Bacillus subtillsi 1 79 56 f 783 4 119 I8661 I8915 19112078380 Iribosomal protein L30 (Staphylococcus sureusl j 79 68s 255 69 I4 1 3678 2128 1gn11P101e3l1452 Iunknown (Bacillus aubtilis( 79 64 1551 cccus4- ul 79 9 788-- 7279------1677850--Ih---othetica----protein---(Staphylococcus----a--reus- 59 -J 603 4 1 72 110 1 8491 19783 IgnlIPIDIdlOlil9 Ihyvotheticsl protein (Synechocystia sp.( 79 62 1293 13 129061j 7300 jgiI143342 Ipoirserase III (Bacillus subtilisl 1 79 j 65 4395 4 1 82 114 113326 115689 1gnlP10e255093 jhypothetical protein (Bacillus subtilial 1 79 65 -2364 4 -4 4I 4 4 1 86 113 112233 111118 1911683582 Iprephenata dehydrogenase (Lactococcus lactis( 1 79 58o 1116 9- 92 3 940-- 1734-- -g--53-286 ,tl s ph s t -sm--e-Lctcc-s--ti-- 5 j 9 1 98 1 6 1402; 44 IgntIIDId(00262 jbivG protein [Salmonella typhimurus( 79 63 1 720 4 TABL 2 5 a 55u a Puat v codi* re in of nove prt i s s mlr 5ok o np o en 107 S pneumo60 JIJ608 -niaePuaie cd-ain regseofnele protein similaroo wn poe is9 1 5 2 -I I8 830 flo4688 Int Intl acession8 [Mcbctru Ie 9 6 j 1ntl 113--49- -9 112 1131 1115 -gj576 galactpos idglceas e (Str eccu I emutoa mai a 79 64 12166 I 107 7 1j56 640 3 01145600 jaD-al a D-lanineu ligasrlae prti7Etrccu acls 9 67 1823 4 171 J 2858 I 3032 1114823 pptativ B196c2.l89c (l acterus ipae) 79 I 64 1446 T i 1512 10 13 2 0 n 112213 oo I01508 3poso glcaat kUC 
INa NKN erm. ogalu mailal 7 9 60 j I121 4 I 1 5 216 3052 1011912 I ai valelceo (L sht snhsactococcus lacti 696 1 17 18 45637 g1114944 jpuative (Lmatococusu atis IMMnjshrci oi 79 1 5136 I_3_ 2187 3 38212 j 21 IgnlIPIDjdlO2002 gl1AB004881 FNCTION UNon. Bacillccus abtilia 79 58 1804 I 4- 21 19 1 1 49 1 1145 1g9119 2 indoleglyc)arl in ph shate a synthase acoc us lactis 1 19 j 66 801 C 38 21 1 363 321 Ion(118602 04 l yutsreol-ie ps rein (Laccocus uLactiS 79 64 6923 215 2 15 1215 10 2293242 I(A00820 riin n spt) uciae sytas[Bacillus subtilis 1 19 j 64 217 4 32 2 53 I18 119L1953 1305 iv rboMal poteinu (Pediococu acd7a8ci 67 2521 I I 314 55 129 Igi1I43328 Iy9h geP prondut; putativeu (acilua btilia 1 79 59 417 I- 6 1 282 409364 1 1 0(1853161I05 IUDP-eN-ac teticsamine la boy nytaseae(acillus subtilial 7 8 62 j 1 a 2 0 15 i 7863 101149432----- ;pta ti e h sh rbsl rnf rs Lactococcus lactil 78 63 j 113 I 1 5 j2 822 1364 18314 IgnlPIOIdOO3as yaein ytas Bacillus aubtillal 1 78 j 63 1 1059 I 20 127 0 978 120610 1911299163O~s jsagie V sprulrona (Bacillus subili( 18 58 53 T I 78 -0 4- -46 C. 
Cc CI ma c gone Cam aim CCngth 4 1 40 11l 9287 .18001 IgI1713516 UTP cyclohydrase II/ 3 ,4-dilydroxy-2-butanone-4-phosphste synthace 28 1 I I (Actlnob acillus pleuropneumonlae( 18 I 422 12I8 olicobacter pyloril zi prtin g5 I 2 2 2101 1430 1gi11183887 lintegral membrane proteins (Bacillus subtllis( 1 S8 j4 672 4- 4 55 17 163 j51 lPIe3l3O2i Ihypothetical protein (BacIllus subtilisi 78 J 1 1026 1- 1 1t ,19756 119598 giI79764 j calcum channel alpha-jo subunit (jomo apiens) 78 J S* 159 74 j1 i1 031 114018 IgiIlS7I279 joolliday junction DNA helicase IrusRI (laemophiius intluenael 78 7 j 1014 I I9 I6623 1 7972 Igl 11877423 Igalactose-l-P-uridyl transferase (Streptococcus mutasl 78 62 1350 1 81 112 112125 113906 Igil 1573607 jL-fucose isomerace (luci) lllaemophllus intiuenzse( 1 78 66 1782 3 2 2423 4417 19i1153744 joaR X; putative (Streptococcus mutans( 1 78 64 1995 T -4 183 18 116926 118500 Igil143373 phosphoribosyl aminoistidarole carboxy formyl tormyltrsnsterase/inosii. 1 17 1 monophosphate cyclohydrolase (PUA-ll(JII (Bacillus subtiiis( 78 j 63~ I 00 I 83 120 j20212 120775 1911143364 Iphosphoribosyl aminoimidazole carboxylase I (PUR-E)l(Bacillus subtilial 78 I 64 564 92 -i 2 T 165 878 I- d jC F S reptococcus-mutans 78-- 62 714- I 98 1 8 1 5863 16909 19112331287 I(AF013188( release factor 2 (Bacillus subtiiis( 78 63 1047 I 13 3 1071 2741 1gij580914 IdnsZX (Bacillus subtilisi 7 64 1671 4 T I 127 4. j1133 I2071 1911142463 ItNA polymerase alpha-core-subunit [Bacillus subtilis( 7 59 1 939I I 132 1 2782 497 19111561763 jpuilulanase (Bacreroides thetsiotsomicron( i s 78 j o8 2286 I 115 4 12698 j 3537 19111788036 I 1AE000269( 11*3-dependent HAD synthetase (Eacherichis colil I 78 66 840 I 24 126853 125423 19111100077 Iphospho-beta-giucosidase (Clostridium longisporuml 78 64 1431 5S 1 4690 14514 jgij 149464 jamino peptidase (Lactococcus Iactis( 78 4,2 1 177j 151 1 6 9- 5 99 INADO dehydrogenase su u i ru b ri a- 78 5 +4 4 I 162 14 149971 1 4110 1gn11P1lje323528 Iputative Yhs? 
protein (Bacillus subtilisi f 78 j 64 1 88 I 11 10 I8651 I7941 1911(49402 lactose repressor locH; sit. I(Lactococcus lactisi 78 48 j 705 -i I 00 4 327 4958 IgnlIPlojdioO12 linvertase (Zymomotiac mobilisl j 78 6 il1 j 1332 1 203 13 1 3230 13015 IgI 11174237 ICycK (Pseudomonas fluorescens( j 78 1 57 1 216 4 4 L 2 e e S. pem a Puati* coin ein of noel prten siia o *ew roen 9**ti 9OR Ctr Sto mac gen *am 5i *dn S10 pnuona 678 Putativ002 O ee crodng regins ofnovel rten simla tokow8roen SI- 4 214 6 13810 12797 [gnl[PID~di02049 P. haemolytica o-eialoglycoprotein endopeptidase; P36115 (6601 78 60 1011 I I I I I tranamembrane (Bacillus subtilis) ST 21 1 9 21 [i[843 alcohol dehydrogenase 2 (Entamoeba histoiytical 78 64 2709 122 3 2316 13098 's15307 pore germination end vegetative growth protein lgerC2) ilaemophlus 78 65 783[ 26 j-I11534 i influenza. I I [I [742 8 gi[5172i0 [putative tranepoass (Streptococcus pyogenesl 78 I 65 735 27 753 [gnIIl[P IO00306 [ribosomel protein LI (Bacilius eubtilis( 78 65 531 312 [3 1567 079 [g[12 89 261 [comE ORFi (Bacillus subtile) 78 I 54 I 489 TIi I 1 139 1 1 117 1 7)94 1g111916729 [CadD (Staphylococcue eureusl 78 53 I 6781 1 342 1 2 1 762 1 265 [gl[11842439 [phospheatidylglyceropliospltate synthase [Bacillus subtilis) 78 59 I 496 3 83 1 1 737 1 3 1gi[1184680 [polynucleotide phosphoryleso (Bacillus subtilia) 78 64 1 7351 0 1 15 111923 111018 [gl[1399855 [carboxyitransferese beta subunit (Synechococcus PCC79421 77 63 1 906 1 8 1 2 1 1698 1 2255 [gi[149433 [putative (Lactococcus lactisl 77 a 59 1 1 17 114 1 6948 1 7550 [gi[520738 [comA protein (Streptococcus pneumoniae) 77 60 603[ 30 112 1 9761 1 8967 [gi[1000451 [TreP [Baciiius subtililI 77 I 43 I 795[ 1 36 114 111421 112131 Jgi[1573766 [phosphoglyceromutese lgpmA) Iliaemophilus Influenzae) 77 64 711 1 .55 1 3 1 3836 14096 [gi[1708640 IYeaB (Bacillus subtilis) 1 77 55 261 61 8 1 8377 18054. 
[gi[1890649 [multidrug resistance protein [nrA (Lectococcus lactic) 77 1 51 324 S 1 65 1 2 1 607 1 1254 [gl[40103 [ribosomal protein L4 (Bacillus eteerothermopiilus( 77 j 63 648 68 6 7509 1 7240 [$1J47551 IHRP (Streptococcus suis) 17 68 270
69 | 1 | 1083 | 118 | gnl|PID|e311493 | unknown [Bacillus subtilis] | 77 | 57 | 966
77 | 5 | 4583 | 4026 | gnl|PID|e281578 | hypothetical 12.2 kd protein [Bacillus subtilis] | 77 | 60 | 558
83 | 14 | 13104 | 14552 | gi|1590947 | amidophosphoribosyltransferase [Methanococcus jannaschii] | 77 | 56 | 1449
gI~l~3289 J04-- -tid- it4 regiusl 77 6 4 94 [206 544 [gl-;-e299 i 0096 cyli nuloid-ae chne bet suui -tu novgcs 77 66 2439---
96 | 11 | 8518 | 8880 | gi|5518719 | oaR I [Lactococcus lactis] | 77 | 62 | 363
-bindingprotein(Streptcoccus- -12 4 99 g [108 -179 [l15337 [sugr-bidin prti [Srpoccu uas 1284-
TABLE 2 S. pneumoniae - Putative coding regions of novel proteins similar to known proteins
Contig ID | ORF ID | Start (nt) | Stop (nt) | match accession | match gene name | % sim | % ident | length (nt)
106 12 1361 i 176 Igij148921 IIicD protein (Haeophilus infiuenzae( 7 51 8161 1 108 j 4 13152 14030 Igi1S5747.30 Itallurite resistance protein (tehl (HIaemophilus influenae) I 77 58 8791 4 4- 1 118 14 1 3520 13131 Igil 1513900 jD-slanine permeas. (dagA)l(Haeophilus in! luenzael 1 77 j 57 f 390 1 124 14 1 1796 11071 IgiJl573l62 ItMA Iguanine-NI(-methyltransferase Itrm)l( Haemophilus influenrae) 77 1 58 726 4 9- 4 I 126 4 j5909 14614 jgnlIP1D~d10l163 ISrb (acillus aubtilia) 77 62 1296 i 4 128 2 630 f1373 IgnlIPlD~dlOl328 YqiZ (Bacillus subtilis) 77 I 58 744j 130 1 1 1287 jgnlIPID~e3250l3 Ihypothetical protein (Bacillus subtilis) 77 61 I 1287 i 9 1 139 j5 1.4388 1 3639 Igij2293302 j AF008220) YtqA (Bacillus subtilis) I I1 59 I 750 1 140 111 110931 19582 I1911289284 Icysteinyl-tRNA synthetasa (Bacjilus subtilisi 1 17 1 64 1 1350 4 1 140 118 119451 119263 10i1517210 Iputative transposase (Streptococcus pyoganest 1 7 66 1 189 S4- 4- I 41 ;2 -;976 j- i1683 I;gniIPIDlelS7B87 IURF5 l&a 1-5731 (Drosophila yakuba)I 1570 s 708 1 I 11 275 j59 gi1556258 IsecA (Listeria monocytogenes) 1 77 1 59 1 2559 00 144 212 671 1 3173 lgnIlPIDdOOS8S jlyayl-tRNA% thynthetas. (Bacillus subtilis) 77 61 1503 t I 63 S 642 3* g11511015 Idihydroorotate dehydrogenas. AK ILactococcus (actIs) I 77 62 9871 4 164 110 7841 7 074 IgnlIP1D~d100964 homologueot ironidIcitrate transport ATP-binding protein FecE of E. 
col 77 52 7681 1 191 1 8 17257 1 5791 Igill49Si6 lanthranilate synthase alpha subunit ILactococcus lactisl 77 1 57 I 14671 I 198 1 8 j 5377 1 5177 19i11573856 jhypothetical (Ilemophilus infiuenzse( 77 1 66 201 213 1 202 462 111743860 jBrca2 Ilumusculusl 77 j s0 261 4 250 f2 231 j509 9nl1P101e334776 IYIbH protein iBacillus aubrilial 77 60 2791 F2 4 289 3 1737 j1276 gIonIPlDIiOO947 IRibosomal Protein LbO (Bacillus subtilisi 77 62 462 i_ .4 T -iI I 292 12 11399 1668 1911143004 Itransfer RNiA-Gin synthetase (Bacillus stearorhermophllus) I 77 58 732 1 3 1 2734 11166 InIPID~dl01824 Ipeptide-chain-reiease factor 3 (Synechocystis sp.) 76 53 I 1569 4 1 7 123 118474 118235 Ig 1j455157 jacyl. carrier protein (Cryptomonas phi( j 76 1 57 1 240j 4-- I 9 8 5706 14342 jgiIl146247 jasparaginyl-tRHA synthetasa (Bacillus SUbtilisl I 76 1 61 j 1365 10 15 14531 1 4385 IgnlIPIDIe3l449S jhypothetical protein (Clostridium perfrlngens( j 76 j 53 1471 4 1 18 12 I 165 842 1gI11591672 Iphosphat. transport system AlP-binding protein (Hethanococcus jannaachli j 76 1 56 77 4- 4 9 4 TABL E 2 S. 
pneueonlaa Putative coding regions of novel protein-ifumiar to known proteins I Contig (ORF Start I Stop mtch I match gene name sm Iiet Int I ID (ID I ntl Int) a mceasion I I n- 12796(2873(gn(PD~e339 (raslaio intitio fcto Il iA -17)[Bacillus stearothermophllus( 76 64 1 378 35 *16 1 2682 (g111773346 (Capso (Staphylococcus aureus) 76 61 11881 48 (28 121113 121787 19112314328 j(AE000623( glutamine ABC transporter, permease protein pyo gnPi (lelicobscter 76 52 675 4- I 1 151 107 gnl(PIDle283llo (femD (Staphylococcus aureus) 76 1 195 -4 4- -1 68-- 12661 62 5 2406 1 2095 (gnl(PIDja313024 (hypothetical protein (Bacillus subtilisl 76 1 '59 312 I 9 1 223 (4441 (91(40148 (L29 protein (LA 1-66) (Bacillus subtilil1 76 1 5B 219 68 (2 38 2371 (gni(PIDje284233 janabolic ornithina carbaeoyitransterase (Lactobacillus plantarum) 7 6 61 1044 69 j8 .1297 16005 (gnl(PIDjd101420 (Pyrimidine nucleoside phoaphorylase (Bacillus stearothereophilus) 76 61 1293 73 (12 89 (7267 (gnljrxoje243629 (unknown (Hycobacterium tuberculosis) 76 53 573 4- 74 I 5 11433 1 7039 (gnilPio~dlo2048 IC. 
tharmocellum beta-glucosidase; P26208 (985) [Bacillus subtiliap 76 60 1395 1 00 (5 643 (7936 Igi(2314030 ((AE0ooss9) conserved hypothetical protein (lielicobacter .ipylorii 76 61 294 4 -4 I 8 (1 (1019 (1696 91(573900 ID-alanine permease IdagA) IHaemophllus influenzac) 6 56 978 83 (19 111616 1198841 (96(143374 (phosphoribosyl glycinamide synthetase iPUR-D; gtg start codon) (Bacillus 76 6 I I I subtilisi (0 I 11 1 86 (14 11:1409 (12231 (gI(143806 (AroF (Bacillus aubtilis) 76 58 1179( 87 1 3 1 1442 (91(153804 Isucrose-6-phosohate hydrolase (Streptococcus eutans( j 76 59 1440 87 (16 (1!054 (15110 (gnl(141Dje323500 (putative Gmk protein (Bacillus aubtilil 1 76 56 645 I 93 I1769 (1539 (gi(1574820 (1.4-aipha-glucan branching enzyme (glgal (iaemophilus infiuenzae( 1 76 46 1 231 4- 94 1 T 51 ;365 1(1I44313 6.0 kd ORF (Plasmid ColEI) 1 76 1 73 1 315 1 1 116 12 1 ;'15 1 1678 (I1(153841 pneueococcal surface protein A (StreptococcuS pneumoniae( 1 76 1 59 474 1 123 1 6 1 3442 1 5895 1gi111314297 (Cipc ATPase (Listeria monocynogenesl 76 59 24541 I 126 1 2 1 2156 1 2932 (gnlPIo~dI1328 (Yqiz (Bacillus aubtilisl 1 76 6il1 7771 1 128 (10 1 6973 1 7797 (gl(944944 (purine nucleoside phosphorylase (Bacillus subtllis( 76 60 825 131 (11 1 6186 5812 (g111674310 IAEOOOOS8) Hycoplasma pneumoniao. M.6085 homolog, from Mi. 
genitl&ium( 96 4 I I~ycoplasma pneumoniaei 1 4 -4 as ID 5I aI a't *nt I asss o a i-140-- f if1t3 -ft ftl--- -3 -3 23 4 1 f192 5 1i02 930282 a s~ 1-tqA (Bacillus subtilis 1 76 1 53 13710 F f 145841487 11523 1g1151880 npolynclete pynhp yase (sbait l ccus bl i( js 76 f 61 2337 (43 2 58 395 g1143172 Itransper n RUA-yr u synth te(illu sbtis 76 61 1323 -i ft- f 21 2 12 207 foi11400198 lexaF B21 0ids (aaS 1-8211c u (Bcilu ubila 76 53 1371 (91 J7 2j8 5228 Ig111S5lB7O laT nthr nt sythei bEcea sbuni Lc toocullcis 76 J 56 f 1588 f f- f t I 15 I3 329 24 Igi12149905 ID-guamic id adin It enzymeu (Enerococusfacai 76 60 13 f 207 1 914 j 362 1011431723 Ilysis rotei Bacillus sutli s( 76 f 0 58 J071 ft- ft61L 208jI 41 207 giJ20998 IDexltai g acidasein Denzym (Str ept ococcus uls) is 1 76 57 6722 f 1 217 23 j3480 giI663278 ItransposiOY nhaiaeIoease (Stretococcus pneumnlae 76 69 3 08 23 8 1 1259 11724 igloll 9 1163115 Ieubral rnid s c rept onrcus pinem soni.)-Hempiu3ifue e 76 f160 72 1 ft-- -f-t 4- 2 10 1 82 1 66 1011I24990 1D-gutamic aid tadngf s enzyme l Eneous facais 76 61 6781 40--;12T 4 4 ft- 5 8 112599 111428 19i113 1579 jofmbia hptasctilpoen egu:cnepul rnlation rerssr (pBp(pmoh lus ilen jy 75 I 1 8 1 4 f 306 111 19125 111894 IgnI~I dlO2OS lnyd me (Bcillus [Suteplccs( I 75 64 li660 I 1 2362 600 1gi112381 laspartats pami eransf red i Bacillus s lis I 75 I 53 33122 4-FST 1711 4I88 8379 lg11149493 SCRFY fommedprh mdntAgyylase (Lacetococcus lacris( 7 5 61 876 ft f 33 18 4 26 33035 1 4 IonIP13ldl lI a5 goneM (Bacil u (B ci llss)tl 75 1 52 f 8 96 1 22 j3 41 40 83 2728 jgi1133157 uthnor (Bacillus subtills( I 56 70 1 ft- 4-- L 2 s a a S. pemna s Puatv coin rein .f *oe *rten siia to *nw pr *e 36 S pnuona Puttiv c 92879 I(F0 todng regionsrotynovelhproteins similar to knownlprptein I 33 2 339688 71836 I111r3AO 2 C oufere ri n e prte n Costriiuxs ermaetuicul 75 56 36 4 319I IgJ 8 j n sport geer (Caenorabdcitis delbegan il 1 75 I5 1 4 1 I 110 8318 '1 7CG Si te N4o. 
620; alternate gene names ha, hap. bar. rsrl, apparent frameshift 71 50 3366 1 1 118 9I312in Geneank Accession Number X(06545 (Escherichia colil I 54 118 1.19566 120759 11666069 cr62 gene product (Lactobacillue leichmsnnii( 75 I 58 1194 I 5' 9 I8448 1 7822 g11290S61 1c188 (Escherichia cclil i 75 i so 627 4 I 65 14~ 6072 6356 ~g11606241 130S ribosomal subunit protein S14 (Escherichia ccli! 75 j 64 265 4- 4- 4 70 4 j3071 2472 g1 1256617 ladenine phosphoribcsyltransferaae (Bacillus subtilis( 1 75 I 57 I 600 4 1 71 124 130399 129404 10111574390 IC4-dicarbcxyiate transport protein (H1aemophilus influanzae( 7 5 I 57 I 996 1C I 7 2 10 55 gniPID~e249656 lYneT (Bacillus subtilisl i s7 57 I 456 7 a i91-- 55--ig 11 79 j 1810 49i g1I11462l9 1 28.2% of identity to the Escherichia coli GTP-binding protein Era; putative '15 I 9 I 1320 4 I 82 6 6360 6536 ~g111655715 IBZtD (Rhodobacter capsularusl I 5 I 55 I 1771 I 83 1 6 11938 1 2915 IgnllPiDle323529 Iputative PIsX protein (Bacillus subtilis( 75 56 I 1038 4 I 93 111l 7368 15317 Igij39989 Imethionyl-tRtNA synthetase (Bacillus stearothermophilusl 1 1 58 2052 I 9 11 949 699 gij15949 Igutainetrnsport ATP-binding protein Q (lethanococcus jannaschi) I 5 I 54 7 11 1 1795 1 47 gnljPIDe323Sl0 IYoV protein (Bacillus subtilis) j 75 I 57 1 1749 103 2 1362 11186 IgnljPID~e2i6928 Iunknown (Mycobacterum tuberculosis) 1 j5 64 I 825 104 1 1691 1915 IgiI460026 Irepressor protein (Streptococcus pneumonisel 1 75 1 54 225 I 113 15 12951 13883 IgnlIPlDIdlOl19 JABC transporter subunit (Synechocystis sp.( 1 75 I 55 1 933 i-11 90--9 1-2 1-5 -ep-essoro---4- epress-inHrcA--- coccu-- tns1---- 4- 4 62 6 2614 300 9 i 11500451 jannsschii predictedccding region M1J1558 (Ierhenococcus jannaschii( 75 I 44 1 387 42 137 18 110082 j10687 jgil393116 P-glycoprcrein 5 (Entamoeba hisrolytica( 7 5 I 52 6061 4 149 III 8499 I9338 IgnlIPIDjd100582 junknown (Bacillus subrilis( 75 j 55 I Cae a 0 ace a .a c .a a TABLE 2 S. 
pnewsonlae Putative cod Ing regions of novel proteinAsSimilar to known proteins Coti a t Star Sto match m tchgenename aim j6ident le1ngth j ID JID I Int) j nt) acession I nt).
i 151 ;6 -;9100 I 7673 igiI40467 IHsdS polypeptide. part of CfrA family (Citrobacter freundi) I- 75 I 57 1428_ 18 1 986 3 IgnIjPID~e25389l IUDP-glucose 4-epimerase (Bacillus aubriIIl 75 I 61 -984 F I-17 5653 j -6774 i;giI142978 Iglycerol dehydrogenase (Bacillus atearorherisophiluag 75 56 1122 172 (9 7139 9730 1gnljP10je268456 Iunknown (Iycobecterium tuberculosis) i 75 I 58 j 25921 473 4 -1 5 4BeI 185 3 30-66 12014 1 5 7 4 8 0 6 *permidine/pu-rescine transport AflP-binding protein (potA) I la-- ohlu-75- 1053 1 I I I IIsinfluenzae( I I S 1 191 16 15235 1 4213 IgiIl49518 Iphosphoribosyl anthranilate, transferase (Lactococcus laccia) 1 75 I 61 1 1023 SI- 1 3 774 j1181 1i2314588 I(AE000642) conserved hypothetical protein (Helicobacter pyloril I 75 1 65 1 594 231 11 1 1 153 Igti40l73 Ihomolog of Ecoli ribosomat protein L21 (Bacillus subtilis) 7S I 57' 153 1 234 1I 418 1gij2293259 11AF008220( Ytql (Bacillus subtilisl 75 I 59 I 4171 279 1 552 151 iglI1119190 junknown protein (Bacillus subtllisl 75 50 4021 -T 4 291 7 3S58 13827 IgiJ400Il 10RP17 OAA 1-161) (BacIllus subtillal 75 1 48 1 270 00 I 375 12 137 628 IgiJ410137 10RFX13 (Bacillus subtililI 75 I 58 492 4 I 20 :16721 117560 JgiI2293323 (AP008220) Ytdi (Bacillus subtills)745 0 i I I6 j4682 16052 gLi135421l 1PET112-like protein (Bacillus aubtllisl 74 53 8401 18 4 311 42 gnI~Dd101319 IYqgl (Bacillus subtll 41 601 171 1i 6- g-8 n0 -I-I 10721 4I 1 2 1 6 1 518 1 48 0 1 1 0 23 1 Iglutamyl-aminopeptidasa (Lactococcus lactisl i 74 I9 1086 24 I2 79 548 IgiI2314762 I(AE0006551.ABC transporter, permease protein lyseEl (Helicobacter pylon)l 74 I 46 j 192 1 25 1 1 2 1 367 IgnljP1D~dl00932 jll2O-forming NADI] Oxidase (Streptococcus mutans) 1 74 63 366 4- 1 38 118 1114132 112964 IgiIS37034 jORF..o488 (Eacherichla colil7 713 I 48 li0 8!124 16669 Igil 1513069 P-type adenosine triphosphatase (Lisreria monocytogenas) 74 I 3 2256 I 5 il 11964 14O1 qIgnlPIflje283IlO Ifeno [Staphylococcus aureusl 4 6 6 I- 4 I 61 12 11862 1427 jgiJ2293216 
I(AF008220( putative UDP-N-acetylmurarnate-alanlne ligase (Bacillus aubtllis 74 641 1564 -76 i-o 4- 4 I 8 I2 616 I 96 plnIC33496(C334 Ihi3C homolog -Bacillus subtills I 74 I 55 210 2- -46 4 -V 1 86 19 188I 00 II638 prephenate dehydratase (Lactococcus lactis( 74 55 1 906 to o 9 0 TABE 2S. pneumoniae -Putative coding regions of novel proteins similar to known proteins Concig IOn" I Start I Stop j match jmatch gene name %aim tidant Iongth I D 11n 1I Int[ I [nt[ acession I ntl 102 j 5005 15652 jgi[143394 IOHP-PRtPP transferase [Bacillus subtilia[ 74 I 57 I 6481 *-4T 1 0 5 4364 3267 1gn11P10le323524 IYIoN protein [Bacillus aubtilis[ I )4 I 62 10981 43-- -4 18 7 6864 j7592 fgnIjPIDje257631 merhyltranaferase [Lactococcus lactisi 7 4 1 56 f 729 1- 44 4 il I2 478 146 IgnllPlDIOI32O IYqgZ [Bacillus subtilis[ I 4 I 45 I 3331 1 17 9 16167 1 6787 IgnlIPrDId100479 Iwo. -ATPaaa subunit D [Enterococcus hirae! 1 74 I 53 I 621 I 9 I4 I3008 13883 IgnIIiDIdlOOS8l jhigh level kasgamycin resistanco [Bacillus subtilis] I 74 I 55 876 57 12 243 824 i [173373 thltdDA-rti-ytiemethyitransferase [datli [Iaemophilus 74 j 48 181j 1 [64 1 6 13515 1 4249 IgiJ4l0l31 IORFX7 [Bacillus subtilisi 1 74 1 48 735 17 7 j5446 5201 19114 13927 lipa-Ir gene product [Bacillus subtilisi 74 1 55 I 2461 T 171 1 1 j1818 IgnlIPIDIdlO22Sl Ibeta-galactosidase [Bacillus circuians[ I 4 62 1818 i4 4- 46-7- i 2 g I534 g2#[ transport ArPase protein C [gtC) SP;P22037[ [H1aemophilus 74 8 32 18 j361915366 Iintiuenzael 24 J 4 F4 -4 189 Ill I 6491 7 174 lgil 661199 jsakacln A production response regulator [Streptococcus mutans[ 74 6- 1 684 210 2 1520 1 287 g9112293207 I[AF008220[ YtmQ [Bacillus subtiiis[ 46 6 i 4- I 21 I1 836 192 1911666983 Iputative ATP binding subunit [Bacillus subtilis[ 74 556 685 263 3 161 j6 365 3gi66 232 137.7 -protein Inaubt-oMeri- 7- 645- 037- 1 Y repeat region [Saccharomycea cerevisiae[7 J 4iJ 23 I 265 1 2 844 j1227 191149272 lhsparaginaso (Bacillus licheniformisj 7 4 I 64 384 4 I 368 1 1 1942 
1911603998 Iunknown [Saccharomyces cereviaiae[ I 4 I 39 9421 i -i 4 O I5106 1 449 :IgnIPIDje305362 lunnamed protein product [Streptococcus thermophilus[ 73 I 47 1 258 I 3 f2 52 j24 Inl~lIdlOOS76 Isingle strand DNA binding protein [Bacillus subtilis[ I 73 I 55 I 279 1 -2--522--i244---In-1jP1 1 32 1 6 15667 1 6194 IgnIPDIdIOl3Is [YqIG fBacillus subtilia) j 73 so5 528 1 34 115 11,3281 19790 IgnIlPlDIdlO2ls1 J([A001684[ 0RF42c [Chiorella vulgaris[ 73 j 46 4921 4 4 9000 *Vol0 :a TABLE 25. pneumoniae Putative coding regions of novel proteins Similar to known proteins i D l jtt) j In ntl acession n I n), 40 j2 j9876 9226 jgiIll735l7 Iribofiavin synthase alpha subunit (Actinobaclllus pleuropneumoniaej 1 73 1 55 651 l2T ft f 2 3592 839 lgnIjIlDIOl887 Ication-transporting AT~ase PacL ISynechocystis sp.1 I 73 60 2754 f I 55 118 117494 116586 lgnlIPIDle265S8O junknown (Hycobacterium tuberculosis) 73 52 I 9091 16 1 7213 7767 1011143419 Iribosomal protein L6 (Bacillus stearothermophilusi 73 60 555 66 300 369 gnljPlDje269883 jLacF (Lactobacilius casei) 73 52 I 3601 7- 0 5557 j5733 -gfl1-857631 lenvelope protein (Human immunodeficiency virus type 11 73 60 177 4 6133 8262 1g111e;26 -a tsirnfra (Srpoccu pnu e 2130s- -f 8--9 F -3 851 3- I--A---0----0---transporter--- (Bacillus----subt----l--- 849- -019- 1-6 -95 -ft- 6482 7- 1 80 j '17 8113 9372 10i11377823 jaminopeptidase (Bacillus subtilia) 73 60 I 12601 I 7 j349 j1668 IgnlPIDIdlOl954 dihydroxyacid dehydratase (SynechocYstis sp.( 7 3 I 54 1722 97-- 0 I 8 9 I6912 7619 1gnljPl01e3l4991 jFtsE [Hycobacterium tuberculosis) 7 3 54 708 110 f 108 III 110928 ~10440 IgIl388lO9 Iregulatory protein (Enterocoocus faacalis( I 73 54 489 128 6 3632 4222 ~g~65l orfl091 (Streptococcus thermophllusl 73 1 63 j 5911 *1 8 1 1 -f 1 138 1 2 11575 1394 IgiIl47326 Itransport protein (Escherichia colit 73 60 1182 1 14 13 112538 j11903 1pir1E5340210534 Iserine 0-acetyltransferase (EC 2.3.1.30) -Bacillus stesrothermophilus I 71 55 636 4 f- 162 5 5701 4991 
IgnlPIDfe323Sll putative YhaQ protein (Bacillus subtills) 7 3 so5 711 ft- 164 4 223 270 gi15206 hypothetical protein ISP:P257681 (Hethanococcus jannaschii( 1 73 j 52 468 144-- -4232-- 16 55 56 111410137 jORFX13 (Bacillus subtilis) 1 73 j 56 732 f 170 j 5 4394 5302 IgnljPIDld100959 Ihoimologus of unidentified protein of E. coll (Bacillus subtilis) 73 46 909 F T 178 93 I 48551 101146342 modulation---- protein B. Vend (RhisobiusB lotit I '3 56 I 963i-----oti 1 1 6-1 9-3- 1 204 1 6 j 50396 14278 1on11PlDje2l4719 IPIcR protein (Bacillus thuringiensis) 1 13 1 41 1 819 1 21 3 07 IiI626 Ilooa rti lhm lg euneseii DNA---bi-- -ng- 213--2--832-2037---------6--296---riboso------protein--SI--ho-o---g--se--uence--specific D-A-bindin----protein- I I I I ILeuconostoc lactia)I j 5 J 26 8 287- 141- -ft- som- ei- -21[Bac- -us- 4 6 231 2 84 287 i 40 3 o-o- og of Ec l rioos prti 2 (Bclu ubtii----s- 61 204- 23----f50 t- -ft- ine- 4 -ft- oli 73 1 i 4- 4 TABL 2 e c c c S. p a *uatv coin ein *f noe proein iia to kn.w prten S pneu-- onlaa- Putat piI0271RM i v o da rtin regin o- novlro tens similar to know proein 316 1) 4691 IgIPD~II3 amntls Intlhoy i s. 
7c2so 1 52 4 289 7 13743 1 320 IglpII277IR7H Iiooal pohroin LiLiz Hi yc riru re deyrts cl 72 596 441 7 4- 1 22 11 11637 112469 IgnlIPlDldlOl833 laidsoe reesn coSynechocyatis 72 52 1527 4 7 71975j7647 1911146976 musSatltN l ssc ri ha s ilsteropil 72 52 1521 4 2 19--sI 15037 116224 1gn11Pl0dl297 Iribnoo m acasiact r (nercocyis ap(2 56 882 4 33 61 1121191 114256 lonlIID dlll IaoRF rerso (Streptococcus mutanal 72 58 68 F4 S 46 29 117 11127 IgJS6S143l jasparty l- ta m s ne Atas a nspoermster ophilsj 72oei 52ll, 1521coac I 3 I 4I59 6 1328 gnl1P50c87 junknow (cousbtiumtueruoae7 56 12 5 4 5 144 4636 Igi1153723 [lactose2 reprsso (Bcisteoccu mubt ns 72 58 1681 4 48 j2 14598 j 12538 1911142l21 inhxib i yiidn bea--ubn tOvi s (a cilusasjii 72 335 2071 48 19 1172 12424 88'2132 JIP 4 pylon gltarin ABsChtrianspor emse rti (lP falcbc 72 J 9 49 5 1 5 1 5 4293 3288 jgnlj175 0 8 B lnb Bc il eisac prn eglt ypoei (B ilus subtilisi 72 44 1242 1 71 1 !0454 5222 19112293230 (AF008220) YtbJ (Bacillus subtiiia( 24 7 -i 72 528 113 1126816 11239 191114 252 jd eox ri protdin phtlys[Bacillus subtill al 72 52 2518 9 451 55 476 g3 111825185 sktl0F o34 GuscG sart m Echaneric h a -u ni coqul) Iaal 72 38 1 8271 91--i7_ 4 4- T AB L 2 9. 999m a Pu at v coin 999on 999 999e 999ein 9ii a ok o np o e n 99 9999--- 9---99-99 TABLE 2 17 pn jeeonise -putative coig eioofnvl protein aclus ls' imia to kno0 proein -I 09-- ti lO F -34- 4) Star 3- Sto march malacin e nhpa e reuatr aimei B~c u identis length33 S- FI 2 2004 j l------gn-IlIPIDl---e 8323S Igputaie-id n Asp23ami protein (Bacillus subtilie) 72 40 288o 10956 1 14 2 9 11 giJ43331 lalkaile phosphase reg Culor piroein(ails utls 72 j 52 1345 146 3 1735 1247 1gniI24 39 hypF0th775ica rotyei as (acis ubills sbi s 72 45 74 66 h3 55 229 11422 -yptetiNa lroei [Bnarcoccus hubiram) 12 46 1002 F147-- i- 1 140. 
1i 5 60561 9203 191149232 71 ITP- ypnechoceou e hy rg n beapub ni jCotiiu an 72 56 10119 146 1 jqg S 96 14 iPD~344 hpteial rtn(aillus subtilis 72 so I 660 148 8 31 605 443 191I1973 INAD(PI-depnden d43 pId yaentonel-phoaps)ato reductascilus subtom a 72 41 9918014 105 109675 IgiPJD6d3 l3 jvqgnwn (maclus yce subtilis)l 72 50 582 10 1 301 2862 1 91 ji 7178 junknown~ a (acha oyce s cerevisise) so 5 141 267 1 3 j 49 91290513 1f470 (Eacherichla colil 72 I 48 441~ 28 2 1 9 540 Ig~PDd100964 homologue of aspartokinase 2 a]pha and beta subunits LysC of B. subtile 72 45 3601 290 1 101l1 1DI (B,4495 Ti a Is eubtilogos) to a 40.0 kd hypothetical protein in the htrm 3' 72 5I10 Li 0 I 10re4 gj749 gion from E. colt, Accession Number X61000 iMycoplesma-1)ke organiam)II 105 0 1 00 111 63 1587 1911746399 Itranscription elongation factor (Escherichia colt) 1 72 so 525 316 j1 1326 1 giJI58127 jprotein kinase C (Drosophila melanogester) j 72 40 j 1323 32 1 I227 j 3 lgnilIxDjdlOll64 junknown (Bacillus subtilis) 72 54 I 225~ I 54 I 1 105 gnIPD~lOO8 IC. thermocellum, bete-glucoaidasez P26208 19851 (Bacillus aubtilis) 72 52 1005 I I j0 S~t 1146 In1P1DI9264229 Iunknown (Iycobacterium tuberculosis) i 71 57 I 2334 I 7 20 116231 115464 191118046 I- ocy (sy care poti) edtse(uhelanceolats) 71 52 j 768 xay-ay-are-rtilrdcae(u I 15 1T 1291 2 gnIPIDIdlOOS1l Ireplicative DNA helicasef (Bacillus subtilis) 71 51 1296 *ea So 0 @0 e TABE 2S. pneumoniae -Putative coding regions of novel proteini similar to known proteins f Contig ORF Start Stop watch mtch gene name t aj% ident length I ID 1ID Itnt I tnt) aceslon I n t tn), 1I8 16-0 5120 i4218 IglP11O~S~qO(Bclu utls 51 903 29 1I 1 540 gi11773142 Jsimilar to the 20.Zkd protein in TETS-EXCA region of B. 
aubtilia 71 56 540 38 120 113327 113830 11537036 jORF..o158 IEscherichia col) 71 48 504 '1-A-F0-1 545-31-- u--a-e-Ioa-t-dprote-n-(ac-tobc---- _-1f---f561 I 0 2 75 265 IgnIIPIDIdl0132O IYcigZ (Bacillus subtills) 111 44 1 4411 71 1l8 J24679 126226 1gij580920 Irodfl (gtaA( polypeptide (AA 1-673) (Bacillus subtilis) 71 44 I 15481 -ft f- 71 125 130587 130360 1911606028 IORF,.o414; Geneplot suggests (rameshift near start but none found 71 50 228j I I I I Escherichia colil
8- f 72 14111991 1288 g1624085 similar to ratibeta-alanine synthetase encoded by DenBank Access3ion Number 71 54 888 j 1 4 S a 27 5281 cntans ATPIOTF binding motif (Paramecium bursarla chioreilaI I tI I virus 11 1 1 73 111 1 7269 7 033 1gi11906594 IPNt (Rattus norvegicus) 71 42 1 2371' 174 16 1101385 8517 1 gi11573733 Iproly1-tRN1A aynthetsse (proS) (ileemophilus influenael 71 52 1 1869 81 I9 !.5772 6578 1911147404 Imannose permease subunit 11-H-nan (Eacherichia coli( 7 1 45 8071 4 f- 186 15 14602 13604 Ign(IPIDIe322063 Iss-l.4-galactosyltranstersse (Streptococcus pneumoniae( 71 53 999I j4 3619 4707 0g12323341 I(AF0144601 Pepo (Streptococcus mutans( 71 58 1089 1106 113 10557 112955 1gi11519287 ILemA (Listensa monocytogenes) 71 48 J 603 (4 j2 1029 1979 1gi1310303 ImosA (RhIzobium mliloti) 71 55 I 951 f- 12 I2 5.64 1205 19111649037 IglutamIne transport ATP-binding protein OLNQ (Salmonella typhimurium) 71 50 642 13 519081 03 g IPD~I209H. influenzae hypothetical ABC transporter: P44808 (974) (Bacillus 71)1 13 76 giPtldOO9Isutiit J 1 1 1141 1227 19111673788 IjAEOOOOI5I tfycoplasma pneumoniae. fructose-bisphosphate aldolase; similar 7I1 49 915 Ito Swiss-Prot Accession Number P13243. from B. subtilis (Hycoplasa I II "0 11 Ipneumonialee 140 153 1' 497 jg9II D~1 0 09a64 hoolgu ofhpteia rteni aay synthesis genie cluster of 71 1 4 663 I II PHEUHONIAE. (Bacillus subtilis) III 09 6@ S 0 9@ 9 TrABLE 2 S. 
pneueoniae -Putative coding regions of novel proteins similar to known proteins f I 99 I 220;-51 1519 Igi(5353524 (CodY 0 0 Ieacllu su zbii s p.NR24 71 52 121 F ft 209 1 2 1 2022 1 1141 (gii42432 (fepC gene product (Escherichia coi) I 71 46 1 8821 f- a--ilu-ssub-Iis------- 16- F -210 F6 -i 3069 13-386-19-iI580900 (08F3 g-ene prodc (Bclu sutls 71-- 48-- 318- 4 1 212 1 2 1 3561 1 2381 (gil557s67 (ribonucleotlde reductase RI subunit (Hycobacterium tuberculosis( 71 53 1 2181 I 233 1 3 1 21303 1 2920 (gnilPID~d20132o IYqgR (BacIllus subtilisj 71 50 1 9181 244 1 13 1 053 Igni(PIDjd100964 (homologue of aspartokinase 2 alpha and beta subunits LysC of 8. subtilia 71 5 1 01 I I I I I I (acillus subtililIII ft 2521 2 1 1008 1 2874 1911755601 (unknown (Bacillus aubtilisi 71 1 46 8671 282 2 96 712 (91(1353874 (unknown (Rhodobacter capsulatusl 71 46 1 195(ft 4 -t f- -t I 338 1 1 683 19111591045 (hypothetical protein ISP:7324661 (Hethanococcus jannaschii( 71 48 681 4 346 1 3 164 (gi(1591234 (hypothetical protein ISP:P42297) (lethanococcus jannaschil1 71 1 36 162 f 374 1 1, 1 619 1 2 (91(397526 (clumping factor (Staphylococcus aureus( 1 11 23 6181 377 1 638 2 (gi(397526 (clumping factor (Staphylococcus aureus) 1 23 687 ft 3 1 8 1 7119 16958 (gnl(PIDle269486 (Unknown (Bacillus subtilis) 1 70 42 1 462 3t r -pre r (St hyl s e d -r 4- 1- f6t1 1 114 (11024 (10254 (gnl(Pzodloo29o (undefined open reading frame (Bacillus stearothermophilus)( 70 55 771 ft 7 118 11413 (13719 (gnllPID~d101090 (biotin carboxyl ca~rier protein of aceryl-CoA carboxylase Synechocystis3 70 j 56 1 49 Sf I 9 I2 LI 1i5 287 (gnllPIo~dloosal (unknown (Bacillus subtilis( 70 52 j 771 12 (4 (26110 (1789 IgnlPID~dIo1195 (yycJ (Bacillus subtilil( 70 52 8221 21 (2 12586 1 1846 (91(2293447 ((AF00893oi ATrase (Bacillus subtills) I 70 54 741( 1 22 (13 (10955 (11512 JgiJI165295 fYdrS4ocp (Saccharomyces cerevisiael( 70 so 50 j (6 (4315 980 (91(39478 (ATP binding protein of transport AlPases (Bacillus firmus) 70 51 336 S 5 5*0 50 
a a TABLE 2 S. pneumonia. Putative coding regions of novel proteins timilar to known proteins Contig IORF I Start I Stop wach (mtch gene name S im (%ident. I length 4, 31 (I 370 113 (gi(662792 (aingle-stranded DNA binding protein (unidentified eubacteriuml 70 j 36 258 33 1;5 110o639 -9521 (igi(1161219 Ihomolgous to fl-amino ecid dehydrogenass enzyme (Pseudomonas aeruginosal 70 j 0 1119 4 4 I 38 1 6 1 3812 1 4312 (gi(2058547 (ComYD (Streptococcus gordonli) 70 48 5011 I 38 (2 17986 118417 (gLi(37033 (ORF..f356 (Escherichia coi) I 70 58 9 :1111:- 1 42 2 1 722 1 1954 19111146183 (putative (Bacillus subtilis(1 70 1 5 1 13 1 43 1 3 1 2373 1 1612 (gi( 1591493 (glutamine transport AlP-binding protein Q (Nethanococcus jannaschiil 1 70 8 76( 2 45 18 9197 1 8049 (gnl(Piojd102036 (subunit of ADP-glucbse pyrophosphorylase [Bacillus stearothermophilusi 1 70 1 54 1 1149 559 2 57 j 43 95 gnilPIDlOO275 Sneopu~cllu se (c llus sp)1 70 42 3907 70j3 t7 7956 0 (gnlID 7 6466 ca iop idae P a crococcuhse( ta lac i mohi Inl eze 70 528 1080 61 4 532 2437 (gnlPIDe2750974 ISnknw Bacillus er u i li 70 541 3Iasi i 4- 68 1 7126 (6962 (gi(1263014 (emml8.l gene product (Streptpcoccus pyogenes) 70 37 165 1 72 112 (10081 (10911 19112313093 I 1AE0005241 carboxynorspermidine decarboxylase (nspCl Ilelicobacter pylon) 70 56 831I (10 (7988 8124 (gi(1877423 jgalactose-l-P-uridyl transferase (Streptococcus mutans( 70 59 I 237 -T 4 79 (3 (3424 (2525 (gLi39881 (oar 311 (AA 1-3111 (Bacillus subtilis) I 70 47 900 1 87 110 9369 1 7324 IgnIIPIDle323SO6 (putative Pkn2 protein (Bacillus subtilisl I 70 52 2046 1 96 (14 (10640 (11788 Igi(1573209 (tNA-guanine translycosylase Itgt( Iflemophlus influenisel( 70 52 1149 1 113 12 1 574 1 1086 (gi(43363o (AI80 (Saccharomyces cerevisiae( 70 1 59 I 513 123 1 2901 I3461 (gnl(PIDjdloosss (unknown (Bacillus subtilisi 70 45 I 561 4- 4 12 (S 493 (4282 (gnlPIDle276474 capacitative calcium entry channel 1 (Bus taurusl 70 35 312 4--1 4-93- 4 4 129 5 4500 1 3454 (gnl(IIdI114 (YqeT 
(Bacillus subtilis)( 70 47 1047 13 (3 208 (1394 0gi229312 (IA008220) YtfP (Bacillus subtilial 70 so5 1215 135~ 1(40(62(njDe2s3(yorfE (Streptococcus pnaumonlael 70 47 1 243 I 137 3 (438 1 932 (gij1472919 (v-type Na-ATPase (Enterococcus hire) 1 70 57 495 18 II (40 3 gi1j147336 (tranamembrane protein IEscherichia colil 70 42 1 438 4 i 1* *I I *nt at a ac aI*tla, TA L 82 S.69 pneumoniae' 1-a n Put tiv ing einszoy oe l prcotins csmila to knwnprten -67 -119 318796 274 jgni97 lP I24 E.NS-ma hy tet alrot atein ho 3cys 05n (Sachrom1e (Bcl ufut 70 j 53 2433 9- 167 l I 2637 1 6869 1gnjI49 3 2ID alGan ie ctivtie zym ic euLcoailscsi 70 5 2569 *T 1 20 42 1175226 2473 IgnIPIDjdl2O489 IC1 thypoth etical protein P385 273(acillus subtilis( 6095 4 00 l2-- i14--92 9-1 9- 2 16 1-4 11 8 43 IgnlI2~ 2 3 7 g ihyotA tical2 protei (Bacillus subtil s 69 44 119001 -9 22 1 18492 19461 ljln~PI~dI83 8 Iipa-8d n e prodluct Bcil lu sbt ia( 69 51 66901 27 6 5 146 584 19112209379 1 I(AF0067201 PrTO N (Bacillus suB ilus sutl 69 J 8 5143 36 22 0 79924 11 11 865 ,1 lI 1 6dlooS o u ow Bc l lusN s bt ilinge (S a h l c c u Ie s 69 1 51 28660 32 11 2 98 1 1067 IgnlIP(DdlO5B junkown (Bacillu s e ubi jAcl g n se t oh s 69 58 189 9- 9- 271 40 1 5813:7 11148 Ign iJ ldO2 O l oli a IjAn001488 FU C I N A UNKNOWN. 
iu s subtoils i f ue z e 69 j4 2812 94- 1- 5- -emop--lu-sin-f--en--e 9- 36 1 60j7941 11 0 19143 8791 starchcy ct raly og n synthetas (Stph lc s urus( su tl s 69 47 1428 9- 9- 9- S 1 40 14 j124912 111944 igiljlS7j32B 0 jo ia junc t ei n DN ahellas (uvtli(ampils iiuere 69 44 612 I 40 11 1 75936 1173 1g91176530 IsOlA-3- thyla e ne s yc odas IHG (tagl (N eo ph s pinten3ae( Ira 69 5056 9 -9 -9 1 55 6 69476 1 5403 1gi158088 8 Istlya)c (bactera lcg ytase (Bacillus subtilis( I 69 1 5 1458 594 13 124 124153 IgnlIPIDje3I38 Ihypothetical protein (Bacillus subtilis( -38 9- 62 3 6 [61:33 51 1 113962~ O97 5 jsimteia t p rpotein syease ystem enzym Ii (shrci1oi 69 36 78019 4929 1 (similar to A a i es I~co o c s a t a 69 40 1 346 75.3 8 338-77-6 jil39642 0 Eahrci c i eurpu pG D -bl- 3--epimerase---69-------- 0 6 24 033 19(1 14385 Ipoly po lera se Bac oillus sutEC s .5 '~co o c s I C L r r 9 I 59 1230 T 9 9 1 6 1 1 726 762 i29 gJ307 banz e1 (pu paie (tococcus lactis( 69 j 42 J 145 111 6 110017 110664 lgnlI1Dle322063 Iss-l.4-galacrosyltransferase (Streptococcus pneumoniae( 69 39 1 648 aD *II *n Int *csso *e I ai 0 0on *e TABL 23 1296 neuaniae4 Puatiecin regpions of novelpteins siia to known protein 71 s2 123 1796 Ign11IDd7605 (DEcadhern pDrosopilafuU mel aoteri jnleza 69 30 1 237 4--8-T4 77 1 0 27 piIC287870 3 (gros geneo prodclu I sac tococ s ac alI 69 1 46 1 67 1574 S 362 10335 gi113730 fuose o rony proteinamd i fuc yltaemophilus inf u -n ae i lu (utl 69 6 529(48 I 5 2 21 96 gi19097 (lEN-response element binding factor I thus musculusi 69 48 297 91 15 13678 14274 JgJ541 anaerobic rtbonuleoalde-triphosphate reductase activating protein inrdGI 69 '44 597 I 98 1. 
:124 7 j4032 jgnl(P Id100262 (LivF protein (Salmonella typhimurium( 69 51 786 I 0 (O 1 1'1085 5056 (gnl(PIDje25i7629 Itranscription factor (Lactococcus lactisi 1 69 j 49 I 972 26 1 3 1 :1018 14568 ignilPlD~d101329 lYqJJ (Bacillus aubtilisi 69 49 I 1491 1 131 1 6 1 1121 1 2889 IgnlPID~d10l314 jYqet (Bacillus subtilis( 69 47 I 1233 136 2 1 1505 1 2299 (gnl(PIOjdloossl (unknown (Bacillus aubtills I 69 I 47 1 95 I' I 149 1 I 9336 110655 (g111il (omology with E.coli end P.aeruginosa lysA gene: product o nnw 9 I I I I I I function; putative (Paeudomonas syringoel I 1 I 1 I 1 151 4 391 3829 (gi(1710373 lerno (Bacillus subtilis( 69 44 I 6391 160 49 (2324 (gnltiuoDdloosa2 (temperature sensitive cell division (Bacillus subtilie( 69 49 1476 4 180 1 566 3 (gi(488339 (alpha-ainylase (unidentified cloning vectorl 69 s0 564 4 212 1 1( 1196 (231 (gil1395209 (ribonucleotide reductase 112-2 small subunit itycobacterlum tuberculosis] 69 53 966 4- 1 226 1 1 2 1 661 1pir1J022851J022 (nodulln-26 soybean 69 1 41 1 660( 233 349 466 (gi4791 I-type Ha-ATrase (Enterococcus hirae( 69 1 56 1538 23 3 I£6 16 g1148945 (iethylasa (llemkophilus Influenrael I 69 1 43 1107 45-- 4 243~~~ (RF 2 6 31 (nj~~lo2 or Barley yellow dwarf virus) 69 1 69 1497j -4 gn1--PJ-j-1---I 251 (3 1 2899 1 1967 (gi(2289231 Imacrolide-efflux protein (Streptococcus agalactiae( 69 j 51 933 310 1 1 1 1 1 282 (gnl(P1Dje322442 Ipeptide deformylase (Cloatridiun beijerinckii( 69 55 282 4- 1 369 1 1 1 868 1 2 Igi(397526 (clumping factor (Staphylococcus aureus) 69 1 22 1 867( -4 #4 4 4 -4 -4 1 370 1 1 1 749 1 3 (gi(397526 (clumping factor (Staphylococcus aureusl 69 1 21 1 747 0 C a TA BLE 2 S. 
pneumnoniae Putative coding regionis of novel proteings *imilar to known proteins Contig 1 OAF jSrart I Stop match I match gene name im j Ident Ilength I 0 11 ID f ntl j Intl I acession I Intl1, 39 j 1j 44 1280 IgnIIPIojdi0O649 IDE-cadherin (Drosophila rmlanogasteri 69 30 j 237j 4 388 I 26 0 72 g1772 A002)hypothetical 32.7 kO protein In trpb-btR iR tre crgo gi--72 JlEscherichia coli rein61 4 J ej,~ I I 2 2006 3040 lgn1lIlDIdI18O9 jABC transporter (Synechocystis sp.( 1 68 1 43 I 1035 -2T 20 4- -4 1258j1526003958 1 282992 Ihietidine kinase (Lactococcus lactis cremoriel 1 68 1 4 1359 4 4- 4 4 I 1 I2 )90 131 PirI169741RB Iribosomal protein L9 Bacillus stearothermophilus I 68f 561 4801 4. 45 4 16 6 7353 15701 1 911i78704i I 1A5000184) 0530; This 530 as ort Is 33 pct identical 4,14 gaps) to residues of an approx. 640 aa protein YNESJIAEIN SW; P44808 IEscherichia------- oi 17 112l 1 6479 1 6805 1gJi553165 lacetylcholinesterase (Hlomo sapiensi 68 j 68 321 2 I' 114128 114505 1i142700 jP competence protein ittg start codon) Putative (Bacillus aubtilisi 68 j 40 378 22 132 124612 125397 1gi1289362 IcomE 0891 (Bacillus aubtilisl j 68 1 36 786 30 113144813488 i llslOFF (Azorhisoblum caulinodansl1 68 46 j 261 i7--i5448--4- 4 36 5 I3911 i45-85 19 111573041 hypothetical lHaernophilua influenzsel I 68 54 I 675 -4 4 46 6 15219 6040 19111790131 f(AE0004461 hypothetical 29.) kO protein in ibpA-gyrB intergenic region I 68 82 I I I 1 IiEscherichia colil I7 Ill 54 10 I6235 j7086 1911882519 CG Site No. 
29739 lEacherichia colil I 68 55, I 852f I 5 IS 769 5165 IgnhIPIDIdlolgl4 IABC transporter (Synechocystis sp.( 68 45[-i 1905 71 I 3 I 6134 15613 19111573353 jouter membrane integrity protein (tolAi (liaeophilus influenrael 1 68 50s 5221 4 4- 4- I 71 110 115342 116613 1911580866 lipa-ild gene product (Bacillus subtilis( 1 68 j 31 I 1272 1 71 112 117560 118792 191144073 ISecY protein (Lactococcus lactisl 1 68 1 35 I 1233 4 -9 '3 116 110208 I9729 1g111353537 jdlrPase ioacteriophage rlt( 68 j 51 1 480 4-4-T 4 4-511 4-1 87 17 1411 115866 1g11150209 joaR 1 (ilycoplc.-.a mycoldes) 68 43 j 1626 89 1 89 Ill I 8021 1 8242 1911150974 14-oxalocrotonate tautomerase (Pseudomonas putidai 68 1 43 I 222 I 7 9 a 8 75 1 53 94 112367358 1 l0scherlchyp othcl( 52.9 kO protein in aidB-rpsp intergenic region 1 68 j 41 1362 00 ~00 t gogo* TABLE 2 S. pflCutonlae -Putative coding regions of novel proteins-similar to known Pyoteins oni OA Str Stp j match Imatch gene name I ID 1 Int) Int) j acessionsio IetI1inih I idn legt 91 4 -4 438 ;njIjIO6 Lv rti 4 1 6l------rotein Salmonella typhlmurium( 68 I 40 j 89 4- 1 j16414 111280 1911455363 Iregulatory protein IStreptococcus mutansif 8 50 1 867j I 115 I3 1 5054 1 3693 1911466474 Icellobiose phosphotransgerase enzyme if'' (Bacillus stearothermophilus, 68 44 j 1362 4 125 j j292 j192 1115066 trnsemran potin Bailus ubila)68 1 50 1 1002 I 3132 i2 -i4858 -i 2888 I;gnlI Idloii32 JollA ligase (Synechocystis sp.I I 68 52 1 1971 140 J7 1 7765 I7580 19i1120971l Iunknown lsaccharomyces cerevisiaei)684 186 IS 39 1 3 -'IA P- r -L>o in i -h ydro 1a se lMus-musculusi 68 59 537 164 1 j 58 867 IonlIPtolazss1l 4 Iglutamate racemase (Bacillus subtilis( 68 49 j 8101 -164 V2- I119 I 1835 Ignilrrolelsllt7 Ihypothetical protein (Bacillus subtilisI 68 1 s 1017 iIG9 -69 i7 3946 j 4104 1pir185454s18545 Ihypothetical protein Lactococcus lactis aubap. 
loctis plasmid pSL2 I 68 j 40 1 159 4- I 170 I4 1 4247 1 4396 1911304146 Ispore coat protein (Bacillus subtilis( 68 52 150j 171 8 (1,002 7054 ;1i138722 Iprecursor (s -20 to 381) (Acinetobecter calcoaceticusj 1 68 54 I 1053 0 21 I2 99 198 I- 1 2473 j- i 1 8 71 Ign l IP Io lehlio, 5 Ih nbothe tica l p ro te in (B ac illp s su b tllis( 68 J 46 1 603 1 4 214 j8 14926 1 j 31 IgnlIPzoldloao4g Ill. influenzae hypothetical protein; P43990 1182) (Bacillus aubtilii 1 68 so 696 4 4- 217 16 4955 151170 IgnIjPlIje3z6,66 imlrto B.vulgaris CtIS-associated mitochondrial (reverse 68 36 216 S'stra:sriptase1 (Arobidopsis thelianal -218 7 1 3930 I, 4-45 I-giI22931 9a I(A F00822 01-YtgP (m acilium subtilis l 1 68 j 38 i 22 I6 68 438 1nIPlOI 0 32s7,i I(AJOOOOO5I orl (Bacillus megateriuml I 68 51 291- 220- 1 462 438-- 28 j1 746 J 108 1911410137 JOR6FX13 (Bacillus subriiis( 68 I 46 639! 237--- 2-675 -j--1451--1- -ii39--34- ch- c- 7- 7 I -4-4 1 24 11 57 155 gJ7 10 AE000l89l o648 was o669; This 669 ae onf is 40 pct identical 'II gaps) t 11 6 254 191117871o5 217 residues of an approx. 
232 a& protein Y8BA- IlAEIt SW: P45247 to63 j I 337' 1 1 1 774 IgnIjPlDje26lsgo 1putative onf (Bacillus subtilisj 1 68 47 I 774 4 4- 345 3 i653 19 i11149513 Ithymidylate synthase IEC 2.1.1.451 (Lactococcus lectisI 68 1 j J1 651 4 TA L nuoi e Puat v *oin reg on of noe prtinrd a 0to know pro0ein TABLE22 49 g 5 4 pnM.na ut aive rec coding region of5vel proteinr-uimjlar to known7proteins 1 3D 1 1 5:9 1nt 459 Int l 2237 I(F020 acegnio trnduto reuao jBclu mc), 7 0 9 9- 2 4 j1 1226 j167 IgLI4l5911 lIpa- ene i prdcte l cingu regtions 14357lltaoccu ansh 67 J 26 1026 3 j6 597 4917 Igil229375 gtc gene80 arognat trancti on reguslao (Bclu Kutls 67 41 830 29-- -OT 9 4 4 9 1 63j2 11 606384 116758 gi142091 ipaId gene product Iceilus subtilis j 67 471 6902 9 9 9 9- 9 I 22 13759 78912 jgii1537037 pyrroln le-scaroylat reutae(ctndi eicoa 67 j 51 78 48 29 1 10 8 :144 19721 giJ144 IpgtcR gene product (Bacillus bre is 67 41 731 5 9 9--9-i-i -9 1 3 1 1 1 849 110150 giIj54291 Ib ria ge nrodu tio I e ai n rersoolil lamphlsIfle 67 40 1329 9 I 6 169 91410 81554 jgn15921 IABCtrasotr poalT-binding subunitrasprtr ISh loc ethanococus jnachi 67 so 717 71 8 12 5137 16141 gi1j19767 vI0RF og6 nIcano cherihac llu 67 52 5081 81 75 I 1104289 4981 IgiII55l1i0 Ibah ing enymuaenoe (gigesp als steentro athr ophilu l lu 67 12 148 7 485 1232:14 1714 lgij4139496 Ii la- rrdiee product (Bacillus surubtil 67 30 8 8 9 T I 5 2 113 92 gnjPDPlS33 l~qia I clus subiris) 67 8132 i 9 97 35 1 1174 111946 lgnI1P101do250 la O I sA003 nacchaprotyesn prc IS e cccsmuas 67 j 51 79e1 9 6 0 1 60 8 43 IgiiP18264l1 gAT ebnden prcete tas poerteri Ai (Sahlccu1ues 67 365 882 106 LiCD-----i-n-li-em-phil -p-tative-el-o-iose-po-3ph---- 9- cc c sc c ec 0 0 cc as cc0 TABLE 2 S. 
pneumoniae - Putative coding regions of novel proteins similar to known proteins
Contig ID | ORF ID | Start (nt) | Stop (nt) | match accession | match gene name | % sim | % ident | length (nt)
115 | 7 | 8421 | 8077 | gi|466473 | cellobiose phosphotransferase enzyme II' [Bacillus stearothermophilus] | 67 | 51 | 345
127 | 13 | 8127 | 7021 | gi|147326 | transport protein [Escherichia coli] | 67 | 45 | 1107
136 | 3 | 2215 | 2859 | ignlPIDjdl001 | unknown [Bacillus subtilis] | 67 | 49 | 645
140 | 21 | 23317 | 20906 | gnl|PID|d101912 | phenylalanyl-tRNA synthetase [Synechocystis sp.] | 67 | 43 | 2412
146 | 16 | 2894 | 1893 | gi|2182994 | histidine kinase [Lactococcus lactis cremoris] | 67 | 44 | 1002
1 1511 8 11146 111117 ignlIPIOjdlOOO8S 108F129 (Bacillus cereus)1 67 48 36051
160 | 10 | 7453 | 8646 | IgJ21317 J~128 | Orfa; similar to a Streptococcus pneumoniae putative membrane protein encoded by GenBank Accession Number 199400; inactivation of the DriB gene led to UV-sensitivity and to decrease of homologous recombination (plasmidic test) [Lactococcus] | 67 | 46 | 1194
163 | 3 | 3099 | 4505 | jgnIPIl3IOl3I7 | YqfR [Bacillus subtilis] | 67 | 47 | 1407
167 | 8 | 6704 | 5454 | gi|1161933 | DB~t [Lactobacillus casei] | 67 | 45 | 1251
169 1 4 12322 28379 IgnIlPlDIdlOI331l YqkO (Bacillus subtilisi 67 41 5585
11 11 766 384 1911381 pneumococcai surface protein A (Streptococcus pneumoniae( 67 50 729-
18 I3 I 93 372) 19111542975 IAbcB (Thermoanasrobacrerium thermosulfurigenesi 67 5 46 1794
19 6 3599 53141 5gnlIPlDle325178 jHyporhetical protein (Bacillus subtilis( 67 52 4595
F19---T T 25 3 51663 j2211 1911606073 IORF-0169 (Escherichia colil 67 5 7[ 5491
207 4 2896 1456 19152276374 IDtxRhiron regulated tipoprotein precursor (Corynebacterium diphtheriael 67 5 49 561
217 | 3 | 4086 | 3703 | gi|895750 | putative cellobiose phosphotransferase enzyme III [Bacillus subtilis] | 67 | 42 | 384
246 ;2 i2-j 91 ;662- -g11 8 4 2 438 unknown (Bacillus subtilis) 67 43 372
1 25 7 45 19112351768 IPspA (Streptococcus pneumonlae( 67 41 744 4
265 | 3 | 1134 | 1811 |
19112313847 (AE0005851 L-asparaginase 11 (ansB( (Helicobacter pylonl) 67 j 42 678 F 25 1 5 1 375 19112276374 IDtxR/iron regulated lipoprotein precursor jCorynebacterium diphrheriae( 67 43 3755 1 7 4898 j5146 1gnl1Pl~le255179 Iunknown (Hycobacterium tuberculosis) 66 j 56 249 4- 3 5 1 3 89 1 3 lgn11P1Dje269548 IUnknown (Bacillus subtilis( 66 48 3871 3 120 515267 120805 Igi 139956 51101c (Bacillus subtilis( 66 so0 1539 I 3 I255 2718 5gi51787564 51AE000228( phage shock protein C (Escherichia colil 66 5 36 5 174 4---12545T I 13197 112592 lg151574291 5fimbriai transcription regulation repressor (p118) Jllemophiluc influenzael 66 5 46 6065 TAB Ea S. .n a Puttiv coin rein nove poen -i a *e o knw roen a a- a a- 15 112 1109( 1nt 987Int 343 IE00 )tnltolonainfco FT tt aceasioncte p r i I6 I9 nt 1 4 4 16 1 2 1 13212 1 74 1gnllP1Djde2628 junkOwn4 (ycobFacrlu tubclsl j 66 j 437942 2 2 1 2 1 1469 1 1200 lgi124071 [signalOT setiatyp odo (B aclsthcuringciessi 66 42 287 0 221 12 7 109 582 98797 jgiI23l4738 2 l1AE0006531tan hslat engao facrtocu-T itf helicobacter pyo 1 66 1 9 1 0836 4- 1 6 22 120 1124 1713 jgnIPIDedl224 j(8001551y Bcillus subtilis 66 50 959 220 2 5302 18516 1gi121409 jAegna 7 p tiAse tynpoe r -bn n (Lctccc o a ti n jyc)11lcb 66 38 3 2 12 1 1194, 11718 JgnlJPID2 8l1 jOitL (Bacillus suboltis 66 50 9451 33 113 8352 17234 0111I387979 44% identity over 302 residues with hypothetical protein from Synechocystls 66 44 11 I Isp. accession D64006_CD; expression induced by environmental stress: some I I Isimilarity to glycosyl transferases: two potential membrane-spanning I I helicas (Bacillus subtil 34 6 1 5658O 4708 1gn11P10hl250724 lorfa [Lactobacillus sakel 66 39 9510 I 34 14 19792 19574 1gil590997 IM. 
jannachil predicted coding region M4.0272 Ifethanococcus jannaschil( 1 66 48 219 I 35 11_ 156_140 jil735 jCap5H (Staphylococcus aues 66 46 663 1 36 19 1 61-'3 16976 jgi11518680 Iminicell-assoclated protein DivIVA (Bacillus subtilis) j 66 35 I 804 36 III 110396 110824 1bbs1155344 linsulin activator fatr INSAE (human. Pancreatic insulinoma, Peptide661 9 1 Partial 744 &a)l (Homoaa.piens( 7 2810 4192 IoiIP1D2504 jhypothatica Yprotei(Balu s ubtiG234si 66 1 50 1392 4- 1 52 14 13595 2789 jgiJ38B565 Imajor call-binding factor (Caspylobacter jeJuni( 66 1 52 807 4 1 54 1 3 1 2662 11076 IgnlIPIDIdlOl83l Iglutamine-binding periplasmic protein [Synechocystis spI1 66 1 43 j 1587 6 110 j9740 9183 IgnIIPIDIe1S4I44 Imdr gene product (Staphylococcus aureUsl 66 44 I 558 4 1 72 113 110893 111993 10i12313129 IIAE0005261 H. pyloni predicted coding region HP0049 Ilelicobacter pyloril j 66 1 44 it01! 4 1 4 19 113267 112476 1giJ157394l jhypothetical (leaemophilus influensael 66 1 43 792 1 75 1 1 1 2 1 868 JgII574631 Inlcotinamide mononucleotide transporter IpnuCl Illaemophilus influensaei 66 1 48 f 867 I 75 I7 I5303 ;4275 10111 put. £80 repressor protein (Escherichia colil 66 1 40 1 1029 c a a 0 .a 90 4 TABLE 2 S. pneumoniae -Putative coding regions of novel proteins itiilar to known proteins I onigj~rj tat Stop match watch gene name aim Iident Ilength I ID JIt I (nt Intl I ecesslon int) 82 7 I6813 8123 IgnIIPIDIe2SSI28 jtrigger factor (Bacillus subtilis( 66 j 53 J 1311 1- 83 13 1905 11219 1pirlC33496lC334 IhIsC h omolog Bacillus subtilis 1 66 44 I 315 1- 58 s-hii-a-tekia-e 4 88 10 I7001 6060 IgiI2098719 Iputative fimbrial-associated protein (Actinomyces neeslundlil 66 52 942 481- 89 95 0-X19 (Bac llu -ubt-- 66-- 4- 9 93 7 1 3661 2711 19111787936 1 (AE000260i f298;aThis 298 a. ort is 51 pct identical (5 gaps) to 291 16 9 951 (04 I I------esidues of an approx. 
304 aa protein YCSHBACSU SW: k42972 (Eseherlchia j I I- 4- 1 14 13 18.105 13049 jgi11469784 Iputative cell division protein ftsW (Enterococcus hirae( 66 48 1245 106 11 1356 1423 9140027 Ihomologous to Ecoli gidft (Bacillus subtilis) .j 66 52 678 107 13 1965 11864 Ig1I144858 JOR? A (Ciostridium perfringensl 1 66 49 900 1 6- I 1 1 3 30 gi1727367 H4yr~p (Saccharomycas cereviaiae( 66 56 300 i F 1 4- F 1218ju59 111046 IgnlIPIDIdlOll63 10RF3 (Bacillus subtilial 66 48 714 1 128 111l 8201 18431 Igi1726288 Igrowth associated protein GAP-43 IXenopus laevis) 66 41 j 21 I 11 8 4894 j4508 Iq11486661 ITMnl. related protein (Saccharomyces cerevisiae( 66 I 39 387 1 140 13 13236 12574 IgiJ40056 Iphor gene product (Bacillus subtilis( 66 1 36 663 1 S4- -140 j16318 11l5434 19111I658189 S.10-methylenetetrahydrofolete reductese (Erwinia csrotovora( j 66 1 48 885s -i 1 47 6 17137 j6154 1gij472326 ITPP-dependent acetola dehydrogenase alpha-subunit [Clostridium megnusi 66 48 984 19 6 4435 5430 jgnlIPIDIdlO1887ljpentose-5-phosphate-3-epimerase (Synechocystis sp.( 66 j 46 996 4 -i T 1 49 I13 110754 111575 1gLi42371 Ipyruvate formete-lyase activating enzyme (AA 1-246) (Escherichia colil 66 42 822 16 4 2578 2270 IgnIjPlDIdIOll99 IORFIl (Enterococcus feecalici 66 41 3091 4 207 2 2340 2597 IonlIPlDle321893 jenvelope glycoprotein gpl6O (Hluman Immunodeficiency virus type 11 66 46 2 58 S 10358 3678 jgiJ49318 oav gene product (Bacillus subtillal j 66 j 46 j 321 4 217 18 15143 15355 191149538 Ithroebin receptor (Cricetulus longicaudatua( 1 66 38 213 220 14 13875 j 3642 1911466648 lelternate name ORFD of L23635 (Eacherichia colil 1 66 33 I 2341 0 Ce. 9. CCC *e *e TABLE 2 S. 
pneueoniae -Putative coding regions of novel proteins similar to known proteins Coti Str Stop match Jmtch gene name j i r ident length ID ID Int Itnt) I acession I t I I 23 j i j 1073----I- 138 lgnlljPl01e247l87 js-inc finger protein iBacterl-ophaege phigl-Ie) 66 f 5 933 I 4 0 1 224 1 2 1 1864 1 2640 IgiJ1176399 Iputetive ABC transporter subunit IStephylococcus epidermidisi 66 41 1 777 1 243 1 1 3 1872 IdbIiA8000617_2 iAB0006l71 YcdIl il~acillus subtilis) 1 66 j 45 I 8701 4 1268 12 1891 1 568 jgiJ5l72l0 jputative trensposese IStreptococcus pyogenesi j 66 60 1 324 I 322 1 1 2 1643 jgi1499836 jtn protease ilethenococcus jennaschilii 66 40 642 I 5 110 113909 112178 IgiI1574292 jhypothetical Illsemophllus influenzeet 1 65 34 j 732 1 6 11l 1145 1190 1g11142854 hmologous to E. o i aci u geepoutlisj nienife protein from 65 48 726' 1 1l 106 IhStaphylococcus colredC genelu prout ed o nienife III -4 4 I 7 12 1647 1 405 1pir1C64l461C64i Ihypotheticel protein 1110259 Ilsemophllus influenzee (strain Rd KW2O01 65 j 42 243 F I T I T 6246 682 -g1 1 2 c--iu- sub--i--6-50576 I 10 1 2 11873 11397 jgiIlI63lll IORF-1 IStreptococcus pneumonisel 1 65 54 477I 1 16 13 11428 1 2222 IgnlIPJD~e325010 Ihyvothetical protein iBecilius subtilisi 1 65 45 I 7951 1 21 14 1 3815 1 3357 jgn1IPlD~e3i49l0 Ihypotheticel protein IStaphylococcus sciuri) 65 1 40 459 4 4 I -J- 1 22 134 125776 126384 19g.11123030 ICpxA tActinobacillus pleuropneumoniael 1 65 1 42 j 6091 1 43 12 11648 1290 Igil044826 IFl4ES.l iCeenorhebditis elegans) 1 65 1 38 1359 1 48 113 110062 110856 191,11573390 Ihnbotheticsl Ililemophilus influenise) 65 45 79I 1 48 122 117521 116883 igiIl57339l lhyipotheticai IHeemophilus influenael 65 37 I 6391 1 48 125 119027 118533 IgnlIPIDIe264484 IYCRO2Oc. ien;215 ISaccharomyces cerevisise) 65 38 495 -4 I 49 I3 1 38S6 1 5334! 
IgiJl480429 Iputative transcriptional regulator [Baecillus stearotherroophilus) 65 j 32 1479 I 50 6 1533.7 14519 1gij117l963 ItMA isopontenyl trensferase iSeccheromyces cerevisiae) 65 j 42 819 1 52 115 1147;2S 115588 IgiJl499745 IN. jenneschii predicted coding region KJ.0912 ilethenococcus janneachiii j 65 46 861 4 I 59 1 7 I 3963 j 4745 Igil4965l4 Iorf teta iStreptococcus pyogenesi 65 j 42 7831 a- 4 4- 68 13 12500 3483 IgiJ887824 IORF-o3l0 iEscherichie coli) 65 j 46 984j 4 I I 69 13 121111 1077 Ign1IPlD~e3lI453 Iunknomn b1acillus subtilis) 65 42 1095 7 09 52 i896 4- 4- 4 4 655 70 4 1 71 15 8 516 11j 9783 19111573224 glIycosyl transferase lgtC )GP:U14554-4) Illemophilus influenre 65 j 42 1248 1 72 18 17664 18527 Ign1IPlD~e267589 jUnknown, highly similar to several spermidine syntheses iBacillus subtilis) 1 65 1 39 I 864 TABL 2 c c c c c S. *emna Puatv c odn rein ofc aoe prten simla tokow roen 4 I 76 19 180319 1 7875 19111574276 lexodeoxyribonuclease. small subunit (xseb) (NeeMophilus influenreel I 65 I 38 225 1 84 1.2 1 2810 12352 IgiJ2313188 iIAEOOOS32I conserved hypothetical protein (Helicobacter pyloril 1 65 41 j 519 1 86 115 114495 113407 lonIjPIDldl~l8SO ji-dehydroquinate synthese (Synechocyatie ap.( 1 65 44 1089 1 87 13 1 3706 12423 Jgij151259 IHHO-CoA reductase (EC 1.1.1.883 (Pseudomonas mevalonlil 65 51 1284 I 88 1 3 1 2435 2 736 1gI11098510 lunknomn [Lectococcus lactis) 65 1 30 j 3121 89 j 2 116:27 11007 1 gn11P10jdl02008 (A800148) SIMILAR To ORF14 OF ENTEROCOCCUS FAECALIS TRANSPOSON I Il 6 6635 61186 IgnlIPIDe246O6i INN2J/nucieoaide diphosphete kinase (Senopus laevisl 65 s0 450 T T I I 116 1 1 3 2 016 IgnIIPIVldi0ll2S Jqueuosine biosynthesis protein QueA (Synechocystia sp.1 1 65 44 1014 I 23 I 6 I39 914983 ORF2 (Clostridium perfringens) j 65 36 321 F12- 4 I 23 I7 6532 7190 jgilS7SS7l DNA-binding response regulator IThermotoga maritimaj 1 65 39 1 669 F4 4 I 2 81 I2859 jgnljPIDje257609 jsugar-binding transport protein (Aneerocellum thermophilum) 65 47 I 
9635 s 1 137 112 1 8015 17818 JgiJ2i82514 IIAEOOOO9OI Y4pE (Rhisobium ap, NGR2343 1 65 j 41 1 198 :1 47 -4 5021 -;3885 1911I472329 jdlhydrolipoamide acetyltranslerase (Clostridium magnuml 1 65 47 1 1137 1 148 1 2 1053 11931 jgnIjPlDjdl0l3l9 jlqgN Ilacillus subtilis) 1 65 j 42 1 8191 151 2 13212 14687 1gi1304897 IEcoE type I restriction modification enzyme N subunit (Escherichia colii 65 5o 1476 4 156 12 1730 1437 1g11310893 Imembrane protein ITheileria parval 65 47 j 294 4 F 164__ I 4256 1 4837 Ig1I410132 JORFX8 (Bacillus subtillal 65 48 582 1 169 1 6 1 3192 3 914 19111552737 Isimi 1cr to purine nucleoside phosphorylase (dec01 (Eacherichia colil 65 41 f 723 176 4 I2951 2220 jgnlIPIOje3SSOO joligopeptide binding lipoprotein (Streptococcus pneumonieel 65 j 43 I 732 4 1 195 14 14556 13900 19111592142 JABC transporter, probable ATP-binding subunit (Nethenococcus jannaschii( 65 40 j 657 -4 196 1 1160 11572 IgnIjPIDjdlO2004 I(AB001488) ;PROBABLE UDP-N-ACETYLNURAJIOYLAL.ANYL-D-CLUTANYL-2, 6- 1 65 51 11 I IDAMINOLIGASE (EC 6.3.2.15). (Bacillus subtilisl jI 11 204 2 12246 1215 1g11143156 Imembrene bound protein (Bacillus subtilis) 65 37 J 1032 210 14 1544 1891 191149315 IORFI gene product (Bacillus subtilisl 65 48 348 I 7 242 12 11625 723 Ig111787540 I1AE0002261 f249 5 This 249 as orf is 32 pct identical (8 gaps) to 244 654293 SI I I I ri d~ues of an approx. 
272 as protein AGAR-ECOLI SW; 042902 (Eacherichia jj I 4- 4- S.e pn u on a auat v co in aei n of no e po e n i i a to kn w pr te n Contlg ORF Start jStOPI match Match gene name aim ident legh I ID 1ID J nt) nt) I aceasion It 1 284 j 1 1 900 Igij559861 IclyM (Plasmid pADI )6 36 900 9 1304 111 2 1 54 lgnll81D~e290934 junknown (Hycobacterium tuberculosis) 65 52 573 -4 13 1$ 1 2 11483 1911790694 Imannuronan C-5-epizaerase (Azotobacter vinelandit) 1 65 57 I 1482 1I 2 569 jgn1IPID~dlO2048 aerogenes, histidine utilization repressor; P12380 (199) DNA binding 65 4656 1 358 1 1 1 1 309 IgnIPIDje323SOB lYloS protein (Bacillus aubtilis) 1 65 55 309 1 2 17175111 6696 1g1114'98753 Inicotinate-nucleotide pyrophosphorylase (Rhodospirillusmrubruaj 1 64 1 71 876 1 6 1 6 15924 16802 IgnlIPlDjdl0llll Imethionine aminopeptidase (Synechocystia op.) 64 1 52 1 879 I a 1413417 13686 19111045935 IDNA helicase 11 (Mycoplasma genitalium) 1 64 1 58 1 2701 1 11 14 13249 1 2689 IonIjPDIe265529 10r18 [Streptococcus pneumonia.) 64 46 561 I S I7 I6504 I7145 1giI1762328 jYcrs9c/YlgZ homolog (Bacillus subtilia) 64 45 j 642 i _i 1 22 111 1 95418 1 9895 IgnIIPIDIdIOOS8l junknown (Bacillus subtilia) I 64 1 38 j 3481 9 1 22 130 12250(3 123174 IgiJ289260 Icome EN (BF acillus subtilis) 64 44 I 6721 1 26 17 114375 114199 1911409286 IknrU (Bacillus subtilis) 1 64 30 1771 1 27 12 11510 1 1334 19114079$ jode! methylase (Oesulfovibrlo vulgerla) 1 64 j 1 177I 1 29 1 3 614 1297 19112126168 Itype ViI collagen tMus musculus) 64 so5 3181 1 35 2 .1368 1723 IpirIJcIISIIJCII hypothetical 20.3K protein finsertion sequence 151131) Agrobacterium640J I I I IItumefaciena (strain P022) plasmid Ti 1 1 I0 "1 9 1 .40 111 3 1449 191 146970 jepio gene product (Staphylococcus epidermidis) 64 41 44 a 1 40 17 146113 j 4976. 
1gnl1I81D~e325792 I (AJOOOOOS) glucose kinase (Bacillus megatariual 1 64 45 294 9 1 45 1 7 I 8068 16920 IgnlIPID~dl02036 Isubunit of APP-glucose pyrophoaphorylase (Bacillus atearothermophilus) 64 1 40 1149 a 51 12 j 301 11059 1gi 143985 InifS-lIke gene [Lactobacilius delbrueckil) 64 1 54 759 1 I 51 i1 115251 118397 1gi12293260 (IAr008220) ONA-polymarase I alpha-chain (Bacillus aubrilia) 64 46 1 3147 1 53 1 3 1 13b.7 1555 19(I1192 Ihypothetical (Haeophilus Influene)1 64 j 47 1 6031 1 58 j 2 14236 11606 1gi11573826 jalanyl-tRlA synthetase (alaS) (Haocmophilus influensaec) 64 51 1 2631 7 1 66 111 3 11259 1911895749 Iputativa cellobiose phosphotranaferase enzyme 11'' (Bacillus subtill 1 64 42 1257 1 68 15 152K3 1 6556 IgII436965 I (malA) gene products (Bacillus stearotharmophilus) 1 64 47 1344 a 1 69 16 15356 1 949 IgnIIPIDIdl0l3l6 Icdd (Bactilua subtilia) 1 64 52 4081 CD *I C (tt C ant I ac s o I ma n C CI TALs2S pneumonlae T-Cutative ongriosnve proteinjC1P Ptasrliptionlactor knownula rte N -15 8 -Ihmn-xi----acma-oi.Petd-ut 141 ID23 IDI437 Intlano deyrgns Intl1 suui acession3 P. 64 Int 21 4- 1 67 lit4 148 j 530 ij72648032 O IL-utaie-D-f ruten6-aciho steaiorn ees (Bcluaubtilis) 1 64 50 1911 4 3 102 1476 bbs1133 hytL icalauao protein(~ciIOP./EB sutrascrpto fatr L1ncer A 64 57 183 1 105 1I 111 Ig I670 I sal (Iloto saimn s Iec r(l e ut s Ec eih a cl l1 6 517 41- -4 1--5 1 I 19 11 1 14291 1g114~317 5 2 hypthno l eyrogeins [NLalp aa sui (Baacil ap 64 37 1216 I 3 1 2 315 1200 IgnlIPIDIdlOl8S hyothetacilu sutinlal coysi sp. 64 54 520 *1-4 I 7 II 11116 30 gnilPIDje3235OS YptaeU Pc rti (Bacillus aubtilisi 64 53 759 i4T 70-9-- i-I 41 I 131 1 1 52 1216 1gi1165503 Isimla ow S.Baurelus. mercril ll reuts1Echrci o 64 42 ISO5 S 13 7 13.6 1 6540 l~pi lDId10ll hypoth(Se ccal stis3 proein(neto.eune)S11 goatr 64 50 1254 4 139 1 1 1 3226 2656 Igil2P23284lOR YDL00244w (S aca llycs cerevisiae 64 40 1032 4- 4 102 I231 I 18 IgnIIPIDIdl3O88 jhypothetgcae produtei (Sncocysils sr.) 
anii 64 50 1521 4- 11 1 4113 361 Igljl37764l 5 Iunno n ais Btils uia l ij 64 42 7502 4 4 2 12 IgnII~IO9 I I tmefacies (s intran spr s02 ypstmd pemaepoen laiisbiis 44 1 4- F 6 4?24 4 I 164 13 1 j70 186 IgnIIPIDIel3O336 Iuhow l gne ptardct (Latoa anlspr eclsanniim pemrs rtiIDo 64 46 9039 I 148 IIqu 843 878 a1103 IaFO4O dya ile protei Hst apes I 6 2 4- 1 (7 56 1 30 1 2 Ign98 P1D1d13045 Itanmermbnran (Bacillus subtils) 1 64 391 702 1 1 5 18 41 129 214 11450 IgnjPIDdOO89 jhesomlo gu tor trasportsystem pe~rmepoen Bclu utls 64 43 814 162 6 35189 263 11145120 lR po puttive 42 an rotei Stmrae atococcus yogen s l 64 58 483 4 T 4- a a c c a* C C Cc* C 0 0Ce TABLE 2 S. pneumoniae -Putative coding regions of novel proteinrnrmilar to known proteins I Contig joRF Start IStop I match Imatch gene name I aim It ident length I TD JiD nt Intl I acassion I I In t I 202 j j 76 1 1140 IgnljPIDje2938O6 jo-acetyihomoserine sulfhydrylase ILeptospira rayarij 1 64 1 47 1065 I 224 1 1 j 234 1 1571 Igijl573393 Icoilagenase 3prtC3 (Haemophilus influenzael j 64 42 1338 4 231 13 1291 1647 Igij4OI74 joaP X iBacillus aubtilis) 1 64 43 1 3571 4 9 253 13 1 709 11089 1pir137C3151jc11 Ihypothetical 20.3K protein (insertion sequence 1S11311 Agrobacterlum J 64 5038 I I I I I Itumefeclens (strain P0223 plasmid Ti III I 265 1 1j 820 2 91gi1377832 junknown (Bacillus subtilial 64 j 1 819 I 297 1 I 1 I660 igi11590871 Icollagenase (Hethanococcus jannaschii)1 6 4 6 -i 64 48-- 41 243- 1 4 1 8730 1 8098 19i1556885 jtnknown [Bacillus subtilisi6 4 3 to0 5 173 1 4403 IgiJII73I0l 1hnpothetical IflaeopHilus influenzae(6 4 9 -2 9323 9902 g1863 eb aepoenIail sai oulltca 63 1 401 696 1 9- 6- 1 75771881 9375 g1117743 Iunknown (A bac terU sytlinul 1 63 40 0 91 S9-- 4 I I 7 2 103 30 jnI~~j216O P -deU en nLctb clese pia u s su l 63 32 2723 9 38 I8 738 I695 g1g1377843 Iunknown (Bacillus subtilis( 63 35 7 04 0 3 6 4 11 980 I7078 gnIjl2d440 B F IAP-endentnculse ae c ilautlis 63 46 2703 9 12 29 343 119 gij 13129 r ju rko en B 
acillus ubtilis 33 3 9- 3979 1 883 0298 IgnI1lD8di0l i~ea 108 8 ntc ocus aealil 63 41 I 843 481YAY HYOTETCA 11250 1169 PgROTE8INypthtia INempiu infuen a INEGNCRGO 32 63 1 40 819 6 86 75 655 4 5687502 Jglj 157084 [ea (Bacillu sutinlii Heohlu nleza 63 41 I 5 15 j6695 1 160 IgnIPID0 e3ISO Iutthireoxiereducase (acillu p btils(Acioye jasu 63 41 9 02 9 9 100 111 240 1 1940 IgiI7171 Itucosidace (Dictyostelium discoidauml 1 63 36 I 1701 1 T A L S n0mn a Pu at v *o in eg on of *ove *ro ein ir a eto *n w ro e n i-C-o -n ,-OR -F -os ee* ma tc h- e- *Cmat-- g ne 0- 98 854 ILS39 pnuoi en Pu ati e oIng regins of nove l prten sf4ia tokow3roen F -06--F 1221D 1D 4nt4 1 4nt6 I acslPDdo13 Itaspss I~nco i I. 63t ,9 18 104 4 30563 157 191144985 jp anopa to coylase (Crnebatrusgltmc 63 46 2703 10489 59I 85 19115 3309 hytendn cl pr i (Bacillus subtilis 63 45 466 4 122 4 1 1 4886 10nl17 ldl03 9 JEO Itrans ossee nehocis sp.(f s2 ctietca 1 ap)t 6 63 19 I 81 128rsiue f 7n 4517 520 Igl.1i3 I2f Latanbctru thrm ruotopicu jIAECL 63: 5099 687hhi 1 4- 96 1 3 147 1911472920 Iv-ptet4aAls (Enkdprtecoc c hi) il 63 34 4- 4 I 12 I 7 40 4585 lg0159m 583l32 Ihcothetmicai yl h protea ius subtlls) csjanach 63 1 36 21 F 78---;3T 1591 5 17 25 jn 18704 l 1A50084 £271,cocu Thi 7as on is 24pt1dnicl063as t 6 I I I I I esidus ofan aprox.272 s pr tinocc Y)D..EOLI sW:P099 Esheic 63 36 I 2131 IilO05 I~e I.cilu IuL~ 3 4I colt i-2-9 a- 283--FIT 4 29 17 3 2 e 42 1 3466 i1722339 Iunknown (Acetobacter sylinum) 63 47 66351 -IT 19 32 1377 6 -S 1gnl1P01e341 7 IfitsQ in (E ercoc mc h i dne)enP9(amyoatr eui 63 33 1205j5 4O 3 45-- 249 3 1 276 1 9 11000433 Ire (Bacillu W oautirxl) l 63 1 41 21 28 1 IJ27 137 139 46 IORFB(Bacillus subintinli ne fon Iaiopi hla 63 33 4 12217 -3 36-4 8 29 3 :1 t4 12436 191172~233 9 9 lunkiownAet obacprter yinBclus bis 632 37 j e663 4- 1132 1 56 55620 191114777418 Ihitsd pe lescy barindin protein P29hs (Camycocytr jepun) j 63 436 1260 4 4 4- 999 6 9* so: V,99 9999 9 9 TABLE 2 S. 
pneumoniae -Putative coding regions of novel proteind -similar to known proteins -Contig -10KV F -Start -Stop matc ma-- -tch gene name aim Iident length, 10 (113 I ntl I nt) atcsion I n Int .7 124 j19526 118519 19111276434 Ibeta-ketoacyl-ACP synthase III (Cuphea wrightill 62 37 1 1008 12 (7 1 5904 14702 Igi11573768 IA/0-apecific adenine glycosylase imutY) (Haeophilus influenza.) 62 43 1203 12 1 9 18032 j 8793 (gi(1591587 (pantothenete metabolism flavoprotein (Hethanococcus jannaachii( 62 33 762 ,1li 9678 j 9328 IpirlICl151 (Jll (hypothetical 20.3K protein (insertion sequence IS11313 Agrobacterium 62 4135 I I II I I tumefaciens (strain P022) plasmid Tii II 31 4# 1 17 4 126(,9 12442 IgillO0l IM jannaachii predicted coding region $30374 (Hethanoceccus jannaschii( 62 1 43 168 17 15 1:3053 2835 1.I497 roein the expression of lactacin F, part of the laf operon [Lactobacillus 62 4411 32 1: 1 sg(1950.t I II 22 110 1 9538 lgnllP1DId100SB0 Isimilar to B. aubtilis Dnail (Bacillus subtilisl 1 62 1 43 j 912I I 30 j 3 8 a6!i 2043 IgiJ2314 379 £A0006273 ABC transporter. ATP-binding protein iyhc0( (Ilelicobacter 62 43 I 1179j I I 1 1 pylori( IIII I 33 1 5 1 22:15 1 1636 1g114l3976 lipa-S2r gene product (Bacillus subtiis( 62 44 I 6001 I 38 Il 1 56119 1 6123 I91(148231 lol5i (Eacherichia colil 1 62 1 34 435I I 40 10 1142-12 IL3328 IgnlIPlDIdIlO9O4 Ihvpothetical protein (Synechocystis apiJ j 62 1 43 I 451 I 42 1 3 3 (31 Igij11146182 Iputative (Bacillus subtiliag j 62 41 j 309 44 2 167 40 Igi[17B6952 1IAE000I16( o877; 100 pct identical to the first 86 residues of the 100 a& 62 43 27391 Ihypothetical protein fragment YBGBECOLI SW; P54746 (Eacherichia coil)( 48 112 I 9712 19304 1911662920 jrepreasor protein (Enterococcus hirael. 
62 1 32 1 429 1 51 18 5664 1 7181 I~nlIPKDIe3Ol3 (StySKI methylase (Salmonella enterical 1 62 j 44 1 15181I 1 52 3 1 2791 12099 IgijIlB3BB6 (integral membrane protein (Bacillus subtilis( 1 62 j 41 j 693 4- 116 115702 114704 IgnlIPlDje3l3O28 (hypothetical protein (Bacillus subtilis( 1 62 j 40 1 9991 I 59 1 6 1 3418 13984 (giJ2065483 junknown (Lactococcus lactis lactis( 1 62 1 32 1 567 I 3 j5 j4997 4809 1,1 149771 (piLin gene inverting protein ivHL) (loraxella lacunataj 62 1 28 189 ;4 4 70 (14 110002 110739 g91(992977 IbplG gene product (Bordeteila pertussis( 62 j 45 738 71 131 118780 208 (gi(12B0135 1 cded frby C. elegans cDNA cm~le6; coded for by C. elegans cODN cmOleZ; I 62 62 1593! j23B iiIart me:Ibiose carrier Protein (thiomethylgalactoside permease IlIII I 71 128 132217 132768 IgnlPIO~d101312 (YqeG (Bacillus subtilis( 62 35 I 552( I 4 7 1166 (033 gi15273 hypothetical (Escherichia coli( 62 38 1284 GO .a 0 S* S. pneumoniae -Putative coding regions of novel proteins similar to known proteins Contig ORF Stairt jStop mtch match gene name aim ident. length I ID 111) nt.) Iintl I ocassion II Cnt( 4 UNCT-ON-N----N--[Ba-l-Iu-- ubt 11 97 110 1 9068 17041 1giJ882463 jprotsln-N(pl)-phosphohistidine-sugsr phosphotransferase [Escherichla colil 62 42 2028 -4 I 98 14 12306 13268 jgnIjPIDidil496 j8raE (integral membrane protein) IPseudomonas aeruginosa) 62 42 j 963 4 I 102 1 3 1 2673 13539 jgnIjPIDjeIlO ihypothetical protein (Bacillus aubtilisi j 62 24 717 1 i- nf-e---hypothetical -ARCtransporter;--- IB -c 1 7 111 12 12035 j 3462 Igif581297 jNisP (Lactococcus lactis) 62 1 44 I 1428 11 4 j 1S 48 Il~5737 lic-1 operon protein (licA) iHaemophilu5 influenzae) 62 1 39 927 1 212 16 14939 15649 fgif 1574381 Ilic-! operon protein (licC) (Flemophilue influenza.) 
1 62 1 39 1 711l 1 124 13 11137 721 911704 lnaerob ic ribonucleoside-triphosphate reductase inrdo) (H1asmophilus 62 45 17' I I I I I i1t73024zI---I 124 1 6 1 3162 12329 Igl1609076 Ileucyl aminopeptidase (Lactobacillus delbrueckli) 62 40 1 834 I 2 11073 I 7516 gn1lPID~diDli63 foaRi (Bacillus subtilis) 62 38 1 3558 S 1 129 1 6 j 493313 4540 IpIrIS4lSO9IS41S Izinc finger protein EF6 Chilo iridescent virus 1 62 1313- _7 4510-- 4103 jg-wn-[--ef1857245 Iunknown 4081-0 1 149 12 1 19:23 1 2579 Igif1592142 JABC transporter, probable AlP-binding subunit (Nethanococcus jannachil) 1 62 1 41 1 657 1 149 7 15350 16055 jgnlPXDje323SOB jYloS protein (Bacillus subtills( 62 1 40 6961 I 16 1 4' 23 glPO 2464 membrane protein (Streptococcus pneumoniae) 62 1 40 213 16 1 4, 28-gllle564* 156 6 j 3606 j2935 jgnlPIDfdlO2OSO Itrensmembrane (Bacillus subtilisl 62 37 j 6721 4 171' 2 11719 12291 fgI 43941 IEIII-il Sor P1'S (Klebsiella pneumoniae( 62 35 I 5131 4 4 1 172 12 1385 1723 1giJ895750 Iputative cellobiose phosphotransferase enzyme III (Bacillus subtilis) 62 39 339I 4 4 173 13 12599 893 1,11 2591732 Icobalt transport AlP-binding protein 0 (Nethahococcus jannaschi( 62 44 17 01 4- 179 2 492 1754 Hgj1701 I. influenree predicted coding region H111038 (laemophilus intluenrac) 62 j 38 1263 4- 681 16 28U6 3707 jg111777435 LadT (Lactobacillus caseil 62 j 42 f 852 i I 15 j2 2014 j311 jgij2182397 I(AE000073) Y4fN (Rhizobium sp. IGR2341 62 41 1764 I 200 12 1 l06l 1 1984 jgij450566 ftransmembrane protein (Bacillus subtills) 1 62 j 37 f 924 4 I 202 13 12583 13473 jgiJ42219 IP35 gene product 1 314) lEecherichla col) j 62 1 41 1 8911 1 210 13 12374 11565 lgi[49315 IORFi gene product (Bacillus eubtilisl 1 62 1 45 1 1921 C. 9 9* 9 0 909 09 0 TABLE 2 S. 
pneumoniae Putative coding regions of novel proteinsa5Aiilar to known proteins Cotg OE'JStart stop match mtch gene name imn ident Ilength 1 I0 jID I Inl I Intl I acession I In tj3 211 j1 3 1971 9gij147402 Imannose permease subunit III-Man lEscherichia colil 62 43 969 223 2 1495 1034 Ign~lPiDldIiO110 RP2 (Streptococcus mutansl 62 41 1 462 22 1 1 34 909 gJo10063 1glycerol uptake facilitator [Streptococcus pneumoniaej 62 44 876 234 12 1 90 1917 IgLi2293259 l(AF008220( YtqI (Bacillus aubtilis) 1 62 j 38 1 828 1 I 282 5 1765 1487 1gnl1P101e276475 Igalactokinase (Arabidopsis thaliana) 1 62 j 33 1 2791 i 1375 j1 j 5 gI643 (AE000052) Iycoplesms pneumonise. hypothetical protein homolog; similar to 6I1 I I I Ipneumoniae) I 1 385 5 1 54 1 357 191,11573353 jouter membrane integrity protein (tolA) (Haemophlus influenzae) 1 62 47 2281 1 3 119 118550 119269 Igij6o6l62 IORF....229 (Escherichia colij 1 61 1 41 1 720 1 7 j 4 2725 13225 IgiJ2ll4425 similar to Synechocystis sp. hypotheticel protein, encoded by Gensank I I I I I I Accession Number D64006 (Bacillus subtitle) 1 0 17 6 j3326 3054 gi1149569 Ilectacin P (Lactobacillus ap.1 61 43 273 ft I I 3 I 4061 I 4957 IgnIIPI8-x-y-lo--e-re-jdlO1O68 syloso repressor 8977- 51 ft 1 5' 6 13974 1 6037 IgnlPIDjdl.0l3l6 jYqfK (Bacillus subtilis) 1 61 42 2064 1 58 15 17356 16565 IspIP45l69POTC- ISPERM4IDINE/PUTRESCINE TRANSPORT SYSTEM PERHEASE PROTEIN POTC. 1 61 1 34 7 92 1 67 1 j 3 1692 Igli537l08 IORP~t254 (Eacherichia colil I 61 46 6901 -f f f I 68 19 1 8816 j 7890 Jg119501 IpPPLZ12 gene product (AA 1-1841 [Lupinus polyphyllus) 61 1 41 1 927 1 70 115 110731 112008 Ig1I992976 lbplP gene product (Bordetella pertuaslal j 61 j 44 j 12721 72 1II 9759 110202 IgnIPIDIdlI833 jcarboxynorapermidine decarboxylese (Synechocystia sp.) 1 61 36 4441 76 8 781 j703 gnIPD~1035 farnesyl diphosphate synthase (Bacillus stearothermophilusl 61 4 I 879 ft 1 87 113. 
112311 111361 gqiJl789683 I(AE000407( methionyl-tRNA formyitranaferase (Escherichia coil) 61 44 1 951 I 91 12 1731 1 2989 Igi1537080 Iribonucleoside triphosphate reducrase (Eacherichis colil 61 J 45 22591 105 3 711 399 gn~PI~d0I81 hypothetical protein (Synechocyatis sp.) 61 j 44 j 7891 6 j 7961; 64788 jg i~_8_5_74?gi-----95747t Iputatlve e1 c pci operon egu regulator Ba (Bacillus s 61 -jI 61 36 -149911 1 123 18 17181. 18518 JgIJ1209527 [protein hiatidine kinase (Enterococcus feecelis) 61 j 40 1338 .05 ao' 00e' a *o i aOR a St I aSo a.tc ac gen aam .i aegt ID JID I (nt) I Inl I acession I I tt) 72 1675 gII8 43 (A6000184)o£271;.This 271s r is 24 pet identical 116 gaps) to 26538B l 12 61aar 752 67250 1 gij1787043 residues of an epprox. 22 a protein YIDAECOLI SW: P09997 (Escherichia 6 128 1 1 639 Ign~jP1DjdiOl328 jYqiY (Bacillus subtilisi 61 41 639 135 11632 I 513 gl10e2700l4 jbeta-galactosidase iThermoanaerobacter ethanclicus) 61 41 6720 143252 4 jgjSOSl penicillin-binding proteins IA and 18 (Bacillus subtilis) 61 42 2511 4 1(Es122ehe4gJI573 terhdrdpcoiaerichia coli) 61 42 702 16: 3 i Z 1 1 3505 Igni lPIDo 72 6 2 l unknsp (Stpyc o e p o s hataem o Sy ncu os) i (220) 41 4065 177~~ ji 3s j 11 72 jnsudO~4jnnw Bils uiis 61I 1 4 224 1 3 1 27112 13144 191IS1144 1M. 
jannaschii predicted coding region 1430440 (Hethanococcus jannaschi( 61 30 363- 225 4 1395 3766 jgiJl552774 lhypothetical (Eseherichis coii6 4 7 24 12 802 jgijlOOO4SJ ITreR (Bacillus subtilis) 61 42 j 591 4- 1 257 111 3 1350 jgnijPlDje2SS3lS Iunknown (Hycobacterium tuberculosis) 61 42 3481 293 14 j3911 13657 jpirIJCl I5IJC~I Ihnpothetical 20.3K protein (insertion sequence 1S1131) Agrobacterium 61 45 I 315 I I I I tumefaciens (strain P0221 plasmid TiIIII F 301 1 1949 1 g22 20 IAF 644 cotains siMilaritY to acyltransierases icmohbii lgns) 61 933---- 4 -4 1I1 1066 1287 jgij393396 ITb-292 membrane associated protein (Trypanosoma brucel subgroup) 61 38 780 3 124 124473 124955 IgiJ537093 JORF..olS3b (Eseherichia ccli) 60 1 27 1 4831 -4 -a-ill- 543 14 6 112 j11936 j1l87 ~gi12 93017 jCRF3 putative (Lactococcus lactisi 60 j 44 1 750 1 17 113 I 67108 16484 Igij149569 jIactacin F iLactobacillus sp.]I 60 32 I 2251 *8 7 697 57ogJ7810 (E02P 4Bl; This 481 &a orf is 35 pet idnntical 119 gaps)I to 3 09 604110 I jgi~l788i40 residues of an approx. 856 &a protein ?JOLIJIUHAN SW; P46087 )Escherichla1 colii I 1 0 115 115878 117167 jgnljPIDI0584 Iunknown (Bacillus subtilisi 044 19 4 ace a a fee e a ec se seeaa TAL 2e a c* S.pe n a Putati e *in regon of aoe prta r a o ec ow a rten a *j 1 32 110 j 0296 8964 Igil2293275 I (AF008220) YtaG (Bacillus subtilisi 1 60 1 37 1 6691 38 115s 8837 19697 Jgij40023 1Bsubtilia genes rpmil, rnpA. S0kd. gidA and gidB (Bacillus subtilil 60 35 061 4 1 43 1 I 8610 15944 jglJ 171787 jprotoin kinase 1 ISacchatomyces cerevisisel 60 36 2667 I 4 j1 1 1269 jgnIjPIDje235B23 junknown (Schizosaccharomycea pomba) 60 I 44 I 1269 1 45 110 j1113f. 110368 1g11397488 jI.4-alpha-glucan branching enzyme (Bacillus subtillal 1 60 1 43 1 771 1 48 j 19 11576t. 
114378 jgnlPIDje2O5ll3 jorfi (Lactobacillus helveticusi 60o 39 j 1389 1 48 121 116.727 116951 IgnhjPIDIdlO2O4l I1AB002668( unnamed protein product (Neemophilus actinomycetemcomitansi 60 j 32 225 I 50 I1 2 1898 jgnijPIDje246S37 JORP286 protein (Pseudomonas atutterl 60 1 31 897 1 62 12 1638 .11177 IgnijPDjdIOOS87 Iunknown (Bacillus eubtilis) I 60 1 42 540 1 68 14 135904 1 5203 IgiJl573583 IH. intluenzac predicted coding region H10594 (Ilaeophtlus infiuenzae( 60 1 36 1 1614 4 4 4 4 4 11 1 5781 1 6182 IgnIjPID~d120l4 I(AB001488) SIMILAR TO YDFR GENE PRODUCT OF THIS ENTRY IYDFRBACSU). I 60 33 4021 I 1 1I. I sclus cubtilis) I I I 70 112 16343 18133 jgnIjPiDje32497O jhypothetlcal protein (Bacillus subtillst 60 1 38 1 1791 6 4 4 1 71 18 111701 114157 Ig1i580866 Iipa-12d gene product (BacIlluxseubtilil 60 33 I 24571 1 4 18 112509 111664 IgnIjPIDIOIB32 jphosphatidsta cytldylyltransferase (Synechocystla ap.1 60 j 45 846 4 1 76 14 14115 3367 JgIJ2352096 sr '"imilar to serine/threonine protein phosphatase (Fervidobacterlum j 60 19 I 7501 4 31 17665 1 giIl78642O I'7E0011) 86;1a co t Identical to GB: ECODINJ_6 ACCESSION; D38582 60 30 I 2941 4- I 81 6 140731 4522 1giI147402 jeannose permeace subunit III-man (Eachericlila colil 60 35 1 450 1 86 1 1 940 j 155 jgiJ143177 Iputativa [Bacillus subtitle) 60 26 j 7861 4 1 92 j 1 1 192 jgi1396348 jhomosarina transaucuinylase (Escherichla colild 60 1 5 192 93 [4 [019 9384~g~jl788 89esidues of an !ppcox. 416 as protein MTRC..NEIGO SW4: P43505 tEsciserichia IIII 4 I 94 I 5 I 5548 1 8121 IgnljI'1IDje339895 Ij(AJ000496) cyclic nucleot ide-gated channel beta subunit (Rattus norvegicusl 60 50 1 2574 4 I 97 1 7 1 5396 1 4533 jgiJl591396 jtransketolase' (Nethanococcus jannachill 1 60 41 864 4 1 102 1 2 12081 12833 1gnl1P1Dje320929 1hypothetlcal protein (i~ycobacterium tuberculosisi 60 41 753 4 baa c a a. 0* Contig OAF Start Stop I mthatch gene name 6 Im jIident. 
length I ID t ntl (nt) acessc~ion 'a t 3 1106 19 1 9713 19183 1gn11Pl01e334782 IYIbN protein [Bacillus subtilial 60 31591i 113 8 1 6361 16837 1911466875 jnifU; 51496_.Cl...57 (Iycobacterium Iaprae( 60 1 43 1 477I 115 21 27551 524 IgnlIPIDIe328143 14AJ000332)_Olucosidase II (Homo sapiensl 1 60 32 22321 112 7 1 14131 5068 jgnl(PlI3~d101876 Itranaposase (Synechocyatia ap.( 60 3061 127 8 14510 15283 (g111777938 IPgm (Treponema palliduml 60 38 774 1138 14 1 3082 12672 IgnlIPIOIe32Sl96 1hypothatica1 protein (Bacillus aubtilisi 60 16 1 4111 1139 1( 1 1711 4 lgnlIPIDIdlOO68O lOAF (Thermus thermophilusl 60 39I 1741 13 119 11 114510 113009 Ig11537145 jOAF..f437 (Eacherichia coliP 60 30 1 1512 140 1 2 1 25!,2 1 1249 IgiI1209527 Iprotein histidine kinase (Enterococcus faecalis( 60 37 1344 1141 1 1 1 21012 1049 1gi1463181 IES OAF from bp 3842 to 4081; putative iHuman papillomavirus type 331 60 34 840 1141 1 5 1 5368 16405 IgiII45362 Ityroaine-sensitive DAHP synthase laroF) (Escherichia coll 1 60 41 1 1038 1142 16 13558 14049 1911600711 (putative (Bacillus subtilisi 1 60 1 37 1 492 1148 (10 j 77,12 18713 IgnIIPlDIe3l3O22 1hypothetical protein (Bacillus subtilisi 60 27 1 972 153 5 1 36637 1 4278 19112293322 ICAFOO822Oi branch-chain amino acid transporter (Bacillus subtilial 1 60 1 42 612 1155 1( 1 14131 748 Ig1J2l04504 (putative UOP-glucoae dehydrogenase (Escherichia couP 60 40( 666 15 I 778 1386 1 gnlIPl0(e308090 1 product highly similar to Bacillus anthracia CapA protein (Bacillus 04 I IIutls 01 0 13 (7 049 8468 (gnl(PID~d101313 (YqeH (Bacillus subtilial 60 38 420 S- 170 V 4130 1 2688 19111574179 11. 
influenree predicted coding region 1111244 lHaemophilua influanreel 60( 39 1443 T 1 171 1 4717 1 5901 1911606076 (OAF..o384 lEacherichia Coll) 60 1 44 1 1185 13 I3I2440 2135 Igi(1877427 (repressor (Streptococcus pyogenes phage T12) 60 38 1 306 1191 110 19444 18428 1911415664 Icatabolite control protein (Bacillus megatarium) 60 42 j 1017 ~~13 1201 13 1 3895 1 1928 (gi(475112 jenzyme Ilabc (Pediococcus pentosaceus) 1 60 39 1 1968 -a 1214 115 110930 110439 19111573407 (hypothetical (hlemophilus influenzael 60 g 91 9 -a 2188 4 1422145 363 g2363 60(gi(608520 ;myos---n heavy 31 219 99' 990 9so99* 9 990 TABLE 2 S. pneumoniae -Putative coding regions of novel proteins'siilar to known proteins 4- Contig ORP IStart IStop I watch wtch gene name jI aim I ident length ID jID Intl nt acession I I, 1 Int 'J 226 4 f2511,1 2351 jgiI437705 Ihyaluronidase (Streptococcus pneumoniael 60 1 53 168 242 111 725 1 3 jgij43938 ISor regulator (Klebeicils pneumoniae) 60 41 723 1 1 288 1giI304897 lEcoE type I restriction modification enzyme HI subunit (Eacherichi. coli( 60 56 288 445-- 25 1 95 4 gi61632 junknown (Staphylococcus aureus( 60 36 1 8611 29 1 I969 82 jgij 153794 Irgg (Streptococcus gordoni( 60 32 See88 ;2 *2 j 1492 1662 p-i 1 r-ISp3ir1153184-0S153-18 p -ro-b I bpro-batble n s p o trSanspoasese 1 1 -S Bacillus-h- M steharothermoph---------------us1--6j 60 262-6-171-- 274 1 1836 196 111592173 IN-ethylammeline chlorohydrolase (Hethanococcus jannaachii( 60 40 5 741 4 308 1 463 2 Iii1787397 (E024 oS Ecerci coi5 604 42 4- 4- o1- 04 4 5 1 I 3 308 IgniIPIDImI37S94 IxerC recombinase (Lactobacillus leichmannii) 60 5 42 5 306 4- 4 4 1 344 111 731 522 5gi5509672 Irepressor protein (Bacteriophage 1uc2009( 60 1 32 I 4505 4 1 1 1 576 1 4 IgiI2293141 I(Ar008220l YtxH (Bacillus subtilisj I 59 I 31 573 1 7 122 118140 117142 5gnl5PlIa280724 junknown (Hycobacterium tuberculosisi 1 59 1 39 I 9991 1 10 111 1413 4 Igi11353B80 Isielidase L. 
(Hacrobdella decoral I 59 5 41 5 14105 5 56 6463 5156 5gij580841 jFI (Bacillus subtitle i 59 i 35 I 13081 4 1 22 1 2 5 479 1 1393 1gi1142469 Isle operom regulatory protein (Bacillus subtilial 1 59 I 34 9155 4- 4- 4- 1 22 5 5 126911 14614 IgnIIPIDIe28O623 IPCPA (Streptococcus pneumoniae( 59 5 44 I 1917 1 30 111 208 1 558 IgnlIPIDIe233B68 Shypotheticsl protein (Bacillus subtilis( 1 I9 1 7 1 3515 130 14 1 36711 12455 IgnlIPlI e202290 Iunknown (Lactobecillus sake) 1 59 I 33 1224 4 I 35 ,113 112201L 111071 1gn119105e238664 5hypothetical protein (Bacillus subtilia( 59 5 35 I 1131 I 35 114 1132811 112182 9151I657647 ICsPBH (Staphylococcus eureus) 59 I 39 11075 1 36 118 11807$S 117897 1 9 i11500535 IN. jannachli predicted coding region MJ.1635 (Hethanococcus jannaachii( 59 j 33 1 38 112 I 6172 1 7137 19112293239 I(AF0082201 YtxK (Bacillus subtilis) I 59 I 34 I 9661 S4- 4- -4 4 F I 50 53 12673i 1728 IgnIIPIDIdIO1329 5YqJx (Bacillus subtilis) 59 5 41 5 951 1 56 15 11870) J 2388 Ign15PIo~el37594 jxerc recombinase (Lacrobacillus leichmannii( 1 59 5 41 J 5195 4 1 61 16 1 681-2 15628 IgnIIPIDIells6 Iaminotransferase (Bacillus subtilisl 5 59 40 to j 11855 1 67 5 j 2381 1'3023 59111146190 12-keto-3-deoxy-6-phosphogluconate oldolase (Bacillus subtilisl 5 59 5 36 6425 4ace ec a seeace ea V.a Ie eD 5I Ia an~ eec) Ie assso ca IC ICC 1 8 11 11311 1S05 pqn jemonise Putative coin rein of ove proteins [Bacillur toti a knw proteins32 Coti 11 F S14 r Stop2 match4 match773 geneOlO ny asm tnuoie aimos-ar s IB identent length 5 to Sws-Po Accesio Nube P20966 fro E.t scoslo I I I Int 69 11 0 8B67 8991 Igil157386 IM janhnatei k rin ctd coing eonlus inlunselhnccu Iansh 59 38 24 33 1 18 117 113438 110055 jgnlIPIDje232O Iutatie [Muoterim (Bcls stlis 59 J 44 21329 r 4 1 14015 22 8 2 1g1901590388 6 I ja naschi pr edicwt h mlotd coding regieo of111 Bsiflhano cos lan schi I 59 381 246 44 1489 1 196 152 04 IgnIPIDI2O900S IhIsBoogoust FUC INUKON SIrdEF operons of icol a n Altyp59murlu8 I I I IYSI 
S.L(ctocolcus lectisi 12 149 11 1 713 113178 IgnI1D2 62 Ium-nkn ngMcoactr Im Ituberclosis)u aues 59 38 2612 7 164 9 6113 6 014 IgnljIlDjIOt9oS Ifle00148 FnUNbCTION-idn prt n ecs F UNNON SIMLA PROUCTill .IFUN aE AN 59 I 3 I I I I IBaNcilusI. B cill us" sutiii 1164 1 164 112 88:16 7823 jgn~jPlIjdlOO964 Ihomologue of ferric angulbactin transport system permerase protein FstC of 59 35 I 1011 I I I I V. anguillarum (B1acillus subtllls) IIII I 7 12 0 1072 jgil289759 1cddfrb .e~a:cN CE203 (GenBank:Z1472811 putative 1 59 40 672'' 1 177 1 I 384111 4200 Igi12313445 I(AEOOOSS1( N. pylori predicted coding region 11P0342 Ifielicobacter pylon)i 59 38 j 360 4 I 4 4- 1 183 14 12708 12508 1911509672 Irepressor protein (Bacteriophage Tuc2009( 59 50o 2611 4 I 186 j 6 13398 12820 jgiJ6060 80 IORP-o290; Geneplot suggests frameshift linking to o267. not found 5938s9 1 I Eschenlchla col) 1 1 1 190 13 13120 11711 19111613768 Ihistidine protein kinase (Streptococcus pneumonise( 59 j 32 j 1410 1 194 12 11621 11019 IgnIIPIDIOOS79 Iunknown (Bacillus subtilisi I 59 1 40 6031 79 5205 4306 ~gn1lP1Dle3l3073 Ihypothetical protein (Bacillus subtilisi 59 1 38 9001 4---1a 1 220 15 14362 13958 IgnIIlDIOl322 IYQhL (Bacillus subtilis) 1 59 I 46 1 405 1 242 13 11573 12367 Igij1787045 [AE00184) f308;aThisx308 a ort is 25 pct identical (35 gaps) to 305 Zia 412 j 795j I I I I ~~~~~~esidues of an approx. 296 as protein PFLCECOLI SW: P32675 (Escherlchi -4 r 4- 1 247 j 2 1 1154 11480 191140073 IORFIO7 (BacIllus subtilis) I 59 I 39 1 327 1 -4 4 4 cc c c c c. 0 0o so' TABE 2S. pneumoniae -putative coding regions of novel proteins 'sldtllar to known proteins Contit-JO-Whj-Start-Stop match match genename aim 9ident. length I ID JIr I(nt Intl I &cession I f I I I 26 j1 868 I 2 lgnllPlljdlI924 Ihemolysin (Synechocystis sp.)I 59 I 9 I 867J F I a- 2865 1 820 19112246532 IORF 73. 
conteins large complex repeat CR 73 (Kaposiis sarcoma-associated 11 2071 I I I ~herpesvirus( 2 258- 40 1 270 1 j 386 1126 IgnflIP1Dldl02092 lYfnB (Bacillus subtilisl 94 4 VI 1 281 l -51 552 -Ii-1166 0 21- 1666062 Iputetive (Lactococcus lactis 1 8-- -7 33 1 2 I1894 IgiI9S2OB gastric mucin (Sus scrotal 5 1 1 19 ;-33 I 2- 4_25 19 I S 6 I11123 110465 IgnIIPIojdl 0l812 ILumo (Synechocystis sp.1 58 29 159j AT~e ubnt Eneoccushral58 3 11 -T I 0 5 I4058 I3651 191139478 lATP binding protein of transport AiPcses (Bacillus tirmus( 58 34 I 408I 3 -i I-PID--I- ;unkn-wn- 1- a- 36 j8 j5316 j6179 1gijI18679 lort (Bacillus subtilisi 58 32 1 864 I 3 jS j5926 I3971 Ig11l788150 1AE000278) protesse 11 (Eacherichis colij 58 37 1956 4 5 374 3 6I 74 52 InIlP1Dle267329 jUnknown (Bacillus subtilisi s 42 1518 T 4 I 48 114 111722 111066 IgnlIlDldIOl77l Ithiamin biosynthetic bifunctional enzyme (Syiechocystis sp.l) 58 it 657 I 2 II 1229 3 IgnlIDIdl0lz9l Ireductase (Pseudomonas ceruginosa( 835 12 1--F -a 4- 1 1 12 172 42 li2137 i(AE0005451 cytochrome c biogenesis protein Iccd.A)if(elicobacter pyloril so5 25 291 I 5 I4 656 49 111147329 Itransport protein itscherichia coli( 58 41 j 1089 1 69 15 14934 13807 1onl1P1D~le3ll492 Iunknown (Bacillus subtills) I 58 1 41 11281 1 71 127 1335 132277 IoiI24080l4 Ihypotheticel protein ISchizosaccharomyces pombmi 1 58 1 33 1 921 -1 -70 72 T 358 2882-- 18694-- I- in 2 20l-- yc ne ma---5--j70 -8 I I- 191---22932---2-- (Bacillus- j 3 -708 I 9 I4 I4594 3422 1911121-7989 10RF3 (Streptococcus pneuaaonicei 58 j 44 1173 4 I 6 17 16017 115337 191147642 5-dehydroqulnate hydrolyase (3-dehydroquinssei [Salmonella typhi) so5 32 681 97 2 931 1 560 jg11l53794 jrgg (Streptococcus gordonll 5 81 321 3721 0 9 9 0 9 9 TABLE.2 S. pneumonia. 
Putative coding regions of novel proteins similar to known proteins ~Contig O0RP Start Stop match mtch gene name aim ident lenth to 1ID in ntl I Intl acession mI I Il 08 2 358 12724 IgiIlozo2 IvacB gene product Itacherichia coli) I so 37 2367 1- 4 11 1l 5 J 4593 1 5240 1gii1592142 JAB3C transporter, probable ATP-binding subunit (Iethanococcus jannaschiij 58 j 36 648 120 13 j.44211 5110 IgnIIPIDIdlOl320 jYqgX Iaacillus subtilil 1 58 1 47 1 6901 4 1128 116 113111 112673 Igii662919 jOAP U (Enterococcus hirsel j 58 42 J 459j 13 3 674 439jiIlBOO3ol Ieacrolide-etffux determinant IStreptococcus pneumoniael so5 35 1236 32 ;3--V611T49--1 1 133 1 1 1 11 890 IgniiPIDje269488 jUnknown (Bacillus eubtilis) 1 58 1 36 1 780 1 1 160 111 1 86S15 9865 IgiI473901 IORFIl ILactococcus lactial I 58 1 39 1251 1 161 16 16258 16849 IgnIlPIDjdiOlO24 1111-1 protein [Hlomo sapiens) I so8 32 1 582j1 1169 111 214 1 2 IgnlIP1Djdl00447 jtranslation elongation factor-] IChiorella virus) I s8 31 213 1187 1 1 48-1 2 jgi14751l4 Iregulatory protein [Pediococcus pentosaceus) so8 38 486 4 4 181 6 J 43114 j 4620 IgIj167475 Idessicatlon-relatad protein iCraterostigsa plantagineuml I so8 55 2371 4- 1 190 1 2 1 1464 11640 IgnljPIDje24d727 jcompetence pheromone IStreptococcus gordonil so5 38 177 1192 12 1 2012 11344 IgnljPID~dl00556 Irat GCP360 IRattus rattus) 1 58 1 44 I 669j0 206 1 1 1292 1696 jgnljPIDje2O2S79 Iproduct similar to WrbA ILactobacillus sake) 58 35 597I 216 12 12333 1555 IgnIIP)DIe325036 1hypothetical protein (Bacillus subtilisl 58 1 33 1 1779j 1217 15 15250 14321 IgiJ466474 Iceilobiose phosphotransferase enzyme II- Ie1acilius stearothiermophilusj 58s 38 930 4 217 1 1 1 56:16 j 5106 jgn~jP1Odi0I2048 1- aubtilis celiobiose phosphotransferaaa system celBs P46317 (9981 58 I 4 I 53 I I I -It~ anseembrane )Sacillus subrilis) I22 1 2 811 'gijl573777 jcell division ATP'blnding protein (ftss) llaemophilus Influenzael 58 39 810 I 264 11 1 2 1 715 IgiJ973330 II~atA Iflacillus subtilis) I s8 32 1 714 1 
280 11 1 331 767 igiI1786i87 I(AEOO01)1 hypothetical 29.6 kD protein in thrC-talB intergenic region j 58 31 I I Escherichia colil I III 1 306 j 1I 84i 3 1gnl1P101e334780 IYlbL protein )Bacillus subtilis) 1 58 1 47 843 1360 13 11556 11092 IspIP463SliYZGD- IIYPOTHETICAL 45.4 KO PROTeIN IN TIINASE I S'REGION. I so8 32 1 4651 4- 1363 15 121610 11867 jgi~16067l Is antigen precursor IPlasmodium talciparum) 58 j 51 1 2941 1372 111 806 13 1011393394 ITb-291 membrane associated protein [Trypanosoma brucei subgroup) I 58 1 37 1 804 i382 ;2--;749 ;519 IpirIJCiISlIJCll Ihypothetical 20.3K protein linsertion sequence 161131) Agrobacteriu I 4--s 11 211 I I I I I tumefaclens (strain P022) piasmid Ti1 TABLE 2 S. pneumoniae -Putative coding regions of novel proteins similar to known proteins I ID lID I int) I (ntl I acession I nt)4 4 3 41 7471 ~gij1499745 M. jannachil predicted coding region M4.0912 (Hethanococcus jannachil) I 57 38 939 SI- 10 j10 i7674 7i-507 1igiI1737169 Ihomologu .e to SKP1 (Arabidopsis thaliansl 57 30 168 ii 1j2 j412 ~gnIlP1DId100139 jORF lAcetobacter pasteurienual 51 42 j 411 a I 3 4 20:1 138 gi22323 AF082201 YtpR. Ifacillus subtilisi 57 37 I 6451 3- 1-4A-- I 3 l1 691 44 gnlIPIDle324949 Ihypothetical protein iBacillus subtilisi 57 36 I 4831 1 45 1 5 1 54456 5060 1JI11592204 lphosphoserine phosphatase (Hethenococcus jannachiil 57 I 44 1 387 1 49 1 7 1 652:! j 7632 Ig11155369 IPTS enzyme-II fructose IXanthomonas campestriui 57 I 34 1110 a 152 16 14,520'1 6850 IgiJl574144 iaingle-atrandad-Dk4A-specific esonuclese (recJ) (Heemophilus influense) 57 35 1 2331 1 53 1 5 2079') 1795 Igil1843580 Ireplicase-associsted polyprotein lost blue dwarf virusi 57 46 1 285 163 6 153112 4995 1gJ1268 IAE000094) Y4rJ [Rhizobium sp. NGR2341 1 57 1 39 3181 I 172 115 113893 113059 IgnhIPIDjdlOO892 Ihomologous to SwissProt.YIDA-ECOLI hypothetical protein [Bacillus aubtilie) 57 40 825 29 1 2 2561 1IRIlS IgnIIPID)jdIOO96S Ihoaologua of HADPII-flavin oxidoreductase Frp of V. 
harveyi [Bacillus 51 4I 747'~ I I I I I I ~subtillalIIII 4- 110 F 82 -i9 6 973 gilO6 4 jhort re&gion ofs*imila&rityto g-yceroph~osphoryl diester phiosphodleaterases 51- 3 5 1 168 1 1 1 1 I lCaenorhsbditis eleganal IIII 86 '1615371 '14493 i189) (E024 o288; 92 pct identical (I gaps) to 222 residues of fragment 57 I 34 4 879! 1 1 1 3 YgD7B93 ItEcoLl SW: P28244 (223 as) (Eacherichia colil I I- I93 1 3 1 1695 1 177 IgiJIlOOOO3 Imutator mutT protein (Hethanococcus jannaschiil 1 57 33 I 519 96 6 13026 14519 jglI559882 jthreonine synthase [Arabidopsis thalianal 1 57 I 43 1494 199 114 111211 118212 1gi1773349 IBirA protein (Bacillus subtilial 1 57 44 1002 I 4 112 6 8 I 448 7903 19111591393 jennachil predicted coding region HJ0678 (Hethanococcus jannaschil) 57 30 456 113 16181 138liI46SA5 ure-parasite-infected erythrocyte surface antigen MESA Plasmodium 57 22 I 001 I2 11 167jB2 jiI46SAS matliau 11312 1 343 11110 1pir1F64l491F641 1hypothetica1 protein 1410355 ilsemophilus influena (strain Rd ICWi2l 57 38 I 766 1123 14 121086 2884 IgnIlPIDId1O2I4S 11AB001664) sulfate transport system permease protein iChiorella vulgariel I 57 I 39 77 '1~ 1 127 110 1 64771 5587 1g111573082 jnitrogensse C (nifC) (Hammophilus influenaelI 57 1 35 8911 1128 11 9251 19790 IgiI153692 jpneumolyein (Streptococcusf pneumoniae) 57 38 is 5401 0 4 1131 14 12139 11363 191142081 InagD gene product (AA 1-250) (Eacharichia col)l 57 1 36 777 4 *D asD Is ant aI (nt I acs o a I a 136 5 5 214 a22 lb s558 5 as&- n oc r i i asuo o n astge [S r p o o cs 5. 
nu MU O i-1-9 S 10 15 2701 12685 IgnISSIGI~I03 R Ibe agcosidte pmaseriacius sbils 57 42 384785 T 4 144 1 973210 92785 jgni5PXDjdIO0lIO IOr (Acetobactar pasteurianusi 1 51 42 5 462 i I 1 65 4 02 1 544 456 5g15647004 5glyosyl tranease (Erwuni I a n mylovo r h 5o 57 5 34 814 I4- 176 141 j2970 1 9249 gnl5IP105d100139 I08 (A ero bacer A p ste uran us RN HEICS 42A 51MLG 462 lssbiIi 5 919 190 1 5 145 5 1455 1911149420 lexport/processing protein (Lactococcus lactis) I S7 1 30 1311 4 f- I 98 51 298 95 1911522268 unidentified 0RF22 IBacterlophage b1L67) 1 57 1 36 204 194 20 0 507 5gi11439527 jEIIA-man [Lactobacillus curvatusi 57 28 468 0 I 214 j7 4243 3797 IgnlIiDIdIO2O49 I. influena. ribosomal protein aisnine acetyltransferase; P44305 (189) 58 I1 I I I I I 18acillus subtilis) f- i 4 9 9L C r a u m l r p i l s n d g n o r c -ba il f- 9 1 5gI437 I curvatus)ssl I 351 1 5 324 1 34 1gn11P101e275871 5r03F6.b ICasnorhabditis elegans) I 57 5 31 291 I 386 11 1 226 j 2 1g11160671 IS antigen precursor IPlasmodium falciparum) 57 1 455 225 108 -4 -471 5 -5 51 42 8 777 g I- Es- ri- co- 5 367- 4 ium4 145639 23 I 8 5--36 -4 3910 5- -6--467199- j-ksC;-- (M--cobacte----iu- lepra--- 237-- 3 53442 1874 lgnl5P105d101907 Isodium-coupled permease (Synechocystis sp.J) 6 I 6 16 47 I 1 2 1 21980 25 6 33 5 i2313949 (AE000593) smPROEcro rti 2pTYTANF RoWX) 2aa llcobacbterl on) 56 37 1489 27 5 1 1 13SI15 3 1911215132 5ea59 (525) (Bacteriophage lambda) 1 56 5 30 5 1359 1 28 5 9 1 46.57 5 4278 5giJ 1592090 IONA repair protein RAO (Methanococcus jannaschti 5 56 1 29 5 3905 0 I 33 I 1 1 3 5 386 5gnljPlD~dlO0l39 5ORF lAcetobacter pasteurlanus) I 56 5 41 1 3841 4 I -4 -4 -4 a* e .a _t c- a ai a a a a a XD IDj Itl Inl aesla I I I(t' 50 4 13.1 5397 jpiJIaO3f1o jhcoheclu protemiprot (Sreion) u -Pseuonsaeruioa(tanPO 56 28 112 I4 j1 1251! 
j13191 jgnIPlDIe2l76O2 IlnU (Lactobacillus planterum) 1 56 j 38 6811 ft- ft I 5 I4j174 I2594 IgnIIPIDIdiO2036 Imembrane protein (Bacillus stearothermophilusi 56 1 25 1 921 1 05 13 1184?. 1 1459 IgnlIPlDIdtOOl3S lORE (Acetobacter pasteurianua( 1 56 41 1 3841 ft I 8 I7 l 815 490 9185377 Ipodct imla t EcliPRA2proei Iacllu sbrlil1 56 1 42 1 8761 105O 2 11601 2718 jgnIPlDjdlolgl3 Ihypathatical protein (Synechocystis ap.( 1 56 1 37 1 1359 1 1 112 13 1 2151 1 3194 1911537301 J0RF-o345 ltacharichia coli( 56 31 1044 I 13 4 75 I293IelIlIdlO934O lORE (Plum pox virua( I 56 28 2101 _2963ign1jPID F122 3 j1201 2054 IgIIl649035 hIgh-affInIty perlplasmic glutamine binding protein (Salmonella 56 30 81 I I I I I typhimurium( I0 1 I I
1124 J 8 13939 13694 IgniIPlDIe248B93 Iunknown Iycobacterium tuberculaais( j 56 1 27 j 246 12 40 171n11P101dl00247 jhumsn non-muscle myosin heavy chain (llama sapiens( I j6 3 2971 t 127 111 1 6608 1640Si 1gi12182397 j IAEOOOO73) Y4114 (Rhizobium sp. NJGR2341 1 56 1 35 I 204 1 1 134 15 1 4769 j 3849 IgnIPlOIdOB7O Ihypothetical protein (Synechiocystis 56 1 39 1 921 1 1 37 110 1 6814 1 7245 Igi 11592011 Isulfate permease (cysA) Iethanococcus Jannaschii( 1 56 34 it 432 1 142 1 8 15019 1 4582 IpirIA47O71IA470 jarfliImmediately 5' af rutS Bacillus subtilis j 56 29 1 4381 1146 18 1467f 1 3660 IgniIPIDIdlOlsll 1hypthatical protein (Synechacystis sp.( 1 56 32 1017 1 148 13 1190f 1 2739 IgnlIP1DIdlOlO99 lphosphate transport system pensease protein PatA ISynechocystis sp.( 56 36 834 ft 1 150 14 14449 2743 jgIPDe304628 Iprobably site-specific recombinase of the resolvase family of enzymes 1 56 27 1 1 1 1 glPD Bacteriophage TP21(
F 0 f I I IIresidues at1 an Zppro x. 3 20 as protein YXXCBACSU SW; P39140 IEscherlchia II I I I I I coli( 172 17 14979 15668 1911396293 jsimilar to Bclusutisyph.20 kna protein, in tsr 3* regionr 1 56 40j I I I I IIE cherichiB colil, Iutii Ipoh I0 1 1 186 1 7 3732 1 3367 19111732200 IPTS permease for mannose subunit IIPlian (Vibria furnissiil j 56 1 36 1 366 1C f t- 1187 12 12402 1819 1pir155790415579 IvirR49 protein Streptococcus pyogenes (strain CSIOI. seratype M149) 1 56 1 35 1 1584 SLO mac ach e *ae It sie 9n I length T4 I ID JID I I Intl I acession I I Intl 'I -404-- 0R-F 3- 206 12 13342 1 1633 1911559861 IclyN (Pasmid pADII 1 56 38 I 17101 I 219 13 116119 j 1096 Jg1(1i46197 1putative (Bacillus subtills) 1 56 27 1 594 1 230 2 409S j 1485 1pirlC603281C603 Ihypothetical protein 2 (sr 5' region) Streptococcus mutans (strain 5 017 1 I 1 I MZI175. serotype f)I I 233 I4 1 2930 1 3268 1gi11041785 Irhoptrr protein [Plasmodlums yoelii( 1 56 I 24 I 339 1 4 273 1 2 1 154.3 12-724 1911143089 flep protein (Bacillus subtilis) S6 1 32 1 1182 1 1 I 359 1 1 87 1 641 jgiJ1786952 I(AE0001761 o877; 100 pct Identical to the firat 86 residues of the 100 as 56 46 555 I 1 I 1 1 hypothetical protein fragment YBGB-ECCLI SW: P54746 IEscherichla col! II 1 363 17 14482 4198 Igil 1573353 looter membrane integrity protein ItolAl (llaemophilus lntluenzae( 56 1 38 285 1 376 111 2 1 508 IgnIIPIDje325O3I1 hypothetical protein (Bacillus subtiliS( 56 1. 33 I 507 1 1 18 1 1 1 836 1 177 IgnIjPIDjIOO872 ja negative regulator of pho regulon (Pseudomonas aeruginosa) 55 1 31 1 6601 1 28 14 118;A!4 1618 IgnIjPIDje316Sl8 ISTAT protein (Dlctyostelium discoidum) 1 55 1 40 1 207 0 29 j 6 14446 15041 19111088261 Iunkno*wn protein (Anabeena ap.1 1 55 1 31 I 546 1 1 38 116 I 9695 110702 1911580905 IBsubtilis genes rpml(. mnph. S0kd. 
gidA and 91dB (Bacillus subtiie( 1 55 1 31 1 1008 I 49 1 5 57 1 6182 1gi11786951 I(AE000176) heat-responsive regulatory protein (Escherichia coli( 1 55 29 1 456 I 51 14 1231111 3241 IgnllPlDjdl01293 lYbbA (Bacillus subtilie( 1 55 j 42 I 8611 52 9 9640-- 1186 11 3 16IRF419 protein ([StLaphyl-o-c-c-us aureus)- I -55- 23 j- 1227 4 I 53 1 4 I 1823 1 1349 1911896042 10sp? (Borrelia burgdorferi( 1 55 30 1 465 4 4 1 60 15 14794 15756 1iil499876 Imagnesium and cobalt transport protein liMethanococcua jannaachil( j 55 38 1 963 4 1 71 1 9 1141716 115408 19111857120 1glycosy1 transferase Ilsseria meningitidis( I 55 I 41 1233 4 7 5 6 131119 14229 IgnlIjPDe2O989O jNAD alcohol dehydrogenase (Bacillus subtills( 55 j 44 j 1041 108 110 1104118 9820 1gn11Pl01e324997 hypothetical protein (Bacillus subtilisi I 55 I 36 j 669 11 12 12-1 1307 1gn11P101e311496 Iunknowdn (Bacillus subt(iis( 55 I 34 I 7651 I 13 13 11l301*'-*17 j3945 -*-191(573423 -1-phosphofructokcinase fruKI- (Hlemophilus influenzeel 55- 39 -I 939I 126 5 6164 15907 19111790131 i(AE000446) hypothetical 29.7 lcD protein In ibpA-gyrB intergeatlc region 1 55 I 37 I 8581 I I I I I I (Escherlchia colil I III 0 a 5*5 5*C 9 TABLE 2 S. pneumonia. 
Putative coding regions of novel proteintdirmllar to known proteins Contig ICRF Stert IStop mtch mtch gene name i .Sim ident length I ID jlo Intl I Intl a .cession In Ij Intl Id 1 129 13 1 2719 1902 IgnlPIDIdlOl42S jrz-peptidase (Bacillus iicheniforeis( I 55 1 35 Iola1 138 13 12593 11610 1911142833 jORii (Bacillus subtillsl 55 37 9841 14 61 63 ~IPI~IO94 hmlou fhypothetical protein in a rapamy ,cin synthesis gene cluster of 55 26 12841 6 6916 533 jgnlPLD~dl0964 1 hoogue hygroscopicus (Bacillus subtilis] 4 I 141 3 138514 2136 lgi1472330 Idihydrolipoamide dehydrogenase (Clostridium magnumi 55 I 39 I 1719 4 I 347 110 110201 18921 1gnl1P101e73078 Idihydroorotase (Lactobacillus leichmannil j 55 1 3 8 1284 1 148 15 1343-3 14139 IglI290572 Iperipheral membrane protein U (Escherichla colil 1 55 29 690 1 48 1 6 j 4173 4650 1gi1695769 Itransposase Ixanthobacter autotrophicus) j 55 I 37 1 4801 49 114 1125641 111650 jgnlIPlDldl~l329 JYqja (Bacillus subtilisi 1 55 1 32 1 915 156 13 1111.3 550 1gi12314496 (AE000634( conserved hypothetical integral membrane protein Ilelicobacter I 55 I 3 I 5641 1 1 I I I py lon III 1 159 110 1 6625 5891 Igi1290533 Isimilar to E. coi OR? adjacent to suc operon; Similar to gntR class of551 9 1 1 I 1 I 1 regulatory proteins (Escherichia colilII 1 164 13 1 17814 2332 IgnIPlDe255118 1hypothetical protein (Bacillus subtilis) I 55 1 37 I 549I 164 5 12773 1 3521 j19i140348 Iput. 
resolvase lnp I (AA I 2841 (Bacillus thuringiensis) I 55 I 35 1 164 Ill j 742131 7216 IgnlIPlDe249407 junknown (Mycobactenium tuberculosis( j 55 1 38 1 213 1 167 15 1 3861)3 3345 1g11535052 linvolved In protein secretion (Bacillus subtilis) I 55 28 1 516 186 15 28) 12563 jgij6OhOBO i (EschericGe2Cp%7h suggests frameshilt linking to o267, not found j1 3511 1 189 1B 431.1 1 5396 jgnljPlD~ei83450 1hypothetical EcsB protein (Bacillus subrilisl 1 55 1 32 1086 -4 1 192 1 5 1 3271) 1 3079 IgIIl196504 Ivitellogenln convertase (Aedes aegyptil 1 55 1 38 1 192 1 195 j 2 124541 1384 1gi~i574693 Itransferese. peptidoglycan synthesis ImurG) IHaemophilus lnfluensae) I 55 I 33 1071 98 14 130131 j 2471 IgnIJPIDIe3l3O74 1hypothetical protein (Bacillus subtllisi 1 55 29 1 543 1 214 j 1 j 373 j 744 IgnlIPIDdlOl741 Itransposae (Synechocystis sp.) I 55 j 33 j 3721 1 219 12 11115 1456 Igij2B83Ol 10RP2 gene product (Bacillus megaterium( 55 I 30 1 6601 J 1 263 17 13742 1 3443 jgiJI137 Icgcr-4 product (Chlamydomonas reinhardtii( 55 48 1 300 1 285 111 2 1829 jgnl(P1D~dl00974 Iunknown (Bacillus subtills( 1 55 40 1 828 1 286 1 -1 650 1249 1911396844 ICRF (38 k~a) (Vibrio cholera.) I 55 I 31 j +4021 1 297 12 1122-) 1 1696 1831150848 jprtC (Porphyromonas glngivals) I 55 I 39 1 468 we -I I Intl a Intl I. 
ac s o Iw *o e TABLE 8 2iI19 hypemoie-putatiecodin rHegpions oflnovel rti sm a to 2-1 ID28 1 2 6ntl 2ntl gI 510 acessioni ISch roye Iee e I5 27 nt 2 330 1 1 11340 j 474 1giJ396397 IsoxS (Escherichia colij 55 I 29 867 34 3 2M3 1546 jgi(393394 jTb-29i membrane associated protein ITrypanosoma brucci subgroupl I 55 I 36 993 i368 i 3 J 941 1-05 jgiiiiO-67l1- IS-an-tigen precursor s od u aici-- r- j 55- 40 8 1- -637-1 4 4 1 8 14 10 3 17246 IgiIIl1 42 Iputative trncitoa(euao Bacillus subtilisiphi s 1 54 38 115 38 4 24 1A 150713 1 193 IgI314849 Iputtiv rsrpionyltN yteasl, eultrl clus stearothermophilu el I 54 27 125 4T S4- 4 52 I0 104 1213 Ign iDei86 ll FA Itraphyorter (Lcoillus) hevtcs1 54 1 25 1264 -4 4 -4 I 52 1 1 1114 1120 giI5176 VeA(tpyoocu iuas 54 I 9 120 I 5 I 12 g115817 endo-1,4-beta-xylanase tCellulmonaa fimil I 4 I 36 510- I 58 13 14'149 j 4246 IgnIIPIDIdIOI237 (hypothetical (Bacillus subtilisl 1 54 1 29S0 I 71 1 7 11D1;84 111703 Igii5iO2SS (orf3 tEscherichia colil I 54 I 31 10201 71 10 17146 127737 (gi(202543 Iserotonin receptor (Rattus norvegicusl I 54 31 192 72 (2 j844 1 109P (giIl48613 IamnB gene product (Piesmid Fl 54' 37 I 2551 4 72 (7 17438 16695 Igi,(1196496 Irecombinase (loraselis bovisl S 4 I 38 744 F 1 74 110 1140143 113465 IgiJ1200342 lOB? 3 gene product IBradyrhizobium japonicum( 54 1 32 579 -4-595-I23778 maurs- cllgees 4 74 112--116483--I- *-----g----3*7-9--I---tu---e--e--ated-protein----seud--mon-s--a-ca-igenes 30-489 possblyenc-es-- -0un-tpo- erase- 54 4 4 1 89 1 5 1 4433 13921 IgiIl472ll IphnO protein lEscherichia coli( j 54 j 41 1 513 4- reae rti ~edmnsaclgns 4 -6 96 110 Io i8058 -i8510 -IgniIPIDId1O2OlS I4AB00I481 SIMILAR TO SALMONELLA TYPIIIMURIUM SLYY GENE -REQUIRED FOR 54-- 32- 4531 I I 1 I SURVIVAL IN KACROPNAGE. 
[Bacillus subtilisi I jI 4 1 97 1 6 1 4662 1 3604 IgiI1591394 Itranaketoisse' Illethanococcus jennaschiil I .54 I 30 1059 -4 4 4 4 -4 4 1 106 111 110406 112010 Igi(606286 IORF..o637 lEacherichia colil j 54 32 1 16051 147 8663 17404 Ign1IPI oIdlOl6ls ORF ID:o31917: similar to (SwisProt Accession Number P373401 lEscherichia 14 15 116I a TABLE 2 S. pneumonia. Putative coding regions of novel proteinis similar to known proteins IConcig OAF Stuat stop I match Imatch gene name j t sim tident length. I ID 11ID Intl j acssion ntl) 17 1i 4 1 2477 13223 Igi3439528 JEJIC-man [Lactobacillus curvatusi 54 I 36 1 747j 114 12 120J-8 1 781 Ign~IPiD~d1005i8 Imdt *or protein lilomo, aapiensj I 54 I 35 1 2821 I188 111 52( 1 1188 IgniIPIDIe2so3s2 junknown liycobacterium tuberculosisl I 54 31 1 663 1 I 198 j 5 j 3582 j 2884 jgniiPID~e3l3o74 Ihypothetical protein loacilus subtilisi I 54 I 33 I 699 1 0 6 207 1 1 1 1641 jgnijPIDjdio3Bl3 Ihnpothetical protein iSynechocystis sp.1 54 I 24 1 1641j I 210 1 1 2 1655 jgi 12293206 jiAF008220i YtMP iBacilius aubtiiisj 1 54 1 29 1 6541 I 225 12 1966 j 2357 Ion) IP1Dje330194 1R11116.3 iCaenorhabditis eiegans3 54 I 39 11I92j1 I241 1 1 16811 347 IgnIIPJDIdiOl8h3 Ihnpothetical protein iSynecioCystis sp.A I 54 1 26 13 I 263 2 1907 1 1395 IgniIP3D~dI018R6 Itranaposase iSynechocyatis sp.I 1 54 1 30 4591 I 263 1 6 1 3450 1 29-77 IgiJ36067i IS antigen precursor iPlasmodium falciparuml I 54 I 47 1 47 I I 277 13 12537 1 1363 1giiI96926 Iunknown, protein iStreptococcus mutans) j 54 I 30 j 1155, 3 0-7 1 1 1 828 j 4 1giJ2293198 jiAF008220) YtgP IBacilius subtiliat I 54 1 28 1 825j 1 325 1 3 9 1 768 10112182507 ICAEOOOO83i Y41H iRhIziobium ap. NGR2341 54 1 37 1 750 1 1 33 1I 9 9 g[51L ADP-ribosylglycoitydrolase idraGI Itlethanococcus jannascitili 54 1 32 j 309 1 4 240 1479 011530878 amino acid feature: N-glycosyiarion sites. a& 413 43. 46 48. 513 53.
72 74, .10 30. 128 33.:0, 332.. 334. 358 3. 60. 163 165; 1 amino acid' feature: Ro poen dmain. a 369 340; amino acid feature; 4 4 I I Iglobular protein domai I 1 9" 1 25 119702 119493 jgnIPiDje255sI Ihyvotheticai protein Ieacillus subtiiisl I 53 1 32 1 2101 4 23 13 j 2497 1 2033 Ion) IPIDIdIO2OIS j3AB00314883SIMiLAR TO SALMONELLA TYPIIIMURIUN SLYY GENE REQUIRED FOR 53 146 I I I I I1 SURVIVA IN I4ACROPIIACE. [Bacillus subtilisi I 1 29 111 9042 110121 lgili43331 lalkalne phoaphatase aegulatory protein i~acillus subtilisl I 53 I 31 3 080 133 13 j 147) j 3009 IpirISIO6s5SIlO6 1hypothetical protein X Pyrococcus woesel Ifragmenti I 53 I 13 1 471 36 16 1 45831 5134 Ign3IPID~a316029 Iunicnown [Mycobacterium tuberculosis) 53 1 30 I 552 1 38 114 j 85213 8898 1gij580904 1homologous to F..coli rnpA [Bacillus subtilisi 53 j 30 3 78 52 F- 1 54 j17 117555 119564 Igi1666069 jorf2 gene product ILactobacilius ieichmanniii 1 53 I 36 I 2030 I 56 11 1 1 683 IgiIl592266 Irestriction modification system S subunit Imetlianococcus jannaschiil .I 53 1 32 681 I as *aD a sla Ia a 5~so 'as ant 57 10 9431 18487 111788543 IiAE000310i f351; Residues 1-121 are 100 pct identical to YOJL_.ECOLI SW: 1 53 114 110 1 jP335d4 (122 ee and as 152-351 are 100 pcL identical to YWJKECOLI SW:II I I I I IIP33943 (EscherichIa colI III 61- 1 1 1 429 1 4 IgnhIPlDje236d67 180024.12 ianradtseeaa 3 I 3 2 nohbitseegn) I 1 i11 5772 T 4 igi 1393394 ITb-291 membrane associated protein (Trypanosoma brucel subgroup( 53 I 33 1 57691 4- 72 13 1891 12840 IgiJ2293178 IjAP008220( YtsD (Bacillus subtilil 1 53 j 27 j 19471 I 73 114 1 9713 19212 1gi11778556 Iputative cobalamin synthesis protein (Eacherichia coi( 53 I 32 j 5821 88 7 5217 4342 1gij12098719 Iputative fimbrial-associated protein ihctinorsycea naeslundii( 53 I 38 876 T i I 3 I5 I23)5 1688 IgiJ563366 gluconate oxidoreductase ioluconobacter oxydansl 1 53 1 33 I 708 4 I 96 1 9 1 66312 17162 Igil517204 IORFI, putative 42 kna protein (Streptococcus pyogenesi 1 53 1 42 1 
1131 108 1 8 j 7629 18600 IglI14958l Isaturation protein ('Lactobacillus paracaseil 53 I 32 1 972 I 128 I9 1 6412 j 6972 IgnljP1Dje317237 Iunknown (Hycobacterium tuberculosis( 53 I 36 j 561 1 128 j12 1 8429 19253 jgijlllOO Ipentraxin fusion protein IXenopus leevisi 53 1 31 j 825 14 5 pirIA61607IA616 Iprobable heasolysin precursor -Streptococcus agalactiae (strain 74-360) 531 36 948C7 I 13 2 216 I 022~gi1751S0 Inoturin ~enpuslaeis53 30 861 I 171+ j 3 12304 12624 1gl11732200 IPTS permease for mannose subunit IlPhan (Vibrio furnissii( 53 1 32 11 321 4- 209 13 12948 11935 1giJ1778505 (attric enterobactin transport protein lEscherlclhia colil 1 53 I 28 1014 4 40 62- ne- -od- illus- -4 34 -7 218 88 246 jO621 ur gen prouc (Bclu sutls 53 j 34 1479-- 473 0 250 -i 3 473 79 Ig iP 0j3 4 7 IYb pr ti (B clu ubti- s- j 30 j 3181- 1 275 1 1 1j 1 1611 Ign]IIiDIdlOi3l4 lYcleW (Bacillus subtiliel I- 53 j 35 1611 1 4- 1 332 111 544 1 2 Igil409286 IbmrU (Bacillus subtilis) I 53 I 31 543 I I2 54 I345 gn1lP10le233879 1hypothetical protein (Bacillus subtilisi 52 39 I 903 2 I 3 12 1240 1237 Ig~396911cr gene product (Agrobacterium radlobacterl 52 36 975 i24-021---- 4- 1 5 1 3 18094 12356 IgnilP1Dle324915 11lgM protease (Streptococcus sanguisl 1 S2 1 32 5739 1 22 126 119961 120212 jgI1529Ol lORF 3 (Spirocheete aurantil 1 52 35 I 2521 22 13 (310 24i6 Ig2826 comE ORE] (Bacillus subtilisi 1 52 j 32 1527 I 7 I6 597 4801 101139573 P20 (ALA 1-1781 (Bacillus licheniformls) 52 35 $971 4 TA B L E 2 S. 999oia 9uat v 9oin reg on 990 099e 990en 'l mi a 9ok o n pr TABLE 2 35 .5024 pnuoi.-putative codigng rinspofte noeprtica cilr to know proteis 4- -0-T 4D lI Int 1nt ace0io 1 366 IgIPD~I24 Intl554 hooosaefudI l ndl.Ifuna;seS ISP 4 4- I- 4 54 1 4 81 373 662 IgillD0 dl224 IN.0554 hoisolgsi areiced foudin re. oilan m I I. 
influnza.; use jansWhiS R 1 52 36 1896 54 61 114365 113769 inIJe12O53 i74 0 9 jorf2 Latbcluhelveiu p Nl 34 52 25 26 66 4- 66 6 1 840 I3 1 5955 1g1231740 11AF013987einitrogereguatcoryli rten (irc hlre 52 f 19 14465 -i 4 4 IS 12 6 5250 496 10i35 845 Iq~~~j IJAEOOOO7SI Yai0 (~risours .p.1 G24 52 40 2821 81 66 3 6 1400 6295 jgnij43~140 I4rk pramuote inas (E acihia co ilis 1 52 30 1446 4 8 1 126 13d49 135312 IgnJI1Di4 93 Iunnown perycoacterbuium 1P-a tuec lsi chacoi 52 1 23 6954 I 21 1687 121803 Ignl4i3di227 IAospOi3I l FarA (Strepto y e sprol as 1IU-;tgsatc 52 2 7 1639 861 3 58 1439 2 19n11P1276 i149 IrhasnuILose kinas (Bacllusauilis) 1 52 2 1 1451 86 8 1 2 8 19497 117861 191147403 JR I 3nns permet a subuni l-PHnlahrci oi 52 372 150 96 11 6 105785 1 4659 19112879 lOR1p e ne ~rpt ctu thrc oplus)gtei 52 26 1194 il 86 I2 11939 117826 19144844 IcyoEF i 3 (citoasor pratso n lI eoocu acl 52 26 20150 112 1 2 11457 2 167 1911471234 lorfl. (Haemophilus influensaej 52 1 3371 1 118 3 291 365 IbsJ5123 ip24kdamarohag Ifecivtypotntato poten LIioIll i11356 1 12931mphla jhldlha1 2365de 1bbs1a&)233 g (egioneila 52la 122 19 15646 j 5951 19118214 Imyosln heavy chain (Drosophila relanogasterl 52 36 306 4 i22 111 6159 16374 1gi1434025 Idihydrolipoamide acetyltransferass IPelobacter carbinolicusl I 52 1 52 216 114 1 6 1 4880 1 6313 19i1153733 IN protein trans-acting positive regulator (Streptococcus pyogeneal 1 52 43 1 1434 I 135 13 11238 1 2716 1gnl1P101e245024 Iunknown (Hycobacterium tuberculosis) I 52 1 35 1 14791 I 141 13 11681 12319 IgnlIPlDldlOOS73 Iunknown IBacillus subtillsl 1 52 j 32 1 6391 161 4 2562 1 5024 19111146243 22.4% Identity With Eaclierichia coii DNA-damage Inducible protein 52 I 36 I 24A1 I putative iBacillus subtilis).jj 173 12 1968 1183 19111215693 Iputative orf; GT9_.or1434 IMycoplasma pneumonlael 1 52 J 30 j 786 a a ac 090*ee% TA BLE 2 S. 
pneumonlae -Putative coding regions of novel proceini'Anilar to known proteins Contig ORF Start I tp mthmthgn ae% aim Ident Ilength I D l 1ID In Intl I acession I mItp I 98 6 j4400 3561 IgnliPllIe30ol 1hypothetical. protein (Bacillus subtilis( 1 52 26 I 8341 210 112 1 8844 19107 lgi1497647 IDNA gyrasa subunit B Ilycoplasma genitaliumi 52 38 1 2641 4 4 214 It0 I 52f4 15431 loil550697 lenvelope protein (hluman immunodeficiency virus type 11 52 36 168 I 225 1 1 1 I 884 jgij 1552773 1hypothetical (Esc3.erichia cohlil 52 j 34 870j 230 111 3S 362 IgnlPlDIdiO0S82 junknown (Bacillus subtilil 52 28 j 324 287 111 8713 2 IgnIIPIDje335028 Iprotease/peptidase Ihycobacterium leprael 52 29 870 363 12 11305 1 4 Ig11393394 ITb-291 membrane associated protein ITrypanosoma brucal aubgroup( 52 32 1302 I 23 12 12048 11173 1gnI1Pl331e254943 Junknown (Iycobacterium tuberculouisl 1 51 30 1 876 29 13 174; 1521 IgiJ929900 l1 -metlylthioadetioslne phosphorylase (Sulfolobus solfataricusl 1 51 31 1 780 I 4 I 4(1 1597 gilI877429 jintegrase (Streptococcus pyogenes phage T121 1 51 32 1 1188 4I_ 00 3- b-c---r--ylo-iI 82-I I 73 I 5 1 42-)6 14016 IgiJ474177 laipha-D-l.4-glucosidase (Staphylococcus Xylosus) I 51 31 1 261 I 81 111 1 8935 112057 g9I(311070 Ipentraxin fusion protein IXanopus laevisl 513 31 3 123 100 I 83 1 S 1 19 1986 IgnlIPI)jdlOl316 IYqfI (Bacillus su~btilial 1 51 j 33 1 792 4 I 98 110 1 75-111 8538 Igij4l500 lORF 3 (AA 1-352); 38 kO (Put. 
ftsX) (Escherichia coli 1 51 1 28 1008 I 113 1 6 13908 15173 jgI1466882 Ippsl; B1496.C2.189 (Nycobacterlum leprael 1 51 1 27 1266 I 124 111 326 1 57 1g112191168 11AF007270) contains similarity to myosin heavy chain (Arabidopsis thalianal 1 51 I 32 1 270 'L 1 129 110 1 12116 1 6816 -IgiIl04624l lorf14 (Bacteriophage IIPII 1 51 j 30 1 471 4 4 1 143 3 14963 13983 Ig11354935 1probable copper-transporting alpase (Escherichia colil I 51 1 26 981 1 148 115 111359 110226 Ig1i2293256 J(AF0082203 putative hlppurate hydrolase (Bacillus subtilis) 1 51 36 1134 4 4- 4- 149 18 60033 7313 IgiIl633572 Illerpesvirus salmiri ORF73 homolog (Kaposi's sarcoma-associated herpes-like Si 21 1311J 1 I 1 I1 virus III 4 1 351 1- 9 112012 111550 IgnljPIDje281S8O Ihyporhetlcal 40.7 kd protein (Bacillus subtilisl 51 34 j 543 1 159 16 12555 13208 jglj146944 ICNP-N-acetylneuraminlc acid synthetase IEscherichla colil 1 5I 1 36 j 654 4 3 74 1 1 11177 4 IgiJI773l66 Iprobable copper-transporting arpase (Eschericitia colil SI1 28 1794 265 4 12231 I1773 IgnljPlD~e2S6400 lanti-Plfalciparum antigenic polypeptide (Saimiri sciureusl 1 SI is J 459 4 277 2 643 j1311 Ipir 153291515329 IpilD protein eisseria gonorrhoeae j 51 1 33 669 4 ate -6 0. .e'4 C 500 094 .C St 0 9* 0 TABLE 2 S. pneumonlae -Putative coding regions of novel proteind 'seilar to known proteins I Contig lOAF I Start I Stop I match Imatch gene name im 11 idt Ilnt I 10 jio I Intl I Intl I acession- 350 1 1 890 1 3 Igii29OSO9 Io0 ~cerci oi 51 1 30 1 see 4 9 1 363 4 j 1228 j 4485 19i11107247 lpartial.CDS ICaanorhabditis elegans) 1 51 23 3 258 1367 1 1701 J4 1i1393394 ITb-29l membrane associated protein ITrypanosoma brucei subgroup! 
51 f 32 1698 1 I5 1 5174 14491 fgnljrxoje5slsl jF3 (Bacillus subtilial I so 38 1 6781 I16 14 12220 12582 IgnljPI0je325010 jhypothetical protein iBsciiiuS siubtilia( 1 50 j 29 1 3631 I19 15 12591 14159 19111552733 jaimilar to voltage-gated chloride channel protein lEscherichia colil I 5s 30 j 1569 i4- 4 125 1 4 1 2701 11997 1I1I887849 I0RF~f2l9 (Eacherichia colil 50 27 1 7051 1 35 1 1 1 1 417 IgnijP10~e236697 Iunknown iSaccharoeyces cerevisisel I so0 33 207 1 39 1 I 3416 15152 IgniIPIDjdlOO974 junknown (Bacillus aubtilisi 50 1 27 1 1737 7 1i 4000 5181 19i11592027 Icarbamoyl-phoaphate syntitase. pyrimidlne-apecilic. large subunit I so 27I 112 I I I I I I(Nethanococcus lannaschiii I I I 1 S i 9 7179 18303 gi(1591842 jtype I restriction-modification enzyme, S subunit Iaethanococcus 50 28 1125 I I I I I I jannachili III 152 18 18740 19534 gi11144297 lacetyl esterase IXynCi ICaldocellum saccharolyticuml 50 1 34 1 95I l) 1 52 116 116591 115770 Igii2l08229 Ibasic surface protein (Lactobacillus fermentumi 50 34 1 822 a 1 57 1 7 1 6031 j 6336 IgiJ2275264 160S ribosomal protein LIB iSchizosaccharomyces poinbel I 50 1 40 306 1 71 123 129348 128383 Ignl[P1Dldl0l328 IYcilA (Bacillus subtilisj I o0 30 966w 1 86 112 111155 110769 Ign1IPlD~e324964 lhypothetical protein (Bacillus subtilisi I 50 24 1 387 93 12 11205 1330 IgiI066016 Isimilar to Escherichia coli pyruvate. water dikinase. Suiss-Prot Accession 50o 24 7 1 1 Number P23530 IPyrococcus furlosusi I II 1 96 5 11673 1 2959 j gni IPlDIe322433 Igamma-glutamylcystelne synthetase (Brassica juncea( I 50 29 1 128.7 1 98 j 2 1 218 j 1171 1g1115111O lieucine-. isoleucine-. 
and valine-binding protein (Psaudomonas aeruginosal I s0 j 30 954 4 1 103 14 j 3303 1 2785 (giJl54330 (0-antigen ligase iSalatonella typhimurlue( 1 50 j 31 j 519 1 115 15 j 6480 j 5980 jgij895747 Iputative cel operon regulator (Bacillus subtills) 1 50 1 26 501~ I129 Ill 1 7559 17305 jgiJ116475 Iskeletal muscle ryanodine receptor (Homeo sapiens( 1 50 1 32 255 I 129 113 1 8192 j 7965 Igi 152271 1319-kDA protein (Rhizobium eeliloti( I 50 1 30 1 228 1 151 15 1 7634 6819 101140348 Iput. resolvase Trip I (AA I 284) iaacillus thuringiansis) I 50 1 35 1 816 a 1 153 1 1 9 IgnIIPIDIdIO2OlS 11AB0014881 SIMILAR TO NITROREDUCTASE. [Bacillus subtilis( I s 29 597 *ABL a a a a2** c I .D .1 a ant a ant a ac o I a a S96.43 pneumoniae -ps Putativecong reons f ol prten simla tokowroen
160 j9 17390 16323 1giIl786983 IAE000179) o331; 92 pct Identical to the 333 as hypothetical protein 50 j 30 1068 I f YBHE-ECOLI SW: P52697; 26 pct identical (7 gaps) to 167 residues of theI I I i I 373 as protein ILE-TRJCU SW: P46057; SW: P52697 (Escherichia colil I 1 163 16 17396 18091 lenIllPIldlOI3l3 IYqeN (Bacillus subtiis( j o 0 22 696 111615232 13940 1911413926 jipa-2r gene product (Bacillus subtilisi I 0 jo 27 1293 169 12 1 807 1130 jgn1IPlIe304S40 jendolysin (Bacteriophage Bastillef I so 35 I 678 1 17 1'l 5 I3168 14025 1911606080 I0RF~o290; Ceneplot suggests frameshilt linking to o267. not found I 50 27 858 1 1 1 1 I Eacherichia colil) III 1 210 111 1 8151 18414 1911330038 114KV 2 polyprotein ((lumen rhinovirus( I so 25 j 264 1364 1I 1518 1135 1811393396 ITb-292 membrane associated protein (Trypanosoma brucei subgroupj so0 31 1 1404 I 0 7 5911 5090 g911144859 jOBF 8 ICloetridium perfringensl 49 24 1 822 0 I26 1 5 110754 19768 1I1II142440 jATP-dependent nuclease (Bacillus subtilis( 49 1 31 1 9871 66 7 9777 8398 1911414170 tkAgnprdclethanosarcina macmill I 4 6J 18 i T g----produ-t 1300; 77 6 5354 4648 1gnl1Pi1e285322 IRecX protein (Hycobacterium smegmatilI 49 1 28 1 717 182 113 112639 113249 IgnhIPIle2S5O9l Ihypothetical protein (Bacillus subtilis( I 9 20 5611 1 93 1 9 1 4856 14531 191140067 jx gene product (Bacillus epheericus) 1 49 1 26 1 3361 1 112 1 5 1 4019 14948 19111574380 Ilic-I operon protein IlicB( (Ilsemophilus influenisel I 49 27 930 I 129 17 16058 1 494 gInl1P1D~e267587 IUnknown (Bacillus subtilisi I 49 j 35 j 11101 4 I135 5 1 3875 1 4438 1911J39573 jP20 (A.A 1-178) (Bacillus lichanlformisj 1 49 25 5641 1154 12 114-23 1953 IgnIIlIldlOIlO2 Iregulatory components of sensory transduction system (Synechocystia sp.I 1 49 I 29 531
1 156 15 1 2818 11631 1on11P10ld10l7312 Ihypothetical protein [Synechocystis sp.j j 49 1 25 j 1242 I173 5 1 3500 12940 1911490324 I.ORF X gene product (unidentitiedi 1 I9 30 I 5611 1 182 111 1057 1 2 1911331002 Ifirst methionine codon in the ECLFI 0KV (Saimiriine hsrpesvirus 21 49 1 25 1 1056 1 192 16 15351 3667 jgi 12394472 I(AF024499) contains similarity to homeobox domains lCaenorhiabditis elegansi 1 49 1 23 1 16861 1 253 14 1 Il:9 1 1350 1gi1531116 151R4 protein (Ssccharomyces cerevisiae( 1 49 1 23 1 2221 277 1 1600 1 136 1911396844 10KV (18 k~sI (Vibrio cholerse) 1 9I 4651 1327 13 1 1435 1 887 1911733524 Iphosphatidylinositol-4.5-diphosphate 3-kinase (Dictyostelium discoidouml 1 49 24 5491 4 -4 99 9* 0 0 9 9 9 TABLE 2 S. pneumonia. Putative coding regions of novel proteins similar to known proteins ma c -t cI)g -n a-n am 4 I D lID Itnt) I Intl I &cession IjIntl 3 1436 132 11i393394 Irb-291 membrane associated protein (Trypanosoma brucei subgroupj 1 49 j 31 1 1305 I 33 1 I 4461 13277 1911145644 Icodes for a protein of unknown function ltacherichia coilt) 1 48 j 26 1 11851 I 40 12 1652 11776 IgnIIPIDIC29O649 jornithine decarboxylase (Nicotiana tabacum( j 48 29 1125 67 67 4- J- -1377-- 2384-- 2---et d g--uc--n-t---kina---e-----alofera- licante---- 48--30 1008- 74 1 2 14269 13871 1g112182670 IlAEOcOoI0I Y4vJ ifihizobium sp. 
NJ0R2341( 48 1 27 1 3991 I 81 12 11326 1541 1911I53672 Ilactose repressor (Streptococcus mutansi 48 33 j 78 I 81 14 129811 3646 jg1146042 Itucuioae-1-phosphate aldolase (fucA) (Escherlchia colt] I 48 j 30 1 6661 2- I 97 1 1 602 1 51 jgij 153794 Irgg (Streptococcus gordonii) 1 48 1 29 55 I 110 1 1 1 1 3132 19111381114 Iprtll gene product iLactobacillus deibruecklil 1 48 23 3132 I 131 15 12914 12147 jgniIP1I3~e183811 IAcyl-ACP thioasterase hlrassica napus) 1 48 2 7 1 768 4- I 133 14 13494 1262I IonIIP10Ie26I988 Iputative OaF (Bacillus subtills) 1 48 27 867 I 139 1 6 14231 14599 1gI11049388 1ZK470.t gene product iCaenorhabditis elegensj 48 1 23 1 3691 139 1 8 1 5036 1 5665 19111022725 Iunknown (Staphylococcus heemolyticusi 48 29 8 30 1 140 112 111936 111007 IgnIjPIOjI2O49 IN. Intluensae, ribosomal protein alanlne acetyl transIaerase; P44305 (189) 48 1. 37 I I I I I I iBacillus subtilis)II 4 I 146 1 9 1 5670 14654 iI19173I leelvalonate kinase (lethanococcus Jannaschiij 48 j 24 1017 4 I 161 1 3 1 1280 1 2374 Ign(jIDjdl0I578 ICollagenase precursor IEC IEschericluie rolfl 48o 24 1 1095 4 172 Ill 110581 111048 IgnlIPiDldlOll32 Ihyvotheticai protein [SyneclIocysls ap.1 I t 2-7 468j I 182 14 1 2930 1 2586 191140067 Ix gene product (Bacillus aphaericus( j 48 1 37 345 210 115 110786 111196 IspIPi394ILE9_ ILATE EN8RYOOEI4ESIS ABUN1DAN4T PROTEIN D-29 (LEA D-29). 1 48 1 30 411 4 214 112 j 6231 j 6482 1giJ40389 mnon-toxic components IClostridium botulinum( j 48 26 1 252 4 2211 1 1 704 j 3 l~il573364 IN. 
infiuensae predicted coding region 1H10392 iaemophllus influenze) 48 27 702 27 12 647 3928 'gI639 IAEOOOOOSI Ilycoplasma pneunoniae, C09...orflI18 Protein llycoplasma 48 10 3282j 127 2 11639 pneumonialee I I 4- I 253 12 1480 1 758 IenIIPiDje236697 junknown (Saccharomyces cerevisael I 48 3 t1 279 I 363 13 11874 11122 I91118137 Icgcr-4 product (Chlamydomonas reinhardtili I 48 j 40 1 753 I 389 111 505 1 2 191118137 Icgcr-4 product (Chlemydomonas reinhardtii) 1 48 1 38 1 504 4- I 3 121 120879 122258 1gn11P101e264778 jputetive maltose-binding pootein IStreptomyces coelicolor) I 47 I 33 I 13801 a a sa V*0, T A B E *S pn u on a Pu at v co in re io s oe l *rte n *r a o o te n 5 5 5 5 5 5 S 5 0 5 55 5- 55ti 5*R I Star I ato h mac gen *aeam i lnt *eso I 5 55 55 5 S. pnuona 1 09 145 gJ 1Putaiv codin reg6(aionus of hno rois 47ma to knwnprten 6 5 4 4085 1 4768 fgilj39573 O5 jun2Ow fA A c u 1-78 uBills 1hnfoss 47 1 23 19707 4 1 35 115 114511 113263 1gi11773351 jCap5t. (Staphylocnccus aureus) 1 47 1 20 1254 1 51 .16 1354-1 4002 1pir1A370241A370 132K antigen precursor tiycobacrirum tuberculosis 1 47 1 38 456 1 55 1 8 1101501 9273 1gi139848 JU3 (Bacillus subtilis) 1 47 26 1 882j 92 T 1 127 9 5581) 5386 111786458 [A20001341 f120;aThis 120 as orf is 76 pct identical 10 gaps) to 42 47 32 204 I I I residues of en approx. 48 aa protein Y127-HAEIN SW: P43949 (EacI~erichiaI I 1 22 1759 IgnlIPIDje2i65S Iunknown (Hycobacterium tuberculosis) 1 47 23 I 5281 4 1 140 4 4951 1 3542 ignlIPlDIdiO0964 Ihomologue of hypothetical protein in a rapamycin synthesis gene cluster of j 7 24 14101 I 1 I Streptomyces hygroscopicus (Bacillus subtilia)II----- 4 4 1 151 14 16814 16200 1gi11522674 IM. 
jannachii predicted coding region KJECL41 (lethanococcus jaitnasclsiii 1 47 27 615 1 157 1 3 1 803 1 174 lgnljPijdl01320 IYqgZ (Bacillus subtilisl 1 47 1 25 I 372 1 1 178 5 13267 1 2155 9l37 0 A000390) o334; sequence change joins OR~s ygjfl L ygjS from earlier 47 3 I I91379 vealon )GJR-ECOLI SW: P42599 and YGJS-ECOLI SW: P42600) (EacherichiaI Io I Iy coI I I ve 300 1 2 1880 1644 IgI18635755 Irinc finger protein Png-l (Nus musculus) 1 47 22 1 2371 1 54 114 114182 112638 1pir154360915436 IrotA protein Streptococcus pyogenes 1 6 1 24 1 1545 1 1 88 111 2 11018 1gnl1Pl01e22389l IXY103e repressor (Anserocellum thermophiluml 46 27 1017 1 96 17 4553 15860 Ignl[PID 1d101652 I0RF-ID:o347I5; similar to (SuisaPror. Accession Number P452721 ~saclierichia 1 46 231 13081 II I I I coll I III 4 1 112 1 11127 1 3 1gLI22b9 215 (A004325) putative oiigosaccharide repeat unit transporter IStreptococcus 46 24 j 11251 I I I I I I pneumoniae( Ij I 122 113 I 7308 7 982 1g111054776 1hr44 gene product (Hlomo sapiens) 1 46 34 I 6751 1 127 114 I 9198 18125 igif 1469286 jatuA gene product (Actinobacilius pleuropneumoniae( 46 1 28 j 10714 1 132 14 17093 16197 IgiJ 153794 Irgg (Streptococcus gordonilil 1 46 26 1 8971 I 140 1 8 1 8220 17723 IgiJ1235795 jpulluianase (Thermoanserobacterium thermosulfurigenes) 1 46 1 21 1 498 -4 4 4- 1 140 j 9 19205 1 8315 IgiJ407878 Ileucine rich protein (Streptococcus equisimilisl 1 46 I 27 891 999 9 9* 9 99 9*9 99.
TABLE 2 S. pneumoniae Putative coding regions ot novel proteins Similar to known prQteins ei I I JI1i I Intl I Intl I acession I I I F 162 j- I j I 11- -I25 -;g1j143209 OAF7; Method; conceptual translation supplied by author (Shigeila sonneil-i-- 46-- 25-i-- 11251 19 jI j 1 585 Igi1194717i 11AF0002991 No definition line found ICeenorhabditis elegansi 46 1 28 1 585 223 1 1971 1477 isrIPO2S62NYSS. MHYOSIN HIEAVY CHAIN, SKELETAL MUSCLE (FRAGMENTS). -j 46 271 495 i 232 2 760 1608 19111016112 Iycfl8 gene product ICyanophora paradosa]) 46 1 28 849 ;292 68 1220 j9-i111673744 E00 0 0 Hycopias-a pneust-onis-e.-cytid nedeaminae simila to I I I I II Accession Number C53312, from 14. pirum Ifycoplasma pneumoniael IjI 647 1 esidues of an approx. 216 aa protein YTXB..BACSU SW: P06568 lEscherlchla 4560 I I I I colii 48 1 6 13461 13868 jgilj722339 Iunknown lAcetobacter xylinus) 1 45 1 29 I 408 60 1 307 1 2 IgiI6990' coded for by C. elegans cOMA yk4lh4.3; coded for by C. elegans cOMA45630 60 1 j 07 2 IgjI 199079 kl4SgiO.5; coded for by C. elegane cOMA yklS2gS.5; coded for by C.45330 I I I I I eegas cOMA ykS9alO.5; coded for by C. elegans cOMA yk4Ih4.5; coded for I I I Iby C. elegans cOMA cas2OgIO; coded I -S--l----I041 99 71 9158 1 ,91.91129 mt&eion causes a succinoglucan-minus phenotype; ExoQ is atrenamembran. 45 28 1218j I ~pro0tein; third gene of the exoYFO operon;; putative lIfilobium melilotigI 127 112 1 046 16606 IblbsI153689 I~itB-iron utilization protein (Haemophllus influenzae, type b. D1L42. 14TH! 45 24 j 441 1 1 1 1 T14106. Peptide. 506 llhaemophiius influenzeelIIII 1 137 1 5S 1561 1 26.19 1911472921 Iv-type Na-ATPase lEnterococcus hirael I 45 1 3 1059 1 209 111 774 1364 1911304141 Irestrlction endonuclease beta subunit [Bacillus coegulansi 1 45 28 411
314 1 604 2 jgij 1480457 jlatex allergen I~evee brasiliensisi I I5 31 j 603 I 2* 18 198 1228 11434lOaF [Lactococcus lactlsl 1 I4 26 507 18112i0 I 87 I8 I7030 6452 19i1537207 J0RFf277 IEscherichia colil 1 44 26 I 519 T 166 15 1 4509 14037 lgn11P101e308082 Imembran. trensport protein [Bacillus subtilisj 44 25 j 873 1 247 1 1 1681 75 IgnIPIDIdlOO7i8 lORF! ifacillus sp.I 4 1 2 4 1 32 1 3 1885 1 3876 19112351768 IrspA Istreptococcus pneumoniaej 43 24 1 19921 7-- 36 157467 j18256 g 1045739 -tai u predicted coding--reg--on----0064-----c-p---s-a-genita---- 26-2790
54 | 15 | 14656 | 17343 | gi|520541 | penicillin-binding proteins 1A and 1B [Bacillus subtilis] | 43 | 27 | 2688
67 | 2 | 696 | 1352 | gi|536934 | yjcA gene product [Escherichia coli] | 43 | 29 | 657
139 12 2416 338 1911396400 Isimilar to eukaryotlc Na./ll. exchangers iEschierichia colil 1 43 f 24 2079 9*a 0 0 9 0 .00 090 000 009 00 0 9 000 0000 00 0 0 0000 0 0 0 0 00 9 00* 0 000 0*0 0 TABLE 2S. pneumonlae Putative coding regions of novel proteid!-slmilar to known proteins -i t I D 111 1 ntl I tnt) I acession maI tnt) 1 298 1 3 809 1gij413972 Iipa-48r gene product (Bacillus subtilial I 43 24 807 I 87 1 47 42? Igi 12315652 f)AF016669) No definition line found (Caenorhabditis elegansl 43 J 30 381 l- i I -185 4 4-T 4221 -i3127 11i2182399 j(AEOOOO?31 Y4IP (Ithizobium ap. 11082341 41 1 25 I 1095 i 30 jI 52 70 ~gnljPIDje2l86B1 ICDP-diacylglycoro1 synthetase (Arabidopsis thalianal 41 1 20 1 513 ft- -IT58 -ft- 36 1 941 191121783 ILHV glutenin (AA 1-356) I~riticum aostivuml 1 41 1 34 j 942 1ft- j3 J 48 1142023 eme fAPdpnen rnpr aiy very similar to mdr proteins and 1 40 18 12 t- f 1 365 2 1 95 11438 igiIl633572 I~erpesvirus saimiri 08F13 honiolog IKaposi's sarcoma-associated herpes-like 40 21 j 1344 I 1 1 I I IvirualIIII ft-- I 1 1 3 12979 3 860 Igni I1DIdlOl908 1hypothetical protein lSynechocystis Sp I 39 16 1 883 1 I 1 1 5 1 3814 14647 IgnlIDIdlOl96t 1hypotheticai protein ISynechocystis op.) 1 9 19 I 834 1 26 j 6 114035 110124 1911142439 jMTP-dependent nuclease (Bacillus subtilis) 38 1 20 1 33121 1 47 1 1 3 14916 Igi1632549 1111-180_IPetromyzon marinus) 36 1 23 1 4914 4 Caa a a a a a a. a 09 a 900 a a a 0 9 a a a a.
a 9*0 aas *aa 0 9 00 a a a a. a a a a a a a a a 9e 9* 99 *a *aa TABLE 3 S. pneumonia. Putative coding regions of novel proteins 1 -Contig IORF I Start iStop-- ID lID I (tit) (ntl 1 4 342a 3009 1 6 4611 4964 8--6--i I 6 1512 1574 I- I 532-49 3 125 1259046 125396 6 2 15 1689 -7-1-688-- 4-6--1--i 1 1$ 1315 j24 16 11-597-159 i--7 12 9-9 55 -4 90-; 4169 not ei-riar~ to known proteins 0 0 00 0a 0 900 06 TABLE 3 S. pneumoniae -Putative coding regions of novel proteins not similar to known proteins iContig -ORF iStarLrt Stop-- ID j ID I in tl (niL 21 5 4802 4482 22 121 117099 117362I 22 125 119467 119982I 22 133 25S40 125764 I 22 36 16382 127572I 23 1 6655 6032 2 9 23 i8 -i7132 -i6653-- 24 1 36 518 27 14 4819 4223 27 V5- 4789 4956 28 5 3017 1797 28 8 j4272 3850 1 28 110 15028 14597 1 28 111 1 5746 15072 1 29 7 5596 4919 29 8 5039 5518 29 9 5595 i8207 9 j6511 6263 31 6 2664 2344 32 5 5203 1 5538-- 34 812 5-327 -8468 I 34 13 966 1024 77 T A L9 n u o ia u a ie cod n *ei n of no e prte n *ot Of~ a to now prte n S *9 C Carti-s o C 0 C I ID I nt I (t C I 3C- C-8- ;1310- ;110-2- C 36 11 13104 10190 i~ 35 (11i 3 -l9 3 -l88 58 i -36 112 1-51120 1193 I 36 173 109932 1118 2--3--i 43 115 (1272-- 114595 I 3 128- 96 913 I 3---1-4 46 I4 I3875 I 3468 1 46 I7 I6074 I7081 I I 48 (5 (3196 582 I I 4 I I4579 4229 I 48 (6 13042 (12494 I 20 i-142 (15764i--48 24 (17971 (1I8351 -I I 48- (30- (21979 (i21776 -i T A B L 3 .p e m n a u ai e o in e n of no e *ro ein not eeor a eto *n w ro e n eq-- eq ec# cc-- C oni cc R Star I Sto I c so5 4 j3307 I2672 I 1 5I 3239 3598 52 j 116 128 I 54 a 6013 54S9 I 54 I9 I6004 6210 5- j16 117685 117506 I 5-5 1- 12-111947-112141-1 56 13 93S 1387 ,-4-T9-1-4-9- 15'II 210 251 I0 6 4 59 2 43 4 I 59 I I246 23 4-TI7 615 8 3316 3176 64 I 66 I 5459 592917- 66 6 574908 169495 0 CCC TABLE 3 S. pneumonia. 
Putativ, coding regions of novel proteins not hl'nlar to known proteins Contig jOAF start Istop I ID jID nt) IIntl I 5 j4059 j3922 6 4215 I4057 9 5268 j5504 71-- 1 15-120351-121901--I +1-71 1 16--121859 122338 I 72 9 848 88 73 I- I- 3815 4216 1 73 16 1 4214 j4582 I 73 I 1 4369 4773 73 1- i10- 7183 -i 6428-- 73 I 5 9462 9668-- 76 -524 195 76 12 1867 1535 1 76 I11 I 8602 19210 I1- 80 i 6 -i 7924 j -8109-- 81 1- ;10 -i 6631 -i 8931-- I 83182 1 83 1I7 116810 116460 I -84 i3-1 4464-1 29291 86 12 2147 j ;092 160---287- 86 119 116767 117114 I 87 5 536 87 -;6459 6001-- I IC S- 87 j- 9 -i 7224 -i7006-- 4 TABL 5 3 5 5 5 5*uona Puatv 55din rein of nove prtisnt55l okonpoen S oti 55 F 55 5 5* 5 5 5 5 Ss 55- 5 5 515 17570 SiD 119 11827 1179n8i1 4 88 87 11 173 177
B 94--5---606 88 3 21619 1840 88 2 7119 2878 9 114 622 501 91 6 1 723 68683 90 2 99 29 3- -T -C 903 -i 1143 279 -i 91 3 295 3141 961 4 1147 j 6147 916 6- i, 253 1420 i 99 11 845232 162 960 12 494 16320 96 4 9120 4203 4-4-5 9 9 9 9 9. 9 99a TABLE 3 S. pneumoniae -Putative coding regions of novel proteins not -41 iiiar to known proteins ~Contig ORF Start Stop 10 j1D Int Intl 106 I1 1 j363 I 10 110 1 9832. 110212 1 108 1 2 268 111i 3 3417 j 788 I 111 4 3809 4606 1 115 1I0 110854 110438 1 116 3 2873 2121 1182 12 1 2271 1571 1 I 122 21368 23330 F129------4-69 127 122 21 5858 6199 1 122 112 637091 741 1 11 0 92 46 F129 1 6259 102 139 21 180 11 45324 TABL 3 f C C C C Cn a *uatv Coin rein Cof noe prten no siia tokowC roen C t 1 140 j20 119622 j20838 142 1 1 1 285 146 13 760 479 I 146 j4 j1149 1778 146 1II 8223 9401 1 146 ~14 19399 110676 14 1 10052 1 9750 -6 F 7 1488 1 7276i147 ;9-1 -8913-1 8647-- 1148 F7 1 5298 -i47651 I 4 j 1 2 1936 j I 149 3 1 2557 2880 153 3 2061 2642 155 2 211 1411 156 8 4550 4311 1I 37 1294 i -159 2- i 631 80- 159 7 3271! 4017 47--- 161 2 1332 11018 1 C- 00 60 0 a TABLE 3 S. pneuueonlae -Putative coding regions of novel proteins not similar to known proteins Conig RFStart Stop ID 11D Int Int 167 j9 j6075 6395 169 i5 2828 j3205 170 7 6485 I6243 170 8 1 6964 6362 170 1 9 1 7303 6962 170 1111 8790 7906 1 171 9 7150 74-76 172 5 2298 I1948 F7 -176 ;--;2913 166 267 i175 i9 i 665 49835 177 It Sil 893 7 176 2 3487 574 i-4 1-1---006 183 2 220 24660 187 66 42 4- -T i 171- V10-1-4923- ;4817 4- .0 00: :00CC TABLE 31 S. pneumonia. Putative coding regions of novel proteins not simfiar to known proteins Contig IORF jStart I Stop I 10 JID IIntl I Int) S 1 188 6 5882 ;6493 I 189 j9 j5956 1 5564 1 1921 1 601 1 287 1 191 I 37 680 110331 S7-- 83-- I 192 5 624 4508 43288 192 6 6800 62531 *-19 F-0 19 i-6-1 6620 1 62351 -5--i-4-T37-- V-198 5-1- I 644 66449- I 200 5 252 52692 200 60 793 82305 204 5 25891 23276 14 i91 T A B L 3 S. 
ia Pu at v coin *eg on e nove pro eCn no CA a ok o n p oti OR tr .St* o*p* C *22 4 I 217 1 51 5194 3414----53--4-2- V-218 2-94-43 4 310--- -68 I 218 1 6 639 382 S4 219 1 3 5 362 I 220 2 869 6002 22 267 1964 227 1 1179 101 23 1 6 539 1312 263 6 2116 i 19038 I 235 1 52 i 312 235 2 310 6817 246 2 112 1 170 248 1 5 386 4 275 2 1179 1616 TA La.S.numna Pu av coin reios o noe arten A. ae o anw roen 4 282 1 684 39 i42 5--,-4-185 283 2 25119 j 262 288 1 j 2540 60 S- 2896 1- i 3- -684 V 43 3- 4 230 2 2539 1029 324 1 92 608 296 2 j 49 3570 296 3 627 843 302 1 261 5300 9 0 319 3 477 133 TABLE 3 S. pneumoniae - Putative coding regions of novel proteins not similar to known proteins Contig ID | ORF ID | Start (nt) | Stop (nt) 4 1 44 1 973 I 35 2 636 1448 360 2 1948 ;628 364 2 1639 11265 ;378 -11 -i345-1 -1004 1 I 379 I2 j683 510 i381 ;1-1 109 i693 4

GENERAL INFORMATION:
(i) APPLICANT: Charles Kunsch
    Gil H. Choi
    Patrick J. Dillon
    Craig A. Rosen
    Steven C. Barash
    Michael R. Fannon
    Brian A. Dougherty
(ii) TITLE OF INVENTION: Streptococcus pneumoniae Polynucleotides and Sequences
(iii) NUMBER OF SEQUENCES: 391
(iv) CORRESPONDENCE ADDRESS:
    ADDRESSEE: Human Genome Sciences, Inc.
STREET: 9410 Key West Avenue
CITY: Rockville
STATE: Maryland
COUNTRY: USA
ZIP: 20850

(v) COMPUTER READABLE FORM:
MEDIUM TYPE: Diskette, 3.50 inch, 1.4Mb storage
COMPUTER: HP Vectra 486/33
OPERATING SYSTEM: MSDOS version 6.2
SOFTWARE: ASCII Text

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
FILING DATE:
CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:
APPLICATION NUMBER:
FILING DATE:

(viii) ATTORNEY/AGENT INFORMATION:
NAME: Brookes, A. Anders
REGISTRATION NUMBER: 36,373
REFERENCE/DOCKET NUMBER: PB340P1

(ix) TELECOMMUNICATION INFORMATION:
TELEPHONE: (301) 309-8504
TELEFAX: (301) 309-8512

INFORMATION FOR SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 5625 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
CCAAGCAAAA CCAGCTACAG CTAAAGGAAC TTACGTAACA AACTGACTA TCACAACTAC
TCAAGGTGTT
AAAAAGTTGA
AGATGTGTAC
GCTATATCAA
GAACGACATG
GAGAGGGGCT
AGTCCATCGA
GGTATCAAAG
AGACGCTATG
TATTCTAGTT
AACCAGTCCT
CGTTAAAAGT
AGAGATTATC
GCTTTCTAAT
TTGACGTAAA
TCTCAACTTT
CTCACTTTAA TCAGTAGN'A AAGTAATGTA TTTTGATGTA CGACGGGCAT GTTGTATAGT
a 0 a 0 *0 0 0800 0 *0 *0 a 0 *0*0 *0*0 0 a. 0* a a 0 TCAATCTACT ATAGTAGCTC AGAAGTCGGT ACTTAAACGT TGAAAAACGT GGACTGGTTT CGTGTTrTGGA TTATTACCTT TAGTTGAACC GCCGTATGCC GAACGGACGT ACGGTGGTGT CCCTACTCGA TTrCGAAATC TAGTGGAATG AATCTGGAAT ACTCTTCGAA AATCTCTTCA AACCACGTCA ACGTCGCCTT CTTCGTCACT TCTATCCACA ACCTCAAAAC AGTGTTTTGA CTACAACCTC AAAACAGTGT TTTGAGCAAC CTGCGGCTAG TT'TTCATTGA GTATAACACA TTGTTAGAAG TTGGTTTAAA CATTTACCT'r CGATATATTA TATCCCATAG TTA.AGGTTGG GCCGTGCGTA TGGTTACTGA GC'rGACTACG TCAGTTCCAT TTTCCTAGTT TGCTCTTTGG TTTCCTAATC AGTTTGTTCA TCATACAGAT GATTATAGTC TGCCATGAAA AAAATATTTG TATCACCGAT ATTCTATACG TGTTTCAATA GTTTCGGCA.A TTTAGAGTTA CTAGATAAAT TTGAGGGTAA GGAAAAGTAA TGGGAATGAG TGGATGGATT TAT'TA'TGGA CAGTTAGTCT TGCTAATAAT GAGGAGGTTA ATTGCTAAAA CATTTATAGA TTCAATCCGC TATATATTAT ATATCTGATT ACAAACAGAA ATGGAGCCGT AAAACTrAGT TAACTGTAAT AGGATATTTT TAAATGGTAC TGCTATTCTr TTGATAGCAG TGAAGCAATG CTCAACCTTT TGAAGAAGAA AAGCAGTAAG AAAAATGTCT GAATAAAATT TGATTAAGAG TGAAGTAGTC TAAGAATTAG GTTrATGTAT AGTAGACTGA AATTAATTTT ACTTTCCCAA GGTATCGAAT CTTCATCAGA TATGAAAGCT TTTTATATCA GTTTCTTTAG TTGACAAAGA GAAATAAATA TAGATGAAAA TATCTTTAT'r TACGTTCAAT TTGCTACCTA TCA'rTAATGT TAATTTATTA GCTCACTAAA TGCATTATAC AGCAACCTT'r TGGATGATTT ATCTGTAGAT GTTATAATCA GTAGAAGCCT ATCTAAAATA GTACGAAACA TCGATTTGTT CTCATCTTAT ATGATAAAAT TAATCAATTG CTATTGAAAA ATTTATACGA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 GATGATGAAA GCCTTAAGTG TTA-TTATA AAGGTTATTT CAACTCGTTC CAAGGTAACA AGTCTAGATC AGATTGAAGC TGA'rAAAACG ATACAAAGAA AATA'rTCAAG TGAGCTAAAA AAATTrTATTG GATTTTATAA TGAGATTATT TGTGAGGAAA A'rAGTTrCCT ACA'rGTACGA AAGAGGTGGT CGAGTTGGrr TAGGTAGTCG ATGCGTGAGT TGATAATTCT CAGGGTATGG ACTTCTTT'r' CATGAATGAG GTAAAAGAGC AGGTATrGTT TAGAGACAAT CA'rTCTGAGC ATATTCTG CATAGAGGGA GTATCCGATT r'rATGATCAA AGTrAATACC GCCCTC'rGGT GAGAAGATGA GTAGGTTGGT AATTTAAACT ATTAAACAGA A -I-'TTr AAAAGTATTA TTTCATGAGA GAAATCCTAA Tr'rCACAATC CATAGGCAAA CGCTTGCAT'r TCGTrTTr"A rrGGACTATA ATACGTI2GGT 
ATAAAGCCTT CTGTAGTAAT AAAATGTAGA AGGTGTAGAA AGTAAGGATT TAGAA'rATTT GTAGTTAAAA ACACAATGTT GCTAT'rCCTT ACGATAGGGA GATAGATATG GCAATGATAG AAGTGGAACA TCT'rCAGAAA AATTTTGTrGA AGACTGTTAA GGAACCGGGC TTGAAGGGGG CT'rTGCGCTC CTT'rATTCAT CCTGAAAAGC AGACCrrGA AGCGGTCAAG GATTTGACCT TTGAGGTTCC AAAAGGGCAG ATTTTAGGAT 'IrATCGGGGC AAATGGTGCT GGGAAGTCGA CAACCATTAA AATGCTGACA GGAATTTT~GA AACCAACATC .fl 4 a.
TCGGTr'rTG CGGATTAACG GCAAGATTCC TA'rTGGCGTA GTCTTTGGAC AACGCACCCA CTACACTGTC TrAAAAGAGA TTATGATGT CTTTTTGAAT GAAGTCTTGG ATTTGAAGGA ACTGGGACAA CGGATGCGGG CGGATATTGC TTr'rAGAT GAGCCGACCA TT-GGTTrTGGA AA'rTACTCAG ATCAATCAAG AGGAAGAAAC TGATATT'GAG CAACT'rTGTG ATCGGATTT TGGAACGGTG AGCCAACTCA AGGAGACCTT GCTACCACGT CAAAGTCATC TCCTCTCTCA
CCAGGACAAT
GCTATGGTGG
GCCAGACTCG
CTTATCAAG
GGCCTCC'N'G
CGTTTCGGTT
TACCATTCTT
CATGATTGAC
TGGTAACATG
CTATGACGGT
TGATAGTTCT
CCGCGATTTG
GGAGCTCTAG
AGGAGTTGAT
CTTT'rGTGGC CGGCAAGATT ATGTCAAAGA GATTTGGCTC TGCAAGAGAC CTCTT'rCATA AGCGTATGGA GATCCCGTGC GGACTCTTTC CTCCACAATC CCAAGGTTICT AAGGATAATA TTCGTCGGGC 'rTGACCACTC ACGATTTGAG AAGGGGCAAG AGATTTTTGA AAGACTCTCT CTTTTGAACT CTGTCTGATA TGACCATTGA CGCTACCAGT CAGCTGACAT AAGATGGTGG ATACGGATAT GATGATCAAA TTGTGGAGAC TACTTACCGA CTCAACTTTA CTTT~TATCTC TGGAAGGCTG 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180
TAGACAAGGA
TATCAAGCAA
TGAGGATATT
GTTATAAACC
TTCTCTATCG
AACAGCCTCA ACATTGAAT ACCCTGTCTG ATTTTGAAAT ATCCGTCGCT TCTACCGAAA CTTTATCAAT GCAGGGGTTC GATTGGCGAT GTCATCGGG TCTrTGATTC TTCGCAAGAG TCTTTGATTC ACATCATCAT GAGTTTTGTG ACCAATCTTC GGGAGGAGGT CAAGGATGGC TCCATTATCA CCTCCTATr TTTCACCGAG CI'TGGTTCCA CATTTTTAAG TGTCATTGTC Tq'GATGAAAA 'rAGGATTAAC TGTCATTAT CTTTAGCr TTAATATTTG CTTGGATTT TCAGCCTTTG 152
AGGGCTTCAG
TGACTAGATC
TGCGI-rGT-r
AGTGGTTGAT
TCATATCCGG
TAACGCTCGC
TGrAAAAA TATGGCGGAT A'rCACCCTCT cCATrCGTCC ?r'rATGATTG
GCGACCAGTG
TTTTATCAGC
CA 1r'rGCGG
GTTGGCC'TC
TTAAGACTTC CATAGTGGCT TrTTATGTCGG GGAG7"MGAT AGGTTGTTTC AGATATTCTC TCCITrTTGC CTITTTTCATC TCAAGGTATT GTAGAGGTGC CTATCTGATT AACTTTrC'T TC'rTTGGGGT TCCAACCTAC TCCCTTGGCA TT7"rTTCCAA CTTGATTTAT ACTCCAGTTA GGCACTCCTT TTGCAGTTCT GAAACGGGTC CAGTCCTrrA TGCATCTGAT TNTMATCAGA
TGATCATTGT
TCTGGCTCTT
TCACCATTCA
TGGAAAATAC GATGCCAGTC AGATTCTTCA AGTGATGGTG GGATTGTCTC AGTTAATTT'G AGGAGGTTAG TATGAAAAAA TATCAACGAA CAATACATCA AACAAA'rCAT GGAATATAAG GTAGA7TTTG TGGTTGGTGT CTTGGGAGTC TrrCTGACTC AAGGCrrGAA TCTCTTGTTT CTCAATGTCA TCTTTCAACA TATTCCATTC CTAGAAGGCT GGACCTTTCA GGAATGCACC ATCTCTTTrT GGGGAGTTTG ACAAGTATCT ACC'rTCAGA TTGATGCCTT GTGACCAGCA TTGTTTGGAC GCGACCTTGA TTTATACTTC CAGTCAGGCG CCATGATTTA TCTATTTACA AT~w=TTCT GCCTAC2'ATC CAGCTAGCTA TTGATGTTGA 7rTCTCTGGT GATTCCTACG AAACTGCGGG AGAGATAGCT TT-CATTTATG GAT=TCCTr GATT'CCCAAG TGACAATCTC TGGCCACTAG GGCAACGCCT AGTCCGAAAA 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 GACTCGTCCC ATCAATCCTC TCTTTCACAT GGGTGAACTC 'rTAGTCGGTG GTATTTTATT TCTTCCAAAA TTCCTGCTTT TCCTAGT'rTG TCTTAA-AATC GCAACAGCCA GTATCCCTT CATCTrTCTAT ATGTTCAATG AC=rTGCTAA TCGTTGG'rG AT'rAGC~rrA TCGTGCCTTT
CCTAGTTCAA
GGGAACAACA
TArrCCTT'
TTGGACTAAG
G1'ATCCGATT
CGCCTTTACA
TTTCTTrACAG TTCTTGrr
TTCGTAAAAG
GAAAAGGATG TGTTCTTTAA CGTAGGAGGT ATI'CCCTTA AACI='GGGA TAAGGGCTTA TTATCATGTT 'rGTAATTGAA GAAGTCAAGG AGTTTGAA CGATTTGCCA GAATGG~wrTG AAGGAACCAC GACACTGCAA GTTTGGACCG TAAGCTTATC CTATT'CG.AGT GAAGATTGTG N'ATCAAGGT AGAAAAATTG GGAGCCAATT
CTAAAGTAAG
ATGAAAATCA
GAATCCCAGA
CCTATCAGGA
CAGAGATTGA
ACTAAAATCA AGAAAGAKAC AAAAAAGGCA GTTGTCGCTG AAGCACACAA GCCTATATAG GACTGATTTG ACTAGATTG TTGTCTCGCC GTAAAAAAGC GCTTGCTACT TTAGAGAGTG AAGCTCGTAA AAAAGTTGGT TATCTGCAGG TCAAAACAGT GGCAGA.AGGT AACAAATGAC TTTTATCGAG GTCTTGGCTr TAAAAAGTTA GAATCCGCAA AATCCTTGTC AGArTTGAT TAAAAAGCT TATTCTCAGA GTGCTATACT GTAAGTGTAA TCGCCGATTT GCACTCGTAA AGCCTAGGTT ATAGGTAGAT AAACGACTGA GTAGAAGATA. ACCGTTAAGC CTTACTCTTA GCGGcTTATT T TCTAATAAAG ATTATGATCG GAGAT??rTTC CTCAACTATG GAATAATATT ACTTGACATC AGCT'TAGTrG GTAGAGCAAG GGATN'GAAA AAATAGATAG ATATTGTTTA ATAGCGCTAA GTTCAAGTr-r ATTGCACTA ACGAAAGTTT TGGACTTATA CCGTGCGTAT GGTTATGACT 5040 5100 5160 5220 5280 5340 5400 5460 5520
TATTTTATCA
TTTTTGATGG
CTCTTCGAAA
ATTATGCCTG
TATGAATGTG
ATCTCTTCAA.
TTTTCGTGTT TCTGGTAG'TT CTTATAATGT ATCCCGGTTA ACCACGTCAA CGTCGCC= G TCGTCAGTTC TATCCACAAC CTCAAAACAG TGTTPrGrAGT GACTACGTCA GTTCCATCTA CAACCTCAAA ACACTGTTTT GCCCAATCTG CGGCTAGTTT CCTAG

INFORMATION FOR SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 7571 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
CTCTCCAGCT TTCCTTGCGA GTPTGGCCATG TTGTGTCTTT AAGAAGTCT1A AAAATATCTC
CAATAAAACG
GCTTACAAAA
AATAGTTTCT
CTCCATCTTA
TTGACTATCG
CGGCTAGCCT
CATCGCTCTC
TTTATTTACT
CGAACCACAA
TCCATTATAA
AATCCCATAT
CATCTACTAG
TCCTATCTCG TTTCTCTGTG TGTAGTGTAC TTGCCACAAT TCTAGTCGTG TAGGCTTGAG GTTTCCGCTG ATCTTGAI-rG ACCGCACAAG CTAGGCTTGC Tr'rTTTTAGT GCCATAACGC CAAGAAAGCT AGGCTTTGAC AAGCATCTTA GCGAAATAGA TGTTGAGCC TTTTCCT'rAA TCTTCGCATC TGAGATAGCC ACT'rTGCGCA CGCCCTCGAA TATCAGACAA ATTATCATCT 'TTGTACTTGT CTTTTTGTAT TGGCTGGTGC AATTCCAT CGTAAAGGTA CTTCCTGGCG TATAAGGTAA AATGGTATTG AGCTGCACCG TT TGAAGTAG AGCCAGCTAG ATAGTGGTT'r AAGCCAGTGG CTAATCACTA CATCCGGAGT ATAACCAATT GTCTGGCTAT TATCATTGGT TGCTTATAAG CATTTTCAAC GCAATGTTTC TAAAGACATG TCATCAGTGG TCGGAAAGCC ACCCACTGGT CACTTGTGTA CTCCGGATTG AAAACTGCTT CAGTTGTTCC AG7I?1CCCT 154 CGATGAACTA ATACCGGTAC CGTTGGTGAA AGTCCCCAAC 780 GCCATGACAT AGTCTGCAGG ATCATACTGG TCATCTTGTC TGACTCGCAA TAACTTGTCC ATTAAACCTT CA?rTGCAAA ACACCGCTTC CCAACGCGAC 7"rMCGCCTG CCTCAAA.AC AGATTAAGCG AFCTGCCAA GCA'rAGTTAT CAACCTTATA AAAGCCCAGC 'rTGCTTCAAC GGACTACGCT TTGA'IrGGGT AGCrACAGAC
ACTAGCATTT
GGCGGCGTAT
ACCAAGAACA
CTTGTCGACA
GGC?1'GATAC
GCTGTCATAC
TGCTGGCGTA
TGCATAGTTG
TTATCAATCA CCCCIrTTTG TGAATrTTTTA TCAATTCTAC TAATAAAATG AGCTTCAGC GCrGAGCCA TrGAAGAGG GTrCGNTCA CGGTCCACCr
CCCAAATCAT
ATAGGAACTT
TGCATCG1'AT
TAAACAACTA
AAATTCCGGA
CCTGTTTTCG
GGAAATAGCG
TCTGTGTAAA
ACAGCTTCAT
GCAACTTGAC CGACAACTCC ACGAACTCCC GATITGAGCAA ACGTTCCATC CTC'TGCCCTC TGCATATTTG CTTGGTAGTT TTGGTCCAGC ATCTCTTCCT CTGTTAGATT ATACTTGGAA TTTCCATGTT GAGTCCGAAT TAACAGTGGC AACAGCAGGT CTCGACTCGT TTTGATCCCT GGTTATCCAA CTGCTTATTC AAGGCTTAAT TGTAGAACCA ATCCAGT'rrT ATCATTGTCA GTTCGAGGGC TACACTTCCT ATGTGTTTTC ATAAACAATC TGCGGTAGCC ATTATTGACA TAACCACCGC ATCAAAATAA CGTGCAATTG CGAAGTCATA ATCCTGCTGC AACCATATTC AATTCAAGGG ATTATACAGT CCAGAC'TCAC TTCTGATGCA ACACACCATT TCCAAAATAA CAGGGGTAAC GGTAATCTGA GATTrTCCT TCATACTTAT TCAACT'rCAG CAGC71TGGT TTCTTGGTTT T'rAqCAAT TGCAAGACAG TATCGCGCCG ATTAGTAGAA TCTTCTACGG TCCGGCCCCT TGAGCATCCC TGCCAGAGTC GCAGCTTGAT GAAACTCCAA AGTATTTCTT ACTCGCATCT TCTACACCCC 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 GCG'rTGTTAA GGTACATGGT 'rAGAATTTGC TCCT'rACTAT AT'rTTTTGCT GCAAGGAAAA ATTCTTTCGC 'rTrTCTCTCA ACAGTTTGAT CCTGCGATAA 1TTAGCCAGCT GTTGGGTAAT GGTAGAGCCA CCACCTGAAC GTCCAGCAGT AAGAAAAAAC GGCCATAGTT AATCCCGTCA ?TTTATAGA AAGAACGGTC ATAACAGCAT TCTGCAAGTT TTTACTGATG TCAGTCAGCT CAACATAGGT CCAGACAAGG CACCAGCCTC ?TTTCTTCA CGGTCAAAAA TAAGAGTCCG GCATTTGCA AATCAT'rGAC A7TGCGAC TTGGCTACAG CAAACAAATA AGCAAGCCTG. CACTCAAACC TAGTATAAGG ATAATCTTTG TTAGATGATA
TAATTCTAAG
ATAGGCGTTT
GACA-ATAGCC
TTCTGTCGCA
TCCCTTTTrGA
AGTTTI'CAAG
GATTCCAACT
ACGACGCCAG
AATTTTCGAA
ATAGTAGAAT
TCAAATAATT
TCGGACCTAC
CAGAGTCCTC
TATCTAATTT
TTGGGCTAAT TrI'TCGAT CACTACCAGA GCGACGTAAG TAGTTCACTT GTTTC'rTT' TAAAAAGAGA AAGAAATTTC CATGCGTTTA TTTATCATC TTCATCATAG GAAGACAAGA ATTTAGCTA'r TTCCTATccA AATAGGGcTT TrrTTGTTAC AATATCTGTA TGCAATTC-Ac ATTTACATTA CCCGCCTCTC TACCTCAAAT GACAGTAAAG CAATTACTTG ACCAACAACT CCTCATCCCT AGAAAAATCC GTCA77M GAGAATCAAG AAACATATTT TGATAAATCA AGAAGAAGTC CACTGGAAGG AAATCGTAAA TCCTGGAGAT GT'TTGCC-AGI TGACTT'N'GA CGAGGAAGAT TATTCCCAAA AGACGATCCC TTrGGGGCAAC CCAGACTTAG TGCAGGAAGT TTATCA6AGAT CAACACTTGA TTATGTAAA CAA6ACCAGAG GGGATGAAAA CGCATGGTAA TCAACCAAAC GAAATTGCCC TrCTTAACCA TGTCAGTACC TATGTTGGCC AAACCTGCTA TOTCGTTCA'r CGTCTGGACA 'rGGAAACCAG TGGCTTAGTT CTCT'rTGCCA AAAATCCTTT TATCCTGCCC ATTCTCAATC GCTTATTGGA GAAAAAAGAG ATTTCTACAG AATATTGGGC TCTAGTTGAT GGAAA'rATCA ACAGAAAAGA ACTTGTTTTC AGAGACAAAA TTGGACGTGA TCGCCATGAT CGTAGAAAAA GAATAGTTGA TCCAAAAAAT GGGCAATATG CTGAAACGCA TGTAAGCAGA TTAAAGCAAT TCTCAAACAA GACTTCCTTG GCTCATTGCA AGCTAAAGAC AGGCGAACC CATCAGATTC CCCTCTCTAT AATAGTAAAT CTTTACCCAC CCACTTACTT AAAAGAATTA AAAAAGAATG CCACAAAGCC TTGCTTTCTA AGAGTACGAA CAAGTTGTGC ACCAATTGTT TACCGTCAAC TAAGACATAC CTACCATATC TTTGAAGCTG CTTTCATAGC GCTACCAATT CAC'rAACTGA TTACCATTCA ATTCTGGGAT ACGATGTTTG CAGCACCAGC AGGATCATTT GGTCACCAGT GTGTGCACCT TTCGCATCAT AATCTTCCTA TCCTGCGAGA CAAAGACAAG CCGGCTTATG CTTCATGCCT TCCGAC~drC TAGAGAAGCT AACTTTCACT GATGATCGTG TCATCCATTT 'rCAACTCA6AG AATTATTTAG AGTGTATGAC ATTTCGTTGT GTCAAGAACT TTAGTTTGAG TGAAGATACG ATTGGATCr TGCGTTCACT TCATCAACAG TCCAGTTGGA GTTGGAACGC TACAAGACCG ATAGCTTTTG GCGAGCACGG CGAAGGTCAC GTAAGCGTGG ATACTACTCA ACCCTTCA6A A'rACATTTGA TTCCATATAA AAAAGCAAGA CAATTTTTGC GAAGTATTCA CGTACCATGA TACAACTTTA TTGCCTCAAA CAATGAACCG CTGTGTAACC GTATGATTCG TAACGTTCTT TTCAAGAACT GTTGTGCAGA TCCGTCAAGT CAGCACCAGT TGAGTTAGGA CACCACGGTG TGGTCCG'rCA TCAATCCTTC AACAACACCA T'rGTAGTACA TGAAGCACCT TGTTGAATAC AACTGrTN'A CTCCACCmT AAGGTGTT GAACGATTTC TACACCGTCA 2580 .2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 
4020 4080 4140 4200 4260
AAGTTGTCTT
GAGATAACTG
ACGTCGTTTC
TCAGCTGCTT
GAAGAGCT
TTTCAGTACC
CACCAGGAGC
CTTTCT'rAGC
AGCCATTGGA
GTCAAGAACG
AGTGATAACA
A)LAGAAACCA
GCCAAGCAGT
TCGTGCGTTAG
ACTrT=AG
GTAGCTTCAA
156 GTAGCCCACT CCArTrGTWr TCGATCACCT TCAGCAGAAA CTrGATGAA TTrACCGTTA ACTTCAAATC CACCTTCTTT AACTTCAACA GTACCGTCGA AACGACCTTG AGTTGTGTCG; TIAI'TCAACA AGTGTGCAAG CATAACTGCA TCTGTAAGGT CGTTGATGCG TGTAACTTCA ACACCTTCTA CGTIAT ACGACGGAAA GCAAGACGAC CGATACGTCC GAAACCGTTA
ATACCAACT
TGAAAACAGT
TGAG'rTGAAT
ACCAAGGAAT
TACGCGCCGG
AAACAGATAA
TCGCArrCGC
CTACAATCCA
CTCCTTTICAT
CAATCATAAC
TAACTACCAT
AACTTGAATC
TGCAAGTATG
TAGTGATTTC
ACTACAAATC
GCCATTGTT
CTCC'rTATGA AAATCATGAA AT'rrTTATTG ACCTTTCAAC AAACCTA'rrA TACAACTATT TTCTATGTTA CTTC~rTT'r AAGACTGTAA CCCTTACTAT TCATAGCATA GAAGTAGGCT GAGACATAAC
AAGATTTAAT
CAAACTTCCC
GATAGCCTCG
GACACCTAT'r
CGCTGCTACC
TTAAAAACCT
ACCCCTTGTG
TAAATAAAAA
TCCTTGGAAC
ACAATAGCTT
ACGATTCTAT AGCATCCATT 'rTACTAATCT CAAGTAATAG ACGAAAACT AGCTTCC'rA TAGTGATGGA TGGGTAAAAG TGACI'TACAA CAACCAAAAA TGCCAGCAGC AAGGCGATC TT'CCTTTGAC ATCACGATTC TGATPAACCAA GTTGCATGAT ATTGATGTAA ATAATGATAC TAACACGAAT AAAGGTAALTC ACAATATCAA TCTTAT'T'TT CTGTAATTCT TCTGTTACTA AGATAAAATA AGA'rACAGCT TTCGTAAATC
GTGATGAAAG
GAACTCTCTG
C'rrTTGTCTG
CAGCCTCTTT
CACAATCAAT AATCCCTGAA TTGAGAAAGC ACAGTATACT TGATGGATCT TTGAGTTCCA CAAAATCGTT TCCATTTGAT ATCATTGATT ACACGTACAA ACTTCTACA ATGCTGGCTG TCCTGTCTGT 'rCAT'rCCAAT GAGACAGCAT GAAACTGTTG CTGTCCTCCA TGTCATC'N'C TCTTCGTT'rG AAATTrGAGCA ATCr 'ACTAG TTCGGCAGC AGACTGATTT GCCAATAAGA TCATTAGCTG TCAAATTTTT 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 T'TMTAGTAA ACTGCTTrGGA ATCGTTAA'rC CCTG1-rCA'rT TGTATCAGTA TAGAGGGATC CAGCCAACAC TTTGTCCGTC TCATTATTAC TAACAGAGAT ACTTGTrATCA TCATAAAGAC TCACTACTTG AGCATAAGAA GGCATCGTTT GACTCAGATC CATTTCTTGC CCATCTA'rAG TAATATTTGA CATGTTCATC CCAAAAGCAC TCTCCAAATA TTTAATAGCT TCTTTCCCAA CTGTATCCGT GATATATAGT CAATTGAAAC AAGAGCAGCA TAAAAAAGCC TCGTAAAAGG TATTGCAACT TGGTAATACC 7T?1'GAGGT GCTTTTTGAT ATGAGCCCAT GT-1TTCTCAA TAGCATTGTA CTCAGGCGAG TAGGGAGGAA GAGGTAAAAC ?r'rATGCCCA AACTCTTCGC ATAAAAGTTC TACrCCCC ATTCTATCGA ATCTTACATT ATCCATAATA ATAACCGATG GTGTGTTTAA TGTTGGTAAG AGAAAATTCT GAAACCAAGC TTCAAAAAAG TCGCTCGTCA TCGTCTCTTC GTAAGTCATT GGAGCGATTA ATTCACCAT'r TGTTAGACCT GCAACCAAAG 7 AAATCCTCTG ATATCTrCTr CCAGATAC?1T GACCATATTC TCGATAAAAA TAAGTATCGA GGTGCTTTAA ACTATTAAAA TTCTAAGAA AGTAGGTG1'G GT 'CNqrTTT CGAGTGTAGC TTGGATGACA GCCAAATTCA GAAGCTATTT GATAGTTrTT AAGTCTATCT CTATCAACCT T'rAGCTCTCC TGTNTCTCT T-rTAGCTTTA GGAAAACGTG TGATGCTTCT GTrATACTAC TACGAAAATC TATTGAATAT GCCATAAAAA TCATTTTACI! ATATTTGAAG AGGCGTTTAA AAGACATCCT TrAAAAAGTT AGTTTArI'TT TTCATGGAAA AATCAAGACT CTTAGCACTA ATCGCTAAAC CACGAAAACG GCrAATAGTG CGAGAACGTC C'rGCAAT'rAG GGTAATGGCC AACATGATAA TATCAGCACC CGCCGCCGCA TCCACCTCGA CCATTrTCAC AAAAGGGGCA ACACTACCTA, CTGCCGCAAT G'rGATrGTCT CGATGATTAT AGCCACCGCC AACTCTCACG GTAGTTTTC GAGTATCAAA TACCTTAATG GCTGTCATCG AAGCAATCCC TGATAAATGT GTTAAGAGAC 'rTCTCACCGA GCCTATGAT TCCCCATCCT 'rAAAT'rGATG AGGATTCTGG ACCCTTTGAA AAACGGTTAG CCCCGCTAAA TGCCTCTTAT TAATTGACCT TTTrAATGAGC a a.
0 a a 00 *0 a a a. a a a a..w a a .0.0 *0 *a a a 0S*a a a. as a. 0 a a ATCCTG?'TC GTCAATCTAA ATAAGGCTAC TTTTTCrGGG CCATAGCTTT GAGCGTATAG CAGTCAAATA AGCGTCTGGA TTCTTGGTr'r TAT'rCCTTTr ACCAGCCATA AATGGTATTA CTGTTCGCTC ACAATAACAG GA'rTATACCA CATTGTGTAC ACTATCTGAC ATAAAACTCG ACAACTTAGA CATCAAGGTA TGGGTrAAAC TACCACTGGA GTCATATCAA TATTTCCAGA TGI'TCAATCT GTTCCAATGA GCTTCTTCG4G CAGCAGCAAG TAGGCACGCG CTTGAGCAAT TTTAGCAGGA TAGCATCTGA GCATATTTCT CAAAAAGACG CAATCATCGC CTAAGGCTTC TGTAAAAAAT TCAACGCA-AC TCTAAAACCA AATCGCCACT AAGGTCACCT CGGCATCAAA ACACCAGC'N' CCTTGGCAAA TTCGTACTGT AATCTTCGGA ATTTGAAAAG GGGTAAATC CGTGAGATTT 6480 AGAACTITrT 6540 TArr'rrTGGT 6600 TTCTAGAGGA 6660 GGTTAACCCC 6720 GACGTAATCA 6780 ACATTCA.ATC 6840 CATATTATCC 6900 GCTTTCCACT 6960 TGCCTTTGA 7020 TAAATTAAAG 7080 TAAATTAGGA 7140 TACATAAGCA 7200 GCGTTCACAT 7260 AGTCAAACGA 7320 TAGGGTAAAA 7380 AAGCACACC 7440
ACAGGTGCTA
TCI'GTTCAT
TGGATGGTAG
TTGTCAGTAA
ACTTGGTGGT
TTGGCTTGGC
TCTCGCAAGG
ATTGACATCA
CATGATGATC
CTGCTTCAA
AAAAATGGCA
TGTATCATCT
ATGAACATCT
AGTTGAAATG
7500 7560 7571 INFORMATION FOR SEQ ID NO: 3: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 26385 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: TTTGCTAGTG GCTTAAATTC TTCAGGAAAA TCAGGCGTAT CTAAAAGTcG TGTCGr GTTTCATCTA TATAAAGACT TCCTGCTCCC CCTACAACTA GAAAACGTGT CTGTG?1'CCA GCAAGAAGCT GA'1TAAATAG TTCGATTGAT TTGCTGTGGA GCGGTAGCGT ATCTGGTGA TAAGCACCAA ACGCTGAA.AT AACAGCATCA AATCCAGTAA GATCATCTTT TGTCAACTCA AATAAATCTT TTTTAATAAT AGACTCAGCT TGACTTTGT TTTCAGAACG AACAATAGCC GT'rACTTCAT GTCCTCGTTT GACTGCTTCT TCAACAATTG CT'rrCCCCGC TTGTCCATrT GCTGCAATAA CTGCTAGT CA1-TTr'rAT ACCTCTC7TG TTGTAATTAT TTTAGTTACA GAAATTGTGA CACTCTTAAT AATCAATGTC AATAGTCTTG CAATA ATCAAAATAT TTCTACCAAG AAAACTAACC ATGATTCTAG TGAAAAAAAA TCTTcTTTG CAACAAATTT ACTTTCT'rGT TTTAA.ACATG CTATAATAAT CATAGCAAGA GATCTAAGTT GTCT=TT' TTAAAACGAG GTGATTATCA TGCGTAGATT CTATTCCCAT CTCCCCTACT ATCTGGTCAT ATTATTrCTTT TATTGGCCAC TTTATGAGTT GTICTTACTA GTTGTTTCTG ACCCCCTTAC ACTCAAGGGA CTCTATATAA ACAATCTTCT CTTCTTTACA CCTCTGGTAA TCTTGATTGT ATCGTTACTC TATAGCTACC GTTTCCGTT'r C'TCACTTGA TGGTTAGTTG GTAACGGACT GCTCTTTTAC T'rTACTATCA TAACCTTTGG TGAG'TrTATA CTAATTTACT TGCTAATCTA TGAAACAGTT GCTCTCGTCG GCATGGATTC TGGTATTAGC ATCAAGCATA TTCTACAAPA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 AATGAAAAAC AAA AACTTT CACAAAATCC T'rGAAAAATC TCCATAGAGA CAAGTCACTT AGTCCCTTTC TACTAGAGAG ATAGGAAGTC TAAACTGATA CTACTCTTGA GTrTTTATG CGTTAGAGCC GATCAGAGGT GTCCCTCTCT TT'TGAGGTAC GTTGCGACGT CCTTTCGAGG ATGTCGCATT TTr'r-ATTAG AAGAATTAGT GGAGCGCAGT TGGGCAATCC GACAAGCTTA ATCATGATTC CAAGTGGACG GTAGAAGAAG ACC1'CTTGGC TCACAATCAT GCTATAATAA TGCGTGGTTG CTGGAAACGC AAAACATAAA ACGGTGGCCA ATAAATGAIAG GTGGAACCAC GATACTAATTr
TCACGAACTG
'TTTATCTAAT
ATTTCCAACG ACTGGTGATG ACAAAGCAAG GACGCTACTA TGATGALAACA TGGAACAAAA AC'T 'CAGAA AATATCTGGT GGCTATTAGA ACTTTCTCAA TAGACATTCT GACGGAAATG GAAAACTTCC TCTCTGATAA AGAAAAGCAA GGACTTGGAA GTAGTCTGCT GATAAAAAAT CAATGCTTAG AAACTATGAA
ATGGAGTI'GC
GAAGTTAAC
GA'rATTGGAA
CCCTACACAC
CGTTTGGATA
TTGAACGTTA
ATAATAAAAA
159 AGGAGAACAT CATGA'rrAAC ATTACTTTCC CAGATGGCCC TCTI'CGTGAA TCGAATCTG GCGTAACAAC TT'GAAATT GCCCAATCTA TCAGCAATTC CCTAGCTAAA AAkAGCCTTGG C'I'GGTAAATT CAACGGCAAA CTCATCGACA CTACTCCCGC TATCACTGAA GATGGAAGCA TCGAAATTGT GACACCTGAT AC7TGT'rCGC CCAAGCAGCT CCATCGAAGA TGGTTTCTAC ACCT'rCCTCG TATCGAAGAA GTGAAGAAGT GACTAAAGAC AAT'rGATTGA AGAACACTCA CACGAAGA'rG CCCTTCCAAT C~rGCGTCAC TCAGCAGCTC CGTCGTCTT'r TCCCAGACAT TCACTTGGGA GTTGGTCCAG TACGATACTG ACAACACAGC TGGTCAAATC TCTAACGAAG GAAATGCAAA AAATCGTCA.A AGAAAACTTC CCATCTAI-rC GAGGCACGTG AAATCTTCA-A AAATGACCCT TACAAGTTGG GAAGACGAAG GCGGT'rTGAC TATCTATCGT CAGGGTGAAT ATGTAGACCT CTGCCGTGGA CCTCACGTTC TTCTCCATG'r AGC'rGGTGCG TACTGGCGTG CATCAACAGC TCGTATCCAA ATCTTCCACC GAAACAGCGA CAACGCTATG ATGCAACGTA ACTTGAAAAA CTACCTTCAA ATGCGTGAAG TCTACGGTAC AGCTTGGTTT AAGCTAAGGA ACGTGACCAC AAGAAGTGGG ACAAGGTTTG
GACAAGAAAG
CGTAAACTTG
CCATTCTGGT
TGGAACGCTA CATCGTAAAC AAAGAGTTGG CACTTGCTTC TGTTGAGCTT TACAAGACTT TGTTCCCAAC CATGGACATG GGTGACGGGG CGCACCACAT CCAAGr'rTTC AAACACCATG GTAAAGAGCT TGACCT= T 'rGCCAAATGG TGCGACTATC TTTTGGCTA CCAACACGTC CTGGTCACTG GGATCAT'?AC AAGAATTTGT CCTTCGTCCA TTCACTCTTA CCGTGAATTG
ATGATTTCAC
CGTCGTGAAT
TACACTCCAC
CAAGAAGACA
ATGAACTGTC
CCAATCCGTA
GGCCTTCAAC
CAAATCCAAG
AACTTGACTG
'TTTGATAACG
ATGGGCGTGG
ATCCAGATTA
1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 'rCGCTGAAAT
GTGTACGTGA
AAGAATTCCA
ACTACCGCT
ATCAGATGTG
ACTACTTTGA
AAACTGCCCT
AACCCTTCGA
CGGTATGATG CACCGTTACG AAAAATCTGG TGCCCTCACT AATGTCACTC AACGACGGTC ACCTATTCGT TACTCCAGAA ACGTGCCCTT CAGTTGATTA TCGATGN'TA TGAAGACTTC CCGCCTCTCT C'rCGTGACC CTCAAGATAC TCATAAGTAC GGAAAATGCC CAAACCATGC TTCGTGCAGC TCTTGATGAA AGCCGAAGGT GAAGCAGCCT TCTACGGACC AAAATTGGAT TGGAAAAGAA GAAACCCTTT CTACTATCCA ACTTGATTTC CCTCAAATAC ATCGGAGCTG ATGGCGAAGA TCACCGTCCA
TTGTTGCCAG
GTCATGATCC
ACCGTGGGGT TATCTCAACT ATGGAACGCT TCACAGCTAT CTTGATTGAG AACTACAAGG GGGCC TCCC AACATGGCTG GCACCACACC AAGTAACCCT CATCCCAGTA TCTAACGAAA AACACGTGGA CTACGCTTGG GAAGTGCCCA AGAAACTCCG TGACCGCGGT GTCCGTGCAG 160 ACGTAGATGA GCGCAATGAA AAAATGCAGT TCAAGATCCG TGCrCACAA ACCAGCAAGA TTCCTTrACCA A'rTAATTGTT GGAGACAAAG GCrACGGCCA AAAAGAAACA CAAACTGTCT CTGATATCGC CAACAAATCA CGCGTTGAGA TGGAGGCT' TTCTCATCTA rrTr'rACTCA CGCACTGTCG TTCCTTTCC GACCTCAGAC GAAATTrTTCT TAGATAGATA AACGCCAAGT TATCCTGAAA AGCCACGTrC AAATACTCGG GTATCTTTA TACAAAGCrC 'rTGCTCATCC AAATGGAACGA CCGAAACAGTC AACGTTCG'rC CAGTrGATAA TTTTrGTCAA GCTATCCTAG AATAAGAGTC TAGCATAAAA GCCTCCAATC
AGGACTAAGT
TCGATACGAA
CCAGAGGACI'
AGGACATCAC
ATATAAATCT
TCACTTGAGC AAACTGAATC TCTGGTGCCC CAGrrCT'rCA GCTGGGTCAA ACGGCCATTG TGTrTTTTTAT CCCGATTCCC CCAGACCACC ?rCCTTGGTG
TACTTGAGAC
ACGATTTCTT
GCATATTTAC
TCATCATGGA
TTGAAAATTT
AGTTGACTGG
AAATCCCTCA
TCrAATIrr
AGTTCCTGGC
ACAAAGCAAC
TGAAAGAGTA
CGCAGA'rAGG TrrCTTGGTC TGTTTGAGAT GA'rTTGC'rCA TATCAAGGTC ATGTAGATTG GAATTA'N'TC CTTGACCAAG AACT*=CAA ACGCAGGTAC ATAACCACTA GCAGCCACTT TTTATCCGTC ACAT'rTAAGC CTTTTTGAAT AAAGAAAAGA TCCTCAATTT GAACCTCCTT TAAGACCAAA TGTAAAACTA GGTTGGTATA GGAGTCGATT
CCTGTTCTAG
CTCCAATGGG
GT'NrCTTTC
CTGCTAAAGC
GATAGACCTG
ACAAGAAGAA
AGACAAGAAA
C1'AGAAAAAA
TCGATAAATC
CTGCTGCTC
GGTCTTTATC
TGCTTTTGAC
AGTTGCCGGT
TGATGGACCC
CTCTGCTGAT
CGACCACT'rC TGCAACTAAG
ACAAGGTATA
AGAGTTCCAT
CTTCCCTCTC TATTTrCCAAA GGAGACTTGG CGTTTCCACC AATATGTCCC AAGTGAAAAA AAAGTAGAGG AAGTAAATTC CTAGACTGGC TGCCAALAGAA AGCAGATAGA TAAAAAGACG TTGrT=rCCAA TCAAGCATGC TTCAATCCGT CTACCAATCC CTGCTCCTCC AACTTTTTAC TATCATCAAT GAAAAAGTCA CTGTTCCAAA TGTTGCCTGC ATGCTCAAAT AACACGCGTA AAGAC7"rGCCC TTGATAATGT AA.ATCCATGG CCAGCAAACT CTCATCACGC CCAAACTCAT
GTAATCCAC
CTCACGCGCT
TCCATAGAGA
TAATATGGTT
AAATAAAAAC
ACTACGGGAG
ACCCTATTCC
GCAAACGAGC
GTTCCCGCAT
AAATCTGGAA
ATTTGGTA'rr
AGGAACGACG
3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 CACATTGACA GAGAGGGTAT CAGGTCGTCA CGTGCTACGA TTCATTCTTG GTCAAATTCA GAGGATAACA CCAGCATATT CAACAAGCCC TGAACCTTAG ATCCGCCCCC ATAT-TGATTG CTAAAAGAAC CTGCTGGTCA AAAGGCTGG TCACAAAGTC CCATGACAAT ATCCATAGCC TGGTCTCTCG AAGAAAGAAA CATGATAGGT ACCTTGGAAA TCTTGCGGAT GGGCAAACCA ATATCCATCA GGACCAGATG T'rCCTGACAC CAGTGATAAC CATTAAACAA AGGTTCCGAC TGAACAAATA GACTCAAAAC TTCCATAAAG TC=CACCA GGACCACTTC AAATCCCCAT TCAGAGAGCA TTTCCCA.AT CTGTTGACGA ATGACCTGAT CATC7TCTAT TAATAAAATC TTGTGCATGC GCTI'CTCCTT TTCCATTATT ATAACAGATT TTCCATGCT AGATGGTCTG AAACTGAATTr TGAAATAGCC TGTTTPVMAGC CAGTACAAAC TAAATAAAAA GAAAGGAGCI' ATGATTCTTA TTACGACCTT CCTACCTCTC CTTTGATAAT CTC-AGGT'rCC AAGAGATCCC AATTGGCTCA ACACAAGCGC ACCCTCAACT GCAAAAGCAA TGATTGTCTT TCGTGGGACA CCTATA'rGAA GGAAATTCCT CCCATCATCC TAAGCAAAAG TCTATGCTGC TAGCCAAATT TTGATGCACC TGG'TCTCCAT AGGCTATGCT ACTAGCTAAT TTGAGGGAAA CTATGGCCA ATATN'rGA CTATCTCAAA CCC7TrGAATG AGTTACACAT TCTAACC=A CTGGTCTCCA CACTTCCTCA ACGTCTrI'TA ACCATGCTTA CTAGCAAAAA TCGCCTrCAA
TTGCTAAGA
GATGTCGCAT
ATAGAAATCA
GATCTAGCAC
TTATTAGATG
AACGACATCG
GATACCTATC
TTCCACCTGA
'rrCAAAAATT GCAAACrCTC TTTGCCGCTA TGACTTATCG GATGACAGTA TCATTGGCTG CCATTT'rATC
TGTCACCC
CAAGCGAAGAT
GCTCAAAAGC ACGCCCTCG CTATTT3AAAG AACTT'TTTTG GTTATTCTAG CTGCGCATTC CAAGGGACGA AATCTCGCTA
GAGCAAAGTT
CAAGAATTGA
TGCAAAATCA CATCACAGCA GTTrATACAT CACAGACTGC GGGrTATCAA AGGATAATGG CCATTATCGG TATGATGCTG GAAATTCCTG TG4GGTGGCAT CGCCCAGCAC GATACCTTTA 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 ATAGAAGCAA GATAT'rCATT CCACAAGGT CTCACCAAAT CATCGTTCAG AGTACTGCCC GT'rGGCAGAT TGAGGACAAG CACTTCGTCC AAGTAGACAC AACCTTTAAA GAATGGGTCG ACTTCGACCT CTTCTTTGGC ACTA'rrCTTG CTCCTAAA GGCGCTTGAA TACA'rTCATC
AACTGGATAA
CCACAGTCCC
ATGCTCGTAT
ATCTrCTTTGT GACCAACAGT GATAGCCAGC TGACGA6AGAA CTTCAGCTCT TAGCTCTATC AATGAC'rCGG CCAAGCTCAA TCCCTCACTC CAGAAGAAAG AGAAACCTTG GGTCGCCTTA CCCAGTTArT GATTGATACT CGTTACCAGG CATGGAAAAA TAGATAATAC TC'TTGAAAAT TAAATGTATA CAAAACAAAA GACCTAGAAT ACATACTTTC ATCTGCATTC TAAGTCTTTT TAAATACAAT CTAATAGTCA ATAAAAATCA AAGAGCATTG AGAGATAATG GGGCTTGGAA CGTCCCTCTC CTCAACAA AATGACCCCA TTATAGATTA AAAAGATGCC ACTTAGAAAA AGCAAAAAAG GAAGTAAGAC AAAGGCAAAT ATATAAAAAG CTAACTGAAC ATTCTCGTAT CCA'TrTAT AAAAAAGGTA GGATAGATAA AAA'rAACTTG AAATGAGGGA TAATAAAAAT AATACTGGAT TCCACAAACT TCTATTATCC GGCTAATACA ArrCCTATAA TTCCAAAATG ACACTATAAA CCAGATACAT TTCTTACTCC
0O S 'r'r'AATAGCT ACA7=~ATC ATAATTATCC ATCCTTCATC TGACTCTCTG CATCGGCCAC TGCCTCCATA GTCAACI'GAA TTcCCTCCAA TCCTACAGGG CAA'TTCGAT TCGGATTGTC AAGACATTCG ACCGCCTGAT AAACATCThA CTCTGTTCCG CCCGrTCCAC GCGCTACTGA GATCTrrTC ATAATGACAG GATTGACCCC CrrGCTTTCC rrCCCCTCGA GGGCAATGAT AC 'TGGAATT TGCATCCTCT TCTCCT TCTATTATTA TACCCITTTTT AGTTG'TAATG TAACTCCCGA TCGCAGCCCT CrrTCTGAGC ATGGTGGATC CCATTGACCA GACTTTCATA TTTAGCCAGC TGCAATTCAG CTGCTACATC CAGGCCTTTG TCATCAAACT CCAAAATCAT AGCGTCCAAA AAGAAAGGTT GGCCAATCGA CCCGTAACGA AGCAACTGCT GGTGAATATG AGTcA~CCAAc CGGTCAAACT CCTCTTGTTT CTTATCAGGC AGATGATGGC 'rAATTCCCAC AAGAGGTTGA TIGTGGAGCA CTTCAATTTC GCACTGGCGC AAGAGATAGC GT'rGACTGGG ATGCCAAAGA CTGTCTTCCC AGTTTCCCAA CAAGTCCAAA ATCCTTCTAC GCCCTGTCCC TTCATCCACT CCTATCTGCC GAGCATCTGC ATGAATATCT GAAAGAAGAG CTATTTTCGT ATAAGTATAA CATAAAAAGT CACAGCTAGA AAGATATTAG ACAAGAGGAA ACGAATGACC TATGAAAT'rG GCATAGACCT GCATAAGATT GTCCCCCC CTGCCGTPLAC TCAAATGATC AAGGACAAGG AATGTGGCTA TCTACTCACT TATCG1TAAGC ACCGCTTCAT TGAAGTTTT 162
AAAGAAAAAA
GACr-TCT 1-rrrTGATCC ATGGAAACTG AAGAGTTGAC AAGACTAATA-TCCTTAAGaT AATCAGCTCT GCCTTCTTCA GACACTAGCA GCCAGAAAAT
TATCAGCATA.TGAGTCGCAA
ACGAGGCTAC CCTGCCTCTA TCAATCGTTA CCACTT'rrCA CAA'ITTC'rC AAAAArwrCCT
GTAAACCTCA
GTAGTCTACC
ATACTGGGCC
ACCCGGATG
TCCATAAACA
GCCAGTATGA
CGTCAAATCC
TTCTAGGGAA
GCGAGTACTG
AACTCTAGCC
TGGCATGAGA
CAAAACAGCC
CATATrCCATC
GAAATCTAGC
CCAAACAAAG
ACCAACAAGG
AAACGAATGA
GACCTCGGTC
CTAGTTCATC
AAATAGGGAA
CGTCGGA.AGT
CGCAAGTCCT
ACAATCAATT
GCAATATCAC
ATCAACTCTC
CCAAACTGAC
ATTTCCTCTA
TCCAATTCCT
GTAATCGGTA
ATATCTCCCA
'rCCAAGGCGG TCCTCGTT7TT 7TTTTTTTGAT
AAGACTATCT
GAGGGCATTT
AGACTGGTTT
AAAACATCAT
ATCCCTCTTA
GACCAAGTrC
GAATATGAGC
CTGTCTTACC
CCTTGACAAT
ACTGGGACAA
CACTGGTCAC
TGG'rAAATCT CTCTTr'TT
ACCAGTCGTC
GATGATGAGT
GTCTCAGGTC
CCATATCI'AC
TCCGTAGCTG
GCCCACCAGT
AGGGAGGATG
GCCCCCACTT
GATGAATTTG
AAACATACTG
TACGGACACC
GTTGATCCAA
AAAGCCAGTA
TGGTATTTCC
TTCTCTTGCA
ATACTAGATA
AAAATGTATT
6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700
AAATTGCGGC TCGCATGCAA AAAGTGAAAA TCTCATCCTA TCAA.ACTGGT CTCTGAGCTC ATTTAGACTA TACAAGTGAC CAGAT~rCACG AGGAAGCrGA GGTCTTG.GAA CACACTGTCT CTACATAAAC TGCrAGGTTT CCCTAAAACC TGCCCCCACG GGAGAACTAC TCGTTGAAAT CAATAACCTC CCACTAGCTG TACCGCCTGA CTCGGGTCCA CGATAGTI'TT GACATTCTCC C?1'CACATCG GTGACCAGCT CCAAGTCAAG CAG7TTGATG ATCCTCAGTA ACGACGAGGA TTACAAGTG AATATGGACA GAGAAAATCA ACTAATT'rCT CAAGTCCCCT ACCAACCCTG TC;TCAACTGT AGTGGGTTGA AGTCAGCTAA GCTCGACAAA CTGACCTG~r CGTGGAAAGA CGGGAACTA'r TCCTGCCAAG ATATCAAGGA AGCTGGCGCC ATTATCTGGA CAAGCACTCA GCTTCAGCAA TACCTTCACT TTGCAAAACA ACTCTATGTC AAAGT7rrTAT
CACAAATTT
'IrTCAAAGTT T'N'TTGATAT 'rCAGAGCGAT AAGGCATTGC GCr'rGATAAG TAGTGTAGTT GAAGGGCGTT GTCTGAAAAA TAGGATGAAC GTTTCTTAT TCTGAAAGTG TCTGTGAAT AGCTCAAAAG GTAGGACGAT AAAATCGCTT TACGCTTGA TAGCCTTGTA CGCACACGAC TCATAGCACG TTAGCATTCG GGACTGAAAC AGCGAAGCTG TTTAGCCAAG CTTGACAACG AACGGCTCTA TGCCTTCAAG AACAGTGATA TTCCCTTAGT GAAGGCATAC GCTCAAAGTG AAAGI'CATTG TT'rGA'rGAGA TTATTGGTCG CTTCCAGT'T GACAA'rCTTT TCTTTATCTT TGAGGAAGGT CTGCTTTAGA TTGTCCTCAA TGAGTCCGAA AAACAGCAAG AGTTGATAGA GCTGATAGTG CTTCTCTAAA ATCTCTTrAT TGGTTAAGTG ATCACTCAGT TTACCGCTAT CCTGTTGTAT TTrCATGGGAr TTTCGATCCA ATTGGTTCAT GCTAAGATGT TGTACAATGT GAAAGCGATC AGTCTGGGAG ACTGTT'rCAG CCTGAGCCTA TCATAGTAAG GACTAAACAT ATCCATCGTA TCGTAGCGAA GAAAGTGATT TCGGATGACA ATATTAAGAT TATCAAAATC TTGCGCAATG TCATCCCAAG ACATAATCTT TGGAAGCCGA AGC'rTGCCAA TGACAGTTGA AGTTAAATG AAAAATCCGT TTTTTGAAGT
T'N'GGCTCTT
TGTCCTTTCT
CCGAAAACCA
GGCA'N'AGAA
TTTAAAGACA
AAA'rTCC
GTGTTTCAAG
CATACGAAAA
GAGCTTCCAG
AATTTCAACA
CAACACGATT
GAAATTTGAA
ATGATTTTCA
GCT'rGTCTTC AAAC1'CATCT
GAAAAATCAT
GCCAGCTGAT
TGGTTGATGA
8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 T'rAACTTTTG AGCAATYTTT GGGCAATA'rC AGTCATAGAA A1T=CAA TACGAGGGAr TTGGTGATT'r TrcTTTAccA AGTGA'rAGCA CTTGAAACGA CGCTr'rCTAA CAAGATAAGG AATTTrAGAA GGTrTTGAA CAGGGCAAGA TGGGGCGTCG TAGTCCAGT TGATGATGTC TAAAATCTGG ATATTAGGGT GGGGAGTCTC AGCAACCATC A7TTTGAAC GGAGAATTCT AGAAGGCATA CCAGTCGrT
AGTCATATTT
TGGCGATGAT
CTTTAATGTC
CTTrCAATTGG TTTCCGCACT 'rTCCTTrGTGT GTATCCTTAT TAGTAA=N GTGATAAAAT 164 GTAATTGTTC CATAGArrC rrCTAATGA GTTGITTTTGT CGCTTTTCAT TATAGGTCAT 10500 ATGGATTT TTTTCTACA TAAAATAGGC TCCATAATAT CTATIAGTGGA ?N'ACCCACT 10560 ACAAATATTA TAGAACCGTA AAAATAGAAG GAGATAGCAG G7*rMCAAGC CTCTATCTT 10620 TTTTGATGA CArIrCAGGCT GATACGAAAT CATAAGAGGT CTGAAACTAC TrrCAGAGTA 10680 GTCTGTTC'rA TAAAATATAG TAGA7TGAAA TAAGATGTGA ACAACTCTAT CAGGAAAGTC 10740 AAT"TAATTT ATAGAATTAT Vr'rAGCAGTC AAGGTGTACT G'IrATAGAT-r CAATATA'N'A 10800 TATGACTAT'r AACCTTGTCT 'rCTCCTAAAA TTGACTTTCT TG7"r~TTrA TCTTGTCCAC 10860 TCCAAACAAG TATTGTAAGA ATTTGATTAT ?T'rGAAAGT ACNTTTAATA TACTTGATAT 10920 AGTTAAAAAA GATTTGAAAC TAAATTCCAA A'N'AGAAAAA GACTTGAAAT ACTAAAAAAA 10980 AAAAAGTATA CTCTAATTCA AAACGG;TAAC AAAACTAATT TAGAGAATGA AATATAGAGT 11040 AT'rTCTCTCT TAAAAGTTTr TGCTGAAACG AGATGTAGAA AGGAGATT'TA GCCAAAGAGT 11100 CTATTAGTGC TAGAATAATA GATTAGAATT ATN'TAGAAA AACGA.AGTGA GCAGCTTA'rA 11160 AATTCAAGTC CCCAAATAGA TTCATACTAG TATCTTTTGC AAAAAATAAA GGGCGACTTC 11220 CTTCATGAAT A'rCAAT'rTCA TCTA'rAAGGA AGGTAGCTAA TTGAACTAAC TTATTTATTC 11280 TGTTIGTCGC TAGAAAAATC AGACCTCCTT GTGAAGATTG AGGAGATACT TAATGAAAAT 11340 **.CAAAGAAGAA ACI'AGCAAGC TAGTAGCAGA TTGCCCAAAA CACCGCTI'TG AGGTTGTAGA 11400 ***TAAGACTGAC CTATATAATC CAACGTGAAG CGACTGTGGT TTGAAGAGAT TTTCAAAGAG 11460 ***TATACGCTAG AGAGTAGTGT TITTTATG'rCC TTCTAGTAGA AAATGCTAGA CAGAAGAATG 11520 *GCGAACTTGG ATAGGAAAAA TAGATTGAGA AAGGAGGTTA GAAGAGATGA TTATTACAAA 11580 AATTAGCCGT TTAGGAACT1 ATGTGCGAGT AAATCCACAT TTTGCAACAT TAATAGATTr 11640 TCTAGAAAAA ACAGGACTAG AAAATTTAAC AGAAGGTTCG ATTGCTATCG ATGGTAATCG 11700 ATTGTTTGGG AATTGCTTTA CTTATCTAGC AGATGGTCAA GCAGGGGCTT 'rCTTGAAAC 11760 CCACCAAAAA TATrrGGATA TTCATTTAGT 71?I'GGAAAAC GAAGAAGCCA TGGCTGTTAC 11820 ATCGCCGGAA AATGTAAGCG TTACCCAAGA ATATGATGAA GAGAAAGATA TTGAATTATA 11880 *CACAGGGAAA GTGGAACAGT TGGTTCAT'rT GAGAGCTGGC GAATGCCTCA TCAC7=TCC 11940 AGAAGATTTA CATCAACCCA AGGTTCGTAT AAATGATGAA CCTGTGAAAA AAG7TGTCTT 12000 TAAAGTTGCG ATT-TCTTAAT 
GTAGAAAGAG AAGAACGATG AAAAAAATGA GAAAGTTTTT 12060 ATGTCTAGCT GGAATTGCGC TAGCGGCTGT TGCCTTGGTA GCTTGTTCAG GAAAAAAAGA 12120 *AGCTACAACT AGTACTGAAC CACCAACAGA ATTATCTGGT GAGATTACAA TGTGGCACTC 12180 CTTTACTCAA GGACCCCGT'r TAGAAAGTAT TCAAAAATCA GCAGATGCTT TCATGCAAAA 12240 GCATCCAAAA ACGAAAATCA AGATGAAAC ATTIrCTTGG AATGACTTCT ATACTAAATG GACTACAGGT TTAGCAAATG AATGGAAATG GTCAACTCAG ACAAGATAAA TTTAACGAAA TGTTCCTCT'T TATTCACATG TAATATTGAG GTTCCTAAAA AGCTGGAGTT TATGGCTTGT GAAATGTGCC AGATATCAGT ATGCTTGT 'CCGCTAAAT CTGCCTTAAA TGAAGCAAAA CACAAGTCAT GTGGGTTAGA CTTrGGGATCA ACTCTATGAA CTGTTCCGTT TGGAACAAAT ACACCTC3-rc cTAAccAAGT GATTCTATCA AGCGTATTGG ATCGGAGATG ATTACTACTC ACAGA7TTG TAAAAGAACA GCTTCTAAAA AATTGAAAGA GACT1TAATGG CAACACGTTT- ACAAAAGATC TTAAAGCAGA CTTGAACT'rC TACGTACCTA rTGGTGGAGG AAGCCTCTTA CTTGACAAGC CAACTTGCTC AAGATGGTAT CTCACCTCAA GATTCTTGA ACT'rTAATGT AAAAACAGCA T'ITGACTTTA ACTCTGGCTT TAAATACTGG GTTAAATTGT CCTTCAACAA GCTrACCTTGT CCATATCGCA GGAATTAATG ATGCTTATCC TArrCCAAAA ATCAAAGAGT TCAATTGATT
CCAAGGAA'TT
AGTTGCTAAA
TTCAACTCCA
GATTCGATTG
GAAACCTCAA
GCATTCTTAG
GTACTATGT *t 4 0 0 0* 0 AGAAAATGAA ACTCCTAAGA AAAAGGTACT GCTATTGGT'r CCAACACAT'r ATTGAACAAA AGCAGCAAAA GAAGCAGAAA AAAGACTAGA AAATAGGTGG AGGGAGAAGG AGAATGGTTA TATGATTATC G'rAGGA'rTAC TACCAATAAG CA'PrrGATTA TGTCCTATCA GATCCCAACT CTCATTAGTT GGTCAAGT?= ACATTCCAA'r
AAGCACTTTA
TGCCAACTAT
AATTTAAACA
ATGAAAATGG
TGTTCCAAGA
AACAATTAAA
GGTTG'rTTGG AAAAATTCAA TAA'rGAAGAA GACTACGTTA TAAGGGGATT AGCGATTCTG TGCTGAAGAA GTAATTACTG GCCAAGTGTA CAAGCTGGTA TATCATTACA AATGGAACAG
ATAAAGAAAT
TCTATCAAGC
CCAACAGTCC
CTGATAAAGA
AACATCCAGA
AATTCCTTGA
CAGCCTATAA
AAGCTGTTAA
TGTTGACTAA
ATCCTATCAA
12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 TGA'rrrATTr GAGGCTGTTC AGTAGATGTA GATAGTGAGC TGAAAAGCTC AAGAACGTAA TT'rAACTCGC TCTT'rGTTTA TCCGTTTTC TGCCTAATTA TAAATTI'GTT TC'rTrAATGC GTTCTTTAAT TAGTAGGGTT TGTATTGGCT TAGCCCAATC TTGTAAAAGA TGGATATTTG TTTTGCCAC TCGAGTATTT TTTATAGCTT CGTTTGGCTA ACTATAAAGC TCAATTAAGT GGACCGTTTT TTAGCTCTrC ACACAGTACG TGGGCATTTC CTACCA'rCGT GGCTACTTAC CrAATCTAAT ACAGATAGTA CATGGGCrTT ATGATTATGG TrAATGTGCT 000* 0.00 00 00 0 0 CCACTTCAAG AAA'IATATA GGACATTATT GATTGTTCCI' TA'GCC'rTC TCTTGGCAGT GGATTCTA.AA CGGGGTTTA'r CGTAAAATTA GGTTTAATGG AACATACACC TGCATTTTTG CC'TATGTTrG GTGTTTATCA ACATTT-GG'rr TGGAGCACCA 166 TTCAGC'G CAAACAGTAC CAGAAGAACA ATITGAGGCT AAGrTGGCAG GTG'rrCAAGT TTATCGTCTT TCCACATATI' AGTTGTTNTG AGAACTGTAT GGATCTTTAA TAACTCAC TGGTGGACCA CCCAATGCTA CAACGACGCT TCCAA~rTT AACTAAATTG 7rrGGGTCGTG C?1'CAGCAGT TACAGTACTC GATTTGCTTT ATCTAC7"MG CTATCATCAG TAAGTGGCAA GCTAAGATAG A'rGGTGCTTC AAAGTCGTTG TAGGACTTCT ATTATCTACC TCArrACTGG CTACAACC 'rGGGCTGGGG CTCTTTATCT TCTTGGTGGC AAGGAGGGTA GAAAATAATG AAGAAGAAAT CCAGTATTTA TTTAGATATT CTCI'CACA'rG TACTTrAGT 'rGGTGCGACC ATCGTTrGCAC T'TTTCCCAT'r GGTATCGGATT ATCATATCTT CTGTCAAAGG GAAAGGGGAA TTAACTrCAGT ATCCAACACG GTTATCAACG ATTTrGCACTT ACAACCCT'rA T'rGCGATTAT CCThAArGG GAGCAATCAT TTGTTAGCAA TTCCCTATTC GGCTTGATGA TGG'NATCT T'TTTTCCAAA CAGTTCCAAT T'rTGT'rACGT TTTATAAAGT ATTTATACAT TTATCAATGC ACAGGAAAGA TGACAGTAGC TGGGGAGATA T1GATGGCAC ATCATCCAAA ATAAGATTC GGAAAAAAA'r GAATAAAAGA GTCT'rTTAAT GGGAGTCCCC CCATATCTCC TAT'TTTCAA AITMTTGGCCI' GAACAGTTTA CATTAGAT'rA TTTCACTCAT CATTGATAAC ATTCGAAACA CTTTAATCA'r TGCCT'rGGCT TATTTCTGCT ATGGCAGCCT ATGGTAT'rGT TCGA~rTT-T GTCGAGACTA CTCGTCATTA CCTACAT'TrT cCCACCAAT-r AATTGCCATT GCTAAACTTG GGTTA6ACAAA TAGTTTAT'TT ATCTTTTAGT GTTCCATATG CAGTTTGGCT CTTAGTTGGA TGGAAT'rGAA GAAGCGGCTA GAATTGATGC TCCAAATAAA 14040 14100 14160 14220 14280 14340 14400 
14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 TGTGCTACCG ATTGTAGCAC TTGGAATGAA TTCCTG'rATG AGTAGCCCTT CGTTCACI'TA GTCTGTTATT GTAGTTCTTC AAGTGGATTA TCAGAAGGAT GCTCTTTATT CAAAACTAGG ACTT'rGATTC ATGCGAATGA GGACGTTCAT ATCAAC'TGAA CAGGTATTGT AGCAACAGCI' CCTTGATTTTr GATTAACAAT ATGGTTCAGA AATACTAGAC CATCAAT'rAT TTTCTT'CTCT CTGTGAAGTA GACGAAAGAA AATTTCCGTT GTAGGCATTA ATTAAACTA'r GGTCAACTGT CAATA6AGAGT ATAGATATCA AGTAGTAATG AAAfTTA.AAG TAATAGTAAA GCAGGCTTTA GATAGGTGTA GAAATAAGAG TTCATTATGG GGAAAACA'rA 'PTCTAAAGAT AAAACATACA TGATACATI'T TTGCCAATTT TAATCGTGAA GGTAAGCAAC GCTCTTTGTT ATTAGATAAA TTGTCTGGAG CAGA'rAAACC AAACTCTCTT CAAGCTTTGT AAAATAATTA CTrrTCAATT TTCATGAGAG ACGCCCAAAA GGGAATAAAT TATTTA'TTTr AAGGACAGGC AGTTGAAAAT ACACTAGTAT CAATGTATGT TAATGGAATA GAAGTGTrCT CAAATA'rAAA TGGTATAGAT AAGGCAACAC
AGAGTCAGAC
7TTGGCCTATC
ATTCTGGTGA
CCAGACCAGC
T'rGTATCTCA
CTGAAACAGT
TAGGAGCTGT
ATTACCTCGC AAAAGGAACT ATTGATGAAA TCAGTCTATT TAACAAAGCA ArrAGTGATC AGGAAGTTTC AACTArrCCC TTGTCAAATC CTACTCAAGC TAACTATNTT AGAATACCGA TATCAAGTAT TGATGCACGT TATGGTGGGA CCACTrrA TAGTGATGAT AATGGGAAAA TTAATGACTA TGAGGAGCAG TTAGTTTACT AAArrAGTGG AAGTGCTTCA TTCATAGATT AAACGATATT ACTAGCTGAT GTTATGCCTG CCGACTCAGG 'PTTTAAAGAA ATAAATGGTC CAT'rTCACTT
CACTATATAC
CTCATGA7TTC
CGTGGAGTGA
CGCCACGAGA
CATCCATTGT
CGGTArrCC ATrATTATT AA~rr'CCAA TCAGGAGATT ATTAAGTAGT GGAAGAGTTC TAAAAGTAAG ATTAATATTG GCCAAT'rTTT GCTA'rGAAGT TAATAAATTrA AAGAATACTC TGAAGATAAA AAATCTGGGA AAATAATAAT GCAAATAAAG AAAACTAAAG AAGAATGGAG
ATAACGATTT
AACCTACAAA
CAGTCGAACA
AACAGGTTCC
ATATAGCAAT
CCGTTATACA CTTAGAGAAA ATGGTGTCGT TTATAATGAA ACAACTAATA TTATACTATA AATCATAAGT ATGAAGNTT CGAGGGAGGA AAGTCTr'rAA 0 0 0 0* 0* 0 ATArrCGGTT TATGAATGT-r
GACAACTAGT
CTCCGTTCTT AGGAGAAAAA TAAAATCAAG TAACA'GATTG TTTCTGATGA TACTGGTCAA CAACAGCAGA AGCACAAATG CCACTACAGG TAAGATAGCT TTTCGTATAT TGATGGAATC ACTCTCAATT AATTGATGGA CCCGCAAGGG AGGCCAATITA GAT'N'GATA GTGGCTCT'rT AAGAGAAAGG TTCTACAAAG ATTCGTTATT TAAAGTGACT CAGAA'rAGAG GACAGAGTTG GGAACAATTT CATAATGGAA CTTACTTATG TCCCGGACAA A7""rTTGCAA CATATACTAG TCGAGAACTA ACATGGAAGA AATCCTCAGC TTCA.ATTCCG GTTGAACTGA GAGATGGTGT GATTAGAACA TATATCACTA GTAGAGATTC TGGAGAAACA
CATAATGGAA
CCTACTAA!-r A.AGT'rGT'rGC
GGTTTAGCAT
ACCTATCTCA
T1-rAAAAATG
TTCTTTAGAA
TGGTCGAAAG
GCAA'N'AAAT
TCTAGAAGTG
AGTATTGATT
15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 17340 17400 17460 17520
CAACAAACTT
AAAGAAGCAG
GT'rGTCGGTT GGAAATACCA CTATGATATT GAT'rTGCCTT AATTGCCAAA TCA'rCACATA GGTGTACTGT AATrGCATTT AAGCAATGTA GTTCAGTATA AAAGGAGAAA AACATGGTTA AATACGGTGT GGCTCGCTAC ATGCAAAAGA ATGATGGAGC TGCAGAGGCG ATTGCAGAAG AATTGGGAC 'rrCTAGCGAT GAAGTAGATT GTGTTATCGT
CATATCGCAC
TCATTTTGAG
TAGTCAATAA
CGTATGGTTrA
TTGAAAAATA
TAGATTNGGA
TGTTrGGAACA
AGAGATTACT
AAAAGTAGCA
CGCAACTCCA
ACAAGTATCT
TACACCAAAT
AGAAGATGAT
TGCCTATTCT GCGATTACAG TGA~rCGTGG TCGAGAAATG AAT'TAATGAT TTAACAAAAT GGGTATTT'rG GAGCTGAATT CTTCTCTATG ATCCAGATAA AGTTrCCTTAG ATGAGTTGGT AATAATCTTC ATAAGGAACC 168 GGTTATTAAG GCTGCACAGC ATGGTAAAAA TGTN7=CTGT GAkAAAACCAA 'rrGCGCTTC~r 17580 TTATCAAGAT TGTCGCGAGA IrGGTAGATGC GTGI'AAAGAA AGGACATATT ATGAA'rrTCI' TTAATGGTGT TCATCATGCA AGTTATCGGA GACGTTCTAT ATTGTCATAC AGCTCGTAAT GTCAGTATCA TGGAAAAAAA TTCGTGAAAA ATCAGGTGGT TGAATTGGAT TGCGMCAAT TCCTTA'rGGG GGGCATGCCT TGGAAATGTG CCCCATGAAG GTGAACATTT CGGTGATGAA TATGGAATTT TCTAATAAGC GTTCCrr GTTAGAA'rGG
TGAACATTAT
TAAAGGAACT
AGAAGATGAT
TGGTAAACCA
CTATCTGCAT
G3TCT1TAATCC C1-rAACrAG
GATCGGACTC
GGTAAACGTA
GAGATTATGG
AAGGAAGCAA AGGTGCCATC ATGCGCAAGA AAGCTATTTC GTATCTATCA TAGTACAGAG CTCCATTATG GCI'ATCATCT AAGGAGC3'CC AGTATCAGAA AACAATGTAA CCTTTATGGC 17640 AAAGAACTCA 'rTAATCAACG 17700 GGTTGGGAAG AACAACAACC 17760 CACTTGTATC ACCACATCCA 17820 GAAACTGTAA CCATGACAGG 17880 GATGATATGA TTrrGTCAA 17940 GGTTCAGCTT ATCGTTGGGG 18000 CGCTTAGACT TATTCAACTG 18060 TTGATTCACG AATCGCAAGA 18120 ATGGATGGAG CAAT'rGCTTA 18180 GTCATTGATA AAGAAATGCG 18240 GAAr'TTGCAA AACTT1TGAC 18300 TGTACCCAGT CTATGTTTGA 18360 TTGGTATTCT CCTATTTATA 18420 TTGACTTTGC TAGTI~rTGA 18480 AAT=TAAGG TATAATAATC 18540 T'rAGCAAAGA AGCCTTGATT 18600 TTCCTCATGA ACCGCTTTAT 18660 CTGAGCAAGG TGGAGCAGTC 18720 AGGAAGTCAC TAAACT'rCCA 18780 CCTTCATCAC GGCTACTATG 18840 AGGTGAAGCT GCCCTAGAAG AGATCGCAAA GTAAAATTGT GGTCGACTTG CTCCTCTGAA AACTGAAATC TATTATACTA TCATAGAAAT AAAGAAAAGG GAGCAAATCA AAGATGGAAT ACAGAAGCGG GAGGGGTGAT GGTATCCGAG CAAACAGTGT ATCATTGGGA TTATCAAACG AAACAAG'N'G ATGAATTGGC CGTGAACGCT ACGATGGTTTr CAATTGCTAC TGCAGATGCr CAGAAATTGT AAAATAAAITT AGTACT~rTA GAGGAGCTGT CAAACTATTG AAAGCGTT AGGAAAGAGG ATGCCACAGA CATCGTTTCT TGTCAGGCTC TCCCTTGCTG GTCAAAGCGG TCGCGATATC AAGGAAA'ITA TGATTATCCA CCTCAGGAAC AGAACTGGAC ATCGAGGTGA GGAAATTCAA GAGTTCATTC TGATACTAGT ATCTTCGAAG TTGCTCTGGA TTGTACCAAG GTCAGGTTAA GGAGAAATAT AAGGGCTAGC AGCTGTAGAA GCAGGAATTG ACTIrTGTCGG AACAACCTTA GACGG'rCCAG ATTTTGAATT GATTAAGAAA GAAGGAAAAA TTCATACACC AGAACAAGCC ATCGTTGC= GTGGCGCCA'r TACTAGACCA CTTAAAT1AAG ATGTGAGGGG GAGTTTTATG TCAGGCTACA CATCCTACAG TCCAAAAGTA CTCTGTGATG CTGGTGTAGA TGTCATTGCA AAACAAATCC TTGAATATGG AGTGCGAGGC AAAGAGATTA CAGAACGCTT CGTTGCTAG'r TT'rAAAGTTT TACAAAAAGT TGGAAAAGCT 18900 18960 19020 19080 19140 19200 19260 19320 TITATGTTAC cTATAGCTAT ACTTCCTGCA GCAGGTCI'AC TTTGGGGAT TGGT~GGTGCA cTrCAAACC CAACCACGAT AGCAACTTAT CCAATACTAG ACAA'rAGTAT TTTTCAATCA ATATTCCAAG TAA'rGAGCTC 'rGCAGGAGAG GTTGTA'N'CA GTAATrrGTC ACTACTTCTC TGTGTGGGAT TATGTAI'GG CTITAGCGAAA CGAGATAAAG GAACCCCTGC GTTAGCAGGA GTAACTGG~r AC~rAGTTAT GACTGCAACG ATCAAAGC~r GAAGGATCTG CPATTGATAC 
TGGAGTTA'N' GGAGCATTAG TATTrTGCACA ACCGATATAA CAATATTCAA TTACCTTCCG TC-ACGCTTCG TTCCTATTGT TACATCGTC TCTTCTATCT GTTATTGGC CACCN'TCCA ACAACICTT G7='CACAG
TGGTAAAACT
'rTGTCGGAAT
?N=ATGGCA
AGTTGCCGTA
GGTCCAATTG GAACTTrWCT ATATGGAT CATCATATAA TTACCCTAT GTTTTGGTAT GGACAAACAG TGG~rGGAGC TCAAAAAATA TCTGGATTAT TTACAGAAGG AACAAGGTTT GGTTrACCGG CTGCCTGTTT1 AGCGATGITAC TACGCGGTT TGTT'T=W AGTTGCTrrA
ATTGAATTTA TGTTrCTATr GGTGTTAGCT TCTTTATTIGC GGTGTAATCG ATTTCACTTT CTTCAGAT'rC CATTTGGACT ATTACTCAAT TCAACGTTCT TCTGAATCCG CAGATTCAAC A'r'A'rCAGAG CCTTGGGTGG TTACGTGTAG CTGTAAAAGA GCAGTrGA'rG TCTTAGAAGT TTATATAAAA ATAGTATTAA ATAAAAAACA GAGGAGAGTG AAGAACTCAT TAT1CCAAGTTI AAAATTAGCT TGTACAGTAC TTCTGGCGGA AGTAAAGATG
CGTCAGTCCG
AGACGTCTTA
ATr'rGGAATT
TATTTGGAGT
AACGCCAGGG
TTCAAATACT
ATCAAATAAT
AGTTAATCAA
GAAGGGTGGC
TGAAATTTITA
ATGGATGAGT
GGATACGCTT
TTGCGGGTGC
CTGCCAAATC
TTAATGAGAC
ACTGAACTTG
TTTTTTrGCTC
TTTGCAGGTC
CATAGTGTTC
ACATCTTTTA
GTTCTATATG
AATATTTCAA
TTGCAGGGGA
GTTTTGTATT
CGAGGAGAAG
GCAGATTATT
ATAGAAGATG
GTTGATAAAG
CT7TAGGATr CTrTGGAGGT TGATTGGCTT TGTCTTCTrT GTGGATATAT Tr'CTCAGGCG TTTCI'GGAGC AGTACG~cTA GTGGTGTTGA AACTGTTGCA AATTAGCCGA TTTGGCCCAT GT'TTCTCAAC AATGATGN'C CTAAAAATCG 'rCGTAAAAAA TTACCGGTAT TACAGAACCA TTGTTCACGC AT'rCCTTGAT TAGGAAACAC AT=rCAGGA ACGCTAAGAC GAATTGGGTT ATATTATT'rT TAGATGGTTC AAGTAGATTC TAAAGAAATT TAAAACAGGA TAGCCTACAA TAGATGCTTG TGTGACACGT CACTTTTAAA ACAAATTGGT 19380 19440 19500 19560 19620 19680 19740 19800 19860 19920 19980 20040 20100 20160 20220 20280 20340 20400 20460 20520 20580 20640 20700 20760 20820 20880 20940 21000 21060 ATrCA.AGCAA TCTATCGAGC AAAAGCAATC GGTGTAGATG. ATTAAGTACT TACTGACTTA AGGAI'GAAAT GAAATCGCAT ACAAGAAATA ATTACATAGG AGAATACAAA TGAAATTTAG TGCGTCTT GGTCTTGCTG CTGTGGCAA AGGTGGTGAC GGTCCCAAAA CAGAAATCAC 170 TTGCGTGGGCA TTCCCAGTAT TrACCCAAGA AAAAACTGGT AAAATCAATC ATCGAAGCGT TTGAAAAAGC AAACCCAGAr CA'rCGACTTC AAGTCAGGTC AGACG'rACTC TTTGATGCAC TGAGTTGAAT GACCTCT'rCA ACAAGCAAGT AAAGCTGGAG CATGGCAATG AACAAGAAAA TTGGACAACT GATGATTTTG AGGTTCATTG TTCAGTTCTG CCTTTATAGC GGTTCTG'rAA ATTCGTCAAA GGTCTTGAAA TTCACAArrT GACGGTGGGG AATCCTTTGG GCACCAGCTC AGAAGTGGTA GAAGTACCAT CrGAAAAAAT CACAACAGCC CAGGACGTAT CATCCAATAC CAGA'rGAAI-r TGTTAAAGAT GACGGTCTG GAACTTATGA ATAAAAGTGA AATTGGAAAC ATCGAAGCAG GAACAGCTCC GGTAAAAACG GTAAATTCG4C GTCAACAATG AAAACATCGT AT'rAGTTCTG CCCCATTCTA GCAAACCTTG TAAAAGAAGG AAAGACAAGG GTTACACACC
ACAAGGCTTA
TGTTAGAAGA
AAAAAGTAT
TATGTATCCG
TGCTGGAGTA
GAAAGCACTT
GTCAAGGGGG AGACCAAGGA ACACGTGCCT TTATCTCTAA CAGATGAAAA AGTTAGCAAA TATACAACTG A'rGA'CCTAA
AAGCAACTAG
CAGATATCCA
AA.AA'GGTAT
TCCCATCAGA
CTGGATTAAA GACAATGA TCAATAATGG
AAACTTTGCC
CCAAGCTAAA
CGAAGGTAAG
CGACAAGAAA AAACGGGTTT GCAGTATTCA ACAATAAAGA CATCCAGTTT ATCGCAGATG ACAAGGAGTG GGGACCTAAA ~T=CCCAGTC CGTAC-IrCA' TTGGAAAACT T'rATGA-AGAC CGGCTrGGACT
AACACTTTGG
TTTGAAAGCC
CAATACTACT CACCATACTA CAACACTATT 1TCCCAATGT TGCAATCTGT ATCAAATGGT TTCACTGAAA AAGCGAACGA AACAATCAAA CTTAGTTATT CTATAAAAAG TAGTTTTTTA AAGAACCTAA AACGGTCAAA CATCTTACAC CTTTTAGAAG CAAGTAAGGT CCAGCTCTTG AGTACCTTGT GTCGCTGCAT CTAAGAAATT GACG'rAcrTC GTACAGGTGC AAACGCATGG AAAcAATrCAG GATGGA=TG CTGAA6ATGJAG GACGAAAAAC CAGCAGATGC AAAGCTATGA AACAATAGTC GAGTGTATAC CCCCTTT'TCC AATGTAAGAA ACTGTCACGA TTCATTTTCA TTTTAAAACA TGCCGACTGT GAAAGTCAAT TAGCACCAGT ATTATTCTTC TTACAAGTTT CTTTAACTAC TCCGTATGTT TAAAGATCCT TrGGATCTGT ACCAGTTGTT AAAATGTCAT TGCCAGATCC GTGTTCCCT GACAGTTGT 21120 21180 21240 21300 21360 21420 21480 21540 21600 21660 21720 21780 21840 21900 21960 22020 22080 22140 22200 22260 22320 22380 22440 22500 22560 22620 22680 22740 22800 22860 CTCTACACAG ATAGTGTAAG AATTAAAATG AAGrrCTTAC GTTCAAGAAA GTCAAAAAAT AAAATCCGTA TGCGGGAAAC TTTGTCATCT TTGTGTTGGC TCAATGACTA AATTTGAGTT GTCTTTACAA AATCTCTGAT GTTCTATTCT CACTCTTTGT TTCTACCGT TCGTCTTCTT AAAAGGGGGC TTTTGTTTAA ATAAGCGAAT CATAAAAAAT 'rATTCTATTT GAAAGAGAGG AGTGA'"CC TACGCTTTCC TCCGATGGTG ATGGGCTTCA TGTAGGCTrG GATAACTATA TAACACAOTT ATTqrGGTTA AGCATCTCAG ACCTATCATC CCTTCCTGTT GTAACGGGTA TGGAAATGGA 1TTrATGACCC ACTATCAGGG ATTCTAAACT TrGTCCTTAA GTCCAGCCAC ATCATCAGCC AAAACATT'rC 'rrcGGTGGGA GATAAAAACT GGCATTGAT GGCGATrATG AT'rATTCTCT TGACCACTTC AGTTGGTCAG CCCATCATCC Tr'ATATCGC TGCCATGGGG AAT'TrGACA ArTCACTGGT TGAAGCGGCG CGrGTTGATG GTGCAACTGA GTrrCAAGTT 7TrTnrGGAAGA TTAAATGGCC AAGCCTTCTT CCAACAACTC ACAATTAACT CATTCCAGTG TTTCGCCTTG A1-rCAGCTTT TACTCAACAA GTACCTTGAT GTACTACCTT TACGAAAAAG GGCTATGCCA ACACAATrGG TGTCTTCTTG GCAGTCATGA CAATTTAAAG TACTTGGAAA CCACGTAGAA TACTAAAGAA ACAGAAAAAA AACCA'rTAAC ACCTTTrACT GTTATTTCAA ACTGTGCTGT TCATCTTTCC AT1'CTACTGG A7TTTGACAG
TTTATATTGC
TGACATCTGG
CCTTCCAATT
AATCATCACA
TGGTCCAAAC
GACAGAATAC
TTGCTATCGT AAGCTTTGTT AGGAGACAGC TATGCAATCT CAATCATTTT GCTCTTGTTG GGGCATTCAA ATCACAACCT CAACCATGGA AAACTTCCAA GATACAATTG
CAACTCATGC
GTAACCATGT
CTTrCrATG
CAAGTTGTCC
TGGGCAGTTA
AGTGAAAATA
CGTACCr'rCT TTTrACCTTCA
TTATTCCTCC
TGCAGAACCC
TCTTAGTTTG
GTCAACGCAT
TTGTACCAT-T
TCTTGCCTT
TCCCTACAGA
GGAGTGTAGC
TCAATACTTG
TCAGTGGTTC CCTAAAA'rGC TGCCT'rGCAA TGGATGTGGA ACTCAGTATT TATCTCA'rrG TGCAACCTCA TCTCTAGCAG GTTATGTA1-r GGCTAAAAAA TCTATTTGCT ATCTTTATCG CTGCTATGGC GCTTCCAAAA GGTACGTATC GTCAACTTCA TGGGAATCCA TGATACTCTC 22920 22980 23040 23100 23160 23220 23280 23340 23400 23460 23520 23580 23640 23700 23760 23820 23880 23940 24000 24060 24120 24180 24240 24300 24360 24420 24480 24540 24600
GATTOGATOG
GTTGCTTGAA
CTTCCCGATT
GAATGACTAC
CCATTCGGTG
TCAGCTAAA.A
GTGAAACCAG
TTCATGCAAT
TCTTCCTCAT GAAACAGTTC TCGACGGTTC TGGTGAGATT GCTTTGCAGC CCTTGCAATC TGGTAATGrr GACTTCACGT CTGAAATGGC AACCAACTAT TCGTCACAGT CTCCTAGTC TCAAAGGATA ATACTCTGCG TAAGTATT-GC CTGCGGTTAG AACAATwlrGA CCATCTCACT TGGGGTTGCG ACCATGCAGG GGTT'rGATTA TGGCAGGAGC TGCCCTTGCT GCTGTTCCAA TTCCAAAAAT CCTTCACACA GGGTATTACT ATGGGAGCGG AAAATCTCTT CAAACTACGT CAGCTTCACC TTGCCATACT CTTCCTAGTT- TGT'r=?CAA TTNCATTGA GTATAGGAAA ATCAATCTAT CAAGATACAG AAGTATATTT TATAGATTTA GAGAATATAG AGGTTATAAG TGTCTACAAA ATGGAGGGTA TGCAGTTACT TrATGAAGTT TTGTCAGACA CTTATAAACT TAAGAATGGT TTTAGTTAAC TATCAGAAAC GAAGGAAAGA GTATGATT= TGACGATTTG AAAAACATCA CCTTTTACAA AGGGATTCAT CCTAATTTAG ACAAGGCTAT CGACTATCTC TACCAACATC GTAAGGATTC TTTCGAATTA CGAAAGTATG ATATTGATGG TGTCCTCAAT CAAGCTGAAA ATGATCAATI TTGTGA GAAGGACATG AATATTCGAG AGCATTCGAC GAAGCGAGTG ACATTGGCTT GTTGGGTTAT CACAATTT'rG CGATTrCTT TGCAGGCATG GAAGAAAAGG TTCGAAAATA GGATGAATTG TTTTTGTA AAGCNTGAT GGTAGAGAAA TGAGAATAAA ATA'N'TAAAA TAGTTTCTTA GATGGACAGG GGATTACAGT TGTGAA'rGCA TTGATTGGTA GATACATAAA 172
AGATAAAGTC
TGAGTATCAT
CTACGGTTCA
TGTTCATTGT
CCCAGGTGAG
TCTCT'N'AAA
AATACTCTAC
ATTGGTATCT
TTTCTAGr~G TTCAGGAAAA AAGAACTATG CAGATTTGCA CGTATCAAAG ACGAGGCAGT CATGAACACT ACCCACTCTT CCACATCAGC CAAATGGTTA ArTTGATTG ATTAAAAATA CATGAAA'rTG ATCTTTGTGA TCTAAGTATG CI'GCAAGAGC TGATCAGATG GC= GGATAA TTAGGGGCAT ATTAGGTACT TATGCCGCTA AGTATGGTAT TAGTATGGCA CGCTCGATCT TAAG1'AGGGT AGCTGCAACT GCAGCAGCAA GAGTAGGATT
ACTGACCAAG
TAATTTTGCC
TCGTATAAAC
AGCTTGGACA
TTGGAGTACT
AAATAGAAAA
TTACTGACAA
AAAAGGAGTA
TGAGAACGTC
CGTGACGATT
CAGACGGGTG
TCTTCAGCTG
TTTCTGGGGT
GAT'rTr
TTGGAAAGAA
ATTTCTGGAT GGATTrACG AGTAGCTGTG AATGTAGCTG ATGTATATGG AACAATATTG CTGCAGCTTG GGATGCATAT GATAAAATTC CTAACAATCG TTTTAAAATG CGAGAATGAA AGCACTTTGT ATTTTTTTAT TGAATATG'TT GTGCTrGCAA TGATAATTCG TGGAGGGCTA GATGGATTTG ATAGGCATAC AT'n'rAATTG CGTCGCTGTT CGGGGTATAT GATTATAAGC CCATAGATAA AAGTCCAAAA GAAAAAATAG AT'rrGTTCAT GGTAGGGACT TATGAAAGCT AAAAGAAAAC ACTTTACAAA GAAAAATGAT GGAG4GAGCAA ACATGGCACA AGCCTTATCA AGGCAGCATT TGATACAGAT AACTTTCTCA TGCGTrTTTAG T'rGGACATCG TGACAGCCAA TCTTCTTTTT G'rCGTCTCr'r GTTTACCCAT GGAGTGGCTA AAATCAGCCT CTACGAGACC ATGrCGAAG TTAAGAAGAG 24660 24720 24780 24840 24900 24960 25020 25080 25140 25200 25260 25320 25380 25440 25500 25560 25620 25680 25740 25800 25860 25920 25980 26040 26100 26160 26220 26280 26340 26385
CCTGTTTTTA
GGT7TAATGG
CAAACAGCTC
ACTATCGTGA
ATTCTTCAAA
AAATCTATCT AAGATCTTTC AAGCAAAATC AGTTAGGAAT TGTGTTTCTT ACCCTTTCAG
TGAAACTAGG
ATCTCTATCT
CC'TCATGTTA GCCATTCTTG ACTCTTAGGT GGCTCAGTCT TGGATTGATG GACAAAATTT TCAAAGGCTC CAAACGCTAT TGCCCTTCCA ATTGCTGAAA GCCA'NTrGTT TACGTATTLT 'rGCTGGCTAG TTACCCTATC CGGCACGTT ATGACCTATC AAGGATTGAT GTTGGCTAGT TTTAACTC CTTGGTTCT'r TCCTCATTGT GATGGTT= TATCTGTCCG CCTTCAGTCT TCCTAC7=T TGGCTTTGCA CTATTGG'rCT 'rrATCCAGAC TCGCAAAATA CCAATAGGAG CTTTATTTCT GAAACTACTT TCTATAAGCG AGAAACTAAA ATCGG
INFORMATION FOR SEQ ID NO: 4: SEQUENCE CHARACTERISTICS: LENGTH: 2716 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
CCTGCCCGCA TTGCCCTAGG CATTAAGTAA ACATATAAAA GCATGTGAGA GACTGTTGGA TTCTTTTGCT GATTTTATTC
AAAGCGAGGA
AAAGAAAATG
AATTTCCCCT
ATATAATAGT
TCAAGTGAGA AAGTAGCAGG CTTCACCGTG CTGCCAAGGA GTGACTCACT TTCATACGAT TCAGGGAGAA AGATTGGCTA ATTCCATTTT TCTTAAAGGG GAGCACTTGG TTGTGGTCAA CGTGAAAAAG TGACCTATAT CAAGAAGAGG TAGTCAGACT GTAGGTGCTG GGCAAGTTCA GAATTCCCTC AGATT-ACC'T
CTTTTCCTCT
AG'rTATGGAG
ACAGGGAGTT
CCAATTGATT
TGATTTTCCC
TGTGCATTTC
AATTGTGAAA
TCCTATGTTT
TCCTAACTTT
GCGCACAGAT
GAAACGTAAA
TATCTGGGCT
AAAAAGAAAT TACGCATCAA TCAGGCCTT ACCGTGAATT GT'rACAGAAA ATCTTCCAAT TATTATTTAT CAACCTTCCA TTGCCAGCTA CACTTGAGGG CGCTATGTAT TTTCTTTTTA ATTGAGGATT TGGTAGCAGC GTCAACAAGG AAAAATGGCA CTTGGTCTTA GTGACAATCA GGGAN'GATG ACTTTATCCG GGTGGCTTCT CTTTTGGTGG AATCCCCCTA AAAATTTGAT TATGCTCTAG CGGATCTTTT TTAGAAGCTG CGAGTTGTGA ATTTTGGAGG GAAATTATCG GAATATCAAG CAAATCCTGC AGAGAGTATTr CTGAAGAGCA
AGTCTCTCCT
TATGTTGAGT
AGTTCCTCTT
CGAGGCAGAT
AAAGAAACGC
AAGTTTGAAA
CAACCGGATG
TGGTATTCCA
TCCTCTACCA
GTTTATCGTA
TC'rGGCTGAG
TATGACAGAT
TTTTCCAGGC
CTTGTTGCCT
GGCTCCTATT
GGCGACAGCG
TGTCTTAAAA
TCTGTTACAA
GGTTATGAAC ACTATAAGAA AAI-rATGGAA ATTGTATCGC CAGAGCGGAT GCGCGAATTG AGTTACAATG AGCTCTTTCC TATGACTATT ATGTTGCGTG ATTTAGATCT CTATAAGGTG GGTAGAGAAG AGATGAAAGA GGCTATTTTG GATCTCAAAG AAAAGGCTAA GAATATTTCC ATCTGGTTGG ACTTTTATGA GAAACAAGCC TCTATGCGAA 1-rGGTTTATT TACAGATACC AGTATTCGAA CCTTGAAAAC AGAACTTGAA ACGACAGATA AGGATGTCAA TCGCTACGAA
GCTTTAGGGA
TATTTTCCTC
GAAAGTAAAA AGTGAGCTA.A AGGTTTCTGG TGTTGCGACC AAGCAGGGAC ATGCTGTTTT TATCNTTACG GA'rTGGCAAA 'rTATCCGCAT TCCAAGTGTT CCTTTCTTTG CTTTTAAGGA TCGTCGCTTT ATTGCTAAAC AGTATCAGCT AGATATTATC TTGGGGATTT GGATTGCGCG TGAATTGAAA TATGAAGACT ATGTCCATTA TATTGCTAAG TATCTGGT rA GACGGITCCT GCATGATGTG CGTGACTTGC TATCTGAT'rA TAAGGTCAAG GAATTAGCCA AGTT'rGAGCG TCCGGAAATC AAACTA =GA TTCAAGATGG TGAAAAGACG 174 GCCTACCGAG CTTTTAGCAA GGCACTGAA CATACTCAGA CAGAATTTTC TCTTGGCCTG ATTCCAGTCA TCCATACCTA TCACACCCAG GGGATGTTGA TCCGGCCGAG TATGGTCAAG GA'rGGGGTrA TT TGCCCTAG TGAGATTrGTC GTTGAAAAAC GGGTCATTCC TACTGGGATT AAGCAGGAAA AT'rTGAAAGA ACTGCGTAGT
AAAAATATTC
AAACTGGTAG
CTAGAGATTC
AAGCAGTTTT AGCAGCCTT TAGCTGGGGA TGGCCCT TAT AAGACTCAGT CATCTI'TACA T'rGCTTAGTC TTTCGAGAAT GCTGATGTTC TGAAAGAGGA CTGAATGACC TCAAAGAGCA GGGATGATTG CTCCTAGTGA
CTCCTATGAA
AGACAAGGTT
AGCCCAGAAC
GACGGCTCTT
AGGTTTGACC
TACTATAAAG CGGCGGATTT- CTTCATTTCG GCATCGACAA TACTTGGAAA GCTTAGCCAG TGGAACACCT GTCATTGCTC AACCTCATCA GTGATAAAAT GT'rrGGAACC TTGTACTATG GCTATTTTGG AAGCCCTGAT 'rGCAACACCA GACATGAACG TTGTATGAGA TTTCAGCTGA GAACTTTGGG AAACGAGTGC ATTATTTCAA ATAACTrCCA GAAAGATTTG GCTAAAGATG TTrAAGACAG T'TTGTATCT TCAGCAACAG GTGGTTGCTG CGCATGTTGA AGGCTTCAAA AACACAGTTG ATCAGTATGA GAAGAATAGA AAGAGGAACA GCTATGAAAA AAACAATTAA AAGATTrGCGG GTGTTTGTGC TGGGGTGGCC CATTATCTGG CAAGTCATTT GGGGTGTTCT TACTTGCTGT TACGGAGCTG TTATGGATTA TCGCGA
INFORMATION FOR SEQ ID NO: SEQUENCE CHARACTERISTICS: LENGTH: 13926 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
GCGAAACGCA
ACGGAAATCC TTATTTGAAC GAGAACATGA TTTGGCTGGT AGCATACCTT ATCAGAGAAA ATGAGTTTTA TCTGGATGCC ATACGGTCAG TCAGCGTATC TACCTGTAAA AGGATCTAGA GAGACTATTG GAAAGACCAT TCAGAAGCGG TCGTGATAAA ATATGGATCC GACTATCGTT GAATTGTAGC TTACATTATT 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2716 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: CTTTGG7=T CCCTTATTCA AGACATGAGG CCCATCAGGA ATGATCTGAA ACTGCGAATC rGrrAACAGT
AATTAGTAAG
CTATGGAGAG
GTTGGATAAG
CTCCTCAG.AA ACTCCGACCA GGGAAGTAGT TrAAAAATCA CG-GGCATCCT GACAGAATCA
TGTCAGGGCA
GTGCr'GATAA
GATAGCCTCA
GCAcCCTAAGG
ACTCGCTCTT
GAAGGATAAT
175 CTIrCATACA ACTAAGAT'IC GG~rI'ATCTT TGCTGCCACA GGTAAGTTCC TGCTATATCC GrrAAATCA6A GMGTCTTCAA !AAGAGTCTT GTCTGCTCCC TGTTITTTCAA ATACTC?T GCAA7?rGAAG ATAAAATAGG AT'rr-CCCTG CTAATTTAAG AAGCTCGAAG ATTTGGTAAA TCCTAACTGG AAACrrCTAG ACAATCCAA'r CAAAACAAAA GGTTCTGTCT CTTGAGCTAG TAGC?1'GTTG ATAG'NTACTA ACTCCAGA6AC GAAATAACTC CTGTCAGTAG ATTCCGAAC? TCTTTCCA6AG ACTCTCTGA ATATTAA~Trr CA'rCrAGTTC TCCTC.AAGGC TTA.ATTCATA ACCCGTAAAT AGCTTCTGCT TGGGTTAAAT CTGCCAAGGT G'TCCTGTTTC TAGCAAA'rGC TGACGGTAAA TTCCTGGCAA GTGTGTAGAG TTTTCCAGCG ATTTTCAGAA CCAAATTTCC CTCCTGACT ATTGTGGTA6A ATCTTCTCTT GTTCTCCTAG CTGCCCTAAC CCATGCAAAA CAAGCCTCTC ACTGCATTAC
CAAGACTC
GAT'rCCAAGT
TATAGAGGTT
GCTCAAATC
CTGAAGACAG
GACT'rC'A'rC
ACAATCCTGA
TCTTCTACCT
CGGATAGGCG
TCAAGCAGTT
GGTCGGTGAG
AC?1'GGGCC1' rCTCCAGA'TT CACTCTrCCT 'rGGrT'rAAA GTAGGTAAAG GACAAAAGCT TGTACTGAGA TGCTAAGGCT GATTCGCAAG CAATCTTGTG TCCCAAGTCT AAAATAACGA C'rAGC=TC TCAGCCTTTC CAGATGTTGT rrGGCTGATT TTTCCAGTTG TAATTAATG GAAGCGAGCT TGCCTTTTGA TGAACCTCTC GGTATTCAGA TTCCCATGTG GCCAACTCCA TAAATGGCTT GACCTTTGTG AAGTTGAATG AATCCGTCGT CCATrTGG.AA GCAAGACGACC AATCGTTCCA AGGCTCCAAG TCCTTGATAA TCTCCATGT CGCAT1TC ACAAGGAAAG AGrGAGCGGA AGATrTCAAC AAGGTCCACA gATGGTCGAA GTCATCTGCC AA.ACAGTTGA ATACTGCT GTGCTCGCTC CCAACTrCAG AAATACGGTT CATATCATTG CATATrrCA GALGCGATTTT TGGGATCCTG TTrCCAACCAA TrGGTCAGTr ACCCCACGCT GAGTCGTCCC crrcATrGGT ArTrTGCTCA AAAAAGACI' CTGGGCTCAT GGAAATCACT GA=rGATTCA AAGCAGCTTC GGGG-TA6ATA CTTGACGATT CGGTAATCTC GArrAGCTTC TCTGCATCAA AAGGAAAAGC TCT'rCAAACA TCAGTTGT TGTTTACGAT AGAGAAC'rGC CrTICCCAAG TAATCCCTCC GTACCAA'rGG CCACATTAAA CAGTAGACTC CACGCGGTTG GGTGCACCCG TrATGGAACC TCCTCTCGCA ACrGACTCTT ACCTGACACA GACGCTCCAC CGCAAGAGGT CCACAATCAT CrGGCCTGT'r CAAGATCT'rC CGTGTGTCA ACTCGCGATC GTCA'rCTCGT CATGTTCCAC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 176 ATAGGCATTG TAGCCCGCCT CCTGCTCTAC CACCA'rACGA TrGTACATGG CAAAAGGATI- GGCATrrAAC TTrTCTrAA GrrGGACGGT GTAGTI'GACC TGATACGTAT CTCCCTCCCG TAAATGATGG 'rGAATTrGGG CAATGGCCrr TTCATAGTCT GCTGCAGACG ?IACTTCCTG CCAATq-rGAG GGCAAATCAA TATCCTCATA AGTCAGAGGA ATAGGGGAAG PPCTACGAT ATCATGAACA CTAAAGTAAA GCAGGTACTC TCCCACTAGG TrTlICCCA AAAGCAGG'rG CAGCCTCGTA GCTGACATAC CTCTTGGTAG CT'rTCCACrr CTG3CCAGCAA ATCTGCCACT GGATCCrrGT GAACTGCTAA CCCACCACAT AATAACCTTG TCTTCTACAT TTCTCGT=TT AAAGTCCTAA AATCAATCAC AAACCCTCAT CCGCAAAGCA 6 CAACTCTrI'A
TGTTTTCTA
GATGACAGAT
TTTCTTATAA
GTTTGCTTCT
TAAATC.ATCG
TGCTTTATCT
ATTGTTACAT
AGATTG'rTGT
ACCCCATAAT
ATTACCTCCG
GCTAATTGCT
ATCAACTAAA
TCGAATTrrA.
GAACCATGCA
ATAGGCTGGG
TGCATACCTT
rrCAAT'rATT TAAAGATTGA AGTTTTAAAG CTATTT'rT GTTGAAGAAG ACAGCTrCTT rrAATrTAAC TGTATTA'rTC TGTrrAAGAC TTTCGGCATC TrrTTTAACA TATGATGAAA CGGAAGAACC ATTTACTTCG rPAACTTCTT TGAAGTAAGC TrrrFAAAT ATTTTCTTGA TAATATATTC ATCACTTAGA TTATATTTAT 'rTGAAGCATA ACCTAAGAAC CTAAAAGCAT TATCTTTGAA TGAAACAGCT TAGATACCGG TCATCATTCT AACACCTACA TCCG1-rrAT AGATACCATT ACCTGCATTG TCATPAACAG ATTGAATATT TAATTCATTT TCCCATTGAT TTAATTTAT'r CTTATCACGG CTATTTAAAT CTTTAT'r TTGAGAAATC ATACATACTC TTTTATTACC TAAAGGTATA TCTCTCCCC AAGTATAGCA TAAAATAAGA GCTTCTTTAA ACAATGTCAG AATGTTCTTA ATCCTTTCGT TCrrCAATAG TATTAAATGT ACAGACTCAC CATCTGTTTT CCATT'rrCGT ATCCGTAGTA CCAGGAGCAC CTTTACTAGT TAAGGTGATT GATCGTTATA CGATTAGTCA TTAATTGT'rG TTCTCTTCTT GACTTAGATT TATTCTCTAT CTATTTT'TTT ACAGATTCAG CCTCA.ATTTC CTATTAATAT CTTCTCGTGT TCATTTTTGC GTTTAAATAC 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 ATCAAGAAGA GTI'AAAGTGT CATTATAACC TTTTAGACTT TTTGGATCTG TAATATACCA CTrCATATAT
CTGATTCCCA
CATArrAATA CCTAAAGAAC CAAACTCATC AAATCCACTA CCAGTAACAG GACTrTGTAG CATACCCTGA GCATATGCTT CAGCATCAar ACCTCACGG TGTCCAAAGC CACCTAAGTA AATCGCACGG TCCTTGACGT GTGT-TGTTTC ATCTGTGTA.A ACTGAAATAC CGTA7TCACC AACCATTTCT AAATGAACAT ATrD'ACATC AGTTCTAATA TCATCAGAGT TAGGATATAT AGCAGCATAA GCTCCTGTTC CATTATAATT ATAATACTTA TCCATAGGAC CAAAGAATTC TCTAAGAGGA GTATATAC'rT TGTCGGTATT ATAGCGGCCA TATTTTTCAA CCCATCCACC 177 AGGAGCGTTA TAACCTTCCC GTTATCAGAC GCTAGACGAT GTTAACCATA TCTTTAATAT GGCAATTGCA ?rATAATPG
AAATAGGAAT
ACCAGAAATC
CTrCTAATCA
AAATTAAATA
GAGTATAGTA 'IrTCTAAGGT GACTTCGTT TTTAATTTCT TCGACCTCAG AAGCGCGI'TC AATAAACCAA TCGTTCATAT TGTCTATATT TGCATCTAAA TTACCTGATT TAGTATATTT ACGTGAACCT TTAATGTTCT TCTCTTTAGA AACAGCA'rCT CT'rAGTAGTC GrTGTT'rAAC ATAATAGTTT CTATAACCAT CTGCAGCT TT7MTACCT AATCCrICTG CACTACCAAA AAGATGTGCT TTrATCAATAT TCAGTAGTGG TAAATTATCG AATGCACGAT GTTTAGAATT TGCGATGTAG ACATGGTCTT CTGTAGCATC TGTGAACAAT TGTCTAT!'AT AATTTAAAAA AGCCAATACT TGACCGAATG CGTCGAATGT ACCGATTTCA ATTAATCTGT CTAATACGCT TAGCATTAAT TCTTTAATAT TAACATCACC AGTTAAACCT AGTAATAAAG CTGCTTTGTT 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260
AACTTTTTCA
AAA~rAACT
TTTCTCGACT
TTCG?1'TTGA CCATAGAAAT CTGGTTTGAA CCATAGTAAC GATr'TAGGTA TTATCACGAA TCA'rTTGACC ACTAATTTTG TGATTAGGTT
AGCAGCTCA
TGTTAAGTTT
GAATCATTTA
TCTTTAACAT
TTCTTCTAAA TATAAATCT TGATTGCATT AACTCTATAG TCACCTAATC CTGATACATC GTTTGAGACT GAAGCTC'rAC TGATTCTAAA ATAGATTTTA AAGAGTAGTG TTATCTTTrTT GAACGATATT AGGTGTATAT TTAATTCCTA AGTATATTCT TTTACATTAC TTAAACCTTC ACTGCTAGAA GACAAGTTAA TGTACCGTCC GCATAGTGAA CAATAAT'N'T ATTAGCTTCA TCTAGGTTTG ATTGTTGTTC ATCGCGGTAA CAGAAAGAAC TTCTTTAGTA T'rTAGATGGT TAATTTATTA CCTTGATATA CAATATAATC TTTA''TAG AATGGTATTA AT7=TATAG GCTTGGTTAT ATT'CAGCGT'r ATAATCTTGA ATACTAGAAT TCATTAACT TTTGCAAGAG GAGATAGATC AC=rCTAAT TTATCAGCAG AGTAGTAACT TTAGCATCAG CTTGTTCTTT AGTTAATTTA GTAAATGTTT AA.ATGATCTA TTACCTGACG AATATCCCTC 'rACCGCATAT AAA7CTT=rA AGCATAATCA GAATCA'rCAA CGTCGTTAGA GCCGAATAAC 'rCCTCTCCAC AGCATAGCTG ACAGAATTAC TTACCGTACC TACAGGCCAA GTCTTACTTG AACTTCTACT GGATTTGAAA CA'rCTATTTT- ACCTTTACA ACCGACTCAG TTTTGTACCA ATAAGATGGT CTAGAGTTAA TCCATAATCT ACTTTAGGAA GGCGCCTGTT TTGTTTCCTG TAATAGTAGC ATCAACATAT GCTTTTCTAA GTTCATCTTC 4320 CTGTGAAGCT 4380 GATrrAGATG 4440 TATCATTAAC 4500 AGTCAGTTAT 4560 AGTAATCTTT 4620 TGATAAACTC 4680 GTTCTTTATT 4740 ATTNTCAAG 4800 AGGCT'rTTTC 4860
TAATATTGAA
TAGATTTCCT
TATCAGCACT
GGATAA'rCTT
CTATTGCTCC
TTAGGAGAGC
CTAACAAGCT
CAATTCCTCT
4920 4980 5040 5100 5160 5220 5280 5340 178 ATAGTTTGTA CCTGCAATTC CCCCTGTATG AGAGCCATTI' CCACTTGTAG AGTGTAGrT GCCAAAGAAA GCAACAT~r CAATACGAGT TCCATCATTC ATATTrTrA CAAATCCAGC AACATTATTA CGACCTGAAA GTGTGCCTGT TNTCATAGTA TTGGCTAATG ATGCAATATT TTCAA.AATTC ACATTATTA TCGTTGCCTTr TTCAGTAATA GCAAA'lrGTT TTCCTTCAGA GATATATGAT TTCCATTAG GAACAACATT TTCTrTTTGAA GGATCGTTTT GAATAGCTTC AATTrrGACA TTTGTAA'rAA CTGAAACC
ATCTTGACCA
TGTTATCACA
ACTTAAAAGT
TCTAGCGCI'C
CACTAATTCT
GAACGTrTCTA TCTCTACATT TTAAATAATG GATGT'rCCAA TTTCCTGTGA ATTCTTTAGT ATTGATTGTC CCAGACGATA PTGAAATTAT AATATACATT ATCTTCGTGG ACI'TAGGTT T'rTCAATATA GTGAACCTAT AGCAGTTCTA GAGACTAAAT TGTCTGCGAT TGC'TGTAACT
TCTTCTTCAA
TTATATACAG
GTT=CTGAT
ATTTATTATC
GTGTTCCGTT
TATTTGAAGT
AACCGTAGTT TCTTCTATAT TATTTAAA TAATAATTC TTCTTT'rCCA TTTTCGTATT TTTAAGATCT AATTGAATAT GTCGTAAATC ATAGTTGTAG AGCGCTTATA CTTTCTGTTG TrTC'TCCTTTT TTCAA'IrCAG AGTATACTTA GCAACAGCTT TATTGGCTTT TCAGATAATT TTCGTTAACT TCACTrTrGTT '1TCAGTTTGG AGGTTN'GCC TCTTATCATC AGGAATAGI'T GTTATCAGrG ATTCATTAGT? TGATTAAATC TGTACGTTTA ATATT-rTAA GCTCAACTTT TTTGA=TTC TAGAGTTTCA GTr'ICTTCAC CCTTACCTCT ATAGGGTGTA TTCTT'rGTAG TACTCTAGGT TCTTAAATGC T'rACCTTGTC ATCTGTAAGG ACTACAGTAT TALATAACTTC CTGTGATTGA TTTGATTTTT GI-rTTCTTTT GAT'rTTCTAG CACGTrCCA.A TATTTTCTTA TCGGTACTAG TCAATGT-,AA CAACCAATTT TTCAATAGTT GCAGTTAATT TTTCAACAGC TAGCATCTGT A'rAGCTGCA AC'r'NTTCAG CCTTTGTAAC AACTTCTATC ACTGTAATGT TCTTTTACCT TTGTTTTTGC TTTTAACAGC TAGTAAI'GTA 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 ATCTGCAA'rC GTATTGTTTA ATTCAG=rT ATCAACGT'rT ACAGCGTCAA TAGCCGTN' AAGTTrTATTT GTCTCGCTAT TTACCTCAGG CTGT'N'IACA GGCTCTGAAG CATAGACACC T7'rTGCAGTT TCTA).AACAG GTCCAAGAGC ATTGTAACTr GCTGTAGAAT AATCAGTAGG AGAAACTGAA CTAGCTTTAT CAATTTGATT ATTTAACTCA CTTTATCAA CTGCTTCTTT AGTACCAATA CCCTTTATTT TATCTTCTGG TTTCGGTGT TCCTCTACAG CC71rcCTC TTCAGGAACT TCTCGTTGCT TTTCTGGCTC AACTGGTGCC GT'rGGTGCCT GTTCGTCTTC 'rCTTGGCGCG ACTGGTTCAC CTGCTTGTTC AACrlTTGGT TCCTC7GTG T'rCTGT'rG T'TTTTCTACA GCAGCGTT CAACTTT'rGG TTGTTCAATA GATTGA'N'AA CAGTCTCCTC T1WrTGGTTCT ACAGTTTCTT CAGCCTTGGT ATCTGGAGTT GACTCTCTT GTTTCGGTGT 179 TTCCTCTACA GCCTCCTT CTT'CGTCT TCTCTTGGCG IrGITTTGTCT GATGGTTCAC AACTTCTCCA CCTACTT
TACTTTAGGA
T'rCTTCTGTr
TGTCCTAGCT
AGGGTGTCGT
TTAGGTGCTT
TGCTCCTGAT
CTCTCCAGGT TTTGCTGACG GTCACCTGAT AGATAACCAA AGCCAGCGCT AGGGTCGCAA CATAGCTCCA ACTAGAAAGA CCCAACAGTC AGCAAACCAA CTGATCTNT TGATACACCA AATTAAATCT TTAGCTTCT GTCCACTACA GAAGGAGCCA AATCTTACGA A'N7GAAAAAC AAACTTCCTA ATAGCATCTT ACTAGCCAGT GCCGTTACAT CTGAACATGC T'rGACATGCA CTTGATAAAT TCAACCTCAA ACCTCCTGCA TGAAGAGTAG CTGACGI'TCA ACAACGAGAG ACCTCTACTA CCTAGA.ATAT GGATTCTTTC CCAATATGAC CTrCAGGAGC TTCTGGTGCc CGACTGGTTC ACCTGCTT TTrcTGGCTT AACTGCTACT CAACTGGAGC TGGTTCTGCT CAGTAGG'T TACCTCCGAT C1-TC1-rrTGG AGCTTCCTCT TTGTTATTGA TTGAGGAGTC TIrrCTTCTAA AACAGTGTCC CATAGCGATA GCCCTrCCAT'r CTGGGTCTAC AGCCCCTGCA CGCTAGCAAT 'N'TCTTTCTC AACCTGTCAA AACAGATGCT AACCATATAC AACTTCATTC GTGAAATAAT CTCTrTATTT 'PCAAAACGCT TCCAAGAAAT
TCAACAACAC
CTAGGAAGAA
'TCTAGAT'rA
TCTGTCCCTG
CTGTCAGGCT
ACATAGTGAT
ACAGAGCCTA
CCTCTCGACT
CTACCAATCC
AAAGCAAC
TTT'GAGGCAA
TTCCTGTCTG
ACTGGCTGC
CAACTCCCT'r Tr'XTCTGGCT CGACTGGTGC TCAACTTTTG ATTCCTCAC TTTTCCTCTG GTTTTGACTC GAATCTTC?1T TCCCCTCTTC TTTGCTTCTT CCTTTGGACI' GTCTCTACTA CTTGGTTC TCAACT'rCGA CCACAGTCAC AAGCCAAGCG TTTTGAGGAT 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880
GGTCTTTTTT
GCGGATAGTC
GGGCATGACC
TGCCAATTGC
CTGGATCCTG
GATGGACACT
CCCGATTGAC
CCAAGATAGT
CACCTAGCAC
TCTTTTCTAA
TCTCTGTTAC
AGACAACC 'A
CTAGT'N'CTA
ATCTTCATAT
AAACACTTTT ATCTCCTTA TTCAI'CTCA CGCACGCGCA CCTCCGATTA ATTTTGGACG AATCTCTCTC AAAATAGGGC GAATCGGAAC AGTGTCTCCG ATATCCPLATC CAGCATCAGC CATAAACTTA AAGGCTGCCA ACTGCCCCGA GACAATTITCC AGACCAAACT GCTCTCCCAC ATGCTCACAA CCTTGAACTG CTAAATGGAT CTCCACTATC AGCTCACCAA TCTCTTGACT CTCACTAGAA GATAGACCI'A AAACAAAAAG AACATCTTCC ACTACCTGAC GTGTTTCTCT C'rCTGTTGTC ACTCTTCTAT CATACCGT'T GAAAGTrTGC CCAA'rrACGC ATAAPLACTCC T'rCTATTT'AT ATATA'rTTCA ACTTTCGTCC GAATGGC TCCAAAATGA AGTTTGAGCC
GGCCCCCTGC
TTGAATCTGT
TTCTTGTTT
CAGAATTGAC
CrTTTTTGGGG T'rCAAA'rrGG
GTCTCGTTCA
TTAGCAAGAT
TGGGAGTTAG
TCTAGAATCA
180 GTTGATCGAC A~rMTAAGA CCAACTCCCC CACGTTTGAG CAGCATCTTG GAAGCCAACG CCATCATCCT CAATACGGAT TCTGGACAGA AAGTTAATA TGG.CCCI'GAC C?1'CC=rC CATTTTCTAC AAGGGGTTGT AGGACCAGCT TGGGTAAGAC TTTCATTAAT TTCGTATTCC AGCTTATCTC CATAGCGTTG GGCGGACATG ATTGATTTCG TCAGAGAGAC AAATCAAGTC AGCGGAAATA GGTTGCCAAG GAC?1'GGTCA CCTGCACCAC CACCCATCCA GATGATGGTG TCCAAAGTCT TATAGAGGAA 'IrGAC7'rrGA CTACTATCAC CACCAA'rCCC GAATCCTG~r CTTAATGCCA TGGTAAAGAG TAAATTATCA AAGGCAACAT TTCGGATA AAGAGATACT CTTGCCTTGA TTGAGCGCCA TCCTGACTA TCATGAAATT ATGTGGATTA ATCTGGCTCG AAAGGGCTTG AAGTTGGTAC TGACGGGTCG TTTCTCCTG GCTACGAATA GCTACCATCA ACTGATCAAT CTGATCCAAC ATAGCATTAA ATTGGCGAGT TACTTCTCTC CACCAACTTC C7rGGCACGA AGA7TTGAG CACCAGAAGC AATMrCCAAC TCAAA'rCCTT CAAAGGAGCA ATCCAGCG~r TAAGACTGAA Cr-ACACTAAG CAAGAAGAGA 'rGTGACACTG GCCCCAAGCA AGGTCCACAA GAGCTGACTC CTAACTTTTC CAATGATGAC ACGCCAAGCA CCGTCCAATC ArTCCT.CA
AGTTCATAGG
ATCGTTTCTC
CAGAGACAGA
CGAACCTGGT
ATCTTC'rCrT
TAGGGTTTCZA
ACAAATCAT
TTGAGATAGG
CCC'rTrGCAT GACTGACGTA GGATTGTGA CCAGGAGTAT AACCCTGACC TACCTCCAT TTTGCTAGAC GAACTATAAA CTGTGTGTTG GG''TCATT GATAATGAAG GCAAAGCCCT GCTGCCCCAA CTTCCAGAGT TTCATAAGAA ATATCCAAAC GAAGCACACC CAACAAGTTC 'rTGAGrGACA GAAATGACCC ACTGACTATC AAACAGGCAT AGCTCCCTGA TGAATGGCCT TT'NGGTACCA
TGTATCGATG
AGGATGGTAG
CTGGAGTTGA
AAGATTGGCT
8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 AGGAAG'rrrT CATCTGCACA CTGTCATCTG GCACAACAGT TTTCAAGTCC TTATCTGACT
TAGAAATGAC
TCAAGATGGT
TGATTTACGA GCTGGAGTCA ATCCTCAGCC ATCATATCAG CTGACCAGAT PTGGTCACCA CAAAAACAAA TCTCGGATTC CCTCGACCTT GTCTTGACTG GGATTCTCAG CATAGGCCAG AACATCCGTC TGCTGGGTCA
AACCAGTCGA
TGATGGTCGT
ACTAGAAAGT
TTCTAACTAA
GGTGGTTCT AGTT'rTI'GA TATAAGACTG AATAAAGTGG CTAGTCTGGC TTGGCTGTTG CCCTCAATGG TG.GCCTCAAT GGCTGAAGAA CTTGATTGAT TCCAACCAGA GCTAGGAGAA TGAGAAAGAC CAGAAAGATG GAAATAACCA AAGAGAAGAA CGCTTCATCG GTCrrCTCCC TTCTI'AAACT GACGAGGTGI' CACACCTGCA ATCTGCTTAA AACGTTGGGT AAAATAGTTC ATATCTTCA.A AACCAACCTT CTCTGCGATC TCATAAATCT TCAGATCTGT AGTI'AAAAGC AAGAGCTTCG CTTGTTTAAC ACGTTCTCTC ACCAGATAAT CCTGAAAAGG CAAGCCCAAC TC'TT'rCTTAA TCAAGGAACT CAGATAGGTC GGACTAAAAC CTAAGTCACT GGCTAAAGAC AGCCAGATGA GACTGGATTT TCTGGGCCAT GTTTCCTTCA 'N'GTAACTGC TCTTCTTTCT CTTCCTrGTC CTCAATATCC TGACGAGAAA AGGGTTTGAG AGACAAGGCA TAATCAAAAT CATCGTAACC G1TTTCTCGT ACCAGACTGG CCAACTGGAT TAAAATGATA TCTGGCACCT GCTTTTGGAT CTGACCGATG ATTTrCCATAT CGTAGGCTGC TACCAGATAT TCATCTTCTA CCATTAAGAT TTACTAG'rAT CAGTATAGCA AAATTCTCCT TACIrTTTGT
CAGGTAGTCG
TGrrAAAAAG '1-1-AAACTAA ATTGGCTATC AACCTAT'rAG TCAATAAATC TTGATTTTCC CCAACATTrC TCCACACCTA GTTNGACAGC ACCAAATGAA CCTGAGGATA GCCATTTAGA TGAGGCATGT TGATATCGGT CAArrCCCAA GCCTGCCTTC CATTTTCAGC 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 TACATTGACC AGTTTAGTCA TGTGTAGGTC ATGCTCTGCT CTAACTGCTT AGGAAAGACC
AACCTTGTCT
CCTr'rACCAC
TCTTATACTC
ACCCCTTTGA
TAGATAA.AAC
AATAAAAATC AAAAAGTAAA CTAGGAAGAT AGCCACAGGT TTCTCAA.AGT GGTTCTAAAT AAAACTGACG AAGTCGACTC AAAGTATAGC 'rTTGAGGTTG TGACGAAGTC GATAACCCTA CATACGGTAA GGCGACGCTG ACGTGGTTTG CGAAGAGTAT TAATCAACAT AATCTAGTAA ATAAGCGTAc CTTTTCTTC
TTGGGAATAA AGCGGATAGA GGACCATCCG TAAAGAC?.TG CGCGTCATAT 'rGTAGGACTT GTAAAACTAG GCCAGCCACA GAGGCTATTG ATACAGTAAC GTAAGCCGCC CCCAAGGTGA GP.ATCTCCTA CTCGGCTCCG ATCTTCCTTG TAGGTGACAA CATCTGGACT ACCAGACTCA AATTTGTCTT TTCATGAAAA CCAGT'?GCTA TATCCACATA GATACCGGAT GCTCGTTCTG TTTGATTrC CTGGGTAACT AATTCCTCAT CACTTGGTT1' TGGATATTTG TTAACATTGA rATGGCAGTA GCCATTTGCA TCAGCCACCA CAAAATTCTT CAAGTTTTCC TTAGCCACCT CATCAAAGAC TTGGTTAATC ACACCAGTAC CGTACTGGGT CCCCACATCA ATAATGCGGA AATAGTGAAG CAGGATTTCC ACATGGACGG TTTCTGCATG ACCTGTTTGG
TCAAATTTAT
GCATACTCCT
CTGGCATCAA
TTTMCTTGA
CCCAGTAACG
CAGGTGACAG
TGACAGGATA
GATAGTCTTG
AAGAGATTTT 11460 CATTTGGTCT 11520 CTTGTCCTGT 11580 CACTTCCATA 11640 GATGGCTTGG 11700 GAGAGGTTCC 11760 GTTTGAGAAA 11820 GGTCTT'rC 11880 GGCCGCCTGA 11940 ATGGTAATCC 12000 ATCGTAT'rTC 12060 TGTGTAATAA 12120 GGrTGGATTG 12180 ATCATAGGTG 12240 TGTTTCTCCT 12300 GAAATAT'rCC 12360 TTTrCAACTG CTAGAGGTTG AC'rTCCAAAT CCTTGTCATC r1TTCCTrGT T'rGAGAGAAA
TTAATCAATT
TAq'TTNGCT rTTrGCTTGGC
CGTACTTGGT
CTACCATTTG CATAGCCrGA AACGGCATCC GTCACCCCGG GAACACGTGA TCCACTCCCC AGAAACAACC TCCAGCTAGA TAAATTCGT GCALAGTCTGC GTC7-rTACTA120 12420 ATTCTGTTT TT T'CACTGC GCATC!TGTCT GCCCTGCATT AATACTCC'rA GCAACAAGAA TTCCTTCAAA GTTTGCAAAA CTGCCTTCT TTG'rCTATAA AAGTr'rGCCT GATGGGTCAA ATTCTTAAAG TCCGCTTCAG CACATAGTCA TCACCAGCTT GATGGAACAC CAAGAAGCCC ACGGTAGGTC TTGCCATCTA TGCGCTTGTT TTACTAGCTG AGTCACCCAC 'rTGCCTGAAC TGTTTGCCAT 'rTTTTCATAT TTGAAGCATT TCCAAACAGA TTTTGACGAT TCCGAGATAG TCAGAGCT1AG AAGCAAGAA'r CTCCCTCCCA AGCTCCTGAA GCCCCACGCA AGGCGTCCAA AGCCCTTACC ATTTTGCCCC TAAAGTGTAG AATCTCCATT ATTGGAACCA AGAAGCATAA TAAATATAAA GGAAATTCCT TTGAAAATTT GCCGCTAGAA AGACCGGTAA CAAAGGTAAG CACT'rAGAAA AAAGAAAATA 182 TTTCCTCCT TGGCTAACTG
TCGTATCAAT
GATrTAAC TGCAwT
AGGCTTGGGT
CTAGGACTGG
ATTGCTCTCC
CTTTAGCA.AT
AGAATTTGAG
CTCCCATCAA
TCTGCTCCGT
AAGCCGTCAA
TGATATTCCT
ACCAAGAAGC
GGATGAAGT
GGTAGCGCCA
CCACCTGAAG
GCAAAACTAA
TGTCCTTGCA
TGGTGCAAAC
AGCAAATCGC
GCTATAAAGG
GCCTGAGCAC
ATACAAGGAG
TGACCCATAA
AGAACATAGA
TTATCATTCA
TTCCATGAAT
TrGGGTAAGAA GAGATTTrTA CTTATGTCCT GGTGACACTA CTGTCAAGAC CTrA'rCCGTA TCTGGAAGAC TAGCCAGACA ATAGACTTTC TTGCCCI'TGT TTCAAAATCA GCCACCTCTT CTTCATTTCA. TCTTTCGTITT ACAAAGGAGC GAACCTGCTC TTrCCATT~TTA TTCAAATAAT
AATCAGATAA
TCCCTTTAGC
GGTGTTCACT
CAAGAACACA
TCACTTAAAA
CCGCCTrTTC AATTT~GCGAG AACCGGTTAT GGCTAGAAAA TAAGACGCCT CCTAGGCTAA CCTGGATGTG TTTTGACCAG CGGACACCAT AAGTT'TCCAA TAATCCAATC CCTTATACCA 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13926
CCATCACAAT
'rTCGGAAATG
AGCCCAGCGT
CCGCCAAGGC
AGGTCAAGCC
GTTGTAGCCT
CAAGAAGGAT
CTAAAAAACC
CCAGAGTTCG
CATCCTTATC
AATGAGAAAA CCACCCACTT TTTCAAAACA TAACTAGAGG ATACACCAAC ATGAGACCAG CAAAACAGAC CCCAGAACCG CAATAAAAAT GCCTGAC'rAT CTTTTCCTTA TAAAGCCCCT AATAAT'rGCC CCAGTAAGAT AGCTCCATAG CCCAACAAAA TAATAAACTA GTAACTGAGA ATCTAGTAAC ACTCCI'GTAT AAAAGAAGGA TAGAATCCCT GCCAAAAAGA AGTTCCTrCCT ATCATTTTAT TGATAGATTT
ATTATA
INFORMATION FOR SEQ ID NO: 6:
SEQUENCE CHARACTERISTICS:
LENGTH: 20199 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
CCCAGCAGAA AAATGGCA?1' TGGAGATAAT GGAAATCCTA AAAAAACTAT ATAACC?1'GT TTATCGTGAT TATCATGCTA GTAGCAAGTT TATTGGGAAT GCAATTGGTG CCCTCAGTAA TCTATAAAAT AGATTCAAGA AAATrrAGTG
CCCAGCCCTT
AAGGTCAAGG
CCTAATGGAG
GACGAAGGAC
GGTGAAAAAG
TTTTAAAGTG
CTCGGTAATGG
CCCCT'rGGCG
TACGTACCTT
GGATGACCAA
AGAAGAAATA ATGAGTA'PGT T?'N'AGATAC TGCCGATGGT ATGGTTGCCT TTCGTCGTGA TGGTGATGGT GGTCGTGGAG GCAATGTGGT GATGGA1'TTC CGCTACAATC GTCATTTCAA AGGGATGCAT GCTCGTGGTG CTGAGGACCT
GTTTGAGAAA
7TTTGCAACT
ACTGGGATTT
AGCTAAGATT~
AAAATATGTC
CTTCGTTGTA
GGCTGATTCT
TAGAGTTCGA
GTACCACAAG GTACGACTGT TCGTGATGCG GAGACTGGCA AGGTTTTAAC AGATTTGATT GAACATGGGC AAGAATTTAT CGTTGCCCAC GGTGGTCGTG GTGGACGTGG AAATATTCGT TTCGCGACAC CAAAAAATCC CGTGAGTTAC AATTGGAACT GTAGGGAAGT CAACACTTTT CACTTTACCA CTATTGTACC GCAGTAGCCG ACTTGCCAGG CAGTTCCTCC GTCACATCGA TGCACCGGAA ATCTCTGAAA ATGGAGAACC AGGTCAGGAA AAAAATCT'rG GCAGATGTCG GTTTAGTAGG ATTCCCATCT AAGTGTTATT ACCTCAGCTA AGCCTAAAAT TGGTGCCTAC AAATTTAGGT ATCGTTCGCA CCCAATCAGG TGAATCCTT= TTTGATTGAA GGGGCTAGTC AAGCTGTTGC TTTGGGAACT GCGTACACGT GTTATCCTTC ACATCATTGA TATGTCAGCT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 AGCGAGGGCC GTGATCCATA TGAGGACTAC AATCTTCGCC TCATGGACCG TCCACAGATT AGTCAGGAAA ATCTTGAAGA CTTTAAGAAA GACTTACCAG CTATCTTCCC AA'rrTCTGGA GATGCTACAG CTGAATTGTT AGACAAGACA ATGGAAGAAG AAGCTTACTA TGGATTGAC GATGACGA'rG CGACATGGGT ACTTCTGGT AACTTTGATC GTGATGAATC TGTCATGAAA GATGAAGCCC TTCGTGCGCG TGGAGCTAAA GAGTTTGAAT T'rGTAGACTA GGAGACTGGT
CTAGCTATCA
ATTGTAGCTA
AAA'rTGGCTG
TTGACCAAGC
CCAGAATTTTT
GAAGAAGAAA
GAAAAACTCA
TI'TGCCCGTC
GATGGGGATT
ATGGGAGATA
ATAAAGAC GGAGTCTTAC ATAAGATGGA CATGCCTGAG AAAATTATGA TaAATTTGAA AAGGTCTGGC AACACTTTTA TGCTCTACGA CGAGTCCGAT AAGCCrTGTA AAT'rAGTCGT TGAAACTCTT TAATATGACC ACCGTGC TATGGGGGTT TGGTCCGCAT TGCTAAATTT AACCGATATC TTTCCGAGAT GCGGATGGTA ATTTTGTrTC CGCCGCAGAC GT7rGAATG AAAACAAATT GGAAGAACTA 184 TTTAATCGTC TCAATCCAAA TCGTGCCTTG AGATTGGCAC GAACI'AAAAA GGAAAATCCA TCTCAGTAAA GAAGCTAAAA AATCCCGTGC CTCATCAGAC ACGGGA'NVI' GTGGTACGAC AGGCATGTAT AGCAAACTGA TGAAATGAGA ACAGGACAAA AAGCAGAGAT GTACTATTCT CAAAATTrAAA
TCGGCGAGTC
CTGGGGATAC
TTGTTTGATT
AAATAGCGAT
ACCGTTTTAA
ATCTGGAATA GCACAGCATA TCTTCTAAAA TATACTAAAA TCGATCAGGA CAGTAAAATC GAT~TCTAAC AATGrT'rAT AGTTTCZAA'rC AACTATATTG rrATAAATTG ATTTGAATTT CTTATTTCAA rTGTATAG TATATCTGA'r G'rCAAAGTTC TCCCAAGCCT GACTATCGTG AGGTAGCGGA TTAAAATGGT GTCTGACGCT GGAAATAAGA ATTGTCAGAA GAAGGCATAG CAGGAACCTG ATAATAAGGC GTATATAGCG GATAAGAGCG AAAGGTAGTC GTAACCTATA TGCGTAAATC ACGAGAGTAA CGAAATCGTG GCTCTACGAA TTGAATTCGT ACTAAGA~rr TCTA7rTrCA CTGTAACCTT ATACACGAGG AAAGATGTAC GACTATCCC c'rGAGGTCTA CAGATAGAAG TGATCCTGAG TCACGGTTAT CTGTCTGATA TTCTGTGAAC 'rGAGAGAAGG GGGAGAAGT CCGATACTTA GATAAGAGAT CTAGTCTTAG GGCAATAGCG ATTCGAGAAA GATTATACTC CGCCTTGTCG TATGTGTAGG ATACTGAC'rA CT'rGCTAAAA
CTCCTACTCA
TTCGAAAATC
CG'rCAG'rTCC TTAACGCCCT TATA'rCTG TCACTATAAA GAGAAAACGA GGACGGTATG TATAAAACGC TTTAGT'rGAA CAGCCGTATT GTTrTTAGGGG ATAAAAAAGG TCTTCAAATC ACGTCAATAT ATCTACAACC TCAAAACAGT GA'TCAT'r GAGTATTACT A.ATTGAAAAG G.ATGGAAAAA AGAAATCTTG AAAGAGTATT 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 GTTITTGAGCA ACcTGCGGCT A.ATTCAGTTA CTAACTCGTC AGGATAAATT TATGATATAC GAAAACTTAG AATGAGAAAA CTATTAGTGG TGCTAAkAAAT ATGTGGTGAC T1'TGGATTGC TGGAATTGAT GGGAGCTACT GTGTTCAAAA TA'rTCCAATG TTTATGGGAG CCTCTTAGGC ATCTTGGTCC TCGTCCGATT CTAGCTACGA GGGAGATAAC GTATTTACAT GGATACGGTT AAGCAAATGG TCTACTA'1r AGTTrTCCTAG
AACTCTGATT
TTTAT'PTGA
ATTGTTATCA
AGTGTCGTTG
GTTCCAGATA
GTTAAGCGTT
CCTTATGGTA
CT 'TGGTc
GACTTACACC
TTTGATCTT'.
TATCCAATAA
AGACCTTATT
ATGGTGGATT ACCACTGCAA GGTGAAATCA CCTTAATTCC AGCTATTATC 'rrGGCTGATG TTTCGGATGT AGCCAGTCI' GTCGAAATCA ATGACGATGT ATTGGAGATIT GACCCAAGAG AAATTAACAG TCTTCGTGCA TCT'rACTATT AAGCGACAGT 'rGGTCTACCG GGAGGATGTG TTAAGGCGTT TGAAGCTATG GGTGCCACTG ATGAAGTTAT CTGCTAAAGA TACAGGACTT CATGGTGCAA AGTGTGGGAG CAACGATTAA TACGATGAT GCTGCGGTTA ATTGrAAAATG CAGCCCGTGA ACCTGAGATT ATTGATGTAG CTACTCrcITr GAATAATATG GGTGCCC-ATA TCCGTGGGGC AGGAACTAAT ATCATCAXTrA TTGATGGTGT TCAAAGATTA CATGGGACAC GTCATCAGGT GATTCCAGAC CCCATTGAAG CTGGAACATA TATATCT'rTA CCTGCTGC-AG TTGGTAAAGC AATTCGTATA AATAATGTTC TTTACGAACA CCTGCAAGGG TrTTrGCTA AGTTGGAAGA AATGGGAG'rG AGAATGACTG TATrC'GAAGA CAGCATTT'rT GTCGAGGAAC AGTCTAATTI- GAAAGCAATC AATAPTAAGA CAGCTCCTTA CCCAGGCTTT CCAACTGATT TGCAACAACC GCTTACCCCT CTrrTTACTAA GAGCCAATGG TCGTGGTACA ATTG'rCGATA CGATTTACGA AAAACGTGTA AATCATGTT'r TTGAACTAGC AAAGATGGAT GCGGATATTT CCACAACAAA TGGTCATATT TTGTACACGG GTGGACGTGA ?r'rACGTCGG GCCAGTGTTA AAGCGACCCA CTTAAGAGCT GGGGCTGCAC TAGTCATTGC TGCCCTTATG GCTGAAGGTA AAACTGAAAT TACCAATATC GAGTTTATCT TACGTCGTTA TTCTGATATT ATCGAAAAAT TACGTAATTT AGGAGCGGAT ATTAGACTTG TTGAGGATTA AACCGTAGAG GTGTTTATGA ATA=rGGAC CAAA'TAGCA ATGTT'rTCT'r TTTTTGAA.AC GGATCGCTTG TATTTGCGTC CTTTCTTT' TAGTGA'rAGT CAGGACTTCC GCGAGATAGC 'r'CAAATCCA GAAAATCTTC AAT'N'ATTTT CCCAACGCAG GCAAGTCTGG AAGAAAGTCA ATATGCACTG GCCAATTACI' TTATGAAGTC CCCTTGGGA GTGTGGCCAA TTTGTGACCA GAAAAATCAA CAAATGA'N'G GTTCTATTAA ATTTGAGAAG TTAGA'rGAAA TCAAAAAAGA AGCTGAGCTT GGCTA'TTTT TGAGAAAAGA TGCTTGGTCG CAAGGATTTA TGACAGAGGT TCTTAGAAAA ATTTGTCAGC TTTCTT1-rGA GGAATTrTGGC TTAAAACAAT TATTTATCAT TACCCACCTT GAAAATAAAG CTAGCCAAAG AGTTGCTCTT AAGTCTGGAT TTAGTTTGTT CCGTCAGTTT AAGGGAAGTG ATCGTrrACAC AAGAAAAATG CGGGATTATC TTGAATTTCG GTATGTAAAA GGAGAG?1'CA ATGAGTAAGC ATCAGGAAA'r TCTAAGCTAT TTGGACGAAT TACCAGTAGG TAAAAGGGTC AGTGTTCGTA GCATTTCGAA TCATCTAGGA GTTAGTGATG GAACACCCTA TCGGGCTATT AAAGAAGCTG AAAACCGTGG AATTGTGGAG ACCCGTCCTA GAAGTGGAAC AAT'rCGTGTT AAATCCCAGA AAGTTGCTAT AGAGAGA'rTA ACGTGTCTG AAATTGCAGA 
AGTGACTTCT TCTGAGGr'rC TGGCTGGGCA AGAAGGTTTA GAGAGAGAAT TTAGTAAGTT TTCAATTGGT GCCATGACTG AACAAAATAT CTTGTCTTAC CTTCA'rGATG GGGGGCTCTT CATTGTCGGA GACCGAACCC GTAT'rCAGT'r GCTACCCTTG GAAAATGAAA ATGCAGTTCT GGTTACAGGG GGATTTCAGG TTCATGATGA TGTGC'rrAAA CTGGCCAATC AAAAAGGGAT TCCTGTT-CTA AGAAGTAAGC ATGATACCTT TACCGTCGCG 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 ACCATGATCA ATAAAGCCTT AAACTTTATC GCCCTACTCA TATTTGGACr TGGTTCGTAA GTCGTTGTTG G7GTTGTAAC GA'rAAGGTTA TG3TCTCGTAG AGTCAACGGA TGATCGCAGA TTGCTTGGCG TTGTGACGCG GCTCTACCAA CTMTTCTGA GTCATT'ACAG TGGAACCCTT GCAGAAATTrC TGACCCACAT CGAGCAGATG CTGATCTACT GGCACGGATT ATTCATCATA TCACCAGATT GTTTCAAAAG GATGATAACA TTAAAATCAG
GTCAAATGTC
TGAGTA'rGGT
GAATCGTAGC
CATGAGAGAC
TirATT?1TG
AGACTTTGAA
ACGAGATGTC
GCAGATTGGA
TATGCTAGAA
GACCCGA'rTr
TTTTGCAGGC
CGAGACGGTC
CAAATGTGAC
CTCGTGAAAT
CAAATCAAGA
TTTCTGAGAG
AGCCGTTTCC
GCTGGTGATA
GTTGGATTAT
ATGGTACCAG
ATGGAGAAGA
CAAAAGC'rC-
AAAAATGGAG
CTGATATTCT GACAGTTGAG AGACAGATAC AGTTA.AAGAT CTGTATCAA TCAACATCAG AATCACCAAG CACGACAATT CGACAAATAT TGCCAATGTG TTGTT'CGAAG CAATCAAACT 'rGAGCCGTTrC CCAAG'rTTCG CTTATCACCA TGATGAAGTA 7 GGCTAA TGGTGTATTG AGTTGTTAAT AGTC'GTCGCA ATCTCATTAT TGTTCAGATA GATGATATAT TCCGCATTCA AGCTATAArr GATTACGATA TTTATCATGG TGTTAAAATT AATTAGA.AAC TAGGAGAAAA CGAAGC'rATG GACAAGGCTG GTGATTTTCT 0* Se C. C AGCAAGTA'r CATATAGGCT TACGTGA'rTr GATTAAGCCA GGCGTAGATA TGTGGGAAGT TGAAGAATAT GTCCCCGTC GTTGTAAAGA AGAAAATTTC CTTCCACTTC AGATTGGGGT TGACGGTGCC ATGATGGACT ATCCTTATGC TACCTGT'rGC TCTCTTAACG ATGAAGTGGC TCACGCTTC CCTCGTCATT A'rATCTTGAA AGATGGTGAT TTGC'rCAAAG TTGATATGGT TTrTGGGAGG'r CCCATTGCTA AATCTGACCT AAATGTCTCA AAATT'AAACT TCAACAATGT TGAACAAATG AAAAAATACA CTCAGAGCTA TTCTGGTGGT TTAGCAGACT CATGTTGGGC T'rATGCTGTT GGTACACCG'r CCGAAGAAGT CAAAAACTTG ATGGATGTAA CCAAAGAAGC TATCTACAAG GGTATTGAGC AAGCTGTTGT TGGAAATCGT ATCGGTGATA TCGGTGCGGC TATTCAAGAA TACGCTGAAA GTCGTGGTTA CGGTGTAGTG CGTGATTTGG TTGGTCATGG TGTTGGCCCA ACTATGCACG AAGAACCAAT GGTTCCTAAC TATGGTATTG CAGGTCGTGG ACTCCGTCT'r CGTGAAGGAA TGGTCTTAAC CATTGAACCA ATGATCAATA CAGGCGATTG GGAAATTGAT ACAGATATGA AAACTGGTTG GGCCCATAAG ACCATTGACG GTGGA7'rGTC ATGTCAGTAT GAACACCAAT TTGTCATTAC GAAAGATGGA CCTGTTATCT TGACTAGCCA AGGTGAAGAA GGAACT'rAT'r AATAAAAAGT GAAAAGACTA CTGGAAGT'rT ATTTTCATAA AAAATCCAGT AGATCT'rrTC ATAATAAAAC GCATTGTATC AAGTGTTAGG GGCTGATATC ATGCGTTTT'r C'rGC7=TAA GATT7TTTCC AACTCTGTTT GTAAGCGCAT CATAACAAAG 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 C* C.
GGTCTAGGAT TCAGGGCTCT CCTCCTATAT ATN'AGTGT CGCAGTCTAT TG~TCCTGTA AAAAAGAGAA ATGGAA'rTCT GTTTAASTT TATTATACTT CCTGCGAAAC AAAATATGGT AACAACTAAC TGATGCACGA TTTAAGCGTC AGATGTTAGC TGTATTAAAA ACAGCTTATC ACTATTAGTA AACTAAAACT GAGATTCCAC AATAT'rGTCG
CTAAATTAAG
TTATGAAGAA
ATGGGTTGAA
TGAGCACACG
TAGCGAATGA
CCTAGAAGAC
ATTGCGGCTG
CrrCTTATGC
ATTTTGGTAT
GTCAAGGGAG
TCAAALATGAG
AAGGGCTCAT
CGCTAACAGC
AGAACATCTT
GTAAACGCTT
TCTAGTTTTG
AGTCGAGTGT
ACTCCCCGTA
AGTAAGGAAG
GTGCAAACGA
GTTTTGACCC
ATAACTCTTG TTCAAAGTGG GTAATGATTG ATGCGACGGA 'TTcrGGTAAA AAGAAATTTC AATTGTrrCT TTGGATATCG TCGTAGAAAT ATCGAACAAG GAAGATATAT CCTCAAGCAC TGAAGATAAA GCCTATAACC
ATAGTAGTTC
TTGrrGGTGT AACT-rAAACA CCACrCTTCA
TCACGAAAC
TTrTACGTT
AGTAAAAATC
ACGCTATGAA
CTGTGAACTA
CTGGTAAAAT
AAACTCCACG
ATGCGCTATC
TATGAATGA'r TCAGCGTrACC
CGCAAAAGGT
ATAGTGCGAG
AAGGGAGGAT
TTTGATCT
TACTTTTTTA
GAAGCAAGTA
ACTTTrTGAAG
GGACGAAAAC
AATATCGAAC
7020 7080 7140 7200 7260 7320 7380 TCCCAAAGTA AAAACGTTTA AAATATTTTC CGGATTACGA ATGAATGA GCGCTGGTAT CAGGAAGTCT ATTGAGGTAT TGAGCTAGTT rrrAGAAACC CACAGTGTAG TATTCTAGTT AAGTTTCTAT TTTCCCTGAT rrCTGATATA AGAAGATGAA CGCATTATTA AATGGAATGA CAGAAGGTCC CTTGCTAATC ATGGCACGGGG ACCGTATCGC TTATTTGATT GATGAAAAGC AACTTTATCC GTCGGAGCCA 7440 TCAAGAACTC CrCTCAGTTC 7500 AATCGCCCTA AAAAAACAAT 7560 GGCTCAAGCG ATTGTCACAA 7620 TAGTCATGAT ATGAAGTTGT 7680 CTTGGCTGAC AGTGGTTATC 7740 TAAATCCAGC AAACTCAAGC 7800 TAAGGAAAGA AGCAAGGTTG 7860 AACAACCTAT CGAA.ATCATC 7920 TATCAATCAT GAACTAGGAT 7980 TATGAAAAAA TTGGGTGAAA 8040 TC.AATCCACT ATATTT're 8100 ATAGAAATAT TGACTTCAAG 8160 ATGACCGTCA GGCTGAGGCG 8220 CTGGTTCTGG AAAGACTCGT 8280 TGGTCAATCC TTGGAATATC 8340 AAGAGCGTGC TTATAGCCTC 8400 CCATG'rGTGI' GCGTATTTTG 8460 TTGGCCATTA CCTTTACCAA CAAGGCTGCG CGTGAGATGA AATCCAGCGA CTCAGGACTG TCTGATTGCG ACCT-rCCACI' CGTCGCCATG COGACCATAT TGGCTACAAT CGTAATTTTA CAGCGA.ACGC TCATGAAACG TATTCTCAAA CAGTTGAACT GAACGAACTA TTTTGGGGAC CATTTCCAAT GCTAAGAATG TATGCTGCCC AAGCTGGCGA TATGTATACG CAAATTGTCG CAATTGTGGA TCCTGGTGAA TGGACCCTAA AAAATGGAAT ATTTGATTGA TGATGTTGCI' CCCAGTG'NA TACAGCCTAT 8520 8580 8640 8700 188 TGAATCCG'N GACTTTGATG ATTTCATTAT GCTCACCT'rG CAAAAAGAAC T'rCGTCAGTC CGTCTCTTTG ATCAAAATCC
CACCTTGATG
TCCCGTrTA
GGTGCTGATA
TcrrGAGG
AAAAATAATA
ATCGTTTACT
AGTACCAAGA
AAAATATCTG
TGCAGAATAT
AAAATT-ACCG
AAAATCGCCG
ATCGTGCCGA
TGATGTTTTG ACCTACTACC TACCAACCAC GCTCAGTACC TGTGGTTGGG GATGCGGACC C7TGGATT GAAAAGGATT CTCAACCAAA ACCATTCTCC TCCTAAAAAT CTCTGGACTC AGCAAAAATT CCAATACATC AATTGGTCAA ACTCITrGGCT AGrCTATCTA CGGI'TGGCGT A CCCCAAAGC CAAGGTTGTT AAGCGGCCAA CGACGT'rATT AAAACGCTGA TGGGGAGCAA GATGAACTTA GTCGCAGTCA AATCCCCAG'r CCCGTACAAT GTTGGCGGAA CCAAATTCTA CTTATrGCTA ATT-rGAGTGA CGAATTGGTC TAGGTACAG'r ATGCTGGATG C'rTCTGCTAA ATCTGGGATT TTGCCAATAT ACAGAGTTGG I'TGAGTCCGT GCGACTCTAG AAAGCAAGGC AACTTTGATG ACACCACGGA TTCTTAAATG ACTTCGCTTT GTCACCTTGA TGACCCTrGCA GGGATGGAAG AAAATGTCTT 'rGATGAGCTG AAACTTrCCTT
TGAGGAAGCC
CAGCCGTAAG
CAATArrAGT
TGAGAAAATC
GATGAGGCTG TATTTGTAGC CAGAACCATC CATAACGATT TT'GCAG'rTCT CTATCGGACT CTGCTCAAGT CTAACATTCC TTATACCATG GAAATTCGCG ATAT'rA'TGC TTATCTCAAC TTTGAGCGTA TTATCAACGA GCCTAAACGT CGTGATTTG CAAATTTGCA AAATATGTCT TCTGGTATCA AGGGTAAGGC AGCCCAATCT GATGCTTGAT TTGCGGGAGC CCTAGAAAAA ACAGGTTATG ACGGG~rGAA AATATCGAAG 'rGTGACAGAA GAGGAALACTG GATTGCCGAC ACAGATTCAG TGCTGCCAAA GGTCTCGAAT AGCTAGACCA CTTAAGCATT TCGATAT'rCT TAACTCCCAA 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 ACTT'rCTTTC
GTCTGGACAA
GTAGTCAGGA
TTCCAGTTGT
TGTTACGAAG-
ACTGAGTCGT
GACATCAGAA
CTTT7'GATT
TGAATTAGAA
CTATCTGACC
TCGTTTTAIT
AAATACAAGC
TCCACI'AGT CG'rGCGACTG AAGA'rTCACA GAAGAGCGCC GTCTAGCCTA TG'rAGGTATC AATGCCAACT CACGCTTGCT TTTTGGTCGT AACGAAATCA GTTCAGACTT GCTTCAGTAT TTTAAGGCAT CATATAGCAG TGGTACTATT GCTCTTCAAG ACCGTAAACG CGGTGCTGCC TTTGGTCAAT TTACAGCTGG CGCAAAACCA GATATTGCTC TCCACAAGAA ATGGGGAGAG GCTAGGCAGG AATTGAAAAT CAAT'rTCCCA GTGGCTCCAA TTGAGAAAAA AATCTAAT'T ACGCGTGCAG AGAAAATTCT ACCAATTATA ACCGTCCGAC CAAGGTCTGG CTCCTCCTGC TCC1rGTC AAGGTATGAG TT'rGCCTCAG CCAAAATCAA TCCAGTCAAG CGGTCTTCCA GCATCTAGCG AGGCAAATTG GTCCATTGGT GGAACCG'N'C TGGAAGT'rrC AGGTAGCGGT GAAGTAGGTS' TGAA.AAAACT TTTAGCCAGT TCCATCCTTC TCACGAATAA TAAAGTGAGG AGGATTTTTrA TGTACAGTAT TTCATCCAA GAAGATTCAC TATTACCAAG GCCAACGAAG GAGTTGAAGC GCTTAGTAAC CAMAGGTGC TAGCTATTTT GGAACACG'rC AAGCTAGCGT T7"rGAAAWP GCCCAAAAAG TCI'GAACAA CTAACGGATT TGAAAAAAAT GACCCTGCAG GAA7TGCAGA GT7=GCTGG GTTAAGGCCA TAGAATTACA AGCTATGAT? GAACTGGGGC ATCGTA7'rlCA ACTCTTGAAA 'rGGAAAGTA'r TCTCAGCAGT CAAAAGTrGG CCAAGAAGAT TTAGGGGATA AAAAACAAGA GCACC1'GGTG GCACTCTATC TCAATACTCA
AGAAAGGCTG
ACTCAGGACA
TCI"IrCAAGC
TATTGGGCGT
CAAACACGAG
GCAGCAGGAA
AAATCAAATC
ACCGCCAGAG ATCCATCAGC AGACCATTT TATCCCGrCT ATTC7"rCACT A'rGCAA'rCAA GCATATCGCG TCAGGAGCGG TAGCGCCTAG CCAAAATGAT TGCGAATTGA TGGGGATTG'r TCTCTTGGAC AGTTATCCTG AAAAGACAGA TTTAATCTAA GTAACTCGTA GTATCGCTGA
ACTTCTCTTA
GATCATGTCA
CATTTGATTG
AGr'rCATTAA TCTTGG'rCCA CAATC-ATCC'r CTAAACTG TAAAGAAGCC TCTCTCATTC TAATTACTT'r CGACATAGTC AAAGAGTTTT TTATC~wrrGG GACGAT'rTTC AAAAAGAAGT TCTGGATGCC ACATCATCCG TACTCATGAC AGCCTCAATG ATACCATCTT TTTAAAT7TG
TCTCCATAGA
TCAACAGAAG
ACGTTAAAGA
GTGCTAAGTC
TTTCTGGAG
AATCCTGCCA
GTTGGGTACC
CT'rGATGCTC TGGTGGTGGA AACGGTATCT GGTTC'TGTTA ATGGTCTTCG ATATCTTGGT ACCCCAGACA GAGAAAATGG
ATT'GGACACC
TAGGATCATG
AGGAGTTGAT
CCAAGCGTTG
ACAAAGT'rCC
GCTTTTTCTG
GAGAAAGGCG
ACCCACAACT
ATGAGAGATTr
AGTTCTCTAC
ACCCATCGCA
TTTAATAGCT
ACTATCAATG
GATGAGCTTG
GATGA'rGGGA
CATCATGATC
10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 TCCTTGATGA GGGCCAGTTC GAAGATATCT CTTTGAAGGT GATAGTCATC GTTTTGGGTT CGCCATAAAA TTTTGGATCG ACATT'rTGCC CACCTGTCAA TCAATCAAAC TGATATAG'rG GCAGGCCATT TCTTGATCAC CAATCGGTAG ATCCCTCCAG CATCTTTAAC GCCTTCAACA AAGCCTTTTG CTGCGTAGCT TCATCATCTC GATGAGrT'r TTCGTTTCCT TGATTTTCGC TTTCTAATCC TC'rTTCGCA 'rCAAATCGAC ATACTGAACG ACCACGTCTT TGAGAATTCC 'n-rCACACCA GCATCAACCA TGCGGAACAGT TAGGATAGCA GTCTTCACAT GTAATCCCAA TAACTGGT'rT 'r'r'CATAAAA TGAAC'rAGAG GAGGGTTTGG AGTTCACTTG TTGGTAAATG CAGATGGACT GGTGAAAAAC AGAGATTAGC AACCTCTTGT GACTTGACGC CAGCATCC'N' GATrTTATCC 7rGATCTGAG AAATCCCGTA AATCGGAATC CCGTCACGAG TTITGGGTACC GACTTCAGGA TGGTCGTCTA GGTCAAAGGC CATGATAATC TTCATC~rGT TACGTTCGTG GAAGCGGTAG 'rGGAGAAGGG CATGGCCCAT ATTTCCAP.TA AATCGGCAAA AAArGTcATr CACCAAAATA GGAAAAATCA TTCPAGA GTTGGCACGT AGAGAGAGAG TCT'NrTGCT CACAACC7"TT C'rA'TCTTCT 190 CCAACCAGCA TGACATTGGT AATAGAGTTG TCATTGAGCA AGNNTrGA CATCATAGCC AAAACCACGA CGACCAAGTr CGACG;TACGG TCGCTGAATC AATACCGATA GCCTCTGCAA TCAATCTT'rT CTGCATGAAA GTAGCTNTTG GAATAGCAAA MTTATAGA AACATTGTGA AAAAACTAAG AAAAATCTTA G'rrGATGT AGGTCTCCGA CCAGCCCCTG ATAAACTTTT AGTGTATCTG GTAAGGTTAC ACA'FCCTGAC TACT'rGAGAG TACGCTCTAC ATGATAGCAG CTATCTTTCC GA77rrGTAA AGACACCACG =Ml-CCTGA TGAATATGGT GGTCTTCTGA CTTGTCAGTG ATATTGATTT TAGCCCCTGT
AAAAAATCTG
TTGCCCCTAA
AAAGTCAACA
TCCI-rATAGG 7"rCTACCAAG TCTCTTAAAA ATTCGATAG'r CTGTTTATCT TTCACAAAAT AAAAATCAAC AAAAATAAGA CATGAGATAG AAAACGGTAG AAGTCAGAGA AGTCACATAA TGAGAGCCTC ATGATCCTCA TCAGTT'CAAA CATTTTGGCT CTATCCATGA GGAAGTAGAA 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 TTTGAAAATA TCAACTAGAC GAAGGCCAAA AAGTTCC'rTG TTAATGATGA TTTTGAGTTG GAAGCCTTCA CCGCTGTTTG CTTAGTTTCA AAAAAGGTGT GTAATCGTAA CGACAATTTT CGATAATTTC TTTTAAGGTT ACAAACTGAT GCGAAGGGAT GAACA'rGGCT GGATTGGACA
GCACTTTTTC
TATCTTTGAG
TTAACTGAAT
CAAAAGGCGA GTCAGT'rCAT GGTGAATTTT TTAACAGAAG GATTTTTCA AATGCCATAT AGTTACCAAC 13080 GGCTAAGAGT 13140 GGCTAACCTC 13200 T71'TGTGGCG 13260 ATGGCTTCAA 13320 ATTGAAATTC 13380 'rTTGCGAGGG TTTGTAGGTC TTCAACGGTA TCCTTCAAGC GTTCTGAATT TGCGCCATAC ACGCCTGCAG TACAGGCTGA GCCAGTAGAG CAGCTAAATC TAGCCGAAGG AGTAAGAGGT CATTTCTG ACCAGGAAA'r CCAATATTGA GAACATAAGG GAGATGATGT TTTCC'rC'rA' TCAGGTAATA CTGAATCCCC TCCAGCTCTG
CCAGAAAGGC
CTTCTTTTAG
TTTTCTGTTC
GAAAACCGAT
CCAATTCTTC
AGTTTCTAGA
GGCTGCAACC
CTGGTC'rCCG
TCCCTTAGGA
TGAATGAATT
TTTTGTACAT GTTGAAAATG TTCTTCTTGT rTTrCTrAGGT ATGCCTACAA TGGCAGGCAG ATTTTCAGTT CC'rGCACCGT CCATGTAGATI AGGAATCAAA GTCCATGCTA GATGCGTAGA CCATGGAATT TGTGGGCAGA AGCAGTGAGA AAATCAATGC GGGA'ITr'TAC CAATAGCCTG AACTGCATCA ACATGA'rAGG 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 CAGCAGGGTG T'rGCT'rGAGT ATTTGGCCAA TTTCAGCGAT GGGCAGTAGG 'rTrCCTGTCT CATTATTGAC AAACATGGTA GAAACCAAAA TCGTATCGTC ACGTAAAGCC TTTTGAATTr GCTGGGCTGT GATTTCTTGA TrTCTGGCT GATAATGGT TGCTT'CAAAC CCAAAGTGTT GAACCAAGTA ATCAATTGTT 'rCAAGGACAG CATGGTGCTC GATGGCAGTT GTGATGATAT GrTTTCCTT-G TrC~rGA CGAAGACAGT AGCCAATGAT GG'TAGTATTA TTGCCTrCAC TCCCACCAGA AGT4GAAAAAG ATATGTTGAG GTT?1TGTCCT TAGTAACTGG GCTAGTT~CCT GACGGGCTTC TCGCAAGAGT TTGCCAGCTT GACGACCATG ACCaLTGAATA CTAGALAGGAT TTCCGTrGGGT 7TCTTGCATA ACCTTGGTC-A TAGCTG.AAAT AGCAAC'rGCT GACATAGGAG TCGTTGCAGC AT'rGTCCAAA TAAATCAAAG AATCACCTrA TTTCTrTTA TTGTAGGCAA AGAGTGGGCT GACTGGMT CTrTCGTGAA TACGGACGAT AGCATCACCA AT'rAACTCAC TAGCAGTGAT GTAGCATACA ITTT1TAGGAG TTrCTTT TGTGCrACT CAAGAATTTC TTTAATATrA CGTGGCTAGA AACAGCATAA CAGAGAAGGT ACGTCCTGTA CATCACCAAT AATATAACCT CGATAGGAGC A'rCAAGA'rA' GGCTAACGAC AACAACATCr GGGGAACAGT GAAAAGATTA GCAAATCAAG AGTCAGGATA TTGCTGTAAG TGGCTCACGA CTA7TGCAA GAAGCTCAGC
ATTTCTGTAG
TTTAAAATAT
TCGTTrACGAG
TCAGCCAGGC
CTCCTTCACG
CA'rCAATCAA
TTGCATCGTC
TACGCGCACG
AGCTCCCTCG
TTCAACGATT
GATAGCTTTC
GAATCAGTCA
ACCAAGACAC
TTAGAAGCTT
TT-ACCTTCAA
GAACCAAGCA ATCCTrTATC TCCACTGGAA TATCAAAGAA CGATCAACTC CAGCCTTAAC GGACAAGCAA TGCGGTCTTG TTGAGGGTAG TCGATAATGG TTTGACACC'r GAA'IrTTTAG GCAGTAATGT TTTGCGAATA ACCTTGAACC TGAACGGCAr CAGCATATTG GCAACTAGTT ACGTGCATAG CCAAAATATC AGCATCGACC ATGATTAACA 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 GAAGGACAAC GTTGATACTG TGGGCACT'rG CACCCACACA ATTCCATTAG GTGGTTGTTG ACAGGGAAAC TTGTTGATTG GATGATGTAA ACATCATAAC CACGGACACT TTCTTCGATA TTTACTTGGA TTTCTCCGTC TGAAAATTGA CGTGA'rGATA GTTTTCCAAG TGGGACACCA ACAGCTTGGG CAATTTTTTG TGCAATCTCT TGGTTAGAGT TGAGTGCGAA AAGTr'rCATG TTTTTMTC1AT CTGACAT'rAT AGACCGTCCT CTGTAAACTT TATAAATCCT AGTTATATTT ACCTTACATA TATGAACTGG GATTTGTGTA TTTTTATCT'r 'rTCTATT'rrA
ACCAATTTTG
TCCTGTTTGC
AAGTATCGGC
GTTGT-TAGAC
CCAAAAAATG GAGATTATTT CAGCTATTTT TCATACTT AAGGAGCTTT TTGATAGGAA ATCTGATTTT' 'rCTCTAAAAA CTTGCTCATG ATT'I-'CCACT TCAAGCTCCA ATTCGTAATC TCTGATCCAG 'rGCCATGAGA CCAATAGCTG TTTTCATTTC AACCAAGAAC CTGCCAGTTC TTACTTTGGA TACCATGTTT
GACAAATCGA
TTGTCGAAAA
TGTTATATCA
ATAGCGAAGC
CGCCAATTCA
TCCAGTACTA GCCCTTGAGG AAGTTCTTCC T'rACTCAGAT AGTTCTCAGC ATCTTTTAGT TGCAAT'rTTT GGTTGTATTC CATGTrCCA ACACrCTGCG GGACTTTGAG TGTCAACTCA 192 GCCCAGTCTT CAAAGGTTCG AATGCGCATA GCGACTTTCT TTCTCGCAG TTCAAAATCA GGCGTTcr.A 'rGTAGTAATT TGTTGAAGA AcAGG.AGTG.A cAccrGTGAA cTGG;TCTT AGACGATTGT ATTCATCr?1' TTCAATAGT CnwTCAATT CAATTTCTAA ATGTTTCATT 77T1CTTACC1' 7TNNATCG InrGAAAGCGG GTATATGAAT CTGGAGAAAA AATCAAAGAT GAATATAGA CCTTAGAA'rG GGAAGAATTT TTAAAGATA AACTCGTGG TATTCGTAAG ATTGAGT'TG TGACCGGTCG AGTCAAGCCA CGTGGCATTA CT'rATGCCAC CTTGGAACAC ATGGTTCAGT 'rTGTAGATGA CGTCAAGGAA A'rGCGAATCA TACAGGAGCG AGATTACATT TATCATGTGG 'rAGTAGAATA TACGGTTGAT GAAATTCAAA 'rTCGTACTTT GGCCATGAAT TACAAGTACC AAGGGGA'rTT CCCAGATGAG ATTATGGTA TAATAAGCAT TCTATTTA~r A'wTrGACC GATAATATGA GAACAAGGGA CTAGATCCTT ACATTCAACC TGTTGGTG.AG CAATATCGTA AGCAAAATAA CCATTCTCCA
ATTGAGAGCA
GATTGCAGG
GTACTGGATA
ACTCATAGAA
ACCATCAATG
TTCTGGGCAA
AT'rAAGAAGC TCAAAGAAAA AATCGCTCCT ATATTGCTCG CTTACG'rGTG TT!TTGCACAA GCGTCAGGAT AAGCATCAGG C'rATCGTTCC GAGCTAAGAC TATTTTGGCA CGATAGAACA T'rCTCTCAAC GACTGGAAAT TACAGCTAGA ATGATATCCA AGAAGCCCAG TAGGAAACAG TGACGATACA AGAAAACCGC AGAGTCAAAG TTTATACTCA ATGATACCAA a a **eaaa a a a a a a a a ATCGCCCATC AGTTGGATGA AGAAATGGGT GAAATTCGTG GCACTTrTTTG ATCC~rTGAG TAGAAAATTA AATGACGGTG 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 17340 17400 17460 17520 17580 GATGAAGAAT ACAGGTAAAC GGTTTTGTAT GAATTGCGAG TCCGGATATT G'rCATTTCCA CGAAAATCAG CTTGACAACG GAATTGATCT GATAGCCAAT ATCGTTTGAA GAGAAATCAG TTGGCGGGGA TGGTATGCTC TTGTCGGCCT TTCATAAGTA TCCGCTTTA'r CGGTCTTCAT ACTGGACATT TGGGCTTCTA TACAGATTAT CGTGA=TTG AGT'rGGACAA GCTAGTGACT AATTTGCAGC TAGATACTGG GGCAAGGGTT TCTTACCCTG TTCTGAATGT GAAGGTCTTT CTTGAAAA'rG GTGAAGTTAA GATTTTCAGA GCACTCAACG AAGCCAGCAT CCGCAGGTCT GATCGAACCA TGGTGGCAGA TATTGTAATA AATGGTGTTC CCTTTGAACG TTTTCGTGGA GACGGGCTAA CAGT'rTCGAC ACCGACTGGT AGTACTGCCT ATAACAACTC TCrTGGCGGT GCTGTr'rTAC ACCCTACCAT T.GAAGCTTTG CAATTAACGG AAATTGCCAG CCTTAATAAT CGTGTCTATC GAACACTGGG CTCTTCCATT ATTGTGCCTA AGAAGGATAA GATTGAACTI' ATTCCAACAA GAAACGATTA TCATACTATT TCGGTTGACA ATAGCGTTTA TTCTTTCCGT AATATTGAGC GTATTGAGTA TCAAATCGAC CATCATAAGA TTCACTT'rGT CGCGACTCCT AGCCATACCA GTTTCTGGPA CCGTGTTAAG GACGCCTTTA TCGGCGAGGT CGATCAATGA GCTTTGAATT TATCGCAGAT GAACATGTCA AGGTTAAGAC CTTCTTAAAA AAGCACCAGG TTrCTAAGGG ANTGCTGGCC AAGA'IrAAGT TTCGAGGTGG
CTATTGGACG
TTGGAGGCTA
AATAAACCCT
T'rGGAGACTA
TTGAGCTTCC
ATGGAGTGGC
TTTA'rCAAGG GTTACTATGT AGACTAGATA GGGATACTTC CGATTAGACA AGCAGTTGCA GGAGATGGAC ATTTGGAGCC TCCATTATrA CCAGACGAGT GTAGCTTCTT ATGGAAATAT CAAATCCGAG TCCATTITTTC
AGCTATTCTC
CGTTACCAT
ATTAGATATT
T'rCTATTCCT
CAAGCAAAAT
TGGCTTGATG
GAAGAAATCT
AGAAGGGGAA
GGCTAAAGGC
T1CACTTGGTC
TCATATCGGT
GTCAATAATC
GACAT'CCCG
CTCTATGAGG
AcTGTCAATC
TATGAAAATC
CTCT'rrGCCA
ATCGAGAAAC
ATTATTGCTC
GGAAAGTATG
AACCGCAAAA TGCAACGTAT CTGAGAAAGG C1'TTGAAACC ATGACCACTT TCTAGTCTTG ACTCTAATAC CATTGCCAAT AGCAG-GTrCA CATrGTTACC AGCZACGGTrA TGCCCATGCA GCTACTTTGC TTTGGTTAAG CGATTGCGCG TGATGAAGAT CCCATACT'TC ATACAAGATT TATATrCACC TGCACACTGG TCGAACCCAT TTTCCTTTGC TGGGAGATGA TTTGTATGGT GCTCTGCATT GCCATTACCT ATCCTTTTAT GAAAGTCCCT TGCCGGATGA 'TrTAGTAAC
GGTAGTCTGG AAGATGGTAT TCAACGTCAG CATCCATTTT TAGAGCAAGA CTT'GCAGTTA CTTATTACCC AGTTATCA.AC TAATACrCTA TAAAAACrGT CTCAGAGTAT TTA.AAGGAGA AAACTCATG(, AAGTTTlTTGA AAGTCrCAAA GCCAACCTTG TGCTCGTATC GTTCTCCCTG AAGGGGAAGA AGTAAAAGAA ACAGAAGTGA TTCCTG7 GCCTCGTATT CTTCAAGCAA GCTTGGAAAT CCTGAAAA.AA TCTTGAAATT GAAGGAATCA TGGATGGTTA TGAGGTCATC GACCC'rCAAC ATTTGAAGAA ATGGTTTCTG CCTTGGTGGA GCGTCGCAAG GGCAAAATGA TGTACGCAAG GTTTr'GGTTG AAGATG'rCAA CTACTrTGGT GTGATGTTGG CTTGGTTGAT GGAATGGTGT CAGGAGCGAT TCACTCAACA GCTTCAACAG
AATTATTATC
TTGGTAAAAA
CAAAACGCTT
TTAAAATTTA
ATTATCCTCA
CTGAAGAAGA
TrTACTTGGG
TTCGCCCAGC
17640 17700 17760 17820 17880 17940 18000 18060 18120 18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 18900 18960 19020 19080 19140 19200 19260 19320
TCTACAAATC ATCAAAACTC GTCCAAATGT AACTCGTACT TCAGGAGCC'r TCCTCATGGT TCGTGGTACG GAACGTTACC TATTTGGAGA CTGTGCCATT AACATCAATC CAGATGCAGA AGCCTTGGCT GAAAT'rGCCA TCAACTCAGC AA'rCACAGCT.. .AAGATGTTrG GCATCGAACC TAAAATTGCC ATGTTGAGCT ATTCTACTAA AGGTTCAGGG TTTGGTGAAA GCGTTGATAA GGTCGTTGAA GCAACTAAAA TTGCTCACGA CTTrGCGTCCT GACCTTGAAA TCGATGGTGA GTTGCAATTT GATGCAGCCT TTGTTCCTGA AACTGCAGCT CTGAAAGCTC CTGGAAGTAC GGTAGCTGGT CAAGCAAATG TCTTCATCTT CCCAGGTATC GAGGCAGGAA ATATTGGTTA CAAGATGGCT GAACGCCTGG GTGGC?1'TGC CAAGCCAGTT AATGATCTTT CTCGTGGATG CATCACAGCA GCTCAAGCAG TTCATCAATA TACTGTAGTr ATGAAACTAT GTACGAAAAG CTGATTGGTG TCAAAAAGGA AAACTTCCAA AGAAATCTGT AATATACATA TCCGTAAAAC ATGAAAAGAG AATTTTTTGG CTCTTTGTCA GAGAAAGGAC AAATTTCATC CTr'rCTT TGAAGTTTTC AAAGTTCCGA AAACCAAAGG TGGTCGCTTC CAGTTTGGCG TTAGAATAGT TATCTTrGAG GAAGGTTT'rA AAGACAGTCT 194
GGCTGTAGGA
'rAATGCAGAT
GTGAAAACTA
CCTGTTTTGC AAGGTTTAAA GATG'TTTACA AGTTGACCCT TAAAGTGATA TACTATCCTA CACTGCCATT ATTCCTGAG AACTAAATTA GCGATGATAT CCTGTCTATA CACGACCTAT GATAAATTCC CTTTTGATT TTAA.ATGAGT ACTGTAGTGG GTTGAAGAAA AGCTAAGCTC TGATATTCAG AGCGATAAAA ATCCGTTTTT CATTGCGCTT GATAAGTTTG ATGAGATTAT GTAGTTGAAG GGCGT'rGATA ATCT'rTTCTT GAAAAATAGG ATGAACCTGC T'rAAGATTGT 19380 19440 19500 19560 19620 19680 19740 19800 19860 19920 19980 20040 20100 20160 20199 CCTCAATAAG TCCGAAAAAT TTCTCTGGTT CCTrATTCTIG GA.AGTGAAAA GATAGAGCTG ATAGTGGTGT TTCAAGTCT'r CCGAATAGCT CAAAAGCTTG
AGCAAGAGTT
TTTAAAATCT
CTTTATTrGGT TAAGTGCATA CGAAAAATAG GACGATAAAA TCGCTTATCA CTCAGTTTAC GGCTATCCTG TTGAATGAGT T'rCCAGTAGC GCTTGATAG
INFORMATION FOR SEQ ID NO: 7:
SEQUENCE CHARACTERISTICS:
LENGTH: 19702 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
ACCCGATGTA TCAGCGGATA TTTACTCTAT GAAAAAAGAC CCTAAGGTCT TTCAAAGCAC GAGCTGAAC TTTTGAAGGT TTGGTTTTAC CCTGATACAG TCTTACGACC TAGATCTAAT ATAGCGGATG GTTTTTTrGCA AGGGAATTGG ATTTcTGCTC TCCACATGCC TAAAAATCAT CTCTAACTGC
CCTT'TGCT'TT
CCAAACTTT
GGCACGTTTT
TGTAAAGTAA
TGCTAGCACC
AAGATTTTTT
ATCGTTGATT
AGCTAAGGGT
TTTTCAAACG
TATTATTAAA
TTAGGTTTAC
GTTTGGTTCA
CATACTTTAG
ATGTTATACC CACA.ATAAAA CGCGTTCAAC TTTACCTGAT CATCGATAAG AACAGTAACT TCGCGTGTGA ACGGTTGTTT CCATTGTGTT TTCCTCCTAT ACATACCGTA CTATGTTATC ACATTTCTT ATTTGTGTCT TAAATCAGGT CTTGCGTGAC AACAGAACAC CAGAATTAAA A'ITATGTGTA ATAGCCGTCA AGTCCAAATC CCACAGCTCA 195 TCTATCGArTT 1CTACAAC AATATCTGAA TCCAAATACA GTACACGAGA CTCGCTTACA TACTrGGAA TAAAATACCT AAAAAAGCCG CATATGAA.AG TCCCTCAAAG GGGAGACGAT AACCT?1'CAG AATArTACTG TCAATCTAAA CArrCACAAT C'TCACrATTC AAAGTCTCTA GTC1~'r=C CATCAATTGG AACCATTCTC GCGGAAGGTC ATCATTAAAA ACATAAAACT TAAGATTATA ATGATGAACA CAAAGAGAT
CATTATCTGC
TTTA'rrCCCA TATrr'rCGCC
CGGTTTACAA
ACCTAAGACA ATCGCTTPT TCATATTAT'r CCCATCATAT TATTCGTTCG TAAAACCATA TTTTTACA'rT ACGACACGGA TTATTGTTGT 'rTCAACI-rTA TCCATATAAG TCTCTTCTTT CAC~rTTTAT CTCATTTCTT GTTTCCCATC ATATGTTTCT ACGTAACCAT CCAC'rGGAGA TTTTAGATGA GTTTTACAAA TCGATTTCAT
AGTCCCATTA
TTGCCAAACG
TAGTTACTGA G4GCAG'rrAGC TTTAGTTCCA ATTGTTGCTA TGTCCGCAAA CCAGCGGAGG GGTCTc-rGGC TAAGATATTG TCCALACGAAC GGTCT~rTA
TAGTTCGCCA
CIYGAGTCACA
GCAAAGTCAT
CGACTTGACG
CCCATTGGGT
AATACGACT AGCGTCCAAC AAT'ITGAAC TCTTCTCCTC TAACTCTACG WrTGGATACT 'NTTCAAAGAG AAAGACTGGT TGGTCAAAAC ACATCCGTTC ATCCAAGTCC TCAGGCTTGA TCATAACTAC TTCCGCATTG TACTCGCCTT CCA'rGCGGTG TTTAAAGACT TCAAACTGGA CTGTTTGCTA ATTc~rATAA AGCTGAACGG
GTTGACCTAC
CCTCrTG AGCGCCTAGC ATGTACTCAC CACCAATTGC TCAATCCCCT TGTGGAAGGA 7TTrGCTTC ATAACATTCT TAGCAGAAAC TTTCATGAAA ATCTCAGGTG 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280
TAAAGGTTGG
CCTGATAAGT
TCTCACCACT
CAGGGGTTCA AATTCAAACT TCTT"rMrCC AACCGTCAAG GTATCCCCAA ACCGGTATCG TAAACCCCGA TA.ATATCACC TGCCACGGCA 'N'GGTCACAT CTCCGCCATA AACTGGGTAA CATTAGATAG TTTAGCCCCC TTACCAGTAC GAGGGAGATT GACACTCA'rG TACGGTCACG GTGACGAGGG CCrTGTATA AGGATCCACA CAAACTTGAG GAAGGTTTCA AAAAGACAGG CGTCAATTCT CATTTAAAAG CTCAA'rGTCA TGTCCCCGTC TTCTAGACTG TATAGAGGTC ATACAAGCCC AGCTAGCAAT GCCCAAGATrr CCCGCTCAA ATTCGCCAGA TACGATACGG ACAAAGGCAA TCCATGTTGG CTTGGATTTT AAAGACAAAG CCTGAGAAAT A7'rCACCGT CTGTTTTCTT GTGACCATGT GGTTCTGGAG ACGAACGTCT GCACACCAAA GTTTGTCAGG GCTC.AACCGA CCAGCCAGAA TAGCTTCCTC TGAAAACTCA TTCCCCGCTr TCCTTGACTT GCTCGTAGAA AGGATTGCTA CCAAAGAGTT
GCAAAACGCT
TCAAAGGCTT
TCTTCCAATT
CATCCCCTr'r GTAAAGCTCT AAACCTTGGT TCCCCATCCC GATAGGCCAG TTCATAGGGT CTTGCAAGAG ATCCAAAGGC TCACGACCGT 196 CACGG'rCCAG CTTGTTCA'rA AAGGTAAAGA CTGGAATGCC ACAATTTCT GGTTTGAGCC TCGATCCCCT TGGCAGAGTC CCACCGCCAT CAAGGTACGA TAGATCTT CTGAGAAGTC AGATATTCAC GCGCTTGCCG TCGTAGTCAA ATTGCATAAC ACGATGTTTC ACAACCTCAA CACGACCATG ACCGCAGCA'r CTCGTGCCCT GGCGTGTCTA AGATGAAGTA ACAGAAATCC CACGTTGCTr CTCGATATCC ATCCAGTCAC ATT'TAGCAAA AcTcccTGTT
TTACCGTACC
TrCCCCGC AGCCTCACGA ATCTCACCCC CAAAGTAGAG TAACTGCTCA GTCCGGGTGG G.AGATAATGG CAAAGGTACG ACGTTrTCTTA GAATATTCAT AACTTCTCTT TCTTGA'IrC TCTAT7"=r TTGTTTCAAT GATTTTTACA TTGGATTTTA CCATTCCTTT CAACACTCCA TTATATCGGA TTC7Trccrr
GTGATCTGN
ATTTC7TTCTT
AGCTGAGAAT
TTTTAGCATT
ATATCGGTAAA
TACGCAAGAA
AACTGCeGCC
TTTTTCAATT
ATAGAACAGA
GAGGTAACAC
TTTGCGACCA
ACTGCTGTCG
TCTATTTCTT
CTAAAAATCA
ACGTTGCCAA
CCTTGTCTAA
CACCTACTAC
TTCACTTrCCC CCTCCCTTAT TTATAGGAAA TCATTTCACG AAAGGATGCA AGATGAAAAT TCTrCAAAA TTAAGATTCT CTGAAGAAGA GATTGTTGAC ATGGTTGAAT TGCTGGGCGA AGTTGACACA GACTATGGCT GACCGCAAGA CTGTACTCCG CCCTGATGTG GCCGAAGAAG GAATAGACCG TGATCGCTTG TTTAAAAACG TACCTGAAAA AGACAACTAC TATATCAAGG TGCCAGCTAT CCTAGACAAT GGAGGAGATG CCTAATGACT TTTAACAATA AAACTATTGA AGAGTTGCAC AATCTCCTTG TCTCTAAGGA AATPrTTGCA ACAGAATTGA CCCAAGCAAC ACTTGAAAAT ATCAAGTCTC GTGAGGAAGC CCTCAATTCA TTTGTCACCA 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 TCGCTGAGGA GCAAGCTCTT GTTCAAGCTA ACAATGTCCT TTCAGGAATT CCACT'rGCTG TCACAACTGC TGCCTCAAAA ATGCTCTACA TITGCCAATGC AAAAACCAAG GGCATGAT'rG
AAGCCATTGA
TTAAGGATAA
ACTATGAGCC
TCGTTGGAAA
TCAAGCTGGA ATTGATGCTG CATCTCTACA GACGGTATTC AATCTTTGAT GCGACAGCTG GACCAACATG GACGAATTTG CTATGGGTGG TT'CAGGTGAA ACTTCACACT ACGGAGCAAC TAAAAACGCT 'rGGAACCACA GCAAGCTTCC TGGTGGGTCA TCAAGTGGTT CTGCCGCAGC TGTAGCCTCA GGACAAGTrC GCTTGTCAC'r TGG7TCAT ACTGGTGGTT CCATCCGCCA ACCTGCTGCC TTCAACGGAA TCGTrGGTCT CAAACCAACC TACGGAACAG TTTCACGTTT CGGTCTCATT GCCInrGGTA GCTCATTAGA CCAGATTGGA CCTTTTGCTC CTACTGTTAA GGAAAATGCC CTCTTGCTCA ACGCTATTGC CAGCGAAGAT GCTAAAGACT CTACTTCTGC TCCTGTCCGC ATCGCCGACT TTACTTCAAA AATCGCCCAA GACATCAAGG GTATGAAAAT CGCTTTGCC'r AAGGAATACC TAGGCGAAGG AATTGATCCA GAGGTTAAGG AAACAATCTT AAACGCGGCC AAACACTTTG AAAAATTGGG TGCTATCGTC CAAGAAGTCA GCCTTCCTCA CTCTAAATAC GGTGTTGCCG rrATTACAT CATCGCTTCA TCAGAAG3CTT CATCAAACTT GCAACGCTTC GACGGTATCC GTTACGGCTA TCGCGCAG.AA GATG.CAACCA ACCTTGATGA AATCTATICTA AACAGCCGAA GCCAAGGTTT 74GGTGAAGAG GTAAAACCTC CAGG?1'ACTA TGATGCCTAC TACAAAAAGG ATTTCGAAAA AGTCTTCGCG GATTACGATT CCTATGAC?' GATTCTCTC AACCATGACC GTATCATGCT GGGTAC?1TC AGTCTTTCAT CTGGTCAAGT CCGTACCCTC ATCArTCAAG TGA="TGGG TCCAACTGCT CCAAGTGTTG CAGTTGCCAT GTAC -rAGCC GACCTATTGA 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 CCATACC'rGT GTCTACC1TG1
CTGCTGCTC
GTGACAACTA
CAAT'rCAAAA
AAACTTGGCA
CGGACTCCAA
TTTTGAAGCA
ATGAACTTTG
ATCTTCTCAC
GGACTGCCTG GAATTT-CGAT TCCTGCTGGA TTGAT"GCC CCAAGTACTC TGAGGAAACC ACAACAGACT ACCACAAACA ACAACCCGTG AAACAGTCAT CGCACTTGAA GTCCACGTAG CTACTTCTGC CCACTTTGGA AA'rGACCAAA
TTCTCTCAAG
ATTTACCAAG
AT'rTTGG-AG
AGCTCAACAC
ATGCCAACAC
TAACCTGATT GACTGGTCTT TGCCGCTATC AAGGCTGCTC
TCCCAGGAGT
TTGCCCTCAA
TCTACCAGTT CTCAATAAAG CATCGACATC CACAAAAAGA CCGCAAGAAC TACTTCTATC CTGATAACCC CAAAGCCTAC CAAATTTCTIC ACCAATCGGA TATA.ATGGCT GGATTGAAGT CAAACTAGAA GACGGTACGA GGGTTGT'rGA 4860 TGCAC'rTGA 4920 AGTTTGATGA 4980 CCAAGAAAAT 5040 GTACAGATGG 5100 TATCTGAGGC 5160 CGGTATCGAA CGTGCCCACC CTACTCTTAT GTTGACCTCA AGATATGCGT TCTCCTGAAG GTACGCTCGC A'TrTCTGACG CTCCCT C=~ATGGTC CTCCTTCTCA AACGTTCGTA TAGAGGAAGA CGCTGGTAAA AACACCCATG ACCGCCAAGG GGTTCCCT'rG ATTGAGATTG AAGCCTATGC TTATCTGACA TTAAGATGGA GGAAGGTTCG AAGAGAAATT CGGTACCAAG AAGGTCTTGA ATACGAAGTC TCGCTCAGGT GGTCAAATCC GCCAACAAAC ACGCCGTTAC CATCCTCATG CGTG1'CAAGG AAGGGGCTGC TGACTACCGC ACCCCTCTTT GAAATTTCTG ACGAGTGGAT TGAGGAAATG TCCAAAAGAA CGTCGTGCGC GTTATGTATC TGACCTTGGT TCAGTTGACT GCTAATAAAG TCACN'CTGA C1TC~rTGAA TGATGCCAAA CAAGTCTCTA ACTGGCTCCA AGGGGAAGTC AGGTAAAACA CTGGAACAAA TCGAA'rTGAC ACCAGAAAAC GCCCTCAAGG AAGTTATCCA ATGCGrGTGG ATGCCAACAT ACTGA.ATTGA AGAACCTCAA CAACGCCAGG CTGAAATTCT GATGAAGCGA A'rAAAGCAAC TACTTCCCAG AACCAGACCT CGGACTGAGT TGCCAGAGT'r TTATCAGACT ACGATGCTAG AAAGCTGTTG CCCTAGGTGG GCTCAGTTCT TGAATGCTGA TTGGTTGAAA TGATTGCCAT 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 198 CATCGAAGAC GGTACTATTT CATCTAAGAT TGCCAAGAAA AAATGCCCT GGCGCGCCTG AATACGTGGA AAAAOCAGGT AGCTATCT'rG ATCCCAATCA TCCACCAAGT Cl~rGCCCAT CTTCAAGTCA GGCAAACGTA ACGCCGACAA GGCtN'ACAG AAAGGCCAAG CCAACCCACA AGTTGCCCI'r AAACTACTTG GTCrTTGTCC ATCTAGCTAA ATGGTTCAAA TI'CAGATCC AACG.AAGCTG CTGTTGCCGA GATTCCTTAT GAAGGCAACC CACAGGAATT GGCGAAGT'rG AAAGAAAACT AGACAGAACA AAACCAGCCC TAAGGTTGG'r 7'IrrTTCTTC CTACCAACTC CCAATAACTA 7TTTGGCT'rT ATTTCCAGAG TATTTTATGG TAAAATGAAG AGTAATAATA TTTATTAAAG ACGTAAAAAC AIG.A~rGAAG CAAGTACCTT AAAAGCTGGT AAACAGCTGA CGGCAAATTG ATTCGCGT~TT TGGAAGCTAG TCACCACA.AA C. 
GAAACACGAT CATGCGTATG GCTACCGTCC AGAGGAAAAA TGTACAAAAT GGATGACACA TCCCTGTACT CAATGTTGAA TCCAATTCTA CGGAACTGAA TTGCTGAAAC TCAACCATCT CGATGGAAAC TGGACTTGTC TTATCAACAC TGCAGAAGGA CTATGGGAAT TGAAGAACAA TCATTGCTA'r CGCTACTGCA CTGATACCCT TTCAAAACTT AACTCACAGC AGATATCTAT AAATT'GCG'rG ATGTCCGTAC TGCTTCTACA TTTGAACAAG CTATTATCGA GACTGTCCCA CCATACTTCA TGAATACAGA AAC=rATGAC AACGAATTGC TrrACATCCT TGAAAACTCT
ATGACCTTTG
CCAGGTAAAG
=MTACACAA
GCTCAATACT
CAATACGAAA
GATGTGAAAA
GTGATCGGTG.
ATCAAAGGTG
GTAAACG',rC AC'rTACCTrT CT'rGGCGAAA
AAGGTAGAGG
TCACTCGGCC
TCACCGTTCC
CTACTGTTAC
CAGACTTCAT
CTCGTGCCTA
TCG'rATCGC
GTGTTCACTC
GTGGCATTTA
TACTACTGTT GAGTTGACAG AGGTTCTGGT AAACCAGCAA CGAAGCAGGA CAAAAACTCG ATCTCTAGAA AGAGGTCATT CCCACGTGTA CTTGAAAAAA T=TCAAAC AGATCAGTGT TCTrTAAAAAC GTGGACGAAG AAAAGTTCCT AAGGTAGCGG GGCTGATGTA GAACTCGCTG ACCAAAACCA GAAT'rGAAAG TAGAATCTAG ACGCCAACTC GTACCGA'rGT CGAAACTGCT 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 CTCTACCTTG AGTACGGAGT r'rGCTATCCA GAAAGCTGTC AAAGATGCCG TCCGTAATAT CTATCAATAT TCACGTTGCA GGTATCGTCC CAGATAAAAC ATCTATTTrGA CGAGGACTTC CTCAATGACT AGTCCACTAT CGTAAATGCG CTVT'CAAGC TCTCATGAGC CTTGAGTTCG TGTCGN'TCG CCTATACTCA TGATCGTGAA GATACGGATG TACAACr'rCC AGCCTTTTrG ATAGACCTCG TTTCTGGTGT TCAAGCTAALA AAGGAAGAAC TAGATAAGCA AATCACTCAG CATTTAAAAG CAGGTTCGAC CATTCAACGC TTAACGCTCG TGGAGAGAAA CCTCCTTCC TTGGGAGTCT TTGAAATCAC 'rTCATTT'GAC ACTCCTCAGC TG'rTGC'TGT TAATGAAGCT ATCGAGCTTG CAAAGGACTT CTCCCATCAA AAATCTGCCC GTTTTATCAA TGGACTGCTC 199 AGCCAGTTTG TAACAGAAGA ACAATAACGC TCTTGTCAA CTGTAGTGCG TTGAAAAAAA GCTAAGCTCG AGAAAGGACA AATTTCGTCC TTTCTI'TrM GATG~rCAAA GCGATAAAAA TCCGTTTTTT GAAGTTTCA AAGTTTCGAA AACCAAAGGC ATTGCGCTTG TGAGATATT GGTCCCCC AGTTTGGCAT TAGAATAGTG TAGTrAAGC TCrTTC'rT ATCTrGAGG AACGTTTTAA AGACAGTCTG AAAAATAGGA TAAGATTGTC CTCAATAAGT CCGAAAAATT TCTCTGG'rTC CTTATTCTGG GCAAGAGCTG ATAGAGCTGA TAGTGGTGTT TCAAGTCTTG TGAATGGCTC CTAAAATCTC 'rTTA'rrG=~ AAGTGCATAC GAAAAGTAGG ACGATAAAAT TCAGTCTACG GCTATCCTGT TGAA'rGAGr'r TCCAGTAGCG CTTGATATCC
ATAAGTTTGA
GCGTTGACAA
TGAGCCTGCT
AAGTGAAACA
AAAAGCT'rGT CGCT'rATCAC
TT'GTATTCAT
GGGATTTTrCG ATGAAACTGA TTCZATGATTT GGACACGCAC GATGTTCGTAC AATGTGAAAG CGATCAAGAA CGA~TTAGC
GGGAGACTGT TTCAGCCTGA GTAAGGGCTA AACATATCCA GCGAAGAAAG TGATTTCGAA GAGATTGTTA AAATCTTGCG CCAAC-ACATA ATCTCAGGAA ACGAATAACA GTTGAAGTTG TTCAATCAAC 'rTrTGAGCAA GACGATAGAA GTTTCAGCGA TCTAAGGAGA ATTCTAGTAG TTGAAAGTCA TATTTCTTCA GCCTAGGAAT TTGAAAGCGA TAGTAATAAT T'TTCACGCGA TGATAGCT'rG TGTTCTACCC CAATGAAGCT CATCTTTCCC GACAAGAAAA ATCATGTTTA AGATGGAAAG CTGATGG4GCA TC'r'r'TGGTT GATGATACGA CCATCATrTT TGAACAGTGA GCATACCAGT CGTTTCAAGA ATTGGTI'TCC GCACTCAGGG ACGACTCATG GCACGGCTAA AT'rCGGGAGT GAAACAGTCr AGCTCTITrAG CCAACTCATA CATCGGACAA CTCTATCGTA TCAAGAACAG TGATGATATT TTTGTAAAAG CATACTCATC AAGTGAAAAT CA'rTGAGCTT ATATCAGTCA TAGAAATCTT GGGATTTGGT GATTTrTCTT TAGCACTTGA ATCGACGCTT TAAGGAATTT TAGAAGGT= CAAGATGGGG CGTCGTAGTC ATGTCTAAAA TCTGGATATT TGT'rCCATAT GAATCTTT ACTT=TrTC TACAA'rAAAA TATTATAGAG CCAACAATAA 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 S. 55 S S S w5 *S 55
CAGTTTGGCG ATGATT'rCCT TGTGTGTATC CTTATTGATG AGGGTCTTTA A'rGTCTAGTA ATT'TTGAT AAAATGTAAT AATGAGr'rGT TTTGTCGCTT TTCATTATAG GTCATATGGG TAGCCTCCAT AATATCTATA GGGGATTTAC CCACTACAAA AAAGAAAAAG 'rGTTTGATAG ATATCAAACA CT7"=TCTT TGCCTCCCAC TATCTAAAAA AATGATAATA GATATAAT'rG TAAACAAAAA TCCAGATAGG TTTTGCATGA TTGAGAAAGT TAAAAAAACr ATGGCAGAGA ATCGTTAATC TCAGATGTC GGTAGAACGA TAAACAAGGG CAAAAAAGAA ACCAATCAGA CTATAATATA ATAAACTAAT TGGATCTCTG TCGATAGTA 200 TCAAATGGCT AATCCCAAAG AT1GATAGCAG ATAGGATAAC ATCCAAATAG 'rACTTGGACT AGGGAAAGAA GGTA7TCATA AAATACCCTC TATCAAGAGT CTCCrCAAAA ACACGACCGA TGATTACAGG CAGGACAAAA GATAAGATAG 'rCGATAAAAA ccTTGGTrGr CCANTGAAA AAACCACGGT AAAATACTCA TCATGAATAT TCCTA'rGATT AATCAAATGA GCATAGCGTG CCCAAAAAT'r ACCGAGAATC TGA'rAAACCA CATAAGTTGC-A r.ATIAG AAGACAAATG ACCAGTTCCA GCTCTTNrTC TCAAAGATAA TTAATAGAAG GAAACTTCCC ACTAATCCCA CTrAACCCTAA AATGATCGTC ACATACAATC
AGAGCATCTT
TTGTTAAAAT
CAATTGTTG
AAATAATAAG CAAAAATArr CCAAATTGTC TTAGT'rITrT TTTGAAAGAT TACCCTGCTC GGAAGCCGTA CTTCCAAGCA TTTCTr AACCTCCAAA AAGAGAATAG ACATCAGCTC TGGTAAATAG GTAGATACTA TGMrG'r1CTC ATCGTACTTT TCTATATAAG AATTAAGTGC CATCTACTAT ATCCATCTTC GTTGAAAGAA CTGTAAATG'r GT'rGTAGGCT CACATTTATA AACTGGCAA'r TTTTCGACCT CATTTAGTCA TT AGTAT
CCCTTGCCTC
CCAAACAGCA
ATTTGTACCT
GTATATTTCT
GAATTArAM-r
TTTCTAAATC
GccmAfCrCCG
ATTTCAGAAA
ATCAGTTCTG
TCTGGACTGA
ATATAGGGAG
AGACCACCTG
GTCATTGCAA
TT'rGTCT
TTTCCATAAA
GTTGATAGCG
'ITTGGAGTTC
TCACCTTATC
CTGCTTCCAT
GAATGGCATA
CAAATrCTCT ATAATATAAC AAGTTTrGCTC CAAGTCCTCA GTACCTTCTT AAAATAGAr'r ATTTTATAGC CCATCTCCTC AAATGAGACC TT'rCTAGTCT 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160
TTCTTCCAGC
TTTTTTGACA
CACCACGTCC
AGCGCGAGTA
GATAGAATT'r AACTCTTCTA GCGGNTrTG TCAAAGTCTA CTCTTAATCA GTrCTTTACT AGAAAGTCCT ATTTCI'AACA GTTCATGCGA AGTGATTTTC CCCTCCTTCC ATAAAATGGA AGCAAAGCCT TCCAGCATCC AGACACGG'rC CGCGACAGCT CCGATAATAA TGGCGATAAT AGGAACTTTC ATACCTTCCC CTTGACCACG TTCTTCCGCT ATAAAGGTCA CAACTGGACG GCCAAAT-1rC ACAGCCAGAG CCCCGCCTGA ACCACCTTCA AGGTCACTCA TTTCCATGAG ATTGCGAGCC CCGACACCAG GATAAGCACC TCCTGTAT'rG TCAGCC7TGTT TCATCAACCG CAGTGCCTTT CGTAGCCTT CTGGATGTGG TTGGCCAAAA TTCCGTTTGA GG'rTGTCTTG CAAACTCTTG CCTT'N'GGA TACCAACCAC TGTTACAGCr TGGTCTCCAA GCCAACCAAT ACCACCAACA ACTGCACCAT CATCACGAAA ACAACGGTCA CCATGTAATT GGATAAATTC ATCAAAAATG CCTGTCGCAA AGTCCAAGC;T TGTCAAGCGA CTCTGCTCAC GCGCTTCTCT GACTATT=~ GCAATATTCA TCTAGGACTC CCTCCATGCA ATCTGACTAG GCTAGCAATC GTATCTGGTA AGTCTC~TTCT 'TTTGACALATA GCATCCACAA AGCCATGT'rC TAATAGGAAT TCTGCCTr'rT GGAAATCCTC AGGCAAGC'N TCACGAACCG 201 TATTTCAAT CACACGACGC CCAGCAAAAC CAACCAAGCT CTGTGGTTCA GCCAGAATGA TATCGCCCTC CATACCGAAA GAAGCI'GTCA CACCACCAGT CGTTCCATCT GTCAAAATGG TCAGGTAAAA GAGACCAGCA TTrTGAATCGC GTTTAACCGC GCATGAGACT CATGATTCCr TCCTGCATAC GGGCTCCACC CTGGCAAT'PT 'rTCGACAGTC GCATACTCAA ACAAACGAGT TACCCATAGA AGCCATGATA AAGTTAGAAT CCATAATCCC TAATAACAGC AGTTCCTGTC ACAACGGCI' CA'rGCAGACC CCAGTTTC'rTTGTAACCA GGGAAATGCA AGCGATCCTT ATTCTNGCAA GGTTCCCATA 'rCAATCGTCA AAGCCAAGCG AGGTATAGCT ACAGTGCGGA CAGATACGTT CACTTCCCAG GCTTACAGCC TGCACACTGG GAAAATAATT CATCTGGAAC TTTCCCTAAC CGAACGATrG GGATTGATTC GAATATACTT CGCAGAGATC TAGCCATCT AGAGGCTGTG AATAGGACAA GATTTr'rCA CCrACAACCG AAGAGCCACA GTCTGACC'TT TG7TTTTTrCA CGCATAGATG GCT'rTCAATC CCTGTAAACA TTCTTGGGCA GAAATACGAA
ATCCTTCTGA
CTCTGGCTTA
ATCTTTTTTA
CCATTGATTC CCCT'1TTCGG TTTAAACTCT TAAAGTCATT TTArrCr'rTT TAGGTAAGAA GGTT'rCCATC AAGAAGGAAG TATCATAATC CCCAGCAA'rG
TAGATGGTAT
GCTTGAGGTT
CTAAATAGAG
TCTTGATATT
ACATTGCGAT
TCTAATTCAT
ACTATCATT'r
GCTGAATCCA
GGACTTGGAG
CCCCGTAGGA
CTGAAATGAG GTCAAGCTGG AGACGGCACG TTGCATTTTC TGGCAATCAT ACTATCATAA CGCGCAAGCC AACTCCACCA CAAAGTTAAA GGCTGGGT AAATCTGCA'r TGGTCTGCAC TCCTTCAATT ATCAAGCCGT CAAAACGATT TTCGCCGTGT TAAGGCGGAA TGGTATAACC TGGATAAACT CTGGCAGAT AGAGATTAGT AATCTTACCT TCTGCATTCA TACCACACTC GATGGCATGA 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 CAATATCTTC TTGCTTAACA GACAAAGGCT GACCTGCCGC CGATATCAAC ACCTGAAACA AACTCTGTTA CTGGATGTTC TCTCCATGAA ATAGAAATTG CTACTTGCTIT CATCAAGAAG TCTCATAGCC AACAAACTCT GCCGCTCGAA CAGCAGCAGC T'rTTrCCGAT TGCAATCGAG GGACTTTC'rT CCAAAACCTT AACAATCCCG TTCACCCAAG TGAATCACAT GTCCATGCTC CAATGTGCCG AGCTGGATAG ATAACCCGTT CTATGTACAT CCT'rGGCCTC ACTAGAGGCA GTTTCAAAGG CAGAAACGAG TACGAATCCC TT'rACCACCT CCACCTGCTG AAGCCTTGAG TTTCAGCAAC AATCAAAGCT TCTTCAGAGT TATCCACTTC AATGCAAATC 'rGT'rCCTTAA TACCTGAACA CGAGTATTCA AAA?1'CAATG GTTCCTGCAT ACCTATTTCA TGACGCAGCG TTGGTTATTC CTTTGAAGAG ATCACCTAGG ATTTrGAACCT GGCACCATTG CCATAATTGG GTCATCTGGT TTTCAACCT CATAACAGGA 'rAGCCAATTT TCCATCTGAA CCTGGTATAA 202 CAGGCACACC TGCTTTAATC ATCTGACCAC GCGCA'IrGAT CTTATCCCCC ATCATATCCA TAACATCACC AGA'rGGACCG ATAAACrrGA TACCTACTTC TTCACACATG GTccAAATT TGGAATTTTC ACTGAGAAAT CCAAAACCAG GGTGAATAGC TTCTGCCTCA GTCAAGACTG CAGCTGATAG AACTGCATTA ATA'N'GAGAT AAGACTCTCT TGCCTTGCCA GGACCAA'rAC 12960 13020 13080 13140 AAACTGCT'rC ATCTGCCAAA AGCGTATGAA GAGCTTCCTT CTACCGTCGC AATCCCCAAT TCACGTGCCG GATTGGCAAT TAAAATTTTT CGAAACATGG AAGGGTACCA CTGGCTGCAA GCTTGCCATC CGTGCCACGA CGTTTTACAA CTI'CrTGAAC TTAACCTTGT T7N=TGATAAC TCCAACACAC
AAGTCGCTGT
CCATACCAGC
CGGCAGTDTG
CACGCATAAT
AGAACCTCCT
CACTTCAGCC
CATAACCAAT
GTAAAAGACC
CCCCAAGGC1'
AAAGAAAGGC
CAAGACACGG
ATCAGCAGTT GAATAAACCC ACGAACCGCA A'N'TCACCAC TAGTTCCCAA TTGCAAAAGT TTTGCTTCAA CCACAGCTAT TGGTCGCCTG GTACAACTTG AGTTTTCCTT TATTTTCAGG TCCATAATCA CAACACCTGG TCGTTGATGG TCACAT'1-- TCCACTAGAA GCATAGGA'rA 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 CATAACTGGG TATTGAGGAA AGTGGCCGTT GATAGCAACA ATGGTATCCT CGCTCACTTC ACGGTCGGGA AGAGCTTCTT TGATTCCTTG TTACCA.AACT CAACCATTTC T'rCGTTAGAG GGAGCTGGGA TNTCATTCAT GACTTTCATG TTGACACTAT CACCAAC-GT AACGAAGGCA ACTCCAACAA GTGGACTCTC TACAAGATNT GCTGGAACTT CTTCI'GCTAC AGTCTCTGC1' G'rrGCTAGAA CGGGTGCTGG AGCGACTTGA T'rCTTGCTAA ACTGCAACTC ATCCGTCCCA TGGTCAAATT GAGTCATCAA GTCTTTAATA AACGrr'rGAA AGCAAGAACT GCATTGTGGC ATGGAATTTC TTTCTCCAAG CCTTGTCCAT AATATCGATC ATTTGATACG TACCAATCCT ACGAGAATTT CCGTTACCAC ACCATCC'N'A
GCTTCGATAA
GGTTTATCTG
CCCTCAGTAG
GGAGCAGATG
GTTGCAACTT
TTACCAATGT TTGACCTTTT GTCCAGCAGC CAAGTAAACC
CCACACTTGC
TAGGAGCTAC
CAGGCACAGG
TTTTATAAG AAAATTCTCT TCGTTTAAAT TCATACTT'AT CTCCAAAACC AAAAGTATTT AAACGACATT AGCTTCGATA TTCAGC'rGGA 13980 TGGACTCGGT 14040 TCTTGCTTCA 14100 CAAACTTGAC 14160 CTATTCTCCC 14220 GAAATAGCGT 14280 TAATCTGATA 14340 ATGGTGACGA 14400 GTTGATGATA 14460 TCATTGCCAG 14520 TCAGCT'TCTT 14580 GAAGTCATGT 14640 GCTCCACGTT 14700 CTTCACTTGT CCCAGCTGTC A'rTGG'rACAA AGTTATGACG CATAGCTTCG TAGCTTCTAC 'rGCACCCGCA GCCCCCAGCA AATGTCCTGT AAAAGACTTG CAGGTACTTC CTTACCAAGA ACAGCTACGA TAGCACCACT TTCTCCTr GAGTTGACGT TCCGTG3AGCA TTGACATAGG CTACTTGCTC TGGAGAAATC CCAAGGCTAG ?TGATGGCC TTGATAGCTC CCTGACCTTC TGGATGTGGA GGTACGCATC ACAAGTATTT CCGTAACCAA CCACTTCAGC CAGGATAGTA 203 TTTCAGCGTG TCAAGACTT 'rCTAGAACCA ACATCCCTGA ACCTM'ACCC ATAACAAACC CATTGCGATC CTTArCAAAT GGGATCGAAG CTGTrAAGCc TTGGAAACCA GCGATGGCAA CCAACATCAC ATCTTGGAAA CCAAACTTAA
TTGATGAAGA
C1'ACATTCCC
TGGGTCCTTT
CAGATGCAAC
GCAGGCACTA TTGATAGATT AGAAGCCATA TTTGGTAAAG TTCATGAAGG CGAAGTACCI' GATAACACCA AAACGATCCC
CACGAGTTGG
AAGGTGTGAT
TGGAGCGGAA
TACAAACACC
CTTTTrGGAAG GA'rCTTCAAT
TATTAAGAGC
CATATAAAGA
ATCCTCTGTA GTAGAGAGAG AGAAGC~TCT GTTCCTCCCA GGCATCCCCA ATCGCATCAT GTTTGCACCA AAACCCATGG AGTCATTGGT TTGACACGTT TTCCT'rGATT CCACCAATAC CTCTACATCA AGATTGGCAT ATAGTTATCA AAACGGTTGG GATTTACAGC CTC'TTGGGCT GCATACAAGG TATCTTTTTT TACAAAGTAT CATCAAAGTC ACTATGATCA AACTATTCCA AAA7"rCTTCT CCACTACTCG ATTTACTTTC CATCAATGGC AACCAC7TT CTGCAACCTG CTCTGCCTGC TAATCTTATC TGACAGGATA TGACTCGTAT ATTCCGAC'rA CAGCCTTAGA AGCAGCATAA ACATATrrAAT GATAGCACCT TATTAAAGGC ACCAGTCAGA TGAGCATAAG AGTATCTTGG TTATCGAACG GAAAATCTTG; GATTTCTGCC GCATTATGCA AATTTTGTAA TGCCACCAAT GCCGATTTTC CCAGTTGCTA GGTGTATTTC CGATTGGAGA TGTTACTCCA 'rAACCTG'rTA ATTCrTTrCA CCTCTAGCTT TCGCTACATA CTTAAGCCAC CCAGTTAGAT AATCTTGGCC TGCTAAAAAT CCAAATTCTT TCATCGGAAT CTGAGCTAGT GCGCTCATAT CAGACTCAAT CATTCCTGGA
ACTGTCA.AAT
GTAGCTTCCT
GCAATCACAT
CCAATCAAGC
14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 GCGACC'rCCC
TTAGCTTGAC
TCTCTGGCTT
rrGACCTTGA
GTAATCCCTG
GTGCCACAGA CTTGGTAAAG CAATATTCCC CATCAAACCA ACAACACTAG TCATCATCGG TTTCAAGACI' GAT'rGTGTCA GCACTTTTTC AAAATCTGCT TCTGTCATCT CATTGTTGAC CAAAACATCT ACTGAACCCA GTTCTGCAAT ACCTTGATCA ATCATACGCT AAATGGGA.AC CACCTTGATA CCATAGTTTG CCCCACGACT G =TAAGACA ATGTTGGCTC GACCAATTCC ACGACTCGAA CCTGTAATAA TCCTTTCAAA ACTTCTACTT A1TTrAGTCT GATCTTCCAC ATGAGCTAAG TGAGCAGTTT CTTTCCCCGG TCCAATCTCG ATAAAGTTGC TrCATAGAA ACGAACGGGT TCCTTGACCT
TAGCGTCTGC
AAAACTCAC
CTGCTNGAGC
AGATATTTTT
ATTrTTTCTAA AAAATCTGAT ACATCTCCTG GAGCAATTCT TCTGAGATTG AAACTTGTGG GCGATGGCAA ATGTTCTAGT TTCATT1r' AAGTGCI'ACT AAACTCGCTT GATCAATrT TTTAACAAAA CCTGACAAGA TTATGCCTGC 'rrCTTGCATG ACCCCAATAC GACGCGTCAA GAGCTGAGCA ATGTCCTCTT 204 GTATTGCCGA CTAGGGGACA AGTAAAATCT GAAAAACTTA TTTGCATCAC AGCAGCTTCT
CCTGAGCTAG
GACCTGACAC
CTCGATCAAC
CTGGAGTAAC
GCGTA'rTGAG
CTCCACGCTT
AGGCAGAGTA
GCAATAAACG
AGTTTCAGCT
CTTAAGAGGA
TGCAACCACT
CACTCCAAGT
AACTGCTACC
AGCTACCAAG
TTCTCCAAGA
GTAGATAGCA
AGI'TCTGGC
ATCAAGCGTT
TCTCCAGCAA
TCAGAAGCTT
ATCTTrGCCAG
GCAACCGCAT
GACAAACCAG
ACCGAAGTCG
GTATCGATGA
TTAACAATCG
CCAGCAAATA
TTCTGAATT
TTCTTTAGAA
G'rGAACAACT ATCTTC7'rGC
AATAGCACGA
TAGCAGGTTC AAGGAGAGCG TGGCACCTGC TTCTTGCAAA TGACGATrN'G
TTTG.ACAGCC
ACTCAGCAGG
CTCAAAATC
CAACCATATC
CTAGAATGGC
GATAACGCAA
GATACTGATC
TGCAGGTGTG
7TrC7'rCAATG
AGCCGCTTCT
CAAGGCGCCA
AGGCTGATAG
TGGT'rGCGTA
ATCATAACCG
GTAAAG43
AGTTCAACCG
TTATAG1-rGG
ACCTCTACTG
TCCATAI'AGG
CTTGCCACCA
CCCTTCTT
'rAGCGGGrCT AGCACCTGGC GATTGAGTTT GTCTTCTrCC TCGCTCGATC AATCGTTTCT CTAGATACTG GGCACCTTGA CCTGTCCAGC GAGAGGCTTC AGGATTTCTT CAGCTGTTTC GAGCCACCAT CCACATCACC TCAAAGATTT CTAAATCAGG TCTCTAGTCA ACTGATTTTT AAALAGGCTG'T TTTAGTCATT TCTTACAACT TTCTTAGCGG CTCCGTAATA CAAATCTTTT ACAAGCCCTG CGAT=TGACC TGCCATAACA GCTTTGGCTA GAGCACCTGC TCCCATTTGT TTAAACGCAT CTTT=TCAGC CAGTTCAAAA ACAGCATGAC CAAAGTGCTG AGCTGAAATC TTCTCCTTCT AGTTTGGATG GGCATTCGAC TGTACAGCCT CTGCACCTAG CATAAACCCA CCTGCAGCAA TAACAGGAAT AGATATAGCT GTTrAATTTAC CGATATGCCC CCCAGCTTCC ATAGAAATCC CGTCCCZATCC 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 17340 17400 17460 17520 17580 17640 17700 17760 17820 17880 17940 18000 18060 18120 18180 18240 GTAGTATCAA TATCCCTTGC TTTTAAAATT TCTTTTGCAA CTACAAACCG TGTCCCCACC GCCGCAGCAC Cr'rCACCATC CGCAATTCCT GTGGCTACCT GTCGCACCAA GGTCATGG'TT ATTCCTTCTG CAATAACAGC GTCTGCACCG CTAGGAACAA CAGGAATAAC GATTATCCCA GGAT'TTCCTG CTCCTGTTGT GACAACTTTA
ATTTTTTCCA
GCTTCATGGA
ACACCTTCTT
TGCGTT'TAGC TAAAGCGACA AACGTTCCAT ATACI'TGCTT CAATAACGAG ATCCACGATG TCTTCCACAA AGGGAGATAA GAGCAfGATG TTGACCCCAA AGGGTTTATC AGTCAATCAT TTGATTTTAT CAATATTGGC CTTGACAACT TCTTTCGGGG CATTTCCCCC ACCGATAATT CCTAATCCTC CAGCCTTGGA AACAGCCCCT GCCAAATCAC CATCAGCAAC CCAGGCCATC CCTCCTTGGA AAATAGGATA ATCAATCTTC AATAATTCTG TAATACGCGT T'1TCATAGTG CCTCCAACCT TCCTTGCTTrA CGTAATAGTT CGAT'rTCACC ATAATrTGAC AGTCAAACTA TTACCTAAAC AAGAGGGAGT GGG'1?rCTCC CTACTCCTTC TACTAATATT CTGCT'rATTT TGCTTGCTCT 'ICAACGTAAG GATTTGGATA TCAAAAGCAT
TGCGTCCAAA
TTCAACGATA
ATAGrrT
CCCCAGGTCA
ATGAGACCTT
TCATCAAAAG
ATTTC TTGTA
TATAACAATG
AGCCTCCACC
GTTCTACACA
205 CAACCAAGTC ACCAACTGT T TTCAAG'rCAT TTTCTGCTTC CTTCGATTTrC TAGATTACI' TGGA.ACAACT CCAATGAATC TTGATTCAAG TGTTACTTCT GA'rGCGTCTT TTCCAAGT'rC CTTTTTCAAA TACTGCCATG ATACCACTCC TTTAAAATAA TGTTCACCAC ATGATTACCT AAATTGTAAC AATGAGCGTG GAAGCCTGAT AGAAGAACAG TCTGGCTACC ATCTAAAGGG CTCTGAAAG'r AAAATCCGGA TACTGGCTGC ACTGGTATTG TGGAAGTTTG GCTCGGTCAA CACCAATTrT TC'rAGCCATC GGCTTGATGA AGTAGCAGAT AATCCAAGTC TGTCACCTCT CTGCTTGATA GACTTGGCTA CATCTCGAAT GGCAAAATCA CTTCAAAAAC GAATCTGCAC TTTCTTGATC TGAAAATGGA ATAAGTTAAA CACTCGCTGC GACTTCCATC GCTATTGAGA TTGCTCGCTA GCTTCTAACA AGACACCACC AGCACCATCr1 CCATATTCCA TCATATTGGC TTATCCAAAA TACGGTCA'T ATAGGAGATT CATCAATAGT AAGACTGTGC GTCCATCCAT GAATGTAAAC CTGAATGCCC CTCTCAGCTA AGAAATGCTC CCAAACAACA CAGCTGT'rGA 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19500 19560 19620 19680 *0 TCGATCCGAC CAATCGACTG CCAATCACCA AGCCr'rTTTG AAAGCGACCA GCAAATACAA ATCCACTGCA AGCCCCGTT ATATTAGCTTr GAACACGAGC AGCTGTAGAG AGGATGATAA AATCCAGTTC 'IrCTCCTGTT ACCTCTGTAG CCAAATCACT GGTAGATTCT GTTCGACTTG AAATCCAC1'C ATCAT'rGGTA ACCACTTGCT CTGGCACATA ATGAGCAACC AATCCTCCAA AAATTGGTAA AGATTAGTCA CGCTCA'FGCC ATCAATAATT TTTTCTACCA GAATCAAGCG ACCCTTCTT'r G1TCAAATGCA
GAAGCGATAA
AAGTICAAAAG
GGCATCATCG
ATTCCAGCTT
GTTCTTGAAA
TCCATAATCT
TGACTTAT'TT
AACCTTTACC
TGGCCTTGTG
GATGCACCAC
CCTTrAGAGAG GG2"ITCACTA ACTTTTCAGC AGTTGAAAGA CAAAGGCTTT ATTAGCACCA AATCTGGAGT AATGGTAGCT TTGCCATCAG 'TrCTTAGCA TATGCCTTTG TCGTATTCCC GAGCCAAGTC GTGATTTGTA TTGCAAAAGC CATTATTTCA CATCACAGCA ATTTCTTCCT GAAGCGTTTA TGCAGTCTAT ACGACGATCC TGTTCTGACC GAACTCGCTC AATGTAGCCC GG 19702
INFORMATION FOR SEQ ID NO: 8:
SEQUENCE CHARACTERISTICS:
LENGTH: 6211 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GAAAATTTCC TCTCTTCTCT TGAAAAAT1TT TGAAAAAATG GTATGATAGT AACAAGTTAT 77 TAAGAGG AAAGAAAGGG GAATAATGGA GTCCGACCTA GTI'TTGGAAA TGGTGGTGCG GTTTTGCCTT TCTAGGGCGC CATGAGCAAG AAAGTTGGGT GTTGCCGTCG TGCGGATGCC ATGAGCGATA AGGGATTGGG AAGGATGCCT
CACTTCGTGA
TTTATGATGC
GTTGTTTGCA
TCACTAGTG
GCGTTCCCCT
TTCAGGAGGC
GiAAAATCAGT TTAGAATCTC CTAAGACGGG 1-?rAGGACTT GATACCATCT TrGGTTATCC GATATATAAT TTTAAAGGCA TTCGCCACAT TGAAGCTGAA GGTTrATGCCA AATCAACTGG ACCAGGAGCA ACAAATGCCA TTACAGGGAT 'rTTGGTCTTT ACAGGTCAGG TGGCCCGAC AGACATCGTG CGAATTACCA TGCCAATCAC TAAGTACAAT TACCAAGTTC GTGAGACAGC TGATA'rTCCG CGTATCATTA CGGAAGCTGT CCATATCGCA ACTACAGGCC GTCCAGGGCC AGTrGTAATT GACCTACCAA AACACATATC TGCT TrAGAA ACAGACTTCA TTTATTCACC AGAAGTGAAT TTACCAACTT ATCAGCCGAC TCTTGAGCCG AATGATATGC AAATCAAGAA AATCTTGAAG
GCCAGTCTTG
ATTTGCAGAA
A.ACGAGTCAC
TGCTATGACG
GGGGAA'rCCT TGAGATrGGC
GCAAATGTTG
CACTAAAGAC
AGCAGTTATT
TTAGCTGGTG GTGGA.ATTAG TTATGCTGAG CGCTATCAAA TTCCAGTGGT AACCAGTCTT CCACTCTr'rC TTGGAATGGG AGGCATGCAC GAAGCGGACT 'rTA'GA'rTAG TATTGCTTCT AAGACTTTCG CTAAGAATGC TAAGGTTGCC AAGATTATCA GTGCAGACAT TCCTGTAG'rr CTAGCAGAAC CAACAGTTCA CAACAACACT AAGAATCGTG TTCGTrCTTA TGATAAGAAA CAATTGTCCA ACGCTAAAAA GCTGCTACGG AACTAAATGA TTGGGACAAG GAACGATTGC GGGTCATTCG CACCAAATrAT CGTTTCGATG ACCGTTTGAC CACATTGATA TTGACCCAGC GGAGATGCTA AGAAGGCCTT GAAAAGTGGA TTGAGAAAGT GAGCGTGTGG 'rTCAACCGCA 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 GAACGAATTG GTGAA'rTGAC GAATGGAGAT GCCATTGTGG TAACAGACC~T CC CC
TCGTCAACAC CAAATGTCGA CAGCTCAGTA TTATCCCTAC CAAAATGAAC GAC 'TCAGGT GGTTTGGGAA CAATGGGCTT TGGAATTCCA GCAGCAATCG TGCTAACCCA GATAAGGAAG TAGTCTTG=r TGTTGGGGAT GGTGGTTTCC CCAGGAGTTrG GCTATTTTGA ATATTTACAA GGTGCCAATC AAGGTGGTTA TCATTCACTT GGAATGGTTC GCCAGTGGCA GGAATCCTTC TATGAAGGCA GTCGGTCT'rT GATACCCTTC CTGATTTCCA ATTGATGGCG CAGGCCTrATG
GTCAGTTAGT
GTGCTAAAAT
AAATGACCAA
TGCTGAACAA
GAACATCAGA
CTATAACI-r GACAATCCTG AGACCTTGGC TCAAGACCT'r GAAGTCATCA CTGAGGATGT TCCTATGCTA ATTGAGGTAG ATATN'CrCG TAAGGAACAG GTGTTACCAA TGGTACCGGC TGGTAAGAGT AATCATGAGA TGTTGGGGGT GCAGTTCCAT GCGTAGAATG TTAACAGCAA AACTACAAAA TCGTTCAGGA GTCCTCAATC GCTTACAGG rTICCTATCT CGTCGTCACG TTAATATTGA AAGCAT'rCT GTGGAGCAA CAGAAGATCC GAATGTA'rCG CGTATCACTA TTATTATTGA TGTTGCTTCT CATGATGAAG TGGAGCAAAT CATCAAACAG CTCAATCGTC AGATTGATG'r rI rGGTTAA
CTTTCCGTGC
GATTCGCATT
GATGTCAGCG
AACAGTAGTA
CGAGATA'rTA CAGACAAGCC TCZATTTGGAG CCAGCT'GAGA AGAGAGCTGA GATTTTAGCG ATGCAGAAAA GAGCGAAGCC CTCGAACGGG TGCAACTCCA AGCCTAAAAG GCAATA-AATA TGAAAAAGAT GTTAAAGTAG TTCACAAGGG CATCCGCATG TGTACGTCCA CGTAAATCTT AGCAGAAGCT ACTAAGTTGG AGAATTGTAC GAAGCAGAAA CCATGGTTTC A.ACATCCACT GTCTGCTCCT AAAGGACCAG TCCAGCTCTT TATGCAGTAT CTGGTGTAAA GGTGTTGGAG AACTGAAGAA GATTTGTTTG CGAAGCAGGT T1'CGAAGTCT AGTTCTTCAC GAAATGAAAT GCGTCAATCT ATTTCAAACA CACGTAGCCC AAGCTrCGAT CTATTGCGAG TCATTCGCCC TTTACCCGCG ATTAAAAATC ATAGAAAAGA GAGAAAAGCT CAGCACTTGA CGGTAAAAAA TACCA'rTCAG
ATACGGTATT
CAAC'rrAAAT ATGACAG?1'C
ATCGCCGTTA
CTCAAAACTT
TTGATAAAGC
CTGATGTTAT
TCGCTCCAAA
TTGAATTTAT
GACACTTGGT
ACCAAGATGC
CGGCTCGTGT
GTGAACAAGC
TGACAGAAGC
TGATCGTTGA
CTGCTGAATA
GCGTGAT'rCA CGTCGTGACG TTATTATCGG AAAAGAAGAT GGATTTGATA CI'ACACAGT CATGATCTTG GCCCAGACG AAATTCAACA CTTGGAAGCT GGAAACGCAG 7TGGATTTrCC CAAACTTCCT GCGGATGTAG ATCTCTTCAT ACGTCGTACT TACGAAGAAG GATTTGGTGT AACAGGAAAT GCTAAAAACA TTGCTATGGA AGGTCT'rdTT GAAACAACTT ACAAAGPAAGA
CGCGAGGTGA
ATTATTCAAC
ATGACGGGAA
CGCAATATTG
TTATTrAAACC
AAATGGAATA
TCGGTTATGG
1740 1800 1.860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 TGTACTT'rGT
AGGTTACGCT
CTTGATCTAC
CGTGGTTTGA CTGCCCTTAT CCAGAATTGG CTTACTrTTGA GAAGCTGGA'r TCAAGAAAAT CACTGAACAA GTTAAAGAAA ATATGAAGC TIGCAAATGAC TTrMTAAATG ACTATAAAGC ACAAGCAGC'r AACCTTGAAA TTGAAAAAGT CGGTGACrAT GTATCAGGTC CACGTGTAAT TGTCTTGGCA GACATCCAAA ATGGTAAATT TGGACGTCCA AAATTGACTG CTTACCGTGA TGGTGCAGA6A rTCCGTAAAG CAATGCCArr CGT'rGGTAAA AACGACGATG ATGCATTCAA AATCTATAAC 'rAArrAGAAA 'rATATAGCGC TGGAGATGAT TTTATGAAAA AGATITATGAG AAAAA'rrGCA TCGTTATTAT TGCTTCTAGT 208 TGTATAATCT AA'N'ACACCG TCGAATAG TGCTAGCAGA CCAAAATAAA GCAGATTGGT CGTATGATGA AAATGCTGTA ATTAACATTT ATGATGATCC TAATNTGAA GATGGTAGGT TGCATATGAA CTTGAACAA TTCTTCAAAT T1GGCACAAAT AGCTAGAGAA GAAGGTCTTG AAATTCATTC TCCGTrGAG AGAGCTGGTG CGACTAAATC TGCTCGTTAT ATAGCGAAAT CGATT'"rGAG AAATAAAAAA CATTAACAAA TATAGTTGGT AAATCATTAG GACCTAAATC AGCTGTTAGA TTCGGAGAAG CTTTA'rCCTA 'rATTG.AAGGT CCTCTTCGCA GAATAAATGA GACGATAGAT GGCGGTTTAT ATCAAATAGA GCAAATTATT GCATCTCCAT TGAAAGAA'rC C4GTTTAAAT
ACTTATTTAG
GTGATGAAAC
AATTGACAGA
CGACTGGACTG
GGGTTGAAAT
TGA'rAGGCAA
TCAACTAAGA
CCAAAACTTT AGCTTCAGCT ATTCGTGGGA TATAGATGT CATATGAATA T'rACCAAT'rT GTTTTCTATC AAGACAGGAT CTGCAAAAAC TATTTTrrCA GTTGGA'-rA CAA'rrGGGAG AAATTAGATT CTAATTTTGT TCCTCGTAGT CAATTTGTAG ACACGTTGGA ?I-rGAA'rGAT GTAGAATATA AAGAAATTTT AAACTATTTT ATCTTCCATC GTAATGATAG TGAAGAAAGT TTGGTAGAAT GGTTATATGA TTGGATTTCC ACAAATCGTT ATGAACTTCC TAAAGAGTTT TCGATTCGTA TGGCTCATAA ATACCATGAA AGTGT-rACTG AAGTTTTCGG AGATGAATAA CTAAA.AACA GTCATTAGTG ACTG7TTTT ATAGAAAAAG AGG'N'T'ATA TGTTAAGTTC AAAAGATATA ATCAAGGCTC ACAAGGTCTTr GAACGGTGTG GTTGTGAATA CTCCACTGGA TTACGATCAT TA'rTTATCGG AGAAGTATGG TGCTA.AGATT TATTTGAAAA AAGAAAATGC CCAGCGTGT'r CGCTCCTTTA AAATTCGTGG TGCCTATTAT GCCATTTCCC AGCTCAGCAA GGAAGAACGT GAACGTCGGG TAGTCTGCGC TTCTGCGGGA AATCATGCGC AGGGAGTAGC CTATACTTGT AATGAAATGA AAATTCC'rGC TACTATCT 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220
ATGCCCATTA
GTAACTATTA
CTACGCCACA ACAAAAGATT GGTCAGGTTC GCTTT1TTTGG TCGGGATTTT AACTAGTTGG AGATACCTTT GATGCCTCAG CCAAAGCAGC TCAAGAATTT ACAGTCTCTG AAAATCGTAC CTrrATTGAT CCTTTTGATC ATGCTCATGT TCAAGCAGGT CAAGGAACAG TTGCTTATGA GATTTTAGAA GAAGCTCGA-A AAGAATCGAT TGATTTTGAT GCTGTC7TGG TTCCTGTTGG TGCTGGCGGT CTCATTGCCG GGGT-TTCTAC CTrATATCAAG GAAACAAGTC CAGAGATTGA GGTTATCGGA GTAGAGGCGA ATG'GAGCGCC TTCCATGAAA GCTGCCTTTG AGGCTGGAGG TCCAGTAAAA CTCAAGGAAA 'rTGATAAATT TGCTGATGGG ATTGCTGTGC AAAAGGTAGG TCAGTTGACC TATGAAGCAA CTCGTCAACA TATTAAAACT TTGGTAGGTG TCGATGAGGG ATTGATTTCT CAAACCTrGA TTGACCTTTA CTCTAAGCAA GGGATAGTCG CAGAACCTGC TGGAGCGGCT AGTATCGCCT CTTTAGAGGT TTTAGCTGAA TATATTAAGG GGAAAACCAT ATGCCAGAAA TGGAAGAGCG AATTTCCCAC AACGTCCAGG GATGATATCA CACGTTrGA ATrGGGATCG CTI'AGCAGA T'?rGATCCAG CTTATATTAA GGACTAATAA AAAAATATCA ACACTGTCTT TAATACTCITr CAAAACAGTG TTTGAGCAA AGTATAAGGT ATGATTTGAT A.AGTAATTAA CTGAGCTTAT GTGTCTGcTT CTAGGCTAGC GGAATACCTA TC'rCTCAGAT AAGGCTTGGA TTTCTAAAGG TTGTTGTA'rC
TGCCTTGATT
AGCTTTGCGT
GTATATCAAA
TAAGCATGAT
CTTAAATGGT
TACCTTCATT
CGAAAATCTC
CTTGCGGCTA
TTCTTTTTGT
ATTTCTGGAG GAAATAATGA TATCAACCGT TATGATGGTA TC.AAACATTA. C TTTCTGGTC GAGrTAA ATGATATCCT GGGGCCAAAT CGAGCTAGCA AGGGAACAGG CCCAGTATTA TATGCAGGTI' TGATTCGTAG AATGGAAGGT AATGAAACGC ThTAATAT GCTTGTCTGA TTGATTTCCT ATCTATTGAC AAGCATAGTC TTCAAACCAC GTTAGCTCTA TCTGCAACCT GCTTCCrAGT TTGCTCTTTG ATTTTCATTG TGACAAATAT ACTATATTAA AAAGATATAT 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6211 CTGTCTTGTC ATCTCTATTA AGGATGGTTT AGATAATCGG ACCTCAATAT CCAAAGGAGT GATGAATTTG AAGGACATAA GATTTATTGA GGAAGAAAGA TAGGAGTTTT TGAGCTAGTG TTAGAACI'AT CATCTTCAGT TCTTAAATCG AAGAAATAAG CTATCTTACG GAAATAGAGA AGCAT'I-I-tT AAGAACTTGA ATAATTTCGC ACCTTAAGAG GGTAATAATA CAGTATVrTT ATTAGCAAAT ATTrATGGTG TAGAGGCTAG CAAAACCTAT ATATTATCGG ATTTAAAAAG GAAGTAAGAA A
INFORMATION FOR SEQ ID NO: 9:
SEQUENCE CHARACTERISTICS:
LENGTH: 7939 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
CCGGACTCCC CACGATTCTT CAAAATAACT GAGTATATTT CTATCTTGAT TTTCAGATAT AAATTCTTCC TTCTGTGGCC TCTTCTTACG CTTGAGAAGA GCTTCTCCGA CAGGCTTCT TCCTTACTGA GCAAAACCTT GAGCATAGAT AAGT'TGCACT GGCA.AGCGTG CTCI'GTATA TTTGGCTCCC TTCCCACTAT TGTGGATAGC GAGGCGTCTT CTCATATCAG TCGTATAGCC TATATACTAG GATCCATCAC GACACTCCAG AACGTACATA TAAGCCTTAT GATCCATAAT AAATCTCTTC GATTTCGGGC GTATAAGAGC CATCATCATT GTGGACAATC AAAGGAGGTA AGACCTTAAA GCCAC71XTT GAGCCATCCT TGATCGCCTC AATCAAAAGC ATAT'rGGCTT CC'IrCT T1GGATAA ACAAACTGCA GGCGCTrAGG GGCTAGATTA TGTCGTTTTA ACGTATCCAA AATATCCAGA AGTCGATCAG GACGATGAAC CATGGCCAAA CGCCCATTAG ACTTGAGAAT ACTClr.GGCA CTACCACAGA 1rTCCCAA A'rTAGTCGTG ATTTCGTG;TC GAGCCAAGAG ATAATGTTCA CTCTCGTTCA GATTAGAATA AGGAT'rCACC TTGAAA'rAGG GTGGATTACA CAAAATCATA TCCACCTTAC TCCCCTGAAT AATCATCGCA GATGACCTGC ATTTGCTCCT CTAATCCATT CCATATCCGC CAAACGCTCC TGAATCTCAA CAGACAATAT GTGAGCAGGC ATATTT'rTCA CAAACGGACA GAGCGTTCAG CTGTG GA GTACGAGTGC
TAGCAAAAAG
GAAAACGTGG
TTTGAATGAT
ATTGI'rCTTC CCCCACTGCT CCATTCCCAG CACAGAAATC CACAATCAAC CCCTTCTTAG AAATCGTGAT AAGAGAACAC TATCCACCGA ATAGCTAAAA ACCTCTCTAT TTTGATATCT GTCGAAAAGA GCTGGTTAAT T'rCCATGGTC CTATTATAGC AAATTCATAT AACTCTAAAC TACTTCTTCT CATTCGTCGA AAGGGAGCAA AGGAAACTGG TACTTTTCTI' TTCATCTAAA TCCACTACCT TTTTTAAATG GTGCAGGGCT AGCCGTAGTT AAAGCGGTCG CCTCCAAAGT GCGGATAGA-A GAACTTGAAC CTCTTCATCG GCGCTCTCCT GATTTTAATA TAACATTACA AAAAATATAA TCTCCAGTCC AGATTGG'rAG CTTGAAAAGC GTCTCCGTCT AGACTGGCTT TCCCTGTAAA ACTTTCAAGG TTTCATGAAT AGCCCCGTAT CACCCGTCTC CGCCCCTTTA GCTTATCACC AACATCTTCA ACTGGCTTGT AGCGTAAGAT GCTTCATCAG ATTTTCAATA AA'rCCTGTCC GAATCTCTGA AATGTGAATC TAACTCAACA AAGGCACCGT AGGGCTGAAT CCCTGTAATA GATTTTCATC TTAGTCCTCG ATTTCAATAG TTTCALAT'AC CCATAGCTCC TGTCTCAACA GCAGCAATGG CATCCAAGAC 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 CTAACTGACC AAAAACCGTG TGACGGCGGT CTAGGTGAGG TGTCCCACCT TGAT'rGGCAT ACATTTCTGC AATCGGTTCT GGCCAACCAC CACGAGTAAT TTCTTTCTTA GAATAAGGTA GGTGTTGGTT TTGCACGATA AAGAACTGGC TGCAAAGAGC ACCACGGATA TTGTAAAGCT CCTAGATTGA CTCGCCACCC ATACCAGTTC CCTTGATAAT ACGGTGGAAA ATGACACCAT AGTTAGCCAC TGTTTTAGGA GCATGTTCAG TGCCGTTGGT ATT'rGGACCA GCATTTGCCA CTTCTCAGAA TTCATCCTCA AAAGATTCGC CAGTTGGCTC TCCACCTTGG ATCATAAAGT CATAGTAGCC ATCTTTITGAA AGAGATACAA CGAAAAGCTT GATACGTAAG TCTCCGTGAT TGGTCTTAAT AGTCGCAAGA GGACCTTCTA CTGC=CAAT GTCTAC'rTGT GGAAAATGCA ATTCT7IrTTC TACCATACCA AATACTTCTA AGGCAGCAAA AATGCCATCT TCTTCTAATG TTTTTGTAAT ATAATCTGCT TTTTCTTTGA TTrATCATG AGAAATTCCC ATGGCAACGC 211 TGATTCCAGC ATAATCAAAG AGTTCCAAGT CG7rGAGACC ATCTCCAAAA ACCATGACCT TCTCTGGTTT CAAGCCAAGG TG=rCCACAA CCTTTCCAC CCCCGTCGCT TTGGAGCCTG AAATCGGCAC AATIATCAGAC GAA7NGTTGAT GCCAACGAAC CATGCGAAGT GACTGTCAGG CAAGTGCAAG TCATCTCCCT TAT'rrCAAA AGTCCACATC CTTCTTmrC ATGGAAATCG GGATCTACAT CTAAGTCGGG ATAAATTGGA CACTCATCAT ATCGGTGCGA GTCGACAACT TGGCATCATG ACTCCCAACC CAATTCCTTC TTGCTTAGCC CAAGAGATAT ACTCCTCAAC ATCTGACTTT GCTGATAAAT GACCTGACCT T'rTTTATCT'r CGATATAAGC 
CCCATTCAAA AGTCAGGCTT GAGATCACGA ATCTCTGGAA CAACACCAAA AATGCCACGTr TTCCTGTTAA AATTCCTT TCACGCAACT GTTTAAAAAC AGTGGGAATT TAAACCCTGT CTTTGAAT'rC CGCAATGTAT CATCAATATC AAAAAAGACA TCTTTGCCTr GTATCTTAAT TTCGCGTCCA TCTCACTACC TCTTTCAATC CATTATATCA TAAAGTAGGC AAATCCCCTA TTTTCAAAAA GTTTATCA7TT TTTCTTGGAT GAGAAAAGAG ACATATTTAT GAAAAAGCTC CATCGTGCTT CTCTTGTTTT CAAAC'rCGTA AAAAGGGAGC CACTGATCCT AACTCGCTCT AGCTTGTCALA AAAAGACCCG TTGGGGTCTT AATTCGCTTT CTTGTTT'rCA AAAGAGACCC AACTGGGTCT TTTC='-AAT C'rTCGTTTAC GAAAGGCATC CGCGAGCGCG TTGATAGCT GTTGTTrACTT- TACGTTCGT'r TTTAGCTGAA CACGACGAGG AAGGAT'rTTC CCACGTTCTG AAACGAAACG GCTAAGAAGC
TTGTCTGAGA
TGATAGATAT
'rrGATAGCTT-
AAGCCATACT
TCAATCTGAT
GTTACAAAAA
CCAGAGGCGA
CTAGTTGGAA
ATCTTGATCT
TAACTCTTTC
T'rTATTTTAA
TTAATGTCTT
CTCATT'rCAA
AGCTCATGAA
AAAGCCATTA
GTTCCTGTTA
TCAGTATCTT
55 S S 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 TGTAATCAAC ATATTCAATT TTGT'rTGCTG CGATGTAATC ATCCGCCACG ACGTTGTTGA GCCATGTTTT TTCTCCTTTA AATGGTAAAT CATCATCTGA AATA'rCCAAT GGGTTTGT'rG CGTGAAAAGT CTGACTGA A~rTGTAGGT GCTGAATAGT GCTCCACCTG TGTGACCCTC ACGCACACTA CGGC'rTTCCA ACGACCTCTG TCACGTAGAC ACGTTGTCCT TGCTGGTTAT CGACCTGTCA CCCCGATAAG TGAGCCTTTT TTAGCCCAGT TGCCGCCACA TAACGACA'1r GATAAAATCA GCCTCACGTT AACTTTTrA CGGCGTr'rGA TAAGTTTAGT TGTCCATTAG CTCCAAATGG ATTT'TCATTA TTGCAGTTGG TGCAGAGTAA ACATTTGGAA ATTCTCACCC CGTAACTACG AGTCTGGATA TAGCAAGATT TTCAGCCTGT CACCATTT'rG ACI'CTTAAAT G;TACGGTTTA CTGCAAGAGT AAAAGTCGCA ACTGCTACAT TTGATGGGGT ATAACGCAAC TCAGCGTCAC GTGTCATACG CCCTACAAGT ACAACATrGP 'rAATCATAGT TT-ACCTTCT-r 212 ACGCGTCAAT 'TrrGACGATC ATGTGACGAA GAATGTCAGC GTrGATTrr GAAAGACGGT 3960 CAAACTCr 'r AAGAGCI'GCA TCGTCATTTG CTTCAACCT? AACGATGTGG T1AAAGTCC~r 4020 CACGGAAATC NrGGATI"rCC TATGCAAGAC G.ACGTTC CCAA3'1rT GATTCAACAA 4080 CAGTTGCACC GTTGTCAGTC AAAATAGAGT CAAAACGTGC TACCAAAGCG TTTTTAGCTT 4140 CTTCTTCAAT GTGGACGA ATGATATAAA GAATN!'CGTA 'rmGCCATT GATATGTTCC 4200 TCCTTTTGGT CTAATGACCC CAAGACTr'rG CAAGGGGTAA GTGAGGrrCG CTCACAATAA 4260 ACTATTATAC TAGAAAAAAT 7TrTTACGC AAGTAAAAAC ACTAGAA'rrC GAAAAAACGC 4320 CACATGGGCG TTTTCCTGTT CTTATGGTTT GATACGCTGC AACATACGTG CGAATGGAAT 4380 AGCTTCACGG ATATGTTTT TTCCTGCTGC GAAGG'rTACC ATACGTTCGA TACCGATACC 4440 AAATCCTCCG TGTGGAACTG TACCCTATTT ACGAAGGTCA AGGTAGAA'rT CATArrCTGT 4500 ACGATCCATG CCAAGTTCAT CCATCTTAGC GACAAGGGCA TCCTAATCTT CCTCACGCAT 4560 AGACCCACCG ATAATrrCTC CATAGCCTTC TCGAGCAAGC AAGTCTGCAC AAAGCACGCG 4620 CTCTGGATTT CCAGGAACTG GTrrCATGTA GAAGGCCTTG ATGGCTGCTG GATAGTTCAT 4680 GACAAATGTT GGCACACCAA AGTGGrTGA AATCCAAGTT TCGTGTGGTG ACCCAAAGTC 47 ATCACCATGC 'rCAAGATGCT CGTAGTCAGC ATCTTCATCA TTTTCATGCT CTTGCAAGAG 4800 0GTCAATGGCr 'rGATCGTAAG TGATACGTTT GAATGGCTCT GCAATGTAGC GTTTCAAGAG 4860 *TTCTGfATCA CGTTCCAACG TTTccAAGGc 
TTGAGGCGCG CGGTCAAGAA CAC cTTGTAG 4920 -AAGAGCTTTC ACATAAGCTT CTTGCAAGTC AAGCGACTCA TCATGTGTCA AGTATGAGTA 4980 *CTCAGCATCC ATCATCCAGA ACTCAGTCAA GTGACGGCGT GTTTTTGATT TT'rCAGCACG 5040 GAAAAC'rGGA CCAAAGTCAA AGACACGACC AAGAGCCATA GCCCCTGCTT CTAGGTAAAG 5100 CTGACCTCAT TGGCTCAAflT AGGCTGGCGT TCCGAAGTAG TCAGTTTCAA AGAGTTCTGT 5160 *AGAATCTTCT GCCGCATTTC CTGAAAGAAT TGGGCTGTCA AAC-rCATAA AACCGTTCT'r 5220 GTCAAAGAAC TCATAAGTTG CATAGATAAT AGCGTTACGG ATTr'rCAACA CAGCTACTTG 5280 ***CTTACGAGAG CGTAgCCACA AGTGACGGTT ATCCATCAAA AAGTCTCTTC CGTGTTCTTTr 5340 *TGGTGTGATT GGGTAGTCTT GAGATTCACC GATCACTTCG ATGTCTGTGA TCTCCAACTC 5400 ATAGCCAAAT TTAGAACGTT CGTCCTCT'rT GACAATACCT GTCACATAAA CAGACGT'rTC 5460 ***TTCGCTCAAG CGT'rTGATAA CATCAAACTT CTCAAGTCCC ACTTCTTCAC CAAATTTTTC 5520 .GACAAAGTrr GG~rAAAAG CCACACCTTG AAAGAAGGCT GTTCCATCAC GCAATTGTAA 5580 CAAAGCGATT TrTCCTTTC CTGATTTGTT GGCAACCCAA GCGCCAATCG TCACTTCCTG 5640 ACCAACATAG TCTT'rrACGT CAATAATCGT TACACGTTrr GTCATTATT TTCCTrCT 5700 213 TTTTTrATTCT TTATGGCAAA CCACCTCTAT ATTGTTCCCA TCCAGGTCAA TCATAAAAGC AGCATAGTAA ATCGGATGCr CACTTCGATA ACCAGGAGCC CCATTGTCTC GCCCACCTGC CTCTAAGCC-A GCCTCATAAC AAGCCTGAAC ?rTT=CTA TrCTGCI'A AAAAAGCAAA ATGAACAGGA TCTrGTGTTC CCTGAGTCAG CCAAAAATCA CCACCAGGAT GAGGGCTGT'r CGGGGATAGA AAACTAATTA GAGAACrAGT CTTAAAAGCC AATTTATAGT CCAAAGGAGC GAGAAAACTC CTATAAAATC CTTATGAAAT TrGTAAATCC TT'rACCTTAA TCTCAAAATG
ATCAATCATT
CTAGGTCTGT
CCAAGGCCAC
CTTTCATCTC
CCACTTCAAA
AGGCCTGACG
ATTGGGCTAC
TAATGTCTGC
ACACACCATT
CTCACTACCC ATAAATGC'rT TCAAGCGTTC CGCATAGCTG AGGCGGACAT TTTCTGGTGC TTrCGGCTTCT TCTAAGATAA CAGTrGTAAA CATGGCCTTT T'rGACATTT G GAAGAGATA TCCTGGTACC TCTGCAAGGA GGGGATAGAT CATGCTTTCT ACAGTATCTT GCTCACCTGA TGCTGACGGA TTCGAAGTTG TTTGACCTGC TTCTCCAACG GCATAACCAA TCCGCCAACC GATGACCACT GTTTGCI'TGC GAATCGCTTC GACTGCTTCT TTAAGCGTGT TCCAAA'rCCA GCTCCTGTTA GTCTGTCACA TCCGTGI'AGC GAAGGCCCCT TGCGGTGA GGTATTAAGA CGTTCCTCAA TAGAGCCTCA ACTGCTGCAT AATCTTGGAC ATGGCAGCGA AGTCATGGCA TAAGTTTTAG CGATAGGCTA GAAATCGGTG 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 *g 0 0S* 0 0* 00 0 0 0* *0 0 0 0* 00 *00000 t 0P*0 0* Of 00** 0 TGAACTCATG ACCATTATA-A ACCAAGCGGC CATTTTCTAC ACCCCAGTTT CCAATTGCCA 'rGGGATTAGA 'rGGCGAA'rrC AGCACCAAAA ACTGCTCTAC GGTCACCTTA AAGTGATTGT CT'rC-GCCAT CTTGACCTGA TCTCCATAGC CATCACCTGG ATTGACCACA GCCATAAAGA CGACTGTCAC TTGATGAC GCTACAGAAT CCGCCGCCTT AAGCTCTGGC ACACCTGAGG GAATCGATGC AATGGCGGCA TCTTGGATAT AGGTTAGAGA CAAAATATCT CTACCCTCAG CCAAAGTCAC ACTrTCTTCC ATTTCTAAAA GTTGACCAAT GCTCCTGTTT CAAAATCTAC CCAGATTGGC TTATCTTGAT AACGGCCAAA TTCCTTAGAA ACCGTTTCTG CTI-rTTCTTG
CATAGATATC
AGAGTTCCTC
CCTTGGTCTT
CTTCCTTAGC
TAACCCAGTA
AGGTATAGAG
AGCCGTAAAA
TTACTGTATA
GTCTGCTAGG
ACGGGTGTA.A
GTCAGTGCGA
AGAAACAAAG
TGGGGTTGGG
AGAATATTTrG
GCGCTCAAAG
AAAAGAAGCA
ATGAGAATA'r
ATCATACCTG
GCTGCTTCTA
ACGGGAACGC
ATGATGACTT
GCTCCCGCAG
TAGCTATTGA
CGCCCATCTC
T'TTTGGGAGT AGTGAAATCT CCTTCAGTGC TTTGGCACGG CACGGTTGGA TAGTTTCATA TAGATAAAA.A 'rCAGATCCTG GGTTATCTTG TCAATCTCGC TGAAACACCC TGATTTAGCT GGCTCACCCA 7140 GCTCCAGCAG 7200 GGCCCTCCTr 7260 ACTTAACTTC 7320 CAGCTCCCTT 7380 GATAAACGTA 7440 214 AATCTrATGG TCATCTTTAC CAATCAGGAC AGCAAGCGCT TCTTGCTGTT TGTTACGACC AAGAACGCTG TAATAAGATT CCAAGCCATT GTATAAATCA ACCTGATCAG CCTGCTCTAA TCCTGCATAC TGCTGAGCTA A'TTTTCTCC TrCACTTTrA GCTGTrTGAT AGGrTCAT GCTAAGAGAA ACCATATACA, GAAAGGAACC ACTGATAACC ACAAACAAAA TCGTCATCCC TAGACCATAC TGCCACAGTA GATTATTT TGCTr'rGTTT TG TC7 T7Mr TCACTCGTCT A'I=rACCAT CrATrAAGCT TTATTACAAG TGAATATAAG AATACTCTrC GAAAATCTCT TCAAACCACG TCAGCTTTAT CTGCAGACCT CAAAGCrGTG CT'ITGAGCAA CCAATrCTAT TTCTCCCTTC AAACAAAACC GATTTTGAAA GTGAAACAGT TCTrACT'rTT TCAGTCACAA ATGA'rTAGAG TTTGCCGGG INFORMATION FOR SEQ ID NO: SEQUENCE CHARACTERISTICS: LENGTH: 9897 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 7500 7560 7620 7680 7740 7800 7860 7920 7939 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: CCGCTCTACC GTCAAATAAT TACCATTTTG TTTAATACCG AAATTTTTAT TTCAGT'rGGT CTG7=GGAC GATCGTCGTA TACAGTACCA TTCTCACGAA GTAATCAGTA TCACCTTGTT TCCTTAATTT AAGGTAATAA TTACCATCAA ACCTGAATCT TTTCTAGTTG CTTCTCTAAA ACTTACTCCA GCAGGCATCA CATGAGTACT TGTTTGTTCT TTTT'rTCA.AC AATAACAGAG TCAATATAGG GCTGAT'rTGT AAGTCACGTC CACCAACTTC ACGAGGCCAT TCTA.ATGGTA ATCATCGAAT GCCAATGT TA ATTTTGGTTT AGTCCATGTC TTACCATTAT
CTACTGAAAA
TAGTATAATr 'N'TGTTrATA
CATCAGCAAA
TTGCACCACC
CTGGCGCAAA
CATCACTATA
GAGCGTCAAT
AATAGTTAGA
TTGTCATCTT
ACTTGTAGCA ATATrAATTT TATTCAAGAA GCTTGAAAAT ACCCGACCAT TGCTAAAAGT ACCTGT TGTA TCATTAGCCG TATAAATTAA
ATCATGAGTT
ATACAGAACT
ATGTCCAGTA
CCACCGTAAC
GGAATACGGA
ACACCGTTTG
120 180 240 300 360 420 480 540 600 660 720 780 840 900 TTTAACAGI TCTTCATCCA ATGCACTATT AAAGAATTTG ATATI'rCTA GTGTTCCGTT AAAACCAAAC GCCGT7rTTC CTGCACGTTT CACTCCCCCA AGCATATAGT AATCAATACC TT TAATATCC TTGATGTA GGAAATTATC CACTTTCTTT TCTACI'ACTT TTGTACCATT TGCGTATAAA GAATATGTP TTTTGACTGA ATCTGCTACT ACTGCAACAG TGTTAGTCAC AGCCTCTTGT TTGTACTTAC CCCAAACTGA AGCAGGTCTrG GATACTAGGT TATTTTATT GGAAGAACTA TCACGCGCTT CCATCCCCAA CTCACCATTG TCTCTAAGGA ACACATCTAC ATAACTATTT TGTTGACCGG G7"TTGGAATT AGATATTCCA AACAGAGCTT GTAAGCCT1'r CTCACTTGAC TGATTGTACT TAATCACTAC AGTAAAGTCA TAACTCTTTrA GTAACA7=? CTCCGCCCCC AGGAGT'N'CT TCCGCTGTAG AAGATGGATC TACAGTAACT TCCGAAGAGT TATCCGATGT TGCAACAGGT TGCACCAACT TTGGTGTTGA AACTGAGTTA GCAACAAATG CTGATAATAC AATATTrTTT TTCATTTAT TTTTCCTCGT TTTCATCAT'r GCAATGAATC TTTGTGGT TCCCTGGAAG CAATTCAACA ATTTGATAGT CTTCGCTAAA AGGTACACGT GACTGGGC-AC 'rGTAAAGTA
CTTAACAGTA
AGGTTGTACT
TACTTCAGAA
CACTACAGTA
?rAAAACTTT
GAAGATCTTC
CT7"=CATC
GAACTGGGGA
CCGCTAGTAA ATTTATCCrT ACATTA7N1M 'rrrCTAAGAC G7"TTC.AACTG TTCGAGG'I-r TCCCAAATCG GAGTCGTTGG GTTTCAGTCT CCTGAGCTGC CCTAAGGTTA CATATTGTTT GATAACAAGT =7TTAACAG TTCAAAAGTC ACCAACATAT GTAAAAAGCA ATATCCT'rCT AGTTACTGCC ATTrTTTrCAG AGTTTCAAAA ATATCTCCTG ATTTTCCCCG TCAA'rATCAA TA''TTCAAC AACAATATGA ATATCTAAAT ATT'rCT'rATG GA.ACTCCATC AGCTAGATAA GTCATACAMT ?1'GCAAAAAC
TTTTTCCATC
AATATTTATT
TTTCTCTT
CATGAATCGT
AACTAAATCT
GACAGAAGCA
CTA'N'AGTGA
TGTATATTTA
GTCAAATTG TATT'MCTAA AAAATCACAG ACTTTTGAAA TATCGTTTAA AATCAGATTG TTCAGAAATA ATCATATTAT CGAACTTCCC AACTTGAATC CGCTTTA.ATT TCTGTAATAT GGTGCAGATA CTT'TATTTCC AGTAAGA.ACA GATACAATAT 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 AACCTGAAAC TACTGATACA GAGATTGAAA TCAATGAATA TGCCCAGTAG CTAACAGCTG 'rTGGAGGAAG GAAGTATTTA ATAAATACCA TGACGATCGT TGATACAATC AGCG;CTGCAT AAGCACCTT'r TTTA GCT GACCAAGTAC AAGTCCCATG GAGCCATGAC AATGGAAACA CAA'N'?TCGT ACGACGATTG 7TTTAGAAA CAAATCCAAG AATAAATACA CCACCAAGTA AAACTATTGA ACCATTCGTA TGCAGATTTA ATATCTGAGT CCAATTGAGA ATAAACCTAC TGCTAGAGAT ACGAATTGTG TCTGACATAT TTTTAGAAAT GACATCTTGA ATATCCAATG TCCATGAAGT TGCAACAGAG TTCAAACCTG TTGAAA'rAGT TGATTGAGAT TCGCTGCCAA GATCAAACCT GTGATACCTA CI'GGTAACTG GTATGCAATA AGATTTGGTC TTGAGGGATA T'rGCTAGCTG CACTATCTGC ATTGTACT
GCTGCATAAA
AAGTACATAA
TGATAGAATA
CGTACAAGCC TGTACCAATC AAGTAAAAGA CTGTGCAGT TGCAAGTGAC AAAACACCGT TTGTGAACAA CATCTTA'PTA AGTTTCTTAA TATTrTCTGT TGTAGTAAAA CGTTGAACCA 216 AA'rCTTGAGA '1GAAGCATAG GAAGACAACA 'N'GTAAAGCC TGAACCCATC ACAA'rTAAAA AGATGGAG'rT TGAAAGCAAG TT'AGGATCGA AAAGrrrTC ATTTGCAGCA ACGAATTTCC CGT"=CAA TGTTTCTGCT ACTGCACCAA AGCCACCTr AATATTAGCA ATCAGTACAA ATAAAGCTAA AACGACACCA CTAATCAGAA CGGATTTAG ACCACCACTA TAAGAATAAA AAATATTGAT G;TCAATTCCT GTCAATACI'G TAGACATACG TCCCAATTGA TAAATAATAA TAGAATTAAA ACGTTTATCC AAGTAATCAT TAGGTAAGAT AAAACGAATT GTCAGTGGAA ATAAAATCCA GCTACCTGCA TAAGAGCTAC GCATTGTGGC AAAAATGGAT ACCGAAGTAA TCACACCTTG AATAAAGTCT CTCCATA.ATA CAATTGCAAC TACACCCATC AAAATAATCA ATAAACCAGC TGATGCGAGG ACAAGAGTCC TGAAATAATA ATGCCGTATC GATGTCTATC
TACATAATGA
CGAAGTGCTT
CGTGCAAAGA
TAGCTACTAC
CAGCGAGTCC
CATACCAAGG
AGAACTCTTT TCCT'rTCATC TCTTTTTTAG AGAAATAGAT GTAAATAAAC AATCAAGATA ATCTCCATAT TGAMTATT
ATTAAGTCAA
TATTATAAAA
b *b p 0 0 0 000 *0 0 0 0 0 **00 *000 0.00 00 0 0 0 CTGCTTGTTT TGCAACTTCC AAGTCACCTr AACCTAAATC AAG7TTTTCA TTTAGACGCA CCTTACCTGA TATCATCTTA TAGATAACTT TATCTAAATC TCGTTCTrGA ATCAAACTTT CATAAGTACC ACCAATACCA GCTTCTGCTC CTGGACCATT GAATACAATG TAATCTTCTC
TTATTGTAAA
ATTC"MTTCG
CTGCCAATGC
AAACTCTTT
CATTGATAGC
CATCCCTAAT TGAGCAAACC CAAGA.AGGAA ATCGGACTGA AACCGAACCA TCTCCTTTAA ACCTGCAACC AACACCGCAA TCC'rGTTGTG CCCATA.ACAT TGCTTGTTGA ATAAGTTCTG TTCTAAAGGT TGACGAACAG TGCTACAGCA TACATATTTG ATA7TTGAAGT TT'TTTAGCTG 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 CCAATTTCAA GAACAAATCT GGCATAACGC CCATCAAGCG ACCACCAAGA TATTG'rrCAT CACCTGCAGC TACAAACATT TGAATATCTT GTACAGGCAT AGAAGAATTT TTAACTCCAA TCACACGAGG ATTTTGACGC ATTGrrGCAT ACAAACTACC AGTCAACGCA ACCCCTGCCA ATTGTGGAAT ATTATAGATA ATAAAATCTG TATTTGACGC AGCT'rCACTC ATTGCATTCC TGAAATAAAT AGGTGGGATA GCTGCAA'rAG CCAA7'rCGAT ACTATCTTTC GTGTTATTAC CTTTAGCAAC TTCCATAACA GCTTCAATAA TACATTCACC TGAAGAACCA TTTACATAGA GTACCAGAGA 7"MTACACGA TCTTGGCTAA ATCCAGGGAT AACGCCTTTG TATTTAGTTA AATATGCTGC GATTGAATAC TCTGGCAATT CATCGACTCC AACACTTTCT GAATGTTTTG ATGCAATATG GTTGATAACT GTTAATTTAC TTTGTTTACG ATCTTCTACA CTT'rGGTAAA TACCTTTrTAC ACCTTTGTCA ATGAAATATT
TTTCACCATT
AATCTTCAT
TTCATCATAG CAAGCATAAA CAGATTTCTC CTTTATATTG TTTTGGACGT GTAATCGCTC TT7rTTTA' GATGACATTA ATAAATCGCT GAGCAATTTC CACCAATCAC TACACTGGTrA ACACCTAAAC GAATTrCt TCGGCAATTA CCGGAATATT ATCACGCTCA TCTGATTGTA CACTTGTACT ATCAACGCCT GArI-rAAATG CATAGAGACC CAArrGATTC GGATArTrT C'T1TATITTT ATATCN'GGT CTTAAAGTTG CATCAAATGC ATCTACTTCT TrCATCGTAG CACTAATATA AATTCCAATT ATTGGTAAAT CTACTACTTT TCCGCGAATG CCCACTGCTC CTGcaTCTAA 217 TATAAGCNTT T'1rrAArTGT WrTGGATAAT AAAATCACCC AATPTTTTCA rTAGTTCAAA TGTG7AACCI GATAATGTrG TACCAACAAA TrCATCTAAA rrACrACAT CCGCCATCAG TTrGATAAAT TCACTGACAA CTAAGCCA'rc AATGACTGTT GTI'CCGCATT CTACAAGflC TGGTrCrrGA GGTGGATAAT CCCTTTTGAT CTGAAT1GCT TrA.ATATCAC CCACAGAATT 4500 4560 4620 4680 4740 4800 4860 4920 4980 CCCATAAAAG GCATCAAGCT
AGCTCCTTTA
AAATTCTTCA TTATAAAGGG TTGAACTTGG CTATAAATT TATGCATAAT AGrrTGATTG ATAAGCAGTC TCTAATTAAA CATGATCGCT CGAAGCTAAT TTCTTGTAGT CATTAAAACT GTACAATATC AGTTTGACCT GTAAGCTACT CCACAAAATC CCATAAATCT CATCTTCATT ATACACGCTC AGCAGTTTCT CCGTGTAAGT TTTTCTCAAC TATATAATGC CAAC'TTCT TAAAACCTTT AA.AACCACAT CCTATTrCGCA CAATGCTrTA TAGCAATGTC TT'rrrCAGCA CTTCACCAGG 'rAAACCTrGA rrrCrrTAGTr CCAAATTTGG TAATAATATT GTCTCTCTGG AGTATTGGAA ACTGAGGTGA AACAATAGT'r CATCAAAGAA GTPTTAGCGC CTTTATCTGC GAAATCGATG CTCCAATGAC ATATCCTCGT CTCATAATAC 'rCrrarAAAG CAAGAACAGA ATCATaTCAG CAATACGCTC ATTTCCTCAT AGTCGGATAA TTCTCATGAA TCATCTCTTG TTTTTCCCAA ATCGAGTCAA GATGAATAAT CATTCAGAGG TATOCCATAT TTGGTAACTT CAAGAAACAA TCACTCCACC CTCATTTTAT TATTCCTCCT ACTTTCCAGA TAATTAGAGA TATGCGATTG CCATACGAGA ACAATCTTCT TCGTCAAATT AGCTTTTTGT AGACCTTCTA AAGGCAATTT TCATTAAGTA TTCACCAATC ACTCCGAGAC ACTTCCT7-A CCGTAGACAT AAGTTCAACT TCATCAAGAA AACTTrrTCT G=rGCCTCTG GTATT'rGAAA ATGAATTGTC TGTTGCTrTG GATACATTAA TTCCTGflTr' AAGAAGAArr ACCTTCTATC ATTGGAATTA 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180
AACGP'PTACA
GTTCTTTTTG CAGTAACATA TCAGCTCCTT AGTTGAAGTA TAACACTTTT TTTTTTTrTC AATATTTTTC ATAAATTAGA AACTAGTTTC CCTTrCATAA CAGAACAACA AACATAAAAA TATAATAC'TT TTrATTCT'T TATATGTATT GTAACAACGT r1'ATCAC'rAA TAATATGTTC ATATTAAAAT TA'TrrTATrr TCGTTTTATT ATPTCTTTTC GGAAfl'TCTA TATAATATTT
TTCTTTATTT
CAATrrCT
TTATCGTAAT
ATTTTACTAA
TATTTCTAAA
AAAATTGAAA AAATATTTCT A~rAAAAGAG AATCCCATAA AAAAAGCACC AAACTATAAA TAAGTCAGAT TTATACCGCA AGCTAGAATG GTTCCTGGAT AATATCTAAT TTTCTAACCA ACTCGCAT ATTAAGAACA TA.AATTTG~r 'rGATTCGTGC CAGTAGACTA GCTAGTCCAA CACT'rG'TTTr CGTTGACCAT CTGTAGTATA GTTAACTCAC GACATTTACT 'rGT'rGGAATA AGTTrCrA
AAACTACAGA
CTAAAAAGTT
CCATACCTAA
GATGTACTAA
AGTTCCATAA
ATAAAAATGA
218 TTTTATA'rAG
TTTATGAGAT
GTAATATATT TTATTTCTAA AAATCAGGTC ACCTATTrrA CCACACCAAA TGTAACCCCA TACTTCCCCA AAACATT'CCA AGTGAAACGr ACAGACACCA GGCAAATAAA ACAC7'GTCA AAGCAACTCG AATTTCACGA TACAGAAATT CTTCAACCAT AAACCAAGGA ACrTGATG~r GAAGGCCAAT TTCCTTGAGC ATGAATCACG CTAAAACATA TACCAAGGCA TTTCATCCTA GTTCATAT ACATCCATAA AAAAGAAAAA AGAGACGCAC CGATACAAAG AAATTTCAAT AAGTATAGAG TATAAACTGG AATTATTCTT TTCATAGTTA
GACTTATAAT
TGACCTTGAC
CATAGAGAAC
ATACCAATAG
CCI'CCGAAAT
AAATCTTCAT
A'rTCGTTGTT
TAGTGAAACT
GCCAAAATAA
GATAAATAAT
CTAGCCTCTT
CCGATGGTTA
TCACGGTTCT
AATCTAAATC TAATATCrGC ACAATCCTT CTACCCATGG ACTTTGAGGC CCATCT'rGTA GTGGCGAATC TTTTGATATA AACGATTCAA TTCACTTGGA CTCCCGCAAA CATTTTTCTG GTTAACTCAA TCCAGCTGAT ATTTC'rTTCA TGGACAAGTT CTCCCAAAAT CG'N'CAGCCA TATTrCTTCT CCTTTAG1'TA 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 GTGTTTGYCC CATGTAAATC AATTrGTTTC-G TATCTCTTGG CCAAATTCAG ACTTGGATAA ACCCGCTTAT TTGAAACCAC GTTCAGGAITr 7=rAAAAI'T ATCTCAACGA AATCCGTTAA TAAATCGTAA TAAATTGGGA GATAA.AAACT CAAAACAATC
CAATAGAGCT
AAAAGGAAGT
TCTTAGATTG
'rGAAGAATAG CTCATCATCT CAATTAATTT GTCCTTTGTC ATr'rCAGAAA CTGAATGACA AGATACCTCA ATGCCATAGT TTTGGAAGAA GTCTAAAAGA AGT'IGATTTC TT'rGGCTATT TTTACTTAGA TAGAGATCAA TCATGGGAGA CCTCCAACAA ATTTGCTTCC ATTTGATATr CTGAGACGAkT TAAGGAATCT AACAACTTTG AGAAGTTAAT CGATTTCTTG TCTTCATCAT AAGCTTTTAC AGTTACTTGG GTTGTAACTA TCCCCTCTTT TCCCTCGGCT CGATAGTCTT GTCAATATAA AACAAAAACA AGATTCTGAT TATCATCTAC AAAGGCATITA ACTCCGTTCT TTATATCCTG ACTTTCAAGG AATTCCATAA CGrTT7TGAAG ATAGGATITCA TAAAATAGTG GGTAATTATG TT=ATGG TAATCATCTA AAAATGTTAC CTCAAACTCA CATGGATAAT TGGGCATCAA AAATATwrrGT TCATCCAGCT GTr'rGATTC TGCATCATCT AA'rT'CTGT1'T C'TAATTCATC ACAATCTAGT ATTGATTCTT TATTTAATGC q=TCT TT'CC'CTATT TCTT'rTAAT'r 219 TCTTTGCGAT TGCCGCAATC ACAGGAACGG rrACACTATT ACCAACrrGT TATAGAGCT GACTAT'rAAT AGAGAC?~?= CTAGCAGCTT CAAAAGCCTA ATCAGGAAAG CCATGCAATC GAAAACACTC ?I~rAGGAGTG A1'TCGTCGTA TTCCAAACG GTAAAATTGT CCATCTATTA AAACACCAGC TAC7GT=AA AC'rG=rAT CTTCTCCTTC ATAGCTAGCC ACTACTACTC CCArrTGACC ACTAGTTGTT AACGTATTAG CTATACC~rr TCCAACTCTA CCACGACGAT- ACTGAGAACT TGGTCTITCT AAAT'rGATTG AATCCCCAAT CTCTCCTTGA GCATATCCrT TTTTrCGTTGC TTCCCGTACT TTTAGAAATT GGATTGGTTC TGGAATTAGT ATTTTGGGGA TTTATCTCC TCCTTGCATC GTAGTCAGTG GACCTGTCTC CTTAAAGCTA GTCGGTAAAT GAGTATI'AA AGTAAACATC GGCTCTTGAT
TTGGAGATAA
CTCCAACAAC
TrrCCTAAA GCCCTCACTT CCATAGACAC CACAATGCCA TAACGATCCT GCGTCTCCCA TTTTGTCTCT TGTCTAATCT ATCTGGTGTC ATACAAGGAA TCGCAACTTT AAATCCTTCT CCTTTACCAC GAACTAAGGT TGGCGCAAGA CCTrCTGAAT AATAGACTTT ACCGCTCATT CCACTTCTTG ATGGATTCAA ATTTCCTAGT GCT'rTCAAAG TCTCAGAGT1' AGTTGCTTGA CCTTCTCGTC TGAAAGGAAA TAAGAGTCTG GTACCT'rTCT rTCTAGAATG TCCGATAATA AACACCCTCT a a. .a a a a a a CTCTGTTT GGGAACGCCA CCAACTCATC AAGTGTGGTA AAATCCTTrAC TGTTAAGCAC CTGCCACTCA ACATCAAACC AGTATTGTGG TGAACGTCCG TCCCTTATCG TGATTGAGTA 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 GGCC-TTTAAC ATTTCAAGA AAAAGAAAAC G'rGGTTGGAT ?TGTTTGGCC GCCCGAGCAA TTTCAAAGAA CAAAGTTCCT CTAGTATCTT CAAATCCCAA TCGTCTTCCT GCGATTGAAA ATGCTTGACA AGGGAATCCC CCACAGATGA CATCGACTTT CCCTCTAAGT 71=rAAATT CGTCATCTGA AACATCTCGT ATGTCATGAA ATTCrATTTC TCCTTCCGTT TGAAAAATGG ACTTATAAGA TTTCCTAGCA AATTTATCAA GAGCTTCCAT TCCCATCCTA AAGCCTCCTA TTCATACCTC TCTCAACTAG ATGTAACTTA TCCTCCTCAT GAGGTCAGTT TTACTTTCTG TTCCTCAAAA GGGCAGACTC CTCCCTTGGT CTTTAATGCA 'rCATTAACGA CGCTTTTCTT AGG7TTGACTT r'rCTAATCCT AGAATAAAGT TCTCACAAAA TCCCAAGCAC TCATGCCCTT TCCCAGCAAA TAAATCTAAA ACCCAAATCA
CAAAACCCCT
CTGTTCCAGT
TCGTCACACG
CTAGGTGGTT
GACCTCATGA
ATCGTTTTrTC AT'rT'rCAT
GCCACTTTCT
CTCGCTAGAT
CTCGACTGTT
CATAAGGAAC AGGAAGATTC GCTGAAAACA AT'rCGGA-ATA GCCATAGAGA a. a.
a a CTAGACAATT TGAGGAGCTG CTTGCGTCCT GTTCGAACAC ATTTTCCTAC CACGTGAAGA AAAAGATGGC GGAAGCGTT GATTGTTAAA G'I'TTGGAAGT CACCTCCAGC TAGATGTTTG 220 AGAAAAAGAT AGAGATTGTA GGCGATACAG CTCATCATCA TACGAACTCG ?rI-rGATT-A AGGTTGAACT ATCCGTNTTA TCGCCAAAAA ATCCCTCCTr CATCTCCTrG ATGAAATTCT CGGCTTGACC ACGrCCACGA TAAAGCTGAA ACTGGTCTTG GCN'TTCCG GTACCGA INFORMATION FOR SEQ ID NO: 11: Ci) SEQUENCE CHARACTERISTICS: CA) LENGTH: 8148 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ 10 NO: 11: CCGTGGAACA AGCCAAGACC AGTTTCAGCT TTATCGTGGA CGTGGTCAAG CCGAGAATTT CATCAAGGAG ATGAAGGAGG GATTTTTTGG CGATAAAACG GATAGTTCAA CCTTAATCAA AAACGAAGTT CGTATGATGA TGAGCTGTAT CGCCTACAAT CTCTATCTTT TTCTCAAACA 9780 9840 9897 120 180 TCTAGCTGGA GCTGACTCC AAACTTTAAC AATCAAACGC TTCCGCCATC
TTTTTCTTCA
CGTGGTAGGA AAATGTGTTC TGCCTATTCC GAATTGTTT TCCTGTTCCT TATGAACCAC CGAGATGAAA AAATCGTGTG CGAGGAAAAA CGATACTGGA TGGCTCATGA GGTCAGGGGT GGGTAAATAC AATGAGCTTG TCAGATGCTG GATTATACCA CTACCAGACA GATGATCGTG GAACAGGACG CAAGCAGCTC CTCAAATTGT CTAGTCTCTA CAGCACTTTA T'rCTAGGATT AGAAAAGTCA ACC'rGAATCT CTAGAAGAAA AGCGTCGTTA ATGATGCATT AAAGAACAGT ACGAACCAAG GGAGGAGTCT GCCC~rTTGA GGAAATCTAG ACAGCAGAAA GTAAAACTGA CCTCATGAGG AGGAAGAAAG TTTGTAAGTT ACATCTAGTT GAGAGAGGTA TGAATGATTT 240 300 360 420 480 540 600 660
AA.AGAAGTAG
TCATTGCGCA
AAGTGGAAAA
CAAACTCACC
TGAGAGTTTT
TGCTCTGGCT
GGATATTGGT
TTTAGTATAT
TTGAATAGTG
AAGCGCCAAT TCTTTGAGAA GAAATCATCC GTCATT~CTGT TTTGAAGTGA AAAATGATGA GTAGGTGAAA AATTGTGCCT GATAAAATAA ATGAGAGAAT AATATGTTAT AATAAGTATT AACAGACAAG CTGATTCTGT TATTAAGCGA CGTTGACGGA ACAAAAATGC GTGGAAAATG GATTCGCTTG CAGTGCTAGA AATAGGCATT AQTAGGAGGT GTTTTAGATT GGAGAAGAAA CAGACCTCGA AAACAACCGT GTCATTPTAC GAGACACGTG AAAAGATTGA AAAAGTTATT 720 780 840 900 960 1020 1080 1140 1200 1260 CTGACCATAA AAGACATTGC CGAAATGGCT CTAAACGGGA AATATGAAAA AATGTCCCAA CATGAAACAA ATTACAAACC GAGCATTGTT GCGCGTAGCT TAAACTCCAA ACGAACAAAA TTAATCGGTG TTTTGATTGG TGATATTACC AACAGTTTCT CAAACCAALAT TGTTAAGGGA ATTGAGGATA TCGCCAGCCA GAATGGCTAC
CAGGTAATGA
ATGCTTCTCT
TCTCGTATCA
TAGGAAATAG
TGGGAGTAGA
TCGATGAGAA
a a a a CACCGGACTA GCTGGGrrAA TGTATCGAAA AAGGTTATGA ACTCGGAT'rC AGCGGGCAAG GCCACTCTAA CCATTGAAGA AAAGAAATCG ATCCCGATGA CTAGTCTI'TA CCGTTATCAA TTTGACAATA CGGAGTGGAC TCCTTTGAGG AAGGACAACA CAAGAAGAAA GGCAACAAGT AATGAAGGAA AATCACTTGC C'rAGGTGGGA TTATTTGCCT GTTTTTGATC TACTrrCTTTA TGATGGTTAC GACTAGGAAT GAAAGGATAT GAGGAGAAAG T'rTCTCAAG TCCTTTATTG GAGAAATAGC TCCTrGAAG'rA 'rCAGTAGTCT AGCTAGTGAT GCAATACGAA GTGAGTCGAC GTTTTAAAAA TATG'rrGT AGGGTATCAA TGAGTGGAAT AGACCAGGAT AACGTAACTG TTGATTTP'rC TTAG7'MTG GAAAATAGAA CCGAGAATAC TAA'rTACAGC
CGGCTTTATT
AAAGAAGAAA
AACCAATAAC
ACATTTTCIC
TGGTT=rGG
TAAGCATACG
AAAAACTCTG GTATrrATCC AGAGTTGAAT TATAACTTGC TITGcTTT'rCT TCTCCAACTG GGCTrACAAAG ATTTTGATTG CTrGGA'IrGT AGTGTGAATT AATCTCTGTT AAGAAATAAA ATGAAATGAG AAATTATGGG CAAG.AGAGTG AGGACCGTGTA TATTGAAAGC
ATTCAGCCGA
ATGGTCTTTT
TATGATGCCG
TTGATTACAG
GATGCTTTAA
AATTTGGAAC
CC'TCTAATrT CCGAAAATAT TTGATAGTCA GC'TCTATGAA TTTATGACAT GACCCAGTCC CGGATACGAG TCGTTTGAGT CAGATGCTAA TATGCGTCAC AAATTAAGGA A7TTTTACAA C'rAACTGTTG GGCCCTACCT CACAAGTTGG GTTGATTGGT .TTCGACGCT GGTTCAGCCC ACCAGATTGA AGGTCGCAAT GGAAAGAGTC GACTTrCTAA ATAATCCCAC CTAGAACAAG AGCAAGCTCC TAAATCAACT 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100
ACTACTTGAT
ATTGAAAATT
AAGAGGGCTG
AGCGCAGGA.A
AGGGCGAAGA
TTCATAAGGC
AATATGTATC
ATTTAGAAGA
TATAAAAAAT
Tr'TCTTTrTCT
CALATGCGACG
AAAAGTTATA GAAGTAGGCC TCCATTGGAC AGGGTT1GG'N' CGTTGAGGAC AGGTATCCGT GAAAGAGGAG TAGGAGTAGT AAAGGAAAAT ACTGATAAAA ACCTCCtAAT CCTGGTCTTT ATCACTCCGA AAAAGA.AAGC GAACTGATAA AAT-TGATT ATCACTGTTC CATAAATCGA TTTTTCATGA GTTTCCTCCT GGAGATGAGG AACTGTA'rGC AAAC7TGA-AA 2160 AAAAGTTGTG 2220 TrTGATTGTA 2280 AAAACTGTAT 2340 ACATGAATGA 2400 =TAGCTCTT 2460 TCCCAGTATA 2520 TTCACTTGTT 2580 ACCTCCTTC 2640 AATCCTCATC 2700 TCGCTCCGAA 2760 AATGAAAATA 2820 TTGATACACC AT~wrCTTATA GTGAGAAGAG GTCCTGACCT TCATCTATGA GTATCCTGAG AAGAGGAGTT ATAAAAAACA AAAGAACAAA CCTGCTTTCA GACCTGGGTA GTGTAGTTGC TTGC-,TTCTT CATATCTGGT TCAATGACTG TGATGC&rGT TTTTTTCATT TGG'ZAGGTGA
TCCATAGACC
TCTCATTCAG
CATAGCCAGA
2880 2940 3000 222 ACCGATGAGG GCAATCACTA AAATCAGAGG ACGATAGATT TTATAGGCC AGAAGGAGTG GAATAAGATT TCCGAAAATC AAAGACTTGG T1'CCCAATAC TATCGGCCTC ACGCCGTTTG AA'rACCGTAT GTGCGTTTGA TCAGTTTTTC AGTGA.AGGTT CCTTI'?TAA AAATCTTCCT CCCAAAAGAG ACTGTTGAGG GAGATTrGAGA CAGAGTTCCA AGGTTGGATT GTACTTGTCG CTGTCTCGAG ACACCGATAT CCTGGCGAG TTCGAGCTGC AGAGCCACTT CTTGAGGGTA ATCAGATAAA AGAGGATGAT 'rATTCGCAA GGGGACCAGA TCT'rT'TCA TGAGTTTGCT TCACTTTGGA GGCTGCGGGC =rTCAATCA TATTGATACT GAAATACCCA ATTCCTTGCG AAATTCTrTrC ACACGATTCA TCTGTCTCC 'ITITCTCATT'r ATGTCGTATA TATTTGACTA TATTATAGTC 7"'T"AAACAT AAAGTGTCAA GTATI'TTTGA CATATrT? GAAGAAATAG TAGTCTCCTT GTCCTATTTG TCTGACAAGT GCAAGCTCGT TAAGATATGA CAAAAGAATT TCA'rCATGTA ACGGTCTTAC CTrGACGTAA ACC~rATGG 'rATCTACGTT GATGCGACTT GAGTATTTAT TAAGTAAATT AAGTGAAAAA GCCCATCTC'r AATGCCATTG ACAATGCGCA AAAACGCTT'G GCACC1'TACA TTTATCAAGG ACAACTTCCG TCATTTACAG GCATGTTTGC ATTGATGGAA TT'GTTATGA CTTGGGAGTG TCTAGTCCTC CGGATTTGTG GTAAAATAGA TCCACGAAAC GATTGATATG TGGGCGGAGC AGGACATAC ATGCCT?1'GA CCAGGATCAG 'rTGAGAAGGG AATGGTGACC GCGAAGCTGG TGTTCAGGAA AATTAGACCA GCGTGAGCGT Ct
3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 GGT;I'TMCT-r ATAAAAAGGA 'rGCGCCACTG ACAGCCTATG AAGTGGTGAA CAATTATGAC TATGGAGAGG ACAAATTCTC rAAACAGATT AAGCCGATTrG AGACAACGAC TGAGTTAGCA GAAC'rCAAGA AGAAGGGGCA TCCTGCTAAG AATGATGAAC TGGGAGCGGC AGATGAGTCC GATGGTAGAA TTTCAGTGAT TACCTTTCAT TTCAAGGAAG CTTCAACAGT TGAAGTTCCA AA GCCCAAGA TGGAATTGGT GTCCCG'rAAG GCCAATAACC GCTCGCACTC AGCCAAGTrG
GACATCCGA
TATCATGACT
GCGCG'rAAGA
GAGATTATCA
CAGATTTTCC
ATCCAGCAGG
TCCTTAGAAG
AAAGGCTTGC
CCAATCTTGC
CGCGTGGTCA
TGAATCAGGA
TGGTTCGTAT
TTGAGCAAGC
AGTTGGTCAA
AGGCTATTCG
CTATGGATAT
TGCTAGCCTG
TTTCTTCAAG
GCGTGAAGTC
ACCTGCCAAG
AATTGAAGTC
GTrGGCTCTG ACCGCTDGAC CAAGCAAr'.G CTTTCATCCC AGATGATCTC CAAGTGCGGA AGACTTAGAA GAAAAATTCA CAAGTAAGAG GGAAAAAGAT CGCAGAAAAA ATGGAAAAAA CACGTCAAAT ACTACAGATG CAACTTAAAC GGTTTTCGCG TGTGGAAAAA GCTTTTTACT TrTCCATTGC TGTAACCACT CTTAT'rGTAG CCATTAGTAT TATTr'rTA'rG CAGACCAAGC TCTTGCAAGT GCAGAATGAT TITGACAAAAA TCAATCCGCA GATAGAGGAA AAGAAGACCG AATTGGACGA TGCCAAGCAA GAGGTCAATG 223 AACTATTACG TGCAGAACGT TTGAAAGAAA TTGCCAATTC ACACQATTG CAATTAAACA ATGAAAATAT TAGAATAGCG CGACCAAAAA TCGGAAATCO TATTATCTGT CrG7"=T GCACTCGCTT TGGAACAGAT CAGTTCCTGC CAAACG1'GGG CAACCTCCTA TAATGTCTAT TTCTTTACGT AGAAAAAACA ACATGGAAGA ATCCTATGTA GAGTAAGATA TGAAGTGGAC AAAAAGAGTA ATCCGTTATG CCGGCT.A.AA ACAGACGCAG
GCCATTTT
TTAGCGAAGG
ACTATTATG
GCGGTCATTG
CAATTTAPACA
AGAGAGCAAC
TAG'TCAATTT
AAGCTAAGAA
ACCGAAAT1GG
ATGAGAACTA
ACGTTGCAGA
TCTCGCAACC
AG7'rGGAAAA AGTCTGAGTT TGCGGTCA'rT AITGGGACAG Gc7rCATCAA ACCACCCGTA AG1'CCCGATT GCTGAGGATG TAAGTC.AGCA ACGGGTAAGA GGTCTITrCAT AAGTATCTG TAATCTCAAG CAACTTTCCT 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 'rTGGAGCAAA GGGAAA'rGGG ATTACCTATG CCAATATIGAT GTCTATCAAA AAAGAATTGG AAGCTGCAGA GGTCAAGGGG AT'rCATTT'rA CAACCAGTCC CAATCGTAGT TACCCAAACG GACAATTTGC TTCTAGTT'r ATCGGTCTAG CTCAGCTCCA TGAAAA'rCAA GATGGAAGCA AGAGCTTGCT GGGAACCTCT GGAATGGAGA GTTCCTTGAA CAGTATTC'rT GCAGCGACAG
ACGGCATTAT
TTTCCCAACG
CCTTTATGGA
CGACrGGT
ATGCAGATAC
GTAACTATGA
ATACCTTTCC
TACCTATGAA AAGGATCGTC AACGATGGAC GGTAAGGATG AACCCACATG GATGCTTTTC CAGTGCTAAA ACAGGGGAAA TGGGTAATAT TGTACCCGGA ACAGAACAAG TTTATACAAC CATTTCCAGC AAGAGAAGGT AAAAGGAAAG TTCTGGCAAC AACGCAACGA AAAAGAAGGC ATTACAGAGG ACTT'rGTTTG GCGTGATATC GCCAGGTTCC ACTATGAAAG TGATGATGTT GGCTGCTGCT AGGAGGAGAA GTCTTTAATA G'rAGTGAGTT AAAAATTGCA CCCCTCCAGT 5700 TACATGACAG 5760 CCGACCTTTG 5820 CTTTACCA.AA 5880 A=ITATAATA 5940 GATGCCACGA 6000 TTTTCTCAAG 6060 GGAGATGCTA 6120 GGTTTrGACGG 6180 AGCTCATTTG 6240 TTCGAGATTG GGACG~rAAT GTTrGCACA CTCAAGTAAC CCTGGCTTGA TTATCTTAAT ATGACTATGC TGGTCAGCCT GACAAGGGAT TTCAGTGACC ACGGTGTC-A' GCTGGAGCCT CTCGGAAATC TCAAAAAGAA CTCCGACTAA CATGGTTTTG CCACAGGCAA GCCAACTGTA GAAGGATTGA CTGGTGGCAG AACGATGACT GTTGGGATGA CCCTCCTTGA GCAAAAGATG C01'?NAAAT TTGGAGTTCC GACCCGTTTC CCTGCGGATA ATATTGTCAA CATTGCGCAA CAGACGCAAA TGATTCGTGC CTTT'ACAGCT ATTGCTAATG AAA?1'TATTA GTGCCATTTA TGATCCAAAT GATCALACTG A'rTGTGGGAA ATCCTGTTTC TAAAGATGCA GCTAGTCTAA GTAGGGACGG ATCCCNTA TGGAACCATG TATAACCACA ACTGTTCCTG GGCAAAATGT AGCCCTCAAG TCTGTACCG 6300 6360 6420 6480 6540 224 ATCTAGTCGG C?1'AACCGAC TATATT~'= CTCAGATTGC 'rGACGAGAAA AA'rGGTGGTT CGGCTGTATC GATGAGTCCG GCTGAAAATC AACCTGAACA TTAT'rCAGGT ATTCAGTrGG CTTCAGCTAT GAAAGACTCT CTCAATCTC GTCAACAAAG TCCTTATCCT ATGCCTAGTG AAGAATTGCG TCGCAATCTT CACAACCCA AC.AGTTCTGC TGAACAGGG AAGAATCTTG ATAAAGCAGA GGACGTTCCA GATATCTATG CTAAGTrGGCT CAA'rATAGAA CTTGAATTTC
CTGATTTTAT
GAGAA'rTTGC
AAACAACAGC
TCAAGGATAT
TCGTTGTGGG
CCCCGAACCA
GTTGGACAAA
AAGG'rrCGGG CT'rGTATGTG ACGGTCCAAC CAATCCTATC TTGGAGCGGG TAAGGCTTTA GACCAAGTAA
AACAGGAACG
GCAAGTCCTT
GGAGACTGCT
CTCTACTG'rG
GATTTAGCAG
AAGATTAAAA
A'rCTTATCTG
GAGACCCTTG
CAGAAGCAAG
ATGTTCGTGC
AATATG'rrTA
CCGCCCTTTA
TAACACAGCT ATCAAGGACA TT~TCCATCAG TGCTGGAAT TCCAATTTTA TAGAAAGGCG TTAAAAAAAT TACATTAACT TTAGGAGACT GTGACATT T TAC'rAACTT'r AGTAGAAAT CAAATTACAG GCCAGCACAT GTCAAACAGC ATCAGGCAAA ACTTC'TGTTT TGGTTGCTT'r GGAATGATTT TGT'rCATCT AAGGTCTTTC GI'AAAATCAA CTAGGtGGAG TTATCTTCTA GG'N'ATCCAG TTCAT'rTGGG TTTCAAACG CAGTAAACTT ATTAGTTTGT CTGCCTATGG GTGATTCT'rG CCATGATTGG AAGGTCTT'rA TGGGTGATGT ATGGCTCTCC ACCAAGAATG ACTTCTGTTA TGATGCAAGT ATGACGCCTG TACATCACCA AGCGAGTGGA AGGTTGACTT AGCTGGGACT CCTACAATGG GAGGTTTGGT CTTTTTCGCC CTATTTAGTA GCCAATTCAG GGTCTTGTAT GCCTTCGTCG GATTrlrTAGA TGAGGGGCTT AATCCTAAGC AAAAATTAGC
GCATGAGGAT
TTTCTGATT
CAATAATGTG
TGACTTTCTC
TCTTcAGCTT 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8148 TCTTTTCTAT GAGCGCGGTG GCGATATCCT GTCTGTCTTT ATTTTTCTAT ATrTCTrCG CTCTTTTCTG GCTAGTCGGT GACAGACGGT GTTGACGGTT TAGCTAGTAT TT'CCGTTGTG AGTTAT'rGCC TATGTGCAAG GTCAGATGGA TATTCTTCTA TGGTTT-GCTC GGTTTCTTCA TCTTTAACCA TAAGCCTGCC GGGAAGTTTG GCCCTAGGTG GGATGCTGGC AGCTATCTCT GAC'TCTCTG ATATCGGAA TTGTGTATGT T'?rGAAACA CAGTTATTTC AA.ACTGACAG GTCGTAAACC TATTTTCCGT T TTTGAGCTT GGGGGA'rTGT CTGGTAAAGG AAATCCTTGG CTTCTTTTGG GGAGTGGGAC TTCTAGCAAG TCTCCTGACC CTAGCAATTT TATATTGAT GTAAGAATGG CACCCTGATG TT'rCAGGG INFORMATION FOR SEQ ID NO: 12: SEQUENCE CHARACTERISTICS: LENGTH: 9909 base pairs TYPE: nucleic acid 225 STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: TACrCCACCC TTAATATCCG 'rTCCTGTAAA TACTTTACCG CTTTTAAGTT CATAGAATTG AACPTTAAA TGCTTGTCTr CAAGCATCT TrCCA'rCCAA TTTTrAGGAG ITrTGACCAGC TTTAAATAAA AACCT'rGCTG GGGTGAT'rAG TATAGATTTA TCTGCGATTr TATAAGCTTC ATCAATAAAA TAGTGATATA TCGGCTCATC TCTGGCTTCT CCTGT'IrCCT GATACGGAGG ATTTCCTATC ACGACATCAA ATTTCATTTC ACTTTCCTCG CTAGATAGGC CCTCAAAACC TATCATTCTA TTCT=T'CC AGTCTTTGAT TTCTAGCTCA TCCGCAAACA AACTCAATTG ACTACTT'rTT TTCAATCCAT CCATCTGAAA TTTCTTTTGC TC'rAATGTTG GTTGATTT'CC
ATGGGTTTTA
TTGAGATTGC
GACATTGTAA
AGTCTTAGCT
GATTCTTCTA CTTCTTGGAC TTTTGTrTAG CTGAATAAGG GAGATAATAG rCGCAATTTC AGATAATAGT CCTCAAAAGT TGCCAAAAGA TTCTCACGCG CCAAAAGGAG AGAATCTCCT TGATACTCAT AGCATGATAA GCATCTTTTA CAAGTTTATA AAATGTGACT TCATrGAAA AATCCGTTGC AGTrTrCTAT CAACAAAACC AACTCGCTCA GATAATGGAA AGTTACGGTA TCATATCTCG TTACCATATA AGGTGCTTCA CCACAAGTTA TCGTAAGTCC ACATACTCCT CAAGACTTAA CGAGCCTAAT TTCGATTCTA TTGCTTTGCG ACCAACCACG T'TGGTGTAAA CACTTCTGCC CTrATTTTTG TTGTTCATAT TT-GGATT'N-r CAGATCTGGG CTGAATCAAG TTGGCAAAGT CTTACTTGGA TTGATGCGAT CACTTGGAGC AAATCCCTTT CCTAACAA='
AACCATACGA
CCTCACGACT
TTTCCTCACC
CCTCTAACCA
CATATCCAT'r TCCGATCTTr TTCCAGTAkAC
CATAAGAATG
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 CGTAnCCCAA ACAATTGATT TCTTTGTCGT TCGATCTTTT AAAAGAATTT TTAATAAGTC AGCCGATTCT TTAGCCAAAC TTTCTTCACT AATATCTATT GTCATCAGCA ACCTCTCTTA TA'rrGTAAGC CCTATTATAT CATATTTTAA AGAATGAAAA TTTACTTGAA AAAAGTAATT CAATAAATA'r CTCTCCGATG ACCAACTTCT AGAGTAGCAA CGACTAATTC ATCATCTACA ATTTGTACGA TA.ACTCGATA ATTACCAATT CTATAGCGCC ATTGACCAAC GCGATTACCA ACCAAAGCCT TTCCGTGTCG TCTTGGGTCT TCCAAAACAT TGCTTTGTAA ATAGTTTGTA ATTAGCTTCT GCGTATAACG GTCCAATTTT TTCAATTGCT TGATAAAACG TCTTGTTGGA ACTAAT'rTAT ACAAATTATT CATCCTTCAA GCCTAAATCA TGCATCATTT CTTCCCAAGT AATGGGTTCA ACTCCTTTTT CCAAGTCTTC TAAATACTCT TGATAGGCTA AATCTGCCAC 226 ACGAGCATCG TATTCATCTr CTAGGGCITC AAGACr=r GTGCGAATAA GICCGAAAG GGAAACTCCT TCAAACI'AG CCA7TGCN CATAAATIGrr TTATCAGCTT CAGAAACT TAATGTAATA GTAGTCATCT rTTCTGCTCC CTrT'TAAT GGTAACACCA rGTATTACT TTTTAGGTGT TCAGTCAATA TAAAAAGAAC ACCTTCTCAG CGTTCTTTCT ATA'rCTCTGT CAATCGTGTr GCGGTATCTG GTGAGGTATC A'rAAACCTTA AAGTCTACTC CGACTCCCAG ATCAGC1TTGA GCCAGCTGAT TGACCATGGT CATATGAGCC AGrTCCTTGA TA'IrCT~rTC CTTAGATAAA TGCCCAAGGT AAATCTrCTT AGTACGATr CCTAGCGTCC GAATCATAGC
TTCAGCACCG
CCAAGCGTAA
ATCCGCATr'r GACAAAAC'rC
TCCTCGTTAG
GAACCTGATC
TCGACAATGC
'PTATCATCCT
AAAGGTGACC AAGGTCAGAT AGr.ATTCGTT GCAAAATCTC TACATCATGG TTGGCCTCGA CCGCCATACG GTCACTCACA TAACCTGTAT
GTTTGAGTCG
TAAGATAACC
CTCTCAAGAG
TCATAAAGCG ATAGAACTGC GGTGCGACTrG CATCATGGCT TACACCAAAA CTCTCGATGT CGATATCTCC AAAGGTTG GTTTACCCA =rTAAAAAT ATGCTTTTGC GAAGAATCCA CCTTGCCAAG ATATTTACTA TTTTCCATAG CTTGCCAGGT a.
a a a a a a. a a CTTTTCATrG GCATAAAGAT ATGATCTGAA TGCTCATGGG CCATACCATA CTTGCGAGCC TAATCAAGAT GGCA'rCCAGG AAAACGCCTA CTCCATGGAT TCrrrCTGGCT TACGGTTAAT 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 TTCACCTAGC AGACTGGTAA TT'IrCTTGCC AGACAAGCCT GCATCTACTA AAAGCTTCTT 7'rMrGAGGTT TCCAGATAAA AAGAATTTCC ACTGGAACCC GACGCTAAAA TACTGTATTT AAAGCCTAT'r TCACTCATTC TAGTCTTCTA CTTCATCCTC CCATACTTCT TCTTTCACTG CATCCTTATC ATAAGGGAGT ACAATGGTAA AGGTTGAACC CTTGCCGTAT TCACTCTTGG CCCAAATAAA GCCCTTATGT TGTT'TGATAA TnTC=TAGC GATAGACAGT CCTAGACCTG TACCACCTTG TGCACGACTT CTAGCACGAT CCACACGATA GAAACGGTCA AAGATACGTG a. a.
a a a GTAAATCCTG CTTAGGAATC CCCAAACCGT CAGT'rGTCTT CATTCGACA GTGATTTTAC TTAAAATATT GTCGACAACC TGCGTCATCT TGATGGGATA ATCTCTCACC AACTCA'rATT CAAAACGATT GAGGATAAAG GTAATAAAAG GACTGGTAGC ATTATCAATA CGTGAAAGAT GGTCAGAAAT GGATAAAATC ATCTGGTCTT CCCCATC 'GG CGAATACTTA ATAGCATTAT TATC'rGTATC AATTTCCATC CAGATAGAAT 7T~CTCCTT TTCCTGTCCT TTCATCTTGT CAGTGAAGTT AATCAGTTCC ACATC'rAGGT GGAGGAGATC CGTCACCATG CGCATCATAC GGTTGGTCTC ATCAAGAGAA ACCTTGATAA AGTCTGGTGC TACAGTTTCA CACAAAGCCC CCTCATCCAA GGCTI'CAAGA TAGGATrA CCCTAGTCAC AGGAGTCCGT AACTCATGGC TAACAT'rGGA AACAAAGAGT CTTCGTTCGC G1-rCTTCCTT CTCCTGCTCC GTCGTATCAT 227 GCAAAACAGC CACCAAACCT GA .TAAAGC CAGACTCTCG ACGTATCAAG GCAAAGCGAA CTCGAAGGTT CAAATATTCG CCATTGATAT CTGGGAATC TAGCAACAAT TCTGGACI-r GGGTAATCAA ATCACGCAAT TCATAGTTr? CrrCTATCTT GAGCAATTCC AAAATGCTTC TATI'CAGAAC ATC~rCCTTA ACCAACCCCA G7TGCTTCTT GGCTGTATCG TTAATCATGA TAATCTGACC CCGACGGrrA GTCGCAAGAA CCCCATCTGT CATATAAAAC AGAATACTAT TTAGCCTCTT ACTCTCT'TGT CATTCAAAT'r ATTGGTAATA AATAATCTCC TGCAATCAAA CACGTCTATT 'rTCCAG'rAAT TAAAGATAAA ATCTCTGCTA TAATACCCTA CACCACGGCG TTCTCACGCA GACGTCGTAC CCCCAGACAG TCTCAAGCAA TGATACAAAA GCTCAAATTC ACGTAGGCGr CTGGAACAAT GCTTCCTGAC CATCTACTGG TGCAACTCAC GATTGGAGAA ATAACCTTAT CAAATTCACT TCTAGATTr CCTGAGT'CAG ACGAATAACC TCCGACAAGT TTGGrGATTT CAGACCCACC TTGCATATCA AGAACCTTGG TCTTTAACCT TrMGATTGAC TTGCTrCAAC TGAATATTAT AAGAGGGTCA CAACAAGGAT GAA.ACCTAAC AAAATCAGGA AAAATGGTTT GTTTCACTAA ATCAAGCArr AT'rTCTCATG CGTCAAGATA TACTCTGGTC GCCTGGGCGT ATCTTCAATC AGTCACATCA ACTGTACGGA CA'rCACCAAA ATAGTCATAA GTGTTCGCGC GTGATGACTT GACCTGTATG CGA'rCCTAAA ACGATGGGTT AAGTCTAGTT CTTCCCCATA TTT'rrAGCC TTCTAAATCC CCAA'rTrGGA TAGGTTGACG ?3-rACTATCT 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100
CATAG-GTTGA
GGGTTTTGTT
ATCTTTGGCT
GAACGACGCA GAAGAGCTT'r AACACGCGCC
ACATAGTCAT
GAAAGCATAA
TTACGAATGG TCTTAGCAAC TTCTAAACCA TCAATTTCTG ATAA'rATCTG GT'rGCTCTGC TTCAAATTGC TCTAGCGCTTr
CTGCCCCAAG
GAATGGGCAC
GAAGCATCAA
CACGACCATT
TATCCGAGAT
CTCTACTATT
AAGTTCGGCTT
7'GAATrTT ACAACTTCGT AACCTTCCTT TCATCTACAA TTAGTATTT= GGTCATATTA AACTTGATAA .TTTCATATGT TCACCTTTT
TTCCAAACCG
ACTGCTTGTC
ATCCAGAATA
AAAAGCAGTT
TrCTTCTCA
ATACCAAAAA
GTGCATAAAC
CTCGTGAAAG
CTACAATCTG
CATCAAGCCA
CAAAGTTCAC
AATAGTCAGA AGACACAATA GCTAGTCTTG GCTACTGTCT CI'GCCAGATT ?rrGTTGGG G'N'TGGCAAG TGGGTAATTC CCAGCGAACT TCCCTATCTG AAAAATCATG TACATGCCAT NTTCGATGAC TAAAAACA'rG ATCAACATCT AGGTCATAGT CCTGCTGGAA GAAGTCACTC ACCTGACCTG CTGGACTGTA TCAAAACAAA ACTCTCTTCT GGACTGGGAC ACTTTCTTCC GCAACCTGAT GAAAGAGGTC AAACTGCTCT TCTTGCGAAA AGTTATCAAC T'rCTATAAAG GGGAAATGCC AAAAACCTGC CAAGAGCr'rT TCGCTIrCAT 7Tr=rTCAAG 228 TAA.AAATTGT CCTTGAGAAT TTTTCACAAC TAAGGCTTTA AGATAAATAG GAACCGGCT'r 7?rCIAGGA GATTTAATTG GATAACGGTC CATGC?1'CCA rrCTGATATG CCGCACTAAA GTCCrTGACr GGCTTTCTTc GTCCA'rCAAG GCTTGA~rAA TGCCTGAAAA ArMCGAT CGCCAAGACC CGCATGACAT GGAAATGGCT CCTGCTGTGT ATTTGGAAAT TGGCCACCAA CACGTCTG.GG ATTTACAGGA GACTCAATAT CAGACCCTAA AATCACCCGG ACGATCCGGA TTAATCAAGA TCTCCATCAT
TACTTGGAAT
TACCATCTAC
AAGGTCCAAT
AGTCAGTCAT
AACTCGAGAA
TGCCAGACTT
TAATAGCCCA AGCCCTCCCA TCGACAGTTG GAAACCAGTC ATCCACCCTG GTCTGCTGAA GCATGATTTC TCTCCTCCAA GGCAAATCTC ?N'rTGTTT~TC AGAAATGACT TTCTCCTCCG GCCACATGAC TCTAGTATAA CACAGAAGGT TTCACCTGTC ATAGTATATA ACrTTTCAT CTACTTATAC CTAGCCGCAG G1TGCCTCAAA ACACTGTTT= CCCAPATATCG TGGTTGACTT CAAACAGACG ACCTGCCA GGCAACG7TAA AAGCAATACT CCCT'T~cAAG CTGGAAArrc CTTCATAGGT AATCTGCTGG GCTGCAGCCT GCATATTGCG AGCTTTCAGT AAACTCTCCT CAGGCGCAGT CAAAA.ATCTT TCGTAGTAAG GGATAACTGT AGATACCCAG ATGTGATAAG GA7=r~ACT ATCATACCAA CGAGAAGTT TCTCACGGAA GATACCGTA'r TCTTTCAAAT CTAACATATC TTTGTATCTG ATTTATAATA TT'TTCAATAG TCAATGAAAA TCAAAGAGCA AACTAGGAAG GAGGTTGTGG ATAGAACTGA CAGAGTCAGT TAGTTTGAAG AGAT-rrTCGA AGACTATAAA ~AAAATGAGC TTGGATATTA TT'rCCAAACT AAGCCTAGTA CAGTTCCATC GCTTTCAACA AGACCTGGCA TGGTCATAAC ATCACCAGTT GGTACCAArr CACGAATGGT AATTTCAAAG 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 ATCATATAcT ACGGCAAGGT TCTTATTGAT GAACTGCTG CACTrAAAGT CAATrrCAAT TCCATGTTGA GAGCTGCTGG AAGGCAACGA TGAACCCTGC TTTTrCTGGTG CTCCAAGCGC CAGATGGCA A'TGTCCCA TTrCTCAAAGT TCAC7"rGCT TGGACAGAAA GGTCArrATC
GAAGCTGACG
CACTCTGAGA
CCACTAGAAC
ACGTTT'rGGA ACCTAATTrT ATTTGGATTG TCTGAGAA.AC TGTATTGAGT TTTAGCCATA ACCGTT'r.A ACGATT'rCAG CAATTTGTGT 'rTGAGCTTr ACCACGATAG ATTTCAGTGA ATACAAACGT TTATAGT'rAG CAATTTT'TTrC AATCN'TTCT CTGGATTTTC AGCAATTGTC CATCAGCCCA GACACTACCC TTAACAACTG TTTCGCCAAG TGCTACTCCA CC~rCTGCTC AATTCAACTG GTACATCGAT TGAGGCACAG AGN'CTTTTA AGGCrGCAAT T'rCAGCTTCT GTATCAGATA CAAATTCGTT AATAGCTACA ACTGCTGGAA TACCCAACTT ACCGATATTT TCAACGTGGC GTTTCAAGTI' AGCAAAACCT GCACGAACTG CCTCTACATT TTCTTCAGTC AGAGCGTCTT TAGCCACACC ACCATTCATC T'rAACGGCAC GAAGGGTTGC GACAATAACA 229 ACTGCA'rCTG
AGGTCCGCAC
GTCGCCAAAA
GAGATGPrGG CAAG7TTGGT GTC?1'GATAT CAAGGAATTT CTCAGCACCA CAAAACCAGC TTCAGTAACA GTGTAATCAG CCAAGTGAAG GGCTGTTGTC CAG.AGTTAC.A GCCATGAGCG ATATTGGCAA GCAGGTGTAC CGTAAAI-TGT CTGA.ACCALAG TTTrGCFAA CCCAAGGCAC CCTCAACCTG CAAATCACCT ACAGAAACAG CCAATAACGA TATTCGCCAA ACGACGTrTC AAGTCCTCCA ATTGCCATGA TTTCTGAAGC AACTGTAATA TCAAAACCAT AGAGGACCAC CAAGACCAAC AGTCACATGG CGGAGCGTAC CGTTTCCAGA GGATACCACG TTGATCAATT CCCAGC'TCAT TCAATCAAGG CAGAAAGGGC ATTGTTGGCA GTTGTAATAG TGGAGGrrGA TGTCTTCCAT TGGCAGAACT 'rGTGCATACC TTGATCCCCA TGACTGGACC AAGAGACGGT TCGCGGATAG ATCTTGTTCA AGGCATCCGC AAGACCAATG GTAAGCGTCG ATGGACCACC GTGTACAAAG TAGCATCCTT CAAAATCAAA GCGTACGG'TC ATAGCGATAA T'GTCCGTTGC CAAGCAAAGA CCTCACGTGG AATACCG'TT GGTCGTTCAA GTCCACAACG TCCCTTGGTG CAAGTGGTTG CATGCATATC TCCAGTAAAG CACCACCAGC AGCACCACCC CAATCATGGT TTTCTTGCCA ACTTTCC -rC ACCTGCAGGT GT'rGGGTTGA TGGCAGTAAC CAAGATCAAT TTACCGACTG GATTGCTCTC AACTGCACGA ATTTTATCAA AGCTGAGTTr- AGCCT'rGTAC TTTCCGTACA ACTCCAAATC GTCATAAGAA ATACCAAGTT TCTCTACAAC ATCAACAATT GGC~rCAACT CAATACTCTG TGCGATTTCA ATATCTGTT~T TCATTCAAAA TTCCTCTAAC CTCTTATATG ATAATTCATT ATATCACAAA ACAAGAT'rTT TAACATCCTA AAACTCTCTA AACGTTCGTA AATATCTCTG T'NTTAAGAC TTTTAGAGTC C~rTCTTAAA 7TTTATATGG Crr'rATACT'r TGAAACTATA ATAAATCTTC GTTTTTACCA AAAATTTATC ACTTTCATTT TACT'rAC6GC TTAT-TTTGT GTACAATAGT 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7 620' 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 GCTATGAAAA TTTAGTTAC ATCACTAACC ATTCTACAGG GGGTATGAAG TTTrGTTTA-AT CTAAGTATTC GAGAAATTAC CAGGATTATC AGGTCTTGAT ACACGGCT'rG AGGAAGT'rCA CATCACGCCA AGATTTCT'rC AAAATCATAT CCCTAGTCAA CTGGTTGATG TTACCGAACA ATCGGGCGGr ACCAGTGAAG CTATCGATAG CGTCCGCI'CT TCAC= GGGG AAAA'rTATCA CAGAGACTTT GCT'TCTGCA TACGACAAAA CGAGCTCTGA AGCCAGAGCC TCATCCTAAC CAATACCAAG GACC= CTAA TAGAAATGCA AGAACGTGTT CCACTCAATG GCTGI-rTCTG ACTACACTCC TCGTATATG CGCTAGCTCC AATCTAAAAG AA'N'TTTAAG CAAGCAAAAT AACTGATGAG GTTCAGGTTT TG~rCC=AA AAAGACACCC GGAATGGAAT CCTACTATTC ATCTGA7TTGG 
TTTCAAACTG TCATCTGGTT GACATTCCAC GAAAAAGTCT TATCAAGAAT 230 CAAGCAGATr TAATCATCGC GAATGACCTG ACTCAAA'1rT ATA7"rrGTrG AGAAAAATCA GCTTCAAACA GTCCAGACTA CTCCTTGAA.A AAATTCAAGC CTATCATTCT TGGCTGTAAC GGGTTCAATC GCCTCTTATA AACAAGGCCA TCAAGTCACr GTCTTAATGA
TAGAAAGGAA
AGTCGGCAGA
CTCAGGCTGC
TCCACTTGGA
AAAAAGCAGA
ACGGATTTC
CAGCAGATCA GCACCGACCT AAGAAGAAAT TGCAGAACTC AACTATGGCA AACATTCTCT TTrAGTCAGT TCTCTAAAAA TACAGAGTTT ATCCAACCTT TGTCATGAAG GAACCCTATC TrT A'NrATC GTGGTACCTG GGACAACATG GTAACCAGTA
TGACACTACA
CTGATCAGGT
CAACTGCTAA
GGTAC'rCTCA
CA.ATCATATC
CACTATTGCA
CAGCTCTAGC CCTACCAAGT TGTATGACCA TCCAGTAACT GATTGCTCCT AAGGAATCCC CCTCACAATT ATTTTAGAAA CACCCATTrGC TATCTTTTTT TrAACCI-rr TCCATTTCCA GCATTATT'rA TGGTCCACGA TGACGGTTAA CACGATTACG ACCGAAACAT CTACTCAGCT
CAGAATCCTG
GAACTTGGAA
AAACTAGCTC
CATATTCCCA
CAGAATAATC
TACTAGCTTG
GAATAAAGGA
GCTACCATGC
ATCAAACCGA
AACTAATAGC TCCTGCTATG TGAAAACATT AGAA.ACTACG TGGAGACCAC GGACGAGGAG AACTATCGAT GAAAAAACC TCGTGATACA CTTrCTGAGC CCATTGTTCA TATTCCTGTC
AATACAAAAA
GCTATCAGCT
CTTTAGCTGA
TCTAATATTG
TCACTTATCT
ATTATTGCCA
8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900
GTTGGGGTTA CACTTGGATT TTTGATGGGA TTACTTAGCT ATTCTACCGA CAAGCTACCT CTTCTCTCCC TTCGTACCAA ATCATTGCCA TCGTCCCACG TATTTTGATT GGTTTAACTC CrTACTrAGT CTATAAACTG ATGAAAAACA AGACTGGTCT GATTTrrAGCT GGAGCCCTTG GTTCcTTGAC AAATACTATC TTTGTCCT'rG GAGGAATCTT CTTCCTAT'rT GGAAATG'I'r ATAATGGAAA TATCCAACI' CTTCTGGCAA CCGTTATCTC AACAAATTCA ATTGCTGAAT TGGTCA'rTTC TGCAATTCTA ACCCTAGCCA TTGTTCCACG ACTACAAACC TTGAAAAAAT
AAAAACAGG
INFORMATION FOR SEQ ID NO: 13:
SEQUENCE CHARACTERISTICS:
LENGTH: 1126 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
TAATITTCAT ATAATAGTAA AATAGAATGT GTGATTCAAT AATCACCTCA AATAGAAAGG AAATTCTATG TCAAATCTAT CTGTTA.ATGC AATTCGTTTT CTAGGTATTG ACGCCATTAA TAAAGCCAAC TCAGGTCATC CAGGTGTGGT TATGGGAGCG GCTCCGATGG CTTACAGCCT CTTTACAAAA CAACTTCATA TCAATCCAGC TCAACCAAAC 'rGGATrAACC GCGACCGCTr TATTCTrTCA GCAGGTCATG G?1'CAATGCT CCTTrATGCTr CTTCrrCACC TTTCrGGriTr TGAAGATG'rC AGCATGGATG AGAT TAAGAG TTTCCGTCAA TGGGGT rCA.A AAACACCAGG TCACCCAGAA 7IrrGGTCATA CGGCAGGG.AT TGATGCTACG ACAGGTCCTC TAGGGCAAGG GAT2-rCAACT GCTACTGG'rr TTGCCCAAGC AGAACGTTrC TTGGCAGCCA AATATAACCG
TGAAGGTTAC
GGAAGGTGTC
TGTTCTTTAT
AAGTGTTCGT
AGACTIGGRA
GATTGAAGTG
ACACGGCGCC
AATATCTT'rG
TCAAGCGAGG
GATTCAAATG
ACCACTATAC
CAGCTTCATA
ATATCAACTT
TTrACGTTATC TGTGGAGACG GAGACTIGAT CGCAGGCTTG CAAAAACrTG ATAAGTTGGT GGATGGTGAG ACAAAGGATT CCTTTACAGA 'NGGCATACT GCCrTGGTTG AAAATGGAAC AACAGCAAAA GCTTCAGGCA AGCCATCTTT GACCG'rTACA ATGCC'rACGG GCCATCCATG CTGCTATCGA a. .a a a AAGACGGTTA TTGGATACGG TTCTCCAAAC CCTCTTGGAG CAGATGAAAC TGCATCAACr CTACGAACCA TTTGAAATTC CAGAACAAGT ATATGCTGAT CCGTGGCGCA TCAGC'rTATC AAGCIIGGAC TAAATTAGTT TCCAGAACTG GCTGCAGAAG TAGAAGCCAT CATCGACGGA AAACAAGGAA CTAATGCTGT CGTCAAGCCC TCGGTTGGGA TTCAAAGAAC ATGTTGCAGA GCAGATTATA AAGAAGCTCA CGTGATCCAG TCGAAGTGAC 540 600 660 720 780 840 900 960 1020 1080 1126 TCCAGCAGAC TTCCCAGCTT TAGAAAATGG TTTTtCTCAA GCAACT INFORMATION FOR SEQ ID NO: 14: SEQUENCE CHARACTERISTICS: LENGTH: 2520 base pairs TYPE: nucleic acid STRANOEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: CCGGCAACAA AAAAGAAAAA ATCAACAGTT AAAAAAAATC TAGTCATCGT GGAGTCGCCT GCTAAGCCAA GACGATTGAA AAATATCTAG GCAGAAACTA C.AAGGTTTTA GCCAGTGTCG GGCATATCCG TGArTTTGAAG AAATCCAGTA TGTCCGTCGA TATTGAAAAT AATTATGAAC CGCAATATAT TAATATCCGA GGAAAAGGCC CICTTATCAA TGACTTGAAA AAAGAAGCTA AAAAAGCTAA TAAAGTTTTT CTCGCGAGTG ACCCGGACCG TGAAGGAGAA GCGATTTCTr GGCATTTGGC CCATATTCTC AACITGGATG AAAATGATGC CA.ACCGTGTG GTCTTCAATG AAATCACCAA GGArIGCAGTC AAAAATGCTT TGGTCGATGC CCAACAAGCT CGTCGGATCT CTATTI-rGTG GAAGAAGGTC AAGAAGGGCT TTAAACTCAT CATTGACCG'r GAAAATGAAA C-AGTTGATGC TGTCTTTAAA AAGGGAACCA ATGGTAAAAA GATGAAACTG ACCAGCAATA CG.AGTAAAGA CTTCAGTA GATCAGCTGG TACCCTATAC CACTTCATCT ATGCAGATGG GAAAAACCAT GATCGTGCC CAACAGCTCT AAGGTTTGAT TACCTATATG CGTACCGA'TT 232 TTAAACAACC TCcrrAAGArC GATATGGACT 'rGCATCGCTT GGTAGGG'TAT TCGATTTCC TGI'CACCAGG TCGCGT'rCAG TCCATTGCCC TCAATrCCTT CCAGCCACAA GAATACTGCA AACAA~rrTCA TGCT'rCCTTC TATGGAGTAG ACGAAGTCAA GCGAACTCTTG TCTCGTCTGA ATAAGAAAGA GCGCAAGCGC AATGCTCCTT ATGCTGCCAA TAAAATCAAT TTCCGTACTC ATGAAGGAAT TAATATCGGT TC'TGGTGTTC CCACTCGTAT CACTCCTGTA GCGCAAAATG
AGGCGGCAAG
TCAAAAACGC
ATACACCAGA
TCTGGAATCG
AATTGTCTCA
ATCTTGCCAr
ATGTGGTCAA
ATTCTGAAGC
ACGCGCCAAC
TTGAACCGAC
TCGTAAACGT
AAGAGCAGTG
AGGCTGAAGA
C'rTCATTACG
ATCAGGTGCT
AAGCATCGCT
TTT'rGTGGCT
AAAAGGGGTT
TTATAATGAT
ACAGGTCAAT
AACACTGATT
CAT'IGAAACC
AGAGIGGGA
GACCTTCACA
GCGACGGGTC
AGAAATGGAA
GATCGTTTTG GTAGCAAGTA TTCTAAGCAC CAGGATGCCC ATGAGGCTAT TCGTCCGTCA AAGTATCTGG ACAAGGATCA GCTTAAGCTA AGCCAGATGA CAGCGGCCGT TNTTGATACC CAATTTGCTG CCAATGGTAG TCAGGTTAAG TCTGACAAGA ATAAGATGTT ACCGG;ACATG AGCAAACCAG AGCAACATTT CACCCAACCG AAAACCTTAG AGGAAAATCG GGTTGGACGT ATTCAGAAAC GTTATTATGT TCGCCTGGCA GAAATTGTCA ATAAGCTCAT CGTTGAATAT GCTGAAATGG AAGGTAAACT GCATGATGTC ATTGATGCCT 'rTrACAAACC ATTCTCTAAA AAAATCCAGA TTAAGGATGA ACCAGCTGGA GTCATTAAAC TTGGTCGTTT TGGTAAATrC CATACCCAAG CAATCGTGAA AGAGATTGGT ATTA'rTGAGC GAAAAACCAA GCGTAATCGC TGTGAATTTA CCTCTTGGGA CAACCCTGTT
GGTAGCAAGG
AGTGTCTTTrA
TATACCCTTA
ATGGCTGTTA
TTTGATGGTT
GTTGTTGGAG
CCTGCCCGTT
CCATCAACCT
GCCAAACGTT
TTCCCAGArA
GAAGTTGGAA
GAAGTTGCCA
TTTGACTGTG
TACGCTTGTA
GTTGAGTGTC
CTATTCTATG
GGTCGTGACT
AAGCAGGTTGG
TAG'rGGGTTG
TTCAGAGCGA
420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 AAGTGTGTGG CAGTCCAATG GCAATTTCCC AGATTGCCGT CAAGCTGTCA TCAGGGACAA GT'rGCAATCG CTATCCAGAA GTCCAAAATG TGGCAACTTC CTCATGGAGA AAAAAGTCCG TGGTGGTGGC
TTGTCAACTG
TTTTTTGATA
7TrTGTAGCAA AGGCGACTAC GAGGAAGAAA AGATGGCTCT AAGTCAGCTA AGCTCGAGAA ACGACAAATT TTGTCCI'TC 233 TAAAAATCCG T'nTGAAG T'TCAAAGT TCCGAAAACC AAAGGCATTG CGCTTGATAA GTTTrGATGAG ATTATrGGTC GCTTCCA.Arr TGGCGTTAGA. ATAGTG'rAGT TGAAGGGCGT TGACGATTr CTCTTTGTCC TTrAGAAAGG 'rrTTAAAGAC AGTCTGAAAA AGAGGATGAA CCTGCTrTAG ATTGTCCTCA ATGAGTCCGA AAAATTTCTC CGGTTCCTTA TTCTGAAAGT GAAACAGCAA CAGTTGATAG AGCTG.ATAGT GATGTT-rCAA GTCT'rGTGAA TAGCTCAAAA GCTTGTTTAA AATCTC'IrTA ?rGGTTAAAT GCATACGAAA AGTAGGGCGA TAAAAATGTT INFORMATION FOR SEQ ID NO: fi) SEQUENCE CHARACTERISTICS: LENGTH: 10993 base pairs TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: TTTTCTCGAT AATAACTTCC ACCTTATTAT TTGGGATACC CTCCTCTTCT TCACCACCAC GTTCATAGTA GTCATCGCGA TAGAGAAAAG CTACGATATC AGCGTCCTGC TCAATAGACC CAGAT TCACG AATATCAGAC A.AGACCGGTC TCTTGTCCTG ACGTTGTTCT ACACCACCAG AAAGCTGACT CAGAGCGATT ACTGGAACCT TCAATTCCTT GGCTAGTATT TTCAACTGAC GAGAAATTTC AGAAACTTCT TGTTGACGAT TTTCTCGACC AGTTCCCGTG ATAAGTTGCA AATAGTCTAT CAAAATCAAA CCAAGA'm'TC CAGTTTCTTG AGCCAATTTA CGAGAACGAG AACGAATCTC TGTAATCCGA ATACCTGGCG TATCATCGAT ATAGATACTG GCGTTAGcTA GATTACCCTG AGCAATAGTA TATTTTTGCC ACTCCTCATC TGTCAATTGC CCTGTACGGA TAGAATGTGA CTCCACTAAG CCTTCTGCAG CTAACATACG ATCTACCAAG CTTTCCGCAC CCATTTCGAG TGAAAAAATA GCAACCGTTT TGTCCAACTT AGTCCCAATG TTCTGAGCGA TATTCAAGGC AAATGCTGTC TTACCAACTG CTGGACGAGC TGCTAAGATA ATCAACTCCr CCTCATGAAG TCCTGTTGTC ATATGATCCA AATCACGATA ACCTGTCGCA ATACCTGTAA TATCGGTCGT TTG'1-GCGAG CGAGCTTCCA GATTTCCAAA GTTGAGATTC AACACATCTC GAATGTTCTT AAACCCGCTT CGATTTGCAT TrCACTGAC ATCAATCAAC CCTN'TTCTG CCTGAGCAAT AATTTCATCA GCTGGTTGTG ACGCTTCGTA AGCTTGGTTG ACAGACTCTG TCAACTTGGC AATTAA.ACGA CGTAGCATTG CTTTTTCTGC AACAATCTTA GCATAATACT 2220 2280 2340 2400 2460 2520 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 CCGCATTAGC AGAAGT'TGGC ACAGAATTAA CACCA AGAGCAGCCA CAATCTCAAC CAAGTAAGAC AAGCCACCAA 234 TATTCTG2'AA ATCACCTTGA TTATCAAGGA TAGTACGAAC CGTTGTTGCA 'rCTATGGCAT CACCACGATC 
GGATAAATCG ACCATGGCTT GGAAAATCAA ACGATIGCGCA TACTTAAAAA AGTCCCGAGA CTCAATGTAT TCTCGCACAA AAACAAGTTT ACTCTCATCA ATAAAGATAG CCCCTAAAAC GGATGCTCA GCTAAGA'rAT CrrGAGGTTG TACTCGTAAC TCTCTACTT CTGCCATCAG ACTTCCCTTC ClI'TACAAT CTTGTCAAGA AGGTGTAAAC rATCCTTCT TTCACACGAA GA?1'GATTAC ACTTGTCATA TCNTGATAGA 7''MCACTGG CACATCAATC AAACCAACCG CTCGAATCGG AGCCrGTACT TGAATATGAC G~rATCAAT CTTAATTCCA AATTr.CI-N-r GCAATTCTTC TGCAATCTTC TTArrGGTAA TAGAACCAAA GGTACGACCA AAATTCTACA ACAGTTTCTT CTGCTTCAAG 'TTCTGCTTTA TCTGGACCAA CII'TrTCAAC ATTrGC7Mr-C CTrCTGCAAT CGAAGTTCAC CTIACAGCTTG TTTTGCGCAT ACCCTGTTGG TCTGCTAAAA AGATTACTT T'rTCTGTCAG TTTI"TCACC-r AATTAAAGTG GCCTCCACCG GACTTCGAGC TGAGATAGAG CATCTCAGCG TGAGCr'rTr AGCAGTCGCT TCT7TTGGCTA TACTTCCTTrA ATT'rGCCTT 4*
CATTCTrCTT
GCTTCTGACA
CCTAACTCTT
ATAAATCCT'r
TCTCCTTTTC
AGG'rTACATC
CCATAATCCG
GTGTAT'rCI-
CTGCCTTACT
CATC'rGAACC
AATCTGTCGC
TAGCAACATC
CTTCCGA'Tr TTGT'rTACCA GATTCTTTTT GATAAGAAAG TTTNACCTTT TCCTTTAACA C'rTCATTTCA TTTAATACAA TTTAAT'rTGA GCTGCTGCCA TTGTACATTC AGTTTACTAC CGCAAGAACA AAACTCCr AATAACAACT GTATCATAGC TAATTTACGC CCCTGTAAAA AGCGATTTCC TGGATAGCAA AAATGTCCCA CTAGTTAC'rC CAAGACACTT GCTTGCATAC TrCCGT'rACC AACTCACTGG ATCTGGAAAA TCCTGATCCC 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 CAATACCTGA CATGGCTAAC ATGGCATCTG ATTTCATGTC CTTAGCCTCT GCTATTAGTA TAACTTCA'rT GACCTCACGA TATTCTTCAA TACTATCACr TCCGCGCG'rT CTGAGATAGC GCGAGGTGAA AT'rTTAGTA TCCAACATCA TACCAGCCAT GACTCAAACG ATTTrTCTTA GAATTCTGGA ACTGAATCAA CACTACTTGC ACCACI'?rCG ATATAAGTAA TA.ACCCCATT Go.
TTCTATGGTG GT1CAATAACA ATGGTTTGGG TAAATAAATC A'rAAAATTCT TTT'GATAATG TTAAG4GCTGT CTTTGAATGG TCTACAAGAA TCAACAAAGA ACGATTGGTC ACCATCCCCA T'rGCATCCTT AACAGACAAC AACTTCGTAA CTCCTTCTTT T'rCTATGAAT GAAACAGCTC GT'rCAATA'rC TGGAGACAT'r TGTTCTTCAT CATAAAGAGC ATAGC'rATTT TCAATCACAT TGCTGGCGAA CAACTGCATA CCTACAGCAG AGCCCAAAGC ATCCATGTCT AAATTTTTGT GACCGACTAC AAAAACCTGA TCTACACTCC GAATCTTATC 'rGAAATAGCT GTCATCATAG CGCGCGTACG AGTCCGT'GTA CGCTTGATTG AAGCAGCAGA CCCACCACCA AAATAAACTG 235 GArTrTCCT TTCGTCGTTT TCC1-rAACAA CCACCTCGTC GCCACCACGT ACTTCAGCCA 2880 AGTTCAAATT- GAGCAAAI3CA ACTTTCCCTA TCTCATCATG ATN'TCCATCG CCATAAGAAA 2940 ATCCCATACT 'rAAGGTCALAG GGCAACTGTC TCTGTTTCGA CTCrCTCTG AAAGCATCAA 3000 TAACAGAAAA ?TrATCATTC ATCAACCCT CAAGCACCGTI GTAGTCAGTA AATAGATAAA 3060 ATCGATCCAT ACTTACCCGA CGAGAAAACA TCATGTGTTT TTCTGAAAAC TCTGATATAA 3120 AA'rrAGCTAC AAAACTATTG A7"rTGACTAA TATCTGACTC AGAAG=rCA TCCTCCAAAT 3180 CATCATAA'rT ATCCACAGAG ACAATCCCAA TCACTCGTCT ACTTGTTACC AATTCATCTG 3240 TT~ATGGCTTG TTCCCTGGAT ACATCTACAA AATACAAAAC ACCGGAAGAA CCATCCATAT 3300 GAACAGCATA ACGCTTCTCA CCAAGCTTGG CA'rAAGTAGA CGGATTTCCr ACTGAAGCCT 3360 TGATAATCGT TTGAACAGCT TCTAAATCAA AATCACCATC TTCCTTGGTC AAAATCAATT 3420 CAGCATAGGG ATTAAACCAC TCAACCTCTC CAGAAGATAA ATTCAATTTC ATAACACCTA 3480 CAGGCATCTG TTCCAATAGA GCTCTCAAAC TrrCTTCCGC T'rCGTGGT'TT ACATACTGTA 3540 TCTGTTCTAC ATCACTCCTT GTATAATGCA CTCTCAGTTT CTTAAATA.AA AAAACATAGC 3600 *CTCCTACAAA AAGAAACAAA ATTAAAACCG TCAACAGATT ATTATTAACA AAAATAATGA 3660 ***AAGTGGATAA GACTCCAA.AC GCAATCAATC CTACTAGAAT AGGAAAAATI' GGACTTACAT 3720 *..AAAA'N'TTTT CATTCAAAAC CTCTTGGCAC CCATTATACC ATAATACCCC TCAAAAAGCG 3780 *.*ACrTTTAAA AGTGTAATCA GTA.ATTCTAT CAATTATAAG AAAA.AGGTAG =~ACAATTC 3840 AGTAAACCTA CCTTTACAC.A TATTGAAATT AAGATTCTTT AACCTCTAAC AAACCAATTT 3900 CGCCATCCTC ACGACGATAA ATCACATTCG TTGTCTGATC TTCAACATCC ACATAGATAA 3960 **.ACAAATCATG CCCCAATAAA TCCATTTGTA GAATrCCTTC TTCCAAATCC ATTGG.TT'rTA 4020 AATCAATTTG rTTrGAACGA ACAACTTTAG ACTGGACA.AT ATTTGAATCT TCCACCAAAG 4080 
***CATCTGTAAA TAATTGACCA GTTGCTACCT TATTwr=TT TTTACGCTCG ATTTTTGTTT 4140 TATTTTTACG AATCTGACGT TCAAI-T'AT CAGTTACAAG GTCA.ATTGAA CCATACATA'r 4200 CTTGAGATAC ATCTTCTGCG CCGAGAGTAA TAGATCCAAG CGGAATCGT'r ACT'rCCACTT 4260 TAGCCGTrrLT TTCACGATAA ACTTTTAAGT TAATrCCGGGC ATCCAACTCT TGrCTGGTT 4320 *..GGAAGTACrr TTCGATCTTT 'rCGAGTTAG AAACTACATA ATCACCAATT GcrCTGTI'A 4380 .CTTCTAGGT'r TTCACCACGG ATACTA'rATT 'rAATCATATG ACTACCTTCT 'rTCTAAACAT 4440 TTTrGTTT ATGATTTTAT TATAACGCTT TCATTCTATTr TGCAAAT'r lrTTCCTCAT 4500 CTTACAAGGG AAAATGTTTT TACATCCTTA GCACCAGCTT CTTCCAACAG T'rTCTTAACA 4560 236 CGA'TTTATAG TTGCTCCTGT AGTATAGATA TCA'rCTATAA GTAGGA=r TTTAGGAATA GTGACTCCAC TTTTAATAAA GAAACAAGT TCTGTCCCCA GAAGAACTGG CTCTCTCTTC TCTTTTCTCT AATAAATCCA AGCGCTCTCA ACGATTr"A GATACTCAAA GCCTGCTGCC TCTACCAAGC CCTCAACCTC ATTACAACAA ATTrGATACTC AT'rAAATCCT CTATTAGCAT ATCTATCAGG TTTGTACTTT TTCAACTCCT CACTTAAAAA
ACTTTTTA
TGATTGThAG CAA'rCTTGAC
ACAGGAAGTC
TAAAAATCC
ACTTTGTTGA
TCCATCAAAC TTATACCGAC 'rCTATGACTG ACTrCAACTC CAACTCTGTT TTCATACAAT ACAGTCTGAA CAAAGACAAG AACAGTCTTC ATAGTCTGCC
TGAAAAAATC
CCTCTTTACA
TTGGACAGTT
ACTTAGGGGA
TGAAGCGAAA
CTTCATAGCT
CCAAAGTTGA
CTCTTCCCCA
4680 4740 4800 4860 4920 4980 5040 ATTCTTTCAA AAGTAGAATC AAGAGACTAC TAAAAGTTAA GACCAGCCTC CTTATTCATC ATCTGAATTT CCTrAATCGC ACCCATCATG GAACAAAAGC AAATCTCCTG TCGGTCTATC AGTCATCATT CCTCAGAAGT 5100 CACATAACAA GCACTTCATA 5160 CTTCTTGATT GAAGCATTTA 5220 CATGCTTCGT CCAACTCGTC 5280 ATTGGCCTCT ACTACGAAAA 5340 CGTACTGATA AGTATTGTCA 5400 CTCTGTTACA GAAGATACAA 5460 CACCAATCTG AATCAAACTA GACTTGGTAA CATCCACACA AGGGAAGGTA ACTCCGCGCT GTTCTCCATC TCGAAAAGCT TGTACTTGCT AGCCAATTTT CTCATT'TGGA AAT'rGCTCCT TAATTTCTGA AGCAAAAATG AGTAACGGAT ACTTTAACTT 'rGG'rGACAAA CGAT'rCTTGT TTGGTTTrGG AATAATCAAC GGATTTCCAT GTTCTCCTAA ACGGACCTT'r TTATCTAACT TCAATCCATT CTCCTT-TACA CTATTCTTGA AAGGAA AAGC ATCTACTTCA TCCACTATCA
ACAAACGATG
CCAAGATTGT
CTAATCGATC
GTAAGATTTC TGCTAACTGC TCCCCTT'rCT AAGCTGTCTT TCTCTGCTTC TCAATATAGG CTAACTAGCG ATTAAAATCC GATAACCAAA GAAACCGTCT CGGTAAA'rTC AGTCrrT'rA CATTGGTCGA AGTCGCTGTT AAAAAGA'ITC CAGCGTGGTA AAGCATGCGA TTATCAACAT GCAAATCAAA AGCTTGATPAA AACTTCAATA
ACTGATGGGT
GCAAAGCTAT
CTATGCGAGG
GATAAATCAT
TGTCTACTAC
TGTTGCAACA ACTAGTGGTG TTCGAAAATA CCCGCAAGAA AAATCCTGTT GCAGGCGCTT ACTAGCCAAA CACACTGCAC CACCCGCAT'r TTCTGTCTTT CCAGCTCCTG TTACCGCATG TTGAAGCAAT CCCTCTGACA CC'TTCTCTTG
AGGTTCCGAT
G'rACAGCTCC
GATCACTTTA
AACTAAGGTT
AAA.AGGAG?1'
TCTCCATGTA
AAACAAACAT
GCCACTACTT
GGCrTrGCT
AATTGGCCGC
5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 GCCATTTGAG AACATCrGC TT'TGGAAAAT CCTCCTGCCG AAAATAGTAT AAAGTTTGAT CACTrTCTGAC TCGCTTCATC AGCAAGCACT CTCGACAATA G'rAAGCACCG ATGGGCAAAT ACCATTCTTC TAGAATAGTA CTATTACAGC GTTGACACAA AAGTTTCCCC TTCTCCTTTC TCATTGCTGG AAGTTTCTCC GCCAACTGAC GTTCrCrC ATAAACCACC GAGATAATCT AAATrAC~r TCATAC~rCT TrAGATGAT TTTPTAGTAC AATTAAATCA TGGAATTTAG AAGTCCAAGA AGAAATCAAA AAATCTCGCT TTATCTGCCA AAI.AAGAGGC TCGTGACTTC ATTACTGCCA TCAAAAAAGA ACTGCTCTGC CTTrCATTATT GGAGAACGTA GI'GAAATTAA AGCCTAGTGG 'rACTGCTGGT GTTCCCATGC TTGGGGTACT ATGTCTGTGT GGTCGTGACA CGCTACTTTG GTGGTATTAA TTCGTGCTTA CGCCGGCAGT GTCGCCTTAG CTGTCAAAGA AAGAACAGGC TGGCA'rTGCT AIITCAAATG.T CTTATGCTCA TCCTTAAAGA ACATGGTCTC ATGGAGCTGG ATACAAACTT TGATTTATCT TGATAAAGAA GAAAAAGAAA CTATTAAAGC ATGGAAAAGT CACTTTAACT GACCAAGGTT TACGAGAGGT TGTAAACAAT GAATAATACA GCGTTTCGTT GACATTCI'CA TCG-rAATTCA TTCTCAGTAA TTATI'CGTAA AAACTAGCAC GACAATTAAA GAGGACGGTC TGCCAAGCGT GTTTATAGCG ACACTACAAA GCGACACATA ACGTACAAGT GATGATGG'rG AGAAAATCAC AATCTCACCA ACTAGCCGCT GGAGGACTAA AATTCTATT ATGAAATAA GTACCAAGAG TACACTAACT TACAGATCAA GTCGATACGA TGCACTTGTG GAGrI-rTA TGAAGTTCCT GTAAACrTAG CAACTACTI'T AGCGAGCAAA 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 ATAAAAAGAG CCTACCAAA CG'NTrTCCTT CACACCTATT CT1TTGATCTA TGATATATAG AAGAGGTA'I1 CATATGTCTA TGTTAAACTT AACAACATCG ATTTAATCCT GGTTCATCTG
ACAAGATGGT
'rATTGGACT
AACTATGAGT
TCCTG4GTAGC TCATGGTT'rC
AACAGGAGCT
AGTAGGTACT
ATTCTGAAAC
TCATGGGTAG
GTAGAACGAC
GAGGGAATGA
CTTCCTCTTC
GAGATACTAG
GGTGGAACGA
ATATACTAGA AA.ATGAAGCA ATTCA.AACGA AACCTGATAT TACTAGAAT'r AGCTGAACGC AATCACTTGA AAATTrAATGA AAATGGTATG GATAGCGTTA TACTAAAGAT ATCTTATACA TTTATAACAA CATTACTGAA TTAATCGGTC AAACACCGAT TGCCAGAAGG TGCTGCAGAC GTCTATATAA AGCTTGAAC TAAAAGACCG TATI'GCCCTT AGCATCATTG AAAAAGCTGA CTGGTTCTAC TATTGTTGAA GCAACAAGTG GAAACACCGG GTGCTGCTAA AGGGTATAAA GTCGTCATCG TTATGCCTGA GTAAAATTAT CCAAGCTTAT GGTGCTGAAC TCGTCCTAAC AAGGTGCTAT TGCTAAGGCT CAAGAA.ATCG CTGCTGAACG AATTTGACAA TCCAGCTAAT CCAGAAGTAC ACGAAAGAAC CTGCTTTCGG TAAAGATGGA TTAGATGCCT 7rGTTGCTGG TTTCTGGTGT TTCTCATGCA CTCAAATCAG AAAATTCTAA AAGCAGATGA ATCTGCTATT CTATCTGGTG AAAAACCTGG CATTCAAGTI' TTCAGTAG TCCTCACAAA ATTCAAGGTA TCTCAGCTGG ATTTATTCCT GATACACTTG ATACTAAACC 8100 CTATIGATGGT ATCGTTCGTG TGGAAAAGAA GGCTTCCTTG CGTTGCCAAA AAATTAGGTA ACGTTATCTC TCTACAGCAC AATCTCCAGA CTAGAGAACT 7TrCT'rcrACA ACTITrAGTCC TTCCACGTTT GGAAGACAr'r
TAACATCAGA
TAGGGATC
CAGGTAAAAA
?ITTATGAAITT
CACGGATAGT
ATGGTAAATA
CTAGAAGATA
TGACGCTCTT
CTCAGCTGCA
AGTCCTTGCC
GTAACCG;TCC
TCCTAATCTG
GGCCTCTAAA
GGATAGATAT
ACATGATGGT
TAAGTCAGCT
GCTGAGTCA
GTCTTGATAT
ATAATAGA.AG
AGCTTATCCA
TACC''A.AAG
TTGAAATAAG ATATGAACAA ATCGATTAGA TCAGCT'rrCC CAGACAAAAA AGTCCAATAG CTATAAGAAG TTTCATCCGC ATGAAGTAAG TTATAAAGGG GCTCCAAA'rA GTATTGACTC CGTGTGATTG GTAAACCCAT CCTAGCCCAA AGATTAAACT TCTGATGGAT GGTGTGAGCG AAAGGGGCTT TAGGAATAGG AGCTTTCACA TATGGACAAT GCTATATGGC ATAAATCAAG GCACTCGGAC GTGAAArrGG GCTATCTACG GAGCCATCCA C'rAGCACCAG ATAACGGTGA AATAACGAAG TCTATTGAAA GAL-ATTTCTT ATTTGCACTT ACCTCTTTGT TTACGAGAGT TTCTCACTAT TTATAATGGA AAAGCGTAAT CCCTTGTTTC GACTATCACr CTCTAGCACC ATAGTCTCTC TCGCAAGAGG GCCAATTAGA GATTTCCTTA GGCGATAATT GGGTACCTTC CTGAGCCAAA GTTATGCCCT GATGATTATC TTTTACTCGT ATTCCGACTA ATATTGGCTT GAACAAGTGT GGAAAGAGAT GAAGATGTCA TACAAGGACT 0 09 9.
8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 TGCATTTATT CCTCCATACA CACCAGAGAT GAACCCCATT TCGTAAAKCGT GGATTTAAGA ATAAAGCCTT TCGAACT'1-G GGAGAAGGAG GTGATAAAGT CCATCG~rAA TCGGAGACOG CAGATGAGTA TAAAAAGAAA GTCCTCATTT CAATAGAAAT ATAGTAAAAT GAAATAAGAA CAGGATAGTC AAATCGATTT GAGGTGTACT AT'rCTAGTTT AAA'rCCACTA TAT'rTGGGGA
ACTAGAATGC
CACGACTTTC
CTAACAATGT
GTGATAGAAA
TN'TGAAAA
TGATGAArTT
TTTAGA.AGCA
AGCCCTTCAT
CAGCCAATCT
CAG?'r=CCG AACTTrAAAG CAA'rTCAAAG
CTTGCCACAA
CCTT'rCCTcC ACCT'rGTTAT T'rATTATAGG
TCAGGAAAAA
ACTTGTTCAG GTGCGAGAGC TTTGACATCC TTTTCTGTAC TGGACCAAGT TTCTCAAAGC GTTTATATAA TATCCAAAAT CCTTGACCAT CCCAGTAAAG CGGTCTTTAC GTCCACCACA AAAGAGAAAG ACTTGATCGG AGA.AAGGATC TGGGTTTTAA CTACATAGGC TAATGAGTCT ATTCCCTGCC TCATATCTGT ACAAGGTGAA CTTGACCTAA ATCACTTAGT TGAATTATCA TAGTACAATA GATAATTATT TTTTATCTGG TATACTGGAA GTTGGGGAAT TAGGATAGAT GACGCGCTTA CTATGAATTT GAAGTATAGT CTCCTAAATG CACTTAGCCC GCT7TTTTGTT TTAATTATTC TAATCGAGTG AGACTGvGGGA AAAAACAATT TCTAACCCT ATACAAAAAA GGAAGCAATT TGCTTCCTTT CTATTATTAG 239 TTATTCAAGG CTGCTGCCAT TGTAGCTGCA ACTrCAGCTT CGAAGTCGTT TGCAGCTrTC TCGATACC1TT CACCAACTTC AAAGCGAGCA AACTCAACTA CCGAAGCGTT AACTGATTCA AGGTATGCTr CAACTGTC T GCTTGGTCAA. CTTTAGTGTT TCCCAGATTT TTTCTGGTTT TGAGCAATAA CATCATCAGT GGT'N'ATTAA CCATTGCACG AACTCATCTT TAACGAA'NG GCTGCGATCT GCATTGACAA ATAACACCGA TACGTCCACC
GCTGTCATCC
ATCAAGCATG
GCC TTCTGCA
TAATTGAGCT
GC'TTTCGTTG
CTCATCCAAT
ATGATGTAAA
AAGCGATCCA
GCCAATTCAG
TTTGATCCAT
TCT'rGGTCGA
TCTTTGTAAG
CTTGTGCAAG
TTTTACCTGG
CTTrGATGTC
ACTTCAAGTG
TAACGTGATT
AAAGAACTGT
AAGTGTGTAA
AATAATTTTG
AGCTTCAGCT
TGGAAGAGCT
CAATTGTGCC
TGGTTTCATC
TCAATCAATG
TATGCAGCTr
GCAGGTTTTC
CAAAGCGACG
CAAGAGTrC
CTTCAGCAAT
TTGTT rAGCA AGTCTCGT CTCCACCTTC AACAACTGAA GTTATGTTGG TATGCTCCAA AGTGTTGTGC GTCTGTT'Tr GAATGAGATT TTCTCTCCGA TAGTTGCTGT TGCAGATACG ACCTGAAGGC ATTATCAAAG CA.AGAGCTTC TTCGTTGTTA GACTTTAGCT GTACTATT'rA CCAATTCAAC GAATTGAGCG TTTTTGCAA CGAAGTCAGT TTCAGCGT'rT ACTTCAATAA CTGCTGCAAC ATTACCGTTA ACATAAACAC CAGTCAAACC TTCTGCAGCA ACACGGTCAG CTTTCTTAGC TGCCTTAGCC ATACCTTVI-r CACGAAGCAA TTCAATCGCT TTTTCGATGT CACCGTCTGT TTCTACAAGC GCTTTTTTAG CGTCCATAAC ACCGGCACCA GA'PrTTTCAC GCAACTCTTT TACA.AGTTrA GC'TGTAATTT CrGCCATTTT AATTC'TCC'rA TATTTTTTGA AAATAGGAGA GCGCGGCTAA.
GCCCCCCCTC CGG
INFORMATION FOR SEQ ID NO: 16:
SEQUENCE CHARACTERISTICS:
LENGTH: 8411 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
CGACGGGGAG GTTTGGCACC TCGATGTCGG CTCGTCGCAT CCTGGGGCTG TAGTCGGTCC CAAGGGTTGG GCTGTTCGCC CATTAAAGCG GCACGCGAGC TGGGTTCAGA ACGTCGTGAG ACAGTTCGGT CCCTATCCGT CGCGGGCGTA GGAAATTTGA GAGGATCTGC TCCTAGTACG AGAGGACCAG AGTGGACTTA CCGCTGGTGT ACCAGTTCTC TTGCCAAAGG CATCGCTGGG 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 10993 120 180 240 TAGCIATGTA GGGAAGGGAT GAGATTTCCC ATGATTATAT AAACGCTGAA AGCATCTAAG TGTGAAACCC ACCTCAAGAT ATCAGTAAGA GCCCTGAGAG ATGATCAGGT AGATAGTA GAAGTGGAAG TCTr7CGACA CATGTAGCGG ACTAATACTA ATAGCTCGAG GACTTATCCA
AAGTAACTGA
G;TAGGTATTA
ACACZAGAAGT
GAAGTCGCTT
TGATGTCGTA
GAATATGAAA GCGAACGGTT CTCAGAGTTA AGTGACGATA TAAGCCCTAG AACGCCGGAA AGCTTrAATC CGCCATAGCT GGTTCGAGTC CTACTGGCGG TTCTTAAATT GAATAGATAT TCAATT=GA GCCTACGAGA TACACCTCTA CCCATGCCGA GTAGrrGGGG G1'TGCCCCCT GTGAGATAGG CAGTTGGTAG TAGCGCATGA CTGTTAATCA AGTAATtGAT AAAAGGGaAC ACAGCTGTCI' TCCTCTTTTT GTATCAATTT GTATCACCAA GAGAAC~rrC TTrr=rCCA TGTGCAATCC TTAGATAGAT GCTACTATAT TCTAATTCAG TTIGTAAATC TGTACTAAGC ATGATATGAA GCATTTCAT AAGGAAGTCT AAGTTTGGCA GACACCAAAA TGGTATTTAG ArrC-AGTTGC GTTATTTCG TAAGAAA'N'T GTTATTTCTr
AGTGCATGAG
ATAAATCGCT
TGGATTTCTT
TATTCAAAAA TAGTCCCATT TTCAGAAACA AGGGCAGCCA GAAGTGGTTC CTTCTAAAAT AGCGTCTCT TTGTGATGAG CATGTTTTTG GGAATAGCTT GCTTTGATAG TGCTCAATCA 'rATCATACTT AGCTGGAACG ACTAATTCCG GC7=TCTA CTAATTTGAC GCTcGrTCCAT CAGTAAT'rG'r ACCATAGCAT TTTCAATAGT TGTTACTTTC TTGCATATrT CCTCCTTGTA AACAAATTAG TTTTTTrATCT TG'rAATTTAG ATTTTTTAAT GTATAATCTA ATATGTTTAA AAAAGGAGAA ACTAAGTTTA AAGAATGGA CCTTTATTAT TGTCATGATC GGGATTTCTC TTATTCCAGA
TGACTCTAGA
AAAAATATAT TCAAATGTAT ATC ATAG TGAGTATAGA AGTAGAAATT TTATCAAATG TCGCTTTGT'r TTTAAGCGTT ACTATATGTC TAAAAA'rAGA TTATATCA.AA ATTT'TAGACA AGCAATTTAA AAAAAACCAA TCTGTACAAT ATCATAT1TTT 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 TGTCA'rCAAT GTGGGATCCA TATGGGCAAT TGTCTGACTT ACCTGTGGCA GTTGTAAATA ATGATAAAGA GGCTTCCTAT AATGGTAATA CTATGGCAAT AGGAAAAGAC ATCGTGTCCA ATTTAAAAGA AAATAAAACC TTGGATTTTC Ar'rTTGTAGA TGAAGAGGAA GGAAAGAAGG GATTGGAAGA TGGCGATTAC TATATGGTAG TGACTTACC AAGTCATTTA TCTGAAAkAAA CAACTACATT ATCCAATAT CAATCGACAG CAGCTTATCA A'rCArGACA AGTGAGCAAC AAACTGAGAT AAGTGATTCT GTATCTCAAA ATTCAACTGA TAGTA'rrCAA TCGGCTCAGT CAATTGTAGC TT-TAGTACAA GATTTACAGG GAAGTTTAGA AAACTTACAA AATCAATCTT CTAATCTTC GAC'IrrAAAA AATCAATCTA ATCAAGTATC ACCTATTACT TCTACTTCTT TGATAGGATr GTCAAGTCGA TTAACAGAGA TACAAGGAGA TGTTACTAGC AAAT'rAGTTC 241 CTCCAGTCA GTCGAT1TGCA TCAGGTGTAA ACGCATATAC TACAGGTGTT GATAAAG~rr 2100 CTCAGGGCGC AAGTcAAcTA AGcAAA ATGCCACCrr GACAGGTAGT TTGGATAAAc 2160 TAGTTTrCAGC CTCAAACACC TTGACACAAA AATCTTCTAG ATTGACAGCA GGAGTTGGTT 2220 AAT-rACAATC AGGATcTGGG CAATTAGCAG ACAAATCCAG TCAGTTACrr TCAGGTGCTT 2280 CTCCATTAGA GAATAGAGCT AATAAATTGG CAGATGCATC TG.GGAAACTA GCAGAAGGTG 2340 GAACAAAGTT AACT~TCTGGA T'rGGAAGATT TACAGACAGG ACTTCCrCT TTAGGACAAG 2400 GACTACGTAA TGCTAGTGAT CAACTCAAAT CACTATCAAC AGAATCTAAA AATGCAGAGA 2460 TTT-rGTCAAA TCCACTCAAT CTTTCAAAAA CAGACAATGA TCAAGTTCCT GTAAATGGAA 2520 TCGCAATA~C TCCTTATATG ATATCAGTTG CTCT'r'rITTT GCAGCAATAT CAACAAATAT 2580 GATATT3'GCG AAATTGCCTT CAGGACGTCA TCCACAGAGC CGTTGGGCTT GGTTGAAATC 2640 TTGAGCTGAA ATAAATGGTA TTATAGCTGT TTTGGCAGGA ATqrTGCGTAT ATGGAGCGT 2700 TCAGCTTAT'r GGTTTAACTG CTAATCA'rGA GATGAGAATA TTTATTCTCA TCA'rCCTAAC 2760 -AAGTTTAGTA TTCATGTCTA TGGTGACCAC TrrAGCAACG TGGAATAGCC GTATAGGAGC 2820 *.TTTTTTCTCA CTTATTTTGC TTTTACTACA GTTAGCATCA AGTGCAGGTA C1'TATCCACT 2880 TGCTTTGACA AA'rGATTTCT 'rTAGATCTAT TAATCCCTGG TTACCAATGA CCTATtCAGT 2940 
TTCGGGArTA CGACAAACAA 'rCTCTATCAA CAAGTCATTT TCCTAGCTGT CATACTAGTT 3000 CTATTTACTA GTTTAGGTAT GCTAGCCTAT CAACATAAGA A.AATGGAAGA AGATTAAAAA 3060 AATCGACCGA TTAAC'TGGTC GATTTTTTAT GCCTTAGATG ACrTTCGTCT GrGAT'TATAG 3120 ATTCCAAATA GTAAGAGAGA AGTAAAGGAA CAGATTGCTC CAGTAATAAA ACCATTGCGA 3180 .A'rGAAGGAAA GTGTAATAGT TCCTTTCCCC TTGGGAATGT CAACTTTCAT AAATCCAGTr 3240 .*TGAGCT'rGT'r 'AATwrCTAT TTTC?1'ACCA TCTTGGTAGG CAGACCAACC TTTGTCATAA 3300 GGAATGGTGA AGAAAATAGA TGTATCTTGT TGGACATCAT ATGTAGCAAA AACCTTGTTT 3360 *TTAGAAGTTG ATACTGTGAC AGGTTGTTCT TTAATT'=T GAATTGCC'rC GGTGAAAGTT 3420 T TGGTATCTA AACGATAGAA GCTAGGAGAT TCAAATGATA CT'rGTGAA'rr TCCAGGGAAA 3480 CTAACATTGA TATTGAAAGT 7TrNrCTCT TTAGTATATC CTAGATTAAA GAAGGAGAAG 3540 ACATTATCAG TTGTAAAAGT CTTTT'TTTCA CCATTTACAA GGATGTCAAC CTCI-rTGMT 3600 .**TTATCG'N'AG AAAAGTGAAG GTTTATGAAA GAGAGATAAA CTTGGCTGTT TTCTGGAACT 3660 TCAATTTGAT ACTGGATTGC TGCATCTTCA TTTCAAGAAC TTGTGACACT AATCAAATCA 3720 TTAGTATTTT CTATTTTrTC TGTT'r'TrCA 'rAAGGTATTG GAGAAAAATA ATCAAAATTG 3780 242 ACGTTAGCAA GTTGATTAA AAATGAGGCC TGATI'ATCCA AGGTATGTrC ACATCATTGT AAACAGATrG ACTCGCAACT GCAATCGGAA GAGAGTATTG AGGGTAAGAT TATCTTIrG ATAGATATCT TTAAAGCCAT ACI'ATCAAT GAGA'rATTGT ACTGGATACC AAATAAACTA TCAGCCAAAA TACTATT AIrr AGATTGAGAT TAGTCCCAGA GGATTTAAAA CCAAGTTTAT CTAAAGTAGA CGATTTCGAA CAGATGAAAA TTGAGAGATT CCATTCTAGT TGAATTTCAT
ATTGAACTTG
AT'rTTCATAT
AGGACTGTCT
TGCATATCGG
GCTTGATGAA
ACTGTC-ATrr
CCTCTCTGAG
GATTCCATAG
GCAAT'rCCGT
ATTAGAATGG
GAATAGACAA
TGCGATTTTA
AAATT'CCAGA
TTTGTAGTTT
CTGGGATATC
CCATTTGAGA
CAAATAGATT
CCAAAAATT'C
TTCAGTACGA GTAAATTGAT TCGACTATAA GCACTTCGAG TGAAGCATTT AAACTCATTT CACAGATATA AACI'TTTTGA AAGAGTAAGC AGAATATTCA TTCCAATATA TGTTGAGAAA AAGCAAATCC CCATTCCTTA CAACCAGTAT AAATAAAGAG TAACTGCAAG GAGTAAAAGA AATCTGTTAA AAAAGAATAA CAAGAAAAAG CGAAACTAAA CTGCTGCTGT GTAAATTAAC GATAGATGGT AGCTAAAAAT CCTGCTACTA CTTTAAGTTC TTTCAGACGC TTTAAGACT1' AAGGTAGAGA AAATCCAAGC ATAGCGATGT AAAAACATGT TTGGAGTATG CATGCCrrGC CAAAATAAGT CAAGAGCTTC TATGTAAAAG CTTGCAATTA GAAATGCAAA CAATATTACA TATATGAGTT TCACGTGAAA CTTAATAGAT TTCAGCGTAA AAAATAAAAT GGTCAAAATA AAGGGAAATA GTCCAACAAA AATCATTGGG ATGGCCCCAT ACT'ITTrGT GTCAAAGGAA 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 CCAATGAATT GCTTAGCAAA GAGATCAAGA TACCAGCTAC ACT'rCAGTCA ATTTTTCCCC ATGTGTCTGT AAATCAAATA AAACTAGCCA 'rACCAGCTAA AAAGGAGATA AC'TATGAAAT GTCTTAAAGT CCCACGAAAT TTGACAGAGA TACCAGAAAA TATCCAAAAT AATAATTTTG AATAAATAAG ATTGACAGAC TTTCAGTTTG AAkACTTTGTA GAGTGGGAAG AGTCATAATC CAAGAACAGA TGATTTCGA TAAGAAACAA TACTGTCATA TTGTAAAGTA CAATACCAGT TTCTTTTCAG T'rATCAGTAG ATGTAAACCA GTTATAATTA AAGGAATCAA GATAAAAACA TCTAGCCAGG T=TTATCTC TAATTGACTG ACAGTGAAAC TCATCAGAGC ATAGGAAGTA GATAAGGCTA GTTTTAAAAT CTGAGGGATA GATTGAAACA ATTTATTCAA ACTAAAAAAG GTTGACAGAC CAATCAATCC AAATTTTAAG AGAGTTGTCA GATAGATAGC ATCTGGCATA rrcGTTAGAT CAAAAAAGTA AACCAGAGGC GCGAGAAAAC TACCCAAGTA ATAACTAGAT AGGCATAGA ACTTTAGCCC TAGACCACTT GTAAAGCTGT AAAACAGATT ACTATTTCCA TGTAGGATAT TTCGTAAGGC TACATCAAAA ATAACGTATT GATGAAAGCC ATCTCCTAAT AGAGGAGAGT TGTCGCTATT CCAGTAGATA CTTTGAGATA GATATACTCC AGACATAATC 243 ACTACAGGAA TGATGAAAGA AATAAAATAG GTTCGATATG TTTTTAAAAA TGATTTCATG 5640 TTACCTCGTA GAATGATAGA AAACTCAGTT GGTTAACCCA ACTGAGI'TT TTAGTCTTTC CAAAGTTCTT TAACTI-1-rGC TTGTACTTCT GCA~rrrCTA GTAGCTTTCA TCGATACGGT CAATGACGCC A'TT=AGAT AAGACAATGA CAAAGTTTGA ATAAATTCGT GGTCATGGCT GGCAAAGATG ATTGA7TTC GAAGTTTAT 5700 GGAATTCATC 5760 TATGTAGC 5820 TAA.AGTT'rrT 5880 GATCA'rCAAG S940 CTTTTTCTCC 6000 
TACGGCCGAG 6060 GCAACCAGTC 6120
CAATCCATCA
TACAAGGACA
CCCTGACAAG
GAAGCCACGT
AAGAA'rTGAT
AGTTGTAACT
rrCAACTG TrGAT'TTA
ACATTTACAG
AGATAGATTC
AGAGCATGAG
GTlrGTAAC AGCAAAGTAT TGTCATCTTC TCTCCTCCTG CAAALATCAGC CCCCACTTGA CAGTTCCTTC
CAACTCCAAG
TTTTCAAAGC
TTCATCTCCA
TTCCTTTAC-T
TGA=TATCT
ATAGTCAATA
AATA.AGTGCIT
TGATTGTrG
ATGACACGAA
GAGAAGACCA
GCGAATTGAC
TAATOCAGTC GTTTGAATAT CAT-GTCC a a a GATGAAACTA ATATTA'rCCA AGATAGTTTC ACCATCAATC TGTCAAGACA TCATTACCAA TCTCACGTC CGCTAAAG ACTAGATGGC ACAATCTCTT CTAGCTCAAT CTTATCAAGC CTGCCTGAC TTAGAAGCAT TGGCAGAGAA ACGAGCAACA TTTGGTAGGT AAGATTCACT TCTCCCATCA TTGCAkCCAAT GTCTTATCAT C1TGGACGCAA TTTACAGTTA AATTTCTAC TTGTA.AAT G ATA'TTACG ATTC'CPTAC GTGATGTTGC AATTrCTTGCA ATTGTTTAAT TI=TCTTCr GCTTTAGCAT TACGGTCTGC TAGCAATTTA GCACCA.AGCT CAGAAGATTC CTTCCAGAAG TCGTAGTTTC CGACATAGAG TN'TGATTTTT CCAAAGTCAA GGTCGGCCAT GTGAGTACAA ACTT'rGT'A AGAAGTGACG GTCGTGGGAT ACTACGATAA CTGTGTrATC AAAGTCAATC AAGAAGTCTT CTAACCAAGT AA'rCGAT'rGG ATATCCAAAC CGTTAGTAGG CTCGTCCAAG AGAAGAACAT CTGGTTTACC AAAAAGTCCT TTGGCGAGGA GAACCTTTAC TT?MCACCC TTGGCCAATT CGCTCATGTT TTGGTAGTGT AATTCTTCTG GAATGTTTAG GTTTTGAAGT AGTTGAGAGG CTTCACTCTC TGCTTCCCAA CCTCCAAGTT CCGCAAACTC TCCTCGAGT TCGGCAGCAC GAACCCCGTC CTCG'rCTGAG AAATCTTCCT TCATGTAGAT AGCATCTTTC TCTTTCATGA TGCTATAAAG TrTr'rCATTT CCCATGATAA CCACATCAA'r GGCACCTTCA TCTTCGTAGT CAAAGTGATT 'rrGACGAAGA ACAGAGAGAC G?1'CATCTGG ACCAAGAGAG ATGTGACCAG TAGTAGGTTC GATATCTCCA GCrAAAATTT TTAAAAAGGT TGATT~CCG GCACCATTrAG CACCGATTA.A TCCGTAAGTA TrCCTTCTC TAAATTTGAT ATTGACATC.A TCAAAAAGTT TGCGATCACT AAAACGTAGT GAAACATCAG ATACTGTAAG 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 244 CAATGTTTTT CTCCTATAG TGTAATATAT T'rATTCTACT AGAAAATACA GAAATATTCA AATTT'rTATT TGTCAATTTT GTGTAAATrA TATTrACAGT ATXCCTTrACA CAAATCTGTA AAAAGCAAGG CTGAlrATr TTGATAAATT ACGGrA'rrr CAVTAAAAAA ATGCTATAAT TGAAAGGACT ATATCGAAGG AGAACAAAAT GACTAAACCC ATrAT'r"AA CAGGAGACCC TCCAACAGGA AAATTGCATA TTGGACATTA TGTrGGAAGT CTCAAAAATC GAGTATTATT ACAGGAAGAG GATAAGTATG TCATGCCAAA GATCCTCAAA TGCAG'rrGGA TTGGATCCAA GGCTGAGTTG TCTATCTATT AACAGTCAAG ACAGAATT GGTCTATCCA ATCGC'rCAAG TGGGACAGAT CAGAAACCAA TGCATATAAC TGTGATGTCT ATATG37rTGT GTTCTTGGCT GACCAACAAG CCTTGACAGA CCATTGTAGA GTCTATCGGA AATGTGGCTr TGGATTATCT ATAAGTCAAC TATTMrATT CAAAGCCAGA TTCCAGAGT'r ATATGAATCT AGTTrCGTTA 
GCACGTTTGG CTCAGAAAGG ATTTGGAGAA AGCATTCCGA CAGCTGATAT CACAGCTr'rC AAGGCTAATI' TGAT'rGAGCA AACTCGTGAA ATTGTTCGTT TGGTAGAGCC GGAAGGTATT TATCCAGAAA
AGCGAAATCC
CAGGA'rTCTT
ATGTTCCTGT
CTTTTAACAA
AGGGCGTTTG CCTCGTTTAG ATGGAAATGC TAAA.
TTATTrAGCT GATGATGCGG ATACTTTGCG TAAA.
AGATCATATC CGCGTTGAGG ATCCAGGTAA GATTI AGATG'N'TTT GGTCGTCCAG AAGATGCTCA AGAA.
ACGAGGTGGT CTTGGTGATG TGAAGACCAA GCGT' ACTGGGTCCG G
INFORMATION FOR SEQ ID NO: 17:
SEQUENCE CHARACTERISTICS:
LENGTH: 9064 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
ATGTCT
AAGTA
ZAGGGA
TTGCT
rATCTA AAATCACTAA ATAATGGTAT ATGAGTATGT ATACAGATCC AATATGGTTT TCCATTATCT GATATGAAAG AACGTTATCA CTTGAAATAT TAGAACGTGA 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8411 120 180 240 300 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: T.GCCGTACTC AAGTACAGCC TGCGCTAAGT TTCCTAGNTT GCTCTTTGAT TTTCATTGAG TATTAGTAAC CAAAATCCGA CCACATAGCC ACCCCTATG AATATAGCCA TTAAAGCTAG CATGGAATTT AGGAAATTAA AAACCACCGC AGATACAAAG GTTAGCACAA AAACATTAAA AGCAATGGTG TCAGAAGCCA AGACTAGAAT ATAGGGTGTC AACCGATCTA AAGTTTTGGA ATCTAGGAAA AATAAGTGTT TATACATGAT GACCTCCTCT ATGGCTGAAA AGCAAGCCTT TTCrTrTT ACCCCAAGAC CCTATGTAGA AAAGTGAGCA AAPLACGGGAA GGTCGCTACA 0 1 Ub* *p 0t 00 09 0 9 0 00 0 0 9000..
0 C 0000 9. 0e 0 ATATrATrGA TCACATGCAC CCCATAGGAT CCAGCAAAGA TGATTCCAAC TGOTTGCAAAG GAAAAATGAG GGAGAGCAAA TAAAATAGAA TGCTrAAAGA AAGCATGN'G CAGTAA'rCT AGAAAGAACA GGGCrATATA AATACCTAGC AATACAGCCC AACCTrCCGC AG'rTGACTGA ATCTGGAACA CTAGCACTAA 'rACTGTCAAA ATGCGGAAGA GATAACCATG GCCTGTCTTA AGTAAACTCA AGATATTTTG AATCCAGAAT TACT TTT GGA CGATAAGCGT CAGCTGAGAA ACTGCACT'rA TTTTGAATAG AAGTTCATAC TCACCTTGTC AGGCTCTACT GCTGTAAGAT ACCTGACTAC TAGATAATAG ATACATTAAG A6ATAAAATCA ACCTCGCATC CAAACCAAGA AGAATT'rGAA ACCATAAGGT TTTCCAAAA TCCTTGAT'rT TTACCGCCAC CCCTTTATTA CCACTGTAAA GAACAAGCCA CCCAATAGAT AGAATATCCA ACACACTACT CAAGAAAATA ACCrCCAT'rC ATTTATTTCA CTAACAA'ITTT GAAAAGGATA GAAAGCTACT 'TrTATAATA AA.ACA6AGCAG AGAATACACC TATATAAGCG ATACCTCTAT ACAAACAAAT GACAAACATA TTGGTTCTAG GACTAACCAA ATCATCAT AATGTATGTT ACCACTGAAA AGCAAGACAG GGTTAG1'CT CTAAAAAAAT TATCTACTGA GGATAAATGC TCTTGGTATA GCGGGTCAAA ACGAAGATAT CTAACAGACT AGGCAGGCTT GGAAGAAGCA AATCAAGACC AAATCGCGA6A CTATAAATCA ATTCTTCCAT CAGTGGAACC TCTGCAAAGT TAGTCCCACI' ATAACCAATC ACATG'm'TAG CTGTCTGAAC GTTAAAAGAG ATCGAATACC A.AAGCCATTTr TTTrCTrGGA ACAAGAACCA CAATCATGAC TCCAA'rAAAA AAATTGCCTA TCTGAGAAGA AAATTGCCAA AGACTAAATA CGAAAAATAA GTAAGAGAAG TTTTrCATAG AAATCCTCCC TACTATGACC TAAGAAGACA GTTrGTTTT TTTAAGGCTA GCA'N'AAAGA CAATGAAAAT ATGTCCATAG TAAAGTTTGA Tr'A'rCAAAAA GATGAGCAAA ATAAATTTAA AGCGA'r'r'CG AATATCTACT GCAAGAAGGA AAACTCCTGC T'rCAAACAAA ACGATAGAGA T'PrGTAAAAA TGTCCCTAAA ACAAAAAATA ATCTGTATTT CATATTAAAT AATAGAGCCT TCTACTCAAA TATCC'rGTCA 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1250 1320 1380 1440 1500 1560 162U 1680 1740 1800 1860 1920 1980 2040
CTTCAAGCCC
AT'rAGTTGTT
AAATCTGCCA
ACTTATATTT
GCCAATAATA
CACTACAAGA
CACATGAGCA GAAGCGTGAT GATAGAATTC TGTTTCTGAA AGCCGATAAA CATAAGTTGA AAGAGTATCT CTTTTATTTT TTTAAAATGA ACAGTAACGG AATACTATAC A'rATrATAGT CCTAkACAAAT CCAGCTTATC GCAACTTTGA CAAGTTTAGT TGTCATCGAA ACATCTTGAA 0. S 0 CGAAACTATC TTTTTCTTAT AATCAAGAGC GATTrTTTAAC ATATCATTGT TTTTrTAAAAT CCATAATTAT TTACTCCT ATAATGTAGC AGCACCCGTT TT'rTCATCCA AATCTTGAAT ?rGTTAAAAA ATTTAAAAAG TACCAACTTG I1rGTAGACT 246 TAAGCATTAA AAACATACTT TCCTCTTTAT 7wrrCA'rCCTG CTATCACATA TCATTTTCAC TAAATTAAGC TAGCAAATAC AGGGGAGAAA TTTTTGGAAG ATI'TTGAAAA TATTTTTCTA CCAACTACTA AACCAATAAT AAAAC=rA ATATT1AAAGA
AGAGAGTACT
CCATATAAGG
AACTCCCCTG
ATCCGTATCC
ACCAAATATA
ATTGTATTGA
AGGCGAAACA
TTATIrTT
ATTAAGTCAT
AAATCCATAA
TTACCACCAA CATATTGCTG AGTTACACCT ATTCCTATAG CCATCAATTG CGCCATATGC TCTCCACCTT CAACGCAAGC TTTGTATCCA TATAGTGTAT AAAAAAACAT AGGCAATAAA AAACAATTCA ATGATGTTAA AGATAAATAA GATAGGTTTG CATAGGCTAC ACCTCCAAGT CAAATGGTCC CAATAGAAAT AACCCCTGCT GCACAACTAA AAGCATTTCA TTATCCATAA CACTTTTCAG rrACGGAACA ATAGCTCCAC CTGCAGCACC GTCAAACCGT TGTTGCACAC rrTTTCTTCC CCAATCAATA CTGCAAATTG TGACATCATT AGTTTAATAT AAAAAT'rATC
GAGAAAAATT
TTCAATAGTC
AAAAGCGAAG
AATTTATCAT AGATTAGAAA TAATATGACA TTrTTGTTTT TATCGGAGAT ACTTATGGAT AGAATAATAA AGAATATAGC CTTCATA.AAA TTTAGCTTTC ATTTTTATGA TGTAGCGGTA CCAATTCCTC CTATTGCAGC GCCCCATGGT CCACAAC CTAAAGCAGC AACTACAGCT CTCAGCATTG T'rCATTTAT ATTACAATAA CCACCCGTTG CCCCTGTTAC TCCTGCCCAA CCACATGCTC CCATAAATGG TGCTCCAACA
TAGGCTAAAT
CCTAGAAGTC
GC1'CCTCCGG
GTAT'CATAC
ATCCACAAAC
TCCCATATTT
AATTACCTCC
AAGTCTCCTT
CACTGCTCCT
CACTCCACCC
ATAAACCTCA
TTATTAAAAT
TCCTATGTAT
TGTCCCTAGC
ATTA'rTCTTG 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 AGATCCACAC CAAATTTAGC CCACTCGCAG CACAAATAGC TCTAAGACAT TAGTTTGCCA ACCACCACCT CCCCAGCCAC CAAAAGCAGC CCTCCTTCAA TACTAGATAA GTATCCATGA CAAATACTCT TTTAGTATAT CATCGTTTTT CTTGAATTGC AAAAATTACA TAATACCAGC ACTCAAATTC -TTGAAAACAT 'rTTAAACGT TAGATACCGC ATGGTTACAG TACGTGAAGT TTGTGCTTGA AGATTTTACC AGCTTGTCCG AATAGTTCAC AGTGATATCC CATAG'rTATA TCCATTTCAT GAAATTGTTC CATAATTTTTr TTTTTATTTT TAATTTTGT CTTGTTGTAA CTTTGACAAG TAAAATTrTTT CATCCAGATT TTGAATAGTC ATCGAAACGT TTAGACTTCC TGCAAAACTA GAATCCTAGT TCATGATTGA ATTCGTAATC CGAAGCGTTT ACGATGACTT CGATAGGTTG TTTACTTTGG CA.AAGATGTT CTCAACCTTG CTTCTCTCCT GCTTTATCTT CAACTGTTAG CGGTTTGAGT TTGCTGGATT GGATATATCT TCATGAGCCC TTGATAACCA CTGTCAGCCA ATA7TTCTGC GACTCATTTT GAACAAC'rC ATATCATGAC AA.AGAAACAA TTCTCCCTTG ACTTGTGACA ATCGCTTGAG 247 TCTTCATAGC GTGAAATTTC TT=rACCAG AATCATTCGC TAATTrCrTT rrGATrTTA CNrCCGTCGC ATCAATCA'rT ACCGTGTCC'r CAGAACTGAC CAAATCGTAA CACCACIIYG AACAAGAGTr ACTTCAACCC AlTTCGCTCCG TTCC'TCGT GAACACCAAA ATCAGCCGCA ATICTTrCAT AAGTGCGGTA
TTTAGGGCGA
AGGAGTTCTrr
ACGGAGTAAG
rrCTCGCACA TATTGAAGAG TGGCCATAAG AAGGTCT'rCT AGGCTTAATT
GCGTGTTTAA
TGAACACCAA
GTTGATAAGC
CAAGACG=T
TCTTT'rTAAT
AAATCGTGCA
AATAAGAACA
ACAGCTAGCA
TCAGITAGTT
GGATAAATCG
ATAGAACTAT AGTAAAATGA TTCTAACAAT GTTTTAGAAG CATATTrTTGT TTCGCAGGGA TTTrTCTTTTA TACTTCATTA AAGCATGATG ATTAAGCAGA TAGG1TTwTCG TCCACCTT TCTCTTCAAA AGTCGTGCGC GTTTAC??GC TTCATAATTC ATCAGGACAG TCAAATCGAT TTCAATCTAC TATACTA'rAC TATTGCAAAA ACACTTACCC TTCAAGCCCC ACATGAGCAG 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 TAGAGGCGTA CTATTCTAGT ATCTATTATA AAAGGGTAAG AGCTCTACTT 'rTTATAATAC GAACAGCGCC AATATAACG ATTATTTGTT TCCTGCTGTG ATACCTCTAT ACAAACAAAT AATAGACATA AAACCTGTCA CATAAG N'GA TTGGTTCTAC GACTAACCAA ATCATCATCT TCAAAC'rCTC T'rCCCTAGTG AGATAAACAG TAACCAAAAT AGAAGCCA.AG TTAATAACTA T'rGGAAAACT ACGGAkb.AAT TTAAAAACTG ACGAGATAGA AATAGATAAG GGTAGGATTC 4560 AGCCGATGAA 4620 TTATCCTCAT 4680 CTAAAAGAAA 4740 TAGAAACAAG 4800 TAGCAAGA.AA 4860 AACTGCAATC 4920 ATATAGAAAG 4980 CAAGGGCAAC TGACCTAAGA ACAATCTCGC AGTTTTCATT TCTTTTCTCC TTrTCTTTTTA ACATAGGCTA TGGTATAAAA TAGCTGATAC ATGGACATGA TTAGATACAG AACGAAAATA CCTAAATGTG CGATTTATCT TAG7TGAGCA TTTGACCTT GGATCACTCA AATCATAAAT TAAAAAAGCA AGCATGAAAA ACATACTTTC TGTAGACTTT TCATCCTGCT ATCACATATC
AAGGAAGATG
TTGATAGCAA
CAAGCACTCT
TTCCGTTTTT
AATAGATCAT
CCCTA.AGCGG
ATCAGTATTT TTTTCTTCAT AAGATrTCCT AGAACATTTA CACTGCTACT ATAGCACTTA GGTCATCAALA ACCTCTTGAA TTGTAAAAAT CTCTTTATAT TGTATTGATA CCAACTTCTT Al'rMGACAG GCGAAACAAT ATTAAAGAAA 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 CTCCCCTGTA AATTAAGCTA GCAAATACAG GGGAGAAATT TATTTTTTAG AGAGTACTAT CCGTATCCTT TTTCGAAGAT TTTGAAAATA TTTTTCTAAT TAAGTCATCC ATATAAGGAC CAAATATACC AACTACTAA.A CCAATAATAA AACTTNTAAA ATCCATAATT ACCACCAACA TGTTGCTCCA TAGGCTACAC CTCCAAGTAT AGCTCCACCC GCAGCACCAG TTGCTGCACC TTGCCATGTT CCTGTTTTAA TGCCTAGTTG AAGACCTCTT GCTGCTCCTC CTCCAACACC TGCTTTGGCA AAATCTCCCC AATTGCATCC ATCCATAACA GAAAAT'rGTG ACATCATN'T AAACTAAAAT AAATCAGAAT AGAATCCTCA CCCAAI-rAT CACCAACCAT ACCTCCTAAG TGTGCCCAA CAAATGCACC AGCAAGTCCA GTTCCACCAG TTATAATTCC CGTAGTGACT GAGCTATACC CCCCTTCAAC TTTCGCAAGC AACAITTTTG TATTCATGAT GAATACCTCC AAATTCAATA AACAAATAGA, T'-rT'AG 248 GCCACCTTCA ACGCAAGCAA GCATTTCAGT TGTATCCATG ACAAATACTC CT7='rAAA TAATTTTACT ATAArCTCTA CCAACTTAGT CATGTTAATC CACCCCCAAT TGCACCAATG GCTACTCCTA AAGTGGCCAA ACCTGCTCCA CCTGTAATCA CTCCA7=GT ACAATCAGTG ATTTCAGTAT CCA'rAACCTC TAACTGTGAC TT'rrATTTT CAAT~rGTTA CCAAAGTCTT TATCT'=TG ATT-TCTTAA AAAAGTATAT CCTATTTTTT AGTCTAAGAT T'rCAATAATC TTCTCCTTGC AATAAAAAGT TTTACTATAC ACGTCTACTA TCTTCTTAAA TTGAGTATCT AAAATATCTT TATTTATTAA CTTGCAGAA6A TATTCCTAC CAATCCATCA ATAATTAAAA TTTGCTAAC TAAGTAAGTA AATAAGACAG ATCTTACAAA GAGGACATAA TATCGGACCA GTCGCAGCAG TAAATTrGCCT C'PrCCTCCAC TTCCATCA~rTTTGTATTCA TGTAACTTTG ATAAGTTTAG TTGTCATCGA AACGTCTTGA
GGTAGCAGTA
AA=TCGTTA
GCAAAAA.ATA TTAGTAAATA ATACTTTATA GTTAAGTTTT ACTAAGTAAA GCATCAACGA TTACATAAAC GAT'rGATAAT 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 'rATCTTATTC TCATCATTCT TAAATTAATA GCGATAATAA TTCCTCAACC TACACAAATA CTAATAGTAC TGCTCCAATA TAACTATTTC GAGTTCTTCA TGACAAATAC TCCTrTTTTTC TAGATAACNT TGATATTT'rG TACTATATTT AAGAATCATA AGTGTTGC'rG CTCCCCCAGT CA.ACCACCGA TTGCAGATCC TTATCCATAA CAGAAAATG ?TTTTTATT TTTGTCrTGT TATATCATCG T'rTTAAAA TT=CATCC AGATCTTGAA ATTAGCTTTT TTATTTCAAG CCACCTCTA.A ATGTTrAAAA AAAATAATTT CTAATCACTT IM'rACCATT ATAAAATATG AACTTAGT'rT TATGACATAA CAATGACTTC TTATAAACGT ACATr'rGTTC CTGCCTTAGC CTCGATTGCT AAATTCTATG AACTTGCAAA GACCAATAAA GAAGGGACGA AAATGGGCTT TGAAACAAG3A CCTGTTCAAG TCCCCTATCC ATTTATCGTr CACGTTAACA CAGGAAGTTT TAATGACTAT TCAAGATTTC TAGACCTATC CACTATATGA AAGGAATTGC
CTCAAATAGA
G'rTCAGATTT
CTGCTCTTGC
TGCGAGAGAC TGTGGTGTCG T'TCTCTAGCT CACTTCAGAG CATTGTAAAA GCCGCTGATG CAGATAAAAC GC'rCTTTGAC ATGAGTGATG AAGAAGGAAA ACTCCAACAT TACTATGTTG TCTATCAAAC AAAGAAAGAC TATCTGATT'A TTGGTGATCC TGACCCTTCT GTAAAAATCA CTAAAATGTC AAAAGAACGC TTTTTCTATG AATGGACTGG AGTAGCTATT TTTCTAGCTA 7320 7380 249 CCAAACCCAG CTATCAACCC CATAAAGATA AAAAGAATGG TCTACTAAGC CCTCTGATT-T TCAAACAAAA ATCTCTCATT GCTTACATTG TTCTCTCAAG ACTAT'rATCA ATATAGGTGG Trc-=ACTAT CTCCAAGGAA TC?1'GGATGA AATCAGATGA AATCAACTT CAACAAGTCA TCAGCTTCTC ATrGATGTGA TTTTATCCrA ACACGTCGTA CACGAGAAAT TTGGCTTCTA CCATTCTTTC GTCTTACTCG CACAAAACCC ATGTTCATCA TCTTTTCTTT AGTAATTCTA TGGTTAGCTC AGGAATCATC TCAGTTGGTC CAGAGA'rAT CTCCTAACCG TATTCGCCAT ATTNTGAAC
TGGTTATCAC
TTCTGAGTCA
TTCCCATGTC
AAGCTTCCTT
CTTATTGGTC
ATACATTCCA
CTATATCCTC
GAGAT'rAACT T7TrGCG
TATAGATGCC
TGTAGGAGGC
TCCTATATAC
CxrrCACGA TCTTT7TTCTG
TAATCTCTTC
TATGAAACCI'
TGCCA'rrA'C TTCACAGATG C1'AACTCTAT GATGTTTCTA TTCTGATCT CTTCTrrCTC TTATTTCCAT 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 rTTCGAAAAAA
GAAGATATCA
AATATAGACA
ATr'IrACAAA TGAACCATGA TGTCATGCAA ACGGGATTGA AACTATAAAG GCGAATrTGT AGATTArTTG CGAGrTTAAA GCAGGGAACA TCGCTCACGA GTGAAGAAAA TCGCTATCAA GAAAAATCCT TTAAGCTCAG TAAATATTCT AAA'rTAGTTC TGAATATCCT TATCCTATGG TT'rCGCGCTC AATTAGTCAT GTC.AAGTAAA ATTTCTATCG GTCAGCTGAT TACCTTTAAC ACACT'TTrT CT'rACTrTAC AACTCCTA'rG GAAAATATTA TCAACCTCCA AACCAAACTC CAATC-GCGA AGGT'CCCTAA TAACC=TTG AACGAAGTCT ATCTAGTCGA ATCTGAATTT CAACrrCAAG AAAACCCTGT TCATTCACAT TT'rrGATGG GCGATATrGA ATTTGATGAC CTTTCTTATA AGTATGGTTT TGGATGAGAT ACCTTAACAG ATATTAATCT CACCATTAAA CAAGGACATA AGGTTAGCCT AG'rrGGATT 8400 8460
AGTGGTTCTG
GGGCATATTT
CATATTAATT
ACCTTGGCCG
GTAAAACAAC
CCATCAATCA
'rTTAGCCAAA ATGATT'GTCA ATTTCTTTGA TCAGGATATT AAAAACATTG ATAAAAAAGT ACCCTACAAA 8580 CT1rGCGCCGT 8640 GGAA.AAC-T~a 8700 TGAAGTAGCT 8760 ACCTACCCCA ACAAGCCTAT ATCI'TTAATG GCTCTATTTT GTAATCATAT GATTAGTCAA GAAGATATTC TAAAAGC.TTG GAAATCCGTC AAGACATTGA GC'rGGTCTAT CAGGAGGACA TCTCCGTTT TAATACTAGA GTTATAGATA ATCTTATGTC AGTATAGCCG AACGAACCAA
GGTA
AAGAATGCCT ATGGGCTATC AAACTCAGCT CTCTGATGGA GAAGCAACGA ATCGCTCTCG CTCGTGCTCT TrrAACTAAA
TGAAGCTACT
TCTAACTGAT
CCGTGTCAT
AGCGGTCT'rG AAAACCAT-rC
GTTCTTGACC
ATG'rCTTGAC TGAGAAAAAG TCTTTGTAGC CCATCGTCTC AGGGGAAAAT CAT'rGAAGTT 8820 8880 8940 9000 9060 9064 250 INFORMATION FOR SEQ ID NO: 18: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 7780 base pairs TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: CTCCATTTTr TTGATNTCAT AAATAAACAA CCTCTCTGTT AA~wrTTTAT AATTATAACG ATATCCAAGT TACTTGTCAA GfGTTrTMA AATTTTTATC TCAAAAATAT ?TTCG-TC 120 AA.AAAAAGGA GCCATCAGTT GA'ErTCAAGC TCCCT'TrAT ACAGAATTAA AC'rATTTTAT 180 AGTTCGACAA TCTTACCTGT T'rCAAAGTAG ACAACCCATT CACAGATATT TTTAGCATAG 240 TCACCGATAC GCTCCAAGTA GGAAATAACT TGGAAATAAT CACGACCCGT AACAATGGCT 300 TCTGGATTTT TCTTAATCTC TTCAGTCGCA AGGTCACGGA TAGTTTCAAA ATAGTGGTTA 360 ATTTGCTCAT CCATGGAGGC CACCCGGTAT GCGTCGTCAA CAGAACCATT AAGATAAAGA 420 TC.AAGTGCTG CTTCCACAAC GCTTTTAACT TCACGTCCCA TTrTTTTTAAT TTCTTCCTCT 480 ***ACAGCTGGAA TGCGCTCTTC CCCCTTCATA CGGATGGTTG CCTGGGCAAT GGCTACAGCG 540 *.*TGATCCCCCA TACGCTCCAC ATCTGATACA GCCTTAAGGA CAGTCAAGAC TGTACGCAAA 600 ***TCTTGAGAGA CTGGTTG'rrG GAGTGCGATC ATTTCAAATG ATTTCTTTTC CAGTTTCACT 660 ***TCGTATTCAT TTACTTCTGC ATCATCTTCG ATGACCTCTT TTGCCAGGTC ACGGTCATGC 720 GTGACAAAAG CACGTACCGT ACGATTGATT TGTGAGAGCA C=TT~GTCC CATAGCGTAG 780 *AACTGGTTAT GTAATTTCTC TAAATCTTCT TCA.AATTGAG ATCGTAACAT CTTTCATCTC 840 CTTATCCAAA TTTTCCTGTA. 
ATATAGTCTT CCGTTTCCTT GTGTTGGGGA TCAAGGAACA 900 TCTGCTTGGT ATCATTAAAT TCAATCAAAT CTCCATCTAG GAA.AAATCCT GTCTTATCAG 960 AGATACGTGA AGCTTGCTGC ATGGAACGGG ?TACCAGAAG CATGGTGTAC TTGTCTTTA 1020 *GACCATACAA GGTTTCCTCA ATTTTACCAG CTGAAATCGG ATCCAAAGCC GAAGTTGGCT 1080 CATCCAAGAG GATGATTTA GGACTAGrTG CCAAGACACG GGCCACGCAG ACACGCTGCT 1140 GTTGACCACC TGACAATCCA ATAGCTGAAT CATATAGACG ATCCTTGACC TCATCCCAGA 1200 TAGAGGCACC TTGCAAGGCT rTTTCTACGG CTTCATCCAG AACCTGCTTA TCCTTAATTC 1260 *CATTGATACG AAGCCCGTAG ACAACATTCT CATAGATAGT CATAGGGAAA GGATTAGGTT 1320 GTTGGAAAAC CATTCCGATT TCC'rTACGTA ATTCAACCGT ATCTGTACGC GGACTGTAGA 1380 TGTTGTGACC ATTGTACACC ACGGATCCAG TTGTGGTCAC CTCTGGATTG, AGATCTCCCA 1440 251 TGCGGTTG.AG AGAcTTGAGG AGGGTTGAcT TCCCTG.ATCC AGATGGACCA TAATTTCCTT AGTGGAAA GATAGGGAAA CACTA'N'CAA AGCC=rCTTT
AAACGGACAG
AGTGACCAGA
TCTTGTCATA
GTCTGATACC TGTAAAATCG TACATAGTCA ?TGGTGGACT CTCAATCAAA TCACCCAAGT CATCTGTCAT ACGGTTTCCT GTAGCTTGGC ATrrGGAAA AAAAGAAGCC TGTATAGTC-A
ATCAAGGCTG
TTATTATAAT
'TTCTAACCAA
ATAGTTGCAG
CTTGCACGAG
CAGCCTGCTG CATAr'rATGC GTTACAATGA TGGTCTCTTC TAGTTGCATG GTCGCAATCG AGAGGATATC TGGCTTAACA GAGATGGCAC CTGATAAGGT CAAGGCTGAC TTGTGGAGAT GACGAAGGGA GGTTTCTACG ATTTCATCTA CATGCGCAAA GGTAATATTA CGGTAAATTG CCATTCCAAT GTGTTTACGC ATTTCATAAA CACGATAGAG AATCTGCCCA GTTACN'TAG GACTGCGTAA GTAGGTAGAT TrCCCCGATC TTCTTTCAAA TTGCATATCA ATcCC'TTAA CATCCTTAGT AGAP.AGGGCT ACr--rCTT AGTTATATGT TGACATGGCT TCI'CCTTTAG CCGAACTTAC GAGCTCCAAA GTT'AAAAATC CCTGCTGATA CAATCGTTCC ATCTGGAATA ACAGCCAAGG TTTCTGCTTG ACGGAAGATA CAGTTAGACC AGTCAAGAGC TGGCGCCGAT TCGCCAAAGA TACGACCAGA TGCCAAGACG GGAATAACAA CATGAACCAC TG'rCTCCCAG CGTTGGGTAT GGTGAACGTG TTTCAAACTA T.TAAAGACTG TCAAGGCCAA GGCACCTGAA ACAAAGATCA AGTAACCAAA GAGACCCACC ATACAAGTCC GC-ACAAAGT'r GGTAACAGGA TGATGGTAAA GTTTT'rCTTG AGCTCAAACA GATCCAAGGC TGAGGCTGGC TCATCCATTA CAGCGATACA GAGACGT'rGT TGCTGACCAC CGTCTTTAAC CTGATCCCAG AGGGCAGCCT GGAC'rTGCT ATCCT'rAACT CCAGCACGTT AC= AGCAAA TGGATTGGGA CGTTGAAAAA CGTTGATTTC TGGACGGTTG ACATCAATTC CAATATCAAT AGTATCATTC ATGCGATTGA CCGACGGGCC AATCAAAGCT GTAATTTTAT TGGATTCATT TTTACCATAG TAAACATGGA CAGGAAAGGT AAGGATATGC TTCTCATCCC GCAGCGGTr'A ATTTCTTGTG TAGA'rAGCTT AGGATAAAGA TCAGGAGCAC AGCGGCAGAA GTGCCTTCAC TATTGACTTT CCAGATATGG GAGATGGGGC TAGTCACACT GAGGATATTC 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060
TGCCCTCCTG
ACACCCGTTA
CGAGAAATCC
TCCTCTACAT
ATGAT1TGAAA
ACCACTGATG
CCTrTTTTAG TATAGATCAG AGCTGCAGCT CAATACCTGG AAGCGCTTCC
CAAGAGCCAG
TACCCGTCAT
ATCCATACTC
GTAAAGAGGA
CATATTCAGC
ACCAGCCTCA
CTGAGGCAAG
AAACTGGACT
CAAAATTTCA
CAAGTAAATC
CCAGCTCCCA TAGAAAGAGG TACAGAAATA ATCAAGGTAA TGACCAATAG GAAAAAGGAA TTGTAAAGCT GAATGCCAAT CCCACCACCT GCT'rCAAAAG CAGAAGACCT TCCAGTCAAG 252 AAAGACCAAG AGATATG.GGG CAAGCCCCGA ACCAAGATAT AGAGAATCAA GGAAGCCAAG ATTGTCACAA TGATGCTAGC AATCGTATAG AGGACAGCTG rCAAGTrT ATCTAATTTC TTAGCGCGCA TAAT'rrCT TTCCTCTTTC TTTCGTAATC AAT-rTAATCA CACTGrrAAA AACTAAGCTC ATCAAGAGCA GTACCAAGGC CAGTGACCAG AGAACATTAT TCCCATGACA GTGTTCCCAA TTCCCATAGT TAATATAGAA GTTAAAG7'rG GGTCAAGGAA GrGGGATAA CA=~GAGT'r TCCGACAACC ATCTGGATAG
TATTTACAGT
CAGCTGGTGT
CTAGAGCCTC
ACCAAAGGCA
CAAGATCACA
ATAATAACGA
CATGACAAAG
CGCGCCATCC CAAAGACCAC CGCCAGATAG TCTGCCAGCG GGAACCGCAC GCAAGCTATC AGGACGGAAA TCCCTGACAA TGCAGTGAAA ATACCAGAAC GGGCCGCCr AGTGGCTCCC ATAGCGAAAC TGCCTTCACG CCTTGTCATA AAGGTTACGG TCGGCAAAAT AATCCCAAAA CCAGTCCCAC CAAAGACACT GCGAACAAAG GGAACGACGA AACCAGGAGT TCAATAGCTG AAAAACTGCT GCACCAATAG AAAGGAACCC AAAATCATAG TCCCAAAAGA AAGTCAAAGA
CTTGCAAGCC
GTTGCAAAAT
CAAAGGGT
GAAGGGCACC
TATTCACACC
AATAAATCCG TACACTACTG CTTCGCCCCT TTTGGTGATA TGCGATAAGG GCTGAGAGAA AAATTCTTTA CTAGAAGGAT
AAGGAATCCC
CTTCGGTCAT
TGGTAACGAT
TCCAAGT'rCC
ATTGACAAAG
GATGACTATC CGCTACGAAA ACCAAAATCA TGGCCACAAG
AAGGTCGACA
AAAGAAAGAC
CAAACC'rTTT CCTAAT'I-CT CCAGACGAGA ATTCTTTGAT GGAAGCAACA TAAT'rCTTCT TGA'rrCATTA TTGTCTCCCT TCCAACACTG TCACAGTTCC TrTTTCAACCT TCATTTCCTT AATCGGAATA TACTTCAATC CTT'rGACAAT GTCTCATCCC AGAGAACAAA ATTGAGAAAT TCTGCAGCCA ACrCATTGGG GTATACATAT GCTCATAAGA CCACAAGGGC CAATTATTCC TACTTATATT AAGTCATAGC CATTCAACTr CATCCT'7TG ACCGAATCAT CTATATAGGT TAAGAGATAG CTCCTGGACT TTTTGATACG ATTGATTN'A CCGCTCCArTT TCCTGACTr'r GC-A'GGCAGA CTGACCTrCC ATAATGACAG TATCAAAGGT CCAGAGCCGG CTGCCCGATT GATAACAGAG ATGGGTAAGT CCTTACCACC
AGCCTTTTTG
AGGCAAAGGT
TTTTCTTAGC
GGCAGCATCT
cCCTTCTTGG
CTGCCCCAAT
rrCTGGACTT
AAGAGATAAA
TGAATCCTGC
AGCACGAGAG
AACCrTTC 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980
CAATTGGTTA
TCAACCTCCr
GCAGAAGCAT
GCCCCAGACT
CCTCACCTAT GAAGATITGA CGAAGTTGCT CTGTCGTTAG GTTATCAACA TATTGACAAT CAGAGCCAAG CCAGCTACCG CGACCTTGTG G'rCAACAAGA CAATTCCGTC ?rTTTTCCTCA CCAAATACAT CTGAGTTTCC TATATCAACT GAACCTGGGA CAAGCCTGTA CCAGAACCTC CCCC7rGGAC A'rTGACCCTT TTTCCAACAT GGATCGTGCC AAA7TCATCT GCCGCTACTT CAACCAAOCG. TTGCAAGGCA 253 GTTGAGCCAA CAGCCGTTAT GGATTCTCCA CGATCAATCC AGCTACCACA GCCTACrAAA CAAGCCGTCA GCCAAAAAGC GATAAGAGAC AGAGCAAGCT TTrTr'CTTTT rrTCTCCTCG AAAATAATTA TGAATACTGT GAA7"TPTA AGTAGTTCT-r CGCATGAATT CrrACCAAAT 7T=CGCAA TTGArrATTT ATATAATATA CTCTTTCCTA ACCTCC1'Trr TTCATATGTG GA'rAAAATCT CTTGTCTATC TTGTCACCCA TTATAGTCAT TTCGTGTCC rrrTCCCCT Ir=AATGCA CTCTCCTTAG ATGATAA'rCC AAAAGCTAGA AAGGTATCTC AAACCTCTCT ACTAGTTTAC AACTAAAAGG AAAAGATTCT ATTTTATGAG AAATCTAGTT AAGAACGCTA ATAACTAAAC TTCTTGTACT CTrAAAAT CTCTTCAAAC
TTCACTGTT
TATGAG3TTGA
GGCTATATT-A
CCTTCCCCCA
AGGGAAATTA
ACTCTCCCAG
TACAAGCGGT
CAGTGTT'rTG AGCTATC'rAT GGCTAGCTTC CTAGTTTGCT CTTTGATTTT CAT'rGAGTAG TAAAACTACA TGTAATGGCA ATCAAGATAT CAAGAATCAT CCTACTAAAA AAATCCATAC TTrCACTATA ACATAGAATA AGATATTTGA CTAGCATTT CATTTGAATC TGAGCCTTT TGGAAAATAA 'rTTTTCAAAA CATTTCCAGT AACCTTTGCA AAGCCCAACC CATrrGCCTTT AACCAAAACT TGGTACCAAC CA'r-r'GGCAG ACTTTCTGCC AGCTGAACGG TT'?CTCCAGC CGCATACTTG ACAAACGCTT CTGGCCAAT TTCAACCGAC TGTTCGACCT CACI'CGGTTT CAAGC'TAAA CCAAGAGCGA AACTGGGCTC AAAGCGTTTC TTCTTAAAAG TACCCAGATG CAGTCCArrG CGAGCAATCT TGAGCTTCCA TAAATCTGGC AAAAGTrCTG GCAAGAGATA AAGC'rGGTCT 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 CCAAAAATCT GCAAGATACC CGGTAGAT'rG CACAAGGCAA CTTG'rrCACG GCTGAGGTTA TTGTTACCCT TAAACTGTAG ATGGGCAACA TACATCCCAG CCGTTTCTGG CAGGTCAATA GGCAACAAGT CAAAATCATA CTCTTCCAGC GGTCCCCAGG TACAGGTCGA ATAAACCAGA TCCTCCAGAA 'ITTCTCTTTG CAAGCTAGCA TCCATAGCAT CAGGTTGCT ACGAAACATT ACC'rTCAAAT GGTTTTGGGC AAATTCCTGC CTCTTACTTG CCTTAAATTT AGGAGCTGGA AACTGACCCT CTCCCTTAAA CTGATGAGGA CCAC'TACCA TTCCATTGAT ATGCTCTACT AACCAATTGA CAATCTCTTC GTTTTCCTCG TGACCACCTT CAGCTAACAT GGTCACTGCA CATTGACTCG GATAATCTAA GCTCCAATAG CC'rTCACCAG AGCAAGGGGC ATCAAGAACG ATTAAGTCAA AATAGCCTTT ACGACATTTG TCGCTCCAAA GAAATTTCAT TGGAAnCAAG TTGCCCCCCG GTGCAGCAGC AAAGACCT'rG ACCAAGCGGT CGGCAGATTC A'rTGGTCACC ACCTCCATG TTTT'CAACCA AAATCTTAGC CCGTTTGCTT TAGCCCCTCC CCTGCTAGAT AGGCTGCCAG TTGAGTTGAT CAAGTCCAAG ACCTTCATAC CAGGACTGGG TTGGGCTACT
TGAGCCACCA
GATTTCCCTG
GAAAGTGCr
GAGGCAAGAA
TCTGGAAATr
ATCTCAAGAG
AGGGTCACAA
GGTTGCAGAT
254 TTTrGAGCAGC AGGTTCTTGC GAATAAACTA AACCTGTAGC ATGCTCAGGC AAACCTTCCC ATAGTGGCCC CAAGGGGTTT GAGTAATGGC ATCAGAAAAG CTrCTrTAA GGGATTGACC CGAAAGGCCG AAACCGCT'rC CTCCTCAAAA AATCTCTTGC CTCATCTCCT AGTATCTCTr TATATTTTTC GCATTTrAAGT TCT=TT? T1CGTAAATAT AGGACTGAAT GCACCATCAT GACCGGCTGT CTGGTTTGA.A AATCAGGAGC CCCGATAGCC CAGACTTTCC CCTAAAATAC TAGCTGCGGC AACTCAGATA GGTCAACAAA CGCCCTGACA AAATCTTGC!
AACAAATCCT
TTCCTrCrGC
TTCACCAAAA
ATAA'rCCCAT
AAAACTAATG
GCCGCACTTC CATAGACACG TCATTGGTTT CCAGCATACC TTAAAAGGAG CTAGGGACCT CAATCCCCTT TGACCACATC GCCATCATAA CAGCAAAATC AACACCAAGA ACCGCTCGGC TCAAATCAGC CAGCCCCCAT ACTATTCCCT GCAATGAGAA AATC'rCCAAG TGGTTrAG?'r ATCATTTAGA CAAACTGGAA ATTCCCCACC ACCGTGGTA.A ATAAATCAGA CCAAACTGTC CCTGACCATr TTCAAAATAA TTCCrGCTGG GCTACAAAAT TATTGGTACC ATCAATGGGA 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7780 TCAATGACCC AAACCTTGCC CTCTTGAACC GAGGCTCGCA GACA.ACCTTrC TTCAGCACAA ATCTTATCCT CAGGATAACG GGACAAAATC TCACCAACCA TCCAGTCTGG TCACCAAATC TGTTGGAGAG GACTTGG TTT ATATGGTCAA GAATGTACTG ACCTGCTTTC 'IrAACAAGCT CTTTCCAAGA GAAATCTTTC CTTCCCCTTT TTCTTTGGGG INFORMATION FOR SEQ ID NO: 19: SEQUENCE CHARACTERISTICS: LENGTH: 4820 base pairs TYPE: nucleic acid (C STRANDEDNESS double TOPOLOGY: linear AGAGTTCCTG AACTTCTTTG CAACACGCAA GTCTTCCTGC CTTTAGCAAA TTCAAATTTA (Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: GTAATGATAT AGGAACACCA GGTGACCTGA TGGGACGTCG TAAGCCTATG AACTACTAGC TGCTAAAGGC ?I'TAAAGATG GTATGGTACC ATATATCTCA AACCAATACG AAGAAGAAGC CAAACAAAAG GGCAAGACAA TCAATCTCTA CGGTAAAACA AGAGGTNGMG TITACAGATGA CTTGGTTTTG GAAAAGGTAT TTAATAACCA ATATCATACT TGGAGTGAGT TTAAGAAAGC TATGTATCAA GAACGACAAG ATCAGTTTGA TAGATTGAAC AAAGTTACTT TTAATGATAC AACACAGCCT TGGCAAACAT TTGCCAAGAA AACTACAAGC AGTGTAGATG AATTACAGAA 255 ATTAA'rGGAC 01rGC1'G~rC GTAAGGATGC AGAACACAAT TACTACCATT GGAATAACTA 420 CAATCCAGAC ATAGATAGTG AAGTCCACAA GCTCAAGAGA GCAATC -rrA AAMCCTATCT 480 TGACCAAACA AATGArrrA GAAGTTCAAT TTTTGAGAAT AAAAAkAT= GTCTACTATT 540 AGGAAATAAA GTTTAAAAG GTGATGAAGA ACAAACCAAG ATTCAAGCAG GAA?1'CCTAC 600 TIGATAATGAA GTAAGTTATG ATCI'TATTA TCAGCAGGAA ACTCrrCCTG CAACAGGrrC 660 A'rCAACIrCT GAGCrrACAG CrTrrAGGCCT ATTAGCTGD GGTAGTTTAG TCT'1rGGT 720 TCATAATATG ACGGGAACAG TTrrTGCTC CCTCTGAAA GTATCA'1rr GATGGCTTT-r 780 rrCTATATAG GGTAAAAGAT AGGGTAAAAG GCTATCATCG GACAAAATAA AGAAGGCATG 840 ATATAATATA AAGTAGA?1'T CTATGTCATA AAACAAGAAC TGrrGGACA TCATTCATrr 900 GAAAACTCTC TATGTTCAAA CAATAGTAAA ATAAAATAGG 
GGA'rCTAAAT CCTTCCTA'rG 960 AAAGGAAAAA ACTCAA'rGGC TACTATTCAA TGGrrrCCTG GTCACATGTC TAAACCCT 1020 CCACAGGTGC AGGAGAATTT AAAATTG'r GA'N'?rGTGA CGAN'TFAGT AGATGCACGC 1080 rMCTCTAT CTAGTCAAAA TCCTATGrG ACCAAGATTG TTGGTGATAA ACCAAAACTC 1140 so* TTGATTTTAA ACAAGGCCGA CTTGGC'rGAT CCAGCAATCA CCAAGGAATG GCGTCAGTAT 1200 ***TTTCAATCAC AAGGAA'rCCA GACGCTACCT ATCAACTCCA AAGAGCAAGT GACTGTAAAA 1260 %GTTGTAACAG ATGCGGCCAA GAAXGCTCA'rG GCTGATAAGA TTCGCTCGCCA GAAAGAACGT 1320 GGGATTCAGA TTGAAACCTT GCGTACTATG ATTATCCGGA TTCCAAACCC TGGTAAATCA 1380 o ACTCTGATGA ACCCTTTGGC TGGTAAAAAG ATTGCTGTTG TTGGAAACAA GCCAGGGGTC 1440 ACAAAAGGTC AACAATGGCT TAAAACCAAT AAAGACCTGG AAATCTTGGA TACACCGGGG 1500 *ATTCTCGC c TTrGA GGATGAAACT GTTGCACTTA AGT/rGGCATT G;AcTGGAGCT 1560 *o oATrCAAAGACC AGTTGCTTCC TATGGATGAG CTTACCATI-r TTGG-ATCAA TTATT'rCAAA 1620 GAACATTATC CAGAAAAGCT GGCTGAACGC 'rTCAAACAAA TGAAAATTGA AGAAGAAGCG 16a0 Voss CCTC'rGATTA ?rA'rGGATAT GACCCCCCCC CTCGGTTTCC GTGATGACTA TGACCGTTr 1740ooo* TACAGTCTCT TCGTGAAGCA AGTCCGTGAT GGCAAACTCG GTAACTATAC CTTAGATACA 1800 'rTGCAAGACC TCGATGGCAA CGATTAAAGA AATCAAAGAA TTCCI-rGTCA CAGTCAAGGA 1860 o GTTAGAAAGC CCTA~m'T TAGAGCCTGA AAAGGATAAT CGCTCAGGAG TTCAAAACGA 1920 o* AATCAGCAAG CGTAAAAGAG CC-A'rCAAGC TGAATTAGAT GAAAArrTCC GCTTGGAATC 1980 CATGCTTTCT TATGAAAAAG AACTTTATAA GCAAGGATTG ACCTTAATTG CAGGTATTGA 2040 TGAGGTGGT CGTGGTCCTC TTCCTGGTCC TGTAGTCGCT GCGGCCCTA TTTTATCTAA 2100 256 AAATrGTAAG A?1'AAAGGTC TCAACGACAG CAAGAAAArr CCrAAAAAGA AACATCTGGA 21.60 .GATTTIccAA GccGTrcAAr. 
ACCAAGCCTT GTcGArrG6A AT~rTATcA TAcATAATcA 2220 GGTc&TCGAc cAAGTcAAcA TcTATGAA~c AAcCAAAcTA Gcc.TwCAAG AAGcAArcT 2280 CCAGCTCAGC CCTCAACCAG AGCACCTT GArGATGC!C A74SAAACTGG ACTTGCCCAT 2340 TTCACAAACC TCCATTATCA AAGGAGATGC CAACTCCCTC TCTATCGCAG CAGCATCrAT 2400 AGTAGCCAAG GTAACACGTG ATGAATGCr GAAAGAATAC GATCAGCACT TCCCTCGCTA 2460 TGA'rTCGCT ACTAATGCAG GATATGGCAC AGCTAAACAT CTGGAAGCCC TCACAAAACT 2520 AGGACTTACC CCAA'N'CACC GAACCAGCTT TGAACCCGTT AAATCACTC'G TTTTAGGTAA 2580 AAA.AGAAAGT TAATTGAAAG GAAATAACAT GGACGAACAG TCGGAAATAC TCCGTTCTAA 2640 GAAAGAATTC GCCTTCAT CCAGCACTAT ACTATCCCAA GTTGGTCGAG GAATCAT-rGT 2700 CGGCCTCATC GTTGGAATTA 'rCGTCCGATC C7TTCGTTTC TTAATTGAAA AGGGCTTCCA 2760 CCTGATACAA GGAG~rrATC AAGATCAAGG GTACrrACTC CGCAATCTr'r TTGTACTGCT 2820 TTTGTTr'rAT A'rACTCATCT GTGGCTCAG TGCCAAACTA ACACGGTCAG AAAAAGATAT 2880 TAAAGGCTCA GGAATTCCTC AAGTCGAAGC CGAACTGAAA GGCCTCATG'r CCCTCAACTG 2940 .GTCGGGCAT-r CTTTGGAAAA AATATGTCCT AGCTATTCT? GCTATTGCCA GTGGACTCAT 3000 GCTGCTCCA C.AGGGACCCA GCA'rrCAACT TGGAGCAGTT GGTGGTAAAG GAATTGCCAA 3060 ***GTGGCTrCAAA TCCACTCCAG TAGAGGAACG TTCC~TGATT GCCAGTGGAG CTCCAGCAGG 3120 ***TTTAGCCGCA GCCTTTAATG CTCCTATT'GC AGCAC1-rCTC TTTG-TGTAG AAGAAGTCTA 3180 TCACCATTTT 'rCGCGCTTTT TCTGGGTCTC AACTCTAGCA GCCAGCATCG TAGCAAACTr 3240 TGTGTCTCTA CTCATG= CC GTTTGACACC AGTA'rTGGA'r A'rGCCAGATA ACATTCC TCC 3300 5CATGACCCI'A GATCAGTATT GGATATATCT CGTCATGGGA ATTTTCCrrG GATTTTCACG 3360 TTTTC TCTAT GAGAAAGCTG TATAAACGT TGCAAGAGT'r TATGACTTGA T'rCGTCAAAA 3420 AATCCATTTG GATAGGGCTT ATTATCCCAT CTTGGCTTTT ATCCTTATCA TACCAGTCCG 3480 *AATC TCTTA CCTCAAATCA 'rr1 4 GGCGG AAATCAGCTT GTCCTTTCTT TAACTGAACA 3540' AAATTTTAGT TTCCAAGTTT TATTAGC'rTA C.TTTTT LAATC CGCTTTATTT GCAGTATGAT 3600 TACCTATGGA AGTGGACTGC CAGGAGGAA1 TTTCCTCCCC ATTTAGCTC TTGl~rCTrT 3660 .GCTGGTGCC TTAGT'TCC TTATCTGTGT CAATCI'rGGA CTTGTC.AGTC AAGAGCAATT 3720 ***CCCTATAT"?r G'rCATTCTAG GAATGAGTGG CTATI=TGA GCCATATCAA AACCTCCCTT 3780 A.ACCGCTA'rG ATCCTCGTAA*CTGAGATGGT ACGAGATATT CGCAACCTA TGCCACTTGG 
3840 TCTrTGTCACT C=rGTTTCTT ATATTATCAT CGATTTGCTC AAAGGTACGC CAGTCTATGA 3900 257 AGCCATGCTG GAAAAAATGC TTCCAGAAGA CGAAATACCA GTICTGATA AAATTGCTGG CAACGTCCTC ATCACAACTC AAGTCCATAA CAGAATGTAT CTGGGTGATA TGATrCACCT CAAAGATTTG TTGTTGTAGT ATGAGTATTT AGTATCTAGC GAAGGAGAAG TTACACTrAT GAAACAAGTT CATGA1ACTCA ACTTACCACA TGGCAAGAGC CAAACAGTTA ACGGCTCAAC GGTTATTCCA AAAAGTGAAA TTGGAAAAGT ACATAATTTA TGTrATGTAA ATGATCAGTT TGATTTATTr TTT'rACA'rAT
AAAGTATAAA
AGAAAACCGA
AAATAATTGA
GC'TAGAAAGG
TTCTCAGGAA
ACY'TATTAA
AGTTTACTGT
TGAGATCGGT
AA.ATAAGACT
ATCAAATCTG
TATrTTTAC
ATAATTAAGT
TACAGTAAGA
GAAAAAGAAA ACAATAGCAA TTATATAGAG AAATGAAATA GAAATAGGAT GGACAATCAA ATCAATrTCT AGCAATGTT'r TAGAAGTCCA GATGTAC1'AT
TGATGAGGAA
TAGAAATGAT
TTAAAATCAT
AAAACAATCA
TCTAGTTTCA
AGTTATGTAG
TTTGTTTGTA
AAACTATAAT
TGGAGAAAAT
ATCCCCTGGT
3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4820
0 4 *0 *0 0 *0 0 ATCTA'rTATA CAATGTGT TATAACAGA AGTTTAGTGG GGTCTAAGTC TTTTTATCAC T'rTCTTGAAA AATATACAAG ACATGAAACG TGAGATTTTA AAGT'rCTGGA ATACTACCAA TGTATCTCAT AGCTCCTTAT ATAGCTCTrC GTGAGATTTT TATTATTrTC CTTATTCTGT TTTGAAAAAC TCCTATAACA TCTTTCCGAA TCTATGCTAT ACTACTAGTA TA CTT ACTT A CTGGAACGAA TCGACAAACT AAAACAACTC INFORMATION FOR SEQ ID NO: SEQUENCE CHARACTERISTICS: LENGTH: 21338 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear *0 00 00 0 0 000 0 *0*0 (xi) SEQUENCE DESCRIPTION: SEQ, ID NO: CTACGACATC ATGATTAACA GTCATGCGCT ACTACCAACT GAGCTATGGC GGATAAAATA GTCCGTACGG GATTCGAACC CGTGTTACCG CCGTGAAAAG GCGGTGTCTT AACCCCTTGA CCAACGGACC TTCTATCTGT AGCAGATATA ACCATTATAT CAATTTCTTG CTAATTGTCA ATCACTTTTG AGATTTTTTC TAGGACAACG ATGGTCTTCA TATTACCTAT ATTTACCAAA CCT'ITTCCTG AATATCTGTC TCTAAA6ATAT CT'TTTAATTT TCTAATTTTT TAGAAAACAA TTTCTAAGTT TTTTCGATCA AATGACTTGT GAGGAGAATA AAATCGCTGA ATGGTACCAT AAAACTCTTT TGCAAAATTC
AATCTTGAAA
ATTTCTCTGA
GTATGTTTGT
TTACCAATAA
258 TGCGCAA'rTT ATGAGATACC CCTGMTG?1- CAATATACAA AATA'rCA'rGG TAAGGAATTT 480 TTAAATCATT TCCCTTG'rAA TTGTAGTCGA AATAATCTAC AACP.TCTTCA 1rNrCAAGTA 540 ACATACTC7'r CGTGTAGAAG ATATIrMCr CAArrCTT CTTAAACATC TCATCAT'rGA 600 TATCC?1'ATC AACAAAATCT AGGGCTGATA CCTGGTATrr ATAGGTTAGA GTCGCAAACT 660 CTGATCGACT AGTGATAAAG ACGATAATAG CGTAAGGA~r GTAM'GACGA ATGAGCTGAG 720 CCACTTCAAA TC=-C--1TTTCx TCAATTCCAT GAATATCGAT ATCTAGGAAA TA.AAGCrGAT 780 7*rACTCATC A'TCAATG 'rATTC7rCAA ATTCACGGAC NrMCCCCTT GTCTTGTATG 840 ATATTGGAA'r ATTCGATTCT TTCGAAATTT- CATCCAATAT TCTCTCTAGT CTCACTTGAT 900 GTTCAATAAC ATCI-rCTAAA ATTAAAACTT TCATTCAAAT TCCCTCTTAA ATCTAATGAT 960 TTGTCTAAA'r GTACTGCCTT CCATCTCTGT TTCTAAAATA ATATTGTTGT ACTTATCTAG 1020 TAGTTCTTTC ACATTAT~rA ATCCGACTCC GCGA'rrrCTT CCCTTAGTGG AGAATCCTAA 1080 GGCAAATAGA TCTCCTGAAG GAGTCATCGT CATTTTACAT GAAT'rCTCAA TC-ACAA'rAAC 1140 TGTTTCACTT TCCATCrrAA TAACTGCTrAC TTCCATCTGC TTTTTATAGC TATCAGCCGA 1200 *TCCTT'CGACA GCATTAT'rCA ATAAAACGCT CATGATACGA ACCAAATCCA ATAG'rTCAA'r 1260 TGGAAGCTTG GTAATCGTAT CTTTTACTTC CAG'rOTAAAC TCTACACCAT TATTTCGAGC 1320 ATAGACAATT GACTGAGCAA CCAAACTTCG TAAAGCTGAG TCTTCTATGT TGTTCAAATC 1380 *AAAGTAAGTG TACTTATCTG AACGCAATTT ATGATr'rGCT TTGACTAAAA CTTCATTGTA 1440 *AATTCTGTCA ATTTCCTGTA AATTACCACT GTCAATTGCC ATCTGCATGC TGACAAGCA'r 1500 *TCC-AGCATAA TCATGTCGAA AACCACGGAT TTCATTATAC AGACCAACAA TTTCATCTGT 1560 GTAATTCTGT AAATGTT'TCT GTTCAAATTT CTI'CTCTTC AAAGCAATCT CTTTCTCCAT 1620 TTGAACTTTA TGAGAATTCA TTGCAAAGAA GGTCA.AAAGG AGAGAGATAA ACACAATAGA 1680 .TGACAAAATA CTTCCAAAAC TATTCAAATG TTTAATCGTA CTTACCATAT CTGAAACCAA 1740 AGATACAATA TGTAGCAATA GTAAAGCAAA AAATACTTTT TTCAAGAAAG GATAAAGGTA 1800 *GTCCTTGTCA AAA'rAGGCTA GTTCCAAATG GAAATAGTAA A'rGA7=T-A ATGTALACAAA 1860 *ATAGGTTAAC ACCGTCACAA CGAAAAAGAA TGGGAAATGA TA'N'GTAAAA CAAAATTATC 1920 TCCTGT'rATA GAGGAGAAAA TTACGGACAG AAAGTTATGA GTGCTCTCAT ATAAAAGAGA 1980 TAGTAGTAAA CTTAGGAATA GTCCTCTATC CCTCTCATAC TGTTTCATCC ATCGAAAATA 2040 G* *CGAATATAAG CCCAAAGGAA ATAAAAATCT T'rCAATCCCT 
ATT-rATCTA AATATAGAAG 2100 ATAAAAGGAA AATTCAAGTA CTATTTCAGT TAGTAATGTA TAAGCACCAA AAACGTATAA 2160 TTCTTTTCTA TTTATTCGAC C'TTTACAAAT TAAACGCTAA CTGTGACTAA TAATTAAAAA 2220 259 ATGAACAATA AC74GTCCCAA ATCCAAGTAA ATCCATTACT CIrC~r ATTCATTAC TTTTTTCGTA GGAAAAGAAA ATCTTTTGTA AGTCTTTTTC ATAATAAAAT CTCCTAAAAT AATGGAATTT CGTT~rAGAT TTTTrAAGA AAAAAGCCGG TTCGAACCTG CGACCGTTCC TAATACAATT ATTCTACCAA ATCCCTGCTG AATCGTAAAA ATCAAX3GATG ATTCTTGAAA C1TTCAAAGCT ACAAACTGTT GT7TTTTCTTI GTAAGCTAAC!
AAAATTCTCT
GAAAATTCCC
CTTAGAAGGC
AAATTCAATT
GCGCGATAGA
CAACTGTCAT
AGCTTTGCTA
GAATGCTCTA
AAAAGTCAA'r TTTGTrCAAC CAGAAAGATr'
AAGTAAGAGT
AGAAGAAAGT
ATAAAATTCT
CTGGTGTT
GATGGGGI'AA GGTTAGGCGA CCAAAACTGA ATGATAATCC TAAACTTCCC CCAATAATAA TTTCT'rCTAA CTGCTTACTA AATCTTCTG CAATAACGAA ATCACCGTCA GCAATITTG TCTrrTGATT T'rCTGATTCA. CTGGCC'rTAT TCCTCATCTC CCCACCTTTA *CCAATTrAAC TGTGTrTrC TTACAAAAAC CATTATACAA ?1'?rTTCTCC CAAAGTGTAC ?ATAI-rGAT CCCAGCAGGA *TCCAGCTGAG CTATGAGACC TrCTATTT-A 'rGGTAGGGGA AAGAACTAGT CTCATTAACT GGCTCTATTT T1'TACAGATG AGAAAATCCT TTTATAGAAG TTCCCTTCA ATGGCTAACA CTGACC= CT ATTTCTAAAA TTCATCTGAT AACTCAATCA TGCGATACCA TCTTTTAAAT CATGACTCTA 'rTCTAACATA GATTTCAATT AAGAAAATCA GTGTTTAAGr TATAATTAAG CAAATATGAA ACATCTAAAA TCAT'rAGCTT T=rAGTGGA TTTCAAACTT AGCAAA'rCTA GAAATTCGTT T'rGAATACTC 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 ACTTTTCTTT CAGT'rTCCCA TTCTCTATTT 1-I-CACATCT CAATTTCAAA AGAGTTATCC CTAGTCAGT'r TATACT=TCA ACATTTTACA AAAAATGGTT GCCTTGGGTA GTTTITTCA.AT AACAATAGTrA CTATTACACA AACAAAGTAA AAGATGCTGT CTATTT3GGCA ATGATGATAC GTTATTTATA AAAAGAATGA ACTGTTACAA CTTTAATTTT TA'IrCACAAA A'rAAAAAATA ACAGTTTGTG TAAAACT'rrr GTAATTCAAA CATATGGAGG TCAATTATTA GTCGTTATCG AACTCAACTA ACTCAAAAAA GTAGTGTAAA CAACTCTAAC AACTGCCTAT AAGAACGAAA ATTCAACAAC ACAGGCTGTT TGTTTC'GTT ATTACTTATT~ CGGCAA.ACAG ACAAAATAGC TGACACAGAT TCTCAGCGAA TCTCTAGTGA AGGATCTGGA TAAAGAAGCT TACATCGTCA CCAACAATCA CGTTATTAAT GGCGCCAgCA AAGTAGATAT 'rCGATTGTCA GATGGGACTA AAGTACCTGG AGAAArrTC GCAGC1'GACA CITCTCTGA 'rATTGCTGTC GTCAAAATCT CTTCAGAAAA AGTGACAACA GTAGCTGAGT TTGGTGATTC TAGTAAGTTA ACTGTAGGAG AAACTGCTAT TGCCATCCGT AGCCCGTTAG GTTCTGAA'rA 'rGCAAATACT GTCACTCAAG GTATCGTATC CAGTCTCAAT 260 AGAAA'rGTAT CCTTAAAATC GGAAGATGGA CAAGCrA7*rT CTACAAAAGC CATCCAAACT GATACrGCTA TTAACCCAGG TAACTCTGGC GGCCCACTGA TCAATATTCA AGGGCAGG?? 
ATCGGAATTA CC'TCAAGTAA AATTGCTACA AATGGAGGAA CATCTGTAGA AGGTCT'rGT TTCGCAATTC CTGCAAATGA TGCTATCAAT ATTATTGAAC AGTTAGAAAA AAACGGAAAA GTGACGCGT1C CAGCTTTGGG AATCCAGATG ATCAGAAGAC TCAATATTCC AAGTAATGTT AGTAATATGC CTGCCAATGG TCACCTTGAA AAAGAGATTG CTTCATCAAC AGACTTACAA GTTAATTTAT CTAATGTGAG TACAAGCGAC
ACATCTGGTG
AAATACCATG
AGTGCTCI'rT ACCATTAAGA TAACCTACTA TCGTAACGGG AAAGAAGAAA AAGAGTTCAG GTGATTTACA ATCTTAATIG ACATCTATGT TAATTGTTCG TTCGGTACAA TAATTACAAA AGTAGATGAC ACAACCATTC TATCCGAGAC CTACCTCTAT CAAACTTAAC AAAGAAAGCT TTACATAAGA TGATT'rCTAT CACAGATATA AAAAACTAGA TGAACTAGCA TTCGTCAATC TCCTGTTATT CACTNrAGC TGGTCTACGG GAAAAGATGT GTTA=TTAG AATCATG4GAA CAAAAAAATC CCTATCAACC CCGAAAAGAA CAGTCTATCA AAGAAAATGG GGTCATTCAA GGTTATGAAA TCC~rGCAGG AGAGAGACGC TCTATCCCAG CTGTTGTTAA ACAGATTCA GAAAATTTAC AGAGAGAAAA TTAAACCCA GTAGAGAAAG GATTCACCCA TGCTGAAA'1r ATCAGCAACT CCATTCGTTT ACTTTCcTTG GGCAAACTAT CACAAGCCCA TrCGCCGTTCC TATTCTTTC AACGGATTAT AGAAGAAGAT ACAGAGAAAA AACAAAAGAA ACAGCAAAAA CAGTTAAGAA AACTACTCGG ATTAGATGTA AAAATCATTA TTTCN'T'rC AAATCAAGAA
AA.ATTTGAAA
TTTGATAGAG
CCCATTrATTC
TATCGGGCTT
GACCAAGAGA TGATGGTCCA GTCCATTATT ATAGAAGAAG CACGCGCCTA TGAATCTCTC GCAGATA6AGA TGGGCAAGTC TCGTCCATAT CCAGAACAGA TTCTTTCAGA AGTAGAAAAT CTAGTTGGGT TAAATAAGGA ACAACAAGAC 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760
ATTTCTGTAA
ACTAATCATT
GGAAATTAGA AGCTCTTCTG TCATACAAAA TGAAGAAAAA GAAATTAAAC TATCTAAAAA AGACAGTGGA GAATATAGTA GAATTATCAA CAGCCTGAAA TAAGGCTGTT CTTTTATTTT GCTTAATAAA TCAATAATTT TGTrGGATTAT 7TrMCACAGC TATTAAACI-r TTAAATAGTA GTATATTAGA ATTTGCACAA AAGCTG.AACT CATCAACGTA TGCAAATGGT CTGGGAAAAA TTTATCTCAC AAGGTTATCC ACTATGTTTT TCGATAAAAA CTTC'IrTTAT CCCCAACCTG 'rGGATAAAGT 'ITGGTAACAT rTGTGGAAAA 7TCTTGCTA'r CTATGGTAAA ATATCTCTAG AAGGAGGAGA AAGGATTGAA AGAAAAACAA TTTTCGAATC GAAAGACTGA CTCGATCCAT GTATGATTTC TATGCTA7TC GAGGAAAATG TTGCCACTAT ATTTCTACCT CGCTCTGAAA CAACTAAAAG ATA'rrATTGT AGTAGCTCGT TTTGAAATT ATGACGCTGA AATAACTCCC CACTATATN TCACCAAACC TCAAGATACG ACTAGCTCAC AGTTCAAGA AGCTACAAA'r TTAACTCNr ATAACTATAG TCCAAAGTTA GTATCTATTC CTTATTCAGA TACGGGATTA AAAGAAAAGT GAAATGTTTG GGCTGTATCA GCCGCTTTAG ACCCTC?1'TT TATCTATGGA GGACCAGGCC GAAATGAAAT TCTAAAAAAT ATTCCTAATG TTATTAATGA CTTTCTTGAT CACCTAAGAC ATCCTAGTCT TGATCTTTTG 1-rAATCGATG CAACTCAGGA AGAATTTTTC AATACCTTTA TCCTAACGAG TGATCGTAGT CCAAAACATC G'TrTAGTTG GGGATTGACA CAAACTATCA TTTTACAAAG TAAGACCGAA CAT'rTAGGCT TAGCTGGGCA ATTTGATTCA AATGTTCGAG TAATTGCCAG AG'rAAAAAAA ATCAAGGATA ATACCTTTGA TAACTTTATT CAAGGGGA'rG CTGTCTCTGA AGATTTGGCT CI'GACCTATA TTGGTAAGAC TCACTTA'rTA AACCCTATTG CGCGTGTTAA ATATATCCCT GCCGAAAGCT TTGGGCAAAT GGAAAAGTTT AAAAAGACCT ATATCCACTC ACTCAGCGGA AAAAAAGICG ACGCCCTTCA TGACAAGCAA AAACAGATTG TAGAAGGGCT CGAGGAGAGG CTTGTCACGC GAGCCCGCAA ACAAGATGTT AAGTTGGTAA C7=TATGGT ATATTGTTTT GGCCCGTCAA TTCCAAAAAT TGGGAAGGAA CCAAAATAAA ATCTTTGATT AAAAGAAAA'r CAAATAATT TTTAAACAAG CTAAAAAACT ACACTCTATT ATTACTATTA TCATTTTCA ATTAATAAAA TAGTTCTAAA AATGCCATTC TATTACTTTA ATTGGTTCAA TGAAGATGCT GGITTTGTTAA TATCAATGTA GTATCTAGTr TCAAATTGT'r TTAACCAGTG
AGCCAAATGC
GTTAGTATCA
GTAGCCATGT
'm'TGGGGGAA
GATCAAGACG
CCCCCCCTCA
ACAATTTCCA
ATCTTGAGGG
'rCACTA'rTGA
TCGTCATCCC
AAGAAATGAA
ATTTATCTAG
AAGATCATAC
ATAATTTACG
CTT'rGAAACA CGTAT'rGCCA AAGTGATACT CTAGAATACC AGCCATCAAC GACATCACTT TATTGCTGCA CAAGCCATTA AATTGATAAA ATCCAAACTG GGGAAGTAGA CGCCTI'CAAA AGAACTAACA GATAATAGTC CACAGTCATT CATGCCCATG TTTAGAAATT GAATCAATCA TATCTTTTTT ATCCACATTT CTGTTTTCCA CAGATTTCAC AATAAAGGAG AATCCATGAT AATACTACTA AGAGAGCTAT ATTG.ACGTGA CCAATOAAGG AATTTTATrr CTCAAAAAAA CTrTCTTGAAG CTTC=TCTT TTTAAACAAA TGAACAAAA 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 GTGGATAACT TTTAGrT'r TGATATGACT TGTTTAAAGG TCTTTCTAAT ACTAAAAATA ArTTTATTTrCT ACAAGCATTA CTATTTTATC AACAGTAAAA ATGGTCAAAT TTCAATTGAA TTACTTCTTT AGGTTCGATC TACCTGATGT AACTCTTGAT GCAAATCAGA AATTACCCTA AAAGGAAAAG ATAGCGAACA 7440 ATATCCACGA ATCCAACAAA TTTCAGCAAG CACTCCTTTA ATACTTGAAA CAAALATTACT 7500 CAAGAAAATr ArrAATGAAA AACAGGTGTC CACTTCGTAT TCATCGCCTA AGCCAGAAAA 262 CAGCCTTTGC TGCAAGTACA CAAGAGAGTC GTCCGATr? TGAGTCAACA CAAAGAGTTrA AAAACAGTTG CAACAGACTC AATTGACTCT TGAAAAAAAT AGTGA'rGATT TTGATGTCGT AATTCCTAGC CGTrCTCTAC GcGAAT7*rrc AGAGA'rrrTC TTTGCCAATA ACCAAATCCT TCGTCTCCTA GAAGGAAACT ATCCTGATAC TACrTT-ACT TTTAATIGTGG TAAACTTACG AACTGCGACT CAAAATGGTA CTGTGAAACT TGTTCACTCT CCAGAAGTTG GTAAAGTAAA TGAAGATTG ACCATTAGT'? TCAACCCAAC TAGCGAAAAG GTGACTATTA GCTTTATCTC AGATACTGAC GAAGACTTCA TGCAGCTCAT GC'N'CAGCCT GGCTCGCCTC TTTTATGATA ATCAAGTTGG AAATTTTG'rT GAGATGAAAA AGCGGTAI-rr ACAGATGATA TCGAAACTGT CT'rTAGAAGC dG-;TATTA GCTTCTATAC AGATCGCTTG ATTCCAACAG ACTI'TAACAC CCAGTCAATG GAGCGTGCCC GTCTTTTATC TGAAATTAAG GATGGGGTrG TTAGCGCCCA CGAAGA.AATC GATACTGATC AGGTrACTGG TTACTTGATT GATTCTCTTA AAGCTr'rAAA AGCrTTCGT CCATNTACTC TTGTGCCAGC TACACCAGTT CGTACAAATT AAGTGAAAGA TAATCGAAAA AGAAAAGGAG AGTAGTATGT AATCACACC TTGTACAATC AAGTCGACTG GTAAAAAGGC TAATCGTTGG GAAATTACAC GTGTAGGAGC AGATATCAAA GTAATTGTGA GCATGTTGTC ATGATGGGGC GATATGATTT TGAGCGAAAA TTTGACTG AGAACCCTA GTTAGAGGGT TAGCACTTTA TCCC-TTTTG TTAGGGAT-rG AAATGAAAAC GGAGAATGAG AAATA'rGGCT TTGACAGCAG
?1'TGCCAAAC GTTGGTAAAT AGCAAACTAC CCATTTGCGA ACGCCTACAA AAACTAACTG ATTT~ACAGAT ATTGCAGGGA ATTCTTGGCC AATATTCGTG TGAAAATGTA ATGCGCGAGC '?GATACCATTr AATCTGGAAT GCGTGTAGAA AAGATGGCAC TCTTCAAAAG ATTAAACCAG AGATGAGGAA CAAAAGGTTG TGTAGCTAAT GTGGACGAGG AATTCGTGAA 'rTTGCACCGA CAACAC1'ATT TAATGCA.ATT ACAAAAGCAG CGATTGATCC AAATGTTGGA ATGGTGGAAG AAA'rGATAAC TCCTAAAAAG ACAGTTCCCA TTGTAAAAGG AGCTTCAAAA GGAGAGGGGC AAGTAGATGC GAT'TGTTCAC GTAGTTCGTG AAGGACGTGA AGACGCCTTT GTAGATCCAC
ATAAAATGTA
ATGAATAAAA
TG'T-ATAATA
GTATCGTTGG
GAGCAGAGGC
TTCCAGATGA
CAACATTTGA
TAGGGAATAA
CTTTTCATGA
TTGCAGATAT
7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 TGATTCTTGC TGACTTAGAA GTACGCAAAA AGATAAAGAA TCCTAGAAGA CGGGAAATCA TCAAAGGTCT TTTCCTTTTG TCAGTGAACA AACGATATGC TCAGTAGCAG AATTCAA'rGT GCTCGTACCA TTGAATTTAC ACGACTAAAC CAC'rTC'TTTA ATGTC=C AGAACCTGAC TCTATCGACT ATGTCAAACA CAGAAAATGC TGAAGTAGTC GTTATT'rCTG CGCGTGCTGA 9300 GGAAGAAATT TCTGAATTGA ATGATGAAA1A GACAGAATCA GGTGTAGATA AGTTGACGCG TTACTTCACA GCTGGTGAAA AAGAAGTTCG TCCTCAAGCA GCTGGTATTA TCCACTCAGA CATGTCATAT GAAGATCrAG TGAAATACGG CTTGCGTGAA GAAIGGAAAAG AATATATCGT TAATGTCTAA AAATT-AATAA ATGGTTCAA GC7'rTTGAAA COAAAAATAA ATGACCAAAT AATATTTITGA AACAAAACAC AATG'1-GGTT AGAATGTCAC TTTTACACAC GATAACATAT ATGGAGAAAA AATTTATCTG GTAAACCAA TTCATGCTTT ATTAACTTAC TATGGTTTGG ATCTTGACAT GGAAGI'rGGG AAAATTCGr ATGGTATCAA GTCTAITATT CAACA'rATAG GAA7*rGGAAG ACCTAAAAAT GGTATGTCAG GGGATGATTA TATCGGTATT 5-rACAGTCTC ATTTACAAGA GAAAAATTTT GAGAAAACAA TATT'AGATTT ATTCTCAGAA AATGATCAGA AGAAAAGACA ACIAATACrr GGTTTATCAA GTTTAGAAAA AGAAGATAGG A7"rGTG'rTAT TTGTTAGTGA TCT'PATTTCT ATCTTGGGTG ATGCTCCrAT GGTGGAGTTr TTGATGTCT CCTTGCGTr TTTGACTGAT TCATCTAAGA GTCGATTGAT TTTACCGTCT CCCAATGCAT GTGAAGAATA TGATCAACAC GCGIrATCC TAAAAAAGAG TTTCTTGAAG CCATTGGTTT TGCAGCTTAC CACTTGCTTG GATTGGGAAC CGCTTGGACr TTCAAACGTG GTATGAAGGC CTTTGAAAAA GGCTI-rATTC GTGCAGTAAC ATCTGAAAAG GCCGTAAAAG AAGCTGGACG
TCAAGATGGC
TTrAGGTTGGA TACT'rGTAGG
TTATGTTGAT
GATATCATGG AATTCCGCT
AAAAAATTCC
CTTGGGAAAT
TGATCAACTA
AACCCTTTTG
CCAGGGGATA
GCGAAGAAAC
*b 8 8 8 4* 8 TTrCAAGCTGA CCTAGCATCC TTTTTCCTAA CCACCS-rTAT GAATGAAAGT GGAAAAGCAG ATATTGACGA TTTACTTATC ATTTACGATG TAAGAGCAAA AGGCTCAGCA GGTGGTCATA GAACTCAGGT CTTTAACCGT GTTAAGATTG TTGTrCATCA TG7TrGAGT AAGTTTGACA TTGACAAAGT TGACGArrCT GTAAACTACT 'rGCAGAGGTA TAACGGATAA ATGGTGACCT TTAAAAAATG GCATCAAAAT TTAACAGATA CATCTACTAA GGCTCYTGCA ATTGCAAGCA TGACGTCAAC TTATGGAGAA GCAGAAGGAC AGGAACTCGT CI'ATCCATr'r TTGGTAGATG CACAGGAAAA AAT'rATTTCA CGGGTTGAAG AAGGGATTTT AGTTTGTAAT ATCGCAGCA.A TCAAAGATAG TATTGTAAAA ATCTCAGTTG ATCAGTTAAA GGAAAAX'GGC TATCGAAAAG 9360 .9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 goes8 0 00#0
TTACTCAAGT
AAATATCCCA
GGTCATTTGA
CAGCTAGTGA
ACAAACTCAG GGCGAA'rTTA GTCTCGAGG AGATATTTTA CATArTG GTTAGAACCT TG'rCGAATTG AGTT7TI'GG TGATGAAATT CATGGTATCA AGTAGAALACA CAATT-ATCGA AAGAAAATAA GACAGAACTC ACTATCTTTC TATGCTTTTG AGAGAAAAGG ATTATCAACG ACGACAGTCA GCTTTAGAAA 264 AACAAATTTC AAAAAC~rrA TCACCTAT TGAAATCATA CCTAGAAGAA ArrCTTCAA GTTTTCACCA AAAACAAAGT CATGCAGACT CTCGGAAGrT rrATCTTTG TGCTATGATA AGACATGGAC TGTa-NGAT TATATTGAAA AAGATACTCC AATATTCTTT GATGATI'ATC AAAAATAT GAATCAGTAT GAAGTCTTG AAAGAGACTT AGCGCACTAC TrrACAGAAG AATTACAGAA TAGTAAAGCA TTTTCTGATA TGCAGTATTT TTCTGATATT GAACAAATCT ATAAAAAACA AAGTCCAGTG ACCTTCT CTAATCI'CA AAAGGTTTA GGAAA'rCTCA AATTGACAA AATTTATCAA TTCAATCAAT ATCCTATGCA GGAArTT1TC AATCAGr 'r CTTTTCTAA6A AGAAGAAATT GAACGA'rATA AAAAAATGGA TTACACCATT ATTCTGCAGT CTAGCAATTC AATGGGAAGT AAAACATrGG AGGATATGTT AGAGGAATAT CAGATTAAA'r TGGATTCTAG AGATAAGACA AATATCTGTA AAGAATCTGT AAACTTAATA GAGGCTAATC TCAGACATGG rrTTCATTT TTCAAAAGAA ATTAAAGCGT AAGATTACAA TGAACI'TGAA AATATCTAGG AATTGAAACC AATACCAAAA TGGTGATCAA ATAT'rTCAAG TGATGGTAAA AGGCCAAGCA AAACG'TAAG ACTCTGAACG TAGTCAGTTG CCTTrTGATGA TGCTTTCCCT TCAAGAGGGA TATGCAGGCT GTAGATGAAA AGA 'TTATT GATA6ACTGAA CATGAGATr CGTr=CGAA GACAACATGT TTCAAATGCA GAGATTAA AAAGGGGACT ATGTTGTCCA 'rCATATCCAT GGGA7TGGTC AT'rGAAATCA AGGGAATTCA TCGCGATTAT GTCAGTGTCC A'PTTCTATCC CCGTGGA.ACA GATTCATCTA CTGTCCAAAT GC'rCCAAAAC TCAATAA6ATT AAATGACGGT CATT'rTAAAA AACCAGGTAG AGGATATAGC TGATGATTTA ATCAAACTCT 11100 11160 112 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 AAGGGTTTrc
TATGTTGAAA
TCTCAGCCAA
TTGGAAAGAC TGAAG1'TGCT ATGCGTGCAG TGTCATTCT AGTTCCGACG ACCGTTTAG GATTCCAAAA TTTGCAGTT AATAT'rCATG AGAC'rGCAAC AcTTGAAAAA T'rGAAAAAcG GTGTTTTGTC AAAAGATGTr GTGTTTGCTG AGCGATTrGG TGTCAAGCAT AAGGAAACTT TAACCTTGAC CGCTACGCCA ATCCCTCGTA ATTTATCTGT TATTGAAACT CCGCCGACTA CTTTCTCAGC TGATGATGAT GATCAAGATG CGGATGATCA ACTTCGTAGT A'rTGAGGAAA TGGATCGACT 'rrTAGTTGGG GATGTTGGT'r CCTTTAAAG~c AGTCAATGAT CACAAkACAGG CGCAACAGCA CTATACGAAT TTTAAGGAAC TGTI'GAGTCG CT'r'AGAAGT AAAAAAGAGC GTCAAGTCGA TA'T'rGATT GGAACACATC A~TrGGGCTT GATGATTATT GATGAGGAAC TGAAAGAACT GAAGAAACAA GTGGATGTCC CCCTCCATAT GTCTATIGCTG GGAATCAGAG ATCGCTATCC TGTTCAGACC TATGTTTTGG AAAAGAATGA TAGTGTCATT CGTGATGCTG TCTTGCGTGA AATGGAGCGT GGAGCAAG ~TTTATTATCT TTACAACAAA GTTGACACAA TT'rGTCAGAA GG=rCAGAA TTACAGGAGT WO 98/18931 PCT/US97/19588 TGATTCCGGA GGCTTCGATT GGATATGTTC ATACTCTATT AGACTT'rATT GAGGGACAAT AGACAGGGGT GGACATTCCA AATGCTAATA GCTI'GTCAAC CTTATATCAG N'AAGAGGAA CTTATCTCAT GTATCGTCCA GAAAAATCAA CGATTAAAGG ATNTACAGAA TrGGGCTCTG TTCGTGGAGC AGGAAATCTT TTAGGAAAAT ATGGTCGAAT GAGTGAAGTC CAGTTGGAAA ACGATATCT GGTGACGACT ACTATrATTG CTTTATrTrAT TGAAAATGCG GACCATATGG GAGTCGGTCG TAGTAATCGT ATTCTTATG TCAGTGAAGT C'rCTGAAAAG AGAI'rAGAAG GCT'AAGAT TGCAATGCGA GA7TTm'CGA CCCAGTCTGG TCA7'rGAT TCTGTTGGTT CTAIrGCTAA ACGAAACGGT ATGrCTAACG TGAT'rTTGCA AATTGATGCC TATCT'rCCTG TTGAAA'rrTA CAAGAAAATT CGTCAAAT'rG AGGAGTTGAT AGACCGTT GGAGAATACC GTTrGGTCAA ATCATACrGr GACAAGGTCr ?I'GAATTGTA 7TCGCAGTTA CTAACACAAG AACCAAAGGG ATACTTATAT TTCTGATCAA ACAACCGTIGT CAATTATGAA CAGATGTAGT AGCCTATCTG TTGTTCAACG TGTGGAAAGA AACGACTGTr TTTAGCTCAA GCATCGCTGA GAATAAGGGA ATGAAATrM AGAAGGTTG AGGAAGAAAA TTCCAT'GA AAATTGAGGT AATAAGGATG GTCGTACAGT CGCAAAW.AA CCAAAAGTTC AACGGACTTG TGCTGCTTGT AAAAGTACTA
TTAGAGGAAG
AATGCTGAGT
CGACATAAGA
GAGTTACAAG
TTAGAGM'TG
AAACATAATA AAATTACAAT TCAATT7VGAA AAAGTCACTC GATTATT'rrA AACr ATC CGTAACCAAC TTAAAAGCAG TTAATGGAGC TTGTA~rTGA TGTCCAAAAT CTGA7"rT'TG GAGAAAGTrT ATTAGAGATA TATTTT'rCTT CTATAAAATA GATAAAAATG AGATTAGATA AATATrrAAA AGTATCGCGA GTAGCAGATA AAGGTAGAAT CAAGGTTAAT AAAGTTAATG ACCAAG~rGA AATTCGCT GAGATGAAAG ATAGTACAAA AAAAGAAGAT TGTATGAAAT TATCAGTGAA ACACGGGFAG AAGAAAATGT CTAAAAATAT AATAATTCTT TTATPCAAAA TGAATACCAA CGTCGTCGCT ACCTGATGaA AAACGGAATC GTrPATGGG AGGGGTATTG AT'PTTGATPA 'FGCTAT~rATT ACTT'rAA'rr TAGCGCAGAG TTATCAGCAA TTACTCCAAA GACGTCAGCA 7TrGCAAACTC AGTATCAAAC TTI'GAGTGAT GAAAAGGATA AGGAGACAGC AAGTTGAAAG ATGAAGATTA IrCTGCTAAA TATACACGAG CGAAGTACTA TCGAGGGAAA AAGTTTATAC GATTCCTGAC TTGCTTCAAA GGTGATAAAA ATrAGACGTA ATAGAGCAAT TT~rGAGTTT GTCAGATGAA AAGCTGGAAG
AAGAAAGATT
AAAGAGTCTA
GTACAATAAT
AT'rATCAAGC
GGAATCTTGG
GGCAATAACT
GCAGCAGGAA
TGTACAAT-G
AGAACGACAA
TATCTTGCCA
ATTAGCAGAC
ATTTGCTACC
7TATrCTAAG
TGGAAAA=IT
AATTGGCTGA
12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 TAAAAATCAA rA1'TGCGTT T-7TTTGTT GCrACCAAGT TCGTCTATAC TTCGAAAGAA GAGAAAAATT AAGTCTCCC TAGTAGTG.GA ATCTGGGAAA TAAATAAACA AGGAATTCCA AACGTTTTT ATATGATCAA CTGACTTTAA ACTGTACAAT CTTATTCGCA AGTATCAATC 'rrGATCAGGC TGGATGGGTA 266 TACAAGAAGA AAAGGAAAGG AAGAATGCG'r TTTTTGACCA TrTCAAAAGT CG~rAGCACA ATTTATTACC TNTCACAATC TGACNGGT ATGGTI'TATG GAGAGGTTCC TGTTTATCCG 7TACTCCCA AAACAAGTTT TCAAATAACC GTATrTAAGC TATCAAATCA TCAATTTATA TCAGAGGTAA CTCCAACAAT AAAAAAAC'TA AGTCCTTATG ATTAAAAGA AGTGAAATCA GACAAGACCA TCTTTGTAGA AGGAAGAGAA GCTAA.AGAAT CAACTTCTGA AGAAGATAAT
AAATTCTTAA
GAAAAAGAAG
ATTTATTTTA
AATGAAGA~r
GAGTGGCGCT
GCTGCGGACA
TGGTTAGAAT
TCCTTATCAG
TTTCTACATA
CGGATGAGTA
TATGTTAAGC TTTCTCTAT
AAGTTCAAGA AATGTTATCT GAAAAATATC AACTGACTAC TGGAAAAGAA GCTGGTATCA tNrGAAACT CTCTTATCTC TArrATACGC TAGATACGAC TGTAAAATAC GTATCTGCAG AGGGAAGTGG TAGTCTTCCT AAAAAAGAAG
AGAAAGATTC
TTACGAAAGT
CAAACCAATC
ATCCAAAAGA
ATCAAAATGG
ATCAAAAGAA 'rCTGATAATG TGATGCCACA 'rTCAAATCCA AAAATTGATT TCTTCTAAGA ATTTGTGCTA GAGTCTTGA ATCAAGATGA AAAGATGTAT GCAC-CCAGCG AAGAAAAAAT AAATGAGGGT CTTTATCAGT TCAATGATTT TCCAGGTTCT TATAAACCAG ATAATAAAGA ATATTCTTTA AAGGATTTAA TAGCTCATAA TCTATTGGGA TA?'rACA'TTT A GATGTCTGC CATTAtGGGA CATGAT'rGGG 'rGGCCGGGAA GTTTATGGAA GCTATTTATA CTAAAACAGA TTTTGATAGT CAGCGAATTG AAATTGGAGA TGCGGATGAA TTTAAGCATG TTATTCTrrC TATTTTCACT AAGAATTCTG ATGTTTATGA GGTTCTAAAA TGAGGGAACC ATATTTcAAA AAGC-ATGCTA AGGCGGTTCT TCTATTTAAG GTATTGTCTA CTTATCAAAA TGTGAATCAT AAGCAGAGAA 7rGArACA 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 CCAAAGCTGT TTCTGTTAAA GTAGCTCATA ATACGGGTGT TGTCTATGCA GATTCCCA'r ArrATGATAC
AGATTTTTTA
AGCTCTT'rCT
AGAGTTAGAG
7rGGGAAGAA
GATTTCTAAG
AATCATTTC
GGTGGATTAG
ATTGAATTGA
AAGGAATTAA
ATAGCCAAGG
TCAAGAAGGG
ATTCCATGTT
TTCTAGCTCA
GGAAGTrGGC
CAGAAGCGCG
GTGCGACAGC
CAArTTTCA GGAGAATTT AGAGG'rCATG AAAAAGACAG
TGCTGAAGCA
TGCACGAAA'r
TTTAGTCACT
AGGAACTCGC
AATCATTCGT
GAGCTTCCTA TTTATATCAG TTCGTTATG AT'rTTTTTCA GCCCACCATG CTGATGATCA TTGCGCTATC TATCAGGAAT CCCTTCTTGC ATTTTCAGAA GGTGGAAACG ATTTTTATGC GCTTGATTCG TAAGGAGAAG CAAGTAGTCG GAGAGATAGA WO 98/18931 PC11US97/19588 AAAAGACTTT CCATCAATT'r TTCAC=rTGA AGATACATCA AATCAGGAGA ATCA'rTAT'lr TCGAAATCGT ATTCGAAATT CT'rACTTACC AGAATTGGAA AAAGAAAATC CrCGATTTAG GGATGCAATC TTAGGCATTG GCAATGAAAT VflAGAPAT GATTTGGCAA TAGCTGAATT ATCTAACAAT ATX'AATGTGG AAGATTTACA GCAG1TTATTT TCTTACTCTG AGTCTACACA AAGAGTTTTA CTTCAAACTT ATCTIGAATCG TNTCCAGAT TTGAATCTTA CAAAAGCrCA GTTTGCTGAA GTTCAGCAGA T=?AAAATC TAAAAGCCAG TATCGTCATC CGATTAAAAA WGGCTATGAA TTGATAAAAG AGTACCAACA GTTCAGATT TGTAAAATCA GTCCGCAGgC TGATGAAAAG GAAGATCGAAC TTGTGTACA CTATCAAAAT CAGGTAGCTT ATCAAGGATA 1TTTArrIrCT TTTrGGACITM CATTAGAAGG TGAATTAATT CAACAAATAC CTGTI'rCACG TGAAACATCC ATACACATTC GTCATCGAAA AACAGGAGAT GTTTTGATTA AAAATGGGCA TAGAAAAAAA CTCAGACGTT TATTTATTGA TTTGAAAATC CCTATGGAAA AGAGAAAcTC TrGCTCT'rATT A'TMAGCAAT TTGGTGAAAT TGTCTCAATT 'rTGGGAAT'rG CGACCAATAA TTTGAGTAAA AAAACGAAAA ATIGATATAAT GAACACTGTA CTTTATATAG AAAAAATAGA 0 0 0 .5 0 0
00 *0 0 5 TAGGTAAAAA ATGTTAGiAAA ACGATATrAA AAAAGTCCTC AGAAGCAGCT AAAAAACTAG GTGCTCAATT AACTAA6AGAC CTTAGTTGGG ATTTAAAAG GATCTA'N'CC 7N'TATGGCT TACACATATT GAAATGGACT TCATGATGGT TTCTAGCTAC TGGTGTTATC AATAI-rAAAC AAGATG'rAC TCAAGATATC TGTAGAAGAT ATCAr'rGATA CAGGTCAAAC TTrGAAGAAT AAGAGAAXGCA GCIrCTGTTA AAATTGCAAC CTrGN'GGAT AGAAATTGAG GCAGACTATA CTTGCTTTAC TATCCCAAAT TTrAGACTAC AAAGAAAATT ATCGTAATCT TCCTTATAT-r GTT=rCACACG ATGAAATTAC TATGCAGGAA AAAATCCAAT GAXTGGTCA AACATA'r'GA CATGGTGGAA CAGCAAGTAG AAAGGAAGAC ATGTTCTATT TTGCGAGATA TIGTTAAhGA AAACCAGAAG GACGrTGT GAGTTTGTAG TAGGTTATGG GGAGTAN'GA AAGAGGAAGT 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 17340 17400 17460 17520 17580 17640 17700 17760 17820 17880 17940 18000 18060 18120
GTA714CAAAT TAGAAAGAAT AATCTPTAAT CAAAAA6ACAA AATAATGGTT TAATTAAAAA TCCTTrTTCTA TGGTTATTAT 'TATCTTTTT CCTTGTGACA GGATTCCAGT ATI'CrATTC TGGGAATAAC TCAGGAGGAA GTCAGCAAAT CAACTATACT GAGMWGTAC AAGAAATTAC CQATGGTAAT GTAAAAGAAT TAACTTACCA ACCAAATGGT AGTGTTATCG AAGrrTCTGG TG'rCTATAAA AATCCTAAAA CAAGTAAAGA AGAAACAGGT ATTCAG~rT TCACGCCATC TGTTACTAAG GTAGAGAAAT TTACCAGCAC TATTCTTCCT GCAGATACTA CCGTATCAGA ATTGCAAAAA CTTGCTACTG ACCATAAAGC AGAAGTAACT GTTAAGCATG, AAAGTlrCAAG 268 TGGTATATGG ATrAATCTAC TCGTATCCAT TGTGCCAr GGAATTCTAT TCTrCTT-CC'r ATrCTCTATG ATGGGAAATA 'rGGGAGGAGG CAATGGCCGT AATCCAATGA GTTTGGACG TAGTAAGGCT AAAGCAGCAA ATAAAGAAGA TATTAAACTA AGATTTTCAG ATGTTGCTGG AX3cTGAGGAA GAAAAAcAAG AAcTAGTTGA AGTTGTTGAG ATTCACAAAA CTTGGAGCCC G;TATTCCAGC AGGTGTr'Cl-r AGGTAAAACI, TTGCTTGCTA AGGCAG1'CGC TGGAGAAGCA CTCAGGTTCT GAC?1'TGTAG AAATGTTTGT CGGAGT'rGGA TTCTTAAAAC ATCCAAAACG rTTGAGGGAC CTCCGGGGAC GGTGTTCCAT TCTrrAGTAT GCTAGTCGTG TTCGCTCTCT ATCGATGAAA TTGATGCTGT GAACGTGAAC AAACCTTGAA GGGATTATCG TCATCGCTGC T7"TTGAGGAT GCCAAAAAAG CAGCACCAC TGGACGTCAA CGTGGAGTCC CCAACN'TG ATGAGATGG GACAAACCGT TCAGATGTAC AGTATCGM CGTCGTCCTG GAATAAGCCT TTAGCAGAAG TGTTGCrGCT GA'rTrAGAGA TAAATCGATA ATTGATGCTT TTCTAAGAAA GATAAGACAG AGGACATACC ATTGTrGGTC TGTACCACGC GGCCGTGCAG 'rCTATCTAAA GAAGATATGA AGAAAT'rATC 'TrTAATGTCC AATGGCACGT GCAATGGTTA TGAAGGAAAC CATGCTATGC AGCTTATGAA ATTGATGAAG TGAAATTATT CAGTCAAATC
GTCTCGGCGG
ATGGTTTTGA
'rTGACCCTGC A'rGTTAAAGG
ATGTTGA'T
ATGTCTTrGAA
CAGATATTGA
TTTCACAAAA
CC'TrTGCGT CCAGGACGTT TCGTGAAGCA ATCTTGAAAG GAAATTAGTG GCTCAACAAA TGAAGCAGCT TTAGTTGCTG TGAAGCAGAA GA'rAGAGTTA AGAACGAGAA TTGGTTGCTT TAGTCr'rGTC GAATGCTCGC G'rTGTCCATA GCGGATACAT GATTGCACTT CCTAAAGAGG AAGAGCAATT GGCTGGCTTA ATGGGTGGAC AAACCACAGG AGCTTCAAAC GACTTTGAAC CAGAGTACGG TATGAGTGAA AAACTTGGCC TT'GGTGCACA GAGTCCTCAA AAATCAATTr AGGTTCGTTC ATTATTAAAT GAGGCACGAA GTGAAACTCA CAAGTTAATT GCAGAAGCAT TA'rCATCT
AGGTAATGAC
GGGAAATGAA
TTGATAGAAA
TTCACGCTAA
CTrCCAGGCTT
CTCGTCGCAA
TTGC'TGGACC
ACCATGAGGC
AGGTTACAA'r
ATCAAATGCT
GTGTAGCTGA
AAGCGACACA
CAGTACAATA
CAGAACAAAC
ATAAAGCTGC
TATTGAAATA
TGCCTGAAC
18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19500 19560 19620 19680 19740 19800 19860 19920 CGAAACAT'rG GATAGTACAC AAATTAAAGC TCTTTACGAA ACAGGAAAGA AGTAGAAGAG GAATCTCATG CACTATCCTA TGATGAAGTA AAGTCAAAAA TGAATGACGA AAAATAACCC TGAGAGAGGC TGGA3GCCTCT CTINTGTG CAG=rAGGA GCTAAAGGGA ACAGAATGGA GAATGGAA CAAATGTCTT T'rCTAATCTG TTAGACTGTA TCTAGAAAGC GGAAAATTAT GATTAAAGAA TTGTATGAAG AAGTCCAAGG GACTGTGTAT AAGTGTAGAA AT43AATATTA CCrrCATT'rA TGGGAATTGT CGGA~rrGGA GCAAGAAGGC ATGCTCTGCT 269 TACATGAATT GATTAGTAGA GAAGAAGGAC TrGGTAGACGA TA~rCCACGT TTAAGGAAAT ATTTCAAGAC CAAG?rCGA AATCGAATTT TAGACTATAT CCGTAAACAG GAAAGTCAGA AGCGTAGATA CGATAAAGAA CCCTATGAAG AAGTGGGTGA GA'rCAGTCAT CGTATAAGTG AGGGGGGTCT CTGGCTAGAT GATTATTATC TCTTTCATGA AACACTAAGA GA'N'ATAGAA ACAAACAAAG TAAAGAGAAA CAAGAAGAAC TAGAACGCGT CTTAAGCAA'r GAACGATTrC GAGGGCGTCA AAGAGTATTA AGAGACTTAC GCATTGTGTT TAAGGAG??? ACTATCCGTA CCCACTAGTA AGTCATGCAA AAAAAATGAA AAAAATTAGA AAAAGTACIT GACAAAGTr CAAAAGGCTG TATAATAGTA ACAGTTGAAA ATAACAACTC AGGTCCGTTG GTCAAGGGGT TAAGACACCG CCTTTTCACG GCGGTAACAC GGGTTCGAAT CCCGTACCGA CTATGGTATG TTGCGTCAGG ACCACTTGAT GAAAAAAAGT TTAAAAAAAC 'rAAAAATCT TCAAAAAAGT GTTGACAAGC GAAAGCAGTr GTGATATACT AATA'rAGTTG TCGCTTGAGA GAAGCAAGTG ACAAAGACCI' TTGAAAACTG AACAAGACGA ACCAATGTGC AGGGCGCTAC AACGTAAGTT 19980 20040 20100 20160 20220 20280 20340 20400 20460 20520 20580 20640 20700 20760 20820 20880 20940 21000 21060 21120 21180 21240 21300 21338
GTAGTACTGA
AACTTTTTA.
ATGCAAGTAG
AACGCGTAGG
CATAAGAGTA
GACCTGCGTT
CGACCTGAGA
GCAGCAGTAG
ACAATCAAAA AAACAATAAA TCTrGTCAGTG ATGAGAGTTT GATCCTGGCT CAGGACGAAC AACGCTGAAG GAGGMGCTTG CTTCTrCTGGA TAACCTGCCT GGTAGCGGGG GATAACTATT GATGTTGCAT GACATTTGCT TAAAAGGTGC GTA'rrAGCTA GTTGGTGGGG TAACGGCTCA ACAGAAATGA GTAACAACTC GCTGGCGGCG TGCCTAATAC TGAGTTGCGA ACGGGTGAGT GGAAACGATA GCTAATACCC ACTTGCATCA CTACCAGATG CCAAGGCGAC GATACATAGC
GGGTGATCGG
GGAATCTTCG
CCACACTGGG ACTGAGACAC GGCCCAGACr CCTACGGGAG GCAATGGACG GAAGTCTCAC CGAGCAACGC CGCGTGAGTG AAGAAGGTTT 'CGGATCGTA AAGCTCTGTT GTAAGAGAAG AACGAGTGTG AGAGTGGAAA CTTCACACTG TGACGG'rATC ?1'ACCAGAAA GGGACGGCTA ACTACGTGCC AGCAGCCGCG GTAATACGTA GGTCCCGAGC GTTGTCCGGA TT'rATTGGGC GTAA.AGCGAG cCCAGGCGGT TAGATAAGTC TGAAGTTAAA GGCTGTGGCT TAACCATA
INFORMATION FOR SEQ ID NO: 21:
SEQUENCE CHARACTERISTICS:
LENGTH: 6273 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
TGTTTTTAAA GAGCCGTGTC TGGATAGACT ?I'CGGACGCA ACGCTCTATT AGATAATGAA
CTGCCTATAC
GTAGTTCACA
TTPrAGAAAAT
ACATCAAAAA
T'rGAGTTCAG
AGAAACTCCT
TGGCTCATCT
GGGTGTTTCT
TGCTTGTCAT
GCATCCGTGA
GTCGCTACAT
TCGTTAAACT
ACAAGATTTC TAACCTTACT CGACATGAGC TGAAACCTCT TATTTGTTAA AAATATTATA CACCTATNT ATGAATAGTC AAC74GTCTTT ACAGTAAAAT CATGAAAATT TTCTC=~CT TTCCATrMA AGTGACATTC AGCCCAGACG AAATTGTCTG TATGTTTAAA GTCTCTGTCC TGGCTACTTG C'rrTGC1'GAC GGCTTTCTGT AATCTTACCA TGAGAAGAGC ?'PCTTTCATG CTTCCAACAC CTGTAALATCA TTTGAATATC CCCTGACTGA TGAGATTGAG ACCATACATT CGAGTGTAAA ACCTGCCTTC AGCATTCTTT TATCTAGTCG CATCATTTCT TCAACAAACC TTGCCTTCAA CACCGACTTG GCCAATGTAT TAAGAACTCT AGTGGAGCCC CTTGATAAGG TAACGCTCCA A'rTCCGCATC ATAGCCTGAT AGCGAAGGGC GA'TTCCAAGC CCTTATTTCC AACTGCCCTT CCACTTTTT TT'rTTCGCAA CAGCTACAGC TGATCCTGCT TAGCAATGCC ACCT'rGGGTG ATGGTTGAAG TCAATATCGC CTTTTTTCAG ACAGTCGCAG TCATGCTGGT
AGTCATTCTC
TTTAAGGAAG
TTrG'rCTTGO
GTAGTTGACC
TTCCAACTCT
TGGCAAGAGT
AGTCGAATAG
TGGCTCAATG
ATCTI'CACGG
CAAGTCTGAA
ATAGGTGTTT
ATCACGCGCC
CAAACTTTCA
AGCTTCATAA
ATTT =TCA
CCCAGCAATA
ACCCAGTAAT
CATCACTTTT
ATGGTC?1'CA AGCCATATTC TTGAGCAATC TGATAAGACA TGGGTT TGAG ATAGGCTAGA ACCTGATAAA CCTGTTCTGG TTCATGACTC GTCACCGTAC CAGTAAATTC AGGATAGATG AG4GAAGCTTG TCTTCCCAAA ATTCGGTTTA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 ATCAGCAACT TATACATATT GGCCAAAATT TCTGGTTCTG ACCAAGTTTT CCTTCTCTTT TTGAACCAAA AGAGCTGGAC AAAGCCACCA AGGCAAAACC TGAGAAAATC GTCCGTAATT GACCTATTrT
TATAAGACAG
'rrGCTTTTTC AGTAGGAAGT TAAAGGCAAT GGCTAGCACT GCAGAAGAAA GTGCCCCAAT CAAAATCAAA CTGGCATTAT TACGGTCAAT TCCCAAAAGA ATAAAGGAAC CTAGTCCCCC TGCACCAATC AAGGCCGCCA AGGTTGCCGT ACCGATAATC AAAACAGCTG CCGTCCGAAT CCCAGACATG ATAACACGCA TGGCGAGTGG AATTTCAAAT T'PCPGAGAC GTTCCCATCT GGTCATCCCA AAGGCAATCC CAGCCTCTTG CAGGTTCGGA TCAAT'rCCCT TCAGCCCAGT GATACTATTT 'rGCAAAATAG GGAAAATCGC ATAAATCACT AGAGCTGTCA AAGCCGGCAA CGTCCCAATT CCCATCAAAG CGATAAAGAG CCCCAACAAG GCCAGAGACG GGATGGTCTG GAAAATACCT GCAATCTGCA AGACCCAGTC GGCCAGCTTC TCATGATAGC GAAGAAAAAC AGCCAAGGGA 271
ATCGCAAGCA
GCTGTCAACC
ACCTCCAAAC
CGCTACCTGG
TTCArCCG'rA
TGTCAGAACC
GAGGAAAATC
ACCAGATAAT
CAAAACCTCT
AAATAGCTAG TAACAAGGTC AAAAGCG-ACA ACTGCAAA'rG AATCACTAAA ACGATCCTGA AAAGTTGCAA TTAAATTAGT AAGTCTCGCTA CAAAGTCTGT TGCAGGCGCT TAAAATTG CGAATTTCTC CATCCTGCAA GACAGCAATA CGGTCCGCCA TCATGGTTA CAAAAATCGT TGTCATCCCA AACTCIrAT TGCAACTGTTrrCTCGAAA'r AGCATCCAAG GCCGAAAACG TTGGGCTGAC CAATCATAGC TCGGACAATA CCGACCCGTT TCACTAGGTA AGCGATGCCC ATACTCCGCT ACTGGTAAAC TCTGTT'1'CT TCGTAATTTC TTCCTI'GCTC CACCCCTTCA
TTGAGATAGA
CATGAACACT
TCTCGGGATT
ACTTCAAGGC
GC)ArTr-r
GTTCATCCAT
GcrTT~CTCC
CAACCTTAGC
TrrCAGGAAT GAGAGCAATA TTCCGCAA ACCAGTAGAA AGACGAAGTTr AATATTTCCA TCAGTTGGTT
CTGTTAGATT
CACCCATC
CCAAAAGACG
TGGAAAAAGA GCAATAGCCT GTAAAACATA ATAGTC'rTG ATGCGCTTCC CATCCATA'rA
TGACCCAGAA GGCCCTACTA AAACCATAAA TCTCAAGACA ?CCTTT'rCTG TGTAGCGCAG TCCTCAATTr AAAACTrCCC TCGATTGGTC
GTAATCATC
TTCCCCATCC
TGCTACA2I'T
AAGTCTTCTA
TTGAGCATGG rCGTCTrACC TCAATCTGTA AGr'rGACATC TrGTATTCAA TCArrcTrTG CC?7AGGCAT AACTTCCrTA AGATATCGTA CrGGGCATAA AGTT CCTTG 'rCCTA.AGAGT 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 TTATC.CCAAT GCTCCACAAT GCAACGCCAT CAATCTGAGT TGGAAAACAA AGTCAAAAGT CCTTGTCCAA TATCATCATT TCTAGCTCCC CA'IITrGGAA GC7'rTCTTAT CCAAATCCTT TCAGGTGTTC GATAGTAGTC TCACGGAAAG 'rATCCGTCG;T TGAACAAAGA CCAGATTGCC TGACGCTCAA AGAAATCTGC TTTCCCGTTC TCTAA.ACGGA CTGACCATAG CTAACCACAT GACACTATAT 'rCAGCCACAT AGTTTTATA AGCAGCACTT A'rGCTGAATC AAATCGTCTG CCACATAATC ACTCCACTGC AATTTCTGTC AAGAAACGCC GAACCACTT TTrATTTTCT GATTTCAAAA TCrCCAAAAA =TGATCTAG TTGGTCATTT AATGACATCC CAATGCTCAA CAA'rACAACC ATrCTCATCC CACCCATTGA GCTTCTCCAC CATTCAGATA TTGATGAACA ATCCTCAATG GTGCGGACAA TCTTAATCTG ACGCTCTGGA AAAGAAGGCT GCAAATCCI-r CTr'rCCCGTC AGGAACACCT GTCGAATGTr GGATATAGGT ATCCCCTACA TCTTGAATGG CA'rGGATGTA TAGGTTGTGA ACCTCATTTC CCTTCTCTTT CAGATTCGCC 'rGAATTTCTT CCTCTGAAAA TCCTTTGTAA GACTGGGCTT GAGCCTCAGC AAC1'CGTCCG CCA'rI'?TCA
AAAATWNT
AAGATAGTAT
CTTGTTGTGA CATATTCTAA CTTGAAAACC TTCAAATTGC CCAATTTCTG ACTGACACGA 272 TGCCCCACr CT7TCTGGGA CTTGCCTAAC TCCGTTAAAA CTAAATACTT CrrACGCTTG TCT~rrCCAC
GTCAGCGTAT
GTTTCACTAT
GGATCTTGAC
ACCAAATGAC
A'rrGTAACAC ACW.ACTAAC AATTACAAGC TN'TGTTCCT CTAGCTrT TATCA'rAGTC TArrCGCAAG
TCCATAAAAC
TCAGTAACI-r CATCTrrCAT
TCATCGTTCT
TCCAGTCGCA AGCCCGATAT CTGTCCGT CGCTAAAArC 'IrGCCCTG?'r CACCCCTATA TTGAAAAATC CGCCCATTCA ACAAACGAAT AACACCTCCA ATI'TATI'TCG ATATCGAAAT AACTGTCAAC 'rATTTCGATT TAGAAATAAT TGCGCAGCcA
AAGAGCCTCA
ATGATGGGCTr
GAATAAAACA
TITGATAAT
TATCCACACC ACCATACTCC TCCTCCAGTC TACAAAAGCC TTTCCTATCA GACCA~rrAT GGCTCAACrA
TTCCATTCGT
'rTTAAAGATA TT?1'ATAGTA TA7"rGAAGTT AGACTAGAGC GATTTGAATr TCCCAATCAA 1Tr'CGTA CCATGACTGC TAAAAGATTT CrATAAATTC CTTATTTCAT TCCGCTATAA TNrCACCrrA ACT'rTTAACC
ACTATCCTAT
GAAGTAAATC
ACTGTATCT'r
TTTATAGCAT
ATTTAATTTC
CCCTATCT'TT
AGAGlw=CA AACTCCrrCG AT7rTATGAG GGGACACATT ATAATTGCTT CC-ATCTGTTC CTAAAACATT GATAGAAAGC TTCGAAACTG GAATAbGGACA CTCAATCAAT TTGTTCATAT TTCGTAGCAC CCTTCAAACA 6* SO 6 66 6O *6*6 66 66 6 5666
556S 655 55 6 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 GCCTATCCCC TACCGTTTGA CGAT'rCCTCA CTTCGCTCCA CTTCCAT'rAC AGAACI'TCT TCACTACTAT GGGCTCGGCT GACTTCTCAT GATTCCTTGT TACTACTATT TGAACGCTCA CGAGATAGAT CT'rACAAAAA ATGCTTTGAT CCACAATGGA TCCTCATACA TAAGCGCAGA AGTCGCAGTT CCTCTGTACT AAGCGAGCCA AGTTGAGCAA CTCAGGTGCT GGATGTTTGG TTGACCAGGC CTGAGAGACG AACTGCCTGC AATTGCTCAT GTAGTCTCTA GGAGAGCAGC AACTAAATCT 'rCAC'rCAAAT AGATCTTTTA TAAGGCrTTC TAGGTTTGGT T'rACCATCC ATCAAAGCAT TTTAAAGAGT TGGCTTCTTC TCrrT'GACA GATTTAGGAG CAATTCACGA TTGTAGTAGG CAGTTTTA CATGTCGAGC ATGATTGTAA CTACCACCTC C ATGGT
TAATAATG'N'
AALAGCTTGCG
CGAGCGCAGA
GCATCTGTTC
GTCAAAAGGA
TAATCAAATC
CTTCTAGGAA
CACCGATAGT
CGCC~TTACC
AATCCTTCAT
AACCGTTGAA CGATCCAATT TCTTCACCAA CGCTTGTAAG GTCATCCATT GCATAGAGGG T1"TGGTrGAGA ATGGATATAA
TGTTGATGGG
ACAGTAGTAT
CCCTGGGAGA
GCCTTGACCA
ACACCACCAT TTTTCAGATG TGGTAC'rTGA TACCAGCTTC AGCAAGTGAC CTGGATCATA CCGTAGACAT CACCTGCTGG
AGCTGCACCT
TTCAGCCcGTT
GAAACCAATC
TGAGCAATCA AAGGTTCCAT CTCCAATCTT ACTGCGAGGA AGAC1TTCTGG GTCAAArrG ACrrCTrCTT GGACGTI'AGA ACCCAGATAG GTTGTAGAGG TATGAGCGCC ACGCAGACCA AGTTCATTGC CGAGTTTTTG ACCCGATAAA 273 GCTTCAGCTA GCTCGCTTAC CATGAGGACA ATATrTT CATTGGCTGT CAAAATTGCA CGGATGCCAA AACTTTCTGC CTCAGCCTTG GCAATGGCrG GCATGGTTGG TCCCCCCTTT GAAATCACAG GAATTTCATG ACCATCACGA ACC.ATGGGGT TCCAGCCACC GATT'rCTACG CTGACCATAA AACCAACTrC GTCCATATGA GCTTCTrGAAT GTTTGATACC AAAAATACCA TGCGGTGTCA ACT ITTCACG AAGATAAGCA GCAAGTTCTG TTACTTCTTT AATTrTTGAA T'rTCATCCAT TTTACCACTT TTTATAGGAG AGTATC?1'AG TCC'rGCrCTA TCTTAGAAAA TCTAGTAAAC ATTCCAAAAT TAACTCGAAT CCGTAGCGGT TATCCCAAGC 'rN'TGAGATG GAACrATCTG GTACA.ATGGT ATCACCAGGA TCCGCAAAAC CACCATCAAA AACGATATCG CCACGAGTCA AATGCG4GAGG AACAGAACCT GTCAAGAGTT TGAAACGTTG GCTGCTAACC ACACGGAAGG TACCATCTGG CTTGATTTCG GAAGCGACCA AGACGCGCGG TGCATCCACA CCCAAGCCAT CTGTCACCAC TTCATCCACA CGGACAGGCG CTTCATGACC TGAGACrGCA AATAAT1TTG TCATTTCAGT TCCTTCTTrC AAGGATAGTG GGAAGGTGGA TTTC-AAGT'r GGATAGTATT C'T GCATG TAGTGCAAAA AT1TTATTTCC AAACAAAAAA ACAATACACC 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6273 ATCAAAGTTG TTTGGATTTT TCATGAAATT TACAGAAAAT AGTTGACT'rC TTCTTTAAAT ATATAGTTGG TTGAGTTTGG AATAGTACGC TGTAGCTGCT TAGAAATTAA TTTGACTTTC CTAATAGAGT TGTTCATATC TTATTTCAA'r ACAAAACTAG AAAAGGAAAA AATCATGACC AGG INFORMATION FOR SEQ ID NO: 22: CCTrCTTCT
AAAACATTTC
'TATG
SEQUENCE CHARACTERISTICS:
LENGTH: 28171 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
ACAACCTTrT TCAAAAACTC ACCTTGGTAC GGAGATGTT TGCTT'ICTGC TATTATTTTC GTATATrC ATATCAATTT TGCN'TAACT CCTCTTGCTT TTTTCATTTA TGCTrAGTGGA GGTCTTATTT TAGCT~CTATT GTATCGCATG ACTAAAAATC TCTACTATCC AATACTAG=r CATATTCTCA TTAATATCAC TGCCTTCTGG GATGTGTGGT TGCTCCTATT TCAGGAAGT TAGCTTACTA AAATAATGTC GGAACTTTCC GGCATTTTCT TTrTTTCACAA ATAGTCAACG ?'TTCTTTT CGATATTGTA GTGGTGTGTA TCCAGTTATT TTT1GAATT GATTNGAAA A'rAAGGTTGA CTTGAGAAAG GCAGATAGTG AALGATAGTTA AGAAGAATAG GATGTTCT 42 420 ?1'TCCTTTTT GGAAAACTTC TAAAATATGG GAAGATGAAC ATTCAACAAT TACCATGT TGAAGCTGCT GAAAAGATGT ATGrrAGTCA GGAAAAAGAG -rGGGC~rrA AGATrTCCG TCGTGGGATG GAATTTTATG AAAAATCGCA AAATCAGTAT GCCAATCCTG AAGAAGAAAA TGACTTCTTG CCACCAACTA TTACGGCCTT CCGTA7Trr GAATCAACT1A CTGTTCAAAT GATTrGGGATT ATCTACCTCA ACAATCAAAA ArrAGGTCTG GAGCTCATCG AATTGATTCC TATAATGAAA AGATAAAGAA GTGGGGGTA TGTGGCTATT GCCAATAGTG GTACrrrTCG CCGAGI'CTG TCTATTrTG TTCGTGATT'r TCGGACCAGC -1CXr.GGAcrr TcTTGACCCG AGAA7'rGGTI' AAAGGATTG ATA7"TTTCA AGATGAAT'rT TCTGITTGCTA GCCAGCACTA TTCAGAGCGC TATCCTGACT ATAAGAACTT ATTAGA'rGAA GTGGCCCAAG GGCATAGTGA TAAAAAGGGG ATATGCAAC GGGTTGAAAA TTTCCATACC CATATTTATC TCCGTGAGGG 480 540 600 660 720 780 840 900 960 1020 1080 1140 TCATCCTI-rA GCCCAGAAAG TCGTTTCACT CAAGAGAAAG CGCTAGCTCA CAGATGrA GACGGACGCC TATGCGACAG AGTTATTCGT CTCAAGGATA GGAGC'rrAGT CALAGCTGG4GA GAGGAAATCA TGAAAAAAAG GATCAGTTGG TCAAATCCTA ATCCCCAATT TCGTTACCTT CAAGATCAGC AGCTGTTATT TATTTACATA AACACATGGA GC TGrTC TTraAAACTT ATGTGACAGA CCGTGCCACC TTGAATGGTA GTTCTGGATT TTTAGATAGT GACACTG'A ACCTAGATAA CCGCATGGTC -A'GTTAAAC CTCTCTTCGT AGAACTCATG CAAGAATT AGCAATAGTG GCAGTCATTG 'rACTGCTTTT 'rATCGTCCAG CAGATTCCAC TGGGTGAAGT AGGAATTAGT CATGGAGGAT TTAGCGGATT TACCAACGGT ACGAGTACCT TTATTATTCA GAGAACTTTG TCGATACCAG TTT'rGGAGCG 1200 ATGGCATAC 1260 GTGAACA.AGT 1320 TGATCAAAA 1380 GATTGGGCTG 1440 GCGCTCCTGG 1500 T-TC'rATCTTA 1560
TGCCATTTGG 1620 TCTAATAATC 1680 GGATATGTTC 1740 GACCTACCTG CAAAATCGAG CGCTGTCATT ACTCTGTTG GCACTCA'N'C TGGATGGTCT TATTGACAGG GTCAGTCAGG
GTGCAGCCTT
TCGTGATAGG
TGG4GTTTGAC GCTTTG7'rGT CACCTTGACT TTATCAACTT TCCAATTTTC AATGTGGCAG ATAGCTATCT GACGCTTGGA GTGATTATTT TATT-GATTGC AATGCTAAAA GAGGAAATAA ATGGAAATTA AAATTGAAAC TGGTGGTCTG CGTTTGGATA ACGCN'GTC GAATGAACAA ATTAAATCAG GCCAGGTCTT CACAGTCCAA GACGGTGATG TCGTCACTTA TGTGGCTGAG GATCTTCCGC TAGAAATAGT CAAACCTCAG GGAA'rGGTTG TGCACCCGAG
AGATTTGTCA
GGTCAATGGT
CCATGTGCCA
GAAT'rA'CAC GTACTCTCC CAAGTC.AAGA AAGC'rAAATA GAACCAGAGG TAT'rAGAGTA 1800 1860 1920 1980 2040 2100 2160 CTACCAAGAT GACGATGTGG CTGTCGTTAA TGCTGGTCAT ACCAGTG4GAA CCCTAGTAAA 275 TGCCCTCATG TATCATATTA AGGACTC GGGTATCAAT GGGGMC G GTCCAGGGAT TG'rTCACCGT GCATCTAG7CA I-Gr CATGGA
AAAAGACCGT
ATTGATAAGG ATACGTCAGG C'rGCCCAAG AATC2TACCTA
AAGAAACAGG
AACTCAAGGA
ATGATCGTGG
CTGTAACTGC
ATTATAGCTT
TGGCTTATAT
AAGGACATGG
ACGACCTTGGA
CGTCITGGAA CCTTGGCG TCATCAAATC CGTGTCCACA TGGrCCTCGC AAGACTTTGA TACTCATCCG AGAACAGGTA S S
.555 S. GGAA.ACCTTG GAGAGATTGA GAAAGTAACA GTAGGCGCTT TTTAGTT GTCATGGTAT AATAAAATCC ACNTTATCAA TGTTCAAGAA AATGGACATT TTGCCATGGT GGATACAGGA TCTCGCTATC CATI'GAGAGA AGGAA'rTGAA GTCTT'rCGTC GTI-rGAAGGA A??GGGTGTC ACCCACAGTG ATcATATTGG AAA'rGTTGAT GTCTATCTTA AGAAATATAG TGATAGTCGT CTGTATGGCT ATGATAAGGT TTTACAGACT AATATcAC-Ac AAGGGGATGC TCATTTTCAG TATGAAAATG AAACTGATTC ATCGGGTGAA TCCTTGATTA GCGTGGTGAA AGTCAATGC AATGTTCATG GAGCAGAAGA CAAG'rATGGT TTTAATCATC ACCATIGATAC CAACAAATCA CCGAGTTTGA TTGiTrcAAAc TTCGGATAGT TATG?=AATT GGCTCAAAMA ACGAGGAAT GATGCAACAG TTTITrGATAT TCGAAAAGAC CCGATTCCAA GTrITCAAGC TGGTTGGCAT TCTTCrATG TAAAAAGrCTr TGrTATGAA
TAAAGGGAAG
AGTAGAGTTG
CGGCCATCCA
ACAAT1rTCTT
ATTAAAGCA
ATGAAAAAGA
GGGAATGTTC
GGTGGCAGTG
GAAGA'rTATG ACGT=?1ATA
CA)AAACTTG
GAArrACTGT
ATTACTAAT
GCTGCAGAAA
TTTGGCGACA
7TrAAAGA
AAGAAAATTT
CCTCTCATTG
AATACCAAGG
CTACCTTGGA
GAGAGAATCA
GGI-1-rGTCA
AAGAGTGCAT
ATTGCTAAAA ACGATGATGC CTCCGCAAAT ATrGGvGCGAT GCCCCGATTG GCCGGAGTGA CCTG.CAGTGA CGCGrTNTCA CAACTGGAGA CAGGGCGCAC GTCGCI'GGTG ATGAGGTCTA CATGCCA.AGA CTTTAGGT GATATCCCAG AGATTTTTA.A AATTAACTAG TrrAGCACTT AGGCTAAGA AAGTTCAGGA ATGCGATTAT TCTTGAAAGC A~rrCCCAGA TGGAAGTGAT AGCATGTTCT AACAGACCGT ATTTTAT"=T GGTGACCCAT CTACCT'ATCC AGTTGACCGA CTGAACGTC1' ATGGGATAAT AAGGTGTTTC AGTTAT'rCAA TGGATATTCA GCTCTATAAT TTTGGGATGA CAAT'rCCAAT ACCTTGGGGG CGATTTAGAT GAAAAGTTGA TT'rGATGA.AG ATTTrCATTAA AAATTTGAGT AAAATGGTGT TGATAGTGAG ACGCAGCCAG CAAAGACTAT ATAITCA.AC ATCCTACAAG ATCGGAACTG GTGGTATCAA AAATCGAACG TGAATG4CTAT AAAAATGGAA CAATCATTGG 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 GcGCCTG.ATr CTACAGGAGA .GTATGCTGTC- GGTTGGAATG TACTTTAACC AAACGGGTAT CTTrrACAG AATCAATGGA ?1'CTAI'1TGA CAGACTCTGG TATTATTTTA ACAAAGAAAA TATTATTTGG ATGTTGATGG TATTAC7'rTG CTCCA'rCAGG 276 TGCTTCTGCT AAAAATTGGA AGAAAATCGC TGGAArCTG4G CC.AGATGGAA ATTGGTTGGA TTCAAGATAA AGAGCACTGG TTCTATGAAG ACAGGATGGC TTCAATATAT GGGGCAATGG GGAAATGAPLA ATGGGCTGGG TAAAAGATAA AGAAACCTGG TACTATA'rGG A'rTCTACTGG TGTCATGAAG TATTATCTGG AAGATTCAGG AGCTATGAAG TA~rCTACA AGACAGACGG TTCACGAGCT TACTTCI-rGA AAGAAAATGG TCAATTACTT GTGGA'rTCAA GTCGTGCCTG C'rTAGTGGAT ACTACAAGTC ATTCAGAAAT AAAAGAATCC AAAGAAACGA GTCAACATGA AAGTGTTACA TCAACTTCAC AAAGCTCTGA AACGAGTGTA AGAAGGTTT AGGGCCTTCT TTTCCTATC ATAATGGATA AATATGAATA ATCGGAGTGA GGTGGG'rACT TCTTCTCTGA CAAATGAGGA TA'"ACCCAG CAGTTGGCTA TGCTGCACGA ACAGGTGAGA TAGAAGTTGC TGCTCAACAT
CAAGGCTGGC
GTGGGTTGGA
GTGAACGGTA
ATAAAAAGGC AAATGA'rTGG TCAAGGACAA GGATAAATGG AGACACCAGA AGGrrATACT GTTrCGATCG AGAAATCTGC TACAATTAAA AAAGAAGTAG TGAAAAAGGA TCTTGAAAAT AATrTTCAA CTAGTCAAGA Tr'rGACATCC AkACAAATCGG AATCAGAACA GTAGTAGAAA
AACTCTTTTC
GACTATGAAA
TGGAAGTTTA
GGCTGGTCAT
TATTCCGTr TATTCATGTT
TACAAACGGA
TCACGTAGTA
GAGTTGATTT
AGGTGCCATT GCGGCTGGTT 'rTGGAGCCTT AGGATTTAAA AAGCGTCCGA TGATAAACAG GCTTCAGCAG TCTTCTCTTG CGTCAAATCG TAAGCGTCGT TATAAAAATG TCCTATCATC AATGAGAATG CACTCTAAGT GCTCAAGTAG TGTGGACGGT CTCTATACTG AATCGAGACC ATCAATCGTG AACTGGGGGI' ATGTTAACCA TGT'rrATATC TGCTCATCCT GGATGGTTCT TACTTTGTTG CTTCTATGCT CAGAGTCAAG TCAATATGGA AAGAGTCTTC CGGTGATATC GTGAC-AGTAT CGGTAGGGCA GGGGCTTTTG TTGGAAGAAT TTrTCTGCACA AATCTTGCTG ACCCAAGATG CCCATCAGGC TTTGTCGGTT TTGCTCAACC ATAGTGTCGT 'rATTGATGAG CTCAAGGTTG CGGCGATGGT CCAAGCAGAC CTTTTAGTTT GAAATCCTAA TTCAGATCCA AGAGCCAAAC ACATTATTGA TATrGCTGGT CAGCTGGTT AAATCAAGGC TGCAACTATC GCGACGGAAT TTGTCrrTAA
AGGTAAAGGA
TGGTGTCTTC
CTrAAGATTGC
ATACAACCAA
ACTTTGTGGA
GTGGGGCAAT
GGGACAATGA
TCTTGACAGA
GCTrGGAGAG
CGTCAAACGG
CAGGAGTTCC
3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 TGAAATCAGA TTCCATGATT GAGGCGCCAG AGGAGACCGA CTCAAGAGAA GGCGCTTCGT ACCCAGAAAC AATGGCTTGC GTTCTATTTG GG7rGATAAA GGGGCTGCGG AAGCTCTCTC TCTTATCTGG TATCGTTGAA GCAGAAGGAG TCITrTCTTA TTGACAAGGA AAGTGGAAAA TCACTT'GGAA AACGACGCGT GCAATGGA GCATCTGCTT GAT-rACCGT GACGACTGGA TTAGAGGTAA ACrATGGTGA ATCGATTAAC ACAGCTAGTG CTTAGTGGCT GCTACTGAGG GGGGAAAATC TCAGATGTGA GATGGCAAGA GGAATTCGTG AACAAGTCAG CT'rGAAAATG TGGAGGATAT GTTGCGrCT TTTCCAI'AC TCCTGAAATC GTAGACAAGA ACAATTTGAA AAGAAGTGAA AAACCAAGCC AAATTTTAGC GGCTAATGCC 'rG'I-IGATCG
AAGTGGTTGC
GTT'rGGTITA'r CGGTATT-ATC TATGAAAGCC GTCCAAATGT CACTGGAAAT GCGG'rTG~rC T'rCGTAGTGG TGTCACAGCC TTGAAGAAGG GCTTGGAGAC GGTGGAGGAT ACTAGCCGTG AAAGTAGTTA CCTTCTCATT CCTCGTGGAG CAGCTGGCTT ACCTGTTA'rC GAGACAGGGA CTGGGATTG'r AGACAAGGCG CTGTCTATCA TCAACAATGC CATGGAGGTT CTGCTGGTTC ATGAAAACAA
TCTT'NATTTG
CTTACCAGAT
CACAAAAAAA
GACG;TCTGAT
TAAGGATGCC
GACI'ACTATT
TGCTATGATG
CAAAAAGCCA ACGGTGTcTr CAACTACTrr TTACAGAATT CAGGTACAGG CTGTTAAAAA 7"rGCTAGCCA TGGCTGATCA C'rCGATATGG CAGCGGCTAA GA'rGCAGATC GTATAGAAGC CCAATCGGTG AAGTrTrrAGA CGTGTAGCTA TGGTGTCAT GCGGCTGCTT TGACTCT'rAA TATCAAACAA CCCATGCCAT CATCCAAATG TGA'rTCAACT AAGGCCAAGG GCTATCTAGA GATCAATGCA GTGGTTGAGA ATGCGATTGT CCATGTCTAT GTGGATAAGG ATGCAGACGA TAAAACCAGT CGTCCTTCTG TTTGTAATGC GGCAGC;AAGC TTCCTTCCTC GCTTGGAGCA AGTG1'TGG?? GCAGAGCGTA AGGAAGCTGG ACTG.GAACCA CAAAGCAAGC CAGTr'rGTTT CAGGTCAAGC AGCTGAGACC ATTCAATTCC GCCTAGATAG CAAGACTNIG ACACCGAGTT 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 TTTAGACTAT GTCCTTGCTG T'rAAGGTTGT GAGCAGTTTA GAAGAAGCGG TTGCCCACAT TGAATCCCAC AGCACCCATC ATTCGGATGC TIATTGTGACG GAAAATGCYG AAGCTGCAGC A'rACT'rTACA GATCAACTGG ACTCTGCAGC GGTGTATGTT AATGCCTCAA CTCGTTTCAC AGATGGAGGA CAAT'rTGGTC.TTGGTTGGA AATGGGGATT TCTACTCAGA AATTGCACC GCGTGGTCCC ATGGGCT'rGA AAGAGTTGAC CAGCTACAAG TATGTGGT'rG CCGGTGATGG GCAGATAAGG GAGTAAGAGA TGAAGATTGG ATTTATCGGT TT'GGGGAA'rA TGGGTGCTAG CTTGCCAAAA TCTGTCTTGC AGACTAGGAC GTCAGATGAG ATTCTCCTTG CCAATCGTAG TCAAGCTAAG GTAGATGCTT TCATTGCAGA CTTTGGTGGT CAGGCTTCCA GCAATGAAGA AATGTTTGCA GAAGCAGArG TGATTTTTCT AGGAGTTAAG CCTGCTCAGT TTTCTGAACT GCTTTCTCAA TACCAGACCA TCCTTGAAAA AAGAGAAAGT CTTCTTTTGA ?r'rCGATGGC AGCTGGATTG ACCTTAGAAA AACTAGCAAG TCTTATCCCA AGTCAACACC GAATTATTCG 'rATGATGCCT AATACCCCTG TAATTGCAGG CTIGAGGACA GGTTGAACTA GGAGAAAGTr CTTGTCTAT CTT'TTTATCG AGAAATAGCA TT'GAAAATGG AAGTCAGCAA CATCCTGGAG CGCTGGTGTA GCAAGCCTAG TCAAGCCTAC AAACGAACAC TATGGTIGGCT GAAATGAGAA AATAGAAGTA GTAAAAAAGA 278 cTrcTATcGG GCAAGGAGTG ATTAGTTATC GTGAGcTcTT rTATCAGCTT TTAGCCAAGG TAATCGATGC AGCGACAGGT C1-rGCAGGTT AGGCCTTGGC AGATGCAGGT GTTCAGACAG CAGCACAAAC TGTGGTAGGA GCGGGCAAT TATTGAAAGA CCAAGTCTGT AGCCCAGGCG AAGCGCATGC TTCCGAGGA ACAGTCATGG AAGAACTAGG TAAATAAGAG GTAGTPGMA GACACAAAAA GA'N'GTCACA AACCCCTATr
CCTTGTCTCC
CTGGTCTCTT
GTGGACCAGC
GArrACCACG
TGGTCCTTGA
GTTCGACTAT
ATGCAGTTCA
C'rGCCTCTr-r 7TTTGATAG AATGAGrrAG ACATGTCAAA AGGA~rTTA G=CCTCTTG AGGGACCAGA GGGACCAGGC AAGACCAGTG AAAAAGGAGT AGAGGTGTTG ACGACCCGTG TTCGGGAAGT GATT'MGA'r CCAAGTCATA TCTATATTGC CAGTCGCAGA CAGCATrMG
TTTTAGAGGC
AACCTGGCC
CTCAGATGGA
TCGAAAAAGT
TCTGCTACCA ATTTTAGAGG AC'TCTTG.ATT GGGGAGAAGA TGCTAAAACA GAGCTACTTC TCTT'CCAGCC CTTGAAGCTG TGCCTATCAG GGATTGGTC TGCGACAGAT GGCCTCAAAC C~TGGCTCGT ATTGCTGCTA GGACTTGCAT AAAAAAGTTC GCAAGTTCGT CATCATGGAT CGTNTATCC ATAGTTCTGT GTGGCTTAGA TATTGAAGCC ATTGACTGGC TCAATCAGTT 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 CCCATTTGAC ACTCTAI=N' GACATCGAGG ATAGTGACCG CGAGGTT-AAT CGTTTGGATT GTCAAGGCTA CCTTTCTCrr CTGGATAAAG G'rCTCCCTTT GGAGCAAGT'r GTGGAAACTA TGGCCAAA1'G AAACAAGATC AACTAAAGGC CCGTATCTTA GAACAAGACC AGCTCAATCA
TGGAACAAGG
TGGAAGGTT
AGGGAAATCG
CCAAGGCTGT
TTGGCAACCA
CGCCTATCTC
CCTCTM-rGT GA'rrGAACAG
CATTGTCAAG
CTTGTTTGAC
GCTCAGTTTG
TTTTCAGGTT
ACGGATAAAG
GGAGAATTTC
ATT-GA'rGCTA
GGAATGGGCT
ACCGTTTTGT
TCTTTGAAAG
TTGGCGrTT
CCGATGTCAC
TGGTGGGTCA
CTTGGAAATG
ACCATGTGAG
CTTGATTAAA
GTTTTCTCAA
AATGCATCCC
GCGCAATTTT
AAATGCCGAA
CCAG7TAATC
GCAGGGATTG
AACGCAGCC-A
TAGCTAAGAG
GTTGCAAGCT
AGGTCATTAA GACGGAACGC AT'TCGAGAAT TTATATTTTC TTCTTGACTA TCAGATCT'rC CACTTTAAAA ACTTGTTAAG AAAAAAGCGA AAAGCCAGCA ACAGGTCTTT ATCA'rCGAGC AAGCGGATAA ATTCTCTGCT CAAGGTCATC GAAGAACCCC ACAGTGAAGT GCGATGAGGA AAAGATGI'TA CCGACAArCC GAAGTCGGAC AGCAAGAAGA AAAACTTATC TTACTCTTAG AACAAATGGG CTCTTAGC TAAGTTTAGT CAATCGCGAG CTGAAGCAGA
AAAGTGC
rrGG~rAGTA
AGATGATAAG
AATCAGGCAA GTTTTTGGAC GCTAAGAAAA AAGAAAGTTA GAAAAACAGG ATCAGGTT 279 CTTGG'rCGAT GAAAGTGAAC GCCTGCTGAC TCTACAGGTT GCCAAATTAG CCAACTTGGC ACGGATTCTT GAAGTTCTCT S CCTCTTGCAG GTAAGAGTAA CAGTGATCT ACAAGATI-rA GCAAGCrAAT GTCAGC71rC AAAATGCCAT GGAATATCTG TCAAAAATGA ATGATAAAGA AACGAAAGGG CTGTTTTATG CGCGCTGGAT GATT=TCCC AACAATTATT GGTAACCTTA GAAAAATCTC AAGAGCCTGG TAGAGGAAAA TACAGC'rCTT GCGAGAACGC TT'GGG'rGAGG TGGAAGCAGA TGCTCCTGTC AAGTGTCCGT CGCATTTACC GTGA'rGGAT'r TCACGTATGT TCGAGAGCAG GACGAGGAAT GTATGTTTTG TGACGAGTTG CAGATTCAAA AAAGTTTTAA GGGGCAGTCT CCCTATGGCA CCGATTGGCA ATCTAGATGA TATGACT'rTT CGTGCTATCC TGGATTGCrG CTGAGGATAC CGCAATACA GGGCTTTTGC ACCAAGCAGA TCAGT'rCA TGCCACAAT GCCAAGGAAA TTC7"GAAAG CAGGGCAAAG TATTGCTCAG.GTCTCTGATG GACCCTGGTC ATGATTTAGT TAAGGCACCT ATTGAGGAAG CCAGGTGCCT CTGCAGGAAT ?TC'rGCCTTG ATTGCCAGTG ATC'TTTACG GTTTTVTACC GAGAAAATCA GGTCAGCAGA AAAGAr'rATC CTGAAACACA GATTTTTTAT GAATCACCTC GAAAATATGT TAGAAGTCTA CGGTGACCGC TCCGTTGTCT ATCTATGAAG AATACCAACG AGGTACTATC TCTGACTTAT CCACTCAAGG GCGAATGTCT TCTCATGT GAGGGTGCCA GACGAGGAAG ACTrGTCGT AGAAAT'rCAA ACCCGCATCC CAAGCTATCA AGGAAG'rCGC TAAGATTTAC CAGTGGAATA TACCACCACT GGGAAGAAAA ACAATAAAGG GAGACAGGAT TGTTTAACT'r AAT'rAGTGAT GATAATATAA AGA'rGTATCA ATTAAGTTTT rrATTAAGCC CATACGGAA'r ACCGATGGTT CTTAGAAGGT ATAAATAGAA AAATAAGGTC ATTTrAAATC
CTAGAAGCTA
GTCTTGAAAG
GACAAAAAAG
GCCGATGTGG
CGC'rrGGAAA
AAGGCCAAGC
AATGATr'TTr CTATACAGGG AGTAGGCATG AGCTGTATCT AGTGGCAACG AGACCTTGAA AGAAGTGGAC TCAAGCATI'T 'rGACA'TTCC AAATTCCTGA TT'rGATITGGT CCGGTTTGCC TIAGCATTTCA AAATTGCAGT TGTGACAGTT GTTTAGCGCC ACAGCCACAT AGCAA'N'TT TGGCTTGAAA ATCGTGTAGC AGACACGTTG TGGTCAGGGA ATTGACCAAA TAGAAAGCAT TGCTGAAACG GTCAGGGTGT GGAGGAAAAG AGCAAGGTGT GAAGAAAAAC AAAGTCAGCT CTACGCTGCC GTAATAATTC TGTCTGTTTC CTTGGTATAG AAGCTTTGGT GGAGCAGCAG TTATAGCGT AAAGGA'N'GA TAAATCAGAA
GTGGGCAGGA
GAAAAATGTG
AAATATAAAC
AArTATTTGA
AAGCCATCAA
ATAGTAAGTT
ATGTTCGTGA
ATGGACAACG
9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 AGAAGGTGAT TTTTGCGAA AACA'rTTGAG AAATATATr'P
AGCTGATCAA
CTTGGTTCT
TAAATTTAAA
TTTATCTCTG
GATTGATTCG
AGCGACTAAA
T'rTGGAACT TTTAAC?1'TA
TAACAGAGAA
AGAATTTCGG
AGGAATGTCA
TGTCGAAGAA
TGAGCTTGAT
GTTGACAGAA
AAATGGGAAG
TGGAATCAAC
CAAGAGT'rGA
TTGAGTGAG
ACAGCGACTT
TTCAGAACTA
AAAAArTA GCAAAVI-rrr
GCGGAAAGAT
GGCTTATCAG
TTTAAGAATA
ATTATGAAAA
GTAAAAAAGG
280 CATACGAAAA TAAAGAAGAA CTAAAAGCTG AGATAGAGAA TAGAATTTGA TAATATTCCA GAAAATTTAA AAGATAAGAG CTCCAGCACA AAACCTTGCT TATCAGGTTG GTTGGAccAA AAGATGAAAG AAAGGGGCTr CAAGTAAAAA CACCATCGGA I-rGGTGAAT'r ATATCAGTGG 'rrCACAGATA CCTACGCTCA AAGCAAAATT AAATGAAAAT ATTAATTCTA TCTCTGCA6AT AAGAATTATT TGAACCGCAT ATGAGAAAGT GGGCTGATGA GGGAAGTGTA TAAGTTTATT CATGTAAATA CGGTTGCACC AAATCAGAAA A'rGGAAGAAG ATAGTATTAT AAATTATATT TAAAAATGGT TACCAAAGGC GATAGAAGAA AAACTATCGT AAGAACGGAG GTGATCTTGC ATrGAClI'TG AATATTmA TTAACTTCTT AAAAGTACCG GAGATATTAG TTCATAGAGA CAGAAGCAAT TATCCTTTAT TCCATACTTC TTAAACAGAC
ACTGGATAGA
GAAGAAATAT
AATAGGACTG
0 0000 0004 0 0000 00 00 0 0 0 CCGAACATCA TTrATGT'rAA AGACT'rTATG CAGAACTCAA AAAACTTAAC TTCAGAAGTA GAACTTCAAG AGGTTAAGAA CCTTGACTCT AGTAAGAGAG AATATAG'rTT 'rGGTGAAAAC GCTGCTGAAG ATATATCGGA TTTACAAATC AGACTTCCTG CAAAACTAGA ATCCTAGTI'C TCGTA6ATCCG AAGCGTTTAC GATGATT'rCG TAC'TTrGGCA AAGATGTTCT CAATC7TGCT TTTATCTTCA GCTGN'AGCG GCTTGAGT ATATATCTTC ATGAGCCCTT GATAACCACT ATTTCTGCGA CTCATTTTGA ACAACTT1CAT AGAAACAATT CTCCCTTGAC TTGTGACAAT TTTACCAGAA TGATTCGCTA ATTCrTNr"
CA.AGGAAGGC
CTCAAAGCCA
ATCGAAAGAG
AGTATAT'rTC
AAAGATTTTA
AACTATATAG
GGACTTGGAA
ATAATGAACT
ATGATTGATA
ATAGATTGTT
TCTCrCCTTG
GCTGGATTTA
GTCAGACAAG
ATCACGACAA
CGCTTGAGCC
TAGGGCGAIT
AGAGTA'N'TA
ACTGCCATAA
TAAGGCTTGG ACTTGGTALAG AGGTAAAAGA AAATGACTrA ACCTCAGAAG TAAAGAAAAT AGAATA6ATAA GAGTAAGTAT CATTTCAA.AA TGTGTTTTA CACAGCTTGA GAATTACATT ATGCCAGCA6A TCAAA'rTCAT GAAAACATTT TAAACGTTT GA'rAGCGCAT GGTTACAGGC CGTGGAGTTT GTACT'rGAGG ATT1'TACCAG CTTGTCCGAT TAGTrCACAG CGATATCCAA TTCATACCGT GAAATTTCTr GATTI'TTACT TCCGTCGCA'r
TCTATTTTAC
AAACATTAGA
11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 CAATCATTAC CGTGTCCTCA GAACTGAGAG GAGTTCTTGA AATCGTAACA CCACTT7VGAA 281 cAAGAGTTAc rrCAAcccAT TGGCTCcGAc CAGCCGCAAT TTGTCATAA GTTCGATATT GGTCTTCTAG GCTTAATTTA GGTTTTCGTC VTTAATAC AGCTAATATC TC'?rCAAAAG ATCGTGCATC AG 1TAGTTGT TTAcTTGCTT TGr'rTCGCAG GAAGTCTATT GGAAAGTAAG TTGTGAGCGT GGTCCTATTT TTTCAGGTAA AGGAGATCAC TATGTTGAAT G??ATGCTGT AATAACAGTC CCTATCGATC CGT'rATGTGG TTCTTAACTG GAATACTCAC TATCTCTrTA GCCTTCTTCC CTTrTTrTGT TATACTAGTA CAAACAGCCC AGTGGGGTGT TTTAATATGG TCGTGTCTCA ATCTTATATC AATGT'rATCG ACCAAATCGC AGAGCGI'GGT ATTCCAGTTA CACCCCAGCA TAAAACAGAC AATTTTGCTG ATGCTGAC AAATGCAGTT GGTCTTCTCA TCTTGGAATC TGCTGAGGCT ACACGTGTTC ATGGTTTCTC TCAAATGGTG AccGAAAAAG GTGATGAAAT TACAGAATTG CCGACAGATG CAAGTGATGC CTTGGCTGAA AAGATTCATG ACGATGCGGC AGCGCCTATT ATCGATGTCA AATCACGTTA TGATAAGGGA GAAGCGGCCT TTATGGATT CCATGAAGCT TTGGTCAATG AAGAAAAGTA CTTTGAAGGA TCTATGCCTA CTATGCITTA TGGCCCTATG AAGCCAGTCG CTCGTGATGG AGAATTTAAA ACACCTTATG CTGGTAGCCT CTACAATATr GTTGGTTTCC GTGTCTCCA AATGATTCCG GGTCTTGAAA ATCGCAATTC TTACATGGATI TCACCAA.ATC
GGATTAAGTT
CTCGCACATA
CACCw1rGC
TCGTGCGCTG
CATCA'rrCAq*
AAATAT'GAA
AATAAAATAT
TTrAGATAAT
AAAAGATT'T
CATCAAGAAA
GAAGAAAAAA
ACTTAGGTCC
GTGCTGGTTT
GCTTTCCTGA ATACCAAAAT rTGAAGAGTG GCCATAAGAA GTG7rTAAGT TGATAAGCTG AACACCAACA AGACGCTI'AA AGAACTACTA TACCATAT~I' GCTGAGGCTA TTAGAAGAAA CACCAAGATT CACAGTTTAA ACGGTTATAG CAAGAGATAG ATAGAGTAGC ATATAATTGA ATGACTAAAC AGGGAAGT'N' TTAGAAAGAT TTGTGGGTGT CACCCAAAGA GGTATTAGTG GGCACGTTCT GAAGCAGCT AACTATATGA AATGCGTGGT GTCAAGTCTA AG7TrGGTTTG TTCCAATTCT TTGCGTGGGG AGGAAGAAAT GCG'rCGCT'rG GGTTCTGTTA CTGCAGGTGG TGCCCT'rGCA GTGGACCGTG TTGCCAACCA CCCCTTGATT GAAGTGGTTC T-AtrTACGGT TATCGCTACT GGTCCTTTGA CTCTTAATGA CGGTrGCTGGT 'MT'TCT ACACTATCGA TATGAGCAAG GTCTACCTCA ACCTCAATGC CCCTATGACC AAGCAAGANT CAGAAGAAGC ACCGCTTAGT TCTTTTGAAA TCGAAGTCAT GGCCAAACGT GGCATTAAA-A GTCT'rGAGTA CCCAGACGAC TATACAGGAC 12840 12 900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520
CGTTGTGCA
AGACCCACCT
ATGCGGAGT
TTCTTGAGCA
ACTTCGTCAG GATAATGCAG CAAATGGGGA GAACAAAAGC 'rGTCCGT'rAT GGTGTGATGC GACTTACCGT TCTAAGAAAC 282 AACCAAATCT CTTCTTTGCT GGTCAAATGA CGCG7GTGGA AGGCTATGTr GAGTCGGCGG CTTCAGCCTT AGTTGCGGGA ATTAACGCAG CTCGTC'TCTT CAAGGAAGAA AGCGAG.GCTA T'PMrCCCCGA G.ACGACACCG Alr4AAGCT TAGCTCATTA CATTACCCAT GCCGACAGCA AACATrTrCCA ACCAATGAAT GTCAATTG GGATCATCAA GGAGTTGGAA GGCGAGCGTA TCCGTGATAA GAAGGCTCGT TATGAAAAAA TTGCAGAGCG TGCCCTTGCC GACTTAGAGG 14580 14640 14700 14760 14820 AA7TTTTGAC ATTGTGA'rAA
ACGTATTTTA
TATCCAAACA
AATTr-CCcTT 'rATGGACCGT
GATGGCAGAT
GCAACAAGTG
TATCGrrATC
CCTTCGTGCA
TGTTTACAAT
CCGTGACGTT
GGACAACGAC
CGTATT1TGGT
AAGAATATGG
TCACTTCCTC
CGTGTACATG
ATTCCAGAAG
TGTCTAATTT
AATAGGTAGG
TTTGAAAcA ATTGCTCATG ATACTATAAA AATC~rAGAA ATGAA.AGAAG GAGAGTGAAA ATGGCGAATC CCAAGTATAA ATCAAGTTAT CAGGTGAAGC CCTTGCCGGT GAACGTGGCG TAGGGATTGA GTTCAAACAA TCGCAAAAGA GATTCAAGAA GTTCATAGCT TAGGTATCGA GTTATCGGTG GAGGAAATCT CTGGCGTGGA GAACCTGCAG CAGAAGCAGG GT'TCACGCAG ATTACACAGG AATGCTTGGC ACTGTTATGA ATGCTCT'rGT TCArrGCAAC AAGTTGGGGT TGATACGCGT GTACAAACAG CTATTGCCAT GCAGAGCCTT ATGTCCGTGG ACG'rGCCCT'r CGTCACCTTG AAAAAGGCCG TTTGGTCTG GAATCGTTC ACCTTACrrC TCGACAGATA CAACAGCGGC GCTGAAATCG AAGCAGATGC CATCCTCATG GCTAAkAAATG GTGTCGATGG GCCGATCCTA AGAAAGATAA GACAGCTGTT AAGTTTGAAG AATTGACCCA ATCAATAJAAG GTCTTCGTAT CATGGACTCA ACAG=~CAA CCCTC-'CAAT ATTGACTTGG TTGTATTCAA CATGAACCAA CCAGGCAACA TCAAACGTGT 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320
GAAAATATCG
CTAACCCAAT
GTGAATTTGG
TAGAATACTA
CGCGTGTT
GAACAACAGT TTCA.AATAAT ATCGAAGAAA TATTGAAAAA GCTAAAGAGA GAATGACCCA
AGGAATAAGA
GTCTCACCAA
TGGTATCCGT
TGGAGTCGAA
GCTGGTCGTG CCAATGCAAG CTTGCTTGAC ACTCCTCTTA ACCAAATCGC 'PTCAATTACG GAACGTGCCT TGAACGCTTC CGCTTGGTTA TCCCAGCTCT GT'rGCTAACA CCATT'TGACA 'rGATATTGGT ATCACACCGG TACAGAAGAA AC'rCGTCGTG AGTGGCTCTC CGCAATATCC AAAAGAAATC ACTGAAGACG AGTCTTCATT GAAAGACATC CTAATGACGG TTCTGTGATT
AAGLGTCGGCG
GCTAAGAAAC
GACATTCAAA
GAGAAAGAAC
AGTTTrTATTC
AAAATGCTAA
GAGAAAAAGC
AAGTAACAGA
TTTTGGAAGT
GAAAGAAGGA
ACCI'TGCTAA
GTCGCGATGC
AATTGAAGAC
AGAAGTGAAG
TATGGACGAA
TCTGAAAAA
CGATGCTGTT
cTAAAAATAA
AATATGAATA
AAACACATCG
ACAGAAAAAC
CAAATCTTGC
ACGACATCAC TGCTAACAAA TCAGTTGGCA TTGCTGGCTG AAGTTrTTATC GTTGGACTGA TCATCGATGA AAACGACCGT TTACTTG CTAAGGAAGA AGGCCAACAT ACACTAGGG TGAAGCAAAA ACTCCGCCTG ACAACCTTAG GGGGACGTGT CACAGAGGTT CGTA.AGGACT ACAAGGAAAT CGrrGTGTCA CTCGATATIC AGGGCGACCA ACTCTACATC CGTCTTGAAG TGGCTTATCA AGAAGACTTC CAACGTCTTG AAAACTGGCC AGCCATGT'r TACCGTCTCA AAAATAATAT GCTTGGTr ATTCATCCTA AAGTATTAGA TGCGCGCGT ATTGGTTCC TCAAACCACG CTCC7r.A ATGTTGGAAA AAAGCAATrG4 CGGTTTCATG ACCTIAAATG CCTTTGCCAT T-rCTAAAGGT CAGTTCAAGA AAATCAAGCA GGACCAGI'T GG4GACAGAGT TTACACTTGG CTCATGACCG AGCGCAATCC AGACCTCGCT TTTGAAGAGT CAGCC1'TTCC TCGCTrTrG GAGGAGCATG CCAGTrrCTC GCAGGAATAT CTAGAACACT AGCATTTATT CCTCTGCTAT AATAAAAAGA AATAAAAGGA TGCAAAAGGA TGGTCAAACC TATGCTCTTG ATACGGTCAA AGGTTTTGCA TACACGGATA AAGTGACTGC CACTCAGGAC TGGGTGTCTT TGTGGATACA TCCCTGAGCT CAAGGAACTC TGGATAAGAA AGACCGTATC CTCGTCCTGC CTACAACAAC AGCTGTCAGG AACNrrTGTT GCGAGCGTTA CGCAGAGCCA GTGAAGTGGA CCGCAC'rCTG ACGATGCTCA GATGATTTTG ACAAGTCATC TCCAGACGAC CAArrTGGr
GGCCTTCCTG
=GCCTAAGA
TGGGGCCTCT
ATGCAGAACC
TACCTACCAG
CGTTTGGGGC
AACCTCTCCC
ACTTAT'rTGG
ATCAAGGCAA
0 .0* 0* AAGCTTTAGG TGGTCTTATG AAGGCTGGTA TGATTTAGGG AGGCTTATGA GAAAATCATT TAAAAGTAAC AGTCCCAAAG CAATTTTGGC AAAACACACA. GATGATr'rTG ATGAGGTCAG 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 -17280 17340 17400 17460 17520 17580 17640 17700 17760 17820 17880 17940 18000 18060 7 AACCTA GGAGATTTTG CATTGGGTTT GGGCTAGTAA TTAGAGAGGT TCTTTATTTG ACCTGTTT CA TCTTTTTGGT ATGTTGTGAT TCATGCTCGT AGGAAGCCCG TCAAGTTA'TT
ACAGCATTTG
TTTCTCCATC
AAGGAACATT
TCCAATGAAC
ACGGAGATTG
CAGGCTTTGA
CAATACACAT TCAACTGAGT GCCATCTTCG TTTGATGGAA TCCAGGTTTT GGGAGAAGAG TGG'rCTTGGT AAATCGTGGG TGG'rCAAAAA TGATGAAATT ATAATACTGG GAAACCTA'rC TCAAACAGCA TGATCTGACC
CATCCAGATG
GAAGAGCTTG
TCTGCCTGTG
ATGACCGTTG
GACAAGTTTG
CGTGTCAA.AA
TTGGAATrG
GCCCTTAAAC
GAGAGTCTTG
GTTTACGATG
GTACGCCAGA TGTAG'TCACT GCGATTAGCA TCGCCCTTTA CGAAGAAGAA ATTATCAAGG CCCTAGGGCA AAAGCTTTAT GTGGACAG1'G GGCCAGCAGG TACAGGGAAG ACCTTCCTTG GTGGGCAAGT CAAGCGAATT ATCCTAACTC GATTTCTTCC GGGTGATCTT AAGGAGAAGG CCTI'GTATCA AATTCTTGGG AAAGACCAAA CAG'rGACCTT
GTCCAGCGGT
TGGATCCTTA
GGCAGTGACT
GGAAGCGGGA
CCTTCGTCCT
284 CGACTCG'rCT CATGGAGCGT GAAATTA'rCG AAA'N'GCGCC CCrGCCTAT GGACCTrGGA TGATGCCTr'r GTCATTCTCG ATGAGGCGCA AAACACGACC TGAAGATGTT CNTGACGCGT TTAGGTTTrTC ATI'CTAAGAT GATTGTCAAT
ATGCGTGGCC
ATCATGCAGA
GGAGATATTA
GTCAGATTGA CCTGCCACGT AGAACATCCA TCAGATTGAC TTGTCGCTCA GATTATCCGA GAGGAAGTTC GCCTGCAAAA rTTTTATGGAA ACAGTATACG GATITGGTGTC AAAAAGGAAA GATTCAGCAC CGAAAAGGTG AATGTCAAGT CCGGTTTGAT T1GATGCTCAA GAGAAACTCA TT'rGTTCATT TTTCAGCCAA GGATGTGGTT CGCCATCCTG GCCTATGAAT ArrCTACTGA AGTTGCACAC GACTGATT GA.ATAGACTT GTTCGGTAAC 'rGTAAAAAGT GTTATACTAT ACAAAGCACA AAAACI'AAC TCAAAAAACT TCAAACTArr CCTTTCAACT CATGCTAGAA CACCTGAATT CAGCCTATCA GACGTCCACG TAGTCTGCCC ATGGAAGACC AGCTCATTAT ATTATCCCAC TCAGCGTCTG CTGGCCTTTG ATTIGGCGT CCATCATCAC TTGGGTGGAG GATACACTTC GTGCGTCAGG TAGAAGCCCC GAGTGCTGCT GTGGCTATTG ACGTGACCGA ACAAAACCAA ACCAAAAATT ATTCTGGTAA AAAGAAACGA 0#0 *0CC 6 0
GACCCTCCGT
CGGTGTAGCT
'rAGC'TTTGAT
AAGTCCGATT
CACACCTTAA
'r1r'CTGACG
CCTGAAACGA
AATACTTTCA
TTAAATAAAG
ACCIYCCAAA
GAATTAATTT
TACTTCCGAT
ACGGTAAATG
TTGGACCATT
CAGCGTCCAA
AAACTCAAAT
GACATACGCA
CGCTrGCCTT
TTCCTCCTAA
ACATGTCAGC
TCATGTCAGT
GTGCCATCAT
TATCCTGGAT
TGATTI'TACT
TGTTGACCTA
AAATTCCAAA
GATACGAATT
CCCr'TATCGT CAA'rTA'rGAA rTrGACGACAC ATAAAGTCrG CTCTTCAAAG AAAGTATTGG GGTATTTAG GCATCTTGAA AATCCCCGCC TGAGTGAGGA GAAATTGAAC ATTTrAACGC
TCAAATGGCC
ACAAAGTrTTG
ATTTCATGAG
TGATAAGCAG
TAAA'rTCAAG 18120 18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19500 19560 19620 19680 19740 19800 19860 CT=GrAGAG AGGAAAATCC AGTTGTATAG ACGAT'rAGGC ACGATGGAAA GAACTT'rrAT 'rGTCATAATC ACAGGGCACA AGAAAGTAGG AGTATTACAG TTGTAGGATA CTAACTG.AAA AAAGTAATCC TCTGTATTTT CAGCATTGTC AGGACATGCT T'GTCTACCT GAAGGTAAAG GGAATGGATC TGACCTTGTG GCTGTTATGG CTrG7=TAT TGGTTTGTT ATGGTTGATC ATATTGTGAC AGAAGCACTA GCTNATTTTG
AACCGCAGAA
GTGAACTAGA
GCTAALAGG'N'
GI'GGCTGATG
AA~'GAAA
AGGATATTCC
CACCAGAGCC
CTAAGGCTGA
AT?'rTGTCTA PAGCCTrATCA
CTAAGAACTT
AACGTTTCGA GTTACGGGCG TTCCGAACAA GTCTAATATA TTATCCAAAG GTCTGAGACA ACCATCAGTG CATCTTCCTG GATGATTGAC CAACTATCTA AAG'rA=TTA TCITATATG AAATTTTGCA ACTGTAAAAG TAAGTTTTTT G'rTGGATT TGCATATCCT GA'rGAGGAGA GAGAkAAAGGG ATTCGTAGTC TCGAAAGGCA CGTTTGGC?1' 285 ATGTTAAGG4G
GATGCGAGGT
AGAAATGGCA
TGATGTGACT
TGGCCGCATT
TGACCAATCT
ACTTGATAAT
AAATCCGCAA
TAAGCAAGAA
TCAAGTAAGA
TACCCTTCCA
TATGACGATC
TCTTTTGAGT
AAGATGTrrC
TCTCAGCA'N'
C'TCTATACGG
ACTATTTGGA
TGATGGGGGA
GCT'rITAGT
TTCCATACAA
TATAAGACCT
TTGGGAAAA
TTGTTATCGC
GCAGGGCTTT AAA'rCAATTG TGAACAGAGC CTAGAAGATT ATT1TGTTTI'G GAACAATTAT GTATATT1CTT TACTTCCGCG TAAACCCGTG CAAGCAGTCT
AGGTGCCAAA
AATTTTAGCT
GAAATGATT
ATGTATAACC
CAGGATTAGA
GCAAGATTAT
TAGATAAGAT
GAGTGGAAGA
AACTGCCAAC
TGCTAAGGCC
GAGCTCAGCT
TCTGTTGACT
GCCCAAACCT AAAAAGAAAA AGCAAGGGTG AACGAAGTAA AAAAGAACTC CTGTCTTTGC ACGGGTAAAA TTTrATATAT AAAAAGAAGC TGGACTAAA TCCTTTGGTT TATATAATTC TCATTACAAG ACGAAGTGGT TIGGGCGAAAC TTATTCAATT TAGAGT'N'CT AGTATACAGT ACAATAAACC ACTTGCGAAG TCACCCTTTT CACGATTAAG TCCTTGAGCA CTTGACGCAA GCCTTCATCT CGATATAAGG A.AGACGTGACA GGATATTGGC AACGGCATAG TTGTCACACG GTCAGCTGTT AGAGCCTGGA CGCATTTG'TT TATGCACAAT TGAGTCTGGA ACGAAAGTCT CCAGTTGCAA AACGATGTAA TAGCTGATGA CACAAAGCAC AGTGGGTAGG CTTTTCAAAA TTTATACTAA ATCATTGATA TCAGTGTAGT ACTGGTAGGT TAGTCAAGTA ACCTTGATAA GTAGTCACAC 'rCAGAGA'rTG CTTGTGCGAA TCCTTTGCCA GCCAAAGCTT TTGGTTAkGGG CGATGGTTGA AGTGCGAGCA ACCGCACCAG.
TGGAGAACAC CGTGTTTTTC ATAGACGGGT TCATCGTGCG TCGATAACGC CACCTTGGTC AACAGCAACG TCAACGATAC TGACCATCTC ATCTGTCACC AATTCCGGTG CTTITGCACC 19920 19980 20040 20100 20160 20220 20280 20340 20400 20460 20520 20580 20640 20700 20760 20820 20880 20940 21000 21060 21120 21180 21240 21300 21360 21420 21480 21540 21600 AGGGATGAGA ATGGCTCCAA TCACCACATC AGCATCTCTC ACACTTGCTT CAATGTTGAA TGAATTAGAC ATAAGAGTTT GAATTTGACT TCCAAAGACT TCTrCTAGAA CTGAGAGACG CTTGGAACTA ATATCTAAAA TAGTCACT'rG AGCACCAAGA ATGTGTACCG ACGACACCAC CACCGATGAT AGTTACTTTT ACCACCAACT AGAACACCAG AGCCACCAGC TTGCTTAGTA AACAGCCATA CGACCTGCAA CCTCACTCAT AGGAACGAGG CTCACGAACA G'TTCAGTTG TTTTTGCTGT TAACATAGCA GGCCATCTGC AAGTAGGTGA AGAGAAGAAG ATCGTCGCGC TAAAGAT'rCT TTTACTTTCA CAACCAACTC TGCTGCCCAA AATCTCAGCT CCTTGCTTTT GATAGTCAGC ATCAGTAAAG CCAAGGGCGA TGCGGGCA'3C CCTTTGGAA CACCTGGTAC AGGAAGTGAG CTCCGATTTG AGCGGTACTT GTCCTTGATT TCTGCTALATT CTGGAGCAGC AAGTAACCGT ATTCAGAACT GCTTCACCAG CAGTAGCGAC CCAGAACCGA GACCAGCATT TGTTTCGATA AG~GACACGAT GGCGACACGG TTTTCGTTAT CTACCTTTCA ATTGACGGTC TGACGN'TC ATTGTATATG TTTTT-ATGCT AGACTAGTGA TTiG'rGAAATT TGCCCAGTAT C~rrGGATGA 'rGCGGAAcAA T'N-rCCAACC AATCAAAGCT 286 GACCACGACT AACTAAGCTA TTTrTAATTTC TTTTGGGA'TT TTGTTTTGGT TGTCACATTC AAACCGCTTC AAA.AATCAAG AAATCAAGCT CTAATGGACG CCGTCTATAG AAACGGAGCG 74=rGACTA TGCCTCGGAC TGGAAGAAAC CAAGAATAAC GAATAGAACT AAAAAGCAAT CTGTTCT'rAA GAAGGCAGCT
TGAACACCTG;
CCGATTAACA
CAGTTCATAA
AAAAATGT
GAAAAGTATG
T'rTATTGCTC
AAGGGTAATA
ATTGC>TCAGT
GGTCAGTTTA
ATTGGCTACA
CAGGTGTGAG 21660 TTGAGATAAC 21720 ATCAAAAATG 21780 CATCCAAATT 21840 GAATCAATAT 21900 AGACCTGTAA 21960 CACGTTACAC 22020 TCTACTTGGC 22080 TTGGAACCAT 22140 T'rATCAATAA 22200 AGCTAGCTTT 22260 CCGCGTCAGG 22320 CTTGTATGGA 22380
TAATCCCTI'G
TG.ACTTGCAC
AAAGTATTGG
GGACGTTGGG
AAGATTGATT
AATCAAGGAT TAACGACAGA AGCCAATCGT GCTGTGATTG TGAGAAJGATA GGGATGAATA ACTTGACTGC CCTTCACGAT AAGGCTAATC AAAGGTCATG GAGAAATCAG GCATGCGTTT CCAGCATCAA AAAGGCCGAA TCGTGACAAG TTTTGCAAAT AAATAAGCAG TTGAAAAGAA AATAATCTAA CAGAGGAGAA AATATGGAAG TCATCGT~CAT CTGTAC'rGGT CTGGGCTTCC CTCCACAAAC ACCTGTCAAA GAGACGAATT TTCCCATGCA GAACCATATG AGT'rCATTAT GT GACCA AGGAAGACTA AT'rTTCGAC TGTTTTTTCT TCCTCTTACG CAA7TATCGA GAAAATCAAA GAGTATAAAA TTGTAGGAGG ATTTTTCCTG CTrAAAACCAG TCCAGGCTGA AGTTGCAGCT GTrTCCAACG AAGAAAAGGA AGAACCCCTT GAACAAGATC ACTCATCGAC CGAAAAGGAA TAATCACAGT AGATGTCAAA GTAGTCGAGT CAATGATGCT AGTCGCTCAA TCTAGCTCAG GAGAAGAAGC AGTTAGTCAA ACAAGGTCAA TCTCAACAAG GAAAACGAGC TCAGGACATT ACGAGCTCAA GAAGGTCTCT TTACAGTGGA TTAAGAAN'T CTTTATTACG CTATN'TCTC GTCTGTCTCT TTATCCAATT ATCI-rGGAT TTTGGTT'rGT
GTGAAGAAGG
GGTGCTGTCA
GTTCAGAAGG
AAAGTTAGTG
CAGACTCGTT
GCCAGTCTCG
ATTGACCATC
GGCATTGGTG
CTCTATTCCC
AGCATCTTAT
AATCGCCAGG
CTGGTGGCTT
ATGAGGCTCT
CGGGGACAGC
AAGAACTCAA
GTGAGGCAAA
GCAAAACAAT
CTAATT-TACC
CTTGCITrGT GA'rTrATGAC TTGCCTGTAG GACAGAGCAA GCAGACAGCA GGTTTACGTT CCTACTAAGG 'rTCTTCAACA AGCAAGGAAA GCAGCTCAAG GGACTGGGAG TGGCAAGTTC AAGTCACTAG 22440 22500 22560 22620 22680 22740 22800 22860 22920 22980 23040 23100 23160 23220 23280 23340 23400
AGAAAAGCTT
TGAGTTTTCT
TGGGCTTTGT
AAGTTCTAAT
GTCAAGCGAG
AAAGACTATG
ATTACTTTGG
T=TCTGCTA
AAT7TTGCGGA
TCAAAATCTG
TCCGTGGAAA 'rCTGCTGGTA TTTTCAAAAT TGGCAACAGA GCGGTCTG TrGAAAGGGT CTATCCTTTC GTGGCAAGTC GAGGAGGAGA AAGAAGCCTT AAGCTTTCGG AGCCAGAAGG AAGACTCAGG GAATTTACCA ACGGATTTTG CCTGATACTA TTAAGGTTAA TGGTGATAGT TAACGGTCGT GC=NCCAAG TCTATTATAA ACTrCAGTCC TCAAGCTTTA ACTGACCTGC ATGAGATAGG ACTAGAAGGG GCAGAGAAAT VIrrGGTGGCT rrAATrACCA AGCCTATCTG GACTCTCAAT ATCAAAACAA TCCAGTCACT 'rCAAAAGATT AAACTTGTCC AGIrACGTC GAAAXGGCTG;T GGrr~GGATT TATGGGCAAT TACATGACAG GACTCTTGCT GGGACATCTG GAATGAGCTT TATTCCAGTC TAGGAIATTAT CCACCTCr
GGCAGTTGGIG
AAGACGCACT
GACACCGACT
GCCCTATCTC
TTGGGCTTGA
ATATAGGAGA
TTCCAGACCC
TTGAGGAGAT
GCATGCAGGT
CCCAAGAAAA
GGACTAACTG GA'ITTTCAGC CATGGGGTTA AGGGCTTGGA .0* 0 0
CCAAACTTTT
ATGACCAGCA
TTGGGCATAT
TTGACCTTTG
TTTGTCCTTT
GGCATTATTC
GCATGGCTTT
ATTAAAGGAT
CCACTGG3AAA
ATGTAACTGG
TCAAAAAATG
TCAAAAGTCG
ATGTTGGAGA
AAGACAGTCT
GTAGTATGAT
CAAGGAAAAT
ATAAGCAATT
TCTTGACAGC
AAGAAGGGGA
TGcCC.ATTCT TcLrT=ccTr CCTT'rCTCTA
GCTTGGTCTC
TAATCT'TATr TAACAGTAT1T
ATGAAATCAC
GAAAACCA'T
GCAAGAAAAG
AGGAGTAGCT
T7NGTCAGAG
GAAACAGAAG
AG'rAGGGGAG
GGGAGATGGA
TCTCTTCACG
AGGTTTTTTC ATGAATGGAT GTrGAAATGG CTGACTTATC ATCGGTTAT'r CGCAGTCTCT TAATTrTTGCC TTGACGGTGC AGGAGGAGTC TrGTCCTGCG GGGGCTCAAG GCTGrTACTA ATCCrrCTAT NrTGCGGAAT TCTTr'rTGAC TTGGTCTTCT TCCAGTCATT CAGCTGAACT GCAGGTGGCA AGGAGACCAC GTTAATTTCC TTGGCTTTGG GAGTTA'rTG ATTACAGGTC CATGCTGGAT GTGGGGCAAG CTCATAGATG TAGGTGGTAA ATGACGACCA GCAATGCCCA AAGATTGACC AGCrAATTTT ATGACCAAGG CTTTCCATGT GAATTTGTGG CAGAACTACA AACTTGCCCA TTTTTGGAAG GGACACGATG ATACCCTAGT GGAAATTTGG AGGAGAAAGG TTAGAAACr TICTCTTrGCGA CCTTTTCCCT TATCTATGCG TGCAAAAGCT ACTGGCTCAA
TTGTCCI'CTT
CTATGC?1'T
GTGAAAGTCT
TTCAACCTTG
TACCGCrCTT
TTATCTTTGA
'rATTGTCATG 'rATCCTGACC
AGTCATCTCC
GTCTATCCT
GTCTATCTTA
ATGGTTAGAG
23460 23520 23580 23640 23700 23760 23820 23880 23940 24000 24060 24120 24180 24240 24300 24360 24420 24480 24540 24600 24660 24720 24780 24840 24900 24960 25020 25080 25140 TTGTCTTTGG TCAACCCAAC TCTATGATI'T GAGGAAAAAC TCTTTTTCCT TACCAAGTAT GAGAAAGTAT TTTCTACGGG GGCAGAATCT TATAAGAAAA GCGAACCTTG ATTCCCTATC GACTAACACG GACAAGGAGC AGGGGAGA'rr CTAGTATCAA GGCGACTCAA ACAAAGGTGC TCAGTTAGAA GTTCTATCTC TCTGTATGGG AAATTCTTGG AGAGAAGGAC TTGCTGAAGC 288 ACTATCCAGA CTTGAAAGTA AATGTrTTA AAGCTAGCCA ACATCGCAAT AAAAAATCAT CAAGTCCAGC CTTTCTAGAA AAACTCAAAC CAGAGCTTAC TCrTATCTCA GTTCGAAACA GCAATCGAAT GAAACTCCCC AAGTTTATCG AACTGACCAG TCGAAAGTGT TCGATAGGAA TTGCATAATA ATGATAAAAA CATTAAAAAT CAGCAAAAGT TCCAGTG;TAT CTGCTGTGAC GAATATAGTA CCAACTATCA GACAGCTATT CTGAGTATAA AACTATTACA GCGGAGGCTA TGCAGCTTAC TC-ATGTGACC CCTTTrAAAAA GGGTAGGGTT TGTTCCGTGC CAT'rAGCAAT CTTTATTTTA TTATGAGAGT TTCGTCTTAT CAAAAACATC GGGAAATGTC TGACTATATC GCTTGGTGAT TGCCATGTAT CAAATGGCTr AGATGAGTAT GACAAGAACA GCTGGTTCTT ATAGAGTAGT AACCATTCAT GTCTATTTTT ATTGAAAAAA TTGTATCTGT ACTCGGTGTr TGGAGAGCAG GTTGGAAAGT AGAAACTCTC CCAAATGTCT TAGACGTGCA AAAAAATCTT GGCGCTGGAA AGAAGCCTAC TATCAAATGA CCCGACI'GCT AGTTGGCGGA GGCTGGAACT TTAATTCTTA AAGAGTGAGA TTAAAAACGC GACAAGTCAT TATGGTCTTC TTGCTATCAA TTAATTCCCA TAAGTAGTGG ATTGAATGGC TGGATGGAAA TGGAAGTCAG GTCTrGAACTT AGTCTTrCCCA TTCGCAAGTA TTCCTCAGTC AGGCCAAATG TATCGACAGA AGTTTTTTGA TTAAGTTCCC ACTATAAGGA CAGGGGCAGA TAGAAGAGGT GTTTTCAAAA GCCGCTrAPA ACCTIVATT TAAATAGTCA CGCATTGCAG CCAACGAGAG GATACCAGCT CGCAGGAATA ACGCTCGTTA TGAGGTCATA AAAAGGAGGG CTAGATATGT CTTGCAAGAT GTGGAT'rTCA TGGCTCTGGA AAGACGACCC AAATATCGCA GCCCCTCCTT CTTAAGTGGG ATGGACTACC GAGGGATGAA ATCGCCTATT TTCCTTAGGC ATGAAGCAAC CTGGCTCATG GATGAGAT'rA TAGGCTAGCA CAAATCGATA AGAGTTGGTT GATGTCTGCG TTAGTTTATG AAAGATGTTA CTCGATTGTC TTAGCTTTAT GACTGCAAAC TCACACAGCT GGCTATCALAT GAAAATGAAG CCAGTTTGC1' AAAAATAATT- TCTGACTTTA TTAAAAGAAG CATCAGGAAA CATTGACACG ACTGGAAGGT ATCAATAGCA CAAGGAGCrA TACGTTTTAA GGCZIrGGAT AG'rTGGAAAA GGATAAATGT TGTAGATTAG 
TGAAATAAAC TAAAAATTTG TGGTATAATG AAAACGTATT CAATATTGAG GATATAAAAT TGTTTTATTA GTTAGTTTAT AATCTAT'rGG TCCrrCAG AGTCACTAAA AGTTACAAGT ATGATTGGAA TACGGTTTGG CGACCATCAG TATGCTTGGA TT'CCGTCATG GTCTCGTTAT 25200 25260 25320 25380 25440 25500 25560 25620 25680 25740 25800 25860 25920 25980 26040 26100 26160 26220 26280 26340 26400 26460 26520 26580 26640 26700 26760 26820 26880 26940
TTGACGCGAA
TATTTGCAGT
AGCCCTGGCT
AGACAGAAAT
GGCAAGATGA AGAGAAGAAT TATGAArrTG TAAAAATGGG GGTTGACCGC GAACGGAAGA AAGCACATAC TTTGGAGTTT CCGACCCACG TTATCATCCC AAGT'rTGTTT GTGGTTGCTA TTTACCAAGC CCTGTATCCC 1TTGAACATAA GGAT'rGATCA GATTGTCTGG ATTTTAGAGG TTAT'FrTTAT GCTAACACAA CTATTTGCAG ACTTATATCC TG? TCAAAA GTGACATTTG ATGTAACTGT GCTGTTTATC GGAATCTGTG GTGG?1TGG ACAGTrAGAT TATCCCTACC CTATTGGGAA AATACAAGAT GTATTATTTC TCGTCA'rrGT GGAAGTTGTG TACTTGATTG TCTI-TCTrTC ACTCATTGGG ATTGTTGGCT T'rCAAAGGA'r TGCACATCTG ATTCCCTTTA GAAGATTACC TAAGCAGATT GATAATGTCG TTCCTTGCCT GATTATCTTT TTGCTATTGG CACAGAAAAA. AGAATTTTT AATAGATTCT AAACTAACAT AGAGAGGGAA TCAACTTGAT AATACAAACA CAAACTTTTC AAAAAATAAC GTATCATATA AAAGTTGAGA AAAGCAGAAG AAAGArATCA AAATCATCTG GACACAGCTC CAATATCCTC TC7 TGGAGTT GGAGTGGGAT GCrFI-rCTTT TCTAGTGGGA AGTCTGATAA CAATTTATAG C?1'AGTGAAT CAAGAAGTAA CTCGCTTGCT CTTAGCTTTC TTAGCCTTTA CTTACT'rTTT CAAGCAAAAA ATGCCrGTCC TATTGT'TTGG 'rATCCAAACC ATTCAGCCTC CTTACT'rGCG
ATCTAAATTG
GAATT-CTA'T
TTCAGTGGAG ATTTTATCTG GAGCATGGGA ATGGTCTTAC TATTGAAAGA TGGGGAAGTT AGCTTTCCTA TAGGTAGGGA AAATAAGTAA TCTCTCT TGATTCGAAA ACCAAACCAA TTTTTATCTT GACAAGAGCT ACAAAACTTG TGAGAGCTTC TCGCCTTGTG ACATTAAGTT GCCTGGCCCT ACGGATGAAA AGTTITCGAAG AAACGCTATC ATAACGTGCG GGCTTGTATA T"TTACAAGTC CGCTATTGTT TTCTCTAAT AAAACAAAAG AGGTGAAAAC CATAGCAAAG CAAGACTTAT TCA'rCAATGA TGAGATTCGT GTACGTGAAG TTCGCTGA'r TGGTCTTGAA GGAGAACAGC TAGGTATCAA GCCACTCAGT GAAGCGCAAG CTTrTGGCTGA TAACGCTAAT GTTGACCTAG TATI'GATTCA ACCCCAAGCC AAACCGCCTG TTGCAAAAAT TATGGACTAC GGTAAGTTCA AATTTGAGTA CCAGAAGAAG ZAAAAAGAAC AACGTAAAAA ACAAAGCGTT GTTACTGTGA AAGAAGTTCG TCTAAGTCCG G
INFORMATION FOR SEQ ID NO: 23:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 7147 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
CCGCTCAACT TTTGCAATCA AGGCTAAGTAGACAGCAGCA AA'PTTCATAT TGTATAATPT CTGACTCATA CTTCTCTCTT TCTATGTGTA CTAGTATAAA TAAGAAAAAG AAGGCCGTCA 27000 27060 27120 27180 27240 27300 27360 27420 27480 27540 27600 27660 27720 27780 27840 27900 27960 28020 28080 28140 28171 120 AGCCTTC~rr TGATrrArrC
GCGTAGAAGC
TCrTTTCATGA
AAC.ATGTGC
GTGTCAACAG
CGAGCAATAG
GTATCATAGC
TCCATCATTC
CCTTCAAAGA
GTCATAGCCT
TTCATCAAAA
7'rTTGACCAG
AGAATTTCTC
CCAGAACCTG
C-ACCTTGCGC
CCAAGATCAA
c1'CCTTCCAT
AAGAAGTCGC
TCAATAGTTG
CA'rCTGGCAA
GTTCATCACT
GCCAGGTATC
TGGTATCCAC
GAITTACAA'r
CATGAGCTGT
CTGCTGACAC
TTGACCCAAT
290 TTCTGCTTCA TCN'CTGTAA AI'GACTATT GTACAAGTCA CATCAGTTCC TCATAGTTGC CTTGCTCGAT GATATTTCCA G'rTCCATTT CGGATGGTTG ACAAGCGGTG GCATGC CAAACGGTCC ATGGCVNTT GGATCAATTC CTCTGTCCGT cTCATCCAAA ATCAAAAGCG GTGCATCCTT AAGAAGGCCA 1-~r=TT~ ACAGACAAGG TCACGGT=C ATCCAAGATG GGTCATAATA AAGTGGTGAA TTCCCACAGC CTTACTAGCT AATCCCTATT TGATTATAGA TGAGATTGTC TCGAATAGT'r CTrGCAAGACC ATTGAAAAGG CATCATGCAC TTCTGAACGC ACCATCAATG CGAATACTTC CCTTATCAAT CTCATAGAAT CGTTGTCTT1A CCAGCCCCAG TCGGCCCAAC AATGGCAACC
S. *5 S S TGTTCAATAA CTGCCTCCGA TGACCTr'rGA AG~rTCATC AAATCTAAAA CT'rGATTAAT AGTGCTCCC-A TGAGAAGGAA ATGTCACTAA AGAGAGGCAG ATCCAGTAAA TCGCCACACT GCCATAAGAC GGTrGACAAA TTTTCArT GATAATCCTC CGCAGAGAAG TCATAGTCTT GTCGTAGAAA CGTGGAA'rCA AAAGGCCACT GTTTGACCAG ATT'rGCCGCA TAGCGgAAGG AGTCAGCTGC ACTTGAACAG CCGCTTAGCA GAGACCATAG GCCCATGACA ACCTACATGG ACGCGCTATC GGAGCAGCGT CAAACCACTT GAAATCCCCA CAAATTCAAA CGGGTCAAT TGCATTGTAG GCACGAACGA TTATCTGTC AGCCCCtGAA CGTCATCAGG ACGTTGATAA ACCTAAAATC TTCCCAATAG GCCCATAGTA ATCAACATTT AGGAATTGAA AATTTCTTAA ACTTCTCAGC CTAC'rAGTAT TACGGACAAG AAGGCAAGAA ATCTAAATTA GTTTCTTGAC GAACATTGAC ACCGTCCACC GATTGACCAG AGTTGATTTA TTTCTGCTTT AAAGCTAACA TCACATCCTT AAACTCGACC GGT'rTTGGAT AGAAGAATGC TTCGGGGAAG AACGATGAAG CATAAGACAT GAAAACA.ATC CGT'rAATCAC ATAGGCCCCA TCATGATAGG ATTCAAAATA CATCATTTAC TGCTGCAAAT CACGAATACC TGTTAAACTC TCAAG4GACTG TTTTGGAAAG TCACTGCCAC AAGTACGCC CCCAGATAGC CATAATTGAA GAACTTGAGT AATGrCATTG TCTCTGTCTG CGAGTAATCC AAGAAGCCGC CACTCGGGAT AGGACATTCC CATCATCATG TACCTAGCAA ATCCGTAATT 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920
TCACGAGTGA
GCTAGCGTCA
CAGAGCCAGT
CCACGCGT=A
GTAGTACGCG
TACTGTTCAG
TCAAAACGGT
ATTCTGAATG
CCACTTGCAA
TCAAGAGGCT
AAAACTCGGT TAAAAATATC GCAAAAAA'rC CAACTGCAAC CTTGCCGACT GCCACAACTC 291 TTCGAGATAT AGGTCGGCAC TrTCCAACTCT AGATAGACCG AAAAGCAAGT GCTAMTAAAA TCATCCCCCA rTTT~'TCTA CTAATTrCTTT TGGCTAAT'T TCCTCCTAT CCCrGATAT TTTG.CCTGTA GTTGACCGAG AACCTCCA ATTCATCTTC ATCAA'rGTCT TCCA'rCAACI' GCTGTCTAT GCGTTCAAAA CCTGTTGCAT CTGAGAACGT GC rGrCCG TCAGACGAAC AAACTTACC CAGGACTCGC CTCCAATTCC ACCAAACCAT 'N'TCACTAT ACGCTTAACC CAACAGGCTT GGTAATATTG AGTTCCTGCT CGATATCIT AATCAAGACC
AAAGAGAATG
CTTTATTCTC
AAAATCAGTA
AAAGCCTAA
CGCTTATCAA
AGATI'ACTAG
AAGTCTTGGT
TT=CTCGCG ATTATCCAAA AAACGCACAA CCTGACCTTG TGCCGCAACG TTTGGCTTCC TTTTGCACCA TCAGGTGAAT AGACTAACAT CGGT'1rATCC ATAATCI'CCC CCTTCTAAAT TAAT'rAAATT TCTATGAGAA CTATTTICTT GATTAAAAAA TTAGGATCAT GT'rCTATAGG T'rAAAT'rAAA ACCCATCTAC CGTCTTCGTC GTCTTCAAGA ACGCTGTAAA GTT'rrTCAAA ACAATTCCAC TTCTGACTGA GGAATCATTT CCAATTCAGT CGGCCCACCC ATAAATTCAA TTGATGACCA AAACGCTTAA AAAAATAGTT CTCTGGAGAA ATCCCAAGTG ATTTTCTCAC G'I-CGTA'rAA ATCTTTrGGA GGTTTCAAGG TCTTCGCCTG CACTTGGAAT TCTTCAATAC TCGCGCTGTG TAAACTCTGA
U S 5.55 *5 S S 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 CAGACTCACG GAGGGCAACG TTrGTAccT~TT GTGCrrcl AGACTGCGTC CGCATCTTCA AAACAGAACC TGAAGCGCCC CTGCTGTACG GTTGACGTTA CAAAACCTTC GTAACGTCCT TCGCTTTA'rC GATAATGTGT TCA.AAGCTGA GTTTGATTCT CACCAAATT TGCATATACT TATTGGCCCA 'TTrACGTCCC ATAACACAAG Tr'T1TTGAT GGATAGCACT TTACCTGCTA ATAGCCTTGT GAAGGTCAGT
ACGTCATCCX
CCTCCAAATA
ATGTT'rCCGC
GAAGTCAAAG
'rC'GTAAAG
TTTGGCACTT
GGATCTGGAT
TrAGAGTTAG
ATTAGGAATC
TTTCACTAGA
AGATGGTCTT
AGACCATTAT
GTTTGAAGTT
TACTCCAGAC
CATCCACATC CGCTTCGAGC AATTGCTCAA CAATAACACC TrTGT'rGTCA AAGAGGTAAG CGTrTT'rACC AAAGGCTGCA CGGACATTGG TATCCACAAT TAGCATAGAG CCATTTGGCC TTTCGTCTGT GTTrCCT'rTG GCTTATCAA CGGC7T=GT AGCACGGTCG ATAACGAA'T CACCTTTTTT AGCTGCTACA TAGATTTCTA CTCCATCTTT AGCCGTTTTC TTGGCTACGA TCCTTTTTTC ACAT=TAAT CTTTCTTATT
GGAAATGGAT
GCCTrTTCTAT
CTGCA.AATAG
TTTA?1'AGCA AATCAAGCTA CTTTATCAAC AGCCACTCAT AAAGTTTCAG CCAAGTTTGA
CCACATTCAA
CAAAGTCAGC
TGCACCTTTT'
AAAACAAACT
TCAAATTACI'
GTTGACAGTC
TGTAGATATA AGCGACAAAA ACAATCATAC ATATCATAGT TCAAGTAAAT ACTTTCAAAT TCAACAC7TC ?1'ATAGGCGC ACCTCTAATA CTCAATAAAA AACACTGTTT TGAGTGCG AATTACGTG TTCCCAAGAT CTAGCAATTG ATG CAT ACAGCAAAAT ATNTCCGATA GGATAATCAA TCCCCTGTAT rGrrGATAG A'rrCATITA ATAAACTI1TC ALGATATCCGC TCCTAGTCAT TTTCTACCTT ATCGTCCGCT TACGCACTAG 292 TATTGTATI'C TAAGAAATCA ATAGAAGAGT TrCTAAGCAA ATCAAAGAGC AAACTAGAAA GCTACCCTCA GGTTGCTCAA GATGOGGCTG ACATGGNTG AAGAGAN' CGAAGAGTAT GvGAGAAGTTA GACTAGTACA CTGGCACrC TAAAACATTG ATTTAATTC A?1r1TTCCA 'rAAAMrGGTA TTAGATATAA CGTGTCGTTC TTGAATNTCC AATCATCTAA AACAAGTAAA ATCAAGGAAT TGGCTACCCT TTTTACTTTT TTACACATTC ACATCACGAG CATACTCCAA TGGAAATCGC TAGGCAAGAG AGAGAGATCA TCGCCTCTTT TTGTCGCAAG CATTCTCCTC ATCTTCTACC TGAGGATAGA GAGTTGTTCC CCAAATAGAA TGGCAAATCG GTrTTTTTCAT AAACCGTACG CCACCATTCC 0 0*e 0 0* 0 0 0 0 0 00 *0 00 0 0
CAGGCAAGCC
GGAATACTAG
GTCAAGACTT
GCAACATAGG
cGG'rAcAcTC.T'cTAATrTTGr TGGTAAAGTG AGCCGTTAAA CCTTACCTTG ATGATCATAG CACCACTATG ATCCAGCAGT
ACAGAGAGAT
TCCTGCCCAT
TACGAACAT CCCTTrAA.A TTCrGTCCCA AGCCTTAGGA 00 00 0 0 *000 0000 0000 0 '.00 00 *0 0 0 0 GTCTCAACAT AGTCTGTACT ATTTTGAAAG GrrGTATAGG AAATCGGCAA GCCrGGATGA AAGTCCTCTA CCATATCCAC CTrGCCTGTT CCTAAAATAA CCGCCTTCAC TTCrGTATTG CCTACCTTGA CTCCTTTrA'r CAAAGCTTCA GTGGTIYTCCA ACTTGAGATA GACTTGGCGC GGACGCTCTG CAGAAATTCC TCTCTG7TTT ACATCrCCTG GATTTTTAAC AGCATCTACG ACAATCTGAA TCTGCTT,c GCCTGAATGG CCTGTCTrTT CAAAGTCAGA ACCAAACTTG ATTTTCAT ACTGCAT'TCT AGCTGGGACA TTAGCCAACA AATCGT'rTAC CGCTCCGCGA AGAAAGCTAT CGCTACTTGC CAAACCAGGC TCGACCGCAA GAAGAGTGGG ATrATTCTCT CCAGGATAGA GGCGACTGTC GTTGGTACT GATAATTCAT TCCAAGTAAT ATAATATTGG AAATCTCCGT TTCTGTAAGC 'rGTAACCTrA GTCGCAACTA CATTGTCACG TAAAAAAGAA TCTGCTGTAA AGCGACTGCC TTCI1rGAATC ACAACTCGGG CACCCGAACT TGGGTCGCCC TCCAAAA'rCT
AAAGCAGCCT
CCATAAGCAA
AAATCCTCTA
CTGACTGTAT
ACAGAGTTAA
ACCTTGAGTT
TTATTGACCT
GTTTCCACTC TGTCTGAGGA CTACTTCATC ACTCTTACTC CACTCGAAAT ATAGACCAAA CCGTTACAGT ATCTTGAAAC AATAAATCTG Cr'rAAAATTA AATCAATATC AAGAGAATTC GTrCCA'rGCT GTGAGCCGTG GACCATAATC TTGATGCCAC 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 AAATCAATAC TATAAGTCAT CGGAGCACGA AACAAGGTCT CATCCACTAC GAGAAGTGCT G'N'ACAGAAA TATCACNrGT ATTTGTCGAC AAGCTCCGCT TCTTCTTTC GATAACAACA AACTCATCGG CTAGCITGATT ACCCTCTTTG ATGAAACGAT TTTCAATACT TTCTCCCTGA TGGGTCAAGA GTTTCTT ATCGTAATTC ATAGCTAGTA TAAAGTCATr TACTGCTTTA TTTGCCATCT TCTACCTCCT AATAAGT~cc TGGATTGAGT 'rGCA'rAAACT CAGACTGTT CAGCGAAATC AGCCGTGG?1 GGACTAAGTA ATCCAAAArr TCCTCGTACA A7TCCTGA GACATTGCGT CGCCGTCTGG CTAAA3A AGTCGGAATG ACCGTATTAT CCAACATAAA TGTAAAAGGA TTACGAGCTA GATCCGGCTC GGTCAAGAGA GCTGGTTI'GA AGGTCTGArr GTTGTCAATA CAGATATACA TATGATTCrr AGGCAAATAA AGAGCTACAA CATCTCCTCT CTAATGTAAC ATAGACAGGA TTGACCAAGT GAGTNTGGCG CGCTGTCAAT TTAGTAGCAT CAATAACCTT GACAATATAA TCCTTCTCCA GTCTGTCCTG AGAACTCAAG AGAGAAGGAT CAAAAACAGG AGCTAACTCC 'rGCCAATCTG C'TTTGCGAGC TGATACTTCT TGGTTAGCCT GAAATCGGGA AACTCCAGAA AGCAAA'IT TT-CCCAGACA TAAGTCTTTC ATGTCGCTrT AGTAAAAGAA AAATATCTGG CACTGGGTCG TTAACCAGAG CAAGGGCGAC AGGTAGCTGG TTTTAACTGG AGCATCTGCA CTAGCTGCCA TAATCTCCTC TGAGATAAGA CCTTCGCATG TTCCTGCATA TTTCTCCGTT TGGCATTGAT CAGIT'T7rTG GTAAGAAATC CAA'1ITITCCI' ACAGCAAGCC GAGACTGGAT AAGGCAACCG GAATAGCACA AAAGACAGCT ATGAGGAAAC
TTTAACCAAC
CACAGCCAAA
CTTAATCAAG
CTTCTGATTG
CT'TGTCTCT
AAGCTGACTG
TTTCAAGCCT
ATTGGCTAGT
TGAGAGTTAC
TCTCTAACTG
CTCTAGCAAA
CGCTCATCTT
A'IrGAGACAT CT'rCTTTTGA
TCTTGACAAA
AGAGGAATTT
T'rGCATTATA
AAATCAAACG
CTGCCACGAT
TCT'11'GTTTT TAGTCATGCT TCGCTACTAA TAGTCGGAAA CAAGAGCACC CCCTTTTCTC ACI'CAGAATT TCCAAAGTTT CAATACAAAA TGCTTGTCGC GTAAATCCAC ATCAGATGT CCCGATAGTC TGATAAAAAT TGTCAAG40CT AGAAAAAGGG TTTCCCCTCC- AAGTLTTT'rTA CGAGAAATAA AAACCTTTCG TAAGAGCTCA AACATTTGAT TTCCTTATCG GCTTCTTTTT TTCCTCTACC TCCTrACTCTT CTGGATACTT 'rCCCACTGGT TAGGGCAAAA GCCTTGGTCT TCTTT~CTCC AGGAGTTGCG TACCTTATCT AAGTCAATCA AGGTTGGTCT TTCTATCATA AAGTTCTTGA CCAAACG;TCT 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7147
CAAATTGATA
GTAGCGATCA
TTCTGTTrATTr
ATCCCCTCAC
CCTGGCATAG
TTTAATACCT
TATAGCGCCT ACGATGTGA GCTCTGTCAG GTACA'rTTCT
ATCAGGG
ACGCTTTTCT TTAAAAAATG AGCTATCTGT ACGTCTAATC TCTGGCGTCA TATTCGTAAC TCCTTTCA'N' TACTTTGATA INFORMATION FOR SEQ ID NO: 24: Ci) SEQUENCE CHARACTERISTICS: CA) LENGTH: 755 base Pairs CB) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: CCGCATGGGA TTGGTGTCCT TTTGGGCAAT CTCrT-GACC AAACTGGAAA CATGTTTTAT 6 GCGCCTGCCT TTACTGCCCT TGTCGGCGGT ACGTCTATAT GATCCTAGTC GCAAAAGTTC 120 CGCGCTTTGG AGCCATTACC ACTATCGGCC TTGTCATTGC CCTCTTTTC TTGGGAACTA 180 AACACGGTGC TGGTTCCTTC CTTCCTGGAA TrATCTGTGG CC'rCCTAGCA GATGGAGTAG 240 CTCATTrAGG AAAATACAAG GACAAAACAA AGAACTTCCT TTCTTI'CATT ATTTTCGCCT 300 TTAGTACAAC AGGACCAATC TTGCTTATGT GGATTGCGCC CAAAGCCTAT ATGGCTACTC 360 TTCTGGCAAG AGGAAAATCC CAAGAATATA TCGACCGTAT CATGGTCGCr CCAAACCCTG 420 GAACTGTCCT TCTATTTATC GCAAGTATTG TCATCGGAGC CCTAGTGGGT GCCTTGATTG 480 *GACAAGCCTT GAGTAAAAAA TTTGCCCAGA AAATCTGATC AGTTAAAAAG AGCCACGCGG 540 *CTCTTTTTTA TTTATGGCTC AATTTCTTAG TCAAGAAATC TCCCAAGAAT TGGATTGCAA 600 *-AGATAATCAA AATGATAATA ATGGTTGCCA AGATGGTCAC ATCGTGA'rTG TAGCGGTTAA 660 **SATCCATAAGC GATGGCTACG TTACCGATAC CACCAGCTCC AACCGCACCG GCCATAGCTG 720 T TtcCCAACA AGGGaAtCAA GGTcACAGTC GTCAC 755 INFORMATION FOR SEQ ID NO: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 3010 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: TTCAATTGGT ATCTCAATCA ACGGTCTTCA CATGGTTTCA ACTGGTTTGA CTCTTGAAAA AGCGAAAGCT GCTGGTTACA ACGCAACTGA AACAGGCTTT AACGATCTTC AAAAACCAGA 120 ATTCATGAAA CATGACAACC ATGAAGTAGC AATTAAGATT GTCTTTGACA AAGATAGCCG 180 *TGAAATTCTT GGTGCCCAAA TGGTTCACA TGATATTGCA ATTAGCATGG GAATCCACAT 240 GTTCTCACTT GCTATCCAAG AGCATGTGAC AATTGATAAA TTGGCATTGA CAGACCTCTT 300 295 CTTCTTGCCA CACTTCAACA AACCATACAA CrACATCACA AAATTAAAAA TGAATGAGCT ATCTGGCCrr AAGrrAAGGT 'rGTCCCCATA CAATTATAGT TnT11TATCT T~lrGCrCA AAAGGTACCT ACCAATACAA ATGATGAGGA TAAAACAAAT TAAATAAAAA CTGGCACAG ATGCrCAAGG GTGGTGTTAT AACAGGCrCG TATCGCAGAA GCTGCTGGI'G CGGCAGCTGT CC GCT CATAT TCGTGCACCT GGAGGAGTTT 
CCCGCATGAG AAATCCAAGA AGCGGTTACT A?1'CCAGTAA TGGCTAAG AAGCTCACAT I-rTAGAGGCT ATTGAAATTG ATTATATCA CAGCTGATGA CCGTTTCCAT GTGGACAAGA AAGAATTCCA CTAAGGATTT GGGTGAAGCC ?rGCGTCGTA TCGCTGAAGG ATGGCITGCCC TTACGGCTGA CAGATAGTIr TrAGCTAATT TCr'C.=CTGA CTTAAAATGA GACITGAAAAT CG?1'ATGAAC TATrGArGTG CAGAA'rCCTG GATGGCCTTG GAACGAATTC CCACCCAAAG ATGATTAAG CAGAATCGGG CATrTTGTTG CGAGAGTGAA GTTrATCTC AGTTCCTTTT GTCTGTGGTG TCrCCATG ATTCGTACCA AAGGAGAACC AGGGACAGGG AGGAAATTCG CCCCATTCAA TGCAAGTCCC TGTAGAATTG ATTT'CGCTGC TGGAGGTGTT CAGAGGGGGT CTIGjTCGG1' GTGCCATTGT TAAGGCTG1'G AAGATTTAGG AGAAGCCATG AACGAGGAAA ATAGATGAAA CAAAAGTGCT AGATCAATTA AGCAAGA'rCA GAGTGACITG GCAAGCTCTT ACGTGACCAG TACCAGTGTT TGGGACCTGT AAGAGAGTCA TCTAGGAACT TAGGAAGTTT C1'ACACGGAA TCCGTGGTCC GATTATCAGT ATCAAATTG'r TGCAGCCCAA CTGATGATGT GCCrTGCAC GAATTTCTCA AC?1-rTTrAC
GATATCGTCC
AACTTACGTG
GTCCAATATG
GCAACCCCAG
'rCAGGi A'rrT AAGCTGTTCG TCATATGCGT ATGATGAATC AGGACGAGCT TTATGTTC? GCCAAGGATT 'rTCAl'GAACA TGGAAAATTG CCAGTTGTAA CAGATGCTGC GTTAATGATG CAATTAGGGG TCAAGTCAGG AGA'TCCTGTT AAACGAGCGA 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 ACTAACTTCC G'?AATCCTCA AATCCTAGCT CAAATCTC1'G GTTGGTATTA ATGAAAATGA AATCCAAATT CTCATGGCTG ATCGGAATAT TGGCCTTGCA AGCGGCCTTT GCAGAACATG GGTGTCGAGA GTGTAGAACT CAGAAATCTA GATGA=IC TCGGGTrTTGA TTTTGCCTGG TGGTGAGTCT ACAACCATCG AACATGCTAC TTCCCATCCG AGAAGCCAT'r CTrATCTGGCT GCGGGCTTAA TTITTCCTGGC TAAGGAAATC ACT'ECTCAGA ATGGATATGG TGGTCGAGCG TAATGCTTAT GGGCGCCAAT GCAGAATGTA AGGGAGTTGG CAAGATTCCA ATGACCTTTA AGTGIrGGTG AGGGTGTAGA AATTTTAGCA ACAGTGAACA GAAAAAA.ATA TGVTGGTAAG TTC~r'rTCAT CCAGAATTGA CAGTACTT'rA TCAATATGTG TAAAGAAAAA AGTTGAGATT ATGTAATAAA CAATAGCGAT GTATTGAAGT GCGGACGCAG 296 CTAGGATAAA GAGATGCCAA ATCATGTGGA AATAAGGTTT CTCCAACTGT ATAAC.AGAGT CCGCCAGTTA CCATGAGACT GACTGATAAT GGCAGGAATG ATAGCCAGAA CCAACCAGCC GGCTAAATTT CrCATTGACC TT'N'TAGCAA AGATI?1'ATA TTCCCCATTG GATGACAATA ATCAGATAGC CAAACCAGTT CGGGCGTGTA TGAGCCGGCA ATGGCA.ACGT AAATCATAGA CATATTTGTG GGTCGAACCA TAGGCCATAG AGTGATAAAT GAAAGAGACT GATGACGAAA ATGGAAACGC CGATAGAGGA AACTATAGAT GGATGAAATA GGCAGCAAGA TAAGCATGAT TCACGCTATT AGCAATCTCC TCTCCAAAAC TGAGTTGTTT TCATTGGATT ACCTCCTCTT GAGTATGATC GATTAAGTCT AACGGTTTGG CAGCTGGTTT1 GGATAATAGG GTrAGCTGGG GTCCACAAAA GCATCGTAGA GTTGGTCTGA ACTTGCTTGA CTGGGCTATT rC1rCAATAG AAAATACAGA CTTGAGGGTT1 AATCTGTTGG CGTTGGTATT TTTTTTTGTC! 
AGGCTTTGTC ATTGTrGACC ATAGATGCTG TTAGGCCCTr GTCTTTATTA T'IrT=GGCA TAAAATCCAG CCAGAAAACG GGTGTCGTTT CATAATCAGG TAAAGAGCAA GAGAATACCA AAGATGGTCG ATrCATCAAG GTCAAGACAA ATGGTCAATG ATTCGCAAAA GGTGGATGAT AGGAACATGA TAAAAATCCG TGTGCTTCAT GACTGCACCC ACAGCATGGG GCTGAGTTTA AGACTAGTGT AGAGTrTGAT GATAGAGTTT TCAATTCCTT GGTTCATGTA GTTTGTAGAG TATTAAGTGT GTGATAGCAA TCAAACGGGC 21.00 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 TTT'rCACATA AGGTAACCAT GGAGAGATAG GGGCGCAGAC 3000 3010 CTGAT'rGACA INFORMATION FOR SEQ ID NO: 26: Ci) SEQUENCE CHARACTERISTICS: CA) LENGTH: 15213 base pairs TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: CATAAATCGG TGCAAATAAC TTAATAGTGA AGTAGCCATT TCTTTCGTAT TTACCTGAGG CATATTCCCT AGACGAAAGA ATATTATTAT CAATCAAATC ATTGAATGAA CGTAGTCT CAACTTCTTC TACTGTTAGA TTTCTGACAA. CATT TGTTGC ATAGACCTTA TTTCCATCAG GATCAGGATG GTACTCATTT GTAACPTTTC TAAGAAGTTG TTGTTTPTGA TTCGTATCCA ATTTAAGAAT TGAATTTCCT TCGAGATATT CCAACATATA AACAACGTCA AACATGTTGT GGACATATTG CTTCAAATCA TCTGCATTAT TANATCTTGT AGTTGGATCA AGTACTTGTA ATCGTCGACT TTCTGTACTA TCAGATTNG AATGTTTCAA GATCGAGTTG ATGGTAATGG TCGCATCATC TGGATGGCT GGTGCTTGTA AGCCAC?1'CT 'rCGACCATAT CCTCCAAGAT CATGCGTATA AGTAATAGC CCATCCTTAT CA'rCACCTGT AGCATAAGCA CCGTGTTGAT AGAAATGT3TC CATTGCAGGA 7nTGGAT'rAT CGGTATTATC ATCGCCAAAT TTATAACCAT CACGTCCATT GTCGTCTAAA ATACGATACC CTGTTTCACG CGCATrTTCT TCAACAAAAT TACTGCGGTA GCGATCATAA GCTCCAAATC
ATAATCCTTT
AAATGTCCTC
cCAACATTCG
TATGCCCAAC
CAAAA'rCTGC AGCAAAGAAC TCTGGTCCCA ATCTGAGTCA TGTGTCATCT ATAACCCATA TAATAAACTG rrATTTCC-A ACAGGTCCAA CACTTCT'GTA GCTTTCCCTA
CGTAAAGCAA
AATAATCGTA
CA7TTGAGAGC
CTAGACTAGA
AATATTTCIA TAAAGTT~ GTGATC1'CGC TGACGTTTGG CTTGCCCGCT TTATGGTCAC CATGGTCGAG ATGACAAATA GCGGTATTTC CATGTGGCAC AGCTAACCCT TGCTTCGTTT CGGATCTCTC TGGCAAGGTC AGGAGAGGCA AGACCATATT TCGTGATACG ATCATAAACA CCGATAGAAT ACN'GGTGCC 0 0* 0 0000 0000 0 00 00 TCACCTCTTC GATAGTGGAT TTTTCTTCGA AGTCATTA'N' GCTTGTATI'T GGTAAAAAGA AATCTGTCGT TCCATGTTGA CTGGCAAGAC CTAGAAGATT GTTAAAGCCA GATTTACCCA TCCCCTTACC AAAGAAGTCC AAATGGTACA AGC1'AAAGTT ATACCACCGT TCCAGATAGG GTTrGA7=r' ATCTACAAGA TAACCTTCAG CCGCTGACAA GAGTTTTTC AAACTGTCTT CTTCTAGATA GAGCTCAGTT TCCT1TGACGT CTTCTGAATG ATIAGTCAACC TI='rcAAGT CAATGTAAGC CTrACTCTCT CTTTTCGGTA ATGTTCCAGC TGATACCATA AGTATCGACA ACTCAATCAG AGTATCTAAT GAACTAGGTC ?r'rGACATTC 'rCAAGCCAAG TAGCAAGGCT TGACGGGGTI' AGCACTAGCC CCAGTTGTTG TTTGTTTG TTGGAGAAAT ACCCAGCGTC CAGGTAAGAC TTGCTTGATG
GTGCTAAACA
TTATTCTTAG
GGTGAAGCAT
ACCTGACCAT
TCCTTGTTGC
AGTCCAGCAT
GCGAACTGGT
TTTCTGATGG
ATAGAGGTTT
480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 GGTCATACAG GAATTGGTT'r GGCGTATAGA GAAGTCCAGT AT'rGCCCAGA CTATA'rrC-G CTAATTTGGC GAAATCATTC TGGTATTTGA GATCCAGCTT CTCAGATAAA TCATCCTTCT AGTGAAGCAA GAGTrTTGTTT GCAGTCTGTT TGTTAGAAAC AATGTCTGTG ATGACTTGGT TGTCCTTCAT CA'rGACTGCT GACAAGAGTT Crl-I-r-GATA TAAAAGACTG TrCTCATTGA CCAGGI'TCC GTATTTGACG ATGGTTGCCT TGTTGTAGAA AGGTAGCAAT TTIrCAATGT TTTTATAAGT CAAGTTGCGC TTAGCI'TGAT AATAGGCCAC CTITAGAAAAA TCACTGTCTT TTTTGCCACT TGTTGAAAGT GGCTCCACTG TTGGTAAAAT GAGAGGArrG ATr'rCrGCTT TTTTGCTTGC AATTTGAGAA GCATC2'AGCA TTGTTCCTCT TTCTTCAAAG GATTCCTTGC
TGACGACCTC
TTCCTTTAC
TAACATCGCT
298 ATCCTI'GACC AAGGTGACAT TGTAGACTCT G?1'GGCCTTG CTGCTGAATG CTTCATTTCG TTATAGTGGT AACCAGTGAT GGCATTCC IrGAAT GAGAACATIG GTCAAACTC CAGCATIGCCT AACATCACCA GAAGTTCGA'r
CCCACAAATT
CTTCAGCATA
CAGTCTGATC
CTAAAGCCT'r
TTTCATAAGT
CCTTAGCCAG
GCTTAATTTT
AAATAGCAAC
GAGG i rT L
CAGTAAAGGT
TCAGGCTGGC
GCCTGCCACT CCAGCGACTC TACCAAAGTG GCTATCTTGG ATCTGTGCAT CTCG(7rCTAC TGAAGTATTT GTGTTAGATG AAATGGCTAC GTCACCTG'IC AAATGACCGA CCATACCACC GTTGATAATT CTTCCCTTGA AACTGCTCTC CAAACCACCG ATACCAC=? CACCAGCCAG TGTGTTATTC TGAGCTTCAT TTGCCAGTGA A'TT'rAGA CTCAGTT'rT CTACTGTAGC CAAA'rrATAG ATAGCATAAT 'rCTTGCCATC GTCCTTGATA TAGGATCT'1T CATCAGGACC CGCTAAATGA TAGGTTCCAG AGGGAT'G CTTGACATTG V1'GATATCAC TAGGCCTGCA AGTCCACCCA TG'rCGc=l GAC~wrAGTAA GATATTGTAG CCAGCAGTCG TGTGATGCTT GATTGCTCAG AACACCATCG ACGTGAACTT ACCGATATCA TCTTTCCCTG ACCACTCAAG TTTTCAAACA TTTTCACCG ATTAAACGAC AAGCTCCACT TCGTTAGCAT GTTTATAGCT TTGACCAGAT CTTAGCTAGA TACAAGGThA CTCTACTITT GCTGTGATTT TGcTAcTGGwr AGGTATACAT TGGATTGCT GGAACTITGCT GTATAACTCT AGGTCGGAAA *9 TACTAAACA AGTAAAGTT Gr'rGTTCTTI CTGTTCCCTT AATTATCTTT ATATCTGCTT TCTATCTCCT GCTGAAGCTT TATAAAGGAT TTTATCA'TTT mTCT'rCCT CT-GATATTGA CTT'rGAATGA AGAAGAT'rTC ACTTTAACAA AGTAGCTATT CTAACGAAAT GTCTTGTTTA TAAGTACCAT TTGACAAACT CATTTCTTAA TTCAAGTGTT TTCTCTGGTT CTTCTACCT TTTCTTC'TT AATTTCTTCG TTTCCATTTG AArTGGATGT CCTCAGTTGA ATTTCCGTTT GATGGTTCTG CTTCTGTTG TACCTGAATT TTCTGGTTTT GTTGCAGTTC CGTTTTC CTGGTGGtTT TGAATCACTA GGTTTATTGG ATACTTCTCC TCCCAGAGTT TGT'TCTGTT TCTTCTGCAG G 'rGAACTGG AGGTACCTTC TACTGTGCCT TCATTrGGAT TTACTGGAAC AAT=rCATT TrAGAGTCA TTATGTTCTG GTTTATTTGA AATCACTAGG ATTACTGGAC ACTTCCCCAG TATTN'GCT TCTCTGAATT CCTrGTTGAT TCTTCTGCAG GTTGAACTCG AGGTTCCT'rC TGTAGTACCT TCA'rrGAT TT'ACTCGTGT 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960
TTTATCAGG
GTTTGATTCC
'rCCATTCTCT
TGGTTGATT
AGTATTTTCG
TTTTCTGTT
TCTAGTTCAT
GTTGAAACAT
GATGTTGTAT
GATTCTTCAA
TTAGCTATTT
'rCTTGATTTG TTC'rTCTACA G1-rTI-IrCTG TT'CTCCAACT CACGTTGTCG AGATGTATCT GGTGATACTr ATTTTCTGCT TCTTGAATTG TTCTTCTGTT GGTTTTACTG GAACTTCTTC AGT?1r'rTCT GGACCT'rGTT TTACTTGCTC AATATTACCC TTATATCT TATCACTTAC CACAGTATCT GGCGACrCrG CAACTGCTTC GGrAATGTA GGTTGAACTT CGGGCAACTC AGGCTGAATT GCGGGT'rCAA CTACACCACT CTCAGGTTGT TCCN'TATAA TTCTTGGACT AGGCGCAGTC GTTGAAGTTG 299 CTTGGTCrr CrCAACCGGA GTrTCAGGTT GAAGCGGI'GC TACCTGCTCT GGTTCACCI'
TTACAGAGAA TATTCTCACG TTACAGTTCC TrCAGCTAAA TCTCCATAGT 7rCCTCACGA CATCCTGTGG ATTAATGTA TCTTCGTTTC TAGATTCT1TA CTTCTTTTCT TGGATTGATT TCGGCTTAGT TGAAGAAACA CTACAAAATT CGGTGTAACA AAcTCT~TTTG ArrACTACT TATAAGTGTA ACCTGAAATC AGTCCGTATT GTAATrAGC CATTTCCTAT CCCTGCAACC
ATTTCAACTT
TCAG4GATTT'r
TATAAGAGT
TTTACCCCAG
TGrrCGGCrA
AATTCAGTAG
GGTrGTTG'T
GTTGAACCTC
CTGGTrCGCC
CPLATAGCTCC
CT'rGAGTTT AAACAATT'rC
TCTTACCTAA
CTTGAATTrC
CAGGTTTGTT
TCTTTrCTTT ATTG1'TCTTG
AGAAAGGTIT
CCTGAATAGC
A5GTCTCACCT TTGTCGGTCA TT'rTCAC 'r ACTACAGCTT AGAcrG'TACG TCCrATGTT TTAGTACCT TTTTCG;ACTA TCGCGAAACT TCTrCCTrGT 'TrTACCITT 4TTTTACTrC TTCTTGAAAA TCTATTTTIG CAAT-rGACCT GATAAAACTT TGGAGAA.ATC TTCrCCTCTr AGAATCTGAA GATTG'TTTCT TTCAACTACT TGAACTTCTG TTGTACTGTT GATGGATGGT 7=~GTAGGA GTGGCAACTG TCCCTCTTTG ATATA'rccAA TTCTCCAGAG GTCAATTCAT TTATAATCCA CCTTTTGTTG TCAGACTCAG AAGTCGT'TT TCTTTAGGAA GAGGTAATTT 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 AAAAGATGAT TTTCTAAAGC A'rGGACI'GAA AATACTAAAT GTAATACCGT TTTATTCTTA TGGAAACAGC AAAAATTAAA ATTCCCATAG CAGCTAAGCT AGCACCAGCA GCCTCTCATT CTTGCTTCCA GTAT'rTGGCA ATTCCGCCAG TTGATTTTGA TATAAACAAG A'rAATAAGTT TCATCATCAT TCTCCACGTA TGTCGGAATA GCI'GCTTCTT T'rCTTCTGAT GATAGCTCI'G AATCTGCCAC ATAT'rTATAG CAGTTTCTTG AGCATCCACA GATGAACTAG CTAATACAGA CATAAAAAAT
ACTAAGACAC
ACCTTrCT
ACTAGGGCTT
GAATTTAACT
TCATAGACAA
TGAACTCCCG
AAACTTGAAA
TCGTTGCAGA TACAAGTCCT ACTGATAATT CAAAATACTT TTCCATTAr CCTCCTTGAA ATTATATTAG TGTAT'rATCT ATT-A'rCTATA AAT'rTACAAA AAAGTCTTAA AATTGAGATG AGGTACAATA ACACCTACCA TGAAATAC TTCTAAATCA AAAACGCTCT 'rGTT~rTCAC ATAAAATA TATATC?1'AC AAACACCT GAAAAGGCAG TA'rACCTTAA TTATACTCTT CGCTTTCATA CTrTTGTTTA TATTATTTGG ACGGTAGGTG TTACTCATAT CACTAATCGT 300 TCTAAAAATG GTGAGGCA GTTGAGGAGA ATTCCTTCTA TCCAGCCTCC TTGGC AT GAA3CGATGGT CTTCCTGCAG Crr TTT AGAAAATCTC GGACTTGTTC TGGTGCGA'r? TCAAATTCAA AGGCTTTCAT ?TTATAGAAA AAGTCGATGA GATGATCTGA CAGGTATTCA GTTGAAAAGG GTACrCACC ACrTCTA TArCTAATA AGAGTCTAGA AAATCGAGCT 7T-?1CrrA GAAGCTCACrG AAAATAGGAA TTGAGGATCC
TCAATTGGAT
GCCrTGATAG UV rrGAGTCA
ATGGCTAGAA
TTCTCTGCCT
AGGTAGAGTT
TCTTCCATCC
CCTGACTGGC
CTCGTTCTGC
AATrCGTrGG TCTrTTCCA TCTAT?=A CCAAAAAGAA XAM?:TT cir G?1TCTT GCTCTT-rTG G'rATTGTTTG Trl-rTrCCCA CTTGC=C GGGTCTCTGT AAAGCCAAAG TAATCTTGAT AAGCACGCC CCAGATTGTC TGCATATTGC TTGCCGATTr TATCCCTCTT GGATACGGAG TTCTTGTTCG TAGTCAATTT TCTCCTTCC GGTCATCCGA rTTCCCAAGT AAAAWGGTT TGATACACTT GAGCCrTTN CT'rTGGTTCC CCrGGTCC AACTTCCTCC
TGCGGGTCCC
CTTGCGTTCT
TAGCTTGACA
'rTCAAGGACI'
CTGAAAGACT
GGGTTTGAAA
ATTGACCTTG
CAGr='TCT
GTCGCTACCA
TCTAGGAAAA CGGTAGTC TCTCTCAGGC ACACCT="~ CCCAGAGCCA 'TTTAGAAGT ATTTTTTCCT rTNCTGAGC =N'CTGGTT TCCTCTTCCA AN'GCTGGTC AAGGGACAAT ATTGGAAAGA GGCGTTGGCC TGTGACACCG TTTCCACAGA CACAATTGCT ACGGCCGATA TCCXTCGGTAG CTTGTTCCCA AGTATCCGCT GATTrCTAA CTGGAAGTGT CATGAGGTCT GCAAATTGAT TGCCACGATT CGCTCGTCAA AGTTACTT AGATT'rTCAA CCTTTCTGAG CGATGAAAA1 GACGAACACA 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 TTAAAGAGTT CATAAGCGTA TTTGATGGCA CCCTTAAAAA TAAAGGAAAC TTCATTCCAT TTCGAAGCCT GTAAAACTGC ATCCTGCAGG CCTTTCTAAT ACTCAATAAA AATCAAAGAG CAAACTAGAA AGCTAGCCGC AATCACCTCA AAACACTGTT TTGAGGTTGT AGATAGAACT GACGAAGTCA GCtCAAAACA CTGTTT'rGAG GTTGTGGATA GAACTGACGA AGTCA9TAAC
CATATATACA
TACTTTTACA
AATCGAATAG
TACGAGTTGA
TTCGTAAAGC
GCAAGGCGAA
ACTTGAACCT
GCTCGTGATA
CCGCAT
AGATAGCCAT
GCTGACGTGG TTTGAAGAGA TTI=CAAAGA GTATAAGTTA CGTCTTTACC GAGTAAA.ATC AAGTA7*=T CAATATTTTC AAGCCTCTTrC GTATAGAGCT AACTGACCAC GATAGCGGTC CATAGCGGTC TGTCTTGTAG TCGAACAGA.A CAA=rTG1rr CAAGGATACC ACGGACAACA AAGTCTTCCT GACTCTI'TG GTCTCG7"TG AGCATGGAGA AAGGTTrGCTC GCGATAAAGA TGGTCGGTAT TAGCAAGAAT TI'CCTGACCG AG;TACTGTGT CAAAGAAAGC AAGAAT'rrA TCAAGATTGA TCTTGTCTCT GACAGCTTGG CTAGTTTGAA C -TGTTTGAG TGTrrCTGrr AG3GCTAGCAA GGCTTAGTTG CTGGCTGAGG 'rCAATTCTCT TACCT'rCT TTGGTTGAAA ACCTTGACCA GCAATCTCGA TTGACTTGGG GTTTGAACAC TTCCACCTCC TTCAGCAT TTGGCAGCTA TC7TGGAGAG CTGATC~TTCG CCAATAAAAC GATAGCCCAA AGCCAATCTT CCCAT7'MTG GCTGCTGGGWT.
AAGATAGACC TTTTTCTCAG GCATGAGTTC GTGAGTAGCA AATCTGGCAA A'ICGAAGCTG CACCTTCCAT ATCCATAACT CTACCAATCT CAGCTCCAGT ATITCTTGC CTACTGACTG GTCGTAGA ATTTCTTGAT TAGGAAGTTC AATAGCTGCG CGCTGAAGAG CCAGAGCTT-C TTTGATGGTA TCTGACTGAC GACTCTTGGT TTCCAACTCT CCGATAGCTT GATAACTAAA CTTGAGCTTG TCCTTAGTAA GGAAATTCCG TGCrTGCAGT CTAGTA1-rGC A'T-CC??GGA TTCCAGCTrl' TCACGAGAAC CCCGCGTCAT AGCAACATAC AGCAAACGCA TArrATAAAC
GATTGTCGC
CTCTGGTC.AA
ACAC -rTACT TATTrTrr
CCTTGCCGAC
TCTGCTCAGA
GAATGGAGAC
CAA-ATArrPT ATAGCTTGCT AGCTGT)AATT CCTCTTCGTT TTTGATGGTT 'N'AGGATAGT GGTCTTCTAC GACACCAAGA CCAT'rCTGAC GACTGACAAT ATCTTGATCC ATATTGAGCA TAAAGACGTA GGTCA'rGAGC TCTACTGCAT CTTTTGGCGG GGCTTCTAAG ACTTGGTCAA TCATACGAAT TITCAAATTGA 'rCAGCACGCA G'rGCTAGGGC CGGCAAAGCC CCAACATAGT CATAATAAAA ATAGAGAGAG TGGGTTTTGG CA'rACAAGCG 'rAG7-rT-rCA GCTAGALGCTG TGTGAATCAA ATTGACCAGT TTCTCATAGA GA =?CGTG TGCCCCTGTC TCCATCTTGG GACTTCTGAC ATAGAGTCTT GCTTGTTGAA AGGAAACTCC AGCCCTI'TAC TCTTGTGGAT TGCGACGGCC ACGCTTGCCA AATCGTGCTG AAAACGCGAC AAACC??TGA AATTGCTCTT ATAGAGAI-rG GCCTCCCTAG CAGGACCA'T ACGGTCGTTG TAAATCTTCC AAATCAAGTC CCAAGAAGCT AGGATATCCA TGAATTGCTT GCCTTTTTGA CTACTTGCCA T TN'TTGTGC T=C~ TTTTCT GAAGGGACAA: CATTGGAGAC 'rTCATAAGGG CAACCAAGGC CTGCCTATAG GTCAGACTAG 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 CAAAACCAAA ACGTGCTAGC TCATCCTCAT GTAGTCT1'GC AGGGGATTGT GGATTGGAGA TAATTGTT GGCGAGGAGA ATCTGGTCAT GGCAACACCT 'rTrCTGAT TGTTAGTTTC GTTTCTGTTT GAGAAATGCT CCCTN'GTTGT GAATGACACG AAGAGTGTCT AGCATCACTT CCACTTCTAG GCTCTCCGTC AGT1'TGACA GGAATTCCGT ACTCAGACAG TACGACTGCG GCTGGAGGTC AGAAGGGCAA TTTCC'rTAAA GAAGTTTCAG AATCTCCTTG ATAACTAAGC GCATTTCGCC GACTCTCTTC TTCCTCACCT GTATCGTCCT TGTCGTAGAG CTGGATTGGG AGTCAGTTTG GTATTGGCAA AAACAAGCTG GTCTGTTA TCATA=TGA TTTCGCCGAC CTCTTGGTCC ATGAGACGTT CAAAGACATC 302 ATTGGTTGcT GACAGcAcTT cTGAAcTACT ACGGAAATTT TCCTTGAGCA GCC?'rCTTGG GGATINPrGCG CATAGCGTTG GAAN-rCTCA Tnr.AAATCT CTGACGGAAA CGATAGATGG ATwrGCTI'GAT ATCI'CCCACC A'rAAAGCGAT AGACAACAAT TCCAGCATCC GTCTTGAAT ATGGTTGGS'A TCCTCATACTr GACTTCATGG AAGCGCTCCT GATAAGACTC ACGAACTTGT GGGAAA7=C AATGGTGTAA TGGCTGATAT ATAAGCCTCT ACAAAATCCC TCCATGATAA CGTTCTTGAT AGCAAACTGG GTCTr'rCTCT CGGCTGGCAT TAC'TCAGAC AGCACTGCCT GATAAGCCTG ATTAACTGAA CATT'rTCTAA TGGTAACGGA AAAACCTT'rC TCT'TTrCAC TGGTAAAATC TTTTGAGGAT1 TACTGGTGGA CCCCGTTCGT CCTTGCCACG CAGCGAATTC GAACCCATTT TCCTGTCGTT TCA'rGAAAGA TTGGAAGGTT TTAGCTAGTT AGTCGAGAAT 
CGCTATCTGG TCTGATAATT TAATrGAGCCT
GCGGGTCTGC
TGTGGCCTT
CATCGACCAT
CTAAAATCTC
TTCTCT'GACG
TCCAAGTGTC
GTCCTAGTTT
AATCACCI'A
AACACGCGCA
TTCATCCAGA
GTTATCCAGA
CTTCGTTGTA GGCATCAGCC TCGACCGTTT TTCTCCTTAG ACTATCGGAC TCCTGATTTA ATAGGCAGCC TTTGCAAACT
AGGGGCTTCA
AGATGGCGAC
GGGAGCCAAT
CCT'rGGCATC CAAATCCCAA AGGCCTTrGTT TGATTTGCTC'GGTCACTTTT AGCTTTCTCA AATCCTTTGA GGAAAGATNC ACTCAGCCAC T'rGGAGGAAG TCATAGATMTT ATAGACCTG CTGGCGCAGA CCCAGCAAAG 7rTTTCAGCA AATGACTAAA GGTCTCTTTC 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040
TGTTTACCTT
TGCTCGCTT
TTGCCAAGGA
GCCAACTGGC
CTGATTTT'T
ATAAAGAGTT
GGTAATGCGC TTCAAAGACC GGTTTGTAA AATACGGAAA A'TTrGTGT GAAAGAATCC GACCCAAGTG TTGTTTGAGG TCTCTAAACG TTC'rTTAAGT GAGAAA'rrTC GACACCACGC TCATGAAAGA CTTCGT'r'rC GAGAATAAGT TTAGGTGCAA TATCAAGCAG ATAACCATGT ATGGTTCCAA TGGCAGCGTT CGGTAGGTCT TCGACATCAT CTGTTTCTrC GATTTTCrTG TCAGTTGCAG CCTTGACGGT AAACGTTGAG GCCAATTGGT CCAGALATGCG CTCTGCCATG ACAAAGGTCT TTCCAGAACC AGCCGATGCT GAGACCAGGA TATCTGGGC AGAAGTGTAG ATAGCTTCGA Tr'rGCTCGGC AG=TITC TGTTCCT'rGC TCGAAT'N'GC TTCTGCTTCT TGCAG=Tr~ GAATCTCCTC CTCACTTAAA AAGGGAATAA GCTTCATCGA TITCAACTCCT CTCTTATTTT T'rCAAGCCAA GCTTGCTTGA G7"TTC1'CC GACCAGACGC TTGCCATCAG CTAGGTCCAA CT=CTAGG AAACGGGCTT GGCCCAGATG GTA)LTTGGCT TCAAAGCCTG TAATAGCCTG ATGTTGCTGG ACGTATGGGG CAATGC'rTCT GCCA'rrTTCA GTATAAGGAT TGATGGCGAA CCGGCCTGCT AAAATCT'rCT CAGCAGCTTT CTTGTAAACA TAGGCATTGT AGTCCAGTAG GAGCTGAAAT TCCTCATCTG TCAGTTGATI' AGCCTrGT'rT TTGTTATAAA 303 AT'rCGCCTAA ATAACTGCTT TCTrM'CCA AGAAGAGCCC T1'GGTA'TrrC ATAGATTC TGGCTTCTAC CACTGC=CT GCCAGAC?1TT TTACCGCCAT CAGAGATTGG ACAGTCAG CCAT~rCCAA GTACATGGCG CCGAAAAAGT TCTGCTCCCC TTCTCNTT AGGGCACCAA GATAGGTTGG TAACTGAGAA TTGAGCCCAT 'rAAAGAAATG AGGAAACTGG AACTGAGTcA GACTGGATTT GTAGTCTACT ACTCCrTCG C'rCCATTAGC TTTCAAACGG TCAAS'CCGGT CCACCTTGCC TCGTACAAAG ACACTCTC CATTGTCTAA 11'GAATAAAG GCC7GGTCTT TTCCACCAAA A~TTTGCwr TCrGATGG T'N'CGATGGC TGGArrGTG;T CGGAGAATAT GTCCAGTTGT CCGTCCAACA TCAAGCAAAA C1'TCCTMGGT AAACTGGGCT TCCAAACTTT CTTGATAAAT AGCT'rCAAAT TCGCGTTCTT GACTGCTTTC TTGAATACCr TGTTCTAGAC GTTGGTCAAA GGAATCrCA TTAGGCAACT GTAAGGCCC TTCAAAGATA CCATGCAAGA AATTCCCGTG ACTACGGGCA TCAGGATGCA AACGTAATTC CTCCTGCAAG CCTA.AAACC1T AGCGTAGGAA ATAACTGTAT TCATTCGAT AAAACTCTGT CAAACCCGAC AAAACTCC'rG TN'GGCAGGA TAGAGAGC?1' GCAACCTGTC CTTGCTAAG TTGGACTGGT TGGGATAGCT GGA=TCCA GACCTTGCTG ATCTAGrr'rr CACGCGACAG AACCTTGACA AAAGTCAAAT CTTGCTCAGT ATCGCTCATC
GTAGACAGCT
GTCTNGCTGC
TTACCTATGA
TCACCCTGCT
11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 GGTGATACGC AACCACACTA GACAAAAGAC G;TCCTTTGTG A'rTCATCCTC T'rCTCTCTCC GATAGGCAGA TrCCTTACTT TCACTT-rCcT GCTTACGAG;C AGAATTGACC AAGGAAAGCA TGCTCGCAAT CAGTAATTGA ACGCCTTCTT CATCTGTCAG AAGACTGGTG TTTTGAGAAA TAGCATAGAC AAAGTCAGCA GTCAATGG1'G TGTCCACTGT TGCTGGAATG CTACGGTATT TGTGATAGGA CCCCATATCC TCCTTAGACA GCCTAAATCC AAA.A'GGATC AACTCTTGAA
TAAAAAGGCT
TAGTGTAGCG
CGGT~CGCTTG
'rTTTTGGTAA
CAATCAAATC
GCGACAAACT
TGGAGCCGAC AAGAACAACT ATT?=CTTG AGATTTTCAC GTTTAGGTTT TGCCTrTCTT ATTGTCCTGA G~rAGTCCAA GTAACTCTGC ACCAGAACAG CATTCCAGAA TGGAGCAAGG C1TAGGAAGTC TTCCAGACTA ACCT'GTGAAC CAGCAAAAAC AGTCGCAAAT TGTTCTAAAA C LTGGCAGAA AGCCTTCCAA ACTTCTc GTC?1'TCCTG 'rrCTACAGCT ?1TGTCAAATC TTGTAACTGC TTGGCACAG CTCCTCTTT TAGAAAGACA GTAGGAGTTr TrCAGCCTrr 'rGTTTCGGC TGGCAAAGAG GTTCAAGA TTCTCAGGCG GAGGACATTC AAACGCTCAA GATI-AAATT TCCATGGTGG AGGTTTGCTG AAAGGCTGGC AAGCCATTGA TACCAAGATA GCGGATATAT
TCCAAAGTGG
CTCCATTTTT1
GGTGCTAAAA
GAN'TGGTGA
TGCTCAAAAG
CATCAATATC AGACTGACTG CCTcCCGACG AAAACGGTAA GATGATGAGC CA'rCGCTTCG TGGTTTTGAG AGATAACTGG AGCTCAGGTC TGAGTTC'rCA CCTCC'rTlrG CGTCAAACAA CCAAAGCGAG TrTCGAAAAG TATCCATCTT CTCATGAGTT GATGATGGAC AAATIACG AGGCTTTCTT ACTAGCATAA 304 AGGTCAG'rAT ACAAATCAGT TCTAAGAAGA TTAATCAAAT cGTTtrAAAG CTAAAATAGA CTCGACAAAC TGAGTCAAGG C7rCTACCAA GATAAAAAGG AATCTGATAC TGGTCAAAAA TAAGAAGCTA CATCCCCCAA GAGAATACGA AAATGCTTGT TGTAA'rrTCT
GACCACATTT
TCATAAGAAG
TGAGAACAGT
CTGCTTrGG'r GACGAATACT ACGCCTACT AGCTCCAACT GTAAATTTTC ACGGTCTrC TCATCGACAT ACTCCAACAA ACGAGAGGCC TTGTCAAAAC CCTGAGCAGG CGTTTGGTAT TTAGAAGCCA AGAGATTGCC CTCGCTAAAA GGACTGG1'AT GCCCCGATAA CAATCTCA6AC ACCTTTGCCG
TGAAGTA.AGT
CCACAACCCG CTCTrCCTCA GCAGAAAAAC GAGTAAAGCC GATTAA6AATC ACTACTTACC TTGTCATTCT CAATAGCCTC TTTCCTGGGC TAACTGACCT TGATTAAGAT AGGCTGTTAC AATCCGCCCT CTrA'rCCTCA TCTGTTAAAT TCTCCAAG1TC TGGTCATCTC ATGGTAAAGC TCAATTAACT GCTGGATCAA CGCCATAAAC ACGCAAGTCC 'rTGGGATCGA GTTCGGCAAG CAAGACCGAT ATCA'rCAAGA GTAGTTTTAG CTGGrAAATC CCA7TTGAGC AAAGCGCGTG ACGGTAATCG AAAAAGAAGC 5 5@5* 50 SOC5
0 5.U* 50 5 0 GCACGGCGCG TTCcTrTTCA CAGCTGCAAC TAGCTCTTCT CAGTATAAAG 'rAATTTCATC TGAT'rAGCCT CGTAAATCTG CTAAATCTI'A AATACTTAGC CACATAATAA AAAAGCCTCG CAATACAC?1' CAATGTGTTG TCGCCATCAA CATCGGACTC ATATATTCAA TATTC?1'GAT ACCGTGGATG CTAAAATATT TC'rrA'rTCA ATCTACTATA TCTTATGGCA TAT1TCAATAG
AAAGAAAGAG
GCCTCTCI'TG
TCAGCCTCGT
TTAAAATATT
TTACTTGTA
CTAACAAGGC
TCCCAGTATC
TAATTCGATA
AGTTGGGGGC
rrAGAATTTC TGGAATTTTTr
TAGGCCATCC
'rTAGATAGAA TrGAGTTTT 'rTAATGTCGA
TCAGAAGAAG
GTCA6A'GACC AAGGCGATTT AATCAAATGG GACAACTGAC ?TTCTCAAAA ATCAAGAGTA CAAAAAACTC ATCTGAGAT TrGAGGATCC TGCTTAATAG GCATTTGI'AA AAGGCCAACC ATTCAAGACC AGATAGCGAG CTGCTGGGAC AAGTATTCCA AArGTAGAAG ACCCGCTTGC TGTCAAAGAA GTCCGAATAT CATCACCCTA TATTATACCA TTTCTTTTCT TCATCATCTG TAAGTCTGGC TACTGAAAAT ATGATTGTTT CTTAGGTACG CTGGTAGATT GTCTGATTTA T'TTrAATATT ACGTGCCTTT 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13140 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 AGAATGATTG AACTATAGTA AATTGAA.ACT A'rAATAGTAC TCTAGAAATT AATTTGATTT CCCTAATCAA GCTATTCGTA ATAAAATGAA CCAAAAATAG TACACAATGT Gc'rATAATCT ATTMCGTAA AAAAGTTCTC TCTTATTGTG AGCGAACAGG TAGTATAACA GAAGCATCAC ACGPI'CCA AATCTCACG'r AATACCATT ATGGC'TGGTT AAAGCTAAAA GAGAAAACAG GAGAGCTAAA CCACCAAGTA AAAGGAACAA AACCAAGAAA AGTTGATAGA GATAGACTrA AAAACTATCr TACTGACAAT CCAGATGCTT ATTTGACTGA AATAGCTrCT GACTTTGGCT GTCATCCAAC TACCATCCAC TATGCGCTCA AAGCTATGGG CTACACTCGA AAAAAAGAAC CACACCTACI' ATGAACAAGA CCCAGAAAAA GTAGCCTTAT TTCT'rAAGAA TTTTA6ATAGT TCCATACTTA T'N'TTATCGA ?I'AAAGCACC TAGCACCTGT GAA'rATGGTC GCTCATTAAA ?1'AGATTGAC GAAACAGGAT AGGTCAGTrA ATAAGAGGCA AGCTCTAACA AATGGTGAAT CTrTTTTrGAA GCTTGGTTTC TATAGTAAAA TGAAATAAGA 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15213 AAGTA'rcTGG AAGAAGATAT CAGAGGATr CTTTGGTTGC TAATCGCTCC AATGACTTAC GAAGAGACGA 'rGACGAGCGA AGAAGTI'TCT CTTACCAACA TTAACCACAC CATCGGTrAT ATAGGGGGGGC GGGGGGAGGG GGGGGGAGGG AGA INFORMATION FOR SEQ ID NO: 27: a a SEQUENCE CHARACTERISTICS: LENGTH: 6004 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: TTATTACCTG AAACATTAAA TTTAATTGGA CATCCCGTTA TCAATTTTAT AATATCATCA AGATTTTTAT TATCTGAI-rC AGGAA N'TrA TCTGATATAA CAACACCATT TTCAAGATAG TTCATTAAAT TATTTGATTC ACThACATTA GTGTTT-TGAT CTCCATCAAG CCAAAAATAA TGGTTATCGG AATCTAAATA CGATGAGTTT AAAATATTAT TACAAATTAT TTGATTTGCT CCACCAGGAA TATATCTCAC TACTAAATTC TGTTTAAGAT TCTCACTACC 
TGAATGAGTG ATAACAAACT CTAGAATATA 'rTTAGCTAGT CTATCTTCAA CATAAATCAT CTTCCTAGAA TGATACACAT CACCTAATI'C AAAAAATGCA TCCTGATA.AT CAATATTrTTC AATAACATCT ,ACCTTTTCTC CGj-rMTCAC TAAAAGTTC ACGGCTTCTC TAGGAAAATC ?ITTATAAGT TGTGTAGAAT GTGTAGTGAT AATAATTI'GA TGTTNTAT 'IPAAACACTC TrGAAGTAAA AACTCTTTAA ATrATAGAT TGCACTCGGA TGAAGTGAGA TTTCAGGTTC ATCTATTAAT A''A.ATGA.AT TTGATTGCGC ATTI'ACTATA TCATTTACTA ACAAAATAAT TCTAGCCTCA CCTGTTCCTG CAAAAGCCTC GGAATATTCT TTTCCAGATT TTT TCATCCA AATAGTTTTG GAAGCTTTTA TATCATCACC TAAGATTCAT CCATTATTTC TCTATAACCA TTTCCTTTTT CCAACTGGCT TAGATCGTAA 'rTAAAATTAT AGTGATAGAA AAGACAACAT 7TT'rT'rCAAT T7rTTrGGGTA ATCTTTCCAT AATCCTT'rT 71rGTTCCAGA TTATAATATG AATAGATTAA GTACTAAACC AATATTCAGA ATTGAAGT'rT TATTTACTCC TTTTAACAGC TACGGAATCA GGCCACTTGT AGTGCCACCT TATTCTCCTA AAG'rTTCTCC AGGGTTAGGT TTT'AACA'rC CGCGTCATAT TTAGCTTGGT T TTGGCGTTG GCTACGAAGC TTGCCATTTA GCAAGTTCCT GGCCAGTGGC AGGTACATTT AGCGATGGTT GATGCGATTT ATTGCTGTTA AAGAAGGCTT CTTGCTTGGT GCTACATr'rA GACA~wCTrC! ACACCAGTGT 0 000.
TTTTGAATAC AACTTATGTG ACAATAA1r TCACAAACTT ATAACGCGTA TAGCrACTTG 'rCTTA'rAAAA TCTTGTTTAC AAATAAATCA AAAGCAGAAA GCCATCCCAT CTGTCTGTCG CTCATATTGT TNrGAGGAG ACGGCCTTTA AGAACTTCTA ACATTGT'rTC CCATCCACTT CACACTTTTA T-rGGCTGGAG ATATCTATTA CAGACACCTC ATrCAACAG CAACTTGAAC GCACCGTTAA ATAAATCAAT TTTTTAT'rAT AACATTATCA ATTCACCAA CTTCTTCATC ACTCAGCTTG TTTGTCGCAT GTTCGTTAGA GAGTTTC" A TGTCGAGACG GGCCAGTTCT CTGCTCCTGT GATGACACTT CCAAGTGTTC TGGATTTGTA CCA.AGTCGCT ATCGCTTGTC CTTrCCGCACG CGCATTCCCA GAGCCGCAAG GTCTTCAAAG CITTCTGAGAT 'rTGTCCAAAG GAAGGATCTT GTCCAGCGTA TGTCGCTGTA TAGAACrrCC TGAAGTTO;TA AAGGATATGA CTTT'rGCAAT GG'N'TCGTTG CACCTGTTGC AACrTTTGTG TCATGAGGAT ATAGCGAGAA
TATCATCAAC
TA'rrATTCTT
TACGTTGAGT
CATATTrCT
AAGAACTTCC
CATATGGTTC
CATTTCTACA
CATCATTG
AACCATATAA
AGGATTATTN
ATCTATGCCT
AGCAACAArr
AATGTAAAAC
TCATCAATAC
TC7F'=GCA
CICAACCATGT
TCTTCAACAT
GACATAGCCA
AAGCGTTTGA
TTAACAAGGA
ACAGCACGAA
GCTAGATTAA
ATTTCCTCTG
='AACATTA
TAAAATATCA
AGAAATTTT
ACAATCACAA
AATATATTTA
CCAATAATCT
AGCTTTAATG
ATCAACATTT
AGCTTGTAAA
AACTTATAAG
GATTTTAA
TTCCCCA'rAG
CCAACCCGAT
GTGCGACGGT
CGACrrCTGG
CCAGTTCTTT
TGAGGAGATC
GTTCAGGTGC
TATAGTTGAC
TGGTGATAGG
TCAAGTCTTT
CAGTTGGCTA
TCACGAATGG
T'rAAAA'rrTG AATCTCTGTA 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520
TGCAGCTGTC
CATGATTGGG
AATGACCTTA
GTTGGCAAAT
C?'rATCAAAG
GTCCGTCACA
CAGCGTCAAA
ACGATAGAAC
TGAAGGAGAC
TCGTCTTCAT
TCTTCCCAGA
TTTTCAGTAA
TTACCAGCCT
CCTTCATTGT
TAGAGGAGAA CAGATCGAGT TTGGTCAACT CAACATACCA CCAGCCACAC CAAACTCGAA AGATTGTGGA GAATCCAGCG ACATTGTCAT GCGCCACATC ATGTTCCAAA 'TTTGTTAAT 307 AAAGT-rCCA' CAAGCATCCA ACCGTTTGAA AGGAACCAAC GTCAATCCCG TTACCGAGAG G'rCGATAAGC ACGITTTGGA ACGAGACACC CAGAAGAAGA "TrCTCGTA AGAGAAACGA ACGTrC~rGAC CTGGTCCGGA GAAGGGCATC AGCACCGTAT TT'CTCGATGA CA'rCCAraGC ATrTAGACAT CTTGCGTCCT TGCTCGTCAC GGA'rGAGACC ATGGCTGACG ACCAGTAAAT TCCAAGGACT GGAAGATCA'r TGATGTCGTA ACCTGTTACC AAGGTTGAAG TrGGGAAATA
ACGTTTAAAG
AGAACTGAAC
TTCTTCGCCG
CCAAAGCTGA
GTTGAAACGA
CTTAGCCAAT
ACCTGTACGT
G7M'C'TTCC
TTCAAAGGCA
ATGACGTTGG
ACTACCAAC
TGGAAGGATG
AACCGCAACG
ACCATCTTCC
CTCAATATCA
TCrrCTGAGT
CAAGTATCCA
ACATACATTT
CGAGAGATAA
GTGGTAGA
TGGTCCATCT
TCTGAGTGAC
A.ACTTAGCAA
AGCTCATTCA
CCAACCAAGA
TCACGATCTGC
ACGTTT7TTAC
TCCCCAAACA
AGCATGTAAT
GAAAGGGCTG
CGACT'rCAGG CCAGCCCATG AGACGTCTTC GTCCTGAGTC CACCATCAGC ATTGTACCAG CCCAGTCGIG GACATTTCC ATTCGACC~r GTCCTCTGTG TGACGAACCA 7IrGAGTAGAC CAACACTGTG GACACGr'Ir CGACTGCCTT ACGACCTTCA TAGTTCCGTC GTCG=rCATG AGTCATTTGG ATCGTGGGCA CGTGCTCATC TCCAACGATT CAATCAAGTC CTTGTAGCC TAGTCTCAGG ACGAGTTGTA TCATGTGGTA GAAGGCACCT TGCGAG;CTGC TGGGTCCCAG AAAGG7'rCAC AAAGACCTTA CACGAGAATA GTCTACAGAA ATTCG'TCTTT CCATTrCCCAG TAATACCCTC ACCACGTAAG TCTTG43TTAG
AAGCCGTGGCT
TCGATTTTGA
CAATGGCGTT
CAACTACGAC
CAAGGGCACC
GrTGAAAATG GCCAGAGGGC CATCCGTCAC CTTCTGGAGC GCA ATTT GGTGACCCCA ATCCATTGAA GGAAGGTATC AAACGATCCA TGCCTGAAAA ACG;TTGACTT GTGGCAAGT'r GGTGTGATTT TCACGACACC GGGATGAGTT TAT'rAGCGAT GGGTCTTCTG GATTAACCGC GCAACTTCAA GGGCGCGTGA TCTACATCCT TGTGAATCAC TTGATGATAA ACTCACCACG CGAACAGCTT TTGACAA.ACC AGCCCCATCT TGCCCCATC ACCTTCGTCA AGAAAGACTC CGCTCCTCAA CCTTAGCCTG GTATCAAAGC CTTGCATGCG GCGTGACCAA GGTGAAG =T 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 ATAGATCCAG C=r~CTTGT TTCATCAAGA CTGAAACGC'r TTCCTTGATG CrAGTGGCAT ACGACCrAGG TCATAACGCG AGTCGCAATA CCAGCGTCGT TTrTrGACGG ATGATGATAT CCCAGTTACG ?rI'GGTGGTG AGGCTTGAAA ACATCCGCAT
CCATACCTCG
CCTGCAAAGT
GAATCACGAT
CAAGCCATTT
AAGCCAAAGG
CGTATCCCAA
TGAATAAGGC TTAGCCNT GATCGCCTGA TTGGTAACGA CCAGCCTCAA CCTCGGCTGG ATTCTATTTA GGTGAAAGTT CTTTAGACAT GTGTGTGTCC TTTCTCrATT TTGI'TATIT 26 4260 TAITT'GAAT TTGC?1'AGCA GCCTCTG AGTGGTGCAA CTCATTCGGT TGATGTTGGG AAA71rTCr~ TTCAGATACC TCAATATGTC CCGTTCGATT TCGACG'rATG CACrT?1CAA CTTGATGAAA G~rATCCAAT AAATCCTGAA TTGAAAAATC AA'rTACGTCT GTTAAAATTA GGAAACAAAA GGTAATTCCA CGAACAAATT TCrCrAGTGC AAAATCAATG GCTTGATGGT ATAATTCTTG ACCAATGGTC ATTTTTCCTG TCAAATCACC TCCCACTCAC TCCTCAGCAA TTGAGCATCT TTGCGGTCTC TTATGCGAGC TCGTTIGACT'r TGAAGGrrAT ATCCAAAGCA TTTAAAAGCT AGATCAATCA AGGAACACC GTCTGGGTGC AAGGTATAGC CAAGCTCTAG AACAGAACCA ATGACTTTAT CGGTTCCTTT TTCCITI'GA GTACGCTCCG GAAGAATGTG CTTGACTGGA GGAAAACCTG CTGGATAGGC A'rCCTCAGCA TCCACCACTG TGCGGACI'CG TGGCAGCTCA GTTCTTGCCA 'rCCT'rCTTCC CTACACGCT GTCCAGATAG CGATAAACGC GGCAAACAG TTGATTTTTA AAATGTCCCT GCTCAAGAAT CTTGGCAAAG ATG'rGGCAAA GACAATCTAA AACAAC'rGGA CAGGCGTCTA CATAATGCAC TCCCAAACCT TCCALATTTCT TCGCAAGCCM AGAAG~rrCA TCAGAAATA'r GCTCTGCGGC A~rCTCTCTC GCAACGACTC AGGAACTACA GGTCGTGA'rG TTATAGCCAA CAGGAAAACC ATAATATAGT TTACTTGTGT ACTAAGACGT AAAAGAAAAG CCCTGCCATC
GGGG
CAGACAAATr AAmTAATT'G
GTTTAAGGG
GAC7?1G=C TT'rGCTTTAA ATACATCAGC TGGTACTGAT CTGATCGCTG ATTTCTTGC-A CTrGCCCAA CCCACATCTC CTCGGTATAA TCCTGATAGA 'rATAAAATAG GCTAGCATCC GTCAGTCGAG ATGCTGGAGC ACCAATGATG AAAAGATGCA GCCATATCTC AAATCATCAC AGCAGTTGCC TTCGAGGGTA AAGCCAAGCT TNTCCGAGAC AG~wrAGTTCA ATCTTGTGAA GACCAAGTTC TGCTTCTGGA ACATAACCTC GACCCCAATA CACATCATCC GCATGAAGAT GGTTGAAGTC GACGACAATC CCATAGCCAG CTGGGAGAT'r CTCCAGATAA TAAATCTCAT CTTCCAAGAT GACCTCTGGC*AAAC TAGCGT AGGTATGGAT TAAAACGAGA CGT'rCTCTTT CGATTTTATC TCGCTTTT'r GATGAAACTG CCC7TCATAT GCTGATATCC ATCTCCCATG AAATAGGTTG T'rTCATCCAG GAGTTCTGGG GCAACAAGTC TACCG'rCTTC CTCAACAATC CTrATCTACCC AAATAGGAGT CTCAGTTCGT TCAGAAAT r CCTGATGACT GATAAAACCA GCCTGCTCCA TCACACTAAA 71'TTGATAC TGTM'GATCT CAATCACAAC CCAATCTCCT AGACTATAAG CTATTATTr A'NI'TAAAGT AAGTG?1'TCA GCGGTCTCTA 7'TGrGCm AATCGA7*rCT CAATrCAACA AACAGAATCT 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5 2 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6004 AA'rTCTAATC
TAAAAGATTG
TACATGACAG
TTGATATCC'r AAAATAAAAA CTTCATAACA ACCCCCTTTG GGACGAATGT GTTTATCCGC
INFORMATION FOR SEQ ID NO: 28:
SEQUENCE CHARACTERISTICS:
LENGTH: 5857 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
TGTAGAATC ACGACAATGC TTCGTTGATT TCTGGGTGA 'IrrCGTCGCG TTCTGGCA.AG CGAGTCAATG AACCAAAAAT AGATTI'TCGT AAAAAAGTC ACACGTNTTC CAAATCTCAC AGGAGAGCTA AACCACCAAG TAAAAACTAT CTrACTGACA CTGTCATCCA ACTACCATCC CCACACCTAC TATGAACAAG TTTAAAGCAC CTAACACCTG AGAATATGGT CGCTCATTAA TCAGAGGATT TCTTTGGTTG CGAAGAGACG ATGACGAGCG ATTAACCACA CCATCGGTTA AGAACTCT'rG TGTGAAGAGT AGTACACAAT GTGGTATAAT TCTCTTATTG TGAGCGAACA GTAATACCAT TTATGGCTGG TAAAAGGAAC AA.AACCAAGA ATCCAGATGC TTATTrGACT ACTATGCGCT CAA.AGCTATG ACCCAGAAAA AGTAGCCTTA TTAGATTGA CGAAACAGGA AAGGTCAUTr AATAAGAGGC CAGGTCTAAC AAATGGTGAG
GAAATAGCTT
GGCTACACTC
TTTCTTAAGA
'FrCGATACTT
AAAGTATCTG
TTAATCGCTC
CTGACTTTGG
GAAAAA.AGAA
ATI'TAATAG
ATTTTTATCG
GAAGAAGATA
CAATGACTTA
TCTTACCAAC
TGGGGAAGCT
ACTCACCTGA
AGGTATTACC
GACTATATAA
TTT'GTTACCA
CC~TTTATGG CATA'rTCAAT GGTAGTA'rAA CAGAAGCATC TTAAAGCTAA AAGAGAAAAC AAAG'N'GATA GAGATAGACT ACTTTTTTGA AGCTI'GGTTT CAGAAGTrC TTAT'rATGGA TAATGCAAGA. TTCCATAGAA TTGGGTATAA ACTTThCCT CTTCCTCCCI' GTACAATCCT A'N'GAGAAAA CATGGGCTCA TATCAAAAAG CACCTCAAAA AAGTTGCAAT ACCN'TATG AGGC?1'TT GTCTTGTrCT TGTTrCAATT ATTGTCTAAG CGAAACAACC GATAAGAAT GGCACAAAAG CGACCGTAT ATACAGGAAA AACAGTTCAT AGTTCTATCT TGAGCAAGTC TCTCCAGCGA &CAAACGAAC GCCTTAAAAA ACCAATTCCC AAACATCTGT CCCCTCACAT CTTCAGACAC ACCACTATTA GCATCTTATC AGAAAATAAA ATTCCTTTAA AAACAATCAC GGACAGGGTT GGTCATCCCG ACTCTGAAGT CACTACI'CC ATCTACACCC ACGTCACAAA GAACATGAAA GATGAAGCAA TCAATGTACT GGATAAAGTT ATGAAAAACA rFN"NAAAA AGTTGTCC CTTI-r'I-GCC CTCTAAATAC AAAAATAGCC CTTCGGATAA AATCCGAGCG GCTAGAAACG TTGI-rAAATC 310 AACGGCCGAA CT'rTTGAATT TCATGGTTCG GGATAAAATA GTTCACTGAA CTA=NArr Ti-rAAGGTT A'rCATAATAT TTTACCTTTT TCATTCTAAA AAAAAAGGTA TTAAATCGAT TTIrAATT AACGCCACGT CATTTrCACGG TAAACATCGA TATTTTTGTA TTAATAACTT CTCTCCTTTC TCATCCTGTA ATCCAAATTG GTCCTTCTCC TCAAGAGTGT TGACAATTT'r CTCGCCAT'rT CATTAAGCG 'r'r'TGCTGTA 'rAGATCATAT TTCTCCATAA CTGATGATAC TTCATAATCG TCTCAACCAT TGATTTA'rCA CAC =TCATT AATTCAATTT GTCTTATAGA CCAGTTATGT TT'rTCAGTC ACTTrCAACCT CATCATCAGT A'rTAACTCTA ACTCAGCTTC TCTAAACTAA PAATTAGTTAT
CAAATAG'N'C
ATGTAAAGTA
GAGTTrCAGCA 'rAACTT'mGA TGAAATrCTr TTT'rAGTATC AATTAAATAC GCTAAA'rrAC TAATATACTT CAAACAAT'rA CAATATACTA GAGGOGGAGT GGCAAGAAAA TAGCACCTI'T ACGGGTCA TTGATGAATT TTATTGT?1'G GCACrrCT TCCAACATIA TT'rTGGAGT TAACTGCA'r GAAAGAATGG TTTAAGAAAT CCATAACTAA ATCAAGATTT TTATCAATGT 'rTTAGAAATA GCAAGTACAT TCTCTTAAAT GAAGTTAATT TTGCATTCCA ATAAGGTCTA ATTTTCAAAT ATATTCTCAA TTGCTCTGCG ATATATACAT CTCTGACTGT GCAAAAGGGA AATAACTTTC CAAATT'AATC TGGAAATATC TCG'TCTGTAC ATATCCAATG AAAATCGCTT ATGAGGAAAG GAT'rTAAAAA AAAAAATTCA AAATTACTTT AGCATTTAAT AAAATTTTAT CAAAATAGTA TTTTCrATCA CTACCGGACC TCCTACTGTT CAATAAATGT TTTAGCTGTA TTATAGGATT TATATAATAT TTTCATCACC CAATCCATTT TTA-AGTTAGG ATCTATACCA 'rTATATGACA AGTTTTATGA GTTAGAAAA AATTCCATAT CATAACCTCC TATAACTA6AT TATkTAAAGA 'rTAGCAATA CATCGTCTAC AATGCTr'rTT CAGCTTCAC ?rTrGAAATT TAAAATCATC TAGAGTGATG CTTCATAGGC AATGCTGTCC AATCCTCAAC TTGTAGATGT GGGCAGTATT ATCTAATAAA CCCACGGAAT TTCCATAAAC TTGATAAAGT GAATAATTT'r 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 GTTTCACCAT TAGAAACTCT TAAATCAGCT1 GTTTCTTGCG AAAATACTTC TTGTACTTCT GACAATATAA 7TrCTTAATA TTAAAGGAAA TTAAAAATTC TATTAGCTTT TCAACGTATT TCTGTGCCAA TAGCCTGCTT AAACTCATT AAAATTACCT GAAGCGTTCC CATATATCAT GATCCCCACG GAATGTTCTT .CGGCGCTAT TAAAAACTI' TGAATTTTC CCGTCTGATA AGGTI'ACAGC GC1'ATCAGAA GCCAATACAA CACCATTI1= ATTNAATA?1' CCAATTTCTG CTGTCAAAAT ATCACCTAAA CTTTCTAAAC CTGC'rCATGC TCTAATGGTA CAACAGCTAA GGTCrrACCA AGACTTGCCA ACACT'N'TAA TACTGTATCA AGTTGTGGGC T'rGTCTTTCC TGTTTCCATT CTAGCGATAA CTGGCTGACT AACACCGCTC ATCTCCTCTA GTTTCTTCTG ACTA.ATACCC TTTCATTTC 311 TAGCCTCGAT AAGCTCACTC ATGATAGCCA CGCGCATATC ACTTTCCAAA ATTTCCrcTT TGCTGAATAA 7TCAGCTCTT ACATCTTTCC AGTTACTACC AATAGCATTA TiirrrCATT TCTAAACCTC TTTCTrrAA ATCTCCAAGT G-=rr'CT GTGTCCTTT CATAAAATGA GCAACAAATA AAA'IrCTA'C TCTAAGTGGT TTAATATATG GTTCGCCTC C CGTGTTCCA ATTTTATTAA GCTTAATTCT GCTATCTTTC TCAAAAACAG GCTCATTGCC G'TTTTATCC ACCTCTTCCT AATAACAATT 
ATAACCTAAA TAAAATAAAA AGCACCTAGT T'rCCTAGATG CTACCTCTAT CAAGGTGTAC TCCTTCTATA CACAATCAAC TAGATACCTA CCATCTCATG GCTAAAATAC AAATCAGAAT AGATATTAAA TTGACTGATA AATAATATCC GC'rGACAAGC AACCTCTTTT ACACCTCA A;AT'GTrCAGC CTGCTTGATA GTTGCGTATT 1?'rGATAAGC CATTTTACGC TCAAGGACAC TATACTTAGG TCACG?1TTAG CTTGCTCAAT CTC-CT'xrG TGCAGTAAAA CAAAACTACC ATCCATCCAA CTCAGCTCCC AAAI??TCAGC ATCTAAATGC TGTTGCCTTA ACAACTrCAAT ATAATCATTA CCT7T'rAC TGGTAAGCTC TCGCATATAA TTGTAAAAAT AGATATTATG CACTATTAAC AGTTATTGTT TGTAAATAC"T rr'AAGTTA'r CTAGCACAAT GACACGGATT CGCACCGTGG CTATCCCTTG TGCTT'rAGAA TATTATACCA ATATACCCCC Arr--GGGCA AGGGTACAAC
CCACTTATTT
TCCGATAACA
CTCACTTGTT
TAGCATATCT
TTGTTCTTTA
AACTTrATCAT AAGCTGGTGA TTCATGTGAT TGTACACATA TGTACCCTAA TATCTG'rTAT TGA-r1=AG CAGCATCAAA TCTCGCATGA AATACCACTT TCAATTTTCT TCTTTGTCAT CACT=TAGA AATAGTTGCA TTTCCACCCT AATATAGTTC CATCCTCGTT TAGGATACGA 3360 3420 34a0 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 GAGCCATAAA ATCI-rMCTC GGTGTATTAC AGAAATACC TGCTACCTCC TAAATCATCA ATTTAACAAT TCTAACCACT TAGATCTTGT TCGATGTATG ATACAAAGGT TCTAAATCTT AT CTT ATCCT CATGAGTAGG AAAGTATAGT ATTTCCG'TTT TTGCACCAAT CATCAATAAT AACTGGCACT TCCCACTCAC GCCATTTTTT AAGGTTTTCT A.AAACTTCAT TATCACTAAA TAGCTCGCCA TCTATTTGGA AAAATTrCCCC TAACTCATTG TTTCCTTCAA CAATAATAAA CTCTGGCATA TTTCTATTAC TTAATAACTC CTTGAGTTCT TGTAACTCTr TGA7MTC TAGATACTTC CTCAATTTCC AACCTCAATT CTTCAATCTG CCTTACTACT CCAAAAATTT CATGGGTCTT ATAAGATTGT TCAAGTATAG CCTI'TGCTGC TTGAGTTCTr ATAAACGGGT TGACCTTACT GTCCATCATA ATATCATNCA CTACAGAAAC AGCGTTAGAT GATGCAAAT AAAGCATTTG AGT= ~TrA TCCATCATCT CATCTTGC~r TATCCTCAAT GTCTTTTTAA CCGCTGCAAC TTTTAGATAC TTATGACCTG TTGCGCGTGA 312 TACCCCTGCT TTI-rGACATG CTTTGTCTAT CGTTGGCTCG GTAAGCATGG CATCTATGAA TTTrAATTTGC T'GGACGTAA GGTTATCATT TTCATTTCCT GCCATCTATT ACCTCCTCAT TA'rCAAAATA AAGGGTTGC!C CCTTTATTTC CCTATGCTAG ATAATTCTGC AA'PTCTGCAT CCA'PTGCCTC TGAAIGCCC TCAACAATCA TTrCATGCTG TACTAAATCA ATCTTATCTC CGTTAATAAG TAAACCACCG TGGAAATAAT TTTCAAGGCG TTGCTGTTGG CTGAATTGCT TATCAT'rATC CATAATATCT TCTAATTTTC CTAGGTATTC TCTCATTrCT GCCACTGTTA CTGCATCATC TGCTGTAATA GGCTC 'TCTr TCTCTTrrTTC TAGTTGCTGA TACAATAGCT CAA1-N-rTCT -rTCAGGAAA TGTACTAGCT
CCATGTCAAT
TAAGAGCTAG
ATTTGATACT
TTGATTCATG
GAGCAGTATT
TTCGATA'rAA GCAAGGGTAG AGGTrTATTT TTATATTrTTT AGATAATAAA CTTAGTTCAG GTrTGCTAGT TCAGCATITT TTGGGAATAG TTTCGCCCT
CTTTTTTATA
AAACTAATTG
TTTTAAAAGT TCTTGCrCTG CATACACTTT CCCGATAATC ACTrCCTTAT CCCATCTTGA GCTTTTAGCT TAATACTCCC ATGCTCTGGA ATTTC.AATAT ACT'rAATTAT ACCATrI-rTT GAGTATAAAA TATCATCCTT GTTTTCAGTC ATGCTTTTCT CAAAGCCTTT CTCCATCATT TTTAATAATT CCTTTATTTC ATTTTATTAT AATCTGAATA 9e
CCCCTAGTCT A'rrTATTTCA C'rAGGTrTTT AGGGTTCGTA TGCTAAAATA CTACCCTT TGTGTACCTT ATGGCTGACT T'rTCAAATTG GTTAGTT
INFORMATION FOR SEQ ID NO: 29:
SEQUENCE CHARACTERISTICS:
LENGTH: 10254 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
AAAATGATAG CAGGAGAGTT TTCCCGTCCA TCAGACCCAG AACTGAGAGC CTrAGCTCAG GCTTCTCGCC AAAAACAGGC CGCCTTTAAC AAGGAAGAGA ACCCCTTGAA GGGAGCCGAA ATCATCAAGA CTTGGTITGC CTCAACCGGG AAAAATCTTr ACATCAACAC TCCCTTGATG GTGGACTACG GTGTCAACAT CCATCTAGGG GAAAATTTTr ATPTCTAATTG GAACTTGACC ATGCTGGATA TCTGTCCCAT TCGTATCGGG GACAATGCTA TGATTGGTCC TAATTGTCAG TTT TTGACAC CCCTCCATCC ACTAGATCCA CAGGAACGCA ATTCAGGTAT CGAGTACGGA AAGCCTATCA CAATCGGAGA TAATTTCTGG ACTGGTGGTG GCGTCATTGT CCTrCCTGGA 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5857 120 180 240 300 360 420 480
GTGACACTGG GAAATAATGT CGTTCCAGGAGAGGCGAAPCCATTTGC GCAGGGGCAG TAATTACCAA ATC!TTTTGGC GACAACGTTG TCCTAGCTGG CA.ATCCTGCG AAGTAAAAAG GAACAGCTGG GGTTGTTTCT TrACA~rrA CCTACTCTAT CTCTTAGCAA TCGTAACTGG GATGTTTTC TCCTCAGTTC ?TTGrATACAA TAGTACAAAA TTAGAGGAGG ATTTAGAGT TGATGACAA TCCTCAGGCG CGCGTGATrA AGGAAATACC TGTTAAATAG ?rTTTGAGG T-rrCATCAT'r TTTTACCCAG GTCTGTT'TCA TTAAGCAAGT TCAAAGCATC ATCAGCTTCC TCCTTGACAC TCGGTCAGAT CAGGCrT'GA TTCAGAAACA TGCGATTCCT GTTATCATGC CCAATCACGA GGGGCTGGAC
TTGCAGTTGC
AGGGAAGTAG
TATGTCGTGA
CAAAGAAGTG
GGGCGAACTG
ACTACAAGGA
TGTT'rATGCA rTTTAGGTG AGGAGATTGA TGTTGGCGAA TTN'C'Ir'CTG CCACCAAGAC CCAGGAGGTC TGTCTCGCTC AGGCTCCTG'r
CCCCTATGCG
CTATCCAGTT
TGGCTCCGCT
CCAGCAGCCC AGTTTATGGA 7rGGTTGATT GGCTATGG'rG GGCACCTGTG GTGTCCTAGC TGATATAGAG GAAAATGCCT CTGCGAGATG AAGGAGCCAG TTACCACTAT GTGGCACCT CCAGAGGCTA TTGCTGCTAT TGAGGAAGTr TTGGAAGACA GTCA'rGACC'r GGACGACAGA CGGTTTTTAC CGAGAAACGG AAGGAAGAAG GCTGTGCTGT TGTGGAGATG GAGTGTTCTG ?TGCGTGGGG TTCTCTGGGG TCAATTGTGa TTCACAGCAG CAGTACGACA GTCGTGACTG GGGCTCGGAA GCTTrAATA TGGAGCAGAT TATC rCTACT TTCTAGTCCC TCTTCGCGCT GTCGTTATAT GGAAATGCAG GAGGGATTCC TTATGAAGAA CTGAAAAGGT GGCTTATCGT CTCTTGCGGC AGTAGCTCAA ATTCTCTAGC GGACTTGGAC AGGCGCTAGA ACTGAGTrrA GT'rTTATCAT AAAATGTCTA 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 GCAAGTGT'rC ACCACCTTTA GTTGTACTGG GCTCATACTT 'rrCAAAAATA TGTTTAAACG GAGGrCGGA AAAATCTTTA AAATCAGAAA ACACTATGCG TTTTATGTCG ATAAGA'rTTA CAAAGGA'N-r AGGTCACCTT CCTCTTGTCC TAGGCATGTT AACGTATCAT ATCAGGTGAT GAAAACTTTG GAGTGAGATG AAATGATACT CTTCGAAAAT GTAGGTATAT GT'rACTGACT TCC'rCAGTCT CTGACTTCGT CAGTTCTATT 'rGCAACCTCA TTTCTAATCG ATGCCT'rGGT TI-rCA'rTGCC
CTCTTCAAAC
TATCCGGCAA
AAACAGTGTT
TATAATCAAA
CCTGTCTTGC
CAATAAAAAG
GTCATTTCGA
ATAGAGCTAG
CAGGTCAGC1T TCACC1'TGCC CCTCAAAACG GTGTTTTGAG TTGAGCAACC TGTGACTAGC AAGAGAA.ATT TTCTCCTGAA AAGCATATAG AGTAGCTGGC GTTAAAAGCT 'TITTGACC TATAGTCACA TCTATCAAGT ATTGTTCTTG CC'rAAGCTAT G'rGGCATTT TTACGCTTGG TGTTAG'rAGA TTr'rGCCTTA ACT*ATG GTACAATGGA AACATGTTAT TCAAAT'rATC GCTTATCTCG TrrATCGCCA GCCCGTCGTA T'TTrT'rGAG
TCCTATCTAA
TAAGGAAAAA
TrTTGCCTTG 314 GTCATTTTAC TAGGCTCTCT TCTTTTGAGC TGCCCTTG TCC.AAGTTGA AAGCTCACCA GCGACTTA'N' TrGATCATCT TTTCACTGCT GTCTCTGCAG TC74STGTGAC GGGTCTCTCA ACCCTTCCAG TAGCTCACAC CTATAATATC TGGGGTCAAA TAATCTGTTT GCTCTTGATT CAGATCGGTG GTCTAGGGC'r CATGACCI-rT ATTGGGGTr'r TCTATATCCA GAGCAAGCAA AAGCTTACTC TTCGTAGCCG TGCAACTATT TCTTNGAGAA AGIrTGTCTA TTCTATT CAGGATAGT TTrAGTTATCG CTCACGACCT TTI'rGGTTGA GCTA?1TTTGC TCCA~fTTTTC 'rTAGTTTTCG
TAGCGATCTC
AGTTTATTITG C7*rTTCAGAC ACAGGCGGCC TTGCITrAT AAAGGACCrc TGCACTTTCA TNTGGAACAG CAACTACTCT CCTGTTGCCG ATAAGGTTTT TTTTCTACGA TAGATTATAC ATGTTTCTAG GTGGGGCACC GTCCTCTTGG TC7rTGCACG CAPCGATCG CGCCGCGAAC AGCTTCTTGA TAGGA'rTGAT CACCTCGTAT 'rTGAAACCA'r CCTGACCTTG GGAAATTGGC GGTCCCT'rGA CC7TGTTGT CACTATATGA AAGCAGATAT GATTGGAATT TTGGGCTTGG GGATATGAAT ATTATCCCTA TTTGGCCCGT GGAGTGATTG T.GATACCTGC GATACCGTTG GGTTATGCAC TGTAAGACTT CCTTATTCCT CAACTTrGGCT GGGGACGTGG AGCCTTCTGT AATGCCGGTT TTCATAAT'N' CGATTACTG GTCAATCTGG TCATTGCAGG GGTCTGGT'rT GATTTGGCTG GTCATGTAGG TACGAAGCTT GTACTAT'rAT TGACTATAGG CTTTCTTGAG TGGAACAATG CTGGAACGAT AGTTAGCTTT TTTCAAACAG TCAGGCTCAT CCTGTGACTC TGGAGGAACA GCTGGGGGAC AAGTGAGCTr CTAGGCT'rGC GG'rTCAAAAA TCCTTTAGTG TCTGCTAGGG ATAACAGCCA TTCAGCTCTT AGTACAGTTG TCTCAGTGTT ATCATGCCAC TAGCITTGGCA GATTACCATC TAGTA'N'GG? TAAGAAAGGA G.AAT-1-NrG4G GAGCAGTGTC
TGACGATGCG
T=TGATTTA
TCAAGATTAC
CTCATGCCAA
TCTTTATTAT
AAGGCAATCC
GTGTAACGGC
TTATGTTTAT
AGAAACTCGA
GAGCTTGGGA
TCTTTTTAGT
AGGGAGCACC
CTTGATTATT
AAGAAAGAA.AA
TTTGTTGTTA
TGGCAATCTC
AACAGCTGGC
TATCTTACAG
GACAT71'MT
TGTTGCGAGA
CTTTTTGATG
TCCCTTTATC
AAATCTGACT
GGGACGAATT
2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020
TGATGACCA
G'rGACATCAC
TAGTCGCGAC
CGCAGAGCGC
AGATGAAGAA
AGGTGAAAAT
CAGAAAAGAA AGATATGATT AAGAGCATGT CAGATCGTAC CTAGCTGCCC TAGCCAAGCA ATCAA'rCAGT 'rTGAGCCAGT TTATTGAGAT CAGCAGGGAT CTGGAGTCGA GTGTGCrrGC TGGGGGTACC GACTGTTATT GCTAAGGTCA AAAGTCAGAC CGCTAAGAAA GTGCTAGAAA AGATTGGAGC TGAC'TCGGTT ATCTCGCC-AG AGTATGAAAT GGGGCAGCT CTAGCACAGA CCA?TTT CCATAATACT GTTGATGTCT TTCAG7rGGA TAAAAATGTG TCTATCGTGG AGATGAAAAT TCCTCAGTCT TGGGCAGGTC AAAGTCTGAG TAAA'rTAGAC CTCCGTGGCA AATACAATCT GAATATTTTG GGIrCCGAG AGCAGGAAAA TTCCCCATTG GATG3TrGAAT TTGGACCAGA TGACCTCTTG AAAGCAGATA CCTATA7"r"r GGCAGTCATC AACAACCAGT AT7rGGATAC CCTAGTAGCA TTGAATTCGT AAAGACGGAT GACCCCTCI-r TrGATGCC TAAGATGGCA AATAGAGACA GAACCCCTT GTCT7CTAGT AAAAGTCTT CAA6AGGCTGG ACTTTATGGT TACTCAATGA AAATCAAAGA TCAAACTAGG L-LTGAGGTTG CAGATAGAAC TGACGAAGTC CGCGGTTTGA AGAGATnTrrC GAAGAGTATA TATGAAGTTA TTGTCTATCG CAA'I-rTCTAG TGTGGAGTCG CTAGTGATTG GTGG'rGAGCA TCAGGATCAG ACTCACGAAA TCGCTGAGTG AGCCATCTAT CAGGAAAATA AATGCCATGG TTCTGGGCGC TATrNAAAG TAGTTGACAG AAAATAGAAA GAAGTGACAA GAGAGAGTAA AAACTAGCTA CGGGCTGCTC AAAACACTGT AGTAACATCT ATACGGCAAG GCGACGTTGA AGAAAAAATC AGTCCCCTAA AGGAGTAGA'r CTA'rAATGCA GCAGCCTA'rC TT'CAT'rACTG AGTTGGGATT TTGATTATCA'ATCACGGGTC TrAGCTAGC AAGTATCCTA ATATCGTTAG CGGTGCGGTC AATCGTGGCT TGGTAGAGGC TGATGACTGG GTGGATCCTC GTGCCTAC'TT GAAAATTCTT CGAAACCTTGC AGGAACTTGA GAGCAAAGGT CAAGAGGTGG ATGTCTTTGT GACCAAT=T GTCTATGAAA AGGAAGGGCA GTCTCGTAAG AAGAGTATGA GTTACGATTC ACTCTTGCCT GTTCGGCAGA TTTTTGCTrG GGACCAGGTC GGAAATTTCT CCAAAGGCCA 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760
GTATACCATG
ACTGCCTGAA
CAAGACCATG
GTCTGTCAAT
CTTGATAGAC
GAATCATATT
ATGCACTCGC TGAT'rTATCG GACAGATTTG TTGCGTGCTA GCCAGTTCTA CATACTTTT ATGTCGATAA TCTCTTTGTC TTTACGCCCCC TTCAGCAGGT TACTATCTGC CTGTCGATTT CTATCGTTAT TTGATTGGGC G1'GAGGACCA GAGCAAGTGA TCAT'rAAGTG CATTGACCAG CAACTCAAGG TCAATCGACT CAACTTGATT TGTCCCA.AGT GAGTCATCCC AAAATGCGAG AATA'rCTGCT GAACTCACGA. CGGTGAT'rTC CAGTACCCTG CTCAACCGAT CTGGAACAGC GGAGCA'rCTG GCAAAAAAAC GCCAATTGTG GACCTATATT CTTTCAGGCT ATTCGTAAGA CCATGT'rGAG CCGTTT'GACC TCGCAAACTG TCCAA'rGTCG TCTATCAAAT CACCAAATCT TAAGTGTN'T ATAAGAGGGA T~rAAGAAAA ATTTTAACTT TCAGGAGATT ATACTAGAGT CATCAAATAA AGAAAGACTC CAGCAGAAAA ATCCAGAACT AAACATTCTG TCTTGCCAGA CTTTATGGAT TTAATTAATA TTTCTTAGTC C1TrAATT TAAGGAGAAT CCTATGAAAT TCAATCCAAA TCAAAGATAT ACTCGTTGGT CTATTCGCCG TCTCAGTGTC GGTGTTGCCT CAGTT-GTTGT GGCTAG'rGGC TTCTTTGTCC TAGTTGGTCA GCCAAGTTCT GTACGTCCCG
ATGGGCTCAA
GTGACTTATC
CTGGAAATAC
GCCCTTCTAG
TAACAGATGT
CAGCAGAAAC
TCGATGTTCC
TAAACCAAGT
TAAAAGCTTC
CTCCTCr'rGA
CTGTTGGTAA
ATAAACCTAC
TTGCTACTAA
AAGCCGTTGC
CCAAGGGTGA
CAGGTGATGG
ATAACGGCGA
GTCAATACT
TCATTGACCA
GTAACAAAGA
ACATAAACG4G
ACAGTATCGA
CAGGTGTCAA
TCTTGCTCAA
CCCTATCTCC
TGCGACGGAAA
GTACTCAAAC
TGGACAACAT
AAACAGT'rCA
ACCTAGAAAA
316 TCCAACCCCA GG'rCAAGTCT TACCTGAAGA GACATCGGGA ACGAAAGAGG AGAAAAACCA GGAGACACCG TTCTCACTCA ACCGAAACCT GAGGGCGTTA GAATTCAC?1 CCGACACCTA CAGAAAGAAC TGAAGTGAGC GAGGAAACAA TCTGGATACA CT'rTrGAAA AAGATGAAGA AGCTCAAAAA AATCCAGAGC CTNAAAAGAA ACTGTAGATA CAGCTGATGT CGATGGGACA CAAGCAAGTC TACTCCTGAA CAAGTAAAAG GTGGAGTGAA AGAAAATACA AAAGACAGCA TGCrGCTTAT CTTGAAAAAG CTGAAGGGAA AGGTCCITC ACTGCCCGTG AATTCCTTAT GAACTATTCG CTGGTGATGG TATGTAACT CGTCTATTAC GGATAATGCT CCTTGGTCTG ACAATGGTAC TGCTAAAAAT CCTGCTTTAC AGGATTAACA AAAGGGAAAT ACTTCTATGA AGTAGACTTA AATCGCAATA ACAAGGTCAA GCTTTAATT1G ATCAACTTCG CGCTAATGGT ACTCAAACT TGTTAAAGTT TACGGAAATA AAGACGGTAA AGCTGACTTG ACTAATCI'AG AAATGTAGAC ATCAACATCA ATGGATTAGT TGCTAAAGAA ACAGTTCAAA AGACAACG?1' AAAGACAGTA TCGATGTTCC AGCAGCCTAC CTAGAAAAAG AGGTCCATTC ACAGCAGCTG 'rCAACCATGT GATTCCATAC GAACTCTTCG CATGTTGACT CGTCTCTTGC TCAAGGCATC TGACAAGGCA CCATGGTCAG CGCTAAAAAC CCAGCCCTAT CTCCACTAGG CGAAAACGTG AAGACCAAAG CTATCAAGTA GCCrrGACG GAAATGTAGC TGGCAAAGAA AAACAAGCGC GTTCCGAGCA AAyGGTACTC AAACTTACAG CGCTACAGTC AATGTCTATG CGGTAAACCA GACTTGGACA ACATCGTAGC AACTAAAAAA GTCACTATTA ?TTAATTTCT AAAGAAACAG TTCAAAAAGC CGTTGCAGAC AACGTTAAAG TGTTCCAGCA GCCTACCTAG AAAAAGCCAA GGGTGAAGGT CCATTCACAG CCATGTGATT CCATACCAAC TCTTCGCAGG 'rGATGGTATG ?'rGACTCGTC GGCATCTGAC AAGGCACCAT GGTCAGATAA CGGTGACGCT AAAAACCCAG ACTAGGTGAA AACGTGAAGA CCAAAGGTCA ATACTITCTAT CAArrAGCCT 'rG1AGCI'GGC AAAGAAAAAC AAGCGCTCAT TGACCAGTTC CGAGCAAACG TTACAGCGCT ACAGTCAATG TCTATGGTAA CAAAGACGGT AAACCAGACT CGTAGCAACT AAAAAAGTCA CTATTAACAT AAACGGI'TTA ArrTCTAAAG AAAAGCCGTT GCAGACAACG T'rAAGGACAG TATCGATGTT CCAGCAGCCT GGCCAAGGGT GAAGGTCCA'r TCACAGCAGG TGTCAACCAT GTGAT'rCCAT 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 317 ACGAACTCrT CGCAGGTGAT CACCATGGTC AGATAACGC TGAAGACCAA AGGTCAATAC AAAAACAAGC GCTCATTGAC TCAATGTCTA TGGTAACAAA AAGTCACTAT TAAGATAAAT C7TrCTAACTC TGGTrCTGGC A'rAGCATGCC TGCTGACACC 
CTGC'TTCTGC TAACAAGATG GGCATGTTGA CTCGTCTCTT GACGCTAAAA ACCCAGCTCT ?I'CATCAAG TAGCCTTGGA CAGTTCCGAG CAAACGGTAC GACCGTAAAC CAGACTTGGA cTrAAAGAAA CATCAGACAC GTGACTCCGA TGAATCACAA ATG.ACAAGTT CTACCAACAC TCTGATACGA TGATGTCAGA ATACTGGTGA GACTCAAACA TCAATGGCAA GTATTGGN'r G=TACTCGG TGGTCTAGGT TTG.AAAAACA AAAAAGAACA TAAATGATC'G ATAGTGGGCT GACTAAGATT AGTI'TAACAA TTCTTTCAAT AGCAGATTAA AATCATCGTA AAACAA'rAAA AGTATAGCAC TG7"TTTATC AAAGGAGAGA CACATGGGAA GACGAGGTAG AAATCACAGA TATTCATCAG AGATACTTA.A TrGTAGCCC ATGATGGACI GGAAGCGCTA GAGCTC-TrCA ATTATCACAG ATGTCATGAT GCCTCGGATG GATGGTTATG TACI-rATCAC CAGAGCACC TT'rCCTA1rrT ATTACTGCI'A Ar-rTACCGCC TGAGCTTCGG AGCAGATGAT TrrATTGCTA CTGGTTTTGC GTGTCCACAA TATTTGCGC CGCC?1'CATC Arr'rCCCT'rG GCAATCTAAA AATGAATCAT AGTAGTCATG ATGCTGGATT TAACTGTTAA ATCATTTGAA TGCTGTGG.A CGAGTTTTrCT CCAACACAGA CCTCTATGAA AAGATCTGGA ACCAATACYT TG;AATGTGCA 'ATCCATGC CI'CGACAGG G-CTCAAGGCA TCTGACAACG ATCTCCACTA GGTGAAAACG CCGAAATGTA GCTGG4CAAAC TCAAACTTAC AGCGCTACAG CAACATCGTA GCAACTAAAA AGCAAATGGT TCATTATCAC TCATGCTCA GGTACTACAG GATGGCAGGT GAAAACATCG GGATAAAGCT ATGCTACCAA CCTTGGGCTI' GCGCTTGCAG A.AACTAATCA GCTAAGGAAA CTCAATCAGC AATCAGGACT AATAGTGTTA TACTTAAAGC AGACAATTTT ACTCGTTGAC TTrCAGGCAGG TTATCAGGTC AGAAAAAACC GAITTGATTTG A-TAATCAG TGAGGTTCA6A AGACCAGTGA ACAGGACAAG AGCCTTTTAG CCCACGTGAG GTGGGGGCGA AACAGAGCTG AAGTTCAAAT AGGAGAAGAA T'=~AGCTAG TAATCCAGAG AAGAACACI'A CGTGGATGAC AGCTGGCAAA ATATAGTAGT 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 GACCAAACTC CCACTATTAA GGACAAACAT GAAACTAAAA CCATTTTGGT rTTrTTTrGG ,7rTTGCTTGG GATGACCATC TATTGCCAGT C 'rTACGTCG GACAG7"rTGG GCGTTGGGAT ATAAGATAGA GAAACCGAGA AGTTATA'Tr TGOTrGGATA TA1-rATTTCA ACCCTCTTAA GCTGTTCA.AA AAATGCTGAT TGCGAAAGGC GAGATTTACT GTT=CCAGCC 'rTGTCGGTGC TGGGATI'AGT CTCTTTCTCC TTGGGCAAAC TCAAGGAGCA 'rCCCAACCGG GTAGCGGCCA 318 AGGATTTTCC TTCAAA'NTrG GAGGTTCA.AG GTCCTGTAGA ATTTCAGCA.A TTAGGGCAAA cTTrAATGA 
G;ATGTCCCAT GATTTGCAGG TAAGcTTTGA TTCCTTGGAA GAAAGCGAAC GAGAAAAGGG CTrGATGATr GCCCAGTTGT CGCATGATAT TAAGACTCCT ATCACTTCGA TCCAAGCGAC GGTAGAAGGG A'1rTrGGATG GGA'rrATCAA GGAGTCGGAG CAAGCTCATT 9360 9420 9480 9540 ATCTAGCAAC CATTGGACGC CAGACGGAGA TTTTGACCCT AAACACAGCT AGAAATCAGG TGGACAAGCT CTTAATTGAG TGCATGAGTG GAGATGTCCA CTTGCAGGTA ATCCCAGAGT GGCTCAATAA ACTGGTTGAG TGGAAACTAC CAGTAAAGAC AATrTCAGTT T'rTGATTGAG CTGCCCGGAT TGAGGGAGAT
TTTCTCGTAT
AGCTGGAAGT
GGCAGGGTAT
CTTCGCGTAA
CCCATCAATT
C'TTGGTGAAT CTGGTCGATA ACGCTrTTAA ATATTCTGCT GGTGGCTAAG CTGGAGAAGG ACCAGC'TTrC AATCAG TGTG TGCCCCAGAG GATTGGAAA ATATTTTCAA ACGCCPTAT CATGAAGACA GGTGGTCATG GATrAGGACT TGCGATTCCG GGGTGGCGAA ATCACAGTCA GCAGCCAGTA CCGTCTAGGA
GAGTTGAATT-
AGTATTTTTC
CAGGAGAGAA
TATGCTAAGC
CCAGGAACCA.
ACCGATGAAG
CGTGTCGA.AA
CGTGAATTGG
AGTACCTTTA
9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10254 CCCTCGTTCT CAACCTCTCT CTATTCATGG TAGAATAGAT CAGGTGTCTT ATGACAAGTA GGTAGTGAAA ATAAAGCCTA T'rTGTGTGAA ATATCAGCAG ACCTTGGCTG TTTAGGCGAA AAACCCCTTT ACAAATCCAG GAAAGCATGA AGCTCGTCAA GGGCATCTGC ACGG
INFORMATION FOR SEQ ID NO:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 9769 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
CCGGCGACTA TCGATAACAC TTGACTTGGT AGCCCCACAT TTTGGACAAC GCATCCTIrC CCTCCTTATC GTTTT'CTTTT CATTATACCA TN"N'TAAGC GATTCCCAAA ACAATTCTTC TTTTTGCTTG ACAAGTTrT TGTTTTGTTG TATTATTTAA TTAAGACAAC AAGGTAAAAG AAAGGAGACT AAGATGTCCT GGACATTTGA CAACAAAAAA CCCATCTATT TACAGATTAT GGAGAAAATC AAGCTTCAGA TTGTTrCCCA TACACTGGAA CCCAATCAAC CGTGAGGAGC TAGCTAGCGA GGCTGGTGTC AATCCCAATA CCATCCAAAG GACCTTGAAC GAGAAGGA'rr TGTCTACAGC AAGCGAACAA CTGGACGATT
AACTTCCAAC
AGCCTTATCA
TGTGACTAAG
GATAAGGAGC TAATCGCCCA GTCACGCAAA CAATTATCAG AAGAAGAATT GGAACACTTC 319 GTTTCCTCCA TGACCCATTT TATATTAAAG CAGTTTAAGC TGGAGCAACA CCAGCCCTTG CCTTCTTGGG CCAAACGGCT ACAACCAGA'r CAAGGACGTG CGTTCTAGCT TATTTGCCTG TGGCTATGAA AAAGAAGAAC CTATGTCArr ACTAGTATTT AAAATG1rC TC7TGACAr CAGGAAAAAC AACCCTGA'TT TCCTCATCAA CGACATGGAC ATACGACCTA 'rCTCAATGAG TCTATAAAGA TTGTCAGATC TGAAAATAGT CGTCTCAAGA TACCAGGCGT AGTCAGTGAT GAAAATGTAT CCAAATCATA CCAGCTGGAA AAATTGTCGG AAACTAArrA ATGGCCTCTT CCAAGCCCAG CAACCAAGGC
CCTAACCTAC
CTTGCAGACC
GA.AAAGGTTC
CCCATTGGTG
TACTCACCAA
T'rGGA'rGAAA ATTrCGCTACG CAAAG4GAGAT
TGGTATTTAG
CAAGGCTTTA
ACACTCTTTG
CGCTTCA.AAG
GAACACCATA
GCTGTATTGG
CTTTCTrATG
TCCTTCCTAC
CAGCCTT'rCA
GTCATTGGAT
GTAGGACTCA
TTCATAGCTA
TAAATAATTT
TCTTTAAAAT
AGATTTTTTC
TCAAGACCT
TGGGCArrGA
CAAATGAAGG
TTGA-ACGCGC
AACTATCAAA
AACTGATTTT GGT'rATGAGC CGTGATGCTC GTCTCTATGT TTTGGACGAA GGGTGGA'rCC AGCAGCCCGT CTATATCC TCAATACCAT TATCAACAAC CTTCTACCGT TTrGATTTCT ACCCACTTGA 7TTCTGATAT CGAGCCAATC TT'GTCTTCCT AAAAGACGGA AAAGTCGTCC GTCAACGAAA TGTAGA'rGAT AGTCAGGTGA ATCCATTGAC CAACTCTTCC GTCAGaATTT AAGCCTAAG TA'IITATGTT TTGGAATTTA GTTCGCTACG AATTT-AAAAA 'rGTTAACAAG CCCTCTACGC AGCCGTGCTA GTCCTTTCTG CCCTCATCGG AATACAGACA AAAATCTACC Tr'ACCAAGAA AGTCAGGCTA CTATGC1'ACT TT-TTCTAGCT GTGGCTTGAT GCTTACACTT GGGATTTCAA CCATTTTCTT GATTATTAAA GTAGTGTC'rA CGACCGACAA GGCTATCTGA CTTTGACCTT GCCAGTTTCT TCATCACAGC CAAACTAATC GGTGCCTTTA TCTGGTCATT GATrAGCACC CTCTAAGTGC TGTTATTATT CTGGCTTTAA CAGCTCCAGA ATGGATTCCT TGATTACATT TGTAGAAACA CATCTCCCTC AGATCTTTCI' TACAGGTATA TAAATACTAT TTCAGGAATC CCrGCATCT ACCrGGCTAT TTCCATTGGA ATGAATACCG TACAGCACTC GCTGTTGC-AG TCTACATTGG TATCCAAATC T'rATTGAACT T'1-rCTTCAAT CT'rAGTTCTA ATTTCTATGT CAA'rTCACTG 'rCAAAGAAC
CCATCATCI'A
AGGAAACAAA
540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 ATGACCATTT CTATATGGGA GCAGGTATAG TCTTTTATCT CGGAACCTAC TACATCTTGA TTACCTAGAT ATGTAACATA CTCATAGAAC TAGAAAACGC ATAGTATCAG GTGTTGAATA TGTCTAACTT TTGGGGGCAG T'rCATAAGA-A CCAT'rGT'rGA
GAAATAAGGT
AAAAGAGACC
AGAACTCATA
TAAT'rTGCTT
AGGCAAAAAG
TGTACTGCcC CCCAAAACTT CCT'rGGTAAT ATGCGTTTT TGTGAGCTGA CT1,ATTTCCT ATTTTGGAAT TCAAATCAAT TTCAGTTCAC TATACAATTG GGTTCG1,GAT TCCACCC7M' ATAAGATAAG GCACGTTTAA ACTAGCATAC ATGCGTCCGA AAGTGAAATC CATGCTCT~G GAAGGTCGAA ATAAAGGGAA ACCTAGAGCT GTCACTCCAA TTCCCTGAG TCCTCAGGAC CATGAAAAGA CTGATCAAAA ATCCAAAATA CCAGACTGAG 320 T'rCACTATAT CGCAAAATGA AATAAGAACG GAACGATGGG '1-TATAAGAAT GTITTAGAAG TAATATTATC CTATTCCAGA AGTTTTCAAG CPAACCTGTTT ACATAATGTG TACATAATTAr 'rCACCTTTAA AAACCTCGCT TTCGCAAGGC I~rTAT AGGT'mrCCA AATCCCTAAA TCATCCGTTT GAAGAACGAG TAAATCCTGT TGCTACCACC GCAAAAATCA CTGTAATAGC CTCCCCCCGC A'rAGTCATTA ATCGTTCGAA ACGGCATAAA TATAAGAACC AA'rCTTCAAG AGGAGATTGT AAAAACCACC CATAATCAAA ATCATCAAAG GAGAAACCAT AGATCCTAGG AAGCCTGCCA TAAAGAGCAA GGTATTCAGT CAGATAGCAT CCAAGAATG4G CAAATCTrrA AAGAGCAAAA
CACCAGCTGC
GCGACAAGGC
AGACTACGTA
CTCCCAAGTG
CGGCAGCCAG
*S *S S ACCACCTACA ACATAGATCC CA.ATATGCGT 'rAAAATCACT AGAAACAGAG CCATCATCCG CGCATAGAAA TAGTGACTTG CCCTTATGCT AGAAAAAACG ACI'TCCATAA TTTTGGTGCC 7"rTCACTG GCAACTTCCT GAGCTGTTAC ACCCGCATAG GTAATCAGAA TCATATAAAG AAAGAATCCT AAGGCACCTG CTGCAATTGT TTGAATAAAC TTrTTATTr'r CCTTGGCTTC ATCAATC?' TCTGTGAAiTT GAAT'iGTC'rG CGCTAAGCGT ?I-rrCCTGCT CTVGAGACAA" GGAAGCAGTT GAACGATTAA CrGATTrTTG CAGTTCATTG AGTGTACCTG TAACCTCAAA rTTTAATTCCA TTTTCAAGCG ATGTTTCGCC ATGATAAACT GCCTTTAGAA CACTATCTTC TTGATCAATG GTCAAATAAC CTTTTAATT'r TTCTrC'TTTA ATTGCTTCTT TGGCACTrGC TTCGTCTTTA TAGTCGAAGT TAACACCAT'r TACATTCTTC AGTCCTTCTG CTACAGATGG CACTGTTGTC ACTACTGCCA CTTTATTATT TTT'AGCCATA GAAGAACCTr GGAGATGCCC 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 AATTCCTACA GAGA'rTCCTA TGACTCGACA TGTCGAAGAT ACTCCTGATT CTAGTTTAAA ATATATTGAC CTTGAGTCAA A'rCAATTTCC AACTGCCTTG AAAAGAGGAA CGGCGAAATC ACCATAAAGA AGAAACTCCA AGG'rTCCTT GATTACAACC CACATATTTC TCATACTTCC
GATTTCATCG
GATTGAGAAG
TTTGGTCAAG
ATAGTTGGCG CTTGTTGGTC AAATGTTGCG AGTTCCCT'rC CAGCGCTCTC ATCCTCCAAA CTCACCTGTT TGACATGAGG AAGATTTTCC
AATTCTT'CCT
ACATCCTGAA
CAAAGTTCCT
TGCrTCGTTC CTGGTCCG1rG
CAACATTGGT
ACTTGAAACA AAGAGACGCG T'TTTCCCGTA TTGAT'rGCGG CAAGACCACA CGGCCATCTC GGATCATCAC AATATCGTCA CATGACATGG TCAGAAAAGA TAA~rTGT CCGCGCTCTT TTTCCTGAAA AATGACTTGT TTrGAGCAA'rr GCTCATCCAA GATAATCAGG TCTGTTCAT GCTGATTTCC ?N'TGACAGA CTCN'GATTTl TCTTCATCCA T'rGAGGGAGT TTTTCTTTGA CCAAGTAGCG AACrrGGTCA AGAACTGTCA ATAACCAATC CGAGCATAGG TCTCCTGACG CTGATATTCT AGGAATTTCA AALATACTATG e* 0 0 @00 0 0* @0 0 0 0* 00 0 0 0 0@ 0 0 *00000 0 0000 0 00 00 0 0000 0000
C
TCCGACTAGT CCCAAAATAC GACCTGGTCG C -rGGATCCA AAACN'TTCT CTAGACTTCT CTI'GCACTCA 7rATACTCCT TTTGATAGC GACTATrGCT GTGTAAAATA TGGCCTGGAG AAACTAGGAA GCTAGCCGTA GACTGCTCAA ACGAAGTCgA CTCAAAACAC TGTTTrGAGG ATCTACGGCA AGGCGAAcTG ACG'rGG'1r'r TCCATrATAC AGCAGCAAAC TTAATTTATA CCTGAAlTTGT TAT1TTGAGTA ACTICCTT'T TGGAAAAAGG CTAATAG?1'T CAGACAACAT CAAGAAGGAG TAATCCTT'rA TCTACTAATG ATGTTrrrCTA AGGATTATAT AGTAAAATGA TCAAATTGAT TTCTAACAAT GTTTTAGAAG TATATCTATT ATGCACACCC CrATAGGA'rC ATGGTTACCT AAGCCTAAGG GAACTAAGAA AAAGTAGATT AACAACTATC CTAAAAAATG CTGGATGACT AACTTGAACT TGAAATTTAG 321 CrGTATTAAC
GAATCAGAGT
TATCTGTCAG
CTrCTTrGGC
ATTTAGGCAT
AATATCCTGA
GAAAATCr
CGCTTGAAAG
TACTTCTAGC
CTTTACAATG
CACTTTTATA
AGTACAGCTT
TTGSTGGATAG
AAGAGATTTT
CCTTCCGCTC
CCTCGTAAAG
TTTTATAAGA
GACGGAACAG
AATAAGAACA
TAGATGTATA
'rAATGAAAAT AACGACTrACC CTrGAACTAC
CAATAATTAA
rGGGCCAA'r CCACTAAAAG AATAATGAGC TGAATCTTCT CTTrCCTTrC ACTTCCAACC ATCCATGCCT T?1'AGAGTCG GAGA'rGCGrr CTTCAGGCAG CCATCCAGAC CGATTTCTCC GTT'rTCCAC CACCATTTT TCAATACCAA ACAAAACT'rG ATCTTTCACC TCCGAAATTT TTTTTTGTCC ATTT-rTAGAA C'rCAATGAAA ATCAAAGAGC TGAGGTTGCA GATAAAACTG AACrGACGAA kcr'rAaCTAT CGAAGAGTAT TAGTGATAAA CTCAACTCTC TATTTTTAAT TI=TCTTCCT CTAAAACTTC AACAAGTTrCA TCTGTCATTI' AATCALACCG CTTGTCCGAT GGACAA.ATTG ATCAGGACAG CTATTCTAGT TTCAATCTGC CACAACAGGC TCATTCATAG AAGGAAGTCG CATTCATCGA AAGTCCCCCA GAGAAGACTT TTCACTATCT AACTATATTT cAATTCAAAG GATTCATACT TAGA7?IT AGTCTTTCTA GAAAGCGCGA ATGCCTCAAA 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4 980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 ACTAArrATTr TCAGAACTGA TTAATATTAA AATTAACrAA AGCCATAAAT TACGTCCATC AGAGAGAGAC TCrrACTACT GCTTCAGAAT ACATCrAAAC TTTAGGGAAA ATGACTATTC ATTATCTCAG ATAAGCTATT CGAAACTTAG AATGC~rTTA AATTT-ATGGA ArTGCGATTA TTCGAAACCr AGAATGCATA TAACCTTTAG TTGACAGACC TATTCTAAGT CTCGAAGGGC
TATTTACTT
TATAGTAGAA
rGTTTCTCCC IL 3ATr'rTC
TGCCTTTATC
ACAGACTGTC
CGGGCACATC
CTTTTATCTA
CTACGACAAG
TATGATGATI'
GATGAGCTI'C
TTATACCTAC
AAATCCCTTT
CTAT1'CCTTA
ATATACTATC
CTTCTGCAGC
TGTAT7TGG4G
CAGCAGGCAG
322 TCAAAAAAGA CTCATTCCCC CTTTCTCCTC CAAAATATGG TATGAGGAGT TTACATGTCA CAGGATAAAC AAATGAAAGC GAGTTATCAA TATCTCATCG ArTGTCGGTG GCCTGGGAG CTTATCAGGC TGGGA7*rTTA CAATCCAAGG AAACCCTCTC GCA'rCTGGGG TCCACCTCTC GTCCCTATCA TTCCACGGGC CTTIGACCTrG ATCGGGACTA TCTACAACTA TATCGGCATC GTCCGCCTAT ACGGAGCTGC CTTTG'rCCAG TACATCGACT CGCTAGATAA GGGCAATCGT TGGCCCATTA GCCCAGCTGA CTTTCTCTGT AAGCGCTACA TGACCATCAT CATTCTGACC GGTCTGACCT ATATATTGA CTTTTTCTGG C4TTTCCCAA GTGGATTTTT AAAGCGTAGA AATATACTTT CGTATGGAAA TCATGCATAT TTTTCGATAG CCTTTCCGCC CTGATAGAAA CACCTGAAAT CTAATGGT GCCTAGTGTC TCAAAGTTTA GGTATGGAAT TTTGAAGAAA CTTAAGGAAA GGCTCAAAAA TA'rTGTTT'rC AACCACAAAA GATTTGTGC rTATTTGA AACTTCT'rTT GCAAGAACAA CCATTTCCTG CGACTGCTGG CGTCACGATA TAGTCACGCA CCATTAAGAA GAGATGTAAA TTTCTCACGG ACACGGTCCA ACCCCTCCAC CAAAGACAAT CACGTCTGGG CGGAAAGTCA TGAGCGATAT AGTAGGCTTG AACATCCCAA ACAGGGTTGT CGTACACCTG TACGAGCTI'C CAAACTTCGA CCAGCTGCAT TGGAAAGGAC AAACACCCTT AAACTCTTTT TCAATATCCA TAATGACCCA TTr'CAGGGTG ACCCACACCA CCGATAAACT GCACCGATAC CTG'IACCGAT TGTGTAGTAA ACCAAGTN'TT TTACCGGCAA CCATI'TCACC GTAAGCAGAG CTGTTTACGT ACGTTTACG CGCGACGAAG GGCACCAAGC AAGTCTACAT GTGGCrGGGG TCNTTATCTA GTGATTGCGCT GTGCCATTAT TCTGTCGTCA GCAAGCGCAC TTTGACCGCT TCTTTATTTT ATGCTGGCTG CCCTGACCAA AAACCCTr'rA CCCTCGTGGT CAAATGCT'rT GACACGTAAA TTAACTATAG CTTGATACTA TGAGGCGAGG ACTTACCTAG CAGGTAT"TCG GAAACTTTGA GTCGCTACCC TCCGTAATCA TCCGTTTGGT TT-CCCAAGCG AGTTCCCAAG TGTGGCAGAA CA'rCTGGTAC TGGTAGGTAA GCATAT'GTTG TTGAGCCATG CTGTCGCATT AACCGCAGCT TGAGTTCAAT AGTrTTCCCCA AACCTTC'rAG ACATCCCTTA TTGGGTGTCT AGCAACATAA CACCACGTTG GATGACGCCT CGATACGACC ACCAGCATTG CTGTTCTGAA CTACATTGGC 1-rGCCCAGTT TGGTTTTGGA 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 GTCGTCGTGA TAAAGCCATA AGT'rTGAG CCAACTCCAA CACCAGCAAC GTTATCGAAT TT'rrTGTCAA TATCAATCGG CCCAAATGAA TTTGAGAAGA ACTCAATGGT TTTATCGArr 323
GTTCGATTG
CCGACAGCAC
ATAAACCTCT
?wrTN'TAGAT GTTCaTTTTC
AGCAACTGAA
AACCATG?=
GAGTTGTTGT TGGAAATTGT GTTTrTCTA CAACGTrAAA GIrCA'rCA AGACAAACTT TGTACCGCCC GCTTCCAAGC TTCCATATAA =NGTCATG TGTTTTTAAT TTC=~ATrA TAGCATACTT CGAAAGTCTA AATGTCTCTA TTTCCTCTGT AAATCTTAC1T ATCTAATAAA AACGAACAAA CATGTCATr ACATTAGAGA GGATT-GATTA GATTTTCACT TCGATCACAG CATCCCCCI' CC74TTGCGA CTGGAGCTAC TGAAGCGTAG TCACCTGTAT TI'GTAACGAT
GTATCATCAA
AACATCGCCA GCTTTCACCT CATAGATACA GTATCAATAC ACCAAAAGCG TGCCCTGT'IG CACGCCTTGG CTTGCTTTCA GTCATTGACA TCAGCAAGAG TTGAACAGCT GCTGGCGCAA TGCAGTTGCG TCTACTTCAT AAATGATACA GCTACCATAA TGTACCAGGG ATGATGGTGA TCCACCACCG ATTGCACCAG CACCCCGAAG ATAGCAGGCT AAGTG7TTTC AGTTTTGGAT GTrCCAGCTGC AGCGATTTTG TATTACCTTG AGCAACTT CAACATGAAT CAAAACTTCA GAAAGGCAAT TGAAACTCA CAACGATACC TTGTCCCATA CGACAACATC ACCGACGATA CTTCTTCTITr TTCTTCAGCC CTTCGTAACC AAACATGTAA GAAGGTATTG TGGAAGTTrT TACCADTACC AGTACCAGCA CAATCAATGA AAGGAAGAAT CTGTAATACC TAGGAAGGCA r=rTGTTrr AACACCAACC 7"TTGAGTCAA ATGTTCCAAG GTI-rCAAAAC CGTCACCGTT GCACCATT'rC 7TrTTTCAA GCATCAGCTG GTGCATAGAC GCTCCACTTG AGAAGACTGG GGAGTTACAA GTGTTTCATT ACT'rCAGCTC GTTTTGCAGC GTAAGAGCAA AACCAAGGGC CCGTTACCAA CATAAAGCAT
AGTCCAAGGA-TAGAAGCCAA
GGT?1'ACGGA AGCGCAAGTT GAAAGAGCAG CCGGGAAAGC GCAACAGTAG CAGCACCTTG GCATGGTCAG CAGCAAGTAA ACGACGATCA ATTGGTGAAC AGAATCGCT'r TTGTAGCAAT ACAAAGAGTC CAAGGATAGA ATCACATCTG GAACAACTTG TGAAGGCTGG AAGAACGGAA 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 -8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 AGC'rG'CATA GCAGCTGTGA TGATAGCGTT T'rGCAC'rrCA AGCAAGTTGA AGATGTGGTG CCCACCAATC AAGAAACCAC CAAGACCAAA AAGGATGTAG TTrCAACAA CGTGGAAAAC CATGACCAAA AGTG;TCACGA ATGGTCTTAC CGrGACAGCTT 1-rTCAAATTT AGCTCCGACA CCTTGCAAAC CAACAACAGG GATGAAACCA TGAGCAACTG CCCAAGCGTT TGGAAGTGAG CCAACGGCAG GATTTCC-ACC AAATACACGG ATGATGAAGG CTrGTATCTGT CAAGAT~rGT
GAATGGGTTA
CACACCTGAC
TGGCATGCTA
TGGTCCAATG
CAAGAGGTCA
ACCCCGATGA
AAGAAGT'rCA TCGCTGTTAC TTCACCACCT CCAGAGACAA GCA'rCATACC AAGAACGATA AAGGTTGACC ACACAACCAA ACCTIGGCAAG GTGTAAGTTG CAAAGTCACC TGGAAGTCGC ATTrCAAGAG CGTTGAAAAG ACCACGCACA CCCArGAAGA GACCTGTCG( GGGATGATTG GAACGAAAAC ATCACCAAAA GTACGGATAG CACGTGGAJ TGTTTAGCAA CTTCTGCTTT CATGTCATCC TTAGATGATG TTGG AATC( ACTTCATCGT ACA"'TGTr AACTGTACCT GTACCAAAGA TAATN'GGT ?TAAAGAAAG CACCTTGAAC TTTP'rCCAAG TTCTCA.ATCA crC-TATI TCATCTI-'GA CCATGACACG TAGACGAGTC GCACAGTGGG CAACACTA71 CGTCCGCCCA AGGCATCGAT GACTTTTTTT GCAA'rTTCCT GAI-rGTrCAJ CTCCTTATAT AACATTTTGT TCTTGTT'rGA AAGCGATTTT ATTCGCCGG
INFORMATION FOR SEQ ID NO: 31:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3149 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
TACGATAACT
SCCAGTTCCCT
AAGTACAACA
TTGCCCTGAG
GATTTTCTCT
9360 9420 9480 9540 9600 9660 9720 9769
GACATTTTCA
ST1'GCAAAAAT (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: CGCTTGAGTG CTAATTCATA GTTCTATTGT ATCACTTGGT CAGAAATAAT CAAGAAAAAA GTCTGACTTT CTCAAGATAA AAAGCCTGAG ACCAACTCAG ACTF TTTAAT TCTTAAAATG GCAATTCTTC CTCTTCC.AAG ACCAAATCTG CCAAATCTTG GCCTGCATTA TTTTCACGCA TAGCACGTTG GGCACGACTT TCCAAGAGTT GGAATCCTGT GACAAGTACT TCGGTCACGT AGTTCATTTG GCCATTTTTC TCAAAGCGAC GGGTACGCAA TTCTCCATCA ACGGAAATGA GACTACCTTT GGTTGCGTAC TTGCCAAAGT TTCTGCTAGT CTGCCCCATA GGACCATATT GACAAAATCA GCTTCACGTT CACCGTTTTG G'TCTTTGTAA CGACGGTTCA CAGCGATAGT TGCTCGCGCT ACCGACTTGT CATTGTTGGT ?TTGTGCAAT TCTGGTGTAG ACGTTAAACG TCCAATCAAG ATAACTTTAT TATACATATT TTCTTCCTCC TACTTATCTA TTCGTAGGA.A ATCAAAAAAA GTTACAGAAA 7TTGTAACrT TTCGAGAAAA TTTTTATTT TTTATGAACC ATGAAACCTG TCGCCTGTTG ATTGGCCATA ATGGTCATAT CTGTAATCTG AACACGACGA GGTTGACTAG TCACATAGAC TACTGTATCT GCAATATCCT GAGCTTGCAA AGCI'TCTATT CCTTGGTAAA CGGACGCAGC TCG'PTCTTTA TCACCATGAA AACGCACTGT AGAAAAATCT GTTTCGACAA TTCCAGGCTG AATGGTCGTC ACCTTGATAT CCGTTGCGAT GGTATCAA'N' CGCAGTCCAT CTGAAAAGGT CTTAACTGCC GCC?1'GGTGG CTGAGTAAAC AGCTGCACCA GCATAGGCAT AAATTCCTGC GGTTGACCCC ATATTGATAA TATGACCTTG ATTGGCTTTT 325 ACCATTrGCTG GCAAGAAACA GCGAGTGACT
ATGGTCAGCA
C.CGTATTGA
TTTACCATTG
GICTGCAA
ACATCCTCAC
cCCGTAATC.A TATCCAACTC ?TCATAGTCIT CCAGGATGTC AATCTGACCT GCCATCAAAC CTTTGACATT TGATAGGGAG CTAAGCCAAG ATCGTTTCTA AAATATCAGA
GGTATCCAAC
AGCCAGTCCT
GCAGACAGTC
TCATATCCGT GACATCTAGG AGAAAAGTCC AAACTGT1 G ATGAA.AA ACTCCGCCCT AAGAGCCTCT AGTCTGTCTA TCCGTCGTCC TG=rAGAACG CCTCCCAG ATAACCACGC GCAATCGCIT CACCCATTCC TGATGTCGCT CAACATT'1' TGCCATCTTA TTTCCTTCTA GCTGGTCTAT CAGATATAA
CAACTTCTTA
CTGATAATTC
TGCACATCAT
TTGGCATCAT
CTCCTGCAAC
GGCAGTCCAG TGTTTCGCTG GGTCGAACGG AAGCACCCCA CGTTT7TT~G GAGCA'rTTGG ACCAAAACTC ?'TCACCAC GAAGTTCACC AGCTCCAGGA AGCGCGACAA TGGTTTTCAA GATTTGTACA GTCTTATCAC T'rGCGACTGC GGTCACTATC TGGA'rGGGCT TATCATTAAC AAIWCTTCT CT'rGCTCATC TGTCAAAAAG CGAAAATATT CCAAGCCACr* CTTTGCGCTC CACATCCAGT CACCGACATG r'rCTTTrrA CTGCTAAAAA GTCArrGATT *TTTCCTTG'rC cTTTCTAGA CCAAATCCAT A'rACTGATCT CAATCTCTC TAAGGCAGGA TGAAGACCTT CTTGCCCGCT GTATCATAAG ACTTCC1'CCT ACGGCCATCC TCAAAAATGA AACATAAAGG GTCGCAATr'r CTCTTCCTGA GTCTTCATGA GCTCCOATA CCTAGCAGAG ACCATCTCAA CAATTTCACC GTAAAACCTT CCGCCTGCAA ACTTGACCGC GCTCTGCAAT GTTCCCCAT TATCTGTAG TTGGCATCTC CCrATT1'TT TATGTAAAAA TCA?1'GTTTC TGTTGCTTGC TTTT'ACGGTC ACAACAAGGC TAGGAATTCC CGGTCCATTC CAATAAAGGT TAAATATAAC GACAATCGCT TTT'rCCACTA AAGATGCrAA CATAGACTAG GTCTTCAT'T CGCCACCAAC CAAGCTCTCC CCCCCATGTC CGAAAAATGG GCTTACGGTC ATCTGCAACT CCAAGCCTGC CATCCACATT TGTTCCGACA ACTTGGTCTT CAGATGCAAT TCACGAGGAC TGGG.AAAATG AGATTCCCTT CCCCACACGC GCATTGGGAG AACTTGGCAG ATGTTGAGGT TACAACAA.AC TTAGGTTCCT CCTtrGGTTrC AAACGACCA TTCAAATAAA CTTGAAACTT AAAAACACGG GCTACCTTGC CACGATGACC ATAAGGACAT CTTTTTCTCC TATTTCAGTC GCGATTGACA AAACGACCGA GTAAACATCC CAGAGTTTGG GAAC'rCTGGA TTGGTCTCCT ACACCAGTCT GCCACAAAAA T'rCrrCrAAA CrrGC 'GGCT TCATAGACAA AGGTATAATG AGACTGCTTT CGTAAACTTG TCTCGCACAA TCTCTGTCAA TTTTCGTAG CAAGAGCAAG T7M1'AGCTT TCATACCArr 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 CATTTAACA CAAAAAAGGC TTCAGGACAA ATGAGGAAGC AGCAGAAAAG CAAGTAAAAA 326 CCTCTT TTAAGGAAAA GGACTTCTTA TACTCAATGA AAATCAAAGA CCAAACTAGG AAGCTAGCCG CAGGCTGCTC AAAGCACTGC ?N'GAGGTG TAGATAGAAC TGACGAgTCa CTCAAAACAC 7431rTGAGG TrTGGATGA AGCTGACGTG GTTTGAAGAG A'TTTTCGAAG AGTATTATTC TTATTGCCAG GCACCTAAGT TGCCAACGTA GTAACTATCA GGTGTGTAGG TATTGCGAGC ATC?1'ACCTG ATGAAGCCAG ATAATACTAC TTGCCATTGT CTTTGACCCA ATCATTCGCA ATCATGGAAC CAGAAGAACT TACATAATAC CATTCTCCCT TGTCATAAAC CCAAGTACTG ACNTTCATGG TT'CCTGAGCA ATTAAAGGCA A-AAAAACTGT CCAATAACAT TCGTTTrTr'A AAAGCArTTG 
ACACTACAT
INFORMATION FOR SEQ ID NO: 32:
SEQUENCE CHARACTERISTICS:
LENGTH: 10240 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
CCAAAAATTC
GGTGCAATT
GAGGCCTTGC
AGTTATGAAT
GGTTGCACAA
GAAGCCATGC
CGTCGTCCTA
CGGATTGACG
GTGATGGGGC
CTCCTTGTrG
GAACGTGGAG
AAAACGGAAT
ATCAAGGGAA
TCGTGTGGGG
GCCTATTTAT
AACCTrTAAG
TCTAGAGGAG
ACTAGCAAGG
CCCACATGTA
GAGGAAATCG
GAGCAGGCCA
TTTCAATTTC
GAGCTGGGAC
CTCAGGGAAA
GTGGTGGGAT
TGAAAGTAGT
TGGCTCAGTA
ATGTTTCCGT
CTCCAGCAAT
CTCTGGA.ATC
GGGAGTCCAG AGAGACTCAC AAGGTGTCAG ATAAAAGAAT ACTTTTIGAG TGTGCTCTCT TGTGTTGTAC GATTTTAACT TCTrTTCTTT ATCTGGTCCC CTTAAAATTT AAGGAGGAAA AGAAGCGTTT GGGTGTCATT CGGTTGGAAA CCATGAAGGT CGCCACAATC TrTGAATTAG TCCTAGAAGG AGAAATGGTT ATTrCTTCAT CTGCGTGTAC CGGACGATGC CCATCTCTTA GTCTATTGAC AACGCAAACA AGCAGTGTCA CC'TCATTTAT TGCAATTTTI' TCAACCTTAA GTCAGGGAGA CACTCTTGAT TGGT I TGAC 'rrGTCTGACC TTGATGAGCA GAATCAGGT TGGTGTrCCA. CCCTTGCTrG AGGTGGCCAA GGAATTGCAT GACAGTCCTC GGTTTTGCTA ATAAGGATGC TGTTATTTTG TGGTCAGGTC TTTGTAACGA CAGATGATGG TTCTTArGGC TGTTATCAAT GATTTAGACA GTCAGTTTGA TGCTGTTTAC GATGAAGTAT ATCAATCAAA CCTTTGATGA TCACCCAAGA TCGTATGGCT TGTGGGATGG GAGCTTGCTA TGCCTGTGTT GACGGTCAGC CAACGCGTCT GTGAAGATGG TCCTCIPTC CTAAAAGTAC CAGAAAACGA 327 CGCACAGGAA CAG7=GATT AThAAGAGAA TCTACCTGGT TrGGATTTGA AAAATCCCAT ACAAGAGTAT GCCAAGTAcT ATGATTTAGA AACCC7TGAA CCACGTT'rra GGAATCCAAC
AA'N'ATGACT
TATTCCAGCA
CCTTTAGGT
TCCAAGACTG
ACAAATCGAT TACAAGTTTC TCAGGCTGTr TTGGCTTTCC TCTATTATGA TCAAGGCGAC GCAGACACCC CTGCTCGTAT GTTTGGCTG AAAACCTACC AATGTAGCTG GT'rTTrCAAA ACTAATGTAA AAGCTATCGA GCTCAATGCA AT'rGGCTTGC AAAATCCTGG TTTAGAGGTT TTGGCTGGAA AGAGAATATC CAAATCTTCC TATTATTCCC ACAAGAGTAT GCAGCTGTTT CTCATGGGAT TTCCAAGGCA GCTCAATATT TCTTGTCCCA ATGTTGACCA CTGTAATC-AT GGACTTTTGA TCCAGATTTrG GCTTATGATG TGGTGAAAGC AG~CTGCAA GCCTCAGA-AG TGTCAAATTA ACCCCGAGTG TGACCGATAT CGTTACTGTC GCAAkAAGCTG GG.GAGCAAGT GGCTrACCA TGATCAATAC TCTGGTTGGA A'rGCGCrrG TAGAAAACCA ATCTTGGCCA ATGGAACAGG TGGAATGTCT GGTCCAGCAG AGCCCTCAAA CTCATCCGCC AAGTTGCCCA PAACAACAGAC CTGCCTATCA AGGACTGGAT TCGGCTGA.AG CTGCCCTAGA AATGTATCTC OCTGGOCAT AGTTGGAACA GCTAACTTTA CCAATCCT'rA TGCCTGCCCT GACATCATCG AAAAGTCATG GATAAATACG GTATTAGCAG TCTGGAAGAA CT-CCTCAGG a.
TTGGTCAAGA
TGCCAGTTTA
CAGAAGATGC
ACCTCAAAAC
TCTTTCCAG'r T1rGGAATGGG
CTGCTATCG
AAAATTTACC
AAGTAAAAGA
TAATATGAAT
AATGAAGAAA
TGCAATGGCT
ACCAGCAAAT
1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 GTCTCTGAGG TAAACTGCAA TCAATCTGTT TTAGGAGAAT TTTGGTACAA TAAAATAAAT GTAAGATTTA 'rTTTTTTAGC TC'rGCTATTT AGTGATGGTA CTTGGCAAGG AAAACAGTAT GAGTGGG3TTT TTGATACTCA 'rTATCAATCT GCTGAAAATG AATGGCTAAA GCAAGGTGAC ATGGCCAAAT CAGAATGGGTr AGAAGACAAG AAGATGAAAA GAAATGCT'1C GGTAGGAACT ATAGAAGACT GGGTCTA'rGA TTCTCAATAC CTTGATTT7TT TATTAG1T'TG
AAGAACAGAG
TTCTTAGCTA
CTCAAAGAAG
TGGTTCTATA
GACTATTTTT
GGAGCC=T
TCCTATGTTG
GATGCTTGGT
GAAGAAGGTT
GTCCAGAGGG
ATGGCAGTCA
TAAAAGCAGA TGCTAACTAT ACCTCAAATC TGGTGGCTAT ATTATCTTGA CCAAGATGGA GTGCAACAGG TGCCAAAGTA TTTATATCAA AGCAGATGGA ACTATTATT CAAATCCGGT I'GAATGCTAG TGGTGCCAAA GGTT'rTACAT CAAAGAAAAT ACTATTATTA TCTAAAATCC CAGCACGCAG AGAAAGAATG GCTCCAAATT AAAGGGAAG GGTTATCTAC TGACAAGTCA GTGGATTAAT CAAGCTTATG GTACAGCAAG GTTGGCTTTT TGACAAACAA TACCAATCTT GGAAACTATG CTGATAAAGA ATGGAIrTTC GAGAA'rGGTC GGrGGYTACA
TYTGATGGGA
TACTTCAAAT
?TTrACCTCA CAAGCTrGGT
TATC-AGCTTG
TGGCAGCCAA TGAATGGATT AAATrGCTGA AAAAGAATGG CCGGTGGTTA CATGACAGCC AATCTGATGG GAAAATAGCT ACTACTrCAA ATCTGG1TGGC GAAGCGATGG TAAATGGCT 328
TGGGATAAGG
GTCTACGATT
AA'rCAATGGA
GAAAAAGAAT
TACATGGCGA
GGAGGAAAAA
AATC"N'GGTT TTATCTCAAA CTCATAGTCA AGCTTGGTAC TTGC.GATAA GGAATCT'rGG GGGTCTACGA TTCTCATACT AAAATGAGAC AGTAGATGGT CTACAAATGA AAATGCTGCT CAGATGGTGA AAAGCTr'rCC GAAAAAGTGA TGACAAGCGC 2760 2820 2880 2940 3000 3060 3120 3180 TACTATCAAG TAGI'GCCTGT TACAGCCAAT GTrrATGATT TATATATCGC AAGGTrAGTGT CCTATGCCTA GATAAGGATA TTGGC'TATTA CTATTTCTGG TTTGTCAGGC TATATGAAAA GATGCTAGTA AGGACTTTAT CCCTTATTrAT GAGAGTGATG GTrGGCTCAGA ATGCTAGTAT CCCAGTAGCT TCTCATCTr
CAGAAGATTT
GCCACCGTTT
CTGATATGGA
ACAAGCGCTA 3240 TTATCACTAT 3300 AGTAGGCAAG 3360 TCCCTTCCTT 3420 GGTATTTAGT 3480 AAATATTATT CGGCAGATGG TTCAAAGAT'r TAACAGAGGC rTTCTAAACA TTAACAATAG GAACATTACC A'rATCAATC GGAAGAAGTA AAATTGCCAA ACCCCTTACC TTTC7GC'rAA AAGTGGATTA AGGAAAATTA GGT1ATGAATG TGGAATATCC C'rTGGAG AACAAOGGCG CTACTTTTAA GGAAGCCGA.A TCTTTATCTC CTTGCCCATA GTGCCCTAGA AAGTAArTGG AGATAAGAA'r AATTTCTTTG GCA'rTACAGC GACATTTGAT GATGTGGATA AGGGAATTTT TATCGATAGG GGAAGAACTT TCCTTGGAAA TTCAGACCCT TATTGGGGCG AAAAAATTGC CCTGCATTTT GATGGTTTA AGCTTGAGAA TACAAACTAC AGTGCTGAAG AAT'rGGATAA
CTATGATACG
AGGTGCAACC
CAAGGCTTCT
TAGTGTGATG
ATGAAAATCA ATGAGAAGCT AGGTGGCAAA GATTACTACT ATAAGTGAAT ATGATTTGAG TGAATAGTAA GTTAAAAATC CTG.ATTTCAA GTAAAATCAG GAT=TTTCA TGGATGCAAT T7TTTGGAG
TTTTCCACCA
CGACAGGCGC
TCTGGTGTGA CGCGGAGGGT GTTGC=?AT TGAGI'TTTT C'N'TTGTCCT GTGTAAGTGA CAAAGCCGGG GACTTCAATC ATATCTACCT GCACCAGAI-r 3540 3600 3660 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 CCTTGAGAGA AGTAGGCAGC AGATGGGTCA AGA'rTTCCTG AGATGACAAC CCAAAGTTCC TCCTTGCGGG CCATNTTAAA TCCGACATAG ATGATGGTTT TGCCATCCCT CTGGATTTTC TCCCGTTGTC TTCTGCGGAT
TAACTCTGCT
ATGCCTTCCA
GGTCAATTCC
'rGC'rAGATAT AAAACCTGT'r GCGTCTGTCT TGACTGCATC GGAATGTCCT TAGCATGGAA TCATTTTGAA GATTGTTTCG TGTTCTAGTT TTTTGCGTTT TGAATCAATT CTTCACGGAT TTCAGCGATT TC=TCAGTC CAGCTTGGTT GAGGACGGTI' TCACACI-rT CCAGATAGA G AATAGTrGGCT TTGGTTTCTT CAATCAAATC AGTCAAGTAT TTGACAGCTT CTTTGAGTTT 329 C'TGATACCGT ?IAAAATAGC GPrGGGCATT C1,GGTTGGGA GTCAGAGCCT TATCAACCC AATCATGATA GG7TGGTTGG TATAGTAGTT GTCTAGGATA ACCTGGTCTT GGTCGTTAGG CACTTGGTGG AGGAAXGTTG TCAGCAATTC TCCTrrI'TGA CGAAATTCrT CAGCGTTGTC TGTCGCCAGT AACTCTTT T CCTGTTrTTTT GAG7rrGTGT CGGT~rT'r GAAGTrCATT TTCAACACGA CGAATCAG?= CACTGGCCTG CTGrrTGACG CGGTCGCGCT CTTATAGTAG G'rGTCCAACA AATCAGAAAG A~rGCAAAA GGCTCTCCr-A AAAAGGAACT GGACTGAAGG A'rTTCGGAAA GCGGAAAGTT GCCTCCCAGA CCTTGAAAGA GATTTCAAAG AGCTPTTCAT ACATATAG GTCGATCCTG ir 7ATAACr TCGAGGATTT CCCCATAATT TCGATAATCA AACTGTAATT TCCACAATAC CTGCAAATAC TTTCTCAAAA AAGTCTCACT CAAGCATG.GC TTTCACTAAC CAGTATCCTT GGCTTTGAAG ATTTT'rTGCT CCTTGATAGT AAAAGGATTG GAAGTAAGGT GCGGTAGCTA TrGGTTCTT
TCCAATTCAT
GTTAGTTCTT
CAGCCTTATC
CCTGATTTGC
CATTGAAAAA
TTGCCGTATC
GCGGTTTGAG
ACAGATTTTG TACTTGGCGG 'TTTTGTGAAA AGCCGACGTG TATGACTGCT TTTATCGACC AGTAGAATAT TACTGTGTTT
AGGTAGCCTG
GGTCAI'=T
GATATGGTCT CCAATCTCGT TTrTATTGGA CACTTGCTCA ATCGACTCAA TCAGGGCCCC GGTAGAAGGT TGAGCTGGAT TTrCAAAAGT TGGATGGGCA GAAAGGAGCA GGCGATGGCT 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240
CGTTTGG.CTC
PTGGCGATTG
GATGCGACCA
AGCTGAATCC GTCCAAAAAC CTGCGGATTT GCAAGACCAA TTCACTAATT CGCTTCGCAA GTCAAATGAC ATCGTTCTCT AATAAAATCA TGGAAATCTG ATATTGAAAT GGTAGATGAA AAATITGGA A7"rTGCAGCC TTGTGACCAA TGAGCGTAGT
CCTTGTGATT
GTATAATAAA
ACTGGTCAAG
CTCTTGTwrCA AAACGCTGAT TTCCTCAACT ATGTGGTGTA GT~ATTCCATA GTATTATATC GCCAAGTAAA GAGAAACGAG TTTCAAAAGA AATGTTGCAA
TGA'TTTTCTG
AAAAAAATCC
AAAAAGGTAG
A.AGCACATGT
CAAACCCAAG
GCAGTCACTT
GACCGTCCGA
GAAGAGGATT
ATTGGGGAAT
CAAAAATTAG GAAAAGAAGA CAAGGAGATG CATGAACTTA ATCTCGAG1'A CCGTAACACC TATAAACCAG AA71TGGAAAT TGCCTTTGAC GCAGAGATGA TGTCTGAGTT TGATCCCTAT
CAGATGTCAT
TGCTTGAAAA
TGTTCATCTC
GTGAGATCG
CTCCGGAAGA
TCACAAGACA
CAGCCTTGAG
TTCAGAATTG
TATCGATAAG
CTTCTTGGCA
GCTCATGAGC AGGCCGAAGA ATATGGTCAC AGCTTTGAGC GTACACGGCT TTTACATAT TAACGCCTAT GATCACTACA AGAAGCGGAG ATGTTCGCTT ATAAACGAAA ATGGAAAAAT TACAAGAAGA AATTTTGACA CCrATGGAC CGTGAC~wrGA TATCCAGTTr AGAA'rTTGCT 330 TTGACAGGTA TrTTACTGC TATCAAGGAA GAACGCAATA GCTCTAGTCG TCATCC?1'GC AGGrr'rGTT ?TTCACGTGT CTCCTATTGA GTAN'rTCTT GGTAGTAGCC TTTGAGATTA GTGGTGGATT TGGCCAGTCA CTATCACN'T TCCATGCTGG TGCGAAAACA CGCAGTGACG CACGAATCGA ATGGCTCTT TCAACrCTGC TA~rGAAAAT CTAAAAATGC CAAGGATATG GCGGCCGGCG CGG'rATTAGT CTCCCACGAA TCTGGGATTT AGGC7"NTGTA GCCATIrrAG TATGGGGCAA AAGATTGCCA GGGAA'IrTAC ACGACTGATA GCCTAAAACA GCTCTCGGAG GGACACTGTT CTT'rCATGG TATCGAGCGT CTCAAGGCTG GGTT'rCTCTT TTCGCAGCCT TAACAGGCGC ATTGATTTTT ATTATTTAA ACAGTAAGAG GAAATTATGA CNrTAAATC GACGTCCCAA TCTTGGGAAG TCAACCrMr' TAAATCACGT TCATGAGTGA CAAGGCGCAG ACAACGCGCA ATAAA.ATCAT AGGAGCAAAT TGTCTTTATC GACACACCAG GGATTCACAA ATTTCATGG'r TGAGTCTGCC TACAGTACCC TTCGCGAAGT TGCCTGCTGA TGAAGCGCGT GGTAAGGGGG ACGATATGAT CCAAGGT'rCC TGTGATT-G GTGGTGAATA AAATCGATAA GGTCCATCCA GACCAGCTCTr 'GTCTCAGAT GGAAAT'rGTT CCAATCTCAG CCCTTCAGGG GAGTGAAAAT CTGGATGAAG GTTTCCAATA AGAACGTTTC TTGGr'rTCAG AAATGGTTCG GATTCC~cAT 'rCGTAGCAG TAGTTGTTGA GGTTCACA'rC CGTGCAACCA TCATGGTCGA TAAAGGTGGC GCTATGCTTA AGAAAATCGG TGATGAC'rTC CGTAATCAAA TGGACTT'rAA AAA'rAACGTG TCTCGTCTAG TGGA'rATTTTr TTTCCCGTCT GATCAAA'rCA CAGACCATCC CGAGAAAGTC TTGCACCTAA CTCGTGAAGA CTCTATGAAA CGAGACGAAG*AGACAGACAA GCGCGATAGC CAAAAAGGGA TTATCATCGG 'rAGCATGGCC CGTCGTGATA TCGAACTCAT 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 GCTAGGAGAC AAGGTCTCC TAGAAACCTG GGTCAAGGTC AAGAAAAACT GGCGCGATAA AAAGCTAGAT TTGGCTGACT TT'GGCTATAA TGAPAAGAGAA TACTAAGTAG AGGTAGGCTC ATGCCTGCTT C~rcTTTTrA CAGAAGGAGGLACTTATGCCT GAATTACCTG AGGTTGAAAC CGITTTGTCGT GGCTTAGAAA AATTGATTAT AGGAAAGAAG ATTCGAGTA TAGAAATTCG CTACCCCAAG ATGATTAAGA CGGATr'rGGA AGAGTTTCAA AGGGAATTGC CTAGTCAGAT TATCGAGTCA ATGGGACGTC GTGGAAAATA TTTGCTTTTT 
TATCTGACAG ACAAGGTCTT GATTTCCCAT TTGCCGATGG AGGGCAAGTA 'TrTACTAT CCAGACCAAG GACCTGAACG CAAGCATGCC CATGTTI-rCT TTCATTTTGA AGATGGTGGC ACGCTTGTTT ATGAGGATGI' TCGCAAGTTT GGAACCATGG AACTCTTGGT GCCTGACCT'r TTAGACGTCT ACTTTATTTC TAAAAAATTA GGTCCTGAAC CAAGCGAACA AGACT'rrGAT T'rACAGT'CT TTCAATCTGC CCTTGCCAAG TCCAAAAAGC CTATCAAATC CCATCTCCTA GACCAGACCT TGGTAGCTG 331 ACTTGGCAAT ATCTATGTGG
TTCCCAGACT
GGGCCAGGCT
AGATGGAAGC
CTG'rCGTACC
CTGTCAAAGG
AAGTCAACTG
GTCGTCCACC
GGGCAAGAAA
TTTTCAAATC
GAACTGGCTA
CCCCTACTTT
GACCGAGATG
GAGTCTCGTC
CTTGA'rAATA
GGTAGGCAAG
TTTTCTGACA
TCTAGGTGTA
TATTTCCGCG
ACCCATGATG
CCCAAATATC
TCCTAATGCA
AGGTACTTTG
TATCGCAGAA
AGCTGC'rAT
TTGACAGCAG
GTTGAAAAAG
ATGCAGGACT
ATCATrGAGA
AGGGACTGAT
TGACAAATTT
ATGAGGTTCT CTGGCGAGCT CAGC=,CATC CAGCTAGACC AAGAAGCGAC TGCCATTCAT GACCAGACCA TTGCTGrMr GTGGCrCCAC CATCGGACT TATACCAATG CC?1'TGGGGA TTCATCAGGT CTATGATAAG ACTGGTCAAG AATCTGTACG AAATTCAACT AGGCGGACGT GGAACCCACT TTTGTCCAAA GGGAAAAATC ATCGGAATCA CTGGGGGAAT TGCCTCTGG TCTAAGACAG CAAGGCTTTC AAGTAGTGGA TGCCGACGCA AACTACAGAA ACCTGGTGGT CGTCTGTTTG AGGCTCTACT ACAGCACTTr TCATTCTTGA AAACGGAGAA CTCAATCGCC CTCTCCTAGC TAGTCTCA'rC CTGATGAACG AGAATGGTCT AAGCAAATTC AAGGGGAGAT TA'rCCG'rGAG CTTTGAGAGA ACAGTTGGCT CAGACAGAAG AGATTTTCTT CATGGATATT TTGAGCAGGA CTACAGCGAT TGGTTGCTG AGACTTGGTT GGTCTATG'rG CCCAAGTGGA ACGCTTAATG AAAAGGGACC AGTTGTCCAA AGATGAAGCT TGGCAGCCCA GTGGCCTTTA GAAAAAAAGA AAGATTTGGC CAGCCAGGTT ATGGCAATCA GAACCAGCTT CTTAATCAAG TGCATATCCT TCTTGAGGGA ATGACAGAGA T'rAACTGGAA GGATAATCTG CGCATTGCCT GGTTTGGTAA 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780
GGAGCCAGTA
GGGAGTCAGC
GCGCTCT'NTT
AT1TCGGGCAG
TATTGGTTAA
ACGGCACTGA
TCTACAGGCG
TTA?1'TGGCA TTGACTATr TTTCTTTrGGT 'rGTACC=Tr AAGTCGCTTTr TTATGCAGGC C'rCCTATTTG GGGTATTCTT GTCTTGCTAT GACTATCACT TCTTTCT1'CG TTTACTAAAC TAGCCAGTCA GG='CCAAAG 'rAGTTGCAGG TACTCTAACT TTCGTACAGT TTCTTACTG GCTT'rATCAA GGAAGA'rTrT ATGCCCATCT TCGTGGAAAA TTAGCA.ATTT CTGTCTCTGC GCTGACAAAT ACGGCCGAAA ATGGGAGGCT TGGCCTTTGT GGTGTATTTG CAGGTT=T'G GAGAAATCAG GCTCTGCCTT GGTCCCTTTA 7rGGTGGCTT1 GTTGGTAGTT TTCTATrTTT CAACCAGTAG CCAAGGAAAA GGCTATTCCA ACAAAGGAAT 'rATTTACCTC CTT'TTAACC AGT7"rTGTCA TCCAAT'rTTC TTATGTACGC GACTTAGGGC AGACAGAGAA CAGTATGGGC 7TTCCAGCA TGATGAGTGC
GGTTAAATAT
AGCTCAATCG
TCTTCTTTrT
CCCTATCTTT
ATTGGCCCTA
GTCTCTGGTT
'rGCTCAATCT TTTTGGCTCTr TGA'IrGTGTC GTGACAAGGT AGGAGTCATG GGCAAGCTAG 332 GGGCAATCAT CGTCTCTTGG VI'GTCGCCCA G'TPTTATTCA GTCATCATCT ATCTCCTCTG 9840 TGCCAATGCC TCTAGCCCCC TrCAACTAGG ACTCTATCGT TTCCTCTTTG GATTGGGAAC 9900 CGG'rGCCTrG ATTCCCGGGG TTAATGCCCT ACTCAGCAAA ATGACTCCCA AAGCCGGCAT 9960 TTCGAGGGTC TTTGCCTTCA ATCAGGTAPT CTTTTATCTG GGAGGTGTTG TTGGTCCCAT 10020 GGCAGGTTCT GCAGTAGCAG GTCAATTrGG CTACCATGCT GTCTrTrATG CGACAAGCCT 10080 LI'GTGTTGCC ?r'rAGTrGrC TCTTrAACCT GATTCAATT CGAACATrAT TAAAAGTAAA 10140 GGAAATCTAG TGCGAGTAAA AATCAATCTC AAATGCTCCT CTTGTGGCAG TATCAATTAC 10200 CrAACCAGTA AAAATTCAAA AACCCATCCA GACAgATTGA 10240 INFORMATION FOR SEQ ID NO: 33: SEQUENCE CHARACTERISTICS: LENGTH: 13206 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: CGCTTTATCG TGGACGTGGT CAAGCCGAGA ATNTCATCAA GGAGATGAAG GAGGGATTTT TTGGCGATAA AACGGATAGT TCAACCTTAA TCAAAAACGA AGTTCGTATG ATGATGAGCT 120 GTATCGCCTA CAATCTCTAT CTT'!'TTCTCA AACATCTAGC TGGAGGTGAC TTCCAAACTT 180 *TAACAATCAA ACGCTTCCGC CATCTTTTTC TTCACGTGGT GGGAAA.ATGT GTTCGAACAG 240 *GACGCAAGCA GCTCCTCAAA TTGTCTAGTC TCTATGCCTA TTCCGAATTG TTTTCAGCAC 300 *TTTATTCTAG GATTAGAAAA. 
GTCAACCTGA ATCTTCCTGT TCCTTATGAA CCACCTAGAA 360 GAAAAGCGTC GTTAATGATG CATTAAAGAA CAGTCGAGAT GAAAAAATCG TGTGACGCAC 420 CAAGOGAGGA GTCTGCCCTT TTGAGGAAAT CTAGCGAGGA AAAACGATAC TGGAACAGCA 480 C AAAGTAAAA CTGACCTCAT GAGGAGG.AAG AAAGTGGCTC ATGAGGTCAG GGGTTTTG12A 540 AGTTACATCT AGTTGAGAGA GGTATGAATG ATTTGGGATT AATCATTI'CT TGTTTTAAAT 600 **.CAGGAGAATA GTAACGATTT TTTCCTTTT TGACGAACTC TATTCCGTAA CGATCAATCA 660 ATTTAATCAT GTACCTAATA TTAGAATTGT TTATCCCAAA TTTATTTrGAA AGCTTCTCTA 720 AGCTATATCC TrGTTTTCTA AGTTCATAGA TCTGAACI-r ATCATCATAA GTTAGTTTCA 780 **TAATAAAAAC ACCCCAAAAG TTAGATTTTT TCTGTCTAAC TT'rrGGGGGG CAGTTCATTC 840 AACACCTGAT ACTATGCGTT TTTCTTATTT GAAATACTTT TTACTCAACC TCTTTATACT 900 CAATGAAAAT CAAAGTGCAA ACTAGAAAGC TAGCCTCAGG CTGCTCAAAA CAGTGTTTTG 960 ACGTTGCAGA TGGAAGCTGA CGTGGTTTGA AGAGATrC GA.AGAGTATTI ACTTAATCTT C?'GATACTT TGACTAAGAA TAAATCCTAC AATCATCCCT ACCATA=r GCATAAAATT
CGGTAGAATT
GCCTCCTACC
TCCTGCGAAA
GCCTGATAAG
ACCAAAGTAA
TCTGGGAGGG
ATGGCAATAG
AATCCCTGCA
AGGTCAATCA
AAGGCCGCAA
CTGCTGCCCA GCCATTCATC AAAGCAGAAC T'rGCTAAAAT AAGGCCTAAC CACTGACT-TT AGCCATGGTT GACCAAGCTA AAGAACATCC AGAAACTTGC TAGTCCTCCG ACTACCGCTC AGAAGACACC AGCATCTAAA AGAGTTAGAA
CCAAGGCGTA
TTCCTTTAAA
ACTGAGGGTA
CTTCACCACT
TTCCTGTAGG
TGTTGGGATT TTAAGAAAT GGCGATT=A GTTGT7=?G CAATAGCACG ATAAACGAAA TAACCAGCTG ACTGGCAATG GGATAACTGG ATTTTCTAGG TGTCCTGACT AAGACGA'rTG AACCTAGAAC CACAGAAAGG ?TTGCTTCAT ATTrCTT~AC GCCTTAGAGC TTTCrACTGC CTAGAGsCAA ACGTACAACs ATAGTAAAGG TCTGTCCATC CCTCCCTTGA TAATGACTG t GCCGTTAATA GGGATACAAG TCCATACTCA TCTGCTTGTG TGGCAAAAG'r 'TATCACCTT TGCACCAGCA TTTTGGCCTT ATAAAAGACA TCCACAGCCT GGCGCTCCTA kATCATGCAA TCCTGACCGG ATAATAATTC TTTCTGCGCT GCAGTTTTCA TGTCTTCCPA TGCTTCTGGG AGAT'rAGGCG TAATCACACT GCACAGCTCA CTGACAGCTA CATCATGCG'r CACAGGTACT CCTGGGCGTT GITTGATAAA AGGGAGAAGA CCAATCTTAA TTCCCCCAAA TTGAAAAATG GTATCATCAG TTGGAAAGAC ACAAGTCACT GCTACAAACC CATGCAACCC CAGTCCACCA CCACTAAA-AA TATCATTTCC AACCAATCTC CTTTAAATAC AAACCATTTG CCTTCTTCTC CAAGATGAGA TCAATCTGCT GG7=rAAT'r
GACATAAGGG
TTCTGCAG
GTCCAAGGCC
TTCCACATCA
AAAAAGCGAA
ACCAAGACAG
TTCTCAGCCA
CGCAAGCTAT
TTCAAATCCT TTTTCTGTCA GTTCAAdGTA TAGGTAGCCA AGAAAGTGCT AAAATACCAT GTGCTGCAGT GGGACCTGCA CTACTGGCAT CCGG'rTGTTA
TCAACTCTTG
GATCCAACAC
CGCTGACAGT
CTAATTCATG
AGGCTOT CAA
AATCAGCTGA
TATTCTTCAT
AGTTGCCTGT
CCGAT'rTTGA
GAAAAGGTAA
1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 GAAGAGTCCC CACCATATTC CGAATC'TGTr TATACAAGAA ACCAT'rTCCT AGGTCAAAAA TTGTCCTGTC TCATCGACTA TTAAACTAGC TTCTGTGATG GTGCGAACCT TATCCTCTAC ACTAGTCCCA GAGGCTGTAA AACCGGTAAA ATCATGGGTT CCCTCTAGCT
TTTGATTGC
GACGGCCCAT
CCT'rGGCATA AATCTGCATT CGT'rCCACAT CGAGTGGGTA GGGAAAGTCG CTGGCATACT CGGAT'rTTGr GGACGTCCTC TATCCACAGT AAACTCATAG GTCTTGCTAT ACGGCAATGA AAATCATCTG CCACAAGCTC AATCGAAATC ACATCAATAT 334 CTTCAGGAGA CTGGGTATCC AAGGCAAAAC GGAGTTTrCTC GGTCAAAATG AATCACCTGT CCCAGGGCAT GAACCCCACT GAACAGTAAT GGCI'GCCCrT ATTTAATC TGGTCAAGGT CGCTACGCGC ATGAGGCTGG CGCTGAAAGC CAGCAAAGGC ?TCrATA TCTCGTCATA GccrTA~rr TATCAAGAAA TAAAACAAAT ATTGTATGGC TATAAAAATC TCATACrTN GTCAGTCC ATCTGCAACC TCAACACACT AITTTGACCA AGTAGATTGA AATAAGATAT GAACAACTCT A'rTAGGAAAG ATTTTAGCAG CTACACCGTA CTATTCCAAA CTCAATCAAC TCATTGAGTA TCAAAAGAAA AACTTAGGAA TCAATCCTAA CATGACAAAG ATAGAGATTA CAATCAACCA ACCTCCTAAG CTCATCCATC TGATAAGGCA ATC'rGTCCTA CCAGCACCGT TT"=CAATT TCrTCCTCAA ATAACCATCA TAGGAAATAC TTAGTCTGTA AACAAGGACC CGAAAATCTC TTCAAACCAC ACCTGCGGCT AGCTrCTAT TCAAATTAAT TTCTAGAAAT TATAGTTTGC TC'rTTGATTT GCTCTCTrCr GAAGTAGGTA ATACTAAAGA CCAACATCCC 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 ATTGTGAGTT AGTAAGCCAA ACAGCCTAAT ACAGCAAATG
GACAACTTGA
TGC 'ACTACC
AATACCAGAC
CCCAGCCACA
CCCCAATCCT
AACTACAACT
AAGACCGTCG
ACCCACAAGG
CAGAGGACCA
ATCCCGATCA
CTTrCCACCA
GCCGCTTCGA
TTGCACCTAG AACGAATGGG GTCGTAAAGG AAGTTGCTTG ATTGAGGAGT TTAGCTGGAA TCAAGACTAC ACTATAGGCA AATCCAGCCA ATGAAGACAA GGCAATCACG ATT'GCCCCA GTTTCCTTT AAAGATAGAA ATCAAGAAAG ACTGCATGAT ACTAAGAACA AAACTAGATA TCAAACTTGG AATACGGATG GTAATAGCTG TAGCTAAGGT AAAAATCAAG CCTTTCATTT TTTTCTTGAC TTCI-rTCTTT GATTTTCCAT CCAAAAATCC AGCACTATAG GCrAGAAAGA CGACGGCCAA GGTAATGAGA GAAGC'rCCAA TCTGAATTCG CCT-rTCCq' TGGTAGCGTT TCATCCCAAG ACCCAAACCA AAGAGAAGCC CTCCGAAACT 3420 TTCGTTCAGA 3480 GAACACTTCC 3540 AGCCAAAGGT 3600 AAAAACTCAC 3660 AC'rGGGCATC 3720 TATTGGTACA 3780 CTCGAGTrAA 3840 AAGGGACAAA 3900 TAGCTGTCCA 3960 CGACCTCTGC 4020 CACTGATAAT 4080 GTGTTCCAAA 4140 AAATCAGCAA 4200 TTAGCAGTAA 4260 CCT'rAGAATA 4320 ACGACTTGCT TCCTTCGCTC GAGCAGATAA AGGGGCAGCA ACCAAAGGCC AACAACTGAC AGAAGCCCGT AGCCCTAACA AGAAATGGCC TTGGCATTGA GACAA.AGGGA TAGGCTTGGT ACCAGAAGGG GCCCAAACTA ATCTGTAAGC GCTCAGGAAA CATCATCATG ATTCCAPAAGG AACGCAAGCT ACCCTGATAA TAGTCAAACA TGGCTGGTAG AACGAGGGAG AGAGCCAAAA TGCTGGCCCG TATATTTCTC TTAATCTTCT ACTTITGA AGCTGTACCG CTCAATGATA TAT'rTTTCT AAGAAACCAT CACCAAGAGC TCAATT'rGTT GGCACTCGAA ATGGAAAAGG AGGTAATCAA TTCTAAkAAAT TGTTCATGA AATCTCTTTC TAGTTATCAA ATAAGCAAGA AAAGAAGAAG 4380 4440 4500 CCTCATTGGT rT=AGACrC CTrCT'rAAA'r TCGAAAATGA ATCCCNTCTA TCTTATACTC AATGAAAATC AAAGAGCAAA CTAGGAAGC AGCCGCAGr TGTTCAAAAC AGTGTTTTGA GG7TrGCAGAT C-AAACTGAC GTGGI-rrGAA GAGA~rTTCG AAGAGTArrA CTCTTGATTT GCTTGATAAA ATCAGACACC ACTTAAACAC AAAGGATTAT cCCTrCGCArr
AACATCATAT
CCCTGTTTGT
AGGAAATTTT
ACAGAAAGGG
CAAAAAAGAG
CTAC=ATA
CTAGAAAATA AATCCTCCTA CCATA'rAGGC AACATTCCAA CCCTTGTrCA CA-rCAAAAA TGGAATATTC AGrTTrAGAA CCP.AGCCArr TAAAATGGI'C CACACTGCTC GATCCCAAAT GGTATCCGCT AAAAACCAGA TGGGAACGAT GAAATTAGTC GCAATGGGCG CCAAGAGGAA CATGGTGACC CCACCTTTTA AGCGCAAGAG GCTCATAACC rr'rAGAAGAT AAAGGGTA;A GTAATAGAGA AGCATCCCAA AACCACCATG AGCCGCTAGA AACAAGAAGA TACGGCTATA ACACAGGTAA TCATGATACT TGCCAA'rrTT CACCTACACG GGATGAC~rr
AACAAAGATA
GAAGTAAGGG
AAAAAGAGCA
CTTGTATTGA
ATACTGGCAA
ATGTAAATC
ACTTCGGCCT
AATAGTTACC
CTTAGTAATT
AAATACAAGT
T'rCATGGCAC ATATTTTrrCA 7T GATGAGAG AGACCTTCTTr
ACCAGTAGCG
AAGAGG'rTGG TCAAGATAAAk TTATAGTGr
CAAAACCATC
CGCGCGTCCC
TTACTGTATC
ACAGAACCGT
CTCCCGTAA6A
TTGACATGCT
AATCTTACAG
TTrGTTTcAAA
ACCATCAGCC
TAAATCTTCC
TCGATATTGT
TCTT1'TGGCG AATrrAT-rTC
GGATTCCACT
'rCTGAGTTAC
CTATTCTTTG
TCACAAATT
CTACT'rCTGC AACTTCAGCA CACCGTCTTC GTAGACATAC TTTI-CTCCT-rA TCATCATTCA ATACCCTAAA A'rCAGCATTT
TCACAAACTC
rGTCGCCTTC CACCTrAC CGTTGGCA'rC
CATGACCACA
ATTTTGGACA
AAAA'rCAA
AGAAAAAAAC
GCTTTGAT'T
TCCTCTGCTr
ATTCCTTT
CGACGCTCTG
AATACAACAG
TGTTGGAGAG
TCATTTAAGT
'rACGATGCGG
TTTCAAGTCC
GATAGCGACA
CTCTGGGCAA
4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240
TTCTACATTT ACATTCTTT TTCAGCTTCT TTCAAACCAG r'rGTATCAAG GTAGACAGCA CGATGACTAT CCTTGTAGTC ACGCGCAGCA TqAATTCCCT TGGCAATATT TTCCTTGTAA ACTAGGAAAA TTTT-CAATC TC?IGTGGC ATGACAATCC CGCCTTGCTG GGCAATTTCT TGAGGAATTG CTGCAATAGC AGAAACATGA GTAATrATCCA CATCTCCTAC AAAAACAAGC AT'rTGGTAAA TTGTTCATGG TTTCTCGAAC AGCAACTAr'r CGATATCAA'r CTATCGGCTr TTTCAACTAC TTCTTGAATG GTT?13A4AGG AGAAGTCTCA TTAGGGT'ITC AAGGTCTGTT CTCTCTCATC AACAGAAGCT TTCCAATATC GCGACCATCC AAACCAG=r CTCACGCACT TTGGTCACTT CATTT'rCACG GA'rAGGATGG TGGTCTCCAG TTTCTGAACG TCCAAAGCTG ATTGGATGCT GGTCCAACAA GGCTAGAAGG rrAAGAGCCA TATAGGTCGC TGCACGATAC AAATCCTTAG CAATAATCTT TG.CGACCGTA GCAAT'rTGAA TTCNTTCAT ATCGGCTCCT CAAACCAAGA TCCTGTAGCC ATGTGCCCAG T'rCCTGCACG AGCGGCAATA GCr.=~CCC CAGGATTAGC AGCTTCTTCT GAACTACTAG GCTGAGAACT ACTTGAAGAT GAGATr'rGTA 336 GCTTCGACTT CTTCAACTCC TAA'rrGGrrC ATAGCTCCTG 'rATCAAGGTA CGTGAATCCA CTCTTACCGC TGGAAGCAGG ACCA'rCAA ATTTTA'r'?rr TATAACATCA CCTGGATTAG GATTCAAGGC tTTAACTCA GCAATGGAGA CTTCTCCTGC GAGAACTTTA ATCGTr0CCTT AAGTAGATTC TGGCTCTGAA CTCTGCTCAG CTACACTGGC ATCAGAATCA TGAAAGCC?1' TTrAAGGCTGC TGTGCGATTA CTCCCCCCCG ATGATAGATA GATGAGAACG ATGACCATCA CCACCACAAT TACAAAGAAA ATACTAGCTA GGATCGTCAA AATACGATTA GCCATCCTAT CAGCCCCTCC GTGG CGA TCCCGACGCT CTGCTCT'rGA TrT CTTGA 0 0
CTTCTTGCCA
CAATATAAAT
TGCCAAAC'r
TGACCCTGAC
AAATTGCCCA
ACTCCCTCTA
TGTAAAATCA
TAAATAAA'rC
AAGAAAGAGA
GGTCCGAACC
CGGTTCTrI'T
ATCAACATGA
TATTCTGATT
CAACTGGAAA
ACTCGCGCCC
GTTTAGCACA
GGAAGGTCAC
TTACTGATAT
ATAACAGCCA
ACAACAAAAA
AAGAAATTTC
TGATTGGAAA
TT'rCCATGTA
=TTTCT
ACTTGCCGAG
TCAGAGAGGT
TCCTAGTCAG GATATCTTAG GCCATACCTT ACTCCTrGTr 'rr=rrACT AAATCACACT TATACCTrGAA CGATGTATCG TA'rTTGATTA CCACGATA.AT GGAATCGTGC TCATAGATA'r
'N'TCTTATTA
CCTGTGGGCT
GTTTTTACGA
AGGCTGTTAA
GCGATAATCC
TATTTTTTTC
TTTTAGTCTC
GAATGCCTCC
CCAAAATGAG
6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040
CCAGGAAGCC
AAATTATAGT
GATAAGATAA
AATCCCAA'rG
CCAACCAGAT
TAAATCAATG
CTTTTCACTT
AGTGGTCT'TT
G1'CAAGAGTT
GTCGGATTGC
AGGTCATCAG CAGAGAAAGA TACAAAGGCT ATAATCAAGG GAAAATCTGA AAAGAGTI'GC CTAAAACTCC AAC=rTCTT CAAATAGTAG GTTTTGAAAC CCACCCAGGC GGTCAAGCcT AAAA'rGGCAT CGACTACAGA CGACGGCAGA GCTACTGGGC TGATr=lG
AAAATTAATA
GCCAGCCTGC
GAAGAGCCCC
CCTTCTCATC
CATGAGGCGA
AAACAAGATA CGAGACCAGC A'rCCCTTCAA TGGAGTAGAA CTrTCGACAT TTGGATAGTC =rGAAAAAC GA'rTGGACTC ATACGAAAA'r CACGGACACC TAAAAGAGAA CAGCACAGAA ACTGCGA.AAG AAGGGTTCTrC CCTCCCGACT AGTTCTCCGT GTCCCAACGC AGTCCAAGTC TACA71"rTTG TCTAATACAT T-TTCrATCTC AAACAACATA CTAGGACGAT CTTGGAGGTC TGCATCCATC ACCACCACCA AATCTCCTGT CGCATATTGC AAGCCTGCA'r AAAGGGCTGC TTC'GCCA AAATTCGAG AGAAAGAAAT ATAATGGACT GCCGGA rrT GC'rCCCGATA 33.7 CGCCTTTALG AGTTCCAAGG TCCCATCACT TGATCCATCA TCGACAAAGA CATACTCGAT TrCrGl-1rCC AAATCTGGAA GTAAAGCTTC CAGAGCCTGA TAAAAAAGAG tzPu'rG1'r AAACAAGGGA GATGC7?rGCT TTGCCAACGC CCCAACCGAG ?I'CTGCTTA GGCGACGTTC TACGATGCGG TAA'rTTCAAG AACTGAGTAA CrrrTGGAT TTTTTTCAAA AGTCACGAAC ACCTGTTCCA CTAATTTTCC AACGGCTACT GA'TTTCC CAAATCACCA CGATGATTGA AATCATCATC TTAGTCTTCA CATGCGTCTr CAC.ACATTTG GGTGATGTCG GCTTT'GCCG GGCTrGAGTA GCAGGCAGCG
GAAGTACTTC
AATCCATTT'G
AGTTCTGCI-r
ATATCACCTG
ATGT'rTrGGA
AGTCCTGAAC
ACGTGGATAT
TAAGGAATAG
CCTTTACCAG
GCTGCAACGT
TCTTCCGTAT
TGAGTCACAT
CTCTCATG4GG
GACGGCCCAC
TTCCAAG=T
GACCCTTAGC
CGTAATCGTC
ATGGCAAGAG
CTCCCATTGG
CGCTTTTCC
ATAAACGTTT
CAAATCGACA
CAACA1-rCCA TTCTGAGTCT GCN'TGTAAA TATCACTCAA TAGTACGACC GTATGGGTTG GTCACTGAAA GTGGGAAATC GCGGATCCCC GTAAACTG'TC GCAGAAGAAC TGAAGATGAT CCATGGCTTT CAAAAGGCTC ACAGTTCCAG CGATA'N'GTT TACGTGrr--Z TTCGCCAACA GCCTTCAAAC CAGCAAAGTG CCTGCTTGAA AATATCTCTG AGGGTATCTG TGTCACGAAT TCCAAACACT 'rGCACTTGCT ATTGTTTGGA ATACCGTTTG GTTAAACrAA CCAAGCAAGA AATTTCCcTCT AGCATGAGCT TTCCAAGATG GGCACTGTGT GTTTTTACAG 7rGTT'rTCTT GTCATAGTAG GCAAGAGGGA AATGACACCA GTCGGTTC= ATCTGCCTC-A TAGAAAGGAA 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 TCTCAACTCC TGTGATTCCT TCAACAACTT CTAAACTCTT ACGATTGCTA TTGACAAGAT 'rATCCACCAC AACAACTTGA TGACCTGCT'r GGATCAATTC AATAACAGTG TAAAACCGGC ACCACCAGTT ATTATTTTTT CTTATTTTAC TGATAAAATT TCAGTAAAAT TCGCCTTCCC ATGGGTATGG G7'TTr.AGCT GACTTCGTCA CGCCTACTT CCTAGN'TT 'rGACATAGTT TT'CAATTGGG G7TGGGCTAG ATGGTAACCA TGGCTGCATA GACACCAGTT TGTAGGCACG GATTCCAACA ACCAAAATCT TTTCTTGCA'r CT7='CCT CATT-rTTGAC AGGGAATGTC ATTTGCCATC GCTTATACTC TTCGAAAATC CAATTCAAAC TT'ACTGACTT CGTCAGTTC? ATCCACAACC CTTCTATCCA CAACCTCAAA GCAGTGCTT TCTr'rGAVII' TTAT'rGAGTA TTAT'rC~CCT TAATTTAGAG GCCAAGGT CAACTCCTTG ATGA'rAGGAC CAG'IrTGAG GCCTGATGAA AAGTCAGGCA CCTGCCCAAA GAAAGGAGAG CGCTCAGAT'r 'ITGAAGTAGC TTCAGCCAAA 'rGGGTTCCAA
CGATTCTCAG
CTAAACTACC
TACGTCAACG
TCAAAACAGT
GAGTAACCCG
7N'TACTCGT TCI'GGA'rCA
CCTAGTCCAC
AAATCACTGG
ATCAGATAGT
GAGTCAAGGT GGCCTCCTCC CCATGTCATT TTCGTGGGTA CCCACTCCCC TTCTGC.A'G CTCGTAGTG TCCTTTTTGA CCCCCAACCA AGCTCCCGTC AGCCTGATGC TAACGGTGTC CCAGCAAACG AGTCACTAAA GGCGGTCAAA TCCCTGCAAA CTAATTGCCC TATCAAGGGA CCAAATTGGA TTCATCC~T CTTGTCCTGA TTTCTCTAAA CCATCTTGTA CCAGGCTTTA CGGCC'rTGGT GGCTTGACCT TCGAGAGGTA GTAGGCAGCT TCATTGTCTT CACrTTCTAA AAGATATCAA TAGACCACCC TCTACAAAAA TCACTTCGAT ATATCCTGAG CAGTATGGTA AAAGCATCC'r TATAGAAAGA TTCrGCAAAT CATCCTCTTG TTGACCTGT'r GG4GCCAATTC CCACGTTCTG GACCTGTTTC AACCAGTCAT TGAGCCCATT ATATCATGCT TAATCAGGTC TTGA'rAGGAC GAAAGATAGG CCCTCTGCCA CTGTCTCTI'C TCrAAAGTGC CAATTTGCAG AAGGC7"rCAT AAGCTTG4GAT AATCTTATCT ACTAGAACTT GAACTCTACC T'TCTCAGCTT
ATTTGTTGGA
GCGCCTAAGG
ACAACAGGGT
GGACGGACAT
GCCAAAATAA
AGAGTCACTT
AGTTGGCCA'r
CCAGGGAATA
GAT'rCTT'CTC
TTCAAGAGAA
TCAGCTAATA
TTACGGCGTT
TGCTCATGGT
GTTGCTCCCA
CTAGATATGA
CTTATCTTCC
ATAGTGACCT
GTAGATATCA
CTGCCCGCTT
ATAATGCACC
CCAAAATGTC
CTGCAGATAA
338 GCAACGT'rTC
ATAATTTCCC
AATCN'CCAT
CCACTTCATA
CCTGCTCAAA
TIrCTTTCAC
CT'ACTCTCGC
ATTrCATTrAGC TGAGGCTTGG TTCAGAATGG TGCGCTGGAG GGCCAGTTGA TAAAGTTCTT AGACTCCCGA ACGCTCGTAA AAGTCGATT AATCCACATA AAAATCAGCC CCCAAGCGCG TGGAAAACCA AGGACTGATA ATTCCTGCTG
CAAAAACGGT
CAATTCCTGC
TGGAAAGGAT
TTCCAT'rCAC
GGGTCCAATG
CCAGTGATAT
CCACCACAPA
ATTCGAAGGC
CACCTCTAGG TCACTTTCTC TCCAATAATC GCAACTT'TTT TGGTTGATGC CTGACTAGGC 'rAAGAAGTGC TGCGATTT CAAGCAACCC ATCAGATAGC* AGACATCTGC CCCCTTTGCC TTGCTACTCT TGAAATAGGC TATCI'AGGTC AAAGACTTGC ATCTACCGTC AAATCAAATC ACCTGCAAAG GGAATCAAAT GTrC7TGGCCA AGCTGATAAT ACCTAAACGC TCTAACA'rGT CTCCTCTTCA CCAATCTGGT CAGCTTGACA TGACTGACTT TCCACCAGAA GCATAGAGCA 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 TGAGGCTGAA TATTCCCAAT ACGTCCAATT GTCGTCTCCT CGATTCCTAG CATCTGACAA TTCAACGATA TCAATATTGG TATGGCTGAC ATAAACTGCG GATGTAAATC TGA7Tr'TGCG GACGGCTGGC AAGCAAGTCC CGCGTGCTTG ACGATAATCA AGTCCACACC CTTTTCAA'rG ACGAATATCG AGGGCAACCA TGACCCTTTG GATACCCTTG ACCACGGCTG TCTCCC1'CCA TAGAAAA~rC CTGAGGGCAA CACTTCACTT GCTAACATGG AGCACCTCCT TGATAGCTTG GACGTTCTTC CAGAN"1 TCTGGGATTT GTCCGAGGGC C7*MTGCCA TNTGGACA AATACTGGAC TGACTCTTT GGACAAGAAG GGACCAAAGC GAACATCACT GGCTGATAGC CACCAAAATC TCATAAAACT TTCCAGCTC TX'CTAACATG TCCATGATCC TGTAGCCAGA TACGCAAGTC GTCTTCACGA ACGCTCTACA TTAGCTAACT TCCCCAAACC TTCTTCTAAA ACCCATGCCA GCAATGGTAA TGACAGACAC TrGTCAGTC ATTGGCTAAA CGGACTTCGA TTTT=CTT TAGOCCGTGA AGACTGATAG GGACCTTCCA CCACCTCACC TGCAATAGCG AACCAACTCG ATAGGCAGAT AAGCATGGTC ACTTCCCACA TGACACAAAG GAAGCTACCA AT'rCTAA'rCT CTTTGAAATC ACTCTATTAC CTCTTATTAT ACCACATTTC AATCTTCAAC TCTGGCGAAA GAAGTTTCAA TGTCCTAAAG TAATAAG1'GA ACAAT'TTGCA AAAATGTCAA AAAATAAAAA ATAAACAGTT TATAAAAACA CATGGTAGAA TATAATTAGA AAGTTAGAAA TTTGTATTTG AAGGTGGTGT TCAGA'rAAGA AATTTAG1TCA TATCCTTTCT GGAATTTATC ATAACAGGAG GATACAGTCA TTAGAACTTAC TTCCAGAGGA AGATATCAT'r CTCACAGCC ACTTGTTI'AA TTACAGGTCG CTAGTTATAT TTTATATAAA AATAGGCTAG TGCTGTCTCT CTAGCCTATT TTAA'rAATTA TAGAGAAAGA ATGTTTAAA.A TGTGATAAAA ATTTCCAACA ATTATTTATC AGATAAGATG CCTGCACAAG GGTGGAAAAT AAGACGCTGT AAATAT~rT'rT AAGATTGTGT ATAAACTATC T'rAAAGTTGT TAAAAATTTA GAGGAATTAA AAAAXAT'rAA CTACTGCTAA CAAATTTATA ACTCTATATC CTAAGTCAGA TTTGTAATCT TACGAATAGA CTG'rCAGAAT TTAAGGCTCC 
A.ATGTGGAAT GCATTCTCCA GTTCATTATA GATATGGGGC ATGATGAAAA AAATAAAAAA GTCATCTATT TAT1'GCTACA TAGAAGATAA GAGACAAAAT TTCCCTACTC T'rCCTAGCTG
AAGAAG
INFORMATION FOR SEQ ID NO: 34: C7*TTCTGCTA
TTATTGGGCT
ATCCTrAGCAA
TCTTC.AAAAG
GCC'rCAACAT CTr'GATTT
TCTAGI'AAAA
ATCTTCTCTC
CAATCTGGA.A
GGAGCA'rCAA 'rCAAACGACC
CTGCCAAGCC
TTTTAACCGC
GGCCTCTCTC
TAGCCCCCTG
ACTTTCCAAA
TTCCCAGTAA TATAAGCACC ATCCAATTGA AAGATTTTAA.
TATTCAGAAA ATTCTTGACA AAATAAAAGT TTGACTAAAA GACGAACCAC GAATTTGCTC TGGAACAAAC ATTG'N'TGAA TCCCTAAGTA TTGTTCTTT'r ATAAGTAGCT TTACTTACGG GGAGTTTGTT ATGGATT'FAT GGGTGATATT TGGAATTACT ACACATAAGC TCCCAAATAA CCAACTAAAT AATTGTAGCT '1rCCCCTAGG GAAATGAGCC ATCTGAAGCT AAGAGTATGA AAkAAATACTA TCTGACTATC T'rrTTTAAAA AAACAAGCT'r TGAAAAAAGG AAGAACTATG GAAAATGGAT TTATTTTCAG TTCATTI'GTC CTGCTTCCAC 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13206 340 Wi SEQUENCE CHARACTERISTICS: LENGTH: 132.04 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: CCGGATCCAG CGAAAAATAT GCTCTTGAT GCTGTAAGT1G GTCAAAAAGA TGCTAAAACA GCTGCTAACG ATGCTGTAAC AT'rGATCAAA GAAACAATCA AACAAAAATT TGGTGAATAA 120 AAAATTTGTT CAAGGGGGGT GGAAATCAAA TCCCCCTTTG AATTTATCAA TAGAGACACA 180 AATAATT'rAG CTTTCTTATA AAAAAGTAGT ATCCTATGAA AGGAGTTAAT ATGGAAAAGC 240 AACAACCTAG TAAAGCAGCC CTGCrGTCTA TC-ATTCCTGG GTT-AGGACAG ATTACAATA 300 AACAAAAAGC CAAAGGTI'TT ATCTrCCTrG GTGTAACCAT CGTATTTGTC CrrTACTrCC 360 TAGCACTTGC AACCCCTGAA TTGAGCAACC TCATCACTCT TGGTGACAAA CCAGGTCGTG 420 ATAATTCCCT CTrTATGCTG ATTCGTGGTG CCTTCCATCT AATCTTTGTA ATCGTTTATG 480 TACTCTTTTA TTTCTCAAAT ATCAAAGATG CACATACGAT TGCAAAACGC ATTAACAATG 540 GAATTCCAGT TCCACGCACA CTCAAAGACA TGATCAAACG GATTTATGAA AATGGCTTCC 600 ***CTTACCTCT'P GATCATTCCA TCTTATGTTG CCATGACCTT CGCGATTATC TTCCCAGTTA 660 T CGTAACCTT GATGATCGCC rTTACCAACT ACGACTTCCA ACACTTGCCA CCAAACAAGT 720 ***TGTTGGACTG GGTrGG'TG ACCAACTTr'A CAAACATTTG GAGCTTGAGT ACCTTCCGTT 780 CTGCCTT'rGG TTCTGTrCTT 'rCTrGGACTA TCATTTGGGC TPTTGGCAGCT TCTACTTTAC 840 *AAATCGTAAT TGGTATCTTC ACAGCTATCA TTGCCAACCA ACCATTTATC AAAGGAAAAC 900 GTATCTTTGG TGTTATTTrC CTTCT'rCCT'r GGGCTG TCCC AGCCTTCA'rC ACTATCTTGA 960 CATTCTCAAA CATGTrAAC GATAGTGTCG GTGCTATCAA CACTCAAGTA TTGCCAATCT 1020 ***TGGCTAAATT CCTTCCTTTc CTTGATGGAG CTCTTATTCC 'PTGGAAAACA GACCCAAC'1T 1080 
GGACTAAGAT TGCCTTGA'PT ATGATGCAAG GTrG~CTCGG ATTCCCATAC ATCTACGTTC 1140 TGACCTTGGG TATCTTGCAA. TCTATTCCTA ACGACCT'N'A CGAAGCAGCT TATATTGACG 1200 GTCCAACGC TrGGCAAAAA 'rTCCGCAACA TCACr TCCC AATGATTrG GCTGTTGCGG 1260 CACCTACTTT GATTAGCCAA TACACCTTCA ACTTTAACAA CTTCTCTATC ATGTACCTCT 1320 *TCAATGGTGG AGGACCTGGT AGTGTCCGAG GTGGAGCTGG TTCAACCGAT ATC1'TGATCT 1380 CATGGATCTA CCGTTTGACA ACAGGTACAT CTCCTCAATA CTCAATGGCG GCAGCTGTTA 1440 CCTTGATTAT CTCTATCATT GTCATCTCAA TCTCTATGAT CGCATTCAAG AAACTACACG 1500 CATTTGATAT GGAGGACGrC CAAAGCTTA CTTACCTTTA
ATTACCATTA
ATCGACCTCA
TACCTCAACA
TGTCAGCC?'r
ATTTTGATAA
CT1-rGATTAT CTTrGCTGG' ATGC~rACAG TTCTTGATCA TCCAAATGGT CTTrATGTTGA ACGCCCTTAA ATCCCGATGA ATGCTTGGC'r GAATCTGCAA AACTAGACG CTTGTTCGCC CAATGGTTC TACATCCTCT CTAGTTTCTT CAAACCTTCG 'rrAACAATGC CTCATCGCCC TTCCAATCTG
TAAGATGAAT
ccTGATTGGT
TAAAGCAGGT
CT'1rAAAGGC
CGCCTTAATT
CCGTTACAAC
GCCAACTATG
CCACAAC1'GG CA'rGAAAGGC
TGCAGGACAC
CGTACAAGCT
GCTTCGT'GAG
GAAAAATG
TATTCrCTTC
AACTCAATTA
CTATCAATTG
AACCTCTCAG
CTCTTCACTG
ACCA!EGCCTG
TTCTTGGCTC
AACTCAAACG TAGACTGACT TAA'N'ATCTA TCCACTGTTG CC'1'AAACT AGATACTAAT AAACCTTGTA CGGTACrGG TTCAAACAAG TATCATCGTA GTAAACAAAG T 'rGGTCTTC GCCGCTTTGA CACCNTCTT CGTTATrC-CG TTCCTCATCT TCCTCTACGT TGGGTGGT TACTTCGATA CAGTGCCAAT GTCTTTAGAC TTCCGCCGCT TCTCGCAAAT 'rGTTCTACCA CTCTGGGCCT TCATGGGACC TTTCGGGGAC AAAGAATACT TTACTCTTGC CGTAGGTCTC AAGATTGCCT ACTCTCAGC ACGTGCTATC
'T-CTTCCTAC
ATCCCCGCCA
AAAAGAACT'r TGTTTCAGGA CCCTTTTTCA TTT'rATACTC CT'rACAAGTG
TTCGAAAATC
AACCTGTGGC
ATATCCTTGT
ATAATACTCA
TATTTTCAA
7TTCAACTTC
CAAAACAGCT
CTGACAGATA
AC3-rATACTG
GAATTACCTA
AGCAAGGAAC
CGACAGCTTGA
CTC'rTCCTAG
GGAGCTAGCT
GTGGCGACAA GGGATAATTTw TCTTCAAACC ACGTCAGCT TAGTPTGCAC TTTGATT'rTC AGCAAGCAAT TTTTCTCCTA TATAGAAAAC ACCTTTTAGA GTATTTGGGG GGTTCGTAAG TCTTITACCAG TATCTTCCTT CCCAGGAGAC CTATCCGCTA AGGN'GCCA GGATCTCTCT GAACAGCTAG TCAAGCCCCT AGGACTTGCA ACTGCAT11TC TGACCCGCAT CTCTTACCGA CCCAAGCAAT TTCTAAAGAC TTCTCGGTGC GAGCTTCCTC TT-CTCCTTT-A TATCACCAAA 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 TATCTCCAAC CTCAAAGTTG TGCTTTGAGC ATTGATTATT AGCAATTGTC ACTGTAAATA GACTTGAAAT AAAGCGCATT TCTCTATATA AAGATACCTA TGCTTCC-ATA TCCA'TTCC CCCCTGTCCA AACGTTTCGA GCTCAACTGG ATCAGCTTCT CTATGGTACC CA'N'GC'ATC GAAACT'rA TCGATAATGT CTATGAACCT GAACATCCTA CAATTGTCGA TGGCACATTA
TCTGTTGTGA
GATACAAATG
GCCATTCAGA
TGGTACCAAC
TTTGGT'rrGA
AGATCACGCC
TTGGTCCAAG TCAAATCAAG AGCTAGTCAT CAGCAAGGuAA CTGAGAGTTT CAAAAGCAAA AAAATCGTGT CTATATCAGC AT'rTCTTTAT CGTCTCTCTT TCTT7TTCATT TAATACCTTT
AAAGAGTGCT
TTGGGATTAT
ACCATTTTAT C'TTGAACTCT TGGCCAAAA TATG.ACAACC 342 TTAGGAT'rGC CGACTCTGAT TACACTTATT CTGATTACTG TACAAAATAT TCm'rTT' CTGTATCTGG TCACTATCTT
GAGATTTTTA
ACCGTAACCC
CGCAAAGCTA
'rGCCCGTTAC
GTGTTATTCA
TGAAGGAACT
TTATAAAACA CATTTCCG3'G ATCCAAATTA CCATAAATAG GATTAAAGAC GTGGCCAAGG CTGCTGGTGT TTCGCCTTCA AAA'rAAATCA ACCATTAGCG ACCAAACAAA AAAACGTGTT AGCTATACTC AGG'rTATCGG CCTT-rCTT~rC CATCGTTCT A'rTCAGATAG CAACAGGGAA TACCGCAAGC CTGTAGATGG AAACTCGTCG CAGAAGAACA CAACTACCAC CCAAACCTCA ACGCTCCTAG ATTAGTCl-r CCTGA'rGACT CAGACGCCTT ACGTGGCATC TCTCAAGTCG CATCTGAAA.A AGATGAGAAG GAGCGTCTCA ACGCTATTTC GCTAATTTT CTCTATGCCC AAGAAGAAGA GTTCCCCTTC CT'rATCTTAG GTAAATC'rCT ATCCCACT'rG 'rCGACAACGA CAATGTTCAA GCTGG7 G ATGCGACTGA AAAAAAGGCT GCAAACGCAT TGCCTTrATC GGAGGAAGTA AAAAGCTCTT GACCGTT'rAA CAGCCTATGA ACAGGCGCI-r AAACATTACA AACTTACCAC CGCATCTACT 'rrGCCGACGA GTTTCTGGAA GAAAAGGGCT ATAAATTAG TTCAAGCACG ATCCACAAAT TGATGCTATC ATCACAACCG ATAGCCTCCT GTTTGTAACT ATATTGCCAA ACACCAGCTG GATGTCCCTG 'ITCTCAGCrT AATCCCAAGC TCAACTTGGC AGCCTATGTC GATATCAATA GTTTAGAGCT TCCCTTGAAA CTATTC'rCCA GATTATTAAT GATAATAAAA ACAATAAACA CGTCAATTGA TCGCCCACAA AAT'rA'CGAA AAATAAGAGA CTGGGCAAAA AGCAAAAACG CATACTATCA GGTAT'rCAAA AAACT'rGA'A CTATGCGTTT AGATTTACTT CCTrTTTCTAC ?G.AAATTGAG TCTr'rCCCA AGATcT'NTT" AAAA'rCAAAG IrCAAACTAG GAAGCTACCC GCAGGTT-GCT CAAAACACTG GTAGATG.AAA CTGACGAAGT CAGTAACCAT ACCTACCGCA AGGTGAAGCT GAAGAGATTT TCGAAGAGTA TTAATCACTA AT'rATCTATC TCAACAAATC A'rGAACATTr TCCGAGACAG AGACAAAGGA GCTTGGATCC ACTTGTGTCA AAATTCATTA AACTCTGCAC GTGTAATGAC AGTGATTAAA ACTGCCTTTC ATAGGTTCCT TCTGCATCGT GGATCATGGT TGCTCCGCCG TGCAATTTTT
CTTCGGTAACC
C1'ACCAGAAT
CCACTATGCC
ACAAATCGTC
CCCTCTCGTA
ATCTCC~rTC
ATATTTCATC
CGTGACCAAA
TCACAACAAT
CAAGCGA'rTA
AGCTGAACCT
TGACTCGGT
TGGTCG'rGTT
AATTTGTTAC
AGTCGTTAAA
TATTGTGGGA
ATACTCAATG
7'TGAGGTT GACGTGGrrT
TTCCTAGAAT
TAATCTGwrT
TCTCGTGATT
TATGGATrTTT 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 TTCAATTACC TTCTCTGGAT GA7"VMTAC AATCATGGCC TGCATACGCT ITTGCrAG'r AAAGACTGCG TCTGTCACAC CGCTAGAGAC AAAGATGGTA ATCATAGAAT AAAGAGCGTA 343 TTTCCAACCA AAGGTCAAAC C1'GCTATCAG CATGATAGTT CCATTTACCA AGAAAGAAAT ACTACCGACA TTC?1'ACCCG TMTCTTACG AA'rAGTCAGG CTGACGATAT CCGTCCCACC ACrWGAGATA TTGTTTCGAA GAGCAAAACC AATCCCCAAA CCCATAACAA CACCCCCAAA AAGGGAATTG ATAATGGGAT CC1'CTGTCA6A GGTTOCCACA GGGACAAACT GGATAAAGAA GGAACTCATA GATACCGTGA TAAAGGTAAA GACGGTGAAC 'rrAGGCCAA TCTGATACCA AGCrAAGACC ATCAAAGCGA AGTTAATGGC GTAGAAGCTr AGCGAAATCC GAATATGAAA ACCAAACCAG TGATTACTCA AGCCAGAGAT AATCTGTGCC AGACCTCTG CACCACTCGA ATACACATGC CCTGGTTGGA AAAAGAAATT AACTGCTACT GCTGATAAAA AACCATAGAC CAGAGAGGCC GAAATCrrCT CATCATACTT AA'TTTATC TGATAAGCAA AGCGGCGCAG
TTGTTTCATC TT-rCTrACT ATGGAGCTTG TGrCATTGGG GGATA~wrTC TTCTCCAGCA CGTGTGGTG4G GAAACCATAG CTTCAGTrGA GAAACCAAGA GAAGGCTACC ACCACCAAGC CCTTAGCCAA ATCACCTTCT GGTGGGCG4CT CATGTAGCGG CCCAAAGGAA G'TTGAACTrA GAAGGGCACC CAGTGTrGCA CCT'rATC'TC AAGAACAAGC CAACTGGTCC GTTTAATTCT
ACTGTTTGGC*TACTTCCGTC
CTrGTGACCAC AATCGCTTTT
TGTAAGCTGA
TCAGTTGCCTr
AGCAACATGA
TCCAkTGGCTT
GCCTTGAACA
TCA'rAACCGT
AATTCATGAG
cc'rrCTTCTT
TCATTATCAA
TTAGCCACTT
GCTGTTGTCA
CCATCAACCA
'rrCTCGAGAG
ATAATAGCGC
GTTCCTCTAG
TGTTGTTCT'r
CAAAACCGTC
CAAGAAGGAA
TCCGTTCTTG
TCAAGACGAT
CAGTCTCTTC
CAGACCATTC
TCAAGCCAAG
CAAGCGTATC
ATTCTTCTG
CCTTGACCCA
ATACTTGTA AGACACGTAA CACCGCTAA TrCGT~TrGT TTGTTTGAGA GCGACTGTTG AGGAA.AGGCA ATGACTTCAC AAGCCCGATA GCCAA6ACCAC ACCAAACTGG TCATTGGCTT AAGGTCTTrTr TGGTTGATAC ATCGTAAGCA ATGGCACGPA CTGTGGAAGT GTGAA6AGGAT AAACATCGGC CAGTCAACCA CTCrAGCA ATACGTCCAC CCCCACAAAG AGAACCAAGT GATACCAGTC AAGAACTTGG AGCAAGACCT TrGGCACCAT TGAATAGTTG TCCGCAGCTC GACTTTAAAG TCTACACCTC 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 ATCTTGI'CGA TGTCTTTACC ACAGCAGGTG CTrCTGAAAA -GGACCACTTC TGTCAAGTCC TGAAGCAACA TGTCAACC AGTATCTGGC CGTAAAGAGC CATAGCATCA TCGTATTTCA TACGAGGGAA TGGrACCCTT CTTTGTT'rC CTTCATCACG CGCGCGATCA AGCTTrCTGT AATATCI'TGG CAGTAAGGAA GGACGTNTCC AAGTCGACCT GACTAAATTC AGCCTGGCGG AGTCCTCGTC ACGGAAACAT rrAACGATTT GGTAGTAACG GTCAAAACCA
TTGTCAGAAC
ACTTCGATGC
ATTTCTTC
TCTCCACGCA
GCATTCATCA
AGAGCTGTT
GAGACGGCAC
CCACGTCGAT
GAAGTTTAAG
GTGTATCGTC
CGTGATTTGT
TAAATAATCA
AAACTCCAAC
ATTTTCCAAC
ATTTGCCTCA
344 GGACTTTGAG GAAGAGCGTA AAAATGCCCC TTA-rAACAC CGCGCCCCTT CAGCCGTTGA CTTAGAAAGG AATGGTGTCT TCATCCAAGT AGTTGCGGA'r AGAG74GGGTC ACCTrGGCAC Ar'rTCTGGAC GACGAAGGI'C AAGGTAACGG TAACGCAAAC ATGCCATCCT TAATCTCAAA TGGTGrlrGTC TTAGCTGrIGT TAAGCACAAT AAGCGCTGTC ACGTTTAACT CTTGTCACGC GCAGCGACCT GACCAGTCAC AGCTGTTGCC ATAACCTCTG CAGATACTTT TTCACGGTCA CGAAGATCGA TAAAGATCAA TCCTTTCAAG GTTATTTCT'r GTCCCATGTG ACGTT'rCA?? ATTCCTCC TCTTTTATTC TTCATGAAAA TCATCAGAAA AGTTI'GCCAG TTAGCGCTAA TACTCTTCGA AAATCTCTTC ATGGTTACTG ACTTCGTCAG TTTCATCTAC GTCAGTTrCTA TCCACAACCT CAAAACAGTG TTGCTCTTTG ATTTTCATTG AGTATAATAC TATTTTGAAT TTTTGCCTGC T1'TACCCTTT ACAATTTCCG TTATGTAAGC CGTCCCAAAA ACTGCTAGAT AGTTATAGAA GAAATCGCCT AAAAATAGAA CGACTGCCTG AATCACTGCT GACTCTATTA TAGCATGAGA ATCATCAAAA AAATACTGTA GACCAGACCT TTTCTGCTAA TAAAATAGAC AAAAAATTGT TGCACATCAC TAGATACCAG AAGAAAAATC AGGGTTCGTT GTGCTAACAT CCCTCTAAAA ACAATCT AGAATGAGAA AAGTGGTTGA GACAAGGTCA GATCATCTGG CAAGAAGAAT TGAACGACCA AAATAAATCG ATTAAAGCCG CTCTTCTCAA AAATGCCGTA CACATATACT CCAGCCAAGG AAGCGCCTAA AGCAAGCGAC GCAGTCGCGA
CAACCGCACC
CTCAATAACA
TrCAGGGTrG ATAACCAACT GCATGATTCC ACCACCAAGG TCACGACGAC GGCCAACCCA TTCCTCACGA ACACGACCAG CATACATACT TGT'rACTATT TrACCATAAA AGCGCAGCTC TCTrTAAAAG TCAGGTGAAA GCCCTAAAAA
AAACCACGTC
AACCTCAAAA
T'rTGAGCAA
AAAAATCCGA
TCAGCGATTT
CGCAGTACAC
'ITGAAGGCAT
AATAAAATTA
AGCCGACTAA
TGTAAGCCAA
CTGGAAAATG
TACTATI'GTC
CCGTCAAAGG
AGTCTGTCGC
GAGATAAGAA
AGCGTCGCCT 'rACCGTATGT CCATG~rrTG AGCTGACTTC CCTGCGGCTA GCTTCCTAGT AGTGGCAAC TTATCATrGG AATTCGCTAC GAAGGcTTTC 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 3220 8280 8340 8400 8460 8520 8580
TGAACTTCAC
CGGCTGCCTT
CTGCAATAGG
CGGACTCTTT
TCGAGGCAAG
AGCAAAGACA
AAGCTAGCGC TCCAATGATG CTCGTTTCAT GTGACCTCCT ATTATTCAAA GCGTGAAGAG ACCCAAACTA AAACCAAGGC AATCAAGGCA AATAGAAGAC CTGCrrAGGA AAGAGATAGC AGCAAAAATA ACCACAGCAA TATTTGCTGA TTTACTGAAG CCA.AACCAAG ACAGGAAGCC TATGAACAGG AGCCTTCTGA TACCATTTGT CCACATAGAG TAGAGTAACA GCATAGGGTG GCCCCTGAAT AAAGCCATAG ATAAATAAAA 345 AGGATAGAAG GGCrAGAAGA ATCCAGCCAA GGT'TwrrAAG TAATr'rCATA GATAACTCCT TTATTTrGAAA TAACGTrA CCATAGGTAA CTGCATCACA TTGATATAAA CATGGATGGC TCCTACAAGC AAGAAAGCTA GTAACTGAAT CTCTCCTGTC AAGAAAGAAA TGATAATAAG AAAAATATAT AAGGCTGGTA AGACATA'N'G GTGTAATTGG AATAAAATTC GAAAACTC'rC TTCCAAATTA GCCTGACGCT CCCCTTCATC ATAAGAATTT ATATAGTTCA AGACATCCTT TGGTGTAGCG A.AAAATTCCA AATCAAACTG ACGAACAATC GCAATGGTTT TAAAAAGAGA LI' TTGAGCG ACTAAGAATA CCACAAAGAG TAAGAAAGAA AGGAAAAATG TTTGAGGGTT TGTATGCAAT ATAATCACCT CACTTAATGA AATAAAAATA GCCAATGGAA TCGCTACACC TGTAATATTA AAAGCAATGG rrCCAAACTC T'rCAT'rCAGA TCGTCATCCA TTTCCTCTTG TAAGAAATTG AAAGTCAAAA ACATACTAAT CCATGGCATC AAGGCTT'rTA CATCTAAAAT CATCCCTACA AACATGCCCA AGAACCCCCC CGTTTCTTTT TCATA7TCAT TCTCCTTTTT
AAGATTCCGA
ATACAAAGAA
GAAACCTATC
AATTTCGTGG
AAGACAATAG
CACTTGCTAG
TCCATTCAAT TACTGGGATG AGAGCAAAGT AGACCCAAAC GA'rTA.AACCA GCTTAGGTCC ATCCCAATCA GTAGAAATAC CCACTACATA ATAAATCACT T'rATACT'rGT 'rCATCACTCG CGATTCGACT GTTTCGT'rGA AAATTTGAGA TATTTTCAGG GTACTCATCC CGTTCTAGTA GGCTAATGGT CTGTCTGGAA GTCCCTTTGA TTGAGACCAT CGCGAGCTCG AAGCTCTTTT GTTACACACC TACTCrCCGT CAAATTCAAC C.GT~rGGATA CGAA'T=TCT TTTCCCGTAT TATCTACACG TCGTAGCTTT CACAACTTCC CAGTTATCTG GCCCAATATA CACTCCCGT CATTTCTTGT AATAA'rCTCG ACArTCTGC GTTTCCT'TC GATTTTATTC TCTAGTTTCT TGATTTTTr AGAATTATTA TAGTATAAAT CCTAGTACCC ACATTATAAC TCCTTTCTGC TTCATTGTAA CATATCTrTT TCTrrrTGAC AAGTATAGTTr TGTCATTTTG CAAAAGAAAA AGGTCAGGAG TAGGTTCCTG
TACATTTGCA
TGAAATTTTC
AGTAAACAAA
GATTCGACAC
ACATCAAAAA
ATTTTTGGAT
AAAITrGGTCG
GCTGACTAAT
TCCTCCTCCA
GCAATGATAA
ACCCCTGCCA
AGACGATT
TCCTCAATAC
ACCCATTCCT
CATAATAGGT
TGCTTTTCTT
TAGCTGATAT
GTGCCTTAAA
TAACAATCTA
TTCTTTTCAA
CTTTGATAGG
AAAGCTATGA
AACGAAATAC
TGGATGGGGT
GTTTGGCTAG
TTrAGTTGCAT
GTTGCAACTT
CATCAACATC
8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 3540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 ATAATTGGTT CCTTTCCAAT TC7=TCGCT CAAGTCTTTT GALATAAAAGA AAATCATA-AA TTCCTATTTC TTAACTTGAA GTCAAAAAAA TTATGAITTT ACCACTTTAT CTATCATTAA 'rACTCTTCTA AAATCTCTTC AAACCACGTC AGCTTCACCT TGCCGTAGGT ATGGTTACTG 346 ACTTCGTCAG TTTCATCTAC AACCTCAAAA CCATGTTTG AGCTGACTC GTCAGTTCTA TCCACAACCT CAAAACCA'rG TTTTGAGCTG ACTTCGTCAG TTCTATCCAC AACCTCAAAA CCATG?'MG AGCTGACrC GTCAGTTCTA TCCACAACCT CAAAACAGTG ITrTGAGCAA CCTGCGGCrA GCTTCCTAGT TTGCTCTTTG CAAAGATTTC TGAGAAGTTT TGGCTGATTG TTTGGTTGTT CTTGACCGTC ACTTGTCCGC GGGTCTAGC CGCAAAGACA TCGGCTGACT AATCACGCTC TGCTTTGAAA CCTTGTTGC TATTTGCCCC TrCGCCCAAG ACTGCGATAT TCACACCTTG CTTTTCAAGG ATGAGAAGCA CAGCAGT'rTC AGGGCCTCCA AAGTAAGCAA CGGTCAGGTC Ar'rCCCCTCA ATC1'CTGTGA ATTTTTATTG AGTATAAAAT
TCTCAAGTGA
'1-r'CGACTTC 'rGAACTGAGC
GAAGAGCCTG
AGACATCTAG
GGCCCTCTAC
CCAAACCATC
'rAAACTCGAA
CACTGCACT
GCTCTCTC
TTTTAGTTTA
TACCAATTCC
GGCGTT'rTCG
ACCAAGTCCA
GTAGCGACCA
AATGGTGTGG
CCTAGTT
TCTCTCGGG
AGGGTGATGA
CGGTGAGGT
AAGGCCTTGA
ATAGGGAGCG
AAACCAAATC
CCCGCACAGA
TTGTAGTAGT
CCAGACCACC CACCATATTG GTATCGATGA TGTAATCTAC TCCAAGATTT TCCAACATCT GACGCACAGC ATCAAAATGA G GCTTTr C"TTCATCAAG AAAGTCCAAG ATAGACGGCG CATTCTCTAC TGCCACCTI'G T CTTTTT CCTTAGAGTC CAAGACACGA AGAGGATTTT CCTCCAAGCG ACGTTGGCTA TCCTTAGACA AGGTCTCCTT GAGCGGTGTC AAATAGTCAA 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120
TCAAGGCTTG
TGACACCTTG
CGGTAGCTGG
GCCCTGCC'rG
GCGGTAGGCT
AATACCGATT
ATTGCTAGAG
TGGACGCTCA
GCACGGCTCT CAGGATTNCC TCCTTCAAAA AA'rGGGCTGC CCAAAACACT CAACACCAAT TAACGGAACA TAGGTCCCAT AAGAGTGTTG AGGTGCAAT'r CATAGCGATT GTTTCCACAT CTGGTGGAAT TGGCGCAAC GTAGTAGAAC TTGCTTGGCT TTTGCACT'rC TGCGGCGAAA AGTTA=Tr CCACATAGGA ACGGACAACG GGTGCAGTTC CTTCTGGACG GAGGGTAATA 'rOACGGTCAC CCTTGTCATA AAAATCGTAC ATTTCCTTGG TTACGATATC CGTTGTATCT GCGTGCGCAC TTCTGCATAG AC'rGCCACTT ACCAGACTCA TCATAGGGAA TCCTCmTAA CAGTAAGAAA AAAATTAGGA ACCTAGCAAG GAAAGACCAA TAGTGAAAAA CAAGCTGTTC ACTAGTACGA GCTAGAACCT CCGACAGAGC GACTGATAAC CTCGTAATGC TCAAAAATAG TTGTAGCGTT TGAAAATCTC ACGGGCAAAG CCCTCAACGT GCAGGTAAAA TATCCTGCGT TCCTTTTGGT TTTTGTAATT ACTTAATAGT CTTATrrAC CATAAATAGA GGGA'rTAAAA ?T'IAGATATC A~rMTGAGA TTAAGAATTG TCAAAAAAAT CAAATAGCAT CCAAGTCAAC TGTATATTCC ATACGGCTAC CCACAGGTAT GGATAAGGTA AACAATAGAC CTAAAAAATT CTGGAGCTAG ATIrTrCATG AGCATGGCAC TAATCTTTGG 347 TTGAACTTTA CCAGACACAT ACAGAGTAAA GAAGAGAAAT AGCAAACCAA GCACGACTrG ATTGAATAAA TTAGCCAAAC CAACTAGACT AAGTCCTACG GTCTCCCACA TCATCAA'rCT AGGCAAGGAC TGCFCCCAA AATAATCATT GCCCGTAAGG CTACTGATGA TGACTGATAC
TAAAACACAG
GCTCAAAAAG
GACAGCAAAG
ATTTCGGTAA
ACCATCAGCA
TGATACTAGG
AGCTAAGAGC
TGAAAAAGAA
ACGCAGGCCA
AA'rrGA1GA TAAATAGTGC CTCTGTATAA GAAAAATTCA AGAGAGAATG AAGATATTAT AAATTCCACC CAAAGCGCCA CCCAAGGAAT TAATAAGCAA AGCATAAAAC CAAAGTNTT CTGTCCACTr TTAAGAAAAA CGAGACGTAA ATTGTTAGGA ACTGGTCTN' GATAGAAAGC TTCTCATTTT TrAAGVIN'C GATGACATTG ACAGGCTCAA TTT1GCTTTTT CCTAAAAAGA GGATAGTGGC AAAAAGCAGG CATTGATTCC CGCAACGAGA GAAAAATTGT TGACCGATAG CAGACTCCGA AAGCTTGACC ACCAATAGCT GAAATATAGG TGATGAACTG TAAOCCTCCA TCAGATCATC TTCAGCTACT TTTCCTTAA TAAGAGGCAT CCTGCAAAAT CACTGATGA'r ATCACTAATO ACATTGATCA AACACAGGCT 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13 080 13104 AGAAAAGGCA AAGAGACTAG CTTGCTGAAC AACTAGGGCT CTGAAACAAA CCGCTATAGA CCATCCATTT GACCTTGTCC CCCTGCAAAA ACTGTAAAGA GGGTCGGAAG AATCATGACA AAAAGATGCT TGTGACAAGG TCGATGCATA -GACGATAAAG ACCAAALAGCA TTGAAGAAGC OTG INFORMATION FOR SEQ ID NO: SEQUENCE CHARACTERISTICS: LENGTH: 19250 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear OCTAGAAAAA ATAGAACCOC CTCGTGTAAT CTCCCOGAAT ATATTCGCCA TAGCAACAGC ACCAGGTTGA AAATCGAAAC (xi) SEQUENCE DESCRIPTION: SEQ ID NO: CCGCAAAT AGTTTGAAC TTTTCATCAT 'N'TCTCCTTT AAAACTTTCT CTCCATTATA GACTCTTTTC AGAAAGTTGT CAACAGAATT TTCAGAATTT TTGAAAATTA 'TTCAAAC AACATCTTTG CAAAAAATAT GAATATCGTA AGCGCGTCAT AACAAGGTAT CTATCATTCA TGGAGCTCCT CCTGTATACT ATTAGTAAAG TAAATATTGG AGGATATTTT AATGCCACAA CCTATTGTTC CTGTAGAOAT TCCACAATCT CGTCGTTTTG ATTCTAAAAA GAGAAATGAT ATTCTrCTTA AAATTCGTAT TGGCA.AGCTT GAAGTAAGTr TTTTCAATC TCTCAATCTC
GAAATGATAG
AGGCACOTO
TTATCTCGTT
TGGTGGACGT
ATATAAACGC
TCTCGCACCT
GTAGATTGAA
CTGTCCTGAT
TGATTTCTAT
GTCCATCTCC
TGAAGTTGAT
CGAATCTCTT
GCAAAGCCAA
AACAGCT
TATCTCGTGT
AAAACCCACT
AAAGACCGCT
TTTGAGAACG
GAACAAGTAG
ACTAGAATAG
CGATTTGTCC
TGAAA'rGAGG
GATTAACGAT
TCA'rGACATC
TCCACACTTG
TATTAGTCGG
348 GGATAAGGTG TT CTCTATG ACAATTCATC TATCTAGCCT GTGGGAAAAC TGATATGAGA CAAGGAATCG ATTCACTGGC 7TGAAITTGGA TCCTTTCTCC GGTCAAATCT TTCTCTTTG TTAAAGTCCT TTACTGGGAT GG'TCAAGGAT 7TTGGC'rACT GCAGACTGAC TTGGCCCAGT ACAGAAAAGG A743TCAAACC
ATTGGCTGAT
TACACCTCTG
GAAAGGCrTT TCTATCACTC CTTCTAAAAC ATTGTTAGAA
CAAAAATATA
ATCGAT'I-r'A TGTTATTATT TCATTTTACT AC'N'TCTTTT TATACTCATC GGACTTTATC ACCTCCTTCT
ATAACGAGTA AAAGATAATC TTTATAACCT CTTGCGAGAG CCTATCGGGT TC1'AGAGAGT AAGCAGAGAA ACAAGGGA'N' TACAAGAATTr CCTAGGAGAT T'rAGGACTTT AGTCCTCTAG CGACGCTAAG CT'rGGTAAAC CACTrGTTGG ATGTTGGGCG CAGATAAATC A'rCCTTAGGA AAAGAGACTG GGAGGCTTTG AACCCCTACT GGAAGACTTC AACTAGGAAG GGCAA'ITGAA AAGACGGACA TCTGGTCCTT TGGGACGGAG TAAAAGAGTC GGGTGGTTAT TTTTAAAAAA r-rCCAAAGTT TrCAATGGGG
AATCTTTAAG
ATCTGGATAA
AGACTATTGA
GATAGCCATC
ACGCTTTACC
TATTCTGGCT
~rCTGCCTA'r
TGCGAACAGC
CATGTGAGAA
GCTAAAGGTT
CCAGCTGATG
TTrTGCT'rGGT
TACAGCCTCA
TCCAATAATC
CAGTGGACTC
GCGAGGGTGG
CGAAAGGCTT
TTCATCTCTG
G1TACTTGATD
GCTTGTGAAA
ATAAATCCAT CAGAAAGTCG TGCITTCAAA AAGCACTCTA CCAGTCC?1'G TATAACATCT TATTCTrAAA TCCACGTTTA GTGTIGTATGG AGGAATAAAT TATGCCATAT AGCATNGTCC GCTCCTA'rTC CTAAACCCCC CTCAGCCCTT ACTTCATGCG GATGAAACCT TGACCTACTA TTGGACN'TTT TTGTCAGGTA ACCATGATCA GTGTCGAAGT GGTTCAGTAG ATGTTCATTG TGATATG=rG CGGCAGTAAC GCGATAGCAG TCCAAGGTrT AGGAGTAAG TAGAAGCTTA TCGTCAACTG GAAGAAGCTG GGAAGT7TT TGAAGTGCCC CCCAAGCAAG TAGCCTATTG TGATCAGTTA TTTTCCTTrGG AACGGCTACA GAAACGTCAA GAACATCTCC CCCTCGTCA GTCAGTT=TA TCGGGTTCAA 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160
AGTATGAAGA
TAGCTGAACG
TTr'rAGCCTA TTATT'rCTC
ACAGCTAAAC
AATATrTrCTA AACCT'rAAG ACCATTAA CGCCATTAAA TCATTGGTTA AGCTCAGTN' AAAAAAACGA AAAGTTTTGA AGGAGCTAAA GTCATCAATT ATAG'rGCGTT TAAATCAATT TTCCTTTCCT GCAAGAGCTA TTATATGAG 'rrrGTGGAA GAATCTATAA CAGTACGCAT CGACTGCTAA 349
ATGATT
CTTCrAGAAT
TTACCGTGGA
ATATATGAGT
CGAAACTI'G
ACGAAATTTG
TTCATATC?1'
GTCTTCCAA.A
CTAAAGTTGT
'rCTCTAGTCr
AAAATCAA
TCCI'TCTCA
ATACAAT1CC ATTATAAATA GCGAGAAATA CGAGGAAACT CTCGTAAACA AACAGGTTTT ACAAGAAAAG TGCAAATAAG AAATCTCCAG GGAC.Aw=T CAATAGACTr CGTTAITrGGG AAAACGGATT ?TTATCGCTC TGAACATCAA
TCTATCCTAT
AGAGGCCTAT
ATTAGGAACT
CCTTACT
AAAAGAAAGG
GACAAAGAC
AA1'GGCTATC
TGTTCTACTA
CCTTTATTCT ATCAAACATG CCTCGACTAT CAAGTAAGAC AGAATTAATC ATCTCGTTTT ATTTTATCCA CTATCCCTGT TCCTCAACAT AA'rCCCCAAC G CAACTG CTTGAAGAAG TC'rAAGGACT CAGGACGTTC AGCTTAGCTT 'rrCTCAACC AAGCGCAAAA ACAAGCCAAA ATTTCCATCA AATACGTTCA GATTTATTAA AAATATACA.A ATAGC'TCTGT ATT-ATCTTAA CGGTAATCCA AAA'DCCTCAT AGAATTTTCA TCAATGCCTG TGTACTTGTA CGAATACATA CACTACAG?1'
AATCCGATAG
ATTTTrACTCT 'rGACTACTAA AGAAACTACT TTCTTCCGGT GCATTCATCT G'rAAACAATT ATACTCA'rGG TTATCTAGTT 'rAAAACCGCT CTCTTGTAGC AAGATTGGCA TATTATACTT TTCAGCTAAT =rTTrATCTG TATCAATATT TTCCTCACGG 'rTCAATCCCA AACGTTCATG GATATCTGAA ACTTCTCCCA TAGGAGAACC AGTTACATAT AAATACTTAC GTGCATGTTC AGAGTATGCT CGACGATTAG TCTCTTCCGG 'rAGGCACTCA AAAATTGGAA TATGTAAACG C?1'GGCACCA AAAATCAATA AAGCATCTGG TTTAATwlIGA TTCCCTACAG TAGCACCAAG ATCA'rCTCCA AACCCTAAAT TATCAAAGAA AATACCAT GCCAAAATAA CATCAAAATA CTTTCGACAT TCT'GGACGTG TTCCCACAAT AATCAATAAC TCACTATAAT CTGTCTrAAT TTTCATTTAT
AAAGTTCCCG
AATCCTAGTG
CTrTrTCTTTG TGAT'rAATTG
TGAGCTGAAA
GACTCAATAG
GTGcc~cTT AAGTTAACA'r
TCTTTACAGC
ATAGCTGATA
TTCATCAATT
T'rCAGC'TTTT CCTCCAAACT CAAC'TTTACT AGAGACATTT CTTGTGCCAA GCTAACAGCA CCAAAATAAA TCCTGCCTTA CAGCGGGAAA AGGATAACCTr AATCAGATAC TACAACAAAT GCTCATGCTG AATCACACGT ATCTAGGATG GCAAGAITA'r CTGTAAAGAG AGAAATAAAA GTAAGATATA ACCTCCN'T CAGATAAATT TTTATGTAAC TAGGTAAACC ACACTCATGT CTGAAATAAC ATCAACAATC GATTGCCAGC CTCCATATGA AACAAGAATT TGTATCCCCT TGTATGAAGT ATTAATAATA 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 ACAGCATCCA 1'GTATACGTC CGGAGTGTCT AAATTGTAAT CATAGTTTTG TCCAGTATGT TrTAGTGATAA CACTACTTAG ACGTATAATC TTAAC~TGTC CATTATCTTT AAAGTGAATA TTCTCCACTT GTTCAAAAAA AGTATCTGGA TGTCTAGCAT CAAATGACTC AGATTAAThA TATTATGTGC GACAC7"rCAA AGI'rCAGAAT 350 ATTAGCCCAC ATGACAGTAA TTAGATTTT'C TGTATCAGAA ATAGCCCGGT ATCATATGTA 7TGCTTCAAT C~rATCGCCC AGGATACTCT TGACCCNr CA'rCCAGCCC TATCCTACGC AGAAACAACC ATGAAAAArr CCCACTAGA ATGATGCCAA AGGTTTAGAA ATATTAACAG -AXXTTACC CGTATrCT
TCTTGTATTA
TGTTGCCCT'r G7TT=AATA
AACTTATCTA
TGAGGAATTTr
ACAATCTCTC
CTAGGTAAGA
AGACACTCrT
ACTTGAATAG
TTAGGACGGC
G'N,'TCTr-rC
GAGTTGAAA
AACCACGACC
TGGTAATGCC
ATTCCGTAAA
CTGGTAAATA
CAGGCATAAC
ACTACCTCGT TCATCTATAT TCATTrrAG AGATIAGGTAG GTAGAATACA ATrrC~rTTT TAAACTATCA GCTGTTTTT TAAATG??c
AGGAAACTTA
AAACGATCCC
TAATAGAGAG
TCCTGATGGG
CAATGCAGT-r
TGGATCATTT
CTAAGGTTGC ACCATGAGTC TTTGTAATCC ATCTAGATTA GTATCAAATC ATCAATATAC GTAAATCGTG AGCTAGATTA ACCACTTCCC ATAAAGATTC CATA'rTCAAA GAAGACTTCT ATCGGCCTTC TAAACTAGCT GTTGGTACGT AGCAGTACTT CAACGATGAG GATT'rCCTTC AGCAACTCCA ATTCTACACT GTGT7MTCAT ACTN'TCTAA AATCTCCAAT ATGAAfTcAT CAGGATTCTG TGGACGATCG TTCTTACAAT ATTCATCTAA 'rAAAATCGGA CCAA'rCTCTA AAT'rAGGACG AG'rCCTATCT ATAAGA'TTrr TTCCTACAAA TCCTTTCGCT CCTCC?'rATT TTATATGCTG TTTTAATAGT
TAACAGAAAG
GGGAAACGGT
TCCCCTGCTA
TGAGTAGAAC
AATCTACTTG
ACACCAGCTA
TCTGTATCAC
CGTCCATCTT
CCTGTGATTA
TAACTCTCTC
TTGCTACAGC AGAATTGTAG AAACTAAGAC AGGTGCTCCC GCTTAGA'rTG 'rCCATATATA TTGAGAGTAG AACAGGACAA AAAAACCGTA ATT'rCCCTCC AATGGAATAC GAAATCGGCC GATCATACTG AAAAATCTCT TCAAAGCTTC CAGAGTACAG AAATAIrrTTT AATCATGCCC GACAATACAT GATACATTAT TCTCTTrCGTC TGCTACCATA TTCTTTTAAC TTG;CTC'rACA AATTTrCTATT CGTACTACCA TAGGAACTCT ATAAAAATCA GTGTrTCATA CC?1TTrrCT CAAAAATrrC TGATACACCC CTAGTATATC TCCAGATTrTC 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 ATATCCTTGA TAA'TAAT GTATCrrAAA AGATTTTACA TCACGAATTG CTGTCTGTAT TTCATCTAAT TCTAGCAACT TCCATCAAAT CGGTATTATT ACTATTGAAT TCTGTCAACA TCTTTGAAAT ACTTATCATA GTTAAGAfTwA CGATTrATCAC CCCAAATCAA TTGCATTTGC GCACTCTTCG TTAGTTAATA CCGTGTCTAA TACCTATAAT CrrAATATCT TGTTCTGAGG TTAGCCAACA CTTCAATCGT ACATGCTGGT GCTTTCTGAA CC7"rCTTC.AA ATGCAAATAA AACCAAGTCT ACTGCTTCTT CCAATGTCAT CACAAAACGT GTCATGCTAG GTTCAGTAAT TGTAAGAGCA TTTCCT'rGCT TAATTTGCTC AATCCAAAGA GGAACGACAG ATCCACGGCT ACACAGAACA TTCCCATAGC GAGTCACACA TATCTTTGTA 56 5760 TGCTCAGGAT TTACCGTCCT GGACTTACCA GTTCCCATAG CATTGACAGG ATAAGCCGCC ACACCAGCTr CGATAGCCGC AGTGAGGACA GCTTCACAG GGAAAAATTC ACAAGAAGGT ACAGCAATCT TTTCCATCAT rACTGTAG AAAGACAGAT TTCTCCGTTC CCAAAATG=r ACTTGTTTAA GAGCAGCAGIC TAATCCACAC CATGCATAGC A'rI1-'?ACC GAAGCTAAG'r CACGCACATC AGCCIrGGAT AACTTGCIrr
AGITTTTACC
GTGAAAAACA
TCCAAGGTAA
CATATCATCT
TAAAAAACGC
T=~CCTGTA
ACAAGCCGTT
A7=TCTAAT AAACGGATr TGTTTC'ITrT
TTGAGAACCG
AATTGTGACA
CCATCTCCAT
TCCCAGCCAC TTCTCGTACT TrACCTGAA ACTCATGACG CATCTCGCGA AAATATACGA A'rCTCTGAGA CATCTGTTTC CATTCCCAAA TGAACCTGTC CCrCCTGTAA TTAGGAGAGT TATATTACAC TTCTCCTTCT AGTATGTCTG CAAT=~CTT ATGGATTTGA AGCTTGACTC ATTCCTTGAT AAACTGAATC 0 0 00 0 0 0 0 .0 0 0 0 000000 0 0 0 0 00*0 0 AATTCTTTrAA AATGCCTATA GCTTCAATTrC CCTCTGGACG CTTGGAGCCT CTTCCTGAAT AAATTGTGAA AATCTAATAC AATATTA=T TCATCAGCAC TTCAC'XTGTA TCTCTCATAA ACCACCACTA TCTGTTAAAA TTCTAAAGGT TCCATCATCT CTACAAGTTT CAAAGTCCCT CCAAAACAGG TTTTCCTAAA r'rAAATAACT TCTTGATAAA TGATACGTTC ACAGCCACTT GGATAGGATA. AATAGCCTTG ACATATGTCT CATCGGTTCA 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 AGTT=CTTCT CAGCAAkTTTG GCGAACACGA GGATTCATAT ACATCTGAAT A1'TCTrCAAT AATCCTTCTA ATTrGCTCI'AA CCAAGATTTT CACGACGATG AGCTGTAATT AGAATAAACC TGCTTTCTCC TATCCATTCT AACTCAGGAT GCGTATAGTC CTCTTGAATT GTAGT'IrGTA AAGCATCAAT CGCCGTAT'rA CCTGTCACAA ATATGCTCTC TGGAGI"=T TGTGrTGG TAAAATGATA CTGAGCCAAA GGATATGGTG AATAGATATC GTAAGTGCC TGTAAATAAA AGGCCGCCAG TGAACTAGCG ACCAAATCAG GTTTTTCTGA CTCTAAAATA ACATCAAATA AAGT~TTGTTT ATC~rCATA AATGTG1TCCA AGACCTGATC CAACA'rTTGA CCT'TCTCTTA AAAGATTATC TTTTGAAAGT ACCCCAACTG CTTGACGATT AAACTCTTCA AAACCAGCTT CAACATGACC AATTGGAATC AACGTCGTAC TTGTATCCCC ATGAACTAAC GCCTTCArrC CTTCCAAAAT ATAGACAAAT CAAAATCGGG CGGTGTTGGC CCGTAACGCA
GCCAATGGTC
AATAATCCC-A
AACTAATGrT 0 0 TCAATATTCT TACGTrCT TAACTCTTTG ACCAAAGGAC ACATCTTGAT GGCTTCTGGA CGAGTTCCAA ATACTACAAC TACTTrTC ATATATTTAC TTACTCCTAA CAAATAATGA ACG CTTA AAATAAATTA GATAACGGCT AATCCATAAC ACCACCTCAG ACATACTTGA ACAAATAGCT AATGTTACTA AACTAAAATT AG7'rTGGACA ATCGAAGCTA ATATAGTTGT TCCTAAGACA GGCCATCCCT AAATCATAGA GTACI-rAAGA AAATCTGCTG AAACGGTATA ATTTGAAAAG AATAAAACTA TCAAAACTCC
AATACTCTTA ACCGATrGTA TATAGGATTA TACAATGATT TATCTCTrGA CT'rTGTAAAT AGTTGCAAAA TTGGATAAAA CAGATAAGAA AATGATAAT AGCAACrAAA TTCCCAATTC TACTACAATA AATGTCAAAA TGCATGCATC 7TCAA'rTC AGCTAACAAA TAAAAAACTG CAATATGGrG TAAATTAGAA TCCAACTATC CTTCCAATCT TGATATCATG AAACCAAAAT AATAACCArT CCATACGCGT AGGA.AATAGT AATTTAACAA AATTCTATCA ACT'rTCACGA TTCTCGCTTT CAACACCAAT GTTTCTTTAC AATACTATTA CAATTGCTCC ATAATAACGT TATTAAAACA TGTGCCACT TTTT~CAAcAA TGATGTCATT T'rCTAACTAA T''rTCTAAAT AAGTrACTAC AACATAATAG ATT~?rCAAG AGCTTTTTG CTTCTCTrlTr CAATTGACAC
TATCTTTAGT
TTGCTGCTGA
AAAAACCCGA
AATAAATAGA
TAAT1'CCATA C?1'CCAATAT
TTGTAATGAT
CACGAAATAA
AAAAAAGAAT
TCGAAATGAT
TTGAGGGAGT
CAATCACCAG
CTAGCGAAAG
TATTCAGAAT
ACTAGTCCTI'
TCTGAAGGTA
ACACTTGAAT
GCTGTTTTTT
ACCCCAACAT
TCTGCAGGAG
TCCTGATAAT
TTAGGATTGT
352 ATCAGACAAG ATAAATATTC CrAATCCCAA CATrGTAGI'T TCTTCACTT TATCAA'rAGC ATAAAAACTA GCAACAAAAG CGGGTAATAA Tlr'TCACCA CCAATrATAG AAAGAATTTG AAAGATAATA GGAATAAACA TAATCCGAI'r ACGTATCATA TGCGGATATA AACTATTCGC AAGCAGTTGC ATTGCTATCC CCCAAAAGGC AATGACTGTC GTAAAGACGC CAAAAATAGT GGA'rTCCr'rr AAA'rCTTTAA CCCAAACAGA ATAATGAAGG AATCrATAAG AAACTACTC AGGAATCr-AT AAAATAGAAG AATCATCTTT AGTTTTAGAA ATAATATAAG GAATTGCAAbC AAAGTCAAAG ATAAAAATAT TGGTCACTGr A'rTCTCTCTC A'rTATTrGGGA TNTGCCACAT AGATAAAAA'r ATTTTTTCAA CTAGAG'rATC AGTACAAGCA 'rrTACAATAT 'rTTGTAGC TTGAACATAA GCTATTAACG CTTTAACATA CACCCTTGTC AAATACGGGA GTG'IrAATAA ATAGAGAGAA CTrGTATTTT 'rTATAAATGA CCAAAAAAAG ATCTAAATAG TCCAAACTAC T'GGTTATCGG TTTTAGATGA AAAGTT'rCAA CAAATAAACA TTCACAACGT TGTAACTCTC CTGGATGGCA TGCAATGGCA ATCACAGATr GTAATTTACA AGTTAAAACC ACATCTACCA AATGATACTT GAATTGAAAA CAATCCTCAG AAGCATCTTrC ATAAGGTAGA ATGGAATCCG 7500 7560 76 0 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180
TTTCTAGAAA
GAATGATTAT AGTGAACAAG AGCTGCTCTG TTTCTTC AAGACTAATT GATTCCGCAA AA'rTATCTTC TTATCTTT'AG TCTTAATTTA CTTGAAATAA TTAAATCAAA GCTrCTGC ACTGGAGCCG AAGGCGACAA ATGCTrTCAAA GAATCAAATG 353 ATTCTCCATC ACGAACTGTA ATAAATTGAG TCATCAAAGA A'rCGTTATTA GGCCCTGCAC AATATGAAGC CCAAATTCCC AAAGGTAAAA AACGTGCATT ATGCCC?'rCC CCAAAATATC GT1 71 L AG AAAAC7'='T TTTTGGCGAT CATGATTA.AT AATTCTCT A'rACCATAAT CAATACCTAA TACTCCTATA GGCN'?TAA ATCGTTTAAA TTrGGATTAAA rrATCACGAA CTCCCGGGAT ATACAAAATA GCATCTGCCT A7TrCTTTCAA GTACAITTGA AAGAAATCTG ATGGATTATA AAAAGAAACT TCATATCCTT TAGATTCTAA TAAATCATAG ACAATCTCAC CCTAAAGATA ATCACCGTAA TTACTTGAAC CATAATCCGT TGCACCATGT TTTTCACCAC TATTTTTTCA ACCTCCTAAA AATAAATATC ATAATCAAAC AGGACGATAA ACATCTATTG AACTACTTCT CACTAAAAGC AATAGTTGAG AAAATAAATA ACTTTTGAGA 'r'r'TACTTGT TTGAAAAGCT CTGAAATTTA ACTAAATAT'r CCCAAAACAA AACTCCAAAA AACACCACCA TAGTAACCAA
AACATAATTT
TATACATAAT
AAATTACCGA
ATCGCCATCC
AGTTCCAAAA
TAACAACCGT
TTGATAGGAT
TAArrCTTCC ACAAAAGAAG CGCTGATGCT 'rTATCAAAAA AGCCTACAGG TAACCCCAAA AATTTATTAA AATCACCAAC TAACCATCCA A'rAGGAAAAA
9 b 99 9* 9 0 AGTGCGTAGA AATGTCATCC CATATTCATA TGGAATGCTA CTAGGCACAA CAGT-rACAGC AGAAGCTACT GTTAGGCTGG TCAGTCCCGA CTCTGAAAAT ACTTCCCCTA GTATATTCT'r TACAAAATCT AATGAAGAAA AGGAPATCAAA TAAGTATATA CCTATAGTAT TCAAGTCGbA ACGGTGCCCC CTAATAACAA CTAATACATT TAATAGAAAT ACAGTTACTA TTAAAAATAC 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 AAGTACTCTT TTCTTCGAA.A AAGTAATCCC CAAGA'rTGAA AACACCTGGA TTT'rACGACT AA.ACAACATT ACCCAAAAAA TAGTACGCTT CAAGGAGAAC ACACCAGGAA GCATAAG'rAC
TAAAGATTGT
TCCTGTTAGG
TATAACTCGG
TCCTAAATCA
TGCCTCTGAA TATGCTGAAT AGCTATTCGC CGCTCTAACT AGTTATTACC CTAGAAA'rAA AGCCCACTCC TGTrTAAAATC GTGTATACTA AAACCAACGC ATCATTATCA AAATTAGGTA GACAGCTTAT CTGAATAAAA TC'rATTA'rC CTGAACTAGC GCTAGTACTG TTTTAGAA'rC CTACCCGCAT TGTACAAAAT TAATGTACCT TTCCATCACT CAAATTATAA AAATATATGA CACAGCAGAG TTGTTTGAAA TTTCTCTTCA T=TCCTGAT AATTrTGTAC ATAAAAAAAT AAATACCrA CAGAATAACA ATGAAATAAT TCrrCA?1'AT TATAGAAGTT
TTCTGAATGA
AAACAAAATC
ACTAGGGCTC
CCCCATATAC TCATTGAAAA TTAATCCAAA CATAAAAAAA TAAGATAAAA TCAGATACCA TACAGAAAAA TCATATATAC TAACTTTTTG TAAAATAAAA CCAGTAA'rrT GAAAAATAAT TAGAAAGCAA ACCCATATAA ATATAGACGG AACATAATTA GATATAAGAA AACCATTAT'r CCAATTATCG AGAGTCCAGA TGTCACTCTA CAAATATAC'r ATAACGATT'C AATAAI'TTAC GTrATATAr TCAAAACGAT TAGCAACACA GACTCTTCGT 354 ACAAGTAACA GAAAGCAAAT ATAAAACTTA TGTCTGCAT CTATATCTCC TrrTACAC TAGcTTGA'rA ACAAATATCA TAGAGTCCAT TGCATTCCTC AGATGTTAAA GACAGTACTT TGATAGGTAA GTAAC 'AATG TTTTTGGTCA
ATGTCACTAG
ACATTTCTTG
CTGTCA'rACT
TATCTTTCCA
CATCTACTTC
TTGCGTCACT
AACAGGCAAC
AGAAATATCA
TTCTGT-T
GTATCTGACG A'rAAAATG CCCTCATATT TAGACGGAAG GTCCTTCTCC CTAAAAATAG AATTTCTGCT CATCCTCACC TAATCCCGAT GCCTGAGCCT CI'ACTAGAGA
CAAAAAAACA
CACATATGGG
ATTACCAACT
AAATAAATAA
ATr'rGACACA
TTTCAAATCA
TTTGATTAAA ATGAG'rTCTT TTAAAACGTrr TAGGCGAGCT ATAT'rTCCTA ATACGAACTT T=CTIAACA TCTGACAAAA A~rGATACTT TCCATCGCAG ATAATAAATC G'rCAGATTTA GTTCTAAAGC AGGAGTAAAA TAACATTTG CTrTGGLrT TTTGA'rCTCA 'rCTAATTCTC TACGACA'IT A'rrGCA'rTAA AAATAATTTC TTAGCCGAAT CTTCCCCACA ACTAATTTAC GCAATACT'TT
AATTTTTCCG 'rCTTTATACG cTrTrC'rTCC TGCAAACCAA TGAGTTCCTA AGA~wTTTAC TTGAAAACTG TTTTCTGTTA CATAAGCCAT AATTAI=rA GATAAGATCA GACCAA'rTGC ATAATCTCCT T'rCTTTATTA TTCTAGCAAG TAATAGAGGC ACATGATAAA CCT1-rGCACC TTrGTTCTTM CCAGGCACAA TAAAATCAAA ATAGTTrGAAT AGAAAACTTT CTACTCCACC
ATATAACCAC
CAAAATTGT'r ATGACTATGA ATAATTC'rAA ="TACAACC AGA ATAG CCATGGCAAT GAACTATATC AGAGAGAAAC TGATGTAGAG -GCTTTTTCCT CAATTCTTTC ATT=ATCCT CTAAAAATCC TTGAATrTT T7=CATCAA TGTGAGAATA 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 S. S S 0
TTTAATCATT
GTTTTTCTAA
CAAGGT1TTTG
CGAAACCAAA
TAGGTrTATA
PAGTGTGATT
CTTCTTCCTT AAGCTTAAGA ACTAATrCTG TCCATGAAGT AATATACAAA GCCAAACAAT ACCATTCTCT A'I-TGACACTr CAACAATGCA GCAAAGTAGA CCCGTATAAA TTCAAAACAA
ACTATCTAGT
TTcGcr'rCTC
TATCACAATT
CTTTCCGA
TTrCCATATA
GITTTATTAGA
CATCTGTArr GTTGTAAATA GATGTAATAC TAATCTATT TCTGTTTTT CTTAATTAGC TGTTTCCTGT TTCATCCTTC ATAGGTAAAA AGTATCTTCA CAAACTAAAA CAAAGCATAG TCTAGTAAGG CTTATAAAAA GACATGG'rAT AT~'rTTCT TGACAAAATT AAATCGACTG TCATTTGCAA TAAACCAATG TAGGAAACTT GTTTAATTCT ATCATANCGG CTTTAGGCTG GAATGTGTCC ACCAAGTTAA CATTGCTGAT CCCTTAATTC TCCTGCATTA GTACCTATAA AATTCAACTG AAAAATCGAT TATTTTTTTA TTrrTCT GAAAACGAAT GAAN'GGAAA CTACTATTA TNTrAACT GCTTTACCTC GTAGGTTATG GGTAGTAAAA 'rACTCTCCCA AAACGA'rATT CATTAAAGAA TTTTTICACCA ATTTT'rCATA ACTGTAATCA CGAATATCAT GAAAATCTAC TAAAAlrGAA GACACAATAC TIr'-~AG c CCTTTA ATCTrC ATrTCCTTT ACCAGAAAAA TAT'rCTTATC CCAA'rATATA
AAAAATTGAA
AATTATTCTT
T~wrGAAAAGT
ATAGATAACA
CTAACATATC
ATAATTGCTT
ATCGAGTCTC
GTAAGGAGTT
TCTTIACTTTC
GA'rTrTCATA
TACTAAATT
CACAA'IrrGC
TGCCTGCCGC
CTATAAAGA
GAAATACGAT
ACATCGTAAC
AGATATATAT
CCTCTCTAAA
GTAACAACGA
ACAAATATTT
TTCTTCTACA
CATATAAGAk
AACTAACATA
ATACAAGGAA
CGTACCATCT
CAAACTGGCA
7TGGTAAAAA AAATrATAG CCCTCTGAAG ATTGTTTCTG AACCAAACGA TAAACCAAAA AAATATATCT ATTTTrAAAT GAAAAGAGAA TATGTAACGG CAATATCATA TCATAATCAT TGAATTrrAC A'rAACCTAAT ATCTTACTTA AGTAGrrTG rTTTGTAATA ATCTCGTTAA TAATAGACAG TTT-C~rCAAT AATTC~rrAT TATCAGATAG TATAAACAGT AC'TCTCATTA CATGTCTCCA GTTCGAGCAT AAALTGCTCT GCTTTC?1'TC CTAACTCTCT TTGTCTCTrA TTTGCCAATT GNN'ACATC TCGrrCGGGA ATTAT=AG CATC'TCCTGA AA~rGCACCT TGTACCTTCC CAGGTATAGT ACGAGAAACT GCATCTGA'rT TTTTATAGAA GGATGGCATT ATA'rrCTTTA ACTCCAATTC ATGAGCTAAT CCAACAAAAT GAAAATGAAT TTTTCrGGGT GCT' 'CAAAA TAGTTTCCAA ATTrTTGTGCT 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 TCCTCCAAAG AACGTCTTCC GCTTTCATGC TTAACAATTC AAATTGGTAT TCTTCTCTAT TTGCCAATAT TACCAGCAAA GGGATAAAAA GATCTTCTGC AATTGCTTCA CAAAATAATT CGGTAAAC~rTT TTGAGAT CCACCTACGO TTAAACTATC TATTTTTrTT TATAAGCCAT AGTTAGGTCA ACACTTTCTT TATTAACTAT ATATTGTGGC AAATATGTAA TC=TTGT'rC TTTAAATGAT GGACTAGTGA CAAATATATA AAATTTAAAC AGCTTGAAAA 'rCAAGCCATC TGGCCAAACA TCCATACAAT A'rAGAAACAT ACCAGCCCAT GCCATCATAA CTGGAGACAA
AGATTCATCA
GGATATGTCA
ATCACTAGCT
TTGTI'CACT
CGGTTTC7TTA
TTGGTT.AACG
AACTCCTAAA
TTTTTCTA
TTTTTNTATTA
CAAAACAGAG
TGGTTCTGGC
TTCTTACCAC
AATACACAGT CAAAATTCGA TCCATCTTC GrI'rrATACC TCCCCAATAA GTAGA.ACTAA TTGCAAAGCT AAAATAATTC AACAATCGAA ATACAACACT GGGATTGTAT AAGAACGATA TATCGTAACA CCTrCTATAA TCTCACGTCT TGACGATAAT CTGCATATAT CTTCCCTTCA GGGTAATTAG GAATCCCAC ACTTCATGCC C rTTCGAAC TAAATCTTCA CAA.ATATCTG ACAACCTGAA TTATAATGTT GGCAAACAAA TAGTATTTC ATTGTCCAAT TTAACTTTCT 356 TACCCTCTAC AATACCN'r CGTTTCAGTA CCTAAGCTA'r TGTCTTAACT TATCCATTA'r CAAAGACAGA TGTTTAACAT AGTAGCCATC TAACTCCGTC
CAGACAAAGT
TTGCTCCATA
GACCrACAAT ATGT'rTTTCG
GAGGCGCCAC
ATTCTTTATG
TAATCGCAAT
ATCACGCCCG
CrATCTCTC
ACTCATATTA
CAAGAAAGCC
ATT'rAGGT
AATACCAAAG
TGCAATTATC
TTAATTTGTG CCCATCCAGT TAACCCTGGC TC 'GCAATCA AATCTAGTTC ATTTATACCC CCAACAAGAA TATTAAACAA TTGTGGTAGT CCTACT'rTTG TAATCYATTG CTCTGGATTA GCATC'rATTTI TCATAGACC1' AAATTTCAAA CGT'rTGCT TAAATA'rAAC CGGACCTTCT
ATACATCTAA
TTCATCTCAA
AAGATATCAT
GCTGGTCTAG
TCATCCAAAG
TATAAGTTTC
ATATAGAAGT
GAATCAAGTT
ATAAAAACCG
=rrATTATAC GACACAATAT TATTATCCCT ATTAAAGATA CGTACATAAA CAACCTCCAA CTA'rAAATTC ATAATATATC ACCTAATCGT TATTTCCATT TTTCA'rTCTA CAGAAACTCA ATATATA'TT CTACGTCAGA TAAAACTTTT GTTCTCAATA ATGAA'rTAGA TTTGAAGACA TATCGGTTTC ATGCTCTGTA AGAATGCCAG 'NN*ACCAT AATT'rCCATA T'rAT'rTAAAA CAACTCCTAG TGGATATCAC GTTTATTCGC TGAGTGATAA TTGCCCCATC TCAAAATATT TACGCAATGT GTAGGGTTTG GTGATACAGA CATAAACCGT GAGATAAATC TCACGAGATT TAAAAACTCC GTTTTATAGC C'rGCACCCGC TCCCCAGGGT TAACAGAAGT T'rrGTACACA AGGCATTGTA GCTATTTCTA ATGTCGGCAT CCCAAAAGTG TCATCTGCAA AAAAGT'rCAA GATGAAGAAC TT-rCCATTTG ACAAATTAAA GGTCACTCAA TGATTTTCAG CTCCATCTAA ACAAAATTTA AGGTCCAGTT CAATTATTCT TGAAATTGCA ATCAGTACAT AGTTATATTG TAGTCCCCTT GGAACCGTAA ACTCCATACT GAACAGTTTC CCTGTTTGTT CTCACCTGTT GCTGTTACCA AATAACAATT CCAATAGGCG TTCAATCATA TCATTAAAAT TCCCGATTGA ACTACAAATA AGCTGTCCCA GATAAAAATT TAACATAACT GAATTTCGAG AAACGACCAT GCTATATTTA AACGGAAATT ACTTTT'AGTT ATATTCTTCT GCCTTCyrrAA CC?1'CTCI'CC TATTTCAACT TCAGGCAGTA CATGCAACTA AAATATAATN' CTTTTATCCT 'T'rGTTTCAG TAATATATGA TCCAAATAGA CCGAATATTA AAGCTAATAA ACTGATAAGT CCATACTATA TTCAT'N'TAT TATTAACCGA GATATCCAAT TAATTGTTG TTTCGCTI-rT AGATGGACGC A'rCACACTTTr GTGTA'rCAAT AATGATATAA TTTTACTT'rG TAACAAGGCT AAT'TTTCAAT ATTTGTATCA CTGTTAGCCC TGTAATTI N' TATCGCCATC GATCAAAAGA TGGAAGTAGT TGNTTTCCT TATCTCCGCT CAACTGTATA TGAACTrCCAG 'T'TTTTTTGT TACCCAAGTT TGGCACAACT GACGTT'rCAC ACGAGTATCC CCCC'GCCAA AAAACCAATT 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15520 15600 15660 15720 15780 15840 15900 .5960 16020 16080 16140 16200 16260 16320 TGTATr'rTCG
TATAACACTA
ATATCTTCCG
GTTCCAATCA
357 AGTGTATTGC GTTTAATATT
GTTGTCACGT
GAGTT~AGCGA
CGGGTATCAA
AGI'TTCAAAT
TCTTTACCA
CAGAAACACG
TACGGCTTGC
CTGGTACTGT
CAGAAACAAC
G.ATAAGTrCC TGGCGAAGAC GGGGATATCG CCGGCCTTGC AGTAATACTG ATAAI'TTTr GAGCAGCTAC CTCTTCAGGA ACTCGATCAT TAACTGAAA'r CAC~rrAATT TTATTAGCCA AACCN'TGG TTCCTCCAAA ACATCCTGCG AAAGGATAAT TGCCTGCJa6A TCCTGATTTG TCAACCCCGG
CTCCTCCAGT
TTCTCTCA-AA
AGAGACAATA
CGTCAAATCT
CTCACGGTAG
CTTGTCrTcc TGATTGCGAT TCACTACCTA AATTCGCGTG CTACTCGTAT ATCTGGCTr AACAATAAAA GT'ATATG CAAAAGCCCC CGCACCTGTC ACAAGTGCCA CTAT'rAAAAT CATTAGCTI'G CGTTTrCCACA AGC'N'?rAAC TAATTGAAAT ACATCCATTT CI'ATCGTATT TTGTTCt.:rLC ATCATTTCTC CTAAATTAGT TGATCCATTA CAATTTTTCG AGGATTGTCT ATAAAAAGTT CCTGAGrCCCTT CTCCG TrT ~1TGGG TA.ACAAGGTC ATATGCTTCT GCCA'rATGAG GACCACC GTCTAGATTG TGCATATCAC AAAAATACTG AGCTCT'N' TTCATGAATT GGACA'rGTGA ACTATr'rACT 'rGCGTGTAAC TTTCATTATT TTCAAGAGCA TCATAGCGCT TTGCAATGAC ATGAACCAAA TCC'rGCTCTA TATAACGTTC GCCAAAAAGT TTGGGTTTGA AGCCCATATC GATCAGTTCT CGAACGCGTT CAATGTGGGC AATGACTGGA GTAA'rTCCCA 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 172e0 17340 17400 17460 17520 17580 17640 17700 17760 17820 17880 17940 18000 18060 ACATCAAGAT CTT'GCCAG GCGCTATGAA TATCCCGATA CTATCAAGGC ATAACGACTA TCATTGAGGG TCGGAATCCG GAACATCTGG TGTGTAATAA ATTTCAGCCC CCTTAGCTAT TTCCCGAACC TGAAGAAAGT ACATGCCCTr GCGACGGTGA GAGGTAGAAA CTGCCAAGAG AGCCTTGCTT TCCTCTC'IrG
CGTAAGCAAT
TTTCTGCTAT
CAATGCTrC ACT'rGGGACC AGCGAGTGTTC ATArTA.AACT CT'rT'rTCC AGCTTATCCA GACCAAGTCA CTCGCCACTT CTTCTCTTCC GGAGTTTCAA CACCCCCTGT CTGTAGGATT GTCATCTACA TCAAAAACGA CATCCTGTAT AGCTGCTTTA GGTTACTG;TC TGGCA'TGCA TATGCCAATG GATGTCTATC ACTACAGCTA AACTACTATC TAAGAAGGAA GATCCATCCG CCTCCACTTT CTAACTGAGC ATTTCATCTA CCCTCCATCA ATCTATTTCC ATCACATAGA ACCTGTCCCT TTTAAA'rCTT GAGAATTTAC TTTATAATTC ATTGACCAAA TTATCATGG TCTCAAGTGG CATATGT-r TGGATAGAAT CTTGCAAGCT AVTAATGATC GTTAATTTT1-I GAAGGATAGC CACAA'rCACC CCATCTGCTA GCGAGTAGCG CTCACGAACA ACATTGCCTG CACGGTAATA C?1'TCCATTC GTACTATAA'r TTTTCAGCAC =rGGTTGAC TTTTGTTGAT GCCGCCCGCG GTCACGATCG AAACCCAGAG CCTGTTCTGA ATCAAGATGA GTATGGGCAG TAAArrCTTG ATCATTATAA ACATCAAT'rC CACCCAACAA ATCAATCAA'r TAGTAATTGA TATCCACTCC ATAGAGA~ TAAATGCCCG CATGAGTCAA TTTATCTTTT TAGGCATCAC GTGGCG?1'GT GGTCAAGAGG AGGATGTTGA CATCTGATCG CGACACCGAA 358 TTCAAAAACG AAGTGAAGTr CAATCGCACA TCTAAGGTGT GAAT1GGACGA ATCAACTCCA TGATTATN'C CACCATCTGC GATTGGTACA 18120 18180 18240 ATTCTTGG TATC'rCGATr CTAATAGGAC CATAGGTGTC ACATAGATAT TGAAAGACTG CCCTTAGTAT AAATCTTT'r TTTTCAAAGA CACTATTTAG GCTGCCAAGT AAGACGAACT TCAGCTAGTA ATTTCTGAAT AGTTGCGTAA CAT'TTCGAT GAGTAA'rTAG AAGTCGCATT AGCGACACAG AGCTGACAAG TTTTTATAGA TAATCAAGAG ACTCTTAGAC GTCTTAGGAG CT rCTACTTr 'rATCTrCGAT GCGTAGTCTG GATACTCTGA GACAATGGCC TTAGTCTCCC CTGCAATCAA.
CrGGTTGACC GTCAAATCGG TATTCTGACT ATTTTCATTA TrAGTCCCAG TCGGTGCTGT CTCACTATCT GCTAAAACAG CGACACTGAT TAAACGATTrG GTCAGTCCAA CAAACTGCTG GATAGAGAAC ACCAACAGAA AAATAGTAAA TAGCCCTACC AAGGCAACTA GTAGGACTAA GACAGTCAPC 18300 AATTCCACTA 18360 TT'rAGTGA6AT 18420 CTCGATGATG 18480 ACTCTTGTAA 18540 TGACTTGATA 18600 CACACTCGTC 18660 TGAATATTCT 18720 TACTGCAAAG 18780 CTTTTCAGCT 18840 CGCAGTTACC 18900 ACTAGAT'rAA
CAAACTAACA
GAACGTGATT
TTATATCACT
ATCTTAGTAG
CGTTCAGGAC
GATATCTAAA
ATAAATAAAT
?TT.TAAAACG
TTT'rTACGGT ACT'rCCCGCG
AGTCAAATCG
AGCAAGGATA TTGTACTTAA AGATTAAGAA CAATAAAAAA AGTCAGCAAA ACTATATTAA CACTTCGCT'r CACTTTCTGT TCTACTCATG ATTAATACCT ATACATTGAA AATGTCTACA CCTT'rATTTT TACTATCTGC AAACAAAAAT ATAGTAAAkAT GAAATAAGAA ATTTCTAACA ATGTTTTAGA AGCAGAGGTG
CATTATACGA
ATCTTTAAGT
CAGAACAAAT
18960 19020 19080 19140 19200 19250
INFORMATION FOR SEQ ID NO: 36:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 21706 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
AAAGTTGAAA GACTGCTAGC TGTTTT-TGAT ACCAATCGTr TCCAACTACA GAGCAAACAG TATACAAAGT TTGTTTTGG ATGTAAGCTT CTTGATGGAC AATTCCAAGA A.AATCAAGAA ATTGCTGACC TTCA.A'1TTT TGCCATrGAC CAACTGCCGA ACTTATCTGA AAAACGCATT ACCAAGGAGC AAATAGAGCT TCTTTGGCAG G 'IrATCAAG GTCATAGGGG GCAATATCTT GACTAAGAAG ATGATTATCG TA?=TCAAA TCCA~rT"MA ACAACTAGCA TGGTATAATA ATATGCAGGA AAA7?NTGAA 'N'ATGAGGAA GACTAGATGA ATTTATGGGA TATIwTCrT ACGACrCAGG CAACCGAGCC GCCCAAA~rr GACC'r7=r' GGTArcGrAG CCTATTTACG CTCI-rAGC TAACCTI=A TACAGCCCAT CGCTATCGTG AAAAGAAGGT TTACCAACGA lrTrCCAAA TCTTGCAGAC T13TTCAGTTA ATCCcrCTTT ATGG~rGGTA CTGGGTCAAT CATATGCCAC I'GTCAGAAAG CCTACCCTT'ACCATTGCC GTATGGCTAT G~rrGTGGTA CrCTTGC1-rC CTGGTCAATC ACATTAGCAG CCTTTGTTTA CAAATATAAA CAATACTTTG TCCACGrGCCA GATCCTTACC CTATCCTTTA TCTIrrGGTCA TrrAGCACTC CACTATAA'rG CGCGATTGCT GGATGTGAAG GCCT'rGATTT T1'CTGGTCAA rTTTGGTGACA CCATTGGTTG GCGATCACGG TCTAGTAGCT GCTACTATCA G~wrTGACTAA GAAAATCT'rA
'TTCGGGAACT
GGAATTTT'TC
GGTGGCGATT
AATTATTTAC
GAATTCTTTT
CATTATTGGG AACATTTGGG CTTI'TCCACA TATCACCATT CTCTAGTTTA TCTATTGAGA TCATGACCTr TGCCCTAAAr ACGGATTTT GACAAAACCG TTGTI'TCAAT TGTGCTGGTA TAGCTCAAGA AGCAGAAAAA ATGATTGCAA AGGAAGCTTA ACACAGAGCT TTCTTTTTTG CTCT'rAGAGA GTrN'TACA.A GCAGCTTATA AAATAACAAT TTCTGAATAG ACAAACTCAA AAAATGGCI'G GGAAA7r'TAG GAAAAAAGCA AGCACGATTA AATTTTTTGT GTTATAATAT TT'TGTGAATA GCI'ATGCCTA TG=~AGCTA IGGAATAATA CGAAGTGCGA AACTTGGAAG ATAGAGAGGA AGCGATGTAA TGGCTAGAGA AGGCTTrTT ACAGGTCTAG ATArrGGAAC AAGCTCTGTC AAGGTGCT'rG TGGCCGAGCA GAGAAATIGGT GAATTAAATG TAATTGGCGT GAGTAATGCC AAAAGTAAAG GTGTAAAGGA TGGAAT'rATT GTTGATATTG ATGCAGCA:GC AACTGCTATC AAGTCAGCCA TTTCCCAAGC GGAAGAAAAG GCAGGCATP'r CGA'PTAAATC AGTGAATGTC GGCTTGCCTG GTAAT'r=T GCAGGTAGAA CCAACTCAGG GGATGATTCC AGTAACATCT GATAC'rAAGG AAArrACCGGA TCAAGATGTT GAAAATGTTG TCAAATCAGC TTTGACAAAG AGTATGACAC CTGACCGTGA AGTCATTACC TTATTCCTG AAGAATTTAT TGTGGATGGT TTCCAAGGGA T'rCGTGACCC ACGTGGCATG ATGGGGGTTC GCCTTGAAAT GCGTGGTTTG CTTTATACAG G ACCTCGTAC TATCTTGCAC AATTTGCGTA AGACGGTTGA GCGTCrAGGT GT'rCAGGTTG AAAATGTTAT CATTCACCA CTAGCAATGG TTCAGTCTGT T?'rGAACGAA GGGGAACGTG AATTTGGTGC TACAGTGATT GATATGGGGG CAGGTCAAAC GACTGTCGCT ACAATCCGTA ATCAAGAACT CCAGTTCACA CATATTCTCC AAGAAGGTGG AGATTATGTA ACTAAAGATA 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 360 'rCTCCAAGGT TT=AAAACC TCTCGCAAAT TAGCGGAAGG CTTGAAACTrG AA'rTACGGGG 2040 AAGCCTACC GCCTCTTGCA AGCAAAGAAA CCTTCCAAGT AGAGGTATT GGAGAACTAG 2100 AAGCAGTCGA AGTGACGGAA GCCTACTTGT CAGAAAWrAT TTCTGCACGA ATCAAGCACA 2160 TCC~rGAACA AATCAAGCAA GAATTAGATA GAAGGCGTCT ATTGGACCTC CCTGGTGGTA 2220 TTGTCTTAAT CGGGGAAT GCCATTr'rAC CAGGTATGGT TCGACTGCT CACGAAGTCT 2280 ?TGGCGTCCG TGTCAAGCTT TATGTTCCAA ATCAAGTTGG TATCCGTAAT CCAGCCTTTG 2340 CGCATGTGAT TAGTTTATCA GAATTTGCGG CTCAATTAAC AGAAGTTrAAT CTTT'rGGCcC 2400 AGGCAGCGAT AAAAGGTGAG AATGACTTAA G'rCATCAGCC AATTAGTTTT GGTGCGATGC 2460 TGCAAAAAAC AGCTrCAGTrr GTACAATCAA CGCCTGTTCA ACCAGCTCCT GCTCCAGAAG 
2520 TAGAGCCGGT GGCGCCTrACA GAACCAATGG CGGATTTCCA ACAAGCT'rCA CAA.AATAAAC 2580 CGAAATTAGC ACATCGTTTC CGTGA'rrA TCGGAAGCAT GTTTGACGAA TAAAGAGGAA 2640 AAATAAA~TTA TGACAT~rc ATTTGATACA GCTCCTGCTC AAGGGCAGT CATTAAAG 'A 2700 ATTGGTGTCG GTGGAGC'rGG TGGCAATGCC ATCAACCGTA TGTCGACCA AGGTG?2'ACA 2760 GGCGTAGAAT TTATCGCAGC AAACACAGAT GTACAAGCAT TGAGTAGTAC AAAAGCTGAG 2820 *ACTGTTATTC AGTTGGGACC TAAATTGACT CGTGGTTTGG GTGCAGGAGG TCAACCTGAG 2880 :::GTTGGTCGTA AAGCCGCTGA AGAAAGCGAA GAAACACTGA CGGAAGCTAT TAGTGGTGCC 2940 *GATATGGTCT TCATCACTGC TGGTATGGGA GGAGGCTCTG GAACTGCAGC TGCTCCTCTT 300r- ***ATTGCTCGTA TCGCCAAAGA T'rTAcGGCG CTTACAGT'rG GTGTT'rTAAC ACGTCCCTTT 3060 **.GG7TTTGAAG GAAGTAAGCG TGGACAATTT GCTGTAGAAG GAATCAATCA ACTTCGTGAG 3120 CATGTAGACA CTCTATTGAT TA'rCTCAAAC AACAATTTGC TTGAAAT'rGT TGATAAGAAA 3180 ACACCCCTTT TGGAGGCTCT TAGCGAAGCG GA'rAACGTTC TTCGTCAAGG TGTTCAAGGG 3240 ATTACCGATT TGATTACCAA TCCAGGATTG ATTAACCrrG ACTTTCCCGA TGTGAAAACG 3300 GTAATGGCAA ACAAAGGGAA TGCTCTTATG GGTATGTA TCGGTAGTGG AGAAGAACCT 3360 :*.GTGGTAGAAG CGGCACGTAA GGCAATCTAT TCACCACTTC TTGAAACAAC TATTGACGGT 3420 GCTGAGGATG TTATCGTCAA CG'rTACTGGT GGTCTTGACT TAACCTTGAT TGAGGCAGAA 3480 GAGGCTTCAC AAA7r=rAA CCAGGCAGCA GG'CAAGGAG TGAACATCTG GCTCGGTACT 3540 TCAAI'TGATG AAAGTATGCG TGATGAAATT CGTGTAACAG TTGTTGCAAC CGTGTrCGT 3600 CAAGACCGCG TAGAAAACGT TGTCGCTCCA CAAGCTAGA'r CTGCTACTAA CTACCGTGAG 3660 *ACAGTGAAAC CAGCTCATTC ACATGGC=r GA'rCG'rCATT TTGATATGGC AGAAACAGTT 3720 GAA7TWGCAA AACAAAATCC ACGTCGTTTG GAACCAACTC AGGCA'rCTGC TTTTGGTGAT 3780 TGATCTTC CCCCAA'rC CArrGTTCGT ACAACACATT- CAG74CGN'TC TCCACTCGAC CGC~rTGAAG CCCCAATTTC ACAAGATGAA AATCGTTAAG TAAATGAATG TAAAAGAAAA GGCTAGTCTG AGTGCTCATC GAGAGAGTGG GATGAATrGG ATACACCTCC ATTTTTCAAA TACAGAACT'r GTTTTTCGAG AAGTTGCAGA TGTAGATGTA CCGACAGCGG TCGTGTAGAT AAGTTTCTGG TTTGATTGGT ACCTTGCAAA CCATGCATTG GACTCAGTAA CAAGTGT'rTC CTTCAAGTAA GGAACTGCTG GAAATCTTGC AATGACGATG GCACCT7"rr CCAAGAT'rTA CAAAGAGAAA AAGTATGGGA ATGAGTCGTG TATAGCTACA TCATTTTTTA ATTTATAGAT TATTTTACGG 
GCCTGTGTTT ACTTCAGTAAN ACAGTCCGCT GGCACAAAAG
AAGCTGCT
AAAAATATGA
GACGTAAGGT
AGCTAGCAGG
TCCGTCTCT
TCCG-CTAGGT
AGCTrrAAAA
GAAAGATGTC
GGAAATTCAA
ATATTTCTAA AGAAGAAAC CAGAGTTAGC CAGACTAGAT AGGCTACCAG TGAGCAGTTG TTCAAGAGAA ACA.AATTCCA ATTATAAAGA AGCGAT-rCAA AGTAGGAGAG AACCATGTCT AGGATGAGGA TTCAAGTCTC ATTCn'-CACA GGAACCGGCT AGAACAATAT CACCAGACTT GTCATTGCAG TTACCAAGTA GTCCATCATA TCGGTGAAAA GATCGAGATG TGACTTGGCA ATTCAATACG TrGA'rrATTT AAAAGAAGTG ACCGAG'rCAT AAACACGGTT TTTCGAGAGA AAGATTGAAT ATGTrGGTrT AAAGAGATT'r TCAAGGCGGC AATATGCCTA TGACCGAGN' TTCGTCCA CTTrTGTTCG TTAAAAGATA GATTCGATAG CCTATGAAA AAAGAGATGA CTCCCAATGA ATCAACCTTC CATGCAAGAC AACAGGAATT GATGT'rCGTT ATCCTAGAAA AACGAAAGTA TCTTGATTGA GACTA71NTGG ATGGAGCTTG ATGTA1-rTGT TGACACCAGT GAAGATCAAC AGGGrGAGTr 7=TAATTCG TATGATTTA'r CTGTCATGTC T'rG.GTTTCCA TGGTGAAACC AGTG.CTTGCT TATCTG~rG GGrGCGATT TGGCGATGAT AGGATGAATA 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 GGCAAATCAG AGTCAGCGTG CAACGGATAA ATATGAGCAT GCAACAGAAA TTGTTrGATTT TTTTCAGTAT ATGACAGAGG TGCAGGCTCG TCATGTTTTA GCTGGAAATT TGAAAAAGGT GAACGTTATT GTAAATGTTG AAGATATCCG CGGT7TTGAT ATGAAGCGAA ATAGAGTACG AATGCAGTGG ATATTTACTC CCTGA'N'TTG GGTGCCTACG AATCCAGT'rT AGGTCGTTGG CCCTTGCAAC GCC'rGCCTTT ACAGATAGCG GTTTTGGTTC GATTTTAGG AGAAAACCTA
GGTCATTATA
ATTGGCAGGA
'rCGTTGTTTG
AGCTTCTACC
TTTACCAGAT
ATAATGATTT
GTAGCCTTCG
ATTGTAGCGT
GGTCTTGATT
GTGCGTTTTC
AACCATTTA TCAGCATTTC TCCATAGAAG ATCGTCCATT TCTTGACAAG GGAATGGAAT GGATAAAGAA GGTAGAAGAT AGCTATGCTC CTT'TAAC TCCTTTATC AATCCTCATC 362 AGGAGAAGCT ATTAAAGAITT ?TGGCCAAAA CCTATGGTCT TGC~rGTAGC AGITAGTWWG AATTCGT=T GAGTGAGTAT GrCGAG~T TATTATACCC AGATTA'N'?C CAACCACAGT 7rrCAGAI-r TGAAATATCr CTCCAGGAAA ?MGGTATTC CAATAAATTT GAACATTTAA CGCATGCTAA GATTTTACZG ACAGTCATCA ATCAATTAGG GATI'GALACGG AAAC=MrG GAGATATCCT AGTAGATGAA GAACGGGCGC AGATTATGAT-rAArCAGCAG 'TrrCTC TCTTCAAGA TGGACTAAAG AAAATTGGTC GTATACCTGT ?1'CGCTGGAG CAACGTCCTr TCACCGAGAA AATAGATAAC CTAGAACAGT ATCGAGAACT GGATTTATCT GTGTCTAGTT TTCGAT'rAGA TGTTC=rTA TCAAATGT TGATTGAAAA GAAACTTGTC CAAGThAATT TTCAAG7TGG AGAC'TTGATT AGTGTGAGAA AGGGACAAAC GAAAAAAGAG AAGAAAAAAA GAATAGAA'rG CCAATTACAT CATTAGAAAT
TGAAACTATC
ATCATGTCCGT
AATTTGGTCG
TAACCGTCCA
AAAGGACAAG
TAGGAATCAA CCAAACCAGT AGACAAATCA GATTACACTG CTTGAGATTA CTTCAAGATA GTTA'rTATTA AGTAAGTGAG AC7TTG4GAA CTCGATTCAG AGGT'rrGAT CCAGAAGAAG TCGATGAATT TTTAGATATT GTGGTTCGTG TCTTGTGCGT GCGAATCATG ATAAAAATTT GCGTATTAAG AGTTAGAAG 7TrACTTrTGAT
TGAGAGAGTG
AGATGCGCAA
AACTGATAAT
CTTCCACCAA
TTGGGAAGAT
GAAATAAAAG ATTCATTGAG CCACTCTGTA TTGATTGCTC AAACAGGCGG CGCATGAACG TTCAAACAAT ATCATTCATC CGCTTGTTGG AAGAAGCTAA ATATAAGGCA AACGAGATTC GCTAAGAAAG TCGC'rCTTGA AACAGAAGAA TTGAAGAACA
ATTACGAAGA
AGCG~rrGTC
AGGATACAGC
AAGCAGAGCA
TITCGTCAAGC
AGAGCCGTGT
5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320
CGTCTCAAAT
ATTCTCCG'rC CTACAATTGA GAGTCAGTTG GCTA7"TGTG AATCTTCAGA CAACAGCTAC TTATCTTCAA ACCAGTGATG AAGCCTT'rAA AGAAGTGGT'r AGCGAAGTAC TTGGAGAACC GATTCCAGC'r CCAATTGAAG AAGAACCAAT TGATATGACA CGTCACI'TCT CTCAAGCAGA AATGGCAGAA T'rACAAGCTC GTATTGAGGT AGCCGATAAA GAATTGTCTG AATT'rGAAGC TCAGA'rTAAA CAGGAAGTGG AAGCTCCAAC TCCTGTAGTG AGTCCTCAAG ?rGAAGAAGA GCCTCTGCTC ATCCAGTTGG CCCAATGTAT GAAGAACCAG AAG'rAGCTCC AATGCATCCG ATAGGTCCAA CACCAGCTAC AGAAACTGT'r GATTCAATAC CGGGArrrGA AGCACCGCAA GAATCTGTTA CAAI!'=ATA AGAAATATTC TGAGAACAAT ATCTTATCCT TATATT'rCCA GCGAGCAGGA GATGGTGTCA GTCCTGTAAT CCCTATITGAT AAGArrATCC 'rCTCAAAAAC 'rCAAGTCTGA AGCTAGTAAG ATTTGACGTT TCCCACGTTA CGGGATAAGA GGGAGAAAGA CTAAATCTT TTCCGAATAA AGGTGG'rACC ACGA7"'rCG TCC'I-rTI'TGG AAGTCGTGGT TTNTAATTTG TTATTATTTA TAAAGGAGAT 363 ACCATGAAAC TCAAAGACAC CCTTAATCTI' GGGAAAACTG AATTCCCAAT GCGTGCAGGC CTTCCTACCA AAGAGCCAGT I-rGGCAAAAG GAATGGGAAG ATGCAAAACT TTATCAACGT CGTCAAGAAT TGAACCAAGG AAAACCTCAT TTCACCrGC ATGATGGCCC TCCATACGCT AACGGAAATA TCCACGTTGG ACATGCTATG AACAAGAT~r CAAAAA3ATAT Crc'1-rCGT TCTAAGTC'rA 'rGTCAGGATTr ?rACGCACCA rrATTCCTG GTTGGCATAC TCATGGTCTG CCAATCGAGC AAGTCTTGTC AAAACAAGGT GrCAAACGTA AAGAAATGGA C'rGGrrGAG TAC7'rGAAAC ?rTGCCGTGA GTACGCTCr TCTCAAGTAG ATAAACAACG TGAAGA?'N'T AAACG7'rGG GTGTTrTrGG TGACTGGGAA AATCCATATG TGACCTTGAC TCCTGACTAT GAAGCAGCTC AAATTCGTGT GCTAAGCCAG TT-TACTGGTC TACCATGACr TGGTTTCAAC GTTCTAGATA CAGATACTTA 'rCICGTG.CT TGACGGTTGG GCTCGTAAGT TTCTCGTTGC GCTGA'rGTTC AAGTGGA CACCCATGG4G ATACAGCTGT TCTGGTACAG GTATTGTCCA ATTrTGGTGAG A'rGGCTAATA ACCGTTATAT CTACCGTGGT ATGGTCATCT GAGTCAGCAC TTGCTGAAGC AGAGATTCAA TTCCCTTTAC TATGCCAACA AGGTAAAAGA TGGCAAAGGA TATCGTTGTC TGGACAACGA CTCCATTTAC CATCACAGCT TGCAGATATT GATTACGTTT TGGTTCAACC TG;CTGCTGAA TGCTGAATTA TTGACTAGCT TGTCTGAGAA ATTTGGCTGG AACTTACCGT GGCCAAGAAC TCAACCACAT CGTAACAGAA AGAAGAGTTG GTAATTC'N'G GTGACCACGT TACGACTGAC 9 9* 9 9 .9 99 9 9 9 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 
8700 8760 8820 8880 8940 9000 9060 TACAGCCCCT GGTTTTGGTG 9 9*9* 9 9* 9 9 9**9 9 9*9* 9* 9 9 ATTGCTAATA ATCTTGAAGT CGCAGTGACT GTGATGAAC GCTGGTCCTG AATTTGAAGG TCAATTCTAT GAAAAGGTAG C7'rGCTAACC TCCTTCTTGC CCAAGAAGAA ATCTCTCACT ACTAAGAAAC CAATCATCTG GCGTGCAGTT CCACAATGGT CGTCAAGAAA TCI-rGCACGA AATTGAAAAA CTGAAATTCC CGTCTTTA.CA ATATGATCCG TGACCGTGGT GACTGCGGTrA GGTGTTCCAC TTCCTATCTT CTACGCTISAA GATGGTACAG AGGACGATTA CAATGTTGGT GTGGTATCAT GATGAAGAAT TTCCAACTGT TATTGAAAAA CATATCCATT TGACTGGCGT TTGCCTCAGT TTCTAAA'rrC ACTCAGAATG GGGTAAAGTC TCTCTCGTCA ACGTGCTTGG CTATCATGGT AGCTGAAACI' ATTGAACACG TAGCTCAACT 7r'GPAAGAA TATGCTTCAA GCAVNrGGTG GGAACCTGAT GCCAAAGACC TCI'GCCAGA AGGATTTACT CATCCAGGTT CACCAAACGG CGAGTTCAAA AAAGAAACTG ATATCATGGA CGTTTGGTTT GACTCAGGTT CATCATGGAA TGGAGTGGTC GTAAACCGTC CTGAATTGAC 7rACCCAGCC GACCTTTACC TAGAAGGT'rC TGACCAATAC CGTGGTrGGT TT'AACTCATC ACTTATCACA TCTGTTCCCA ACCATGGCGT AC-CACCTTAC AAACAAATCT TGTCACAAGG TTrrGCCCTT CTTGGAAATA CI'ATTGC'rCC AAGCGATGTT cTcTGGGTAA CAAGTGTTGA CTCAAGCAAT CAAGTTTCTG AAACrrACCG TAAGATTCGT TC3'GACTTTA ACCCAGCTCA AGATACAGTC TACATGACGA TTCGC~rrAA CCAGCTTGTC GAAl'rC'TGA CGATCTACAA GGCCTTGGTG 364 GATGGTAA.AG GTGAGAAGAT CTCTAAATCT GAAAAACAAT TCGG;TGCTGA AATCTTGCGT GACGTGCCTA TCTCTATGGA TATCTTGAGC AACACTCTTC CTTTCTTGAT TGCCAATACA GCTTACGATG AGCTTCGTTC AGTTGATAAG AAGACCATrC GTGATGCCTA 'rGCAGACT AACTTTATCA ACGTTGACTT GTCAGCCTC TACCrTGA7?T TrGCCAAAGA CAAATGCAGA CTGTC'rrCTA CT'rCCTCACA CTGCGGAAGA CAATTGTCAG AATTACCAGA TGGGCAGCCT TCATGGACTT GCAAAAGTTA TCGGTAAATC AAAACTCTAC TCGAAGCAGT ACCATCGCAG AAGGACCAGC GTTGAACGTG CTACTGGTGA GAACGCAGCT ACCAGCCAGT TGTTGTTTAC ATTGAAGGTG CCA.AATCACT GGAACGCCGT TGACATTCTT GTCAAAATCA CCAAkACTCTT GACACCAATC AATCTGGTCA TATCTTGAGT TTGAAACAGA AGACTTCGTC AGTTCAAACT TTTGCTAACC AAGAAGAAAT CTTGGATACA 6*
TCGTGGACAA
ACTTGAAGCA
AAACAGCAAT
TCCGGAAGCT
AGTATGTGAC
TATCTGTGAC
GCGGAAGCAG
AATTTGAGAA
GAATACCTGA
TAC 'CAACAA
TTGATATTAG
GTTATTTCTG
TCGCAGAAGG
GAAAAGACAA
TA'rGATGCGT
GAATCAAAGA
GGATAAGArr GT7'rCTGAAA
ATTTCAAGAG
CTAA7=TAT
TTTT'ATTTA
GA.AACTTAGC
GGTAAATAGT
ACTATTATAT
TTTTAAAAGC
AAAATAAAA.
TCTTATTG4GT
TGGAGGAGCT
TGCAAATCTT
GCACAAAAAG CCTTGGAAGA AGCTCGTAAT CACTTGACAG 'rTTATCCAAA TGAAGTGTG GTAGCACAAC TTTTGATCGT GTCTGAGTTG GCCCTTAGCT TCGAAGATGT AGCCT'rCACA CG'rTGCCGTC GTATCGACCC AACAACAGCA CACTCTGCAA GCATCGTAGA AGAA.AACrTT* AAkATAAGATT GAAAAGTCTA GGCAAAATTC AGTCTATTAA ACGCATTGTA TCACGTTTTT TTTTAAAALAT TTGCGAGGTA TGACTTTTTA AAGCTAACAG TAGTAAGATA AAATAGGAAT GTAATATTTT TACAACAATA AAT'rTATATA TTTATTTCAT ATTATACAAA TrTTTA'1-rr AAATATGATA CAAT=IrATT TGAAAAAA.AT A.GACT'rGCTT TAATTAGTGG TATCGTCGGT CCTTTTGTCT TGTTGGGAAT AGCGGTAAAC ACTGCAGGGG CTTT'rTCAGG TGTAGCCTTA GTTCTTGGTA TCATTGCTAT TGTTTACTAT rCTGTACTAA TGATTCTTTC TGGTGGAGT= rTGGGGGGAT TTTTGCTATT ATCGGAGGAT 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 ATAATATCAG AACATACTTT AAAAAAGGAG ATTTTATTAT
CTTGTGGGAG
ACAGCTGCTA
CTCT'rGAATG
AAAGGAGATA
AGTCTCATTC
GAATTTTACT
CAACTCTTAA
CCTTGAAGAT
AGCGTGTAGG TGCAGCTCCG TATTCCCTTC TTAGGATCG CTCrTI-CCT 'rTCAACATTG AAGAAATTCA AAAAGAACAA AAAAGTTTAT CGGTATAGGA G-CATGTGGAA CATAAAGTTC AAAGAATACr ACATCCAATA GTCAAAAGA AACAA'rCACT ATGAATCAA TACGCACTTA TGCAGATrAT TATNTTACTA AAGC'TGAGGA AGGTTCAAA ACTAAACTAA AAGAGTCA.AC GAAGA'rAGAA TAGAAACAAC TCCTTTTrGG TTTTGACTAG AGGGGAAACC TCGGAAAAT GCCACGG'rGA GITCTGAATGG TGTCTGAAAA AGGTACACAA
TCAAAAATTA
CAAAAACAAT
CT'TTTTrGTG AATCAGAAGA ATAAAAGGTA V~rTTAGCATG GTAGCTCTAT TATCTCTTTC TCT'rCTAGI-r TCAACAAGTA A'rGATGAGAA GACAGTAGCA TTCGA'rACAC CGGTrGTAAC AGACGATGCG ATAGATCTT ATAAAAATAT ?1'TTGAGAT GGCATAGCTA TGGAAAATAA TGACTCG~rr 7rCG.ATGCGC AGAAAAAAAG G1'TAAATAAT* GTGATTGCCA AACATTGTCA AACAGTCCTT AAAAATTGT'G TAAAATAGAA TAGATAAACG TCCATCTAA1 GGTAAAATTG GTTTTCCTC ACCTr'1-1CAC TGGTTGGGCT GATGTTGATT ACGCTGGTAA ATTGATCAAA GAAGCTGGTA
TAAAGGAGAA
AACAAAGCTA
CAAGCGATI'G
TCGAAT?'rGA CCAAGCTTAC CTCTTGAAGC TTCTGACCA.A GI'CACTACGG TGGTTTGACT AGCAAGTTCA CATCTGGCGT- ATGAGCACTC AGCrCACACA ATGCTGAAAA CTTGAAAGTG CTCCAGCTCT TAAAGATGGT CCC7rGTAAA ACACATCAAA ACTTCCCACC ATTGGTATTC TTGGAAAATA AAAAATTIGTA AGTATGATAA GGAAT1AAAAA ACTTACATGG GT'rGGGAAGA.
G1'ATTTGGCC TTArAGGGGC GGTCCACTTC CTGATAGTrC TTAGCTGAGA TTCGGCAGGC GAGCAGCACC TAATTGGGCC
ACTTCAGTAT
TT~GTGGG=rC
GGTAAAAACA
CGTTCAIrACG 'rGAAACG'rGC TATCAAAACA ACTAACTrGG CAGTTGAAAA A'rCATGGCGC TTGAACGA.AC AAGCTGAAGC TGCTGAACA ATGTATTGCC TCCAAACATG GACCGTrCGTT ACCTCACT ACTTrGGAAC GTGCTCTrCC AAAAACGTAT TCGTAGGAC GGTTTGTCAG ATGACGAGAT GAATTCGACG AAAAATTGA.A AGTCTAG.AAT TGATTTCTAG PLCAAGATTAT GTACTGGCCT AGAAGCTTTrA CCGATAGGCA TGAACGGATT CAAT1-rAATG AGATTATCAG GGTGGAAATC TTTGGAGAAG AGAGATTACA AAAAACGAGT CAATATGG4GA
TGACGACTCA
A'1-rCGGGAA
TCACGGTAAC
TTTGGTGATG
GACCGTGATG
GTTATCCCAG
GA'rAAAATCG
TCAATCCGTG
10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 CATGGACG'rG GAAATCCCTA CGTCGT=TCT GAATACTACC GCTTTTTATG TrAGTATGGA ACAAGCAACC AGCTTCAACC ATGGTTCTTT AGGAGCAAAA AAAAAAGTCT CTGGTCTGGA TTCAGGATCA GTATGTT ATCTGGCTAA GGAACTGGCT CCTATCTGTC 'rrTGGGGAT ATTCACATTG AGTTCAGCCA GCAAGGTACG ACTTTGTCTC AGGTGACGGA CTATCAGAGA CAGCTGAATA TATAAGGC ACTTGCGACG GAACGTAAAG C7TTTGCGAG TTTCCAGAT GGGTTGGAAA CTCTAGATTT TACTATAGAA GGAAAGTATG AGCAGGAAAA ATcTGATTAc 366
ACTTCT'TATG
GATCTCTTGG
CTATCCTTGA
AAGGAGGTA
TCTATAAGGG AACGCGATT TTCAATGT-IT TACTAAGGAA CCTGTGATr' GGCTTCTGAT AGTTGGATAT TACTGATT-CT
CATATCTTGA
TGGGAAACGG
TATGCCAATC
CGCAAGAAAC
TGAAGGGAAG AGTTAAGGAT AATGATCTGC GGN'GCTAG TTATCTAGCT ATGGAGATAT TAGAGTrTGG TCAGATAGGG N'CAGATATC AGGAGCCAGT TCTTcTrGGc CGCTAAGACG GATTTTGCCC AAAATcCCTGC TAGCAATTAT TAGATTTAGA GCAACAGGTG ATAGAC7TGG TGGACACACC TAAAGAAAAG AATTGAAATC AAGGCATATC GAGGACTACC AAGCCTTATT CCAGCCTGTr TGGAAGCTGA TGT'TGACGCA TCCACTACAG ATGA rrGTT AAAAAA'rTAT AAGGGCAGGC TTTGGAGGAG CTGrCTTCC AGTATGGACG GTA'NTrATTG CCAGAGACTG CCCAGATGCT CTACCAGCTA ACCTACAGGG ACTCTrGGAAT
GGCTATACCC
CAATTGGATT
A.AGCCACAAG
ATTAGTTCGT
GCGGTCGACA ATCCTCCTTG GAATTCGGAC TATCAC'N'AA ATGTCAATCT GCAGCTGAAT TATTGGCCAG CCTATGTTAC CAATCTCCTA GAGACGGTCT TTCCAGTCAT CAACTATGTA GATGATT'rGC GTGTCTATGG TCGTC'rAGCG GCTGTAAAGT ATGCACGAAT CGTCTCTCAG AAAGGTGAGG AGAATGGTTG GTTGGTTCAT ACTCAAGCGA CTCCCITTTGG TTGGACGGCA CCTGGITiGGG ATTACTATTG GGGTrrGGTCA CCAGCTGCCA ATGCGTGGAT GATGCAAACC GTTTATGAAG CCTATTTATT' TTATAGGGAC CAAGACTATC TCAGGGAGAA AATrTATCCC ATGTTGAGGG AAACGGTTCG TTTTTGGAAT GCCTTTTAC ATAAGGATCA GCAGGCGCAG CGTTGG3GTGT CTTCTCCGTC TTATTCCCCA GAACATGGGC CGATTTCGAT TCGCAATACC TATGACCAAT CTCTGATTTG GCAGTTATTT CATGATT'rrA TTCAGGCTGC 'rCAGGAATTG GGACTGGA'rC AGGACTTG'rr GACTGAGGT AAGGAGAAGT CTGATTTACT AAATCCTTTG CAAATCACTC AATCTGGTCG AATCAGGGAG TGGTATGAGG AGGAAGAGCA GTA7"M'CAA 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 AATGAGAAAG TGGAGGCCCA GCATCGGCAC GCTrCCCATC TAGTGGACT CTATCCTGGC AATCTCTNTA GCTACAAGGG ACAAGAGTAT ATrGAAGCGG CGCGTGCTAG CCTCAATGAT CGTGGAGATG GCGGCACAGG CTGGTCCAAG GCTAATAAGA TCAATCTCTG GGCGCGTTTG GGAGATGGCA ATCCAGCCCA 'rAAATTATTG GCAGAGCAGT TAAAGACATC CACCTTGCAA AATCT'TCGT GTAGCCATCC TCCr'rTCAG ATAGATGGTA ATTTTGG'rGC TACTAGTGC ATGGCAGAAA TGTTACTCCA G;TCTCATGCA GCTTATCI'GG TACCTCTAGC TGCCCTACCT GATGCTTrGGT CAACAGGT'rC TGTTTCAGGC 'N'AATGGCAC GTGGACATTT TGAAGTGAC ATGAGCTGGG AAGATAAAAA ACTCrrACAG TTGCGAGTr CTTATCCAGA TATTGAGAAG AAAGCGAAAT GCATGGGGAA AGATTGTATT CAArTTATT TTTAAGAAGA TGTTATAAGG TTTAAGAATA TAAGCAGTrr TCAACTAGTT ATACTCAATG AA.AATCAAAG AGCACAAACT TGTrGAGG 'rTGCAGATGG AAGCTGACGT 367 TTGACCA'rT
AGTGTGATTA
TCGGrCCAA
CAGTAATTTG
GAAAAAACGT
TATCAAGGAG TGGAGGAGAT AAATGAATCA AGAAAAAATA CAGCACAAGG TGATCrTGTT AAACTGCC?? TTAATAAGGA TATAATGATA ATAGGAAGTA AGGAAGCTAG CCGCAGGTTG GGTTTGAAGA GAGATTTCG TT'rT'TGAT AGAGGGTGGG TCTGNTGGCT TATATTGAGA TGAAACACTG TATCAGG'rTG GGGACACCGA GATTGTGGCC AATMrTGATG TGAATTTTGA GGGGAGCTGG TTATTATCCT TGGTGCTTCA GGTGCAGGCA AGTCAACAGT CTTGGGGGAA 'rGGATACCAA TGATGAAGGG GAAATCTCGA ITGATGGTGT GATTATAGTT CCCACCAGCG CACCAATTAC CGTAGAAATG A'rGTGGGGTT TTTTATAATC TAGTTTCTAA TCTGACAGCT AAGGAAAATG TGGAACTGGC GTGACAGATG CCTrGAATCC TGATCAGGCC TTGACAGATG TAGGTCTGGC AATAACTTTC CAGCCCAGCT TTCTGGAGGG GAGCAACAGC GAGTCTCCAT GTAGCCAAAA ATCCTAAAAT TCTCCT~rGT GATGAACCGA CTGGAGCCT'r
CTCAAAACAG
AGGAGTATAA
TTACAAGCGT
GA7"GAAAAG
TCTTAACCTT
TAATATTCC
TGTTTrCAG
TTCTGAAATT
TCATCGTCTC
TGCACGCGCG
GGA'rTATCAG 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140
ACGGGCAAGC
ATCATCGTGA
GATGCCAGTG
TACTAGCATG
CAAGGGGCGT
CCTCAAAGTA
AGGTTTTGAA AA'rTCTCCAA GACATGTCTC GTCAAAAGGG AGCGACGGTG CTCATAATGG ACTTGGCG CCCATTGCTG ATCGCGTGAT TCAAATGCAC TCAAGGATCT GGTGCTCAAC CAGCATCCTC AGGATATTGA CAGTTTGGAG ATCAAGCGAA AAACTTAT'rG GAAGGACTTA GTTCAGTCCT TCACACGCTC TTTTTATCCA TCTTGATCCT GATGATG'rTG GGATCTCTAG CCTTAGTAGG ACCAGTCCCA ACATGGAGGC GACAGCTAAT GCTTATTTAA CAACTGCTCA AACCT'rGGA'r TTGGCAGTCA TGTCTAACI'A TGGCTTGGAT CAAGCAGACC AAGAAGAACT AAAACAGACG GAGGCCGCAG AGGTCGAGT'r TGGCTATTTG ACAGATCTGA CT1ATGGATAA TGGGCAGGAT GCCAT'rCGGC TGTACTCCAA ACCAGAGCGA ATTTCAACCT TTCAGCTAAG AAAGGGACGA CTTCCTCAGT CAGACAAGGA AATCGCTTTG GCCACITCATT TGCAAGGCCA ATACAGCGTG OGACAGGAGA TTAGTTTTAA AGAAAAAGAA GAGGGTCATT CCTCTTTAAA AGACCATACT TATACCATTA CTGGTTTTGT GGATTCGGCT GAAATCCTCT CCCAGCGAGA TATOGOCTAC GCAGGAAGTG GAAGTGGGAC TCTGACAGCC TATGGGGTGA T'TITACCTAG
TCAATTTGAT
AAATGCCTTT
AA'NrATCA
TCTAGACAAG
TCGTrAGCA
TCAAGTTCAG
368 CAGAAAGTCT ACAATA'rAGC TCG~rrGAAA TATCAAGATT TCATCAGCTT ATGAAGAAAA ATCCAAGCAA CATCAAGAAG GATAATGGCA AGGTACGTCr GCAACTI'?rG AAAA6AAGAAG GGGCAAGAGA CCCTTGACAA GGCrCAGACT AATNGMCAGG GCTGCTCAAG CTCGTATACA GCCTCAAGAA M71'CAACTAG AGAGAGCAGG CTAGTGCTCA AC~rACCCAA GCCAAGCAGC
TAGCGGGTTT
AGCTTGAACA
GACAAGAGTC
AAGGCAAGCG
CCTTGI-rCC
AATTGGGCAA
GGAAGAGGAC AAACTAAAGC AAGCTGAACA AAATCTAGCC AAAACATCAG CAAGTCTTGG ATGATTTGMGC GGAGCCAACG GACCA'rCCCA GGTGCTCAGG GCTATCTTAT GTATAGCAAT AGTGGGCAAT ATCTTTCCTG TGGTACT'rTA TGCCGTAGCA CATGACTCGC TTTGTAGACG AAGAGCGAAC TCATGCAGGG TCGTAGTAAG GATAT'rATCG CCAACTTTCT CCTTT'ATGGA AACGGCTCTA GGTAGTATAC TTGGTCATTA 'PTGCTAGCC TACAAAAGGC ATGGTGGTGG GAGAAACTCA GArrCAGTT'C AGCT'NTrGTC TTGAGCTTGT TGGCGAGTGT G'ETACCAGCC ACTTCATGAC GAAGCAGCCC AGCTTCTACT TCCTAAACCT CTTATTGGAG CGTATCGGTT CCCCAACATC TTTCGTTATA TGTAGCTCTG CTCTTTGCAG ACAGTTTCAA CAAATCCAAC TCAGGACAAC GTAGAGCTAG AATCTATTCT AAAGCGCTAT TCTTATGATG ATAGAGAACG GGAGCTGACA TTAAAAGATG T'rATCTGGCC TCGTCTCAGT AGCAGAGAAT GTTGATGACA GTTTGGGAAT CCAATCTTCT CAAGAAAAGG AAAAAT"?AGA TATCAGGTTT ATAATCGTCA GC'TTCATCCA GTATTCGAC GCCATGGTGA CCTTTACGAC ATTTTTAAGG CCTTGC-GT'rA CTAGTAGCTG GGACTGTCGG AGTGTAATTIT CAAGTGTCAT TATTCGACCT ATAGCTTACT TATCTG4GTGG CTTGGAGGGA CCTGTCAAAG GAGCTAAAAT ?1TTACTCATA AGG!'AACAGC ATCTTTGGTG TGGCAGGTTC GTAGCAGGAG TTCCGTCTAA GAAAATCCTA GTGCGACCAA GAGATACTAG CCTACCAGAA GCTGGTCTTC AAAACATTAC CATCTTCAAC ATCATCAGCA CTCGCCCAGC TGGCAGGTGT AAGGTCCTTG CTATTACTGA TATGAGCAAC TTTACGGACA ACCAGTGCAA CTAGTATCGA AGCGTrrGTCC AAAATGCTTC CAGACCATGA CCZATCTTGGT 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 17340 17400 17460 17520 7580 17640 17700 17760 17820 17880 17940
AGTATCAGAT
CAGAAGTGTT
ACAAGGATTT
AAGATTTGAC
GCATCGTTAT
GCTTGTCTCT
GAAAGGGCAG
CAAAGGCAAA
TCCCTTTATC
TACAGCTAAA
TAAGGAACTA
TCAGGCTAGC
ATTAAGGGAT
'rGCGGTGTCC
CTCACTCAAT
CAACGG'TGGG CAGACrrTAG AAATTGAAGG GAACTACGTr GGTCACTTTA TTATATGAG GCTACCCCAA CCCAACACTT ATCTGGTCTC AAGTCAGGCG GGCT'rGCTTA TGAATCAATC AGCCATTCGA CTCTTCGACT CTATCGCTAG CATCGTATCG GTTrCTATTAG CTAT'rGTCAT CCTrACAAT CTGACCAATA TCAACGTAGC 369 TGAGAGAATC CG;TGAACTCT CCACTATCAA GICTTGGT 'rTCATAATA CCTCTACATT TACCGTGAGA CGATTGTGC GTCCCTTGTG GGAATCGTAC AGCTGGTTTC TATTTACACC AATTTTTGAT TCAAATGATT TCCCTrGCGA TTATCCGCAG G'TAGGCTGGG AAGTCTATGT AATCCCAGTG GCAGCAGTAA GACCrC?1 GGTTTCTTCG TCAATTA?1'A TCTGAGAAAG GTTGATATGT GAAATCTGTA GAGCrAAGGTA =~ATTTTTA GCTGA'rTGAA CTTCrATTTrA
ATGAAGTCAC
TTG.GTCTGAT
CTATTCTCTT
GCATCAT~rr TAGAAGCCCr
CTAATATTCA
AAAATCCrCC GTrTCAAAGA GCAGCGAACT TTTAGCAGCT CCAAN'GCGG CTTCGAACTT AACGTAGCGA ATCGTATTGT CAGTATCGAG T'rCGTTGATC AAGACGGCAT AATCGCGCCC AATGGCATI'G TCAAGGCCTT CAGCACCGCA 'rGAAACAGTC AATACGACCG TGTIGTCCAG *o 0 0 00.0 .000* 0*0:: TTGAG'rTGAG CAGATGCCTG ATCAAAATCA GCCAGAGATT CTTGTCGCCG ACTTGTAGrr CA'rAGGATAC TCCAATCTTT ATTTGACCGG AAATATGGGC TAGGTTTATT TTGCCAATCC TAGCTGTCGC CTTCTTTACC GGGATGTTTG TGTCTTGAGA AAGAGTGATG GAAC71TTGT GTTCC'rrCTT C1-TCAGTAT CAGAAGTA'rT TGAATGCTrCC rrAAA'TrrAT CCTTACNTrC TirrrTTCAT AGAATTCT TTAGCAAAAA TAATACCGTT TCATTNTGAC CT'TCAAGGTA TTTTCAGTTT TCI'TGGCATT~ AGGTI'GATAC CGTTATAGAA
TATCGATAGA
T'rTTAGAAAG GT'1'ACCTGT TTTr-TCCAT
ATAGAAAAAA
TTCAGCAATC
T'rGTTCTGCG
AACAGTTTTC
r'rGG4CGAAGT
GATTTCCCAG
TTCGCTGGTT
TTT ATCAAGT
ATTGTTAGGG
TTCA-AGGTTA
GATAACATCA
TTCTACCAAT
AATCAAATTA
CTTTGTGACA GAGGA?r TCTATAGGGC TGGCTCAGAA TTGATATTAT CCACGTA~rC GACAAAGACT GCGCGTGCTA ATAGGTGCCA GAAAGAATGG TCAAACTAGT CTGAAAGCAT CCAACGTTTT TGAGCAAAAG GTAGGTCCAT T1CCAGCCAAT TCTTCATTAA AACGACC'TGT AGGAACGACA CTCAAGACTT TTrCTTGCC ATCTGTTGTA GTAAGAGAAA AkATCAAGCGC AAAGCTCACA GGATT'rCCGA GAAAAGTTAC 'rTTAGCTGAA ACAGTCGGAA TI -CCAATG CGCCAGCTCA TGTGAGAATG ACGTTTrCA TTGTCAAGGT TGTATTTCAT CATGCTGTAG ATAGAGTCAG 'rAAAGATT'rG AGCGTAGATT ATI'GGACGGT CATCCACACT TGATTfCTACA T'rTTCAACCA AGGTCTTGAT TTGTTCAGGA ATGTAAGCAC rTGGGACACC ATAGGCTTTA ACAATGAGTT TCTTTTCAGC AGGGATCTTA T'rGTCTAACT TATCAGTATA 'TCTTGAGA TCTTTGGCGC TCAATGTTT GGCGATATTTr AGCCAAGCGT GTGGGTCTTC ~TCCTTTT ACCCCGTCGC TGACTGCGAA GTAGTCTrTG TTTGTAAACC AAGCA7TGCC ACCTGTTTCA GCCTCAGAAG TTTTCTAAC GTCTTCAGGA 18000 18060 18120 18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19500 19560 19620 19680 370 GTCTTGCCCA ATCGGAACGA TACTATGAAG GTCAAT=~G AGTGGTI'CGTr ATTCGTcTGG 'rCACCAGCAA TATTT~rrAG;T AATA'rCAGCG ATGATTGACT 71"TTGACCAG AAGTTTATC TTT'rTCCG CTAGCACATG AGAAAGAGAA CGAGTAATGT ACCrAATTTT TTCATTAGAT
TGCCCCTTAT
AAAGAAGCTA
ATTAAAACTA
TrrAACAAAT
ATGAGAAAGA
TAGCCAATAA
GTTTATTTTT
AACTAGCAGC
AGAGTCCCAA
CACI=CAAA
TGTAAGCACG
AACTrGAAGCA AAGCAPAATC ATACTT'rTCA GACTA'rTAGC ATACAGATAA CAGCATGGCT ACAATCAGGA TAGTTCCGAC ACTTTGCATG CAGGAGTACC ATGAGAAGGT AGTGATAGALA ATTGACAGGC GACTTCATCA AAGGAAGTTA TCAAGAGTTG CTTGAAGAAA AGCTGCCCCC ACACCCATAG TAATAAACAT ATCCGTATCT AAAAAGCATA TGGAAkAAGGT CAGTTGAACT TTTAGCGACA GGCTAAGAAA GAAGAAAAGG TA.ATGCCCAT GGCGGTATCC TTTGATGTAG GTAATGATGA TGGCAGCTAG CAATCCAAAG TTGTAGCAAC AACTrrrAGT CTACAAGAAT GATrGcAGAA CCTCCAAT'rT A'rTAGGGCTT TATCGC"rGTT TGCGAGCGAT ATACTAGAAC CTCCCGCAAC CTAGCTCCGA AGGTTGAGGA GCAGTTGCAG CTGGGGTAAT GCTlGTCACAG ACACGAGACT ATTCCCATGG CTTTAGCCAA ATCCAG.ArrA ACAAGAGGAT TGGACGGCCA CATb.TTACC CCAATCAAGA TGATACCGAG CTTTTGATAA TCGAGTTTCC ACAATGGCTC CGATAAAGAA ACAGCATGTG AAATGGCATC ACAGCTCCAG CTACAATCCC AATTrTTTGCA ATCCATCGAT 4 *4 *s .4 a 4. pe 4 4 a .4* 4 0O 44 4 4 19 74 0 19800 19860 19920 19980 20040 20100 20160 20220 20280 20340 20400 20460 20520 20580 20640 20700 20760 20820 20880 20940 21000 21060 21120 21180 21240 21300 21360 21420 21480 GTCAAGGCCC AAGATGAAGG ATAGGGCrAC ACCTGGTAAG TCCCATGAGT GACATCCCGC GTAGAPATAAT GAAACATCCC GACGACAATA GCTGTTATCA AGGCATTTTG TAGGAAATCG AAATTCTGCA ATCATAGCTC AGATTGGTTT CGGTAAAAGT AGTAAGACTr GATCGAAGTA GTCT'rCCCAG CTTr'rTTCAA
TCAATCCCAG
GCAATCAAGA
TCAGCTAGGC
CTTCGAAAGA
GGAAAGTTGT
CAAAGGGTTC
CCCGCTGGAA
CGACGATTTC
GAGGAATAGA
AGTCGATATT
ACCTCCATTG AAAAAGAGTT GATTACCGTA ACTC'T TTC?1TTTGTT GGACCAAAGG CAATCACTTC TCGATTGACA GTGGGGAATC TTGCTGAGGT CGTGGTGAAC GATGAGAACC ATCTCTCAGC GTATTCATGA TGA'r'rTCCTC ACTGACAGAG ATCCAAGAGG ATATAGTCGG CTTCCTCCAC CAAACATCTG TTGACCTCCA GACAGTTGAC TAAr'rTGACG TrAGCCTAC AAGGGCCTCT TGCACTTTCT TCCAA'rGTTT AGCCTTTAAA GGGAAATACT CCTAACGAGA CGCATTCCT'r GACCTrGATG CATT'r'T'rGT TCCACATAGG CAATTCGGTG TAAGGA7=T AAATGCCTGA CCTTGATGTG GGATAATTCC CAACATACCT TTAACTTCCT TGTCATCGAG TTTAATAGTG TTGArrTCCC AGCGCCGTrr GGACCAATGA TGCCCGTAAT TGTTGGTCCA TGGAGCACTA GTGAAATATC CTTAAGTGCC AACGTTITCTT TGTAGGAGAC ACTGAGGTTT TCGATACGTA TCATAAACTT GTATTCCTCC TGTCTCTPTAA TATACATI'AA AAAAAAAArr AAGTCAAGTT AA'rTTrGAA AAAATTAAAA TAATAACTGA AAAATAGATT CTAAAGATAA CTTTCAGGAT AAATTAA ATTATAAAAC GCATAGTA'rC AAGTGTAAAA AACTTGGAAT TATGCGTTTT ATCATGGAAA GAIr'rAT AATAGCTAAA AAATAA
INFORMATION FOR SEQ ID NO: 37:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 6171 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
GATCCCCAGG AAAAACCGAG GTTTTCCCA.A TCAATCG'rTA CTGTCATATT CCAC'rCCT-rA TTCTAAAAAC CTAT'CTA TAITCACAC TATTTTTCTA AAATAGCAAG TATATTTITGT 21540 21600 21660 21706 AATTTTCAGA AAA'?TTCTCC
ATTTATCTTC
TGTCAGAAAT
TTTCATAGAT
CCGTACTGAA
GATGCGGTTT
CTAGAATTTC
ACGAAGTATT
TGACACCTGC
AACCATTTTG
AGTAACTACT
GAAGCTTCCA
TGTTCCTTrA
GAAATCGCCA
GTTTTTCAAC
CTCAATGGAA
ACAGATAAAG
TGCCTTAGCA
ACTAACCAAG
AATAAAAACC AACTCTTAGA ACTGATTCTT CATT'rCACTT TCCTGAAGAT AAGCGTCAAA AACrTCTTCA TCTGAAATCG TTGCTAGTGC GTTCTGACAA GTTCAAGTCT TGCA.ATCGGC TTGGATTGGA CAAGCAGAGT TTGGTCGTTC. ACATCCACTT ACAAATCCTT GCTCTGCAAC TGCTCCTGCC AAGAAGACAC TCACGCAAGA CTrGTAATCC TCGTTTGGCA CGGCTGGTTC ACACGTTTCA AGCTTCCACG CTGGGTCAAG AGGTAGAAGG
CCAGATTGGA
CCGACAACCG
ACAACATCAT
GATCTGTATC GTCTTTGAGC TTAGCATACT GAGTGAATTC TTTTCGCTCT ACCCGTTTGA TTGTCGCATC GTCAAACTGA TCCAGTACTT iGT'rTGTGAT GGT 'TGGCTC AGATGCTCTC CATGGATTGG TCTGTAGATG ACATTTCCAA TCTTGGCAGA TTGAACAAAA ATCAAACGGT GGACATCATC TTCTTTCAAA TTCATAGCCT GAACCTCTTC GATATTGAAA CGCAGGGCAT C'rAGTTTAAT CGGAGCCACT GCTACAATCT TGACAGACTT AGATCTATAG GTCCGCCATG TTTGACCAAG GCGAGTCACT GCAAAGTAGG CCACATAA.AG GATTTCTTCA TTCGTTrCAA CGATGTCCTT CCAACGAATA TCTGCCAACT GACTTGTGAA CATCAAGAGG TGCTGGGTTG CATCATCACG CTTGCCA.ATT TCTTCCAAGG IS0 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 TGGAAGCCGC AAAGGAACGT GGACTGGTAC GCTTGATGTA ACCTGCCTTG GTCACGCTGA 372 CGTAGGTATC 7rCCTCAGCG ATAAGACTAG CTGTATCAAT CTCAATTGCT TCGCAGTGT CTTCTAAAGA ACTCAAACGA GGAGTIrGCAA ATTTCTTCTT GACCTCACGA AGTTCvTCT TCATGAGATT CTACATAGTC CTTTCATCAC CGATAATAGC CGCCAGCATA GCAATC -rCT CACGAAGCTC TG.CTTCTCT TCCTGCAAGA CAACCACATC GGTATrGGTC AAACGGTACA GTTGCAAAGT TACGATAGCC 'rCAGCCTGrr CTTCCGTAAA ATCATAGCTA ACrPMAGGT TTTCCTTGGC GTCCGCCT'rA 7TCTCAGAAG CACGGATAAG AGCAATGACT TCTCCAAA TCGAAATCAC ACGAATCAAA CCTTCGACGA TATGGMGACG ?rTCTCAGCC TTTTCTT'TGT CAAAGCGTGA ACGCGCCAAA ATCACTTCTC GACCGTGAGC GATATAGCTA GACAGGT GAACAA'rCCC AACCTGACGA GGTGTGAAAT TGTCAATCGC CACCATATTA AAGTTGTAGT TGA7"GTAG GTCGGTGTAC TTAAATAAGT AGTrGAGAAC AAGCrCAGTA TTAGCGTCTr TCTTAAGTTC GATAGCGATA CGAAGACCAT CACCGTCAGA CTCATCACGA ACCTCAGCAA TCCCAGCTAC CTTGTTATTA ACACGAACAT CATCGATTTT CTTGACTAGA TTGCCCTrAT TGATTTCATA AGGAATCTCA ATAATAACGA TTTGTTCCTT ACCACCTTTT AGCTTTTCAA TTTCAGTCTT GGAACGAACA ACCACGCGCC CTTTCCCAGT CATCACGACC CTGAATAATA GCCCCTGTAG GGAAGTCTGG GTTTATCAAT CTTCCAGTT GGGTGGTCAA TCATGTAAAC CTAAATTATG CGGAGGAATIG TCTGTGGCAT A.ACCAGCCGA CCAAGAGGTT TGGAAAGGCT GCTGGCAAGA CCGTTGGTTC TCCATGCAAA AGGAACrGTC TTTrTCTCGA TATCCTGAAG ACAAACGTGC CTCAGTATAA CGCATAGCCG CAGGAGGATC TACCGTGCAT TTCA.ACTAGA ATCTCACGAT TT'rrCCAGTT CTCATAAGCT TTCTTGATTT TCCAGGCAAG AATTCrZATGA TGCAGCATCT ATGACCTCAG AATCCCAGTC GAACCATTGA TTTCTCCGTA TCCTCAAAGT AAGGTACCr GCAA'rrCAG TCCGTCCATA 
GAACCGTTAT CTGTGACATA CGAACCATGG 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160- 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 CATCATAGAT AGAAGAATCC CCGTGTGGGT GGAAATTCCC CATGATGTTC CCGACTGACT TGGCCGAC2-r ACGGTAGCTC TTGTCAAAAG TATTGCTATC CTTATTCATA GAATAAAGAA TACCGCGCTG AACCGGCTTC AACCCATCAC GAATATCTGG CAAAGCCCGG 'rCTTGAATAA TGTACT'rGCA GTAGCGACCA AAGCGCTCTC CCATGATGTC CTCCAGGGAC ATGTTTTGAA TGTTAGACAT AAGATACAAA CCCCATAAAA TACCAAGTGA AAATAGAAAA TTCTTGAAGT AAGCAAACTC ACAAGAGAAT 'rrATCT1r7' CACACAGTAT CTAGGGCGTG 7rCAACTCCT TTCAAAGAAT GTAGAGTAGG TTTTTATGCA GTAAAAGATA TTTTACGGGA ATTCCTCCCG TGTTCAGTTA CGATAAGTAA CCAAACTA'rC CTGTTTGTAT 7TTTTCAATAT GAAAATCC.( TTAACCGCCT CTTTCACATA TTTTCCAAAA TTAGTCTTAG TTTGTGTCT'r AGCCGCTCCC rACcT TTAG A 373 AGCAC'TCATA GCAGA'N'CTT
ACTGCTTTT
TTCTTTCAT
TTrACGGCCT
TAAATCTTCA
GAGCTGGTCC
GAACTGTTTrA
CACATAGTCA
TAAAACACTG
GGTTrACCT A'rrGTGACAC
GCATTCATCT
CCGATC'r CAT'rAA'rAAT CCTGCAATTT T7"rCAAACCA AGA~rTTCAA 'rrCACATCCC ACTCTAATTT CCAGr"ACT AACATATTAT TCGT'CTTC TAGCGTAAAC 7rGACATTAT CTCAATCCA TATCTrCCCAT GAGAACATG ACGCGGCGTT CGGCGCGCGC GGATGAGGGT ACGGTrc GGTTrCATGG TTG~wrTCCCA CACCAAGTCC TrrGTATCGT 'rGGAGGGTAG CGCCT~rACC CTAGTTCTCC GTCCGTCCAA GCGTAGGCC-A CTTCTTl- CTGCCTT1'A CCTTTGGACA TC?1'GTAAAG CTCGACTAGC GGACCCATGT AACGGTAGAA AGGTGG4GAGG
AAATGTCAAG
ACCGTCGGTA 'rCCGCATCGG TCATGATAAT GATCNTATCA GAAG-rCT= CCAACACCCG CACCAATGGT ATAAATCATG GAGGATATCC GCCATCTTGG CCTTGGCTGT ATTGACAACC AGCCTGGAAC TTGCGGTCAC GACCTTGTTT GGCAGAACCA TAGATAGAGT TCATTCTTAG CAGGATTCTT AGATTGGGCT CAAGCCCTTA TCTN'CTTGT TTTTCTTCCC ATrrCGCC GCAATATAGA CATGACCTGC AGCAAGGTCT GGATATGCGC TACTTGGCAT CTTCA.ATAGA GTATTGATCT CTTCATTTT TTACCACGAA GAGGTAGAAT CCGGCAGAGT CCCCCrCAAC GGCGTCAT TCCCAGACAA TCATCACGCG CCTTACGTGC 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 41.40 4200 4260 4320 4380 4440 4500 4560 4620 4680 TGC-'TCACGA GCATCACGGG ATTTTCCATA AGGAAAAAGG TAGGGGGCTT CCTAG=rAT TAAGATAGAA AGAACGGCCG TTTTTCCTTG AGAAGACCTG CTTGAGTCCI' GTCTCGTGCG AATGTTATCT GAGAATCCGT TTCCCCTTCA AAGTAAAGAA AAAATCTTGT ACTCCATTCT TAAAGACAAG CTCACATT ATTGTACTTG AAATCTGTCG
CCTTGATAGC
TCAACTTATC
CCTTGGTC'rG
CTAGTCCCTC
TTTTACGTGC
'rTCCACCGTC C TTGCGGATG
AGCCACTATT
TCCTTrCAAAC
ACGATAGTCT
AGGTTAGAAG
CCATCCACA
TGCAAGTGTT
GAACCTTCAA
CTAA'rrCCCC
CTGGGCGAGC
CT'rCAGGAAC
GGTTTTTATC
TAATGGCAGA
CAAAAGATAG
ATAGTCATNC ATGACC= GG CTTrGGTGCGA ACGTTATTGA CATTGTACTG GAGGGCTACT TCCACTTGA.A AACCATTGTC CTGGCGTCAA GATTTCC'rTA TCTTCGT.rGA GATAAGAAAC CATAGTGGAA CTCAATCGCC TCA'r'r'rCTC GCTTGTCCGT TCAAGAGAAA GGCTGATTCA TrAAGGCGCT CTGAAATG4GT TAGAAAATAT AGTCGCGTCA GGCATAAAAG TAACTrrTGr GCCTG'ITTTA GACTTGGGTG CTGTrACCGAT TTTCTTCAAA CTCGTGACAG GTTTTCCACC ATrTCGAAA cGTrGcTTGT AAAcTGcc ATCACGGGTA AT'rTCAACTT CTAACCAGCT AGAAAGGGCG TTAACAACGG AAGAACCCAC TCCGTGAAGT CCACCTGATG TCTTATAGCC 374 ACCTTGACCC AATTTCCCTC CGGCATGAAG AATGGTAAAG ATAACCTCAA CAG'N'GGAAT
TCCCATAGCG
GTCTTTATTG
TGCATACcTG TCGGCATCCC ACGTCCATGG TCTTGA.ACCG ATACTTACAT CAATACGATC ACCAAACCCA GACAAGGCTr ATrArCAACG ATTTCCCAAA CATCCCTGGA CGTTrTCGGA ATAATTGTTA ATATTGATTT CTATTCTACA GGTTT'rCCAA AGCAGAGATT CTCTGCTTT GTrTTAT'rAA TCCTAGCCTA GTAT'rTTTC AAATCAATCT TTrCCGCATTT TAGGTAAGAA CTAGGTGATG AAGACCAGCG CCATCGGTCG CCGCATCCAA CCCTrCTAGC CCTTTTTTGA CACAAGGAAC GGATTTTGCA AAATTTT'TCT CTTTCCCAA'r 'CATGATATA TCTGCTGGGT TCGA1TCCAT
)ACCTGAATAG
CTCCTrATTCG TTCTCCGA'rG
ATAGGAGTAT
CTGGTCTCTG
TTAGACTACC
CATCGACTGC
ATCCAATATA
CATCATCATT
TTCATCTTTA
TGACAATTrC GA'rTACAATA
GATTGGACAA
ACGCGAGCAT
AGCTGGTATG
ACCCTAGCAA CGCTGCTTCC GATTA'ITT GGACTTTTGG CTGTTATCGG CCATACCTTC GCTGTCGCAA CCAGTGCTGG AGTGATTTTC GCGATTATCT 'rCTTTGCAGC TCTCTA'rCT GCATCGATTG CGGCTGTTAT CGGGGTTCTG GGTTCTGGTA ACACTGGAAC GACCAACACC GCAACC'rG TGATTGACTT TTTCAAAGGA CATCrACAAG GCGTTTCTCC TCTCATCTTT CCTATCT'rTG CAGGATTTAA AGGTGGTAAG GGATTrGCGC CTATCTTCTG TCTCTACCT 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6171 GGCAGTATGA TTTCACTGTC CTCTTTCCAC T=TGGTTT AACTATfGACT CTCTCTTCAT CGCTATTATC TTAGCACTTG CTAGT'rTGAT CATAAGGACA ATATAGCTCG 'rATCAAAAAT AAAACTGAAA ATT'rGGTCCC AACC1TAACCC ATCAAGATCC TAAAAAATAA AATGCCAGT CTCTACTGCC TAGACAAATA ATTTATCCAA AGGATTTAGT TCTGTACTGC ACAGGACTAA T'ITTACCTTA ATTCGTTTGT TGTTGTAGTA ATCAATATAG TCTATAATGG TTGATTAAGT GATT'FAAATG TI'TCTCATA GCCATAAAAC ATTTCGGATT AAAGAAAnAT TrrA'rCTAC CCTr.TCrTr CCTGTTGCCC! TTACGTGACA
TAGTGTCACA
TATCCTGAGT
TATCAP'rCGT
TTGGGGATTG
CCCAAACAGT
GTCCTTTTAG
CTTGTTCCAA
TTAAAATGCC
TGGATGCTTG
AAT'rCCCTTA CTCTCTAGGA ACCGATGATA AGAATCGTC? TGGTATTGCC ACCCTTGGTC ACTATGGAGA ATCCTATTCT CGTAGTGCTT CTCTGTGAAT GCCTGTTCCA A
INFORMATION FOR SEQ ID NO: 38:
SEQUENCE CHARACTERISTICS:
LENGTH: 18475 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
TATTACAAAT AAAAAAACGG AGGAGTGCTT AcTTGCTPCT TTTGTTGATG TAGACAAACC GCGTA'N'GTA AAAACCACTA TTTGTGGAAC TACTTGCCAA AGTGGTACCL TITCTTGCCA GGAAGGAGTT TCCAACTTCA AAAAAGGTGA TGGTAAATGC TACTACTGTA AAAAAGGAAT GA'TTTCGGT CACTTGA'rTG ATGGTATGCA TAATACTCTT TACCATACTC CAGAAGA CTI CATTCTGCCT ACTGGATATG AAATTGGTGT CGTAGCCATT ATTG4,=CAG GTCCAGT'rGG TTCACCAGCT AAATTGATTA TGGTAGACCT ATTCGGTGCG ACTCATAAGG TTAATTCTTC TGA~wrGACA GATGGTCGTG GTGTGGATrCT
TATGAAAGCC
AGTTATTCGC
AGACCTCCAT
CGAAGGGATT
TATACTTATG TTAAACCAGG
AAGCCAACAG
ATTATCAAAG
GGGAT'rGTTG CAAGGTCTTG ATTrCTTGCG TTATGCTCAC 'rGTGAAGACG GGCTGAATAT CTACGTGTCC GTCAGATGAA GCTTTGGTTA CTTAAAAGGG AAACTAGAAC
ACGCTATTGT
GGGATGTTcC
AAGAAGTTGG
TCTGTGCCTG
AAGGGGGCTG
CTCATGCAGA
TG.CTGTCAGA
CI'GGTrTGCAG
ATTGGCTGCT
AGACGATAAC
AGACCCTGAA
CGCTATCGAA
ATTTGATTTC TGTCAAAAGA TTATCGGTGT AGACGGAACG TGGTAAACCA .GrTGAATwrCG ATTTAGATAA AC1I'TGGATT TGG'TGGTA TCTACAAATA CGAC'rCCACA ATTGTTGAALA TGAACCGGAA AAAT'rGGTAA CTCAC'TATr-r CAAACTCAGT AGTCTTCACT AAGGCAGCAG ACCACCATGC CATTAAGGTC AGAAGCCTAA GTAGTAAAAA TATTTTTGTA CATAAGfAAA AGATCCTGG ATTTTATC AAAAAATTAA GAAATGAGCA CT'ITTAACAG CCCAATTCTA CGCTTCGAAA CTGCCC'rATC AAAGCCATTA AAGAAATTTA GCTGTTGGTA TT'CCTGCAAC G'rTGCCAACTr GTGCTGTGCA CGCAACATCA ATG'rAACAAC GCACTTGAA.A GTCATAAGAT GAAATTCAAA AACCTACGA ATTATCGAAA ACGATATCTC TAGAAATTCA GTCATCCATC TATTTCTTTC CTrTCCGGC 120 180 240 300 360 420 480 540 600 660 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680
GGAATTGGTT
GTACGGAGAT
TAGAAAATT'r
GGAACAAGAA
GGAAGACCAA
TCTTPTGCAG
AACCAGTCAG
TTTGGAGTTG
ATAATATACG GTACAAAGGA ATGAATGAAT ATGTATCGTG TTATAGAAAT ?r'PGAACCG'r GGTGGTTCTT AGAAGTTGG GAAGAAGATA TTGTAGCAAG GACCAG'PATT ATGATGCTCT CAAATACTAC AAAACPGCT GGTTTAGATT TCGCCTCTT'r ATAAAAGTAG AAGCGACTTG ATGACCATTT 'TTTGGGACCC CGCTGGTGTG ATGAATGTGA TGAGTATTTA CAACAATACC ATTCT'GGC GATGAGCAGG TTATCCCAGA CGAAAAACTA CGCTCAGGCT ATGAAAAACA GAAAGGAATC GTTCTTGCCG TATGAAATTA AAATAGACAA AAGTAACTTT CTr'rTrTTAT 'TTTCTAACT CTTPGCGAAT AGTATAGGTG AGGAGGTAAG TATrGGTTCAA GAAATTGCAC
TATCTATTTT
CTGTAAAATT
'rGTGGCGGGT
CTATGACCAT
GAGTTTGGT1'
TATTGAAGAA
'rGGGAGTGGT
GTCCCTAAGT
GGTAGCTATG
ATGAATGTGG
AAGATAGCGT
376 AAGAAATCAT TCGTTCAGCT CGGAAAAAAG GGACGCAGGA TAGACGCCTA TGAGCTTCAT ATGAGGGTAG GAGACGAGCG ATT'GrAAAA GTTTGCAGCC GTTATCAGTC ACTTTAAGTT GAGAAAAAAG ACGTAGTCAA CTGGG?1'CCT CTGA'rTATGC CTCTACGTrT ATCTACTGTA GGCCATTATC GGGGGCATGA ATCCCTTTGT TGCACGATGA GGAGCAGGAC TTAGCCAAGC AGTACAGGCA ACGGGGACTC AArACr.ACCT TGATGCATGA ATI'GTCCAAG C1'GCA'r'TTT TATC7TTTTTG TCACTCT'rTA
GGTTTCAGGA
CTGGTCCGGT
AAGGACAGCA
AGTTATGTCC ATCGAAGATC GAACGAAGCA ATCGGCCTAA AGATCTCTTG ATTATCGGAG TAG'NTGACA CGTGCGACAG TGAGCGTCTG CTGGAGTTGG
CTGCTACCAG
AGAACACCAA
CACAAGTCTT
CCC'rA=TAA
AGATITAATCG
GCAGCCAAGT
CAGGCTGAGA
CAATCTCTTT
CTGTCGAAAT CAAGCAGGAC CCrATGAAAA TCTAATCAAA AAAT'rCGTGA CAGCGAGACG TCTTTTCAAC CAT'rCACGCC GTGTGAGTGA AGAAGAATTC GGGGAGGAGG AATCGTTGAC GGAATGAGCA AATTGACCAG CGGAAAAAAT TAGCTACAGC TCTAGCGGTT TTCATCTGGT AAGCAGTGTG TGACCCAGAT ATGGAAAGTT TGGGATGTTC GGCAATCTCC ACCTGAGTTT AAGAAAAAAT TGATTGAAGT ATTATGCTGG GGCTACGGAA CAAATTATCG GTAATCrGCC ATAGGAGTGC TTTGTTGGAC GGAAATCATT CTCAGAAATG TATCCCTAGC TGAAGTrCAT
TGGACAATCT.GGCTAAGGTC
TGCTGGGTTT TCTTCTCTTA ATAGTAGCAA TATTGCCACC CACATGCTTC AGTTGCAGTT CTTTCCTTGC GTCATCGACC GCGCGTGCAG TGGTCAGAGC AAGAGTATCC GAGGTGTTTA GCAGTCTTC TGCAAGGAGT TTTGCAAGCA GAGATTATCA CTTCTrAAAG ATGGACATAT 'rAAGCAAAAA AATATCATCA GGAGACTATC TCCTTr'rTAG GCGTGTGGGC TTGTCTCAGG AAGTGCTA'rT GTCACTCAGT GGGAAAGATA GAAGAATATC AGCGACCTAT CCCTrGATTT TTACCTGCTC CCACAACTGG CCAAATT'TT CTAGGCATGG TTATAAAAGA AGTTCTAAGA AATCTTTGTG CAGACCTACT ACAGGGAATG GAGTTGACGC TAAAGAAGTC GGTCAAGATC GATAGGAACC TATCCTTTCT TAAGTCCAAG CTGGGTAGTG TACCCGAGTC AACCGCACCA 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 TAGGGCTTGT TTCCGTGCTT CCCCTTTTAG TGAGTGTCTT TTCTA'rCTrA GCACGCCTTC TGACAGCCTA TTA'rGCACGT GAATGGGGGA AGA'rTTTTCA AATGATGCAG GAACAAGGTT
CACTCACTT
CCTTTATTGG
ATATGA''TC
CCCAGCTCT
TCGCTCAAAC CCTGAAAAAT GGCCGTGAAT TTTCTCAGAC TTAGGAAGGA ATTGAGTCTC ATCATAGAGT ATGGGGAAGT AGTTGGAAAT CTATGCTGAA AAAACTTGGG AAGCC1=~ TGAATTTGGT GcAGccAcTG GTTTTITATcT CGGCAATGCT CATGCCCATG TATCAAAATA ATGACATTCT TGAAAAAAGC TAAGGTTA.AA TTGCTGATTA TCAGCGTGCT TLTTCTTGCTC GCAGTCAATG ACAAAGGAAA AGCAGCTGr TATAGCTTAG AAAAGAATGA AGATGCTAGC ACGGAAGAAC AGGCTAAAGC 'rrATAAAGAA AAAGTCAATG AT'rAAGGCCT TTACCATGCT TATCCTTGCC TTGGGC'rTG'r CCGGCTCTGT GATTTTCTTT ATGGAGTTTG AAGAACTCTA TCAGCAAAAG ACTAGTCTGA ACTTAGATGG GCCAGTCCCT AAAGGAATTC AGGCCCCATC GGGCAATTCG TCCCTGGCTA AGGTTGAATT TT-GTGGCACT GATTATCGTT TTACN'ATG TGGAGGTAA6A TTT~rAAAAT GAAAAAAATG GCTTTTACAT TGG'I'GGAGAT GIGCTGGTC TTTGACCTA ATCTGACCAA GCAAAAAGAA GTTAAGGTGG TGGAAAGCCA GGCAGAACTT
CTAAGAAAGT
TACAATGATA
GGAAAGTCTC
TACAAGCAGA
AAAATGGACG
TTGGTTTTGG
TGGACGCATC
AGCAAATCGT
GACTTGTGAG
CCAGTCCACT TTTTCAGCGG TAGAGGAACA TCGGGAAACC CAAAA6ACGCA GTGTAGCCAG
GCAGACGCTT
AGGCCAAAGT
TCAGACCAGT
ATTATATCTA GGAAATGGAA TTTTACTGGA AGCAGTAGTC GACAAATTCA AAAAAATAGG GGGTAGCTAA GATGGCCCTG TTCAGGTATT TTCTAGTGAA CAATCAAAGA GCCATAAGGT GTCATCAGTG GGGGATTACT CGCTACCAGC AACAAAGCGA GAATTAGACC GTTCGCAG'r GGCA6AGGACA TCGCCATCGG GGTCGAGGTTI ATCAGCCTAT CAACTG4GTTC GCTrrCA'rrr GTGGAAAAAG AAAAAAGTTA CTTTrAGTCTT TTGTTGCAAT TTTGAATAAA GAAAAATTGG AAATTAAACG CATTAAGGAA GCTCTAGCTA TCTTTGCCAG CAAGAGGAAG CAAAAATCTT CAGACGGGCC AAAATCAGGT
AGCAATGGCA
ATTACATTTG
AAAGGAGCGA
ACAAAAAATT
CAT'rGCGACC
GCAAAAW(AA
AAGCATCA6AC
GTCAAAAGTT
ACCGAGCTGG
TrCGCTATCA
AGGGCAGTGA
CTCCTTTTrGG
GAACTCTTGA
GGAGTTGAGA
3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 AAAGGATTrGG AGGTCTACCA TGGTTCAGAA CAGTTGTTGG CAAGGCTTTT ACCTTGTTAG AATCCCTGCT TCCCTCAT'r ccrrTTrrCAA GCTATGA:CTC AGCTCCTCAT TTCAGAA-GTT GCAAA.AGGAG TGGCTCTTGT TTGTGGACCA ACTTGAGGTA CGAAAAAGTA GAAGGCAATC GCCTATACAT GAAGCAAGAT TAAGTCAAAG TCAGATGAT TCCGTAAAAC GAATGCTCGT GGTTTATGGA CTCAAATCTG TACGGATTAC AGAGGACAAT CCAGTTCCAA AAAGGC?1'AG AAAGGGAGTT CATCTA'rCGT
AGGCAGGTGT
TTATTGAA
TTGCTTTTGC
TCTCCTCTAC GCAGTCACCA TAGCAGCCAT CCCACAAGTC GCCCACTATC AAGACTATGC TATGGCTAAA CGAACCAAAG ATAAGTTGA AGGTCAGGTA AGCTATCAAA ACAAGAAAAC GCAAGAAACT GGGGAACAGT TrrTAATCT TGGCTTAGTG ACGAGGGTTC CAAAATCAAA GAAGAGAAAA AGTGGAGAAG AAAAAATCAG 378 GTACGGATAA GAGCCAATAT CGrCTGT T1TCCTTCAGT GAGATAAAAA GGAAGAGGTA GCGACCGATr CAAGCGAAAA AAGAGAAGCC TGAAAAGAAA GAGAATTCAT AGTCAATTCA GAA'rAGTCCA CTGTAG ?rC TAGAAAATTG CTGGAAATGC TG?!1'ATA'rC TTAT~CAGT TTACTATACT TrGGCrAAA TTTAACCACA AAGCAGAAAC TTTCGATTCC CCTAAAAATA TGTCAAGCAG CCGAGAAACA GATTGATCTT CTATCAGACA
ACTATAATGC
ATGTTAAGCT
TTAAAGATAT
GrrGAATCCA
CCAATTCATT
GAAACATGAT
TCTTCCTCGC AAACT'rGGrA AAGAAATTTT AGATTTCGGT GGTGGCACGG GTCTrrAGC
AAGCAGGCTA
TTGAAAGTGG
AGTCACTCAC TCTTGTAGAC AT'rTCTGAGA AGCAGCAAGC AATCAAGAAT AATCCC7TGG AGAAAGAGTT TGATTGCCTT GATTTGGATG CGGCTCTCTC ACTGTTTCAT AT'rGCTGATT TTACCAAGAC AGAAGCTAAT AACAAGCTAA TTGAGCATGG TTTTTCATCT GACCTGTTTC AAGGAAATCA CTCACAATTC TAGTCAGGGA GTGA7TTTTC TATAAGGATG
ATCCAGTTTT
GCTGTTAGTC
CAACATrTTA
CATCATGGAT
CTTrGCCCCTA ACCCCTACCC AAATGTTGGA GCAAGCTCGT TGGAGCAAGA TTACCGAAA GGGT'rCTTCA TCATATGCCT AGGAAGA'rGG GAAACTCATC TTGAITTAGC TGAACTGGAA 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600
GTGCATAGVC AGATTCTCTA TAGTCGAA ?I=NAATAG TAGCCCAAAA ATCACTCGCCC GAAAAAAGAA CGGAAATT GTAAGATAGG AATATGGA-N' TTGAAAAAAT ATCCAAAGTG ATTTGGCGAC CTCGATGGTG AAACTGAGCT GCACTACGCA AAGAAGAATG ACAGAACCCT TGCAGGCCAA TTTATTGTGG AAGAGTTGTT ATGGGAATTC TAGGCGCTAT TGAACAAGC'r TATACCTATT 'rACTAGAGAA TGTCCAAGTC CAACTrAT GACGCC''G TGGAGCAAAA 'rAGCATCTAT AAACCAGGTC AAGGAGAACA ATCAAACCCT TAAGCGTTTA GCTCAAGACC TACCAGTTTC TCTGATGA.A GGCTGGGCAA TCACCAGT'rT ACACCGGATG CTATTGCTTT TAAAGAGGAG CAAATTACTA TCCTCGAAAT T'r'CTTGACC TCGCTTACTA AAAAGGTGGA
GCTTTTGGTG
GGGPTCTGG
TTACTTCGGA
ATGGAAGTGG
CAGGCTGGCT
GTCATCAGTG
GTTGCTTCTA
TACCTCAAGT
CAAAGTGATT
ATGATTTGCT GATT1GATCTG GCAGCTAGCA TTG'rCCAAGG AGATGCCGTTI CGCCCACAAA ACTTGCCTGT CGGCTATTAT CCTGATGATG GCCAAGAACA TACTTACGCC CA'rCACTTGC CAGACGGATA CGCTATT'TTT CTAGCTCCGA T'GTAAAAGA ATGGCTGAAA GAAGAGGCGA TGGCAGATGT AATTGGTrrG TGCTCAAAGA AAGCGATGTG CCGTTGCGTC GCGCCATCAA TCATGGAACA AGGGCTTAAG GTGATTTGTT GACCAGTCCT GTCTGGTTGC TATGATTAGT CTGCCTGAAA ATCTCTTTGC TAATGCCAAA CAATCTAAGA CTATrrTTAT CTACAGAAG 72 7020 379 AAAAATGAAA TAGCAGTAGA GCCTTTCTT TATCCACTTG CTAGCTTGCA AGATGCAAGT GM=AATGA AA'TTrAAAGA AAATTTrCAA AAATGGACTC AAGGTACTGA AATATAAAAT AGATTTTGTT ATAATAGTTG AAAACGCTTA AAAAGGGGTA TCATGTTATG ACAAAAACAA TTGCAATCAA TGCAGGAAGT TCAAGTTTGA AATGGCAATT ATACTTAATG CCAGAAGAAA AAGTATTCGC GAAAGGTTTG A'N'GAACGTA TCGGTTrTGAA AGA'N'CAATT TCAACTGTAA AATTTGACGG CCGTTCTGAA CAACAAATT TGGATATTGA AAATCATATA CAAGCCGT'rA AAATTTATT GCATGAC 'G ArrCGT'rTCG ATA'rrATCAA GGC'TTATGAC GAGA'rTACAG GTGTTGGACA 'rCGTGTTGTT GCTGGTGGAG AATAT'rTCAA AGAATCAACA GTTCT'rGAGG GAGATGTTTT AGAAAAAGTT GAAGAGTTGA GTTTGTTGGC TCCTCTACAC AACCCGGCCA ATGCAGCAGG TGTTCGTGCC TTCAAGGAAT TGTTGCCAGA CATTACCAGT GTAGTTGT'r- TTGATACTTC C'TTCCACACA AGTATGCCAG AGAAAGCTTA TCGCTACCCT CTACCAACAA AATATTACAC AGAAAACAAG GTTCGTAAAT ACGGTGCTCA TGGTACAAGT CACCAGTTTG TAGCAGGAGA AGCTIGCAAAA CTCTTGGGAC GTCCATTAGA AGACTTGAAG TTAATTACCT GTCATATTGG TAACGGAGGC TCAATTACAG CTGTGAAAGC CGGCAAATCT GTAGACACTT CTATGGGGTr CACTCCTCTT GGTGGTA'rrA TGATCGGAAC GCGTACAGGG GATATTGATC CAGCTATCAT TCCTTrATTTA ATGCAATATA CAGAGGATTT TAACACACCA GAAGATATCA GTCGTGTCrT TAACCCTGAA TCAGGTCTTT TGGGAGTTTC TGCTAATTCT AGCGA'rATGC GCGATATAGA AGCAGCTGTA GCAGAAGGGA ATCACGAGGC TAGCTTGGCT TATGAAA'rGT ATGTTGACCG TATCCAAAAA CATATCGGTC AGTACCTTGC AGTGCTAAAT GGAGCAGATG CCAT'rGTTTT CACAGCAGGT GTCGGTGAAA ATGCAGA(AG TTTCCGTCGT GATGTAATCT CAGGCATTTC GTGGTTTCGT TGTGATGTTG ATGATGAAAA GAATGTCTTT GGCG'ITACAG GAGACATCTC AACAGAGGCA GCTAAAATCC GTGTCTTGGT 'rAT'rCCAACA GATGAAGAA'r TAGTCATTGC CCGTGACGTT GAACGCTTGA AAAAATAAGT GAAACTAAAA AAATATTCAA TACAAGGAGT TGGGAAAGTT ATN'TCCAG CTTCT'TrTC 
TGATGAAATT GTCCAAAACC TTGCTATGAT TGGCT'r= GAAAAATATG GTATAATAGT AGTAA'TrAA TAGATGGAGT TGAG'rrTGA AGA"AAACTT TCGTGTAAAA AGAGAGAAAG ATTTTAAGGC GATTTTCAAG GAGGGGACAA GT'rTrGCTAA TCGCAAArTT GTOGTCTACC AATTAGAAAA CCAGAAAAAC CGTTTTCGAG TAGGTCTATC AGTTAGCAAA AAACTGGGGA ATCCCGTCAC TAGAAATCAA ATTAAGCGAC GGATTCGGCA TA2'TATCCAG AATGCAAAAG GGAGTC'TGGT ACAAGATGTC 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 380 GACTTTGTTG TCArTTCTCG AAAAGGAGTC GAAACCTTGG GATACGCAGA GATGGAGAAA 8820 AAI'CTACTCC ATGTATTAAA A'rTATCAAAG ATTTACCCGGG AAGGAAATGG GAGTCAAAAA 8880 GAAACTrAAAG TTGACTAGTT TGCTAGGACT GTCrTGTTA ATCATGACAG CCTGTGCGAC 8940 TAATGGGGTA ACTAGCGATA TTACAGCCGA ATCGGCTGAT TTI'TGGAGTA AAlnrGTTTA 9000 CNCTGG GAAATCAI'TC GCTrTrATC G7NGATATT AG;TATCGGAG TGGGGATTAT 9060 'rCTCTTTACG GTCTTGA'r1C GTACAGTCCT CTTGCCAGTC TTI'CAGGTGC AAATGGTGGC 9120 rT=AGGAAA ATGCAGGAAG CTCAGCCACG CATTAAGGCG CrCGAGAAC A.ATATCCAGG 9180 TCGAGATATG GAAAGCAGAA CCAAACTACA GCAGGAAA'rG CGTAAAGTAT TTAAAGAAAT 9240 GCGTCTCAGA CAGTCAGACT CTCTTTGGCC GA'N--rATT CAGATGCCGG TTAT'rrTGGC 9300 CCTG'N'CCAA GCCCTATCAA GAGTTGACTT TTrAAAGACA GGTCATTTCT TATGGATTAA 9360 CCTTGGTAGI' GTGGATACAA CCCTTGTTCT TCCGATTrMA GCAGCAGTAT TCACC7"tTTT 9420 AAGTACTTrGG TTGTCCAACA AAGCT7"TGTC TGAGCGAAAT GGCGCTACCA CTCCGA'rGAT 9480 GTATGGGAT'r CCAGTCTTGA TT=TATCTT TGCAGTTTAT GCGCCAGGTG GAGTCGCCCT 9540 ATACTGGACA GTGTCTAATG CTTATCAAGT CTTGCAAACC TATTTC'N'GA ATAATCCATT 9600 *.*CAAGATTATC GCAGAGCGCG AGGCCGTAGT ACAGGCACAA AAAGATTTGG AAAATAGAAA 9660 S*AAGAAAAGCC AAGAAAAAGG CTCAGAAAAC GAAATAAATA AGGACGAA'rC TGGTAGTGGT 9720 AGTATTTACA GGTrCAACTG TTGAAGAAGC AATCCAGAAA GGATTGAAAG AATTAGATAT 9780 :*..'CCAAGAATG AAGGCTCATA TCAAAGTCAT TTCTAGGGAG AAAAAAGGC'r TTCTTGGTCT 9840 Ar'rTGGTAAA AA.ACCAGCCC AAGTGGATAT TGAAGCGATT AGTGAAACGA CTGTTGTCAA 9900 AGCAAATCAA CAGGTAGTAA AAGGCGTTCC GAAAAAAATC AATGATTTGA ACGAGCCTGT 9960 GAAGACGGTT AGTGAAGAAA CCGTTGACCT TGGTCAT'GTG 
GTTGATGCTA TTAAAAAAAT 10020 AGAGGAAGAA GGTCAAGGTA TTTCTGATCA AGTCAACGCT GAPLATCT'rAA AACATGAAAG 10080 ACATGCCAGC ACTATCTTAG AAGAAACTGG TCACATTGAG A'rTTAAATG AACTTCAAAr 10140 *CGAGGAAGCG ATGAGGGAAG AAGCAGGCGC TGATGACCTT GAAACTGAGC AAGACCAAGC 1U200 TGAAAG'rCAA GAACTAGAAG ACTTGGGCTT GAAAGTI'GAA ACGAAC~rTG ATATTGAACA 10260 AGTAGCTACG GAAGTAATGG CTTATGTTCA AACGATTATT GATGACATGG ATGTTGAGGC 10320 *TACACTTTCA AATGATTATA ACcrcTGTAG CATCAATCTA CAAATTGACA CCAACGAACC 10380 *.*AGGTCGTATTr A'rCGGCTACC ATGGTAAAGT CTTGAAGGCC rrGCAACTGT 'rGGCTCAAAA 10440 TTATCTTTAC AACCGCTATT CCAGAACCTT CTACGTTACA ATCAATGTCA ATGATT-ATGT 10500 CGAACACCGT GCAGAAGTCT TGCAGACCTA TGCGCAAAAA 7TGGCGACI'C GTG=rTGGA 10560 AGAAGGGCCC AGTCATAAAA TATTATTTCA CGTATGGATG TGT'rGTTGTA GATACAGAAT GAGGTTAAAC 'rGAT7TGAA CGTAATCCTG ATAAAZCGGG CAAGACGCCC GAAAGGTTTT TGGCAACTCC AACAGACTAG TGGGTTTATC TACAGAGAGA GGGACATCTA CTGACrrG GACCAAACAC TCGGCAAGCA TATCTAACCT ACTCTAATGG 381 CAGATCCAAT GTCAAATAGC GAACGCAAGA TTATCCATCG GCG'rGACTAG TTACTCTGAA GGTGATGAGC CAAATCGCTA
AAGTAAAATC
TAAGATAAGA
AGCAGAGCGA
TACAGAACTG
CCAGTGGATG
CGGACAAGTG
AATTTCTTTG
CGCCAAAGT
TCAAAGTCAA
AGvGT 'TATCC TGAT7*rTTTG GACTATAG AC?1rTGCTGG GAGAAGATGC TGGCATTCCG GCCAAAGCCT TTCAAGCAAG AATCAGGCCC AGCGTGAG ACAGAACCTA TGATGGCCTT GAAGTCAGTT TCATCGAACG TTAGACATTC CAACCGTTA.A
CTAGTTAGAG
TTGCAGTAC
CCACAAAGGA
CCATCCAGAA
ACCACA'TTTT
ACGTTTGTAT
TAAGAAGGAT
AGGGATTTAT
ACTTTACGCG AGAAGGTGAG AAGTCAAGAA CCTATGACAG AAAATTCGTC TGAAGAAGAA AAAATTCTTC CCTATTATCT AGCTACGAGA CATACAGTCC AAGAGTGAAC AGTCCGCTGT GCATTATATA ATAAACTTAC AAAAACAATTr AAGTGTTCTA AATTGGTTTT CTAGCAATAT ATTGGTGTTG ATAGGATATG CACTTTTGAA TGTTAAAGCA ACAGTAGGGr ATATGTTGCT CTTTCGTCCA ATCTTAGCAG CTCTTAACTA CCCTTACTTT GGACTTGCTG CAGCAAACAA TGGAACTGCA ACTACAGCTC TATTGATTGG TCGAAAGATT ACGAAGGTAA GAACCCTCTT TGCAACAGTA 'rCTC~rATGG 1-rCTATrCTT AGCAGCGATT GGTATCATCT GTGGACTTTA
CGGTGGGAGG
GTTCGAAAAG
ATCGTAGAAG
AAATAAGATA
GTAATTCTTG
CAAAAGGAGA
?TTGCAGAAT
AAAACCTGCC
CGAATGAAGA AAAGCGTCGT 'rTTTAG'rGAA GGTAGATGTT GCTTATTGAA GTCTTATTCT ATTTGTAAAA CATCATAAAT GTCTTTTTGT TTGCGCTTTC ACAATTATGG AAGTCGTTTC CCCGCATTTT TCGTAGGT'TT CATGACGTTT TTTCAGGGTT 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 TAACGTGGGT GCTGGTGGTT TGGTTACAAC CAAATCtAA ATTGGTG;CAG CGGTTATCGA
CAAAATTGTA
7TrTTGGAATA TATTrACTGGT
AGTACCACAA
CTGGGCAGr'r GCAGAGTTTC CAGATTTTGT AATATCTTGC TCGTACCTCT CACATCATGG TACAACAAGC TTGCGCAATG CN'ACGGTAC AGTTCAAATA TGACTG'rTGA GGTCACCAAC AGCAAqMTGC AAAGA6AGAAA GTTTAGACAA GTTG'rTGCAT CTGCTACCTT CCAGACATTA TGTCTAATAA GGCAACTCAA CGCTTGACTG GTGGTGGCGG ATTTGCGATT AATCTGGTTT GTAGATAAAG TAGCAGCACG CTTTGGTAAG TCTTAAATTA CCTAAGTTCC TCTCAATCTT CCACGATACA GATGCTCGTA T'rCTTCGGAG CCATTCTTT1T AATCTTGGGT AGAAGTCATC ACTrCAG3GAA TATCCAAACA GCCTTTACCT GTTCGTATCT GAGIGACAA
ATTCCCAGCG
AGGATTTACC
AAATCCCATT
GTTGACGTTG
TTTGGT'rTGA CTTAT'rATTA GGTCTACGCT GATAAACGCG TCTCCTTCAA GTTGCTCTAG
TGGCTACCAT
ATACCTTGGT
'rCAATTTGCC
TTAGTATCTA
TGGGTTCATC
GGAAATATCG
AT'rGTT-GGTT
AAAGCAAAAG
GAAAAGGAGA
AATGGT'rATC CAGATTTTAC AGTCAATTCA ACATCGTAAT CGCTTCTCTT TAATTGGGC' TGATAACTTG TACAGTAAAA CGTI'GGAGGG CCTCTTTA-AA TAAAGGAGGC ATCCGACTAG GTTTAGAGGC TTAATTGAAA GTGGGGCAAT GAGTA'rCGGC CTACTATAT GCAGGTGTGC AAAGTGATGC GATGGGAAAG AGGTATCTGT AGTGTAGCCA TTCCACAAAT CAGGCTTGCC AGACTAAAGA TATCTCGAAG GAT'rGGATTT ACCTAATTTA CAAGTTGCAT TTCTGT'rGGT CAGGAAGTAG AAGTGAACTG GCTGAAGTCT AAAATGTGCT GATGCTGGTG 382 CTCTATTCAA 'rCCTGCTAAA CAAGAT1TTCT TTATGTACAT TCTCAGTI'rA CTTGTTCGTT 1'TGATGCAAC GTGT1CCGAAT ACGCCTTCCA AGGTATTTCA AACAAATTG'r TGCCAGGTTC CAGCTTCTTA TGGATTTCGT TCTCCAAATG CTTrGTC 'rrGGCAA'rT GATTACAATT GTT~rGCTCA TCG;TCTTAA CAGGATGT ACCAGTGTC TTTGACAATG CAGCCATTGC GCCGATGGAA AGCCGCTCTT ATCCTTCCT TTATATCAGG GAGCTCTTTG rG'TGGCCCTT CTCGATTTGG CATCTTATGG ACTTTGAATT CCCATGGCTT GGATTTGGAT ATATCTTCAA ATGTACTTGT GTGTCTCI'TC TTGCTTGTTA TTCCTCAACT ATAAAGAGAA ATATTACAAC GGTGAAGTTC AAGAAGAAGC AATAAAATGG TTAAAGTATT AGCAGCGTGC GGAAATGGAA AAGATGAAGG T'rGAAALATGC TCTCCGTAAG CTTAATCAAA TGCAGTGTCG GTGAACCTAA AGGTTAGCA GTAGGATATG CATT'rGATTC AAGAATTGGA AGCGCGAACT AATGGGAAGT ATGGATGATA AAGAAATCAC CCAAAAACTC AGTCAAGCAC GGCTGrGACAG AAACTGAGAG TTATCG~rTC TGTCCTTCTC AGATATGAAT TTAAA.ACAAG CTTTAANTGA CAATGACTCG TAACAATTGG AAAGAAGCAG TCAAGG'rAGC AGTAGATCCC 'rTTGCCAGAG TATTACGATG CTATCATTGA ATCGACTGAA C'rTGATGCCA GGTATGGCTA TGCCCCACGC TAGACCTGAA CrTT'rCATTG ATTACCTTAC AAAATCCTGT TGTATTTTCA TTTGTGGCA CTAGCAGCAA CAAGTTCAAA AATTCACACA TA'rrGCCCTA TTTGAATTAG AAGATITCTAT TGCACGTTTA AGATGTCTTG GCTATGATTG AAGAATCTAA GGATAGCCCT GGAAAGTTAG AAACAGGAAT AAAGAAATGA CAAAAAGAAT TAGACCATTC AGACTTGCAA CGAGCGATTA AAGCAGC'TGT ATAT'rATCGA AGCTGGAACT GTTTGC~rGC TTCAAGT'rGG TGCGTAGCCT TTTCCCAGAT AAGATTATTG TGGCAGACAC GAACAGTTGC TAAAAATAAT CCGGTTCGTG GAGCAGACTG 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 383 TACTATGGAA GCAZC=CAA AGGCTATCAA 0* GATGACTTGT ATCTGTrGTIG CAACCATCCC GACTGAACGA 
GGAGAACGAG GCGAAATCCA ACAAGCTCAG CTGGCTAG ATGCAGGTAT TGCTCrCTr GCTGGTGAAA CTTGGGGTGA TGACATGGGC TTCCGTGTAT CTGTAACAGG TGAAGGTA'1r GATGTCTTTA CCTTTATCGC AGCAGGAGCA GCGCGTGCCT TCAAGGATGA ACG'rCCAATT GGAATTTA'rG AAAAGGCAAC AAATTTTGCC AAGGAGTTAG CCTTTGATTT GCGT'rTAGCA AGACTTGACT GGACTAAGGA 'rGAAACTG.GT GTrcGTAT'rc cTTcTATCTG TTCAAAAGAT CCAGT'rCTAG AGGAAAAATC AGCTCAAGAC T'TGGGAGTTC GTACGATTCA AAAGTCACCC CAGACACGCC AACGTTTTAT TGAAGAAGCT CAGGTGGTAC TTGCTATTGA CGAAAAATAT rrGGCTATAG AAAAAGAGAT TATTGGTAAT GTGTCTGCAT GGCATAATGA TGCCATCGCA GCTCTCCATC TCAAGGATAC GT'rCCGAGAT GTACC~rrCG GGCAAGGTTG AAAGGAAACC AATTATAATG GACCTTTCCT AGTAGAAGAA ACACGCGCAG CCATTCAAGA GAAAGCAGGT rrGATGTAAG ATGAATCAAG ATGCCAATCA ATCATTGCCA AAACATGGAC AAGTTAATCG CGAACTCGGT GTCATrGTTA TGACACCTGA AAACATGGTA GTGACTGA'rC GACCATCTTC CGACCTCCCA ACTCATGTGC GTGTGG~rCA CACCCATTCG ACAGAAGCTG CTTTCTACGG AACAACCCAT GCAGAT'rATT TGACCAAGGA CGAAGTAGAA GTGGCCTATG
GATCGAGCTT
CTCACAAGCT
AAAAGACCTT
TGGTCTAGAT
AGGTCGTGGA
AATCAAACGA
CCCAACACAC
TGTCGAGATG
TATGGCGAT
AT?1'ATCACC
AATAAGGTTA
GTAGATACTC
ATTACAGAGG
AT'rTGGGGGT
TGTACTTGC
TCTAT'rGACG AGAACGCTTG GAAGTTGTCA AAGCAATCTA TTTTTCAGGC CATCGTCGCT ACCCAT'rGGG TCTAGAACTC ATGAAAAAAT GTATCGAAT ArTAGCTGGT TACGATGTTT ACTATGAGGA CAAAAATTTG AGAAAAGCCT GTGACTGGGC AATT-ATGGAT GATCCTTTCA TCAGTAGCAT TGACTCTCCC TTCCTCTTTG TATATCCAGA 'rATCTATACT GAGTTTTATC TTGGTCATCA TTATGCAGTG ACAGAAAGTT CAAAGGGCCA TGTCAAATGG GA.AGAAGCCT TCGATATr= AATCGAAATG TGGTCTGAAA ATTGTGAAAC GGCGCAAGCT TTTCTrCTATC CACT1CATTAA TAATCAATGC TATGCGTAAA CGAGTCTGTG
GCACTTTTGA
AATCTCGTCA
AAAAACTCAT
TCAAACTCTT
CTGrGCCc
AAATCATGGT
TAGAACGTT'r
AACGTGACGA
14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840
TTGTCAAATT
TCAAACCATC
TAGATGGTAA
AATTATATAA
rTTGTTGGGC
TCTACGGTTIC
AAAAAGATAC
TACCTGGGGG AATGTATCTG AGGCCTGGAT TA'rGACGAAT GATCCTACAA GGGGATTTAA GACTTGGTCA GAAATTGGTA TCAGG;CAGGT CGTGATATTC AATCCCTTGC GCCCGTAGTr TGGCCTGGTT ATCGTAGAAG 384 AGTTTGAACA 'rCGCGGAC1'T AACCCGGTTG AAGTACCAGG AATTGTTGTA CGCAATCACG GTCCATTCAC CTGGGGCAAA AATCCAGAGA ATGCTGTTTA TCACTCTGTC GTACTAGACG AAGTATCAAA GA'rGAATCGC TTTACAGAAC AAATCAATCC AAGAGMWGA CCTCCTCCCC AGTACATACI' ACAAAAACAC TACCAACGTA AACATGGACC AAATGCTAT 'rATGGTCAAA AGTAAGAACG ATGAACGAGG AGAAAAAGAT AAATTTAGCT CCTCTTII-A Ci'r-AT TTAT'rGAGAG TAAAGTTGGA GTTGAAGTAA TrTAAAAGA =rTTAGAA ATAGCGCTTC ATATATATAT GGTAAAATAA AAACAATTGC TGTGATATCA ATAGAT'rTGG GGGATTTTTT AATATGGTAC TGGATAACGC AAGTTGTGAT TTGCTTCAAT ATTTGATGGA TCAAGAAACG TCCAAAACGA TTATGGCGAT TTCGAAAGAT TTGAAAGAGT CAAGAAGGAA AATTTATTAT CACATTGACA AAATCAATGC TGCTCTGGGT GACGAGGCGC TTCACATCAT TAGTATrCCA CGAATTGGTA ITrCACTTAAC GGAAGAGCAG AGAGATGCTT GTTGTAAACT ATTATCCGAA GTAGATTCGT ACGA'N'ATAT CATGAGTGCG CATGAACGTA TGATGATAAT GTTACTATGG ATAGGTATTT CTAAAGAACG TATTACGATT GAAAAATTGA TAGAGTTAAC AGAGGTA'rCT AGGAATACTG TTCTCAATGA TTTGAATAGT ATTCGTTATC AACTAACT'rT GCAACAATAT a 9* *S
CAGGTGATCT TGCAAGTGAG AAAATTCAGT ATCTTCAATC GTATCTAT'rr TAGAAGATAA GAAATGAACC AATTTTTrAA ATAAACCATC ATGAAATAAC CATAATGTTG AACAGTATCA AGAAAAAGAA TAGAGTATCA GAAATTTrCTT TGTCAGGACT CAAGTCACAG GGATACAACC TTCATGCCCA CCCTCTTAAT GCTTCrATAT CATATTT'A TGGAAGAAAA TGCCACTT GATGAAAGAG AGGTTAGATC ATGAGTG=? GCTTTCTGTT GGAACAGGTT CCTTTAGTTG AACAAGATT-T AGGGAAGAAA TTTTATGTTG CAGGTTCTAC CTTATTTGCT GTTAAGCTGT AGAAAGACAT CAGGATATAG AGAAAGAATT TTCTTTGATA 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 17340 17400 17460 17520 17580 17640 GGTGTCTAAG; AAATTA6GAG TGAAGTTTCT CTTGTAGCTG AAAGATTTGG ATATTCATGC AGAAAGTGAT GA7=TCGGC GAATTTATCT GGTATTTTGA ATCACAAATC CGAATGGAGA TTACGAAATT TGATGATCCA CTGTAAAGCC TTGTTATTTA TCTAAAAATC CTCTAACAAA ACAAATTCGA TCCAAGTATG AGAAAATCTG CGGAAATTTT AGAAGGAGCA TGGTT'rATTC GCCTATTTGA CGA'N'CATAT TGGAGGATTr TTAAAATATA AACGGTTGTT TCAAAAGT TTCTCCTCCT CTCCTATCGT AATTAAAACT TGCTTTAGAA TTGAGAACAA GGATGATTTG GAAAGACTTA CGGTA'TTT GAGAATTATT TNTAGTCACT GGCTAACAGA CGATGATATT CACCATCATC TCAAAAAAAT
ATGAAAAAAG 'NTTATCTCGT TTGTGATGAA GG'rGTTGCGG TTTCGAGACT TTTGCTGAAA CAATGCAAAC TTTATTTTCC AAATGAGCAA ATTGACACTG TATTTACAAC AGAACAATTTr
AAGAGTGTGG
AGCAGATTTC
CTAGACTATC
TCTAGrCTA
GTTCAAACAC
ACAGTCCAAT
TGGkATACTA
AAAGAAATTA
AAGATATTGC
CGATTTTAAG
TTAAACACAA
TTTCGTCTTA
TTATAAATCA
GATGAACACA
ATCTCAAAGA
CAAGAGAGTC
ACAAGTTGAT GTAGTGATTA CTACTAATGA TGAT'rTGGAT GGTrAATCCT ATCCTTGAAG CAGAAGATAT TTTGAAAATG TATATTTCGT AATAAGAGCA AAAGTrTCAG TGAAAATCT TATTGTAGAC ACAAGTTGG CTAGTAAGTT CCAAGAAGAG AGAAATAGTA GTTCAAGCTT TTTGGA.AGr TATTGAAGG AACCTGTGTk TTTCsTGGTC TTTTtTAGTG; TTTTGAAGGG TAACAATTAT ATCCAAAGGA GGCAACATAT GCCAAACGTC ATGGATTTTA GCCACTTTCC CAGAGTGGG AACATGGTI'G AGTCGTACCI' GAAGGCAACT T'rGCCATGTG GTGGCTAGGC GACACCAGCT GGTGCTAACG TTGTCATGGA CCTTTGGTCA AAAAGTGAAA GATATGGTTC GTGGGCACCA AATGGCAAAT GCAACCAAAC TTGCGTGTTC AGCCAATGGT TATCGATCCA CTAT'rACTTA GTTTCACACT TCCACAGTGA TCATATCGAC TCTCAATAAT CCTAAGTTAG AGCATGTTAA GTTGG 17700 17760 17820 17880 17940 18000 18060 18120 18180 18240 18300 18360 18420 18475 AACGAAGAAA TCGAAGAAGA AACTGTGGTA CTTGGATTAA AACCGTGGAA AATCAACCAA ATGGCAGGTG TTCGTAAGCT TTTGCTATCA ACGAACTAGA CCATACACAG CTGCAGCAAT INFORMATION FOR SEQ ID NO: 39: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 7186 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: CCAGGATTTG GTACCGTTGC AAGTGGTGTG ATCAATCAAT CAGCACATTC AGATATCAA GAAAAAAATC GCT-rGCTTGC AGCAGGGAAT ATTTTATCAG ACCAGGATAT TACTATCGTA AAAACCTTTA TCACTCGTGC CTTGGAAGCT CTTTTAGCTG TCCATGGCGC AGAATTGCTA TACTACGAAG CAGCAGTTGC TGGTGGGATT
CCTTTCCTCC
GTTGCTAAGG
GACTTTAACT
GTGGAATTGA
GGAAAACACG
GAAATCGCTC
CCAATTCTTC
TAAAGGAAAA TGGAGGAAAA TATTGGTCAA GGATGAAGAT TTGTAA.CCAA TGTGGATGAT TGGGGCGTAT TGAGCCTGCT TTGTTACTGC TAACAAGGAC AAGCTAACAA GGTACCACTr GTACTTTAGC AAATTCCTTG GAACTTCCAA C'PTCATGGTG TTGCGGAAGC ACAACGTCTA GCTTCTGATA AAATTACGCG CGTGCTTGGA GTAGTCAACG ACCAACATGG TGGAAGAAGG CTGGTCTTAC GATGATGCTC 386 GGATTTGCAG AAAGCGATCC GACGAATGAC G1'AGATGGGA TTGATGCAGC CTACAAGATG GTTATTTTGA GCCAATTTGC CTITGGCATG AAGATTGCCT TTGATGAX'GT AGCCCACAAG GGAATCCGCA ATATCACACC AGAAGACGTA GCTGTAGCTC AAGAG3CTTGG TTACGTAGTG AAATTG.GTTG GTTCTATTGA GGAAACTTCT TCAGGTATTG CTIGCAGAACT GACTCCAACC TTCCTACCTA AAGCGCACCC ACTTGCrAGT GTGAATGGCGT7CTGAACGC GAATCTATCG GTAITTGGTGA GTCTATGTAC TACGGACCAG GTGCGGGTCA
GCAACAAGTG
GGCAAAGACT
TTGTAGCTGA
TCAACGAATA
TATTGTCCGT ATCGTTCGTC G71TTGAATCA TAGCCGTGAC TTGGTCTTGG CAAATCCTGA
TGTCTTTGTA
AAAACCAACT
TGGTACTATT
AGATGTCAAA
GCAAACTACT A'rTTCTCAAT GAAATCTTCA ATGCTCAAGA GACAAGGCGC GTGTCGTTA'r TCAGCTCAAT TGAAGAAGGT GAATAAGATG AAGATTATTG CTTGG~CTA GACTCAAAAG GTCAGGTCTT GAAGTTGGCT TATTTCCTTT AAGCAAATCC TTCAAGATGG CAAAGAGGGT CATCACACAC AAGATTAATA AAGCCCAGCT TGAAAATG'rC TTCAGAATTC GACCTCTTGA ATACCTTCAA GGTGCTAGGA TACCTGCAAC CAGTGCCAAT ATCGGGCCAG GTTTrGACTC GGTCGGTGTA GCTGTAACCA AGTATCTTCA AA'N'GAGGTC GCTGA'IT1GAA CACCAGATTG GCAAATGGAT TCCACATGAC AATCGCT'rTG CAAATTGTAC CAGACTTGCA ACCAAGACGC CCCTTfTGGCG CG;CGGTTTGG GTTcTTcC-Ac cTcGGT'rATc TGCGAAGAAC GAGATGAGTG GAGCGTAATC TCTTGCTCAA TTGAAAATGA CCAGTGATGT GTTGCTGGGA TTGAACTAGC 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 '1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 CAACCAACTG GGTCAACTCA ACTTATCAGA CCATGAAAAA TTGCAGTTAG CGACCAAGAT TGAAGGGCAT CCTGACAATG TGGCTCCACC CATTTATGGT AATCTCGTTA TTGCAAGTTC TGTTGAAGGG CAAGTCTCTG CTA'rCCTAGC AGACTr'rCCA GAGTGTGATT TTCTAGCT'rA CATTCCAAAC TATGAATTAC GTACTCGCGA CAGCCGTAGT GTCT'rGCCTA AAAAAI'rGTC TTATAAGGAA GCTGTTGCTG CAA CTA'r CGCCAA'rGTA GCGGTTGCTG CCTTGTTGGC AGGAGACATG GTGACCGCTG GGCAAGCAAT CGAGGGAGAC CTCTTCCATG AGCGCTATCG TCAGGACTTG GTAAGAGAAT TGCGATGAT TAAGCAAGTG ACCAAACAAA ATGGGGCCTA TGCAACCTAC CTTTCTGGTG CTGGCCCGAC AGTTATGGTT CTGGCTTCTC ATGACAAGAT GCCAACAATT AAGGCAGAAT TGGAAAAGCA ACCTTTCAAA CGAAAACTGC ATGACTTGAG AGTrGATACC CAAGGTGTCC GTGTAGAAGC AAAATAAAGA ATAGAAGATA GGATGGGGAA ACTCTTGACC AGAGGGGTTC ATATCCTTTT TGTGAAAAGA AGTTTATACT CAATGAAAAT CAAAGAGCAA ACTAGGAAGC TAGCCGCAGG CTGCTCAAAA CAGTG~rrG AGGTTGCAGA TAGAACTGAC GAAGTCAGCT CAAGACACTG 7TTTGAGGTT GCAGATAGAA CTGACGAAGT CAGTAACCAT ACTACGGI'AA GGTGACGCTG ACGTGGTTTG AAGAGAT?11r CGAAGAGTAT TAGTAAAAA CGTGATAAAG GAGAAATAAA GATGGCAGAA ArrATCTAG CAGGTGGI-rG 71"rGWGC CTAGAGGAAT ATTMCACG CAT 'TCTGGA GTGCTAGAAA CCACGGG CTACGCTAAT GGTCAAGTCG AAACGACCAA ?I'ACCAG?1'G CTCAAGGAAA CAGACCATGC AGAAACGGTC CAAGTGATTT ACGATGAGAA GGAAGTGTCA C'TCAGAGAGA rMrAC?=A TTATTTCCGA GTTATCGATC CTCTATCTAT CAATCAACAA GGGAATGACC GTGGTCGCCA 
ATATCGAACT GGGA'TrATT ATCAGGAPGA AGCAGATTTG CCAGCTATCT ACACAGTGCT GCAGGAGCAC GAACGCATGC TGGGTCGAAA GATTGCAG 'A GAAGTGGAGC AA'ITACGCCA CTACA7TTCTC GCTGAAGACT ACCACCAAGA CTATCTCAGG AAGAATCCTT CAGGTTACTG TCATATCGAT GTGACCGATG CTGATAAGCC ATTGATTGAT GCAGCAAACT APGAAAAGCC TAGTCAAGAG GTGTTAAGG CCAGTCTA'rC TGAAGAGTCT TATCGTGTCA CACAAGAAGC TGCTACAGAG GCTCCATTTA CCAATGCCTA TGACCAAACC TTTGAAGAGG GGATTTATGT AGATATTACG ACAGGTGAGC TTGGCCAAGT TTTAGCCGTC CCATGGAATG GAGCGAATTG TTTCACAGAT GGACCGCGGG ACGCTTTGTG GCCAAGGATG AAACAAATAA AACAGAGAGT GGGATTTATG AAACACCTAT CCCCTTGTTC AAGCTGTTAG GATTGTTGAC CAATCTTTAC GCTCCTTATC TTTGCAGTAA AAAGGCAGCA GTAGGTTCTG CTTGCCCAAG GACAGCAGAG GGATACCTAC CAGATTCAGA CATTATCGTT TTTGGTGCCA GTTCTTAGTC TTGGTTGCCA TCCTrTCTAC ACTAGTCTCA ATTGCAAGGG ATGCGGG'rTA CACTCTT'TT TGCCAAGGAT AAGTTTGCTT CAGGTTGTGG CGATTTCCAA AGAGTTGATT CATTATTACA AGGATCTGAG
AAGTTCGTTC
AGTTAGGCG
AGATGGAAAA
GGGGCT'rCCC
TATCTTACTT
AAGCTGTTTT
CTCAGG4GAGA 'I-rGGCGTrT
CTAAGGAATT
ACCGTCTGAC
CTGIGTATCAA
TTTTTATGGC
T'rTTGACCAT
GAAAGAAAAC
TTCG'rGCT'1 TCGTTCAGGC AGTGCTCACT TGGGTCATGT CCTCCGTTAC TGTATCAATT CTGCTrCTTT ACCAGGATAT GGCTATCTAT TGCCTTACTT ACTTTCTTCA TTTCTAGAAT ATGALATAGAA CAAACCCTAC ATCAAGGAAT CAATTTTAGC TGAGCTCTTG GTTCCCATGG TGATTGCTGG 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600.
3660 3720 3780 3840 3900 3960 4020 4080
TCAAGGTCAT
AGTGGCCTTG
GACAAACGAT
AACTTCTAGT
TCAATTCCTG
T'rATCGAATC
TGTCATTGTA
CTCTGGATGC AGATTGGCCT ATAGCTCAAT TTTACTCAC CTTTATCGTC ATATTCTTTC TTGGTCACTC GCTTGACTTC CGTCTCTTTT TACGAGCGCC TCAGCTGAGT TGAC7TTCTG GGGTTA'rCTC GATTGGTCAA GGACCAACTG GTTCAGGAAA CGCGCCAGCA TGGTCAAGAA AAACGAGAGT TACAGATTTT TCAAACCCTT AACCAATN ATGCTAGATT ATTAACACCT CTGACCTATC TGATrrGTCAA C 'ATATTTCA ATTCAAGGAG GAGTGCTCAG CCTCTTACAG AlrTrGGTGG AAT'rGGTCAA GTrCCTATATC TCAGTCAAGC GAATCGAGGA TTCAGAGTTA CAACAAAAGC AAGCTACCAG CTTACCTA'r CCTGATGCGG CCCAGCCTTC AGGACAAAT CTAGGTATCA 'rCGGGGGAAC CTTACTTGGA CTTTATCCAG TAGACAAGGG TCCTC?1'AAT T'rCGAGCAGT GGCGGTCTTG LCirAA.AGGA ACCATTCGTT CCAACTTGAC GGAACTCTGG CAGGCCrT AGATTGCGCA ACTCTTGGAT GCTCTAGTTG AGGCAGGGGG ATTG'rCTA'rC GCCCGAGCAG TCTTGCGCCA CTCGGCACTG GATACCATTA CAGAGTCCAA 388
ACAAGAAAAG
TGGAACrT?
TCAGGTGCI'
GCrAGCCATG AGTCTI'rGT
AGATAAGGTT
'rCTGAGATAC TGGrrCTG40T
GAACATTGAC
ACAGG CT CGTCTAGTTT CTCG7'rA'rTA TCTGGCAAGG CTCATTGCTC T1TATCAATTA 7TGATCAArr CCCTCAACCA GAGGCTCCAG AGGATATCCA TTACAAGTCC AAGAATTGAC ATTTCCTTTrG ATATGACTCA AAATCAAGCT TGGTGCAACT CTTTATCAAA ATGGACGTAG GATTGCCTAT GTACCTCAAA AGGTCGAACT TCTAGGTTTrC AATCAAGAAG TATCTGACCA
AGCTAAGGAT
GCGAAA'TTC
CGCTCCGTTT
GCTCTTGAAA
AAACACGAGC TTAATTTTGA TCTCTCAACG AACCTCAACT TCTCCTCTTG CAAAAAGGTG AGTTGCTAGC 'rGrrGGCAAG CAGCCAAGTC TATTGTGAAA TCAATGCATC CCAACATGGA GACAAACTGT AAACCAGACG CTCAAACGTT TAGCCGTAGA TCCTTTTCCT AGCCTT'TCTA GGAACTATTG CCCAAGTTGG TTCTGATTGG GCAGGTCAT GACCAAGTCC TAGTGGCTGG AGA7'rNT'CT CCAGATGCrC T'rGGTGGTAA TAGGAAATAC CTCTCCTCTA TAATCGTCTA ATCTTCTCTT ATACCAGAGA ATAAGCTCCA TCGTTTACCG ATTGCCTTG TAGATAGGCA GTCGTGTAAC CACGGACATC GAACAGTTGG CAGCTGGCTT TTTTCATTGG TGTrTGATG A7NrTGGTCA GTATTCTAGC TCATGACTCT CTTAG'rCTTG CT3TTGACGC CACTGTCCAT CCAAGAAATC CrATCATCTC TTCCAGAAGC AAACAGAGAC TGATTGAAGA ATCGCTTAGT CACCAGACTA TAATCCAGTC TTATCCAAAG ATTGCGTGAG GCTCATGACA ACTACTCAGG TTTGTCAGTG AAAAGGAAG TCAGGTGGAC AAAAACAAAG CTCATCCTAG ATGATGCAAC GCTA-rAGAG AAAATTTrCC TTACAGATGG CGGACCAGAT CACGATGACT TGATGAAATC AAGGAGGACT AGAATGAAAC r'rTAG4CAAGC CATCCTTTCC CT'rATCAATT TACCTACCTA 'rTCATCACCA GT'T7rTGGC TCTGGTACAA TGGCCAATC TTTACGGGAG CGAATCATCC AGGTAGTGGA GAGATGGTTA GACCATGATT TTTAACCAAT CATGCTCCAA ATTCATCTCC GGTGATTTCA CGCTTTATTG GAGGGGAATT CAGACTCAGT CTTCAATGCT CAAACAGAAT CTATTCTCAG TCAGCCATCT 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 TTTATrCTTC A.ACGGTCAAT CCTTCGACTC GCTG'rAA.A TGCACTCATT TATGCCCTTT 5940 TAGCTGGAGT AGGAGCTTAT CGTATCATGA TGGGTrCAGC CTTGACCGTC GGTCGTTTAG 6000 TGACrITI'TT GAACTATGTT CAGCAATACA CCAAGCCCT T TAACGATATr TCTrCAGTCC 6o060 TAGCTGAGTr GCAAAGTGCr CTGGC? 
TGCG TAGAGCGTA'r CrATGGAGTC ?1'AGATAGCC 6120 CTGAAGTGGC TGAAACAGGT AAGGAACTCT TGACGACCAG TGACCAAGTr AAGGGAGCTA 6180 TTTCCTTAA ACATGTCTCT TTTGGCTACC ATCC'rGAAAA AAT TTTGATT AAGGACTrGT 6240 CTATCGATAT TCCAGCTGGT AGTAAGGTAG CCATC GTTGG TCCGACAGGT GCTGGAA.AAT 5300 CAACTCTTAT CAATCTCCTr ATGCGrTI' ATCCCAT'rAG CTCGGGAGAT ATCTTGCTGG 6360 ATGGGCA.ATC CAT-rrATGAT TATACACGAG TATCATTGAG ACAGCAGTTT GGTATGGrGC 5420 TrCAAGAAAC CTGGCrCACA CAAGGGACCA TTCATGATAA TATTrGCCTTr GGCAATCCTG 6480 AAGCCAGTCG AGAGCAAGTA ATTGCTGCTG CCAAAGCAGC TAATGCAGAC TrTTTCATCC 6540 AACAGTTGCC ACAflGGATAC GATACCAAGT TGGAAAATGC TGGAGAATCT CTCTCTGTCG 6600 GCCAAGCTCA GCTCTTGACC ATAGCCCGAG TCTT'rCrGGC TATTCCAAAG ATTCTTATCT 6660 -TAGACGAGGC AACTTCTTCC ATTGATACAC GGACAGAAGT GCTGGTACAG GATGCCTTTG 6720 *CAAAACTCA'r GAAGGGCCGC ACAAGTTTCA TCATTGCTCA CCGTTr'GTCA ACCATTCAGG 6780 ATGCGGATTT AATTCTFGTC TTAGTAGATG GTGATATTGT TGAATATGGT AACCATCAAG 6840 oAACTCATGGA TAGAAACGGT AAGTATTACC AAATGCAAAA AGCTGCGGCT 'rTTAGTTCTG 6900 AATAAGCCAT TCTC7T'GA AAGTTTA'rGG ACGAAAAAAG TTGCCT TCGA GTGACTTT-rT 6960 *TGTTACAATA GCTAGAAAAA TTGTTCACrG TAATACTCAA TGAAAATCAA AGAGCAAACT 7020 AGCAAGCTAG CCGTAGGTTG CTCAAAGCAC AGCT'rTGAGG TTGTAGATAA GACTGACGAA 7080 GTCAGTTCAA AACACTG'rTT TGAGGTTGCA GATAGAACTG ACGAAGTCAG CTCAAAACAC 7140 TGTT'TTGAGG T-rGCAGATAG AACTGACGAA GTCAGCTCAA AACAGG 7186 INFORMATION FOR SEQ ID NO: SEQUENCE CHARACTERISTICS: LENGTH: 14273 base pairs TYPE: nucleic acid os.. 
sTRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: CTGAAAATTC TAAAAAATPT ATAAGTAAGG AATTAATTAG rTT' GT GATAAAGTTT ATGATGAAAT ATTTGTTGAA GAGGTAGTTC CGCACCTTTT TCTGCCATAT GAATCTGACT 120 TACrrTAT TTTACCAGCT ACCGCAAATG TGATTGGCAA AATGCTAAT CGTATTGCTG 180 ATGATTTAGTr TACAGCAACT GTTAAACT 'rTAA'rAAAAA AATAAr"r'r TGTCCCAATA 240 TGAACrCTAC TATGTGGGAC AATCACATAG TTCAAAGAAA TGTATCAATTr CTAAAGGAGT 300 TGGGACATAT ATr=ATTT GAGTCTAAAA AAACATATGA GGTAGGATTG CGTAAAGCAA 360 TAGA'rrCAAC ATG'rTCAATG TTACAACCAC AGTCGTTAGT AAAAGAACTT ATCAAATTAG 420 AAAATArrGT1 CCN'GAAGAG GGACA'rTAAA AACTACTGAG AATATTAATC AGGGGAAAAA 480 ATGGAAAATT CATCAATCGA TGTAG.ATATG CTGTTGGAAG AATTGACACA AGAAGCAATG 540 GTCCTGTTG CTGTTGATAA GGACTGTTAA 'TwAAACTTA TCGCAATATA TGAAAGGTTA 600 CTGGA'rGTTT TAAATTATGC AGGCACTAGC CT'r'TATITAT ATACAAATGG ATAAAGTAAG 660 GATAATACAA TGATTAATAA AAAAATACAA CAAGTTGTTT TGGAATCA'N' ACAGAATTTT 720 TTGAA'rGGGA ACTTCATT'rC GCCTTGTGTA GTCTATGATT TTGGCT'rGCT GGAAACrGTA 780 CTTGATGAAT TTAAAAATCA AATTCCTGTA ACATTCAAT'r ACCAAC7"r' TTATGCCGTT 840 AAAGCAAATT CAAATGAGAA GATACTTGAA TTCTTAGTAG ATAAAATTGA TGGAGTTGAT 900 *..GTGGCGTCAT TATCTGAATr AGATGTGGCT AAAAAAT'rrr TCCCACCAAC TCAAAT'rTCT 960 *..GTTAATGGTC CCGCAT'rrTC 'rVATGAAACTr TTATATAATC TGATTAAAAA ACAATATAAA 1020 ***GTTGAPTATTA ACTr.-"TGGA ACATCTTCAA CAATTTCCC CAAAAGAATC TGTTGGAATA 1080 AGAGTAACGG AGCCAGATGA ACTTAATAAT CGTATGAGTC GATTTGGA.AT AAATATTrTGC 1140 AGTGATAATT GGACTAGTAA T'rTACAAAAT CCTTTAAT1TA CACGACTGCA TTTTCA=N1 1200 *GGAGAAAAAG ATGA'rAAATT TAT'rCTTAAG TT'AGATAAAA TATTATTTAA GTTACAAGAA 1260 ATTAATAAAC 'rTAGAGAGGT TAGAGAAATA AATCTTGGAG GCGGTTTTAT GAAATTATTT 1320 ATGGAA.AATC GTTT'GAAAGA ATTTTTTCTA TCACTTATGG AAATCTATAA AAAGTACGAT 1380 ATTGATAGTA CTGTGACTAC AATAATAGAA CCAGGTAGTG CAATTACTTC ATTTTCTGCC 1440 *TATATGATTA CTAGCCCAG'r TAATGTTAGT GAGGTGAATG AGCAGCAGGT TATCACGTrA 1500 GACACATCAA TATACACCAA TACATTATGG TTTGTTCCGC ATATTATTAC AACGTTAAAT 1560 TCAAGTAGTA AAGAGCGTTA TAGTACTATT CTCTATGGTA ATACCTGTTA 
TGAACATGAC 1620 AAGTATAAAA ?CAAAGTTTC GCTTCCAAGG TTAACTCAAA ATAGCAGTAT AGTCTTTTTT 1680 *CCTGTAGGAG CTTATATAAA AAGCAATCAT TCAAATTTAC ATCGTAATGA TN'?ATGCGG 1740 GAGGTATATT TGTGGACAAA AAACTTGACA TATTAGATAA AGTTAAGGAA TATTTAGGAA 1800 ATAAAACTAC TCAAATTCTG GATAATCAAT ATAAAGAATT TTTGAAACTT AATGATATAA 1860 391 GGCGAGCGTT TGGTA~rrCA GAAAAAGTAT TAAACAATTC L I LAATI-1- AATTTAATGA .TTTAATAAT AACGAAAATT1 ATN'ATTCGA ATATGCATGT ACCAATGGAG AAAAAAATGC TTTAATCATT CTTATCG1-1r TCTATr.CTC-A CAGATGA?1r TCTTAACACG AAGACATTGA GAAGTAGCCA AATTGAATAT GATATTTATC GAAAAGTTCG ATAGGCGATA GAGCGGTTGA TGGC7TTTG= CTTTAACAGC TAATGGTATG TCTGCTATTA AACTATGTCT TGAGATATTA
ACGACTAAAG
AGAAI-rAGAG
CCTATAATTA
AAATATGAC
TCCTTCAATA
AACTrCTATTT TC7'rCAAGAA
TAAATAATCT
AAGATAAATT
GAAGATTGAT
TGCTAAATCA
TTATAATGTA
AAACTGATrr CAAGATAGTA TTATIAGA TATTTCATAT TTAAAT~lrlGC GAATGTAATT GAT'rAGAA?1' GACAAATGGG TGAAAAATTT TATTGAAGAG GCCTCTATGA ATACTGTTTG CTGATTTAGT TATGAAATTT AAAATTCTGA TATTGAAATC TAG4GTGAAGG TAAAA.AAGAA AGCTTGGAAT CACATTGTAT ATTTTGGAAT TATTGGGACA
TTATTATATT
GGTAT'rAGTT 'rTCATGA'rGG
GAATATTTTG
CAAGGTTCTA
ATT7TTGTGG
GGAATAATAG
GAA'rTCAATA
CTTGATAATT
ACGAGTAATT
ATCCATGAAG
TATGAAATGT
GCI'AGAAATA
GAAAGATATA
AACCCAATCG
TTAAGTATAA
ATT'rTAAArr
TACGATCTTT
AAGTGT -rAT
AATTTAGAAA
CT'rrAACT T'rTATGCTGA
AGGGAGTACC
7=TTCAATG
GTTTTGGGTT
TATTTAAGAT
TAAAATCTT
GATAAAGTTG
AGCATTGTCG
CAACCGGATA TTATGAAACA AGATTTTTAT GCTATGAGGT AAGTAA'N'GT GAATTCGATA ACCCATTTA ACATTACAAA AAATAATTCA ATA.AAAGTCG AGTAGAATTTrrAGACAAAT GATAAAATTA GATCAAATCG TCCTAATCAT TTGAGAAAGT ?rCTCACGGA GCTAATCTAA AAAAAATGAT TGGAACTATT TATAA.AAGAC TTG.TTCATGG TTGTAT'rT TrAGATTTAA GTTAAACTTC T=r'ACAAAC TCGGAATCTA ACAGTAGAGT TTrGTCCAGGT GTTTA'rAAAG TTCAAATGAA TATTI'AAAAA CTAATAATTA GATTGATTCT CACGTTAGCA ATAATT'ATTC CTACCGGATT TGTTACTAAT ATTCTTATAA TATCAAT'rr AACCAAATAT CATrTTGGGT ATAAGTTACG TGATAGAAGA T'rTGCAAATT CTC'N'TTTAG GCATCATTTT TACAGGTGGC 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 GGTTAAGTTA TTATTTGATG AAAT7NrTAT CTACTGATGA GGTTAATAGA 'rGAAAAATTT TAACTTAGCA GACAGTGTAT TTTATATAGT TTCGACCATG 'rTCTTAGGAA TATTTATTGC AGTAAATTAT CTTTTGGA CCAGTTATTG ACAGAGTAAA TCCGCAAAAA GGTTCAATTA GCAGTGGC2TG TAATATTTT AT'rATTATTA GATAATGAGT CTACTGTTTA TTrCAGTAAT CGCTACCTCC TGTCTTGATT CCTCAAGTGG TAGAATATGA TAAGATTGTA TATTCGTAT AAAGTATTAG ATTCTATTTT TAATTCATTC 392 ACGrAGGATTT A'1rrTATTGG TTAAGATAGA TATAGGCATA 7=TTACT CTCTATTTAT ATI'GTTGTTG TTAAAATr'rA GAACTACCAA TGCGAA'rATA GAAAACTTCT CTTTCAAATA TTACAAGAGA GAAGTGTTGC AAGG'rACAAA G~rrAIrA AATAATAAAT TATTATTTAA AACCAGTATT TCTTTAACGC TTATAAACTT T=MTTrCA 7TCACACAG TACTGTACC GATTTTTCT AT'rCGATATT'rTrGATGGTCC GANTAT GGArNrT TAACTArrGC TGGTTTG4GGT GGTATATTGG GAAATA'rGCT AGCGCCAATC GTAATAAAAT ATTAAAATC GAATCAAATr GTTGGTGTAT TTCTIrrTT GAACGGCTCA AGTTGGf.=AG TAGCAATTGT
TATAAAAGAC
CTTCAATATT
TATACTTTAT CACTArrTT ATr=~CGTT AT'TTTTAATT CG~rrGTACCA ACAALATACCT GGTAAATACT ACCATTGATT AGGAACGCTT AT'rGATTTGA TTTGr'rCT TATATTTTT ATGTTTATGT TCATTCAAAA AGGTTGTTCT TCTTGGTGGT CTATTATTTC TTTTGGAATG ATATTGAATT AGTCTTAATT ATACGGATAA TGGATTGAAA GCATAATGAC TATAACTGAA CAGATTCGTG AGACAACCCA TGTTTATGT CTAAAGGAGT CCACATCAAC I'TC=GGTAG CCAATTGGTA GTTTAGTTC GCTATTAGCA TACCTTATT GAATT'rAGI'A TATATTAGAA AAAGAAAACT GATATCTTTA AGCTTTTGTC GGAAAGATTA TTTAGTGCTA ACGGTATCGT ACTAAAAAAC TAGCCTTGGA CCAATGCTTT GATGGATAGG ATGTACTTTA GGATGGACGT GTAATAACCT CTTCTTTCGA GCGTTCCCCA GAAPACGC-ACT TACTCATTGA TTCATCTGTT CCTTGAAAAA AGTCACAGCA TATGAAGAGT TAGAAATTTA TACGCAGGTG AGTTGATTAG TATTGACCTA GGTGGTACAA GTGAGATTGA AACTT'rGTGG AGTATTACAA
GCAAGATGTT
GGAGTATTTT
CTCTTrCAAAG
GTCATCACAG
ATTGTATAAA
ATATTAAGAT
3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 A'r'r'GGGGAG
ACGATAGTAC
GGATCTGGAA
TACTGTTCTT
AAGAT'TTTC
TGAACAAAAC
ATAGATAAGA
TCAAATGACG
TCTCAAATTA CAGATA:CPA TrAGAAAGCT TATCGGACAT CATCAGTTCT ATTAAAAATA AATTGACCGA ACGGAATAT'r CCTrGATAGCG ACCTTCTTGG AATCGGTATG GGAAGTTGCT CATCATACTT TCCT'rGTAAA TCATAGCGGC TATAAACTCT CCGTCTACTT GTCCTGCAAC AATTGAAGTC TGCTCAAAAC GCCGTCCGCT AATC?'N'CA TAGACTrTTCT CCCTTTTAGG AGCCTAGCTT TCTAGTTTCT TCTTTGATTT TTA'rrGAGTA TACCACTATT 'rTACTCCCTC TGGCAAGGGA CTTTGTCTAT GTGGAGGGAT TGGGCTCC1'A TGTGGTGGAG CTTTTCTGTT CI'1CTGAAA TATGGTATAA TAGCACTAAT CAATTTCTAG GAAAATAGAT ACAGAAAGGG CCTGAAAGAT GTCTCATATT ATTGAATTGC CAGAGATGCT GGCAAACCAA ATCGCCGCTG GAGAGGTCAT TGAACGTCCT GCCAGTGTGG TCAAAGAGTT GGTAGAAAAT GCCATTGACG CGGGCTCTAG TCAGATTATC ATTGAGATTG 393 AGGAAGCTGG TCTCAAGAAG GTTCAAATCA CGGATAACGG TCATGGAATT GCCCACGATG
AGGTGGAGT
'rTCGGATTCG 'rCrGACTCT
GGGGTGAAGT
AGGATCT
TGTCTCATAT
GGCCCTGCGT CGCCATGCGA GACGCTrGGT TTTCGTGGTG GTTAACGGCG GTGGATCGTG TGAGGAAGTC ATCCCAGCGA TTTCAACACG CCTGCCCGTC CA7rGATA'rT GTCAACCGTC CCAGTAAGAT AAAAAATCAA AAGCCTTrCCC TrCTATTGCG CTAGTCATGG AACCAAG?1'A
GCAGATCTCT
TCTGTTAGTG
GTCGCGCGTG
CTAGTCCTCGT GGAACCAAG GTTTGTGTGG TCAAGTATAT GAAGAGCCAG CAAGCGGAGT TGGGCTTGGC CCATCCTGAG A CTTTTA
GCTTGATTAG
CAATCGCAGG
ACCTAGATTT
ATTATATCAG
TTTTGGATGG
TCCATATCGA
TTCCAAGGA
TGATGGCAAG
GATTTACGCT
CGAAATTrCA CCTC7'rCATC 7?rTGGAAGC
CCCTTATCTA
AAAAGAACTG
GAAATGACGC GGACAGCAGG GACTGGTCAA TTGCGCCAAG TTGGTCAGTG CCAAGAAGAT GATTGAAATT GAGAACTCTG GGTTTTGTGT CCTTGCCTGA GTTGACTCGG GCTAACCGCA AATGGCCGTr ATATTAAGAA CT'rCCTGCTC AATCGTGCTA AAGCTTA'rGG TTGGACG'rTT TCCACTGGCT GTCATTCACA GCGGATGTCA ATGTGCATCC AACTAAGCAA GAGGTGCGGA ATCACTCTGG TrrCAGAAGC TATTGCAAAT AGTCTCAAGG GCCTTGGAA-A ATCTTGCCAA ATCGACCGTG CGCAATCGTG CTCCCACTCA AAGAAAATAC cc'CCACTAT GAGAAAACTG.
ACTGAAGTAG CTGATTATCA GGTAGAATTG ACTGATGAAG AACAAACCT GATTCCAGAT AGAAGGTGCA. GCAAACTATT AGCCGTCAAG ACCTAGTCAA GGCAGGATTT GACCCTGTT'r TGCA~rTGC AGAGAGAAAG TTGCTAGCAT CCATAAGGCT AGTTGGAG=T TTTCGGACAA TTTACATCAT AGATCAGCAC 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140
GCCAAGGAAA
CCTGCTAACT
TATGACAAAC
ATGCACGGGA
GCTGCTCAGG
CCTTGGACCG ATTGACCAAG CCAGCAAAAC ACGACCAGCT AGACCATCCA GAGTTAGATC GCATTGGCAA 'rCTTGACCAA AGCCAGCAGC CTGCGGATGA TGCCCTGCGT CTCAAGGAAA TTCTAGCAGA GTACGGAGAA AATCAATTTA AAGAAGACAT TGAATCAGGC ATCTATGAGA
TGGAGCGAGA
CTTATCTCTT
AACGGTCAA
AACTCCTAGT
GAATGCCTCT
TTCTACGTGA
TGTGCGACAT
AGAAGCATCC AGCTTCCCAG TGCCCAAGGG CGAGATGGAC GTACGAGGAG TACCGTGAAA GCCCTATATC TTTGAAT'TC CTTMAGGAA GTGGGCGTCT ACATCCTATT TGGATCCCAG GCTCC'TT ACCAAGGAAG GTC'rTGCAAG CCATCTATCA CCTCTATCAG CT~r=TAAT TTTCTATCAA GAAATACCGA GCAGAGCTGO CTA'rCATGAT AGGCCAATCA TCGTATI'GAT GATCATTCAG CTAGACAACT GTGACAATCC CTATAACTGIT CC'TCACGGAC GTCCTGTTTT GGTGCATTTr' ACCAAGTCGG 394 ATATGGAAAA GATGTTCCGA CGTATTCAGG AAAATCACAC CAGTCTCCGT GAGTrGGGGA AATATTAAAA GTATAAAAAA GTCTGGGAAA AATrTTC-AAA ATCAAAAAAA CGCATAAAAT CAGGTGTTCA AAAACC~rGA TTTTATGCGT TTTATCATGG AAA'rACTTAC T'rCAl-mrr CCTAATTCTT rrCGAAACTC ?TTTAAACG TGTTTTrGAGC TAATTTTGCC AGI'?1'TCT CGACTGG 'I CCTAGTI'TGC TCTATGATTT GTTTCT'rTAT TCCTCTAAAA GTAGAGTCTG ACCATTTCAA TACCTCCTTG TGCACACTCA CGrrCTATGG CTTGGTTCAA'r TGTATCrTC CTAT'!rAAAC TGAIq"MGGC GATTCCCTTA C7rGTATTCC CTCTAATGAC AGCTCCCAAG CCCATTrTG ATGCAATCAG TGGTATTTCA ACGTCAGT'rr TATCAGTAAT GTAACATCGA ACTTGTrGTTT TCACAGACCA TTAAATTGCG
TTCTATGCGT
GAACCCTTAT
GTACCACAC
GATACCTCGC
CAGATAATTG
CTAATGTACG
TTCCTCTTT
CAAACATAAC
TACATACATA
CATCATAr
CTCAAAACAG
TACCACTCTG
ATTT'rGCCAA
AATCAGGTTG
AGTACCAGCT
AGGAATTTCG
ATCATAATGA
TTTACTTTTT
AAAGCTCCTG GAACCCAGGC 'rACCTCTATA TCTT'rTGAGA TTATCTAGTG CTCCAGATAA TAATTTTGAA TC7"rTCTCGT TTACATTCTC GTTATAAATT CATTAAATCT T'rACCTTCAT AAGTGTTCAT TTCTT'rGTTT CTAAATAAAA CTGGAAATGG TAATTCCATA AAATGAAGTG ACTGAAGTCC AAATCACCTT CAAAGCCTAA AAATGATAGG CTTT'rAA'rr AGTAAGACAC CCGAACCATT CAATCGCAAC GTAAAGAGCC AATACATTGG CrCATCCTC AGTTATCTA TATAGCTAAT GAAACTCGTT CTACCAGCTG ATTAGTGGAA TGTTGTGTTT
CGCTACAACA
TTATT=CC
ACTATCGTAA
T'TTrTCTAAC
CAGATCTTTA
TGCAAGATTG
ATACCTATTT TAATATTGTT TGCTACTAAA TCCATATTTA AAATGTGACC CATTCGATTT GGATTGGCTT CTATTTCGAT TGATATTCTA TGTTCAACCT, TGTCAGGATI' ATTTGTCAGT AGCATTTTG CTCCAATATG ATATTCTCTT GCATCAAGCG TATCCATGCC T-rGATCTTGT 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 ATTGATAAGT CCAATTCCTC GTCCCTCCTG TCCCAAGTAA CTCAACAATC ATTTTCATAG CCTTATCGAA TTGCTGTCCA TAAAACATCT CCTGTTAAAC ATTCGGAGTG GACCCGACAT TATATTTCCC ATAATAAGAG CAAGATGATG TTCCCCATTT TGCTTTGAAA TTACCGTATC TAGTAGGCAT ATTGACAGTT ATCATATACT TTCTATATT C?'rGTAATTC rr'rCATGGTA TTTCGAGAAC TGAATTAAAT CATCTGTTCT CATCATTTT CCATCATGAT TCATTATTTC ACAACATAGG CCACACTCTN' TTAGTCCAC AA-ATCAACAG TTGCTT'CTGT GTGTCCATTT CTTTCTAGGA CACCACCTT AAAGGAAACA TGTGTCCTGG CCTGCGAAAA TCAGAGGGTC TTATATCTTC ATACGTGCGG TCAGTCCTCT TTCCTCCGCA GAAATACCTG T=GCGTrC
TAAT'N'TAAT
TTTTGCAATT
AGCTACACAC
TTTATAATCA
395 ATTGAAACTG TAAAAGcAIcT cPATGATI'A TCTGTAT'rGT TTTCAACCAT AGGTGAAAGc ATTAATrGAT TAGCTAAACT TIrCGCTCATA GGCATACAAA TTAATCCTTT GGCATAAGTA GCCATAAAAT TAACATI-1rC TGTTrGTAGCT GCTIGTGCAG AACAAATTAA GTCTCCTTCA =TCTAT CCT'=GTTC TATAACAAGA ACAAGTCCTC CC 'rCGCAA TCCTAAT GCTTCrrGTA T1-N-rCGATA TT-CCA'rrGAC TGATTATCCT TTCTGCTAAA ATCCATTT-TG ATATAATAGT TCCTTAGATA TITCTGATrr TGGAGAGTTA TCCATCAGTr TGCACATA 9000 9060 9120 9180 9240 9300 TTTACCTAAG ATATCATTT GGTTTGTTCC AAGGTATGAG GACAGTCAGA CTAATGCCGT TTCTTTTGT GTGTTGA'rrT TTTTCCTGTA CCATCAATGT TAAGGC7CT TCTAGATTCA CCATGrTCA TTCATTACAT CAAGATTTAC TGTACTCCCG ACTTGTTTAC TCI-rAAGAAT GGATAACAGA TACTGAAAAG T rACTTTGG AGACTNTAGC CAATGTAAT ACATCC="r TCAACTATTA AATCTAAAAT GATACCATAC AGCATTATCA TCTT"rT=.A TTGACGAGAT GTCCTCI'AAC GACGTGACCC CCAAGTCGAC CGTTGACAGA CCTCACTTCC ATGTTTTAAT AGAGTAAGAG CTGT'rCGACT CAACTGTAAA GGATTGATGA TTGAAATGAG TAACTGTAAG ACAGATACCA TTTACTGCTA TACTATCGCC TTTAATTGAT AGTTTACAAT TACGAGAGTC TTCAATTATT CCTG'rGAACA TGGATAAATC ATTTGAGAAA ATGCATAAGG =TCAATCTA CC TCCGACAG GAAACTTGGC ACTACCTCCA TCATCAACAA T'TTGTTC CAAAGCACTC AGGCTATCAA TCTGCATGTT TCCTAGATGT
TAAATGGATA
7"=CGTATT
ACTTCACTTT
ATAGCGTCAT
AAAACTG
CAATTCAT'rA TGCATTAAkAC CCATGATTT'r
TTAATATCAT
CATATGATAC
TCCGTTAATA TTTTTGAGGC CTTTCAACTT TTCCGATTTC CTATGAGATA GTCATTTCCT TTGGCAAAGA AATAC =TCA GTGCAATATA TATTTTCAGC GACTGCCCCC TTCTAGAACT TCGATAAGTC TATATGATTG GATATAGCTT CATTTTATTT TTGCTGT=r~ TACGATMTA GGATAGGATT TTTCCCTTCC TATTGACTCC CACCATAATT CTTCTTCTTC AGTAATCCAT ACATTGCATA TrrCATAAAA 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 CCTIrTTCT TTATGGAAAG TTGTCTTCAG AGGAAGTGGC GAGGTAAGAG GAG=rCGTAA TCCAATCTAC ATGTCAGCAA PCACTAACAT GGTGTCGTAA TTCCATTGAT TTGTTTTAGT ACATAGGGTA CATGCTGGG'r 'r'NCTAAAA TTCCAACAGT CCAGATACAA TAGGATTACA
TATTTCACAG
AATGTAAGTT
ATGTGTATCG
AGGATCGTCT TGAATAACAG CTGATGCACA TGCTTTCTTG GGCTATTTTT CCATCCATTG AATATACTTT CTAAAACTTT TTATTAAGTT AAGACACTCA AACTTGAAGA TTATTTCCT CAAGTATCTT TACTCCTTTT GTCTAGGCTT CCAATGACTA CTCTTGTAAT ACCACTATCG ATTATACCAT CTATACAGGO ACGGGC TAAAGCGTCG CTCCGACAGG GGATTCTCTA GGGCCACCAA AAAACTCATG ATAACCrI'GT GCGCCCACCA TAGGATTGGG ATTGACGTAA AATTTCATAT AT'NrGAATC GCTCATCTCG GGACTACTCA AGGCATACAA AAGAAAAC?1' AAGTAATcAT TTcAGAGCAC CCAAAAACCA CTATACTCTC GGCTGGAA TTTCACCAAA CGGTAGGGAA TTTCACCCTG CCCTGAAGAT TAAT'rTrGAA TATGTCAACA GATAAATACC ATCGATTCTC GCTCCTCGGA TAAAGAAAAT AAGGAATACT ATGTACGCAT ATTTAAAAGG TGTTCTTGAA ACCAATGGTA TTGGTTATAT AGGTCAGGTT AATCAGGAGG CTCAGATTTA 396 CCGAAGTGAC AACAGGGTTC AAGTGTTACA CAGTTTTAA GAGCATTTCT CTCAGCATGT
CCGATAATGT
CCAGCCCCTT
CTACCTCCAA
ATGCGATTAA
CAAATATACT
TCATGCCTT
AGTTATTCAA
GATTGTTTT
ATGATATACT
AATCATTACC
CCTCCATGTG
TGTGCATCAG
GATTATCNT' TACAATAACT TN'rGTGCCAG TTTAT'GCT AAAAATATAC CTTGAATAGG CAAAAATGCT CTGAAATGAC =~ATCTTCT TrCATCCAGA CGGCTCGTGG GCTATACCAC TTACAGATGA TTATAGTACT GATATACTGT ATTrTGTGATA AGATAAACGA AATAAGAGAG AAAATTACTG CCAAATACAT GCCAATCCTT ATGCCTAT'rC GTTGTGCGTG AGGACGCCCA TTTCTTAGTC TGAI'rTCGGT GCTGATGACA ATGCTGGCTT AAGTTCCC-TA AAAT'rGGCAA GTAGTAGTTG CAGGAGATGA CAAGAATTGG AAGAAGCTAT AAGAAAATCA AGAAATTCTT GCCCTTAAAA TG7rGGTCAA TTTGCTrTAT GGATTTCGCT CAGAGGATGA GAAAAAGCCC CTCTGGGATT GGTCCTGTAT CAGCTCTTGC TATTATCGCT GG'rTCAACCC ATTGAAACCA AGAACATCAC CTACTTGACC GAAAACAGCC CAGCAGATG4G TGCTGG.ACTT GGAAGGCAAG 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 CCTTCCTGCC AAGGTCGCAG GGAAGCCATG TTGGCTCTGG TGAAGGAACG ACAGATACAG ATAGGAGCAG AGAATGACAA TGCAAGCAAG TGCTGAAAAC GCTACAAGGC AACAGAGC'rC CTGAGAACTA TATCAAGTCG
CGCCTATCAT
GTTGTGTATG
AGCTTTCCGA
TCGAATTGGAA
TACACGCGCT
CTATCTTTGG
AGCGCCAGCT
CAAGTTCACA
GATGAGGAGT
GAAACCTATC
GAAGTCTTC
GCCATGCTGG
AACGCCCAAG
TCTTTTGTTG
AAAACACCCT
GGCCCAG'rCG AACGTTGTTC GTGGGTCAAG ATGACCAACC CGCTCTACAT GGGGCCAGCC CCTCCATGAT GACCAAGTAT TGTTTGAGI'r AGGC-AGGCCT GTCTrTGGGAA ACGGTACTCA ACAALACGCCA ATAGCTATCA AATTCACTCA GTCGCAGAGA TGACTGACAC AGAATCCAGC TATCATTCGA AATAGAGCCA AG;CTTTTTGC CCTTTCTACA GTTACAGGCA GAGTACGGCT CTTTTGATGC AGGGGAAAAC TGTCGTTAAC GATGTTCCTG ATTATCGCCA TATCTGAGAA ATTAGCCAAA GATCTCAAAA AACGAGGCTT CCGTATTGTC =NTCTACAG GCTGCAGGGC TAGTTGATGA 397 CCACGAGAAT GATTGTGAGT GGAAAGGTCT TAAATGATGT CTAACAAAAA TAAGGAAATT 12540 CTGA'1T 'G CGATTCTCTA TACAGTCCTC rI'TATGG ATGGCGTTAA A7rGCTrGCT 12600 TCrTTAATGC CATCTGCCAT TGCAAATTAT CTTGTrATG TAGNTTAGC TCTATATGGc 12660 TCCTTCTTGT TCAAGGATAG ATTGATCCAA CAATG-GAAGG AGAI'AGAAA GACTAAAAGA 12720 AAATTCT= 'rGAGTCTT AACAGGATGG CTCTTTCTCA TTCTGATGAC TGNrGTCTTT 12780 GAATTTGTAT CAGAGATGTT GAAGCAGTrr GTGGGACTAG ATGGACAAGG TCTAAATCAG 12840 TCTAA'rATTC AAAGTACC'N' TCAAGAACAA CCACTACTGA TAGCTGI-N-r TGCTrGTGTC 12900 ATTGGACCTC TGGTAGAAGA ATTAT'rTTTC CC1'CAGGTCT TATTGCATrA CT'rGCAGGAA 12960 CGGTrTGTCAG GTTTACTAAG CATTATTCTG GTAGGACTTG TTTrTGCTCT GACTCATATG 13020 CACAGTTTGG CTCTATCAGA GTGGATTGGT GCAGTTGGTT ACTTAGGTGG AGGCCTTGCC 13080 TTTTCTrATTA TTTATGTrGAA AGAAAAAGAG AATATCTACT ATCCCCTACT TCTCACATG 13140 TTAAGCAACA GCCTCTCCTT AATCATTTTA GCTATCAGTA TACTAAAATG ATGAGAAC 13200 AGGACAAATC GAT'rrCTAAC AATGTT'rTAG AAG'rAGAGGT GTACTATTCT AG'TTCAA'rA 13260 *TACTGTAATA TGTGATGAAA ATGCCAGTAA TGATACCGAG AAAAAAGCTG AGAAACTTrT 13320 CCCAGCTrl'A TTTGTTATAG TCAAAGAGAA TGACT'rGTTC CTGTGCATCT ACATGACCAT 13380 ***GGACCCCAAA GGGTACAATT GCTCTI'rGGAG TTGCGTGGCC GACATTCAGA TTATAGACAA 13440 *TCGGGATATr GCTGTCAATG ATATCCAATA GTGCCTC'ITT ATAGTCGTCA 'rGGAAAGTTT 13500 CATCCATAGG 7TTTCCGACC AAGAGTCCAT TGATCACCGC GAATATGCCA G'rGTCCTTTA 13560 AAGTTAGCAA CATC=rTG AAGTCTTCTG GCTTfAGGCTT TTCTTCGCTT GTrTTCGAGCA 13620 *AGAGGA'TTTT CCCTT1CCCAG TCTGACAAGT CAGGGAAXAG TTTGTATTT'r TGGCAGJAGTT 13680 CCGTGCTATC 'rGCGTATCGA GAGTTGTCAA AGATATCGTA GAGGGATTCG AGGCAACCAC 13740 CGAGGATTTT CCCCTCGAAC TGGGCACTTC CTTGCAACAA 
GTCAAAACCT GTATTTGTAT 13800 GACGAAC AGGTGTTCCC AGGGCCGrC GACTAAAATC AGTTCGTTCC TCATACCAAA 13860 *CGTCACTAGG GCGGATTTCT GAAATTCTTC CCCTCTCAAT CAATTCTTTA AAGTAGTGAA 13920 GGCTATAGGC TAGCATTTCT TTGTCTAATIT CACAAATGTC TGCTAAAAAG GATTGACCAT 13980 AAAAAGTCTT GATTCCTAAT TTATGCAACA TGAGGTGGTT CATGGTTGTA TCCGAGAAGC 14040 ***CAAGAAAAAT TTrTIGCTTG ATAACCnTT GGAGTTGGTC ATTTTCAAAA AGATAAGGTA 140 GCAAGCGATA GGTA'rCGTCT CCACCGATGG CACATAGGAT CATGTCGATG CTA'rCATCAG 14160 AAAAGGCATG AATCAAATCC TCTGCACGAG CTTCAGGATG GTCCTTGATA AAG'rCTAATC 14220 398 CTTTTAACGA ATGGGGCAAA AAGATGGGAT TGGTCCCAGA TCCTTGAGAC GTT INFORMATION FOR SEQ, ID NO: 41: Wi SEQUENCE CHARACTERISTICS: LENGTH: 9828 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 14273 GTGAAGTGCG GCAAAAGGTG CAAGTGATGA GCTCAGGTTC TTGGCTCAGG TGG?1'ATCCT AAGGGACGTA TCATCGAAAT GTAAGACAAC GGTTGCCCTT CCTTTATCGA TGCGGAACAT TTGACGAATT GCTCTTGTCT AATTGATTGA CTCAGGTGCA CTCGTGCGGA AATTGATGGA
CATGCAGTTG
GCCCTTGATC
CAACCAGACT
GTTGATCTTG
GATATCGGAG
TGAGCCAGGC CATGCGTAAA CTTGGCGCCT TTATCAACCA ATTGCGTGAA AAAGTTGGAG GCGGACGTGC '!TGAAATTC TATGCTTCAG TTAAGGGAAC TGGTGACCAA AAAGAAACCA TAAAAAATAA GGTAGCTCCA CCGTTTAAGG GAATTTCTAA GACTGGTGAG CT'rTTGAAGA CAGGGGCTTG GTATT CTTAC AAAGATGAAA AATACTTGGC AGAGCACCCA GAAATCTTTG
CACAAGCGCA
CAGCTTATGC
CAGGAGAGCA
TCGTAGTCGA
ATAGCCATGT
CTATCAATAA
TGATG'N'TGG
TCCGCTTGGA
ATGTCGGTAA
AAGCCGTAGT
TTGCAAGCGA
AAATTGGGCA
ATGAAATTGA
TTTAGCTCTT GACA'N'GCCC CTATGGCCCA GAGTCATCTG AAAAGAAGGT GGGATTGCTG TGCGGCCCTT GGTGTCAATA AGGTCTTGAG ATTGCGGGAA CTCAGTTGCT GCCCTTGTTC TGGTTTGCAG GCTCGTATGA AACCAAAACA ATTGCCATTT AAATCCAGAA ACAACACCGG TGTTCGTGGT AATACACAAA AGAAACTAAG ATTAAGGTTG TGAAATTATG TACGGAGAAG TTTGGATATT ATCAAAAAAG AGGTTCTGAG AATGCTAAGA TAAGCAAGTC CGTTCTAAAT TGAAAACAAA AAAGATGAGC 2TTAGGCGAT GAACTrGAAA CGCTACTTTT TCGATT'TrTG rATGAACAAC TCTATTAGGA GTACTATTCC AAACTCAACC CTAAAACATT GTTAAAAATC CTGTTTCTTG TCGCTCCTCT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 TTGGCTTGAT TGATGGAGAA GAAGTTTCAG AACAAGATAC CAAAGAAAGA AGAAGCAGTG AATGAAGAAG 'PTCCGCTTGA TCGAAATTGA AGAATAAGCT GTTAAAGCAG TGGAGAAATC ATTCAAGTTT TTAGA'N'ATA TATAGTAGCT TGAAATAAGA AAGTCAAATT AATTTCTAGA AATGTTTTAG CAGCTACAGC AACTATAATA GATCGAAACT AGAATAGTAC ATATCTACTT GATTTGACTT TCCTTATTC A'rTCCGCTAT ATATAGTTTG GGAAAGCTGA TATAATAGCT TTATGAATAA AAAACGAACA GTGGACCTIGA TACATGGTCC 399 GATTCrI'CCC TCGCTCTAA GC1-rCACCrr TCCAAT~rMr CTATCAAATA 7rrMCAACA GCTCTATAAC ACTGCTGATG TCTTGATTGT TGGACAT Clr-GTCAAG AATCC2-rGGC 'rGCAGTAGGA GCGACGACAG CGAT-rTI'rGA CCTGATTGTA GGTTTTACAC TTGGTGTTGG CAATGGCA'rG GGGATTGTCA TTGCTCGrA TTATGGGGCT CGGAArCA CTAAAATCAA GGAAGCAGTA GCAGCCACCT GGGCTr'TCTT GGCTTGTATC TCAA'rCTTAT CAATATATr'r TCTTTTTGCA GGCTTGTTGC TT1TCTCTGCC TTGGTTAATG GGATVAGG 'rGCTCTrG AGCATTCTAG CTCTCTTGCA A'rACTTAGAT ACTCCTGCAG CTATGATTGT GACCTGTGTA GGTGTCAGCT GGTCTATTGG TGACAGTCTA GCAGCCCTGG TGGTTCTGGA TCTCTA=I?' A'rTACGCAAT
TTATGTTGCT
AAATTCTTCC
'GC'rTATAA
GATTTCTGAT
TGCATCT1GG 0
AGTTCAATCC
TrATTATATT
CAAAAGCTTG
TGTATCTATC
TAGTGCCCAG
?rCTGCATCA
TG;T'CAAGGT
T'rrCCTCTTT CTrGATAGAA
CCTCTTGTTG
*T'rCTAGCTTT
AGGATATAAG
GTACTTCTrCA
AGTGCAATCC
GCAGGACTTG CTACCATTAT TTCGCAAGGT CGTAAAAG'rG TGCCAGAACT CTTGCCACAG TACGCGGATC TCTTGGAGCA AGGTTTGGCT GGCAGTGTGA TTT'ACAGTT TTCTGTTAAT ACGGCAGCrC GACGCAT1TAT GACCTrrGCC ATGACGACCI' T1'GCTTCTCA GAATCTAGGA
CTTCGAATCG-GCAGTCGTTT
TTTGCCAGTC CAGCTTTGGT AATGGAAGTC TCTATCTGCA ATTTATCGCA ATTGCTTGCA AT'rGAACI'AA TCGGAAAAAT GGTTrATCC TTrGTGAACC TTATTCCG;TC ATCCCTTGAT TAGTTGGATT TACTGAATAA
AAGTATATCC
TTCCTTCTTG
AATCAGTTCA
GGGCTTGGGG
CGTTTTTGTG
TCTTATCTGG
AAAAGAAGC
AATCCATTTC
AAAATAGT
AAAAGGACAT
ATTGTAGATA
AAGTTCTATC
GGAGTTTCTG
GGAAATTAAT
TTA'rCAGCGG TTCTCTGCTr TTTAAACATT TCAAATGGGA ATGGGCTTGA TGAGTTCAAT ACATTTGGTG CAGTGATTAT CTTCT'rCCTA TGACCGCTAT GCTAAGCGAC CTGACCGTAT TGGGCAGT?'r TTGTT'rGTAT GCTAGTTCGA CAGATGGTTA ACCTTTTATC CCATTTrGAG CAAA.AGATCC TTCC1'CTAGT GTTTTGATTA TTCCI'TGGGC GTTGCCA'rGA CAGTTCAACT AAGGCAATCT TGGCAACCAA CTCTAGTGAA AATCGAAAAA AACAGACTTT TGACTTCTTT ATAGAGACTG TAAAAATATA GAATGCAGAC CTTGTCAGTC TACAAGCCTA ATCCTGACTA GTCTGGATT GTAAAAAATG GTGAGTAAAT TAAGA6ACAGA 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 'rATATGATAT AATAAAGTAT AGTATTTATG CTTTTGAAAA TCT7r=AGT CTGGGGTGGTT CTATTTACAG TGTCAAAATA GTGCGTTI'TG AGATTGTCTT CI-rGTAAGG AGT'rGTTTTA ATTGATAAGG
TAGAAATAAA
AGTAGAATAT
TT'rGCCTCAA GTCGGCCGTGC GCATTCAACC GTACAGAATG TTTCTCGCAC ATTGTTGGGA CTGGGACGTT GGGGGCGG7T CCATrCAACC AAAGAAGAGT TCTAGCAGAT GAAGCAGGI AACGCACGAG TATTGCACGA TCCATATCTT GCTAAATCGG CTTGACGAT'r GAAACACGCT CGGCTC7TrAT CCAAAAGACA ?rCACGCTAT ATGC~rIGCAG CGACAACTCA GGCGAAATGG CAACGAAGA.A GGTGCCATGA
AGACGCTAAA
CTGGTACTAC
AGATGGCTTG
TATTAGGTCT
TTGCTACTAT
TAAATTTGAC
CCATTT1'?GA
TTGTTTTGTC
GAGAGGTGTA
AAAAGAACTTI
GAATCAAAGC
GTAATAAAGC
TTTTTAAAT'r
ATATTAGCCG
TGCCATTAGG
CGkAGGCGCCA
CTCAAACCAG
ATTACAGTAA
TGAPAAAGCT
ATrA.AACAAA
AATGTAAACC
TAGGAACTCA
ATAATTTTT
'rGAATATGA'r
AGAATATCT
ATGATTGCCA
TTCAAATCT'r
GGTACAAAAG
AGTCTG'N'AT
AAGGTAAAGA
AAcCATATAG
AAGCGGATTA
ACGGTTGCAT
GGAATGCTGA
TCATGACGGA
'rGCCGAAAAC
ATAACCAACC
GCATTAGCCG
GGCAGAAGAA
AGTTTGAGAA
ACCGCTGGAG
CTACAGGCTG
AGACAGGCI'G
TGGTATCAAA
ACGGAACACT
AATAATAATG
TAATAGTATG
AATCAAAAAG
GAGTCGGATA
TCAATTTTAG
TATTTAAAAA
AAATCTATGT
CAAATCTTAC
AA.ATGGAAAA
AGAAAAAAGT
ATAGAAAATT
AAAGGAACTC
TTTTCTTTAG
ACTCTTATCT
'rAT'TTTrAA AATCAATGGC ACTTGG1TACT GAAGCACACA GACGGCAACT GAAGAAAATC GCTGATAAGI' GGTCAAGTAC AAGGACACTT TGCCTTTATC CAGTCAGCGG GGCAGACAAG CCAGAATTCA GAATN'CTT CAAATCAGAA CCrT'TCTTG TGGAGATATT CAAACrAGAA AGTTATGCTC GCTTAAGTA CTGT'rTGAG ATTTTTAACC AGCATCAATA ATTATGACntha GAGTGTGCTA GATAAATGTA TGTGATGTTG TCAAGATAAG ATTGCTGAGT AGCTGAAAG4G AATATCACGA TGGGAGCTGA TGGTGAATCG TGACGCTGGT A'rTrTATGGA AGTTCAGTCT CATTGCAGTC CACATGGACT GTATCATCTT TGA'PTGGTGA AGGAGATGAA 'N'TTCCCATC TTCACTGTAT 400 GCAAGTACAC GCACACTCAA TCACTGGGG AAAGACCCAC CATGCAGGTA GGACCTGTTG GACCTATGCA OCGTTGAAC
CTACCGCCTT
GCTTGATACA
AAACAACCAC
TGAGCAGTTT
TGACACTGC
TATATCGAAC
GGGAGTTTAG
TCAGACCACG
AAGCATGATA
TACTGGTACG
CTGGGAATCC 3180 AATTAGGTTr' 3240 ATAATGGTGC .3300 TGATTGAAAG 3360 TCTTACGCAA 3420 CTOGAATTAA 3480 TTGACCCTTA 3540 TTGAGAACGG 3600 'rACATTCAGA 3660 ACTTTGACAG 3.720 GGTACTGGTT 3780 GGTACTATTT 3840 GGTACTACTT 3900 ACGGAACAGG 3960 CAGTAGAGCC 4020 CAGCGCATAT 4080 TCCT'rCAATT 4140 AAATAAAATC 4200 GT'rGAAGATA 4260 AA'IrGCTTCC 4320 TTC'N'TAT 4380 GAAAAAGAAr 4440 ATCT=T'T 4500 ATGGAT7TAA 4560 CCGATAGATA 4620 CTCGGAAAGA 4680 AATTCAGACA 4740 TATTATGATG 4800 ACTGAAAGAA 4860 AGGATGGMT 4920 AGGTGAAGAA GAGTTCAGTC AAGCGGATCA GTTTGCTTCT AGGAAATCAG AGAAAATGCC AATAGAACTC GTCAGTTTA TGGTATCAGT CATAAAGCTA TTGATGCAGA AGAAATTAAA AATATGGATA GCTATGATAC AAGTTTATAT CGTCCTTTGT AATATATTAA TTCAACTGAA CAACTTTTAG AGGAACTGTT ACTAGATGCT TTCAGATATG GAGTTGTCGT TTGACTAGTC GTGTATTTAT GGTTGCC-ACT GAACATCTTT- TAGAAAAGCT ATCTTGAAGT AGAAGATATT ATAAAA'rTGG TCTTATATAG A'rTGAGGAAT GATGGATACC TTAG'rGTTA' ACAGACAGCT TCAAGATTAG CAGAAAGTAA AAAAGAAATG GCATTAGGAT AAAATAACAG AATTTCGCAA GGGAAGTATG ATArTTTATA TGGGCTAGAT GAAGACGGGGG TGATGCAGAT TGTAT'rTCAG TATrMATG CTATTTrGGGT AAAATTGTrTA TTCCACAAGA TCCCCATTTA AAA'rCTAGGA TAGATCAG'TT CATAGACATT GGAAC'rGAAG AATACGCATT TAACAAGATT AT'rGGTAAGG GACAAGGGGC GCTGTATGAT GAAATCAATA GGTAGCTAAG GGTTCAGCTG ATATAGAGAT TTAACAAGAA ATCTATTTCC TTAGCGAAAA 'rAAATCATAT GTAGAAGAA'r AGCGTTTAAA GCGTAATTTA TAAAAAGAGA AGGAAAATrTG TCAAAATAGA CAAAAATAAA AATTCGAAAA AGAAAACTGT TGCATAGTGG ATGAGAGAAA AAACGATATG TAGGGTAATG AAACTTATAT TTTATTATAC AGCATAATGG GATATTAGGA AGTAATAACC TAAGAGATGT TTTCTTTAGA ATATATGACA ACAGGAGATA TACTGATTGA TTACTGAATA AGAGGGCAAT CATATCTGGA ATAATATGCT GTGCAAAT'rC ATTTTAGAC TATCTTCGTG GAAGTATTCA TTTGGATAAA TCGAACTCAC TATTCAGGAG GCATATGAGC CAAATTGAGC CTATAGGAGT AGAAGTGAAA TAGTAAGTCC AGTTCTCCTr GAAGTT1TTCC TGAACTATCA GTCGCATGTC TGAGAGGGGA TAGCGAGTAG TTTTTGGTTA TTTTATCAAA CGAATGATAA AATATAATAA AAATGATAGA ATAAGGAAAA
TACCTACAAT
AGATTGTGAG
ATCATGATAG
4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 AACATGAATG TCAAAAAGAT AATGTCAATT TTTCAATCCT TTTATGTTGA TGTCAGTA'I- CAGGAACTGA CTTTGAC'TTT ACCAATCAGT T'rTGTAAAAA GGTTTGAGTA TACTCAAATG ACTTTTCATA AGGAATCATT TTTATTGATT AAAGAAAAGA GAAGGGGGAG TTTGAGTTCA TTTGTTACTC AGGCTCGCAC TATGGGTGAA AAAGCCAATA TGGATGTTGT TTTGGTG'TTT TCGAAGTTAT CAGACAGTGA AAAAAAGCAA TTACTTCAAG CTAGAGTTCC GTTTGTAGAC TTTAAGGGAA ACCTCTTCTr CCCTCCATTG GGACTAGTAC TCAATGCGAA TGATACTGAA GTCCCTAAGG AATTAACACC TAGCGAACAA TTAACGTGGA TTGCCTTT-r ATTGACAAAA GGTCAAAAAG TAGTAGATGT TGATTTGCTT TCACAAGTCA CTGGACTCC AAACTCAACA ATTTATAGGT GTTTGAGGAC TTTTAAAGCT TTATA7rGGT TAAACAAGCA AAATAAGCTT 402 TACACATATA CGGTGTCAAA GAAAGAATI'A 1TrCTTAAAAT CCGTGTCATG CCCATCAAAA AACGGATTTT ATTGCCAGAT GGCGATATAA AGCAGATAAA AACCTTCTAT ATGGTGGTGC TTATGCTrTG TCGCATTCAA CT,'TAGC GAAAATATrA GCTATGTCAT A'rGGCAGAGA AAATTCAATC AGTTATCCT CAGCATG~rr TAAAATGAAA GATGCTAGAG ATATGGAAAT ATCGTCCT TTTGGAATG ATTTAAAAA TAATCATGAT AAACAATTTG TAGATCCGAT TTGACCTTAA AAGATGATGA TGACCCACG'r ATAGAGGAAG AGAGTGAAGC ATGATATTAC AGTATCTGGG AGAAGATGAT GCCAGCTAAT ACGAAAGTTA AATGTTTGCG GAT'N'TCAGA ACTATTATGT TCTGATTrGGG GGAACTGCTA ATTGGATTCG CAAGGATTTA AAAGTCGCAC AACAAAAGAr TATGATATGG TTATTTAAT 6720 ATCTGTTTCT 6780 TGAAACGGAT 6840 GCCACTTTCT 6900 TGTA'rCTGAG 6960 TTCTCTTTAT 7020 ACTAGAAAAT 7080 TTTTTCAAGA 7140 CCTCTATCGT 7200 TCATCATTGA 7260 TGGGAGAGTA 7320 CTAATCCTGA 7380 TGAAGTAAAA AATAAGGAAT TCAAGGAAGT CAGAAAGATG GTTTCCTTCT ATGATTGAAC TCGAGAAATT CCCTTACATT AGATTATTAT AATATATTGG TAATTGTGGT TTATACTCTT TCAAAACAGT GTTTTGAGCA GAG'rATTAAT TATTqrAAG GTGCTCAAGG TTTAAGTAAG CTTCCTTGCr AGGAGATGAA ACATGCACCG CTTTGTGATA ACATTTCATT GGATCAAAAT ATAATTGTAG CGAGATGGCT AATTTCAGGA GAAAATGAAG CAAGACCTGA TATTATGCG'r TCCTCTAGTC ACTTGATTTG 'rATT'rAGTAT CTTACCAGAA TATCCATTAA AGAAGGACGG 'N'GACCAAGA TGCTAGTTTA TCAGCCTTAT TATTGGATGA TGCATGAAAA AGAAACCATT CAGGGGTAT'r CGGTATTGAG CGAAAATCTC TTCAAACCAC 
GTCAGCTTCC ATCTACA.ACC GCCTGCAGCT AGCTTCCTAG TTTGC'rCTT'r GATTTTcATT GCTAAAGCTT GGCTGGATAT GAGGGAGCGC TCTGCCACAG TCCATTAAAA AGCATT'rGAA TGACCTTACC CGTTTGACAG AkAGTTATCGG CTATAACATC GAATTAGAGC CTGTGAAGTC GAAA'TTTTG AAATTCTGAA ATATTGAATT CG'rCTATATC TCAATCTTCC CACAATCAAA TwrTrMCTTT TCAAAACTTT TTTCAGGTGG T TACTA TTTATACTAC CTTGAATCAT TT'r'ITAGAAT AGAAAGCGCA GCTTTCGA TTTACAACAA AAGTAGTGCG GTAAAAGCAG AACTATTCTT CAAAATAATG AAATTTTCTC GATGGTTAAA TGGAAACTAG AAAAAACTTC 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460
CGTATAGTAT
TTGCCCAGTC
TAGTAGAATG
CAAGGTrTTTT
TTCGTTTTA
AAACGAGAAC
GCAGAAGTGT AGGACAAATT GATCAGGACA GTCAAATCGA TTTCTAACAA TGTTTTAGAA ACTAT'rCTAG TTTCAATCTA CTATAGTTAA ATCTGCGGTC AACTCTACTG GTGAATCTAT CATTGTAATA CTCTTCCAAA ATCTCATCAA CCACGTCAGT CTTGCCTTGC AGTCTGTATC TTACTGACCA AGCTAGTGAT GGATTTAGAA TAGGTGATTT GGAGCGTCCT AT'rACCTAGG 403 AAATGCTC CATAGTCCTT TGCTGAGGCT AGGGTCTTTC AACAT'rCAAC TTGATCTAGr TGATAGGAAG GGAGTTACTA TAAAATACTC ACGCTTCCAT GAAACGATTG TGTAATCAAA ATG'TACCAAT ATrGTAGTAT TGGTACAGAA ATGGATAAAT ATATCATAAC TGCTATCTCA AAAAGA'rrrC ATATGTCrGT TAGACTrcC GCAAAACTAG AATCCTAGTT CATGATTGAT AATACCAGCA ACTCAACrGG
CATATTTTTT
GATGTTGTGA
GCATATATAA
ATCAAATTCA
TTCGTAATCC AAAGCGrI' 'A CGATGATrrC
CTACTTTGGC
CTrTATCTTC GACATATCTr
TATTFCTGCA
AAGAAACAAT
AAAGA'rGTTC TCAACCTTGC AGCTGTTAGC GGCTTGAGT'r CATGAGCCrT 'GA'rAACCAC ACTCAT'rTrG AACAACTrCA TCTCCCTTGA CTTGTGACAA GATAGGTTGT TGAAAACATT TTAAACGTr'r 'rTCrCTCCTT AGATAGCGCA TGGTTATAGG TGCTGGATTT ACGTGGAGTT TGTGCTTGAG TGTCAGCCAA GATTTTACCA GCTTGTCCGA TATCATGACA A'rAGTTCACA GTGATA'rCCA TCGCTrGAGC CTTCATAGCG TGAAATTTCT TTTTACCAGA ATCA'rTCGCT'AATTCTTTTn TCAATCATTrA CCGTGTCCTC AGAACTAAGA
TTAGGGCGAT
GGAGTTCTTG
TCATTTrTAC TTCCGTCGCA AAATCGTAAC ACCACrTTGA TGCTTTCGTG AATACCAAAA 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9828 ACAAGAGTTA CTTCAACCCA TTGGCTCCGA CGGATTAAGT TCAGCCGCAA TTTCTTCATA AGTGCGGTAT TCTAGGCTTA ATTTAGGTTT TCGTCCACCT TTTGCGTGTT TAAG-TTGATA AGCT1GTTrTTT AATACAGCTA ACATCTCTTT AAAAGTCGTG CGCTGAACAC CAACAAGACG CTTAAATCGT GTATCAGTTA ATTGTT'rACT TGCTTCATAA TTTCGCAGGG AGTCTATTGA CTCTTTGGTA GGTG'rCAATG TTTTTTTCAT CTATCCCGAG AATTATTTTC CCGCCAT'rTG TATTI'GCAAA TGCTGAGTAG GTTTCCCAGA AAGACTCTGG AAGATTGTTT TTAGCTTT TGTATTCTAA ATCAACCCCT TCAAATTTTA AGTCCATATT TTTCCr'TAC ATCTGTTTTT TGTGGTTCTG GTATTTGTTC AAGTTGAGTG ATAATATAGC GAATTGAATr TCGAGAGTr'r ?TACTCAGTT AATTTCTTTT TTAACCCACT TTAATTGCTT- TTTTAACACG GGTTAAAAAA GAAATTAAAG TGGGTTAATT TTTCTTGA

INFORMATION FOR SEQ ID NO: 42:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 3369 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
CCGCGAAAGA TATTrGAA CAAGAGTTTG GACGCTGAGGT CCGTGGCTAT AATAAAGTAG AAGTTGACGA GT='AGAC GATGTCATCA AGGACTATGA AACCTATGCT GCC7TGGTCA AGTrCACTTCG TCAGGAAATT GCGGATTTGA AGGAAGAATT AACTCCTAAA CCGAAACC'1r CACCAGTTCA AGCAGAACCC C7rGAAGCGG CAATTACAAG TTCTATGACG AATTTTGATA TTrGAAACG CC1'GAATAGA TrGGAAAAAG AAGT1rTrGG TAAACAAATT TTAGATAACr CAGATT'NTTA AGTAG~rArr TGAGATGTGC AATTI'rTGA TAATCGCGTG AGGAGAATTG NTTCI'CATGA GGAAAGTCCA TGCTAGCACA CGCTGTGATG CGAAACCATA AGCCTAGGGA CGAGAAATCC TTACGCCAGT ATAGGCCAGA GTACGC'rrGA AAGTGCCACA GTCACGGAGT TGGAACGCGC TAAACCCCTC AAGCTAGCAA CCCAAATT'TT GGAAACGAAC GTAGTATTCT GACTGCTATC AGCTAGAGCT TATCGAAGGA AGTGCTCCTA GTCACTTCTG GAACAAAACA ATAGGTTGGG GCrGAGAAAT TTTCTCAACC TCAT'NTrTTA CCTGTAGTGT TTGTGCTAGG TGAAATGGCT AAGTCCT'rGG CTTTCTGGAA ACAGAGAGAG GGTCCGGGCA TGGAGTACGC GTTrAGTGGTA GACAGATGAT TGGCTTATAG AAAATTGCAT
AAGTGGACAT
CTTGCAAGAC TGTAACATGA TGAGGCTGTC GTTGGTCGTG ACGTGTTCGT TTTCAAGGAG AGCAGATCG;T A'rCAAAATTA TCAGGGAGTT TCGC1TTTGG TTCAAAAGCT AAATGTGT'rA TAAGAAAGCT GTTGTCAAGA GATGGAGAAT GGCCCAGAGT CATGATTGAT ACGACCGGGT AAAAAGAATT TAATTAATT GCAACTGTGG AAGTGCGAGA GTrGGGCTAC GATTGTCAGG ACGTGAGAGC TATTATCGAA ACCAACCT
TCGTAGGAAC
ATTGGGAAAA
AGTCCAAACT
AATTGCAGAA
TTAAGATTGA
CTAGCCTCTT
GTTCCCAGCT AAGACTTTTG TTATTTACCA CTTGGAGCTC 'rCACAATGAG CCCAGTGTTC ACACTATGCT CGCCCAGAAG GGTCTCTATT CTCAAAGATG TAAACGTGGT TATCGTACCG ATAGAAAGGT 780 CAGCAGGGCT 840 TTGAAAATGG 900 GGCTTCGGGC 960 AAGAGCTATT 1020 GGTTCCCGAT 1080 AGGCTATTTC 1140 GGGTTCCTCT 1200 TGGCAACTGT 1260 AAAAAGGTGG 1320 CGCTCCTATC AACGAAAATA TGGCAGCAGC CATTTTACA-A CTTTCTAACT GGTATCCAG.A CAAGCCTr-TG A'N'GATCCGA CCTGTGGTTC GGGGACTTr'C TGTATTGAGG CAGTTATGAT TGCTAGAAAG ATGGCGCCAG GTCTTCGTCG CTC?1'TTGCA TTTGAGGAAT GGAACTGGAT CAGCGATCGC 'rTGATTCAAG AAGTGCGCAC AGAAGCCGCT TGAGCTGGAT ATCATGGGCT GTGATA'rrGA TCCTCGCATG TGCTCAGGTA GCTGGTGT'rG CAGGAGACAT TACTTTTAAG ACGTTCCGAT AAAATCAATG GAGTAATCAT N'CCAATCCG ACATGATGCA GGGGTGACCA AGCTCTATGC TGAGATGGGG AAAAA.AGTAG ACCGTGAGCT GTGGAAATTG CTAAGGCCAA CAGATGCGCG rGCAGGATTT CCTTATGGTG AACGTTTGTC CAAGTATTTG CACCGCTGAA 1380 1440 1500 1560 1620 1680 1740 1800 AACTTGGAGC AAATTTATCC TGACTAGTGA TGAAGCTTTT GAAAGCAAGT ATGGTACCCA AGCAGATAAG AAGCGTAAGT TATACAACGG AACCTTGAAA GTGGATCTAT ATCAATATTT 'rGGTCAGCar GTCAAACGGC AAGAGGTAAA ATAGAAACGG ATACTCATGA GTAAAAALAAG ACGP.AATCGT CATAAAAAAG AAGGTCAAGA ACCGCAATTT GAT'rTTGATG AAGCAAAAGA GCTAACAGTT GGTCAAGCTA TTCGTAAAAA TGAAGAAGTG GAATCAGGAG TCTTGCCTGA GGATTCCATT TTGGACAAGT ATGTTAAGCA ACACAGAGAT GAAA'rTGAGG CGGATAAGTr TGCGACTCGT CAATACAAAA AAGAGGAGT'r CGTTGAAACT CAGAGTCrGG ATGATTTAAT TCAAGAGA'rG CGTGAGGCTG TAGAGAAGTC AGAAGCTCT TCGCAGGAAG TTCCATCTTC TGAAGACATC TTACTACCCT TGCCTCTGGA CGATGAGGAG CAAGGCTTGG ATCCTCTATT GCTAGATGAT GAAAATCCAA CAGAAATGAC TGAAGAAGTG GAAGAGGAGC AAAACCTTrC TCGTCTGGAT CAAGAGGACT CAGAAAAGAA AAGTAAIAAAA GGCTT-ATr TGACCGrI-rr so 00 GGCGCTrGTA TCAGTAATTA TTTGTGTCAG TTCGACTAAG GAAATTGAAA CTTCTCAATC TTTTA.ATACA CTTTATGACG CCTTrTACAC CCAGTTTGAT AAACTGAGTC AACTCAAGAC ACATACGCTT GCCAAATCTA AATATGATAG TGTCA.ATGCT CAATTTGAGA AACCAGCTAT AGCCAAATCG GATGCTAAAT TTACGGATAT GCTAGATAAG GCTATCAGTC 'rTGGTAAGAG TGCTTATTAT GTCTACCGTC AAGTGGCTCG AACTACAGCC AATCAATCGG ATGTGGATGA AGATAGCAAT AAAACGGCT TGAAAAATAG TTTACTTGAT AAGCTGGAAG 
GTAGTCGTGA TCTAGCAACG CAAATCAAGG CTATTCAAGA TGTGGATGGT GTGTTGGATA CCAA'rGCCAA TAAAACTGGA AATACGGAGC TTGATAAAGT CCAGCAAACA AGTACTTCTA GCTCAAGTTC 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 j580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3369 a a a a a
AAGTCAAACT
ACCAAGTAGT
TGCAGGGGTT
TGATAGTAAT
TTCACGTTCA
TAACCGCAA'r
CTGTAAGACA
AGCAGCTCAA GTTCAAGTCA AGCAAGTTCA AATACGACTA GTGAGCCAAA TCAAATGAGA CTAGAAGTAG TCGCAGTGAA GTCAATATGG GTCTCTCGAG GCTGTTCAA.A GAAGTGCCAG TCGTGTTGCC TATAATCAGT CTGCTATrGA AACTCTGCC'r GGGATTTTGC GGATGGTGTC TTGGAACAAA TTCTAGCGAC CGTGGCTATA TCACTGGAGA CCAATATATC CTTGAACGTG TCAATATCGT GGTTATTACA ACCTCTACAA GCCAGATGGA ACCTATCTCr TTACCCTTAA G3GCTACI'TG TCGGAAATGG CGCTGGTCAT GCGGATGACT TAGATTACTA
AGCAGTCGG
INFORMATION FOR SEQ ID NO: 43:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 9713 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
AAGPTACAA TTTAAATGAA TTAACAATTT TCCCAACTAA AAGCACTCCA GTTACCGCAA CCTTTGTACT GAATCTACTA AATCGCATTC CATCAACTI'C ATCTGTTTCG TCAACTTGAA CAGATACTAA TTGAAGATTI' AATACTTCI'G CTG;CCATAGC TAGCTCCTCC TATTTAAATT TTTGGGATTA AGTACTTTAT CCACCCTCAT ATACTCTCTC CACCAGTAAA ATGCAAGCAA TGATACAAAA TAGATTTAAC TATI'TATAT AGCGAAAACT TACAAATTTT TAAGAAATAA TNr'TTTGCArr CTTAAAGATA AAATAGGAAC TTTTAGTAAT AAATATTAAA ATAAATAAAA TA.ATAGATAC TATAAAATTT GGAAGTATT1A ACCCCAAAAG ATTCATATCA TCTATTAAAA
a TATCCTCTA4 AGAGTAGTAT ATGAAGTAAC AAATGTCAAA TAGAAGTrCC TAAAATTTCA ATGCAATTGA AATAATATTT ATGCTACTTT TAGCTCTAAA TCATCTAAA ACTAGAAATT TATAAGACTT ACAATATAGT ATATAATTAT AGAAAGTAGT TAATCACAAT ATCTAAGATT TATCTTGGAA AAT'rTGTTGC ATTAAAGCCA TAATTTTAAT GTTAAGTAAA AATGCAATTA AATATAGCCT CACCAACTTT kATCTTrAACC ATCTGGTAAT AATTGCTGAA TCTCAATCCT TTCTTGATGC GATGACAAAA GCAAGTACTA TCAAAATTGG 'rGCTCCTACA TAGACAATAA TCACTGTCAT CTTGAAATTG AGATACTATA TTCTGAGAAA AGTAATATAG CTCCTGTAAT TGCAGCACTG ATAGATTTTA AAATrCCACT TCGAAACAAT GAACATAAAA TTATTrCTAA TTGATAAAAC ATGACTGTAT AAAAGGAGAT AATTGATAAA ACAATATTGA ATATTATCTG GGCCTTCGCT AA.AATTGTGC AAAGAAAGCA ACCAGATAAC ACTAAAACCA GCCAATAGCA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 a. *a a a a a a.
C
GTATTCTTTT TACTATTGAA AGAACATCCC TTAT TAGA ACTCTTCCTA TTTCTAATCT TCTTGAACGT ATAAAAGCAA CCACTTAGAA AGGCTAAAAA TGAAATCAAC ACTACTGTAA TGATACATCC AACACCACTC GTTTGAAATT GGATATCAGG TAATATAT'rT TCCCCGAAAA AGTATTGTAA AAAATAATAA TAATTTGACG TAACAAATAT AGAGCATAGA TATGCAATAA AACTAATAAT CGAGGAAATG ATAAAAATCT GTCCCCCCAC AAGAAATGAT AG7'TGAGGC GACTTGCTCC CAACACCTCC AGAAGTTCGT AATCATCTCT AAAAATTTCA ACCAACATAT TTATTATGTT AGAGAGCACA AAGAATAATG TTACTCCTCC GAATACTATC GGAAACATAA AAATTGGTTT AGGATCTGGA AGTCCGACAA ATACTTGCGA ATTATTCTCA ACATTAATTA CCCCATTAAC AGCCAATCCC ATAACTAAAC TCGAAACAAA AATTACTGGT GAAACGCCTA 407 ACCA7TG1-rT CTTATTATGT AAAAATTGAT AGTAAACTAA TCTGAGCATC TCTATTCCTC CGTAGrrGAT TGTACCTCTA AGATI'TATA CAACTCT'rCC CCGCTAGGTC TATGAAGrC r TGAAAATT TTrCCATCTT TCAATATTAA TGCACGATCA GTTTTCGACG CCAATTCTAT ATCGTGCGTT ACCATAA'rTA CACACTTACC CGCCCCTACT AACTCTCTCA ATAATCAA AArrACTTCA CGAGAAACGC TGTCTAAAGC CCCAGTTGGC TCATCAGCAA ATATTA'rATC ACTATCAGCA ATAACCGCTC TAGCTATAGC AACCrrC'rGT TGTTCTCCAC CAGACAGAGT TCCAACAAAA TCGTAAGC CAGCATTAAA CTTCA'I-Tr T GACTAAGT 7TITCTACATT TTAATAGTT AATTTTrTT GTGATAATCG CAAAGGAAG'r GCTATATI' CTATTACCGG CAGGGAAGGT ATTAAATTGT ATGCTTGAAA TATAAAAGAT ACTTCGTTAC GTCTTATACT TGACAATTTr GCATT'rCTGA T-TTTATAGGG GTTGATTCCA TTTAAAAT'rA CTTCCCCACT TGTTrGGTTCA AGCAAACTAG AAATACATTT TAATAAAGTT GACTTTrCCAG AACCACTAAT TCCTAGAATA CTTATAAATT CTCCTCTCCA AGCAGAAAGA GAAACATTTT TCAGCACTTG CAACGTTTTrA TTATTNCCTA GTAAAAATTG ATGATACAGC CCTTTCACTT TTAA'rATATA ATCrTTTATCC ATATTCTTGC CTCCAATCAC TTAA'r-rTA AAAGTGTTCC ATTrTZ.CAAT TTATATATAT CAGTGTA'rCT CTTGTCA'rTT A.AGTCATAAT GATGTGAAAC TTCAATAAAT GAAATACCTA AATTGAACAG AATATCATGT ATGGAATTTG AAT'rATCATT ATCTAA.ATTA GCTGATA'TTT CGTCAAATAA GTACACTTTA TTATTTCTAA TCAGAGCTCT AGCTAAAGCT 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 *5 S. S
ATTITTTTGTT TTrGACCTCC AGACAAATTA CTACCATTTT ATATCTATCT TTTCTAAT'rC TTCATATAGA TTTACCTT- TCATCTGAAA AATATTCATT 'rrGAAATAAA GTTACGTTCT ATATATGGTG TCTGATCAAC TGTT~GGTATT GAATCTGAAC AAATTTACAT AACCTTTG TGGCTTTAA.A GAACCATTAA TTCCCACTAC CAGAAGT'rCC TGTTAATAAT ACCCTAAATG ATACTTAA?'r TATTTTCTGG TGTAATAGAA TATACAACAT ATTGATGAAG TATACAGTCC GTTATTATCA TGTTCAGCGT CTTAAGTATT TTAAAAACGG TTTCCTTAAA TCTTTGGT-rG TAGGCAATrG ArTGTATCGG CCCTAAAACT TTATCGTTTG
CACCACATTG
TTAACACCTC
CACGAATAGT
TCTTTTTCCC
TTAAA=~AA
GTGACTTAAA
CTTVTCATGTG
CTA'rAAAATT TATT'rATCTT
CTAAGAAAAT
ATCCTACAAC
AAAGAACCTC
ATAATTTAGT 2640 AATTA'rCTGA 2700 AGTGTCAA.AA 2760 ATGTGATAAC 2820 AATCGTTGTT 2880 TGAGAAGTCA 2940 TATCTCATCT 3000 CTTCTCTCCA 3060 ATTTAATGAA 3120 ACCTATCAGr 3180 CAAGGGAACT 3240 TGATCGV1'TC 3300 TCACTAAAAG AAAGGCTTT AGAAAGCAAA AACCTGAAAT ATGATAAATT ACAAAATAAC TAGTACTGCA ACCAATTTTG 408 AAATTAAAAG TAGAATCTTC TAGTTTATCC AACTTTTT-AT CCGACAAACT AATTATTTCT TTAGAACAG AATA~AATTT TAAGTCTTA AAACCArTrAA AAATTTCTTT' TATTATGTGA GTATACTCTC CATIGCTGTr AGAGTACTCA TTAGCTGAAT AAGACAGGTA CTATAATCGG CAATGCTGAT AATACAATAA TTI'AAATAAA GCATAAAACT TAGAGAGACO ATGAACAACA AAAATTTGTC TAAAATAGT' rrCTrCGAT- AATCTCAAAT ATAGATGAGT AATC??rAAC CATTTCAGAA GAAAGATACr TrAA'I-I-TTA CATTTATATC 'TrTAGI-ATr GATGCTTCCG GATATATAGA TTGCTGACCA ACCCAGAATA CTTATAGCAC AATGAGGAAG TCTGATTTAA ACTACCTGC-A TATACAATAA TTAA.ACGAAG ATAGAAA'rAT TAAAATCCCC ATTAATATAA TAGACAACAT CI'CTTCATA ATATTA'N'GA nAcTAGGAAG ATATrGAAGA AATTATTTCA CATTTGACAA AACTGAAZATA GTrCTCTAAA ATATCCTTGT TACTTCTAA ATAGTAATTT CAAATCTTAG AACGTCAGAA TTCCTGAGAG CAAGACACCA
GTTTAGTCTT
0 0 0 00 *0 0* as 00 0 a *0 @0 @0 0 0 TrTAAATAAT 'rCATAAGTrA TTCCTTCCCA CTTCTrCAAA GAAATAATTT TCATTAAGAG AACATCTGAT GGAGTAAAAC CTCCATGACC AGCTGCTTTG ACAAACTr'rT AACTCCAATA GAATTTAATT- TC?1'TGACCA CTCTATCACT TAATATATGG G'rCTrrCTA CCCAAAATAT TAACTATAAC AGTA'r'I-GAG TTTCAATATT TrGCATAGGC GAATATGACT TATATAAGC CTTTACTTCA TATCTCCCCA CTCTGCTATT TCGGTCTTAG AAAGAGGATC AT .TGGATTC CATAAGGAT'r TATAAATGGC GAAAATAAGA GAATG3CTTTG CAATAAATTT TCAACACCGC ACCAGCAATT ATTCCACCTG CACTAGAAGT TATTAAACCT TGTCAATTAC ATCAT'I-rCC CTTAAATAAT TTACTCCCTC AATAAAATCT
TTTTATAAAT
AAAGTATCAA
TTTAAATACA
TCGTTATTAT
TCTCGTGCCT
GGGTCTCTAA
TGAAGTGTAT
'NTTTCCTCGT
AATCGCTTAC
CTGATAGAAT
3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 @0 00 00 0 0000 0 *@00 ao.
00 @0 *0 *0 a 0 TCCATTTGTT TAACGCCTTT CCTGAGCGAT ACCATTCACC ACCCAAATAG CCTCCACCTC TTACATGAAC TATAGCATAA ATAAAACCTG CATCTATTAT AGATAACATA ATTTCATCTA AATCAGAAT'r ATCATTCTTA CCATAAGCCC CATAGACACT TAGAATACAT TT71rTTrC~ TTGGGAGCTC ATCCGTATCT TCACTTTTCC AAAATAAAGA AATCGGTATG CTTACATCAT AACTGTCTTT TTTAGTCCAA ATCACCTTAG AAAAATATTT AGTATTATTC GA7=TATGA TGGGTCTTTC AAATTCAGTT- TTAATGTAT TTTCTATTAA ATCAAAACTA AGTA7TTTTT CGTAAAAAGT TCTCCTCTCT AAAAACAGAA GAACACGATC AGAAAATGAA TTTTCATAAA GTGTTGTCTT TTCATCAAAT GTTATCTTAT TAACACTCAA CTCCCTCAAA CTATTA7TM TAAATGTAGC AAGATAAAAG ACGGAATTCG CTGCGTTTGA ACAGTCTAAA AGGATATAAC GTCCTATACA GTG.AACTCTr CTAGCCCTAT CTTGATATGG TATAGTAATA GAAACTCTGT 409 CTCCCGAAGA AGTTTCCCTT AGAATI'AGTT CAAGAAAGTA CTGTGC7=T TCTGTACTAA CCGTrcTGT GT AAGG1T CTAACAAAAC TGCCTCTG AAAGTCTACT GACTTTACAA ?rAATCGAAA AGAGCAr'rCG TTITCTTCAA ATTTATAGAA TAACTTACTT GGCCTCCCGT A'rTACCCAA ATAGAACGAA CrrTCTACTG TAGTATATTT TT=ATTCCA A'rTAAATTAG TATAACTCTC ATCAGATGAA ATCCTAACAT ATT~rrCCAC ATCAAAGACA ATTTTAAGTG CC7'rAGGAAT CCAGTCAT'rA TCTTCGACAT TACTCCAATT A'rCAAA'rTGG TACCAATATC TTTTTAATTC CTGAAATGAT GAAGAGATAG TATTTAAAAA TATTTCA'rTA CTCTGATTCA ATATTATATT TGAGGAAGAA TCGTCAATTT AACACCCGCA .AA'rCCCGAAG CAATATCTGT AACAAATCCT TCTTCAATTA CACACAAA'rA TCCTGAAAAG T'rATCATCGA TATCACTATA TCCGTTA.AAT AAACCTGGTA ATACACAAAA TAAATATTTT AAGTATTTGC TTGACAAGAT GATCTTT'rr TTCTTCAGTT GAAGAGAGCC ATAGAGCGAT ATCTCTAGGT GTTGGGGCTA CCGTCCGGTC GAAACTGTAT AGAAAAATCC AACAATTATr GCTATCAATG TGGACTATTT ACAGTTCCTC TTCTGTAAAG CTA'rCAAAAG ACTCTrGGA GCCTATAC ATAACACCGA AAATATCT'TC AATGATAAAT AACTCTTCCA TCGTACGCAG TGACCATACA ACCAAAACTA CCTGTAAGAT ACTATCATCT GGCAAAGTAT AATTTGAArr GTCTAAACTG GAAGAACTAA 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820
ACCATTCCTT
GCCCTCTCCT
ACCTCTTATA
CAAGTATGAC
ATTATCCAT1T
TGTTATCTTT
TCTATAAAGT
TATATTATTA
GTGTG=
CCCTTAATAA
ATTGATACCA
AAACCATTAT
TGTTCAATTA
GCAACTTCAA
TCCTGTATTG 5880 TGGTATCTAA 5940 ATCCAATTGC 6000 CTCCCGCAAT .6060 ATTTCT7TTTG 6120 GACCACAAAA 6180 CTAAAGAACT 6240 GTAAAAGCAG 6300 ACTGCAATGA 6360 TATTACACCA GTAT'rGGGTA AAATATCAAA GAATTCCATT AACTACATCA GTTGCCCTCT TTCTTTATTT CTAT'rAATAA GCCAGCACTT CCAGTTGCTA GATATGGTAG TAATCTATGA CCTTGCCTGT ATTATTACTA TCTAC'NTrAT AAGCAACTAA 'rrCTTTATCT ACAGCCAAI-r TTTATAGATA CTTCACCAG TTAATTTATA AGCTTCACCG AAGAGCCAAG GTGACCATAT AGTAATCCAC CAAAATTCTC ATAAGGATCG TTACTCTGAA GCCAACTTTA CAAAAAGTTIT CTGGATTTTC TATATAATTT AAAC'rATATT AATTAGTATT TCTTCTCCTA GTTTATTATC AATTCCCCCT TTACTAAGAA AACCAGTAAA ATTCCAGCCT GCCCACTATA TAAA'TrTA T'TTTGTGAAT CTCTATAAAA TGAGTTGTAA AAAGT'rCAAC TGCCCGATCT ATCTCCCCAA GAGCCAGATT GTACCAATTT TACCATCAAA AAGACCAGAA AGGGACGATT
CTAGACCATT
CTACCCCTC
CATCACTAGC
CTCTAAGCCT
AATACAGTCC
TCTCAAATAT
ATTCATAAAT
TCTTAAAATT
6420 6480 6540 6600 6660 6720 6780 6840 ATTTACTGCC TCAT'rAATAA TTTTACTTTC TTAGCTAGTT TCCTTGATTA AAATTCAGAC ACGCCACTGT CCATAAGTTA ATCTTGCTCA CTCTCGA'rAG AATATTC~TA ATCTGTGGAC TATATGTTCT CTTGATAAC 410 CCTGTGTrCG AATCTCATAA TAGTCATCAA ACTAAT GTTGA'rAACT CCAAAGCATA GCTAAATCTG AAAACGCAxr CATAATAATG AACTGjGGAAG GCGTAAACCC TCTCAATAA'r AA'rCTTGATT GAAATTCT1-r ?I'TATAATAA AATCTTGTAT
TTCTAATCTC
AT'N'AGAATC
CCAAAGACTC
ACI'ACATAG TTGTATGTCA AATCCGATGT TA7l-rGTC'TA ATACCATACC AATCTATCTC ACCAGGGGCT GCAACTTTAT GTACAACTTT GTCTATTATA CTAATCTCAT CrrCATCCT'r ATGATATACA TTTTCAGAAT GAAACTTATT TGAAACTCTC AATAGATAAT CTTTGGTCTT GGTAACCCAT CTA'rTTAGTG GAACGCCCTT CCAGATCTTA CCCTGCCACT ATATTT'rAGG TGCTTTGCAC TCCGAAGCTA ATTTCTCTGA ATACGGTCTA GCCTCTTTTA AAATTAT!TTT TATCCCACCA CTGTTTGAAA ATCTAATTGC TN'rTrTATCT TTCTTGTCAA GCCATTTATT AATTTTAAAT ACTGGTAAAC GTTCATCTTI'
ATGCATGGGT
TAGATATGAC
AAATAGTTTT
AGTTACTAGT
ACTGGGAAGT
TTCATCAT'TT
AACCAAGATA
CGTTAAATCG
ATCAACAACT
TTAAAACTT
AATAAACT
TT'rCCT'rCTA
GGCATGTATA
GTTCTCGCC
GAAAAGACAG
TrrCCTAAAT ATGACTTTrr
TCATATAAAG
CATATGTTCA ATTCCTAAGA CGTCTCACTC CAT'rCA'r'-A AGAATAAGTA CCATCAAATC TTTCCCATCT TCTTTTAGCC ATTATCTATA ATAAAGGGAA CAAAAAGTCA GGGGGCACTA AACAACTTCA TCGCCAACAA CGGCCTA AAC ACACCATACC
TTTTCCTGGA
CTACATA.ATC
TCCTGGT'rTG
AATAATGAGC
ATGCTCTAAA
CCTGTTCCCA
GT1AAATCTTG
CTACTATCTT
GAAAATTATT
AGGTGTGCTC
GAATTT'rTAG CTAGACC'rGT
TAGCATTATA
AGTCTCCCTG
TACCT7TTTGG
TTAATTCATC
TCAGATATA'r 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860, 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 AATAGCAACC TTC7rTTCAT TGGTGCTTCA TCCCAACGTT ACTTTCTAAC CTTTGCAAAA 'rTTACCAGAA AATCCTCGAC ACTAAGATGT T'rAAATGGAA TTCATGTGAA CTGTTATACT TTTTCCAATA GGGTATTGAT AGGCAGACTT ACTTGGTACT GAAAGGATGC TCCAAAT'I'GA
CATCCCTTGA
TATCGCTTAA AATATATGGC CCATTATAT'r GCNTTAAGGC CCGACTCTAA TTCATTTTGA TTTGGATAAC ATGTAATAAA TAACCAATTT CCCGTTTCGC ATCATAAAT'r TGTCTTCTGT TTCGCA'r'r'C ATCGCAAATT ITTTGCTACAT CTTGTA.ACAA CTGAACTAAT GTGTAT'TrC CACCCrrGTC 'ITTTCAACAAA AAACCCAC'rC ATCATTAT'1C A'rTACTTCCT GCCAArTA.AA TTATGCTAGT ATCTGTACTA TAATCATTAT TAGTGAAAAA Ax1TTATAATC CATAACAAAA TCTCCAAGAA A=TTATCAA TTTAAATAAA ACTrAATATA TCTATAGCDA GACAGACTTA rATA AGGATCT'GGT A.AGGGAGAAT CCTTTGGATT CTCCCCATAT AAGCAC3'AAC ATTCCAACGT GAGAATCTCT AAAGTXTACA ATTAAATGA AGTrACCGCA ACGATTTGTA CTGAATGTAC CGTCAACTTG AACAGATACT AATTGAAGAT CCTATTTAAA TTTTTGGGAT TAAGTACwrT AAAATGCAAG CAATTATACA ATGTTGTCAC AGTAACTTCC ATCTCTCTCC CAAAACTGGA GTCACCTATT TTAAAAAAGC AGCAAACTAT CCATACTGCC CCATAAGTCA GAI-rTATAGC CATACAAACA CCAAGCTAGA ATGGTTCCTG TCAAAGCAAC TCTGATATCT AATTTTCTGA ATTCTTCAAC CATACTCGCA TTGATTAAGA GTTGAAGGCC AATTAAGT'rT GCTTGATTCG ATAGACTTAT AATCAGTAGG CTAACAAATT TATTGACCTT ATGCGCTTGT T'rGCGTTGGC AACCATAGAG AATCTGTAGT ATAGTTh1ACT GAGrTACCAA TAsGACATTT ACTTGTTGGA TTACCTCCGA AATAAATCTT CATAATCTAA
GCACATATTG
ATTAACAATT
TAAATCGCAT
TTrAATACTTC
ATCCACCCTC
ATAGAAAATA
AGTTAGTT
AAACTAGTAG
GAACGACATC CATAACTCCA TTCCCAACTA AAAGCACTCC TCCATCAACT TCATCT3TTT TTCTGCCATA GCTAGCTCCT A'rrATACTCr CTCCACCAGT ATGTTTCCGT AACTTTTCAA AGAAGTTACC TAAAAATCAG GTTCCACACC AAATGTAGTC GCACCATACC TAAAAACATC CCAAGTGAAA TATGATGTGC TAAGGCAAAT AAAACACTTG CCAAATTCCA TAAAATTTCT CGATACAGAA ACAATAAAAA TGAAAACCAA GGAA'rTTGAT TGCTTCCTTG AGCATGAATC AGACTAAAAC CAACACCAAG CCATTTCATC CTAGATTTCA CATACATCCA TAAAAAAGAA ATGAGTGACG CACCGATACA.AAGAAATTTC AATAAGTATA ATATATAAAC TGGAATTATT CT'rTTCATAG ATCTAATACC TGCACAATCC ?T 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9713 INFORMATION FOR SEQ ID NO: 44: SEQUENCE CHARACTERISTICS: LENGTH: 8657 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: AAAGAAATTG TCAGAGAGTG GCTAGATGAA GTAGCAGAGC GGGCTAAGGA CTATCCAGAG TGGGTGGATG TTTTCGAGCG TTGCTACACC GATACCT'rGG ACAATACGGT TGAAATCTTA GAAGATGGTT CAACTTTNGT CTTGACTGGG GATATTCCTG CCATGTGGCT TCGAGATrCG ACAGCCCAAC TCAGACCCTA CCTTCATGTA GCTAAAAGAG ATGCCCTCCT GCGTCAGACC ATTGCAGGTr TGGTCAAACG TCAGATGACC TTGGTACTCA AGGATCCCTA TGCTAACTCC 412 TTCAACATrG AGGAGAACTG GAAAGGGCAC CACGAGACTG ACCACACAGA CCTTAACGGC 360 TGGATCTGGG AGCGCAAGTA rGAGGTGGAT TCGCT~rGCT ATCCTCCA GTTGGCTTAT 420 CTCCTCTGGA AAGAGACTGG CGAGACrACT CAGTTTGATG AGATT= CGCAGCGACT 480 AAGGAAATTC TCCATCTGTG GACGGIGGAA CAAGACCACA AGAAC'TCTCC TTATCGT?1TT 540 G'rCCGAGATA CGGACCG'rAA CGAAGACACC TTGGTAAATG Xr=CTTGG ACCTGACTTT 600 GCACTGACAG GTATGACT'1G GTCAGCrT CGTCCGACTG ATGACTGTG CCAGTATAGT 660 TACTTrGATTC CGTCAAATAT GT'rTGCTGTA GTAGTCT'rGC GT'rA'GTGCA AGAAATCTTC 720 GCAGCATTAA ACCTAGCTGA 'rAGCCAGAGT GTTATTGCTG ATGCCAAGCG TCv'TCAGGA'r 780 GAAATCCAAG AAGGAATCAA AAACTACGCT TACACCACCA ACAGCAAGGG CGAAAAGATT 840 TACCCTTTTG AAGTGGATGG CCTAGGAAAT GCCAGCATCA TGGATGATCC AAATGTACCA 900 AGTCTACTAG CTGCGCCCTA TCTGGGCTAC TGTTCGG'rCG ATGATGAAGT GTATCAAGCT 960 ACTCGTCGTA CCA'rTTTGAG CTCTGAAAAT CCATACTTrCT ACCAAGGAGA ATACCCAAGC 1020 GGTCTCGGCA GTTCTCATAC CTTCTATCGC TATATCTCGC 
CAATCGCCCT TTCTATCCAA 1080 GGCTTGACAA CAAGAGATAA GGCAGAGAAA AAATTCTTCC TGGATCAGCT GGTTGCCTGC 1140 *GATGGTGGTA CAGGTGTCAT GCACGAAAGC TTTCATGTAG ATGATCCGAC CCTCTACTCT 1200 **CGTGAATGGT TCTCCTGGGC TAACATGATG TTCTGTGAG;T TGGTCTrcGA TrACTTGGAT 1260 K-A'rCGCTAAG GGGCTCGCTT TAGCTCAACC CATT"CTTATlC AGAATCACAX GTTTACATT'r 1320 *AAAACGTTAA AATT'rAAATT TAGAATGAGG TTTTACTTCA TGCAAAATGT TGTTGTACAT 1380 *ATTATCTCAC ATAGTCACTG GGATCGTGAG TGGTACTTGC CTTTTGAAAG CCATCGTATG 1440 CAGTTGGTGG AATTGTTTGA CAATCTCTTT GATCTCTTTG AAAATGACCC TGAGTTCAAG 1500 AGTTTCCACT TGGATGGACA AACTATTGTC CTTGATGACT ACTTACAAAT TCGCCCTGAA 1560 *AATCGCGACA AGGTCCAACG CTACATTGAC GAGGGCAAAC TTAAAATTGG TCCCTTTTAC 1620 ATCTTGCAGG ATGACTACTT GATCTCCAGT GAAGCCAATG 'rCCGCAATAC CTTGATTGG'.' 1680 ***CAACAAGAAG CTGCCAAATG GGGTAAATCA ACCCAGATTG GCrACTTTCC AGATACCTTT 1740 GGAAATATGG GACAAGCGCC TCAAAT'rCTT CAAAAATCAG GCATTCACGT GGCGGCCTTT 1800 GGTCGTGGTG TGAAGCCGAT 'rGGATrrGAC AACCAAGTCC TTGAAGATGA GCAGTTTACG 1860 TCTCAGTTTT CAGAAATGTA CTGCCAGGGT GTGGATGGTA GTCGTGTTT AGGTATI'CTC 1920 ***TT1TGCCAACT GGTACAGTAA CGGGAATGAA ATTCCAGTTG ACAAACATGA GGCCTTGACC 1980 TTCTGGAAAC AAAAATTGTC AGATGTGCGT GCCTACGcrr CGACCAACCA ATGGTTGATG 2040 ATGAACGGCT GTGACCACCA GCCTGTACAG AAAAATCTGA GCGAAGCCAT TCGTGTGGCA 2100 413 AATGAACTCT TCCCGGATGT AATCTTTGTT GTAGAAGGTG CGCTTCCTGA ACACTTATCA ACAG.ATGGCT GGTACACACT TGCCAACACT TTCCAAGAAA ATAGCAACCT CCTAGAGCAA GGACACAACC ACAAGGACCA GTTGACCTAT CATAGTTrCTT TTGATGAATA TTTCAAGCT ACTGTTACAG GCGAGTTGAC CAGTCAGGAA TCTTCATCCC GCATTTACCT AAAACAAGCC GTGGTAGAAC CCTTGACTAT TATCACTGG'r GCTTGGAAAA CACTTTGCA GAATGCGCCA 2160 2220 2280 2340 2400 2460 2520 CATGATAGTA TCTGTGGCTG TAGCGTGGAC GAAGTTCACC GCGAGATGGA AACGCGTTTT GCCAAGGTCA ACCAAGTAGG AAACTTrTGTT AAAA~TGCTA CGCATAAGGC TCAAAGTGAC AAAAGTAAC'r TGCTCAACGA GTGGAAGGGT TATCTCT'N'A CTGTCATTA.A CACAGGCTTG CATGATAAGG TCGATACTGT 'rTGCACCCAA CAGAAGGCTA GAGGACTTGG ATGGTCGTCC CAGCACAGTG ATrGATGTGG CGACTTGTGA TTTCAAGGAA CAAAAAGATG GCTGCTCTTA TCTTGCCAAG TTACCGTGTG TGTAGAGGCT 
ACAATCGAAG ACCTCGGAGC TAATTTTGAG 2580 2640 2700 2760 TATAATTTAC CAAAAGACAA GTTCCGCCAA GCTCGTATTG CTCGTCAAGT GCGCGTGACC ATTCCA C ACCTAGCGCC GCT'rTCTTGG ACAACCTTCC AATTGCTGGA AGGAAAACAA
GAACACCGTG
GTGGATGACA
CGCTTTGAAG
GAGCCAATCT
AGGGTATTTA
ACATCACAGT
ACCGTG;GGGA
TTGCAGAGCT
CCAAAACGGA GTGATTGATA CACCATTCGT AACGGTGAGT CTATGACAAG ACAACTCACG AAGCCTATGA AGACTTTATC CATCGGAAAC GAGTATATCT ATCCAACC AAAAGGAACA TAAGGGCCAC GAGGTC'rTGG AAAACACAGC TTGCTATGCT 2820 2880 2940 3000 2060 3120 3180 3240 3300 3360 AAAATCTTGC TCAAACATGA ATTGACCGTG CCTGTCAGTG CGGATGAAAA GCTAGAAGAA GAGCAACAAG GTATCATCGA GTTTATGA-AG CGTGAGGCTG GACGGTCAGA AGAATTGACA AACATTCCTC TGGAAACTGA GTTGACTGTC TTCGTTGACA ATCCACAAAT CCGCTTCAAG ACTCGCTTTA CTAACACTGC CAAGGATCAC CGTATCCGTC TCTTGGTCAA GACTCATAAC ACGCGTCCAA GCAATGATTC TGAAAGTATC GCTGCTTCAT GGGAAAACCC TGAAAATCCT GACGATGAAA AAGGGGTGAC TGTATCCAAC GATAACACCA TTGCCGTGAC CATTTTGCGT TTCCCAACGC CAGAAGCACA ATGCTTGCGG CACCAAGCCC AAGAACGCTT CTCAGCCTA'r ACCAGCCTTC AGCTTGCTAG ACAGGAAGGA CA7NT=TTC TCAGCATACC GCAAGTTTGT TATGAGGTGG TGACACCACC AAACAAACCA CAACACCAAC AAGCTT'rTGT CAGTCTGTAT AAGGGATITGA ATGAATACGA AATCCTTGGG GCATCAGGTG AGCTAGGTGA CTGGGGCTAC GAGTTTGAAG TCGAGTTTGC ACTTGAATGC CGTCGTGCCA AAGCCTTGCA GACACCGTTT AGCGTGGTTG CGACTGGTAG CCTCTTGAGC CCAACAGCCT TTAAGGTAGC TGAAAATGAA
GAACGCTATG
CAACATCTCT
CCACAAGAGA
ATCAAAAGAA
AAAAACAATG
GACTCCTGAT
'rGCTTCTTA
TCCTTGACCT
TTCGTACAGA
AGGAGGGGCG
ACCATTGCAA
GGGAAAATAC
414 CTACAATATG TGTAGTGAAA ATGTACGTrGT CCCAGAAAGT ACTnrGAACGA CCATACCCAG TTCATTCAGG ACTATTGGCT ATTCATCAAA AAAGAAGAAA TTTAATr'rCA AAAAGTAAAC AAAAAGTAAG AACTAACrC TGATTCGCCC CTTTTATGGT CGATTGATAT CCGAGGGACT GGGA'rTAAGT TTGCCAGTCT TGGATAAGAC AAGTATTTCA ACGCCTGAAA ACT*rGGA TTTACTAGCG TGGCTAGATC AACGC?1'GTC CCTTCCAGGT GCAGTCAATC AAGAGACAGG AGAACAGGAT TACACTGGGA TTGCTATGAG TGTGATTGAT GGCTTCAGTG CGGTGCCCTA CATCCATGGC TTTTCTTGGT AAATGATGCC AACTGCGTrTG AGCCTGTGTC GTGATTGGGA TCGAGGTCGC CACCGTCTGG AAAACTTAAT AACTGGTCC
ATGAGGCGCT
GACTCAGTGA
CAGGGATTGG
GTGGAGAArr
AACTAGCATC
I. I@ AAAATCTGGT CA'rACTGATT GGGACGGTCG TATCCTTTGT CAAGAAGCCA TTGAGCGCAT TATCCAGI'AT CTGATCGATC CAGGTGTCAT AGATTTTATC CAAGGTGTCA AGAAGGCTGT CACGGTCGCA CCAGTTATCC AGGCCTGCAC TCTTGTCAAC TGGCTACAGG AGGAAAAGCA CAAACGCAAG CTATTGAGGT TTTAAAAGGT GTCACTCAGT CTGACCAAGC ATCTATCTCT ACCTACCGCA AACCTCACCA ACTTTATCGT GAAGCTGATA AAGTAGAGAT TCAGGAACAA
TAGCTCTTAT
ACTACTAGCT
CGGAGCCATG
TGGCTACATG
AACTGGGAAT
CAAGATTTAC
GAACCGCAAT
CAGTCTGGGT
TGAAGACTTT
C1'ATCACGCA
ATGGTAAGAT
CACATTTCT1C
ATCGAGGGTG
GCCTTGTCCT
GCACCTTACC
CAGCTACCTG TCCATTTAGA CATCCAGAGC TTGAAAATGC ATTATCAATG GTAGACTTCA ACAACCCTTG CCCCTGCTGA ATGGTACGAT ACGTGATTGA CAAGAGGCCG CAGCTGGTAA CTGGCGCAAG GCTTCCAA GGCTCTATCA GTCAAAATCC GTCG;ATGCCT ACGAAGAATA GATGCCAATC TCTACGGTGC TTACAGGACT TAGTCTCAAA TACCAGATGT GGAAGTGGCT AGGAAGGTCA CTATCAATTG 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640
TGTTGGTAAC
AAGATT'rGGC
AGCAGATGAT
ACACTTACCA
AGTTCTAGCA
TTACATGGTT
TGAGATATTG
GATTGAAGGG
GACTGTTCTC GAAATGCGGT GCTCTCATGG GCTACTCAAC CAGCCTrACI' T'rGGCTATT GCCTATGCCC AACAGTTrGA TCGGCCTTTG TCAAATGGGG CTTCTCATTG GCGAAGAAAA AAACTGAAGA CTCGCAAGGT GCTGAATGTG GCTTCTGCCA C GAGCTT TACATGGAAG CCGTGGAGCT TATTCAGCAG AGGAGTTGCA GGAAATCGAA CGTGACCTTT GTACCATGCA TCCAGACCTT GGCCCACrrG TGTC.AAGGAA CTGCAGGAGC TCCGTGATGT GGTTTIATGAC 'I-rGATTGATG GCATGIrTGC CAATATCGGG ATGGACGAAG CCCACTTGGT
AGAGGACATT
CACGTTGTCT
TCGTTTGGGA
415 CGCTACCTGA TTCTGAACGG TGTTGTGGAT CGTAGTCTCC TCATGTGCCA ACAC?1'GGAG CGCGTrGCTGG ATATTGCrGA CAAATATGGT TTCCACTGCC AGA'rGTGGAG TGATATGTrC 7rCAAACTCA TGTCAGCGGA TGGCCAGTAC GACCGTGATC TGGAAATTCC AGAGGAAAcT CGTGTCTrACC TAGACCGTCr CAAAGACCGT GTGACTCTGG TrrrACTGGGA TTATTATCAG GATAGCGACG AAAAATACAA CCGTAA~wrTC CGCAATCATC ACAAGAT'rAG CCATGACCTT GCATTTGCAG GGGGAGCTTG GAAGTGGATT GGCTTTACAC CTCACAACCA TT7,rAGCCGT CTAGTGGCTA TCGAGGCTAA TAAAGCCTGC CGTGCCAATC AGATTAAAGA AGTCATCCTA ACCGCGrGGG GAGACAATGG TGGTGAAACT GCCCAGTTCT CTATCCTACC AAGCTTGCAA ATCTGGGCAG AACTCAGCTA TCGCAATGAC AATACTGGTC TAACGGTTGA GCATTTTATG CTACCAGGCA ATCTCAGCGG TATCAATCCC TGTCCGATTC TTGATCAACA CATGACACCT GCTGAGACGC TTGCTAALCAT TAAAGAAAAA CAGGCCCAGT TGAATGCTAT TTTAAGTAGC GCCTACCAAG CGGATGATAA AGAAAGTPTA CTTACGAAGCC. .AAATTGAACGA CTTCCATC-CC CTAGATGG'rT TGTCTGCGCA CTTCAAGACC CAGATTGACC TTGCCAACCT CTTACCAGAC AACCGC'rATG ?TTTTrATCA GCATATTCT GAACAGGACA AACCGCACTT CCCTCAGGCT GCTGGAAACT ATGCC'rATCT C I- IGAAACT AAAGTAGATC TGGGACGACG CATTCCTCAG CAACAAATCG CCAGACAAGA ATTACCAGAA CT=~TAGCC ACCAATGGCr GAAAGAAPLAC 0 69 00 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 66CO 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 5l S S
SS 65 5 0 6 AAGCTCTTTG GTT'rGGATAC AGTTGACATC CGTATGGGCG GACTCTTGCA ACCCATCAAA CGAGCAGAAA GCCGTATCGA GGTTTATCTG GCTGGTCAGC TTGACCGCA'r CGACGAGCTG GAAGTTGAAA TCCTACCA'!r TACTGACTTC TACGCAGACA AGGATTTCGC ACCAACTACA GCCAACCAGT GGCATACCAT TGCGACAGCG TCGACGATTT ATACGACTTA A'rAT'rC'TCG AAAATCTCTT CAAACCACGT CAGCTTCCAT CTGCAACCTC AAAACAGTGT TTTGAGCAAC CTGCAGCTAG CTTCCTACTT TGCTCTTTrGA TTTTCATGA GTATAAAAAC AAGAACACCT PGCTTGGCGC AGGGTGTTTC GCGTGAAACA GAAGAATTAT CTGGTTTCAA ATGCTACAGT TAGACAAACT 'rATGATAAAA TAGCAGAAAG TGAATG -rrC CTAAGAGCA-A TTGGAGGTAT TATGCTACAC TTAAAATTAG TAAAACAAGA AATAGAAGCT GAAAAGCCAG CATCTGTAGA AGCTTCGATC ATT'rCCGTCA AATTrAAAAA AGGTTGCTAC CGACATATAT AGATTCCAAA AACAAAAACG TTAGCCGAAC TAGCAGATGT GATTTTATGG AGTTTTGATT TTGCAAATGA TCATGCTCAC GCAT=TTCA TGGATAATGT TGAGTGGAGI' CATGCAGATT CTTACTrrCG TAGCTTTGTT AGTGACGATC TTGAAGAACG rrACACAGAA AATGTCTATC TGGATAGCCT AAGTGTCAAA CAAAAATTTA AGTATTTTI 416 CGACTTCGGT GATGAATGGC CCAAGAAGCT TATCTCGTAC TGGTTrTTGAC TATGAAGAAT
CCAAGTGCTG
AACGTCGCCA
AAATCAGTCT
CATGATTGAT
GATAGGTTGT
AGAGAAATCG
GAACAATATC
GTGTAGGCTr
AATACCAGCA
TGAAAACATT
AGACAGAGGA,
CAGATTATGA
AG'rATTTCAA
ATCAAATTCA
TTAAACGTTT
TAGACTTCCT
TrCGTAATCC
TTACTTTGGC
GCAAAACTAG
GAAGCGT-rTA
AAAGATGTTC
GT'rTTGAATG 7440 GTTCGGTTGG 7500 CGTAAAATTG 7560 AATCCTAGTT 7620 CCATGATTrC 7680 TCAACCTTGC 7740 GGCTTGAGTT 7800 TGATAACCAC 7860 A.ACAACT1TCA 7920 CTTGTGACAA 7980 AATTCTTTTT 8040 TTCTCTCCTT AGATAGCGCA TGGTTATAGG CTTTATCTTC AGcTGTrAGT TGCTGGATT ACGTGAAGTT TGTCCTTGAG GACATATCT1 CATGAGCCCT TGTCAGCCAA GATTTrACCA GCT'rGTCCGA TATTrCTGCA ACTCATTTTG TATCATGACA ATAGTTCACA GTGATATCCA AAGAAACAAT TCTCCCTrGA TCGCTTGAGC CTTCATAGCG TGAAATTTCT TTTTACCAGA ATCATTCGCT TTAGGGCGAT TGATTTTTAC TTCCGTCGCA TCAATCA'rTA CCGTGTCCrC AGAACTAAGA GGAGTTCTTG AAATCGTAAC ACCACTT'rGA ACAAGAGTTA CTTCAACCCA TTGGCTCCGA CGGATTAAGT TGCTTTCGTG AATACCAAAA TCAGCCGCAA TTTCTTCATA AGTGCGGTAT 'rCTAGGC'rTA AT'rTAGGTTT TCGTCCACCT 'N'TGCGTGTT TAAGTTGATA AGCTGTTTTT AATACAGCTA ACATCTCTrT AAAAGCTCGTG CGCTGAACAC CAACAAGACG CTTAAATCGT GTATcAGiVrA ATTGTrACT TGCTTCATAA T~rC~CAGGG ACTrA7"rGA C-TCTTTGGTA GGTGTCAATG T7rT'rTTCAT CTATCCCGAG AATTATTTTTC CCGCCATTTG TATTTGCAAA TGCTGAGTAG G'N'TCCCAGA AAGACTCTGG AAGATTGTTT TTAGCT'rTTT TGTATTCTAA ATCAACCCCT TCAAATTTTA AGTCCATATT TTTCCTTTAC ATCTGTTrTT TGTGGTTCTG GTATTTGTrC AAGTTGAGTG ATAATATAGC GAA'rTGAATT TCGAGAGTTT TTACTCAGTT AA'N'TCTTTT TTAACCC INFORMATION FOR SEQ ID NO: SEQUENCE CHARACTERISTICS: LENGTH: 11384 base pairs TYPE: nucleic acid STRANDEDNESS: double D0) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: TCTAT'TGG GTATAGACTr ACCTATAAAG AAAAATATCT ATACACTGCC TTACTAGCTA TACTGAACGA GTCAACAAAA ACGATATATA TTGATGATAT AAATACAGCA AGATTTTTTA 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8657 ACTTCTrGG CAATGATATT CAGAATTT TTCATTTAGT TGCAACATAT ATTAAAATCA TAAATA?1'TT AGAAGATCAA TGGTTGATCT CATGTATAAA
CCTAATTCGT
ACGTACGTTC
AGTTTTAT
TTAACAAAAC
TACCTAACAA
417 CTTTAAAAAA AATTGACTAT ATCGCACCT'r GACAACGTC TAAAGTAATT CCTAAAATTr TAGAGAATAT AGATNGTTTCT GGTTACACTG ATAGAACAAT CAAAATTAGT AAAAACTAAC AACCACGCGC CTTGCCTGCT GATGGAAAGA- AATCAAAAAG AA'rGCCCAAA GAGTTATTA AACGGGCAAA AAAGCCCGTA CAACTGTTAC AAGGTACAAA TACATGAATA TCAAAGAAAA TGCTAGTGTT TATCTAGGCG 'rTGACCAACT AGCAACCACT AAAAAGGGCG TTAAAGTAAA TAATGGCTAT ACAGTTAAAG ACAAGCCGAC TTGGTGGGAT AGTTACAAGA ATACAGTTAA GGTTAGAG'rG CATTTATTGC CTGTATTTGG TATTCT'rCAA CAGCAAGTAA ACAAATGGGC ATTGCTAAC TACTCTTTGC 'rCCATAACAT TATCCAGGTA A'rACAATACA ACCCAGCTAA AGAAAAGGCT GCTGTCAAAT ACTTAGACAA AGATGCTCTG GATCAATCAA ATTATGAGAA AGCGCGTGAT GCGATCAATA AATTACAACA TATAATGAGC GCCAAATACT CGCCAATCCA CGATTACAAG CTATCTAAAC I'GACAAGGCA AATAAAGGCG
GAATAAGCGT
TGATGTCATC
CAAAGAATTA
CTTA71MTGAT ATTGGCCACT GGTTGCCGTA TTACTCAGGC TCTGGCTCTT' AGAAAGCGGT GTTATCAGCA TCAATAAGAC ACTAAACCGC AT'rTTGAAAT
GTTCCACCCA
AAACAGTTTC
GT'rGT-rCTGT.
GPLATGGTCTG
TATCAGGAAA
GCCACATTAC
GGCCGATCTG
T'TACGCAAAC
CTTTTrGCTGC
TTGTAAAAGT
TGGAGGGATT
TTACTACGCC
AAAALAGGG4GC
ATGGCGTAGC
AACAGCAAAA
TTGATTATTT
ATAAGACTTT
ATATTGACCT
TAAACTCACC
TTTTACTGAA
AAACAGTTGT
GCCTAAATAA
180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 TAAATCAAGC GCTGGTTATC ACAATACAAA AACCGTCAAC ATTCTCTGTA TTTACGGAGA GC-AmTTGAT GCTGCTGGAG TACTATGATG CTCTATGCTC TAATTTAATG ATCACTGAAA CGTCTCAAAT TATGAAACAG GCCTACCCTC TTACTATACC AAAGCACTAA GGGAAAGCGC CATAAAGAGA T'rAtTrTTA CACCAAGTTG GTCTTCGATA GTGATATACC AATAGACAAA AAATTCAGTC T'rGGAAATTA AATATGCTTA TGCTTGTAAC TAACTAACGT ATCATTTCAT AGGTTAGCCC GAAAGATGTT ATACTTACTG GCATACTAAC CTATCAACAA TT'TATAAAAA.
AAAAATTAGT AGGGGTAGTA CCCAALAGTGC TTATTTCAAA AGGTTGTAGA ATGA=rCAA CGAAGCAATT GGTTGTATN'T GGTTTCCGCC ATACACATAC CAGTATAGAT TAGCCCACTC CAAGAGAATG CAAAAAAAGC ATAAGGGTGA CCC-ATTTCCG AAAAGGG'rAT TAAATTATAA GGCTTTATAG CCTATAATCA TCCACGATAT TCAGCTACTT AGCGATGCGG TCTGTACGTG 418 AAAGTGAACC AGTCTTGATT TGTCCTGCGT TAGTTGCAAC 'rGCAATATCA AATCTTCAGT TTCACCTGAA CGGTGTGATA CAACAGCAG'r CTAACCAGCT GCGAw=G 'rCTTTAGCCA
TTTCGATAGC
AGTTAGCAGC
CGTCACCAAC
AGTCGTTTTC
CAAGGTAGTC
'rGTAGTCGTA
TTAAPAGT
ACCrrCTTGG
AAGTTGTACT
ATCCATACCA
GAT'N'GTTCT
AACTTTACGT
TCAGTAACAG TACCCATTTG GTTAACTrTT ATAAGGATTG ATACCACCTG CAACCTAGTC AGTG7rrr ACGAAGAAGT 71=TACCAA GACG -rCAGT AAGAGU"1 rC CAACCATCCC TCTrCAATAG TGATGAT'rGG GTATG=A ACCAATTCTT GCAGATGTAC GAACAGCAGC ACCTTCACCT TCAAATTTAG 'rCTrrATCGT AGAATTCTGA TGAAGCACAG TCAAATCCGA ACATATCCAC CAGCTTCA.AT CGCAGCAAGG ATAGTTTCAA AAACGAGGAG CGAATCCACC 'rTCGTCACCT ACGGCAGTTT AT7'rrCTTAA GCCGTGGAA GATTTCAGCA CCGTAACGAA GCACCAACTG GCAAGATCAT GAACTCTTGG AAAGCGATTG TAAATACGTC 'IrTACCTGGT CACCATCTTC AGTTCCTTCG CCAAACCACG TGAT'rTAAGG GGGCTTCTr'r AAATGTTrGCC GAGCGTCAGA GTGAGAACCA TGTTGAATCC ACCAAGATAG CTACAGCGAT AGACACACCG CGTCAAGTGC GATCATAGCA TAGCTTCAGC AATGATGTTG APACGAGATTT G'rCACCGTCG
CCGTTGATGA
CTGTAAAGTG
AGGATTGCAT
CGGTCAATAG
TTTACGTTGT
CGAAGTT-CAA
TGTTCATCAT TGGAGTTGGA AGAACTTTAG GGATTTCAAG GTAGTCAGCA GCAGCACGAG TCGCACCCAA TI'rACCTTTG TTAGGAGTAC CTTGTrGATC ACGTACATCG TAGCCAATGA CAACAGCTTT TTGTGTACCA AGACCACCGT CTGCTTCGTG TTCACCAGTA GAAGCTCCTG ATTCAGTG'rA AACTTC'rACT TCAAGTGTTG CG'rAAACATC AGTAATAATT GACATTTTTT TCTATAATAC CT'rAAAACCC CTCCTTTTC CCTTAACTT-T ATAAAGTAAT CGCTT1TCTTT 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060- 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 ATGGAACCAT ACCACGTCCG AAAGCACCTG GGTTACCGCG TGAGTCTAGG ACTTCGCGAG ACTCTCCTTA TGAGI-rAAAT T'rTTTACACC AAGAAAAAAC GTTATC7"rrG TGCAAC~rITT TGTCTGTTTT ATTCTAACTT TTATGATA'rA TACTTGAAAA AGCTCATCGT G.GGTTAAAAA GTACTTTTGA GGAAAQAGTT CTTGGATATG TAGAAAAAG4G CTI-rTTA'rTT ATTTTAGAAA TGAAGAT'TC ACCAACTATC GAATTTGATA AAACTGATAG 'rCAAGCCACC ATAGTATCTG TTATTCATAG CAATGCACCA GTTCAAGTAG
CTGTTTTCAT
TAAATCCGGA
TAGATATTGA
ACCTTCAGGA
AGCAAGTTTT
AAGAGCATAT
AAGAAAAAGA
GACAGATTTA TCAAAACAAT TGAGCAAAGA CCATCTTG
CACAGCAAAT
AAAAGCAGAG
CTACTTAAAA
TAC'rTCTCCT
CCTTCGACTT
AGCCCTCAGT
CCACTATTTC
GAAGCAAAAG
TTGCCTGG
GCTTTTCCAA
AAC?1'TGGGA AGTTAAAAAG GAAGAACCAG CCAAAACATC CTTATGGAAG AAATGGTTTA 419
GCTAAATCTT
AACTGGCATT
CTTGGATATT
TAACAGGAAC
TTTGACrATC CAAAG'rrGAT
GCACATA'PN'
TTTCAAACTA
T1'CAAATGGT
TCCAACAGGG
AAGTCTCT
TAAGTCCAAG
AATAAGTGCC
TCTTCTAAAG
AGGGGAGCTA
GTTCTTTTTG
TCTCCAACAA
CAATATTGGC
GTTCACTl-1-r AAT'rrCAG.C
CAACACCTAT
CAACCAAGTrC AGCCGTGCGC TCCAGATAGA CTCCAAAATA GAAAACACAG AAGACTACCG CAAATAGCAA AGGCGCTTTC CCAGCAAAGC AGCATCTG.AA ACT'rrCTTAT GATACTTGCC TGAGCAAAGG TTTAATI'TCT AATGTTGCAG 3720 3780 3840 3900 3960 4020 4080 CACACAAACC ACCAGCAACG GTGAGAATTT TTCATAAAAA G1'AGACCTTT TTGATTGCCA CGACATCTGC TAAAATATGA AAACAGAAGC TAAGTTTAAT CATAAAATTG ATAACCTAAA CACCAACACC GATATAAATA CAATACCACA AGTTTGGATT CAACCAAGTC AGCTACTTCA CrTTTGT-CC AAAAGGG.TCT GGATAGCATC TACAGTACCT CTGCTATCGA TTGTTGGAAG CAGGTATCAA T 'CCAGACAC CCTCCACCTG CTCCTGCTCC TCTTGGATCC CCTGATCTAC AAAGTGTAAG TCGCACCTTG ATTTGAACAC CT'rCAGGAAT
GACTGCAAAC
ATGACCACAT
TTTATAGCAA
AAGGATTGAC CGGAAGCAGG CAAGACATTT CCAGCAGCAA TCCCCAGTCC TCCATCATTA TCTTTAATCC CTTTAGAGAT GAGATGAAGA TGAAGTGGAT TTCGTTTCTC TAGCGGAAT AATAGTGCCA G'rTCCCCTrT T'rGAAAATAG G'rCACTTGGA TCCATTTTTC TTTTAGGTCA TCTCCCCCAT CACCAACAGG GCAGAGGAGA CCTCTTTTTA TTGCTTCAGC TACCTGTTGA
CTGGCCGTGC
ATCAACTCTC
TTTCCAAGAC
4380 ATAGTCGCAT 4140 AAGGGACTCA 4200 TT'CTGrrG 4260 CCATCCCTAT 4320 TTTCCTTAAA CGAATCCGGT GCAATTACAA TCTTCATATT TTCCCTCATT AATCAAAGGG AGAACI'TCTA AAAAATCCCT GAGCACTTCT TTGGCACAAA AGGCGATTCC A'rTAACCCCA TCACCGATTG CCACCGTTCT TTTTTCCAGA CTCTCTTTT TGACCTGGGG TAAAAGACCT TCTTTGACTT CAAGCTAGTT TGCTAATCTC TCCAACTATT GGTGTAAATC TC7'rTGGAG AATAGAGATG AACTCTGGGA CGTTATCAAA GACCAAAATA GGAAGACCTT TTTCAAAGAC CAACTCTCCT CGCATTGCTC GACCTGCCTC TCTCCCTAAA AGA'rCAATCA CTTGTCAACA TGATGTGGTA TAACTTCGCC GACTTCA6ACA T'rCTT'rAGAA AGTT=TAGTT ACTTATAATT TGTCCAACTA GGCAGTGAAA TAGGCAATAC CACCAGACAC CAGACCAACT CATTTAGCGA TAGATGAATT CCAACAAGGA CACTCTTr'rT GACTTGTAAT CTGCGAAATT CTTCTTCTAG GATTAAGGTT CGCATGGCI' 4560 AGAGAATGTC 4620 CATTCTACAT 4680 GCTGTCAAGC 4740 CTAAACAGrC 4800 T'rCTITTTT 4860 TTAATAGATT 4920 TCTTTCTCCA 4980 ATTTTCCTGT 5040 CAAGGGATTT 5100 AGGATGCCAT 5160 GAGTTGAAGA 5220 CTTAAACTGC 5280 TCCCCTCAT 5340 CCATCTACAT 5400 420 CCAAAACACA CAAGCCT?1r ACI'.GACA TCAGTCTCC TCTCI'AAACA GCCTAAAAAT CGTATGAAGT CATCATACGA TTNTATCTAT TAATTAACTA AACTATGTGTA CAAGTCAAGG TATGACTTGC AGGCTGTATC CCATGAGAAG TCACTCTCCA TAGCTTGTTT TTGTAGGTTT CTCCAAA'rGT CTGGATGGTT TCTATACAAG TCCAATGCTG CAATAAGGAG ATAGATTGTC AAAGCTAAAG CCAGTACCGC GCGCGAACTG TATCTCGCAA GCCTCCAACT TCATGGACCA
ATACCZATCA
CAAGCAGCGT
TCTGGGTAAA
AAAAGAACAA
AAACrTIrr
TTTGAGACAA
AGATTTCCTC
TCTGAGCAAA
TCTGAACATC
GACGTGTCAA
GCCACACGGT TCAAAACGAC AGCAAGT'rrG ACATCAAAAG CCA'rGAGAAA GCTCCTTCAA TTCTrTGCAAG ATATGGTGAA ACGAGAAACA ArrCCCACCA CAATTTTGCC TTA'T'rTGG AAGAGCATCC GTCTGAGGAT TTTGGAAAGT CCAATTTAAC r'rCCTTCGAT TGGATTGAAA ATGGCAAGGT TCCATAACC TTGGCATGAG GAAGACGTCA TGATATTTGT TGATAGCTTG AGGCTGGA'rC GCCAGTTCCC GACTTTCGAC CACCACATCA GTGGAACGTC 'rGCTCTAACA CTTTCCCAGA CAAATCTTCC TATAAAGATC AGCATCAA'rC GAATCTGATC CAAATTACAT AAACGGT'rGA AACACGGTTC TCCATCGAAG GGTGCCATCA TTCCTTCTGA AAA=NGTCCT CCTCATAGGC TT~GAA'rCCAA GGTAGTCATG AACATGGAGA CCAAGCCAA CTC'rTTCTTG TGATTGAAAT GATAGTCTAA CCAT'rCACGA TACCAGATAC TTTACCAGAC CCAAACTGAC TAGTCATAAT 1'TCATGAGCA GCATAGAGAA TACCTGCCTT CATCCAGTTC GCGTAACG'rT CA.AAGCCAAC TCCAAACAAA TGGAATTCTA AATTATGAAT GGTTAAAACT CGGTATTTTT C CAACAA GAAAGGAATC TCCA1T'rMAA
TAGCTAGGTG
AGACAGTTGT
TCACCCAACA
GTTTCAATGT
ATAGCTGTAT
5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 61 84 0 6900 6960 7020 7080 7140 7200 AGATCAGGAA TAAAGTCAAT AAGCGTTCTC CGTCATCAAA TCAATAAAGT AGAAGGTTAC CGCCAACCAA CGCTCACCTC TCTACCATAT CATAGTAGGG 'N'TGGAAGAG CGCCAATGAC .GCTGCTACAA ATAAALATTTT CCACAACTGG ATGTTCTGCA TGTCCAAGAT AGCATATTCG CCTTTCCATA GCCTCAATGG CAGCCAGTTG GAAAAAGGCA ATCACCGTAA ACATGACCAC GGAAGAAATA ATATTGATTG ACCATTTAAT ACTGTTTTCT TAATTCCACA ATACTGTCTG AAAATGAAGC ACATCTTCAA TCTGATTTCC AAATTTAGCC TAAAATCACT GCAACTTCGT GCCCAGCTTT TACCAGTGAT GTCTCCCAAA CCACCTGTTT TTGAAAAGGG TGCACCCTCT CATGAATGAPA TATCCTCTGT TACTTAGCA CCN'CTTAA GTTCCTCGAA TCACAACACC ATGCTCAACT TCAACCCCTT ACCTGAGCCC CTTCTCCAAT AACAACACGA GGGAAGAGCA GGCTATCTTT AACCAAGCTA TCCTTATGGA CATGAATATT ACGTGATAGA ACACAATTAG CTACTTGACC TTCAATAATA CTACCAGAGG CAAACTGAGA AGTGCTTACC TTAGATGTAT TAGCATAGTA AGrGGCTCT TCGN'?rGA AAAGAGAATA G;AAr'GT CATTCAAGCA CAGAGTGAAT ATTGGCTAGA TAGCCCGTGT
CCAA.ATCCCC
AGTGTTCAAT
CAGCTGTTGA
TAAAACATAG
CAACCAAGGT
CTTGCTATCA
CGCAAITCT
GTATCAACGA
AAGAGTTTAT
GAAATATC?1'
TGTAGGTGGA
421.
CC'rG'rATA AATCTTGG TATCGATArr CCTGATAA ACTCGTAGGC GAAAGCTCCC CTGGATGTTC TTT AGCT CAAAGATATC TGTAGACATA GAGAAAGAAC ATGGTCTrr TCTTAGCTAG 1TrI=ATAA AAACTTCGTT CAAA'rCA.ATC TTGTGAGA 7260 TAAGATTrAA 7320 TCTTrACAG 7380 TCTTCTTCCA 7440 ??G.AACG=? 7500 TCA'rCTACAT 7560 ACTACAGTGA 7620 TTAATAAGAA 7680 GTAAGAAGCT 7740 ArrCCTAGAT 7800 TGGTCAAATA 7860 CCAAGATIrGC ATTTACTTCT TAGGCTCTTT TGrrGTrACTA CATCGCAGTT GAGGGCAACC G=CGGTACTA TTCT~rTCCA AGTAATCGCT AAGAAGGGIr CTGAGCTGAT ATTATCCTGC GGC -rGAAAG TGGGAAGTCA GTI'TGGTTTG AGCCACAACG TrCAAATAA ACTGTACrAC TTTCTACACG GGTATTGTAA GATKAGCCCC ACTCGCGTCC TGAACGAATA TGGAAAATAC CAAAGACACT ACGAACACCT GCATTAGCAA ATCAAACGAT ATTTCCACC AAATGGCAAA CrGCTACTC GACGGTCGTC CGTCAATGTC GACATATTGT GAAAACCAAC AATATTTATC AATCTTCATC TGTTGCTACC CCCACTACTT ACTTCATCTG 'rrCCATCAAT TTCGACACCG 'rCAGAAA'rAA GCACGTTTAA TCTAGCTCC T'rGACCAATG ATAGCTCCAC ACTTCCGCTC C?1'CGCGAAC TTGCGCGCCT GTTGAAACGA TCAACGAAAC ATCCGTCTAC AACTAATGAG TCTTCCACAT TTTGGTGGTG AAATCAAGTT 'rCTGAGTAA ATCTTCCATT GCATTTTCTG GAGAAATATA CTCCZATGTTC GCTTCCCAAA TCTTCCAAT AACCACTAAA TTCCTAAGCA TAAACACTTT GGAATGACAT TTTTACCAAA GTCTGACATG CCAACCTTGC A'rATTACGAA GGCGTTGCCA ATCAAAAATG TAGATTCCCA GGTTGAGCTG GINTTCTTC AAATTCAACA ATACGATTGT TGTATTTCCT AAAA'GGCAG CATIrATATCC TACAACTTGT TCGCACCTTC ACCAATAATG TCATGATAAC TGAATCAAGC TAGAATGT= AACACTTCCA GAGCATTC CCCGAGGAAG GACGGTTACG ACTATCCAAG GTGACTCAAT AGTACCAACA CACCTGACTC AAGGTAATTr TCTTTTCAGC AGCGACTA.AC TAGAAGCETr TGTAGATTTA 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460' 8520 8580 8640 8700 8760 8820 8880 8940
TACCATCTGT
CCAAAACGGC TrGCTTCTTT AAGAGGGACG TTATCCTTAT GAGACTCGAG CATATCATCA AAAATCAAGA CATAC'TCAGG AT'rGACACTG TCTAAAACTG CTACTGTCAA TAGTCCATTT TGTAGATGTG TCGATATAGT CGATATTTTG
GTTCATGATA
GCTGGCATTA
ATCCCCAGAC
GTAAATAGCG
TGACrAGTCC CCTCAAACCA ACGATTTCCT TCACTTGCAG AATAAGNG AAGAATAGAG ACACCTGAAT TAATACCGTC TAGTCCCCAG GCAATGTT GATACTCTCT AACGACCCCA GATAGGGCAA AGTCAATGAT ACGGTAGCGC CPTrGAGTGA GTTTACCGAG ACGAGTTCCT TCAT?1"rTCA I-rCrACTC CT1-r?1GGTT GCGACGTTTG AT'rTTCCA'rA CACTTGCTCC CTCATAATCT TTCCATAGTC CTTCTTGCGT GCCTCCCCAC TCTTCCAACT CAGTATTCCA TCCGATTGTA A).ATCTTTCC GCTCAACAGG TCCCTTTMrA CCC'rTACGAA TAAAGGAAAG GATTCAATA CCATCATAGC TGGTATCAAT C'rGGTTTAGC TGAGAAGCGA AATACTTCAT CCATTCCAAC 'rGTI-CTTCAG ATrCC.ATT1C GAGCAATTTC TTCCAGGGT GACAAATTTG.
TN'GATTGTAA CGATCTCCCC ACATCTTA'rG ATCGTGCGAG AATGGCAAGA GATAA'rrCTC CAGGTTAAAG TCATATT'rAC GATAGATCCG CATCCAGCCC A'rGTTCCA'N' TG2'AGTCAAA AATCTTGATC GCAGACGAAC TTTCTTCTGC 422 CTTGAACCAT TCCCAATATG GlIvrCTGGA ACATTGTGAA TCCCTGAGTT GGCACAGTTT CCACCAAATT GCACACGG TTTTGCGA'rG TGCCCACCAG CAAGAATCAA AGCTAACAT'r TT~TATTTGTG ACGGNTTAG TAGA~rrCAA CATAGCCGGT AGGGTAAAGG TTAAGGTCTG =~GAACAGTT TGATTATGTT CTTTCCAAAC 'rAC7TCCG TAAATTCCTG CAACGGTAG 'rACCATATTA AAGATACAGA CTAACATTrC AACACTCTGG TC'rCGAT'rAT CCGCATCAAT TTCCCACAGA CAGCGATGAT CTTTGTAAAA CTTAGCATTC ATTGGGTCTT CTAGGTTAGA TAGGAATTGA CCGTATTCGC TACCCATGAA GTACGTATAG AGATTGCGCA AGCCTGCGAA CATCATACTC TTCT'rGCCAT GAACCACTTC CrGAAAACA TACATAAAGC TGAAAGTCAC ATCTTCTTCG TAGAAACGGA GGATATCATT TCCTAGACCA CCAATCTCT'r TCATTCCCGT AATCATCATC ACATCTGGAT ATTCTAACTT ATAACCTTCA TAGTTGAGAT TTCCGCCATC GTCCAAATAG AGCATG7'rGC TAACAGCATC AATCCAATGC T'rAATGCAAC AAATTAAGAA ATTAAGGGCA CCCCAACCAT GGTTATGAGC TGTCCCATCA TAATAGGCTA AGGCATCATC ACAATAACCC CAATATTATG GGTATGACAC 6P** 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 AATAACCTCA TTCAAGCGCT TTTATTAGGT GTCCATGGAG C-ACACGAATA CCATCCAAAT
GAAGGAAATA
CATCATCATA
GATAGACATC
GGACTGGACT TCATTTTTTC CAAGGTCAAA C'N'ATTATGG, TCTTGGTATT CAAAAGTCGG GTTGATGGTA AAGTGACTGG TACCCAGTCC TCC'rCGACAA AA'rCTTGAAA CTCCTCTGGT CCCATAAGCT GATACCCCCA ACTCAACCC ATATGAGTAT AGTTCATTTC AACGAGATAA C'rATAAGGAC TGCCATCAGA ATTTCTTTTC ACAGGACGCT CTTCAAAGCC CCAACGTrT
CGGCCATAAG
AAAGGATGGG
GGAATGAGTT
CATGATCCAG
CTTCGTGCCA
CATGCTCTAA AGCGAAGTAA ACATCAAGGG CATAAACTCA CATCCTTGAG CTGCGCAAAA CGTGAACTTC ATAAATATTG GCCAAAGTCC ATCCTTCCAT 423 CCTGTTCCTG GACGAGCCTC ATACCTGACA ?Tci~rTcAG GAAGcTcTGT 'rAcGATTGcc GCA.AAAGGGT CAATCTTCAT CAGTTGATGA CCATT'N'AC GPGT( ATATGCCCTI' CrrGAGCCAT ATrrGGI'AAAG AC?1'CCCAGA CCCC ATTGGAATCT GATTTTCAAT ccAGI-rGGTA AAATCACCAA CCAAC I-rAGGTGCCC AAACACGGAA GGTATAGCCA TGCTCTCCAT TAG1 CCTAGATAAT GTTCGAGAT'A AAAATTTTCA CCCG'ICATAA AGGT' TTATCCATAT ACTCCCCTC TCCTGTAAGC GIT1'CTATG T=N, TAGAGAAGAT TCAAGTAAAT TACTATACT CTTrAATTAT Trra TCACTTACTC GTrCAATTG AAATCAATAT TTTTTCAAAA AATT( TTTTCTACTA TAGTGAAATG AAATAAAACA TGCGCAAATC GATT TTTCTAACAA. TGTCTTAGAA ATCAAAGTGI' ACTATTTTAA CTCC INFORMATION FOR SEQ ID NO: 46: SEQUENCE CHARACTERISTICS: LENGTH. 7577 base pairs TYPE: nucleic acid STRANDEDN4ESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: ACATG ATA7"rTGTAA
AAATC
~TGAAC
rrc
ATTTCTTACC
AGCCTGAGCA
CCTATGTGCT
TTAA TGCTTCTCTA 'ATTAT ACTACC?1'TT k.AAATC TACAACAAGT ~CGAAA ACGCCTI'CT A.AGGAA TTTAATCTAA 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11384 06 s *6.
TGTTGATTTG TTACTAGACG TTGACCAACG TTTCCAACAC GTTTTCGCCA TGTTTGGTGC GCCTGTATCT GTTGCCCTTT TrGCTTCAGG TGGTTTTAAA. GTTCCAGTTT ATCTAGGTTC GGCTATGAAA GAAATGGGGG GGGATGTATC Tr'rGGTCTAT GTCCTTGI'TG CTACCAGCAT ACTCTTGCCA CCAATCATTA TCGGTCCTAT .PTCAGCTG?1' ACCAATGCAG GTCTTGTAGC CG~TGTTACT TrCCTAATTG CTGCCTTTAT TCCTTCGGCT GGAA.AAGGAA T'rCTCCTTAG GACCATCI'G GTACCATrGA TTTT-GGGAAr
TGTTGGAACA
TTCATTTGC
TGCTGCCCAA
CCGATTI'GTA
GATCATCGTT
AGACGGAAAT
CAATACAAAA
CrCATCTACA TGATTGCTAC TTTATCACAG CTATGTCACT ACAGGGGTTA TC'rTGACTGG GGAACAAAAT GGATTGATAA ATCGGTCTTG GACTTGCAGG TGGAAAAATG CTCTGGTIAGC GGAAAACGCT TCCTACGAAT GCACTAACTC TTGGCTTGGT CCTGGTTTCT ACTTGCCATT GGTCCAGAAG CC~rCGCTAT CATTCCAT'rC CTCTTTGCCA TGACTTTACA CCAGTTCT'rA TAGCACAGGT G GTG CCTTTA TTATCGGTGG TTACCT MTC AAGCCAACTG GTTCGAAATT AAGAGTACAA TCTTTACT CTTGCCAATC GCTATCGTAA AATCTGTGGT CGTCAATTCT TATCGCAACT TCTGTTTICTG TACAGGGT ATCGGTATGA CATCGCGATr GCCCTCAGCT CGCTGTACI' GG'rG=TATGT AGTCTTGATT AAAGAACGTG TATGTTGG?1' C7TGGACTTG TACTGCCCTT TCAGCCATGA AGACTAAGAG TCTAAATACA A'rAATAA.AG T1GTCTTAACA AAAACCTTGA TACTGTACGT GATTTGCATC AATCTCTTGT CATACGTT'TC ATGGACTGTT AAACATCTGG CTCATATCTG GCCTTGTCCC ATCATTCCTT TTTAGGAAGG 'rTATT'rGGAT CA'rAACACCC TGCATGAGCT GACGAATGTA TTTCCAGAAC TGGGTTTTCA CGCTCTTCAG GCGTTCATCC ACCTTCATGT GAGCAAGTCT TCCATCGGCC AAAATCAAAC GTGTTTCGC TTCCTGAGAA GCTTTCTCAA CATGCGGTCI' GGGTGGAAGG -GA'rTGG=TT CCAGTAATGT AATC7T 'A AGGATGACCC GCCTGCTTCC TGACCAATCA TGCTTTCACA TCACGAAGCT ATCAATCAAG ACATAGTCGT CAA7'TC'GA
TAAAAGAACC
CCTTCCTTGG
CTCGTATCGC
TCCTTGGTAA
CAATCCTT-CT
TTGATTTCGC
GAGGAGCTAT
CAGGAATCAT
CCTAATCCAC
AAATTATTAA
TT'rA'CATAG
CTTACTTGCG
TCATGGCAAA
GCATTCCTGC
CAAGGGCAGA
TAATCCCCAT
GTT-TAGCCTG
CAGCAGCAAT
GTGTCATCGA
TTTGAAGGGC
CCATATTTTG
GCATC?!YCTC
TCAAAGTGAG
TrCAATGTC
GACGAACAGA
CAGTCACTTC
TAGCATCAAC
CATTCATGAG
TATGATTAGT
ATTCACTGCC '7I9S7VrrCAA CTATGGGGTT ATCGCCAGCA TCAAATGCGA AACCTCA'rCA CCTTAAACT GGTCCAGTI'A CTTGAACTTG ATCTTGCCAT TCAGACAGCT GAGTGGA'TT AATCAAAAAA CGTATA.ATAT AAAT TTTAC TrrTT=C TTTCT'!CTTC GCTTT =CA TTCACCAAT'r TTACCTTTCA TCCTCCGAGA GCTGATAAGT CATATCCATT CCTCCCATAT TTGCTTCATC ATTTTATTCA GTTAAAGTCC TrTGATGAATT ACGACGGCGA CGGCTTGGAT AGACACAATG GCACGTTTAC TGGATTGTTG GCCATACCTG CACCTGATCT AATGATCGA AGCCATTTCA AGGGCTTrrT CATATCCCCC ATACCAAGGA CGTAATCTT 'CACCTGTAC CAGAGCAGCA CCACCACGAG CAACTGAGCA TT'AAACTCAC
CTATTCCAAA
ATGCT'rrGAA
TCGCAAGTGC
CACTTTCAGG
ACGAAAATAA
TTCGTATACC
CAGATATTCT
CATCAAATGA
'TTTTGTTAGC
AACCGCCACC
CAGGCATACC
TTGGCATATT
TATCCCCAGA
TATTGACTTC
'PTAACAAATC
GAGCAATCTG
GAATCATCTT
TGAAATCATT
GTTcATcarA
TACGGCTAGA
CAGTGAACTT
TATCGCCATC
GCGCAACATT
424 ACATATCGGA GACCATACTG 7=TGGGTCA AGGTCTTCAC CCTACTCT'rC T'rGGTGACGG TGGACCAGCC AATACAACTT ACGCAGAAAA TTCTGTCTCA GTrATCCGTA ACGCTCCTT 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520
GACAAGCAAG
GAGCTCATCA
TTGGGCTTGC
ATTTCATTTG GTITGAGCCAA ATC'rGCAAAC GACCCGCAGT TCCAAACCTI' GACGTACAAT 425 CTCAACAGCT GGTACTTCTG ?1'CCAAGTGC AAAGACAGGC GGTCTTAAGC TGGOTCAATGG CAGC 'GGACG ATAAATATCC AGCATTTTCT TCI~rCTTGA GTTTGTTGGC CAA'rrACCA ACATCAA'rCT
CCCGCAATCA
GCAAAGGTG
CCCTTGTAAA CCAACCATCA TGATGATGGT CCTATCAGAA CCTAAAACGG CTGTCAATTC AGCATTAAGT GTATCAATGA CCTC-ATGCCC GTCCTrrACA ACAGGCAAGG CAACGTCGGC TGCCTCTTGG ACATCAGATT CAGAGATTTI CTGCAAACGT TC'TG1'TAAAC TTTCAAATC ATGCTTGTTA AAATTTCTAT CTrGCTCCTC ATCTGATCAA AAATCTGACT GCGGACAA'rA TAATCTTCCA GAATCrrrC TGTTCGCTTG CCGAACTCCT CCGCAATTTC AGCAAGGCTG TCrATTTGC'T TATCTGTCAA AAGCGCCGCA TGGAATCTTA GGACTTGA CTCATCAACG A'NT=AATAA GACTGCACGC TCACGAACTT CTCGAGCAAG GCCAAGCGAA TCCTT-rTTrA CGTAGATE? CAT-T'rCTT CCTCTTATTC
GTTGTCCCAA
TCAAAGGACC
TT'ACCAGC
TAATTTCTC
TCTGTrGCGC 'rCTGATAAA TrrcTrGGT
TAAAGACGTT
TCTATTATCA
AGAAAGTCAT CCTTrGGGATA GCCCTCCAAA TAGCCCAGT ACATGTGCAA TTTCAkTCTCA ATATTGTCAT AGACAGCCT'C ACGACTGACA TAATCATCAG CG'rAGTAGAG CTCGATATAA TAAAATTCAA AGAGCGCATT CATACGATTG CCAAAAAT'rA GCCTAATCTA CCACACTAGG
GTTTTTCGA
AAGCCGATCC
ATTTCCAATT
TCAATCTTTT
ATAGGTGTAG
GGATTAGAAA
TT'rCCATAAC 'rTTTATTATA AAGAAGATAG ATAGCTAAAT GATACCTGGC AAAGGGATGT GCCTATCAAC TTGATAATGG GAATATAGGC TAAACTATCG ATCGGCTCAT CACAAGTTCT 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 TTGAA.AAAGA CATGAGCCTA CCCTCTTGAT TTTGTAGPTG CTCGTTTGGA TGATAAACTC
GCCCCAAGTA
ATAATCTAGT
CTGCATCCC
CTATCCTr'rA GAAACCGCAT AATGGTCTTG TGACCATGAA ATTTAATCAC TACTTTTTCC TTTTCCTCAT TATAGAAAAG TAAAGCTCGT CAA'rCACT'rC TTCACATCAG3 GCCCTCTTTC ?TGCTATAAT GGTCATATGA TGTCAATAAT CAAATCATCT CATCAAACAA CrGGGAACTT TCAC1TGTCTA GGAGTCTATG TCCTGAGGAA 'rGGAATCCTG CCTTGGGCAT GGTGCCIYACT CAGGTAGCTA TAATCTCCTT TTTCATGCAC TTCCACATCA CAACTGCTCA 'rCAAACTGAA TCGTA'rrrCG CATCCGAATC TTGTCTCTTG TCCTACTATT TTACCAAAA.A GAGCAGGATT ACGAAAAAGT A'rTCCGTCAC CCTGTTCACA ACTACATCCA ATGACTTGAT TAATACAAAA GAATTTCAGC G7TTGCGCCG CCAGTTATAC CI-rCCACGGT CGAGAACACA GTCGCTrTCT AAATTGCACG ACGCATCACA GAGATTTTCG AAGAAAAA'rA CCCAGTCTCT CTTGACCATG ACCGC'TGCTC TCCTACACGA CCCATACT'TT TGAACATCTC TTTGATACAG ACCATGAAC 426 CATTACTCAG GAGATTATTC AAAATCCTCA GACAGAGATT GCCACCTGAT TTCCCAGAAA AGGTGGCCLG TGTCATTGAC GGTCGTGCAG CTCATTTCTA CTCAGATTGA CGCAGATCGC CTCCTATTTT ACAGGAGCAT CCTATrGGGGA ATI'TGACCTG TCGTCCTATC GAAAA'rGGTA TCGCCTI'TCA GCGCAATGGC CGTCCTCAGT CCACCAGA TGTACATGCA GGTIrATTTC GGAAGTTCTC CTACAGAATC TTCTCAAACG CGCCAAGGAA T -rCTT'rGCC CGAACTTCTC CACACCTCCT GCC?1'TCTTC CACCAAGTCC TGCI'ACAAGT CATACCTATC CTAATAAGCA ATGGACTATC TCTTGCGCGA ACTCGAATCC TCCGAGTCAT ATGCACGCCA TCGAAGACTA CACCCCGCAA CACGCGCCAT 4320 4380 4440 4500 4560 4620 TGACTATCTG GCTCTGGATG TCCTGACAAG ATTCTTGCAG CATTACCTTT TCACAAGAGG TATCGGCTT GATCCCGACT TATCTATCGT CCCGAATCTG AGALACTGGCC GAACTCTCTA CGGAGATAAT CGCTTTTATT CAT'rACCCAG CAATTTTTAC GAAGAGCAAA TTTATGAGTA CAGCCAAAAG GAAATCACTC TGTCAAAGTC GTGATTGCAA CTTGCAGTTG AGAGACGAGG AACTGCTACA GGACATGAGA GGAATTCCTC ACTCGCAAGC TACTGCAAA'r CGCAATATCG TAT CTT CTAC CGTACCCCTG ATGGCGTGAT GAATACCTAC ATTTATCGCA TCGCTT'rGTC
ACCAAGATCA
ACTACACTGC
AAAACCCACG
GCCI'GTCTCC
ACTTACrTAGC ATGAGAAAAT CATTCATAAG AACTTTGACC GACACAGATT GAGAT'rITAC TA'rCGTCCAA TCCCTTCTG
CTCTATCCTG
GAAAAAAATG
TTCCAGCTT
AACCGCAAGG
AGOACAAGGA 4680 TGACCTTGAC 4740 GGATGACCAG 4800 TCTTTAAATC 4860 TGGTTGAGGA 4920 TCCCTTATGA 4980 AAAAAAATGG 5040 GCAGTCGCCA 5100 TCTTTrGCAAG 5160 ATAAAAACTA 5220 CCCTTGTCAA 5280 AAGAACrGG 5340 TTCTAGACGA 5400 TTGTCCAAGA 5460 ATCTAGATAT 5520 ACGGTATCTA 5580 TTCCAAAAGA AATGTrTGGAC CAAAACAGCA ACTTGATTGA GAACGATCAT TTTACCCCAA 'rTAAACTAAT TGCCGMrGAT ATCGACGGAA CTGAAGT'N'T 'T-CTGCCATC CAAGATGCCA CTGGCCGCCC TATCGCAGGC GTTGCCAAAC GGGACTATGT GGTAACCTTC AACGGTGCCC TrTATCAGCGA ATCCTTGACT TATGAGGA'TT TCGGTGTCCA CATGCATGCC ATTACCAAGG GAAAATACAC TGTACACGAA TCAACCCTCG TCAGCATCCC AAGAAA'rGGC TGGCAAAGAA AT'rGTTAAAT GTATGTI'TAT CGATGAACCA GAAATTCTCG ATGCTGCGAT TGAAAAAATT CTACTCCATC AACAAArCTG CTCC2TTCTA CCTCGAAC'TC GGGTT-CAGCC ATTACTCACT TGGCTGAAAA ACTCGGATTG AATCGGTGAT GAAGAAAATG ACCGTGCCAT GCTGGAAGTC GGAAAATGGA AA'rCCAGAAA 'rCAAAAAAAT CGCCAAATAC ATCCGCGTr GCCCATGCCA TCCGAACATG CCTACTGTAA CCAGCAGAAT TTTACGAGCG CTTAAAAAGA ATCTAGACAA ACCAAAGATG AAACCATGGC GTrGGAAACC CCGTrG'rCAT ATCACCAAAA CAAATGACGA AAGTATCATT 'rrTCAATAAG 5640 5700 5760 5820 5880 5940 6000 6060 427 AATTGATTAG CAATAAAATC TAATCATAAT AGAGACACAA
TGCCCTTACT
TAATTTCCCA
ATTATACAAA
CCTGGCCAAC
AGA'rATTTTA
TTAGGATACT
CAATGAATTT TTTTAGCAAA CTA'rTTAATT ATTCTGATTG TAACAATTTT TACCTAAACG TCATACTCAT AGATTGGACT CAAAAAACAG AATACTCI'CT 'CAAATTGAC CCTGAATCTA TCACTAAAAA TAAGACTT TA 'CATATCTTA AACATAATT AGCTACTTCT GGAAAAAAAG
TAAAACAAAA
AATTAGAATG
GGAGAAATTA
CACACAATCA
CAGTAGTAGA
CACAATTAAT
ATTAAAAACT ATATTATCTA TACAAGAATA ATTAATAATG AACAAAAGAA GCACAAAATC TAAAGACATA ACTATGGAAG T'rTTGGTGAT ATAAAATGGA
TTAACATTGA
TTATTATCGA
ATTATTGTAA
GTCTCTTAAA
AGATTTTGGA TGCTTATCTA ACACACGAAA TAATATTAAA CATGCTTTAA 'rATATTACAT AGAAAACAAT TTTrCAGCCA CTGATTATCC CTCACTAAAA CATATTCAAA CATTAATGGA TTTTGATGAA GCATTATTTC GCTTCTCAAT AGATATTGAC TATT'rAAGAG 'rTAATTTACC CTrAAAGAAA TATGAATGTT ATAGTCCITTT AATTGACTAT ACAAACA'Tr ATATACTCGA AAAT'rTTCA ATAATAAAAC GCATAATATT TTTAATATCA AAGACTN'T GACAAACTTC AATCGAACCT GCAACTACTC CTTAGGAGGG AGATAAAAAC TCTGCTA.AAT GAGCAGAGTT ATACGAGCT G CTTTACCTTG AAGAGCACGC CCGTAACGAA CAACTTCGAT TTTTTCAACA ACACCTACAC CG'rTAGAGAT TTTACGAACT CGTGCGATAA CAACACG CAATTTAGAT A'rTCCGTrCG AATrTAAACr ACATAGTOAC TATATCAAAG CA'rACTATGA AGAAAAAAAC AATATCrCTT TTAAACCAAC TAAACAAGCT CATAGCAATA C'rGTATCAGG GGCTCAGGGA AGACATT TGG AACAAGAAGG AGAATCTTTG ATAATTACTA ?r'rCAGGATT AGTATATTAT CCCCATTCGA TACCTGACrA ATCATTATCT GATCACGATr ATGATTTTGC TTTGAATGCA AATCATrT- TATCTAAGGA 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7577 TCGT'rCCACT
AAATAATTTA
TATAGTCTCA
ATGGAGACAA
TTTGATATC'r
AGTTGTTATA
TTTTAGTCGA
AAGTAGTACA
CGTGGAGTGT
GTGTAGTTTT
GC'TGAAGAAA TAAACAATTA GAACTTGACG ATTTTGAATA
ATrTTATCTG
TCCCCTATAT
AAT'rACATGC TCCAT'rGAAC
ATTAACGACG
AT TTCGCACG
GGATTGGGAA
CTGAGATTCC
ATGA'IrGCCC
TATGCGTTCT
CCCCTGCAGG
TAAGGGAGCT
GATTTCT'rTG
ACGTACTTTA
GATACGCTCA
AGCACCTTTA
INFORMATION FOR SEQ ID NO: 47:
SEQUENCE CHARACTERISTICS:
LENGTH: 4945 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
CCTCGCTGAT GATrGGTGCT GTI'TTATITG TTCCTGAAAA TAGCGGAnCT AATACAGAGC ATGAAGCTGA TAAGCAGAAT GAAGGGGAAC AAGGAGTAGC GATAGCATCT GAAACTGCTT AAACTGCAGA AGCAGCTAGC GCAGCTAAAC AAACACCATC TGCAGAAGCA AAACCTAAGT C1VvGTCCAGC CTTGGCTGAA GAAACTGCAG TTG?1'TCAGG AGAGAGTGAG ATGCTAGAGA AAACAAGCTA CGCCAGCAAG CAATGAAGCT CAGAGGAAAA AGCAAGTGAG CTGACAAGGA AACAGAAGCA
CATTCGACCA
GAAAAGGCAG
GCAACTACTG
GTGGTTGCAG
AAGCCCCAAG
CAACTAACCA AGGGGATCAG TCCAGCCAGA TGTCCCTAAA ATTCI'GGGA AGAATTGTTA GCGGATCTGT TGTCCTCGCT AGGAAGCAAA AGTTCAACCC 7TrGTGAGA AGAGI-rCAAG TCTTCTGGGA AGGTCTCGTA T'rCCTGTATA CGGTACACTC TTGCTGAAGC TTTGAAGCAA ACATGGCCAA GTATTATGGC TGGTTALAACC TCTTGGAGAA CTAAGGTAAA CCATCCAATC GTTATCATCA AGATGGTTTG AGGTTCCGGC AGATAACT'rC CTATTGCAAC TGCCAACTGG TGCAACAGGG TGGT'rCCTAC GGAAATTGCG CCTTTCTcTT CTGGTGAAGA TTATCAThAA CTGGCCAAAA ACCAGGTGAC CGCCAGCGGT AGGTAATACT TCGTAGATGG TAAGGTTTCT TCTAAACCAG CAGCAGAAGC TAATAAGACT GAAAAAGAAG AATACAGAAA AAACATrAAA ACCAAAGGAA ATCAAATTTA AAATGGGAAC CAGGTGCTCG TGAAGATGAT GCTATTAACC TCACGTCGGA CAGGTCATT AGTCAATGAA AAAGCTAGCA TTATCAAACA CCAATTCTAA AGCAAAAGAC CA'rGCTTCTG GCCTATGCTT T'rGACTATTG GCAATATCTA GATTCAATGG CCAACTCCTG ACGTrATTGA TGCAGGTCAC CGTAACGGGG 'IT'rCAACT GGTCTAATAG TATTGCAGAT CAAGAAAGAT GACGCAGATG GTAGc'rTCCC AATTGCCCGT AAATrGGTAG TATCATGGCT ATTTCATCAA CCAAGAAACA ACTGGAGATT AAGATGCGCC AGTTTATGCT CTATAGCAAG GAATA'rGCTG AAGTATTCTT GGTACGATGC CATGACCTAT AACTATGGAC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 GGAGAATACA ACTACCAATT CATGCAACCA GAAGGAGATA TTTGCTAACT TTAACTGGGA TAAGGCTAAA AATGATTACA ATTGGTCGTA ATCCTTATGA TGTATTTGCA GGTTTGGA.AT AAGACAAAGG TTAAGTGGAA TGACATTTTA GACGAAAATG GG'rrTG CCCCAGATAC CATTACAAGT TTAGGAAAAA AATGAAGATA TCTTCT'rTAC AGGT'rATCAA GGAGACCCTA AAAGATTGGT ATGGTATTGC TAACCTAGTT GCCGACCGTA TTTACTACTT CTrTTAATAC AGGTCATGGT AAAAAATGGT AAGGATTCTG AGTGGAATTA TCGTTCAGTA TCAGGTGTTC 429 7TCCA6ACATG GCGCTGGTCG TTACAGATGC CTATAATGGC CAGATCAGGA TGTGAGACI' CAGACTTCAA CAGGGGAAAA ACT1CGTCCA GAATATGATT GGAAATTCCC TTAAATTCTC TGvGTiGATGTA GCCGGTAAGA TATTCTACTA A AGAAGT AACTGAGAAG ACCAAACTTC GTGTTGCCC-A CAAGGGAGGA AAAGCTA ACTACAAATT CG.ATGATGCA GATGC.ATGGA ATGAAGAATT TGATCTTAGC TCACTAGCCG TCGAGCATGA AGGTGCrGTA AAAGATTATC ACAATCACCA AGAGCCACAA TCGCCGACAA ATGCCCAAGA AGCGGAAGCA G~rGTGCAA'r AAGTTTATGA AAAAGATGGA GACAGCTGGA TTTATCTACC AAAAGTTAGC CGCTCZAGCAA TTGTAGCAGT CGGTAAAAAT 
GGAGTTCGTT GTATCACTGT AAAAGATACC AGCCTACCAA AAGTTTATAT GGCAT1'CTCT ACAACTCCAG AAGACCTAAC CCNrTCrGAC AACTGQACAA GTAAAACCAT CTATGCAGTC AAACAT AGTTTrAACCT AGGACAATTA ACTA'rCTCGG GCIrTG T AGTGAAACAA TCTCTTAAAA TTAAAGGCAA CANAGGATGCA GATTTCTATG AATTACTAAC TCGCTCATCT TCTACAACTA GTGC'TCAGGG TACA6AC'TCAA GAAC"rAACG CAGAAGCTGC AACCACAACC TTTGATTGCGC AACCACTAGC TGAAAATATC GTTCCAGGTG CTGAAGGTGG AGAAGGTATT GAAGGTATGT AATGGTCTTC AGCTCAGTTG AGTCGTAGTG ?rGT*rAGATG GGTCA'rCCAT CATGCACGAG TGAACACTAA AGACTTTGAC CTTTIrrATA AGGAAGTCCG 'rGGTAACAAA GCACACGTGA CAACAGTTAT TGATAGTACT TGAACGGTAC CATTACTAC TGGATATTCG TTTGACCAAG CTGGTGGTGA GTCTGTTAAC AACATGCAGA TGGTGAGTGG CAGATATCAC TCT'rCATAAA CTCACAATGG AACTCCATGG TTGATACTGA GAGTGTCAAT ACAAGGTACA AGTTGGCTTr ATCCAAATTC TCAAACTCCG GTGCACCA7'r GGATTTGACA
TTCCCTAACA
TTG'rCAGATA
CCACGTACCC
GATGG=TGA
AAGCTAGCTA
CCAATCACTG
AAGGCTATTC
ArrCCGATCG 1680 1740 1800 1860 1920 1980 2040 2100 21.60 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360
CTCAAGACTG
GTATCTATAA
GCGCTTGAAT GTTGTCACTT CTGGAAAATG TATGAAAAGC CCAAGGCTGC AGCCCCTTCT GCAGATGTAC CCGCTGGAGC AACTATTACC CTCGCAACCT TGAAGAGCGA AG7'rGGAGGA CTAGGCAA'rA
G=TATGATA
GACCTAGCAA
GCAAGGAAAT
CAGTrCAGCCr
TTAGAGGTCG
TAACTCACC
TTACTCTCCA
TAGTAATGTC
AGAAACAGGA
TGTTCTTCGA
AGGTGTATCA
ATAI'TTGGGA
AATCAATCTG GTCTTCTA TATCGTACC CAGTGCCAG CTAGCAGTTT CCGTTCCAAA AGATGACAGA AGAA'rCALCT ccTAAGAAAA CAAGCTACGC CGAAGGGGAG GATTTGGACC G'rTCAGTATG AAGGAGGAAC TGAGGACGAA CTCATTCGCC GTATCAGGCTT TTGATACGCA TCATAAGGGA GAACAGAATC cAAccGGTAA ATGCTAATTT GTCAGTGACT GTCACTGGCC 430 AAGACGAAGC AAGTCCGAAA ACTATTrrGG GAANTGAAGT AAGTCAGGAA CCGAAAAAAC ATTACCTAGT TCOTGATAGC TTAGAcTTGT cTGAAGGAcG cTTTGcAGTrG GccTATAGcA ATGACACCAT GGAAGAACAT 'cTCTTACTG ATGAGGGAGT TrGAAATTTCT GGTTACGATG CTCAAAAGAC TGGTCGTCAA ACCTTGACGC ?ICATTACCA AGGCCATGAA GT'rAGCTTG ATGTTN'TGGT ATCI'CCAAAA GCAGCATTGA ACGATGAGTA CCTCAAACAA AAA'PTAGCAG AAG=TGAAGC TGCTAAGAAC AAGGTGGTCT ATAACTTTGC 1'TCATCAGAA GTAAAAGAAG CCTTCCTTGAA AGCAATTGAA GCGGCCGAAC AAGTGTTGAA AGACCATGAA ACTAGCACCC AAGATCAACT CAATGACCGA CTTAATA.AAT 'rGACAGAAGC AAGAGAAATT TACGGAAGAA AAGACAGAGC TTGATCGCTT TCTTGGCTGC CAAACCAAAC CATCCTTCAG GTTCTGCCCT ACAAGGCCTT GGTTrGAAAAA GTAGATTTGA CTCCAGAAGA GTCTAAAAGA TCTGGTTGCT TTATTGJZAAG AAGACAAGCC TCATAAAGCT CTGAATGGTC AACAGGTGAG GTTCAAGAAC AGCTCCGCTT CTTGAGAAAA GCTTACAACA GCGAAACAGA AGCAGTCTTT TCTGATAGTA TGTCATCAAG GGTTTGAAAG TGC'rGGAGAA GATGCTCATG S S
S. *S S S AAACAGGTGT TGAAGTACAC TAGAGCGTGT TCAAGCAAGT TCTTTGAAA'r AGAAGG'N'rG TTGTGAAAA'r CCCAATTGAA GCAAAGAGGC AGTAGAATTG CTCACTTTAC TCAT'rATGCC CAGCACCACA AAACACAGTC AGGCTCCTAA ATTGGAAGTT ATACTGAGAT GCTAGTTGGG GACATGTCTT TGAAG'rTGAT TTCTCAAATA AAGAGAAGAC GCTGAAGAGA AGAAATACTT GATGAAAAAG GTCAAGATGT AAAGATAAGA AAGTTAAGAA GCTTTTGAAC AAACGGATAG 'rTGTTTATG AATCTGCTGA C1'TCCAAAAC CTACTTATCA CAAGAGGAAA AGGTTGC.CTT GAACAACGAG TCATCATACA GAAAACGGTC AGCGTCGTCT 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4945
TGATCTCTCT
AGTATTTTTC
TCATGTTATC
AAAACCACAA
ACCGACTTCT
TCATCGTCAA
GGGACGAGAT
TCGTTCAACA
TATGCTTCTA
TTACCTGAAG
T'rTACAGCAC
CCTGCTAAAC
GATCAACAAA
GAGCATGAAA
GGACTGTTAA
GAAGTrCATCC
CCAGCAGTAG
GCAAGCAAAC
TTAGCCAGCC
GAT'rAAATAT
AAGAAGCGAT
TAGC'rACACA AAT'rGCCAAA T]GGTC'rTGC
TCCAGAAATT
GGAAAAACCA
TACAGGAACA
TAGTT'rAGCC GTTGALAATTG GAACAAAAGT AAAALACAGTA GCTCAAAATA CAGCAGT'rAA ATCAGAAGAA GCTGATGCTA ATGAAGCCCT AATAGCAGGC TTGACCTTGA GACGGAAAAG AGAAGATAAA CGAAAAATCT TGTGAAATCT TTCCG
INFORMATION FOR SEQ ID NO: 48:
SEQUENCE CHARACTERISTICS:
LENGTH: 25002 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
GACAACTCAA GrAGC~~?= C~rTT=MrA AAAAGGAGAT CAGAGTTTAA CTATGTCAGA AAAATCACAA TGGGGGTCGA AACTGGTTT TATTCTAGCA TC'TGCTGGCT GGCCATCGGG 120 CTTrGGTTCCG TTTGGAAGTT TCCCTACATG ACI'GCTGCTA ATCGCGGTGG AGGCTrTTA 180 CTAATCTTTC TCATTTCCAC TATTTTAATC GGTTTCCCTC TCCTGCTGGC TGAGTTTGCC 240 CTTGGCCGTA GTGCTGCCGT TTCCGCTATC AAAACC~rTG GAAAACTGGG CAAGAATAAC 300 AAGTACAACT TTATCGGTTG GATTGGCGCC TTNCCCCTCT TTATCCTCI-r ATCrr'rTAC 360 AGTGTTATCG GAGGA'rGGAT TCTAGTCTAT CTAGGTATTG AGTTTGGGAA ATTGTTCCAA 420 C?1'GGTGGAA CGGCTGATTA TGCTCAGT'rA TTTACTTCAA TCATTPCAAA TCCAGCCATT 480 GCCCTAGGAG CTCAAGCGGC C-rATCCTA TTGAATATCT TCATTGTATC ACGTGGGGTT 540 *CAAAAAGGGA TTGAAAGA GC TTCGA-AAGTC ATGATGCCCC TGCTC~rrAT CGTCTTTGT'r 600 TT*TATCATCG GTCGCTCTCT CAGTTTGCCA AATGCCATGG AAGGGGTTCT TTACTTCCTC 660 *AAACCAGACT TTTCAAAACT GACTAGCACT GGT TCCTCT ATGCTCTGGG ACAATCTTTC 720 **.TrrGCCCTCT CACTAGGGGT TACAGTCATG TTGACCTATG CTTCTTACTT AGACAAGAAA 780 ACCAATCTAG TCCAGTCAGG AATCTCCATC GTAGCCATGA ATATCTCGAT ATCCATCATG 840 GCAGGTCTAG CCATTT1'CCA AGCTCGATCC CCCTTCAATA TCCAGTCTGA AGGGGGACCC 900 AGCCTGCTCT TTrATCGTCTT GCCTCAACTC 1TTGACAAGA TCC7TTTGG AACCATTTTC 960 TACGTCCTCT TCCTC -rGCT CTTCCTTTTT GCGACAGTCA CTTTTCTGT CGTGATGCTG 1020 *GAAATCAATG TAGACAArAT CACCAACCAG GATAACAGCA AACGTGCCAA ATGGAGTGTT 1080 ATTTTAGGAA TTTTGACCTI' TGTCTTTGGC ATTCCTTCAG CCCI'ATCI'A CGGTGTCATG 1140 GCGGATCPC ACATTTGG TAAGACCTTC TTTGACGCTA TGGACTTC?1' GG TTTCCAAT 1200 CTCCTCATGC CATTTGGAGC TCTCTACCTr TCACTTTA CAGGCTATAT CNTTAAAAAG 1260 GCTCT~TGCAA TGGAGGAACT CCATCTCGAT GAAAGAGCAT GGAAACAAGG ACTGTTCCAA 1320 aGTCTGGCTCT TCCTTCTTCG TTCTCGT'T TCGTCATTCC AATCATCATC A?1'GTGGTCT 1380 TCATTGCCCA
ATTTATGTAA TCAAAAAGGA CTTGACTAGT GAACTCAGGC CCPq CTTTT 1440 TATGGATGGC TAACAATCAA TTCCAAACCT TGCCCTTCCA GAGTCCAAGC TTCAACATCA 1500 CTTGGTAGGA TAAAGTGGCT GCCTI'TGA ATTGGATAAT NTTCCCGTC AACAGTTAGC 1560 TGACCTTGAC CAGCCAAGAC ?TCCAGTAA TrCCCACTr CGCAAATCAT CTCTTAAC ACATCGATGG ATTT'ITCAAG AAGTCATAGA CGCCATAGG1' GCCCCGATAG CGTGCATAGT ACTTTGGTCA ACAAGTCATC GACTTGGCAT TGTGACCGTA CATTCTGTTT' TT'CCGAGTTC TGGACACTGA GCCAGTCGTT GGACGATTGC CAAATAATTC 'rAACGACCAT TGGCAACTT CCGATTTTTT CACTTGGGAT
ACTCAATAAG
GTAAACTrGCG
AGTTACAGGA
ATGAAGrrCA 432 CTGTAGTCAG C74STCTT'TTC AAGAAATCAT TAGATACAAG CGGCTA7*rTG CTGGCTCACC CGCAAGT'C C~rTGTCATC GACTGCTGGG TTTCAAGGAT ACATAGAAGA AATCTCCAGIC TCCTCGATTT GCTGG.CGGAG GAACCTTCAT CCGC'TGCGAT
AAAGTCAACT
GAGAGTGGAA
AATGTTCAAG
CTTGCGGTC.A
TAAGATACCC
CTTAACAGGG
AATGTACC-AG
GCTATCGCTA
CCCGCTTGGT
cCAGTCTG
GATAATCTCT
GCCTTCATGC
GGCATCGAGG
ACGGTGTTCC
AGAGACTCCA
GTCGTAGCCA
A7'rTTTrT' GCATAACTGA TTGTAAAAAT ATTTTTCTCC CCTCATTATA GCAAAAAAAG TAAAGCAGGG AGAAGATTTT ATAAAAATAG CCATTGCTAT AAATGACATC CrTG'rACCAA GCTCCGTTTC C1'ACATTATC TCGATCTACA CCTGTGCCAG CTGAAACCGG ATCGATACAG CCGTCT'rGGT AAATGGCATC TCGCATGGCC TAGTCATCTG CrACATAACC A'N'CTCATCC ACGATAATAC TAAACTAAAA TCAAAAAGCA GAAGAAATCC AAACCATCAA AACACTTTTA CGCCT'rCAAA TCGTrCTATA GTAAAATGAA CAAATCGATT 'rCTAACAATG T'rTAGAAGT .ATA'rTTCGTC TGATGGGCAA ATCTTATAAA ATAAGATGTG AACAACCTA TCAGGAAAGT CAAGG'TGTAC TGTTATAGAT TCAATACACT AATGTAAAAA AA'rATGAGGA GTrCGGACTC CGTAACCATG CATATATGAC AGTTGAGGAA TCGAGTCCAT AAGCATCGTC TGCGTGAACTr ATCTTGGTCA AAAGTGGAAA TACAGGTTCT GCATACAAAG TAGCAAGATC TGTTCCCTCG TTGATGGG CTGAGATGGC CCAATATTCT AACTCATCAC GTAGCTTGGC TCCACCCCAG AATGGT =TG ACATGTCGAT CTCCTGTCTG AGTTCGAATTI GAACrCTTTT TrACATC1'TA TAAACAAATG TGCTCTACCC GATGCTTGCA TAGAAGGACT TCTTCTTGCT ACGTTTGAGA TAGATAAAGC CATAGCGCTT ATTCAT~rCC CCCCAAGTCG TATAACCAAG CAAGTCAACC 'rTGATGTGGG CCTCTAAGTA AGTAATCCGA GGTGTATCCA TAGCACCGAG TCCATTI'TCT TTATATAATA GTGATATGAA ATCAACTAAA AAAGACTCTC GTACAGCTAA ATATCATA.A ATAAGAACAG TACAAATCGA TCAGGACAGT AGGGGTGTAC TATTCTAGTT TCAATCTACT GAGATTATAG AACT'TrTATA GTAGTT'rGAA CAAATTAA'rT TATAGAAATA TTTrAGCAGC ATAGACTGTA ATCAAACAAC GATTTGGCGA GACTCTCTCC TrCAAGAAAC ACGTGGTGGT GAGAAAGCCT TrCTrGcCCCG CCATT'rGAAG 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 GCTACAGAGG CAGGAGAATT TGTTACAATT TrAGGTCGTT CCTACACACG TGATGCCTTC AATATTACGC CACGTCCAGA ACATCCTAAG AAAAATAAAA TCTCAATCCA AGAAGGCAAG TAAGGTI-rGC 'ITGATGTACC AAGCTrGAAGC TTGTTGG~cC CCAA'rAGAG TAGGTCCACA 433 GATGCCTTA'r 'rTCAxGcCTA TAAAAAGGAG TATCAACTGT TGAAGCGCCA TGGTTGGCGA AAAGCAGACG CTCAAACCAT T13TTGCGTCT AAAGCGTT AAATATAGTA TGGrCGGT AGAATCAGTA TATCCATAGT CACTATATAC CTATTGTAT GGAGCTGTTG ATGCCATAC AGGCAATCA T=rTTC-rAA ATGTAATACT GAGTGGATGA ACGCCTTTrr AGAAGAGCTT TC-ACAAGC?1' TCGTATGGA CAATG.CTATA TGGCATAAAT CAAGTACCI' AAAGATTCCG GT7'rTGCATT 
TATCCTCCA TACACACCAG AGATGAACCC CATTGAACAA AGATTCGTAA ACGTGGAT'r AAGAATAAAG CCTTCGAAT TTTGGAAGAT
GACGTTTTCG
AACTGGGATC
GAGAATTTCG
TAGCTGGTAG
A'rCCTTTTrAC
ACTAATATTG
GTGTGGAAAG
GTCATGAATC
AACTCCAAGA TGTCATACAA GGATTGGAGA AGGAGGTGAT AAAGTCCATC GTTAATCGGA GATGGACTAG AATGCTTr GAAAGCAGAT GAGTATTATA TGCAATTTCT TTATATAAAA AGACCGGATT GCTCCGATCT TTCAATAGTT CATATTC'TCA ATTTCTATTT TAAAAATAGC TAAGG-rrAAC GTrCAAATGAC TACGCGACCT ATTTCATACG ATAAAAATCA AGCACTAGAC 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 CAGCAGCTCC I-rGAACTAAT AAACCTrTAT ACCAAGCTCG TATGAATAA'r GATTCCTGAC TA.AGGCAGTG ACTACCAATC TATCCAAGAT ATGCTCCAAG ATTGATTT'AA ATTCATTTAC GTTCTGCTCT ATGAAAATCA AAGGACTCTG TTCCCCAATC TrCAACC=T TGTAGTTCTG
TGAAAGTTTI'
GTAAACTIGAG
TCAGCACAGA
TCTCCTCCAT
AArA-ATAAAG
CCAAAGGAAC
CAAAACCATC
CTTGGGGAAC
AAAAAGACCG
TCTACAAGTT
TCCTTCTTCT
GAAGGCTTTG
AAGTGGACGG
GT'ACAG'N' GGTCCGTGTA GTCTACATCC TCALACCTCGA CAAATGATTT TGTGACAACA ATTAGCATAA TCTGCCT A7TrTGAAACG ATAATATCTA GATTGCTCCG ATCTTTTAAA TCATATTTGA TTTTCGGCGA TCCAAGAAGA GACGGAATCG ATAGC'rTCTT CTGAGTGAAG TGI'GAGTGA TGCGAGCATC GAGGAATTAT rTAATTGCGC GTGArrGCAA TACGAGTTCT TCTGCTTCGT ATrTTCCTT TTTTGGATCC AA~rCAAGTA C'rrCTACTGG GATGACAACA GTN'TACCTT C7*TTGT'rCAA GTCTTCGATA CCGTCAACTG TGAATCCAAC GTCAGCGTTT GTGAAGTCTA CACCAAACAA TTTAACAGCT 'rCTGCAACAA CTGCATCGAT AGCTCCT'rGA GCTTCCGCAA TTTTAGCGTA GTGrTG= GTATCrrCGT ATTTGTTCTT GATGAAGCCG TACTCAGCAT TTGAGAAGAC AAGGTTGA'rA ACTGGAAGCT CGTATTGAAC G?1'rGTGATA ACGTCTGCG TTGGCGATCT GGATTGTCTS' CGCAAAGAGT GGAGA'XGTAC TGTTTGAGTA GTGTTACCTA GATTGCATTG TAAACTTGA'r CATGTAATCA CGCCAGTT'N' AAC7TGGGTTT ACT1-rGTCAA AGCGTCAAGG GCATGACGTT TTCAGTGTTC TTGAATGCT1' AACTGTGTCT GCTTCAA.AGA ACCTCTCAAA CCTTCATAGT TGGTGCTTTG ATN-rrACG'r TCCAGCATAG ATAACTGGGC AACTTCGTTC AAAGCAGGAG GNTCATCG ATTTCTTG43A TTTAGAAACT CCAGCACCGC TACACGTTTC TTGTAAACAG 434 AGCACATG?1' GAATGCTCCC TCACCCATGA TGTTCCATAC TCTTAGCAGC GATACCACCA GGAAGGGCAA TACCCATTGT GCCACATGTT CTTAA3GTGTC ATGTGAAGGT CGTCGATTGA GTAGATAGCG TCTTGATCAG ACAA71rGCAA TTCACCCTCA GTrrTTACCT'r GGTrG7rCrT AACGTrrGCA CGCCACCATG GGATAGCTrr
TACCAAGTTT
CGTAAACTTC
CCACTTCGTT
TCCATTCGAA
ACAATTCAGT
GTTCAGCATT
CGA'rCAATGA
AACCGAAGTT
AGGCTTCGTC
CCATACCGTT
AGCTG.CTTGA CCACCATCAC GTAAGGGTCG ATATCGACTT AGCAAATGGG AAG7TTGAAC GGCrGG'rTTC CAACCAACAC AGCTI'CAAAG TrPI-rACCAG AATCACTTCA CCAGCTTTAA
GACGAGTAGA
CATGTTGTT
CGAGTTGTT
GAGTTGATTC
CAAGGATTGA
GGATGAATTT
CAAGGAAAAG
GGTAAGCAGA
TTGTGATGAT
CACCACCAAA
GTTCAAGAT1' TCAACAGCI'T TG'rCGA'rTrc GCGTTCGTAT GAACC1TGAAC CGTAGTATGA TACTGGAATT TCAACAACAG CTGGACCTTT AATTAC==T GGCAATTGCT CAGCGTAAGC GTACATTGGG TrTTTGGT'rAA GCTCTTGGAA TGATCCAAGG ATCGCTAGGA ATGGAGTGTT CAACTGAGTC GCACCTGGAC CACCTGAACC AGCTTGCATA ACCGCTGCAA GAGCACCTGT ATCTT'rGTCT TCAGCCAAAG CGTCCATCAA GATTGTATCT ACGCCCCATG ?r'rTCAATAC 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 AGCATCCATG TTCAArrCGT TAACTGGACG ATCCATAGCT GCATCGTAAA CACCGTAAT AACTGCAACC CCG.ATTGAGC CGCCGAATTT CTCTTCGTIGG CGAACTTGTA AGAAACGGAT TGAGCTGAGT GT'rCCTGATG GGATACCGTA GTTAAGCATT GCTGCAGATG CAGTAATTTT ATTTTTTAA ACTTGGAGAA TACGATTACA TATTCCACTG TATCATATrr ATGCTGACTT TCTATTCTAA TACAG~T'TG AAAGTTCTCT
CCCTTGAGTC
TAGAATTGGA
ATAATGATAA CTCTCCTTCA AACGTTCTCC AAATTTTTAC TTCTAAAAAT CTGCTCAAAA CTCTCTATTC CATTTCTGTT TTATAACAAA GAAATCTAG3'
CATTACTTTT
GAACTAAAGG
AAGACTCCTG
ACTAGAATTC
AGTCTATTTT ACTAAAATTT AACAGAAGGG CCATGGCTAG ACCTGCCAAT TCTGGGTTGA CTGCAATCGG AA'rrCCGACA ACATTGTAGA GATGAAAGGT TTTCTTACTC ATATCAAAGG AACTGGTCAG AACAGATACA GAGCCAGTCC AACACCTGAA TAAAAGCCCA CAAAACATTG CACGAACC.AC TCCTAAAAGA 435 ATCrGCTGAC TCGATGGCGA TATCTGTTCC AGCTCCCATA TTATTGGTTG TCAACACCAA
GCAATCCCCA
GCTACTTTCC
CCTGCAATGA
TCTCCTGTCA
GCATTTTCCT
AAGAACACAA
GAAATATCCA
CCCCCTCAAA
CATCTGCTAC ACTAAGGGCA cTGAcTGTGr cAGTTTATGG CCTCTTCAAT TCCGATI'rGA GCATGACTGT TCGGAGACCA TAGGAATATC 'rTGCAAAGCA CTGTC7'rAGC TrCTT'rCT TGCCATCCAG CATTTTAGCA CACCTTTCCC GTGCAAGGAC GGAGCGTCAT TCATACCGTC CCCA-ACAAAG ATTTCATGGG CTrTTCTTC 'rGGCAAGACG TCTGCAkATAG CACGCGCCAC ACCAGCATTG CGT=rMTA GCTGACTGAT GCCTAGCTTA AGCAAGCCTT TGATTTCATT GTCAACAGCT CCAGCTTCAC TCGCTCGCTT AAGGAGCCTG CCAACCCAAA TTCCCTTCCG TCAAAGTCCC TGAAGAC.AG TTCCATTTT1' ATAAGGGCTG TCGGTGTTGC ACTCCGTAGA GAAGAGAGGA AAGACGAACC AAALCCCAAAA CCTGAAATCT TATCCGTCAA AAATCCACAA TCTGAGCCAA GT'rCCACTAT GATTGATGGT AGACTCTCAC CTGTCACCAT TCAACAGCAA TCTTTTCACC
AACGATAGCC
CACTTCTACT
GGTCTTATCA
GAGGAGAACC
AGTTCTTCTA
'rTTCCAAGTA
TGAAAAT=T
TCAGCCAGI'G
TCGTCGCCGA
AAGACAAGGG
CCCATCT1TGG 'rTTGAACTTT
CACTACGTCC
AGGCGATA6AT
CAACCACACT
AAGTCCCAAG GCACAAGGAC CACAAAGCTA GCTCCAAGCA CTTTATCTTG ATAAGTATTA AAACTTGTTT 'rCCATTGArr CAACAGTTTG AAACTCAAGT GGTGTTGAGA AGCATCX1TCC TGACATCTGT TACCACAGGT GGTCATGAT CCTAAAATGA CAACTACTGG GTCCTGAATC GGCGCACGAC TTGTCTGAGC AACAG'rCTCT GAGCCAACTT TTTCTGCTCT CTGGATTcCC
TCTCCCCACC
CAAAACCGCC
ATCCC1TGAGC
GACAAAAATC
TTT'CTrTCACA
AAAGACAAGC
GTCCACAG4CC
TACGACACCA
GACTTGTTCC
,rrGCAAGTrCC 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640
TGAGCCAATG
GGATTCGTCA
GGGACGCACT
ACACTATCTC CAACTGTCTT A'rACI'AGAGA CACCTTCTAC CGAATCAGGT CGCCTACCTT AAGACTTCTG CGGTI'TTAGC AAAGGAACTT GGACATAACT ATCATCACTC AGTAATTTCT CCACAGCITG GCACGTA~r AAAAGAACGA AAAAGAGGAT AAATCCAGCA AGAGCAACTA GGCTATAGAA ATAAGCCACT TTGGCATTGT GCTTrT~AAA ACTGGCCCAA AACATAATAG GCCTTGNTGC TAGAAAGGTT CCTGTCAACA TCCCAATCAT GAGAATCACA AAACGTTGCA GGAGAGATAG AGAT'NTTCGA T'rCTCATT'r TTTrCT!AAA A.ACTGCTCCC CTTTCGAACT AAACAGGGAG ACCAGCAAAG AGAGTTCCCA GCGCAACCAA GCTATCCATG GCACTCTGGA TATATGGCTN ACC'rGCAACT CCCCAATGCA TGACTTGATG ACTAATGCTA AGAGGCACAG TAAAGATACT AGTAATCCAA GTCI'TCTCAA CGACTGTATA GCTTCCCT TGCATCTTCA TGCCACAAGA AAATTCATGT ACTTTCTCCT CATCTACGCC GATTGGNTCC TTATAACAGT TTGAAGGAGT AGCACGATGA AGCTGGATAT GGGCTGGATG ATAGCCTr TTrrCTAAGC TTGCTTTCAC AATTTCTGTC CCG1TGCATCA TGTTCATACC ACAAGCAAAG TCCACrACAT ACTCTTCCCC CATTGGCAGG ATTGATCCA GACATGGTGA AGGATCCTTG TTCr'1GAGGA CAATCAACTC AGGAGTATAG TATCCGTTTT TT'rGCTGGGC TTTTr TGTCCA AAGATAAACG CGATAAGGGC AATACAAATA CCTTTACATA CAArrACATC TTACTTCTGT AGCTTCCAAG TCTTCCAAGT CAGTCTGAGT CA.AATTCCTA ATCCTACGGG AACAAACCTT ACTTrTGGTCT AGAGTTAAAA GGGCTGAATA CAAACACTCT TTATCAACCA GACGAGCCAA AAACCdCTCT GCCAAAACCC TAATCAAATC AATCTTCATG ACCTGCCATT CTGCATCTGA ATTTGTCAAT TACACTCATC AGTATACTCT ATATTTTCTT CGAAAAATAG AATTTTAATC 436 CGCCCTAATT CTTGAGGCGT AAAACGAATG AAGATACCTT CrTCrCAAA CAGAATTTCC AAGGTAATCT CAGCTGGAAT TCCCTTTTGA TCAGCTCGGA TACGGATTTT TTGAATGCCA ATAGTCTCCA tCC~rCTCTAC AATCATCrrG CCAAACTCTC CAGCCTGT'rC AGGCCTGAT TTCGCATGTA CACCAAAATC 'rGGAAAAACA CGGTCAAAGA CAATGCG;TGC TGGCACTG.AT CCTCCCATGA CTTCCACTCG AATCTCTTGG GM-!rrMCAG GC~rTTTGAA
AAACCAAAAC
ATGGTTACAA
TACAGCACT
AAATTCACAT
GTCTTTGATA
AACAAAGGAC
AAGTGTCTGA
TGTACTGGTC
AA'rCTGCATT
TAAAATCTAC
ATTTGAAAAA
TACTAT?1'AA CA'rGACGTCT GATTTCTTCT CTGAAATCAC TCTACAATCA AGTCAGCCAA TCTTGGACAA GTAAATCCCG TTGCCTTCTT T1'TTCCGAGT ACCGTCGACT TGGACCAGTC TGCTCCCCCT GCATCCAAAT ACCATACCTC CAA.AATCTAC ATTTGTCAAT TATAGAAATA CGATTTGCAG TCAAATATTA AAAACAACCA CTAGGGGTGC ATCTAGTAA TGCTAGCTGA rCCcTTCTAA AACAGACI'AG 8700 8760 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 CTATATAAAC AATAAAAATA TGCTATACTA GTAAAGCTGA GATTAACGAC TGTTAGATCC TGGAAGTGGA AATGATAATG GGGACTAGCA CTTGTCT TA AGAATACAAA CTTCAGTTGG TTGCTTTGAC CATTGCAGGG ACTGACCCTA AGTCATTCCA AGCGAGAGAT GTCTATGGAA ATACCAGAGO TGTTCAGCTA ATCGAGCACG AGAGTGTC?1' TTCTGATATT CCACCTCAGG AAATCATGGA AA'rCATCCAA CCCTATCTTA CTCTTATGGT TGCTACAAGT GGAGATGCCT
AAGAAAAAAG
CTCTGACTCA
GTC'rrCTATT
TTGGGAGGTT
GTGGTGGTGC
TGGCTGTTGT
TTAGATGACT
TGGCATTATG
AACCAGTCTT
TATTTACCCG
GCAGATTTAA
GTCGCTCAAA
TT'rCTCCTCA AATGTTGAAA GCCCAATTGG CTGTAAAAAC TGCAATGTTG GCTACTACTG AAAAACTGGA TTCTCCCTAT GTCCTTGATC TGATTGACTC AAATGCTAGA GACTATCTCA
AAACAAACTT
?rG~rTGGTI-r
AAGAATTTGG
A'N'TCCTCT
ACACCCATGG
AGAGTCTr'rA
CCCCTCAACT
AAAAACTCTC
CTTCAAACTA
AG'rCTTArT
ACTACCTCTA
TTCAATCCAT
TCCTCAGTCT
TACCAAGAAT
TACTGGATGT
GCAACTATTA TTACGCCAAA TCN'CCTGAA GCAGAAGAGA GACCCCGAAG ACATGCAGCG TGCTGGTCGC CTGATTTTAA GTGGT'rATCA AAGGCGGACA TCTCAAAGGT GGTGCTAAAG GAACAATTrG TCTGGGAAAG CCCACGAATT CAAACCTGTC ACCTTGCTG CAGTGATTAC TGCTGAACTA GCCAAGGGCA
CTCTTTGATT
CTTGCGTTTT
CTATGCCACT
CTGGAATGC
TTTCCTTGGT
CCGCTTCCAG
AGACTTTGCG
CAACCGCAAG
TTATTTCCTT
.GCAAT'rGGAC
CCACGGAATC
ACCTCTCCTT
AGTACTTGGT
ATA'rTGAGAT
AAAACAGCTG
CAGCCATTTC
ACTAAGAAAG
CCAGGCAGTT GATAAGGCCA AGGCC11'TAT CACAAAAGCT ATTCAAGATG CGGTCATGGT TCTGGTCCAG TCAACCATAC AACTTTTAAA GATTAAGAAA TAG'N'CCCAC TTTAAGGGAA TTAGAGAGTr TTTATACTCT TCGAAAATCT CGTCAGCTTC CA'rCTGCAGC CTCAAAACAC TG7"TTTGAGC TGACTTCGTC AAAACCTCAA GGCAGTACrr TGAGCAACCT GCGACTAGCT TTCTAGTI'TA TTCATGAGT ATTAATTAGG AAAGAATIGTT ATGCAACTTT TTTAAAAAGG TGCCTCAATA TCTCTGCT'T GCATCAAATC ACGTACAACA GCTACACCAG GCCCATAAGC TGATCAATAT TCTCCGAAGT CAAGCCTCCA ATAGCAACTA AACCC=rGG CAAATTGTTT TCAAGGTCGA TATCAGAGTA ATGGGCGCAT GGTGTGGG AAAATGGCrC CTGTACCCAA GTAATCTGCA CCTGATTTCT AGCTCTTA ACCGTmAG CGGTGACACC GAGGATTTTT TCAGGACCCA AGCTACCGAA ACTGGTAATT CATCATCTCC GATATGCAGA CCTGCTGCAT ACAAACATCC AACCGATCAT CGA?1'ATCAA GGGTACCTGA TAAGCATCTG GACTTGTTITT GCCAGTTGAT AATATTGATT GC'r'GTGAGA TTTTTTCTC TATGGTAACC CCTGAACGGC AGGCCGTCTC AACTTTTGCA AGAAAGCTr TTGATAGCGA TTGGT'rACCA GATATAGTCT AAGTGCTTCT CTATTCATAA TGATGGTATC TAGCCAATTT TCATCTCTTC TTAGGAGCGA AAGC'rGATTG AACGAAATTC 'rTCCAATCCC A'PTCCTTGAA CAACTATTT CTCAGCAGCC AACAGACTGC TAAGCA.AGAA GCTTC-AAAAC CAGTCTTTCC TTGGCTGAGA TTAAGGCTCC AACCAAGTCT CCTGTCCCIG 'rTATCCAGTC TAATTCAGTA CCAGTACAGC GACCTGATTT TTCGAAACGA CGAGGTCCrr GGGACCTGTG ACATACCAGG ATAGGTCTGA CACCAGTCTT TCAAGACTTG AAGCAAATCC 10500 1.0560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 TCCGT7TTCTT GATCTTTAGC ACTCGCATCG ACCCCAACGC CGTGGTGCTT TAATCCAACA AGACTTCCAA TrTCTGACAT GTT-TCCTTTA ACGACCGTAG GTCTA'rAGTC TAAAAGGTCT 0*
S S TTAACTAAGC TCTTACGAAT AGAGAAGATT GGTTTGCATA AAATGCCCCA AATTGATGAA GAA'rCATCTG CCATGACAGG CAAGAAATCT CA'rTGGTAAT GTAAATTCCT GCATCAGTCT ATTCCTGCTT GATTAAAAAT CTCCGAAAAA TCGAGGCGTG GTACCATAAC AGGATAGGAA GAGCAGAATA GTGAAAr= AAGTCGCTCC TGTIGAGAGCT TGACTGTAGC CATAGCCAAG CTACACTGGA CATCGGTGCC GGGCAATCAT CATAGATAAA TT'rATTCTT CTCCT=TC GCCATrTCCC AAAAC?1'GGC TGCTTGTCTG TCrCGTCACT ATCTGTTGCT CTAACTCATC GGTGATGGTT TAAGA'IrAAG AAGCTTGCAA AAGCGATGGC TAATGA'rAAC ACGGTGGAGC TCCTTGAAAA A'rTGTTGGCG TTCAAGAGTC TTTCATCTC GAGAAAGCCT 'rCAGATAGTA AAATTCCCCT CTTGTIAATTG
GGATGAACTC
CGAAGCTGCC
GAGACCCTGA
TTTGCATCCC
GCAGTGAATG
ATCCTTTCAC
CGAAAGGCAA
TAGATAAACC AGCTAAGCTT ACAATGGAAC CAATAATACC CGACCGTACT TATAAAAGAG AAAGGCGGAA TCCCTTGAGT GCATAAACAG GTCCCATCAT ATTrCCCTCAA TrCGAAAGAT ATGGTTAATT TGTGAACTTG TAAAGACTIGT AAATCGCTCT TTCCATATGA ACACTGATGT TTC'rCGATAG AGCTGATTGA CGTAATATAA G~rTCAATCC TGATTTGCCT ATATCATGGT TAAGTTCGGT TCTGCAAATT GATTGGATGT TGCTCCATTT AATAAATAAC TCACCCTCCA TTGGTTTGAA GTCTTATCAG GGCATCCTGA ATCAGGTAAT TAAAATAAAG GGATGATGAA AGCAGCTGAT CCTGTAAAGA TGTTCCCAAA ATCTCTCCTA ACCTGCTAGA AGGGCTCCAA CG'rCATACGG ATAAAGGCTG GATTCCTGCT AGAATATTGA AGGTGTAAGG ACTACATCPA TAATTGG;TGC TTTCTCATGC TCCATGTCTG GTGTTGGTAG GGAAGGCATC TAGCATTTTT CCAGTrGCTCC CTCCTCTCTG ATTGTTGATA GAGAGGATTT ATAACCAAGG ACAAGGAAGC GCCTATAAAT ATGAGAAATG CCTGGTCGCT GATTTCCAAT CTAAACCCTG AGCAT'rTTGT 438 GACGCCAA CCGCATCTAC TACCATCGGG ATGCGGATTG CTTTrTTCCTT CTCAGCTGAC CTTT'GCTTAG TAAAATCAAG AACTrCACGG AGACCCAAAA TCCCAT'rTGC CAGCATCTCA AGGGAACTAG AGCCTATAGG AAAGGGATTT TAAAGAAATA TCCCTGCACT TT'TTTAAAGA TAAAGGAAAT CGCTGTACCA ATCAAGGTTG 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 CCAAAAGA'rG AGCG4GAAAAT
AGGAAGCCTG
ATAGA?1'TCT
GGCAGGTTCT
CCAAGCTTTC
TTGGATAATT CCATCGCAAT ATCTGTAAAT GGTT=GAACC AATAAAAAGA AAAGCAGGTA AGTCCGATAG CTATTCTTCA ACTGTGCATG CAGGTAAAGA TGGCCCCACC TAAAGACTGT CCAAAAATGT AGTTCTTGAC CACGTAATAG
TCCATAATAA
GATAATT
TTCGTCAT.AT
CATCAGAACC
GCAAGCTTCT
CTCCTITATA AAAATAGACT GTTTTTTAG GAATATAAA.A CCGT~GAGCAG ATAGAGCTCT 'rTACTGTAAA TCAAGGGCGA 'IrGAGGGACT TGATTTCTTG
CTGAATGACA
CCCCTTTTGA
GCCGATACAA
GTGTAACTGA
AGCTAACAr
AAGAGTCAGA
ACTGATAATC
TGCTACTCTG
GACTAGTCAT
TGAGCAGAAA
AGGCTCTATC
TCAAGATGGT
TATCGACTGC
CAGAATCAGG
CAGCTTCCTG
AACCCTTCCA
TTGATTTCCC
CAGCAATTGA
GTTCCGGGCT
GAAACACTTC
AGTCAGATAA
GAATCTCCAA
GATAGACCCT
CTGATAAGTG
CT.GCTAT'M'
ACTCCTCAAA
CCGTCTGGAC
GACCGCTATA
439 GGAAAACAAT TGAATACCAC AATCAAGCCA GCCAAGTACA AGAGA.ACrC ?I'TTAGTGAA ACTGTcAcAA AGGCCCTCGT TCCAAGCATG ACTCCCCACT AGTTGGCAAA AGATGGTAAA TrAAATCGAC G=rAA'rAGAG CATAAAGAGA GCAA'rCGAAG GAATGAAAGA TG7TTCCAAG GGTCTCTGGG TrGCTACTT GACCATACTA AGATGTAAGT GGTrrGGTAA TGGTCACTrC CTCAATCCAA TAATCAACCA CAGAAATCA.A ACrCrCC'r TGATTCCTCT CCTCCACAAT ATCCAAACCT GCAAAAGGI' CATCTAGCAA CAGGAGCTGA AGAAT'rrTT' GCTGACCACC CTGCTCCAAA TCAAAATATC GTAAAGCTG TCCATCTAA'r TGAAGCTCCT CTCGCAGACT AACAACACCA GTCAGATCAC GATACAAACT AGTAATGCTC CCCTTATACT 71"TGAAATTG GACACCATTG TCACCCAGGA TACAGGAA-AT AAAGAGGGGG CGATTACCAA GCTCACCAcrT AGAAGCAACT TCCmTGAAG CAACCTGTGT CCTAGTTTT CCGTCTCTTA GCrCCACCAT ATCATGGTCG CACAAAATAA CTGTCTTCCC TATCTCGATT CTGCTCTTGC GGTCAATGGA AGGATTCATG GCAAAGAGGA CAGCCAGCGC ATGGATGAGA CCGTCCAAGA TGTCCTTGCA AGA.ATCAATr TCCTGAAGGT GATAGCCGAT CAAGCTCTCC A'IGGTAAATT GATGATTAGG ACGTTCGACG ATAGAAAGCT GACI'GACCTC GGGAAGAGAA CTAACTrGGG CAATCA'rTTG rAGGACCAAG
ACAGAGGAAA
ACTGCCTGTG
ATC-GCAAGTA
ATACAAAATG
GATAAAATCA
TCrC~CTCC CCTTGGGTAT TTTCACATGC CGAAGACCCT AGGGTCTAAA CGATGACTAA CCACTTGCAA AAATAATGGC GATCACGGAA GCC~rACTGG ACTTAATTGA TAGGGACTCT AAAAATCCGC TGATr'rCTTT GACTCGGATA AACTGCTTCT C'?PITTCrr TTCAGGACCG AAGAATAGAC CGAAAGAGGG CCCTTGATAG AATGTGAAAr CACACGG=rC ATATGGATA CATCTCATAG GA.AGGGATTT ATGGTCGATA TAGGCTTTAT ATCATAGACC AACrCTTTTA AGCGAAGGGC TCATCCAAGA
AGCGTGATAG
CAAAGACAAG
AAGCATCTCC
TGATCATCCA
CGACTACCCA
GCAAGAAGAG
14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 S. Sb 5 9
TGC TITTGC
ACGACATTGC
ATTT-TCCATG
TTTTCCCCAC
TGGACAACCT
GTAAAAACCA
ATTv'-GCAAG AGAATACCAA GCTCCCATCT ATCAGGACTT AAAGAGGCTG GATTT-rCCAG AcccAcTAcT cccAAcTAAc AAGGTAAAGG CTTGCGCATG AAAAGTAAAA TCAAACGGCT CAGAGAAGAT TGGGGACTGA TTGCAAACTG ATGATAGAGT
CAGAAATAAA
CTAACTTAAT
GAGAGAGCCA
ACGTACCACA
GTATTCATAG
ACTTTCATAG
440 ATCGCTCGTA GTTCCAGACC CATCTATGCT TIpCCTCAG TTGACAATGG CACGAACCAA GATGGTACAG AAGAAATAAA AGCAAGGAAA GGACAAACGG AACGGAAAAG GCGTAGTAAC ACAAAGCTAA CAAGCGTAA'r CCCAATACTA TTAGCAGI-rA CGA7'rCTTAG TT-ACCATAAA ACCAAATTCA CTTCCCAAAC AGAGCTCCTA GACCAAATTG GCTACCATAA AGGAC7rCAG CTTGAACAAA GCCAGACAAA CAACCGCAGC TAGCACTTCT CAA'rGGGCGC AGCCATACAC CAAGAGGTr TAAGAGTAGA CACCAAAAAA GATAGACAAG ACATAAAAAA CTCCTT'rT TrTGTATAGC AGACTGAATT AATTTGAATT T'rCTAATTGA CCAGTTTCGT AAAAGAAAAA CCAATCGTTG CACTTCCGAC CAGAGACCGA AGAGCATTTC CTGAGAATAT TATACACATA AAAGCAAGCA AGATAACATC TAAAGAAAAG TGAGGCACTC TAGAACAGTA CACAAGAACA TTGTTCGCA TCTTATTTCA ACTCTAATTA CAGATACAAA 'rCTCGGAACA
ATTGGCAAAG
TCCTGAACCA
?TTTAACTGC
AAGAAGACCG
CTAAAATATT
ATCTACTATA
AAGATGGCAG
GCCTGCAAAC
ACGAAAACCC
CATTTrTCA
ACCTAAATAC
TCTAGAAATT
TCATCTTCAT
TTAGACAGTT CTTTTCGACA TACGAAAAAA CTGTATCAGG TTCAATGCCT ATCATCTCAG AATTATGTGA TTATTATAAC ACACATTTA AGCCTTGCAC TCACAAAGAC AGCAGATC~ TGAATTAGCC ATTCAAGCTG AATCTGGACA CTTAGAATCC AAGGATAGAT ATCTATTGTT TTTTTGCATA CCATATTCCC GAAATGATTG CTTTAATATG TTTCGTCTGT ATCCCACCAA
ACATTTCACA
CCTAAACCZAC
TACTAGTTCA
TCI'1TTG4CAA
TAGCTTTA
CACTCATTTC
AAACGCCATC
TTAGAGTTrCA GCTTACAAGA ?TTrCCC1'TCG CCAGTCTTAA CCCAAATGTC 'N'rATTATT AGAAATTGAA CTGGAAATAC AAAACAAATG ACCTGTTTGA AAAAAGGAAA ATCCTACTTA CCGAACAGTT TTTTCTATAT CATATTGGTC TTTATAATGT 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 17340 17400 17460 17520 TTGCAACTA6A AGGCATTTGT GGCAATAGTT
TTCTCATCAA
TACCAAATAC
CTTCTTCGTT
.CAACACCAA'r
CTATATCGTC
TAGCTAGCTC
ATAAGCTAT
AAGTAGTCGT
TTCAAGACCT TCATAACCTA 'rACTACCACC AGCATCATCC =NACGG AGGCCCAACA CCTACATAAT CTACATATTC AACTGAT TGTGAAATT TC'rTATAGAA AGACCAATTA 7=rATCTGG CATCAA= CTAATTTCAT ATCATCTTGA CCTACATGTA CGCCATCGGC CTCAATrTCC ATTGC'TAAAT ATTAACGATA AATGGAACAT TGTAT'rTTT ACAAAGTTCT TTAATTGGA AAG7"-N-1rCT AAGCCTTCTA AAGCACCCTC ACcT~NTTCT CGAAATTGAA ACCACCTN' AAGGCTTCCT CAACGACTGT ATATAGATr TTTCCTTGGC TCCACAPLATA AAATATAGVT TTAGTAATTC TTTATGAAAC ATCTTACTTC
ACTCTTI'GA
ACTTTAAATG
ITGrAAACCA
AAGCTTGCTA
CTACCATTAT
ACTACTATTG
ACGCCCGCAC
CCAGCATTTC
ATTCCTTTAC
TCCCAGGAAG
ACACTGCTGT
CrACAGCTCC
GAATCATTAC
CGAATATTGAA
TATCTACTCC
CTCTAATCGC
441 ATCT'rCATCT GThATCTCGT ATAAAGGCAT TATAAATTCA ATGTCCATI'T GGACGTTTrr CTGCTA'11TC TCCAGCGATA 1-rTTAATGAT T~TAATTCTr GACCTTrr TAGTCCGATA TAATAAGCAT CCTGTCCCAA TGACTTTCGG CATCATAGCA CACTTCTCCA TTAACAGCAA TGGCATCCAC ?1'CACCTGTT cCTCATTT GCTGCTAGAG CAATTTCGTC AA'rATTATCT ?r'rAGATGCC ACATCTATTC CTACTAAAGA GGCAATCTCG TCCTAGTTTA TAATTTGA TTAGATCATC TGCTACr-rT' ACAGGCTACA GGATCTAAAA CTGCTGGGAC ArrATATTC 'rTGGTATAAT TTCCAATTMr CATCTGTCAA TGTTCCTATG ATACTTTAAC AAATCCTCTA AATCTGCTGG AAACTCACTC TGCTACTAAT CCArrTGCTG 'rGAAA'rTrT 'rACTACATCA TGGTGCT=T TC=rAATA ATT'TTAAACT TGTCATATTG ATACGATCTA CTAATTTCGA T'rTATCTTTA GTTGAGAATT A=TATACTC AATCAAAATC AAAGAGCAAA CTAGGAGGCT ACTG"rrCA GGTTGTGGAT AGAACTGACG 'rGGTTTGAAG i?1'CTATATT CTCCTG.CTCC TCTGCAATT TCAGAGCAGC TrrATTAATA AACCACCAGC ATG;GC'TGGTG AGGCGCCCAG TTGCTTATAC AAATGACCAA AAA'rCCTrCC TTTCACTTT TTTTTCATTT ACATTGAATG AACCGCAGGT TGCTCAAAAC AGATT'rTCGA AGAGTCTTAC 17580 17640 17700 17760 17820 17880 17940 18000 18060 18120 18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 18900 18960 19020 19080 19140 19200 19260 CTCATCAAAT TTGTAAATAT CATGACCrrTCTCTAGACAT CG'TAACCAAT ATCAAAA-AA.A GCTAATTCTA AACCGACTGC 'TTGATTCCAG CGrrGC1GAA GTTCTGTCAA ATCTTCTCGA TTTTTACCGA CACGATTGAG TTCGTCA.ACC AGAAATTGAA CCCACTCTGC AAAGAAAGGA CCTCTG'rGGA GATTGATCCA TTCCGAATGA ATATAGACTT CAGGTAAAGC CAAATCTTA GAACCCCAGT CTAAATAGAG ACCTTCTGCA A'rGACCAGCA TGACCAAAAG ATGGGCATAG TCTGATGAAG CCACCGCCGA ATACATTAGA 'rCCTGAAAGG CTTTTGTTAC AGGGTGCAAA GTCACTTCTA GATAGTCATT CTCTGCTACT T'rTAACTCTT TAAAAGCCTT TTGGAAATAA CCATCTTCAT CTGCTTCAAG AAAGCCTAGT TGCTTGGCAA AACGAAGCTT GGATTCAAGT TTATCTGCGT GACTACGCAG GCACCCAGCA TGGATAAGAA GCCATCAAAG AAGTGATAAT CTTGAATCAG ATAGTCCTTT AAGACCTTAT TCTCAATTGT CCCCGCAAAA AGTTCCTTAA CAAAACGATG A7TTGAI-rGCA GCCTGCCAAT CCTTCTCACT GCTTTTTAAT AATTCTCCAA CAGTCAAACC TGGCTGAAAT GCATAGTCTr G'rGTTTCCAT ATTTACTTCT CCTCTCTTTA CTTGTTAGTA ATTAATAAAA CACCAAGAAA TATCAAGCAA 442 AATCGTAATT CCACTTGATC CT'rAAAGC ACATCGAGAG AACAAGCCTA 
TCCAGTTTAT ATAAACAAAA AACTCCAA'TT ACTTACAAGA TTAGACCGTT CATTTCACCA TACGAAAAAA CAGTCI'AAC TGTATCAGGT TCAATGGGTA TTATCTCAGC c1TATTA'!rI'A ACTACTGAAC CAGTATAGCA AAAAATGAAA CGAAAAATAT CTTTA'rATAT AATATATTGA AACTAGAATA CATTGTTAGA AATCGAI-rrG AcCTCCTGA TTGATTTGTC TATAGTTC GATAGCAATT TATTCTTCCA ATACACGAAG GAGGCAATCT GNTATCAA TACAATTrrA AGTCACGAGG TGTA'PGGArr GTGACGGAGC TTGAAGTGTT TGACATCN'C ATTGCATAAC TGTCTTCAAT TCCGCATTCA AGTGTTCAAA TACCACCGAG ACCCAAGCCA TAGATCAC GGCGTCCAAT ATGCCAAGCC TTAAAGACG TCTTGACCAC GACGAACACC CACGTCTATC AACTGCTTCT GCCACTTCTT GAAGCGAGTC CGATTTGACG ACCACCGTGG rrGGTTACCC AGATACCAGA GTTCAACGTC CTCACCGCAT TGTGGTCCCT TGACATACAC CATTrGCAGA GAGCTAAC'rA ACAATCAAGA ATTAGAGTTG CTGTTCACAT TTCCCTTCGC CTAAAGCACC CCAAATGTcT GCCCTAGCAA GATATTTGAC GTACACCTCT ACrrATAAAA CTAlrCAT TTCAT?1-rAC AAAAACCTCC ACA~rCAGTG GTC.AACTGGG AAGGTTGGGT AATGGTCTGA GTTCC-AGACA GAC~wMACGC AcACCGACAC AGCAACCAAG TCTGCTCCTG AGAGTCALAAG ACAATCGGCA AAAGGCACCT GGTCCACCGT AGCTCCTGCA GCAAGCGAAC AGGAAGACCA GAGTATTCAG 19320 19380 19440 19500 19560 19620 19680 19740 19800 19860 19920 19980 20040 20100 20160 20220 20280 20340 20400 20460 20520 20580 20640 20700 20760 20820 20880 20940 21000 21060 CGATAAATTC TACA'rCGCGT GGAGACAAGC GTTGTTTAGC TGATTTGTAA ACAAAGTCCA T'rGATTTACC AGCACCTTCT GGCAGGTATT CTTCAACAAT CGGCATGCCA ACTGGGAAGA CAAAACCATT ACGCTTATCC ACTTCACGAT TCCCCCCTAC AGTAGCATCT GCCGTCAAGA CAATCGCTTT ATAAC=TCA GCCTTCACAC GGTCCATGAT GTGGCGGTTG ATACCGTCAT CCTTACTAAA GTAA.AATTGA AACCAATGAG GTGTCCC~rG GAGGGCTTCA GAAATCTCTG GAAGGTCAAC AGTAGAGTAA GAACTGGTTG TATAAAGAGA ACCAAACTCA TGCACACCAC GCGCAGTCGC CACTTCCCCC TGTTCATTTG CCAATTTATG AGCCGCAACA GGTGCCATAA TGATTGGAGA AGATAGPT TCACCTGCAA ATTCAATCTC TGTACN'GGA TTTCTACAT TGCAAAGTGT ATGAGGAACG ATGAGCTTGT GGTTAAAGGC ACGGA'rATTC TCTCTTAAAG TCAAAGTATC TTCCCCCCCA CTAGCGATAT AGCCAAATCC TGCTrAGGA ATAACTTGTT GCGCCATTGG CTCCAAATCA TAGGTATI'GA TGAArTCTAC ATGACCTTCT GCATTGC~rG ?TrTGTATGA CATAAAATGT CCTCCTTAAT AAGTAAGCGT T'rACTTTGTG TATTACAAAA ATATCTTAAC TCr''r'CAA 
AACTTTTAAA ATA'1rTGTT TGGAAATTTC AGAAATNrrA TGTCTATGAT AAAAA'rCCTT ATAACGGCAA TAAAAAATAG ATATTATCCA AAGAAGATT 443 TAAGGCTAC AATAACTGTA ?rA~rrCTAG ATGGGAGGTT CTAIITCG ATTGATCCAT TGTTGAACAA TATCTACCAC TATATCAAAA GGCATC1'TT CTGACCTTGC ATATrGCAGT TTGGGGAXTT TGGGATCCT TTCTGcTCGG rrTAATCGr AGTATCATCC GACATTATCG AATCCrG=r TTGGCGCAAG TAGCGACAGC CTACAII'GAA ?I'GTCACGTA ATACG.CCCCT I'rIGA'rTCAA CTC7TCTTTC TCTACTTCGG TCTTCCCCGA ATCGGATTG TCCTATCTT- AGAAGTCTGT GCAACC?1'G GGCTTGTC?' TAGGAGGC TCCTATA'rGG CAGAATCr1-r CCGAAGTGGG CTGGAAGCCA TC-AC;TCAAAC GACACCTCTA CAGGTCTTr ACTATGTGGT CTCCrAG1' GCCAATCTC-A TTNrCCTTAT TTTGGCCG.AC CTC-ATGTACG TCGCCAAGGA TGCGCTAGCT ATGTTGGTAG TTGCTTATCT TAGCTGGATA GAAAGGAGGC TCCGCCATC GAAATAATCT CCTGAGAATC rrACAGGGAT CTGTCCTCTT ATCCATGA'rG T'rCAGAACAG CCAGCAGGAG ATTrGGCCTCG CTATTGGTCT TCTTCCGCAA GCAACAGCCG TGCCACTCCC CAAGGAAACC TCTGT'ITTC CAGCAGTGGC T'rTGATTGGT CTCTACTATG AGACAGACAT AATCATGCTG CTACCCATCT CACTCGTCTT AGGATTCGGG AATCCAAGTA CTCTTT-CAAG TGGGCGTTAC GATTGGGATA TCCATCCTGT TCATGGGAAT CATCATGACC TCCCA71TCTA
GAATCATACG
TGCTACTCTT
AGACrrCAGC
GTGGAGCTAT
ATTTTTAACA
CATCGTrTAC
TATTATCGTT
CACTTCTCTC
CGAT=GATC TGGAATTTAT CCGTATCA'rG CCCCAGCTGG lrTGGCTTGG CTCGAAACTT TAATATC:AAT ATCTCAGGTG TTTACCCTCT GGGGAACAGC TGAALATGGGA GACTTGGAC CCTAAACATC AG7MrAAAG TGGACAGGCA CTCGGCTTGA 21120 21180 21240 21300 21360 21420 21480 21540 21600 21660 21720 21780 21840 21900 21960 22020 22080 22140 22200 22260 22320 22380 22440 22500 22560 22620 22680 22740 22800 CTAATGTTCA ACI'TTAC'rAC CACATCATCA TCCCACAAGT CTrAAGAAGA CTGCTACCGC AGGCTATCAA TCTTGTCACT CGGATGA'rTA AAACCACTTC ATrAGTrGTT TTGATTGGGG TTGTGGAAGT GACCAAAGTT GGACAACAAA TCATCGATAG CAATCGCC'rG ACCATCCCAA CTGCTrATT TTGGATrTAT GGAACCATTC TAATCTrATA TTTCGCAGTr TGCTACCCTA TTTCCAAACT ATCCACTCAC ?1'AGAAAAAC ATTGGAGAAA CTAAATGTCT GAAACTATCT TAGAAATCAA GGAACTAAAA AAATCCTrCG GAGACAATCC CATCCTCCAA GGACT'rTCTC TA( AATCAA AAAAGGGGAA GTTGTTGTCA TCCTAGGGCC ATCTGGrGT GGGAAAAGTA CCCTCCTTCG 7TCCTCAAC GGCrrAGAAA GTATTCAAGG TGGAGATATT CTTCTGGATG GTCAGTCTAT CGrrGAAAAT AAAAAAGATr TTCACCTAGT TCGCCAAAAG ATTrGGCATGG TCTTTCAAAG TTATGAACTC TTTCCCCATC TGGATGTCTT ACAAAACCTC ATCCTAGGCC CTATCAAAGC 'rCAAGGAAGG GACAAGAAAG AAGTAACGGA AGAAGCTTTG CAATTACTAG 444 AGCGTGTCGG ?TTGCTGGAT AAACAACATA GCTTTGCCCG TCAATTATCT GGTCGACAGA AGCAACGTGT TGCAA?~TC CGTGCCCTCC TAA74GCATCC AGAAATCATC C7rTTGACG AGGTGACTGC TrCGCTGGAT CCAGAAATGG TGCGTGAGGT GCTG-GAACTT ATCAA'rGAT'r TGGCCCAAGA AGGCCGTACC TTACTGACCG GAq-rATCTTC CCTTCT'rTAC CAATCCGCAA GCCAATTCGG CTCATATCTA TGTTTTAGCA CTTGCCTTTG TGGTTCATCC TCTGGAAAAA CGGTGAACTG CGAATCCCCG TG~CrC CAAGCCTACG GGTG3TCAAGG TrAAATACAT AACAAGGTAG ATATTACTCT GATTTTGCCC T'TCCATATAT ATTACAGACG TCAAACAACT GAGACTTATT' TTGAAAAGAA GACTCTTACC AAGCTCTTCT GTTCTAGCTT GGGCGCTTGA CCCGATACCA TTGCCCCAGC AAAGATATTG AAAAATTAGG ATGATTTTAG TAACCCACGA AATGCAGTTT CTCGACCAAG GGAAAATCGC TGAAGAAGGA ACCAAACCAG CCCACGAATT TTTAAACGTC TAAAGGAGAT TCTTATGAAA CTATTCAAAC CCCTTATCTT TATCACTGCT TGTAGCTCAG CAACTGCCAA AGCTCGCACT ATCGATGAAA TGTTTGGAGA TAAAAAACCG TTTGGCTACG
GCCCAAGCCA
ACAGCTCAAG
?'MACITA
CACTCTTAAC
GTGGAAACGC
TCAAAAAAAG
TTrGACAATGA
CTACGATATT
TTCAGTCGAT
TGCTAACTTT
GAAAGTT1'CT
TGAAGGTAAA
'rCATCCAGAA
TGACGGACGT
AAATAAAGGA
AGTTCAAAAA
CAAGGAAAAC
GAACTAGGGA ACCAACTAGC TCAACACCT GCTGCCAACC GTGCGCAATA CTTGATTTCA ACAGTAACTG ACGAACGTAA GAAACAAGTT CTGGGTGTCG TATCACCTAA GACTG4GTCTC ACCTTAATTIG TCACAAAAGG A.ACGACTGCTr 22860 22920 22980 23040 23100 23160 23220 23280 23340 23400 23460 23520 23580 23640 23700 23760 23820 23880 23940 24000 24060 24120 24180 2,1240 24300 24360 24420 24480 24540 24600
ATCAAACTCC
GGAGATGCCT
TTTGAAGTAG
GGCAACCAAG
TTCTTCCACA
GATGACCTGG
ATTTTAAGCT
GTTCAATCTG
CACCCAACCT ACGGTGACGC TGCTAAAGCA GATTAGTCAT TAACTCTTAA AAGGAACTGG TACCTATAAC ATCCTGAGTC TATCTAAGAT AAAAATACGA CCAATACAGT ?NTCAACTGA CAATACCAA GAA'rTACTTC CCTCGGTGAT AATTGCTAGA CTTCATCAAT ACGCCTATGA AAAGACACTT TTGTTGAAGG TGGAAAAGTT CCAATCCCTT TTwTAAGA'TTT AACACAGTGT ACATACTTTA CCTCTTCACT ATGACTAGCA ACAATGTI'CT CATATTTTr AAAGAATACA GGGATTCTCC CCGAATAAGT TTTAACTrrC CATTGATTTC CTGATTACCA AGCCACTATA ATAATTTATA TTTTTCTCTC AGGAACAATA TCTTCTAT TG CATATACTTT' ATCACATAAG ATACGAATAT ATCAAAATTG TTGTCCCTTT TTCACTAGAG AGCTTTCTAA ACACTTGATT TATCCAAGGC ATTCATAGGT TCATCTAGTA ATAATTGCTT GAGCAATCCC TAGCTTTTC CTCATACCTA TGGTCTT= GCTCATATAG ACCAACTATT TTCAGTGTAT ACTACTCCTC GTATGCTTGC CAAATATTGT AAATTCTTAA AAACCAGGITr C7rCAATCAA AGCTCCCAAA TTAGCTGGAA TTTTCCCCAT TGATTAACAC TTCTCC.ATAA AACAATACAC TTTTCCCTGA GCCATTCGCA CAACTAAAGT TAAGGTTTI'G AAAAACACAT AATGTAATrA TPTCAITCAT TCTATAAACC r TGAAAAAG AAAGACTAAA AATAGCAACT TCCCTCGATT CAAAATATAA AATAGATAAT ACACAATCAT GAGTAAAAAG AAACTAACGC GACGGACTAT ATAAACCAGC TATTAATTTA CCAGTAATITC CTATAATTTC CCCCTGTTrA GTCTTTT~PTA ATrTCAACTC AATATTTTTT TCCTCTT'G ACGAGTGAAA TAGAAAATGC GAAGAAATAA ATCTCGTCCT ATATCTCCAT TAGTCGATT TCCTACAAAT AGACCACCAA AACCAAAGTT CG 24660 24720 24780 24840 24900 24960 25002
INFORMATION FOR SEQ ID NO: 49:
SEQUENCE CHARACTERISTICS:
LENGTH: 11443 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
CAGGTACGGT
TTATGATAGT
TAAGTATAAA
GGAGGGCTGG
TTAAAAAAAA
GAGAAGTCAA
CCAGCACTTG
CCATTGCTTT
TCTGCTTrAA
AGCTCAATCT
?I'AATCCTGA
CTAGTGATGC
AAAGTGMT
GAGGCGCAAC TAA.AATATAA GAGCATTGCC ATTGATGGAC ACATGCGATC TCCTTCGATT CAAACTTC CCTTGACTAG TGATATAATA GAA'TTATGG GGAAAGACAG GCTGAGGGTT GCAAATCGTT AAACGAAATG GGCTCTTGCC TTTGTGCAGG CGCTrTT'ICT GGGATrGTGA CATGACCAAG GAAAAGGTCA AGAATrAGTG CTAGGAGATG CTTGGN'TG GAAGGCTTTG GGTGCAAAAG GAAGTTGACG TTrTCATCTT GATTAGGAAT TTTrATCAGTA CATAAGAGCA ATACAACTAA TCCACGCAAA GTTrCTrGT TATTATTATA CCTTATCAAA ATACATATTT AGGATGAAAT TAGAATTCTG ATAAAAATAA GATTATGGGA TTAACCCAAA TGGTCAATGA CTTTACCGCA TCAGCCAGTA TCTTTACCCT TTrTAACGCT TTGAACTTTG CTTGGAGCAA TCTGGTCTTC TTTGCTGrTA CCGAGCTACG AGCCAAACAC ATG-GTGGACA AAACCATCCG TGATGGTCAG GAAGTTCCTC TCATTCGPT GTCTGCAGGA GAGCAGATTC CGGAAGTCAA TGAAGCCATG TTAACGGGAG CTTACTTTT GTCAGGAAGT TTCCTAGCCA
TCGGTGCAGA
GTGGGTCAGT TTTATCTCAA GTTCACCATG TGCTTGAGGC TAAGACCGTT AAACCCATCA. ACTCCCGTAT TGGCTGGTTT TACTGGGAAG ATTATCA'lrCCCTTTGGTCT
CA.ACTATGCT
CATGAAATCG
GGCTCCTG
GCCAAACTCA
CTGGACAAGT
CTGGAAGCCT
TGCTTTTAAA AGGCCTGCCT CTCAAGTCAT GAATGTTGCC TAAGGGAATT GCCC77rTGA AG7TrGGGCTT GAAAAAGGTC TTGGTGCAGG TGGATATGCT CTGTCTGGAC AAGACGGGTA CTGTTCI'CC GTTGACGGAA ACGTATGGTG ACATGGCCCA TAGTGAGGAT AAGAATCCAA GAGATGTTGC TTATCCrATG ATTrCCAATC CTATGGAGTT AGAAGGCTTG GGGACAGT CTGAAGTCCC AGAAGCTAGG GAGGCC7TG TCAGTCAGGA GAAATTAGAC CATCACAAAC CCTTCCTGGA AATCI'GGAC CCC-ATTCGAG GTTCTCAGGA GGTGGGACTC AAGATTATCT TTGCCCAGAA GGCTGGT GCGGACTATC A'rGAGGAAT'r GATCGCCATG GCGGACGAGA AAAAGAAACT CATCATCCAA ACGTTGAAAA ACGGGGTTAA TGATATCTTG GCCCTTCGTG GGGATCCAGC AACCCGTCAG AT'rGCCAA'rC TTCCTGAGAT TCTCT'rCGAG GGTCGTCGCG 'TTrCTTGAT AALAGACCATC TATTCCTTCC TACTAGGTCG GTCAGAGTGG ATTTTGATTT TTGACCAGTT' TGTGGAAGGT TTCCCACCAT CTGTTGAGCA GAAN'TCCTC AGAAAATCCA TCGTCTTCAG CGTCCTGTTT GTGAAAATGT AAATCTCAAC TCTACTCTAT TATCTCTTGG CCTGCATGCC ATTTACCCTA TGGCGTGTCC TAGCCACAGC TCTCTTCCCA AGAATT'CAAA AAACGTTGCC TGTTTATGGT GTCATGATG'r GTCGTTACCA AGCGAAAAAA TAAATCAAAA GGCTATAAGC CGCTTCTACC GCCCAGGGCC CCACTT'ICCC GAGCAGGTGC TAAAGCACCT 446
CCGTTGTAAA
CCATTACTTC
AGATGTACrC
CCATCACCCA
AAGAGGCTAT
CTGCCCAAGC
TTCCCTTCTC
TCTTAGGGGC ACCTGAGATG TTGCTTGATT AGAGAGGATC ACGTGTCTTG GT =rAGCTC
CACAGAAACC
AGGGAGCAGC
CTGGTGACAA
ACAGCTATGT
CAGCTATTITT
AAGCGGGACA
AGGCGCATTG
TGGTTCTCTT
TGGTCAATAA
'rGTTAGCAGT
TCCCCTTCAT
TCGTTCTGAC
TGCTTCGTGC
TTCGCGCGAG
GGTCAATTGG
TCTTGATTGT
AACTGCTTGA
TGGTCTTTAC
CCACCAGTGT
AAAGCCCCAC
TAGTTrACTTC ATCTGATATT CAGGCTCTAG AGAGACCCTG GACTA'rCTCC TCCAGTTACG GTGT.CCAGCA AGATTrGCTCA AAAATCACCG CGGACGTGTT TCCCCTCATC TACAACGGCT ATGACAGGGG T=CATCGTG ATGGCGGAGG GAACTCAGAC TTTAATGATG CAT'rGCCCAC ATCGCCCCGA CATCTGTATT GCCAGTGCTT TCCGATCCAG ATTACCATGA TTTTGAGCGA AATATCAAAC CCTACCAAGC GCTCTCATGG TCAAGGNTGG TCTGAGTTAG TTTCTTATCC GTATTTAGAG 'rTGGTCAGTA GGAGGTr'rCC AATTTCAACC TTAACAGAAC CGTGATTTTC ATCCTGACCA GAAcTGGTGG TTT-GTrCTGC CGAAATAGCT TCCTCGCGCA CTCTTAT'TA TTTCGCCAGT CTCGTCGACA GCTCITTG GCTCTTGACr GCAGTGATTA TG7?TGAGACC TrGGCG=GG ACGAAAGATG CAGGTGGAGG TGCCAGCATC rTTACTAGCT CATTCGCCAG CGTwrMG GAGCGACCGC AAGTGGGGGG 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 447 AAACCGATCT ACTGACTCrA ATAACGTGAG CTGGTCTC~ AC'TCTGTC-1r CTTGTAKI-rG ATTCTGAATA TATTCAGCTA TCACTT'CTG ATTACGGCCT ACCGTATCTA C-ATAATAGCC TCTACACCAA AAcT-rCcGA5T TGccpATATTT GTATTTAAA TTCGCATGCT 'rATCAAAAAT CATCAAACTC CTCTTGCCCT TTAAATACCC CATAAAGGAC GAPLACACTA-A GITrTCGGACG AATACrGATA AGCATGrGAA TATGGTCTGA ACAAGCATTC GCrrCATGGA TT'ATTACACC CTTACGCTCA CATAAGTCAC GTATGATTCT TCCGATACTA CCT=GTATC TGCCATAAAT GA'rTGACGA CGATATGG GTrGCAAAAAC AATATGATAT TTACAATTCC ATGTGGTATG TGATAAACTT TGATTATCCT CTCTCATGAG GTACCTCCTG TATGATATGT TGTAGTGGCG GAGAAACCAC TTCTATC'rTA TCATTTTAGG AGGTTCTTTT TGTTACCACG C'rAAAAGCTC TATGGAAcCA CTAGCATAGC TAGTGG=TI CGGGAGACAA CAAGAAAGAC TGCAATCTGT GGATTGCAGT T'rTTTATACC ATGGATCTAT CGTAGATCTG ATGTGCAAGG CCTACGTGCC GATCATCTAT CGGTGAACCC AAGAGCGACC CTCAAGCCTG CrTGGATTGA GGTAATAGAT TCAAATATCT GTAGTTACAC TATT'rGAAGT TTGATGTAAG AAAGAGAAAG CGACAGA'N'G AAGTAATTTT AACTCTCTTC TATTGCTAGA AATGAAGATG CTATCTATTG TTAAATGGAA T TA'IrTCI ATCAAA'rACG AAAAGCAACT AGAAAAGGTA AAATGATTTT GGCATAGTG.A ATAAAAACAA AAATGTCCAT TGCAAAGGAC GATATAA'rGG ATTCATAAAG GAGGTGTATC TTGTTGCTCC TTTTGCGAGA TAGTAAGGAT TTAAATTCCT CTGATAAAAC GCTTTATCGC GTAGAAGCAT TCA1-rTrATC TGAAAAAGGC CTCGTGGACG TrTGATGGGAA TTTTACAGAG TTACTAGAAC 
arCTCTTGTT GACTGC1'CCT .GAATTCTACG TAAGCGAGTC AGTAGTACTA GCAATTTATG GGTTAGATTT AAAAATGAGA GCTCAAATTC GTTCAGCCAT TCTAAATCTA CAAATTACAC AGAATAAGGT TCAGCCTCTT TT-ACTGAT-rA CACTTGAGAG AGAATTGGGG ACAAATGGTC GGATAGGTTG CATAGTGTTA ?TTATTAGAA TAAATATITTC AACTAAAATA GGTTCTGTTC TATTTGATAT AAAATGCGAA GTATATTATT
GTAGTTTGAA
AATCGTTTGG
GATGTTATGA
CATAT'rTTG TT'rTGA.AAGC 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500
GTGTCTAGAA
TATATATCTG
CTTCCAAGG
AGAGGTTTCA
GCTrrTGATC
AAGCCACN'
AAAGATCGTC
CAACGAAAGC
CTGCCAATGT
GACGGAGAAC
GTAAACATTC
AACAAGAACA AATGGAAACG CTAAAGTA'rT GGGAGAAAAA CAATCAACAA AGATTGTCCG AATTAAATCC AAGAAGTTCC CTGAAGTAAG GCGTGAAAAA CTATTTATGA TTTAGGAGAG AGATATTACA AGAGAGTCTA TTTTTATTGA TGGG4GATGAG TTAATCAGTT GGATTTAGAG TTGCTCACTT TTGTTTGGGA CCTATCCATA TAATATAAAT 6 ATTTTCTCTC ACCTGTATAT TTTTATCAGT GTAGCACCTT CAAAACCTAC TATTGTTGAT ATTCAAGAAA TTGAACAATA TTTTAGGATG TATCAATACG 'TGcTATcTTC GAGATTGCAA TCTrCAGCGAG TTTTAGATGT CACTCATTAC GAGAT'rGAAA CGACAGATCC TGACTTTGTI' AGGAGATTAG ATAATAGAGT ACAGATTAAG TATCCTAATC TGG'rTAAAGA CTTAACAACT TTTGCTICCT TGAGTCTGGA CGAGATTGGT GAAAAGCGAG CACG=CCTCT AAAAACAGTA GAGCTTTTAC GAGCACGA'N' AGAAAAGCAA GCTTATCATC AATTAGATGA GCTGATAAAT ACCGTAGCTT TGCAGGAACC AGCAAGTGTC GAGGGTGA'rA AACAACGTCT TCAAGCAAAA GCTTG'TCCTr ATGGATATAT CTGTTCAAAA TCAAATCAGC CTrCTTGTTT CTGAAGATAC TGAGAGACAG GGAAGTATAG AGGTTGCTAA CTTTCAACAT CATGTCTTAG TGATTACTAG GGATATCCAC TGTGTrGACC TTATTATCGG ATGTATTAAA ACATTGATGA GAAGACTAGC GrTTAACAAAA GAAGAATTAC GGGAGATAAT GTCTGATCCA GACCGATTTA CAGATGCAGA AACTATTGGT TGAGACGGGT TATGTGTCTG AGAGAGAGGC AGAAGGACAG ACCGGTATG GTTCTGCTGT GGAGAAGGCG GGGGTAGTCA .AGACCATTGA TGGGAAAGGG GTCAAAGTAA AAGCTGCTAG GGAGCATTTG AAGACCTTAT AAGTTGTTGC CAAATTAGTT CGGGCTCAGA AATAAGAAAA AATrTTTGGAG GGTATCCGTA 448 AGGAATCGT GTAGTACTAG TA'rrCATGTT GAGAAAATTT ACAGTGTCTG TCAAAAAATT AAGG?1'GATG CAGr'rGAGAT TGACTATCTT AAACCATTTr CTTCCGGGAA GcCTCCTrrr TA?=TAGCC =TTG=GAT GGACAATAGA GACTTGGCGA GTCATATCAG TCCCTTACTG AATAGTCTTT TATCACAAAT TCTTTTAACC ATT'rCTAAAG AAGTcAGTCTI AGTA7TTGGT TTTCTAGTCT TATA'rTrrCC ACGGTTTCAA GTGATGTGTA CATCAGGTGT CCGAACTTCA TTTTMCTGAAT TGGATATTAT TGATGTAGTT CTATATCCAG ATTTAGATTT CA'N'GTGACG CCGTT-TGTCC TAGTTAGTGT TTT'rC'TAACC ATI'CAGGAGA TAAACTATGA ATAATCr'riC TCCTCAAGAA GCCTACAAAG AATITAGCAAA AGAAAAAATA GAAGAGCTTC TATATTACCG AGGTGN'CTT CTACCACATT GTGAAGGAAA ATTAAAATCA CCTATCAGAG AATGGTCGAA TTT-GGCCATT GCAGTATCAC AGGACAAGTC AGA'rGAATCA TTCATAAATC AATTAAAACA ATATGGAAAT CAAAGATATT CTTAATGTGA GCAAAGAAGA GGTT'TTTGAG GCATTAGCTC ATAGAGACCA ATTTATCGAA GGTCTI'TATC 
GGAATTATAT TGCTATTCCC CATAGCAACA 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5 4-00 5460 5520 5580 5640 5700 5760 5820 5880 -3940 6000 6060 6120 6180 6240 6300 p* TAGCTATAAA TCACAATGAG TTGTACTCTT 'rGCAGTTGGT CACTCTTr-GC TCGAAAACTT CATCTGATGA TGTGA'rrGCA TGAAAATTGTI TGGTGTTGCA
ATTCCTTGGG
GATGATACAG
GGTAATGACG
GCT'rTTTGTT
GCTTGTACTG
TGGGAATTGC ccAcAcTT-AT A'rIGcACAGG AAAAATTAGA GAATGCCGCA AAGGTAGCTG 449 GACATGTGA'r TCATGTTGAG ACTCAGGGGA CAATAGGGGT AGAAAATGAA TTCGAGTCAAG ACGCAGATTGA TGCAGCGGAT GTAGTTArr AACC7TGA GGGTAAAAAG ATTATCAAGG ATAAACrGAT TGCrAAACCT GTTGAGAI'G AAATATATGT TGAAACACI' AAACTTAAAA ATTCCAATTrG T71'GTGTIGC AGGATTCTTA GTTCCTGACG C'rC1rTAGC AGGAAAArrC GGTAAAGCCC TTGGTCTrr GCCAGTTGTT GCTAACCCAG CGATTCCACC AGGTTTTGT GGGTT'rATCG GTGGTATCTT GGGAGarrAT AAAAAGGTCA AAGTACCAAA CTGGATTAAA TAGCAGTGA TGTTrAAr.ATT TCTGGTATGG TTCCAACAGA ALGTGGCAGTC AAATCTCCCA TTACGAAATA ACTGAAAATA TTTAAGGAGA GGTCACTTAT TGACAGCCAT TTCCTATATG GTTCCCATrG
ACTATCTGGG
AT-rGCTACAG
GTTGGTCTAA
ATAGCTG?
GGTTAA'rGC
ATTTATATTA
GN'AGCALAT GGGGG'rGGT ATGCTTTAGC AACTATGGGT GTrTGTCTTA C'rCGATTGCT 7GCCAA'TC TGTTGGTTCA TCTTGGTTCA AGCGATTATT CAACCTTGAT TArrCCTT TTCGAGCGCC TATCGCAGCC TTTGATTATG GCrAGCCTCTT TGG;TAAGTAG TTTACCAACT GGTTGACGAG GCGGCAGTTA TTGGAATTCT TATGCG G TGTTGACTTT TTGGTGAATA C-CCTACACC AAAAAAAATA TCTATACTCA ATTGTCAATA TTG= GAAGG ATTGCAACAG GTATCGGTGG TCTGCTGTGC CATTTGGTGG ATTTGTGCCT TGTTAGCTAA CCAATAAAAC ATGCAGAACC GAAAT'rTrGT AAGAGGGTAA GGATTTGGAC AAATTCAAAG -TATcGATArr ATGGALTGGcc CTTATTACAA AGCTTGGGAA G1'GCTTCAAA TGGD'rGATG CAGTGCTGTT GACTTTGGTG GCCCACTTAA TAAAACAGTC ACAGGCTGAA GGTGTGAAAG AACCATTGAC 'rGC ACAA AGTTGGATITT GGATTGGCCT ATTTrTATCGC GAAATTACTC AGAGGAAATC GAAACAN'GA AATCGGCTG'r TCCTATGGGG TGTAATTCCG ATTGTTATGA ATAACTTGGT TccAG.GTCTC 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 TGCTGTTGGT GGTCrGTTrT CrTTGACAAT AGTGCTTATG TTACCAACCA TGACTCGTCC CATTGTAGTC ACAGGACTTG TCTACGCGAT AG -rATGACT GTTGAAGAAG AGATTGA?1TT CGATGTCAAG AATTGAA'rTr TCACCATCI' AcGCAGATTAC Tr~rT=GAAT GA'rAAAGTAG GGGTGCrGAT
AOTAGCTGGT
TTTGAAAAAA
GTCAGATATT
TGATGACCAT
CATCTTATCA
A'TTTGTTCC
AGAAGTTCAA AAAATTAGTG ACACACCTrr1 CTTTTGGGTA GATCAAGTTC TCGATrrACA TCTGAATGGT CTTGCITTTC GTTTrGATTGA TGT'rGTCCTT AATCCTGAAA CACCTGTTTC CAATATTACC TT-GTCTCCTT GGTTCATTCA ATCAGTTCAT CTGATGGTCA CAGACCCAAC ATGTGAGTAT ATTTrGTATTC ATGCTGAAGT TAAAATTCAT GATGCAGGTC TAAAGGCTGG TACAATCITT CCCTACATTG ATI'rAC=TGA CAAAGCAACT ATTATGACTG
CTTGTATAAA
TGAGATGGATI
TATTTATG1rT
GGATATCTGT
G7"rTGAGAAG TCGTTrTCTT
GGGTGCTGCG
GCCAAACTGG
GTATGCTCTC
CCGGCAATCG
TGCGACATCT
GCGTTTTTA
TGTTATCGCT
ATCCAACQAAC
GGTTCTTCGA
ATAGGTCGCA
TCTAGAGATT
AAAIT'TATTA
GGCGTAGATG
CCAATGGC'1r
ATTAACCGAG
TTCATTAA
GGATCrAAGA
GGTCCGCTTG
GCTGCTAAGT
GGAGACGGTG
450 TAGATCCAGG =TrrGCAGGA CAACGCT'rTT TGGAGTCTAC TCCGTCAGCT TAGAGTTCAG AATGGTTATC ACTACATCAT GTCGTAACAC TTTCAAACAA ATTGATGTGG CAGGACCAGA GTGGATTATT TCG~rTGGAT GACGATATTG CCAAAGCCTG ACGAAGAAAT GACCGGAAAA ACAATGCCAA TCAAATAATC GTTAGGAGGA ATATATGTCA CTACAATCAG TTAACGCCAT CTATTAACAA ATCTAATTCT CGTCACCCGG GAATTGTCAT ATAGCCTATT TACAAAGCAC CTTAGAATTA CACCTGAGCA ATCGCTTTAT CTTGTCTGC GCGTCATGGAT CAATGCTACT CAGGGTATAA GGATGTATCC ATGGACGAGA T'rAAAAATTr CACCTGGTCA 'rCCTGAAGTG ACGCATACGT CTGGTGT'GGA GTCAGGGGAT T'rCTACTGCC GTTGGTTTCG CCCPAAGCAGA ACAACAAAGA TGGTTTCCCT ATT'rT'rGACC ATTATACTTA ACTTCATGGA AGGAGTGTCT GCGGAGGCGG CT'rCTTATGC AGGTCATCAA GCTTTAGATA AGCTTATCGT CCTCTACGAC TGGTGAGACC AAAGATACTT TCTCTGAAAA TGTTCGCGTC GCATACAGTT CTCGTACAAG ATGGAACAGA TTTAGCAGCA GGCCAAG= T'CTGGTAAAC CGAGTTTGAT TGAAGTGAAA ACCCAATAAA AGTGGTACAA ATGCTGTTCA AGCAACTCGT AAGTTTTTGG GATGGGATTA TTCTGAT'rTC AAGACAAATG TAGCGGATCG TTTGGTG'rCT GATTACAAGG TTGCT'rA'CC AGCTGGAAAA TCCCCTGTAA CCATTACTGA CTCTCAAGCA ACTCGTAAT CGTCCCAAGA CTTCTTAGGT GGATCGGCAG ACTTAGCTCA CTTACAAGAT AAATATAATC CATTAAACCG CATGGGAACA ATCCTCAATG GAA'rGGCTCT CTTCT7r'rGTT TTCTCTGACT ACGTCAAAC GCCTGTAACT TATGTCTTITA CCCATGA'rTC 'rGAACCAGTT GAACATTTGG CAGGTT1'ACG
TGGTGCACCA
CGATCCATTT
TGGTCAGGAG
CGAAG?1'GCT
AA.AAGACTTC
TGCTATTAAT
CTCTAACATG
CAATATTCAG
TCATGGTGGT
TGCTATTCGG
AATTGCCGTT
CTCAATGCCA
TCCAACGACA TCTCTGGA CGTTACGATG CTTATGGTTG ATTTCTACAG CAATTGAGAC ACGGTAATTG GTT-ACGGC'rC CTAGGAGCAG AAGAAACAGG GAAGTACCAG AGGAAGTATA GCATACGATG CT'rGGGCTAG AGTGAGATTG ACGCTATTGT CCTGTCTATG AGAATGGCTT ACAGCAGCAG TTTTACCAAC ACCTACATCA AGGCAGATGG TT'ITGG43GTAC GTGAATTTG.C TTACGAGTTT ATGGCGGAAC CTATCAGCCA T'rCAGGAGTT~ GGTGAAGATG GTCCAACTCA AACTTGACTG T'rATCCGTCC 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 451 AGCGGA'rGCC CGTGAAACTC AACTIGTCATT GTCTTAACCC GGTCGCTAAA GGAGCCTACG TACAGGATCT GAGCAATC AAGCGGCTTG GCATCATGCC TTGACCZAGTA CCACCACTCC GTCAA).ACTT GGTAGT TGAA GAAGGGACAG AC~rGGAA TCGTGTATGA TACCCCGGGA TTN'GATACTA 1"rATCATTGC TAGCTATCAA AGCTGCTAAG GAATTGGI-r TACAAGGTGG TGCCCTCAAC CGAACTATTT- GATGCTrCAAG ATGCTACCTA CTAAGACTCG TCGTCGTGTG GCCATTGAAA TGGCAGCGAC TGGTTTGGA TGGCGCCGTC ATCGGTATTG ACATCTTCCG CTGTGATTGA. TAATTATGGA TTTACGGTAG AGAATATCGT
TAAAGTACGT
CAAGGAAGAC
CCAAAG'rrGG
TGCGTCTGCC
TGCTCAAGTT
GTGGTATCTA
ATTT TACCAT
TACAAGTATG
CCAGCTCAGA
AAGTCCCTAT
GCAGATGTAG 'rGATAGACAC TAATAATTTA TCTACGAAAG AGAAACCAAT TACAATGAAG TAATCAGATG ATTGGTTAT TTATAGTAGA TAGTATACAC ATACAGCTGT 'rOTCAGACTA TAAAAACrGT AATGAAAATG AATAGAGTAT ACCCTGAAAC GG'TTGCGAAG TACGCrAATC ACTTTGCTAC ACAGCATCCA CAGATTGACT TAGGATATTG CTAAAATTAA AAAACC.ATA GTATAGGATG GACAGAAAAA ATCTAACTT TT-GGGGTGTT GTTCAGT'rCT ATGAACTTAG AAAACAAGGA GGGATAAATA ATTCTAATCT TAGGTACATG TTCGTCAAAA AAGGGAAAAA TCGTTACTAT AAAGTCTGAC ATGAAGGCTG GACTAAAGAT CGTACGATAC TTCTrAAC'rG GCTAGCACAA AAAACAAAAG GGAGAGTACC TGAGAGCGGA TGATCTAGA'r AGTTT CTTA ATCAATAAAC TAAGTTTTT'T GAAAGCTAGA GAGAAGGTCT TTGAAATGAT GAACTGCACC CCAAAAGTrA TTTAT'rATGA AATTAACTTA TGATGATAAA 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 TA'rATC1'rAG ATTAAAT'rGA
TCTCCTGATT
AGAGTTTCTC
TAC.AGGAAAA
GAATGCCATC
AGAAGCT'rTC AAATAAATTT TTCATCGTTA CGGAATAGAG
TAAAACAACA
TTGAATACGG
ACGGGTATAC
CTAAAAAAGT
AATGATTCAT
TCTCCCAAGT
TA'rTGT'rGAG
TAAGAGAACT
CCGATTGAAG GAGGAAAAAG ACAAATAAGA AAGACAGAAA TTGTTCAAGA ATTAATGACT GAGTTTrTCGT TAGATCTTCT TCTAAAAGCC ATTAAACTAG CTCGTTrGGAC CrACTACTAT CACTTGAAAC AGCTAGATAA ACCAGATAAG GACCAAGAGC TTAAAGCTGA AAT'rCAATCC -ATCTTTATCG AACACAAGGG AGATTATGCT TATCGCCGGG TTCATTAGA ACTAAGAAAT CGTGCTTATC TGGTAAATCA TAAAAGAGTT CAAGGCTTGA TGAAAGTACT CAATTACAA GCTAGAATGC GACAGnAACG AAAATA'lrCT TCTCATAAAG GAG 11280 11340 11400 11443
INFORMATION FOR SEQ ID NO:
SEQUENCE CHARACTERISTICS:
LENGTH: 5338 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
CCAATTACAT TATA'PTATCA ACTCTTTATC ACTCAGCCAA
AAATCGTCGA
GTCTCTCCAA
CAT'rGAGATC TTTATCCGCA ATCCCAAGGG TCTCTCTTAT GCCCGTCAGG TTGTCGAGCA TCCTGTCGCC CACCGCGAAC TCTTTAGCGT TGCCTTTGTC TCTTTGCTCA AGAAAAGCGA AACTCGGACT TGGGAGATTA TCGACGACCT CTTCTTAAAC AGTTACAACC GTGATGTT AGCCCACCAT CTCTTCACAG CGCAACCGCA
AACTGGCTCC
TGCAGTGCGA
AATCACCTTG
GACCCAGCTT
TTCGTCTCAA
TATGGAGAAA
CAAGAACTTC
AACCAAGATG
TATCTTTGTC
ATGA.ATGAGG CAGCCAAGCA GATTTGGAAA ATGAAATGGG ACCCGTGATG GCATGGAGTT CTGGACGAAC CCTATAAAAA CACTATGCCT TTGTGGTCAA TACGAACrCT TCCTTCGTGA CGCAGTGAGG TCGGGGTCCT CTGGATGACA ATCACCTGCT AGCAAGACCA ACCCTCTGGC AAAGAAAGAC AAGGTGAAAC TGTCTGATTr GGAGAATTTC CCTTACCTCA GCTATGACCA AGGGACGCAC AACTCCTTCT ACTTTTCAGA AGAGATTCTT TCTCAAGAAC ACCACAAGAA ATCCATTGTG GTCAGTGACC GTGCCACCCT CTTTAATCTC TTGATTGGTT TGGATGGTTA TACCAT'rGCG ACAGGGATTT TGAACAGCAA CCTAAACGGA GACAATATCG TTTCTATCCC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 ACTGGATATT GATGACCCGA TCGAGCTGGT CTATATCCAG CATGAGAAAA TAAGATGGGC GAACGCTTTA TAGACTATCT CCTAGAAGAA GTTCAGTTTG AATGATAAGA ACCAATATGT AGGCTAGCAA CAACCTGCAC ATTGGTTCTT ATTAAAAGTT TCCCCTGCCA ACTTATCAGC TAGC'TTGGGA A.AGAGAGTAT GGCTAGGTTC AACAAAATCG GGAGATTGAG TTCTCGTTTG TrTTTTrCCTA AATCTTTTTA GCCACTGCAT CTGGTTCTAG CAGGAAXCCGA TCAACCGATT
CCAGCCTATC
ATAGTTGAGA
TTTACTTATA
AAAACTTATG
TAATCTTGAC
TAAGATAAGT
TCC.ATCTGGG
CACATAGACT
AAACTTGGTC
TCGGCTTGGT
CCATAGGGCA
GCTGAGTAAA
GATGTTGATG ATATGCCCTT ATTCATCAGG GCAAAGGTAT GTCAAATCCC TCAAAAATCC GCGGAGATAA AGATCAGTTA CGAAAAATCC TGTACGGATT GGTCCTGGAT TGACTGTTGT TAAGTTCGAG TCCCAGAGCA TTTGAAAAAC CAATAGCCGC GACTAGACTT GCCAGTAGCT ATTAGACCTG CCATGCTGAC TGCTGCTTTC CTTCATACGA GCCGCAAGGT GACGAGACAG TGACCTCAAA CATCTGGTGA ATATCTTTAT CAGCAATCTG CGTAACCAGC GTTGTTAATC AAGACATCAA TCTTGCCATA CCAGAGCT'rC TAGGGCTGAA TCGTCGGTAA TATCAATTTC AATCAATTr GCATGGGAAT AATTTCCGflA AAGCAAGATG AGTrGGTCAT TGGCCAGGAG AGCTCCGGTA ATGAGAATAG TAGGCATACI' TCTTCCAACT CTTTGACCAC ATGGACA1-rT 'rTGCTAATAT CTTTTGAGAG GAAACGGGCA CCTGCTTCTA CCGCTAC 'rG TGCACTIGC AT7rTTrCAT CACCCTTGCC ATAAG'TGGAC AGACGCACAC 'rCGCACCCGT TTTTCGAGTG CGTGGCGCTG AGATATAGTC TGCTGCCTTG TGGCCGTTTT TGATT'rTACC A.AAAAGCGG TCAGCATCCA GCGTCCCTTC TAGATCCTTT GTGTrGGTCCA GCTCCTCTGC ATACACAGTG GAATCTTGGT CAAACTCATG GAAATGAATG AGGCTGGTTA AGACAAATGA CTTGATTCCT 453 GAGTGGCCc TTGACCATTr
TATCCTTTCT
TCAA.AAATTG
CTGATA'rGGT ATATTAGTTrG TCATGAAC1'A
TCCAAAA
A=TCAGTTC
AAT'rrTCCT TATTTCTACC TCTTGAGCTA GACCACCGCT GTGACTGCTA GA~rrCCACT TGCCAGCGTC TTTCTTGAC;T TGACTAGCAG GCG ?GGCA AGTGACCATC GTTACGAGCA GGACATCTGC ATTGAcAGcc TAGTGATAAT CTTACC'rGGA CGTCrTCCAA AACAAGATCC CCGAACGGAA CACCAGCAGC CN'GAGTTTT TGCA'rGACAC GATAGCCAAC ACAGAAAATA AA7"TATCGG TTTCAAGA.AT CGGTAGGGCA GACGAGAACC TGAGGTCCGT AGATTTCCAA 7T'rACCCAGA
TGACACACGA
ATCTGTCTGC
TCTCATTGC CCTGAAAGGC ACGGCTAGAA AGGAAACCTG GCAAACCAAA AATGTGGTCT CCATGCAGAT GGGTAArAAA GA=rGCTG ACC7,TACG-,G GTCGAATTGT GGrTTCCAGA ATGCGATTTT GCGTACCTTC 'rCCACAGTCA AAGAGCCAAA CTCGT'rAAT1 CTCATCCAAA AGTTTCAGGG CGAGACTTGA AACGTTGCGG GCTTTAGAGG GC1'GACCAGC CCCCGTTCCT AAAAATTGAA TATCCATTCG ATACTTTCTA ATTAATCAAT ATATAACATG GCTGTGCGGT TTTCCGATCG GAAATAGCGT TTGCCAGAAA AAGCAGCAGC T'rCTTGCAAT AAATCCTCT GGCTG'rAGCC TTTGAGACGT TTTCGACCAT CAGCCAATCT T'TCCAAATCA GTCAAAGCTC TGAGACTTTC TAGGCTGA'rA ACTTCCTCGT CCTCGACAGG CTTCATGTAA ATCTTACCAG ACTCTTCAAA GACTAATTGA TGGGGGAAAA TTTGCGCAAT rTCAAACAGC AAGTCATCCG AGATT'rTCTC CTCATTrMCA AAGAAAATCC GACCAAGGCC GTCACI'CTCA TAACAAAAAC CAAAGGATT ACCAGACAGA rrAAGCCGAA TAAAAGGCTI' A=rTCTAGG GTGAAACTTG GCrCAGTATT GTAAAGATTC AGTTCCTGAC TGAGTTCTGC AAAATAATCC GTCGCAGCCT GAGGACTCTT 'rTTCTGATAG AGTrCTGCAA AGTAGGCAI' AACAACACTT GGCGGAGGTG TAATAAGTGT 'rAACTGCTCC TGATCTGTTT TACCAGCTAG AAGCTGATCC AGATAGACCT TGTCCAGACT TGTATAACCT CCATACTTTA GAGCCAA6AGT ?rAATATCA GTCATAAAAT 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300
TCTTCTAACC
TGATAATCAC
GAAAAAGGCA
GCTTGCAAGT
GTAGGCGTGA
TCCATGTCCA
TTGCTAGCAT
TccATTTArT 'rNTCTCGGAA GTTCTTCCAG AATTGCAACA CTCGCAGGGT AAATGCTTCA TrTCACGACT GTCCTCAGAC AATCCTCCAC CAAATCCGCr GGTCTTTCA'r GATGGAGAGA CGATAACATG AACCAGAAGG 454
ATGTAGCCTG
CTCTCTAAA'r
AAAA?'TCT
TTGCAGAAA
TTATTATAAA
ACCGTTTTTT
TCCACATGCT
TAATCACTTC GCCGTCTTCC CATGAATCTr GTAGGACTTT TAA'rCTTATC TAGCAATAAT TGAGGGTATA TGGCGTTTGG GCG'rCAAGTG AGGAATATCT CATGCTCCTC GTGGTAAGGA TGCT~TTrC CAAGGTTGAC
?1'GAAACTGG ACACCAACTC TGTCGGCAAA TCTTGGATAA AGCCAACGGT ATCTGTCAAA GTTACTTGGA GATTGCCTCC CAGATGAATA CTCTTGGrrG TCGCATCCAG AGTCGCAAAG AGCTCATCTG CTTCATACTG GGTCTTACTG GTCAAGATGT TCATGATAGT TGATTTCCCA GCATTAGTA'r AACCAATCAA ACCAATC?1'A AAAGTGCTAG ACTCCAAACG T7I1IrCTCTG ACAGTCGCAC GATTT'rrCTC AACCACCTTG AGCTGGCGCT CGA'rATCCGT GATrrGATTG CGAACGCTAC GACGGTTCAG CTCCAGCTGG CTTTCACCAG GACCACGGGA ACCAATTCCC CC~gCCTGAC GGCTGAGCAT AATCCCCTGA CCAACCAAGC GAGGCAAAAG GTAN'TGAGT TGGGCTAGGT GG.AC7*TGGAG CTTCCCTTCA 'rGGCTTCGAG CCCGCATCGC AAAGATATCC AAAATCAACT GCATACGGTC AATGACCT'rA ACACCGAGAA CTTCCTCTAG ATTGACA'r'C TCCI TGG 'rCAGACGATT GTTGACGATG AcAGTAGTGA TTTcTTCTGC ATCCACCATA AGCGCAATCT CTTCCAACTT ACCAGAGCCG ACGAAGGTC'r 'GGAA'rCATA TTrTl'rCACGT TT=GTCTG'r AGCTATCTAC AACCACTGCC CCTGCCGTTT TCGCTAAACT AGCCAATTCT TCCA'rGGAGA GGTCAAAACT GTCCATACCC TGCAATTCCA CACCAATCAG CAGGACTCGC TCCTCTTTTT TCTCCGTTTC AATCATCTAA AAACTCCTCT ATCTGGCTTA AAATGCGGTC 'rTGTACACCA GATTCTCCAA TCTGATAALAA GGTGACCTGC ATGCGATTAC GGAACCAGGT CAGCTGACGC TTGGCAAAAC GACGAGTCGC CTGT'N'AAGA CTCTCACTAG CTTCCTCCAA GGTCTGCTCT CCACGGAAAT AAGGAAAGAG ?rCCTTATAG CCAATTCCTT TAGCAGCCTG TACATTAGGG GAATGGTCAA ACAGCCACTT GGCCTCATCC AAAAGCCCAG CCTCAAACAT CAAATCCACT CGGTGGTTGA TACGCTCATA AAGTTGACTA CGTTCATCAT CCAAGCAGAT AATCAGCGGT TCATACAAGG TC-TcTTGATT rrccAAATcc TGACCAAAAT GGGCAATTTC TAAGGCACGC ATAGCACGAC GACGATTAAA CTGGGGAATC TCAAGGCCTG CTTGATCCAC 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 6 CAAATGGGCT AATTCCTCAT CTG-AATATGG CTCCAAACTA CTCATGAGGA GTCTCCCCAC CTAGGTGGTA ACCTTCTAGC GCrCGATAAG CTAAAATCTC AAGCTCTGGA TATAAAGTCC 455 AGTCCCACCG GCGATAATGG CTAGCTTGCC ACGGTTGTGA ATACCCTCAA TAGTCATCTr AGC~rCTGAA ACAAAATCAA AAGCCGAGTA Ac.ACTCGGTr ATCTCTCTAA CATCCATrAA ATGATGAGGA ACAGCTGCCT GCTCT'rCTGG ACTAGCCTI'G GCCGTCCCAA TATCAAGTCC TCGATAGACT TCCTGGCTAT C'rCCACTAAC CAC'TTCGCCA TTAAAACGCT TrGCGGGG INFORMATION FOR SEQ ID NO: 51: 
SEQUENCE CHARACTERISTICS: LENGTH: 19446 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: CGGAAACCCA TCTAGTCTCC ATCGTTTGGG AGACCAAGCA ACACGAATCT TAGAT-GCTTC TCGCCAACAG ATTGCAGATT TAATCGGTAA GAAAAGCGAT GAAATCTTCT 'TACCTCGGG TGGAACAGAA GGGGATAACT GGCTTATCAA CGTGTGGCC 'TTTGAAAAAG CTCAGTTTGG CAAGCACATC ATTGTTTCAG CCATTGAACA TCCAGCAGTC AAAGAGTCAG CCCTCTGGTr GAAAACTCAA GGATTTGAAG TGGATTTT-GC TCCAGTTGAT AAGAAAGGCT TGGTCGATGT TGAGGCGTTA CAGGTN'GAT ACGGCATGAT ACAATCCTCG TT TCCATCAT GGCTGTGA.AC AATGAAATCG GCTCTATCCA ACCTATTGAG GCTATTTCAG AATTCTTGGC AGACAAGCCG ACTATT'rCCT TCCACGTTGA TGCGGTTCAG GCGCTTGCCA AAATTCCCAC TGAAAAGTAT 5160 5220 5280 5338 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 CTGACAGAAC GGGTGGATTG GTTGGCTTTG TCTATATCAA CAGGAGCGAG ATTATCGTTC GCCCTCCGTT TGTCTATGGA GCAGTGATTC CCCAACCTCT AACTTI'GCAC CTCATATTCT CACGCCTTTG AAGACTATGA GGAAAACCAG CCGGTACCTT CGCGACTTTC TCTAGTCACA AGTTCCACCG GGTTCGAGGT ATCTGGCAAG A.AGATTACAC CTCTTCTTAC AGGTGGTGGC GACAACTGAA AATGTGGCAG GGATTGCAGC GACAGCCAAG AAAGCTAGAT ATCTTTAGGA GCAAGACTGG GCAGATGAAG TCTGAACrAT CCGGATATTT TTGT=C AGATGAGCAA GAC TTrTGGA ATCAAAGGTG TTCGAGGTGA AGTCATCGTT TATTTTCATC TCAACAACCT CAGCTTGTTC ATCrAAGGCA GATTGCCATG GGAGTGGACA AAGATAAGGC CAAGTCAGCT GTGCGTCTTA CCTAGAC T GGAAAATGAT ATGAGTCACG TCGACCACTT TTI'GACCAAG TTAAAATTGA TTTACAATCA AACTAGAAAA GTAAGATACG AGCATTCATG CACTATTCAG AAATTATGAT TCGCTACGGA GAGTTGTCAA CCAAGGGTAA AAACCGTATG CGTTTCATCA ATAAACTTCG TAATAATATT 'rCGGACGr CAGATCGCGA CCGTGCCCAC GCTTACCTCA CTCTCAAACA AGIrTTTGGA ATTCAAAACT TAGAA.GTTTIT GAACTCTTCT CTCCAAGAGA CCTTTAAGAT TTCTAGCAAG CGTAGCGACC ACCAAACACT TGGAGGGGCT GTATTCGAAG GTCCTGACAT CAATC7rCAG GTGGAGATTC CCATT-CGTGG GGCTGGTGGT TTGCCAGTI'G CACGAGGGAT TGACTCACCT GTAGCAGGTT AGGCAGTTCA CTTTGCTAGT CCACCATATA ACTTGACCCG TAAATTGACC AAGT'rTGGCG CAGAGA'rTCA ACAGGAAATC AAAGCCAAAG GTCGCTr'rAT GATGCGGATT ACTGACCG;TA TCAATGGGGA AAGTCTAGGT CAAGTAGCCA A'rGCTGT'rAC CAACACTCCC 
ATCATTCGTC T'rGACA'rCGC CCAGGAAATC GATACCN'TG GTACCATTTT TGCACCAGAT CGTCCAAAAA ACGAAGCGCG TATGGATGTr GAAGGCTTGG CTGAAATCAC ACCTCAAGCC GAAAAAGATG AATTCAGAAA ATCCAAAAGA A'rAGCGAAAA AACACGTAAA AAACTAACTT 7TTTATTIT AGAGAGT=~ CTGACAATGA ATCAATCCTA CAAGGTTCCT TATACAGGTA AGGAGCGCCG GAAAGATACA GACCGTTCCT ATCCTGTTGT TACCAAAGAG TCTmCArTG GACATTCATC GGATATCAGT CGCATGATTG TCGTTGCTAT GTATGCGGCT TGGAAGT'rCC AAGAATCTCC TGTGGAGTAT GCTGAGTTTG TCATGGAGGT TACAAAAGCA GACTCCCAGC ATACGGCTAT CCAGTTTATC CTTTGGAAT ACCAAGACCA AcAAcTTTGA ACTTGATAGT CGI'GAACTCA CCATTCCAAA TGTGCTT CAAATGAAAA GTGAAGAAGC AGCCTATCTT TCTArGAAA GAACrrCAGG TAAAGGGATG CTCATGTTGT A'rCTTGCTCY TAAGCGTGGG GTGG;ATATCG CTACTCCTGG TGCCCI'CAAG AAAGCGCAGG GAAATATCCA GT'rrATAGAG GTGCCT'rTCA CGCCAGAAGC TTATrGATG ACTCTAACTC TTCGTGAGGT ACGAAATGGT TTGGT'rATCA CCCAAACCCT TGAAAGTATG AAGGCTATCA CTGTGGTTAC CATGGACAAG TTGGAAATCA ACATTTCAAT CCAACCG'rTT GAAGACTGTT CAAATCCTAA AATTAAGAAT GCG;GAGCAGT TTGAGCGAGC AGTGGCTGGA ATCATGATTA AAGTTGATGA CT'rGAT'rGAC AATCTCCT TCAGTAAAAA AAGTTAGr'rT TTTCTCTAAA TATGATATAA TGATATAAAA T GAATAT CTl=ATCTA AAAATGAAAG AACACAAACT TGTACGTArr CTTCTCCTA AAGATTATGA 456 TGTC'TATCTA TACCCAAGTT AAGGTAACAG ATGGAGCTGA TTACACAGCA GT'rGCAGAAT TrrCTCCTGI' TTATAAGOTT GAAAAA'rCTG TTATGCGGGA CATCTACAAG GAAGGTATGA 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940
ATACTTTCAT
GAAGATTATC
1'GACAATGAT
TATCCCAGGG
GGTCAAGCCT
GATTGGTTCC
AATrGGTTGC GACCGGCAAA ATGTTT'rAA CCAGCTATCA AACGAAATCC
GGTATGGGGC
CAGCATTTG
TTTATCGATG
TCACTAGGAG
TTG.GGCTT
GGATGAATGA
GTCGTAAGGG
AGACCTATCG
GCAATATTAC
TTTCATCTGC
AAACTGCCTC CACCAAGAAG CCTTTAACCG CCAGCCCATC TTCATCTATG TAGGAACAGA G;GATGGCAAT ATCAAACAAG CCTATATCGA AGCAGGGGGA GTACATCTGG ATAATCTT TGAAATCCCT TGGTCAGAAA ATCTACCAGA AGTTAAGAAA GGAAAAAACG AAATGCATAT TAACCGTGAA ATGTACCrrA ACCGTTATGG TTCATCAGGT GGTAGTCACA ACGAATACTA CTTATCGAG GAAGGCCI'G 'rCCACGrCTT GTTGGCTACrT GGAAAAATG CTCATGACCA CTAT 'CGAG TGCCAGAAAC TATCGCCTGA AGAAGCAGAT GATACAGACA AGACCTTGA'r
CTCGTCGCTT
GCTAAAAGT
TwTGTCTGAGA
TGAACATCT
ACATGGTGGG
TGArrTTGGC
TACCCTATCT
AGCGGAAATG
'rGCTATTACC
CAGTCTGGTG
7*TTTrGCAG
AGCCACTGGA
ArrCCAGTTG
ATCATTGATG
AGTTTGGATA
CACCGTGCCT
A'rGATTTGAT
CCATCCATAG
AAAAATGGTA
CTGGTCATCT
TGGTCTrGC
CCTGTGCTTC
GTGAGAGCTG
ACGAACGTTA
'rGGCATGATG
GCATCCAGAT
TGTGATTGAG CAGGCCATTC ACGACAGGTTI GCTCTATGGG GTCTTTACCA AAGTCATTGC TACTACAACG ATGATGCTAT GACGCTGGT TTATTGACCG GCCTVGGAAC AAGATGGTTT CAAAT'rCCAG CCTGGrrTGC CG'rAAACAAA TGCCTTATTT TACCTTGTTA TTTCTCCCTA AATAAAGGCA TCACAGTCIT TNGCGCAATA GCTTGACCGA AAACGTGCAG TTGCT?MrT AATGAATACT GGCTTGAGCT AAACCAGAGG ATCTCAAAAA GCAGGTGTTC CTGTGGTACC CTGAAAGAAA TCGGTCTTCC ACCTTTAAAC TTGAGACAGA ACCCTTATT TCTTTGAAAA GTGGACAAGG ATGGAAAGAT TTTTATCAAG CACAAGACAG GTTGGTTTGA AGCCTATCAT GCACI'CAATT TCTTCCTCCA 'rCTCAGTGGT GTTTACGACG CACG7'TTT TGTCGGTGAT TTACCAAAAC 'rCGCCAGTAG ATTATATT'rG GAACCAAAAC TTACCGTCAG GCAGAGATTG TGCTGTGTAC GGGGCTGGA GCCATCC'rTT TACAAGCTCA A.AGAAGCCTT 'rGACAAGAAA TGAATGGGGA CATGATGTCG CCCATGACTG GGAATGGTGG CCTCGGTAAT CTCTATTTAT AAAAGGAGTT ACCTATGAAT CTATCCACAA AAC'rTTCAAC AGTTTACCAT CGAACTAGCT GGGAATTGGT CAAGAGTI'?T ACGAGCAATT GGATGACCCC GTATTTTCGT GTTGATAATC TTGAGAACAT AGA'rGA.AGTC CTTTTATAAA CATGGTCCAA 'rTGGCCGCAT CGAGTCTCN'C AGACGCAACA CTCAGAGAAC AATTrCAATCT TTTGGTGCC GACGAAATAT AACTCTGAAA TGAAGAAACT TTTCAAAAAA TGGAGCTGTr A1'CAAGACGG AAGCAGATGT TGATCAAGCA AATGATTGCC AAACCTGATA ATGGAGTGGG AGCAGCCGCA AGACGATATC AATCACTTCA AGCAAGAATG GGACCATTCA ATrTGICACT TCCACAAA TCTGTACCTT 'rGACGGGCTC TGTCTTCTCA ACAACCTTTG ACTACGCCTA TACACCGCT 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 GACCTCATGA TTTATAAGAT CTGCGCAAGI' ATGGGGAAGC
ATTGAGTTCT
GTG-TA
GCAGCTATTG
GCTACTTCTC
TATAGCCAC
GATTACCTGT
TCCGTGAGGG
CCATTGATGT
'rCGCAGGAGA
GCCGTGCAAA
AGTTCAAGGT
ATATGCTGAC
0.00* 0 0 TTCGGACAAC GTCAAGAATA TGTTrTGTCT GA'rAAAAAAT ATAACTATTT GAAGCAGGAT ATAGTATTGT GTTTGCGTAT ATATAGTTTT CAAGATACCA TAGGAAGCTA GCCGCAGG'T AGTCAGTATC ATATACTACG GTATAAAATA TTCAGGTGAC AAAATAATGA TATTACTAAG AATTAAAAAA TAAAATCGAT CTT'rAACAAA TAT'IGAACAG AGCTTGGACC TGAAAGTGAT AAGATAAACC TGAATACCCT ATCGTTTAAT GACTCAGGAG ATACAGCCTT TTCACGATAT 'rTCAAAAGGC AATTTCTATC ATAATGACAA ACAAACTATT 458 GGACAATTCT TATTATGTGC TCAAGGATAT GGATCCTAAA AATTGTCAAA GAATTTGGTA TGAAAGAACG GNTTTCCAT GGACGATTAT ATTACCATCG AGTACAATAA CCGCCCTGCA TTATAACTTT GCTCATTCCT TGGACCITA TCGTGGCTAT GGAGTTCCCG GCGTCAGACT TTGAAACTCA CTA"rG'NrTG TGCTCACTAT GTr'rATTCAG AAGAGGA'rr? GCTTCCCAAA TAAAAAACTC ATGCCAGC'TG CCrTCGCGGA ACTTCAAGGA CAC'rCCGAGT CGACAAGAAA TGGAGCAGAT GATTGCAGAT AGAACTATCG GATTAAGGAA ATT-AACTCCC TTAATCCTT'r AAGAGCATCC CAACAAGGTA GC'rATCATAA AACTTGT'rCG TAGGTGGTCA GAAATTAAAT TTTAATATT1' CAATTGAGTC CCT1TAAATCA GCTAAAAGGA TCCATGACGA CACCTATACC AACAAGTCTA TTAA'rATTCA ATGAAAATCA AAGAGCAAAC TCTCAAAACA CTGTTTTGAG GTTGTGGATA GAACTGACAG GCAAGGTGAA GCTGACGTGG TTTGAAGAGA TTTTCGAAGA GCATAGATAT ACTTAATTGA AGCTTTGTTT GAAATCTGAT TrTAAAAAC TAAAGAAAAG GGAAGATATG ATTACAGGCG CAGCTGTGGG AAATTCTTTG GACAGAAGGA AACGCAAATC T'rCACTTATC TCTTATTTAT GAAAGATTTG GATAGTGTCG GCTGAATTTC TAGGGATTCC TTATGAGGGA GT7TTCCAA TGGTCAACTT TI'AAAAATAT AGGAGATGCT CAGGAAGTTT ATTTTTCCGT TTATTAAAAA TCTCAAGGGG GATACAGATG ATGCGAGAAG CTATTTTCA AATAAATAAA CCTGCTACGC TTAGATGTTT TTCCAACTAG GGGATTAGAT GTAGATTT'TG ACTGATATCG GAGATATCTA TGAATATCTC TTATCAAAAT GGACAGTTCC GTACACCTCG TCACATCATC GATATGATGG ATCAAAGATA TCATCTCAGA TCCCGCTATG GGTTrCTGCTG CGTTACTTAA AGCGTAAGAA AGATGAATGG GAAACCAATA CATAATCAGA PGMTrCA'rGG AAATGATACG GATACGACTA AACATGATGC TACATGGAGT AGAAAATCCA CAAATCAG=N 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480
TGTCGACCGC
TTGAG?1'GAT
GCTTCTTAGT
CAGATAATAT
TGTTGAGACT
AGGTAAAAAT
GCAACCGACT
ATCTGCTAGC
CAATCATTT
TGGGGCGATG
ACCTTGACTC =GTCTCAA A'rCCTCC1'TT TAAGGGCTCA TAAAAACCAA AAAAACAGAA GTGGACGACC AGCAGTTATC AAGGAATTCG TCAGGAAATT GrGGTGTGTr CAAGCCTTAT GTAATGGTGG TACTGACAAA ATGATAAGCG ACAACCGATT ATCTTGAAAA AGAAGCAGAA AGATAAAG4GA AAATGATTA'r AAGTTGAGTA TGAACCAACA TTCAAGCTGG CTGGCTGAA AAGTGAAGTT GGGGGAMGTC AACAAACAAC TCTAACCCAA GATAAr1GAAG
CTTGACTACA
?rACTCTTTC
GTACCTGATG
GTAGAGAATC
GCTGGAGTTT
GTCTGGTTT
AGCGACAATG
CGTCAGAGAA
GATTrGTCTA
GAAGTCATAT
AAGCCGATAA ATATACTG GTN'rAGCAA ATTCAACCTC TAATGACCTT CTTGCAACCG TTTCTC=? CTTGCGAACT TTAAAACCAG GTGTCCTIrT_TGGTrCGTCT ATAAGCTTGA TCCTGTAATC CAACTGCCAT TCTCATCT
AAAGCTCATA
TCAATGCCTA
ACAAAAACTG
6540 6600 6660 6720 6780 6840 6900 6960
ACGATATGAA
ATATTCCAGA
CGGATCAATC
TCAATAAATA
TAAAGAAAAT
AGCGGATGGT TTAAGT'rTGG TATTATCGAA CGCTTTCATC
TTGCAAAAAT TACTCAAGTA TTATCTCTAA AAAAAGGCAA CGTTATATTC AAATAGATGA
TAAAATTCAC
GGGATGGAGC
TTACGGTCTT
TCTT'TTTGGA
ATTTAAACAA
AGAACATTAT
TAGATGAACT
TATTGAAAG
TGAAAGTTTA AATATGACTG TAATGCAGGA ACAGTTGGTT AAAAAAGAAT GAGCGATACA AAGTAAATCG CAGTATTTAC GAATATATTA CTTGATTTAC CTGTATTCTT AATACGATTA AAACTTGCTC GTCAAATCCC CA'rTGATAAC TTA'TTGATA
AAGCACTCCC
ATGGATTATC
AAGAAAAAAT
GAGATCATTC
AATTAGAATT
AAAGGCTT AT
GATTTAACGA
T1TATAGATGG AC'rG=TATT
AATTATCAC
TT'rCTTT'GTT CCAGTTGCTC 7020 TAAAGAGATT GAGTATGAAA 7080 CAATGATTTA GAAAAAGAAA 7140 CGGAGGTGGC TGTATGAAAA 7200 GAAAGCCACT GTAC -rGCTG 7260 TTTAAGAAAT AATAATAATT 7320 AGATGATATTr CTCATAGCAT 7380 GGGCAGCTGTT GGTAGTACAA 7440 TATATCAGAT TACTTGGGAG 7500 AACAGGTGCA ACAATTCCTC 7560 GCTAGGTATC GAAGAACAAG 7620 TACTAAAAGA AAATrCAGT 7680 GATGTTTGGG GAAAATAAAA 7740 TGATAGGGGC AAAAA'rTATC 7800 TTTAAATACA AAGAATGTTA 7860 TAAAACAAAG GATAAATTAC 7920 AACAAGAGGT ACTGTTGGAA 7980 CTAAATCAGA TGAGTTGTTT AGTGACGAGT CTAAAAACGG ATM~CA'rTC GATACAAAGC
TTCGAAAAGG
ATGTAGCGTA
TAATATTACG
ATAATAATTA
TAAAAAAAAT
CAAACTTGAG CGT'rATGATA TAGTCTTGAC CTACGATGAA TTAA'rAAAAT ATAAACATTT ACGTATAAAT 'rCAGGTATGG TCCCAAGACA CCAAATCTAA ATCAGAAATT TATTATCCAT GTTTTAAGGA TAGTCGAGTG ATATCAGGAA GTGCTCAGCC TCAGTTACCA ATTACAAAAT ACTTCTCCCC C'rCCCCCCAC TAGCCCTCCA AAATGAGTTC GCAGACTTTG 8040 8100 8160 8220 460 TAGTCCACGT CGACAAATCA CAATTTGCTT GTGACATAGC TA'rAAAAGTC TGGAGAAATA GCTTGAAATT TACTATAATA TAGCTAAACT ATTTGTTTAA AGTGAGAAAA AAATGGGAAA TTTNTAGCTTT CTTTTAAAAA ATGACGAATA TGAATCTTTT TCAAAACC?1' GCATTGAAGC TGAGAATATG ATTGCTACAT CAACTGTGGC TACTCCT~r ATGGCGCGTC GTGCTTTAGA GCACGCTGTC CATrGGATAT ATAC'rCACGA TTCATATr'rA GAAGcTCCCT ATCGTGCTAC TCTATC1-rCT TTAGTATGGG ATGATGATTT TAGGGATATC GTAGATTCTG AACTCCACAA GCAGATAGTT CTGTTGATTC G'TGGGGAAA CCATGCTGCT CATGGTGGTG AAATTAAGGA ACGAGAAGCG ATrTTAGCTT TGCATCATTT GTATCAGTTT' GTTAATTTTA TCGATTATrTG TTACAGCAAT GAGTTTGTGG AGCCTTATTT TGATGAGAAG TGCTTACCAC T'rTCAGCAAA CATCAAATAC CGAGAAACTC CACAATC'TAT GATAAAGTTA CAAGACAGTT TACCAGAACT GCCTGATTTT CATGAACAGA TGGCTGCTCA GTCCGTACAA GTTCAAGAGA CTTATACTGA AAAACGTGAG ACTGCAGCGC AACGGCAAGA TCGTGcc~rrc CATATTCATC AA'rT.NTCTGA GGCAGAGACA AGAAAGCTCT TTATTGATAT CGATCTCCGT TITAGCAGGAT CGATATTTGA *AGAAAACTGT CGTGTTGAGA TAGCCGTTGA TGGTCTCAAG CACGGTrCAG GAATTGG'rTA *CTGTGACTAT GTACTTTATG GTAAAAATGG GAAAATT -rA GCGATTGTGG AGGCTAAAAA *AGCCTCTGTC AATCCAGAAG TAGGGGAAGT ACAGGTCAAA GAATATGCTG AAGCTTTGGA *GAAACATATC GGCTATCAGC CAATTTGCTT TATTACAAA1' GGGTTCAACC ACTATATACT ***TGATGGTCCG AACCGCCCCC AGATTGCAGG CTTTTACTCT CAAGAAGAAT TGCAATTAGT GATGGATAGA CGTCATCTTC AAAAACCGCT TGAGGATAT'r TCTAGTAAAA TTAGGGACGA 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 92 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 TATTTCCGGG CGTCACI'ACC AAAAACATGC TCATCGTAGA CAGGCACTTT TGGTTATGGC TTCTCTAGTT GATAT CTT AT CACGTCATAA TAGAACTTCC TTGGTTAAGC AAGCATATGA CGTTTGTAAC TTCTTAGAAG ATAAAGAACG TTATCCGACC ATGA1'TGGAG CGATTAGTGG CATTGCAAGC GCTTTGTGAAG CTTTCTCTGA AACTGGGCCG GGGAAAACTC GTACAGCAGT CTGGGTAAAA AACGTTCTCT 
TCTTAGCCGA TTCGTTTAGA A.AATTACTCC CAGATCTTTC AGCTCAATCA AGTCGCATGG TCTITrrCAAC TCAAGAAGAA GTAAATCAAC GCCCTTTCAC CGAATCTCAC CGTTCTATTT ATCAGAAATA TGTTGGGCAT TTTGACCTTA CAAGTCCATT TTTGATTATT AGATTTAGAT AAAAACACCT ATATGATTTG GAAGAGGCTG CAAACTGAAA CTACCTACGG
TCATAATTGA
TTGATGCAAG
CTCCGCG'rCA AATTGTAGGC TTAACAGC'rA ATGGATTCTT TAATTTGGAG AATGGGGTTC CAACATATGC TTAAAGACGG ATATTTAGTA GCCTATCATT CTATCGAAAC ATGG;TCTACA TTATGATGAT TTGTCCCAAG AAGAAAAGGA 461 ACATMrrGA'r AGCAAATTTG AAGACAATAG CTGTGAAAAA GATATTGATG GGAGTGTATT TAATTCCTTT-' A=rrCAATA AAAGTACAGT AGAAATTGTT TTAAATGAAC TCATGACAAG AGGAATTCAG ACAGCCTCGG GTGATGAAA'r TCGTAAAACT ATTATTT'rTG CTAAAAA'rCA TGA'rCATGCG GAATATATCA GAGGTAI-T- TAACAACCC.C T1ATCCTGAAA AAGCG.AGCGA CTATGCTCAG GTCATGXT ATAGTArrAA GCATTATCAG ACCTTGATT-G ATGATTrrAA AAT'rAAGGAG AAGTACCTC AAATTGCGAT TTCGrCGAT ATGTTAGATA CAGGTATTGA TGTACCAGAG GTTGTTAATT TAGTCTTCT GCAGATGATT GGTCGAGGAA CCCGT=TATG GGAAAACTTC TTGGTATTTG ATTATGGGGA AGATGGAGAG GGTCCTCACA TTGT=?CGCT CTTGATTCGA GAACrrCAGG GACTCCAATA TCAGCAGCTT GTCTCCGAAC TTCAAGGTCG GGTTCGTATG GTTTTAGATA CAG -rATAG AACTGCTGTT ACAAGTGAA.A CCArrCAAAA TAAAGAAGAT GAGATGGCGA GGAGATTTGA ACTGACAGCT AAATCTTCCA CTGTTCATAT TTCTGCTATT GGCAATATCC CGCAGT~rr CAAGAAAGTA CGCTCTAAAA CTAAGTTTG TAAAGATT'rA TTGGACCTG AGCAGGATAA CAA'T'rTGAT TATTTTCGTG CAGATCCAAG GACTCAGCGT 'rTATTTAATA TCAAAGTGGA CCAAGAAGAT CAGTTTGCGA GAGCATACCG TATAGAGAGC 7rAAA'rGAGT TGGACTTCAG CTATAGGAAA TTGGAAAGTT GGCAGAATCT AAATCTCTCT CCGC7=TAT 'rTGATCAAGA IrrGGTTG CrTCATATITC ACTTGGGGCA TTCCCAAGTG ATG-AAGACGG CTAGAGCTCT TGAGCAGGCT GAAATTATCA GGAAAGTACA 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 1 1160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 GGAGCCTGA.A 7M TGGAAAG AAGT'rAACTT GTCTGAT'TTG GAAAAAAVTC GTCTrGCTAT TCGAGATTTA TTACAGTTTT 'rCCAAAAAC AGACCGTAAA CCCTACTATG TTAACTTTGA AGATCGTATA CTCTCCACTG TTCACGAGAC CACAGCATTT TTGCAGGTCA ACGATCTTCG GTCTTACAAT GAAAAAGTTG AGCATTATTT GAAAACTCAT CTGGATGAGG AGTCCA'rTTC TAAGCTATAC CATAATAAAA AGT'rGACA'rC TGATGATATG CTTGCACTTG AAAAATTr"T TTGGGAAAAA TTAGGTAGTA AAGCAGACTA CCAAAGTCAT TA'rGAAAATA AGGCAATTCC GAGATTGGTT CGTGAGATTA TTGGCTTAGA TAGAGAGTCT GCCAATCGTA TTTTTTCTAA ATTTTGTCG GATGAGAATC TTAA'rCCCAG GCAGA'TrCA 
7TCGTAAAAT TGATTGTAGA CTACATTGTA GAAAATG=? TTTTAGAGAC GAAAGTGTTA ACGCAAGAGC CGTrrAAATC TTATCGTTCT GTTCAACTAC TCTTCCAACA CCAACTACCA GTACTTCGTA ATATTGTTCA AATCArrCAA CrATCAATA ATCGCTGG AGAAGCGGCT TAAArrCrAA AGTCATTGCC ATGCTGAGAC TCATTTAAAA TTAAAAAGAG 'rAGAAATTTA TGCTATATAT GAGAAGTr'rT 462 ATTAGGAAGA ATGTCATCGI' TTTCCTAGAA TACAGTATCA G7M=TAACT CTGATAAA TTTCAAAGTA GATACTTGTA CCACGATrGTT TGTTGATCGA GTTATAACA AAAGAGCTAC ~rLTrATTTTA AAGAAATAGA AAACAAAAAG CCGAGCAAGA ATTCAATTGC AGGAGAAAAT GAAATAATAC TCAATGAAAA TCAAAGAGCA AACTAGGAAA CTAGCTGCAG ACACTGTTT GAGGN'GCAG ATGGAAGCTG ACGCGGATTG AAXGA~r? AAATCTTCCT AGGATAAAGC AAAACGCATA GTATCAAGGG ?r=CAACAC
GCTGCTCAAA
CGAAGAGTAT
TTGATACTAT
GCGTrCTG ATGTTAAAGA CTTIrCTACCA GTCATTTATT ArrCTTCAAA GAAAAATGGT TN'rTGAGCAG TATCTGCATC TTCACAGATG GCGATAGCGT CAGGGAAGCT CAAAGTATCA GGAAGGTrT TTAGTAGGAC TTGGTGT'rGA ATGTAGAGTT CGAGACGTTT TTCCCATTTT GCGCTATCTT CTTCGCGGAC TrGATAGG GGTrTTTTAA AAGCATAATT GTTAGTTGTA GGGGCGAATT 7T='CAGTTC TTCAAAGCAC ATAAGACAGA CATCATTACC ACAAAGGGTA ATGATAGAAC CAAAGGATTG AGCCAGTCCA ACTGGGCGCA TCCAGACAAG GGCGTCTTCC GACATGGAAC CATTGTTAAG AACATAATAA TTCATATrr TGATGTCGCG TGAGAGGGTT GCCTGGGTTA CTTGAATGTC ATC'rTGTTTT T'rGTGATAAG ATAATGTAAC TCCTTTTAGC OGGTAGTCT GTAT1TGTCAC
GTTCTCAGCA
AGCGCGTATA
AAGGTAAGG.T
GGATGGTGAT
GCTTGTAAAT
CCTCTACTAA
AGA6AGGGCTT
AGTTGGTGGC
AAGCATGGAC
TTCAA-AGTCA
AGTTGGCTGC
CAGTAAA'rCA
CAGAGTGTCT
CCATTCATCT
GTAACCTTT'r
CAACATGTGA
TAGGTA'r'ITC
GGATTTGATA
CCTTCTTTTA
GCAGTAATAT
GTCACGACTC
TACACCGT
r'TTTCCCCAA
CGAAGATTGA
GCAACTCAGC
GGrGT'rCTCA
TGAGCGAGGT
GTAGTATAGA
AGTTCAAATT
Tr'rCTATTTT
GGCAATTCAC
AAAATAGCTG
A6AGAGCCCCT GTTGTGCCTC TGGTCTAAAT TGTCAATGG'r TCrAGCACTT ATT-CTAGCAG TTGGGCAGAT
CTGTGTATGA
TTTAT'rCATA
CGACAGTCAA
GGACTAAACG
GAACGTCCAT
GTAAAT'rAAG AGjAGATTTTC
GATAAGGTTG
TGTTTGTACT
TAGTCACACG
AG4GTTTGGTA cT'rGCTCTTC
GTGATAACCA
CTGTTTCCTG
T'rGAT'rGCCA
TATGAGTAGG
11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 AAGGTTAATA GTAATCTGAT TGTCTTACCA TAGL'GCTCAA TTGACCAAGT GAGAAGGTTT AGTCTGTGGA GCTGTATTGT TCCTAGTAAT TTCTTGGTCA ATTGTTCATG TAAACATGGG GCTGAGCGCG ACCGTTTAGG TGGAGATCr TGGCTTTCCC TrGATAGAGC TCTGTATTGA AATCCTTATC TGGGTACTGG TTTTGAATAG CACAG=rTC ?'rGCCCACCG AAGTTATCAA CCTrGCCAGAT AACAGTAGGA AGTTGAAAGT ATAAGGCATT TATGGACTCA CGGAAGTCAA CACCATTATG GAAAAAGAGA TGCTTGTGTA AAGAGCATGG AACATCTGGT AAACATGAAG TGGNrTGACA TTCCAATCCT GAGAACCATG 463 AGTAAAGACA ACCTCTGCCT TTACI'TA'rG GCCATTGAGC AAACTGATTG TAGTCCCCAG TTr'N'CGGTC TAGCTGAGCT
AGATAATTGC
TTCACTTTT
TrGTGAGCT
CTCAGCAAGG
T'rCACGGTAG ATCGACrCCT
AGCAACTT
GGCACGGCAA
GTCTCCATA
GGCAAACC
CTTTTCCTCA
AGGTTTCTCT
TCATTGCCAC
CAGTCAAAGT
TAGTTGTACC
GTAGTCGCAA
CCGTTTGACC
CGACCG'IrAA CTCATGAA6AC
CTCGGAAGGA
GCCTCTGCTA
AGCTCAATCT
GAGAGCCTTG TCACTAGCCT GGCAGGGATT Tr'rCCATCAA CCCTTT'TTGG TCAGTATCGA GT'rAGAAAAA GTAGCCAAAC AAGACCATCA CTAACAAGTT TAACTGATAG AGATTPTCAA AAAAACGTCA CTATCTTCGA AGTATAAAAA ATATCTGCTG TTTTTTATCA GCTGCTAGGA ACGACGTAAA AAGGTTTCAA ACAT7TTr AGTrCAGATA AAAACGCATA GAACCTCCAT GAAAGGGAAT TGTTAACTI'C TCAATATAGA GAACAGAGTA ATTTATATTA TATGCTTrCT AAAAATACAA ATGAATAGAA CATTATAGG TGAAAATGGT GGA'rATAGTC GCCAGCTAAG AGATrACGAG CCTCACCTGG ATAACCACCT GGGCTAGTCA A'rGATGAAAT TCCTGCCTCG GCAATG.ATAA GACCATTGGA CATGGTACCT AGATAGGAAA AATCAGCCTT GACTGACGC TGGCGCGTGT GCCAATCGAT GACATTTTTA TAAGCCTCGA CTGTCGAGTC 'TrTGGTACCA ACACCTGAGA AGTAGTCGTT TAGTGTATAG CTAGAGTrGA TAAGCTCAGC TTTACCTrGG GGTrGGACGA TGTGAGGAAG C?'rAACCTCA AGCrCGCCC'r TG.TCA~rGGT TCCCTGATGA TAAGGGCTGG AACGAGGGCG AATAATGCTA ACCTTTACTA CACGAGACTC AACGTAAACG ACTTCACGAA TCT'rGCCGT'I AAAGTAGTGG TAGTCATTAT GGTCGATAAG AGTATrCCT TTTGGTGC TCAAGTCACC ATATATAATG GGAAATCCAG AGTCAACCAA ATAAGAAAAG CCTAAA6AGTT TCAGTrCATC TTC'rGATtGA AAAAATGTCA TAGAAAGTGG GTAGTTGGTG TCTTGATAAG GTGAGTCTTT GTGATTGGCT GTATTTTGTA AGACATI'TTC TI-rrGGAAA.A TT~GATATAAC ATAGAArGAC AGTTAAGGTT ATTATATCAA GGTCA'rGCCA
CTAAGTCAGC
AATAGGTrAA
CCAGACCGTT
CTTCTAAACC
GTCCTGTTGT
GATCAG'rAAA
TTI'GCTGGTA
CATAGAGATr TGTGAGT'rAG GATTrAG'rTG
CCATCTTGTA
CTGTCATGAT
GGTCTGATAG
TGACATCCTG
CCTCCGGAAT
GAGTATTGAG
TTTCTTTACG
GAAAAGCAAC
GCAGGTCTGT
'rGAAAAAGAA
AATCAAAGCC
TATATTGATT
AAAAAAAGCA
13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 AAAAGGAAAT AATCCAATAA AAATGAATAA AGTACTAAAT ACAATAAGAA TAAATAGATA GGGTATAAAA GTTCTAGGAG ATrTTTATAT ACAATATAGT ATAAATATAA AAATGATGAC AATAAATTAG TAAGCTGATG AAATTNTC CAAGAGAAC ATAATATAGT GAGAAGGATA GAGGAGAAGT GTAAATTGAT 464 CGCACAACTA GATACAAAAA CAGTCTATAG TTTATGGAA AGCCCCATTT CGATCGAAAA GTATGTGAGA GCAGCTAAAG AATACGCA CACTCATTTG GCTATGATGG ATATTGACAA TCI'TATGGC GCTTTCGACT 'TTCTAGAGAT TACAAAAAAA TACGGCATTC A'rCcTTTGCT AGGGCT1'GAA ATGACAGTGT TTGTAGATGA TCAGGGAGTG AAITTTCCT TTTAGCTCT ATCTAGTGTG GGCTATCAGC AGTTGATGAA GCTTTCGACA GCCAAGATGC AACTGGTCA GTCCTGTCCC AGTACCTGGA GGATATCGCG GTCATTGTGC TAGAGTTGAG TCGTTAGAAC TAGGCTGTGA TTACTATATA GGGGTTTATC
AGGGGGAGAA
CTTAT7TTGA
CAGAAACACT
AGCAAGCGAA TTCATCATC CTATCTTACC GGATAGAGAA GTTCTTCAAG TTTTAACAGC TCCCTTGCGT TCGAGACAAG ATGTCTTTAT AGAGCGTTTT CCGCAAGCTT TG4GACAATTT CrrGGATACT AGTCTGAAAC TGCCTCGTTr GAGAGAGCGT GCTGAACTGG GGCTTGTTCA TAGACTAGAC CAAGAATTGT CTGTTATTCA TGTTTGGGAT TTGTTGCCTT ?TGGACAATC TTCTGCAGTA GGCAGTTTGG TTTCTTATGC TCTrATCGG GTCAACGC'rT TTGAAAGCAG GATTAAAGAA AATCTACCGC TCAGAGAAGT ATCAGCAAGT TCTTTAGAGA AGAAAAGCTT ATTTCAGGCA TAATCCAGCT AGACCAGCAG GAAGGGGTTG ACTAGTAAAG TGATATGGGC 'rrTGATGArr
AACTATTCCA
T7TrCTTACGA
TAGAGGAGTT
AATATCAAGA
ATTTCTTGGT
CAATGGCTAT
CTTAGACA'rC GAAAAATCTG ATrTGAAC GCTTTCTTAA TCGTGAACGC TATTGATATC CCAGATATTT ATCGTCCAGA TTTTATCAGA TAGTAAACAT GCGGCACAAA AGATGTCTTG AAACGCTTTG CAGT=~CGT GACAATCTTA CAATAGTAAG TTAGAATACC AAGGCAAACC TCTGTCCATG CATTCCTCTA AAGTATGGTG GGCTAGCGGA CTTTTGAAGA GATGCAAGAG TTGCTTGCTG AGAAGACAAA GAAACGTTAG TGAGCAACCA GGTGCCATTC CGTCGCGACT ACTTCTCTAA TCGTTACTTT TTCAACCTr'r GTGTGCCAGA GTATGAATTA AGTCGGCCTA TGAGGGAAAT AAAAAGCT= TGAGATTGCT CGGCTGGTGT TGTAAT~rAGT ATGAAATTCC ACTGACTCAG TGGACTTTCT GGGACTACGA AAACAGAAGG TATTCATCTG CTTTATTTGC CTCTGGTAAT GTCTGCTTAA GCCTGTGCAA ATCGACCGGG TGCTAGTGAC TATATGGGAA TGGGA-AGGGG ACGGGGATTG ACCCAGTAGA *TAtACCATGC CTGATATTGA TATGTTGGTA A'rAAATATGG GGAGCCAAGC AAGCTCTTCG TCTGCAATTA CTAAGAAAAT CTCCAGTTTC GTCAGCAAAT TGCAAGATAG AGGGCTATCC GACCAAGATT TAACCAACTA TATGATGCTC ATGGAGTTGA AATTTGACCT TTGTCCAGAA AAAATTGAAG AAATCGATTT ACAAAAGGTA TCTTTCAATT CCAGTCTGTT TTGAAGATGT TATATCAATA ATTTTGTGGC 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 AAGAAAGCAT GGGCAGGAAG AAGTGACTGT TCTGGATCCA GTACTGGAGG ATATTTTGGC 465 TCCAACCTAC GGCATAATGC TCTATCAGGA GC.AGGTTATG CGGATTI'AGT CTrTGGGAAAG CCGATATT GCGTCGGGCT TGCCATGCAT GAGATGAGGG cCCx'r'AT TCAAGGTrCA GGAAAAAGCA GAGCAGGTCT T'rGATGrrAT GGAGAAGTT GTCACACCCC TA'rGCCTACT CAGCCTTGGC CTTCCAGTTG TCCAGCCATT TTTA'rCAGG TCATGTTAAA TrrCrCCAAC ACTTGAAGCA CGT'rIrGAAG TAGCCTCTCT ATCCATCAAC CAGGTTGCCC AGCGACTrGC ATGGGGAAAA AGGATGCCTC rAGAAGCTG GTCATACTCT GCAGGTTATG CTTTTAACAG GCTTA7'rTCA AAACGCATTA AGTGATrACT TAATAGATC ACCATTCCCI' ATCACGATAA 17160 17220 17280 17340 17400 17460 17520 A6ATTGCCA6AC AAGGCCATCT ATCTAGGT GAAATCCATT AAAGGAGTCA AGCTCTCTGG ATTATTGAAA ATAGACCTTA TTCTAACATT GAAGA''TA ACCTGAGAAT TATCTGAAAC TTCCTCTGCT AGAACCTrTG GTAAAAGTTG TTCAT'TGAA AAAAATCGTC AAAAAGTArr TAATAACTTA GCTAATCTAT GAAAGAG'rTG GGAAGT'rTGT TTGGAGATGC TATTTATAGT TGGCAGGAAT GACGGAACAA GAAAAATTTT ATATGGAACA AGAGC7=rA 
GGGATAGGTC TCCACTACAA GCTATTGCAA GTAAGGCTAT TTACCCGATT ACCCCAATCG ACAAAATAGC 'rATCATTA TCTrTGGTTGA AG7"rCAGAAA ATAAAAGTGA AAACCT'GAA AA-rATCGCCT TCTTAZCAGGC AGATGATAGT AJNGAAAAAAT TCTCrTTTTCA GACTTATATC GTCAGGTTGG ACAGGAAATA AAAGAGGGAG TCTAAAAGGA AAAATACA6AT CACGTGATGG CCGCTrGCAA ATGATTGCAC AGAAGCAGT'r GCTGAACGCT TTTGGATACA GGTGAAAAAT CATGAATCGG TTCACGCATT T'rAGAACAAT TTAAAGGCCC AATCCCAGTC ATCATCCGGTI ACAGAAAACC ATCGTTTCTC CCCATCATTT TGTAGCTAAA TCCAATGAAT GTAATGATTT 17580 TAGCTAAATT 17640 GTCTTNTCGA 17700 'TGAATTTGT 17760 CGGAACATTG 17820 TCAGCAAACA 17880 GAAATTTGTC 17940 TTCGTACCAA 18000 TGCATGTCAC 18060
CCTTCTACTA
PLAGA.AATAAG
ATCAAGA.AAT
ATGAAGAGGA
TAGAGGAGAA
AGAAGAATTT 'rCGCTAAAAA 'rACGGAAAAT ATTGAATGAA ATCGTTATGA AAACGATTTA TCAACCTAAA TGTGGTATAA 'CAGTAAGAA GAAACGTATT GCTGTTTTGA CTAGTGGTGG TGCAGrTG=r CGTCAAGCAA TTrCAGAAGG TGCTGGTATG GTTGCCGGTG AAATTCATCC TGTTAAAAGA AAAAGGAGCA TAACCAATAT AGACGCCCCT GGTATGAACG CTGCCATCCG AATGGAAGTT 'TrGGTATCT ATGACGGATA CCTAGATCCA GCTTCAGTAG GGGACATCAT 18120 18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 TTCTCGTGGT GGTACTrrCC 'rrCACTCAGC TCCTTACCCA GAG1-rCGCTC AACTTGAAGG GCAACTTAAA GGGATTGAGC AATTGAAAAA ACACGGAATT GAAGGTGTAG TTGTTATCGG TGCTGACGGA TCrrACCACG GCGCTATGCG TTTGACTGAA CATGGCNTCC CAGCTATTGG 466 TCTTCCAGGT ACAATCGATA ACGATATCGT TGGTACTGAC TTACAATCG GTTTGACAC AGCGGTTACT ACTGCCATGG ACGCTATCGA TAAGATTCGT GATACATCAT TCGTACTTrT GTAATCGAAG TrATGGGACG TAACGCTGGT GATATCGCTC TATTGCAACT GGTGCTGATG AAATCATCAT CCCTGAAGCA GGCrTCAAGA CGTAG4CAAGC ATCAAAGCTG GTr-aLATG TGGTAAAAAA CACAATAT'rA TGAAGGTGTG ATGTCAGCGG CTGAATTTGG TCAAAAACT'r AAAGAAGCTG CGACCTTCGT GTAACAGAAC TTGGACATAT TCAACGTGGT GGTTCTCCAA CCGTGTTTG GCGTCACGTA TGGGTGCACA TGCTCT'rAAA CTTCTTAAAG TGGTGTTGCG GTTGGTATT~C GTAACGAAAA AATGGTTGAA A.ATCCAATTC AGAAGAAGGG GCAT'rGTTTA GCCTTACTGC AGAAGGTAAG ATTGTGGTTA
TACAAA
CAAGTCACCG
TTGGGCTGG
TGGAAGATA'r TCGTCTTAGc
GAGATACAAG
CTCGCGTGA
AAGGTATCGG
TTGGTACTGC
ACAACCCAGC
18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19446
INFORMATION FOR SEQ ID NO: 52:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16593 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
TCGTAAATAT GCTCTCTTTT TGGATT?1'GT TTCTTAATCT GT ATAGAAATAG GACCACACAT ATAGACGGTT GCATGTTCGG CC TTAAGATAGC CGTCTTTCGT ACTGTCGATT AGATGGACTT CA GCATAGTTAC GGAGTAAATC TAGOTAGACT GCATTTTCAT CT( AAGTGAACCT GTTTATCTAA AATAGGATGT TCACGGATGT AA( CCAATACCTC CAGCAATCCA AACCTGATTT TCTCGTCCTT CT1 rrGGCAAG TGCCTTCATC %CTTCTTT TTGTTCAAAA kAATTAGG ATTTTTCTGA CCACGGAA GCTATAGTAG ~AGATGAA GGGGGTGATC ~CTATGAT CATGTGTCCG TAAGCTCTGT CTAGGGTTAC TTTGCTGCCG GCTTGAAGAT TATCATAGAT ATTTGGTA TGGTCGCCTG AAGTTTTAAC AGTAAAGTAA AGAGTTTGAC CATGACCTCC TGAGATAGAA AAGGGATGCG GAGCACTTTC AAAGCCTTCT TGGAAAATCT TTAGAAAGGC AAATTGTCCT GATTGATAGT TGAAAGGTCT GCTAAGATGG ATTTGAATTT CTCTAGTATC GTGATTTAAG CGTTTGAGAT GGGTAATTTT CCCTAGATAG GGGAAGGAAA TCTTTTGATA TAGAAAAATG ATATAAAAAC CAGCTAGTAA GCCTAAAAGG GCATAGCTAC CAACAAGAAA ACTTAGAAGA TTAAATGTAA GGAGACGATT GCCCATTATC ATGTAGATGT GAAAGAGTCC TAAAATATAG CAGGTAAA CCAGGCGGTG
TAGGCGACAA
AAGCGAGAC
AGAAAGGCTG
GGATGATGCT
CCCACAAACC
TGAATTTGTG
AATCCA'ICGC CAAGCr'rCGT ATTGGATGTA TTrCCTA;A GGCAAAGATA TAGATGGCAA GATTGCCAAA CTGAGCAGCT GCCCATACTA AAGTTATGAA AGA'rTAGTAG GATCATTGAG GACGGTGTAG ACC~rCTCCA AACTGlrAAA CCAGC?1'TCT TAGGATAAAA GTCAGAGATA GGCTwrGTTAA AGCTAGTCCT AGTGTTCATC CAAGTCAAAA GAGTCAAGAT A.AAACTAGCT AGAGTGGGA GACGAGTGGC GGAATCATGA ATTGGGGAGA ATGATAAAGA GTAGCCTT TrlGTTGTAA ATAAATTTGT CTATAAATA'r CTACTCTCAT
GACTGATTTC
TACATTTTAT
CAAAAAACTC
ATAGAAAA'rr
CATAGAAAAT
TCCAATTGAA
CCATTTCATT
GTATGCTGTC
CTGGAGAGTG
TAGATTTCGA
AAATTGAGGT
GCTGTTTATA
CTCAATGAAA ATCAAAGAGC AAACTAGGAA TGAGGTTGCA GATAGAGCTG ACGTGGTTTG CTTGTTGCCA ACG'GCCT AGCATATGAG AGI'AGATGAG GGCAATCAGG ATGTAAAGAC CCATGAGAAT TTGGCTGGCT CCAAAGAGTT TGGTATCCTT AATCACGGTA ACAAACTGAG CrGTGGGAG AATGATGTAG TAGAGGATTT CTTCGTACTG TCCCTTGTCT ACGGCATTGA
ACAGGCTAGA
TGAAGACCTG
CTTGTAGGGC
AAATGATGC
AATTGCTAGG TTAAAGCTGA CTCTGGTTCG AAATAACGGC GATA.ACAGAG TAGAGGAGAC TGGTAGCATT trGCGGATGG GCTAGCCGCA AGTrGCTCAA AACACTGTT= AAGAGATTTT CGAAGAG'rGT TATTCTGCAG 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 GGGCTGAGGT GAAGCCTTG'r GACATTCCTG GACCGCCTCG AATAATCTCA GCCAAGGCTG CTGATGTAAA GAGAGTAAAG AAAAGATAGT AAAAATCCAG TGGAAATAAT GCGTAAGACA TGATAGTAGA GAGGATGATG AGATAAAGAC TAGGTTATCT AAAGTGAATA GGCTTTTTTG AGCATAGAGC AAAGTAGAGA GAGCCGACCA AGACTTAG'rC AGGTGTTCTT- GATGAGGTTA GCTGTAATAC CT~GCTGGTGT GGATTTCATT AGAAGGTTGG GAACGTTGCG CACAAACTCG GGATTTTTGC CA=TCTCGT GACAGCTAGC
TTGAACACCA
ATATAAATAC
ACCGTACCGA
.5.5 GCAA'rCAGAG AAATATAGAG GGTCAAGCCA AATCCTTTAA GGGGTTAAAA CTTCTAAAAT AGATTCCATA GTAACCTCCT TTGGCTTGCT CCATCTTGCG ACCAAALCTGG GCAACAGGGA AGAGCAGCAC CTAAAAAGGC TGGTATATAG TTTCCGTT-GA ACAAACATCA AGTCTACTCC AGAGATGATA GCTACAG'rAG ACAAT'rTGGT TGGTCAATGG AGGGAGAATG ATGCGGAAGG S. 55 S S CCTGAGGCAA GATAATCAAG CGCATGGCAC TGATATAGGT AAAACCTTGC GACAAGGCGG CCTCCATCTG ACCACTAGGA ATAGACTGAA TCCCTGAACG AATAACCTCA GCGATATAAG CGCCGTGATA GAGTCCCACG CAGAGAACGG CTGTCCAATA AATTGGAATC ATGATGATAT 468 GGTCACTGAT AAGAGGTAGG CCATAAAAAA CAATAACAAA CTGCACCAAG AGGGGAGTAT TTTGGTAAAA TTCAACAAAG ATGCGAGCTA AAATGCGTAA AATTGGACGT TrACI'GGTTG ACATGGCACC AAAGAAGATG CCCAAAACCA TAGCGAGGAT CAAGGGTGAA GAGGAAACCA TTGAAAAATT GTCCAAAATC ATGATAAATC TGTCATG.GGG TGTCCTCCTT AATCTGCAGT AAAGGAACCA ACCGCTAGGG CTGAAAATAG GCTGTCCAAG ATGGCTAGAT GGTTTGAGCT TGTAACGGTC ATAAAGNrC TGCAAACTAC CAAGATAGTC G7"rGAGCTCT GTATTTGATrr TGAAACTATC ATCTACTAGT GCTGTCCGTT CAACCGAAAA GGTATCGATA CGATGAGCGT CATCCTTGCT CCATTTAGTA ACCAAGTTAT TCTTGGTAAC AATACCGTAG TCAGATGGCT TACTAGTGTA GCCAGATAGA ATAGAGCGGT GCAGGGAAGT AATCAATTCT GGGTAGGAAC CAAGT'ICGAC GAATTITAAAC GGGTGATAGA ACCTTGGGCG 77TrGCGA TTTATTGACC TGTAGAGTTT TrrGCGTTCG CATTGTCTAG AAGGGGGCCG TGAGTTCATC AGCTACCATC 'rGGGA'rCTr-r GTAACCAAAA TTTrTTGAAT GTCTGCGATA
AAAAATCCAG
TCCGTrGATGG CGGGTTTrGTG TTG~CCAAGTr TrGGGAACGT
C'TTGTATCAG
AAGCGTCTGT CTAGTAGGGA TAAAGGTCGC GATATCCATA CTGTAACCC CACATAGCGA CGGTTTCGAT ACCAGAATAA C'TCTTTGAC ACCGACAACC TTCAGACCTr TCTTTT'ACC CAGTTCAGTA ACTCCGATGG TTTGCCGTT TAGGTCCTCA
ATCAGGCG'TT
ATCTT'r'rTGA
CTGGTAAAGT
'rCGACCTGTT
ATCTGACCT
GTACCCCTCT
AGTTCGCCTC
2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 CCTCGAC'tGG TT'rCCCAGCA GCAAGGCCGA AAAGGCTAAT CAATAATGCT GATAAAAAGA ATrTTC ATAGGCGCCT CCTTATTTGA CrTTTGTCACT TTCGTGGTTG ATAAT'TTGC 'rGAGGAATTG GATTCTCAAA AAAGT'rATCG ACATCTGTCG TATCTACTAA AGATAATGCG GTCCGCAACC TCTCGAGCAA AGCCCATTTC TCATCCCATC ATGCGCCAGT TTCTGCATAA CTGCTAGAAC CAAGAGCAGA TGTTGGTTCA TCAAAGAGGA GGAGTTCCGG CGATGGCGAT CCGCTGTTTT TGTCCACCAG ATAGCATGGC CCCACATA'rr TACAAATTCC AGATAIr= GGGCGGTTTI- TT-CCTAGAAC 7TCAATGGGT GCAAGCGTTA CGTN'TCTAA TTGGCCACGA GGTTCGCTTC AACTTCTCCG TCGGCCATAA GTGGGTAACC ATGATCATGT ATCTCCGATA GTCTCAGGAT ATGCATAGCA AGACCACGAG GGGATAGGAA TCTTTCT'rGT TTCAGCTTCT TTTTTATCAA CACAGC7'T'TG TGTGGATAAA CGTTAAAATG TTGAAAAACC ATGCCGACI' CCTTGCGAAG AGGTACCAAA TCTTTCTGGC TGGCACCAGC AACTTGGTGC CCATTGACTA GGAGACTTCC T7=GCAACA GTCTCTAAAC CATTGATCGT ACGGATAAGA GTGGACTTCC CAGAGCCAGA AGGTCCAAGC AGGACAACAA CTTGTCCTTr T'rCAAAACGG AGATTGATGT TGCGGAATGC GTGGTAGTCT CCGTAATATT 469 'TTCGACGTT ?ITrAAATTCT ACrAAAGCCA TGAGAGATCT CTATT~GTr= ATATT'rATA ACACGCTTCT ACAATAAAAG AATGTCTTG TCAAATCATA TCTGAAAAAA TTCACTATAG TGAAATAAGA ACAGGAAAAA TCGATCGGGA CAGTCAAATC AAC'rAGAGGT GTACTATTCT AG7TTCAATA TACTATAAAA GATAGAGAAA ACGTCTAAAT CA'rGTTATAA TGAAGCAATA 'rTCT'rTTTG ATAACACCI'A CTTATGAATG 'rrTTACAAAG ATAGCCAAGA AGGCTGGACT GAGAGGGATT CAGAACCAAG AALAGTCTGAA ACATGATGCT TATCTGCTCC ATGATATGGA TGAAGAAGGG GAAAATATTC T'rGT'ATGG TTCTATTGTG AAGGAAAGTT TGGAACAACT TCGTTTTACC GATGGCTATG GCCCTAATC AGGGA'rTTCC TTGATTGTGA CGGTGGACAA GCCTCAGTCT ATGGGAGTAG ATGTC-ATTGT
GCAGTTTGCC
GGGTCCTGAG
GAAGTTTA
CAAGGCAGTG
AGACTATGAT
GATTTrCTAAC AATAT2'TTAG TGTTATAAAA AAGCAATCTC GAAITrCrrAG AAAGAGTGGA CTGCAGGTAG AACATGCGGA GTGGCTCGGT TATTCTTTGA GAACCTTCCI' TGGAGGACTT GAGCCGATTC GTCAGGCTAT GCGGATGGCA TGACTCGGC 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 TGGTGCTGAG TGCCGAGTTr ACCTGCCAAA TAGTGTTTAT AAATACTTTA TCGAGCAAGA TGGGGTTrGCT GGTCATGAGG CTATTGCATT GACAGACCAT CATTCCATGC CTGAAACCCT ACATCCAGAT GCGGATTATC CTTTTAAATA GGCTTGTGCC CTGTTAGAAG AAGTCCAAGT GCCAGATGCT TATGCTATTG TT''GGTGT 'GTGGAGTTG GGAATTGCTT GATTTGGTCG AAATCGTATC TTAGTTCAAT GCAAGAAATG CTGGACATGG 'rCCAGATT GCTCCTCGTT TCATTTGTTG ACTGGATTTG GAAAAACGAA GAGCGCAAGG GGATCCTGAG AAGAAGGTTC
TCCATCCTGA
CTTTCAAGTT
CTATTGGAAC
ATGGTCTGGA
TATTGCAGAT ATGGTGAGTC AATGTTGGGT CATACCCAC CTGGGATTGC TGCCAACGAA GTAACAGAAG TGAATGCCTT GGGTCGCTTG GATGATCCCA
ATGATGAGGA
AAATCGTTCA
AGGTCTTGGC
AGCGCATGAG AT'rGCCCTTA GTCTATCTAT GAAGAAGCCA CAAGGAAGGC TGGAATCCTG TGACGGATGA 5340 GCATTGGTCT 5400 AAACGGTTGG 5460 ATCCTGCCAT 5520 TGATTCACCA 5580 AGACCATCT 5640 GGGTTCTAGG 5700 TTAATATAGA 5760 TTGAAGC'rCT 5820 CGGGTATGAC 5880 TTCGTGAAAA 5940 A1'TTGGAGGC 6000 TGGATAATCA 6060
AATCCTGGCT
AGACGGTCGT
GGATCCCCAT
GCTGGAAGTT
AGGTGCAGAT
GGTCGTTTAT
GCCAAGGGCA
CGAGACCTCT
GAGCAACTCT
GCTGGTGGCA
TGGAAGAATT GGGACAGACA GTCATTGT'rC GTGCTCGTAG TSTGGAAGCG GTCGATATTT TCATCGCCTT TGGAGGTCAT GCAGGTGCAG CAGATTTATC TCAGGTTTTG GAAGATTATG AGAATAAGTT AAACCTAGAT GAAGAGTTGG AAAG7=TGA ACGTTTAGCT CCTTrMGAA ACTTAGCTTG GAAACGGTCA 470 GAAACCTA'N' TTTTATATCA AGAATN'TCA GGTCGAAAGT TAATGCCCAT CTAAAGCTGA AAATTTCCAA GGGTGAGGCG TGGTCAAGGC AGATGGGCGA CAGAGTrC TCAAACCAAG ATTGTCTGTC AACCAATGGA ATGGCCAAAC TGCCCTCCAG AGTGGAAGGT GTTCAACTTT TTAACATTCG TGGAAAAAAT TCCAGTCTTG GATTTTCCTG GAGAACTGCC AAATCTTGCG GCTCGTACTA TGGGGGCAGG AGTTTGAAG TGGTAGCCTT AATC1'AGAGT TAGCGGT'rAA TTGATGATGG TGGATGCGCG GCAGTCTTGC CAGAAGGTT GCTAGTGAAG CTGTTGTCGT TTTCAGGAAC AGCATTTCTC CTGACAGCTT ATGGGACTAG CCAGAGTTTG ATA'rTCGCTA ATCTTGCTGG TCAAGATGAT AAAAAACAT'r CCAGAGGATA TTACTCAGCT TGCTGTCTAT TTCAAAAATG ATATTGACAA AGATCAGTT'r GCCAAATTGT ACAAGACTAT CAACCTGAAA GAI'GGCTG CATATCTTAA TCAAGTATT GAAGAACTAG GC71wrGTGAC AGAGGCGCCA AAGCGGGAGA TAGGAGAAAG TAAAGACCAA GAAATGATGG CGCTGGGTAC AAAAGAGTAG AAGTTAGGAA AGAGT7VGGA TTTTGAAAAT CA'rCAAAAAA ATGGTATAAT AACTTTTAGA ATAAGAGGGT AGAATTGCCC AAAAATGAGT AATATCAGTT 'rAACAACACT
GAACACCATT
GGCT'rATTrA'
TTACCAGTTC
TArrCAACAA GATAAA.AGAT GGTGTGA'rGA CAGTCAATAA TCAAATTTAC CAAAATCTCA AACAAACCGT GGTGCAAGAA AT'rATCATT T'rTTGATGGA AATCAACI'CT T=NTGAAAA CAGACCTTCA GGTPLGGAAA.A GATTCGGCTG AAAGTATCAG TATAATCAAG ATAAACTAAG ATTTTGAGG TGGTGG.TGTG. CGTGAGAATG GAAAAAATAT TGTTTTGAAT GTAGGGTTAA AATATCCTGA TCCAAACATG GATTACCTT TTGAAAATAG CGGGCATGCG GATGCCATTG G'rGCTCTACC ATTTGGGTCT GAGTTGACCA TTGAGTTGC TAAGAAATT AATGATTTCC ATG'rCAT'rGA ACGT7CC TTCTTCCCTA CGACTTACTC 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080' 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860
GTACATTGCT
AAATGAACAA
CGACCGTATT
GTATCTCTTG
AAAGCTCTTT
GAAATTGGAG
TAGGGTCG
GCTGGGGTT
GCAGAGGCTA
GTCAAAGGAA
AGTCCATTTT
ATGTGGTGAT
TC1'TGACCCA
AAGTTCCTGT
ATGATGCCGT
TGGTGGGAC
TGAGAATACG CAGATTGATT CGTTCCAGAG AGTCTGGGAA TGAC?1'CAAA TTTGACCAAA AGAGATIGG'r CGTGACGGCG TATT-CAGGTG CCTAGTGAAA TTrGTCTTGAA GACATCGGAA GGAAGCATCG CGGCTAGTGA ATCTTATGCA AC'TGArTTTG TCCTGGCTCT CCTCAGTGAT TCGGCCAATG GTGAAGTTAG GGATGAAAT'r ACCCAAACTA CAGCTGTTTC CAGTAATCT TCTCGTATTC GTCGACGTAT CGTCTTGACA GGATTTGATA TTAAGAAGTT GTCTTTAGCC AACGAAATTC
'N'TATACAGG
CTCGTrTG4GC
CAGACACCAA
TTGCTGACTG
AGCAGATTTT
TTGAAAATAT
TTTTGATTAA
GGAAGGTCGT
TGACGCTGCG
CGTCCGCACA
ATCATCGTI'G
GATAAAACAG
GCGATTCGTC
GCCTAAAGAT
GGGTGAGCCr
CAAGGATGGG
TGCGCGTGTG
?1'TACATGTA
ATGTCTCGCT
ATCAATGGAC
GACCTAGTCT
GAAAATAT1GA
TCAGGGCACG
Tqr.AAGACCA
TTCGTAAGAT
ATA?1'GCTAC ?1'TATCAGGC
GAAATGTGCG
TGAG -rGATT ATTCTTGAGA CAGGTCGTAT GTCGATTGG CGCCATCGTT ATGTAGAAAT GGCTCCGTCr ATGCTAAAG AAGCCTrrGT AGGTGGG;GTTGTCAAATrGA TTACCCAAAG TGATTTGCAG CTGATGATCA ATCTTTTGCA GTATCGTGAG TTGGATGCTC ACGCTAAGGC CATCTTCATT CCTAAAAAGG GGACGACCAT ACCTAAGTAC CTCTTCCCTG TCCAAGGGGA TGCCATGGCA GTTGGGATGT TGCCAGAACG GGCTTACCAG AATGGAGACT TT-GTTCCAGC TGATGGGAAT GCCA?1'GGTG ATGTTGGAAA AGAGGATGGA ATTMCATCG 'rGGCTATTAC TAGGGCTCGT GTTCACACGC GTGGATTTGT TGAAAGTTrCA GAATTGATTA ACCAAACGGT
TGGATCGGTT
TGT'rGTTCT'r
AGTCAACCGT
T'rATCTrCAAG
AGAAGAGTAT
TCAGCAGGAG ATATCTTGAT CGTGACCGTA AGGTCTTGI'C CGTGAGAACA AAATTGTGGC AAGAGTCGCG ATATTCTCCG CTTCA-AGGAC ATGACTTTGA CTGGGCAGAT CTCAAAGGTA AGGTTCCTGA CAATCTGACC AAGTACCTCT TTGATCAAAC CAAGCGTCGC CCAGCCATTT 'rACCAG'rAGT CATGGAAGCA AAATAATCGT TGAAATAAAC AGAGAGAAAG TCGAGTTTCG CCT'FTTCTT ATAGAAAAAT AGAAGGAGAA AATCATGGCA GTGATGAAAA TCGAG1'ATTN CTCACAAG'A TTGGATATGG AGTGGGCGGT GAATGTCCTC TACCCTGATG CCAATCGAGT GGAAGAACCA GAGTGTGAAG ATATTCCCGT CTTGTACCTT TTGCACGGGA TGTCTGGAAA TCATAATAGT TGGCT'rAAGC GGACCAATGI' AGAACGCTTG 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 88 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 CTTCGAGGAA CTAATCTCAT ACCCACrrA'G GTTTTGACTA CGCTTCTTCC CTAA'rATGAC GGAGGCTACG GCTGCTTCAA TTTTCAGGTG CCCTCAGCTT GCCTACTGGA GAGGTGTTTT GAAAGTCTGG CTAAAAAATC GATT'rCTTGT ACGAAGCCAA GTGACCTATA GCCATAGCGC GTTTNTAA CAACCCTACC CTTCAGCATA GGGGGAGTAG CGTTGT'rATG CCCAATACCA GCAATGGTTG GTACACCGAT CTACACGGCT CTAGCAGAGG AATTGCCACA GGTTCTGAAA GAGCAAGCGT GAAAAGACCT TTATCGCTGG TCTTCTATG ACTGCTCTT ACGACAAATC GTrT=CTCA TGCAGCTAGT TCAAAACTTT TCTCCTGAAA GTCAAAATCT GGGAAGTCCA TGGAGAGATT AGAGACTCGA CAACTAGTCC CTATTCTCTT GGATAAAAAG ACCAAACTI' GGGCGTGGTG TGGCGAACAG TAATCTCGCA GTGAAAAATC TCAAAAAACT AGGTTrTTGAT TGGAAC'TCAC GAGTGGTACT ACTGGGAAAA ACAATTGGAA AAT'rcATTC AAATTAGAAG AGAGACTGAC TTAGTTTGAA AACTAAAATA AAATATGW TCACTAGACT TTTCAAACGMn AAGTAGTAGA ATAGTAATAA AATACTGGAG ATTGGCATT-C CCACATTAGA ATA'rGATCAG TTACAAACTA GTGC?~r2GGA GGAAGTTAAG TACAGGGAAG AAAAATTACT GGCGACAGCT TATAAAATGT TT-TACATCCC AAGAGGACCT AATTT'TGCCA rrCAGTCTAT TAAGTCCTAT TTTACCCAA GTATTTGCCT ATCTCAAACT GAAAATCTGG CTATTATTGA TAGTTTGCAA 472 GAAAGAGACT AGGAAATGTA CCC-TTA'rCAA TTTGTCAAAG AACATGAATT AGCCAATGTA TCTAAT'rGGC AACATGAGAA GTrGTGTT AGTATTTTGA TTAGAACTCT TCCGCTAGGC ATATTGGATT ATGGGGATAA AGAACTCTTG GCTCGCACTA AGAGAGCGGT TTTTGTGAC'r rrAATCAATC AGGAAAAGAC AGAATTTCCT CAAATCGGAG TAAGGTGGTC AGGAAAAACG 9660 GAGGAAATGG GAGACACCAT TCAACCTCGT ATTCAGCCGA AAATATACAA GGAAAATT GAAGAAGATA 
AACTr'rCCAA GTCAACAAAA CAGGCTAP'rC GAACAGCACC AAACAAAGGG
CTTGAGATTC
ACTGAGAAC
AATTTAAGG
GAGTTAGAAG
CGAACTTCAA
TCGC-AGG
TTGGA6ATTTG
TACAATGCAC
ATCTGGCAAA
AATATGGTGG
GAAAAGAGAT
ACAAGGCCTIA
AACAGTTAGC
AAGTACAAGC
AATATATAGA
GTACTACCTC
CAArrTrTAAC
ATTTAGGTGG
AC'TGGAACTA TTAGATTCA'r T'rrCGGAGTT TCATT'rGAGG AATGAAGCCT ATTATAAAAA TATCACCTTG GCCACCTTGG ATGTTTCTAA GAAAAATAGA GCCrrGGAAG AGACCTTTAC GCAGAAGAAG GAAAAAGAAC GTTTGTTAGA TGTAGGTCAA GCGAGAGTTC CTrAGCGGC TG'rCAATArA TATGC1GGTA TGGA-TGATGA TTGGTATGAA ACGGCTCGCT A'rGCCTTr-GA TGTTGAAAAC TCTCTCAATG GTGGACTTTA
GATGAAAAAA
AT'rGTTAGAT
ACGTTCGCAA
TGAGTCGACT
GGAATTCACC
TACTTTGAGT
TTTTAAACGT
ACGAGGTATG
TCATTTTAAG
9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 GAAAAATTTA ATCCAACGAT TGAAGAATAC CTCTATCCTC TGTTAAGACT TGCTCTTGAT AAGTAAGTAT1 ATGGCACTAA CAACACTCAC GGTTTCTTCT CGTTCCTTTA TGCAATCTGT
TTGGGTGAAT
TTCCGTAAAA
GAAAGAAGAG
CCAGATGGGG
GGCTCGAATT G'T'rATCTrG CTTrGAAACA AGAAGGAGAA 'rTATAGCC'rG CCCATGCTGG GTGGTCTGCA TATGGAACTC TTACAATGCC CACTCATCCT CATTA-AGAAA AAAACATAGA TTTCAGACTT ATTCTGATCA GATTTGCTAG AAAAAAGAGG ATTCAAGTTG CAGCTCTGGT AArTCGGGGC CGATTTATAC GAATATGCCA AGCAAAATG ACTTTGATA GCCAAGGTAA ACTGA'rTTAG GTTATCAATT TGGTTATACT ATAAAGATTT AAAAAGGGTA AACCCTTGGT CCAACAAGAT GCTCTTCCAG TTr=rATGC TGTATTAGAG TTGCTTGTAA AACCCTATGA TCCAATAGAT GCTGAGAAAA AAAGTATTAT
AGACTTAAAA
AACTTATCAA
TCAAGATTI'G
TGATGGCTTA ACAACAGGTT ACCCAGGTGG AGAACCAGAT AACTGAATTA ACTGAAAAGA G?1-rGC1-TAA AAGTTTTAGC 473 GAAAAAGCCT CAAACCTN'C GCATTCGGTT GAAAAAGTTA AAACGTGAAG AACTATCGAT TTTTAAGAAT ATAACAAAAG AAACCTCTGA ACGTAGAGAA TATAGTGATA AAAG~rrAGA ATATTATCAG CATTTTTATG ATACTTCG AGAACAACG GAGTTTCTC-A AAATTTTCG GACTATATGA GGACAAGTTG CGACTTGATT GAGAGAATAT TCTAGTCAAT GATTGAAAAA TATGGAGAAG TCAGGAAACG ACTTATCTCT TGCACTGCTT CAAAAATATG CTTCCTAGC ATTCAAGGGA T AA'rGGC TATATTGTAC ATACAAAGCT ATCCAGTTAC GCAAAIrGCA AGGTGAACAA AGTAAACTAG TGAGTAAAAA TCCTCATTCT GAGAAAAAAC TTGAAACGTT TGAAGTTCGA AAAGCAGAAG AAGATATTGT TrrAGCTGGG AGTTTATTTG TTAGTGGTTC CTACACTGAG TTTAATAAGT
TAGCAAGCTT
AAGAAAACr AAAA'rCAACT CGCCAGAcr
TTTATATCC
TCTATGCCCC
CTAAATACAA
T'rAAACAGAA
CGCCTTTAAA
TTATGTTGGA
TTTTTrrATGG
GCAAAGCAGG
TCAAAAAAAT
AAGCATAAAA
AAGTGATGGT
TACTTTCCGr
AGTAGCACGT
TAATTCTTAT
CGTGGAATAC
GTTTTCCGTT
TACCATCCAT
TAAGATGAAA AACTCAGTAT TrGCTAGAAA GGTGGAGAGA 'rTAGATTTCT
CATGCGCTGG
GCGTCTGGTT
TTTAGCTTCT TTTAGTAAAA CTTTTTCGTT TGATAGGGGC TGGATAGTTG TGCTCTTATG TTTCTTTTCT TTTGTGTGGC GT'rTGTTTTG TGTGCTTGCT TTCGGACTTC TCTGGTATCT 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12 360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 GAACGGAGAT TTTCAAGGAG CGCTA.AAGCA AGCAGAACGG TCACTAAAAA TTGGTCAACA AAGTA'N'GAC CAATGGGAGA AAACAGGCA ACTGCCTAAG TTAAGCCAGA CAGATAGTCA CCAGCATTCT GAAGGAAGGT CGGCACAGGC CTCTGCTCGT ATTTACCTGG ATCCGCAGAT GGATTCACGC TTTCAAGAGG CTTATTTAGA AGCAATCCAG AACTGCGAATC AAACTGGTGC TTTTAACTTT GAACTCGTGA CTGAGTCTAG TAAGGCCAT ATTACGGCTA CGCAGATGAA CGACGGAGGC ACTCCTGTGG CAGGAGAGGC GGAAAGTCAA ACTAATCTCT TAACAGGGCA ATTCTTGTCC GTAACGGTGC GGTTGAATCA TTATTATTTG TCCAATCCAT ACTATGGCTA CTCCTATGAA CGCCTTG'rCC ATACGGCAGA ACATGAGrrA GGTCATGCGA TTGGCTTGCA CCATACAGAT GAGAAGTCTG TCATGCAACC AGCAGGTTCC TTTTATGGTA TCCAGGAAGA GGATGTTGTCA AACCTCCGAA AAATATATGA GACTAGTGAG TAGGGTACTA TCN'TCCCTA ,TrL,,rGC TATAATGG.AA CTATGAACAA CTTGATTAAA TCAAAACTAG AGCTCTTGCC GACCAGCCCT GGTTCCTACA TTCATAAGGA TAAAAATGGC ACCATTATCT ATGTAGGAAA GGCTAAAAAT CTGCGTAATC GAGTACGGTC CTATTTTCGT GGAAGTCATG ATACCAAGAC AGAGGCTCTG GTGTCTGAAA TTGTGGATTT TGAAWTATT GTTACGGAGT CTAATATTGA 474 GGCACT'rC1'C CTAGAAATCA ACCTGATCAA GGAAAACAAG CCCAAGTACA CAAGGATGAC AAGTCCTATC CI-rrCATCAA AATCACCAAT GAGCGCTATC TATCACTCGT CAGGTCAAAA AGGACGGAGG TC~rTT'T GGACCCTATC GGCAGCCAAT GAAATCAAGC Gcn-rGCTGGA TCGGATArC CCTTrTCGTA CCCGcC'rC AAGGTCTGTT TTTAr'rACCA TATCGGCCAG TC7TGGCCC TAAGAAGGAT GAGGCTTAT'r TCAAGTCTAT GGCCCAGGAG GTGTCTGATT TCAGGATGAC AAAATCZATCG ATGATCTCAA GAGTAAAATG GCAGTAGCAG GGGTT& yCrTGCGC AATACCcGTGA CCTGATTCAG GCTATTGGAA
CAAGCAACGG
TAAGGGCTGG
GTCAATCTCT
TTCTATCAAG
GTCATGGCGA AAGATTTGCA AAATCGCGAT G;TCTTTCGCT ATGTGTGTGC AGGTITTTCTT TG'rCCGTCAG GtAAGCTCAT TCCCCTACTT CAATGATCCA GATGAGGATT TTTGACCTA AAAAATCTCA TCTAGTTICCC AATGAGGTAC TGATTICCGCA AAGAAGCTGT CkAGGCPITG GTGGATTCCA AACAACTG4GT CAATCTAGCC ATAAAAAATG TGCTAGAAAA ATCTGTCCAA AAGACTCAAG AAATCCCGAC CCCAGTACGT ATCGAGTCCT CTGTTTCGGC TATGG'rGGTC TTT'GTCAACG ACAAGATAAA AACGGTTGTT GGACCAGACG GACGCTATGG TCGAGTACAG CGTGAGGCTT GGGGGCAAGG TCAAGTCAAT ATCGCTAAGC T'rCCAATTGC 'rGGGCTGCAA AAGAATGATA ATCCGCTTGA GGTGGTGGAT TTGTCTCGCA TCCAAGATGA GGTGCACCGC TTT-CCTATCA CTTTCTCATC TCAArTGGAT GGGATTGACG TGAAGCATTT CAAGTCTTTG ACCAAAATCA TTGGGGTACC 'rAGAGTCGTT GCAGAGGCTG CCTTGCCTCA AGTAGCAGAA GAAAGAGTAG CATAAAATCG CAAT'rTTATC AGATGTTCAT CCAGATGCTA AAAATCAACG GGCCAGTGAA CGTCCAGGCG CAAA'rGACTT AGTCGCCCTG AGATTCTTAA GCCTCAACGT CTCGTGTTAG TC'rAGAGCAG GAGCTATTGA AAATCTAGGG ATATCATGCT 13200 CACGCTTGAT 13260 CCGATGTGGG 13320 AGTGTACCAA 13380 ACACCATCTG 13440 TTCTGAAAGG 13500 CACAAAGTAT 13560 CGCTTCGAAC 13620 ACTATGTGGA 13680 CGAGCGCGAT 13740 'rGTAGGACAA 13800 CATATTGACG 13860 CAGAGAAAA 13920 AAGTTCAATC 13980 CGTTTGC1'CC 14040 GGAACTAGCC 14100 TACCG;TAAGT- 14160 GTCATTCGCA 14220 GTGATTGATG 14280 GGCTTGGATA 14340 CTCTTTGGAG 14400 CTCCAACGCA 14460 TCCAAAAATT 14520 CAGAATCTTA 14580 ATTGTCGAAG 14640 CAGGGAGAAG 14700 CACAATGAAC 14760
TCGATAACTC
GTAAACCGAG
AC1'ATGCCAG
TGACTCCTCC
AGGTTATCCA
AGCACCAAAC
A'IrCTCAGGA CTTTCCACCc
GTCTGGGACC
AGGAAGCCAG
TGCAAAGAAA
ATTACCAAAC
CGCAATGCGA
TATTG;GCTTC
C'rAAAGGACC
TAATATCATG
TAAGAAGGAT
CATCAGAGAG
AGATTTGA'TT
AGAGGAACTG
CCATGAATTG
ATTTTTCCTC
CCAACTGCGC
TAAACGCAAG
TGTGGATGAG
GTTGAACCCG
GGAAGGAAAC
CGGCGCTAGA AGCAGTGATT TGGGAGATAT Tr-rTCTTCCT TTCCTATCAC AGCAAGTGTI' 14820 14880 14940 475 CGAGGCAATT GGGATGATCG CCACAGGAAG TTCAGCTTT ACGAT'rGTCT GGCTACGAAG TGTCCTTGAG GCTTTAGATG GGCAATATGG GCGTATGACA CAGTATTTGA TGGAGCGAAT CTTGCCTTTG CTGGAAAAGA AAGAAATTGA ACCTGACAAA AACTATGGTG C1TGACTTGCT ACTGCTAGAT GCGGAAACGG ACGTrGGCAGT
CTAGAAGAC
GGATCCTGCA
CGGATTGCGC
AGTTGAGAAT
TTATGGTCAT
rTTCTATCT
GATACAGAGA
GTTCACAAGC
ATI'GGCATGC
ATAGA.ACTTG
CTCATAATT
AATTTGACCA
AGTTGCTTCG
CCTATTTTAA
AAGATGGCGA
TTATGGAAGT CAAGGGCAAC AAATCATCAA TCCAGGGTCG T'rGGGAGGCG T'rAAAAAATC ACCGTTCCCA GTATGCCG'rC ATTACTCAAT ATCCAATTTC GTAAAGTTGC TTATGATTAC CAAGTCCAAG GGGCTTCCCT TTATCGAAAT CTATCAAGAA TCAGGGGCAC AATCTGGAAT TATTAGCCAG CTTAATAGAA GAAGCTGAGT TAGAATTGGC CTGCGTCGTG ACGATAACTA 0 00 0 0 0 0**0 0 0000 00 *0 0 0 0000
AAGCATGGGT
ATAGCCAATG
CTCAA'rGAAA TGAGGT'rGCA
GAGATTGATC
ATCTAGAGGA TGTGAAGAAT TTTT'1-GATT TTTTGTAAGA GTTTCCTAAA CAAACTAAAA AAGCGATTTG CTGGTCCAAT CGCTTTTAGT ATATCTTATA ATCAAAGAGC AAACTAGGAA GCTAGCCCTA GGTTCCTCAA AGCACAGCTT GATAAAGCTC ACGTGGTTTG AAGAGATTT CGAAGAGTGT TATTGTAACT TGGGAGGTAA GAACCACCTA GATAGGTATT GCTGAGTTTT TCAAGGGTTC 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16593 CGTCTTGATA GAGTTCrTTG AGCC-CT'I-' TTGAGAAAAT GATATAATTG CI'GGGGCTAT CTAAACCACG GTCCTrGATA ATCTTTTGAA ACTCTCCCT'r AGCAAGGTCT AGGATTCGTT TAGCGGGATT ATCAGTGTGT TTCTGATTCC CGGTATCCTC TTGTGTrGTT TTACCAGCGA TGTTGCTGAC AAGGACGAGG GGATTGTTGG CACGCTCTTT TGTGTAACTC AAGTTATTGG CTGGGAAGAT GCTCTCCCAG GCGGTTCITrT CATCTACTGC CTT'rAAAACT TCGATATCAA CAALATTGCTC -TTAAACTCT TTTTGGTCCC
CTGCAGAAGG
CGGATACCTT
TACCAATATC
AGTTATTCAT
TAAATCAACG ACTGAGAGGT GTCAAAAACT AGGAAATCAA CTCACCAGAA AAATTAATTG GAATTGAGCG TTAGAAGTTC TCTGGTCAAG AGAAGTCAAA. GGATTTTTCT AAArTrGGAAG CGAGTAAAGG TATTTTTCAG CCGCAGCCTG ATAGTGACCA GAATCAAGTC GGAATTGAAT CTCGTAGTCG CTGAGTTTr AGCCTGTCAG ATTGCCCTTG TCTTCGTAGT GGACGATT-GT CTTTTGAGCG CTACTCTCTT AAAATrGTAG GATAGCTAGT AATAGGCTAA
AAG
CAAATGGTGG CACGTCGCCA TGGGTGTAGC TTGA'rTCTCA ATTrTTrCAT ACTGTCTCCA
GCTGTAGCAA
CAGGCAACCA
TTCAAATGTA
INFORMATION FOR SEQ ID NO: 53:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3510 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
GGGATATCCT TATATCCTrG TTCCTGGAAC CATTGTGGGA ATTGCTCAAC AGTTTTTTCA CCTTGAATTC CTGGTGCAAT GACAGTAAGA ATTTCGAAAT CACGATCTGG TTTCGCCGCT AGTrCCATCA ACTCTGGCAT ACTTTCTrG CATGGACCAC ACCATGAAGC CCAAAACTrc AAGTAAACCT TTTrACCCTT AATGTGAAGT CTGGAGCATC
AAAATCAGAT
TI'TTCCAACA
GGCTGTTGTG CTG3CTTGAGT CITrTTAGTT AATGACAAGA GACTTAAGCC AGCAAACA'r AATTCCAGCT AGAACATTrrA CTTGTCCTAA GAAACCACCA ATTTTCTTTA GTAGCATCAT GACTAGACCI' GAAGCTAGTG CCAATACCAA AGAGTATAAA TCGCTCCTrG CCAAGCGCCA GAACTTAAAA CTGGACCAAT ACAAGGTGTC AAAGCTGACC AATAACGATT AGAATCTGAT AATTTC'?rCA AATGAAAAAT TTCCATCTGG ATGCCATATC. GAAACCAATT TOCATAGAGA AGAATAAAGA AAATGAGAGA GATACCAGCG CAGAGAACCT TTCTCCCAAA CAAAGAAAAG AACI'TAACTT CTTTrGCCATC CATGGATTGC GCAATTTGTr GTACAGTCGT TrGTTGTTrT TCTTCCTCAC CACAGGCCAT CAATACAACT ACwTTTrTCA TTTGTCCTCC TTTATTCAAA TAGTAACAAA AT'rCCCATTA AAACAATGAG ATGACGCTTG ATTTTACTAA AATATGGCAT
GAAAGGAAGG
TTGCCTCCAG
CAACCAAAC
?TTTAAAGG
TG.AAGACCCA
ATATGACCAA
ATAAAGCAAA
CrTTTTGCAC GCCATGCCaG AGTGTAAATG AAGCCGCAAG TGCTAAAACA TAAAGGCTAAT ACCAAGTAAA TAAAACTTTT TTGAACTTCT AAATGATAAT AATAGCTCCC AGTAACCAGC ACCAAAGCCT GTGTTCGAAT CAAGCCTGAC TTTCTTGATC ATCCAATAAA AAAAAAAGGA TAAAACACET 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 ATCCCAGCAT AG~aCTGGCAG AAGAGGAAAA ATACAAGGAG GCTAGAAAAA CAGAGATTAA ATTCTAATCC TATTTTACTA TATCGGGCAC TA'N'GGACCA AATAGCCAAA AAGCAACGAC TTAGGTCTGA CATTTCATAA CCTCTACTAG TTrAGITCT TTTCTTGGAG AAGGTTTTGG AAATACTATC GTTTCCAATA AAGAACCAAC TTTCTTAATA TATTCAATTT TATTTGTAAG CTTTCTGCTA CGCAAAATCG ATCTTTTCTT TTGCTAG'rCA AGGCGGATCT TATCCCCCAA AAGGATTACT CATCGCTGCT TTTGTGAACG A.AAATGTCTT ATCATGTTTT ACI'GAGTTT G'rCAAGGATT GCTTTAAGCT GTCTCTGCTG AGCCA~rTTC TTCTTTCACG AAATCAAGG GCTTTGGCAA GGACTTTTTT ATCCGCI'?r TCTGCATCTA 477 GCTGTCCTAG AACC7TGATC AATTCCGTGC TTAATTGCTG GATTTCTGAC TC?1"rCTTAC GGCGAATCAG CCAGAAGGCA ATCACGCCTA GGAGGGCAAG TAGAC 'GACC ACAATcACTC CTGCCGGAAC TGAGT7=I- TCAGTCATCT TATCTGAATC CTTACTATCT TCCGrCCT GTTTGCATC CTrCTTGTCC TGTGCAGGCT TG=rTCGCT AGCATTTGCT TrCACATCTT TGAGAGAGTC CAAGGCAGCC CAGCCCTrCAC AGACTCTACT GCAGCTATGCA GACCTTACTC TGTCAAGGCA CTATC7"rCCG GAGCTT'rrTG AGCATCTAGG AGGACAGCCT TGG7.rGCATC GATT'rrCGGA TCAGATACTG TTCCCAAACC 7TTCAAGCGT 7GGTCTAACT CTTGACTCAA GGCACGAAGT TCAGACTT'GT CAACTTCGCTC TTGACTGT GTGCTCCTTG ACCTAGCCGA AGCGC'rTGCT ACCACTCTAG GATC1TTGAGT CGCAGCTGAG CTTGAGCTG GGACAGGGCT TGCAGGTTGA CTAGCAACAG TTrATGGTATA TTGAAACTAG- AATAGTACAT ATGGACTTCT A.AAACATTGT TAGAATTCGA TTTTACTGTC CTGATCGATT TGICCTAT'rC TTT~CATT TTACTATAAT AACCGATGGT CTGGTTAATG 'rTGGTAAGAG AAA=CTGA AACCAAGCr 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 00 0 0 4 9 0 00 09 0 9 0 .t 00 0 .0.
0 0 CAAAAAAGTC GCTCGTCATC TAGACCTGC AACCAAAGAA ACTGACCTTT TAATGAGCGA CAATCTAAAC AGCTGCTAGG TT'rCTGGG'rC TTGT'rCATAG AGCGCATAGT CGATGGTAGT GCrTCTGGAT TGTCAGTAAG GTTCCTTTTA CTTGGTGGTT GTCTCTTCGT AAGTCATTGG AGCGATTAAT TCACCATTTC ATCCTCTGAT ATCTTCTTCC AGATACTTTG CCTCI'TATTA CCATATTrCTC GATAAAAATA AGTATCGAAT CCTG'rTTCGT TGCTTTAAAC TATTAAAAT CTTAAGAAANT AACGCTACTT TAGGTGTGGT TCT"rTT=C GAGTGTAGCC CATAGCTTTG TGGATCACAG CCAAAJCTCAG AAGCTATTTC AGTCAAATAA ATAG7T'='A AGTCTATCTC TATCAACTT'r TCTTGGTrr TAGCTCTCCT GTTTTCTCTT 1-rAGCl-rTAA CCACCCATAA 2340 2400 2460 2520 2580 2640 2700 2760 ATGGTA'rTAC GTGAGATTTG GAAAACGTGT GATGCrCTG TTATACTACC TATTCGCTCA 00 00 *0 0 0 *040 0 t000 0b* 00 00 00 9 0 CAATAAGAGA GAACTTTTTT ACCAAAATCT ATTGAATATG CCATAAGAAG ATTGTGTACT ATTTTTGGTT CA~rCACTA TAACACAAAA TAGATTATTA AAAAAGAGGT CTAAACCTCT TAACTCAATT' ACTCCGCCAG TAGGACTCGA TCATGATTAA CAGTCATCCG CTACTACCAA CrGAGCTATG GCGGATTAAA 'rTCCCTATCT CACAGGGGGC AACCCCCAAC TACTTCCGGC GTTCTAGGGC 'rGTTCGGCAT GGGTACAGGT G'rA'rC'CCTA GGCTATCGTC ACTTAACTCT TACTCAAAAT 'rGAATATCTA T'rCAATTTAA GAAAACCGTT CGCNrCATA C7TTGGATAA GTCCTCGAGC TATTAGTA'rT AGTCCGCTAC ATGTGTCGCC
ATTATACCAC
TTACATAACA
ACCTACGACA
GCTAAGCGAC
TTAACTTCTG
GAGTAATACC
TTCTCAGTA
ACACTTCCAC
2820 2880 2940 3000 3060 3120 3180 3240 TTCTAACCTA TCTACCTGAT CATCTCTCAG GGCTCTTACT GATATATAAT CATGGGAAAT CTCATCTTGA GGTGGkTtCA CACTTAGATG CTTTCAGCGT TTATCCCTTC CCTACATAGC TACCCAGCGA TGCC~rrGGC AAGACAACTG GTACACCAGC GGTAAGTCCA CTCTGGTCCr CTCGTACTAG GAGCAGATCC TCTCAAATTT CCTACGCCCG CGACGGATAG GGACCGAACT GTCTCACGAC GFCTGAACC CAGCTCGCGT
INFORMATION FOR SEQ ID NO: 54:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20986 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
CGGAGAAAAA CATGGCTAAG TCAAACTTNG AAAAAGTAGA ATCAGTTGTT GGCTGGGTTC GTGATAAGAA AATCACAGGC TACCGTATCT CTAAAGAAAC GAATGCGCGT GAAATGTCTA TCATTGCTCT GGCGCAGGGT CGTGCAAAAG TAAAAAATAT TTCATr'rGAA ACAGCCCrAG GCCTAATrGA TTrCTATGAA AAAAATTATG AAAAATTTGA AGATTAATCT TTGGATAACG 3300 3360 3420 3480 3510 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 GCGGATTCTr GACCTTCAAG TACTAGAGAT CAAAAAGACT GCACGGTTGA TGCAGCCTTT TCT'rTTGTTC GGTCTTCTTT AGGA'rTGGTG ATCACGCCCT TATCCATAAA GATAACACGG TGGGTTACGA CAATCATGGT CAAGCCTTCC TCTCCAACCA T'NTCTGGATC GAGAGCTGAT TTCATGGAGA GGGCACGAGC GATGGCCACA GGTTTGGCTT GCCAGTAGCG TTCTCCCATG TrTTTCAGCTT CTGTGCGTTC GCGTTMAGG AGAACATTGA GA1TCAAA GAGGTTAAAG TATTGCGTGA GGTCATAGCC TTTTCGAGG
AGAGAATCTG
TCTTTTrATT~
AAGAGGTCTT
CCT'r"CATT TTGAGGACAG TGAGATAGCG TTGAAGGAAC CI'GGTTTACC TTCTTCAGCG TGAGAGACAT CACGGGCAAA TGAGCCAGGT CCTGCATGAT GTTGGTTCAT CAAAGAGAAT CGTTGTTTTT GACCACCTGA
TTCCATTTCA
TTTGAGGACT
AGCGTCCGGA
GAGTTGTTT
CCGACCTTTT
ACAGTTGTCT
GATrGGAAAA
ACGTTTTGTC
CCAGGTTrTTC TTTGGCAATC GAGCGACGAT TGTGTTTTCA CCATCCCCAA CT'N'TCACGG CATGATAAAG GATTTGTCCA TCAGTTGGTG TTTCAAGTAG GTTAATGGAG CGTAGGAAGG TCGATTTTCC GCTTCCAGAG CTTCCGATGA TAGAX3ATGAC CTCTCCCTTG TGGACAGTGA GTGAAATGTC TTTTAGCACT TCGTTTTGTC CATAGGATTT TTTGAGGTGT TTAATTTCAA GGATTGCFTG TGTCATTATT TCAAATCCTC CG7 TGCATT TGGTTAGCAC CTGTAGTGTA GGTATCCATG TCCATTCTGC GCTCCATAAA CCGTAGGATA CCTGTTACGG TCAAGCTGAG GACAAAGTAA ATCACGGCGA 1200 TGArGAA TGTCrGAAG TATTGATAGG TTTGTGIrCc CACGGTA?r CCTCAGAAAT 1260 AAAGTCGAC AACAGAGATA ACGTTCAATA CAGA'rCTATC TrrGATA?'rG ATGACAAAT 1320 CATTACCACTr TGCACGTAGG ATGTTACCGA CTrACCTGACG 'rAGGACAATC TTACGCATGG 1380 TCTrGGTTAG GG1'CATAcCA AGAGCAGTCG CAGC774CAAA TTG=CCTTc TCAACTGCrA 1440 GGATACCACC ACGGACGAT'r TCAGTCATGT AGGCACCGGT ATTGATTGAA ACGATGAAGA 1500 TACCAGCCAG TGTACGGTCA AGGTTCATCC CGA.AACT-rG GGCAG1'TCCA TAGTAGATA 1560 CCATCGAG ACAATCATT GCCGTACCAC GGAAAATTC AATGTAGACA TTGAGAACCC 1620 AGCCGACTAG TrrGTG CCGTAAATGA CTTTGTrT-rC AGAGAGAGGA GCAGTACGGA 1680 AGACACCAAT GGCAACTCCA ATAATGAGAC CTATGATGGT TCCCACGATA GAGA'rTAAAAJ 1740 GACTGATACC AGCACCACGC AAGAGTTG'rT GCC-AGTTrTC AGAAAGAA'r- TTAGCAACTT 1800 GGCTAAAGAA ACTACTGCrA GTCTCTrCAa TTGIwMTAGC TTCGGCACGT TGTTCCTTGA 1860 TCATACGATC CATCAAGGCA AC'rTGGTCAT CTTTTGAAAT GGTTTCAA'rG CTGGCATTGA 1920 TGGCTAAT ACGATTGTCA TT'r'I-ACGAA GCCCGATAGC GATAGCTGTA TCTTCTTCCC 18 CAGTT'rMGAA ACCAGGT'rCT ACTTGAATCA TCTTGAACTrT AGAGTTCGCA GCrCAGCAG 2040 TCACTGCTTC 'rGGACGTTCA GAAACATAAG CATCAATGAC ACCAGCCCTCA AGAGCTTG'rC 2 100 ***GCATTTGAGC GAACTCTCCC ATGGCTGTT C?1'TrTAGC ACCTGGGAr-r TGTGCAATCA 2160 AGTTATAAAG GTAGACCCCT TGTTGAGAAG TGAT"rr'CC ACCGTTAAAG TCATCCAAAG 2220 AT?1'AGCACT TGCGTAGGCA GAATCTT'TTT TGACAAGCAA AACTGGTTCG CTAGTATAGT 2280 *AACTGCTCGA AAAGGCAATT TCTTGT'rGC GTrCTGCA~jT TGACTCATA CCTGCGATAA 2340 TCATGTCAAT C?1'ACCAGAA GTAAGGGCAG GGACTAGACC TTCCC-ACTTG GTTTTAACAA 2400 CCAAACGTTC TTTACCTAAC TCCTTAGCGA TTTTCTTGGC GATTTGAACA TCGTATCCGT 2460 *..TGGCATACTG'AT'rGGTCCCA 'rCGAT'rTrGA CAGCTCCGTT GCTATCATCA 
TCCTGGGTCC 2520 *AGTTAAAGGG AGCATATGCTr GCrrCCA'rAC CGA'rGCGTAA ATATTCATCG GCTrGACCAA 2580 CATTGACAAG TCCTAGCATC AGCAAGAGAC T'rGTGAAAAT AGATAAGTAy ATGTGGC'rCA 2640 *TGATTTCTCC 'rATTCTGATC TATTAAAAAA TAACTGTCTC CTATTTTATC GAAAAATGCG 2700 **.TAATTT-rTCA ACATAAGTAA G'rCTTTACTrr ACGAAAAAAT OCTATAATGA TAAGAAAGAT 2760 AAAAAGGGGG CTTAGTTGAT GAAAAAACT 'N-r1rCTTAC TGGTGTTAGG CTTGTTDTGC 2820 CrrCTCAC TCTCTGTTTT' TGCCATTGAT TTCAAGATAA ACTCTTATCA AGGGGATrTG 2880 TATATrCATG CAGACAATAC GGCAGAG=? GACTTrAAGG GCCAAATCGT GGGACTTGGA 480 AGACAGAAGA TAGTTACCA GTCAGGAG CGTGCTGGA AGATGCCTAG CGGGrMGAC AAAAACGGTG CAGAACTAGC AGATGTGACT ATTGACCCTC
AGCGAAGTAA
GGCGACATAG
GATATCGCTG
GAATTTrCATG
TTTAGAGAGG
ATCCAAAGAT TCAGGCCGCG CAGAAGAAGC GGATGGTAT ?1'GAAGIrGA CCTCGTCTGG AATTAAAT'rG GCAACCTCTG TAAGGGGAGA CAAGGGGGCT GAACGATTGA AAAGAGTAAC ACTIGTGAGAG TCTATAATCC AACTrAAAAA ATTTACTT?1' ACAGATAGTr CAGAGTCTAT GAAAAACTCT 7NTCCATAC CT'rGATTATA CTATCCGTTT TAT'rGGCCTC GGACCGATTr GAAGAGTTTA ATAAGATAGA
AGGTCAGGAG
CCTTTATGAT
TGAAAAGTTr
AGGCAAACTT
AGACAATCTT
2940 3000 3060 3120 3180 3240 3300 3360 CCGGCTAAGC GTGGAG'rTGA GTTGCATGCC AGGGATCAGG GATTGAAAGG GAATCGTTTA TGCTAGCGCT 3420 AGACTCGArr 3480 GATCCTTrCC 3540 CACTCCTTCA 3600 GTTAGAGAAA AAGATCAGAG ATCTCCTTGT TATTGAGTGT GTCAAATATG CCAAAAATCA TTATCAGAAG CAGTCTACTC GGAAAATTCA CCTTTGATCA AATGTCTCTA TCATTTCAGA TNGTCAAGCT TTGAGAAAGA CTTTrCCAATT TGTTTGCGGA TCTGATGAAA AACGGATTCA TTGAACCAGA TGCAAGAAGG TATCGTCCTT TAACTGGTGG CTGCCCCTAT TTATCGGATT TACCTCCCTT TGCCAATACT TCGTCTCTAT GAACCACCAA TGGA.ATTAGA GCCTATCGTT GACCTCCTTG GAGCAAGTGA GTCCCTTGGT CAAGGGAGCT ACTTATTCAA GCTACCTTC AGGAGATGCA GTTGGTTTGA CTGCCTAAAT CTAGCTTTTT TTACAAGGTA TCTGATAGTC AGCAAGAGGG CTTCAACTCA AGTGAGAAAA CGAGTTTCCT GGAAAAGGCC ?TGCAAGTGG TGGTTTGTTC TTGTACAGTT TGGT1TCTA GGGTTAG 1rT TAAACAACTC GTTACTTGGG TCCTCCCTTC CTGCTTCTAT TTTATTTATA GAAGAAAGAC TAGATGTGAT AGACCGTGG GGCTAGTAAA AGAAGATGGT CAGGTAAAAA AGAAGAAACI' TTTATCGTAG AGCCAAAGTT AATCTTCTTT TGAAGAGGTA TCTGGGGGCT CCCAGATTAT GTATGGGTGC CTTGACTATC TAGACGTTCA TGGCTATCTT TGTCTGTTTT CTATTATTGG CGGGAGCTGA GGTCTACTAT GATTGGATCA GGCTGAACTG 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 AAGCTTCGAC TAGATAATCG TGATGGTGTT CTAAATGAAG CTCTGGACCA G~rTGAAAA TATGTTGCGT GAGATTGCAC GAAAGTATTG 'rGGTCTGGAA TCGCCTCTTG GTCTATGCGA CCTTATTTGG CTATGCGGAC
AAGGTTAGTC
TATGTAGCTT
GCTAGTGTCG
GGTGGCTrCT ATTTGATGAA GGTTCATCAG ATGGCTGGCA CAGTACGTT CAAATACAGC AAGCACCTAC CTGGAGGCGG AGGTGGCGGC ATTCAAGTGG AAAATCCAGA TATCATTCAA CAGCACAAAT TCTGTATCTT CTGGAAGTGG AGTATCGGTG CC=MTAAAG
TATCAATCTC
GAGCCATTAT
AAGTTCTGGT
AGAGCTACCA
TAGACTGAAA AAGTA'rGATA A'rAG'T'AT CTAAACTATT CGGCAAAAAG CCCTTGAAAA ArrACATCAC 7-IrrT'TCG GATAAGTTr 'N'I'GCAAGGT AGTCTTAAGA TAGGCCrAG GAGTGCCCAC CGCAGATGAG ACCTGACTGA TAAATAGAAG AAAGGCATGA ATAT~rCGAA TAATGGAAGA TAGAAAAAAG ACAAAC'TATA AGAAAAGTCA TCTTATr'rCA A'rTTGATGAT 7rGGCGATGA I=TAGAGCA AGTCCATTT 7rCAAAGGTA ATCCTGTGTT AA~rC-AGAA TCAAATGGCA GCTCm-'rTT AGGATATAAA ACAGGGTTCG GGATGATGGC TACATTGrAA TGm'TCCTT ATTCTAACTT AAGCAGGTGA AAAGCGAGGG CATGC?~rGG CAGCTTGTAT GGGAACCCCG TTTGACCAT CTTCCAGCTA AATCAATCTG AATCCAGTCC AGCGAAAGCT TGTAATTGAG CAGGATTATC TCTCGGCTAA AATGACCCCC CTAAACGATC CCCAATCCCA GTAACCGTCG TGATGACCGA GTTGAACTCA GCCATCGAGT CATTGATACA TGTTTCCGCC TTGTCAATGA GCCTCTTGTA ATGCTTGATG ATTTCGAATT CACGAGCAGG AGATG?3'GTT CCGATAGAAC GAGGTGCGAC TGAGAGGATA ATT'rCTATCA GCTTATCAAA TCCTGCCTCA AAGAGTTGGT AGGTATATTC TGAATGCTTT ATATCAAGAC AACGAGTGTA T'rGTACTrTC TATGTCTAGC CACTATT AGGTCTACT-r TGGGGTCAGA AAGAAGTTTA AGAGCGATC TGCGAAGTGA TAATGATTTG GCAAATTCCT TATATCCTTG TTCATGCAGG AAGTTCAGTA GAGCGATGAG ACAGTCTTGC TTGATCTGTC CTr'rATTATT TGAAAAAGTG AGTGGTTTAA TAACATCGTG T'NrArN'A GCGATATCAA ATATAGTATT TCAAGTCTAC ITGGTTATCC GATCAAACGT CTATGCGTTA TCAAACTCAT CCTTTCGGAA ATCGTCAAGC GATTGGAGGA GTATACCACT TGGGCTTTGG CAGTAGCTAA TCC'rGAA'rTrr TAGAAGCGGT CAATCGCTTA ATCCTT'rrCT GAGGATTAGG GTAGCGTGTC CCAACGATT TATCCAACTC AGGAAAGATG CAATCAGACT GTTTTTCTTG AGACGATGAA GCCGATTATC GTGTTGAAAT TGTrCACGAT CATGAGCGTC TTTCTTATCC GTTTTAGTCT TGATGAGCAA AGGATTGTAG GTGTAAACT'r GATTAAAGGC ATAATGTCCA GTA'rCTTCAA 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420
GAATAGACAG
GAACAGTTT'r
TGCCTACATA
ACGAATTTr ATCTAAGAGT TCAAAACCAG TCCTCGAACA TTCALAGGCTG AAGCATGGGA GTACCTCCAG TGCCTTGTTA CC'N'AGACGA TACCAATTGA AACAAAAGCT AATGAACI'AA TCCATAGTGG CTGCGCTAAA TATAATATAG
GTGGTTAGAG
CTTATTCCAA
GGAGTAATCT
ATGTATCTTA
TGGTrGCCGA
AATCAAAATG
TTrGAAATT
TTTCCAGTAC
AAGCCTTTAT
AAAATCTATC TTCTTCGGAA TTGTTGAACG AATTACGGAA AGGTCACTTG AT'N'TAGCAG AGGAATTCAT CCAATACCAA GTCCATGTTT' AATGTCGTGA TTCAGCTTGG TGCTATTTA 482 GCAGTTATGG TGATTTA'rTT TAACAAGCTC AATCCTTTTA AACCGACCAA GGACAAACAG GAAGTTCGTA AGACTTGGAG ACTATGGTTG AAGGTCTrGA TTGCTACTTr ACCTTTACTT GGTGTC7TTA AATTTGATGA T'rGG7*rTGAT ACCCAC ?CC CTCATGTTGA TTATCTACGG GGTTGCCTTC ATCPA~rrGG GCTATCGAGC CAAGTGrAAC AGAGIrGGAC AAGCTTCCTT
GGACTCTTCC
GGTGGTTTGT
ATTCCTGTTA
AA~CT~C?.C
TAAATGGAAC
T'rGrTGGAGC CTCIGAGCT 7rGGGCAATT AGCATGGTGG CTATTCGCTT GGTAAATACC GTATCGTGCT GTATAAGAAA AACCTTGAAG CAAACCGCGT CAGCTT'rATC CCTCCTAGTT TGCTC1rrGA TAGGCGGACA CCTCTTTCTT TATCTGACTA GCATCTTGTG GTCCTCGTAG CCGGATTTTCA GAGAGCGACT TTTTC'rGATA AATCTTCCCG TAGGTTTTCT AGAGTTGTGA ArGCCAcCAG TATTAGGGTT ACCTCAGGAA TC7=TACCA GGGACTAGCC CAGTCGTTCA GT'rGTGACAG TAGTGCCTI'A AAGA'N'TCA G rrlrGCTC TTGGTCGCGA CTTGACCAGC TATCTGAAAA TGGTAGTGTT T7rGCACTTT GGGCAACTCT TCAAGGTT TGCAACCTCA AAACAGTGTT 'I-r-TCA'rGA CTTAAAAT TCTTGCTTAA TTCTTCATAG TT7=TGAGC AAGACTTTT ATAACATGGT T-rCAG~wrGCT AAAAA3CGCAA TAAAGCGCGT A'rACGACCGC TN'CTATATC GTCAGGTGC AACGArGTC AATTTACCTT CTATCTTGGG AATTT'GTGAA AGCCGGAGAA TCGGAGTAGC TTTTGCGGTC AACACGACTT CACCCTTTT ACAGTT'T'rT CCCTTr'ATTT ATACTCTTCG AAAATCTCTT TTGAGCAGCn CTGCGGCTAG CCAGTCATGG TAATCCCCAA AGTTGCAGGG CTATTTGGCT CGTTTGGTAA GAGTTGAAAA 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 AAATGACAAT TTTTCCAGCT 7=CrrTGTT GAeGTAGATT GAAGAGTCAG CTCT'rTT'rTG ATATCTTCCT CAGCAAGGAG CCrTGCCGAT 'rGATTTACGG ATGCGATTGG ATTTGACTGG CCTTTCGATA CAGATCATAG CCTACGTCTAC CAAAACGGTC CTTCAAGTAA ATCAGCACCA GTAAAAACGC CCATTTGATG AAGACGT'7CT ACTGTC~wr"r TTCCTACTCC ATGAAATTTG GAAATATCCA TrcTITGAG AAAATCCTCA GCCTGTTCAG GTAGAATCAC TGTCAAACCA TGTGG==T' GATAATCACT CGCCATTA GCTAAGAATT TG7TGTAAGA AACGCCTGCG GAAGCAGTTA GATGGAGTTC TTGCCAGATA TC~rrrTGAA TIGAGGCGAGC AATTTTGACC GCTGACTTGA TACCGAGTTT ATTTCTGTC ACATCCAAAT AGGCTTCGTC AATGCTCATG GGITCAATCA AATCTGTATA GCGCTTAAAA ATAGCTCGAA TCTGGAGTCC CACAGACTTG TArrTCTCAT AATTCCCTGA GATAAAGACA GCCTGGGGAC AACGTTCATA AGCTTCCT'rG GAACTCATGG CAGAATGGAC ACCAAAAGCT CTTGCCTCAT AACTACAGGT AGAAACGACT CCCCGTCCAC CTGrGCCG AGGTCGCT'r CCAATAATGA CAGGrrI'TCC TCTGAGTTTA GGATTATCCC TGATTTCCAC 483 TGCAGCAAAA AAGGCATCCA TCTCAATATG GATGATIrrT CTTGACAAAT CATTTAACAA AGGAAAAATC AACATGCCTA GCACCTTTTT ATACTCTTCG AAAATCTCTT CAAACCACGT CAGCTCTATM TGCAACCTCA AAACAGTGT TTGAGCA6ATC 
TGCCGCTAGC rrCCTAGTTT GCTTNTCGAT TT~CCATTGAG 'rGTTACTGCT TAI-rYTClT TA'rrATACCC rr1ICGA~ AAAAAAGAAA AAAGGACTTT ATTTTTI'CAA AAATATAATA CAG7*rTGAAA TAAAATATAG ACIGTTTTAG AAAAGAAAGT GTAAAAATAG GGAATT1'TCA C7rGlrGAAA 'rCGG.=ACTA TA'rGGTATAC TTCTCTTATG AATGTAACAG ATATGGCrGGT TAAGACAGTT =GAAGCAC TCAAAGCCT AGATTGGAAA GAAAAAGCAA CACCTTA'rGA TGGAGACGA6A AGCTTCCTTG AGAAAA'rrGT AGAAGAAACT AAAGCACACT GTCCAACATC TATCGCTGAT ATCCCTCTG ATGACTG;TTA CTAGAXAAAAA GAGGACATTA AAGATATTTT TGACAAAGCI' TGGGAAGGCT GTGTA'rCACG ATTTGTACA6A CCTAACTACA CAGGACCA6AC AGAGCGTTCA CTTCACATCA ACGAAGAAAC TCCTTTCCA GATTTATCGA CAAAGAAAAT TCGGTATCCA AAACGATGAA CTCTTCAA6AT TGAACTTCAT GCCAAAAGGT TGGCTGAA6AC TACTTTGAAA GAAAATGGAT ACGAACCAGA CCCAGCTGTT
ATGGACACTC
GAAGTTATCT
GGTATCCGTA
CACGAAATCT
TCAAATATTC
CGCGGACGTA
TCACTAAATA
GTCGCGCTCG
TCATCGGI'GT
TAAATGACrG
TAAACCTTCA
ATGTTCGCAA
TGGCTGTCTG
TGGACATCTT
AATTCGTTGA
A'rGACCAATT
TGTAACAACA
TCACGCACANC
GTTAACGACG GTAI=TCCG TGCCTACACT AC-GTAACTG G.TCT-TCCAGA TGCATACTCA 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 T'TACGCACGT CT'rGCTCTTT ACGGTGCAGA CTACTTGATG CAAGAAAAAG GAATGCAATC AAAGAAATCG ATGAAGAAAC AATCCG'TCTT CGTGAAGAAG ATACCAAGCA ?TGCAACAAG 'rrGTTCGCCT GGGTGACCTT TACGGGGTTG ACCAGCGATG AACGTGAAAG AAGCAATCCA ATGGGTTAAC ATTGCTTTCA CCGTGTGATT AACGGTGCTG CTACATCTCT AGGTCGTGTA CCAATCGTAT TGCAGAACGT GACCTTGCTC GTGGTACATT TACTCAATCA GAAATCCAAG TGATTTCGTT ATGAAACTTC GTACAGTTAA ATTTGCTCGT ACAAAAGCTT GTACTCAGGT GACCCAACCT TTATCACAAC 7rCTATGGCT GGTATGGGTA ACGACGGTCG TCACCGTGrr ACTAAGATGG ACTACCGTTT CTTGAACACT CTI'GACAACA TCGGTAACTC ACCAGAACCA AACTTGACAG TTCI'TTGGAC TGACAAATTG CCATACAACT TCCGTCGCTA CTGTATGCAC ATGAGCCACA AACACTCTTC TATCCA6A'AC GAAGGTGTAA CAACALATGGC TAAAGACGGA TATGGTGAAA TGAGCTGTAT CTrCATGCTGT GTGTCTCCAC TTGATCCAGA AAATGAAGAA CAACGCCACA ACATCCAGTA CTTCGGTGCT CGTGTAAACG TTCTTAAAGC CC?1'CTTACT GGTTTGAATG AAGTA7PTGA TATCGAACCA ATCCGTGACG AC7rTGAAAA ATCTCTTGAC 'rGGrGACTG ACTACATGAC TGATAGG'PAC AACTACGAAG AACGTGCCAA CATGGGATTC GGTATCTGTG C'TATCAAATA CGCTACAGTr AAACCAATCC AAACAATCGG TGACTACCCA CG.CTGGGGTG AATrG7rGA'r CGAAGCTTAC ACAPLCTCCTC AAGCrACAGT ATCACTrTTG ACAATCACAT ACTCACCAGT TCACAAAGGT GTATACCTCA T-rGAATT-CTT CTCACCAGGT GCTAACCCAT ACTTGAACTC ACTTTCTAGC CTTGACTTA CACAAGTATC ACCTCGCGCT CTTGGTAAGA CAATTCTTGA TGGTTACTTC GAAAACGCTG TGAACGATGT TTrACGAAAAA ATCATGTCAG ACTGTGTAAA CACTAAATAC CTCACTCCAG 484
GTGTACGA
AAGTTCTTGA
ACACTTACGT
CTGTTCAAAT
CAT7'rGC'rAA
GTGACGAAGA
AAGA'rGACCC
TACGTAGCCA
CTAACGT'rGC
ACGAAGATGG
CTAACAAAGC
CGATGTTCAC AAAGACTACA ATTTGAATCA GTTAAAGCGA AGATGCCTrG AACATCATcc GCCCTCTTG CCAACTAAAC CACTGTTGAT ACA7"TGTCAG TGGCTACATC TACGATTACG ACG7'rCAAAC GAA'rTGGCAG CAAACTATAC AAAGACGCAG TTrACTCTAAA CAAACTGGTA TTCTGTGAAC TTGTCTAAAC TAAAGGTGGT TGG'rrGCAAA GT'rATGCAGC TGACGGTATC TCArrGAC1'A CTCGTGATGA ACAAGrrGCAT AACTTGGTAA S. 55 55 S S
55 TCCACGAAGT TCTrTCAATG TAATAAAAAG GAACCCTCGG ATTCCTTCGG CGCAATATGC AAAGCTAAGC TTGAGAAAGG AAATCCGTTT TrrGAAGTTT TGATGAGTT GTTAGTGGCC TGATGTAGT'r TTTATAGCAA GAGGTAACGT GTCTTGAATT
GATGACCCCT
TCAAACGACT
AATG7'TTTG ACAAATTrCG
TCAAAGTTCC
TCAAGTTTAG
ATAAATGTGC
AAGCCCCAAA
TCATAGTAAr GT'rAAGGTCT T'rTAG4GATAA
TGCTTCATGA
AAACGATCGA
GACAACACGT
GCGAAGACGT
AACAAAAAAC
TGGATGCATN'
GAGGGTTTG
GGCTCTTTGT
TCCTTrCTTT
GAAAACCAAA
CGTTAGAATA
TAACTTGAAC GTTATGGACT TATCGTACGT ATCTCTGGAT TGAATTGACA CAACGTGTCT GAGCTAATCA AGTTCTTGAA TGCTTGGGAT AGTATGAGCA CAACTGTAGT GGGTTGAAAA TTrGA'rGTTC AGGGCGATAA GGCAT'rGCGC TTGATGTCTT ACGCAATTCA ATCGCG'1TAG 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760
ATAGGAGTAG
TCTrCAGACA
GAGAGAGTTT
CCAGAGA7* TCA'rGTGTTG
TAATGAGAGG
T'rGATACAGG
CTCCCTAGGA
CCGACTATCT
ATCATCAAAT
GACAATGTGG
TCAAAGTGGT TTTAAAGGTG CGGTTGAGAT ACTGGTCAGT ATTCTTCTCT TGTAGATGAA CTTTrAAGTTC AGGTACTAGA GTAAAGATTT CTCTGAAAGT TCTAGCATAG AAAGGCTTAA A7TTCCAGTA ATATTTAAGA GCTCTGTATT 'rGTTGATTCT AGTCTGATTA AGAGCCCTGC GAACAATTTT AGCATTGGGA AATAATTCT CAACAGTGAC GACTTTAACT TTTTTCTAG GA'rATAACTT CCAGACATAT CTTCTTTCGA GTACTTGAAG TGGTCATGAT TTTCTTAGTG AGGAGAA'N'C ATCCCAGGAG A'IrGCTTGAG CTTACGATAG AAGCGATATG TGTAAGAGCC TTTCCGACAT TTGGCAATTT AGGACTTGCA TTGAAATCGT CGATAAAAGG GArrTTAGAA GTTTACATTT AGGTGGGTGA GAATGGCTTT ATrrAAGGTG 'rCATGTG1-rC CATAAGATAC ATGCGACTTT ?r'rCATACTC AGAAATT1ATA CAGCCAGAAA GT'rTTTCTGT TCAGATAC AGGAAGCCCC T'I-I-rGTGTG TAGCAATCAT TGCGACCCGT AGATGAGACG CTGTTGGCTA CTTTGGCGATA CTGC7TrTA GCTCTGTCAA ATGCTCCTCT AGGCAACTTT GTCAGTAAAA AAATGAT'1rC TTrGAAATCCT
AGGATTTCAG
ACGGTAGAGG
TCTCGTGA
TTCTGAACGA
cCCTTTCA GGC~rrGGA
TAATCAAGTG
ATGTrTN'GT
'TTTCTAATGA
AAAAAGCCCT
AAACACTTrr
CCAAAACTGG
TAGACAGTAC
TTGTCAAAAG
ACATGCAAAT
ACGTAAGGCA
GAAGGAGGAG
TTCCGTAAAA
GGATGGTGT '1'IGACGTCTC TTATCAAGAA GAGCAA74SAA AGCCAATNTC CCCTTCTGGT
GCAAAG'TGGT
TAGAGGTAGA
GTAGGAGTTG
GAGTT'GTTTC
AATCAATGAG
AGTCGTATTT
GTAATCCTCr TGGAAATGAA GGTAGAGATG GCTAA~rrAG GGCAArMC 'rGTCrCACCA AGC 'ACAGI'G ACTTTCCGAC GCTAGGGAAA CCACCAATCT CATTTGTTTI, CCTTTACAkGT b S S S
TAGCGAAGAC TrCGATATG GTATCGTGCT CTT?1'ATTCC GATGAGTAA'r GTGGTATGAT Gr'rGTTTAGG CGCTTr'rCAT TATAAGTCTT ATAATCTCCA CAGTGGGATT TACCCACTAC GTTCACTAGC AGAAACTAGA GACCAGAAGT GAAATATGGG GATAAGAATA GAGATGGCTT GATGAACTTA TAACAAATAG TGAGCCTTTT CCTCT=CG GATATCTACA AtrGTCTGAT CTAAGGCAAT CGTCAAAAAG TGATGTTTCC GGTATTCTTT CGTTGTAATA ATAATCAATG GACTAATTAG AATATTGTAT CCTGTAACAG TAATGGACTT TATTAAGTTT ACATCTGCTT CAGGTAGTGA GGAAAAGATG GTTTCTGTCA CGTCGATAAC TCCTTCAATC 'rrCTGCTCAG TATGATACAG TGGCTTqTCG CTTTCAATCC 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 6*b* 55 t
GA'rTATTTAA A.ATGATAAAA ATCGGGATAG AGTAGAGTGA GAAAAGGTAC AGCCGATGCT TCATCCACTC TTGAACAATT GCTr'rCGAAA CATAATGTTC GTAATAA'N'A TAATAGGGAA TTCTrAAGAA AGTCAGTGCT GTTAAAAAAG TATTrCTTGAA CT'rGGATAGT ACATGCTTTC TATGAGTCTr 7TTT1-CTrGA TTCCA='GT AAACCAAAAA GATATAATCC AGrrCrCCT AGTAAGTTTG GCAATGTTCC ATCAAAATCG
CTAGATTTTG
AAAGAGAAT-r TAAACCAAAC AAAAACGTTC CGAAATGTCA T'r'rCCTAAGA CTCTTGTATG CTGAAGAATC AGTTGAATAG CCTTrGGAAAA CGAAGAATTA GCAGAACAAT GAGTAAAAGT CATGTTGGCA TGTGGCTCTA GATACATAAA GAGGTTT ATTTCAA ACTCTTTrGGA CrCAGGGAAC GTATGCTAAA ATGAACATAC AATTAGACTG CACAATCATA TCTCAATACC AAAATGAAAC CAACTACTTG A7TTTCACA GTCCTACGCG G1TAGCCTTT 486 TCAAGTGGAA ATTCCCGACG TTTCCAAGTG AGTGCCACTA T4CGTCAGGTG TGATTTCAA CACTTCATGA CTGAGTTrGAG TGTGTGACCC AATCCATACT TCCATCATTC AAATCATAAA TGGAGGAGTG CAATrAAAAA ACGAATGCGA TATTCAGGAC AGGTCCAAAC CTACTGAACG TAGTAACAAG CCACACT?T GCGATGGAAA TATACTCTTT TTGTGTAAAT TCGTTAAAGC TGGTTATAAA TTTGATTACC 7TTGTAGTAGA AAGAAGCGGA GTATT?"AA AA'rAG7rGAT GCTGATCCAA GTAATAATTC GTTTGATGAG TATCTAAATT- AAATCTCAAC TCTTCCTCGA GGAGAC=T AGA~rGTAAT GAAGTTAAAG TATCCAA'rAA TATATTTAAA A'rGGTAAwrr
AATGGTGTTC
ATGTTTCTTG
TAGACAGTTC
TATCTGTAAT
CATTAATTGA ACTTGTTGCG TAATTCCTGC AAAATGCTTA ATCTAGTTCA ATALGACCGAA TC7'rTTTTCA ATGTATTTG'r T2'AGCATAGT TACCGAATCT TAGTTGCATA TAGATAATTr TAATTATTAT AATACAAA-AG AAACTAATTrG TCTTGTCAAA AAGGTTGTGG AATTTCCGAC TTTATTGATA AAACAGCATG
TAATAAAAG
TTATTAGAAA
TATCTATTGT
TATACfATAT
TTATACGCTT
GAACATAAGT
CATTI'TAAAG ATAGTAATGA ATAI-1-rTT1'T ATCAAATATT GATGCAAGTT G7TTAATTT TGTATACAAG 'rGTClCATTG T'rGCTACGTT TGTTAGTGAA AATCCGTTTC TTCGTGTATA
GTATTGGTGG
GTCGTTCTAT
ATACTAGGAT
CCAGGTTGAG
AGTTTTrATGG
AAAAAA.ATAT
AGTTAATAGT
AAGATAGCTA
CTTATT=TT
GTGATAAAAA
AATACTATAC
TAACGCACTT
AATTTTATCA
ATCATAGAAG
13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 i4520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 CGGATTAACT CAGTGAGATA CAGATTGAAA GTACCTATGA GATTAACTrG TTCTATGAAT AATGCTTAAC AGOGAGACAC ACA'rGAAAAA AGTAAGAAAG ATATTTCAGA AGCCAGTTGC AGGACTGTGC TGTATATCTC AGTTGACAGC TTTTTCTTCG ATAG'rTGCTT TAGCAGAAAC GAGACAGGCG AAGGAGGAGC GATGGCACAA CTGTTCGCA ATAAAACCTC GGACATACAC ACTAAACAAT GGACTGTTGA GCCTG-AAACC AGTCCAGCCA TAGGAAAAGT AGTGATTAAG GCTTCTAGGA CA'rGCCGTCT TTGAGT'rGAA AAACAATACG AAGGACAGAG GCGCAAACAG GAGAAGCGAT A7"rTTCAAAC CTTGACAGAA GCCCAACCTC CAGTTGGTTA TAAACCCTCT AGTTGAGAAG AATGGTCGGA CGACTGTCCA AGGTGAACAG GTAGAAAATC GAGAAGAGGC TCTATCTGAC CAGTATCCAC AAACAGGGAC TTATCCAGAT GTTCAAACAC CTTATCAGAT TATTAAGGTA GATGGTTCGG AAAAAAACCG ACAGCACAAG GCGTTGAATC CGAATCCATA TGAACGTGTG ATTCCAGAAG GTACACTTTC AAAGAGAAT'r TATCAAGTGA ATAATTTGGA TGATAACCAA TATGGAATCG AATTGACGGT TAGTGGGAAA 487 ACAGTGTikTG AACAAAAAGA TAAGTCTGTG CCGCTGGATG 'rCGTTATCTT GCTCGATAAC TCAA.ATAGTA TGAGTAACAT TCGAAACAAG AATGCTCGAC GTGCGGAAAG AGCTCGTGAC GCGACACGTT CTCTTATTGA TAAAATTACA TCTGATTCAG AAAATAGGGT AGCGCTTGTG ACTTATrGCTT CCACTATCTT AAAAACGGAA AGCGATTGAA ACCAATACCA AACATTATAG TTAAAAAATA AGGTACCTAC TTCGGTGCCA CrTTTACTCA GCGAGACAAA ATAGTCAAAA TATCCGATTA ATTTTAATCA T7TTTAGTA AATCTCCTAA TGATGGGACC GAGr'rTACAG TAGAAAAAGG GGTAGCAGAT TGATTCTCTT 7TTGGAATT ATGATCAGAC GAGTTTTACA TTAI-rTAAAG CTGACTAATG ATAAGAATGA CATTGTAGAA CGAGGCAGAA GACCATGATG GAAATAGATT GATGTACCAA GAAAGCT'rTG ATGAAGGCAG ATGAGAT'= GACACAACAA 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 ACTCATTTTC CATATTACGG ATGGTGTCCC TGC'rACGT'rT GCTCCATCAT ATCAAAATCA 'rAAAGATGGA ATACTATTAA GTGArMAT
AACTATGTCG
ACTAAATGCA
TACCCAAGCA
a a ACTAGTGGAG AACATACAAT TGTACGCGGA GATGGGCAAA GTTACCAGAT GTTTACAGAT AAGACAGTI-r ATGAAAAAGG TGCTCCTGCA GCTTTCCCAG rIAAACCTGA AAAATATTCT GAAA'GAAGG CGGCTGGTA TGCAGTTATA GGCCATCCAA CTTAATTGGA GAGAGAGTAT TCTGGCTTAT CCGTTTAATT TTAATGGTGG A'rATATTTGG
AATCATGGTG
GTCTTTACGG
AG'r'r'TATGC
AAAATATTGG
ACCCTACAAG ATGGTACTAT TAGGTATTGG TATTAACGGA AAAGTATTTC TAGTAAACCT AACAGTTGAA TCGTTATTTC
AACGGGAATA
GATCCTGGTA
GA.AAACTATA
CACACCATCG
CTAATACTGC
TTGCTCCTGA
CGGATGAAGC
CCAATGTTAC
TAACTGAAAA
ATTTGCAATT GAGAATGGTA CGATTACAGA TCCGATGGGT GAGTrAATTG a a a a GGAAGATTTG ATCCAGCAGA 'rTACACTT'rA ACTGCAAACC ATGGTAGTCG GGACAAGCTG TAGGTGGTCC ACAAAATGAT GGTGGT'N=G TAAAAAATGC TATGATACGA CTGAGAAAAG GATTCGTGTA ACACGTCTGT ACCTTGGAAC GTTACGTTGA CCTACAATGT TCGTTTGAAT GATGAGTrrG TAAGCAATAA ACCAATGGTC GAACAACCT ACATCCTAAG GAAGTAGAAC AGAACACAGT CCGAT'rCCTA AGATTCGTGA TGTGCGGAAG TATCCAGAAA TCACAATTTC AAACTTGGTG ACAN'GAGTT TATTAAGGTC AATAAAAATG ATAAAAAACC GCGGTCTTTA GTC1'TCAAAA ACAACATCCG GATTATCCAG ATATTTATGG TAAAArTACC 16200 TGGGTATGAT 16260 AACGGCTACT 16320 TGACACGACA 16380 GAAATCAATT 16440 GGGCACAGAT 16500 CTTGGAGAAT 16560 AAAAGTGCTC 16620 GGATCAAAAA 16680 ATTTTATGAT 16740 GCGCGACTTC 16800 AAAAGAGAAA 16860 ACTGAGAGGT 16920 AGCTATTGAT 16980 CTTTAAAAAT 17040 CAAAATGGCA CTTATCAAAA TGTGAGAACA GGTGAAGATG GTAAGTTGAC 488 CTGTCAGATG GGAAA'rATCG ATTATTTGAA AATTCTGAAC CAGCTGGTTA TAAACCCGT'r CAAAATAAGC CTATCGTTGC CTTCCAAATA GTAAATGGAG AAA2TCAGAGA ATCGTTCCAC AAGATATACC AGCGGGTTAC GAGTTTACGA ATGATAAGCA AATGAACCTA TTCCTCCAAA GAGAGAATAT CCTCGAACTG GTrGTATCGG 7rrCTATC 'GA TAGGTTGCAT GATGATGGGA GGAC?1CI'AT TATACACACG TAAAGTGTAC AAATGATAAT ATCTATGTTC TGAACGATAC TTTAAGAAG
TGTGACTTCA
CTATATTACC
AATGTTGCCA
GAAACATCCG
TAGCACTCAA
ATCATTGAAA
TGTAAT1'CAT
AATCAATCAA
T'rCAGCTGC GAAGAGATTT AAGI'TACTT GGGGACATGT TTTCGAAAAC TACATTGCTC ACAGTTGATT CAAATNTTA ACAATGCTTC AACAGTTTTT GCGCTGGGA AGATGGGGAT ATGGATAAAA AGTGGGTGTT CTACCTGCAA 'rACTAATAAT GAAATTATTG AACATTTAAA CTCTCAGGGG ACGAGCTAAA 'N'TAACACGG CAGT'rTATCA ACT'rATGTCG AATTGAAATT GAATTACCAT AGAAGCAAAG CCAAAAATTG TGTAGATAAA GATACACCTG TACAAAAATT CCAGCACTTG AGGTNTGGCA TTCAACAAAG AGGTGA'rrAT GCTCTAACAG TTTAGCTAAA GTGAATGACC ATTGAATGAC AAAGCAATTG TAATAATCCA GATCACGGGA GACATTGACC AAGACATGC AACGTTCGAT TTGCTTAATG CGACAACAAC ATCTGTTACC GTTCATAAAC TAT'rGGCAAC TTGCAAATGA GTTAGAAACA GGTAACTATG CTGGTAATAA ATGCAAAAGA AATTGCCGG;T GTTATGTTCG 7"ITGGACAAA ATGAAAATGG CCAAACTCTA GGAGTGAATA TTGATCCACA CAATGCCGGC AACTGCAATG AAAAAATTAA CAGAAGCTGA CAAATTTACC AGCTGCTAAG TA'rAAAATTT ATGAAATTCA GTGAAGATGG AGCA.ACCTTA ACAG CTA AAGCAGTTCC TGA6ACGATGT TGTGGATGCG CATGTGTATC CAAAAAATAC ATAAAGATTT CAAAGGTAAA GCAAATCCAG ATACACCACO TGAACCACCA AGTTGGAGAT GTTGTAGAGT ACGAAATTGT CTA6ATTATGC AACAGCAAAC TGGAGCGATA GAATGACTGA GTACAGTGAA AGTAACTGTT GATGATGTTG CACTTGAAGC AAGTAGCAAC TGGT'rMGAT TTGAAATTAA CAGATGCTGG AAAACGCTGA AAAAAC'rGTG AAAATCACTT ATTCGGCAAC TAGAAGTACC AGAATCTAAT GATGTAACAT TTAACTATGG ATACTCCAAA GCCGAATAAG CCAAATGAAA ACGGCGAT7hT TTGATGCTAC AGGTGCACCA ATTCCGGCTC GAGCTGAAC CTCAGACTGG TAAAGTTGTA CAAACTGTAA CTwrMACAAC GCTGAAACCT G'N'TTATTCG TAAGTAAACT 'rTGCACAGAA AAAGGArrAT TAT'TGTCATG TTAAGAGATA TGAATAAGGA GAAATCATGA CTGCCTTATT ACTGACAGCG AGTAGCCTGT 17100 17160 17220 17280 17340 17400 17460 17520 17580 17640 17700 17760 17820 17880 17940 18000 18060 18120 18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 AGACAAAAAT ACAGTTACTG TTAACGGATT GGATAAAAAT ACAGAATATA AATTCGTTGA ACGTAGTATA AAAGGGTATT CAGCAGATTA TCAAGAAATC ACTACAGCTG GAGAAATTGC 489 TGTCAAGAAC TGGAPAAGACG AAAATCCAAA ACCACTTGAT CCAACAGACC CAAAAGTTGT TACATATGGT AAAAAGTTTG ATTrTGTAATr GCAAATCG GAGTCAAGAA GAGAAGCACT TGCTTATAAC GCTCTTACTG AGCTCAAGCT GCTTATAA'rG AGATAAGGAC AATGAAAATG TACAGGCC?1' CTTGCAGGTA ATTACTAACT AGCCGTCAGA 
AGGCATTGAG TATACrGCI'G AA'rCACTATC CCACAAACGG GATTATCGGT ATTGCAGTGT TTAAGTAAGA GAGAAAGGAG TCAAAGTTAA TGATAAAGAT AATCGT~rAG CTGGGGCAGA ATAATG3CTGG TCAATATTA GCACGTAAAG CAGATAAAGT TGGTTGTTAC AACAAAGGAT G-CTrTAGATA GAGCAGTTGC CACAACAACA AACTCAGCAA GAAAAAGAGA AAGTTGACAA CTGCTGTGAT TGCTGCCAAC AATGCATTTG AATGGGTGGC TTGTGAAATT AGTTTCGAT GCACAAGGC GCTTTGAAAT CATATTACTT AGAAGAAACA AAACAGCCTG CTGCTTATGC AATTrTGAAGT CACTGCAACT TCTTATTCAG CGACTCGACA GTTCAGGTAA AGATGACGCT ACAAAAGTAG TCAACAAAAA GTGGTATTGG TACAATTATC 7TTCCTGTAG CGGGGGCTGC ACGCATATGT TAAAAACAAC AAAGATGAGG ATCAACTTGC CCATTGATGA CAATGCAGAA AATGCAGAAA ATGATTAGTC 3 GTA'rCTTCTT TGTTATGGCT CTGTG?~r= CGCAAGAAGA TCACACGTTG GTCT'rGCAAT TGCCATCTCG TGATGGTCAT CGGTTGICAAG ATGATCGGGT GCAAATTGTA AGAGACTTGC CTCTrGTATG GGGTGCACAT GCAGTCCAAG TGGAGAACTA TCAGGAGGTG GTTAGTCAAT TATGGAAGTT GGATGA'rTCG TAT'rCCTATG ATTCGTGGGA TGAGAATAAA CTTTCTTCTT 18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19500 19560 19620 19680 19740 19800 19860 19920 19980 20040 20100 20160 20220 20280 20340 20400 20460 20520 20580 TCAAAAAGAC TTCGTTTGAG ATGACCTTCC TTGAGAATCA GATTGAAGTA CAAATGGTCT TTACTATGTT CGCTCTATTA TCCAGACGGA TGCGGT'rrCT AATTTTTT TGAAATGACA GA'rCAAACGG TAGAGCCTTT GGTCATTC;TA CAGATACAAD GACAACAAAG G;TGAAGCTGA TAAAGGTGGA TCAAGACCAC AGGGTGTCGG CTTTAAATTG GTATCAGTAG CAAGAGATGT TTCTGAAAAA TGATTGGAGA ATACCGTTAC AGTTCT'rCTG GTCAAGTAGG GAGAACTCTC
TCTCATATTC
TATCCAGCTG
GCGAAAAAAA
AATCGCTTGG
CAGGTTCCC'T
TATACTCATA
AAAATGGAGA GATTrTTCTG ACAAATCTTC CTCTTGGGAA CTATCGTNTC AAGGAGGTCG AGCCACTGGC ACGCTATGCT AGCTGGTGAC GATTACGGTT AGGTGGATGG 'rCGGACCAAT AAAGCCGACA CTATACTCCT AAGATGGTCG TTTCCGAGTG GTTACGACGC 'rGGATACGGA TGTCCAGCTG GTAGATCATC GTCAATCAGA AA'rTACCACG TGGCAATGTT GACTTTATGA ACCTCTCTTC AAGGGGCAAT GTTCAAAGTC ATGAAAGAAG GTTCTTCAAA ATGGTh.AGGA AGTAG'rTGTA ACATCAGGGA GAAGGTCTAG AGTATGGGAC ATACTATTTA 'rGGGAGCTCC 490 AAGCTCCAAC TGGTTATGTT CTCGTAAGGA ACTGGTAACA ATACAGGGGA AGAAACCCTT GGTTATTGTC TTACGAAAAA ATAGCAGGCT GAAGGGAAGA ATGTGGCATG AATCATCAAT CAA 'TAACAT CGCCTGTTTC CTTTACAATC GTGGTTAAAA ATAACAAGCG ACCACGGATT GTATATCrI'G ATGC?1'GTTG CCATTTTGTT ACCAAATAAC TGATATTCAA TGTACATCAT CCAGAGTACT CTGAGGTGAT GTTAATCAGG AACGGATATG AGGCTGGGCA GAT7GTGCCA
GGGAAAGATA
GATGTGCCAG
GTTTGGTAGT
TATGAATAGG
AATCATGGTG
GCCTCATTGT
20640 20700 20760 20820 20880 20940 20986 GGGTrATrGr TI'GTAAAACG ATAGGACTGG TCTGGTAATC ATTTTA
INFORMATION FOR SEQ ID NO:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21040 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
CCCAGCAAAA AGCCATCCGA AGATGACTTT TTTGCTATrT AATrTCTGTA CCAAGCCACG CTTAACAGCT GGACGATrGG CAANTNTTC TGCCCATTT GATAACTTGA GGCATCCAAG AATTTTGCAG AACCTTGGTA AAGATTTCCT GTCCATACCA AGACCAGATA GCA.ATATCTG CAATCGTATA GTCATTGCCT
TAAGTTACTT
ACTAGATN'TT
TGAACTAACT
GCAATATAAG
CTTTCTGAGC CAATTCCTTA TCCAATAAAT CCAACTGGCG TTTCACTTCC ATCGTAAAAC GGTTAATAGG ATAT'TCCAAT TTTTCAGGAC CATAATTGAA GAA.ATGTCCA AATCCCCCAC CTAGAAAAGG TGCTGCACCT GCTTGCCAGA ATAGCCAATr CAAAACTTCT ACCTTTTCCA CAGGATTACT TGGTAAAAAG GCTCCAAATT TCTCAGCAAG GTAAAGAAGA ATATGAGCAG ACTCAAAGAC TCTrACGTTT TCAGTACCTG ACTGGTCCAA TAAGGCTGGA ATCTTGGAAT TTGGATTGAG CTTC-ACAAAG TCTGATCCGA ATTGATCCCC ATCCATGATA GCAATCTTAT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 ACAAGTCGTA AGCCGCTTCC TTAAAACCAG CCTTCACACC ATI'TGGTGTT CCCAGTGAAT AGTTTTGTTC GAAACGGGCA CCTGCTGTTG TACTACTC ATCCTGCCAT ACGGTCGGTA AAATCGCATT CTTGTCAAAA CCGAGTTTGC CTT CTAGTAA TTCTTCCAAT AAGATAGTAA AAAGCTGAAA AGCTTGTTCT CCTTTTGGCA GTCTGTTTAG CCCCGTAAAA GCTCCTTGAT ATTGATATGC TGACATCCGA AACCTCCCTT GTTGAATAAA CTTAACGATT TCGACGATGA TAATCATTGA GAAGCTTCCA GCCATAACAA TTCCCCATTG TGACAAGTCT AGTTTGGTTA CGTGGAAGAT TCCTTCAAGC GGTTCTACAA CGATTGTTPGC CATGAGAAGG ATAAAGGATA a a a a a a a. a.
a a a a a a a a a a 491 CCAAGATGGA CCAGTTAAAG GTCTTAGACT TGAATGGGCC AAC1'GTCA.AG ATGGA7-PGGT AGACAGACTT GACATTGTrAG GCATGGAAGA GCTGAATCAA ACCAAGGGTT GCAAAGGCCA 'rCGTrAGGGC A'rCTGCATGA ATAGCATGAT TGTCACCCAC ATGAACrGGG TAAGCAATCG CAAGGCCATA AACACTCATA ACAAGAGCTG CTTGGAG 'AC ACCTrGATAA ATGATAGAAC TCAAAACACC ACCTGAGAAG AAGCTTCCCT TGCGTCCACG TGGT-rrATGA TTCATGACAC CAGGTTCCGC AGGTTCAACA CCAAGAGCGA TAGCTGGGALA GGTATCCGTr ACCAAGTTGA TCCACAAAAG ATGAACCGGC 'rGTAAGACAT CCCAACCAAA CAAGGT=GAT AGGA.AGATGG TTAATACTTC AGCAGTA'NTA GCAGAAAGTA GTACrGAAT AGTCTT'NGA AT'rTAGA AGACCTTACG TCCTrCTTCC ACTGCGACGA TAATAGTCGC AAAGTTATCA TCTGCAAGAA TCATATCAGA AGCCCCCTTA GAAACCTCTG TACCAGTGA'r TCCCATACCG ATACCGATAT CGGCTGTTTT' CAGAGCTGGC GCGTCATTGA CACCGTrCACC TGTCATGGCA ACCACTN'AC CTGTTTr-TG CCAAGCCTTG ACGATACGAA CCTTGTGTrC TGGAGACACA CGGGCATAAA CAGAGTATTG ACCAACCACT rTrCAAATT CTTCATCTGA CAGTTC-AT'rG AGTTCAGCAC CAGTTAAAAC GTGACCTTCT GTATCGrTTG CGTCAATGAT TCCCAAACCT TTGGCAATGG CTrCCGCTGT GTCTTGGTGG TCACCTGTA-A TCATA.ATTGG ACGGATI'CCC GCT'rCCTTAG CCACACGAAC AGCCTCAGCG GCTTCAGGAC GTTCAGGGTC ALATCATCCCA ATCAAACCAG TAAAAATTAA A'rCA'TrCA AGCTCTTCAG AAGTGAGATT TTCTGGAATA CTATCGATAA TCTTATAAGC ACCTGCAAGG ACACGCAAGG CTTGATGAGC CATTTCAGAA TTGTTTGTAC GAATGAGATT 'rGTAACCTTC TCATCAATCG GAGCAATATC CCCAGCCTTA TCACGAAGAA GACAACGTTT TAAGAGTTGG TCTGGCGCAC CCrTGACTGC TACAAGGAAA CGACCATCTG GCAATGGGTG AACrGTTGAC ATGAGCTrAC GGTCAGAGTC AAATGGCAAT TCAGCTACAC GAGGATATTT CTCTAAGAAA CCTTTGACAT CATAGCCCTT GTCCAAGGCA TATTGGATAA AGGCTGT'rTC GGTGGGrCA CCAATCAAGT TACTc~rcCAC ATCGATT=C GTATCATTGG CCAAGACAAC TGAACGAAGT AGTGGCA'rT CAAGACCTAG TTCAATATCA TCAGCTGAGT CATGTAGAAC CGCATCGTAG AAGACTT=r CGACTGTCAT CT'rGTTCATA GTCAGCGTAC CAGTCTTATC ACAAGCCATG ATTTCAGTTG AACCAAGTGT TTCAACTGCT GGCAACTTAC GAACGATGGA ATGTCGTI-rG GCCAAAACTT GAGTACCAAG AGAAAGAACG ATGGTAACGA TAGCAGGAAG TCCTrCTGGA ATGGCTGCAA CGGCAAGGGC AACAGAAGrC AACAACTCAC CAAGTGGATT TTTCCCTrCA ATGAAGACAC CCACTACAAA AGTAACAACG GCAATCACCA 1080 1140 1200 1260 1320 1380 1440 1500 
1.560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 492 AGATAGCATA GGTCAAGACC TTAGAAAGGT TG=rCAAATT ?I'GTITGAG'r GGTGTATCAG TCTCATCCCC ATCTTGAAGC ATACCAGCAA TATGACCAAC TCAGTGTAC ATACCTGTAT TGACAACAAC ACCCATCCCA CGACCATAGG TTACGTI-rGA GTrGGAAG GCCATGTTGA
CACGGTCACC
CAGATTCACC
GGTCCGCTGG
AATACCAGCA TCTGTCGCAA GCTCGACTGA CAAGTCTTT TGTCAAGGCT GCTTCrCAA TTNTAAGAGA GTTGGCI-rCT TACCACGTCA CCTGC TCAA GGGCAACGAT ATCGCCTGGT TAGAGTCAAT CTCTGCCATG TGTCCATCAC GAAGAACGCG GGCAACTGGA ATTTGAGGGC TTCAATAGCT TCTTCAGCTT TTCCTTCTTG GTAAACACCA
TCGACTGGTA
ATCAAACGTA
ACCAATTI'r
CTAGACATCG
AAGGCAGCG;T
CCAGAAGTCA
AATTGCTCGA
CCAAATTCGG
TGATGATAAC CACAGCTAGG ATGATAATGG CGACTGACAA GATT~ctGCC GCAACTAGGA TGAATTTGAC CAAGATTGAT CGTTTCTCGC CAAGGCGCT TTCCGCCTCA CTTGATGACA AGACCTCTTC AGGGCTCTGA GTATAAAACG TCCTCCTTGA CATTG'rGTGC AAAACAGACT AAAAAGAAAC CTGTTAATCA TAACAAGTCT TTTTCAGCAT AAAATTCGGA ATGACGACAC AGTAGTACCA TTATACCAAA TTTTGGGGAG AACCTTGCTC GGTCGCATCC ACArCCTGCA CT'rGGCGT'rT TTGTTCrTTT GACATGTGTC CTCTCTCI' CATAGC=rT CACGACAAAC CGCTGTTTAA GATAGGGCCG GAAAGCATAC TATCACAGGT TTCTGCCAGC TACTCCCT'rG T~rrCAAAGA GTAAAAACTG CCTTATTTGA CATCTGCGAT ATCrTCCCCA TGATAATCAT CAAATCCTTA CTTCTTCGAG TTC-ATTGTGC 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 A7'=TCCTT
GAAGCAATCT
CTATTTGATC
GGTGGCCGAT
GATTCGGATG
GAkAAACCAGT ATAATGGTAG ATCTCA.AATC TCAAGT'rAC 'rrGTCTGCAG CCAAA'rTAC GGTTTTAATA ACGTATCGGA GCGCGCCACC TGCAGACCGT
AATGCTATGT
TGAGCGTGGA
AGCTGGTCAT
CATCATTGGA
GACCACCGTT
GACTAGAAAG
GCCATTATCA
CTCCtTCATT
AATGTGGCCC
TTGGTCATTG
TCG'NTCGA
GAAGT'rGAAT
GTATTTCGAC
CATCCAGTTT
TCTTAATCGG
GAAGAT rGAA TGTTCTAAGA GATTTGGCAA GCTTGATCAC TTCTATCATC ATGTTCTATG GATACCATTC AAAAGATTCT CAGTCGGGAA GAAACGGTCA TTGATCCTCT TGG'rGCAACT CTAGGAA'rCA 'rTTCTGCAGC GArTATGTTT GTGG'rCTATC TCTACAATAC TCGCCTCACT AAGAAATCCA ACTCCAATGC GCTGAAGGCA GCTGCTAAGG ACAATC?=C TGACGCTGTT ACCTCACTTG GAACCCCAT TGCt!ATCCTA GCTAGTACTT TCAATTATCC GATTGTGGAT AAACTCGTTG CTATCATCAT CACTTTCTTT ATCrrGAAGA CTGCCTATGA TATCTTCATC GAGCTTrCCT TTAGTCTTTC AGATGGCTr'r GACGACCCCC TGCTCGAGGA CTACCAAAAG GCTATCATGG AAATTCCCAA AATCAGCAAG GTCAAATCGC AAAGAGGTCG CACCTACGGT ArCAACATCT ACCTGGATAT TACACTAGAG ATGAATCCTG AC?1'GTCTGT TrTGAAAGC CATGAAATCG CGGATCAGvGT CGAGTCTATG CTGGAGGAGC GrrTGGCGT C??TGATACC GATGTCCATA TCGAACCAGC ACCrATCCCT AAATTGCTTA TGCGTGAACA ATTGATTGAC GATGATTNTG TCrATATTCG CCAAGATGGA AAAAAAGAGT TAAATTCTGC TATCAAGGAC AAACTCATCT GCTATGAGTT AGATGGTATC ACCTGGCAAA ATATCTTTCA TCAAGAAACC CGGGATTTT CTA7rC'rTTT ATACTCAATA ACAGGCTGTA CNTGAGTCGG CAATGTGAAG AGTCTTAACT ATCAAATTCA CTGAGATACT GAGGATGAAA rTTTAGACAA TGTCTATAAA
CAAGGAAACC
GAGCAGATGG
AITCAAArrA
ATCCATACCA
AAAAAAGAAT
AAAATCAAAG
CCGACATAGT
CATAGCGTTC
AACTAGAAGA ACTCTTGACT ATAA.AGAGGC TTATAAGACC CN'CCATCAG 'rCAAAAAACC GTATCTGGCG TCGCCACGAA AGAGAAATCC TTTCATGAGA TGCAAATTAG GAAGCCGGTC TTGCACTTTG ATTTTCGAAT GTATTTTTCA AGGAGTGCTT CATTTTCTC ATCCAATTCT '=TGGAGAG TAGCCAGCT'r ACCAAAGTCA GAGCCG'rTAG
CCTGCATTTC
TACTTGCCCA
CTCTTCAATA GCAGCGATAC GTTTTT-CCAA GGTTTCAATA TCACCTTCAA CTCCTGCTr'r TC7rGGTAGG TCATGCGTTT CTTGTCTI'CT CGAACCTTGA CCAC7"7"TC CNrTCGGCC ?TrGCACTT GATTGGCCAT ATCTGTTTCA AAAGCTTTT CATCAAGATA GTCGGTGTAA TGACCAAAGA AAGGACGAAT CTTGCCATCC TCAAAAGCGA 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 GAATCTTG3GT
CTGCAAAACC
TTGGC'TCGTC
GT'rr'rCTC CGCTACCTTA TCCAAGAAAT AGCGGTCGTG ACTGACTGTT AAAACGGGAC I'rGCAAGAAA TTCTCTAAGA CTGTCAAAGT TGCAA'rATCT AGGTCATTGG TAAAAGAAGA ACATTTGGTT TTTCCAAAAG CAGT'rTGAGG AGATAAAGAC ACCCCCTGAC AATTTCTCAA TCAAAGTCCC ATGCGTCGAA CGTGGGAAGA GGAATTGCTC CAGCAACTCA GCGATGGAAG CTGCCACTC CTGCAGGTAA TTGATCACAC 'rCGTAGAACC ACCACTGGTC TTGACCTCCT GCTTTC ATCCAAACCC TCAA~trGTT GAGAGAAATA GGCGATG.CGA ACAGNTTCCC CAATCACAAC TTGTCCTGCT G;TCGGCTCAA GACTTCCTGC AATCAGGrrA AGTAGGGTTG ATTTTCCAAC ACCATTGTCC CCAACAATTC CAATACGGTC TTTAGCCCTGA ACTAAGAGAT TAAAATTTTG CAAAATGGGC TTA7MrCAT AGGCAAAGGA AACATCCTGA AACTCGATGA CTT'rCTTCCC AATCCG.ACTG GTTTCAAAGT TCATAGTCAA GTCTGTCTCA GCACTACTGC CTGAAACTTC CTrTTCAGA TCATGGAAAC GATTGATACG AGC'TTGT'rGC TTGGTCGCAC GCGCCTGCGG TTGTCTGCGC ATCCAGGCCA ATTCTTGTT GTAGAGTTGT TCTTrTGT GAAGAAGAGC CGCGTCGCGC TCATCCTrr 494 CCGCCTTAG GCGAACA'rAG TCCTGGTAAT TTCCCTGGTA CTCGGTCAAG CCTGCACGAT CCAACTCGAA AATCCGI'GTr GACAAAGCGT CTAACAAATA ACaATCGTGA GTG.ATAAAAA GGACGCTC??r CrrAGAATTT TTCAAAAAGA GTCAGCCA CTCAATAATC GCAATATCCA GATGGTTGGT CGGCTCATCC AAAAGCAAGA GGTCGTGGTT GCCAACTAAG ACYCGCCA ACTGTACCCG 7427rCTCAGA CCACCTGACA ATTCCCCAAC AGGAGTAGAT AAGTCTTGAA TGCCCAATTT GCTAAGAACG GTC7*rGACCT GACTTTCGAT TTCCCAAGCT TGGAGAGAGT CCATCTCTGC CATGACACGT 'rCCAAACGCG CCTGCTTGTC CTCACTATAG TCGAGCATAA 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 TCAATTCATA CTCACGAATG AGCTGGATT'T CCTTGAGTTC AAACTGTrCTT TCTATCATCA AAATCAGGAT CCTGAGTCAA TTTTAGCTGA AAAAGGACTG ACATCCCCAT CAAATCCAGA AAAGGGTGGI' CTTGCCAG'rC CCA'rTGACAC CGATTAAACC TAA'rAAAGGA AATATCCCTA AAAACGGTCT TGTCACCAAC CGATAAAATC ACTCATTTT TCTCCCTCAG GTAAGCATGC CAATTCTCCA TCGACAATGG CAAACTCAAT CTCTGTTAAA ACTAGATAGA ACCGTATCCA GTAACCAATC TGGTAATCAT AACACCAGAA AGGACGTCCA
AATTCTGTCT
GGATTTACTT
ATGGCI'TCAC
ATCTCTCCCA
TGGCTGATAG CCATATTCCT TGATCAAAAT ATGGATAGTC AAGCTTTGGT ATTTTCTGT AGCTrGACGA AGATTTTCAG CCTGTAAAAG CT'rGCTCAAT TCTCCATTTT CACGCAGAGC GGCAAACTGG CGTGAGGTCT TCCAAGATTT AGCCCATAGT AAAGCCGCCC AGGCTTGTTC ATCAAACAGT CTGTTGAGCT TGTCCTGGCT TTGACTCTCA ATCATGGAAG CCAAGCCCCT AAACTCGACG AAGGTACGCT CTACAGAAA'r AGCTTTAAAT GTTTCTGGCT CAAGTGCAAA CATA6ATCCGT AAAGCATCrT CGTTGAAACG TTGCTTTTCC AAATCTTCTA AACCATGGAA ACCGCCATTA ATCTGA6ATCT GATGGCTTGT GGGTTGACT CAAATCTArG TCAAAGCGAT CAAAATAATC AGCAAATCCT CAAAAATGAC TGCGCATTTT AAGTCA'GGA 6960 AGTrTTCAA 7020 GATTATTCTC 7080 AGTCTGG.GCC 7140 CTTTCTrGTC 7200 CTTTTCCTTG 7260 AACAATCTCG 7320 GAACT'MTT'' 7380 CA.ATC'rCCAA 7440 AGAGGATTCA AAAGTAAAA'r CAGTCTCCAA AGATGCCATA TCAGGGAGAT AGTCATAAGC TCTCCAAAAT GGAGCCAGCA AGAGTTTATC TTTCTCCAAA AGCGGCGTCA AGGTCTTCAT ACCAAGACTA GCCTGAAAAC GGAAACCACG CTCACTAGCC ACTCCAACTG CTCGCAAGAC CAAGTCAACG ATTTCTCCTG TCTCATCCAA 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 *5 S
GGCAAAGGCG 'rrGACTGTGA AATCACGGCG TTTGAGGTCT TCTTCTAGCG ATCGTACAAA GGAAACCGCA CTGGGTCTGC GATAGTCCAC ATAGACATCC TCTGTCCGAA AGGTTGTTAC CTCATACTCC TCATCCCCAT CTAAGACCAA GACGGTTCCA TGCTCGATTC CGATATCGGC TGTTCGCGGA AAAATCTGCT TGGTCTCTTC TGGATAAGAA GACGTCGCAA TATCCACATC
GTGGATAGGG
GCCTGCTTCT
CGTTAATCTC
TTAATrCCA
GTCAACCCTT
CGAACTGCAGT
CTATGGAGAA
TTATTTT
ATAATAACTG
AATTGACTCC
CTCCCTGArr
GGATGCGCAT
GGGCATCTCG
CTAATACTG
TTCTAATCCA
TGTCATCAAG
GCCAAAGATG
ACCATCAAAG
CTGAATTGAC
ACCATCCTT
AGCCTGCGTC
GCCACCCAAG
495 AACAGAGCCC CCAACAAAAT AAGCC1TCAAA TAAAGCCTC TGA.AATTC-AG AAGGCATTTG 'rAGACAAGCT CATGACGC'rT GACAACTTCT GAGATGCGAT CATAGGAGTC ATGACGGAGG ACTTCCTCAT GAGCTACCA.A GCCTCGCAAA TCCTCATCTG CTGCACCTTG TTrAATrGGCTG 1-rCCACTCGG ACAT'rTGGGA AATATTTGGC GCAAAGT'rAG GGGCAATCAG
TCAGCACCAC
TCTCGAACCT
TTCTTGTCA'r
CCAAATTCCA
GAGCACCAGC AATCAGCTCT C'TGCCATCAA CTCAGCTGTT GATGGAGCTC AATAATCTCC TGAGTAAGAC AGCACCCAAG
S. S. S 0 GCAATTTCTT CACrCGTGAA ACCAGTCGTT GCAAALACGTG TATTTTCGTA GGCAACAGCT TCAA.AACCAG CTAAATCAGC CTTATCCTTG GACTCAAAAG GATCCAAAAC TCCZACCAAG CAAGCAGCCT GGCCCATCTT TCCCTTAAAA AcTCCTrGTCT AACATACAAA GTCCGTAAGA GAACGTC'rA CT'rCTTGGAA GALACTATCTT ATCAAGATAC AAAGGACCTT AGCTGCCTCT GAAAGGCAAG ACAGGCATCT TTTTTCAAGA ACAAGGCATC TTAACTAGCC TAGAAGCGCC ATACTTCCTG CTCCTAGGTG CGTTCCAATG GAACCCAAGC CAAAATCAAG CAAGTGcTGA CCATGAATGA CAATGACCCG GTArTGACCT TCTTGGGCAC GAGAAAATTC TTT'AGCTCr CCAACTACTG GAGCAAAGCC ArTTCAAGA GGAGTAGTAA AATCTACCCA GACATCCGCT AAAACAGGAA TACCCTGCCA 'IrCTCACTCA TCCA.AGTCTG GATCAGTCAA TACCATCTGA CCGGCAATAA TT-ACTCGAAT ACACAAAGTG AAAATAGGAA
ACTCATCTCT
TTCCAATCAA
8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 TTTCACACAG GGTTCCAGGC GTGTTCAATr GAAAAATAGG GAATGGCACT GACTTTCCAC GGCAGGTAGT CCGTGTTCAA TTTCTAAGAT AACTAAATCA CTGGAATATA ACCCAGAGCA ACACTACCAA ATGTAGCAAG TGAAACATCC CGCAATTCTT CAGCCTTTTC AGGAGCATTC GAAGCCGTG TTTCCT'rGAT AATTTCAATT of** *040 0S AAGCCTT'GG TGGCCTTCrT TCAGTACGA ACM=rTCGT AAACTCAAT CACACCT'rGA rCGTTAAAAT AAAGGA'rTGG C1-rAATGCTA AGCAAATGC CCAAAATGGC AGCCCCATT GAAAGGCGTC CACCT?r'AC CAAATGATCC AAGTCATCTA CCATGATAAA CGCTGACGTA CGGCTGATTT GAATGGCTAG CTTATCCTGA ATGCTGGCAA AATCATCGCC CTGATCACC CAATrAAAGA CGCIrCAAC CATGATGCCT AGGGGAGCAC TTGTAATCAA AGTGTCTCGGG AAAGCAATGG TTAACCCCTC ATAGTCATCG ACCATATACT GGATATrTTG GTAAA.AACCT ]Page(s)., were not lodged with this application 497 GGCACTTGCT CCTTCAGTT CAACCAAGTA Ar.CTACTGCT GCT'rGCrAA GGACACCAAG TTTGTATACG TGAATCCCTA CTGAAAgGAG ACCTGCCACC AAGGCCGATT CCAACATTTC CCCTGAAATA CCTTGTCrAC GTCCTACAAA GACTTTCGGC GCTTCCGTTT CATGTTGACT AAGAACATAG CCTCCAAAAC GTCCTAGTTr AAACGCTAAT TCTGGTTA GTTCTAGGI-r AGCTTC1'CCA CGGACTCCAT CAGTCCCAAA ATATTTACCC ATTGTTATAA AATCCTTTC TATTrTAAT TCGTTr'GA ACTAGTN'G-= ?CCG=ACC AAGATCGTCTC CTTGTACTTG AATTTGATGI' GCTTGAACTT ATTGTATCAA ACGGAGTGAT AACTGCCGGT AAACGTACTG AACCACT1GTA AT'rACCTGT? ATCTTATCA.A TTCATCCAA TGTCTCTTGG GCTCCTACTG GT'rrGTAGT CGATGAbACTG
CACCTTCATT
AAGACAACAC
ATACCTTCGC
TCACTCGTAA
CATTGCGGTC GATTGCCTGC TAGTTGGCAA AACAGCGATA TAGACACTTC TTTA'rCGAC ACCATGACAT TTTCAATTTG TACCCGACTA TCAA'N'TGAC TAGGGTCAAT ATCT'rTACCT TATCCTTCTG AGCCTTCTTA CCAATCTTGA CTCGTAATTTT
CTCTGGTACA
TTGCGGAGTC
AATCGTTCCA AACGAACL-rC GCCACAGCGG TCAGCCCATT ACACCGGCAT CTGTTAGGTC CTACCTAGCG ATAGCGAT CTAATAAAAT ACTTATCACT GTATAGGTTT CCGT'=TAC GCATAGACAA ATAAGACACA TTCATGTTTC CATCCTCCTA AAGTAAGATT TCACGTAATT TCCATTATAG GTTATCGAAA TCAGACrTCT CATAAACCGA TGTGT~rrTT GTCAAGGGCA CACCGCACCA TCATGTAGGG
GGGTAAATCT
AGCAGTAACC
TGCACCAGTC
ATTATAGCGT
CTGCCTAGCA
AGCAAAAAAG
GCAATCGTTC
TCAATGCTCA
TNGAATTTAC GTGTACTTTC TI'GCATTTCA AAGACCAC1'G ATACTTCTGA AGCAAAACCG ATGTCAATAG GGACATTTGT TACTGTATTA CTGGTACTGT TTTGAAAA'rT CGTCGCCGTA AGTGAGGATA TGATATATAA ACTATTTTTT TTTAAAACTA AGACCCACTT CCTCTT1'TGG 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12 660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 CTGTTTCAAA TTCATCAAGT GTTAGGTTGT TTCCTCCCCT TTCCTCTGA'r ACGACAAAAG TAGCCCCCCG GTGI'CTGGTC CCAAATTCCT GATAGOCAGA CGTCACAGCG ATACGTT= GAGTGTTGGG AATAAAAATG TTAATGAGAA
GCTTAAACCT
TCAAGGCATC
TGGAAATCCC
CTTTGATAAT
GTTCTGCAGA
GTACACGCTG
TAACAAAGGIC
TCGCrCTTCC
CAATAACCCC
TA'rTTGCAAG AATCTTAGCA TCCAAGGGAA 'rTCCTGTCGA AATATACTCC TGCAAGGTAC A ATAGCAACC AAGGCCCCGA TTTTACGAGG ACTCA'rGTAT TCAACAGACr ACGAATCATC TG rCTCAG CACTAATAGG GGCANTGGAA AAGAAA'rCTG CAAACGTTCC AAACCAGTCC GAATCTCTGG AGAGAAGATA ACAACCGCCG ATAAGTAATA ATTTGATTGA TTAACCAAGA AATCGTAGTC AAACCAATCA Page(s).,.4.."712 were not lodged with this application CAACTGGTAC IGCCTGACT CA GCTCACG AA'rGCTCCCA ?TGCC.AAGA AAGCGCCACA GAGATAGGCA CGACCTGC?1' CCTCATCCrGA TAAAATCGCC TCA'rCAATAC CTGTT'1CCAG GCCAAAGAAA rGAGrCTGCC.A AGTGCAAA'rC ACTTAACAAA TCCTGCACCT TTTCATCTGT AAAAACGGTA TAGACGCGAT TCTTGCGAAG TTTGATTTCA TAG3AGATGGA GAAAGGACTC TGTCACAACT GACAAAGTCA AGCCCGAAGT ATTGCTCCGT TGCTrGGTGAC GAATTTCAGA ATA;Ac-GTGA CGGGccAGTT TGGCATTrrc CGAGAGACCG A'rGCTACCAG ACATTrrGAT AATGGCAGAT AATCATGCC TACTGTGAAA CTCATTTTTT CCATCGTGGA AGGCACCGCC ACTTGCTTAC AAAGACCTAC TTGGAATTCA TCTAT'rCCTG GGGCCACCAA GGTGACGATG TCCCCACGTT GGGICATGAT GCCCGCCCAA TTTCCTTAAT
AGCTCAGATG
CACCTGTATA
ArrrCCAGA
GTG=GGCCC
ATGCGCATCA
CGA.AGGAAGT
AGGAT'rTCTT CTTTTACTCC ACTCGTCCAC AATCAAATCT TAGATGAAAT CACGCGCGA AAAATCGTGT TCCACT'rGCA CTAACTATTC ATCAAAACCG AGGCAC=T TCAATATTCA CCAAGACAGT GTCGATAAAA CAAGACTTCC ACGTGGTCGC TATCTGTA.AA ATTGCAGACA TAGCCAATTT CTGCCTTGGT CACGATATTG GGCAAAATAG AGGTAAAGAG GTG1TTCCGTC
TTCCAAAAGA
GGAACCTGGC
CCTAGGACAA TCATGTCACT TTCAAGGATG GTCTGCACTA CTCGACGGCT GGCCAGAGCC GTATCATCCT TTAG4GGCATT GGTCACATAG ACA'rTGTCAA TTATGCCTCC ATGGTCTACA ATATrGACTCT CTCCAGCCAC TTCTGTCCCA TCCrGAAAGA CTGCATGA-AG GGTCA.AAGGA 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 TGGTCACTGG AAGGATAAAT GCA'ITATAGG TTGAACCCTG TGGCCAGCAA AGGCTCCGGC TTAGGCATAT CCGACATGGC TGCATA~rrT TTCGGAGTC GCc;ATTrCCA CATCTI1T1TIC CCAATCACCG TTATCTTTGG TGTCGCGATG CCCTTCATTA TT'rCCCTGI-r
CATTTCTGAC
ATCCTCAGAG
CACAAGGACA
GTATGGAAAA
AAGCCACCAA
AACCGA'rACT
TTACGAAGAT
ATTTGCTCAA TAACTGCATG TGATGAGATT TCCCAATGGA GAA.AGACC'rT CTCATAAAAC CACCTGGCGG TGTCAACTGT ACCTGAAGAA CCACCATCAT CTGCCACCGT CACGA'rAGCr CCGCAGACTI' TTAGAATGA CGGGACTTCC AGTCCCTCCA T7"rTCTCATG AACGG~rrAC CGTTTCCTTT ACAGACCAAT TCTTCGATAA GTCCTGCGCC CAAATCCCAC ACTACGGTGT TGTCCACCCG TACATCCCAT GGCAAT C TACCTTCCTT ?TGGAACTT GGCAGAATCG CTCAATCAA GGCCALATAAA
CTGCGGTCTT
AAGCGTTTAG
AAAACGGACT
TGTTGATAAA
AGTC"rCTGA CTCAGGATGG TTCATGACAT AATCATAAAC AGGTTCA'rCC ArZACCCGTTT GGrTrCTCAG 7rCGGTAAA TAATAGGGAT 7rGGCAAGAA ACGGACATCA AAGACCAAGT bo were not lodged with this application 501 CAACATCATC ATAAGAAAGC AAGGrTTCGGT ATTGI-rCAC CAAGGCATTT CTTCGCrC~r TCAAGATGCG AACCAAGTCA TCAACGGTCA ATTGCTCAAG AGCCGCAAAA ACAGGCAAGC GTCCAATCAA CTCAGGGATA ATACCAAATT ?~rGAATGTC TTCAGCGA'rG ATTTCTTGCA TGTATGAGCr GTTTTCGTCA A'rCGCCTTAT TATTTTGACC AAA1'CCGATG ACTT'rTCAC CCAGACGTTG T?1'GACAArr TCTrCAATAC CATCAAAAGC ACCACCCACG ATGAAGAGGA TAT'rTTTGT ATCCACTTGA ATCATCTC~r GTTGTGGATG TTTGCGTCCA CCTTGAGCCG GTACGCrAGC AACAGTrCCC TCAATAATCT TGAGAAGGGC ?TGTrGCACC CCTTCACCAG AAACATCACG TGTGATAGAC ACTC~rAC TCTTCTTGGC AATCTTCTCA ATTTCATCCA 18780 18840 18900 18960 19020 19080 19140 19200 CATAGATAAT GCCACGCTCT GCAGGATA'rT TTCCACATCC CAATAGCAAA AGGTACATTC AACCACG GCCAATCATC CGCGTGTATC GTGGAAAT'TG GCTTGGCACG ATCTTGACCA TrGGCACCTC AGACAAGTCT GAGCTAACTC CACGCATTCA T=Crc-G GrT'rTTGCCA TAGACATCAT TTCCTTCCAT GCATGAATAC TATTGACCAG TAACGCT'TT TACGAAAAGC
AAAATGTTTG
ATGCGT?1'01 A'rrACATAGT
GCCAAGACTT
TTACAAATAA
CAAAATGAGC
TCTATACTGT
ACI'CTGCAA
AGTGGTTATA
GGTTrCAAGAT
CCTCAACCAA
AAGCATTGTT
GCACGTTCGA TTTAAACTC TCACCCACAT AACCAGCCTC AAGCTC?1'AG CCAAGGTCTG ACCAACCTGC AAGAGTTTCrA 19260 CGTCAGAGCT GTCGCATCCG 19320 GGCAAGGAAA GTTTFcccTG 19380 ATCCACATCT TCTGACTCr-r 19440 AACCGCCACT GCCAAGGCAC 19500 ATGGAGGAGT TCAA'rTGGTT 19560 TTCT'rCTCGA ATGATT'TCCT 19620 GCCAGCAA'!r ATT'r'r'rTA 19680 CATA'rCA'rTT 'rTrCTATTTG 19740 AAAATA6ACGT CATGTAAAAA 19800 AAAGGAGGAT AGAAAGCCCG 19860 AGATGAAACA CAGAAAAGCC 19920
AATAAACCAT
CATTCTATCT
A'rTCGTAAAG CCATTTAACC TTGTGCTCCT GCCAGAAAGC GTGAATAGAC CAAATAAACT CCGTTCCATT AGACTCCTI' TAAAATCATA AGGATrTr.C TCATCTrTTG CG;TAAAAT GAGACAAG'rC AAGTTCrrCA GGGAAATAGG TATCI'CC TCT=~GCGG TATTGGATCG GCTTGAAAC'r GTCTCAAAAA CACCCGAGCA TGAATIGTGAG TCACAATCAC TTCATCAAGG TAAGGTTrCAA AAGCCTGAAA AATTTGCTTC CCACCGATAA TGTAGACArr CrrrCTTGA GCCTGATACC AGTCAACAAC AGACTGGACG TCCTGAAAAG TAGCAACCCC ATCTAWCTTT TCTTCCCCAT TACGCCTCAA AATCAAGG? TCCCGTTTTG GAAGCA6AGCG ACCCCCCATC CCATCAAACG TCACACGCCC CATCAAGATA GCATGAT'rCA GAGTTG?1-C TTTAAAGTGC TGCAA'rTCTG CTGGCAAATG CCAAGCGCAGA CGATTTTCCT TACCAATCAC ACCCTCTTCA TCCTGGGCCC AAATAGCTAC GATIMTTA GTCATGCTTC 19980 20040 20100 20160 20220 20280 20340 20400 20460 E02 Page(s). were not lodged with this application 503 CAAGTGGTGA GA1TA'CTTT ACI'CCTGAAG AATTGGGCCA GCAGGTTTCT TATGTATCTG ATGA'rGCCTr TGACTTAAAT r'rAGATAAA.A TAT TTGACGA A'rACGACGAT GTrrCAAAG CTTTGGT GGA AAAATGACAA *TCTATITrGAC AGAAAAGCAA AT'TGAAAAAA TAAATGCTTT
AGCAATTCAA
TATGAT'rGTG
TGATAAAGCA
TAAAAGAACT
TGTAACGGTA
TGAAAAAATG
AAAGTAACCT
AGAATTCTGT
AATTATGGAA
ACAATTACTA
GAAGTATTCG
TAGCTCAAAA
AGAGTCTG1'G TGAAAAGGA'r TGCTTG4GCTT
AAGACTGAGC
CGGTATTCTC
AACT7rACCAG ACGA'rAClwr
GCTTTCTTCG
GAAGAAGCAG
ACAAGCTACT
AGTATGCTGG
AATGCAGAAA
TCAAAAGTTA
GCGTTGAGAG
AATAATAACC
ACAAT'rCGTT
ACACAACTCA
TGTAGTAAGG
GAGCAAGAGG
AAAA'rGTGAA CAAATGAGAA AA'rTCAAACA G'rTAGCT'rT CTGCCTTAAA AACAATTrGT C~rrGGCAAG CCTCTrTATC CAACAATT? TrGTCCAArT GA'rAAAGAAG CATG'r=G CrAATGCTAA TTrCGTCAA ATTIrrACAA TTAAACGGCT ATCGTTTTTC TAAAAATGTG TGTAACCATC GCAGTAGAAG CTTTAACTGA CCAAA'rGGAT TI'CTGAACAT TCTGTTAGAG AAAAGGTCAA ATTTGAATGA GCACAAGAAA ATAAATGAAC AGACAATA'TT CTGATATTGT CTCTTPTTAT TGATGAATAA GAAAGrGAGA CAATTATCA'r GCAAGAAATG TTACCTCITTT TAAATAATGA AGAGTTTAGA ACATCATCTA GTAGACGGAA AAAAGCAGCA TGTTGCAACT A'I-ITAT'rACC GCCAAGCAGG TAGAGGGCTG ATTATCAGAG GACGATTGAA AACTTGTTTA ATGCTA'r'AA CAACAGATGA TTTAAGGAGT TArTTAGCAA ATTACCAGTC CAAATTTAGA CAATATTAGG CGTATAT'rGT CTTCTTTTTT ATATATCATT AAAATTCCCA TTCGACGGAT ACAGAAAATT GGAAA CTAT ACTGATGAAC ATTTGGAAAT TATGCGTGAT 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2387 A.ACTGTGAAA ATTTGAGAGA TTTGGCAATA ATAGACCTAC TAGCATCGAC AGGTATGCGT GTACGGGAGC TTGTAcAGTTr GAATCGTTCA GATA7TATT 'rTGAAAACAG AGAGTGTGTT GTCTTrTGGTA AAGGAAAGAA GGAGAGACCA GTATATTTTG ACGCTCGTAC GAAAA'NTCAT TTAAGAAATT A'rCTTAACGA CAGAAAAGAT AGTCACCCTG CTCrTTlr"GT AACGCTAGTr GGAAAAGTCC AGACCCTTGG AArrGCTGGT G'rAGAGATTC GCTTAAGAAA GTTAGGAGAC AAACTCGGCA TACAAAAGGT TCACCCACAT AAGT'rCAGAA GAACTTTAGC GACTAAGGCA ATTGA'rAAAG GTATGCCTAT CGAACAAGTC CAAAAACTGC TAGGTCA INFORMATION FOR SEQ ID NO: 57: SEQUENCE CHARACTERISTICS: LENGTH: 10669 base pairs TYPE: nucleic acid STRANOEDNESS: double Page(s). 
were not lodged with this application CAA'rGAAGTA ATAAATTAGG GTGGAACCGC GTTTCTGACG CCCCTAGGrr AAATCAAccT AGGATTGTCA GATGTGGTTC TTTTGCTTAT TrCAGTCTrATT G1rGrGAAAGA AAGGAGAGCC GTrGGACAACC TT1TATCTTGT AAAAGACGAT AGTCAACTAG CTACATTTCG TCA'rrTGTA GTAAGAAATA CTGAAAACTT GAAAGATTAT CAATCTTTT TAAAGAATGA ACTGCAGTC TGTGATTAC CGCAAGCTGT TATrGGTCA GATTNTAATG CTCCACACA GATT'rA'IG GAAAGTGCTG TTCCAACCTA TACAAATAAT AGACGAGTC TTATGACGCC TGATTTAGCT G7TrrGALAAG AATTGTATrr GTATCAGTTG GCAATAGAAA GTCACTATCA TTCTTTATCT GAGTTAGCTC Ar'GGTCCGA CATTTrrrAG ATCGACTACG AGTTrCTGA GCAAACTCAA GAAAATTCC TCTTACACAT TGTAGGACAT ATGAT'rTrGA TGGTTATCAC TCrrTA'cT GGTTCGAAGA GGGGATG1'r GAATATAT'rA GTCGCAAGTA 'TTrrGACA rrCAAGCGGA
GGCATCATT
AAAATTTGT AATCAATCTC TCGTAGAACT Tr=CAGAAG GAA'rATTTT GGrCrCGA CTTATGATAA GAACTATGCA
GAAGCGAAT
AAGTATAGT
AGTATTT=
ATGAATACTG CGCAGC??1' AAGCGGTCTT AGAT'rCTTAT ATTGGT'rrGT TCAGCAGAAA TGTCTAAGAA ATTAACATTT TCCACGCTCA GCACTAGAGT
TTGACAGTAG
CATTATCGG
TTAATTCGAA
CACTGCATCA
GCCTGAGCTA
ATAAGI'TGGT AGAAAATTTA GGTAGTGTAC CAAATACAGA AAAAACTT' CCCTTGTTAG AAGAAATATA AAAACTAAAG GAGTAAACAA GTGGCAGAGA CCTCCrTACA GTCGGGCTGC GACGCAGTAC TAACTCGTCT TGCCTCGTAT ATTrA7"r= A'rTAAGGAGT ATTCAATGTC CAGAAACCTC CTTACAGTCG GACTGCCCTA CAGTACTA.AC TCGTCTTGCC TCGTATAA'rC 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 GATCGACGAG GCAGACTCGT GTCGCA-AGTA TAAGAAATTA ACATCACT GCGTCAGTGG CGCTCAGCAC TAGAGTGCCT GAGCTACACG
CACGAGGCAG
AAATTAACAT
ATGCTrATGC
CTTCGTGCTA
ACTCGTGTCG CAAGAAAT'rA rT'TTTrATTA AGGAGTATTC AATGTCTAAG TTCAAGAAAT TATTTTGACT TTGCAACAAT 'N'TGGAATGA CCAAGATT(GT AGGCTTATGA TAAlrGAAAAA GGTGCGGGGA CAATGAGTCC TTACACTTTC TCGGACCTGA GCCATGGAAT GCAGCTTA'rG TAGAGCCATC ACGTCGTCCT GCTGACGGTC GTTATGGCGA AAACCCTAAC CGTCTCTACC AACACCACCA ArrCCAGGTC GTCATGAAGC CTTCTCCATC AAATATCCAA GAACTTTACC TTGAGTcTrr GGAAAAATTG GCAATCAATC CTrrGGAGCA CGATATTCGT 7TTGTTGAGG ACAACTGGGA AA.ACCCATCA ACTGGTTCAG CTGGTCrG TTGGGAAGTT TGGCTTGACG GAATGGAAAT CACTCAGrrC ACTTATI'TCC AACAAGTCGG TGGATTGGCA ACTGGCCCTC 'rGACTGCGGA AGTTACCTA'r Pag~s.,~?~..were not lodged with this application TGCAGCAGCC ATTACAAGr ?TGACTTGrr CCAAGGAATT ATGGGTGAAA AATACACCCT TGCTATTCGI' GAACACTACA '1'GCCTACATC CGGCGCAGrr CTAGCCATTG CAGACAAATT ATI'GATTCCA TCAGCTA ATGACCCTrA TCGTATCTTG GATGCCI'TG GTTGGCACAT TGCA'rTGAAA ?1'TG.ACAGTT TGACTTATGA GGCTCGTGTT GATAAGATGA TGGGCTCTAC AGGTTCAAAC TTTGTTGTG4G CAGATATGIT CAAGGAAGA)A GATTTTAAAC CATCTGTTGA GAAGGCAGAA GGGGTTGCTA CGGTTGATTC TTGGCAGAA GCAGTAGAAA CACTCA'TTr ACTTTGCG CTTAGCCCAG TCATTGATGC 507 GACAGGrA'rG GITGGTGAAT TTGACGAACT TCTTGCTG' GAAACTCCAG CGGTGGCAGC AGCTGA6AGGA GAACTTCCAG AGAGCAAGGT GGATACGAT'r 7"rGAGTrCT TCTCAGTrAGG TGCCCTTCGT CGITGCAACTC AAG-GTGTGGT TGCTATGGAT GAGCTGAT'rG ATAGC~rrrA AAATAAAGCA GAGGTTATGG AC~rrATCAA TCCAAAAGAT ATCAAGGAAG CAGTTr rGC GGAACCCA AGTGCTCTCG TAGAAGTAAG 5220 5280 5340 5400 5460 5520 5580 5640 5700 ATCACTrrCT CGTGCCTA AGCACTATTT GAGAATGACC ATCAGGACCT GCAAGTCAGC TTTCTTTGAA AATACTATGG 0*
AGATCAGGCr GTCCGTCAAA ATCGTCGC AATCTTGTCA CAACTAACCA ACCTGGCCGA 5760 AAGAAAAAGC 5820 AAT1'GAAACA 5880 TAATGGCTGA 5940 AGAAAGCAGC 6000 GACTTrATCT 6060 GCTTGCrAAA 6120 ACTACGTGAG 6180 CAAAATTGTG 6240 TGAAAAAGGA 6300 TAAGTT'rCT TATrACAAAG
AAGAAAAAAA
GAGTACATCG
GACGAAGAAG
TTACATGGCC
AAACCACGTC
GCTCTTAT
AACAAAGAGA
GGTTCCTAAA
TGT'=rAACC AAATTAACAC 'AAATAAAAT TTGATAAACG GAGAAGAAAT GGATCCGAAA AAAATTGCTC GTATrCAATGA CAGAAGGCTT AACACCAGAA GAAAAAGTGG ALACAAGCCAA AAGGT"TATCG CCGCGCTGTT CGITCACCACA TTGAAGGAAT
GAAACGATGT
GTAGTCTTGA
AGCTTCACCT
TTTCATTGAG
CTAGCAGTTT
AACTATCATT
AAGTACAACG TACACCAGAA AAACTACGCC TGATCCAAAT TCATAATA AT TGCCGTACrT AAGTACACCC TATATGTATT CTr'rCTTTTA GTGTTTGCTA GTCT==r~C AGTAACTTGC ACCGGCTGTA C-AGAAGCGCC ACAAGTAGCA ACTCTTCGA6A AATCAAA'rTC 'rGCGGCTAGC TTCCTAGT?1' ACAAAGATAG ATGAAACCAT GCTAAAAAAG GAACCATA6AT GCGTCTGCGT CACCACCGTG TCGGCGTCTC CATGACCTCC TGACCAGTCT C=r~AGGTA ATACCAGCGT ATGA'DCCATC CCTGC.AGGAG AACCTGGAAC CCGATAACCC ACTCTACGTT 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 GCCTCCAGCA TCCCCTGAAT GvGCAGCAGGA GCAAATGGTC CGCTACCACC CACCA.AACGT CCAGTCAACC CATGGTTGGA AGTTAAAGAC AGGATAGTAC ATTGCTTGGT AGTTGTGAGT GATCG'rACGG ACGTATTCTT GGTTTCCGTT GA'1?TCATTG
GTTGATAACA
GCGAAGTGTI'
Page(s).,F. were not lodged with this application 509 AAAATGTGAT GGTAATCTTG TCCATCr.ACG GTCAGGAC GTTCATAA.AT GCCTGAAGTC ACGACAGATT TATTGACAAC AGGGATGC ATAAATGAT 'rrCCCCTAGG ATTGGCTGGG TCITGAATCC CGAITTGCA TGGGTTATCC CC'TCTTGCCT GAT?~r=CC AATGGTCAGG ATATTCCCTC CCAGATTGAT CAAGGCAAGAA GTCACCCCCT ClrCCAAG AAATTGGGCA ACCrTATCCG CACTGTATCC Tn'rGGCTAAA CAACCTAGAT CGATCTTCAT TCCT'rTT 8760 8820 8880 8940 9000 TTTAAAAACA CAGTAGAAGT AGCACCCArT C.AAT1-rrG GTTTGAATrA AGGGACCAAT AACCCAAGTG AAATCAGCTC TGATAA'rTGA TTTCCATCAA TCTTr.AGCA AGTCAAAGGA ATAGTGATAG TAGTCCCCAT CC~rrCTCT-r ATAGAAAATA AGAAGAATCT AACTCGATAC CATCAGGATT GAT1TAGAGGC AGG.CTGGGCG ACCrGGCAT CTGAAAAACC GA'rACGCCAG CCATATTG AGGTGGCTAG AGAGCCCTAG GCTATGCTCT AAACAGGTCT GGATGAACCG TGACGGGGGC TATTCCTGCT CTCAGATTCT TGACTAT1GG CGTGAAGCG GTATTCAAGT ?TTTTGGAGA AAGATATCGG CTTGCTCATC CACTAATGAA TAGCCGTTCA GAATGTGAAC GAAGACTCAA GCTACCAACT AGTTGTAATA TCAAATAATC ATTTCA'T= CATGTTATTA TAATACCATA AAGTTAGAAT AAAAGTCAAG AAATATGCTC ATAALATT'CA TCAGGCTTGA TA7'rTTAT AAAAAATC'T GAAATA6ATAG, TACCCCCCTT GTATACTAGT AAGGTAAATT TAGAATGAAG GCAGCAAA'rr AGTCGGTGCT AACCACGCTG GTACAGCATG TATCA6ATACC TGAGAACGAA ATTG'ITGTAT TTGACCAAAA CTCTAACATC.
GGCTCT'rCG ATTGGTGAAC AAATTGACGG TGCTGAAGCC AAAA'rTGGAA CCTAA6AGGTG CTAAAGTT'rA CATGAACTCA ATCTAAATTG AAGCCCTTAC TTCACAAAC AAAATTTGGA AAACAGCATA AATGGGGAAT GTAAACGCTA ACGGTAAATC TTrTATGAGTA AAATCGTTGT ATGTTG.GATA ATrrGGAAA TCT'rrCCTAG GATGTGGAAT TTGT'TCTATT CTGATAAAGA CCTGTTCTrr CAATCGACTA 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 TGATAACAAA GTAGTTACAG CGGAAGTTGA AGGAAAAGAG CACAA.AGAAT CATACGAA AA ATTGAT'IrrC CCTACAGCCT CTACACCAAT CTrGCCACCA ATCGAAGGTG 'rTGAAA~rGT TAAAGGAAAC CGCGAATTA AAGCAACTCT TGAAAACGTA CAATCGTGA AATTGTACCA AAATGCTGAA GAAGTTATCA A'rAAACTrrC 'rGACAAGAGC CAACACCTCG ACCGT7ATCGC CGGTGT GC;GGGTACA TCGGTGrGA ACTTGCTGAA CCT'rlrAAC GTCTTGGAAA AGAAGI'TGTC CTTGTrGATA TCGI'GATAC TGTCTTCAAC GGTTACTATG ACAAAGACTT CACACAAATG ATGGCGAAGA ACTTGGAAGA TCACAACATC CGCTTGCCTC TAGGTCAAAC TGTTAAAGCA ATCGAAGGTG ACGGTAAAGrT 'GAAC~cCrG ArrACTGACA AAGAAAGCCT Sio Page(s)., were not lodged with this application TTTCAAACGAA ATCCNrGACC 511 GCCTCCTAGC CATCCGAAAA CATI=rGGCCT ATGGAGAACA CTAACTGTAT CCGTTGGGTA CCTCAGGTG CTGAAAATcA TCTCAAATGA CCAAGAAAAC AGCAAGTCAA TGTTTG~cGG
AAATGACTAC
ATCCCCAATC
TCAArGAATGG TTNrGACCATG GCAGTCCrrA
ACTAATCAAA
AATGATGAC GAAG=~ATG AGTCAATACC ATCTAATAC TGTATTCACA AAAAGACCTA TACTGCATCC AACAATTCTT 'rGGTTTACCA CCACCACGTC ATGAAGGTCT 'rTTGTCTTGC TAG4GACAAG.A AGATCAGAGT ACCGGCATCG GATACAGACA ATCTTTGAAG ATATCGCCI'G TTTTGAAGT TGACCAAGTT T'IGAGGTGCT TTCAAGGTTG ccTTTG1'AGA TNrACTTGGT GACAA'rrCCC 'rGTCTCAGCT TCAT1A6ATA6AC CAAGCTAGGT CCCAAATGGA TAGATCN-rA GGATCTTAGG TTGGTTGCTT CATCGATGAT TGGTGC'TAA'r 'rTGCTACA6AG GACATTGACT AGrCTTTTG TTTCCAGTTA CrrGACTAGC AATGTAACGA CGGCTGCAGC 'rGCGGCTTTT GTTCT-rGAAG 'rCCI'TCTACC CTGCGATAGC rTTAAGAGCA AACCACCAAG GTCAAGTAC AGATCCGTAA CTGTCTGGGC CCAAGCCGAT 'rGGCTr'IrT CrGATTACA AT11ACCTGC CCTCCTGCCA TGGCCATArC TCTTTGACAA GGTTTCCTGC TTGTCACCGA TAGCGGCAAC TCTGCAAAAG TACGAAGGGC TCACCGTTGA CTTCCTTA6AC TCTrCAACT CAGCATTrTTC TTGTGAGGTA CTTCCTTGAC TCC=CTGTT CACGATAGGC GTTCC'rGAAC CGATTCCTTC TCAACATGAG TACCACCACA TCCTTGCCGT ATTTCTCACC TCCG'rTTCAA CTGTCTTCAC ATCGCACGAA GTTCC'rCAGC TCAAC=TCGT TAAGAGATCC GCGTGAAGCA AATGAGTCGC 1260 1320 1380 1440 1500 1560 1620 1680 1.740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 TTCAAAGGCT 'rCC -rACCAG TCACTGCCAA 'ITCTTTGACA ATrTwrGAAGA GACCAATCTC AAGTTCAATA GAGTAGTCAC CGATAGTCAC AA.AGAGGGCC ATAGCTCCCA ?I'TCTTTAGC TTCAAGTGCT TCCCAAATTT TCTCGTTAAC AGTTACTGCT TGGAAGTGGG TAAAGTCAAA TGCCTGTGTT GCGTGGTrTC CAAGGATATr AGTGTGGTTr 1-rCATGACAC GGTGACGGCG GTTCAAGGCA AGCGGTGCAA GGACTTCAAC CTGAACATG GTCACAGTAG CCACAACCTT AGCTACCTGT CCACCCATTT CAGCATAAAA TCCTTCTGAA ACAGCTCCTA CT'rCTGCATT CAAITrGGCTA GCATTGTAGT TGAAGACACT
GATACGGCGA
AGAAGTGTTG
GACACGAACT
AGTGTCAATA
TTGCTGT'rCA
GCGAAGGAAT
GTGAAGGGCA
ATTGCTATCA ATTGCCAAGG TATATTCTTG TGTATGAAGG GCTTGACCAT TTCGGGCT -r ACCTGACTCA TCCAAGA'r'I- GTCCGTAGTC TGACGTrCC GCAAAGATAA GAGAGGCACT GTCAGCAACG ATAGCTACCA ATTTAGAAGA
TTGCATACCC
7rCTACAGTG ATGTTTTGAA TGACGCACGC GCCCGTTCTT
GAGTTTCATT
GCT=TCTTT
?rGATACCAT1'GAGCCAC CCTTGACAGC 5v2 were not lodged with this application 513 CAAAAATCAA AAGACAAGCT CATATCACGA CAATGAACTT GTCATICTCT TGrrCTTA'rG AGC?1'AGATG GCTCGC.AGCA CCGCCAXT1TC AGCGCGAAAA ACCGCGGTAC CACCTTCATT CAATTGTATG A'I"rGAGTAGC ATGACTTCCT 'rC'GGACTAA GACAAGTGAA AATCAATCT CAACTTC ATT1ATAACGT TTI'TA6ACC TTGCGTCAAC TGGAAATGAT CTCCGrTGAA 7rAGACCAAT TCCCTACATC TCTGXI-ACT ?r'rCAGGAT ATA~rI-'rC TTACTGCCAT MMTC N' I T ATCCCAAAT TTCATATTAC TAAACACAGC TACTAGAATA ?I'TCCAAATA TAAAGCTGCC TA'rCACCCAA TATATGGACT CAGT'rGTTAG GTArTCGA TCCAAGCCAT CCTTTAAATG GAATAGTATA GCAG1'TTGGT TAACAATCAT AAAGG1"rGGC CAGAAACTTT TrG raAAAA AGTAGACAT 'TCATT-AT'rT GTGCCCCTT TC'rGTAAGGI' TAATACL'CAA TAAAAATCA.A AAAGCAAACT AGGAAGCTAG CCTCAAGCTG TACrrGAGTA CGCCAAGGCA ACGCTGACGT GGTTTGA6AGA GTATAGGCT ACTATACTAC TAGGCAAGCA AATAAACAA.A TAAACAACTA GAATAGAAAA AGATAGGGCT CTAAAAACTG ACTTCTATTC CTTAAAAACG AACCAGCTTG ACTGATTCGT CTTCTrTACGT TTATCTCCTA CTTCCGATAC ATTTTAAACT GTAGGAAGAG GTCGCTATAT TITCCCTGTCC ATTTATGGTC AAAT'rTCTCA TAAACTTCTA GGTGTTTCAT GGTTTrCAACA TCGGGATAGA AGCCTTATC TTCCTT'rGTr TCCTCTGGGA GCAATrCCTT CGCTGGTAGG TTTGGTGTTG AATACCCAC ATACTCCGCA TTTTGAGAG CATT=CAGG TTrCAACATA A.AGTTGATAA AGGCATAGGC TGAGTTTCG TTTTTAACrG TTTTGGGAAT GACCATATTG TCP.AACCAAA GATTGCTGGC CTCTGTCGGT ACCACATAAC GTAGAT'rrTC ATTTTTTTCT AACATTTGGC TGGCTTCACC AGAGAAGGTC ACGCCGATTG CAACATTATT CTGAATCATA TAGCCCTTCA TCTCGTCCGCC AACGATAGCC TTGATATTTG 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 GAGTCAGTTT GTAGAGCTTA GGCTCTAGCC GAGGGAATTG TGATAGAATT CTTATACTCC CCATGGT=rC GTTGTAGACA TACCTGGGTC AAACGACTGG TTGAATAATC AAGCGGAACC TTGGAATGGC AATATCGTAG TGGAGTCAAA AGTCTCGTAC GTTCAGGATC GATATAGTCT TCCACTGTCT CTTCCAACTG CTGCAGATCC AGTCCTAGTC CCAGCACCTC ACGCGCCCCA GGCTTCCAAA GGTCATCCCA ATGCTCAGGC ATTCCTAAGG TTCCCCAGAA GTAAGGGATG 'rTGAGAAACT CTGGTCCGAT ATTNCGATT AAGAGGTCI-r CGTCC1-rCAT CTTGTTAATC GTCGTCCAC CCTCNTTTAT CT1TAGTGTAC TGAACTTGAA TTCCTGTTTC 
TTCTGTAAAC CCCCAGTTAT AGATAACCALA TTTTGACTA
TTGGACTTGA
TCAAAGAGCA
GCTTCATCTA
GAGAATTTAT
CCTTCAATTT
ATGTATTCAC
ATGGCTTCGT
TGAGTCAAGA
TCTCGACTAT
Page(s) were not lodged with this application
TAACCTACCT
AGAGGAT'rAG
TCCTTTAACA
GAGTTTGAAA
TGAGTGCTAT
TCCATATTCG
GTAGAAATCA
TATATAGAGA
ACGGACArrC
ACGTTTGCCT
TTACCATAGG
AMrCGAACT 51.5 AAC'TCCTACC 'rCAGCGAGTA GC TA?1'NT CAAAACCGAA AAGCCcrr= TAAGGCT= TGTATACTT*T AAAAGATAGT CTTGCAAATA GTrACAAA AATAGTATAC ITGGATCATT AAAATTTGAA TGAATACTTT AGGAGACAAA 'rTGATGGAAT CCTGAGTCGG AGTATGACTA 'XC?1TATAAG GATAAGAAAC AAGAAAGGGG ACATTGAAAG CATCAACTTG CACTATGCGGG GAG??TTATC AGGATACAAA ACAAATGGTC AAGATAAC -r TGGCAGCTTG AAC'TCTCAGT TGACTTTGCA CGTATCCAG'r ACAGAAGGTC AAAATAT=r GTATCGCGAT AAAGGGTGTC ACCCTIrrAT CTTTATGGAG CTGG;TACCI'T ATTTGACCAT ATCTCrTrGA GCTCAGAGAT TGGAAAA'rTC TCTAGAAAAT1 ATGAGATTGA TCCCTGCALAG TTCCTCAAAC ATTTGCCAAT ATTCA'rCTGT CACACCTAAG ATCATATGAA TTAC'rrGCAA AATCTACAAG CAATCACAAG *00.
CTTCATGCAA TTGGGAATGG gTCCTGACT GGGTTTCAAA GGCAATGCTC TATTAAACCC AGCGATCATT TCr'rTGGTGG GAC?1'GGGTA 'rTACTGGACT
TACAATACGA
GAGACAAGGA GACCTTTCGG GAACTGGTGCG TGCTGGATGC GGTATTTAAT CATATrGGTT A.AAATGGTGA ACAGTCTGCT TATAAGGATT CTGAAAAGCT AGTTAATAAG AGAGACTTAC TCCCTAAGCT AAATACAGCC AATCCAGAGG ATTGGATTGA AGAGTr'rAAT ATCGATGCTT ATCAGTTCTG GAAGGArTr CGTAAGGCAG TAGGAGAAGT CTGGC-ATACA TCTCAGCCTr TGAATTATCC TTTATCI'GAT AGTATCAAGG ACCAGTTCAT CGATGAAA'rC AATGGAGAGT Tc-ATGTTTAA TCTCTrGGAT TCACATGATA ATG7TCAACT GGTTAAATCA GCCTTAGCCr TTATACGG AACCGAGCTA GCCTTGACTG TGCCTrGGGA ACGTGTATCA AGTGACAATG
CAGATTACTT
ATCAAGCGCA
CGCAATCTCT
GGTTCCATAT
CCTATCATGT
TCAAGAATTA
GGCGTTTGGA
ITTTAGCTAA
GGCTAA6ATCG
ACTATTTCTT
CTATGTATTA
CAGAGCGAAT
TTT-cCTTrrT
GAGGACCAGA
ATATGCTGAA
A'rTTAAGTTG CCTTACTC TACGGTA'rGG TATCAGATAT AGAAGGCACT TTAGACTGG TGATTT'ACAG GGGATTATTG ATATCTTTGT CCCATCTTTrG TGAAATTGAC CGTCATwrT'IG TCATCGTGG-C ATGAAAGTCA TCAATGGAAA AATGTCGTCA TCAACAATTC CCAGTGACAA 71 TGG.rC GAGGACTATA 'rTTTTAAAG GTrGCGACTT TGTGCGCTAAT GAGATrGACC AAATCCTGAT CTT'rATATCC AGATGAGTTC CATGCCGTCA ACGAGGAATT AAGAAGACAG CAAGCAGCAG ATTrCAGAGG CCTGTGGACG GCCAATGAAG ACAAAAAGGA AC-ACCGTGCA TCCAGATTGT CGTCG'rTGTA CI'TATGAAG AGGCTGATTA 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 gi Page(s). were not lodged with this application TI71GATGAAA TAGAGAGATA TAGTGGCTAT TCTGAAAAAC GGATGGTCGC CTAATGGAAA ATGCTTTAGC AAGGAATCN' CGTTCACTGA TAG3CGATTTT GCTTGGAGGG CAA.AGTGGTG AGAAAGAATr TCAAGGAAAT ATTGTTATCA CACACTATTT AGAACTGCAG CAAGAA'rATG TTGCAGGAAA AATG.GTAGAG TCTTTAGTAA TGATAGAGGG AACTrTACGA ACAGTTGATG ATAAGGGATA TGAAGTACAA TTGGCCTTAA TTGGATGGA6A
TCCAAGATTA
CAAGAGGAAA
CCGGTAAGAC
TAGATGGTGA
GCAAAGACAG
TTGAACGTAT
TACTGATAGT
AAAGTCCAGT
TACAATTCAT
TAGTTTTCGT
TGTAGAATAT
GTTAGAGATA 3960 CAAAATTGAG TAGTTTGAGA TTCCAAAGAA AACAGCACAA TTGCGACAAA GCCTGAATTG
GTACTCTTAT
CAA.AAGAACA
AACTAGCTAT
CAAAAGAA
CCGTTATGA6A GAACTGTACA TT-ATCALATCC AAATCAAGCA TCATGATTTC ATTGrAAATC ATCTAGTTGA 'rAACACACGA CTTrGAAAGA ATTCAAATTT ACCAACGAGA TAGAAGTTGT TACAACTTCA GCAGCAGATG TTrrCAAGA GT'rACTrCTTT GTCAGGTAGA GAAGGAGATG TTGCAGGTGG GGGAAAAGAG ACTTAATGAA AATAAACAAT TGATATTTTI AGGAGAATAG AAATGAGAGG GTTrAATA6AC CTG= 'AT CA AGAACTAACA AATTCCAAAG AGAAATTCGG TAGCTTTCAC TTCATTGCA TACACCTGTT TCTTATGATT ACAAGCTATT TTCTAATTGG AATATAGAAA AATTACTGAA GATGA6AC1'AT ATGATATATT TTGAAAAT AAGTTGATAA GACAATTTTT TrTAGTAA'rT 1-rGATAAGG'r TGTTTTrCT AATATATTAG =TT~ATG TTAGCAGAGG CAATCATAAA AA.ATGGAATA TAGTAACTGA TCATAATACT ACCAAAGGTA ?TAAAAAGTT ACAA6ATGGCA TAATGAAAAA TTATCCGATT TATGATATAC ATCCTCATAT TTTACATGGA GTGCAGCAGA TAAATGCAT ATTGTATGTA TATATGATTA TGAACAAGAA ATCAATGG'rT AAGTGAAAAT ATTATAAGTG AGAAAGATGG AAGTTATCAA CrATAATGAA GGATTrCAAT AATCAAAAAA TAGTrAACTA TATTGCTCAT ATGACATTTT GAAAAAAGGT TCTCACTrAT CAGGTGCATA TAAACGAAAA UAATTCAAAC 4020 AAGCAACCTrA 4080 CGTATTAAAC 4140 TCTCAGCATiC 4200 ACCAAAGATT 4260 TACAA'rCTr'r 4320 C'rCTTGAAAA 4380 TCGTATCTAA 4440 CGCGCAACTC 4500 AAATTGGAAG 4560 GTATATGATT 4620 GGGGAGTGGA 4680 TTACTTGAAA 4740 AAGATAAAGT 4800 AAGACTTTAA 4860 ACTGCAACGA 4920 AAGAAAATAA 4980 AGTTCAAAAG 5040 GAAATAGTTG 5100 GTCTCAATCA 5160 GTAGAAATTA 5220 TCATGGrrA 5280 CA'rTCACTGA 5340 TTCAA'rAGT 5400 ATrTTTTTCTA 5460 AACAACTTGA 5520 CCATGCTTGA 5580 TTGATCAGCC 5640
AAGAAAATAC
TATTCTCTAT
,Tir=TATTA ACGATTTTG4G AGI'TAATAT TAACTCGAAA GAATCTTCGC AAAGAAGTIrG GTGTATI'AAG rr'rGGGACAA AAAGTTGTAG GCATATAGTG ATTATTCTAA AGACTTCAGA CCATTGArrA Sis Page(s). were not lodged with this application 519 TAATCTGCAA GCCCAGAAGC TAAAGGAGTT CCATGGTCAT CACTAAACAC TTTGCCATTC ACGGAAAGAG TTACCGCAGA AAGCTTATCA AGTACArCT CAATCCTGAG AAAACCAATA ATCTTGCCr GG1'TCGGC TATGGCATGA AGAAT?=CT GGACTrCCT AGCTATGAGG AAATCGTGCA GATCTATCAT GAAAATTTCA TCAGCAACGA TACGCTTTAC GA'TTTCGcC ACOACAGGAT GGAAGAAAAT CAACGAAAAA TACACGCTCA CCACATCATT CAGTCTrTCT CGCCAGAGGA TCATATCACT CCrGAACAAA TCAATCGGAT AGG=rATGAG ACTGTGAAGG AATTAACTGG TGGCAAAI-rT CGTTTrA'rCG rrGCGACCCA TGTGATAAA GACCACCTGC ACAATCACAT CAT'rATCAAT TCAGTAGATA GCAATTCTGA CAAA.AAGCTC kAGTGGGACT ACAACGTGGA GCGAAATCT CGCATGATr CTGACCG=r ?I'CTA.AAATC GCAGGTGCTA AAATCATTA GAACCGCTAT TCTCACCAGC GGTATGAAGT CTATCGTAAG ACTAATCACA AGTATGAACT CAAGCAGCGA CrCTATTTTT TGATGGA.ACA rCTAGGGAC TTTGAGGATT TCAAAAAGAA TGCTCCGCTA CTACATGTGG AGATGGArrr CCGTCACAAG CA'rGCCACCT TTrTTATTAC GGACI'CAACT ATGAAACAGG TGGTGCGTGG CAAGCAACTC A.ATCGCAAGC AGCCTTACAC AGAAGAATTT TTTAAGAACT ACTrTGCCAA AACAGAAATA GAAAGTCTCA TGGAATTT= ATTGCTGAAA GTTGAGAATA TGGATGATTI' ACTrCAGAAA GCAAAACTTT TTGGACTAAC TATCAATCCT AAACA.AAAGC ATG-rTCTrT TCAA=TTCA rGGAGTGGAGG TAAAGGAGAC AGAGC'rAGAC CAGAAAAATC TTTATGATGT AGAGTrTrTC CAAGATTATT 'rTAAAAATAG AAAAGATTGG CAAGCTCCAG AAACTGAGGA TTTCGrrCAA CIrATCAAG AAGAAAAGTr A'rCCAAAGAA AAAGAACTTC CAAGCGATGA GAAGTTCTGG GAGTCCTATC AAGAGTTCAA GAGTAACAGA GATGCCGTTC ATGAA'T-rrGA GGTGGAGTrG TCACTCAATC AAATTGAAAA AGTAGTGGAT GATGGAATTT ACGTCAAGGT CAAGTT'rGGT ATTCGTCAGG AGGGACTTAT CTTTGTGCCC AACATGCAGC TTGATATGGA AGAGGATAAG GTGAAGGT-TT 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 TCATCAGGGA AACCAGCTCC ATATGAAAGG TCGAACCrrA GCAGAAAAGC GACAGTCGAT TACTA'rGTCT ACCACAAAGA CGCTGCCGAG AAAAATTGTT ATTAGACAGT TCAGCTATGA AAATCAAACC ATTCCATTAC ATGATTAAAG AGAAGATTGC GGAAGTGGAT 
GCTTTGATrIG AACTGGAAGT AGAAAATCAA TCTrATGTCA CAGCGTCTGA ArrGAGAATC AATGAGTTGC CAGAATATCr ACTGGCTTCA GrrGAAAGTA TGAATATAAC TGAGAATATC AGTGCTAATA CGA'PTAAAGA TGAGTTAGTG CATGAACTAG AAGAACGAAT GTCAACCTTG AATCAAGTAG AGCAAGAAAT GAAATTAAAT C?1'TCAAAAC TTrGTTGAGAA AAAATTGAAG AGCCTGGGGA Page(s).,E;.2C). were not lodged with this application GlrrlrCCrG G7NIGATAAA GCCTTGCGAA TCGCTGCAAT TGAGI-rGTG GAAGCAAGTT ACTGAAATCT CATCN'CAAA ATGACAGCCA ArTCATCACG GCGATGATAT ACCGAGAGTC CCGTTGAGC CCCAAAAACC CCATCTAGTT CI-r TTCTTG 521 GTCAAGCGCA TCGCGGAAAG C~r=GCTGA CTCTGCCTCA TCCTTAATCA TACGAAGACC CAAACCTGCA AAAGCTGCCT GCATACGGTG ACCGATACGA GTCAAGCCCA TCTCCIAAC ATCAGCCACA ATC~TAAAAC CACTGTTC TGTCACTAAG ACCTGACGGT CACG-ACTGAT AGTCAAATAA TAGACGTTTT TAAGATTGTT CATTTTAGCT AGAAATGC?1 GTACGCCT GTCCTG.TATA GCTGGCTTCG TTGGCAGCTA
GATAGAACAA
TTCCACAAAC
GTAATAAGAC
AATTCCTGCA
TTGCTTAGCT
AAAGACTGTT
GATGATGATA
ATTCATGATG
CTTCTTCGG
AGTCAATGAT
T1AwGCCATC TAAC~rrCCT AGT'rCCTGTT ATrGGTC1'GCC
GTCTACAAAG
CGTCGGCTCA
TAACTTCA'rA
ATAGCC'TAC
GAAAAATT
GTAGTGAACT
ATAAACATCT
ACAGCGACCT
TTCATT'rGTC
GTTAGACCTC
CTCAATCCC1' GGcAATGGC'r
TGTCACTGCG
TTCAAATAGT
ACGATGATGG
GTCTTGATAA
CGAGCTAAAA
TCCAGAATGT
cCTGGGCTT
TTCCACCACC
CATCCAGATT
GACACCGCCC TCAGCTCCC-A GTGCTCGATC ACGAGGACTG CCTTrGAGCAG GCCAGCAATG TCCTCTGTAT CAACCCCTGT AGAAAGATTT TCCTGTCGAT CGTTTGTGGA GTTCGCTAC CTCCCCCAGA AAGGGTGG'rA GCTGGCTGTC CCAAGGTCAC CC'TACATCCT TGATGGTCTG ACCGCATCGT TGCCGTCAT TCTAGGGTTTr CACTGTTATA GGCAAGAAGT GCATCTCAAT CCCITTCACGT TGAAACTGAA TGAGCAAAAA GGTCACGTAT GGCGTCCGTC CGATAGGGCT GTAATAGTrCT TAAACTTACC TrTTTGAGAA TGCTGTTGAT ATAAATTTTC CTAGTGGAAA GAGTT'rGCGT TGAATTTTCC GAATGTGTTG 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 ATCCAAGACC TGCGAAATAT GCGGGTTCCG TGGCAAACTT CTTGATAATC CCGTCACCTG GCGCCCCTTC TTGTAGCCTC ATCGTCAAAA ACTCCrGTAT CTGGTCAATA TCAATCAAAC AGGTTTGTCT GAATTACGGT TAGAGTCGAT TTCCCTGAAC GCGAGCCGTG ACATTTTGCA ATTTCCGACA CGGCGCTCTT
TCTTTCCT
CACA.AGCCAC
AGCAAGCTTC
GIAATCTTGGC
AGGTAGCTGG
GGTCGACATG
'rGAGCTTCl'G
CCGACACACC
AG7'rG'rTCTC
CTGGTACTGG
A(;GCGCTCCT ATCACrrCAA TAAAACGACC GATGACACGT TI'GCCTGAC-A AGTACTGACC CTTAGGTGTA CCTGCTGCAA CAATCTCACC AATCAGATAA TCAGCCTCAC GCATGGTATC GCCCAAMTCA CGCATCTTr TCAGACTGGC TGTGATAGAC TTGCTGTTGC GAGCCACTTG ACCAAAAACA CCGGCACCAG G.ACCAACGTC TTCGTCGTGT TCCACCACGA TAAGAGTATT AATCAGGCGA TCATTGTCCC TCTGGTGAAG Page(s)were not lodged with this application 523 TCGCCAACAC CTTCG.ATATT TGTCTCGTAT TACCATTGAA AGGAAAGAAA TAACCGCACC AAACCATTAT CACTACGTGT GATTGCGTAA TTTCTATIACC ATGCAGAGCT T'rACCTAACA GTCAACTrGCA TCAATCAACT CTATCGTCTA 'r TCAAGGCC GTTCAACCAG CAATATCAAG TTGAAACCCA ACAGGCCATC CAGAGACCTT TGCCTCTATC TTGTGACCAT CG'rCATGTCC AGGATAATGA AATCCCCCTA CC-I-1-GCrAT GAGTGTCTCG AGT'rCCTATG TCTCAAATTG CA'rTGCTACC CAACAATCA TGAGGAAGTC ATTCCCCAAA GATATTGCTG ACCTTCGAGC GACGAAGTATA CCCTGATTAT TACTACGTAA CCATCCCGCr TTGGAACCAC TACCTGTCCT TTCATGCGTT CACGr=T' GCCCT'rCGTT CAATCGACCC CGTAATGAAG AATTGATTGA TCCCTCAAAA CAAATGAGCG ACCGCTCGAT GCGGAAGAAA CGTAGACGTG CCGGTCACGG TGGTATrATC ATCACTGAGG TGATGTCTTT ATCAACCGTC CTTTCAAATT CTTTATCGCA CAAGAGTGAA CAAATCGAAA GCTCATG-GAA TTGGAAAAAA CGTGAT'rAAG AAATTCACCA
AAATACCI'G
GAGATCGCAG
ATTTCTAACA
ATCCCAACCA
AACGGAGAGC
CTCACTCTC
ATCTACAAAA
TCAAAGATGG
TCCTTGAGGA
AGGACGAAGA CCTGC= GAA GACACCCTGA ATATTTATGG AAACGTCTTG CATTCTATGA ACCAGAACAA CA'rCATGAAA ACCTTGGCCC TCGTCrrTTC TGCCTACGGG ATGAACTTTA CAAATGCCTT CTGGTTAATC G1'CTTTATCG ATCTCATCCA TAAAAAATGG TTCTAAGAGG ATtAACTAAG AAAAACCAAG AGTTTGTCCA GAAAACAGAC GCTGAAATCC AGACTArTT GCAATCTAAA GGTACAACTG CCCGrrCCCT CTTCACTGTC AAAGAC-AGT ACGAAAAAGA GATGATTATG GACTCAGCTC T CATCAC AACCTTCTTT GCGGCAGACC AAGCTTTCGG 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 ATACGGCGCA CCAACTCATT GGGCTCA1'AG GCATCCAAAA GAAAATGATG ACCCAAAACT TAGCCTCTTT GCCCTTGTCA GCGCCCTCAC CTATGGATTG ATTACTCTTC TATTAGTTGG GTACTACTTT GTTTACCAAT ACTATGGACC CTGGAAATCT GTACTAGTTA TCCTAC?1C AACAAGCT'rC CrACCAGCTA GCCTTAACCC TGGAGCAGCC CTCCTACCCC T'rCGCTTCTA AAGTGCACCA CCAACACGCI' ATCAAGAATA TTGCTTTTT ri~C ACTTACTTT TTGAGTTATA GCTAGCTGCA GGTTGCTCAA AGCACAGCTI AAGAATT CCAAGACTAT TAAAAGTATT
ACTGGTTGGI'
AGATATGGAT
TATGL'rCCTT
AG'TACTGGAT
TCTCAAGAAA
AGAAAACGAT
TTCAATGAAA
TGAGGTTGCA
CTTCTGAAAT
GGATTTGCCT TCTACTTGAT CGCAGTCAAC GTCCACCTTT TGGG TCTTCTTTGC CCATTG4CCAC TAGCATTAI' CCC"?'.GAATA TCCGTAGTGC AAAAGCA.ACT GCAGGTGCGC ATCAAAGAGC AAACTAGGAA GATAAAACTG ACGTGGTTTG CCCACATAGC TTTCTCTTAT Page(s) were not lodged with this application GTATTCTGGG TGCTAAGGCA TTGGCATTG GCCGAACCAC GTGAGACAAG ACATGAGACA TCCC.ATGCTA ACAAAGTCAC GATACCAT'N' TGGTCCGCZAT AGCGATAATT TCCATATCAC 'rGCCTGCTCA AGACCGCGT GTGCCCCATC TGGTGCAACA AAAATGATAT GTCTr'rGACA C'T'GGGCAAC CCGACCTTCT TCTTTAGAAA ATCATAATTG TrGkAGTGACC
GACCTGTCCC
ACATACCTGT
GCATAA.AGCT
AGGCAATTTC
CGTCTAGATT
GGGTAAAGTA
TCAAGGCCAA
TGTTCTCC
TGTCTGl'rT
TCCGCCAAGA
525 CTCATATTCG TCAGATrCAA AGACAAAATA ATCTCCAA'rC AAGAAGCTGG TATCTGTAAT CGTTGAAGTT TTTCCATGTG CTCCTGCTAC ACCTAGAAAC TCATGGTAAC GTTTGTAGCT GACG7TTGA TCTGGACGAA AGGC-ATTCC TrTTTTCATCA AAAGGAAGAA TGGTAATTCC GTACTTTTrCA ACATCTGATC CCTGAACCTr GGCACTCATC CCTGATCCCT TAATTCCGAT CCTATTCTGT CATTCTG7GTC AGATTCAACT GTTTACTTTT TTTArTGTAG ATTTGGCTCT TAGGGATAGA ATCAACTTCT *9
ATT'rGACAAA TrCACCAGGA TTTcCT11r TAACTAGACr GGGCTGAGAA TGACGTCTCG CAGAGCGrr C~rT'CAAG TCCGCACGCG TTrCCTTCT'r TTT~AACCCCT AAAGGAGCCT GTTTTACTGG T*rTTT~CA GCAATAGGAG CACCCTTGA'r ATTAC1'GATC AGATCAGACT TCAACATGAC CTCGTCATCT GACACCAATG TICAACACT 'rCATTATAGC GTATTGTCTT CAAAATTC CTAATTrCTC CrACGGTCAG TTGTGACACC AGATGT 'TCT TIr'GTTCTTG CTTGGTCACC AAGCGATAGA CCTCAACCGT
GAGCAGGTGC
TATAATCACA
GGAAAGGAGC
CAAGGCTGAA
CTTCTTCACG
TT'rTAGGTTTr
CCCATTCTAA
CATCATAGAG
TGACACTTCT TCATTCTTGG CTGGGTCAAT TrTGGC'rAT TGTCGGTrGA TI-GCCC'TGTC ATCCTGAGTT AGGTAGT'rAG CGCCACCTCC GCATAGCTCT TTCGACTTGC TTTTCAATCG ATAATTTTA TCTCGATACT ATTCATGACT GGCATTTCAG 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 GAAATCGTTrC TTGTTTCATT TTCTATTTCC CATT=CA.A GTGCTGGCTT CAGAAATTCC
ACTACCACGT
GAGTTCCTGA
TTC'rTCCTGA GACTCTGTGC CGTCCAATAC A1I=~CTT CAATGGTTCC CCCATCCGAT GGGCACGGCC AATGGCTTGC GCTTCCACCG ACCTGTCAGG TTCAGACCGA TCCTTGGTTA AAGGCCTTGG TTTAAAGGAA GTCAGGCCCA GAACTGAGAG AAAATCAAGA GAGACTATCT AG~TTGCCGC CAGGATTCCA CCAAAGGTCA ACCAAGATCA CTGTATCTGC CCCCACCAGC CTGAGGGAA ATCAGAAAGG CATCTCTTTC TCATGTCTTG TCTTTCCTTG GCTG.GGGTTG AACCCGTAAT AGT= GGCAG TTCTTGTTCA ATTrTTTCCA ACATTCCCTT CACGGTGTCC GCCGTCTGCC ACCTGTACCA GTAGGTCTCG TGGCTCCCTG ATAATCTTCC ATAAACAGGG CAGGAGTGTC
527 GG-rrTGGTAG CrATCCACTT GATCGTCAAA CACACCCTGC TCAAAAAAGG TCAGAGGGAA GAGGTCTGGA ACCAGCCCTT CCTCCGTTAG GGCATCATCA AAGGCTTCCC AAGAAAGAGA ct1rlCTCTGC TCGACAATCC TTAAAAAAAG GCTATTGA'rT TGGCCr.ATTC TCAGAGTCCA
TTGAAAATGG
AAAGAGGTGC
TCCGTIGCAAG
CTCCTCATAA
'rGG.AA'rATCC
CAACATATGA
TGA'r'rTTGGA
AGCCTCTTT
ATtNTCAGA GGCAACGCCCA TCAGAAATT CGACCTTGGT TTGGAAAAA AAAGTCAAAA CTT7rCrc'T
ATCTTGCCAA
CGAATGACAT
TGGTCCTG
GATAAAATTC
TCTTCATGAC
AAATGCTCTA
TCATATACGA
AGAT'IrTTG
CTTCCACCTG
GATCCAAAAA
C7rCTCCAG
GCGCTGCCAA
@0 Sq *S 0 6 *6 55 S S 6 ACCCACAGCT GATAACTCAT CTTGCCACCC AAGGTCACCT ACTCCACAAG AT-TCCTGAC ATGCACACAG TAGCCCCTCT TAAACTATAG CCCAGTTCTT GATGATATCA ACCTrACCAG AATCAATTTA GCCATAT'N'T ATGAAAArAG CCTITCrAGGA AAAGTAGCCA CAATCCGCI'G ATGCGGCAT GGAGACTTCC GCTTCATTrC TCAACAACrC AGCCAAATCA AGTAGGTACC AGATCCATCA CATAATTGAT TTACCGTATC GATAGGCAGC TTATrGGCCA AC-AGGCGTTT TAGGCAA'N'TT TrGTTCCAGC AAATTTTTGA AGGCAGGATT TTTGAAAAAA ATCACAGCXA CAAAAAACCA AATCATCC'rC CTTCTGCAAC GCGAGCGTAG AGCCGATTG1T TCTTTTCCTT 'TrrCATAAAG GGCAACACCT TCGATACGAA TTTTCCCCGG CACCTTTACC TTATCT-NrrT ATTATACCAT ATrTTCGCCT AGACTTTCT CC'rAGACGC TGGATTTTTA ACGTTTGGCA ACAGACTTCT TGCAACAGAG ATTrGGGCAT AGCTATATTG TTCCTCTCCA AAATCCAAAC CACGGTTGAG GATAACCTTG TTGCAA'rGTT TCATCAGTCA GGTCATAAGC TGAAAAGTCA
AGGCGCATTC
TGGTTrCAAC
CACGCTCATC
4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 TTGCCGTTTC ATGACCTTGA TTAGTCTC GTGGTCTTCA AAGACTTGCT TGAGTTCCTC TTCTGTCGCC AAATA.ACCCA ACCrGAAAT CTGGAAAGCC AGTCTCAACT TAGGATTTTC
TTTTCCAAAT
TAGCCAATCT
TTCATGCTGA
AATGACTGCA
AATATTAAAT GTT=AGTGG CACTGCTCAA GACGATAGCA GATCGTAT'rG AAAGACTGGT G'IrrGGACC AAAGAGGGTC AAATCTTGCT GAATCTCATC CGAAACTAAC ATCITCTCCA ACACrTCTTT TTCCCAAACA ,AcATAGAGrr TAACCrCCTC TTCCACCAAA AACAGACTAT CCT~rTCCAC TAAGGAATTA CrGCGAGCAA AGGGrGGGTA GACAGGCGTG AAGGTTTGAA TAGCTGTTGA GATGGCTGGT
AAAACACCGT
CGTCCACCAG
GTTTTTGGCA GAGTTGGCCA GATTGTGACG GTTGCAAAGA TCCTTTrTCAA GTTGGTCAAA GTCAATCTCA GTAATCAATC TACCATTATT CAACTTGACA TTAATTAAAA CCGCCTCGCC TT'rTTTGTA ACCACACCCT CGATAAAGAC AAGAGCCTCT were not lodged with this application 529 TTTCGTrCGC CCCCCTGACA AGAGTCGTCC GCGTTCACCA ACTTCACTA'r CTAGTCCCTC
TTTCATGGAG
ATCAGTTACT
TGCATTAT?1'
CGAATCTCAT
AAGCGATCA
TGTG~aAAccc CACCTAGTGA TACTAAGTCT AGCACTTTCA TCAATTCATC AACCGAGACA AAGATTGTCA CGAATACTGC CAGATAAGAC AAGCGAIT ACT"TCTCCAT TCTTTTAAGT TAAAATCATA CLAATATCTCC TGAAAGCGGT TTATAA.AACC GCTCTAACAA CTGATCCAGA TGG'TCCAACA AAAGCAAI-rT TTTGCCCCTT CCTTTAAGAC AGGTCGA'rr TCATCATAAC CAAAATAGAC TATACTTGAT TGCTCCATTA ACGCACAA'rC GTGATTTTC CAAAATTGAA CAAGTAATAT ATGGTTAAAA TTCAACCCTC AACTGCAAGC AAGTTCTCCA GTCCTGATAC CGATmCCT GTGCAACTGA AGATCCCTTG
CCCIYCAAATT
CTCCTAGAAT
AGGTAAATCA
rrC=AGG
AAACAGTTAC
AAAACGAAAC
8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 AAAATTAGC'T ATATTACTAA TAGGATTAAG CAAGGTTCCC ACAGATATAT ATCCTGCGCT AGCTATAGC GCAAAGATAA ATAACAGAGC TGATTTCAGT GAATTGTT'rT GTACCCTTTC TTTCTCTGCI TGGTTACTCT TAATTAATTC TGTTAAATTT CCTCCTGTAA ACGACCACTA ATAATAAACA TCATACAAGG AAGAGTGATC
TAATTCAAA.G
GACCCGATAA
AAkACGGGGTC AATACAATTA TCCAAAACAT CCTGTACACT ATGT'rCT-rGA ATCTTTTCAG TCAATTCCCC TAC=rTCAC TGATAT'rGGA AAGGGGCAAG AATAAAAGTA GAGAAAGATT CCAATCAAGA ATAACTAAAC TCAGAATAAT ATTTGGGAAA GTG;TCA'rTGA CAATGGCAGA AGTCAACTCC CCCCCATAGG TTAGCATCAC TCAAAAGAAG TAACCCTATC
CTAAATAAGA
GTCGTAATTA
CCACTTTGGC
TTCCTGATTT
GAAAACAAGG
TCAATAGAAC
CTAGACAGAA
AATTTCATAA
CTACAATGGA
AAAACTCACG
TCTTATCAAA
TTGCTATCTT
CTTGACCAAT
TCCCATCTAT
ACCAAGTACC
AATGACACTC
GAAGGATTTC
9360 TCTACATAAA TCAACCCCTC TTTTCACCC GATTGACTAA ACAGATAGTA AAAAATCA.AA AACGATTGAA ATACTTTGGA TAAATCCTTT AAGATAAGGG GAAGCAACAA TATCAC'TITT 9480 ACCAATAGAA 9540 GCC'rATATTT 9600 AGCAAGTAGA 9660 TAATAATTNT 9720 CTCAATTTGA 9780 TTTATATACC 9840 CAAGTAAGAA ACTCCCCATA ATCACCTTAG TATCTACTCT ATACTCCTTA TAATATTTCA ACGGATAAAG TCGGCAATAA GGATAAAATC TA6ATAAATCT TCCTATAACA AAACCCATAA CATCTAGGAT TG.ATATTATG CGT1TTTAAG CACAAAGACT TCTTACACAA ACTTATCTAC AATTAGATTT TATTTGACAT GN'TGCAA TTCTTCTTGG GCTNN'rTAT TGGATTCTTC 'NTCTrrC AACCATTTTT CTCTGGCTI-r TGCATATTCG TCTGTTGTGA CAATCTTATC TTGTACTTTG AGGTATTTAT ATGATTCAAC CCCT7TTGTA CCGGTTAAAC CATAGGCAGC AGCAAATGGT 9900 9960 10020 10080 530 ACGGTTCTTC TCAATGATGG TG=TCCCCA CGCGAAACAC 1'TGGAACGAAC TCAATCAACC AAC7'TAAT ATCAGCATAT TTCTCATAAC G!rTTGGCCGG TTATTAGC~r CTTCCAACAT TTGAGTATAG ACATCCAGTC CAACGC~rr TTGGCCTCAC CAGGCTCTAG TCCAAGATTT TGCAGAAATC CTCCACTA1rr ATATCGAGAT AGGTTrGACGG GTCTTGATAA TCAGGTCCCC AACCGCCA'rG TAATCTTTCT GAGCAGCTGT 'rTGAGCAAAG TAGCCTGAAC TGTCAAACTC AATTGCTGAA 'rGTCAATCAC TACATTATCA GAACCTAAAA CAGArrCAAT TIAAAGAACTA 10140 ATCTTGCTCT 10200 AGCC7?DGTCA 10260 AGTATTAAAA 103120 ATATAAATrCA 10380 ATCTGATGTT 10440 TGATTGTT'rrG 10500 ATAGAACTAA CTCCTTGTAT GCCTACTrrA ATTGGGAATTr GAACACCCTT TGCTTCGAGT GCTTTCTCAG GATTGTAGTA AGGGTCTTGA TTACCATAGT TGACCATCTT AGAGGCTACA ACAPLAGTTTG GAGGAACCAC TAGGTTACGC GACTGAGCCC CATAAGATGT TCTGTCAAAA TTGAGAACTG CTTCCTGAGT CGATTTCTTT TCTGTTACT CCACAGTCTT ATCCAAGTGG TCTTTCTTAG CTTCCGCAAA CTTAGCCTTG CCATCCGCAA AGTGATACC TTGCCATTCC ACN'CACCAA AGTC=t~CC CTTGATACTG
AAAATCTTTG
GCAAAATTGA
TCAATGTCAC
TAAGAC'rTCC TATCTAGGTT AAAATTAAAG AAATATGAAG ATGATATTGT TrMTTATTT TTCTTTAATC CCTTCATAGC CGAGCCGTAG TATAAGCACC AGCTGTAAAA TTACGTTCCA TCATAGTAGG TCAATTTCAC ATCGTCTACA AAGACATTCT T'rTTTCTTAT ATTCAATAGC AGATTTTGAG ACAAGTGCTr TACAAAATAC TAGATGGArC CGCCTTCCCA AAATCATCCC GCATTAACAG GAAAAAGTAT CGTTGCAAGT GVTTTGAAT ACCAAAGTAT ATTGAACCGT T'rGGTCATCA AGTGCCTTGA CTG"TAC CAGTGATATA GTCATCCAAA CCAGCAACAG GCTTCTGATT TTTTATCAGC TGCATATTGC AAACCTGTCA TTGCACCTTC TTTCCCTTCA TAGCCTGACG GAAGTTTTTA rrGTTTTACA AGTATAATTC TTGAATTTTG CATACTATAG TGGAGCTGTT AGGAAAAAGA GTGATTCTTG GTCGCTACCA TAGCATCCCA GTAATTAGGG
TCATCAAGAA
C=rTGATTT
TCCAGTAAAG
CACCGACAGT
AGTCCTGCAC
CAAAATCC'TG
AGGTCCATTG
CAGGAAATC'
TTCTGGTTTA
TGAAAAGTCG
TACATACAAG
GGCAG?1'ACA
TTTGTAGGTA
AATATTCCCA
TGTTGCTGCG
GTTGTAGGrr
CAGGACAAGA
10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11864
GGCGCATATT
TAGGTCAAAC
TATTGGTCA'r CCIGTrTCTC T'rTGACGCCG CTTCrCCCTC
CGTCCTGAGA
TTTCTAATAA
CTAGATAGTT
TGCTAGAATT
AGAAGTAAAC CACTTGGCAT CCTACGAAG AACAGTCCAA TCCTCTGCTA ATGATGGAAT CCCGTCTACC AAATTTGCAA CAATATCGGA CAAGCTAGAT CGATCACTTG AATAAACATA TCCACACGCG CTCAATAAAA CTCCTGTACC CCTGCCAACG TTAGATATTT GCTCTTAGAC TwlTCATTT CCGG
INFORMATION FOR SEQ ID NO: 62:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2412 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:
TAACTGCACT AAACATAATA TAAGGAGAGA AAATGTCTGC AATAGAACGT ATTACAAAAG 60
CTGCTCACTT AATTCATATG AACGATATTA TCCGTGAAGG GAATCCTACT CTACGCGCGA 120
TTGCTGAGGA AGTCACN'TC CCCCTATCTG ACCAGGAAAT CATCCTAGGC GAAAAGATGA 180
TGCAATTCCT TAAACATTCC CAAGATCCTG TCATGGCTGA AAAAATGGGA CTCCGCGGTG 240
GTGTTGGACT GGCTGCTCCC CAGTTAGATA TCTCAAAACG CATTATCGCT GtTMGGTAC 300
CTAATATTGT TGAAGAAGGC GAAACTCCAC AGGAAGCCTA CGATTTCGAA GCCATTATGT 360
ACAATCCAAA AATCGTCTCT CACTCTGTTC AAGATGCTGC TC'rrGCCGAA GGAGAAGGT1' 420
GCCTGTCTGr TGACCGTAAC GTGCCTGGCT ATGTTGTTCG CCATGCCCGC GTTACTGTTG 480
ACTACTTTGA CAAAGATGGA GAAAAACACC GTATCAAACT CAAAGGCTAC AACTCCATTG 540
TTGTTCAGCA TGAAATTGAC CACATTAACG GTATCATGTr TTACGATCGC ATCAATGAAA 600
AAGACCCATT TGCAGTTAAA GATGGTTTAC TGATTCTTGA ATAAAGAAAA TCCCGTTGCA 660
AGACGGGGTT TTGTGTTATA ATAGAGGCAT GAAAACAAAT GATAT-rGTCT ATGGTGTCCA 720
CGCCGTTACC GAAGCCCTCC TTGCAA.ATAC AGGAAACAAA CTCTACCTCC AAGAAGATCT 780
CCGAGGTAAG AATGTTGAGA AAGTCAAGGA ACTAGCTACA GAAAAGAAGG TGTCCATTTC 840
TTGGACATCA AAAAAATCTC TCTCTGAGAT TACTGAAGGT GCTGTTCATC AAGGTTN'GT 900
TCTACGAGTG TCTGAATTTG CCTATAGCGA GCTAGATTAC ATCCTTGCAA AAACACGCCA 960 *AGAAGAAAAT CCACTTCTAT TGATTCTAGA TGGTCTAACC GATCCCCATA ATCTGGGTTC 1020 *TATCTTGCGA ACAGCCGATG CGACCAATGT TTCAGGTGTC ATCATTCCCA AGCACCGTAC 1080 TGTCGGAGTA ACTCCTGTCG 'rTGCCAAAAC AGCCACAGGT GCTATTGAAC ACGTtCCAAT 1140 *TGCCCGAGTG ACCAACCTCA GTCAAACCTT AGGATAAACT TAAGGATGAA GGTTTCTGGA 1200 **.CC'PTTGGAAC GGATATGAAC GG'rAC'rCCTT GCCACAAGTG GAATACAAAA GGGAAAATCG 1260 CCCTCATCAT TGGAAATGAA GGAAAAGGTA TCTCTAGCAA CATCAhAAAA CAGGTCGATG 1320 AAATGA'PTAC CATTCCGATG AATGGACATG TTCAAAGCCT TAATGCCAGT GTTGCTGCGG 1380 532 CCATTCTCAT GTACGAAGTr TTCCGA.AATA GACTATAAAA GGAAACT'r TTATGATTAA CTATGTrCTG TAATGAATTT TAGCTCCATC TCCAACCGCT GTTGTTACI' GGCGAAGGTC CTGCAAAGAT ACCGTCGACT GCAG1rrTCA TGTGGTTATC GATCTTGGAT ATTCAATTCT TTAACAAAAT CGCTAAGAGG AGACACCACC GAAGCCTTGT TCTGTCACTT GACCTG~rT AAGTTTCCAG TCATCTGATT ATAGGCTrCT TGACCAGCGA '1-I-CAAGCGA ACATCTCCAA TGTCACAATC CATCCTGCCT GTCCAAACCA ACATAGATAA CACATTrTCA A.ATACGACTG ATCCCAGATA AAGCTGATT ACGAAGTTGG TCACGACGGT GGC 'rCTTCA ACAGCTGAAT AGCACCATCA CACACAGCAC CACTCCCAAA GGACGGTGTr TTGGTCATCA GTCATCACTT TAAATGTGCI' CAACACCAAG ATrCTACTCG GTTTrCACCC T1'GATrCCC ?r'rCATTCCC AAAGGCGCGA TCTTGTAAAA GAACAATGGT AACAGTCTrA GCAAAACGAG CTCCACCACC AACTACCAAT AAATCTTGGT AGTAAGAAAC ACCACGACTG TTCAGTTCTT TAGAACCAGT TGCTACGATA ACTGTACGTG TCTTAA.AATC ACCATGGCTT CGACATTTC TrACTACAGA CCT'ITrGGGC
TCAAGAAGAG
CACGGAAGAA
CTTCTCCAGG
TTTCATATGT
AACATAACCA
1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2412 AT'PTTCAAGT GGTTCAAACA TCTTTTCAGC CAATTCAGGT CCACTAATAT TAGCGTATCC TGGGTAATTT TCGATATCAG ATGTATTATT CATCTGACCA CCTGGCAGAC CACCTTCAAT CAAAGCTACT 7'NTAGATTGC TTCGAGCAGC ATACAAGGCC GCAGTCATCC cTGCAGGTCC AGCACCGATA ATAATACTAT CGTACATATA GATTCCTTCT TrC'TTGGTGr AACTATCTTT ATTCTAACTC TG INFORMATION FOR SEQ ID NO: 63: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 7760 base pairs B) TYPE: nucleic acid STRANDEDNESS: double CD) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: CCGATTTGGT GGAATTTTG TCTCATCAT' TAGAAGGTGT TGCAAGAGCA GAGTTTACCT TGGTGCTTCA TACCAAATTG GGAGAAGCCT CTG=NTGGC AAATATTGTA AGGATGAATG GATT-TTAGGA ACAGTTGCTG GTGCCAATAC CTTATTGGTT ATCAGCACGT TGCCAAACTC ATG3GAAGATC GTTTGCTAGA TTTGATGAAA GTCTTGGGAG TTGCTCTCAA GACTTATTT TGAAAAGGAG AGACACAAAA AAAGTTATCA CCCGGCATGC AACAGTATGT GGATATTAAA AAGCAATATC
GATGTAAACA
ATTTGTCGAG
CATAACTAAG
TGCATAGA
CAGATGCTTT
TTGCTCTTT CGGATGGGTG ATMATGA GCAGATTCTG GAAATTTCCT TAACGAGTCG GGCCGGTT CCCI'ATCATT CTGCCCAACA TAAGG;TGGCT ATCGCAGAGC AGATGGAAGA AGAGGTGTT CAGGTCATTA CGCCAGGGAC GAATAATTTT TrGGTrrCCA 'rAGACCG.CGA TTTGGTGACG GGTGACN'TT ATGTGACAGG AATCCGTAAC CTCAAGGCTC CAGAAGTGGT ACAAATCCTC AGCCGCCAGA TGAATCTGGT kI-rATN'PA'r GAGGATrGCGG TCAATGCTGC CAACAAGP.AT GCCGACAATC CGATCCCTAT GTATATCGAT GTrCTTGAI-rG AGCAGGGr-rA TCCTAAACAA GCAGTTGGGG TTGTTAAACG AGTGGTCGAT AGCAGTAAGC CGGACAGTCA AGGCAATCAA 'rTTGGCCTAG CTI'ATATGGA TCTr7GAT TTCACGCTGG TTGTGCGGA G'rTGGGTTAT GACTTGTCTG AGGAAGAAGA ACTCTCTTAT GAAAAAGAAA GCTTGAAGA CCT'rCATTA ?TGGATTTGC CCAGTATGTT CATCGGACI'C CGAAATTAAG GATTTCTTGC GAATGCTCGC TCAGGTAAGA S S
GGC'TATGGGG
AATCGTCCAA
CTTGACAGAC
'rGGCAAAAC
ATGCGTCTCT
CGTCAAGAAG
AGTCTCAANGC
AATCCAAAGG
GATTGGCAAC GGTGCAGCAA ACGGCATCTA GTAAGCTGCT AGA'rGAGGGA ATrGAACCAC CTCAAACCTG 'rrATCCGCTA AGATGGATTA TGCGACCAAG GCTAGTCTGG AT~rGGTTGA AACAAGGCAG TCTTTTCG CTTTTGGATG AAACCAAAAC TGCGTTCTTG GATrCATCGC CCCTTGAT?G ATAAGGAACG TAGTGCAGGT CTTTCTCGAC CAT7TCTrTG AGCGTAGTGA GTGTTTATGA CATTGAGCGC TTGGCTAGTC GrTTrCT1r ATCTC7"rGCA GTrGGCGACT ACCTrGTCTA GTGTGCCACG GGATGGAGCA ACCTACTCTA GCCTATCTCA TCGCACAACT AGAGTTTGAT TAGCGCAGCG A?'rGCTCCTG AAGCTCCTCA 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 GATTCGTGCG ATT'IrAGAAG GGATGCAATC CCTGAG'rTGG
TGTGATTACA
TTGCGTTCTC
CTCTGGTATC
GACCAATTCG
GATGGGGGAA TTATCCGGAC 'rGGATTTGAT GAGACTTTAG ACAAGTATCG AGAGAAGGGA CTAGCTGGAT TGCTGAGATT GAGGCTAAGG AGCGAGAAAA AGCACCCTCA AGATTGACTA CAATAAAAAG GATGCTACT ATN'CATGT CAACTAGGAA ATGTGCCAGC TCACTTTTC CGCAAGGCGA CGCTGAAAAA woo.
CTCAGAACGC TTTGGAACCG AAGAAT'rAGC CCGTATCGAG GGAGATATGC TGAGAAGTCA GCCAACCTCG AATACGAAAT ATTTATGCGC AT'rCGTGAAG GTACATCCAG CGTTTACAAG CTCTAGCCCA AGGAATTGCG ACGGTTGATG TCTGGCGGTT GTGGCTGAAA CCCAGCATTr' GATTCGACCT GAGTTTGGTG AATTGATATC CGGAAAGGGC GCCATGCTGT CG?1'GAAAAG GTTATGGGGG TATTCCAAAT ACGATTCAGA 'rGGCAGAACA TACCAGTATT CAACTGG7*rA
TTGAGGCGCG
AGGTCGGCAA
TCTTACAGAG
ACGA'PTCACA
CTCAGACCTA
CAGGGCCAAA
534 CATGAGTGGG AAGTCTACCT ATATCCGTCA G1'TAGCCATG ACGGCGGTTA TGGCCCAGCT GC4T=CCTAT
TATCGGAGCA
GGCCAATAAT
ACGTGGAACT
TGAGCACATC
GTCTAGTTTA
CACCTTCCTT
AAGATTGCTG
GMCCTGCTG
GCAGATGACT
GCCATTTCGC
GCAACTTATG
GGAGCTAAGA
CAACACTTGG
CACAACATTG
GCTTGCCAGC
AAAGCGCCCA TTTACCGArr TrTTGATGCGA TT~rTACCCC TGGT'TCCGGG TCAGTCAACC ?1'TATGGTGG AGATGATGGA ATGCGACCAA GAACTCTCTC ATTCTCrG ATGAA-rGGG ACGGGATGGC TCTTGCTCAG TCCATCATCG AATATATCCA CCCTCTq-rGC GACCCACTAC CATGAGTTGA CTAGTCTGGA TCAATGTCCA CCTGGCAACT rTTGCAGG ATGGGCAGGT AACCGGGACC AGCTGATAAA TCtACGGTAT CCATGTTGCC AGACCTrTTA GCAAGGGCGG ATAAGAT'N'T GACTCAGCTA TCCTCCTCCC ATGAGACAAA CTAGTGCI'GT CACTGAACAG AGAAGAGCAT CCTA'rCCTAG CAGAATTAGC TAAACTGGAT GCAGGTTATG AATGTCrrAG TAGAGTTAAA ACAGAAACTA GAGAATCAAG GAACAGAGAG AT'IrCACTCT TTGATAGGGC CTGTATAATA TGACACCTAT TAAAACCAAG ACTCACTAGT AC7TTrTTGC TAGAATAACA CCCTrNTGrTC TA'N'TTTTAA GGCAGGCCCT GGCTTCCCTC CGcG7,rATcT GCAGGAAG'rC
TAATCTAGCT
TCACACAAAC
CCAGAAAGTA
CTGATCACAG
1TAGGCGCCC
GTATCAA:GGA
AGAATGAAAA
TGCTGATTCA
GCTTGATGGT
TCCTTACTGG
GACTTCT'rG ACAATTCTCC GGAGCTGACC CATTGTCGCT GAAAATAAAA ACC1'ACAAGT TGCTACTTCA CTTCTGCAAC GAAATATGAA GCTAr'rTA'rA AG7'rGCTGGT GG3ACTCAATG CCTrCGGGAG GATGCCT'rCC 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 GTATCGGGGC TTGGTTGATT TTGTCCTCGC AGCCTA'rATT GTAAAATTCA AACCTTrCr TTCGAATGAC AAATCATATC TTTTCAGACT TCCCCTCTTG CTCTGTCGTG GGTGATTGTT TGGGAATGAT GGGCCTCGT TTGCCAAGGA AAA'rTTACGT AATTTGCTAA GTTTACAGAG TATGCTGATA 'rTGAACAAT'r TAATGCGGGA AACCAGAT'rC AGAACGrrCG? CATGATGACC 'rTCATCGGTT CGTTTATCCT AGCGGTTCAA GGTGTCGCCC TAGTCGGTC'r GCCCAAGGAG TTTCATCCGA
AATCTAGTCG
TTCCAAA'rTC
ACCTTACCTT
CTCATGGTAG
TTTGCCAAGT
GGCGTTCGTG
GTCTCAGACG
TCTTGA7TTT TGGT'r'GACT GCTGTCATGA TTrCAAACCCT TCTTGAGCGC ATCAATGCCA TGGTCAAGTC CTTTGTCCAA GAAAA.AGAGC AGCTTCTTGG TCAAAACCTT TACATTGGrT TGTTGGT'rGG 'rrACGGGGCG GTCTTCCTCT ATGCCTTTTC AGTAGTrGGAA CCCTTTATCA CTATTTGGCT GGTCGCGGGA ATGGTTCAGT CGGATCCGTC TGrTTGGT TCCATCGCTrr C-IrTTIG=AA TTACCTAAGC CAGATTATCT TT~ACCATTGT TATGG'TTCA TTTTGGGAA ATTCTGTCAG CCCCCCATG ATTTCCATGC GTCGTATTCG AGAAATTCTT GACGCAGAGC 535 CAGCTATGAC C7rCAAGGAT ATCCCAGATG AAGAGTTGGT TGGAAGTCrr AGC71"TGAAA ATGTGACCTT TACCTATCCA ATGGACAAGG AACCGATGC'r GAAAGATGTG AGCTTTACTA '?rGAACCTGG TCAAAMGGTr GGTGTAGTTG GAGCGACTGG TIGCAGGAAAG TCAACCTTGG CTCAATTGAT TCCACGWrCC 7"TTGATCCAC AGGACGGGGC CATTAAAATC GGTGGCAACG ATATTCGAGA AGTGAAGTGAA CGAACCCTGC GTAAAACAGT TCCATCGTT CTCCAACGTG CCATTCTr TAGTG4GAACG ATrGCAGATA ACTTGAGACA CCGGAAGGGG AATGCTACTC TATTTGAAAT GCAGCGCGCA GCCAATArrG CCCACGCTAG TGAArrCATT- CATCGTATGG AGAAAACCTT TGAAAGTCCA GTrGAAGAAc GGGGAACCAA TTTCTCTGGT GGACAAAAAC AAACGATGTC GATTGCGCGT GGGATTGTCA GCAATCCACG TATTCTGATT TTTGATGATT CGACCTCAGC CTTGGATGCC AAA'rCAGAGC GCTTGGTGCA AGAAGC~rG AATAAGGACT TCAAGGGGAC GACAACCATT- ATTATTGCTC AAAAAA'rTAG C'TCGGT'rGTC CATGCAGACA AGATC7T=G TCI'AAATCAA GGACGAT'rGA 7TTGGTCAAGG TACGCATGCA GACrrGGTTG CCAACAATGC CGTTTACCGT GAAATCTATG AAACACAGAA ATGAAAGACA AACTATAAGA AAAGTCAATA GCTTATCTA AACTATTTCT TATTTCAATT TGATGATTTG GCGATGAT TAGAGCACGG CAAAAAGCCC TTGAAAAAGT CCATTTrTTC AAAGGTAATC CTGTGTTAAT TTCAGAAATT ACATCACTTT TTGrrCCTCA AATrGGCAGCT CT 7TTAC, GATATAAAAC ACGTTCGGA 'rAAGTTTTT TGCAAGGTGG ATGATGGCTA CA'rTGTAATG TTTTCCTTGT TCTAATTTAG TCTTAAGATA GGCCTTAAAA GCAGGCGAAA AGCGAGGGCA 'rGCTT'rGGCA GCTTGTATGA GTACCTACCG CAGATGAGGG GAACTCCGTT TGACCATTCT TCCTCCTAAA TCAATCTGAT CTGACTGATA AATAGAAGAA TCCAGTCCtAG CG;AAAC'rrG TAATTGAGCA GGATTATCAA AGGCATGAAT ATTTCGAA'rC TCAGCTAAAA TGACCGCCCC TAAACGATCC CCAATCCCAG TAACCGTCGT GATGACCGAG TT-GAAC'rCAG CCATCAAGTC ATTGACACAT GTTTCCCCCT TGTCAATGAG CCTCTTC'rAA TGTTTGATGT TTTCATTACA CGAGATAAAA CGTCTATGCG TTATCAAAC'r CATTACCAAT TAAAACAAAA AGCTGTGGTT 
AGATCC'r'r'C GGAAATTGTC AAGCGATTGG AGGAAA'rGAA CTAATCCACA GCGGCTTAT CCAAGTATAC CACTTCGCCT TTGGCAGTAG CTAACTGCGC TAAATATAAT ATAAGGAGGA GTAAAATGAA GACAGTTCAA T'rTrTTGGC ATTATTTTAA GGTCTACAAG TTCTCA2-IrM TAGTTGTCAT ccTGATGATT GTTcTGGCGA CTTTTGCCCA AGCCCTCTTT CCAGTCTI'T CTGGACAAGC GGTGACGCAG CTACCCAATr TAC2'TCAAGC TTATCAAAAT GGCAATCCAG AACT'rGTATG 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 '2920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 3960 4020 4080 4140 4200 4260 536 GCA.AAGCCTA TCAGGAATCA TGGTCAA'rC' TGCCCCTG GT~GG=rC TATTrATCTC TAG'rGTAATA TACATGTGTC TCATGACGCG CGTGATTGCA GAATCGACCA ACGAGATGCG CAAAGCCCTC TTTGGTAAGC TTGC'TCAGTT GACGGTTTCT TTCTTTGACC TGGCGATATC CTGTCTCATT TTACCAGI'GA TrGGATAAT ATCCTCCAAG AAGCTTGArr CAGGTCATGA GCAATATTGT TrATACATT GGTrCrGATTC
TTCGAGAAAT
GCTGATTrrC
GTCGACALAGA
CCTTTALACGA
TTGTCATGTT
CTTTCCTTAT
AAGAGGTAGG
'rTGTGCAAGG
GTCACGCTGG
ATCGTGAAAA
CTCTCATCAC CATTGCCAGC ACCCCATTGG TGGCACGCAA ATACACCAAC CTCCACCAGA GAAGCTCAAC GCCTATATGG ATGACAGCAT AAT'rCAAGAG GATATGATGG CACGATTCT CrAAAGGA AGAATGTTCT CAGGAATTCT TAATACAGCC ATCGCrCATCT TTGCTGGTTC AACAAGTACA GCCCTAGGTT.. TGATTGTTAT GCCTATTATC CAAGI"TGCAG CGAGTGGGC ACGAATTCAG GAAA'rG'TG ATGCAGAGGA CACTAAGTTG CAAGAAAGTG TTGAAA'rCAG ACCTATTTTG AAAGATCTCA GCAT'm'CTGC CTCAGGCCAA AAAGCCGTGA TGAACAAA.AT GAGCGCGTGC GCAAGGCAAC TTTCCCTGTC ATCAATGGGA TGAGCCTGAT GGCTGTACTT TTGAATGATA AGTCTATTGA G~rrGCACAA TTTCACAGC ACTPACrACCA AAGCCTTCAG TTGCCTT'rA CTCGAGCTGA GGAAATCCGA CCTGAAAAGG CTCCAACC7?? TCATATCGTT TTr'rCATACT CCCTAAAGGC CAGATGACAG GCCGCAGGT TCAGGAAAAA CGACTATTAT GAACCTCATC AATCGCTTTTr TGCTGGTGGT ATTTATTT'rG AACCAAGGTG CGAATTGTAT TATCCGATTT GGTGTGCCAG CCACATTCAC GACTATATCG CCAGAGCATC TTTTCAACAG AGATCCAGAA GTTCTCATTC CAAGATTCAG CATGCCATGG CCGCTTGAAA ACCATTCTCA ATGGTAAAGA CATTCGTGGC TATGACTTAG TGCAAGATTC CGTCTTGTTT AGCGGAACGA ATGCTAGTCA GGAAATGGTT GAGGTAGCAG AAAGTTTGCC TGATAAGTAC GATACTCTTA GGCAGAAGCA AT'rGATT~TCA ATCGCTCGAA TCGATGAAGC AACTTCAAAC GTAGATACGG AGGTGGTTGT AGCAGGTAGA ACTAG CG
TGCCTGATAA
CAGTTGT'rGG
ATGATGTTGA
ATAGTCTTAG
TTAGAGACAA
CAAAAGCAAC
TTGATGATGA
CCCTGATGAC
TGACAGAAAG
TCATTGCCCA
5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 ATCCAGATCA GATTATTGTC CTTAAAGATG GAGAAGTCAT .TGAACCTGGT AACCACCATG AACTTT'rGAA GCTAGGTCGC CAATCAATTr G'TITTCGAAT AAGAAAGAAG 7"=?CTATG ATAAAAAATG T 'rATCACAG CCTTAAAAAA AACATATTAC TATGATAGGA CTATCGTTAG CATTCGAAAC GAGAGGCATC AGTTCCTGCA AATCTATGTC CCGTAGACGC AGAAGGCAAA TTTTATTCAG AACTCTATCA TGGGCAGCTT TTTCTTGTCC ACGAAAGTCA 1'TTTGAGTGA ATGGCTAGAA CGGTTGTAGG ATCATTCATT CATCTGTATC TGTrAGATTC GCAGAGATCA TTCGTCAAGT CGGTGGTCTC CCTrAGTCA TGATGAGTCA GTTGTACGTG ATTATGTGGA AATGATTGAC AAACTCATTT CCAAAATGTT CATCCTCAGT 'N'TATGGAGA GAAAAAGACC GTCGAGAGCG TCTGGTCCGT GACGAATTTG AATTGGCACT CTTGAAGGAA GCGCTTCGTC AATTATGGCA ATCTGTCGCG GTGTCCAACT TGTCAATGTT GCCNTGGTG TCAAGAAATC GAAGGTCAGG
INFORMATION FOR SEQ ID NO: 64:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2723 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:
TTCCTGTGG
TGACAGGAGG
ATGATTACAA
AGAATAAACC
GAACCCTCAA
7500 7560 7620 7680 7740 7760 GAGGTTTTAA TTCACTTACC TCTsCCGTAT CTTTATTTA-A AATGAATTCT TTTACGGTTG
TATTTCTTGC
TAATATCATT
AAAATCTTTT
AAATGATGTA
ACAACAATCT TAATGTTTAG TGTCTTGTCT TAT'rCTTTTC CATTTATATA AATATGTTGT CACCATCGAA TCCATTATTT CATATTCGAT ACTAGTATTT ATAAAAACTG ATAGTGGACT GATAAACATT TCCATCCATA TCATTCCCCA GTCCATGATG TGTTATCAAT GTGTAAATCT
CTTTTATCAT
CCCTTAGGAT
CCAACTGCTT
TCTGTTACCT
TCACCGTCTT
CCGTAGATTA
TGATGTTAAA GACTACAGAT CAATGTTTAC TTCGGGTTTA TAGCATTCAA ATCGCTATAG TATCTGGAAA TCCGTTTGCT TAACATTCAG CTTAATATTA AATAATTATC TACAACCGAT
ATTATTTGT
TCTTGAATCT
ITTCCATCAG
ACATTATCAT
CCAGTTTGAA
TTATAGTCTT
AAATCTCTAG
TCATTAACTC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 TCAATTCCCA GTTAAAACCA CCCTTATCAG AAATCTTACC TCTTAAATAA AAI7CTGGAT T'rCGTACATA AATTTTATTA GATTTAGATG GATTAAAGTA GTTCTTATCC ATTGAAAGGT TTACTGGTTT GGTATCAATA AATAACATGG ACTTATCCTC TCCAGTGTAT TCTTTATCAT CTGTCACAAG A'rFTCCGTCT ?1'ATCGATAG ACACTTGATA CCTTATAATG TTAAAGCCGT TTC?'TCCATC TTCAACATTT CTACTATCAG AGCCATCTTC TTTTATAGCT TCTACATTGA CCTTACCAAA TAATACAAGT TTAGAAGAAT CTTCCCCTTT ATCGTTCATT TTAAATGTAA CCAAAGCCGA CATTAATACA GATTGGGTAC CATAAATTGT TGTM'CTGAA AGGGCTCTTA GATTAGGATT GGCCTTTTGT AIwrrrCCTA TATCTTCCTT GCTATAGACT CCATTTCCT 538 CTAACATATC CG1TrTPTCCA GGAFATAGG TAGTCACUrT TAGTGCATAG CCTTrrCTTA GAATGATATT ATICCNTTAAC AGATATTG'rr GTTTTCTGA ATCAGAATAG ATTTTACCAG ATTCCATT?1' AGTTAAATTG TCTGGTTT1GT ?1r'N'GAAAG ATCTCCTTCC CCTAATTCTA TG.ACATTCCC ATAACT'rGAT ACATAGGGAT A? CTGATTT AGTTTCCTTA ATrrrl-rCAG GCATTCTAAT TTTAATTTCA GCTTTTTTCT GATCATTATC TTTAACAAAT AATCTCATAT CTCCTGCAAA AGCTAATCCA TCCACAATAT CATTAATATT AGCG1'ATAGA TCAAATCTCA TCGNrTTGA GTGGAAATCA TACTTGGTCG CTTrTGATTTC TATAGATTTA TAGTTATcCC 1080 1140 1200 1260 1320 1380 1440 CATAATATAC CTTGGCATTT CATCTTTAGA CGCACTTAGA TTTCATAT'rC TACATCAGTC TATAATCGTA TTCCTCCATT ATT'rCTTTT-T AATGAGT'N'C TrCTATCAAT AGTAAAACTA TTAGAAACAT TACTTATCTT TCCAAGAA?'r TCAAAGTGTC ACACCATAAA TTTTTGATTT GATTT'CTCrA AGTTTCTCAG CCATCATCGT AGGCTATTAT ATrI-CCTTTA TCATCGTATT
CTCTTACCAG
TT'FAAGTCTT
GATTTTrCTT TTTCACTTGT AAAATCATCA ACTTCTCTAA TATTTTCAAA CTCTCTAAI' GTTGAAA'rAT TAATAGACTC TTCATTTTCT TGATGATGAT 9.
9*q ft 9 9 9. .9 9 9 .4 9 9 9 99'.9 9 99 9.
4* 9 9 GTrCTACCCC AGTGTATCT 'N'TTTTAGAC TACCCTCTTT ATTTAGATTC TGCAA'rCTCG CCAAGC?'r?1 GATATTrAGA CTAGATAATA GGAAATCATC CCCTTTTCAT CAGCCTGATT 'rCTTTGTGAA ATTGCTAGAA CCATCTAATG CAATGACTTC TCCATTTCCT AAATTTTTAA TGAATCT'rGA TCACGATCTA AGCAAATTTA ATrCTATGAA AATGATTTrT CCCCTTAA.AT CTCCCGCACC TTTAATTTCA TAAATGGTAT TTCCGTCTTT ATCAAGTTTT CTAT'T-CTTC CTTGACCCTC ACCTGCGTAA GTTACTTCAA GATTrTTTTC AACCrCrCCA TCTTCATTAA 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2723 CAAGAGCGGC GCCAGCATAC CAAACTTCGT CT?IGATC .TCTCGCAAAT AGCGT~rCAT TATCCTTTGT AA'rCAACTTA ATTT'TTTCAG GGGCGGTGTT ATCAATTTTT ACAGGAATAT ATCTATATTT AAATTTATAG AAATATI'GAC TCTTAGTAGC AGGATCTTGA TrA'rCC~rAC -GATrATAGAT GAGTCCATCC CACTrCAAGT TCGCAATC'rC GTCA.AATTTT TCAGGATGTT TCTTATAC'rG ATCT'TTTACC TrATGATAAG GATTTGAAA-A ATCAACCGAA ACAATCTrAG AGGAAACCTG CCATGGGTAA 'rCTrTAGTTA CTTCCGCAAT CGGTTCAAAT TGACCTCTTA TrTTCTGGTGC ATTTTCTTCT CTACCTCTAG CACCCCAAAC TTTTAGTTTA GA'rGATTTGA 9. 9 9 9 TT1CCCTTTGC ATCATTGCTT TTAGAATTTA AAATTCCTCT AATA.AAGTGT TCTCTCGAAA.
TGACTTAA GTCTCN'TGA TTTTCTCCCT CTr'rATTTGT ATTTACTATT GAAATCAATC CTTCTTCTGC ACTTCTrAAT ACA
INFORMATION FOR SEQ ID NO:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11831 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
AAAAAAGTGG GAATGACTCA ATTGAAGCAA CTCCAAACGT GCTATCCAAG TTGGTTTCGA CATGTAGCGA AAGCTAACAC AATCTTCACT GAAGCTGGCG AATTGATCCC TGTAACAGTr TGTTCTCAA CTTAAAACTG TGACAAACGC GAAGTATTGA GGCTCCTAAG CGCTTCATTC AATTACAGTT GAAACATTCG TAAAGGTTTC CAAGGTG'rTA TTGAAACAGA CGGATACAAC GCAACAAACC TGCTAAAGGA
GGCTTGGAAG
GTAACGGGTA
CGTGGACCAA
GCACCTAACC
ACAATTCAAA
??GGTGCTGA
CTTCTAAAGG
TGGCTCACGG
GCGTATTCAA
ACCTTCAAGT
GTGAATTCAA
CAGCTGGAGA
TCAAACGCCA
CAGGTTCTAT
GTATGGGTGG
GGTAACGTAC CAGGTGCTAA AAATAATAAA GAAAGGGGAA GGTAAAGAAG CTGGCCAAGT TCAGTTG'rGT TTGATGTAAT GTTAAAAACC GCTCTGCAGT GGACGTGCTC GTCAAGGTTC GGACCAACTC CACGTTCATA AAATCAGTTT ACTCTGAAAA TrTACAGCTC CAAAAACTGC AAAGTTCTTG TTATCCTTIGA CCAAACGTGA AAGTTGCAAC AAACTTCTTG TCACACAAGC TGTATGATGT TATCAAAAAA GAAAATATGT A'N'TGAAGTT AAGCTGCTTT CGAAGGTGTT 'rTCTCGTTAC AG43TAAAAAC
TGTACAAGTT
GAAATCTCTT
ATCAGTCACA
TGTTICTTAGC
CATCAGCCAA
ATCAGGTGGT
TATCCGCTCA
CGGCTACAAA
GTTCCAGAAA AGAACGTTAT ATCACTATCA AATCAGCAGT ATGGCAAACG TAACA'N'ATT GATGCAGTAT TTGGTATCGA CGCGCAAGCC TTCGTCAAGG GGACGCAAAC CATGGCGTCA CCACAATG'GC GTGGTGGTGG CTTCCACAAA AAGTTCGTCG
CACCGTCGTC
CTTGCAGGAC
AAACGTTGAA
CGTTGTTGAC
CGGACAATCA
GGGGCCTGTT
CGACCGCGTA
CCTTATCAAA
TAAAGCTGGT
TGACCAAkACT
ACCAAATGAA
AACACACGCT
AAAAGGAACT
TGTT-GTCTTC
CCTAGCTCTT
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 AGTTGCTGAA AACAAATTCG TAGCTGTAGA CGCTCTTTrA TGAAFTTTGCA AAAGTTCTTG CACCATTGAG CATCGATTCT AGAAGGAAAT GAATTCGCAG CTCTTTCAGC TCGTAACCTT TGCTACAACT GCAAGTGTTC TTGACATCGC AAATAGCGAC AGCTATCTCT AAAATCGACG AGGTTCTTGC ATAATGzAATT CCTGTCATCA CTGAAAGCTC AATGGCTCAA CTTGAAGCAG GACACTCCTG CACACAAACT TTTGATCAAG CAAGCTGTTG AAAGTTGCCA ATGTTAACAC AA'rCAPACGTA AAACCAAAAG 540 CTAAACCTGT TGGACGTTAC ACTGGTT?1'A CrAACAAAAC TAAAAAAGCT A'rCATCACAC TTACAGCTGA TTCTAAAGCA ATCGAGTTGT TTGCTGCTGA ACCTGAATAA TCTAAGGAGG AAATATCGTG GGAArrCG'rG 'rTTATAAACC AACAACAAAC GGTCGCCGTA ATATGACTTC ?n'GGAN'TC GCTGAAATCA CAACAAGCAC TCCTGAAAAA TCATTGCTTG GAGCAAGGCT GGTCGTAACA ACAACGGTCG 'rATCACAGT'r CGTCACCAAG CAAACGrTC TACCGTrTGG TAAA.ACAATC GAGTACGATC TTGACTTCAA ACGTAATAAA GACAACGTTG CAAACCGTTC 'rGCAAACATC GCTC7*MTAC TCGCTCCAAA AGGTCTTGAA GTAGGTCAAC AACTCGGAAA CGCTCTTCCA CTTGCTAACA
CGGTGTGAAA
AGGTCCAGAA
TACTTTGAT
TGGTGCATCT
AGGTGAAGrr
ACAACATGGA
AACAGTTCGT
GCATACATCA
GCAGATATCA
CACAACATCG
GCTCAAGTAT
CGTATGATTC
CTTGTAAACC
GGTTCTGTAA
AGTTGAAACC
TGGGTTCTGA
TTGGAACTTG
TTGGTAAAC
TGAACCCTAA
CACCATCTAC
AGGTCGTGGT GGTGAAT'rGG AGGTAAATAT GTTCTrGTTC
TTGCATTGAA
GTGGTGGACA
AAGCAGTrT
ACTACACTGA
GTATCGTTTC
TCCCAGTTGG
TACGTGCTGC
GTCTTCAATC
TCGGAAACGA
GTATCCCCCC
GTGAAGGTAA
TTGGTCTTAA
ACGAGAAATA
CTCCATAGGA
CGCAGTCTTA
ACCACCAG'r'I GCTCGTAAAG AACTCGTAAC AAGAAAGCGA AATCTGACAA ATA'rTAAACT AGTCGCTTAA GCAAC'rAGTA GTGCAAGCCG CTGTGGTACA ACATr'rAAAG AAAAAGGACC 'TTrCGTCGAT GAGCATTTGA AAAAGAAAAA AGT'rATTAAA ACTTGGTCAC
CCGTGCTACA
AGGACGTAGC
CGATCACCCA
TCCATGGGGC
AC=ATCGTT
AATCCGCCAG
GAGAAAATAT
TIGAAAAAAGT
GTCGTTCAAC
GTTGGTGTTG
CGTTGGAAAG
CACGGTGGTG
AAACCTGCTC
CGTCGTCGCA
CTCGGTAGCG
AAAAATGGrGA 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 TGAAGCTCAA GCTALACGACG GATCTTCCCA AGTTTCATTG TGT'r'IACATC CAAGAAGACA TTACAAAGGT CACGCTGCAG GTT'ACACTAT TGCAGTTTAT GACGGACGTA AACACGTACC TGGTAGGCCA CAAACTTGGT GAA7TTGCAC CAACTCGTAC ACGACAAGAA AACACGTAGA AAATAAGGAG AACATAAATG GCAGAAATTA CTCAGCTAA AGCAATGGCT CGTACAGTAC GTGTTTCACC TCGTAAATCA CGTCTTcTTC T1'GATAACAT CCCTGGTAAA AGCGTAGCCG ATGCAATCGC AATCTTGACA 7TCACTCCAA ACAAAGCTC TGAAATCATC TTGAAAGTTT TGAACTCAGC TGTAGCTAAC GCTGAAAACA ACTrTrGG7TT GGATAAAGCT AACTTGGTAG 1'TTCCGTCCA CGTGCGAAAG TGTAGCTGTT GCAGAAAAAT TATCTGAAGC ATTCCCAAAC GAAGGACCAA GTTCAGCTTC ACCAATCAAC AAACGTACAG AAGGAGGTAA AATCGTGGGT CAAAAAGTAC
CTATGA.AACG
CTCACATCAC
A'rCCAATTGG TATGCGTGTC GGCATCATCC GTGATTGGGA TCCCAAATGG TATGCTGAAA AAGAATACGC 541 GGATTACCTT CATGAAGATC AGCAGTTTCA ACTATTGAAA TGCTAAACCA GGTATGGTTA ACTTAACAAA TTGACTGGAA ?T'rGGATGCT CACCTTGTAG CCGTCGTGCA CAAAAACAAG AACTCAAGTA TCAGG'rCGTT T'rGCAATCCG TMAATTCGTT CAAAAAGAAC 7?rGCGACC TCGAACGCGC AGTAAACAAA GTTAACGTr'r CACTTCACAC TCGGTAAAGG TGGGCTAAC GTTGATGCaC TCCCTGCAAA AACAAGTACA CATCAACATC ATCGAAATCA AACAACCTGA G1'GAAGGAAT TGCTCGTCAA ?1'GGAGCAAC GTISITGCTTT CAATCCAACG TGCAATGCGT GCTGCAGCTA AAGGAATCAA TGAACCGTGC AGATATCGCC CGTGCTGAAG GATACTCTGA AGGAACTGT CCGCTrTCACA CACTTCGTGC TACTACATAC GGTAAACTTG GTGTTAAACT TCGTAAAAAC ACTAAAGGAG GTAAATAACC TCGTGAGTTC CGTGGAAAAA TGCGCGGTGA TGAATACGGT CTTCAAGCTA CAACTAGCCA TCGTATCGCC ATGACTCGrr ACATGAAACG ACACAAATCA TACACTGCTA AAGCTATCCG TGAAGGTTGG GTAGCACCAG TTAAACGTGG TGAAGAGATT GCACGTGAAG CGCT'rCGACT ATTrCGTAAAA CGTGAAGCAG AATAAGGAGA TGTTAAAGAA CTTCGTCGTC TrrCTCAAGA AAAAGAATTG 'TTGAACTTC CTTCCAAC CTTGAAAGAA GTTAAAAAAC AAATCGCTCG ATAGACTAGG GAAGGAGAAA TTTCAATGGA TGTTGTATCT GACAAAATGG ACAAGACAAT CCCAGTCTAT GGTAAACGTA TTAACTACTC 7TGT'rGCCAAA GAAGGCGATA TCGTACCTAT
ATGGATCTAC
AATG=rAGTA
AGCAAAAGGT
CTIGGATCACI'
TGGTGGTAAA
TGTGCGTATG
TAAAGTGATG
TGCTAGCCAC
AGGCATGAAA
AGAACTCGCG
TGCTACTGGT
CATCAAAACA
ACGCAATAAT
CACAGTTGTA
TAAAAAATAC
CATGGAAACT
CGTGGTGAAG TTCTTCCAGC CCTAAACGTG TTAAACACCG GGAAAAGAAG TAGCATTCGG AACCGCCAAA TCGAAGCTC GTTTGGATTA AAATCTTCCC GGATCTGGTA AAGGGGCACC TTCGAAATCG CTGGTGTATC AGATATCGAT 'rACGCTTGGG AAGAAGCAGA 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980
AAATTGCCAG
CTTAATGAAG
AACCGAAA
CAATTGGAAC
GTTCA.ATCTG
CGTAAAGTC
GTTGAAACAA
AAAGCTCATG
CGCCCGCTTT
TTAAATGTAA
TAAAAGAATT
ACGAATTGAA
AAACAGCTCG
AAGCGAAATA
TTG7TGGACC
AACGTAACCA
ATCAAAACAA
CAGCTACAAA
ACCTGAAAGG
AGCGGTGCTC
AACATCGGTG
AAAGGTGACG
GGTTCATACA
ACGTTTCCGT CTTGTAGAAG AGAAA.ACTGA AATG.ATTCAA GCGAAATCTT GACTATCAAA ATGTTATCGT GGCATCTGTA TTGrTAAAGC AGTTATCGTT TTGTTGAAGA AGCGGTCATC ATCTAATCAA ACAGAAACTC GTTTGAAAGT CGCAGACAAC GTTCTTGGTG GTTCAGGACC TAAATTTGCA AAACAAGCTA CTCCTGGTGG TGCGGTTAAA CGTACTAAAT CAGGTGCTCG TCGTGCTGAT 542 TCAAA'NTGA CGAAAACGCA GCAGTTATCA TCCGTGAAGA CAAACTCC' GTATCT7TGG CCCAGTTGCA CGTGAATTGC GTGAAGGTCG CTTCATGAAG TTGCTCCAGA AGTACTT'rAA GCCCT'rATGG GCGTAAGAAA AGTTCGCGTA ATCGCTGGTA AAAAGTAAAC AAAG3TTATCG TAACGAGCTT CCTCAAGGTG TCAAGrrrc GACAAAAATG T'T=AGGAA CAAACTAGTC CCCTAGCTTC AATCAACGAG AAACCI'AA'rG TTTGAAAAA AAGATAAGGG AACAGAAGCT GTTGTCCTTA TTAAGGTGT 'rAACATr AAGAAACACC GTATCATCGA GAAAGAAGCA GCTATCCACG GTGTAGCTGG TCCTGTTGGA TACAAATTTM
CGCGGAACAC
ATCGTGTCAC
AAGCTAGGGT
AAGGCGACAA
CTGCCCr-rCC
AACCTCCAAC
TATCAAACGT
TAGACGGTAA
AAAAGTTCGC TACAACAAAA AATCAGGCGA AGTCCTTGAT AGTATAATGCG CAAA'rCGTTT AAAAGAAAAA TATCTAATG GAACAATTCA ACTACTCATC AGTGATGGC ATGGCGTGTTG CTGAAGCTGT ATCAAACGCT GCACrrATCT CAGGTCAAAA ACCACTTATC CGTCTTCGTG AAGGTGTTGC GATCGCTGCA GAATTCTTGG ATAAA'rTGGT ATCAGTTTCA CCAACAAAAT CA'N"rATGG ACGCGGGAAC TTCCCAGAAA TCAACTTCGA TGACGTTGAC ACAACTGCTA ACACTGACGA AGAGTCACGT GCAAAATAAT ATAGGAGGTA AATCTAATGG AACGCCAAAA AATTGC'IAC CGTTATCCTG ACTACGAAGG TTTATCTAAA rrACCTCGCA GTAGGGTTAC GGGGCGCCCA CATTCAG'N'T 'rTCGCGAAC'r TGCGCATAAA GGTCAAATTC AGATATCAAG AGCGTCAAAA CTCCAAGTAA TTCTAGGAAA GTTTATCTrr TTCACACAGA
GTGCCTAAAG
AAAACCTT'G
ACTAAAGCTA
AAAGTTACCC
CT'rCCACGTG TACACACT'rc
AAAACTCGTG
GCATTGCT'rA
CTAAAAAATC
AAAAACGTGC
ACCCCTCACC
ACCGCAAATT
C'rGGTCGrAAC
AAATAGGAAA
GTTTAGCCCG
TAATCACGAA GGAAAGGAGA AAGTAGT'rCC TGCTTTGACA TAGATAAGAT TGTTTTGAAC AAAAAGCTGC TGAAGAATTG AAAAATCAAT CGCCCGCTTC TTCGTGGTGA ACGTATGTAC TACG'rGACTT CCACGGTGTC GTGTGAAAGA ACAATTAATC GTCTTGACAT CGTTATCGTA CAGGCCTTGG ALATcCCTT AATGGTAGCT AGAGAGGCTA TGCATTAAAG GCGGCAGGGG GACTCGTTlTA CATAATCGTT TGGTCTGAGT CGTATCGCTT AAAAGCATCr TGGTAATTTA CTTGACGAAG AAACTAAAGT ~GT'rCAA'Tc GGCTTGCCAA 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 TTTGAACACG AGCTACAGCT TTGGCAAAAA AGACCPA=?~ GCTTTGGAGC A'rTGCTTCTG CATTAAATTG TCTATTG CTCGTGCTGT TACGCTCTr? GTATCATGTA rrAACTAGCA AGTGCAACTT GCAAACTACT AGTAAGAGGA GAAAAACAAA ATGGT'rATGA CTGACCCALAT CGCAGATTC CTAACTCGTA TCGTAATGC TAACCAAGCT AAACACGAAG TACTGAAGT ACCTGCATCA AACATCAAAA AAGGGATTGC TGAAATCCTT AAACGCGAAG GTTTTGTAAA 543 AAACGTTGAA ATCATTGAAG ATGACAAACA AGGCGTCATC CGTGTArZ'rC TTAAATACCG ACCAAATGGT GAGAAAGrrA TcACTAAcTT GACGTGTT TCTAAACCAG GACTTCGTGT CTACAAAAAA CCTGAAGACC TTCCAAAAGT TCTTAACGGA CTGGAATTG CCATCCTT~c AACTTCTGAA GGTGCTTA CTGATAAAGA AGCACGCCAA AAGAATGTTG GTGGTGACGT TATCCCAC GrTGGTAAA ATCAAGATAC AAAGCTCGTA AAGAACAAAG CAAAATAGG AAGTTGGAGA AG~rrG~rrA CAAACAAGCC AACTTATCTA TITTGCACAG TTCrTAGAGC GTGTTCAGrr CAGCTC~rGA ACTAAATAAG TATCTGAACC CCCTGAAAAC TGGCCGTTCT' 6840 6900 6960 7020 7080 7140 7200 GCCCTGACAA TTTAACAGGA GAAAATAAAC TTCCCTGCTG GTGTTGAACT CGCTAACAAT GGAGAACTTA CTCGTGAGTT CTCAAAAGAT ACTCTTCACC GTCCAAACGA ?TCAAAAGAA CTMGAACA ACArG7TGT TGGTGTATCA GGGGTTGGTT ACCGTGCACA GCTTCAAGGA CATCCAGACG AAGTTGAAGC TCCAGAAGGA ATCG7TTG=A GCGGAATTTC AAAAGAAGTA ATGTCACGTA TTGGTAATAA AGT'rATCGTG GACAACGTTG TAACTGTAAA AGGATCTAAA ATTCAAATCC GTGTGGAAGG TACTGAAATA ATGAAAACTA TCCACGGAAC TACTCGTGCC GAAGGATTCA AGAAACAACT TGAAATGCGT TCTAAACN'G TTTTGGCTGT TGGTAAATCT AT'rACT=G AAC??CCAAA CCCAACAACA GT'rCGTCAAA CAGCTGCrrA CGTACGTAGC p 0 p C'rTCGT'rCAC CG'rAAAGAAG TTTCCAAC1TT
AACCAGATAA
GAACTGCTGA
TGATTGATGA
CAAAAGGAAC
CAGAACCATA TAAAGGTAAA GCTATCCGTT ACGTTGGTGA AT'rCGT'rCGC GTAAAACAGG TAAATAATGT TGAGTGG?'rG ATCATCAACC ACCAACCTAT TGTGCATAGC ACACGATTT~A AAACTAAAGA GGTGAAAACT GTGATTTCAA AA.ACAAACTC CGCCAAAAAC GCCACCGTCG CG7TTCGCGGA AAACTCTCTG TCGCCCCACGT TTGAACCTAT TCCGTTCTAA TACAGGCATC TACGCTCAAG CGTAGCGGGT GTAACGCTCG CAAGTGCTTC AACTCTTGAT AAAGAAGTTT TAAA.ACTGAA CAAGCCGTTG CTGTCGGTAA ACTCGrTGCA GAACGTGC)AA 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 p.
0 0 0 ACGCTAAAGG TATTTCAGAA GTrGGTGT7CG ACCGCGG'rG TGAAAGCTTr GGCTGATGCA GCTCGTGAAA ACGGATTGAA GAAAATGGCA T'rTAAAGACA ATGCAGTTGA ATrAGAAGAA TGTTACAAAA GTTGTTAAAG GTGGACGTCG TCTCCTTTC TGACCACAAT GGTCGCGTAG GATTrTGGTAC TGGTAAAGCT CCGTAAAGCA GTAGATGATG CTAAGAAAAA CTTGATCGAA AATCCCACAC GAAGTTCTTT CAGAATTCGG TGGAGCTAAA ATATCTATAT CACGGACGTG ATTrCTAATAG GAGGACACTA CGCGTAGTTG CTGTCAACCG GCAGCTCTTG TTGTTGTTGG CAAGAAGTTC CAGAAGCAAT CTTCCTATGG TTGGAACAAC GTATTGTTGA AACCTGCTGT 544 AGAAGGTTCT GGAG'rTGCCG CTGGTGCTGC AGTTCGTGCC GTTGTGGAAT TGGCAGGTGT 8580 GGCAGATATT ACATCTAAAT CAC7"rGGTTC TAACACTCCA ATCAACATTG TGTT'GAAGGT 7TGAAACAAT TGAAACGCGC TGAAGAAATr GCTGCCCTTC AGTTTCTGAT TTGGCATAAG AAAGGGGATA AAA'rGCTCA AATTAAAATr AGTCTCCAAT CGGACGCA'rT CCATCACAAC GTAAAACTGT 'rGTAGCACI' TTCGTGCAAc
GTGGTATTC
ACTTTGACTA
GGACTTGGCA
ATCACAGCAC
TC'rGCACTGT
ACTTGATGGG
AATTGAACAG CrCTGTTATT AAAGAAGATA TATCTCACTT ACGTAACAGTT GAAGAAGTAA ACCATCCCCT AAAACTAGAT ATAGTCATCT GGAGACAACC TTrCTCCCT TATCGGCGCT TGAAACTTCA TGAATTGAAA CCTGCAGAAG GTGGTACTTC ATCAGGTAAC GGTAAAACAT GTAGCGGTGG CGGAGTTCGC CTGGN'TTG TTCCAAAACG TGGATTCACT AACATCAACG AATTGAACCT CTTTGAAGAT CGTGCTGAAG TTGTTAAAGC TGAAAAGTCA GGTATTAAAA 'rGACTGTGAA AGCAGCTAAA TTCTCTAAAT GTTCAGTAGA AGTCATCTAA GAGAGGTGAC TTAAAGTCAA GCAGGTTCGA TCAAAAATT GTA'rCGGAAC TAGCATTACA GT'rCCTGGTG GATTATCCTT C'TTAAACATG TTCAGCTTGG ACGCTGCTA'r ACTAATGAaG
ATGATGACAT
AGCATTTAC AAAAGAGGAG AAAATAAAAA G'rTCTCGTAA AGTACGTAAC CGCGTTGGTC C'rGGTCGTG4G TCAAAAAGGT CAAAAAGCTC AAGGTGGACA AACTCCATI'r T'rCCtTCGTC CTAAAGAATA CCCAATTGTG AACCTTGACC TAACTCCAGT TCTTCTTATC GAArCAGGAA TTCrrGGTAA CGGTGAGTTG ACTAAGAAAT CAGCTGAAGA AGCTATCACT GCTAAAGCTG
CCGTGGTATG
7TTTTAGGGGA
CGTATAGGCG
8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 CTATGrTTNT
TATTTACAAT
TGAATCCCAA
TGTCGGGGAA
TAAA'rTATTA iGAGAAGCTC TTTTATCGTT TTGGTCTT'rC TAGCTTGAAT GCTTTAAGTG TGCCCTAAAA AACTTTTCGA
TTTTTGCCCT
TGGATATTT
TGAATCAAGC
CAGCTGGTTT
TT7rTCGAC
?AGCAAATTAC
TTTCCTCAAT
AGGAGTTAGT CCCTATATCA ACCCAAGTT'r GTAGAGTCGGG TACTCGTTAT AT'rCCTCTAG TAATACCTTG GCTGGAGCTC GATTCGTATC ATCTTAACAG AGATAAGGGA TACGGAAACG TCCAGAGATG AT^TCACGGCA CCGCTTCTAT TGTTGTCCAA CTCTTGCAAA GTAAACAAGG GGAAGTAGGT CGAAGAAAAT TTCTCGCTTT TGTCAATCT AATTGATTAA AACTGCTTTA CTC4TAGTAT GATTGTCACT
ATCGGGATTA
ACTCCACAAG
TGGTTCGTG
0 GTGTTTCCAT
TCTATGTGGA
GATTATCTrr G4CCGGGATTG CTACtTTGTG AACGTCCCAA GTAGCCGTAT CACTTCATCT A'rATTTTCG TAATCATTTT GATTATTACT GTATTGTTGA TTATTTACl-r TACAACTTAT GTTCAACAAG CAGAATACAA AATTCCAATC CAATATACTA AGGTrGCACA AGGTGCTCCA TCTAGCTCTT ACCrrCCGT'r AAAAGTAAAC CCTGCTGGAC 545 TTATCCCTGT TATCTTTGCC AGTrCGATrA CTGCAGCC'TG CGGCTATTCT TCAGTTTTrG AGTGCCACAG GTCATGATTG GGCTTGGGTA AGGGTAGCAC AAGAGATGTT GGCAACTACT TCTCCAACTG GTATTGCCAT GTATGCTTTG TTGATrMTTC TCTrTACATr CTrCTATACG TTTGTACAGA 'rrAATCCTGA AAAAGCAGCA GAGAkCCTAC AAAAGAGTGG TGCCTATATC CATGGAGTTC GTCCTGGTAA AGGTACAGAA GAATATATGT CTAAACTTCT TCGTCGTCTr GCAACrGTTG GTTCCCTCTT CCTTGGTGTG ATrTCCATTT TACCGATrGC AGCTAAAGAT GTATT TGGTC TTTCTGATGT TGI'GCCTTT GGTGGAACAA GTCTCTTGAT CATTATCTCTr ACAGGTATCG AAGGAATCAA GCAATTGGAA GGTTACCTAT TGAAACGTAA GTATGTTGGr TTCATGGACA GAACAGAATA AAAGTATTTA CTGAATCAGT AAATACTGAG GGAGTGGAGG TTTAAACTCT GACATTTGTA AGAGTTGGAT CTCCCCTCTT CTATTTrGTr TTTAAATCGG GGTGA-AAAGA CTTTr'TGCTr CTATTTAAAA ATAAAATAAG GAGATCAAAT CATGAATCTT TTGATTATGG GCTTACCTGG TGCAGGTAAG GGAACTCAAC CAGCAAAAAT CGTAGAACAA TTCCATCTTG CACATATCTC AACAGGTGAT ATGTTCCGCG CTGCAATGGC AAATCAAACT GAAATGGGTG TTCTTGCTAA GTCATATATT GACAAGGGTG AATTGGT-rCC TGACGAAGTr ACAAATGGAA TCGTAAAAGA ACGCCTTTCA CAAGATGATA TTAAAGAAAC AGGATTCTrA TTGGATGG'I- ACCCACGTAC AATrGAACAA GCTCATGCCT TGGACAAAAC ATTGGCTGAA CTTGGCATTG AACTAGAAGG TGTTATCAAT ATTGAAGTGA ACCCTGACAG CCTTTLTGGAA CGTr'rGAGTG GGCGTATCAT CCACCGCGTA ACTGGAGAAA CTTTCCACAA GGTCTTTAAC CCACCAGTTG ACTATAAAGA AGAAGATTAC TACCAACGTG AAGATGATAA GCCTGAGACA GTAAAACGTC GTTTGGATGT 'rAATATTGCT CAAGGAGAAC CAATCATTGC TCACTACCGT GCCAAAGGTT TGGTTCATGA CATCGAAGGT AATCAAGATA TCAATGATGT CTTCTCAGAT ATTGAAAAAG TATTGACAAA TTTGAAATAA AGCGTI'TTTC ACACTTGCAA AAATCCGCTA CAAATGTTAT ACTGAGATAG TCTGACTTAT AATTGTTGTC TCTGTGTCTA GAGGCATCGA ATCGAAATTT ATGGAGGTGC TTTTGCGTGG CAAAAGACGA TGTGA'rTGAA GTTGAAGGCA AAGTAGTTGA TACAATGCCG AATGCAATGT TTACGGTTGA ACTTGAAAAT GGACA'rCAGA TTTTAGCAGG G INFORMATION FOR SEQ ID NO: 66: SEQUENCE CHARACTERISTICS: LENGTH: 10726 
base pairs TYPE: nucleic acid STRANDEDNESS: double 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11831 TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: CCCGGCATIT GAAAGCTATT CGTGAAGGAT TTATGATGGC AATGCCTTTG ATTrAGTCG GCTC?17TATT TCrTA'rTCTA ATCAGTTGGC CTCAAGAGGC TTTTACAAAT TGGCTGAATA GTGTTGCATT GCTAAGTATC TTGACAACTA TGAATCAGTC AACAGTAGCG ATTATCTCCT TGGTCGC TqG ?PTCGGTATT GCCTACAGGT TGTCGGAAGG ATATGGTACA GATGGTCCGT CGGCAGGGAT CATAGCCTTA TCCAGTTT'rG TATTGATGGC ACCTCGTTITr TCGAGTATGG TTTATGATAA AAATGGGGAG CAGGTCAAGC AGTTATTTGG CGCCAATA CCATTTTrTrA
GCCTGAATGC
ATCCTATGTr
TAAGTAAATC
ATCTI'CTTTG TTTATGGCGA TTACTATTCG ATTGGTrACA GCAGAGATTT TATCCAGCGC GGAATTACGA TAAAAATGCC AAGTGCTGTC CCAGATGTAG ATTTTCAGCT C 1'?rATCrG GTr'rTACTAC ?TTTTGTTTTG TGCGC'ITMG TCTTAAAAGG TCTTGAAGCG GCAGGAGTrG TTGTTGGAAC ACCGCTTAAG TTAATTGCAG 'N'GTAAACTC ATTCTTTTGG TTCTGTGGAG TAGACCCAGT T'rGGTTACAA TTrACTACAG CAC CCAACA CATTATTACA TTACCGTTTA GAGCGACTAT TGGTCTTGCG ATTTGTCTCT CAGGAGGTCT CAACGGACTC CTAGGTGCAA CAACGCTrCC
TTAATGGGGG
AAAACCAAGA
AAGATI'TATT
TCCTATTTAG
TTTTTAATAT
TGATTCCGTT
TAGGATTAGT
AGGTATGATT CTATGTG'rrA ACAAGTTTTA AATGCTTTrG AGCTCTGGCT GCAGGACAAA TGTATTrATT GGTGGCGGTG TAAGAGTCGT GCGAATAAAA CAATACAGCT ATTCTATTTA TATTGCTACT CCTACAATCA ACCCTATACA ACAGGTGTAA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 CATTAGGTAA GCTAGCTATT CGTTTCCAAC AGTTTTAAAT ATGCCTTGAT TACCTATGTA TCCTTCCGTG GACAATGCCA GAGGAGCTCT ATTACAAGTT TCAAAATTGC AGATAAACC ATGGTTATCA GAGTATTTGA
ATACCGTCTA
CCGATTATGC
TCAATGCCTC
CCGATTATAG GAGGCTTCCT TGCAACAGGG GCTAGTTGGC GPTTTGATTT TGGTTTCTGT AGCAATTTAT TATCCATTCT AATCTTGAAA AAGAAAAAGC TACTGTTGGA GGGAAATAAG TCAACAGAAA AATACTTATT CTAGCTTTGC C,-rAGAGGAA TTAAGT'ACT ATATGAATCG GGTCTAAG GCGGATAT'rr TTGTAGGATT AGTCAATAA.A TTAGACAACG GTAAGGGGAG AATTGAGTCT TACCGAATGT TTCATGAATT TGGGGTTCTG GTTCCAGAGT TACGATTTGA AGAT-rTTTTA ACTAACATAG AGCTTGTCGA GGAGAAGGAA GAGGACAGAA AAGACCATGT TCTI'ATCTCA AATACAATTG TAGG'N'TACT TATTGGAATT TATACTAGAC CAGGGCGCAG ACATGACTTT GATAAACAGC TATCTATAGA TGAAACAGCC 547 AGTTACTATC ATAGGGGAGT ATGTATAGAG GGAGCGGATT CA'TrGAAAA TATACTAGAT 1680 TTCATTGArr GCCTACCTAA GATTGGGATG AACAGTMT TCATCCAGr TGAAAATCCT 1740 TACTC7rT TG.AAACGTTG GTATGAACAT GAA2rrAATC CATATCTAAA TAAAGAACAA 1800 TrTcAAATG AATTAGTACA AGAATTGAGT GATAGGT-rGG ATAAAGAATT GCAAAAA.AGA 1860 GGI'CTTAITC ATCATCGTGT TGGTCATGGA TGGACAGCTG AAGTTWTAGG TTACTCTTCA 1920 AAATTTCGCT GGGAATCAGG TC?1TAGTAI-r TCAGAGGAGA AGAAACCCTA TGTCGCTGAA 1980 ATAAACGGGA AACGAGAATT GTTTAATACG GCTCCGATT TAACCAGCCT GGA7"rIrCA 2040 AATCCAGATG 'rAGCTGATAA CATGGTAGAA ATTATCAAGG ATTATGCCAA GAAAAGACCT 2100 GATGTTAACT ACTTACATGT ATGGTTGTCG GATGCTCGTA ATAATATTTG TGAATGCGAA 2160 AACTGTAGAC AAGAATTGGT TTCCGATCAG TATATTCCTA TTCTCAATCA ArrGGATAGG 2220 GCTT'TAACGA GTGAGGGATT AGATACAAAG ATTTGTTTTC TGCTTTATCA TGAGTTGTrrA 2280 TGGGCACCTC AGAAAGAAAA ATTAGATAAT CCTGAACGCr TTACCATGAT GTTTGCACCG 2340 ATTACAAGAA CA'IT-rGAAAT GACTTATGCA GATGTAGATT TTGACAATTC CA'rACCTACG 2400 *CCTAAACCTT ATATGCGTAA TAAAATTA'rA CTTCCGAATT- CTCTTGAGGA AAATTATCT 2460 *.*TATCTTN'G AGTGGCAAAA AGCATTTAAA GCAGATAGTT TCGTATATGA CTATCCTTTA 2520 ***GGGCGTGCTC ATTATGGCCA TTTAGGCTAT ATGAAAArTA GTCAAACTAT TTACAGAGAT 2580 GTATCTATC TTTCCAACCT ACAT'rTGAAC GGGTACATTT CG'rGTCAAGA ATTACGTGCC 2640 GGATTCCCTC ATAATTTTCC TAATTATGTC ATGGGGGAAA TGCTCTGGAA GAAGACAAGA 2700 AGTTATGAAG AATTGATTGA AGAATACTrT TCTGCTTTCT ATGGGGAAAA TTGGCAGTCT 2760 GTTGTTGAAT ATTTAGAAAA ATTATCCATT TATTCCTCTTr GTGATTATTT TAATGCAATT 2820 GGCAGCCGTC AAAGTGATGT TTTAGCGAAT CATTATTATA 
TAGCTTACAA TCTAGCTGAT 2880 ***AATTV1=AC CAATTATTGA GGAAAATATT TCTAAGTTAT TAAATACTCA AAAGGATGAA 2940 TGGAAACAGC TCAGTTATCA 'rCGTGAATAT GTTGTTAAGA TGGCGAAGGC TTTATATCT-r 3000 CAAGCAACTG GAAAAACAAG GCAAGCTCAA GATCAATGGA GAAATr;GTT~ GAATTATATC 3060 CGTGGGCACG AAT'rGCTATT TCAATCTAAT TTGGATCTTT ATCGTGTAAT TGAAGTAGCA 3120 AA.AAATTACG CTGGTTTCCA CTTATAAATC ATAAGTATAG AAAATGAACT AAGGTATTCA 3180 *GAGAAGATTG ATCCTAAATA TTATGAAATT TAAGGATT TAAGATATTr ACGGTCAACT 3240 TTCTATTTAT ATCGTAGCGA AGTCATTTTA ATAATGATGT GTAAAAGATG GATCAAGATT 3300 GAGGAGGAAG AAAGATGAAA TCAAAAGAAG AAATAAATAT GCTTGGTTI'T ACAATTGTCG 3360 548 CTTACCCAGG AGATGCAAGG TCAGA'TrGA TGCATGCTTT GGCGTTTGCG AGAGATGGAT 3420 ATTTTGAACA GGCAAGAGAA TTGG7*MAGT CTGCAAACGA CTCAATAGTG TCTGCCCATC 3480 GAGAACAGAC TAATTTATTA GCGGAGGAGG CATATGGAGA TAA'rrGAA GTGAGCTTTA 3540 TTATGATT-CA TGGrCAAGAT AC~rTGATGA CAACGATGCT ATrGTATGAT CAGGTAAAGT 3600 TTTTTATTGA TGAATATGAA CGAATTCGAA AGATrGAAGA AC17.T'GGT 7rIGCAAC.AG 3660 GATTAGTCAT GGAAAATTTA CAGGTAAAG CCTrACCGAA CCAGTN'TTA TTACGAACTG 3720 CTACCGCTGC TTATCAAGTA CAGGGTGCAA CTAGGGTAGA TGGCAAACGA ATAAATATGT 3780 CGGATGTTTA TrtrGCAACAA AA'rAGTrCCGT TCTTACCAGA TCCAGCTAGT GAT7TTTATT 3840 ATCGTTACGA AGAGGATATA GCTTTGGCGG CAGAACATGG TTTGCAGGCT rrGCGTTTAT 3900 CTATTTCTTG GGTTCGTATA TI'TCCI'GATA TAGATGGGGA TGCTAATGTA TTAGCTGTTC 3960 ATTATTACCA 'rACAGTTTTT CAGTCTTGCT TAAAACATAA TGTGATTCCG TTTGrrTCTT 4020 TACA'rCATTT TGATTCGCCT CAGAAAA'rGT TAGAAACAGG CGATTGGTTrG AACAGAGAGA 4080 ATAT'rGATCG TTTCATACGA TATGCTCGCT TTTGTTTCCA AGAA'r-ITACA GAAGTCAAGC 4140 *AT'rGGTTTAC AATCAATGAA CTGATGTCTC 'rTGCTGCAGG TCAATATATA GGAGGTCAGT 4200 *TTCCTCCAAA TCA'rCATTTT CAATTA'rCTG AAGCAATTCA AGCGA.ATCAT AATATGTTGT 4260 *TGGCGCATGC TCTTGCAGTC CTCGArTC ATCAATTAGG GATTGACGGA AAGGTAGGTT 4320 *GTATTCA.TGC TTTAAAGCCA GGCTATCCTA TTGATGGGCA AAAAGAAAAT ATTTTGGCAG 4380 ***CTAAACGGTA TGA'rGTTTA'r AATAATAAAT 'rrCATTAGA TGGAACTTTT TTGGGCTACT 4440 ACAGTGAGGA CACGCTTTTT CACTTGAATC AAATATTGGA AGCTAATAAT TCTAGCI'TA 4500 T...'TATTGAAGA 
TGGTGATTTA GAAATTATGA AGAGAGCTGC ACC'TCTTrAAT ACGATGN'TG 4560 *GGATGAATTA TTATCGTTCA GAATTTATTC GTGAATACAA AGGTGAAAAT AGACAAGAAT 4620 TTAATTCAAC AGGAATAAAA GGACAGTCTT CTTTTAAATT AAATGCTCTA GGTGAATTTG 4680 :..TAAAAAAACC TG.GTATTCCG ACAACAGATT GGGA'IrGGAA TATTTATCCT CAAGGGTTAT 4740 TTGATATGT'r GCTTCGTATC AAAGAAGAAT ATCCTCAACA TCCGGTCA'PT TATTTAACTG 4800 AAAATGGTAC AGCCCT'rAAA GAAGr1'AACC CAGAGGGCGA GAATGATATT ATTGATGACA 4860 GTAAGAGAAT CCG?1'ATATT GAGCAACATT TACACAAAGT TTTAGAGGCT CCAGATACAG 4920 GAGTCAATAT TCAAGGCTAT TTATATGGT CT'rTGCAAGA TCAATNTTCT TGGGCGAATG 4980 *GCTACAATAA GCGATATGGT CTTTTMCTC TTGA'N'ATGA AACACAGAAG AGATATATTA 5040 AGAAAAGTGC TCTTTGGGTA AAAGGGCTAA AACGGAATTA AGGTTAGCGA TTTGACTGAT 5100 GTTTAATATG 7=~AAATAT CACGTTGAAT TrrmATAGG AGGAG=~A TGGATAAGCT 5160
AGTCGCI'GCC
GATGGCTAT
ATTGAAAAGC AACAAGGGAA ATITGAAAAA ATTTCTACTA ATAACTATAT AAAGATGGAT TCATTGCTAC TATGCCTrA ATTATGT7"M CAAGCr'1NTr ATTATGATTC CTAAAAATTT CGGAGTAGAG TTACCGAGTC CAGCTA7~T AAAGTGTATA TGTTAACCAT GGGAC?1?rG GGTATTATTG TTTCAGGGAC
GATGATTATT
CTGGATGAGA
TGTTGGAAAG TCATTAGrG GAAATGTTAA CAGAAAAATG CCTCACGGAA
B. TGATATTTCT GCAATGTCG AGTTGATGAG AAGACGGGAT GATAACTTCG ?1'TGTCAGTG AGACATTACT ATTCATTTAC TATTTTCCCT TTrrCTTTTG TAGTTrAGAT GrrCCTTTrG GGCAGAATCA TATCCTGCTA TGGAATTCAT GGACCATCTA GGAAGAGAAT GCTCAACTTC T GGGAAT TATATCGCTG TTTGATTTTC TTTATGCGGT TGTTTTAT'rT GCGGTAAATG TCTTT7TTGTC CC'rTTTrGA TGATTTCTTT GGAATGAATG GGGATrG'rTA ATTGGAACGA AGTTGTCGAC ATAT-TGATrr
CAGCCATATG
CTACAAG~rT CCTTTArrAC
CTAAGGAAGT
TTTNTACTTAT
CCCAAGTATT
TGATGTTGAT
'rrGTCTTACC
TTGCAAATGG
CTATTGGAGG
CTAAACAATT
AACCTCTTCT
TGACTCCACC
GAT'rTTATAT
ATTTTCAACT
ATTTGCCATT
TAGTTATCTG GTATTAACTG GTCGACAAAC TATT-TAGGAT 'rGTAAATCTr TACCGATTCT TCCTGGGGCT ATATCACAAG TAGTGGTTTG TTAGATATTG TCAACAACTA TTGACTCCTA TTCGTTTATG TGTGCTTTGC TGCTGTTACA GCTTTGCAAC GCAG7?rCCCT TATCATTCTT
AGGTAATAAA
TAACGCTGT
CTCAAGGATT
GTATTAAGCG
CTTTTAGAGA
TATCTCGGTT
TTTTTAAGGG
TTTGGTTTGT
TGAGCAATAr
TAACACCTAA
TACCATrrAT
CAATTACTCC
TGAATCCCTA
AGGTCTTTAT
CTGGTCCC'r'
CTTTGATTTT
AACCGGGGCT
AAAATCGGTA
ATTTGGI'ATG
AGTGAATGTA
CCAGTTACCT
TATcTccTT
CTGTAGAGCG
TATTTTAGAG
GGAGTTGAAA
AATTAACGAG
ACCTTFGTTG
GGTAAAGCTA
CCTGTTAT'r
TTTCTAGGAA
TGGACCTTTC
GTATTTTTAT
5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 GAAAGAAGAT ATTGCAAGCT CAAATGA'TAT TCCTGGTGAG ATAGATGAAA TAAAAAGTAA GTCTGGAACA AGTGCGCAAT TAGCCAATGC TAGAGTGATT GCGAATTCAG GAGCGTACGG AGCTCATTAT TTTAAT'rATT CTGGCCCCAC AAGTTCGGAG TTATTATAGA AAGATTAGGT A'rCAGATAG TTGCTACCAG AGGAATGGAA TCCAAGTAAA GCCTTACAAT TTGTA7TGGA GCATTACCAA ATCTTTTATT TGAGTAAAGA TTTTG?1-rAC AGATAGGCTT TATGATAGAC AGTTACTCGT GAGGATACAA GTGAAATAAT CTACTGG;TTC TrTGTGCAGG GGGGCTAACT TAACAGACGT GATATTATGG GTGTTTATGA GAGATGAAGG TGGATGCAGA TATATTCATT TAACAAAGAG GCTGTGTAGT AAGTTTTCC GGATTTAAAA ACGT'rCCCCC 550 7Tr'?TMAATA TAACAATCCC TCTTcACA.A TTGTAAAAAG AGGGAwrrrG TATrMATCT CTTAGACCAA GTTCTCTTCA TAAAGAGAAG GAGGATTGGG TAAATCTCCA AGCGCCCTGC AATCATTGCA AAGGATAGGA GAArl-N-rGA GATGGGACTA AAGA'rrGAGA AACTAGAAGT GGTTCCTAGA ATAGGCCCGA TATTATTGAA AAAATCAT'rG CTATCTAGGC TGACAATAAA GACAAAGTAC T'rGAGAATCT TATGCTGGGT GAGGOTCAAA ACACGGTGGG GCGATAGGAT AAGGATCAGG CCTCGAATAA TC~rGAGTCC TGCCATGAGG AAAAGGAGGA TAAACTGGGA TCCAAAACCA GTTGTTGTAA TCATGTTGGA TGAAAACCCT GGAGAGGT AGAGGGTGTT ACAGCTAAAG ACAGCGCTGG GA'rAAGCGCT ACCAAAATCA ATCTITGTCA ATCACCTr
TGACAAAATI'
ACCTGCAGT
GAAGAGCGC
AACCTGGAAG
GAGGCTAATC
GATCCAGCAG
CAGTTGGA
AAGGTCA'rTr
AAGCCTGTAG
TCACGACCAG
TAGCATAGAT
TATTAACATG
CAA'TTTrGA
ACCCACCCAT
TATCTCCATA
CAAAGCT'rT
AAACCAGTAC
AATGACCAAG TAACCCCTAA GCTCTTCATC TCCAAAGAAG GCCTTGATGC GACCCACCAT GAGGTAGTAG TAGAGGTTGA AA=TACTCC AAAAACCAGA ACTCCCATAC TGACCAGATA 000 0 00 00 0 0 0 0 00 0 0 *0000.
0 0 GGTAATCAGT GAGCTGCCAT TCCCGCTGTC CCCATAGCAA GATGATGACA AAGAGGGAGA TTTTAGTTTG GATACAACCT TAGGTGGCTA T7rTGGCAT TCCAATCAAG TGGGTAAAAC GTTCAAAATA CT'rGCTCCAG AAGGCTGGGG A=MTCCCAG CCAACAGAGG GCAACGATCA CTGTAAACTC CCTGAACCGC
AGTGGGCAAT
TAACAAAACT
AGAGAGCTAG
TGCCAAAAAC
TCCGTCGTTA TAGACGGTAA AGCCTCCAGT ATCGTAGAGA GGCATACCGG CTAGATAATA ATAAAGGAGA TAGAGAATCT GGGCAGTGTT AGGACCTGGA ACCTCAGCCT TCATCACCTC 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 TGTCCATAAT AGCAAGTGCA TTCGCCAGAA GAGGAGGGAA AAAACAAGCA CTCCCATCCC CGGCTGAGAA CCGAAACGTC TAGTTGTA.AA TCCAGAACTA ATTTCAAAAA AGGCATCAAT AAAAGACAAA GGCGAGACCA CCAAAGAAAG ACCAAAGGAT AGACTCCCTC CTTGGCATAA ATCCGTTGA'r TTTTGGCTT CTAACAATAC GAGAATCCCT ATGGTCGAAA AGAGGGCTGT 0 0000 00 00 0 0 AAAGAC7TGG CTCGATTCAC GGTAATAGAC AGCTTCAATC AAAAGTAATT 7"rGAAAGGAG CCTCGCGATC AAGTCATAAA TCTTGGTGAT CTCTCCAACT TCCAACATAT CCTCCCCAGT CGCTGCAATA AGAACCCCTT 1-rTTCAATTT ATTGGCTTCC TTGATATGGA ATTGCAG4GGT AGCTTGAAGG TCTGAATACT GGGCATTAAC AGCAATCGCA ACAGGAACCA AAAGAAGAAC CTAACGAATC ATACTN'TAT TCATTCTTA GTTTGGCAAC AAGGTTGT.=A CTAGGAGCTT TGGGAAAATA GTCTTGCCCT TTCGAATAAT CAGTTGAGA.A AGAGGTTTGG CAGTCATTTT VrCGATTTGG CCATTCGCTA GATGGTGCAT TCGACCACGA ATAAAG'rGCA TAATCGTATC TACAGCGATG C 'rTTAGGTG TGATGATACT GAGACTGGTA CGArI'GACCT TAGTAATATT AGATGTAATC AGATrTTCCT CATCGACTCC AGCACTTTCT TCCAGCAGGA TATCTTTTGC TGGGAA'?TTC TCGCTAAAGA AGCTGGCGAT GATACGACTA TCTTTGAGAA GAGAAGGCTC TTCACGGCGC GT'rACCAGT'G ACAAAGATIC AATTTGATGA TCCCTCTCTA AGAALATGGGC ATTGGCAAA
TACCAAGTAG
GTGATTT1AAA
TATCTTTATC
TCGCACAGAC
GACCGCTGGT
CGACAGACAG
CAGGATTAAC
GCGrCCACCA
AGCGCGGGCA
CTTGAAA'rAA
CA'TTTTG
AAAGATATCA
AAGGATACCA
AATCAGCAAA
TCCCCCTCCG
TACCCTTrT
CGCTATAAGG
ATATGGTATG
ATCAGACTTT
GTCTCCAACC
GGGGAGCTAT
CGTTGTCGAT
TCAATACATT
TGTAGCCATT
GCAAAGCGTr
GCCAAGAGCT
GAGTTAGAAT AT'rCAGGGTT GCTAGAACTG CTGCAATCAT CAATCTTGGA CGCTGGC?1'G ATGATATCAA AGCGACTGAC ACATCATGCT TTTCTGCAAC ACAAGGATAA TTTCATAAT CAAGAAAAAA TGCACCTACT TAAAACTATA CCCTAACCAA ATAAAGGATA TACAAGGAGA GGTCTGGTTG ATGGTGCGGT TTAAAAATAC ATCACTrGAC CGTCTCTrTC AGACGGTGGA CCAGGTGTCG GTTCGAAACG GTCACCCCAG A'rAATGGGAC CGTGAGATTT CTGAGGTGGC TGAAAAATCA GGCGCA'?rGA 'rAATCTCGAG TTTCTGTACA CCTACCCTGT CAAGGAACAT TGTTAGAGTC GCAACGGCAT CATAGTGTrG GGT'rCCA'rCT CCTTGAACGA TGTAGAGA'T T1'CAGGATTG A'rTCAATGA CT'I-r'GTATC ATAATAGGCA ATTCTACCTG CCCCAACGAT ATAATr'ATG4G AAGAGTATCA TATCGACACG CTGTACAGTC ATGTCACCGC TrGGAATGAT AATGACATTA CCAAAT'TI'T TACGAAAATC GGACTTGACG ACAAATTCCA TGAGGCTAAC GGCGTTGGGG AAGTCAATGA TATTCGCGAT GATAAGAGAA AAACCGAGAA TATTCTTTTC CCGCACCCGA ACCATAGTTT CTTTrAGC'rCC GTITGACT'rCA TCGTGCTCAG TCAGGGCGAT CTCAAGAATG GCAAAATCGG CCCCGTTACC AATATGATTG AGAACAGCTT CGTCTTGCTC CAAGGAGCGA CAGAGGGCAA AACCAACTTr AAAACCTACT 'rTTTCATGAT GTAACTATCA AGCTAATAAC AAGAGTTT'TT AC;TGAAAATT TTGAAAT1AGC TAT'rAGCGAC TTTCTCTGAA TAAALATGAAT AATAATTTAC TGGTATTACA ATCGGCTATG ATTGGAGTGG C?1'TAGAAGA GCACGATATC ACGCCTTATA ATATTTT'rGA TAcTGGCCT GAGGGAACGA CGTTTGTATC TAAGAGTGTA GTTGCCAAGA CTGCAAAAAA CCTTTCCT ATCAAGAAAC ACGTTGGCAT CAATAGGCGT CAAA.ACACAG AGCATTCTTA TACrGGTGC1' AAACTGGCCA GTGGTCACAT TGTGGAACAG ATTGTAGAGC T-rCCAGTCGT 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 @0*e C V. CV
V 0 TACCT'rCCAC GGTCGTGATG TCTA'rCCCTA TAC1TT'1'GAG GAAGTAGGGC CAGAGCTCAG 552 AGCGACCATC ATAGAAGATC ATCTGGTGAA GGGAGCCATr GATATT CTGG ATGTGCGT CGGTTCGCTIT TGGACCTCTA TCACACGGGA AGAATV"AC AAGCTGGAAC CAGAATNrGG TGATCGTTTT GAAGTGACCA TCTATCATGC TGATATGCTG GTCTATCAAA ATCAGGTTGT CTATGGCAAA TCATN'GCAG ATGTGAGAAT TGGGCAACCs ATcTTTACrc TCAGCaTCTt CGATTAGCTG GGCAA'N'CGT TCTAGTrGGA ?PTCGTCAAT CAAGGT INFORMATION FOR SEQ ID NO: 67: SEQUENCE CHARACTERISTICS: LENGTH: 7163 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 10500 10560 10620 10680 10726 TTATCTrTAA CGATATCAAT T'PGGCATCCA AGCCCATGGA TGGTCAAGCA AATCGAACAA TTATCCGCCG AGTTCCAATG TTGACTTAAA CGCACCATTT GGCATGGAAA GATTATCAAT GCGCTTATGC TGCTGCTAAA ACGGTGGAGC CAATATCCAA CAGCACCTCT TCGTGAATTG TTGCAAAAAC ACCTGCTGCA TTCTCGCTAG TGATGCCAGC TCTTAGCCTA CATCGGAAAA CATTAATCAA TGAAAATAGT CAAGATCTGG TCAATAAAGG TATGTCTGTG ACGTGACAGA GAGGTTGGTG TCATTGACAT TGCGAAATGA GCGCCGCTGA ATCGTTTCAA AGGCAGTTAT
GATTGGGGCT
CGAGGACGGT
CCTCG?1'AAT
TATCGTGAAG
ATCCAAGCCA
AACGCTGGTA
TTTCCGTAAG GTCATCGATA TCCTTCTATG ATAAAGAAAG
ATT'TGTTCGA
GGGGGCTTGA
TGTAACGGAA
CAAGAAGATG
CGTTGGGCAA
AAT TTTGTCA
CAACCTGAGT
CAAGCTACCA
TGATGAGCGA ACTGGGACGT GAAACAGTTA AAATGTTGAC CCGCAACATT GCGTCTGAAT TrGGACCGGG TTATATTGCC ACTCCTCAAA GTTCTCGCCA CCCATTTGAC CAGTTCATCA ATACTGAAGA TTTGATGCGC CCTGCTGTCT ATGGCCACAT CCTATATGTA GATGGCGGTA AAAAATAGAA AGAAGATC?1' ATGAAAATCG ACAATCACAT TATTTACGAT AGTCTAAAAG 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 AAGCGACAGA TAAAAAAGGC TACCAATTAT AAAGTCAATr AACTTATGTG CAGAACGGAC CAGI'GACPr TGTTGTrACC GGCTGTGGTA GCTTCCCTGG TGTTGTCTGT GGTCTAGCAG AAATCAATGG TGGTAACGCC TTGTCTATCC AACTGACCCT CAAATTGATG TTTGAACGCT
TTAACTATGG
TAATGGCTGC
CGGGTGTAGG
TGGACCCAAC
CTTATGCCAA
TATGCGTGGA GAAGAAGGAG CATCCTrTTTA AATACAAAGG
GGCTATGCTT
TGACGCTTAC
AGGATTGGC
GCTTTAAACA.
CTTTATTCTC
TGGGGGGCAG
TATTTGCTGA AGAAATGGGC GGTGGCTACC 553 CAAGAGAACC TGTAATCCCT GAACAACGCA ACGCTCGTAT TCACCCACAA TGA~rPGATG ACCATCCrrA AAATAATCGA CCATCTCTGG CAAATACTTC CAAGAATACT TCTG~rAAAA CTGCTTATTT GAAAGAAGTA TTAGCCAAGT AAAGCTATTC GGATGACGAA AATATTACTG 'TTGGCGAAC CATTAATwrCG CC-AGTATCGG CGATCAT=~ GCC-AGTTCGA CTTATTTTG
CTTAAACGAG
CCAAGACTTC
CTGCCAAGAT
TAAACCAGAA
AA'rTCACCA
CGGATCAGAA
GTrGAAACA.AA
CTCAAAGACA
GATGAACTTG
AGGAACTAAT
?TAGATGCCA
ATTAACATCG
CCTGCCAACG
AGTTCAATCT
TGTCC'rCAAA CTTG;TAATTT GCAAGCCCTG GGTATCTCAA CGAAAGTT TACCGCACTC AGATTOGAGA TCGTTTTCTC GTCGGCTTGG CGATCGAATC ACA7TCTTGA AACAGCACCA AATCGATACC GGCCTCTACT ATTTGGAGAA CGGCTTTGGT GTGAAGT7r
ATATCGATTC
CTACGATCGT AAGCATACGA GTATCAGCCA GATTCGGCCA AACATGCTAG TC-TCTTTCAG GGGATTAGCC AT7"rTCATTT TAGTGGAATC ACCGTAGC'TA TCGGTCAAGA GGTCCGI'GCG ATCCTTCTCC TTGTCGTTTC AATGGATCTC AATCTCAGAA TACTCTTGGA AGAAGCCAAG CCCCGAGGAA CAA.AGATGAT TTCAGTCCTA GAAGCCAACT ATGAA'ITTTC TAAGTTTGCA CGTTTTACTG ACTA'rrGCTT TTGATGACCA AAATCTAGAG ATGTTTCCAA GAGACAGTC ATCGCATGCG ACT TAAAA GAAGCCTATG GTTTCAAGGC CTAGTGATGA GCAAGACAAA AATGTCTATC AAGCCTATGC CGGTATTGAT CCTCTCATGA
TAGCCTAGAA
CATTTTrCCAT
TCTAGAAGAA
GAGGTGGAAA
ACCCTCCGCT
CTATT'rGAAG 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 AGTCTGTCCA ACTAAAAACT GCACTCTATC AACGAATrGG TAGCGGGGAT GCCTTTATAT CTGGTGCCCT TTACCAACTA CTCCATCATT CCTCCCTAAA AACTACCATT GACTTTGCAG TTGCGAGCGC AACTCTCAAA TGCACTCTC CAGGAGACCA GTATTGAAAA TTTACTGGCA AATGCACAAG ATATCATTCG TCTCTCCACT TCCTCAACTA TTAGGAGAAT TACATGACCA AATCAGA'rAC GATTATTGAA CTAAAkAAAAC AAAAAATTGT CGCTGT'rATT CAAAGGAAGA AGGACTACAA GCCTCGATTG AAATCGCCTA TACCAATCAG TATGCAGGAC AGGACGATCA GAGTGTTTGT ATCGGTGCAG CTTGTATCAA GGGCGGTATC AAATCATCAA CGAACTTGTA GTACTG'rGCT TGATGCCGTA
CGAGGAAATA
AAAGCTATTG
GACTTGTATC
ACTGCTAGAG
ATGCCATTCT AGCTGGAGCA AATTACGTTG TTTCTCCATC '1-1-CCATGCT GAAACTIGCGA AAATGTGCAA TCTCTACAGC ACACCGTACA TTCCAGGCTG TATTACCCTC ACAGAGATCA CGACTGCACT TGAAGCCGGT ACTGAAATCA TCAAACTCTT CCCAGGTAGT ACTCTCAGTC CAGCATA'rAT CTCTGCAGTC AAGGCACCGA 'rCCCACAAGT TTCCGTAATG GTAACCGGAG CAGTCGGCCT AAACAACATC GTGGCGAACT CA.ATAAACTC AACAGTATAT TACACTCAGA AGCTATAAGC CCAAATCATC CTTATTGCTC TTGACTCGTC 554 CCTCAA'rGGT TCGCTGCTGG 'rGCAGATGCC G'rTGGAATTC GCTTCCCAAG GCAACrWTGA CCGCATCAGC GAGATTGCCC TAAAATCATA ACTACCCGTC TAACGGGTGG TTATCTCAG AGCCAGCG;CC TAAAGACGCT GGCTTTCACG TTGTTCAACC ACTTGCCTCT TTAAGAGACT TTGGTATTAC TTACCACTAT
CCCTAAACGG
ACTGGAATAG
CGATTTTGTC
ATCCI'CATAT
TACACCTCTG
CTGTTCTTAT
TCTTTrTACAC
CTTCTAAAAC
TCA'rTTAC .0 .4 0 TTCATACTCA TGAAGAAATC ATCCACTCGA ATTGTGGCTT CAT'rAAGCCC TACTGGACT TTCCCAAACT TGTACTTGAG AT'rGGCG'rG' TTTAAATACC CCATAAAACT TGAAACAC?1' ACATGGTCTG GCATTAAGTG ACCCTCCATC TGAAATATTT CGTCTAAACT ACTTCTATAT GGGGTGAACA CAATATGATA GAACACCTCC GTGCCATATT TTTTCTCCTT TCGCTTTACA TTGGAGTTTT TTTGGTATAA CCTTCGACGC CGCACCTCAC GGAGCGAGAC GGACTAATAT ATTAGGAAAA TCAAATGAAT TTATAGAAAT TTCAAAACGC TATAGTCACA TAA'rAATGAA TN'AAAAGA AAAA'rCCGAA GATATrGC TTCATGTAAT TCATCTAGAT GATGAACAAA AA.ATGCCCCA ACTAGCAAGA ACCCTAGACT AGGGAGATTC TTNMGATCCG AATGGrCTT ATTCTGTACT TGTGGCTGAG CTGCT'rGCTC TCAATTTATC TAGTGCTATA GTAGATTGAA ATTGT'rAAA6A ATCGATTTGA CTGTCCTGAT TATATATCAT ACTTTACTCG TTCTCAAATT TAATTTCTTT AATCTTGACT ATAN'TCTTA ACATAATAAC CTTCCTCCCA GAAATIGCCGA TTGTCAAACA TCATGAGTGC ACTTTTGCCT AGCCTCGACG GAATACTGAC TAACATGTGT ATTTCAACAC CTTTATAACT ACACAAGCGA TGATTATAGA TGACTTTTCG TCTATACTTA ACTI'TGTGTA TGATAAACTA TGAGTCTT ATTGGA'rTGA ACACCTTTAT TGTATCGCGT GCACCCGTAT AGCGrGGTGGT TGI-rTTGTCT AG'rGGAGrGA AATAGGA'rAC GAACAAATTG C'rrITAGCAG TTATAACGTT CTAT'rCTAGT GTAAAAAAGG ATAAGTATCA ACTTATCCTT CTTCTTCGGA TTTrTTTCTAT TTTCCACAGT TTAG'rTGTTC TTTCCTCTAC CGAATAGA'rA 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620
TGCCAAGATT
'rrCCTCTTCA
TAGCT=A
GACTGACCTT CTCCTGTCTG CATTTT'rCCT TTTCTTTTGA TGATTCCTGG Gr?rCAGGAT TATAGTAGGC AATCTTATAT GGTATAGACT CCACGTTTCA AAACTTGGAA TTGGTrGGAA ATATT'rCACA ATGCCCCAAA CTCCTTGTTT AGCATCATAA Ar=~CGATG AGGCTACTr'r CTAACTCTT 'ATCATTTGA MTTAGGAATG AGCATATAGC CATAAGAATC TCTA1?rGC
AAGACTTCCT
TCATCCCCI'
ATAGTAGAGA
ACAGACTGAA
TTGAAGGTGG
GATCT'GGAC
CTTTCGAAT
CAGAATCATC
GGGTTTCGT
CACGATCCAC
T'rATCAGCCT GACTAATCGI' 4680 AAGAAATA TTCAACTT CTTTTGCAAA GCCTTACTCA cI-L-rCGAA TCCAACGATT 7rCTGAAGG GAGGCTTC?? TTTCCGG TCAATTGTAG TAAAAAGGCG ATCrrATCCT CCAGTTCGGTG AAATCCATGG GAAACCAGAC GGTAAAACT TGTACCACGC ATCTTATATG CCTTGTCTGA CTGTCCTTCA TTGATATCCT TCCAGGCTCC TACTGATTGA ACTCTTCTTA AAGAAAAAGT AACCAATATr CTAAAAAGAC ACTTTGGGTT 'rCAGGATAAT CCI'TTCTTG TATCATTGAC ATAGACTTTA TA'rGGAI-rAC CTGATTCCAG TTGCAGCAGT ATCTGTTGAA GTGNTTGGA TATTGCTTC-c TTAGCATAAA CCAGCTCTT-A TGAGCAGTCA ATGTTTGATT TTGCI'GTCGC ATTGGCATCA TCTAG'rTTGC TCGTTCCAAC TACCTGTATC GCTATCCGC CTCTTAGCAT CCGTCTCTGT GATTAACTGT TCGCCAGTAG CCATCGCTAT AGTGACTCAA ATCGCCATTG TAAAGATAGA ACATCCCATC ACTCGTATAC CAACCACGTT TA7-TTTCCTr Ul CATG7W~ TCGTAATTCA AGGTACGAcT GGAAAAGAGT GACAAGCCAA ATCCAAACCC TTTCTCTGCA 1'TGTACATGG MyGr'rATC CATCTTGTTA AACGCACATA GGTAACTTGG TCTTCGAACA CTTGCGACTC CTGCATCACT TAACAAGGAT TGCATCAAAC TGATATCCTT ATAAGTCTTC AAATTCTTAA AGACATCATA ATAACTATCC GA'rTGAACAA TGGTCTTCAC AAGACTCTGC AAACATTGrr TGGTTTCTCC TTCAGACATA TCCGCTATTC CGTGAATCCC 'rCTTAGTAC'r TCTACTGCGG CCACGTGCCC CTCGCTATTT GCACGACTGA TCGAGCGTCC ACGACTCATA TCCATCAACT CTCCATTCAC CAGCAAAGGA GCAAACGATT TATCAATCCA GTGG'rACATG GTTTGCATTT TATCTTTATC GATTGGATTC TTGGTCTTTT GAATGACTGG CAACAGTTGA GACAGOCCAT CAATCAAAAC ATTCCCATAA GCACCCGTAT AGGCAACArr GGTG'rGGTCG ATATAGGATC CATCTTGATA AAAACCTTCA CCTTGGTCTA CCAACTTGAA CACTTGCTCA ATCGAGCGAA TGGTAGAAGA AATTTCTTCA TCATCCTTAC GCAGTAAACC AGCTATTAC-r TTTACCCTTC CCATATCAAC TAAGTTTCCA CC'rAGACCCT TGAATGG7'T ATCAGTCGTC TTTCGGAAAT GTTCGGGATC TGGTACAAAT 7MTCAATCA CATCTGTATA 7TrTTAATT TCCTCATCAG AGAACTATTC TTTCATCAGA GACAAGGTAT TGTTGA74GGC ACGAGGTGTA CCGATTTCAT AATCCCACCA GTTCCCAACA ATGCCCI-IT CACTATTGTA GACATGTTTA TGCATCCATT CCATGGAATC CCTGACTGTT CGAACGACAG TTTCATCTTG ATAATAACGA GAAGAAGGAT TGGTCACTTG CTTGGCCATC TCCTCCAAT'r TCCGATAAGT GGCAGTCAGA ?TrTGCAGACG TTTTATAATT- TGAAAATTT TCCCACAAAT AGGTGCGGTC CGCCTGACTT CAAATACTGG ATAGGCTATC AGCTACCTrT CCTTCCAATT CCTGGX-rTAA 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 
5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420
TTTGGCCATC
GTCATCCAAA
TITCACTTCC
TAA.AAT"TCCA
TGTrCATTTT TAGAATCATA GTATTGArC CCAGCGATGA CGGTC!TGTGT ATGCATCCTT AACAGAGGCC AGAATCTTCA TrGCCATCTr TACTGACAAT GACATTGGTT GTCCCI-rCCT TTTTTGACTG AAGCAACGTC AGGATTTTCT ACCTrATAAG
TGCCATTCCA
AAGGAATCTT
TAAGAGGTTC
TATAGTCCGC
GCTGTTrATC
CCTTAAAGGA
CATCTAAAGT
TTGCAGAATT
AAGAGAAAAA ACATGrTTTT TGTrTGAGAA TCCTCAGAAA AACAGTCCCA GTrCCTG TT CGGGCTATAG TCTGCTTCA.A T1'CCAATTrGG TAAATCAATC GCTGGTCTGC TACCTCTACC CATAGAATAA CTCCAGCN'G TGCTCTGCCA GTCCTTTGTI'
TTTTCCTCAA
AGCTCAATAT
ATTTTATCAA
CCTGACGTCG
CCACAATCGC TTGTCCTrAC CACTTTCCTC AATGATACGA ATTATCTGTT TTAATCTTGA AACGCAGTTT ATACTTTTTC ACGGTGAAGC GCTGCCCTTA AT'rTCTCATG GC'N'GAGATA CTCAATGACT CGAGTI'GAGG CATCTGCACT ATTCTTCTGG CCTGAGCTTT GCTTCCTGTC CGG ACTTTGGCAA TCCCGATTTT TTAGCTTCAA TAGGAACCAT GTGATAOCCC CATCCTrAGC TCrACCCAAG CTGACCACCC
6960 7020 7080 7140 7163
INFORMATION FOR SEQ ID NO: 68:
SEQUENCE CHARACTERISTICS:
LENGTH: 9244 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:
CGTTATAACA TACATGTAAG CGGTACCCAA AATGGTGCCA AGTCAAAATT TTTAAGGAGG AAAATACATG TCTTCACATC CAATTCAGGT CTTCTCAGAA ATTGGGAA.AC TGAAAAAAOT TATGTTGCAC CGTCCAGGCA AGGAGTTAOA AAACTTGTTG CCGGACTATC TTGAAAGGCT TCTTTTTGAT GATATTCCTT TCTTOGAAGA TGCTCAAAAA GAACATGATG CA'N'TGCCCA AGCTCTTCGC GATGAAGGAA TTGAGGTTCT GACCTCTCCA GAAATCCGCG ATCAATT'TAT CTACCTAGAA CAACTCGCTG CTGAATCATT CGAGGAATAC TTAGACGAAG CCAACATCCG TGATCOTCAA ACCAAGOTTG CTATTCGTGA ATTGCTTCAC GGCATCAAGO ACAACCAAGA.
ATTGGTTGAA AAAACAATGG CTGGGATTCA AAAAGTTGAA TTGCCAGAAA TTCCTGACOA AGCTAAAGAT CTAACTGACT TAOTTrGA.ATC AGAGTATCCA TTTGCAATTG ACCCGATGCC AAACCTCTAT TTCACTCGCG ACCCATTTGC AACAATTGGA AACGCCGTAT CGCTTAACCA CATGTTTGCA OACACTCGTA ACCGTGAAAC ACTCTACGGT AAGTATATCT TCAAATACCA CCCAATCTAT GGCGGAAAAG 'rGGArGGT CTACAACCGT GAAGAAGATA CGCGTATCGA -AGGTGGAGAC GAGTAGTTC 7""CTA;ACA CCTrGCA GTAGGTATCT CTCAACGTAC AGACGCAGCT TCTATCGAAA AACTTrTGGr GAAAGTTrTG GCCrTGAAT TTGCTAACAA CACTATGGTA GACTATGACA AGTTCACTAT TTACTCAG -r ACTTACGAAA ACGAAAAACT TGAACTTCTT GCTCAAAACC 7rGGTGTAGA
CAACATCTTC
CCGTAAATTC
TCACCCAGAA
TAAAATCGTT
AAAAGTTCAT
CAATATCGTA GCAGCTGCGC GTGAACAATG GAACGACGGT ACCTGGTGTG GTAGTTGTTT ATGACCGCAA TACCGTGACC CGGCTI'CGC TTGATTAACA TTCGCGGAAG TGAATTGG4.r AAGAAAAATG TrGGCTTCAA ATGCACTrGG ATACTGTCTT ATCGAAGGCG ACCTrCACGT GAAGAGAAAG GTGACTTAGC TTG.ATTCGTT GCGGTGGTGG TCTAACACTT TGACCATCGC AATAAGATTT TGGAAGAATA CGGGGCCGTG GTGGACCTCG CTGTTCGATA TTCGTCAATA AATTCAGTAT TCCAAGCACG GAATACCTTA TTGGTCTTTC CACTACCTTG CTGGCAAGA-A
TTGTATGTCT
GAAAATGTAA
CAGCT'rCTTA
AGCTCACTTG
ATGCCATTTG
AAAATAGAAA
GCAGAAAAAG
AAAGATTTGA
AACGTGAAGA AGTCTAATCC CAGGAAATAA TAAAATGACA ACTm'ACCCG TGCAGAGTI'A AAAAACGCAA TATTCAACAC a. TATCGCTCTC CTATTTGAAA TATCGACCTT GGTGCTCACC AGAATCTACT GAAGATACTG CGGATTCAGC CAACGTATGG CGGTCTAACT GACGAATGGC AAACTTCGGT CGCTTGGAAG TGCCAACAGC TTGCTCGTAA AAACATCTAC TCGTACTCGT GCAGCCTTTA CAACTGCGGC CAGAATACCT CGGAGCAAAT GATATTCAGT TGGGTAAAAA 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400
CTAAAGTATT
TTGAAGAA'TT
ACCCAACTCA
GCTTGACATT
CAGGTGCTAT
GGGACGTATG TTTGACGGGA TTGAATTCCG GGCAGAATTC TCAGCCGTTC CAGTA'rGGAA AATGCTCGCT GACTAC'rTGA CTGTTCAACA
GG'TATACTGT
CCTTGGTGTC
TGAATTGcCA GGTGATGGAC GTAACAACGT AATGTTCACA TCTTCTCACc GAAGGATTTG CTAAAGAAAG a a a. *a a a AAAAGAACTC TTCCCAGAAA AAGAAATCGT 'rGGCGCACAT GTTCTCATCA CTGAAGATGC TTACACAGAC GT7"rGGGTAT CAATGGCTGA TGATGAAGCA GTTAAAGATG CAGACGTTCT AGAAGACAAA TCGCAGAAC GTGTAGCTCT TCTTAAACCT TACCAAGTCA ATATGGACI' AGTTAAAAAA CTTCCTACAC TGCr'rGCCAG CATTCCACGA TACTCACACT TGAAAAATTT GGTGTAGAAG AAATGGAAG'r AACAGACGAA TCGCCACTTC GATCAAGCAG AAAACCGTAT GCACACTATC ACTTGGTAAC CTTTATATTC CTAAAGTATA A'TrTAGATA GCAGGCAATG AAAACTTGAT GTTTATGGTA AAGACGTTGC GTC7"rCCGCA GCAAGTACGC AAAGCTGTTA TGGCTCCTAC ATAAACCGTC TACCAACAGC 558 TATGAGGGCT GCGACTAATA CTTAGTCC GGTCCTCTTT TATGTAATGG TAATCTArrA TTTC'ITATAA AATATGTGAA AAATCA'N'AA ATTGAAATCT AAACGCZATTC TATGG G GATAAAGGAG AATTwATGGC AAATCGTAAA ATTGTAGTAG CT'rTGC4ACG AAATGCGA?1' CTTTCTTCTG ACccATC-AGc AAAGGcTcAA cAAGAAGcTT TAGTTGAAAc A~cTAA~rCAT CTrGTAAAAT TGATTAAAAA TGGAGATGAT GTTGGGAATC 'rCTrGCTCCA ACATTTGCCA CTCGACTCAC TTGTCGCTAT GACAGAAGGT CAAAATGCTC TCTNGGATGA ACGCATCGAA GTCGTAGATA AAAATGATCC AGCTTGTT TCAGAAGAAG AAGCAAAAGC AGAAGCCGAA GGCCGTGGCT GGCGTAAGCT CGTTGCCTCA ACCATCCGTA CTCTTTAAA TAATGGTCAA CTGATTATCA CTCACGGTAA TGGACCTCAA TCAGACTCTG AAAAGAACCC TGCCTTCCCA AGCATCGGTT TCTGGTrGAA AAATGC7r'r AAAAATGTTG CCTCTGTTGT AACGCAAG1TT A6ACTTGAGTA AACCAATCGG TCCTTTCTAT AAAAGCGGAG CGACTTTCAA GGAAGA'rGCT CCAAAACCTG TTGACATCAA AGAAATTGAA GTCGTCGTAG CTGCAGCTGG TGGCGG'rATT CCCGTCGTCA AAGAAAACAA TGGACA'rTTG ACTGGTGTCG AAGCGGTTAT TGATAAAGAC TTCGCTTCCC AACGTTTGGC AGAATTGGTT GATGCAGACC TCTTCATCGT TTTGACAGGT GTAGATTATG TAN'GTTAA CTACAACAAG CCAAACCAGG AAAAATTGGA ACATGTGAAT G'rTGCCCAGC TGGAAGAATA TATCAAACAA GATCAGTTT CACCAGGTAG CATGCTTCCA 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 39Q0 3960 4020 4080 4140 4
AAAGTAGAAG
TCCCTGAAA
TAAGTTGTTT
TA'rTCT'rGAA
AACAAATATA
ACACTTT'GTT
CAGCL'ATCGC TTNTGTCAAT GGTCGTCCAG ATCTAGGCGC CT'rGATTGAA TCTGAAAGCG TACTAATAAG ATGTATTCTA TTTCTAGTAT AACATGTACA ATATTTCAAA AGATACTACT AATAGAAAGC GTTTTCTTGA ATGTrTATTTr AGIACATCAGG AGGAAAAACA AATGAGTGAA
AAGGAAAAGC
GAACAATTAT
CTTTATATCA
T7'rAGACTTT
AAGAAAGTAG
AAAGCTAAAA
ATI'ATGGCAG
AGTTATTACT
TGAAAAAGGA
AATTAGAAAT
AATATGGTAA
TTGGTTTTTT
AAGGGTTTAA
TGCTAACTTG GATGCCTTCA TCTTACACCG TATTATTGAT AATCATTGCT GTTTATCCCT GCGGGGGCCT 'rTATAGAAGG TATTTACGAG
AGGGATTTGG
AGGTTCGCTC
TGGTGGTTTC
CGTGAAGAAG
GATGTCCTCA TGGCACCGAT ATTAAAGAAA CGAG3CGCAGC CTTGGCATTG TCAACAAAAC TATAAGGGCC GCGAAAAAAT
TCGGGCTATG
GATTGATGTA
TGGTGCTCTT
G7'rAATT'G
AGAAACAATG
ACTCAGCCTC AAAATCCACA CTAGGTACTC ATCCAGAGGA CCTCTTCA TCCTTATCGT
GACGTAGGGA
GTACTGATGC
GCCTTCTATC
T1TGCCTCTAT CTTrGTTGC CACTCCTTGT CCTCGGTGGT ACAACTTATG GTATGGGTGA GCCAGTrATG ATGGCCGTTG GTTr'rGATAG CCTGACTG.GT G7TrGCAATTA T'N'TGCTCGG TTCTCAAATC CGCTC=TGG CATCTACTCT GAATCCATTT GCGACACCTA TTCTCAGC GACTGCGGGA GTTGOTACAG GACTGCTCrr AGTACTTGGT TAAGTCACTG GTTTATAGTA TTCATCTGTA GAATCTACAC GACATTCATC TTGATGGTA'r 'rGATGACTTT AATACTTGGT T'ACTTCTGCA C1'AGGTACT TATCCTGATT GGTGTTATTr TCGTGCTGCT GACI'?GCTCA TATCATGAAC GACCGTATGA CGGTCTATCT TCACAAGTCT GGGACGGTAT CGTACTTCGT CTGATCTTCT GGrTACCTT TTGTTTACCG 'rTATGCGGAT AAGA'IrCAAA AAGATCCGAC CTCGCAAAGA AGA'I'TGAAA CACTTNAACG TAGAAGAATC TTAGCAGCAA ACAAAAATCA GTTCTC~rCT TA~rGTGTT TGAGCTTCAT TCCATGGACA GACCTTGGCG TTACCATTrr TGACTGGTCT TCCAGI-rATT GGTAATATTG TCGCTTCATC GGTACTTCCC AGAAGGCGCA ATGC'?CTTTG CC?1'TATGGG ATGGTCTTAA AGAAGATAAG GTGTTGCCTT GATCGTACG TTACCGATAC AATCCTCAAC TTATCGT'rGT AACrrATATC ATTATCTCTT CCTTCATGA.A
ATTGCTCGTG
TGGCGTA.AAG
TTCTATCTAC
GTAT'rCAAGT
AAGGCTTGAG
CTATGTCAT'r 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 CTTGATCCCA TCTTCATCTG GTCTTGCCAG AGAATTTGTA AATGTCCGTC CTAGCTTGAT CTTGAACTTG ATTGCACCAA CATCTGGTAT CAACATTGG'r ACTTGGTGGA AArrCA'rGGG CATCGCCCTT CTTCTCCTTG GAACCTTCCT TTCCATGAA-A ATAGATATAA CAA.ATCAAc7T CGCAACTATG GGTATCATGG CTCCACTTGG TATCACTGCT TACCAATCTG CTTCAGGTGT TGTGATGGGA GCTCTTGCAC TTGGACGTA'r CAAACTCGTA GTCGCTATTA TTGTAGTGAC TCCATTCCTA TAAAATAGTG AGTGAGGTGA TAAAGATGAA TTTCTTATAT CArrAAAAAC CTTGATTTCC TATCCFCAG AATCCAAGAT GTCCTAGAAA TCTTGACCCT AAAGGTTATr CATTCTCTGT CAT'rTGOGATG ATTTGAAGCA ACTATCAAAG CCCTTCGCTC GCAGCTCTCT AAAGCGCGTA CGCTTTATCI' CTACAATACC ATCGAAGAAC GACCTATGCr GAAAAAGGGC AGAGCTTGAA GTAGGAGGCG TACTCAATGA AGGAGAAAAT GGAACACCTT PTGGACAAGC AAACTTTAGA GATTTG'rCGA GACATAGGTT TCACTACCTA ACGGATATGC AGAAATCGGT CAGGGAGCAG AGCTTCTGC TTGTTCCATC AGGTGATGAA GCAGATTGGC AGACACCGCC ACGGCTGCGT ATTCGGACGT GG'rcTCCMNG ATGATAAAGG ATGCAGTAAA AAGCTrGCTG GACCAAGGTA TTCAGTTCAA TTGGTACCGA TGAGGAAACC CTCTCGCGCT GCATGGCACC AGGCCAGTAT GGGCN'TGCA CCTGACTCAT CTTrrCCTCT TTCTACAGGT CAAACTTCAT GGCCCTGGAT CGGATCAACT CCTTTAACGT TG'rACCAGAC AAGGCCAACT ACCAAGGTCT CCTCTATCAA CAGCTTTGTA ACGGTCTCAA AGAAGCTGGT TATGATTACC AAACCACTGA 54 5940 560 ACAAACCGTA ACCGTT =CG GAGI=GCCAAA GCA'rGCTAAG GATGCTACTC AAGGTATCAA 'rGCTGTCATC CGACTAGCTA CCATTCTTGC TCCTCTCCAA GAACACCCTG CTCTCAGTTT TCTTrGCAACA CAAGCACGTC AAGACGGCAC AGGAAGACAA ATC7"rrGGTG ATATAGCAGA TGAACCTTCT GGTCACCTAT CCIAATGT CGCAGGTCTC ATGATCAATC ATGAACGTTC TGAAATCCGT ArrGACA1TrC GGACTCCTGT CTTAGCTGAC -AAGGAAC TAGTAGAGTT GCTTACAA3A TG;TGCACAAA ACTACCAACT TCTATACGTC GCAGAAGACA GrAAACTCGT GACTGGCGAT AACAGTCC1'G CTATrCATC AAATTG;TGTA GCCTTCGGCG CCTTA'rTCCC 'I'GAATGTGCC GTTCTAGAAG ATTTGTACCG
CCGCTACGAA
TAGCACACTG
CGGTGGTGCC
AGGAGCGAAG
TGCTATGGAT
CTACCAAAAA
AACTTTTGGG
TAAGAAAGCA
GTCTAAAGTC
CTGGTGTTTC
TCGACTTGCA ACTTAATCAG TGCACCCCAA AACTTAGACA GTTATCAAGA TAAACTTCAG TTTrCAAAAAG ATTTGGTGTG CTTTATAAAT CGCN"TTC CCGTGGTATT GCCTGATT= AACAT'N'TTA GAAATCGATT ACTATATTTG ACCCACTTCG TACTATCAAT TACTTCTGAT ATTTCTCGAC ACGCTCAT-rA T7'=TCAAA AATATATTTA
GCAACTGTTT
GAATAAATCT
ATCTATGAAC
GATCTTTCTG
AG7'rTTTGCA GAGTTTrGACT ATCTAGCGCC ATGCAAATCT ACCAAGAAAA ACTTTTCCTC GCACCATGCC CAGACAGAAC ATCAGCCA.AA ATTTATGCCG AAGCCGTCTA AAATCGACCG ATTAATGAAC GTGTTTT'rT ATcGAATTGA AGGACAAAGC TTCAAACAGC ATCTGAATCT TTGAGATGAG GATAA.ACTCA AACNTTTAG TAGTACACCT CTCCTTCTAA TccTT'rCTT AT7rTCATTT'r AACCTCTrG'r AATTTTTCTT AATAGGTTTT AACTTACCTA TGCTTATGTT T'rTCCTAAGA TTCTTCAGCT TGGTTTTTGT TAATGCTCCC ATAAGTTCAA AGCr=CAG ATAAATCCTG 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 ATAGTATATT GAAACTAGAA TGACTGTCCT GATCGATTTG TCTTTAACGG CTTTATTCAT 7rrCCGTrTGT AATTTATTGT ATTTGATCTT TTTTGAAGGC TCAGATAGCG GT'rTGTCTTC A'rTAATTTGA AACATAAGGA ACAAATCCTT CATAGTAACC AAGCTTGTI'T TCTAATCAA ACCATTGCAA CTCAGATC CTCATCCAAA TAATGACTTG AAATTAGTGC TGAACTCGTr TCTGTATCCT GTACAGCTG AGCACCCATA CCAGCAAAAA ATAAACTCGT 'rCCTAGCAAG ACCGAACAAG CTCCTATTGC ATATGGCCTC AAAGAAAAAC GCTGCTTTCT CTCAAATTGA AAI-rCTTTCA TCCCATCTCC CATCA'FrCAT TA'IrACTGTA TATrrGTAT ATCAGAAATA cTrCTATTC ACAAATCTTT CTAGTTATTC CCTTATCATT CCTAATTAAG GGAGATAACA TACAATAATT 'NrTAGTTAAA TGTATATCGA TGTx~TrTT TTTTCTTAAT AAACCCAATA CAAAAAGAGC CTGTTACCAA GCTCI=CA CTCAATGAAA ATCAAACAGC AAATTAGGAA ACTAGCCACA GGTTGCTCAA AACACCGT'rT TGAGGTTGCA GATAGAACTG ACGAAg'rCAG CTCAAAACAC TGTTTFTGAGG TTGCAGATAG AACTGACGAA GTCAGTAACA TCTATACGGC AAGGCGACGC TGACGTGGr TGAAGAGATT TTCGAAGAGT ATTAflTCTA'r TATTTCTTCT CAGCGCGAAG GGCTGACAAG
ATTTGTGTTC
CCT'N'AGAGG
ATTTGTTCTT
GGATATCATC
CAAAGGTAT
CTGTCATGCC
CCATCCACAA TCGCCTrACG GCTrCTGCTT CAGCTGCAGT GCGACCCGCT TACGTTGCGC GGTTCGACCT TGGTAATCAA CACACCATrr GGAGrAI-rG GTAAAAAGAT AG?1rGA~T CAAGGTATCC AAATACTGGT TGGTCAAGAG GATAGACATG AACATTGGCT TCCTT GAGTT CGGTGATAGA CTCTGCCAAT TTGTrGGGCA ATCCCCACAC CATGAAGGCG GTCTrCT GACAATTTTA ATCTTGTCAG CTTCCGCCA.A r'rCTTGTGCT CGCATGATT TCATTCATGG ATTGCTTAAC 'FrCTGCATCT GGTTTTCACG ATAATGTAGC CGTAAGTGGT CATrTCTrCT 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 GCTACTTGCT CTTGAACTrC AAGGGCAATC TCATCTTM GTTAATTTTG GAACAGAAGA GCGAAGAGCA TCTTCCATAT GGACGTATGA GTTTATAGTA AGCATCrGTC ACGCTCTGCT GCTACATTCA TCATAACGAA CACATTGTCC TTGGTCTTAG TGCAACAAGC GCAACTGAAT CCCTGCTGCA ATCGAGTCAA TGAATACCGC TAT'rACCA.AC CTITTTGGTAT 'rTCCCAAAC TCCTGACGA.A CCACATAAAC TGTACTCAGT GTGACTATCA ATCAGAAAAA TCATGAAAAA TATTGCCATA ATGGAACCTC TATACCACAT TTAAAGAAGG CTGTGCCGTT TTrACTGCGA ATTAGAGGTG AATTGTCCTA TTGTCGTCCA ATCTCTTCCT GCAATCGTTT CTTCTAAGGT TGGCATAAAT GGATTTCCTG
TCTCAAACAA
AAGATTTAAT
CGTTGACACG
TCTCAACCAC
TCCCAAAAGG
GTTCAATAAT
CCAATAGGAG
TTCATCCAAG 8400 CTGAGAT'rCT 8460 GTACTGAGTC 8520 AATATCACTT 8580 CAAGCGAATA 8640 CGCCACCGAC 8700 CACACAAACA 8760 CACAAGTATT TTTCTAGTAT TTTTTCCTGA AATGTCAATA AAAA'rAACTC 'N'TATAAAAG GTGCGCAGGC ATCAATCAAG GCATrCTrAG AAAGGTATTC AAAGTCGAAA TCr?1rCTT CAATACCAAG TTCAGTCAGT 8820 8880 8940 9000 9060 9120 9180 9240 9244 TTCTTAGGAA TACCTACTGT CTCAGAAAGC TTCTCAATCT CAGGAArCGC CATTCTTGAT CTGATTTACC TTCTACATGA AGTC( GCTrCTGGTA CACGTI'TAGC AN'?rCACGT TCTA
ACGG
INFORMATION FOR SEQ ID NO: 69:
SEQUENCE CHARACTERISTICS:
LENGTH: 8898 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
'CAAGG CTTGGCAAC ~AACTG GTAGCAACAT
ATAATCGGCA
ATTGCGGAAA
GGCACAGCAC
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:
GATCTGAACT TTATCATCAT TTCTGTCTA ACTT~?rGGGG TrTTTGATTTG ATGTAGTTGA TGCTAAGACA AGAA 'rGTCA TTGTAGGAAC GGCAATTGAG ACTAGAAAGC ATACCACCGA AAATCCAGGT CCAACAATAG CGCTCCGCCA ATTCCACCTA GACGTTGATT CCCAAGGTAT ACCAAATTGA GTCTTAAAGA ACCAAGTAGA CTAGTTGACT TGGGAAATCA AAGCGTCCAA AACTTTAACT AAGAAAACAG AACATG'TCT GCACGGAAAT AACCAATCCT GCTACAAGCA
AACTTAATTT-
TGTAGTTCAG
TACCATCTGC
AAACATAAGG
CATAATAAAA ACACCCCAAA TCATTGGACT GACGTTTTTT TTTGGI'GCG ACTGCTTC TGCAATTTGA AGATAAACCG AACCGATAAC AGCCAAACTr TGTGAAAGTC TTGGATTCCA TTTCCCAAAG ATCATCGCAG TTGTCACTGA GAAGTTAACT GAGATTGATT GAAAACCTGA AATAATAACC CCTAAATATC CCGCTGCTTG AGGATGTTCA CCCACAGAGC
AGTTAGATTT
TGTATGCTTA
CAAAGAAGGC
CTGGCAC'TCC
CAAAGAAGAG
CAAGCGAT
GCGCATAAAT
TCATCTTGTA
GGAC-ACGAAG
GAATAAACCA AGCAAGGAAT GAGAAGGCAA TCGCCAGATA TGAAGAAGAT ATCACCAATC ACTGGGATAT TTGCCAAGAC AAGTTTGACT TAGG'PTGTCG GTTTGTCCTT TGTTATAAAG CCAAGGCAGG CGCCATCAAG TTCAATACCG TACCGCTGAC
GAACCGTCGC
AGGATAGCCA
TGCTGCGTGG
TCGAG'N'GCT
TTCAAGGTTA AAGACAACTC CAGAAAAGGC ACCCATAACC GTTTACCACA CCACCACGTT CAGAGAAAAC ACCACCGATA TGAG'rAAATC AGCATAGAAG ACACCAAGAG GGGGAGCAAG TTACCTCCTT TAACTTGT'rr TTTCGGTT'rG ACAAAGCGTP ACAAAGAAGA TAATAGACGC TGTTACAATG CTCACAAGCT TTCATACCAG GAGCCCCAAC T'rGGAGAACG CCAAATAGGA ATTGGTGAGT TGGCCGCAAG CAAACTAACC GCCATTCCGT .GAACCTTGAA CATAGACGTT CTGGAAGGTT CCCAAACCTT GCCAAGGCAC CTGAAATAAT CATAGATAGG ATAA'rAG'rCC TATTCTGAAG CATGTGGATT AAGACCAACT GCACGGATTT TTGAGCATGA ACCAAATAAC TGCAACGGCA ATGATGGCAA ATGATAGAGA AAACACTACC GCTCCAAATT GTrTCTGCAAA ATAAT'rCCTT CAAGGCCAAC Cr'rGTAAAGA TGAGAGGTGC GTTATAATAG ACATCTTTAC CGATA-AGGTA ATGAACACTG CAGATGGTAC CTGCGCCGCA AGGCTGCAAA GAGTATACCA TAAATCCGAT AGCTAATGAC CAACAGCTCC ACCAAGACCT GCT TGCCAGA AATACCAGCA CAAAACCAAG AGTTGTTTTC AGAAAATACC AATATTCATC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 CGTGAGTTAc CAGTCAACTC AGCCAACCAA GGTGI'CTGAT AGGTTGCATr AGCCCCAACA 563 CGAATGGTCG~ AATCTGTACT TTGCATGAAG TCN'AGCGA AAGCATGGAT AA.AGGCATTC CCTACATACA AGACAATGTA GTTCAT~CATG. ATGGrrACAA TAACCTCTGA CGTCCCTAGA TAGGCCCTAA GAATACCTGG AATCGCTCCG ACAATCCCAC CAGCAATCAA GGCAATCACG ATGGTTGCTA GAATCATCAA GGGACGGGGC ATA'rCTGGAT GCGACAGGGC AAACCAACCA CTGAGAATCC AACCTGCCAA AGCCTGACCA GGAAGTCCGA COTTAAAGAA ACCAGCTCGA CTGGCAACGG CAAAACCAAG ACCAATCAAG ACCAGAGGAC CCATAGCACG GAAGATTTCT CCAATCCCAC GCAGACTGCC AAAGGCTGTA TAGAACAATT CTTCGTACCC CCAAATAGCA 'rCATAACCGA AGATCCACAT GACAATGGCT CCGAGTAAAA I'TCCTAGGAA. 
TACAGAAATC AAGGGAACCG AAA7"TrGTrG TAATTflTTTA GACATCACTC TTCTCCTrrC CCAAGrTTCC ACCAGCCATC AAGACACCAA GT'rCTTGTr A7'rGG~rG~r TCTGCTGATA CAATACCTTG AATCTTACCA TCGTGGATAA CGGCAATACG GTCTGAGACG TTTAAAATCT CATCCAATTC AAAGCTGACA ACAAGGACAG CCTTGCCAT-T ATCACGCTrCl TCAATCAAGC GTTTGTGGAT 9* ATACTCAATG GCACCGACAT CCAACCCACG AGTTGGCTGG CTAACGATA.A GGAGATCAGG ATCTCGATCA ATI-rC-ACGAG CAATAATTGC TN'rGTrGA TTCCCCTG AGAGTGCAGC TGCAGGAACT AATTCACTGG CAGCGCGAAC ATCAAACTCT TCCATCAGCT TTT'rAGCATA AGAAGTAATA TTTGAATAAT TCAAAA'rTCC ATT7rTACTA T.QTGCTTCrr TATAGTAGGT TTGAAGGGCA ATATTrCAG ATATCATCAT TTCCAAAATC AAGCCATCAC GGTGACGGTC 7rCTGGAACG TGCCCAACAC TTAGrTTCGT AATCTGACGT GGGTGCAAGC CTACAATrGA ATCTCCTTTT AGCTCA.ATGC TACCAGATTC AACCTTACGA AGACCTGTAA TGGCT'rGAAT CAGTTCAGAC TGACCATTTC CATCAATCCC CGCAATAdCA ACAATCTCTC CAGCACGAAC 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 ATCCAAGGAC AGAlwT'rTAA CAGCTGGAAC ACCACGGTTT GATAGACAAA ACCACTTCl-r TTGGTI=TAGA GGCTTGCTTC ACGTCCTACC ATCATTTCCG CCAAATCAGC ATTGGTAGCC AATTGATTTC CCACGACCGA TAACTGTAAC ACGGTCAGAA TTrTGTGGGTA ATCAAGATAA TTGAT,TCC TTCTTTGACA CAACTCATCA ATTTCTGATG GACTCAAAAC AGCCG7TGG'r AGCCCCCCGA TA.AAGTGTTT TTAAAATTTC TACACGTTGT TGCTACCTTG GCAGAAGGGT CAACAGCTAA GCCATAACGT 7TrTGCTAGCT CCAGCGATAT CTAGCACACC AIV=?AGTC TCATTGACCA CCAAATCT'TT TCTGTT7TTAA AGGAAACArA CCTGCAATTT CAACGGTN-TC ACTGCTCGAA TTTCATCCAA AGATT7TTTCA TAATAGCCAT TCGTCAAAGA TAAGGATATC TGGGCTCCAA CTGAGATATC TCAGAAAGAG CCTTGATTTC AATTCACTAC CTAAAATGAT 564 ulr'rTCAGCC ACTGI'CAAGG CTTCAACCAA CATAAAGTGC CAAGCTAGCT GCTrrAGATG GGGAG;TCGAG ATTGACAACT ACCACTAGTT GGTTCAAGAA GGCCTGCTAA CA'rG=CATT A'rrTTCTCCT AAAAGTG.CAT GAATTTCACC T~?rCGTAGG GGCAACAAAT CCACCAAACA CCTTGGTAAT ATCACGCATC TGGTGAACCA TCCCGATTCC TGACCGTTGA CCGCCATTTC AGCGTGGACT TACCAGCCCC TGCAAGTTGA Trr 'CTCGTT TCAATGACAT TrCCGrGMC CATGTGCTCT TCCI'TCAGA GTCTTAT1-r AGCAAGCTTT ACTTAGACAA AATGACTTTG GCTTCCTAAG AAATGACTTC 
CA'rCCATTAr ATTTTAGCTT TTGCATCT'rC GACAGCTTTT AAGTCAACCC CTTTATCCTr CAATGAGTAA CTTTCTGCCT TGTTAGAAAT ATC7=TACA AGAACAAAGT TTGATTCTTT GCCATCTTTA CGATCAACAC CGATAACCCA AACT7MTCA GCCTCTGCAA AGACACCTGC ACCTGTACCA GCTGCGTATT GTGCGGCTGC AATTGV'rTA TAGTCAACTT GGACTTTGAT AGATGGGT'CT TCAAAACGAG AGATAA=TC AGAI'rCGATA CT'rGT'rTTc CTGCAGCCAC ACCTGCAAGg ATTTCAATAA AACTTGCCTAG ?TCTCTAGT TCTCAACTCT TAAAAAAGCG GCCCTTGGCC TTTTrCAGGAA ClTrTTACGCT TCCATCAACG TTACCT'rCTT CTGAAAGGTT TGTTACTGCC ACGATCACTTr GACCGCCAGG GAATITCTCCT GTTCTACCAA CTTGTTTCAA AGTAGATACA GAAGTGTATT TACCTTICTGC TTCrTGGTCA TTTTCAGGAC GGCTT'rCcTT GAGAGATM CCAGCTACTT GGTAAACAAT ATCTGCACCG CCTTTAGCCG CATCACCAAA TGAACCAGCG ACTCACGCAA CACCAGCCTT GAATCCTGCT ccAccTAcAA AAccAAcT-rG TTrrGTCTTA TAACCTGACT CATTATCAGC GA.AAGTTACG 0 0* 000 0 0* .0 0 0 0 00 0 0 0 0*0000 S 0
3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 CTCGCAACAT TCTTTTGGTC TTTAATCACA 'TCATCA.ATCA AGACATAGTT CAAGTCAGTG TGTTCT'rrTG
TTGTAACTTC
AAGTAAGTGA
TCCCAAGCTG
GCTTr'rGTCT
GCAAGTCCAA
ACTGAACCTC
GACCGTGCCA
ATTCCCCGAC
CTTGTCAAA
CAcT'~rGACC CTGCATCTT AACTGCATTA TTA.AGGGCAA AACCAACACC GAAGATTAGG CAGCCGCTTG TTGCAAGTTG TTAGCGTAGT CAGCTTCACT TGT'rGATTGG AACCGTTATC TTTGAAAGA T'rGTGTTCTT TACCCCAAGC CTGCAAACCT ATTGGTTGAA TGArr'rGTCA TCAACACCAC CAGTATCAGT GACGArrGCT TCACATCAGA AGATGAAGCT GCGT'rACGAG AAGAGCGGTT ACCACATGCA CTGCTGCCAC TGCAACTAGG CCAAGACCTA GCCAT'TCTTT CTTGTTCA'rr CTAAATAAGA TGTGCAACGA TrrGCAAGT ATGGArTGGT TGGCCACAAG CTCAGAGAGC GACTCAGACT AGTTTAAGTC TGTAAAAGAG TATGGAAGTA CGTCATCrCG
AAATTCACCC
ATAGACAATC
ACCCTCGATT TATCTT1GC GACTAAGGTC ACT'TTTAGAT ATCACT'rGGC GACAAGCACC ACATCGCGAG ATCGGTTT AATTCTGAAA ATTCTCTTTG GCCTTCAGA'r ATAGCCTTAA 565 AAATAGCTGT TCTCTCACCG CAA?1'GGTCA AAGCATAGCT CCG1'GTAAAC ACTTCCGTCT TTAGCTACTA AAACTGCTCC GGACATAGGC ATGTTTGCTG GTTrTCAATTG CCAG7"rCAAT GCCAAMT=T C7"MAAAAT AGCTACCCCA GCTGACGrrC TCGACAAAGG CAAGAGCATC TGC-ATAAGAA CGAGCTCCAC TCAGATCCAA CTG7TCACG CArrAATG'rA ACATCTGCTA AAGCCAGTAG ATGrTTGAC AAACTCACCC CCAGCTrT ACT7TTCTT GGTCTGTCAG AAGGCAAGCT TCAATAATGA C7'rGCTTCCA CTACTGCGCG AATATCTGAC 'rCAACCAAGG GCTCCAACAT TGATCACCAT ATCAATCTCA TCTGCACCAT ACATTIICA ATATTCACTC GATAGGAAAG TGAGAATAGG CAACTCAGTA GTCGCCATCT CGATACGGGT CGCACCTGCT CGGCGGCCTT GACACCCATA TCGTAGCACC ACCAGT'rGAA GGGCCAATTG GCAAACAACA CT'rTCACTAA CTTATCACCA CTAAA'rTACC
TTTGCATAGC
TGATTTGAGA
TTC1-rTTGTC TCAAATGCCr TCACGCrGA AGTTGrrCCT CCCAAAGGGA AACCTACTAC TGTGCAAACC TTAACATCTC TGCCTTCAAG TCC71"[T'rA GCATG'rrCAA CCCAGGTCGG ATTAACCCAA :099 0 ACACTGGCAA AGTCATACTC TCTAGCCTCA GACAACAAAC GCATCTTGTT ?TAAAAGCGT ATGATCrATA TATTTATTTA CCATTTAGGA GATGA rCT ACAATTTCAC GGA=T7=N' CACArTrrG GAAATCTGTA ACTAGTTGAG GTGAATTTr' CA6ACAATI'TC ACCCTTTTGA ACGGAGTCTC CAATCTTCTT CATAGTCCAA GGCATCAGAC ??AACTGCAC GACCAGCACC CAAAGTCCAT AGCTGGAAGA GCTGAAATGA CACCCGT-rrC CATGAGCTAC A =TACAGGA CGATAGAGGT CTTCCAAG-rC TTTCCTCAAA CTTAGCCAGT GCTTGACCAT TCTCAAGATG VTTTAAC ATITTCCCAAA CCAAGCATAA TTTGAGCCAA TATCCTGACG TCCTTGACCT TGCAAAATCT CCAATGCTTC CAATCGCTCG TCCCAAAGGC TCGCTCATAT CCC'rAATCAC CAACCTTACC AAGATCTACC ATAGTTTGAG CCAACTCACG TGA6AGGCACC CTCACCCACA GTCACGTCTA GCAAAATAGC TATCA6ATTTG 7T1rTTTT AT'rTCArr"C GG='TCCCT CACTTCATCA CTTATTrTAA TTCATTTGTG TATACTTTTG TTCAAAAACA ATTCCTGT'TT CAGCCTCATG GCATAAAGAC CTGAGCAGGG ATTTCCACCA TCCACCTTGG GCTTGCACCA TTGGTGAACT TCTTCAACAG TTCACAAATA AAGTGGGTA.A AAGGATTTCC AGACGATT'rC TGCTACTGTC TTCCGTCCAA CGCCTCATCA ACCGTCTTCA ATCCGCCCCTI GCCGCAATT'r 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 9 0 909* 99 9.
9 9 0 9 TCTTGCTCAT CACCGAACTC GCA6ATCAAAG GAATCCTGTC GACAGTTGCG GTCACATCAC GAAGGGCATA GAGAAGCTTA TCTGCTTTGA CCAGCTGGTC TGATTGCCCA A'rGACAGATA CTCCAATATC CTGAACCTGA CCAATAAAAT CCTCTTGACT ACGTTCTACT 'rGATAGCCCT 566 TAATGGACTC CAATTTATCA ATTGTTCCGC CTGTATGGCC AAGACCACGA CCACTCAT TTGCTACAGG CACACCGAAG CTAGCAACAA GAGGAGCTAA AATCAAGGTT~ ACCrrATCGC CGACACCACC AGTAGAATGC 'rTCAACrr TCACACCATC AATGGCTGAC ACGTCAAACT CrGCCCAGT CrAACCATA TTCATCGTTA AATCAGAGAT TTCTCGAGTrC GTCArrCCTT TAAAATAAAC AGCCATAGCA AAGGCAGACA AGCCTTCTAT CAGCCATT-CA ATTTCACTTG GGATTAAATC AACTGCTCI'C ATTCITCAC TTTAAGGATT TCACAATTGC CAAACACATC TTGTTTTTTC TGGATGACGA TGGTCAAATC CTCGATGATT TCATGAACGA CTTGCTTGCC GTrCAAATCGC CCTTGAACrC TTGCATAAAT TCTGATAATC AG0AACAGTT CCTGATACAT AAGTCAGTrTC 'rTGACCGTCT CGTTrT'rT ACTTCTAAGG ATATAGTATC CC7rGTCTT
TTCCATCTA
TCCACCAATT
CGCACGGA
ATTAGATTGA
CAGGGCACGA
CAAACCTAAT
CAGACACTTG
AT7'rTrrCA GCA'N'TCrCT CGCCTGAACT CCGTAAAC ATCTAGGACT C;TCTCTCC~r GTCAACCAT'r TTCTTGCTAA CAAGTCCACT CTCAACTCAT TTTACTCATG ACACTATTTT TAATAAAACG AAAAAGACCG AACACTTATA AATAATAAAC AAGTTTATCA TCTATAATCA ATTATCAACA AAATTCCTAA
GAGCTAAATC
TGACCAAGGA
GGTTGACATC
AAACACCCGC ATCTGTCAAA GAATG'rCGTG AGCAGCGTCA ACCATAATTT GACTCAAATT1 AAGAAAGCAA GTCACGAAGC CATT'rAGAAC TATAAATATC GGCAGATTAT TATTTCTATT CAAAATGTTT AGATAAAAGC CACTTGGCAC TTGGAGCTCC TCCAAGAAAT CTTTACTT GGAGGATITGG AAATGACATG AATATCG'TCG ClTTTCATT GTGTTAATAT CAACCATGGT GGACCATAAC CACAGCCTAC AGCAAGAGTT GACTTCCAAA AAAGTCATTT T'rTCTCCCAA GGAT=TTC CATAG'rACAT GTAAATCGTT TACAAATTGA CATTTTCTTC AATCTCTTTC ACAGTCCAGA TAAAAACAAA GCTTAACCTT AAAATACTTT CCAACTGATA CGTTTATGTC GTGACATGTG GAAGAAATAA 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 AGGATTCCA AACTTGTCCA AAGTCGTATC AAATCTrCTA CCCTCTGTCG CAATCCGTAG GACTAAAAAG CAATAACTAC CCGCAGCAAT CCATTTCGTC CATCGTTTTT TAGTAAGAAA GCAATTAAGA ACGAACAAAT AAAGACAGCTr GTTACAATAG CATGTTCCAT CAAAAAAGTA AAACCGTAAT AGGTr'rCCAC AAAGCATCTA CCATTATCTG CATTGGTTCC 7 ATAAAA GGTAAAGCAA AACTTAAAAT AAAACAGAGT 'rCCAATATGT AACGTTTTAA GATTrTCATA GTACACCTCC TATAAGTTGT GAACTAAAAA GCCCCCTA TAAGCTTATA AATCAGTAGA ATCTATCTCC TATTTCATCA ATAAATTGAT CACTTATACT ATATACCATT GACTTACCAC ATTCAAGAAA CCCCTTTATT TTT=AGCTT 17rATGGTAT GATAGACAAA ATATCTAGGG GAAAACAAAT GACCAACGAA TTTTTACATT T'rGAAAAAAT 567 CAGCCGCCAG ACTI'GGCAAT CTTTACATCG AAAGACAACA CCTCCT1TTGA CAGAAGAAGA ATTGGAATCT ATCAAGAGTT TTAATGACCA AATCAGTCTC CAAGACGTrA CAGATATCTA TCTCCCCTrG GCTCAT'PTGA TTCAGATTTA CAAGCGAACT AAGGAAGATI' TAGCCTTPC AAAAGGAATT TTrCCTCCA IFORMATION FOR SEQ ID NO: SEQUENCE CHARACTERISTICS: LENGTH: 13188 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: TA'rCTTAACG aGGATTGGGT TTATCGTCAG TC'N'ATTGCC CTAATTGTGG GAACAATCCC 8760 8820 8880 8898
TTAAATCATT
GAGTTTGAAC
GCAACGATGA
ACAAAAAATT
TTGAAAATAA TCGGCCTGTA TAAAGAGCAA AAAACGAAAT TGAAGCGTGT GCAGGCAGAT TTGAGGTAAA TAACTTTCTr TCGATTATTC AAAGAAAkACC ACTTGCACCA AACATTGATT TATCACAAGT ACCTTCTAAA GTTAGAGATC CAGAAAAAGT TACAAAAGAA TCTCTGTCAT CAAGAGGTTG GACAATAGAA TCAGAATTTA CCCTTGAAGA TATGTATCGT AAGAACAATC ATATCAAAGA. AAAGATTAGG ATAATAGAAT TTAAAGGTAG AGGAAAGTAT CATCAGAAGA GGTGTATGAT TTCTTAAAAG GTTACGATAA CCTA.AGTTTA ATCGTCTGTA TATCTGAAAA TATGAAATTT GGTGACGAAA GGTGGGAAGT AGGAGATAAA TTGCCAGAGT TAATAGGTGA ATTATTGGAA AGCAATCTAA AAACTTTATC ACGT'rTAGTA AGTATAGAGG TAAGAGAAAA GTATAAGAGT TGTGAAATTG GCAGAT'I-r'T ACTGTAATCA TTITCATCAA CAATCAATGA AATAATCCTA ATTTCTTTTT GTCCTTCCGA AGCAATTTGT ACTGCTAGAC GAGCAGGTTG GGAAGGATAT TTC TTGTGCA TTTAAGCAAG CTTTATTTTT ATTCTAAATT GTATAGATAA TTTGAAAGTG ACCTAAAAAA
TTGTAGTGAG
TGGTGCTTAT
TTTAACTTAC
TACACCGAAA
GATTGGTTGT
AGATGGACAA
AAGGAAGAGC
GATAGAGGGT
TATCTr'TGTT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 CAACAGCTTC AAATATTAAG AGACAAAGAA CGGAAATTAT GAAAACGAAA CAACTTGT!G 'rCATCTGGCC TGATTATGAA ACTGAAAGCC CCTTATCAGA TCCCGATTGT GTGAGATGGT AACAACTAGC TrPGATGAAG GAAAAATATG GGCTACATAG CTCCTATCAT AGAVI'ATTGT AACTGAAAAA GTATACAGTA GAAATTACAG CTGAAAATCC AGATGAAGCC TTCTTGATGC AGATGATT
GAACGACTTG
CAGGACTATG
ACACTAGCAT ATATGAATAG GTAGATGTTT TTATTT'TG'rC AACAAAAAAG CrTrTTTTCT TATTTC?1'TT TATGATTTAA TACGGCATTG AGGACAATAG GGCTACGACG ATTCCGTTG AGAAGAACAT TTGGAAGGCT GTCGGCATGC
AGGCTCGCAC
CGAGTAGGCT
TGACAAAGAG
ATTACTGTTG TTGAGACCGA CACCTGCAGC GA77rGAAACA GCTGCGATAA GGAAGTTGTG rrCATTGrrA GCAAAGTCAA CACCGGCGAG GATTrGCATC CC~rGAATTG ATACAAAACC AAACArrACC ACCATGGCAC CACCGAGGAC AAACTTAGGA AGCACTCCAA GGAGAAccAG TTTGATG CCTGACAATT TAACCAAACC GGTGTTAAAG ATTCCTCCGA GAAGTACGGC GCGCGTGCTG TCGATT-GGAT CCTTTGAT CTCAACCATA GACACCGTTG CGATGATACA TGGCATCCCA AAGTAGAGTG GAGTTGGGAC GGAGC?1GGA ATGATN'rGGG CAAGGGCGCC GAAACCAGCT GCGTAGTACA TTGGCAGGCG AACGTTGT GAAAATCCGG TCTAAGGGAA CAAACCTTCT GCGCGGTATC CGTTGCGAAG ATCAGACAAG GCCAGATAAA CACCAGTTGA CATCATGACA ATAGATGAGA TTTCAAAGGT ATGGACAAGT GGACCTACCG CAACAGGAGA
GAAGTCCACC AAGCCCATAG AGAGATAGAC TTGATAAATC AATAGCTGCA AGCAAGAGAC AATAGCGACA GGGATCAAGG TGGGAAGAGA TTGGCTACTT TGCGATAAGG GCACCAAACA AGCGACCGAC TGGAATGCAA GAGTTGGAGT TGGAGGAAGG TAGCAGCAAT GGCAGTTCCA ACAACCAGAC CAATCAAAAT CTTTGGTAAA GATGrrGATC AAGAGGATAA TCAGAACAGT TTTGACCAGT TGGCTCTGGA ACGTTATTTC CCATATTTCC TTAAACCAAT CGTGGTAATA ACAGATCCTG TTACGATAGA TTGAGAAGAT CCCTGAAACA AGAACCACGt AAATCCCAGA TAGCGCCACT ACCATGGCTT TGCCCAATCA TAATCAAGGG CTCCAAGAAC GACTGGGAGT CCAATCCCAA AGTATTTGTT TTGCCACCCC ACACATGAAG ATATCTGTAG AAATCAGGTA 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940
GGTCAACTGC
TCCTGAGTAC
TTGAGTTTGC
AAACGAGCAA
AAGGATTT'CT
TGAATCAAGC
TCCTCTGGTG
GAGTAGACTT
AAAATCATCG
TCAATGGTTA
TCAGCTGAAT AGCCAAGGGC 'rGTCGCAATC ATGATGGGAA CCAGGATAGA ATGGCTAGTA AGTGCTGCAA GCCAAGAACG GCTGC= GCG AGTGTTTTTC ATTAGAGATC TGCCTCCTTA AA'rACGACTT GACCATTTTC AAAACAATCC GTGATAGGAC AGGGTAGCCT GCTTrTTTCAA GCAAATCACG ACCATCTTGG CAATCACGAT ACCGATAGCT TGGACTGTGG CACCGGCCTG CTTTAGCAGC TTGGCOATTA GcAAGG.AAAT CG;TCGATAAT AGAGGAATTT TTCAGCGATA GAAACGGTGC TGGTCACCTG
TTCGATGATT
CAAAACCTTG
CTTGGTAAAG
GAGCAGTTAA GATGCCTTCG TTCATGGTGA TGTTCTTACC TT7"TTTGGCG GAACGTTTAA GGCTTCAGCT GTAAAAACGG CTGGGGCAAT ACCCGACGCT CGACC7rGGT AATGCCAGTA GTAGCAAATT TTTCCGCAAA AACCTTACCA 569 ATCTCTCGCA TCAAGCTAAA GTCAACTTGG TTATCACCCA AGATATGCCC ATCCTTGAGG TGGGTTAAAA AGvGAATCTAC CTTGAGGATG ATGCGCTCTT CTAATAATTT CATAAGACCT CCTAAAGTCT AAAAGTTAAT TTACTTrGrrG ?NrAAATATT TAATACTA'rA TATTTGATAA AACTATTACC AGCGAAGCGA TTGTAGTGGT ATCATAGACA
AATAGTTCGG
'rTAGGGATGA
GCACTAATTA
GGAACTATTT
AATTCCCTGG
AGGGAAATAT
TAGACATGGC AAAAACGCC ACCTTGTAGA AACGTCGTGC ATAATCTTGT TATTGTC 'AT TAGCCTAAGC CTAGAAATGA ATTCCTGAAA TTATTCACAG TAAAAAAAGA CCTACTTAAT ATATCTCACT GC'rGACT'rAC CAATTCACGA CAThAACAAG rlrATTTTAC ACTAAATAAC ACGAACAAAA AGAAGAGAGG TGAAAATCAA AGAGCAAACT TGGATAGAAT TGACAGAGCC TCTATAGTGA 'rCCCTTTTGC GTCTTATCAA ATAI-IrCCCG GACGGGATTT TAGAGGTAA AAGAGCrAGG GGCTCAAAAA GATAATTrTCA CCTCCCGTCC CTCTAAGTAA GTCCCCTAAA TTATTGTTAG GTGTTCCGGC TAAAACGATA TTCA.ATTTTA TrTAGAAATC AACTATTTTG GTGAACAAAA ACTCCATTGT AGGAAGCTAT CCACAACCTC AGTATCATAT ACCTACGGTA TTAGAAGATT TTTCCATCAT CC'TTTTCTA ACrTTAAAGA S.
AATAGGCTTG
TTAGTCTT7'r
AAGCTAACAG
AAAACACTGT
AGCCAATGTT
GGTTTAAA.AA
TTATACTAAA
TTGAGGTTG
AGGCGACCTT CACGTGGCTT GAAGAGATTT AAAAGGCATA CTATCAAGCT TTAGACACC CT7TTTCCCAA TTTTATTAT TCTACTCGCT AACTGCAAAT TTATGAGGAT AGATATAGGG GGGACGATAA ATAGGATAGT TTCCTTCAT TATGCCAAAG GAGAGA'rTGA CTCCACGACC CTTTGTTCCC TCCTCTAACA TTAGTTGACA GGACGTTGGT ATTCAATCCT AAAACCCAGT ATATrrCGATA CAAGCAACTC GGGAATGACC AGTCAATTTC CAAGACAATC GGTGTATGGT ATTTAGI'GAC CTTGTCAGCG ATACGGrrAC.
TATTGTTGAT TTAGAAG'rr TTGCTGCGTT
TCGAAGAGTA
TGACAATATG
3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 AAATCTTAAA AAATAGCCAT CTGGATCCAA ATCACTGACA CGAAACTTTC rTTTTGGTCAA CACTCTTTAA TAGAGTTTTG AAACATCCTT AAAGGGATAG GTCAGTTCAG CTAGTTGATC CTCTTCAAGA GAAAGAGAAA GTTTTCTTCT AAACCACAGT AGAAGGACCG GGACTGTTCG GCATTGTAGT CCATATAGAA AATCCTTACA
CTTGGCGAC
T'rGGAGCCA
GTGCCCACCA
ACCTGAGTCA A'rCATATCAG GTAG'rCGATT- CTCCAGCCTG AGTGTAGCGT TCAGGAACAT CGCCATGAAC ATGGCGGAAG CACGTTCCTC GTCAGTAAAT TTTCATTGTG GGCTACCTG GTGTCTGTAA ATCCAGTTGC CAAAACGTTG GTAAATCCAG CCAGGTGAAC GGCGGTTGCT AGCAGGATTT GCAAGGTCGA TAGTCACCGG TCGCAAGGAC TGGT'rrTTCT TTGTCTAGTT 570 CAGCCAAATA CTCAGCATAT TTGGCATCCC AGACTTGGCG CGTCACCAGC GTTVGGAGTG TAAACTTGGG T1TACGAAAAA TGATACGACC ??CCAAGTCC ATGGTAGAAG CGGCACCGAT CTGTAAGTTC TTTCTTATAA AGGAACATGG TTCCAGCATA TTCTCCAAG CGT'TTGAGAC TGCATCAAAT TCTAGAGTGA TTCTGGGAAG CTGATAGTAG GCCTTTACGG GCAGGCTCT'r TTCTAAAATT TCCACG=GTT AGCAATGATA TCAGCATTTI ACGAGCTGAG TCACrAGTTA
GGGAAGAGCG
TCTTTGT'AGG
CAGCGACCAA
GGGCAGCGTT
TCAGATTATA
AATTACTCTC
AATTCTACTC
CCACGTGTTT TCGTAGCCTG GGAAG.AGTC TCCIrGGCA GAAAGCTTGG TTTCTTGGAT GGTTTGTAGG ACTTCTTGGG ACAA'IrrGGC TAGGGAATCA ATATTCCATG AGATAAGr'rT GATTTArTA TACCAAAAAA AGATCTATTT TTTCGTTTrAT AATTAAGAAT GATTT'rATGA 'rTATGACTAT GTACTCAGCC AAATCGGTCA 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520
CATAAAGTTA
CCCCAACGTA
AAGGGACTGA
GCAAAATGCT
CCTTTTTCAT
TGGTTTGAAA
AAATACATGA
ATCATGGTTG
TACCATAATA
NT'AGCTCTG
ATTTCAAACT
GCTTTGGGAT TGTTCTATTA GCTGTGACAG 7rwrTTTr'GC TTTCAACGCA AAAAGGGAAC CGAATTT'CGT GAG1TTGGTCA TGATTTCAGA TCTGGCCTTA CTTGGTCA GCATCACGAC TTATCAAAAC AATCAAGTTT CTAACAATAA *TCACI'rCATT' TCATCGAGGT GTTAATACTT CCACAAACAC TG'rTrCCAAkA GAT'rTGTGAG TAGACAAGTC AGATGCCGCA CTrArCAAGG TGGGAGATCG AGAAGTCTAT 5580 C'TATTATCGT 5640 GCCCTAAATG GAAGTGAGCC AGACAAGTAC GACGCAATTG AACTGGTGGA TGTGAACAAA AACTGGTCTT GACTCTCAAA TAGCTCA.ACA ATGGTAAGCC AACTCTCCTT ATCAAAAATG CTGTTrAGAGA AAGTCGAATT TGACACTTAA TTATATCGAA ACAATGTTCA CTTTGTGAAA GGAATATTGA CCCAGAAGCC
GTATAAGACA
AT'TrAATCA
CGTTTGATTG
TGTCGTTCAC
TTGCTrrGTC TGCATCGGAT GTATCCCTCA AACTTCGTAG CCAAGGGAT'r TTCCAGATGA AGCAAGTCAA ACGAGCTGTG CAAGAGCAAA ATGGGCAACT CATCGTTGTG CAAATGGGAG ATGA.AAATCC TAAGTATCCA GTTGTGACTG CGATTGGITCG TAGCGAAGAG TGGTTGCTTG TAGCCAATAT CTTTATTGCT GAATATGACA .AAGAAAAACC TGGGGTCTTG TACTCTTCGA 'rCCCGTATGT AGGTTAC'rGA CTTCGTCAGT GCAGCCTGCG GCTACTTCC TAGTTTGCTC ACGGTGTGAT TCAAGTAGAT GTCTTGGAAT ATAACCTCAG TAAACAAGG CATGACAATG AGGGTGCTGT TACAG'TCCTA ACTTATGAAT AAATCTCTT AAACCGCGTC AACGTCGCCT TCTATCTACA ACCTCAAAGC AGTGCTTTGA 'I-r'GATTTTC ATTGAGTA'rT GGCCTCACGT 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 TTCCATrTGC AATCAGAAAG GGATTTT'ATG TCCATTATTC AAAAACTTTG GTGGTrrTTTC AAGTTAGAAA AACGCCGTTA TCTAGTCGGA ATTGTGGCCC 'rGATCTTGGT TTCCGTCCTC AATCTCATTC CTCCTATCrT TATGGGGCGG GI'CATGATG CCATCACATC GGGGCAAT'rA TAGCCTAI-rT TACTT13CTAC 'rTGCAGCCTT TGGTATGTAC ACCCAGCAGG
TATTTGCGCT
ATGCGGTCTC
CGGACGGGTG
GGrrGGCGGTG
ATGCTCTTTA
ACCTCCTTCT
ATGTGTGGCG
GCTTGTTTAA
ATCTGATGGC
TCATGTCTGC
GCATCTCATG
TATGTATATC CTTGGGACCT GCATTTCACA AAAATGTCGT ACACGCAACC AATGATATCA GGTGGATGCC 'rCTATCACG GCAGATGACT CT'rGTTGCCA CTTATTGCTT GGGACAGATC CAGCCN'TA TCAAACCTAT ATGCC7TTGAC TCGTTTAGCA CTCTGGTGAC TTTGTTGACC TTCTCCCCCT ACCTTTCATG CCTTTGGCGA ATCCCAAGCT CAGGTATCAA AGTGACCAAG GCCTATACGA CTAGTCGCCT AGGGAGAAAG ACTCATAAGG GCTTrTTTCTG AACTCAATAA CAAGGTACAG GAGTCCGTAT TCTTTCGGTT ATCAGGCAGA CGAGTTGAAG CAAAAGAACC TGCAAACCAT GAAATATGAT GT'rGGTTCGT CCTATGtTr AACGCTTTTG A'rTACAGTTG GGAATCTrAGT CACCTTTATC CTGGCCA'rCG GTTTCCTCTT TAATACTACT GAAAATCTTT TGTCTCAGGA ATCTCCTGTA TCTTTTCAGG CAGTCAATGA AGTC'rCTTTG ACCCTATGGT GT'rGGCTCCT
AGCTATTTGG
CAGCGAGGGA
TGATGGTTCA
ATATGCTGGT
AGGT'rTCTTA CAAGACCCTG AG'rrTCCTCT ArrAACCTTC
TCTCTTGTTT
GGA.AGGGCAG
CTGGCCTCTT
CCAGCGGATT
GGATGGTATT
GGAAACACTG
GCAGACAGGC
TA6AGGGTGCC CAGTC'rCATG
TATCCGCTTT
CCGGGTTTAC
6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 GAAAATGGGC GTTTGGAGTA TGCCA'rTGAC AGCTrCCT'r 'TGAAAATGA ACCATATTC ACTTTAGTTT GGCAAAAGGG CAAACACrcc GCTTGGTTGG TCTGGGAAAA CGTCCTTAAT ATTTATCTAA ACGGTCACGA GGCTATGTTC CTCAGGACCA GGCAATCCTA ACTTGCCCCT CAAGATATTG TAGACATGCC CTTTCTGT'G GTCAAAAGCA ATCTGATTT TGGATGATTC GACAACCTCA AGGAGATGCG GCTGTTGTCC ATGCAGATTT ACGCACGAAG ACTTGCTAGC CAAGCTCCTC TGCGTGAAT ACGATGTGGA TATTCGGGAC 'rATCGTCTGA CAGACC'rTCG GTTTCTTTTT GCGACTTCAA TCCTAGACAA TTCAGCGGTC GAGGAAGCTA CTAAGCTAGC TCAAGGAT'T GATACGCTGA TTGGTGAAAA AGGAGTCACT ACGGTTGGCT ATGAGTCGGG CTATGAT*TTN AGACCCTGAT CTTATCCGCC GTAGATGCCA AGACAGAG'rA TGCGAr'rATC AAAGGACAAG ACAACCATTA TCACTGCCCA TCGCCTCAG'r TATTTTAGTT CTACAAAATG GTCAAATTAT CGAACGAGGC 'rTTGGATGGC TGGTATGCCC AAACCTACCA GTCTCAGCAG TTCGAAATGA AAGGAGAAGA AGATGCAGAA TAAACAAGAA CAATGGACTG TATTGAAGCG CTTGATGTCT TATCTCAAGC CTTATGGACT CCTGACCTTT TTGGCACTCA GTTTTCTCCT 572 AGCGACr.ACG GTCATTAAAA GTGTCATACC CCTCGTGGCT TCCCA=?~A TCGACCAGTA TCTCAGCAAT CTTAACCAAC TAGCCGTTAC CGrTrTGCTG GTCTACTATG GTCTCTACAT CCTACAAACT GTAGTTCAGT ATGTCGGCAA TC1TCTCTT GCGCGCGTGT C?1'ACAGTAT TGTTAGGGAT ATTCGTCGGG ATGCC TGrC CAATATGGAG AAACTGGGCA 743TCTTACTT TGACAAGACG CCAGCAGGTT TGATATC=N TCTGGGATTT CCTTTATACC ATG7TGGTGC TTrGATTTC CTT7rGGTCA CAGAAGTCTC TTGTCAGATA TATTCAGG.CC TTTAACAAG ACACT'rGGCTC TACGCCAACC GAGTTTGCTG AAACTTCTAG TTCTATCGGG ATAACGGTCG CTATCGTTTC TCGTTGACC AACGATACCG AGACGATTAG TATCCAGCTT TATCTCAGCA G7rTATCT TGGATTTTCG TTTGACGGCT ?TAGTCTTGC ATCTCTATCG AAAAAAGTCA GTGAAAATCA TCAATAGTAA GCTGGCAGAG AATATCGAGG AGAACGCCT GCAGGCAGAA TTTGATGAAA
TTCTGACAAC
TCTTTCTTCC
TCGAGAAAAC
GAATCAGGAT
'rCAACCAAGA GTTCTGTAGC CTTGGATGCC GCTATGCAGT CTTGATGGCC S. *S S S GGACCATGTA TGCCTTTATC TTCAACTCTG 'rGACCCCI-rG ATTGAGGTGA CGCAAAACTT CTC~rTGA GACCTGCCAT TACTTTGGCT ACCG-'GGTITT CAGTACATCA ACCGCCTTTT CAAACGGCTA TGGTTTCTGC CCTCTTCALAG AAAATGGGCA TGTT'rCTCAT ATGACGGTAA GGTGAAACCA TTGCCTTTG'r CTCATGCGCT =TATGAATT AGGTCGTGTC TTTGCCCTGA TAGACGAGAG GACCTATGAA AGCCAAAGTC CAAGAAGGCA ATATCCGTTT TGAACATG;TG ACATCCGATT CTGGATGACA TTTCT"rCC TGTTAATAAG AGGTCATACA GGTTCAGGGA AATCGTCTAT TATCAA'rGTC 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 CCAGTCAGCG AGAGTTCTCT TGGATGATGT GAGA.AAAAAC ATCGGTGG 'rCTTGCACCA GGATATCAGG GATTTCAGTC AAGAAGAGCT ACCCTTCCTC 'rA'CATGGAA CTATTAAG'rC CAATATCGCC ATGTACCAAG AAACCAGTGA TGAGCAGGTT CAGGCTGCGG CAGCCTTTGT CGATGCAGAT TCCTTTA'ITC AAGAACTTCC TCAGGGGTAC GAC'TCCCCTG 7TTCCGAGCG TGGT 'CGAGC TTCTCTACTG CCAGCCTAAA ATCCTGATT CTTGGTTCAA GCTTCTCTGG CCGCCTTTCT ACTATTCAAG GGCAACGCCA GCTTCTTGCC TTTGCTAGAA TGGATGAAGC GACAGCCAAT A'rTGACTCTG CGAAGATGAG ACAGGGCCGA ACAACTATTG ATGCCAACTG CATCTATGTC TTGGATAAGG
CAGTCGCCAG
AAACAGAAAC
CTATCGCTCA
GACGCATTAT
AGATGTATAG CGAGAGTGGA ACCCATGAGG AACTCTTGGC 'N'CAGGCA GGGGCCATGG CCGATACTCT ATCTGCAATC TCAAAGCTGT AC7"rTGA'TT TCTGGGAGGA ACCTATCACA
TTGAAAATCT
TCATTGAGTA
CTTTAAACCA TGTCAGCTTT CTAGAAGGAA ATCCTTCAAA TrACAGATTTr CTTrCACCGC CrCCATT TTGTGGTATA ATGAAAAATG TTGACAAATA GTATAATAAA AACAAAGGAG AATCAAGCGA GGTTGAGTCT TCAAGCGTTG CTTGGACTGG TTCTCATCTT GAGCATTTGG AGCGTGTGAC CCAGTACAAC
ATGCGGATAA
GAAATTTCA'r
GTGAGATGTC
CTGAAATGAT
ACAAGGATGA
AAAAGGAACT
CCGACGTGTC
CTTTGTCGGT
GGCAACCTTG
GGACACAATT
AACAGCATGC TGAAATGGGA AGACI'GCCT GTGGAAAMGA TACTACCAGC TTGrCTCTAA AAGGAAGGGT TCGCTGATr GTTTTGGCCT TGGTCTTACT GGTTCTGACC TCTCCCA'rCT ATCAAGTTGG ATAGCAAAGG GCCAGTGAVT TACAAGCAAG CGTCGGTTCA AGAT7"TGGAA GTITCGrACC A'rCGTGACCG CTGGTGACTr CTGCTAACGA TAGCCGCA'rT ACCAAGGTTG CGT-rTGGACG AACTGCCTCA GTTGGTCAAT GTCCrAAAG ACACGACCTG AAGTGCCACG TTATACAGAG CAGTATAGCC CTCTTGCAAG CAGGGATTAC CTCTCCAGCC ACCArCAACT ATCAGTCAAA TGACGGAGAA AGTCTGTCA GrTGATCAGG CCTGAAAAGA TGCCCTATAA CCTCGCCTAT CTCCGAGAGT AAAATCATGT TTCAAACCGT GTTTGAGGTA CTAAAATAAA ACAGATAAAA GGAGCAAATC AATGCCAAAT TACAATATTC ACAGAAGCAG AAATTACTGA AGTAGTGGAT ACCCTCCGTT
CCTATGTGGA
TTAGTT'rC7T
GTAGTCATAA
CATTTTCACC
CTGGTrGGAT
CACAGACACC
GCGTTTIGGA
CATGTAGTGT
CGTTTGAGAT
TTCCAGTAGA
AAAAACGTGA
TTGTCTCTGA
TCGCTGACTT
GAAGTGCGAC
AAATCCTTTC
GGGAATACGA
'rTGGT'TTGGT
ACCGCIATGA
CTGTCGAATC
GCATGTTCTT
TGGGGACATC
GAAAATCAGT
GCCTGATATC
CACAACAGGT CCTAAAACAA AAGAACTGGA GCGCCGCTTG TCTCTTTACA TAAGACTGTT TGTCrCAACT CTGCGACAGC CGCTCTGGAG TTGATTTTAC AGTGGGACCT GGTGATGAAG TCATCGTTCC ACCCATGACC TATACGGCTT CATTACGCAC GTrGGAGCAA CCCCTGTCAT GGTGGATA'rC CAAGCAGATA GGACTA'rGAC CTGCTTGAGC AAGCTATCAC TGAGAAAACT AAGGTGATTA GCTCGCAGGG ATTGTT'rGCG ATTATGACCG TrTGTTCCAA GTCGTGGAGA C'TCTTTACC GCTTCAAGCA AGTGGCAAAA GGCC'rTTAAC CGTATTGTCA TAGTGCCCAC GCTTTGGGAT CTAI'TATAA AGGACA.ACCT TCTGGTTCTA TACTTCCTTC TCATTCCATG CAGTTAAGAA CTTTACAACG GCAGAAGGTG TrGGAAAGCC AATCCAGTGA TTGATGACGA AGAGATGTAC AAGGAATTCC CCTTCACGGG CAAACTAAGG ATGCTCTrGC CAAGATGCAA CTGGGGTCAr TATCGTTACA CCAGCCTATA AGTGCAACAT GACCGATATC ATGGCTTCAC ACAATrGGAC CGCTATCCAA GTrTTGCA ACGCCGTAAG GACATTGTGG TAGTCGTTTT GCAGGTTCTC GCATCCATCC TTrrGGCACAC AAGACTGAAA 7TCACGCCAC CTCTACATCA CCCGTGTAGA AGGAGCAAGC CTAGAAGAAC 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 GCAACCTCAT CATCCAAGAA CGCTTCCTC T CTTGACAGCC 574 TTGGCTAAAG CAGGAATTGC AAGTAATGTT TATAAGAATC TrGGATrTGA TATGACGAAC CACThCAAAC
TATCCTAAGG
CCTA'rGCCTr CTTTGAGAAT GAAATTACCC TCCCTCTTCA AAGTAGACTA TATCATTGAG ACTTTCAAAA CAG'xTrCTGA AAAAA'rGACA AACTACAGTC AAGCGAAAGT GATCCTGCCC TAAAAACrGT TGTrCAAT TGATAATAGT TTACACCTGT CAGAGAGAGA AT'N'TTATAG GATT1TTCCTT TCTrGTGGGA ATGTGAGCAA TTTAGTGTAG CATTT~AGAAT CCTTACTAGA GTCTT13TTCT AGTN'TCAAT 'rCACCCTATT 'TTrIGAAAGA ATTGTG3GAAA CTCGCGTCTT TTTTTGrrr CAGAATAT'rG TTrCATGTT-C TAGTCATTCTr TTTGCATGAT AGAATTrATA TACAAATATT CTATATGTTT AGTGATGCTT GCTATACATT ATCTATAAAA CACTTGTCTA CGATTACC1TA TATGCCCTAT GCATCTATTT TTATCGAGG'r TAAATCTAGC TTTTATAGAA TGTAGTGTr'r TAGTTTCAAT CCGCCATATG AGCGATATTC TGCTTGTATG ACAAGGTATT TGTTCTTTCA TTTATAATTT AATATAGTAA ATGGGATATT TTATAT'rCAA GCTAAGAAAG AAGGCTAAAG AGCAAACTAG GAAGTTGGCC ATAGATAGCT GTAGATATAG TAAAATGAAA TGAGAATAGG ACAAATTGAT CTAACAATGT TTTAGAAGTA GAGGTGTACT A'rTTAGT'TT CAAGTCAGTA ACCTAGAC?1' AGGGCAAGGC GGCACTGACC AGAGTATAAA TTTTAATATT TTCT'rGTGTT ATTCCTTGAC TACTAAATTA AGCGATGAAG AAAAGTGCTA ACTTTATCAA -CTAAAGTC TAAT'rGAGTG AGTT1GAGGCC CCTTrCTCCT GTCCCGTGGT TTGAAATAAG CATCATTTAG AAAATCTAGT CGTGAGTTTC CATGAGTGAG TTCAAAATTT TGTGCCTGTC GCATGTTGAT ATTATAATAA ATT1AGATCTC CrGCGAGACA TCCAGTAN'T TAGAAGCACT GGTCTATTTA AGAAATATAT AGGTAAATAT CCCTGGCGAA ACAACATATC AACAAATTTA ATAGCATCAC TTTTGAATGG CAAAACCCTG CTrTGAGGT-r CGGGACAGTC AAATCGATTT CAGTCTACTA TAGAACTGAC TAGTTTGAAG AGATTTCCGA AATTCAATTT GGAAAATATA 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13188 TGATAAAGAT AATGACAGCG GTGTCATTCT ATCTAT'mA AGAAAAGTAA TAATCAATTG TTAAAAATAG TAAAAA.AATT GGAGGTTCTG ATGAAATATT TTGTTCCG
INFORMATION FOR SEQ ID NO: 71:
SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32768 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:
AACCAGTGCA TCACTCTCAG TGAATCCGCA TCAACCAGTG AACAAGTGCA TCGGCrTCAG TGAATCGGCA TCAACGTG
AACAAGTGCT
AGCCTCAGCA
GACAAGCGCC
AGCC'TCAGCC
TGCCTCAGCC
TCGGCrCAG
AGCACATCAG
TCAGCTTCAG
TCGACAAGTG
TCAGCAAGITA
CAAGCACCAG
CCTrCAGCTTC
CAACACAAG
C=TCGCTTC
CGTCAACGAG
CTTCTCGAATC
CAAGTACCAG
CGTCGGCCTC
'rGCGTCGGCC TCAGCAAGCA CCAGCGCC AGCAAG;TACC TCAGCATCTG AATCAGCATC TGCTTCAGCC TCAGCAAGTA TCTCAGCGTC A~CCAGTACT AGCGCCTCAG CATCAGcGTc TGCGTCTCAG TCAGCATCAA CGAGTACG;TC TGCATCAACC AGTGCCTCAG CCTCAGCATC TGCGTCAGCC TCAGCAAGTA CCAGTGCTTC AACCAGTGCA TCTGAATCGG CATCAACCAG AGCAAGTACT AGTrCATCAG CGCCTCAGCT TCAGCAAGCA CTAGCGCCTC AGCCTCAGCA TCAACGAGTG CGTCCGCTTC CATCAGCATC AACCAGTGCA TCGGCT'rCAG CAAGTACCAG CCAGTGCGTC AGCCTCAGCA AGTACCAGCG CCTCAGCCTC CTTCAGCAAG TACCAGTGCG TCAGCCTCAG CGTCGACAAG CCTCAGCGTC TGAATCAGCA TCAACGAGTG CATCAGCTTC
AGCAAGCACC
TGCGTCGGCT
AGCATCAACA
TGCGTCCGCT
AGCGTCAACG
AGCTTCTGAA
AGTGCCTCAG
TCAGCAAGTA
AG1TG=TCAG
TCAGCAAGTA
AGTGCCTCTG
TCTGCATCAA
CTTCAGCAAG TATC'rCAGCG CTAGCGCCTC AGCATCAGCG AGTCAGCATC AACGAGTACG CCAGTGCGTC AGCCTCAGCA CCTCAGCAAG 'rACCAGTGCT CATCTGAATC GGCATCAACC AGCAAG7rACC AGTGCGTCAG TGCGTCGGCC TCAACCACTG TACTAGCGCC TCAGCCTCAG CATCAACGAG AGCATCAGCA TCAACGAGTG CATCGGCTTC CACCAGTGCG TCAGnCTCAG CAAG'rACCAG AGCTTCAGCA AGTACCAG'rG CG'rCAgCCTC TACCTCAGCG TCTGAATCAG CATCAACGAG AGCT TCAGCA AGTACCAGTG CGTCGGCTTC AACCAGTGCC TCTGAATCAG CATCAACAAG GGCTTCAGCA AGTACTAGTG CATCGGCTTC AACGAGTGCT TCGGCTTCAG CATCAACGAG TGAATCTGCA TCAACCAGTG CGTCCGCTTC
TGCGTCCGCT
AGCAAGTACC
CGCCTCAGCC
AGCG'rCGACA
TGCATCAGCT
AGCATCAACG
rGCCTCGGCT
AGCATCCACA
TCTGAATCGG CATCAACGAG TCA.ACAAGTG CTTCGG=C~ TCAGCCTCAG CAAGCACATC TCGACAAGCG CCTCACCT'rC TCAGCCTCAG CGTCGACAAG AGTGCGTCAg CCTCAGCAAG TCAGCAAGTA CTAGTGCATC AGCGCCTCAG CTTCAGCAAG TCAGCAAGCA CCAGTGCCr'C AGTGCGTCGG CTTCAGCAAG TCAGCATCAA CAACTGCTTC AGTGC -rCAG TCTCAGCGTC TCAGCA.AGCA CCAGTGCGTC AGTGCGTCTG AA'rCGGCATC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 TGCGTCAGCC TCAGCAAGCA AGCGTCAACC AGTGCGTCGG
CATCAGCTTC
CTTCAGCGTC
GACAAGTGCT TCGGCTTCAG AGCG'TCAGC t TCCGCC1'CAA AGCAAGTA'rC TCAGCGTCTG TACGTCAGCC TCAGCAAGCA 576 CATCAACGAG TGCGTCGGCC TCAGCAAGCG CAAGTACCTC CCAGTCCGTC GvGCTTCAGCA AGCACAAGTG CGTCAGCCTC AATCGGCATC AACGAGTGCC TCTGAGTCAG CA'rCAACGAG CATCAGCN'C TGAATCTGCA 'rCAACCAGTG CGT-ACC'CC AGCATCGACA AGCGCCTCAG CTrCAGCAAG 'rACCAGTGCT TGCGTCGGCC TCAACCAGTG CATCTGAATC GGCATCAACC
TCAGCCTCAG
AGTGCGTCAG
TCAGCATCAA TACTAGTGCA GGCT'rCAGCG AACA6AGTGCT
TGAATCAGCG
AACCAGCGCC
TCAGCTTCAG CATCAACGAG TGCATCGGC'? TCAACCAGTG CGTCAGCTTC AGCAAGTACC AGTGCTTCAG 'rCAGCCTCAG CATCGACAAG TCCCCGGCT TCAGCAAGCA TCAACCAGTG CTTCCGCTTC AGCAAGTACC AGTGCTTCAG TCGGCCTCAG CAAGCACCTC AGCTTCTGAA TCGGCCTCAA
CGTCGACAAG
CCTCAGCAAG
CCAGTGCCTC
'rCTCAGCATC
CATCAGCATC
CTTCAGCATC
CCAGCGCCTC
CCTCAGCATC
CGAGTACGTC
CCTCAGCAAG
GGCCTCAGCA AGCACCTCAG CTTCTGAATC AACGAGTGCT TCGGCTTCAG CAAGCACA6AG GGCCTCAACC AGCGCCTCAG CGCCTCGGGT 'rCAGCATCAA AGCATCAACA AGTGCGTCAG AGCT'rCAGCG TCAACCAGTG TATCTrCAGCG TCTGAATCGG AGCCTCAGCA ACCACCTCAG CACAAGCGCC TCAGCTTCAG GGCCTCAACC AGTGCATCTG TGCATCGGCT TCAGCA'rCAA AGCAAGTACC AGTGCTTCAG TGCCTCGGC'r TCAGCAAGCA AGCAAGTACC AGTGCGTCAG
CTTCAGCCTC
CATCAACGAG
CTTCTGAATC
CAAGTACCAG
AATCGGCATC
CCAGTGCCTC
TCTCAGCATC
CATCAGCATC
CCTCAGCCTC
TGCGTCTGAG
GGCCTCAACC
TGCTTCAGCC
AACCAGTGCG
GGCTTCAGCG
AACAAGTGCT
TGAATCAGCG
GACAAGTGCG
TCAGCATCAA CGAGTACGTC AGTGCGTCAG CCTCAGCATC TCAGCGTCGA CKAGTGCGTC TCAGCCTCAG CAAGTACTAG TCAACCAGTG Cc'rCAGCT'rC TCAGCCTCAG CATCGACAAG TCGACAAGCG CCTCAGC-TC TCAGCCTCAG CAAGTACTAG TCAACCAGTG CA'rCAGAGTC TCGGCTTrCAG CAAGCACCAG TCAACCAGTG CGTCAGCCTC TCCGCTTCAG CAACTACTAG 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 TGCATCAGCT TCAGCATCAA CGAGTGCATC GGCTTCCCCG AGCAAGTACC AGTGCGTCAg CTTCCGCATC AACAAGTGCC
TGCGTCGGCT
AGCAAGTATC
CGCCTCAGCC
GGCATCAACG
TCAGCAAGTA CTAGCGCCTC AGCCTCACCC TCAGCCTCTG AATCGCCATC AACGAGTGCG
TCAGCGTCAA
AGTGCGTCCG
CAAGTGCATC GGCTTCAGCG TCAACGAGTG CGTCTGAATC CTTCAGCA.AG TACTACGCC TCAGCCTCAG CGTCAACAAG CGAGTGCGTC CGCTTCAGCA AGTACTAGCG CCTCACCCTC CTTCAGCGTC AACGAGTGCG TCTGAGTCAG CATCAACGAG TGCATCGGCT TCAGCATCAA AGCGTCAACA AGTGCZATCCG 577 TGCGTCAGCC TCAGCAACCA CATCAGCT'rC TGAATCTGCA TCAACCZAGTG CGTCAGCCTC AGCATCGACA AGCGCCTCAG CTTCAGCAAG TACCAGTGCG TCAGCCTCAG CCTCGACAAG TGCGTCGGCT TCAGCAACTA CCAGTGCGTC AGCCTCAGCA AGTACCACTG CGTCAGCCTC AGCGTCGACA AGTGCGTCGG CCTCAACCAG TGCATCTGAA TCGGCATCAA CCAGTGCGTC AGCCTCAGCA AGTACTAGTG CATCAGCTTC AGCATCAACG AG'rGCATCGG CTTCAGCATC AACCAGTGCA TCAGAGTCAG CAAGTACCAG TGCGTCAGCT TCCGCATCAA CAAGTGCCTC GGCTTCAGCA ACTACTAGCG AACCAGCGCC TCGGCCTCAG GGCTTCAGCA TCAACGAGTG CACCAGCGCG TCTGAATCCG TGAATCAGCA TCAACAAGTG TATCTCAGCG TCTGAATCGG AGCATCAGCG TCAACAAGTG CCTCAGCCTC AGCGTCAACA AGTGCTTCAG CAAGTATCTC AGCCTCTGAA TCGGCATCAA CATCAGTCTC AGCAAGCACC AGTGCGTCGG CA'rCAACCAG TGCCTCAGCT TCAGCAAGTA CCTCGGCTTC AGCAAGCACA AGTGCTTCAG
CTTCCGCGTC
CAAGTGCCTC
CCI'CAGCAAG
CCTCAGCATC
CC1'CAGCAAG 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 CATCAACGAG TGCG'rCCGCT TCAGCAAGTA CTAGCGCCTC CTTCGGCTTC AGCGTCAACG AGTGCGTCTG AGTCAGCATC AACGAGTACG TCAGCCTCAG CAAGCACATC AGCTTCTGAA TCTGCATCAA CCAGTGCGTC AGCCTCAGCA TCGACAAGCG CCTCAGCTTC AGCAAGTACC DLGTGCGTCAG CCTCAGCAAG TACCAGTGCT TCAGCCI'CAG CGTCGACAAC TGCGTCGGCC TCAACCAGTG CATCTGAATC GGCATCAACC AGTGCGTCAG CCTCAGCAAG TACTAGCGCC TCAGCCTCAG CATCAACGAC 4500 TGCGTCCGCT TCAGCAAGTA CTAGTGCA'rC AGCCTCGACA AGCGCCTCAG CTTCAGCAAG TGCGTCGGCT TCAGCAAGTA CCTCAGCGTC AGCATCAACG AGTGCATCAG CTTCAGCATC AGCTTCAGCA AGTACTAGCG TACCAGTGCG TCAGCCTCAG CCTCAGCCTC 4620 CGTCGACAAG 4680 TGAATCAGCA TCAACAAGTG CGTCGGCTTC AACAAGTGCT TCAGCTTCAG CAAG'rACCAG TGCGTCCGCT TCAGCATCAA CGAGTGCTTC AGTCTCAGCG TCAACCAGTG CCTCTGAATC CGCATCAACA AG'rGCCTCGG CT'rCAGCAAG CACCAGTGCT TCGGCTTCAG CGTCAACGAG TGCGTCTGAG TCAGCATCAA CGAGTGCGTC TGCATCAACC AGTGCGTCAG CTTCCGCATC TGCTTCAGCC TCAGCATCAA CCAGTGCATC AGCGTCAACC AGTGCCTCGG CTTCAGCAAG TGCGTCAGCT TCAGCATCALA CCAGTGCTDC
AGCCTCAGCA
AACAAGCGCC
AGCTTCAGCC
TACCAGTGCG
GGCr'rCGGCA
AGCACATCAG
TCGCCCTCAG
TCAACAAGTG
TCAGCTTCAG
TCAACAAGTG
CTTCTGAATC
CAAGTACAAG
cT'rCAGCCTC
CAAGCACAAG
CCTCAGCATC
4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 AGCATCAACG AGTGCGTCAG CCTCAGCAAG TACTAGTGCA 'rCAGCATCAG CATCAACCAG 578 TGCATCAGCC TCACCAAGTA TCTCAGCGTC TGAATCCGCA TCAACGAGTG CATCAGCATC AGCATCAACG AGTGCATCGG CTTCAGCGTC AACCAGTGCA TCACTCTCAG CAAGCACCAG 'rGCGTCGGCr TCAGCATCAA CGAGTGCCTC AGCCTCAGCA AGTATCTCAG CGTC'rGAATC GGCATCAACG AGTGCGTCAG CCTCAGCAAG TGCGTCGGCT TCAGCATCAA CCAGTGCCTC GGCATCAACG AGTGCGTCAG CCTCAGCAAG TGCATCGGCT TCAGCAAGTA CCAGCGCCTC TACTAGTGCA TCGGCTTCAG CAAGCACCAG AGCCTCAGCA AGTATCTCAG CGTCTGAATC AGCAAGTACC AGCGCCTCAG TGCGTCAGCTr CAGCATCAAC GCATCALACCA CGrCCGTCGGC GCCrCAGCTT CAGCATCAAC GCAAACCATT CGAACTCACA
CCTCAGCAAG
AAGTGCT'rCA
TTCAGCAAGC
AAGTGCOTCA
AGTTGGAAAT
TACTAGI'GCA
AGCTTCAGCA
CACCAGTGCC
GCTTCG~ccT
ACCAGTGCCT
TCAGCMTCAG CATCAACGAG AGCACCAGTG CG'rCAGCCTC TCACTCAG CAAGTACCAC CAACAAGTGC GTCAGCTTCA CGGCCTCAGC AAGCACCAGT GCTTCAGCAA GTACATCAGT 1'TCAAATTCA ACTCGGAT CGACAGGTAA ATCCCAAAAA
GAATTGCCTA
GCTGTT'ACAG
ATACAGGTAC TGAG;TCGTCA AI7rGGATCTG TG~rACTGG AGTTCTAGCA GTrATTGGATT GGTrrGCGAAA CGCCGTAAAC GTGATGAAGA AGAGTAAGAC AACCTGTAAA GTTAGGCrAA ACTAACTCGC GCACATAALAT CAAGGAGAAA ATTGCTAGTG GATGATAAAA TA.ACAGTCAT 'rGTACCAGTA TACAATGTGG AAAACTATCT GAGGAAGTGC CTAGATAGTA TTATTACTCA AACATATAAA AATATTGAGA TTGTTGTCGT TAATGATGGT TCTACGGATG CTTCAGGTGA AATTTGTAAA GAATTTTCAG AAATGGATCA CCGAATTCTC TATATAGAAC AAGAAAATGC TGGTCTTTCT GCCGCACGAA ACACCGGTCT GAA'rAATATG TCCGGAAATT ATGTGACCrr TGTGGCACTCG GATGATTGGA TTGAGCAAGA T'rATGTAGAA ACTCTATATA AAAAAATAGT AGACTATCAG GCTGATATTG CAGTTGGTAA TTATTATTCT TTCAACGAAA GTGAAGGAAT GT'rCTACTTT CATATATTGG GAGACTCCTA TTATGAGAAA 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 GTATATGATA ATGTTTCTAT GCTTTGATAT CTGCTTGGGG
GACATAGGTA
,AAGGTAATT
AGAGTT'rCGA
CTACTAGCTA
GAAGTCAGTC
AATTAGGAGA
ATTTAAATAA
CAGAAAAGTG
ATATCGGTA
TCGCCAACGG
CTTTGAGAAC
TAAACTCTAT
AGATGGTTAC
AAGTCTTTAT
GATGCACGCT
TCCTCTAGAG
TCAAGCTAGT
TTGTATGAAA
AAGGCAAGAT
CTCAATCAAA
GCTTATCGGA
TTAGTTGATG
CTCAAGAAAT
TGI-rrGAGCA
AGGTATATTT
TTAGAAAACG
CTATGTCTGA
GAAGAGT1TT GTTGCGCTr
ATTATCAGAA
TAGTTTA'rCA
ACGTATTACG
TCAGATCTTG
GTATAAAGAG
TGAAAAuA.AA AAACACTTGG CACTTTATCG GGTrTATCrG ACACACCAAC C'rATCGAGAC AAGAGGAAAG TTTGAAATGA AACAAAGGCT =TAAATCAG GCCATTGTCC TCGCAGCAAA CrATGGCTAT GTAGACCAAG TTTTAACGAC AA'rCAAGTCT A7"rrATC ATAATCG 'rC GATI-CG'r'rM TATCTGA7TC ATAGCGAT TCCAAATGAA TGGATI-AAGC AATTAAATAA GCGCAGAG AACN'TGACI' CAGAAATTAT GTAACTCTG AGCAAArrC ATGTTATAAA TCGGATATTA GTTACACAGT 'rAT7TCATAG CTGATTTCGT GCAAGAAGAC AAGGCCCTCT ACTTGGACTG GTAACGAAAA ATCTGGATGA CTTG7"rTGCT ACAGACTTAC AAGATTATCC GTTAGAGATT 7TGGGGGCAG AGCTTATTTT GGTCAAGAAA TCTTI'AATGC
TAATTGTCGG
CTTTTACCC
TGATCTAGTT
TTGGCTGCT
CCGTGTTCrC
TEGATGTAACC
GCrrTTTGAA
TTGGTAAACA
AATGAATGC
CATAAATGGT
GATTATCAAT
AAACCGTGGA
CTTGAATGGA
TATCCAATAA
ATTGAGACAT
GTTAGTGATC
CACTATTTGG
TGGAATTGGA
TGCCTGAGG
AAGATT'rGGC CAGAA'rTGGG
AGGAACCTTT
TGG'N'CAATC
GATTGGCTCA
TAGATGTCGA
CTTTGATTAT AATCATATTG TCAGGATTAT CCTGCTATTA
GGCCCAAACC
ACAAAACCAT
CACTTGTCTA
CTTGCCTGAT
GATGACAATT
TAATGAATTG
ATGCT'rTTTG GAAAAAAGAG AATATGACCC AAAAATTAAT ATGATAAGGT GGATCAGGCA GATCAGAGCA TCTTGAATAT TA'rCGTGAAG
CATTTACATC
ATCTATACTG
AT1'CAGTTTA
TATCCAAACG
GTAGAAACCA
GATCAATTTrG
GTAGGTCAGG
ATAAGCAAAT
TCATTCATAA ACAGTTTCT1 TTrCACTATCT TTCTCATCGG TTTGGTGGTA CTATCATGGG CATTACAAAG ATCTCACATC CCTCAGACCA TATTGAACAA AGATAGCAGC TAGAGTAATA TGACTATATT TAACGGAATT GTCAAGTACT 7=?AGATA7T CTAATCTTGG CAAGCCTATC AGGCATATGC TGTTGACCAA GAAGAAAAAT CATTTAGTAG 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 AATCATGGCG AAAAGACAGA AGAAATTCTC TTATCCTTTG AAAATACTAA AACCTATGAA G'N'CAAGCAA TGATTGAAAA ATTGAGAGAA GAGATGCTCT GATTTTGACG GTTAGTGATC ATTTCTCCGT TCATCATATA TGAAAGT'rcr AGATTGAAGA GTTGGATTAT TTTTTATAAA TCAAACATCA GAGTGCTTTA 'rAAAATATAA ATAGACCTAA AGATATTTAA TATGAACTGC ACCCCAAAAG 'rTAGACAGAA AAAATCTAAC 'TTrTTGGsGT CAGTACAATA TTAGGGTGTG ATTAATTATC TTTTAGGTG AAAATGATTC TATAr'rATAG CTGTTTGATA CGAAATTTAT TATAAGGAAA TTATGTTAAT TCTATAGTTT TTAATGCAGA TAATGATTAT GTAGATAAAT 'rAGAAACTGC ATTTC'rTGTT ATAATAATG TTTAAAATTr TATGTATTTA A'rGATGATAT TGGTTTrTGA TGATGAAThA GCGATTGA.AG ACTATACAAT CTGAAATCGT ATTGTAGATC ATGr'rCTTAA AAAGTTTCAT TTACCGTTAA AGAATTTAAG
GAATACAA.AA
AATTAAATCT
TGCGTCAGAG
TAATGTAAAG
TTATGCCPACT
TTCTTTCGTT ATTTfTATACC TAATTTTGTC GACATCATTG TTACACGAAG TTTAGACTA'r TTGGCAGCAG TAGAAGATrrC ?I-ITGGTGAT ?rTT~AGTTA ATGTAGATAC TTGOAGAGA'r ACCAATCAAT ATCATGAAAC AGCATATGGA GATAGATGGA AAAGATTAGA CCGAAATTTT 580 AAAGAAAGTC GTGCTTTATA CCTAGATTCT TTATTTGATA TAGAACTAGA 'rGCTTATGCC GTTCCrrCTA CCAATTNTAA CTCCGGAATG GAAGATGCTT GTTCGAAACT G~rAGAACTG GATCAAGGAA TrAAATAT GTTATTCCAT AATTTTATGG TGGGGA'rGGA TAGCGTCGC.A 8880 8940 9000 9060 9120 9180 9240 9300 CACATAGAAG GAAA'rCATAA ATGGTATGAG AT~rCTGAGT 'rGAAAAATGG AGATTACCT AG7rrATAC ATTATACTGG cGAAAACCT TGGGAAATAA =rCCAATAA TCGCrAGA
GAAGTT"TGT GrrTTATAA ATTAGTCGTA GTTTCGAAGA GCTAGTTG'rG AGATGGAGCA TCTATACTAG CACATACATA GTTACGATTT ATCC"rGT T=T'TTAG ATATTAATCA CTATCTAAAC CAATTTTTAC ATAT'TTCTT CAACCGAACC TAAGTTTATG CGCAGACGAAC TCTGTTAGAA TGGTCTGATA ACTTGTATAC AGTCCTAAAG TGTAGAATAT TTGATAGAAA 1,'TTGCGTCT AGTGTCGTTG TTCTCCATTT GArrATCGAA TTATAAAGAA GTGGATAATA CTTT'GAAAAT ACTAGTCATG ~TATTGAG AAAAGACATT CTCATACAGC AATTI'TTACA A=rACCAGA GGTACATT' CTTTATTAAG ATATAGCAAT AAA7TTGGA TAATTTAGAT TT'GTATCCGT TGTTCAACAA ATATAGGCAA TCAAACTAAT AAACAAAATG GTAGAGGCTA TTAGAC.AATT TATAGGAGAA TAATTAGTAT TGTAGTTCCA ATCTACAACG TTGAGAATTA
TTTGCGAATG
AATCAATGAT
TTrcTCGTTrC
TATTGAATGT
ATGATGCT
GGCGTTATAA
TGTTTGGATA GCATTCAGAA GGCTC'rCCAG ATCATTCATC AAATATr'TTG AGAAAGCAAA TCGGGGGGGG GCGTACAT'rA AGACCGATTA TATGGTGCTT TTCTTATGAT GAAACACGCT
TCAGACGTAT
CAAAATATGT
CGGCGGTCTT
CTTTTGTAGA
CAAAATTT'rG ACTGTTT GAAGAATTTG TAGAGAAAGA TCATCAGCTC GTAACCTAGG CTCTGATGAT TGGTTGGAAC 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 TGAAAAAGGA AAACC-CAGAT ATTACTATCG ATGTGTATAT GACTTATGTT ACGGATCCAG S. S S
ATGATTCTrCT AGAAGTGATA GA.AGGTAAAG CAATTATGGA TAGGGAAGCT GTCGAAGAAG TCAGAAATGG GAACTGGACT GTAGCTGTCT TGAAGTTATT CAAGAGAGAC TTACTACAAG ATTTACCATT TCCTATAGGA AAAATTGCAG AGGATACTTA CTGGACATGG AAGGTACTTC TAAGAGCTTC GAGGATAGTC TATTTGAATC C'N'GTGTTTA CTGGTACCGT GTTGGTTTAT CTGATACTTT ATCGAATACA TGGAGTGAAA AGCGTATGTA TGATGAAATT GGGCCTAGGG AAGAAAAGAT AGCTATTTTA GCAAGTTCAG ACTATGACTT GACCAATCAT A7TTTGAT ATAAAAATAG ATTACAAAGA GTGATAGCAA AATTAGAAGA ACAAAATATG CAGTTCACAG AGA'rTrACAG AAGAATGA'rG GAAAAATTGT CTTTACTTCC GTAGATAGTA ATAAAAAATG AGATAGCGTA ATATGAAACr ACAT 'TAACA AATTTA'rACG GCATGGCTGG TGATAGTACG GTTATCTTAG CTCAAAATGC TGICAAAAG ATAGCTAGTC AACTGGGATr TAGAGAGGTT GGTAT'ITATT TTTACAACAT TGCTTCAGAT AGTCCTTCTG AAATGAATAA GCGTCTGGAT CGTATTATGG CCAGTATCTC TA7rGGGCGA' ATTTTAGTCT 7TCAGTCTCC AACCTGGAAT GGTrGAAT TTGATCGTCT CTTGTTTGAT AAGCTAAACG ATATGCAGGT GAAAATTATT 'rGCTT1-ATCC ATGATGTTGT TCCCCTCATG TTTGATAGTA ACTATTATCT CATCAAAGAT TATCTGTATA TCTATAATCT ATCAGATGTT TTGATAGTGC CGTCAGAGAG AATGAAAACA CGCCTGATGG AAGAAGGATr GACGACTAAG AAGATTCTTG TTCAAGGGAT GTGGGATCA'r CCTCATGATT TA'rCCTTATA CACCCCTGCT TTTAAAAAG AACTrlr'T TGCTGGAAGT T'rAGAGCGTT TTCCAGACTT AATAAAGGGG AAGCTAG'rTC GAGGAATTGT TGCTAGAATT AATGAGGGAG AAAGTAACCA ACAAAATTW TCTCAAGATA TAGTGCTAGA AGTCTCAGCA ATCAAAGGGT GGATTT'GGCC ATACTATACC TTGAATATAT
CTAACAGCGG
GATCAAG40CT
ATGAATCTAC
GCATTCCAGT CATTGTACCA AGTAGCTTGT TGCGCTTTAT GGCGGATAGT CTGGAAGAGG AAGAATATCA AGAAATGACG AATCGTA'rCA CGCCTTTGAG AG'rA'I-rTCA TCGAAGGATG GAAAAAAGAT TTGTCTGGGG AACCcATCAA CTCATAAGGT CAGTACCTAT CAACTGCTAA ATTTATAGTA TTCATGAGA'r AGTTGATAAA AGACCTTTAG CTATTTGTTA TCTATCAC7'r GGGAATTGAT A'rCGCAATTA CACCAGGCAG 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 1.1340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 AAAGAGGGCT ATTTCACTAA AAAGTTATTG GTAGATGCAA TAAGGGAATG AAATGAACAA AACAATTGTA CTAGCAGGOG TT~AGAAACAA CGATAAAATC TATTTTATAC CACAATCGAG AATCAAGATA TCATGCCAGA 71GT=CGC AAACCACGAA AGTGAGA'rTA 'rCGATGTTAA ACTACCTGAA CAAACTGTGT GATCACATTA GTAGCATTAC TTATGCTAGA TATTTTATTG AAGGNTTAT ATTTAGACAG TGATTTGATT GTAAATACTT ATTTGTTT'AG AAGAAAAATC AC'rCGCAGCA GrrAAAGATA GCAGGTGTr'r TATTAATCAA CAATAAAAAA TGGCGTCAAG ATGTTAAGAT TTATATTTTG AAATAGCTCG CATGTTAGGT T'rCAAGATTG GGAAAAGCAA CAGATTATAT CCAAGAAGAT CTTTAGAGAA ATTATTTAGT CAGATGGAAT TACATTTAAT AGAAATTAAA AGAACGACTA ATTGAACAGA GCATTGTAC GGTCATCAAA CGATTTTAA TATAATTTAC AAGTAGGGCA AATGAAGGAA GTTGPAAGAAG GCCGTTTCGA GCA'PTTAAT TCAGGTCTTG CAAGATGATT GGTTAGAACT AGGTCGAGCT TGA'rATTGTG GCTTTGTATA ACAATTGGCA GGAACATCTG 582 GCTrMAATG ATAAACCAGT GGTGATTCAT TTrACGACCT ACAGAAAACC CTGGACTACC 'rrGACAGCCA ATCG~rATCG TGATNTATGG TCGGAAT'rCC ATGATrrGGA GTGGAGTCAG ATTITTACAAC ACCATAIGGG AGAA'TrGAA CTAATATCGC CTCTAGATAA GGAAI'M-CT TGCrrAACCT TAACGAArrC CCAAGATT'rA GAAGGAATAG AACAGCTAGT CCTGACGTGG TA7*TTCATAT CGCAGCrrGG ACGGATATGG GAGATAAATT
GCTGTATATA
AAAAAGTCAA
AAATCTTTGC
'rrAGGACAAA
TTAAGAAAA
ATAATGTGAC
CAAATCTATA
AAGAACAAGA
TCGTTTTCGA
ACCGACATCT
ATIGCATCCA CAAATTGTTC CACCCCTCTr TTTGGATATC AATCATGGTA GTGCAGATGA AAAAACGCTA CTAGCT-r'1-C AATCGACTCA AAATGGGAAA GTTTCCTTTA TGATTGATAC TACCTG7=? CGACAACTTC CAAG=TAAC CGAACAATTG GATTACTTGG CTGGACAGTT
TACAGCTCTA
AAAAAAAT'rA
AGATAAGCTC
GAAC7"rTT'rA
GCACGGAGAG
GATTAAAGAT
'N'GTTTAACG
GCCAAATGT'r TTTACGGCTT CTCAGTATAT G7"rTTTCAAA TTGCTGCTTG TA'rCCTAATA TTCAGCTCTA AAGATGGATG CTTANrAGA ATGGCTCATC TATCTAAACC GACACCTATG GGGCCAAAAT
TCCGGCAATT
TATCAACCTA
TA'rACTAGCC CAAAGGTTGT ATTCAAGTGA ACATCCTGAA ACTAAGCATA TGCTAGAAAA ACCGCTTGAT TTGGATTATA TTATTGAACA CAACTC'rTCT ATGCTTGCAG GGCATTCAAT TCCCTACCAG AGGGACATTA TCGGCCAAGA AAGTCGAGAA
TCTAGAGATA
CTGACTTCAA
TTTTATA.AAT
CGAATGTTGG
ATAATCCAGG
TTAGTTCGTT
CATTATGATG
GATTTAGTAG
TATATGATTT GTCTAATCGT AGCTAGACGA GTTGAAGGAG CATCCGATAT CGTTGCAGAA CTCAAAATGG GAATAATGC CTGATTTGCA AAAAnrGATA 'rCAAAGGGAT AGATGAAACC 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160
TTGGAGATGG
AAGAGTTGGT
TGTGCCTTCC
GGAAATCAAT
TTCAATCATG
TGATGCTTTT
CATGGATCAT
C7TTGATCr
AAAATTGAAA
ACAGATCGTT TTAGGTTTAC TATATGGATT TTTACAGAGA CGCCCTTATA TCGATTTTGA
ATCGTGGGCC
GTTATGCAGT
ACACAAGAGT
ATTCCATTTT GGAAAGATCA GATTCATGGT ATGGCTCAAC CAAGCTAAAG CTCAAM-rGA AGCATTTGGG AAAACCGTGA CTTACTGATA GTCGAAGGTG CGACTTCTCG T'rCAGGTGTC GGAAATGAT'r TATTCGATGA GCCAAATTCT ATTAAGCGAA TTATCTGTCC TTCTCATAGT PCCTTTTCTA GAGTTCATGA ACTTGAACAA GAAATTGAAA AGTATGCTGG TGGTCGCTG ATTTTATGTA TGCTTGGACC TACAGCAAAA GTTCTGAGI'T ATAATCTATG CCAGATGGGC TATCAAGT'rr TGGATGTAGG CCATATTGAC TCAGAGTATG AATGGA'rGAA AATGGGAGCT AAAACTAAGG TTAAArT=C TCATAAACAT ACTGCAGAAC ATAATTTCGA CCAAGATA'rr GAATTTA'rTG ATGATGAAAC CTA'rAACAG'r CAGATTGTTG CACGAATATT AAACTAGACT
ATrAAAATA AATGATAAGG ATTTAAAATG AGAAATACCA AACGCGCTGT AGTATTTGCA 14220 GGTGATTACG CTTATAT'rCG ACAAATCGAA ACCGCGATCA ACTCACTCTG TAGACACAAT 14280 AGTCATTTGA AAATTTATCT GCTAAATCAG GACA'r7CCTC AGGAATGGTT TAGTCAAATA 14340 AGAATATATT TACAAGAGAT GGGGGGCGAC TTGATTGACT GCAAGTTAAT TGGCTCACAG 14400 TTTCAAATGA ATTGGTCTAA TAAATTACCT CATATCAATC ATATO-ACATT 'rGCACGCrAT 14460 ?I-rATTCCAG ATTTTGTAAC AGAAGATAAA GTTCTCTATC TAGATAGTGA TTrGATTGTG 14520 ACTGGTGAT'r TGACCGA?1'T GT=GAATTA GACTTAGGTG AAAATTATTT GGCAGCAGCT 14580 CGTTCTTGCT T'rGGAGCAGG AGTCGGCTTC AATGCTGGTG 'rTCTCTTGAT TAACAACAAA 14640 AAATGGGGAT CTGAAACTAT TCGACAAAAA TTGATTGACT TAACAGAAAA AGAACATGAG 14700 AA7VTGGAAG AAGGAGACCA GTCAATwrr AATATGTTGT TTAAAGATCA ATATAGICC 14760 C~wrGAAGATC AATATAATTT TCAAATAGCA TATGATTATG CGGCGGCAAC CTTTAAACAT 14820 CAATTCATTT TTGA'rATTCC GCTCGAACCA CTGCCACTAA TTTACACTA TATTTCCAG 14880 CATAACCrT GGAATCAATT TTCTGTTGGA CGTCTAAGAG AAGTTTrGrG GGAATACTCT 14940 *TTGATGCATT GGTCTGTTAT TTTAAATGAA 'rGG7=~CAA AGAGTGTGAA GTACCCTAGT 15000 .AAATCACAAA TATTTAAGTT GCAATGG?? 
AATTTAACGA ATTCTTGGTG TGTCGAGAAA 56 *ATCGATTATT TGGCCGAGCA ATTGCCAGAA G7TCATTTTC ATATTGTTGC TTATACAAAT 15120 ATGGCAAATG AACTACTAGC T'N'AACGCGT T'rrCCTAATG TTACCGTATA TCCAAATTCC 15180 TTACCAATGT TATTGGAACA AATAGTAATA GCTTCAGATT TGTATTTGGA TTTGAATCAT 15240 GATCGAAAAT TAGAAGATGC ATATGAGTTT GTGCTTAAGT ACA.AA.AAACC AATGATAGCT 15300 *TTCGACAATA CTTGCTCTGA AAATCTT'rCT GAGATNrCAT ATGAAGGTAT CTATCCAAGC 15360 TCCATTCCGA AAAAAATGGT TGCAGCAATC AGATCTTACA TGAGGTAGAG AACAGTATGA 15420 GAAAATCAAT AGTATTAGCG GCAGATAATC CCI'ATCTTAT TCCTTTAGAG ACGACTATA.A 15480 *AGT1CTGTATT GTATCACAAT AGAGATGTTG ATTTrTATAT TCTCAACAGT GATATAGCTC 15540 *CTGAATGGTT TAAAT'rAT'rG GGGAGAAAAA 'rGGAAGTTGT GAA??CTACA ATTCGCAGTG 15600 TACACATTGA TAAAGAACTr T'rrGAAAGCT ATAAAACAGG ACCTCATATA AATTATGC77 15660 *C'rTACTTTAG ATT=r~GCG ACAGAAGTGG TTGAATCTGA TAGGGTATTG TATCTGGATT 15720 **.CCGATATCAT TGTAACTGGG GAACrAGCTA CTTTGI-rGA GATAGATC'rC AAAGGATATT 15780 CAATTGGTGC TGTTGATGAT GTCTATCCCT ATGAAGGACG AAAATCTGGA TrAATACTG 15840 OTATGTTACT AATCGATGTT GCAA.ACTGGA AAGAACATTC TATTGTCAAT AGTTTATTGG 15900 584 AATTAGCGGC CCAGCAGAAT CAAGTrGTTC ATCTTGGGGA TCAGAG'rATT TTAAkAATTT ATTTTGAGGA TAA7"rGGCTA GCCTTAGATA AAACATATAA TATATGGTG GCTATTGATA TTTATCACCT TGCTCAAGAA TGTGAACG1'C TAGATGACAA TCCACCTrACA ATTGTTCACT ATGCTAGTCA TGATAAACCT TGGAATACAT ATAGTATATC TAGACTACCT GAATTATGGT
GGTTTATAG
TTGAAAGAAG
AACA'rTAGA
GTGATTGTTC
ATGTATTACA
ATACAGGTGG
'rCGCITrTGA 1'CGAGAGACC
ATTAATTAGT
AGATTIGGAT
CAATCAGTCT
GTATTTAGTA
'rGGTCAGAGA T'rGCTTTTCA AAAAAACAAG TGATCCTTGT CAACGGTTAC CTGAMrGCA
ACGTTCCGAT
GACATGGAGT
TTTTCATrrG TGAGGACrG ACCTCTCTAT CACAGTATAC GAATCGTAACA 'rAGTAGAATT GATTGGCTAT TGGACGATTC TA'rAGTr'rAT
TTAAATTATI'
GCAGATATAA
GCTGCACCGT
GTATATCAAA
TTAGATATTA
AAGAAAA'rCT ATTTT'rTCTG
TAATGAGTGA
AGTGTGTCGA
15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920
AGAGGT'T
TA'rCACACCT
AGATGATTTA
GTTGTGGTAC
AATGTAGTTA
AAAAGTATGG
GTGGATAGAA
CGATATACAA
CAAGGGCACA AGAAAGTGGC ATGATGGACT CTATGACGGT TGAAGAATAT AGAGATAGAG TACGGGAAAA TATTTAGTGG
5 55 S S GCATATr'CI!G AAGCAAACCr ATCAAAATAT AGAAATTATT TAGTTGATG ACGGTTCTAC GGATAATTCT GGGGAAAT~r GTGATGCTTT TA'rGATGCAA GATAATCGTG TGCGAGTA7TT GCATCAAGAA AATAAGGGGG GGGCAGCACA AGCTAAAAAT ATGGGGAT'rA GTGTAGCTAA GGGAGAGTAC ATCACGA'rrG TCTTTATCAG CAAGTCCA6AG TTGATTCAGA TGATATCGTA AAAGAAAATA TGATTGAAAC A.AAAGGATGC AGATGTTGTT A'rAGG4GAArr ACTATAATTA TGACGAAAGT GACGGGAATT rTTATTT'rTA ATTAGCTATA CAAGAAATTA TGAACCGTCA CTTTIATATTG CCGACATTTA AGTTGATTAA AAATGGTCGC CGCTrTGATG ATGAAGCAAC AATCGTC=r ATAAACGATA ATCTCTATCT AACGGAATrT GATCTr'rCCT GGGCAAGAGA GGATTGTGTC TTGGCTGGTT TGGATGTCTC AAAAGATTAT AAGCAAACTT TAGAATACCA TGTAACAGGG CAAGATTT'r GCGTCGAAGA AGCAGGACAT TGGAAATTCA ATAGCTCGGC AAAAGAATTA TTCAATGAAG TTCACTTTTC TATGCATCGC TTTTATC=T TAGCCTCTAA GTATAGAAGA CGTT'CAGGAA GCATCA'rGAG TATTGTTGAA GTGTTTTCTA AGAAAATATC CGTTCTGCGT ATTCGATTTG TCAATCTrTr TCAATTrAACA GATACTGAGG AATATAAAGA 16980 17040 17100 17160 17220 17280 173 17400 17460 TATTTGTT'rC AGATTAAAGT TGTITTTGA TGCAGAACAA AGAAATGGTA AAAGITGAAA TAAAAGAArr GTTATTTACC ATATCACAAA CAATGAAGGT GAGGGGAGTG TI 5
NATGACT
AAGATTTATT CGTCAATAGC AGTAAAAAAA GGACTATTTA CCTCATTCT ACTG'N 6
TATC
TA'rGTATTGG GAAGTCGTAT TATTCTCCCT TTTGTTGACC TAAATACTAA AGATTTT'rA 17520 17580 17640 17700 Page(s),K-R were not lodged with this application 588 ACAAACTCCT CTGATTA'rrG CGGTrcCC TCGTGTTCAG TCTAATrACT ATGCGATCAT TGATACACTT GTAACAACCT TGGTCGAAGG AGAGGATTAT ATCTTTAAAG AGGAGAAAGA GGAGGTTTGG CTCACTACTA AGGGGGCCAA GTCTGCTGAG AA~rwMCTAG GGATTGATAA TTTATACAAG GAAGAGCATG CCTC~wrTTC TCGTCATT'rG GTTAGCGA TAAGCTCTTT ACrAAAGATA AGGACTATAT CATTCGTGGA AA'rGAGATGG TAAGGGAACA GCGTCTAA TGAACCCAAG GAACATGTCA TCAGAGTCTT TT'rAAGA'rGT 7rGAAATGAC 'rAAACTTCAA GGAGGTCTCC AATTATCTCC TGAGACGCGG GCTATGGCCT TT'AATAAGAT ATCTGGTATG ACAGGGACAG TTrCGAGCTCA
TACTGGTTGA
ATCAGGCTAT
CGATCACCTA
CTAAGGTCGC
GGAAAAAGAG TTTATTCAAA CTTACAATAT GTCTGTAGTA CGCATTCCAA CCAATCGTCC GAGACAACGG AT'rGACTATC CAGATAATCT ATATATCACT TTACCTGAAA AAGTGTATGC ATCCTTGGAG TACATCAAGC CTCAGTTGAA ATGTCTCAAC TGTCCTAAAT GCTAATAATG
AATACCATGC
TCTATTCCTC
CGGICGCGTGA
GGGGGC'TGTG ACAGTGGCTA CCTCTATGGC AGGAGTCGCA GAGC'TTGGG GCT'rCATTGT
TAAGGGAAAT
TCTCTTGTT'r GCTc'CAGAT'r
AGGACGTGGT
TATTGGGACT
TCGTCAGGGA
GAAATTTGGT
TCAACCGGAA
CAGTGATAGT
TATACAACGG
CCT'rrACTCC TTTrTTGTAGG CGTGAAGGGA TTGCCCATAA ATCTCCGAGT CAGGTCAGAT ACGGATATCA AGCTTGGTAA GAGCG4GATGG AAAGTCAGCG GATCCTGGTA TGAGTAAATT CCATCTGGG TGCATAAAAA GTATTGAAAG GTCGTAAATA GCTGGACGTT CAGCACGTCG GATATAGTCT ATAAAGAGAG 23040 23100 23160 23220 23280 23340 23400 23460 23520 23580 23640 23700 23760 23820 23880 23940 24000 24060 24120 24180 24240 24300 24360 24420 24480 24540 24600 24660 24720 24780 GATCGACCTA CAAAT'rCGTG TTTTrGTATCC TTAGAGGATG GTACAAAGAC TATCAGGT'rC CCGGAAACTA GTCGAAAAGG TCAGACTCTG GAGTATGCTG AAATCGTCTA ATAGATGGr'r ATATACAGAA GAGGTAGCGG TCTGACCAA'r ATTAGTTTC
CCCTTCTGG
ATGTTATCAA
AAGATATGAC
CTCAGCATGC
AAAGTATGAA
CTCGTGACTT AGAGGATGTT GTT'GGATA CTGATCACTA TGCTAGTCGT GAATITATTGT ATCTTAAAGA GCTCCAGAT TATATAGATG
TCATTGAGAG
TTCACTTTAT
TAACTGACAA
AACTGCAGTT CGTAGCTTTA TGAAGCAGGT GATTGATAAA GAACTTTCTG AAAAGAAAGA ArrACTTAAT CAACATGACT TATATCAACA GT'rT=ACGA CITrCACTGC TTAAAGCCAT .TCATGACAAC TGGCTAGAGC AGGTAGACTA TCTACAACAG CTATCCATGG CTATCGGTGG TCAATCTGCT AGTCAGAAAA ATCCAATCGT AGAGTACTAT CAAGAAGCCT ACGCCCGGCr TGAACCTATG AAAGAACAGA TTCATGCGGA TATGGTCCGT AATCTCCTGA TGGGGCTGGT TGAGGTCACT CCAAAAGGTG AAATCGTGAC TCATT=CCA TAAAAGGAGA AAATATCACA ATTTACAATA TAAATTTAGG AATTrCGTTGG GCTAGTAGCG GTGTTGAATA CGCTCAAGCC TATCGTGCTG GTGTTTCG GAAATTAAAT AT'Ir'rAGCCG ATAATATTCA GCACTTAACA 589
CTGTCCTCTA
GCCAATATTG
AGTTTATCTI' TACAGATATG GTTI'TGATGA TAATCAGT ATCTrGCCTTI ATAATCATTT GATGTCTTGG C~rACITrGG CGTGTATTCr T'mTrTGACCA GACTTGGTTC AACATGCCGA TCTTATACCC G~rAT?GTAG CAACGAACTT TTTATAATGA AAGGAAGAAG TTrTATCATTT GCCTTTATGA AATCTTTGAA GGTArMGAC AGGTTGTGTT CACAGATATC AAAATTGCAC CTACTAGCG;T GACAGTGGAT TGGTGAAGAA ACTCACAGAG AAAAAAATGG CAAGGTTTTA AGATAAG'rT AACCTGTT AI=GG'MTA TGAGAACALAG GTATGTTTrr AAGGGAAACC TGATTCGGAA GGATT'ACT'TT CGAGTA=' GCTrCCCAAGG ACAA'rGTTGC AGCTTATAC AGACGGGACT CC-AGTCTATG ATA'rCTTGAT GAATCAAGGG CAAGGATAAG ATTT'rCTA'rG GAAAGCAAGC TT'TrGTG.CGT TTGAATAAG TCTGATr'rGG TCA'XrCTCGA TAGGGAGACA TGAGGAAGCA CAGACAGCAC ATCTAGCGGT AG~rTT CA'r
GCGGAGCATT ATAG'rGAAAAL TGCTACAAAT GAGGACTA'rA TCCTTTGAA TAACTATTAT GACTATCAGT TTACCAATGC AGATAAG=~ GACTC~rrA TCGTGTCTAC TGATAGACAA AATGAAGTTC TACAAGAGCA ATTTGCCAA.A 'rATACTCAGC ATCAGCCAAA GATTACC ATTCCTGTAG GCAGTA'rrGA TTCCTTGACA GATTCA.AGTC AAGGGCGCAA ACCATTTTCA TTGATTACGG CTTCACG;TCT TGCCAAAGAA AAGCACATTG ATTGGCTTGT GAAAGCTGTG A'rrGAAGCTC ATAAGGAGTT ACCGGAACTA ACC'N'TGATA TCTATGGTAG TGGTGGAGAA 24840 24900 24960 25020 25080 25140 25200 25260 25320 25380 25440 25500 25560 25620 25680 25740 25800 25860 25920 25980 26040 26100 26160 26220 26280 26340 26400 26460 26520 GArrCTCTGC TTAGAGAAAT TATTrGCAAAT GGGCATGCGG AACTTT'CGCA GATr'rATAGC AGCGAAGGAT TTGGTCTGAC CTTGATGGAA ?rTGATGTGC Cr'rATGGTAA TCAGACCT
CATCAGGCAG
CAGTATGAGG
GCTA'rTGGTT
ATAGAGGATG
AGGACTATAT CCAACTCAAG
TCTACTTAAC
CAGGTCTACC
GGCAAAATGG
CTTATGCCGC CCAAGTTCAT CTGACCATGT CAATTGTATC AAGAAAATCG GGCTTCTTGA CCAAAGAAAT GATTGAAC -r TATGATAGT TACTGGTCTT TCTCAACTTG AGAAGACCAA ATCAAGCAAG TTGGAAGCT ATGCGTGCCT ATTCT'rACCA TTTAGAAAAG TGGAAGAAAA CAGTAGAGGA ACAGTCAAGA AAGTCGAGAT TTACATGAAA GAGTGGTCLT CGATGCAGAT GGTNTCTGC
GGCTTCTACC
TCTAATTGGT
TTATTTGATT
TAAGATTTGT
AA'rTGCAGAA
GGTGCTCCAT
GTCrAGGCGC
CTGATGGTCT
ATT7rAATCA
GTATTGAAGA
GCTTTCTCT 1'rTACCrTT~ ATCTAGGTTA CGAGCATGGA AAACCTCTCT AGTTCCCGTT TCAGAT'T GGGAAATTTT AGGAGATAAT CAGTCTGCTT TGTGACGCAG GAGAGGGCTG TCATTCATTA TGCTGATGGA ATGCAGGCTC GCTTGGTTAA 590 ACAGGTAGAC TGGAAAGACC TAGAAGGTCG AGTACC1CAG GTTGACCACT ACAATCGC7"r CGGAkGCTTG'r TTTGCTACAA CGACTTATAG CGCAGATAGC GAGCCGATTA TGACAGTTTA CCAAGATGTC AATGGTCAAC GACTIrGCCA T'rTGCAAGAT
CTTGGTTTCC
GGTCAG'rCCA
TTGGAAATAG
TTCCATCATC
AAGTIrT'ACT CGAAAACCAT GTGACGGGTG ATATCI'rATT TGCG'rTACTT TGCAAATAAA GTTGAATTTA TCACCTTC?1' ATACCAGITCA GCTTATCTTT AATACTCTAG CGACTCCT'TT CAGATAAATC TGGCTCGGAT GTC7TGTAT GGCACGAACC TCTCTATGAT GCCATTCCAC TAAGAAGATC ATCATPCCAA GAAATACCAT GATCAGTTTG CCTAAGACGA GATGCCTTAA CGCAGGAGCC TTGCCTGATG GC'TCT'rAGAC ATGCTTTGCT GATTCAGGAG CTGTATCAAC GTAATATGCA GTTGATTr"G ATAAGGCGAC TTATGAGCGC TGCACTTGGG TTATCATTAC TCTTGACCAA TTCAGATCAG TCAC'TTTCCG TATTGCACCO ATCCTAATGT GGCCCTTTAC TGTCGGATAT TT~ACT'rGGAT GAAAGTGATA ATGTGCG.TAC GCTTTAGAGT TAACTGACGA CAGTTCAAAC GTGATAATTT ATTGAGCAAG TAGAAGCAAT GTGACAGAGA TCTCTTCTAA CAGA.ACGCTA GTCCACAGAA ATAAACCACA GTAATGAGTT ATTCrTCGCT TTAATCAGAC GAAAGTAGTG AAGTTGCTGC 0 GCTACAGGCA GTGCGTCAGG CCTTTGAGCA CAATCTCTTG GCTGCACAAT AGACTTTATA TCGCTCCAGA CCATCTAT'TT 26580 26640 26700 26760 26820 26880 26940 27000 27060 27120 27180 27240 27300 27360 27420 27480 27540 27600 27660 27720 27780 27840 27900 27960 28020 28080 28140 28200 28260 28320 TTTGGTTGAG ACCATTAAAT TGGCCCTTTC CAAACAACGC CAAC-ATGCAA ATTATGTTGA TGTTTAGGA GGCTAACATG TCAGAGGAAG TGGAAGAG7TT GAAACAAAAA CCCATCAAGA GTAACACTTT TTCACTTTA CTGCGGTTGA TGGGAATTT'T GAGGTAGATC TATGATTGAA CTTGCTTTGA TTGTATTGGT AACTATACAA GCCACTAGTA ATATTGGTAA ACCAAGCTAC ACTTTA7rGG TGAGTGGC T'IrATTTATT AAATAAAAGA AAACTTCAGA TATTCACCTT AGATGTTGAT CAAATGCGTC AGGCACTTGG CTTGGTGAGA TATCAGGAJAA CCATGCAAAC ATTTATTTTA CAAAGACGTT AGGAAAAAGA AACCCGAGGG TGATTCTGAT TGGTTTGCTC ATACTAATTG TTTTAGCTAT CCCCGTCAAA ATCAAC'TATT TGGCAGAGCA ACACCTTGGT
GAAGGCCGCA
GAAAAGATTA
TTTACTTI'GC
TATCCTATCT
TTCCATGGAT
CAAGGTGCTC
CTACTATTAA CCTTTATGGT GATTACTTAT 7rGTGGATTG GTCTGAACTT TTCTTT'rrA TACTCAATGA AAATCAAAGA TTTGAG4GTTG
ATAACTGACG
AGCTCAAAAC
TAGATATAAC
AAGTCAGCTC
ACCGTTTTGA
GCAAACTAGG AAGCTAGCCG CAGGC1~gCTC AAAACACCGT TGACGAAGTC AGCTCAAAAC ACCGTTTTGA GGT'rGTAGAT AAAACACCGT 7ITTGAGGTTG TGGATAGAAC TGACGAAGTC GGTTGTGGAT AGAACTGACG AAGTCAGCTC AAAACACCGT TGACGAAGTC AGCTCAAAAC ACCGTTTTGA GGTTGTGGAT TTGAGGTTG TGGATAGAAC AGAACTGACG AAGc tCAGTA ACATATATAC AGCAAGGCGA CGCTIGACGTG TATTACTGTC TATATTTTa GTAAAAA'rCA ACTrrACTT GGATGAAGGT CGTAGGAGTT GAAGAAGGGT GGCGCGGGTT TCAAATTCTT CTCTTGI'CTT
GTTTGAAGAG
TTGGCTTCA
GGGCAGACTG
CGGTTCCGGA AGACTTCCAG GTCTGGCTCA GT'rGACCTGC
GCGGCAAGGT
TGTCTCATCT
TCCAAATACA
GCTCGCTTGC
GACAAGTCTT
CAAAGTAGTG
CCAGTCTGAG
GTCCGTCTCC
CCACCAGCGT
ATAACGTTCA AT'N'CATCTA GCAAATCACA AGCAGGATTG AATTTTTGAA AAGAGTTGCG CTAAGATCAG GCTr'rCACTG- AATCTGTTGG GCCATGTTTC TCAGGATACG AC~rGTCC GATATGGTAG TCTGTCTGGT GAAAGAGGTG GTCAGAGTGA GGCTrC~rTC AAAACCGTGT ATTCTGC TACCAGCTCT TCTGGATAAA TAGTATTTGA AGCGCTGGAG GATATCTT GTGGTAGTGC TrGATTCCT CTTCTCGTGA AGOCATATAG TCCTGTACCA ATACCAAAGA GAAGGAATC ATTGACTAGA rrGAACCAAG AGATGGCTAA CCAAAACAGT GCTTGGTGTG CTTGTAGGCT AAAGGAACGT AGAAGGCCAG ATAGAGGCCG GCTCAAGTGA AAAGCTAGAA CACCGATAGC CAGAGCTAGA AGATTAACAA GCAAGGCAAA AGGTCTGGAG AGGTTCACTC ATGCCAATTT CCCAGCCCAT AGACTCCAGA TATGAAATCC AGCATAGAAA AAAGACGATT CTCAAGAGAG CGATAATTCC AGGCAGGCAA GACAGGTAGC ATTTCTAAT AAGTTAGAAT GAGTTAT'rTT TTACGGGTTG AGAGTTAAGG GCTCTTTCAC AATTTCTATG GCAATATCGT GAGGGAACCT CCGGCAATAC GACAAAGGCT GTGGCAAAG'r AAGGTTAATG GTAATCGCTA AGAATAGGTA TC 'GGGTTGA AGTCGCAGAA CTACGAGTGA TAGAGGGTAA GGATTGCGTC AACAAAAAGC ATAGTCGTI'A GACTCCCTTG TCAGAAATGG GCGAGCCAGT TTTAAAGTAC TTCTACGCCGT AGCCCAAACT GCTGACGAAA GATTGAGAAA TAAGATGAGC ?TGGTCGTAC GTTGGCTAAT AAALAGCGTAA AAGACAAGAC ATGAGCAGGC CTGCGTATTC GGCAACGGCG GTAAAGAGGA ATGAGTCTTG GATGACACCA ATCACAAALAC TAGAAATACC GAAAAGGCTA CAAGCAACTG CTGAAGCATC ACAGGA'rGAG ATAGCTGCTA
ATCAGATACG
ATAAGCAAGC
AGACATAAGA
TTGCCTTGAT
CATCTGTAGA
CAACCCCAAC
GGATAAGAAG
CCACACTGAG
CAACAGGAAT
CTCCAGCCAT
GTCCAAGOTC
AAAAGGCTGT
TCATAAAGAA
TCCAAGAGTG TTAACTGCAG CAAGGGTCAA ATTGATAGTA GAACCGAGTG GGATAGAAAC
ATGGCAGACT
CACACCGCTG
GAAGGCAATC
'rTCATGT'rGA CAGGAATGTT ACACGGAGGC AGTTCCAAAC AAAGGGTTGA CCACAGGGGC CCGTAGTTGG CAAGGCTTCC ATTCCAAATG GAGCCAGATT CTAATAGAAC CAATAAAATA TTTTAAAAAC AAGACCAAGG 592 GATGATCCAT TCGACAATT TAGAAGTCAC GTCAGCGATA GrT-rAGCA ATTCTTGACT ATTTTTACTG GCTTCTCTCA TAGCGAP'rCC AAAAATGACT GCCCAAGATA AGA?1'CIAAT ATACN'AGCA CTAAGCACGG CGTTGACTGG G?7TCAACC AG7M~AGCA AGAGTGCT GAGAACCTGC CCAATCCCAT CTGGTGGTGC AA'TrrCAGTA 1'TGGCACTAT ?TGGGGTAAT TTCAATAGGG ACGATGAAAT1 TTGCTAGTAC AGCTACAAGA GCAGCGGCGA AAGTCCCTAT CATAGGATAT ACAAGAAAAC AACAGNrTC ATATrGCTAT C7TGTCCCTT TTGATGTTGG GAAAGGGCAT TGGCAACGAG AGCAAAGACT AGGATAGGAG CAACAGCTrr TAGACCTCCA ACGAATAAAT CCTCGAGTAG CCCCACAAAG CATACCAATC TGATTCTTT CATAATAATC GACAAAATCA AGAATTTTCT AATATGGTAT AATGAAATAA CCCAATCCCT GAGAGATTAG GAAGGGTCAG TCCTAGGATT AAGATACGCT TGACAAGGCT TGCCTTATTC CAAGCATGAA TCC'?TGT GTAGTGATTA TGATrATAGT A'rAAATGATA GTCTATTr TGAATA=rA TGGAGAATGA GACTGATGAA AGGAGTTTrA TATGCAAAAA TTTATTCAGG CTTATATTGA
AAAGCTAGAT
GC7*I-rAATT
CAAACTTCT
ACTAGAAAAT
TTTACGG'rTG TATG43GAGCC
TCAACTGGAT
GG7MrTCAGT
TGTCCCTAAC
TTTTAAGTAA
AAGCAGTTGG
GAAGAccTi'r
AACGAGCGCG
GTGACAACCA TTATCGAGAA TATTCTAACC AAGGTCATTTr CTCTTTTACT GrATTrTATA TTGCTAAAAA AATGCrrCAT ACCATGGTGC AGAGAATTGT CTAAAAATGT CTCGTCATGA TGTTGGACGC CAAAAAACCA TCTCACGTTT GTGTTTAATT ATACGC'rATA T?1'CT~rTTA CTC'rACTGCA TTTTGCGAT 30120 30180 30240 30300 30360 30420 30480 30540 30600 30660 30720 30780 30840 30900 30960 31020 31080 31140 31200 31260 31320 31380 31440 31500 31560 31620 31680 31740 31800 31860 CCAGTTTCTA GTTTGCTGGC TGGAGCTGGT ATTCCTGGGG CAAGGCTTTC TGTCTGATGT CATCAATGGC TT?1YrCATCC GTGGGAGATG AGGTCGT'rCT GACAAATGGA CCGATTACTG GTGGGAATTC GTACGACACA GCTTCG'rAGC GAGGAGCAAG CGAAATATCA CAGTTGTTAG CAATTTCTCA CGCACAGACT TTTGTGGTAC AATAGAGGGA GTTTAATAAG GAGAAAAGAT
TAGCGATTGG
TCTTT'GAACG
TATCGGGTAA
CCCTTCACTT
AGACCTGTTA
GGTTTTAGAA
GCAATGGTTG
CTGAAATTTA
CCCACATGGA
TACCTGGATA
CGGTTTGGAC
CTACCACCGT
GACCTAGACC
AAGGAAACCA
GAAAGTGAGA
TATGAGACTT
TAGGAAAGTT GAATAAACTA TTGAATACGC ACTGGATAGA CGGTTACCTT TATCTATAAT TTCCCATGAC CTTTATTGTC CCTATGTCAT TGAACAGATG GTCT'rAGACG TAAAAAAGGA CAAGGCCTAC GAGCATCGTC GCCTGAT'rAC CATTAGTAAT ACCAAGAACG ACTCGTTATC TGGAGAACCA TGACACGCTT TCGA'TrATA GAAATCATCA GCAATGCCTA CTATCCTGTC ATTGAGCAGA
AGTTTCTCTT
TGGACAAGAG
TGCCAGTCTG
TAGGGATGAG
GTCAATGACC TCTTGCGCCA GCGAACTACC AAGAAAAACC TCTNGTCCT GTCI'GATTTG 593 GAGACTGGTA TGGTIrATCT GACGGCAGCT GCCAAACAAA ATTCAAGGTC ATGCCTTGTA TCGTAGTN'T GATGAGATTG GCCATGATTG AGGCTCATCA GCrGGTATCC ATGACAGACC CAGCTTTCAG CCTC?1'ACAA CAATATTCTA AACAATAATC TTGACTATCA TTCAGTCTT GCTAGCTGTTl AATGTTCCCT TACC =TAAC AGATGAGCCC GCAGGrrrGT GGATTGTTTT ATCCTTGTTA AAGGAGCCAG AATGGCGATT GAAAATTATA ATCTGACAGT CCCAAGCCTG CAGGCGCAGG
TTGGCAGTCG
CATGCTTGGC
CTAAGGAAAA
TACCAGATTT
GAATAAAGGC
ATCGGATTNT GTTAGA~cAT AGAGAGAACA GN'GATGAT TAATCTCTCA GATTrTACAG TGAATGACAA TTTGACAACC TGACAGGC'1r TTrCGGAATG TCTATATCAG TTTGGCTAGT 'rrGCGAAAAA AAGTTAAGAA TGC TGTGGAA GCAGTCTATG TGTTTTGGTC GATI'TGGATA GATGAAGCAA TGGCTACA'rG TAACACCAAA AAACGCGTTC GGCCTTGAAG CCCTTCACAT AAAGGAAGTG GTCATIGG'rTG AGGGATTCGG TCAATTTAG TAACCGAACT CGTGAGCGTC 31920 31980 32040 32100 32160 32220 32280 32340 32400 32460 32520 32580 32640 32700 32760 32768 ATACCCTCAT TGCTTGGAAC AACCCTGArG GAACCCCAGA ACCTTCGGGA CGCGGGTATT GGCATTATCG TAGTGTCAAA AACGAGCAGT TGAGAAATTT GGGATTGATT TTGGTATTGA CCCTGCTATG AAGGAATTCC GTGACCAACr CATGACAGAT ATACGAGCAG TCAAACCCTT GGTCCAACAT GACTCAATCA
GTGTATG
ACGTTTACTG
ACTATGACAA
CCCACCGTGC
AAACGCAGAT
INFORMATION FOR SEQ ID NO: 72:
SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14872 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:
CCAGTCACAA AGAAATTGAG CGCGTTCAGC TGAGGATGCA CTATGATGCA AGCTACATTT CATTTGATGG GATATTAAGA AAGGAGATTT TCATGACACT TTTAGATGTA AAACACGTTC AAAAAAT"?rA TAAAACACGT TTTCAGGGCA ACCAAGTAGA AGCCCTCAAG GATATTCACT TTACCGTAGA AAAGGGTGAC TACCTTGCCA TCATGGGTGA GTCTGGTTCT GGTAAATCAA CTCTTCTCAA TATTCTAGCT ATGTTGGATA AACCAAGTCG TGGTCAGGTT TACTTGAATG GAACTGACAC CGCAACTATT AAAAATTCAC AGGCTTCTAG TTTCCGGCGT GAAAAGCTAG GATTTGTCTT CCAAGACTTT AACTTGCTAG ATACTCTGTC TGTTAAGGAC AATATCTTGC TTCCGCTTGT CTTGTCAAGA AGACCTATAA CGGAGATGAT GAAGAAATTG GTGGTGACAG CTGAGAATCT GGGTATTAAC CAATTGCAAG AGAAACAGCG TGTAGCAGTA GCCCGCGCCA AGAAGTACCC TTACGAGATT TCATCACAGA ACCTGAAATT CATCTGCAGC CTTACTTGAT TGGTAACCCA CTCAACAGCA TCTGGTGGTC
CTCCTTGCGG
CTTTAATG
GCTGCTAGCA
ACGAGCCAAC
AAATCAATGA
AGGAGCCCTT GATTCCAAGT GCGTGGGCAA ACCATCCTCA GGGCCAAGCG TG7TCTCT'TT AGAAGACAGA GCG1'CAGATG AGGTGAATTA GTATGTTTCG CGCAAACTCT ACTATCCCTT T'TTACTCCC TAACCTTCAA GCAACACTTG GATTTGGTAT CCAATAGTTT TGTCATGAAA TGGAGAAGCG CCATCTAATC A'rCAAAGACG
TTCCAAGAAA
ATTAACCAAT
TGCACTGGCT
TCCAAAGATT
GTrTGTCGT'r AACCGTTrCCA
AGTATGACCT
GCATTCTI'TA CAACCAAATC TACCG'rGGAG TCTCTGATAC CTTGACTGTC ATGGCAAGCG AAGTTAGCGG TATCGAACTT GATTAAAAAC
GTTCTCTTGG
GCGGAAATCC
ACCCTTGCGT
AGGAACTGGG
TTAAGGAGTT
a CTGTTGGAGC GGGTATCGGT TCAAACTAAT GAAACTGAAG CAGTACTTGT TGTCTTTGGA 'rCGCCCGTAT GAATGCCCTC GCTTCCTACC TCTCCAAACG CCCTrACGGT AACCGATCCT 'N'ATCTTTGG 'rACT'rATCTA AGAAAAACAA GAAATACTAT TCCGTATGAA GAAAAATGCG TGGTAACCAT GTCAGCAGCG TAAATCCTCA TGATr'rrGGG TCTTGAGCCA GTrMCAAGT ACAGTAACTT TGGTATTGCA ATTGGAGCCT TGTTTGACAA GTTGAGCTGG TTGCTACCTT TTGATTTTCC TAGGCCTCAT CAGCTCTCGC GTGAGAAAC ATTCTTGGTT CCATAAGTTT CTTACAGCCC TAACAACTTT TTGTTTAATG CAGGGATTAC TACCAACCTA ATAACCTCAT GTTGGACTAG CAACCATCGC ACAAGCA'rTT TCAATTCCGC GTTTCAGGGC AAAATGTTGA G.ACAAAGGTT ATAGTGTCAA AATCAAGAAG GAACCAAGT'r CAGTCACCAT CACCTATCTC GTGGAGGAAC CACCA'rTCAA CAcCATTATC GTCCTCTATG 'rATATATGGC ATGTTAGCCT AGTGGTAN'T GGGArCTAA GTTAATTTC GCTT'rCCTGC CCAAATGAAT GTTGTCATTG GTTCCTGAAT GCTCTTrCGAA AAGCGGAGAG AAAAGAGGTC AGGGATTGGC T1ATTATCTTG C'rrCCTAGCT GTTTTGCTGG AGTCTTCCTA CAAATCTTAA ATCTGTTTCC AACTTGATT TATTTTGTCA ACAATGCTTT AGAAAGCTTT AAAAAAGTTC AAAAGAAGAT TT'GGACAAAC AGAGAAAGAA GTACTTCGTT AACTATTTTT GAAAAAGGAC CCAAAAAGAT TATGAAAATA TCTCTTTGCC AAAAATGACG ATT'rTC'rGTC AAAGAAGAAT TAATATCTTG ACTACTGATT 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 a. AAAACCGTGT CCAACCCACA ACAGTTTTCA TGGTA'rTTGA TGACTGGTCA AAAACTGTCT CTATCAGGAA ATGAGGTCGG GACTGAAAGG ACAGAAACCT CTAACTCTAA ATGATCATCA TTAATAALAGA TTTCATTGTG AACCATGTTC CAAATAAGTT 595 ACAA'rTACCT TGrrGTTCCT QATTACAAG CCTrrGGA TCAATTCCCA GATCGGCTA TCTATAAMCA G=rACGGT GGTATGAATG TAAATGTCAG TGAAGAAGAA CAACTCAAC 'rCGCTGAGGA GTATGAAAAC GCTATGTTTA TGGTAGCAAT G'rGTCTTCTT TATCGCTATN TCTACTACAA ACAAATTTCT AAGTCGGTTT GGACCAAAAG TACCTCAATC AATTAATGC CTAGCACATG CTAGTTCTCA 'rCCTATCCA
GAAGGCTACG
CAAATCAAGC
TTATCTTTAT
A).GACCG1TGA
A.AACCATCAA
TCAA'rTAGAC ACAGAAGGTA GATGAGTCCC CTCTTTGGTG GG'TCGGAACT GrmCTGTCA ACGCTT'rM-r ATCTTGCAGA CAAACAGGrr TTAACT=
TCTTCCTTCC
TGATTTTAAA
TCTGCGCTAT
GCAAGATTGT
TCTAAATGCT
AGTGATTGGT
CTTCCTCZATC
GCAAATGTAA
GAAAAGTTGT
GCCTTCATAC ATCTCGCCTT TGCCTACCAT GTACTGGATA CGACTATGAT GTI.GATTGG GCCTATGTGC TGATTT'TCAT GATTACTTCA AAAAGATACC TCGACTTCAA AATCGACGTA CCGAGCAGGA AGGTAACTCC CATGGTCAAG GTTTTGGTTG GGGCCT-ICC AAGTCTAGCA CCGACAATAA GGACGGTAGC ACGGATGCGG
ATGCTTAGCC
ACCTTGTCTA
AGAAGTTATC
TT-CTTGTA'r
AGACCAATAG
CTTGTGTAAC
TAATCACTTG
CAAGTCCG AATCATAGCT CAGTGAGAAG AAGGGCCACA GAAAAATGGT CACTGACAC AAATGGCAGC GTGCCAAGGA ATTGGAGGCA AACTCTAAG GAAAAAGGCA ACGAAGCTAG TTrGGTAAATT C~TCATACTC AATCCCATAT -TTTCCTCTA 2400 2460 2520 2580 2640 27 00 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 CCAGAGCCTT GAGTGGA'PTT TTAAGAAAGA TGAATTCTCC ATTTTGGATA TAAGCAGCAT GGTCTAGCAA GAGTTTCT CGCGAAACGG CGGATACATA TTCTCCACCA GCCATTGAAA ATAAAAAGAT AATCCAGATA TTGGTCGTGG TGGAAATAAT TCCATCGTTA GCATCAAGAA CAAAA7'rTGA ATCAATETG TGATTTGTTT- ATCAAGAAGT CTTC'rCTGGG TGACTTGTAG CACTN'TCGA TGAATGCGAC TTCTTTGGGA CTACGAATGA GCCTGTTGTG ATTCTCATTA GCATAA'rAAA TCTGCTCCTC AGAAAATACA TCTCCGrGA TGGAAAGAAT CTTGTGTTGT ACATCAGCAC TTTTATTTGG AATCAATCGG TCTTATTGGT CALAGAGTTGG GCAGAAGTr'r AGAGGGA3'TT TTTGGCTAGT TCCCTATCTT
CAGCTTCCTC
AGGCACCAGC
CACTGGCAAC
CACCCGCACG
CTGACGCTAA
TCCAAGCAT
GTCATTTCT
GTTCCCCTTr
TTAGACAAGC
TTTAAGATGA
ATGATCTGAT
GG'rATCTTTT GGAGTTGAA TAAGATAGCC GTrAAAACCTG TCCGATAACC ACACCAGCAA CAGGATArT AAACGACcC AI-'TCAAGTr CAAGTTACCC TTT'rAGGATA GTTGTTAATC TGGTTCCCT'r AGGTAACCAT CCCAAGAGGC ATAGGGATGT GATTGAATTC CGTTCCATTA CTTTTAG.AGC CTGATTGACC GTCTAC?rT TCGATCCGTC 596 AAGACAAGCA AGCAGrAGT-r TTCGCTCTC GTAACTAGAA CTGTATCAAT CTCATAATGC CCATTCTCCA AGCGAAAATT GATAGCTTCA AGCCGCTGTT- CGATGGATTG ACCAGCAGGT TTAAAGTTGG TGCTGGCCTG TTTCTAAGC GCTTNTCCTT TCTAGGGTA AAGCAGATCC TGTTTGCTTA ACCCCAATTT TCCATGATGA ATCCAATAGT AAATGGTTGA AATTCCCACG ?PAACCCCTTr TAGCCATCAC CATCATTTCA GGCGAAAATT rGGrrATG ATACTGGAGA ATC?-rrCCT TT'AG~rCCTT GGTCAAGCTT GATTTCTrA CCGAGCGCTT GCGATTGTTT TCATAAGACT GTTGAGCATA GTCGOCAGAA TAAACCTrr
CATTGTCGGA
AGAGAGGCAA
CTATCGATTG
CTGTCCCACG CTTGATTTCA GTG'rGGATAG TTTCTCTATT TGATTTCCCT TCTTTTTCC TCAAATGTTC GCCTrTTGTA GTATAATGG'r TCAACCCC TTTTCCAAGA TI-rGAGGAAC 'TrTCCAAGC ATCTTTCGAT TAAGCGACG TTrGCATCTC TGrGCCTTTC TTCTTATGC CTACAAGAGC TGGCTAACTT CATTATAGAA AACAAATCGA TCAGGACAGT TATTCTAGTr TCAATCTATT T-rGTGTTTGT GTGAACAA CAAGTATAAC 'rACGGCTAG rrGAACCATC TAA~rTTrAG CT~rCAl-rrA CGAACATATA GTAAAATGAA AAAATCTA'rT TCTAACAA'rG 'rTTAGAAGC ATATTTITGT TTTTTATCAA AAAATACTTT AAGCTTAGAA AATGAGATGA TG~rTC'TAG GGACACGCAG GGTTGAATGC CGAAGCGTGG AGCCTACTTT TATAAGTTGA TGTTAGGACA
ACAGAGGTGT
GAGGGCTGGG
ACAAGAACAG
AGAGGTGTAC
ACAAGTTCTT AAAAACATGA TATAGTAATA 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 41960 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 CAAATATAAA CCCGAGTAAA TTGAAAAGCC ACATTATTGA CTTGTCCTAA TTCATAAATT
AAATGCCTAC
TAGGGTTAAA
TTTAGTGTGG
TGAAAGCACA
CTTAG4GAAAC
TGCAGAGCAG
ATAAGGTCCT
CCAATTTATG
CGTCATCTTG
ATCGGGAACA
GTGAAAACCT
AAAAATATTG
AGGCCTTGGT
TGAAACGATC AATAAAGTAC GTAATATTTG CTACTAGACA GACATACTCA ACAGAAACCA AAATAAACAC GTCAGAAGAT GCTCTTTTTT CATGAGTCAA CCTTTAGTTC CTTAGTTTTC AAAGGAGTAT G=T'GAAAG AGTTAGATCA AAACCAAGCC GAAGTTACGC AAGAAAAGGA TTGTTCCCTr TGA'rGTTCCA GTCACAAGC GTGGACGGCGG AAATCCAGAA CTTGTCGAAC CGCATTGATG TCAATTCGAT GAAACCTTTG GATAA'rTrAG -CGTGATGCAG AGGAGCTGGC TGCAGATGCr TTTGGAGCTA GGTGGAACAA CTTCATCGGT GCAGACTATG ATTCTGGCAA ATTATTCTGC CACGAAATGT CCATAAATCT GCTATCAATC ATITCCCATCT ATATCGACAT GAGTGTAGAT CCTAAGATTG AATGACCGAG TAGCACAGGC CATAAAGGAC CA'rCCAGA'rG TCTTAGGAGA AAAATGTGTA GCCATCC'rAT TTCGATTATT GCCATGCCTT TCTAATGATT CCTGCAACGC AGGAGATAAG CG'rTGGT-C' ATGTGGTGCC GTATCGCTTT- AGGTCTTGAA CTAAGGCTAT CCTAATCAAC 597 AATCCTACTr ACTACGGCAT CTGTTCAGAC CTAAAGGGGT TGACAGAAAT GGCTCATGAA 5820 GCTGGCATGA TGGTTTTAGT AGATGAACCC CACGGACCGC ATTTGCATTT CACI'GATAAA 5880 CTrCCAAI CTGCTATGGA TGCAGGGGCT GATATCGCAG CAGIrCCATr GCATAAGCTr 5940 GGTGGGAGTT TGACCCAAAG CTCCA A CTTATCGGGG AGCAGATGAA ?1'CTGAATAC 6000 GTTCGTCAGA TAATTAACCT GACCCACTCT ACATCTGCCT CTTACrCT GATGGCTAGT 6060 TTGGATATTT CACGTCGCAA CTTGGCCCTT CGTGGTAAAG AGTCGTTTGA GAAAGTCATr 6120 GAGCTATCTG AGTATCCCG CCGTGAAATC AATGCTATCG GTGCACTA TGCCTACTCA 6180 AAAGAGrrAA TAGACGGTGT 1-rCGGTTTGC GATTGACG TAACTAAGCT CTCAGrrTAC 6240 ACTrCAGGGTA TTGGCTTAAC AGGTATCGAG G'N'TATGACC TCTTGCGAGA CGAATACGAC 6300 ArrCAGATCG AGTTGGGA TATCGGCAAT ATCTTGGCCT A'rATTTCCAT CGGCCACCGC 6360 ATCCAAGACA TCGAGCGC'rT GGTTGGTGCT CTGGCTGATA TrrAAGAGACT CTATTCAAGA 6420 GATGGAAAAG ATPTrGATAGC AGGAGAATAT ATTCAGCCCG AGTTAGTGCT GTCTCCGCAA 6480 *GAAGCCTTCT ATTCAGAAAG AAAAAGTTTA ACTTTGGATG ATTCTGTrGG ACAGGTC'rGT 6540 GGAGAATTTG TTATGTGTTA CCCTCCAGG'I ATTCCTATCT TGGCTCCTGG TGAACGCAT 6600 ACACGAGAAA TTGTCGACTA TA'rCCAATTC GCCAAGGAAC CG~rGCTC CCTCCAAGGG 6660 ACGGAAGATC CAGAGGTCAA TCATATCAAC GTTATT'AAGA GAAAGACAAA CTATAAGAAA 6720 *AGTCAATAGT T1'TATCTAAA CTAT'rTC TA 'N'TCAATTG ATGAT'rTGGC GA'rGAT'rTTA 6780 GAGCACGGCA AAAAGCCCTT GAAT'rAGAAG CGCTCAATCG CTTAATTTCT ATCAGCTTAT 6840 .CAAATCCTGC CTCAAGCC -r 
TT'CTGAG4GAT TAGGGTAGCG TGTCAAGAGT TGGTAGGTAT 6900 ATrTCGAATG CTTTCCAACG AT'rrTATCCA ACTCAGGAAA GATGATATCA AGACAACGAG 6960 TGTATTGTAC TTCCAATCA GACTGrTTTT TCTTGAGACG ATGAATATGT CTAGCCAGTA 7020 IrTrAGTTC TACTTGCCGA TTATCGTGTT GAAATTGTTC ACGA7TGGG TCAGAAAGAA 7080 *GTT'rAAGAGC GATGCCATGA GCGTCTTTCT TATCCGTTTT AGTTTTGCGA AGTGATAATG 7140 ATT1TGGCAAA 'I-1-CTTGATG AGCAAAGGAT TGTAGGTGTA AACTITATAT CCTTGTCAT 7200 GCAGGAAGTT CAGTAGATTA AAGGCATAAT GTCCGGTATT TTCAAGAGCG ATGAGACAGT 7260 ***CTTGGTGAG CTGTCGAAGA GACAGATCTA AGAGTTCAAA ACCAGCTrrA T'rATTTGAAA 7320 AAGTGAGTGG TTTAAGAACA GTTTTTCCTG GAACATTCAA GGCTGTAACA TCGTGTTTAT 7380 TTTTAGCGAC ATCAATGCCC ACATAAAGCA TGGGAGTATC TCCAGATATA GTATTTCAAG 7440 TCTACTGGGT TATCCACGAA CTI-1I-GCCT TGTTACCrrA GACGAGATAA AACGTCTATG 7500 598 CGTTATCAAA CTCATTACCA AlnVIAAACAA AAAACTGTGG TTAGAGCCTT TCGGAAATCG TCAACCGAI' GGAGG.AAATG AACTAATCCA CAGTGGCTI'A TTCCAAGTAT ACCAC?'rGG CTrGGCAGT AGCTAACTGC CTAAATATA TTCTGAAGTT CATACTCCAG ATGTCAAAT TCGAAAAAGT GAATGGCACG ATA'rCGAAGT GATrT'AAAT GGCCATGTCT 1'GTTCTC.AGA CGTTCACGTT CCCATGGCTG TCCACCCAAA ATA'rAAGGAG AAATAGATGG ATTTATGG7-r GTCTCTGAGA ACAGCCAAGC AACTTTACCC CI'TGGATACG -CC=CT'TG GGAAAATACT TGCGGATGAT TTCGTCTACA ATGAAATGAC TCCAAAGAAA GTATTGGTTA TTGGGGGTGG CTATCCTGAA CTGGAGCAAA TTGATATTGT TCCTGAGTAT TTCCCAGACT TTCGCAGG CCAAAATGGG CTACCC7rTT TGCGAAACTG TGACGGCGGT GTTGCCCPAG GGAACCGGAT GAGATG'rG GCTAGATCAT CCTCGTrCTTA CGAAGATGAT TACGATA'rrA ACTCTTTACC AAGGAATTCT GATTTACCAG CATGGGAGTC CCGCAAGGTC AATCAAGCCT CCCAGCTGGC TATTGGTTGT
TATTAACCCT
TCGAGGTCTG
CCATTTACTA
TCATCAACGA TGCGACAGAT CCATTTGGCC ACGGCAATAG TTATCGAGCT CTGAAGGAAG CCTTC'TTTCA CGAGGATGAG TCGGCCTGCC
ATACGGAAGG
ACGGCATCAT
GAAGCATGCA
TTCCAACTAC
TCAAAGArT
CAAACTTACA
TTCCAATCAG TCGGrrTAT TTG4GATT'rcC ATCGAAAAAA
CAGGCCCATA
TACCACCCTG
TACTACACTG
7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 TGACAAGGAA GGCTGGAAAA AACGCCAGCT 'N'TCACAGAA CGTGCGaAGCC
AAAATGACTC
ATTTGTCAAG
'rGCGATGACT TTTATGTTG;C CCAAGTATGT GTTTACTAGT TAT'rGGTGT ATAGCGAAAC ATTTACAGAG TGAAAGCGAA GCTAGAAGGC CTTGATGCTG ACAAGGTTGA AGAAGTGATT CTITIrGAATG TAGCTCTGCC T'rATCA-AGAT
TGAGGACATI'
GGGGGCGTTG
ATTATGATTG
AAAACAAGTA
GCCCTGATTG
TTAACCATTA
GAAGCAGAAG
CTTGGTTTTA
GAAGCAGGCT
TTAGAAGAAG AGGAAGGAAA CCCAAGTTGC TATTTCAAAG CTAGCCG'rAC CAAGTCAAAA CTAAAATT'GA AACTGCAGCA AAAGCTACAA ACCAGAAGCT TGGATGCTTG TTT'GGCAACA ACACAGAAGA CCCTGAGTGG CAGCCTACTT TGACTACTCA TGACTGCTCT TCT'rGGT'rCT GGTGTTCACT ATATCGATAC CG'rGCTATCT ACCAAAAACG TGGCAGTCCG CTTATCAAGA GGTI'TTGACC CAGGTGTAAC GAAATCCATT ATATCGACAT ACCAACTTTA ATCCAGAA.AT GATGGGAAAT GGGTCGAAGT G7rGACAAA AAGATATGTA
AGCCAACTAC
TTGAAGGAA
GAAATTCAAA
TAGTGTCTTT
TTrAGACTrGT TCAGCTTATG CCCTCAAACA CTA'T'r'rGAT AATGGCGGTG ACCACGCTTA TCCA'NTGCA TAATCTCCGT GAG'r'TCTG CGCCAGGTTC TTACTGGGAA CGAAGCTATG TCTATCAAGC GTGAGTATGA TTTCCCTCAA TCTCCTTCAC CATGAAGAAA TCCAATCATT GGCCAAGAAC ATTCCAGGTG TCAAACGCAT ATGAAATGTC ?rGAAAATGT GAAATTGTTC CAATTCAAITT CG1TACAGTCG GAAAAACCAA AAGACTATCT ATATCTACAA CAAGCTAT1rr CI'ATACGAC GGAACTTGGA AACAAGCTGG GAAGCTTTGA ATGAGTATGG TAATGAAGTT AG.AACAAC'TA TCGTT'rCTT'T ATGACTTTTG GTCAATCTTA CN'CACGCAC TGGACTCCTT CGTACCGATA CCATTAACTT TAACGGCCAA TTTGAAAGCC 'ITGCTTCCAG ATCCTGCCAG TCTTGGGCCA TATT1GGATGT ATCTTT-ACAG GTGTCAAAGA CGGTGTCAAA TGTCTGCGAC CATCAGGAAT GTTACGCAGA AGGAGTTCCA GCCATGATTG GGACAAAATT AGTGTATAAC CTTGAGGAGT TAGATCCAGA TTrTGCCATGG GTTGTGGTTG AAAATCCACA CCAACACCAG CCTATGTTAT TGACTTGGCC CGGTaCG
AGTCATGAAC
TCCATTrCATG
AATGGTGGAC
AAGTTAGAAG
99 9 9 99. 9 9 9 9 9 99 9 9 9 CTAATTGCCG CATTCTACAA TATCTACAAG AAGAGGCCGG TTGCAAGGTC TTG=~GCCC AGAAGGCATA 'IrCCCTCTAC AAAAcrrATc CCTTrcATrAG CCAGTATCTA TCAGGTACGA CAGCTAG'rGG ACTCTATGAG GCCAAATTGG CAAGGGAAGA ATITCCTGGT GAAGTCCATG TAT'GCGCC TGCTTTCAAG GATGCAGACT TGGAGGAATT GCTAGAGATA ATGGACCATA TAGTrCITAA CTCAGAGAGA CAGTTGCGTA AACACGGTCC GCG7TGCGA GAGGCTGGTG TCAGTGTTGG 7TTGCGCCTC AACCCTCAGT GPCAACTCA AGGcAGATCA CGCGCTCTAT 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 9 9 9 99 9* 9 GACCCTTGTG CACCAGGTTC CTAGATTTGG TTGACGGACT CAAACAACTT 'rGAAAGCAG'r CTCAATATGG GTGGTGGTCA TCAGAAATCA AGCGTATCCG GCCATTGCGC TTAATGCGGG ATGGAAATCT 7rGTTTTAGA CCCTATCGTC CACC'N'rGAG
CTTT
CTTCTA ATACCTGTCT GTCCAAATCG GAGACAGACT AATACCTTTA ATGGTATTGG AGCTTACTCA AAGCTTTTGG CAAAAAAATT AGGCTATCAC
TCATTTTCAT
AGAAGAACAG
TCATATTACA
AAAAACTTAC
TTATTTAGCA
CGCCTCTGCG
AAATGGCTTT
GACGGGCGAT
TTA7TrrTCAA
ATTGCCAAGT
CTATCAACAC
ATGCCAGCAG
ACCCTTTGCG
TTTGGTCCCT
AGAGAAGGTT
AATCTTGAAA
ACTGAGGTAT
ACCTGCCATA
GAGTCACAGG
GTGATTGGTG
TCGCTTTGGA GTTACTATAG ACAAGATTCC GAGTGATTTG AGCAGGGAGC AGATGATTTA ACTTACATGA GGTAAAATGG ACGATGTGGA TTTGCTGATT TCTATATCGA GCCTGGTGAA TAGATATTGT AGAAAACGGT TGCCTGATGT ACTTGAGATG AAAAAGCCCA TACCTACAGA ATTATAGTTT TCAAAATCCA GACATGGCCA 'TrTATTCTTT TGTCAAAAAT CTCTATCrCA TGGACGAACA GGGAGACTCT TTTAAAGGGA GATTATCATG ATGGACAGTC AGTACGAACC CCATCATGGT ACCCTCATGA TATGGCCGAC TCGACCAGGA TCATGGCCTT TTCAAGGAA.A GGCTGCTAAA AGAGCATTTA CTCAGATTAT CGAGACCATA GCAGAAGGGG ATCTATCTGA AGCCCAATCC TATCTrGGAG ATGATGCCTG GGCGCGTGAT ACTGGCCCAA TAGCCGTGGA ?TGGGCCTTC AATGCTTGGG ATGAAGAGGA TGACCAAGTA GCCAGTCGTT ATGCTAAACC TTTTGTACTG GAAGGAGGCG TCGTAACTGA AAGTTGCTTG CTTAGTCCTG 600 AAAGAGTCTA TCTwTTGGTG GAGCAGGCCT ACAACGTTGT TTATTAGAC AITCCCACCA cCATTCTCGT CAATGATAAA GGTAAGAAAT GAGGCACCTA TGATGGTC1-r TATCAAGATT r'rGCTGAGGC CrGGAAAGG CCT'GTCATG CAATCCATAG CGATGGTCAA GGAACTATTC GTCGCAATCC TAACTTGACT AAAGAGGAGA TTGAAAACAC ATTATTAGAA AGTCTTGGTG CTGAAAAAGT TArTTGCT= CCTTA'rGGTA 'rTTACACGA TGAAACCAAT AGCTGTTr GGCTTGGACA ATCTCGAACT CTTAGAACAG TGCCTATCCC TGCAGTTCGA GAACACGTCG ATAATGTTGC GATGACGAAA A'rGATCCCCA GAAACAGATG CAAAACGTG CAAGTTGTGA CAGAAGAAGA TGCCTTTGTT CGTCCTGCTC G'rATGCCATG TCAAAAGCAG TCACTTCACC ATTCATAAAT TTrCCcAGGC TACATCTATG p S. 0.
0S *O 0 0 AAGAAGGAGA AGAAA.AGCGA TACCCAGG'rG AACGACTAGC AGCTTCCTAC GTAAACTT ATATCGCCAA CAAGGCTGTC TTGGTTCCAC AGTTTGAGGA TGTAAACGAC CAAGTGGCCT TAGATATCCT CAGCAAGTGT TTCCCAGACC GTAAAGTITG'r CGGAATACCA GCCAGAGATA TTCTCTTAGG TGGTGGCAAT ATCCACTGTA TCACCCAACA AATTCCAGAA TAGGAGAAAA 11100 11160 .11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 AGATGAGAAA TGTAAGAGTT GCAACCATTC ATATCCAAAC CGCAGAGCGT T'rAGTACGTC TCT'rCCCCCA GTTGTNTTGAA CATCCCTATT AGTATGCCCA ATCTGTAGCG GAAAATACTG
AGATGCAATG
AGCGCTGA
TCTGTCAGGA
CCATTCAGCA
CGCTAAGGAT
GCAAGGAGCC
ACGTCAGTAT
TTTTAAGGTG
TGCTAATGTC
TTATCGAAAG
AACTACAAGT TGTTrTACCA ATCAGTTTCT A'rGAAAAAGA CTATTGCCGT CA~rGATGCA GATGGGGAAG TGCTCGGCCT *0 SO 4 .0.5 0**0 *0s 0* 55 S 0 0 CAGATGACCA TTATTATCAA GAAAAATTCT ATTTCACGCC TGGTAACACT TCTGGAATAC TCGCTATGCT AAGAT'rGGTA TCGGTATCTG TTGGGATCAA AAACAGCGCG CTGTCTTGCA TTGAATGGTG CTGAATTGCT CTTTTATCCT GrCAGAGCC AATTTTGGAT ACAGATAGTT GTGGTCACTG GCAACGTACT ACGCAGCAGC GAATATTGTT CCAGTCATCG CAGCCAATCG TrATGGTTTA CTCCTAGTGA CGAAAATGGC GGACAGAGCT CCAGTCTTGA CTTCTACGGr TGACGGATGA AACAGGAGCT ATTCTAGAAC GAGCTGAAAG ACAAGAAGAA TAGCTACTTA TGACCTAGAC AAGGGAGCAA GTGAACGCCT AAAcTGGGc
GTGGCAACAA
CAAATTATTC
GACTACTACC
ATTGCTAAGG
7rGTATAACT
ACCCATATAC
GCTTTCAAGG
TGGTTCCC'6)
ACAGCTATCG
ATGCAAGGGC
GAGGAGGTTA
TCCTCCT'rTA GCTG7TCTGT
TTGTTCGAG
601 ATAGAAGACC AGAAATGTAT AGACAAATTA CAGATTAGTG 7GGGAGAAAT GAGAGAT1TCA T'rCTGCTAGA CTAAC?1'C~r ATTAGTAACT ATAAGATACT ATGGCATCTA GTAAATCGAT MITATGAT CGCTATrTT GTCTATTGAT TAGTCCGTAT TTTAAAATAT GCAAATAGCA GTAACTTCTG TCTATTTGCT T7"rC'N'T ATAGAATATA GCACGCGCAA CGCCCTCTTC TTCTGCTT GACGTAACGG CATCCGCAAG TAA'rCGCTGG CAT'rrCCCAT TGCAATCCCA ACCCTrGCAA ACTGGAGCAT TTArrAGCAT CGCCCATGCC CATAATCTCT GAGGAATCAA TCTTCAAAAT CGTGAAAGAG CAGTAGCCTT TGTCGTTCCA AGCGGCATTG CTTICATAAAT GAACCAACTC CACTrGAATCG TTGG.CAAAGC TCT'rCAGCAA AACGCTGCTC G7TTTCTT TTGTTCCTAA ACACATACCT TCGAACATCC GGAACTTTCC TCT'rCAAGAG WATTrCAGT CAGGTCTGAA AATACTAGTT TACCATCAT TGATTGGGCT TGTCACCGAG AACAAAATAA TGTGACTCGT CAAAAAGTGT
TAGCAAAAAA
TTTCTCAATA
AGATTTGATA
TTCGATATCG
CTCAGCTAGT
GACAGGCTGC
AAAATCGTCT
ACTAGTCGCT
TTCAATAACT
CAAC'rGAACA TCACTCTTTT CAGCAAGGTC TCAACTAGAC TCCAATCACT TCGTTCTGGA GGTCAAGCTC CCCGTACAGA GAACCAGTTT CTT"GTGGGA TTTCCTTGGC TTAATCATAG CTCTTCCTCT ATAGAGGTAT TCGATGTCAG GGTCTGGTGA GTTGAACA.AC CAGT7TTTTG TAGTAGGGGA GACACC'rrTT TTCAT'rGAGG
TTATCTTTGC
TrCAATGGCTT
AGGGTGCCGT
TATTATTATA
TAGTTTAAGA
TATACTATGT
TTTT7TTGT;A CTGGACTCAG TTCTTTCCAG CGTTGTTAkAC AATAATATAT GGACACCGAA AAGGGGGCGA TGTGA.ATAGC AGTAATCTGT CCATATCCAA GGCTAGTAGT GCATATTTTG GAGAAGAAAT TGTTTTGATG ACAATTCATG TTGTCAATGT TGCAACTAGA GTAGGTATGA GTAGCAGATT 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 TGATAGAAAG CTTGAGAC'rA ATTGA=T'A ATTTGAAGAG GATATTTCGC AAAGATATGC CAAATTAAAA AACCAACTTA ATATAATAGT ACTCAACTAA TCTGAAGAAT AATGGAGGAA AAATCTAACA AATGAAGAAT TAGAGCTGAT ATATATCATG AT'rTTAATGA CAAAAAATAT ACAAGCTGGA GCACATCCAT ATGGTAAAAA TCCTAATGGT AGGTACGATT TTGTCCCAGA GGCACCTATG GGGAAATAGA ACCAGTATTA ACTCTGCTGG AT'rCAGGATA TATTGGAGGA GGTAATCATC 7rCATGGATT
TTTGCAAAGG
AAGTGCTGCG AGATTTTAAG TAAAATTTAT TAGGAATATG AAGAAACAAG GAGAAA6AC AGAGGATTTA ATATGAAAAA ACGAGCTATT CAAAT'NTTAC TAGCATTGTC CTTAATTTTT TACAAATCAA C7TGG=1rG GAGGCTTTTC AATTATCTCG CAAAGCCCTA TCTACCAGCA AGTCGTGAAT TTTTTCAGAT TCTGCTTTTG ATCGAGAGCG GAGTTCTTTT CTTAGCGGTC ATCTATCTAC TGGTTTTTGC AGGAAAGAAA ATTITrCATT TCAAGTGGCA GCTGAGGTAC TTCATCTACC TTTTACTGGG CTACATCATPT TCATATATGT C1'GACTTCCT CTTTTCGTAT TTCATATCCC TGTCTTCAAA TCAGAPTCT TTGAATGAAA CGGTAGAAAT GATGGGGAGA CAGGAGTTCC CTTATGTCTT GCTCATCGTT TGCTTCATCG CCCCTATTGC TGAGGAATTG ATTTATCGAG GTGTGCTTAT GACAACCTGT TGCAAAAACT CACCTTGGTA CG
INFORMATION FOR SEQ ID NO: 73:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10223 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73:
CGTGCTATCG GTCTCAAAAC CAATCTGGTC GCTATGGTCA AATCCAGTTG GAAAATCCAT TCTTCTTGGA GCCATCTGCT TATCGGCATT TTCTAATACT GTAGGTATAT GTTACTGACT GACTTCGTCA GTTCTATCTG ATCTACAACC TCAAAACAGT ACAGTGII'TT GAGCAGCCCG TAACACAAAA GGTAGCCCAT GTTCTCTCTT GATACAGCGA CTTTCGCTTC TAAACTTTCA GAAAATCACG CGCCACATCG CAGGAAAAGG AATTTCAGCG CCGCCTTGGG ATTGACTGGG GATTGCTGGT TGTTTCTTCA CGAGATACTG ACGAATCACG GGATTGCCAT CATCCTCACC ACTCTTGGTA TGCAGACCCT CTTCGAAAAT CTCTTCAAAC CACGTCAACG TCGCCTTGCC TCGTCAGTTC TATCTGCAAC CTCAAAACGG TGTTTGAGCT CAACCTCAAA ACGGTGTTTT GAGCTGACTT CGTCAGTCGT GTTTTGAGCT GACTTCGTCA GTTCTATCTG TGGCTAGTTT CCTAGTTTGC TCTTTGATT CAGCTACCTT TTTCTTATGC TTCCTCAATC TTCATCACGA TATCATCACA TCCACCATCA AGTCCTAGCT GTGCCCAAAA AATCTTGGCA GGCAGAAATT CACTGCGACG ATAAACATTG AGGCTAGCAT AAGCCTTTC ACCCAAGATT ATGAWT MAT AGCCCCGAGC CTGCATTTCC CGGTCAGACA AACCCACCAC AGCAAGGGTT CCATCACTTG GATTGATAAA TTC77GACTC
CAACCTCAAA
TCATTGAGTA
AAGCGAGTAT
CGCAAAATCT
TCAGCTTTGA
ACAATATCTA
TCGCCACCULG
TTTGTTACTC
TTACTCGTTG
ATAGAAATCC
ACAAAAAAGG
TCATCTGCGA
TTTTTCATCT
AACAGCATCT
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 TCCTTTTCA TCAGTATAGC ACATTTTGAA AAGGTTTGCA AGGACTAGCC CCCTTTTAT TTAGCCTCGT ACCAGGTTGC
GAATTATACT
CCCTTCATTC
TAAGAGGAAC ACTGAGTTGA ATGGCTTCTT CCATGGTN'G TTTCACCAAT CTACCAATTC AGATT'rAGGC ACTTCAAGGA CGAT'PTCATC GTGCACTTGT TAGTCI'GATA ACCACCTGCA ACCAAGGCTT TA'rCCAGCTG AATCATGGCA ATCTTGAGAA TATCTGCTGC CGAACCCTGG ATAGGTGAGT TGATAGCAGT TCGCTCCGCA AAACCACGAA TA7rAAGTT GCGCGAATTG ATAI'CTGGCA ACTCACGGCG ACGCTTAAAG AGGGTCTCTA CATAGCCCCI ATCACGCGCC TCCCGCACCA CTTCATCCAT GTAGTTTTTA ATACCTGGAA AACGTTCAA6A GTAGGTATCA ATGTAGGCTT TGGCTCCI' ACGACTAATT CCCAAATAT TAGACAAGCC AAAGTCTGAA ATCCCATAAA CCACTCCAAA GTTAACTGCC TTGGCATTGC CACGGTCGTT TGCAGTCACA TCA'rCAGGAC GCTCAATGCC AAAGACCCGC ATGGCTGTCG AAGTATGGAT ATCTGCCCCC TCTTGGAAGG CCTTAATCAA GTGCTCATCC TTAGAAATAT GCCCCAAAAC CCGCAATTCA ATCTGTGAAT AGTCAGAGCT GAGTAGCACA CTATCCTCCC 0 0 ACTCTGGCAC AAAAGCCT'rC CGAATCAAC GCAAGTTTGG ATCCACAC'rA GACAAACGCC TATGAATCTT TCCA'rCAGCC AAAATCCACT TCTT'AGCAAT TTGACGGTAA TCCAGGA'Tr GCTCTAAAAC ATCCACTGCT GTCGAATAAC GAAGTCCCAA TTTCTCAAAG AGAAGCACGC CCTCACCAGC CAGCTCGTAA ATCTCT'rGAG CCTGCATCTC AAGCAAGGTC TCTTTCTTGA
GCCCCTGTTC
CGGTCTGGGT
CCTGCAACCC
CAATCGGGCA GGAATA'rw CAAATCCTGC ACATAGCGAG AATTACATAA GTAGATTGAA TCTrAACAkAT CGGAGCAArA GGAGCGAGAC CTGTCTTGGT TTTCTTAGTG TATTCAGAG CCAACTGCTT AGGCGAGTTG ACATTAAACT TCAGrTT'rC AATGACAAGC TCA7TTTCAG CCATAATCCC AGCAATTTCC ATCTTGGCAA 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 GGACAAAAGC CAGAGCTTGC TCCATATCAT AAAGAAGCTC TAATTGCCCA I=CGCTGA GTrrTTCAAG TAAAATAGCC TCTGTTTCTA CCAAAACAGC CCAAGAATTT CTCACGTTCA GGAATGGCCT TTTTAACAkCC CATCAACCAA GTAAGTCTGA CCATAAAGAC TAGCGATGGT CAGTCGAAAG GAGGTATTTrA GCCAAACGGA TGTCAAAAGC CAAAACGTTG CAAAAGAACT T'rAACCTTCT TAAAG'rCATA CTAAGAAATC CTTGAAAATC GGGTCTTGCA ACAGCTCAAG TATCCCCACA AGACCAGACA AATCCAACCA AATTATCCGT GCTCAAAGTG GAAGATAGAC TCTTCACTCA GCATA'rCTTG TAAAATCCAA ACTCTCAGAC ACATCAGCTG ACGACACATT AAGTTTACAA GCTAAGTGTT CTTACCGTAG AAAGTTTCAT CGCAATTTCA TTGTCCTCCA AGGCGCCTGC AAATCCACAC
AACTCTCAGA
CTTGTCTGTG
ATGGTAATTC
ACTGATTTGG
TAAAGCCTGC
GATGTITTTT
GCATAGAGCT
TCACCAAAAA
TCAACAATAG
TTTAGCTGTT
TGAAGCCCAT CTCATCCTAG AA~rrCCCAA GATTTTCAAC A'rCTGGACCA CTATAGACCA AGTCC'rC'AA ACCAATCGCA ATCGGTGCCT TGGTATCAAT GGTCGCTAGT GTTTTAGACA 604 AAAAGGCCTG ?rCCTTGTCA TTGATGAGAT ?rTCC?'rCAT CrrAGAACTC TTCATTCCAT CAATATTTC ATAAATCCCC TCAACAAC CATGCTCCAG CAAGAGCTTA ATACCCGTCT
TTTCACCGAC
GATCGATAAA
CCTCAAACTC
GAATCAAATC
TATCCAGCGT
CCATATGATC
TGGCCCGACC
CAAAAGCCAC
TTGGTCACC CCAGGGATAT TATCCGACTT ATCACCCATC AGCGCCTTGA CTGAGCTGGT GTrGAGGCCCA TTTCTTCCAT GAGGTAA'rCT GGCCTAA6AGG AGCCACACCT TTCTTGGAAA Tr'rCAACCAC CGTATGCTCA TCCGTCAGCT CTTGTCCCCA CTGACAATAG CCCAATGATG TCATCCGCCT CAGCAACTCA CGAATGAAAG ACCCTTATAG TCCGCATACA CAAAATATGA CTCGGCTCAA TAATATCAAA ACCATCCTGC TCTGCTAGCT
CATACTGAGC
GAAArGCTC
TCTCTGTCCG
CCCGCTCCAA
CAGA'rCATAG TGACGAATCC ACGAAACTCA TCAGGAGTCT GAACGTCGTC T~rCCCGCAT TAAATGACTC AACATCAACT 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 GAAAACCATA AATCGCATTG GTATGCAAAC CAGCCACATT CT1'AAAACGG TCCAACTGCT GATACAGCGC AAAAAACGCC CGAAAAGCTA CAGAAGACCC ATCAATCAAT AAT.AATTT TCTTATCCAT ACACCCATTA GCAAGTATTT 7rCAAACTT AGATTrTAAAA ATGTGCTATA AAAATATTTC TATAAATTAA TCACTATAGA AGGAGGAATA 7TrTGATTGCT AAGAAAATAT GC'TGCAC7rG CCAGCAAAAA CTCCATGCCT TACTCGGTGC TAACGGTACT CAAGTAATTA CTTGTGGGCT TACCTGTGCA AGGTGATACT CACTAGAGCT 'rAGTAATTAT TrCrCAGATG AGGTGAGGTA TTGGCGATTA CTTGGAATTA AAAAATACAG TTTT'rCGCCA ACAAGCCCTT CTGCTGCACT ACAAGAGCTG CGAGAAGAAC AAGTCTTGTT CTTGAACAAG GACTAGAGCG
TAAAGGAAAG
'1TCCGAATAA ATATAGrATA
TTTGACTTTC
G4GAGGATTCT
TCAGCAATCC
ATG'TGACCAT
AGCATTTTTA
AATCAAAAAA
ATAGATAGAG
TTGAATCTAT
CTGATAGAGT
CAGACATCCG
AGAAATCACT
TTTGGAGGGA
TACCAGTATA
TACCATTGGG AAGAGCTAGA CCAGAGAATIT TAGTAAACCT AATAGTACAC CTTGACTGCT TATTCACATC TTA'N'TCAAC
GGCATCACCC
TGTCAATTTA
AGCGATATTC
GACGTCITGG
CAACTAATGA 3900 TT~CGCGATAT 3960 ACGTATTACT 4020 CGGAGTTGGA 4080 TCATCAATCA 4140 GTCAGCGAGA 4200 CTATCGTGGA 4260 TTGAGATTCA AGTCCATCAT CAGAATTTTT GTCAGGTTAA TCAAAATCTT GAAAAAATTC ACAAACACAT CGCTCCTGTT TACGCCATTG ACCTGGCTT'r TCATAGCTTT AGTATGCGCG AAGACACAAC CCAACAATGG ACAGGAAAAC CATCTGGTTA AGATGGCATT AGAAACCAGC AAAGACAAGG TTCGCAAGCC ATCGTTGGAG TACC:CAGCAA CCGCAACGAG CCA'rTACCCA AGCAAATCAA GTCCGAGGAG GACAGGAAAA TGT~rAGTCA ACrACATATG ACCACPAGGAC TATGCCTTGG AAACTGCTAG GGCTGAAGGC TGGGAAAGTT GAAGGAAGGG CAGAAAGGAPA ACrTTTTGCC 4320 4380 4440 4500 4560 4620 4680 'TTCCTAGACA TAGTACGCCA AG-GTCTTCTG AC7r=rAGG TTGCCAGCCA GCAATTAGGT ATGTcAcTA'r CTGAAXNTGA GGCACTGrrG TAAAATGGCT CCATAATATC CATAGTGGGT AAATCCCCTA TGGATA'1TAT
TAATGAAAAG
ACAAAATTAC
ACACACAAGG
CGACAAAACA
TAGACATTAA
AAATCATCGC
GGAGCCTAT TrTGTAGAA AAAAAGTCCC ACTCATTAGA AAGAATCATA TGGAACAATT AGACCCTAAT GTCCAGATTT TAAACATCAT CAAACTGGAC TACGACGCCC CATCTTGCCC CTTTCAAAAA CCTTCTAAAA 'rTCCTTATCT
ATATGACCTA
ACATTTTATC
CAATAAG4GAT
TGAGTGCGGA
TGAAACGACT
CTGT'rCAAAA
TT'ATTATATC
AACCAATTGA AGAAATATGA
GGTATGCCTA CAAGAATTCT CCrAGAAAC ATGATGGTCG CTGAAACTrC TGATGACGTA ACAG7'rTAA ATCTAGCTTT ACTAGATTCA CGAAAAAACC TGAGAATCAT CTCAGGCT-rG GTGGAGAAAG TGGTCGTTTT TCATGAATAC TTGAAATCTG CTCAATCTTA TCAATCAAAC CAACCAATTT CTTAATAGCT GA7"rr'GGA GGTGCGTrACA CCTTGCATAC ACTTCAACAG GACAAATCCT TCCAGCGCTA TCAATCATAT CCTTACCGAT GATATTCATA ACTTCACTAG CAATAATAGC GATAGATGTT TTCA.AAAATT CATGGTCCGG GCTGACAACC ACATAGTCAG cTGC.AA'rCAG AGGAGCACCC ATCAAATGAT GCGCAG-CATG CAAGTCGATG GTCAATAAAC CGACAAGT'rT TGAAGTGATr GGCTCACGCG CATAGTAAGG CATGACAACA TTGACAGATT TAATCAAAAT TTCAAGCAGA 'rTGTCATTTA CGTGTTTCCC ACGGATTGAT TCTTCAATGT GAACACTTGA TTTCCCCAAC TCTATCCCAA TATTAGAAGA AAGGGCAAAC AGCTTTAAAT ATGTATAACT TGTGCTTTTC ACAAGATTI-r TTTTTCAA'rC AAAAATAAAA GAAGGGCACC CCGCTACTAT CTATTrATTC GGAAAAAAGA GTCATTAAAT ?NTTrCTCA ATATCGAAAA
GTACGATAGC
GCTCTTCTGG
TATTGTCCGT
CACCAGCTC?
CATCAATCAA
TATTCATCTT
CTGCCAACTT
ATCCCCTAGG AGATGAGCGA CAGATAGATG G'rATCCAAAA AGCAGGACCA GAAAGAACTG CGCAAGAGCA TCTCCCGCA'r CGTCGATrCA AGTGCTATCA CAGTCATATT TCTTCTCTTT 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 564C 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420
GANTACAAGTC
ATCAACGCTA
ACGAGCACGA
AACCAACCAT ACCACGACGC CCACAGGAAT ATCAAAGA.AT GA'rCCACTCC AGCTACTTCA CTCTCGCCTT TCTATCCTGA CTGCACTCCC ACCCTTCAA.A CAGGCGAACT AGTTGATTGT TTGCcCTCAA
CGACGTTTAT
GTCACCCCTC
TCAAAATAAT
CCTTGAATTT
AGCATATTTG
CGTGCATACC
CCATCTACCA
AACATAAAGA
TGACCTGAAT CTCTCCATCT GAAAATTGGC TCTCCTGCGC CACACGTTCT GCCAATTCTT CAGAAAAAGA CA'rGAT'rTCC 'rCCGGTATAT CCATCTACCA TTGTAGCGCT TTTTGCACTA ATAT'rGTAC CCTTGCATCA TTCTTTTGAA AAATATTCTA GGTCA'rCAAC TCATTGTGTr CATAGAGAGC AATAGCCGTA ACCACTGGAA CAAAAGGAGA GTTAAACAAG AAGTGAarrC GrrTTGCC AACCTTCTGT CCTTTATAGG CTAGACCTGA AAAAGTCCAG TGAGAGGCAA AAATAGTAAA GTCAAAGCCC TCTGGCAAAT 77rGGAATCC CAAACCCGAA GCAATTCCAA GAATCAAAGC CA-k"CAAAA ACAAGTGACA 606 TCTCAACAALA GCAATAAGCA TGATAAAAAC TCGCTAAAGG CAAC TCTGTT TCCAACTCCA CCAAGGCTAA ACCTAGAAAA ATAAGGCCCT CTCrGTAAAG CAAGTAAACA CCTACTACAG TTCCTGAGAT GATACGCTCT AAAATTCGCG
CCGTACGAAT
GTAAAAACAA
ATAATT'rCAA ATAACCAATA TCCTTAATCA ACAT7rTAAT T'N'CGCACAG GGGT'rCTTCT ACCAAAGGAG CCGCAATAGC ACN'TCAAAG GCATTTAAAA ATGGACTATC TGGGAAAAGA ACCCCCAGTA AATCATGGAT ATAAGTATTA GCAAAACTAG ACAACCAGCC TGAAAGGAAC ATCCCTCCCA ATAAAGACAG AATCAAAACC TTCTTTGGCA A1-rCCCATTT ATAAAGAGCC GGAATCATGT AAAAGAGAGC TAGAAAGATA *4 0 000 00 *0 9 0 0* *e 0O 0 *e 00 0 0 0 000.00 0 00.0 0 *0 00 0 0 0000 0000 0* 00 0 0 0
TTCCGCACCT
TAGCAATAAA
AAGTATTTAT
GATAATGACA
ACTCTCTTCT
'rTCCTCAAA GACCTCGAAC CGTCCGTATA GTAGATGGT ATAAAAATAA ATAAAATATT GCT=C??C AATTCTACGA CTGTCZATACT TCCTGTATCA TCGTCTGGTA TrAGGGGAGA CTCGATAAGC ATATCTTGGA AGGGCAAGAA GTCCTGGTCT TACTCCTGAA AAGGTTCATr T'rCAAAGGTC 1-rCCCAATAC GGAAGAGAAA GAAACTrCCCA rrAG-CCATA TCATACTGTA AACCAATACA ATACACTrrC TTTCTAAATG ACATTGTAAA TGGCACCAGA AGTTGCATAT CCTCGCGTAC GACACATCGA CACCCAATTC TGAGCACCAC AGTCTGTATG GAAATAACTA GAGJAACGAAT TGAGCATCCC CAAGTGCCAA GTCACAATGG CTACTCrGGT ACATAAGCCT GA~rGGCTTG ATTTGATAGT CAAATATNIC GCAAAGACCT TTTTCAAGAC GGCAGAGTGA TrCCTGTCAA 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 ATGCAATACC ACAATTT'CTC TTCTCCCCAT T'rGT7GCTGG
CATATCCTCA
ACCTAGAGCT
TTTAGGTTTA
CATAAACTGT
GTCACTCGAC
TGCGCAACGT
AGTGGCAGAT
TCAAAATACC
CACCTGCATT CCGCAAAATA GCP.AACGTGA GI'CCATACAG 'rTAACTGCCC ATGTAGGGCA ACACGATTCC CTCCTTGAAA GAGAATTTGT CACGGATTAT CAATGACCTC AATIrCCTTA TCCTATCTTA TCATTTTTAA TTCCTGAATC GTTGTC-ACGC GGAATTCT'rA GGTACATAAA CTCAATACGA TTCACGCGCC CTGAGGATTA GTTGGCI-rGT
TCTTAGTAAA
GAATCTCTCC
CTTTGTAGCT
GCCCAGTTTA GCAGCTTCGT TGTCAAGCCC AGI'TCTCCGA CGAAGCAATA GCAACTGCAA AGCAGA7TT AGATAGCAT AGCCATAATC AAGCTAGCAC
TGATGCGTTG
CAAAACATTC
CAGCCAAGTC
CCTGATTTTG
GC1TTAAAATC AATCGCAGGT TCATCCAATT TAACACCACC CAAGAGAAGC CCTGCCCGTT TTTCCAAAAC 607 AAGTCCTGT'C GTAGTACGCT TGGCATTTCC AAACAT13GTC CGTGTACCA CTCCGCCAAA ATCGGACGCG TCCCTTCCAT GGTTACAACG ATGGAGGAAC ATCCAAACGC TCTTCTAGGA AAACTTGACT CGGATTGAGT ACCTCAACCA CTGCATCTCA AAAATCCCAA TCTCATITAGT GGAACCAAAA CGA7rPTA AATACGAAAG GTGTCGTGAC GCTCCCCTTC AAAGTAAAGC ACCGTATCCA CAACATACGA GGCCCAGCCA AGGTTCCTTC 'N'GGCACA TGACCTACGA AATG7'rATTG GTCrTWCCA AC1TCCATGAG T'rCAGCGGTC ACTTCACGCA AGACCCCTGC ACCCCTGAAA TCTCAGGAGA CATGATGGTC TGCATCGAAT AAAGTCTGGC TGGATACGCT CCACTrTCTrGC ACGAACACTC TGCATATTGG
AAGCCTGAAC
CAGTCGCCCC
AGCCGCCCGA
CCGCTCTCAA
CCATA'rGCTC TAAAGA7TCGC
CCTCAGAAAC
CAATAATGAG
'rCTCTGCATA GAGATAAAAC TCACTATCAA TATCACCTAA AGACTCCTCC CCACTGACAT AGAGAACTGT TAGGAGAAGA GTTGA'TrCC TACCACTCCG CCTCCAAGCA AT'rGATGGAA GTCACCTCAG ACGCGCATTC TTAACTTCGG GTTGGGGCAA CGTCCCAGAT CGCTTTTTTC TNTGCGATGA GGCAAAAA'rC AATCTTCTCA CTACCACAGT CTGATGTTCT
CAATCCCAGG
CACGGTTGAA
CTAGTTTCAT
CAACC2'CAAC
ATTTAGGGGA
CAAACCTCTT
TTTGGCACAA
CGATACTTAG
GCGCTCTGCA CGTAGTr'rAA TCTGCTGGC CCCCACTTGG GACAACTGC TTGAGACTTG ATCCCCACCG ATAAGGACGA GACTTCCTGG T'rCC'CCATC TCCGT=rGG TTCGATTGAC GGGC7TGTT TTCTCACCTG TCAAGGACAC
CTCTTCCACA
ArrATACCCA
TCTATATCTC
ACTGGCGCAT
ACATACATTC
AAAGAAGACC AAGACCCACA CAATTTTGAC ATACAAATGT TAACTCACAC TCAATCACTT GAGCATTCGA TGAGCAACAA TAGAAACCGA GACTTCATTT CTCCCCCTTG TrTCTAAAA ATAGACCTGC CATTCATGTA CACATACCAA GCCG T'A 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 CCGTAGCTGT CTCATAT'rGA ATAGGACTAT TAGGAAGCAA ACAGTCTTCT AGCTGTTTCA AAGTTTTCTA TTCCTGTTTT ATAAAGGCTC TACTCTTAAA GGAAGACCCG TAGCACAGAC AAGCTCTTGT GACTGCAGAA GATACGATTA TTTCAGCTGA CGAGAGTAAA GGATTTTTGC TCAATTTCTG GACTTGCTGC CGTCCCATCT CAGACAAGGG TGCCAAATCT ATCCCAAATC CTATATAAGA ACGCTCCTCT AACTCACGGT AATCTGGCTC CCCATGACGT ACAAAGATAA TCTTCATTCT AGTGCCCTGT CGATCCAAAT CCACCAGTTC CAACGCCATC AGCTGCATCT CCATCTGCAA TTAAGAAAGT AGCAAAAACA GCCTGGACAA TACGCTCCCC AACTTCAAGA ACAACCTCTT GGTCTGTGAT ATTCT'rCATC TGCGCAAAAA TATGCCCTTC ATTTCCAGGA TrCCATAAT AATCCCCATC AATGACTCCA ACTGAGTTAA TTAAAACCAA GCCCTTCTTA 608 CGAGGATNrG AAGAACGATC ATAGAGGTAG AGAACCTCAG TCGGCTGCAT ATAAGCCrrA ACCCCTGTCG GAACCAAGAC AATCTCTCCT GGCGCAACAA CTGTACGCAC AGCAACCTT AAGTCGTAAC CAGTCGCATG CGCTGTCTCA CGCTTGGGCA ATAAATTTTC ATCTGrAAAA CTCGAAACCA ATrCAAAACC ACGAATTTTC ATAAT1TTCT CTrI'TCTAT T ATCATTrATr CTAGA'PTATT CTATACTTAT 'PTA INFORMATION FOR SEQ ID NO: 74: Wi SEQUENCE CHARACTERISTICS: LENGTH: 16535 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 10020 10080 10140 10200 10223 TGGTTCTGTC CTTATCGGCG GACGAAAGTA CACCAACCAC CCTCTTACTG ATACAGCAGC AATGCAAACG CTTCCCTAGA GCACCACAAA CTGGACAAGA AC'rGAAACTA AGGCAGAAGA CTTCCTGAAG AAAACAAGGA TCTGAAAACT GGCCAAACGG TATTACCTAG ATGTCAAATT AATACAGCTG GAAAAAATCT CCTTGTCTTG CTTGCCATGG CTACACCAAC TAACGAACCC AACAACAGAA ATACAACCAC TGGCTCTGGT AAGAACGAAA GTGATATrC GAAAACAGAA GA.AAAACCTG CTGCAAGCCC TCGT'rCAAG'r GAGCCAACTA CTr-C'rACTAG GCCCATCGAA GATAACTACT TCCGTATCCA
TATCTCATCC
CC'rTGCCCAA
TTCACCTGGA
AGCCGATCCA
TCCAGTAACA
TGTCAAAAAA
TGAAAAACCA
TGACTACGGC
CCTCATCAAC
TCCAAAAATG
TGC'rCAAGGA CTATGGACTT AGCTrTGTCC TrCAAGGATG AAAGGGAGAA CAAGCCAAGA AACCGGCGAT AAATCTGTAG
GGGACGATGT
CCAAGAAAGA
AAATTAGCTT
AAAAACTAGT
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 AACGAAGCT'r GGTTAGACCA AGATTACAAG GTTITrCTCTr ACTGTTCGCG TCAACTACTA CCGCACAGAT GGCAACTATG TGGGGAGAirG TGAAAAATCC AAGTAGCGCT CAATGGCCTG ACAGGCAAAT ATGGCCGCTA TATCGACATT CCTCTTAATG 'N'?rTATTAC
AAGTTCACAG
TACACAAATC
TCTAGCATTG
CACTCCAACA
TAGATGAGAG CAAACAAGGA GACGACGTGA ATTTGAAAAA TCATAGCCAA ATT'PTCCTAA.
CATACTATGT CCATGATATC CGTATGACAG AAAGTAGCT'r 'TCAACAC~r GTCGGTGCTA TCACTAATCA CCTAGGAAAC AAGGTAACTA ACGAGCCACA GCCTGCAGGA ACAAGAAATC TCTCTGGTAC ACGGAACAGA CTTTACGGCT AAGCCGCAAG AGAATTTGGA AAATCCGTAA AGAAAATTAT AAGACGATGA TGAATCGAT'r GAGCCCAACA CGTAGGCACT AAAAAGAAGA TATCCTCAAA TTACCGATGT TGCAATCGAT GAAGCTGGTA AGAAAGTGAC CTACAGCGGA GATTTCTCTG ACACAAAACA TCC?1'ATACT GTTAGCTACA ATTCCGACCA ATTCACTACC AAAACAAGCT GGCGCCTGAA AGATGAGACA TACAGCTATG ATGGCAAACT GGGAGCTGAC CTAAAAGAAG AAGGAAAACA AGI7rGATTTG ACCCTrTGGT C-ACCAAGTGC TGATAAGGrr TCTGrI'TTG TCTACGACAA GAATGACCCT GACAAAGTAG 77GGAACT'GT CGCT'I'rGAA AAAGGGGAAA GAGGAACN'G GAAACAAAC1' CrAGACAGCA CAAACAAACT CGGAATCACA GATTTCACTG GCTACTATTA TCAATACCAA ATCGAGCGTC AAGGTAAAAC TGrrCTTGCA CTCGATCCTT ACGCTAAATC TCTTGCTGCT TGGAATAGCG ACGATTCCAA GATTGACGAT GCCCATAAAG TGGCTAAAGC CGCCTr'rGTA GATCCAGCTA AACTCGGACC TCAAGACTTG ACTTATGGTA AGATTCACAA TTCAAGACT 1200 1260 1320 1380 1440 1500 1560 1620 1680 CGTGAAGACG CCGTTATCTA CGAAGCTCA'r GCAAAAGACT TGACCAAACC ATTTGGGACT CTCAAAGACT TGGGTGTAAC CCATATCCAG AATGAATTGA AAAACCATGA ACGCTTGTCT TGGGGATATG ACCCTCAAAA CTACTTC-rCC GTGCGT~GATT TCACTTCAGA TCCTGCCATT TTTGAAGCCT TCATTGAAAA ACTAGACTAT CTCCTPCCAG TCTTGTCTTA CTACTTGTC GACTACGCTT CAAGCAACAG CAACTACAAC TTGACTGGTA TGTACTCAAG CGATCCTAAG AACCTCATCA ACGAAATCCA CAAACGTGGT CACACAGCCA AACTCGATCT CTr'rGAAGAT GCCGATGGCA CACCTCGAAC TAGCT'rTGGT AATCCAGAAA AACGAATCGC ATGGGAGCTA TCCTAGATGT TTG4GAACCAA ACTACTACCA
AGAATTTAAA
CGTTTATAAC
CTTTATG-GAT
GGTGGACGCT TGGGGACAAC CCACCATATG ACCAAACGGC TCCTAATTGA CTCTATCAAA TACCTAGTTG ATACCTACAA AGTGGATGGC TTCCGT'rTCG ATATGATGGG AGACCATGAC GCCGCTTCTA TCGAAGAAGC TTACAAGGCT GCACGCGCCC TCAATCCAAA CCTCATCATG CTTrGGTGAAG GTTCGAGAAC CTATGCCGGT GATGAAAACA TGCCTACTAA AGCTGCTGAC CAAGATTGGA TGAAACATAC CGATACTGTC GCTGTCTTTT CAGATOACAT CCGTAACAAC CTCAAATCTG GTTATCCAAA CGAAGGTCAA CCTGCCTTTA TCACAoqTGG CAAGCGTGAT GTCAACACCA TC=rAAAAA TCTCATTGCT CAACCAACTA ACTTTGAACC TGACAGCCCT GGAGATGTCA TCCAATACAT CGCAGCCCAT GATAACTTGA CCCTCTTTGA CATCATTGCC CAGTCTATCA AAAAAGACCC AAGCAAGGCT GAGAACTATG CTGAAATCCA CCGTCGTTTA CGAC~rGGAA ATCTCATGGT CTTGACAGCT CAAGGAACTC CATTTATCCA CTCCGGTCAG GAATATCGAC GTACTAAACA ATTCCGTGAC CCAGCCTACA AGACTCCAGT AGCAGACGAT AAGCTTCCAA ACAAATCTCA CTTGTTGCGT GATAAGGACG GCAACCCATT TGACTATCCT 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 610 TACTTCATCC ATGACTCTTA CGA='cAGT GATCCACTCA ACAACTTGA CTGGACTAAG GCTACAGATG GTAAAGCTTA TCCTGAAAAT GTCAAGAGCC GTCACTATAT GAAAGG'N'rG ATTGCCCTTC GTCAA'rcTAc AGATGCcrC cGAcTrAAGA GTCTTCAAGA TATCAAAGAC CGTGTCCACC TCATCACTGT CCCAGGCCAA AATGGT1GTGG AAAAAGAGGA GGCrACCAAA TCACTGCrCC AAACGGCGAT ATCTACGCAG TCTIrGTCAA AAAGCTCCCG AA2rrAATTT GGAACTGCC TTPGCACATC TAAGAAATC GCAGA'rGAAA ACCAAGCAGG ACCAGTCGGA ATTGCCAACC CGAAAGGACT GAAAAAGGCT 'rGAAATTGAA TGCCCTTACA GCTACTCTTC TTCGAGTCTC ACTAGCCATG AGTCAACTGC AGAAGAGAAA CCAGACTCAA CCCCTTCCAA CAAAATGAAG CTI'CTCACCC TGCACATCAA GACCCAGCTC CAGAAGCTAG TGTAGTGArr
TGCGGATGAA
GGAAC'N'rTc
TGAATGGACT
TCAAAATGGA
GCCTGAACAT
ACCTGATTCT
TACAGCTGAT
AGCGGTTCGA
TAAACAAGCT
ACTAAACCAG ATGCCAAAGT TCACAACCTC AACAACCAGC AACGAATCGG TAGAAAACTC GAACI-rCCAA A'rACAGGAAT CTTGCGCTCC TTGGTCTCGG CCTATAGAAA AATCCCCCAA TCTCCtAATA AACTTGATTA TACTTTGGGT GCAACTTGTG CATAGAACCA AGCGGTAGAT AATCAATTTT TCGGCCACCT AGCTGATGCG GAAAATA.AAC CTAGCCAAGC ACAAGAAGCA CAAGCATCAT CTGTAAAAGA TAGCAAGGAA AA'rATACC'rG CAACCCCAGA CAAAAACGAA AACAAACTCC TA'TTrGCAGG AATCAGCCTC TTTCTTACTA AAAAA'rAAAA AAGAGAACTA AACTAGCCCT 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 '1320 4380 4440 4500 4560 4620 4680 GCATTATAGC TCGGGGGATT GGA7'r'r'r'TA TTAAGCCTCT TTCCGAAGAG TTCAATAGCT GAAGCATGAA GCGGTCCAAT GATCTGGATT GCCAACAAAC AATTTTTGTA CAATATTTGT TTCATAGCAA AATAAGCTCG' CTCAGAACCT GGTCATGAGG CCTAAATCCT CTATCATGCG ATGGCGCCAT TTGGCCCTAC GGACGGTCTTr TGGAAATAGC GCCTGCTCAC CATCTTCCC TCAGCA'rGGC CCCTTCC'- CTGCTCCAAA TATTGCTCAT AACGCAATTC CTGCCAGTGC ATCCACCACT TGCTTAGTCG GATGCAAATA ATCT'rCACC AATCCACCCC CAAGAA'rGGG CTCCCACTTT CAAGTCTTTG CCAATCTCAC CATAAGCCTG AATCAACTr'r TTAAAATAAC GTGGATTACC ACCAATAATA GCATATACAA TCCGTAGACC AGCCTGAGCA AT CAC'rG TTGATTCGAC ATGACCACCT GTAGCTATCC ACAAGGGCAA T'rTGTCCTGA ACTGGACGAG ATCGTTTGAG TCAATCGACC TTGCCAGTC1' AACTTGGTCT AAGTCTAATT TCTCATCAAA AAGAGAGTCG TAGTCrTICA GATAAACTTC TTTACCAGCA 'rrTCATTGAC TAACTGAAGC AGTCATAACC AAACAGAGGG AAAGATTCCG 'rGAAAGAGCC CCTTCCAGCC ATAATCTCCC ATCGTCCA'N' TGACAAAGCA TCGATAGTGG CATACTGTTG GAACAAACGA ATCGGCTCCA TGCTTG.ACAG AATGCTGACT 611 GCACTCGTCA AACGGATTT C~rGTATTG ACTGCCCCAG CGGCCAGAAC AATCTCTGGG GCTGATACTG CAAAATCCGC CCGATGGTGC TCACCAATCC CA'rATACA'rC CAAACCAACC TTGTCAGCCA GCTCAATCTC TGCCACCAAC 'rGGCGAATGC GTTCAGCATG ACI'GTAAGTT TGTCr-AGTCC C~rC.AAGCTC CGTTATTTCC CCAAATGTTG AAATcCCCAA T'rCTACCATT GTGA'N'CTC TTATCTATCT CTGTACTTCA All~rGAAAAA TTA~rCrAAC ACGAA'rCTT1G
AGTACAAGCA ACCGATrGC TCA'rTAGAAA AAGCCTAGAT TTCTACCG?= ACTGACTrGG CAAGG~rACG 'rGG=rGTCC GGTTGCAAAG 'rAAGCGACTA AI-rGCcrrGG TACGACCATr ATCTACGGTC GTAAGGACGA TATCGTCGGT ATC'TTTGGCT GAGGACTTTG GCACCACGGG CTGCGACCTC 7TIGGATA= AACTG4GATCTr GACAAGAGAG CCAAAACAGG CGTTCCTTCT GTGCTTGAGTr TCTCCTGCAG CAAAGCCTrC ACACTGGATA GAGACTTGCT TCCATGGCTA CGTAGTAATC T'rGACCACGT AGTTGTTTCA AGAAGTTCAC GAACCTTGAC 'T-CAA'rGGT'r TTCGATAGAC TGAGCTACGA 'rTGACAArTC ATGAACCAGG ArrACCArr GCTrCTCCGA CTGC'rTTTGC AAGGAAGGCA ATAGGCTTTA GI'GATGCCA CGGCAATTTC AGGACCI'GCG GCTTCACGT GAGAGCGGTT AACCTGGAAC GrTGCTCACT TrCATTAGCC TT1GACCAAAA CTTGACGACT ATCCGC'rGTT GATGAAGAGT GGTTrCTTGC TGAGAAGTGG CATACCGTAG AAGTTrCAACT GGTGTA'rCTG TCAATTCTTC CAACATTTTC GTAAGATGTT CCAGCTGCAA GGA'rGTAGAT GCGGTCTGCG ATCTGGGTCIT ACGACAACIT GACCAGCCTC ATCTGTGTAG AACAGTTG.GT TGCTCGTCAA TTTCCTrGAG CATGTAGTAA ATCTGACAAG TCAAGTrCAG CAGTGTAGCT AGCACCCTCA 'GAACTCC ACACTATCAG CCTrGACGAT TACCAACTCT TTGGTTAGTT TCACGAATCA TAGCCATGGC GTCTGAGCAG AAGACCAATC AAAACTGGTG AT rATTTTT AGCTACGTAG GTCAACCAAG GCAAAGGCAT AAGAACCACG GATGATGTGA AACTAGACTr TTTTAGCTTA ACATCGAGGC CACGGTGGAG GAAATTG;GTG AGAGGTATGG ACATTCTCTT CTGCGATAGT CCACGAGTAT GATTCGCAAG TCAATCAAGG CAATGGTTCC TAAGAAATCI' CTTTGAGTTT CCGATGTAAA ALGGCGTTACG TCTTTCTCTC AAAGAGTTGA TCAAAGGCTT CGCTTTAGC AGGGCT-GCGA TTTGCGCTGT TGAAGGAGCA TGGTATAG-T GTTAAGCTTG GAATTCCCA'r 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420
TCACCAGA'N'
CCCCACTCAG
TTAGAAGCAA
TCTTGAACAG
GCTTGGATGA
GGGTAAG=C
GGCTGATAAA
ATGAGATTCC
ATCCTGCATG
CCT'rAATCAT
CTTTCCGCAT
CCTTACCGAT
CGACGATTTC CATCATAGTrC TGGTCATCGA TTTCCATGTA ACCATGTTAT AGCCrrCTCC A'rGACTTCAG GATCTI'CTGA AGCGCTTTTT TGAAGGCTTC 612 AAGAACTGAG AGCCCTTCTT CTTCCGCAAA TTTCCAATC ATCTGTCTGC CCCrGAAGT GGTGACCTGC CTCAATCACC CCATTATGCA CCAAGACAAA ATTGTCCTCA GrrGGI-1rTTC CGTGAGTAGC CTCAACACCA GCTGTCTTGG CAGACAATTC GTTATCAGCA CCATCTAGGA CAAAAATTCC TTCAAGCCCT 'XGAATCAAAA TATCAG'N'GC
AAGGTATTCT
ACGTTCCGTC
CCAACGAGTA
'rGCAATACGA
CGCAGAATCA
A'T=GTT'r AAATGAACGG CTArCAGT TCCTTGArr CAAGATACTT TCAGAGCGG'r GTGGGTGAGC TGTCCGATAC CAGTTGTTCC CCA.ACCGCCT TCACCAAATG TAGCCACGGT ATTCAAGCTT CCAACAACAC CAACAArrCC ACACATACTA TATACGACAC TTCATCTTrTT ATAGAATCAG AAAAATTGGT ATAGTTCAAA AATCAGTCAG AGTCCTTTT TATTTCTTC AGACGACGTr TTTATAGCTG TTGAGCCATG GTTGGCACCA TCTTCAATG ACGGTAATCA TCTTGAATCA AGGCAAGCTG TGCTTTCTCC CAAAAACAGT ATATACTTGT TTAAGCTCCT GTAAGC-ATAA 7ITAAAATTGG T'rCTTTCACT
AAACTCTGAC
AAAATCCATT ATTATCGCTT AATTCTTTGA CTTGCGTTTC CAAGTCTAAT TCGACCAAAC ACCAGCAGTC AATAAAGGTC CAAATCAAGT CACGGTGAAG TTCACGAAGA TGACCTTTTA
TATAGTCTAA
TGTCAAGAGT
CGATTGGGAT
ACCAGTGGCC
CATAGCGATT
AGCCCTACA
CAAAGTCAAT
S a 5 0 a a a a. S a S a a TTCCATCTrG ACGGAATTT-r TCTTCCCCT'r CAACACCCAT 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 ACCA7TCCA GTCAACATCC ACTCAATAT'r CCCATAATTT TCCTTGATAT TTTGGGCGAT GTCATAAATC CCTTGCTCAT AAA'rCTCCCA ACCACGGTGA GAATTGATTT TACGTCCAGG CATCACATAA GGCTCGTAAA AATGTTCTGG TAAGAGTGGA CTCTCTGGAT GCTTAGCAALA TCGAGGAGCC ATAACACGCA AAGGTTGATA ATCACGAATG AGTTCCKACT CTTCCTCTG'r GATTTCTACC AACTCCTGTG GATAAGTCCC CTGAAAAAGG GCCGCAATAC GAGCTGCCTT GTAGTTCACA CC.AAGGAAGT CCACCGTA'r' AGCATCAGGT AAAAGACCGT GTTCATGCAA CAAGACAGAT GGATCTAAGA AAGATITGGGC GACATCAGCA GGATGCTGGC TACG1'GGATA AGCCGGTGTC AAGTTGAGGA CTTAACAGCC cGGCTcccr ATCCACCT'rA TCTGGATAAT CTCGTTAAAG GTAATCCATT ATAGTCTTrCA TAGGCTGAGA GGCAAAAGG1' AAATCAAAAT AGCCTCAAAG ACCTrACGAT TGGAAAAATC CGTGACCAC'r CAATCCCAAT CTTGGAATCA GGCAAAAGTT CCAAT'rGTGT A'rGATAG~cT ACCTTAACAG GGGCATCATA AAAATAACCA AATTCTACAG GATCCACTAA ATCrCCATAA GTC'rCAAAAC CTGTCGCCTT ATTTTCCCAA CCATCACCAT GATAGAGATT GACTAACAGA CGAATTCCTT AAAAATCCAC ACCrrGAGTG TTGACTTTC GAATAGAAGT CCGAAAGGCT GTGTGACCAG
CATGGCAAC
CTGCCTCTGC
GAACGATGG
AAAAACGAGC
CCTC7TTGAAG
TAGCCTTAAT
CACAGCCT'rG
TCTCTAACAA
AAGCTCAATA TCCCGCTCCC AAT'rTCATA AAAAGTCGAT ATTATAGTAA CGATTTGGCT CCACTTGGAA CCAGTAATCC GTCACCAGCT ACACGTCCTT CTGTCTGCGG TCCAGAAGTA CTTTGGAA.AT CTTAGCATAC A?1'TACCTCr TTATCTACTC AAAACAAGGT AA.AAACTAGT TACA?TT CTTTTTT ~Gw" TCTGCTTA GGATTTCAAG CGTrCAAGC ACGTATCTG GTCTTATCTG AACCAATCCC CAGAGA'IrGT CTCCCTTACC GAGGATCCCC AGACAAAATC ATTrTCTCCCA TTATACAGAA CTTCTGATTA TAGrTTTAT CATGAACCTC AATGGTGTCA CCAGTTGCCT TGATCTTAAC TTCTACAATG CCATCGGCCG CT7"=TACC AACAGTGATA
CGGATTGCAA
TCATCTGTCA
GCTTGCGCT'r
TCTTTAGGGA
AAGAGGCGAG
GACCAATCAA GTCACTATCG CTAAATTTAA CACCGACACG AGACN'CATA ACCAGCTCCC ATCAAGC1TrG CTTCAAGTTT CTTCATCCTT GACA'TTCACA GTAATCAAAT GCACATCAAA AATTGATTCC CCAAGCGTAA CGGTATTCAC CNTTGGCGT CGTGT'rGCTC CATCACTGCT GAAAGAAGAC GGCTGACACC
TTCGT'TACGG
TTCTGTCAAG
TGGTGCCAAT
TNTGTTAACA
GATACCGTAA
CATCCCATGA TGATTGGCAC AGCACGACCA TTTTCATCCA AGACATCTGC TCCCATGCT GCTGAATAGC GAGTTCCGAG TTTGAAAATA TGACCGATCT CAATACCACG CGCAAAGTTA AGGACACCTT GTCCATCTGG GGAAALTTTCA CCCTCACGAA CTTCACGGAT ATCCACATAT TCTGCAGTAA AATCACGGCC TGGGTTCACA CCAG'rCAAGT GGTAGTCATC TTCGTTACGCA 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 CCGACAACTG CATTGCGAAC ATCTTGTACC GGCAAACCAA CTGGTCCAAG TGAACCAAAT TCGCTACCAA CGTCAAAGAA ATCTGCTCCC TGGTCATTTC CAACTAGAAG GGCTGCAACA T'rACGATCTC
CCTGCTTGAA
AAGTGATTTT
AGCTCACCAT
TTAATCGTT'r GTTCTTCTGG AACATTGAGG AAGGCTGCPA TCTCGCGTT CAACACGAGT.AACTTCTTCT TCAGCGACAA CAATAATTTT AATATTCTCT CAACATTrCGC CACTTCTTCT TCAACTTGAC 7TCGTTGAGT CTGCAATCTA GAAGAGGGTT CTTCATCAAT TGATTTAACA CACGGTTGCT TGGTGTAC CACTTGAGTA AGCAATGGTA TTTCTTCrG CACT'rCTGCA CCCAGCGGTC AAGGTCTGTA CACCCATGGC TCCACCGTCA TACGCTCATA GGCTGcri G TCGTTTGTTG CCATI'CTAA GTTAGCTGCA TCTTCACCAC AGACTATCCA 'r'I-GAGCAAT GGAATTTCGT CAAATGAGGC AACTGACTTG CGAGCAGATG TAATGGCCAT AAATTCTTGG CCAATAATAG CC=~AAGTC TAAACCACTA
TAGCTAGACT
TCTGCCTTGA
TCCAAGACAA
CTATCCTTAC
CGAGTGAAAA
TACTCATCAT AAACACTATC CAAACTATCA TAGTTAGCGT GGAAACTATA AGCATCCTTC ATGATAAACT CACGTGTACG AAGAAGTCCA TTACGCGGGC GTTTrCATC ACGATACTTG GGCI'GAA'rrT GATAAAGGTT GAGTGGCAAT ATAGCTGTAA AGGTTTCTTC GTGAGTTGGA 614
TGCTTGTAAG
CCTAAGATAA
ATTTAACAGA ATCACCGACA AGTCTGATr TT'CACGGTr'r TTTAGT"GT AAAGGTCTTC ACCATAGGTT 'XCGTAACGAC CTGATTCACG
GCACTAAGAA
A'rGATGT'I'r'
GCTGAAACTT
TCGCTTGGCA
GATTATCTAA
ATGATGACCA
GGGCTGGAGC
TAGCTTTTTC
GGCGAACATA
TTTCGCGAAG
AAAAGAGTCG
CTCCGGCCAA
CAACATCTCA ACAGCACCAA TCTWTTCGAA AATCACACGG TTGGCAAGTG GTAGATAAGA ACCAGCACGC AACATAAGAG CATGGCTGAT CGTTGGATA GGCA'rrTAC 7TGTTTCAT CATAATGTCA TTCCAAGTCA CAGCAATCAT GGTGACATAG GTITCAAT'Ir CTTTCAA GAGCACAATC TTACCACCAT CCAAGGCTGG GATGGAAATC ATTGCCAAGA AGTACAAGAT
CCACAATTCT
TTCTTrGGCGC
ATAA.ACACCT
AACTTGAGCA
AATATTCCTC
CAAGACAACC
TGGTT'rGCGG
AATCGGAATA
ATTTTCAAT-r CGGATGGCT-r CTAGGA'rATT AGATAAAAA TCCCAATATT CCATTTTTAG CAGCATCACT AAATCTGGTr GGAAAATCAG GCAGTTGTAA AACCACCTAC CCTAGAAGG'? AACGACCTTG TTTTCAGAAA TAGTCACATC ACTTGCCrrA AAGATAGCAA CAGGTCCACC CAACTTGTTC ATTTrTTCAGA GCTGAGAGAA TTCGGAGAGC.TGAGTCAGCA
AAACATGGAT
ACTATCTTTG
CAAAG1'CGCT GCTTGGATCA AGC7TT'CCCA GTTGCTAACC ATTTCTGGTA CTCCTACCTT GGCCAAGGCA TCAACA'rCTC TCACACCACC CTGCATAAAG ATAAAATTGT TCATAGGACC TGCAAAATTG TGATATTGAA CATCTAAACG TGCAATCCGA GCATCGTGAT CCACTGCAAA TGTr'IrTTCT TTGTCTTCAA AATCAAACTG GGTCACCTC CCTGAGAGAT TGATGCGT= AACCTTrACCA CCTGTCTTGA TT'rCAGTTGT ATCATCACCC
AGAAAATCTG
GGTGTAACAG
GCCG'rCTTA'r
TCATGTGAGC
CCTTGGGGCA
ATTAAAACCC
GTAATCAGTI'
ACCTCAG'rAC
TCTTCCAGAA
ATAGGGAGGG
ACI'TAACCCC CGGTTGAACA TCACTTrGT7-r GTCACTCCCC CTT'rGGCrC TGTTTCCACA CAATCTTGGT'AATTTG1 CCC TGATATGGAA CTGATTGGTA AAAAAACAAC GACACCTAAG 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 .1400 11460 11520 11580 11640 11700 11760
TGCCCCAGAT
CATCTGCTTC
CCAATCCT
CTGTTTGATC
AGTCGCATTT
CACAACCGTT
GATAAAGAGC
CAAT7'r? A TCATCAGCAA GTGTCAAAC1' AACAGGCGTT CAACCGGCCA TGCGGACATA GCCACCCAGA CGCAAGA'rrC GAATG-GTATA GGCCGTTCCA TCCTTGCCA.A TGTGAGCAALA AATTTTAGGT CCCATACCGA TGGCAAATTC ACGTACTAAA ATCCCTGATI' TCTTGGCAAA GTAGAAGTGA CCGAACTCGT GCACCACTAC AATAATCCCG AAAACCAGAA TAAAGGTTAA AATTCCGAGC ATAGCG7??TC CTCCGTCTTT TGATTAAAAG AGTCCAAATA AGTGCATGAT TGCAAATACA AGCAACATAC TATCGAAACG ATCCAAAACA CCACCATGTC CAGGGATAAA TTTCCCAGAA 615 TCCTTAACAC CAAAATGACG T1rGATCGAA CTTTCTAGTA AATCACCAAA TTGTCCAGCA ATGCTAAAGA AAATAGCAAA GACTGACATC TTGTAhAArC CATATGGAAG AGCAACTGTA CTGTCAACTA 'rCATAAGGAT AATGACT AAAA'PTGCTC CTAAAATACC ACCCAAGGCA CCCTCAAGGG =TrTAGG CGATACCCTT GGTGCTAACT TTCG=rCCC ATAGTTCATC CCAACAAGAT AGGCACCACT GTCTGTCGCC CAGACGATAC ACAAGGCTAA GAGACCTT~G TCCAAACCTG CAACACGAGC ATCTAGTAAA GCATTAAATC CAAAGCCCAC GTAGAAGCTC ATACCAACAG GGAAAACCGC ATCCTCAATC GTATAAGACT TGCTAAAAAC GCTCGTTCCT AACATGATTG AAATCAAAAC ACTATAGGCA ACCACATTCC CATCAACTGG CAAAAAAGTC AGGTAATTCT CCAAGGGAA'r GGTCAATGCA AAGGTTGCAA AGAGGGTCAA GAGGCCCTCC ATCGTCATCG TCTCTAGACC TCTCATCT'rC AAAAGTTCAT GCA'rCCCTAG CATGGCTATG ATTCCGATTG CTATCTGAAC CAAGAGGCCC CCAA'rCATTA AAATTGGTAG GAAAATAGCC AGGGCAATCC CTCCAAACAA GGTTCTTTTC TGTAAATCCT GGGTCArATT TCCTCCTAAA CTCCTCCAAA TCGGCGA'rGA CGACGATTAT AGCCAAGAAT AGCTTCCTGC AAGGCCGCT CGTCAAAATC AGGCCATAAG GTGTCCGTAA AATAAAGCTC ACTATAGGCT CCCTGCCATG GAAGGAAATT GCTCAAACGT AATTCTCCAC TAGCTACGGAT AATCAAGTCT GGGTCTCGTA AGTCCTTAGG CAAATGCTGA GTAAAGAT AGTTACCAAT CAATTCCTCT GTCATGTCAC CTGGGTTGAT 'rTTGGCATC'r AAAACATCCT GGGAAATCAA CTTAAGCGCC TGTGTrAATCT CAGCACGTCC ACCATAGTTA AGAGCAAAAT TAAGAATCAA TCCTGTGTTG TTC~rAGTCA ATTrCCTC-AGC C'rTGGTTAAA GCTTCAAAGG TT'rGCTTAGG CAGGCGGTCT GTCTCCCCAA TCA'rTGAAT CT'rAACATTA TTCGCATGTA GT'rCCGGGAC ATAATTATCA TAAAACTCTA CTGGCAAGTT CATGATAAAC TrTGACT'rCCT GATCI'GGACG GGTCCAGTTT TCCGTAGAAA AAGCATAGAC CGTAATAACC TTGACGCCCA GTTTGT'rGGC TGCCTTGGTC ACGGT'rTGCA ATGCTTCCAT GCCCGCCTTA TGTCCAAAAA CTCGCGGTTC CATACGTN' T'rAGCCCAAC GGCCATTCC ATCCATGA'rG ATGCCGATAT GAGCAGGAAC 
CTGTGTCGGA ACCTCTACTT CCACAGCCTT ATC~wrTCTTA AAAAATCCAA ACATGATCTT ATTCCTATTC AAAAATCTAT CGTTTCATTA TACCATATT'r CCCCATrrC TTCATCACT AAGCTATTTA TTCTCAGGCA CCAACCCAT TTTTCAAAAA AATAAGCCGC CTGATTGGGC GACTTrATTT TTATAGGGAG ATTATTATGA AAAAGTTTTA GGAGT'rTAAG TTAAGGTC~r CTTAACTTAT GA.ACTTAGTG TACACTCCCT AGCT'rAAAGT TTCCTTAAGT A7M=TAAAA ATCAAATTTT TCCATrCTC 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 CTGCCAATrr TTCTTrGGATA GAAA'rGAGGA GTrGGACGAA GAGTTCAAAA TCTAATCTT CTTACTCATA GCATCACGAG ATAGACCTGA TTTTTCAATT CAGTAATAAG GTTTCCTCAT 'rTCATGATCA AAAGGGGAITT TGCTGCCAAC CACGAGTCTT A'rCTGGACTr TCTCTAAAAG 616 AACGrG1-1-r ATAGAGTTCC ATTCGG=C' CTTGAAAATT CAAAATATCC TCCAAACCAT CA71-CAAGCG CAGTCCAACT GCCGTACACC AACTGGTATA GGAAGCAGTG TGAGGACTGT CCCACTGGTA CTGAGCAAAA TCCTCTCTCA AAGAAAAATC TGGATCAAAG AAAATCACAT 'rGTCTGACAA AAGATTCCAG ATGAAATTTC TCAAACCAAG GTCTCGGATG ATCGTCAGAA CCTCTAAGAT TTCTTGCTTA T7=TCACTG CATrTrCTAA 13560 AACGTACATA 13620 GTTCTGGATA 13680 GCIGA'GCAT 13740 CCTT1r=CTC 13800 CTATATCTCr 13860 TGACACAACC 13920 TGGCCATCAT 13980 TATTCATAAC 14040 ATGATAAAAC 14100 TCGGCGATAT 14160 CTAAGTGCrC ATATGCCTTA GCAGTCGCCA CCCGTCCAGA CCGTGTCCGC CTTTTGAAT CAACTAAGGC TCATACATGT TCACAGAAAG AGTTCCTAGA CCAACAGGTC GGATTTTG ATCCACATAG TCCAAACCI' TATCGGTAAT AACATCATCG ATAACCCCAT GCTTGAGGAG ACGATTrGGCA ATACGAGGGG CTGCCTCATG GGTGATTCC ATCTCAAAAA AGTCAGCATG AGCATAATAC TCCATATGAC TTGAGAGCAT ACCAGCCCGA GTCGTCGCAC GAACACTGCG ACTGCCTTCA CCAGCCCCAA CACTATAAAG CACTTCTTCC ACTGACATGG CATCTCCAGG CTCTAAATCA 7rCAAAATCG GACCAGACGT TTGCTTGAGA TTGACTCCCA CTCCACTGTA CATCTCAATC ATGG-GCGAA
CATGGTCAAC
TCCCCATTAT
7rCCAcCACT
TATCTCCCGT
CTGTAATCCC
CAATCAAGGT
TCATAATATC
CTTrCAACTGT CTCACGCTCT ATCCAGCATA GTCAAAGCCT CTGGGCAAAA TCCCCACGC ACGTAGGGCC AACTCAGATG CCGCTCGACA ATTTCTGTCA AAAACGTGCC CGTAGTGGAT AAAAG4GAGGC AACTCCAAAT GATGTAGAAG TCCTCCATGG ?rTTTCCCAAG
TAGCGGCTTC
GTAAATACTG
CTAAAATTCT
CCTGAT'rGGG
CACCGCCTAC
GGACCAATTG
TAGTGTCAAA
CCCTGGAGGG
GATAAAGA'rC
AGGACGGAGC
ACTCATGGCT
TGACTCCTAA
GAAAGGAGGT
TGGTCGGTGT
AAAGACCTCA
CCAAATAAGA
TGAAGTTGAT
GTGCGTTCTA
CTATI'ATATC
G'rAAGCGA'rG AATCTCGTCA ATAAAGAGGA CTACCAAATC ACCCGCTT TCGATAACAG GTTCATTGGC AATGACAAAA GCCATGGT'rG CCACATGATC CAGCGCTTCA TCCCGCAT'TT CCTTAACCTT ATCCTGACCA ATATATTCAC CTAACTCCTC ATCACCCATC ATCTCATT'AT AAAAAAAACA AGCCACAAAC AAAAAAGCCA 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 GTN'fAXCACT TATGTGGTAT AATAT'rATAC GAGATAGCCC ATGATGGAAT TAGTACTCAA CGTTCTTCGT ATAGTCGATA AA'rGGCTAAA AGCTTATTTG GTCGTGAGCT TGGGGTCTIT GGCAC7'rCTA AACTA'rTATC
CAAGGACAAA
TCTAGCCTAT
617 GATATAGAAC TAGTACrCAA TTCCTTPTA TTATCCCATA GTTCACGAAT CTTTACATr'r TCTTCAACCG CTGTACGACA AGACG GTTIAA GAT'rAAGAGA TTCTATCAA'r TTCATAGAAA TI'TTGAT 'TC GTAAACGAAG AGACAATC'TT TCTCATTTAA TACGCCACTA CTAGACAAGC AAAATCATTA TTACAGTAGT CAATTAACAG TCACTTACAA TCAAATTGAG TTTGAACTAG CTCAAGCGAC TrTGCAAAA
ACCTTAGGGA
ACATGTCACT
TCCAGTCCrT
CACAGACCTA
TT'rCTTAGTC ATATTCGCTA AAAAAATCCC CGCCAAAATC TCAAAAAGTC CCCGCCAA??T CCCCGACCAA AATCCGAAAA ATACCGAAAA ATATCGAAAA ATTA'rTTTTA GAATAGTCCC AAAAATCCTG AAATAGAGCT AAAAAACTCC ACCTGATTCG GTGGAGTTAA GGGAGATTAT TATGAAAAAG AAAAGTTTAG GATTTTATTA AATAAAGTTA GGAGGTCT'rT ATTTAATAAC TACATGATAC AAGACGAAAC TTAAAACTAG C1-rAACTTTT CTAAAATTTT ACTATTI-rGc AAAAAATTTC TATCACCAGC ACCTCACCAA TCGAGTAGGG GATA.ATCTCr AGCCCCTCTC ACACCACCGT ACGTGCCGTT TGGCATACGG CGGTTCAACT AAC'TTrAAC GCATGTCGTT CAAGGTAATA ATCCAAACAC GAAACCAGTC CACGrTTC CAGGACTGGT TTTGATATAG CACGr'rTAAG TACCGACTTC TGAGCTACTA ATTGATAA'rG GTCGCCCCAG CCAGATACCT TATCTGCTAT CCATT TAGGA ACTCCTAACT TAAGCAATCC CCATAATCGT CTCGATTrCT TCTTCCATrG CTTCCAGATA ATCACTCGTA GGCGAGTACG CAAGCGCTCA TCTATGCrGG CGACTATACT TTTCATATTT CCCAATGAGC AATAGTrTAT CCATCCTCGA ATAGACAAAr *rCAGTTGC'rC AATACGTCTr GTTAGG'rCTA 'rACTCCATTT CCTCTCTGTT AGTTCTTCA ATTrAAACTT AAATCTCCGA ACACTATCTT GA'rGTGGACG CCTTTTCCAA CCATCTGATA ATTrCCAGAA CCCAAAACCT AGATATTTCA ACTCTCT'rGG TCATGTTTAC TTTCAAACCT AGCCGTTTCT CAATAAACGA CTGACTGAAT ACATC INFORMATION FOR SEQ ID NO: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 8136 base pairs TYPE: nucleic acid STRANDEDNESS: double TrOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: CCAGAGCCT'r GCGTCCGAAA GTCTATCCAG ACACGOCTCT TTAA-AAACAA AAGGAGAAAT GATGCATACT TATTGCAAA AGAAAA'rGA AAATATCAAA ACAACCCTAG GTGAAATGTC 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16535 618 AGGTGGTTAC CGTCG1'ATCG TTGCGGCTAT GGCTGATr'rA GGATICAG GAACTATGAA GGCTATCTGG GATGACCTCT TTGCCCA'rCG TAGT'TrGCC CAGTGGATTT AX'TGCTGOI' ?r'rAGGAAGT TTTCCTCTCT GGCTGGAG'TT GGTTTACGAA CATCGTATTG TTGACTGGAT TGGGATGATT TGTAGCTrCA CACGGATTA'r CTGTGTAATC TTTGTATCGG AAGGTCCAGC AAGTAATTAT CTnTGGCT TGATTAACTC TG=ATTTAC C7"TATTIGG CCCTrACAGAA AGGC'=AT GGTGAGGTGC TGACGACACT ?1ACTTCACA GTCATGCAGC CAATTGGACT TCTAGTTTGG A'TTrATCAGG CACAGTTTAA GAAGGAAAAG CAGGAGTrTG TCGCCCGTAA ACTGGACGGC AAGGGCTGGA CAAAGTATCT TTCCATTAGT 
GTGCTTTGGT GGTTGGCCTT TGGCTTCATT TATCAGTCTA TTGGTGCCAA TCGTCCCTAT CGTGATTCAA TCACAGATGC AACCAA'rGGG GTACGCAAA TCCTCATGAC AGCI'GTTTAC CGTGAACACT GGATATTCTG GGCGGCTACC AATGTCTTT-T CAATCTATCT GAAATATCTA ATTTATCTCA TTAACAGTCT CTGGTGGGGA GAAAGCCTGC AGTTGTTGG TATCAATGGA TAAGCAGAAT ACTGATTTAC TCGATTAAAA CAGATATAGT GGTCC1'CTTT TGTT=GTAA GCTT'rACTCT ATATTCAATT TGGCCTGA AATCTTGACC ATGACCAAGT GTTGGAGGTA GCTAGCACGG AATAGACATG TTT'rTGGAAA ATAGCAGACT TGGGCGAGGT AGTCCTCcGGT TAGCGGCAGA GCTGGTCCAG CGTCCCACAT CTGCGAGATC TTCAAAAC'N' GTATAAAAC TTAACTAGGA AAAGATGTTT GAAAGTGCTG TGATAATCAA GGATTTATAG TATGAUAAAAG
AAATTCAAGG
GCAAGGCAGC
TTTTGAGATT'
AGGATCGCCG
GACAACTGAA
GACAACTTCC
AAAGTTGCAC
AAGATAAAAA ACTCAGTAAC TTTAGGAATG AGA.AGGTCTA TTTTTTGAGC CACCACCTCA
CTAGAAATAA
GATAAAATTG
ATGTCTCGAT
180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 AGAAGTAGGC AGATTAGGGT GGGCTTCTTT TAAATTATCA GTGT'rCTAGC TCTr'rATCA GTTGACGGAG GAAGTAGTCA GGTGATATGG TCTTGGTT'rT TATGAAAATG GAGAAAGAGG TGAAATGGCT TGCTCTCTTT CAAAA.AGATG ATGGAAGAGG AAGAAGCTCC T'rACTCTCAT AGTGACAGTA AAAGGTGGA'r AATGATATCC TGAACAGTAG TGGCCTCGTA GCCCTTACA TTGATAGATG GCrrTTTTGG TTITGCTGAT ACGGCGGTCA TAAGGCAAAT TGTTCAGAAC TGAATAAAGC TGACGTTTTG TTTAGTGGAT AATGATAATG AACAAGGTCT TCATAAATCT AATATGAAGG CAAAATATGC TGTTTGCGTG GCTTTTTCT ATG7TTAGTCA
CTTCTATCCT
ATTATAACAA
TATGGACACT
TTCTTTGAGT
AGGAATGAGA
TAAATTTGAC TTATGCCATT' GTTGAGTTTA TCTCTGA CTCTGTGCAT GACrTTGGAG TAGAAACAAT CTCCAATCGT GAAGAAGACA
TTGCAGGTGG
ATGCGAT'rGC
ATCAGTACAC
AGTATTTGGT TCTAGCGCTG AATTGGAATA TCAGCTTTTC CNTGGGCTAT AAGCGGTT'rA 619 GCCTGCTAGG AGCCTTGGTA ACAGCTGTGA TTCTCGTAAC GCCCTCTGTr CTAG'rCATTT TGGAAAATGT CACGAAGATT TTGCATCCGC AACCAGTCAA TGATGAGGGG ATTCTCTGGT TAGGAATTAT TGCGATTACT ATCAP'rCTGT TAGCGAGTCT GGTKGTTrGG? AAGGGAANAGA CAAAGAATGA GTCTATTCTG AGTCTGCATT TTCTGGAAGA TACGCTAGGG TG-GGTAGCTG ?TATCCTGAT GGCGATI'GTT CTTCGATTTA CGGACTGCGTA TA'rCCAGAT CCTCTTTTGT CCCTwrGTCAT rrcC~rT ATTCTTTCAA AAGCCCTTCC ACGTTrrTGG TCTACACTCA AGA'rTrTCTT GGATGCTGTG CC-AGAAGGTC TTGATATCAA GCAAGTAAAG AGTGGCCTGG AGCGA'rTGGA CAA'rG=GGCC AGCCTTAATC AAAAAAATGC CATTGTCCAT GTT'rCTCTAA AGTCTATTCG AA?1rCCTA AAAGATrG'rG CTGACCTAGA AACTCACCAA ACCCATAAGC AGCTTAATCT CTGGACTATG GATGCI'TTGG AAGAAATGGA ACATATGGA.A ACTTGTAAAG G~r3'CAAAA 'rATTACCATT GAAATTGATG GAAAGGTCTG TGACTTGGAA CGGAG=rATG AGCATCAACA TrrAGAAAAAA ATTTCTr'rAT 'rATTAAATA GATATATGAT TGTTAATGAT CAACTCAAAG T'rATATAATA
TGAGAGGAGG*TTAGCGTGTG
ATACGATTCG GTA'rTA'rGAA GGATTCGTGA 7=TCAAGAT CGGCGGGTGT CTCTGTAGAT AAACGAGAGA GGAGAGCCC1- TGTCTCAGCT ACAGACAGCT GAAAATTTTA AATGAAATCA
GTGAAAAATA
TTTCAAAAAT
AAAAA~rrr
AGATAAGTGA
AATATTAAAT
CGGGTTGGTC
CAGGATATCG
AGTTTAGTYG
CG'A'TTTrAG
TTAAATCGTT
GCAGTATATA
CTTGGGTACT
T=GAAGAGA
CGATTAGATA
GTTAGAATAG
ATCTTArTTTG GAATAGAGTA AGAGCATTGT ATAAACTCCA CAAAATGCTT GACTTGGAGT CGTGA6ATTCA GTGAATGAA.A 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 CTGCCAGTGA 'T-TGTTGGGA TTGTGCCACC GATTACTCGT AAGCGCTGGA ATTTATTAAG ACTATAPGTC GCTCTACCAA AAGAGGAAAA GCAAAAATTA TAAATCTCAA AATTAAACTT CAAAGGCAGG TCAGGTTGGA
ATT+CAG;CGG
ACTGCTAC'rG
TCTTTTCGTT
AAGGGAGATG
GAGGAGCGCT
TATAAGGAAG
CTTGCTAGCA
TTC.AACC'rCC GCAAATAATA GAAGCGGATG AT=TATTAT TCGTGTGGTT CGTGCGTGCG ?TTGTGGTTC AGATTTATGG AGGTACCGTA ATCCAGAAAC GAAAGCTGGA CACAAAAATA GTMGACACGA AGCGATTGGG ArrGTTGAAG AAGCTGGGGA AGCCATTACG ACGGTGAAAC CAGGTGATrr TGTGATTGTC CC'1-x-TACAC ATGGATGTGG TGAGTGTGAT GCCTGTCTTG CTGGATTTGA CGGTTCTTGC GACAATCATA TrrGGCA.ATAA TTGGGGGGT GAVMCAGG CAGAATATAT TCGCCCAC TATGCAAACT GGGCGCTGGT TAAAATCCCT GGTCAACCTT CTGACTATAC AGAAGGGATG CTCAAGTCCC 1TTGACTCT TCCAGATGTC ATGCCGACAG 620 GCrATCATGC GGCGCG'rGTT GCAAATGTTC AAAAAGGA CAAGGTTGTT GTTATCGGTG 3720 ATGCCGCTGT TGGTCAATGT GCTGTCATCG TCCTTATGAG CCGI'CA'rGAA GACCGTCAAA TTGTTGCAGA ACGTGGTCAA GAAGGAATTA CAGATGCAGC AC7*rGAA'rGT GTTGGTACGG TTCATAA'rGG AGGGCGTATG GGCTTTGTAG GTTCGACATT 'rATGCAAAA'r ATCTCTGTAG ATAAGCAATT rTTTACTAAAA GCCGTCCT'rG CTTCAAGTTA TAAACTGGAA GATATCGACC CGGCTAAGAT GCGTGGAGCA TCACAAATTA ACATGGCTAT GGAGTCAGGT GCGACAgcTrG CCAAGGTGCG TGAAATCCTC GGTGGAGGAG AGGCTGCTAT AGAACAGGCG CTAGGTGTTC GAGTCCCACA CTATAATAAT CGTGCTCTTG CAGGTGGGGC AGC -rCTGCT ACAACATACG ATGGTGATAT CAATCCAGGT CGCGTCTrrA AAGCCTATAA AGATATGGAT GAACGTAAGA
CAAT'rAACTC
TTTTTTATGT
AGCATGGTCA
cc'rGcTCGGc 7TrGGTCAGAC GCT'rCTrGGA GGCGGAAGTr
GCAGTAAATC
TT'rTTGAAGT TATGAT'rGTA ATCGAATAAA AAACGAATAG GAGTTTAGA TATCCTATTC TTGATTTAGG GTACTr'rCTC TTAATGTCAG GGCTAGGGAT TTTCCGACCG 'rGGAGGACTT CCTTGTrAAG CCATTTCTTC ACTATAAACT GTAATACTAG AGAGGGGAGG TAGTGTCGTT AAAGGAAATG AGGCTGACGC GATCTGGCAG GGGCACGGAG GGCACCGATA GCTAAACTAT CGCTGGC'rGC GGTCTCCCAA GCTCTGAATG GCCTCCT'rCA TTAAGTCATA TTCCTITGAAA GACCAGTTCA TCATGATAGA TTCCCCTCGC TTTCTAGACG CTTGTCCrGA ATGATTrCTT CTTGGTCTGT
ACTCTATT'CG
TCTGGTTCCC
AATATCCATA
ATAGACCTGT
GCTGATTCCA
GAAAAATCCT
GCCAGACTGG
TTGACTATAG
TCTTTCTTCA
3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 AGGCCTGTTA GAATCCCGAT TTCATAGCAG TGTAAAAATC TCTAGAAATA CAAGAGGCTT TTTCCGATGC AGAGAATCCC TAGCGCAAGA TATCATAGTC TAGTAGTA GA GGTCGTCCAG TGCT'rGGGTT TGTGGGACTC
ACGGTCCATT
CGTGATAATA
TTGGTATTCT
AATCACT'rCC CAACTCT'rGG CTCCCCTTG'r
GCCTGTCTG
CCTTGACTGA
CAGGTATGTC
TCAAAGGCAG
TCGCTTrAGGG GCTCTTrTT GGAAATAATC GACAACCTGT CCAGGGAA.AG TGTATCGCTG AAALTCTGAGC TCGACTAAAC TAAAAGGGTG GTCATTAAAA CTATTCCTAG GCGAATCTGG TCGCTGACCC ATTGGATAAT GGCAATCr AGGTGC7 GG TGTAGCCCAG CTCTTCAGCA GTAACAGATA GGCTCTGGTC GCGGTI'GACG GCTAGCTGTG CAATGTCTrT TAAGGTAGCC .ACGGTTAAAA TACGGTGTCT GGTI-rCTrT ACGCGGGATA CGGTCGCGAT AGAGACAGAG ATAAATCCTC CTTGATTAGG TTAGI'A'ATC ATGTTTTTCT TCTTTTTACT TAAAAT=TA GTAAAAAGGA TTGACCTTGG AAAATTCCT1 GGATATAATA GATTACACGT TAAGATGGCT 'rAACGGACAG 'rCAAAGGAGA ATTCATATGG
GATATTTTAC
GAAAGAAAAC
CACAACATCT
TACTACTGAA GCCCTI'CGCA AAGACTTrTCT TGCTGT7=N CCTCAAGAAG CAGATCAAAC CTTCTTTTCA CCAGGCCGCA TTAATTTGAT TGGTGAACAC ACACACTACA ACGGTGGGCA CGTN'rrrCCT GCTGCTATr'r CCTTGGGAAC TTACGGTGCA GCTCGTAAGC GTGACGACCA AG~rCl*GCT TTCTACTCAG CTAAC7*rTGA GGACAACGGC ATTATCGAAG TGCCTCTCGC TGACCTCAAG TTTGAAAAAG AGCACAACTG GACCAATTAT CCAAAAGGTG TCC-rTCATTT CI'GCAAGAA GCTGGGCACG TGATTGACAA AGGT7TTGAT T=rATGTTT ATGGAAATAT TCCAAATGGT GCTGGCTTGT CTTCTTCTGC ATCCTrGGAA CTCTTGACAG GAGTCGTGC TGAGCATCTC 'TrTGATTTAA AATTAGACG TCTCGA'rrTG G?1'AAAATCG GCAAACAAAC AGAAAACAAC TTTATCGGAG TAAACTCTGG CATTATGGAC CAGT'NTGCTA TTGGTATGGG GGCAGACCAA CGTrCCTA'rTT ACCTAGATAC TAATACTA GA.ATACGACT TGGTGCCACT 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 9 9 09 9
TGATTTGAAG GACAATGTCG TTGTATCAT CTCTA.AATAC AATGAACGTC GTGCTGAGTG CTTGGATATT CAGACTCTGG GTGAA'rrGGA GATTAAAGAT GAAAATCGTT TGAAACGTGC GAACACCAAC AAACGCCGTG AATTGCCGGA TGAAAAAGCA GTGGAAGAAT TGCAAGTTTC CGAGTGGGCC GTTGACCAAT A'rAGCTATCT TCGCCATGCT GTGCTTGAAA ACCA.ACGTAC AGATTTGGAA ACAT7rGGAC GCTTGATGAA CCTCAAAGCT CAAGTAGCAC TGCGTCACAC GTTTCTCTGG TGTTCACACA GCTTGGGCAC TGGTGGCTGT GCcATTGCCT AGGCAAACAC TACGAGGAAG AGGTGGCACT CGCGTCCTTG ATT'rGTAACA CATGTCATTT CAATCGTGTT TTGGCACGAG ATTGATTGAC CTCAAGGACC TAGTCAGACT GCGCGTGAAA AAGTCAGG;TC AATCGTGATT
TCCAAGCAGG
AG.CATGATTA TGAAGTAACT AAGAAGGAGT TCTCGGTGCT TGGTTCAAAA AGATACTGTr TAGTTGGATA CGCTCCAAGC ACTAGTCA.AA AGGAGGCT CT CTGAAAGCTC ATTTGAGGAA TGGGAGAAGG TGTTTGGAA AGCTGGTTGA AGAAGCCGTT TCCTTGGTGC TGAACTGATG rTrrGGGCAAC CTACGCCCAC GG'TTTGGAAT TGGATrACCCT CGTATGACAG GGGCTGGTTT GAGGCCTTTA AGGAACCTGT
TTCTATATCG
ATAGTGACCT
ATGGATCGAA
GTTGAGACCA
CGATTAGAGA
GAT'rTGGTGA
TCTCCAGAAC
CTGAAG'rTGC 6600 TAGTAAATAA 6660 TCTATCTGAC 6720 ATCTGGATAA 6780 CGATTGAGGA 6840 CTCCTTGTCC 6900 AAGCGATAGA 6960 TTGCTAGAAA 7020 ATCTCTCTAA 7080 GGAT'TTTAC CAACTCAGTC AGAAAAATGA CTACATCAPLA CTCAAGCCCA TATCGCTTAT CGTGTTCCAT CTGAC'rACGG AGAACTTGAA A'rTACCATCA GCCTGAAAAA GATCCCAAAG AGATTGTGGC AGCCAAGTTG GTGCAAGCTA GTAATTArCC TCAGTGTCAG CTTTGTCTAG AGAATGAGGG CTACCA'rGGT CGAGTTAACC ACCCAGCTCG 7140 7200 TAGCAATCAC CGTATTATCC GTTTTGAAAT GCCC!TATGCT TACPTTTAATG AGCATTGTAT CATTAGTCGT CAGAGITTG AACGTCTGTr TGCTGGATCT AATGCCGACC TGCCGA'N'GT TCAGGGAGGC CGTCACGTAT T'rCCTATGGA TGCTGGTTTT GAGCAGGTCA AGGCTGGAAT GACTTCGGAT TCCAAAGAGG ATTTGATCAA CCAGTATTCA GATCCTGCAG 'PGCAGATTTT TATCACACCC ATrGCCCGCA AACGCGATGG CAATCAGACT TCAGCAGAGT ATCCTGATGG TATCAAGAAC GAAAATATCG GCTTGATTGA TCTGAAAGAA GAAGTGGAGC AAGTCGCTAG CGATTATCAT CAGGAGTGGG CAGACCAACT GAAAAAGCCC TTGCAATCGT CAAGGACTCT GATGCAGGAG TCTACAAGCA GACAGAACAA CAGGTCGGAA TTTTACTAGA CTAGGAGCTT 622 GGTrGGTCAG GAATGGGGTT TCCAGTATTC CTT TTTAGAT GGCCAGCATC GTCCCATGGC GGCTATCGTA GACCAGTTTC CAGGATATTT GGGGGGCTCT ATTCTAACTC ATGATCATrA ATI'GGCTCCC TTGCAAAACG CCTrCCGA'TT TGTCAAGTGG CCCA'rGTCrG TCCTACGTTT TTTGGCTGAT AAGATTT'rGC AGGAATGGCG GGCAGAGACA GACAGGACAC CGCATCACAC ACAGN'TGAG TTGGACTTGG TCNTGCGAGA TATCTATCAT CCCCACAAGG ATGTCCAACA GGTCATGGGC TTGGCAATCT.TGCCACCACG CTATCTTGTA GGAGAAGCTG TTACAGTTGC CAAATCCCAA CATCCAGACT AACGGATAAA GTGGGTGCTA TCTTTGCGCG TGTACTTGAG GOCCAGACAG CCTTTATGCG CTTTGTGGAA
TCTCGG
7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8136 INFORMATION FOR SEQ ID NO: 76: SEQUENCE CHARACTERISTICS: LENGTH: 10011 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76: CCCATAGTGA AGAGTGGCCA TAAGAAGGTC TTCTAGGCTT AATTTAGGTT TTCGTCCACC TTTTGCGTGT TTAAGTTGAT AAGCTGTTTT TAACACAGCT GAACATCTCT TCAAAAGTCG TGCGCTGAAC ACCAACAAGA CATTTAAATC GTGTATCAGT TAGTTGTTTA CT'PGCTTCAT CATTCATAGA ACTACTATAC CATGN'TGT TTCGCAGGAA GTCTAATATT GTCAAATACT GGAACGCTCA TTGCTGGGAT ACGGAATAAG ATN'GCCCCAG CTTCGATAAC TGGGATACCT GGTTCAAAAC CAAGGTCTGT 'PGCAGCGATT GGTGTAAAGA TATCGTAACC TTTCATAAGG TCTTCGTTTA CATCTTTCAC CATAACTGCA TCACAGTGAA CATCGTAACC ACGGTTTGAA AGTTCTCTT CTAGAGCACT TTTAATTTGG TGACTTGAGT TAACACCTGC ACCGCAGGCA 623 GCAAGAATTT TAATCATTrG GATTTCCTCC GATTTTA~rrTTTAATAGAC GG7rGCTTCA GCAATGTAAG CATAAAGGGC TTCIGGTTCA GAAATTTG
AAG.ATTAAGC
ATAGGTCTTC
AAGATGACCA TCCTGTGA AGAAGTCCAT TAACTGAGCA AGAATGTTCG ?N'GACTrGA ACTTGAATTA TTGATGATAA AGAAGAGCAA GGATACTTCT ACTrCCTTAC CTGGCGCAA'r CATATTATGG AAAGTCACCG GTTrqrCTAA TCGAACAACC ACCACTrCT CAGCTAGATT
ATGAACAATA
TATAAACCAG
ACAATTTCTC
GC7'rCTAAGC
CTTCTTTCCA
TCTGTCTGCAG
7TGGAAATGA
GTTCTTCCAA
AAAACACAAG
TAACI1'TAT GAATCATTAC ATTTGCAAGT CTTTCACGC GTCATCAACG CAAGCTTGCT ACCTGATCAA GTTTTTGTCA AAGAAATAAT GCTATAAGTA TAACACTATA TrrTrATAT TrrTT1Tr' CTCCAAGCAT T CTGTTC CCGCAGTTGT TCAAAACACA CCTTTCCTAG AAATTCCATA CTTCACGATA AGTTGGAGTG AAAGTTA'rrC TTGATTATCC CTAATACCAT AAGGTTTTCC TGAAATCGTI GTAATTACT TTCTATTCTT TrTTCT AATAAACACA CAAACAAATA AGAGCAAACT AGGAAGCTAG ACTGACGAAG TCACTCAAAA CCATACATAC GGTAAGGCGA CTAAAAA.AGC AGACCATCTA ACAACGAAAG GCATTTCTGA TTCTTGCGGA ACTTCAACTG TTCAAGGAGT TTACGCTTAC ACGAAGGGCC TTGATATCAG AACTTCAAAT TGTTGGCGAG
TTT~ATAGTTT
TAATACTCAA
GTTTTGAGGT
GTTATATAAA
TGAAAATCAA
TGTAGATGAA
CATCGr'TTTG
CGCTGACGTG
AGCCTCCTTT
TAACTTATTC
A'rCCGA'rGGA
GAGAAACGTC
TACGAGCGAC
GGATGATTT'r
AGGTTGTAGA
GTTGAAGAG
ACTA'N'GATT
TTCATCCATA
'NTTCATGCGT
ACCACCATAA
AATCTTGTGT
'rGAAACTGAC GAAGCAACAg ATTTTCGAAG AGTATAAAAA CTTATATAAA TT'rCCTGTGA CTCAAGACGC TGAGGAAGGC TTCTTACCAG C7rTI'TTVM CATTTAGCAA GTACG'rrCTT CCAATAGCCG CTTGGATTGG 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 TTCGTAGGCA AAGTCCT'rGT GAACGATAAA AAGAATATCC A7"rflCACCA GCTTAGATGG TGCATAACCA CGTGTCGAAG ACTTAAGTTT AGGAATTTGA TAGATAACAT TGACACGGTT CCCACGCTTA CGCTGAGCTA GCTCCATTAC 'rTGCGCCTTG ACATAAGGCT CTTCAATGGT CTTGAGTTTA TCAACGATGA GCTGAGGGCA TCCACCTTAT CGATATTCT GACAATTCGT ATCAA.AGAAG TCAAAGACAA ATCATCAATA TAGTCCATAG
CTTTCCCACG
CTCCATTGAG
AGTCAAAGCT
TTTCAGCAAG
TCACAAAGTC
TGCTCCGACG AACTCCTGTG GTACCATCAT CGCAATCTTA GTTGGG'rCTG GAAACTCAGA TGGGTTAGAC ACATCCATAG ACTCACCGTC GGTCAAAP'rA ACTTTGTAAA TAACAGACGG AGCTGTCATG ATGAGGTCAA TATTGAACTC ACGCTCTAAA CGTTCCTGGA TAACATCCAT 624 ATGCAGAAGT CCAAGAAATC CACAACGGAA ACCAAATCCA AGTGCCTGAC ATCTrrCTGG TTCAAACTGA AGACTACCAT CATTCAG?1'G CAATTTTTCA AGcGCTTCAC CCAGGTCATT GTACrGITT GATTCGA7Mr GGTAGAGACC CGCAAAGACC ATAGGATTCA TCTGCTTATA ACCATGTAAT GGTTCTGCCG CAGGATrGGT ATCCTGAACC GTCTIGATAG ACGCCGCAAT ACCACCAACC GCTT'rTGGTG GCTCATGACC TGAATCTTAT GATAACCCCA CGGTAAGCAT CACATCACCC GTTGGTGCTG AATACCAGCC TTCGCAGAAG AATCTCTGTA CGCACGCGCT CATGATTTCC AAATCATTAT
TAAAAATACC
CACCAGGT
CGTAAACAGA
'rGCCAAGGTA ACGGTATCAC CCACACGAGT GTAACCAACA TCACCAGTCG CAAGGAAATC r.ACTTCCGGCC ACATCAAAGC TCTTACTATT GACCACTCCG TCCATGACAC GCACTTGGAG GTCGAAAATC AAGGCCTTAA GTGCCGCCGT GTACTTTTC TACAATTGCC TCGAGGATTT CCAAAACTGC TTCACTGCA TCCAAACCAA CCGGATCTGC ACCGCGCAGG CCAAAGCCAG ATAAACGTG C. C.
TCCTTGAGCC GCATCGACCA CCAAAATAGC ACCCTCACAG TTCATAGGTA AAGTCAACGT GCCCTGGTGT GTCAATCAAG ATCTTTTGCA G'rGTAATTCA ACTCGATGGC ATTCAACTTA
TCAATTTTAT
GCAAGAGT
GCAGCTAGCG
TGGAAAATAT
ATAGTAATTC
CTTCAATCCC
TCACATCTTC
TAATGATAGG
GAGCCTCAAT
AACGTGAAAC
AAGT7rrCCCC
CACGTTCCCG
2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 CTCTAGCTCC ATCTATCCA AAAGCTGGGC CTGCATTTCA CGACTrGAAA CCGTCTCTGT TTTTTCCAAA ATGCGGTCTG cTAGAGTTGA TTTTCCGTGG TCAATATGGG CGATAATAGA GAAGTTACGG ATCTTCTCCT GTCGTTTTT TTCAGGGTAT CTATTTATTA TAAATTGr'rT GGAGTACTAA TCTrCAGCGA CAAAGCCGTC CAATTCTT=CT AAGTTCATGA 'rTC'rCTrCCT TTGATATTTT GACAAGACCA TACCCTGCTA ATT'rTCGATA AAGTGGTGTT CTGTCA7TCC TTGGTCTGTA AAGACAATCC AATCTrGCCA TCTrCTGTAG ATAGACCGGT GTATGACCGA TGrT=CTA ACCATACTT ATCAATACCT GCGTGAACAA M1ATGAATTCG ACCAAGTCTG AACTGGTGCA TCCAAGGGAC ACTATAATGG TCATAACTTT GTTTCCGGAC AAACAGATAG
CGTGAAGGAC
TCCAAAGCTC
AGACAATGCT
ACCACCATAA ACAGCTCCTC AGATGTACCG CGTTCrTGCT TTTTCC-AGTA TGATTTTCAG CATCCATcCC
GTAACAAACC
CTCCGTGGAA
TTTTATAATC TGTTG7rTCA TGCCAGTCGT CCAAGGTCAA AGATATACTT GTCTGTCTCT ACTACAAATG GCArTGACG CCGCTTCAGC G9CAACCCGC GACCTAGGAT AGAGTTAATG CTTCTGGGTC ATCTAGCCAA CCCCTTGATT GTCCACCAAG
TTGGCATCTT
CTTGTATCTC
GTCAAAAACA
CTACTCCATC
CACCATTGCG
TATACTCGTG
TCCTTGACCA TN'rCAAGAAC ACGGTGACTA TCCTCPACCTC TGTCAATCAA ATCACCTAGA AAGAGCAACT GGGGCTGACC
ATCCCAGGTT
ATAATAATCr
CTGCTTCTGT
625 TTGAGAAGGT CTTCCAGCAT CCCAGCTTrT CCGTGAACAT GTCATCTTAT TT-CTCCCTC;T TrCTCAACAA TTCTCTGCTr CACATCATCA CCTGCCAACA TCTTrGGCAAC TTCCTCCACT CCGTCAAGAG ACGAACAGTC ATTGATAATC TGCAATCGCA GCTGACCAAT TTTATGAA7r TATCCACCTC ATCAAAGACA TGGCTAACA'r GAGACGAGAT CTTCTCCZAGG CT'rGGTTG.AA TTCCCTTACT AAAACGAACC GTTTAATCTC AGCTTCGAC? CCAAATTGAC AAGATTGACT GATTA'rTGCC TG'rCAAGAGA GAAACCGTTG AATGGTCATT AC1'AATCTTC ATTACTTGTG GCAAATGGGA GATAGCCAAA TTCTGACCAA TAGCTTGAGC AACACGACCT CTCCAATTAc
TCGCGTCAGGG
CGCTCTTCGA
TCAATAAAGA
ACC~I'GACCAT
GAAACTCCCG
GCAGACTTAA
GGTTTAAACT
AT13CTAGTCT AATCCrCC
ATATAAAACT
TGCCI-rCr-r ACGTGAAAAG CAGAAGCAAC CTTAACCAAG CAACCAT=T ATTTCCCTCA CGACTGAAT CATCAACAGT CCCACCATAC CCTGCATCAG GCGATTGCCA TAA'rGTCI-?C TAAAACATAG GATCATACTC TTCGACACTT AAAAATCTTC ATTGTCCAAC GGTTGAGGAG 'rTTATCTCGC TGAAACTGGG CTTTTTCCAT TGCTGACCCA AATTATGACG TCCAACrCT TAAGC1'CTGC TTGTATTCTT CCGTAATCTT T'rACGACTAA TAGTATGAAG TCAAAATCAA GGTCCTCAAT TAGGTCI'CAG ACAGXI'AGC- TCCATGTCAT 'rCATAGCTGA ATACTGTAGG CA'rTGGTCAG TCTTGATTCA GACCCAAGTC ATAAACATCT TGCAGTTCr-r AGCAGAAGCA AGTTGACCTG TTCCATGTCC TCAGACGAAA GGCAAAATAA AGCAAAACAT GAGGTCCAAA CGATTCTCAA GATAGCTTCC AAACGTTTGC TGAAATTTCA CGGrA1-rCAG ACGAACATTG GCCAGACTTG 'rG'ATCCGCA ATATTI='Tr TTCTCCAGCC TGCAAGT'rTG ACGTGCCTTG TGTTCCTrr ATAGGCATCA AAACTCGTT'r TTCATCCAAC ATCTCGATAT 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4960 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 C'rGCCTCAAT GCT'rTTrCTT
GATAGGTTTC
CTCTGCCArr TGAAA'rTCCA ACATTTCGAT GACTTCCAGA ACCTGCTTGC GCATTCCC TT'rCAAGTCC.CAAAAAGCGG CATCACCAAA CCAGTTGGGG ACGCATTAAC TCCTCATCGT CATGCTGACC ATCAATATCT ACAAGATGTT GCCCAATAGC TCGCAAALACA GACAGATTAA CCATCTGACC ATTTACACGG CTGATACTAC GA CCAN-rG CAAGATTCC COACGGA'rcA TAATrrCATC ACCTAATTCT AAACCTTGCT CATCAAAAAT TTCCTGTAAA AGACGACTAT TCTCAACTGA GAAAAGCCCC TCAATCTCTG CCI'rTGGTGC ACCATGACGA ATAACATCTG TCGTCGCACG AGCTCCCAAC ATCATATTCA TGGCATCAAT GATAATCGAC TTCCCTGCAC CCGTTTCACC AGTCAGGACA CTCATCCCCT 'rTCAAAATT GAGGGAAATA GCCTCAATAA TGGCAAAGrr TTTATCGAA ATTCAAGTA 626 ACATATAGAC CTACCAATT TTACTTGTT CAAAGATTTC CTCTGCTAGA CTTCCACTTC TGGCAATGAC TAAAATCGAG CTATCATCAG TCAAACAGCT AAAAA'rCTTG TCTGCAAAAG TCTCGATTAA C1'GAGCTT'T ACAAAAGCCG TATITTCCTGG AATAACTTGG AGATTGATCA TCTTATCCAT CAATTC.AGCC GATTCGATAT TGTCTTCAGC CAG~rGCAGA CT7"MACGA TTGA7?rTTGG CAATTCGTAG ACATAGGTGT TGTC'TCTCAA AGGAATTNG ACTCTTTGAT ATCTCGGGAT ACCGTCGCCT GAGTGGCACT GATACCTGCT GTTCTACAAT TTCI-rCTTGC GTGCCGATTT GATAATCT'1 CACCAATCTT CAAGTCTCTC 7rTTArTC ATT'rrTAAAT TGACTATGCG CCCTCTCTAC ATCTCAGCAA GAATCTGATT GCTTGCTGAC r'rTTCTTTTT TCA.AATACGC ATATTTCCAT GTCCACCTTG GATCGGAGAA AAGTCCAAGC CAAGGACTGA TCTACTGCCA TAGCTGTTAC ACATTCAAGG ACATTCTGAT GAACCTTAGC A7TrCCATTTT'r CCCAATCTG CTCACGTCCT GCCTCAAACT GAGG=TGAC ACCTGACCTT GATCAGCCAA GACACGGTGC AAGGCTGGCA AAATCAGACT
ACAATACCTA
TCTTCAAAT
CTAATT
TGCTTCTTTA
TAAAAATTCA
AAAACCTACC
ATCTCGAATA
AAGTCCTACC
AAGGGAAATG
AAACTCACAT CAATACTGGC CGGAAAI'GA ACTGCTCCAT TGATTGGTAC CAACATCGAC AAAGCTCGGC TCCTGCTCGA AATCAGTCTT TTCAGCATAG GCTGACAACT CGTGGGTCT'r GGCGTAATT'T CCAAGCCAAC TGCAAAGACC AAC?1'GGCAC TATTCTGTAG CATGACATCG 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 GTAAAACCTC CAGTAGAGGC CCCGATATCA AAGACCTGCA AGGCC?17rTC CAGTTTCAAA CCCTTGAGTT ITrAATTCGGT GTCATCTGGA CCATTAAGGA CTGCTACGAC TAGGCCACCC TCAAACAACC CCTGTTTATA AGCTAGTACA ACTTTCTACT ACACTTACAA TCGATTCTGT
ATCGTAGTCG
CCACCACGGC
ATTTTCTCTC
ATCACACCTC
CGCCATCCAC CGACAAATCA TGACATAC?1r GAGTTCTCC CTGGCTGTC AAACCGTTCT GCTTGGCCTC C.TCTCTCGTT TCCACTCTTT CCTTAGCCAT TGATTCTCAA TTCAAAGGGA AGCTGCTGGG CAATTTCTTC TAAT"=TCA TTAGCTTGAT CCAGGGTTTG GTTACAAAAG GCAATGGACT CTTCCAAGCC CAACAGGGCA GGATAGGTTG ATTTTTCTGC CTGCAGATCC TTTTGAGGTG TCTTGCCGAT TTCCTCAAAA CTAGCTGTCA CATCCAGTAC .CAATTCACCC ACAG7=~CA GCTTCACCTG TGCCGCTTGG AAGGGATAGG CTAGTAACTT
ATCATCTCTG
CATTTCAGGT
CCCAGTCI'A
ACTTGAAAAG CAAGTCCAAT GACAATTCAG CTATAATAGC TTGGCATGAA TAGTCTGAAG TTCTTCCAAA GACAAGTGCT GGTGTTCGCC CTCCATATCC AAAACTTGCC CTGCTACCAT ACCCAGACTA CCTGAAGCAA GGGATAAGTT GGCAATCAAG TCCACCTTAA TCTGACTTGG CAAATCTGCC TGCGCAATCA AGGCATATGA GTCTAAGAAT AAGGCATCTC CAGCCAAAAT 627 GGCCATAGCT TCACCGAATT 'rCTTGTG.ATT GGTTrAACCGC CCTCTTCGAT AATCC'ICATC ATCCATAGCA GCGAGGTCAT CGTGAATCAA OCTCCCTGTA TGAATCATCT CrAAGGCACT AGCTACCTGC CCGTGAGCAG G=rGATGGT AACCTGCAAG GAGAAAAGGC CGAATACCCT TGCCACCAGC ATGAATAGAA ACTAGAGGCA AAC'TGCTGGT C'rCCATAAAA ATCTTCCAAA TTTTTCTTGC TrrTCATTC AAAATCACTT TCTCTTCCGT AAGGTCTr CAGCCTTGTC CAGCGTACCT TGGAGCTC?1' TGAAAGGCAG TAATCGCATC TrCCAGAGCA AT'NCACCAT GTTTCCAGTT CTGCTAGATT TTCCTCAAAT rrCTTTTGTT TTCTACTTGA CCATCTCGCA TCAAAAGCGT TACFGGTCT GCTrCCAGAA CTTCTAACAA TAGAGAACAG ACI'CCCGTAA GCCGACTCGA CAAGAGCTAA CTTCTTGCAT GACTGAcC TTGACAACAC CATGCCCTIrr TTTCCAAACT TrGGACAATG TTGACATCTT TAACCTCTAA TT7'rTTCA AACTCTCAAC CCACGCGCCA CCATTCGGC1T TCAGCA.ACCT TGGCGTCATA CCTAAACGGT CTTGATAGCG GTTCTTGCTT GAACTAATTG CGAA'rCTACA ACGGACTCTT AGTA'rCCAAC ATGAGCAAAG AACTAACGCC ATTTGGCTAC TTGGATTTTG GTAACAGGTG 7TT'rATCA GA.AATCCGAG GCGTTGCAAA TAACCGTC~r TTTT-TCAAA GCCTCTTGTT TTCCTGAT'T TGCAAATGAG CGCCGCTGTT GGCGTTGCAG CTCATGCCCC ACACTAGAGA TTCT'rCGTTA AAGGCCCAGA ATCCAAATCG TCCCGTrCAT TTCACCTTGA ACCTTGGTCG GGTCGTGATA ATATCTCGAA -AGAAAATTGG GGCAGAGCTT TTTCTTAAGT TGTTCAAACT AATGATGATG GAGTAGCTAC CTTCATTCCr'TCIrTCCAGGT TTGAATAACT GCATGGTCAT CT*rTTGAC AATAGCATAA CTTCCGAAAG TCGCTTGGCC CTAAGAGCTr GTCCAACTGT ATAATTGTAC TAATTGATGA TTC~cAAACT TTGTTCAAA ACAAGCG;CTC ACGTTGTCT'A TCTTAGATAG AACATTCGG CGCAG7TTGCA AAGArAACAG
ACTGCCGTTA
CTAATACATC CAACT'rGGTC ACAGGTGTTG CGCGTCGATC TGCCACAAAA TCTGCCAAGG TAACTGGCAA ACGAGATTCA AAAATAGCTC GATCCTCAAT AGAACCACCT CCACGACCAA TAGCACGCGC AATATTTrA GCAATTTCCT GATAAAGAAG GATGTCAACA CCTGGGAATC TAACGGCTCC ACTACGGCTG GTTACTACAC GCTTGAAGCG TTCTTGAAAC AGGCCTTCTT GAATCGCAAG CGCCCCPLACC CCATCAGGCT
GTTGGTCCAA
ACTG;ACTGCA
CCATCCGTT
CCAGTTCAC
TCACATCCCT
GTACCACAAT
TAATGAGCAA
CCGCAGCCCC
GCCTGCTGAC
CAATTCTCTT
CTCTCAAT'T
CAGCITTTTTC
7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9 120 9180 9240 9300 CACTTrGGTTC ATAGACCTGT ACACGCCCAA TCACATTGAT CAAACCCTAA T'N'CTGATAA ATCCCAGACC AGATGGTCC CCTTT'AGGGA GAAATATTGG TGAGTAGGTC GTTACAAA 628 GTTGGAAACT TGACCAGTrA AATAGACCCG TTrCCAAGTAT GGGTCTTTAT CGAAT~rCAT TTTCAGATAC TrGGTCAAAG TTGTTACCGA TAAATACTTr TCCATCTCCA CCTACTATT~C ATTTACTTGC TCTI'TCATGG GTATrATrAT ACCAAAAATA TGCCTAAAAA TCTCCAPTA TGTACCATI'A TGAGGGAAAA ATAGAAAAAG GAGGCAAGGC CTCCACATGT GATTATTTGC TGTTTCGAGC TTCTTCCAAA ATCT'PTGCAA TCTTGG'TCGT CAACAGGTCG ATAGCCACGG TATTGCTAAC CCCTTCAGGA ATGACGATAT CAGCATAACG CTTAGTTGAC TCGATAAACT GGTGGTACAT TGGTTTGACC ACACCTAAGT ACTG;GTI'AAT AACGCTATCA AGGCTACGGC CACGCTCCTC CATATCACGC TTGATACGAC GAATA.ATGCG CACATCGTCA TCCGTATCCA CAAAAATCTT GATATCCATC AAATCGCGCA GACGCTTGTC CAACGATAAA GACATCTTGA GGTTCCTGAC GATAGGTCTT TATAGTCGTA GGTCGGGATG TCCACCGGAC GCCCTGCCAA TCATCAAGTC TGTATCAAAG GCAAAAGGAT GGTCATAGTT INFORMATION FOR SEQ ID NO: 77: SEQUENCE CHARACTERISTICS: LENGTH: 5365 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear CTCCAAGACC AAAATACCCT GCTACTCCGT GTATGCTCTG CA.ATTCCTTA ATCTGCTCGA GGTTTTGACG G 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10011 120 180 240 300 360 420 480 540 600 660 720 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77: CGTGTGGTCT TAAAAATAGA AGACAAAGAA CAAACTGTTG GAGGCTTTGT CCTTGCAGGC TCAGCCCAAG AAAAAACCAA, AACAGCTCAA GTTGTGGCTA CTGGACAAGG TGTTCGTACC 'N'GAACGGTG ACTTGGTTGC TCCAAGTGTT AAAACTGGAG ATCGTGTCTT AGTTGAAGCC CACGCAGGTC TTGATGTCAA AGATGGCGAT GAAAAGTACA TCATCGTAGG CGAcTA.ACAT 'TTGGCAATC ATTGAGGAAT AGA.AGGAGAA AGTAAGTATG TCAAAAGAAA TTAAATTTTC ATCAGATCCC CGTTCAGCCA TGGTTCGTGG TGTCGATATC CTTGCAGACA CTGTTAAAGT AACCTTGGGA CCAAAAGGTC GCAATGTCGT TCTTGAAAAG TCATTCGGTTr CACCCTTGAT *TACCAATGAC GGTGTGACCA TTGCCAAAGA AATCGAATTG GAAGACCATT TTGAAAATAT GGGTGCTAAG ?1'AGTATC-AG AAGTAGCTTC TAAAACCAAT GATATCGCAG GTGACGGAAC TACGACTGCA ACAGTCTTGA CCCAAGCTAT 
CGTCCGTGAA GGAATCAAAA ACGTCACAGC AGGTGCAAAT CCAATCGGTA TTCGTCGTGG GATTGAAACA GCAGTTGCCG CAGCAGTTGA AGCTTTGAAA AACAACGCCA TCCCTGTTGC CAATAAAGAA GCTATCGCTC AAGTTGCAGC CGTATCTTCT CGTTCTGAAA AAGTTGGTGA GTACATCTCT GAAGCAATCG AAAAAGTTGG CAAAGACGGT GTCATCACCA TCGAAGACTC ACGTCGTATG GAAACAGAGC TTGAAGTCGT AGAAGGAATG CAGNTGACC GTGGTTACCT TCACAGTAC ATGGTGACAG ATAGCGAAAA AATGGTGGCT GACCITGAAA ATCCCTACAT TTGA7rACA GACAAGAAAA TrrCCAATAT CCAAGAAATC TTGCCACrT' TGGAAAGCAT TCTCCAAAGC AATCGTCCAC TCTTGATTAT TGCCGATGAT GTCGATGGCG AGGCTCTTCC AACTCTTrTr TTGAACAAGA rrCTGGAAC CTTCAACGTA GTAGCAGTCA AGGCACCTGG TTTTGGTGAC CGTCGCAAAG CCATGCTTGA a AGATATCGCC ATCTTAACAG AGATGCGACA ATTGAAGCTC GGTTATTGTA GAAGGTGCAG GTCTCAAATC GAAACTACAA CAAATTGTCA GGTGGTGTAG AGAAATGAAA CTCCGCATTG TATTrGTTGCA GGTGGTGGAA AITGACAGGA GATGAAGCAA TCGTCAAAmT GCTCACAA'rC GCGCAACAGT TATCACAGAA GACCTTGGTC TTGGTCAAGC AGCGAGAGTG ACCGTGGACA GAAATCCTGA AGCGATNrCT CACCGTGTTG CGG'rrATCAA CTTCTGAArr TGACCGTGAA AAATTGCAAG AACGCTTGGC CGGTTATTAA GGTTGGAGCC GCAACTGAAA CTGAGrrGAA AAGATGCCCT CA.ACGCTACT CGTGCAGCTG TTGAAGAAGG CAGCTCTTGC CAATGGA?? CCAGCTGTTG CTACCTTGA CAGGACGTAA TATTGTTC'rC CGTGCTTTGG AAGAACCCGT
TTGAGTI'GAA
AAGATAGCAC
780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 CAGGArrTGA AGGATCTATC 'rGCTGAGCT'r GGTATAGGAT TS'AACGCAGC AACTGGCGAG AGGTATCArTr GATCCAGTTA AAGTGAGTCG TTCAGCCCTA CAGCTTGATT TTGACAACAG TCCAGCAATG GATCCAAGCA TATAAAAAAC ACAAAAGGAG GTAGTGGGTT GAAGTCAGCT GTCAAAGCG ATAAAAATCC GCGCTTGATA AGTTTGATGA ITTGAAGGGCG TTGATAWTCT AATAGGATGA ACTTGCTTAA
AAGCAGTCGT
TGATGGGCGG
GGAATGACTA
AAGCTCGAGA
AGCCAATAAA
GATGATGTAA
ACCCTTCTTT
AAGGACAAA'r GT'rATCGATC GTTTGAAAAA TGGGTTAACA TGA'rTGATCA CAAAATGCAG CATCTGTAGC CCAGAACCAG TAGCCCCAGC GCTTTCTATA GAAALACAACT TTATAGGCTC T'rTGTCAACT TTCGTCCTr CTTTTGAT G7"TTTrGAA G~rTTCAAAG TTTCGAAAAC CAAAGCCAT GATTAT'r.GT CGCTTCCGGT 7TGGCGTTAG AATAGrGTAG ?TCTTATC TTTCAGGAAG GTTTTAAAGA CAC1'CTGAAA GATTGTCCTC AATAAGTCCG AAAAATTTCT CCGGTTCCI-r ATTCTGAAAG TGAAACAGCA AGAGTTGATA GAGCTGATAG TGATGTTTCA AGTCrrGTGA ATAGCTCAAA AGCTTGTCTA AAATCTCTr ATITGGTAAA TGCATACGAA AAGTAGGACG ATAAAATCGC TTATCACTCA GTTTACGGCT ATCCTGTrGT ATGAGCTTCC AGTAGCGCTT 630 GATAGCCTTG 'rATTCATGGG ATNTCGATC CAATTGI-rC ATAA71rGAA CACGCACACG ACTCATAGCA CGGCTAAGAT G'rTGTACAAT GTGAAAGCGA TCCAACACGA TTTTAGCATT CGGGAGTGAA ACAGTCTGGG AGACTGTTTC TGTTAGCCA AGTCATAGTA AGGACTAAAC CGAACGGCTC TATCGTAGCG AAGAAAGTGA AG.CCTGAGCC TAGAAATTTG AAAGCGAAC ATATCCATCG TAATGATTTT CACTTGACAA TrrCGGATGA CAGCTGTGT TCTGCCTTCA AGAACAGTGA TAATATTAAG GTGAAGGCAT ACTCATCCCA ATTATCAAAA TCTTGCGCAA ACACATAATC TTTGGAAGCCC AATGACAGTT GAAGTTGAAA TGAAACTCAT CTTTCCCTTA GAGAAAAATC ATGCTCAAAG TGGCCAGCTG ATGGGCAATA TGAAAGTCAT
TCAGTCATAG
ATTTGGTGAT
TGAr.CTTGCG AAT7'r7
TTTTCTTTAC
AATTAACTTT TGAGCAATCT TTTGGTTGAT GATACGAGGG CAGGGGAGTC TCAGCAACCA TCATTTTTGA ACAGTGATAG S. 55
55 S S CACTTGAAAC GACCTTTCT AAGGAGAATT GGAATTI'TAG AAGGTTITrTG AAAGTCATAT GATGGGGCGT CGTAGTCCAG 'rTTGGCGATG TCTAAAATCT GGATAT'rAGG GTCTrTAATA TCCATATGAA TCr=CAAT GAGTTGTTT TTTTTCTAC AACAAAATAG GCTCCArAAT TATAGAGCCG AAAATrCACA TCTAATATAT TrATTAAAGG ATGACACAAA AGTTTTTGAA AAAATATACC TGACAGAATC TAAAGAATCT TATTTTGAGT TTATTGAATC TAAAAGTATT CTGATAGATT AAATAGCATT rTCTCTGTTrG GATTGATGCT ATGTGGAAAT ACAAAAAAAT TATACTAATC ATTrTCGTAT 'N'TTTGTATT GGAATAAAGA CATTAAAAAA TAACAGTATA GCATAAATCT CTTTCTAGTA ATGTGTTGTA TC?=ACACA ATTTATTTT'A TAC1'ACCAAA CAACT'rTACC GATTC?1'TAG TTCTACATAG GAGAAAGGAC CACGTCCATT GTTAATCCAA A'rM'AG'rCCA AGTCATCAGA ATAAT'rCA'TT TCCAAGAGAC GTMCCCC ATCTGTAAAA
CTAGAAGCCA
TTCTTCAATT
ATTTCCTTGT
TCGAGCAGITT
TACCAGTCGT
GGTTTCCGCA
GTGTATCCTT
TTGTGATAAA
GTCGCTTTTC AT'rATAGGTC ATCTATAAGG GATTTACCCA GCAGACTACT TTGAAATfGAA
TTCAAGATAA
CTCAGGCAA
ATTGATGATG
ATGTAATTGT
ATATGGGACT
CTACAAA'rAT
ATTAAAAAAA
AAATCTACAT TCAAAT~wGT AGAAGGATAT GGAA'rTAAAC AAATGGACAA TGTCAThAAAA GCTTTATATT TTCAAAAACC ATTAAATGAG AGATATTrGTr TT'rAAAATA'r TGTACTAAAT GTTTTTGATA CGAAGTTCAC CTGTATTTTr 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3960 4020 4080 4140 4200 4260 AAACGATATA AGTTTGTTGT TCTATTTGTT TTATATATTT ACTCTGCTAT AATAGATTTA AA.AGGTCAGG ATTTTGTTCC CGCTTGTACC AAATGTrrAC
AAACTTACAA
TACGAATTCT
TrcTCCIr'rrG
TGACCTTTGA
ATAGGCT'rCT
TTCTTTAAA
CTCTTCAACG
ATCAATATAC
TCAACAAGAA
'rTGCGTTTGT
ATTTTAACAT
TT7TTGACATG
GACGCTCGTA
CCAAATCGTA
TTCAGTGCTT
TTATCACGAA
GGTTCTCGAG
CCAATAA'rGG
CTCCCATCAT
631 CTrCATCCAG ATAGTAGGGG CTAGCCATAT TGCAATAGTA AGAAGTCCA TCATGGC.AAT GATATTAAAC CAATAI'TCT TGTGAAAGTA AACAATAGCC TGACCCAACG ACGACCATCA CTTTCGGTAA CAAGTGTATG ATCG~rGACA CG71'TrCTGT TGTTTTAGT ACCATGGTGT CCCGCCAAGT TCGGTGGAGA GCTTATAACT TTGAATTGTA ATA.AAGTCGC CTTCTTTTGG AAGCTTCATA TNCTACAAT TTATAAGTTT ATCATTTACT ATTGTACCAT AAAATTACCC AATTTCACTT GGAAATATT'A AAGATATTCT CTAAGAGCGC ?I'GCTATA'rC TAGCCCTTTC GTGCTAAAAC TTGAGTrAAA CGCTGC= CA GTTCGTATCC CGGGCATACT TAGTATATPG CTTATCAAGT 'rCCTTGAAGA TGAGTTCCTG TCATCAACTT GAC'rATCCAA T'rCGTCAAAG GCAA7r=AG CATCAAAATA S
ACTAACCAAC
AAAATCTGTG
CGAAAAATCG
TTCATACTTT
AGTCG-TTrCT
AGAGAAGCCC
TTTCCCTCA
AAAATCATTC
AGTCAGTACA
GTACTGGCTA
GTTTTCATTA
AAAGGATAAG
CTCTTTCAAT
'rrGTrAGTCA AGTTCTGGAT AATCTTATCT TGCAGGGCAC GAGCTGGAAG TATrlICA ATAGTT'rATr GGCTACACGI' TGAGCAAC?!T CCGAAAAATC AAGATTTCTT CTATAGTAGA 'IrTGAAATT CC~rTGTG CTA.ATTTCTG TAAGGTCCCT TGTCTCCTGA AAGTrGATTG GCAT'TGATGA TAGCATAAC TCATrAATCC ACTrCTCT'rC TTrkAGATTA GCAATGACTT GAGAAACGAT ATATCATATT T TTTCAGATA TTCTCTGACC TCTTTTTCAG TACGTGCTT TGGTAGAGGG CCAGATTCT T ACCATAAGAA AATTGAGCAA ACTCTTGAAT TCCTCTTCGC TTATCACCTT ATCTCTCGAT AACATAAAAC GAACAATTGT 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5365
GTCTTCGGTG ATATAGCATr TGTCG
INFORMATION FOR SEQ ID NO: 78:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3636 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78:
TTTCC.AGAAA GAAGTGAGT AAAGTCTrTA TCAAAGAGAA TGACTTCCGT ATTGGAACTG ACATTAGGTT TTATTTCTAC TTTACTAGCG TCCGCCCTAG CATTI'TCTAA ATCTTTAATC TCTTCTGTTG CCCTATTTAT AGCCAGCTGA ATAACTGCTT GAGGAT7rTC ACTCAGTCCA TGAAGCTTAT CGTCCACCGA AGTATAAAGA CTCGAATGCA TGACTTGTAA AATAATCAGA GTCATTGTAG AAAAAA'rCAG TCCGCATACC ATGTTTTTT CAAAGTTTGC AAATTCTCTG ATAGACTTCG ACAACCGAAA CTGCCTCTTA GGCAAAATCA TTTCCCCAGC AATTCGACAG CACGATATT1C CCATAAGTCA CTGAATCCGC A'N'TTAAGTT CAG?1'CAAAT CC-ATGTCCCT GGTGAAGACA CCGAAGTTGC GGATAAAATA AAGTTTACTG AACATCTTTT AAAAGATACC CAAAAGTGGT TCCCI'N'AAT TTCTTACGGA TCG"=GATC ACTATCAAAT CCCCATAGAC CAT7'TGATT TTGAAGGAAA TAAACTAGTA GAGTATCrrC AACTTTAACG GTA7TGGTTG AGGTGM"C ATTAAACTrC CCTGAACGTT CTTCTAGGTA GAAAGGTTTG GTCAGATAAT
ACTAAAGTCA
CAACACTACG
CTTGAAAC
GGTCAAAAAT
AATCGAACTC
A'rAAATTAAC
TGAGAAGGGC
CATCCCCTCC
TCAGAACTGG
TGTCGTAATT
CATCAAATCC
TCCATCAAAT
CAGACCTACG
CCCTTTTCAC
AGCAAAATCA
ACCTGCATAA
TCATCCTCAA
TGTCATCCAA
GCAATTCT
AGTCATAGAC
CATCCGCAAA
CCAATAAGAT
ACTTTCCTTG
TAAGACITGG
ACCACTCTCA
ATCGTCTAAA
tTTrATCATG
CTCAACTCTC
GCAGTCATAA
ACTATTATAC CAAA'rTTGCC TTAAAAAAAA TGAGTTTTCT T'rTTATTTTA GGCTTATTTA TGCATTTCCG GACTGCAGCT r'rTTCACGGC TAATCAAGTC AACACGCGCT AAACCAT'Irr 7wTTC'rGCAA GCTTCGTAGA GACCTTCTTC AAGTCAAATA CTGAATTTGA AGAAACTCCT CCTTATTAAA TGCATTTTAC ATGAGATAGC TA7TGAAGAA CAACTGCTTC GCAA CCT TGATTCCCAT TCAAAGAACT CCTTGTATTC ATAACAAAGC TATCAAAGCT TCACGCGCCC AAGACCAAGC CAAGCAGACA AGTCCTGTGG 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040
ACCGATCTTA
CGCCAACCGT
CATATCTCCT
TGTTTTCTGA
CGGCTAAGAG
TGCTGAGTCT
CCAAGGGCTG
GTTGCTTGAT
CAAGGTCAGA AAGTTGCGGT TAAATACATG AGCAGGAAGG CCTTAATCCA AGCCCAGTTT GAGCTAGGAA TTGGTAATAC TCCAAGAAGT AATCAGG'TTT CCAACTGGCG TTTAAAGACA TTTGACCACA AATTTGTCCT ACTGTATGCA AGAGCTGCTG TCGATAT'rAT
GCATCTG
TIG7rTCATCT
TCCTTGTGTG
CCGCATCTrGT
CGTGAGTATA
CATTAATCAG
TTGCGAAGAT
AGTATCAAGA TAAAGTGCTA ACAAGTCI'TT AGTCTCATGA AAC1'TGTGAG CGAATAGCTG CTGGGAGTCC TGCAAGATTC T'rGGCTAGCG ACTTGACTAG C1-rCTGCATC ATTTGAGCGA qTGACGAACC AATTCATCCT CATCTGATTC TCCGTCTTTA ATAGTTATGA CGAGCCAATT TACCAACCAG TCCTrTGAAG TTCATCAATA AAGCGCTCAA GGGCTGAAAT CACTTGAGAA AGACTCTTCC TTAGCAAGTT TATCAAGAAC TGGAAGCAAG ATCATCATCG AAACAGCCAG GCT='CAAAAC CAAGACGGTC GCTGTTTCAG CATCCGTTCC ACAGCTGAAA CCACCAGATA TCTGC.ATAAG
AAATGTGCCC
TGCTCACC ACAACGC TTCTTGAAC AATTTGCAGT TTGC=GGT TATCAAGTGT TGCCTCAGCC AACAAACGAC 633 C 'C1AGCTCA GCAAGAACAG CTGCTAACAA AGTATTrTC.A GTGTTGAGAC GAAGAGCTCC AGGGATTTCG ATACTTTCAG TTTCGAGTGT GTCTCCTTGA TAGTCGGTAA TATAGTGCGC 'rTCATTTTCA GCAAGAAGAG CTGCCTAGCC ATCAGGCAAG CCTTTCCAGT TGCTATTGAG
GGGCACCACC
AATCTTCAAG
CCAAGAATCC
CAGAGACGGT
ACATCATTTT
ATGAAGGCTG
TCI'?GTCTTC
CAACTTTAAC
CGACATCACG
GTCACTACCA ATGGTGTTGC TGTATTGGTG AAAATCAGCA TCTCCTAGCC AACGGCCAAG
GTTCTCACCG
AGTAAGAACT
TCCTGACGCT
'TTTCAAAG
CATGTGCATG
TTCATCTGGA
AAGAGCAAGA
GGTATCCACA
CCACCATTTC
ATGAAGAATT GT7?lwrGTGA GCGTAACCAG GCTGTTCCAA.
TGACCAAGGG CATCCCAAAG TAGGCGTGCA AACCTTTAGC AGACGGCTTC CTTTGGCATA TGTTTAACTT CGACGTGGAC GGTACTCCAC CTGTTTGGAA CAGACGTATT CCATCATATT A'rAGTCACGA GCTTCCCAAA
GACGATACCG
AGACTGAACG
ATCTTCAAAG
AGCGAAACTT
CCATTCGTGA
AGAGTCTCA
AGCACCAGCT
TCCATAGTAA
ATTTGAAAGT
AGCGGTCACC
TGTTGTCTCA
GTTTGACAAG
CCGTCAAAGA
CCATCAGTAG
ATATTCCAGC
TCATTGAGCC
CTGTATTGAT
CGTCACGTTC
TTGGTTCGAT
AAAGGTCATC
GCCAATTCAT GGGCCACAAC AAGGGCAACT TCGACAACCA AGTAAACTTC ACGGTAGGTC GAGAAGTCAG GAAGGGCGAT GTGGAGAGAT TCTTCG'rAAA ACTCGATAGA GCGAACAGCG GGATGTGCTT TGGTTGAGTA GACACCTACC CCTTGCAAAT CACCAGCAAC AAAGGCCAAC AACTTCCAGA TACCTGTTTC CTTACGGT GCCAA'rTCAC CTTCrGCr'rG GTCA.AAGCdA 'rGT'GACCC
ACAAGACCCC
TGAGGAATTG
ATATCCAGTG
AGGGTACCAT
AAGTAAGAAG
TCAACATCGA
AGAGAGAGGT
'rAGCAAATGT AGTTr'rCCAT GGTACrTAAC
AGAAATCAAG
'N'TTAGTT
ACATGCGAGG
TTTCTGGCAT
CAA.A;GTTGC
TC'TCGAACTG
AAATCCCTGT
CAGCCTCAGC
CTTGACCTGC
CACTCTGTC
2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3636 TTTGGCTTCA GGCTCATCCA AGTAGAC.AAG ACCTCCTrCT CATGTTGTCT GTAATTTTAC CAATTCGATA TGAAGGGCTT AACTTCTACA GAGGTGATTT CACATGGGAA AGcTTCCGC GCAAAATGGC TGACTCCATC AACTGTATAA TAAGAAGCT CAGAAAAGGC AAGAACCAAT TCAACTTGAC CATTGTCATG GTCAACTGTA AATGGACGAG CCAAATCTTT TTGGTGGAGG GAGATGCGGT TTGACCAGTG ATGGTCACTT TCCCAGAAAA AGTCTI'GGTC TCACGACTCA AATCTAAAAA TAAATCATAA TGTTCAGGAA CAAAT'rGCTT AATGGG INFORMATION FOR SEQ ID NO: 79: Wi SEQUENCE CHARACTERISTICS: LENGTH: 5066 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79: a. ATAGCGTGTA ATAATCGATT TTAGAGGTAC TATAAATCAA TGCCTTCCAC CCTTAGACTT TTTGAAACAG GAATAAGTTA ACCAATTCAT AATGCCCCAT AACTTGATAT CTCTCACAT'r AACAACTGTC CAAGCAAGGC TAAAAAGAGA ATAAAAAATT GGAAAAAACT TACTATTTCT AAAGTACGGT GCTAAAAGTA AGAATTTAAA TTTTGATAGC GT'rTcTATT ATTTTATTAT TTCTACTTTr TTATTTGCGT TTTCTTGCGA AGGCCTTGCG GATTTCATTT TCCAAGAAAC CATTGACAAA GATGACAAAG GrrGGTGGTT TGAGACCTTT TCCTTTGTCT GTCGGTGTTG CGTTCAAGAC AGCTGATGGA ATACGTGTAT CAGGAAGTTT GTGGAGACGT TGCTTGGTTA GCAGGTATTG GA.ACTGCTCA CGGATATCTT~ TTTCAAGCGT ATCCCACTTG TTGACCACGA ATCCTGCGAT ACGCTTGTCG TACTCACGAA CCACATCTGA ACGGTCAATA GCACGCATGG CATAAACCTT ACCAGACTTA CGCATACCAG CTGTATCTGT AAAGTGGGTA TCAATGGCAT TAACACCGTC TTCTCCCAAG ATAGCATTGA CAA'rCAAGCT AAACTTAATG ACATCTGGAT CTACGATCGC ATCTAGCACA TCCCCTGTAC GTTCACCCAA ACCGAGAGCA TAGAAATCAT CCTTGN'GAC TGCGAGGATA ACTGGTTTGT CGTC'TGCATC AGTAATTCCT TCCTTACCAG ATAGAAGGGG ATCTAAAACC GTTGGCCI'TT TCAATCCAGT CAAATGTTCC ATCACCGACA ATCAAAAAAA TCCGGAACTG TGAGATGAAT CGGTGTTCCC GCAGGTAAGA AAAGTGCATG TGGTTGCCAC TTGGGTCGCA GGTTGATGGC AATGGCATCC TTTGACT7TTC GC'rGATTTGC ALAGCTGATAC AAAGATAATC CTTCCCAGTT TTTCATAGTG TAATCATCCC T'TTACCAGCT TGCCTTCTTC CGCATTGATG CACGCATAAC AGAGTATTTC CCGTATCAAT CATGGTAAAC CACGAGTTGT TCCAGCAACA TCAAGCTTGA TTTTCCAACG TTTrCTTCCTC ATAT"?CATT
AGTAAAAATA
TATCAAAATA
TCCCCCCTTC
TCATTCCAGA
TCAAAAACAA
AGTTCTTCTT
TAGAAAATCT
ATGATGACAT
TTAATCATCT
GGTGCGTAAG
TGGTTATCTT
TCATGGGCAA
ACCATCAAGA
TCAGTATTT?1 TCTTGACCAkT
GGACTAGCAA
TTAGGACGAC
GGAAGATTTr CATAAGCCAC CTCCTACAAA TAGAAACCGA CCCTAGTTCC TGTCTCAAGC GAAACATTTC ACCAATAGCT AGCAGAATAA AAAGAAACCA TCTCAACACG GTATTGAAAA ACAGAAC'TGA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 CGATTCCATG GACAGATGAG ATAGGCAATG ATATATCATT TCTCATCTCA GGGTTGTCCA GGGT CTT ATA AAG CTT ACGA GCTACGTATT ACACGACAAA AACGATAACA TCTCCTTCTT 635 CCATrG4CAAT
TTCCTCCTGT
GGTCACGTGT
?~rCTGCCTGG
ATCAATCATG
CACTCCTTCG
TAAATAGGGT 'rGATTTCCCA TAATTTCTCA CTr7CACAA TCAGCTTGAC CAAACTGITTC GCA'rAGTCAG CCTGAACACG TCTTCCTCAA AGACAACATT GGCACACCCA GTGCCATCCC GCCACTTCTT CAGACTTGTA TCAGCTGCCA ACAAGGCGT'r CC?1'CCACAC CI-rGGGGTTG TTCAAATCTC CGACAAAGAG TGCTrTGATT? GTTCCATGAA AGGAGCATCG ACATCA'rCAA CTAAAAGAAC GATTGAGCCA CTCACCCGTT GCATAAATAC ACATCTTCTA CAATGGAGAT TCGCTCACCA GCGATCCGAT ACAIrGGGAC GTCCTACAAT GGCAATAGTT GGTAGGGCCA TAATTTCTTC MGCAGT ?rTI=CTAGT TGAGCTTGGT TGCTAGGCGC TGACTCCAGC 7rGTGGTCGC ACGCGCCCCA GTCATAAGCT TGGATTGCCT CAGTTGACrG TTCTTGGTA'r CTCTAGTGGC AGTCTCCGGT TCATATCATG ATGTTGATTT AAAGACAGAA TAGGTGTAGT TCGAACCAAA CCGATAATCA TTGTCCAGCA AGAGCTCCAT GAAGGTGTCG GTATGAAGTC AAGGAAAACA GCAGACTCGGC CAGGTAGGTT AAAGAGCTCT CACCACCATA GCCCAAGCT-r CGACCGAACT AATCA.AGAGA GGGCTCCCTr TTCTGCTCGG GAATGGCTTC T'rGACTACC CAATCACAGA G'rAGGATTGG TCTCATTTAA GTCTACTTGG CCTTCATCAA TTrAATGGT'r AGCCAAATA.A CGGATGCGTT TTTCACTTTC GCAAAATGCT AATTCATACA AGGCATCTTT CT'rCTCTTGA AAATTCTTCC AAGATGATGC CATCTGGGCT GGAATTTCTT GCTCT'rTAAA CCTGCGCACT TCTGTCATCG ACGTTTACT CCTTCTAAAC CCATGACCCG TCTGGCTTCC CAGGTTTCGT TCTCCAAATC TTCAAAGT'rG AAGTTCGATG GATATTGGAG AATGACCTGC AGGATTT-CGT CAATATCATC TAAAATCAGG ACCTCAGACA TGCCATCACT GATAGCATT TTGACATCAA ATGAAACACC ACGTTTTTCT GATAAATCAT
CTTCGTACCA
GCTGTCAAAA
GAAGTA'rGAG
GAGTCTCCTC
CATTTCCATG
TGAAAAAGGT
CACGCACCCA
GCTTAATCTC
TGACAAAGCT
GAGCTAAGGC
1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300
CGGTAAATTT
AACGGTTGAT
ATCCACCAAG
AGGATAGTGG
CGCCACCATG
CCACACCAAA
AGGC'TAGTAG
AACTAGCTTT
CTTCT7=~C
GATTCTTAAC
ACTTGTCACG
GTCTCCATAT AAGTAAAGAC C7"T7'CAAAA ACTGGTAAGC CTTGAGACTG GCTGGTAGAT AGCCGCGATT AGCTCAGGAG CAAAATCGGC TTGTAGCCTT CTCGGTGATG TACTGATrrAA CTTTTCGAAT AGCTGGATAT GCCCCAAATC ATCCAAGTCA TGATTAACTT GAGACGGTTC TT'rCTTCATA TGAAACATCT TGGCAATATA ATCCGTATCC ACTTGGAGAT ACTGCGATTT
TCCTGCATCC
TGCTCGGCC
GTCTTrAACAT
AGGAGAGTTG
AAACTTTTAC
TGCTCCACGA
ACTTGAGCCA
TTAATAGCCG
GCATAACCAT
CCACGGAGAA
AATTCCTTTG
GAGTTAAGGA TTCTGCTGG CCAAATCTTG ATAATAAAAA TCTAATCTCC TCC~nrTTNCT GACGAGTTTC CTCGCTGTT TrM7CTGG GGCAGTCTGA TTCG'rAAAAC GGCCTCTTCT 'rCATGGCATA TrrCTCAT'rG TAATATTGAT GACTTCGTCC CTGTT'rGGGT AATGGTTCCC TT=AC7=~ AGCTTCTTTG GTTTTTGAGC AATTTTT-1TCA 636 ATAAAGGCCG CAACATCAGG GTCCTTCATG CGGCTGGGTT GACGTTTGAG T ACGTCTCCG AATCGAGCTA ATAGI"CTG CTTCTTACGT A'IrTTCTGGA
ACACT?CCA
TCTAGTTCCA
TCA'rCrrAT ?TcTG1-N-1-
GCCGAATGAA
A'rATTTGCCG
AGTAAGCCCA
TTGCGTGTTT
ATAATGGTCG
CGCATGCGTT
ATTCAGGAT'r ACTCCATTTA GGAACATrGG GTGTTTTTGC TTT"CTGCCCT CGATCACGAA TCT'rTTGATA GGCATAGTCA TTGGCTACCT AATCCACCTT ATTAAAGGTC AATAAGAGAA AGCCAGCCAT CTGTTGCAAC AGTTCTC'TTT GCTTGATTTC TGCTAAGAAC TGCAGGGCAG CTTCCTTAAG ACTAAAGTCA GAGGAAACTG TGGTTGAAAT AACCTGGGAA ACAGC74GTTG TCCATTTCTT CTCCTCGGCA ATAGCAAAGA AACGAAGTCC ATCTCGAGCC ATCAGCTGGC CTTTCTTC -r GAGACCAAGG TCTTCTTGAC GATTCACTGA GACAGGTA'rT TCTTCACCAT ACTTGGCCAA TGATAGGrr TCAAACCAAG GGTTTAAGAC ATCGGACTGC TCATCCGCAA GAAAATGT'rC CAAGTCAAAA TCAT'rGGCCA TGCCTAGTTC TGCCAATTCT GC.PAAGACTT CAGCAC~rrC AACTTrTCA.AA TCCTCCACAG TGCGATAAAC AGGATGCCCC AAGAAGTCTr GATAAACATC ACCC IGA TAGAGGGTCA A'NTTTATCAG TCTATCCATC CCAAAGTTGA CTACATCGCC AATCTTTTC GACTAGATAG AGGAGCATGG AGAGATTAAA AGCAGATAAG
TCTAAGAGTC
AGGGCTAGCT
ATTTTCAATG
3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5066
GATGGTTGAG
CCTTTCTACC
CGATAATCGG
TTAGATAAGA
GGGTGATTTG
ATTATCCCAA
GAGGTAGCAC
AAAACGGTCA
TTGGAGCAAG
AAACTGATTG TATAAAGATA TGTACCAGAG ATGAGGTA'rC ATTGGCTTCA T'rTATCTTTC CTCTCTAACT CACTGACATC ATCTCGTCCA ATTCAGCCAA TCATTTTCAT TrCGACCACG TCACTTGACA CAGGACGTTT AATGCTTGAA AAAAGATATT AAGGCTCAGT GCCTCC'rGAC TTGCGACACC CGArrATI-CT CTTTTTCTTT TTAGAGGACT CTTAAAACTA CGATAGACAC CTCCTCCATG ACGAGTGA-AC GAGTTTCTGT TCGATACGAT CTGGGCTGAG CGGATAATCC TAGCAAAACG TACATAGGTA CAATGTCCTC ACTTTGAATT TGACTACCAT GTTGATTTCA CATTAAAGAT TTTATCTCTC AGGTTCTTTC TTCTACTCGT GTCTTCTACG AA'rGGTGTTC TAGCCCCACA TTTTGCACAG GAGAATTGTT CCCGTGTGCC ATCTr'rTTTA ACAACCACTA TCGTAGGTTG TAAAACGGTG TTGGCATTCG TCGCACTCAC CCTTCrrCTG CTTGGCGACT ATCGATAACA CTTGACTTGG
GGTACC
INFORMATION FOR SEQ ID NO:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9607 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
CACTTGAAGT
GTCGTGTTGG
CACTTGGACT
ACCGTCTTGC
GTGAAGATAC
ATAGGATGCG
ATTTGAAACA GCTATGGAAA ACATCATGCC TGGTTCTAAC TACCAAGTCC CAGTTGAAGT TCGTTGGTrG GTAACAATCG CTCGTCTTCG
AAAAGAAATC
TCACCGTATG
AAAGCGTTAA
TTGGATGCTG
GCTGAAGCTA
GAAAGTCCCA
CTTTTTCTC!C
TGCAAGGTAA
a a 'rTGCAACCAA TGAGATTCAT GATGCTAGGA ACGGTAAGGA TACAAGGAGT TTTATCTTTT TCACGCAGCA TTTAGCCCGG GTTCAAATTA GCTAAATCGA TCGTAAGTTG AAACCAACAA TAGCATGAAA GTTTTAT'A AAATCGTGTT ATAATAGAAT CTCATGGCAC GCGAATTrTC ACTTGAAAAA GATGCCGGTA AAACAACAAC TACTGAGCGT ATCGGTGAAA CTCACGAAGG TGCGTCACAA GGTATCACGA TCACATCTGC TGCGACGACA ATCGACACAC CAGGACACGT GGACTTCACA GATGGTGCGG TTACCGTTCT TGACTCACAA TGGCGTCAAG CAACTGAGTA CGGAGTTCCA ATCGGTGCTG ACTTCCTTTA CTCTGTAAGC CACCCAATCC AATTGCCAAT CGGTTCTGAA AAGATGAAAG CTGAAATCTA TACTAACGAC
CTAACAACAC
ACCCTGCATT
GAGAAAATAG
AGACTTTr'AG
AAATAGGAAAA
TCCCGT'rCCA
TTAGTATTAG
ACATTGAGAA
AGAAATCAAA
ACTCGTA-ATA
ATTCTTTACT
ATGGACTGGA
GCTCAATGCA
ATCGAAGTAC
TCAGGTGTTG
CGTATCGTAT
ACACTTCACG
GATGACTTCC
CT'rGGTACGG CTGACGCAGT AT'rCGACGAA GCTCACATCG GCTAACTAAC CTATAACTCA GCTTACCATC CGGGTAGGTC CTGCCTATCC AATAA.ATAGG AGAAACAAAC TCGGTATCAT GGCTCACGTC ACACTGGTAA AATCCACAAA 'rGGAGCAAGA GCAAGAACGT ACAACCACCG CGTAAACATC AACGTTCTCT TCGTGTATTG AGCCTCAAAC TGAAACAGTT TTGCCAACAA AATGGACAAA ATCGTCTTCA AGCAAATGCA GTGGTATCAT TGACTTGATC ATATCCTTGA AGAAGACATC TGTACI-rGAA GTACGTGCAC TCGTCCAGAA CGTCGTACAA TGGTGAACAC ACAATGCAAG TGGTGCAGCA GTTAAGAA.AC CGCACACTTC CGTTGGTAAG GGAATCGAAG CAGGTTGCGG CTTGAGCTCA ACTAAATCAT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 CCAGCTGAAT ACCTTGACCA AGCTCAAGAA TACCGTGAAA AATTGATTGA AGCAGTTGCT 638 GAAACTGACG AAGAATT'GAT GATGAAATAC CTCGAAGGTG AAGAAATCAC TAACGAAGAA TTGAAAGCTG GTATCCGTAA AGCGACTATC TCAGCCTTCA AAAACAAAGG TG~rCAATTG Ar.CCCACTTG ACATCCCAGC AATCAAAGGT AACGTTGAAT TCTTCCCAGT ATTGTGTGGT
ATGCTTGATG
ATTAACCCAG
GCTCTTGCCT
CGTTATCGA CTACCTTCCA ATACAGACGC TCAAGAAAr 'rCAAGATCAT GACTGACCCA
CGTCCAGCAT
T'rCGTAGGTC
GTATTGAATA
CTGACGAAGA GCCA'TrrGCA GTTTGACATT CTTCCGTGTT CTrCTAAAGG TAAACGTCAA AACAGCCGTC AAGAAATCGA AAAGATACTA CAACTGGTGA ATCAACGTTC CAGAACCAGT
CACTGTTTAC
CTCATTCACA
TATCCAA'rrG TAC1'CAGGTG 'N'TTCAATC AGGTTCATAC CGTATCGGAC GTATCC7TCA AATGCACGCT TCAGGTGATA TCGCTGCTGC CGTTGGTTTG GATGAAAAAG CTAAAATCAT CCTTGAGTCA ATGjGTTGAGC CAAAATCTAA AGCTGACCAA GACAAGATGG GTATCGCCCT TCAAAAATTG GCTGAAGAA.G A'rCCAACATT CCGCGTTGAA ACAAACG'N'G AAAC~rGTGA AACAGTTA'rC TCAGGTATGG GTGAACTTCA CTTGTTGATC GTATGCGTCG TGAGTTCAAA GTI'GAAGCGA ACGTAGGTGC TCTTACCGTG AAACATTCCG CGCTCTAC'r CAAGCACGTG GAT'rCTTCAA CCTTGACG'rC
TCCTCAACTA
ACGTCAGTCT
GGTGG'rAAAG GTCAATTCGG TGATGTATGG GGATTCGAAT TCGAAAACGC AATCGTCGGT GTTGAAAAAG GTNTTGGTAGA ATC'rATGGCT GACGTTAAAG CTAAGC~rTA TGATGGTTCA TTCAAGATT'r
CTTGAACCAA
GGTCACGTAA
ATCGT2'CGTG CGGCTTCACT TTCCCTTAAA TGATGCTTGT1 AACAATCACr CTGCTCGTCG TGGACGTGTA CTTACGTTCC ACTTGCTGAA ATTGAATTTA CTCCAAACGA AGAAGGTAAA GGTGT'GGTTrC CTCGTGAAT'r TATCCCAGCG AACGGTGTTC TTGCAGG'rTA CCCAATGGTT TATCACGATG TCGACTCATC TGAAACTGCC GAAGCTGCTA AATCAGCACA ACCAGCrATC GTTCCAGAAG AAAACCTrGG TGATGTTATG GATGGTATGG AAGCACACGG TAACAGCCAA ATG'rTCGGTT ACGCAACAGT TCTTCGTTCT GTATTTGACC ACTACGAAGA TGTACCrAAG AAAGTGAAG ACTAATCCGT CCTCACTCTA CTTTACAAAA TACCTCTAAA TATGGTAAAA 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 GCATCTCAAG GACGTGGTAC TCAGTACAAG AAGAAATTAT GAACGAAGTC ACTTAGTGGC TAGTAGAAGA ATAATGTGAG ATTGGGGATG CCTGCTGAAA
ATTCATGATG
TAAGAAAAAT
TTCCTTT=GT
GAAAATGAAT GTCAAATAGT TTGAAATTT 'rGATGAATCA TGAGACAGGC TCCTCCTTTA GCACAGGCCA ATATTGAGCG AGTTGTGGTT CATAAAATTA GTAAGGTATG GGAGTTTCAT T'rCGTA7-1rM CTAATAT'r= ACCGATTGAA ATC=M1'AG AATTAAAGAA AGGTTTGAGC GAAGAAT'T CTAAGACAGG CAATAAAGCT G7TTTAAA TTAAGGCTCG GTCTCAAGAA TTTTCAAATC AGCTCTTGCA 639 GTCCrACTAT AGGGAGGCr'r TCTCTGAAGC TCCATGTGCT AGTCAAGCT'I TTAAGTCCCTr TTATCAAAAT TTGCAAGTTC GTCCTGAGGG TAATCAGCTA T1TATTGAAG GATCTGAAGC GATTGATAAG GAACATTTA AGAAGAATCA 'rCTCCTAAT ?TAGCCAAAC AACTTGAAAA G irrGGTT~r CCAAC=rTA ACTGTCAAGT CCAGAAGAAT GATGTCCTGA CCCAAGAGCA GGAAGAGGCC '!rMCATGCTG AAAATGAGCA CA7rGTCAA GCTGCCAATG AGGAAGCGCT CCGTGCTATG GAACAACTCG ACCAGATGGC ACCrCCTCCA GCGGAAGAGA AACCAGCCTT TGArrTCAA GCGAAAAAAG CTGCAGCTAA ACCCAAGCTG GATAAGGCGG AGArTACTCC TATGATCGAA GTGACGACAG AGGAAAATCG TCTGGTATI'T GAAGGGGTTG TTTTTGATGT GGAGCAAAAA GTGACTAGAA CAGGTCGCT TTTAATCAAC TTTAAAATGA CGGACTATAC TTCAAGTTTT TCTATGCAAA AGTGGGTTAA AAACGAGGAA GAGGCCCAGA AGTTTGACCT CATCAAGAAG ATTCTTGGC TCCGAGTTCG AGGGAATGTG GAGATGAATA ACTTCACACG CGATNTGACT ATGAACGTAC AGGATCTGCA GGAAGTTG'N' CACTATGAGC CGAAGGATT CATGCCAGAA GGTGAGCGTC GGGTTGAGTT TCATGCTCAT ACTAACATGT CGACTATGGA TGCTTTGCCA GAGGTCGAAG AGATTGTTGC TGCTATCACG GACCATOGGA ArGTCrZAGTC AACAGCTGCT AAGTGGGGAC ACAAGGCGGT CTTTCCACAT GGCTATAAGG CGGCTAAGAA 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 AGCGGGAATC CAGCTGATCT ATCGGATGGA AGCCAATATC CGTCTATAAC GAAGTGGAGA TGGACTTGTC AGAAGCAACC AACGACGGGA CTTTCAGCTA CAAGGGGAAT GTTATTrGCTG CTTTACTACA GAGTTAACTG ACAAGTTTTG CAAGAATTCC TCTATAATGA CTTGATTCAG AATTTGATGA ATTTATCAAT GAATTACAGA TGATCATC TC AAGAATTTTG CAAGGATACG GTGGAGGACC GTGTCCCTAT TACGTGGTCT TTGACGTGGA GTTGCGGCTT CTAAGATGTA CCTrGGGCATC CCTTGTCAGC AAAAATGCCA AACCACTAGA GTCCTAGTTG CCCACAATGC TACCTr'rGAC GTTGGCTT'rA TGAATGCTAA TTATGAGCGG CATGATCTTC CAAAGA'rTAG TCAGCCAGTT ATTGATACGC TGGAGTTTGC TAGAAACCTC TATCCTGAG? ATAAACGCr-A TGG7TTTGGGG
CTACGATGCG
ACATGGTGTG
CCTTTGACCA
GAAGCGACTG
ACCGATT'TAG
AGCG'TrTGG TGTGGCCTTG GTCGTCTCCT T'rTCATC=T CTAGACTCAA CATTGATCTA GAACATCACC ACATGGCCAA ATCAAAGAGG TAGCAGAAAA ATCAGTCCAG ATTCTTACAA AAAAGCTCGG ATCAAGCATG CGACCATCTA TGTCAAGAAT CAGGTAGGTC TAAAAAATAT CTTTAAGCTG GTTTCCTGT CTAATACCAA GTA'N'TGAA GGAGTGCCAC GGATTCCGAG AACGGTTCTA GATGCCCATC GAGAGGGCT'r GATTTTAGGT TCAGCCTGTT CAGAGGGTGA AGTTTTGAC GTGGTCGTTT CTCAAGGTGT TGATTTTATC GAGGTCATGC CACCGGCTAT CAAGGATATG GAGGAACTCC ACACCATTAT TGGCAAGCCT GTTCTG =TA CCCCAAATGT TCGTGAAATT ATCGTCCGTA GN'TGGGACA TGGTGAACAT GCCCAACCAG CACCACI-rCC G7"rGGATGAA TTTGCC7TTT TGGGAGAGGA CAATGCCTTG GCAGAAATAT TTGAATCCCT TTTCATCGAC AAGGCTGAAG AAACAGTTGC TTATGAAAT CCGCTGCCAG ATATTGTTGA ACTGGGAAT GGATTTGCTG TGATTI'ATCT TGAACGGGGT TATTTGGTTG GT'rCTCGTGG GATTGGGATT ACCGAGGTCA ATCCTCTCTC CAGTGAGTTT ATCACAGATG GTTCGTACGG TCCAAACTGT GGTCACAAAC TCAGTAAAAA TGGCTTTTGAT GGGGATAAGG TTCCTGATAT TAGCGCCCAC T'rGGATGTGC GTGATATCTT GGTTGGTACG GTAGCTCCCA AGACTCCCTA 640
GGATCGCG
crATGcAcci: CAAGAGTIr.
TCACTATATC
GGGI'GCGATG
AAAGGCTCAT
ACTGGCTCGT
TGAAGTCGTT
TGAGTTGACC
-rTGCGGATT
GGCATCGCAG
GTTGA.GrTGG CCAACTATTA TTGATTGCCA AAGAGCAGGT ATAGACGTTG GAGACCGCCr GAACCGGAAG AAGAGATI'TA ATTAATCGAA CTATCGGTCA TTTCGAACGA CTAATCAGAT AAACTGGTTA TTGAAAACAC AAGGGTGACT TGTATACGCC TATAAGAAAG CTTCAGAT GAAAAAGAAT TAACATCCAT ATGCTGGTGC AACGTTCTAA GTCTGTCGGA TCTAGTTTCG TTGCGACCA'r TCCTCACTAT GTCTGTGGTC AGTGTCAGTA
TTCAGGATTT
CGGACAGGAT
TGACTTGAAC
TGGTGAAGAA
TGGAT'rTGTC
ACGCCTCGCT
GATATGCCCC ATAAGGACTG ATTCCGTTN'G AGACCTTCCT T'rCTCGGGAG AAGATCACCC TATGCCTTCC GTGCGGGAAC AAAGGTTACG AGCGAGATTA CAAGGAGCGG CGGGTGTCAA 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 'rGGCAAGTTT
GCCACAACA
CGATTTTACG
CTTTAACTTC
'rCCGACTATG
GGATGACGAA
ACAAAT'rGGA
TGGAATGGTA
GTCCCACGGT
GGACCTATCG
TCTGGAACCT
GATTTCAGAA
TATCGTGATG CAGAAGTAGA GCCCAACACC CGCGGGGAAT CCTG'rCCAGT ATCCAGCAGA CACGATATCG ATGAGAACGT ATTCGAAAAC TTCAGGATTT GGCGI'GATGC CACTCTTTTC ACGCCTACGG GTATGT'rGcG GACGAAACCC ATCCGACAAC ACTGATGTTT GG7TTGGGGAA ACTGTTATCC CTTGTCGGGA AAGATGGCCT TTACCATTAT GAGGAGAGAA ATGGCTATAT CGTTCTTA'rr CCGAACTACA TGGATGTCTA TGATGTCACG GCTGAATGGC AGACCACTCA
CCTCAAACTC
GTC'TCGTATT
GATGTACTGG GACATGATGA GACCCTAATA AAATTCCTAT TGGGACTGAT CTGCTAGGGG GATTCCAGAG TTTGGAACAA CTTTGCGGAA TTGCTTCAGC TGCTCAGGAT CTGATTAAGC
TAACACCTGA
ATTTCGTACG
TGTCTGGTCT
AAGGAATAGC
CGACATCATG GTTTACCTCA TGCATGCGGG GGAACGGGTA CGTAAGGGTT TGTGGCrAAA CGAAGCAATG AAGGCTAATA AGGTCCCAGA GTGGTATATC GAATCCTGTG GGAAAATrAA GTACATGTTC CCTAAGCCCC ATGCGGCAGC 6780 CTACGTTATG ATGCCCTTGC GTGTAGCTTA CTTCAAGGTT CACCATCCTA TTTATTACTA 6840 CTGTGCTTAC TTCTCCA'rC GTGCTAAGGC 7TTGATATC AAGACCATGG GTGCGGGCTP 6900 GGAGGTCATC AAGCGCAGAA TGGAAGAAAT CTCTGAAAA CGGAAGAACA ATGAAGCCTC 6960 TAATGTGGAA ATCGATCTCT ATACAACTCT TGAGArrGTC AATGAGATGT GGGAACGACC- 7020 TTTCAAGTTT GGTAAArrAG ATCI'CTACTG TAGTCAGGCG ACAGAGTTCC TCATCCACGG 7080 GGATACCCTT ATCCCACCAr TTGTACCAAT GGATGGTCTG GGAGAGAACG TTCCCAACCA 7140 ACTGGTGCGG GCGCGTGAAG AGGAGAATT CCTCTCTAAA ACAGAACTAC GCAAGCGTGC 7200 'rGGACTCTrCA TCAACCTTGG TTGAAAAGAT GGA'rGAGATG CGTATTCTTG GAAATATGCC 7260 AGAGGATAAC CAGTTGAGTT TGT'=GATGA G~rTT7MAA AAAATGCrr AATAATCTAT 7320 TAAAAGAGGC 'rAACGTATAT CCAATAGATT TACATTAGCT rrCI'TT1TG TTAAAATAGT 7380 CTATGGAAAG AGGGTGAGAG TATGTCAAAG ATGAGTATAA GCATCCGTCT GGATAGTGAG 7440 *GTTAAGGAGC AGGCCCAACA GGTGTTTACT AATCTCGGAA TGGATATGAC AACAGCTATT 7500 *AATA7MTCC T'rCGTCAGGC AATTCAATAT CAGGGAT'rAC CTTTTCATGT TAGACTAGAC 7560 .GAAAATCGGA AGTTGCTCCA AGCGTTAACG GATTrAGACC AAAATCGTAA TATGACCCAG 7620 *TCTTTTGAAT CAGTCTCAGA TTTGATGGAG GACTTACGTIG CTTAAGAT'rC GTTATCATAA 7680 *ACAGTTTAAA AAAGATTTA AGTTGGCTAT GAAGCGTGGT TTGAAGGCAG AATTATTAGA 7740 AGAAGT=rG AATTTCTCG TTCAAGAAAA AGAACATCCT GCCAGAAATC GTGATCATTC 7800 ATTGACGGCA TCCAAGCATT TTCAAGGAGT TCG'rGAATGC CATACCCAGC CAGAI-rGGCT 7860 *TTTGGTTTAT AAAGTAGACA ACTCGGAATT GATTTTAAAT 'TTCCTGAGGA CAGGCAGTCA 7920 CAGTGATTTA TTTTAATCTA TTTTAAGGGG GTTCTCATGA AACTAAGAAT ATTTCCGGAA 7980 *GATAAGCCGG CTAAGAAGGT ATTTGAATAT CAATTAGAAC TTGCTGATCG TACAATTCTT 8040 *CTATCGACAG CACTCTTGTC AGGTGCTATT GCTTTAGCAG GAATCTrC TCrTGAAA 8100 GAAAAATAAA AATAGAAAAG AGAAAACAGA ATGGTr'rTAC CAAATTTTAA AGAAAATCTA 8160 GAAAAATATG CGAAATTGT GGN'GCGAAC GGAATTAACG TGCAACCTGG TCACACIrTG 8220 GCTCTCTCTA TTGATGTGGA GCAACGTGAA TTG;GCACATC TAATCGTCAA AGAAGCTTAT 8280 GCCTTGCGTG CGCATGAGGT CATCGTTCAG TGGACAGATG 
ATGTGAT'rAA CCGTGACAAA 8340 TTCCTCCATG CCCCGATGGA GCGT -rGGAC AATGTGCCAG AATACAAGAT TGCTGAGATG 8400 AACTATCTCT TGGAGAATAA GGCTAGCCGT CTTGGAG'rTC GT'rCATC-rcA TCCAGGTGCC 8460 642 TTGAACGGAG TGGACGCTGA CAAGCTCA GCTTCTGC TA AAGCTATGGG AAGCCTATGC GTATCGCAAC TCAATCTAAC AAGGTTAGCT GGACTGTAGC GGACTTGAGT GGGCTAAGAA AGTCTTCCCA AATGCTGCGA GCGACGAAGA ?1'CCTTTGGG ACCAAATTT CAAAACTTGC CGTGTC TACG AAGCAGATCC TGGGAGGAAC ATGCAGCCAT TCTCAAGAGC AAGGCCGATA TGCTTAATAA TCAGCCCTTC ACTACACAGC GCCAGGAACA GATTTAACAC TTGGTTTGCC GTTTGGGA.AT CAGCTGGTGC TGTCAATGCA CAGGGCGAAG AATTCTTGCC
ACTTGCCATG
AGCTGCAGCA
AGCAGT'rGAT
TGTTAAGGCT
GGAGCATT'
AAAGAACCAC
AAATATGCCA
ACAGAAGAGG 'rCTrCACAGC GCCTGACTTC AAACCGCTrA GCTACAACGG AAATATCATT CAAATCGTAG ATATCACTGC TGAGAAGGGT AATGCGGGTG CCCG'rGCCTT GGGTGAATGT CAGTCAGGCA TTACCrCTT TAACACCCTT ATCGGrGCAG CCTATGCGAC TAGCGTTGTT GAAGCTGCAG GGCT'rAACCG TTCAGATGTT ATGGATATCG ATGGTArTCG TGAGGATGGA TGGGCA.AATT AAGGAGATAA TATGTTAGGA TTAGCAGGTG CTATGACCAA TCGTGGAGAG GGTTGGATCG GAGCCTTTCT AGGTCACTTG
GCCTTGGTAC
TTCGATGAAA
GATGGAGCGG
CACGTAGACT
ACGCGGGTAC
AGTATGTTCG
CGAATGGGAT
CTC'N'TGGAA
CAGATCCAAG TCCAATrTCT ATGCGTCAAA CCAC'rTGGCT AGATGAGCGA AGAGGAGCTT TTATGATTGG TTCTAACCAA CTCTTTTCCG TAATGGGAAT T'rGGTCTCCT AGTGGGATTT GTTTTGGAAA AATGTTTCTC CTTGGGGGCC AGTTTTA'rCA CGTCGTGCAG ATGGTTATOT CACTrCTACA GAAGGCATTA AGGTGACCTT TAAGGA'rGGA GATCAGGTTA TGAAAGACCT TaTcTrTGAA 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9607 120 180 240 GGAACAGC'7A TTATCCCAGC GATTTTAGGA GCCATGATTG TTTTAGCTAT TTTTTGGAGA
CGAGGA-A
INFORMATION FOR SEQ ID NO: 81: SEQUENCE CHARACTERISTICS: LENGTH: 14231 base pairs (B) TYPE: nucleic acid STRANDEDNESS: double (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81:
CTACAAGATA ATTCCAGCTA TAACATCCGC TATAATAGTA AGAGCGAGCT CTATGATAAG GCTCATTAGT TTCACCTCCT CTCACGAACC CATAGGAACG TAATCGGTAA CCGATGACAA AAATAGTATA CCACAATACA TTTAGATCAT CAAGGTCACT TAATTCTTGA AATATCAGAT CTAAGAGAAA AATCTTTAAA ATCAGAAAAA CGCATAATAT CAGGTGTGCA AAAACTTGAT ACTATGCGrr TTATrTGTGGG AGCCTCTGTT TTTAGGGrG TCAGCAGACA GAACGATACT GTATATATGr GACTGACTTC ACTTGATCAA TTTTCAAATC TCATTGAATA 'rCAGAAACCC AAGGTrrACT CCATTTTCTC C 'GAAATTGA GTTTTGTCC CTAAGAAAAT AATGTCATGT GGTGAATArr TGTAAATCAG CTTCGAAAA'r CTCTTCACAT CATGTCAGCT TCGTCTTTCC ATCAGTTCrA TCTACAACCT CAAAACAGTG TTCGAGCTG TGTACI-rGA GCAAGCTGAG ACTAGCTCC TATI'rGATTT- ATTCTCCATC AAATAATTCG ACTGCGTCTA ATAATTTTG ATCTGGCACG GTGTCTGAAA TAAAGGTTGT GTATTNGGAG TCCAGTCTTG TAAAATTTAG AACTATCAAT CAGTAAGATG AATATTC7'Tr TrGAAATAG CTTGGCTGAG AGAAGCTTCA ACCTCTTGCr GAACAAAATG CTAAATCGAT ATTAAAATGA ATCATAGTTG ACCACGGAAC AGGATTGATC ITTTGACCTCG AGGGGATTAA TTTTAAAAAA GTTTCATG.GG CNr7GTCAAT TAAACATATT GGTCATCAAT TCTAATAAAG AA'rTTTCCTT CCAGATGTGA TAAAGATTTT' GGAGCTATCT TTAACAG~rr CAGATAGGGT T'rGTGCAGTA rGTAAACCAT TTGTAAAAAT
AATCAAAT'rA TCAAGTTCAG AAAGATAGGG ACAGAGTTCG TAGACAGTAG
TAGATAGATA
TTGCCTAGTA
CACATACCAG ACCGAATAAA GTCTTTAGCG AGACTAGCGA CTTTCTCCTT CACGTA7MT ATGAGAAAGT TCAATTGTGT
TACTAGAATC
TTAGTCTT'r
TCATAGAGGA
CAGGGTCACG TATCCGTGCT TTCTTTTGAT AAGACCTTGA TTTTCTAAGA AAATTAAATC ACGACGTAAG GTACTTGTGC TGGAGAAAGT GATTCTGCC AGCTCTTTTA CGGCAATTCT TTTTTTCTT TTGATAATTT CAATCAATTC AAGTACACGT TCA'rC7 A TCATAAGCTC CTCCTAATTT ATCATTTCAA CTATATTATA GCACAAATTG GAGGAATTTG AATTATTTTTT ATGAATATTG GGTTAACATT TGAACATTAT TCAAGTAAGC GTTCACATAT TGAAAAAATA AAACGTGGGG ATTATAATAA AGT'rAATCMA GCACGAAGAG AGAAGAAAAA TGGAAGCGGT TTTAGCAATA GATTTAGGTG CGACTCTGG AAGAGCAATC GTI'GGTTACC TT'rCTGAAAA TAAACTAGTA A'rGGAAGAAA TAAATCGCr TTCTAATCTA CCTATTAGAG TAAAAGGGCA TTATCTGG GATATT-GACT TTCTACTAGC TAAAATTCTr' GAAAGTATCC GCTTCGCTAA TACrAGTTAC AAGAT~rAT CTATCGGTAT TGACACATGG CGAG~rGATT TTGGACTGAT 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 a.
a a TGATAATGAA GGTAAGCTGT TATTACAACC TGTTCATTAT AGTGTTAAAG GAAATATCTG AAATGACTGA ATTAGAAAAA TCAGATTATG GAGATAAATA CCTTGTTTCA ACTCTTTAAG CTCTTTCTAT A.AGACCAATA AGATTCTTTT AATGCCAGAT CGTGATGAAA GAACAAAGGG CTGTATTCAG AGACAGGAAA GCACGTCAAG AATCTCCTGA TTGTTTAATT ATCTCTTGAC ACGTAAGTTT GCTACAGAAA TICAAAATTGG AATCAGAATA 644 AAAGCATTGC TTCAACAACT CAATTATTTG ATCCTAGGAG TCrAAAACT A~rTGAAT'rG GATTCATCTT TACTTCCTGA AATTGTTTCA GAGGGAAATG TTCTTGGAAG GATAAAAGAG GAC;TATGGTT TAGCGATAT TCCTG~wrGTG AATGN'rTGTA GTCATGATAC AGCAAGCGCG ATwlGTCTCAG TACCTAAGAC AGAACGTAGT TTATTTATTT CATCAGGTAC 71rGTC=rG G7TGGAGTGG AACTTAC -rC ACCGATTCTT ACTACCGAAT CCrCAGTTA TGGATTTrACA AATGAAGTCG GTAAAGATGG AGTGATTACA TTTCTGAAGA ATTGTACAGG GTTGTGGATC ATAGAGGAAC TAAGACGTTC A7*rrGAACGA AGAGGGAAAG CCTATTCTTT TGATGATATT AGGACAATGG TGGAGAAAGA
AAAAGAAAAT CTTCCTCTGA TTGATACTGA GCACAAGACT TTGACAGAAT ATCTAGCTTA ACAACTATTT' AAGATTGTITT ATGAAAGCCT ACTAGAAGAA CTAACTCATA AGGTTTATAA AGCCAGTTAC TTTAACCAAA TGATTGCTGA GACTGAGGCT ACAGCTGTGG GGAATATTGT AGGGATGGAA GAGGCTCACC ATGTTATTGA CCAAAAGAAT TAAAAAGATT GAGAGTTTGT TGI'GCAGGCAA GGGGGGATAA TTGGTGAATT GACAAGGATG TCAGATGTAA AACAAGAATT AGATrTGACG AAAGGAACAG GTGGGAATCT GGCAATTACC CCGTCGGGTA TTGATT'rCTT GGATATTAAT GGAAATGTTG TAGAGGGAGA TTrGATTCAA TATCAAACTC GTGATGATAT
AGCTCAAACG
GAGGATATAT
TAGAACTGGT
TATAGGAAAG
GTGATTGGAG
AAAGAGGTTC
ATCAACTGAA TTTGCAACAG TCATCATGAA ACTAGAGAGT
AATCTGATAT
GGACAGATGG
CGATAGAGTT
GAGGrGCTAG TTACAGGTTTr TGTGCAGCTC ATAGCTATGG GACA.A'TAAA GGAGTTTCTA CAATTAGAGA GTTATTACTC AAATrTGCCT CCCTCCCCCT TCTTAGCTTT GAAAAATATT TAGTGTTT-rG ATATGAGGAG AATTAAATAT GGTAAGAAGC TAGTAGAAAC CAGCGT'rTTC GATCGTGAAA AACAATTGAT TGAAATCAAA GAATCCGATA TTGTAGTGAT ACGCTTGCCA TCTAGCGAAT GGTATATGCA CGATGCAATT ATCCA'TGCTC ArACAACTTA 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 TGCAACAGTA TTAGCTTGTC TCAGAGAACC ACTTCCAGCG AGTCATTATA TGATTGCAGT GGCAGGGAAA GATGTTCGGG TAGCTGAGTA TGCAACATAT GGCACGAAAG AATTGGCTGT GAATGCAGCT AAAGCAATGG AAGGTCGTAG AGCAGTTTTA CTAGCGAA'rC ATGGAATTTT AGCAGGTGCA CAAAATTTAT TGAATGCATT TAATATTGTT GAAGAAGTG AAAAATTTAT TGTTTAGCTA AGAATrTTGG AGAGCCAGTA GTTCTTCCTG GGAATTGATG GCAGAAAAAT TTAAAACATA CGGTCAGAGA AAATAGGGAG T'rAAAACATA TACCGAAAAA TA'TrCTCCA GA'rTTATTGA AGACTTTAAT CATGGAGATG AAATAGTATT AGCTGACGCG AATTATCCTT CTGCCTCATG
AATATTGTGC
ATGAGGAGAT
GATATTAATG
GGAAATGGGA
TGCAAATAAG
645 CTAATTCGTT GrGATC.GTG AAATATTCCA GAATTATTAG ATTCCATTCT GTATTTAATG CCATTAGATA GTTACGTCGA 'rAGTTCAATT CAGWTATGA ACGTTGTTTC GGGTGA'rGAT ATTCCTAAGA TATGGGGTAC CTATAGACAG ATGATTGAAG GTCATGGTAC AGATCTTAAA ACGA'rTACTT ATCTTAGAAG AGAAGACTTT TATGAACGTA GTAAGAAAGC TTATGCTATT GPTGCTACAG GAGAAACTTC ACTTTATGCT AATATTATCC TTAAGAAAGG AGTAGTrTGTT GAAAGAGAAA ATCTTCAATA GAGGAATTVr AGTTGCCAGT CATGGTAATT TTGCTAGCCG AGCTrCTCATG ACCGCAGAAA 747rTTGTrGC TGAGACAACA AGGTTTGATG CCTGGAGAGA ATATTGTAGA GTTTGAGCAT 'rGAACTCTTA
TAATAATGTG
TAATATCCCT
AATTrc~rcAC
CGAAGAAGAT
CTTGTCAC'rA
ATCTCAGAAG
GACTCAAATC AAGAGGTTAT CGr=TACT GCI'TTGTCAC CGGlr'AAA TTTGGATTrCA CTCCTAGTGG AATTAATATC AAGTTATGAT AATGCTCAAA ATAGI-rTGTT TIAA'rGTTAAA TTATGTCTAT AGAGTTGTT CGTATTGATG AATGATAGAG 71TAGGACATT TATTTTAAAA ATCAACTGGA GACTTGATTG GAGGAAGTCC GTTGATATTG TAACAGCCTT TCAAAAATCA ATTT-AGAAGA CAACAACTTA ACGTAGACGA ACCGTCTGGT ACATGGTCAA CGTGGCTAAA AAAGTATGAT ATTGAGCAAG TTATCATTGT ATAAAACACG ACAATCTATT TTAAAGAMrT CTGCACCGGT
TAATGATCGC
AGGTTrTAAAA
AATAAAAAAG
AGGAAATTTA
3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 ATTGTTTTCT T'rAGCTAAA ACGGTTTGTG AGAACAATGC TGATATATAC AAATCCAAAA AAATTGGAGT ACCTCAATGT AGGACAGATG GGAGGTGTAG CTCTAGGTGA AGAAGACAAA GAAGTTTT~AA ACTCTGTGCC GATGTGTATG A'rTCTATTGA AGTAAAACGG AGGAAAATGA AAAGGTAACG TATTATTT'TA AGAAAATAGT TGATAAGGGA ACGAGAGTTG AAATTCAAAT GGTTCCTAAT GATAAAGTTA CAATGTTGGA AAAATTTTTA TAAAAATAAT T'rAAGGAGGT ACAGTATATG CTATTCACAC AAGCATT1ACT GGTGACATTA GTTGGGAT'rA TrCCCACTAT TGACTATAAT GGACCGTTAT GT'rACAAGTG CAATGGTTGG CTTAGTAT'rA CGAGATTTCA TCAGCTCTTG AATTAACTITG GCTCGGTGTA ACAGGTA'rTG ACTAT'N'CAG GTGCGAT 'AT TGGTACTCGCA 7rGGTAT1-T TTATGATTCA CCGTCCGTTA CCCAAGGTGT 'TCtrATTGGT GAGGTTATAC TCCACCAGAT TATCTGGTCA AGGAGAAACT GCTGGTATCG CTATAGCACT 'rCCAATTGCA GTTGCTACCC AACAGTTGGA TGTTCTTGCA AAAACTTTAG ATGTTTATr'r TCTGAAAAAA GCTGATAATG ATGCTAAAAA CGGAGATTAT TCAAAGATCG GTTTrTATCA TTATTCAAGT TTGGTTIAA TCACGTATT TAAAATTGTA CCAATTTTCC TAGCTATTAT GCTTCGAGGG GAATATGTGG CAGACTTGTT TGCTAAGGTT 646 CCACCAATCG TTATG.CAGGG AC~rAACTCT GCAGGTGCTT TACTACC?1'C GCTATGCTr'r TAAATATGAT GCTCAAGAAA AATA'XGTGGG TATrCTTGTT ATTTGrCTG TG'rATGGAGG AATGCAACC ATTGGGA'rCT CACTAGI'TGG GCATACTTCT ACGATATGAT TGGAAGCAAA CCACAAGAAA CAACTTCAAG GAGGAGGATC T'rGATCTATG ATGAATAATA AAGTAACTAA AGTTG-AACTT 'rCAAACCAAG 7=~ATGTAT GC?7"?CAT GGAACrATGA GAGAATIGCAG TTCTATATAC AATTCTTCCA GTATTGAAAA AACrATACCC AGACAAAGAT CTGCAATGAA ACGTCACCTT GAGTTTTTCA ATACTCATCA AACAGCGGCA AATGGrr'
GATTGGA'TC
TATTGCGGTA
TAGTGATGTT
AAAAAAGT'
AACCTAGGTTr
TCAGCTTCTC
CCATTTATTC
TTGGAGTTAC TTCCGCTATG GTATTAAAGT TCGCTTGATG CACTAGTTCC TATCTG7"rT GTATCTTTAT CGCCTTAATA TGAAATATGG GTATACTAAG TGAATCGCGT TACGAGTATG CATCAATGGT TGGTATTAAT TTCAAGAAAT GATTACAAAA TGTGTAAATT AATTAGAAAA T'rGGAGTTAT TCTAGTTG.TT TGGGATATCA CCTCCATTTT GCCGCAAATT GGAATGAAC GGTTCAGCTA CAAAAGAATG TCAGATAGAA ATAATCCAGA GAAGAACAAG AAGGAAATGA GGGCCACTGG CTCGTCTAGG AG1'ATTGGTG CGTCTTATTC TTGTT'rAATA TTATTAATAT GGTTCTAGTC TTATCCAAGA GCGACAGCAT TAGGGCTAGT T'rTGGAT'rAG AA'rTTAAGCA T'rAAT'rCCAG GATT'TATCCC GGAAAGAATC CGGTTGTACT rrAGGA-ATTT TGAAGTAGTA CAAGAGAGG TAAAGAGTGA CTTGCCGATT GGGAACGG'rC TATTCAACTA AACGATGAGA CTCAC'rATTG CATCTTAAAA AGGTGCAGCT TCAATTACTG AGATAGTTTG TTCTGGCTGA TAAAGACGGC GGTGCTTTAG TCCTGTTAAA TATTTCGGTT AAATAATACA AAAGGAACAT ACTAGTGGG'r GGTTTCATTC GGGGGAACTT GT'rAT'TCTG TATGGCrTTTG ACT'rTATTAA AATCTT'rAGT GT'rATGCTA GAAAGTGTGG AGGTGGTATT AATTATGC'TA TAAGAAAGCT 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320
ATTTAGGTGG
CTATTTGGTA
AAATTCGGGA
CAGTGTTTGC
AGC.ATATAGA
TATGATTTAT
TAGAGGAAAG
ATATCTTTTA
TACCCCAAGA
TATTCAGTCT
GATGGAGAAA TTCAGAAAGC CGAAGAATTG ATAAAGTTAA GATCAAAGCC ACTA'rGAATT ACTTGGGGAA CTrrACATTG TGTGCTCTT CA'IrCTATGA AAGAGAGCTA GATTTAGATA CAGCTATTTC TAATGTTGTG TTTGAGCCTA ATAGTTGTAA TTTACAAATA AAAAGAGAAT ATTTACGAG TTTAATAAG AATATTTTAT GTTGCCGTAT AGTGTCATCA GTTCAAAACA CATTAAATT AAACATTAAT 7TGG'AGAA ATAAACGGTT TAATGACGAA GTATCTAAAC TGGATTCAAG TACAATTTA A'rGTCGGCCT CrGCTGGAGG TAGAAAAGGT GTTCAGTTTA AAGTAGTATG TCATTCTAAG GTTACGGATG GTGAAGTAAG TGTATTGGGA GAGACAATAG TTATTCGGAA TGCTACAGAG 647 GTA"rTCTTr ATCTCAAATC AATGACGGAT TATTGGGGAA ATATAGATAT CAGGGAGAAT TT-AGTAGTAT TGATTACTTT ACAGAAAAAG ATGAACATGT CAGGAGCAAT TTAATAGACT TGAT'rTAAA CTAGACTATA GTAAAGGI'G
TTCTTCTCTT
AAAAAAATAT
TCTTAGCAT'r TAACTrGTTA
ACCTGCCAAT
TACGATTAAT
CCAACGAATC
TTTCATTATG
CTCAAGGAA
TACTTCTTGA AAACACTAAA AAGTATAGTA ACTACTTGAC GAAGATA'rCr GTTAATATCG TCTAGTCAAC CGAATGGTTT TATGGTGTGA TGAATTAAAT CCAATTTGGG GTTCTAAATA 0 0 0 0 0 0 0 0 0 ATTAATACTC AAATGAATTA ?'rGGATGGTA GGTCCATGTG ArTTACCAGA AGTAGAATAT CCATTA?1-rG ATATGCTCGA AAGAATGAGA GAACCGGGAA GACTAACCGC TAAGAAAATG TATGGAGCTA GAGGTTTTAC AGCACATCA'r AATACGGATG GTTTTGGCGA TACGGCTCCC CAATCTCA'rG CCATGGGGGC TGCAATTTGG GTATTAACTA TTCCATGGT'r ATGTACTCAT ATTTrGGGAAC ACTATTTATA TTTCCAAGAT GAGCGTATTC TTACGGA-ACA 'TTTGAAATG ATAAAAGAAG CATTTCTTTT CTT'rGAAGAT TATTTA'r~rG AGGTGGATGG CTACTrGATG ACAGGTCCAA GTGTCTCACC GGAAAATAAA TATCGCTTAA AAAATGGTAT TGAAGGAAAT GCTTGTCTAT CATCTACAAT TGATAATCAA ATTCTAAGAT ATTTTTGTGA TTCATGCATT GGCATTGCAA AACAATTAGG AGACAATTCG GA'rTTTATTA GTCGTGTGAA GGAGTTAAAA AAGAAACTAC CTAAAACAAA.AATAGGTAGT AATGGGCAAA TCCAAGAATC GTTAGAAGAT TATGAAGAAG TAGAGCCTGG GCATAGACAC ATTTCACCTC TATTTGGGCT TTATCCTTAT 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 0 0 0 AATGAGATG ATATTCATAA AGGAGATTAT CAAACGCTAA 'rGGTTAGTAA GTGGTTTGCA CA7'TTTTTG CGAGACTATA AATAATGCGA CTCTTGGCAA TTAGGTTT.GG TGAGTGGAAT ,ZTAATTCCAG CTTTACCTTC AACTCCGGAA TTAGCAGAAG CAGCTAAAAT CACTATCAAT T1TTTTTATCT TCACAGGAGA GGGAGCAAGC GA'rrAATAAT TGCTAGTACA CAAACAGGTT GGAGTGCTGC ATGGCTGATT TCAAGGTGAA CCTGCTTATA ACCAGATTAA TGGTTTGTTA TT-rATTTCTT GACCATCCAC CATTTCAAAT TGATGGTAAT TTGTGAATTA TTACTACAGA GCCATCATAA TTGGTTATCA TGCTTGGTCA GAAGGAGAAG TGAAAGGTTT CAGAGTAAGA 00 00 GGAGGATATA AGGTATCGTT TGCTTGGAAA AATGGCGATA CGAGGAAACA AAGATCAAAA AGTAAGAGTA AGAATATATG AATATTGAAT TGGTATTTAA TTCAGAAAAA ATTATTGAGT ATGAATAAAG AAAAAATAAA AAGAAAATTA ATCACAATAT TTATGTTTTG GATTGTTAGC AGGAGTTAAG GCTGATAATC TAACATTCCT AAAATTGGAA GCAAAAATAC TGATGTACAA TAAATTTTTA GGTATAAGTC TGTTTGTATG TATTGGGATG GTGTTCAAAT GAGAACGACG 648 ATTAATAATG AATCGCCATT GTTGCTTT-CT CCGTTGTATG GCAATGATAA TGGTAACGGA 9120 TTATGGTGGG GGAACACATr GAAGGGAGCA TGGGAAGCTA TTCCTGAAGA TGTAAAGCC-A 9180 TATGCAGCGA 
TTGAACTTCA TCCTGCAAAA GTCTGTAAAC CAACAAGTrG TrrCCACGA 9240 GATACGAAAG AATTGAGAGA ATGGTATGTC AAGATGTrTGG AGGAAGCTCA AAGTCTAAAC 9300 ATTCCAG'PrT TCTTGGTTA'r 'ATGTCGGCT GGAGAGCGTA 7LTACAGTTCC TCCAGAGTGG 9360 TTAGATGAAC AATTCCAAAA GTATAGTGTG 7rAAAAGGTG TTrAAATAT 'rGAGAA -rAT 9420 TGGA'TTACA ATAACCAGTT AGCTCCGCAT AGTGCTAAAT A'rTCGAAG'r TTGTGCCAAA 9480 TATGGAGCGC ATTTTATCTG GCATGATCAT GAAAAATGGT TCTGGGAAAC TA'rTATGAAT 9540 GATCCGACAT TCTTTGAAGC GAG'rCAAAAA TATCATAAAA ATTTGGTGTT GGCAACTAAA 9600 AATACGCCAA TAAGAGATGA TGCGGGTACA GA'rTCTATCG TTAGTGGATT TTGGTTGAGT 9660 GGCTA'GTG ATAACTGGGG C'rCATCAACA GATACATGGA AATGGTGGGA AAAACATTAT 9720 ACAAACACAT TTGAAACTGG AAGAGCTAGG GA'rA'GAGAT CCTATGCATC CGAACCAGAA 9780 *TCAATGATTG CTATGGAAAT GA'rGAATGTA TATACTGGGG GAGGCACAGT 'rTATAATTTC 9840 ***GAATGTCCCG CGTATACATT TATGACAAAT GATGTACCAA CTCCAGCATT TACTAAAGGT 9900 ***ATTATTCCT'r 'CTTTAGACA TGCTATACAA AATCCAGCTC CAAGTAAGGA AGAAGTTGTA 9960 *AATAGAACAA AAGCTGTAT'r TTGGAATGGA GAAGCTAGGA TT1AGTTCATT AAACGGATTT 10020 TATCAAZ;GAC TTATTCGAA TGATGAAACA ATGCCTTTAT ATAATAATGG GAGATATCAT, 10080 ATTCTITCCTG TAATACATGA GAAAATTGAT AAGGAAAAGA TTTCATCTAT ATTCcCCAAT 10140 GCAAAAT'r TGACTAAAAA TAGTGAGGAA TTGTCTAGTA AAGTCAACTA TT1TAAACTCG 10200 C'rTTATCCAA AACTTTATGA AGGAGATGrG TATGCTCAGC GTGTAGGTAA TT~CCTGGTAT 10260 ***ATT'rATAATA GTAATGCTAA TATCAATWA AATCAC.CAAG TAATGTTGCC TATGTATACT 10320 AATAATACAA AGTCGTTATC GT'rAGATT'rG ACGCCACATA CTTACGCTGT TGTTAAAGAA 10380 AATCCAAATA ATTTACATAT TTTATTGAAT AATTACAGGA CAGATAAGAC AGCTATGTGG 10440 GCATTATCAG GAAATTTGA TGCATCAAAA AGT'rGGAAGA AAGAAGAATT AGAGTTAGCG 10500 AACTGGATAA GCAAAAArrA TITCCATCAAT CCTGTAGATA ATGACTTTAG GACAACAACA 10560 *CTTACATTAA AAGGGCATAC TGGTCATAAA CCTrCAGATAA ATATAAGTGG CGATAAAAAT 10620 CATTATACT'r ATACAGAAAA T'rGGGATGAG AATACCCATG TTTATACCAT TACGGTTAAT 10680 CATAATGGAA TGGTAGAGAT CTCTATAAAT ACTGAGGGGA CAGGTCCAGT CTC'rTTCCCA 10740 ACACCAGATA AA'TTAATGA TGGTAATTTG AATATAGCAT ATGCAAAACC AACAACACAA 10800 AGTTCTC'rAG ATTACAATGG AGACCCTAAT 
AGAGCTGTGG ATGGTAACAG AAATGGTAAT 10860 649 TTTAACTCTG GTTCGGTAAC ACACACTAGG GCAGATAATC CCTCI'rGGTG GGAAGTCGAT T743AAAAAAA TGGATAAAGT TGGGCrTGTT AAAATITATA ATCGCACAGA TGCTGAGACT CAACGTCTAT CTAA7ITTGA TGTGATTCTA TATGACAATA ATAGAAACGA AGrGCTa.AG
AAACATGTTA
AGGTATATTA
GTTTTTAGAG
AAAGTAGTCT
GCTTTAGCAG
ATAATTTGC
AAGTTAAATT
GGGTGAATCT GTTAGTCTAG ATTTCAAAGA AAAAGGAGCA ACTAACGAGT GGAGTGCCrr TGAGmrAGC AGAAGTAGAG AATCAGATGG TAAGCAATCr CTACAAATAA GGTAGCTACT TTGATGGTAA TA)AAGATGGA GAAGAGGATA TAGATAAAAT CAAAGTTCAA CCAATTATGA GATTACGGAC ATCATTCGGT
AACAGAAGAT
GGGTGTAGCT
GACTCA'rACT
AAGGCAGATT CTAACGCTTG GTTGATATTT ATAATAGAAC TTTCTATCrT CATCAGGAGA TTGTrATCTr TAAAAGTACC GCAGCTATTC CGTrAAGTTT AAACTTTCTA ATATrGCATT TTTTCTCGTC TAGCAGTTGA
GTGGCAGGTC
AGATGCCGAA
AGAAGTmrT
TTCTGTAGGG
AGCGGAAGT
AACAAAAGAA
TGGAAA'rAAA GATCTGGGAG AAGAGTTTAC GGT'rCTAAA CCTCAGCGTT TATCTAATTT TGATGTTATT AGAAGACATT TTGATAAAGT AGTTGATGGT
CATACCAAAG AAGA'rrCTCC TTCA'rGGTGG GAAAAGTTAA TTATTTATAA TAGAACAGAT ATTATTATAT ATGATTCAAA TGATTATGAA AGCAATAA'rC TATCCATAGA CTTAAAAGGA AGAAGCGCAG GAATTCCTTT ALAGTTTAGCA TAAAAATTAT CACCCAGGCT ACCGTAAATA AAAATAAGAG GAAAATAGTA TGATTCAACA TGG;TCGTCGT CAAGGTGTAC GCGAATCACT TGTGGCAGAT TTGATrCAA GCACATT'GAA GATTTCTCCA TCTACTATTG GCCGTGTACC 'AAAATCAAAT GTTTGCGCAA CAATTACAGT TATGGATATG TCTCCAGATA TTCCTCATGC AGGAGCTGTC TATCTTGCAG CTGTACTAGC
GCTAAGCTAG
GAAGTCTATG
ACTCGACAGA
AACGGAGATT
GAGATAGATT
GCTGAAATTC
GTTTrrACAC
CTGAAGGGAA
GAGGTAGAGG
TAATGGAGAT
TCCACGTATT
TGAAGTGCAA
ATATCCAGAT
AGAGGCTGCA
TACACCATGC
TATTTGGGGA
TTCACA'rCCT
TCAAAATAGA
GTTCAAAGAG
GTTCAACGGA
ATGGTCATCA
TAGCACAAAC
ATTAAAATCA
AACTCCGAAG
TTACAATGGT
TTCAGTGACT
CGAAGAATTA
AGAGATTATC AAATTTrGAT AACATATTGA CAGTTTAGAA AAAAGGTTAG AATTTCTTTG TTTATACTrA TAAGTAATTT GGTAGTATGA AAGAAACAGA GGGATTCGTC CGACTATTGA ACAATGAACA TGGCTAAAAG GGGGAACCTG TGGAATGCGT GCTTCCCATG AGTrGT=TAA TGGTGTTATG GTAGTGAAAC TTTAATGGGA CAGAACGCCC CAAAAAGGGA TTCCAGCC'TT 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 TGGGATTTAT GGAAGAGATG TTCAGGAAGC TAGTGACACA GATATTCCAG AAGATGTCAA 650 AGAAAAACTT TTACGCTATG CGCGTGCAGC TCTTGCAACT GGCTTGATGA GAGACACTGC TTACCTATCA ATGGGTAGTG TrDCGATGGG GAT1TGGTGGT TCTATTGTAA ATCCGGATT CTrCCAAGAA TACTTAGGAA TGCGAAATGA A'rCGGTAGAT ATGACGGAGT TCACGCGCCG TATGGACCGT GGTATTTACG ACCCTGAAGA AAACGTAAAA GAAGGATTCG ACCATAACCG AGATAGACAA TGGGAATTG TrATTAAGAT TAACCCAAGA CTTGCGAAC TTGG'1'TTT GA AGCTGGTTTC CAAGGTCAAC GTCAGTGGAC AACTTTCCI'C AATACTCAGT TTGACTGGAA GTTCGAACGT GCGCTCAAAT GGGTGAAAGA TGAAGACCTT GTTTTAAGCC GTGAAGAAAA G'rTCATGATT GGACGTGACT TAATGGTrGG GGAAGAAGCG GrrGGTCACC ATGCTTTAGT AGACCATTTT CCAAATGGGG ACTTTATGGA TGGTATTCGA AAACCATTTG TATTTGCGAC GCTCTrTAAT TATCTATrAA CAAATACTCC GAGCCCAGAG GCTGT'rAAAC GTGTAACGGG CTTCTTACAT CTAATCAACT CTGGTI'CTTG AGAGAATGAT TCACTAAATG GTGTGTCTA'r ACAAATCTTT GCTGATGTGC GTACTTATTG ACATACTTTA GAGGGTCGTG CTGCAGCTGG TACATTGGAT GGTACAGGTC AAGCTACTCG AGATGGCAAA CCrATTATGA AACCATTCTG GGAGTTGGAA GAAAGTGAAG CCGCGAATAC TTCCGTGGAG AGTAACAATG GTACG'rCTCA AGGTTACACA CTTGAACTTC AGGATGGCCA ACTACTrGGT CTATGACGTC ATGAATAATT AGCAGACTTG ATT'ACCTTGG TGAGGAAGAT ATCTTTAGAC TGCAGGCTAT GCTTGAAAAT ACAGACTTCC CACCAGCAAA GAGGATTCTC AACTCGTTTC TTGACGAAGG GGGATATGCC ATCTTCTAAA AGGGGTTGGT CCAGTGCTAC AAATTGCAGA 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 4040 14100 14160 14220 14231
CTGAAGATGT
TTGCTCCACG
GGGGAGCTAA
CTTCTATGTT
CTAAAAATTG
TCACCATACT TTAGATAATC GTACAGATCC TTTGACAGGA AAAGGTGCTT TCAAGTCTGT TCACGGAGCC ATA.ACATATG GACACA'rTGG GAGAATTCCT GTCAATATGC ATA.ATGTACC GTCCTTATTT GGAACAGAAG ATCTAGAATC GCCACTACAT AAATAAAACT TGTTTATATA 'rAAAAAGATT TGTTAAACAA 'rTCACAAA1A.
GATGTTAAAT AGATAGC6CG GAGGCGCAGG CGGCAGTCAA CCTTATTGGA AAAGGTG'N'G TTGGCTATAA AAAGGCACTT 'rTGGTGACAG AGCAGACTA'r CGTGCATGTC AGTTGTTGGG GGAGGTGAAC TTACGTCCCT CCTATCCTTT ATTGAAAACG AATACAAAAA GTAATATAAT AGGAAAATTA TATGGCTATA TTTTATGTTC TAALATGAAGT GGGTCCTTAT ATCAAGGAAC ATAAGTACAT CGAAGGCAGT GATATTTTAC CTAAGACTTT GAATCGAATA T AAAACCACTG GATACAGAAG
INFORMATION FOR SEQ ID NO: 82: (i) SEQUENCE CHARACTERISTICS: LENGTH: 16995 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82:
AGTTCTCTTA ACTTrTTTTAG GATGGCATTC TCCGCTCTCA GGTACTCATT TTCTGCTgAA GACGTTCTAA TTCTGTCCrC TCITCAGGTC TCGTTTTTGG CTTACGTCCC ATTrTAGGTA CrCTCCCTCT TGPTCTCA ACAATAGTAT ACCCGTTTT CCTGTATTGT GCTAGCCAGT TAAGAAGTAT CGTACGACT'r GGGAGACCGT ATTCAAGAGA AACTCTATCr TTAGTCCAGC CTTCATGTCA GACTTTATTA CTCAT'rTCTr G=mAAATC AGGAGAATAG TAACGATTTT TTCCN-T-rT GACGAACTCTr ATTCCGTAAC GATCAATCAA TTTAATCATG TACCTAATAT TAGAATTGCT TATCCCAAAT TTATTrGAAA GCTTCTCTAA GCTATATCCT TGTMTCTAA GTTCATAGAT CTGAAC=~A TCATCATAAG TTAGTrTCAT AATAAAAACA CCCCAAAAGT TAGATTTT= CTGTCTAAC T TTTGGGGTGT AGTTCATGTA CACCTGATAT GATGCGTTr'r ATAAT'TTTA AGCCTITTTG CCCAGCCTCG TCAAAAGTAA TGTTI'TGACA CAAAATCTGT GACAAAACTr TAGTTTTAAA ATGA'rCTAAT GGAAGAAAAA 7TrrCAAATAT GGTTATGCCC TCTTTATCGC TGATGGCTAT TAACGTATTT ATrGCCAATC GTCTGCCGT TGTAGGAGCr TGTTTATCGG AGCTATGGTA AGAAGTrCCA GGAAAAAATT GTCTCGTTGG ?NTGCATTA CTCTTACTGG AGCTGTTGGG TGGCTAATAT TATCATCGAA GCATTTTTAC TCCTCTGGGA TATTGGAAGC TAATCCTGGA AAGGTTCTGC TAAATCTTCT ATGAAATTA CTTTCCTTAT GGTT'rMAAC TTTGTATATA CTAGrTTAA GAAAAGGAGG GTATCATTGA AAGTCAGGGT TCAAAAACTA GGGACATCGC AATAT'rGGAG CATT'rA'TGC TTG.GGGAGTA TTGACTGCCC CTGCCAA.APG AACAGTTAGC TACTGTTGTT GGTCCTrATGT CTGATTGGTT ACACAGGTGG ATATATGATC CATGGCCAAC AT'rGCTAC'rG TTGGTGCAkAT CACAGGTTCT AGTGTTCC-TA ATGGGCCCAC TGGGAGGATG GACTATCAAG AAATTTGATG CGTCCCGGAT TTGAAATGTT AGTTAATAAC TTCTCAGCTG TTGCTTTTGG CTTTCTACGC AATCGGTCCA GTCGTATCGA AATGGTGTTG AGGCTATTGT CAATGCTCGC CTCCTTCCTA CCGGCTAAAG TCCTTTTCCT
CAATAATGCC CTCAATCATG GTACAACAGG TAGCTCAAGC TGGTAAGTCA ATTCTCTTCC CCAGGTCTGG GAATTCTATT AGCTTATGCT GTATTCGGTA TCTTGGGGGG CAATGGTTAT TCATTTCrrC GGAGGGATTC GTTATGATGA AGCCTACTC'r ATrTAGCT GCTATGGCAG 120 180 240 300 360 420 480 540 600 660 720 *780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 652 GAGGTATCTC TGGAACTTTT ACTrCAAC TCrAGACGC TGGTCT'rAAA TCTCCAGCTT CACCAGGTTC TATrATTrGCG ATGTTCTTTT AGGTGTTTTA T'rCATGCAGA CAAGTCAACT CTAACGCTCA GTCTAAAGGT ACTCAGGGA AAAAATCATT CTAGTArrCT TCGAGATAAG CAATCTCAAA TTGCTTGAT CAAGAGCTAA AGACAAGAGT ATTATAGCTA CGGCCCAAA AGGTGTGG CCCCATCTAA GTG.GCAGCAG TTG~lrCTTT CC7TTGTACCA GCCCTTATTC GAGGATTCGC TCGAAGCTGC 'rCAGGCGGCT ACCCAAGCAG CAGTTAGTA'r CAAC"rCTGT 'rGATG.CAGTT GTTTCGACAG TTCGCCTGCG ATGCTCGAT GGGAAGCTCT GCTATGGGAG GTTAAAAAAG CAGGTCTAGA GATTCCAGTA TCTAATCAGG ACACCAAAAA CATTAATTGT TACTCAGGAA GAACTGACAC CCAAGTGCTA TTCATG~TTTC TGTTGATAAT TTCTTAGCGT CCTCTCGTTA TCATGAAATT GTAGCTTCAT AAGGAGATAT ACCAACT'rCA GCACCAGTAG ATGCTGTAGT AG'rTGCTTAT GGTAAAGCAC TAACAGGAGC TTCTCCAATA GCAGAAATTG ATAGTCAGGA AAGTGACC'T AACCATATTG AGGGAACTGC AACTATGGGC TGTGAAACGA TTrCGGGCTAT
AATTAGGTGA
AAGTGCAGCA
CAGAATATGA
AATGCTA7,rA TTCAA'rGCAA ATCAGAT'rTG
CTATAT'TTTG
TT=AGA.AAC
ATTTAATTCT
AAGAATATTC GTATTCCAGT TTCTACTGCC AAAATTTCAG AAAAACATAA TGATTGTAAC AACTATTTCT TTACAGGCAG AGCAGCACCG AATTCTCAAT CAAAATGGCT GCTAGAATGT ACCAAACGAG AAGAACAA'TT GATATGACTG AAATC~rACA ACAGATAGCA TGGAGCAATA ACTGGAGAGT TGGATGA'Tr TTCTTATTGT GCATAGTTTA GTAACAACAC ACAAATAGAA CTAGAGGTTT CTAAATTACG ATTGAAGGCT TTCCTACATG TAGGGAAGCT GG71TCATCT AGAACAATT-T A'rCGAACrr TGGAATCGAA ATAACGAAGC ATGGGAAATA GCCGACAGAA CTTGAAGTGI' TAGT'rGAGTA CTATCGCC~r CTGACTGAGA GTGGTTTTGT ACTCAGTAAT GTAACTAT'rA TTCAGGATAT 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 TAGTCCCCAA GAAAGACAAG AGTTGAT'rAC CACCAATGAA GCATTGCAAG AGTGCACGAA TTCAGATATT GATAAGCGTC TTTTAGAcTT TGATCtGAAA ATTGAACGAC AAAAAGGTTA TCGGATTTCT GGTGATTCAG 1TGGTAAGAG AAGATTTTTG GCTATTTTAC TGACAAACTG TATCTCAGTA GCAGA?1rTT CAACCGGTAA TTrTGGGAGC TTTGATATTT TAGAAGCAGA TAGAACTGGG CTrGGCCAGTC AGA7TG'AA TAAGCAACTG TCAGGTTTTC CAGATATGGA TGCTAGGATG AAGATGTTTT TTGCGATCTr GTTATCTCTT ATAGGTCAGG AGCAAAACAT TGAAAATTCA CCTAATACTA GTAAGCAGGC TTTGGAAATT TCTCAAAAAA TN'?TCAAGC TTACTCTAAG CAGACTGCAC AATTTTATAG TATrCAGGAA ATTATCTATT TTGCGAGCAT CTTGGATGAA TTAATCATTA AACGTCAGGA CAATCCGCTC TTACGGAGA AATTI'GATGG 653 TGAAI-1-1-rC TACAATATT CAAATCTGAT TGATACGGrr TCCATGTATA CCAAGATTGA CmT'rTAAG GACAACG= TATTCAATTT 'rCTTTCCAT CATATTCGGC CGTCCCTATC CTTwrMCAGG GTGAAAATTT GCCAGAATCT ATCCAGATTT GAATAAATr'r C~rrATACAG TCAq'CAGTCT TTTAGTGAAT GATAT C TCATACAGAG TATGAGTAq'G GCATGATTGC CCTACATrr ATCTCTAGCT TCCAGAGATT TATCCAGTCC GTG=NGCT TTTAACGGAT GAACGTCGGG TTTATTAGTC AGTAAAATTA AGAGTGrTGIC TCCTrGTA GAGTTGATAG TCTAGTAGAT TACCACACTA 'rTGATCTCAG TCAGTATGAT TATATTTTAT GCTGACTAAT CAGGAAATCG ATrGTAATTTC 'rAGT'P1rCCA ACCC.TCAAAC
TCAGTTTAGC
TAGTTGAAAG
CGAAATATCT
TAGGCCGTAG
TCACTAGAGA
ATATTCAATC
CTACCAAGCC
AATTGCTTGA
ATTACAGGAA CGACTTCAGT ATGTACAGGC ACATCGTACA ATTGTCGCGC GTGATGCTAT CGCTCCAGAG AAAAGTTATG ACTTGCAAGbA T'rATTTAA'rA TCTAGTAGTC AGCTTTTGAG TCAATTCGAG TTGGTTCAAT 'rGGAGAATAA TCAATCATTT GAGCACACGG TAGAACAAAT CATCCAATAT CAGAAGAATC TGAGTGACAG CTTCCAGAAT AGTCCTATGG CTATTCCTAA TAGCAAAGTA ACAACAAATA GrTTTACTAT G'rCAATGAAA .CGAGAGGAA G AAGAGGTCAA AGCTAGCGAG GAAGCTAGAG ATTTAATGAC TCTTTATACA GAGATTTACA AGACGGGAAA TATTTTTAAC GAAAAAATTA AGAAATTGGA 'rTAAGCTTAA TAAACAATTr TCTAACAAGG AGCTTACCTA ACAAGAAAAT TACTGGTCTG GTGCTrTAC GTTTGAACTC AAACTACCTA AAGGTCTG CTAATGCTAA AGCTATTAGT CAGTCGATTA TCAATCCATT ATTTATCAGA CAACTAATAT GAAACTTGAA AGGAAGC7AT TTGTTATTGT
TGTTATCTCA
ATAGTCAGTC
TCTCCGCATT
TGTCTAA.AGA
TTGAAAATCA
TGCTAAATAC
AAACATTTCA
GGGCAAGTTC
CGAGATAAAG
3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 TTTATGAGGG TGGATATG=r AA'rGAAGACT ATATTGAAGC CATGATTGAG AGCTATCTGT TTACATCGGT AACTTTATCG CCATACCGCA TGGAACAGAT GCAGCAAAAA ATGATGTCCT CAAGTCTGGT ATTACAGTCG ?TCAACTCCC TAGAGGGGTT GA7*TTTGGGA ATGTATCTAA CCCTCAAGTG GCAACGGTTC TT"TGGTAT TGCTGGTrr GGTAATGAAC ACTTAGAAAT TATTCAGAAA ATr'rCTATCT TCTGTGCACA TGTAGATAAT CTTCTTAAAC TAGCAGATGC TCAGTCAAAA GAGGAAGTAT TGCGCTTATT TGATGCTG= GAATAATTGA AT'N'AGTCAT TTGTCATCTA GTATATATG'r CCCTCAAATA GGAAAAGCAG AAATTGAATG AAACATTCTG TTCATTTrG TGCCGGTAAT ATCGGTCGTG GTTTTATAGG TGAAATTCTA 'ITTAAAAATG GTTTCCA'rAT TGATT7TGTG GATGTCAATA ATCAGATAAT TCATGCTCTG 654 AATGAAAAGG GCAAGTATGA AATTGAAATT GCACAGAAAG GACAG'rCTCG TATAGAAGTA ACTAATGTGG CTGGCATTAA TAGCAAAGAA CATCCTGAGC AAGTCATTGA AGCGATTCAA AAGACGGATA TTATACTAC TGCAATCGGA CCTAATArAC TCCCN'AT CGCCGAACr CTAGCCAAAG GAATCGAAGC TCGCCGAQTT GCAGGAAATA CACAGGCATT GGATGTTATG GCCTGTGAAA ATATGA'rTGG CGGGTCTCAA TTTCTTTA'C AAGAAGTCAA GAAATATTTA AGI'CCGGAAG GrTGACATT TGCTGATAAC TACATAGGT TTCCAAATGC TGCACTAGAC AGGATTGTTC CAGCACAAAG TCACGAAGAT GAATGGCTCG TGGAAACCAA GCGTCTTAAA TATGAAGAAG ATTTAGAACC CTr'rATrCAG GCAACTTCAG Cr'rACA'rTGG TCCGCATTAT TCCCT"TG TTGTCGTCGA GCCCTTTAAT
AATCCAGATT
CGAAAACr'rT
GGTGCCAACA
AATCCTAATA TTAAATCTCG QATTGAATCT c'TA'rTAGCTG GCCAAATGGA ACTTTGATAA AAAAGAATTG GAGAATTATC
CTTGAAAACC CTTTCATAGT TTAGGCTATA ATGAACGATI' TATAAAAACC TACTTAAAAC GAAAGTATTC GATTAGGTGA GTTACAGGrT TAGACGACCA TCGAAAATCT CTTCAAATCA ACAACCTCAA AGCAG'rGCTr TGTTTGAGT'r ATCTGCGGTA GGTAAAAGAA GCTGGACAAA AAAAATCTTG ATAGGATGTC GAG'rTTGA TCAGCTTTAT TTATGGACAG TGGGAAAATT GAGAAGAACG AACCAAGCAA
GGACGAGGTT
CATCCGGCCG
AGTTGGCTAT
AGTCGCGTAG
ATACGTGAAT
GTCTTTGACT
TACGTCTAAA AGATGTGCAT TTTCAGTCAA rTCTGGACAT CAATTTTGGA AGCTCTTCAA AA.ATTCGGAG TCTCTTGATT ACAAAGTCAT TATAGAACGA CTCGTACTCC AATCCGAAAA TGAAAGAACT CAGTTTGTCA ATCGCGATGT AAATGATGAA TCAAAGATGT TGTTATACAA TAGAGTATAT T'rAATCTTTrT AGGCATATGT TG'rrCTATCT TATCTGCAAT c'rCA-AAACAC AT7TTTTG TTATTTATAA 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 ATTG'-GC'r AAACA.ATCAG AGA.ATTGACr GAGCAAATTG GGTTACCATC GCT'rrGCTT TGAGCTGACI' CCGTCAGTCT ATCTTTCTAG CTTGTCTTTG AAGTCTTCAA AATCGGGAAA AGGCAGCCTA CTTA'rrATG GAAAGCCTTA T'rGGATTTTC GAGATAGGTC rTGCTAGAGA TGTAGCCCAT G AAAAAA ATAATGCCCA TCAATITCTrl' 7"rTGGAACG AATTCTTTCG AATGCGATCT TCGGG'rGTT~C
TCCTCAGATT
CATGTTA71r
ACTCGTCCAA
ATATAGTAAA
ATGAAACAAG AACAGGACAA ATCGATCAGG ACAGTCAAAT CGATTTCTAA AAATG2'NrA GAAGTAGAGG TCTACTATTC TAGr=CAAT CTACTATATA ACTGAAAAAT TAGATAAATT AGT7TTGGAA AATGACTAAC CAAAAGATAT CCAAAGTAGT CTAAAATTGT CTATACTTTA TGAGTGTTTT AG~TAGGAAA AAGGCTTGT'r GTCTATAATT GTCTGCATTA GTCTAX3ATT TATTATAGA AAATGTTATA ATAGACTG1TA ?ITrAAAAAAT TTTAAG4GAGA AATGACAGAA 655 TGTCTGTATC ATTGAAAAC AAAGAAACAA ACCGTGGTGT AAGACCAAAT CAAACCAGAA 7TTGGACCGTG TCTTCAAGTC TTCCAGGTTT CCGTAAAGCT CACCTTCCAC GCCCTA'rCTT CTTGACrMC ACTA'rCTCTC AGTSAAGAAA TCTCTAATG CGACCAAAAA TrGGGAAG AAGCTCTTTA TCAAGATGCA ATGAACGCAC =NGCCAAA CGCTTATGAA GCAGCTGTAA AAGAAGCTGG TCrGAAGTG GTTGCCCAAC CAAAAATTGA CGTAACTTCA AITGGAAAAAG GTCAAGACTG GGTTATCACT GCTGAAGTCG TTACAAAACC TGAAGTAAAA TTGGGTGACT ACAAAAACCT TGAAGTATCA AGCGTATCGA ACGCGAACGC AAAACGGCGA CACTG'rTGTG GTG.GAAAAGG TGAAAACTTC AAGACCAATT GGTAGGTCAC AAGACTACCA AGCAGAAGAC AAGTAAAAGC TAAAGAAGTT GTTGATrGTAG AAAAAGAAGT AACTGACGCT GATGTCGAAG AACAACCTGG CTGAATTGG TATCAAGGAA GCTGCTGCTG ATCGACTTCG TT~GGTTCTAT CGACGGTGTT GAATTTCACG TCACTTGGAC TTIGGTTCAGG TCAATTCATC CCTGGTTTCG TCAGCTGCC A.AACCGTTGA TGTTATCGTA ACATTCCCAG CTTGCAGGTA AAGAAGCTAA ATTCGTGACA ACTATCCACG CCGGCTCTTG ACGATGAACT TGCAAAAGAC ATTGATGAAG AAGTTGAA.AC ACTTGCTGAC TTGAAAGAAA AAGAAGCTTA CAAAGATGCA GTTGAAGGTG AA'rACAGCAA AGAA'I'GGCT GCTGCTAAAG CAGCAATI'GA TACAGCTGTA GAAAATGCTG 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 AAATCGTAGA ACT'rCCAGAA GMXATGATCC ATGAAGAAGT TCACCGTTCA.
TCCTTGGGAA TT'rGCAACGT CAAGGGATCA ACCCTGACAT GTACTTCCAA CTACTCAAGA AGACCTTCAC AACCAATACC AAGCAGAAGC TGAGTCACCT ACCTTGTTAT CGAAGCAGTT CCCAAAGCTG AAGGAT'rTGA TGCTTCAGAA AAAAAGAAGT TGAGCAATTG CCGCAGACT ACAACATbGA AGTTGCACAA 'rGCTTTCAGC TGACATGTTG AAACA'rGATA TCACTATCAA AAAAGCTGTT CAAGCACAGC AACAGTAAAA TAATCTTAAT AAACAGAAAA CCCACCTGAA 'rrCTGATGCA CTATTTTCCA AAAATCTCTT TGAGGTCTGT GTCTGTAATC CTGGCATGCG GTCCCAGTTT TCTTCGGTTA GGATGTAGGA TTGTrCAGAG TGACTGTTT'C AGAGACAGCT TGTTGCTTTT CTTCAACATT CTCCAGTAGA GTTCAATCAG ATAGGTTTTT CGGGCAGTTC CGATGTGTrG GGTAGCATAG
GTAAATGAAT
ATCACTGGAA
ACTAAGACTA
GAAGAAATCC
GTTCAAAACT
GAATTGATCA
TTGGTGGGTT
CCAATCATGG
G-CACTTGATG
TCACTGAAGC
TCGAAGGCTT
GTAATTCGCC
TTCGCTCCAG
TTCCCTGAGA
TAGTAAGATG
CATACGTCGG
CT'rATGAATA AGTTrCCT'rT TGGCACGTGT AATGGCTGTG TAGATGAGAT CTAGCACTAG TAATCGGTAG GATGACAACT GGGAACTCAC CTCATCGCCAT AGGCCAAGCG AATCTTGTAC CATTCGTTAC GGG=GAAGA GACTTCATTA CCATCAAAAT 'rGTATTTACC AGGAATCAGG TCTGTGATAG CAGCATCGrT AACCAAATGA ATGACCCTGT CAAAACTGAG TTGATCTTTT TGTGGGGGAT CATCAATCCC TGCCGTCCCT CGGTACATAG TACCATTTCT GAGGGCGGCA CCTAAGATN' CAATTTCAAA GTAGGAACGG TCACNl 656 CAATGACAAT CTCGTCTTGT TTCGATTCGG CTCCTAAATC CCCATTAAAG ACATTGATIT CTCTCTTACG ATAGTGACAC TGAGGAGCTT TGAGCAGGTC TTGCATGAGC TGATTGATAG GAGCCAGAAC 7"TGGATATCA CGGGCGGGAA TTTCAATGGT GGCAGGAATA TGGCCACTAG TTTGGGTGAA ATCAGCTGGC AAGATGCCCT
55 GTCGAATCTG ACTAGCTAGG CCAAGCGAGT CTGAGGAATC TGACAGAAGG TAGCTGATCA TGGAGAAGAG TTGATI'GGCC AGTCAGCATC TAGGTAATCT GGCGATGTAT GGTCGCGCTA CAG'rTGGAGC AGCAAGAAGA AAAGGGCATA AACAGCAATG CTGTCAGGAT AAAGACCTTA CATACTCAAT TCCCAG7TrCT CATGACTCTT CTGTTTTCCT CAGCGAAAAA GAGGCTGTTG CGATCAGGTA GGAGAGCTCT ACTCAAGGAG AGTAAGGGTT TTTCCATACA GGCCTGAAAA GACTI-rCGAT GCCTAGTrCC TATCCTCAAC CAGCTGGTAG AG'TCTTGAAT CTGAAAGGCT GTGACGATGG TTGATTCTrT GCTTTGTCGA AAAGGAATAT GAAGTAGATC CGCTAGAACC CTGTCACCTA CGATGAGGAT CTTACTGTTA AGCCAAGTAT CTACCATAGA GAATTCATCC TCCAGATGAC TGGTATCATC GTCACCTGTC ATTCCCAAGT GGCAAACCTG, TCAATTCATT CATGCGACGA GCAGCTCGAC ATGGGCAGAT TGCTTTTCTT CCTGAAGTCA AGTCCTTCTA
TAAATTTTTT
'rGTCCAGGAC
GAAGAGATAT
ACGATGATAA
AT'rCCAT'rGA TAACAGTTGT TTCTGGATAG CATCACAGAT rGCTCGACAG; TAGTGATATG T7-rTCAAGGA TACGAACCAA TCAAAGATCT TGGTATCAAT TGGGCAACTT GGCTGGGGTC TGTTCCAGCA AATCCCGTGC AGACTGTGAA C'TAGACCGGC TCAGCTAGTT GGTCAGCAAT GGATAATTI CAACCACATC CTrACCAGTA CCAGGCCCAC AGCCTGTTTT TGAATGTTAT ,rTTTrGAATG GTTTCTAAAT GTGACTGCGG ATGCCTTCCT CTGCTGAACC TTGTCTTCTT TAGTTCCACG GGACGGGAAG TrCAACATAG GTGTCCCCrG CCGGAAGCGT TCAGGAGCCT GGTAAAGCCC AAACCCTTGA AAGGCTTTCT TCCTTGTAAA 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 AG7TTTGTTGG GAATGCCGTA GTTGGCTAGT TTGGCCAAAA
TCATCTCCGT
AGAGTCCTGC
CGCCATAGGT
TTGAAAAGTA
GCAGTTGTTC
TCCGTAGTTIG AGACGGAGAG GATGCCTTCT AACTTTTCTG ATCCACGArr TTCTGAGCTG CTTGACCAAG CCCTTACrAG TCCATACrrG GAGTGCTGGA TCGAGACGAA AGCCTCGCGA TTTTTGGCAG GGTGTTGCAA AATTTCGTCA ATGGTATTTT TCTTGAGACC AATCCCCTTG AAATGGCTAC TTGGTTrMC GCGATCATAA CGACTGATTT CAATTTGCCC CCAAAAAGTA TAGTCTTCGC 657 CCTCAA?1'AC ATCAGCCATG GMCCTG'rGA CAATGAT'PTC AAAATCATCA AAATCCTCTG CGTCCGTATC GTCGATI'TCT AGGAGGAGGA TGCGATAAAA A'ITGCTGG ?NTCAAAAA TAATCCG7M AATAGTTCCT GAAAAATAAA CTTCCATAAA ATTCCTTTGC ATGAATAGGT GAGAGTTGGG A~rGTTNrA TTTATACTC TTCGAAAATA CCATCTGCAA CCTCAAAACA GTATTTTGAG CTGACTTCGT
AAACACTGTT
TATTTGTAAA
TGCCTCTTAG
TTCCCCTGTG
GCGG'rCATCT
GTAGTTGACA
'rTTATTCCT
TTTGATATAG
ATTTTCGTAA
GCCATCTTCC
TTAAGCAGCC TACGG.CTAGC TrCCTAG~rr TAAACAATCA CTTCTCACGA TAGAAGAAGA G?1'TCTTAAA ATGTTCCGAT ACGGG'rGATT
ATATCTTTTG
CCGAGGAGAA
TCAACTGTGA
TCAAAGCCCT
TCTGCTAGAT
CTTTGAAGGT ACCTACGTGG GGTATTCTCC TTCTGGAACA AGGCTrAGC TTTTTGAGCG TGCCTGAGTA AGTGCTITGG AAGGCTCG'TC CGTITTCTTTG
CGAATGGTGT CGCCAGGCAT TCATGGGCCA CCACGATATC
TCCAATCACG
AAAACGGTCA
CAAGAGAATT TCGCCATCCG CTAGGGTCGG ATCCATGGAA GCTCCAAAAA AAGATACGAC TTAAAGCTAG TAATGACAGA CTC7TTAAG AAATT=rAA ATGAATTCAT AACTTACCTT TTCAGTGTTT T'rAAAGTGCA ATTTGGCGCA GAAGCTGAGT CAAAATCTGG CTAGCCACCT TGTCAGAAGC CGTTCCAGCT CAGTTCTCGT CCCAAATTTT CAAGAT'TTC CAGAAAGAGA AACTGCGACA GACAAGTArr TGCCCTCAGC C=r-CrCT ACGATTGCGCC TCTTNGTGCCA AGTACTTGTC ATAAT7TTA CACAA7T1rTC TCAGGCI'GAA CACCTrrG AAGGAGGAGA GGCAACCrA ACCGAAACAG CGrGTAGCG GTCTCCGATG TGAGAGAAGG AGTGCCTGGT CCTGAATTT ?rCCTTGAGA CTTTTGGTCG GTCAGAGTCT 'rAGAATCCCC CACACCGAGT G'rCAGGTGTG ACAAAGGCAG CCACAACTGC AAGCCCACCA CrCATCTGTC CCAATTAAAG GAAGATTTTG TCCGC7GGT= TCTTCAAACC ACGTCAGCTT CAGTTCTATC CACAACCTCA GTTC~rGAT ?1'TCATTGAG GGCTGAGATT GGTGATTCTC GGCCATAAGC GGAATTTAGC CGGCTGTCGC TCGAAACCAA GTAAAGCTAA AGT'rGGTGTT ATACTTCTAA AGAAAGTTCC AGTTTGTCAT CCTTGAAGCG TCATTGATGT AGAGTTTATC CGCTTGACGA TGTCCTTATT ATAGGAAGGT GT'N'TACAAC TGTCCTTCTA CGCGAACATT ATTAGGAGGA ACAATCCCCA 'rCTAAGCGTr 7T'CGCTTT CCCTGCATAC CATAGGCTTG CCACTrGGGA GCTGATAACC TCACGCGCAA TGACAGAAGA AAGCTGATAG GATTGCTGAA GCACTGGTAA AGGCATCAAT TAGATAGCCT GATTATGGAG ACCTCG7TTGT ACrrGCTGGG ATAGGAGTAA TCTGACGGAT TTTCGTAAAA AGTCGTGCTG AAGTAGGAAC CATTrCCCAC TGCrCTACAG CTTGATAGCC 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 658 AAAGAAACTG GCGTATTTTT CAGCCCCTTC ACCCTGAAGC AAGAT=r-C CAGAAGTATA GATAGAAACC GTTGC1-rGAG GTAGTTTCAA AAAGTAGCCG ATATAGGGAT TCTTCCTAGC AGCCAGACTG GTTrTGATAGT GT'rCAAGAAA AGCCTGAATA TCCTTTTCGC 7TGGTGTGAG TGTGATACTr GCCATAGTTT' CTATTGTACC ACAAAAGCAG TAAAATTTGT AAAAACTGAC AAAATTAGCG AATTrGA TAATATCq$TG AGGTGAATTT TATGGCAAAT CTAAATCGAT TCAAATTTAC ATTCGGGAAA AAATCGTTAA CCTTGACAAG CGAACATGAC *go to 0.0..0 too.* o* TG4GAGGAAAT CGCTAAGG~r GCGACAGAAA GCGCAGATGA TGAAACAATC GCTCT'TrTGT GCCGTGAGAT TGAATT'rGAC GATAAGGAGC TGACTTGTAA GCAAGAACAG AGCAAGATTG ATTGGTCTTG GTTTGGGGAT TTTATATCGG TrACCTGA'rT TCAGCCATG4G CATCGGCT AGAGCAATTC CATTTANGC 
TCCCTTATGC TTTrCCCATCG GATCAACTCT TrCAGCrGGA GC'rTGTAT7'r GGGATrGTCT ATAGCATTrGG TCCTAGCAAA AAACTGGGTG GTAAGTTGT'r GGTGACCTTA TTTGTC'ITGC AAATGGCCTT TATACAAAAT CCTCTrGAAA AGAGTATCGT AACAACCAGT TGGCTCAAAC AAATCTGGGT GTTT'rCCTAG CCC7TTGTTT ACAGATT'rGA ACACCTAGAC ATrCAAAGAC AAGGAAATAA AGTTCGATAA GGTCAAGGCC T'rGTTTGAGC AAT'rGAGACA ACTGGCTCCG ACTGCCAAAG TGAAGGAAAT GCAGGCTCTT TTCGTCGAGC AAATTGCAGG AGTCTGCAAG AGGTTGGAGA TACTCTTGAA ACGCGTGCTT CTTGCCAGCC AAAATGTCAG CT'rGGAAGAA TrAGCCCTTT TACAAGGAAA TCTrCAGGCC TTrAATGATG AATTGGCGCG AATCCGTCGA AAAATACATG AAGACTrTGCT CAAGCAAAAA GCGCAGCTGT AATACCAAGC AATTAAAGAA TGGCAGTCAA CTGTTTATCA AAGAGCTAGA AGAACTCCGT CACAAGCTrG AGGATTCCTT ATGATTTCAT TCCTrCTTCT CTATCGGAGA GGCCTGCTCT TATGGCTGGC CAGTrTTATA AAATTCGCAG GAAGGTCAGG TAAGGTCTT TATGCAGGTA TCGTTTACTT GGTCTTCTCT
TACAGGTTTA
AGGGGCTTGG
GGACTTTCTT
TCGGCTACTT
TACAC~rGAT AACCTTlTTA
CAAATCCCTA
ACTCAGCTCA
CCAAGTTTCA
GACAATCTI'G
CGCAAAACAC
GACAAAT'N'A
CTCGAATCTA
AGATGAATAA
121B0 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 GCAGGTATCI' TGTCCATGTT GCGACCATCC CCATGGCAGT ATCATCCAGA GCATACCGGT ATCGGATAAA AAGGGCAGGA TCAGAATGTA AAAAGCTACC GAAAATATTA GAAACATTAG C'rCATTTGTT GACCGAGCAG CAGATAAAAT CAAACAGGCT AACCGCATTT TACTATTCTC TGGGAGCGGA TCTCAATATC GAGAACTTCA A.AATTTTAC GGTTTGAGAA ATTACATGAT CGGTTTCAT TGAAAATTTT ATAGCGAGAG TCAGGTACC TGACGGAAGG AATTGTTGCT
GGCTTGGAGC
TTTGCTGAGA
TCAACTAAGG
GAGGAG7TTCC
ACCAATCTGG
TTTrCCGCAAT
GCCAGTGAAG
GATGTTTTAC
AGCAGAAATG
659 GCCGTCAGGT TrACCAGTC AAAAACACCT ACCGCAATAA GA7*rGCAX3GT GTCGTTCATG ATATTTCTGC TAGTCGAAAC AAGAAATTGC TAGTCI'GCGA TTTCTGAGCG 'GTCCGCCCT ATCTGGACTr GATTCGTGCC ACI'GTCAGA AAATCAAGAG CCGTCGCAAA TGATGTCTAT ATACACGTGG GAAGACCATC CAGGATTrCCC GATT'rTAGCA ACCGTCTATA TCGAACCCCG TGAGGTAGTC AAACTGAGCG GcAGATGAGC GCTATGAAAT CCTTCGCA'?? CTCCAAGAAA CATGCGGCTG AGATrGCTAA TGACCTTG ATTATCGGTC AAGGTTCGA'r TTATCCAACA AAGACAAGCA GTCGT~CCTC ATTCAACTGC TCCATGTCTG CCATCCTTrG GTCAAAAATG
TTTGGTCAAG
ATGCTCAAAA
GACAAGGGAA
CTGATATTGG AGATGAGCAG TCTA~rGAGC CCAATATCGT GGATATTCTT GGCAAGGTCA ATTTAACAGC TATTGTCATT ACACGTCCCA CTCTGGGCTT GACACAGGTC ATGGCCCAGT GTCGTGTTGG TATI'TTGAA GAAATCTTTGC AGAGCTTGTC TACCTTCTCT ACTCATATGA ACCAACAT'rC ACTCTTACTT TTGGATGAGT CAGCCCN'GC CATGGCTATT CTGGAGGACC CCACCCACTA TCCAGAACTC AAGGCCTACG GTATGGAC7'r TGATACTGCA ACTCTTCGCC
TGGGGGCTGG TACTGATCCC TTCGCCTGCG TCAAATCAAG GTATTGAGAC AGCCTTTGTG CGACCTA'rCG CTTTATGCAG GTCTAGGCCT ATCTGAAGTr ACGTCAATCG TATCATTGAG ACAATATCCC TGAGGTGGAG ACALACGAGCT TAATCGTGA.A
CAAGAGGGAG
ACCATGGCGA
CAAA.ATGCCA
GGTrGTTCCTG GCCGAAGTAA TGCCTTTGAA ATTGCCAAAC ATCGTAGGAG ATGCCAGTCA GCAGATCGAT CAGGACAATG CAATTACAAG AGCAGACGCT GGAAAGCCGC AAACGTTTGG CAALGAAAATC TCAAGATGAA CCGTGCGC'TA AAAA.AACTCT AAGGAAACCG AGcTrAACAA GGCGCGTGAA CAGGcTrGCTG 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 AGATTGTGGA TATGGCCCTA AGTGAAAGTG CCCAACTCAA GCCCCACGA.A ATCATTGAAG AAAAAGTGGA CTTGTCTAAA AATAACGTCC AGGTGGGAGA TGATA'rCG'rG GTTCTCAG'rr TCAAGGACGG TCGCTGGGAA GCCCAAGTTG AGrTTGA'rCT TGTTCAAGCC CAGCAAGAAA TGAAACGAAC TTCTGGGCGA GGACCTCAAG AAGAACCCAT GAATACCTA GATACCTTCA ACCAGATTCT CAAAAATCTC CACAGTAAkAT CCAACGCCAA CTTGAAAAAA TTGGCTCCTG TTCAAAAGGC CAAGAAAAAA ATGGTCAGCG TGGTACCTTG
CGAGCTCCAA
ACCAGTCAAC
GCTTGAT'rAA GATGACCTTG GAAGAGAAAG AACCAGTCAA GAAGAAACAG G'rCAATGTI'G CTAGACTGGA TCTTCGAGGC AAGCGCTATG 'rCGACCAAGC CT'rGCTTPAAC AATATCGCTC GAGTCATCCG TGAAGGAGTT ACCAAATACT GCTATGCCCC ACAAAATGCT GGAGGCAGTG AAGTTGATAT CATCCATGGT TGCAAACAAA CAAACATGTC
ATCGGAACAG
AAGAGTrrCG GTGCGACTAT TGTCACTrT TTGAACTAAT TTTTACTAAT ATCAATTAAA TACCAACACC ATGACTGTAA ATGGACGGAA AACTGC'rCAA ACTCTCAGGT CTAAGCATTC CAGACTACAT GGAGAAGGAA ATGTTAGAAT CTTCATTTTA TTGGTCGGAA TTTGCGTCTA CCCAAGGCCT ATCCAGTTT GCAGCTCTGT 660 AAAGGATAGC AGTATTC'rGG ACTTTATAAA GTAAAAACTG AAACACATTG ACAAAAGCCA ACATTTTG TAA.AATTAGA GAATGAAGT'r
CTCTGGAGAG
AAAAGGACAG
GTATCTTGCA
TGCTrAAATC
CAGGGATTTA
TAATAGAAGT GGGGAATCGT TTGATTTTCC ACCGTAAAGG CACCGAAGGG CAAGGCAGGC AGCTAGGATA MCCGCTTTT TAGCATTTAT
TGTGCTCTTT
AATCGATGCT
CCTAACTATT
C7"lrTCGGGT TGAAACCATA.
TTTGCTrGGG GACCGCCCCT CGGCTAGGAC TCTGCAGGT GATA.AGGGAC ATGGTGATGT GTTGGAACAG GAAATATCAT CTATTTTGGA TGTGGATGGC TTCAGCTTAT TTTTATCCAG GTACAGCCTT GGCATCAACT AGGAG'rrGCG ACGGCTATCA AGGTTGGTGG GGCT'rTCTTr GGAATGGCTA CCAAGTATGC CAAGGACGAC CATGGTGCAQ TAGCGGGAGG AGAAAAGTGG CGACCACTTG CTGTTTTGTT GGGAATCGGA ACCTTCACCC AAGTCAACTC GATTrCGCCA GCCATCACAG CTCTCGTCTr TGGACTCAAG TCTATTCTA AGG'rTTCAAC TATCTTAGGA ACTCTrACAG TTATT'TCTT TTTAGTCTTT ACCTCAGCTT T'rAGTCCCCT CGTTCGGATG GCTATTCAAA ATGGTGTGGC GGGTTCTGCT CCTATTGCAG CTGCAGCTGC GATTTCCATG ACAGGAACCT TTATTGATAC CATCTTGGTA ACTGG
ACCAGGAGCT
GGAAGGACTC
TCCCATGCAT
TGCAGTAGCA
GATTGCAGAA
GTCTGTCTTT
TACTGTTGTT
TTGGCCATCA
TATATCCTTC
GGAGTATTGG
TCTATCCAAA
GTAGCGATTG
CCTTTTATGG
AATACCGCAC
TAGGGATGGG
TTGCTCTCTT
ATACAACGAC
CAGTCTTTGG
CCATCA'm'TA
GCACAATCGC
CTGGTGCTAG
AATCTGGTCT
AGCAAGGTT'r CTGGT'rTGAC 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 16995 TAATATCGGA AAAATCCCTG TGCTGCGGTA GGTGGATTTG GCGTGGTGTG TTCTCAAACG CAAGACAAAT GAACCAGTAG CCTCATCATT TGTACTCTAA
INFORMATION FOR SEQ ID NO: 83:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 28473 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83:
CCGGGGCTTT TGTAGTATAA TAGAGATACG TTTTGAA.AGT AGGAGGTATC TATGGACTTA ACTAAGCGCT TTAATAAACA CAGGCTATT CC.AGATT'CC ACGCCAGACC ATGTCAAGGA ACAGGGATGA GTGGTCTGCT TACCAACTGG ACTA'rGC'CC TTATCTGCGA CTTTGACGGC GCTI'ATCCAG GCTATGAACC ACGACTGAAA ATGG7"rTTGT
GTTAGATAAA
TGGGGTTG
GGCGGG.CAAG
GACTCTACCT
TGAAAATGAA
TAT7TGGAA.
ATTCAACTr
CGTTTGACCT
CGAGCGATTG
CAGGCAGCCA
ATCrTGGTTA
GAGGGAGACA
CGTTGATTCG 'rCAG~rrGAC TGGGGG.AACC TGATT=ACA A'rCAGAACCA ATCCTACTAT GTGACTTTGT TAAGGAAAAG CAATTrGGGGC GACAGAGGCT AGGTACTTTT GCCAGCTCCT CAGAAATrGT TGAGATTGAT AGAAGGCCAT TTTGGAGCAG ATCCGACAGG AATTACCTAC AGTACGAAAT TTTTGTTGTC CCATGTGTCT CTAGGAACGA GA'rTGTTAAC TTAGTTGGGG CTTGACTCCT GAGATG'rTGG
GGTGATAAGC
AGTCGAGAGC
TGTGATGAGG
TCAAGGCGG'r TAT'rCTCAAC TATCCAGCCA AGTTAGAGGC CTTrGGCAGCT GTTTTACGCA TI'TACTCAGA ATTGACCTAC ACAGGCGAAG
TGTTGAGAGA CCAGGCTATT ATTATCAATG GTTTGTCTAA ATCGCATGCC GGCGTTTGGG GCTGATTT'rC GCTCCTGCGA CCTTCACAGC CCAGTTAATC AGTACTTGGT CACTGCCGCA AATACCATGG CGCAACATGC TGCGGTAGAA CTGGTAAAAA CCATGCGGAC CCATGAAGAA GGAATATATC CAACGTCGGG CGAAAAAATG ACTGCTCTTG GTTTTGAGAT TATCAAACCA GACGGTGCCT TGCTAAAATT CCAGCGGGCT ACAATCAAGA CTCCTTTGCT TTTCTGAAGG
GAAGAAGGCC
CCGCCTATCT
GTACATGAGA
AATTTTCGTG
GTTGCCTTTA TCCCTGGTGC TATGCAGCCA GCATCGGAC GAAGCATGAT TCAGTCTA'rC AGGATGACAA GCTCGTCAAA TTTTTTG'TCA AACACGCTGG TCAGTCTAAG GCACGATTTC TCTTGCGAAT CAATGATGAC GTCATGACTT TTCCCAAGAT TAATAGTGAC GCAGCTCT'rG CAGA'rGCTAG TTTGCAGGAC TTGCAAAAGA C1rTrGGAGTT GAI'GGAAGCA TTTGAAATTC AAATTI-rGAC TCGATrTGGA TGCCATCGGG TTGGTCAGGC TTTTGACTr'r AGCCTTTGGA CGTTACGGGG TATCAAAGAA GCCATGAAAC ACGAGTCAAG GCTTGGTGCT ATTT'TTACAG AGCAGGTTGG CTGGCGCC'rG TTATTCAGCC GGACTCAGTT ACATCGAAGA CTCTTTGTCA TGGCCTATGC AATCAGCAGG ATGCTCCCTT GGCTTGGATT ATCAGGTTT= ATCAGCCTCA ATTIMAATGA TCTTTCAAA'r ATGGAGCCTG
ATGACAGGTT
AAGAGTCACC
GCCTTGACGG
ACTATATCAT
TCTATATTTT
ATTTTGCTCA
AAGGCTACGT
GACTTGAGGA
T'rACAATCGC
CAAACGCATG
CTTGGTCCTG
CTATCATGAG
GACCTATGTG
GTT'rGCTTTr
GACCAATATT
GTGTGTCTTC
CCTCTGTCCA
CTATCTGCTC
780 840 900 960 1020 1080 11.40 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 GAGCAT'TATC ATGAGGA'rAA GAGACGTTGT CATCTCAATC CCAATATCCC 662 AATCAATTTC AAGCTATTGA IrTGAGACT 'N'GGAGACCA TrrCGCTCAA GCCTGGAATC AAGCA6AGAGC TACGCCAA'rT TATGGATCAA TTATATGAAG AGTACGTTIGG GATTrCACCTA AAATCAAAGA AATTTATTrGA TTCCCTAGCA GACTGGGGAC AATwrACTAAA AGAGGAAAAG AAATGAAAAA AATCGCAGTA GATGCCATGG GGGGCGATI'A CGCACCTCAG GCCATTGT-rG AGGGTGTCAA TCAAGCCCTA TCTGACTTT CAGATATCGA GGTTCAAC~r TACGGAGATO AAGCTAAAAT CAAGCAATAT AGAAGATTGA TTCGGATGAT TGGTATTGGC AGCCAACGCI' ATACAGGTGC CTTGTTGGCA GTCCTGGACT CATCTCTACC TTGGTGCCAA 'rGCAGAAAAT TCTATGCTAA AAATGTCCGT CAGAGAGTAG CAAGGGCGAC CTGACAGCGA CAGAGCGCGT CAGCATTATC GAACCTACGA GAGCTATTCG GAATAAGAAA
CATACGGATG
AATGCCAGTA
GTCAAACATG.
GCAGGATTCT
TTGCCTACCG
ACAGCCCAGC
GGCATTGCC
CCGCrrCCTA GTGAAGCAGA CGCTGTCCTTr TCGGCTGGGA TCATCGTGCG TCGTATCAAG AATATCGACC TrGATGGAAA AGG?'N'TGAC ATGCTAGACC ACCTCCATCA ATATGCGGTT CTAGGTTCCT AACCACGCGT TGGT'TGCTC AACAACGGAA GCGGCTGATG AGGAAACTTA TGAATTACTG a. AAAGTTTGAA CTTTATCGGA AACGTGGAAG CGCGTGATTT GATGAATGGC GTTGCAGATG ITIGTT"GTGCC AGATGGTTTC ACGGGAAACG CTGTGCTCAA ATCCATCGAA GGGACAGCTA TGGGAATCAT GGGCTTGCTC AAGACACCTA TTACAGGTGG TGGTCTTCGA GCGPLAACTAG 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 .3240 3300 3360 3420 3480 3540 3600 GTCCCCTCCT TCTCAAC;GAC AGCCTCAGTG TTG.GTGGAGC GGTCTTGTT'r GGTGTTAAGG ATGCCAAGGC TGTTTATAGT ACGATTCGTC TTGCCCAGAC TGCGCGTGAA TTTTCAGCAG TGACCGTATT GTGACCAT'rA TCCAAGAGCG CTTGAGTCTG AAAGACGA'rT TGGATCCGGA TCTGGAAGAT GAAITAGTA TCGAAATCAG AGCGAGATGTG GTTAAAATCA TTCAAGGAAA GTTTGAAXAAA ACAGCTCAAT CACCTGTTGT CAAGACTCAT AGATCCGTAC CATGCTAGAA AATAAAAGAG ATGACAGAAA
TATTCAGATG
GGCTCAAGCG
ACAGACGTGG
AAGAAATTT
ACAGGGAGAG
TTCTGTTGAC
CGATGAAGAA
ATAGCAATCG
GACTT'rGTCG TGACAGA.ATC TTGATGGAGT TTATCTTGAC ATTGACCAAC TCCAAAACC,' GAGTTCCAAG TCAACGGAAG
TAGATGGTTT
TTATCAAGGG
CAAGCCGCTA
GrGAGAAT
TCATCGTAAA
AGGATTCTAG
TTAGAAATGA GAAATATCGG ACAACCTGGT AAAATCTTGG CTGACAGTGG CTCATGAAGA TATATCCTCA AGCACAAACT CCACGTAAAT CCAGCAAACT ACAGTTGAAG ATAAACCCTG TAATCATGCG CTATCTAAGG AGATAAGCAA ATCTN'GCCA AAGTAAAAAC GTrrAAAATG TTTTCAACAA CCTATCGAAA CGCTTCGGAT TACGAATGAA TTTGATTGCT GGTATTATCA ATCATGAACT TTTTGCAGGA AGTCTAATAG TAAAAAAGTG ATTAGAAAAC ATCTT=T-A AAAATAGAGA TGATTTTGAA ACAAAAAAGC TAATTCAACA CGTCCMATG CCAAI-rCAAG ATTGGATGA AAAAAATTAA TACATACTGT TATACTAAAC TT=GCAAGTT TGTAACA.AGA CAAATATTAA AAATAAAAAA GAGGTA1-rCG TTATGAA'rAC AAAAACGATG TCACAATTTG AAATTATGGA TACTGAGATG
TTGCCAAAGC
GAACATGGCA
ATGCAGCGAC
AGGTGTTGGA
AGGTGCAG-CA
ATGTT'GGTGG
CTTGCTGCG TTGAAGGTGG CGGATGCAAT TGGGGAGATT GGAGGAGCAG CACGAGGTCT TCAGCTAGGA ATTAAAACAA ACTGGTGCTG TGGGAGGAGC TATACTTGGA CGTGTGGCCT Th.ATTATGGA TTTTAAAAGT TTTATTATTG GTTTAGTAGT TGGATGATT AArrAGAAAA AAAT=~TAA AGTCTTCGGA TIGGTATATTT GGTCCTATA GAAGAAAACA GAAAAATCTG AAATTATTT'T GCGCATGTAA TGAATCCTAA TTTAGAACCG G=rAGAAAT AGCAGAGACA TATCAATGAT TTATTTGTTT CAATAGCATA TTTTAGTGG CAGTTGGGAG GGAGATAGGC GTAAGATAGT TGTTATCACA ACGTCGCCT GCCGTATATA GTGTTirTGAG CAGCCTACGG TTAAAAAATA ATCAAAACTA TAAATGATGA AGAGGAGTCT TATAGTAACG ACTCAAAAAA TGTTATCTAT TCTGACTAGG AATAGATCAT TTAGAAArrG AAGTAATAAA TAGGATGTCG CAAGCTTGCC TAGGGTGACA GTAAAAAATC GCAGGACTCT TGTTCTGCCT ATI-rAT TCArrTGGGA AGGAAGTCCA GTTTTTGTTT TGAGTTAATA CTCTTCGAAA ATCAAATTCA TGTGACTGAC TTCGTCAGTC CTATCTACAA CTAGTTTCCT AGTTTGCTCT TTGATTTTCA
ATCTGAATCA
GGAGTAACTA
ACCAGAGGTA
TAAGTGTTAC
AATTTCCTT'r
CCAAAAAGTG
AGTGATTGGG
AACCACGTCA
CCTCAAAACA
TTGAGTATTA
GTGGATCAGA
TATTATTTPT
3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 GGGAAAAGGA GATGAATA'rG AAATTTGGGA AACGTCATTA TCGTCCGCAG TGGACTGCGG TGTAGCTTCA TTAGCCATGG TTTTTGGCTA CTATGGTAGT TGGCTCACTT GCGAGAA'I-G GCTAAGACGA CCATGGATGG TCAAGGTGGC AGAGGAGATT GGTTTrTGAGA CGCGAGCCAT TTGACTTGCC GGATTTAACT TTTCCNTTG IrGCCCATGI' TCCACTACTA TGTGGTGACT CGCrAGGATA AGCATAGCAT CCGGCGTGAA GTTGACTAAA CTCCCACGTG AGCGTTTTGA tTCTTTTTAT GGCACCTAGT CCAGACTATA AGCCTCATAA TCTCTNTTAT CCCTATATTA GTGAAGCAGC GTGGCTTGAT CACTCTTGGT AACCGTGArr AACATTGTGG GTTCI-rATA CCTATGTGCC AGATCACATG CGTTCGACAC TAGGGATTAT GACGACGGCT N'GGGCTTGG TAAGGCAGAT ATGACGCTTTI GCTTAAGGAA GGGAAA'rTGC TCATATTGCC GATCCAGATC GGAAGAATGG ACAGGAGTGA GGAACAAAAA AATGGTCTGC TGCCAATATC GTTTGGCAA TCTGCAG;TCr ATCATTGATA 'rTCTATTGGG CTAGTCATCG TCTACATC7*r CCAGCAAATC AACGC 'rGTC GATTGACGTG CCTTCI-rTGC GACPACGCAGG TCATCGATGC GCTGGCTTCG TTAN-TCCCT TGTrCTA1-I- TTCCTATCTA CACACTGATT ATACCATGGA AGCCAATGCG 664 'rTGTCCTTACG CTCAGGAGTA TCTCTTGCI-r GTTTTGGGGC ATT?1'GTCCT ATATCAAGCA 74=N1TCAC CTCCCTATGT ACAGGGGAGA TCGTGI'CTCG TTTTACAGAT GCTAACAGTA ACCACCT CGATrTCCT AGATGTGTCA ACGGI'GTCA TCACAAAATA CCAATCrCTT TTrCATGAC'r 1TAIrGGCGC ATCrrI'rGCCT TTArGAAGCC GTlrMAAAAG ATGAATCGGG GTTCTGTCTT CTTCTATCAT TGAGGACATC AACGGTATTG 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000
AGACTATCAA
TGGATTATCT
AAAAGGTTC
TGGATGGCAA
C'TAATCCTTT
ATAACCGTCT
AGGATTTGAG
ATGGTCGAGA
TGGGGGAT
ACCCAAGTCA
CCCTGCGCCA
TGGAGAATCT
TCGAATTGGC
TCACTTCGGA
GTCCTTGACC AGTGAAAGTC GAAGAAATCC TTTACCTA'rA GTCGAGCAGA GAGTCAGCAA CCATCTCTTG CTTAATGTCG GCATTCTCTG GATGGGGGCT GATGAGTTG GGGCAGTTGA TTACCTATAA 'rACCTTGCTG GGAAAATATC ATCAATCTGC AAACCAAGCT TCAGACAGCG AAATGAAGTG TATCTAGTAG CTTCTGAGTT TGAGGAGAAG CTTGATGAAG GGAGATATGA CCTTCAAGCA GGTTCATTAC TGTCTTATCG GATATCAA'rT TAACCGTTCC CCAAGGGTCT TTCAGGGTCA GGTAAGACGA CTTTGGCCAA GATGATGGTT AGGGGAGAT2' AGTCTGGGTA GTG'rcAATrCT CAATCAGATT GTACATCAAC TATCTGI'CTC AACAGCCCTA TGTCTTTAAC TCTTTTGGGA GCCAAGGAGG GGACGACACA GGAAGATATC PGAGATTCGA GAGGATATCG AGCGCA'rGCC ACTGAATTAC TGGGGCAGGG ATTTCAGGTG GTCAACGTCA GAGAA'rCGCT
AACGCTCTGA
G=rCTGGTCA GTTTACT'rrA CAGG'IrGCCA 6060 AAAACAGTTG 6120 AAGTATCGCT 6180 AAGGTGGCTT 6240 AGCGTTACCA AAAAATTGAC AAGGAA'rTTG
AAT""M'ACG
GATAAAAAAG
GGAACGATT'r
TTACGGGCGG
CAGACAGAAT
TTGGCGCGTG
CTCTCTTGAC AGATGCGCCG GTC'TTGATTT TGGATGAGC CACTAGCAGT 'rTGGATATT TGACAGAGAA GCGGATTGTC GATAATCTCA TTGCTTTGGA CAAGACCTTG ArrTCATTrG 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140
C'TCACCGCTT
TTGTCGAAGA
GACTATTGCT GAGCGGACAG AGAAGGTAGT TGTCTTGGAT AGGAAAGCAT GCTGATTTGC TTGCACAGGG TGCCTTTTAC TCAATAGCTA GAAAGAGGAG ATCGTCGTTA CCATAATTTT TTTTACTTGG CTTTGCAACT TCGAACCTAG TCGTATCCTT AGGA'rGAAAC CAGAA~rTTT AGAAAGTGCG TCCAGTAGTC TGATTGTACC CATGGCCCT? GTTGCAGAGA AGGAGATGAG 7TTGTCCACT
CAGGGCAAGA
GCCCATT'rGG GAGTT7TTATA
CTGCTTGTGT
AGAGCTACTG
GCAAATATCC AGTCAACTAG CAACAATCGT ATTC?1'GTCA ATCATTTGGA AGAAAATAAG CTG=TAAGA AGGGGGATCT 7TTGGTTCAA TACCAAGAAG a.
9. *9 9 S 5 9 GGGCAGAGGG TGTCCAAGCG AAAAGCAArr GGAGTATCTG AGGATAAGTT TGGCTACCAA GGGCTAGTAC ATCGCAACAA CCCAAGCCGA AATCCGCAAC CAGCTAAGTC AGCTATTGAA TrrACCAGTC CTACAAGTCT CACAGGTTGA AGCACAGArr ATGCACGTrC AGGTACCCAG TTAAATCCCA ACACTTGGCA TGGAGGCAGA GTCAGGTAAG CGAGTGAGGA 'rGGGGTGCT'r AAGGTGCCCT ACTAGCCCAA CAGCTTATCT AAGTTCAAAA CTACGACTCA TGATGCCGG CGACAGCTAC TAACACTGAG CTTCGGACCA GGCTGAAAAA GCAAGAAAAG TTACCTACGT GTTTTAGAG 'rrAAATAATT ACAA7TTG AAAAACATCT GTGGTAAAAT GTGCTCAAGT TATTCGGGAA AAGCTAAAGA TACAAGGACC AGGCGACTGC GTCT'rGAATA ATCAGATCTC ACTCACTG TGGAGAAACT dCTTGGAAG TCGTGCTCCG GATGAGGGAA TCGCCTTGGA GATGATCCAT TTATCAATGA ATTGCCTACT TGAAGGAAGA GAGTCCTATG CCAGTCACTT GGACATGCTA AAGGATCAAA CAAAAGAGCC TCCAAGAAGG GGAGAACCAC TTTCCAGAGG GCCACCTTTC GCGACTACAT CAGTCAAGCA GGCAGVCrA AATGAGACCA TCGCGTCCCA GAATGCAGCA GCTAGCCAAA CTCATCAGTC AAACAGAGGC TAAAAT'rCGC GATTACCAGA ACAGGTGCTT CCTTGGCCGG TCAGAATCTA GCCTACTCTC CAGGGCGAGG AAAATCCCCA AACTAAGGTT CAGGCAGTTG TrCTCAGTTAG AATCTAGTCT TGCTAC?1'AC CGTGTCCAGT CAAGCCTATC CGTCAGGGTT AAGCAGTCAA TTGGAATCCC AAGGTTGGTC AGGAATTGAC CCTTCTAGCC CAGAAAAT AAGGTACAGG GAAATC="~ AGACAAGGGG AAAGT'rACCG CA'rCTTAATC CTGAGACCAG TGATTCTAGC ATGGTTGCAG CTTTATCCAT CTrTGGAAAG AGAAGGGAAA GCCAAACTCA TATGTAGCAA GAATCAAGGT CGGTGA~rCT GTTCGCTATA AATCAAC=r TCCTAGATTC TACTATTACA AGTATTGATG AAAGAATT TCTTTAAAAT CGAGGCGGAG ACTAATCTAA.
CT'rAGGTACG GGGTGGAAGG CCGCTTGCAG ATGATTACGG TATTATTTGG ATCAATTTT GAACAAAGAG TAATGT-rCG' TTTAAACTGT GAGAAAGA'rT CT'rCTTGCAG TTr'N'TCTTT ACTATTTATT CGGTTAAATT CTTGTGTTTT TTGGTTTT'rr AATACGAAAG GCGAACTTTA AAATGTCAAA ACAATTGATC TATCTATACA ACTGAGGATG AAAATCTTAT 'rAT7TCAACT TTTCAACGGT GTCAAGAAGG AGCAGATTGC AGGTAAGGGA ATCTT'rTATT TMN'AGAAAT TAAATGTGGC TGGTGTGGCG TTCAGACACG GAACAACTCA ATAAAAAGG'r TAAGATTATT CAACTATACT GCTGG'rTCCT TTTCAAAACG TTTTGGTGTG GACTCCGATT GTCGAA?'ITr ACTACAAAAA TGATGATTTG TGAGCATG'rG AAATTCCTAC AGATTGCGGG TGACCAGCAG 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 SC 9* 0
90 S S 1LACGCGTCGT ATCAATGAAC TATTGAAAGT CTGGTTrC 666 GAGATTGGGC TTAAATTGAT TGACTr'rAAG CTAGAGTTCG GrrTGACAA GGATGGCAAG ATTATCTTGG CAGACGAAr'r TTCACCAGAT AACTGCCCCT TGTGGGACGC TGATGGCAAC CACATGGATA AGGATGTTTT CCGTAGAGGA TTGGGAGAAC TAACCGACGT C; 17rGGGAAA AGTTGCAGGA ATTGAAATAA TCTGTTTGCA ACGGAAAACC AACTAAAAGG ACTCAGGCTG AAAAGGTCCC CCAGACCTTT TCACTCTGTA TGAACTAACA GATGI-IrACG AAATTGTCTG GGAAAAC?1'G CAGGGTr'rAA rATGAGA1TT
TTCGTCTCTC
GAGAACTAGG
AATAACAACC
GATGGATAAA
GGTAGAGAG
TCAAGCCTGT TNGGGAATAT TGCAAGAGCT CCTAT?=?G 'IrGAAAAAAA GGCTGATTTTT CTCCAGCACA ACTTGGGACT GTCAAGCTTG GTATTTGACT TGGCTGAGGA CTTGTTTGCA GAAATAAAGG AATAACAAT'r CAGGTCAAGT CAGAGAGTTT AAAAGTATTC CTATTGTGCA AGTATATGAT CCTGCAGAGA AGCACAT'N'T CTCTGAGCAG GTAACCGACC ATGTr'rTAGA TGAAGTATCT GTGCAGGCGG ATCTT'CCTAA CTATGCTT'rC T'rTGCCATTG AAAGTC'rGCC AGGGCAGTTT GACCAGCGTG CAGCTTCGTC ACAGGAAGCC ?TGCTTr'TGT TGGGAAGTTC AATAAAGATA TTGATGCGAC CATTCTCGTT TCAAGGATAT AAGACCATTC CCAAATTGAC AAGGCCGAAC AAGGGATGGC AAGTCAATCG GGCGCGTGCC GACCACTGCC GTCATACGAC AAATTTCAAA AGCAATTGCA GGTCGGTCTG AAAAACCACA CCTCAATG GACGATTGGA GAAATTGAAG TGGACGTTGA ACCCACAACC ATCCAACAGA GC'TATTCGTG ATCCGTTGTC
GAGTGACGTG
TGAGTTGGAA
CACGACAGGG
T'rTCTT'rGAA
CATGGAAGTG;
AACTGAGACT
TTTGAGACA
GTCAACCTAT
ACACTCAACA CAGCCCAACT TTACTTGGTG GCTGTCAAAkA ACTACCTGCT CAATCCAGTT ATTGCCAAGC AGGAGTTTTC AGAGTCAGAC AGCTATGCAG CAGAAGACTT TGCTCGCTAC 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 GATGA'rTTGC
GAACTCAAGG
GAGTTGAAAC
GACAAGTATA
AACCTTGATG GA'rATGGCGA TGATATGGAA GTCTCTGACG TGGTGTCAAG GAACCTTGGC AATI'GAGCCA TTTGG'rGGAG AGGCCGT'rCC TATGTTTACC TCTTTATCCA AGACTACTTT TTTTGGACAC TTACTGGTCT ACATCGACTT TTCAGCTTCT TTGCCATGCG CGAGGAATTA CTATNTTCGG TCGTTATGAG AAATCAATGC CT~GCTCAGTT TCCTCA'rGTT 'rAAAAACGAA CGGCTACCTG TATTGGTGGA AAGCCATGCG TATTTCAGGT GGAAATTGCC ACAACAAGTC ACCAGATTGG GCTTGCAACA AACGTATGGA ACTTGGTGCC AACCTGAAGC AGGTGATGTG CCTGGTGATA TTACAGCACC GATTrCGGAA ACTCGCCCTG ATTTCTAAAA CAGCAGCTCA TGG rATTCT TCATATGGTA ACCTACC'rTC GTGAATACTT CCACCCAGGC T'rTGTAGCTA GTTGTTGGTG CGACTCCCAA GGGCAATGTT CTCCGTGAAA ATCATCCTTC TCGGAGGCAA AACAGGTCGT GATGGTGTCG GTGGTGCGAC GGGCTCTTCT AAGTCAAA CAGTTGAGTC ATCGAAGAAC GCAAGATTCA AAGTCCAATG ACTTTGGGGC ClrGAAATCG ACCTCAACAA GCCATCTCTG AATCACAAGA TCGTTGCCG AATGTAACAA AAACCAAATC TTGTCATGCA CTTGACACCA ATGGTGTGCG CTCCCAGAAG ACTCAAAC TCTGACCTCA ACCATGCAAG CGCTCAACGG r'rAATCACCC CTGCAGAAAT TGCCAGTTCA 7TCAACCCAT ATGTAGCTGA GCAACTGCTC GTTTGGTGGC GAGTAT'rTCG AGCGTATGGA
CTAGGTTCTA.T'TGAAGCACA
ATGTCTGGTA CCT'rTGAAGA ACGGCAGATA GCCGTAAGGT TACATCCCAG GTCAAGCCCT GCTCAGTTTG AAGCCATCCA GGTGG;TGTAG TTGAAAGTTT ACCTTGCCTG AACTTGAAAC CCTGAAGAAA TTGCTGGAGT GTCAACGGTC TGAACCTAGA GAAGTT'rACC CAACAGAATT 667 TGTAGAGACT GCTGG'rGCTG AGGTTCAAAA AGCAAATGCC GCGCCTCTC CGTAATGGCA ATG1'CACTCG TCTGATCAAG AGGCGGGTC 74GTCTGGCTA TCGGTGAA?1' GGCAGACCGT GC'rGCCTCTT AAATACCAGO GCTTGAATGG TACAGAAATr ACCGATGGCG GTCGTGGT'rC GTCCTGAAGA TGTGGATGCG AGAAAATATN GATGCTGTTG TGGTGGCGAC AGTAACTGAA CTGGAATGGT GAGACAATCG TGACrTGGA GCGTCGTTTC CGTGGT'rGTC GATGCCAAAG TTGTGGCACAA GGATGTCAAA ATCTGCTGAA ACACTGGAAT CAGATACCCT TACGGTTCTA TCAAAAAGGA TTACAGACTA TCTTTGACTG C'rCTGTTGGA ACrTTGTGGT CGTTACCA.AC TCACACCAAC TGAGGCATCT ACACGGTGTG ACTCATACTG CGTCGGTCAT TGCTCAAGGT ATGGTCTCCA TACCACGGTG CTGCTTATGC GGTTATCGAA TGCTGGTGCC AACTGGTTCA AGGCTCGTTT CTCTTACCAA TAAACAAGCA GAGCGTTTCG GTCAGCCAGT AGCTGCTCT'r AAT'rCAGCT'r GGC=-GCCAT CTAT.CGGT.GG TAAGGACTCC ATTGACCGTT CCGCCAACCT TGGCTGCCTT TGGGGTGACG GCTCTCTCCA GAATTTAAAG C'rGTTGGGGA AAATATCTAC CTCTGCAGAG ATT'GATTTTG ACTTGATTAA GAAAAATTTT AGCTGACCAT AAAGTGACAT CTGCATCAGC TG'rCAAATAC GGCTCTTGCT ACCTTTGGAA ACTATATTGG TGCAGAGGTG AGCTTTGACA GCTCAATTrAG GCGGCTTTGT CTTCACATCT AGAGAAGT GGACAAACGA AAGCAGACTT TACACTGACT TGGACACAAC CTTGACAGTG CATTTCAAGG GACATTGGAA TACCCAAGCG AAAGAACTAG AAGAAGTACC AGCTGTGGCA 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 1..1640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420
TCAGATGTTG
TTCCAGGAA
TGATTAAAGC
CCAACTCAGA
CAAAGAAAAG
ATATGATTCA
GACCTTGAAT
GACTAATATT
GTNGAAAAAC CTGTGGTTTA CATCCCAGTC GCTAAGCCT TCGAAAAAGA AGGTCCAGAG GAAGAAGCTA T'rGTCAAGTC AGTTGAAACT CTCTTCTTTG CTGGTGGATr CTCGGCTGCG CTCAATTTGG TGCCATTCGT ATGGTACA ATATCGACAA 668 GATGAACCAG A'rGGTTCAGC TAAG?1'?ATC GTCAATATCC GTGCCTATTG A'rAGCTTTAT CGCCCGTGGT GGTI'TGArrA CAACCrrAG TCAAATCGCG TCTCCTACCC TACGGAAACT AGCCCAACCC TCrCTACAA 'rGATGCCAAC CAACACGTGG A~rGCCAATA CCAACrCACC ATGGTTGGTT GGTGTGCAAG TGCTTAATGA AAAAGTGCGT TCGGTA'rN'G 1TrGAAGCTGC
CCAAGATGGT
TGGGCGATAT
AGGAA7'rTGC
GTAAACCAAG
TAATGGATC
TAACAGTACT
GCAAACTCGC
CCACGCTA'rr
AGAGCTCCCT
TATGGATTCT
CCTGTTTCGC
GACAATGGAC
AAGTACAA'rC
ACGGTGAAGG
AAMT'CAG
CGAATGGTTC
GAAGTTTGTC GTGACGGCTG CCAA'rACGTT GACTTTAACG TGTCCATGCC ATCGAACGAA CTCAGAACGT TATGAGGATG TACCAGCAA GAATGGTCAA GTC7TTTTCCA AAATATCCCA ATCATCGGTA AGATGGGCCA GGCAATAAAG ACCAACACCT T'rACAGATTT TCTAATAGAT TGATTACAAA TGAAAATTAG AATGTGGTGT TTTCGGTATT TCCZACAGTCT TCAACACCGI' AACTGAAGCG CCATCGTGAC TGGATAAATT GACAGGAGCT CTTCTGTAGA TAACATCCAG CTCATAATGG AAATCTCACC CAA'N=rCAG CGCGACTTCG ATCCTAGCCT GATGGGCAAA ATATCTTGCT GTTTGAGGAC TTTCGATTGG TAAAATG4GCT AGGTCATTGG TGCCGAGTGG ACGAGGGCAT TCAGTATGAC AGTATATCTA CTNTGCTCGC GTAAGAGAAT GGGAGCGCAA GTGTGCCCAA TTCTT-CCCTA A'rCAAATGGC TCTGATCAAA AATTGCGGGA GCAAGGAGTG AACCTGTGGT CA'rGGTGGAT GTTCGCATCA GCGGTTAAAC ATTTCACTGG AAAATAAGAC AGTATCAGTA ATGTAAAAGT CATGTAAATC TAGCTCTTGA GTATAAAAAA TGACATACGA AGTAAAATCT CT'rAATGAAG TGCGGACATC CAGATGCTGC TAAGTTGACC TATTTTGGAC GGTCAGGAGG GGGCAGGAAT CCTCTCCAAT GATCAAGGAC ATGGGGCTTT TA'rCAGAAGT TTTCACAAAT CCAGCTAATT' GGTGCGATTG GGCATGTGCG TTATGCGACT GCTGGCGAAG CCCTTCCTCT TCCGTr-'I'TCA C,;ATATGCAG rTTGGTTTGG AATGCAGCCT CTCTCAAGAA AGAACTGGAA CAAAGAGGAG GACTCTGAAA TCI'TGGCTCA CCTCA'rTCGT CGCAGTCATA ATCAAGGAAG CGCTCAGCCT TGTCAAAGGT CGGTTTGCCT AAGTTGATTG CGGCTCTTGA CCCAAATGGA TTCCGACCC AATGGAGCAG ?rGT'TGTATC T-rCTGAAACC TGTGCTTTTGr ATTCGTGA'rT TGAACCCAGG TGAGATTGTG ATCATTGATG AGCTATACAG ATGATACCCA GTTGGCGGTT TCTTCTATGC CCTGATTCTA ATATCCACCG TGTCAATGTC CATACGGCAC 'rTGGCGCGAG AA'rTTAAGCA TGAGGCAGAT ATTGTAGTTG AGCCCGGCTA TGGGAT1'TGC GGAAGAATCA CGCTTACCAA AACCAATACA CCCAGCGAAC TrrTTATCCAA CCGACTCAAG CGGATGAAAC TG'rcGCMrT TTCGGGTGTT GTCAAAGGCA GATTCCATTG TACGTGGAAC AACCTCTCGT CGTATCGTTC 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 669 AGCTCTTGAA AGAAGCGGGT GCGACTGAGG TTCACGCTPGC CATT'GGAAGT CCTGCACTAG I CGTATCCATG TTCTACGGG ATTGATATCC AGACCCGTCA GGAGCTGA~r GCAGCCAATC ATACGGTCGA AGAAACTCGC CAAATCATTG GTGCGGACAG TCTGACrrAT c~rTCAATTG ATGGCTTGAT TGAGTCGArr GGTATCGAAA CACATGCGCC 
GAACGGTGCT CTCTGTGTCG CTTAC7*rTGA CGGTGACTAC CCAACGCCTC TTATGACTA CGAAGAAGAC TATCCTAGAA G'N'GGAAGA AAAGACCAGT TTrrACAAGT AGGCGACAGA TTCTCCATTA AAGAAAACGA AAAAATAAAT GACAAATAAA AATGCATATG CCTCACGTCT CACTACTGAC TAAAGGC'rTA AGCATTTAGT CAGTAGACC TTTrAGATAAA AAGATGGTTT GAAAAATAAA ATGGCAAATA r'rATGAAGTT GTTGAACCGA GGGAGCTCTT GGTGGC7TTG CGTCTTG.ATT TCAGGGACTG CAAGCACGAT ACCATCGGGC AGGTGCGGAA CCCCTCTATT GCTAGAACAA G'rGGT7TG CGGTGGGGAA ACGGCTGAA.A GTGGTATGTT TGACCTTTCC ACGGTGTCGG AACCAAGCTC AGGACTGTGT GGCCATGTGT TCTCGACTA CGTAGCGACA CTGTG.GCAGA AGGTTrGTGTG TGCCGGGCAT GTACGGCGAA AAGACTGGGG TTAAAGAACC ATGTTGGCTA 'rCAAGTACGA GTCAACGACA TCA'N'GCTGC GGGAAGAATG AACCAGCTAA CAGGCTGGTG CTGCCCTCAT GACGACTATG ACTTGGCTGG TI'TGTCCTAT AGGATCAAAG CTAGAGCCCT GACTAGTATT ATCTAAAAAT ACGTCGCAGT CTTTCTCAAA AAAAGAAAAG AAAATGCGTA CGCTCAATCT GGTGTGGATG TrTGAAGCCGGG TTAAAAAGCA CGTGGCCCGT ACGGAGCGTG CACGTGTCAT 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 TTTTCCGGTC GGTGTGGCTG AAAAATCTCA AGATGTTCT'r CTCGGACTTG CTTCAAGTGG TCGTrGTCTTT GCGGATTACA CAGGTGAGGA TAAGGAAGTT CTACTTGAGC CGACTCGTAT AATCATTGAC GCTTCAAAGG TGGTAGAGGG GATTCACTCA AATGGTTACT C'rrTGGTTCG AGTCCTACCA GAA'rTGGAAG GCAAGAAACT CTATGTCA.AG GCTGTCTTGC CGCTCZATCAA AGAAGAGTTG GTCAACGGCA TTGCCCACAT CACAGGTGGT GGCTTTATCG AAAATGTCCC TCGTATGTTT GCAGATGACC TAGCTGCTGA AATTGATGAA AGTAAAGTTC CAGTGCTTCC AATTTCAAA ACCCTTGAAA AATACGGTCA GATTAAACAC GAAGAAATGT TTGAAATcr CAATATGGGT GTGCGACTTA 74GTTGGCGGT CAGCCCTGAA AATGTAGAGC GTGTAAAAGA ATTGTTCGAT GAAGCAGTCT ATGAAATTGG TCGCATCGTC AAGAAAGAAA ACGAAAGTGT CATTATCAAA TGAAAAAAA'r AGCGGTrTTT GCCTCTGGTA ATGGCTCAAA TN'rCAGGTG A'1TGCCGAAG AATrTCCAGT GGAGrTTGTC TTTTCAGACC ATCGTGATGC CTATGTCT GAGCGTGCAA AGCAGCTCGG CGTTCTGTCC TATGCTTTTG AACTCAAGGA GTTTGAGAGC 670 AAGGCAGACT ACGAAGCAGC CC7rGTCAA CTCTGGAAG AACACCAGAT TGACN'GGTr TGCCTAGCAG GCTACATGAA AATCGTTGGA CCAACCrAT TGTCGGCTTA TGAAGGTCGG 
ATTrGTCAACA TTCATCCAGC CTACTTGCCA GAAMNCCAG GCTTGCAATG CTGGCGTGGG TCACTrCTGGT GTGACCATTC GATACAGGCC AGGTCATCAA ACAGGTTCGT GTGCCACGAC AGAT7TTGAAG
CTATTTACAG
GTG7TTTGTTG
CTCGATATIGT
AAACAAATCC
CTCGCATCCA TGAAGCAGAG ATTGACTI-r TGATGATTCA AAG.ACGGCTT CAAACGGAGG ATTTTATGTC TATC'TGATGT TGAA'rTrATA GCATT=T
TACAGGCTGT
TATGATA'rCT
TATTTGTAAT
TATTAACTTG
TAGCTCCAAG
GAGCTCATGG GATTGAGGAT ACTGGGTGGA 'TCCGGTCTG TAGCTGATGA TACCATTGAC ATCCCGAAGT AGTGAAGGCT TTGA=MTTAA ATTGGAGTCA GTTAGAATCT AAAAAAACAA GGGAATCTTA TTTAAGT7'rG GTA'rATCAAT TGGATTCCAT TGAAATG?1'A TTTAATCTGA AACTAATTTA TCTAGT'rTAA 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 TTTCAGAACC ACTAATAGTC 'I-r'TCTTTAT TCCATTAGGT GAATAGTCGG GACAGGT'rTC CAATAGGTAT AACAGATATA TACTGATTTA TCAAATTTTT TCTTAGGTAT GCTrAGCCTT GTGIAACT AATGATTAAA CAGACAAAGC GGGCATTGTT TCTCAACAGG TGGAACTAAG ATGATGTGAC TGGTTCCCA TCCACGGAGG GCTTCTCGCT ACAAGATTGA GCTCA'rGAC TTAAACCAGA TGTGACTTAT TGCT1'C=~C AGCAGCGaAA ACGCTG'rGGT TTGGATGAA GATGGAAAAA TTGTTTTTGC GT~wrGT'rTCC CTTGATAAA TTGATTAGT1' TATTGTTTGA ACGGATT'rGA CTTTAAATAC
GTC'TTACAG
GCTAGGTGTC
TATATTTTAG
TGTGTAGGCT
ATAAGAGTGT TCAAATCACA GACTAGAAAA TGGATCAATA GTTT'GCT'r ATCTTGTTT ACTGTTACTG CATTTACT'rA AAGGAGAATA TAA'rGACI'AA ACGCGTC'rTA ATCAG;CGTCT GAATTrTGCCC AAGAACTCAA AAAACTTGGT TGGGAGATTA GTTGCCCTTG ATAATGCTGG GGTGGATACC ATTGCTATCG GAAATGATGG ACGGTrCGTGT GAAGACCCTC CACCCAAATA CGTCGTGACT TGGATAGCCA CTTGGAAGCG GCI'AAGGACA 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 17340 7400 CTTGTGG'rGG
GCTGATGCAG
AATCATrGCCA
TTGGCAGCAA
GTTTAGCAGC CAAAGTATT CGTCACACAG TCACAGCTCA AGTGGGTGAA AGCAAGCCTG AACCAATGCG rrACGGTGAG AATCCTCAAC CTACAGACTA CTCCATTGCT TCAGCCAAAC ATA'rCCGTGA TGCAGATGCT GCTATCCGTA TCAACCTTTA CCCATTTAAG GAAACTATCC 'rrGAAAATAT CGATATTGGT GGGCCATCIA GTGTTACAGT TGTG-GTAGAT CCTGCTGACT ACGGCGAAAC C'rCTATGAA ACTCGCCAAC CGGCTTATGA CGCCTTGATT GCAGAATACT AAAAACTCAC TTTGACTTA'r GACCTCAAGC AAGACGCGGA CTTTTACCAG AAAGCTTGCC AGCTCAACGG GAAAGAATTG TCATTTAATA TCATCCGTGA CTTCAAAGAT AGTCCAACCG 17460 17520 17580 17640 17700 17760 'rrGTGvGCTCr CAAACACATG AATCCATGTG GAATTrGGTCA AGCTGATGAC ATCGAGACTC C7TGGGACTA CGCTATGAG TCTGACCCAG TGTCTATCTT TGGTGGGATT GTCGTCCTCA ACCGTGAGGT GGATGCTGCG TTGCACCAAG CTATACGGAT GTATCCTTGC CTTGCCA'TTT GTGTAGTCGG TGGACTTCTC GGCAAGTCGT GACI'AAACGT GGAAGGCTAT CAAGTACGTC TTGGTGTTGG TCCAGGTCAA CCAAAGATCG TCTGGACGG4G ACGTGGAAGA AATCGCCAAA ACACI'G3AGA AGATGCACGG CGTTrCCTC GAAATCATCA GAAGCGCTAG CCA~rTTGAT CAATAAAAAG AAAAACTTGC AATGCTCAAG AGGCTAGCGA AGTGGAAGCA GAATACACAG GTGCAAAATC AAGACGTGGT CAAGGAAAGC CCAGCTGACT CAGCCAAC'rG AGACAGAAGC GACTGCTCTT GAGTTCGCTT AAATCA.AATG GTATTATCGT GACCAACGAC CACATGACAC ACCAACCGTG TGCCTTCTGT 'rCGCCTTCCC AT3'GACCAAG GCGGTCCTTG CTTCAGATGC CTTCTrCCCA TTTCCGGATA GCAGGAATTA AGGCCATCAT CCAGCCCGGT GGCTCTGTCC 17820 17880 17940 18000 18060 18120 18180 18240 18300 18360 18420 GTGACCAAGA ATCCATCGAA GCAGCGGATA AA'rACGGCrr GACTATGGTC TTTACAGGTG TGAGACAT'rr TAGACATTAA GAAGATAAAA GGGAAGAAAA CTTAAAATAC TAAC'rGAAAC AAGATTAAAA CGAACTr=r ATTCGCAAAA GAGGTTGAG4G AATGAAACTG CTTGTTGTCG GCGATTGCTA AAAAGI'TACT TGAATCAAAA GACGTGGAAA CAGTTTCTTI' CC'rTTTTTGG TGATATAATG TTGGTAAATA GTTCTGGTGG TCGTGAGCAT .AAGTCI'rTGT AGCTCCTGGG 18480 18540 18600 18660 .187.20 AATGA'rCGGA TGACTCTGGA TGGTTTGGAA 71TGGTAPLATA TCTCTAT~TT AAATTGATTG ACTTCGCAAA GACCAATGAT GCCCTTGCTG CTGGTATCG'r GGATGATTTT GTTGCTTGGA CCTTTATCGG CGAACATTAT 18780 TCCAGATGAT 18840
ACTAGGGCTG
TACGGCGTTC
ATCGAAAAGC
CTCGTCGTTG
CAGCGGAGCT
CGACAGCAAC
ATGG'rGCGCC
CTGAGACGGT
AATAAA'rTTG GTGACTCAGG TTTTCACTCT TTGCCT'rTGT CACAAACGTrG CCTATGATGG CCAGTCCCAC ACTTACCACA GTTCTAGAAG GGGTGATTAA
GGAGTGGTCC
A'rATGGCACA
TATCGTAGTC
TGAGCAAGCG
TGCGCGCGTG
CAATGGTGAT
CGACAAAGCG
GAGTGTAGTT
AGAAGCTCGC
AACCAAGCTG GACrrAAGGC CTTTGTCCG AAGGATT'rCG CCAAGCAAAT CATGGTCAAA T"T'CAGA'1r TCGAGGAAGC CAAAGCCTAT AAGGCGGATG GC?1'GGCACT TGGGAAGGGT GTCGAAGCCG CTCATGAGAT GC?1'VrGGAC GTrATTGAGG AATTCCTTGA AGGAGAGGAA AAGTTCTACA TCATGCCAAC GGCTCACGAC CCTAACACGG GTGGTATGGG TGCCTATGCG GATACAGCGG TTGACACCAT TGTCAAGCCA CCT'rA'rCTGG GAGTTCTI1TA CGCAGGGCTT GAGTTCAACG CTCGGTTCGG AGATCCAGAA 18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19500 ATCCTGACAG CTGATGGACC GAAAG'rCATT 672 ACTCAGA'rrA TCTTGCCTCG CTTGACCTCT GACTTTGCTC AAAA'rATCAC AGATATCCTG 19560 GATAGCAAGG AGCCAAATAT CA=GGGACG GCATCCAAGG GCTACCCGCT AGACTATGAA GGCGATGTCA TCACCTACTA TGCAGGGGCT TCAAACGGCG GACGAGTTA TATGCTCCTr GCCAGCATAT ACCAAGAACT ATACCAACAA ATCGGAAGCA AGGCAATTAA GTAAAGATAT
GACAAGGGTG
AGGGGCGTTG
AAGTTTGCCG
ACCACAGCAG
AAAATAGAAG
AAGAATAACG
CAGTGAATGT
ACCGAAGGTT
TCACAAAATA
CACCGCCGTT
TGACTCTGGG TGTGGrGTc AGT'rGCCAGC CAAGACAGAA AAAATAGCAG AGCACTGC 'C ATACCGTrCAA AGAAGCCCAA GACTrCTTTA CCGAACAGAT CGCCGTAGTC GCCAAACACG
ATAATGGTCG
GAGACCTTAG
AAGACCAr'rA
GCGAGCGTTA
TCGTGGTGAA AAGACCAGAA GCTCAAAG'N' TAGGAATGAA TCAAAAAGAA AAATAAAAAT GCGAGCTAAT A'rAGAACAAT
TCTGGTCAGG
TGCTTCCGCC
CG7TrAATGAT1
GTGAAAGAAC
CAATTTIACGC
GGGAAACTTG
TCCA'rCACCT
CGTATGGTTI'
GATTGGATGA
ATGAGACACC
GGAAAAAGGA
GGCAACCATG
TAATCCA6ATC GTTCAGGGAA ATTGGAAGAC CTTGGGTTTC
C C TTTGGTGGCT GCTGCCGTCC CTCACAAGCT AAGGTGATTG TTGAAAALAGA GAAGAAATGA AACCAGTAAT TTCCATCATC ATGGGCTCAA AATCCGACTG CAAAAAACAG CAGAAGTCCT AGACCGCTrC GGTGTAGCCT ACGAAAAGAA AGTrGTrTCC GCACACCGTA CACCAGACCT CA'rGI'rCAAA CATGCAGAAG AACCCGCTAG TCGTGGCATC AAGATCfATCA TCGCAGGTGC TGGrGGCGCA GCGCATCGC CAGG3CATG;GT AGCWICCAAK' ACAACCCTrC CAGTCATTGG TGTGCCAGTC AAGTCTCGTG CTCrTAGTGG AGTGGATTCA CTCTATTCTA TCGTTCAGAT GCCGGGTGGG GTGCCTGTTG CGACCATGGC TATCGGTGAA GCTGGAGCGA CTAACGCAGC TCTCTTTGCC CTCCGTCTCC TCTCTGTAGA AGATAAGTCC A'rrGCGGATG CACTTGCCAA C?1'TGCTGAA GAACAAGGAA AAATCGCAGA GGAGTCGTCA AA'rGAGCTCA TCTAAAACAA TCGGAATTAT CGGTGGCGGT CAACTGGGTC AGATGATGGC 19620 19680 19740 19800 19860 19920 19980 20040 20100 20160 20220 20280 20340 20400 20460 20520 20580 20640 20700 20760 20820 20880 20940 21000 21060 21120 21180 21240 21300 CATTTCTGCT ATCTACATGG CGTCTCTCGT GTGGCGGAAA GTTGGCAr.AC CGTTGCGATG
GCCACAAGGT
TCATTGTGGC
TCCTCACI'TA
'rATCGCGCTG
ACCTTATAAC
TGAGTTrGAA GATCCTGCGG CGCATTGCCC GATGTAGACG CCCTCCGTCA AATCTCGACG CTGACGGT GGATCCCGTT ATCAAGGATC GACAACTCCC TCAAGGAACA AAATCGTATT TTTGAAAAGG AC~rTTGTC AAACAAGCCT
GATCTGCTICC
CAAGTCACTG
TTGTCGAAAA
GTTATTCGTT
GCATTTCGCA
TGGCACCCTA
ACTATGTCCT
CAGAAGCAGA
CAAGGTCGTG ACTTCTAGCC TAGACTTGGC CAAGACTGCG ACTGGTGGCT ACGATGGTCA CTTGGAAGC.A GCCTATGCGC TAGCAGACTC
AGATATCGAC
TGGACAAAAG
AGCAGACTGC GTCT'rGGAAG AAT'=GCAA 673 CTTTGACCTT GAGATTTCTG TCATCGTGTC AGGAAATGGC AAGGAGGTGA CGTTTrTTCCC AGTTCAGGAA AATATCCACC TTCTGAAAGT CTAGTAGACA -171-;.GG ACTCTCTGTG
AATCGCCCCA
GTTTGACACC
GCCAGCCGI'
AGAAAATCCA
GATGGGACAT
GATTGA7=T
TAAGACTGGA
TAAATATGAA
GCTGGATACG
GTTTGAGTAG
CGACCACATA
CATATTCTGG
ATGCTIAATG
AGCGCCCACC
GTGACI-rTGT
TAGGACAAGT
GCAGGTTGr
GAACTAGCTG
GA7TrTGAATA
AAATCTTGGT
GCAACAATAT
AGGCTAAAGC
TGGAAATGTT
AC'TCTGGGCA
GTG=.CTCGG
TCCTCGGTCA
TCCACATGTA
TTAGTGATGT
CTATGATACA
A.AGTGTTGTA
CACAATACCC
CACTAGATCA
TTACCTAGAT
CCGT'rAcCI
CMC'ITGAGGTG.
AGATGTGGCT
pose*: o 6 *so p p CCTG'rCTAAG ACCATCGTAC CAGCCCGCAT AAAAATTAAC ACATGATCAA GAAAATAAAT ACCrTGCTTG TTGGGGGAAA TCCCTAAGGA TATGGCAGTG CGAATCGCAG AACAACTCAA TGCGACAGCT GATGACATCA TTGTCAATGA CTA=CATT GAACCCTGTG ATTrTCTrCA AGCACCATTA CCAGTCATCA AACTCCATGC GCATGTCGAG GCTGCTGAAA AATATGTCAC TGGTAAAATA GAAGCAA.AGC ATAATCGTAA GCCGGATAGT GTGGAAGAGT TTGGGGAAGG AATTATCGTT AATACATTI'A TTGAAAAGTA TGCCAGTGCT GACCAAGATA AGGTACAAGC CGAAAATTAT TTAGCTA'rCT A'rAATGTACC TTACCCGTCT GTGTTTATTG GAAAAGAGGA AGCTATCC CAACAGC?1'A AGAAGAAAGG CGCCCTGAGA 'rGGCGAATAT Tr'GGAGTGAA .GAAATCCTCT CTGACCAGGC ATGGGCTGAG TTGATTCGCA AGAAGGCGGA CTrTGACATC CGCCACGATG TGGTGGCTTT CACGCGTGCG TGGGTTCACT ATGGGTTAAC TTCTACTGAC AAGCAGGCCA ACGACATCAT CCGTCGTGAC AAGGCCAAGG AGCACAAGTT CACCATCATG CCGACAACCT TTGGTCTTAA ATTAGCAACT CGCTTCGAGC ATGCGGCTGC TGGTGTAGAA TTTGCCAATA TCCCACCATT TGTAGAGGAG CAAGAAATCT CTACACAAGT CCTTCCTCGT GCCAGCATTG CGACTTCAAT CGAACGTATG GAGCAACGCG AAGTAGAAGA GTrCTTTGCT CACAAACGCA ACCCAATCGG 'rTCTGAAAAT CACATGATTA CGGCTTATGA AAACGTCGCT 21360 21420 21480 21540 21600 21660 21720 21780 21840 21900 21960 22020 22080 22140 22200 22260 22320 22380 22440 22500 22560 22620 22680 22740 22800 22860 22920 22980 23040 p. p.
p p 4 pp PP 0 0 GACCGTATTT 'IGGAAATTGA GCAGGAGACG GTrTCTGAGA CTCT'GGTGA AGAGCGCAAG GTGGTGGATA CTG=TATGG TTACC'rCTAC CTTGAAAACT TCACTAATAT CATCGCTGAC ATGGGGCGTA CTCZATGGTGT GCACGCTGAG TGGTACAGCG AAATGAAACG CAATATCGAG GCTGGTAAGA TTTCTGGTGC CGTTGGGAAC TATGTCTGCG ATAAACTTGG CATCCCTGCC GACCTTCACG CTGAGTACTT TGCGGTTCVI' GCGACTGAGA TTCGTGGTCT ACAAAALATCT AAAGGGCAAA AAGGGTCTTC AGCAATGCCT ATGACTGGTC TGGCGCGTGT CAT'rCGTGGT 674 CTCTGGCATG AACGCGATAT TTCTCACTCA TCAGCTGAGC ACCATNTTGA TTGACTACAT GCTCAACCGT TTTGGAAATA TTCCCACAAA ATATGATCCG AAACATGAAC TCGACTTTTG GCTATGTTGA CA'rTGATTGA AAAAGGCATG ACCCGTGAGC CAAAAACAGC CTACTCTTGG GACAACCAAG TAGAC?1'TAA CAGAAGTAAC ATCACGTCTC ACACAAGAAG AAATCGATGA ACACCAAACG AGTGGATGAT ATCTTGAAC GTCTTGGACT AACAGCGAGC rrCAATCTCG CTGTTTATTT TTTATCGAAA TTAGTGAG'rC CATAGGCTGC TAGTGTGGAC TCGTGAGTTC CTGTTCAGG AAGTTTrTC AGAAC'rTTrGC 7TTCCTCAGC AGGAGCAGTT T rTGGT 1, CTTCAGCAAT AGCGGCTTGT GTTCCGACTT CGACTATTTG AGTAACGGCT TTCACTTCCT TACCATCGGC AGAAGTGCTC ACGCCCTTAG TAATGACTTG TGTI'TTTCCT GTGGTAAATG GAATTTCTTC TTCTTGGATA GCAAGACCAC TTTCATCACC cI-rGTGAGT'r ATAACTTCTT TGGTTACCTG GCTATCAAGG AGTACAGAGA TGTAATGAGT TCGTTCACC1' CCGGCTGGGA GGTTAGGAT'r TTCTTTCTTG GTGATGAGTT CTGGTCTGGT T1TCAACATTG GAATGAGTTA CAGCTGG'ITT GACCCCTGA
ATGAGTCCTG
TCTGTTACCA
GATGGAGCTG
CCGTTTTCAT
TCCTGTGCTA
ACAGAGTAGA
'rTGAGTAAGA
TCCAGTCTAG
ACAGGAGCGC
ACTGTT1TCTC
TTGACTCCTG
GTATCATCAC ACCAGATACG TCGTCAAGAA CTTGACAGTC GTCTTATCTT TAGCCAACGG AAGCCTATGA CrrGGTGCAA ACCACTTCTT GAGGCAGATT AATCTTCAAC CCAGTTTATT AGGTGATTAA TTrAAAAAA'rA AGACTTAGTC TTCTTT'TCTT CGACTACTAG TCCTGCAGAA CAGGAGCTGG ATCTTGAGGA GTTGGCTTGG GATTTCTAGT CGCCTACATG TGTTACCATA CGACACTATT TACAAGTGTT AGTTGCTACG ATGTCCATTG GTGGATTTI ACAAGTCACT GT'ITTACCTC AGTAGI'rGGT CAACTTCAAC CACTTGG''r TTGTT'TTCC ATTTrTCAGTG CTGTGATAAT ATTTTCCTGA 23100 23160 23220 23280 23340 23400 23460 23520 23580 23640 23700 23760 23820 23880 23940 24000 24060 24120 24180 24240 24300 24360 24420 A'rAACTTCAA ATGGAATTTC GCAGCCACTT CAI=TCATC AGAGCGGCT TTAGGTTGGC TTGTAATTTA GAGCTGTTT TCTAAGTTTG TAGGAATTTT TAGTCT'N'CT TGTGGTrCTGC I-rCAGTTCTT TAGGC1TTCCT
TACAAGCGTG
AGCTGCGTLA
TCAAGCTCAG
AGGGCCTCAA
TCGCGGAGAG
ATGAGTTCAA
AGCATACTTC
AATGGTGT'T
TTTTGACCAA
GCTAGGTATT
CTTGTTTAT'r
GACTTTCT
CAT'rATAA'1r
AGATTTCCTC
CGACTGTTGG
TTCCAGTATT
GAATGTAGTA
GAGGTGATGC
ACGGTTGAGG
ACTATATCCT
AGCACGAAAG
TTCCTTGTAT
AAGATCTACT
CTCAATAGCT
CCAGTCACCG
GAGGTTATAT
AGCTAA'N'CT
AAAGGCAGTC
TCAGCGCTTG GTCTATCTGC CCAGATTGAA TCAGGATATT TGGTAGAAGC TAGTTGATTG TTCTTGAGGA AACCACCACC ATCTTCTGGT TTGGTATTCA AGAATTTATA GCCTTTGCTT CCCCACCAGC CTTAGACCA GTAAGAAATC 24480 24540 24600 24660 24720 24780 24840 97S AAGACATCrr TGTCAAACTG AACATCGTCC 'ITGTCTTCAT GCCA'rTGGI' GAAGCCCrCT TTC7TTGGCC ATACCTGCGA AAT~rGCCAT AGAGTTGATA CCACTTGAGG TAGTACCAGC TTGGCGTATT CGTCAGTACC AAAGTTGAAA ATCTr'rGrT TATTTACCGA TGAGGGC?1r TACAAAGTTC A'rCGCTTCTT GTT'rrTGAAA CTTTATCAAA GTGGGCTTGA GGATTTTTAA ACCACCATAG CATCCATGTG ACCTGGACTG TTAATAGCTG GATTTAGCGT ATTVAATTAG CTCTGTTACT TCTGCCTGTG TCGTCGTAG;T AAGCTTTAGT TCCTTCGATA ATAGCTTTT GTTTTTCCGT TGGCAGTAAT GGTCATATCA TCCAGTAGAA AGAAGGAGAT GGACATCAGA ATATCCGAGC TCACTGGCCT TGGTTCAGAG TAAAGTATTT GCGTCCAGCA TCGArrGAGA AGTAGAAGCC ATCGTTGAAG
GGGTGTTGGC
C1-rGGGCACT
TACCTGCAAA
CGlTrTTTCAA
TACCTAATTI'
GGATGAGACC
TTAGTGCAGT
TAACGTCATC
ATAT'rCGGCA
AGTCGCATCG
GAAGTCCATG
GTCCATAGTT
TTCCATGGCA
GATGTCCTTA
ACCGTTTGCA
ACTAGCATAG
AGCGAAGTCC GTCATTTCCT TGTCTACCAT GCCTTrAGC r5-ErCAACCT
TTGACAGTTT
AGGTTGCTAT
GCAGTATAAC
TCAGCTrGCGA
GATGGTGAAT
GGGATATCAG
AGCCCCTGAT
CAAGCATCAT
ACATCGTAGC
CCAAAGCTTG
TGCGATTTTA
ACAGGGTAGC
TCTGTCGCAT
AAATAAGCAG
AGATCGACAG
AATTCTTTCA
CACGTTTAGC TTCTTCT'rCT TTTTGAGCTT C'rTGAAGTTT AGCAATGGCT TGATCAATCG CGAGAGAGCG AATAGCTTT TCAGCTTCTT GGTTCACTC T'rTTGGTACC TCGTTAAC"TG TTACCTTG7TT
CAGGCGTGAG
TATCTTGT'rG 7TTACGGCCGT
CTTG-CTCTGC
AcrrATTCAGC G'rTGGCATTT GCAAAATGAC GCATGAGTT AACGTGCAGA TGGAGTGTCA GCCCAAGCAG CTACCATACC CTCCTTCTGT TTTTGGTACA GAAGTGATTG GTGTGTTTT CGAGATTGTA CCAGCCTTGG CCATCAG6GT TTCGTCCAAG TGGTATTAAG GATTTGGTGA CCTTTTTCAG CTAGTAGTTT CTCCCCAACC ACCAGTCCAC ATAGAAACGA TGATGTCTT TCGTCGC'rATT GTAGTAGATA CCGTCGTTAA AAGCCA~rGG CAATACGAGC GAGGTCATTG GCGTAGGCAA TAAATTTTTC CTTCGTTTGG ATAGTATTA TCAGCI-rGAA GCACACTCCA CATTGGCATA TTCATCAAGT CCGATGTTG AGATTTCAGT CATACTTGTC GATAAGGGCT TTTGTAAAAG CGACAGCTTC TACGGGCTGA T'rrCTTCCCA AAATAGCTAA AGTTAGGGTT TGGCAT'rGAG AATCGCATCC ATGTGTCCAG GACTAT'rTAC
TTTGGCAAGT
GGTCAAGTTG
GGCACGGCTA
GACGCTTTCT
AGP.TTCATAA
GAAGAGCCGT
ACCGATGATT
AATACCATTG
AACGTAGTAC
AGAAGAAGCG
GTCAAAACTA
TTTGAGACCG
ATAGCCT'TTT
ACCTTTAGCA
CTTTTTCGCG
TTCGTTGTCA
TTGGATTCCC
TGTCGGAATG
24900 24960 25020 25080 25140 25200 25260 25320 25380 25440 25500 25560 25620 25680 25740 25800 25860 25920 25980 26040 26100 26160 26220 26280 26340 26400 26460 26520 26580 AGACCGATAC CTTTATCTTT GGCATAGTTA ATCAGATCTG TCATTTCACT TTCTGTTAAG 26640
TGATTGCCGT
TCGTCACTGG
AGTCCATCAT!
TTGGA'crCGT GTAATAATCA CATAGGTCTT GCCGrrAGC= TTCCGACTAA TAGGTGTAAA TTTGI'ACC7*r 'rr'CAATGGC GCGTTGACA GTGATGCTCA TATCCTCCALA CATGAAACGG TCAGTGTAGC CATAA'G=T CGCTTTATCG ATGAI-rr TGAGCTGI'C T~rTrirTCG CTAGTTTTC CG7*T'-GTCAG CcCCGCTTT TCTGTrAVTGr ACTCTTTAGA TCTGCCTTNT TATTAGCAGT TCACTAGGAG TTGAACTAAC ACCTTATCAG TAGCTGGAGT GCTTTAGG1G TTTCCTCAGT ATGGTCGGTT GGTTTTCTGT TG.GTGAGAA.A TATTTACGTC ATT-TACAGTT GCAGCACGT CGCTTCATCrT 'TTTTAGCTG ATCAACCTCT TTCGCTTCTT TTCTTTTTCA GCAGAAGTTG TTCCTCTTGT GGTTTTCT'r CAJGCATCAAT AGAAACAATT
CCTTTCCTGC
GTTTATCCTT
CC7TTTAGC
GAGTTACCAC
C'rGTTTTTGG
CTCTGTTGCC
GTCAGTCTTG
GCTAGCTCT
T'rCTGCT'rTA
AAGACTAGCI'
TTGAAGCACT
AACCGTATGG
ACTCTGTGCT
AATAGAAAAA
TTCTGTT 'r ACAGTTTG GAGCTTCTGG CCGA=rTCG GA'rGATTGAG GGGAATCAGA AGTAGTAGGA GTAACTCCAT
CGGCTGCAAC
CGTATTTACG
TGGAAGGCAA
CGCTGTTGT'r
ATTATATAAA
ATCCAATTAG AACAGAAGCT TT-TCATGTTT CATGCAAAA TCAACGCCI' TATTTTAT
GCTCCTACAG
CCTCCTGATT
CTTATATTAA
GCATTGTTAT ATTGATAGCG TTTCTTATAT TAACGAGAGT TATAAAATTT AAACT1TAAGA TAATTTTTGG TTCTATTICT 26700 26760 26820 26880 26940 27000 27060 27120 27180 27240 27300 27360 27420 27480 27540 27600 27660 27720 27780 27840 27900 27960 28020 28080 28140 28200 28260 28320 28380 CAAGAGGAGA TGACAAAAAA TTTCAGA7'rG CTCGGAAAAA ATAAAATATT CCACAAATTA TAGTATAATA AGCATAGAAA CGGTCACATG AGATAAATTT TCTAAAACTT GGCCAGTTGA TGTTCATCGA CTCCGAGGAC ATTTCTTCAA AGTGTTCATC ATAGAACTAT AGTATTCAAG CTATAATAAG TA'?AAAAAAA ATACGTATAT ATATCTAGTA TAGAATTTTC CAAAAATAGG AAGCCCAAGC GATTAGCTCA AATCTTGTAG 'rAATCAGATC TTCGAGTTTG GTGATT1AG
TAAGCGCTAC
GGT'r'r'CTTC
GTTTGTAAGT
TTTGTAGGAC
CT'rTTTGGTG
TTAGTGATCA
T1TCACTGTAT
AGTAGGGAAT
TATTTCGTTG
ACGATTATAG
TGGGATGTAG
CTTGTAGTAG
TGAAGCTGCA TGTTCTGGAG TTGGAAAGAC ATTCATGTGA ATGTGGTAGT CTAACTTGAA GTTTGGATAA TTTGCGTTGA TATATTGT'rC GACACCGTTT GATTCGCCGA TACGTTCAAT GATGTATGGT AGATATAAAC AATTGATCGC CGCGTAGACC CAAT1"ITrCC AAGTAAACAA GAAAGAACAG TTACCTI'ATC ATCTTTAGCA TTGAAGAGTT AGCTTGTGTT TGCGTGCACG TGAAACGAAG GTTCCTTTTC CCATC=='G CAAGGTCGTT TAAGGCGCGA ACAACTGTGA GCTTGTTTCC GCGTTCAAT CAATATCTGA AAACTCTACA CTTGTTGGCG GACAATATAG TAGAGCTGAC ATCGTACATT GAAATGAG'r CTGCTTCAGT GTAAAATTTA TCTCCACTGC TAAACTGCCC AGAGATGATT TTATTI-rA A?1'CGTCTrT TATGTATTGA TGG 28440 28473
INFORMATION FOR SEQ ID NO: 84:
SEQUENCE CHARACTERISTICS:
LENGTH: 6749 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84:
S. 55 S S CCTGATGGGT GGTATGCGAG CGAAGTCACA CCTGAGGAAT AGAAACTGAT GTGCAAATGC AAAACTAGGT CGAAACTTAA ACGAAACAAG GAAATTCAAG TGTTTTGGTC GGAGATGCAG CATTGTGAAC GGAGATGTTC CTCAGGTCTT GAGGCTGGTA AGTCAATGAA GTGAAAGAAG TCTTGGTGCT GGTAGCACTG GATACAGTTC TGAAAATCGC CGTrACTTAA TTAATGGACG TTGCTCACTA TCGTGCGACT GGTCAATTAC CAGGAAATGC CACAACAGGC ATCAGGTATG AAACAAGGCG GTGTCCTTC CAGCAGAAGC GCGTGAGGGC AAGTTGGATC CTGT'rATCGG AAACATCTGA AATCCTCTCA CGCCGCACCA AGAACAATCC GTGTrGGTAA GACAGCAGTT GTCGAAGGTC TAGCGCAAGC CTGCTGCTAT CAAGAACAAG GAAATrATTT CTATTGATAT CTCAATACCG.TGGTAGCTTT GAAGAAAATG TCCAAAACTT CAGGGAATAT TATCCTCTTC TTTGATGAAA TrCACCAAAT GTGGAGACAG TGGTTCTAAA GGACTTGCGG ATATTCTCAA
GCCAGCTCTC
TAACACCATC
TCCTTCGGCA
CCACAATGTC
TCTCGTGGAG AATTGACAGT GATTGGGGCA ACAACTCAAG ACGAATACCG TIGAAGAATG CTGCTCTTGC TCGTCGTTTC AACGAAGTGA AGGTCAATGC GAGAATACTT TrAAAATTCT TCAAGGAATT CGTGACCTCT ATCAACAACA ATCTTGCCAG ACGAAGTCTr GAAAGCAGCG GTGGATTATT CTGTTC.AATA 5555 *S *S S S CATTCCTCAA CGTAGCTTGC CAGATAAGGC TATTGACCTT CTTGGCGGCT CAACATCCAG TAACAGATGT GCATGCTGTT AAAAGACAAG CAAGAAAAAG CAGTTGAAGC AGAAGATTTT AACACGCATT GCAGAATTGG AAAGGAAAAT CGAAAACCAC TGCAAGTG'rC AACGATGTGG CTGAATCTGT GGAACGAATG A.ATGGAAGCT TCAGATATCG AACGTTTGAA AGATATGCCT GATTGGTCAA GATAAGGCCG TAGAAGTTGT AGCTCGTGCT GTCGATGTAA CGGCTGCTCA GAACGAGAAA TCGAAACGGA GAAGCAGCTC TAAACTATAA ACAGAAGATA TGAAAGTGAC ACAGGTATCC CAGTATCGCA CATCGCTTGC AAGACAAGGT ATCCGTCGTA ACCGTGCTGG 900 960 1020 1080 1140 1200 1260 TlrGATGAA GGAAA'rCGCC CAATCGGCAA TAAGACGGAG CTTGCTAAGC AAT'rGGCACT CCGTTTAGAT ATGTCTGAAT ACAGTGACCG AGCAGGCTAT GTrGGGTrATG ATGACAATAG TCCATACTCT ACATrCTCT TGGATGAAAT TCTCCTCCAA GTTCTAGATG ATGGTCGTTT CAAGAACACT GTCATrATrG CGACCTCAAA AGAAGATGCG GATAAACCAG AATTGATGGA CCTCAACCGC TTrAATGCAG TCATCGAGTT 678 CTTCCTCTTT GTAGGGTCTA CTGGGGTTGG CGA'rATGTTT GGAACCCAGG ATGCGATTA'r CACAGCTGTT TCTAAGCTAA rrGGTACAAC CAATACCTTA ACAGAACGTG TTCGTCGCAA TGAAAAGGCT GACCCTCAAG 'rrATTACCCTI GACAG.ATGGT CAAGGAAATA CAGTAAACTT TGCTGGATr GGCTATGAAG CCAACTTGAC CCGTTrGAAA CCCTT-rrCC GTCCAGAATT CTCACACTTG ACTAAGGAAG ACCTTTCTAA 1.320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 GATTGTAGAT T'rGATGTTGG CTGAAGTTAA CCAAACCTTG GCTAAGAAAG ACATTGACTT
GGTAGTCAGT CAAGCGGCTA GGTTCGTCCT CTCCGTCGCG CTTGGATCAT TTAGATGCTA TCGTGAGAAA GTCTAAGACA CTGGTTCCTT TTTAGGTACG AATATAC'TAT AGTAGCTCAG AAAAACGTGG ACTGGTTTCG GTTGAACCGC CGTATGCCGA CCTACTCGAT TTTAAATCAC ATGAGTTTTG CCCATTCTrTr TCGATGGTTG TCCCTGGGAC 'rCCTrGATAA CAGCTGCGAT CC1TGTGCGCC AACCAAATGG TTGGCTAAGA GGTGCTCGAT TGCACCAAGC GAATATCATA ATCAAGCGAA CATAGGGTGC AAGATTATAT CACAGAAGAA GGTTACGACG AAGTCATGGG TGGTTGAACA AGAAATCcT GATAAGGTGA CAGACTTCCA AACATCTGGA AGCAGATATG GAAGATGGCG TrTrGGTTAT GAA7T=TGAG GATAAAAAG AAGGAGCCAG CTGAAAAAAA ACAGGCATGT CGTATAGTAG AAGTGTATTA rTCTAGTTTC
AAGTCGGTAC
TGT7TTGGATT
ATGGTACGTA
ATGACGTTCA
AGCAGAGAAG
A'rCTTCCCAA
TTTAGCACTG
TTAAACGTGC
A'rrACCTTGA
CGGTGGTGTG
AAGGCATCAT
AGGCTGTGCT
GTAGTAGTTT
GTGTGACGTC
TATATCAAAA CCAGTCCTGG ACGACATGCG*TrAAAAGTTA TGAACAGTCA ATCATGCCGT AGTGTGAAG4G CCGGCAGTAG ATT'GGAGATG ATGTCTCCTT TTGACAATG GTGTGGTCAA AGAGGGGCTA GAGATTATrCC 2340 C'TGAAATCCC TrGT'rCCAAG 2400 CCTTGTAGTT TCCGCAAGAT 2460 CAGCGATTTC CTTGAGCGAA 2520 CCCACATAAT CATGTGGAAG 2580 CAATGCGGGT ACGGATGACr 2640 GGATAGAGTC TrCGTTTGGT 2700 TrGGTCCTGT T'rCTrCCCCA 2760 GTTCAAAACT T'rCGACAATA 2820 CATTATATCA TAAAGGTTGC 2880 TGAAATCrTT ACTACGATA 2940 ATACCCAGTT TACGAAGGAC 3000 'rCGC=CCT'r TGAGGT'rGCC 3060
ACTTCI=G ACATGGTAAA TCCTTTCAGT TCCTGAGACA GAGACAAAAC CTCTCCGAGG TAAGCGGTCG TATTGGTAGT ATGGGTCAAA A7TCTTGTCT TCATCACTCA AGATGATGGT TTcrCTCT
CTGGAGAGGT
CTTACGTT'G
TGAGTCGCT
679 GAG1-rCTTCC ATAGCGCGGG CAGGATTTCA -rGAATGAA 1-rGGATTGGC TTAACAACTT 'rl'TTTGATG GCGTTGATCA AGrGATGATT TCCCCATTNG TTTGGCGC GCAACGACAG GAGCAACTCA ATTTTCTTGA TGI-rGATAG TAACGGCGGA TGTAATAGCG AAACCAACCA
CAGCATCAGG
GGCGTGGATT
CAGGCTCGAT
AGGCAGCGGC
ATTCTGTA
GCGGCTACCG
TAGT1'ACT
TGTAGGACCA
GCTGTGATAG CAAGTGvCAAT AGATGATCGA TTTAAGACC TCTTAGCGA TGTCAGCTGA AAGAGTCTG AGTTC?1'ACC GCAATTCAAA GGCTAGGGCT CAACCTACG GTCTGCAGGT CGGCAGCTTC GCCAACTTT TGATTTCTTG TTTAGAAGCT TGTTGACACC CATATCTGTC GGTCCACCAG TTTCTTCTGG GTGATACCGA GGTCGTTCAT rCAGCTTTGA AGTCAAGAAC TCGACAGCGG CCTCGTCATC GGTGAAGCGT ATGGTGAT
TCCGAGAWI'A
GTTGACAGTG
AAGGTCAGC1'
AACAGGGAAG
CATATTGGAC
CAAGTTGCGA
CGTrCCAACA TGCGTrrTGAG GTTTCTCCAT ACGTTTGAAG GTGGCAGCTr CATAAGCCAA GTTTCAAATT TAGCGTAGCC ATACACGTTG CCAATTTTCC CTGGTTTTrGA TGTAGTCGTT cACTGGGAAG
ATGGAAGGGG
GTTAACTGGA
AGATTTGATG
AGAACCAGGT
TTTGCCCATG
ATAATGAAGA
GGGTrGGCCA
TTTATCAATC
GGAATGTTCA
TCAATCATGT
TGATGAAGGG
CCATTGATTT
TGACATCATC
GAAGATTCCA
GGTCGTG4GTA ATCCATATCC GTCCGATATC CTCAACTTCA TTGCGGAAGG GGAACCAACA AAAATCCCTA ATAAGAAATG CCTAALGTCGC AATCACAACC TCAACCTGCT ATAACCAGGA AGGACACGAG CTAGAGCTTG CCGTCAAATT ATATTGTTCA GAACTAAAAG AAAAAATTGC AAGTTAGGA GCTTGCACAA AATTTTAAAA AACAGATGCA AGAAAAATGC TTATGGATAG TAATGCAGAT
CTTTGATTGG
CATCTGCAGC
ATTCATTGAA
CACGTGCTTT
CT-r'CAA'rTC TTrCAAGAGC CAGCGTGGAA ATCTTCTAAC GGTTAATGCG CTCCAAAATA CTTGTTTTTT CATTTTTTTA ACTACGGAGC TAAAAAAGAA AGTGCTATCA TAGACTATAG TGGCACAATG CCGTAGTCTA GGAGTTGGTG ATTTGCCAGG CCAGGAGCGG TTACGACAAT CCTTCTGGGG AAATGATGTG rA.AGAATCAA TTCCCTTTTT GCGTATTGTG TAATGACAAC AAACGAAGAA CTrCTTGGTC ATGTTGCTAG CATTAATGGC TTGATTTTGT TGTCAGG'rTC AlTTTTACCGC CAAACTCTAA TGGTCGCGTT GTAAATTCAA CCTCTGGACT CTATTATAAT ArrAAAAAGA TTAAGCAAAC ArTATGAAAA TAATGAGGTA 'rCAAGTCTAT CCAAAGAGTT TATTACCAGi' AAGTTGGACT 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800
ATCTAGCTAA
ATGATAATGG
GCTAGGAATC ACAGCAATTT GGCTTTCTCC CGTTTATGAC AGCCCTATGG CTATGATATT GCTGATTATC AAGCGATTGC GGCTA?11'TT GGAACCATGG AGGACATGGA TCAGCTGAT'r GCAGAAGCTA TGGTGGTCAA TCATACCTCA GATGAACATG ACAGCCCTGA GCGAGACTAC TATATCTGGC TTAGTGGC TGCTTGGGAA TACGATGA.AA 680
AGAAGCGTGA
CTTGGTTrGT CATTCGTATC ATCATGGACT CGAAGCCTGT GAAAATACTG GCAAGAAACA GCCGGATCTC AACTGGGAAA TGATGAACTT CTGGATTCAT AAACGTATTG ?rGGCAAAAT TCCTGACGAG AAGGTAGTCA AGGAAATGAA TCAGGCGACC T'TTCGAGATA GACCAACTCC AGAGATTGCC AAGTTCTACT TCTTCCAG'rT TGAACATATC GGTCTTC-AGT AAA.AAGAGCT GAATATCGCT AAGTTAAAAG GAGTTGAGGA CGCCTGGAAT TCCCTCTTCT CAATCTCGGG AA.ATGACCAA GAA'rACCGCG TTCATCTCAT GAGAGGAACT CCTTATATCT ATCCGTTTGA AACACTGGAT CAAGTAGAAG C'TCTTGAAAA AGGTGTTCCG ATTGAAGAAA ACAATCCCCG 'rACCCCTATG CAATGGGACG AACCTTGGTT GGCGGTTAAT CCAAAT'rACG ATCCAGATTC TA'rMrCTAT ACCTATCAGA GGCTAGTTCG AGCTGACTTT GAATTGCTTG GTAAGGATGG CGACCGTCGC TTCCTAGTTG TGACAGTACA AGGAAAAGTC AAATCTGTCT TTGAAAAACA GGTCTTGGCT CCATGGGATG TTGCAGAAAA ATTTAAAATr GAAATCGTAT AAATCCTTTG TT=rTATAA CCAAAGTTTA TACAAATTC CCACTATTAA CGAGAAAGAA AGGCCTGACT TTTGCATCTG CT'rTGCTTT ATGTAACTTA CAAAACCCCT GACCTCATGA GCGATGAACC CAATGACCTA GATTCTATCT AGTCAGGTCA ATACTATCTC CAC~rTTCA ATGAAAAACT TCGCCAGAAA ATTTATGAGA GTGGTrrCCC TA'rGGATGrr ATTGACATGA ATAATGGTCC TA'rGCTCCAT CCCTATCTCA AGGATCTCTr GACAGTAGGG GAGACTTGGG CTCATCCAAA GGGCCAAGAA TTCTCTATGG ATCAGGAAGG TCAGCCTAAA TGGCACTATC AAATCTTCAA CAAATG4GCAG ACACACTTAG GGAACAACCA 'rGACCTCCCT CCTATTGCT AAAAATCTGC CAAAGCCTTT GCAATCTTAC ACCAAGGTGA GGAGATTGGG ATGACCAACT ATATTGAATC TCTCAACTAT GCGCGTGAG TCATGGACAG 'rATCCGTGTT ATTGGACGTG AGAGCAAAA.AA CGCTGGtTTC TCAACAGGTC AGATGATCAA TGTCCAAGAA GCGCTGGCAA AACTGGTCCA AATTCGCAAG GAGAATAGCT ATACGGCTGA TAAGGTCTTT GCTTATATAC TGGCTAACTT GTCCAATGAA GAGCAAGACT TGATTGAAAA CACTGCGCT AAAGAAGTAC CTTrCTGTGT GGAATTACTA TAAATArTT AAAAACAAGG GAGGACTGTA TAAAAGACAG TAAACTrl'CA rTCTTGAAAT TCAA'rAACT GATGAACATA AAGAAGCGTG TCCTTAGTGC ACCCAAATCA TTCATACCTC TCTCAACTAG 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 'rrACTTTCTG CTcCCTGGT
GCCACTTTCT
CTCGCTAGAT
CTCGACTGTT
CTGTTCCAGT ATCGTTTTC GCGTCACACG A7"rTTTTCAT TCCTCCTCAT GACGTCAGTT TTCCTCAAAA GGGCAGACTC CTTTAATGCA TCATTAACGA CGCTTTrCTT CTAGGTGGTr CATAAGGAAC AGGAAGATTC AGGTrGAC?1' TTCTAATCCT AGAATAAAGT GCTGAAAACA A'IrCGGAATA GGCATAGAGA C'TAGACAATT TGAGGAGCTG CTTGCGTCCT GTTCGAACAC ATTTTCCGG 6660 6720 6749
INFORMATION FOR SEQ ID NO:
SEQUENCE CHARACTERISTICS:
LENGTH: 1842 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
TCTACCCATG GACTTTGAGG CATTCATTGT AAACGATTCA ATTCACTTGG ATAGTGAAAC ATCCAGCTGA TATT~TCTTTC AGCCAAAATA ATATTGCTTC TCCNTTAGTT AGATAAATAA GTATCTCTTG GCAATAGAGC TCTAGCCTCT TCCATCTrCT AGTGGCGAAT CTTTTCATAC TCTCCCGCAA ACATTTTTCT GGTTAACTCA ATGGACAAGT TCTCCCAAA.A TCGT'rCAGCC TGTGTTTGCG CCATGTAAAT CAATTG=TC 'rCCAAATTCA GACTTGGATA AACTCGCTTA TTTGAAACCG CAAGAGGAAG TCTGATGGTT AGTTCAGGAT AAATCCGTTA ATC TTAGAT'r GTCACGGTTC TTAAATCGTA TCAAAACAAT CTGAAGAATA GCTCATCATC TCAATTAATT ACTGAATGAC AAGATACCTC TATGCCATAG TTTTGGAAGA CTTTGTCTAT TTTI'ACTTAG ATAGAGATCA ATCATGGGAG CATTTGATAT TCTGACACGA TTAAGGAATC TAATAAATTA AAGTTAATCG GTTTCTTGTC TTCATCATAA GCTTTTACAG CCCTCTTTTC CCTCGGCTCG ATAGCCTTGT CCATATAAAA
TTTTAAAAT
ATAAATmGG TGTCCTTrGT
TATCTCAACG
AGATAAAAAC
CATTTCAGAA
AATCTAAAAG AAGTTrGATTT ACCTCCCAAA GATTCGGTTC AGGAATCTAA TAAATTTGCG TTACTTCGGT TGTAAGTATT CAAAAACGAG ATTTTGATGA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 TCATCTACAA AGGCATCAAC CCCATTCTrT ATGTCTTGAC TTTCAAGGAA TTCCATAACG TTTTGAAGAT AGGATTCGTA AAATAGTGGG TAGTTATGTT TTTTATGGTA ATCATCTAAA AATGTCACT'r
CAAATTTGAA
CAAAGACAAC
AT'rCTTATAC
CTGATATTGA
CAAACTCACA TGGAGAGTAA 7TrGCACTTT TTrGGAATAAA TCAAATAAAT AGCCCCATCC TCCAACCGAT CTTTrAAA.AC TGAGTAAACC GAACAGCCTA AAAGTGCCAT TCATCAATCC AACCTTTGCT ACCTTAACCT CCAGTTTCAT CGTTCACTCT CAAATAAAAG TTTGGGGAGC TTATAATAAC GCTCTGATGT TTAGCGGTAA TACGCTTCAT TATTGTCCCT CCAAGACTAA AATTCCAACA 682 TTTCCAA.ATT CATCAAATCG GATTAAACCT ACTTGTTCCA TTTCATCAAC TAACTGAGTT 1260 GCTTTTACCC AAATCATTCA TACCTCTCTC AACTAGATGT AACTrACAAA ACCCCTGACC 1320 TCATGAGCCA CTTTCTTCCT CCTCATGAGG TCAGTTTTAC TTCTGCTGT TCCAGTATCG 1380 TrTTCCTCG CTAGATrTCC TCAAAAGGGC AGACTCCTCC CTTGGTGCGT CACACGATTT 1440 TTTCATCTCG ACTGTTCTTT AATGCATCAr TAACGACGCr TX'1'CTTCTAG GTGGTTCATA. 1500 AGGAACAGGA AGATTCAGGT TGACN'TCT AATCCTAGAA TAAAGTGCTG AAAACAATTC 1560 GGAATAGGCA TAGAGACTAG ACAATTTGAG GAGCTGCTTG CGTCCTNGTTC GAACACATTT 1620 TCCCACCACG TGAAGAAAAA GATGGCGGA.A GCGTTTGATT GTrAAAGTTr GGAAGTCACC 1680 TCCAGCTAGA TGTTTGAGAA AAAGATAGAG ATTGTAGGCG ATACAGCTCA TCATCATACG 1740 AACTTCG'Tr TTGATTAAGG TTGAACTATC CGTTTTATCG CCAAAAAATC CCTCCTTCAT 1800 CTCCTTGATG AAATTCrCGG CTTGACCACG TCCACGATAA AG 1842 INFORMATION FOR SEQ ID NO: 86: Ci) SEQUENCE CHAR~ACTERISTICS: LENGTH: 19390 base pairs TYPE: nucleic acid STRANDEDNESS: double *CD) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86: TCATCTTTAT CTCCTCGAAA TTTTCTAATA TAGCCATTAT AACAGAATTT TGTGAAAATT ****CCTATTATAG TAAATCACTA TTTCAGTATA AAAAGAAAAA ACGAATCAGA CGATTCGCTC 120 TTCTTAAAAT CTGAAAATAG CTTTCCAGAA AGGATTAGCC GATTTTTTGC AGATTGAGCA 180 CTGCATCGTG ACTCATCAAG ACTTGACCAT ACTCTTGTAA GACTGAGCGA CTGATATCAC 240 TATCGTCTGC AAACTCGCGC ATACGGGCCA ACAGCCAAGC TGGATATGGG CTTGGATGAT 300 TTTCAATATC CACTAAAATG GTCAAATAAT AGCGCTCGTT CATrTTGTAG AGTTCAGAAG 360 TTTCCATTTC AAAAG1TCACT GTCFGGCA.A AAGCTACCAA GTCAGCCAAC TTAGCAAAAG 420 AAAGGATGTA GTAGATGTAA GGTI'C'NTCT TACTCTCAGC TTCTTGTTCA GCCTGCTCTT 480 GCTCTTrCTTC CTTGACTTCA ACTTGCTCAA GAGATTGAAT GGCTTCGATA TCATCCTTGG; 540 *TTTTGTCTGC GATGCTTTTT TCCAGGGTTT TGATAAATTC ATCTGGAGAC ATTTGAGCCA 600 A'PTCTTCCAT ATCTGGCAAA TCCGATA.AGT CTTCAAAATC TAGATTTTGG 
TCAATCTTTG 660 ACTTGGTCAC AAAGACATCT ACCTTATCAG GTTTTGGAGT CACACGGAAG CTCAACATGC 720 CTGTATCCAG AAAGCTATCA GGCATCTCTA GCTCATCCAA GATAGCATAA AAGAACTCTT 780 CTGT'TTTTC TTGAGGAACG AGAAAGTCAG CAA'rCTCCAT TCCACGATCC CTAAAGATAT CGTGAT'N'T AAAGTTGTAT CACTAAfI-1-G rTCANTTC ACCTCATACT TTCAGTTCTA TCTA'N'ATAC TAGATrM-A CGATI=ATC TCCTCTATAC GGATAGAT CTGCAGACAG ATACAAAAAG CATCAGAATA CTTATAATTT AGAATTTGAA GACTTTGCTC AGGACTGTAT CTC7"rTCCAA 7'rAAACCT'I CTTCACTCrG TCCCTAGGGT CTTTCTATAG GAGACTCCAA CCTTCAAAAT CGGCTAAGAG CCGACT'rTGA AAAGGTTGC'r ACACCGAGGA TAGAACGATT AAATrrTT~A TAACGAGTCA CTCCGTACTC AAGAGATGAT ACATCCTGTA AATCTACAAA TTT'CAATTTA TCTAAGATAG CTTTATTTGA
ATCAAATCCT
ATTGCTACTA
AAAAGAAGC
AAGAAAArr
ACACCTTATA
TAAGTrCTG
TTCAACAAGA
ATGCATTCCT
GCTAACGATC
AATTGTTGTA
GATGTrGA
CACTCCAATA
ACCA'rTGCG
ATAGGCATGT
GTCAATTCCT GI'CCAGTATT TrTGTATGAC AAAACATCTG CTAGGTTAC ATCTCTGTTA CAAAATCAAT ?TGATACTGA GAAAAATCAC CTACTCTATT
TTAAAGAGAT AAACTAACAC ACAACTAAAC GAAGAATCAG T'rCAATTCTT TAGAGTTGAT
ATTTCCCATC
A7T=TCACA
GGTTTCCAGT
TTAAATTTCT CAATCAACCC ATCAATTTTTr GATGTCAAAA 'rrTTCACACC AACCCCTGCA TTCCACCA6AT
GCAATTTTT
CCATCAATCA
TTGTCCGACT
TCTTTGACAG
AATAAACCGA
TCAGCTACTG
TTCCT'rTGTA AGTCATTCGT TGAATrTC TCAACTCACT GGGTTCTACA ATTTCCCATA AAAATCACTA CTTTT-GGAGG AGTGATTTTA TGTTTATT CCTAACTCCT TGAGAGCTAA AATA'r'r'GCT AATAATGTTC AAACATGATTr AGCATTTTTG CTACAAATAT
ACAACCAAAA
'rTTAAGCCAA TTTTCAAT'rr TTCTCTA6ACA
TCGTCAATCA
AAGGTTGTTT
TCATTGAAAA
TAACCACCAA
TAGATAAGAT
TT'AATTGTAA
TTTCGCTrTTT
TTTTT-CTCCT
TCACACAAAC
GCGCTGTTTC
TCACATTTGC
AGTTATTGGC ATCTrACTT TATAGTA.GAC GGTCAANTTTT C1'GTCGTGTC 'rAAT'rCACTG GATAAGCTCC ATTCAAATTA TT'rGA'GATT CAAAATCCTT AAGTTGAATA ACTTGTTGTA ATGGTACAGC AATGAGAGCA TATAAAGATT TGCAAACAAA T'rGTTTACTA GATACTAGTT 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 174,0 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 AATCACAAGA ACAATTCCCC AGAATTGCAT TGTAAATAAA TTGAAGAAAC TTr.AAAA GCTG~fCTT GGCATAAAGA ATAGATTATT CAAGATGAGT AGGGATAAAG CAAATAGGAT TGTCCTTGAG CGATAGGCTA CTTGCAGCAT GGC1'ATAAAT AATACGCCGA GTAAGAAACT AAGCAGAAAG ACTCCAATCA TACCATAGTC CGTATACAAC TCCATGATAT AACTACTTCC GATACCATGC CCTTTCAAGT ATTCCTTGI-r CAAGACAAGA TAGGATAGAT TGTGGGCATA ACTATTACTA TCAATAGCTA GT'rCCACACT ATTGGTTG'rA TGTTCAALAGG ATAATCAAGA ACAGGACCAA AAATAGAAAA CCTCGAGCCA TAAGATATCC CAGAAACCTG
TCCCATCGCT
AATCCA'TT
ACTTAAAATA
GGAGACAAGC
TGCA'rAGACC
AACATGAGAA
CCTTTTCAG
AAGGGA?1rC
ACTGCTGTGG
GTAAAGGTAG
0* TGCATAGTAG GCATAGTAGG GAAATAGAAA GGATAAGTTA ATAAAccTcT TT'rAGAGAAT GTAACGAGCC AGAATGCCTC GGCAAAACGA TAGGCTAT'rG GGTCGGTCTT GATACCAGAA ATACTTGATA TCATTCCAAC AAGTAAGCTA ACATTATTATr ATAACTGACT ATGGTCAAAC GTTCTAATGT AATTTTTTAG AATATT'rATG AATAGGGCCA AGAAATAGCA AAGGCATGGC 684 CTTTTCCTCC GAAAATGGCT AAGTAAAATT ACGGAAATCT GAACACCAAA ACTAGTCCCT TATGGGAAAC TTGGACATTA TAGGAGAACC TACAAAAATC T*rTGCTCCCG CATAAAGTAA GTGTCCCAAT 'rGCCAAATGA CCTGCAATTr CTTTGGCTTG ACAAAATGTA GGTAAAATAA AAGTCTGCAA ACGATACAAG GAAGAAAAAC TCCTAGTGAT TTCCTATATT TGCTACTTT CTGTGGTCAA GCCCAGAATC GATGATACGT ATCCAAAGCA ATACAAAAAT GGTTAAATAG AAGCAATTA6A GCTACTAACC TATTAAACAG ATACACAATT TAAATAATAA TCGTTTCCCA ATTTTTrCAAT ArTT'N'CAGT AAGCACTAAT TCTTCTCCCC CTTTTAAAAA ACGATGAATC CCCAAACTCC CCCTTGCAAA CGGTAAGGGA GGCTACTGTT TGTT'rATAGA TAAAGTCAAG TCCCG;TACAT AATTGAGTAC 2760 GCTAACTTTT CTTTAAACCC 2820 TAAACAAAAG CAAATmAAAT 2880 ATAGTATTAG CTGCAATAAA 2940 GTTGCCAGAT ACATACACAT 3000 GGCAGTTTAC TT'rCAAAATT 3060 AGCCGTTCAA ATAACCGAAT 3120 ACAAAGCGTA ACCGCTTGAT 3180 ATTTTCTTCC TAGCTATCAA 3240 GAAATCATGA CAACTATAAA 3300 CCATCCCTAA AATAATCAAT 3360 AAAATAAAAT GGATTAAGTA 3420 AACAAGAACA ATAAAGTAGA 3480 CCACTTACTA GCGTCAAGGC 3540' TCAATCACTT GGTCACCCCC 3600 AATAAGAATC GATATAAGGA 3660 TTACGCAAAA TTGGATTCCT 3720 TGAGAATAGG CTTCAAACTG 3780 AAGTGGGCAT AGGCCAATCT 3840 ACTTCATTAT AAAA=T'TG 3900 TTGGTCGTAA TACTATCCCC 3960 TACTTCTTGG CCAACTTGAT 4020 2580 2640 2700 9 9 99*9 9* TTTATACTGA TCATCTAGCA ACATCTTATC CAGAATAAAG GAAAAAAGCG ACCCTTTCA AGTCACGATA GTNTTrCACA GTAGATATCA ATATAGGCTA AATCCTTCTC TGCATAGGCGT TCTATCGAAA TAGTAATAAT AGGGTTTAGT ATTAACCACA 'rAAATCAAAA TGGTAATAGG CATCTTCGTA AATCAACCCC TGCAATCTGT CTCTTGATTA GCTTArrGCA AATCGTCCCA GTATTCCTTT AGAAA'rGTTT GAGAATCACA GACAAAATAG TGGCTTCA TCATTAGCAT AGACATTCAT GACACCACAG TTGAACTAAT TGCTCATATA AGCTCTGAAT CATTrTCGGA
TTAGGAAAGG
GGTA'TTTT
TCATCCTGAT
ATAGGGCAGT
CACCTATGAG
TGGCTGACTG
4080 4140 4200 4260 4320 C'rCGAAACAT CCGCATCTTC 'rGGATATAAT CATCTGAGTC A6ATAAAAATC AGATAA'rCCC CGTGAGCCTG TCCTTCGTTC TTTrATGAA GCACTGACAC CAAGCGACCA CT7PCATCTG TTGCACCATC CTTCATCCCA TCATTTCGTG CTTGCGACAA CCTGTCATCI' 'rGTCAGCGA 'rrGAATCACA ATCAACAAGA ATAATTTCCA GAT?rGATA GGTCTGCTTC TGAATrGAAG CACAATCACA CTAATTAATG CGATTTGr" TGTAATTGTA TTGAGGCAGA AGTCATGTAA TTTGTCCAA.A TCGTCCTTCT A6ACGGAGTCC CAGACTCAAG TAGACAAAAG AALCTTCGTC AATGTACATA GTCCTCAATC CTATCGA~r TTCTAGGTAC TGCGCCACAT TATAGAc'rGG CAG=~CCAT GCTACTCCTC TAATACTr TCTACrTGIrr AATTGTTGAA TGAATTGGCT AGCCTCATCG TTAGTAATCG CCTGAGCTGC 'rGGGATAATT CCTCAGCCCC GCCTCCACAT ACACTCCAGG
CTCTTGATTG
TCCAACGTCC
AAAACCTTCT
ACATCAAAGT
CTCTCAATGA
GTAGAGATAA
TGTTTAGACA
TAACCAAGGA
TCTTCCATAT
AATCCTATCA
CAGATACTTT
TGACATAGAT
CCATACTCrT ACTGATAAGC A7TrM"GA TGACTCGTT TNTCAGTTCC TTNTTCCCTC TTGGTGTAAT CCTCAATCCG TCCGATAG'rA 0 00 4* 0 00 00 0 0 .0 0@ S 0 .4 6* CACCAGCCCC GATAAAATAG AGATGATACT C?1'CCACTAC ACGGTCAGAA CCCTTATTTT- GAGGAGCAAT CTCGATATCG ATCTTCTCTT ATCCATTGTA GATTGTCTGT AATTTAGAAG TGCTGGTCTT TTT74GAAATC CC-TACAATTG ATTCTCTTTT~ AGAGCTATCC TTAAGAAG=r GAGATTTTTC TAGAATAc'rC TGAkAAATC-AT
TATAATCTGG
TATTCGCAGC
CTTCAATACT
ATAAACTTCC TTGATAGAAT ATCCAACTGC CTTCTAZTGTG TCCATGAATC CAAGATATCTr 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 TCTTGACTTC TC'rTCTT'rTA GAGAACAACA GTGGTGGATT CZATAA'rGGTA AAAGAAACTT CAACATCATA ATCATCTTTrT ACAAGCAAAC CACGAGTCAG TCTTGGAAAA TAAATTCTCA @0 (S 0
00* S *0 55 0 5 0 ITTCTCCACAA AAAAGCTCGT AACCATCTGG TTTGGCGATA TGCGTACATG CTTTGGAACA GATTCATATC CCTTGTCAAA CAATATCATA CTTCTGGA TCCAGATTTG AAACAATG CACCTCCAAG AGAAAAAGAC CACATAAAAA ATAAGATT'rr CCCTTCTATT CTGTATAAGA CT'rACCATA TCAGCGA'rGA TGCTTGTCTG CTGGTGGAGG CGTCATATAA TCCCCAAAAG 'rAGCCGArrG GAATAGGCAT CTCTGTTCCT TCAAATGGCA
ATCTTGAAGG
GTGCTCCAT'r
TGATAGAATC
GATTTTAAAA
TCAAGAATA'r
TTCTCTGCAC
TTTCTTAGCC ACCATATTCT CACCATCATG ATGCGGTACC CAGTTCTrGAG ATAGACATCA AGAAAAGAT'r GTCTTCAAAA AGCATAATTC TGTAA'rGCCA CT'rTTrTCCA GATCCGATAA
GATGTGATTG
TCACAATCAG
GGTACTrGTT TCTCATGTAG CCAGGACCTG CCAAATCATA CTTAGTCATT TCI'TCTCAG CGGAGAGATT TTCGACTCAA ACCCAGTAAA ATGCGACTTC CCCATTTCAT GAGA'rCACCA
TGCTTTTCTG
?r'Ir~ccGCT
CCATGTGCCA
ATGGTAAS'AA
TCATCTGCAT
GAATAGTTTG CGCACAAAAG AGTGAATAAA TCAAGGCCCA CAGCTGGATT TTTCGGATAA TAATCCAAAG GCAAAACATC AATCCAAATC CTGCTGATAA GGCTTGATAC AGGTGGTTTT AAAGATTACG ATCAACAAAA TCCTTGTGAC TCTTTGACAA AACGACGCCA TAATTCTGC1' AATTTCTCAT ATTTTACG
ACGAACCTGT
CAAGCCCAGA
CTTGTCACGA
GAAATAACGT
AGGCATAAAA
AAGTCTAGGT
CCGCCACAGA
ATCTCCAGAC
ATCCTATCGT
TCAAAGCCTG
CTCGTCCCA
GATAACAGAG
TACGAGCCTG
TTCATTATAC
ACTGCTATCC
CGAAAATCTC TTCAAACCAC TCAGTCTTAT CTACAACCTC TGCACTTT'GA TTTTCATTGA GATATTTGTT TTGATCAACC AGTCACCCAG TTCTGGAATC ATAAACCGTT TGTTACOAGG TTCCAAKGAGC TTCAGAACCA CTGTTAAAAC TTGTTGCTGT AACGAGCATA ATTGAGCACG GCGTTGGAGA CTCAGTCTCG GACGCCCATT CAATGTTGAA CAATCGCTGT GGCAAAGGCC AATTATCTTT CAAGACTTGG ACTCGTAGGC TACGTTTAAC AACCATCTAT ATAGACCAAA -AGCCCCCGAC ATTTAATACC TTCGACCATC AGTACCCA'rG AGACTTCTAC TTGAGCTGCC CACGCAAAAA CATGAATACC TAAAAATTCG TT'rAAGACCG AGGAATAAAT CCCI'G??TC CAAATCATGT TCTTTACAAA AATTGrCTTTT AAATCAGTCA CACAAACAAG GGGTGAAAAT AAATAGCTAT CAAACT'rTGA GTCAGCTTCA CCTTGCCGTA AAAACTGTGT TT=TAGCAGC GTA'PTATCTTP ATCTTAAGCC AGCAGGCCCA AGCCCCCATA GTCAATTTTT CAATACCATT AAAGTATAGG GTACGTTGGT GTGAAAAGTT TAGTGGGATC CT'TTTGTAC GGCCGTAATC GTCGAGCCAT TCATCTGCTG GTAGCGTATA AATCATCTCC AATTGAGCAT CAATCGTCAC
CTGCGGCTAG
CATTTGAGCG
AACATCATAG
TTTTGCTCCA
TGAGGTCATA
TTTAATTTGC
TGCCTCATCA
TTTTCCGACT
GACTGTAGCT
CCCATCAGGG
CTTCCTAGTT
AGCTTGGTTT1
GCATCTACCC
TCCAAAACAG
GCAAAAACCT
TCTAAAATTG
'rCATCACGGA
TTAATGGTTT
TCTGTTAGGG
AAAAGCGTGT
GAAGGGCACC AATAGCGCCT AGGCCACAAA ATATTCAGCC TATTGTTCAT TA'rTCTTCT CTATTGCAGA CTGTAAAAAA TTTTTCTGTC ITrATACTC1-r GGTATAGGTA ACTGACTTCG 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 'rGGAAA'rCAA CCAAGGCGTA GTACTTAATG TCCAAGTCAA CGAACCATTT CTGCCCCTTT TTGCCCCTCT TGTTCTCCTA T'rG'rTATCTG TCTGTTTCT ACCATTAATC ACTTGACTAT TTATCACGCA TGAAACTGAC TAGCI'TCATT TTCTTATCTG ATAATAGAGT CAGTTCGTCT CTCAACACTC TTCTGGCCGA ATTAAAATAT TAACTCCATC TCTAGTGTCC TGACCATTAA CGGGCATCAG CAG"rrCT TGCGCTAGCA TrrGGTAAC ATGGCCAAAG CCACACAGAC CAAAAGTGAA AAAATCACCA AGCTTCCGTC 7 rCT'rrT TGGAGGGAAA GAGAGTGCTT GTGATT'TGGA TTGTGAGCGA CTCCGGTTCG TTCTTGTTC CAAGCTAGAG CTAC'rATTTC AGGCAAACrC ATTTTTrTCT CTCTCATTGA AACGCAACTG CTCATGATGA CTTAAGGGAT CATAGCTTrGG TAAGTCAACC TGCTC7rCTC CCCTAGCAAG AGTTAGCTTT TCTTGCAAAT G.ATAGTGAAT ATrTTT'nAGC T-rCTrACr CATCTTCTCT CAGAAI=TA TACTGGATTC GTCTGATATr GGATAAATAG TAATTCTTIT TGGGCAAAGT AAAGTATTCA CCCAGTGCrG TCGCCAAGCA AAGGTCGACA
GATAGGCACC
GCACCAAGTC CTTATCGGTA TCTrAGTGG ACAACTTTCA CGCCTTATA AAGTGCACCT CAG'TTGGC TTGCAAGGGA AXTTGGCTTC CAVI'TCCTGA
TAATCCACAT
ATTTTTGTCA
GGAAGGTTGT
ATACTAGGCT
ATATCCTCGG
AAATAATCAT
CCTTT-CCATG
CAATCGCTTC
CGATAATGAA
AGTCAATTCC
CAGGTAATGT
TTTCAGCTCC
CAATCAGTTC
CAAGGCCAAA
TAGAACCCAG
CAATCCATAT
TTCGGAAATA
ATATTGTTCA
'IrCAATCTTA
TGCTACTGCT
CTCAATAGAA
GATGACATCT
AACGTGGTGT
ATAATTTICCT
CAAGCCTGTC
GGTCGCTGAT
ACCTAGATGT
CCTCCTCC.AG
ACCTGACGAA
CGATTCTCAA
GAT'rTTCCAG TCTTCAGCAG AACTTCTAGG TGCAACTCCT GCAAAGGGCT GGTCTGGATG AAACCGGCCG CATAAGCTGT ACTAGCTGTT ACCTCGATr'r GAGCCTCTGG TCGATGAATT TCTTTCCTTG AGCCAAGGCC TGTGGATGTG AAAAAA'rCTT GTATGGCCTG GAACCACCAT CAACTGCTGA TGAATAGGC GAACGAP'rTC 'rCGATGTGAG CCTGATGAAA AAGATAGTCC AAGGTT'rCAT GAACACTACC TTTTCAACrc GCACCACAGA ATAGT'rCACT AATCCTTGCr CATAAGCCTT GTAATGTTGG CAAAAGCCTG CAATTCCTCA TGAGGAAAAG CTGTCTGCAC GAAAATGATC CCTTGGGACC TAGATAAGCA ATTrTTCATCT TAG'IrCCTCT CTCGGCr'rAG CTTGGTCACA TCCAAAACCC GACTAGCCAC TTCCTCATAC TT'TCTrGAA AATAGCTACT AGTTCTTCCT TGCTATTATT TAGAAAAAGC TGTCCrrATC AGCTGCGATA CGTTGGTAGA GGGTTTCAAA ATCTGCTCTC TATCTGTATT AG;TCTTGAGT AAGTCACGAT T'rCTC'rGAGA AATAACCACT TTGACACGAC TTGG'rCTGTTr TGTAGTAAAT CAGCTAGGAC TTCTGA'rTCT AGGCTGTTTC 'rCCCTTTTCA GCGAAAAAAT TCCCAATGGA CATACCTAGC
CTCATCTTTT
AAACGTGTGA
7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600
TCAGAGCATC
CCCCCATAAA
CATATCAAGG TAArrAGGGT CCCTAATAAC ACCTTAGCCA ATCAAAGAAA CTAGGATACCTGGTATTGAT GGCTTCTGCA A'rCTGCAACC AAGAGCGCTG CGATAGCTGT CATCATGCCG A'rTGACTCTA GCACCGTGAA GAGCTGATTT TCCrrTGA'rA CCAAGCCTCT TGCAATAGTC TGAATCAAGC TCTCCAAATC CGGTCAAGCT CCACCTCTCC ATACGGTGGT CACCAAACGT ATCATCCCAT CTGCCGTAGG
ACTAATATCT
CTTGACCTTG
CAGGGCAATA
TCCTT1TCAAG AGrrATTTCC
GTTGATCCCC
CCAAAAGGCT
TGGCCCCTGG
CATATCTCA
688 GCTCCCATAC TA~rAAGGC GTCTGCCACA ACCTGAATAC GGTCTGT'rc AGCTCCTCAG CATCCTTGAT AACTGTTACA CCTTGGGCTT CGGTCGCAAG ATIGGGCAATTr CATCAA'rCAA TCGTGGAATC AAAGCGCCAC CAATCTCTGT TCAGAAGACr CAACAATCAA GGTAGCAGAT TTAGCGACTG GATCGATTC AATTTTCCAC CCATGGCACG ACATTCTGCA GCACTAGACG GCACTGGAAA TATCTCCTGG ACTGTGATTT TCrACCATC GTATGA7"rAC GGGTGTACTC AATGACATCA ATAATACCGG TGCGAGrTTC AGAATrTGGA GCAATCAAAC CTGCGACTAA TACGACCACC TTCTGTCCTG TCAATrTI'rG CACACTTAAA TGACCACCAA AT'rGTrTCAA T'rTTCGATA ATAACTGAC'r CCCCCTTAGC GACTTGGGCA GACGCAATTG GCAACTCATA TTGTAAGCCT GCAAACATCA ATGAATAGGT CTTAGGTTTr TTGCCCTGAA ATGCTGACGC TTTGGAAAGA CTATCATCTC TGAAATCAGG CGAATCGAGG TAAGCCAGCC ATGCCTACAC
AGGCTGACTT
TCGTCCCTTT
cCZAT'IrTTTT
CAAACATCTC
TGCCAGAATT
CTTGAATGGT
TAAGCGAAG4G
CAGTGGAACG
TACTTCGAAA
TCCCATATTA
AATAACCCCA
AAGAACGTCT ACCAAGGTCA CGAAAAACC'r GCATGGTCGA AACCTTGCTC 'rCACCCTCAG CCAAACTTCC AAAGATAATG GTCACCTGGG ACGCGGATAC TACCATGTAA ATGGCGAATG GGACCTCATA CTTGCAATAC TTTTACCTAT TTTATCATAA AATTCC'rGAC TTTAGGATCG TTCTTTTCTT ATTTCAGCAA CAAT'r=C AATATCAGAA AGGTAAATGG CCAA7=C~r ACAAGAGGCT ATTTCCTTGA ATCTGTTTAC CAAAGCCTTC CATCTGGCAT TTGACCTGTC TGTGCTAGTT ?I-rGAATTTC GGAGGCAAGT CTCGTTCAGT
GTCACACGGT
TCTGCACCAG
AGGGCAT=T
TCTTTATCCT
TCACCTCGCA
GAACGGTGGC
TrrGTrT'rTA
AAAGCCAGAA
7rCTGAAACT T'rGC7TTGGTA CATCT'rAGCT
CTCTTGAAAG
CCATAGGACG
CAAGGACACC
GTGGCGCT
CAATTT'CAAC
GAATATCATA
TGATAGACTT
G?'rTCATACT
ATTCCTTAAA
GGTTCAAAAA
AAGAATTCTG
TGGAAGGACG
GCAAGATAAT
GCTTTAACAG
9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 CTGTAAAGAT 7TTCTTGCC TCAGCATCTG CTGCAATCGC ATCrrAGCT
CCTTGTATTC
'ITGACATGTT
TGCTCAGCTC
TCCTCACGAT
TCGTAATCCG CGTAGACCGC GACTGAGTTC GlTTTGCACTA TCGTAAATAT CTTCTCC?1'A 'rTGATGACG ACTGTATACT CAGTAT'IT'rC TGTTATGAGA TTTCCAAGTC TTGAGCATTT TTAAATG.AAA TTTGTAGGAT TCCGTGAATA TTTCCTCGTT GATGTGGATA T'rAACCAAGG AAGTTCCACG TAGCAGI'TCC AAAATCCCCA GGATGACATC TTCr'rCATCA GGAACGTCAA CATAGACGTC GTAAGAGCTA TCCACACCAC CACGCTTATG GATTTCCATG GTCTGGCGTT GTTCACGCGC 7TGGTTAAAA 689 AAGTTCCAAA TTTGCTCTTC ATCTCCCTTA CTAATGGCCT TCCTTGAAAT CCTCAATTCT ATCCAGAATG ATCTCCCTAT CACATTCCTG GCTCGCTIrC CGCAATTCGG GTCATATCTC CGCCTTGCCA TCTCATGCTC TTGAGCATAG ACCGCAGTCT AAAATATGAG GAAAA'rGGCT AATCTGAGAA GTGACACGAT TCGATAAAAC GAGCATGAAG ACCTGAAAGC AGATCCTTCA CTTGTCAGGC TTGAAGGTGT AAAGATATAA TAGGCATTTT GAAGCAGCCC CTGTCTTGTG ACTACCAGCC ATGGGATGGG TTGCCAGCCA AATACTGCTC CGCCGCATCC ACAATGGTTG GACCAATCCC TrCCAAACG'r TGGACAAGAG AATGGAGGTC GAAAACCACC TGCCGCAAAG GCTCCATGAG ACTAGAAGCC CATGCTCCTr GGCATCAATC TTI'CCTTAAG CGTGTCCTGA CAAAAAGAT GACATCTGCC CCCCGACAAA GCGAACAGAC ACTTGGTCGA ACCAGCATCT CCrTAATGAA AGCAATAGT'r GAGCA.AAACT AGCAAAATCA 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 9 9 9 GAAATAATAA CGCCTTCTCG CAAATCCAAA TGTTTGATTG GCAAGCTGAG GATAATGACA 'rCCGTTGCAC GGTCAATCAT ACC'rTCTTTC r'rAAACCTA AAA'rTTCATA ATCTGGATGA CCAATCAACC CAAGACCTGC GATATAGATT TTTGTATAGT CTCGGTGTTT GCCACCGCT AAGGCGATAT CTCTCGAAGC TCGCGTTTGA TACCAAGTGC GTTT'rTGCCA TAGGAACTCC TCTTTTAGTT CCTCAAGATT TTGAC1'ACCA 12120 CATAGAGGCT 12180 TTAATAG=rC 12240 ATCTGATGAG 12300 GACCATTCCT 12360 TTCGTGGTT 12420 GACCCCACGA 12480 ATTGGTACGG 12540 'T7GGCCAACT
TCTGCCAAAG
AA'rTTTTCGA G4CAGCTGGAA
TCGATATCCA
ACAACGATGG
CGAGTATAAC
TAACCAGCCT
GCTTGAGCCA
GGA'rr.TCTTG _CGCCAGAACC GTTGCTACAA CTGCTTCCAT GAGCAGTCGG ATCACTTCTC TCCACGGTTG CCTTGTAAGG CACTCATAAG AGGTTTATAA AGAGTAGGAA TGGGTTTCAT GrrGCCCA7TT AGTCATACCA CCTTCAAAAC CACCTAGATT
CGTCTTCTTT
CAAAGCCAAG
ATCTTGCATC
99 9* 9 GGAACGCCTC CGACGACTGT TGGTCAATAT AGTCCTTGAT AGACCAGAGA ATrTCATCCA TAACTTGGCT GCCTTTACGA ACCAAATTCC ACCCCTTTAA AGGCATTGAT AGAGACAACA CAAT'TTCTA TCCCATTGGA CATAGGAACC AAGACCAACT CI'CCACAACC CCACCGATGG TATCACCATC ACGTTTGATT TTCCTG7'rCT CGTTCTTGGT TGACAATAGA AACI'CAGAC TTCAGCGACT GTCAGATTTT CAGGAACATC GATT1TCCTTG GTTGGCAATC TCCATATCCA GCTCAGCCAA GAGGCGTTTG CCGCATGGTG GTT'rCACGAG CTGATGAACG CTCCAAAGAA GTACTTAATC CCCCCAACCA AATCGGCATG ACCTGGGCGA
TGCGCAGCTC
CCACCAAAGA
GCTACTGCAC
12600 12660
TTTGCTTAAT
CCACGACATG
CAACTGCCAC
12720 12780 12840 12900 12960 13020 13080 13140 TTTCGCAAAT CATCAAAACG GGATGAGTAA 'TTTCCGCr'r GCN=AAGG CGGTCTTCAA TGTCCTCCGC AGACATGATG TCCAGCCATT TCTGGTGGTC CTTATTGATG TTCCCGTGGC GAACGCCCGA AGTAAAGACA 690 ACATCCATAG TAATAGGCGC CCCTGTCGTC ACCTGG'rCAT TCTCAATCTr CATACGACCA CCACGACCGT AGCCACCCTG ACGGCG9'CTA AGGTCCTCAT TGATATCCTC GGAAGTCCAG CTGGAATTCC CTCA.ATAATA GCTCTTAGAC GGGGGCCGTG GCAGTTAAAT ATCTCATACA CTCTCCI-rAT CAAACTGGGT CAATGGTCGC TGAACCAAGC CCACCCGCTT TCrrGTCATG AGTAAGAGCC
TTTACCAAGT
TCTGGCACCA
TGATAAAGC'r
AGCTT~TCAT
AGACCAAT'TT
TGCCAAC'N'C
AGCTGTCAAT
TGATTrCTCCT CTCTrCCAGA CAAG0TGTTA
CCAATTTTCA
AATGCCAGCT TAGTCAACAG GCAA.ACCGAA TTTCTGACAC ATCTCTGTGA TAGA~rG=G 4* 4* S S
9 5 GGCATGAGGC CT'rTTTCCTC TCTCCATGCA TGACCTTGCC CCAAAAr'rGA GGTAAAGACG TTCACCTGAC AAGAATGT'rC CCATTCAGTC CCGTCAACAG ACTTCACCCA TCCCTTCAAT ATCAGAACCC CATCTGGTTG ACGCCTGTCT TTCCACCGAT *ACAAAGI**GAA TACCCCGCAT CCACCACCAA GAGCAACGAT TAGACTTTCT GAACAGTAGT GCTACCTCAA AACCAGCATC ACATGGTTAT CTGTCACAA'r CCAGCCTGGG CCATACAACC ATTCTGAT~r TCATAGGAGA AGCAACCTTG GAAATCTGTA CCATTCCCAT GGCAACAGCC ATAACCCGCA G'TCGCTTCGA TGGCATGGCC AATAGTGTG AATAcC-AT'rG TCCAACTCAT cT1'CAACCAC CATCTTGCGC AATCAAGGTC TCTGCATGTT CCAAAATACT CTCAACAGAA AGCCCACAGT 'rCTGGATCCT CAATCAACCC ATAC= GATA CAACTCTC?1' rTTCCGAGGG TTTCAAGAAC AAGTGGA'rCA GGCAAAGGTC CCCACCATAT TTTTAGCAAA TGGTGTATTA 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 AGAAGAATCA ACCTGAGCTG ATAGGTAGAG GCTACAAATC TCCATCGCTA CGAGTCAGAC TAAATTCTTT CTTTC1rCAC TTCTAGGCTG AGCTTGACCT TCAAACTAGT CGGAATCTGA CAGCCAGGTC CCCAACA:ACG CTTGCTTGAC TAGAAATTCA CT'rCTAAGAA ATCAAAAACA TCTCTGCATA GAGAGAGGCT AGAGTTCTCG CAACCACTGA AAGGATGGTG AGGAATATCG
CACTACCTTT
?TTTCAATC
GTCTCCCTTT
TGCGGTTGcc
TGAATATCAT
6 CAAATCTCTT CTGTCGGCAT TTCCTTGCCT TAGAGTAACA TTCCCAGACC ATTGACTGCT AACGGTGTTT CAAAGGGTTG GTATATGATA ATGTI'TTCAG GAACAGGAGA GGATTGGCCA AAATCCGACT CGGCAATCCT TGC'rTGCAGT ACrTTTAAAAC CTGTCTGCTC CTGTAACTI'G ACGGAACGAA CAAAGACCGA AATCrGACTG CTI'TATTGGT ATTTr'rCTGT GTCCACAGTI' GAAAAGCTTC GGATTGCCCT GACTTCTAGC TCTGCAACTA AAAGAGTTTC TCCATGCCCA CACTGGTGGC TCAGAAACAT ArrCTAAAGC TCTAGGTAAC GTCTTCTTTT ACGCCATCCA AAATAGCCTG
TAAAGACTGC
TGCAGCTTGA
CCATTTCAAA
TGGTAAGACT
ATTAACTAGC
ACACAAATCC
TTCCATAGAA
TGCCAAGATT
GATTTAGCCG CACCACC'TGC ACCCAGCAGG GTCA~tCM TACCTGAAAT TGTAAAAGAA GGCAAGCACT TAAAAAATCC CTTGCCATCT GTATTATATC CAArrAAATT GCCATTCTCA TTGACAACCG TATTAACCGC ACCAATCAAG CGCGCN'CAT CGC1'CAGCTT ATCCAAATAA GGAATCACCT GCTCCTTATA GGGCATGGAC AGATTGATCC CAAACATCTG GTAGCGACGA ATATTGGCCA CTGTTCrAC CAAGTCACTC GCTTCAATCT CCCAAGCCAC ATAAGCACCG TTGGTAGCTG TCGCCTCAAA GGCTCTATTG TGGATGAAGG GAGAAATAGA ATGCTTAATA GGATTGGCAA CAACTGCAGC TAAACGTGTA TAGCCATCAA GCTTCATCCA AAATCTCCCT GATr7rTTrTC ATGCTAGCTA GAGAAATCTG CCCAGGGGCA CTAACCTCAT CCAGACTGGC AAAAGACCAA CTCGAACCAG TrACATCCGC AGTGATACGA GAGACCrrGC CCACCTTACC CATAGAAATG G'rCACATATT CCTG?1'CAGG ATTGAGGGTT TTAAAGCCTC CGGTATAGTT CATCAAGTCT AAGACATCCT GCTCCGTGTG AGCCATCACC GCAACCTTAA CAAG71"PrGG ATTTAGGATC GTCAACTCTG ACAAGATTC ATGGTAACTC AAAACAAGAT TTGGGAAGTC ATAGTACTCA AAATCAA'rAT AGTCTGGTTG GATATAC'TCT TCTGGAGAAA GGTCGATT1TC AACCAACTCA CPGCCTGC;A.ATTTTTrCAAA T'rTAGGCAGA TAGTCGGCAC GCCATTCAA'r CAGAGCCTGA GCCTCCTCTA AACTCTTGG ACCTTCATAC TAATCACCTr GAGGTAATTA C-ATC-ATGTTC TCAGG'rcTrT CTTGOAAATT CACCATTTCC TCAAAAACAT CCTTGTAGC'r ATAGAGTTCCGCAACT'rCCT TGATTAGATG TCCACCT'rCG GAGCGAGTTC GTAGCGTGAA AATGGCTGGA GCTACCTGCA AAAMCGCT'?'C GATGTCGGCA TCCAGGTACC TCGTGGCATC CATTACTGAA ACGATTAATT TCA'rTrACTA CTACTTTCAT CTTTTTTAT'r ATAGGCAAAA 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 TCTGCTGGAA GACCA'rA='- GTTTAAAATC TGGTAA&crC TTCCTGCAAA ACCTTTATCA ATTTGTTCTG TAAATTTCTC ACGGGAAACA TTGGCAGCAr 'rGGTACTGGC AATGATAATC CCTCCCCGAT 'ITAAAATCTC AACACTCTGG GAAATCAACT 'rGTGATAATC CTTGGCCACA GAGAAAGTTT GTTrTTTTATT CCGAGCAAAG CTAGGCGGAT CTAGGACAAT CACATCGTAG GTCAATCTT TGCG'NTGGC ATATr1'GAAA TACTCAAAGA CATCCATGAC TATAAAACGA TGCTCGTC'rG TGCTGACCCC Ar'rTGCCTGA AAATGCGCTT GAGACAATTC TCGTGAACGT TTGGCTAGAT CAACAGAAGT TGTATGGCTA GCTCCTCCCA TGGCCGCAGC TACTGAAAAA GCCGCTGTGT AGGAAAACAT ATTGAGTAAG GATrTACCCA TAGCCAAGCC GTCAACTAAA CTACCGCGAA 
CCTCATGCTG GTCTAGGAAA ATTCCTGTCA TCAAGCCATC ATTCATAAAG ACTTGATACA GGACACCATT TTCTAAAAC-A TTGAAAAAGT CAGGTGCTTC TTGACCATAA ACATGGGC.AG ATTCATAGTC TCAGGGAAAA CCTGTCTAAA TTATACCAAG AAAAGACGGC TCTCCCTCTT GATTAAAGAG CTCTrTTC'3r TGGCrTrCr TCTTTGCTGA TAAACCAGCC AACTTTCCTT CCTGACCCTG TCACTGGCTT CTAGTAAAAC CTTAT'rCTAT TCATAACTAC ACCCTTGCTT T'?TTACTCTT AAGTAAATGT GTAAGAATCA TTATCAACGA AAATCAAATA AGGTACAGAG GTrGTATCAG GAAAGATAAA GAAATAAATC CCTTTTGTT ACAAACTCAA ATAATTTCAC CTATCTGCAT AC'L'TGGAACA CACACATTAT TAAACTATAA ACAACAAAAA ATATA'TTT~ GATTACCAAT G GTCGAA 'rTGAATTTAT CCAATATGCG TGACAGGTAA GAGCTCCTAC TCCCAAGCTG TCCCAATTAC AACGGCCATC TGCCATTTGA ACGATTCACC CGTCCTTAAA GTAATGGTGG GGTAAATCAT CGAAATGATG CTGGTAAAAA GGACTGGACA AGTCAAAGGG TAAACTCTTA AATGCAGGCT GGTGATATTG CAAACCC?1'A
GGCTTCTGAT
GTAGTCGCCA
ACGAAAGGCA
AAACAACGTT
CAAGCCCT'rG
CACCTCTACT
TAGCCCCTTA
CAITTATATCA
CTTTTAAAAA
CAGTAAAAAA
GCAAACTATG
CAATATGTGI'
AGCTTCAGTA
TATACTATCA
CATTCTATTT
AGGAATTAAC
TAGAACTATG
CTTAATCATT
CAAGCAAGCG
TAATGATAGC
ATGGCAAGGA
AGGATTGCAC
AGGTCCGTAA
AAGCGGATT
ATAGTCTGAC
TAAAGGTCCA
G=GCAAAT
TCAAAGAAAG
TTTTGC1'GAG
TCCTGATCCT
GCAAGCTTCT
AACTTTTAGA
ATOCTATACT
TGCTCrrCCG TCTTGGAGGA GCATTTC~r AAACTAGCCT CAGGTTAACT GTGAGATTAT CTGTCAAATT TAGTGACAAA GGTAGTAGAA GGTATCTGGA AAA'ITr'GATT TTATAGAGAA ATAAATAATA T'rA'AGAAGC AACAATAATT CGAACTCTAA ATATATG;TTC TATCAAAAAT GTTTTTGAAA TTGAAAAAI'A 'rCCAAATA.AA TTATATTTCT TATTCAAAAC ATTCCTCCCT TACAACTACA TTCTAACAAA CI'ATAAAAGC ACCAACCAGT TCATCTTTI1T TCTATTTCTG CAAAAATAGC AAGAGCAAGC AAGACGATAA TAGGGGAGAG AGACTGAACC AAGAATATGC TATAAATAAA CA.ATAAAACr ATGGCGAC rA TGCTACTCCA ATTGGT'rGAC AGA'N'TTTAA TGACACTGGC AATGATCCAG ACTACAAGAA GATATAGAGA AAGACCAAGC AAAGTCAGAA TCCAAAATTT CACTTTCACA TAACGAGCAA CArTrTCCCT CTCCAAGGAC AAGGCAATrG CTGCTATAAA GAGAGCTATA AAAAACAAGG CTATCTGAGA ATCTCGGATT TTGGAAATCA TCTCATAAGC TCCTAAAACC GAATCTGATA AACATAAGAG CTGTCAGACC CCCAAAGCCA CA7=~GATA GTAGGCGT C?'rGAT'rCAA GGCC-ACCTrG AAAGGTAGGC AGTCCCAAGA TAAGATTGAC ATTCTCAAGA TTTCAACCCT TTTGCTGACT CAATTCTCAA AAAAGAA.ACT AGACTTCCTG CAAAACTAGG 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 17340 17400 17460 17520 17580 17640 i7700 17760 17820 17880 17940 18000 18060 8120 18180 18240 18300 18360 18420 18480
CAAGAAAGGA
GGCAAGCCTA
GCATATATAA
AGAAAATCAA
TTATTGACAA
GTAACCAGTA TGGAGGATGA ATGTCTGGAA GACCGATCAT CATGAGATAA TCCCCTGTCC CAAGAGGGTG TGGTTGAAAT TCTCTCCTTG GGATCACGTC ATAAAAACGA TTAGTCCTAT CCAAGCGAGC TTTTGTAAAA GATATGAAGA TACTrCCAAA AGTCGACTGA CAATCATCAC ATTCGAAAAA AATGCACTGC CACTACCATC GCATCAAAGG CAAACCCAGC AAGCTAGGAT TGGTACAATT TTATAATCTC TGAT'rCTTTA GGAAGGAAAG CACTrGTAAA AAGCACTGTA ATCACGCCAG AGGTGGTAGC GTAAAACCAT GCGAAAAAAT CCCTTTTTAG CTGCGACGTT CTTTr7TGAC CTTCTCCTCA CTATTAAGCA GGAGACCT TCTTTTGGT CAGATAAAGC AGGAAGAGAG AGACCCACTA AGGCTTCTGT CGAAAAAGGC TCCACTGCTA GGATAAAGGA GAAATGGAAT GTCTCTAACT 'rrGTCAACAA AGAAAGAAGA TAAATATTAA AGGTATGAGA ACTCCTATCC ATAGACTGAT ACTTTCTGAA GACCCTAGTT TGAGCCAAGA ACTAGAGCCA CAGAGACAAA TAATAAGGTC AAGGACAGTA CATAGAGAAG GAGCTAGCCT AATGTAGAGG ACCAGAAAAT CCAGTrAGAG CTGGCAAAAG GACAGACACT CCTTTAGCAA AAGGCATAGG GCCTATACGA TACCAAATCC TTACTCTCAT 18540 18600 18660 18720 18780 18840 18900 18960 19020 19080 19140 19200 19260 19320 19380 19390 AAAAGACATT G'rAAAAGGCC GTTAAAGAAG TTGAAAAGCC AATCACTAGT AAAATAGCAA TCATCGAGCT AAAATAAATA GGTATTTCCT CAAAAGGAAA ATGAATGGCT ATATTACI'AA AACAGATGAT CArCAAGAGA CTGGAAAAAA TGTAAGAACT TAAGAC'rCTA GCGGAAACAT TTACTrrTTT INFORMATION FOR SEQ ID NO: 87: SEQUENCE CHARACTERISTICS: LENGTH: 18436 base pairs B) TYPE: nucleic acid STRANDEDNESS: double (D TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87: CCGAGCGTCG TTACAGACTT TATCAAGATT GGACGCAAGA AG~ AAAATATGGC ACAATCTCCA TGGCATACTC ATTACCATGT TG~ TCAACGACCC AAATGGCTTT TCTTACTTTG ATGGCA-AGTG GA' TTCCTTTTGG TGCAGCCCAC GGTTTAAAAT CTTGGGCACA GC TTCACTTTAA AGAAACTGGA ATCAAAGTTT TACCAGATAC TC( AAATTCAA CATATAAAGG AGCCAAAA ACAGGACTTC
TCCTCTTT
rAGAAAGT
CATTAGAT
TACCAGAATT
GATGATTTGA
AGCCACGGTC
CCTACTCTGG T1TCTGCCATG CAA'N'TGGCG TI'CGCGATAA AAACTGGATC CGTCACCCAT ATAACTTATT CCTATTTTAT ACAGGAAATG ACCAGATCGG TGCTTTGATG GACAAGGAGG 694 AAGATCTTGA 7rGACCAGCC AGCAGACTCT ACTGACCACT 0 0* 0 0 00 0 0 0 00 00 0 0 *000*0 0 0 0 0 00*0 .000 0 0*00 00 00 GTAAGAT1'AC AAAGATTGAC TCCGCGATCC ACAAATTrr ACTTGGAGAA AAAAGGTTC GGCAAGCAGT TGGCGACCTT CTAATTTGGT CTTTGTAGAG AGAAAGTTCT AGACTACGA'r ACCCTAAAAA TGCCAAAATG AAGCCTATGC AACTCAAGCC TTCCITTTGCC AGA'rGTTTCT TGGTCAAGGA ACTCACTATC AGGACCTTCG TGCTTCTGAA AACTTCAACT CAACTTGGAA AAGGTAAGGG ACTTTCAATC GCCAGGCTGG AGAACAGTAT ATCAGGCTAC TACTGCTACA AAGGAGAAAA AGTATTTTCT TTAAATCTGG AAACCCAACT GATGTCGCCA AAC'rTGCAGG GGGTATCTAT CTGAGAAAAC AAACCCAACA ACCTGGCTCG TT'CCCCAATA TTTCCAATGT TTCAAAAATG GT'rACAAGAC GAATACATCG AAATGTTGGA CTAGGAATCG AAGACTACAA TCGCCAGACA TCCCTGTCGT ACCTTGGTCA AGACAGGTGC AACTCAGG GTCAATATTA GTTCGTCTCT ACAAGGCTGT GACTTTGCTA ACGACCGTAC GAACAACCTG TCCTTCTCTA AATATCTTC CAAATAGA GTAGATGTGT CTCAAC 'rCA T'rCAACGCTC CTGATGGCCG TACCCATCTG ACCGT1TTTGA AAAGACGACA AGCTCTACCA GAAGCCTTCT CAAACCGTTC GCTAATAGCC AGAGCGAGAT AACTTTGACC T'TCTAAACGG GCCCAAGAAT TTGGGACAAC ATCTTCATCG ATAACTCTGT GGTCGTGTCT TCCCACATGC GGAACTTACT ATGAAITAGA
CAATAACGAC
TGCCTACATG
CTGTCCACAA
TACACAAACT
ATGGAATGTC
GGATI'GATA
TAAGATCGGG GCTTCCTTTG AAACATGGAT TACGGTTTCG TGCTCTAGCA GTTAGCTGGC CCACCAAGGA ACCrTCTT GTATrCCAGTC GCTGCTATTA CCAAACCAAG AACACTTACG TGTCTTrACTT GCTGATAAAG TCAAGTAACA GTCGATCGTA TCGTTCTTGC CCTATCGAGA CTTTGCAAATT TTCATCAATA GGACCAAAAT GGTATCCTGA T'rATG4GTCGC AAAACTAAC'r CTCGGG= AT CAATAAAAAA CCATGCGAGA ATTGGGCTAT CTAAGTTAAT CGGCTTGATT ATAAATTGGA ACACCAACTC ATGA7"CTGA GAAGGAACGC TCATTTCTGG TAG;TCACAAC TT'rCC'rTTGA CCGAAACCTA GTGGGGTCT TGCTGCCCALA CAGGGAATGA CAA~rCTAAT TCCCAAAAGC TCCTATTATC AAATCAAGAA TATCTTGACC TGCCAT'rarC GGCGGACAAG 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220
CGTCAGTCCT
CATCCAAAAA
TAGT1CTGCAA TTTCTrATGCA
CATCATCTGC
AGCCAATCAG
TCGTCTGACA
CTCCTCTGAC
CCAGTCTATC
ACTACCGTTT
GTCAATGAAG
GGAAAATCAG
GAATTGATTG
AACAGTGAAC
GTGGACGGCA
CCGCCGATTA
AACTATGCTG
ATCATGATTA
GCATCCGTAC
AAAGAAATGG
TCGCCAACCG GACTGCGCCA CGCTGGTTTT AATGTTTCCA CTGACTTTTC TCCCGTCAGA CGGGAAAAAC CAGATGCCAT T1'r'GCTTCG ATCGCTCAAG AA'rTGGGCAT 'TTCTGTCCCA GATGATTTGA CAGCTATTCT C4TCATTAAA AAAGAGCTCA AGGTCATCGG CTATGATGGG 695 ACCTACTTTA TCGAAAATA CTACCCTCAA TTrGCTrACTA TCAAGCAACC ?1'rGGAAGAG 2280 ATTGCTTGTC TCACTATTGA TCTTCTCTTG CAAAAGATTG AAGGCAAGGA AGTCGCCACA 2340 ACTGGTTACT TCT'rACCAGT TACG-CTATTA CCAGGAAAAA GTATTTAAAC ACAAGAAAAC 2400 'rCAGACCGAT TCGrCGGT T1rrATGATC TTAAAT7rTC GAGATAGCGC TGGGCTGTCT 2460 CTAGGTTAAA CGGrrATCT GAGATGAGGC GCTCTACTAG GGGAGCAACT TCTTCAC 2520 TAGCCCCAGC TAGGAGAGCT AGGATTTGG CCTGTACTTT CATGTGGCCT TGCTCGATCC 2580 CCGTACTTAC CAAGGCTTTG AGGGCTGCAA AATrMTAGC AAGACCGATG GACACCATAA 2640 TCTGGGCTAA TTCTCTGGCA GAAGGA'r'1'C CTAGTAGATC ATGACTGAGA ACTACACGTG 2700 GGTTGAGGCC GATAGAGCCA CCCTTAGTCG CTACAGGCAT GGGCAGGGTC ATCTCACCGA 2760 CCAATTCTTC TCTTTCAAGG TCCAGCGTCC AGCAGCTAAG ACCT'rGATAG CGTCCATCTC 2820 GACTGGCAAA GGCATGGGCC CCAGCTTCGA TGGCACGCCA GTCATTACCA GTGGCAATCA 2880 AAATCGCATC AATACCATTA AAAATTCCTT TATTATGAGT AGCAGCTCGG TAAGGATCAG 2940 SCCTGCGCAAA CTGACTAGCC AACCCAAT'rT TCTCCGCAAT CTCTCGTCCT TGATCCTTTT 3000 *GGCGGCTCAA GTAGCGAAAG GCGATGCGAC AGCTTGCAGT CACCAGAGAA TCGGTCGCG'r 3060 *AGTTGGACAG GAT'rCCCATG AGACTCTGTC CCTGACTGAG TTCTTCTAAG ACTGGTTTCA 3120 *SSAGGCTTCCAG CATGGTGTTG AGCATATTGG CACCCATGGC TTCCTGGGTA TCGACATGAA 3 180 ***TATAAACAAC GAGAAAGTCT GGTTCGCCT'r TTATCTGCTC GACA1TGCAGA TCACGCCCCC 3240 CACCTCCACG ?TTTAACGATA GAAGGATACG CTTGATTGGC AAGCTCCAAG AGCTCCGCTT 3300 TCTrGCTGGC AA'rCTTCTCT TGCGCTAG1'T TAGGATTAGC AACTTGATAA AGGGCTACCT 3360 *GCCCAATCAT C'rGTCGCTGA TGGACTTGTG CAGTAAACC ACCTGCACGC TTGATGATTT 3420 TGCTGGCA'rA GCTGGCCGCC GCAACCACAG AGGGT'rCTTC TGTCACATAG GGAACGGTGT 3480 **.A'rTCCrGACC GTTGACAAGT ACCTCCGGAA CCAGTGAATA AGGCAGAGAA AAAGTTCCCA 3540 CTACATTC'rC ACTCAGCTGG TCTGCCACAG TCACGCTCAT CTGTTCATCC TTCTCCAGAC 3600 TAGCTTGTCT CTCAGGACTA AGGAGCGCCT GAGCTTAA CAGCTCGAGG CCrCTTGGT 3660 ATGA'TT'rT AGAAAATCCA TTCCAACTTA 'rCTTCATT'AT 'TTTTCAACCT 
TGCTA'rAACG 3720 *GCG'rrGGTGG TCGAGAATTT CAACCAAGGC AAAATCTTGA rrTTCATAGC CAGCAAACTG 3780 5GGCAGAGTTA GTTTCATCCA AGTrTACTTC CTCAAAAAAG ACCTTTTCAT AGTCTGCAAC 3840 GGATAGGGCA GTTCGTTGGT TGAGCTGTT CAAACGGTCT TTATCCA.AAT AAGCTTCATA 3900 'rCCTTCAACC AATTCACCAC TGAAGAACTC AGCCACAGCT CCACTTCCGT AACTATAAAG 3960 696 TTrA7'rTCC AAGAGAGACA AAAGTCCAAG GGCGA=NrA TCCCCAGCT TCAAGCTATC
GAAAAGTGAA
ATGCT?!TTGT
GCCTTTTAGC
CCTGTGTAGA TAT'rCCCCAC AAGAGGTCr TTTTCTTG GCrAATTTAG GATAAGGCAA AGTAAGCTGG TAGCGTTTTT GATATTCAAG TTGGGTAGAA TAGACACCAT TTACATAAGG CATGATGTCA CGGGTCTGAG CTACATTGTC TG;TAATCAAC ATAGCTACAC TTCCAGCACC GTATTTGGCA ATATCACTGG CAATGACCAA CAATTTGCGCA TAATGGAGGG CAGCAGTCGC CTTrACTC
AGGCAGGCTC
GTGGAAACAA
CCAAGTCC'TT
AGTTGTCGAG
ATTA'rTAAAG
TTGAGTTGGT
GACCTTGGAC
TAGAGAATAG ACTGGTCAAA TTATCCATGA T7*TTTTI'CAA ACAGCCGCAA AATCATCCAA TTCAAACTAr CCAAGTATTG TAATTGGTC GCCAGAAATC GCCATCATGC GTGGATN'TG TCTCCTGGAG TTTCAATACC TCCGGAGAAT T'rTCCACATG TCCGTAGCAG GCTTCTTTAA TCTCGAAACT ACGACCAAAG GGCTGGATGC GTCAATTCCT GACTCGGTCG TAAAATACAG TCACTAGCAC ACTCAATTCC T'rGAGTAAGA TGCTAAGTCT TGTAATTTCA GATTGCATA TTTACCTCTG CACAAATrGGC AGTAAAAGAG TCATCATTTT AAAGAATGAT CTCCATTCAC AGTTATCAGC CATCTrACAAA TTT'rrGATAA TCTATCAAG TG'TTTTACTA GAAACTCTGA GCCCGAACTA CAGACAATAA CGAAAGTAGT ATACCAGCAA TCAAATTCAT GAAAACATTT TAAACGTTT GATACCAT GGTTACAGGC CGTGGAGTTT G'TGCTTGAGG ATTTTACCAG CTTGTCCGAT TAGTTCACAG CGATATCCAA
CCAGCAAGCC
CCACAATGAC CATGTCAACT TCTTGTCTTT TGGCCGCCAA GGTCACGATA TCCTCAGT'rA GTCCTTTACT TAAT=rTCA GGGTCAATTC AGACATATTG ACTGG'rC~cA AAACCAATCT TTTTATCATT CATGTAAAAA ATCGTTC'rAT AGAAAAAAGA CTrGATTCAC CAAATCAAGC TAGTTGCTAG AGAGTITCACC GATATAAGTA TCCTGGAGGA TCAAATTTCC TGAGTAAGTC AACTGACTGG TCGGAATTTC TCTGACATCC ACCTTICTCAG CAATCAATrG ATGCTCTTGC GAAACCATGA CTGGGATAAA CAACAAGGTC AGACTTCCIG CAAAACTAGA ATCCTAGTTC TCGTAATCCG AAGCGTTTAC GATGATTTCG TACTTTGGCA AAGATGTTCT CAACCTTGCT 'T-TATCTrCA GCTGTTAGCG GCTTGAGTTT ATATATCTrC ATGAGCCCTr GATAATCACT ATTrCTGCAA CTCATTIGA ACAACTTCAT AGAAACAATT CTCCCTTGAC TTGTGACAAT ATGCACAAAG ACGGCCGCAG CCT'rACTCTG
CTTGCTCAGT
GGGCGCCAAT
CCCTCGC7TTC
TATCAATACC
ACTATTTTAT
CTCTrATTGG GCT'rTATAAG
CTTCCCATCT
'rTATCAAA'rG
CATCCACTTT
AGTAGATTTA
ATGATTGATA
ATAGGTTGT'r TCTCTCCT'rA
GCTGGATTTA
GTCAGCCAAG
ATCATGACTA
CGCTTGAGCC
AGGGCGATTG
4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 TTCATAGCGT GAAATTCTT TTTACCAGAA TCATTCGCTA ATTGl'r1Nr 697 AT'=-~ACTT CCGTCACATC AATCATTATC GTGTCCTCAA ATCGTAACAC CACTTTGAAC AAGAGTTACT TCAACCCATT CTTrTCGTGGA TACCAAAATC AGCCGCAAMT TCTCATAAG AGAAAGCGTT ATCAATrrAT TTATCTCA'rT ?ICAGAAAA CTACGATACT CGATGT= r rATATAATG ATAGAGTCTG CATTCCAATA GAGArrACCA AAGCCAACAT GACAACCAAG ATTATAGTCC CCTGTCACAA AAAAGGCAGT TGTrCGGTAG CGCTGCCAAA ATGGCCAAGA TAAAGACCAC AGCAGGTGTC CTGGCTGACA CAAGAACCAA TAATCGCTGC AATGAAGGTA TTCCTTGAGC AAGAGATAGA TTAGCCAGAC AGTCATGCCC CATAGACCGT TGCACATTGA GTACGATTAA AAAAGTGATA TGCTTCTAAT AAAAAGGT'rG 'rrAGTGTCAT ATTAGTTCAT TTCCTGCCCC TAAAGCGAGG GTAATGAGCA GGGATTCAAA 'rTATGTGGTT GGTCATAATA TCACGGACCG CATTGGTCAA AGCTGAGAGG AG7rCTTGAA GGCTCCGACG GAT'rAAGTTG TGCGGTATTC TCGCACATA'r 5820 5880 5940 TTCTTTrATT TCTCTAAAG.T 6000 ACAATCACTC 'r-rCC~CTA~c 6060 CTCGCACrG CCACTGCTTT 6120 GAGACATAAC CTGGAhCCAG 6180 7'rATAAAGAA TACTTAAAA'r 6240 GCTACAATGA CATTGGTCGC 6300 AAAATCCCTC CAGGTAAGAG 6360 ATGGCAAGAA AACTTGCTAC 6420 CAATACCAAG GCGACAGAAG 6480 CATCrTACTC ATACCAGACT 6540 GGCAATACCT GGTACAAACG 6600 AAAACCTGTG TAGCGAGCCC 6660 GGCTGTCACA AAGGGAAT'rC 6720 TAAGGTCGCC ACTCCTGCCC 6780 GAA.AGGAGCA CTAAAGrCG 6840 GGCTTGCAAG GCCGTCAATT 6900 CTGACGAGAA ATCTGTTCA 6960 GCGCTTCATG CGCAAATATT~ 7020 pGGACATTGC AA'rCCACAAT 7080 CGATGGAr'rT CTGACCCACT 7140 ACGGCATTTA ATTCrrTTGA 7200 TATCACAAAT CCGGACTCAA 7260 GCATGACCGC ACCAGCTATA ATCAAATCTG CCGTTGAAGG AAAACTGGGC AATTATCCCA AAGACAAAGG CTCCAGCAAA GGATAAANTT TTCCACATAG CAAGTGCGTC GTAGATATTT CAGCCAGAGT 'rACCTGCAAC GCTTAAAGGC 'rGTTTCTAAG CATCGCAGAC TTTTTCGATG G;GTATTTTCA ATAGAGAAAA AGGGAAAAGG CAAAACCAAA CCGCTAAACA TALACTGAAAA TTAGTATAGG GAAGGGGTTT TCAATCTGCC CCCCAACTAG TTATAAGAAG AGGAGGTCAC AGATAGCGGC ACGCATGGCA GAATCATGGT ATCTTCACA CTAGCATAAT CACATCAATG TTTATCAACT CCCTCTATTC
CCCCTGCGAA
TTTAAGGAGA
TCTr'CCATG
AAAAAATCTT
CT=TAAGGA
TGCGCGATTC
ATAGTCCCCG
CTCCTCCT
TGCCATGAAA
GTAGCTGAAG
TCATGACAAA
TTGTTTTAGG
CATTGATTAC TCATT''GAT TATCCATCTG TTTGTAGATT GAAATCTTCA CTCTAGTCTT ACTTTGTTCA ACAACCATGC CTTCTGCACT ?rCTATATTA GCT'rCCTTAA TCCCAACAAT ATTGAGGTCT ACCTrTTCAC CTGCTCTAGG ACCTGCAGGC GC'TGTCGTCA CTTCTACAAC 7500 TTGAATCAAA TTGTTC~rAG 'rAAACTCCAA ACTTGTAACT '1N'TAGCTA CTGTCAAGAC
CGTTCCGGCA
TTCCrCTATC
TGTAGAGTTC
CAAAACAATT
CAGGACCGT
TGGAACTTIT
ACTAATTTrGG
TCGACCAGTT
CCTGGACT~r GTTTCATAAT TTAATCAAAT 'rCTCAGGAAC CGTCCAA'rAT AGTTCCCTAA TG.AGTTGCCT TGCTCAAGTC CCAGCCTCAC TCTCATTCGA TTCTCTTTTA ATTCCGCAAT AAAGATTGCT TGCCTGATGA CCAGCGCCAG GATCTGTACG 698 CCTAGAACCA ATGTAACTCG GCATGCCAAC AATTTGAGTA GGTTTACTCA CATCATAAGT CGTTCCTrGGT TCGCTTTCGC TGGACTCTTC CTTCTTCM'C T'rGAGTrTC AGATTACrc TTGAATCTCCThG.CIN TAGCTACTGT AGCCTCTGTC TTC'rCCTCAC CA.ATCTCAAA TGCAACrCTC 'rGACCTGCCA CATCTGGAAT CCAAATAAGA GAAGCTCCCA CCAATACAAG A.AATCTATGT TTTTCCGTG CTTGTGGTTG TT TN ATr A43TGT= CTGTTTGCC CTGAGAAACC 'rTCGGCAAGG TCTTGGTATC AC-TTfcATTT CTACGATTGT AGGACAAGCT CGAGCGGTAG CGATTGGTCA ACTTTTTAGC CI'GAGGTACA GATGGA'N-rT CTGCAATAAC GGCAATGGTC ACCGCGCTAT CCCCGTCATA AATAATCCCC ATGGCATAGA TATCACTCTG TGGTGACAAG TAATGAACTG AGCCCAACAT
ATAGGTCGTA
CTCTCTTCC
GACATCAGAG
GACAACCAAA
GATAATCCGC
ATTGGCTT
GGCAATGGTT
GCTGGCCAAC
GThAGTTTCC
TTGAACCTTA
TGCCT1-GCTC CCT'rCzGG'rA GACTTTGCTT TCAATTTTAA TCAAATTATC GATTTCCGAC CGACATAATT TrTGATT'TCG TCCTTCrITr
ACTAGACAAG
AGTTGCCTTG
GGACGGCAGG
AGGGATATGG
CACAGTCGCCC
CGAGTTACTC
CCTTCTTCCA CCTTTTCACT TTG.AGCGTTG CCTTrCGCCTC GCAGGAGTTC TGGATAG'rAT AAAATCAGGT AACGCA'rCTT TCTGTCACAG CCTGGCTTGG GGAATAGATG TCAAGGTACT GTT'rCATCAA AGATTAACTT TCCACATACA TCTCTGAAAC ATAATAACAT 'rTTCTAAAGC GGTTTCTGGA AATGCI'GGAG CCTGTCAGCA TCTCATAGAA TTCGAACCAC GCGCCTGCTC TGGGTCAGAC TTGTCTC'rcC CCATCTCGTG TCAAGAGGAT TGGGCCAAGC GCATAGCCAA 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 AAAGGCTACA GCAATCCCAA AT~trTGAGGT
GAGAATTTGT
ATAGCGTTTG
CTCGCCAATA
TTCACGCTGA
TTCAAG'rCCC
CCCATGATAC
AGGTCCAGTC
TCTGTTATCC
AAACGAGCTA
AGTCTGTGAC CTTGGCAGTC TGTGAACAAT TCC'rCGAGI'A GGACTGCNTC TTCATTAGAA CAGCCACATA CTCCATAGCT GAACGATATG AGGATGGI'CT
AGAGGATAAT
AGGTACTGTT
AGATCTGCCA
TAG7TrGCTCC
AGGTAGACAT
GTTCCTTGAT
GACCGTCTT-C
TAGCTCTCGC
TCAGAACCTT
CCGCCATACC
CAGCTATCGG
CTAAGATTAA
CAATCCGATA
CACTGCCACT TCTTCCCCAT TCCTCGACCA ATCTGTTTGA
GTCCGTCTGG
G'TCTTTGGCT
GCGTCCGGCA AAAATCTTGC CGATTTGGAT 699 CATTCTGCAT CCTCCTCGTT GCATTGrrAG CAAAACGAAC ACAATATCAC GAATCTCACT Ar.ATAGITCAC CTGACTCAAG
CCAATAGACT.GGGTGATAAT
TGACCAGCCT TGAGCAATTC TCTCCACGAA TCAAGCCGAT ATAATAGCAA GGACr'rCCAA TGGTGAATCT TTTGAr=C ACTGTATCGA TCTGGGTATC GCCATATTCC CTGCGCGATG CATAGAAACA AGGCAACCC TAxATGTGC AAGTGYTCTCC GrTTTrATCTG CTAAAGGAAT 'rAAACCTCCT ATCACTGG7'r GCCTGAAATC ATGTTGGTCA AGCCGTCACT ATTGAGCAAfl GATAACTGTC CCAAAATCAG GCTGAAT?1'C ATCTrTI'MC ATrrTGC GGATGAGCTT CTGCCTCTTC ATTAACCAAG GAATCATCGC ACGCGAATCA CCAATATGAG AGTAGTTCCC ATGCCTCTGT 'rCGTCAACTG
CATAGATAGC
AAGCTTCATC
TGGTGTCAAT
ATGGTATTCT
CTGAT'rATCA
CTGACCAAGC
AATTTCTACG
AACCCAAGCT
ACCTCCCA'rC a. a a a a a a a a.
CGCAAAAGAC
TCAAAAAACT
CCCAGACGTT
GTAACCTGAC
CAGGCATCAA
GACTCATCTT
TGCTCCTTAA
ATTTCCTCTT
CTTTCAAAGA
CAAACTGGGA
TCTGGCCAGC
CTTCACGCAA GATACGGCGA AGGACAGCGT TGACCAATTT TTCACTGCCT =M~IACGGA GTTTGGCCAA T'rCCAC'rGCT GGAGTGGTA GGCACTCATG 700 TCATTAACCA CAGCA'rGA'rC TGGAATCI'G TCCAAATAGC AGAAGAAGGA CATAGAGCCA GCrTCTrAAC TGGTCTCTGT CTrCGATAAA GTGGGATAGG TACCATTCCA GAGTCAG'r'I- ACGGGCTACC GTTCCA'rAGA CCAGCTCGGT CACTAAGCCC TTAAGGCGAT ATT1'GAATAT AACrrCTAGC CGTTTCTACT 'rTGTCTGCTG CCAAAAGPG GCTTGGTTCA CAAAAACATC TTAGTCACCA AATCGTTCTC ACTCCGTTGA GGAAGGAAGC AATGTCCATC TTAGGCTAC AGGGATAGAG CCCCTTCAGC CGTTGCGACA ATCAATTCTT ACTTCCCrT AGATGCTTAT CTCTAGCACT GCTAGACCTA CTACAGTCAA TGTACGTCCA CAGCTrGGCTG CACT'rGTTTG TCTTGCCGAT AGAGAGAATC AAATCTTAAA GCGGTCGCCC GAATTTGG11T AAAGAGTTGA TTATATTTGG AGAGAAGGTA TCACCTGGAT TTCCCTGACC TTCTACTGGT TTAAGGAAAG TATGGGCAAC AGGCCAGGGG CGATTGGTTN TGTTCCAGTC CAGTrTCT ACCTGACTCG TATCCTGCGG TTCAGGTTTG TCCAAAAGCA AATCACCACC AACTAGCGCC TCA'rCTGTGA TCGGAATGCT GCGACGAGAA AT'rrCCATGA TGGTCACACC AGCTTCCTCA GCACCACCAC GGTGTCTAGG AAGGAGGGAG
AGGGCT'TCAT
'rTCATTCCAC
TCCTCTGGCT
S. 55 S S ATATCACCAG CAATATAGGC AGGCAGAGTG AAT'rTTTCAA ACAAGG1'GCC AACATTGTCC ATCATATCTC CTGCATCCAT TTCCTTAACC TCCCCTTGAA TCAAGGCATA ATGGA'rAGGC GCATGAACGT 'rGACAGCAAA GTCCATGCTA AAAGCAGCAG TCACAATTCC ATcTGCTCCYr 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 '2480 12540 12600 12660 12720 12780 12840 TCAAGGAGTT TGCTTGGGAG AGCTCATAA GATCTTCCAT AG;TCCTGC= CCTTGGCAGC ACAGCACGGT CTGGCTGGGT CCTTTTAAGA CTGTTGCTGA TCTCCTTCTT ATAAAAAr'FG TCCCGTTCTT GAGTCAAGGC CGGTAT7*rAA TTAAAATCTG GTTGGCCCCA GAATGGGACT TAGGCACGTT TGAAAACCTC
AAACTGCCCA
CIYCTGGACTT
CTGCTTGACT
CACAACGGCT
AAAGTCGGG
CTGCGGCTCA
TA-k"CCTCG CCAGATAATT TTTCAGGTTG GGGGTTTCTT GGATAACT AGAATTTCGT AACGGTCATC GTCCCCATAA AGATTAGTTT
GTAGATAGAT
TTTACGACCA
TGTCAAAAGT
TGTCATATCT
TGGTCAATCC TOAGACCCAG CTCACTATTT TTGAGGGTCG ACCCCAGCTC ATCTTCTPLIA S S
TAGTAAGGTG
TAATCT'rGAT
ACTTGACCTG
GATAGCCGAG
CCTTGGCAAA
CCTTrCAGC CTAATGATAG AGGTTGTGGG TACGGCAAT CGGTTTTGGC GGTCTCTGAC AAGCCTGACC GCAAAATGTT CATGACTTCA TTCT'rCTTTC qTTGTAGAAA GC-GTAATACC AATCGTGAAA TTGTCGTCTG ATTCCCATTI' CATAGGCATA AAAGCCTTCG TCGAATAGCA TAGTGCTGCG GATTGTAGGA CTGTATCAAG CCGACCTGAT CGACCTGCCA CCTGAGTCAA GAGCTGGAAG ATCAGGCAGA TTCAAGGCCG TATCCGCATT TAGAACTCCG CTTCTCrCAG AAGAACGGAA ACTAGGGTAA CA7TGGCAAA ATCCAAACCC TCCCTCCC CTCGCCCAAA CTGGTCAAGC GTA'rCCACAT CCATCCTCAG AATGCGAGCT GCCTTCTGAG TTCCCGTCCC ATAGTAACGA TGAGGAATAT CCTTCGAAGAA ACCACAATAA AAGGTCAGAG AAATATCGCA GTTGGGACAA ACAAAGCTAG AATAACCACG GCGATTGAGC CGGTCTTGGA TAGCCTCTAG CAALAGGAGGC TAGTCTCGAA AC;TCAATCAC TTGAACCTCA GTTAGACGTA AGTGTTGATA GACGCCTTTG VrTGCAATCA 'rCTGACTACC ALAGTAAAATA AAGGCTTGGT GACTGCCTTT r'r CGAGTC TGGGGAAAGA GTTCTGCTAG CTCATCATAA ATACTGCGGC TCTTACAGTT ACGACAGACC TGGCAGTTCA TAGTCTTGGT ATCCATATGC
GTATCCACCG
ATGAGAACCA
GTAAAGTTTG
GGGATTGI'AG
CCAGCACGTG
GTTGCAGATC CAAGTACCAG CTGGCATGGT AACGGGCATT! ATCATGACAC CCAGATTTTT TCGGCATCGC CACGCTCCAC GAGTGAACAA TGGCTACCTT GTCAAGCAA.A TCTCAGGi'AC ATAkATCTGCA AGTAAACCTC GGTTGAGAAC TGCCAATAGA AACTCCAAAG GTCTACTTGC TT-TTGGACTA TGGTAACAGC GACTCTAACA AGCTAGCCAA TCCAACTTT TCTGGCACC TCATACCAAG ACTGGGTCTT AGGCCTTTTC TAGTCAAACG GCTAGCCAAT CTTCTGA.ACC AGTTGCTTGA TTATACTGAG GCTGTCCTGC TTATAAGCCG CAGAGGAGCA kAGATAGCAG CTTGCGCCAT TCATCATACT GTCCCCAAAA CGTGcqATAA CAGCAAAATA GCTGTCTTGC GGTCTTCCCA CTT'CCTGTA-k ACTCACAACC GCATCACCCG TTCAA'rrCCT 'rCAAAATAAG ACCTTGATCC ACAAAGAAGT GGAAGCGCTC TCTGGATGAG TGTAGAAATC TCAACACCTT TCCCACACTC CCGACACATG CCTGCTCTTT TTTAACCAGA ACGTCTCA2I' TTGTCCGATA CCAAAGGATT GGCACCTTG CCCGGCTCTC TAAGCTCGCC CCCGTAAAAT AGCTACCTCT CTTCATCCTC TTCATCAA'rA ATCTGGCACC AACAACAACT TTTCACCATT GGATAATCCT AACGCTCGGT CATCTGAGGA CCTTATCCAG GGCACCT'rGG TCCCTTGAAG TAGAAAGGGA CCTCTTTG TrCTGCATTT CAGCCGAGCG TTGAACTTCC TGACTTCCTC TCGCGAGTAG ACACCAGATA ATCTCTCAGT CTAATTGAGC ATGGTCAACC 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 GACCTTCTTT TGA'rCGACTC CCTCATATTC CATCATTTCA GCTTGCTTGG CAAGGTCTAG AAACAGGCGC ACTCGTTCT CCTGACTICAA CAGACCAAGC 14220 TGAAGAAAAG 14280 GCCI'CCAGA 14340 CTTGAGGA'rA 14400 GGATAGAGAA TCTTGTCATA GCTAGAAT1'C CAGATTTTGT AGGAGAAGAC AGATTTCCGT GTGAGAACAG GAGAAAAATC CAGCACCTCT 'rCTCCATCTG ATTGGGACTT CAAACCAAGA AGAAACCCTG GAAGCATGC AACTCCTCAG CCAGCCAGAG GCAATATCI' IrAAATCTTG ACAATCCC1TT GAATCAGGCG
TTGTTCTGC
CTCCATCTCT
ATTACCCTTA
14460 14520 14580 702 CCAAAAGGCA CATGAACCCG CATCCCAACT TCCAGCATTC CCrCAAA'rrC CTCCGGAATC CTGTAACTAT AGGGCTGGTC CGTCTGCATC AAAGGGCACAT CTACGATAAT CTTAGCTACG GCCATCTTCT CACCTCCTCC 'ITGTCAGTAC ArCTTGCAA TAGAAAAAAT AAGATTGAGT CCCCCCAACC ?rAAATTTTT TCACCATCTT C1r=TCTrr AGCAATTTGC TC?'rTGArT TCTTTCTTC 7=CTCTG CGGCG""M CTTCrCGAT GTTTTCCTTC TGGATCTGGG TGAATTGTAA CGTTTCCTGA GAAGAGTrA rTT1TCAGAC TTGAAACCT'r GAGTTGCTGG GGGCACCTTT TGCTTCCAAG ArrACGAGTG AATA=~GA TATCAATAGA GGC?T'AAC ATCATTrTCCT TGTACCTATT 7rGGAGATTT TGGTAACATC TCCTGATAG1 GACCAATGAC CTCCTrCAAT CACACATTTG ACACGTTCAG CAGCTAGGGG ACGGCGACGC ACTCTCAC TTCGATTTCT TCTAAAGCGC GGCACCTGCT TCCAAPITCGT AGGAACCTG TCCAGCAAGG TTCTAAA'rTT TATCGGGTAG ACGATCCACA CAGAAGTGTT 'rACCTGATCG TTGACAATCG CTTTTCGATT CG-rrGCGCAA TTGCAATTCA TCCAAA'rCTG CTTGACCTGA AGAGCACCCT CATAATCATA CTCACGCATG TCACTTCTGC ACTATCTGTr GTGGT= CAG GAAGATAAAG GAACT'rCAAT TTCAACGAAA GAGGAGTTCC ATAGI'AGTTA TCAGCTCTTC AAATCTTCA GACGTTGTGC GCCTGTCGTC AAATCTCTCT TCTAACCGTT AGCCTCCGTC TGCCATTGTG
AGGGCAATTT
CCACGACCTA
ACAGCATCTG
ACATCGATTC
CCGACATATT
CTTCCl-rGGC
CCAAGCGATC
GAACCTTTT
CCrTGTCCAA GGTTTCATTG ACATAGGTCA CTGCCTATTC CAACATCTGT CCTTGACGAA 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 CGAGTACGCA AGAAA'rAGTC AACACCGTCC ACTTC'TCCAG ATCGATACAG AATATT'GAAA TTGGT=rCA GAACTCTCAA CCTTTTCCAA CCCCTGAAGG ACCAGAAAAA ACGATTAGTA TCTCCTTTTA GTCAATCTGT GAAATAACAT TTCTCTAGAA TAATGGCAAA AAGCCAGATT A'rCCTTTACA GTCTTTCTAT CTAGTGTAAC AAAAAAGCAC TAAT"TTTCA ACTGCCTTT CTTATTITATT TAGCATAA'rC TACTGCACGA AGCTCGCGAA TCACGTTAC CT'TGATATrr CCTGGATAAT CGAGA1'TGTT TTCAAT=rC TT'ACGAACTT TGTGAGCCAA GATTGTGACT TTGTCGTCCT TGA'TTrMCC TGGATTGACC ATGATACGAA TTTCACGTCC TGCTTGAAGG GCAAAGCTAG TTGCACTCC TT'CA;AGCCG TTAGCAATTT CTTCCAAATC ATGGAGACGC TTCATGTAGC TTTCAAGAGA CTCACTACGA GCACCTGGAC GGGCTGCGCT CAAGGCATCT GCTGCAGCGA CGATAACTGC TATCACGCTC TCAGCTTCAA CATCTCCGTG GTGACTACCA ATCGTATTCA CCACAACTGG GGGTTCCNTG TACTTACGGG CCAATTCCAT ACCGATTTCA ACCTGGCTAC CTTCAACCTC ATGGTCAATG GC~rrCCCGA TATCGTGAAG GAATCCAGCA CGACGGGCAA GAGCCGCATT TTCACCAAGT TCGCTCGCCA 703 TGATACCAGC CAACTTAGCA ACCTCAATCG AATGGCGCAA AACATTT'rGT TACGGAACTG CAAACCTCCC ATAATCTTCA TCAAGTCTGG ATGAAGGTTT 7rCATAGGC AGCAGCCrCA CCCTATTCAC GAATCTTATT GTCAATCI'CT TCTCAACCAA CTCTTCGATA CGAGCTGGAT GTATACGACC ATCN'GAGC TAGTCATACG GGCAATCTCA CGACGAATCG GATCAAATCC TGACAAGGTC
CCATATGAAG
GGCGCACCAA
TGACGGTTTT
AACATTCCA
ACCACTTCT-G
ATGTTACGAC
G7'rGAGTTTG
ATGTCCTTGG
GTGTATCGTC GATAA'rCACA CTIYCACGACC AATAATGCGT TTGACTCCGC 'rACA'rArTCA CCATrTTGTC AGAACGTTCC CCCTGGTCAA GTTTTCCTCT GCGCACCAAT ACGCTCTAGT CACGCGCATC AAGGTTTTTC
TCGACCCCTG
CCCTTCATAG
CCAGCGATAC
TTGACCTCrr
GTCTGAGCCA
TCTGCTTCTT
GCTCTATCAC
TCAAACTTI'C
TATCGTCTGG
GTTlGCATAGC
GCTCAGCTTC
AGATAATATC
'rTGTCTTTC AAATACTT1'G
AAAGGTACGA
CAGATGAACT
TTCAACCAAG
GCGAATGCGA CI'GGCAATCT TCGTGCT'rCT GCCTGAGACA GACTTCCTCTr AATTGCTCT'r TTC=TTTGT TCAAGTGT= AGTAGCTCTC TCTGTCAAAC GTTCTTTACT CCTCAAATTG TCGTCCTTAC GGTCAAGGCT GACT'rTCGAT TTGTTTGAGT TCTTGACGI-r CrGAN'TCA6A TTCAGCGTCC ACTTCTTCAC GGTATTTTCT GGCTTCTTCT 'rTGGCCTCCA ATAGTGC?1'C 7TTTA.AGA GACTTGCT CACGTTTGGC TTCATTAACA AGTAAATCCG CTTCACGCTC ACTGTCCA CG'rAAATTAG TTGCTTCTGC TTCAGCATTT AAAAGCATCA ACTCTGCAGC TTCCTGAGAT GATTTCATCT TAGCTGAGAT GCTGACATAT CCAATGACTA AACCAATGAT GACGGCAAAA ACAGCAATCG CAAGCGACAT GATTTCCATG TTTTACCTC ATTTTATTGT TATTCCGAAT GACATACATT CTrTTACATT CTACCATAAA AAAGTGATTT TCACAAACCT AAA6ATAGA.AT ATGTTTTGAG GAWI-rrGGAA CACATTTACC AAAATAAACT '1GTTGTTTAG AAATAGTAGT TT'AGTAGAGA C1TTGAGAAAA AGCCTACCr TCAATAGACT TAGTAATGAT CTTTAAAGGA CAAGAAAGCC ACGCTA'rCTC CATCCATCA'r ATAAATCAAG CGATTTTCTG CATCAATACG CCGTGACCAG GCTCCTTGGT AATCATATT GAGTGGTTCr GGTTTACCTA TTCCTGTAAA GCATCACGT TGAATATCCT TGATTAGTP'r ATTGATTCTT TTTAACGTTT TCTTATCCTG ATTTTGCCAG TAGCAATAAT CTGCCCAGGC ATCTTCTGTA AACTTGAGCA GCATTTCTTA CTCCTCAATA ACATGGACCT GAGTACTTCC AGCACGAACT TGAGCCATTC CTCCCAAAAC CTTATCAGAA AGTTCCTTAT TTTGAGCAAT TCTCAGGGTT TCTTGGATAC TATCCCACTC ACTCTTTGAA AGGACTACAA TGTCCTCATC TGGATTTTTA TT~GACCACCG TCAAAGGCTC AAATTCATCA 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 1.7340 17400 17460 17520 17580 17640 17700 17760 17820 17880 17940 18000 18060 18120 704 TTACCTTCT TCATGTAGTC CTTTAAATGA TTrCGGAATG TTGAGTAAAG GACTGCTTCC ATAACCATAC CTCGIrTTAG CTC~rTTTCCA CTATTATACA CGAAAAGAAA GAAATTGTCA GGAACTTGTA CAAGA?*PTC TN'TCTATCT A~TTATACTC AATGAAAATC AAAGAGCAAA CTAGGAAAC1' AGCCGCAGGC TGTACTTGAG TACGGCAAGG CGACGTTGAC GCGATTTGAA TT'TGATTTTC GAAGAGrATT ATTCGTAAAA AATCTCAAAA AGCCTACCTT TCGGTAGACT TAGTTTGTTT CTATTC INFORMATION FOR SEQ ID NO: 88: SEQUENCE CHARACTERISTICS: LENGTH: 7001 base pairs TYPE: nucleic acid STRANDEDNE-SS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88: ACGTAGAAAA ACTATTTCTA TCACAGATAA TATTCCGTAT GTTGrTGGAG GTATTGAAAT 
AAACGTCCTA GGTATCTTTC TCAGTCTATG TGACTTACAA GGGAAAACTC TTTTCGAGAC AGAAATTrTTG AATGAAGATT ATCCTAT'rTC AGAAATCAAT TCCACCATTA CCAATATGAT AAAAACAGCT ATAGAGTACG TCCCT=GGA AACAAAATTA CTTGGATTTG GCTTATCAAT ACCTGGACAT TATA.ACAAAG ACTCCGGAAG TATCATTACA AACAACCCCA TATGGGAATC* 18180 18240 18300 18360 18420 18436 a.
a a. a a a. a T TTT AATTTA TTAAATGTAA CGATTGTATG GCTATAGGAC TTTCCTACAC GCTGGATTAG CTCTAAAAAT CCTTATATCG TT~GTGAATGC GGAAAAAAAG ACACGCCCAA TTATTATTTA TGAAAAAGAC ATTCATTFrAG ACGTCAACAA ATTGATAAAG TTAAAAGATT CAATTTTCCT TTTATTGTAA A.AAATAATAT AATACCTTTT TAATCCACAC AATACCCCCG ATAACTTTAT GTATTTACAC TTCCTTTTTC ACAAAAGAA.A AAATAGGAGC GAGAAATTGG ACACACCATT GTCGAATTGA ATGGGCAATA GTTGTTTACA AACATATATT TCGGATGCTT GGTTAATCAA AAAATTCCCA ACTAACTGTA CTAAAAGCC TTGTAAAGAC ACACCCTTTT AACGGCTTAT AATTTAGGCG ACTCCGCTTT GAGTCAATTT ATTAGCCACT TCTATTGCAA ATCTCCTCCT TCTATATCAA CAGTCAATG CTTAATTATC AACCTTTCAC TCCAAGACCA GCTCCACTTC GTTCCCTTTA CTCGTAATAT ACAACAAACA TCGTGGAAGT ATAGGAGCTT GTGCATTAGC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080
CATCAATCCT
TCATGAAGTC
AGAAATTGAA
GCTGATAAAA
AGGGATAAAA
ATTTTACCTT
TATCGTCGCT TTTTTCATAG AACATAGCAA TGTAVTACAA GATATTATTT TATTAGAAAT CTATAGACCT GTTTAAATCA ACTATAACCT GTAGTAGATA
CACCTTAATA
TCTCGTATTT
AGACAATATC AAAACAAGAC GACTTCCATA TAGGAAACCG CCTCTCGCT ATGTTGAGTG ATTTATATTA AAATAACTNT TCTTCTAGCT GCATT 'rATT ArrATAAAAA CArrCATCAT AACCCCCAGA ACTTAAATAA CAAT?'TAT TCAAGATACA TACTCCTACA ATAAACTTTA TATGAAArrC TCATT?~rTGT TTTTACAATT CTCCTTAGTT AAA7NCTTG'rr TAATATATGT
TTTACATATA
TTT'GAATAGG
AACAATCAA'r
GTAAAAGTCG
TCACAATATA
AAAAGAAACG
AATCCCTAAA
TAAAGTTACA
GTATTTAGCG CCACATAGTA CTGAACTCTC TCCAAAAACG GTTA'rTCCTC GCGTTATCAC AAGAAAAGCA TCTCCACCTT TCAACTTCAT TGATGCrAAA ACCrGTACCT AGATGTTT-CG G?1'CATAAAA ATGAAATTGA TAGCGATAGT CAAATCAAGA GGCATCATAA 'rAAGTTCATC CTCCGAAAAA TATCATTCTA ATTurrGAAA TCAAATGCTC ATGAAACAAC GAATACAGGT ATCAAAACTA
ATGGCTCAAA
CCATCAAACI'
CTCTAAAAAG
TGCCTACATG
TGACAAAACA
TTTACTAAAG ACACTGCTCA ACTTTACACC TGTAAATCGT TGTTGTATAA AAGATCTACG ACCACACTCT TCTIAAATCAT AC;TGT'rCGCG AATATAT'rAC TGATAGCATT TCTACAAATA CAAGTAAAGA GAGCGGATGA GAT'rCAAACG AAATATGTCA
GTGCT'rTGGC ATTCCTAGCC TTCATATCAT AATACAGACA CTCGTAACAA CTGCTTCATT GATACAAGAG TATTAA'rCCA TAGCTCAGTT AAACCATTGA TAATAAGCAA ATAGATTCCA AATACTAGTG TAACTATCCT TCCAGTCAGA TTAAAGAATT CTATAGACAA AA7'rTTTTTCC 'rTTCTACCAA
CTATACCAAT
AATTTTCTCT
AGCTTGTCAA
CATATTTAGG AACAGGATAA CTAAGACAAA TAAGCTAA.AA ATCTGCTCAT TTTAATAAAC ATCACACCGA AAATT=rCT 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 AAAATTTATC TCGTTAGGCA ATCAAGCAAA AACTCGACGA TAGTACAAAC ATTATCATAC AGGATTGACT TCC1'AAATTA TATACTTTAG TAAGGTTTC GGATAAGAAA AAAGGTT1CAT TTTACATrTC TAAACATTCr ?rrCTAAGAT GAAAAACAGA AGCAACAAGA AGATTTTCAG TA'rCATCCTA TAGATACGAG TTTTG.AATA'r
TACGATTATA
TATTrTGCGAA
AAACTACAAT
TOATACAGOC
AAGTTATTTT
AATATAAACT AAATTTTATA ATCCAAACAA TCACAATI'CA TGAAGACTAT TCT'rATTAA ATNTCGAT TGTGATTTAA C'rAATTAAGA AAAACTACAT GGAGGAAGAC AATGGATTGG ACGCAAGCCA TTGGTTTCGC CAAACCAAGA TGTAGAAAAG TTGCTAGACT CCAAAGAACT AACCCGTTTT CAAAAAA1'TA GCTTGAAGTA TGCCTrTTCAA GAGCATACTC CAACTCATAA ATATGTGATT TCATTAAATA AACCTGCTAA GVTAACCAAT G'N'CAAAAAT TGATGGAGAA ATACAAACAT GGATAAAATG AAACCGGTCT TCCAAGCCCT AAATAAGGAA TTAATTCAGG AAAATCTGAC TA.ACAATT ATCTGTGTCG GTGGTTATGT 706 CTTAGAATAT CATGGTTTAC GTCCCACACA AGATGTTGAT GCTTTTATGG CTCTATAATA ?ITGTAGTGG GTAAATCCCC TATGGATATT ATGGAGCCTA TrTGTGTA GAAAAAAAGT CCCATATGAC CTATAATGAA AAGCGACAAA ACAACTCATT AGAAAGA6ATC ATATGGAACA ATTACATTTT ATCACAAAA'r TACTAGACAT TAAAGACCCT AATATCCAGA 7TTAGACAT CATCA6ATAAG GATACACACA AGGAAATCAT CGCCAAACTG GACTACGACG CCCCATCTTG CCCTGAGTGC GGAAACCAAT TGAAGAAATA TGACTTTCAA AAACCGTCTA AGATCCCTTA CCTCGAAACA ACTGGTATGC CTTCTAGAAT TCTCCTTAGA TCACTGTTCA AAAATGATGG TCGCTGAAAC TCGTATTATC AACCAAAAAA TrGCGCAAAA TGCTCATCAG CTGGCCAT'rT CAACTTCAAC TGAGCATGAT TTTTCGCGTC TTCCTGAGAT AGTGACTGTT TCAATCGGGA GATGGAGATG GATATCATCA CTGTTCTTGA AGGTAGAACA TATGATAGAG CCGTCCGATG TCGCGTCAAA TATGACT-rAG CTAGACAACT TTTCCCGTGT GTACAACATC TrAGCCGTGC TATGAGTCGT
TTCTATCGTC
GTTGAT'rGAG
TGTCATTCGC
TATGTCCTGG
AGCTTTATTG
CAAGCTGTCA
AAACGCCGTr TCAAGTGCTA AAGAAGAATC ATCAAATTCC AAGA'ITr'CTA TGACCGATAT AAGCTCAATG ATTCTCACTT GACGTTGAAA CAGTCCGGGG CGCAAGATr TGAAAAGCTC TCCGAGATCA CTTTCTTAAA 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 t.
0 ATTATTACTA TGGATATGTT GCTAAAATCG 'IrCTTGATCG GTGCGTGTCC AAATCATGAA CGCTACTGGA AACTCATTCA CCTACTrr'rC GTATGCATTT CGAAAATCCC ATGAATACAA CGTAAACTCA GCGATAAACA GAGA'N'?rAG ACAAGCTr CAACTCTTGC TGTTTCACTT GACAATCTTA AGCAGGTTCA AA.AGAAAAGG TTATCAACGC AATAATCTCA TCAAACTTAT
GGCTATCAAG
TTTTTATCGC
TAGTCCTTAC 3660 CTTTCACATT 3720 TCAGTTTCAT 3780 ACAGGATAGC 3840 AACCAATAAA 3900 GAGCTATTCA CAAGACTTGA AACATCACTA TCAGCTCTAT TCAGAATAAG GAACCGGAGA AATTTTTCGA ACTTATCGAG TCCTATTTTT CAGACTGTCT TTAAAACCTr CCTCAA.AGAT CCTTCAACTA CACTATTCTA ATGCCAAACT GGAAGCGACC CAAGCGCAAT GCCTGGTT TTCGAAACTT TGAAPAACTI'C 3960 4020 4080 4140 4200 S. SO S S
CATGAGCCAA
TTTCCAAACr
AATTGACCTC
GTTGATTGTA
CATrTCAGTT G'rCATGGATI' TCACAAATGT CA.ACACCATC TGTCCAAAAC AATAAATAGG TCCCAATAAG GCTCGTAGTT TAGTCCTTGA C7TrrGACAAT CCGTCGATTA CGATACACGC CTCCTAGTAA TAACCACAAC CAGAATTGCC CAATTGGTGC CG'rATACGGA ATCCGCTTCC TATACGGTGT ACAAGTCAAC AAGGTCGCA'r CAAAGTCATT CGGCTCCACC G'rCACTATCT AACTTTTAGC TGATCCAAAT GGCGGAAAAA AATCACTGAG CGAATCCCTG TACCATCCAG GCCCATCCCT AAATGATGAT AATCTGCTCC TGGAATAGAC AAGTAACCAT AGACTGCATC ATATCCCTCC GCCAAAAAAG GATCTACAAT GGCGAGAGAA TGGTTCTCTT GTTCT'rGGTA AGCATGACCT TTCACCTGTC CAAGAGACTG AATCAAACAG GCTACCAACA TCAACAAGTA CATGACGCCC CTCCAATTGC TTTTCTAGTC GATACAGCAA GACAAGGATC ACCGCCATCG CACGCTCTCT CACCGCTCGA TTCCGCTCTG CACGTACCAA CAGACGATGA CTGTTAATCA AATCTTCCCC ATGTTGAATC AAGACAGGCT GATCCACTTG GTAGGCCAAC ACCTGATCTA 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 AAACGTGAAG ATAAAAGA'rA 'rCCCCTTTr'T TCAT ATC CAAT'rGACTG CCGTTrGGCAA TCCTCTG'rGA GCAGTGATCA CTGTATGGGT AT'rTTCACCT GCGAAGCCCC T'rCTAACAGC CCTGCCCCTT TCTGAAGAA'r GTCCTCACTC ACATCGGAAT TTCCTGATCA ATCGCAGGAA TTTCCACATA GCCAATCCGC 'ITAGCATATT GGCA'rATTCT GAGACGCCTT TCrI-rCTC ?TGCTCTGTA GAAT'rTCAGA TCGTTTCAAG GTCGCATTGA ACCTGAGC CAAGCGCCAA GTTCrGCCTT ATCCATCTGG GAAACCGTCT CATCAAACTC T1rA-ATAACC CAATACGATA ATAATAACGA GACACCAATG GATATATCGC AACGGCGAA'r AAATCAGAAG AAGGATCAGC GGATGTTCT TCrTTrGT GCCTrTTI-rT TACTGTTGTC CATCCTCCAC CTTCACTTCC TCCTTGCTG CTTTCACCGC TTTTCCGGTT C'n'rTCTT CTTGCGCA.AG CGTCGAATAA TCCATAAAAG AAACCAACTG CCACATAAAA CAGGTAGCGA TAGAGATGAC TGAGTTTGTT AATTCTTCCT CAACCTCTGC TACCTACCGT ATCCGATGCC CCCGAACCAA GTATTGATCA TGTATGCCGT ACAAGTCAGC AAGGTCACAT AATCATGACC
AACAATTCTG
CCAACAGGCA
GTTCCGACAT
TCATGGACCT
AAAGGATCAA
CGCTcCCCAA
TCGTTTGACT
CCTACTAAGA
CGTGAACGTC
CTTCAAAGCC
AA'rCACAATC
TGCTGCAATA
TAGACGATGG
TGCTACAATC
708 AATAAATCAT CA).AGTrCGT CGGCTCAATC ACCTTTACTr GATCCACTTG ATAGGCCATC ACTTCCTPTGA TATTGTGCAC ATAAAACTTA TCCCCAACTT TAAGTTTGGT CAAATCCG'rA AACATCTTAG CTGTTGGCAA ACCTGTATGT GCCGTAATCA CCGCATGGGT CGAATTGCCT CCGATCGGCA GAGAAGTTCC CTCTAGATGC cC-AGCCCCTr GCTGCAATAC CTCTTCAGCA GTACCAGCAT AAACCGGCAA ATCCACGTCA ATAACGGGGA TTTCCACATG CCCCATCCC TCATGGA'ITT CTAACATACG TGCATACTCT GCTCGCCCTT TTTTCTrCAT TTCTrCCGAC CAAGGATCGC CACTCACTAC ATTATTCAAA GAGTCATTGy-A AGGCT TGTGC CAATrTCAT CGTTCATCAA TGTCAGCCTC A'rCCAACGT'r GCTTTTTCCT TATCAAAGTC AGCAATTTGT TGATTTGATT CCACTCGATA ATACAAGCGA GACACCAGCG GATACGCCAT TACCGCCATT CCAATGAAAA ATACCACTCC TAATAGGAGA TTAT'TTCGTT TTTGCTTTrTT TGTTTTACC ATTTTTATCA GCATCCCTTT ATCTTCAAAC TTCAGGGTAT C INFORMATION FOR SEQ ID NO: 89: SEQUENCE. CHARACTERISTICS: A) LENGTH: 10411 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89: 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7001 GAGGGAGCTT AAGAAGTTAC GATATTTTTA AAACTGTCGC TTTAGTGAAA TCTGCATTGC GATGATATTG ATCTGGTTTC TTCAACAATC TTGCGGATAT GAGTCCTGAC TTGATATCAG GTCCTTGTT T TCAGCATTGA ATTAGCTTGT GGCACCTTGG ACTTTCTCCT GTAACTGGAA CGT=ACTGCT GCGTTTCGGT AATCTTGACC TCAAGTGGCG CTGTAAGCTA GAGTCATTGG CTTGAGTTCT TTGGTATCTG CACCGTCCTC TAGCGCCTTA TCCGCATCAA AGTTAAGGTT CAGCTTGTGA TACGATGCTr TGAGGATATC ACTCTTTGAG CTGTTATGAC CTGATCAAGT CTTCTTCTGT CAGA'PTTCCC CTAGGGCAAC GTTT-AATTTA TGTTTAAGGT CATTTAGGGT AGATTCAAGG CAAAATTGAT TTGTAA7I" TTAAGGTATC TTACTTTCTT TAGCT TTGGC TTAGCATCAT AGCCTGAI*?T
TATCTGACAA
CTCCATTAGC
TAGGGGCTGC
ACATATCCTG
ATTTGTC.ACC
CCACA'PTCAT
TTGAGGCAT
AGCTTTTAGC TCTTCTTGAG CCAAATCTTT CTCTAGCGAA TAGTAAATCC CTGCTAAAC TACAGTGATT TTGGCATGTT CCATACCCAG AGTCACCrA GTGATATTT'r CTGGTGTTTC TAGCT'rTTGA ATCTTGGCTG ATGAATACAA GATTTTAGAA TAAACATCAG GTGTCATGGT GTAGCCCAGT I"?=TAAGAG TITTGATTTTT 709 TTGGTCTTCA GATAGGGAGG AACCTAGGAC ATATrCAGGT TGGACATAGG TTTCATCCAT AACrTTTTGA ACATCTGTTG CTGCATGGAC GCTATTCATA GCTGTTACTG CCCACAACAT CGCAGCGCTA GTCAGAAAGA GTTT-CTTTCT CATAGGGAAT TTCCTCCTT'r ACTTCN'TAG AGTAATATAT CTATCTTAAA GAAAACTTAT AACAAAAACA CCTGGTCTAG CCAGATG AAAAGAGAGI, GAAACATTTG ATGATGTAAA GGTTAAGTCG AGTTTCCTCC ATTTACATAG TATAAGTCTT TCCAGTCCI'A TGGCTAGAGC CAGGGI'AT CAAAATTATA GGCI'GAACA GATTrGCCTGT CAGA=~TGA TGGTGGAACC ACTTGCAGAC CAGTATAAAT CATAGCAAGC TTTCTCGCAC CATGGGTTrGA CTTTATAGCC AGCAAAAAGG ACTTCAGCAC CGTGAAAAAT TTACCAAGCA AGGGGGCAAC 'rCCTTGCCAT rTTTCGGCGAT GCTGTCCAGA TATCTACCCC A'rCCCTrGCC GAATGCTAGA TCACTAGACT GCATAACATC ACAAGCTCTT CGTTTGCTGG TAGGTCATGA CTTGTrTGAC AAGACTGCTA GTACAAGCAC TACCTGTCTA GAATAATAAT GGAAATGGGG TGAATATAAC AGTCTCACGA OAGTACTGTT AAAATCGATA TAGGCAGGTC CTTCT'rCTGC GCCAGATAGA GGCATTATCA TTGATGGTGT GCCTTCTTTT CCTTTTGT GGTGTCTTGT TCACTCAATA ATCTTGATGA ACCCGGTAAG TCTTCGAAT'r CGTTTAAACA 6 6 66 6 6 6* *6 6 6 6 S S 6 S. *6 6 5 6 656665 .6.6 6 *666 *6 *665
.6.6 56 66 4 6 5 TTATTTACTT TGGATATCCT TTGCATAAAC TCAACAGACT TCCATTTGAT AAGGAGAGTT TTCATCAAA'r TGAACAGGTT AGCCGTCAGA TTGTTGTCAA TTCTTCTAGG TTGTTGAAAA GTTAAAAGAT 1-rAGCAATTC AGAAATCTTA GGAGCGACAG GTACTrGATT CGTr'rTTCAA CTCATCAGCT CCTGTTCCAA
CGATATTT
CGGCGTCTTG
GATTAACATA GAG;TAGGTTC CATLTTCGTT ATAGACGTTA TTGGGAACGA TGAGCTCAAT 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 'TGGTTTTC AAA'rTTCT'rr AATTGGCGAC TGGCATCAAT CTGGTACGGC TTCTGACT TGGTCAATAA AGCTCAAACG AAAGGTCATT AGCCAAT~rrC TCAGGTGACA ATTCATTGCT TAGCTGATrT GACCTTGG.AT TGAAATTGA.A AATCATCTGT TCTGGGCTGT ?Tr'rCCAGT TCCTTGATAG A7"rTTTTAGG CAAGAAGATr ATCTGAAAAA TAGTTCAAAA AAGTCCCGTT TCAGGTGA'rA CTTGCTACTC TGAAGATTGA CCACCAAGC ATCCAGGCAG G'rTATTCTGA G'rrAGCT'rGA rrGGATTATC AACTTCTCCT CCGAGGTGGG 'rCAAGGTCTC CCGCAGGGCA ATTCGCAAGA AAGCGAAATG 'r'CTACACCT TCTTTAGAAA ATTGCACAAA AATCAAGTCA TTGGTCTTGA GAT'rTTCAGA AA'rGCTAAAC TCCTCTTTCC AGAGATTAGC CAGCG'rrACT GATGTCTCCA ACA.AATCCTC TGTAATATGA TTGAAGAAGG GATTTCTTC TTCGAAA.A'C CCAGTCTrGG CTTCATCTGA ATACACATGT TCAATTrMT TACGCAGGTA TTC'rTCGAT'r TTGGAGTAA TATTGACAAA CTTATCTGCT AAGAACAGT= AATATAAATG TCCATAAALAG TGACTGCACT TCTCACTTCr TGATGAGTTC AGCCACTTTG CTGCTCCGAT ACGAATCCCA LlT~rArrTAA GGTAATM'TTG CAACTTTAGT CACATCAACA AATCAGCGTC TNGCAACAAG CATAT'rCCTT GAAGGCTGGA CATGCTCTAA AGGACCGCCC GTTCT'rCGTC ATTGGTCAAA TTGT743TTGT GATATGAGCG CAGCGATATG GGCCATGTCC 710 CGGTATCATC CGGACTG4AAC TGGTGAATAA TGGCTTTCTT TTITTAGTCCT CGTATAA'rGG GAAGGCATCT GTCAAT"=T TCTAATACAG CCTCATTrrC TGAATTCTTA AGGG'rTTTAA CGAC'r=T CTTCACCAAA TCCACGTGCA GTAA'rGGCTG CTTGTCTTGA ATGGTGACAAGCTTrCGTAA GCGATTGAG'r ACTTCATCCA ACAAGITTG AGCAACrTMT CCG7=rCTA AGGAAGAGAT GGI'rTCAGT TCCACCTGAA ATAATACGGA ACATCTGCCA TAGCCTTGCT cTTCTTAAT-r ACTCAG 'rCCAAAACTT CTTTGAAGGA AACTGCCTTA GCCCCACAA
TGAATACCTG
ATCAAACCAC
TATGGAACTG
ACCATGAGCT
GGA.AAA'rAGC TGAATTCATT TTT?'rAGCAA CACGAGGTCC ACGAACGTrT TTGTGGGTCG GGCTTGGATG AAGGCCAGCC GCAACCAAGC TCGCACCGAC AGCATC'IGCG ATTTCACGGA A7"rTwGAAAA ATCGATAATT TGAGAATAGG CTGAAGCACC TrTACTTCTTG GGCTTGTTTC AAGATAGCAT CAAAGTCTAA CACTATAAGA AACAAAGTTG TAGGTTTGAC CAGAGAAGCT AATGA CACC 1'GATGCCAAA TCCATTCC:CA TAACCGTATC TGTAAGCCGC ACAGTTAGCT TGGCTTCCTG, A.ATGTGGTTG CGAAAAT'rrC TTTTGCGCC? TCAATAGCAA GAGTCTCAC CACCATAAPA ACGGCGTCCI' GGGT1AACCCT CGGCA'rATTT' GAGCTGCCAT AACAGCCTTG GAAACTACGT T'rTCCGAAGC GTTGGCGT'rC TTCTTC'rTTG GCAATAGCAT TCCAGACATC CATCTTTrGTC AAAAATCATA GGTCTTCTCC TTTATTGTGT TTACAATAA GAAAATCAAA CTAACAGA'rG CG.AATAAACC AGCTACAATC AGTITTTGGT'r GACTTCCGTT TT'AGGATCAA AACAGGAGCC CCATGAGTCA ACCTGGCTCA ATCAAGGACA AACATTGGCA AATTTAGCAC AACGTCTACI' ACATCAGTTrC ATTTGTCAAG ATAGACCC= AATTAACTCG ATATTATTT AGCATCA'rAT GCT'rTAAAAT GACTAGTCCA T'rAGT'rTGAT GTTTrCTGCAT TTTATCACAA GATGATTGGC AGAATTTAAG 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 GTATACCCAA CTTT"rrCATA CGGATAAACC CATAACCACA CCAATACCT'r GACCTTGCGC AAATGCATGA GCACCCAGAC TCI"=CTCT TC7TCTTCCA ACCCTTGTAG TAAACr=A TTGAGGTGAA ACrGCrAAAG CTAAGATATT AAATCCTCCT TTGGAATAGA G1'GAI-rCG'A AACTTCAGCG TGGACATATC CAAGTAAGAC ATGATTAGCT GCATCCTCAT AGCCAAGTAG GAAATGATGG GAATCCTGAG ACAGTCTAGC TAGTTGGCTA GCCGN'TCCT CTGGACTAAA AGTATAACCC AAA~CCCrr GGTTGATGTC ACATATAGCT TTCACATCAG TTrCCrrAA CAACCGAGCA AGAATATCrr TCCCGACAAA TTCAAAATAG CAGAACCTCT TGGTCAAAAT 'rGAGATATTG GCAGACG4GCC GGGATGACTA GGCATCCGAA TCGGTCATTA GCcTCGAGAA 'rTGAAGATAA GTTGCCTGAT GAGATTGAGC GCCTTGTC'rC 'rTCGTCTAGC GCCTTAGCAA ACCATTTTCC AACTCTTGTC CAAATTGGTC CT'rGAGTGTT GAACACTTTG ACCTTGCTTG AGTCTTTTGC ATC'rTCCGCA GAGCTAGA'rG AGGCTCCGAA AGATATAGGG TGGATTGGAA CAGATTTTrT TAAAAATATT AAGCATCT-rG GGAAATATCT CGAGAGCAAT AGCTCCACTA TTTCAGCCAG GATAAGCTCC CATCCACCTT TAAATGCATT TGTGAGCTGC TAGTTGCTGG CCTCCI'GCTG GAGGGCAAAA CAAAAGAGAG GCTTTCCGCT ATAATTGACC TAAT7"TCATr 
CACCAAGGCA TCCACAACTT GGTCA.AGCCG ATACGGTGGT TTCTGAACGG TCACCAGTAC AATCTGAGCA AAGTGGTCAG 711 ATCTCTTACC ATCTCATTCC CTCGCTTAAT CGCCCCTTGA TTGAATCCTG TCCAGTTrAGA CCTCTAGAAT ?TGATTAAAG CAATCAAGGG ACCTGTCTCT ATCCAACAGT TGCAAGGCCA TAATGGTCAA GGGACCTG4GT TCTTTGAA.AA GTACAAGATG TACGTCGACG TTTAAGCTGG AGAGACCGTA AACTGTCTCT TAATCCTGTC CATCATCAAC CGTAC'TCGCT TTTCAGGAAG TATCCAATT CAAGGTAAAT ATTCTACGGT AA.A'rACTAG
CGTAACAT
AAAGCATCGT
GTCACTCCAC
CGAATCAAAT
GAATTGACCC
AAAAAGATCT
TCCTCTAALAG
TAAACATGGT
TCACCTTGTC
CrrCCAGACC
TCGCCTGACC
CAAGGGTAAT
AATAGGGAAC
CTACAAG'N'T
AGGCAACATT
CAACTGCTTT
GTAGGCAAAA CGACAGCTCC GACAACCATC CTATCTTGAC
ATGTTTCCTA
CTTACCACCA
GCCATCCTCA
CrCTGACTCA AAAAG7TCAG TCTTrTGAGAT
TCTGCAAAGA
TCTTCACGAG
TCCTCAAAAG AAA'rCTTTGG
TACAAGACAT
ACAATTATAT
TCAAGCCTAC
CATATTTT'rTC 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 TGAAGATTTT GATTTT'rAGC GCTGCCGTCA CrGACCAATC CCTGTTCCGA TATCTAGGAC ACCAACTCCT CTGTTTCTGG CCATAAAAAT CTGCCTGTCC TAAATATCTT CTACAAATTG ATAAAGTCTG TAAAAGATAG TCCTCTCCTT GTCTTATCAA A'NrGTT'rAA TTCTTCTAGT CGTCCAATTT ACCAGACAAA CTG'rGACACG GTrl-rGTGGG CGATTGTCGA CTTACGCTCA CAACACGGGC ACGGATGAT'r AGAAA7TT GTAAAACAGT AT'r7TCGCTA GCTACATCTA TGGTCTGT1'TT TTTGCTAGAG CATAAGATTT TTCACAGGAT ACGAGGAATC AAAACCCGTT AATGATGTAC TGAGCTGGGT TrTCTTCC TCTGTTGTCA ATTrTTTCAGA CTACGATAGA CTCT'rCTrCA AAATTTGAAA 'rTTTGTTT GGTCATAAAG A'rCGTATCTA GTTTTGGAG
AACTTATAAG
GCGTCCTGCT
TT-CATG~CCT
'rTCGGATCCG
CATCTTGAGC
TCTCACGGTT
712 CTTCTGCTGG GTACG'rTCTT CCTGCATCTC AACCTTGATA 7rGGTTGGCA ACTGAACGAT ACGAACGGCA GTCGCAACCT TATTGACGTT CTGTCCACCA GCACCAGAGG CGTGATAGAT GTCGACACCA AGGTCTTTTG GATCAATGCc GTATTCAACC AAGAACTGTC GCTGTCGAAG TATGAACACG GCCT~rGGCTT CACACCGTGG GCACCTGATT CATACTTAAG CTTAGACTAT AGCAACCAC3' TCTTTAAAAC CACCGACACC A7TCATAGAG CCAACCTTGG GCTTCCGCAT ACTTTTGGTA CATAGTTrAC CGCTTCGTCT CCACCAGCTG CTCCACGGAT TTCAAGGATG ATCC7TTGGA AGGAGCAAAA TTTTCAGTTT TTCTTCATAT TC7"rCAACTT CTGGCATAAC TCTGTCACAG GAACACGTTG ACAGACTGAC CTGAAACCAT GCTTCCATGA CTTCAAAGCG AAATCTCCAG CGAAAAGTGC ATATTCTTGT CATCGTTTGG TCTTCTT??T CAGCCTCGCC TCTCCGCCTG ATTCCTTAAT TACTCACGGT AGGCTATTAC AAACGCTTGG TGTCTGAAAC CGGTCTTCTA CAACTTGTAG CTTAAATCAT AGATTGCTAC
ATCTTTGAGT
CATCTCTTCG
GGTGTCACGA
GACATCAGGG
TTGATCATAG
TACTTCATTC
TGCCTGTTCA
TGTTCGAACA
ATTCT'rATCA
TCTTGCTTGG
GCATCGACGA
TTGGAAGCTT
TCACTCAGCA
ATGTTCATTT
CCAATTCTTC
TATTTTGAAG
CTTCTN'TGA
ATTCTCCTAA
TTTCTCCTTA
CAACTCCGCA
GACTTGTTTA
AAGCTCCATA
TTCrrCATAA
TTTCTCAATT
TCCGATATTI' CCCCAGT'rrC TTTAAATCCA TAACTGAGGT AACAAAATC'r TTTrCTGGTT CATArGACAA CCAAAGTTTA TTCCTTAAAC CTGCTGGCGC 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 TAG'rCTAGTA CiITATCCAT AATTGGT'rTA AA.ATATCCTT ATCATAAAAC GAAATACTAA ATAATTTCCA CTACTAATTC
GATTTTGAAA
CGATCTTTTT
CCAA'rGGAGC CGT'rGGTTGC ATCATAAGCT ATCATCACAA AACCTATAAT TGCATCATTA TCATAAACTG TACAAAATCT CCATr'TTTAG TGTAGACGTA TGCTTCAGCT AAACTAATTG AATGAATTGT.7*rTTGATArr CCTTGACATC CAAATTTAAA ACATCAAAAT AATTTTCCAT TGTAACATCT CTTAGTTCAA TTGTCATAGT 'rrTGCTCCTT GTTAGAGGTT ATCATTGGCG CAAAATAATG TTTACGGCAA ACTGAGATAT AGGTTCGTT ACCACCAATC TGGATCTCG2T CTCCATCGTA AACGGGCAGT CCATCCTGTG TTCGCAACAC CATGGTCGCC TT'rTTCTTGC AATACTGACA GATGGTCTTG ATTTCGTCAA TCTTGTCTGC TAAAAGCAAG AGATATTTGG AACC?1'CGAA CAATTCATTG CGAAAGTCAT TT'rTCAAGCCC AAAAGCCATG ACGGGTATGT CTAACTCCTC CACAACACGA GCTAGGTCGT AAACATGGTG GCGTGAGA AACTGGGCTT CATCGACCAA AACACAGTAA GGTTTTCTG GT1AGGTCTCG GATATAGCCA AAGATATCCG TrGTl-ICCTC AATCGCAAgG GCAGGGCGI' TCATGCCAAT TCGACTCGAC ACATAGCCAA CGCCGTCACG CGTATCCAGA GCCGAGGTCA TAATCACAAC ACCTTTTCCr TGCTCCTCGT 713 AGTTATAGGC CACrrTGAGA ATCTCAATCG ?TTrACCAGA AGTACAACTG TGCCATGTTT CTrGCTTCAC GTCCATTTCT TATA'rCATAA TITTCTTAAG CTN'AAACGG CAAAATGCG ACTAAGTGGAG GAAGCTATT'A TGCCATI'TG ACGCATCGAT CGAGCAAAAG AAAGCTCTTG CTAAGGAAGT AACGGAAGCA CCCTCAATCT GCTGTCCATG TCATCATCAA CGACATGCCA AGGGGAAATG CGTACTAAAT AAGCrAGC?1' AAGCAGAArr CAAGTAGCAT TCATTGAAGA AATATCCTAA A~rTGTTACA AATrTCCAAG AAAAGAGCrA 'rTAATTAAAG GAAACATTAT C1rrCATGGTrC CCATAACGAT AAATTTTTGC 'rACATTCTAG TAAAATAGAA GAAATCAAAA TTATTGAAG GACGCACGCT GTTGTCCGCA ACACTGGAGC GAAGGAACTT ACTTCCCACA GCTTAGGCTT TTrCAATCTC
ATTI'GAAAAG
GATTACACGT
TGGTATTGTC
AAACTTGGAG
GAATTTGATA
CGCCTGAGCG CCATCGCTGC TATCTCTACT gAACAGACAG TTTGCTATT CCAGCCACAC TCTCAACTAC AG ATGGT 'IGGGGCTATG TTAACACCCA CGGTGGGATT GGGCTCGTr GGCAGAACCT ACTTGCACACA CGCAGAGGCT ACATTGCGGT CAAACAATTA AAATCCTCAA TACACTTGCC TTGACGAAGC CACTACTGCT CCAAACTCCI' TAGGACAGCA TCATTGGACG TCCCAACGTT AGGCTATCGT AACAGATATC TCAATGGTGT ACCTCTCAAA TTGAACAAAT TCGACTTGAG TAGTACTAAA CGCTAGTG3AA AGCAGACTAA TCGCATATT CGCAACTACC TGAAGATGTC TCCAAGAGAG AATCAACAAC CCTACTTGTC AAACGCCCGT GCGCAAAAGA TTTTTAAAGG AAAAGACTTG AACAAGGTTIG GGTCACATTA TTGATCCTCT GACTGGTAAA GTCATGGACG AAGTCTCCAA AGACCTTCAC TCGTGAGGAT ATTATCGAGA GCGGTGACCA ATGAAATTCT CCAGCTAGCT ATTCGTGAAG GGTGAATTTA CCAAACGTGC 7~TTTrAAAC GGTCGCGTAG GTGATGGATA TCATCCGTGC CAAGACTGAC AAGGCCATGA CACGGCTCCC TTTCTGACCT CATTAACAAT ACCCGTCAAG CAAGTTGAGG TCAATATCGA CTATCCTGAG TATGACGATG GT'rGTCCGAG AGAAGACAAT GGAGTTTGAG CAATTACTAA CGTCGTGGTA AAATCCT'rCG TGAAGGAATT TCAACGGCTA GGGAAATCAA GCCTTC1'CAA CAACCTCTTG CGTGAGGACA GCTGGGACAA CACGAGATGT CATCGAAGAG TACGTCAACA TTGATTGATA CAGCCGGTAT TCGTGAAACG GATGATATCG CGTTCGAAAA AAGCTCTTAA GGAACCTGAC CTAGTTCTGC CCACTAACCG CCCAAGATCG CCAACTCCTA GAAATCAGTC CTTCTTAACA AAACTGACCT GCCTGAAACG ATTGAAACTT ATCCGCATTT CAGTTCTTAA AAATCAAAAC ATCGATAAAA CCTCTN'G AAAATGCTGG TTTGGTTGAG CAAGATGCTA CACATTTCC2' TCATTGAGAA GGCCG7"rGAA AGCCTACAAG CCACTAGGTG3 AAGGGGCTAT 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600
CTGTTAACCA
GTACTTGGGA
AACTCTTTAG
TGGATTTTAG
ATITrTATTGT TAGAAAGAA'r
TAATATCCAG
GGACTACGAC
AAAAACCrC
GAAAGCGTCG
TCAAGAAGAA
AAAAGATTTC
GTAAGCTCAA
714 AGGTCTTGAA CTAGGGATGC CAGTTGACTT- GCI'CAAGTr GACTTGACCC AATTCTAGGA GAAATCACTG GAGATGCTGC TCCAGATGAA CTCATCACCC CCAATTCTGT TTAGGAAAAT AAGAAAAATC CATGATCCTr CATTCGGTCA GTTCTATAAT AT'N!GTAGTG GGTAAATCCA CTATAGATAT TATGGAGCCT AGAAAAAAAG TCCCATATGA CCTATAATGA AAAGCGACAA AACAACTCAT CATATGGAAC AATTACATTT TATCACAAAA TTACTAGACA TTAAAGACCC ATTNTAGACA TCATCAATAA GGATACACAC AAGGAAATCA TCGCCAAACT GCCCCATCTT GCCCTGAGTG CGGAAACCAA TTGAAGAAAT ATGACTTTCA TAAAATTCCT TATCTTGAAA CGACTGGTAT GCCCACTAGA ATTCTCCT'rA ATrC!AAGTGC TATCACTGT'r CAAAAATGAT GGTCGCTGAA ACTTCTATCG TCACCAAATC CCTCGTATCA TCAACCAAAA GATTGCTCAA AAGTTAATTG TATGACTGAT ATTGCCCATC AGC'T'ICCAT CTCAACTrCA AC'rGTTAT'rC TGACTTTCAC TTTAAACATG AT'r'rTCTTG 'rCTTCCTGAG ATTATGTCTT 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10411 GGGATGAGTA TGCTTTTAC.A AAAGGGAAGA T INFORMATION FOR SEQ ID NO: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 2393 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: a.
a a GTTTTGGGTT CTGGAAATTA TCAGATGGTT GGAAAAGCCG GGAGATTTAA GTTT1AAATTG AAGAAACTAA CACAGAGGAA GACGTATTGA GCAACTGAAT TTGTCTATTC GAGGATGGAT ATATGAAAAG TATAGTCGCC AGCATAGATG AGCGCTTGCG TCTGGAAGCA ATGGAAGAAG AAATCGAGAC GATTATGGGG CTAAATGGAT AGCAGATAAG GTATCTGGCT GGGGCGACCA AGTCGGTACT TAAACGTGCT ATATCAAAAC CAGTCCTGGA GTTTGGATTA TTACCTTGAA CGACATGCGT TAAAAGTTAG CGGCACGTAC GGTGGTGTGA GAGGGGCTAG AGATTATCCC AAATT-TATTT TAATTATGCA AATTTCACGT ATN'TTGATG TCCACATCAA GATAGTGTTC ATGGAGTATA GACCTAACAA AAACTATTGC TCATTGGGAA TACTCGCCTA CGAGTGA'N'A ATTGCTTAAG TTAGGAGTTC TTATCAATTA GTAGCTCAGA AAAACGTGG3A CTGGTT'rCGT TTGAACCGCC GTATGCCAAA CTACTCGATT AACTCCCCTG CTGAGACGAC GATCCTGGGA t. G .fl *5 0 00 00 0 0 00 0' 0 0 0 0t 0 00**b* 0 0.00 ACTTCAGA TArrrTTTG ACrTCTAAA GGATT'rGAGC GTTTrCTGA TTTTTAAGAC TAATTATC ACTAAC'TAAC TAACTrCTTA TCCTAAAATr AGGAATAATA AAGGCAATAG 'rrTGTTTTCT GCTATTTTAT GCTAAAATAT ACTAAACGGG GAGCGCTACA TGTCTAATTC TGCAAATTTA GCAGATATTr TCITAGAGT AAAATCACTA ATTGCCACAT CACTAG?=CC GAGTCT'TrA GTTCCGTTGG TTACTAAAAG TCAAT'rrGGA AAGACTATAT TATTCGCGAT CGTAGCGCCT TTGGTGACCT ATCTATTTGT AGCACCCGTT TCCTATGCTA 7rGGCC-ACG AGCCTTATCA ATGACTGGTG AAGCTGTTCA GTTTGCAACA ATTGGTCTGT TACCTACCAC TAGCTTTCTG ATGTTATTTC TTCCTAACGC TCTTGAAATT TTGCTCAAAG GTTGGA-aGTT TGTATCAGCA AATTTATTGG AAA7M=TTC TGTTTTTGTA ACGGAGTTAT TAAATAAAAC ATACTCTATT GGTATTATAA TTAGTGGCTT TGCTGCTAAA TGGGAACCCC AATTA'ITCAC CCTTAGCTrA GATCCTGGAT GGTTTCTTTT AAAAGAGTNT CCCCTTTATG GTATAACTGT CATGAACAGT TTACCAAATC ATCACTTCCA TGGAGGTCAT 1-rAACCCAGT ATGGTGGTCT GAAACTAAAA GAGCGGATTT CTAAGTATTT TTATTCGGAT TCAGATATCC TTGTCCAGTT GGACTATGCT TGTAAAGAAT TGTCAGCTGA GCAGCTTGCT TCACAGCCAA CCTTATCCCG CCATAGTTTG CGATGCCTCA ACCTTGAATT
ATTTGTCAAG
AACAATCATT
TATCTTAATA
GTTAGCGCTA
ACTGGTAGGA
TGTTGCAATT
CTATGCGACC
ATTGATAGGT
GCTATCAAT
TGAAGTGGAG
AGTTGCTAGA
AAA'rACGATT
GGAAAGTTAC
AA!-TGCTT
CCCAAATCTA
TTCACCCAAT
AGAAAAAAAC
AAACAAGTCT
TATCTT-ITT
AGTAACGAAT
CCTCTTTCAA
TGCCTACTTT
TTTTCT'rTCC
GGTCGAATTC
TrGTTAGTCT CTCAATTA'TT GCTAACATA'r ACATTATTTC GGAATATCCT CTTTTGTTGC AATAGGTT TATCTTTATC ATGTTTACCG TAATGCAATC TCCATACTAG ATGGTTI'TGC GA'rTTGGGTA AGGCTAATTC TGGGGATTAG GTGGACTCTT TTAGTCTTGT ATATCATTTC GTGTTAGAGT CAGAAACTAA AATCCTAGAT TAAOACTTTT TGGGTTITC.Er CCATTATACT 'rGGGGATATT CTAATACAC AGGCTATCTG AAAAGTTCCT AAAACCATCC AGAATCCTTG GGGTGTTTTT 'rACTAGACAA ACAAAAAGAA AGGAAACTCA TTTTACCAAC TA'rCTT'rCGA CAGGAAC1-rT TTTCCCAGTT GACCAACGCC GCTACTGTCG CTGTTAACAG G'rTATGGAAC CCAAAATTGT TGGAAGGAGG AGAACTGACG AGGAAACAGT TTTTTACAGT TTCACCAGCT TCTATCATTA GAAAAGCTTA GAGCGCCAAA TTTrrCCAGT CTCTN'CG ATTGAAGA'rG GTACrAGCCA ACAACGATAA TCATAATTCC 1TTIrGTTT TTCATCGTAAA AAACCTCACT TAAAAATCAA ATrrAArrCC AAAGTTTrA 0000 0*q 0 00 00 CO 0 0 716 AAACCAACTC ATTGTAGATA ACGATTCTAC CCATTTCACA AC?1'ATGGCA AGC INFORMATION FOR SEQ ID NO: 91: SEQUENCE CHARACTERISTICS: LENGTH: 4762 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91: 2393 TT-TGTATCTT TTAGGTCTC ?1'TCAATCCA CTGCAAGTCT TGTGGTAAT TrAGGTrTGA ACGCTTCTTA TTrTGATGGTC G'rCT'rCCCTG AGAGTCAACA GAGGGGGCGT AGTGCTAGAA GGAGTCGTTr CTrCAAAGGA AATCTATATC CTGTTCCTT-T TTTGATGAAG T'TTGTCCTTT TTGCTGATNT GGTAAAAGAG GAGACAAATA ACTGCGACTC TTrATCGTAA GAGTGAGCGC ACTATACAGC AACTGAGGAA AATCGTAATA CTAAGGTGAT TGTGGTAAAT GATAAGGTGG AAGAAAATTA CCAACAAGTA AAGACTGATT AACCAAATCA ATATCTTGTG TATTTTTAAA TTTCTTrT TAGAGTGGTA TAATACTTTT TATGATTGCA CTAGAAGAAA AAATTACAAT TGGGAGACGT GTTGTATTI'G ATGTGGACAA CAAGGTrATG GATGTGACAC CCCTGGTTGA TATTACAGAA ATTCATAGTC GCT'rTCCAcA CGTAGAACAT GAACTCCTTG AAGCCAAAGA .TCGGACACAG AGGGAtTIG AGCGCTCAAA ACTTCTCAAC AAAGACCAGA CAGTTGTCAA TAACACTCAG CGTGATTTGA CAGCAGGGAT TCCTAAGCAC GTAGCCAATG CCCACCAAAA CAGTCCCTAT ACCCCTATGA CCAACTGCTG
TTTTACTTTT
TTTTCCTACT
GAAGCCGAAG
TGCTAGTTTT
CTTTTCACAA GAGCCTCTGC TTTA'rTGGTA ACCAATATTA AGAAACGCCA TrATGCCTAT TGGGTTTGTC TATCTTTTGT ATCCAGTACC TTATCAAGAA CGTAATCGTC CGGAAGATGC TATCTCATGC ATGGGATGAT TTGTCCCATC GGCGATTTrG TCCCCGTTGC AGTTACTTAA TGACTTGAAA GAAAAACAAC TAGTCTGGTC TGAAGTGGAA ACACTCTTAA ACTCAGAGTT TAAXAGTCTAT AAAATTAAAT AATTTTAGGA TTTTITAACAC AAGATATTGA TAGAAAGAAC ATTTTAGAAA AGAGCATGCA TTTGCCAACT CTCTTCGTCG AGAAACGAGA GATTGACAAG GCTCTCCACA AGGCGGCTGA AAAATGCCTC AATGATCTGA CTGAGCGAAT GGGAAT'rAAG ATTTACGAAA TTCAAAATAT ATATGCGCTG GCTGAGGAGT ATATTACTTA AG3CGACGGAT ATCAACTTTA GTATTCATAA TGAAAACGCT AATAAAGACA GTGATGTCTT TGTTGGGAAA TCAATCGGAC TGCAAATGCT GGGGGATATC CACTATCACG ATTTGGACTA TTTGATTGAT TTTAAGGGTA TGTrGGAAAA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 AACCCTTTAA ACTATACGTC ATTTCGGTTC 717 TGGTTTTAAG ATTGGAAATC CAGAGG'rAGA GAGTCCCAAG TCTATCCAGA CTGCGACAGC ACAGATTTCT CAAATCATTG CCAACGTTGC 'IrCAGCCAG TACGGTGGCT GTrCAGcTGA CCGTATCGAT GAAATrrTGG CGCCTTATGC AGAGAAGAAT TATCAAAAAC ATCTCAAAGA 'rGCAGAAGAG TGGGTATTGC CTGAAAAACA GGAAGATTAC CCTTG.GAAGA AAGCGCAAAA GGACATCrAC GATGCCATGC AATCTC~rGA CrATGAAATC AATAC=CCT TCACrrCAAA TGGACAAACA CCI-IACTT CGTTAGGTTT TGGTCTGGGA ACCAGTCGTT ?TGAACCAGA 1440 1500 1560 1620 1680 1740 AATTCAAAAA GCTATTTTAA ACA'IrCGCAT CAAGGGTCI'? GGTTjCAGAAC TAT'rrTCC AAACTTA'rCT T'rACGCTTAA AAGAGGCCTC AACTTAGAGG ACCGTACGGC 1800 AAGGAACTCC 1860 CAACTATCAC ATCAAGCAGT CTGTCTTAT GATAAGATTG TGGCTCTAGA GTGTGCAACC AAGCGGATGT ATCCAGACGT TTGATTTGAC AGGTTCTTTC AAGGTGCCTA TGGGCTGCCG V.00 oo.
TTCTTTCCTT
TCTGGGTGTI'
TAAGT'rCTGG
TGTCGAACGC
T'rTTGGCCAT
GACCGTTTCG
CTGGGAAAGT
CCGTGTAGAA
CCAAAGTCTG
TATCACAGAC
ACCGTTTGAA
CATCCATTAT
CAAGGGTGGA AGGATGAAAA TGGTGTAGAA GTCAATTCAG GTCGCATGAA GTGACGGTTA ATCTGCCTCG TATTGCTCTT GAGTCTGAAG GTCA'rA'GAA GAAATCTTCA ACGAGCGAAT GAATATCGCA GAAGATCCTC TTGTrTACCG ACTAAAGACG CGACACCAGC GAATGCTCCT ATTCTTATC AGTACGGTGC CGTCTAGGTA AAGAAGAAAG TGT'rGACCAG CTCTTTAAGA ATCGTCGTGC CTGGGCTATA TCGGCTTGTA TGAAGTAGCG ACAGTTTTCT TTGGTAACAG AATCCAGATG CTAAGGAATT CACGCTAGAC ATCATTCACG ATATGAAACG GAGTGGTCAG ACCAATATGG CTACCATTTC TCTATCTACT CAACACCATC ACAGACCGT'r TCTGCCGACT AGA'rA'AGAC AAGTTTGGCT CTATTCCTGA AAGGAATACT ACACCAACTC TTT-CCACTAC GATGTTCGTA AAAATCCAAC AAATTGGACI' TTGAGAAAGT CTATCCGGAA GCAGGTGCCT CAGGTGGT'N' TGTGAGTATC CAGTCCTTCA GCAAAATCCA AAGGCCTTGG AAGCTGTCTG 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 COATTATGCT TATGACCGTG 'rAGGCTATCT AGGCACCAAT CAAGTGTGAC TTGAAGGGG ATTTTGAACC AAC'rGAGAGA TGGCAATAGC GACCCTAAAA CAGTAGATGT GGTGAAACGA TCCTCAAGCA AGACCGATGG TCAACGGGCG TCACAAGGAA TATGAATGGT TCAACGATTA AAATAGCTC GCATCAAGTA ATGGGAAAAT ATCAACTAGA CGATAAGGGG CGCGCACAAG CACTCTAAAG GTCGAGCTGG TAAGAAAGAA CGCTTGCTTA ACTCCGATTG ACCGTTGCTA GGGTTTGCTT GTCCAAACTG ACTTGTGGCT ACCTAGGTAA ATCGCTGCGC GTGTCAAACA ACAAATTAGA AAGAAATGAA TGACCCGTTA TCACGAGAAA GCTTCAGAGA ACAA77=TA AACAAGAACA AGAAAAAATA AAAGTGAGAG AAGGATGGAA TTACGCAGAC CAAGATTAGC AGAGTTTGAA AAATTTCAGT CGCCTCACGA GTATGAAGAC TGGTTAGAAA GCAATCAGGA ATGGG TCT GCAATTCAGT TAGTGGCTTT TA.ATCTCCGG TTGCGCCTCA GTAACTTTCT CATTCGTrCCA TCTGAAAGAG GCAAGGGTTA AGTTGCTAAG GAAAAGAACA TCAAGAAAGC TAGCAGAGCA GTCATTCTAG CAAATGGTGG 718
CCACCTCTCG
GCATAAGAAA
CGGCGGIT1rC
ACAGGAAATG
TCTGAGAAA
ACTAGAAGAA
TGCAAAAGAG
'rCTGGTGACC AATAT?1'CAG
ATCCAAAACC
ACGCCI'TTAA
CTTTTCTCAT AGTGGGAGGT GCTCTI'TAG ATATGATGAC TGGATACAG AGAACT?rGT GGGATrAATC 1'GCCTGAAGG GGTCAAGCAG TTGGATTTCT GGTGGCCACA TTCGCTACTC ACTCTCCGTC AGGGCTTGCA TGTAGTGTGA ATAATCCTGC GATGCTCGCA ATGGAG'rCGA ACAAGAATGG AAAAGCGAGG CTTTGTG4GAC GGCGAAGGCG
GCGTTATTGG
AACTTAGTCA
ATAGAGGTAG CGAATGAATA AGGTCGTATC ATTGACTACA TGCGCAACTC TCTCTATGTA TCAGGCTGTA 'rGTTTCACTG CGAGGGA'rGT TATAATGTTG cCACTTGGTC TTTTA.ATGCT GGCATTCCCT ATACAGCAGA ATTAGAAGAG CAGATTATGG CAGACCTTGC CCAACCCTAT ATACTGGGAT TCTCTTGCCA TCTGGTCCTG GACCGGCTAC AATTCTTGTC ACTGATTGAC TTATGCTCCA GTTTCGAGGT AAAGTGGGCA AGTAGTGA'TT TGAAGAGAGA ATGAAGAAAA GAAAGTCAGG ACACTGGGAA GGAATTGTAC CAAGCTTTAG CACCTCAGGA CGCCA'rCTGG TCTGCCAGTG GCGCATGAAC ATGGATGTCT TGACAATTGA rCAAGCCT TGACTTTGCT GGGAGGGGAG CCTTTTCTCA CTTGTTAAGC GGA'rTCGGAA GGAATTGCCA GACAAGGACA ACTTGGGAAG AAATCATGTT GGAAACTCCA GATAAACTGG ATTCTrGTCG ATGGAAGATA TGATCGAACT AAGAGAALATC TCATCTAACC AACGAATTAT CGATGTGCAA AAATCGCTCA 'rGGGACAAGC TCAATGACGG AAAAGAAAGC TATGAACAGG AGGACTTAGT AGACCAACTA GTCTCAGAGA TCGAGACGGG TATACGGTCA TGGAGCTTCA GGTAAATCAA CCTTTGCACA AT'rCTACTAC AGTAAAT'rTG CTAGAGACAG ATCCTTATAT TAGTACCCAA GGACGCGCCG AATCAAAAGG TGACAGCCAG TGGAGAGTTr GCAGAGAGAT ATCCTTGCTr GCAGGCGGGT AGAACCTTGG AAGGCTAGTG AGGTCTTGTC rGGAGCCAAA 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4762 CCAA T~TGA TITGTCGAAGG GATGTCTGTT GGCTTTCTAC CCAAGGAACT ACCATCTGTT TCTACACGGA TGAGGAGACC GAATTAAAGC GACGCCTTGC ACTGTGAGAA ATCGCGATGC GG INFORMATION FOR SEQ ID NO: 92: Wi SEQUENCE CHARACTERISTICS:
CTTTGAAAAA
TACAGATACG
LENGTH: 3832 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92: CGGT AA2IACAC TAT CCAG A AA ~A G ?C MT AG6 GTACc I..G.C r~ T -cr A TGA 120TAT G C GG GAC A C G A C QG C C r G C A. G C C C T T A G A A T C1 8 GCGGTCATC ATTGTTGGCATCA7 6 GG ATTTAT AA T CAC TAr TGAGA G A A GGACC Ac T rACC AGG DAAG 0 ATA ATA C AG~cCT AGAATGA TGC CCAGC ATA rI'G TGAA~ G ATT A CTC TT CA MG c c 420 CCTC C2.A? C CTTAATC AAC rTAT CC AAT CTG 480GTT A C G C T A CT T T C Tc Tr Tc C A A T C G A G A Gc C T 1 A A c cA A T G T A T TC A A C C C G ACTCGGCAC GAA TCcG AT'3CM CCA AC 0 A CGGA
CT
0 GC A GT T C T CACGC A.r A AAC TCcA"CT ACGA C? G 420 CCATAQ Arr rGCCT G GA AAT WGAA G
A
3 C 66o CCA GC TAGG' ArGAJAAAAC T TGA ATTAT CAAG A G G GACATT T ?C G A G ??GAG W TGGCTGA 72CT C G CA G C T ,50 CACGAGGAATCM GA A T G A800 WC A G G C G c A G A T G A A G G~ T A C A A c A r AC A A A TGG A A GAC T A AT GCA ?G GA 2?GGATTT C AG C G TJ T A ATGC G GrGA A A 6o A.AAGAG TA? G~r AT TAG AAAC GGrCGCA G TAA CGGAA M AA G G AA GAGAA G A G CAG G T C AC AA AG TG 1020 CAGATTCA AATGCCTG CT GAATC TCATTGCA CAGATT ~TTG CAA CAG ?AT GC AG cAC1VJC CATGC GCT 140 GAAAGA ATC AC T TGAT T TA T6T ,AA)JI ATAAAAAGTTG ATGTC AcTAC G 7GAT AATCCAGAACTGG 7CrTACCTG TACrT AA 9( CTTGA CTA AACCc ATrGTLL CGTCA GAATA A 1200 G GTAC GC TCG TA C 2 A A CTTC ?GATCAG 1 CTGGAcA A TcTr O AA T CT C TA AT GCJ AG T CT 110 GC AA MyTA GC PCC TC ATC ?CTTG GTGT AC TGAACTTAC 132(0 G C C G A Cc Tc G A C TA 2 T T C G C T A T A G crc G G C T G G A T T G 14 4 0 ACT~ M CT AGDA C C~A~ C TGGG TACGTACT 150o
720 GGTAAATCAG CTrCACAAGA CGATGCACAA AAAATCTGTA AAGTTGTTCG TCACGTTGTA GCTGCTGACT TTG.GTCAAGA AGTCGCAGAC AAAG~rCGTG TTCAATACGC TGGTTCTGTT AAACCTGAAA ATGTTGCTTC ATACATGGC'r TGCCCAGACG TTGACGGTGC CCTTGTAGGT GGTGCGTCAC TrGAAGCTGA AAGCTTCTTG GCTTTGCTTG ACr7TGTAAA ATAATCAGTA AGTAGCAAAA GCTAGGTGGA ACAGCATTCA GA==TCG= AC-ATTTTTA TAGGAGAGAA AGATTGAAAA CAAAAATTGG ATTAGCAAGT ATCTG=TAC TAGGCrTGGC AACTAGTCAT GTCGCTGCAA" ATGAAACTGA AGTAGCAAAA AC~rCGCAGG ATACAACGAC ACC rTCAAT AGTTCAGAGC AAAATCAGTC TTCTAATAAA ACGCAAACGA GCGCAGAAGT ACAGACTAAT GCTCCTGCCC ACTGGGATGG GGATTATTAT GTAAAGGATG ATrGTTCTAA AGCTCAAAGT GAATGGATTT TTGACAACTA CTATAAGGCT 'rGG=rTATA TTAATTCAGA TGGTCGTTAC 0 0 0. te 0 4 0 TCGCAGAATG AATGGCATGG AAATTACTAC GAGTGGATCT ATGACACTAA TTACAAGAGT GCTCAWCAAG AATGGCAA7T GATTGGAAA'r ATGGCTAAAA GCCAA'rGGCA AGGAAGTTAT AATGAATGGC TCTATGATCC AGCCTATTCT TATGCTAACC AAGAGTGGCA AAAAGTGGGC TATATGGCTC GGAATGAGTG GCAAGGCAAC ACTGACGAAG 'rGATTATGGA TGGTACTCGC CTGAAATCAG GTGGATATAT GGCCCAAAAC TGGTTTTATC TCAAGTCAGA TGGGGCTTAT AAGTGCTACT ACTTCAAGAA GTGGGGTTAC TTCTTGAATG GTCAAGGAGC TATGATGCAA GCTTATTTT-T ATC'rAAAATC CGA'rGGAACT GGCAAATGGT ACTATTTCAA CAAGTCGGGC TACTA'N'TGA CTCGAAGTGG TGCCATGGCG TATATCTTTG CGGCCTCTGG TGAGCTCAAA CACAGAGATG GTAAGCGCTA TTTCTI'TAAT GCTAAGAAAC TCATTGATAT TAGTGAGCAC ATTGATGAGA ACGAAGTGGA TGGTGTCA'TT 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 .000 *0 0.
GAAAAAAAAG
AATAGAGAAC
AATGGTCGTA
GTTCGTCTAG
AACCGTCTGG
GCTGAGAGTG
TACCCTATCT
AGTGA'rACAG
GCTITATCAAA.
AT= GAATGT CGGCTGGGTT AACAAGTGGG AACCGAACAT TCAATGATTG GAAAAAGGT'r GTTATAGCC TAAAGAAGAC AAGGAATTGG CGCATAACAT TAAGGAGTTA GAATTCCTTA TCGTGTCTAT CTCTATACCT ATGCTGAAAA TGAGACCGAT ACGCTAAACA GACCA'ITGAA C=ATAAAGA AATACAATAT GAACCTGTCT ATTATGATCT TGAGAATTGG GAATATGTAA ATAAGAGCAA GAGAGCTCCA GCAC==GC TAAAATCATC AACAAGTACA TGGACACGAT GAAGCAGGCG ATG'T'ATC'r CTATAGCTAT CGTAGTTTAT TACAGACGCG TTTAAAACAC CCAGATATTT TAAAACATGT AAACTGG.GTA AACCCTCATT ATTCAGGAAA AAAAGGTTGG ATCCAAGGGC GCGTAGATGT CAGCGT7TWG GCGGCCTATA CCAATGCTTT CAATA'rACCT CTTCTGAATA TATTAAGCGA TGATTTGAAA
AGAATGGGAA
CATGAAAGCA
GAGGGATGTG
ATAGTAGCAC CCTCTm'TC TTrGTTTAT GATAGTTCAT CCTCGAGTAA ATnrCAAGTTC TTGCTCGGAA A74SAAG~CTTA TATACTAAAT TGAATATAGA CAAATACCTI' GTGATTGGTA AAACATTrTA GAAATTCATT TACCrl-IVCT AATCGACTTG GTTTCATCTT ATI"CAATCT ATTATACTAT TGGGGAATTT CTTCAAACCA CATCAGCTTG GTCAGTTCTA CCTGCGACCT CAAAACTrGT GCTTrGGTCA AGCTGGGTTT AGTIrCCTAG TTTGCTGATG GATTTCCAT-r GACTATAAGC ATCCAACCCT CTl-rGTCr TCTAAAGAAT TCTrAAATTA TCACTCTATT GCAACTrTTC TCATATAAGT TCTTTGTC?1' GCTA'r-GGTr TTCCTTAGTA GTATACTAAG GTAGTAATCA TTAAGAAGTG GTrACAAAAA ATAATGAATG AGGTAAAGAA AATGGTAGAA TTGAAAAAAG AAGCAGTAAA AGACGTAACA TCATTGACAA AAGCAGCGCC CO INFOR.MATION FOR SEQ ID NO: 93: SEQUENCE CHARACTERISTICS: LENGTH: 10690 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUJENCE DESCRIPTION: SEQ ID NO: 93: TGAAAAAATC CTCATGAACC TGGCGCCAAT AGACAAGTG;T CTTGT'N'CCC TCACCTTCCT TATAGGCATG GTCAGCTGAC ACTCGAPTGA AGGGTTTAAC AGAAACCTTT GTAATTTCGA CAATGCAGAC AGCCTGATTT TGACTATCTA AAATGACATC GAAGGTCCCT ACTTGGGGAA GTGGTrCGTC TT1CTAGCACA TAGAGCTCAT AGGCTGATGC TGTTGCTGTC TTTTCTCCTr TAAACACCAA ATCCGCTAAA AGGTCTGGTT CAACTCCAAA AGCCCAGGCA TCGATTTCAT CTCCGATCAA AGGATTGATT TGCTTGTATT TATTCCACAT TTCTTGCGGI' ATCATGGGTG CTCCT'TTGTA AT7TTTTACT TTCTTCTTrT ATGTGTTTAA GATGATCTGG ATGGTCAATC TCTAAATCAA AAATCTCTGG AATAGAACTG TAGTGGATAA TGCACTrGAT ACCCAACTGA TTCATTTT.TT GTATGAAAGA AGTATTCAGA TAGCCTGCTA CAGCAAAATC AATCT'rGTTC TrTCTTGCTr TATCCTGCAT ATCTCTTAGC ATATCTAACA TTATTGGACT TTCCATATCA TGCCATTGAC TGTTTCTCAT AGTCGCAAAA ACAAAGGAAG TCAAATCATT CATTCCAACT ACAATCTTTC AAATGCCCGT TTCCAGTATA CTAGATAAGT CAAAATACGC TGACGGTAAT TCAATCATCG TTCCGACTTT CCCAGTAAAA CCCTGCTGAC GCAATACTGT AATAGCTTGT TTTAATTGGT CGCCATCATT GACAAAAGGA AAGATAACAG ATAGATTGGG GTTGG'rTTGA 3360 3420 3480 3540 3600 3660 3720 3780 3832 120 180 240 300 360 420 480 540 600 660 720 780 840 722 TAAACrCTG TAACG.ACATG 'rGCTTCAGCC TGAAATTCAT CCAAACACCC CAGTAAACG.C CTAGTTCCTC TATAGCCAAA CALAGGGATGC CCI'CGTCAA AAAACTC?1'T AGTCCCCACT AAACAATTGG CTCTGTATT CGTTAATTCA GTAAAACGAT ACCAAACTTC 
CrrACCTAAG TAAAAGGAGC AAATAGTATC AAGATAATCT TTCACAAATI' CCTIGACAACT TTGTAATAGT ATATTTrGAT TGAGCTCTCT CAATAAGTAT TCCCCACGAA -TeATGCCGAC GTGGTGA).AT AGTTGAGGAT AAATTTTrC AAGAATrMT TCG.CCACTAA GCGCAAGTTG ATTTCTCATC ATTCACCTTC CAArCATc' AAGAAGTC1-r GTCCAGTTCT GGAAATCCTA ATAATTCAGA CTTAACCTTC AAGACTAATG GCGATGCATT' 7"rCTCTGTA ATCTCN'GAA TATCCATCCA AATATATCCA AGTGAATCAT TCGCACCATC AGACACAGCT TCCCAAATCG 'rAACTTGAGG TGCACTCTCA TTCATTTCAA CATCATACAA GGCTATGACA TGGTGAACCA TAAAATr- TAACTC7'rCC
TCCCGTCTCT
ACTGCCTCCA
TAACTTTCCA
ACTCCTCCTA
TCGCTTTTCC
GCTTGATTGG
CCGGTCATAGC
CTGACGAAAA CATCGTAGAT TCCATAACTr CTCTAGTCAG GGTAGATCAT ACCGATGTTG TTTTCAAAGC AAACACAGTA CTTCAAAGAC CAGCCACCAT ACTTCCTAAA AAAAGACTAA GGTTCACrA GCCACCCAGT TGTCTTGACT GCTCCTGGAG
ATAACGGCCT
GACCCCAAAG
CTATTGTCAA
CTCGTrrr CAATGCAAAG TGA'TTTGA 'TTTCCATCCA GATTTGTCCT TGCATGGCGC TCCACGATTA GAGTAGCTTC TAACAGTAAA CGTTTCCGTC AGTCCTTCAC CAAGTTGCTG 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1.980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 CATAGTCTAG AGCCAACTGC CGTGACCACC TCCACCTGCT TTTTATTTTC CAGCATTTGT TGATTTCAAA AATCTCTTGA
TTGGTGAAGC
AGGCTAGAAG
GTCAAATAAT
ATGTCCTGCG
GCTCTrGCTAT TTCCTCTGC CAGCCAAACC ACCTGGTTCA CGATACCAAA GACCTGAATC CAGCCAAGGC ATGCTI'GGAT CAATGCAACA CATATTGATG ACCGAGTCAA CTCTACTGGA CCGITTTGTTC CAACAGTGGT
TCAATCCACC
AAA'rCCCCAG
CCAGCTTCAG
GAAGTATAGG
ATGATTCCCT
ATA.ATGTACT
TTGTAATCAT
AAAATAGGTT CCA.AAACTCC AGCAGTATTA CACAAAACAT CCACCTGAGG GCACCAGI'CA CCAAGTCCAA GGTCAAATCT CTCTGTAAAA AGCGAAAATC ACCCTCTAAG AGTGGCTTTT CACCTTGGTC AACTCCATAA ACrrATAGC CCTTCTCTAA AAAGAGGCGA CTTAGCCA ATCCGATCCC TGAACTCACT CCTGTAATGA CTACACGTrT AGTCATGCAC TTCTACCCAA TCCG'MCCA AAACATCACA AACTGTCGGG CTCCACATGG AA-PAACCTTC TCCTTCGCCA GAAACGTTGA TTAGGAAATA AGGTGTCATT TCAAGTGCAA GCCCATTrTG C1'CGATCTA TCAAAGAGTT GGACATAGTT TTCCGCACCT CCCCAACCAG TTCGTACATA 'N=rCTCTTA CCCTTTAACC CAGGCAGCAT CTCTTCAAAT CTCA7437=' TCTCCTT'rAA TTCTACATTC 1. A I 11 w 723 TTCATTTAAT TATAGCAAAA AACCGCTrrA CTGvCTACTAC TTACGGCAAA TTATT-CCCTG ?TGrAAG.CT AAAGTTTGCC GCTCGGCTAA CTACGACTGC CTGCATTI-rr GCTGGATAAC AGGTTACTCC ATA'I-ITAATA GCTAAACGAT CATTTCCAAC AAAATTCCCT TTAAAATACC CCACATCGI'G TAATTGGCAA TATTCAAAAA CTTCCA'rAT'r AACATGAAAA GCTGATTCAA GCTCATTAAC AGCTAACGGC TGCTTCACAT TrrGATTAGA AACTCCAAAA TCTCGAACTT CTGCTACTTG GTCAGATTCC ATCAAAGCAT TACGGvCTT TGAATIGTGAG TTATTrCAAAC CAGCAAGATA AAMrCATAC CM-rC?1-rrc CTTCTCTCAA GTGCT1'AGGA TTTGTTGTAC GCAATATCCA AGAAATGGCA ATAGTTGAAG CAAGTACTTG ATN'AA6AGCT TGAAATTCT CGAATTGTAA GACAGACCAT GCTTGAATrGA TGCTGCCATC TCGCATAGCT GC -rGACTAT ATCCTGCAGT AAAAGCCGCA CTCAA'ZrGTA CTTTTTT'AAG CAACTCCATC ATCATAGGAT TACCrrT ATAAAGGAGA TTAAAGGCI-r CTGGTCGATG AAGGAGCAAG C'TATCTAGAT C'TACTCA'rr TATAATA'rAG TCCTTACAA TGCCACATTT GGACTGAATC CACATC?1= GATCAATCTT CAATCTTTGC AATCAAAATA GGTAAATTCT CTCTTA.AATC TGGACGATTT CATAAATATC AGCCAAGTCG CTTCAACTTC 'rTrTACAGAT
AAAATACCGT
TCAATCCCAA
TTTAGCACAA
AAGGCATTGA
TTATCr'TA
GACCTAACAG
TTCCAACAGA
TTCTCATCAT
ATCTCATCAA
CGC?1-rGCAA ATTCTTTGTC ATCT'rGACCA AGAGTTATGT AACATTCTTC CCTTCATT-A'r AACAAAAAAC TTCTTCACAA CGACCACGAC AAGTGCTGTT TCTACA6AGCT TCCGAGA6ACA AT7TICTCATA ATTI=CTCC TrrAATTTCT CG-ACTTTTG ACTATACT'rC ATTCCAAXAA AGAGGACAC ATCCACTCAT AATAACGGAG AATGG;ACGAA AGACCGCTTC ACAGACAAAC TATCCCACGGC 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380
ACTCCATT
TA.AAGGAATA
CAGAGAACCC
TTTCCAAT'rC ATCTTCTTAA ACCCACGGAA CAAGACAAAG ACTrrGTAA GGAAAACATT TGAAATTCCC ACCACAAGAT GGGCAATAAT CATACTGACA CAAATACCGA TAACTAGCGA AATCCTAAAG AGCCGGAATA TAAAAGGCTC CTTCTTGTAT GAAGCTTGCC ATTCCTACAT ATCCTAAAAC AACTAGAAGA ACTATAGTCC CAACAACAAT CTAAGTGCCA ATT-rTCATTT 'rAGGAGAATC TTGGACTAAA CTrCTCGTA AAATTGTGGC CACAAGTCCA AATCCAATCA GAAAAATAAG AAGTTGCCCT AAAAATGTGA GCAAATTCAC TCGrAAGAGA GGACCTTTAG AAAAATCACT TAGTAGTTGA TAATAACGTA A'rACCGCCAG GACAAGAATT GGCGTCAAAA GGGACTCTTT GATAGAACTG CGAGGTGCTC CCTTGAGAAT CTCTTTCATT A=TT'=AG GAT'rCTTACC TAGA'rAATCC 'rCTGCACTCA TGCCATCTCG TrCTGCTTCT GAGAAATCTA GCATCATCAA 724 ATAGATCTGC TCTCTGAGAT AGTCTTCATC ATALGAGAAAT CCAGCAAGAT TAAAACTTTC CCACAACTCC TCAAAATACT TTTGATrrCTC CTCAGAAAAC TCALTGTAGCA AAGCGC?1'GT V1'CrTcarAA TACTTCATTT TC1-TCATGGT TTAACCCCCA ?IICAATCC CTTCTACTTTr 'rTGACTCAAA TCGTCCCM-r GTTGCCAAAA GACTGAGACA CGCTCTTCTC CTTCTTTCAT TAATGAAAAA TACT'rCCGA'r CTGGACCATC TGGCGACGGC CGCAT4GTCGC CTCTTATCCA TTGA=r=?1 TCTAACTTTT GCAACAAAGG A'rAAATAGTT CCTGGAACGA TAGTATCAAA TCCAGCCTCT CGCAAAGTCT GAACCAACTC ATCCAAGACA CAACC?1'CAA GAACACCTr CTTCTAATCT ATIrGTAAT ACCTACTAGT TAGTTTGTAA AGCATAATAG 1-rAATACrT GCCCTACCGT ATGTATGI ACTGACTCG ?I-rGAGCTGA CTTrCGTCAGT TTCATCTACA
ATAACCATAC
TAATAGCTCA
GACTTCACCT
TCGAAAATCT
TCAGT'rTCA'r
ACCTCAAAAC
TCAGTTCAT CTACAACCTC ACCTCAAAAA CATGTTTTGA TTTGAGCAAC CTCCGCTAG AAAAACAGAA CTAGCCTGAA TACAGCTGGA TCAACTGTGA ACCCTGGCTG ACATATTTT TTTCTGCCG ACCAATTCTT ACGATC=TCT CCATCACCAG TTTAGACACT TC=GACTT AAAACAGrGT TTTGAGCTGA GCTGACTTCG TCAGTTTCGT CTTCCTAGTT TGCTCTTTGA CTAGTCCTGT CTAC='IAC GAAGGGTTAA TTTGCCATCA TCATCATT7?T ACGTGCTTrVG CCCTC?1'TTT GACCAATCAT GTTCTTTCA TCACTTCTCC ATACTATATC ACTTCTACAC CTTCAAACCA CGTCAGCGTC CTACAACCTC AAAAACATGT AGTGTrGA GCTGACTTCG CTTCGTCAGTrrCATCTACA C'rACAACCTC AAAACAGTGT T'rTTCATTGA GTATAAATAA CCAATCACAC T'rCCA'TTTCC 'rGT'rCAGCTG AGAGAATCA'r AGGTTAGCAA CGATTTGAAC 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180
GTTCATTTGG
CATCCAACCC
CTGCGACACG
ATAGTAT'rrr GCAA'rTCCTG AkAGAATCTG GAA7TGAAGC AACTTATCTG AACCTTCTAC GATTTCA.ACC TTGTCAAAGT CTTCA-AACTT
GATTTCATCC
TGGTTTATTG
TGGAAAGATA
CAAGT~?TCA
TrrcCATCATA
CATGACACTT
TTGTTTAGTT TGAGCTCAAC ?TCGTCCGGA T'rCCA7TCTT T'rTCGACTC CCTTCCATTT GTTCCTTGAT A'rAGGCGATT TCTTCTTCCA TATTTAGACG GGTG'rTCCTT 'PGGCAACTAC AGTCACATCT GCTGGGAAGT CAGCCAAACT AGACTAGAAA CTTCTTCCAA ACCAAGTTGA G'rCAAAACTG CACGACTAGT AATGGTTCAA TCAAGTGAGC AACTACACGA ATGCTGGCTG CCAAGTGGCT GCCAATTGGT CACGAAGAGC TTCATCC?'rG GCCAAGACCC ATGGTGCGGT CTCATCGATG TATTTA'rTGG ATACTCAACT GCTTCCATGT 'rACGAGAGA'r CAGAGTCCAG ACTGCTTCAA CCACGTGG GTGTATGGAA GTCTGCGATT GATTGTWCTC CAACCTCAGC AAGAACATGA TCATATTCAG TCACACCTTC TACATAGGCA GGGATTTGTC CATCAAAGTA 725 CTATTAATC ATGGAAACCG TACOGTTIAAr. GA=GTTCCCA AGGTCATTAG CCAA'rTCATA 6240 GTTGATACGG CCGACATACT CTTCAGGAGi' AAAGGTTCCO TCTGAACCAA CTCGAAG 6300 ACGCATGAGG TAGTAACCAA GTGGATCTAG TCCA'rAACCC TCTAccAAcA ?ITcAGG.GTA 6360 AACGACATTC CC?1rrrACr TAGACATTrT TCCGrTTTrC ATGACAAACC AACCATGCc 6420 AATCAAACGA 'rCAGGTAATr TAACATCCAA CATCATAAGA AGGAVICCGCC AGrACATAGA 6480 CTGCAAGCGA AGGATATCTT 'rrCCTACC-AT ATGCAACACT G'rTCCA-rCC AGAACTTGTC 6540 AAAGTTACCA TGTTCGTCTT CAGCGTAGCC AAGAGCTGTC GCATAGTrAA GAAGGGCATC 6600 AATCCAAACG TAGACAACGT GTT7TCGAI-r TCATGGGACA GGCACTCCCC ATGTAAAGGT 6660 TGTACGAGAT ACCCCCAAAT CTTCCAAGCC TCGCTCGATG AAG'rTGCGTA CCAT TCATT 6720 AAGGCGACCA TCTGGCGTGA TAAATrrCAGG ATGAGCTTrG AA.AAATTCGAk CCAAACGGTC 6780 TTGGTATTT-G CTAAGGCGAA GCAAGTATGA TTCTTCAGAA AcCCA=CAA CCTCATGACC 6840 *TGATGGAGCA ATACCACCAG TCACATTTCC AGCTTCATCA CGGAAAACTT CTGCCAGCTG 6900 CTTTCTG'rA AAGAATTCTr CGTCTGATAC TGAATACCAA CCAGACTATT CACCCAAGTA 6960 **.GATATCATCT TGAGCAAGTA AGCG-rTCAkA GACTTGTGCG ACAACTTTTT CATGGTAGTC 7020 ****ATCAGTTGTA CGCATAAA' TATCCTATGA GATATCTAGT AATTGCCAGA GrCI-rAAC 7080 *TCCAACCGCC ATTCCATCA.A CATAGGCTTG AGGTGTALATA CCAGCTrC~r CCGCTTTCTG 7140 CTGGATTT'rC TGACCATGTT CATCAAGACC TCTCAGATA.A AATACATCGT AGCCCATCAG 7200 CCGTTTGTAA CGTGCTAGGA CATCACATGC GATAGTTGTG TAGGCAGAAC CGATATGAAG 7260 *TTTCCCAGAT GGATAGTAAA TCGGCGTTGT AATATAAAAA TTTTT?=CAG ACATAATTTT 7320 **.TCCTTCCAG GCAAATGAAA CCTGTTTC TAACACTTCA TTATATCACA T1r=AATGA 7380 ATTTCAATAG 
GGAAATCCAT ACAAAAACAA GATAGACGAG TGTCCATCTT CTTCATCTCA 7440 *TTCATAACGA ACCGCTTCAA TTGGATCAAG TTTCCATGCC TTGTTGGCTG GCAAGACTCC 7500 0AAAAATCATA CCAACACTAG CCCAAACTGC AAGACTAAAT AGGGCGACTG GGATTGATAC 7560 TCCAACTTCT ATACCTTCTA TTAAACCTTG CAGTAACAAA CCTCIAAGg CAGTTAAACC 7620 ACTTGCAATT GTCAACCCA.A TTAACCCACC TAACAACCTC AAAATCATGG ATTCAATCAA 7680 ***AAACTGAATT AAAATATTGG CA*CCTGTTGC ACCC AAAGCC TTACGAAGAC CAATCTCACG 7740 AGTGCGCTCT GTCACCGAAA CCAGCATCAT GTTCATGACA CCAGTTCCTC CAACAAAGAG 7800 AGAAATCCCT GCGATGGAAC TAATAATCGT CGTCATAAAA CTAAACCATT GTTGAATT-TC 7860 TGCAAATACA ACCCACTCAT CTGCCACCTG GTATTCTCCC TGTTG;TAAGC CTGCAAGCTC 7920 726 TGTCMTTTT CGTGvCCAGr CTGCCA AG?1'GGGGrr AAACTGGTAT CATTCACTCG AAALGACAATA TTAGCTATTT CATCrACATT AAAATTCGCA GCAAGGGAGA TATTIGGTAGT AATAGC-CAAG CCACCAAACC CCCAATGACC CGGTAACTAA AGATTCAAAT AAACTAATGG GA&AATCTTGC TCTCTCAGAC AGTTCTG*=r CCACCTGTCA GGCATTCGTT GAATTGGTTA GACCCAGGAT TCTTGCGGTT CGTAAAAGCT GA7TTGTTTCT CATATATT?'D TGATC71TTrA GCCTCCGGAC TACTATAAAC ATCCATTGAC TTCTACAACC 7rTTAATAG CCTCTTGAGG ACAATTCCTC ATCTAGCAAA ATGACACTTG CAAACTCTrr TACGACCTGC AATAATrrCA TrrCTAACAG CGTCCATG'rA AATT-AGCAI-r CTCAACCTTT rATCTGAT AGGTCAAGA'r CATAGTAACT ATCCACTCCC TTCAGI=rAG CTCCCTCTrG TTGCCGGTTC ACAGGAACT TCCTCTTCCT TTCCAGAAAC GAGTAAAAGA CCCGTCTTTA C71TTTAG CAGAGAAAAA
GACGCTAATA TT'rTCTGAG A AGTCAT ATCTTTA'IrG ACTIGACGAG A'rAGGGAATC ACCCAAAGCC ATAA'rCACAA CA.ACTGATGA AACACCGATA ATAATCCCAA TCATAGTAAG CAAAGAACGC ATCTrGTGAG CCATGATAGA TGAAAAGGCA AATTTCAGAT TCTGCATCTT AGTTTCCTC CTTTCCTAAC TGAGCACTGT CAGACGAAAr GACCCCATCC CCAATGACAA TCTGACGTTT GGCATAGGCA GCAATCTCAG GCTCATGCGT TACCATGATA ATGGTTTTTC CTrC'rTTATT CAAATCAACC AATAATTGCA TAATTTGGTT ACCTIG GTATCCAAGG CTCCTGTCGG TTCATCCCCT AGGATAATAG AAGGATTGTT TACCAAGGCA CGCGCAATGG CTACACGTTG C'rTTTGACCA CCAGATAATT CTGAAGGTAA ATGGTGACTA CGTTCTGTCA ATrCAACCTT GTCTAAATAT TCCTCAGCCA ACT'rGCGACG TTTTGAAGAC GAAACTCCTG 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 CGTAAATCAA GGGCAATTCT ACA~r=GCA GAGCATTGAG GCTGAAAGAC AAAACCCATT TGTTGGTTAC GGACCTTAGC CAGCCACTTC TTGACC'rTCA AGATAATATT CTCCACTGGT TCGTATTCAT CAGAGTGGAC TTACCAGACC CAGATGGTCC CCTCATTCAC rTCTAGATTG ATATrTGA GAACCTGCAG AACTrCTGAA GATATTTTT AGACTAATTrA GTTCTTCAT TCTTCCAAGG AAGATGT=GG ATTACTGATG ACCTAGCAC AT'rTCTTGAT TTTCTGCGTC AGCATTTCCC AATGAAACCT CTTCGATAGA AGAAAGAACT TAGTTGTr TCACCAACCC TGGTG;TATCC AACATGCCAA CATGATGGCT ACAAATTCAC TTCTTCGTCA CCATTACGG-T CAGCCTTCAC CTCTTTTCCT CGTTCGTTAA ACCAGAACTG CAACT'rT'N'T AGCCTrTTGT TGTTCATCCA CAATCCAGAC ATAATTTTTA CTATCATCCA T'rACTAGACT GCTAACAGGA ACAAGAATAG CCTTAGTT'rT GCTrTTAACC TCAATGTTGA CAGAAAAACC TTGTTTCAAA TCACCAACCT CGCCTGTCAC ATCAATAGTA TAAGGCTATT TAGAACCTGT ATT-A'rTCCCG G1CTGCTGGAC TAGC1'GCrrC ACCATTGTTT TTAGGATAGT CAGAAATATA GCrTAATTI-C CCAGTCCA'PT TrrrATCAGG ATACACTTTA GAAGTAAAGC TTACTrCI'G ACCTACAGAA AGGrGGCTA GATMGTACTC AGACAATTCT CCCTTGACrT GTAAATTTrC ATrTGCrGACA ATATGAACCA TAACI'TGACT CGCCCCTG?1' GGAGArrTAG AAACATGCT ATTGACTTCG ACCACAGTTC CCTCTAGCG1 CTTAATTGCG CCGCAGCATC GAAGCAACAG AATTTCCAC ACTGGCGCTG GTAACTGTGG TCATTGATAT GACGATCTGC GC=TCTGAAC TACTGTACTT ACAAGGATTT CATCTAAATC GCTGTTACTG TCCCTGACAA ACTGAGAACA GT7 GFGCAT CCAA'TGACT TGCACGCGCA 'CACGGGCAT CACCCAATTG CACTGGAGTT 
GGGCTTTGCA. CCGTTGCATC TTGAGC!CTrG
AGCGTCAATA
TTCTCCTCCT
AGCCGGAGCT GAAGCGGCTr CCTAGCTACT GCTCGACTAG GACTAAAGCC TGCCCTTCGC ACCCTTACTA GCATCAAAAT TAAAACAGAG GAGGCCACGC CA TTTCGTGC T-rGATTGAGT CTGAATCATA GGCCGCCTGC 9180 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10690
TGACCTTATC
AAACATATTG
TTC=C=C
GCCCACAGAA
TTCATTTT
GGCAACAACA
TAAAATCCCC
CCACTTTTTA
AGATGAGTAG GcTCATCTTr TAGAGCAGTC TGAGAAGGTT GTCTAAAGAG CCAGCACCCA ATACAACTAC ACTCGCAGC.A CCGATTGCTG CATACAGTTG CCTTTACCAT TCTTTTTCTT CATAATGAAA CrCCTTrrTr TATACCAAAT rTCCCTCCAG CAAACAATAC AGTTCAGCAT TGCTTrTTCGG INFORMATION FOR SEQ I0 NO: 94: TTTTACAAT ACT'rTGCTAT TAAACAATCG TTCGGAATTT SEQUENCE CHARACTERISTICS: LENGTH: 8195 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi)'SEQUENCE DESCRIPTION: SEQ ID NO: 94: GAGAAAGCGC CCACGTTTCC CCGA.AGGGAG AAAGGCGGAC AGGTATCCGG TAAGCGGCCA GGOTCGGAAC AGGAGAGCGC AACGAGGGAG CTTCCCAGGG GGAAACGCCT GGTATCTTTA TAGTCCTGTC GGGTTTCGCC ACCTCTGACT TGAGCGTCGA T?1I=GTGAT GCTCGTCAGG GGGGCGGAGC CTATGGAAAA ACGCCAGCAA CGCGGCCTTT 'PTACGGTTCC TGGCCTTTG CTGGCCI'TT GCTCACATGT TCTTTCCTGC GTTATCCCCT GATTCTGTGG ATAACCGTAT TACCCCTTT GAGTGAGCTG ATACCGCTCG CCGCACCGCA ACGACCGAGC GCAGCGAGTC 728 AGTGAGCGAG GAAGCGGAAG AGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTrGGCC GATTCATTAA TGCAGCTGGC ACGACAGGTT TCCCGACTGG AAAGCGC4CA GTGAGCGCAA CGCAATTAAT GTGAGTTAGc TC-ACTCArA CGCAccccAG Gc?11'ACAcT TTATGcrrcc GGCTCGTATG TTGTGTGGAA TTGTGACCGG ATAACAATIT CACACA GGAA ACAGCTATGA CaTGA'rTACG AATTCGAGCT CGGTACCCGG AAAATCCAGA AAATGCTTGA AAAAAATCCr AGAAGATGGT ATAATACTAA ATTGTAAGGG CAAAAGGAGA GTCAAACTAT GGCTTCTAAA CACCCACGTC CAGCAACATT GTTGGTACAA CTTGAGTACA AAGGTAAATC AGTTAACCTT GTTGGCCAAG GTGCTGACGT AACTATCTCA GCTGCAATCT CAGAAACAA'r GGAAAAAGAA TTAAAGGAAT CGCAGCATCT GACGGTGTTG CGGATT'rGTC ATTTGAGACT ATTACAGTCG ATGCCGCTCT ACAGGCATCA CAAGACGAC CGCTCGCTGA AGAAGCAGCT CAAGTTTTTG AAATGATCAG CCAAATCAAG GAAACTATCC T'TATCACA'rA TAACTCAAAA AAAGAAAGAA GA'rrTCCACG TAGTGGCAGA A.ACACGTArr AC'rGCTAGCA AATrGCTTC AGATATCACT AAATCAATTA TGCGTGTTAT GAGTCTTGG GCTGAAGGTG CAGATCCAGA TGACGCTATC TGAAAGA.AGT TACAGATATG TGCAAGAACG CGCAGCGGAT GTAAAAAATT GCCAAACCCA TGACTCCTTC AGATACAGCT TTCGTGGACG TACAAGCCAC TAGGTACAAA TAACATCACT TCACTGGAGA AGTGA'rTATC GTGAAGCCTA TGCGAAACAA CTGCTGACGG TAAACACTTC
TTTATCACTA
WTCCGCGACG
GCrrCTATCA CAAT'rGGACA
TCAGCTATCA
GAAATCGTTA
AACCCAACAG
GGATTGGCAT AAGGGAAATG ACAGAAATGC CAG'rrGCAAA AGCATATCTA CTCGTCAGC AAGA'rACAAA CGC.AGAAGAA GCTCGCCTTG TTTCTGTTAT TCGCGAGAAA GCAGTAGGTA ATCCTCACTT AATGGTITCTT GCTCACCCAG GTGCGAAGAA AGTGAATGCA GAAGCAGGTC TCTTTGAAGG CATGGAAGAC AACCCA'rACA TGACAAAACG TGTATTGCCA AACCTTCTTC ATGAAGAAGT CAT-rGTGATT GCGCATGACT AAAACTTTGT AAAAGCTTT GTAACCAACA 'rCGCACGTAC ACTTGAAATT GCTGCTGTAT AAGACGGTGA CATCCTTGCr GTTAACGGGA ATGAACAACC GGCAGAATTT AAAGCAGCTG 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 AAAGCTGAAT GGGCACTTTT GAAAGATGCT CAAACAGTGA GAGTTGGCTG CTAATATCGG TACTCCAAAA GACGTTGAAG GTGTTAACAA CAACGGTGCA GAAGCTGTTG GACTTTACCG TACAGAGTTC TTGTACATGG ATTCrCAAGA CTTCCCAACT GAAGA'rGAGC ACTATGAAGC ATACAAGCCT GTTCTTGAAG GAATGAACGG TAAACCTGTT GTCGTTCGTA CAATGGATAT CGGTGGAGA'r AAGGAACTTC CTTACTTCGA TATGCCTCAC CAAATGAACC CATTCCTTGG ATTCCG'rCCT CT'rCGTATCT CTATCTCTGA GACTrGGAGAT GCTATGTTCC GCACACAAAT CCGTGC'TCTT CTCGTrGCC;T 729 CTGTTCACGG TCALATTGCGT ATCATGTTCC CAAS'GGTTCC GCCTTGAAA GAATT-CCcTc CAGCGAAAGC AGTCrTGAT GAAGAAAAAG CAAACCTTCT TGCTGA6AGCT G??GCAGTTra cGaAT~AAcAT ccAAGVTGGT ATcATGATcG. 
AGArrccTGC AGCGCCTATG Cl'rGCArAcc AATTTGcTAA AGAAGTTGAc T~cTTCTcAA TLcGTAcAAA CGACXCATC CA6ATATAcAA TGCAG.LCAGA CCGTATrGAAC CAACAAGTTT CATACCrTA CCAACCATAC AACCCATCAA TCCTACGCT GATrAACAAT GTGATCAAAG CAGCTCACGC TGA6AGG1TAAA TGGGCTGTA TG743TGGTGA GATGCTGGT GACCAACAAG CTGTTCCACT TCTTGTCGGA ATGGGC?1'GG A'rGAGTTCTC TATGTCAGCA ACATCTGTAC TTCGTACACG CACTrG-ATG AAGAAACTCG ACACAGCTAA GATGGAAGAG TACGCAAACC GTGCCCTTAC AGAArGCTCA ACAATGGA.AG AA=~CTTGA ACTrCAAAAA GAATACGT'rA AT?'N'GATTA ATCGAAAACT CCCTGCAACT CAGTTACAGG GATrrtIrG ATAXrTTA.AA AAGAAxrT~T-C AAGwAAATCT TTC--rrATACA AAGTCCAACC TTGAAAAAGT CTTGACAAGT TGGATATTTA C'TCCAATTTA GAAATCAATI' ATTAATAAGA AATACCTTG= AGTTGGGACT GTATCAAGCT GAAAACAAGC GACGCAAAAA AGTCGTCAGA ACAAAAAATA CTTAA-ATGGT TCATAAAAT-r GGAGTAAAC'r ATTAACCACT TAAG'rAATAG AGAGGAGTTT GCA6ACTAGAA ATATCAAATA CAAAGAGAGT TTCGATGAAA TG'rCTGCG GCACT=GAT =TAAGTGTT TGTCTTACG AGAACGGTTA AGCAAAATAA TCGTG-TTTCC TATATACATG ACGGAGAATT TGACTCCTCA TCAGGj=AGC AACCCTGAAC 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 33.20 3180o 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900
GAATCAATGC
GCGACCACTA
TACTCATGAA
GTGGATATGT
TGAGCAAATC GTCA'rC.PAGA TA6ACAkGACCA AGGCrATGTC ACTTCACATG TCATTA'rAC AATGGTAAGG TTCCTTATGA CGC'TATCATC AGTGAAGALAT AGATCCAAAC TATAAGCTAA AAGATGLAGGA TATTGTTAAT GAGGTCAACG TATCAAGGTA GATGGAAAAT ACTATGTTTA C =7AAGGAT GCTGCCCACG CGGATAACGT CCGTACAAAA GAGGAAATCA ATCGACAAAA ACAAGAGCAT ACTCAACATC GTGAAGGTGG AACTCCAAGA AACGATGGTC CTGTTGCCT'r CGCACGrrCC CAAGGACGCT ATACTACAGA TGATGGTTAT ATCTTTAATG CTTCTGATAT CATAGACCAT ACTCTGATG CT TATATCGT TCCTCATOGA GATCATTACC ATTACATTCC TAAGAATGAG TTATCAGCrA GCGTTGGC TGCTGCAGAA GCCTTCCTAT CTGGTCCACG AAATCTGTCA AAITTCAAGAA CCTATCGCCG ACAAAATAGC GATAACACTT CAAGAACAAA CTGG;GTACCT TCTGTAAGCA ATCCAGGAAC TACAAATACT AACACAAGCA ACAACAGCAA CACTA.ACAGT CA6AGCAAGTC AAAGTAATCA CA'ITGATAGT CTCTTGAAAC AG.CTCTACAA ACTGCCI-rT AGTCAACGAC 730 ATGTAGAATC TGATGGCCTT GTC?1'TGArC CAGCACAAAr CACAAGTCGA ACAGCTAGAG GTC?1'GCACT GCCACACGGA GATCXTTACC ACTTCATCCC TTACTCTCAA A'rGTCTGAAT TGGAAGAACG AATCGCTCGr ATT-ATTCCCC TT~CGTTATCG ?TCAAACCAT TGGGTACCAG ATrCAAGGCC AGAACXACCA AAGTCCACAAC CGACTCCGGA ACCTAGTCCA GGCCCGCAAC CTGCACCAAA TCTTAAAATA GACTCAAATT CTTCI~rTGGT TAGTCAGCTG GTACGAAAAG 7rGGGGAAGG ATATGTATTC GAAGAAAAGG GCATCTCTCG rrMGTCTTT GCGAAAGATT TACCATCTGA AACTGTTAAA AATCIrGAAA GCAAGTTATC AAAACA.AGAG AGTG=?CAC ACACTTTAAC TGCTAAAAAA GA.AAPTGMr CTCCTCGTGA CCAAGAATTT TATGATAAAG CATATAATCT GTTAACTGAG GCTCATAAAG CCrG7rGA AAATAAGGGT CGTAATTCTG ATTTCCAAGC CrAGACAAA TTATTAGAAC GCTTGAATGA TGAATCGACT AATAAAGAAA AA7'rGGTAGA TGA'rrTATTG GCATTCCTAG CACCAATTAC CCATCCAGAG CCACrCGCA AACCAAATTC TCAAArrGAG 'rATACTGAAG ACGAAGrrCG TATTGCTCAA TTAGCTGATA AGTATACAAC CTCAGA'rGGT TACATTTTTG ATGAACATGA TATAATCAGT GATGAAGGAG
ArGCATATGT AACCCCTCAT ATGGGCCATA ATAAGGAAAA AGTrGCAGCT CAAGCCTATA CAGACGCAGA TG=TAAAGCA AATCCAACTG TGAAAGGGGA AAAACGAAT CCACTCGTTC AGGTTAAAAA CG1AATTTG ATTATTCCTC CTTGGrTrGA TGATCACACA TACAAAGCrC GTCACTGGAT TGGAAAAGAT CTAAAGAAAA AGG'rATCCTA GAGATAG'rGC AGCAGCTATT GACTTCCATA 'rATGG1'TGAG ATAACCATCA TTACCATAAT
AGCCTTTCTG
CCTCCATCTC
TACAATCG'rG
CATACAGTTG
AT'rAAATTT-G 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 CAAATGGCTA TACCTTGGAA GATTTGTTTG
CGACGATTAA
GCAATGCCAG
TCAAAGCGGA
AGACTGAAAA
ATTCTAGTCT
TTCAAATTAT
AAGGAAGTAA
GTACTACGTA GAACACCCTG ACGAACGTCC ACArrCTAAT GATGGATGGG TGAGCATGTG TTAGGCAAGA AAGACCACAG TGAAGATCCA AATAAGAACT TGAAGAGCCA GTAGAGGAAA CACCTGCTGA CCCAGAAGTC CCTCAAGTAG AGTAGAAGCC CAACTCAAAG AAGCAGAACT 7rTGCTTGCG AAAGTAACGG GAAAGCCAAT GCAACAGAAA CTCTAGCTGG TTTACGAAAT AATTTGACTC GGATAACAAT AGTATCATGG CAGAAGCAGA AAAATTACTT GCGTTGTTAA TCCTTCATCT GTAAGTAAGG AAAA.AATAAA CTAA'rGAAAA ATGAAAGTCT CGATA.AAGAG GcCKN'TT TTATTATGTA TATATGI'AAA AAAAGAGTAA ACTATTAACT ACTTAATTA.A CCGG=~ATT TACTTAAGAA AAGAGGAAAG AATCAAAATT AATAAAAAAT
ATTCTTGACA
ACT'rTATAGT
ATCTAGCAGG
AGCAATATTA
GAATCAAATA
TTCAGTGGCA
TCAGGTTAAG GTCCTTGCCC TAACTGTTTG TTCCTATGALA CTTGGTCCTC ACCAAGCTGG AAAGAAcrcrA ATCGAGTrkC TTATATAGAT TTGACACCAG ATGAAGTCAG TAAGAGGGAG ATTACCGATC AAGGTTATGT GACCrCI'CAT GTCCCrrTG ATGCCATCAT CAGTGAAGAG AAGGAT"TCAG ACATTGTCAA TGAAATCAAG TACTATGTT ACCTTAAGGA TGCAGCTCAT AAACGTCAGA AGCAGGAACA CAGTCATAAT GTTGCAGCCA GAGCCCAAGG ACGCTATACA GATATCATTG AGGACACGGC TGATGCTTAT ATTCCTAAGA ATGAGTA'rC AGCTAGCGAC AAGCAGGGAT C'rCGTCCTTC TTCAGTTCT TTGTCAGACA ACCACPATCT GACTCTCACT ATTTCAAGCC TTTTACGTGA ATTGTATGCT GATGGCCTTA rTr'rCCACCC AGCGCAAATC CCTCATGGTA ACCATTACCA CTTTATCCCT AT'rGCTCG;TA TTA'rTCCCCT TCG 1'A'CGT GAACAACCAA GTCCACAATC GACTCCGGAA CCTCAACCAG CTCCAAGCAA TCCAATTGAT GTAGGCGATG GTTATGT=? 'rGAGGAGAAT CTTTCAGCAG AAACAGCAGC AGGCATTrGAT CATAAGCTAG GAGCTAAGAAL AACTGACCTC GCTTATCACT TACTAGCAAG AATTCACCAA GATT-TTGAGG CTTTGGATAA CCTGTTGGA6A AAGTTACTGG ATGATATTCT TCCCTTCTTA AAiACCAAATG CGCAAATTAC CTACACTGAT AAGTACACAA CAGAAGACGG T"TATATCTTT GATGCCTATG TAACTCCACA TATGACCCAT GAAGCTGAGA GAGCGGCAGC CCAGGCTTAT ACAGACCATC AGGATTCAGG AAATACI'GAG 731
GGTGATCAGG
GGGA'rCAACG
GGAGACCATT~
CTCCTCA1'GA
GGTGGTTATG
ATCATACrA
AAGATCCGAA
TACAAGGT
TAATGGCAAG
TTATCAGTTG
ACATGGAAAA
CGATAATA 'rTCGGACAAA AGAAGAGATT CACCGGGGTG GTTCTAACGA TCAGACTA ACGGATGATC G'rTATATCrT CAATGCATCT ATCCTrTCCTC ACGCGACCA ?TACCATTAC TTACCTrGCTC CAGAAGCCTA 'rrGGAATGCG AGTTATAATG CAAATCCAGC TCAACCAAGA CCAACTTATC ATCAAAATCA AGGGGAAAAC AAACCCTTAT CAGAACGCCA TGTGGAATCT ACA.AGTCGAA CCCCCAGAGG TGTAGCTGTC TATGA.ACAAA TGTCTGAATT GGAAAA.ACGA TCAAACCATT GCGTACCAGA TTCAAGACCA CCTACI'CCAA GTCCGCAACC TGCACCAAAT GAGAAATTGG 'rCA.AAGAAGC TGTTCGAAAA GGAGTTTCTC GT'rATATCCC AOCCAAGGAT ACCAAACI'GG CCAAGCAGGA AAG=rATCT CCATCTAGTG ATCGAGA6ATT TTACAATAAG GATT'rACTTG ATAATAAAGG TCGACAAGTT CGACTCAAGG ATG.TCyCAAG TGATAAACTC GCTCCCATC G'TCA'rCCAGA ACGTTTAGGA GA'rGAGATTC AAGTAGCCAA =TGGCAGGC GATCCTCGTG ATATAACCAG TCATCAGGCG ACCCACTGGA TTAAAAAAGA TAGTTI'GTCT GCTAAAGAGA AAGGTTTGAC CCCTCCTTCG GCAAAAGGAG CAGAAGCTAT CTACAACCGC CTGGTCAAAA GGCAGAAAAC CCGA6ACAAAT CGTCATCAAG 732 GTGAAAGCAG CTAAGAAGGT GCCACTTGAT CGTATGCCTT ACAATCTTCA ATATACTGTA GAAGTCAAAA ACGGTAG1r AATCATACCT CATTATGACC ATTACCATAA CATCAAATTT GAGTrGG ?TG ACGAAGGCCT TTATGAGGCA CCTAAGGGGT ATACTGA GGATCTTTTG GCGACTGTCA AGTACTA'rGT CGAACATCCA AACGAACGTC CGCATTCAGA TAATGG??FT GGTAACGCTA GCGACCATGT TCGTAAAAAT AAGGTAGACC AAGACAGTAA ACCTGATGAA GATAAGGAAC ATGA'rGAAGT AAGTGAGCCA ACTCACCCTG AATCTGATGA AAAAGAGAAT CACGCTGGTT TAAATCCTTC AGCAGATAAT CTTTATAAAC CAAGCACTGA TACGGAAG3AG ACAGAGGAAG AAGCTGAAGA TACCACAGAT GAGGCTGAAA TTCCTCAAGT AGAGAATTCT GTTATTAACG CTAAGATAGC AGATGCGGAG GCCTTGCTAG KAAAAGTAAC AGATCCTAGT ATrAGACAAA ATGCTATGGA GACATTOACT GGTCTAAAAA GTAGTCTrCT TCTCGGAACG AAAGATAATA ACACTATTTC AGCAGAAGTA GATAGTCTCT TGGCNTTGTT AAAAGAAAGT CAACCGGCTC CTATACAGTA GTAAAATGAA TGGAGCATAT rrTATGGAGA AGTAACCTTr CGTGTTACTT CTCTTTTTTA GAAAAACGTA ACAGA INFORMATION FOR SEQ ID NO: SEQUENCE CHARACTERISTICS: LENGTH: 2004 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: TTTACTAAAA GGAAAAAAGA ACTGA'TTTCT CAGTCCTTCA TrAATCTTAT TCCACACTAA ATAGGTATGG GTAAACAGGT TGTTGACCTT GGTGAATCTC GACTTCAACG TCTTCGAATT CTTCTACGAT 
TTCTTGAGCG ATTTCATTGG CAAGTTCTrC GCTTCCGTCT TCACCTACAT AGAAGGTrAC GATTrCACTG TCTrCATCCA ACATATGTT CAAGGTrTCA GTCAATGT PT GGTGCATATC! AGGGrrTGAC ACAAGAATNT TrCCATCCAC CATACCTAAA TrATCGTT CATGGATTTC TAAGCCATCG ATCGTTGTAT CACGCACGGC TGTTGTGACG CTTCCGCTAA 7500 7560 7680 7740 7800 7860 7920 7980 8040 8100 8160 8195 C4GACATCGCT AAGAGC-AGcT GTCATACGCT cITGGTTTTC TTCAATGGAC CAAAGGCAAG AAGACTTGTC ATACCTTGAG GAAGAGTGCG AGCCTCTACC GTTGCTCCAA AACTTCTGCC GCAGATTGAG CTGCCATGAA GATGTTCTTG AGAAGATGAT GTTACGGGCA TTAACCTG'IT CAACAGCCTT GATAAAGTCT GGTTCATGGT TTG.ACCGCCT TC!GATAACAT AATCCACGCC TTGAGAACAG
'PTGCTTGGAT
ACTACCGCTG
TTGTTGGCA
TCTGTTGAAG
AAGATATCTG
CTAGACCTTT ACCAGCCACC 'rAACTTGAGT AGCTCTTcTrC T TACCTTGAC CAAGCTACCA TATGAACATG GACTTTGACA CATCCAAGTA GrrACGGAAT TAAGAGCTAC CATGAT1TTCA CAGCTACAGA CrATGATGC CAAAGTCCTC AGATGCA6ATA AGACCAATCC ?TGACCACCT CTGGTGTT AGCI'AGAGCT CGTCATCTGT TTGCTC.AGCT AAATCGTTCC TTCAACAG AGGCCAGAGC CAACCTGA CACGGAAAAG CTGAGACGTA 733 ACA~cAATcP. AACCATACTC 11'1CTTCA GCCGACTTGA TCA6AccTGTG crrCGTGrG GrrACGCATA TTGTCpjAC1- TATTTGAGAc CrrCTGCAT AACAACTCCT GGATCTTCTG A?1rCATCA'r CGTIAAC.AAC AAMGAGAGAA TCTrCCAAGCT TCATCGTAGT CAA ATC1rr AGCATAGcTT GGACCTMCT GTACAGTAAC CAAACCTGAT =rCTCAGTC GCTACG GAC TCTACATICA TCATCTCACT CATG? TGGCA GGAGTCGCTA TATTCGCCAG TAAGCCCTA AAGGAXACC1' GAGTCCACAA CGCCAACTC =TCAATACT GTrrAGCAC CTTCCAAGGC TGCGCGC.ATG T1-rT'rC=A CACCGATACC AGCICCACGA
TCGTAGATGA
CGAAGCATGT
ACTTCAACAG
GAAACTGT'rA
TTCATCACTG
CCTGTTAACT
ATCACI'CCTG
TGGCAAGAAT
CACATTC
CGTT'rAATGA 171 -1-I LGA
T'TCAATGTAG
AACACGCTCT
ATATACATCA
ATTTTCC2-rA
GACCACACCA
GCTCGCTACT TC'TCCAACTG AATGGTCATT CCCATATTTG ATTGACATAT TCACTGCT TAAGCTAGTA GTAAI"TTTTG ACATTTACAG TCTGAGC.AG'r TGAATGTTTT TrrGACACTTC ACTGCAATAC TGCCATCTTC CCTAGCAGGG CTTGGAAATT GAAATC'rCAG TTGC CCTTATAGGC AA CTTCCACA CCTGATGGA CCTCTTTATC CTTGATAGCT TrCGGAAATC AGTCCCACG CGCACCC.ATC AAAAGCCCTT 'rAGAAGCTGG CTTGCTGCA A CTTCTTTAG TCCCAGTATC TCCATCTGGA ACTGGAAAGA TAT TCAAGCG AGTGATGCA GCCTGCACCA ACACGGTTAT TCTCCTACAA CTTT-GATAr AATTCCAAGC TGGTTTCCA AGCTAAAGC ACTAATCTTT GTTCCGTAGC TTAACACGGT GCTGCCT1TT ACCACGACAC CTrTAGAATA ATCTTTGAGG GCATT=rAC TACCCATACC 1560 162U 1680 1740 1800 1860 1920 1980 2004 INFOR14ATION FOR SEQ ID NO: 96: SEQUENCE CHARACTERISTICS: LENGTH: 11915 base pairs CSB) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:- 96: 734 CCGGGTTGGO CTGTTCGCCC ATTAAAGCGG CACCACAGCT GGGwrCAGAA CGTCGTrGAGA CACT14CGGTC CCTATCCGTC GCGGGCGTAG GAAATTGAC AGCATCTGCT CCTAGTACGA GAGGACCASGA GTGGACTTAC CGCTGG1'GTA CCAGTTGTCT TGCCAA6AGGC ATCGCTGGG'r AGCTATGTAG GGAAGGGATA AACGCTGAAA GCATCrAAGT GT-AAACCCA CCTCAAGATG AGATTCCCA TGATTATATA TCAGTAAGAG CCCTGAGAGA TGATCAGGTA GATAGGTrAG AAGTGGAAGT GTGGCG.ACAC ATGTAGCCGA CTAATACrAA TA=CCGAGG ACTTATCCAA AGTAACTGAG AATA'rGAAAG CGAACCG~rr TCTTAA.ATTG AATAGATAT'r CAAT=rAG TAGGTkrrAC TCAGAGTTAA G'rGACGATAG CCTAGGAGAT ACACCTGTAC CCATGCCGAA CACAGAAGTT AAGCCCTAGA ACGCCGGAAG TAGTTGGGGG 'rrGCCCCCTG TGAGATAGGG AAGTCGCrTA GCCrACGGA G~rrAGCTCA GCTGGGAGAG CATCTGCCTT ACAAGCAGAG GGTCAGCGGT TCGATCCCGT TAACTCCCAT 'I-1rAGCGGGT GTAGTTTAGT GGTAAAACTA CAGCCrrCCA AGCTGrrTC GCGAGTTCGA TTCTCCTCAC CCGCT7TGAA CrI-rG'1-rc TGTACCAAGT TrGATTG GGCCCGTAGC TCAGGTGrT AGAGCGCACG CCTGATAAGC GTGAGGTCGG TGGTTCGAGT CCACTCGTGC CCATAGTGI-r TAGTCCATTA CTAGGGGA'rr GGAATATTAT CrTT~CACTA AGAGGACACG GGCTTGTTCC CGTATAAACT ATTTrGGAGG ATTACCCAAG TCCGGCTGAA GGGAACGGTC TTGAAAACCG TCAGGCGTGT AAAAGCGTGC GTGGGTTCGA ATCCCACATC CTCCTTTAT ArTA.ACGCGG GATGGAGCAG CTCGGTAGCT CGTCGGGCTC ATAACCCGAA GGTCGTAGGT TCAAATCCTG CTCCCGCAAT AAGGCTCGGT AGCTCAGTT-G 
GTAGAGCAAT GGATTGAAGC TCCATGTC'TC GGCGGTTCGA TTCCGTCTCG CCCCATTTAT ATATTTTGGA AGGGTAGCGA AGAGGCTAA6A CGCGGCGGAC TGTAAATCCG CTCCTTCGGG TTCGGCGGTT CGAATCCCTC CCCTTCCATT TTACGGGCAT AGTTTAAAGG TAGAACTAAG GTCTCCAAAA CCTTCAGTGT GGGTrCAATT CCTACTGCCC GTGTTAATAG AATTATGCG GGTGTGGTGA AGTGG=rAAC ACACCAGATT GTCGCTCTGG CATGCGTCG TTCGATCCCC ATCACrCGCC TATTTTATAT TGGTATAG CCAAGCGGTA AGGCAAGGGA CTTTGACTCC CTCATGCGTT GGTTCGAATC CAGCTACCCC AGrrACTATT TCCGGCGTG -GCGGAATTGG CAGACGCG;CT GCACTCAAAA TCCAGTGTCC GCAACGACGT GCCGGTTCGA CCCCGCCCGC CGGTATAGrA TAGTGTTAGG AACGTTGTTA TTrCrCCTTC CTTTrTTTATA TTA?'rTTTGG TATAArrATA GTTATTCAAA 7=~ATTTAG ATTAAGAAAG TGTAGGGGAG TATGTCTTGT 'rCTATCGATT TATTAAAACA TCGGTA'rrG AAAAATATTA A.AGAAAATCC TG.AATTGTTT GTCGGAATTG AGTTGGACTA TCCTGTTGCA AGrrTAGAAG GGCATrC~rAC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 735 AGATIGTTG.AA GTrA7GAA= AGCAAAGGTA GATCATTTTG TATTTTATTT GAAGTI'TCCT TCAAGAGGTC GAAAATCGTT ATCAAATCAT GCTATrGTTG TCCAGT'G.cr T1ATCCACcGC ATCTATTTCA 1-rA~rAGrr TCTACTTI'GG ATCTCACCGT GCAATCTGXT CCAGTTAGTA GATCCGA'rAA GTCAGGATGC ATACAACGAT TGAGTTTGCA ?'rTCGGTAAGG CAAACQAT TCAATAATA TATGAATGTA ATrCAAGAA AGrTAGCTGA GCTGTCGGTAT CCATCCCAAC TGGGA'rAAAA ATGACAATTG ATCACM'GTT GATGcATTAT rrAA'rrCA GTAGAAATAT ATTTcCrGA ATATGGTACT =TATCTGTC CGGCCAGGT CCACTACTT ACCGGTGATT AATGCTrMA CTCAAATTGA TTGCAAACTC TCAA'r -CG GGTGCGG;ATr GGGATACGAA GGGA.ACAATC TATGCATGGT ATCTATCCAG AGAATC7-rGG PTGATGAAAC TGAT'N'TTT GACTATCTAA ATCATTCTGC
TATTAAATCA
TCAGCTGGAT
AGCGTAAG
GATTTACA'rC
ATTCAAAA
GCTTATTTAT
AA'rrTCAAGG GATA-TTC' GGTCAATGCT AGACTCCrrA 0 00 *0 0 GA'='-ACT GCGGA.ACGTG CTATTTGGCT ACCTCCCAAA CCCCCAAGAG AAGGA'TTG AGGAACAGTT GAGTTTCG'rA AGCTT'rICAC TTGGGAT'rAT ACC3-rTCTTT' AAAGTATTTG AAATCTTACA GATGAGGAAG AGCTGAGGAG GGACTAG'rGG AGAAGAATTG AGCCTATAA'r AATGGACGAG ACTATACATA TGATGCCAAC AATACAGTAA CG*GC-TII ACACCGGA'rA ATGGGCAGAC C'rATTATTTT TATCCTATTC ACCGGCA TCCAAGCATT TGCTCrGAAT CGG4ATCAGG rTAT'rATTTA AAACTCATCG TAGTTACCAG TACCAAGATr TAACGACTCC GTGTGTGTAC ACAGCCACTT GATAGGACTT TTGCTrCTGC TGGTTAATTT AGACAAGT1'A CAAGCTTACT TAGAAACAC CTTATGATTA CAAGTCTrrA AGGAGACAAT TTTCTAAGAA AAACTACGAT TATTGAATTT TCCAAAGACT TACTCCTACT TGAGAAATAA G;GAAGAAATG ACCTATTTAC AGCCTTTGAG TTCTCTTATA AAGGGAGAAT T"ErTCTGAAAA ATCATGATAT AAGGATAGAG AGTAATGACA TTAGCTTTATC AATCAACCC CTCCCAGCCA AGCAATT'rrG CAAGC'rTTG CCACGGACGG CTTATCCAAA GGTACA1-1rG AACTLIGACA AATTGAAAGA 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 0 *000 0 000.
0000 00 00 0 0 0 TCC~wrCTTAC CAGCAAGTTG CTAAGCTAGT TTGTCAGCA TTTrTAGATG ACTTTACACT TGAGGAGTTG GACTACTGTA TCAACAATGC CTACGATAGC AAATTTGATA CTCCACCTAT TGCACCATTA CTGAAATTAG ATGGGCAATA CAM-rTGGAA CTTCCATG CTTCAACGAT TCCTNTAAG CATATCCCCT TGTCTATTT CCATACTTI ATGACGACTG CTGCTAAGAA ACATGGI-rTG GACAACAAGA ?TGMrATCrr GACAGCCACA TCTGGTGACA CCGWGAAAGC TGCTATGGCG GG-GTMGCGA ATGTGCCTGG TACTGAGATT ATCGCT'N'T ATCCAAAGGA 736 rGGrGTCAGC AAGATTCAAG AGTTACAAAT GACCACTCAAG ACrGGCGACA ATACTCATGT TATTGCTATT GATGGTAACT TTGACGATGC GCAAACAAAT GTGAAGC-ACA TG?'TAACGA CCTGGCCTCT CGTGAAAAAT TGACTACCAA CAAGTTGCAA TTTrCATCAG CTAACTCTAT GAACAT'rG CGTCTGGTGC CACAAATI'GT TTATTATGTT TATGCTTACG CTCAATTGGT TAAGACTGGT GAAATTGTAG CTGGTGAAAA GGTTAACrrC ACAGTACCAA CAGGAAACIrr TGGAAATATC 7rGGCTGCCT TTATGCCAA ACAAA'TCCT 7rGCCAGrG GTAAATTAAT CTTCTAAATGACAACA ATGTrTTGAC AGACTTCTrr AA.AACACGTG 'rCTATGACAA AAAACGTGAG TTTAAGGTAA CAACCAGCCC ATCTATGGA'r ATCTTGGTAT CTTCAAACTr GGAGCGCTTG ATTTTCCATC 'rTrTGGGAAA TAATGCTGAA AAGACAACTG TGCCTTGAAC ACGCAAGGAC CTTTGCAGCT GAATATG3CGA TACATTCT TATATCGACG CCAATCGGCC ACTGGAGATG GTTCCCAGTA GT'rGCAGTAG
AATATAAG?!'
CTGAGGA.AGA
ACCCTCATAC
GACAGAC=r GATCCAGAGA AACGGCAGCA GAGATCAAGC AGCTGTTGCT 'rCAGCAGTT'r
AACTTATGAA
TTTTGGACCT
GTGTTTGTGA
ATAAAAAATA
.4 6 TAACTAAGAC AGTGATTGCT TCAACAGCTA GTCCATACAA AAGCTGTAAC TGGAAAAGCA GGTTTrAACAG ACTTTGAAGC CTrTGGCTCAA 'rTACATGAAA TCTCAGGCGT TGCAGTGCCA CCAGCAGTTG ATGGGCTTGA AATAGCTCCA ATTCGTCACA AGACAACAGT GGCAGCTCCT GACATGCAAG CAGCGGTTGA GGCTTATTr'A GGACTTTAAG ACAGAGGGAG CAAACTCGGT TGGGA.AACCA ACTGAGT'rrC TTTTCA'rCAG GAGGAGAGAT TGM''AAGAA AAATAAAGAC ATTCTTAATA TTGCATTGCC AGCTATGGGT GAAAACTTTT TGCAGA'rGCT AATGGGAATG GTGGACAGT'r ATTTGGTrGC TCATTTAGGA TTGATAGCrA T-rTCAGGGGT TTCAGTAG.CT GGTAATATTA TCACCATT'rA TCAGGCGATT TTCATCGCTC TGGGAGCTGC TA'N-rCCAGT GTTATTTCAA AAAGCATAGG GCAGAAAGAC CAGTCGAAGT TGGCCTATCA TGTGACTGAG GCGTTGAAGA TTACCTTACT 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 ATTAAGTTTC CTTTTAGGAT GGGGACGGAG AGGGATGTAG GATTGTTCTC TTAGG AA ACCTCTGCCT CTCTATGTTA AGCTA~T= GTTCTGGATA
TTTTGTCA
CTGAGAGTGG
TGACTAG;TCT
GTTTrTTTATC
TGGGGATAGC
C1TTCGCTGGG AAAGAGATGA TAGGACr TGGACTGTAT CTATCTTTGG TAGGCGGATC AGGAGCCTTG ATTCGTGCAA CGCATAATCC cAATGccTTG AATATTCTTT TTTCAAGTCT TGGTGTTGCT TGGGGGACAA TTGTGTCTCG 7"rTGGTTGGT CTTGTGATTT TGTGGTCACA ATTAAAACTG CCTTATGGGA AGCCAACTTT TGGTTTAGAT AAGGAACTGT TGACCTTGGC TTTACCAGCA GCTGGAGAGC GACTTATGAT GAGGGCTGGA GATGTAGTGA TCATTGCCTT GGTCGTTTCT 'rTGGGACGG AGGCAGTTGC 737 TGGGAATGCA ATCGGAGAAG -TACGGCAACG GTCATGCTIGT TCTTGACCCA GTTTAACTAT ATGCCTGCCT ?TGGCGCGC TGGCCCGAGC AGTTGGAGAG GATCATTGGA AAAGAG?1TGC TAG'rrTGAGT AAACAAACCT TTXGGCITC TATATATGTrC T'1GGGTGTAC CATTAACTCA GGCI'AGTGTr CTAGTGACAC TGTTTTCACT CATCTATACG GCALGrCTGGC AGGGATTAGG TATAGGAA'rG 'rGGTGTATCC GCATTGGGAC GGGCTTGCCT GGTA7T~rGGG CACGCTCT ACGCTATCGT TACCAGCGCT ATATGCTT TATTTGGAT TTAGACGGGA cT'rTATTGGA GGAGACTTTT GCTCAGTr CTATTCCTTA CAAGTA?'CG GTGCAAGATr TGCTTG'rGCG GGTGCTAAAT CAGGTGCGTG CCCAGAGTCT GCCAGGTGCG CGTGAGGTGC TAGCTTGGGC TACTCATAAG GGGAACAACG CNTrACCAT TACAGAGATT TTAACCAGTC AGAGTGGCTT CTATCTGCTA GATAAGTATC AG'rTGAATTC TCTGGAkTGTG GAATrTGCCC AGAATAGTGG 'N'ATGAAGGG AATCACAGGA TTCAAGCGTr GTGATAAAAA GATTGTGTCA GTrTTGTGAC AGTTTGTTAC AAGGAATAGA CAGrTCTGTT IrTrTGTGT TATCATAGAC AGGTACTCAT AATGTTATTA GCGTCAACAG TAGCCTTGTC ACT 'GGGACC CCTATG.ACGA AAATGCACGC CTCCCTTTrT AGGATATCTG ATGGGGATrG CTTGGATAAT GGTTTTCGCT GAAAGGATAG GAAAT13CAAA
CAGGAACAGT
ATGCGACAAG
TGCTTGGTTG
GGTTA=rCT AAACAGCTrT CTCTTACGAA GCGATTTTAT CAGGGATTGA TGATAAGGAG AAGGTGAGAG AGTTTATCTT GGTGGCAGAA GATAGAAATC TGGATGTTGA GGCTGAGAAG AATGCTCAGG TAGTTTGAT AGACGAATCA GGAAT'rCAGC AGTTTATATA TCTGTTCCTC ATGST1'GCCCC TG'rCCTrrAG TCTCTATACG ACrGATrCTC TAGCGCTGGA
TCTCAAGGAC
TGTrGCGGAAG
TGATAATACT
GATTCAJ4AGC
AGCAGATAT
AGAGACCTAA
TTGGGGGTGG AATCCTATTT CCAAGTCCAG AAGCCCCTAC TArTATATAG GGGATCGGAC ATCAACTTTT TAGAGTCTAC TCCCGTATTT TTGAGACTAA CAAACI'ATTT CAAGTAACCT 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720* 6780 6840 6900 6960 7020 7080 AAATAGGCCC GAGAGGGCTT TrTCTACA 7TGAAAGGAA TTTGAAAGAA TGA.AGAAAAG ATTTGCCCCA GTATTGGCAA CTCAAGCAGA AGAAGTTCTT TGGACTGCAC GTAGTCTTGA GCAAATCCAA AACCATTTCA CAACAAAACA AGTTATACCC TACAGTATCG TGATACTTTG AGCACCATTG
CTAAAACGGA
CAGAAGCCTT
GGGTGTAGAT GTCACAGTGC TTGCGAATCT GAACAAAATC ACTAATATGG ACTTGATTTT CCCAGAAACT GTTTTGACAA CGACTGTCAA TGAAGCAGAA GAAGTAACAG AAGTTGAAAT CCAAACACCT CAAGCAGACT CTAGTGAAGA AGTGACAACT GCGACAGCAG ATTTGACCAC TAATCAAGTG ACCGTTGATG ATCAAACTGT TCAGGTTGCA GACCNYCTC AACCAATTGC AGAALGrACA
CCCAGAGGAG
GACTCCACCT
AACTACAACA
rrCrACTTAT
GCCCGATTAT
AGCTGCI-I-
TCCAGGAGAC
'rTCAGAATTA
TAGTTACAT'C
TAACACTTGG
738 AAGACAGTGA TT1CICTGA AGAAGTGGCA CCATCTACGG GCACT'rCTGT CAAACGACCG AAACAACTCG CCCAGTTGAA GAAGCAACTC CTCAGGAAAC GAGAAGCAGG AAACACAAGC AAGCCCTCAA GCTGCATCAG CAGTGGAAGT AL3?rCAAAG CAAAAGAAGT AGCATCATCA AATGGAGCTA CAGCAGCAGT CAACCAGAAG AGACGAAAAT AATTCAACA ACTTACGAGG CTCCAGCTGC GCTGGACTTG CAGTAGCAAA ATCTGAAAAT GCAGGTCTTC AACCACAAAC AAAGAAGAAA TTGCTAACTT GTrTGCATT ACATCCTTA GTGGTTATCG AGTGGAGATC ACGGAAAAGG '1r=CTATC GACN'ATGO TACCAGAACG GGGATAAGA 7TGCGGAATA TGCTATTCAA AATATGGCCA GCCGTGGCAT
ATCTGGAAAC
AACCCAA'PGC
AACGrTTCTA TGCTCCArTC CAGACCGTGG TACTGACA TCACGIrTCA A'rGAATGGAT AAACCCGACT GCT'rrCGTGA TGGAAAGCGA 'rrCTCGTTCG CTCAAACCA CGTCAGNTT ATCTGAAACT AGCI-rCCTAG T1TTGCTTTTT GATTTTCATT TATCATCTTG TAATGAGTTA AGCAACATTC AATTAGTCAA ATATTGATAA ATCAATAAAA ATTrACTG'rA TCAACTCCGC TTGrCTGAGC 'rTGGGATrAG TTTGACACGG TATCAGATT ACCAAATGG3C GGT'rCAGGAG CGT'rTGAAAA AAATTTTGGA AACGGAAGGT TTGGTGGAGC TGTTGGTAAA GGCTGCGAAG TATGCCAAGG ATATCAGGGT TAAGGAAGAG ATAGAAAGTA GCCGTTTAT'r AAATAAATTG GTTTTCGGTA ATGTCAATrA TTTT-AACAAC GATCGTTGCT -AGTATTGCCA CGCAATCAGA TGCG.ACTAGT
TGATAACATC
TTT=TTT
TCAAAGCTGT
GAGTATCAAT
T'rGCAATCTA
AGAGAGGGGA
AAGCGAGTAC
TACTGTrrT
'ITGATCAGGC
GTCATCGTAA
AGCAGTTAGT
TCTrAACAGA 'rTGAAAATAT IrrGGAGCArr
CGTG;TA'ITTA
GATAGCAAAT ATGGG.CCAGC GAAAATCACT ATGATCACGT AlTTTGACGA ATGAGATCTA GTCArACTCT TCGAAAATCT GCT'rTGAGCA ACCTGCGACT TTGAATGGAA AATGGAAAGT TTTTACTTTA TATCACAATT AGAAATGCTA GAGATTCAAG GCAATTGTTT GAAAAAAGGC GCTG;GAGCAT TCTCCTTGTA TGCTTTGACA CGGCATTTCA TCCTrGAAAAT CAGCGGGAAG GGTGAATCCC CCTCTGCAAC G7rTGAGAGA ACAGAACTCA AGAAATTTAA GGAGAAATAG TTTACATTTT TTA'N"TGAA ATATGGAAAA GGAAGAATTG TTTATAAGGC TCTGCTAGGA AAATTGTGAC TATTrTGTC CGGATAAAAA AATTA'rNTG 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 GCTCATCCGT CAGTAAGTTC ATT=TCAAA AATCAAGGAA GTCTTTCTCT TGTATGTCAT TTA-rrCTCA CAGAATTTAG TrATTTGrGA TCGTGCTGC GACTTACGGC TCN'AACAG AAACAAGGTG GATCAGCTAT 1-TTCGCCTTG ATTAGTA'1rr TACTC=~AA ATACACTTGA 739 ACGTCGATTC TAATCTCGCT AATCC~Tr AATCCAGAAT AA1GOCAAATA TGTTATACTT GTTr rAACA AAAAA~CCC ATT~GAATTGG TrrGAGGAG ?rrAGAAATCA AGTATrAGT GACAGG?1'?r GGCCTT GAGGGGAAAA GGGCAATCCA GC~rGGAGG CCATTAAC TTTACCAGCT GAAATCCATC GTGCTGAGGT CCGrrGGCTA GCGlGCCGA CAG7?rrTTCA CAAATCTGCT CAACTATTGG AAGAAGAGAT GAATCGrAT CAACCTGACT T~rT~CCx-rG TATTCGGGCAA GcTGGTGGAA GAAcrAGrr GACACCTGAA CaA=TACCA 'rrAATCAAGA CGATCCATGC ATTCTGATA ACGAAGATAA TCAACCGA'rr GACCGTCCCA TCGCCCAGA TGGTGCTTCG GCCTACTTTA GTAGTTTGCC GA'rTAAAGCG ATG=~CAAG CTATAAAAAA AGAGGGCTTA CCCCCTCTC TrrCCAATAC GGCAGCGACT TTTGTCTGCA GCCAT'rrGAT GTATCAGGCT CTCTA'TTTGG TAGAAACAA A'rCTCCATAT TATTCCTTAT A'rGATGGAAC AGGTGGTCAA CAGACCGACT GGA'rATTCGG CGAGGGATAG*AAGCAGCAAT CCGCGCTATA ACTCAAGTTG GrAGGCGGAG AAACTCA'rTG ATAGAAAAAA AAGC7'r'rTGG ACGTTTTCGG GCCAATACTG C'TCCGTAAAA TATXAGGTAG GAGTGAAAAA CTAGCAATGC CAAAGGTA.AT GAAGAACCTG TAAATCTAGG ACAAAGTGCT GCAACTTGTA TAGT7TrrTAG GATTCGTCTT GGTGGGACCT GTCCTAGGTC GTTdACGCAC
ACTCCAGCTA
ATAGAACATG
GTTrATGCA TGAGl??TAGT
GAGATCACGA
GCrTTCAGGGG AAAAACCTTC CATAA7=.A GTGCATTGGA CCAAT'TGAGG AACTACCA.AG GCCCTCATA AACGAACGC TAGACTATAA CAGAGAAGAA 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260* 10320 10380 10440 10500 10560 10620 ATTCCACCTG TGAATAGGCA 'rAATACTGTG AGATGAGACT TGCAAGAAAG 'rAGAGTCCAA ATGAGAGATC TAGATr=GA AACTCAGGAT TACTATAAAA GAGGAGGTAA ATCCCAAGTA AACCTTTGAG AACTACCGTC AAAAAG~rr CGCTGTT'rrr TACAGATGGC 'rCCGTTTTAG CGGAACTGGT CAAAAGAATA GTCCCCATAA GAATATAGAG GATATTTCCT ACAATGATCA AGACCATGAG GAAACGC2'CG GTTTCA.ACTG CTAGGGTGAC GAATTrrTG GCTAA.AAAC AATTAGGGAT ACTCCATAAA AAGAGATAGA GAGAAAAGCG CTCCTCATCA AAGAGAGCTA AATCTCAT GAGTGTCACT GTTGCATAGA AGGAGACTAG TAGAGCAAAG AAGGT T GAAGTATTNG GCCAAGTATG CTGAAAAATG GcrGTTCTA.A AACAGTCCC TGG.ATCCGAG ATAAGGGATT AAGAA.AACCA GATAACATGA CCAGCATACI' GGCAAGGATA 'rAGAGGAGAA AGAGACCGGG GGTGTCAGCC TGAAAATGTT TTGACTCCTG ACGAA'rTGTT TTTAAATCAA 71'rTTGGATA GTTCATTCTC TTATTATACC ATAGTTCTTA 'rACATAGTTC CTGACAGTTC CTACTrTT TGATAALAATC ATACAGTGTG TCCTTGGGCA CACTGTA'rcA ACTGGGACTG 740 TCTTTCCCAG CTTCGGAGGT AAAAAATGTC AGATTCACCA ATCAAATATC Gr1TGATTAA 10680 GAAAGAAAAA CACACAGGAG CTCGTCTGGG ACAAATCATC ACTCCCCACG GTACCTTTCC 10740 -GACACCTATG TTTATGCCAG TTGGGACACA AGCCACTGTC! 
AAAACTCACT CACCTGAAGA 10800 ATTGAAGGAG ATGGGT1'CGG GmA?1ATCCT ATcAAAcAcc TATcAT CTC GGCrrcGccc 10860 TGGAGATGAA CTCATTGCAC G.CGCTGGTG.G TCTCCACAAG TTCATGAATT GGGACCAGCC 10920 TATC?1'GACA GATAGTGGTG GTI'CAGGT TTATTCTTTA GCAGATAGCC GTAATATCAC 10980 AGAAGAAGGA GTAACCTTTA AAAATCATCT AAATGGTTCT AAGATGGTTCC TATCCCCAGA 11040 AAAAGCCATC TCTATTCAGA ATAATCTGGG TTCAGACATC ATGATCTCCT TTGATGAATG 11100 TCCTCAGTTr TATCAACCTT ATGACTACGT TAACAAATCG ATCGAGCGTA CCAGCCGTTG 11160 GGCTGAGCGT GGT'rTGAAGG CTCACCGTCG TCCACATGAC CAAGGTTTGT TTGGAATTGT 11220 GCAAGGTGCA GGATTTGAAG ACCTTCGCCG CCAATCAGCT CATGATCTTG TCAGCATGGA 11280 TTTCTCAGGC TACTCTATCG GTGGTTTGGC AGTGGGAGAA ACCCATGAAG AGATGAATGC 11340 GG'rTGGAC TTTACAACTC AACTGCTGCC TGAAAATAAA CCTCGTTATC TGATGGGTGT 11400 GGGAGCGCCA GATAGCTTGA TCGATGGGGT CATTCGTGGG GTGGATATGT TTGACTGTGT 11460 *CTTACCGACT CGAATTGCTC GTAACGGGAC TrGTATGACC AGTCAAGGAC GTTTGGTTGT 11520 *GAAAAATGCC CAGTTTGCTG AGGACTTTAC GCCACTGGAT CCTGAGTGTG ATTGCTACAC 11580 ***ATGTAATAAC TATACACGCG CTrACC TrCG TCACCTGCTC AAGGCTGATG AAACCTTTGG 11640 **.TATCCGCT'rG ACTAGCTACC ACAATCTTTA CTTCTTGC'rT AACCTGATGA AGCAAGTGCG 11700 ACAAGCCATC ATGGATGACA ArCTCTrGGA ATTCCGTGAG TATTTTGTGG A-AAAATATGG 11760 CTATAATAAG TCAGGACGTA ATTrCTAAAA TGGAAT'rGAT ATAAAAAAAT CCTAAGTTTT 11820 CTCTTAGGAT TTTTCTTCTT TTTTTGATAG AATAAACT GT ACAATGAAAG GAAGAATAAA 11880 CTCGTATGCG CATTAAATGC TTr'rCCTCGA TrAGG 11915 INFORMATION FOR SEQ I0 NO: 97: Ci)SEQUENCE CHARACTERISTICS: LENGTH: 9069 base pairs TYPE: nucleic acid STRANOEDNESS: double TOPOLOGY: linear Cxi) SEQUENCE DESCRIPTION: SEQ ID NO: 97: CACAGGGCAA CAGTTCTATC GCTTCAAATT TTTTCTTGGT TTGCAGATAT TCAAGAATCG GGAGTNPMC TATAGTATTC GGCAGATTTA TTACAGCCAA GCATCTCAAA AATACGGACA 120 741 GCAC~rCATCTIIM= C~rC~rG CTCACCrrGCTTGCTATC AAGGAGAcr 180 TCTGCCC-ACA GATAAACAAT TCGGAAATAG G1TCTCAr CCTTGTAGAA ATGCcTc~cc 240 ATAACACGTT TAAAATAATA GGCA?1'GGTA AATTCTTCAC ACTCAATACT AGCTAAAAAG 300 CCATTCAATA GTATAGTATG AAAAAGGTT'r CGATTGCCAG ACATTTCCAT TAGAAAATCA 360 GATTTACGTA 
CCATr'rCTCG TACATATCTA GTAAAAAGAG AAACAGATAA AAATGGAGAA 420 CTGACTGAAA ATAAAA TTCATAGATT CCCCAGATCT CCGTAGAAAA CAAATAATCA 480 TCGAAGGACI-r TrCCTTCCTC TGCTG1'TAAG TCTACCCTTr CATCTATGCT CTCATATAA 540 GAC?1'GATAA TAATGGCArr TAGAATATGT rrClTT1TCT TGTCAGA6ATG GGCATC -1r 600 TA'rACTCCCT GCGATATAAG TCCTCAAGAG GTGCTATA'r CTTTCGTTCC AAGACATCTG 660 TAATTTCTTT TCTCA.AC'rCA GAATCTGTAT CATACTGGAA ACCTCTTCCC AGAAAGAGGA 720 'rCTCCTCCAC ACTGGCAGAT A'rATTTTCCA GAGCAAATAG AAAC7=?CC ACCGAAACCT 78a0 CACTCTGACC TrrCAAAA CGGOACAACA TAGACGGCGA AAATrTTCCT CCGGTCCTr 840 *GTC1'CAGTGA GATATrTTC GACTCTCGTA ATTGTCTAAA GACr'.tTCCA ATCI'GCTCCA 900 *TAGACTTCCC CTTGATTCCG TAT=TTCr ATTTATCAT ATTrTrCAGA AWTTCATCA 960 *AAAACTTGCC AAATTGTCAG AATTATCACA AAATAGAGCA TATTTATCAC GTGGAGGGAC 1020 *TGC1IATGAGA GACGATATCA AAATCAATGA CCCTCCTTTG GCCTTGCAAC ACCAAATTAT 1080 *P*CGAAAAACTA GAGAAAGTTT 'rTGATACAGA TGTGGAATTG GATCTTTACA ATCTAGGTCT 1140 GATTTATGAA ATCAATCTGG ATGAAACGGC GCTCTGCAAG ATTGTCATGA CCTTCACCGA 1200 TACTGCCTCT GATTCCCCCG AAACCCTGCC TATTGAAATC GTCGCACGTC TGAAACAAAT 1260 PP.CGAGGGTATC AAAGATATCA AGGTTGAAGT TACCTGC C CCTCC 7rGGA AAATCACACG 1320 AATCAGTCCC TATGGCCGTA TTrCCCTGG ACTACCACCT CGTTAAGCAG ACCAATCACT 1380 .TTTAAACATG AAAATCAAAG GGCAAACTAG AAAACTAGCC CCAGGTrC? 
CAAAACACTG 1440 T=GAAGTT ATGGATAGAA CTGACGAAGT CACrCAAAA CACTG=r-rG AGGT'rCTGGA 1500 TAGAACTGAC GAAflTCAgCT CAAAACACTG TTTGAGGTT GTCGATAGAA CTCACGAACT 1560 CAGCCCAAAA CACTGTTrTG AGGTTGTGGA TAGAACTCAC.GAACTCAGTA ACCATACCTA 1620 CGCCAAGGCG ACGTTGACGT GATTTGAAGA GAI=TCGAG TATCAGTZTA ?TTTTACC1' 1680 P pGACTTGTCCA TATTCCAGAA GTCTGTCACG CCTCCGCGTG AAGCAGATGA TACGATGTGG 1740 GCATATI-rAC CCGGACACC ACGGCTGTAA ACT~GGTGGCA AGCTTGTTTC TCCTTGCGT 1800 '1TTTCAAGTT CTTCTTCGGA TACGGCCATA GAAATTTCTT TGGTATCTTG GTCAACCGTA 1860 742 ACGATATCGC CCGTACGGAG ATAGGCAATT GGTCCACCAT CCTGAGCTTC AGGAGCGATA 74GTCCAACAA CCAAGACCATA AICTACCACCA GAAGAAACGTC CGTCCGTCAA GP.GGGCCACC T'rATCTCCCT GACCTTTACC AACAATCATT GAAGAAAGTG ATAGCATCTC AGGCATACCA GGACCACCTT TAGGTCCAAC AAAACGAACA ACGACTACAT CGCCATCAAC GATTCATCT GTCAGAACGG CCTGAATCGC ATCTTCTTCT GAGTCAAAGA CCTTAGCTGG CCCAACGTGA CGACGCACT'r TAACACCTGA TACC7"rGGCA ACTGCACCGT CAGGA1GCAAG GTTCCCGr-rC AAGATGATAA GCGGACCATC CGCACGTrr GGATnrTCAA GTGGCATGAT AACTIrTTGG CCTGGAGTCA AGTCTGCAAA GTCAGCCAAG TTI'CAGCTA CAGTCTTACC AGTACATGTG ATGCGATCTC CGTGAAGGA.A ACCATTTGCC AACAAATACT TCATAACCCC AGOGACACCA CCGACTTCGT AGAGGTCTTG GAAGACATAC TGACCACATG CTTTCA6AGTC GGCCAAGTGA GGCACACGTT CTTGAATCGT ATTGAACTCC TCAACTGACA AGTCAACATT TGCCGCATGG GCAATrGCCGA GCAAGTGAAG AGTGGCGTTT GTAGAACCAC CGAGAGCCAT CGTTACAGTG ATAGCATCTT CAAACGCTTC ACGAGTCAAG ATA'rCTGATG CTTTGAGACC AAGTTCCAAC ATCTTAACAA CAGCACGTCC TGCTGCTTCG A'rATCTTCTT TC'IrATCAGC TGATTCAGCT GCGTGAGAGG ATGACCCTGG CAAACTCATC CCTAGAACTT CGATAGCAGT TGCCATGGTA *6 46 66@ S. 6.
1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 TACAGTAT ACATACCACC TTCACGTCCT CAGCTGTCAT ACCAAGTCGA TATCTTTACC ATAGCTGGGA TATCCATATT' ACAACCACCA GGCCCAGGGC ACOCATTACA CTCACCGTGG TTCCATTTTC CGATACCTTC ATCAAGATTT CCCGGTGCAA TAGTTCCACC AGCAATAGCA ATCATACATC CAGGCATGTT
TTCAAGACGT
AAAGACAGAA
ATAGGCGAAA
CTTGTCACAG
CCACCGATAG CGACGAAGGC ATCCACGTTG TGACCACTCA GCGATGATGT CACGAGATGT TAGAGAGAAA CGCATACCAG TCCGCTACGG TAATGCTTCC AAACTGTACA GGCCAAGCGC TTAGCCAGTT TCCCGAAATC ATGCAAGTGA ATGTTACATC TAGCCCCCTC GATGGAGTCC GCGTTCCCAT AGCGATCCCC CTGCAGA'N'T GACACCTTCT GTGTATTTTC CGCCCAAGTC GAAATCACTC CCACAATCGA TGTTTCAAAG TCCTTATCTC TCATACCAGT ATAGCACGGT TAGGTGA'Tr AACCATGCTG TCATAAATGC TACTGCCGTG AATTCAGTCA TCTTATCCCT CCCATTTCAG TTTTTACTAT TATACCACAA
CCCACGAAC
ACGTTTATCT
TTTTCGCATG
AAGAACAGAA
ATTAAAAACA
GCTTTCTTTA
CATCTATCTT
TAAAATTCTT GAATTTTCAG AAAATTCTAT ACACAGTCA AATATTTA.AA *ACAAAGCGGA TTAGTGCACT TTCTGATCAC CAGAATA'rGC TTTTTAATCC AATAACGTAC TGTAAT'1'r ACAGAAATTC TTTCAALATAA GTGTATTTAA GCA'rTATAA.A TTTCTAGAAC CTTCTCTTTT ATATTCCATT CACTCAAACC 743 ATACTCATTA AGAAGATAAT CCATr -rCCC TACTTACCG AATCTTTCTT GAACACCCAT CCATrGAAI' ?TG=ATTC CATCATCALGA GAATAATTCA CATAAAGCAC TGCCAATTCC ACCTATCTGA TTCGOTI= CTACAGTAAA TA'rATTrr CCACI=AACA TTG?1'TrAT CrG~rCTGGT ATCGG?TTGA TTrCTAAATAA ATCTATCACA CCTACTGAAT AACCTAA'IT1- Ar.ACAGrTCA 'rCrCAACTC GAATACTTGG ACGCAACCATT ATGCCAGAAG CAACGArrAC AAGATCTTCA CCATGCCTrA ACTCAATGTA CCTTNTALGAA AAATCTTCTC CACCTTGATA 3720 3780 3840 3900 3960 4020 CACAGGAACT GGACTTrC TAATTGTTCG CTGGTTCAAT ATTTCACCAA A'TCCATATC AATTAAACGT AACAATCCAA TTTCTCAAA CGC'rACTCCT GCATC'rGATC CAA'rCACAGT AAATAATTGA TCAAATACTC TTCGTGAAGC AAACCCCTGA ATAGACAAGC CTGCTGCAAC A?'rCACATAA CGGTCTCCAA AGTCCTTTTC ATCGGCTTCT AAGACTACTA TATCAGAATC ATATACATGC CGTAATTCrr TCGTACTrTCT AATCT'L'CTA CAACTGAACT TAACATTTGT AATATA=r~ AGCCC= A AGTCTAATCT ATCACTT-rCT 'TCCAAAATCA rTTATTTAGG TGGCATATGT GTTCCACCA'r TCATCTCTGC GGCATCCAAT TGTGCGTATC CAACAGAAAT AAAAGGACCA AATGTATGA.A GATA.GCTCT CCCGACCATT TCTGCTCCA TAATCCCAAC AAGArrATTA GTAGCCATCG AACTTGACAA ACTwrrGATTA GCCTCTAXAA GGAACTC',TCT CATC-ATTCTG TT'rCCTCCAA TTCCTGACTT 1-rC'CCTCTA CAGTAGCGCG AACATCATGA TGGATTTCA TTTCTTCCAG CTCTTGA.ACC CCTTGACCIT TAATACTATC CACTTAGGTG ATGAATTA'rT TGAC'rGTTT'r AATTCCACAA TCC~tCATA ATATCTGAAC CCTTGACCCT ALATGGAT TCA A6ATCCAAATC CTGAAAAT-TT TCACCTGGAT TACAAATATC C'TTTGTAAAA CCATCTAATT GrTTTTrGrr AATACAATTA ACTTGGATAA CTGT'rGATGA GAAGCAAACT GTATAGCCTC CCCTCATTTA ACTCACCATC TCCAACAA'rA GCGI'AAGTAT AAAAGGCACT CTCTGACCAT ATGCAACTCC AGTCAACA CTAATT-CCTT- CTCCTAAAGA
TAATACA.ATG
AATTTCTCTA
TTCTACCAAA
ATCATCAACA
CCAACA'rTGT CTr'rCTTAT'r
GCCCCTTGTC
TCCA'rrrGTA
ACTIGTATAGA
TGCAAATATT
4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 ATATCTATGC CTGGCGTTAG ATTTrCTATCA TTTAAAGAAT ATAAGAATTC TT'rGTCAAAC GCTCCTCCTC COTGACCTrr Tr.ATAA'rATO TCTGGAGTCA TI'CGCATTAT TTCACCATAA CTTCCTCCG-r AATCTCCGAA TCCAAGATCA CGATCACACC GTAATTTGGT AAACCATTCA AATAGAGTrGT AAATAXTCTC TATCTCCTGC ACCACCGCTA AAACT-rCTAC TCAATG7-rC TAAGAGTATT
GATAGACAGA
TAATCGGATG
TI'AGTCGCAA AT'NTCTTAA CCCATCTTCT CTATrTTTAC TTAAA.ATCAT CCCTTA-rCC 5400 .744q TCCGr'rGCAG ATGGCNr= AATAAAGGAT ACTCCAAACA TAACTGCTAG AATAAGAACA 5460 AGACCAATCA CAATGCCTGC TTGTGAGCCA AATr.AT'IrA ACATTCCTAA AATAATTCCr 5520 GATAGACCAA AATCTGCATC TGAGAAAGr GATCCrrGGA AACCAAGTCC TCCCAAAACr '5580 GGCA~rAAAA AGACTGGAAG AAAACTGM-r AAAATACCTT GTAAAAATGC TCCAATAGK 5640 GCTCCACGAA CACCACCAGA 'rGCATTCCCA ATGACACCTG CAGTCGCTCC ACAGAAGAAA 5700 TGAGGCACAA CACCTGGTAA GA'rAACA.ACC GTTCCTGAAG CAATCATAAT TACCA'rACrr 5760 ACTAAACCAC CAACAAAACT ACAGATAAAT CCAATTACAA CTGCATTGGG TGCATAAGTA 5820 TAAACAATCG GACAATCCAA AG.CAGG=~ GAArrAcGTA CAAGACGCTC TGAAATACCT 5880 rIrAAAGGCTC GAACAArTrC GCCCAAAATA AGGCGAACAC CTGCTAAAAT AACAAATACC 5940 CCTGCTGCAA ATTGACCTrGC TAATTGTAAA GCArAAACTA GACCACTTGT ACCACrACTG 6000 ATTTCTr= CTATATATTC TGACCCTGCA AAGATAGCTA CAATAATGTA AATAACTGCC 6060 ATGGATAAAG TAATACTAAC AGTACTATCA CGTAAAAAAG CrAAACTCTr TGGAAATTTA 6120 ATGTCCTCTG TTGA~rrTGA TTTGTCACCG ATAAGGCTAC CAGTAAAACC AC'rCAACCAA 6180 ***TATCCCAAAG AACTGAAATC ACCTAAAGCT ACCTTGTCAT 'rTCCAGTTAA TTGAACCATA 6240 ***TA7"TTTTGCA CAAATGCrGG GGAAATACTC ATAATAATAC CGAGTrGCTAA TCCTCCTAGT 6300 .*.AAGATGAGAG GCAAGCTAGT AAACCCAGCA ACTGATAAAA 'rGACCGCAA'r CATACATGCC 6260 ATATATAGAG TGTGGTGCCC TGTTAAAAAA ATATATTTAA ATCGAGTAAA ACGAGCGArr 6420 AAGATATTGA ACACCATGCC TGCAAACATA ATCATTGCAG TAGCTGAGCC ATATGTTGTT 6480 AAAGCTACAG CTACAA'rTGC rrCATTATTC GGCACAACGC CAGATAAA'rG AAAAGCATGC 6540 'CCAA.ACATGG TACCAAATGG ATrrAAAGAA TTI-rGTACAA TTCCTGCACC ACCAGATACA 6600 ACTAAGAAAC CAACAAAGGT CTTAATTCCA CCTrrAATAA TATCAGGTA.A T'rTC7"rC7rC 6660 TGAAGAACTA ATCCTAAGAT TGCAATTAAA GC'TACTAAAA TAGCTGGTGT ACTAACAATA 6720 *TCCAATATGA ACTTCATCAT GACGCTAGCC TCCTATATAA GTCCT'rT1-C TTCACAAAIT 6780 TTAGTAATT A ArTCTCGTAG TTCATCCATA TCAATAATAC TATTTAAGAT ACGAACATCT 6840 CCAAGATGAC TAGCTGAATC AGCTAGATCA CGACCAACAA TCCAAATATC AGCTGCATTT 6900 **GGATCTGCTC CACCTAAATC ATAATGTTCA ACTTCTACAT CCGAAACATT CAAATCACTC 6960 ***AATACAGATT CAATATTCAT CTGTACCATA 
AAACTTGAAC CTAATCCTGA ACCACAAGCT 7020 GTACCAATTT TTAACATTAT CTAATCCTCC TGTTTAA'rTA TCAT-IrMAAT GTCATCATAG 7080 TTTTTTGATG ATATTAAAGT TTG.AACATGA '1-II-rATCTC TrAAAATTGT TG=rAAATGT 7140 GACAAAGCCT TTAAATGACT CTCATTATCA ATGGCTGCAA TACAAATCAA CAATCTTACC 7200 K?1'lC M rCATATCCALA TrAAATAAATC GGTrCI'CCA AAACTAACAT TGACA~rCC ATTCATT-CA CACCTTCATC TGCCGAGCG TGAGvGAATTG CTACTCCTT CCCTAATT, ATAAA.AGGTC CAAACrTTC TACrTTTGA ATCArrGCCT CAGGGTAG?= CTCAGrrAN TTATCTGAT CCAAAAGCGG 7rTAGCI'CCT AAACGAATCG CCTCCTNCCA TCCTAA7'r- TGCaAACTAA CCrGATAGGT 'rTCITTGGTA ATAAGTPGrr CTAGCACTGG TACAATrrCc TTTCTATCATrITT TrGGTA AAGATAATTC TTTAACCCA ATCTTAATTC CAAT'rC7rG GTAATAA'rrC CATATCTT' GACAATA'N'C AGGATTTGTT CAATCrCAAA ATCTCCATA( TCTAAATTCG GAAAATCTTT TAACACTAGT TCTACTAG?'r GTArTGC?1'G CTCTTCAGTC ATCATAACCG AAACTAGATA ATTTGGCT~r TC2'TTCCA CC TI'ATCCT AGAAAAAACC ATATCATAGT CACTACTAGC TTTCACCTGT AAATCATCAA6 'rCTTGACGT TCC1'ATAAAC TCAATTTGAG GAAATAATGC TAATAGA'rrC TCTTTTAACA TCAATGAAGA ACTAACACCA TTrAGGACAAA TGATTGCrCC TTrATACCAT ?1'TTGAGGCA AAGTA'TCTCC TrTCTr-rAAA TAACCTCCGA AATGGATAAC AAAATATGCT GTTTCACTAT CAGCrATGGG ATTGTCAA'rA GCGTCCATCA AGGGCATCAA AGAATCTTTG ACTAATTCAA ATAAATCAGG ATAATGTTCT TTAACATGCA ATACA'rA'TC A'1I-GAACTA CGTACGCCGA ACTTTAATCT ATAGTAACCC GGTATAAGGT GGCGGCGAAG AT'rTrCTC AA'rCCTTCCC =TG=TAAA ATGTA.ACAAA GAAATA'rCT'r CCAT-rCTAC-r TATAATAGCC TCTrGTAATT GATTAAACTA AACCCGCA ACATCTACTr CACC'TTCAAA GCAACL-rGAT AATAAAACGG TCATATAGCC ATAATCATCC TCAGAAAACA CCGTATCTAT AATTCCCAAA TCAACCACTG TATCCAATAA ALATACTGGLT ATATCTTCAA 'rAACAGGAGA TACTAATGTC TCTGAAAGAC ATACTCTTTC AACATCCCTT TGATACCTAC ACAGAATGAA TACTAAACCG AAAACGTAAA CTTTrAATTG ATTAACAATA GGTACTAGCT GTAGCTTCTC ATAArAATCT TTAACTACCT CATCAATCAA ATCATAAGTT AATGAATACC CCCAACTGGA TAAAACATAA 'rCCAAACCCC AAATCCCTAT GGAGGATTCC ACCAACTCAC TAACCATTTG A.AAAGCTAAG CGGTGCTTAT TCCACTCTGA ACCGTGTAAA GTALTAACCTr TTGCTCTACT GTACCCTAGC 'rCCAAA'rCAT TATCTAACAT AATC?1TTCTT AATGATTGAA TATCAGATAA CGTTGTATTC TTACTTACTT TCAAAAACTC 'TTCCTAATGA 
CTATTCGATA TAAAA'rCTAA TCCGCAAAAA GTGTAAAGAT AGATT-AAAGC TAAGCCAGCC GACTTTGGTA AAACCAATTC ATCCGACTTA ATAATArTCT TCA.AAGACTG CTTCCTACGA TTTGATAAAC TATAGCGACC TTGCTTrTTTA TCCAGCACTA TCCCTTTATT AGCTAGATAA
T
7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940
GGCACTAAAT AATCTATFCC ITTTGACT TCCITATAG GTAAGCTCAC CITAACAGAT AATTCATATA ACGATAGCTC ACAATGATCC ATCAAAGTCA TCAAAATAAC TAGI'GCTCTA
TAATCAAAC
INFORMATION FOR SEQ ID NO: 98: SEQUENCE CHARACTEISTICS: LENGTH: 8654-base pairs 'TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98: CGAGACAACA AGATCAAGAA AAATTTGCCC TATCGTrrGT GGCCTGCA AGTGrAGCAC TTCTTGCAGC CTGTGGAGAA GTGAAGTCTG GAGCAG'TCAA CACTCCTGGT AACTCAGTAG AGGAAAAGAC AATTAAAATC GGGTrTAACT TTGAAGAATC AGGT1?CITA GCrGCATACG GAACAGCTGA ACAAAAAGGT GCCCAATTGG cTGT'rGA TGA AATCAATGCC GCAGTGGTAT CGATGGAAAA CAAATCGAAG TAGTCGATAA, AGATAATA-AG TCTGAAACAG CTGAGGCTGC TTCAGT'rACA ACTAACCTTG TAACCCAATC TAAAGTATCA GCAGTCGTAG GACCTGCGAC ATCTGGTGCG ACTGCAGCTG CGGTACGAA CGCTACAAAA GCAGGTGTTC CATTGATCrC ACCAAGTGCG ACTCAAGATG GATTGACTAA AGGTCAAGAT TACCTCTTrA TTGGAACTTT' CCAAGATAGC TTCCAAGCAA AAATr'ATCTC AAACTATGT'r TCTGAAAAAT 'rAAATGCTAA GAAAGTTGTT CTTTACACTG ACAATGCCAG TGAC'rA'CCT AAAGGGATTG CAAAATCTTr CCGCGAGTCA TACAAGGGTG AAATCGTTGC AGATGAACT TTCGTAGCAG GTGACACAGA CTrCCAAGCA CCCTrACAA AAATGAAAGG GAAAGACTTr GATGCTATCG TTGTTCCTG T1'ACTATAAT GAGGCTGG'rA AAATTGTAAA CCAAGCGCGT GGCATGGGAA TTGACAAACC AATCGTrGGT GGTGATGGAT TCAACGGTGA GGAGT'rTGTA CAACAAGCAA CTGCTGAAAA AGCATCAAAC A'TCTACTTTA TCTCAGGCTr CTCAACTrACT GTAGAAGTTT CAGCTAAAGC TAAAGCCTTC CTTGACGCTT ACCGTGCTAA G'TACAATGAA GAGCCTTCAA CA=TCAGC cTTGGCTTAT G.ATTCAGTrc ACCTTGTAGC AAACGCAGCA AAAGGTGCTA AAAATTCAGG TGAAATCAAG AATAACCTrG CTAAAACAAA AGATTTTGAA GGTGTAACTG GTCAAACAAG CTTCGATGCA GACCACAACA CAGTCAAAAC TGCTTACATG ATGACCATGA ACAATGGTAA AGTTGAAGCA GCAGAAGTTG TAAAACCATA ATAGAAAAAT GT'TGAAATAG GGAATGACCC TDTTGACTCAC TCCCTGTTTC GATATrTAA'r ACTCTTCGAA AATCTCTTCA AACTGCGTCA 9000 9060 9069 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 747 TGTGACTGAC TTCGTCAGTC TTATCTACAA CCTCAAAGCA ACGTCGCCTT GGATTATATA GTGCTTTGAG CAACC1'GCGG CTAGr1CC AGTTrGCTr GAACCTATCA AAAAGTGAGG GAAAACCCTC GGAA1TTATAA G=TCAACAA CTCGTAAATG GrrTGATTCT AGGTAGTGTT ATA'rACCATG GTTTAcGGAA 'rTATCAAGCT CATCAACrTC GATGGGAGCC TrrA'rCGG~r ATTTCTTGAT CAATTrCTTT G=TATTGTA GCrATGC'rAG CGACA=T 
TC~rGGTGTC CCGACC~T'TG CCCCACTCTA CTCGTATTCC T1rTTGAIr CCTAT'rGGAG TATGGAATGG TCTATCTCGT 'rGGTGCCAAT GATTCAAACA GTTCGATA'rG ATTr.GGACC AATTAGCTrA TTTGGCCATT TCCTGATTT TGATGA??Tr CTTACA.AGTC GGGGAAAGCC ATGCGTGCAG TATCACTAGA TAGCGACCC TGTAAACCGT ACGATTAGCT TTACCTTCGC TT'rGGGrrCT TGTTCTGATT GCTCIrATT ATA CTCTCT TGAGCCTTG T~TArrCA TTGAGTATAA ATAGAAAGAG TGAATCTTAT TACCCCrGT TAGCCCrAGG 1320 1380 1440 1500
GCCCATGGTG
CAAATG3AAT ATATTATAr 1560 TCTTTGTAGC 1620 GTGArrGAGT TTCTTCGCTTA ACCGCI'ATTG C.CXTTrr
ACCCGTGCCT
ACAAATGTGC
ATTGTCCAAA
CCAATTGA
GC'TCTrGCGC
ATGGGGCGTA
ATTCCTGGTG
rTGGGATC1'
TCCCTCAAGC
AGTTAATGAT
AGACTAAGAT
TGGGGATCAA
GTGCGCCTCG
CTCCAGCGTCT
CGGCTCTTGG
CAGATTTCCG
CTCG;TATCCT
TTAATATTCT
CAGTCGGAGT
TAAATCTTTC GTTGCCGCAG 'rGGCTTTGTG ATTGGTCTAT TGATGCCATT GTTTATGGAA TGGTAAGAAT GTGAAAGAGA ATGAGTTACTC CT'rTTG 'TAG ACTTAATCTA T'rCTATGTAC TCGTCTCAAC TTAATCG7TTG GGCGATTGT GCCTATGCAG CTTTGGAGCT ATGCTTGTAG TCCAACCTTG CGCTTGAAGG TACTrGGTGG TATCGGAATT DrGGAAAccTT TCCGACIGCCC TCTTGTTGTl'
AGGTGTAAAC
CTGGCTATAG
ACATTTTACA
G77rTCAGG CGCrTTAT GGGCTrTGCT
GGACTATCT
GATCTTGATT GTCCCCCCAG CATCAAG4GAA AATTTAAAAC CTTGATTAGT GTACTGCTTT 1680 1740 1800 1860 1920 1980 2040 2100
2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 ACAAATTGGA ATTAATATTA T'TTGGCTGT ACAATTTTCA C TTGGTCATG CTG=r~CAT TCGG"rCTAAA 'rCACCAACCT ACGGTGCCTT ?TCAGGAGCA cTTGCCTTAC ?rGTCCCCAT TGCGGTAGCA ACTCTGCGTG TTTCGAAAT CCTTACAAAT GGTGCCGCAG GTATCTTAGG TATCCCTATC T-rTA'rCATCA ATGGTGGAAG GATTCCTAAC TTTACAACTT GGCAAATGGT AACCTGAAC TTCTTGCGTA GCCCAATTGG AATCGCTC GAGTCA G GGG?'rAATAC
TTACTTCTTT
TCCTTCAACC
GACTAAA.ATT
GTCGTGATTA
CTCTCTGTTC
AAAATCATCG
CAACCATTGC
GTGAAGATGA
CTTTTGTCTT
'rCGTGCCATTr ACTGCAAGTA TrGCTGGGTC ACTTCACGCA GGATTTATCG GGTCTGTTGT ACCGAAAGAT TACACCTTCA ACTCGGTTCC ATTACAGT TCTCCAAGAT GTTGCTAGTG GATTTTCAGA CCAGGTGGAC AAAATCTAAG AAGGAGGAAC CATTTTGGTG GCTAACAkC G -rGGATT'AA TCGGTCCAAA GTTrTATCAAC CAAGCGAGGG CCTTATAAGA TTGCCTCTTT GATrrAACAG TITrAGATAA ?TTACTAGTT TCTTACGCTT GCTTTGGAAT TGTTGAAAAT 748 TCAACrCAAT CAACGTrG ATTATI'TGM TATTrTGGTGG CGATTGrC GGCTATTGTT CTGGGAATTT TGAATATGCT TGCGTATGAT TATrrACGCT 1rGOCC?7GG TATTGGTAAT TCCTTGGAAC ATGGGAACTG AGCCTATCAC GTTrTCTTTAA AACTAATG GCATTACTTG-A*TAAAACA G=rAACCAAA TGTrGGAGA'r GTGACTCTTG AA7IrGAACGA AGGGCAACTG CGGAGCTGGG AAAACCACCC Tr-rCAACCT 7TTGACCGGT AACAGTAACC CTAGATGGTC ACC"TTGAA TGGGAAATCA GGGACTTGGA CGTACITTTCC AAAATATCCG TCTCTrrAA6A TGrr'rGATT GCTTTTGGAA ACCATCACAA ACAGCATGTT ACCAGCTTTIT TACAAGAGTG CTTTGATTTA GATGGTGATG CTTTCCTACG GACAACAACG TCGTTTGGAA ATTGTTCGTG
ATTCTCTTCT
GAGrrAATTC
ATCAATCTGG
GCTCA'AGCAA
GGTGAAGCCT
AAGCAGTTCG
CCAACGGTGC
CAGGAAAGAT
TAGATGAACC AGCAGCAGGT GTCGTATCAA AGATGAGTTT TCATGGAAGT AACAGAACGT CTCCAGACGA AATAAGACC AATGTCTATG TTAAAAGTTG TGATGTAACC TTTGAAGTTA AGGTAAGACA ACTA'rrCTTC TGAA7TTTTTA GGTCAAGAAA
ATGAACCCAC
AAGATTACAA
ATC'rACGTAC
AATAAACGCG
AAAATCTTTC
ATGAAGGAGA
GCACCTTGTC
TCCAAAAAAT
GCCACGTC'T
AAAAAGAATT AAAGCCTAAA CAGAGACTCT TGCTAAAAAT CCCTTGCTAC GGAACCTAAA AGGAAACAGC CCAATTGACT TCATGTTGAT TGAACACGAT TTGAATATGG CCGTTTAATC TTATCGAAGC 'IrATCTAGGA TGTCCATTAC GGTATGANTCC AGTTGMCC CTTATCGGTG AGGTT'rGGTT CGACCAAGTT GCCAGCTCAG AAAATCGTGG TCCTGGC1TTG ACTGTTATGG 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 CAAGTGGTCT TTCACAAGTT AAAATCTTGA AATGGGAGCT AGAAGCTTTT CTCACCCTTT TTTCAGGGGG GGAACAACAA TCTTCTTTT AGATGAACCA ATATCATTCA AGATA7TCAG ATAAAGCACT TCCAATCTCT CAGGAACAGG AAAAGAACTC AAAACAATCC AGTGGATTGT
CCAGAAGGAC
1-rCrrAAAGA AAAATCGTGA AGAAAATCAA GCTAACT'AA CCTCGTCTTG AAGAACGGAA GAACCAAGAT GCAGCCACTC ATGCTrGCCA TGGGACCCC CCTCATGTCA ACACCAAAAC 'rCAATGGGAC 'rTGCCCCAAT CTTTATCCAA GAAATITTTTG AAGCAAGGAA CAACGGTCCT CTTGAT'rGAA CAAAATGCCA GACCGAGGA'I ATGTACTGGA AACAGGGAGA ATCGTCCTAT GCTTCATCAG AAGAAG'rCAG AAAAGCATAT CTAGGTGGCT TTTAGTCGGC AGATGGAGAT TACGAAGTAA TCATCAATAT
AGTCCCGGG
AAAGC 'TAAT
AGCCCGATT
AATGTAAG-CG
AGAAGTCrCA
ATAACAGTAT
ACCTTTAG
TTCTAATAAT
CAGGCTCTT
TTGATAGAT
TGGCAG7*rAA CrCATGCAGC
TCGG;TAGATT
TGAAAAAArC rGCTAGCT rrACAAAAAC
AGATTATG
AGATTI'GATG
ATCGAAAATG ATCAATTAGT TrGTTTG=T TCTAAAGCAA CAAGTCTC TATCTATCAG AAAGATGTCA TGATTCGCGA TGTTGTCACT ACTTATCTGA TGTTGAAAAA TAAGATTAGT TACGGAGTTA T'rACTGACCG TGACGrTrC GAAGAAGCGA TTCGTGTACG CTTNGTTACA GTTTCTTGA TTGTAGAACA AAAT'rTAAT GATGGTAAGG TGATTATCGA AGTGCAAATC CAAAAATTTG AAGCAA.ATGG TAT'rCAAGTG TTGTAAGAAG CGAAGCCCAA ACGCTTCTTT GGAAAGAAAr GATAAAA'rAT GCTATAATGA 'TTCAGTAAT TCGTCAGAAA ATTGACGCAA CTCTTTGACC TCGAACCGCT AGAGGAAGAG CCTGA7-=T GGAACGATAA TATTGCGGCC 749 GAGATTGCAA ACAAATCTGC ATCTACATTG GAATCGAAAAA ?TTCTTACCr TCATTCACAG TATrcxAcT rrlCTGAATT TCGAAAAACA ATTGTATAAT AGCGATAAGA ATAGAAAACG ACCCGCAAGO TACTl'TATAT TAGTCCAGAT ACGAGCA.AG GTTTGCACCG TCTGCCTGTT ACTGAGGGAA CCAT7'GCACA AGCAAGTCCA ATGAATTA'rC ?rCTGAATAA CACAAAAGTA GTCTCAGGCT ATGCT A=TC AGAACATGCA ATTCTCCCTG TCGTAGATAA CCATCAACTA CAAGCCTTTC TTGAAA-TrGC ACG=~ATCGC GAAGATGAAG T G--:GTTCT TGGAAAAATT ATrCCCATA CAGTCAATAT TCCGCCTAAG GATGGATCA.A TTGAM-ACC ACCTTGAA6A GAAGAAA'rCG
TTT-CATCAA
AATAATGTAA
ATCCTGAAAA
ATTGCCATCT
CAAAAAACGT
CTCGCACTTC
AGGG;ATTAG
AAAAGGACTA
ATTAGCC=C1
TGGAAAACAA
CGCAAGAATT
AGGATGAAGT
TGGTAGCGCA
TGTCAkGAACC
CTGAGGCGCA
AAGGCTTTAA
TAACTTTwrATC
AGCAAAAGTC
AGCAAAAGAT
TTTATGGACA
TTCAGGGGGT
GATGACAGAA
AAATGAATTA
CGAAATTTTA
GTTAGCCGAA
TTATGACCAC
GCACGGGT
AGTGGAAG
A?1'TGAAGGG 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540
AAAAACACTT
'rTGGAT'TT
CTTGATAAGA
AACAATGCCA
GATATGTTGC
TTGGATTACC
CCTAATGC
CCATTTGACT
TTrGGATGATA ACAATACCTT CCA'rAAGATG GAAGAG'rTC TGCCTGAAGA CGACTCAGTG CATCATGAAC TAATGACCAG CTACGAGATC ACTCTACTCT TCTTGGAAAT CCATCCAGGT TCTGGGTA TTCGTATGTA TACTCGTTAT GGTAA'rGCI'A AAGCAGGTGA TGAGGCTGGT ATTAACTCG ATGGTCTCCT CAAGTCAGAA ATGGGTGTTC CTGCCAAACG TCCCCATACC TC71"rCACAT CTATTGAAGT GGAAATCCGT GAAGATGXA ACCCTTACT CCGAATCTCA CTGTAGAACT GATGCCAGAA TCAAGATGGA TACCTTCCCT 750 TCAaGGGTG CCGGTGGACA AAACGTCAAT AAGGTIrrCAA CAGGTACG TTTAACCCAC ATTCCAACTG GAATTG?7GT CCAATCAACA GTAGATCGTA CCCAGTATGG AAATAGAGAT CGTCCATGA AGAT7TTCA G-GCTAAGCTC TATCAAATGG AGCAAGATAA GAAGGCTGCG GAGGTAGATr C 'CTCAAAGG TGAGAAAAAG GAGATCACTr GGQGAAGCCA AATCCG=c TATGrrCA CCCrATAC TATGGTAAAA GATCACCGAA CTAGC7"rGA GGrGCTCAG GTAGATAAGc TTATGGATGG GGACCTAGAT CCT=TATCG ATGCrrATCT CAAGTGGCGA ATrAGCTAAG ATAGAAAGGA ACTCACA'rGT CAATrATTGA AATGAGAGAT GTCG=rAAAA AATACCACAA CGGAACAAC'r GCTCTACGCG GTTTCGGT TAGCG-TTCAA CCWGGGAAT 6600 6660 6720 6780 6840 6900 6960 7020 TTGCTTACAT CGTAGGACCT GTGAAGTAAA AATCGATAA.A AAAAGAAAGA TCTCCCGCTT TGTTACCAAA GAAAACTGTC ATCGCCGTA-A TATCAAAAGA AGGTTCGTTC TTTCCCAA.AT GTGCAATTGT AAATAATCCC TCAGGAGCAG GGAACTCAAC TTTTCGT GGAAGCCTAT CAGTTGCTGG T=rAATCTG CTACGTCGTA GTG1rGGGTr TGTCTTCCAG TATGAAAATA 7TGC=ACGC TA'rGGAAGTA CGAGTGATGG AAGTTTTGGA CTTGGTTGGA GAACTCTCAG GTGGGGAGCA AAAGTATTGA TAGCTGATGA CGGATAATTC ATGGGAAATT ATGAATCTCT TTTTGATGGC GACTCATAAT AGCCAGATTG TTGAAAATGG CCGTGTCGI AGATrTTTTC GCCATTTAT'r GTAGCTGCTG TCAG-rTCAG'r
CGTGACGAAT
TGAACCTTA
CATGATrACT
TGCAACGGAT
'rAAATACCTT
CAAAAGGAGA
AAAAGTI-rGA TT~GACCT'rGG
ATTGAAAATA
ACAATTGAAA
'rTGAAGAACA
AA.ATTAACCG
TATCATGCCT
ACAGCGGATT
GCCAACAGGA
TAAC'/rACAA
GCGCCACCGT
GTATGGATAC
AACGAAATGG
TCGCAATATT
ATGTCCGTGT
TCTCTGTA'rC 7080 GTTAAGATCA 7140 GATTATAAAT 7200 ATrCGGCGAAA 7260 TTGAAGCATA 7320 GCGATTGCGC 7380 AATCTGGATC 7440 GGAACAACTA 7500 GTCATTGCCA 7560 GATGATTAGT 7620 TTGGATGACA 7680 TGCATCTGTT 7740 AGTAGTTTAT 7800 A=TTrCALATA CAGCGAAACT AGCTACAGAT ATCCGAAAGG ATGTGGAAGA TAATAGTCAG AATAATGACT ACCACAAGCT ATATGATTCT ACCTTTTCAA GTAAAGAAGA ACAATATGAA AAAATCTTTG AAGGAGATCC CAATCCTCTC CCALAATGATG TAAAAACTAT AGCCGAAGAr CAAGATGGCG GTGCCAATAC AGAAAGAC'TC AAGAAGGTCA AACTGTTACA 'rGTCTACGCT TAAAAGTGTT AGATAATGGG3 AGATAACTGG ATATTGTAGA GGCAAACACT GCTAAAAAAA TTCAAGCTGT CTC'rcACGGT TTCAACGrAG CTTCATTTAT CCGTGTTTG GGACTACGGA TTGCTGCTTT GTTAATTTr'r ATCGCAGTTT TCTI'GA=TC AAATACCATT CGTATTACCA TTATTTCCCG CAGTCGCOAA ATTCAAATCA TGCCTTGGT CGGAGCTAAA AACAGTTATA TCCGTGGACC G7rCTGTTA GAAGGAGCCT TTATCGCTTT ATTGGGAGCT 8340 A, 411jj 11 loojoo 11 IV, -11 751 ATCGCACCAT CTGrTGT TCGTTGGTAG GGCAAAATC' GCCCTACTAT TTGTGA'rTGG CTTTATG-?r TATCAAATTG ?TACCAATC TGTCAACAAA ATCCATGATT AGTCCAGATT TATTTAGTCC GTTGATGATT GGTTTTCATT GGTTCATTGG GATCAGGAAT ATCCATGCGC 8400 8460 8520 8580 8640 8654 CA'TCTTCA AGATTTAC-GT AAAATAGCTG CTTTT TTTGCTACAA GAG rTTGA AAAX2AGATGC GCAG CCAGAGAAGA. CTTC INFORMATION FOR SEQ ID NO: 99: SEQUENCE CHAR~ACTERISTICS: LENGTH: 19718 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear PTGAG G~AGATTGTAA AATCTCCTTT kGJAAA AGAGCTTCCA AAGJ1AGTCCC 0 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99: TGTCGCGTCA. 
AAATCATTAC TATGGCTATG TATAGCCCTT ACTATGACTT GGCTAAACAC GTTCGCPTTC AAATTTCTAG GCTCACGCTG AAACAGTCTC CCAGGCTGTT CACTCCCGAA TCCTAAAATC CT 'CTTGATC GCTTTCACAT TGTACAACAT CTTAGCCGTG CTATGAGTCC TGTGCA*TGTC CAAATCATGA ATCALGT7NCA TCGAAAATCC CATGAATACA AGGCTATCAA GCGCTACTGG AAACTCATTC CCCTACTTTT CGCATGCACT AGAAGACTTG AAACACCACT AGACCCTGAG AAA= rTCG TCAGACTGTC TTTAAAACCT ACACTATTCT AATGCCAAAC TGCCTTTGGT TTTCGAAACT CAAAAAAGAA AGGACGAAAT TTGACAAAGA GCCTAATTTC TAGTCCTT T rATAACGTG GATTTCTACC ATAGATGGCT CAAGACCTTC TTACGTTCCA AACAGGATAG CCGTAAACTG AGTGATA.AGC GATTTTATCG TAACAAATAA AGAAATTCTT GACAAGATTT TAAGCTATTC ATCAGATCTA TCAACTCTTA CTTTTCACT TTCAGAACAA GACTCATTGA GGACAATCTG AAGCAGGTTC ATCCTCTT TTCTCAAAGA TAAAGAAAAG ATTATCAACG CCCTTCAACT TGGAAGCGAC CAATAATCTC ATCAAACTTA TCAAGCGCAA ?TGAAAACTT CAAAAAACGG ATTTTTATCG CTTTCAACAT TTGTCCTTTC TCGAGC'TTAG CTGA CT TCAA CCCACTACAG CATAAAAATT GACATGGAAA TTATAAAACC ATTACTACTT CCAATTCGGC TTGGTTCGCC CAAACATAGT GACCTGGACG TATCAGTCTC ATAGTCGTGT TGACTTGGAT CG'rAAAC CTT AGAT 'GGATC TGGGATTGG ACCGCTGAAA GCAAGGCTTG ACTATATGGG TGAATTGCAT TCTTAAACAA TTCTI'CTGTT TCI'GCAACCT CTACAATAAC ACCCTTGTAA ATAACTGCGA TACGATCrGA GATGAAGAGA TAGrCAGGC GGCACGTACA GAAACGTCCP.
CA74GACCAAG GCACGGGCAA CG.AGCTC?r.
AGGCTGAAAT
TACCGATACG
AATAAAGCGA
T1GGAAn= Tr.GCTCATCT TrACGTIGA ACAACCC.ACA ACTC-ATGGG.C TTGAGCAAGT TCAAGACTTG GCAATAACAA AGTCTGGTTG cmCcfiGAG AT1-cAr.AGG GrAACGAGrC AAGTGCTCAG CPAGAAGACC TACTrCACGG ATAATA~rT' GAACrrCTC ?1'TACGTTCT TCT-rCATCCT TAAATAAACG GTGATTGTAA AGACCTI'CAG AAATAATATA ATCAACAGTC GCACGTTCAT TCAALACrTTC GGCAGGGTCT TGGAAAATCA T-CTGGATTCG ACGAATCAAT TCCGCAGCTT GTTCACGCGA TTTCTTACCA TTAATCTrT GACCA'rCAAA AATGATATCT CCATTACTTG TATCA~rAG ACCGATGATA GCACGACCAA TAGrCr-TT CCCACTACCG GACTCACCTA CAAGCGAGAA AGT1rTCTCCC ?TGTTGA'rAA AGAAGTTAGC Al'rTTTAACC GCGACAAACT TCTTACTTCC 'rrCACCGAAG GAAATTrCTA AATC7rTGAT TTCTACTAAT TMCAGACA Tr'rCCTTCCT CCTAGTCAGC CAGA'GGGCA 99 49 9 9 .9 Cl.
T'N'CACGGAT CTTATCATC CC'TCATGAAG AAGCCAAGTT Tr'rGTTCGAA GTCAATCTGC GGTCAGTATA AAGTGACGGA CAAGCTGAGG CAAGCTAGAC CTrCCTCAAC CGTTCCATAC CAATAC?'TGC CACCACACCA TTTGTAAAGA '1TTTAGCAAA TTGGCTCATC ACAGA'ICAAG GTTGACCCAT TCC'TCCAGAA GAATG~CCAAc cTTATTCA'r CTTGGTGr TACAA'rAACT.
TAGTCATTGG GTCCTGGAAG AGATTTGCAA rCACACCTGG T7TTCTACT TrAGCCCAAT GTrGTCTCTGA TACI'GAGAAT AT'TGCGTAGT CAGAACGCAA GGCAAA.AGCA GGTGT'rCCTG GGATTGAGTA AAGATCCCCT
AAGAGACTCC
TCAACGATTT
AGGTCGTGGG
TCAATAATCT
ATGTATATGG
CTCCTGCATA
'rAATAAAGAT
GAGCTTGAAT
ATGGCGAGG
CATAACCCCT
TGTGTGAAA
AGTTACATCC
GGCAATAGCA
AAAACGTCTA
CGCTTCTTTA
TI:TTTTAATG
AATCCCATTT
TTCGGAGCAT
TGAGGAGCTF
TCCCCTTTCA
T'rATCATCAG
TCATAGAAGA
ACCTTATCCG
TCATACTCGT
AAGGCAGTTG
ATAACGATAC
TCTGCGTCI!G
GCTGTTTTTC
GGGTrCCAAAC 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 *9 99 9 9 9 9 9999 99 99 4 9 ACATCAGGTC GGCAGGCALAG TATTGGAATG GGTATTCATT TAGTCAA'rGG
TCTGTAATCT
ATAGTCGCAA
CCAATTCTTT
GACTACCAAT
TCTTAGCACC
CACGGTACTC
TCTNrGTCAA
CGACTAGTTC
TGTGAGAAGA TAAAGCTGT'C AAGTCCTGAC GACCATTTTC TITCGAGCATA CCTGTGAACG ACTCACCTAC CAAGGCTAAT ACrwCCCT ACGAATTTG;T TCCCAATCCT AATACTACCT TGGGCAATAC AACAGATTTA CCTGATCCTG AAGCGAAACG CCGCGAATGG CTGTCAATAC TTTGTCACGA ACGTCAAATT CCACGACAAT ATCGCGACCA GTCAAAA'rTA CA7TTTTTTC 7r~rGTCATTr TCTACTCCTA TCTATGTGTA CGTGGATCAC TAGCATCCCC 753 TAAG?1-1-rGA CCAACTACGA AAAAGCGACAA CGATACCAAG CCAGAACAAG TAAGCATTG.G TT-rrACGTT TGTGAATAA ACTTGGCACT GTAATCGGTA ATCCAAGACC GAAGAAAGAC AAAGCTTrGA A~cATITGAG TcATGGTTGT cAcALATAACA AcACJ~GGG TCAATGCAAT TCCGAAATCA AACGACCCAA AAGAAAGCCTr CGTATIGAGAT GATACCAATT GAGGCATrGAT CGTGACCCCA AGTTGTATTc GCAATACCAA TCCATGTTGT
ATTTTTGGCA
CAACTCACGA
TACGCTCATG
AACAATCAAA
AACTAT~T
TG;TCGCAATC
COATTTACCG
ACTAAAGTCC
GAAACTTATC
CATAAATTGT
ACAATCTrCA 'rAGCGCAAGA AGGTTGGTr TCCCAA6AGTA ?rrGCACACG GATCATC6AAG GCAAAAATCA GATTCCAGAA TICCAGC'TCCG ATTGAGTAAG TCAAGACAAT AGAGGrGGGA TG-TTGAGAT GACGrrcrrAA ACTCCATCA TCACACGGTC GAAATACCCC AAATACCACC GACAAAAACA CCGATA.ACCA AGTTAATCAC ACAGAPATGA GGATrGAGT ACGAGCTCCG AACCAGACAC CGTCAAACAG T'rACTGTCAG TACCGAACCA ATGCTCCGCA TTTGG;CTTGA TATAACGAAC TTTACCTTGC TGACATCATT GAAATCAAAC rA.AAACA TTGGCTAGAT AAAATGATGC CTACCAAGAT TCCCAACATG ACTACAGTTG A~rwMTCTT TTAAACACTG ATTTCCAGTA AGAATATGCT GGCCCATCAA TAGTTTCAGA GGCAAAATCG TCACG7=TA CAAACTGA.AA CCTCCTTTCT CAGTCAATTT AATACGTGGG T1-rTTrrA TCCATTCTAG ACA'rrATTTG 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080- 4140 4200 4260 4320 4380 4440 4500 4560 'rCAATAATAG Soo.* 0:: ACACGTGAGA AGATAGAAAT ACATGTAAAG ATGAAGACAA TTAGATGCTIT TTACAGAGTC AATCAACATT TTACCCATAC TCAGTAAGCG TTGCACCACC GATAACCCCA ATAATGCCAG GGAACCATGG CATTTTTAAA GATGTGTTTG TTTGAA.ATTT GCACCAGCGA AACGA.ACAAA GTCTTGAGAT TGCAAGTCAA ATGGCTGTAC CAGGAGCACC CAACAAACCA AGGATGACTG CAATCTCCAC CTCCCAAGAT AGGGAPATGAA TCTGGAAGGG CCAACCATGT AAACCAAGGC AATCcrrGGA AGAGCAAGCA GACAGGCTAT CAATCCA.ACT CTTCTTGAA6A CGAGCCA'rGO AGAGCATAGG CAACAACCAA ACCAATCAA.A CCAGTAATAG TCATCCAAAT ATCTCCCAAA GACCAACGAC CATAGAGTTA Clr0GGAAGGC GAAGACTGTT CAGGAATTCC 'rGAAACCAC CTTTTCAGA CAAkACCTTr TCATGrA.ACG ACGAATCCAA
CTGGTAAAAC
CAATAGATGA
AGAACGTCA.A
CTGAACCAAG
CAGAGC'7GAC
GTAAGAACGC
'rCCAATCAAT AGCCCCTCI'1
TGGCACGGCA
AATCATAGAT
GCTAGC'TACT
TACAGAAGAC
ACCI'GAGTA
GCATATTGGT AATTACTC AGTCGCTGTA TAAGGATCAT CTTTCCCATA TCACGAGAGT CAGCCTGACT AGGTGACTTG TAGGTTCTTO AGTAXATATT GTTTTCTTAC CTGTTGGGAA CTGAACTTG C CAGTTTTGG T-T-GTCCTTG
ATAACCTGAA
ACAAAGTT
GTTCCTGAAC
TTCAAGTCTG
GCATAGAACT
CrrCAGAA
ATAGCTTTAT
'rCTTGCAACT TCATAGTTA'r AAAATCAATr
AA.ATCGA.AA
GAACTGGTGT ATTAGCATAG GATGAACP.AA TGGAACTGA 754
GTTGGGTAAG
CTGTTAAAGT
AGTCACCTAA
ACAAGAGATA
TTTCAAAACG CCACCAATGA CCATCCGATA GCTGGATCAT
GATTTTCAGG
GAAAAACACG
ATTCTCCCAA
TGGTCGCATT
C ~rrAGTATC
CACGTTTATC
rrCGAGGAAC
CCAATCACCG
TAAAAATCC CAAAAGAACC TTCACATGAC 'rTGCCAATTC CA'TTTCAC GAGCTTTTTC TA'TrGAACT AAACATCTGA ACAATTCGTG AAAGCACTGG CTATCTGTCA ACCATGCTTG GTCTCTCTGG CAGCTTCATC GAAGGGCTA'r TTGGATTATC TTTAAAATAT CCAGGTAAGT GATACA'rCCC AATCCTCAGA TCACTTGTCA TTTGTTGAAT GTCTTGGATT 'rrATTGTAT GTCAATGTC TTCAAAAA?1' GGAATTTCAC GAGTAGCATA AGTCCAACCA TGACCTAATr GATTGATGTA TGC=CACT GTTACAGAAG AATCCATGCT GTAATACTCA ATGTAGCCCA TACGCTCAAA AGCCGTI'GTC GCAATwrMAT TATAGTTACG CAAGCTATAG ATAATCGTGT AGGTCAAAGT CAAAACACGC ATAAAAATAT ATTTT!=CAT TTCTCCTCAT GGAGAGAAAG TTCTATTAGA TTTTTGAGCT TTCTCATTTG ATTCAGCTTT ATACTC=rCC TTAGTCACCA CTrTATCTTG CCCCTTAGAG CCTGTTTGCG CAGAAGCTCC TGCTGCACCA GAAGAAGCCA TAGCAGGAAT AGCCGCTGCA TATTTCAT AACGGACATTr AACTAATTTA 'rCGTATTCTT TCA-AACCAAC AAATCCTAAA TATGTTTTTG TAGTTTCACT AGATGGGTCT TGATAGTCTG GCCCCCATGA TGAAGCATTG GCAGCATAGT AAGTAATATT ATCAACAACC ACATTTITCAA CACCAAGAAC
ATTCAAGTTC
TTTATCTTrA
AAGGTAGCGT
AATCAAGTTA
GAATTGACCA
CTTTTCATAA
ACTTGCCTTT
CACAGTATTT
ATCCTGCTTG
CGTTACTAAG
ATTATTTCCT
AAlriATTT-AC 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 TTC?1-rCAAC 5340 TGATTTCAAA 5400 AGTAAATGGA 5460 AAAGAGTGAA 5520 CAAGTCGCTT 5580 TTGAAC'rACT 5640 GCTAGTTGTT 5700 AACTCCTCCT 5760 AAGGAATTCA 5820 TGTTTCThCA 5880 TACTGGAACG 5940 TT'rCGCA.AAC 6000 ATTCACACCT 6060 GGTC7rCTC-A 6120 TGCTCCATCT 6180 GGCTTGACGG 6240 AGTT7TTAGAA 6300 AGAGCCTGAT 6360 GATTGTTTAA AGGACTGAAT ACGAGATATC TAGITITG ATGCTTGGTC TCCAGATGAA TAGGAAACTG AACGCCGTCT GCTTCTAAAG CTTTCTTAGC TCTGCC'rTGG CC1-rCTCAGC ATTGAATAAA CCATCCTGCC CATCAGC'rAA TI'CCACTCAT CACCATAAGC AGGAAGITrGA GCAGCGACTA AATCACCAAA CCAGCTCAAA CAAACTCTGG ?TTTACAAAT AAATTACCAA CTGCTAAAGC 'ITACCATTGA TVMGCTGA GTAACCTGAG CGATCAAGAG CAAAATTCAA AAATCTTTGT TAAGCAATGC CTTCTTAGTA GCTACTI'TCT CTGAATCTGT GTATAGT'rGT AAC71TTGGCG ATCAATATTC ACACCCAGAC CAGCAATCCC 755 ra T.TGTAAT AGATATTGTc cTTGTATTCT TcrGCAACC GrGTAAAaAc GGGCATAAcT ATAAGC~ccA CrAGTGAAT TCTGATCCAT CA~ra=AAGC TAA2ATTGATA GTATCTAGT TATTGc~rAT TTTAcAAA cTcrACAG2AA GATTI'TGCAG GGACCATTAT AAAGCAAGGA TG'TCGGATCT GTTGGTTTAG GTMCGAATTr CTTCATTCAG AGGCCAGAAA ATAGAATAGG GGT'rCAGGCT GGrTCAAAGT GTATTGTAAC GTATAATCAT GTTGAAAAAT CTG-rTGAAGT TCCTGATAGA TAATCTGCCA GCTAAATACA TAGCrCTGA Tfl1' TATCT GCTGCGTGT'r TTAGCCGTCA CCTCTGCATA TrC?1'CTCCA TCAGAGGTAA TAr-XATAGIT GGAGCTGGTA TACGCTCAG CG.ACTCCTG.A GGACATI'NTC TTTATCCCAA TCAACCCT'r' CAACAAGAAT CAAAATCGCT TCCTrTCAT 'rCAACTTAGA GTTCCAC.AAC cAACCr GACACCAACT AGCCITAAC CG.ATT'rTCA 'rAAACGT CACCAAATC1' ACCA'rTrAAC CCCTrI'ACGA ATCTTATAAG TGTAGGrCAA ACCATCCTTA GAGACTTCCC AATCC1'CrGC AACTGCAGGA 0.
GCAAGATTAC CGTAA~rATC GTTGTACTAT TTTTACTTGA TAGCCATAAG CTTT'AGGGGC ACACCTGCTG CTAATAAAAC TACTCCTCTC TTTATGTGAA ATCAA.ACAAA TTT-TCAGAAT ATATGATTCA AAT'rGTCGTT CAGAAACTTT GGAGTTTAG4G 'TTrGGAATT' CAAArrAAA'r TCAGTTTACT ATGTCTTI'TC CATATATAGA TTAAGACTAT TGCTCCrCGC TATCACGTT ATTCTrATTT CCTCCGTCAG CTCACCATAG TCAGGCTTGG GACTTACACC TCAGATGCAC AGATrrCAAA CTTATAAAAC GGAA'rACTGC.ACATAACrAG TCAAAA.ATTC CGTGCACAGC GTTAGTGAAT AAACCATCAA TCCCATTTGA AGTCACTACT AATCAGGTAG TCCAAGGTTT CTrGGT=GC TGTATAAACA TGATGAATCA GATGAT=~G AAGAAC1TGCA TGCVGCAAGT AAGACCTGCT GTAGCAAATA CACGATTTT'r TTTCAT=TC TrATAGATTG ACAACCATTA 'rATCACATTA TCCATTAAAA ATTTACGCTT GTTGGCACAA ATT'TTCAT T~rrTTGAAT CGAAGTGTCA AAGACTACAG TQAAAATAGG AAATTGACG 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 AAGACATACA GTAAAATGAA ATACGGACGG
AACAATGTGA
TATAACAATA
ACACCAACCT
CT=TATACT
CTATTCATAG
ACTTAAAACG
GGAGCG;CTAT
GACATGCCCA
TTAATCCGC
CAAGAAAATA
AACCACTGTA
T'rGTAGAAGT ATCATTCTAG TATTCAAGAT TATCCCGAAT TCAATTACTT TTGTGATTTA TTAAAATTTC TCGCTACC -r ATCCACTATA CCTACGATTT CACTATTGCT TTCTCTGACA ATCTATCCCC A$GACCATI'T AATCCGCTAC TGTATTCACC GGTAGTGGAG CCCTACAGAG TCGTATAAAA AATCTCCTAC CCAAGGTAGA ATGTCCGATA CCAACAI'TCG ATGCTCCAAT AAGCCTCACT GAATCCAGAA GAGAGCCAAG AGGAAAGATA GATAAAGGCC GATAATCGGA 756 CGXTTrCCCCG ACTCCTGACT CATATCCATC ATCAAGCGAA ACTAATAAAA TAGTCCCAC AATTCCGTAA CTCAGAATCG GCAI'GTTCAT GATAAGGAGC ATGTATCCGA GGATAAGAGT TCACCCCAAA AAGGATTTG CTTAAPACAAG GCCATCCCP.C TCTTCCATAG AALGAGTCTAA AGrACCrn CGAACTCCCA CAGGAGCAAC AGAAGACAAA TATCAATATA AAGACTGTGG TCATATAGGT CAATGGCCCT CATCCCAGAT AGAAATGCGI' AATCACTAGA AAAGAGGAAA CCrrCCAkGTT ?I'TAATAGTC AGGCAGTTCG ATTTTGAGTA AGAATACTT CAACCAATTC AGAAACAACA AATAATTCCA
CTCAAACC.AA
GTAAAGAGAT
AAGTTCAAAC
AACTTGGTCG
TCGCGAAGAC
AGATAATTGC
CAAAGAGATT
TTGTAAACAG
CCCAATACTA AGCCAAAAGG TCCAGCGATA ATACCAGGAA AACAAAGCCT GCAATCACAC ATAGAAAGCA ATCATAATAC TAATAA'ITAG GATTAAAGAA GGTGAAAGAA AAGCATAGTT GCAGCTGACA AGACACTACC GA'rAAAA'rCG ACTGATAGTG ACAAGACCCA TCCAATTTTG AGCAGCATCG GATGCTCCCC AAACTGATAA TATATAALACA AGGATAATTrC CCA6ATAAAAT CCTATTGATT TCAAAATGAA TATCCTATAT TGTACCACTT AAAACCCTAT TTTATCCACC TTACCCGATA AGTCTGATTT
CAAAAAGAAA
TGCAAGAATG
CATTTrTCTGA
GAGTACAACT
CAGCTCTTTA
TCC'TT=CTr
TTTTAGCAAT
ATATCAAG43C
TGCAAATCAA
GGTCACTCT GCCCGCTTCT GATOCCACAC CTGCATATTG AAAVXTrCTTC ACAATTTGGA AATGTTCTAA ACTGGCAAAA AAACAAGACA AACTGCAAAA TCAATCGAAA GAATTTATGG ATAGTAAATA GAAACATTCC TACTGA6ACCC GATATAACAG TACTATAGCT AGAATACTlr TCATGTCTCC ACAAAAAGAT AAAAGGGTAA CrAGACAACC CCTTCAGC T CCAAATCAGC TGATTCAGAT TTGAAAACAA AGGAAACGT TTCAAAATGA TAC-'CAACT TTTCTACTrAC TGCTGTACG CTGTTGGTGT CTCTATCTCA AAGAAAATAG ATTGAGAAGG CGTTGCTATA GTAATCACGA AGTCATCTT GCCTGCTAAT C1-rGAATTAG ACGAGCGAGA
AAGAAAAAGA
'rGTAAAAATC AAAGA'rACTC
TTCAATAAAG
AATAGTAAGC
TT'CCAAAATA
CCATTCTCAA
GACTTATCTT
CTAGTCCCTT
GCCTTCACAA
'rTAATATTTC
AGACCCACAT
AAATACTTGA
8160 8220 8280 8340 6400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 TATTT'rGACG TTTGATTACA ATGCTGTGAG CTAGATCAAA GGCrrCTGAA CGGTTACGGA CAGGTCGTTG CACI'CTTNT GCTATTTCCT GACCTCTTT AGTTTC~rCT AATAA'rrCA AGTAAATCTG AGGAATACCT GGGGCAAAAG CATCATCATC TCCAAGCGCT GAATAGTAGG TTCAATTGAT TATACTCGGC ACTAGACTAC TTACGTTTGA CATTGGCTCC AAGCATAGTC AATCTCCTCA TCGCTCAGGA TATC=?'AC CATGGGTATC TAGCGTCGTA AATTGCTTCA TCGGGCTCAT TTGGTAGATA TCTAAGTTGT AACCI-rATAG AGTTCATT1TG ATCTACTACT CCAATCCCAT CTTTAACCAC TTAGCCAAAC 757 AGASGTATAAA GTGTCACCAT TGGAkAGA1GCA AAATCATAAA GC MrCcr=C GGAACTGTAA CATAGTAATC ATGvGTCTGCT AAAGCTCTGT CCCATACTCA CTGCTTCCAC AAAGAAATCA GACCAATCAA ATCACACCCA TAGTTACTTC TTrGGICACA GTTCCACTGA ACCATCrTCA AAATTAAATC TACATCACAC AAAAGAGAGC TTTAAATTC:A ATTTTAAACr
GCAGCGATAT
TTAGTATCCA
TTACTTGCCA
TCAAGATCAA
AACACAATCT
TG=GCGGAC
CTGCCTrCAT GAA'rCGAATA GTGTTICATGA ATCTCAGGTA cTCGAACT GTCCAATAAA TCCCAAATAT
ATTGACGAGA AATATGATTA ATCATAAAAT GCTTCACATC CTCCCAATCA CCAAAAGCTG ATCCACGATC AACTGTTGAT GGGAAAAATG CAAAATGCTC ?rCCAAATTA 'rCATATAAGT AGGTAATCAA CATGGT-rrrA TrGAArrG CCAAGTCTCA TATGATCTGG CTflCATAAAT AAACTCGTAC TAATATAGAC TGTGATAA-AC ATTATCAACA AGCTAAACTC 'N'CCTCTGTG TATACTCTTrC GAAAATCTCT TCAAACCACG TGACTTCGTC AGTT'TCATCC ACAACCTCAA A'rrCrrCAC
AGTGCTGAAT
TCTGCCTCCTC
CTTrCcTTGG
GGTTTCTGG
CAAACA'rAAG
ACTCCACTTC
G'TAAAAGGTG
CTrTA.AGAr GCATCATTA2 AAAArrCATT
AAAGTACTAC
TCAAAGACTA
TC-AGCI'CAC
AACAGTGTrr ACCAAAGG1A
TGCACGATCC
CCAAAACTA
ATAGTCCTTA TAATACTTGG ATAATATTTC 'rCACCTAAAC GTCG;TAGTCA AC'TGGCGCAA AACTCCTCCA ATAGCATCTC ATTTCCAAGG CTATCAGAAT TCTCCTrTrT CTAA'IrGAAG T'TAAATCTCT AT=rATCATC TT-rCTTGTT TCTGCATAGA TAGAITCCAT GAGCTCTTCT CTTGCCG-AG GTATC =AC TCGCAACCT GCGGC'TAGCT TGCCCCGTTG CTCATTCr1'G AC'7GATAATG CCGACCACGT TGCG'rAGTrG TATrGGAACA GAGTGGCAAC ATGAAGTCAT CATCGGTTTC ATCA'rrGGGA r.ATCTCTGCT GCTTCATrCCA TCATAAGCA AAGGCA'rCTA GGTCTTACCG ATAAA~rCCA
TTCCACAAAT
TTACCCTTGT
TCCCAGTIrA 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 'rCCTAGTTrTG CTCTTTGATT TCATrGAGT ATTACTTCAC AXATGATATG GCGTTGGAAG AAGAGATACA CAATGGTGAT AACAGGCAA.A GCTTGGTCCG TAGTCGTTGA AATATGGCC AAGGCAGAGT CCACAT??rG GAATCCCGGT TCAAGACAAG TCCAGAACCA AAGGGCATTG ATGATCATGG TTGTCGCATG AqATGAT~cCC GAAATAGGTT GTAAATTrGAT TAGCCCCATC S S GACTTTCTGG AATCGAGATT TT-GATATAGC CAACATAGAG AAAGAGCGTC TGTrGGAATCG CATAGGTCAA GTAGAGCAAG ATCAAACCAA AGGTATTAGC CAAACCGAGT TTACTCATCA TAACCGTAAT CGGAATCATG ATGACTTGGA AAGGTACGAA GA??CCGAGG ATTAAGAGGG TATACATGAT GGTAAAGGCT TrrCTrTAC TCATATTGCG AGCGATCCAG TAGGCTGCCA 758 TAGGGATAAA GATCAT7ACT GCAAGTAAAG ACAAGACAGT AATAOCCTCC AATCCCATCA GCTAAGAGAC GGCTAAAGTT GAAAGCCAAA GAAATTATCT CAAGGAGCGG CACTAAAATC GGGC~rr1TCT TTCATCTTGr ACTCTCAATT GGATGA'rCGA TTG~cA'rAAC CGAATTGGTT GTTGTGGCAT TGTTIGGACC CCACCTrTTA GGGCTAGGAT ATGTTCCAGA AAACrGCTr GTTGGAATAG ATrGCAAAC AGAAGGACAA AGACAGCCGC AAAAATTCAA TATGAAG4GGC AAGATCAAAG CCACTGTCAA AAGGTTTGGC CTTT'GATTTT TCACCAACCA CCATGGCAAT GGATCCATGA AGAGGAGCT'r CCTGTCCAGT TGGTAAAACT GCTTGTAACA AGAGGGGGAT T'rCATAGTCT CrCTACTCCT ATCATTGACC ATGCCTTGTT ACAATATCCT TAGTGGGrTT AGAACCGATC CTAtGAATCAA TTCATCATGC TrCTCCTCTwr AATCACTACA ATrAAGAAGA GTTTAAAr. GCATAr1-rAT ACCACCGGTC ArGGCAAAGA AAAGACCATA GAGACACTTG GATGACGACA GAG'1TCCAAT GTCCCATGTG AAGTTGGTTG GAAGGAACTA AAXGAiGGTAG2 TAGAATGTAT TTGCCAATCA AAATTTCAAA TTC'TAG2AT ACAALGATTAC GGC-AATGGCA AAACCAAGAG CCCAAGTGAG CTTGGTrCAAA GGCAGTCAGC GTAGCAAGTA AGGCAATCA 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 GCTAGTCGCA CCATCAATCC TTrGCTCCC TGTAATCTCA a.
AGCTAGGAAG ATGATCATGG GCATAGCCAC AAAGA2-TC CCCCACrrAG TCCCTAAAAG ATTTCCAATC GCTCGAAGAC CGTAGrrWA ACCAGATAAA ACAGCTGGGA AGXAGXAACCA AGAATTCAAG ACACGCGCAA TGAAGATCCC
CGCAATGATT
AAAGTTGTTT
GTAAAAGGCT
GACCACAAAA
AATCCACATC
TATCACCGGT-
GTCTGCTCA CTGGTCCAGT ATTGTTGCAA TTCGGTCATA CCACCAAGCG GTGAATCTTC TGGAGACCG TCCACATCGT AGTATTTr'G AAAGGCZA'TG GC~TTTG GATGTTTGT CGCACCAACG GTTAAGCTTT GTCCTTTTTC AAAGTTCGGT TTTTGTTCAT TAATCGCTGT GACATCCCCA CGTGCGAAGG CTCCCATAAC TTGCT'rAGAT CCATTGATGC GAAGGATGTC ATCCGACAAT TTAATGGCAT TTGGTTGAGA GCGGTAAAGC CAATCGCATT AAGCCAACAA ATTI'GAGTT CCI'TGAAACA TCGGCACATA GCCCATGCCC AATATTTTTG CGCTTTCATC GGGTTAAAGA CAACACATAG TTCATGGTCA CCAGACCAAG TGACaATCCG TCCTGCTTGT T"TGACCCCT= CATGACTTCT GGACGGGTCA GGTGGCTGAG ATAGACCATG TTTTCCTGGA ATCATGAAGG GATCGCCCAA GACCCATT'rG ATCGGTATAG CCAGCACC?1' CATGACCTTG ATATCAT ATAACGAAGG TATTGATTTG CCCTTGCCAA 12300 ACTGGTTTGG 12360 GACTrGCrG 12420 AGCACGGAAG 12480 GAGTGCAATC 12540 CATGAATTTT 12600 ATAACTCAAT 12660 GAAkCAAAATT 12720 TAATACT'Tr 12780 AGGCATTCAA 12840 AGGTATGGAA 12900 TAAAGGCATA 12960 CGATCGCTGT 13020 TATATTCCAC 13080 CCAAGTCTCC 13140 TCCCAATCTT 13200 GTGTCATGAG 13260 CCCAGTTCTT 13320 TCATAATCGG 13380 CTrT'T'TCC 13440 1. l-f Jul AU. I 11 11 1-- 759 TCCACCTGT CCTGTCGCAA
ACCTGCAATT
TTCA'rCCCAG
AATTCICATAA
TTCAGCGTAG
ACCTGCTTTT
CACATCTCCT
CCAAATG=T
GTTTCAGGAA
GAI7AAGCTG CCATrTlyrCA GCCCATrCCr
GCGAGAACC
AGGCTAATTG ATTGTP.ACCA TTrGAGrGrCC AAGCATCTGC 1-rTGTCCTTr AGCAACGATA TC?'rTGACTA ACTG -rCAAA CCTTCAAGCC CAGTTCTTCG AATI-rATCrr TGTTGTAGTA -:AAAAGA..C s GT.ACrT-1-CGTTTA C1jGCAThTT CGCC?'rrCAG GTAGTCTTTG Tnr.CTCAAAT CTTCAAAAAC GCAGTTCGAT GGACTGTGGG TAAATATA CCACA'rCAGc GTGTCTTCAA TACTTCACCA GCATTTGGTA CATTGACGAC T-rTCCTr'1CTC AAAATCACGA GTGAT"rCT CCAAGGN'
T'T-GACCTrG ATCTTAGGGT GGTCATIrcT TI'T=CTGGT ATAGTTG4GAG CAAGCGCCGA 'rrTTTTATAC CA'rrCCATTA CACCCCAAAA GTTAGACAGA TAAAAATATA CTGTCTACTC ACATTAGTTC 'rGCACCTGAG CATCCAAATC TTTAATTTT CGTACGGTAC AATCGTTTGA TATCAGGATT AATTAGTCTA ACAAGTTCAC CTGATTAGCA GTTCATAGCC CAAATTTCCC GTCCCATCTG ATGATTCGGT GATAGGATGA ACCGTATrGA TGAAATACTC GATCGTCACT GCCCAAACAA AGCTAAACCT GAAAGCCTCC TTTATAAATT ATAAATCTAA CTTTGGcGGT
TATACACCCT
CAGTACATAT
AAAAAATCTC CTTGGGATA.A GATAACAGTT TAAACTTCGC CATTrrCCTG TAArTTATAT AAAG -rGTTT CCATGGTCTC TACAACAGAT ?TrCCGTAAT TAAAr GTAC AGCTGCTTCA TACTGCTGTC CrAACTGAAC TACTGGTCCT ATCGTAGCTT TCTCTrCATC 'rGATAAATIT ATCA'TTGCTA CALAGGCCACC TGT'TTCrAAT ACTGCTGACA CATGAGCCCC CATACAAATG ATTGGTAAAC GrGCAATGCC ATCAGTATTA ATCATACCAA GATCA?1'TCG TCCACCACCA TGCTTCTCTG TCAGAT1AAGA AACGAGTTCA ACTGTGTCT CTAGATAAGT TAATCCATTC GTGCCATCCG CAGATN'ACC GTAGTTCCALA GAAGTCCGAT
CATAGTTTTC
AAGCCCGCAT
AGTCCCTCTT
AAAACGCGAA
TT'GGA'rACAG AATTCT'rrAT
GTCAAATCAA
GGTGTCATI'C
GTTGGATAGA
TCACTAGCCC
CCAGAGCAGG
TAAAGCCCCA
CCTAGCTTAG
13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 AGACTTGTGG GAAATAGCGC ACTCAAAGAG AATATGGCTG TGATATTGCG GTTCATATCC CATTTAATGT CTAAGACACT TTrCAAGTA'r 'CTACTACCT TCCCAGAATA AGTATGCTCA TAGCCAGGAA ACAAATCACT ATCTACAGAA ATCATrTCCG TTTCATGGAT AGCTGAAATC AGAC'DTTCTA
AATCAATATC
GAGCATTGGC
CCTGAATAGC
GTTCTAACCA
GACT'?CCACC
A'rGATAAAAT AGGAGTTGAT AAGATTAAGT ACTAATTGAT CCAGTCAGGA TG?1'CACGAT AAGTCCAAAC TGCAAACCTC CAGTTTTTCC TCATTAACAA CCCAATCACC TAAACCACGA TTATCATCAA CAAAAAGTTC AATCCCAACT TTCTTACI= GAAAGTCAAA GTAAGTAr.Cr TCCCA=?1AT ATTCACTTAG CATAATGTGC TTCAGTACAA 760 AACGATTGCC AAACCAACCA 'rCATCTAATA CATCTGCA CTCTAACAGT TTTTCTCTC TGATTAGAAT TGGACGTTCT ?TTTrAGAAA AATTCTGACT TTCATGACTA ATACCAGTTA
ATCCCTGATC
ACTTCCAAGA
?rr GAAC
CAGCATCCTC
AAGCACCTCG
TGAATGAGTC
AAAGrrrrCT
AAAAGCTCA
TGTGACTCCr
GTS'TGAACTA
ACTAAAGCTA CCGGTGI-rrC -AUTAT'CC TCAGGAGCTA
GCATTAATGC
AAGTTGCCAC
TGTTCGCATA
ATCGAAAAGA
CCCTGCAGAG
CAATAGCCAC CCGAACTTCA TTCAA'rTGAT TATACATTAG TTGAATACCA AACACATTCC GTAGAAGAGC TGGTGTTTGA GCA'rACCAG TTCCTTGTTC TACCTGTTGA CGTCTAACAG TTACTATTTC GTAATCTGCA CrGGAAAAT CAACTTCCTG ATTACTrrA TTATCTAATT TCTTTrCACG AGCATAAGCA CAGCCATAAA AGAAAAATCT TACTGTAGCT AGCAATACTC GTTGAGCCTT AGAATCTTCT GAGAAGGTAA GCCCTGTGGA GCATCATTAT TAAAAGTAGT AACATTAAGA CAAGACTCTC CCATTCTGAC CTTTTAAA.AT *9**on: V.0% t o 6 o 0 ATAATACAAA GTCAGACTAA TGTATCGTCC ATGCTATGTG CTI'TGCTTCT ACAAATCGAA TGGrrTCCTA AAATCTCCTA AAACGGTrCGA TTAGTAGCCG ACTc'2GTrAC T'rCAGrrACA CTATGCTGAA CCTGTATGGT AGCCATGTTG TCCAAAAATC TGTCGCTGAG TATCTAAACT 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 i6620 16680 16740 16800 16860 16920 16980 TTGGAT=CC TGA.AAAGGCA TGGTCTCGTT TAATAGTCTT TCCTAAATGr TTCAAAAGTA TTAGATTT7TT ACTCTCAACA TAAAATAGA'r CCTCATCACT TTATTGATTA TAT=rATCA TCTCAAGAA'r ATGGTAAAAT CTTAGGTAGG CAGACTGGAA CAATCGACCr TGCCCTAAGC TACTC'rTTG GTCCAGCCAT TCGTGATACA GGAAAATTTC ATTACAA=G TAAAATT=T CATAAACACT ATTGGAACCT TrATAGTTCT AGTAGCCATT TCGATI'TTCA ATAATCAAAC TArrCTCTAT CCTAACTCCC ATTTACTTCA CCTCAAATCG CT'rTCCAAAA TAGAAAAATG AGGTAGCACA TGTTACT'N'T TTCAGAATAC TTTTATCGAT ATCACGAATG CACACCTAAT 064.
AAACCAGAGG
TTAGGAATCA
TCCTATCTCA
ATTGTCCGC'r
CAACTTCATG
AACTAACCTT
CTGGAGGGAA
TCCAATCTGA
TCGCTCAGAT
AACTGATGT
TTrATCAAGCA
AGCCCCTGAT
AACrTGTCAT rACAAAATCA
TCATCTGGGA
TACGTTCTAC
GA'ITrAAAAG
GATAGTAALAG
TATT'N'GCTC
ACCCAGACTA
AGIGAATTAG
ACTATTGCTC
ATTACATTAC TAAAGGACI-.A AAGGAGATTT CTTTCTATTA AACCTTGGGC CTACTACTCG TTTCCCAAAT TTCTGATCAA CTGCAAAACT CATCTCAGAC CTCAAC'TCCA TA'rCA'rGGGA CCAATCAGAA AAAAAAGAAT ATTTCATCAA CCCACCAACT CTATCTTGAA TGCAAACCAT TAATTGATAG CCACTATCCT 761 CAATCACTTA CA.ArrCAAGA TTTAGCAAAA GAAC=ATCCG TTCACAAAG CTACrTATCA AGCGTATTCA AACAATTTAA TACC?1'ATCA CCCAAMGAAT ACCrACTCrA CGTGAT CACCGAGCTA GACAACTTCT GTAGGTmr
CCAAGTCATA
AATCCTACCA
AAA'rCGCAGA
AACAAGAAGG
ACCrrATTCT
CAGATCCACT
CAAGAAAAGA
ACGTCTAC
AAAACTATCC
CATTGAAATTr
TCCAGAC.ATT
CGAAAATACC CAAGAGTCCA TCAAC.GTAAT TGCATAC~cG CCATTNTTCC AAAGCTTATA AACAATACTT TAATCAGACr ATACTCTCA.A TACCAACTAG TAAGAAACCC AACArrATGA CAAATCCTAT CrAAAGAAAC CGACTATATC AGCGGAGAAA CTAAGCCGAA CAGCAA'rrG GAAAGCCATC AAGCGACTAG 17040 17100 17160 17220 17280 17340 17400 GATAGTATCA AAAATAGAGG ATATAAACTG AAACAAAATC AACACAACTA CCCTCTATCT AGCTTCCTAT CTAGAAGAkAA A'rCTTCCAATI GATGCAAAAG AAGCAATTGA CAAACAGCAG GCCGAGCCCG
TAAAGTCAGC
rrAGGCCAT 'rrrrCAACG;T
A.AATCTCCCC
AGCCATTAAG
CACCACAACC TGGTATTTAT ATGACACTCC TACCATCCTA CACACTACTT GTAGCTGGAG TAATAGATGT CGACATAAAA TGGGTCAATG GAATCCTTAC TGAAGCAATG ACCI'CTGTAG GAGTAGGTAT CAATTTCACT ATTAAAGACT GCTTATTrA6A AGCTACAGCT CCTATAACAA ATCT1TAAACC
CTGTCTACAA
ATGAATGGTr, 17460 T?1'AAACCCG 17520 GAAGCAAATA 17580 TCCTCTACT 17640 'rATGACAAAT 17700 AACCTAACTT 17760 AAAATTGGAG 17820 A'rCA-rATTG 17880 AAAGCT.GCCA 17940 ATCTGGCGTG 18000 ATATCTATCT AAACAATCAT AAACTGGCTT AGTCACAGAT TCCCTCAGGA ATTAAAAGAA GGAATGAATT GATCATAGAA CTTTCTTCGA AACACCAGCA GAAGAGCTAT TATACCTATA CAAAAAACAG TCATTCATTC TAGGAAAAGA ACTCACTTTC ACACTAGAGC AAAAAGACrA CAAGGACT1T GCTAAAGACA 'rCTCAGAAAA TGGAAAACTT 7rAGT-rCAAT GTGATAACCG AAAAGXAATC TCCTAA.ATA GTGGCGAAAT TTCTCTCAAT AGTTGGAAGT AAAATAACAC AATTATA.ATA TAAACCATXE' AAAAATAACT TCAGAT'rAGT AATTCAATTA AGrT=ACGG ATCTGAAGTT TTATTGCC TAAAAATAAA AAAGAGAGTT ACAGACrCTC ATTAAAACCG AGAA'rAAGGC ATTCGAACCC TTGCGCCAGT TACCCGACCT AACGATTTAG CAAACCGTCC TC'rTCAGCCT C1'TGAGTAAT TC-TCCAATTA A'rGGGCACGA GTGGACTCGA ACCACCGACC TCACGCTTAT CAGGCGTCCG CTCTAACCAC CTGAGCTACG CGCCCAAGTT AAAAAACTTG GTAATTTGAA CAAAGTTCAA 18060 18120 18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 AGCGGGTGAC GAGAATCGAA CTCGCGACAA ACTACACCCG CA'rAAA'TACT ATCAATAAAA ATCCAGC 'rC AATCCA'TGC TCTACCAACT CAGCTTGCAA GGCTGTAGTT TTACCACTAA TGGCGCGAGA. CCGAATCGAA. CCCCGACAC GACCTACCGA CCCrATTGC GGGACCAGGA 762 CGACGAGCTA CCGAGCTGCT CCATCCCGCG 18780 TTTCAACCTA CGACCTTCGG GTTATGAGCC TTAATAATAT AAAAGGAGGA GTTTCAAGA CCG1!TCCCTT TGTAGGACTT GAACCTACGA TGrGGGATTC
CAGCCGGACT
CCACTCGGTT
GAACCCACGC ACGCrTAC ACGCC 'GACG AAGGTCC-GAC AACATCATrA TACC-GCCGAA AACCGGACGC TCTAGCCAGC TGAGCTACAC CTGCGACACC T'rGGTCCCAA ACCAAGTACT TAGAAAAATG CACCCTAGAG GAGTCGAACC TATCCAGTTG AGCTAAGGGT GCTCCATATT TCGTTACCAA TCGCAGGATT TTAAGTCCTG TC'TAAGCGAA COACGGGTT CGAACCCGCG ACTGAACTAC GTTCGCACTG TTIrCTrCTA CGCGACCCTC TGATTACAAA TCAGATGCTC ATATCTTAAT GCGGGTTAAG GGACTTGAAC CTGGTGCGTC TGCCAATTCC GCCAAACCCG TGACCCATrG ATTAAAAGTC AATTGCTCTA GCGTTACCTT AAACGGTCCG ACGGAATCGA TGGGTAATCC TCCAATA'N'C AAATGGACCT ATGAGCCGAG AGCTCTAACC AGCTGAGCTA GGGGATCGAA CCCCCGACCT CCCGGGTXTG CGCCATGAAT CGGGAAGACA GGATTCGAAC CTACCAAGCT GAGCTACI'C CCGAGTTAAA rCTAACCGCC TGATTCGTAG TCAGGTACTC
ATGCCGAGGA
TGCG'TCTGCC
ACCCCCACCT
TCTAAAA.ATG
TACCAACTGA
CCCCACGCCG
CATATATGAC
CCGGAATCGA ACCCGTACGA AGTTCCCCCA CCCCGGCCTC TGGCAAGGTG GTGTTCTACC CCGGCTACAT GACTTGAACA GCTAAGCCGG CTCATTTGTT TTAAGCGCCA GATCCTIAAAT CCGTACTGGG CTCGAACCAG 18840 18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19500 19560 19620 19680 19718 CCAACTGAGC TAACGAGTCT AAAATAACTT
CCCGCTAC
INFORMATION FOR SEQ ID NO: 100:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4117 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100:
CCGTGGAAA.A GTCTGGATAG TGAATGGTCT TCACACAATC ACCTGAAA GA ACCTGAGAA TAATTATGGA GAGTAGCATT CTGAOAGGTG TTAGCAGAAC CATATGACAG AGCTGTTTGA AGAGGGAATA TTCAGGAGAA AAATCCTGAG CCTACCAG'N' GGAGTTWGAA AGAGCTGACT GTTAGA'rCAT GGT-rTATTA'r CCACAACCTG TGGATAACTT TGTGAATAAG AGAAGTTGCT AAAGAAGGAG ATATATAACG ATGAAGAAAA TCAAACCGCA TGGACCGTTA CCAAGTCAGA CTCAGCTAGC TTATCTGGGA GATGAACTAG CAGCTTTTAT CCACT TCGGT CCTAATACCT TTTATGACCA AGAATGGGGG ACTGGACAGG AGGATCCTGA GCGCTTAAC CCGAGTCAGT TGGATGCGCG TGAGTGGGTT TGG'rCAAGCA CCACGATGGC AGGTCAGTCC TTGGAGGAG3A CAGAX3TTTGA TATGGATA'rG ATCATGI'GGA CCGAGAAGCG TATCAAATCC TAACTATGGG GAGGAGAGOG CGCGCAAAAG ACCTGCAGG CGATrGCTTG ATGAACGAGG GTATGCAGGT CAGAAGCAGA GCTGAACTAT CGTGTGCTCA AGGAAACGGG CT'rCAAAAAG TTGATTTTGG ?1-rGTCCr ATCCGACAGC TCACACAGAT TA7TCGGrrA GGAAAGGGCG AC7TGCTCCT TGAAGATCC CAAGCTGCCA GGGGTCTACC TGTCACCGTG GGATGCCCAT AGTCCCCTCT GACTACAATG CCTATTATCr GGCTCAGTTG AAGGAAATCT AATGCTGGTA AGTTCGCTGA GGTTTGGATG GATGGrGCCA 480 540 600 660 720 780 840 900 960 1020 1080
GTTAATTATG
ATTTTTrCAA GA'rCCACTGT
CTTCAGCACG
AATTTGAAAA ATCG~rrGAA ACCATTCGTG CAGAAGGCAC CAGTAPCCGC TGGATTGGCA GGCAAAAGGT GAATCCTGAT AAACTAGGAA GGGA'rCCCTC GGGCACGA'rT TTTTCAATCG
GAGAGGCACA
CTCTCGAGGA
TTAATATTCC
AATTTGCGAC
CTGGTCCAGC
TGTTTCCATC CGTCCAGCCT GGTTCTACCA GTTGGTCGAA ATCTACTTTC ACTCAGTAGG GCCGAATCAA GCTGGGCTCT TTGATGCAA6A CTATCGCAA T GAGCTCTATA AAGAAGATTT1 TCTTTCCGCA GACTTTGCTT GTCGCCATTT TGAGGATCAG GATCCTAAGT GCTCTTGGGC AAGCGATGCA GACTTGCCCA 1TCCA=TAGA AAACTN'TGA TGTAATTGAG TTAAGAGAAG ATTTGAAGCT GCGAGGAACT CCACTCTTGC 1140 GGATATTGAA CGACTTATG 1200 GGCTCTGGGA GCTCAGGTAT 1260 GACAGACCC CTTGAGACCA 1320 ACTCGACTTA GTTCCTA 1380 AGGGCAACGA ATCGCTCrGCr 1440 TGGTTrCGGGT CATACTGT'rG 1500 GAAGATACGT GTAGTCATTA 1560 TTATAAAACT CCTGGATTAT 1620 AACCCTAGCT CTGGCAAAGG 1680 TTCATGTGCA AGTAGAGGTG GTTACAAACG TCTCTTACGA CAGAATCACA GGCTTTGCCT CAAAAAAAGA AGTTGTTCAG GAGAAAATGC CTATTTTACA TTTCGA'rTCA ACCGGGGACA TTGCGTTTrCA AACTGGTGAG GAGATAAAAC CTTGGATTTC GATGGT1GTCT GGCAGGAGTT GGAGCAGTTG TTGAGGCACA TTGTTGACCA AGATTTCCCT GAACTAGCAT TTGCAGAAAA GTTAAGCGCA GACAATGTAC TGGTCCTTTA GAAGCTAAGA GGTGTCCATG GTG'rCGCCTA TCAGGATGAC ATTCAAG'rCC ACTGAAAAAA GTCTGACGCT ACCAACCTTG TATTTCGCAC TATCTGAACC TAACGGTCCA TCGTCAGCTT CTCGATCAAC 1740 1800 1860 1920 1980 2040 2100 2160 TTCAAGTCCA AGTTCATAA AAGAAGAACC TAGTGACTTG GTAACCAGCT GACGGTGAAA TGT.TGAA'rAG TTGATACGAG TG'ITTTGTCC TTGG'I.rGT ~rAGTATG GATATCCAGC ?TTGCCGCGAT GCAAAGGTTC TTTTGTTAT GTTAGT'rGTT CACCTTTTAA GAGGTCTTGG AGTCGGCATT CTTTGACAAA GTTAAAATGG CATTTATCTT CT'rTAGCGAG GTAGACTCCGT 764 AGATGGTCAA AGAGAGGGAT TCCGAGGTCA TAGCTTGCTT rCCTGGACA GG7TGGATAA AATCCGAGAG C'GACCAGAT GTACCAAGCA GAGAGACTAC TAGGC1-rCCC AACTTGGGTG AAAACTTC 1'GACGGAGCG TAGTICAGGGT AATCGCTG;TA ACGGAAAGA TAAGGAATGT ATGGCTATr GTCCAAAACC AGCACTAAGCC ATCrCGCTCA TAGCCTGT'rG TT'rCAAAGAG GGGAGCATCT TGACAGGCTT G1r1C-17TIC CACCCATCAG TTGGATTAAG CCAGGGATGT CA?1'GTC?1'C ATCTCCAGGA TCTTGATAAG AAGGGCAGTG GGAAACTAGG CTGGTTGAA TrrCGTGAAT 7TCGTAACCA TCAAAAGATA GTTGCTAAAG CGTGGAGAAC GCCTAAACTA TATALAGCAGA GAAGTCACGGG TCTCAGCGTC AA.ATAGCTGG CTATCTTCTC TAGT1r'rrTG GCTTGAATGG CAGCCATTC TGAAAGTTTC CTTGAT'rGTC CGGTAATTTT GTGAAGCAGC GCACAGCTGG CGATACAAAA TGCTGGTCGG TAGAGAGGTA ATGCCGAGAG GGTCGGCTTT' AGGTCGGGGG TCATGTCCTr 
GGCATCATAC CCCGTTCATC CTATTGACA AACCTTCTAA GGGAAGGTGG TGCGGAAGGT ACAG'rACCAG TAGCCAGATC
GTCACTATAC
ACCTAGTTCT
GCTGGCTCTT
GCAGGCGCTA
TGGACCCAGC
GCATAGTCTA GAGTATGGCT TGG'TA'rTGGG CTAGTCCGTG TCGAGCATGG CT-TGCAAGAG TCTGCGATAA TACCGTCTAA CAT=lGGAA GGAAACCAGT
AACACTTTCG
GCGGCCATTG
TTCTCCTTCT
AAGTGTACCT
ATCCCCGTAG
GGCAAAGAGG
ACCAGGCTTG
CTCATAAAAA
GrCAGCCT'CT AGCGTAGTCT CGCCCCCAAC TCGTGCTCGC ATGTAACCTG CTTGTAGGTT TCAGCGATr 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 AAAGCGTTGA TAGTGCTCCG ATCCCAGAAA CCA'rTGTTC CATGTGGATC GCT'rGCCCTG TCTGTAGAGG CAGTGGTCAA
GTATGATAAG
TAAAGAGGAC
ATTCATTAAT
AGAAGGTTCG GTCTGTGGGA AGACCAACAG CCTGTCTCTA TAATGTCAAA ACGATGGAGG ACATTTTCCC TTACACrAT CAAAATCTTC TTGAG~TAGA TrGATrAGAG GAACTGGCTA GTTGCATCTC GN-GACTA CTGCTAAGT TC7rGGCTGA TAGCAAGAAT ATCCGTGTTC ATrTGCCGGG TTTTTGTTAG TrrCAGIT=T ACCTTCTTGT CGCAGGGCAA TCTACTGTCA GTTCATCTGC TGCGTGAAGA TAGAGGGAGA AATCCACTTG GGCACTTGAT CTTGAGAAGG AGAGATGAAA CAATTCGCCA GTCTCCAGCT CAGTGAACAT CCTTAGCGAA GAGTCCGCTT ATCTACTTGC GGGCT7rGCC TTGCTTTTGA TTCAAACGAA TAGAAGCACC ATAGCAAGTC GGTGTGAGCT GGGTTTCAAT CTGATAACGC AGAGAAAAGA GCTTCAAATA GTGAGGCTGG AAGCAAGCTT TA'rCTATATC ATAACAAGAC TGGCGGTGAA AGAGGCTGTC TCCCCCCACT TGACTGGTGA CAGGTGTCAG A.AGGAGCCAA GAGTAGTCCC CAATCCAAGG ACTGGGCTGG TGAGT'rAATC GAATCCCCTG AAAGATAGGC AGATGTGGAT CAAAAAACCA AGATCCATCC TGGTCACTC TCTGCGCAC AAAGTAATTC 765 ATCCCAAAAG GCACGCCTGT GTATGGCAGG GTATTCCCC GAGAAAAGGC ATGCTTGTTG GTAG7 TCCAA AACGGGTATC GATGGTATCA AGTAGTGGTT TCATAGTCP TCCTI'1AGCT GTTTTCTAC ATTATATCAG TAATAGAGGG CCTTTAG INFORMATION FOR SEQ ID NO: 101: SEQUENCE CHARACTERISTICS: LENGTH: 2727 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101: 4020 4080 4117 CTGGTTCAAT TATTA'N'CAC TC'TAAGTAGT CTAAAGG ATCTTGTTAG ATGGGAGAAG AAAGTTCGTr TATTAGTAGC CTTTTTTAGC TTTATTAGCC TGTATCA TTT GTG GCAAGAA TAAAGGAAAC GAGTATGGAT AAAATTG'rGG GCGTGACGAT CGAGGGAGCA AAAAATGCAG CAAGTGAAGG AAAGACCGTC TTGCAGAATG ATCAGGTAGT TGGTGG'rTTG AA'rGCCAAGG AGGTGGATCC TACTGGCGAC ATCACTGAGG GCGCCTCCAT CGTTGTATTA GGGCCAATCC TGCCAGGTGG TTGTACGAT GGTAGCCGTC CTATGGGGGT TAAGATTAGT CAGACACCTG ATGGTGCTCA TATCTATA'rG GACTTTCCAA CAGCGACTCT GGCTGATGGG GTGACAGTGA 'rTGACTrAGC CATTCTCCTT AATGAAATGG CTATAACCAT TACTGGTGTT GAGAAACTTC GTATCGAAGC AGCAACCTTT ATGGTAGCTG
CATATGTTCT
GIIMrAAAAG
ATTGTCATAG
GCGCTTAGAG
TTATTTATGT
'rGACAGATGA
GCTACATCCT
GATTATTATG
GAGTTTTTAC
TAATACAAGA
GAGTTCTTTC
AAATCAAGAC
TTCAAGGTGG CGATAATCGT CTGGTAGGAA TCTTACCCTT GTTGGCAGCG ACTATTCTAG TTCCGATTTT GTCGGATGTC TTTATTATGA TTGACTTTGA TGAGGAAGCT CATCTTGTCA AAGCCCCTrA CAAGTATGTC AGCAAGATGC TTGCCCGTGT GGCTCATGCC AAGCTATCCA 120 180 240 300 360 420 480 540 600 660 720' 780 840 900 960 1020 1080 1140 1200 CTATTGATCT TCATTTGAAA GGTCrGGAAG GTT ACATCGA AGCCAAGGCA GAACGCTTGC GTGTTGGTGC AACGCAGAAC ?TGATGATGG TTGAGAArGC TGCGCGTGAG CC'TGAGATTG GAGCCAACGT CAAAGGTGCT GGTACAGAGA ATGGTACGAC TCACAATGTA GTCCAAGACC CTGCCATGAC TGGTGGTGAT GTCTTGATTC GAGACGCTGT CTGGGAGCAC AACCGTCCCT TGATTGCCAA GTTAC TIGA.A AAGTAATTCA AGAAGACGAA GGAATTCGTG TTCGTTCTCA ACTAGAAAAT TTCATGTGAA AACCTTGCCC CACCCAGGAT TTCCAACAGA TATGCAGGCT ATGGGTGTrrG
CTA.AAAGCTG
CAATTTACAG
CCTTGA'rGAC
TCCAACACCT
CTCGTATTGT
CCAGTGCGGC
TGGTTCAC?1'
AGATTCAGCG
CGTTTACTTT
ATGGTAG7r AAATGGCAkGG AGTTGCAAAA GGCGAATCAA CCATGGTGGA AGAAGAGATG CGCCGCAGG GCTTGCATTC TGGTGGACAG CCTTTGCAGG GAGCAGAAGT
CTTGATTTTG
GGATAGAGGT
GATTGAGGCA
TAGTCATCA'r A'rGGAATCTT AkITTGATTCA ACAGGI-r'GG TAGCACAGGG TACTACGGT! TCCATGAGAA AGTGATGAAG ATGAATAAGA AGTACTGATr ?TAGGTACTC GGGCAAGGGT CAAGATCCAT TAAA'rTTACA GGAAATTAGG GACAGTTC GAAAATCGTT TGIAGATI'ATC CGTGATACAG ~TTnCAACT GACC?1'CGTG AGAAACI'GTG GTcGGTAAAT GTTGGCGCAG CTAGGTGCTA AATCAAGCTA CGTAGTCAAG rGGCTCTAGG AATCGGTA GGGCTATCCT GTCTCCAGCA CTGGAGAACC AGCC'TTT'ITC TAAAGATAAG GAGAAATATG TATTGC'T1 GTCTACAGGG CCAAAACCAA TCTTAG'rCAG AGAGTGTCTT AACAGACGCA CAGGTGCTTT TATCGTCAAT CCTACGCTGA CAATAAAACA CCCTCTI'GTC TAAGGCCACT CTTCTTGGAC TCCTCCAGGT CAGTCGATAG AGGTCATTTG CCTCAACAAG CAATCCTAAA AACAAAAAAA CAAGACAGAC ACTAATCGGA CTGCTAGTGT AGCTATTATA TCAAGCAGAT GCCGTCGGCA CCTAATAGTC AA-AAAACAAG CGTCTGAAGC TCCTAGTCAA GCATTGGCAG GTCAAGAGTC AAATAAAGGG GAGTCTGGAG TGGAATGGCT GGTAATAAAA CAAATCTAGA TGCCAAGGTT TCAAGTAAGC AAGACAGTGG GCAAGGAAAC TGTTCCAACC GTAGCTAATG 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2727 CGTCAGTACA AGAATCGTAA AGAAACTGGG TGGCATCAGG TCAAGAATCT AAAGGGCTCT TTAGGCTATG CCTTAATCGG TGGTTGAT AACATTGC'rG TrCAGACAGC CTGGGCAAAT
AATGGTTCAA
TATACCCATG
GGTTTTGATG
CAGGCACAAG
TTGCACCAAA
TTAGTTCCCT
GTTCTAGT-rC GTAACTC1AGT CCGAGTATTC GACTGGTCAA AACTACTATG AAAGCAAGGT ACAAGCGTGT CCGTTACCGT GTAACCCTTT ACTACGCTTC CAGCTTCACA GATTGAAGCC AAGTC'TTCGG ATGGAGAATT CCAATGTI'CA AAAGGGAr cAAcTGGATI' AccGAACTGG AAAAGATACG CCTACACTCC TATGTCACTT ATGGATGTAG AGCAGGACTA AGACAGGTAC TAAGACAAAA TAGCAACTTC GGGAGAGAGA TGGAAGTTAC TTTGAGA
INFORMATION FOR SEQ ID NO: 102:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5717 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
GCGTAAAGCC
AAACGAGGAT
GGAATTCAAT
AGAAGTAACT
GAGTTCTTTT TACTAGTTTA TAAAACTAAC TTCCAGTTTT
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102:
TTTTGTAG
ATGTTAGCAG
TrCACATCGTr
ATT-AATTCAA
AACAATAATA
TATTTTCCT
GGAGGGGTGG
GTTCTACG'rA
TCGGAGTATG
ATTAATACT
TGAACTATAA
AGAACTAAAG
ACAG1TATTAG
TATGGAGTGA
AGCCAAAT'rC
AAAATACACA
GTAGAACAAT
ATTGATACGC
TCCTACATAT
ACCATCGGTI'
TCTGATAAGA
ACTAATTTAA
T'ACGTTTATA
ATTTAACTGG
GAATGcTC
ATAAATTGCT
T-rccTTrT T'
ACTTGAATTN
TGAGTCAACT
TGTTTCTAT
AATAGTTATC TTTTTGCTTA CGGGAAATAG CTATAGAACC TGTAAGGAAC TA'rrAGAGAT ATGAACAAAT TTTAATTTCG GT-IACTATTC GTTATTTTAT TATTAGGAGC AGGAGTCTTA GGAAACTTAT TGTCGCTGAT TTTATACAGA GTCGGATGTT AG'rATAATTC AGATCTTCAG TAGAAAAAAT TGTTGCGGAA CCATTGATAT TATAAAAArr CAGGAGGT TAATGGATGC CTTGCTGG TTGTCGGAAT CAAAGTGGCC GACTACACCA 'rAATTAAAAT ATTI'CTGCCA
GGTGCAATTC
AAATTCGATT
TGGACTCI'TA
TCAGTATGAT
TCAATATATA
ATTACAGT
ACTGCTTTCT
CTAAAAA.ATA
TTATCACTTA
TTTTTA'rCA
AGTCCTATTA
GTTCAA~r ?rGACAccCA
ATATTATATA
TAAAAAAGAA
TCAAAT TCG
AGAAGAAGTA
GA'rGAGATAG
AATGAAGAGT
GGATGEATA
TACGATATAA
GGTAAGGAGA
GTACTACCTA
TATCGGAGTA
CTCAATCAAT
TATCTTA'rTA
ATAAACAAAG
GAGATGCCTG
TGTTATAATG
AAAAACAATT
CAGGI-rACT ?TrAACTTAGC 'rTTAAGATA
TAATAGCATT
ATATTA'TTGT
TGATAGGAAG
GGTATAATGT
CA.GAAtTTrA
TTAGGCAGTA
?TTAAAATr
TGTTTATTG
TATGATTTT
CTTGAATTCT
TG'TTATTATT
TCGTACTATA
GACGAAATA'r
ATTTACGTGG
TCAACGATCT
AACTAAC~rTr ATATTCATAG TAGATATTCT ATAATAAAAT ACAGAATAAA TATCTCTAAG TCTAAGTATG TAGAACCATC AAATTTAAAT ACATTLAATGT TCTTTCTGA 'rTCTAT'rAA AGTCTTCA TAGATTTTAT CGTTAALAGCA TTGCTG'rATC GCA'rAAGATA TTGATAATA'r ATATATCCCr ATATAAATAA CTACACTTTA CTArTTGTGG AGGGATAATG XA'rCCTAAT AGATAACGCT
ATATGAGAAA
GAATCTCCAA TTTGTAAAAA 'rrAlrCG=rC AGTGTr'rT GGC.AGTTCAC TG?'rATAGAA TATATTATGC 'rTACThCAAT TCATGCTCTA AGTCAAGAAA AATAATAAAG TGAAAGATAA TGCTTAT'rAT CAGGAGGAGT ACACAATTAA TAGTATTGTT AAAAATATTC AAACATCATT
AATATGTTCT
CAATATTAGA
AGCITTTI'A
CTTGGCGTATT
GGA.AAATATA
GGAAAACGGA
GCGAAAACAT
TCTGCTATTG
ATN-I-rGCTA
GTATGGCTT
TTTCATCGT TGAGATTA Tr-1I-IGCTrG TGGATATI?1T TATTGGTATG ATAATATTTT TAGTGTGTCT TAG?1ATTAT ATAACTTTAT AAAATAGGTA TAATGAGCAT T'rGCATATAT CGTATTATCA ATAATATI'GA AACAATCTrG T?-rTTAAGAc GTTTATTTGA TAAAGATAAG TAAAGGAGGT GCAATGAGTA TGATTGAAGT AATAGCTTA AATAATATAA GC1rC;LCTGr ACCA'rCTGGT TCTGGAAAGA CCACAACGAT TAAAGGACAA TCrTA''N rGGGACAAAA GAGAATI'GGA TTGGTTAGCG ATACAAGTGG TCTTCNwrM TATAGTAAAT rTATAATAT CGAGTAGGA TTATATGATA GrCGCAAGAT GCAACGAATG CN'AGCAC GAGCTCTTAT ACCGACCTCA GGTCrAGATC CCACAACTTC GAAAACAGCA GGGACAACGA TTTTTCTAAC ATGTGATrAT GTrCCTrA'r TAAATAA.AGG ACTCATTCAA AGATATAATA AAGATAAAAA GATAACTTTT GATTTr'ACAT CACTAGAACA ?TCAATTCAT TCATGrGAGC CrACTTTAGA GCTAAATGCT TAAACGGTTT CTGGCTNGG ATAAGAGTAT TTTATTGCAA GT'TT'rAGTGC TTATGGAAAC ACAGGGGAAG GTCAACGATC TACCwrTTTC TIrT-IrCTrTG GCTGTrG4GAA AAGAAAAGTA CAATTrACAA ACTCTTCTGT TATCAACTAT GTTTCTTCCT 7T'rTTG.CTAA TrTAGc7AC' TACAA'rTGTA CATACTTTT1A
TAATATTCTG
ATCTCAAAAT
xrTTMATCAG
TAGTAAATCA
GG'TAGCAGGA
CAACAACCCC
ACTGGGCACT TCCTTGCCGA TTAACAAGCG GTGAATTAAA AAAATGTCTC TGTATAACAA CGTGTTGATA ATT-TGTTAAA
AAATTATCCA
GCTGTACTCT
CTGGAATGAG
TTCTGGATGA
768 TTAAT'rrrAG A.ArATTATr cAGITAccAG TAATGCTGTG GATA?1'ATT A1'GT~TA TA'TrAGcAA GGAwGCAG TATGTTTAAA ATAATTATrT TATACTGCTT GAGAATG;TAT CTAACGGrrA TATGCI1rAAT GTTATTG?1' TAALATArA1' TrAAGTAAAA ATGTAGAATA TAGCCATTTA rCAAAAAGTT TrGGTGATAA TAAAGAAGGT TAGATrTTTG GATTTTTAGA TCGAACAA1T CATGAGTTAA T'rAGAA'TT GACTCATGAT ATGAATGAAG CAACTCTT'TT GAAAT'rAGTT GAGCAAGGAG CTCCTTCTGA GATTAAGGTI' ACAGATTATA ATGGGAATCA GGTATCTCAG ACrGATCTGG AAAATATTTT AGATATTTrT ATCACATTAA CAGGAGGAAA TATGGTTGCG 'rTGTCAAATC ATCCTTTCCA CTT'rTGCTTT CACATATTTT TATAAATATC AACAGGCATT AGTrCTTTTG ATGATGTGTT GTCCTATAAC TATTATCTTG TCTGAAGAAA 'rGAGTGGTGT TAAAGCCTCC GAATACATTT CTrMT'AT TATGGGAACT ACTCCTCTrA ATTATATTAC AATCG=TCTT CTAACCTCTT GTTTAACCGC GAAGAGCCAA GTAGTAGCTC TrCTrCTT ACCGATGCTA TCTGGTTTGG 1680' 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360
TATCCATCAT
AGCTTATCAG
TTTATTCTAT
TCTTCCTGCT
TTATTGATAG
ATGATTTITAG
ATAAGACAGT TGCGAAGATA ACAGATTATA G~rTATGGG ACTATTTACT AAGTNTCA
CAAAATGGGA
GGAITGTTCT
TrAGTATrrr
GGAATTTTCA
TCrATTAACT TAATGArrAT CAGCAATTCT ACAGAATAGT AA'2AACAATA =TCTCATAC CCAATAAAAA ATGCAACATT ACCTGTCTTG TAATCCAAGT TGGAATAAAA CTCrAATTCC TAATCTAACA CTACTrM-1-r TTAATTACGA TAACTATTAG GAAAAAGAAA A~rrCTTAAT AAACACAAGT GGGAAGGAAA AAAI'GAACTC ATCTTTTGA CrAlrGCTA TATTTTGArr TGAG=ACG AAAAAAGAAA TAATTGCAGA AG?11-rGGGT GATAAGATAA CTGATAAATT TTTAAATCTC CTCTA'rAACT GCTTCAAAAA GTCrIrCAAA A7TITTGGGG ACGGTGATTA ATAAGCTAGC AAAGCATCAT 'rAAGGATTTT TTCGTAATr ~GTCCAAAT CGN-AACA AAATACTCAC ATTCGCATTC TCATTACTTC CCC~rrGCCA AGATGAATAG GCATCCCCAA AATTCCCATT TGTTCAATTA AAGGGTAACA AGCAAACTCT AGTCTTTAAC TA'rTCTTG GAAAQAkGTCT TGTGAGGTGT TT'rAGCTGTT TTTACTTGAC AAGTGCTACT ACAAATAATA T7'=CCTGT
TCAATAGCAG
GAATAGTAAA
GAACGAACrCC
AATAAAACAG
CCGAACTGAA
TCAACATGGA
AAACCTTTAA
AGCAGTCCAG
TGGAACCTTG
TAGTAAAGGT
CCTAGTTTTG
CTCAAGGTAG
GGTTT'GGGTC
TCAGCCATGA
GGTGTAGAGA
GTAACTCAC
AGAGGCAGCT AAGGTAGAC GGTGAAAGGG TGGAGACTAC CCATTrCC CTGTTGGCAG GTrCC~rT'rr TCGTGGCrrC TGTrGGCCAG ACTCTCTCAC AAAAGGAGAA ACCTATGCCA GAACATCGTC CAATCATTGC TCTrGA1T= AGGCGGTCAA GG.AATTT=A GCTrCrT~T-CC CAGCAGAAGA AAGCCTTTAT GGATGGAGCT TTArTACGCA GCGGGGCCTG AGATTGTGTC CrACTTAAAA ATrAGTGTCrr TTTGGATCTC AAACrrCATG ACATTCCTAA TACAGTCAAG AGATC'rTGTC 'rCAGCTTGGT GTCGATATGA CTAATGCTCCA TGCGGCTCGr rGATGAAGGC GGCGCGTGAA GGTCTTGGGA GTCAAGCCAA ATTGATCGCT 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 rCACATCAAC GTCAGAAGCT CAGATGCAGG AGTTTCAAAA TATCCAAACC AGTCTGCAAG AGTcTGTGA'r TCACTATGCC AAGAAGACAG CTGAAGCTGG CrTGGATGCT G7TTGTTTGCT CGGCTCACGA AGTACAAGTC ATCA.AGCAGG CTACCAT'CC AGATTTATC TGTCTCACAC CAGGGATTCG TCCAGCTGGT GTTGCAGTTG CAGATCAAAA ACGAGTCATG ACACCTGCTG ATGCCTATCA AATCGGCAGT CACTATA'rCG TAGTCGGGACG TCCCATTACC CAAGCTGAGG ATCCTGTTGC AGCTTATCAT GCCATCAAGG ATGAATGGAC ACAGGACTGG AATTAAAGAA CTACATTAGA AAAATAAAAG GAGAATACCA TGACACTTGC TAAAGATATC CCTAGCCACC TCTTGAAAAT CCAAGCCGTT TACCTCAAAC CAGAGGAACC CTTCACTTG GCATCTGGTA TCAAGTCACC GATTTACACT GATAATCGTG TGACACTAGC CTATCCAGAA 770 ACTCGTACCC TAATTGAAAA TCGG'rMrTG GAAGCTATCA AAGAAGCCrr TCCTGAAGTA 5220 GAAGGATTG CAGGAACTGC AACAGCAGGG ATTCCACACG GAZCCATTAT TGCTGATAAG 5280 -ATGGACTrGC CN'TGCCTA CATCCGrAGT AAACCAAAAG ACCACC4AGC TGGTAATCAA. 
5340 ATCGAAGGTC GCGTAGCTCA AGGTCAAAAA ATGGTAGTGG T'TGAAGACCI' TAT'rTCAACG 5400 GGTGGT1'CAG TrC?'rGAAGC TG'TAGCAGCA GCCAAGCGAG AAGGAGCAGA TGTAC -rGGA 5460 GTTGTAGCGA TTTTCA=CA CCAATTGCCA AhAGCAGATA AGAACTTTGC AGATGCTGGT 5520 GTTAAACTTG TGACGCTTTC AAACTATAGC GAGCTTATCC ATCTAGCCCA AGAAGAAGG'r 5580 TACATCACGC CAGAGGGCCI' TGATC?1'CTA AAACGCTTTA AAGAAGACCA AGAAAATTGG 5640 CAAGAAGGTT AGGTCAGTAA GATAAAGAGA GACGAGGCTA CCGAGTCTCT TTTACCATTr 5700 TATTTAAAAT ATGACAG 5717 INFORMATION FOR SEQ ID NO: 103: Wi SEQUENCE CHARACTERISTICS: LENGTH: 5558 base pairs TYPE- nucleic acid STRANOEDNESS: double CD TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103: *CCTGGACTTT CTAAAATGAA ATCT-rGCGAC CTGGATCAAG CCCTTCATCA GCATTTTTCA GAAGAAGAAT TAGCTGGTCA CT'rTCATGTC CrTC'rATGGA CT-TTrTTAC AATGGCAPTG 120 CTATCACACC CAATACC'TAT CTAAGCGCCT GGTTCGTAAA CTrTATTGCA GCTCTTCCTC 180 *TA.AATTrCCT AATTrGTTGAA CCAATTGCCC GTTrTATACT AAGTTCTTTT CAGAAACCAT 240 TTACTGGGGA AGAAGTTGAA GA'PTTTCAAG ATGATGATGA AATCCCAACT ATTATCTAAG 300 CCAGTTCTGT AAACTACTAA TATTrGAAAT CCACTTCCF TTAGGGTGCA ATGGTTATAA 360 ATGAATTTrTT GAGAGGATCA GAATGAAAAA AC'rAGCAACC CTTCTrTAC TGTCTACTU~T 420 AGCCCTAGCT GGGTG'rAGCA GCCTCCAACG CAGTCTGCGT GGTGATGATT A'rGTrGATrC 480 CAGTCN'GCT GCTGAAGAAA GTTCCAAAGT AGCTGCCCAA TCTGCCAAGG AGTTAAACGA 540 **TGCTTTAACA AACGAAAACG CCAATTTCCC ACAACTATCT AAGGAAGTTG CTGAAGATGA 600 AGCCGAAGTG ATTTTCCACA CAAGCCAAGG TGATA'PTCGC ATTAAACTCT TCCCTAAACT 660 CGCTCCTCTA GCGGTTGAAA ATTTCCTCAC 'CACGCCAAA GAAGGCTACT ATAACGGTAT 720 TACCTTCCAC CCTGTCATCG ATCGCTTTAT GGTCCAAACT GGAGATCCAA AAGGGGACGG 780 TACAGGTGGT CAGTCCATCT GGCATGACAA GGATAAGACT AAAGACAAAG GAACTGGTTT 840 CAAGAACGAG ATTACTCCTT AT'ITGTATAA CATCCGTCG GCTCrlr-TA TGC-CrAATAC TGCAACCA ACACCAATG GCAGCCAG?1! CrrATCAAC CAAAACTCTA CAGATACCrC TTCTAAACTC CCTACAAGCA AGTATCCACA GAA A?1rATT GAAGCCrACA AAGAAGGTGG AAACCCTAGT CTAGATGGCA AACACCCAGT c1-rrGGTCAA GTGATGAC G TATGGATGT TGTGGA'rAAG
AATCCACAC
CAGTATCCAC
TCCCATATTT
GATCCCCTAT
TTGGTCGGCT
ACCTTGGCCT
AAACCACAAC
ATTGCrAAGG
ATCGAAGTGG
ATT-CtGGTACT
GGTCTATCCA
A'rTCTTTGAG
C=C~CAGCAC
GCrTCCCCC
GGGCAAGGC
a. *a a a a a a. .a a a a a a a a. .a CAGACAGCT CAAGAGGAGT ACTTCTAAAT AATCTCCACG CTC'TCAAGA GAGTTGTTI-r CGTTCGAAGG TAAGATTrAA TTGGCTTGGA AGATAAAGCG GGTTTCTCAC TTTGGAGTTC GCCATATTAC GAGTTGCAAC ATCTCTTTCT GCTGGCG=rC TCTTGGAACT GGTAGTAGTC ACAATATTAA TAACGTCATT TCATAGTTTT GGAGATAGCG GGCTCGTCCA ACAGCAAGAT GT1CTTTGCC CACCTGACAA AGAGCACGCG CTACTTCGTC CGGTCT'rGAA GTTCTCCTAC ATTTCATAT AGAGGTCATI' CGGAGAACAT CACGCACCGA CCGAAAAAGA TGAAAAAGAC AAGCr-AACrA CTGCTATCAC TGAAAGACTA CGATTrTAAA TCTTAAAAAC CAAAAAAATA G;TATTrCTTT TACTCTCATT CTTAAGTTAA ATTrTrAAAA GCCTrATAA AAGTCTGGCT CGTGGCAGAC CATAAGGATA AGCGCGTTTG AGCTCATCCT TTrGCATCCAC ATCCAAATGG TAAAACGTTrG TTTTCACGAT TCATCAAGAG ACAGAAACGA TGATAATACT TGAATCTGGC TTTCAATATG TTTGG= GTC 'rGcACGGACT TCTGCTrGAT TAAGGGCAGG AAAGGCATTC TrGGCGA'rTA CCGCCTTCTA CTTCCTGCTC AAAATAACCA CTCCACTTCC CCAGCGATTG GCGAGATAAT GCCCAAGAGA TCCAATACCA TrAGCACCAA TAATCCCAAC CTTrTGATTG AGGCTTAGTA AGAGGACGGT CGTAACCAAT -TCA6AGTTC CCCTGGTGTA CGAGCTGGTI' TGAAATCAAA GGATGGTTT GATAATATCC ATCTrATCCA ATTrCrTTG ACGAGACATA ACGGGCTTrTA TTACGAGCCA CAAAGTCcTr GAGGTCTGCA GTAGGCTGCC TCrAGCTGAG ATT'rCTTCAT AGCATAAACT ACCAGAGTAA CGCGTCAGCT GTTGATTTTC CACATGATAG GAGGAATGGA ATATCGTGCG AAATGAGAAC AAkAGGCATTC CTTGAGCCAA TCAATATGCT CAGCATCCAA GTACrTGGTC ATCAGGCTTT TCAAGGAGAA GTTrTGCCAA AAGCACCTrG AGAAGTrACA TCCGTATCCA TGCCAAGTC CATAACACCA AATCTTAGCA 'rCCAAGGTAT AGAAATCACG ACTCTCCAGA -rCTTCCATG AGAGCA'rCAA CATCCGCGCC GTCTTCAGCC GATACCAGCT TCAGCTTrGA AAAGCTCATC AAAAGCCGTA CTGTCTTTCA GCAAGGACAG AG;TGCTGATC CAAGTAACCA 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800- 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 772 GCCGTCACAT AN'TGGACCA CTCAACCTTT CCTTCATCTG GCAGCAT= ACCAGTCACG ATACTCATAA AGGTrGAr TCCTCACCA TTGGCACCGA CCAGGCCGAT ATGTCTCC TDGAGGAGAC GGAAGGACAC ATCTTCAAAA ATTGCACGOT CACCAAAACC GTGACTCAGA ?rrrAACTT
AAGGAGCCAA
GCGATTGAAA
AAGAAAACTA
TrrCGAAAATC
TAATTGTCCA
AAAAACCAGG
TTGAGGAGTA
CTAAAATACT
GCCAGATAGC
TCAAAGA)A
AAAAGAAGGA
AGAGGAAAAA
ATGTCACTCA
CGTCCAAGAT
AAGCAAAAAC
CATTrrAATT CC?7rACCTTG ?rTTTTATGTA ATCGTATA CACCCAAACT G2'TGGTCCAC AXXT'CACA TCTCAAAGAC AGTCCAAGAT TAATTGCGTA CACTCGATTC CAAGACTCAC cc7TrTGr
ACAAGAAGAC
CTTCGCCCAG
GCTGAATACC
AGACAATGCA
T'rTCCGCAAA lrTGGAAATA GATAAAGGAG AT'rGAGGATA TrTTGTAAAA AAATCCAACA TTTCCAGAGA GAATTGAAAG GAGTCAAAAG TGGAGT'rCCC ACTCCCACGG TAGATTGTTC AATGCTATAG AAAA'rGACTC CCCAGACCAA GGATTTACCG CTGCGACTGC CrC".GGCC CC-ACAAGCTG CGTCAATATC TGTACCATGC
AATATGATTA TAAGTCTrCT TCATCATTAA TCACG'1-TCA TTGTGTTAGA GCCCAATTGr TCTGACGAA CCACACAGTT GACCCCTTI'T TrCTAAGCG TATCATAGAA AGCCAACACG CACTCTTTGG GACTACGGCT ATATTGGTCA TGCTCACTAA CTGGGTTATA AGGAATCAAG TTTACATAAG ACAATTTCTT GATGTTCTTG AGCAA'rrCAC 'TCAArrCCAA GGCTTGTrCT ACACCdGT63T TGACTTCAPT AAGCATGATA TATTCAAACG TTAcAcG.rACG; Cl LIr.L iGTC 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 *S S.
TCAATGTAGT ATTCAATAGC ATCATACTTG AACGAAG'rTC ACCCCTTCAT CAGCAAAGTC TGACGAGCAC CGATAGCCAT TTGTTGTAAT TATCAAAGGG TCCTGACCAC GCTCATCAAA TTATTGAGGT CACCTTGCT CAGCCGACCT GAGTCGTCAC TCAATTAACA TACCGTCGGG AGCAAAGAGT TTTCAATCG GAA.AGGCACG GTTAATCTTC ATTGTTAGGT GCGTGAAGAG ACACGGCAAG A'rrGACCTGA ACGAATTrrA TGAGCCAAAC CTGAGGTGA AACCGTGATG TCCTTTATCA TCATTGATAG TACGAAAGAA ATTCAAGACA
CTCACCGATT
GTATTTCTGA
CTTAATCAAA
ACAGACAGAT
CAATTCAAAG
CCCATCACA.A CGATATGGCT GATGCGTTCA ACCACCATGA TTTGCCT~AC GATTTCACCG CCAGAGGCAC AGAAGGTACA ACCGATATTA AAACCATAGT GTTGACGCAT CAGTACAGTC AGATA~rrGA CTGTACCATC AGCAGACTCT ACAAACTGGT CATTGAGCTT AGCAATCAAA GACTCCACAC GTTACGGTA GAGCCATTrCC CCCTGCTCCA ATACCCATTC CTGCATGGT-r ATTTCTTCTC CTI'AT'rCTCT ACTCACTTCT TGCACAATAC GTTrCTTCAA GGGATTGACC TCCTTGGAAA GGFGGCAT TTCTTCAAAT CAGATTTGAT CTGCACGGAA 1,'TCTTTTCT TGATGTACCA AACTATGAAT TGAGGGTTTC GACLGAATGAC AAAATGACGT CTGCATTCGA CTTTCGTrTA CTC-%r.CCT ACGCATI-rC TGTCCCTTGT CCTCTwrTCTG ACGACGTCTA ?1?r'rCTTAT GTrTTGA.GTCG GTTTCTTTCC TTTTC!TAGAA GGTGTTCTT ?1'GTCAAATG ATGCrCGCTr AAGGGCTTCA TTrTCTAAGA CAAAATAGGC ACAACCATAA CTACAATACT 7?1"CAAGTrr~ T1'CTTCTGTT CGGTCATCCT TGCT1CCAGTC CCCCACGATA TAATCAAACT AAGTCGTCAC ATC.AAAGGCA TCCTTGATAT TTrCGACCTT GTCCCCGTGT AAATGGAACT ATTCAGGTGC AAkN CTTTT CGCATAGATA TCTACCATAA TTTCTAGCAG TTAGCACGTr TTGTCAGACC AGAATCTTAT AACCrAAAAA TTATTrAGCA GCTGCGTACA AT'rCATCTAC TTTGATGTAG TCAGGACGCA CGTTGCGGTA CAAGCCCAAG ATrGGTTrTT TACCTTCTGA CACTTCAAGT TTCCCTTCTT TGTTGACAAC CTAAAAGGTA GTCTrTGTAAA CGACrGATTr TGTAAAAACC TCGTAZGCGA AGCTGTTCGT TGGTTAATAC TTCTGAAAAA CGCTGATTAA TTTCAACCAA GGAAAAAGCT ATCCCTTCCG
CCGGACCAGG
TCCTTTTTTC
TCTCATAAAA
AAACTTGTTA TAGTTGTATA ACGATTACTT AATACTTTAT ATGAAAAAAG TCTGACGATT 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5558 GAGAAGAACA ATTCTTCCCT CCAACTATCA Tr-rAT'rCCAG TTGATTACTG AAAAGAAAGC
TTTCACGTAG
GATTGGTGTG
CAACCATGCC
TG-LTGCCT GCAGTGAAG4G CrGCTTGGAA 'rTCTTCAAAT TGCTGCTGCC AG1'TCTGCTG AAGGAGCTGT TTTCTCGGGA AGCGTGGTTC AAGTGTCCGC CACCATTGTT GATAAGTGCT AGATTCTACA TCAGCAAGCA AGGCTrCA.AG GTCTTCACCG -AGCTGCATTG GCA'N'GTTGA CATAAGTTTG ATGGTGTT INFORMATION FOR SEQ ID NO: 104: TAAGCATGTT CCCAAACGTC TCTTGGTTTG CTGTTGAAGT CAACCTGAAC CAAAACGAGT GAACCAAA'rG 1'tGCCATCGAtr GTCATCAATT CCCAGAAAAG TGACGGATAT CAGC'TGGGAT A7TTCAGGGT GTTTTTCTAA Ci) SEQUENCE CHARACTERISTICS: CA) LENGTH: 6735 base pairs C(B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104: CGAATTGTAA ATATCATATT GTTTTTGCAC CCAAATATCG TCGTCAAATC ATTTATGGCA GATACAAAGC TAGTATCGGA AGAATCATAC GTGACTTATG TGAGCGTAAG GGTGTAATAA TCCATGAAGC GAATGCTTGT TCAGACCATA 'r CACATGCT TATCAGTATT CCTCCGAAAC 774 TTAGTGTTTC GTCCTTTATG GGCTATTAA AGGGCAAGAG AGCATGCGAA TrTAAAATAC AAA1TATGvGCA ATCGCAAXGfl TArATACGGT AGGCCGTAAT CAGAAAAGrGA TAGCTGAATA AAGACAGAGT AGCAGACCAG CTCACG?1'AT TCGAGTCAGT TAAATAAGAG GAAGrAACTA AGGTGCTrA GCACCTGCTC AGCTATTTCG GTGGGCCTTT GGCCCTGGCC GOTAGA.AGCG CCACCP.CT C ACACTGGTGG TTTTGATTTA AAAAACTTGA TATAAACGAT GGTAAAATTC CTGTTGTCCG AT7TGGACAA TATGGTCTAT AC'NrMTT AGGAGAAAGC TAGATGTACA GAGGATCA'rG A'rCTGCCCCA AAAGCAAATA GCTACAATAC CAG1rrGA'rG ATTTCATA 240 -rGGTrGTAGA GGCTATTATG TATTCAGAAT CAA?1TACAAG AGATCCGTITT ACTGGCGAAA GGGAAAGTGG TGCGCGAGGA GCTTATAGCC GCAGAACAAA TACATAAAAA TAAAAGTCTA 'rATCCTAAAT AGTTACAATA GACGTTTGAG AGATTTGAGG TTTCGTrTTAC AAATTCAGCT ATGTATTGG;T TAAACTCT1CA CTGArTCC TGATAJLAATT AATGAGTGAG ATTTTTTAT'r CATTTGTGGA GCAAATTTGA TATGCCAAAA TTGAACGGGG TGAGCATGCG TTGACGGCTG GATTTCTATG ACGTCAGTAC AGACTATTTA TTGGGATTAA CGCTTTAGAA AATAATCTCC TCAArrTCAT AGAGI'TGAA TGCCCTTTrGA CAACTGAATA GCCTAAAATG GTACTTTCCT ATGGCTCCCC ATGATAAGAG CGATTTTAAA ATCATCAATA TGCCA'rGA'rA CAAATGATAT ACAATGATAC TTCTIGACCGT AGCACCAAGT GAAATTrTA TGATGACTTC ATCAGTCATG TGTTAGATAA ACGCAATTAA 'rCCTCAAAAG GTTCCCCGAA CATCACGTGG AGTGTGTAAG CT-rGTTGCTA AAACCTAAA AATAGACTTT CTGCGAAACA AA.AATATAAT ACAATAAAAC AACAATTGAG 
CGATAGCCGT 'rTCAAGATCC TTCTAGGTGT AGATGTTAGC TGTGTTAAAA ACAGCTTATC AACGTAAACG GCAAATTAAG CCTAGACGAT CTCCTTATGG TAACTA'rTCA CTT'ATGAACA AA'rTGCGGCT GATr-rTGGCA TTCACGAAAG AATGGGTTGA AGCAACTCTT ATTCAAAATG GTTTTACGAT 'rGTAAAAACA GTAAPAT'TCG AAGGATTGTA ACGTAAGACT GGTATAATAG CAATCAAAAC TAGAAAATAA AACGGAATT CTAGTAGAGT GGTCATACTA T1GAAGATTAG TAAGAGGCAC GATTCCCrAC T-rGCTTT'TAT CTATTrrGGG CT'rGA'rTGTG TAI-rrAATT GAAGAAGGCA AGACCCCTT GCAGTTGGTT AAATAGAGCG ATACTTTATA TCAGCCTGCC AACGTAAAAG CCACGTTGAA TGTGTGAGTT CC TT'TGAGT TCTACAGACG AACCTTGGAA CGAA.AGGAAT TATGAATGAT GKAGCAAGTA TCAGCGCACG ACTTTTGAAG CGCAAAAGGT GGACGAAAAA ATACATGCGA GAATAGAGCAt CAACTTAATC CGTCGGAGTC TTCAAATTCT GCCTTAATTC TTTTTTCTTT CTGAAAAALAT GGAACAGATT TGTCTGTATC 'rrATTAAArr ATTCCATCTT GTCTATTCGA CCACCAGTGC CGAAACCAAG GAATCT'rTTG 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1.620 1680 1740 1800 1860 1920 1980 GATTGTrACT TTGATACTGA TTGCCTTAAT 'GAGCGACTA ATCATTTTAG TTATATTAAT TATTGGTATT TCCGTAAACG GCCAGCTGAG TACTTAAAAA GCAAGAAGAA ATAGCTAC?=
GGGCATACGG
TCATTATTAT
ATGATTCA
TCGTTCTCCT
TTTTrAGTCTT CAACCAT'rCT TTATAATTG AGACTAGATT TTTGMAAA AGAAATGCTT TATTGT'rCT TGGCTCGT TTGGATrrCG GlTrGCAGGAA TAhCTATrCA TTGGTATrrA GCTCACCGAT TCTCCAAACA AG?1-rrGACT CAAAATCAAfP GGCTCCCCG TGCTTrrAAT
TGATTTAGGA
AA'rCGCTTAT
CTTGACCACT
TGTACCCAAG
CCAGTTAGCT
AAACTCGATT
CGTGATTGAA
GAT'rGGCGA'r
AATGCGACTA
CGCTGGTTTT
ATCAGCCrAA TCGGTGTTCA CGCTTTAGTG CCT'T'TAA AA'rTCTTATT TTGCCATGGT GAAAAACGAG GTTATTTGCC GAATTGGCI' TTGTTGGTGC
AGTTCTGATT
GGTTCCTrG
GGCGCTCGTA
GCcCT'TrCA
TCCTTTTGCC
CAATG;GCGGT
AGAAGCTCAT
CAGTCTTATT
GGAAGTTTGG
ATTATGTATA
GAATTTcCC
CAGTTAGTGG
GATTTTGCGG ATTATCTTGG TCGGTATCCG AGCGGAGAAT ACTCGCTGTC GGAGGGATGA TG7"rGGTTCA GGTATTTGTC CTTGATTCCA TCTACAGGAG TGACTITCCC C?1'CTTATCC AGTCTTATCA GTGGCAGTAG CCTTTGTCTT AAATATTGAT ATTGTACCCA GAATTGGAAA ATCAACCAAT GAACCTTCTG GATAGTN'AT GTCTCTTCAA A.AATTAGAAA ATTATAGTAA AAGTCTTGAT TC'TAACACAA TTACTGGAAG ATATTACTAA rCTGCCCCTT CTG;TC1IrGT AAAATTCCAG TATTCGGCTA GATCGTGCTG ATGCAGGTCA TGG~rTGGTC 'rAGGTCTTGG ACAGAC 1'TG TCTTTTCTAT TTAGCTCTCT 'rGIICAT CCTTTCAATG CCATGGTTGC A.ATATCGGAG GGATT'rCGGG CAGC-GTGGAA JArAGTCTI'CT GCCAGTGAAA AACGCGCTAA TTGAAGTAGr, ATAA.AGAAAG TAAAAGTGTT GTGCAAGAAC AAATrATGCTT GCCCCAGAGA 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3 360' 3420 3480 3540 3600 3660 3720 CCTTT-GAAAA AATAATACAG ?1'GAAAGAAT TATCAACGCA GGAAGAV'rA' CAAGGTCTAA ACCCTrAGT GACT'ACTA 'rCAAATCATG AAATCGTCTA TATTTCACGC TArCCTA TCT'rGCCTCT Trr'GATrAAT ATTTCAGAGG ATGTGGATTr ACTATGAA ATCAA'rCATC AAAATAATAT TGATCAGGAC TA'rTrAGGTA AATTATCTAC AACGA'rTAAA TTGGTAGCAG AAAGGAAAA TGCCGT-rGAG ATCCTAGAAC ACTTGAATCT 'rGTCCCTGTT TTGACAGCCC ATCCAACACA AGTGCAACGC AAAAGTATGT TGGATTTAAC AAATCATA'rT CATAGTCTTT TGCGTAAATA CCGTGATCTT AAGTTGGG? TGATCAATAA AGATAAATGG. 
TACA.ATGATT TGCGTCG'rTA CATCGAAATT ATCATCCAGA CAGACATGAT TCGTGAGAAA AAATTAAAAG TGACTAACGA AATCACGAAT GCTATGGAAT ATTATAACAG CTCCrNG AAAGCTGTAC 776 CTCA~?rGAC GACGGA~rAT AA3CGCTTAG CGCAAGCGCA TGGTCTGAAT rrAAAACAGG cTAAAccAAT cAccATGGG;T ATTGATAG GTCGGAccc TG.ATGGAAAT ccATrrGrA CACCAAAGAC C?1rGAAGCAG TC 'GCACTCA CCAACTGGA AGTCATCATG AACTACTATG ATAAAAAGAT TTACCAACTT TATCGTGAAT TTTCTCTTTC AACrAGCATT CTCAACGTCA GCAAGCAAGT CAGfAGA)ATG GCTCGTCAAT CCAAGGATAA CTCGATTTAC CGCGAAAAAG AGCTTTACCG TCGTGCCTTG NTGATATTC AATCAAAAAT TCAGGCAACT AAAACCTATC 'rGATTGAGGA TGAAGAAGrr GGGACTCGTT ATGAAACCGC CAATG;ATTTC TACAAGGATT TGATTGCCAT TCGAGArrCT CTACTAGAAA ATAAGGGCGA CTCCTTGATT TCAGGTGATT TTGTG4GAATT AT'rGCAGGCA GTAGAGATAT TTGC7TTTTA GACAAGACTC TAGCGTCTAT GAAGCCTGT TGGCAGAACT ATTCTCGTTA TAGCGAGTTG AGCGAAGAAG AAAAGrGTGA AAGAAGATCC CCGAA'rrCTT TCTGCGACrC ACCCAGAAAA Axr'rAGCTAT T7rrAACACG GCTCCTGTTr TGAAAGATAA GTCAGACCAT CATTCACAT GCAACCAGCC TrrCTGATAT TAAAAGAAGT ACGACTGGTG GATACGGAAA GGGCGCGTGT AAACAATTGA AGACTTGGAT CATTCAGAGG AAACAATGAG 'rTGCCAAAAA A'rGGATTGAC TCACGAAATA ACTACCAAGA ACAGTAATAA AGATGGCCGT TACT'rGTCAT CATGTTGGAC CTTAGCATCA ATTGATATGC CTTrGAAATCA GCAGGAATTC CCTTCTCTTG AAAGAATTAG ATCAGAATTA TrAGCAAAAG GTTGGGAGA'r GATGTCATCC GCTAGAATTA GCTATTCTGT 'rCAGATr-r CCCC'rLLTT17LG AAAATATCTT TCTCTTAGCC AATCATGCTT GGCTACTCTG CCTCTACAAG GCTCA.ACAAC CTTCTTCCAT GGTCGTGGTG TACATCTCAA CCGCTCAAGT AATTGGGAAT AAATACGGTA GGCAGCTATT AACCGTATGA 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 AATTGACTGC TATTGGAGAT GAATrrGGCG TTAAGGTTAC GTACTGTCGG TCGTGGTGGT GGGCCAACCT ATGAAGCCAT CTATCAAGGA TCGTATCCGC TTGACGGAGC AGGGTGAAGT ACAAAGACGC CGCrrACTAT AACCTTGAAA TGCTAGTATC
TTACTCAGAA
TAGTG4GACCC
ATTATTTCTT
CAGCCGCTCG
CATGG1'CACA
GAAGAGCGAT
TAGTTACGAT
CGAGTCAAGT
TAAGACT~r GAGTCGTG'rT ACCAATACCC CAAATCGTTA TGAAACCATT ATGGATCAAG ATCTACCGTG ATr'rGGTCTT TGGTAATGAG CATNTCTATG CCAATCAAGG CTATTTCAAG TTr~rAATATT GGTTCTCGTC ACTGAAATCG CTGTGCG TGCCATCCCT TGGGTAT'rCT ATGTTCCCTG GATCGTACGG GTcGT-rCA AGCT!'CAAGG AATTTATCAA TAAAAATCCA CAGAATA??G CTTTCTTCCA ATCGCTTCTT TCAAATGTG TTGCTTTTGA ATATGCTAAA CTTTCTCAAG
CTATCTTACG
ATATCGGrTTT
ACGAGCAAGT
ACATATGTAC CAAAATTGC GTCAAAATCA AATATGAATA TAAGGCCATC TA'rGAGACTA .777 TIrAAATGA ATGGCAAGTT ACTAAGAACG TTAI'CTrGGC TATTGAAGGA CATGACGAAC TcTTA~CG.A cAATccATAT CTAAAAGCTA GTCTGGATTA CCGTATGCCT TACTTTAATA ?I'rc AAcTA TAPcAGTTG GAGTTGATTA AACCCCAACG 'rCGTGGAGAA TTCTCCAGTG ATCAAGAACG ATTGATTCAT ATCACCATCA ACGGAATTGC GACAGcATTG CGTAAMTC.AG GTTGATAATT TTCAAGAGTG AATGCTAAAA GTGAATATCA AAAAAATTCT AATAGACTAT 'PCACAAGTAG ?rTAAAAATG ATATAATTTA ACCATTCAGA. AAAGTAATCA TACAACT?'r TrAGAGAGTC TGTGGTAGCT GAAAACAGAT CTATTrAGAA TTTGAAATTA TAAA.AATTrCG 'rGCCAGACTG
TCACACAAGT
GGCG'rcC -r AGCGATAGGG AAATTCCCTA T'rTTTGTGTG AcGATTTTrT GTTTAGATAA GTGAAATATG AAGTGGCALAT GATGAAAATT GGGCTGAATG GTAAGCACAC CTTAC.AG'rGC ATCrCGTTAT TAATTGAGGT GGTACCGCGC ATCGACGTCC TGATGGAGGT TAGTATGCAA AGAAAACGAT TTAAAGGAAA TAAAAAGGAG AAACAGAATG GCATTAGCAG TCTTAGTAGC AGGAAGCTTG CAGAATAATA AGGATGAGAA GAAAATAACC CCATCCCTTG ATTTGATTTA TAA.AGGGATC 5580 5640 5700 5760 5820 5880 5940 6000 6060 61.20 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6735 AAAAATAAAC G TTTAATTGG ATTT~ATTCTT CAATGAATAA AAGAT'rGGTG TGCTTCAATT CAAGATGGAC TTGCAGAAGA TCAGAAGGTG ACCAAAGTAA GACCTTGTGG TTGGTATCGC CTACCGrA TCATIGGCCGC AAAAAACCAG GTGGCAACGT AATTATrGCT
ATCAGAAGCT
TGTGAGCCAT
AGGATATAAA GATGATCAAG I-rAAAATT-GA TTTTATGAAC GGTTGCGACA ATGAGTAAAC AAT'rGGTTGC AAATGGGAAT AACACCAGCA GCCCAAGGGT TGGCTAGTGC AACAAAACAC TATTACAGAC CCAATTGGTG CTAACTTGGT 'rAAAGATTTG TACAGGGGTA TCTGACCACA ATCCAGCTCA ACAACAAGTT GAACTCATCA AGGCTCTGAC ACCGAArGTG A.AAACAATCC GAGCTCTTA CTCAAGTAGC GAAGACAATT CAAAA INFORMATrION FOR SEQ ID NO: 105: SEQUENCE CHAR.ACTERISTICS: LENGTH: 6516 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1OS: CTAGAGGATC CCAGCAGGTA AATTGGCTTC AGCTGGCAAA AAAGTTGCCC TCGTrTGAACG CAGCAAGGCT ATGTACGGTG GAACTTGTAT CAACATTGGT TGTATCCCAA CTAAAACCTT 778 GCTAGIGCT GCTGAAAAGG ACT= rGrrT 1GAAGAAGTC A'TGCTACTA AAAACACGAT CACTGGTCGC CTCAACGGTA AAAACTATGC GACTGTTGCT GG.TACAGGCG TAGATATCTT TaATGcGGAA GCTcAcTTcc ivTcAAATAA ACTCATCGA.A ATCCAAGCTG GTGATGAAAA GAAAGAACTG ACTGCTGAAA CAATCGTCAT A.M'CCCTGGA C?1'GCTACAA GCAAAAACAT CAAATTACCT GAAAAACTTG GAATCCTTGG CCTTTACAAC AAACTTGGAA GCAAGGI'CAC TCGTGCAGAA CCTTCCATCG CAGCTCTTGC ATTGCTTCAA AATATCCATA CTACTGAAAT AACTGAMGAC GAAACTTACC GTTTCGACGC TGTAGAACCA CTTCAACTTG AAAATACAGA AGTAGACAAA CACTGTCAAA CAAACGTTCC TGGCCTCAA 'TTTACTTACA TTTCACTTGA TGGAGATGCC AGCTATACAC 'TTGAAGACCC ACCTGCACTT TCACAAGTTG GTTT'GACTGA CGCTGTTAAG GAAATCCCCG TTGCAGCAAT CdGTGCCTTC AAAGCTGTTG TCAATACTGA cAAcAcTGG;T Gc 'G~rrcAA AcccTrcc CTTT-GACTCA ACAL.GTATCC TGGCGCAPLAT ATCGGTCTTG AG;TCCTAGAT GCCTTGGATA TAAACAATAC ATGGAAGAAG CAAAAACGAT GGTGACCAAG CC'TrC'TCTAC GCAACTGCAC
AAAGCTTGGA
AATTTGCCCG
CATTCCTACC
A'rGGCAT'rGA
TGCTTCTCCT
GCAAACCAAA
GTGCTATTrAA
ATGTCAACGG
GC1'ACCTTGC TGrrCATCAC
AACTTCCATA
TATTGAACTA
TGGTGTTT
TGACTTCCGT
TCTCAATGTG
AAGCCAAGCA
GCCTCGTG.GT
AACAAAAGAA
ACTGAACGTG
GCAG=TGAG
GTTGTTTACA
CCAAATACTA
GCTGA-rTGA CTCAGAAGGT TCTCAAGAAA TTACACTTAC TTCACAAAAC CTTGTTTGCG ATrTAAGTTG ACTI'CTGCGG AATCTCAAAT TCATCAACAT CATCACTG'T AAATCTTCAC TCACCCAACC AGATTTAATC GTrATCGAACA CACGTAAATG GAGACCTTCG
ATTCTTGGAG CAAGCATCTT
GCTATGGACA ACAAGATTCC TTGGCTGAGA ACTTGALATGA GCCCTCTTTG GGCTGT'rTTT 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 CTGTCTTTCT CCTCTTTTAT GATATAATAG AAACATCAAC
TTAAAAACTA
GGACGTGGAA
AACC'rAGCTA
ACTGCCCTCA
C'TTTGGGCCT
GTACGCTCCC
AGAACTACGA
CTGTCGGCAT
TCTTGCTGGG CGTTCTTCCC AcrCGTTTT AGGGAAACTC GCCCTTCAAT TTGATAAAGA GATTGTCCTT CTCACTGGAA CAAATGGAAA TrrAAAAGAG GT'rTATGGTC AAGTTCTAAC GGTGCCAACA 'rGATTACAGG GATTGCAACA GGGAAAAATA TTGCCGTCCT CGAAA'rTGAC ATCCAGCCTA GTCTTIrrrCT CAT'rACTAAT GAAA'rCTATA CTACCI'ATAA CATGATAI'TG GTTCTCCTTA ACGGAGACAG TCCACTTTTC ACCTTCCTAA CAGCCAAATC GAAGCCAGTC TATCTCGTAT ATCTTCCGTG ACCAGATGGA GATGCCATTC GGAAAGTTrCC TACAAGCCAA CTATTCCAAA
AAGCCCTCTT
TATTrTTACAAu
AACCCTGACA
CAACCCAAGC
TTCTAAAACT
CTCTGACTAT
CCCTTTCGG1' AACTGCTrACT
CCCTATAGAG
779 TM rrTGGTT TTGACTTGGA AAAGGGACCA GCCCAACTGG CTCACTACAA TACCGAAGGG ATTCTCTGTC CTGACTGCCA AGGCATCCTC AAAMM'GAGC ATAATACCTA TGCAAACTTG GGTGCC=ATA TCTGTGAAGG TTGTGGATGT AAACGTCCTG XrTccGAcTA TCGTTTGACA AAAC 'GGTTG AGTTGACCAA CAATCGCT CGCTTTGTCA TAGACGGCCA AGAATACGGT ATCCAAATCG GCGGGCTCTA TAATATCTAT AACCCC=AG CTG.CTGTGGC CATCGCCCGT- TCCAGGTG CCGATTCGCA ACTCATCAAA CAGGGA=MT ACAAG2AGCCG TGCTGTCTTr GGACGCCAAG AAACCTCA TATCGGTGAC AAGGAATGTA CCCwrGTCTT GA'rrAAAAAT CCAGTCGGTG CAACCCAAGC TATCGAAATG ATCAAACTAG CACCTTATCC ATrTAGCCTA TCTGTCCTCC TrAATGCCAA CTATGCAGAT GGAATTGACA CTAGCTGGAT CTGGGATGCA GACTTTGAAC AAATCACTGA CATGCACATT CCTGAAATCA ACGCTGGCGG TCTGAAATCG CTCGTCGCCT CCGAGTGACT GGCTATCCAG CTGAGAAAA'r TGTTCGrCAT GAGAATCAAG ACTGCAAGCA TGCCTATATT ACTAATCTGG AGCAAGTTCT CAAGACCATT CTGGCAACTT ATACTGCCAT GCTGGAATTT CGTGAACTGC AGAAAGGAGA ?GAACTAATG GTTTATACTT CACTTI'CCrC ATCAGCTCAA CATTGCCCAC CTCTACGGAA ATCTCATGAA CATCCTCATC CTrCAAGTATG TGGCTGAAAA AZ-TGGGACCC TTCTCTCCAT GATGACTTTG ATGAAA6ATCA CTACGACATC AGACTTTGA6A CAAAGTATCA TrGCAGACGA CCTACCTCCT CTACATCCAA AACCACGGTG TAGTTCTGGC TATCTGCGGT ATATTATGTr GAAGCTrCAG GAAAACGTAT CGAACGGCTA GCTCAACCAG ACCAATAACC CI-1-TATCGG TGACATCAAG 'rGGCTAGTCC TCAGATTGTT AAAAGATGGC AATTACCCCT TAC tACGGGG ACAATGGAAA CATGTG.ACCG T'rGACATCGT cCCTTTT'rCC GTGGTGGTCA AAAAAAGAGA GCATTGACAA GGTTT-CCALAC TA7TGGGTCA GGGGTCATGG GACACTACAC ATTCACAATG AAGATTTCGA 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 288u 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 TGAAACCTAC TATGGATTTG AAAATCACCA AGGTCGTACC TTCCTCTCTG ATGACCAAAA ACCGCTGGGA CAGGTTGCT ATGGAAATGG AAACAACGAA GAAAAGGTCG GTGAAGGGGT TCATTATAAG AATGTCTTTG GrrCCTACTT CCACGCGCCT ATCCTCTCTC GTAATGCCA6A TCTGGCTTAT CGCCTAGTTA CTACTGCCC2' CAAGAAGAAA TATGGTCAGG ACATCCAACT CCCTCCCTAT GAGGACATTC TCAGCCAAGA AATCGCTGAA GAGTACAGTG ACGTCAAAAG CAAGGCTGAC TTCTTAAA CAAAGGAAAA 'rGATATCAAA GAACTCCGTT ATCTTGTCGG AGTITTTTTGT CTT TCT= ACCCTrCTCC CTTGCATTTT CTCTCATTTT TTGCCAAAAT 
AGAGGGGTAG AAAGAAGGTA GCATATGTCT AAATTACAAC AAATCCTAAC ATATCTTGAA 'rCAGAAAAAC ,ri-rACA=~
CTCCTCTTT
GTGGGCTATG
GACTTCAAAC
AAAACAGrr
CTCATCAAAT
GCTGTTCATG
CAAAT4CGACT ACTGG 'GATA
GC'TCTTCTCC
ACAGTCGCTG
GCCCAACAAG
GCTGCCCGTG
CATGGTATCG
ATCGAAGAAG
GTTCG'rATTG
AGCAAAGATT
TCTAGGGGCT
TCTAACCCTA
GATGAAACCC
GATGTTAGAA
ATAGTATCAG
TTAGrrCTGT
TACTAATCAA
TCATAGCCAT
780 TAGACGTCGC TGTCGTATCT GACCCCGTCA CAATCAATTA CCTCACTGGT ATCCCCATGA ACGCCAAATG TTCCTC?11G TCCTP.GCAGA TC.AGGAACCT TCCCAGCTCT ?GAAGTAGAA CGTGCAAGrA GCACCGTTTC CTTCCCAGTA TCGA~nNCT'GA AAATCCATGG CAAAAAATCA AACATGCTCT TCCACAACTT GTGTCGCTGT TGAGTTTGAC AATCTCATCT TGACCAAATA CCATGGTTTG 7rGAGACTGC TGAGTTGAC AACCTCACTC CTCGTATCCA ACGCA'rGCGC CAGCTGATGA AGTGCAAAAA ATGATGGTTG CAGGTCTTTA TGCTGACAAG TTGGTTTTGA CAATATTT-'CT CTGATAAGA CTGAGACAGA TATCATCGCA TTGCCATGAA ACGTGAAGGT TATGAA.ATGA GCT'rTG;A'AC CATGGTCTTIG ATGcTGCGAA TCCACACGGC ATTCCAGCAG CTAATAAGGT TrGAAAATGAT TC'TTTGACCT GGGTGCTrCTG GTCAATGGCr ATGCG;TCAGA TATGACTCGT TCGGCAAACC AGACCAA'rTC AAGAAAGATA TTTACAACTT GACTCTTCAA CTGCTrCTTGA CTrTATrCAAG CCAGGTGTGA CTG.CTCATGA AGTCGACCGC AGGTCATCGA AAAAGCTGGT TATGGTGAGT ACTTCAACCA CCGTCTCGGG GTATGGATGT CCATGAATTC CCATCTATCA TGGAAGGAAA CGACATGGTC GCATGTGCTT CTCTGT'rGAA CCAGGTATCT ATATCCCTGG TAAAGTCGG'r AAGACTGCCG TGTTGTTACC AAGGATGGCT TCAACCT= T'ACAAGCACC 'rGCTTTA'Trr TGATTAAACT ATATAGCCCC TATGCTTTCC TrrCAAAATA AqrTTATTGT CATTTTTCTG CTATTAT1GCT AAAGAAATTG GCTGCAA'rAA AGTGTCTGGA ATGATAACGA GGGTGC'?CTC. CGCTT?-ATC AAAGACAAGG CAAGAAACAA CAATGGAA.AT GATAA?1'GAT TAAGAAGTCA TCTATCAAAA AAAGTTCAAT TTCACTAGAA AATGAGGAAA ATCTCCCCAC A.ATA.AAACGC GTATTGTGTA CTGACCCCAA ACAGTTAGAC AATTAATTA TCCGAAGGAT ACTGCACAGG ACrAAGTCCT TTTAGTTTTA CCTTAATTCG TTTGTTGTTG TATAGTCTAT AATGACTTGT TCCAATTGGT TAACTCATrr AAATG=r~C AAAACATrC GGATTAAA A'rGCCAAAGA AAGATTCCAT CATACCGTTG TTCCC'T'GCG TGACATAGAT GCTTGAArrC CCTTATTCTC TAGGAACCGA CGTGTTGGTA TTGCCAGCCT TrGTrCACTAT GGAGAATCGT ATTCTCGTAG TGAATGCCTC 'rTCCAACATT GrTTGTACTT ATTCTAAAT ACGCGAACAA AAGCAATAXI' T'rCGCTGTTA AAGCCATCTA AAACI'CCTGA TAACTAAAGC
TCTTGGCTGT
'rGATAAGAAT
TGCTTCTCTT
GAAAGATTAA
11?GAGTAC
TTAGAGCCTT
TGAGAAMAAT
CCTTGAACTC
CGATAAGCAT
TGGTCTTTAT
TTAATGGCTT
GTC71'TCTTC
CATTCTCCGC
CTGTATTGTG
ACTCTATCTT
TTGCTGGAAT GGCAAATTCA GTCACATCTG TGTAGCACTT TTCCATTGTr CAAATT-GGGC TTGAATGAGA TTCTCTGCCT TCTTACCAAC GTCTCCTTrA ATTTTCGTTT C~TTCGCATT TrAGCTrGTA AAT7GAGTAC TTTCATCAAG rI-rTArATT- TACCAGA'rAA CCACGA'VrTC TTAGTTCTAA ATGAACCCGG AATrTCCCTT GTGI'CGATA AAGATGGATT GAATT1TCAGT TTrAAGCTCT CTGTITGTC TAGCTGTTTC AAGTGATAGT AGTAGGTCCA ACGAGCTAGT CTrAGAAGAAG ATCTAACGAA AACTCAGTCA rTAATTCTTG AACAA1'TTCT TTTCTCTTT TCCTCCTTCA ATcGGAGTrC TCTTAACTTT TTTAGGATGG CTAGCCACTT AAGAAGTArC GTACGACTrG GGAGACCGTA 'rrCAAGAGAA TAGTCCAGCC 'rTCATGTCAG ACTI'TAI'TAA CCCCAATTAT TCACCCCAAA ATCCAGAATC CTTGCCTrAG CTTAGATCCT GGATGGTTrC TTTTTTCACC 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 651.6
TCTAAAAACC
CAATGGGTG'r TTTTTACTAG AAACACAAAA AGAAAGGAAA ACAAAAAAGA. GTTTCCCCTT TATGGTATAA GTGrAGAAA.A CTCACATGAA CAGTTTACCA AATCATCACT TCCAAAACAA GTCTTTTTAC CAACTATCTT TCGATGGACG TCATTTAACC CAGTATGCTG TTTTCAGGAA CTTTTT'rCCC AGTTGAAACT AAAAGAGCGG ATT TCTAAGT GAATGArnCAA CGCCGCTACT GTCGTTATTC GGATTCAGAT ATCCdT'GTCC TCAACTGTTA ACAGGTTATG GAACGGAATA TGCTTG INFORMATION FOR SEQ ID NO: 106: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 14654 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
GTCTTATCTT
ATTTAGTAAC
AGTTCCTCTT
*Sb 4 (Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106: TTTTCAACCC ATATCGTGGC TCCTGAATAC TACTTACTGA CAACTATGCT ATCAGAGACT TCTCTACTTG TTTTCTATAT CATTTTCATC CATAGAAAAC AACTCATCCA CTTGGGACAT ATCTTTAGCT ATACTGTTCG ATACTCTCTC 'T TTCACTTT CCTTTGTAGC AATTTATTTC CTGATTAATT TCGTGTATCC TGTAGATATG GTCATTAATT TGCCATTrTTT GATTAATACT GG?'rTGATTG TCTTGCTATC AGCTATCTCT TATATTAGTC TAC--rTGTCTT CACAA.AAGAT 782 AGCATTTI=C ATGAA7TT=~ AAACCATGTC CTAGCCTTAA AAAATAAAr TAAAAAATCA TAGGAGTTTA AAATGAAACA ACTAACCGTT GAAGATGCCA AACAAA'lrGA AMrAGAAATT 1-rGGATTATA ?1'GATACTCT CTGTAAAAAG CACAATATCA ACTATATTAT TAACThCGGT ACTCTGATTG GGGCGGTTCG ACATGAGGGC TTTATCCCrr GGGACGACGA TA?1'GATCTG TCCATGCCTA GAGAAGACTA CCAACCATT ATTAACATr'r TTCAAAAGGA AAAAAGCAAG TATAAGCTCC 'rATCCTTAGA AACTGATAAG AACTACTTTA ACAACTTTAT CAAGATAACC GACAGTACGA CTAAAATTAT TGATACTCGA AATACAAAAA CCTATGAGTC TrGGTATCTTT ATCGATATTT 'rCCCTATAGA TCGCTTTGAT GATCCTAAGG TCATGATAC TICTTATAAA CTGGAAAGCfl TCAAACTGCT GTCTTCAGT AAACATAAAA ATATTCTA 'rAAGGATAGC C=r~AAAAG ATTGGA'rACG AACAGCCTTC TGGTTACTCC 'rTCGACCGGT TrCTCCTCGT TATrGCAA ATAAAATCGA GAAAGAAATT CAAAAATATA GTCGTGAAAA TGGGCAATAT ATGGCITTA 'rCCCTTCAAA ArrAAGGAA AAGGAAGTCT TCCCAAGTGG TACCT -rGAT AAAACAATCG ATTTACCCr TGAGAATTTA AGCCTTCCTG CACCTGAAAA ATTTGATACT ATTTTGACAC AATTTTATGG AGATTATATC ACCCTACCAC CAGAAGAAAA ACGCTTCTAC AGTCATGAAT TTCACGCTTA 'rAAATTGGAG GATTAGGATG CAA'rATT-rAG AAAAAAAAGA AA'rTAAAGAA ATTCAACrAG CCCTGCTGGA CTATATTGAT GAGACTGTA AGAAACATGA TATTCCTTAT TTTCrAGTT A'rGGAACCAT GCTGAGCC ATCCCCCACA AAGGTATGAT TCCTTGGGAT GATGATATTG ATATTTCCCT TTATCGTGAG GATTATGAGC GTTTACTGAA GATTATTGAA GAAGAAAATC ACCCTCGCTA CAAGGTT-CTT TCCTACGATA CATCTCTTG GTACTTCCAT AATTTCGCAT CGATTTTGGA CACTCTACT GTTATA.GA6AG ACCATGTTAA GTACAAGCCT CATGATACCA GCC7TTTTCAT CGATGTCTTC CCAATTGATC GATTTACACA CTTGAGCA'rT CTCGACAAGA GCTATAAGTA TGTGGCTCTT CGTCAACTAC C'rTATATCAA AAAATCACGA GCAGTTCACG GTGATAGCAA ACTAAAAGAT TrTCTTACAT TATGTAGCTC GTACGCTCTC CGATTTGTCA ATCCTCGCTA CTI'TTACAAG AAAATT-GATC AACTACTCAA AAATGCTGTA 
ACCAACACTC CTCAATATGA AGGAGGAGTr GGGATCGGTA AGGAAGGGAT -GAAAGAAATC TTCCCAGTTG ATACCTTrlAA ACAACTGA TT TTAACTGAGT TTGACGGCCG TATGTTGCCT GTTCCCAAAA AATATGACCA ATTTTTAACC CAGATGTA'rG GCGATTATAT GACACCACCA TCAAAAGAAA TCCAACAGTG GTATACTCAT AGCA'rTAAAG CTTATCGCAA AAACTGATT'G AGGGGGATT-A TACAAACTAC 'IAACATAGAG GTTArrCAAA AACATAATT TAGTACAAAA TGAAATACAT ATTCCCACAA TAAAACGCAT CATATCAAGG ?lrr'rCAAAA 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 783 ACCTTGATAT rGATGCGlrr ATAAM?11AA ATG.CGAACAA ATCAATTAGA AAATTCAAAT GTACTGTTCT AAATTCAGTC TGCTATATCT AA=CACGAC V1-rCAAGTAC CTTAAGCATIG GGCACCCCCT GTTCAATGGC 'rGAACGACGA T7TTGTTCCTA CTGTGTTAAT GATAGCTTGA TCCTTATCGT CATCACCAAT CTTACCAACA AGACT'rTT'
TAATTTATAG
TATTTTTCTA
OCATTAGCTG
AI'TGCTCAC
ATTCTTCCTT
CGTTrCCC= CCA'rAACCAA GCGA'rGGThA
CTATAA
AAATATTTTA
TITAAATCGC
TATCrAGCGC
CATCTTCGTC
TCTACAAA
GCA.AGCCATG
TGTTTCGAA
AGACGACATT
TTGAAATAAG
GTATTCCTGT
?1'CTGTAACA
TGTAAGAGG
AGCAGTCrGT
ACTT'GGATA
ACTAGCAAAG
ACGACGACC
ACCAAAAG~T
AAGGCTGCTG
AAGTTCAAGGC
CGCAAGTGTA
TCCCTTCTGT
CTc~rc
GATAACAAC
CGCAAGGATr
GGCATCATCA
TrCAAAGCCT
TGACTTCATT
'1rATAGAGAC CTT'1-1-CCAA AGTAGCATCA TCAGGACCGA GCAAGCTG;TC TACCTTAGCT GAACCCATAA CTTCACCTGT AGTTTGGTAA AGGAGAAGAC CCATTTTGGT AGCCAAGTTC GCCATAGGAA TATTGGTTAC ACCTCAATAA CGTACACTTT CAGTGAAGAC CGATTGCTAA GACAAGGITTT GTGGTGGGTA ATATGCTCCA TGATACCAGG CACTCTT-GCC CAACGATATA GCAGTTCGCA TGTAAGAACG CCAAGTACAT AAGATGGGCG GCTTCTTCTT CATTGGTAC GCT'rGCTCGA AGAGGTCACG ATGGTCACAC CTGCTTT'1TGC TGAACGATAA CTCCCTTT'GG AGGTGCCI'G ATATGAACAC GGGTGC~rC AGGGTAAAGT TGATAAACTT TGACCAAGAA TGAG'rTTGC;T CGCTACTTGA CTTAGATAGG AATGGA.ACAC TACGGCTGGC ACGTGGATTG TCATCCTrG ATAACAAACT GGATCAT CAT'rCCAAGG GCG7TT'GGTG TACTCTGCC-A TGGTCTCCTG AACC'rTGC AACAGCCATT GAGTCACCTG AGTGGACACC AGCACGTTCG AATGAGTACA TTTTTACCAT CTGAAAT1'CC ATCAAC= CG AGAGTCGACA AGAACTGGGT GGTCTGGACT AGCCTTAACA AAGGTCTTrCT TCGTTTTCAA CCATTTCCAT GGCACGTCCA GACAAGAACT GGGAAGCCAA TCTTGCGAGC TGCAAGAGCT CCTGTCC'r GGTGGCTGTG GAATATCCAA 7TCrTTGAGA GTCTTCGGCA CGATCTAGGT CAGCAACCTC TGTACCAAGG CAATGGCTCC GCAAGGTA TCGCTG7T-TG ACCACCGAAC TTGT'rCCAAG TCAATGACGT TCATAACATC TTCCAATGTC 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 AATGGCTCAA AGTAAAGC1-r ATCTGATACA GAGAAGTCTG TTGAAACGGT GAGTTCATGA TGATAGCT'TC A'rAACCAGCT GCCTGGATAG CC'TTAACAGA GCGTAGTCAA ACTCAACCCC T'rGACCGATA CGGA'N'GGAC CTGAACCTAG GATTCTTTAT CAGATCTGAT AGATrCAT'rT TCCCAACCAT AGCTTGAATA
CTCTGGGT
GTGAACGGT'r
GACAAGTACA
GAAATATGGC
G1-rTCGGAGT CGAACTCTGC CCCACAAGTrG Tr'IrCCAAGC GAAGTTGGCG AACI'TATCA CGGTCTGAAA AACCATTAAAG TTTGGCTGTT CCCAATTCTT GCTCAATTTC AAAGA'rATGC 784
TCTACCATCT
TCAGTCGTTrC
TTCAAAACTT
AAGAGTTTAT
TAGCCACGAC
TrrTCAATCA
TTTGTAAGCT
AAGAGACGGT
GCAAGTTC.AG
AGAGATTCCT
CCGAGACGGC
ACGTAGTCAA
TCATCCAAGG
CTGCAATTTC TTCAGGTCTG CATCTTGGGC TTTrGACAACC
TATAAACTGG
CCCAC.XGTrC
CTAAACTG
CAAAGATAGAA
GAATCGCTTC
AGGCATCATC
GGGAGCGC.CA
CAGTCGCCI-r
AACGTGGAAT
AACAATCTTG
AGCAATCTTA
TGGATGAGCA
GATATCAAr
TGATACGTAG
AGAAACTGCT
GGCCTTGAGA
CATTTGTGTA
CTTAGCAACT
GTA7'rTCATT GTGGTGCACC CCAAT~rCAA CGATGTTACG ACCC.ATTGCC ATGACTTCTC GTTCACCCTTr TTCAAACT T'CAAATGGGA GGGCTGGTTC AAACATGGCA TAGGNTGAAC 'rCAAACCTAC TGCAATCTTG GCAGCCAACT CTGTAACTGG GTTTATAACC TAGCA6ATCGG ATATCCTGTC GCTTTAGA6AG CAAGGGCTGA CGAACGTGAT ACACGACGGT TTACTTCGAT AACATAATAC TTGAAGCTGT TAGGA'rCAAG CGAATAATGC TCAAGCTCC GCAGGGGCAA ATACA6ATGGA AGCTAGCTGA ACATITACATC CACCTrCAAT CTTGAGGGCA A'rCACGAAGC ATGGTTT CATACTCTGA CATCGTTTGC ATCCCCTGTG TGAATCCCAA CTGGGTCAAA GTT'TTCCATG TTACAAACAA CCAAGGCAT'r GTCAGCTGAG 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 TCACGCATCA CTTCGTATTC AATTTCCTG AAACCGG.AA TCGAACGCTC TCAGTGATT CACGCAATTC GTAAAGGCTG GACGAACGAT TCTACTGTrGT TAACAATTTC TTAAAGAGGT CACGGTCCTC AATCAAACAT 'rGGGTAACAC TTTCTCGTTG GCACACATAC GACTGGGTAG CCAA'rTGTCG GTGACAATT'r
CACCACCAGT
CTGCAAAGC
CAAACCATT'
ACCACCAAGG
AACTGCTTCT
AGATTCTGGA
CGCTTGGTCA
ACGCCAAGCT CGTCTAGGAT ACCATTTrTTA TGACCACCGA GTGTTGGTAG CAACGCATCT AACTCAAGTG TAATCGGTTC AATGTAAACC GTTGCAGGAT T'TGAGTTAAC CAAAACAACC ATrGGGTrGTT CAAGCTCCTTC CATCAATTC-T ATGGCAGATA ATTGGTACC CAGAAGTTCA GATAATTCCA TGGCCATGTT GAGACCTGTC CCACCTTCCT TACGAAGAAT ACGTGTCACA T'rGTCAGCAA TTTCCTTGTC CGTCATGATG TCATAACCTT CCTC?'I-rCAA CGACAAGCPA GCCTCACCAA TAATAATCGG ACCAGAACCA
GCC'TGAGTCC
A'rCACCATAA
AGCGGACAAA
AAACPCA~CA
TT'TTTGAAT ATCAGTACCT TT'AGGCATAT C&AAAATAG GAGTTA'rGAC GAAGAACTGT ATAAGATATT AAGGGTGTCA CAGTTCTAGG AATAACTATC TTTTTAGCAC CGTCCGTAGC CCGTATTCAG TTCAGCAAAT ACGGAGCACC CTTCTCCTT'r CTATTCGCG CCTCTCAGGG CGACATTAAA TAAGATACAA AGGACGAATA GAAAGCCATT GAATTTrAGG AAATCAAGGA AGGATTGACA ATCCAAGTTG GTT~cTAc ATTLPCAGCT 5700 7 ?CCGTCCGT GTCAGTTAC ATAAATTCTC CGACGAGCrI' TTACTCG -rC TTAGTCAT 5760 TGTTTAAAAA CTTCCMTCAT CTCGATAAAC TCGTCAAATA GGTAGCTAGC GTCGTGTGGC SB820 CCAGGAGCTG CATCTGGGTG GTATTGAACA GAGAAAGCAG GrrGG'TATC? GTGGCGCACA 5880 CCTTCCACTG ACTTGI'CATT- GATTCTTCG TrGGGTAATAA TCAAGTGCTC TGGCAAM'CC 5940 TCGCGGCTGA CTGCATAACC AT=GTTCTGG CTGGTGA6AGT CTACTCGTCC TGIGCATT 6000 TCACGTACCG CATGTTGAA TCCACGGTGG. CCAAACTTCA TCTTATAGGT CTTAGCCCCG 6060 TrTGCCATTG CAAAGAGTTG GTGTCCCA'rA CAAATACCAA AGATTGGAAT 7wrMCCr'rGT 6120 ACACCCAA TCATG'rCGAG TCCTTGTGGA ACGTCTTCTG GGTTACCTGG ACCATTTGAC 6180 AACATAACTC CGTCAGGATT GAGATGGAGA ATTTCTTCAG CCGTTGTCGA ATAACGAACA 6240 ACTGTCACGT TACAGTTGCG 'rTrAGAAAGT 'rCACGTAGGA TTGAGTGCTT GAGACCAAAC 6300 TCCACTAGCA CCACGCTCAA ACCAACTCCT GGAGCTGGAT AAGACCTTTT AGTAGAAACC 6360 TGTrTTGATAT TGTCTGTCGC TAAAACTCTT GCTTCGAGCT GGTCCGTCAC ATGGTCCATA 6420 *CTGTCCCCAA CATCGGTCAA CGTTGCACGC ATAGTACCAT GCTTACGGAT AATCTTGGTA 6480 AGAGCACGCG TATCAATTCC TGAAATCCCT GG.AATrCT TGGCTTTCA-A AAATI'CATCC 6540 ***AAGGTCATTT GGTTGCGCCA.GTTGCTAGCT CTACGCGCrT CTTCAAAAAC AACCACTCCC 6600 *TTACAAGTTG GAATAATGGA TTCATAATCA TCACGATTAA TACCATAATT TCCTACCAAA 6660 GGATAAGTAA AGGTCAAGA'r TTGTCCATTA TAAGACTGGT CTGTAA'rGGA 7rCTGGTAG 6720 CCGGTCATCC CTGTATTAAA GACGAT?'rCG CCTGTTACAT CAATATCTGC TCCGAAGG;CC 6780 *TTGCCTTCA6A AAACTGTGCC ATCTTCTA.AT ACTAGAATC 7rTTTCAT A7=TCACCT 6840 CTCGTGGACG CTCACTGGCG TCTTTTAACG TCTTGTGTTT TAGTTCGCGT TTCTACTCGC 6900 TAG'TACGGAT TCTA6AGATTG CCATTCGAAC AAAGACACCA TTGGTCATTT GTTCGACAAT 6960 *CCGTCATTTT GGTGCTTCAA CCAAGTGGTC 'rGCTATTTCT ACATCACGAT TGATTGGAGC 7020 *TGGTGCATG AGGATTCTG TT-TCI-rTCAA ACGA'rCGTAA CGTTCTTGAG TCAAGCCATG 7080 TTGGGCATGG TAGTCTTrr TTGAAAATAC 
AGCTCCACTA TCATGGCGTT CGTGTTGCAC 7140 *ACGGAGAAAC ATCATGACAT CAACCTGATC AATGArCA TCAATGGTTA CAAAC1'GTCC 7200 ATAGTCTGCA AACTCTTGAC TT~CTCCATTC CTCAGGTCCA GCGAAAAAGA GTTCAGCTCC 7260 CAAGCGTTTC AAAATCTGCA TATTGGATTT GGCAACGCGT GAGTGGTCCA AGTCACCTGC 7320 AATAGCAACT 'rTAAGACCCT CAAAGTGGCC AAATTCCTCA TAAATGCTCA TCAAATCAAG 7380 786 CAAGCTCTGG CTAGGGrT1T GGCCCGAACC ATCTCCACCA TTGATGATGG AAGTCGTAAT CGTTGGACTA GCAATCAAT CTCTATAGTA GTCGACCTCT GGATGGCGAA TCACACAGPAC ATCCACTCCT AAAGCAGACA GAGTCAAAAT GGTGTCATAA AGTGTCTCAC CCrATAAC AG'1-1-AATCT CTGCGACTTC CCAOCTAGTC T1TCACATCAA AGTCAAGTCG ?TCCAATCCA AAAGGACTA TGTGTCCGG TAr-IATCCTC AAAGAAGAGA TTCATAGGGA AGCTIGGGCTC CATTTTTAAA CCAATTCCT rTGATCGACA GTGAGGTC1'T CCATGGACAC CACATrG=TC CA'rGGCTACT CC -IrAACTT TCTAAGC~rC 'rrCACTAATC AAGr'rCTGTC ATCTCTACGA TGATTTCTTC AGAACGACTG GTAATCTGGA CGGATTGGCA ATTCTCTATG TCCACGATCG 'rGGAAACAA
CGCTTGATCA
AATGCTTGTT
TCGGATGGTC
AVI-rCATTAC
GATTCTGA
a C gaS .aaa a a a. as S a ACCGCAGGA CGACCATGAC ATAGAGCACA TCA'rCCACCA AGI'ATCTTCT CCACrrTAA AACTGAAAGA TTTTCTAACT ACGAGrTrTA ATACCAGCCA ATAAGTAATA CGCGTAATCG CATGACAAAC CTCCAAAAAG AGCCAGCCCT ACTGCACACA GCCTCACGGG ACAGGTTTAA TTAAAA'rCAA GCAAAAAGTT TGGGTACTGG TCACACTCTG GGCCTGATGG GCAGCTAACC CCAACZTrCG GTCC-CTGATT CGACAATATT ATCAATAGCA AGATAACTTC GCGGTCTGTC CATCATCACC GAAAGGT'rTA GCTTCAAACG TTC1"rGGATT AGACGATCTr ATTCAAATCT CTCGTTTGAC GGTCAATTCG AGAACTCTGT C7TTGGTCATC GTTGGGATAT T'rTTCCAAC ACTAGAACTG CTAAAC1'CAC GCACGGATGG TACGACCTGr ACATCGACAG AAACCAAAGA GTATCCAAT CCACAACAGG CGGTGGGCAA TAAAGACACC TTrGTTGCGTT CGATAATCTC TCTACAACTI' CT'TTrGTrTT AGACTITGAAA TTTATAGCCA CTTTATCGCG C'TCCTrGCCT TAC'TATAGCA CAAAGCATGC TTGCTAAAAT CATATAATTG 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180
AAAAGTCTCC
GTATACACITT
ACGAATATIT
TCAATGTAGC
TTAAACAAGG
CACCCTTCTA
AGI'TATCATT
ATCTTACAAA
AGTATCCACT
GCCACCAACA
ATCGTCTAGTr
CAAGICGAGAA
TGGAAAGGCA
TGATAACATC
GCAAAGGCTG
CCTGCCAAAC
AACTGrrGGG
ATATGTGAAG
ACAAAGAGAC
ACCGCAACCA
GAT7TN'TGG ATGGCAAATG GCTCTTCCAA AATAAATCAT ACTGCTCAGG CGGCAAGATA TCCATGACCC CTTTCCAC TTTGACAAT ATCCTGGTGI' TTGCAAATAC GCTCcAcATrG GAATTrCCAAA TCCTACACTC ATCACAACAT TGGCTGTCTT TCTCCAATTC TTCACCTGTC 'rGAGGGACTT GACCATCAAA CACATrr AAGGAATTTA GCT'rTATTCC GATACAATCC CAATCTCACT CTCTGTCGCT ACAGACATAG CTTGGGGTGT CTGGTGTGGC CTTATTTACC GCTGCATCTG TCGTCTGGGC GGAGTTCAAA ATCATTGGTA AAATCAAGAC TAGGCTTGGC TTTCTTCTAG CACCTTCGT GCTCGTrT TTGACAAGAC ATCTGGGAAG AGGGCAATGA
CATTATTCAT
TTTACTG=?
ATAAATCCTT
787 CTCCGTCAAA TACTCCTTGT AAGCCAGCAA AACGACTCT'r TTtrC TTrGAGCTrG GTATTCTTCC TCTGTCATCA TrGCCAGTC ATTCCTGAG GACCACTC TrCT~rCAGCC GTCAACACCT TCATAGGAAT G~rAGCAGG 4
4* 56 5 ATA~rTCTG ATACACTCTC AGCAAGGTCA AGCTCCCCAT r ?CATGGG CAAGAccAAG TCATCATCTA AAACr=~G ATCrAGCTGG TrAGTTGCGC CI'CCATGAA AAC1TrCCGTG ACTGGATAAG ATTCAACTAA CTCAACTGGC TCCATACTGC GACTCGACCC AACAACAATG GTATAAGATA G~rMATAATC TAAGAAATAC ATACGGTcTr CATA'rTGTAC TTCCCAACT GCAAGGATAT CTrrACATc TAAAATTTCT TGA?'rACGTG CACGCAGGTC ATCAACTAAA TCTAACG'7r GTrCAAAGTT CAAACCTTCA CACTCrAC GAATTTCTTG AATATTTAAT TTCATACT'rC CTCCATAAAG ATrTACTCTC TTCATTATAC CATGAAAAGG CrACAAATCA GCACACCAAA CTrGAATT- AAAATTCAAA ATMtA.ACAT ATTTACTATG ATAGi-ITAT 7TTACTGC TA'rACTATAG GGAAAGCGTA CATCAGATCA AGGAGGATGC TCACATGGAA GACAAGAA.AC TCATTCAACT CCTATCCAAG TTAA.ATAAAA GCTACCAAAA CTGTAA.ACAG GGTACGGCAG A'rGA'ATTCG ACTACAAGAG CTGCTAAACA CTACTATGCA AGAGCTCAAA AAA.ACGGAAC AGTTGAACAA CAGTATCTTA ATTGATCTTG AGAAArrTA CCAACCTACC AGTCTTCTGA TTGGACTGGG TAGCCTAAAA CTAAACGATC AAGCACGCAC TGCTTGGCGA 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800
4 5 AACTA'rGATA AATTCCATT-A CGA'rCATGTC T'rTGALATT-rT AGAGCATAGA ATTTCCAGTT ATATA.AGAT ACTAATACTC GGAGGTAAGG AGAACCTTTA TTAGTAGAT CTTGTTTTTG TTCTGGCATG TTAATACAGT 'TrTTTGACA CCTTTGT ATGCACTATG AACA'rTCTAG TCTAACCATC ACACAAAACA AAGATTTACG CCALAGACCGGG GAAAAGGTTG CrATTATTGG TAAAATTTTA ATGGGGGAAG CTCTCTCA CTATCAGTCA CTGGCCTACA ?TCCTCAAAA ACACCACTAC TTCTTTTTAG ATTCTATTGA AAACACGrAC TAAGTTCTA TGGACCTGTT TTCTGTTGAC AAAATTTCCT TAAAGGTATA CAGACATGA.A CAACTAAGTC TATCAAATAA TCTCTTTG TGTGC7-CTTT TATGCTCTTT TAGACTTTGG CCTCTACTAG GTAAAGTAGA AA.AGGGAAA'r CATATCATAA AAATCAATCA AGATC1-rGTA TCTGACCTAA CCATGACCA'r TGAAGAAGGA AATGGCAAAT CAACCTTACT TTTCACTATC AAGCGAAACA TCCAATCTGA AGTCCCTGAG GACCTAAAAA AGAAAACTTT TTTAGACTAC AGTATCCTCT ATCGrTTGGC GCGGAArrG CATTTTGATA GCA.ATCGTTr CGCAAGTGAC CAAGAGATTG GCAATCTATC AGGGCGCGAA GCTCAAAA 'rTCAGCTTAT CCATGAGTTA GCCAAACCCT TTGAGATTCT 10860 10920 788 ATIr~rAGAT GAACCTTCAA ATGACCTALGA CCTTGAGACA GTTGATTGGC TAAAAGGCCA
GIATTCAAAAG
AACGGCAGAC
ACCAGGCAAA CCGTTArTT CA'?rTCCCAT ACTATTGTrC ACrGCGACT GMCAAACAC AGTAGAGCAT TTACACTATG AAGTCAGCAA GCTGCrAACA AGTTAAGCAA AATGTAGAA.A ATTGGCTAAA AAGATGAAAA G;TCCATGACT CAAAAGCCAC ACCATTACCA GCTTCTAAAG CCGAGTTG GTTCAAAAAC CGCGCCAAAT GGTGTTGGGA TAAAAGAGAG A~rrTCACTTG TTTATCCCCA ATAGCCTATC A'rCTCACCTA GCTAGTCTCA ATAGCTAJTAG 747a.GCAGAGA ACCAAAGAGC CTACGATAAA CTGCGCI'CG AGCTACCAAA CTCTCCTCTC ACAAGAAAAA ?rGAAGAGOA ACAAATCCAA TCTTACI'CCA ACTGGAAAAA GAT1GAAGACT
CGTAAAGAAG
AAGGCTAAT
ACCATGGAAA
GATAGTACTG
CGCTACGAAA
CTTTTCTrr GAAAAT'rrGT
CAAGAAAAAA
TTACAGAGAC
TTC?1-rCIGA
CGAAACGCT
?rGCCAAACA
AACATCGGAG
CCGGTC~Cc
AGGCAGCTCA
CAGACATCCA
CCATTGACGA
TCCGTAr-rAT
TTCTGAATGA
TACAACTAAC
AATCAACTCT
G'TTr'ATGCC
TCAGTAA.AAC
ATTTCAGTA
TGTCCCTGGCc G~rrAGCCAAG ATCTGGCGGA CAACAGGGAA AAC'TCCTGCT TCTCCTGCTG GATGAACCCA CACGAAACTT ACTCTTTGCT ACCTATCCAG GCGGTCTC-AT AGAAGTCTGC 'rCGATCATCT ATCGC-ATGAC AGATTTATAA ATTTGCAACA TAGCAAAAAT TGTTrA.AAC GTTCAATCCG TTCTGAGATA CCCCCACCTT TCTTAGGA'rC ATTCATATAA CTCATACGTT TGCTATTGTC CAAC?1'ATCT GTCAGCTCGA CACTAGATGC A'rCTGCCAGA ACCAAG;GTTG CAGCGAGAGG TGCCACTACA A'N'TCAAGAC CAT!TrCCATC TCGGTCATCA ACAAGATTAC CACAAAAAAC TGCAA'rTGGA TGGGGAAAAA GAGGAACTAC AGAAAATCCA TCCAGAAATG CAGCATCAAA TTCGCTCCTT TTTGGATTTA GTCCTGCGCA AACCAAACTT 'rrCTCCCACT TCTCAACCCC AAATCAGAAA CACTGTTTCG CATGACCGTC GTT'CTrAAA AGAACACGGT TTGAAGCTAG TTAAT'rTAGA CCAGAGACCA CCTCTGGATT CrTT'TACATC 10980 11040 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11900 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720
GGTGGGTGGG
AGGGCACTGC
AGGGCATTAA
AArrCCCTCT
ATAGCTAGTA
TCACTTCGTC
TATAAAAGAG TTTTGGAAC TAGCATCATC GACGTGGCGA TCATTCCCTG GGGATTGCGA GACGAGAAAT AGCGAGCTGA
GGGAAACCAC
TGCGACCTGC
CACTAGC.AAG
TAGCATAATG
TCCACCCCAC
GGCAACTGCA
CACATCATAC GACCTGCCAT ATAGTCGAAA TACGGATATC TCrAGTTCTT
TGAGGATTAG
GGCATAGGAA
CACGATITCAT
AACCTGTCGC
TCTGAGCGAC
ACTAGAAAGC ATGGTGATAG ATAATTACGA ATATGACTGA GATAGCTAGT AGACCTGAAG AAAGGCATTT AACGCTGGAT CAGAGCCATA TCTTCCACTA CTTCATGTCC CATAACAGCT TCGCACCAAC AGCCGCATTT CA'rCAATGAT GAAAACACGG CATGG'rAGAG GTCTGGTGCC GTTrGCTCAT CCACCTCACC CGCTCCATTC ATGGACATICA ATCATAAGACA AAGCGTAGAT AAAGCCGATA ATCAGTGCAA GAT'rrATAA AGAGATAACC AACCGCATAA CCAACAAGAL AGCAACAAAA TCCACGCTTT TCGTTTATTG CTrCcAATTr ACCTAAACCG CrAAAATCAA CTIrrAGGAAC CGACTrrTCC ATCTGCCGCT TrAAATCCAA ACATTCCAGC GATAATAITG TACA~rGTAG TTGCTCACAA CACTGTTA'rA GAGTT-GACGA 'rGTG'IrCC AACTCCTCTT CCAATrrAAC AAAGrrAGCA GCTTTCTCCA ACTGCAAAA.A TACCTGAAAC CTGACGAG7G CAATC1'CrGT CGATTGAAAA TAACCAAACC ACC.AAGTCCA CrAAGAGTAG GAAAAATACC GATCAAACAA CATCTAGTC TCrrCAGGTG 7rCAAGGAA CTCCGGAAAG TrrCTAATr GAGTAAGAAA T =AITTC CTAGCTTTCA AATCTGGATA AGGGCATCAC TGGC=r'ICAT
ACCTTCTGCT
GGTAGAACCT
ATTGCGACGT
GGTGAAGTCG
TCATATTTGG
TTCAACTGAA
TTTAACCAAA CCGT'?ATAC AATAATCCAA GTCATAATAT TTTCTATGAT TGTGTAAAA ACAT=rACA ACTTGCTTCC 'TTTGAACGI-r ATrMGAGCT ACGGACA6AGG AAAAGTTTA GGTTTGATTC CCAATGAAAC ACTCTACCAA TGAAAATTCI' AAGCGCATCA ACTTCCTACA
CTGCCGCCAC
CATAACCTrr
CATCAATCTG
TAACAATCAC
AAGTCCTrTC
TA.AGATGATA
CGAGCAGAAT
CTTGG1'CGAG TTCGT'TACGT AG~TCTGCCA CCT'rTCAAG TACAGTCTCA ATCAAG=NG GCAAC-ACGTC ACTCCAAGCC TCCTGGTNT GCATACGATT AAAAATAACA ATAAGAGCGA TAACTCCAAG TCTTAGA TTAGTACCAG TATATCAA.AT CTAAAGAAGG AAATAACTAT GAAACCAAAA CTTCCACTTT CGGACCACCA AAAAGAACAA TGGAATGAGA AGArrA.ATTT GACCCCGATT 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 135S60 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 TCTCAAACAT TTM~ACGATT TATCAAACTT CTTGA'rATCG CTATCCGGAG TTAGATGTGA ACTC'TTGGCT CAAGAACTGG TACCACGGAC GTGCCGAAGA 7IrTTGCCCA.A GACAAGAACT GTAACAGCTC GTGCGGTTGC CCGTATGCAG GTCCTATCTG AAGGTTGGTG GCAAACTAT AGCACTCAA G CTAGCA.ATG GCAAGAATG CCCTCAATCT CCTTTTAGT A.AGGTCGAAG CGAA'rAGAGA 'rCCGCGCTrA' ATCACAGTGG TAGAAAAGAA CATGCACC CATTCTTCAA GGGCTGGGGC AGGATTTCCT CCArrA=TA TTCACTCAAT ATTTGAACGG AGTTCATTTC TCCGTCCTCA ATATGATTT AATTGACTAT TCCCTACCTT CGCCTGAGCA ATTATTAGAA ACALATCTCAG TACGCCCTAC AAAAGAAACA CCAAATAXAT ATCCACGTAA CCCTGGTATIG CCAAATAAAC TTTACAAAAT CAGCC'TCCCT C'TTTATC ArCCCTCCCTC rrTTATTTCT AGGCTCCG=A GCCCACTTTA A.ATT7rTAG TAAACAAATG TAGGCTCCGG AAAAAA'rGAT TTACAAAATC AAAAATGATT TACAAAATCA 7TTTTCTG 790 CTATACTATC CTAAGCAAAG GTTTPAATG TCATCCCGTG AGGTGACGAA GACGCAGAAA TATTTAAAAC ?CTrrrAAAAT CTAAATTTTA AAGAAGTCTT ACTCTGAGGG CCTATTGCTG TAAAATAATG GGCTCTTTT TGATGCCCAA AAGTGAGGTT TATATGAAAC AAGAATCAAC TGTTGATTG TTAC INFORM4ATION FOR SEQ ID NO: 107: SEQUENCE CHARACTERISTICS: LENGTH: 6405 base pairs TYPE: nucleic acid sTRANDE.DNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107: AGAAAAATCT GCTTTACAGA AAATAAAAAT AATAGGAGAA AATCTATGTC AGATTTGAAA AAATACGAAG GTGTCATrCC AGCCTTCTAC GCATGTTATG ATGATCAAGG AGAAGTA-AGC CCAGAACGTA CGCGTGCCTT GGTTCAATAC TTCATTGATA AAGGTGTTCA AGGTCTTTAT GTCAATGGTT CTTCTGGTGA ATG'rATCTAC CAAAGCGTTG AAGATCCCAA G1-rCATTTTG 14520 14580 14640 14654 a. .a
GAAGAAGTCA
A'rACTAAAGA
CAACGATTCC
ATATCAGTTC
GGGTTGCTTT
TGAAGAACTC
ACCATATCGT
GGGCTGGTAT
TGGCGGTAGC
TAGTATGGAA
ACCAATTTAT
TGCAGCTCCA
GACTCCAAGC
TrCTATGCCA
CTTTAATGGT
CGGTGGTACT
AAAGGTAAAT TGACCATTAT TGCCCATGTT GCTTGCAATA CTTGCTCGCC ATGCTGAAAG C'TTGGGAGTA GATGCTATTG ?'rCCGCTTGC CAGAATACTC AG7'rGCCAAA TACTGGAACG AACACAGACT ACGTGATTTA CAACATTCCT CTTTACACAG AAATGTTGAA AAXTCCTCGT GTTCAAGATA TCCAAACCTT TGTCAGCCTT CCTGATGAGC AGTTCCTAGG AGGACGCCTC TATGGTGCTA TGCCAGAACT CTTCTrGAAA GAAACAGCGC GTGAATTGCA GTATGCTATC CATGGAAATA TGTACGGTGT CATCAAAGAA GGATCTGTTC GTrCACCA'FT GACACCAGTG GCTGCTGCCT TGATTCGTGA AACCAAGGAG
CAATTGGCAG
GTTATCCGTG
CGTGGAGAAG
ATGGG4GGCTA
CTCAATCAGT
AACGCAAICA
CTCTTGAAAA
ACTGAAGAAG
CGCTTCCTCT
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 TGATTGCGGA TAAGGACCTA TTGGTAAACT CACTTCTGCT TCAA74GAAGG CTTGAATATT ATCGTCCAGT TGTAGAAGCG AATCTAAAAG GAGGTATTTA TGACATATTA CGTTGCAATT CAAGTATGGT 'rTGGTTGATC AAGAGGGGCA ACTTCTTGAA GCCCATAAG GGTGGACCTC ATATCTTACA AAAGACCAAA AGAAAAAGGC CCAGTAGCAG GTGTTGCCAT ATCTTCTGCT GATATCGGTG GAACCAACAT TCGCATGAA.A TGCCAACTGA GATATCGTAG CTAGTTATTT GGGATGGTGG ATCCGGATAA GGGTGAr-ATT TTCTATGcTG GGCCGCAAAT CCCTAACTAC GGAAATCGAA GAAAGCTTTA CTATTCC1 'G T1GAGA~rGAA TcrGcTGAG GcAGTA~cTG GrrCAGGCAA GGGACCAAGT TGGAAccGGT ACGGTGG7*r GCTTGATTAT GGATAGGAAA ?1'cAGCCTGT GAAGTcGGGT ATATGCATAT GCAGGATGGA TACAACAGCT 7rAG73AA6AT ATGTAGCTGA AGCCCATGGA TGGCCGTAGA AlTTTTCAAAG AAGCCACTGA AGOAAACAAA GCAGGCACCC AGTTCAAAAA
AATGATGTCA
GTGACACTTT
GTCTrCCATc
GCTTTCAAZ
GAAGATGTTG
ATCTGCATGG
ACTGTGCAG
GCTTGACCAT
GTTTTAGCAA
ACTTGGCTrTC
ATCAGTGGAA
AAGGTATTGA
CCAATCCAGA
CTAAGATCCG
TAGAAITTTGC
CAAAACAATC
TAGCCGTTGA
CCGTA'rGGTT GACTATCTAG GAAAAGGTCT AGTGGTTATT CTTGGTGGTG TACACCCTTG AAAGAGGCTT CCATCACCAA AA'rACAGCAG CTAGTTTGGC TCAGCCAAAC GTTTTATT TTCCCAGTAG AAAACTAGGC ATAA6ATCAAA GGATATAGGA GGGTTGGCTG CTTTAT'TGGG AACACTTGTA GACTATCTAA AATAGAAGGA CATCCA'rGAT GCTGGTTGTC CACCTAG'TCG CATAAAACGT CAAAGAAGTA GATACGCTCA CGTCAATCAA GTCTrTGTr
GTATCATGGG
TG-GTACCAAG
GGATGTTGG
TAGGATTTTC
CTATTAAAGA
TCGATAAAGA
TGGGTGGCTA
AAGAGTACAG
071'GAGCCTG
CAGCCAAAGC
AATTTCATTT
GGCAAATATT TGCTACGTTC GCAAGAGGCT ATICCTCAAAC TTTAGCAGAA AAAACACGAT TCCATATTAT CATTTAAGA TTACACGNT" TTGTCTACGA TTTrTCCTT GCTTTCGCCA TTrA=rCCA AGAGCATAGG AAGTTGAGCG GATATTCGTT CAAGAACAGT CTCTGTATAG GTCTGGCI'AT 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 TC'TrrGCCCC
AAAGTGAA
CGTCTTGGTC
CACGGGCCAC
CA'rCTCCr-rA GCA'rCTAATA GCCCAGTACT AGACAArTT TGTCAAAGCT TCGCAGACCA GGCCAGAA CTCCCTG'rTC AATTCGTTCT AGTTCGATTT ACITCGAGG ACTCGTTTGG TGCCTCATGT TGCTATTGGA GCATCTTCGA TTAGCrGCGC TGTTCCCrCA TATTGCTATA a.
TCAG'TGGACT GTGCTTGGAG ACTTGGTrTGG CTTGATNrrC AAATAAATTC TCGGTAGCCA GTAAAGCCAC ACTrTAC GAGAAATATG TAATTTTTGG GTGACTTGTT GAGAAGATAA CTTGCAAAAA A'rAGCGAGCG ATTCrCTT CTAGGTCTGT CAATGATAGT TGCGATATCT GCTTGTCCA TACGGAAAGC GGAAGACTAG ATCAGAGAAT AGTCACACTT CATTATAACA ATAAAAACGC A'rCTCTG?1'T TAAAAACGAA AAAATCGAAA ATTTTCTACT CAAATTGTGG TACAATTAAG AGTAAGATTT AAAGCGG4GTC
ATCATCTGTA
CMA=CTCA
TCCTrTTACAT
CATAATATAA
AAGCT-rCTCT
AAAGCACTT
ATCGTNrTCAG
AAATCTCAAT
GAGTCATACT
GGATAGATAA
CTTTTCCATA
TAAGTrrAGAA ATGAGACTCA 792 TrGrTATGAG AAAATTAAC AGCCATTrCGA TTCCGATTCG CTAATTTA TTGTTTrCAA TCGTCANTI' ACTCTTTATG ACCATTATTG GTCGTGr GTATATGCAG CTTGAACA AAGGAw?rfrTA CGAAAAAAAG CTAGCITTCAG CTAGTCAGAC CAAGATTACA AGCAG7rCAG CCCGTGGGGA AATTTATGAT GCTAGTGGAA AACCTTTGGT AGAAAATACC TTAAAGCAGG rTTGTTrCCTT TACGCOTAGC AATAAAM'GA CGGCTACAGA CrrAAAAGAA ACAGCTAAAA ATTACTGAC TTATGTGAGC ACTA7"rTGGC TGATCCTGAA GCTTGGATTC AGATGGCAAT GTGTACA.AAC GAGTCAACTA GTCAGTTAAA TGCTGTTGGA ATTCTCAGGT GGCTGrrATT CTTCTTGGGA TAGAAAGGTT GTGAAAAAGC TGG'rCTCCCA ATCAGTrT= CAAA~rrGAC AGAACCCAG CTGGCGGATT ATCTATAAAA AAATAGTIGGA AGCTCTCCCA AGTGAGAAAC CGTCTATCCG AATCAGAACT CrA'rAACAAT GCGGTCGATA AACTATACAG AGGATGAAAA GAAAGAAATC TATCTTTTT'A AACTTTGCGA CAGGAACCAT TGCGACAGAT CCTCTAAATG GCCTCTATTT CAAAGGAGAT GCC'TGGCATT AGTATTTCTA
TAAATGACCG
AACGCTCGGT
TTGAGGAAGG
GCGTGGATGC
ATTCTGAAGG
CAGGGATTAA
CCAATGTCTT
ATGGAGTCTT
CTCCCATCAA
CTCTGGAGTA
CCTATCAACC
GTTCAACCTT
TGTAGGAACC
AAAAGAAATC
TAGTAAGGGA
TTTACTGAAA
TTGGAALACTr CCCTTTCTTC TATAGTPGGG GCGGAAGAAG CAGAAGCCTA TCTTAAAAAA 'rCCTATTTGG AAAAGCAATA TGAAGAGACC CATCTGGATA AATATGGCAA TATGGAAAGC AACAATATCA AACTGACCAT TGATTTGGCT AGTAT'rCA ATTCTGAGCT AGAAAATGGT AGTrGTATCCA
GGCTATTCTC
TTACAAGGAA
GTCGATACAA
TTCCAAGATA
CCAGCCAACI'
TTGTCTATGT
3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 AGGTGCGGTT TGTCTATGCA GTCGCCCT'rA ACATGACTTG AAAACGGGAG TGTTCCAGGT TCGGrrGTCA GTCAG4GAAAC CAGACCTTGA TTCTTGGTAT ACTCAGGCT TTCATCAAAT ACCTATATG CAATATGTm GTCGGCACCA TGGCGAATAT GGCTTGGGTA
ACCCAAAPLAC
AG'N'GACGCC
AGGCGGCGAC
CAGACCAC
ACGTTCATT
TCCAAACAC
TGATTCCTGC GAACGCTAA CATCAC7CA GGrrGGGAAA CATTGTCTTC CAAGGTTCAG CCCTATCACA GCGGTCCAAG CTTACGTCTT ATGGCAAA GCAA'rCTAGA GTCTGCTATG GAGAAACTGC CTGCGACAGG AATTGACCTA CCAGATGAAT CTACTGCATT TGTTCCCAALA GAGTATAGCT TTGCTAATTA CATTACTAAT GCCTTGGGC AGTTTGATAA CTATACCCC ATGCAGrrGG CTCAGTATGT AGCAACTATT GCAAATAATG GTGTTCGTGT CGCTCCTCGT ATTGTTGAAG GCATTTATGG TAATAATGAT AAGGGAGGAC TGGGTGACTT GAT-rCAGCA.A CTGCAACCGA CAGAGATGAA 'rAAGGTCAAT ATATCCGACT CCGATATGAG CATCTTGCAC CAAGCTTTTT ATCAGGTTGC CCATGGTACT AGTCGATTGA I. 793 CAACTG7GACG TGCCTTTTCA AATGGTGCCT CCGAAAGCTA TGTGGCAsGAT GGTCALGCAAG CJATCTG.ATAA 'rCCCCAAATC GCTGTCGCAG ATGGTGTAGG ACCTTCCATT GCGCGTGACA TGAATTAGAA AGGAAAT'rAT GCTMACCA TCTA.AGTTAC CAGGTATCGG GATTAAGACG ATGTCVTCCTG ATGATGTCAA TGAATTTGCA ACATATTGTT CTATTTGTGG ACGTTTGACA CCCACTCGTG ACCAGACAAC AArrAGT'r GAAAATATCC AAGAA'rACCA TGGACTCTAT AATGGTATCA G'TCCGGACGA TATCAATCTC GAGGTTTCAG AAGTGATTG? CGGCGACTAAT
TGGTATCCAT
CAACCAATAC
7.TGTTTCC
TATCAATCT
TAGCGGCAAAA AcAGGTAcAG CAATrGCGGTG GCC1TATGCCC TCATAATACC AATCTAACAA GTATCAAAAA TACCATCCAA ACACCTATTG CCAAGTTGAT TGACAGTTAT GCTACGCGTC TGGCCr'rrA TACGATTGGG AAAAATCTCC TTTCTGCTAA GAGAGAATTG
GACCACCATC
CTTGAGGATA
CATGTCCTTC
AAOAGCCTTA
GCTACAGCGG
TATCTTTCAC GTTTGCTCAA GCCGGCTCT A'rCAAGG'TTA CTTGTTCTAT CTGTACTGAT 5220 CTAGAGATGT CGCAGCCATG 5280 ATCCCCTCAT TTCTCCTATG 5340 TGACTCGTCT TATCGATAGT 5400 ATGGTGA.AGC GLACTTCCATG 5460 CGCGTCTAGC ACGAGGTCTC 5520 TCTTACGAGC CATTGAA.A.AT 5580 TCATTTATAA AAA.ATCAAAG 5640 TGATGAGTAG GCTAGTTC 5700 GAGTATTCTA GTAGAATTAG 5760 GGACGGAGTG CGGAACGCCA 5820 GCTGTGGGAG CGGACATTGA CTATCCAC GAAGTGACAC CGGACAGAGT TG'TAAGTGTA GGCAAATTA CGAACTCCAT AGGCTGAAAA TCC~rCCTAT CGGCCTCTTT TTGTATAGTG AAGTTTTAAA AAACCAAGCA AATATGATAT ACTAAAGAGC GACAAATAAT ATGAAACAAA AGTCTCTGTC C-rrrCAGCTG CAAGACTTTC TTTATCAGTC TCCCGGGGCAA GAAGACCGTC ACCA.AGTGCT ATCTACGAAG AGAACATCGC TCTCTTCAAG CA'TTTGTCA TCAAGTCTTG TG.GTATTGCC CAAGTCCTT CGCTGAAGTG GAAGAAAAAT TAGVTGTCGGT ATTTCTAAGT CTTrCCGATAT GACAGCCGT= CGATTA'rTCT 'N'TATAI'CT AGAGTGTCAT GCGTGCGGTC AGTCAGGTGA CTTTATCAAA 'rCATGACCAA TGAAACCATT AAGGTGCAGT GGTCTTTCCA GArTCTTGGA AGTTGAAA GATTACGACC GTTTCACAGT ACACAGGAAT TTAGTCATGC GATTGGGATA AGAAAGTN'C CTCCTTCACG GGCCAATCG ATGCCTTACG TTGGTTGCAA
CCATGGATAA
ATGTGGCTAT
TGGCTTATCC
AATCACGACT AAGCCTGTTC VrGGAATCVGC CGTTCAAGCC GATCATGTGA CTGCTAAA.AT AG'rCTTCACT AAGCCGTCAA ACATGGGGTC 5880 5940 6000.
6060 6120 6180 6240 6300 6360 6405 CTGAAAACCA AGAACAACTC CGTCAAGCCT TAAAACTTCC TCTTGGTTGA GCA.AGGAGTG AATGC INFORMATION FOR SEQ ID NO: 108: SEQUENCE CHARACTERISTICS: LENGTH: 11309 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108: CGAGCTCGGG TACCGGGATT TTAAGGAGTT TGATATGTAT AACCTATTrAT TAACCATTTT ATTAGTATTA TCTrGTTGTGA TT IGATTGC AAXrrCATG CAACCAACCA AAAACCAATc CAGCAATGTA TTTGATGCCA GTTCAGGTGA TTTG~rrTGAA CGCAGTAAAG CTCGCGGT TG.AACTtTA ATGCAGCGTT AGCA'rrGACG GTATTATCAA TT'TTTATTTT TAAAGGATGT AAAGAAAkATA TGAAAGATAG AATGATTTGG CTCAGGCTT'r ACCGTTCCr TAATGGAAAG TGACAGGGAT TTTAGTCTTT TTCTGGCTAG CCATTCCTT GTAGATAAGA AAATAATGGG CAGGACTAGG TC?1'TCCCTC TTGAGAAGGT 'rTTACAG'rAA AAGAAAATTA AAAAATCTAG AATAAAAGAA TATTTACAAG ACAAGGGAAA GGTCACTGTr GGGAAAAGAC AGTTCCAAGG A'NrTCGTGA GTTGATTAAA AAAGC-ACCAA A?'rCGTTrTG AAGAAGATGG TAGTCTGACA TGAGATTACC CTCAAG4GGA TT'I=CATGC CCATAAAAAr TTAGAAATTA AGA.AAAAACA GGCTT7TGGCT TTGTTAGTCT GGAAGGCGAG GAGGACCACC TTTTGTAGG GAAAAATGAT GTCAACTATG CTA'IrGATGG TGATACCGTC GAGGTAGTGA TTAAGAAAGT CGCTGACCGC AATAAGGGAA CAGCAGCAGA AGCCAAAATrr ATTGATATCC TAGAACACAG TTTGACAACA GTTGTCGGGC AAATCGTrCT GGATCAGGAA AAACCTAAGT ATGCTGGCTA TA'rTCGTTCA AAAAATCAGA AAATCAG'rCA ACCGATT'rA' GTTAAGAAAC CAGCCCTAAA ATTAGAAGGA ACAGAAG'rTC TCAAAGTCTr TATCGArAAA TACCCAACCA AGAAACATGA TTTCTTTGTC GCGAGTGT'rC TCGATGTAGT GGGACACTCA ACGGATGTCG GAATTGATGT TCTTGAGGTC TTGGAATCAA TGGACATTGT ATCCCAGTTT CCAGAAGCTG TTGTTAAGGA AGCAGAA/kGT GTGCCTGATG CTCCGTCTCA AAAGGATA'rG GAAGGTCGTC TGGATCTAAG AGATGA.AATT ACCTTTACCA TTGACGGTGC GGATGCCAAG GACTTGGACG ATGCAGTGCA TATCAAGGCT -CTGAAAAATG GCAATCTGGA GI'TGGGGTT CACATCGCAG ATGT7TCTTA TTATGTGACC GACGGGTCTG CCCTTGACAA GGAAGCCCTT AACCGTGCGA CTTCTGTrTA CGTGACAGAC CGAGTGGTGC CAATGCTTCC AGAACGACTA TCAAATGGCA TCTGCTCTCr CAATCCCCAA- GTTGACCGCC TGACCC.AGTC TGCTATTATG GAGATTGATA AACATGCTCG TG'rGGTCAAC TATACCAT-rA CACAAACAGT TATCAAGACC AGTTT'rCGTA TGACCTATAG CGATGTCAAT 120 180 240 300 360 420 480 540 
600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 .795 GATATCCTAG cTGGCGATGA AGAAAAGAGA AAAGAATATC ATAAAATTGT ATCAAGTATC GAACTCATGG CCAAGCTTCA TGAAACTTTA GAAAACATGC GTGrAAACG TGGAGCTCTC AA?1'TTGATA cCAATGAAGC GAAGATTTTA GTGGATAAAC AAGGTAAGCC TGV1GATATC GTTCTTCGGC AGCGTGGTAT TGCCGAGCGG ATGATTGAC? CT?1 ATGTT GATGGCTAAT GAAACAGTTG CCGAACAI-rT CAGCAAG?1'G GAT1TTGCCTr TrATCTATCG ATTCACGAG GAGCCTAAG4G CTGAAAAGGr TCAGAAGTTT ATTGATTATG C'TTCGAGTN' TGGCTTGCGC ATrrATGGAA CTCCCAGTGA GATTACGTCAG GAGG CTr AAGACATCAT GCGTGCTGTr GAGGGACAAC CTTATGCAGA 'TGTATrTGTCC ATGATGCTTC TrCGCTCTAT GCAGCAGGCT CGTTATTCGG AGCACAATCA CCGCCACTAT GGACTAGCTG CrGACTATTA TACTCACTTr ACCAGTCCAA TTCG'rCG=A TCCAGACCTT C?1'GTTCACC GTATGATTCG GGATTACGC CGTTCTAAGG AAATAGCAGA GCAT'rrTGAA6 CAAGTGATTC CAGAGATTGC GACCCAGTCT 4 4 *9 4 4 44 46 4 4 .4 .4 4 4 4 9* 44 6 4 *44044 4 .4* 4. 4* 4 4 4.4.
4 4404 TCCAACCGTG AACGTCGTGC GAGTATATGG AAGAATACGT T'TCGGTCTCT TTGTCGAATT CCTGAATTTT ATCATTTCAA ACTTTCCGAG rGGGTCAGCA GAGATTGATT T'rTCATTCGT TCTAGTCGTA GTGGCAGAGG AGAAAATCAG GACGCTCAAA GGAAAGAAAC CTTTTTACAA GGGAAAGGTC GTCGCACAAA AGAGGCAGGG ATGGTCCTGA TCTCA6AGGAT GGCTTTGCTC CGCGCCTTAC GAAGAGGGCA GCTCCATAAA AAGCAAATTC CATAGAAGCT GAGCGTGAAG TCGAAGCCAT GAAAAACCCT GGGTGAAGAG TATGATGCAG GCCAAACACA GTTGAAGGCT TGAGICGTGAT T'TGACC~rC GATCCCTATC CGTGTTGAAA ACCTAGTGAG TTTGATGTrGA GCGTGATTCA AATCGTCGTT TGATAAGCGT AAGCATTCAC GGAAGTAGCT AAGAAAGGAG ATAAAAAGGC ACGCCACGAC CTGGAACTGA AATCAAGAGT AAGTGAAAAA TGGAGAAGTT TTGTATCAAG TATrGTCAAA TGAT'rCACAT CACTAATCTG GTGGAGAAAA ATCAGGTATC GAGCGGATAA AATGACTGGA TTGAAAAAGG CTTGAAACAG CGGATAAGAA CCAAGACAAG AAA.AAGACAA GAAGAAAAAA CCAAGCATGG CAAAGGGCGA 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880- 2940 3000 3060 3120 3180 3240
ATATCTGGAA
AAAAATTGGA
AGTTCCCCTT AACGTCTATA TAAAAGATGG AGGGAAGCAT GACTATGACA AACGGGACTC CGCGCGTGTG AI'GAAAGCTG TTAATCAGCG AGTTGCCTAT AAACCCATGC CTTTGTGGAA
CCAGGAACCA
ACAAGAGATC
CTACGCTAAG
TATCAAACCT
ATAAAAAGAG
TATACAATCG
GTACGAGCTG
TGGCTGAGCA
GAACGTCGTC
AAAGGGACAG
CTTCTTTTAG
CGTGAGCAA.A
CAATTGAAAA
TAGATACOCT
CTCGAATTAA
ATGTTCATAT
GTAAACTCCT
GAATGACCTT
GAC-1-'GCCAA
ATCGAGATAT
TGGAAAA.ATT
TAAACAAACA ATCCCTGAAC CTGTTCAGCA AAAGCACAAT ACAAAAGTTG GTTTATTGAA TTGACAGAAG AGACAATCCA ATGGCCCAAC GGAATGGTAC TTGGAATTT CAATCCTGTT CATTCAGAGC GGAT~wrGGGT TGTGGTCAGG GACGGCTGTA GATCAAAATG ACATTTGGAC ATCCCTGTTG TGA'TrrATC GTTTCAACAG TATTCAAAAT ATGCAGGAGA GGACACGGAG GArrATCTT GGCAGACTAT TACAAGGATT 796 GGACTTGGGG GAAAATTACT GTCTGAAGG GAGCTCTcAA AAGGGGAAGT TCTAGCTGAA CACCTCTTTG AAGCAGGGGc CTCAAGCCTG GCACCGAGTG GAAGCTGCCA CAG2ATGATGT A'rTGTAAACC TGAGGArrAT TI 'GCTAAAA AATACAATAC TCCTAGACCC CATCCAGACA GTGAAACAAG GGAAAGCTTT GGCGTAATTC TCTrTTCTA CCCCAGCAAG ATTrrGATGT GACTAGCTCT TGAAATCTTG CAAAGCATTG TGGAflCAGGA GCCTTTACGA TATCAATTCA GCTAGCATTG AACAAGAATA rTrTTTCAT GTTTCTACAA GCGGACCGCA 'rNCCACPA' AAACCAGTCT TGGTGAC AACCTTATCG GCrCGGTT=AA C~rCCCATTC ACCTTTAAAG CGGAATTGGT TAAGTACAAT GAAAATCCAC
TTTGTGCCAT
AAGGAGAACT'
GCCATTTGCA
CCGTCGCGAT GAGAATGGCA ATCGTATTCA ACTACGCTTT GCGACCTTAC 'rAGCTAAGAA AATCAAGTAA ACACACATGA AGAT'rAGGAA TT'NCCTGAT CTTTTTTCTT TrACGAAT GATATAGAAA AGGAGGGAAT TCATGTTrG'r TGCGAGAGAT GCTAGGGGAG AATTGGTAAA TGTGTTAGAG GATAAACTTG AGAAGCAAGC ATACACCTCC CCAGC"M=GG GAGGCCAGCT 3300 3360 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 CCATTrTGCGT CAAGGACCAA G;TGTACGGAC GCATT7-GCC TGA'rTTTTC 'rTTGAAAATG AAAGTCCAGA ACACCTGGCC CTGGTTGAAA AAAGAGACAA AGGTTCAATT AGAGTACCCG TCCGATGTA TTTGTAAATG GCAATCTAGC TCTAGAAGTT CATAAATCCT TAAAAGACTG AATAAGGAkAT CCCTCTATCA CTTTrCAGAAC TTAAACAGAT CAGTGTAGTC CCTTGCCTCA GAAACTCCTT AAAGAGCGAA GTGAGGGCTA TCGTAGTCAG GGTTACCAAG TACTGTGGTT GCTGGGTCAA AAACTGTGGC 'rCAAGGACCG TrrGACTCGT CTACAGCAAG GT=TCTTTA T'rTCAGTCAA AACATGGGCT ACTCAAATAC CTGATTTACC 'rTCCTATGGT CAAGGTAGTT TTTAG7TTTG GGAATTAGAC AAGGAAAAAC AGGATCTCCG CGGTAALACTC CATTATCAAA TATITGGAAAT ATTGCGTCTT CCCTATAAGA
AAC'T'TTAAG
TCAAGCAATT
GACAAAAAAT
ATCTCATTT ACAGTTTCTG AGCACAAGGA CATCTCTCGC TTATCAAAAT CWTTTrGGA TGAAAGAACA AGCAGAAGCC CCTGACrrAT GGACTGAAAG AATGGTATCC ACAAACA CCAGATTGAA CAAGACTTGA CTAGCTAT'rA TCAGCACTTT TCCTCAAAAT GA'rGGCAAA AGCI-rTATCC ACCACCCTT TATATCCGGC AACAACTTTA TATCAAAAGG GAGAAAATA'r CCAATACTGGCGCAAATT'm-r TATACCTATT ACCAAAAAAA TATCAGCAAT ATTT'CTTGAA AAATATCGTA GAATAGAAAG GATGGAGGAA TCTAATGGTA ??ACAAAGAA ATicAAATAAA TGAAAAAGAT ACATGGGATC TATCAACGAT CTACCCAACT GACCAGGCr GGGAAAGAAGC CrAAAAGAT TTAACAtGAAC AArGGAXGAC AGTAi3CCCAG TATGAAGGCC ATCTCTTGGA TAGTGCGGAT AACCTACTAG AAATCACTGA ATTTTCTCTT GAAATGGAAC GCCAGATAGA GAAGCCTAC GCTrT'GCTC ATATGAAGAA TGACCAGGAT ACACGTGAAG CTAAGTATCA AGAGTACTAT GCCAAGGCCA TGACACTCTA CAAGCCAGTTA GACCAAGCCT 71nTCATTCTA TG.AGCCTGAA TTTATGGAGA TAGCGAAAA GCAGTAT=C GACrTTAG AAGCTCAAcC AAACCTGCAG GTrATCAAC ACTATTTGA CAAGCTTG CAAGGCAAGG ATCACGTTr TTCACAPACGT GAAGAAGAA'r TATTCCCTGG AGCTGGAGAA ATCTTTGGTT CAGCAAGTGA AACCTTCGCT ATClwrGGACA ATGCGCATAT 'rGTGTTCCCT TATGTCCTAG ACGATGATGG TAAAGAAGTT CAGCTATCTC ATCCGACTTA CACACGTrrG ATGGACTCTA AAAAACGTGA GGTTCGCCGT GGTGCCrATC AAGCTCTTTA 'rCCGACTTAC aAAeAA~T-CC ACACACTA TGCCAAAACC TTGCAAACCA CAAGAGTGCT CGTCATGCAG TTTGGTAGCA GCAGTTCGCA AAAAATCTTG GGGATTTCAG TGAATACAGT TNTACCTACC GGGTGAGGAT TACTTGAGCC CCAAAATCAA GCCAAGCGTT TATGCTTCTC AACTGGCAAG TCACAGTATG CATTCAAGCT TATCTT'=G GCTGAGATTG GGAAGAAGTG CAACACGACC ATGTTAAGGT GCAAAATTAC CGTCCAAAG TTCGTAACTA CCCTCGCAGC CAATTTTGTT CCACGAAAGTG TTTATGACAA AGCAT'NTGCC ACTCTTACAT CGCTATCTTG AGCTTCGTTC ATCTCAAGAT GTACGATGTC TACACACCGC 1TTCATCTGT AAGAAGCCTT GAAAAAAGCA GAAGA'rGCr'r 'GGCAGTCr GT=TAAACG TGCCT'rCAGC GAGCGT'rGGA 'ITGATGTTTA CAGGTCCCTA CTCTGGTGGT TCTTATGATA CCAATGCCTT ACAATCTGGA CAATCTCTTrT ACWTrG 'C ATGAAACAGG ATACTCGTGA AACTCAGCCT TATGTTTACG GGGATTACTC CCTCAACTAC CAA'rGAAAAT ATCTTGACGG AGAAATTATT CAACACGCTT TGCTATTCTC AATAACTCC TAGATGGTTr 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 
CCGTGGAACZA GTTT'CCGCC AAACTCAATT TGCTGAGTTT GAACACGCCA TTCACCAAGC AGATCAAAAT GGGGAGCTCT TGACAAGCGA ?TTCCTAAAT A.AACTCTACG CAGACTTGAA CCAAGAGTAT TATGGTTNGA GTAAGCAAGA CAATCCTGAA ATCCAATACC AGTGGGCTCG CA'rTCCACAC TTCTACTA'rA ACTACTATGT ATATCAATAT TCAACTGGCT TTCCGGCCGC CTCAGCCTTG GC1TCAAAAAA TTGTCCATGG TAGTCAAGAA GACCGTGACC GCTATATCGA CTACCTCAAG GCAGGTAAGT CGGACTATCC ACrrAA'rGTC ATCAGAAAAG CTGGT-TTGA 798 TATGGAGAAG GAAGACTACC TCAACGATGC C?1'TGCAGTC ?'NGAACGCC GTrAAATGA G?'TGAAGCC CTTGTTGAAA AATTAGGATT GCGCATAAALAT GGTTGAATCG ATGCTAACCA TA.ACA'rGCGT CGTCCTGTCG TCAAAGAAGA AATTGTAGAC AGCGTCAAAA GCACGTCACA GGTIrTCTTGA AAGAATTGGA AGACTTGCC ATAT=AT TATTCCCCAT GAAACGGrr CTA1TCCC-T==C~ATG
TATAGTAAGA
TTGATGCCTC
CGCAAGGAAA
G.AAACCATGC
TrGATGGCTG ATTGTrTTG
GAGGGAGATG
AGCCTAAAAA -TATTCTGGAA ATGGGACGG CTA'rCGGrI-r AACATGCGCC AAATGCTAAG CCAACGAAAA 'TTTTGCCCAG CGGTGCATGT CTTATCTACA ACTCTAAATA CATCGTCTTT 'rTGTCTTGGA TGATA'T= GTGGTCAGCG AACCATTTAT CAGAACTCAC CCCAACAT'TA ATTACAACTA TTGATCGTAA TTrTGACACTC GCAAGCAAAT TrAGCTC'rC
TCCAGAAATG
CACTCTCCTA
CTGACAGAGT
CTGCCAGAAA
CTTATGATT=
TCCTCAAkACA CAAGGTGGTG ATGflTGCCAA CGAGGCCTTC AAAAATTAT CTGCC AG GAGATCGTAT GAAAGCGAAT CATTTTCAGA TATCTCAAAC GAGTAGACAT CGTCTTTATG GATTCTGCCA TTrTGGAAGT'r GGTGGTGTGG GGATATTATG GAAGTCCGTC TCATGCAACC TrAGACAATC TCI'CATGCTT CGTA.AAAATG AAAATTTAAG AAAAAATAGT GAAGAAAAAA TTATTGGCAG
TAGCAGATGT
AAAATAGATA
GTGCCATCAC
CAG;ACCI'TAT
AAAGCAACCC
TCAACTGTCT
GGTAACACT
ACTATTATCA
CAGCATGAAA
TTCAGCCCAA
5840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 GTAGCAACTT TAGCAGCTTG TTCGA.AAGGG GGGGATGTCA TTACAGAACA 'rCAATTTTAT CAAGTCTTGT TA.AATATGAC CATCCAAAAA GATGATAAAG AGGTTGA'rGA TACTA'rTGCC AACAATATGG CTCAGAGCTT AACAATATGG CGAAAACTAC CAACG'rGTCT TCTCACAAGC AGGTATGACT GTAAAGCTCA AATTCGTACA AGTAAATTAG TTGAGTTGGC ACTTAAGAAG CTGAAr'rGAC AGATGAAGCC TATAAGAAAG CCTTTGATGA GTACACTCCA CTCAAATCAT CCGTCTTAAT AATGAAGATA ACCCCAAAGA AGTTCTCGAA CAGAAGGTGC TGA'I7MGCT CAATTAGCCA AAGATAATC AACTGATGAA AAAATGGTGG AGAAArTACC TTTGATTCTG CT-rCAACAGA AGTACCTGAG AACCCGC=r CCTrAGAT GTGGATGCGc TTTCTGATGT CATTACAGCA AACCCTACAG TAGCCAATAT TACATTGTAA AACTCACTAA CAAAACAGAA ATATTGATGA CTACAAAGAA AAATTAAAA6A CTGTTATCTT GACTCAAAAA
TCAGAAGGTG
GAGCAAGTGA
GTT-TTTGAAA
GAAGAAAAAA
CTTGAAACAC
GTAGCAGAAG
GATGTAACGG
AAAGCCAAGG
AAAACAAAAG
CAAG'rCAAAA
ACTGGCACAC
AAATCATCTA
CAAAATGATT
CAACATTTGT 'rCAAAGCATT ACCAAGCCT'r CCAAAATATC ATCGGAAAAG AA'rTGCAACC AGCCAATATC AAGGTTAAGG TTTACCCAAT ATATCGGTCG TGGAGATTCA AGCTCAAGCA
aa a GTACTACATC AAACGAATAG TCCAAATCAA AAAATGAAGC AAACATTCCC ACAATAAAAC AAAGTTAGAC AATTAATTTA TCCGAAGGAT M TTTTA TCI!AATTCT CTTA?'TG TCCAATTGAT TAAC1'GATTT AAATGTTTTC ATGCCAAAGA AAGA'IrCCAT CCTACCG?1'G GCTTGAATTC CCTTACTCI'C TAGGAAGCGA TGGTCACTAT GGAGAATCGT ATTCTCG'rAG AACCATCAA'r CAATTrAATC ATGTACCTAA AAAGCTTCTC 'rAAGCTATAT CCTTGTTrTC AAGTrAATT'r CATAATAAAA ACACCCCAAA TGTAGTTCAT GTACACCTGA TATGATGCGT TCATTT?=T AACTTGATAC TCAGTGAA.AA GCTGCTCAAA GAACAGCTTT GAGGTTGTAG ATGTG.AAGCT GACGTG GAATAGAT'T GATAATGCAA GATTCCATAC AATGGGTAAG AAACTTTTAC CT'TTTCCTCC CTACTCATCT ATAACTGCTT CTAAAACATT CTrATAAA'r' TCT'rATTTCA TTTTGTTCTA CAATCCTGTT rrGGCAAG?'r GCAATACCTT TTACGAGC ATATCTCCTA TGGTrCTAGT TCAGAAGGCT ATTCCAAGTA TTGGGAGTGA ATGTTCAAA ATTTGCAGA CCGCCATCT1' TACGAAGAAG ACTGGCAGTA AGATrGGCGC CGrGTCCGAC TAATGA'rTTG ATAAATrGGA TGGTCCGTrG CATTCGAGTG TCAAA TTGAG CAAGA'rMA TGAGTCAGGG AAAAAACTCG GCATAGTACA AGGTT7GTAC TrAGIrrGT ATTGCACAGA TAATAATCAA TATAGTCTAT TCATAGCCAT AAAACATTTC TCTrGvGCTGT TGCCCTTACG
TGATAAGAAT
TGCTTCTCTT
GATTAGAATT~
TAAGTTCATA
CGTGTTGATA
TGAATGCCTG
GTTTATCCCA
GATCTGAACT
AGTTAGAT1'T TTrCTGTCTA AC7r'tGGG T'rTATAATTr TAAAGAC'TTT ?rGACCAGCC GCAAAGATTA AACTAGGAAG CTAGCTGTAG ATAAAACTTG TGAGGTCACC AACATATATA TAGAAGAGTA TGAGTCTGGA AGT-TTTAATG CT.AGAGTTCT TATGTGAAGA G7T TGGGCAT TAGTATAGAA AAGTGAATCr GAAATAGTAC GA'rrTAAATT CTCAAATCAT ATTATTCAGT GAGAAGACAC GrTGTTCATAT CAAAAAGGTA TCTGTMdTCT TATTTTrGTT TCAACTGACT AGGCTATAAT TATGATTGAT AAGAAGTATC ATCATGGGTT TCTATAATGG TCAGGCTGC TGG'IrCTr'rA TAGCCTAGGA GAGTACGAAG AATrTAGAArA CGTTCAGCTG GACTATCTT AGTTGTACTA TAGAGGGATT CGGCrCCGAA ACGAAAAGCC TGGATTGT GCGGGTAAAT
ACTTCAGGAA
TGCCCCCCAA
GCTAAG'TCCT
AATGGCTCGT
GGATTTTAAA
'rGACA'rGGAT TTrGCCAGCCT
TCCAACATT
AATTTATTTG
TTATCATCAT
8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 AGCTTCCAAG GTTGCAATTT TCAAACCTTC TAACTTCCCA AGGAACGATT TCTAAAGAAC AGGGGGTATA GACTrGACT'r GACCGCTCGA GGTAAATCAC TTGAATAAAT CTGATCAAAA AGTT'GCCArr TG4GATAA'rCT
GGA.ATTTCCT
CACGGAGATT
CAGCAGATTT
TGAGATACTG
ACCAAGTCGT TTAGGGTTT CAATGGA'PTC TTGAAAACGA CCTTCTTGGT TCCAGP.GGGT TACTTGATGT CCTCCAAA6AT ATCTACAAAG GGCGGACGA TAATGCTGTG TCCGACTTCG AGAGCAATTT TATCGATAAA AATGGGATAA TTGGCTCCAT GAATGTAACC AGrrGTTT TTTTATTrGC CAGAAATTTT AGC1'AGNTC ATTCCGATAA TCGGTCCGGT CTrGTCTC!CC CG'PTCATAAC C?1'GAGGAAG CTCTCCTrCT S00 AGGAAGAAGA GGAGAATCAC CACTAGCACC ACG3ACCGTGG CGGACAAAGT AGAG!TTTCAT TCTGCCTTTA CAAAGCTAGC CAAGTCTrGT CCTGCAGAGA CAATCATTTG ATCCAAATCT TrGTGTrCr GACGAATTCC GACAGGATPTA TCTAAGTCCr 71rGTGGAAT CATGCTCACT TTTTCAGACA ACrGCTGAGT GATAGGGAcA AAAAGCGCCA AGGTrTTGAA AATCTGATCT AGGGCATTGA TTTGAATCCC CTGATGAGGG 9 4 9*9* .4 4* ATAGCTGCT'r TAGATAGGAT TTGTTCCACC AATGTTTTT TGATTTTAAC TrTTTM-GCC ATTA'PTTATA TTTATCCTCC AA'rTGACTCA TCCAAATACC AAGCCAGATT CCCAGCGCAA AGAAGAAGGC GATGATGACA TAACCGACAA GTGAA.AGTCC TGTGTATTGG ATACTTTCAG CGTTTCCTGC ATTTGGAATT AAGATCAAAA GGGTACTTGA TAGGACGATA CCCATGATGA AATGATAGAC GAACrG'PTTA CGGAGTTCTT CTAGTTCTCC GTCCGTCCAA GCGTAGGCCA CTTCTTCTTr CTTGCCTrTA CCTTTGGACA TCTTGTAAAG AGGTGGGAGG GCAATATAGA CATGACCTGC CTCGACTAGC GGACGCATGT AACGGTAGAA AAATGTCAAG AGCAAGGTCT *GGATATGGGC ACCGTCGGTA TCCGCATCG INFORM(ATION FOR SEQ ID NO: 109: SEQUENCE CHARACTERISTICS: LENGTH: 5548 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109: CCATAGTCTA ACAAGTCTTT GTAAAGGTT ATCCCTGATT CATGTAAAGA ?TTGTAAAG AATCAAAAAA AGCCACPTT GAAAAATGGC TGCTCCTA.AA AATAGCTTTA AAAATTATTA GTCCTGTGCG AAAGATTGGT TAGGAAGAAA AATCGTGAAG CAACTGCCTC TGCCAAGCTG ACTCGTCACC GTGACTTGGC CACCTAATAA TTGACTCAGT TCITTrGACAA TGGCAAGGCC AAGACCAGTG CCACCAGTTT GTCTGCTTCG ACCTTTATTA ACTCGGTAAA A.ACGTTCAAA AATACGATCC TGCTCTAATT GACTAATACC AATCCCTGTA TCTGATACAG AAATCTTAAT GCCTTCGTTC ACCTT-TTGGG TCTTGACCTC AATT'rTTCCC CCTTGTTCAG TGTAACGGAT 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11309 120 180 240 300 360 420 GGCATTGGAT AAAAGATTGA GACATCATCT GGCACCTGCA GCTrGAGTC AAATCCTGTA TTG?1'GACCC TTAGATAAGG TTGTAAATA ATGTrAGAA 
AATGGTTTCA GCAAAGCr GTAAGATTTG GGAAAGTAAT TCACTATCTG ATACGAGGGT CCTTTAGCTG ThAATCCTTC TTCTTGAGCT GAGGTGCcAA CAAA7TCTGC CAAAGAAAGG GTCGTCCATT GTATAGGCAT TAAGAAGATG CTCAACAATA TGCTCAAGAC GCAAACTTTC AGTCATCCTT GAGCCCTTCT TCTTCAGCTG ACA'rCCCCTTL TAATCGAAGT AACTGGGTC CTCAATTCAT GGGAGGCATT TGAGACAAAG GCTAAATTTA ACTTrCATA AGTTCTAATC GTTCTTAAAT CA'rATAGCAA :.too 6- GACGAGCACA GCTTCCACAG AATCALAGTCA CCCTCATGAA GGCrrGGTGA ACTAAA'rTCC GCCGTCCACA TCGCGAAAAT AGCGGAAACT AAAAACGrCC TAGATCCTTG TATTGCTGAT ACCCAAAGAC TTTAAGTCTT AGGTTrGGTGC AAGGTCTCAA ACCTAGTCCC AA.AGAAGGG TGCCATCCCT GCAATCAGAA GCGT'rTCATC TATAACTCCT AGGGGCTTTA GGATTGTCTT ACGTGTTTCC TGCCCAAAGT TGrCATGTTG GGATG'TTTCA CAGTAACTTA TTCGCCTTGT TAGCCAAGAA TCGTCAGCGA CCTGAGGACA GCCTTGACAC CCATGGTTAG GTGCCACAGA ACAACCTCCA CCTGTTGGGA GACTTTGGTT TTTAGGCCAG C1-rGCCCTTrT TTCTAAAAAG TATrCACTAC AAGCAACTTC CCAT-r~CCAA AGGCAAA-AGA TTG'T'rCGGC
ACACATACTG
TCCI'CA.AGAG
GCCAGTACC
A7"rGGGTGGG CCTAAAA.ACG GGAACTGCTG TCACTTCTA.A ACCCACTTAC TTC7'rCT'rrr AACCTrTTTT TTTGATCAAA GAATATCCAT CCGTTrGAGG TCATCAAGTG ALACTTATGTC AATGAGGCAG AGAGCGACTG GATAATAACA TCTGACCTTG C'TAGAAGAAA GAGACCGATG CCTTTACTGA-TCCA.AGTTAA TGAGGCTAAC ACTTAGATTG ACTAGCCAAA ATTGAAGGTA TGAACTTATA ACCATAACCC CGAATGGTTC GAATAAA'TGC CAATTTTTC CCrCAACTrA CCAATATGAA CGTCCACCAA CATACCCCCA GATACGTTCC AAAAGACGC1' CTCTAGTCAG 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 168a 1740 1800 1860 1920 1980 2040 2100 2160 TAAGATAGAG CAAGAGTTCA A.ATTCTTTTG AGACTTCATC ACGCTCAGGG TATACTTTCA TATTATCTGA ATCATCTCCT TCTTGTTCTC GCGCCAGCAA TTCTCTAGGG CTAAAAGGCT
GGGTCA.AACT
AGGTCCCAAA
CTTTAGTTCG
TGTCAGGTA
GTCATCAGCC CCTAATTCCA AGGCCAAAAC CTTATCAAAT TCATCACT -r TCGCAGAAAC CATCATAATT GGAGTTTTGA CGCCTTTGGC TCTCAGCCGC TTACAAACTT CCATGCCATC ThATTCTGGT AACATGATAT CAAGCAAGAT AAAATCAAAG TAAGGCCTTC CGTCCATTTG TCACCAATTG ACTAGAAAAG GTCAAGCAAT TTCAGAATGT CTTCTTCATC ATCCACTAAT GGTTCTG 1'T
CCTTCCTTAC
AAGACTTGTT
CTGCCAAAGC
TrAAATGGTA
TTGTCATCTA
TTATCTCCTA
A'rAAATTTG
GGATAGGTCG
GGAGCAACAA
802 rTTGThACAT TATAACACAA ?I'ATCAGAAA TCCTAACATT GCTAAATCAG CCTATCAAGA CTACTATCTG GTCAAAC=C CAATCATCTC CTTGTGCTCT CCAGrAGATC TACCC1-rrCA AATAATTCAA AATCCTCAAA TPCAAAACCA G.AcAAGAAAc cAcArcA~cA TccTTATcAA cTGTrGA'rcc ccAAATAGTG CACAGTAGM AAGTTGTTGC CCTTTGCATA TGTCCAGGCC TAAAGTGACT GACCATCTGC I'GTAATCATG TGAACAGTAA CTGGGCATCC TGCATGAAAA CATCTrGCGT CAATCGGTGA AAA'rGTGAAG GATTCGTTC 'rTCTAATAAG
CCCTTAGGAA
GCTTCGTAGT
TACCAGATTT
AAATAAATAC TGGTATAAAG CGCCCTTCCC 'rTACCAGCAA GG'TTTATAG'r GTCTGAAGCT rTTTrTGTTT GTCTAAAA'rA GCCACC'rrCA ATATIGGGGAG CTAACTCTAG ACTTCT'rATC AAGTCTTCTT TATCCGTCGG AGCCAATCGG 7TGAAGTA.AC TCTTG=CAA AGTGGTTTTA CGArTTCAAG AACTCCTCTC AGTTCTGAGG ACACGGTAAT GATTGATGCG ACGGAAGTAC AAATCAATCC CCCTAAAAAA AGAATTACCG AATGATTCTG G'rAAAAAAAA TGCCACCrA TCA.AGGCTCA AGCGAwrGTC ACAAGTCAAG CGAG.AAT'TGT TTCT'rTGGAT A'rCGCTGTGA ACTATTGTCA 'rGATATCAAG 1-rGIrCAA).A TGACTCGCAG AAATATCGGA CAAGCTCGTA AAATCTTGGC TGACAGTGGT TATCAAGGGC TCATGAAGAT ATATCCTCAA GCACAAACTC CACGTAAATC CAGCAAACTC AAGCCACTAA CACTTGAAGA TAAAGCCTAT AACCATCCGC TATCCAAGGA GAGAAGCAAG GTTGAGAACA TCrTTGCCAA AGTAAAA.ACG TTTAAAATGA TTTCAACAAC CTATCGAAAT CATCGTAAAC ACTTCGGATT ACGAATGAAT TTGATTGCTG GCZAITATCAA TCATGAACTA GGAT'rCTAGT TTTGCAGGAA GTCTATTATT TGGTrACGG AA'rTAGTGAA GCCTTTAGGC AAGTGTCTC1' GGTTACGACG TCATGGACTC TAAATCGATI- ATATTTAGGG CTCATGACTA G'rGAAGCAGT rAGCTAG'rC GCATATAAGC GGCTAGCGTC TAACAATTAG GAACTTTAGT TCCAATAACT TTAACATTAC GACGT'rTTAG GACATAAATC GATCATATTT ATGTCCTAAA ACTAGTCAAG CGCCTAGCCA AAGTCCGAAT AGGATTT;GC GTTAGTTACT TACATTGCTT TGCAATCAAG TAACTGTGC GATTACATC TTCTCI'GGCG CTTCTACTCC AAGCAAGCGA AGGGCTCTT TGAGAACGAC TGCGNGCG TAGCTGAGGG CTAGACGGCT GTCGCGTTCT GGGCrrTCAT CCAAGATACG TGTATGTGCA TACTATTrTGT TAAACCATTG AGCCAGGCTA APT-rGCAAArr- TAGCAATGAT AGAAGGTTCA AAG?1'ATCTG 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 CCGCACGGTT CATAATACGT GGGAAGTCTT GAATGAGT AATGATTTCC CAGCTTTCAG TATCATTCAA GCTATAGTTG CCAGCTGTr CTGGTTGAA ATCGGCTTTG CGTAAGATAG ATTGGATACG AGCGTAGGCA TATTGAACGT AAGGTCCAGT TTCACCCTCG AAGGATACCA 803 TAGCCTCTAG GTCGAAGI'CG TATCCATTTG TACGGTCGGT T1rGAGGTCA TAGAATTT1AA TGGCTCCAAT CCCAACAGCA TGTGCTrACTr GGCrI-rGTT TTCTAGTTCA GGATTrTTTAG CCTCGATTTG GACCTTGGCA CGGCTAACAG CCTCTGCAAC AGTAGGCTCT AGCAAGATGA CATTCCCIr ACGAGT~AGAG AGTTTCTTCC CI'CITGT AACCAAACCA. 
AAAGGAACCT GAGTAATGTC GTCACTCCAG TCGTAGCCCA TCTCTTGCAA GACAGCTTTG AGCTGTTTAA AGTGGGCAGA TTGrTTCTTGA CCAACGACAT AGATAGATTT AGCAAATTGC TAT 'CGTTT TACGGTAGAG GGCTGCAGCC AAGTCACGTG TGATA'rAGAG AGTTGCACCA TCACACTTCT TGATGAGGGC TGGATGTTCA ATTCCATATT TCTCAAGA?1' CACAACTTGG GCACCTTCTG ATI'CAAGAAG TAGTCCr'rT 'PCAGAAAGAA TGTCTACAAC TGCATCCATC TTATCATTGT AGAAGGC?1'C TCCGTTATAG ATTCCACTAA ACTTTCATCG TTTCAAGTTT ACGGAACCAT CTGTCAAATTr CA.ACCTTCAA TTCATTGTAA AGGCGGTTAA CGGAACCATT GCCAAAGAGC GAGAGCTTCC TCATCTCCAT TCGCGCGCTT CTTCATCCAA GCTACGGTCA TTTTCAGCTT 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220- 5280 534*0 CAGCGTTGAT GCGGACATAG AG? TAAGCA C?'rCGTCGCC CCA'TTTTTG TAGGCAACAA CCAAATGGTT GACCTTGACC GTTTGATAAC CTCCGATAAC AGTTGAACCC AGGTGGCCAA ACATGTCGAT AACAACA'rTT TCTTGTTTAC CAGTGGTAAC AGCTTGCAAT ACTTGAGCAG GTTCATCGAT TGGATGAGCT TTTACAGCTT
TCAACATCCC
CGATTTTTTC
TAGAAAATGG
CAATATTTTG
AAATGGCAGA
AGGCTTGGCT
CGACTTTTC
AAATTGTTTA
GAAAATATGT
TTTAGCGATA
GTCAGCATAG
TTATCAAGG
GTTCATTrTT
AAGAGAAAAA
CCCCAGTCTC
GACAAGCTAT
TTCGGACTAG
TGTTCTTTI-r
AAAAAGTTAA
TCAGCCAGTT
GCAGGGAAAG
GCCTCTTGGT
TCATTAAGA
AAAAAATT
CGTAAGGTCC
CAGCCGCAAT
CAATGTCTCC
CCAGGCTATC
GCTCCTT
TGTrGCGACA ACTTTTTCA CATTTGTGGT GCTTTACGTT CATTTCTGAG TTTTAGGGG TTTCCAGTAA CTT-rAAAATA AATGATGCTA GATAATTCGC TAGCAATCAA TTCTTTTGTA GGACTTTTCT ACTATTTTAT CACAATTTTA AAGAAAGAAG TGAAATCTCC TG?2'TTrTTG GTATAATATG GTTATAAATA AGAGGAT'rTT ATGAGAAAAA GAGATCGTCA TCAGTTAATA TAGTTATAAA TATGCACGCA A.AAAAAATGA TTACTGAGGA GAAATTAAGT ACACAAAAAG AAA'rrCAAGA TCGGTTGGAG GCCCACAATG TTTGTGTGCc GCAGACAACC TTGTCTCGTG ATTTGCGG
INFORMATION FOR SEQ ID NO: 110: SEQUENCE CHARACTERISTICS: LENGTH: 3132 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110:
TACCCGGTAG TCTTAG3CAGA CACATCTAGC GAAAAAGTAG CAGAAAATAA AGAGAAACAT CAGGATTTA AAGAGAAGAA AACAGCAGTC CCTGTGATAG ACAATAACAC TAGCAATGAA AAATCCCAAG GAGATTATAC GGACTCATTT GAAGArAAAG TTGTCTATAT TGC7TGAATr AAGGAACTAT CCAGTCTTAA GAATACAAAA GGTAGTGCCA TAGAAACAAC TCCAGATAAC TCATCCGTTG AAAGGGCACA AA.AAGTCCAA GGAGTTGAGG AAGCTATTGA TTACCTAAAG
TCTGAAGATG
GAAAATATCC
ATTAAGGA.AA
GAAGCAAAAA
GTGAATAAAA
CTTTAAACAT CTCTGATAAA ATAGTGCTAT GGAAACTTCA AAGAAGTTGT TAGTAAAAA'r TCAAAGAAGA AAATTCCAAT ACACAGAAAA TCCCAAAAAA AAAGATAAAG AATCTGGAGA AAAAGCAATC GTNTATATA CTTA'rGATAG AATTTTTAAC TTGGACAAAA TTAAACAAAT AGAANGGTATT CCCATGATGA ATCATGCCAG AAAGGAAATT TCTATCAATG C'rCCGTrrGG GAAAAATTT GATGGTAGAG GTA'rGGTCAT TTCAAATATC GATACTGGAA CAGATrATAG ACATAAGGCT ATGAGAATCG ATGATGATGC ACTGATAAAA ATTA'N'GCT GGCAAAATCA CTGTAGAAAA CATATTGCAG GGATrCTTGC ATAGATGGAA TTGCACCTAA TCTGGGTTTG CGGGTGATGA GTTGATGTTG TTTCGGTATC TGGCAAGCTA T'rCGGGCATT TATGCGACTT CTGCTTCAAG CAAA~CCTCA ATGAGATTTA AAAAAGAAGA CTTAAAAGGC GAGTGATAAA ATCCCTCATG; CGTTCAATTA 7TTATAATGGT ATATGATGAT GGAAGGGATT ATTTTGACCC ACATGGGATG 'rGGAAATGAT ACTGAACAAG ACATCAAAAA cTTTAACGGC TGCACAAATT TTCTCTTACA AAArGTATTC TGACGCAGCA AACAATGTTT CATGCTATTG AAGATTCTAT CAAACACAAC ATCTGGTTTT ACAGGAACAG GTC= GTAGG 'rGAGAAATAT AAGAAAAGCA GGCATTCCAA TGc-rC;TCGC TACGGGTAAC TTCTCATGG GATTrAGTAC CAATAATCA TCTGAAAATG 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560
ACCGACACTG
GCTAAAAATC
AGAAATATAG
GAAATGTAAC ACGAACTGCA AAACAGTrGA GTr'TGATAAA GGCC'r'rTTT CGATAAGAGT
GCACATGAAG
GTTAACATAG
AAAATCACAA
ATGCGATAGC
GTGGAGAAAG
CAAAMPLAGA
GGTCGCTTCT
TT'rTAAATAC
TGGAACAAAA
GCTCCTAGTA AA'rTAAAATT TGTATATA'rA GGCAAGGGGC TTGGATCTTA GCGGCAAAAT TGCAGTAATG GATACAATTT GCTTTTAAAA AAGCTATGGA TAAGGTGCA CGCGCCATTA AAGACCAAGA TTTGATAGCT ATACAAAGGA TT'rAAAAAAT TGGTTGTAAA TACTGTAAAT 805 TACTACAATA GAGATAAT1'G GACAGAGCTT CCAGCTATGG GTATGAAGC GGATGAAGGT 1620 ACTAAAAGTC AAGGTTTC AATTTCAGGA GATGATGCTG TAAGCTATG GAACATGATT 1680 AATCCTGATA AAAAAACTGA AGTCAAAAGA, AATAATAAAG AAraATTTTA.A AGATAAATTG 1740 GAGCAATACT ATCCAAII'GA TATGGAAAGT '1TTAATrCCA ACAAACCGAA TGTAGGTGAC 1800 GAAAA.ACAGA TGACTTTAA CITGCACCT GACACAGACA AAGAACTCTA TAAAGAACAT 1860 ATCATCGTTC CACAGGTC TAACTG GGGCCAAGAA TAGATITACT TTTAAA.ACCC 1920 GATGTrTCAG CACCTGGTAA AAATATTAAA rCCACGCT TA A'TG=rATAA IGGCAAATCA 1980 ACTTATGGCT A'rATGTCAGG AACTAGTATG GCGACTCCAA TCGTGGCAGC T1'CTACTGTT 2040 TTGATTAGAC CGAAATTAAA GGAAATGCTT GAAAGACCTG TATGAAAAA 'rCTAAGGGA 2 100 GATGACAAAA TAGATCTTAC AAGTCTTACA AAAATTGCCC TACAAAATAC TGCGCGACCT 2160 ATGATGGATG CAACTTCTTG GAAAGAAAAA AGTCAATACT TTGCATCACC TAGACAACAG 2220 GGAGCAGGCC TAATTAATGT GGCCAATGCT TTGAGAAATG AAGTTGTAGC AAC TTCAAA 2280 *AACACTCATT CTAAAGGTTT- GGTAAACTCA TATGGTTCCA TTTCTCTTAA AGAAATAAAA 2340 *GGTGATAAAA AATACTTTAC AATCAAGCTT CACAATACAT CAAACAGACC TTTGACT'rTT 2400 A AAGTTTCAG CATCACGAT AACTACAGAT TCTCTAACTG ACAGATTAAA ACTTGATGAA 2460 *ACATATAAAG ATGSAAAAATC TCCAGATGGT AAGCAAATTG TTCCAGAAAT TCACCCAGAA 2520 *AAAG'rCAAAG GAGCAAATAT CACATTTGAG CATGATACTr TCACTATAGG CGCAAATTCT 25S80 AGCTMGA'rT TGAA'rGCGGT TATAA.ATGTT GGAGAGGCCA AAAACAAAAA TAAATTrTGTA 2640 GA.ATCATTTA TTCATTTTGA GTCAGTGGAA GCATGGAAG CTC-AAACTC CAGCGGGAAG 2700 *AAAATAAACT TCCAACCTTC TTTCTCGATG CCTCTAATGG GATIrGCTGG GAATTGGAAC 2760 CACGAACCAA TCCI'TGATA-A ATGGGCTTGG GAAGAAGGGT CAAGATCAAA AACACTGGGA 28S20 ***GGT-rA'GATG ATGATGGTAA ACCGAAAATT CCAGGAACCT TAAATAAGGG AATTGGTGGA 2880 GAACATGGTA TAGATAPAA'T TAATCCAGCA GGAGTTATAC AAAATACAAA AGA'rAAAAAT 29 ACAACATCCC 'rGGATCAAAA TCCAGAATTA TTTGCTTTCA ATAACCAAGG GATCAACGCT 3000 CCATCATCAA GTGG1-rCTAA. 
GAT'rGCTAAC ATTITATCCTT TAGATTCAAA TGGAAATCCTr 3060 CXAGATGCTC AACTTGAAAG AGCAT1TAACA CCTTCTCCAC TTGTArTAAG AAGTGCAGAA 3120 GAAGGATTGA TT 3132
INFORMATION FOR SEQ ID NO: 111: (i) SEQUENCE CHARACTERISTICS: LENGTH: 14672 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111:
CGAGATTTCT TTAAATGAAC TACGTGAAAT CTACCCATCA TCCAGATCTG GATATTCTCT CCTATCTATA AGTAAAGIr TAGGAGATTT TAATATAAGT TCTCATGCTT TTAAAGCTTC GGTAAGAGAT TTAAAACCGC TCAGTTTCCC ACTCATn'TGC ?TrCGGGAGA GrTCArr TATTATTCTT GAAAAAATTA GTAAAAACAA GTTTTATATT ?TAGATCCTG CAAAAGGCAG GCAGAGAATC TCAATAAGTG AAAGTTAGAT AGCN'TATGT TrAAGTAT*AGGAATAAGC AATTTGAAAG GCATTATTCA AATATCATTr CTCGTAAAGA TAATAAGAAG TCGCCTGTTT TAGGGATTTT ATTTTTTGTA ACAGCATTAT
TAACATTTAA
TA.AACTATTT
TGTATGTAAT
AcAATCATTA GTACCTATAG CTAATAGATA CATAAN'GAC ACGAATTTCA AGGACGArrc GTATTCGTCT AGAATGTTAT TTACTATA'rT ATTTATATTT ACTCTTTCAT TCTCACTAAT GTATTTATTA AGACAGATAT ATGTTGCATC CTTAAAATAT ATAATGGATA AAGAGATTAG CTATGATTTT ATGAAACATT TGATATATTT ACCrrACAGT T'-TATGAAA AACGTACTTT AGGGGATATA CTTTTTAGAG CTAACTCTAT TGTTTATA'rA AGAGAAATAC TATCAAATAA TTTTAT -'PA GCTATACTTG ATTTGTTAAT GA'TTGGTT TATGCTGTGG 'N'TTAPTTAG CTTTTCTAAG TACATGGTAA TCTTTTTAAT ATCACTA.AGT GTATCCAATC A'rAAAAATCT CAAAAAATTT AATTCATAAA TGTTCAA.AAT ATTACTTCCG AGAGCAATTT TGGATTAACA AAGTAATTTC TAAAAATAGT AATGGGATAA TTTTA-ATACA CTAGCTCTAT CTATTGTAAT AATATAAAAG AAAAGGTTAA GATATTAAGC TAACTGGAGA AAACAGCTCA 'rCATAGGTCG AATG7?TTTAC AAATTATTCT TTCGAACAAT TGACGTTAU;G 480 540 600 660 720 780 840 900 960 1020 10a0 1140 1200 1260 1320 1380 1440 1500 1560 AAAACTTGAT ATACATTTAT CAATTGTTAG CCCTGTTG ACCCTTATTG TAGGTGTAAA ACAAATTGTA GCAATAAGTA CAGTCTCACC TGATAACTAT ATACAATTAA TGTTATTAAA -TAATACTAAA TCCGAATTAA TTCCAGAAAG
TAGTATAACG
TATAAAAACA
ATACTTTATT TCTCCTATAA TTTCTTTAAG GGGATATTTT TTAAGAATAG AGGATGTGT'r AGTCAGTCAA GATATAAAAT AATAGAATTA AAAGATATTT GGTATAAATA TGGATTATTT GATGATTATG AATAAATGTT ACTATTAAAA AAGGAGAAAC TGTTGCTATT GTTGGAGAAT
TTGATAAAAA
TTTTGAAAGG
CAGGTTCAGG
TAAGAGTACA TTAGCTAAAA r=TATTAGG TTTATT'AGAA CCTAATATTG GTTCAATAGA AGTTGATGGA GTAGAAAAAG AAGAAATTGG TCAAACATTG TATAGAA.AGA 71TTGGAGC 11 -l- 807 AGTGTTACA6A AAMTCAACCC TAAGPATGG TACCTTAAGA GAGAATTGA CTTTGGTTTCA GATCAAGAAT TAATGACAAA TCTAAATTCA ATTGGTCTIA TAAATCT'rrA CCTCTTGGAT TAGAGACAAT CATCGCTGAA GAAGAATA AGGGCAGCP.G CAAATGATAC TTTTAGCTCG TTGTCrTI'G TCGAAACCI'T T~rTGGACGAA GCAACAAGTA GTAGATAA TTTATCTCA6A CAAATTACAA AAGTGAAA'rC GGTACCACTA AGATTTTAAT TGCCCATCGA CTAGATACTA AGATAAGATC TTAGTAATGC ATAATG.GTGA AATTGTAGAG ATTGGGACCC TC AACTA CGAGGCATTT ATAAGCAATT GTA'rTCAAAT AATTAGTT CCTAAATTTA TGAAGATTAT GAAAAAAAAA TATTGGACTT TACCATATT TTGTTCAATA AT'rCTGTTAC TGCTCAAGAA ATACCTAAAA ATCTT'GATGG CACACTCAGA CTAGCGAAAG T7"?rTCTGAA TCTGATGAAA AACAGGTTGA
CATTTGGACA
GCAATGTAGT
ACT'rTCT=
CGGTAGTTGT
CTTCTTACr
TCAAGTCTGC
ATAGAGAACT
TGATTAAAAG
ATTTTGT
CAATATAACT
CTATTCTAAT
AAAAATCAAG AAGAAGTAGA CCAAAATAAA TTTCGTATTC AA.ATCCATAA GACAGAATTA TTTCTAACAA CAGATAAACA TTTAGAAAAA AACTGrTTTA AATTGGAACT TGAACCACAA ATAAATA6ACG ATATTGTTAA CTCTGAAAGT AATAA'rTTAC TAGGCGAAGA TAATT'TAGAT AATAAAATTA AGGAAAATGT TTCTCATCTA GATAATAGAG GAGGAAA'rAT AGACCATCAC AAAGATAACT TAGAATCGTC GATTGTAAGA AAATATG.AAT GGGATATAGA TAAAGTrACT GGTGGAGCCG AAAGTTATAA ATTATATTCT AAAAGTAATT CrAAAGTTTC AATTGCTATT TTAGATTCAG GAGTCGATTT ACAAAATACT GGATTACTGA AAAA'rCTrrC AAATCACTCA AAAAACTATG TCCCCAATAA AGGATATTTA GGAAAAGAGG AGGGACACGA AGGA.ATAATA TCAGATATTC AAGATAGATT AGGTCATGGT ACGGCTGTTG TAGCTCAA6AT 'rGTAGGGGAT GACAATATTA ATGGAGTAAA TCCTCACGT AATATTAACG TCrATAGAAT ATTTGGTAAG TCGTCAGCTA GTCCAGATTG GATTGTAAAA GCAATTTTTG ATGCTGTAGA TCATGGCAAT GATAWrATCA ATCTTAGTAC TGGACAATAT TTAATGATTG ATGGAGAATA TGAGGACGGA ACAAA'rGATT TTGAAACATT TrGAAGTAT AAAAAGGCTA TTGATTACGC GAATCAAAAA GGAGTAATTA TAGTAGCTGC ATTAGGGAAT GACTCCCTAA ATGTATCAAA TCAGTCAGAT TTATTGAAAC TrATTAG~rC ACGCAAAAAA GTAAGAAALAC CAGGA'rrAGT ACTrGATGTT CCAAGTTrTr TCTCATCTAC AATTTCGGTC GGAGGCATAG ATCGCTTAGG TAAT'rTATCA GAT'rMAGCA ATAAAGGGGA TTCTGATGCA ATATATGCGC CTGCAGGCTC AACATTA'rCT CT'rTCAGCAAT TAGGACTTAA TAACTTTATT AATGCAGAAA AATATAAAGA AGATTGGATT 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 808 TrrCGGCAA CACTAGGAGG ATATACGTAT CTNTATGGAA AC1'CATTTGC TGCTCCTAAA GTTTCTGcTc CGATTGCAAT GATTATTGAT AAATACAAAT TAAAAGATCA GCCCTATAAT TATATCTTTG TAAAAAAA~r CTGGAAGAAA CATTACCACT TAAATATACC AAACGTATTG AGATATGATT 'rGAATATGTT AACAAAGTTG GGATAGTTC ATAGATAATG ?rAATTTAAT AAACTACTAT TGGAATTAAA CAAATAAACA CACACAATAT GGTACTCTCA AAATATTTA CCTAACACTT CAGAAAATAC G1'TTAGTTrGG ACTATTACTA CTTTT-TATAA GTATGGTAAA GTAAATGAAA ATAAAATrTG GAGCCCTCTG AAAAAGTAAG GAGTCAAAAG ATGAATCACC TTGATGTAGG GGAGTTTGTC CCCTTCAGAG GAAGAACATT ATAAATCTGT TTTTGAAGAC
AAAAAATGGT
ACAArACGAA
TCANGTTGGAA
TATTACTATT
ATATAATTCA
TATTTTATGG
ATAAAAGTGT
TATAAAAATG
GAGAGAATC
GCCCGAGAAG
?rACAAGTCA
GCTAAAAAAA
TCCTACAGTT CAACTAA.AAT TTATTGCTGC CTGAACACCT GACTTAACCA GTCGCATATC 'rATT'rAGAAT CAGGTCAGGA 7TTTGAAAG ATCCAATCAT TAGTCAAGAT CAACGACAGC TCGTTTTGTG TATAATACGA
AAATGACTGC
CCCCTATT'TC
CAACTGGTCC
TACGGTAGGT
TTACCAGCAG
CATTGTTATA
ACAGAACTAC
CATTGAAAAT
TAATATCCAG
AATCTTGTTG
TATCAAACGC
ACTGGGTG'rG TGCT'rTCTTA
GCAGAAAGAA
AACAGGTGAG
AGGCTGGAAA
ACACCCCAAT
GrTrCTIA TGGGTCT1CAG
AGCGAACCTT
ACAGTCCATT TTGTMTTCA TAGACGCAGT ATCAA7T=GC 'rGA'rCCCAGCGAGCTTATCC AGAGACAAGG AAATGCAAAC AGGN'ACCAC AACTACATCA CATTATTGGA GGGTAATGCT AGCAGGAGCT GTGC7TGGGA TTGCAACT'rC *3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 'I-r'AACACTA TGAATAGGCT CTAC77TGA.A GAA7rTAGAC ATTGCAGGTC TCACGI-rCTT AGAAATCCAT CGCACTTATC 'T=ACTGG GATTTGTTGC GAGTGTrATTT CTTCAGGTAG GTCTTGTT-AC TCTTTACTG TCTAwrCTCT TTACAGT1'AC AACAAGATGT CCA'rGCTTGT TTTGAAGGGA GGTTAATATG TAAATCTr GGAGAACGAG AGTTAT'rTTC GAATCTTTCA AGTCTATGCC TT-AAI-rGGTT CAACTGGTAG CGGAAACA r-AAA -rAcAA CCTTA'rCATG GGACGATTTT TTACCGAGGT GTGCCATTr-r TCTTTCCrCA
AGATAGGAGT
ATGTCCAAAT
ATTGAACT1TA
ATGACATTTG
ACCTTCATGA
APAACC?1GG
CAGAACTTTG
G-CTCAAAAGT
GGCCTGGTTr
CGGGTTGCCT
CCAA1TTAA ATCAAGTGAT VrTTTTCCGTC ACGAA'rGG GCTTAATTGA AAACCAAAGT ATTGAAGAAA ACCTTAAGCT TGAGTCGGTC CGAACAGCG TTGAGGCAGA AGCAGGCTT ATCTTGACCT AGATAAGCGC A'rC7TTGAGT TATCGGGCCG TGGCAAAAAT TATCTTAAAG AAMCCACCCT TTATTCTGGC CTACCTCT'rC
AGGTCTCATT
AGAACAGGTC
AGAATCCCAA
AGATGAGCCA ACAGC?1'CAA 5100 TAGACCCAGC AACCTCTCAG TTGATTATGG AGATTTTGCT ATCTC7TCGA GATGATAATA GGCTAATCAT TATCGCAACA CATAATCCGG CAATTTGGGA GATGGCTGAT GAAGTGTrCA CGATGGATCA TCTGAAATAA A ACCTTGT TTTAATTGC ACCATGAGTT ACTVAAATAT TATCATGAAT CAAGAATTGG Am-rAATTTA GAArrGTACT TAATrrAGAA TTAATATTGA GGTAAC?1-rr AAAAAATTCG ACAATTATA'r AAAATCTTAA TAATTGSAAAG TCTGATAAA GGAAGAAATA ATGGAGAGGA TATTGAGAAG CCTTGCGATT CTAATTCAGA
TTGTACTTTA
AGTTAGAATC
TAAACTGCAA
TTTGGTAGAT GATATTlwrGC AATrMCTCT CAGAATCAAT AATAGTGTrAG
TGTTATTTTG
TTAGAATTTC
CTATGGTAAA
GAGAGATTTT CCTCCTACAA AGGAAGATAT TGTGAAAGTC AAAAATTTAG AGCATTTTTG AAAG'rAGCTr TGAAAACGTA CCG?=~AAA AGAAAACTAT CTTTATTCCA AAAGATGATG ATAAAGTTGA GTGGAATTTG GCTrAGTAAT CTGTGTTGAA GGCTCAAAAC TTGCCTCCAA AGATT'rAGTT AAATAATGAT GAAAGATGI1' GT7TCAGTAT TGAGAAAAGG GGAAGAAAAA GTATAGAAAT ATAGTCAATT TGCCATATTT TTCrCCTTTC GCTTTACAAT
0 TTAACACAAA A.AGAAATTAT TGAAG'rTCTG TGGGAAAAAC TTGCGAT?1r CACAGAGAAA GAAACAAGAA CAGGATAAAA GAACC1r=G TGGATTGAAC ACCTTTATTG ACCCGCATAG CGGGTGTTTT ACTTAATCAA AAAACGCACC TAAACTAATr GACTATACTT CACTCTTAAA ACCAAAGAC ACAACATTTT GGATAATCTC CCTGGTAAAG CTTCTTCTGG TTGGGACCAG AAATGACACC GGGrAAAAAC TCTGACCAAC TATCGCGTrT GGAGrN-rTT TGGTATAACC TTCCACGCAC ?rTTGTCTCG CACCTAACGG ACAGACAA ACTAATAGTC ATA'rCAAAAA CTAAAAAGTT TGATATCATG CGTCATGTCT TCTATTCAAA TGA~CCTTTA ACCAATTGAT TGAGCCAATC AATTTCTCGC 'rTAGCTGACT CTTCTGAATC TGAACCATGT ATTTTCTCCA GCACNTTTG CAAAA'rCACC TCGAATAGTG ACGAGTTGCA CCCATCATGG TCCGCCAACT TTCGATTACT CACAAGAACr GGACCTGAAG TCATGAATTC ACGAATCGGT CAAGTCCTCA TAGTGCTGGT CAATCAAC'rC TTCTGAAACC 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840
TGTGAACGAA
TCACCCACTA
CCCGTCTCCT
AGTTTTCAGA
TT'rCT'CCAC ACTCCAATTT TTCGATTGTA AATCCACGTT GCCCTCTTTT TACACCATCT GGTTrGATGA TTGTC.AGCTTr CTTTCTTTTA TTTTACCACA AGAGAGAATG AGAGAACCCT CGGGTTC'TCT AGTT'rCAACG GCAGTATCCA CAACTACTTC GTTCGATGCG CTTTAkACACT TAAAGAATGT TTGTTrCCATA =rCGTGGAA AAATGCAG.AA CATTCTCTCT TATTCTACTG TGTTGTTTCT TCATTTCCTT CTTCCTCTAC TGGAGGATTA AGGTAT-rCTT CTTCGTTGAC AGCATGTGGT TCAAGGTTAC 810 GGTAACGGGC CATACCAGrA CCAGCTGGGA TGATCTTACC GATGATAACA TTTCTTTAA GTCCAAGGAG .ACCACGGA TAGCTGCGTC AGTAAGGACA CGAGTrGT CCTGGAAGGA AGCCGCTGAC AAGAAACTGT TTGTTTCAAG TGAGGCTTTG GTAAT'rCCCA TAAZGACTGG GCGACCTGTC GCTGGAACTC CACCTGCG.AT AAGGACATCT TTGTTGGCAT
CTGTAAAGTC
TGACACGGAC
CTACCCCTTG
ATTGATAICC ATGAGGGTAC TTTACGGATC ANTGACGAA GCTACGGTAA ACTTTTTGTA CCATGAGAAG-ACTGTATCA CCTIGGATCCA CCATTACCTC GATCTGTTTG TCACCGATTT C7TCACCGAG AAGGTACGTT TCAACTGACA AGACATCACG AACTG.CAAGG AGACC raT GTTGGATAGA ACCTTCTGTC AGACCAGCAC CACGCGCTAC TTCGCCCCCA ACT'rCGACAC GCATACCAGC TGTAAATGGA ACGACATATT CACCI'CGCC ACGrTCACCC 'rrAACAAAGA CTTTCTTGGT ACCAGTTGAT GCATCTTCTT CGATAGCZAGT AACTTGTCCr TTAACCTCTG TAATAACCGC TTCCCCTTTA GGATTGCGGG V0.0 0.1 CTTCAAAGAT TTCTTGGACA CACCTGTGTG GAAGGTACGC CGATTGTACC AACTGCTTCA AACAGTGACG GCAGACACCC CTTCCACACC AGCATTGACA CAATA)TCAC TGCACCAGTT CACGCTCTrC GAGAGACTCG CGAGGAAGAC CCTGAGTGAT ATCGGTATrr GAC.GCAACCC ATTGTAAGCT GTGTACCAGG TTCCCCGATA GATTGGGCAC CCAACTTCAA CCGCATCACC AG'rCCCCAAG 'rTGATACCGT TGACGAGTGT TACATGTAAA TACAGAACGG ATAGTCACTT ATTTCACGCG CCTTGTCTTC TGTAATCAAT TCATTGGAC 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 TCTGGATC?1' ATCATCTCrT TAACAGTTT CTTAGTrGTAA TTCCTTCTGC GATAGAACGG TGATAACGTC TTGGGCAACG TAAGGGCCGT ATCGGTCATA
CGACCGTTGA
ATCAAGAGAC
TCGACCAAAC
CCTTTACGAG
CACGGTCAGT TCCACAGTCG TCCTCACGGA GACGAG;TCAA GTAACCTGAG TCGGCTGTCT
a CACCGTGAGT TGAGAAGAAC ATT'rCCAATA CCGACAAACC TTCGCGGAAG TT'rGAAAGGA TTGGCAATTC CA'rGATACGT CCATTCGGAG CAGCCATCAG ACCACGCATA CCGGCAAGCT OTGAGAAGTT TGAGATGTTA CCACGGGCTC CAGAGTCCAT CA'rCATAACG ATTGGGTTCT TAGGATCTTG GTAGCAA'rC AAGCGTTTCT CAAGTTTTTC ACGGGCAGCA CCCCATTCAG CTGTAACACC ATTGTAACGC TCGTCGTCTG TGATCATACC ACGACGGAAT TGTTTGGTGA r'rTGTTCGAC ACGT'I-rGTGT GATTCTTCAA TGATTTCAGC CTTGTCATCA ACGACTGGGA TATCGCCAAT ACCCACTGTC AATCCTGCAA GAGTT'GAGTG GTCGTAACCG AGGTrCTTCA TGCGGTCAAG TACGGCAGAA GT'rrCTGTCG TACGGAAACC TTTGAAGAT TCAGCGATGA TATTrTCCAAG GTTTTTCTTC TTGAATGGAG GGTTGAGCTC AAGA'rTGCTG ATAGCTTCCT TGATATCTCC ACCAAGTGGC AAGAAGTATT TAGCTGGAAC ACCTTCTGTC AAGTrGGCAT all rTr~TGGTTC TTGCAAXGTAT GG1'AGCCCCT CTGGCATGAT ATCG?1'GAAO AGAATTTTAC CAACTGTTGT AAGCAAGACC TTATGTCIr GCTCrCTGT CCAAGGCrrG ?1TGAGGCTGT Ct-roTGCGAT ACCAACACGT GAGTGGAGGT GAACATAACC ATTGCGGTAA GCCATAACCG CTTCGTCACG GTCTTTGAAG AGTAGTAGIT ACCCAAAACC
ACCATTCCTT
AT1GTCCTGAG CACCTTCGCG ACCAGCTTCT ATGGACTAAC 'rACCGGTTTC
TCC.ATGGTCA
CCATCTrrcG
GCTTCTTTG
GG'rrCAAGAT GTGCTCAGCA GCTAGCATGA GGATACGAGC AAAGTGGTAC GTGGA'rGGCC A'rGGTCCC CGTCAAAGTC CAAGTGGGTG CAACCGAAGA GCCT'rACCAT CAATCAAGAC CCAAACGGTG AAGGGTCGGT GCGCGGTTCA AAAGCACTCG CAAGGATATC CCAGATACGC TCATCTCCGC GT'rCCACCAA TTGCACGAT ATCACGGGCA ACCATTTCAC GCATGACAAA CCATTTCACG CGGCACACCA CATTGGTACA TCTTAAGAGT AACGTCCTCA GAACTCAACA CCI-rACCGA GCAAG?1'TTG CTTTAAGCAT GTGGCTCAAT GATTTCAATG GACGGCTACC C-ACGACGACC A'rTGTCAATC AAAGCGTCAA CTGCrr GAACGATGAT ACCTGGTCCA TTAACTCAA GCAAACGAGC TAACACGCCG GTAAAGGTCA TTCAAGTCAG ATGACCCAAA ACATTGGACG AAGA'rCTGGT GGCATAACCC GAAGGATGTT TGTTTCCAGA CT'rGTAAAAG GCATCCAAAA CATCCAAACG .TTTGTCCAGT AGCTGTTTTC AATTCTTCTT TGAGTTCAGC
TCGCTTGT
A~CATTGTAG GCTTCACAGA TGGCTCGAAG GCTTGGATAC GTGT'rCIrrA ATCACTTCTT GCGTTTAGCT GCTTTGACGT TGGrr-rAAAG ACTrCAATCG TCGACCAACG GCGATAACTG ACGGAAGCCT CCTTGI-rAC TGGTCCTGTG ATTGGACGAC A.AGCATACGC TTCTCA==T CAA.ACGGTTG TTACCGTTGA ACGGCCACCA TCCAACTGCA '.ACAATCATC CATTCAGGCTT ACCATCGC1' TTGACACGCT k.ATrTTTrT TCAAG2ATCTA 8880 8940 9000 9060 9120 91.80 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 CTTGCTTCAA AACGCTTGG ATGGCTTCCG CACCCATCI' GGCAACAAAT GAACCATAAC CA'ATTCACG CAAGCGCTCT CGGTATTCGC GCTCTGTCAT GATACAC'rTG 'FGCTCAAGTG GTCTATCCT'r AGGATCAATC ACCACATAAG CCGCAAAGTA GATAACTTCC TCGAGGGCAC GAGGGCTCAT ATCAAGGGTC AAGCCCATAC GGCTTGGAAT CCCCTTGAAG 'rACCAGA'rGT GAGATACAGG AGCTTTCAAT TCGATATGTC CCATACCCTC ACCACCAACT ?TACTTCAAC CCCACAGCGG TCACAAACAA rTCCTCTGTA ACGAATGCGT CACAAGCACA T-rCCCAGTCT TTTGTAGGAC CAAAGATCAC TTCATCAAAG GTTCTGG'I-Ir CAAGGTACGA TAATGArrG TCAGCTTT TrGCACTTCT ATGAACGGAC TTTACTTGGA GAAGCTAGGG TGATGCAT ACTrrTAAAA
TTCGTACGCG
TTGTACTTAC
AGTCCTTCAC
CCATAAGACC
CGAT1TTACAT 812 CAACCACTAT TTCTTCCCTT TCTATTCTAA GTGAACTGCT TATTC?1TG TTCrrGCr TCCGCI-rTG TTCTTrcr AGCIrrC~rA GCrrCAAAGG CTCTGGGCT GCTTrCGC GGGC?TTC AAGGTCATCT ACGTGGATGA CATTCCTTC.A TCCAAGTCGC GAAGTTCCAC TTCTTGGTCA TCTrCGTCTA
CAGCAGCTTC
CTGCTrAGC
CATCTTCGTC
GGACACCCAT
GTCAAGACCA AGAGATTGCA ATTCTTTGAC TTTTGGAAT GGTTTGCCTT ?TTGTAATAGC AAG-AACTCGG AAGGATT=r GAACACCTGG ITCATAGGCT VrCAAACGTC CGTTGATATC GTCCGACTTG TAAGTCAAGA Trcr7GAAG GACATTTGAC CCACCGTAGG CrCAACAGC
CCAA.ACCTCC
?rCGGTAACA
GTGGAGTTTG
ACGTCCATCG
CCAAAGATCT
ACGAGCTGCC
ATCTCACCGA AACGTTGTCC ACCAAACTGA GTTGAGTATG GTCCGACGA ACGCGCGTGC ATCATGTACA TGACTCCGAC ACAAACACGG TAAAGCATCG TT'rTGGCATC GCTATCCATA TCAGAACNGC CTCCATCAAA GACTGGTGTA ATACCAAGGT GAAGCTCCAT AACCTGACCC GCCTTrACCTC CGACTGGT-rG AA'N'TA'CAT CAACCATGTG ?'rATCAAACG GrrCACCAGT CCTGCTTrr TAACAGTTGA CCGATGTGAA 'rACCAAGAGT ATATTCATAC GTGATGGTAC GGAAGGTAAG GCATGTCTTC CCCAAGTGGG TTCAACATGA TGTCGACTC4G AGTTCCGTCI' TACAGGAACG ATACCAGAGA CAACCCCTTT GACCTTAATC TTACG~rTTT GAGCGATGTA CAACTCATCT CCATTTACAC GTGTAAAGAT CTGTGGTACA CGAAGAGAAG TATCACGCAC CAAGAGACGT TCTTCAGCTG AAAGATCrTT AATATCACCT TCTTTAACCT CAGCACCAAT GAGGGCATCT TCACCAACGT 'rrGGAATT'rC ATCGCGCGTT TCTGATTCGT ArTCT'rCAAG GTTTCCGTGA CGTCCGGCCA TTTTATCTCC AACACGAACC AACATGTTAA CACCTGATTG CTTALACATCA CGAACCACAC* CATCGGCACC' TTCACGAGAC TTGTCTCCAA AGATAGCCG CTCACCCTTA GGTACTT TACCTACAAG ACCGATAATC CCCAT-rrCGT CAAGGTCTTT GCGAGTGAT'r TCTTCAGGCC CAAGCTTTGT CTCAACAGAT GTGTAGACAT CGTCCTTCAC 10440 10500 -10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 CAAGCCTTCG CTCATGATAA CGGCATCCTC GAAGTTGTAA AACGATTGGG TITTGTCCAA GCGCCATTTC TCCATTTTCC GAAATCGCCT 7nIrTCAACGA CATCACCAAC T=TACGAGA ACCTGAGTTT GAACGACGGA A~rTTTGGAT GTGGTAAACA ACGAAC?~rCr ACCTTCTCAG CATCTGCGTA ACTAACTTTA CCN'CCCA6AG TCATGTAGGC ATAGAAGGTC CGTCAGCGAT GT'GCGTTGGT TGTAAGCAGT TCCAATGA6AC CATCTrCACG CCATCATACT GACCAATCAC AGCCGCACCA GAATCGTGCG CTGCTTGGTA TTCCATACCA GTACCAACC;T AAGGTGCCTG AGGATTAATC AATGGCACAG CCTGACGTTG CATATTGCCT CCCATGAGGG CACGGTTGGA GTCATCGrr TCCAAGAAAG GAATACATGC TGTCGCAACC GCAACTACCT GTTTTGGTGA AACG~rCCA'G TAGTCA6ACAA CATGACAATC TTCTCAGCAA A~rTATT-CA TCTTCTTCAT TATAGCTGG ATACTCTTGG T'rGCCCrr GGTGACGTCC AGrCCATC TTCACGA CGAr.AGrAG CCTGAGCTAC 12240 12300 CAGCTGTCAA CCAAACAATT TC17TTCGTGA CAACACCTGT 12360 AGTGTCCATA 12420 TTCACGGTCA ACCTTACGGT ATGGTGTTrC AGATGACAAG TTATTGATCA AACCGATT ACCACCATAG TGAGTGTACT GCACGTCACG ACCACCAGGT CCrAAGGCTG 
ACAAACGCG TTGTCCATG AACTGTGACA ACGTGATGA AGGACGGATA TTGATAATTT GTTGTGGTGT TTCACGGACA TTAC CCA TACGAGAAAG ACCAACCGCA CCATACGAC GATT'rCCAAG AACAAAACCA TA~rrGTTCA
AGGTCCTTCA
TACTTCA'rAT
TTT=TGAGAC
ACCAAAGAAT
CAAGACTrCA GCTGTCTCGA 7rr/GACACAT CCAGCACC.GT CACGACTCAA AACTCAGAAA GCCGGGTrT= TCTTTAACTG CAACCTGTTrAC TTrGTCCTGAA CAGACATACG TCCCAAACGT ACTTGGTTCG CAAGCAATTC CTGGTCGATA TCATCTACAC GGCCAAGTCC TTC-AGCCAAG TTGAGGA.AGT AGCTCATCTC AGCAAGGATA AACCT'rGTCA TCTGGGTTAG AGCAATAACC TTGAATTTr'r GATGTAGACA ATCTG=TCA ACGAGTCATA ATCG;TACCAG CTC'rGCAATG GTTTrGGTTGA ACGACCAACT GCTGCCAAGT ACGTGAGCTT TCAGCCGTCT GGCTTCGTCT GTACGAGAGT AACCAATTCG CTGTCACCAA ACGAACCAAG GTTGTAAAI'G TTT'rCACTCO cTrrcAAGTT CATTACCAAT GATCGTrACG GAAGAACAAC ACGCrCAGTC AGTCGCCATC CAAATGGCTI! C'rrCTACCAA GA'rTTCCA TCTGCACGAG TCACCGTACC ACCCCATCTG GArCAGTTGG ACAACGGCTG CATCG=NG 'rCAATGCTTT CAATCACGCT GTTTCAGGGT CTACCAATGG GCAAACGTGTr CA'rAACGACG
TAGGCTCACC
CCATTGGATT
AGATATCAAA
GAATCTTACC
CCAACCAAC
TTTALACATTG AGrT=~PAT TGATTT'rGl'A TGGGTrCAAAG AAGCGACrA CAACCAAGCT 'rGGACGAAGG CG=TCGTAAA TITTCTTTCAA CTGTGGATA TCTT -rrCAA CAGTGTTGCG GATTTCATCA TCACCTGAGA AACCALAGAC AGTACGGTCG ATACGAGTGT ACGTGATATC TCCACGGTTA GGGA'rAACAG 'IrGAACCA'rA GTTAAkAGTAA ACACCTGGTG AGCGGACCAA GATGATGAAA GTACCCAI-rT CTCTCATGAT CrGATTTCG CTT=="C~ TATrGATCAA 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 1344G 13500 13560 13620 13680 13740 13800 13860 13920 GCCCACCTTA CCATTTTTGT CTACTTTGTC .C1GAGAAACG ATAATACGTT CACCACCATT TGGGAAATCA CCAAAGAAAA CTTCTTGGGT ACGGAAGGTT ACAAAAATTG GTGCTGAGTA GCTAGCATCG CGTATATTTT- GGTTCCTTGA TTTCATATCC AACAAA'rrCC GT1TTGAAATT GGCAATACAT CTTCAAACAC TTCCTTAAGA TGGATACGAG CTTCTTCTAG AACTCCATTG TCTCTGTGAA CCGTGGTCTA GGAAAGCILT 814 GAATGAGTCA GI-rGAAT1r CAATCAAAT'r TGGTPAGCTCA AGAACTTCTT TGATTCT'rGA AAAACTACGA CGGGTACGAT GIrCCCGTA TTGAACGTCA TGTCCTGCCA AGATGATCT CCTT'r.TAAA TAAGTTCCAA GCCTTGTCAA TCAGGCTTPT CTAATCGTCA. TATGGTrGTA AACCCCPTAT CACCGTGTCC TCTTGACGA6A TTTTCAGAAT CTTTAAGCCT CTCTrACAAA TGCTCAAAAT C 'TGAAAAAA AGCACAAAAA GAGCAGCrAA ATCTGACTTT TTCAGAAGAT TrAACTGCTG TGAGCCTrGT CTGGACAATA TTrCAGACAA AACCrACGAC AAATGATTAC CCATATrATA CCCTATTTAG CTAGATTrTT CAAGGGGTTr CAGTAGG?1T TTGGTAAATT 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14672 a.
a a. a a a a TTTrCCCATA GAAAACTTGG CATCACATrC GAATCACCC ATGGTACAAA AAACTGAAAA AACTATTGAC TCA.AAATCAT TTTCAAGGTA TA.ATAATAAA CGTTAAGGCG GTATAGCCAA GTGGTAAGGC ACGGCTCTGC AAAAGCTTGA TCGTCGGTTC AAATCCGTCT ACCGCCTrCT ATAACTTGAT TTATCAGGTT TCAAATGAAC AGAAACCCCA ATTTGAAGGG CTTTTTTrAT TTrCCCTCGA ATAAATACGT ATAACTTPTAA AAACTTTTGG AGCGAGTTTG TGGC.AGAGTT CTTCCATGG CATAATTCCC TTTTGAAA'rC AG INFORMATION FOR SEQ ID NO: 112: SEQUENCE CHARACTERISTICS: LENGTH: 7902 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112: AGGAGACTAT TCAAGCCCAA A'N'gAGTAGC CCAGCAAAGA CTGTATAGAC TGTGATACCT TTTCATAGC CATTGGTAAA GAGAATTTGG GAACCAAGAA TGGTATCTAA GGCCAGGATA ATCGTACGAA AAGCGAAGAG AGAGGTCAAG ATGCCGCCCTC CGATATATTT TTCACTACCG TAAAGTAGGA TGGCATI'TGG TCCTAAAACC ATGAGTCCAA AACTCAGTGG AATGATAA AG AAGIAAAGA TTCGACTACC TCTATTAACC AGAGAAACAT AGGCTTCTTT GTCTCCTTTC CCCAGATAGT1 AACTGAGACG AGGCACACTC ACTCCAATTG CACCTGTTAC AACCCCAGCT -ATAACGGTCA CAATTCGCTG AGCTATGGTA TAGTAACTAA CGTTGACATC AATCCCTGT TTAACGAGGA AGAGGCGATC TAAAAAAGTG AAGAGCATAT TGGCATTGGC AAAGACTAAC ATGGCTGTCA GAGGGACAAA GAGTGGTTTA AAATCACTTA GGTGAATTTT AACAAGTTTG ATGTCTCTTT TAATCCAAAA ATAACTAATC AGGTAGTTAA TCAGCGTCGA TAAACTCATC ACAAGTGTAT AGACAACAAT ATCGTGA TrTTAACAA ATAAGAAAAT AGAGACCAC ATCAGGATAC Gr.ATGAAGGC TTGACCCATT CGATTGAAAA ?TTrTGAccA TTGGATTATc GTCAAAATCG TACAAGCGAT 'TTrGTTAT CCTTCACAIT GCAA.AGGGCA AGAAAAATGA CCCAAGA CACCCGAC CCAATTCCCA TGTAAGATAG TTCATTATAT CATAAAG?1rA 'rGAAGTTTCA AATCTTAAGT 'rCTAAAATT'r CGCAGCCATT AGI-r-1-rTAA AAAGAAAAC TGTAATTTTC CAGAGCTTCAk AATCTrGGCA ATG.AGrAA TCCCCATAAC AAGGTAGACC AGTAAAGAAG AGAGGATAGG Cr).GGATATA G-ACAGCAGTG GCACAAATAA AAAAAGACrAG AAAAGTrcT GrrAAGATCT AcTGATAGCC CTTAAACCG;T AC'TTATAGAC ACC-ATAAGPT CAApAATA=~ TCGACTGAGT TGAACTAACC ATAGTC-ACTT ATAGCCA GTTAGGATGG GAAAAATAAT ATTCAAGACA AGCATTTAAT TtrATAC?1 TCA7TCAATT TACCTCCI?3 GC'TAATAAGA AATGAAGGGC AGTA.ACTCAA GTAATCACTT TTAAGTTTI CTTrAAGGAA AC-,-ATATTAT TCTGAAGGAC TATTAGTAAT TCACACAA 
'T-CCTACTCA TTACTAGAAA
TGGACTAGTT
AGTATTT-CTT
AAGACAAAGG
CAIGGATC
TCATACAATT
TGGAGACTGG
ACGTATCATC
AGCCATTGAT
TCGTCCATTT
TC7E'GAATA ATAGAACTGC ATAATTCTCC TATCCTAGAA TTATGATAGG ACrAGrrCT GGTATAATAG AGAGAATA.AG AGAAAATAGA TGATTTATGC AGGAATTCT'r GCCGGTGGAA AGTAACTTGC CAAAACAATI TTTAGAGCTA GG'GATCGAC GAAAAATTTG TCTTGGACC AAGTATTGAA AAAATTGTAG GTTTCTCATG CAGAAGATCT TGTAGATAAA TAZ'CTTCCTC ATTACAAAGG GTGGTGCTGA CCGCAATACA AGTATTAAGA GCTrA'rCGTC CCCT-rACTCC AGAGGATATC GTTCIACCC ATTACACTTC GCATGATTCA GGACAATATC CAACTTGCCC
GGGGAGGACC
TwTTrTTTAGT CTGGcAc-AcC
CTATTTTGAT
TTGGTGrrCA TT'rATAAGGA
ACATCATTGA
ACGATTCTGT
AAAATCATGA
720 780 840 900 960 1020 1080 11.40 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 CGCAGTGrGAC ACAGI'GGTAG TArrACAGAT ATTCCAAA'rC TTGCAAGGAC TTCATGGACC ACATGCATGT AAAATCT1'TG CTCAAATCTG AAGATTACAA AGACTAGTAA AA'rGATTAAT AATATCAGGA AGAGGCrTT CTGTCTG;TCA 'rGCGGATCAG AAAAGCTCC AATCCCAATG AAGCCGGTTGA TACTATCGTT GAAACTACCA ATGGTCAATT GTGCTCACCT TTATCAAGGA CAAACACCTC AAACATT-CCG TT'rATGGATC TCTTTCTGAT CA.AGAGAAGG AA.ATCflIGAC TGATCAAAGC AAAAGATGTG GCTt'TGGCCA AAGGTGAATA CCGTAACAGA TTTGAAGATT GCAAAAAGTA TGA'TTGAGAA CAAATTTATC AACI'AACTAA GCCTAAGT'rT ATCAATGTCA GACCAAGAGA ATCATATCCT TATCCGTCCC AACTACATGG CGTTACTATC AGGGAAAACG 'rGATCCCAAG ATMCrAATA ATTCACGACT CATGTGGAAC CGTCATICE' GACCCGACCG GAACCrACGA GGTTGGTCAA AAAGTTGTCA ATGAAGAATrr CTATGAAAAC 'rACATGACAG GCTTTATGAG AGAGTTTGTT TVCCCCCTA AAGA'rACGGT TGCAGCCATT ACAGAGTG( TATTGACTCT TGCTCP.TACC AAGCGGGAGC CNrrrTGGT TGCCAATAT ATCAACTATA GTC(7rCA'rra GGAAAAGTT-G GAACTCTTCT ATATTCCTGA AGATTTGGCC ?r'rGACCATG 816
TGATTCCCAA
GcAcccATTT AArGATCGTGT
TCAGTGTGGG
GGATCGCCOT
CTTTCCAGA
CATTTrGCCAA
CTTTGAATG
TCACTCTCCr ATGCAGAk= CTTGTCTACT GGA~rrGA GGTGGCTTAT GATGCTATTG CATGCACGCr ATGAATCGTC TATTGGAGAT GGAAGTTTAG AGCAGAGAT GTGGTTATTG AGAATGCTAT ATTACGGATA TTGTGGTGGT GATGGTACTG GGGAACGATT CTCATGATGG CTTAGAAAAG GGCTTGAT71T TGCTATCCAA ATGATGGAAG 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300
GACCAGCTAT
GACTTAGCGA
TGGTTGGGTIC
'rAATGACTTG
ATATAAAGTC
ATCTCGTTCI'
ATrCGCTACA TTCGTCCTCA AA'rCTCAATA CTCGCCATGC GGTCGCATI'G A'rrTTGAAAA TCAAGAAArr TGCCAATCGT TTAAAGATAT TCATCGTGTC AGTGGGAAGT ATAAGTACTG TCCTTACTTA GTCAAGAAGA CAAAACTATT TGGCCAAAAC ACAGAAAAGC TTATCAATCG GGCTTAGA'rG TAAAAAATTA ATCGAATCTG CGATTACGCT CCAATATTAC AAACTATTCA GAAGAAATC.A AAAAATACGA TCTGTTAGAA ATGCAGTCTT AAATCTTGTrC ATATCGATTT TATTTGATTG ACTGCZAATA CTTAAAAATA TCCTTT-ATCTr AGAAGAACCT GTAAGAGAAA ?TTGCAACCG ATTTAAACAC AGCCTTTA-AA ACAGTG'rTTA GAGTAATT CTGGAGAAAA TCATTAAAGA AAAAATTTCT GGAAGTCCTC AGTGTTGAAC AACTGGCGrGG AATGACCAAT AACAAATAAG CAATACAT TTAAATTCTT ACAAGATGAA AAGTACAATC TTGAACTACT TCrTNTGAT ATTGAAGCTG GTATCAAAGT TGATTCAACG TCAATCAACA CCAAGTTCGA TACGTCTGCT AAGGAATTAA GAGGAGAT ATCCTTGATT GAAGAACAAA TTCCTTATGC CTCCTTAGAG AAAAGACTGG CTGACTTAGG GGTGCCTGAA AACTT'rATCG AATCACCTCA TTCATCAATG AATGATCCAA TGTGGGATTr T'rCCCAAGAG GAAGAAACTT TCTTA'rCTCA TGGTAAAGGG 3360 AAAGGATTTA 3420 AAATGAGTAT 3480 CAAAAT'rACT 3540 TCCTCCTTTT 3600 CAACTATGAA 3660 TG=ACAGA 3720 GACCAAACAC CGG=NCTCA TGAAAAGAI'T TGGAGTCTAT GGACTGTCTA TAAGGAAGAG AATCGTTACC AAAGAGCTAT TAAAGGTr'rG AAACGGAGTT CCTTT1'GrCC 7T=CCAGG T'rrATATC rT~cdATrT TTr-AGATrr GCTATTTATA AAAT'rACA CAAGGTGAAG ATI'TTGGTGA GCTTCTTATG GAGGTCAGA TATTTTCTGG GGCTTGGGTC GTCACCTT GTGGTrGCTG
AGGACGACT
GGC'rGccCTC CTA'rGACAGT AGA'rACTATT
CTATGGTGTG
TGAAAAGTAA
TAACGGTTAG
CAACTCATGA
3780 3840 3900 3960 4020 4080 4140 4200 817 irrrTGAGC A'rCTTTATCT TACTAG3CTT TCTCTTGGTA AAAGAAGGGA AACTTCGCCT CTCAATrrc TTAAATATTc GCAATGTCAG TG"rATCA'rC GGAGCCTTGC TAGCAGGCCC TATCGGTATG CAGGCCAAMC TrrATr.CAGT ThAGATATC GGAAGTTCTT TAGCTTCATC TGTATCGGCT ATITACCCTG CGATTTCACT TCTArTGGCT I C1-r1C rrT TGAAGcAcAA GATI~cGAAA AATACTGTAT TTGGGATTGT CTTGATTATT GGAGGGATTA TI'GCTCAGAC CTATAAGGTT GAACAG A ATTCrTN'CTA CATTGGGATr CTTrGGCTT 'rGTTTGGC TATTGCA'rCG GGAAGTGAGA GTGTTCTTAG CTC'Nr-rGCC ATGGAAAGTG AATTGAGTGA AA'rCGAAGCC CTCTTAATCC GTCALAGTAAC TTCGTTCTTG TCCTATCTTG TGATTGTGCT CTTCTCTCAT CAGTCATTA CTGCAGTACC CAATGGACAA TTGCTAGGTC TCATGATTG'r ?IrGCAGCC TTTGATATGA TTTCCTACTT GCCTI'ATTAT ATCGCTA'rCA ATCCTGCA ACCAGCCAAG GCTACAGGCT TGAACGTGAG CTA'rGTAGTA TGGACGGTCT TGTTTGCAGT TGTT=TCTTG GGTGCACCGC TGCAGTT'rAT ATTATTATTA GCGGGATTGG GAACTCGCTT GTrAATCAAA AACCTrrGAT GACATCATCA TCATTGTTCG GGTGTTCGTC TCG="rCAA CTTGTAAAAG AAGAATTGGC AATAT'GTTCC GCAATGATrr ACCAACGAAT GGTTCTTGG'r AGCAAGGCAG GTCGCATCCT ATTGTCAGCT N'ATCGACAA AATATGGTTA AGGATAATAT AGCATTTATG AGATCGATAG TAGATATGCT GACCATTATG ACGTCACTTG AAGAATAAAG GAGATTCGTG TGAAAGCCAT GcGTCCTATG ACTGAAAATA CCCCTAAAGC TGAGTACCAA ATTGAG7rT TCAAAGAkAAA TTATCTTAAA GAACAATTCG AT'rCTTGAA TGATAAATAC GCTGACTACA ATAACTTTTA CAACAGC'rAT GTTATTGATG CrGACAATTA GACACGTTCG ACTTA=r~A GTGTTTATCG TTATGGAGAT GACTACAAGG TTCAACACAT TAGTGGTGTA TCCTTCTGGG ATGCTCCAAC GGCTTATGTA AGTGGTGAAT TTGT'rGATCT 'rCGTcAT1TGC
TATCTTAGCA
CTTGGTrCAG
AGGAATCAAT
AGAGAAATAC.
CTCTCTCTAT
TCTCTTTAAA
TGAGAT-rGT
TATTGTTGAT
TGCAGAAAAG
4260 4320 4380 4440 4500 4560 4620 4680- 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 CAAAGAGCTA GATGTCTATC TTGAAGAATT AGAAGGCAAT TGTCCA.AGAC TATCGTAAAT TAGAAGAAAT TCTTAAAAAC GAAAATTAAA GATTCCAACA 'rCTGACAAAA TAGTCGGATG T7r=~GArr T=rACGAAC TTTrACGAAT AGATAGATCA GTAGAAAAAG AAATGGAGTT ATTTATGAAA ATCACAAACT ATGAAATCTA TAAGTTAAAA AAATCAGGTT TGACCAATCA ACAGATTTTG AAAGTGCTAG AATACGGTGA AAATGTTCAT CAGGAGCTTT TGTTCGGTGA TATTGCAGAT ATCTCAGGTT GCCCTAATCC AGCCGT"TTT ATGGAACGTT ATTTTCAGAT AGACGATGCG CATTTGTCGA 818 AAGAGTTTCA AAAATTTCCA TCTTTCTCTA TTNIWATGA C7'?=ATCCT TGGGA'NTGA GTGAAATATA TGATGCG.CCT TCCCGAAGGT AGCGGTCGTG AAAAACTCAT TCAAGGCTTG -IGACACAGC AGCTrCATATG GAACAGGACT GGATGTGrT ATGACCATCT GGTTCTAAGT CTGCCCGTAA 'rCGCATCAr TGCGTTCAGG TAGTCTCA'rr G;TAC TTrTAT ?TrACAAGGG AAATCTTGAC GGCAGTCGTG CTTGrAGCAA ACAGGGAGCT GAAAATGAAC TGG~rATI'GT CAGTGGTCTG
CTCCTGA.AAT
AACTCAGTTG
GCCAAGGGCA
GCAGCCTTC
Th'rCCTAAAG GAAT1WrGGAC
GCTGGACTTT
ACGTrGTGAGC AGAATGGCGG AAAAACCATT GCAGTGATTG CCALATAAACG C7TGCAACAC TACATCCGCA CTGGTGAACA ACCTCTCAAA TTTCAT~rC GTCGTGGTGT GATTG.TAGCA GCCTAAGA GAGCAATGGA AGAAGGACGC GATGTCTTTG CTATTCCTGG TAGCATTT1TA GATGGACTAT GAGCAAAA'rT GGTCACCAGT GGGCAAGATG CTAAGCTAGA A'rTCTAAGAA AAAATCAATr ATAAAACGCA TATTAGCAAG ?TTTAACAC AGTAGAGTAG AGGATT=rC TCATATAATA CTTCCATCTG CAACCTCAAA ACAGTA'TTTT CAGACCGT'rG CCATICATTTG TTCTTGCGGA AT'rTGAATTT 'rTAAGAGAA6A ATGAACCCAA TTGATAATAT GCGI TTC CTCTTCGAAA ATCTCTTCAA GAGCgaCT tC CTCAGTCTTA
ATT'CA.AGAAG
TAAAAATGAC
CATTTCCATA
TAAGTGGATT
ACTACGTCAG
TCTACAACCT
ATTTTCA'rTG CAAAGCAGTG C7"rTGAGCAA CCTGTGGCTA GCTTCCrAG'r TTGCGCTTTG 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 AGTATAAGGG AAACTA'rAGT GAAT'GA-AAT AAGATGTGA6A CAACTCTATC AGGAAAGTCA AATTAATN'TA TAGAAATATT 'rTAGCAGCCA AGGTGTACTC'rG TAGATTC AATTACACTA 'rAATTTAGTG TAATTGAGAA AGGAAGAAATG AATGATTCCT ACCGTrCTCAA ATCTTGTCAG ATTACAAGGC TTG=rAAGA AAGAATTCAA TTATAAAAAA TGGTCTGAAA TAGATGATGA GCTCAGCTrT GCCTTGCTGT CrTGAGCA ATTTACTAGA AA'rGAAACTG ATGAGAGATA AAAATGATAA AAAGAGCTCG TGAGATTCC AGGATTTTAG CGACTAGTTA GCTGGGAAAG ATTGTCATTrG ATGTGGCTA GGTTATGTTC TAAGGAAAAA TAAArTCTTC
AGACCTTGAC
TACTTTTCGA
AGCTACGGTT
TCAGTAGACA
ATATCAGACT
GAAcGATATTT
AAATAAAAAT
AAATCTC7'rC AGCT'rCCGAG
'ITGAGTCAG
ACTAAAGTAT
GTGACA.AATA
AAAAGTAGAG
AAAATGGTTA
AAATACGTCA
T'IrGATT:rC
GATATTATGG
TGAGT?'rGTT
ATAAACTGTA
GAAGTGATTT
ATTCGACTCC
GAGTGGGGCT
TTCGTTGATA GAAT-rTAGAA ATAAAATATA TGAAGAATTA GAAC~rTCCA AGCGATTI-A CTA'rGTGCCA TGCTTATCGC CTCTATCGGA rrAAATATGG CGTGATTATT GGAGCCATGT TAATCrCTCC TTTGATGACA CCTArrCTGG CTCTCTACCT ATATTTGAfrT TTAAATTGTT AAGAAAATCT TTTAAAATAT TAGCTATCA AATTCTTIGCC AGTCTAATAG CTTCAACACT TTA1-1-I-AT C1"rrCTCCCA TTTCGTATGC 7800 TAGTTCGGAG MA1'GTTGCrA GAACCTCTCC GACTAN'GG GATGTTCTCA ?1'GCTTTTGT 7860 AGGP.GGGATA GCAGGTATCA TTGG'7GCTAG GAAAAAAGAG AC 7902 INFORMATION FOR SEQ rD NO: 113: SEQUENCE CHARACTERISTICS: LENGTH: 18627 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113: GAAGTTGAAA TGGCCAGCTG ATGAGCAATA TCGGTCATAG AAATCI'CT AATCAAC2'TT *09 TCGCAATTT TTTGGTTGAT AATACGAGGA ATTTGGTGAT TTrTCTTGAC GATAGAAGTT 120 9TCAGCGACCA TCATTrMGA ACAGTGATAG CACTTGAAAC GACGCTTrCT AAGTAGAATT 180 *CTAGTAGGCA TACCAG?1'GT CTCAAGGTAA GGAATC'rTAC ACGGTTTTTG AAAGTCATAT 240 TTCTTCAATT GGTTTCCGCA CTCAGGGCAA GATGGGGCGT CGTACTCCAG TTrGGCGATG 300 :ATTTCCTTGT GTGTATCTTT ATTGATGATG TCTAAAATCT GGATATTAGG GTCTTTAATG 360 TCTAGTAATr TTGTGATAAA ATGTAAI'TGT TCCATATGAA TCTTCTAAT GAGTrGTrTrG 420 GTCGCTTTTC ATrATAGGTC ATATGGGACr TTNTTTrAC AATAAAATAG GCrCCATAAT 480 *ATCTATAAGG GATTTACCCA CTACAAATAT TATAGAGCCA AAAATCCTTr GTTTACTAAA 540 *CAAGGGATTr TTC'TTTTGTC TCTGCTCCTT TTCATA'rA ATAGTTCTAT GTTAAAATCA 600 GAAAAACAAT CACGTTATCA AATGTrAAA'r GAAGAATTGT CCrTCC'TATr GGAAGGCGAA 660 *ACCAATGTT TGGCTAATCT TTCCAACGCC AGTGCTCTCA TAAAATCACG TTTTCCTAAT 720 0ACCGTATTTG CAGGCTTTTA TTTGTrCCAT GGAAAGGAAT 'rGG'rTTTAGG CCCCTTCCAA 780 CGAGGTGTrT CCTGCATCCG TA'FrGCACTA GGCAAGGGTG TTTGTGGTGA GGCAGCTCAC 840 .'rTCAGGAAA CTGTTATTGT TGGAGATGTG ACCACCTATC TCAACTATAT TTCTTGTGAT 900 ***AG'rCTAGCTA AAAGTGAAAT TGTGGTGCCG ATGATGAAGA ATGGTCAOI' ACTTGGAGOTT 960 CTGGATCTCG ATTCTTCAGA GATTGAGGAT TACGATGCTA TGGATCCAGA TTATTTGGAA 1020 CAATTTGTCG CTATTTTGCT 'rGAAAAGACA GCATGGGACT TTACGATGTT TGAGGAAAAA 1080 TCTTAATGTA TCAAGCACTT TATCGAAAAT ATAGAAGTC.A AAACrCTCC CAGTrAGTrG 1140 GTCAAGAAGT TGTGGCTAAG ACTCTTAAAC 
AAGCGGTGGA GCAAGAGAAA ATAAGTCACG 1200 820 CITATC=MT TrCTGGTCCT CGTGGAACGG GAAAAACCAG TCTGCTAAA ATC~wrTGCCA ALGGCTATGAA CTGTCCCAAT CAAGTGGGTG GCGAACCTTG CAATAACTGC TATATITGTC AAGCAGTGAC GGACGGTAGT TTAGAAMATG TCAMTGAAAT GGATGCAGCT TCTAATAATG GGGTAGATGA AA7TCGCGAPA A7T-CGTGATA AATCTACCTA TGCGCCTAGC CTTGCTCGTr ATAAGGTTTA TATCATAG2AT GAGGTTCACA TGCTGTCTAC AGGGCCTrTr AATGCCCTCC TAAAGACCT GGAAGAACCA ACACAGAATG TAGTCTTTAT 7"rTGGCCACT ACTGAATTGC ACAAGATTCC TGCTACTATT CTATCCCGTG TGCAACGrrr TMAM=rAAA TCAArrAAGA CACAGGATAT TAAGGAACAT ATTCACTATA TCTTAGAAAA AGAAAATATC AGTTCTCAAC CAGAGGCTGT GGAAATCAT'r GCCAGACGGG CGGAAGCTGG AATGCGGGAC GCCTTGTCTA TTTTGATCA AGCCCTGAGT TTGACACAGG GAAATGAGCT GACCACTGCT ATCTC7VGAAG AAATTACTGG CACCATTAGC CTATCAGCCT TGGATGATTA TGTGGCGGCC TTGTCTCAAC AGGATGTTCC CAAAGCTTTG TCTTGCrGA ATCTTCTTTT TGACAATGGT AAGAGCATGA CTCGTTTTCT GACCGATCTT TTGCACTATT TAAGAGACTT GTTAATTGTT CAAACAGGGG GAGCAAATAC TCATCATAGT TCAGTCTTTG TGTTTGAAAT GATTCGCTTA GCAACAGTGA CCAAGATTTA TGCTGAAATG ATGACCGTCC TATCAGGAGC GGTTGAAAAT GAAATT-C1'A AAGAGCTTTC TAATGTAGGT GCGGTTCCTA CTACGGGCAA AACAGTCTAT CGTGTCGATC TAGAAAATTT GGCACTTCCT CAAAAAAATC GTTTAGCAGA TATTAAGTCT AGTTTGCAAC GTGTGCGGA AATCAAGTCC GAACCAGCTC CCCTGAGACA GGAAGTTGCC CGTCTCAAAC AACAAGTTGC ACCAGCTCCT AGTCCACCAG GCAATAAAGT GCAATCTATC TTACAAGAGG 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000
CCGTCGAAAA
AGG;TAATTrGA
CTGCCAATGA
TCCTGATTTA
AAGTCTAGGT
ACACCATGCT
TGAAACGAGA CAATCTCAAT CACCTGAGAT TTAGCTATT CCAAAGCCAA ATCTTCTCAA ?.TGAATTTT-r GGCTGATAAA GCACGTCAALA ATTTAATTCG TTTGCAGAAT GCCTGGGGAG GGGCCGGACA AGGCTCTGCT AGTTGGTTCT CAACCGGTTG ATTCTTGCTT TTGAGTCTAA CTTCAATGCCT GGTCAAACTA ACCATGTTG GTAATATCCT CAG;TCAGGCG GCAGGT7TT TCCATGGAGG AATGGAAAGA AGTTCGCGCA GCCT'rTTCAG ACTGAAAAAG AAGTAGAAGA AAGCCTGATT CCAGAAGGAT GTGAAGGTAG AGGAAGACTA AAG2AAAGATT TCATGATACA ATAAGTTTAT GAATAAACAA CAATTTATTA ATTTT7TCAA TG;AAGCCTGG ATGACTGGCC TACTCTTTAG AAATTTCCGA GTCAGTTATG AGCATTTTAA TAGGAAAGAC TAGCCCTCAG
TTATGGCGCT
GCTATATTAT
GTTTACAGCT GCTGAGACCT GGCAGCCTTT TGGGCAATTT TGATGGGCAA AATCGTTGAT GTCATCGATC CTTCCAGACA AAATCAAAGC CI"VMAGGCT 'rrl--r~rATACTAQAA AGTATATI'TA TAGAATTT GCTCTATTNC TGGGGAAATC ACACGTTTr CTAGTAACTA CTGTAAAAGT ATTAGAGATC AAAGATC?1'C ACGTT.rGAAT mLCCCTG ANNACACCAG AAAT -CCCC GACTCTTCT GCCGCTATCA TGGGAAATCC GTTTrGATGGC GTAAACATCC TTGAGTTGCA CCTTGCrATG CAATACCCAT CAGAAATCCC TTGAAAAAG AAAGGAACTA TCATGTCAGT TGAAGGAAAA GAAATT'rTAA AAGGGGTTAA TPTCA1CCCA CCXA",rA CAGG-,A^A-C AAACTATGAA GTAACrAAAG AGTGGATGAG CGTGCGCGTA TGGAATTACC AATGCTGAGT
GTGAAGTTTT
TGGGAC~rr
TTCTTCGTGC
CGCTATGAAT GCGGGTAA6AG GCTAGATGAA AAAATGGAAT CG;ACGC7TC TCTGGTGG GCCAACATTT GCTCTTTTGG TGTGTCTAAA GGTGTCA6ATG CTACCAACGT CTTTTGAACT TGTTGTCCTT TCTGGTGGTC ATTAGCTGAA GAACTTCGCT GGAGAAGTA6A ATGACTAGAG CTGGTTGGCT GATCTCCGTC
AAGATGATGA
TGCTCAACAT
AGAAAAAACG
ACGAGATTGA CTCAGGTCTT CCATGCGTGG TGAACGTTIrr A'rATCACACC 'rGATGTGGTA CAGCATrTGGC TGCGCCTTTG ACGACTACAA GGAAGAAT'rG AAAATATTAA ACTTTTTTCA AAAAAGC?'rT TGACAAGATT GAACATrrCA
GAAACAACAA
CAATGAAATT
GTTCGTGAGT 'rrATTACTAA ATGGCAGAGC GT'rACCTCAA CTTCAACTTT TGATGTTGGA GATATTCACG CTCTTAAAGT GG1'GCTATGA TCATCACTCA CACGTGATGA TCGAAGGTCG GAACGAAG GATACGCAAA TAATTCCCTC GTATC=MTA GAAA-GCACG CTGAACCAAG GAGACTTTGG AATTACCAGT GGAACGArTA CAGAAAATGA CACTTGAAGT TGGTCCAAGT GCTGAACAGG GTGT'rCTCTT ATCGAAGAAT TCTTCATGTC ACAGC1TACT TTAACAGTGG CCA.ATTGMNG GAAT'rTTCTA 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 438G 4440 4500 4560 4620 4680 4740 TA'rTGACTGT GTCAAA'rTCC ACCGTTGGAA GCCATCAGCA AATGTTCCAG ATrrCACAGC AGGAACTCAA ACTGTTTI'CG AACAAACTCC CACAGACTr'r CACTCACCTT TAGAAGAAAT ATCTrGTTAAG TATGATCATG ACAAG'rTGGC TGCTGTACTC TATATTCCAG ATAACGTAGA CCAAGATAGC GATAGCAATIG TGCCGTTTAA TTCTAAGATT AGCrATCTGG AGCGTTTACA TGCCAATATC ACAGTGGAAG TGATTGCACG CGACCGTCTA GGTGAAAACG TCACTGCCTA
TCTGGGTGAT
TTTAGATCAT
ACTTGAG1'TA
TCCAGAGCTG
GGCTTACCAC
A6ATCACAGAG CAAGCATATTI ATGATTATCG GTCACCCCCT GAAGCAAGTG TTCTCGTGCG CAAGTCAAGT CArrAGCCGT CGTGGTAAAT 'rTG4GTAAAAA
ACA.AAGCAAC
TTGCTCGCTAT
TAG4GCAACGA
CTGATTTTGA
TTTCAAGTG
TGCAAGTATT GACTGGCCTA TCGGTGTCAT GAACGAAGGA AATGTCGTTG 'rAGTGACTTG ATTGGTAATG GTAGCCATGC TGACCTCAAG GTTGTAGCTC 822 TCGTrCAGGTA CAAGGGATG ATACTCGTGT AACThACrAT CGCTGCAACT CAATCGGAAA CATNCTACAA CATGGGGTTA TCCTTGAAAA AGCAACTTTG ACTTTCAATG GTA'rCGGCCA CATCATCAAG GGTGCTAAGG GAGCAGATGC GCAACAAGAG AGCCG'TGrrC TCATGCI'TC AXGACCAAGCG CGTTCAGATG CTAACCCAAT TCTTTTGA'rT GATGAAAATG ACGTAACTGC AGvGCCATGCA GCCTCTATI'G TGGCTTGGAT AAGGCAACTG CGTGGAGATT CCAGTCAAGG GTCAAAACGC TAAGGGGCAG ''T=AGATCA GATTGTCAAT AAAAACCACT AGTAGTTCTG TTCACCGrTGG TGTCCATACC GTCAGGTAGA TCCAGAAGAT ATGTACTACC TCATGAGTCG CAGAGCGTTT GGI-rGTTCGT GGTTT'CCTTG GATCTGTTAT AAGIrCGTGA TCA.AATGATT GCAACTATCG AAGAGAAArr CCTAT=~AG ATGTAGAAGC GATTCGCAAG GATrCCAA GATGAACCTC TGG'rCTATCT GGACAATGCT CCGACGACAC AAAGCTATTA ACACCTACTA TGAGCAGGAC AA'rGCCAATG TTAGCGGAAC CACGACAGC TTCTTATGAA GCTGCTCGTG AAACCATTCG TAAGrTTATT AATGCAGGCT
CGACAACCAG
AGGTCrrGAT
GAAAGACTGG
ATTTGCGAGC
CCTTAACTGG GTGGCACGCT T"TCAGTAATG GAACACCATT AGCAGAGCTr CTCTATGTCT 'rAAATTGACT GATAAGGTTA CTACAAAGGA AGTTCTC=r ACCAGAGGAA TTGCTGACGA AATTCTCACT GACGGAGACC CTAATATCAT TCCATCGCAG GAAGCTTGTC ATCTTAA.AGA CGGTGCCTTG GATATGGAGG AATTTGTTC CCTAGCTCAT GCCTCCAATG TCACTCAATT AGCCCACCAA GTTGGGGCA6A CTCATATGAA GATTGATCTC CAGGA =~GG AGATGGCTGG TCCGACTCGT ATCGGTGTCC TCTCTCCACT AGAATTrGGC GGCGAGATGA 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 i180 6240 6300 6360 6420 TrT=GGGT GGTCAATCCG TTATGGTAGT GGATGGTGCT ATCTGGACTT TTTCGCCTrT TTTACGGCAA AGAAAAGTAT TTGATrGT CTACGACCAA GAACGCCAAA TATGGCAGGA TTGGTATGGA TGCCGTTGAA TGCAGCGCAAT TGAGGGATTG TTATTGCC -r TAACCTAGGT
ATCAAGCAAA
CAATCTACAC
TCCGG'rCACA
CTTGAGCAAA
TTTGCTAGTr GGAAGGAATT GCCTTGGAAA TTTGAGGCTG GCTATTGGAC TTGCGACTGC A=?GATTAT CTGGAAAAGA GCTCATGAAC AGGAATTGAT TGCGTACGTC TATCCAAAAC ACCATTTACG GTTCTCAGGA TTTGGCTCAA CG1TrCGGGTG GATCTCCATC CTCACGATCT TrGCGACGGCT CTGGArrATG TATTTGGAAG AAGGAGTGGC
TCCCACCAAC
TAGTCGATGC
TGTTCGTGCT GGTrCACCATT AGCTCGTGCA ACGTTTTTATA CCTACAAAAG ACAAAGGACT GTCCGCAACC CT'rGCTTCAG TCTACAATAC CAAGGCAGA'r TGCGACAAAC 7 CAATGG CACTTTCTAA ACTAGATAGC CTTTATATGG CAGTGGTAGC AGACCATTCG AAAAATCCAC ATCACCAAGG GAAGTTAGAA GATGCTGAGC AAATCAGTCT CAACAATCCG ACTTGTGGGG A'rGTCATCAA CCTCTrCTG'rC 823
AAGTTTGATG
T CAACT.CT
TTAGAACTGG
CAGAGGACCG rrGGAAGAT ATTGCTm'C CTGCTAGTAT GATGACJAGAT GCCGTTAG CGACTArrr 1-rCTGAAATG G?1'CAAGGGC TAAATTCAGG ATGCACGATT GAAAAACCAA ACAAGAAAr AAAAAGATGA GCGTCAAGAC CAAC CCrC-GAG .C ="TCTCAGT rCT TCCCTCAAAG AATCAA.GG GCAACCCTAG CI'GGAATGC CCTTAAGAAA ACAATTGAAA ATCAAGAAAA ACAGTAAGAC AAGTTTCTTT TGTCTTATGA ATTArrAGAA ATCGAMGAAAG AAAGGATACT ATGGCTGAAG AAAGAGTAGA ACCAAAACCA ATTGACCTTG 'rAGAr.CCTGT CTTATCGACA GCAAAAGGAC CTGCTAAGGG TGAGCCTGAG TGGATGTTGG AAAAAATGCC CATGCAAACT TGGGGACCAG TCTACTACCA AAAACCATCT GACAAACCAG GTGAATATAA ATTTGCGTrC TCAACGAAGG TGTTATTCGT AGTTCCGTTT GAAG'rCTrAT ACTTCTCAGA GATTGACT CCCGTCTTG GGA'rGATCTA
CATGACCATC
GAAT'rATCr
GAAACCTTCA
CATGACTTAA
CCrGAAAAGA 0 0 00 00 00 *0 0 0 0 0e 0 0 0 0000 00 *0 0 0 0 0000 000* 0000 00 00 orTAAAGAAAC CNlwGAACGT CTrCTGCCCA GTACGAGTCA TAGGTATTAT CTTACAGAT AATACTTrGC GAAGTTGGTA ATCGGGATTC CAGAAGCTGA ACGTGCTTAT 'rrAGCACGG GAAGTGGTT ACCACAACAT CAAGGAACAG TTCCAAAAAT ACAGATTCCG CACTCAAGGA ATACCCAGAC TrATTTAAAC CCCCCGACAG ATAACAACTr GGCAGCCCTC AACTCAGCAG 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 TATGGTCGGG TGGAACNT ATCTACGTGC AAACTTATTT CCGTATCAAT AACGAAAATA T'rGAT1GAGGG AGCAAGCGTC 'rACTACGTAG ATAGCTTACA CGCTGCCATT GTAGAAATT= CAACTATCCA AAAC'TGGTCT GATAACGTCT AAAAGGATGC CACTGTTGAG TGGATTGATG ATCCATCTGT 'rrACCTTGAT GGAGAAGGAG CTAATGCAGG GCAACACCAA GACACGCGTG CAAAAGGTGT CAACGTAGAT A'rTCCACTTC TAGGTCAGT'r CGAACGTACC TTGATTATCG AAGGATGTAC AGCACCAACA TATTCAAGCA TTGCTTTGGA CGGAGCTTAT ATCTTT A'rAACTI'GGT AACAA.AGCGT GCTAAGGCTC GAAACTTGGG TGCCAAAACG ACTATGAAAT CGCGTCCTAC CATOCTCTCT ATCGCCTTrG CTAAGATGAT TCACAPATGCT CCACATACCA GCTCGTCTAT TGTGTCTAAA TCCATCGCTA AAGGrGGAGG AAAGGTTGAC TACCGTGCAC AAGTrCACCTT TAACAAGAAC TCTAAGAAAT CTGTTTCCCA CATTGAATGT GATACCATTA TCATGGATGA CTTGTCAGCA TCAGATACTA TTCCATrTAA TGAAATTCAC AACTCGCAAG TGGCTTTGGA ACACGAAGCC AAAGTATCTA AGATTTCAGA AGAGCAATTG TATTATCTCA TGAGCCGTGG ATTGTCAGAA TCTGAGGCAA CTGAAATGAT TGTCATGGGA TTTGTAGAAC CCrTTTACAAA AGAACTTCCA ATGGAATACG CAGTTGAGCT GAACCGCCTG ATTAGCTATG 824 AAATGGAGCG ATCAGTTGGA TAAAArGA TTITATACTC TTCGAAAATC TCTTCAAACC ACGTCACCAT CGCCTTACCG TATGTATGGT TWCTGAtTCG TCAGITrCAT CTACAACCTC AAAACA~G= TTTGAGCAAC tGCGGCTAGC TTCCTAGTTT GTTCITGAT 'TTTGAGTATI' AGATTTACTC AAAATCAAGG A7*rTTGAAGA TGAACTTGTA TCAAAAAATC GCGGTTTAAA ATCGCGATTT rrTATAA1-r? 
CTCG7I'AACA AAGCGGACAA AC1'GA'rTCCA CCAAACTrrT AMGAAGAAGG C~TTrCAAT TTrCTTGTCT GCrACCATTT CGAAACTAGG GCGCTCTGTG GTCATGTAAC CTTGACCAAT CAAGTCCTTG TCTTCATAAG TCAAATGGCC AACCACTGTT CCAGCTTCAA GTGGTGCTGG GA'rTGCTTTG GAATCAGGTG TGAATTGAAC AGATTGGGAA CATTGATTCC CAACACGrrC GArTAGATAG ATATCCTCTG GAGCCACTGC AGTTACTGTA TCTTCTTTTC CATCTTGTAC AGGGGCTTTG CTATCTTGAT AGGC-ATCGCC 'rTGTTGA.ACG ATrTTGCGAA CTAAATGT AGAAGAAATA TAATCCArTTA GGGAAGATGT AGCTGTAAAT CGAGCGTAAG GATTATTGTC TTGATGATCT GCA?'rTAAAA CAACTGTGAT GACTCTCATG CCTTTTTCGA CACTAGTACC AACAAAAGAC TCTCCAGCCT TATCTGTTGT TCCTGTTTTT
AGCCCATCAA AACCACCACG GTAAGCAGC ATTGTCATCC CAGCAAAAGT AGAAGAAGGT TTTTTGATGA GGT'rGCGAGC AACGATAGCG 'rCTTrTTTAG AACCTGGGTA AATGTrATCC TTGACAACAG TGGCATCCTG AATTCCCCAT ATACCTTCTA ACATGrAGTT GGTTGAAGTG TTTITTGGrGA 'rTTCTAAGAC 'rTGTGGGTAT ACATCATAAG C-ACTAAGCTT ATTTCCTCA CCTAGAGTTT CAT'=-TAAG ACCI'GTCGTA TCCAAGAGTT TTGCCCGCAT CATATCGACG 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 AAATCTTTT'r
GATACCAGAG
TTACTGGCTT
GAGAGGGTAA
GTTATGGAAG
rTTGCCCTCAA
GAAGCACCCC
TTATTCTATC
CTGAGCCAGC A.ATTTTCTCA CCTAGGGCAA TAGCGGCGCT GTTGGCACTA TTGCTTCAAG CAACTCTTCG ACAGTATAAT TACGGGCCTC CATAGGAATA CAGAATTTGT CGTCAATTGA TAAGGATAAT CAGAAATATC TACAGGAGTG TACTCCGTT TTCCAAAGCT TCATAGACCA GATAAkACAGT AATCAATTTT CAATTTCGAC AGGTTGCG=r GCATCCTrCT CATAGAGAAT TTTACCAGTA CAGCAATCGC ATGTTrAGCG GCAATGGTAA AATCTrGAGC AACAGCAGTA CTAAAAGAGA GACAGTTAAC AAAGTTAAAA ATATTTTTTT CATAGTAGTC ATAAAGAAAA AAAATATTCT TGC'N'A.ATA ATTCATCTGT TAAGCTrTTTT 5. .5 S S
GAAAATATGG TAAAATAAAG TAAGGGAGGT AACTCATGTT TCGTAGAAAT AAATTATT TTTGGACCAC AGAAATXA CTCTTAACCA TCATCTTTTA CCTATGGAGA CAGATGGGGT CTTTGATTAA CCCTTTG~rT AGCGTGCTTA ATACAATTAT GA'rTCCA'rTT TAAGGGG GCTnTrTTTTA TT-ATT-TGACA AACCCTATTG TT-ACTT'rCTT AAATAAAGTC TGTAAACTCA ATCGTTTGCT TGGTATTTTA ATTACCTTGT vrGc=ATcT cTrAccTAT1 ?TGATTAATC CTAI'TTATAG 'rCGACTACAA GACTTAATCA ATTTGGATGT AGAAGCTACA ATTCAGCAGT ATATCCTAAA 'rAGCGTATCA AATAGTGTGG 177 ATTrT GATTATGACT CCACI-rrrTr TCTGCCCA'r GCTGAAAGA ACGATrCTAA
GTACTTGGT
AGTTATCTAG
TAGAC"ATC
TAAACTTATC
GGAGCGCT'r TGG~rrATr
AGACGGATCG
CTCGGGAATG G'rCATAC=T TTrA?1ATA TCTAGTCAAA TAATTATCCr GCGCTCC-AGA
CTATGTTGAT
GTCAGCCcr
CTTATTAGAT
CTTGCATATT
AGTTTCGATTr
TTTAAAATAT
A'rrCTTCAAA
ATCAGTACTG
GGACATAAAT
CCAGGCTTAT
GACGCAATCA
C.CTTrAGTT
TAAAGAAT'T
TTATAGG?1'G
TTGCCATT
TTCCTATGAT
ATATGCTTGT
'rrATGALAGGT
CGTAGTTGG
TCTTATCCCA
AC1TAAAAGTC
TGAAAATAAG
ATAAAAATCG
TCTATTAAAG
AATTCTAAAA
TTCAT'TTTAC
TTTATAGTT
AAGTATTTTA
ATGTGATmC GGTCG 1 TACC
ACGCTCAGTT
CAGCAGGCGT
ATACGATTAA
TTGAG'TTATG
AAATGCGACG
TTTGGCTTAT
'rrCTGGTGTA
CATCGCAAAT
TGTTCAGCAC
TCATCCAATC
AATCA'GTC
TTTGTATGA6A
AGGAGAACCC
rrAAAGTCrr
AAAAAACCAG
TTCGAATGAA
CATTGGrrAGA
TATATI'TGG
TTAAACTC.A
GATAATTCT
TTTTrTGCATT
GATTCGACAG
AAATATAACT
GACCCGATTT
CCCTTCTA
TGTGAGGCG
A7'GCTCGCT ATATTACTGG ATTGGCTATA G'TATTATTCG GCCAATTTAA TTCCT'rATGT ATAT'rCACTG ATCCCCATAG C.GGCCAAGT A~rGG~rTGA ACTGCGATT GCAGTGAZTTT GTAGATGGCA ATATICTTATA TCCTCGALATC GTAGGAACTG ACGATTTTAG 'rT'rAClwrrT GTTCTCAACC AATATCTATG GCAGTGCCAA CCTATTCTAT CTTGAAACAA ATTTCTAAGTr AATCATAAAA TAATGAAAGA ACGAGAAAGA GAATTAGCTA TGA'I-=rCT TTACTGCAAG TCGCCTTTAG ATTACAAGAC AAACTAATTT TCACAGCTAA GAATAGTAGA AGTTA6ATCTC TGGAATTCTG TGTCAGGGTA AGTTCCACTG GTN'TCATAG ACC'rATTTAT AGTACATITGA .AACTACAATA GTACACCTCT AATCGATTTG ACTGI'CCTGA TCTATTCGTT CTATTCTTAT TGCAATAAGT GAAAAGTAGT CCGAATAATA TAAGGATTGA AATGALATTGA AATAAAGAGA GTACGAXAAAT TCTCATCTGA CTTCGTGAAT TTCTTCAAAA CAGATAGCTT CATCTTAGGT TTTGACTTAG A'rAAGGTATA ATGATrrAT TGTCTrrG CCATTATG~ CCATATTTTG CGACTCGTGT GGCGACGTAA GCA-AAAAATA ACACTTCTTA CGCTCTAGCT GCCTAAAAAC GGATTGCCTCG TGTTCA.ATGA CAGGTCFAT TATTACCGAG GCCGTr'rCAT AAGAGATTGA TAGACTCGCA GrrCTAGAC C1'GTTAAAAT AATACATAAC CTATGC=GT AGACAAATAT 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 826 GTTGGCAGGT GTTNGrACGT GGGTTCGACT CCCACCGGC CCATTATTCC TlrGCATTCT 7rr cr.C CC TTGGrAAA).C GTTG?1'AAAT CAACGTY TM'?rTTATC TTTGGTATTC C7rrGCAcTC TTrrGCTAAA AAGGGAGTCA CAAACAGACC CTAT=TAAA AAAGGATAGA AAAAAGGATA CAACATTTGT CGCATCCTAA AAATAATCTT ?TTCGACGG AAGACATGGG ATTCGAACCC ACGCACGCTA ?1TACACCCT ACCGCGTC CAACACGGCC TCTTAACCCT cTTGAG;TAAT cTTCCAATAc TTAcTcAAAT AGTcTAccAT AAAGCz'TCTT ATCTTGcAA'r AAAAATTCrA GAAATAAGAA AAATGATAGA TTTGAAAGA AAATCGATAAA AAATGCTTGA CTCGAAAGA AAG'rATGATA GAATGAATAG T1GTAAACGAT AACAGGAGGT GATTrCAGTGT TAAAAACAGA ACGTAAACAA CTAATT!-AG AGGAGTTAAA TCAACATCAT G'TACGrTCT TAGAAAAA'rT AC'rTAGTrrG CTAGAAACGT CAGAATCAAC GGTTCGAAGA GAC7"rGGATG AGTTGGAACC GGA-AAACAAG CTTCGTCGTG TGCATGG;TGG ACCAG.AACTC CCCTACTCCT TACAGGAAGA AGAAACCATT- CAAGAAAAAT CTGTCAAAAA CCTTCAAC3AA AACAAATTrCC TGGCTCAGAA ACCAGCCTCT CTCATTAAAG AAAAAGATCT CATCTTT'ATC GATGCTGGA CAACAACTGC 
TT=TTT'T CATGAAT'rCG TCAATAAGAA TGTTACAGTTr CTCACCAACT CCATTCACCA TGCCGC'rCAG TTGGTTGAAA AGCAGAwTCC AACTGTCATG GTTGGAGGAA ACGTCAAGAC GGCGACAGAT GCTAGTATCG GGGGCGTTGC TCTTAACCAG ATTAACCAAT TGCACTTTGA CCGTGCCTTT ATCGGAATAA ATGGTGTTGA CCATGGCrAT TA'rACGACTC CTGATATGGA GGAGGGAGCT GTGAAAAGAG CTAT-rrGGA GAATGCCAAG CAGACCTACG TCTTGGTGGA TTCGTCAAAA ATTGGACAAA CTTGCTTTGC CAAGGTAGCC CCACTCAAAC CCGCI'ATCGT TATCACTAGT CAAGGGCATG AGCTCTTGCA GGTTATTAAG GAGAAAACGG AGGTAATAGA AGTATGATTT ATACAGTCAC ACTCAATCCA TCCATTGACT ATATCGTTCG TTTGACCAA GTCAAAGTTG GTAGTGTCAA TCGTATGGAC AGTGATGATA AGTTTGCTGG TGGGAAAGGA ATCAA'rGTCA GCCGTTCTT GAAACGTTTG AATATACCAA ATACAGCGkC 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620
GGGATTTATC
CCAGACACGT
CCAAGAAACA
GAAAGCTATT
TAAAAA'rCTA GGTGGCN'TA CTGGTAAATT TATCACAGAT AC=rAGCAG AGCAAGAAAT ?TTGTCCAGG TCGCAGAAGA TACTCGTATC AATNGTTAAAA TCAAAGCAGA GAAATCAACC GAACGGTCC AACTGTTGAA 'rCOGTTCAGC TAGAAGAA'TT TTA'rCTAGTC TGACAGCAGA AGATACAGTTI GTCrTTGCAG GTTCAAGTGC GGCAA'rGTTA TCTATAAGGA TTrGATTTCC TTrGACGCGCC AGACTGGTGC GCAAGTGGTC TGTGACTTTG AAGGACAGAC CTTAATTGA'r AG GGACT ACCAGCCTCT TCTTGTAAAA CCAAACAATC ATGAACTTGG ACCGATTTTT GGGGTTAAAC TCGAAAGTTT AGATGAAATT GAGAAATACG CrT GTGAGTT ACTIGGCrAAO GGTGCTCAkA ATG=ArAT CTCTATGGCr GGTGATGCTG CCCTTCTTGT CACATCTGAG GGAGCTTACT TCGCTAAACC AATCAAAGGA ACAAGTCAAAA ATTCAGrrGG AGCTGGTGAT TCTATGGTTG CTGGATrCAC ACGTrGAAT?1' GTCAAATCAA
AACGG.CAACT
AAAAGTTCAG
CrAGAT1TrGC
GACCACCOTT
'IrGACTTCTA
AAACAACGA
GGACAAGCAA
ACCTTCTCAG
GTAGAAAAAC
AGGCAACTGA
ATGTAACAGA
CTGGTTrGGG
CAGTTCTATT
CTCACCTCTI'
AACACGTAGT AGAACCTC AAATGC.GGAG ATGACrrGGC AACCCCGGAA TrIrArIrAAAG GATGAAAATT CAAGACCTAT TGAGAAAACA AAAAACAGCT GTCATCGACG AGATGATTAA TIGCTrTGCGG
AAACATATGG
TGTCATGTTG
AAATTTGACA
ITTT-rAAACA TTAACAAG TGATGGAATC GCAATGCCTC CAATT7rTGC GCGTGAAGCr ACAGCAAAAA CCCTGCTGTC TGCTAAGTrCA AATAAGGCTG 'ITACTACGA GAGCTTGGAT CTTCATGATT GCAGCTCCAG AACGTGCCAA TCATACTCAC 'rTGGCAGCCT TGGCAGAATT GTC'TCAATAC CGTCAAGCAA CATCTGCAGA CCAAG'rTATC GAGCAACTTG TrTCAAGCACC TCCTAATGAC TCTACAACAG GTrTTCCCA CACTTACATG GAA.ATGGGC4G TTGGTATCAA GGTCGAAACC ACTGCAGAAG ATATCCGTAA GGCTAAAGCT
TTGATGAAAG
GAACI-rlrrG
TCTGGTCACT
GCCCAAGAAG
AACGGTGCTA
ATTATCATTG
ACGCGTCC ACACAAACT ACCAAGCTTC AGAAAAAACT TTATCGTAGC TGTTACAGCT CCCTTCAAAA AGTAGCTGCT GCGCTGTTGG AAATCAACTA CAGCAGACAA GGCCGT'rGA 13680 13740 13 800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 ATGGATCGAT TTGATGGAAA ACAGAAGAGC TAATTAAC= GGTGCCAAAG CTGCAACAC CACTTGATGA GTGGTGTATC GCCCTTGCCT TCTTCArGA GGTTCTTAC C ATGAGTrACC ATGCTTCCAG TCTTTGCGGG GCAGCTTTCG TGCCTGGTGC GCCCCAGGTG GTGAAGCAAC CTrGTTGC-rG GATTTATCGC CCTCGTTCAC 'rCGAAGCTGC ACACGATTTG TTATGCTAGC ACCATTGATC AATCGTCCAG TTGCTGACGG TATCCCTAAG GGCTC=rCA GGACATACTG AAGTCTACCG TGCCCCTALAT CTCTAACGALA AAACAAAGCC TTCGTGGTGC CTGTACAAA TCAAATGTTA CCATTCGTTA TCGGTCGTCG TA'rCATGATT CGGTCCTTG GCTGTTCCAA ATGAAAACCT TGGCAATCTT TTCTATGTTC ATGAAAATTG GTGGAGCTCC CT=GGTTTG TATGTTCCC TACTCTATTG CTGAAAAACC CGGTTGGTA TA'rTGCCAAA GAAGGrTTC CCTTTGCTAA AATT-CCrrAT TTCAACTCTT GCAGcTGTCT CATCrGGT'rT CCTACTGCC
AGCGTGCCTTG
rAAA'rCAATC
TGTGAATATC
GTTCTTGCCA TCAAGAAATA CGTTAAAGTT CTTCTATTGC CACTTCTTGG AACAATCTTC CCAATGGCTG CAATCAACAC TGCTATGA.AT 828 GACTTCCTAG GCGvGTCTTGG AGGAGGTTCA GCTGTCCTTC TTGGTATCGT CCTTGGTGGA
ATGATGGCTG
ACGCTTGCAG
GvGAATGGTGC TTGACATGGG TGGACCAGTT AATP.AAGCAG CTrATGTCTT TGGTACAGGT CAACTG1rC TCAGGGGT TCTGTAGCCA 'rGGCAGCAGT TATO0GCTGGA CACCACTTGC AATC~rrG1V GCAACTCTTC TTTTCAAAGA TAAATTTAcT GTAACTCTGG TTTGACAAAC ATCATCATGG GCTGTCATT- TATcAcTGAG CATrTrTGC CGCTGACCCA GCTCGTGCGA TTCCAAGCTT CATCCTTGGT CAGGTGGACT CG?1'GGTCrr ACTGGTATCA AACTCATGGC GCCACACGGA
AAGGAAGAAC
GGAGCG.ATTC
TCAGCAGTAG
GGAATCTTCC TTATCGCCCT GGAGCAATCG TAAGTGGTGT GAAAAATGAA AAGATTGGAC AATATGGTAT AATAGAAGAA TAAAGCAGAA CTGGAAAGAA GATTTTATTG ATTTCGCAG AATTCGCTTC CTAGTGGGTA CTCTT'rrC AAGTGGATAC TGCTCGCTTA CTCTrGATTT CGTTCTAAAA GGGACC-ATGG
TACTCAAAT
GGTTTATGGT
CG~rTTGGTGC
TGGCAACAA
GCTCTCCTTT
TACCTACGCA
AGTCTTTTTC
GAATACAAGT
AAGAAGCGAT TCAACGAATG TrGATTTCGT TAGGAATTGC CC1-rCAAATT AGGGGCTCGCA GGTATAACCC TrrATAATTT GCCTAGCTTA TCTGGCGATA T"TCGGCCTAT TAATCTATCT GAAAACAGGA AGGACTCTTA TCTGGCTT TCACCATATT TTGAGGCCTA C7TGGTTTGG AAATATGGTT TGGACAAGTC CTCAGGTTCT GACAGATCTG ACTGGTTTTC GAACGACTAG TCCGGGTCGC TCrTrATATT CCAACACCCT TTCTCTTTTC TTGCTTCTAT CTTGATrMA GTGGGTTCTC TCCTAGTCAG TTGCTGA.ATT TTTCAGTAGA GGCT'rTGCCA AA'rGGTGGGA ACCTCGT1TC TGTCTTGGTA AACCACAAGC ATAAAAAATA TCTTCCCGAA ATrCCTGTrGA ACAACAAGAC GGAGACCGTC 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160
CTTTGCTGGA
AAATATCCGA
CCCTrGGTCT G4GGCGCI'GA
ACTTACTTTA
GTTTACGATA
AGGGCACGAG CGTCGAAAAG AGGAACGCTT GGCTGAGAAA GAGGCTAGAr TCCTGT'rGAT ATGGAAACCG TArrCCAGAA GAAP.AGTGGG CCCTGAACAG GAACATGACT AGCCCC~AA TACAAACTTC CTCTAAAGAG AAGAAAATTG CTTTGCTATT AAGGTAACAG ACI'CAACCC GCTG1-rGG'rG
TAGAACAAGA
=rAAATT
TGGAACCIAGA
'rOTCAAACAA GAAGAAAAAG CTCGCCAAAA AGAGACTGAA AAAGCCTTAC TCCATTTGCC GACAGAGCAA GCTGTTCAAA ATCTTCCACC AATCATCCTG CCTCAACr AAC?1'AAATT CAGATGACGA AGATGTTCAG GTCGAT=rT CACCCAAAGA CAAGCTrACA ACTCTTTGCA CCAGATAAAC TCA GAGAAAA TATCAAAATC TTAGAAGCAA TTGAACGGGC CGAAATTCGG CCATCAGTGA TAAGGGTCAA CCGCATTTCC AATCTATCAG
CAAAAGATCA
CC7TTCCTAG
CCAAGTATGA
ATGACCTCGC
TCTAGCCTTG CGCCAAAC A'rGTCCGGAT TGAAGCACCA ATCCCT=GA AATCCCTAAT CGGATrGAA GTGCCCAACT CCGATATTGC CACTGTATCT TTCCGAGAAC TATGGGAACA 17220 ATCGCAAACG AAAGCAGAAA AT1TCTTGGA AATTCCTTTA GGGAAGGCTG TrAATGGAAC 17280 CGCAAGAGCr 1rrGACCrr CTAAAATGCC CCACTTGCTA GTTGCAGr CACGGGTPC 17340 AGGGAAGTCA GrAGCAGTTA kCGGr-1TTAT TGTR1jT CT A AGG CGAGACCAGA 17400 TCAAGTTAAA TTTArGATGG TCGATCCCAA GATGGTTGAG TTATCTG~rr ACAATGATAT- 17460 T1CCCCACCTC TTGATTCCAG TCGTGACCAA TCCACGCAAA GCCAGCAAGG CTCTGCAAAA 17520 001'IGTGGAT GAAATGGAAA ACCGTTATGA ACTCTrTGCC AAGGTGIGGAG TTCGGAATAT 17580 TGCAGGTTTT AATGCCAAGG TAGAAGAG'rT CAATTCCCAG TCTGAGTACA AGCAAATTCC 17640 GCTACCA1-rC ATTGTCGTCA 'rrGTGGA'rGA GTTGGC'rGAC CTCATGATCG TGGCCAGCAA 17700 GGAAGTGGAA GATGCTATCA TCCCTCT TGG GCAGAAGGCG CGTGCTGCAG CTATCCACAT 17760 CATTCTTGCA ACTCAGCGTC CATCTCTTGA TGTCATCTCT GGTTTGATTA AGGCCAATG'T 17820 TCCATCTCGT GTAGCATT'TG CGG'N'TCATC AGGAACAGAC 'rCCCGTACCA TTTTGGATGA 17880 *AAATCGAGCA GAAAAAC?1'C TTGCTCGAGG AGACATGCTC TTTAAACCGA TTGATGAAAA 17940 TCATCCAGT'r CGTCTCCAAG GCTCCTTTAT CTCGGATGAC GATGTTGAGC GCATTCTGAA 18000 CTTCATCAAG ACTCAGGCAG ATOCAGACTA CGATGAGAGT T'rTGATCCAC GTGAGG -rTC 18060 ***TGAAAATGAA GGAGAATITT CGGATGGAGA TGCTGGTGGT GATCCGCCTT TTGAAGAAGC 18120 TAAGTCTTTG GTTATCCAAA CACAGAAAGC CAGTGCC'rCT ATGATTCACC GTCGTTTATC 18180 AGTTGGATTT AACCG'rGCGA CCCGTCTCAT GGAAGAACTG CAGATACCAG GTGTCATCGG 18240 *TCCAGCTGAA GGTACCAAAC CTCGAAAAGT GTTACAACAA TAAAAAAATA GCTrCTTTCC 18300 AAGTTTGGAG GGAAGCTATT TTAGTGGCTA 'FTGATTGCTr TrATTTTCTG AAGTTGGCGC 18360 ATTaGGACTGT TrTCGTrrT CAGTAGCAGG TT'rACT'rGAA GCAGGAGTAG AAGAGTCCTG 18420 *S*AGTTGCTGI-r TTCTGATCTT C7TTTrCTC TTCCTTGACG CTAGATTTTG GTGT'rTCCTC 18480 *TTGCTGTGTT TTTTCTTGAC TAGTGTTAGT CTCTTTACTT GGACTGGTGT TTTCCTTAGG 18540 GGATTCCT'rr TGGATTTCTT TGACAATGGT TGTCGTCTGG CTTGTCGTAG GTTCTTTTrT 18600 .AATATTrTG '1TATTATCCA AGGCGTT 18627
(2) INFORMATION FOR SEQ ID NO: 114: SEQUENCE CHARACTERISTICS: LENGTH: 2560 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114:
TAAAATACGT TACCTTGCTT CTGCACGTTC AGCAGGTAAG AGATATTACA ATTGAAGAAA
TTCAGCAGGT
AGTACTAGAT
GGTCAATGCT
AGTCTACAT
AATACATCTr
CATGCACTTG
CGACTGAAAC
CAGCTAAGTA
ATTCCGTCA
ATGCTCACAA
AGCTTTTGAA
'rGCACCATAC
AAATCCAGAT
CGGAATCATT
TCGCCAAAAA
TGGTATGGGA
GAAACCACGT
TATCGCCTTT
TCATTGAAAT TTAAAGATCA GGAGTTGATA TTGCTCTCrr GCAGTAAAAG CTGGCGTGGT GTTCCTTrGG TTGTTCCAGA GCCTGCCCTA ATTGTTCAAC TGGGGCTTGG ACCGTATCAT GCAATTCTTG AGACACAACG GATTTGCATG CCGAAATCT AACCC'TCTTC CACAAATTGA AATTCAAATG3 ATGGTGGCTC TTGAGCCGGT TGTTTCAACT TATCAAGCCG TrCAGGTGC TGAACTTCGT GAAGTCTTGA ATGATGGTGT GCCTTCAGGT GGTGACAAGA AACATTATCC TGTrTCACT GATAATGATT ACACGTACGA AATTATGGAA GATGATAGCA TTGCAGTATC AGAGATGAAG ATGACCA.AGG AAACTP.AGAA TGCAACATGT GTGCGTATTC CAGTCTTGTC 0@ et 4* q.
AGCTCACTCT GAGTCTGT ATATCGAAAC AAAAGAAGTG GCrCCAATCG AGCAGCTA'rC GCAGCCTTCC TCCTCAAGCT ATCAATGCAG CTrGGATGCA GAAAAAGGAA TGC7TrGGAAC TCAGTrCAGA AGCCGAATTG APATTAAT CTTTGAALATA GAGAGGTGTT CAGCCTTTAT TACCCCCTrC TGATTGAGCA TTTA'rTGGCC AGAGTCCAAC TrTGACCCAC TCAATGGACG CGTTCCTTTG AGITTCCAA AGAAGTAGCG ACTACAACAA ACCTTCTCAA CAGGTGCTGT TCTTGAAGAT GATGTAGCTC I'rGGTTCGcG TGATACCTT GTTGGTCGTA TTCACATGTG GGTTGTTrCA GATAACCTTC TTGCTGAAAC TCTTCATGAA CGTGGA'rrGG TAAAATAGTC ATATCGTTTA GGAGTTCAGA AAGAkAGTAAA 720 ATCAAATCTA 780 TCCGTAAACA 840 TCAAAGGTGC 900 T'rCGTCCAAC 960 TGAACTCCTT 1020 AAAATCATTA 1080 ATTCCAGCCT 1140 ACGACTGCTG 1200 CAAAAGGTT(; 1260 IrTCGr G'CTT ATCAAGATTT CA'rGAGGATG GTTCCATTAA CATCATACGG ATGGAAfN'CT GATGAGGAGT TGGAGTTGTr ATT-GCGGGTG TAGGTACTAA GAATTTrGGTG GTTTCGCAGC GAAGGGATGT ATCAGCACTT
AA.AAAAATGT
CTrTGATGCT
TCTCGCAGGA
TGCGGCTGTA
TGATACGCGT GACTCTATTG TGGGCTTGCT ATGTI'CCTT TAAGACTATT GCAGATGCTT
CTGACCTACC
AAACCATGCT
TCGCTAATAT
AGGATGGAGA
AATTATTATC TATAACATTC TCGCTTGGCT GACCATCCAA GGCTTACTTG ATTGAGCACA TGCTTTCCAT GCCATGAACC CAGGGCGTGT AGTTGTCGAA TTGACTCCAG ATATTATCGG TGTCAAAGAA TGTAC1'AGCT AGCCTGAAGA GTTCTTGATT TATACAGGTG T'rGGGGCGGA TGGGGTTATT TCTGTTGCCT 1320 1380 1440 1500 1560 1620 1680 CTCATACAAA TGGGGATGAA ATGCACGAGA TGTI'ACTGC GATTGCAGAA AGCGATATGA 1740 AGAAAGCCGC AGCAATTCAG CGTAAATTCA TTCCTAAGGT TAATGC!TCTC TTCTCTATC 1800 CAAGTCCTGC TCCAGTTAAG GCAATI'C-rA ACTATATGGG ATTTGAAGCT GGACCCACTC 1860 GTCTACCTCT TGFCCAGCA CCAGAAGAAG ATGCCAAACG CAT TATCAAG GTTGTCGTAG 1920 ATGGCGACTA CGAAGCAACr AAGGCAACTG TAACAGGGGT CTI'AAGACCA GATTACTAAT 1980 AAAGACAATA AAATCCGGCT C?1rTGrCAAC TGTAGTGGGT TGAAGTCAGC TAAGCTCGAG 2040 AAAGGACAAA TTTTGTCCTT TCT?-rTTGA TArTCAGAGC GATAAAAATC CG?1'N'TTGA 2100 AG1'rTrCAAA GTI'CCGAAAA CCAAAGGCAT TGCGCT'rGAT AAGTTTGATG AGATTATTGG 2160 TCGCTrCCAA TTTGGCGTYT GAATAGGGTA GTTGAAGGGT GTTGACGATT TTCTTTTTGT 2220 CCTTTAGAAA GGTTTTAAAG ACAGTCTGAA AAATAGGATG AACCTGCTC AGATTGTCCT 2280 CAATGAGTCC GAAWATTTC TCCGGTTFCCT 'rATTCFGAAA GTGAAACAGC AAGAGTTGAT 2340 AGAGCTGATA GTGATGTrTC AAGTTTTGTG AATAGCrCAA AAGCTTGTTT AAAA'rCTCl-T 2400 *TATTGGTTrAA GTGCATACGA AAAGTAGGAC GATAAAATCG CTTATCACTC AGTTTACGGC 2460 *.*TATCC74GTTG AATGAGTTTC CAGTAGCGCT TGATAGCCTT GTATTCGGGA TTT TCGATGA 2520 SAACTGATTCA TGATT'rGGAC ACGCACACGA CTCATAGCAC 2560
INFORMATION FOR SEQ ID NO: 115: SEQUENCE CHARACTERISTICS: LENGTH: 11303 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115:
TATTGGATTT CCCTTGCAAT CAGTTTATGG GACAAGCACC CGGCAGCGCA GAGGAAATCA ACGCC'rTCTG TAGCCTACAT TTT-CAAACCA CCTTCCCACG TI'TCCCAAG ATTAAGGTCA 120 ACGGTAAGGA AGCAGACCCT CTCTA'rGTCT GGTTACAAGA CCAGAAATCC GGCCCACTAG 180 GAAAACGAGT CGAAT GGAAr ?TCGCTAAGT TTCTCATCGG TCGAGATGGG CAAGTCTTTG 240 .AACGCTTTC TTCAAAAACA GACCCAAAAC AAATTGAAGA GGCGATACAA ACTCTACTAT 300 AATTCACAAT CTCACTATGA TTAGGTTTCC TTTAACCTGA TGAATAGTGA GATT'rN'A 360 TGGGCTTTGA CTTAAATAGA AAAACACCCC ATGATATGAA ACATGAAGTG TTGTAAACTC 420
TATGTTGTAG GTGCTTATTT CACAATTTCA ATGTGACCAG TGATAACGAA TACCATACAG 480 832 AATCrrCATA TACACTAAAC AAAI'GACT CTAAT'rArIT CAATrACTTT TGGCTAGTAA ATATCATrC CAACAAACGC CCTCTCAATT CCTTATCCTG ATGATGCAAG ATATI'CATTA AGTCATGAGA GN'TTCGCA TTGATGAATT GATTTAACAA TCTATCTT AATTCATATG GAAGAGAAGC TGTCTTTACT
CTTTCC.ATTC
TTCCTTTAAA
ATGCTAACTC
AAATTAGCT
ATGAATTCTA
TACCATTCCC
AGTCTAAAAA CTTCGTCATT TAAAGATGTC CTTTTATTAT arATCATTCT TATTrGGCAA TTCAATTATA GACACAT'CG rGT~rCTAT TGCTTGGAAC GATACTAGAA TCTCCTTGTA A?1-rCCCAAT CGATTGATAA TCTTrGTTlA TATCTTTGAC CATTTTGATC TTCAAGCAX-r TCAAAAGAAT CAACTTCAGG 'rAAATCAACA CCCA'rACCTA CACNrTTGC AAACACAGGC G;TAGTCGAGA CTGTGTA'rr TTTCTCTGAA AAGAAGTCAT C?1'GCAGA TTGGAATGTC AAATCCATCT TTCCAAAAAA GTATTGGTTT GGAACATTAT AGATTGGACT GA'rrAATGGG GCACCTTCCT GTTGTTTCC TGGGAATACA TACCAATCTA TCTCAGAACC AACCAAGGGA ATCATGCAC TGTCCCTATA ACACTTAAC TTCACACCAC ACCATTCACC TCAG40GAAC CATACATCTA TTTCrACAAT
AGCTCTCATC
CA'rCTGTCTG
GGGAGCCACC
ATTCTCTGGA
TACArrCATC
ATCAGTTCTG
TAGAAATAAT
GTATATAGAT
AGGGAATCAT CTGATGTCTC AAACGAACGT ATTTCTTCAT AATCTTAGAT G'TCTCr AAAAAAACCA AGGTTCTTTA CTAT'rAAAAG GAC?1'CTAGA ACTATG-TA.AT CGAGTAATCG GACTAAAAAC ACCAAACTGT AGCCATCrAG Tr=GAGCTC TATGTCCACC GATATCATGA CTCCACCAAC TATAACCGAT AATAGGGTTG AAATCTTAAC GAA'rTCCAAC TAATAATAGT GGTAGCGGTG ACTACCAGGA CCTGCATATC 'rTGATAAAAT TACAACTATC CTCATAGTGA TAATGGTTTA AAAGCCAAAG TCC GT'rG CCAGTCAATC CACCAAAAAT CTACTCCCTG TrTCGTCATAA TCCCCCAACA A'rTAGATGCT GTCGCTGTAA ATCCCCTGAA AAACCAACAG CA.AACCACC? TCTGCATTTT TGGATCTAGC ATACCTTGTG CTTTTCTAGT TCATAATGAA 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 CATCITTrAAA GTAGGCTTCC CTAGTTCTAC ATTTAACCCC GTATCCCATC AGCAGGATG GCAATAACTG TTCTGGAT'T TTCCAAAGCG AGCTGCAATC CTAAAAGAGG GATTAAAAAA ATCAAAAATA GCAGGTTCTT AACCGTTTrrG CGATrrGAGG ATAAGCTTCT TCATAAGCCC ACATTTAAGG AGAG?1'?AG CT?1'CTATCA TGAAGTGTT GGTAT'rAAGT TCTATCCA ACTATATCCT C2'CCAGCCAC TCAGTTATAT GCCAATCCAT ATCTAACACA CCGATAGATA ATGGAATrr CTCTGTTTCA AATCTGTCTA TTAAATCCAA CTATTCATCC GACGTATAAG GCCAATATCT ACTCCACCAA TTGCCTAAAG CATATCTTGG CAACAAGGGT GTTGAACCAG TCAAA'rGGTA AAAATCrCTG ATT-GCTCCTC TA'rAATCATG CCCATAGGCA A.AGAAATACA 833 GGTCAATTTG ATTTTrCCC AATCATCCAA TAAGGCTATA CATCTGCCI'T ATCCAGAC ACCAGCGACT ACCATATACG.
TCAATATAAC CAGATTGrM ATCCCAAATA AATCCI'GAG CCATTTCGGC TAATAATTCC ATCTTCTAAC GAGATTGCTC cGAGcTGI-rC C2-I-IAACGT TTCAATAGAT TCACCAAAAT GCAAAATrC CT1?rAATrC TATAAATAAA TTTTCGGCGT
TAAATTCTCC
ATGTCTCGAT
7?rcTATCCTC
CAGAGATTCG
GCATAAAAAC
A'rCTGTrATA
TT-AACAAACC
TTATAAAG TGCAGATG.AA AATAGTCCGT CATAATATCT ATAATCTAAC GAAATTTGGC CAAAATCTCT ATT-ATAGATA AAAACTTCCA GT~rrGAGAGT ATTCTAACCT TACTAGCTTG ATAAAACTCT CCCI'AAAAA TTTTCAATTT GTTTTCCTCC AGAACGCACC ATTGATG CGIrT-rTCAT TATTCTGAAT TCTATGACAA ATAATAGTCA ATTGAAAAAA TCCAGTGGAC AAGACITAT TAAAGAGTTA TCACTr'trCA AC TITTCAA
AGTACGTTTG
AGTTCTGTCG
TCTGTTAATA
7=TATGCTA
GCAATG=T~
AAAATATCT
GCTTATGCAC
TTC1'GAAACA AACrACTTT AAACTATTAA CTAAGATAGG TCTTACTAGC AATCATACGA TATTCAAGC'r CACGTGCT TAGAACTGAA GAACCCGGAT CGGTATATAA ATI'ATCCGGA ATAACAGTG CGCTI'CATTA ACTCA'rCCCC AGAGCAAGAG ACATCACThA CCGTAGGTCG CCATCCTTCA ATCATA'rrr ATTGATAAAT AA71wrCAAAC TTTCCTTCCT GCTTATTTCT TCAACATAGT CATAAGATTC CTTCATCTCG TAATI'TTCA TACTTAAACC ATACCAAACA 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 CTCTTAAAAA CGGATCGGTT TTCAAAAGCT ATATCTAAGG ACATA'rGCTA CCTCCTTTAG TAAAAATTTT ATr-rGTTTG TCATATCTAA CCTCAAACCC AAATATAAAA CGCATTCT CATAACATAC T1-rACI-rTA' TG~r?1TITA ATTCCCATGA TTGTCATCTT TrrCTTTATCT ATACA7'rATA CCATGTTTCT CTGTAGCTTT CTTTTCAGCA CGCTTATCCT ATTTTATAAG TTGCTTTTTT ACTATTGTAT CGTATTCTAC TCGAACTACT TTCTATTTTG AAGAATATCT TAAAAAGACC CCATTCTATC ATCGAGTATA TTTGATCAT ATACACGGAT AGAGCAACTA ATACAGCTCG GTTGGCCATG GAACCTGTGC TAAATAGTAG CACTAACCAT ATAATAGGTA ATCCAAAAAT
CTTCATACCC
TCTTCACTGCZ
CTTTTTTACT
GAA'GGA6AGA
AAGATCTGCT
GATAATTGGC
AATAACAGCA
TACATACA
TAATATTAAC
TATTAGATAC
GTTCCCTGTT TA'rCAACTAT AAA.AATATGA AAAAGCAGAG ACTCATTATT TAAACTATAT CTTGCrCTTC TTTCACCAAT TACACTAGGA ATGCTCCAAA TGCACATACT GTCCCTA.AGA AAGCTCCAAT CCCTACTGGA TTAATAAACT TTAGAGAATT CGCTACGTAA TGGTGCTAAA ATAAATGGTA TAGCCAAGGC TG4GATTATAG TAATGGTTCA TTAATATTAA ATA.AGGCTGG AACTACAGAT 834 GCrCGTCCTA TTGC?1'rAAG CTGTTCAGAT TTAGAGGCAA AAGCAATATA TAAACATACT CCrAAAG?1'G CACCAGAACC ACCTGCAATT ACAAACATAT TAGAAAATTC ACCTGCAACA GCGAAGTGCC CGCCAGCAGC ATTTTCAGCC ATGTTAGCAA GAGCAATTGG ACTAACAA6AT GCAAAAACAA TGTCGCACC GTGGATACCT ACAATCCAAA GTAG~rAGT CAATAGA'rAA ATAATCATTA AACCAATCCA CGAATTAGTC AGATTGATA CA2CCAAA TGGAATTGCA ATG.ACTTTAA AAATATCTGT TCCCATTGCT ACAAGAAGAC CGTTGATAAA GATAACAACA AATGCAACAA CAAATCCCGG AACCAAAGCG GTAAATCCLC GAGAAACTCC TTCTGGAACA GCTTCAGCCA TTTTAATAAC CCAArTATGT TTAACACACA TACGATAAAT AACAACAGTC ACAA'rTGCCA TAATGATTCC CCTAAAAATC CC7TTCrTCC CAAAACGTGC GACTACATT CCCAT'rGCCC ATTCCACCA'r
GCACCATTAA
TATCCAAGTG
ATG'rAAAGTG
AATGAGAAAG
GGTACAGCAG
CCCATTGGTC
AATAGACCTC
AATTATACTA
ATCCATCTGC AATTACTGCA CCrrC7"rrAGACTTGTCAC AGTCTTCATC CAAAAATGAT TTGCGGTACT GTCATGACAA AAGCCA'rCAA GGCAAGCAAG GAGGATTCAT ATTGAG7T=~ TCTTCCTCTG CATAAA=rr TCGrCAATTCA ATAGAACGAA ATAAAGAGAT AGAGAACCCA TACTCGCATA GI'TTGCAACC ATGTGAATT ATCAAATGAA GCAGAGAAAA TA'rCTGCCAC AATTGGCCAA C TTTGGCAA AATACTGAAT ACCAAAA.ACA TTGATCCTAC AATAGTAAAT CCATACCTGC AGCCGTGATA GCACGTACTIA CT?'rAAACTG AGCAAGTC CCATAACATG GT'rTCAAGA AAACCAXACA ACCCCTTr'rG *rATCCATA CTTAATAAAA CATAATAA'r? TTTACTTCT AAAGACTAG1 TTCAAATACA CATCAGGATT ATAAACTAAG TGAC7'TCTTT TCCAAGA CAAATTGTTG 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 ATAAGCCTTA TCTGTTCG=r TATAAATTTT 'TTTAATTCTT CTAA'rGTCTA ACAAACTCAG AACrAAACCT AATAGAAGAA AATCGGAAAA AGArrCTTAA CTACAAAAAC AAATAAACGT ACCAACTTGT CCAAGTTAAA
GCTACTTGGT
ACAAGTAATC
AAGCAN'TGT ATTCTAACAA ACATTAGTGT TATTCCCAAC TrrTCINCC AAGTTTAAAT TGTTCAACAG TTGCTAAAAT AGAAAATACT ATGAGCATAA AATAATAGGC GAGGGACTAA TAAACTGACT CAAAAGCCAA TAAATATTCC GAG'TGCTATT GAATAACGTA GAAGAAGATA TCGATTGAAA AAAAGTATTAG CTCTCGACGT TGTTGTTCAA TCTTTTGTCG TTCTTTA TCCATATCAT
TATTTTCAAA
CTA'TTCAAAT
TATTTCCATAA
TGGGGAAAAT
CAAAAAAGAA
'N'AGAGCCAT
TTCCTCCTTA
TATAACAACA CATATTTAGT TAACTTCTT ATAAAGAGCT AACATTTCCT TTGCTACTTC TAATAATGTC ATAGTGGTCA TTAAATGATC ?rGAGCA'rGT ACCATGA'rAA TTTCA6ATTT AATTTCCACT CCACTTGCGT ATTCTTGCAA GAG~rTGGTT TGTGCATGA'r GCGCTTCA6AG 5820 II 'S JO~ A 835 AATITATCTCA ?1'TGATrGAT TTAATTTACT TTCTGCATCA TCAAAACTAC CTTCTCTCAT TilrGCAAAT GCTTCATGTA T?1'CTGACCT TGCATTTCCC GAATGCAGGA TAATTTCAAA TGCTGCAACC TGCAGTTCCT CTTGATTCAT ATAAACCTCC TATTTrTATCT TCTCAAATAT GTTAATAAAA TCTTCAAAC1' TATTGCAAX2A TATTAGCTGA TTGCAATT CATCATTCTC TGTCAGAGAG3 ACTATCTITTr TAGTCACAGT TGCCAAACCT TCGTTCCCAT ATATTGACG AGATAGAAGA AATACTAGCT GGACATCrr.A ACTTTGATTA TCCCAGALGTA ACGAACTTr ACAAATTGCA ACCGAAACCT AATTTTTCA GAAAAAACAA 'rrCCCTCTGT ACCAAAGGGC TGAATAGGAT GCGGAACTG.C CTGAACTTAA TTCrrCGCGC TGTTTAATTC CATA.AAGTAA AGA7'rGTTCA ATTrACCTTG
AACTCATTTG
TCTGATTCAG
GCCCTTGTT TCAAAITTG ATCCATAGCO' TTCTAATCA AACTGGATT TGAAAATATA TTTAAGTTTT TCTTGATTCA CCCAAACTCC GTTTCCAAAC AATACCAAGA ATATTAAACT AAGACTTACA TATGCTACTT TGCAAGAAT'r TCTCTAGTCA CAAAGGTTA TTT'AAGGTA6A TATCAATCCT TCAATTAGTT ATTCTGAAAT GT'r'1rAAATA TTTTTGACAT TGTTGACTCT TGGAAATTCC TGATn'rGTTA CTCTATTGCPA TCATCAGTCA ACGAATCACA GACAATCCTA TAGGTTGGCC TCTTGGCATT
ATTCACCAAC
TACATATTAA
TAATCTTTGA
rl-rCCATCTC GA'rTGATAA A?1'CATAGTA
GATTTCTTAA
TAGTACTCTC
CATCATCTGA
TAAAGAATAT
TCCGCC0TTTC
GGAAATCTGA
AACATTTT
TAGCTAACAA
CTTCTCTTAC
TCTGACAGTC
TCTGAACTAC
CATCCAAAAC
AGATAAACTC TCAACCATCT ?'TTCAAGTAA AAAGTTTCT 7rACTAAAAT ACTGTCTAAA TGATTGTACA TAACTAGAAA CTTGCATCTA ATCACTCTTA AGAAACACAC TAACTTT'AAA ATCAATAGCT GACACI'ATAA AATCTATTCC CCCTATTACA TCAACAACTT CArr-rGGGCT GCACCAAATC TTTGCTACGT TCCATACCAG TATTGTCCAC TCCAAGAACT ATCACTATAA rrC'TGTrrAA TAAACC'TACT TGTAGTGTCA AGAAAAGTTA TACATATCAT TAAACrrCT TCAGAAATAT ATGCAAAG'rA ATGTAGTCTA CTACrCGCTT
CTGTTGCACA
CTAAAAAGTG
TGTCCATATT
TT'rCATCTAC 7TrAGATGAGT
CTAATCCTPA
TCTI'CTGATT
TT'rCCTGAAC
CAACCTCT
CW.A'rrTAA 6360 6420 6480 6540 6600 6660 6720 67S0 6840 6900 6960 7020 7080 71.40 7200 7260 7320 7380 7440 7500 7560 S. 55
'rrTAGAAAGA
TATATTTTTT
TAAATTCTGT
AATTCTAGCA
ATTCTTTGCG
AI'TCAAATC
AGTACAAAAT CAGATAGTrT AA7'rCTTCTA ATGGAACAGT I-rAAAAAATT GATTCTCTAG CCTGAGACAT AAACTCCTTT TTGATCAAAA AAGTrAAAT 1-rACATAGCA ATGTATI'GT GAAATAATTT ATGATAAAAC GTCGTTTATC ACGI-rCCTCG ATTCGCCCTA CTCTCAATGG ACAAATTATA CTCTrGATAAC ATCACTCGTA TCTTTCTGAA 836 ATCATrGAGAT AATGTTGAAC GACTAACGTA AAGTTCATCA GCTAAATCAT CAAAAAGAAC TIGGAACTTCC TCAAA'rAATA ATNTATTTAA GATAAATACT AAACQATCAT CACC 'rTrGA AACCGCAGTT TTCGTATrAGT CTTCTrCCAG 7rCATAACTT GCCTTGATTC TCAAAAAATA TTTGATACCC 7YGACC?1'GT ?1'GAATAATC ATTGTCTTCT CAATTAATTT CAGTACM-rA GGATAAATAT TCTGCCAGTT CN'GCTTGT AACAAAACGT TTGAAGGATA TCI-rCTCTT TAATGTTTAA CACATTCATT TCATATATTG AACCZATArrA TACACTTAAA TCAGTTTATA CTTAACCTAA ATATT-TATTG ACAI'TCATG TGTTCATCAA AGCCATTT'r TCAATTICCCA IrGGAATAGG AATATACCCT TGGNTTCCT GCTTTAGAAC CAGCCTCTTC AAATTGCrA ACTGACAAGA TACAAATCAA AAGCTGCTCC TGCGA'rAGC'T AATAGCATCA ACTACAATAT CTTTCCCTTT TCCTT'rTAGA TGTCTAAACT CCTGGAAGC TTTGAAATCA ACCCGGACTCC CGGACAGTTC TATCTGAACA TCCrrATTrr 'rrATTAAAAA CCCTCCTAAA ACGTATGTTT TCAAACTCAA AACAATTTAT ATATTCTCAA GAATCAAATT TGAGGAGGTA TTTrGTACAAC p p p p. p p 0
AAGTAC.AT"TT
TTCCCTCCTr
AACTCTGTTG
TGTTGAGG
CAGTAGCACTr TTr'rCTGTGC CATAAG'TGAT GAAGACATTC CTGCTGCACA AATAAT-rAAA GCTTTTGCCA TAATA'TNTC 'rCC?1-rCTr AAATCCAATC AAAGCTGTGC TAAGTTGGCT TArr'rGTTAT CTA==~AT TATAAAATAA AGCGTTTCCA ATGACAATTC CCTCATTTC CTAAATGATA TGGAAAAAAA TTATTTATAC 'rTCAATr'rAT AAAATAAAAT TATTCCTGAG AGTAGAAATG AAACACTAT TGCTAAAATC AAAGGCAAGT CTCCTATACG AATACCATGA GCAAGCCACA ATGCAATACC AATAAC'rTGC ATAACATACA TACCTAGAGC AATAGATCCT GTGTCCTTrG TCTTAACTAC ACGAAAAACT TGTGG'rAAAA ATGCAAATGT TGTTAAAATT GCTGCA.ATAC TTCCAATCA'r ATGTCACCTC AATATGCrAA ACAAACTGAG AATAATCTCA Gr~rGTTTAT ACTATTCTAC TGATTCACCG TTAGATCA.AA TAACTrCCrT A'rACCAGCCA A AGATT TCGGGGAACG ATrATAACI-r CCCTTCCCAT TATCATCTT ATCTACATAA ATAAAGCCAT AACGTTrCCG CATrTCACCG GTACCAGCTG AAACCAAATC AATACATCCC CATGGAGTAT AACCCA'rrAA ATCAACACCA TCIrCAACTA CAGCCTITrTr CATTTCACGA ATATGGGCAC CTAGATArrC AATrCTATAA TCATCATGTA CCATACCATC TGCTGCAACT TGATCTATAG CTCCAAAACC A'r!TrCAACA ATAAAGAGTG GTAAGTGATA GT.GTCTGTA AACCAATrA ACGCATAACG CAAACC?'rCT GGATCAATTT GCCACTCCCA TTCACAAGCC TTAACATAAT TATTrCAC TAAATCTTCT GTTTCAAGAT AATCAAAATA AGGATTATTT TCACGATGAG AGTCGATAGC AAAGGACATA TAGTAACrGA AACCAATGTA ATCTACAGTC CCACCAAGTA AATCTTCCTrr 7620 7680 '7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 ATCCTGGGCA GTAAAATCAA CTGAAATACC '1TTTCGTTCC AGGATATTA CCTAAAACA'r GCACATC7.GC AAAATAATAA TGCCATTAAG ATATCCTTAG GATTCCAAGT AACTGrCATAA CIATACTTGA AAATATGCTC CCCx-rCTGCA TAC1-rCAT AMCGACACA TCGCAATCAT ACAACCTAI' TCAAAATCTC AACTAATTCG TAATGTGCTG TACAATACCT GAGTTAGTAA TTCATTGAAA GTCATCCAAT AAAACGAGCA AAGAAATCAA GTGATA.AGGC ATTTCAAAAT '1TCATCAAAA AGATTATCAT LrCAAAG ATACGTGTCC AAAAAGTGCT ATATC'NTCr'I GATTAATCTC ATaACCAAT TTACAGCTC CTTGATACAT AATTGC-"=CT CTxr'rATCAC ATGGTGCAAA ATCITTCCTGA TAAwrCGCTT ATITAACCTT ATC~rGTAA CGTTTAAATA TCAATTTCCT ATTTTTCCAA CCACCATATT GAGATAGAGT GA'rGACAGGT TCAATACCAT AAAACTGTAA*TCCTTCTTCA TTC-GCCTC'rA ATGCAA'rAGA GGTACGCAAG CACTTGAATC
GTGCAGAAGC
CrrCCTC.A'A GArrATTGAT
CGACTTCTGC
CGCACTAA
TCTrAACCA
ACTCATCACC
CCArTrCAGC 0 0 0 0 0 .0e* 0 0000 0 .00.
ATATrTACCC TCTAAAACTC CZATAACA'rCA GCAACACTAA
TATAACCGG
CCAAAGTAA'r
TTCCCTTGCC
ATAAAAATCT ATCCGCCTCAT GATTTGGATA TTCACGAGCT ACTrCCA'rGAC GACCAGCAGT ACCTTCTTGC CATCCACC T CAAGTTGATG ATCT=AAAA GTAGrCATCT TT=CCTCC AGCAGCAACA GCACCACCCC ATAAAAATCC TCACTTTGAT ACTCTTATTA TAAACCrAA TA=rGTAAG GAAAGAAG'rA ATrT.rAATG TAATGATA'rC =IrACGATTT TCAATACTT ATTCCCTGTG TCTATAAACG ACTTATCGCT ACGTTCA-ACT TGCATCTGCA AGTGATATT CT IMGATTGA TAATGTTTAT C1'AAAGI'rC 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100
ACCAAAAGAT
GAAATAGAAC
CAAACTACAA
?'rCTGG&..TC
TTTCTAAA
TTGATTTATC
GAAAACGCAT TCrr=CCT AATATCTTCT TGTATTCTCG AAACTCTCAC AATAATTCTA CCAGAATCAT CTTCTATATA TCTAAGATT TCTGCATT CACTGATCAA TAAGGAGAAT AAGTTACCTT TTTGATTTCT AACTT-TATTG CACTATCTCC TATCCTCCAA AATCATCCTT T'TATGA ACTAATTATC TAAAA'rCATT TTTTCTTATA CCTCATAAAA GGTATTGCAA 000.
GCGATTTATA
TTTTCAATTG GTAAAAAATA TTCGTATTTC ACAAGGCCAC TATCAAGCAT TTCrCTTGCA TAATATACAT GAATAGTCAA TGTCATCTTA AAAACAAG=T TAGA'rGAGCA TCTAAACTTG CCATTACTTT CAATCACTTC TrrATACCAA GTCAAT'rGAA ACAAGAGCAG GACAAAAGAG CTTGGTAATA CCTTTTCAG GTGCTTTTTG ATATGAGCCC ATGTrTrCTC AATAGCATTG TACTCAGGTG ACTAGGGAGG AACGTAAA AG~rrATACC CA.AACTCTTC ACACAAGAGT 838 TCTAGCTTCC CCATTCTATG GAATCT'rGCA 'TATCCA'rAA TAATAACCGA TGGTGTGG~rr AATGI'GGTA AGAGAAACTT CTGAAACCAA GCTrCAAAAA AGTCGCTCGT CA'rCGTCTCT TC'AAGTCA TrGGAGCGAT TAACTCACCA trlTGTAGAC CTC CA.ACCAA AGAAATCCTC TGATATCTTC ?1CCACATAC ?T INFORMATION FOR SEQ ID NO: 116: SEQUENCE CHARACTERISTICS: LENGTH: 3112 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116: CCTTAGAT1' CCACT'rGCCA GAGGAAT TGA TTGCCCAAAC GCCCCTrGAA AAACGTGATG 11160 11220 11280 11303
CCTCCAAACT
CTATTATTGA
CTGCCCGCCT
CCTCATCGrC AACCGTGAGA CAGGAGAAAT TATGCTGGAA CCTGGTGATG CCCTTGTCAT CTATGGTCAA AAACTGGACA CAGGAGGTCA GCAAGATAAA CATTTCCACT 120 AGAACACTAG TGGAGACGAG TGGGAAGTTC TGGCTAAACC GTACTCGTAT CAGCTT'rGGT ACGGGGGACG CAIT=TCCC TGGGAGAAAT GCCTCTGCCA AAACCGTCTA CGCCAAGGAA CCAAAGAACT GCTGGCAGAA ATGTCGGACT CGGAACCTTT ACTCAGAGT'r CTATCAACTTr ATGGTGGTCG 'GTCATCGCT GATGGCCGCC TCAGCGCTGT
TTTGAATACC
CCT'rATATCC
AG'TGGCTCTG
ATCCAAGCTA
AGACCTGTTT
TCTGAGGAAG
GTCGGAACCA
AAGGAATTTT
ACCAAAAATT
CTGCAGCACC
AGGGTGTTCA
CTGTGGATAA
CTGCTGCCAC
GAACGACACC CGAGTTCTCC 180 TGrGGAACTT CTCCTCCTTA 240 rGCCAAACcC CTCAAGGTCG 300 CGTTACAGAA GAATTGACCC 360 CCTAGAAGTC TrGGAAAGTC 420 AGATGACCGT GAACGTTATC 480 GACTGCTGGT CTTCACTTCA 540 TCTAGTCTAT CTGACTCTCC 600 TCTGGACGAA CACGAAATGC 660 CCTTCGCTCT GTCAAAAAAA 720 CCAAGTTTGA rGGGCAAATC CAAGCAGATT GGTATGAGTG GAAGGTCGTG GA'rGCCTrCT TGGTCATGTr GGTTTCTGCC TTTGCAGGCC CCATCCAAGA ACACTACCGC TTCrrCAGTT GAA'N-rCTCT AAATCTTCTA ATACCAATAA
CTTCTATCCG
CTGGTTGGAC
CAACCAACTT
GTGAATTAGT
TTGGTGACGC
ATCGCTAAGA
CACCTTGGAA ACTATTGGTT CAATATCTTr ATCAAACCTG CCACCTGCCA AAATCAACTC CTTAGATGCC TACCACCATT CATGTTTATT TATTGAGAAA TATTATTTCA AAGAACATCT ATTGAAATTA AAATGCTTCC CCAAATTGAT TCATCTATAT ACAATTGAhA CTCTAGCTAG CTGTAGAAGA GGCCTAGTAC CCCTAGCI-rC GAAAATATTG CCATAGATTG CGTTGACTCT 839 TTTAI'TTCAG CTTCCTATAC TTTC?1'CGC G'I TGTAAAT CAAAATGCAA GACACATGAG TAGCACCATA TTTGTTACTC TTATCTGTCC TCTCAAGAGA CTATrATGAG ATCATTCACr ACrT1GACCC TGACTCTvCC TACTCTCAAA ADCAAAGACT AAAAATCTCT TCAAACCGCC TCAACG1TCAC C1rTGGATTAT ATATG'TGatC AGTTCTATCT ACAACCTCAA AGCAGTACTT T43AGCAACCT GCGACTAGTT CTCTTTGATT 1-DCATTGAG1 ATTAAACAAA AAGTGAACAA ATCTGAATTIC AAGACTAGGC TT-G~rCACTT TVT'TATAG1'C GCTATAAGAT GACCTTATCT
TTATTTCAGA
TATACTCI'TC
T~aCTTCGTC
TTCTAGTTG
TAATGTACAG
ATAGCr'rr
GAATCTGTTA
TGACCGCTAA
ATATATAA'T
ACGCACCCAT
CTCACCCGAT
CATAGAACCG
CATGGAACCA
TGGCGCAACT
TACAGGATAC
CCCTCTCCTT
AATTGTTCCA
CCTACTTCTT
ATTGGAATTT
GAGTCTTTT'
TAAGTTATCG
GCAGAAAGGT
'rrCCTTTTAG
TGTTCGGTAA
CCATCCTCTC
-TAACGAATTC
ACAATAATAT
TCATCTCTGA
ATATA7*rCAG
TCACCATTAT
GTATTTACAT
TTGCTGTTGA
TCAGTrrAT
GCTTTAGCTG
GAAGGAATAG
ATACGTCTTT
TCACGTTCAT
CTAC'rrCTTG CT'rGGACAAG
TTTGACTTAA
TTGTAACTGT
CATTGTCTGC
CTGGAAC'ITTT
CATAACCAG'r CATCAArrG?
GCGTACTTGA
CCCCATITGTC
CATATTGAAT
ACATACrTT? ATCAATTIIG CATTGACTCT ATAGCCATCT AATACCATT ACCACCAACT GGTAGTACCA TGAACTATTA AAAAATACCA CATACCATTT GTTCTACTGC TACATCTGTT ATGAT'rGCTC AGGAACAACA TTACCATCTC TTTAGTAATT CTACAGTATA GANTGTAGTA ACTTT'rATCA AGAG= CGGC AACrTGGGGC TTGGTTCT'r AGTACTCTCA GTTACTTGTC 'rTTCCCATTC ?TTCCTAGAG TTCATATTTA CTAGCAAATG GATAACTGTA TCCGTGGCL' CTCTGGATTA ACATCGTAG AACAGGATT TCACTACGGT AATTTTCTG GTTACTACCT AGCGTCATCA TACTCTATTC CCCATCAGCA GCATGAACAA A'rACCGCAG AACCAAAAAT GArrTTGCCA TTACATCCTA TGGAACCATT GA'rrGACTTT ACTTGTACCC AACCTGTTGC TCTTGTTTCC AGTCTGTTGT CCTTGGTTAG ATGTAACAGA ACTTTTTCAG GTTCTCTCGT TGACGAGAAG TAGTTTCTTC AGAGTAATTT ACCAATCT CATCGAGATA TTCTGTTTCC 'T=AACAALC 'rCTTGTTTGA CACTCTTTCC ATCTACATTA TAATCTCTTG CTCCTGTCCT GAACAAGAAC TTCTTCAACC CTTTTCTATC AACAGTAACC TCCTTGTCGT AGTI'ACATAG CTTTTGTTTC ATCTTTTTCA TAGGTTTAGT CGCTACIrT CCTCTTCI'T ATCTCTAGTA AACTTGTAT CAGA'rTCCTC CTTCCGAGT TTACGTAT'TG CTTCTACTAT AGCATCTT'N'
TCGCAGGGAG
ATACTTGTAT
1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 CTAAAAATAA AC'ITAGCCCG CATAGCGCTr ATTAGTATTA CTATCAAACG TTAAACAATA TACGTTATAT ATAAAATAGA CTTAGAATGA TATATTGATT ATrGAACTAA CACTTTAACT ATATCGTAAT CAATCTCATA TATAAAGGAT TGCAGACATC TTATCTAAAT ACATGCGAAT ATATTTAGA'? ACAAACATTC CAACTTGATA AT 3000 3060
INFORMATION FOR SEQ ID NO: 117: SEQUENCE CHARACTERISTICS: LENGTH: 4327 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117:
CCCAAAAATC TCTTCAAACC ACGTCAGCTT CGCCTTGCCG TAGTATGGTT ACTGACTTCG TCAGTTCTAT CCACAACCTC AAAACAGTGT TTTGAGCATC ATGCgGCTAG CTTCTrACTT TGCTCTTTGA TTTTCA~rGA GTATAAAAAC AGATGAGTTT CTGTTTTCTT ?TTATGGACT ATAAATGTTC AGCTGAAACT ACTrTCAAGG ACATTATTAT ATAAA.AGAAT TTMrTGAAAC TAAAATCTAC TATATTACAC TATATTGAAA CCGT'rTTAA.A AATGAGGTAT AATAAATTTA CTAACGCTTA TAAAAAGTGA TAGAATCTAT TTrTATGTAT ATTTAAAGAT AGATTGCTGT AAAAATAGTA GTAGCTATGC GAAATAACAG ATAGAGAGAA GGGA"TTGAAG CTTAGAAAAG GGGAATAATA TGATATTTAA CTAC'rTTTGA CAGT'TTTT ATTGTCCCCT TTAAGCTGGA TCTACAAGTA TTTATGCTTA GGAAGTATGG TGTAAATAGC ACTTTAATTG CAGCGGTGTC GGCATTCAAG ACAAAAAAGC AGAGAAAAAG ACAAGTTGAA
CGACAGTTTT
TAAGATTCTG
'GAAAAGCTA
ATAGGCTGAT
CTGGTAGATA
CTGATT~GA'rT TATTTCTTCA CTTATTTCGGG ATTGTGAGCT TGATTATATT TCCCATTATT
TTTGAAAAAG
GTCCATCATT
AACTAGATTrG AGGAGAGGGG AAAATTGGCA CCAATTTGAG ATAGTTTGTT AATGAACTGT AGTAAAAGAA AGTTAATAAA AGACAAACTA TGTCTTATTT CAGAAATCGG GATATAGATA TAGAGAGGAT AAGAACGTAA GTGTCGTrAT AGCATTAGGA AACTATCGGT TAGGAGCAGT GGTATI'TGGA ACGTCTCCTG TTI-rAGCTCA CTCTGGCAA.A TGAAACTCAA CTTTCGGGGG AGAGCTCAAC GCCAGCCTTC TTCAGAGACT GAACTTTCTG GCAATAAGCA TGTTCGATAA GGATTGAGCA TGCTTATAAA GAGATATNr GCAGGAGTCT GATTGGAGAA TAGTTCATTT TrGTCAT'A AGTGCATTTT CTGGAGTAAA CAGTATGAAT CGGAGTGTTC AGGAGCGGTI' TCTATGATTG AGAAGGGGCA AGTGAGCAAC CCTAACTGAT ACAGAAAAGA AGAACAAGAA AGGAAAGATA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 AGCAAGAAGA AAAAATTCCA AGAGATTACT ATGCACGAGA ?FTGGAAAAT GTCGAAACAG TGATALGAAAA AGAAGATGTT GAAACCAATG CTTCAAATGG TCAGAGACTT GATrrATCAA 12 1320 GTGAACTAOA TAAACTAAAG AAACTTGAAA ATGCC.AAGGC CCCAGCATTC TATAATCTCT AGI'ACTTCAC TATGGCACr'r TACA.ATAATA GGAAACAGT TTACA6ATAAT TACAACGATG ATTCTGTGAC TTTCACAGTT GAAAAACCGA TCTACGTAAA CGGGGTA'rrA TCTCGAACAA ACGCAACACT TCACATGGAr.
TTTCTGTG.TC AACI'GCTACT CTGCTACTC'r AGAGGGGCGT CACCCTAAA AGrrAAACCA CAGCAGAACT ACCTAAAGGC GTCTGAGATC TGGCAATTC CAACCAAGCG TGCCAACAAT TGTATAATCG TGCTTTAACA CAGATTTAGA AAAAAAACTA ?TTAAGCCAG 1380 AAAAAAGA'rC 1440 GGTTCGGATG 1500 GGTCAGTGGA 1560 CGAGTGCGCC 1620 ATTAA.AGATA 1680 ACGG=TGGG 1740 CCAGAAGAGG 1800 CCTGAAGGAG 1860 TGCCAGATGT AACGC-ATGTG GGTCAAATCT ACAGATTCGG TACAAAAACC TAGTCAACTr CGGCTTTAAC AGAGAAAACG ATGGAATCAA GAGTTATCGT CAGCTGCAGA TGAACGCCGT GACGTAGTGA AGATAATGGT ACAATCCAAA AGCTTCTGAC rTCAAGATCC TGAAACCAAA GAATC7TTTGG AATGTCT'rCA GACATATTCG AAACCGGCC TAACGGTAAC CCAAATAAAG ATTCCAGCAC TTrflCAAGAC AGATAAAGGA ACTrTGATCG CTCCATTCGA GTGACTGGGG TGATATCGGT ATGGTCATCA
AAAACTTGG
CCATCGATCG
CGAA'rTT
CAAAAAGAAG
CAAATCGGAG
AATCTCACTG
TTTAAACGCT
G'rGACCGAG'r
GTTCACCAGT
CTATCTATGA
AAGCCTACAA
GAGCTTATAC
ATCGCGTTG;T
GTGACCAATT
CCAAGGATAG
AACCATTACC A.AeTACGTG GAATATCGAT ATGTGTTGG CATGTTCCCA GAAGGGAAGG AAAAATCGAT GGAAAAACC? CATTCGAGAA AATGGTACTG TGTAGATCCT GTTAAACCAG
ATCAAATCCT
TCTATACACC
CCTATACCGA
CAACAAACAA
GTGATGACGA
ATTGATGAA
ACAAGGGACG
CTACCGTGAA GGAGAAAAGG AGATGGTAAG GCGACAGACT CAAGGGTGAT CTATACAAGG ?AACTTCTCCA TTTAGAATTG
ACTAGGAAAT
CTATCTATCG
TACTCCGATG
TGTACTrCCG ATC'rACTTCA
ATGTCCTACA
GTCAA6AGCCG
AATGGGCCTC
1920 1980 2040 2100 2160 .2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 CGGGAAGACA TGGTCAGCTC C'rCAAGA'rAT A'rTCTTGGGT GTAGGTCCTG GAACAGGAAT GA'rT-rrGATA CCGGTT'rATA CGACTAATAA TGTATCTCAC 'rTAGATGGCT CGCAATCTTC TCGTGCATC TATTCAGATG ATCATGGAAA AACTTGCCAT GCTGGAGAAG CGGTCAACGA TAACCTCAG GTAGACGGTC AAAAGATCCA CTCTTCTACG ATGAACAATA GACGTGCGCA AAATACAGA.A TCAACGGTGG TACAACTAAA CAATCCAGAT GT'TAAACTCT TrA'rGCGTGG 'TrGACTGGA GATCTTCAGG TTGCTACAAG TAAAGACCGA GGAGTGACTT GOGAGAAGGA 'rATCAAACGT TATCCACAGG TTAAAGATGT CTATCTTCAA ATGI'CTGCTA TCCATACGAT GCACGAAGGA AAAGAATACA GTGAAAATGG GATGGTCCAC ?PGGCACGTG AACACAATCC AATTCAAAAA GGAGAGTTTG GGGAGTATGG CATCTTAT GAACATACTG TTAGAAAATT TAATTIGGGAA TTTTTGAGCA AGAGAGATGG GCAAAGGAGA GArGGGCAAA GTATTGGTCA ACAAGGCTCC AACCCTTCAA ACCCAGTATG ATAGCAAGAC CTTCTGTTGTI ATTATTGGTA TAGCrAAAGG AAGCATCGAA GGTGCCAGAG T'rCCTGGCGG AGTAAATGGT TTTACAGGGG GAGTTAATGG TACAGAGCCA 842 TCA'rCCTCAG TAATGCAGGT GGACCGAAAC TCGAAGAAAA TGGTGAGTTG ACTTGGCTCA CCTATAATTC GCTCCAAGAA TTAGGAAATG AAAAAGGACA AAATGCCTAT ACCCTATCAT AAAATCTGA'r TTCTCCTACC GAAGCGAACT GGAG?1'ATTG GCTrGGAGTT CGACTC.AGAA ?TGGCAAATG GTAAAACAGC GACTTTCCTA GCAGTAGATA AGGAAGATAT CGGACAGGAA AGTATGCATA ATCTTCCTGT AAATCTAGCA AGCAAAGCAG CGGTGCATGA AGTTCCAGAA GCTGTTCATG AAATCGCAGA GTATAAGGGA TCTGATTCGC TTGTAACTCr TACTACAAAA AAAGATTATA CTTACAAAGC TCC'TCTTGCT 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4327 CAGCAGGCAC TTCCTGAAAC AGGAA.ACAAG GAGAGTGACC ACAGCTTrCT TCCTTGGTCT GTTTACGCTA GGGAAAAAGA TCTAAACATT TGAI-1rTGrA AAAATGGCTC TTTGTCAACT AAGCTCGAGA AAGGACAAAT TrTGCCTTT C~rTTTGAT GTTTTTTGAA GT'rTTTCAAAG TTCCGAAAAC CAAAGGCATT GATTATTGCT CGCTTCCAAT TTGGCGTTAG AATAGTGTAG TCTCTTrGTC CTTTAGAAAG GTrTTTAAAGA CAGTCTGAAA GATTGTCCTC AATGAGTCCG AAAAATr'rCT CCGGTTCCTT AGAGTTGATA GAGCTGATAG TGATG'rTTCA AGTCTTGTGA AAATCTCTTT ATTGGTTAAA TGCATACGAA AAGTAGGGCG
GTTTACG
INFORMATION FOR SEQ ID NO: 118: SEQUENCE CHARACTERISTICS: LENGTH: 3521 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
TCCTAGCTTC
GAGAACAATA
GTAGTGGGTT
ATTCAGAGCG
GCGCTTGATA
TTGAAGGGCG
AAGAGGATGA
ATTCTGAAAG
ACTAGGACTA
AGAGA.AGAAT
GAAGTCAGCT
ATAAAAATCC
AGTTTGATGA
TTGACGATTT
ACCTGCT'rTA
TGAAACAGCA
ATAGCTCAAA AC GTTTA ATA.AAAATGT TTATCGCTGA (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118: CTCTGGCCCT GCCACTCCAA CGTTGTCA GGG;TGCT'rT TTCATAAAGG AGTTCTTATG 843 TAATATCA AACGTAI-rCG TACAGATTTT GAAG-CTGTCG CA~uAAAATT AGCrACACCT GGTGTAGATG CTGCTGTCrr GAATGAAATG AAAGAAATCG ATOCTAAACG WCGTAACATC 'rrGGl'CAAGG TTGAAACTCr CAAAGCAGAA CGTAACACAG T7TCTGCTGA GAT 'GCCCAA GvCTAAGCGCA ACAAGGAAAA TACACATGAC AAGA71S=~ CCATG.CAAAA TCTATCTGCT GAGCTTAAAG CCTTGGATGC q'GAATTGGCA GAAATCG.A'IG CTAAATTGAC AGAATTTACA ACGACTCTTC CAAPLTATCCC AGCTGACAGC GTTCCrG~rG GGGCTGACGA AGACGACAAT 0000.
GTrGGAAGTTC GCCG71rGGGG TACTCCACGC GAG?1-rGACT TCGAACCTAA AGCrCACTGG GATCTCGGTC AAGACCTrGG TATCCTTGAC 'rGGGAACGCG GlrGGrAAGGT AACAGGCGCT CGCTTCCTCT TCTATAAAGG CCTcCGTGCr CGTTTGGAAC GTGCTATCTA CAACTTTATG TTGGATGAAC ATGGAAAAGA AGGCTATACr GAACTCATCA CACCTTACAT AGTCAACCA'r CATTCTATGT TTCGTACrGG TCAGTATCCA AAAT'rTAAGG AAGATACTTr TGAACTCAGC GATACCAACT rrGTCTTGAT TCCAACTGCT GAAGrCCTC TGACAAACTA CTACCGTGA'r GAAATCTTAG ACG~CZAAAGA TCTTCCAATC TACrrCAC2'G CCATGAGTCC GTCATTCCGT TrCTCGGCTG GTrCTGCCGG TCGTGATACG CGTGGCTTGA TCCC1rGCA CCAATrCCAC AAGGrTGAAA TGCTCAAATT TCCCAAACCA GAAGAA'rCTT ACGAAGAA'rr GGAAAAAATG ACAGCCAACG CTGAAAACAT 'rCTTCAAAAA CTCAACCTrC CATACCGTGT CG'1-rGCTCTC TCTACTGGAG ATATGGGCTT CTCAGCTGCG AAGACTTACG ACTTGG.AAGT GrGGATrCCA GCACAAAACA ATTACCCTGA AATCTCAAGC TG3TTCAAACA CAGAAGATTT CCAAGCCCGT CTCCCAAA TCCGTrACCG TGATGAAGCA GATGGCAACG TGAkAACTCCT TCATACCTTG AACGGTTCTG GACTTCCAGT TG.GACGTACA GTGrGCTGCAA TTC~rGAAAA TTACCAAAAT GAAGATGGT'r CTGTGACCAT CCCAGAAGCA CTTCGTCCAT ACATrGGGTGG AGCTGAACTC ATCAAACCAT AAAAAATAAG GTrTAGCTAT TTCTAGCTAG ACCTTTTTC GTAACCAAAT CAGATAAGCA CCTAGTACAA AGAATAAAAT AGTTAGGCA'r ATAATGGTr'r C-ACCCAATAC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 CAGGTAATCC AGAAATCGAA GT7TCAAAAT TCCCTGACCC GATAATCGT GGGAAGGTGA CGGCTGAGAA GGCTGGTTGA GGGCAGACGA GTTAAAACAA AGAAAAAGAA GGATTGAGAA GACCCAAGTC GGCACGCTGG TTCCTCCTAC TCGAACTIAGA AGGMJCACAG TAGATTCCTT CrGTCCAAG CAAGGCTACT ATCGCTATAA ATAAGGGGAT AGACATAGAA CGTCAAGAGA A'rCTTGAGCG AAACC7"=T
CCCAAAATCA
AGGTCGCTGT
TTAAATrr
TGACAATCAA
GAAGCCAACA GTAGAGAGAA GGGAGTGGAT GTrrCTTTAA AA.ACCAAAAC TCAAGGTCC 844 ATAGGCAA'rr TCGATAATAC CTACCAGAGG ATAGGTCAAG GCAGCCACTG CTATCCCCAC ATAGAGAACC CTCCAGCTTG GA~rGGCATG AACCCTCCGC CC1'GGACAAG CAAACTTGAT GGTAAAACCA GCAAT~CAAGG TCAAATCCAA GAGAAATGAA AACCACCAAA TCCCTTGTGC TACCAAAGGA AGATAAGAGA ATACGCGAAA GACATAGGTC GATAAAATCA TCCCAGCCAT ACGAAGGTT CC-ATTCCTG ACAAAAGAGG GGGCT-rGGTC AATrCTGCT TGCTTTrcszr CCAATTAAAG. AGATGCAGAA 'rTAGAAAGTA AAT1CCATAAA ACCAAACCAA TCAGACTAAA AAGATGGGAT AGAACCGGCA ACGTATCTAA AATAAGATTT CCAGCTCCTIC CCAAACCTAG CAAACAACCT GAAAATACTA AGGGCAG=r TTTCATCCTA ACCTCCAATA ATCATGTTAG 7"TTCAGTATA ACATAAAAGC GCTTAAATGA GGATTTAAAA AAACGAGTCC 0CTTA~rrcA GACTTCATr'r TACTCAGATA Ir.AATrAGGC ATAACGTTGC AAITTCTGGAT TAATTGGTGT ATTAGCTAAG TTCTTGGCAT AGTTACAGAG GATTGCTAGG CTGACACCAA AAACCACATC CAAGGCATTT TCTGAGTGT AGCCAGCTTC TAAAAACTCA GACAAGGCTT CATCTCCTAC ACGACCCTTG GTAT'rGATAA CTGCCAAGGT AAACTTAGCT ACGGTATCCA ATTTAGGATC
*5 b 55 S S TGT'N'CAATT GGAGTAcG.AT TGCGAAGAGC TTGAATCAAG TTTGATGGAA AAGGCTGTGT CGTGATTTGC ACCACTrCAC GACAATTTGG TAGGCT'rCrA AATATAGCCA TTGTXCGTCT'r TGACTCTACT GITATGGATAG CTAATATTAT TGGAAAATCT TATTCTCTCA TAAGATTTTC ATATATATTA TATATCAGGC GACCTGCGAC ACAGAAGGCA GCTCAACGGG TGTCAGGCTG AAACAGTCCG GGCATTGGCC TTTCTACTGT T'rCAAGAATT TAAAINTTGT CATAAGATAC TATAAAATCC TGATrCCTAA G7'rCTTATAT TAGTTTA'rCA TCATCATTCA TCTGCGATTTG CAACCATTGG TCACGGCTGC TTGCGACGGT GGATAGATGA AAGACACCGA TTAGGTTGGG TCrTTCACTT CrGCrGGTGC CTCTTTTCTT ATrATTGACA GTTTATCTAA GATAAAGCTT CACrrCCAAT CACTTGTATA 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3521 TGATAAAAAT TATTTATAGG CAAAA.AAATC ACACGAGCTG *4 S. S
TGTGATTCCA TTA?1-rGTCA AAATACTTTT 'rAGTTrCAGC AATAACGACT CCAAGAGGGC AATCAAGTTT GGCAGAGCCA TCAAGGCGTT AACGATATCT AGACCATATC CAACTCGATA AATCCTCCTA ACAAGACCAT GAGCACAAAA
GGCCACAAGA
GCGATAATCC
ACCACACCOT
AGAGCCAGAT AAAGCGAACC CCAAAGAGGA ACTCAAAACA GCC7TCTCCG TAATAGTrCC AACCTAGAAT CG7TGTAAAG GCAAAAAGTA CAAGGAAGAT GGTCAAGAGA GCAGGCCCAA AGTGSTGAAAA GTTTGTTGAG AAAGCTGACT GACTCAAGGC AACCCCATTC AAGTCACCGC TCCAAACTCC AmrACCAAG ATGGTCAAAC CAGTTAGACT A
INFORMATION FOR SEQ ID NO: 119: (i) SEQUENCE CHARACTERISTICS: LENGTH: 1968 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119: AACCTGGCA AGCAAGCTAA AAGCAATGGG ACCTGGAATC CTAATGGCAA CTGCCGCTGT TGGAGGTTCC CACATrGTAT CCTCAACTCA AGCTGGCGGT T1CTTACGGTP GGTCTCTACT TCTCTGGTC ATCTTACCCA ATCTCTTI'AA ATATCCATTT TTCCG7r'= GTGCTGAATA CACAGCTGAT ACTGGAAAGA CTTTGGTTGA AGGTTATGCC GAAAAAGGAA ACTCTATC'r CTGGA'C TATCCTCA ATG'rCTTTT-C GGCTATGGTC AACACGGCTG GTGTTGCCAT TCTGTGCTCA GCTATCATCG CCAGTGCCTT CCCAATGATT GGACTTAGCA TTACTCAGTG GTCCCTCA'rT CTCGTrGCA-A TCATTTGGGC TATGCACTC TI'TGGAGGCT ACAAACTT AGACGGCATG GTCAAATGGA TTATGTCTGC CTTAACCATT GCGACTGTTC TTCCAG-TAT CATTGCGGCG GTCAAGCATC CAGAATACAG TTCTGATT= GTCG-AGAAGA C-ACCTTGGCA AATGGCAGCT CTGCCCTTrCA TCGTCTCCCT CCTAGGATGG ATGCCGGCTC CTATTGAAAT TTCACCATC AATTC-ACTTT GGTCAGCTGA AAAGAGAAAG ACCGTCAACTr TrAACACAGA AGACGCTCTG TrGACTTTA ACACTGGTTA TATTGGAACA GCTATCCrAG CCGTCT.TCTT TGT4GGCACTG GGAGCACTGA TTCAGTATCC TACAGGGCAG GCGGTTGAAG CTG=TCAGC CAAATACATC TCTCAATTCG TGGGCATGTA TGCCTCTGTT CTTGCCGAATGGTCCCGTTA CTTGATTACC TTTATTCCCT TCCTCTGTAT CTTrGAACA GTTATAACrG TTATCGATGG CTATTCTCGC CTTAATCAGG AATCrCTCCG ACTGCTAATC AGTCAAAAAG ACGACAATCG TAAATC"r AACATCTGGA TGACCATCAC TGCTATCATC GGTA'rCGTCA TTATCAAGTT CTTCGCTGGT CAGGTTTCAA CCATGCTCCG ClrTGCATC ATTGGCTCTT TCCTGACAAC ACCrrTCTTr GCTCPTTGA AMTACGCCTT GGTAACGCGT GAAAACAAAA ATC?1'CCTTC ?TGGCTCAAA CACCTTrGCCA TTGCGGGATT GA'rTTTCCTC 7"rTGCTTCGC CATCTTCT ATCTACGCAC TCGCAATCGG AAAAGCAGGG TAAGGGACAA GCCCGACATG AAGATAAGGT TTCATTTCAA GAGAAAATTC AGCAAATArr TCTATGATAA AAAGCATAAG AACAAGGTTT TCAAGACCTG XACTTATGCT TTTTTACG;TT CTTAAAGACT GITATAcTC AAAAAAC-AGT TCAAC.AACTT CAACCACCTC TTATAAGAAC TTTATACTAT
TCGAGAATCT CTTCAAACCA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 846 CGTCAGCTCTr ATCTGCAACC TCAAAGCTGT GCT TTGAGCA ACC'TCGACr AGCTrCCTAC 1500 TTTGCTCTTr GATTTT-CATr GAGTATrAAT TCTCCTTTTC CAACTCATAC AAATCTGCGA 1560 TAATAGCTGC GACATGTI'TG ATATCTTCCA GCATGCCTCG CATTTCAAAG TCAGCCAATA 1620 CAGGGAAGCC AAAGCGTTGA CTGTATrGCT TGGCTGTTAG GCAGTATTGG TTkI-rAAAGT 1680 TACGATTTCC TGACCCAACC ACACCAAAAC ACTrACTAGC ATTGTTACCA TAGG.CAATAA 1740 AATCTCCCAC CGGTGTCGTC AAAATCTCAA CATCTCCGTr ATCCACGCCA TTCCCACCTr 1800 CCAGATAGGT CGGCAAAAAA GCGACATAGG GATGGTCCAT TCATAGAAA TTTTrGCCTr 1860 CCTTGACCAA ATCCTTGATA TGATCTr'rT GAACCTCAAT CCCT".TGTAC TGGGACAAGA 1920 GATATCTTr CAAGCGCGTC ACAAAACTTr CAGTGTTGCC AC'rCAAQG 1968 INFORMATION FOR SEQ ID NO: 120: SEQUENCE CHARACTERISTICS: LENGTH: 7172 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120: *CCCCAT T-TT TATCACTAGA CTCGAGACAT CN'TTCAGTG GCTCPTGCTC TCTGGTTTAA *TTTTCTTCCT TGCTCAAGGA CTCCTGCTAT TrCTCTTGGT CGTCCGACTC AAACATCAAT 120 TCGCTGAGAT T'TATCCTCAA ATCAATAAAA AGATTCGCTT CTACTATTrA GGGGTTrCTCA CCATTGATTT TCTA-rrrTT GTTCTCTTAG CCTTCATTAG TTCTCAGCCT TITCATCTC 240 *TTATGCCAAT CATCACTGCT TGCCATTCTA CTT'rTATTA. 
TATGACAGCT GACTACCTAA 300 GAGAAAACTA TCCAGACTrT TACCACAAAC ACATCTC'rTr ATGGGAGTGT CTCTAAAGAA 360 **.AAGGAGGTTT TAGCATGAAA AAAATCATCT TCATCAAAAC CAT'TCAACTC CTTGTCAP'rG 420 *ATGGAATCAT GCTGGCATTr TTGACATTTA AAAGGGGGCT TACrTrGGGAC TGGATTTTGA 480 TTTATAGCGG TTGGCTCATT TTCN'TCATC CTGTGCTATT GACCrATCTT TCAAACCAAC 540 T'N'GTGACCA CTTI'AGTrAA CTCTATTCCC AGATTAGACC GAGATTCTGG CGTTTTGCTr 600 TACAAATTCT CCTATGGGAT AGCCTGATGA 'rTCTCTCCTT GGTGTCTTTA AGTGATATITC 660 CACTTrTCCT TCAGGGAACT CTCCTCATCC TAGGACATCT CATCCCTTCC TATCGCATCT 720 GCCAAAGCCT-CAAAAGAGAC TTCCCCCAAG CATATCAAGA ACCGATTTrCT TTTTGCAGTA 780 TTTTATGATA GATGAGAAAG ACCAAGCCGA CTGGGCTTGG TCTT'rCTTAT CTCTTTTAG 840 TATCTAGGAT AATGCTAACA GTCCATTAT TAACCAGCTC AACCTGCATA 7CTGCTCCAA 900 847 AGATGCCTGT CTGAACGGCC ALrCrGCG CTAA~rTN'G A'rrGAAAGCA TCATAGAAGT C1'GATGCCAT ATCAGGTTTA CCTGCCCCTG TAAAGgCTGG ACGATTIGCCT CTCTTAGTAT CCGCAAAGAG GGTAAACTCA GAAATAGAGA GGA=~CTCC 7rrCAATATCT TTGA~CA GGI-rfCTCT'r CCCITCT=C TCTGAAAAAA TCCGCATA'rT GACCAGrrTT CTCACAGCAT AGTCCAAATrC TTLCCCIr TCCTCTGGTC CAACACCAAC CAGCAATAAA AGTCCCTGAT TGATrTr~rCC CTCAATCTGG CCTTCTATAC TCACrrGGGC ?TTT1AACC CGTTGGATAA TCATT'r'rCAT AATAGC~C 1 CTACT1AAGAG CTAGGACAAC TAGCCGTTV CCGrr AGAGTAAACT 7NLTCGCACAC GCCAATACCG AAGgACACAT GACCGTTGAA ATATTCTTGG "GTACGGTTG AGACCGTAGA CTGGTCrTCC CATTCCACA' CATACACTCC ACACGGTGAA GTCACCAGGC ACGGGG= AC TC'TTAATTTT ATrCGACAACC GTGTCACTG TAGAGAGGTT GGATATTAGC AAACTTCATA TCCrGGTTG GTTrGGGCATT TTG'TATTTGA AAGAACTTCC AGTACA'rCGT TCAACAGTCC TA'rCCATATG GGCCA'rATAC TCCTTA71"rrG AGCTAGGGTA CAAGGAGACG TTGCTCCTAG 7r'rcTTGMGG CACCCAGG-r TAGCCACACC ACGACCCTTG GTAArG'rAGC CAACAATATC AACACTTAGC AATCCGCACT AGGAGACCAG AAGCACCTTC AATAACCAC'r CCCCCCTCAT GCTTGACCTT GAGGG'rTTCT 7TrTTCAA CCTGACCTC GCCACCTrrG ACAAGCTrCC'P CTGCCTCAGC TTGGCCTTG CCACGCTCTT CCTCACGGCG TTCCTTTCA GTCAGACGGT TAAAGACGGT AATCGCACCG ATT'TCCCCAA AACCAATGCC CGCAAAGAGG GAG'rCTrCTG TCTTGTAACT GGCrCTTrrGC AGAACTI'GAT CCATGTGGCG CrGTCCATA AATTTATTG CCACATAGCC ATTTTCTTGG AACTGAGCCA 
TCAG-CATCTC ACGACCCTTG TTGACAGACA ATTCCTTATC TTGGTrT'I1A AAGAACTGGC GAATCTTATr GCGCGCCT'rG CTACTCTTGA CCATATTGAG CCAGTCACGG CTAGGTCCAA AGGAGTTrCG G~wrGGCGATA ATTTCAACCT GATCCCCTGT CTTTAAC1-rG GTTGTCZAGTG GAACCATGCG GCCATTGACC TTGGCACCAG TTGCTrTTC ACCCACCTTG GTATGGAT'rr CCGTAGGCAAA ATCAATCGGT CCTGAATCT'r TGGGAAGGGA ACGGACAGCT CCATCTGGGG TAAAAACGrA AATCTCCTCA GCCAAATAGT TTTCCTTAAC AGAGTCCACA AATTCCTTAG CATCATCAGC CTGGTCTTGG AGCTCCATCA TCTCCTrGAT CCAGTrCATrr CCAATACCTG ArCCTrGCT GTTAACTTGC CCCTTTATAC CTrrCTrATA AGCCCAGTGA GCCGCAACCC CGTACTCAGC CACCTCGTC 7-TTTCCTTGG T'rAATCTG GAATTCAATC G~cCCcr'1'T GTCCATAAAC AGTCGTATGG ATAGACTGAT AACCATTGGC CTGCGGTTG GCGATATAGT C71rGAAGCG 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 848 ACCTGGCATC GGTTTCCAAA ATTCATGCAC GflAACCAACC ATGGCATAAA CA'rCACTI'rG GGTATCTAAA ATACAACGAA TAGCAATCAG ATCAThAGAT TCCTCAAACC G=r-rCTCTT GTCCTCCATT ?rGCGGAAAA 7rCAGAAATr ATGC?1'GGGA CGACCATAAA TCTTCCCTT CAAGTGACGT TCTGTCGrAT AC'TCCrCTAA TTTTGTGACT ACCTCA'rCCA CC.AAGGCCrC ACGCTCCCTC CGCTrCC'r TCATCA5TATG ATAACGGAAA GACAAGTCrr CTAA'?TCCCA AAGCGGGGCA TAGATTTCCA TG-GTTTCT A'rGIrCAGG GTCCOCATAT TGCCAACCG GTCCTCAGAC ATGGCCATCA CCATCT'rGCG 'rT-GTACTCG ACCTTGCCAA GC 1GGTAAC AAACTCTCTT TCCAAATCGT CCAAACI'CC TCCACAAGCT ACTGTTACAG CATCCAGCr GarAATCTTG
TTGACACTG
GAAATACGC
GTCACACAGT
ATGATTTTCC
rCCCTCALACA
ATCTGTATCT
TAGCTTrAGCT TAAAACTCCG TTGGAI'GAG CAAArCCCCA AAC.A'rGGCc TCCTGCC=TTG CTTTT'CGAAG rTTACCAAAA TAACGCGGAT GC.AATTGCT CCTCGATCGA ATCATCCGCA CATCAGGACC
TCCACCACAT
AAAATACCTG
CCACTr~GCc AGGGTGAATG ATATAAGGCT CGCCrGA=? GCCATATTGA ATAGACCAAG CCrTATCA CAAuAATGAAC ATCCTCTTCC GTTAAATATT AGCGACAACT TCT'rCGCCTG TTAAATTCAC 'rrCTrTICCCC ATCTCTACTC
CATGCAAGAA
CCACTTGGAT
ATTCAACAC
CTTGGTrAA
TCCAATTCTT
ATAAGAAAA.A
ACGAAGTTAG
GTAACCCAAA
2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 CCTACCATTT TATCAC?11' AATAATTCAA AATTGCTTGA AT=TAAACT TAGGTCATAG GACTTTCTGA CTTCCCCAAT ACATACACTG G'TACC'TTGAC ACTAGGAAAC TAGCCGCAGG GCGGTTTTGAA GAGATTTrG AGTT'rATAAG GAGAATACCT 'rAGGAGTIrG CTGCCTGTTC TTAAGA-ATAT GAAAACTAGA TTGGAACAGA TAAT'rCTGAA ?TATTGGTCC GTAATATACT
AAGCAGAGAT
TCCATTGAAG
TGGATTT'rGG
TTACT'CAAAG
AAGAGTATAA
ATGAAAAAAA
CTATTCAGCC
AGAAGAACGG AAACCATAT ATACGAAAGA TAAACGGTGG AACTCGTATC AATTAATACT AAATGAAAAT CAAACAGCAA CACCGCN'TG AGGTT-,GCAGA TAAAGTTGAC AAATCCTCAA GATACrrTC TCTATCCT'rr CrATGCTAAC GACCAGAACC TGCCCAGTAG AGAAGCCAAC TACGOTAGCA CTCTAAACAT TTATGjCAACC TATGAAGCGC AATTGACACC CCC'1rATTTC
AGCAAAGCTA
GATAACTAT
CACGTCTATC
TTTCCAATAT
TATC-TTrGCT CTCCTAATGT 'rAAAAAACAG TCGTTCAATA AGTGAATATA GCGACAAATG AAACAAGCTA ATGACGGAGT TrGATAAATTG ATAACCAGCC TGTCTTCATC -2kGTCATCCT GGT7rrAAG TTCATI'TAA ATCCTTACCT A7rCTCCTA ACTGTGCTAT AC7TAATTTA TACTCAATGA AAATCAAAGA GCAAACTAGA AACCTAGCCG CAGGCTGTTC AAAGCACTGC TTTGMGTG CAGATAAAGT TGACGCGGTT TGAAGAGATT TTCGAAGACGT ArrAGTACAT TCI-IrGAGAT TCGAGCTACGT ATGAAAATCC ATA.AAACCG;T CAATCCTCVr GCCTATGA'AA ATACCTATTA TCTAGAAGGC GAAAAGCACC TCATCGTCGT CGATCCTGGT AGrCATTGGG AAGCCA'rTCG TCAGACAATC GAGAAGATCA ACAAACCGAT CTGTGCTAT'r CTCTTGACCC ACGCCCATTA TGACCATATC A'rGACTCTGG ACTTGGT.PCC CCAGACCTTT GGCAATCC'rC CTG'rCTATAT CGCAGAGAGC GAAGCCAGCT 'rCCTGTCGAT AATCTCTCCG GTCCCCTCG CC-ACGATGAT ATGGCAGATG ACCTGCACAA CACACC7,rT TCTTTCACGA ACAATACCAA CTAGAGGAAT GTCTACCG ACCCCAGGGC ACTCTATCGG TGGTG=rCC CTAGTCrC TCTAGTC1-rG ACGGGAGATG CTCI'ATTCCG CGAAACTA'rC GGACCGACCC TGGTAGCATG GAGCAACTCC TTCATACTAT CCAGACCCAA CTCTTCACCC
GGCTCTACAC
TGGTCACAAA
TTCGTTTAA
CTATCTCA
ACC -rCCCAC
TACCAAACTA
CGATGTCTAT
CT'rT'rCI'AG 'rTACAAAAGG AGATrM
CCAGGACATG
CAAGATGATG
CATCCTATCA
CATTACCAAA
GTCCAGCTAC TACTATCGCT CACGAAAAGG CCrTCAATcC ACAATCGAAA TTTAACTrAAA CTATCCAGCA AATCTTTCTA AGG'TCAC ACATGATTGG ATGCCTTTTT TCTG-ATGACT TAATCACGCG CTCCTCTGGT GAACGCCACA TrCCCTCTCC TAAAGAAATC GTCGAGTTT GGTACTGCA CATTGACACG CATCGACGCT ACCCAAAAGT TrCATAAATT CTGGTCGACC CGAAGTTGTA GAAGAACICT TCTGCTGAGA AGTCTGCTrC CTGCTGCGAT TrCrCCCAAG TCAGCCACGT TTTCTGATAC TTCTTGACA TCATAGG GAG=rG4GCT GTGCGTGCA T'rTCATGC~C CAGATGCGAC TCTCTrAGCT GCTTCAAGCG 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 AGTCAAT'rTA CCGTTAATGG TTGCTCCZATA AGAATCCTCT CCATCAAATT GGTCAATGAC TTTTGiTGTT TTCTCCTTG;A AGGcAGc-ATA GTCGCTCTCT GTCCACCAAT CCI'TGAGGCT ACCATTTTG TCAAAGGAAG CCCCGTTAGT ATCAAAGGCG TGGGAAAT=r CATCGCAAT CACTGCCCCA ATACCACCGT ACGrAGCACA AGATGACTGA TGCAACTCAT AGAAAGC.Crc CTGTAAAATG GCCGC~rGAA AGACAATCAG G7"rCTTrcA GGATTGTAGT AGGCA'rrGAC CATATGAGCA GGCATGCCCC ATTCCTTATA ATCTACAGGC TGc=rCCACT TACTCCAACT GTGCT"?GATT TCCACACCCG CAAAGGCTAG AGCATTCTCA AAAAGACTGG CAGTTTCATT CACTACCI'TA TCC7rGTAAC GTGCAGGCAA TrTlCI=GGA TACCCAATAT AACGGI=GAT CACATTGAGC TTCACGA'rAG CCTGTTrACA GG~rrCTCGA GTGAGCCACT CATTCTTAAG CAGACGCTCC-7TATAAACAT CAATCATGCT TGccTTrr TTcI'ccAcAT CCGccTTGGC TTCTGGAGAG AACTTCTCAC GGGCGTACCA AAGACCCAGG GCT7VCTTGA AAGcGlrcTTG 850 TGCTAGATGA. TAAGCTGCTT TGACCTTATC 'TlrTrCCTCT GGAACTCCAG AAAGGGCACG GCTGTAGGCA CCAGAcAAAA CACGGATATc CTCTGTPrAAA TAGCTGGTr AAAGATTGAc AACACTCAAA ATCAAGTTG CTTTAAGGAG AGAccAcGCT- TCCTCACTGT AGAATTGCTC TGCTGCTTGC CAGAAACG'PT CCTCGTCTAC AATAACCITT TCTGCTAA'rr GCCCAATAAC TGCTTTGAAG AAGTCATCCA AAGGTAGGGC AGGCGCGAAT TTCTTGAAAT CTTCGTAAGA ATATGGATGA TAGAGTTTAC CATATTCTGA ACTrTCTTCA TTAGAGAGCA CCACTGCCC AACTCGGCGG TCCAATTC.AA GTCTTrT-C TAGCAAGTCT TCAA--TTCTr CATCAGAGAA 00.00.
ATCATAAGCC, TTGAGGAGAT TTGCGCTCCT TTCT'TCCAA AGAGTrCAAGA CTGAGGATGT TCTTC1'GCAT AGTAGGTCGT ATCTGGCA.AG ATTGTCC"1"G CCATAGAACA TTGATTCTAG CATCCA'rAAA GTCTGGCGAT AC-ACCAAAAG TCGTTTCCT GC.AAGCTCAA ACTCTGCTAG TTTAGCTGTA AAATCCGCAA TTCTTGGA.AT TC?1TTAAGGA GTGCTAAGAC AGGTGTGATA CCGTCAGCTT AAAATCACGA ACTAGGCGG'r GG'TATTrTGAC AAAGT'rTTCC AAGATAGCAT TTCTTCACCT GCrAACCACT TGTCTGTrGT CGCCAGCATC AGGTCI-rCAA TAAATCAACA AAACCTCCTG TPTTGAGAC'rT ATCTGCTGGG AlwrrCAGCTC TTCTCCATTG ATAGCATCAT AAAAATCATC TrCATAACGT GTCATCTTGT ATTrGTATTT GCATTTATCT TAACAAAAAT CC INFORMATION FOR SEQ ID NO: 121: SEQUENCE CHARACTERISTICS: LENGTH: 4518 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
GCTCTTCGCG
GAGCGCTAGC
GAAGGAAGTT
AAGTCTCCAA
CTCTCTTGTC
CCTCAGGCAC
TTTCCTGGTC
TCTGTTGCCA
TC!TCGCTTTC
6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7172 120 180 240 300 360 420 (xi) SEQUENCE DESCRIPTION: SEQ ID*NO: 121: CGGAAGTTA TCGATCTAG ACTTCGTTCC TGTACAGCTA CTTTCTCAGG TGGTCTTGTT GTTTGTATGA GTTTGTI'TAG AGAGGATC?1' TCTATGTCTT TCTTTCTTAT T?1'TGTrTrTA TATCTTTC TGA'ITrC~rA TCTAATTTAT GCTrATTTCA GACTAAAAAG GAAATACCGA GTA62ATGAAT AGCAAGGTTC TAI3GTCTTCA GATTGATTTT TAGCACTCTT GATAAA.AGAG TGCTAATTT_?rTGAGTTTTT GTCTTGACAT TCTCTTCTAA GGGATAA TAGAATCATG AGTrAGCACT TGGAPGCATr GAGTGCTAAT TGATCAGACA GAGAGGAGTG ATGAGATGGT TACAGAGCGT CAGCACGATA TTTIA.AATCT GATTATTGAC ATCTTTACCA AAACCCACGA 851 ACCTGTCGGA TCAAAAGCCT TGCAAG2AM~C TATTAACTCT AGCAGTGCAA CCATCGTAA~ TGAcAT~ccG GAAcTAGAAA AAcAAcdrT GcITcGAAG GCTcATAcTT cAAGTGGrcc GATGCCAAGT G~rCTGTTrrCAGTACTA TCTGAAACAC TCACTG.GA'r 'TTGACCGGCT GGCTGAAAAT GAG.GrATATC AGATTGTCAA AGCCrT-GAT CAGGAATrCT TCAAATTGGA GrATITCTG CAAGCCI' CTAACTTACr AACAAGACCTG AGTGGCTGTA CGGTAGTGC ACTGGATGT'7 GAGCCG.AGCA GGCAACGr GACAGCCTTT GATATCGTTC. T7=rGGGCA ACATACAGCC TTGGCGGTAT TTACCCTAGA CGAGTCGCGA ACGTACTA GTCAGTCTr GATTCCAAGG AACTTCTTGC AGGACGATrr GCTGAAACTG AAGAGCATCA TTCAGGAACG TTT'CCTCGGT CACACCGT'N' TAGATATTCA CTACAAGArr CGGACGGAGA TTCC~cAGAT TATCCAGCGT TACTTTACA.A CAACGGATAA 'rGTCATCGAT CTC?-TGAAC GGAAATGTTC AACGAAAACA TTGTGATGGC GCGCAAGGTC CATCTCTTGA TCTAGCAGCC TATCAG-rCT TTGACCA.ACC GCAAAAGGTG GCCTTGGAGA GTTGCCTGAG CATCAGATGC AAAATGTTCG TGTGCAGAC GGTCAAGACT 'IGACCTAG GTAA'rCAGTA GTAAGTTCCT CATTCCTTAT CCGGAGTTG CATTATCGGT CCAGTAATC 'rGGATTACCA ACAGCTAATC AATCAACCA CCGTGTTTTG ACCATGAAGT TGACAGATTT TTACCGCTAC CTCACC-AGrA AGTACAT'rAA GA'rTGAAATC AT'rAAAGGAG GCGAACATCG CCCAAGATAT GAAGTAGAAG AAGTTCAAGA AGAGGAAGTTr GTGAAAACAG CTGAAGAAAC AAGTCTGAGT TGGACTGCC AAATGAACGr GCAGA'rGACT TCCAA.AACAA GCTCATGCAG AAATGCAAAA TA'rCCAACGC CGTGCCAATG AAGAACGTCA CGTTATCGTA GCCAGGACT1T GGCAAAAGCA ATCTTACCAT CTCTTGACAA GCACPGCAG TTGAAGGTTI' GACAGATGAT GTGAAGAAGG CTGGAT ACCTTGA'rC ACGCTrTTAA AGAAGAAGGA ATTGAAGAAA TCGCAGCAGA ACATCrAA
ATTTCCCAA
TTCCTGAGGG
CCTGT'rTAGC GAATTCrAGC
ATGTGGTC.AA
ATCA7TTACGA
AAAAAATGAA
AAC'TCCTGAA
ATATCTTCGC
AAACTTGCAA
0CCTGAGCG?
GGTGCAAGAA
TGGCGAAT'r 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 GACCATAACT ACCATATGGC CATCCAAACT CTCCCAGCAG ACG-ATGAACA CCCAGTAGAT ACCATCGCTC AACTCTTTCA AAAAGGCTAC AAACTCCATG ACCGCATCCT ACGCCC-AGCA ATGGTAGTGG3 TGTATAACTA AGATATAAAG CCCGTAAAAA CCTCGCAC;TA AAAATAGGAG ATTGACGAAC 'rcTrCGA'rGA ACACAAGAAA ATCTATC1-rr TrrACTCAGA GCTTAGGGCG TGTTCGATrc GGCAATrcTG ACGGTAGCTA AAGCAACTCG TCAGAAAACG GCAATC GCTA TGGCCTTTGC CTAGCTTCCr TACTAACTCG TCGTCGAAAT AAAATCG.ATT TCGACTCCTC 1860 1920 1980 2040 2100 2160 852 GTGTCCCAAT rrACATAATA GAAAACTTGT CCGAAACGAC AATAAACTAT GAAGAAAGAT AAAATATGTTr TGGC7==GA ATAGTGAGCG AAGCGAACCA AACACGATAC TCI'TCGCCGT GGCGCTA~rr GCGCAAATTT TGAGACCrrA GGCTCAAAGT TTAGTCAAAG AGATTGACGA AGTCAAGCTC TGACGGCGTC GCCACI'CTCG CCACTTAAGA AGAGTATCAA AAAGAAAAAT AGAAATAA c1'AAcAPJ GA GAAAAAcAcA TG~cTAAAAT TATcCGTArr GAcTTAGGTA CAACAAACTC AGCAGTTGCA C TCrGAAG GAACTGAAAG CAAAATCATC GCAAACCCAG AAGGAAACCG CACAACTCCA TCTGTAGTCP CArrCAAAAA CGGAGAAATC ATCCTTGGTG ATCCTGCAAA ACGTCAAGCA CTTACAAACC CAGATACACT TATCTCTATC AAATCTAAGA TGGCGAACTTrC TGAAAAAGT'r TCTrGCAAATG GAAAAGAATA CACTCCACAA GAAATCTCAG CTATGATCCT TCAATACTTG AAAGGCTACG CTGAAGACTA CCTTGCG AAAGTAACCA AAGCTGT'rAT CACAGTTCCG CTGGTAAAAT TGCTGGTCN' TTGCTTATGG TTTGGACAAG GTGGTACATT CGACGTCTCT GCTTACTTCA ACCACGCTCA ACGTC.AAGCA ACAAAAGACG GAACTAGAAC GTATN'G'NAA CGAACCAACT GCAGCAGCTC ACTGACAAAG AAGAAAAAAT C7TGGTAIr~n GACC~rGGTG ATCCTTGAAT TGGGTGACGG TGTCTTCGAC GTATTGTCAA CTGCAGGGGA CAACAAACTr C.GTGGTGACG AC7"TTGACCA AAAAATCATT GACCACwrcG TAGCAGAATT CAAGAAAGAA AACGGTATCG ACTTGTCTAC TGACAAGATG GCAATGCAAC 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 GTTTGAAAGA TGCGGCTGAA AAAGCGAAGA AAGACCTTTC TCACCT-.GCC AlwTTATCACT GCAG.GTGG CTGGACCTC? 
TGGTGTAACT TCAACACAAA TCAClwrGGAA A'rGACTrrrGA CTCC'TGCGAA ATTTGACGAT TTGACTCGTG ACC~rG'rTGA ACGTACAAAA CTCCAGTC GTCAAGCCCT TTCAGATGCA GG7"rTGAGCT TGTCAGAAAT CCACGAAGTT ATCCTTGTTG GTGGTTCAAC TCGTATCCC- GCCGrGTTG AAGCTGTTAA AGCTGAAACT GCTAAAGAAC CAAACAAATC AGTAAACCCT GATCAAL3TAG TTGCTATGGG 'rGCCGCTATC CAAGGTGGTC TGATTACTGG TGATGTCAAG GACGTrGTC TTCTTGATGT AACGCCA'rrc TCACTTGGTA TCGAAACAAT GGGTGGAGTA TTTACAAAAC 'PTATCGATCG CAACACrAC-A ATCCCAACAT CTAAATCACA AGTCTTCTCA ACAGCAGCAG ACAACCAACC AGCCGTTGAT ATCCACGTTC 'rrCAAGGTGA ACGCCCAATG GCAGCAGATA ACAAGACTCT TGGACGCTTC CAATTCACTG ATATCCCAGC TGCACCTCGT GGAAMTCCTC AAATCGAAGT AACATTTGAC ATCGACAAGA ACGGTATCGGT GCTGTTAAG GCCAAAGACC TrGGAACTCA AAAACAACAA ACTATTGTCA TCCAATCGAA CTCAGGTTTG ACTGACGAAG AAATCGACCC CATGATGAAA GATGCACAAC CAAACGCTGA AGCCGATAAG AAACGTAAAG AAGAAGTAGA CCTTCGTAAT GAAGTAGACC AAZCAATCIP TGCGACTGAA AAGACAATCA AGGAAACTGA AGGTAAAGGC TTCGACGCAG 4020 AACGTGACGC TGCCCAAGC? GCCCF GATG ACCTTAAGAA AGCTCAAGAA GACAACAACr 4080 TGGACGACAT GAAAACAAAA CTTGAAGCAT 'PGAACGAAAA AGCTCAAGGA CTTrGTrA 4140 AACTCTACGA ACAAGCCGCA GCAGCGCAAC AAGCTCAAGA AGGAGCAGAA GGCGCACAAG 4200 CAACAGGGAA CGCAGGCGAT GACGTCGTAG ACGGACAGTT TACGGAAAAG TAAGATGAGT 4260 GTAT TGGAT(; AAGAGTATCT AAAAAATACA CGA.AAAGT-r ATAATGATrT TTCTAATCAA 4320 CCTGATAACT ATAGAACATC AAkAAGATTTr ATTGATAATA 'rTCCAATAGA ATAT1-rAGcT 4380 AGATATAGAG AA2-rATAM-rA GCTrGAACATG ATAGTTGTAT CAAAAATCAT GAAGCGGrAA 4440 GGAATTTCT TACCTCAGTA TTGTTGTCTG CATTTGTATC GGCGATGGTA CCGTATCTGA 4500 CGAACGTTCA GCTTATAT 4518 INFORMATION FOR SEQ ID NO: 122: SEQUENCE CHARAC TERISTICS: LENGTH: 8145 base pairs TYPE: nucleic acid sTRANDE-DNESS: double TOPOLOGY: linear (xi) SEQUENCE DECIPIN SEQ ID NO: 122: *TGCTATTTTC GATTCCCTTG GGCGTTTTGA TTGC-TrGC C7TGCA.AGTC CATTGGAAGC CCCTCCATTA TCTGATTAAC ATrITACATCT GGGTTATGCG AGGAACCCCC TTACTCT'rGC 120 AACTGATTT'r TATCTATTAT GTGCTCCCAA GI!ATTGGGAT TCGTT'rAGAC CGCCTTCCTG 180 *CAGCTATTAT TGCCTrTGTT CTCAACTATG CAGCTTACTT1 TCCAGAAAITT TTCCGTGGGG 240 GAATTGACAC 
TATTCCAACA GGACAGTATG AGGCCGCCAA GGTCTrGAAG rl'TAGCCCTT 300 T'rGACAGAGT GCGCTATATT ATCTTGCCCC AAGTGACCAA GATCGTTCTT CCTAGTGTrT 360 TrAATGAACT TATGAGTX-rG GTCAAGGATA cTrCTITCGT CTATGCTCTC GGAA3"rTCAG 420 ACCTTATCTT GGCTAGTCGA ACACCTGCTA ACCGCGATGC TAGTCTAGTT CCTATGTTCT 480 TGGCAGGAGC CATTTATTTG ATTTTGATTG GGATrGTGAC AA'rATTCC AAAAAAGTTG 540 AGAAGAAGTA TAGTTATTAT AGATAGGAGG CTGCCATGTT AGAATTACGA AATATC.AATA 600 *AAGTCTTTGG AGACAAACAA ATCCTGTCTA ATI'TCAGTCT AAGTATTCCT GAAAAGCAAA 660 TCCTGGCTAT CGaTTGGACCT TCTGGTGGAG GTAAGACAAC TCI'TTACGT ATCT;TGCAG 720 GTCTGAAAC CATTGATTCA. GGGCAAATCT TTATAATGG ACAACCT-IrTA GAGCTGGATG 854 AATTGCAGAA GCGCAATCTA CTGGCATTTG TCTTCCAAGA ?TC~rAACTA 'NTTCCTCATC TATCAGTTCT GGAAAATIrG ACTTTATCGC CTCAAGAC CATGGGAATC AAGCAGGAAG AGGCTGAGAA GAAGGCGAGT GGACTC~wMG AACAGTTAGG ACTAGGAGGA CACGCAAGAGG CCTATCCT'rr CTCACTATCT GWrGGGCAAA AGCACCGGGT GGCTTGCG CCGCTATGA TGATM'ACCC AGAAATCATT GGCrACCATG AACCAAC'rC TGCCCTGGAT CCAGAATTAC GT71GAAGT GGAGAAGCTA ATCTrTGCAAA ATAGGGAACT TGGGATGACC CAGATTGTGG TTACCCATGA T'rTGCAGTTT GCTCGAAAATA TCGCAGATGT ATTA'nrGAAA GTAGAACCTA ALATAGGAGCA AAAATGGATG AAAAA.ATGGA TGCTTGTATT ACTCAGTCTG ATGACTGCTT TGTT-CTTACT AGCTTGTCGG AAAAATTCTA CCCAAACTAG TCCAGATAAT TGGTCAAAGT ACCAGTCTAA CAACTCTATT ACTATTCGAT TTGATAGTAC 1TrGTTCCA ATGGGATTTG CTCAGAAAGA TrGTTCTTAT GCAGGATTTG ATArrGAT'r AGCrACAGCr GTT'rrTGAAA AATACGGAAT CACGCTAAAT 'rGGCAACCGA TTGATTGGGA NrTGAAAGAA GCTGAAN'GA CAAAAGGAAC GATTGAWCTG AGGTGGCTTT CAGTAACTCA ATTTGGAATG GCTATTCCGC TACAGACCAA CGCCGTGAAA TATATGAAGA ATGAGCAGGT A7TGGTTACG AAGAAATCAT CTGGTATCAC GACTGCAAAG GATATGACTG GAAAGACATT ALGGAGCTCAA GCTGGrrCAT CTGGTTA'rGC GCACTrMAA GCAAATCCAG AAATTTGAA GAATATTGTC GCTAATAAGG 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 19s0 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 AAGCGAATCA ATACCAAACC TITTAATCAAG CCTTGATTGA ATGGTCTATT GATTGACCGT GTCTATGCAA ACTATTATTT ACGATTATAA TGTCTTTACA G7rGGACTAG AAACAGAAGC AGGAAGATAC AAACTTGGTT 
AAGAAGATAA AT'GAACCTTT GCAAGTTCCA AGAAATCAGC CAA.AAATGGT ITTCGAGAAC.A AAGAAGGACA GTAAGATAAA ATAGTGGC~r, AAACTGCCTT TrTGAAAAAC GATCGAATTG AGAAGCACAA CGGTrrAA TTTTGCC-49M GGAGCCCGTA TTCAGTCTT 'rACAAGGACG TGTAGCAACC AAGCAAGTAA TTGATTAGCA AAACGTAG'TT TTTTTTGTAA TcTAGGAAAA cG.ATAATAGc GATTGAATAT GGATAATTGA ATATGGAATA GCCC-ACTG-TG ATTTCTAAAA CATTGT'rAAA AATTGATTNG ACTTCAAAA TTAAAA713TT CTGTAATGAA ATACTGATGT AACTGTTTTA GGAACAATAA AACGCATAAT ATCAAGM'rr -TTGCAccTTA cArrATGcG1' rr1TGTGA'rT TTAAGACTrG 'N'A~cCATT TTrTAcAATc CTGCGAAATC TTTGATTTCT TGTGCTGACA 'PTGAGAG;TC GCAACGGACG '1-rGAT7T=GC CATCTGAAT-ATGAACAAAA CCTCGTACAG TTGGGATTCC ATAGCCI'GAG CGGAATGCTT1 GCAAATCATrr GAGTTGGCP GGTT~CTTCAC TArrGATGAA GTAAATGTGA GCTTGTTT CAGCTACGAC ACCTGACAAT GTACCTGCAA A~rTACGGCA GTAAGGGCAA GTTTTGCGAC CGATAAAGAA GGI'GCAGTT T'r' ?TTrAT CAAGAGCTTC TTGCGCACGC ACAACTGTAG TGACT'CAAC GTCTTTGATG TTATCTAAAA ATTrGTTCCT GAGATTACCT CGCTI-rCA'Ir GATAAMTCTA GTAGCCATA AAGTrTAA AA~wrCAG ATTTGATACG AAAAAACATC AGTGGTTG GTCTCATCTT TTATAGCT ?~TT=ACA AATGCArTGA TIrrCTGCTTC GATGTTAGCA ATCTTACCTT GTGATTCTrC GTTGGTTTCC CCTACAACTG CAATGTAG.AA CTTGATTT GrTCTGTAC CTGAAGGGCG AACGGCAATC CATGAACCGT CMGCAAGTG GTATTCAAC ACATCACTrG GAGGAG~rGr CAAGrTCTA ACAGTACCGT CACCAACAGT AGCATTTGT GCCTTGAAGT CTTCTACGAC AGTGA'rAGCT CTTGCGTTCC ATTI=GT'rGG AGCATTGT'rG CGGAATTTAG CCATAATCGC TTTGAT=I'G TCAGCACCAT CCACACCTGA AAGAGTAACA GAGATTGTTT' T7T-'GGA GTAGCCATAT TCTTTATAGA TTT'Ci-rCGA ACCGTCAGCA AGTGTCAAAC CACGAGAACG GTAGTACGCA GCAAGTrCAG CAACTACAAG AACGGC7TGG ATGGCATCTT TATCACCTrAC AAATCGTTTA ATCAAGTAAC CCAAGCT'rTC TTCAAATCCC ATCATGTAAG TGTGGTTGTG ?NTTCTTCG AATTCTrGGA rrTrTrrrAGC a GATAAATrG AAACCTGTCA AGACCTTGAA CATAGTTGCG CGTTACCAAG TCAGTTGAAA CCATAGAITT GCAGAGAGCG AGCGTrTTrG TGAGC'rTCCA AGATG;TATTT ACCCATGATA A.AGGT'rGAGC TAGCTACCAT CTT~wrrGAAG AACTTCAACA GTCAGTTGCC ACAAGAACAT CTGCACCAAC 'rTGACGACCA CGCTGCCTrGG Cl'rTCTGGGT 'rTGGAGATGT TACAGTTGAA TTGCGCTTCA ACAACT'rGAA CAGAGTCAAA TCCTGCI'GG CATTTCACCA GTACCATGAA GTGGTGTGA GACA.ATCTTC 
CCGTAGCT'rT CAGCAATCTT GCATTTTCAG GAAGAGTTCC GCACCGATT'r GGTTAcc'TCA CCAACACGG'r CAGCGTCTG AGTTCTrTCAG CAAGGGCAAA AAGTCTGGGT CAGCAGTTGC GCAAGAGCAC GACGAGCCAA ATGTCrrAC CAAATTCTTC 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 AATCAAGGCT GGOTTGATGT 'rTATGTCCI-r AACCTCTrA ACGTATrTCTA 7TGTCAACACC TTCGCCGATA ACTTCAATCA AGCCAGAAGC ?rT=CAGT' TCCACATCAG CAACTTC.AAC TWCAAATGGG 'I1rCGATTC CACGGATATA AGTAGTCAAA GCGTCCGCA'r CGTGTGGAGG CATTTGTCCA CCGTCTTCAC CGTAAACCTT GTA.ACC=?A AATGGAGCAG GGTTGTGCCT GCrGTGACC ATGATACCTG CGAAACAGTT GAGATGACCA ACTGCAAATG ATAGTTCTGG AGTCGGACGA AGCT~rAA ATACCTAAG.A TrGATGCCC TGTr'rAGCAA GAACTGCCCC AGATTWAAG GCAAACTCAG GTGAGAA=r ACGGCTATCG TAGGCAATTG. CTACACCGCG rL L TlTc rCG TTTCCACCTT TTCACTCAAT CAAACGAGCC AATCCTTCAG TAGCT'rGGCG 856 AACAACGTAG ATGTTGATAC GG?1'TGTACC AGCACCAACC AAGCCACGC-A TACC".rCAGT 4380 ACCAAA'rrCA AGATTTGTAT AGAAGGCATC ?I'CCTTAGTT T7IrCGTrCCA TATTTCCAA 4440 ATCTTGACGA AGGTAGTCAC GAACTCCCAC AAAATCAACC CATTCTGGT AATTN'CTTG 4500 GTAAGACATT CAAATTCTCC T rATT'rA AAACATTTAA TCAG?1'?AAT TATATCATTT 4560 TTTTTACTTT TAGTAAAACC TTATCTGCTTr CGAACATCTC TTCAAACCAG GTCAGATTGA 4620 ATIwrTTGGGC;T TATATGCT TGAGGCTAGG AAAAATTCAA TTCACTAAA AAAAGTAAG? 
4680 CTrCTCATAA CAAAACATTG ATATAGTTAC TTAGI-rTrAA ACAAGCATAT TATAATAAAG 4740 CTATGGCATA TAGTACTGAT TrTAAACAGC GAGCArTAGA T'TACATCAAA GAGGGGCACA 4800 GCCATGTCGA GGCAGCCAAG 7?'=GGTG 7TGGCGTCAG AACTCTCTTC ACG;TGGGAAA 4860 ACAAAGACGT GAACAAGAAC ACATAGAGAG GAAAAAGCGA GTCGTCAAAA ACCGAAAGAT 4920 'rCTTAGAG GAAT'rGAAAG CCTI'rGTAGA GGCTCATCCA GATGCTTTTT TACG-GGAAAT 4980 TGCGGCACAT TrrGATrGTIG CTGTTCCTrC AGTATGGGCA GC-TTAAAGC AG.ATT.AAGGT 5040 CACTTAAAA AAAGATGACG AGCT'rrAAGG AACAAGACCC AGAAAAGTAG CCTTA'rTr 5100 *TAAGAATrr AATAGTN'AA AGCACCTAGC ACCTGTTTAT ATTGATGAAA CAGGAATCGA 5160 ***CCGCTATCTC TATCGTCTT ATGCAGGGGC TCCTAGAGGG GAGAAAGTCT ATGAAAAGA'r 5220 TAGCGGACGT CGTTTTGACC GAACT'rCAAT TGTrCAGGA CAAGTAGACG GAGAGTrrAT 5280 AGC1'CCCATG ATTTACAAGA AAAGCATIGAC AAGCGA'TTC TTTGTG.GAGT GGTTCAAAAC 5340 GCAACTCCTA CCTGCTrTGA AGACACCTCA TGTrA7TGTC ATGGGCAATG CTG.GI'STCA 5400 TCCCAAGAAC ATTTTGGATG AACTCTGCAT CCAAGATAAA CACrrC2 TACCTCTACC 5460 ACCTTA7TCA CCGGATTTGA ATCCTATTGA GCAACCTTGG GCTATCTTGA AAAAGAAAGT 5520 GACGGATGTA TTAAGGGAAG TTCCAACTAT TTTTGAATG;T TTGGAATGCT TTTTAAAAC 5580 TAGATGACTA TAACGGTTCT AAAGGAACCT ATCGAGTAGT CAT'IAAAACT AAGGATACTG 5640 .CTGGT'rAAGA GAAGACGGTA TACAATC-AAA CCATTCACCG TGTAGCCGAA ATCGTTCAGA 5700 0ATGAAGACTT GTATC-AGAAT GAAGAc?1'GT ATAAGAAAGG TTTGAATGTT GAACTrGCGC 5760 ACCAACAAAT TAAGGGATTT 71rGAAGCA.G AG=NAAAAA TCGTATTAAT GGAGTTCTTA 5820 *ATACTAAAAT AAAAAATAGT AC-ATTAAATc GTGTAAATAA 'AAAAAcI'ATA cACCAGAGGA 5880 *.*ACAAAAACTC CATGATCA6AT TTGAAGCAGA AGCAACGGAA GATGCTAAAA AACAAGGCGA 5940 TATTGTGTTG AATGTTGACC AGGATTTCAT GAGCATATCT AAGTCTAATA AAAGTGGTTC 6000 AGACTGGAAG AAAACI'rTCA CAGTGAGGAT AACCAATAGG CTrAGCAAATG ACTTGAATAA 6060 TGTCTwTGAAA CAGGTTGATA AAGATACTCC TAATACCCCA ACT7GGCTAA ACTCAGCIC 6120 857 T'rCTAAAGCT AAACATGATG ACAGACTATA TAAACTACTG AAGACTCTTA TACCAGGAGA AAATrACCTA TCATGTTAAG GATAATCAGC TAGAAGTAGA AACAGATAAA TACACATATA CTCCC.CTAG AAATGGTAGT AAGGAAGrTG GTA'TCAAGA GTCAGATATA GCAGCAACrC T.AAGTGCCGA TGAATATAAT GCAAATGCCC TTAATAATIGG CACTTrcTAG 
GGGTAGACAA 'rCTGATAAGA TrAGGAAAGA TrCTCTACTG GGGATGTTCT TCAGAGAATA ATGAAAGAGT CCAATTCTAC CTGACTr'rTC GAAAAAATCA TTCCAAAACT ATCAAGAAAA AGAA.AAAAAC .CA~CCC AAALClI""ITGA GAGAGAATAC 'rTGGGCTACA TCTGG7'CT AAGATTCAA AGGGATTGTG CGAACGAATG TACTGACTCG AGTGGGCTCT GGAGATAGCA AACTAGGAAA ATTAGGAAAA GATGTTGTTT CTTATACCGT
AAATACAAAA
AAACTrCTCC
TAAAAAACTA
AGGCGGC'rAT
ACAAGTATT
AGGCTAAAC
AGTCATCCAA
AAATATTCCC
CTCAGAATTG
ACTCAAAGTC
GATAC-GTGG
GAAGAAGAGA
GCAGAACTAA
TCATGAAAA
AAAAALACAAC
ACCGTGTTCA GTATAATCTC AACCATCACG AACCGTTGI- AAGGGAAAAt AACC.AAGAA TCTCAGAAAA TGTCAAA GTT ATCATACTOG AATTGAGAA TGA'TGCGTAC CAGCTATAAT CGCTATGTTG ATGAACAAGG GCGTTTGCTA AAAGAAAGTG ACGGAACCTA CATTACCAAT GTCACAGA'rA A.AAAACTCAG TAGCATGACT ACTACTCACG GA.AAATATTA TACTT=TAAA GAAGCAGATA CAAATTCTGCC AAGTTTAACT GGGAATATTG TAAGCGAAGG TAGAACAGTG ACCTTAGTTr ATAGAGAAAG CGAAGCGCCA ACCACTGCrA CAGTAACAGC CAATTACTAT AAAGAAGGTA QGCAAGAGAA GTTGGTAGAC TCTGTTATAA AAGC-GATTT AGCGATAGGT TCTGAGTATA CCACAGAATC AAAAACTATT GA.AGGGAAAA CAACAAC1'GA GCACAA.AGAA GACCGTTA TCACAAGGAA AACAACATAC ACCTTGGTAC CAACTC-CTGA AAATGCGTAC CAGAAGACGG TGCAACAGTT GACTATTACT ACCGrGAGAA TGTTGAGGAA ACAGTGGTTC CCAAAACACC AACCTCTACT GAGACGAAGA CTATAACGCG TA'TCAT'rCAT TACGTrGATA AAGTTACGAA CCAAAATGTA AAAGAAGATG 'rTGTTCAACC TGTAACCTrA AGCCCTACAA AAACTGAGAA CAAGGrCACG GGAGTTGTAA CCTlACGGTGA ATGGACAACA GGAAACTGGG ACGAGGTTAT ATCTGGTAAG ATTGACAAGT ACAAAGATCC ACATATTCCA ACAGTTGAAT CACAAGAAGT TACGTCAGAC TCTAGTGATA AAGAAATAAC GGTAAGGTAT GACCGTTTAT CAACACCAGA AAAACCAATC CCACAACCAA ATCCAGAGCA TCCAAGTGTT CCGACACCAA ACCCAGAACT ACCAAATCAA GAGACTCCAA CACCAGATAA ACCAACTCCA GAACCAGGTA CTCCAAAAAC TGAAACTCCA GTGAATCCAG ACCCAGAAGT TCCGACTTAT GAGACAGGTA 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500- 7560 7620 7680 7740 7800 7860 AGAGAGAG3GA ATTGCCAAAC ACAGACAG AAGCTAATGC TACCT'GGCT AAGrGCTGGTA TCATGACCTP GTrrAGCrGGT CTAGGA'rTAG GATTr'-CAA GAAAAAAGAA GATGAAAAAT AATAaATTT AGAATCTAGG AACCAGGAAA AGCrCACAGA TGTGGGCTTT TTrCCTGGTT T TGAGAACGA GGTCTTTCGT AAAGAATAAA AACr.CTTACA AGTCTGTTGA ACTGGGAAAC 'PATGAATCCT ATTTTTTrAA AAATATTTCC- AGAAATCAGT TGCGG
INFORMATION FOR SEQ ID NO: 123: SEQUENCE CHARACTERISTICS: LENGTH: 8697 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION:
SEQ ID NO: 123: CGGTACCGGG AACGATACTT AGTCTAATTr TGCACCTTTT CCATGTATCG TAAAGGT"rT TC-TTTTTrTA AAAAGGAAAA CGAGAAGAGG AGGTTCTTAT G.AAAGCAAGC ATT1GCCTTGC AAGPTTACC CCTAGTACAG GGGATTGATC, GGATAGCT'T TATTCrATCAG GTCATTGCTT ATCTGCAWAC TCAAGAAGTG ACGATGQTAG TGACACCATT TGAAACGGTC TTGGAAGGGG
AGTTTGATGA GCT'rATGCGC ACAATGTCTT TGCCAATGTC TTGAGAAGTA TACTGACACG TGGCAGTrAG CAGGTTTTCT AT'TCTCCAGC CCTr-rGTTCG AGAGTGGCTr TACTGGGGCT ATTCTAAA.AG AAGC=CGGA AGTGGCAGGG AAAATAAATG TAGGAGAGAT ?rTAAGTATT ACACATTACT CTATTGGGCT 'r'CT'CGGAGT TAAACTPC'TC CCCAAGTTTA TCCTGCCGAC TCACAGAGAA TTTCTCT.GGC ACCA'rAGCTG GATIrTGGGA GTrTTGATTG CCTGTCTTAT
C-AGGAGGCAG
GATGAGAAAC
A''rGTCAATC ACCTCrTGAA
GGCGACC=G
0 .5 S S ATGGATAGTT TGACTT1:GCr CAATGACCTG ATTCCGACCA 'rTGCCATAGC TCCTA'ICCTG AAGATTGTCT 'rGATTATCrT AACGACAACC T'rTAGGCATT GCGACAAGGA TATGCTGACC CAATCCrGT GGCATTTTAA AATCCAGTr GTCAGTGTCT CCTACGCCTT TATCACAACT GGTCTrGGTG '=TATATGAT TCAGTCTAAA ATTATTA'PTC TGGTGTCGAT TATCAGTC?1' A'TrTACCCTA TGATGOGTGCT GTCTTGTGGC TAGGTT-ATGG TTI'CCCATCA TCGTTAGTAT TTGTTTAGTC TGATGCGGGC GGCTGT~C(-TC 600 CATrCAGACC 660 GAT TTGCCC 720 TTTGGACGGT 780 CAAGCCTTGG 840 AGCCTGCCTT ACTrATGC AGGTCTGAGG GTGGTATCTG AGT'GGTTGCG MGTTGAA AAACTGTTTC AGTATGATAC CAT4GTTrGCC T'rGGG'IATGA AGCI!GGTCGA TA'rCAGTGAA AAATATG'rGA TTAAATGGAA ACGTTCGTAG AATTAGAATG TTTCTGAAAaA AGAAAAGAGG 859 AAATCAAAAT CAAGAAAACA TGGAAAGTGT TT~rAACGCr TGTAACAG-CT CTGTAGCTC TTGTGCTTGr GGCCTG"=G CAAGGAACTG CTTDCTAAA4GA CAACAAAGAG GCAGAACTTA AGAACCTGA CTTATCC2TA GACTGGACAC CAAATACCAA CCACACAGGC CTrATc2-I' CCAAGGAAAA AGGrATl-rC AAAGAAGCTG GAGTrGGATGT TGATTGrAAA TTGCCACCAG AACGAAAGTTC I rTGACT7rG GTTATCAACG GAAAGGCACC ATrGCAGTG TATTT'CCAAG ACTACATGGC TAACAAATG GAAAAAGGAG CAGGAATCAC TGCCGTTrCA GCTATTG??C AACACAATAC ATCAGGAATC ATCTCTCGTA AATCTGATAA TGTAACCACT ClCAAAAGACT TTGGGGTAA G.AAATATGCG ACATGGAATG ACCCAAC'rGA ACTTGCTATG TTGAAAACCT TGGTAGAATC TCAAGGTCGA GACTMGACA AGGIrCAA.AA AGTACCAAAT AACGACTCAA ACTCAATCAC ACCGATTCCC AATrGGCGTCr. 'rGATACTGC TTCGATTTAC TACGTGGG ATGGTATCCT TGCrAAATCT CAAGGTGTAG ATGC1!AACTT CATGTACTTG AAAGACTATG TCAACGAGTT TGACTAC2'AT TCACCAGTTA 'rCATCGCAAA CAACGACTAT CTGAAAGATA ACA.AAGAAGA AGC-TCCCAAA GTCATCCAAG CCATCAAAAA AG=~ACCAA TATG.CCATCG AACATCCAGA AGAAGCTGCA GATATTCTCA TCAAGAATGC ACCTGAACTC AACGAAAAAC GTGAC7TTTGT CATCGAATCT CAAAAATACT TC;TCAAAAGA ATACGCAAGC GACAAGGAAA AATC;GGGCCA ATTTCACGCA GCTCGCTGGA ATGCTTTCTA CAALATGGGAT AAAGAAAATG GTATCCTTAA AGAAaACTTC ACAGACAAAG GCTTCACCAA CGAA=~GTG AAATAATGAC ACAAATTAGA CTAGAGCACG TC-AGTTATGC CTATCCGTCAG GAGAGGAIrr TAGAGGATAT CAACCTACAG GTGACTTCAG GCGAAGTGG? 
TTCCATCCTA GGCCCAAGTG GTGTTGGAAA GACCACCCTC 7TrrAATCTAA TCGCTGGGAT ITrAGAAGTT CAGTCAGGGA GAATTGTCCT TGATGGTG;AA GAAAATCCCA AGGGGCGCGT GAGTTATATC TTGC.AAAACG ATCTGCTCTT GvGAGCACAAG ACGGTrGCTTG GAAATATCAT TCTGCCCCTC TrrGATTCAAA AGGTGCATAA CGCAGAAGCT ATTTCCCGAG CGGATAAAAT TC7rCGACC ?TrCCAGCTGA CAGCTGTAAG AGACAAGTA'r CCTCATCGAAC T1TAGCGGTGG GATGCGCCAG CGTGTAGCCT TACTCCGGAC CrACCrTrrT GGGCACAAGC TCTTrTCTT AGATGAGGCC TTTAGCGCCT TGGATGAGAT GACAAAGATG GAACTCCACG CTTGGTATCT TGAGATTCAC AAGCAG~rGC AGCTAACAAC CCTGATCATC ACGCATAGTA TTGAGGAGGC CCTCAATCTC AGCGACCGTA TCTATATCTT GAAAAATCGC CCTGGGCAGA TTGI-rTCAGA AATTAAACTA GA'rTGGTCTG AAGATGAGGA CAAGCAAGTC CAAAAGATTG CCTACAAACC TCAAATTTT GCGGAATTAG GCTTAGATAA 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 860 GTAGAAAAAD AGGGAG=rGG TGAAGATI'AT CCTTT1ACCAG CGCCCIwr CTrAAAAA 2940 TGALGAAAATT TCGGTATAAT AGTCAAACAA GGTCAAG=~ TAAAcAGAGA cGTrGGTTG 3000 'ZrATGACArr TAAAAATACA TCGGATCATA rGAGGCCTA CATCAAGCCC A?!rTAGATC 3060 AATC-TGGTAT CGTGGAG 'rG C;AACGGAGTC AGrGCAGA TACC~TTCAG GTrG"TCCTA 3120 GTCAGATTAA CTACGTGATC AAGACACCCT TrACGGAAAG- TX~..GCTAC TGtMrA 3180 GTAAGCGCTGG TGGCGGACGC TACATTCGTA TAGGACGGAT TGAGTT'rwr AGTCATCATG 3240 AAATGCTCCC GrCAGCTCCTT TACTCCATTG GTGAGCC-AGT CAGTCA-AGAA ATTTATGAGG; 3300 ATATTCTCCA GC=.rCGG- GAGCAGGAAT TGATGACCAA GCAGGACXPG. 
AA7,rGcrAG 3360 AATCAGTAGC TrrGGATC--C GTTTTAGCAG AAGAAGCTCC AG1'TGl=rCGA GCAAACATGC 3420 TACGTCAGAT CATACAAGAG GTAGATAGAA AAGGGAACTA AGATCAACTA ?TCAAAACCA 3480 TrGAATGAAT GTATCGAAAG 'rGCCTACATG CTTGCTGGAC A'IrrTGAGC TCGTTATCTA 3540 GAGTCGTGC ACTaTGAT TGCCA'rGTCr AATCACAGTT ATAGTGTAGC AGGGGCAACT 3600 TT'AAATGATT ATCCGTATGA GATGGACCGT rrAGAAGAGG TGGC7TrGGA ACTCACTGAA 3660 ***ACGGACTATA CCCAGGATGA AACCTrTACG GAATT'GCCGT TCTCCCGTCG TTrGCAGGrr 3720 ***CTTTGATG AAcCZAGAGTA 'rGTAGCGTCA GTGGTCCATC C-A6AGGTACT AGGGACAGAG 3780 CACGTCCTCT ATGCGAkr=r GCXTGATAGC AATGCCNTGG CGACTCGTAT CTTGGAGAGG 3840 CTrGa=TT CTT.ATGAAGA cAAGAAACAT CACGTCAAG;A 'rTG;-cCTc-r TCCGTCG;AAAT 3900 T'rAGAAGAAC GGGCAGGCTG GACTCGTGA6A GATCTCAAGG CTITACGCCA ACGCCATCGT 3960 *ACAGTAGCTG ACAAGCAAAA TTCTATGGCC AATAGATGG GCAkTGCCGCA GACTCCTAGT 4020 GGTGGTCTCG AGGATTATAC GCATGATTTG ACAGAGCAAG CCCCTTCTG;G cAAGrrAGAA 4080 CCAGTCATCG GTCCGGACAA GCAAATCTCA CCTATGATTC AAATCTTGAG CCGGAAGACT 4140 **.AAGAAcAACC cTGTcTTGGT TGCGGATGCT GGTGTCGGGA AAACACC GGCG;CTT 4200 *CTrGCCCCAGC GTATTGCTAG TGGTCACGTG CCTCCCGAAA 1'CGCTAACAT GCGCGTGTTA 4260 GAAC7rrGATT TGATGAATGT CGTTCCAGGG ACACGCTTCC GTGGTGACT TGAAGA.ACGC 4320 a..ATGAATAATA TCATCAAGGA TATTGAAGAA GATGGCCAAG TCATCCTCTT TATCG.ATGAA 4380 *C'TCCACACCA TCAI'CGrC TGGTAGCGGC ATTGATTCGA CTCTCCATCC GCCAATATC 4440 'rTGAAACCAG CC~rGC.CGCG TGGALACr'G AGAACGGTTG CGTGCCACTAC TCAGGAAGAA 4500 TATCAAAAAC -ATATCGAAAA AGATG-CGGCA CT=CCGTC G7TTTCGCTAA AGTGACGATT 4560 GAAGAACCAA GrGTGGCAGA TAGTA'rGACT ArM-ACAAG GrTGrAACCC GACTTATGAG 4620 AAACATCACC GTGTACAAAT CACAGATCAA GCCGTTGAAA CAGCGTTAA GATGGC~rCAT 4680 CGrrA1-rAA ccACTCGTcA GCAACAGTCC AAAATAAGGC GACAAGGCCC TG;ATGGATGG 861 C?~GCCAGAC TCTGCTATCG ATCTCTTGGA AAA =ATGTA AAAGCAGACC ATCAGATT CAAGTGGAAA CAGGCAGCCC AG3CTAATCGC T12AGGCGCCA
CGTCAGCT
AAAAGAAG2AG GAAGTACCTG. TCTACAAAGA CTTGGTGACA TGTCAGGAA TCCCAGTTCA AAAACTGACT GAAGCAGAAC TCCATAAACG GGTTATCCGT CCCAT"TCCCC GCAACCAGTC AGGGATTCGC 'rTCCTAGGGC C1'ACAGGTGT CGGGAAAACT TTTGACGACG AATCAGCCCT TATCCGCr'rT CTAGTCGTC TCAACGGAGC TCCTCCACGC ACAGAGAAGG TTCGCAATAA ACCCTATTCC CACCCAGATA 'rCTT-TAATGr TCTCT'IGCAG AAGGGACGCA AGGTCGATT TTCAAATACC ACTGCCCTTC GTGATGATAA GAC'TGTTGGT GAAAATATGG AAAAACGCAT GTTTGAAGAA AACCG;TATTG A-GAGAAGGT GGTCTTCCAT GTGAAGA??A TGGTCAACCC TTTAGTGGCA TTACAAGCTT CAGCTCTGAA ATTG~rAGCA GAGTCTGATA 7?rTGcAC~ Cj-rGA.GrCc CAAACGGATG CTAAGAACTA TrrAAATCI-r CAAGATCAAG CTGTCAAG CATTAGCCGT AGTCATAAGC GTCCGA~rGO ?rCC?1'TrG GAAT'rAGCCA AGCCTC'rGC AGAAGTC1-r GATA'rGAGTG ACTATArGGA GAAAT=rCA TATGTACGAT ATGAAGAAGG 7rrGGGAGTTG G7T=CTCTc 1TATGAGGT AGAGAAGGCC GTTCTGGATG ACGGTGTCTr GACAG.ATAGC ATTATCArrA TGACATCGAA TCTAGGTGCG TTTGGGGCTA ACCATATTCG 7rTTGACC-AG CTGAAAAAAG CTTATAGACC GGAATTCATC AGCCTATCTA GTGATCA'rAT GCJ4CGAAGTG AGTTTGACrG AAAAAGGCAT TGACTTGAAA AATCAAGGAT ATGACCCAGA GATGGGAGCT 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 CGCCCACTTC GCAGAACCCr GCAAACAGAA GTGGAGGACA AGTTCGCAGA ACITTTCTC ,AAGGGACATT TAGTGGCAGG CAGCACACTT AAGAT'rCGTG TCAAAGCAGG CCAGTrAAA.A TTTGATA7TTG CATAAAAGAA TAAAAGTATC AGCATCTGAC CATAAGTC-Ac ACTGGAGTGA AATTCAATGA AAATCAAAGA CCAAACTACC CAGCTAGCCC UZG7TrC AAAACAcrGG TTTGAGGTTG CACATAGAGC TGACGTGCTT TCAAGAGATTP TTCGAAGAGT ATGAAACTAA AACCTATAGC 'rTCTAAACGA TCCGTGGTTT ?CATCATTCA ACACAAAATI' CATATVMA TTACCCTCCG TCGTATTTGT CTrAGAGCGT GMTAGTAGA AAAAGAGCAG TCTTATCTCA AA7*rTTTATT C=rCAAAAC AC-ACCTrT CTTTTTrTGCA TGTCAAATCC GTTCTAGCTG GTA'rTrGAAA AATCAAACTA ATATTCAATC AAAATCAAAG AACAAACTAG GAAGCTAGCC GC-AGG7TG CAA CACTG 7IrMGAGGT GTACATAGAG CTGACGTCCT TTGAAGAGAT TTCGAAGAG TATAAGCTGC AAGA'rCAATG A~rrTCTTGT ATTGACGTTG TGTTGACAA 862 AAAGTACC-GG ATAAATCAAA TCCATT'CCAT TATCATAGAT CATAGGCTGG TAGCAATr TCAAATAGCA TACAGGAAA'r AL.ATGTATGG AGTTCTCGTA 
GTAGAAAGGG AGAGAGATCA ACA'ITTACT TGCAGATGAC GAGGAAATGA 'rrAACAAGG AATTGC.GCA TTTCTGACAG AAGAGCG=TA TCATGTCAT? ATGGCTAAGG ATG.GACAAGA GGTC-rC-GAA AAATTTCAAG ATCTCCCrAT CCA'rCTCATG GTACTGGATT TAATGATGCC 'rAGGAAGAGT GGTTTTGAAG TGTTAAAAGA AATCAATCAA AAGCACCATA TTCCTCTCAT CGTCT-GACT GCTCTIGGGAG ATGAAACI'AC TCAGTCACAG GTA1rGATC T1CTATGCTC;A TGATCATGTG ACAAAACCTT TTTCT'=GT ACTGCTTCGTC AACTA-rA AGGCCrrAT AGGATCTTTG GCGATATCAG AAAATGAAGA AATTGATCTC ATAAAAATCA AGTTTTAAGT ATTTACCTTC TG.ATAGCGTC TAGATTGTATT CG'rGACTGTG TAAATTATTA ACCAAGTCTT~ CATTCATATA GCTATTTATT GTTTrAATGAG AGCGCAAGAG
AA.ACCAAAGG
AGAGAGCAGA
G'rTGAT.GTCT AAAAA'rGTTG
TTTTAAGAAG
TGACCTTTCC
AATrACTGGT TAT'rGGAAGA
ATATTCGTAC
GGTATAAGAT
TITGCAATT
TTTTTATTAT
GATGTAACAG TGGAI-,"nAC CAGACG=-AC TACGTCATAG CTCTT-ACA.AA GCACATTATA ACTAAAGTCT TTGATTCAGC AA'r-rCAAAA GATGTAGCTG TCTCGCAAA AAATTAGCTT TAGCTTATGA TAAAAAATCC CTAGGTGGTG 'rTGGTCTAGT ATTCAACTGG AGGGGGAAAA TcrTTrACGGA GTATTrAA.AG ACTAAGACAT CTGATGAAAT
TCCAAGCTTA
TATTGTAGAT
AAATTATATC
CTCCAGTCTT ATTCAAAGTC CTTGACCATA TCCICACC AAGCCGCrCC CTCrTGTGCA TGAC71rGGAT ATTAAAGATG GTGAT=~AG ATATGTrCTGT TAGTACAGCA GATGGTAAAC 7TrAAAAGAGA GAAAGClwrC
AGGTAACCGT
a a a 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 GCAA'rrTT~ CACGGGTGG ATGTCTACAA AGAAGCAAAG AATATTTTGC T=TATCT CCCATATACA T7=~GGTTA CAA'rrGCT TTCCTTTrGTT TT'rCTTATT.TTTATACTAA ACGCr'rGCrC AATCCTCTT TTTACATTTC AGAAGTGACT AGTAAAATCC AAGATTTGGA TGACAATATT CGTTrTTGATG AAAGTAGGAA AGATG.AAGTT GGTAACTTG GAAAACAGAT TAATGGTATG TATGAGCACT TGTTGAAGGT TA=~ATGAG TTGGAkAAGTC GTAATGAGCA AATTGTAAAA TTGCAAAATC AAAAGGTTTC CrTTTGTCCGC GGAGCATCAC ATGACTTGAA AACCCCTTTA GCCAGTCrrA GAATTATCCT AGAGAATATG CACCATA.ATA TTGGAGATTA CAAAGATCAT CCAAAA'rATA ?TGCAAAGAG TATAAATAAG ATTCACCAGA TGAGCCACTT ATTAGAAGAA-.TACTrGGA= TT-CTAAATT CCAAGAGTGG ACAGAGTG;TC GTGAGACCTT GACTGTTAAG CCAGrTAG TAGA'rATT7T ATCACCTTAT CAAGAATTAG CTCATI'CAAT AGCTGTTACA ATTGAAAATc AArrGAcAGA TGCTAcC-AGG GTcGTC-ATGA GTcT'rAGGc 863 A?1'GGATAAG GTTTGACAA ACCTGATTAr TAATGCAA?1' AAATATrCAG ATAAAAATGG 8280 G1CGTGTAATC ATATCCGAGC AAGATGGCTA TCTCTCTATC AAAAATACAT GTGCGCCTCT 8340 AAGTGACCAA GAACTAGAAC AT TATTTGA TATATTCTAT CATTCTCAAA TCGTGACAGA 8400 TAAGCA--AA AGTCC-GCTT TGGGTC?--TA CA,":GT.GAAT AATA =11 AG AAACTAT CA a46 AATGGATTAT AG N'rCTCC CTTATGAACA CGGTATGGAA TTrAAGATTA GCTTGTAGAC 8520 AGATTAGTTT TTTATTAAAG TTCATATAGG GTrAACATAA GTGTGTTATT CTTTGTGTAG 8580 ATAAAAGAAA GGATACTAAT ATGGTATTAG CGATrATTTT AGTAACATTC TTTA'rrCGAT 8640 TGATrTTTr AAAGCGrTCG ATAGAGAATG AGAAACCAAT CCTTAGCAAT GGCGGGG 8697 INFORMATION FOR SEQ ID NO: 124: SEQUENCE CHARACTERISTICS: LENGTH: 4317 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124: AACCATACAT ACGGCAAGGC AAAGCTGACG CGGTT rGAAG AGATTTCGA AGAGTATTAG TTGCCTTTAA AGGCATCCAC CATCGTTrG AA'rrCTTCAT TTGAGAGAGT AATCCCTrTG 120 *CCCATTTTAG TATGGrCTGG ACTCCAAGCA CGAATATCAA ACTTTGCAGG GGCACCATTA 180 AAGCTCACAC GGTTAATTTC CTTGGTCCAA CCTTTCGT TTTCAGAAAG AGTCAACAAG 240 *TGCTCTTCGA TTTCAAATGT AAATTCTGCC ATTTTCT TCT CCTTTTTTAG TTTCATTAGT 300 TTATTCGTAA AATCT'rGTAG ATrTTAGGAA AATTTTATAT 
AATAFTGATA TAAAAGAAGG 360 GAGGCCAATA TGAGACATAA ATTCCAGCAA GTTCTAAATA AAATACATGA TTTN17TAAAT 420 *GGATATGACC AACCTGACCA GACTGAAACC AACTCCCTTA CAGCCACTAT TGAAGAGGCT 480 ATCCAGAAAC AAACCGCTGT TCACCTTATC TrGTCTGAGA CAAGCTrTAC ACGTGACATC 540 ATCAAATATG ATCAGCAACG CCAGCAAATr ATCGTGAAAA AT-rTTCCAA AAATCTGAGC 600 CGGATTATCC GTATAAGCGA TATTCAACGC CTGCGATTTG TCCCCTCAAC TGTCCAAACA 660 ***GCCCAAAAAA ATAGATTTAA GAAAGAGTGA GATGTAGTTG CN'CATCCCA CTCrTTTTTC 720 TTAGCGAATT TGTTCAAAAT GTAAATGAAC TCCGATATQA TCTCCATAAC CACTTCTTTC 780 CAAGTCACGT TGTAAACGAT AGGAAATGTA OTGTTCTGCA ATGGTAATGT AACCTGCGCC 840 CAATAAACGA TGTTCAACCA TAGATTGAAT CATACTOATA GTCGCACGTT CCACCTTGGC 900 864 TTCTGTAAA TCCAAAACTA CCTTCTTACT GAC=rGAGCA AGATNTGAC GCAAATCATC TGlrCAAAACA TAAACAGTTT GCCGCTGCCTT CAAGATC4GCr TGCTAAATCT TATC."GGATT AAATTCAGCA ATrTCGCAT TACGTTGAT TACTrGCM'A GG?1-rCTCCT 'rTATTCTG T7,rCTICrA 'I-r'CTGCCAG CAT "111CT TCTTrCTACTC TrCAGTTGATA ATG~rCAAGT 960
AAATCCGGTC
TTTCATCGT
CGTG=CACT
TCCT'rCCAA AArrCTCCAC
ATGCGCTCAT
AAA'rCr-CAG TGC=CCGTA cG?rTTCTTT AAACTCTCGT ACAATCGCC-A CTGACCAATC GGCCACTCA'r CAATACATCT GGCACGACCA TCCCTCGATA ATCATAGGGA GAGGATATTC TAAAAGACCT GAAGAAAAAC TATCATCTTC; GTGGCT.AGAC TCAC-.rCTrG AATCAGGCGA ACTGTAGCAT CAATC-ATCGGT CATAGCTCCC CAGAGGAC ATAGTCACCT AGGGAAATCr C-ATCTCTTAC CA.AGGTCTTA CATAACCCTC ATAGTGCCCA CAGATAAAGA TTAGCTCTTC CTCTTGAGCC C-ATAAGCCTG ATCAAACTGC TTCCAGCAG GATCAAGCAG AATAACCC 1200 1260 1320 1380 1440 1500 1560
GGATrTrC7 TTTCAATAC CCCTGACCGC CTCCGTAGGG AAATTATGAT ACTGGATATC ATCAAAGGAA TCGAAAATAG GTT= CTC'? GAGCAACATG CTCATCATCT ACATGACGGG CCT'ITCAGC ATT=CTCGA
CAAGAGCCCT
TCCAGTGGAG AAAACATCTC TCGAAAGAGG TTrCTAAGATT TCCACATCGA CCCGTTTACT GATATAAGGT AAAAGCAAAT CACGT=GCC TTTTCTCCGAG CC=rCCAAC GTTAAAATAT CAA--CTTCAT TGGAATATCA ACATTAGAA TTTTCGTGC ACCACCCAGA ACCTGCTTGC AGGAT'rTCCT TGATGGTTCC AACCAAGCTA ACCGATAATC TC ATAGT AAAATTCACC ATCGTCTACG GACC-TTCAGA CTGTATCCCT TGTACr'rTrC GATAGTATTG TTTAATAATG; TCAAAGTTCT TCTGTTTACG GTGGCTAGCG CTGATCTrT'r TCATCAAACA AAACCAGCTC AGCTCCTTT ATCCGTCACA GACAACACTC GCATCTCCCC CTGTAATCCC AACATTAAAG TAGTTCATCT TGTCrCCTGT AATCTCCTTT
TCACCC-TCAT
TrCATTCA.AAT ATATGcrrACA
ATGGTCACTG
TTAAACCCTT
TGCGTATTAA
GATTGAGTGC 1740 CGTCTAACCC 1800 CCACTGGTGG 1860 CATCATTAGC 1920 AGACTTCCA.A 1980 CTTCCTCACC 2040 'rATCTTTGAA 2100 TTTGGACAAA 2160 CTTCGCAAA 2220 CGATTTTCCC 2280 TTTCCATCTT ATTCTAACAA On.
TTCTCGAATA ATAGCCCCAA TT'rTrCCGA TTCTGACCAT TGTAAATAAT CGTGATTCCC T*CCTAAAATG AGTTTAGTAT TGGAACTCCA ATATTCTGAT TCTCTGTACT CTTNrCTCT ATAAGGCTGA CAAAAAACAA ATAdAGGAAT ATGAGCTTCT ATAGATACA'r CCTCAAAATC TTCCTCAC;TA A'rCTCTCCAG ATATCTGAAA TTCTGGATCT TGATTTTCCA ACTCTAAGCC ~TTTCTC ATTAATTCCC ACATrrr'IrT A'rrCGTTTCA GGACTAAATC TTGCrrGAGT TAAGTTCTTA AAATAAAGTT CAGGACCACA CTCGTCAATC ACCCTCATCT GCTCTTCCZAT 2700 865 TTCTGG-ATAA GGATTTTCTG AAAAATCAGC AAACATGACT 'TAGTTG TCGGTPCAAT 1'GCTACTAAA GTCTGACGCT TAATTGGTTT CTCGAGTAAT T!'GCAAGCTA AAATTCCACT CCAACTATGT GCACAAAGTA TATATTCAGA AATTCCTAAT TCTTCAAGTA CTTCATAAAC CG-ATCTGCA CA GATTT-,=CC AGCTlTGCA GTCGGAAAA TCAATTGTCA AATAACCAAT TGTAGCAGGA AAAAITTTTCA TAAC"I GGTA GCAAACCTGC TCCGTTTAAA CrtTTGATAA GTAACAGAGA GGCTACCAAT TTCTGTAGAT CAAATCCACT GATTCTATAT AATGAATTAT TAAAAATCCT TGAAT CGGAC ccTTTTTCAA
CAAACTAGCA
TCCtlACCTGT
GTATAAGTGA
CTTTCN-1-rG AC7-CAAACC TCTTCATAAA TATCCr-TTAT TTT'A'rCACGT TCCAAGGATr
CTCCAAATT
TCTTGGTAAG
CCAGTTCTCT
TTCTCAAGTT
ATCTTTATGT
GTCAATATGA
GGCAAGTTCC
GGAGGAAGGG
TGGTAATTAA
TTAATAGCIA
TCGTTCGAAT
GACAATArCT CTACTTTCCC -TCAATAATC TGGCTGCGGT 'TrGTCrI-rC TCAAACACAG CGATTGCGAC GGTGTAGTAA ATGATATCAG CCTATCCCTT CTTCGAC-- AGCCGCCTA TTCAAAACCT CGACTACTTC 'rCCGACTTCC TCCACTAACT TCATAAAGAG GTCCGAGACT CGTTAATG TTCGATTAAG TAGTCTTGGA ATTGCCTAAA ?r'rATACTAT ATTGAAACTA GAATAGTACA CCTrTACTTC ATrrGACTGT CCTGATCGAT TTGTCCTr C7D.=TTCAT ACACAAAAAA GCGAQACATC CGTCCCGCCC TCTArr'1 'I L L Tl IG AT 'rCAGTTGGGA CAGAGTAGAC AATCGTTCTT CTTACGACCG ATTACACCAC CCACATCGCT TTGATCAAGA TTCTGGTGTA 'rCCTCAA'rCT TGATAGTTAA GGCATCGT AATCGCAATA ATGAGATTTT CAATCGTATC CATCTGTCAA AAAATTTAGA ATCGTGGAAT TTTTTrCAATA CGCC'rTCT
TAAAACATTG
TTTACTATAT
TCGTCAATAA
ATCGCAGAAA
TTCAAATGAT
TGTCA.AATTA
ACCTTCATCA
CCTTC.AATCT
TTAGAAATCG
CTTCTA'rTCC
CGATTCTTAC
TAGTGCGACC
ATTCCAAAAA
AGGGTTTCAC
2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4317 CCTACTTTAA ACTrA'rTTTG TGAAAGGATG TTACCTACTG TGTCTGAAGG TTGAGCTCCA TTAGCCAACC CTTCGTTTC AGCAACAAGT GGGTTGTAAC GTGGTGAACC TGAA'rCTGCT ACGTTGATAC ATGCAAGAAC GCGCTTCT TTCAAAGTTA TTCCAACTGT TTCGATGAAA CGTCCGTCAC GGTAGAAAG TrrTTTTA GAACCCATAC GAGTCAAACG GATTTTAACT GCCATTrTTTA AAGTCTCATT TCTTTAATTT T'TTATTTCGG TGAAATAGCT GAGCTATTTA GCACATGTTC TATTATAGCA GATTTCTGGC ATGTGTC
INFORMATION FOR SEQ ID NO: 125: SEQUENCE CHARACTERISTICS: LENGTH: 4881 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 125:
AATTrATTTG ACTGGAAATT GTAGAGGGTT CTCGAAATTr CTTGAATGGT TAAAATAAGG ACAAAGAGJAAA ACMTGGATAT CTATATCCTT GTGCCAAAAA AACCACTGCC CTCCCCAGAC CAACCTGAGG AAAGCAGTGA TTCTTATTTT AGGAGT'rAGG AA'TGAATAC.A CGA.AATCAAT T'rAGCTGATT TTTCAAGAAT TCATCGTATT GT'rT7rCAT TrCGTrC.AAT'
ACTTTTTCGT
GGGTCTACAG
GAGATTTCAG
GCrCTGCCA
TAAAGGATCC
TCAAGAACCC
CCGTAAACAA
AGGCACCTTC AGATTTCAAT TTTTCCATCA ATTCTGGALAT CGCT77ATCT TACCAGTGP GATAGCTGTA TCAAATTGyTT GCATI=GTTT AGCAATAGCT AT-rCACA-T GTCAGTATTC AAGATAAATC CAACCCTGG AGATTCT'TTA ATTCTTCTr AGAATTTwrCG ATTrIGTTGGT CTGTAACGTT TTCGTTGATG
AGTTGTTACC
GAACACCGTT
GACCGTTCAA
AGTGTTCCAT CCACCCATG- GAGTGTTTCC TTTGTAGCCA TTCTT-rACCT TCAATT~r CCC.AG-TCT GCCTTCTGGA
GAGTTCTGGG
GAT-TTTC'rr TGTCTTAC-A GTTVI-rrGAG TTTTTCTTCA TGAAGTTAGT AATrGGTTTG AGCAAGCTGT TACCGTAGrrC AGCTGGTCCI' TrCGTATTCA AGAGGTTCAA ATGACAAAGT TAGCAACT'rG ATTTGGATAT CT'rTGTGGC ACTGTTTCTT CACGAACGAA
GATTTCCATT
TG?=GTTTGGC
AACACGTGAA
CCAAGTATC~T
120 180 240 300 360 420 480 540 600 550 720 780 840 900 950 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 TGTTGAAGCT CAAAGGAAGT TAGAATTTGT GAAGAGI'CTr ACTTTAGTAG TATCGCCTTC TCAAAATTAT CAGATCGGAT TTTCTTGA TTGTTCAA TCGATACCAT ATTTAGCAAG ?'PGGCTGC.AA CTGGAACAGC GGGTCAAGTG CTTPGTAAAG TAAGCACCT1' TTTGAGCATT CCAGATG)LTG TGATAACTGA TCCPATTTGG CACCAACTTT ATCGCTTGTT GCG.ACGTCTT TTGGAATGTA GCCAGCTTCA CAAGTGTTCT TTGAAACGAG GCACTTCGTA ACGGT--ACA AAGGTCGATA ACGAATGGAA GACCGTTTGC TACTGGGTAG CAkAAACTTTA CCAATAGCAA ATGGTACTAC CTCTGGAGCT GACTGGCTCA AGAGTTTCGT AAGAAGTAAC ACCTGAAATA GAGAGTTCCG 'N'GAAGGCAA AG IPrTGAGA TGATGC.AACG GTAAATCTTA CCATTI'ACAG TATI'ACCCTT GATGTAAGCT GTCTTTACCT TCTTTTGT ACAATTCTGT CAAGTCAGCG TACAATATAG TTATCTGC.AA AGGCAATATC ATAGTTTTCA CATTTI-CTTA CCATAGTCAC CCCAGCCAAG GTATTGGATA TTCTCAATG ATTTTGT1TGG CA'rTrGCTAA CAATTCATCC AAGTTGTCTG Gr=GCACC GATTErGGTAC ATTTTGATAA CAGGTTTGTC ACCTGAATCA 867 GCAGCrTT-1- TrCTG-,TACC TGTCAAATTT CCACAAGCAG CAAGACCTGC AGCCAG.AGCG ACI'ACACTAG CAGATGCAAA AGCATArT TTCAGTTIr TCATCATAAA AACTCCTr-r? TTTA'rTA AACTTA'rAAA CAATGTAATG ATIrATACT CAATAAAA.AT CAAGACAA ACTAGAAAAC TAGCCGCAGG CTGCTCAAAG CACTGCCTG ACG=1CAGA TAAGACTGAC GAAGTCAGTT ACATATATCT ACGGCAAGGC GACG'rTGACC CGCG'GAAT TTGA-.rrrCG AAGAGTATTA ACNTCACACA AGGGAAGTTG GGAACTGAGA AATC=rATTT CTCA6ATAAGC ACTATTCN'T CACACCACCG ATAGTCAAAC CTTTACAAA GTALGCGrGG AAAAATGGAT ACAAAATCCC GATTGGA.ACG GTTGCAACCA CAACCATGGC GTAGAGCAAC TCCCAGT'rGA CCAATCAAGC CGACCGCTT GTTGGAT'T-G CA74GAGCAAA GGGCGCTTGAA CCAGTCATTC C'DGGTAGTGA CAATCGCAAA TACGAGCCCA TITCTAGAATG TCATGTTAAA TGGTGAGAGA GAAGTAC.ACG GGTCACCATG GAAGGACGAA GATGTAAAG TATTGCAATG GATACAAGTT CAGAAACCAA GAGCTGrTAA
CAGATTTGGA
GCTTCTGGAA
AGCA'I1GGAA
ATATAACCTG.
AATCTGCGAT
AGAAAATCCG
TGGTC=rCT'
CAATCAAGGC
GTACCAAACC
ACTTAA6ACGT CATACGACCT GTTTCTTTrCC GC.CAATG.TAG TCCATATT GTCACTCTTG ATGTAAACAA GACCrGATG GTTGCGATAC GCCCTCACTG GCACCATCGA GAAGAAGGAA CGCATCAAGA CCA.AACAGTG TC-ACCAAGCT AGC GAAC AACATACTGA TGTCCGTGAA ATAGCGTAGG 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 252G 2580 2640 2700 2760 2820 2880 2914 0 3000 3060 3120 3180 3240 3300 CATAGGTG TGTCATAAAG ACATTTGTCA ATGTCCCAAC TACGGTACA AAGACAGAGA TGAAGAGGGC TTGTAGGATr 'rTATCC~rAA ACTGTGCCAA AAACTCAAAA CCCTCTAAGC CAAAT'rGGGA TGGCAAGAAG CTATAGCCGT ATTGGAGGAG GCTTTCTCG TCTGTCACTC AAATAATGAT AACGAATACA AAACGTAGGA TACAAGAGAG GGCAATCAAA CCCCA.AATGA TACTGAAGAA GATATCTGCT TTCTTACrGA AGGACTGAAT GCCGACATTA TCAA*TN CITwrTTAAT TTTCTTTTr GCCATArrCT CCTCCIr=CT AGAACAAAGC TGAGTTGGA TCGACTCGTC TTCGCA-AGCAA GTTGATAGG ATAAcC-ACAA 'rcAAAccA6Ac AAccGA1-rGG 'rAAAGACCGG CTGCTGCACC CATACCGATA TCTGCTGTCT GAGTCAAACC ATTAAAGACA 1'ATACGTCCA AAACGrGT= TACAT'rGTAA AGCTCACCAC CATTGTGTGG GAI-rTGATAG AAGAGACCGA AcGrCTGCcCC GAAGATATrr CCGACTGCAA GGATGGTCAA TACAGTTACA AGCCGAGTCA ACTGACGAAT GTACGTTG CQ.AATACGTT GCCACIrGCT ACCTCCGTCC ACTGTCGCTC CrTCGTAGTA GGrrGGATCA A'PTCCCA'rGA TCGTCGCATA GTACATGACA CTGCTATATC CAAAGCC=r CCA.AATACCT AGGAAAAGTA GGAGATAGGG CCAGATGCCC 868 AGCTCAGCGT AGAAATTGAC TTCTTGaAGA CCAAGACNTT CCAATAGATC ATffGAACACC 3360 CC1TTTATCAA TATTAGGAA CGCATCTGTA AAGAAACTGA TGATAACCCA AGACAAGAAG 3420 TAAGGGAACA ACATAGAAGT TGAAAAATC TTCACCAT TCTTAGAACG GAGCTCCtTG 3480 AGGATAATGC CAATCCCTAC AGATACAACT AAACCTAGAA AGATAAAGCC AAGATTGTAG 3540 AGGAcAGTAT rrcTGTGAT AATAAACGCG TCTCTTGAAC 'rAAATAAGAA 'rCTAAAATTA 3600 TCGACTCCGA CCCATTTACT ATTTATGATA CTATCTATGA AACCATrACT GGTCATGTGG 3660 TAGTCTTrGA AGGCAACCAC GTTCCCAAAT ACTGGAATGT AAA.AGAATAG AATCAACCAG 3720 ACTGCCCCTG GCA.AAACCAT CAAGAGAAAG ATCCACTTGT CTCTCAATGT TTGAAAAC 3780 'rrTTTCATAA ?wrrCCTCCCT T''rATTTTG ATA'rCCATCT AAAAATTCTT T-rrAGACTTl 3840 TTGATAACGA TTACATTATT AGTATACTCC TATTTGCAGG TTAGGTTAAA CTCCTAATTA 3900 TAGAAAAAAC TCCACAAATT ATGTAGCACA T'rAAAACTT TATCACCACT 
ATCAAACAAA 3960 TGTCCTAAAT CAA'rTGT=A TTTTATCTCT AT'rAGCCCAG 'rGATGGCGTC ACTCTGTTAT 4020 *AAGCATCCAA CAACGGGGTA TACTGAAAAA TCTCCAGACT AGCGAACTCA GCGATAGTTC 4080 *CTAATCTGGA GATwTTTAAT ATGTTATTAG GCGTTTGCT- TCAACr"TAGC AATAACCTCT 4140 'rAAGAT'rAT CAATCAACTC TGCTGCAGTA TGCTCACAGC CT*=~CATC TGCCAAGA6AC 4200 .AAAACTGCTT TTTGAAGTTC TTTTTGAGAG TTTTCAAGGA CATCCTTATC TACTcGrTCA 4260 *AGGTTTGAGT CT-TAAGAAG TTACTTAAT TCCTTGGCTA ATTTCTTGAG TTTCA7='rC 4320 AGACTCATCT TCTCCTGCTG 'rTTCI'TTGCC CGCTCTTTGT CCTCCATCCT TAGTTGCTGA 4380 ***.CTGGCT'rTCC TTAATGGACT C'rAGGGAAGC AATGGCATCT 'rTCACTGTTT GCAAGATATC 4440 *ACGTAAACCT TGCTCTGTCA AACTATCATC TGCAAAACCT TTATI'AGCCT CTGCCAAAAC 4500 CAGACGTGCT GAATCTGTGG TAGGATTCGA 'rACACCTGTC AATGATCTCA AAAGATTTTC 4560 ***TAAGGT'T1GA GTCTGCTTAC TAATACTAGA CTAAAATCAA AAAGTATTAT ATAACAGTGA 4620 TATGAAATCA ACTAJAAG2AAG AAATCCAAAC CATCAAAACA CTTTTAAAAG ACTCTCGTAC 4680 AGCTAAATA'r CATAAACGCC TTCAAATCGT TCTATTT'rGT CTCATGGGCA AATCTTATAA 4740 AGAGATTATA, GAACTTTTAT ACTAGTTTGA AATAAGATGT GAACATCTCT ATCAGGAAAG 4800 ***-TCAAATTAAT TTATAGAAAT ATTTTAGCAG CCAAGGTGTA CTGTTATAGA TTCAATACAC 4860 TATACTTGGT GGTTTAGCTC G 4881 INFORMATION FOR SEQ ZD NO: 126: SEQUENCE CHARACTERISTICS; LENGTH: 13121 base pairs TYPE: nucleic acid STRANDFlDNESS: double TOPOLOGY: linear (xi) SEQUNCE DESCRIPT'ION: SEQ ID NO: 126: AGCATCCCCG GAAAAGGAGA CTAAAAATGA AGAAAAAAT 'rCTAGCAT-T TT=CAATrT TATTCCCAAT T1-rCTCATTA GGTAT TGCCA AAGCAGAAAC GATTAAGATT GTTTCTGATA CCGCCTATCC ACCTTI'TGAG TT'TAAAGAT CAGATCAAAC ACATTATTAA CAAAGTCGCT GAGATTAAAG GCTGGAACAT TTGACGCAGC AGTCAATGCG GT TCAAGCTG GGCA.AGCCGA CAAAGACTAA AGAACGTGAA AAAGTCTTCA CCATCTCTGA TTGTCA'rTGC TACTACAAAG TCACACAAAA TTAGCAAGTA CCGT 'GG1'G TAAAAACGGA ACTGCCGCTC AACGTTCCT ACGGCTTTAC TA'rTAAAACA TTTGACACTG GTGATTTAAT GTGCCA'rCGA TGCCATGATG GATGACAAAC CTGTTATCGA TTATAAAGGA ATTGATGTTG TCAGATGTCC TATCCTGGAT CGCIATCATG GCAGGGATGA TACTTACTAT GATACAAAAG CGACCAATTA ACTGGCAAAA TGAXAACAATC AAAGATAAAT GAACAACAGC TTGAGTGCTG ATATGCCA1-r AACCAAGGTC 
TT=GCcrirc GGTGTGAAAA AGCC-ITGTCT GAAATGAAAA AAGACCTCCA TATTGAAATG GAT"GGTGAAG AAGGAAGTAA ATACGAGCAC CTG rACTG AAGATGGTAG TC-TATAAA ATTATCAAGA
CTGTAGGAAG
AA'IrrAACCA AATGCAC'rGC TTCATCATCT CAACTACAAC TACTCTCGCA GGATTAAAAG CTATTCC"=G CCAGCGATTC TTCIrNGCC TTGATATGGA ATTGATTAAG ACCCT GGTTr TGATGCTGCT CTGGTATG'rC TGTCACAGAT CTGCTAATAC CATTCTTGGT AAGGAAAGAC AGTCGGTGT AAAGCAAATA CGGCTACAAA CC?1'TTG1'r 'rCCAALAATTC GCAATCGCTA AAGACCAAGG ATCAGTGCTG TCCAAGCTGG CCTCGTAAGG CAAC=TTrGA GTCAAAGAAT CAAGCAATAT AAAAACGGAA CTGCTTCTCA ATCAAAACCT TTGCTGATGC GCCGTTATGG ATGATGAACC TAACCrAAA*
AAGCAACCAA
'I-rTTGA.AATT
TCAAGCCGAT
CCCAGAA
TCAGCAGTGC
rATATC.ATTG TACACrGGTA GA.AArCACCA
GGTATCATCG
TCATACTACA
180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1260 1260 1320 1380 1440 1500 1560 TGCCATTGAT TAAACACTGO TCCTCTT=AT GAAG.ATUTAA AACCTTCCTA ACAGAAAATC 'rTC7'rAATG TATGACAGTr TGTTC1'CAAA TAT'rCTATCA AATCGGTGAA ACAGCCTrM CAACAACGGA CTTGCAAACC CCTAGCTAGC GAATCTTCAA CTTGCTTCAA AACAACTACA GCCAAGGTCA AAAATTGAAA ACTCCAATCT CCGTTAAAAA AGGAGCAAAT CCAGAACTGA TTAAAGCAAA CGGTGAATTC CAAAAGATTC CTGCTTrCAAC AAGTACTG'rT GACGAAACAA
CTGGAACTCC
1TTGAAATCTT
TTGACAAATA
CGCrCTGGGG 870 AACAACTCCT TAGCCGTCTI GG1TATCACTC TGC=CAGC TCTTATCTCA CCA7"rGTCAT CGGAATTATC 'r'rCGGTATGT TTAGCGTTAG CCCATACAA.A TCATCTCTGA GATrTCG= GACGTTATTC GTGGTATTCC ATrIGATGATT 'rCArCTTC'rG GGGAAI'TCC:A AACTTCATCG AGTCTATCAC AGGCCAACAA ACGACI'TGT AGCTGAACC ATTCCCTCT CACTCAATGC GGC'TCTT'r TCGTTCGTGG 'rCGTA'rrCAG GCCGTTCCAG 'rrGGCCAAAT GGAAGCCAGC GTATCTCTTA qrGGAAAAACC ATGCCTAAGA TTATCTTGCC ACAAGCAACT TGCCAAACTT TCGTCAACCAA TTCGTrA'rCG CTCTTAAAGA TACAACTATC TCGGTTTGGT TGAACCNrC CAAACTGGTA AGA=ATCAT TGCTCGTAAC TCAAGATGTA TGCAATCCTT GCTATCTTCT ATCrGTAAT TATCACACTT TAGCGAAACC CrrAGAAAAG AGCA'r-cG= AATGGCAAAA T'rAAAAATTG rr-rACACAAG CACTATGGAA AAAATCAAGT CCTAAAAGGA ATT1ACCACTA ACGAGATGT'r GTTTGTATCA TCGGTCCTTC AGTCTGGT AAGTCAACTT 7rTGCTATTG TCTCT'rCGCG
CTGCAGCCT
AG.CCCAATTA
ATCGCTCAAA
CCAAGCTTGG
AAATTGATCT
GTATCT-GCTA
TACCAAAGTT
TTGACTAGAC
ATGTh.AATGA
AGTTCTATCA
TCCTCCGTAG
ATG.ATTTAAC
CCTCAATCTT TTAGAAGAAG TGAAAAAACA ACCAATGTG CAACCTCTTC CCTCATATGT GrTGA'rGACT AAGGAAGAAG AGCAGATAAA GCTAATGCCA CATCGCTCGT GCCI'ACCAA CCTrGACCCT GAGA'rGGTTG CATGACCATG ATTA'rCGTAA TA'rCTIrACT GCAGATGGCG CCCACAACAC CCTCGTCTGA TCACTAGCGG TCACATCACT GTGAACCGCT ACCACGTCCQ TGAAAATATC CTGTATTGGA CAACATCACC CTGAGGAAr GGAATGGC ATCCAGATAG CCTATCAGGT TGAATCCAGA CATCATGCTC GAGACGTACT TAACGTTATC CCCATGAGAT GGGATrTGICT AGTTCC'rTGA AGACGGAACA AAGAGTTTT AGATAAG=T GGCATGGTAT TCCALACACTT TTrTGCTCCrA TTGACCACAA 7IrGC=rCAAA AGGT'rGGACT CGTCAAAAAC AACGTGTGC TTCGATGAAC CAACTTCTC AAGGAATTGG CTGAGCAAG CGTCAGGTTG CCAACCGCGT CCTIGACCA.AA TCTTrGATAA '1rAALACGTCT AAACTCAAAC TTTNGATTT TTCGGAAAAT CTAGCAACAC TAGGAATAAA 'rACAATTTTC GGTGCGAGTG 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2 94 0 3000 3060 3120 3180 3240 3300 3360 TGTAAGGATr T4=MICAGT I'TTCIACCT TATGTTAGAA 'rTAAGT-rAT GAAATGAGGT AATAGAAA7T AGGTAGCrAG ATGTCATCTA GAGACCTGGC TAAACGCA.AG CTCTACCCT TTTCCAAGCA CNTGCCGT'r AT-rX.AACTG
CGTATTGGAA
T CCTCATAC
AG'MATTGT
CCC?1'TTTAG AC'rATATCAA TCCGGCAATC CCCGTAGACC ITGGAGTAAG GAATATTr'rG AATCTGTAGT TGTCGAGTCC ATCCTGATT TGGCAGATAG TACCGAGCAA GCCCAAC;AAT 'rrGCTAGCCA CTTCTACTAT CAAAGCCATG ATGTCAATGA VCCGAACAT 'rATATTGCTT TCCGTCAATT AcAAGC"=A TCTTG=CAT GCCACCTCAG TTG-MGATGG CAAAGGI-rr CAACTGCAAG CAAGTTGAAT CTA'rCCACCA ?rATCTrG.GT ACrrGATr~l' TGAAAACGTTr CGGAGCGCrr GGGTAGAA ACATGGTCCA AAACCACACT GCrrCACAAA AGACGAGATT CAACTGATGA AGAACTCAA ATGGCATGAA A'rACATCTCT CTAATrGAAA AATACCAACC TGAACACAAT AAGCT=CT TTCT'N'GGAA CC.MTTGCCAA ACACCTCAAA TCTGAAAACA GACCCrCA TCGTGAAAA ACCATTTrGT ACAGATTACG GACGAACTCC TAGCAACATT TGACGAAGAA CAAA~ri-CC AAGGAAATGA TCCAAAGCAT CTTTCCAGTT CGCTTTGCAA TCGAACAAGG A~r-A6TCGA CAATGTTCAA AtI'ACCTr'rG GA6ACG;TG=T GCTACTATGA Cr-AATCCGGT GCCCTCCCTG CTACAACTTC r-rCGCTCCT CGCCATGGAC AAACCAGCAA CGTGCTGAAA AGATTAAGGT CTTAAAAAC CrCrATC.ATC GAAcAcN'TA TCCGTGGGCA A'rACCCTCT GGTAAGATT-G TATCGTAGCG AGCCAAATGT GAATCCAGAA TCAACAACTG
St 55 S S AAkACCTTTAC ATCTGGTGCC TTCTrrGrAc ACACCGATCG ATTCCGTGGT GTTCCTTTCT TTTCCGTAC AGGTAALACGA CTGACTGAAA AAGGAACTCA TGTCAACATC GrCTTrAAAC AAATGGATTC TATCTT'rGGA GAACCACTTG CTCCAAATAT TTTGACCATC TATATTCAAC CAACAGAAGG CTTCTCTCTT AGCCTAAATG GGAAGCAAGT AGGAGAAGAA TTAACTTGG CTCCTAACTC ACTTGATTAC CGTACAGATG CGACTGCAAC 'rGGTGCTTCT CCAGAACCAT ACGAAAAATT GArrTATGAT GTCCTAAATA ACAACTCAAC TAACTTTAGC C-ACTrGGGA'rC A.AGTTTCGTGC GTCA7GGAAG TTGATTGACC GTATTGAAAA GCTCTGGGCT GAAAATGGTG CCCCACI'TCA TGACTATAAA GCTGG.AAGCA 'rGGGACCTCA AGCCACCTTr GACCTA=~G AAAAATTCGG TGCCAAATGG ACTTGGCAAC CAGATATCAC CTATCG;TCAA GATGGTCGCT TAGAATAAAA AAA~rrCCTG CA.AGTTTAVG CC'rTGCAGGA T'rTrGCTTC TCATTAGATT AAACCTTCCA AGAGACCTTT CATAAAGTrT TCTGAGT'rAA ACTCTCCAAT ATCATCGAT T'IrrCACCAA AACCAATCAA TTTACAGGA ATATTGAGTT CTrACCAAT GGCTAGAACC ACACCTCCrC GAGCAGTTC ATCAATCrTA GTCAAAACAA TTCCCG=-AA AGGTGTGArr TCAAAATT CrTTGCCTG TACTAGGGCA 7n'rTGACCTG TTGATCCATC AAGTG3CCAAG AAGGTTTCAT GTGGTGCTTC 'rGGCACAACA CG=ATAA TACGACCAAr CTr'rTCCAAC TCAGCCATAA GGI'TATCCTT ATTTTGCAGA CGACCAGCAG TATCAATCAT GAGAA'rATCG ATACCTTCAG TCACGGCACG TTCCATACCA TCAAAGACCA CGCTGGCTGG ATCAGCTT TCAGGTCCAG TTACTACTGG AACATCTACT CGTCGGCCCC ATrCAGCTAC CTGAGCTACT 3420 3480 3540 3600 3560 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 St.
.0 *S S 5100 872 AGCATaACCT =.TACCAGC T'rGT'1-G=AG 5160 GCACCCGCAC GGAAGGTATC 'rGCTGCAACC CGGTGGGCTA G7*rTTCCCAT ACAAG7T=T ATAACTGTCA AGT'rATC~rG GAAGTGGATG AGCT-CAACCA AfrTTCTCAAT GATGACACGA AGCTTCCT CGTAACGTAG TrCCTCC=T CTCATAATCA GGAGTTCTTC CAGTTCCTcG GCJAAAGAAGG CATTCAAGCG GGCACCGAAA TATTTTICCT GAACAGT'T TTCrGTC rrCCCAACAC CATTCACACC CT'rTCATCGT AGCTACCATC CGAAGTACAT CAGCI-rrT AAGTTAGAAGC C~~-GAC AAAAArTCTT CGTCAACAGA CCrGrGCGAG TTrCTTAAC GAC=CTG GTTCAAGCAC ?'TTTCrrcA CAGrCC-rc GGTAATTCTT CTATrrCTrC TCAACTC;rGT CTTCGATTTC CC= CTTCCT GAGAAACTrC TCAAGATT-r CCAG.AGCTTC AATAGACGGT CAAACAATCC CAGGCGACAG CTTCCTCATC TCAGGAACAG CGrrTTGCAT TTGGCCTCGT CACCACAAGC TTTGCCAAAC CTGTTGCT-T GITGCTCAAGC rTCTCTTCCT C-GGAATrC TrGAGAAACC cCrACAGcTG GCrCTGA6ATC CTCTC7rGG AACACAGCTT GTTCAACAAT C-rCAACTT' C TCAAGGTAG GATCAACATC TrTACAACT
CATATCTTAG
GTTGGTCATC
AGCAACACCA
CATCACTTGA
ATSAACATTC
TC-TCGA'rTT TAGGTTCTC 1-rrTC G.CACATAr-C GGCGTC-ACTA CA=.'GCGC AGACCrGCCC Ar'rCA3ATCAT ClrrTGGI'CGA 'rTCCAAGATG T7TrGTGACC ATrCTAGCAA AACAAAGAGC 5220 CTr".CATAA 5280 GGCC'ATrTrCA 5340 ACCAACATCA 5400 GCGGAAGTrA 5460 ACTGCGGTCA 5520 TTCAGAATTA 5580 T'rCTGAGTTT 5640 CrrGACT=rT 5700 'rrCAACCTCT 5760 TTCACACAAA 5820 TTrTT-CCG 5880 TICATAGCC 5940 TGCC7tTACT 6000 AGAGAGGTCA 6060 GCTGATTAGT 61.20 CATTCACGT 6180 GGCTGCATCC 6240 AGATAAGTCT 6300 ACTTTCTTGG 6360 TGATAATTTC 6420 ATCAAGGATT 6480 ATCTTCCTGA 6540 ACCAGTTGTC 6600 ACGATCCGTC 6660 T'CTATATCT 6720 CCAATCACAG 6780 'rGTCCCACAA 6840 TCCATTGTGA 6900
TCTGGAGAAA
GATTTA.AAGA TTTCATATTG GTCAAACAAT AAGG=TCTT GAGCAAAGGC AGTCACGCAT TCAAAGTCCA CTCGAACAAA GGTCAAAGCT TCCGATTGGA TTTGATAAAC TG'TTCCTTCT TCrGrwrTT CATACAAACC TGCCACATCA TCTCCTGTAT 71rTCTGAAC TAATCCACCA CCGTrCAGTcc cTAACTCATG G.AG.AAAGAAA AATACGACCT TGATACCACG ATCACGCGCA TC7=C'GAAT TTGTTGTAGG; TCATTTGACT GGATTGAATT TGGCA-AAAG GATGGCAT CAAGAGGCAG TCATATGAAA AGACTGT'TT TTAAAAGTAA TGGTATACTC TCCATGGCTT 'rrAAGCCACG gCTl'GCAAGG TTrCC7'rGGT ACCT?~TTAT CAGTAGTCAG GCCATrATAA-GCCCTCCATA TCTTTGCTAA TTCTAAAATT CCTGCATCAT ATGTAAGTCA CAAGGTCCCG TCCAAGTCCA A'rGCAATCAA TAAGCTATAA CCGACCGTC CTTATGGTGA TCAGGTCGTG CATTTCAGG AGCTACAGGA TTAAGA'rGT CTCCAAAAGC CA'rGACCTGA 873 TACCAAGT rT=AACTAAT TCAACAATGG CC-ACTCCCTT ATCGACATAG TATCAATGGA TTCAAAGCCA GTTGTCATGG CCTTAACACC ACGACGrT AAGCCTCCCC ATc-TrCCAGC G=?CTTCTG TCAACTrGGT TGTAAATTG CTGTGATATC TTCCAAACTC GCTACr=T GGATAT~nC ATATAG1GC ?CAAATAGGr CTCATCAACC GTATCI'AGAA CATATGAACC CTTCrACcC GTTTATTGAT ATCTACATAA GGTGAAGTTT TCAGCTTTTC AAAAGT.TGCC CACGAGACAT AGTCGCC-rCA 'rACAAGTCCT GACCT-rGATA CTCTACCAAA CCGCGATG.AA AATAATGTCA TCACGAACAC CAGCAAATAA 'nTTCAGA CCCGACCCGA AGC'TACCGCA AAGTAAATCC C N1rrCC? GTAGGAAACC TCAGACGATC CATATCAAAG CGTCtArTCC CATCI'AGGAA GGTrCCGTCC CTACTAGTTT AATTGCATC CTTCAATACT TTCTAAATCT TrAACTTAA CTTGAAACA CCCGATTC=. GCATGGTCAC TCCATAGATG GAATC-AGCCG TCCCTTACCG TGGGTTACGA CGATGAACTG GCTGTCCTrG TCAAAGCGGT CCCAAAACCT TTAACATTGG CTrTCA'rCCAG CGCAGCTTCC ACCTCATCCA TGGAATAGTC TrTGACACGAA TAATGGACAA GAGCAAGGCA AGAGCCCATA ACCACCACTC ATGAGATTAA CAGACTGGAT TTTCiTGCCT GGTGGTTGGA AACCCCAGCT GTCAGCAAGT CTCCTrAGT CAAAATGAGC TCAGCCTG.AC CATCTGCTTG AAGGTCAC'rT TAAAGGACTC ACGAATGACC TCAAAGZrrG TTCCTTGACC TCATCATTCA TCTCTGTAAT GGTCTCAAGG AGCAGG=T AATATCATCA CGrTGGCTAT TTAGGAAATC CAGACGGTTG TGAACTTCTT AATAGCGTCT AAATTGACAG GACCCAGTGA GCGTATACCC TTCT-CTAAAT TCCAGA6ACAA
TCGTTTACCC
AAAATIGTCAT
TGACTCAcTr
GTCAAGAGCA
AGATAAAACr CTGCCATTr
GACAGAAATC
AAGAGAGACT
ATATCCGT'rG
CTGAAACAAT
CTGCCATGGT
TGAGCrAA'rC
AGATAACAAA
GGGCTTTTTC
CAGAAAT rC
CTCCACCAAA
ATTTA.AAGCG
TCGCACACAA
CGTACTGTTC
CCTTAAC
6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 82SU 8340 8400 8460 8520 8580 8640 TTGCTCTGCC AGATTGAGAT GATCTGGTAC TGGTCTGTTrA TTCTTGGCT TCAGCACGAG ATCCAAATGA CTAGCAATAT TrrCCAACTIC ATGCGCCTT TCTAAAG=T ATTGACTTTG TAGATGGCGC AAGCC;CTCGC TrrC=rGCG AATCCACTCT TCArrCTGCT CATCCAGT'rG ACCCTCAATA TCATCCAACT CTGTG;TAGrCT TAACCTTTrC
GGCGAGCCTG
CAAACTGCT
CGAATCAAA CCTTGTTGGA GA7MTTTTT T7WZTTTTG GAGCAATTCT GTATCAACCT TCTCAAGATT ATCAATCTTT TTCCTCTrT TCAAAATCAA CATTCCAA TTCCTI'GCCT TTCATAACCT TT'rGCCCTT GCAGTTCTGT CTTAAGCAAA GAT'T=~CCG. CCTGTTG.ACT TC?-GAAGAA CGCGCTGGAT AAGCGTTCAA TATCAG3CAAC CGAGCTTGCG CTAGCTCTTC 874 CTGCAAG'rT TGATMGCGTT C?1'GGATGGC A'1~TTGTTA r.ACT-TAATCT CTTCAATCTC AGCTTCCAGA T-M-r ~rT CACTCGAGAT Tr.CAGCAAGA CGCTCTGGC AGwr='CcTr ATCCGCCrGC CALATCTCCCT CGGAAAGACG ATCI'AITCC TCTTCrGGA AGTTT'CCACT ?CTTCAACrr GC~tGCTAGT IMrC-TGATAA GCGAGGAACA
G-=CCAAAG
AGcc-llmcTc TCAATCTGGC CTGALATACGT GCCTGCTCTC CATCTCATCI' 'rCAAGCTCT AGCAATTTCT TTr'rGTAA'rr ATTGGCACCA CCTGCATAAG C~wrCGAGAT1r' AATAGCTITCT AA'rCACTCGG
TCAAAGTCGC
GCTCCAGT'rC
AACCACCTGT
CTCTTCTGAA CCCAAGCTTrG Cr'rCrTrCr TGCCT-GATA AAAA'rGCTG'r TATTCTGGCG GCGCAACTCT GTCCCATCC-A ATGTCACCAT TGCACGCGC-A 'rGTTCTACGG TATCAAAGAT ACGAACCTGA TAACGAACTT GGCGAGCTGC AGCCCGTCCTA GCTAC-CAAGT
CTCATCTGCC
AATCG;TACC
TTTAAGGAAG
ACTTGCCCCT
ACTGACTGCA
ATCCCAAGGA
GCCTTGCATAG
TCAATAGCCT
AAGGCAATCT
CCAATAATCC
TCTTGAAAAT GGC-rCCAGT AACCTGGCCT TACAGCGATA 'rCGTCAAAGG AAGAAACGTT TC =rGCCGA CTCTTCATCT CTAGGGCAGT 'rTGATAATAA CACCTAGGCG ATCTTN'TCT CTAGTATCAA AAGTCACCAA GCATCTGGT TCIGACTACA GCACGACCGG CTCTGTTCCG TCTACGATGA TATGCTGGCT ACATCAAAGG TCAGATGCTC TGGAG.AACAC TCTTAACACC 8700 8760 8820 Baso 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 TCCATAAAAC TTACTATCATrrCTCAGGAT ATTTTCCA.AA C7T'wTGAGCTC GrrnTGAGA TT'ATCCAGAC GGTCAAAGAG TTGGCCTrrGT TGAGCTTGAT CTGCTCCTCT, TGCTCCTTGC CAATAGCTTG GTAGTCAGCC- AATAATT1CT CTGGCAGTT TCAACCTT CCTITTTGCTG ACTAGC=TC TCT0'rAGCTA CTCTrrCAGC TTTTCTAGTT GATCTGCNTG 'rTTGAGAA AGCTGCACGAC CTCATTCTCA ATACGGGTCA ACTGG'rTrGA GACATCCGCT TCTTCTTGTA AAAGCGTTCA CGTAAGAGCT CAATCATCrG ATCAGGATCG TCTGAGAAAG
TGGCCTGC-IT
AGGAAGTT
GAACCTGCTC
TAGCDAATTG
TATTTTCCAA
AAAGAGCTAC
CCAGCAAITTC
AGCTTCTAAA CGA'IrGAGTT AGAGCITTTCT 'TATCAGACT CA.AACCGGCT TCTGCCTCCT GCTAArrTTT CTTTCTAAAT ?rGGCCAT'r TCAGCCTGTA TAATTTTCA -eGCTTTTGGT TTCCTGTCGAC TCTAGTTCAG I"T'rATTATT T'rGGACTAGA TTTCCCI'CIA ACAGAGCTAA TTC1TTGCT GAGTGAATT'r cCT=ATCCT CCAAAGCAGC GT"TGATT.'CAA GGCCACTTGC TCGGACTCCA GTT'rCGATAG CACTAATCAG AC'TAGTCAAC 'rCCATCAAAC TGCCTTGGTC AATCTTGGCG TTGCTTTTITA AGAG1rGAT TTTCTTCTTC AATAACTCAT CAACAGTTCT TGAACCTGAG TCAACTCTTC CCTTArT"rC CTTGAITTGA GCAACCAGAA CATCTAAATA AATAGCCTTA CG'N'GTCCTT CCAACTCTAA AAACTTACCG GCA=TC'AG CTTGCMCC
AAGAGOCTTG
CGAGTTTrcC A~CCAGCTcr
CTTCCCTTG
ATTTGATTAT
TGc-AGTr'rAC
TCAAAAATAG
GAAATAATAG
CCAACTCGTA GATAATGTCC TCTAAGCGGT CCAGATTATC TCrCGGTTrC IT1'TCTG.CGA GTCTTGTATT rrAAAAcCC CTCGTCGTTC CTC-AGGCTTO GAATTAAAAA TCCcA ATGAATATCA CGCAGACGGA ATAGACATGG CGTTCCACCC ATCCAGAGTC ACAACTACAG GATGATATCC GGCATCTTGC ACGCAGAC-.T TC'TGTAATAT ACCTTGGTCA AAAACGACCT' TTCCTTTAAA TACATGAATC GCTAATTTCI' 'AGAACGACC ACATCAAAAA CCTrATCGTG ACATCACCAT TGACCTGAAG AACTCGCCTG, CTTCAACC-- ACA'rCCT'rA'r CCAAAAGAAG TCACGATTGC GACCACC-TGA AGAAGGAA'rC CT'rTCTTGCC TGArrCIG
AAGCATAATT
CCCCACGG.AG
GTCAATCT'G
ACCTGCATCC
GAGCGGTTTG
ACTCTTGACA
'rGGACTCC AGATCCATTG TGG'CTTA'rC AGCAAAACAC CAACCCCC?1'C TCAACGGCAT TTGCCCTGA CCGATGCTCT AGCAGGCCCT GTrCAGAAA (C-A=CcGG AGATGGGT TCGCCCAAT CCACTATCCA AGAAGACGTC TATTCGCrAT CTCCAC1ACG TTGA'rAAATC CC.TC-ATGATT CGAC 'rCGG TTCCAGCAAA CTAGACTCCC CCAAAGCCCA GGTCCAACAA CTGCCGTCAC ?TGAACCCCT GAATNTCGAT 7?m'GGCAGC TrccTGCTCr TACCTTCAAC AAGAACTTCT 'rCACCT4GATA ACGAATAGCC TAL'AG-CTGT AATCAICTCA 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280
AGGAATCATG
GGCACCAAGA
TTrTTCTTcC TGGCAATCAC GCGCAAAACC AGCTAAACTC GATAGGTCAC CI-rCACGCT T'ITAGGATAT TGTAGAACAG CGTCTCCTAA AAATTCCAAG TGC1TCATrrGG CATAACTCGT ATGAGTAAAG TCGATTGCAA AATGATTCTT TAGTACA=I ACTTGATAGA TAAAI'CZ!'T AAGGCTTCAA AGGCATCACC C=TACCC.A ACTTGATAAA TCCTCACGGA CAATCATAGC rrTATATA GA'1TTCI'GA CGTTCATTGT GTGAATNT GCAG1TTCCA GTAA==1'T TGTAAr= TCATACCAAC GACCTTGGCC 11340 AAGAATCGTG 11400 CTGGTCAAAC 11460 ACCCAGTTT 11520 AATCAATAAC 11580 TAAGACCCGG 11640 GTCTGCAAAT 11700 CTCTTTCTAA 11760 AAAACGGCAC 11820 AATTTGG=t 11880 CCCTGAGGGA 11940 AAGGGTGGAA 12000 AAGACCCCGT 12060 TCAACGTGTT 12120 TCTTGTTCCA 12180 crr-ATAATAG Tcc?T1TAT TATATcAAAA AAA~cccccT. GAGTCrAcTcT TGGAAAGCAT TrGGGAATTC TrXTAGACAGA GATTCTCAGT 'rTTAGCCGCCA AGCATAAAGA AAAAAGCCCT ATTAAAGGCT TTTAGCATG TTTACATCCA ATcGAAcccc cMTccAAG.A ACCGAATCT TAcGTGATAT ccATAc1' ACTTGTTrrA TTATAACAGA AATTTGCTCT AATAACAAGT TTTTGTCA CTTrAGTGGCA AGCATCCCCA TTCCAGATGG AGNT=CAC GATCACATAA TAAGGTCAGC AACCTGACGT CCACCTG.CAT AAGAAATAGC ACTTGAACG TCrCAGTrAA AGTGTCTrGC AGATGACCT TAGCAGGAAG CAAGATACGT 77GCCTCCA 12240 CA? rTrTGTA AGCACCr-r TGATATTGTG AGGCTGAACC ATAATATTCT TrCAACTG r 12300 CACCATCGAC TTCAATCGTT TTCCCTGGAC! TTrCAATGTG 'rCCTGCAAAG AGCGAACCAA 12360 TCATGATCAT GCTAGCACCG AAGCGGATAG ACTTAGCAAT ATCACCGTGA GTACGAATTC 12420 CT.CCATCAGC GATAATCGGT TTACGCGCAG CCTrGGC.ACA CCAGCGrAGA GCAGCCAACT 12480 GCCAACCACC TGTACCAAAA CCAGTCTTAA CC-TGGTGAT ACAAACCTTA CCAGGACCGA 12540 ?TrCCGACCTr AGTAGCATCC GCACCAGCAT T-TCCA.ATTC ACGCAC.AGCT TCTGGGC= 12600 CCACATTTCC AGCAATGACA AAGG'TATCrTG GCAATTC-TT CTTGATGTGT TGAATCATAG 12660 AAATC.ACGCT ATCCGCATGA CCATGAGCAA TATCAATAGT GATATACTCA. 
GGAGTATCAG 12720 CCTTGAGCTG GCTAACAAAA TCA'rACTCAT AATCCTrAAC ACCGACAGAG ATAGAAGCAA 12780 TGAGCCCTrG ATTGTGCATT CGrrrAATAA AAGGAATGCG TCCTGCCTCA TCAAACGGT 12840 GCATAATG'rA GAAGTAACCA CCTTTAGCCA GTTGCTCTGC TACA'TTCA TCCAAAATCG 12900 0 0 TCTGCATATT CGCTGGCACA ACAGGTAGTT TAAAGGTGTG ATTTCCTAAA GTGACACTTG 12960 *00 TATCCGCTrC TGCACGGCTT TTAATGACAC ATTTATTTGG AATCAATTGA ATATCTT1CGT 13020 :.00 A-TCAAAT TGGAArCA TTTTCATAT CGATG'rCrCG T1I1CI1=I GT ATGACCrAC 13080 *0 **0 *CTATGCTCTT GCATC.ACTAC GCCTrTTCCG ACGTTT1CCTG G 13121 0 INFORMATION FOR SEQ ID NO: 127: SEQUENCE CHARACTERISTICS: LENGTH: 9578 base pairs TYPE: nucleic acid STRANDEONESS: double D TOPOLOGY: linear 00000 :0 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127: 000CCGAATGCAA TGTTTACGGT TGAACTTGAA AATGGACATC AGATrTAGC AACAGTTTCT 0 0 GOTAAAATTC GTAAAAACTA TATTCGTATr TTAGCGGGAG ATCGTGTTAC TGTCGAAATG 120 AGTCCATATG ACTTGACACG TGGACGTATC ACTTACCGCT TTA.AATAATC GAAAAACTTG 180 GAGGGATAAG AAATGAAAGT AAGACCATCG GTCAAACCAA TTTGCGAATA CTGTAAkAGTT 240 0ATTCGTCGTA ATGGTCGTGT TATGGTAATT TGCCCAGCAA ATCCAAAACA CAAACAACGT 300 CA.AGGATAAG ATAGAAAGGA GAAAAC-ATGG cTCGTATTGC TGGAGTTGAT ATTCCAAATG 360 ACAAACGCGT AGTAATCTCA TTGACTTATG ?TTAPTGGTAT CGGACTTGC.A ACATCTAAGA 420 AAATTTGGC TGCTGCTGGA ATCTCAGAAG ATGTTCGTGT ACGTGATCTT ACATCAGATC 480 877 AAGAAGATGC TATCCGTCCT GAAGTGGATG CAA'rCAA.AGT TGAAGGTGAC CprrCGTCGTG 540 AAGAAACTT GAACATCAAA CGrTT-GATGG AAATCGGTTC ATACCCTGCT ATCCGTCACC 600 GTCC1'CGACT TCCTGTCCCT GGACAAAACA CTAAAAACAA CCCCCG-CACT CGTAAAGGTA 660 AAGCTGTrTGC GA77r.CTGGI AAGAAAAAAT AATATAGGAG GTAAAAGTCT TGGCTAAACC 720 AACACGTAAA CGTCGTCTCA AAAAGAATAT CCAATCTGG;T ATrGCTCA'rA TTCACGICTAC 780 A=rAATAAC ACTArrCTA TGA'rrACTGA TGTGCATCGT AATGCAArrG CTTGGTCATC 840 AGCTGGTCT CrGGTTTCA AAGGTTCTCG TAAATCTACA CCATTCGCTG CT.CAAATGGC 900 TTCTGAAGCT GCTGCTAAAT CTGCACAAGA ACACCCGTCtV A).ATCAGTTG AAGTTACTGT 960 AAA.AGGTCCA CGl'rCTGCTC GTGACTCAGC TATTCGTGCG CrGCTGCCG CTGIGTCTTGA 1020 AGTA.ACAGCA A'rrCTCATG TGACTCCACT GCCACACAAT GCTGCTCG;TC CTCCAAAACG 
1080 TCGCCGTGTA TAATC-ATCGC ATTACACTGC T'rrCGTTTA AGACGGAGTA ACTAAATGAT 1140 *CGACGTTTGAA AAACCAAATA*TAACAAAAAT TGATGAAAAT AAAGATITATG CCAAGTTTGT 1200 *AATCCAACCA CTTGAACGTG GCTACCGTAC AACTCTTGGT AACTCTCTTC GTCG'rGTACT 1260 TCTAGCTTCT CTACCAGGAG CAGCTGTGAC ATCTATCAAC A'rrGATGGTG 'rGT'rACATCA 12 5GTrGACACA GrCCAGGTG TTCCGTGA6AGA CGTGATGCAA ATCATTCTGA ACATTAAAGG 1380 SAATTGCAGC AAATCGTACG TTGAAGACGA AAAAATCATC GAACTGGATG TTGAAGGTCC 1440 5TG--TGAACTA ACAGCTGGTG ACA7=rGAC AGATACCCAT ATTGAAATTG TAAATCCAGA 1500 TCATTATCTC TTTACAATCG GTGAAGG'rTC TTCTCTAAAA GCGACTATGA CTG=rAACAC 1560 wT=GCGTGGA TATGTACCTG CTG.ATGAAAA TAAAAAGGAT AATGCACCAG TTGGAACACT 1620 TGCTGTAGAkT T=rATrATA CACCAGTTAC AAAAGTCAAC TATCAAGTGG AACCTGCTCG 1680 TG;TAGGTAG;C AATGATGCTT TCGACAAATr AACCCTTGA.A ATCTTGACA6A ATGGAACAAT 1740 **.TATTCCAGAA GATGCT~rAG GGCTTTCAGC ACGTATTG ACAGAACATC TTGATTTGT 1800 *TACAAATCrT ACTGAGATTG CTAAGTCAAC TGAAGTGATG AAAGA6AGCTG ATACTGAATC 1860.
TGACCACCGT ATTTTAGATC GTACGATTGA GGAACTGGAC TTGTCTGTGC GTTCATACA-A 1920 .CTG=~AAA6A CGTCCGGTA TCAATACTGT GCATGATTTG ACAGAAAAAT C'TGA6AGCAGA 1980 *GATGATGAAA GTACGAAATC TTGGACCCAA GAGTTrGGAA GAAGTGAAAC TCAAACrCAT 2040 TGATT'TGGGT CTrGGA'rTAA AAGATAAATA AAGGAGGAAT ACATGGCTTA CCGTAA6ACTA 2100 GGACGCACTA GCTCACAACG TAAAGCAATC CTTCGCGATT TGACAACTGA CCNTTrrATC 2160 AACGAATCAA TCGTGACAAC 'rGAAGCTCGT GC1'AAAGAAA 'rCCGTAAAAC TGTTGAAA.AA 2220 878 ATGM-rACTC TAGG=AAACG. TGCTGATTrG CATGCACGCTC GTCAAGCAGC TGCIIGTA CGTAATGAA6A T'CGCATC1'GA AAACTATGAT GAAGCAACTG ATAAGTACAC TTCrACTACA GCAC7*rCAA6A AATTGTTCTC AGAAATCGCA CCTCGTTATG CTC.AACGTAA CGGTC-GATAC ACT-CGTATCC TTAAAALC-GA ATCACGTCGT1 GGTGATGC.AG CCCCAATG.GC TTAGTATAAA ATCATCAATT TT7TAGTG TTATGATGAT CCAGTCTTGT
GATCATCGAA
GCTCTTACTC
TAGCTCT=G CTACCGCTAG GA?1-rCGGTC CTAGCGGGAA CACTCATCAT AAGrGGGAT AKAC'TrTAA CCAGCCCT'r
AGTAGACCCT
TTGAGTATrr GrrT'rAT'r
AAAGATAAAT
ATTGACGATA
AGTGA'rGAAA IrGAATGTAA 'rG=TACGAA A~rGrT=~ TCTrAAGAAC TC=TAGAA'r TATG=-ATAC TATTTGAAAA
TTAAGAAGAA
CTGGAATTGT
TCTCTCAAAC
AGCAAGATTT
AAATCAATAT
TTGGAGT -TA CTTATGAAAG TGCAGGTGT 'CTGGTAAAA TGTCTTGGAT GAATATTTTA TACCTA'rCTI' CGTA6ATGAAT TCAG.AGTGCA GCGATTTCG GAA'rCCrIG=TT AATGTTAAG CCA'rrATAAC TG;TTGTT-.GGT TI'GCAGAATT AGGATTGAA'r CATGATGC TG?=GATCT TTrGAAGC-,r- TGCAAACT AACCTATGTA TAATATCTAG CCATGATTGA GGAGCAAAAC ACTCTATCGA TCCAGATATC AGGCGGCTAA T-TATAGCT GAGGTCATCA TGGATATTAG ACAAGTTACT TTCGATATTA GAACCATTAC CATCGGGA'rr AATCGTCCG CGGAGAAAAr CTATCAAAAA GTTGGTGATG; AAATTGCGGC TCGAG7'rGGGA ACACCTATTT CTCTGA=WG GGCAGCGACA GCGCTTGATA AGGCTGCGAA NAGAtGG= GTACAAAAAG GTTATCAAAA GGGAGATGAG GCTGAGACGG A'rAAGGTCTG CTCGTC-AGTC
GA.AACCATCG
TCTCT'TGG
ATTACGACAA
2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420.
3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 ATTCCTATCG 1TTAATAAGCG GATGCGACGG ACTACGTGGT GTGCACTT'rA TTGGTGGTTTr ATTCI'CATCA ATTCCATTCC AATATCGGCT CAACCAACTC
TGTATCCGGTG
TCTGGCAAAA
TTCTGCC7TTA
TCGCGCTTTG
TGGTATTAAT
T'rCAGATATG
TATGGCGGGT
AGCCGAAACA
ATGACGGCTG TGGC.AGATAT GGAGTGGCCA AG'rTGGTTGT GCCTTTCATG GTGTGCGGA GTTGTGAAAC GTG.CITTGGA ~AAGAAAA CTGCCTTTAA GAGAAhCTGG GTGTGGAGTT Gr.AGACTCTGC 'rGCAC=rG ACG.ACGGCTG CCTTGGCCCT GGGACGAAT'r ATCAAGGAAA CAGCAAATCT ATTCGCTAAT GCTGTTGAGG ACA.ATCCATT AGC.AGATGT A'rCATCAATG TCGGAGTTTC AAAAGTTCG;T GGACAGAGCT TTGATGTAGT AATCACrCGT ATCGGTCAAT TCGTTGGTCA AATGGCCAGT TGG TGTG GACTcAG-r CCTTGAGGAA ATCGGCCAG CTTGAACGAC CAAGI-rAAAA TG.GCACCAAC CCCTCCGTTr AAACAGTTGG CACGCCATGGA ACGGTrGGAGT GATGG=C AACCAAGTCG GTGGTTTATC TGGTGCCT?1' ATCCCTGTr CTGAGGATGA AGGAATGATT 879 GCTGCAGTGC AAAATGGCTC TCTTAATTTA GAAAAACTAG AAGCrTGAC GGCTATCrCT TCTCTTGGAT TGCATA'rCAT TrCCATCCCA GAAGA'rACGC CTGCTIMAAAC TMATCGGCT ATGATTGCGG A'rCAAGCAGC ATTCCCAAAG GAAAAGAAGG GTTATG.AAGG. -rAATC.GGGC GCACCAATTC ATAGrTAA TTAGACGTGT ATACTATAAT CAC'TGAATTA CT1'TGAATT TTTTCATAAT AATCTCCTTC GATATAATAG AAGCAAACGG CA'rGGcAAGA CCATGT1'TAA ACTGCTCAAG GI'GAACGAGG CAGTTrAGC GTG.CTATTC CT'rGAAGAAC TTGGCIIrGCA 'rGGTGTrTTCG CTAG''rGCA CCGTATCTTTrA ATGTGGACCA GTAGAGGTCC ATACAGCrGG GAAGCTTG AAATGATrTGC GTCAGCCATC GAATGACTAT AA"TCGGTGT ATCAACATGA AAACAACAGC TGTTCGTATC CGATATCATT GAGTTrGGTG G=rAGG AACTGCACCC TTCGTCTCTC GACTTCATCT CTCGCCGTGC ACAAATCCCA AAATTAAG-AA AATAGGAGAA ATTTTAAGTT CTATTAAGA CATAAATAA AGACCTCCTA ATATAm.-G AAACAGATAA GATTTTCATC TAATAPC=r A'rrrAATCAA CTCCTAAACr AAAAGTCGCC TGTATGGGTG GCqr.rATTT TrATCATTCAT AGGACGCGAAA ATGGTAAAAG TACaATTGTA TTTGG'rACCT CACGArrGCT
GATTCAAGAG
GAGTGATTCT
GGGGGAAATrC
TIGGAGCCTAT
CGTTrCACCAA
I-TGGGCTGA-A
AAAAGAAATG
TGGAACCATrr CGCGCGCAAG GrGGACCGA TACTCCCTTA 'rTAGGAATCG G7'-GCGAGA ATCTGATCTA GGTCGTACCA rrCAGACCAT GGGAATTATC CCTI'ATCCCA TGGA CAAGCG GATGGCGATC 7'rrrCA'rGGG 7=GCrTrwATG CTGAACTGGC GGCrGGGAAA AACrCAGTGG GAAGATCAAG GTGGAGGTAA GTTI'ATCTGA TTAATIOGCAT GAATATGAGG ACGGCCAGTT GGACGTGAGA AGATIGGAAGA AGGGA'rTTGA TAAGAATATC TTrrGATGTG CAAAACTAAA CAAGGAACCT ACTGTTAGAA
TATCAGAGAA
CATTATTCCT
TGACGGCTTG
CCGAATCAAG
CcccTTGTr
GCATCCGCAT
TAGGGTTGA
AGGCTCTATT
AAGA'rAAGAA
GTGTAATCCT
CAGTCAGGAT
4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 GGTCTGGATA ATGGTAGCGT GACAATCCTT GrrG1CGGTG ACCGTAGTTA CCGAGAGCTA TAA'rCAGTCT AGACTCT GCCATGAGCT AAAACAGCCG AGGGCACTCC 'T=CGGCTGT ATTGC7=TA GAGATTTTCA TAAACAAGAG -ACTTGACAAG G1-rGCGGCTA CACCGTAATT TCCTCTGAGA ACTCATTGTT CTTGI'rTTrGA CATTGTAGAG GAGGATAGAA TGTGACTCAA GATAAGATGG CTCCAGAAAT GATACCAGAT CTTAGCAAAG GTATTCACAC GACrACTTCC TAAGCTTTCA ACCTCTGTAT AAATAGCTAC GTAG.AGAGTT TIrGAALATCAT AGCATCATTG GAGTTGTAAT GCAGCrCTT CAATACTTGG TCCTATTTGT TGTAACTAG CAACAGATGA GCGAATAG;TA TAAG.GTAATC TTCTGGCAGA 880 TAGAGACATA ATCAAGATGA AAGCAGTCCC TCTAATCATA AGAAATCCAC TTCCAAATAG 5820 ACCAGI'ATTG AAGGAAGAAA TGAAGGCAAT CCCTAGAACG GTTCCTGGTA CAATATAACG 5880 TACCATACTG ACCTVTCA TTAAGTcr AAACAAATTC CCrT-TCTAA CGGCTAGGTA 5940 GGAGATAAAT GTCGCAAATA GAACAACrAG AACTAAGGCA ATCAAAGGGA TACGAATGG-T 6000 ATTGAAAATA GCAGATCCCA TACCATGGAA AGCTACCTTG TAACTrGTrrG GAGAATAACC 6060 '17rAACAGAT ACCATACCTG A'rGTT=rAG CAAAGAGGTA TAAA~rAAGT AGATT-GAGG 6120 TAAAACAGAG ATAAAGATAA ITCCG;TAGAC TGTTGCATAA ATGGCACCC.A =TTrCTTT 6180 TGTAGrTTT rAGCrCAA =TGATGGAG CAGAT!'CATC CTCA.ACTGT AGCGGTTTGC 6240 AATGTG=1 T TGGATAAGGA AAATTGCCAA GC;CAATGA'rA ATCGCCATAA TT1GCAAAGC 6300 AGAM-wrrCCT CCAACcCGCC TAATAAATTG GCTA'rAAATC ACCACACGA AAG~TCCATA 6360 CCCTTCGCCA ATCAACATAG GCGTTCCAAA GrTrGAGAAT GctrC-CATAA ATACAA~CAA 6420 GGAGCTGC'rA GTAAGCGrGG AACTAGGAGA GGTAAAACA6A CCGTTACCAT AGGTTTAAAT 6480 *CCGAAGGACC CCAPGCTrTC AGCTGC=TCA AGTAGAGAATI TGTCAATACT GTrCATTG7TT 6540 *CCAGCAACAT ATACAAATAC CAG'TGGGAAT AGTTGCACTG 'rAAAGACAAG TACAArTCCT 6600 ***TTGAATCAAT AAATATCGAT AGCTGGA.AGA TAALAGGGCAT -GC-AAAAA TTTAGTGATG 6660 ***ACC'rCATTTC GTCCTAGCAA GAGAACCCAG GAGTAGGCTC CTACGAAAGG AGCTGACATG 6720 C AAGCAATGA TAATCAATAT TTGTAGAAAT IPTCrTCCCCT TGAAGTCATA CATAGAGAAG 6780 *AGATAAGCTA ATAGGGTTrCC TACAACrAAC GAAGTGATAG TAGCGGTAAT GGAAACCTG 6840 AAACTGTTGA CTAGTGTCTC AGAGTAGTAG GCTTACTAA AGAAAGTGAC 
AAAA'rTAGCT 6900 *AGTGAGAATT G'TCCTTCATG TATAAGT=c TGCTTGACCA CGGTAACGAT AGGATAAACG 6960 *AGAAAGATAG GATACGTAAG AAAGAGGAAG AAAGAGGAAA CTG-CCAAAT ATTAGTTTT 7020 TTACGTTCCA TGGTTGACTC CTT'rATCAG GrTrTGCGAA CCATCTGCAG AA.AAGATGTT 7080 ***TAArrTTGC GTATTGA2'TC GTAGACGAAT ACGATGCCT 1'MwMTAGAT CTTCTTCAAA 7140 *AGTTGArrCT TCACThACTT GAATTTTTGA GGCAAAACCT GTCTCAATGA AATAATCCGT 7200 ATTTAGTCCA AGATAGACGC TATCTCTAAT AGTTCCTTCA ATATCTCCAG ArrCATCr1-r 7260 *GATAAAcTcT TcGGG.ACGAA TG;CTTACATG; AATAGC7-iGC TCCTCAACCT GA'rCAAGAGC 7320 TGGCATTCGA AGGGCA'rAGC CATCTGAAAA GACGATATAA GCGCCGTCGC TCCGTT'rTC 7380 AAG.ATTGGCACGGGATAATAT rrGcGTcc GATAAAG=~ GCC-AcAAACT CATTAGCTGG 7440 TTTATGATAG AGTTCTNrT G TCGGCCGAT rTTGGA'rC ACCCCATCTT TCATAACAGC 7500 AATTTGGTCT GAAATAGCCA TGCTCTC TTGGTCCTGG GTTACATAAA CAG=rGTA6AT 7560 881 TTGACGCATA TCCCACTTCCG TcGrcGM-r CTCCGA'rGGC TCCAAGCGAA GI'T1GGCCTC rAACCGA AGGCGCATGC CAGATTACTA AGTCGCT=C CCATGAGGAG AACACTTGCA
CAAGGTGACA
AGCAATrGC CTTc~rlrc cGTTCTrGTT GTCCACCACT ATCACTTCAA GATAC?1'G?1 ATAAGACCAA AAGCAACGT GAGTTTA'rCG CGCTTTCGAT GCTCTGTTGA ATCAATrTr crCCGGACA G'TCAAATCTG G'rAGTrTGG AAAACCATCC ATC.ATCCAAG TAAAATTCTC
CGATATTGCG
CACCrrCGA'r GGTCGCTTNL C CCACATCCTC AAGCTCCAAG AATG rA T-rCTCAA'rAA CAGGGACATC GATCTCACTC ATAGrAACC ?CTrTTACTG GTATTCTTA ACGATATCTG A~rTATTC-r TT'rGA7=TG TCAALTTGC=~ TCATG7'rTTC =TAGTAGTG GrN'GTACCAA GTGTATCTTG TTCTTrGGCA ?Tl'CCATAT 'ITrTTAGATTT GACGGTTCCT TCT -TCGAT AGACTACCTT T7TCTGCGT TCCATA'r-AT ACTGTTCAAA CC-CCPLATCA AAGGGTAAAG AGACTCCTT GTGGTAGAT TTrTGGCGT TTTAGATTGG ATATCTCTAA GATGACATAA TCA'rAATCT GCITrGTTTTA GCATT=rAC TACTTCI-rGA GAGA'rAATAA r'rTAACGA'rA CCAGCACTAC CCG-CATATTG 7740 CT7"rrGGAAC 7800 CGAAAATAGC 7860 TGATTrTl'GT 7920 TACGAAGAAG 7980 TTGGAA?'rCr 8040 TAATAA-=T 8100 AGACTTCCGr 8160 CAGTGAGTGT 8220 GA.ACAGGACG 8280 AA'rCGATAAA 8340 CAGGTAGCAA 8400 0 0* TCCTGGATCr
ACTAGATGAA
CCA.AGCCTTA
GGCGCTAGAA
AAGATCGrrTA
ACTACCATCT
ATTATCATTT
ATAACAAC CA TCATAAGAGA GACCAACAC C7rrGAACCGA T-1rMACCATC TCATCTrG AACCACCTTG GAGTI"MCTG GGTCAGCAGT TATCCTTCGA TGTTCATCC AGTGTA'rAAG GAGTAGAGTA AATGTTAGCT CCCTCATTwrA ACAGTTTAAC CATT=TCCA TCACACTA CTTT-ATAGAC AATAACTCTG AAAAGATCTT TTACPATAAGA ACTGTAGC ATATTTG=A 'rGCCATTTTr CCVrTTAGTT TTTAGrrAAA TC-AGGGT'rGA GCCAGTTGTG ==rTGATATT
ATTGAGCAAA
CAGCTTTGAA
CGATTAAAAC
CTTTGATAAC
TATATTGTGT
CTAG7'=T TTrCTTCAAA TCTTnGrAAG TATAG1TTTTC AAAGAGTTCT CCAAAGATAA CATCAGCrAC AGGAACI-rCl
CCGTGGG;TAG
TTrTCTGACT
ATACCATATT
GAAAAGTTCT CCAGTACCAG CTTGAATCAG 'ITCTACTT GGCACGAATA G'rTGCTCCAA TTAAGCCCTC TGAGrGCT GAA'rAAACGA CTAGCGAACC GCCCTCTCCT TTA'rCACATO AACTGTCATC GGCAGATTCA ?rAGAAGAAC AAG3CAGCATA ATACATCCAT TT-C~rr=CA TGATG.GATAC CTCCTGTG rrTTAAAGT rrA'Ir"AAA ACAATGTAAG CCGM=?AAA ACATACAATT CTATTCTATA GTGTA'rGAA TCTATAACAG TACACTr'rGA CTGCTAAAAT ATTTCTATAA ATTAATTTCA CTTTCCTGAT AGAGATGTTC 882 ACATCTTATT TCAATrCACT ATATTAGAGT AAAATTCrCT ACAAAAAGAA GAATAGCCTA 9360 TMTACTATT CTrCrGAGTG ATrTCAATTC CP'rGGGGAA ATATGGAGAT ACN-rwTAA 9420 TCCTGAC.AAA TGGTTG~rrC TTTTTCTAA6A rCGGTGATAC TGTATCZ-GAG AA'rGCGCGTG 9480 AGGT CACAAA GGCTGCGATA GAGCTTCrAT GGAGAATrT "rT"-r=AGA cM-r11-z-r 9540 AGGAATGAGA CATCCG=1AC CTCCTGGAA GGTTTTG 9578
INFORMATION FOR SEQ ID NO: 128: SEQUENCE CHARACTERISTICS: LENGTH: 13440 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128:
CGGGCTGTT~G TGACCATTCT TATTTCTATC TGTGTTATCT T=~GGGAAC TATIrTGGGT GTTGTCTTGG C~wrTTGGGCA ACcTTCAAAG TTTAAACCGC ?I'GT'rTGG-r GGCCAACTTG 120 *TACGTTTGGA rrTTCCGrGG GACACCGATG ATGGTTCA.AA T'IATGATTGC CTTTGCTCT 180 ATGCATATCA ATGCTCCCAC TATTCAGATT GGAATTTT'AG GTGTTGA'N'T TTCGCGTCTG 240 ATrCCAGGGA TPTTGATTAT CTCTATGAAT AGTGGTGCTT ATG=rCGCA GACTGCTTCGT 300 *GCCGGAATCA A'rGCGGTTCC AAAAGGTCAG CTAGAAGCGG CTT-r'TCCCT AGGGATTCGT 360 .CCTAAAAATG CGATGCGTTA TGTGA7MNG CCACAAGCAG TCAAAAATAT CTTGCCAGCA 420 TTGGGGAACG, AATTTATCAC CATTATCAAG GACAGCTCCC TCTTATCAGC TATTGGGG-C 480 *ATGGAGTTGT GGAATGGGGC TACAACAGTr TCTACA ACA6A CC-TATCTACC TTrAACACCA 540 *CTTTrATTTG CAGCATTTTA CTACTTGATT ATGACCTCTA TrTCTCACAGT ACCTTGAAA 600 GCTTTTGAAA AACATATGGG ACAACGAGAT AAGAAATAAT GACAGAAACC TTGATAAAAA 660 *.*TTGAAAATTT ACATAAATCC TTTGGAAAGA ATGAAGTATT GAAGGGCATC AACCTCGAGA 720 *TTAAAAGAGG AGAAGTTGTC GTTATCATCG GTCCTrCAGG GAGCGGGAAA TCTACCTTGC 780 TTCCCTCTAT GAATIPGTTG GAAGAAGCAA CCAAGGGGAA GGTTATC?1Tr GAGGGAGTCG 840 ATATTACGGA CAAGAAGAAT GACCTG=rG 
CCATGCGTGA GAAGATGGGC ATGG'rNTC 900 AACAATTCAA TCTCTTTCCT AATATGACTG TGATGGAAAA TATCACCTTG TCCCCTATCA 960 AGACCAAAGG -rGAcAGTAAG CCcTTGCAG AGAAAAGAGC TcAGGAAcT TTGGAAA.AAG 1020 TTGGTTTGCC AGATAACGCA GACGCI-rATC CACAGAGTTT GTCAGGTGGC CAGCAACAGC 1080 GGATTCCCAT CGCGCGTGGG TTGGCTATGG AACCAGATGT ?PrTGCTCTPT17 GACGAGCCAA 1140 CTTCACCcCT AGATCCTC.AL AGTCAGGAAT GACC-ATG~r ATCGTG-rCAT
TTGAACAAAC
T-rGTprrAGCT CAGAGCr'rrr CATTGTATh-r CTrTA'rGCCA
CCAAGGACAA
A7~TTGACC
TCTTATAGTT
TqrTGGTATAA 883 ATGGr'rGGAG AAGTTCTGGC 'T"ATGCAA GA3TAGCCA ATCGTAACAC ATGAGAT1GGG A11IGCCCCT CAGrGGCAG GACGGTC=G TTG?1'CAAGA CGGA6ACACCT GAGCAGATT AGGACTAAAG ACrTCTTGAG TAAGGTIT=A TAAGTTA~cC AGC~wrTAAAC GTAAAGAGA AGAI-rAGTGA AAAGCTCAAC TAAAGCTATA GGATTGCCTA GGAA.AGAAGT =~AAGCTA TTAAAGATAT 7rGTAAGAAA AGAGAAGTGA TATGACACAG ATTATTGA'rG GGAAAGC-r AGCGGCCAAA TTGCAGGGCC AGTTGGCTGA AAAGACTGCA AAATTAAAGG PLAGAAACAGG TCTAGTCCT GG7rGT=AG TGAXTTITTGGT TGGGACAAT CCAGCCAGCC AAGTCTACGT TCGCA:ACAAG GAGAGGTCAG AGCCAAGTAG TACCGGGTTCC AGAGACCATI ACTCAAGACG AAATACPLATC AGGATCCACC ?rGGCATGGG ATVTGGTTC ATTGATGAAG ACGCGTCT ATTGGCTAZr CACCCAGAAA CCTCTAAACA TCGCGCCGTCT TTGGTCTGGT CATCCAGTCA GGAATTATGG AAA'rGTCCA 'rGAATATGGG ATTrGACrrGG
CCCTrTC;GGc
AATTGAGA
AGTTGCCATT
AGGATGTGA
TrGA'TCCTTC
AAGGTAAAAA
?TCTTT'rGGC TGGTrrCCG;T CCTGA'Ir~Cr
ACCAAAACAC
TCGTTTCCAT
GACACCCGCA
'IGCACTCGTC
AAAGA.ATGCA
1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980.
2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 ATCGGTCGAT CCAATA7T ACAGTALACCT TGAC-TCACTC AT-CTGGTTG =PGCAATCCG GCGGTAGTCA TTGACG7'rGG GATTATGAGG CGGTTGCCCC CCTATGACCA TTACTATGCT AGAAAATAAG ATAAAAAI'M TAGAAATGAA TATTAAATTT CTTGTATAAT CGAGTGACGIY TACAATGGGA TGGAATAThC CCGAA.AACCT ATGGCCCAGC ACGTACTCAT AATCTTrCCA A=GGCTGC AAAAC.AGAT
TCGTGCCAAG
GATGA.ACCGC
ACTGT=AC
GATGGAGCAA
TCGAGGAAA
TAGAAATAAC
GATTAGATIGA
TACTTAGCAC
TTTGTGACTG CTr-AC-T=r CAAACCAGGT GATCGAAAA'rG GTAAGCTCTG TGGCGATGT CACArTACGC CAGTCCCTGC AGGTGTCGGT ACCTATCAGG CAGCACTTAG GACATTGCAT GTGTAT C TATAGCTA'rA TCTAAAATGA TrrATAAAAG GAGcGrTMCG CCTCCTTTTT TTTTAAAAAT TT-ATAATGGG GAATATAGTT TAATTGATTA TCCAAATATT CAAGAGTGGG a.
AATTAGAAAA AATTGCTAAA TTTATAGCrr ACGAAAAACT TCATAAACGT CAAACAAGTA TTrACTTC TcxrTTCITGT ?TAAAAAAAG AAAT~rrAGA TACATCGT CAGCATCCCT TTCTGCCACC ATTTACTCCI' AC.AGATAAAA GAGTALCCCTC GACTTATGAC CTACATAAGA GTAGTGAC TTCAGACTAC TGTAGTCA'rA CTACCACTAT AGATGCAGCG A7"=cATTT 884 TTAAAACTGC TCGrCTM1-A 1'C-GCTGTGA AAGCCrTGC GCGAGATGC- GAGGAGTTGG T"rGATAG TCGAAATCr GCATrGATC CCATAMATTA 7*rrGAC-AT GTrCATG=AG CGTGGTCAAA TACAAGTTCT GGTTATCG.AT TGGCGATG-GA GCcr-ATA GGTCGAGCTC CTTCAGAGAA AGAA=ACAA GACAA???A CCTCCA AACTNrrCAT TTATCTATA CAGXI-IrGAT TAAAGrCCT GCTTATATTT TTGATGCTTAC-=GCTCTA AAAA'rTAAGG ACATGC?!TAA TTTATTAAM. GAGTTGTA'rA TI'GCArrAT TCCAACTCAT AATTGAA TATTATTCCA ACCAAAATAC AAGATAGGGT GTATTATC?1' GAGAAGACTT AGAAGAGTGG ACTAAGAAAG TCTATCAAGT TG-'rrAAA AAGGATAGTT GAGGAAAAAA CGATGAAAGT GATTGATCAA ACCTTACTAG TATTGAACGT TC=1GACAA GTCATA.AAGG AGACTACCG1T CG=CGCTGT
AATAAGAGCC
GACTATGCrG
CAATCAGATA
AAA.AAGTCAT
TGCTTrGTCG GACTTATCCTI TATGGTGG AGGATTGGTA ACCGTTGGAA TGAG4GC'TATG GCCTTTCTC AGAAGTTGTC TTGCTG.GGGC ACAGGTCTTT GCTAGCTTAA CA'rCCTTGCT AGGACAAGT CCATCATCAT GGC GCAGCTGTAA AAAGCCGTGC CGGACAGGGA AAATATCCCT GCTCT-ACACA GCCA7TGCC TGCAAGATCA GTAATrCTA CAAGAGCAAT TGCGAGAAGGC Cr.GTTTACG AGACGATACG TT'rGAGAAA ATCTFGTAAA AAAAGAATCA GAT=rrATT GTAGA'rGGAG GGGCCTTAAC TG7'rGTTTCC ATCTAACCAG CT'CAT=~AA CTCCCCACCA 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3 840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680
AAAAGAATGC
TAGTGCCCTG
TA'rrGGCAA
GACTGGTGGT
ACAGGCCAGT
AGAACTATCT
GAAAAACTGT
ACTTCTTTCC
GTTGGCC-AGT
ATCGTGATA
CTCTACCAAC
CAAGAAAATT
CTATTGC
CTCAAGGA-ZAC
CTGATTATTA
CACTGGCTrGG GTrGTGCAGT ATrGTGGTCTT TArATGAAAG CAAAACGAAG AA'rrTTGCTA CAGAAAGCT= CCAGTTAAAG GTTGGCGGTC
GTACAACATC
CAGCV!ACTICG
CCTATCAGGC
AATGATrGCA
AGCAACCCAT
rCCCrCACCCAA GGA'rrGCAG GCCAAPTTCG CTTCATTCAG CCATAGCCCA ATTAGTAATT GTCTTCCTAA AGTAATGAAA AGATATGTCT AAAATAGTTA TTCTTAA'rrC ACAAAAAACG AACuTrAGT G7'rTTrTTTAC TCTTGTAAAT CTATT~TT AATTGATTTG TTAGAGCTCT ACTTTATTA GTATTCTAAA GGTACTTTTA CATGAAATAA CMTAAT;GGA 1:G rTG TGGAAAATCT GACAAAAAA'r GT1wGATAATT TGTATCATTA ATTCTTCTTG CTAAGAAACT AAAT7TTCTTC TAGAGTTGAT TTGGTTTACA TCCGTACTTA AAAAAATCA ATTrCAAGGA TAAATAAGCA AAGCCTI-rAC ATGCTATAAT AGAGGTAGCT G;AAGAAAATG dcAGGTA'rA cGccGc'rGA ATTTATCAAG GATGGGATGG TTGTAGGCT AGGAACAGOT TCTACTGCCT ArrATTT1'GT CGAAGAAATC GGTCGTCGALA TCAACGAAGA AGGC7rGCAG ATTACAGCTG 'rGACGACTTC 885 TAGTGTGACC AGTAAACAGG CTGAAGGrCT CAATATCCCG CTCA6AGTCTA TTGACCAAGT 4740 AGACTTTGTC G.ATGTGACAG TCr.ACCGGGC GGATGAAGTG GATACTCAGT TTAATGGAA'r 4800 CAAAGGCGGT GGTGGTGCCC 7TCTA AAAGDGTC GCAACACCAT CAAAAGAATA 4860 CATTGGGT GTGTAAGAGTG CGAAAAACTA CcGT-rTTA AA7-rGCCAC;T 4920 AGAAGTGGT' CAGTATGGTG CAGAGCAGGT CTTCTCAT ?I'TGAACGAG CTCGCTACAA 4980 ACCAAGTTTC CGTCAAAAAG ACGGCCAACG TlTGTGACC GATATGCAGA ATTrTATCAT 5040 TGACCTCGCC 7IrGGA'rGTCA TTrGAAAATCC AATT-GC=..- GGACAAGAAT TGGACCATGT 5100 CGTmGGTCTT GTGGACCATG GTT-ATrCAA CCAAATGGTG GATAACGTAA TCC7rGCTGG 5150 ACGAGA'rGGA G7rCAGATTT CAACTTCAAA AAAAGGAAAA TAGAACCGGG CATAAGATGT 5220 CTAAATTTAA TCGTAITCAT 'r-GTGGT=AC TGGATTCTGT AGGAA=GGT GCAGCACCAG 5280 ATGCTAATAA CTT74TCAAT GCAGGGN'C CAGATGGAGC TTCTGACACA CTGGGACACA 5340 TTTCAAAAAC AGrTGGTTTG AATGTCCCAA ACATG=CAA AATAGGTCT GGAAATATTC 5400 ***CTCGCGAA.AC TC~rCTTAAG ACTGTAGCAG CIC'AAAGCAA TCCAACTCGA TATGCAACAA 5450 ***AATTAGAGGA AGTATCTCT-r GG-TAAGGATA CTATCACTGG ACAC-TGGGAA ATCATGGGAC 5520 *TCAACATTAC TGACCTT'C GATACTTTCT GGAACGGATT CCCAGAAGAA ATCC'rGACAA 5580 **.AAATCGAAGA ATTCCAGGA CGCAAGG'rrA TTCCTGCAACC CAACAAACr TArCAGGAA 5640 CGGCTGTTAT CTATGAT1-rr GGACCACCTC AGATGGAAAC TCGAGAGTTG A-TATCTATA 5700 CTTCAGCTGA CCCTG'N'TTG CAGA7TG= CCCACGAAGA CATTATTCCT TTGGATGAAT 5760 TG;TACCGTA'r C-rGTGAATAC GCTCGTTCGA TTACCCTTGA GCGTCCTGCC CTTCTT'GGTC 5820 GCA'rCA'rTGC TCGCCCTTAT 
GTAGGTGAAC CACCTAACTT CACTCGTACG GCAAACCCTC 5880 GTGACTTGGC TGTATCTCCA 'TTTTCCCAA C'TGTTTCGA TAAATGAAT GAGGCTGGTA 5940 .TCGATACTTA TGCTGTGGCT AAAATCAACG ATATCTTAA CGGTGCTCGT ATCAACCA'rG 6000 ACATGGGTCA CAACAAGTCA AATAGTCATG GAATTGATAC ACTATTGAAG ACTATGGGAC 5060 TTGCTr.AGTT TGAAAAAGGA TTCTCATTCA CAAACC1'AGT TGACTGAT GCCCTTTACG 5120 C CCATCGTrCG TAATGCTCAC G1rAccGTG ATTGcrrccA TGAG~rrcGAT GAACGCTTAC 6180 ***CTGAAATTAT CGCAGCTATG AGAGAGAATG ACCTTCTCTT GATTACCC GACCATGGAA 6240 ATGACCCAAC GTATGCAGGA ACGGATCACA CTCCCGAATA 'rATTCCAIrTG TTGGCCTATA 5300 GCCCTCCCTT TAAAGGAAAT GGTCTC-ATTC CACTAGGACA '1rrTGCACAT ATrrCACCG;A 6360 CTGTTGCCGA TAACI-rTCGT GTGGAAACTG CTATGATTGG GGAAAGTTTC TTAGATAAAT 6420 886 TGGTATAAGA TGACGCGCTA TGC7tTGCTG GI'GAGAGGTA TCAAGTGG TGTAAGAAT AAvGTrCGTCA TGGCGGAGCT TCGTCAAGAA ?TTACAAACT TCZCACTCA AAACG7-.GAG AGCTACATCA ATAGTGGCAA TATwrTTCTrr ACTTCGATAG3 ATTCCAAAGC CCAA'N'GGT GAAAAGCTAr. AGACTTCTT 'rGCAGTCCAT TATCCATTTA TTCAGAGCTT TCTTTACTC AGTCTAGAGG ACrT"G.A3C.
GCACGAA.AAG ATTCTCTT Gr1rGAAAGTT TAGAGCTGAA GGGAAATIrrr CTGAAGAATC CCT-TCTACC GCCZACATTAC CTAAAAAAAT AATAAAGGAC ACTT'TCCTGA AAGAAAAGGG CTTGGAGAAT TGGCAGAAGA CvGAAC7rGALA AATCTACCAG 7rACACTGAG GGT'rTGGATG AG.ATGAAGI' CTTTATTTTTG CTGGTGGAG CAGACACTTG TGG.ACCAAGT CATCGCCACA GAAA.AC-rGG GATITPTCTG CTrrCTAAG ACTGCCTATC ATAAGTACT 'rATTCGTAA'r GCTAAAACCT TTGACAAAAT ACACACA.ATG ACATTTTTAA ACAAAATCCA A.ATTGCAGCC CCTGAGTTCG GT~CTAATCCT AATCGAAAAT CCAGTTGTAG TAGACT'ATC
GCTGAAGCTG
TGGTCAAATG
TGAAACTGCT
TcGr-%CAGCA
TIGAGATTCCA
AACTGGGGCC
GGTCGCAAGG
GTGGTGACTT
AATGCAGC-Tr
AACATGACGG
CCAGATATGT
AAACTTAATA
GTTCAACAGT AGCGGTCAT TCT'rGGCTCT TCAAGGGCGT TCCCAGTTCG TGTGATGAAA ,C-rGTAAAT TGCGTATATGG TGAACTGGCA TTCCATTTC- ATGAAGGGAA TCCTCTGGAA GTTCTTGGA'r GTGAACGTGT TA'N'GTAACC 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220
GCGGTATCCG
GGCAAAATCC
CTAGGGCCTA
TCAAGCTTGA
ATrTrGGTCCT GCrAkCCTTC;A IGGCTATCTC AGACCATATC ATTGA'rGGGT GAAAACrTGC ATGACIrTfGG CCCACGT=rC CACACCACAA TACCCTGCC.A CrIGCCCATGA AGTGCAAA TGAAGGTGTC TATATCGGAG 'T.ACTGGTCC GACTTATGAA ACACCAGCAG AAATTCGTTC CTATAAGACA CTGGGAGCAG GTTCCTGAAG 'rTATCGTGGC AGCCCJACTCT GGCTTGAAAG ACTAACTTTG CGGCCCC.TTT CCAAGAAGAA CTCAATCACG GAACGIY.=A AAGGTGATTT CAAAGGCTTG CTTAAAGCCA
ATGCAGT'CC
TTCTCGGAAT
AAGAAGTTG'r
TTCTTCCTGA
TATG1'CTACG
TTCATGTATC
AGAAGTGACT
ATTGTAACAA
AAAAL3ATrTA AAAGGCGGGAG 'rGCCTCTGTT TTrCAGGAT TGACTG.CCTA TCCGGATTAA AGAAGAAACA GAGGAATACT ATGAGC 'TCT TCCTG.CTCTT ATAACTGAAA GAAGCGGAAG AATAGC;TATG TCTGATCTGA TAGCCAGCAT TGTGAAAGAC AAGATTrAG GATACTAGCA TTAGCTTCCr AGCCAAGCAG ACTACTATGA TAAGGAGAGA 'rGAGAAfrGAA Tr'GACTr'DCT GAAT7PrCT CA U'rCTTATCAT A'rATAGCACA ATGAGATTTC GCTrCAGTCT GC'T'GTAAAT AAACGAAAAG AAAGATAAGA AATAATGAAA A'rrGGTCAAC GAATTATGCC CTTTGGCATA AAAAATTAAG TATCGGACrr CTATCTGrTG TAGTCGGCTT TGATTTCTAG CTCCAGCTGG 887 AAI'rCACCC AATGuACrAA AGCAAGATGT AACATClr.AA GTGGTAAT).G GTGT=CArGA TTCTAAGGAG GAATTGAAAG ACAr-XAAAA TGATGCTCCA AAACrAGAAA CTCCTCTTAG AGAGGAGCCA AGACTAGCTC CTCAAACC 7CGAAGCA AG7rAAGTTC TTGAAAACAA AAGCGAAGAG 'rCAAAACTAG AGA'rAACATA ACCAGCTCAA CCGGATCATA TCCGCAAGGT TG7'rGCGGAA TTAGCCAAGG ATATAACTAT TACrAAGTG TATAMrACAG GTCATrCTCT TGGATGTTAC CTACTCAGA 7rWGCCC TGAAGCCAC CAAAAATATC CTGA~rTT'A TA6ACr-ATGTA TTGAGGAAAG TGACAACT CAGTGCrCCT AAAGTGATTA CTTCCAGAAC TG~TCGAAT CCTAACAATG CT-TTCrGGA TCTTGCC 1TC GAAAGrC--TA AArrAGCTG-r TAGTGGAAAA AT'rAACCATT ATGTGGTTGA TAATGACAAT GTTGTCACTC CCTTGATTCA TAATAATCGT CATM'TT'TA CATTTACAGG TAATTCACCC TTTAAACACC GT==GC CrATTTTGAA ACTCCAATCA ATGATArrCC TAACTI'TAAT AT-GGTAA6AC AAGCTACCTT GGATAAACAT GCTTATC=T A'rCCGAAA-r GGATAAAGTlG CCAITTC rTA AGAAACACGC TCTrGCCTCGA TCTTCTAGTC A6ACCAAGCGC TGA.ACCAATG GAAAATATTG CCTCAGGAAA ACAGGI'TACT CAA6AGTTCGA CAGC1'T1CGG AGGAGATGCT AGAAGAAGCTG TCGATGGCAA AGTCGATGGT AAC-ATGGTC AC) A'CTGT CACTCATACA AACT'ICCAAT CTAAGCCT-rG GTGGCANAGTA GATrTTCCTA AACAAC-ALaLC CATTCC;CCAA A'rCAATA'rTT ACAACCGAAC AGACACTGCC CAGGATAGAT TGGCAAACTT TGATGTCATT CTTTTAGACA GT'rCTGG'rAA AGAAATTGAG TGAAAACGTA TAACATCTCC TAAAGATGTG TCALGCACA.AA TTACGATTAA CCATAAAAAA GCGCGCTAT-G TTCGGATrC-A GC'TAGAAGGC TATAATGCCC TCAGTCTrGC 8280 8340 8400 8460 8520 8580 8640 8700 a760 8820 8880 8940 9000 9060 9120 91390 9240 9300 9350 9420 9480 9540 9600.
9660 9720 9780 9840 AGAAGTTGAA GTTTTCTGCT TTATAGCTAC GCCAGTTCAA CCAATCAGTC AGACTCCTGT TGGAGCTTAC ATTGCCCGCr ACTCCATAAC AAACCAAGTT GTTCGTAGTC ATTCTTGGGA TGTCCTCAAC CTCCCAATCA AAGAAAA'rAT G ACGGCCCTA CTATGGAATA GATGCCAAAC ACCCCACCGT AAAATTACCC ATTGGGGTAC TGTC7TGTAA TCTGATGGTA GAATGACAGT GAATGCTrGA6A ACGGCGACAC AAGTTTCTAA GAAGGATAAA ACATTGACAA ?1'CAACACAG ?TrGGAAGAA GTTCCAGTAG ATAAAGATGG AGGAAGCGGT CGCAACCAGA CTGCAGGT
GAG.AAATCTG
AATCTATGAA
GACATTGAAT
TAGTTI'GTCT
CGAGTTAAGA TTGAGAAAAA AACAGACCAA TTTTAGCTCA TCCAAGGTGA GTGACGATGA AGrTATAAG AAAGTACTAC CTGACGA ATAGGACTCA GGTAGCTCTC TATGAAACAA CAAAATTAAT ACTCAATGAA AATCAAAGAG CAAACTAAGA AACTAGCCGC AGGTTGCTCA AAGCACTrGCT TTCACGrrGT 88a AGATAAGACT GACGAAGTCA GTCACATATA TAATCCAAGG CGACCTTGAC GTGGT"rrCAA G.AGATTNTCG AAGAGTATAA ACAGAAAGGT AGACCCCTG TTCTAAr1-rG AACACGAGTA G.AAAC~r CTAAAAACAA AAACGAAAGG A'1'GGGTAAAC TGTA'rCGCT GAACTGAATA CGGCCCACTC AAAT CAAAATTAAG AAAGGAATTG ACCCCACCCr AAAACTAGTG GGAAAAAGAT AGTTGATCTA GCGACCPATCG CTCACTG-CGC CCAACTCCTA TTTTCCCTTC C11 TrGAr GOGTTGGTA TCTTTCTCAA TATAAAATAT A.AAATAAAGA AAGGTAGAC GTrGTGTTTTG ATTTGAACAC GAGCCrGAAAA CTCGGAAAAT AGATAATCTG ACTGAAAAAT CAGGA7TTTCT CarCXGGTrC CTAA'rTTCA G'TCG=T'CT -CTCGC-CTT TGTATCATAA ATTATGTCTA TCCA'rATTGC TGC1'CAGCAG GGTGAAATTC GGGGXX-CTC TTCCCTAA GTTTATTGCG AACGAAGTC C aAACATGTr TGGTTAC-ACT A'rGGGAACTG GGATGGGAAT GCCATCI'ATT TACGGTGTGA AGAAATTGAT TCGTrGTGGGA GTTCGTGAAT TAGT'TTGCGC GCAGGCCGCT TCGCCAC.AGT ACCATrN-rCC ACAAATTGCT ATCGCCAAAA AACTTGGTAT GACTACTCAC TAC'rCAAATT ACTrTl3AAAA CAATATCGAG ATGGA6AGC'AG CAGCTCTTTA CTA'rCTTGCT ATGACCATCT CTGATAGCTT GGTCAATCCA
GAGAATTCC
GGTACTTACA
TCGA=TATG
ACTGCAGGTT
GCAACCAACT
AGC?11TGA-r GTTrGGAA'CG
CTTGCTAAAT
CTGATAAAAT
TTrGATGATGC AGGGTCAC'rC
CGCGTGAGTT
CTTTC-NATGA
CAAACATCG'r TGCTTG'rATAA
CATC
TCTCTTCCT
TGTT.FTGTTTTT
TGTATCTGTC
AATCCGTAGAC
AGAGTTCAT
TCG'A6ATGAC
AGCCTACCAT
TCATGTCTw'r 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 G.GCGAGTCAA GGCTGTGGAA GCCCAATACC ATGTTGATGC GCTAGCTATC GACGAAGACA CAACTGCAGCA AGAACGTCAA AATACCTTCA CTGATATGAT GAAGGTTGGT TTGGAA6ACCT TGATTGCAGA ATAAT'rATAG CCAAAA.AGGG GCTCTTTGTC AACTGTAG'rG GGTTGAAAkAA AAGCrAAGCT TGAGAAAGGA CAAAT'rrCCT CC7PrC~rrT TTC-ATATTCA GGGCGATAAA AATCCGI= TTGAAG7TT CAAAGTTCCG AAAACCAAAG GCATTrCCCT 'rGATAAGTT? GATGAGATTA 'rTGGTCGCTT CCAGTTrGGC ATTAGAATAG TCTAGTTGAA CCCCTGAC GATTT-CTCT T'rGrrTA GAAAGCTN AAAGACAGTC TGAAAAAGAG GArGAACCTG CTCACATTG TCCTCAATGA CTCGAAAAA TwrrCTCAGGG TCT'TT== GAAAGTCAAA AAGTAAGAGT 'rGATAGATCT GATAGTGGTG TrCAAGTCT TCTGAATAGC TTAAAATCTr GTCA.AGAATT TCTTTATTT TTAAGTGCAT CGAAAAGTA GGG.CGATAAA AACGTITATC GCTsAr=rA CGACTATCCT GTTGGATGAG 1-rrCCAGTAA CGCTTGATAG CCTTGTATTC ATGAGAT~rr CG7TCAAAC'? GATTCATAAT 7TGAACACGA AAACGACTCA TOCCACGGCT GAGATGrrGG ATAATATGGA 889 AACGATCTAG AACGATTr'rA GCACACGGAA AAAGCTGTTT AGCCAACTCA TAGTAAGGAC TAAACATATC CA'rCGrAATG ATrTTCACTr GACAACGAAC GGCTCTATCG TAGCGAAGAA AGTGA'TrTCG GATCACA=CT GTG~rCTGC CTTCAAGAAC AGrCATAA'rA 'rAAGArrAT CAAAATCTTG; CCCAATGAAA CTCATCTTTC CCrAGTGAA GGCATACTCA rCCCAAGACA TAATC==TG AAGCCGAGAA AAATCATGCr CAAAGTrGAAA GTCAI-rGAGC TTGCGAATGA 11820 1.1880 11940 12000 12060 12120 12180 12240 12300 12360 CAGTTGAAGT TGAAATGGC-C AGCrGATGGG CAATATCAGT ACT'?rTGAGC AATrTTrG 7TGATGATAC GA~vGATTTrG G-AGTCTCAGC AACCATCAI TTTGAAsAGT GATAGCACTT GAA'rCTAGA AGGCATACCA GTTGTTTCGA GGTAAGGGAT CATrT=CT CATrAGACTT CCACAATCAG GGC.AAGATGG CGATAATTrC rTGrGGGTA TCCATATTGA 'rGATATCTAG TAATATCGAG CAGTTTTGT ATAAAATGTA ATTCCAT CATAGAAATr TTTTCAATTA GTGArrC TTACCAGGG GAAACGGCGT TTTCTAAGGA CTrAGACGGT TTTTGAAAGT AGCCTCATAA TCCAGCTTAG
AATCTTGATG
ATGATTC=-
GTrrcc
ATAATATCTA
TATGAATTGT
AGAATAGTGA
'rrGCGGACGC
ATC'TACCCC
AAAGAGACAG
'rTCTTGGTAG AAAGT''rTT
TTGGAGTGTG
TTrTCATTAT 'rAGTGGAT
AGGTCATATG
ACCCACTACA
AGGACTTCCT TMrCTTATc-C CTTATGTGA ATATTCTTGG CCATCCCAAA GAATCCATCT ATACN'CTC 'rGACTTTTCC GTCAATCCCG AAAAA'rTGAG TAGCACCAAG AATAATTACC ATCrCGGTT=A ACATCCATTC TCC-ATGTGCA GTTTTGGAAT GGAC~TIT TCTACACAAA AATATTATAG AGCCCAAAAA AGAAA'rGAT CTACC3CC CALAAGT7TTTT GGrAAT ,TTC GATAAACTCC CACTrCA.AAGC ACGGTA'rTTA AGATAACGCT AAAGATGATTr TGGrCAGCTT 'I-TGGGTCTT 12420 CTAA'rGAGTT 12480 AATAGCrCC 12540 GGAACCCCTT 12600 CTGATTTCGA 12660 1TT=MAGTT 12720 GTTCAGGGCA 12780 TAAAGGCTC-, 12840 CTTGCATTCG 12900 GCT'rG;TGAG 12960 CAGGTrGAAA 13020 TrTCTGCTAT 13080 ACCTTTCCT 13140 TTAGCTGAAkA 13200 TCTGCTTTAT 13260 A).AGCTTTTA 13320 ACCTTCAATT 13380 TTTACTAACT 13440
ATCGATGACC
GCAGTCACTG
GGAG'rAGTAG AGTTrGACTTA CCAG.AACCAG AATATCCGAT AAMTGCGATT A7"rTGGAGAC AAAAAAACAG CCrCTATGGA CTGI=CrrA GACGACNT ATCGCGGCT-. GCTTTGTTTT TGTGAATCAA CCATAGCTG.A GCTAGCAGCA CGGAAAAGTr CTTCAGATGG TAGCAGTACG CATAGCTGAT *TTGAGCTIC AGTTCTTTTC CAGCGCGTI-r GArAGCTG.AT TTAATGTTTG CCAATGGTCI' INFORMATION FOR SEQ ID NO: 129:
CAAGCTTTAT
TC-TCCAAC
TTAGATAACT
TrCMArrCT
TTAACAACT
ACCTTTAGTT
GTTTGCTTCG
GATTCGTCTA
TACCTCCATA
(i) SEQUENCE CHARACTERISTICS: LENGTH: 8512 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129: CCTTTTTCA AAAACTAGAT ACTAGTCTAT CAAAAGTAGG AAACGGGTrTr AAGAAAATrG ATrGGAAAT ?'TTGAAAT CATAGAACTA TTAGC-AATC CCTAGTATTG AAAAGACTGG ATACCTrCTT TCAGGTCATC 'rTGTAAACTrA TTTCCGG'r CAAGTTGGAC ATAGACTrTCC ACCAGACAGG ATCTAAAGTT GGAAAAT?1TG TAAAAATCCT CCCTTCTC TATCGGAAAA TCAACAGTTT TTATCCAAGA AGCTACTTGT TCTTGCTCCA AC-.ICCCTTG TAAAATAGCT
55 TCATAGATCA CTCTTGCTAA TPAAATAGTr GGCCAAGTAT TGACAAACCA CCTCTGTCAG TGACTTGAGA AACCCCrGT AGGACATT~AC GAA=TCTGC TCCTG4GTTAT AATTTGGT AATACCTCTA CATTTCTAGC T-CTCAATTA AATACAATCT AGTAACTTCA AATrI'AACAT ACGCCAATCC TCATCATCTG ATCAAATACT TCATGAACTC TAAATICGGCT CCATGTGCAA ATCCTTTTCT TTTGAAAGAT AGAAGGATTT CCCAAATGAT TTCTCTGCT AT'=CTTA AACTGTTCAA AAAGGCAGTC TAAAGCGAAT CGACArCTr TGTTTTrACG AAAGTCTGG.A AAGCGTGAAC CCAACCATAC
AGTGCAAGCC
CAAACAACCA
G'TAAATC7TTG
TTAAATGACT
TrGATTrAAA 540 CTGGATTC 600 ATACATGGTC 660 CAATATTGAA 720 CCAGTrCTAC 780 CTACAACTAA 840 TTTCTATTCC 900 GACAAGAATT 960 GATATAAAAT GACGTAAATA ACTATCAATA CACGAC C= C AACCACATT1 TTGAAAATAG GACAAATAGA ATGACGCTTA ACAAGCCCAT AAACATCATT CTAAAAAATT CCTACTCTCC CAACTCAGCA CrATAGGAGA TAATCTGGTC AACTGTGTCA 0.00 00 a* GGATGGTATC ACGGAGTGGT T=GCTGTTlG AAATATCAGC CAAGTrGGTGT CrTGCTACCA CAGAATGTTT TAGATGAAGG AACTATACAA GCCCA'rATAG CAATTrCAAT AGCATCTCCC TGCTTGCTCC AAAGGTCTCC CTTCCAAGAG -GTGGGTGATA CGAAGCGTr TTGACGTGGG CATTGAAGGT TGACGG'TGCT CCTGATTTGA GGAGATTGAG TAACCAGCAG TCGAG.ATAAC TAGTGAGCTT GGCCCATCCA CAGAAATCCG TCAAAACTTC ACCGATAATC ATGCT'GACT CCAGTCTTrCA GCTCCAGTTT TAGTCCTGCT GAGTAAGTGT ACTCAGAGTTr GCATCATCGT CTTCATAATG CTGTTGAGCT 1020 1080 1140 1200 1260 1320 1380 1440 1500 CCTTrCTTCA.A TCAATGTATA CACCTTACGC TGGA.AGGCGG AAGTTATGGA AGTAGGTGTC TGTCAAGCGA TGACCCAGAG TCATTAGACT CGTI'CTCCAA GTAATCACTrG ACTAGCAAT TCAACATAGT AGCGACAT ATGGGCATTG AACTAACTT 891 GA'rGATTGTC TGAAAAGA'rG AATTGACCAG AATGCCCGAT TTCATGAATC AAGGTATAGA CATCGCTCAA ACGGCCTGTC CACCATGA GTACATAACG G;TGTACCCA 'rA'rGC=CCG
CCCCATAACC
GG.TAACGAGC
AATCATAGGC
AGTCTGCAAA
GACACTGG
CCACTTC7"M AGAGTmTC
GAACTCCCTC
ACCGGAATCC 'TGCCACTG1' AACTrCCTGA CAATATTCTT
ATCGTCAATA
GCTCATCTTT
'rCCAAAGTCC
TTCAGCTAGA
AGACTTGACC
TGAGAAGG-MA
GTCACTTCAG
TCAAGACCAT
TCATGATGA
AGATAGTCAA
TGAGCCACAT
TAGCAGCAAA CTCCACCCAG GCCCCAAAGG TTCTACCCAC GATTCAGC GC= TCCAAG TTACCI'=G AACATG;CTTG GGTCAATCTC GCGGTCAAAC AGACAGAGTC GTATCCC1'TC AGGCTGCTCC AGCCGIATTT
CGCTCTCT
TrATCACCA
TCCAATTC
ACTATCTCT
ATGACACGG'?
ATATCAGCCA
TGGTGCTTrAC CGGAAGGATT TCTCACCAAC CTCAGCATCC rATrGTTT GCTAGAAATT CrCATAGGTC ACAAAGCTGT TT7=GAGGT Cr'TGCATGG GCTCAAAGT a CAGCCATTTC AAAATCCCCA GCTCCCATC' TAGTATAAAT CTTCACCGAG ATTTGCAAC GCCTTCTCCA CATCI'CCCCC TTACCCTG ACGAATGGCA GCTGTTAAAT CTCGCAATTT CCTC-ATCTGC T'GCCACCAAG CAAATTCCAT CCCAGCTTGG GAGCCATAAA ACCATAGTTG CAAAGGCCTT CTCGAAATrC GGTATGTC TTCGCGAGCT GGGCTGTTAA GTCCCAGAGT TCTTCCTrCTT ATTCTCTAA GCTG?= CTG CTCGCAAAAT AAkAC?1TCGA TT'rCTGCAG GCTCCTGTwrr CAAGACCAGT GCATCGTCAA AGAAGGTCAA GCAATATTGC CAAATTCGTC CCAATATGGC TCATCTGXAT TCAAAAGTG;T GAAGATTCCC TTCTCG.ATTG CACCCAAGAA TCCTwrCTCTG GAAATTCTGA ?TCTACTAAA ACACTAAGGG ACGAGGACCT AGGCCTGCC-A 'rGAGAGACCG CCTTCTGGAC GTCC'TGCGGA CTGTAGAAAA TAAGT.AGTGG GCTTTN-rrGA ACCCAANACGG TCCAAGACTI' GGCTACGCTG GCATCTGTTT ATTCCTATAG TCCGTCGTCT CTAGATCTGT TCCAATTCCC C'rGTAATC-A CGGCTAAACT 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 ATCCr-ACG
ACCGTGTTM
CTGATAAACC
AAACGGCTCC
CAAAGATAAA
?CTTGGTATA
TGTrCCATIr
GTAAAGCGGT
TTTAGCTTCA
GACCAGTTTG
TTC?1-rAGCc GACTGCTTGC AGAAGC-GCAG CGCTCTCC GATTCTTCAT AGGCTACTAT GATAGAGTCA AACTGGTCCA. GCI'GAGCTAG AAAATCTCCT T rCTCGA AAAGTTTAAT ACTM=GACA ATAT'rACGCT TGCT'rTGC GGCTGCTCCA AGGGCAATTT TTrCAGr T TCAACTIrrT T'rACCCAA'rr TCI'rGCCATC CC-ACTTG.GCA ACTGACCAGT CTGCAGGAAA GGCCCAGATI' TGGCTAGCCC CCACTTCGGT TACTTTTTGA CCATGAACT CCAGCTTCTC TCCCI'TTCGA AATCCAGATG CGATGGTC-AC TTGCACTGGT 892 A~rrCCACAT TGT-CATI-rAA T-Cq.".=ACC AACTCAAACT GACCATrrC CATATCCAGC ACGCGCCCCA AGCGCTTGAT GCCA'rCATCA AAG2ACTAAGG TAACCTC)JC CTCq?"C=.TC AAGCGCATAA CCTGAAACAT ATGCIACTG GrrrCCTTCT CCTCGATAGTr GACACCAGAG ATAGCACTGC C~rTACAAA ATACTGCTGC kTGCTAGCCT CCAATCACAC CAGAGATAT'C cx-~c~'rr 1-AAAGACAC AGGTATrCCA TTCCCCrTCA ACCA7rCTGAG 7TCCAGC.AA AAATCCAGCT GACCAGCCG ACTGGCGCAC CATGTCCA.AC TTrTCCTTGA TATGCCACT CCATAAGCA CT=CTATTA CATCALTG CA'rGA'rCAGG TAhCC1-CAT CCT-rrACCAA GATATCCG;CC AAGATA7-AG CCACA.ATCAC ATCr4CCAGCC GCTACATGGA TATTTCCAT CACACGAACC GCCACATCAT CCAGGTCATA GCTrGCCAATA GAGAGAACCC CrGAACCAGT AAGAACCrG'r TCCAAGGCAA .AAGGCTCAT AGCCATrGCCA GGATCCAGCT TGATAATCAT AG.AGGGAACG ATGGTCAAAT CA'rGAGTGAT GTCTGCCCAG TCrTrCCTCAG CCAAGGCAGT ATCTGCC'rCA GCCAGCGTG'r
GGCGAAAATT
CCCCACATCT
CTCTAGTr
TTCCCCCGCA
ACGAGCAGGT
AITTTCCACAC
AC--'rCAATAT 'rCTT'ACCCC
AGCACCCT'T
GGGTGGMC
GTCGCC-TCAT
TCATAGTA-T
CCrrAAGCAA
TTTCCTCAGC
CCAGAAGCGA
CGCCACCACG
CAGTACCAAA
AGTCTG.TCCA
TCTTCCAGTT
CCAAATCCAT
TCACATCCAC
CI'GGGAAAAT
CGACTCCTTG
CGTACCTATT TTTAACTCTCr 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 AA).ATCTGTC A.A'CTG.CTA GACrACCCTG CAAATCCGCC TCAACCACTG CGTGTCAGGG TAGTAGGCTG TCACTACGAT TTCTTCTTGC TGCTCCACCT CTCTCCAAAG CGGTCCACAT TTCCCACATA GTCCATACTG TCTTrCGATTG CGCTCCC-AGC TCAATCAAGA ?ITTTAACrCT TGCC-ATGTTT AGGAAATTCT CTGAAGACGC GGGCGGGTTC AGf'TrAGAAA A'rCTTTrAAT CTCTCTTGCA ATTCATCAGC TGATAGCCCT TCTTAGCTGA CCAACCATAA CTTGCTCCAA AGCAGAC-AGT CATTGGAAAC CAACTCCTCT CCCTCACGCT TCACTGTAAC CCATTA'rTAA TACCAAGCCC GTAAAACACA AAACCAAAAT TTGTGTCTAA GAGAAGTT'A TCTTIGGC ACAGTGTTTA TGTAACT7GAA CCATCCTTTC ACTGAGGCAC AAC7rGACTG GTCCTTCATC TCCGAAG3ATG AGACCGATTT CTTGCCTTTA CTTCATCTAA ATGAATTCCC TAATCAC?1'A CTr'r"AAATA GAACTAAGAA A~wrCCTCAAC ATA'rGTCAA ATTGTCTTG AAAATrACGC TAGGATAAAT AGTrTCCTCAT CCGAGCCCAT TCAAA.AGGGC TTTCGTCCCC ATTGGCCCAG GOAATACTTG CCTTATCA'rC AAACAAACCA ATCTTGCAAC CTGTGAAATC CTAACAT'I-r CTrCMCAGC TCGGCTAACC TTCACGcCCA CCACCTGCCA CCGTAAGATA GTCAAAAGCT AGAAA7TTC AGTTCCATCT AGrlTTCATA ATATCTTTT-C
AAACTTCACC
GTTCCCACAT
TATCCCCACA
TCAGTTCCTT
'rCATCCCTCA 893 ACATTCCACT ACrATCCATT TTCTGTCAG CAATC?1'GAG ACCTACGA GTTCGA-.=A cATcTcTT cACC?rAAT TGATACCAGG CTTGTATCAC TTrGAACATrG GACACTTTGA aAGACAGAAA caATTr.ACC TGTCGAATAC TAGCA'rATTG CTCCGCTTCC TCAAAATCTC CTTCCAACAA GGCGA'rA'GA ACCAGGGATA GlrGGGCAAC TG=~CATC ATCGGAGTAG TTG;TCCTCTC AAGTAA'rGCT 'GAAACTGCT GT1TrAGCTAC TTCTTCCTTC IrGCAAACTTC ACCTTGCATA GAACTGCTTG AACAAAGTCT AATGTrCTCAA TACCAGGTAG 'rTTTTGCrCC ATCGGTGATC GAAAAATCCA AGTCCCTGCA A'rAATATCALA GCCGAGATGA CGC -rCATC CTCTTTAAA TACTAAGATG AAACCTGCCT CCAGCCGATA GCCTGTCAAC CCTAATACAC CATCCGCAAA ACTCCCTCCT TTCAAATCAT ATTCTTGAGG AGCTAGCAAG
CCTTCCAAAA
CACCTCAG
GflCTGGGCAG GCCGTATTTrGG TATrrTCAGG GT=TTACT AATTCCCAAA TCGACTGGCA AAATGTTAT- TAGGAAGAAA GATAAATrA.A
AAATACCAGC
ACCATCAAGC
CCAATGTA'rr GACT7r'rrGG 'rAGCCACAAA 7TrCTTGTCAA AAATCCAAAC AATATCGCC-A CTCCTGAAAG CATCAGGATG ATTCrGrAT
GAGCACCAAC
TCAAAATAAA
AAGCATGACC
AACCAAAAGT
ArTTTCACA ATGGCT=C ATCTCCTAAT CCAAAAGCCA CAGC'TCATGSA AGAlATAAAGA AA.AGGCTAAA ACTGCGGAAT TTA.AATACAT GCTrAGAAGA CGAAGGCAT ACCCCAACTC TCCAAA'TGCC ATTGT=CCAC CTAGCACATA AACCAAATAA GTCCCAATTT ATC=rGCATA AGGATAAAAT TCTrCCCTT 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 AAGCAAAAGC TAGrCATAATA AAGACAACAG TC7rCATAAC ACCTCCAACC AACTCCTAGT CCALAGCCAAT TTTTCC==C TCAAAGACTT
CTTGGTTCC-A
CA'rCTAGCcC
GGGAAAATCC
TT1CCATGACA AATTCCTCTG CTTCTGGGTC AACCTCAGCA GTATCT=.AA GGAAALAGCGC TT=AGGC AGGTAAGGAA TAACAGTCAA TTCCAAAAAG TCCATGAGGA AAAATAAGCT AAAAA'rTCAC ATAGTCTTCC TCATTCACTG TTGACI'GGC AGGATI'GTAG AAAAGGACCC CTTCCTCAAA AAGAA'rGTCA TCTCATGAAA CCTCTCCGTC TTCATCCACC A'rCTCCACAC CGCAGCArrT TGCCCCA ATAGAAAACT CACTTCTACC GCATGGTTGC GTTTGTCCCA GCTAATCTCA AAGTCAAAGG GAAAGrrT -GTCCAACTCT TCCTCTAAAA TATCTAAAAA TCCGTATGTT GCCATNTGT CCTC?'rTCTA TGCGACTCTT TAATCGCCCC GATTGCTCGG AAATATGCTA AAATAGATAC TACCATCTTA CCACAAAATT ATTnTATGTC CrAATTATAC CATATTACCT CATTTAAACC CTTGGTATCA GTGATTSTCr TAAAAG'TCTG AT=TTCA'r TTCTCATAAA AATCAATATA AAAAGCCCTC GAAAGGGCTA ATAAATCTAT AAAATCAATA GGCGAGTAAC TAGC-ACAAGT GGACGTGCT1' TTrTATrGAC TATTACCACG ATAGCGTCTC TCAAAGTCTG CATI'TTGGTA TGGTCAAAGT TCAACAGATA CCATACAAAT 'rCA'CTTTAA AAcGATCAT AACTCTCCAT CACTCTCTAT GCCAAAGCCT CAG2AAATCCC AC.AAAATAAG TCGCTTrCTGT ACTAACAT'rr CACTACCTCC TTTATCC CTCr'rrCAGT 894 ATACCACCCT TAATCrTAGG CTrrGAACrrT C-TATCTGCA AGAAAAGTTA AGCCCCATTT CTCGTCCCAA cTTATcrccc CTrrAATG GGTCCTGAC TrCCTAGA TICTGATACA AAAAGATTTA TCAAGCAT AG.GTTGAcAc GAAATCTTCA ATCAAT-AAA GACAAGCTAT TGATATCTGA TGCCTGAGGT CAAATCTGCA ACAGTrATCC CTAGCC-AcTC CGACCCCATA rCTCCCTrPGT
TCCCTCTGTG
ATATCAAAAA
GTACTTAT'rC GTAGCTCACT ATTrCAAAATC TCGAAATCG TCCCCTrGCT TTCCATTTT AATATGGGAG TGGGCCGTAA GGAGT'TTAAC CATC'TTr GACAACACCA TrATAACACC TGTTACACGT TATACATAAA AGCCCCTAGA 'rGTGGTrCTA CrAATGAcrrA GTAAA.AACTG CTTCTITATC A.%AAGTATCT CGATCTAAGA CAGATTGAGG TACTCTATAT TTTTrATCTC CAATTT'rrAC ACCACGTCAA CCTCCCTTA CCCTACTCAA TCTTTGATTT TCATT'GAGTA TGATI'AACTC TCGTCATAAT AAAATAAAGC TGGATACGTA GCAGGGACTG AA'IrT-rACAA CCCACTGC AGCTCACCAT GAAGGATrGT GATAGGTCT- CCTTTACCGC ClCl-ACTCrr TATCCAACCA TGTGTCATAG GCATACCGCT TTTACCTCCT ATTGTAAAGG AGTGATACTT ATTATTCTAT AGGGAAGCCA AT7-ATTCZAT ACCTATT'rT GAGCAAT'rCA
CGCACTTGAA
AAACTGATAC
GTACAGCC1'G
TCAAGCT=C
TCATCTGTAT
TGAATCATAG
TCT-'CGAAAA
CGGCTAGTTr GAAArCAGGA AGTCAATrGT
GAACACTGCG
TCAAATTCAA
CCTAGTTTGC
TTTTCAACAG
6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8512 TTATTACAAG GAGGCGATTT ACTACTTCAA AAACATCAAT TAS=CATTT TTCATATTTT CAAGTTTAGA TTTAATG-T TTCAACCCAT 'rATTAGAATG TCGTACAAAT CTAAAATCTC GAAATAACTT GATCACTAAG TGACTTGGTA GTTCCATCAT AATGACZATTC AGCAATATTA CCTAT'rTTGA GCrrrAAGGA AAATAGTAAA AAACAT'rCTA ACTAATAGSGC T1.GAATGGAC CGMAAAG GACTCCTATG AACTrCTTGG TAAGCAAAAT TTTGGAGTA TCTTCCCGGA AGAAAAGTTT TC'TTrCCCT AATCCAA'rGA CGAATT'rGT'r TTGTAAAAAT CAAAATTTCC TTCCATrrGCT TATCACCTCT CTTTrCATTA TAGTTCATAC ?TCTCAAGT CAC-ACTTCC ACrCTTTAG GCTCAACTAT AAATCAAATC TCTCATGCTG ATACCTCTCC TCATTAAATT TCTCACTCCC TGATT-ATTAC AAAACCATTG AAATATCACA ATAGTAAGAT ATAGTAGATG AGTCZATTCTA CTCAAATCCA' CCAGACAATC TCGCCGTTCG CATGCGCCCn G INFORMATION FOR SEQ ID NO: 130: SEQUENC= CHARACMISTICS: LENGH: 2869 base pairs TYPE: nucleic acid STIRANflEMMNSS: double TOPOLOGY: linear (xi) SEQUENC-E DESCRIPTIONl: SEQ ID NO: 130: CTCGTTCAA GGTTGAGTCT CTTGCAAATC T TTCGCGT TCTTCCTTTT GCCAAGWCAT CTCTCCCATG GTTGGTGCc.A GccATrG=r GAATcT~c;CT cTCA-?rGGT= CTACCAAACA AGCAAGAAAG CGATGTTTTT GAAATGGAA'r AATCACTTAA ATCAC=NG TrCC-AAGTC TACAGGAGTG ATTh'TC~r-T TTTATCCGAT GATAAATGTG TTATAATAGG TAGCCAAAGA GGTGAAGAAA TGAATCAAAC AGTAGAATAT ATCAA.AGAAC TGACAkGCCAT TGCGtCCCCA AcAGGCTT'rA CrCGTGAGA'r TGCCGACTAT ??AGTCAAGA CTCTAGAAGG ?rTTGGIAC CAGCCGGTTC CACCAA GGGCGGTGTC AATGTAACTA TTAAAGGTCA AAArC;ATGAG CAACATCCCT ATGrGAcTrc CCA'TGTACA'r ACCCTrMGTG CTATrGTCCC TGCI'GTCAAA CC-AGACGGCC GTCTCAAAAT GGACCGTATC GGTGGCTTTC CT-TGGA.ACAT GATTGAAGGA GAXAAACTGTA CCATTrCA'rGT GOCTACCACA GGTGAAAAAG TA'rCAGGAAC CATCCTCATC CACCAAAC'rT CTTGCCA'rGT CTATAAGGAT GCAGGAACTG CAGAACGCAC GCAAGACAAT ATGGAAGTGC GTr'rGGACGC CAAAGrAACT AGTGAAAAAG AAAC-CGTGC TC-TTGGCATT GAGGTCGGTG ATT-TATCAG T?1'rGACCCA CGAACTGTCG TGACAGAGAC AGGTrATC AAGTCTCGCC ATTTGGATGA CAAGGTCAGT GCGGCGATTT TGCrCAATCT CCTT.CGCATT 'rATAAGGAAG AGAAGATTGA ATTGCCCGTA ACAACTCATT TTGCTTTTTC AGTCITrTGA.A 
GAAGTGGGAC ACGGTGCAAA CTCTAAC.ATT CCTGCTCAGG TAGTAGAATA TCTGGCTGTG GATATGGGAG CCATGGGAGA TGACCACCAA ACAGACGAA'r ATACAGTGTC TATC-TrrGTC AAGGATGCTT CTGGACCTTA TCACTATGAC TTCCGTCAAC ATTTGGTGGC TTGGCGAAA GACCA.GTA 'PTCCATTTAA GCTGGATATC TATCCAT-TTI ATGGTTCGGA C~cTrCAGCG GCTATGTrCTG CAGGGGCAGA ACTCAAACAC GCCCTTCTCG GTGCTGGTAT AGACTCTAGC 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 CATTCCTATG AGCCTACCCA TATI'GACTCG GTGATCGCAA TATCTTAAGA GCACGrrGGT GGACTAATAT GTGCCTTATI- CXAGAAGGAA GAAAATCCTT ACTTTGTC.AA AGAGTTGGAA AGACCACCAG TATrTrGAAG GCTATAGTCT CTTCTAGCC CAGAACGAAT GGTCGATGCT TGTC-AGAGAA TTGACCTCAT ACAGGCTATC TGTGGI'TGG AAGGAGCATG TCAGCGAATT 896 GCACCATTTG AAAAAGGAGA CAAGACTCCG '1rTTCTAGAA GAAAT.GAGTT TAGTCCA.AGA GGCAGTTGCC AAGGCCCTTTG CTGCTGAGAA AATGAATATC GAACrGCrAG GAA.XrGGCGA TGCTCATCTr CATTGGCATC TGTMrCCACG ACGGACAGGT GATATGAATG GTCATGGrC'? CAACGGT=G GGACCAGTCT GGTGCrCC CTrrGAAGAA ATGACAGCAG AACCTGCCA AGCAAAACCG GA'rGAGA'rTA AAAGATrAGT CAAACG~rA 'rTMGAAG TAGATAAACT ATTAGAAATA AAGGAGTAGA TTGAGTCTAG CAGC1'rGTTC AATGAAGAAA AG3ATACCTAG TCT TGACAGC TTTrGcTAGCC ACAAGAAAAA ACAAAAAATG AAGATGGAGA AACTAAGACA
GAACAGACAG
AAAGCACAAG
ATCGTAGCCA
GCCAAGGCAG
GATCATTACA
AACCAAGATG
CCAAAGCTGA 'rGGAACACTC TGGTCAATAA AGGTGATTAC ACAAACACTA TCCA'rTGTCT AGTTGGTCAA ACTCATCAAA GTGGTTrTAG AAGTTATGAA GAAAGGCAGC AGCrCACCGT GGTAGTAAGT CrCAAGGAGC TGCCCAGAAG TACAGCATTC AAGGGAAPATA CGATGAAATC AAAGACTATA ATCCAGGGGA AAATCCAACA GCGATGCAAG ACACG1TTT CCCTA'rTAGT ACTCAGACCA AGCTCTATCA AGATTATGTC TACTCTGCCC GrCCTGGCTA TAGCGAACAC 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2869 CAGACACC TGGCCTTTGA TGTGATTGGG ACTGATGGTG A77TGGTCAC AGAAGAAAAA GCAGCCCAAT GGCTCTTGCA TCATCCAGCT GATTATGGC'r TIXTTGrCCG TTATCTCAA.A CCCA-AGGAAA AGGAAACAGG C'rATATGGCT GAAGAATGGC ACCTGCGTTA TGTAGGAAAA GAAGCr AAAG AAA'rTGCTCC A.AGTGGTCTC AGTTTGGAAG AATACTATGG CT1-GAAGGC GGAGACYACG TCGATTAATA CTCTTCGAAA AwCCTCAA ACCACGTCAG CGTCGCCT TA CCTACTGACT GCGTCGGTTC TATTCACAAC CTCAAAACAG TGTTTwTGAGT cGATTCGTCA GTTTTA'rCTG CAACCTCAAA GCTGTACT'Pr GAGCAs tGCG GC--AGCTTCC TAGTTTGCTC TTTGATTTTC ATTGAGTACA AAAAGTAAAC T-z--rCCTG CAAT TCCAGA TAAATAGTGT 4PLTATGC.TG GGATG~-LA AACATCT TGTGGAGcPA AAAATC CTA ATITACCGCCA AAACCACAAA GGAGGATTTA AAAATGGCTA AAAAAGTCGA TCCCTGCTGG TAAACCTACA CCAGCTCCAC CGGTTGGACC INFORMATION FOR SEQ ID NO: 131: SEQUENCE CHARACTERISTICS: LENGTH: 6186 base pairs TYPE: nucleic acid STRAIIDEDNESS: double (in TOPOLOGY: linear AAAACTTGTA AAATTGCAAA
TGCTCTTGG
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131: 897 CTGAATCC--T TATAGGAGTC CACTAACT'~TT rAGCCTCrA C??TGCCCTTC ATAGGCAGCT TCAACATCAT TAAAAAAAGA ArCCACTGAA GCAAGTTCTT CAGCCrCCA CaACAAATCT 120 AGrGGTAAC TATACTG11= G7rCATTALAC C~r~rGTACGATAACTA CGAGGCAGAA T7rGCATACC CTGGCATCA ATAATAAT'M TC).AATGAAG CAATTCCGAT ACC-2AAATAC 1Nr7rGAACG AGTTG.AAATG ATGTCATCAG GCAAT.TC -rT TC".rGTTAAA GGACTGGTCA 7'r'T=CTAAC CAGGCCTTAG CCTACGACA TAATACCAGC TCTCATTC=r CCCTTA AAGCACGAAT CTCATCTrCA TTAAAACCGA CACCACGCAA AAGACrAGGA TACTGCTCAA TCTCTACATC AATAMCAAT 7TTrGAAAAA TACCA =.IC CTCAAGGAA AGGATGTG'TT TAATA'rTGTG TTCCACAAAG CGAACITATG 'rGATGTACAG CTCCGTGATA GAAATAGTGT AATCATGCTT TTCTCTCTT GCTTGACAG GGATTTTAG AACGATTGAC ACAATTGGAT A7TTTATAT ATGTAAATTA AGATACCCAA AGAAGT=C GTCTCTGCA AAATCATAA-.
ATCTATACrT GCTACrrCT ATTATACAAA AAAATAAAGC AAANAAAGCC TATT-rTTCA AGAAAAATAG GC7MTr'IGCG TTGGT'rAATT CAC'rCTTAAC GATccGr"A AACGATATAT AAAACATCTT TCCTTTCACT TCCTACGACT T7rCAGATAC ATAGAGGGCA AAAAAGAGGA GGAAGGCAT'G AACACGATCC TTGGCTCCAT CAAAAAGCCA CCCACAAACA GAAT-'TGATG GAAAACAGTA CCAAGTCCAA TCATCACAGG TAAGACTACT AAAAAGTCCT T=TCACAA'r CCTATTGACA GCTACTAGCT GAACCAAATG AAAGAGATTC .GCTGACGAAC GAAAATCAGG CATCTGTAAG ATCAAGATA'r GAAAATTGTA TTGAATGGTT TTTAGCCACT GAATCTCCA'r AGGATAGAAA CAGAGGGAGA GGAGAAAGiAG CATAGAGCCC CCACTCCGCA AGGCTAGAAA CCACATGTGT TGCCTTAGTA AAACCTGrCG TCACCAAGAG AAGAA=-G-T CAAAACCAAT AGAGCCAGGA G~kCT.r-TCG AAGACATAGA AACTTCCCAG ?'rGACCAC=' CGAAA'rGGTG AcCGACTAA AAGGA'ITGGT TCTGGTTNTA CATAGACTCG ATCCAAGCCA GATAAAT=G CAAAACATCA ATTTAGTT AAGAAAGA.AG 840 GGTATCATCT 900 CAAAACTrCCC 960 ATAAAGAGAC 1020 'rCrCACTAGA 1080 CAGACCAGCT 1140 CAGA'rAATT'C 1200 ArrCGTTAAG 1260 CAGAAGGATG 1320 CATCAAAATC 1380 CGGTGCCAT GGCAGGCCAG CTACTTCTTC CG7rGGAA'rG CCA7"'GTCAA TCCCACCCCC AATATCAGTC AAATAATTGr CTCAACCCC TTGTCCATAA TGTCGCTACT TCAAGCAGCG CGGGATGTTG AGGTCAGGAT CCCCAACCAT GATAACTCT TCACGTTCCA AACCTAAGTG TGATGGCATT TGGTTTrCCG ATATAAACCG CCTCACTCG TAATCAGTGA GCCAGCACCT GGCAAAACAC CGCG'rTCCGT TGGTTCCGAT AAAATGGGCA CCCTTTTGAA TAGCAAGACT 1440 1500 1560 1520 1680 1740 TGCTGTrGGCA AATTrrCAT AGTCGAC?1TG TTrTCCTTG TCrrCCACAT AACCAGCCGC GACGACATAG ACGGTCr'1r CAAGCCCCAA CCCTGTGTAG ACAGTCGATA GCGGCGTATC AACACTCTCT GGAGCGGG =TATT=~ TTGCAATTCA TCAACAAAAG TCTCTCCAGC TCCGTCTAAA TAATTAAAT AGCC=TATA 898 CCAATCCAGA CCAACTACCA CGTAGCAGG CTGATGvGCT TCCTTGACTC CTGCTCTCC
ATCATTCATA
GATATTAAAA
GCTTACAAAG
TAGTCGATGG TTGCCAAAGT TTCTGAGCCA ACATCTCCTr AGATAGGGAA TGTCCCGCT AGGGATTCGG TCN'TCCCCT TATAAATGGT TrTCATCT-AT TTC-.CCCTAA GCCTrTTTrA ITrTC-rGCCA AGTAATCATTw TCI'ICCAG TCCAZTCCGT TGACATACTT GAGGG'A-r CCTCAAAAAT CCAGTCCAAG AAAAACCAAC ATG;GTAATCC GGATAAGTTT 'rrTATCAAAA GGACiTrCG ACTGTCGCGG GCTCCGCrTC AT'rA'AAATA CTrCCGTTrc GAGGrAA'T AT'rCTACGAT AATCCAGACC G'rTCAATCGA CTGCATCCCT TA'rGACCCTT CATATCAGCC 7rGCTGATcG GTTCTTCCAA TCAGACT=G AAGGAGCCAC TCCATGAGAA CAAAGGTCCC CTAAAGCGAC GGTAGCAAAA T'rTCCr.CAA AACGA.GGCAA ACAT1TATATT CTTCCAAAAT GAAACTTGCA GTGACAGCAA ATATCAAAAG GAAT 'CAT GGACTCATAC TTCGGTC-CAC AATCTCAGGT TCAACAACAT CATCATICTGG ATGAGCTCCT AAGTCGATTG TAGCTC.AAGG ACGAACCACC TCAATATCAT GGCCTTATCA CTAACTCCC.A AATCACATCT C4AAGTTTGA 'rTTCATTTGA TAAGTTAAGc CTCCCAAAAA GAGACCTCTC CTTrGTTGAAC CAAATCTCTG TTTCCTCAAA TCCCCCATG CI-rCCATl-G CTCCrT1~rcA CCCCAC'rGAG ATAATGACGA CAGATIGCCCC 'rCCACCAATA GCTTGGGCAT TGATAACCCC ATCACTTCTA ATTTCATGCT TCAACAGCCG ATGTAATCAC CCCACCTGGT CGAACTTCCT TTC7rGGGAA TATAGTGGGT CAAAAAATCC GCTCCCATCA TATTACTGT TATL'GACATG ACCATTCATA TCCAAGTCGT 1.800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 CCATCATCCT ACTCCAAAAT AAATCGTCT TTTrGrCATGC GAACGTIGTC ACCCTCAATC ACAAI-rTCTC CATAAAGTCC AGCAAAAGAC CAAACCCCGG ATTTCTTGCG G.AGTcccAAG CTCAGGTGAT CTTCTGTAAT ACAAGCATTG ATGGGACCAT CAT'rTrrATA GCGTACTCCA
GCTACAGTAT
TGCATGAGGC
TGTCCAAATT
AAc-A'rrrCT
CCTTCCTCALA
'rrGACATAAC CCA'rACTATT CAGCA=rGTC CCAGTACATG AGATTATGAC GACrL'CAALA ACCGGGTTTZG GAGAAATTAG AAATCTCATA ATGCrCAAAA CCCGCTCGCT CCAGCTCTGC AATGATG;TAC TcAAAcACT -cc~cTrAG TTCCTCCTTA GGCI*GAGCCA ATTTCCCACG TCGCATCCGG TTCATAAAGA CCGTATrCTT TTCTAAAATC AAACTATACA AACTCATGTG CGTATCC AATCCAMTGG CTTTAGCCAC ATTTTCCTTT AC7TGCCCA TCGTCTCACC AGGCAGAGCA 899 TA.AATCAAAT CAATGAGAT ATTGTCAAAA CCACCCAGT TCAGCGtATC TAAATATCCT TCTCCAAA'rG ACTGCGCCCA ATC=r"rCA ACATCTTATC TrCACACCTA CAAACACG ATTGACAGCC GAATr1wrCA AAACACAT TCCAAATCGC CTGGATTGGC ?TCAATGGTC AACTCTTCCA AGACAGACAA TTACTCAACC CATTCACTAA CACCTCCAGT TGCGGAGCCG ACAGGCCTGT CCACCGATAT AAAGGG'I-GA CAACT"'rTCA AThTCATAAG AACGAAACTC TGCTCTAAAT ACCTGTCCAC TGGCTGATTT T'rGATrGAAGA CCTTTGAAAA
GATATT-TTCA
ATCAAAGGTC
C=ATCCCCA
3600 3660 3720 ATCCAAGTTT 3780 CGgqTCTTCCA 3840 TTCCACCAGA 3900 ATCACAATAA 3960 7TqTCTGCATA 4020
S. 55 S S 0S S S TAACAAATCT GGGTACAAAA TGCGATCTGC ACATACGCTC GTAATTATTA TACCACAAAG ACTAxATCC AGATAAAAAT TCCGTCCGGA GATGGTGA'rG CTTTTCTT CTGTTA'rATC CATrCAAGAGC TTCGGCTTTT TCrTTCCATT CTCCTTGAG ATGCI-rCTGT CGCTTGAAAA GCATAGGATr' TAGTTTGACC TTrACCTGA CTCAACCTG;T TCTTTTCTT TCACAACAAA CTTCTGTCAG T T=CACAG ACTTGCTCCT TGGCATACTC CTAGAAAATC TrGAGCCTGA CTGCAAACTT GTGCC=rT CAAGAGCTCC ACcTGAAAcG C1TTCCTAAAA GGATTGAGGA TCCTTTrTA TTTrTGAAA AATr'rACTTG CAAGACGAAG TCTTGAGTGT 'rrrGAACCA GCTCATGAAGC crTCTTGCT TGAGGTCTGA AACAGATAGA CATAAATCTG CAACAGCACT CC-ACCTTGAC ATTGATATCA TCTGCCAAGA CATTGACCTT GATGCAAGGT CACATCCACA TrAAGTCA AGGN'TAAT TGACACGACC AAGC1'rr-T .ACAG1'AATGA 'rCAGATAGAC GGGCAACAAG AATATATGCA AC7=~AACA 'rrrAG7=~C GGGCCTTrCrr TCGAT~rrGA TAALATAACGA TCATTATACC ACAGCCATTG GGACACTCGA AACCCGAAGA ACATGAG-ACT TAACCATACG ACCGAAACCA TACCAAATCA AGTAA.AAGCC CACCXCCCC AGATACATAG AATCACAA'rC TC?'rCTGAGr ATTATTTAAT TGATTTTTTG AAGrATACTG TCCACAGTGA ATC=GAGCC TGCTCCTTAA CGGATCT'rCT CTCAAATCAT ATCACTTGTT AAAAACAAGG TALAT'r,"ACCC ATAAGGATTC AGCTGACAGA CTTrCACCAG CAAGACACGC GCATGGTCAT GAkAGAGTGGA TCAATCGTAG AGCCAACAAC TCATTGGTGT CGTCTTTC GTTTCATCGA CAAAAAGACA ATCAAAGCTA CTCCTCTGTA ATATAGTAAG GAGACCGATA AGGACAACTG ATCTGTTCGC ATACCTTCGA CGTGATATGA CCTCGTCTGA GCCAAGCAGA TTCCATAGAG ATACATCTGG TCACGGATAA AGCTTCTTGG TTAAAGAAAT 4140 4200 4260 ACGTrGG.T'r 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 55 S S
S S GACTCTTCCA TT-TCCCTCTA AAAATCAGAA TCAAGGCAAA ACTCATAAAG GAAAGTCGGT TGACGGTAGC TCCCCTCAAT Arr~rACCTAC ATAATCCAGA TTATCCACTC TTGCACCA'rA 900 TACCCCAACG CCCCAAACTT TGAGCAATCA TAACGC!TAGG CGCCGCAATA TCTAGAAAAT CCCAAGTATF GATGAGI-rTA C GGT CAGCAA AGATATAGAG AACCACCGTA AATG.GCCAAA CCACCATTCC AAATGGCAAA TATAGTAATC AAATCGGAAA ATAACATAGT AGAGACGAGC CACAAGAGCC CCACPrATCA AATCTCTCCT AAATrCTGAC TCCTAAAAPA GCCAAGGGAA AGGCTACTAA GATAAAATCr AAAATATCGT CTGGTATGAT CTrCrCT.A GGTGCTrCTT TCATGGTCAA ATAAACCGCA AJ3AATCAAGC TGGCTAGGGG TCC-AGTTGA ATAGCAATTG ATTAGACTrG TCAGTCGrC GTCGAAC.AAA CGATAATTCA TGGCAGCTGC CTCAATCACA ATACGAATAC GAGGAATGtA CGCCAGAAAC ATCAAAGGTC TTATGCGTAT CGTAATTTTC CTGTCACAAT ACATAAGGCA TACCAACGAA GATCAAGCAT TT7GCACCTC CGGGTCGC.AT CAAAGCCCAT ACAGAGATAT TACGACCTGT TI'CAAGTTCC TCTGCATTAT CAAATAGACA GCAAGCTGAA A'rrTCGAGCG '1rCCTITGGCA
TTY'AACTGGA
'rTCCAAGACG
CCTC.TGAAGA
CCCCACGAAT
CATCCTTGGC
GCTCAAGACC
CATAAATATC
ATCCTGACA GCACTCGCAC CGTAGAGACT CATAAC'ATCG ATAATACCAA T'TCAATCAAG TGTT'rCAAAA TTTCAGCTGG TTCACCCCAG AGAGTAATCT AAAGATATCG ACACGGTCAT CGGCTACCAA ACGGYGACCA. CGTTT"GACAA 'GTCTCGCTC I-ACCAATTC CACTATCTCC CTG.AATCAAG ACGCCCATCC
CATCAA
INFORMATION FOR SEQ ID NO: 132: SEQUENCE CHARACTERISTICS: LENGTH: 9541 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6186 120 180 240 300 360 420 480 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132: GAAAATCACA ACCCTTI'G CAAAATTTTr GAGATTAT?? TCACAAACTT GATTrCAA AGTATACTCA ATAAAAATTA AAAAAATCC.A CTACGTCAAG GCGAGGCTAA TGTGGTTTGA AGAAATTTrTC QAAGAGCGTG AATGAGTATC ATCTATAGTA AAATAAAAAA ACTGAACAAT vrGPGGACAGCCAAAC CAATTTCTC.A CAATGTI'CA GAAACAAGGG TGTGCTATTC CAATITCAGC CTACTATAAC TGTCATAGAT TGCTGAAACA AAGTCTAGGT AAAAGTCTTC ATAATAAAAA G3ACCTccTAT CAAGTrCA AAAACTI'GA TAGGAGGTCT TGTTTTGTGA AAATATTTAT CAAA?1'?PCT ATACAAGTGA GCTGTTAGCC AGGrrc'Tc TATTc-rTCA ATTTrCAATGA A7KMTT~ TACTAATACT CATAACTGGG AATTTGTCTG TGTAAAAATA GCGAGATAGA TGGTAVITAT AAAACACTCA AC.ACAr.CTAG ACTAATATCA TrAAAACAT TACTTCTTT TGAGCQACTG GATAIGTC TGAT?1'AGCA TrGGrTACrA ACATAGCTAA ATrrCCrGCA ?TrTCAAA'rr TrCACAACCA CC-AAGAGGTG 1rCTri'CCco TGAACTTCAT TCAATCGCAG AATCCACAAA GACATGATGG TAAATTCAT GCAcc-AGrTr TTGTIrCAA TrGGATGAC? TGACGGA'rAA AGATAAGGrA
CATAGCTAGA
AeCTCAATACT GCCGC7ATGT
GTAAGAAAAG
GTCTTGACC
GACCATTATA GCTAT1TCA'rC CAGTCGCAAC CAGTrrCGTA GCATGACCAG ATTCATGGCT GGTCTGGATG AACZGTrGCC
TCATTAATCA
GCACGCTCTC
CGACCAAA'rr
GTGGCAGTT
AATAAATCGT
AGTTCCAGT CACTTGGrC ACACTGTCAG TATCATCATG GGTCAACTCA CCA?1'TTCAC 600 660 720 780 840 900 960 1020 1080 1140 1200 CTTGACCGA'r
CGACCTTGCG
GAAGATTGTA
T'rCCTAATTC
TGACAAAACC
TCTCATTAAA
T'TCCATAAAG CCCATCCCCT CATGATTTGC 'rCTCGAACTTT ATTCTCATGC GCTTC-ACAT ACGACTTCCT AGGATTGCTT- TGACTTGATA GCACCATAAA AG??AAGCAC CTGATTT-GGA TGTGTATAGC TAGCTAGAAT TGGC'rTCGTC GCAGCACCAC CTT'CTCCCCC 7'IwMACAGCA TCGCGCTGAT GGCTAGGCATI' GTCCTTCTTG GCC'rTATCAT AAGGGGCAAG CTTCTCCATA GAGGATAA-G TTGGAGTCGA TTTCATCCAA GAAACCAA'rA
ACCTGTTCCC
GCT=rACGA TGGTCTTGAC ATCATGAATC CCCA'rCA6AGT CAAAACGGAA CCCGTCAATA GCACCCAC;TA TAGAAGAGAA TCAATCATAT ACTTGCGAAA CATTrCG=G TTTCATT'rCC AACACCCGTT CCA=TGGA AGGTACCATC TGGATTCATA TTTGGCATCT 1260 ATATCCCATC 1320 ATCATCTGCA 1380 TTATAr'rCCT 1440 TCACTCGCTG 1500 CGATAATAGT 1560 TTATAGAC'rA 1620 TTCAAATCAC 1680 AATCAGGGAC TGTrT'rrGG CATCCATAAT GACTCCAATA GAATGACCTG AGCTGGATCA TTrTGTGGATC ATAACCCCAG GGTCTGC AAT TGGTTSCAAT CAGTTGACTG GCCGTAT1TGG .GAAGATGTTC ATCTACACCC AGATAACTGC CTTACATGGA
AATGGTGCAT
CCCGCATCGT
TCTGCATTAG
TrT=AGGTTA
TGAACATAAT
TrAACTGTTC GATcTAJ:G:=
TTTCCAAC
CAACAACTGA GAAGGTATGC CATAAGCTTG AACCXI'CACC TTGAAAAACT AGTTTCTGGC GCCTTATAflT CATTTCCATC CTCATCGTAT TCTTTATC-AC TGTAGCCCAG CTTCTTGATG TAATCAAAAG CAGCCTGAGC AGCACCCAAG2 AAAGT'TCCTC ATTTrAGTCAA ATCACc2AATG TGCATCAC GCCAACTAGC CTCCGAALCCG TGCTI'AACCT TCAGAATAGC TGAACGTTTG CCATCAGGC GTG TjrGG ATGAGGGAAT TGGACTTGAT CAACATCCAA ACTCCAGACA CCGATTGTAT 1740 1800 1860 1920 1980 2040 2100 2160 2220 CGAAGrrrC AACTrGCTT T CTrAATGGC TGGTCGCC-AT TGTATAAGGA TCACGTGTCA ACTGATAAGT CTTACCTACC AAATCTTCTT T=C-TATC ATTATAAGAG GTCA6TCATT
AGANGAGAAAA
AATGATGATC
AGCAGCTGAT
ATGACCTGA
AAAACTAOCA
902 TACIATTGC cCCTTTCAT TCATAAACGA CAACTCC AC TTGTCCTA CACGGCAACC CTGTTAATGG CCTTA'rCAAA GCAGGATTTT CAGAGTAATA AGGGGATAGT GA'-TAAAACG
CTCAAAACTC
TTCTGrcccr CAATTCTCCTr 'rCCAAAcCC
GTAGGTCACC
'TGGTAACCCC
TATAMAAAGC ACTGCCAATA AAATCCAGAC CTCrT =AAT 7TrTGACCTGT ATGAACCACA AGCTAAATAA AGCTCCAAAA GACTrlrAC AAAGGAGCAA 'rACGGGTAGTrT ATACA'rr"r '-1-rGTCAA'r CTCGTCCAA
AAATT'CAAGC
TAATCTTC-.r GTGTCATATr TATTT1'TCCT 'rTAACAGACA ?TrrCATA6AC
TGTAGG-IAG
CrCCATTrcrr TTTAC~rTC GGCAAAACGA TrrrGATr AATCCT~ATCA TCGCCTTCCA GATAGAATAT TCT'rrACTAG ATGGTGAACTTr GGGTGTTrCAA CAAATCAATT CGTTGATCCT ACGATGGTAA TGA.A'rCGCA TTT-CTAT=C ACTAATAAAT CGA'rCCATTA CAAACTTTCT AGCCATGCCT CTTTTCAG 'I-I-ACTTr-A GCGTC'rGCAC TACTACCTGT CACTTr'TGCA CCCAGACTTT GTCGTTCCAA TTTTCCTGTC AATACCACGG CCGTCTGTCC TTTATAGTCC AGA711GACCC TACTCATArr
CATCTCTGAC
CTACCACGAC
CGAGT'ACT
TCAAACCTGA
CAGTTTCTTT
CAGACCCTTC TGTCGCAAAA TAAGTCTGAA GACTrTTCGCC CAATACTAGC CACTTCCTCT GAATCTGCCT GAGACAGATT GAAGTAAAAG CTGACrAACC T'rGCTTCCGA CATG.ACGAAT TCTCCCCAGA ATTTTCCTTT GATGCTTGGA TAGCCTGATA CC'TTAACTCC CrCTAAAAGG AGGA ACCT CTECTTGCAA CCTTGACTAA ATTAGCAGCA AAAAGCTTCT C.AACAATACA TCATAGCATC ACGAGAAGCA AAGTGAATCA ACCCTTCCAT GATTGATACA ACGTA~vGGCC ACTTCATCTT CAAAGTGCAA -GACAGTTTGT AGGGATATCT ACN'TTCTT CAGAAACCCG AAACGCCAGG GATGATGTCA CCAGCCTTAT ATACAATGAC CIrTCAGC AATATAATCT ACATTGTGCA GGGTrCGCACG CTTGTACTr.G TGTTAGATTA GCAGTTGGAG TTACAACACC CAACTCATAA GAGfl'IGAC.CT TC~rT'TTT CGGCAGGGAA TITGG-AGCCTT AACTGTAAA.A CCAACTTCTT CrrGACTTGC CrPCTAAACTC TGTTTAA6A CTGGATACCA AGTTCrrGTG GAGGTCGGTC T1"IwrAGAAA TTAGCTTCT GAGCGCTTCA CAAGGCCGCA TCCGCTACTA CAA7-CTCTC AGCAGAA'I1r AATCACGCCA CCTAGACTTTr 'rCAATTGAA TGCAAATATT TCCCAAACCA AATAAGAGCT CAGTTTACCA GCGGACTTTT ACGATAAATA TCCGCC-ACAT TGGACCAAGG CCTGTAATAT GATTTGAGCA GGGCAACGCG CAAGTCAGAG TTACAACTTG TTrTGGACTCT ACCACACGTA CGTATCGTCT TTTCGC-ATAT CCTAACAGTC GTACCGGCAA CGTACGCCA ACrGCCAGT C'rrGTAGGCT ACTGCCCACT TAGGTCGTTG ACCTTGATI'A 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 CCACTCCATC AATATCGTAA-GCCAGA1rr CCCG=CCTC TAC'-TCT TGGATAAAAT TCCAGATrrC AS'CTATrTT TCACCAAGA TTCCTTAGG ATTGACCACA AAACCTAGTT GTTCTACGTA CTTCAAACCC TTTTCT'rGGC TATCACCACT 7rGAGGGCTG GCTTCT~rCAT AGAGAAACGT TGCAAGATTA CCCTTGGCAA CTACTGCTGT ATCCAACTGA CGCAGAG -rC CTC.CTGCCGC ArrACGAGGA TTAGCAAA'rT CAGGCTCTCC A'rTTTGC CGCCC7rGG= ThACTTGGTC, AAAGGAAGCG CGTGGCATGT AACA'rTCCCC ACCALACTGTG A'rATCTAGTT CTrTrGGCAA AGTCAAAGGG ATGTCCTTAA CACGCTTGAG G= =.cC;TG ATATTTTCAC CAATTGAACC ATCTCCACGT GTTACCCCAG CAACCAAAAT CCCC-rTrTCA TAAGTCACCG AGATAGA'rAA GCCATCGATT TTCAGCTCAC AAATATAGGT CGGATCAGCC AC-TCCTTAC GAACACCCC ATCAAAAGCA TCTAGCTCCT CACATCAAAA AGCATCCTCC AAACTA'rAAA GAGGATACTO ATGACTGTAT TTTTCAAAAC TCGGACTGTC TGCTAGCACT TGCTCTGGAT CATCTAAAAC CTTGCCACCA ACACGATGAG AAGCAGTTC 
TAACTCGACC AACTCACGG-T
*0 0 C AAAGGCGGTC ATACTCACrG TCTGAAACCG AGGGATTATC CCTGGTATAG TACTCAGTCC CATAGCGATT GAGCA.AAGCG ACTAACTCAT TCArrCrTT ATTCATAAGA CCA-.TwTACC ATAAAACAAG CCCTCCTCAC AAACGAGAAG GGCGG.AAAAA ACAC-rAG=r TCAAATTATT 7TTGAAACTC AAGCAACCTr ATATCAATTT TTCA.AAATGA GTCCAACAT ATCCGAGAGC rA6AGAkAATAT A.AXGCTACAA CTCCAAGTCC AATAATCALAG AAAGAATAAA GATGGACACT TGGCAAGACT GTCATAAATC C 1'TGAAT AGGCATAAAT AGAATAGCTA AGGTAAAAAT TGTACTCAGT ACTCTrCCAA GAAATCGCT CTCA6ACCTTG GTTTGTACTT GAGTAAAAAA GTGAATATTA AAALATCGrCA TAAACAATTC ACAAACTAAA rrTCC-AGAAA AGGAAAGAAA AMTrGGAGT GGTAATCCCA TCATAAAAAC TCCGACACCT GTCALAAGCCA GTAAAATCAA AAGATTATAA ATATrAGCCT 'rAA~r'rACr AGCTAGAAGA GCCCCA.ATGA TGGAACCAAT AGCCCCCATA GTTAAAATAC TTGCATAGGC TCCTTCTCAC CCGTA.AAGCT GAT'rCGAAAA GGGAAGTAGA AATTCAAAAG CrGCAAAAAA GAAATTAACG CTGGAACCTA CCAGCAAAAG .GAAGAAAATr TCTrG;CTGA'r GCCAGATATA GTAAcccA TCCrrGATAT CTACAAAAA'r ATCTCTCCCA GTAAAAGCCT 'TlrCTCTrG AACTGrCT TCCTCTrM GAAGGAAAGC CACTAGAACA AAAGCAATGA AAAAAGTCAG CGAGTCTAGC AGTAGCGTCA TATCGAGACT TGCAAACTGT AAAACAAGGA AGGAAAGAAC AGGAGACCTA ACACCTACAA CCTGCAAAAC CAGCTCTAAG CGAGAATTAT AGATCACAAT CTCATCI'TTC TccAcC-AcTr cAGrrATGAT 4080 4140 4200 4260 4320 4380 4440 4500 4550 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5450 5520 5580 5640 5700 5760 904 AGCTTT'ATTG GC=GCGAG AAAAGGCAAA AGCAA'rAGCC TGvCACA6ATGT TAGCAACAAT CAAACCCA ATCATCCAGC TATCATTCCT TATGAAAGAA ATAGCCAGAC AAAGAATCCC ACAAACAAGA TCTGCCGCA TTAAAATCX? ACCACGAGAA AAAC=CCTG AAATAACTCC GCCAA.AGGGA TTGACCAGAA TAr.ATGTG2AC GAGC1'CAGAA ATCTGATACA TrCCTAAAAC TGTrCTGTcC'T ATAGTCCCCA TAGAAG3CCAA ATCCCMTT TTATTGATAG CCCCACGGCT AAGCTCCTC TCAAATTTTG A.AACTAT'EGT CCCTTATTTC GGAAAATTAA CTTTTGACAA CCAGACACTA ?rT'CCTAA'r CATAGAGCAT AATCAACTGC ACTGCA'rAGC GA-rCATATT A'rCAAAACCC AAACGAGCTT TTAT7771 I ArrrTTCC;TA G=TCCTGA TAATAGGCTA
CTTGCTGTG AAGACCTAAC CTTCTTTACA CTGTCC"=T GC'TCATAAAG CTTAATACCT CAT'rATAAAA AGAAGAATGC CTA6ATCTCTT ATGGCGACTA CAATCCGAGT GACATAG'TCT CATACATGGT CCATTCTTCT GCTTCATATC ATAACTCGCA GCTCAAAATA AAGGGGAGTC ACATCAAAAA TATGCATGGC TGATATAAGG CAAAACC-T TTGTCAATAA TCTCTGT C"TCTTGCATC 'rGCTTACAGC 'rAA.A'AATGG KAAACATTAC ATAAGCCT-CA AAATAGTTGG 'rCTAAACAAT GC1'GGTAACA ATTGAGGGCC AAAATCALACA ATCTC'I-rGGT AAAATrCCTC CCrCTCCATA ACTTCTCTAC ACATCGTAGA AACTATAGAG GTTrIrGAAGA GATAATCTGC TC-CTT'rGAC AAATCAGACC GTCGAACTCT TAGACTTTTC GTTACCGAAA AGAATCAACT TACCTTrACCC AAATCATCCT T"TTGA.AAACC TGCAATATCG T1TTGAATAGT AAAGTGGGAT CATGTTCATG ATTA'rGAAAA TTCCTTGCCT TATCCATGAA TGTTATCCAA AATCTCAAAG AAACGGGAGA CTGCCAGGTC GAGATAACTG AGAGGTAGAG CAGGA'rrCGC CTGCTGCTC TTGTTCGAAA TTCACGAAAT ACTTTTCCAA GATGT'rCCAT 1'G'rTAG4CAAA
AAGTPGTTC
AATCTGGCC
ATTTCGT
AGACICCCCA
ATCCZAG7TCA
CTTTGAAGCT
ATCATAGACA
GTTACATGAA
AGCTCAAAGC
6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 CTr'rAAAGAA TAATTTCCAC CTTrTACACCT GCTCTGATAA C C TTCTCCCAC TCAACCATAG CTTCTTCCTG TAATTCCATG ACfGT=CGG CATCGTT'rGT TTGACTTTCT AGCTCTTCALA TTTCAGCTTC AACTTCITrT TGACTTTCTT TCrGGGCCTG TTGATTCCTA GTTGAAGCTT CCTC-ACTCrG ATAGTACTCG TAATCTCCAA GGTAGAGAGT AGTTGCCACA CGA=rGATAA AGTAACGATC GTCAATCAAG GCATTrTCTA GCACTTCCTI' ACGATGGCTG ATTTGTCCA GCTCAGCCTG TTCCAACATT TG-rCAGAAA TGGCTTGGCT TAGACTTTCG ATTTGTCGCA TGAGTTTGCG ATAGTCATTG ACTGGACTTG CTTCCTTTGC ACTCATI'TCT GCTGTTGCTr TCTTCTCAAC 'rGAACCATTC TCAGACAATT CCAAAACATG ATGAC'TGACA AACAGCAAGG TTCCATCAAA ACTATCAATA TCCAAGTGGT TGGTCGGCTC 905 ATCCAGAATC AAAAATTAT TCI'ITCCAT AGACAA~rrA GCTAAA.AGCA AACCAGCTIr TTCGCCACCA GATAGCATGC CGACTGATTT 'rrTAACATCA TCTCCTGAG;A AAAGGAAGC TCCAAGACGG TTGCGGA'rTT CAACTrCTGG TGTCAGTTTG AAATICA?1'CC AGAGTrATC CAGCACCGTA TTACTTGGTG TCACCTTGCT ?'rGGCTrrGG ATTAGCGCCA AAGCGCPTT CTCCCTTGAT AAAAGGAATC TCATACTAAC CAACCTCAAC TGvGTCCAC.AA TAGACTTGAT AAAGGNTGAC TTGCCGATAC CA'rTTGCACC ATCTAGTA ATCGG1TCTrG ACAAGACTTC AGTCAAAACA ACATTGCCCG ACG~TrTC GCCAGCTTCA GGCTGTCCA AACGTTCCAT ACGTTTAG;TC GTTGAAGCAC GAACTAGATT TTCCT'rCTGT TGCTTTTCZAT AG7r7rTGC AACGATAGCG- ACAGCATTCA TCTTACGAAG CCCGTCATAG CCAACAGCTG CATT'rTCAAC AGACTCGAAC GTCATGrGG CTGATTC.
7TTrCMCACT TGTTTACGGC GAGATTGAGC GCGATTGACA LACTCTTrCCA GAGCAGCGAT CrCAGTAACT ACCrrMGGC? CCTTCAAT'rC AATCTAGCGT GACAAAACGA GCTAATTCC CCACATACC ATCCAAGGAA TGC= GGTCA AATTrGTCCCA ACCTTCTCCA AGAAATAACG G'TCC1'GGCTG ACGALTAATGA GGGCACCCCT ATAGTACC AACTAATTCT CTAGCCACGCC CATrGNTCA A'rATCCAAGT GGTTAGTTGG CTCGTCCAAG ACCAAGAGAT AT'7rGACC-A CCAGAA.AGCT TCCATTCAAA ATCGCTCCAA CTCAGATAAG CGGTCATAAT CATCTCCAGC TCCATCTGAC AAGCATrTCA TCCTAGATG CAGACAAATA TCTTNTTTCT 'rGGGCTrrTC AAGGAGCATT 'TTGGCA6AGG CCAAACGAGT CAGCAATTPTT CATCrGCCAC ATAGACTCGT CAAACTTGAA TA'rAGCTTC ATAGCTAXAG CCACCIGC=I C4GCGAAAATT 7620 7680 7740 7800 7860 7920 7980 8040 8100 81.60 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300
CTGACATCAC
GCAcrrrG'rCT 'rATTTI'CAGA
TATTGATTTC
=~ATCCAAA
CTCCGTCCGA
CTCAAAACCG
TCCGCTAGTT
TCCTCACCAG AC71TTTCACC CGCAAALTCAT TAAAGACATG CTATCTTGGG CTAGGTAAGA GGCTCCTCTT CTCCAACTAA AATC7rCAAA AGAG'rAGACT TACCTGCACC ?rCATCAACC TGCAGGGA TATTATCGAA TTT~IrAGCT TCTAAAATAA rATACA.AGT ATAATCGTAA GTCTT'rTAGT ACAACTTT'rA TTATATTAGA TTACTTCACT A74CTTGTTGG AGTTATCGCA CAAGTCrATT ATTAATTCT AAATAAGATG AGAACAAATC GATTGGGAAA ATTGTTTCCT ACTA=rrAG ATTCAGTCTA ATTTr'rCCCA ACAAGAGCAA TCCGATCTCG AAGAACCTCT CCTGCAAAAG AACG;TC.AAT
AGTATAGCAT
TAACATAAAA
A7T'TCrAAC
TTTCATCATT
GTTTCCCTAA
TAAACTAAAT
CACTAATCT
TACGTACGTA
GGCATTCAAG
TATGTATATX'
TGTTT-CAAAT
TAGCAGATTG
TGTTTTAGCA
ACA='CAACT
GTAAAATTAA TT=CATAAA CTATATACAA TATrrTCGGA 906 TTTTAACTCT ATTTIA?1TACT AGATTrcATA ATTrAAAAAC CTACIMACCA AGCrAGAAAG CT1'GATACAA TAGGCTTTFr AAAGACTGAT TATYTAACAG CCTTTAAG AGCTTTACCA GC?1-rGAATG CTGrTAC=r AGAAGCTGCA ATrGTCATTr C?1rrACCAGT TTGTGGGTTC CGACCTTTrAC GTTCTGCGCr. CTCACGAACT TCAAAGTTAC CAAAACCGAT CAATTGAAC'T
T
INFORMATION FOR SEQ ID NO: 133: SEQUENCE CHARACTERISTICS: LENGTrH: 3502 base pairs TYPE: nucleic acid STRtANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133: TTGACTATCC TATCATGCTr 'rCrAAGGTCr ACTCAAGAAA ATCAT~TTCA AGMrCACA CCNrTCTCAA AAAAGrTAAA AAATTTCTC AAAAACGCTr GACTCTGACC TAAGGCGAAG GGTrATACTA TCATTGTAAG GAGGAAATCA TCTACCATAT AAAAGAAGCT CCCCAGCT'rr CAGGTGTCTC TGrCAAG-ACC CTGCATCACT ATCACAAGAT AGGACTCTTG GTCCCCTTAA AGTCGGAXAAA CGGCTATCGA ACC'rACAGTC AAGAGGATTT GGA CGCCTT CAGGTCATTC 9540 9541 TTTACTACAA ATATCTAGGC TTTrc=rTAG GCACAGATTT ATTGCCCCAT T1'GACrAGGC ATCTGGATAC C"TMTTT''CC ACCT TGCAAA AAATGACCAT T3AGGAAAAA TrCACGGGAT AAGAAGCGOT AGAGAAA'rAT GGTCAAGAAG GTCACGAAGA CGAGGCrACG GCCGCCTTTA TTCAAGTrGG orTACCrGCA ACAGCAACCG AAGCCATTCG CACTTATGGA TwrrGACTCCT AGAAAATACC AGAC-GTTA AAGGAAGAAA AGTTGGACTA TCTA6ACT CGC GAAAGGCAAC AAACTA'ITCA AGAACAAAAA GGAGAAAGAA TTAGCTATCA AGAC.AATCAA AAATACCACC TCATGGGACA AGCCCTCGAA CGCCAAAAAG ACCAAGrCITr TCAAACTTTG GCACAAAATC AAAACCAGGA GCAAGCAGCC AAGCrCTTGC CTATTGACGT ATTCGGTCAT ATCGCTAAAG 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 GTTACGrCTA CAACCCAGAG TTTAAGGAAA ACATTGACAA GTTTGGTTCT GAAACAGCCC AGTACACGTC AGATGCCATT GCGGTTTACG TTCAGACAAA TGCAGAATAA ATAGGOTAGO AATTTCCTAG CCTATTIT AC TfCAAATC ATAAAGCCAG TCPOACCT T'PTTGTAGTA AAAGAATTCA QTGAGATCTT C?1'CTAGAAA CACACGAAGC ATATCAGACA TATCATCOOT' TGCAAGTTTT AGATGAGAAA GATTCAAA GTCCTCCCAC CAAACI~rCC CTTCGTCTGA AGACTGGAGT TCACCAGTAA AGTGTTCTGT CTTGTAAAAA AGGACCACAT AACGATAATC 907 CTTGTCGTCA TACCAC?1TT TTCTrCTITrC ACTTCAjCGAA ACCAGGAAAA GTAATGCCAG TCCGTwTTTA ATCATACACA CAACCTTTAA TCrcI1rAATCA AAGTrGATGT-r AAAGCAC TGATACCACA GAGTTGCGTc TTWGCAAATGA TCAGACCAGT TGACAGCATC GACAAAGGAT TCG-CCACGTT CAACATG.ACC ACCAGTCGrCG ATTAAcICG TCTTGGACCA GGACCTTATC TG=rAACAAA TTCGACTGCC TCTCrrCl'GT TCM-rCTTC-A TAATGCAGAC TTCCCGCCAC CCAGCCGGTA cAGAGGGcAG GTGTGGGCAT TCATATCCAT AACrrCWCcT GCAAAGTGGA
GGCCAGGTAC
CCrTGGTAAC
TAATGGACTC
CACCTTACTT
AAAGGACTr GACAAG7rGT TCAAGCGTTT TAGGATTG.AT TTCCTTGACA GCAAGGGACA ?rrTTCCACT TACAGGAA'TT TCTC=~CCT TTTCAGrCAG rTrrT=GACT
CTG.ACTCC-AC
TTrAAGTTCTT ?T'rCAGGAT ATCC?1'GTAC AAAAAArrCG GCCAAGCGTT CTGGTAACAA GGrTIrAAA GCGrTTTrCA AGGA=ITT'C CCGATTCT TCTAGAAATG TAACC.AAGTC AAACATCGAG TGAGAGAACC TCCCCACCTT Tr.ACAAAGCT GACCTGACAA ACCAAAGTGG GTAAAGAGTA AATCATCAGT TTAGCGTCAC ATCGTCCAGA CAAATACC=1 GTAACGCTTT AAGGACTTTC AGCAGCCTCA AGATCGGTGA TGGTATCCTT GACCAAAACC AGTCGAACCA GTCGAAGGAT AAGACTTACC TCTCACAAGT GAAGGTTTGA TCCGCTGACT TAAGGACA.AA CAGAAACGAT T=CA=rTGA GTAGCAACTT GACCACCTAG CTTC-CAGAA ACTTGACGCCA AGACATGCGT AGGGCAGCAG GATGACATGC TTACCATAAC ATGTGGAAAA TCTGTTAATA AAAATrGGCGA GCAA'rCrCGT ACCTGITGT ACAATGAGTT CTGGTCATCT AC=='T'AA TTCGGTGATT TTC'TTCCA 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2 160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 AAGCTTCGAT AATAGTCCGA CCTTAAGTTTr AACACCATTT AGAAAACACT GTrAAAGAAAG TACCA7TGT GGTCACATTG GACTTGTCAC TGGCTGGAAA GACCCCCC TGG;TCTTCGA TCTGTAAAAA AGTTGATGAT GTCATGATTA TCGAACTGG CCTCCCT'TTC CAGGAATTCC AGCTA~C-AGG 'IrGCTAAGC CAACGTCCCC CACCAGTCCC ACCTAATrTT TTTCCAAG'~T TCCGATTTTT TTCCATGG AGGG"rT'rCT GTCCATAAAA TCATACCAGC AGGTCCCCCA CCGATGACAA TAGTATCAAA .ACCACAAAAA AACAAGAGAT .ATGTCACC TCTTGCAAG TAGCCCATCA GCAAACCGCC CTCTCTGCA TAGAAACTGC ATNT'AATAT CCGCTTGTGG GAAGGTTTCA CGGATTCCCT TTTTCGTTAT TGCGTTGGGC CATGACAATA CGGCCACCAG TCATCATAGG CACCTTGAAC TGATITCTTT CATCCCC?1'G GCTACTGGAA ATCGrAGCC-A A7437rTrCAA GCTCTATTGT AATGCAATTA ATCAATTTCA AGAGACCAGA GGTTGGTAr-A CTCAGAGCTG TTGACAACAT CATATCCAGC T!TrACTAAC CT-I-TTGTAG CAArrCGAGA 908 GTCCCAGTT CACTAGCTT rCCGACCATA CGAATGTwrGA GA.AGGCCAAC GACCGTACCG ATAAGCTTGC 'rAAACGGCC GrrCTTCACC AAGTrATCGA C'-LrGGCrAG GACAAAGAGC AACTTAGTTT T71C=rGATA GGCGGTGATA GCTTCAACCA CTrTCAAA AGACAAGCCC TGGTCAATCA AGTCA"-rCALA TTTTTCTACG AGTAGGTCAA CTrCACCACC AGCAGATAAA 2940 3000 3060 3120 CTATCAATCA CATGAATCTT AGTGTcAGGA TGGTCI'CCA TG-AGCACTAT TGTGACTGCC AGAAAGGGTA CCTGTGATGG CCACCTTCAA ATGCTCGCAA ATAGTCATCT GGGCTTGGAC GCAGTTGCAT ACATGGTTTC CATCATTTGG TCAATATCGA ACCTGATCAG CTACTTGAAT GGTT'AAGGGG 
ACATACxA GTrGGCAGTT GACGATAATC ACAACCAGAG 'rCACCAATAA CTCCATCI-rr GTCAG.AACG; AT GATAAATATT CTTTGCTACT TTACTAGGAA AATGrT'rrG AAGCCGATTT TGAAGCTCT GACTGGCGTC ATC.AACAAAG AGTT-GTGTT AATAGCTGGT TCTTCCAAGT CATAGAA.ATT INFORMATION FOR SEQ ID NO: 134: SEQUENCE CHARACTERISTICS: LENGTH: 12665 base pairs TYPE: nucleic-acid STRANDEDNESS: double CD TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 134: CGATTGATTT ?TTTAAAGCG TTCGATAGAG AATGAGAAAC GAATCC-rAG CAATGGCGGG AAAGAATTTG GAGTTGAGAA TACAAAACCA TT-AACTATGG rCTTGCTrGC TTGACGCAAT GGTGCACAAG ACAATTTr'rG TTAGTCTTGC I-rATTmTC TAT1GCTGATG 'rTATGTTGC ATTTGGACAG TGAAGCTTAT GCTTGTCAAT AATCACAAAT AGGACAGTAA AACACCCTAA TTACTTTTTA AATATTCTTC TrGTTGAGTC ATGCTTATGT GACTTTTGTT TAGTTTTTC TATCCACGAA TACCTGAAGA GGAAAAGCTA TTACATGAAG ATAAAGAGAT AAATACAAAA T TCGATTTAT ATACAGTrC-A GTTAAAGAAA AAATATAGAA GGAAATAAAC ATGTrTGCAT CATTATTCAA T')fCGTAAATT TAGTGTTGGA GTAGCTAGTG ATGGGAAGTG TGGTrCATGC GACAGAGAAC GAGGGAGCTA AATACAGGCAA ATGAAAGTCA GGCAGAACAA GGAGAACAAC CTCATATTGT tTTrrATCTC ATGGCATGGG CATGGTTGGT TCA""TC-ACT'r GTTGCGAGAT ATGTAGATCA TATCTTGTT CTGAGTTGAT TGGCTTGACC CAGT7TATGC AGTTATTTTG TTATAATCCC AAATGGAAGC TATTGAAGTG ATATAGTAAG CAAAAAGCGA AAGAAAAGTA TAG-1-GTTGC CAGTCTTGTT CCCAAGTACC CACTTCTTCT CTAAAAAACT CGATI'CAGAA CGAGATAAGG CAAGGAAAMA GTCGAGGAA TATGTAAAAA AAATAGTGGG TGAGAGCTAT GCAAAATCAA CTAAAAAGCG ACpATACAAT ACTCTAGCTC TAGTTAACCA C??C.AACALAC ATTAACUMCG AGTATITGAA TAAAATAGTT GAATCAACCT CAG.AACCA ACTACAGATA CTGATrGATGG AGAGTCGATC AAA.AGTAGAT GAACGTGT CTAAG?1!1'GA AAAGGACTCA -1%TCTT~CGT CAAGTrCA2A CTCTrCCACT AAACCGGAAG CTrCAGATAC AGCCAAG.CCA AACAAGCCGA CAGAACC-AGG AGAAAAGG'rA GCAGAAGCTA AGAAGAAGGT rGAAGAAGCT GAc~xAAAAAG CcAAGGATCA AAAAGAAGAA GATCGTCGTA ACTACCCAAC CATrACTTAC AAAACGCTTG AACGAAAT TGCTGAGTCC GA'rT=GAAG TTAAAAAAGC GGAGCT'rGAA
CTAGTAAAAG
GAAG'rTGAGA
GAAGCAGAAG
CGGCCAAAAC
TGAAAGCTAA CGALACCTCGA GACGAGCAAA AAATTAAGCA AGCAGAAGCG CrAAACAAGC TGAGGCTACA AGGTTAAAAA AAA'rCAAGAC ACATCGTGAA AAGAAGCTAA ACGAAGAGCA GATGCTAAAG ACCAAGGTAA ACCZPAAG=G GAGGAGTTCC TGCAGAGCTA GCAACACCTG ATAANAAA.AGA AAATCATGCG -6 S 0
0000.0 0 S. -0 0 AAGTC7rCG ATTCTAGCGT AGGTGAAGAA ACTCTTCCAA GCCCATCCCT GAAACCACAA AAAAAGGTAG CAGAAGCTGA GAAGAAGGTTr GAAGAAGCTA AGA.AAAAAGC CGAGGATCAA AAAGAAGAAG ATCGCCGTAA CTACCCAACC AATACTTACA AAACGCTrGA ACI'GAAATT GCTrGAGTCCG ATGTGGAAGT TAAAAAAGCG GAGC?'GAAC TAGTAAAAGA GGAAGCTAAG 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 GAACCTCGAA ACGAGGAAAA GACGCTACAA GGTTAGAAAA CGAAAAGC.AG CAGAAGAAGA CCCGCTCCAA AAGCAGAAAA CCAAAAGCAG AAAAACCAGC GAAGAATATA ATCGCTTGAC TCTACTCCAA AAACAGGCTG GG7TrCAATGG CGACZAGGATG AGTTAAGCA.A GCAAAAGCGG AAGTTGAGAG AATCAAGACA GATCGTAAAA AAGCACAAGA TAAAGTTAAA GAAAAACCAG CTGAACAACC ACCAGCTCCA GCTCCAAAAC CAGAGAA'rCC TGATCAACAA GCTGAAGAAG ACTATGCTCG
TAAAAAAGC?
AGAACCTAAA
ACAACCAGCG
AGCTGAACAA
TAGATC-AGAA
TCAACAGCAA
GAAACAAGAA
GCTCCAAAAC
CCGCCAAAAA
AACGGTATGT
AATGGCTCAT
CTGAAAAACC AGCACAACCA GGTAC1'rCTA CAATACTGAT CGTACTACCT CAACAGCAAT 0@S* 0.0.
0* -GGC=CATGG CGACAGGATG GCTCCAAAAC AATGGTTCAT GGTACTATCT AAACGCTAAT GGTTCAATGG CAACACGATG GCTCCAAAAC AATGGr=CAT GGTACTACCT AAACGCTAAT G=~CAATGG CGACAGGATC GCTCCAATAC AATGGCTCAT GGTACrACCT AAACGCTAAT GGTTCAATCG CGACAGGATG GCTCCAATAC AATGGCTCAT GCTACTACCT AAACCCTALAT GGTG.ATATGG CaACAGGTTG GGTGAAAGAT GGAGATACCT GGTACTATCT TCAAGCATCA 910 G=TG1ATGA AAGCAAGCCA ATGGTTCAAA GTATCAGATA AATCGTACTA 7G2TCAATCGC TCAGGTGCCC TTGCACTCAA CACAACGTA TGGGTAAACT AAACCTAATA TAACTAG?= A TA'rTCCCTAC AAATACCA'rA TCCTTOCAGT TAAAAAATAA CTCTGTAATC TCTAGCCGGA TGATGAGGAA AGAAI'GGCGG CATTCAAGAG ATTAAGCC?? CTCCAATTGC AAGAGGGTTT TGGCTCCACG. CAGTT'rGGCA GCGCCAGATG GATCGCTATG GAGTCAA'rGC A'rACTGACTr CC-GTAAGAA AGATAATATA CCCrGTAGG CAAT1GTGAA
CTCTTTAAAG
AAGTTTACAT
TTATAGCGC
GCTCTTTAAG
CAATCrCTGC 31rCCACGGAG TfhGACTAC GGA7?rT= AGAG-TACGG GTrTrAAAC-r CAGGG-TGCTG CrGCGAAA ATACTGAGGA GCGAGACCC CAATCGTTTC AACTGrTCGT GA TCCT GTTTCAAAGT 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 GGAATTCACC AACTGCGACG TTTTCTCCTT TGAGGrrAAT AGGCGATCTT CATCTTGTCT TCAAAGGTCA AATCAGCTAG AATG.GTTGAT 'rrGG~rGAAG CGTCA~rCACC AACrCTrCG TGGCGATCAA TCZAATCTTG CGTGGCCAGT ATACATTrTGT CAG=rCAGC
TTTTGACAGT
TcTCTd=ATC
ACCCCATGTT
GTGCTAGGCT
CGATATGAAG
TTTCCATTAC
AGCGAGCAGCA
AAGTGGGATA
CTTGAGCCAC
GATATCGACG
GTCr-rCATCG AGGTA.AGGAT TrCCCATCGC AGCTCGC;CrA ATCATGACTG
ATGCCTTCCT
GTTAGAGCr
TCACGGCTAC
T'1-rTCTAC1'G TrCAAGGACAG
ATAGCACCAG
ATATCGGTCT
CTACCAAAAA
GCCATGCAT CGAGGGCA CAAGAGATGG CTCCGCCCAC ACTIGGACCTT GTT--A7,GAT CTTCGTTCTT CACGAT1r'G TGGTGT'rT-C TC;ATGAA- GTTGGATAGA GACAGGGT= TGTTGTA'rTG GATTCCTG CCTTTGCGAT AGTACGAAAG GATTGGGAAT CTCAATATrC TCCTCCACGT CCTT-rITCATC TGGC-rTIG GACAGTACCG ATATCACCGT GCZCAACCTT GTAAAGGCTC TCALAGGTCTG
GAAACACCTG
CCGGTACCCA
CATGTGCAGG GTrTTTCT AACGACTCCA GCTCCGAGCT GAGTrAAATCr 3480 TTGACAGCGC 3540 TC'TGCTGCGC 3600 'rCGCCCTCAT 3660 TCAGAGACCA 3720 GCTGAG~tGG 3780 CCAATCATAA 3840 AAAGTTGTAA 3900 CATTTCCTGT 3960 TCACGCCAGC CATAGGCCCT AAAACGGTAC AAGGTGTATT AACATTTGTC ACGAATCAGT GTAGTTGGC AGAA'TTCACA AGTGAT?1'CT GCCCCGTGGT CTTCCTC7rT AAGTCTGAGC TTGGAAGGCT GGCAAGAGCG TTCATAAAGC TGGAAACGGA TTrCTi'CT'C AGAAAGACGC T=CAGGCTT AGGAGCGCTT CGATATGGTC GTCGCTTTCG AGAAGAGTAG AT'GCGTTT'PT AAGCGAGC AATCTCTTCT rTCTTGCCTC AAACCACCTG CA.ACCTTC.AC CTTGTCTTCC TCGTCCAAAA GAAGGCGTTT G?1'GGCTrrC ACTAAGGTAA AAGCCAAGGT GTTCATGCCT ACAGTCACAT CGTCCCCCTA GATAGCCTTG AGATAGCTGG CATTTC~rGG CTGGC.AAGAC TTGAACTAGG GGACATTrGAG GCCGACCGCT CTTCACCCAT TTCTCCAGAG 4020 4080 4140 4200 4260 4320 Alr.AGGcAG TTATACA~?r CTAAC;GATrr CCACTACC~r TGACCATTC CAACAAAAGG TCCGACTAGG ACTTCACC-AG ACACCAGGAT 1rwr'GAACAA CCrTGACC TTCcCCCrGG GCACCTAGAG AGC'?AGATCC CAACACCTTA ACrIGTAAGI-r GCTGCGACAA TCTGGCrAGC CATAAGAGTT CGACCAAGCG C ri GAI'G TrrCTTCC AGTGCGC.ACG, GTTTCAGTGC AAGGCTrCCCC ?1'TCTGATAT AGTTTAATA ArTTATCCA AAATGCCCAA AGGGGG3AGCC GTGTGT'IAC TGATrICAG GCATGAAAAT AAAAAGAGAA ACAGATI'ATT TTAGCATTTG AGG1'AGAAAA Ir.AAAGCCAT AACAAATGTA TTACGAGAT TGCrCAATTT TCALATACTTT CCrTTCrATT ACAAGAGTCT AG'TCrGCTCAT AACGAGGAAT TCGCA TTTGATGTCA TATCAGCGAC GGTGATAATA TGATTTCC ?rTTCATTG CTACAGTTGA GCTACCTT-GG TATCAAGGAC AAAAGCACCA TrAGCTACrAT TTTAGCATAA GATAA-GGAC CAGGAAATCA TCAGA~rrAT GCTATGCTTA TTGATGGCAGA AACCCGAGTG CAGACGACCG TCAAGGCrTT AATGGAAGAXA ACAGGA~rr CAAAAGCAAC CCTAACCAAA TATGyTCACCC TGCTCAATGA CAAGGCtTTG GATAGTGGCT TAGAGCTGGC TATTCACTCA GAAGATCAAA ATC--GCGTCT GTCTATCCGT GCAGCTACCA AGGGGAGAGA TATTCGGAGC TTGTT-=GG AGAGTGCTGT TAAATACCAG AITTTGG~r ATe ~CCTCTA CCACCAACAG ?T-r-rGCCC ATCAGCTGGC- TCAAGAATrG G1'GATTAGCG AGGCTACGCT TCGTCGTC-AC I-rGGCTGG1-r TAAATCAGAT =MTTCAGAA 'NrTGAT1'rAT CCATCCAAAA TGGCCC7rCC CGAGGTCCAG ACACAGAT TCACTATTTC TATTCrGTC T-rTTCCGAAA GGTCTGGTCG ACTCACGAAT GGGAAGGTCA CA'rGCAGAAA CCAGAGAGAA AACAGGAGAT TGCCAA'rTTA GAGGAAATCT 
GCGGCAAG TTTCTCTGCG GGGC-AGAAAT TGGACTrGGT TCTCTGGGCT CACATCAGTC AACAACGTCT TCGGGTCAAT GCTTGrCAGT TTCAACTCAT AGAAGAGAAA ATrGCGAGGGr AT'IrGACAA TATCTT'rTAT CTTCGrTMC TGAGAAAGGT TCCCTCCTTT TTTGCTGGGC AACATATrrC ACTAGGAGTT G.AGGATGGTG AGATCATGAT ATTCTrCTCT TTTCTCCTAT CTCATCGCAT TCTCCTCT CATACTATGG ACTATATTCT TGG.I'TTCA GGGCAG-rGG CAGATTTACT ACGCAATTG ATTCAAGAAA TGAAA2AAGGA GGAACTAVTG GGGATTATA CAG.AGGACCA TGTCACCTAT GAACrCAGTC AGCTr?TGTGC TCAAGTCTAT CTCTA'rAAGG GCTATA=N ACAGGATCGC TACAAGTACC AGTTAGAGAA TCGTCATCCA TATTrACTGA TGGAACATGA TTrTTAAAGAG ACAGCAGAGG AGATT-TTTCA TGCTCTACCr GCTT'rCAAC AGGGGACAGA TTAGATAAG AAGAITCTCT GGGAATGGCT CCAGTTAATC GAATATATGG CTGAAAACGG 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 9090 912 'rCGCCAGCAT ATGCGCATTG G=GTGA'rr CAcAT=,=GT T-rcTTGTCT TTTCAAGCAT GGCAGCCATT TTGAAACGGT ATTTGGAATA CAATCCr?!' ATTACCATTG AAGCrrATGA CCCTAGTCGG CA~rATCATT TGCTGrT=AC CAATAACCCG ATTC.ATAAGA AGGAACAGAC ACCAGTrCTAT TATrrAAAAA ATGACTTGGA TATGGAGGAr TTGGTAGCGZA =1CCCAG~r 6120 6180 6240 6300 ATITATTCACT TAAAAGCCTT TCACATTrr TTAAAAATTA TAGTAAGTGT AAAGAGGAAA GAAAAATACA TCATGGCCAT AAAAAAGGGG AAAAGGTTAG.
CrTGGCTG ACCACAATC GCTTTCATCG AAAGCGGTG'r CGTGAAACAA CGCT'rGTCTG TGGC-AGTC-AC GCCAGACAGC AAPATTCCATC AAAAGACCG TCGA7=~GC ATCATGTAGA GG.a.CTATCG ATAC'?TCGT'r TACTCAAATG CAGCTCGTAC AT7TTCAAA TCCTTAACAT GrTAA'rCCA GGTrC"r-.r GTCAAATTCA CACAA'rCTC AAAAAAGTTG ATAAACAAGA AACCTI=A m.77=GATAC CACCTCAAGA TCTTTATC-AG TGACCAGGGA ACTACA.AGTT C'rCGAGTCAA AAAGAGIA GAGGACAGTA CATGTCACAA CTCCGTCCCAT CATr"TTCAAC CC-CAGATTTT CCCTCAGGCA CAATGAAAT TGGAACCTG TrCAGTCAGT TATTGCGGGTr a a. *a a a a CAAGCCAAAT CAAATCG;AGG GGATAAGAAA ACAGGACTTC ACCT'rGCT GAGCAACTAA 7"TGATTAT GATGCTTACT AGGTGCTCAA GAGCGAGCAG GGTT'rrGGAA-A TTGACTGAC-3 cATGCTr'rAT AACATTAAAG CAATCGGGAT TACCAACCAA CTATCTACAA TGCTATCGTT AAAGCCAAGG; 'NATGTrGGAA TCTCTGCTAC CAAGG'r'CGT AAAAAGGGGA ATrGCTCTTT GTCGCCTCA CGTrGACTGAC AACTC.AAATG GGATCATGAG
TCCGAAGSCT
ATCTACGGCA AGACAGC-CC ATTCCAmTC GCTGGGGACC AACXAGCAGC CCTCTTTGGA AATACTATG GAACAGGCTC ?TTCATCATC GAAAACAACC TCTGACAAC CATTGGTTAC ATACTTCCAG AAGTTCGTrC TACGGTGGAG AGGTGCCAAT CAGTrGGCTT 'rTGAGCCAGG ATGAATACTG GGGAAGAGAT GGAATCAACG GTAAGGTrA
TAACTCCGAA
CTCAGGTATG
TATGGTTAAG
GCAGTTGTCT
TTATGCCTTG
6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440' 7500 7560 7620 7680 7740 7800 7860 GAAGGTTCTA TCTTCATCGC AGGAAGTGCT ATI'CAGTGGC 'rCGTGACGG TCTTCGCATG GTTGAAAATT CACCAGAATC TGAAAAATAC GCTCGTGATT CTC-ACA.ACAA CGATG.AAGTT TATCTCGTTC CAGCCrAC AGGTCTAGGC GCTCCATACT GGAACCAAAA TGCTCGTGGT TCCGTCTTTG G71rGACTCG TGGAACAAGC AAAGAAGACT TTATCAAGGC GACTTTGCAA TCTATrGC= ATCAAGTGCG TGATATCATC GACACCATGC AAGTGGATAC TCAGACCGCC ATTCAAGTAC TCAMGTGGA TGGTCGGCA GCCATGAACA ACTTCCTCAT GCAGTTCCAG GCGGATATTr TAGGCATGA CATTGCACGT GCTAAAAACC TGGAAACAAC AGCTCTAGGA GCGGCCTTCC TAGCAGGTT GTCAGTAGGG TACTrZAAAG AC~rGGACGA GTTGAAACTC S
TrTGAACGAGA CAGGACr CTTTGAGCCA TCTATGAACG AATCrCGCAA GGAACAACTC TACAAGCT GGAAGAAGGC TGTGAAAGCA ACTCAAGTCT rrCCGGAAGT ACACCACTAA MACTGGCANGA ATAAACCIGAT ?rA-?AZAA ALGTG?-.AAA TACCAA- TICAAAG-IAA.
CZACcTGAATT GTCAATTAAA AAAATGCAGG AACGTACCCT GCACCTC"?TG ATTATCGGTG.
GAGGA.ATCAC AGGAGCTGGT GTAGCCTTGC AGCGCCAGC TAGCGG-TCTT GAGACTGGTT TGATTGAA.AT CAAACTTT GCAGAAGGAA CATCTAGTCG- TTCAACAAAA IrCr=CACG GAGGACrTCG TrACCTCAAA CAA'IrGACG TAGAAG'TGGT C -AGATACG GQrrICTGAAC CTCCACTGGT TCAACAAATC GCTCCACACA TTCCAAAATC AGATCCAATC C7C=rACCAG TIrACCA'rGA AGATGGAGCA ACCTrrAGCC TcIrCCGTCT TAAAGTAGCC ATCGACTT ACGACCTCTT GGCACGTGT'r ACCAACACAC CAGCTGCGAA CAACC?rMG AGCAAGGATC AAGrC~wrCGA ACCCCACCCA AACTTGAAGA AGGAAGCCCTr GAGGAGT GGA=T'ATC TTGACTTCCG TAACAACGAT GCGCCTCTCG TGATTGAAAA CATCAA.ACGTL GCCAACCAAG ACCGTGCCCT CATTICCCA.AC CACGTGAAGG CAGAAGGCTT CCT=TGAC GAAAGTGGCA AGATrACAGG TGTTGTAGC CGTG.ATCTCT TGACAGACCA ACTGT'rnGAA ATCAAGGCCC GTCTG=~AT TAATACAACA GGTCCTTGGA GTrGATAAAGT ACGTAATT-G TCTAATAAC GAACGCAA'rT CTCACAAATG CGCCCAAC'rA AGGGAGTTCA CTTGGTAGI'A GArrCA6ACA AAATCAAGGT TTCACACCCA GTTTACTTCG ACACAGGTTT GGGTCACGGT CGTATGGTCT TTGTTCTCCC ACGrCGAAAAC AAGACTTACT TTGGTACAAC TGATACAGAC TACACAGGTG ATTTCGAGCA TCCAAAAGTA ACTCAAGAAG ATGTAGATTA TCTAC'7TrGC A'N'CTCAACA ACCCCTTCCC AGAATCCAAC ATCACCATTG ATGATATCGA AACCAGCCTGG GCAGGTCTTC GTCCA~wrGAT TGCAGGGAAC AGTGCCTCTG ACTATAATGG TGGAAATAAC GGTACCATCA GTGATCAAAC Cq'rTGACAAC TTrCATTGCGA CTCTCAA'rC TTATCTCTCC AAAGAAAAAA CACCTGAAGA TCTAGTCT CTGTCAGCA AGC7TGAAAG TAGCACATCT GAGAAACATT TCGATCCATC TGCAGT'ITCT CGTGCZTCTA GCTTGGACCG TGATGACAAT GGTCTC~rGA cTCTTGCTCG. TGGTAAAATC ACAG2ACTACC GTAAGATGGC TGA.AGGAGCT ATGG.AGcGcG IrGTTG-ACAT CCTCAAAGCA GAATTTGACC GTACCTTAA ATTCATCAAT 'rCTAAAACTT ACCCTGrrTC AGGTGCACAA TTGAACCCAG CAAATGTGGA TTCAGAAATC GA.AGCCTrTTC CGCAACTTGG AGTATCACCT GGTTGATA GCAAGGAAGC TrACTA=- G GCAAATCITr ACCTCAAA 'rGCACCGAAA GTCTTGCAC TTGCTCACAG CTTGGAACAA GCGCCAGGAC 7920 7980 ao4o 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8,700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 *5
559' St 55 9 5 914 TCAGC"TGCC AGATACrTTG TCCC7TCACT1 ATCCAATGCC CAATAGTr. AC1'C?1AGCC CAG1-rGACTT CCTTCTTCCT CGTACCAATC ACATCCTT~ TATCCGTGAT AGCTTGGATA GTTCGCTGA GCCAATTTTG GA'rGAAATGG GACGA-rCTA 'rGACrC.CACA GAAGAAGAAA AACCAATTA CCGTGCTGAT GTCGAAGCAG CTCTCGCTAA CAACGATTTA GCAGAATTAA AAAATTAAGA AAAAATAAAA GAGGGA=G GCAGCATTCC TTGTCGCCCC TCCcTl-r TTAATGGAGA CAGAAAGATC ATGAATGAAT TA'GGAGA A=~CTAGGG ACTIrAATCC TGATTCTTCT AGCAAATGGT Gv'rGT1TrC.AG GTCTCCTT= TCC-TAAAACC AAGAC-CAATA GCTCAGGTTrG GATTGATr ACTATGGGTT GGGGGATTGC ACT71GCGGTT GCAGTCTTTG TATCTCGCAA GCTCAGTCCA GCTTATTTAA ACCCAGCTG1' GACCATCGa? GTGGCC1-rAA AAGGTGGrr GCCTrGGGCT TCCTG4GGTCA CATGGTT CAGGCAATAT CCTGGCAACC TGATTAGCGA AATCCTTGGA ACGACTTTCA GGCAGGTATC TATCACTAGG TGGGACAACA TCATGCACAG CATCMCCA TTC--TGT'rGT AGGCCCTGTT AGT!TATACT CTTCGAAAA'r 'rACACCTTGC GGCTAGCTTC TCCGrTGC CTTATATCTT ACCCCAGTTC GCAGGGGCCA TrGTTGCAAT TCAAACCTCA CTATGAGCCA GAACAAAA'?G T-AGTACTG GACCAGCCA'I CAAGGATACT GTATCAAACT AC7=TTGTTr TGGTGTTCAC AATCTTrGCT 7rGGrT=.
GGAA"CCTTTG C-AGTGGGAAC 77rGA'rTGTC GCTATCGGC;T GG1-rATG=CT 'GAACCCAGC TCGTGACCTT GGACCTCCTA AT'rCCAAACA AGGGAGACGG AGACTGGTCT TACCL-TTGGA ATCCGCGCAG CC'rGGCAGT GCTTGTATTC TCAC=T1 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 CAAATTCAAA CCACGTCAGC CTAGTTTGCX' CTTTGATTIr
GTCG;CCTTAC
CATTGAGTAT
*e T'rATCTTrGAT AGAGCTTGGG CAAGAGCCCA ATTTCACCAA AAAATCAAGT ATAA'rAAAAC CMCATATC AACCACGAAA ATTCCACGAG GTCAACTACA TCAACAACAA GCCAAAACGC CCAAAAAAGG CCCCAAAAAG CAAGCACCTG GCCGAAATrGG TCAAATCCTG ATTATGTCAA CGAATTAGAC CCAAAAATCG AG1'AGAATTT CACAAcrTCAC AAGCCTTT GGAAACTCCC GACG;CGCAAG CCAAAAACCT GAAGAAATCG AGCAAAGGAC AGCTGAGCT GACGGTAAAA TTTGAACCGC rAACAAAT AGAGTTCCC AAGTATTATG CTTACA6AATT rrAAcTI'AAAA q'A'AAAcccT ccrrAAT CTAG~cAGGG. TTTATATTTT
CGTACTCAAG
TAGAAAACAA
AAATCTCTC
GTCAGAAAGC
CAAGCAACGT
TTGATATC
CAGAA.A'CC
AACAACAGCT
ACTTGAGCAA
AGAAATTCAC
ATTCAGTGAT
ATGGATATGG
CACCAATTAT
cTAGGTTG.Tr XCGGTTTTT-A CATACCCAGT AAACTTCCAT TTT~C7*rAG CAACATGGAT GCT7TTGTGAA 'rCCAAGTAAG ACTGATAAGC ATAGTTTGAC ?rTCTATAGT A'rAACTACTT GTTATGTAGT TTGTATACCA AAATATGCTC TGCACCCCAT GGACCCCCCA ATAAAGCACC TATCCTACCA ATC-ATATAAC 1'GATTCCAGC 11460 -ACCAGTCATG AAGT1'AGCGA ATGTGTTAGC TTGTTTATrTC CCATCTATTG TGr-,GACGTA 11520 ATTCCAAACA TAGGCATCCT ATGATCTAAA AGATATA71- AGCTCC-A7T CAT rC-r'r-rG 11-580 ATAAGCCATA TAAAATGCCC C:ATTGATATA GACCCCTCA GCACGTCGTT CAATAGTGTC 11640 TACACTTCCA TCTGGATTGA CAACCTCAAG AAC?1'CATCG CTrAAAATAT TTAClrGCGT 11700 A'CTCCCAAC CGCACrGATG AGCCATTCTC AAACTGAGCC TCACCAGATA CAACTrTAGA 11760 GT TrGcCGAT AAGCTATCAT CAGCAAAAAC AAACAACCA CGGGGAAATG CrAGACATAC 11820 AGAAJAACAGA CATAACTAGC AAACACATGC ATTTrAAACAT CTTAGACATA ACGGAAACTC 11880 CT ATTr TTGA~T'-r TCAACTTTrA TrATACAATA AAACCAAATA AAAAGAAAGC 11940 GGTAACAATA TGCTTAATGC GAAAATTrTT TATATA~rTT TATG=rTCAT CG'rTATCGAA 12000 ACTACAGGCT TGTTTGTT GAAAAGAGGT CTCGAAATGG GTTATTAGA CACAGAAGCT 12060 ATTATCCTCG CAGTTTTTTC ATT-CCTTTr TACAACCTAT G'rTAPTCGC TTGGGTC-GC 12120 0TTACAATAA AAAACAATAA AAAATAAATA GACGTA'11"r CAAAAAAAAC maAATGCATA 12180 1-ATA1-rAG CAAACGACG ATT!AATCG TCrrTTTr GTACTACCAC GGGCATGTCG 12240 *TATATCTGAG GTGTAAGrCC TCACCTGAC TATCGTGAGG rACCAGGGAG AGGAACGGAT 12300 **0AGCGAAATCG TGGCTCTACG AACAGGAACG TGATAGTAAG GCGTATATAG CGGATAAGGA 12360 **.GGCTTCAAAC TCTAAAGTCC AAXAAAGGTAG TCGTAACCTA TATG-GTAA.A 'rCACGAGAG- 12420 AATTGAATTC GGACTAAGGT TTGTGTGAAA AAGATAAATC TTTC-AGAGT CTAAAGACTC 12480 TGCGTCAGAT TTCCTATTTT CACTGTAACC TTTTAACGTC CTCATATCTT GTATAAACGA 12540 GGAAAGATGT ACGACTTATC CCGTGAGGTT TCATGAGCGT GAAAGCG'rAG TAACAACGAA 12600 TCATGAGAAG TCAGCCGAGC CCA'rAGTAGT GAGGAAACTr CCGTAATGGA AGTGGAGCGA 12660 ****AGGGGC 12665 INFORMATION FOR SEQ ID NO: 135: SEQUENCE CHARACTERISTICS: LENGTH: 5305 base pairs TYPE: nucleic acid 00000(C) STRAI4DEDNESS: double TOPOLOGY: linear (xi) SEQUNCE DESCRIPTION: SEQ ID N4O: 135: CGCTAATCAC TACAATCATT TTATTGTACT ?T-rCACTCT CAAGAAAAGC AAGAAGTATT 916 CArr1-rAGTT TCATTTAGTA 
TTAT'rr1'GCA TACCI'AAAAT ACAGTAAAAA ATCACTCATC -TTGCTATGCT CCrCTTTCA CTATTCAACA CC?1'TrTGAC ?TATACTAGG CTCA'rTTCCA AAACCATTAT ATAATAGr4GA TATGAAACCA ACTAAACTAA ACAAC%;AATA TAAGCAATAA AAA'TCal-N AAAAGATC-= ACTAAAGCTA ATACTAAATA AAAATAAAAG AGTAAACTAG GAACG'rTATT TCAAACAACC TAAAATACTG ATNrrCGGCr GAAGATAATA CTGGAGTGCA AATTAATGGG GTTATAATAA A'rAGCTGATA GCTTGT Gr~wrrGGATTr TTAAGAcGr AGATGAGTAT 'rAAAACTATA AGGAGGACGA AG=~GC1TAA AAA"-wAAAA TTAAAATTAG CTCGG=AGA GCGTGATTTA AC-ACAAGGTC AACTGGCAGA GGCTGTCGGG GTGACACGCC AGACTAT'MG TrrAATACGAG GCGGaAAAAT ACAATCCCAG TCrCTCGCTC TGCCACTCTA T~rrGCAGATG TTrTAGGGAAA ACCCTAGACC AACTATrTTG GGAGGAAGAA GATGAAAAAT AGAI=~ATTr ATTCTCAAT'r ACTAGACGA.A AGAGAAGAAC AAC-GTTCAA TAAAGCGGGC TCTGAAAGTT TCTATATCTG CATTGCTI TCGCTCCTAT CTTATATCAT TTCAGTATTA .GCACCAAGCC TrI-rAA'rTC TAATATGCTG CTAATCGTTA TCATCATAGG GACATTTAC TTTCAATC GTGCCCGTTA TCTGGGAGTG ACCTACTATG GTCGTTrTCA TTTTACGATT ***TTGGGTTG?1' =rTCCTAAC C~rrGGCTATT ACGGCTC=T TCATGTTGCA GAATTATCAA *..TTCAACATAG AAATTTATCA GCACAATCCT ?TGAATTTTA AALTACCTGTC TGCTTGGGTC ***ATTACT'rATA TCATTTACCr TCCGTGGATC T7TATTrGGCA ATCIrGGCT TAACAGCTAT GGCGAATGGG CTCAGAAAAA A1-rTGAACAA GATATGGATG AATTGGAGAC TGGAG.AATAG CTT~ACTC TT-TTCTCAAT CC-AGCTAAAA TGTrGATATAA TAGTACTAAT TTATrGGAAT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 ACATGAAAGT TCITGAAAAT 7wrTCATGGGT TTCTAGCTAA TCCAGATGAT AGTTGACAT TGCACACGGA %=TGTACCAG CT'GACCAA GGGAT'rCACA ATAAGAAGGC GGTCT?3'GAG 7T.TTAAGAAC GGCTATGCGG T?'TTGCAGG TT'rAGAAAGA CTTGCGTT TCAGA'rAGTG ATATAGCCTA TTTGGAGTCG CTTGGATTAC CTTCGCAAI-r TCAAGTTGGA G7TGACCGTT T7rrGGTrI-rr GCTAA'rGAAC CGATTGTGCA GGTGGAAGGA GGTCGAAACG GCTCTNrTGA ACATCGTCAA CTACCAGACT TCGTATT'CGT TCG=TATCG AAGATGAACC CTTGATGGAC GGAAGTAGGA AAAGTATGTA ATCAACATGA TGCAG"IMA GTG'TATTTCC GCC.AACAGCC A'rTGTGAACT ATCTTIGAAGA CTTGGTrTATC ATGGGGCGTT CGTTCTGCCC AAGAAGGGGA CCnTCTAGCCC AATGTCAGr ?TrGGTGGCGA CGAAG-GCAGC T=rGGGAcAc GTCfGGGCTcA 
ATTGGTGGCG CCAATCXJIAC CGCAGCrGTG ACGAAATGGAT GCGGCCATCT GGGGAACACG CACCAACGTG CGTGCGGGTA AGCTCTTTGA CATTrCCTG?= TTGGGAACCC ATGCCCATCC 917 C7rGGACAG GTTTATGCC-A ATGACTA'rCA AGCTTTCAAG CTI-ACCTr CGACCCACAA AAATTrGTCTC TrTCTTGTCG ATACCrATGA CACCCTTrCGC ATCCGTGAC CAGCTGCCAT 'rCAGGTG-CCG CGTGAGCTGG GTGATCAGAT TAACIATG GGTGTGCGGA 'N'CACTCTGG GGATAT"TCCC TACATr=CA AGAAAGTCCC TCAGCAACTG G=~AAGATT TATCCITCTA ATGATCTAGA TGAA.AATACC r.ATCAGGC~r.G ATTTACAGA ATCC?-'AACC TCAAGATGCA AAAGGCCAAC A7TCATCTCT CCGTCCG GCTCTTGCC GCGGrrACA AGATTGTTGC TACGATTAAGCTcGTCTAATA ATGCTGAA.A GCCCATTACC AG-CGrCAAA AAGCCAACTC GA'ATTAGC GACATGACAC AAATCAAGIAT GACCCTTCGT AATTTGATG- CCGTTCCTCr AGTTTACAAC 7TGCCTAGTT TGACTGACAT TACCAAAGCG ATTACAGCCT ATGACCACC AATCGAAGAT GAAALCTGGTC AGATGCCGCAA
ACTTTCTACG
AGAAGCCGAC
GTTCCATCCG
CCN'CCTCCAT
TCAGGAT1TAT
CCAGCTAAGA
TATATCACTT
ACCTATACAT
ATCTCAAAG
GCCCCTAAGG
ACCAGGTGTG
ATGATGGTGT
ACATCAAGA.A
AAGGAATAT
AATTTGACAA
ATTIGGCGCG
cCC~C7GA
CAAACCAGTG
G7rGTGGGAT GAGTATAAGC GTCTGCTCAA TCCCCAGCAC 'rATCCAGTGG TGATGTATGG CAAGATAAGA TGGACTTGCAT TGATAAGATC CGCAAGGAAG AGGAGAAGA.A CAATGAGTTT GCAAGAAACG ATTATCCAAG AGCTGGGT 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 ATTGATGCCC AGGAAGAAAT CCGTCGOrTrCT A'rTGA'TTCT TAAAAAGATA 'rCTGAAAAAA CATCCCTTCC TAAAAACCTT TGTACTAGGG GGACGTTTGG CCCAATTAGC TATGCAAGAA AAATTTATCG CTCTCCCCCT GCCATACGGA GCCCTAGCCT TCATCCACCC AGATGTCAC GCCATGACAG CrGCAGTGA ACCACACCT ATCAAGGCAC GTTCCCGTAT GATTGCTCAG GTCA.TTGGAA CAGACCACGC CGCGG-AAAAT CCCCGTGCGG ATATTCTCCC TCTTACCGC CAGAAACTTG GCGCAGAGCC AGCCCTTTAT GATAAACCAG CCrAGCTGA CGAAGTCGCA A7'rTCIrCGG GACAAGACTC AACCTTGGCA CTGCCAGCTG AAACGGCAGA CCATAGCTAC GTCCAAGCTG ATGA.AGCAGA TGCTCAAAAA
TTGGT'TGTGA
AGTCCTGTTT
TATGCCCTTG
ATCACAGGTT
CTCAA'rAAAC
GAAAAAAMCC
CTTGGAGTCA
ATATCAAGGA ATCAGCTGAT CAGACTTCAA CAAGGGGAAT CTGGTTCCCA TAGCCGAGCG TC-rTACCAA GTTTGGTGAC GCCAAGGAAA ACAGCTCTTG CAACCGCA CCTAGAAGAA CCTACGCAGA GATTGACCAC TACCTAGAAC GCAAAACAAT CAGCCCAGAA GCrCAAGCGA CCATT43AAAA CTGGTGGCAC AAAGGCCAAC ACAAACGCCA CTTACCCATC ACCGTATTTG ATCACTTTTG GGAGTAAAAA GGTCCCGCrGG ACCTTrMAG CTTCTTGCCC TGAAATrAAA AAGCAAGAAA AACCTCCACT 918 GGAGGTrrC AGCCTCTCAT CT1'GAAATAA GAAAGTGAGA GAAGCGTCTGG GGGATCrGA ACCCCGAGTT TAGAAATAAG AAAATGAGGC AGATTCAGTA ACTCr-AGAG TrCGATTrCA TCGTCTTACC CCTGCAACGA TGACTAGGTT 'rGAAAAAGCT TrCTAGAGcG CATP'rCAAAC CAGGCAGCAA CTGCGTCAAG AAATTAGAAG ACAAACTCGT TTTCTAGCTG TTACTGAGTr GAGCCwrr"r ACTACGAGTA TAGAAATAAG GAAGTGAGGT AGCATCATGA AATCTATCGG TACGCAAATA TT-ACAGACAG AACGTT1'GAT TTTAAGAAGA 7IrGTGG.AGA CTGATGCAG.A AGCr-ATG=r CAAAATTGGG CTrATCCGC TCAGAATCTG ACC-ATGTTA CCTGGGATCC CCATCCTCAT GTrCrGAATCA CTCGAAACTC GATII'GCAAT TGGGTrGCTT CCTATACTAA TCTCAACTAT TATAAATGG TATXCAGCATT GTTAAGATAG CAAGGCTTAC TI'GGGAAATG CCATI??GCT AAAAGAAAAC CCAGAGCAAG TAATAGGAGA ACGAGGCTGA rTAAGC'TCT GAAAT'rGGCT ATIMVTTAGG GTATGATGAC AGAGACTTrG AAAGCTATCT TGGACITrrTG 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 TTTTACTCAA GCAGGTTTC AAAAGGTCAG AGCACCT'TAT GCCAG-,CTrCA ACCCAGCTTC AGGTCGTGTC ATGGAAAAGG CTGGAA'rCC CTATCTACAA ACCAT-r'GTA ATCGTG;TAGA GAGA'kAAAGGC TATCTTGCGG ATCTTATTTA TTATGGTATA AG;TAGr.GAAG AATGTGAAT TCTA1"rTCT GTTTCTATCG AAGTCAACTA 1'TTATI'GTAA ATATAATAAT TAGCAPCCA AGTTTA?1'TG AAACTTTAAA ATAGCATATT GATTAGTACA AGACAGA-TGT 'rCTAG~rCCT TCrTTAATCT GG TAGTGr TAGTTAAAAA TAATCGACTA GATTCTCCAG CTGTTGTTAA ATTTAAAATA 'rATTCATGTT T'rTTCTATCA CTATTCCCAA ATCATTCATA ArGAGCCACT TTCrTICCTCC TTTCCTCGCT AGATTTCCTC TC-ATCTCGAC TG'TTCTrTAA CCGAACAGG7 GAATCC'rCTA
AATTATCTAA
CCTCCTCAA
ATCCCTTTAA G GTAACT AAGAGGGAGC GGTAATGTAC I-rTATAGT GTAATCC-AG TCGAGTTAGG CAA'rTAAATT CAACCAATTT TATTAAAATA GTCTCAT-TCT CTACATGTAA CTTAC-AAAAC 'rCATGACGTC AG 1'TrACTT TCTGCTGTTC AAAAGCCAG ACrCCI'CCCT TGGTGCGTCA TGCATCATTA ACGACCTT TCT=~AGGT GATGAGAAAA 4800 CCCTGACCTC 4860 CAGCTATCGTT 4920 CACGATTTTT 4980 GGTTCATAAG 5040 AACAATTCGG 5100 ACACATTTTC 5160 AAGTCACCTC 5220 ATCATACGAA 5280 5305 *CAACAGGAAG ATTrCAGGTTG ACTTTTCTAA 'rCCrAGAATA AAGrC.CTCAA AATAGGc~ATA GAGACTACAC AATTTGAGGA GCTGC7rGC6 TCCTGTTCCGA CCACCACGTG AAGAAAAAGA TGGCGGAAGC GTTTGATTGT 'rAAACTTTGG CAGCTAGATG 19fGAGAAAA AGATAGAAT TGTAGGCGAT ACAGCTCATC -CXrG?1l rT GATTAAGGTT GAACT
INFORMATION FOR SEQ ID NO: 136: SEQUENCE CHARACTERISTICS: LENGTH: 3964 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 136:
TGGCAGCTCG
CAAAAArCAG
GCAAATGGAT
TCGTCGTAAA
TAGGAACTCG
TCGGAAGCGA
CTAAAACATT GTTAGAAATC TTTTACTATA AAG1GAAGT TCAGCTATTA CAATTCCAGG ACTAAACTGA CTGCTGAAAA
GOACCCAAAG
ACTCTACTGA
'GTTACAGTA
GATr'PCACTC
AGGTGGAGAT
AAATCAATTG
ATCAACTGTT
'r-r-rGGCTGC ?TrMA=r. TGTAAAAAAG TTCAGTAGAT GATTGAAACT AGAATAGTAC ACCTCTC-TT TCCTGATCCA T TGTCCTGT TATTATrTTA GCTACAGCAA CAATCGTCTT TAAAGATGGT GTAGCACAAG ATCCAAAAGC ACAAGATACC AAAGCACCTG CTCAAAGAGT AGATGTAAAA GTTAAGGTTG CT'AT-TTTACA AGCAAA'rGGT GCTGCAGATO GTACACCAAC AATCACATTC AAAGATACAG TTCAACA.ATC TGCGAAAGGT TATAAGCTAG ANAATACACC AGGTGGAGAT ATAATCCAAA CCAATTCI'AT GA'rATAACTC ATI'TAACAGA TGAAGAAAAA TCAGCATTAG ACGGAGCGAC AATCAATGTA CCAGATGGTTr CAGTAGTGAC GA~wrCIAGGA GAATCTGTAA CTCAAGAAGC TACACCAGAG AAGGGAGGCA ATACTGGAAG CTCAGATGCT GGTGGATCAG CTCACACAGG TTCACAAAAC GCTACTGAAA AAGAATCAGC TAAAAATGCC GAAATCAAAG GCGCACCGCT TTCTGATAAA GCAGAAAAAC AAGCAGCTCT CAAAGAGATT AATGCGAATG A.AGGCGGTGG TCAGCTCAA'r CACAAGC'rTC ATTGAAAAAG CAGCCAAGGA GAAAAAGCAG AACTT-TAC CAAAATGCGA AAACTATGCA GAACCAGAAA CGATTGGAGT GCAAGCCATT GCCATG4GTTA CAGTTCCTAA GCTCCTAATG CTGCTCCTAA GACAACAAGT GCACCGCAAG CAACTGCAGC GATGTACCT ACCAGTCACC TGCTGGCAAA CAATTACCTA ACACAGGTITC GCAGCACTTG CTAGTCTTGG TCTAGTGGTG GCAACAAGTC GTTwTGCTT'r AAGACTAGAC GTAGAAAATA GAACAGCTAG AAAATTCTAT TCTCTACTTA ATAAGGGGCA ?TT'1'GACGAAG TCA'rCAATCC TAGTCATGGG TGAGAAAAGT GATAATCACA 'ACTTTAGCT GAATIAGGAAT ATTCrATCAA TGTACCCAA1 TCTAACTGTG GAATAGGAGA TGGGCAATAT CGGATAGAAA AGATACCAGA
TAGCCAGGCG
TAAGCAATTA
CAAGCAGGA'r
AAGAGTGGAA
AGATGTGAAG
GAGACCAGTG
AACAATGCAA
AGCATCAAGT
GCTAGGAAGA
AAGTTAGATI'
GAGAACCCAA
CTCTTCTGTC
ATAGCTCTCT
120 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 920 ATTGAAGAGA GGAGGGGAAA CCCAAAAAT AGCTGCCCC CCTC?!rr G.GTATAATAG AAGATAGAAA ACGAGI-rAG AAGAGATGAT TTTGATACA CATACACACT TGAATGTAGA AGAA7TTTGCA GGTCGTGAGC CAGAAG~AAT TGCCTTGGCT GCGAATGG CTGTGACACA GATGAATATr G7-rGGTrG ATAAACCCAC GATTGAGCAT CC~rrCAC? TGGTAGATGA GTATGAGCAG CTCTATGCGA CTATTGGTTG GCATCCTACA CAACCTGGTA CTTATACAGA GGAAGTTGAG GCT'rACTTG TGGATAAGTTr AAAACATTCC AAGGGG C=IAGGTGA AATTGGCTTA GATTACCA'rr GCATGACAGC GCCCAAAGAG CTGCAGGAGC AGG-TT=CG CCGTCACArr CACrATCTA AGGACT'1GGA TTTrTCCT'rr CGAACAT ACCrATGAGA TTATCAAGAG TGAGGGCGTT GCATTCATTT TCACGGACGC TTGAGTGGGC AGAGAAGr'r TTCCTTCrCA GGAGTGGTGA CrTTTAAGAA GCCAACTGAC GrrACCTTTG GACAAGATGT TGGTGGAAAC AGATGCGCCT GCGTGGTCGT CANAAATAAAA CAGCCTATAC TCGCTATGG GCGTGGTA'rG ACGACAGAAG AGCTGGCCGT AGCAACGACT TGGACTGGAC AGCAAGTAAT GAAAGAGAAA ATqrITCAAG GATGATACGG 'rCAATCTCAA ACGTTAT'rrC GATGTGGAGA
GTTCTCCATA
GCT CCI'CGTG
GTGGAI'CTTG
CTCCAAGAAG
TACTTAGCAC
GTCGACFrA
GCAAATGCAG
?TATCCTCGT
CCTATGAGAC
ACCA.ACGTCA
CCCGTGATC
GTGG;TATrCAT
GI'ATGACCAT
CAGCTAAAGA
CTGTACCCAA
TCGCTGACTT
AACCAAT=r
TGAAGGGCCT
TCGAC~rCrT
TGGAGTCATT
1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 CACCCrCC GCCATCALATG CTCAGGATAT GTCITrTACAG ACCCACA=r CCAACAGTC A~CA3TGCCTT GGGCGTTCTC TGGCAAT'rGA GTGACAGAAC AATTTGAACA CTTGGTTTTC TAGCAGGGC CGAATrCGGCT ATTCCAACGG 'rTGGCAGAAC TCGAACAAGC TTTrAAGTT TCACAGTATT GTCAAAAAA'r TAAAGAGATT AGAGCGCATT1
TAATGGGGAA
TCTCA.AGCGA
GCATGCCAGC
TGAGAGTCAG
AGACAGCCGT
CAAGCAACTC
TATGAAATCT
CGGATTCGCC GCATGATCAT GATCGTCATT GATGAAGCTG TTCCCAAGTC CAAGACCAAG TATGAAGACC TGAAAACGGC TCTAGCTCAA TTTGACATT-A GTCGTAGCGA TTTGATTCGC AAGCGTAGAG AATATCTCGG AGAGACTCTC CTCAAACGCC TAGAGIG TGGGGTTACT TAIrGAGTAGG AAAr-ATGTAG CCGTACAAT S S TrrCGAAGCA GGTAGAACAG GAGGCGTCTG ATG=rAArrG CGGATAGAAA AAGGAArrAG TCGTCCACAT TTTGTGGAG 5* *5 0 5 0 0 ATGAGCAAGA ACTGACAGTT CGTCAACTGT CGCGAATTCA AAGTGGACCT 'rCGCAACCGA GTTTGCCCA.A GTTAGACTAT AVIGCrCGCC GCCTAGGAGT TCCAGT7TAT AGCC7*rA'GC CGGA"r7rTC AGCTC'rrCCT TCTGCTTATT TAGAATTGAA A'rACCAGATT TrACCTCAAC CAATCTATGG TAAAGAAGAG rGTACCATA AGAAGGAAGC GTG'G-.rGGAA GAGATTTATA 921 AAAcATAcTT TGATAA'rCTT CCTAAAGAAG AACAATTAGC ATCTGSAAGTA TrGCAGGccc 3300 cPrGCATAC 1-rCTAGAACT ACAAGGCCrG AATATGCAGA GTTAATACTT GAGGAACATA 3360 TGCCTCAGAT TATAGAAAAA GAAGCT TA'T CAATAAA'rGA 'rATGTrGTrG ATTCGPrcr 3420 TTTTTTATCA AATGCTCAT'r AGAAAAGATC 'rTGCCAAATT TATAAATCAA ATCCAAAAGC 3480 TrAATGCTCTT TCTTTrGGAA CAGAAGAAGG 'rAACTCAAAT AGAGAATTAC TTTATAATTA 3540 GAGATACTCT TAITCAGGA A'IGTGTTGTC TTrGAAAAGGT AGGAGTAACT GATTGTTTrA 3600 ATGA'rTATCr ATCGTGTTTA CAAGAAATTA TGGATAAAAC TCAAGATTAT CAAAAGAAAC 3660 CTCTTGTATT TATG"rrG TGGAAGCAAG CATTAAGAGA AGAAAGAGA'r TTrAGTrTAG 3720 CrGAATCATT TTATCAGTCT TCTAAAACAT TTGCGCAGCT AAN'GGAGA'r CAATTTCAG 3780 TAAAGAAATT GACAGAGGAA TGGCAAGAGG ATGTCAAAAA ATAT'ITATAA ACATAGTGAA 3840 TCAGTGACAA AGATGTCCTT GTCCTCGTAT CAAAACAGTT CTAAAGTTCG rCTTTAGGGA 3900 TGTIPTTA GATATAAGCT AAAAATGACA CGAAATGGrT AGA'r1I'AAG GACATTGATG 3960 TCCG 3964 INFORMATIION FOR SEQ ID NO: 137: i) SEQUENCE CHARACTERISTICS: LENGTH: 12666 base pairs TYPE: nucleic acid 4(C) STRANDEDNFSS double D* TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137: TGAGACCGTT ATTTGTATTA GGGAAATGGG TATCTATrT 'rAATGCTGTG GGGATTTTGA TTGTTTCTAT TATTCAAACC AAAAGCTTGT CAGGTATTrGG AGCAGGATTG TTTAATCTAT 120 ATAACATTTC A'ICTrATATA GGTGATTTAG TTAGTTrCAC TCGATTGATG GCATTAGGAT 180 *TATCTGGAGC AAC'rATAGCA TCAGCTTTCA ATTTAATTGT TGGI'TGTT-r CCGGGAATAT 240 TGGCTAAACT GACAATTGGA TTAGTATTAT TCATTCTTTT 
ACATGCGATC AATATTITTTC 300 TATCGTrACT ATCAGGATAT GTTCATGGAG CACGTCTGAT ATrGTTGAA TTT=GGTA 360 AGTTTTATGA GGGTGGAGGA. AAACCATT~TC AACCTTTGAA GGCTTCTGAG AALATATATTA 420 *AGGTTATTAC AAAGAATTAA TGGAGGATAT ATATAATGGA ACATT-TAGrCA ACTTA'TTTT 480 CAACCTATGG AGGAGCTTTC TTCGCTGCAT TGGGAATTGT ATTGGCGGTr CGATTAAGCG 540 GTATGGGGTC! TGCTTATGGA. GTTGGTAAGG CTGGGCAATC TGCCGCAGCT TTACTrGAAAG 600 *0 922 AACAGCCTCA AAACTSTCCC TCAZCNGrA TATTGCAATT ATTGCCCGGA ACACAAGGAT TATATGG= T G=rM'rGGA ArTMAATTT GGTrAATT AACTCCAGAA CTCCTTAG AAAAAGCCGT 'rGCTTA71--rC 'rrGTAGCTC 'r-rCCAAx-rGC TATTGTAGGA TACrrNCAG CTAAGCATCA AGGAAATGTA GCAGTAiGCGG CAATGCAAAT CTTGGCTAAA AGACCAAAAG AA'1TCATGAA GGGAGCAATI' TTAGCTGCCA TGGTAGAAAC eChSCAATT C~rGCT-tG TCCGTATCAT'r CATGACC CTTCCTGTAT AAGAAATAAA TTrGCAATTC AAAGGAGGTG TCTAAATGAG2 CATrrAG.AA AACTTACGAG AG1'CTGTTA'r TGAACAAC' CATC-NAAAAG GGCCGTATCAA ATTATTGGAT TCCAAAAAGA AGA'!rGATrGA Tr.Axr-rGAA ATGCAAAAGT CGCTCATrTAT AAALGAAAAAA GAAGCTGAAC A'rCAACGAAA GTAAAACAA TTGCAACAGA AATATCAAAT AA=rT "CAA CAATTAAAAA ATAAGGAACG CCAA-CAACG TTAGTATCAA AACAGAAAAT ATTAAAACAA CTITT-rrCAAT CI'GCITACT AG-AATrGGAA TClwrrGA.GTG C.AGATAAAGA A.ATGGAGTTC ATCTATCGAA ?TCTGGAACG ATATrCACAA CAAGAGTCA TAC'rAACCTT TGGGGAACGG ACTTTAGCTA AATTCAA=r GrAACAATA GAGAAATTGA AATTCTCTTT TCCAAArrAT TTATTTACrTG AACAACCTAT C'TCAAATGAA TCAGGCTTAC TTA7TTCA.AT AGGTAAAATT GATGATAACT ATrrTATAA AACATTAATT GGATCCATTT CTAAGGA.ACA AAGCA.AGT ATCGCAAA'rC AAATTTTAT CAATTAAGGA TGAAATTGCT TAATCCTTCT TAGAAATTTG GAGTATTCCA ATAAAATTAG AAAGCTAT T TATGGATACr AATC lrTT CAAAAAMAAA TACGACGA?'r TCCGTAAAAG AAAACGATT? TA'rTACAGAA GAAAAATT'rC AAAAAATTAr ACAATCCAAA GATACGGAGA CAT'rGGCATT TATCTTAGAA TCAACTCCCT ATCATTTATC GATTGACATC TTAGAAGATC CTAGTCAGAC AGAGATTCG CTAATGACAA AATTAGTCAA TGATTATAGA TGGGCCTATC CTGAAAGTCC GTCTGATATA ATTwGTGACT-r TATI GCTTT ACCATATG? 
TATCATAATA TCAAAGTr ATTAAAATCr AAG-GCCGCALA TTAAGAAAGA TTTTTCTAAA TATTAATT'C CAATAGGGAT TT=~rGATATA GAAkAGTTTAA AACATrAGT TTCTTCCI'TA CATTCAGATA CACTTCCTGA TTTTATGGTT CGTrGAAGTAG AATCAATTTG GAATGACTAT GAAAC7TTA ATAA'rAITCO TGTACTTGAT GTCGGAGCTG ATCrAGCATA TrrAAACAT CTGAAACTTT TATCTAATGA CTTAGATGAG CGTACTGTCTC AGGTTATTCT CGAAATGATT GACTTTTATA ATATTATTAC TGTAAAACGT GTTTATCTC AAAATAAGAG TCATGGGGAT ATTTTACAAT TACTTTCAGA TCAAGGAAGT AT'N'CrGCrA AAGAATTTAT ATACATTGTA GAAAATCAAG AAATATTTGT GTG=TCAAT AAAATAAATC CAAGCTAGA I-TCAATCrTr TCAACTTATG AATrG-AAGAT GCAGGACGCA 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 ACAATTTCAT CN'CTGAGTT AGAATrrA TGTGAI-rTAC TATTGTATAA AACI'TAGAT CAAGGAAGGT ACAATGTAGA GGGGCCGTTA GCTCICA GATATTTATT GGCA'rTCAG rTAAGTAA AGAATCTC:AG AATGATCATA TCAATAAAAG AAAGGATACG CCCACATTAT TCCTAGCCGT GATATA~r- 'rACCATTrAG CCAAGAACAA GAAGCTATAA ATrACACTAAG TTATATCACT GAAGACATTG CTTCAATGAT AG'rTGTGCCT GCTATTATTT TATTACCGAC ACGTATAGAC GATAArCTAG AGAAAGCAGT 'rCAGCT~r AAAATACAA'r TCCCI--rCAA GGAAGCTAAT AAGTATAAAA TTGGCATAAT
CATGATTGGG
AAAATTAGCT
ATTAGATACA
TCATAAACAA
AGGACACAAT
TTTGATATAT TTCCTGCCTA CA.A'rCrCATT ATGGTGrCAT AT'TCGCCATT ATG.ATCCCA C.C.T=AAATT TAGGATTAAA ATTTTATAA'r AATCTACAAA GGAGAATAAC TTTGACrCAA C.AGGTATGCA GGAGGCTAAT ATTGCTGTA ATATTATTC? ATAA~rTTTG GACTAGTA.A
GGGAAGATTA
ATTCAAGATA
TAAAAGTATC GGGACCTCTA GrrATTGCAT TT'rGCCGTGT ACGTAAGCTA GGGTTAATCG CG,-AAATTAT TGAAATGAGA AGAaA'rCAGC CATCTATCCA AGTCTATGAA GAAACATCTC GTCTGTCC GGGAGAACCT GTTGTTACAA CTGGAC-AACC TTTGATGGCA TACAACCCCC GTTCGTGGG TAGAAGTTCC ATAGCAATTG GTCAAAAAGT GTAGTTAArC ATAAAATTAT TCTGGCGATT TTACAATTCA TATAAAGGAA CGCTrATGCA TTAATrCCAG AAGAACCATT 'rCTCGccC.r GAATTAGCGC CAGGATTGAT TTCTCAAATG ATTAGATCCA TTTAAATrCC CTACTCATALA TGATTTTCTA AAGTTTGGAT AGAGATATTA AGTGGCATT'r TGAT'rCCACT GAGTACGGGT CATAT'rCTTG GA.ACTG-CAA GGAAACCGAG GGTTCCTTAT GGAGTATCTG GAGAAGI'CGT TTCTATTGCA TGAAGTTGTA TATGAAATAA AAAAATTGGA CGGTAGTTTC AAAATCGCCT GTCCGCAAGG CGCGTrCCTGT TTCTAAACG-T AATCACAGGT CAACGAGTTA TTGATGCATr CT'rTCCACTA AGTrCCTGGA CCG~rGGAG CAGGAAAGAC AGTTrGTACAzk 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140
ACCAAMAGGG
CACCAAGTAG
GGAAATGAAA
CAATCAATTA
CGtrGAGGCTT
TCTGTCGCCA
G4GACCTCTAG
GAGCTGCAGC
CTAAA7'lC CAATGTTGAT TGACGGATGT ACTGAATGAG
ATTGMA=~
TTCCTGAGT
TGCAACGGAC AGTrCrGATT CCTAATACTT CAATTTrATAC ACGAATTACC ATGGCTGAGT TTATGGCTGA TTCAACTTCA CGTTGGGCAG A.ACAAATGCC TCGTGATGAG GGTTATCCTG ATCGTCCTTG TGGAGAACGT TGATTGACCC TAATACCGGA CAAATATGCC TGI'GCTGCT ATTTTCGTC.A TATGGGCTAC AACCCTACG TGAAATGTCA CTTATCTC4G AAGTCGTATC GCTGAATATT ATGAAAGAGC AGGACGTCT CAGGTTCTAG GCC=CAGA ACGTGAAGGA 924 ACGA'rrACTC CrATrTGAGC TCTATCGCCrA CCTGGTGGAG ATATTTCAGA ACCAGTTACT CAAACACTT TACGGATTGT GAAAGTTTTT TGGGGCTTG ATGCCCG= GGCACAGCGA CGTrCAT7=r CTGCAATTAA CTGCCTTACA TC=ATTrCAC TATATAAAGA CAGTGGC ACTTATATAG ATGCG7AAAGA GAAGACAGAT TCGAATAGTA AAATAACTCG TGCGATG.AAC TACTTACAAC GGCAAT=rAG i--rAGAGGAA ATTGTTCGTC TTrGIGGAAT TGATTCTCTG TCTGATAATG AACGACTAAC GATG.GAAATT GCTAAACAAA TTCGAGAAGA TTATrTGCAA CAGAACGCTT =rGATTCGGT AGATACA'PTC ACTTCGrrTG CAAAACAACA AGCAATGCTA AGTAAT'rTC TCAC71".-rGC TGATCAGGCA AATCATGCTT TAGAGTTGGG 'TTTIACrTT ACAGAGATTA 7rGGAAGGTAC CGTGGCAGTT CGACACCGTA TGGCGAGAAG TAAATATGrr TCAGAACA'rA GATTAGATGA AATCAAAATT ATATCAAATC AGATTACACA TCAAATTCAT TTGATATTAG AAACAGGAGG TCTATAAATG AGTGT'rATAA AAGAATACAG AACTGCTAGT GAAG7TTGTTG GGCCrCTTAT GATTG=TGAA CAAGTAAATA ATGTGCTCTTA CAATGAGTTA GTTAAATTC AACTTCATAA TGGAGAAATT CCTCGTGGAC AACTTTTAGA GATAAAGCAA TGCTTCAGCT TTTTGAAGG.A TCrAGTGCAA TAALATTTAGA ATTCGTTTTG CTGGTCATGC ATTAGAATTG GCTGTATCTG AGGATATGGT TTTAATGGGA TGGGAAAACC AATTGATGGrT GGACCAGATT TAA'TTCCAGA
GATCCACGAA
AAAGTCTAAA
TGGTCGTATT
GAAATATTI'A
ATTTATTCAG
ATTACCAGTA
ACAAGCGACT
TACI=TGAA
4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 GATATT'GATG GTCAAGCTA'r ACAGGGATCT CCTCTATTGA 77TTTCAGGTTr CGGGCTTACC GT=AA.Arr CTGATGAAAA GALAGCTGAGT TTT=ATGGA TTTATGAACT TGGCAAATG.A ACTGCCGCAG AGTATCTAC ATGACTAACT ATTGTGAAGC A:~GACGAGGCT ATCCGGGATA TAATCCTrGTA TCTAGAGATT TCATTTGAAT ACTCTTGTAC TCATAATGAA TTAGCTG.CTC TTTTGCC--T1 GTATTTGCAG AGAACTCAGIA AAAACAGGAG TCCTGCAATT GAGCGTATT TTTTGAAAAA GATATGCACG GCrACGTGAA GTCTCGGCAG TTTATATACA ATTATCAA
ATCCAGATGA
GTCGTCAAAA
AGXAACAAG
CAATCGGTAT
CGATCGATCG 'rTCGCT'TrA CAACTCCCCG CATTCCTTTA T'rCTAGTTAT CATGACGGAT CTCCCCGTGA ACTTCCAGGG CTCTATACGA AACGGCTGGT 'TTAACAAT CCCAGAAGAT CGCTTAGTTG GTAAAAAAGG TTCG ACA CAGATTCCTA GACATAACAC ATCCAATTCC TGATTTAACT GGATACATTA CTGAAGGGCA AATTATTTTG TCGCATGAGT TGTATAATCA AGGTTATCGT CCACCAATCA ATGT=rACC TTCTCTCTCT CGATTAAA.AG A1TAAGGGATC TGGAGAAGGT AAAACTCGTG GAGATCATGC TCCAACTATG AATCAACTCT TTGCAGCCTA TGCCCAAGGG AAAAAGGTTG AAGAGTTAGC AGTAGTATTA 925 GGACAA'rCGG CTTTATCTGA TGTAGATAAA T1'GTATGTCA GGTTACAAA GCT1rGA GAAGAcTAcA TAAAccAAGG ATTIATAAA AATcGAAATA TAGAAGATAc GI'GAAcTT
GGGTGGAAT
CTTGATAAAT
ACOTTTGAAT
AGCTGAACGT
TCTAAAATCC
TTCAATTCCA
AGTTrCCTAGA
TACTATCAAT
ACTTACvI-1Tr
GTAAAACCAA
GCACATAAGT
CGTGAGAATA
lwrrGCAGTTG
TCGAAAGAAA
ATCGCATATGA
TCTTCCTAGA AC-AGAGTTAA AACCTATCAA GGTAGAAGTT TAATCCGGAA ATCGAGTGAT CTCCTATGGA A'rrGAATAAC 'rTAAAGCGAAC TATTAAAGGA TAAAAGAGAT GAATTATGA ATCAACTI'CG GAAAGAAGTG GAAACTTATC Ar.ATG.ArG 'rATCTATGGT
G'TGACAAC
GGCCATTI'AT
TAATGATAA
AGGAATTA1IT
TGAGTGGTAAC
CTAAATCA'rr
TTGAATTATT
ATATTAC
ATGATGTATT
AAAAAACG;TG
TAGAATACTC
AAAGAAT-CT
TCG'.GAGAAA
CAAA~r--TGG
CAA.AATATCA
TCAAAATGAG AACAGTGAAT ACAGCTATwT TGCTACAATG AATAGCTlrAA 'I'?ATAAATT ATCTTCTAA'r AGTGAAATGG ACTAAGACTG GCAGAAGTTG ACC;TAGACGT GTAAATG=T TTATATAGAA 'rTGAAACTAG AGCAGGCAGA CAAGTAGATC CTTTATTTAG AITrA'TAATT TTAAGCCTTT CTAAGC="r' =TATTGACA TTGTT'AGATA ATGAAAATGT TTCTACTAAT AATTATATAG AAAGTAAGCG CCTCATAACA
TCAGTTAATG
GAT'rATTCCA
AAGAGCCAAT
AGATGAACAA
GTATCAGGAT
CTAGA7TTAG AGG'rATCTAT GCTG.ATGAXAA TAGAAAAAAC A.ACT-.GTCrGG AAACTATT-CA .TAGT-CGTA TTATGAAAGT A'rATCAGCT r GGATAAGC= ATCTrrtr"CA AAATTTTGGT GATTAGTAAA TCGTAAATCT CATrCATGGA CCCTCCTG TGCC.ACALACC TATI-GTTCCT GAAATGATAT TCI'GC--rAAA TCAATCTCGA AAI=GAGAA TCTACCrAG CGGAGGTCTA TCACTGGCTT ATCTGGTTAA 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7980 TATACTATTA GTAAAGTAAA ACTAI~GGAG GATATTTTAA GTAGAGATrC CACAATCTCG ATTCGTATTG GCAAGCTTGA CACTTTGG ATAAGGTGTT TcrccG'GT- GGGAAAACTG 'rCGrrGAT TCTAAAAAGA ACTAAG?1'TT TTTCAATCTC .GCTCTATCAC AATTCATCTA ATATGAGACA AGGAATCGAT
AACCCACT
ALCACCCIT
TCAGAACGGC
ACAAGTAGAC
TAGAATAGTA
ATTTGTCCTG
GAATTGGATC CTCTCCGC TCAACTCTTT CTCT7"rTGTG GTGGACGTAA AAAGTCTTT ACTGGGATGG TCAAGGATTT TGGCTACTAT ATAAACGC-T AGA7TA'TTT GGCITAAGTAC AGAAAAGGAT GTCAAAGCTC TCACACCAGA 'rGGCTTATCA AGGGCTrTC TATCACTCCA AAAATATAGT AGATTGAAAC CACCTCTGCT TCTAAAACAT TGTTAGAAAT CGAT=ACT GTCCTGATCG TTC-=ATTTC ATTTTACTAT AAATCCATCA GAAAGTCGTG ATTTCTATTG 926 AAA'rGAGGAC T=C?r=TA TACTCATCTG C~r'CAAAAA GCATTCTAGT CCATC'CCGA TTAACGATGG AC~rrATC:AC CTCCrTCTCC AG1TC:LrTA TAACATC'rG GAGTTGATrC ATGACATCTT CCAAAGI-rA AAAGGCr=A ?rCTAAATC CACGTTTACG AATCTCTTTC CACACCTT~ CAA'rGGGGrr CATCTCTGGT GTGTATGGAG GAAThAATGC AAAGCCAATA ?rACTCGCAA 74rAAGGT ACrGA'rTrA TCCATATAG CATTGTCCAT AACCAGTAAA ACATAATCAT CTGGATAAGC TTGTGAAATC TCCTATTCCT AAAGCCCCT TAG.CGCMTAA CTTrGGCTrCA GCTTCTATTA TCGCTCACAC CA'rCCATCAG AAGr"AATC TGAACTACC CAATwTATICCC CAAGAAGuAG ATTGGCCTAG GATCGGTTTA CCAATCACAC GTAAGGAAAT CTCTAATTGG CA'rATCAAGG CGAGTCAATA CTATTTGGAG CCCC=TATA ACC7CTTCG AGAGAGACTA ?'rCACTCAGC CCTTACTrCA TGCG:GATCALA ACTTCTTATA GGGrGCTAGA GAGTGATAC? CAGCTGACTT ACTATTGGAC TrrT=TrA GGTAAAGCAG AGAA.ACAAGG GATTACCrT TACCACCATG ATCAGTGTCG AACTGGTTCA GTAGTACAAG AATTCCI'AGG AGATTATTCT GGCTATGTGC ATTGTGATAT T7,rGCGCCAG TAACrAGGA CTI'TAGTCC CTAGTTCTGC CTATGCGATA GCAGCTCCAAG GTTTAGGAGC AAGGCGACGC TAAGCTTGGT AAACTTCGAA CCGCTCGTC'r GCTTATCGTC AACTGGAAGA AGCTGAACTT GTTGGATG1.~ GGC(CATGT GAGAAGGAAG 7rTTTTTGAAG CGCCCCCCCA AGCAAGCCA TAAA'rCA'rCC TTAGGAGCTA AAGGTTTAGC 'rTATGTGAT CAGTTATTTT CCT'rGGAA.AG AGACTGCGAG 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 864C 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 GCTTTGCCAG CTCATGAACG ACTACAGAAA **GAC=~CTTTG CTTAGTCCGC GCGTCAGTCA ATTGAATACA GCCTCAAGTA TGAAGAAACC GTCCTTTCCA ATAATCTAGC TGAACGCGCC AGAGTCCAGT GCAC'rCr=T AGCCTAAGCT TCAA.AGTTTT GAAGGAGCI'A AAGCAAGAGC CGTCAAG.AAC ATCTCC-AGCC CTrAATGGA.A GNTTAGCAG GTTCAAAACT AGGAACGGCA TTTAAGACCA T'rrrAAAGA CGGACATCTG ATTAAATCAT TCGTTATGCG ACGGAGTAAA CACTTTAAAA AACAGGGT GGTTA=r~C TATTATTATG AGTTTGrrGG 
AAACAGCTAA ACGTCATCAA TTAAATAGCG AGAAATATCT ATCCTATCTT CTAGAATGTC TTCCAAACGA GGAAACTCTC GTAAACAAAG ACGTTTTAGA GGCTTATrTA CC-ATGGACTA AAGTTCTACA AGAAAACTGC AAATAAGAAA TCrCCACAT-r AGGAACTATC CGTGAG=rCT CCAGTCTGGA GATrTTrCAA TAGACTTCCT GCCAAACAAA ATATGGTATA ATAGTTCTAT CAATGATCAA GCAAGTAAAC A*CTAACCCA T'rACGATTT AAGCGCTTG TTGGTGTTCA ACGCACGACT ?r'rCAAGAGA TGTTACCTGT ATTAAAAACA GCTTATCAAC TTAAACACGC AAAACCTGGA CGAAAACCTA AATTAAGTCT AGAAGACCTT CT'rATGCCCA CTCTTC.AATA TGTGCGAGAA 927 TATCGAACMr ATCAACAALAT TGCGGC-M= r=~TOATTC ACGAAAG;CAA crAATccaT CGGAGCCAAT GCCT1'GAAGT AACTCTTG=r CAAAGTCCTC TTACGATTXC AAGAACTCCT CTCAG7TTC ACGACACGGT AATGArrGAT GCGACGGAAG TAAAAATCAA TCGCCCTAAA AAAAGAATTA GCCAATTATT CTGGTAAAAA GAAATTTCAC CATGAAGG CTCAAGCGAT TGTCACAACT CAAGGGAGAA TTG rCl GGATATCACT GTGAACTATT GTCATGATAT- GAAGTTGTTC AAAATGAGTC GCAGAAATAT CAGACAAGCT CTAAAATCT TCGCTGACAG TGGTTATCAA CGGCTCATGA AGATATATCC TCAAGCACAA AC=rCACGTA AATCCAGCAA ACTCAAACCG CTAACAATTG AACATAAACT C1'M'AACCAT GCCATCTA AGGAGAGAAG CAACGTG~rA AACATCTTTG CCAAAGTAAA AACGTTTAAA ATGAr7TCAA CAACCTATCG AAATCATCTA AACGCIrCGG ATTACGAATG AATTTGATTC CTGGTATTAT CACA'rGAA CTAGGATTCT AGTTrrGCAG GAACTCTATT ATCAAAAATA CCATCAAGA'r TATATAAGAT TGATACAGGA AAAGTT=T' TTGATG=~T AAATArrAAT CAAATAGA'rA AAAAAATATT AAGTCAAAAT TTACGAGTAG TTCCACAGGA TAATATAACT TTAAAGCACr- AA=rACTTC TTC-ATTTIA TTGAACCGA.A CTATTCT'rCA ACAXAAACATA CAGGAAGTTT GTAAAGCACT TrCAAATCTAT GATGAA.ATCA TGCCTATGCC GATGAAATTT AATACTATCA TCTCAGAGAT GGGGTCAAAT ATTTCAGCTG GCCAAAGGCA ACGGA'rAGCA CTGC-ACGTG CATTAATAAA 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 TAATCCTAGT A=TGAATT AAGAATAACA AAGTATATAC GTCAACGATT AAGGATGCGG AGGAA.ATCAT AAAGrACTTAA GAAATGAGG TAAAGAAAA ACTAGGGGC ATGATTGGAA TATTTCTTTA CGCITTGTTGA TAAGAATTAA GCATAATTTT- TAGATCAAGC AACTACTGCA TTAGACACTA TAATGAGGA A.AAGTCAGGG CTGTACTCAA ATAAN'GTAG CTCATACATT ATGTTAr'rT? 
TGTAATGAAA GGTGrAGA TTGTTGAGTC TGCATCTTCC TGGAGAGTAC TACAG-TTAT ATACAAAAAG TGAAGAAAGA AAATGAATAT GTAAT-rrAA CAACAGCCTC TAGTGT'T= AAlnTTT TTA GA=.TCCAG TTCAATATG ATGCAATAGT AlrGGGTCG CTGATTGTT'r ACAAAAACAA T'rGCTGTAAA CTAAGGAGTA GAGATGCrA TACT"TGAAAT TATAA.ATCTA ACAAAAAGCT TTAAAGATAT 'rGAAGTTATT CATAACACTr AAATAATAGA GcAAcTAcAG TAGTAGcTTA AAAACATGAT TAAATCGCTA TTC=rAGGAG TAGCGGTTT'r TCTI'TITGT'r TAATACTCTT TGAAAATCTC TTCAAACCAC GTCAGCTTrTG CT?1'ACCCTA CTCAAGTACA GCCTGCGGCT CCCNTCCTAG TwMGCTCT1r GAT'1-1rCATT GAGTATAAAA AGGGTCAAGT AAGTATAGTA AATTGAAATA AGATATGA.AC AAA'rCGArrA CA.AAAGTCAA 928 ATTAATIrCT AGAAATATGT TAGAAATTGG TrTGAATTCC GCAATCAATV TGTTCAG~T TTATT-CATT TCA%"IrArr TAATTAGATT TCCAAITT TTAATTCAAC CTAAAAATCC CCAATCCTAG TGATTGAGGA TTGAGTAAAT AAATCTTAAA CAATACCTTG TGCAA'rCATG GCATI-rGCTA CATT-TCAAA GGCAGCAATG CCGTATGTTT CTCAAGTcG.T TA~c-rGTc CGCCCATCAA CTTCTTCACG ACTCCATGAG GCTGAAACGG CTACACCACC AA2CGTTr.GCA 'rCTTTGAAA CNTGATGGC ATCAACGTCG ATAACCCCTT GAGCAACCAA ACCTrrAGCT CATGGAAGAG CAATGTCATA GrTTCCAGCG GCAG?1'GCTT =ICAGCTGC ATACTCACTC ACCAAAAGAT CGAAGTCGAT ACCATrTTCA GAAATAACAG TTGCACCGAG TTCAGTTrCT TTAGC'rCCTC CAAGGTAGTC 'rTATCAAGA TTGAAX2ATGT 'rTGTCATCAT GTCITTTGAGA AGGCGAAGAC TGTTGG CT CATTTCAAGA GCTTTTGCAG GTCCGTAGAA CATACCATTr CTCGGCATGT TGGCAC-?rC AGATACACAG GCTTCACCGT TGATrrCGl-r TTGAGTGGCA TAAGTCCATA CAGTACCrrC GTGGTACCTT AAACGAGCAC GACCTrrrC TTTAACATCA 74ZGATGACAT AACCATTTGA CTCGhr?.
'rTGAAGAG CATATTGAC GAAC--TGAAA TAACGACTr CTTACCAGCA AAGCTGTTAC CGTrAGCCT TCAGTATAGT AAACCAAACC GTAACCAGT'r CCTGGAC GAATCAAGCT CCAAGAGGI'T TACCAGTCAA GACACCAGCA TCAAAT-rGCr TAAGACG-T
AACCTTACCA
GAGC-ATTTCT
ACCACCAAAT
GTATTGACC-G
TAAACGTAAC CAATTTCACG 'rCCACCAACA CCGATATCAC CAGCAGGTAC GTCAAGTGAT GGTCCGATGT GT~r'I-rCAA I-rCAGTCATIG AAGCTTTGGC AGAAGCGCAT r-AC1-rCAGCA TCTGTrTrAC CTwrrAGGATC GAACTCTGAT CCACCTTTAC CTCCACCGAT AGGAACTCCA GTCAAGACAT =MAAAGAT 'rTGTTCAAAT CCGAGGAATT TCAAGATCCC TTGGTTTACA GTTGGGTGGA AACGAAGTCC ACCTTTGTAT GGTCCAACAG CTGAG 1'GAA TTGAACACGG TAACCACGGT TTACT'rGAAT TTTTCCATCA CGGTCAACCC AAGGAACACG GAAAGAAACC ACGCGCTCAG GCTCACTAAT ACGTGCCAAG ATATT1'TCTT CGATATACTC AGGGTGTr'x- TCAAATACAG GTCA).AGT GTTGAAAAAT TCr'rCAACAG CTrGGAGGAA TTCAGCCTCG
TGCCGG
INFORMATION FOR SEQ ID NO: 138:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3083 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138:
ACCAACTGTT GTGAACCAAT TCCCATAAAT TCCAAGAArr' GCrrAATACA GCCATrrTGA CCAAAAATCC CGATAAAAGC ATAGGCTA ACGGCAAAT TGATCCAGGr AGGAAGGATA ATCAGCATGA CCCAGGTG ACGGTGTTG CrGATrAA~CA GTGCCACAA.A GG;TCACAATX' TTAAGATAGG TCAAG=TTG TGACCCAAAG CCT'rCGATGT TG.AAAAAGG.A TTGACCGAAA GCAATCCAAA GCATGTAGGG TACTACAAAG cTCCTCCATT GCATTGATCA AACCTGCTrC AGACCGGTCA AAAACAtGGGC CGTCCGATAA CCTGCATAAA OCACTGAGT GAA.ACTC-ATT TAAGAT'rTGT AATTTTCTAA ACTGAACTGC ATCAACACCA AGGGTGCCAA TACAAACAGC AG=~AGAGC TTGTTTTCTT CATCTCTTTC TTICTC-rcCG ArCTACG-' ACTCCTCAA'r GAGACGCATG ATGTGGATG;T CTTCrGC AGCCTTACGG GlTGAGTGGA TCATCCATTC ATAATGAACT CCACGGAAAA GCTGGGTATC AAGGGTAATG CGCAACrC=r CTGGACGAAT CCCACCATCA ACCC?1CAA AGCGTTTGCC ACGAGCATCG AACTTCT A.AAGTCCAGA CCGATMrCT ATTTCCAAGT TCCTCATAGIG GACCTTAACr TrCC'TrTGC AACGACCTCA ACAGGTTCAT GTTAAATTCG ACCAAGTAGT AAAGGTGGCA ACAAAGTGGT GACAATCTCG CCATCATTCA
CGGTTTCA-T
CACCCACGAT
CG~AA7-TTC CTTCAC4G
TTGGCTTCAT
CCTCAATCAT GGTACCTCCC AAGA'rGTT'rG ACTCCCCGAT TGAT'rGGCTC ATCGTAGATC 'rAACGAAAAT CCAGTCACTC ATCGTGTG ACAAAGACAA AGGTAAGCC CAATCGTrGT CTGCATCTCT GTTC~ICAATT TCAAGTCCAG CCCTGATAAA ACGGCT'rGG TTATrGATAG CACGGGCGAT GGCCACACCC T'rTGCGGATG GAAC7T= CATAACCTTC CAACTGAACC ACCCTGCTCG ATTTCTTTCT TATClc 'rM ACGCAAGCGA AAACACATTC ATATGTGGGA ACAAGGCATA GGATTGGAAG GT'rGGTTG-GA ATATCATTGA TACGAACACC GTCTAGCATG CAGTAAACCT GCAATAATGT TrAGGATACT TGAT'N'CCCC GGTGTAGAAT TTCCCTrCTT CCAACCAAA GTIGATCTCT CTT~CAAAA ACTTTAGAGA CGrT="PGAA 'rTCG-ATAATT TCCACAGGGG 7TTCCAGACTC ATGGCAAGAG CTTC'rTCCTG TGTAATTrCAC GCAAfrTCGTA GGCTCGTCCA ACAAGACCAC TGACGI'GTC CTCCAGAAAG ATCr'TGAGAA CTTCCGCTAC AGrGGAAAGG CAACATTTTC ACGGTATGTA CGTCGCGCTT ATATCTCCTG T CGCATC 240 300 360 420 480 540 600 660 720 780 840 900 960 1.020 1.080 1.140 1200 1.260 1.320 1.380 1440 1.500 1560 1.620 GAACCAGATG CACCTAGAAG TTGAGAACCr TGGTGTTGCT GGCTT?1-rCA ATTGCCATAA ATTCCTTCTT TTTCATAGAT TAACCGATCG CGG=CCTGTC AGGTCCCCAC TACC TCTTGC AGGGACTAAA ACCACCTCCA 'rACATCTTCG CTACCGATAG GC1rCACCC AACATCCCGA 930 CTCTCTrC AAGCGTAATA CCTGAGTGTT CCTTGACTT TTCGATAACC AGTCCTCGTA Gr??1'GGCC GTTCCA'rCTGCGcACATTcAT CA'rAAATCC1
GATTGCATCA
GCATGCCTT
C'rGACACTTC TACGCCAjcr. ATACGATACC CTTCAAGCC C1'GCAAAATG CCCGACTGGA CGCrAAAGA CCGAGCCACA Gc-.rGAGTrc AccTAGGTc~C GTcAAGCCGT ccA'1TNcc'Tr CTGGACCTAG GGCAAATTTA ACTGACAAGA CAACTGCACC GACCGTAACC AAAAGCCAAG 'rCTrAGCAC ACAGGGTTTC AGACCTTACA AGACrGCAAG ATGTGAGCAA ~TCTCGCCACC AGACAGCACC CCCAACGCTrT CCTCGAATAC CACAAGCAAA GACCGAGGGC AATGCCACTT GTrTCAATCA AGTTACCCCC AGCCATCAAC AGAA-ACG1TTA rrCACCTTGT CACACAACAT AGC =CTGAA ATTAACTGAC AGAITYG.OTAT TCCAAAGGTT C TTGArA6ACC TGATCGGGrC- AGACTCCTGA ATACCTGALAT GATrrCTCCA TCCTTGGTCA ATAGGCACCC GCATTCATAA CTCAAAGCCA GTTAAACTAT AGCT-'CTGCT TCA.ATGGTAT GACAAA'rCCA CGAA'rCCCAC CAfrCACGAAC GATGATATTrG 'rCCCAA6ATTT CACAACGCGA CCTCTCCACC TACT"TTGTA TTCCTTCTAA GATTTCAAGC ATTCATTCCA TTATACCA'rr T1TTGCA'rAA GAAAAAGAGG TTCCZATAGGC GACCAGAACG 'rCAACCAGCC TACTTGCAAA C1'GCTAGGAG ATAAATCCAT GTAATTGGAA ACCA.AGCCC CTrCCATTGC CAAGAACCAT CCAAGCCATA TTTTCTTGc'r GCCAACTCAA AACCATVCG TGGAAkAGACC AAA'rAATCAC 1740 180o 1860 1920 1980 2040 2100 2150 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2883D 2940 3060 3083
TAACTATAGC
ATTTTTCTC
TTAGAGACA
rrCCCCCC-r
AGCTCCACG
AGAGATGGTG
TTCTC VGGC
AAGATTCGCT
TATGCAAGCG 7rC-TAAA6A CGGATA'rCAA 'rTACAGACAT GTCACTCTTC CTTTTACAAA
TTTGACGACC
CTAGAATCAC
CCAGATCAAA
CAACAGCTTG
CAAAACCAAG
ATAAAAATAC CTTGL11GA
TT~CAXAGATTTCCTTGG
T-,CAACCAAG ACTGGATTTG GAACCCATGC AAGCCATAGG GTAAACCCAA ACTGTCAAAA CAAATAAATC TCCCACACCG TrccArAACC TGTGCAT'rC.
ACTAGTAAGG CCTAGAAAAA AAAGTGACTG AATGGTTTTT AACATATr?1T CAGAC.PCGAA GAGTTGAAC rTGCCGAAAGA ACAATGTAA.A GATTGAGTAA TCAACTCCAA GCCACCALTGC CCC
INFORMATION FOR SEQ ID NO: 139:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15363 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139:
CCGGAGGATA TTGACCACCA CCAAAAGCAG GGGGAAAATC GAAATCAACC AATAGTAGGC TACTGCGACA CTGGTCAACT CACTATCTGA TCTGATAA TAATGCAAAA AAGC7=TAA ThAAGGTTT'G TCTATCAGCT CTrCCACCA CI=17?11CATG TCATACTCCT TCACTTATAA TCTTATACTC AATGAAAATC AAACAGCAAA CTAGAAACCT ACCCCCAACC -CCTCAAAAC ACTGTTTTGA GGT'?GTAGAT AAGACTGACG AAGTCGATCA CATACATACG GTAAGGCGAC GCTGACGTGG TTTGAAGAGA 1'N'TCGAAGA GTATTAACTA A~rrc=CTT ACCAATTCCA CCATATCATA CGGTAGGGTA TTGGCAGCTT CCTTCAAGGA ATAGTTCTCT AAGTTATTTA CA7?TTCTCG TAA7"TTCTTG GCATACTTAG AAATCAACTT GCGCTCCAGA CACGATTGAT AACCCGTTCG GAGACTCCTG CATAAAACTA TTTCAATAAT CATCTCTGTA CACCGAGACG GCTTT'GCTCC CTTCATGAAG GAGAAAACGT AGGCAAGTTT ACGATTGA'N' AAGCACCATT GCTAATGTTC TAAGAATCAA GAGTTTCAAA AAATGAGA -r TTCAATACGT 'TCTGGTGTC T'rCCAACTCT GCATAAGATA ATCC1TGCT CTGTTAAAAA GCTAACAAGC
TAATAGCCTC
ACAAAGGCAC
A1'CAAAGAGC
TTGGCAAGAT
TrCTTCTTcC
AAAATCCTCA
CCTTGTTCTA
CCCTCTTCAT
GCAGCCCAGT
CCATGATAAT
TT'rrCCAACT
TCTTCCTCTT
GTCACACCTG
TCGTAATCAA TCC=r'Cr TCAGCATTTC ATCGATATTG CACTGCTGAT AATAGCTGTT GTCTGTAGAC TCCCI'TCAGG TTTCCAAAC AGAGCTCTGC AATTTGGTCA TAATCAAGAG AGCTACGGAA GGTCTTTCCG AGAGTAAAAA AGGAAACAAG AAAATAATAG GTCAGTCTTG TATTTrTCAG ATAACGTTGG TAAACTCGGT AGGCCTGTTC CAAACCATCA CTTTCAATAC CTTCTTGATC ATCCTGGTTT TCTTGGCTTA 'rGTCAATAGC CGCATAGAGG GGAAGTT'rAT CTAGCGTTAC TTCATTCAAA ATGGCGATAT CATCAGAAAG ATGAGGCAAG ACCAAGAGAC CAACAAGCAA AAGCAAAAGA GGATACTCCT TAGCAATCGA CACCGTTCCC TTAACACCTG ACTTATTTAG CTTTTTCTTG AGGCGTCGCG TAAAACGAAT GACAAAAAGG ACAAACGTAA GATTATAGAT TGGATTGGTC AAGATAGGT TCACAAAGAC AGAACCGTTG AGCATAAAGG TATCCACTTG GGCTTCGAGG AGCGTGAT'TT CAACTACGAC GGCAATAATA CCTGAAACAT
TCGTATTCGA
TTGGGTTTGA
TCTCGAAGAC
120 180 2 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 GTTCTAGAT-r ACTTGGTATC*AAGAGAATCG AAAAGGTCAA GAGAAACATG TCCTTCATAT TTCTATAGGC ATAATAGCCA TAGATCATAA GGGCGATAAG AGATAGCAAT AAAAGTAGAG CTGCTATCAT TTCCAACTCC ATCCCTAAAA TCACTGTATG CCAGACCGTC TCGGTCACCG TCTTGAAGCG ACTTGCCTTT AAAATTCCAG GAACTTCTTC TGCCAGAAAG AAGG'rCACTA GAGGCAAACT CAATTCTAAT AAAAGrrCAC 932 TGGCAATATC CGrrGCGCGC ACACTTAGCA AGAAGGTATG GGAAGCGG TTGCTCATGG CTGTTAAAAA TCCAATTAAA AAACCCCTA GGATTGAAAA GATGAGCGAA CTGCTAGCTT GCCCCAGAGA AAAAGCTCCA CAGAAGCATC ATTCAAGAGT TAAAACGCTC CGAAAGAGAG CAGCCAAGCA AGCTGCCAAG TCAGGGTCGA GATAAAAATC AAATAGCCGT AACATCTGCT CCAAAAACAA CTCCGTATTA TTrCCCAAAAG AA'N1'GCACC TTGAGACAAT CAAAACCAGT TCCTT'AATCr TTTTTACAAC G7TGTCCAAG CTGTCAAACC TACC7VGAAAA GCCACCAAAC CCTTCGCCCT TAAGAATATT GGACACGCGC TTAGGAAAGC GCAAAGGCCA CCAAGTCCGT AGGACCAAGG GCTGCCCCAA GGAAGGCTGA ACCAAAGAAG ATGCCCCAAG CCACCCAAAC ACTGCAAATA TGAGATAAAC AATGATTCGC CAGTGTTTTA TCTTCAGCCr CTCGGAAAAG CAAGGGTCCG ATAACCAGTG AGGTGAAAGT CAGTATTGGG TAAAAAGAGA CCAATCACAA AAAGGGAGAG GCAAAAAGGG CAGGAGCTTA TTGGTTGTAC AAAAATAGGA TGAGGTAAAT CAGTAATTCC ACGCACGTCC ACGAT'rCAAA TATCTCCTTC TGCTCTTTGA TT'=CTGTC AATCTTGGAA CAGTCTTTGT GCTCAATTTT TAATTT'TTC TTGATTTTAA GCA7=ITT CGCTCGTTCG TGGTGGG'rTG ATTCAACAAA TAAGTATTGT TTATCCATGT CTGTATCTCT GGTTGTTGAC T'rGGT'rTAAA CTTCGGTAAA GGATCACAAA ATCCAAGACA GCATCTC-.CT
TCTCTGGCAC
GC1'CATATGC
ATTCTGGCGC
CTAATT=TC
TGCAGCTGT
CGAGATCGCC
CGTTCCA IrT
GCTTGGTCTA
A'rGGCATCCA
AATCATCACT
GTACTCTTGT
TCCTTCATGA
CAAGAGCAAC
GCACGCCCAT
GCTTTTCGTG
AAAAACGGCG
TGCTTCAACT
CCATAGTAAA
AATGCCTCAA
CCCAGATTAA
1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 TCATAATAGC AAT'rCGTCCA CCrTGACAA GTAAGCCACA TAGCTTr'rCT TCGTTGTCTG CGGTCGGGTG ATGACAGACT TATCAGCTGC CGGCAAATAG AAA'rCCCTGC CTTACrTTT ATCACAAACT GGTCCAGTGT CTCATGGCC'r TGCAAGATTA ACTGGGCATT TGTCAAGTCA GCCTCATGCA AACGCTCTTG GGTCTTrTCC AAGGCTTGCT TCTGAATATC AAAGGCATAG ACTTGC'rTGG CTAGCTTCCC TAAAAAAAGC GTGTCATGAC CATT'rCCCAT AGTCGCATCC ACTACGACAT CCTCTTTTGT CACGACCTCA GCCAAAAAAT CATGTCCCA'r CTCAAGTGGT CTrrCATTT TCAAACTCCT GTTTTACAGC CTTGCATCC'r .TGAACACTTC CACGACGTCG CATCTCCATC TCAATGCTGT TGAGGACTTC CCATTTA1TG AGGCTCCACA TAGGACCAAG CACCATATCC CTAGGCGCAT CTCCTGTAAT TCGATGGA'rG ACGATATGTT TGGGAATAAT TTCCAGTTGG TCACAGATGA CCCTGACATA 'rTCGTCCTGA CTCATCAATT GTAAACGCCC CTCATGGTAA TCTCGTTGCA TACGAGrA'rr TGTCATAAGA TGGAGCAAAT GCAGTTTAAT CCCTTGAATA TCGTTATCCC TGACACAACG GCGGACATTT TCAACCATCA TCTCATGGGT TrCACCAGGC AAACCATTGA TCAAATGGGA AACAATCTCA ATTTTTGGAT ACTTTCTCAA ACGCTrACC GTTTCCACCT ACAATTCATA AGAATGCGC:A CGGTTAA'rcA GGTcAGAGGr TGCrCATAA GTAGTTTGCA AGCCCAATTrC AACCGTCACA TGCATGCACT CCGATAACTC AGCCAAATA1 TCGATGGr CGTCTGGTAA ACAGTCTGGG CGCGTTCCAA TATTGATTCC TACCACACCT GGCTCATTGA TAGCCTGTTC ATAACGCTCT CGAATAACI' CCACCT~rC ATGGGTrGrrG GTAAAANrT GAAAATAAAC CAGATACTrC CGAACATCCG GCCACTTGCG CTGCATAAAG TCATTCCT TATAAAATTG CTCACGCATA GGCCCATCCG GTGCCACAAT GGCATCTCCA GAACCAGAAA CCG.TACAAAA AGTACACCCC CCATGAGCCA CAG'ZCCCATC ACGATTGGGA CAATCAAATC CCGCATCAAT AGGGACTTA AAAGTCT'rM CTCCAAAGAG ACTTTCXI-rA TAACAAAAAT CTGTT'rCTCG ?TTCCATTAG TTAGCATCTA ?!'TATTATT TTTAGATTAA AAAAACTACG GAGTATTACT TGATTACAGC CCC7CTCGA TCCTAGCCAT TACTACCCTA AAITrATCAA TATATAGAAA TG&7rGTTGA AGGGATTTTT GC =TCT~A AGAAAAGAGT CTCGXCAAT rTTTGGAT G T'rACGA'rAG
CACCCACAAT
CCT''T=rA
GATACTGACC
ACGAAACTTT
TAAAACCTTT
TATTCTCGTC
CTCAAAAGCC TGACTTTCCT 'rGATACAATA 'rGGGTATGAT 'rrAGTCGTCT GCATTATCCT TTrCGATA6A TAATCATTCA AGGTATTATA AGATTTCATG
ATAAATTCCT
TTTAATGA.AA
AACCAAA=~
CTTGGTTTT
ACTGGGGCTA
ACGCAGCTTT
GCGGATTCGG
ACCCATAATT
,r?1r'CTTCC
CTTTTCCAGT
TCCTCCCTAG
TTr'rGAAAAA 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 ATTCTTCTGG CGCGCAGGAT TCTTATTAAC CCTTATCATG AT'rGTTCTTA ATGAAATAGT CGAATCCCTA AGCAI'rTCT CAAAATAGTA 'rAGACAATAA CACTATACAA TTrrArACA ACTCTCTTAT ATCCAAAAAG GCAACGGATT TGCC=TGCT TCTTGGTAAA ATAGAATTGC CCAATAA.ACC ATTTAGAAAG GCTATCCCAT GCATATTCAC TATAACACAA ATCAAACAAC CCTTCTTACC ACAAGATCAT CTCGTTTTrA CTATTGAAAA AACGTCACTT CTACACCTCC TATCATGCCT TTGATCGCCC TGTATCTAC TCTTCTATTT CCCTATTCAC AACGGATTr TTTACCACTA GAAATCAGTT AGTGGTGAA'r ACCTTGGAGG GTCTTATCAC CCTAAAATGC CTCTGGTCGA AAAATTGAAA AATCGAAGAG TTAGTGACCT TAGATTGr" GTTTA'N'GAC AGAACTAAGA 'rTGAAGCCAA TGCCAACAAG TATAGTTG TGTGGAAGAA AACGACAGAG AAATTCTCCG CCAAACTTCA AGAACAGATA CAGGTCTAT'r TTCAAGAAGA AATCACTCCC CTTCTGATTA AATATGCCAT GTTTGATAAG AAACAAAAGA GAGGGTATAA AGAGTCAGCT AAAAACTTAG CGAATTGGCA CTATAA'rGAC
TACCAAATAT
934 AAGGAGGATA GCTACACACA TCCTGATGGC TGGTATTATC CAGAAAACAC AGACAGACI' TCAACAAGAA ATCAAGGTT CGAACCTGAA TCAGCCCCTC AAAAGGGACT GTATATGAAC AGCTAAAGAA TGTCAGGCGC TTr'DATCTCC CCAAGGTAGA GATTGATGTG GAACCTGTCT T'rGGGCAGAT AAAGCCTTCT TCTGAGAGGG AAGCCTCAAG TGAGAATTGA CATGGGATTG CCTAAAATAT AGTAAAATGA AATAAGAACA GCACAAATCG TTCTrAACAAT GTTTTAGAAG TAAAAGTGTA CTATTCTAGT GAGAATGACT CAA.AATTAAA AAGCTAGAGT TCCACAATTG
GAACGCTATC
CAGATTI'TCG
41VGGGTTACA
GTTTTCACCA
ACTACGCCGA
AAAACTTGAA
CTCAACGCAA
AGAGATGTAA
GGTTGAGAAC
GGTCTTTAAT
ACTAATCCTG
GCACTACCTG
GGTTTTGTAT
GTTCCAACCA
TTACTTCCAT
TCTACACGAG
AAATCTTTTT
TCAGAAACTG
TCTGAAACGA
GCTACTTGAT
TAGTCTTGAC
TGAGTGCTTT
CCGCTCTTAT
AATG'rrACCT
TCCCAA'N'TT
TAGCTTGGCA
TCCGTACCAC
TTACCATTT
GACCAGTTTG
TA7TTTGTCT CAGGC'rCTTr ATCTTCTATT GATAAAGAAG GTATCAAAAT TTCTAG1'CTT CACTCAAACC TAGAAGAGTT AAACCTGCTG TACTTrGGTAA CTGGGCTTTA TTAGT'rTGAC CTGCTTTTTC TGACACTTGT GGTTTTTTAG AGACGATGCG GTCTGTCGGA ACT-rCTACCA CAGGATTAAT CGC'rGTAAAG ATACGTrCTT TT'rCACCTAG ATACAGTGTT GAATcrT1 CAACAAkATTC GATTTTTGGA AGATC=CT GTTTTTCCTT AGTCAAGTGG ATACGGTATT GGCGAACAAG TACTCGAAAG CTATC CTC TGTTCTTC AACTGAGACT T'1TGGCCGTT GATTTTCAGC GAAATCAGCA AGTTCTrTTC CTTGAGGCAA TrTCACTTGGT GCAAGGA.AGG CrGCrTTACG CTCCATACGC CATCTCATAG TGATTTCATC ACCAGCTGCA ATGTCTTTAT CTGGATTrGT'r GAATGGATGG TCTGCGTCGT CTTCAAACTC TGGACCGACA TAGCGTTCTA TATCTGCAAA GAACTGAACT 7=ICCTTGTG CACGGAAAAT CACACCCGCT GATACT'rCTG TCCAACGACG ATTTTCTGAA TGATCrCCGT GTACTTATGG CCAATAACCT ATAAGGACAA TCAAATCGAr T'rCAATCTAC TATACAATAA GAAATATCTA CrrTGT TAGGACAAGA G7TTTTCTTT C7'rr=TACC TTTAGTAACT CTACTCrGC TTGGCTTGCC TAGCTTCACT TGAATCAATT C=CTTGAGC TACTGGTTTG CTTCACGGAG TTTTTCTTCC TTCCAACTTT TCCTTCTTGT TCTCAACTGT CTTGTATGCC GTACAGCAGC AACTGTCrTC CCTTGACTTG TTT~rCCACTT CACTATCTAC CACAGTTGA.A GACCTTTATA GGTAATTTGA CATCTACAAG AATCTTTGAT TCATCTCAAT CATCGCAACA CTTTGGC7r'r GA'rAGCTrTA CCGCACGATA AGGAACAGCT AGGCTTGGTA GTTT'GAATAG AACGAGTTN AGATGGTGCA TAACAGTCCG TTCTACAATC GATTAGAAGA TGGTG'r'IGGT CATTGAGATA GTCAACGCGG 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 9. .9 .9 9
TCATGAGAGT ?TTGCAA'r ATCATTGGTr TCATAGTTAG GGTTA'rCTGA AAGAGTCTCA TCAGCAACAA GGTTACTACC AAGGACACGG AGATTTrCCG CTGGAACTTC TTCCCATTCA TTACCTGTGA AGTAAACTGG AACCTTAGTC AAGCCAGCTT GTTTAACCGC AGCAACTGGT AAGTGCAGAC GGTAT'C'rCC TAAGATGTCG GGCTCACCTT CACGAACGCT TGGAACGACG GTGACTGCCG GAACTTTTCC ATCTACAGAC AAGIwMTCTA AGTCTTTGCC GTCAAC'rTGG ACTTGTT'rCG CAAAGATTTG TACCTCTGTG ACCATGCGAA TACGAACAGC ATAGGTTTCA CCAGCCTTGA GTTGAGCAGG GGCTTTTAGA TTAAACACAT GGTCCTCATT ACCAACAAAA 935
GCTGAAGCAA
CCAAGT'rTGT
CCTCGAACAG
ACTGTCAGGT
GGCAATTCAA
7?rATGAGAAA
CCATTTTCAG
GTAGCGAGAC
TCAAGGTAGT
AGGC1CG' ACTGTTT'rCA
CTGTCACTCG
TAAAT'TGACC
cTrT='C GTrGCTTGACC
GTAAGCTCTT
CTTTCGCGAT
CATTGTTGCT
AGTCTGTCAA
TACAGTGATC
TGCTTTTG;TC
GTAGCCGTCT
TACTTGTAGC
A'rCCTTAGTG
GACACGAACT
AACACTTCCT
ATCAGGGTTG
ATTCTTGTTT GTCCTTGCTT GGCTGCCGCA ATAGACGTTC CACGCTTGTT ATCTGCTTTA ACTTTATCAA AGCTAAAGTG TTAGTAACTG GTTTCCAGT'r CTAGGGTTTT TAGGAGCTGT GTTCATTrTCT
GGCAGAATCA
TGGGACAGTC
TTACCAACAT
AATCCGACAC
ACACCGACTG
TTATTGTAGG
GAAGCAAAGG
TCAGTTTGAG
CCATTAACTG
ACCTTAGCTG
GG'GCTTCTG
GAGACTTTTG
TGATAGGAGT
AAACCTTTGT
GTCACTACAT
GTGACTTCAA
TTATGCAACT
AATACTCAAT
'rrAGATTATC AAGCTTrCTGG
CACATAAGAC
AACGGAGCGT
ATTAGTACGA
VTCGGTACAC CAACTCCATG GTCTTCATGG TTGCTCAAC.A TACC-GAA.TC TCCAAACAGA TTCCAGTTTG TCCAACCATT GGCTGGTTGG 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 AAATGCCTT G'rCATTAACA 'TTrGAAAC'rG GGTCGCT'rGG ATTTGAGTCT CAAGTGGCAA TTCTGAACCG GTCCATTGGT CAGAAATGTT TCCACCTTGC CAGATACGCG AACATGAAGT TTAGTTGTTA ATTGCGTACC TTCTA.AGCGA TAAAGACACC TTCCTTAGCG TATTGCTCTG GACGAATCGC ATCCCATGCA ATGAAACGTG ACCATTTGAA TCATATGTCC GAACACTTTC TGGTAATTGT CGArrGGAGT TGTCACACTG ACTTCr'rCAA CTGAAACGAT ACCTTC1'ACA CACGCGCTTC AAGGTCAATT CCTTCAACTT TACCTAGTAC TTCAAATGT? CTAG7'rTC 'TTCGGAATA GCrTrGCCAAG TGAC~rrATG AGTTTAGGG CATACTCAAC TGTTACTGTT GCTCGAAGAC TTCTTCCTG ATGCAAATCT
TTACAGGACG
CTTGGTCTTT
CAAGCArrCC GATGGATTGC GCAATCTTCT ACCTCCCTCA TATICAGCGT TTACGAATT GCGACTTCCC TCTCACTATT GGCTTGGATA TCAGAGTGAC TGCTCCTGGC CTTCACCACT TGTAGAGAAG GTTACT~rAT CAGCI'GGTAA TACAGCTTGC AA~wrTGACAG TTTGGTCTTC TTTGAGACTG GCAATTwTTG GCGCTTCTTC AAGGAATTGA TTAGGCTGAA TAAAGATGCT CGCACGCATG ACAGCTGCAT TTTTAGCACI' TGCTGTGACT 936 GTTCCATCTT GATAGTGAGC TCGAACCGAC TCAGCTTTT CCACTTGCAA GCTCAAGTGA ATTGCATAGG TTTGAAGAGG GCCACCATCT CCGTT'rGCTG CGCTTGCTTG AAGAACTGTA TCTGGCAACT TAGCTCCATA AGCAAGAG'rG CCTGTTACTG CCTCACCACC AACCG'rTACA TCTTCTACCA CAAGG4GTTGC ATGAATTGGT CGAGAACCTG GAATTGCTAA CTTAGCTA 8880 8940 9000 9060 9120 9180 9240 9300
CGGTATS'GCA
GCTGGTACTG
TGACCTTCTA
TTGG?1wrTG ACTAGTAAGA CAGGTGCCGC AGGATTGCCT AATAACCGGT CGCTTGAATA TCTTCTTCGG CAATCTCCCA CTTGTCCACT ACATAGCAAA CAGATTTCTC TACAGAATC ACTGGTAGCT CTGATTTAAG AGCAATCACT GCCA'rACCTT TCACCGTTAC AATACCAGGC GGACCTTCTG CACGGCTACC ATCACTGTAT GCCTCTCCAA 'rAATCGTCTC TACTTTTGGC CCTTCTTTCT TACCAGTAAA GACAGTGACT GTCAGGGTGA ATTTCCCTGC TTGTTCAGTT AATGCT'TAC GAATCCAAGA ACCATCTGCT TCTCCGTTAT CTACACCGAC CAGTTCACCT TTAGCAGT'rG GAACCACATT CCCCTGGCTG TCTTTTCCAT CTGCTGCAAT CGCATGGTCT CCAGCAGTCG TAATCTTATC TCGAGCAA'r? TCCAAGGTAC CTGGTTGATA GGCAACTT'rC
TCATACTCTT
AAGTCAGTAT
TCTACACGAG
TTGCTCACAT
ACAAACGGAA
ACTTrCTGTCC
TGGTTCGATT
GATT'rGACAA TGCGCCTTATr
TGGCCATGCA
TCAACAATTT
CAACACTTCC ATCAATCAAA TTGGAGCAAT ACGTTTCACA CTTCTACTTC TCGTCCGTCA CTACTGAAGA CCAGGTTACA CAGTGGTAGG CATTTCAGGT CCAAAACAGT CTTCTCTTGT TCAAGAGATC AGAGTGGGCA TGGCAACACC TTTACCATTA AGCGTTCACG GCTGGCTTGT ATTGGAAGCG AACCAGATTA TCCTTAA'rAA GACGAACTGC CGCTGGCT'rA TCCTGCCAG ATTCATCACG AGCAATTGCT CAT'rCAAGAT AAAGTTCATT AGCATTTGCA 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 CCTTCTTGGT AAGTCCGCCC ATCGCTGGTT TGTT'TTTTAT TGAAAGTCTT AAGACCAAGA GATTTTCCAT 'rCAAGAACAA TTCTACACTA GAAGCATTCG AATAAGCACG AACTGGAATC TTACCTTCTG AGTCAGCTAC ?TCGATGCT AATTCTTTGT =TCCCAGTT CCAGTGAGGA AGAAGGTCTA CCATCGGT -r CTTCTTAACA GAAACCCATT GCGCTrTTGTA GAGATAGAAG TCATGTTTTG GAATGCCGGC TGTATCTACG ATACCAAAGT TGATrGGT TGTGCCATGG TGTAGGTTCA CCAATATAGT TGTCCAGCAT AGCCAGCCT'r GTCACGGTCA AAAGTCCATG CAACCCACAC GATCATT'TCC ATAATCTGAC TGTTCATAAT AAGAGCTC'rT AACAGGAGTT CCGTACCTGT CCAGATAAAC AAGCGGTTGC TGTTTTCCCC TACGCTCAGG TCCATTGCTA TGrrTCAArr CACCTCAGG TCTGATCCAT AAATCAACCA GAATAGTTAA ATCCAACAGC ?rrACCGAAAC GGAkATI'TATC ATAACCTTAA CCAAACC'Trr ATTTCATTAC CAATTGACCA GTACG'rAGG'r CAAAATCAGA TTTTCAAAGA AACGTCCATA TCTTCCTGAA CGAGTAAACC GCGATAGTAA CTTCCACG'rG TT 'TGGATGC TTAGCTCTAA ATCGAGT'CA TCAGCAA'TT TGCTCCCATG GTAACATAGC AACAGTrGCT AAAGAGI'GGG CATGAAGATA GCAGGGTTGT TACGGCTAGC TGAAGATG'TT GGGC~rrTA ATTATCTTCA TCTCATGCCC TCCGCTACCA GAGTCTTATC AACATCCTTG CATCACCATT AGCTTCACCT 7TTTCCTCT TTCGACCA'rG CCATT7*rTCA
GTCATAAGGT
TAGT'rCTGCT
CCTTTTCGAG
TTCTTGCCAC
GCGATT'rGCA
CTCTIGGGT
CATACCACGT
AGGTTTGCTC
AGTGGCATCT
ATCAAAGCC
ACTACCAGGG
ACGGCGATAT
GGATACTCCA
TTGTGGG'rTG TACGGATGGA GTTAACTCCC ATCTCCT'rCA TTTGTTTGAG TCTGCTTAT AGrTCTTC TGCTCCAAGC GCCCCA'rGGT CGTGGTGCAA TGGAATTTAA TACGTTCACC ATTCAAAGAG CGGTAACCAA ACAAATCCTT CTTAGCATCA ATCAATTCGT ACAAGGCAGG T'rTGTCATTT TCTAAAATCG CATCTAGGCT TG'rTGATTCA AC.TAAGCCTG TTACAGCATG ACCACCTCGT TGGTCTTTGT CGTCCGTATT GACGATTTTG TG'rTGT-rCTT CAAGTTTTGG TGTTAAAATA TCTGTCACTT GTAAAGTCAC ATCACGATAG GGCTGTTTGT TCACTGCATG GACAGCAATC T'rGGTGATAT CATATGAGAA CTGG'rTATAA TTGACATAAA CTTGAGAATC CATGTAGACG AGGTC7'MrTT CATCTAGTTT GAAAGTCTTG CCACCTTCAT TTTGTGCAGG AGATTCATGA GGTAAATCTA A'N'TTTCCA CGTAGATACG TTGCATTGA GTTTAAAGTA CCAAT'rrTGA TGATTCACTT CTrT'TGT TACAGCTITA GAAGCTTGTG ATTCTATCCT TGGAGCTTTT TTTGGAGTTA CGCCTTCATC TTCTTTCTTC AAACCTTCAT T1'GGAGTCCA GTGATAGTAA ACCAATTGAC CGTCACCGTA AACACGCGTA AAAACAGTCC AGAG7T=TGG TCTT'rCAACT TGTGCTTTTA AGGTACGACT CGCTGTACGA TCAACGATTT GATATTCGGC 'rACAAGTTCA CTGGTCACAT GAGTT'rCAAC CT'rGCCATGT 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 GTTGTCCCAT TTTTCTCAAC ATACCACTTC CI'GAATACCA ACATTCTCAC GACCATCTTT CCATTTGGAT AATGCCCCAC
ATGCACCTTA
ACGGCTACTT
TTGAAGGTAT
TAACTGACCA
CCATCAAAAG TAAGGCGAAC A'r'r'TTCTTG CGA'rACCAAG CTTCCCCACC GT'rGAGCTGT TCGAAATCGT 'rAAAGA'rACT TCTGCATCAG GTTTAATGGC TTAAAATCCA CTTTCCTGTC GCATCTTCCT TGAGCGGTTT TCTTCCGGTT TAGCAGACAC
CCAGTCATAC
TT-CCTTAGAA
TTCAATCATT
TTCTTGA'rT
'ITTTCCTCT
'rCAGA'rGCAA TAGCCTCAGT TGAACTAGGT TCACrG= CTGTCCTTTC AACTATATTT TCTACTATCA 71NTTTCCTC TTTAGCTTTC GCATAAACTA CAGATrCI'CC AGCTATATTT 938 TTAGTTTCCA AAGCTrATC ACCCTTTTCT TCAGCAGTAT GAG'TAATAAG TG=TCATCC CCTCCTAATA AALACTGCACA AGTcccAATc CGAATGCTAT AAACTC?rTT CcGA'rrCCAA 'rATATTTACT GCAGTTAGCT ACTACCAAAG GTTTCAACAA TrTACACTCT GCGAAAATCC A'rTACTGAGC AAGCTCCCAC AGCAAACTTA 'rGGCCTTTCC CCA'rAAAACC CTCCTrATAT CCCAAGTGGT ATACATGCTA TGACAACCTA AATTCAAACT TCCTCAGTGT CGCCTTGCCG TCTACAACCT CAAAACCATG TTTTGAGCTG CCATG7 G AGCTGACTTC CTCAGTTTCA ACTTCGTCAG TCTTATCTAC AACCTCAAAA CTAGTTTGCT CTTTGATTI-r CATTGAGTTT TCCTGCGTAC TCTTCGTTAC GTTTGATCAT ATAGAATACA AGGAACACTA CTGCACCAAG ACCAA7TGTC CAACCAAGAA GTTTCGAT TTGAGTTTGG CTCACACCTT CTGCGAAGCC TGCTGCAATA AGTGTACCTG AA.AGAAGGAA CATACGGAGC AAT'rTACCAC GAGTTACAAC GATACCTGCA AGTGGCAAGA TACCATTTCC CATOATTCT GCAAGTACGT TGGCACAAGC CCAGTCAAGA CCGATATTGA ATTTACGTCC ACCTTGTGAT AGTGGTTCTA CGGCTGCGAT CAAAGATACA CCGGCAGTCA AACCAAGAGA ATCTGCATCT CCAACACC CAA'xrGGATc
TAGATATGAT
ACTTCGTCAG
TCTACAACCT
TACTGACTTC GTCAGTTTCA
TTTCATCTAC
CAAPLACCATG
CTGTGTTTTG AGCAACCTGC ATATTTTATA GGAGCGCATT TrTTTCTG TACCAAGCAA
AACCTCAAAA
TT'rTGAGCTG
GGCTAGCTTC
ATTTTGCTTT
AGATACCGAT
CATTrGCTTTG ATATCACCAG TTGTAGTGTT TGGT'CCTTCA AGAGTAGAGT GAGTAATCAA ACCTACACCT TTAGCAAGTT CTGTTGCAAA GAGTGGCAAC AAGAGTGTTC CGAAGATAAT CAAGAGAGCT GGAGTAACAC CCATACCGAT AACTTTTGAA AGAAGCACTG CTTCAATCAA CCAGATTTCA GCACGACCAG CGATGAATGG TTGAAGACCT TTAGTAGCAA CGTTTGTAAT 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 GAACCATGAA CCGATAAGTG AGAAGAGTTC CAACCATCCT T'rGATAACAA GACGCCATTT TGGAGTTCCC ATAATACCGA TAACGATACC CCAGAAACCG ATTTTCTTGT TCAATTTAGC AAGGATGAAA CCGA'rGAAGA ATTTAGATCC AGCATCAAAG TCATATTTAT CAACGCCTGG GAAGAATTT TCAAAAATCT TATCCAAAAC CATGATAACT GGGTTCATCA TGTAGTTCAT GTGAGTTGAT GTCATTCGTG ATGAACTTCG CGCGTTAAGA AGGTCATCAA ATGTACGTTT CATCAAGTCA GAGTTGATAA TTTT'CAACAC ACCGACAAGG ACGATAGCTG CTGTAGCAAT AAAGAGTGAA ACCCCTTGAC TCACACCATT GTTATCAGCA TACCATTTAA TCAAGAGACC TGTGATAGAC AAGTGCCAGA TATCAAAGAT ATCGACATCA AGTGTATCTG TTTCITCAT AGCTAGCATC ACTATGTTGA CAATCAACAT GATGAGCAAG AAGTATAGTG TCCAAGCAGA ACCCCAAGrG ATrGTAGCAA GTGGTGCCCA
ACCAACGTCG
TGAGAAAGCA
TTTGrATACCA
CAAAATGTT
GATTAGGTCA
ACAAATTAGA
GGAGCGCTCA
GGTTCAAAAC
'rCTTCGTTTA
AGTTCTTCTT
GTAATACTCA
GTG?1'TAGCA
CCTTCAAGCG
AACATGATGA
AAGATrGCAr
ATTAGCTTAA
TTGCTGGGAT
CAAGGTCTGT
CATCTTTCAC
CTAGAGCACT
ATTGGATACC
TACCGATGAT
CTTTGGAGAA
CAGGTCCACC
CCATAACAGT
TCCGTGTTCT
ACGGAATAAG
TGCAGCGATT
CATGACTGCA
AGTGTTTTCA ACGAAl'rMG CTAGTGATGC AGCACCGATA CCTGTAAGAG CGATGGCAAG 'ITrCACTCCA AAAAGTAAAG CCAATACTGT CATTTCTAAG ATGGGATTGA AAACCTTTCC 'rCCTCCCTT'r TTGATGTTAT ATGAATGTTA 'FTAATAGCTG CTTCAATA'1r GTCAAATACr ATTGGCCCAG CTTCGATAAC TGGGATACCT GGTGTAAAGA TATCGTAACC TTTCATAAGG TCACAGTGAA CATCATAACC ACGGTTrGAA GCAAGAATTT TAATCATTTA GGTTGCTTCA GCAATGTAAG AAGATGACCA TTTCCTGTGA ACTTGAATTA TTAATGATAA CATATTGTGA AAAGTrATTG ATGAACAATA TCTGTGTGAG ATCTAAACCA GTTGGAAATG TTTAATTTGG TGACTTGAGT TAACACCTGC ACCGCAGGCA GATI-rCCTCC GATTTTAT'rT 'rTTAATAGAC AAGATTAAGC TATAAAGGGC 'ITC'rGGT'rCA GAAATTTTTG ATAGGTCT'rC AGAAGTCCAT TAACTGAGCA AGAATGTTCG TTTGACTrGA AGAAGAGTAG GGATACTTCT ACTTCCTTAT CAGGAGCTAT GTTTTTCTA.A TCGAACAACC ACCACTTTCT CAGCTAGATT GAATCGCTAC ATTTGGCAAG TCCTTTCCTA GAAATTCCAT ACTTTTCACG CGTGATCAAG GCT-rCACGAT AAGTTGGAGT 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15363 120 GACAATrTCT CCTTCTTCCA ATAAAGTTGC AACCTGATCA CGCTTCTAAG CAAAACACAA GGITrTTGTC AAAGAAATAA
CGG
INFORMATION FOR SEQ ID NO: 140: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 28882 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear AAGAGTTGTT CTTGACTATC TCTAATACCA TAAGTTTTTC
S S (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140: TAAGACTATT TAATAGTGGA GTGAAATAGG ATACGAACAA ATTGATTAGG AAAATCAAAT GAATTTATAG AAATCTTTTA GCAGTTATGT TATCCTATTC TAGTTTCAAA ACGCTATAGA 940 AGCAGCArrG TGC'TAGTCkA GAr1'CAGTT ACTATACTAA AACGAGTAGC TTGAAATCAA AAAACCCACC CTCACAX3GCA GGTITATCT GTA'N'A7rCA GCTAGATTA'r GCTTTACC?? CTGAACCGAA TACGTCGATA CGTTCTTCAA CCGATGCTTG GATACr ACACCGTCAG CCAAGAATrr ACGTGGGTCG AAGAGrTTT TC7'rGTCGTA rrCTGCTTCG TTTGCTTCGT AGTCACGACC AAATTrACGA GTTGCGTTAC CGAA'rGCGAT TAACTTTGCC AACACCAAGT CACCG'rGCAA TACGATTGGG GGTCAAGACC TTCCCAGTTT- AGA.AGTCGAT ACCAGTT'rCA TACCGATGAT TCCATCTTCT CTTTAGCGTG TGCTTTrTCA GGTGTGAACC GTCAAACATG CGTAGTGACC GTGGTCAAGG GGTTAGCGAT CAAG'rTGCGA AAGTTTGGAT CAAAACTGGA ACTCAAGGTT CTTTGTGTTA CAAATTT'I*TC TGCTGAAACG TTGATAGCTG CTTGGATrrG AATCCTGGAA GAGCTTCTGT
ACTGGGTAAG
ACCATI'GCTT
TCACCACCGA
GACCGTGGAT
'rAGCGTCTTC
TAGTACCAAC
ACAACTTCTT TAGCCAATTT ATTGAAGTAT AACCAACTTC TTGGCATTCT GTGTTAACGT CTCATCAGGA ATACCTGATC CAATTTTTGC AAGTGGTCAA GT'rACCGA'rA CCAGCTGCCA GATTGGAGCC AAT'rCACCTT TTCAGCTTCT ACTGAGATAC AAGGTTTrrCT TCAACTGGAA GATACACTCA AGTGCATCTIr GATACCCATT GATTCAACAA CTATTTAGCA GCACCCATTG GCGCAAGATA GCTTGAGTCC GTTGTCACGG GCTGCT'rGGA CCTGTATA-T 'NrTATGGGTC C. ta..
TGGATAGCTA
GCAACTTTCT
GCTTTT'IrAG
AATCCACCAA
ATTGCCATTT
CTGGTACAGT
AACCACCCAT
CTTCTGCTGC
C'TGCATA.ACC
TATCAGGCCr ATCCCATTTA CATTGTTCAT TTTATCACTT TTTGCCA.AAA AAATCTAGTT TTTCCCGCAG 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 TTTCGATTGA TTTTCTTCTA ACTCCATCTA TGTAA.ACCCT TTCTCTCCCT ACTCTTGGAC GACTTT'rGGA AAATCTATAA AGAAGGTTAA ACTATTCTCC TCCA'rCTCGA AACGATAAGC TAATTTTTrCA TGTTCTAATA GACTCTTAAC CACAAAGAGC CCCATACCAG ACCCCTTGAC CT'TGCGACTG GCATTG'rCAG AAAAAGACTG GGCTAGTTT TCTTGTTCCT CTGAGCTACA GCTAITTCG ATAAAAAGTT CTCCTTCTCT 'rTCTCCAATT CGAACTAAGC CACCTGGAAC AGAGTGCTTA ATGGCATTCC TGATGAGArT AGAAAGAATC AACTTCATAA CTGATGGCTT TAGATAACCC TGCTGATGGG TCAAACTATT GTCTATCTGG AGCTCTCTTT CCTTGGCTAG CAAGGCATAA TCTTTGACCA GATTTTGCGT CATCTGGAGG AGGTCAATTG TTTCCCTATC ATCTCGCAAT TCCTGCZACAG AAGAGAGGGA AAGTATCTGC AGAACATCCT GATTGAGTTC ATCCACAATC CCCAAGGCAA CTCCCAGATA CTGGTCTCTA TCCTTATAAC GACCGATATT CTCTCTCATA ?T=CGATTA GGA'N'TCAA ACTAGCCAGC GGTGTTTTCA ATTCATGAGA AGCTCCTCGT AGGAATTCGA CCTTCATCTT CTCCAGCTGG AGAATGGCTT CATTCTTTTC ATGCAAGTCC GCAATA.ACAG TCAAGAGATG CTGGTAGAGG CTATTGATTT GTTCCTGAG ATTACCTATC TCATCCTTAG AATCCACGCG CCGACGGGTC ACCCGCTTGA TTTCCAAAAT GG4CCACCAAA AGGGAAATCA GAAAGGAGGC GATTTGCTCC GCTTCCTTNT GTAAATCCAT ACCGTCTTGC GTTTCACCT CGCGCTCCTC GTCCAGAGGA AGACTGTCCT TGACTTCTAA CCCCT'rGATA TCACTAGTCT GGGAATACAA TTTCCCTTCT AGGGACTGGG CAATGGCTGT ACTCAGATAA GTCGAACGAA AAAGAAAATA ACTAAATATC GAGAAGGTAT AGATAAATAT CGCTCCAATT TrATAACCAAC ATTGCGCACA CGCAATTCCT TGATATAAAC ATCAA'rAACA
CAATCGCACT
CGGTG.CAACA
CAGCAAGGTA
GGAAGCTAGA
AATAAAGAGA
CTTGTCCTCG
CTCTAACACT
TGCCTTTTGA
AATAGCTAAA
TGGGAATCCA GCTCCATCAT ATAGTCCGAG CGTAGATCTA TAGCGAAGAA ACTGGAGACT AACTGGAGAA TCATAGTACC- GAGGTTGTCT GGCGGTCTGT GTCATCTCAC CTTTGACGGT TGCTCGATAC TCTGCCTATC CCAATGG'N'T CCTCACGATC TGAAGGCAGA TAACCAGAAC CTTTGCAAAT AAACCTGTTC GTTTCATTTT GTGAGGATAC AATCCAAG'rC TAGCTTTTTC CGGTCAAAGCG GAACCTCATC TGTCGCTTTC AAGGCCCGGC CTTCATTTrTT CACTAGATAG GGCACT/rCTT GACCTGCGAG GCTTGCACTG CAGACGGCAT CGATAATCTG TCCAGAAT'rT CCAACTCTTT TAGCTTTCAA AGTCCACCTT CGCTTGAAAA TCGCGTCCAC TAGCCATCTG CCAAAGAGGC AACATCAAGA CAGGAACCTG AGCTTGGGCA TCTGGATATC AGAGCTTCCT GACCGTCCGC CTGACCCCCT CACGGATCAT CTCTCTAT7TT T-rTAT7TC GCTGGCCT'rr TGTCTGCAAG
AGATCGAGTC
GGCATTGATA
1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 GGTATCCTTG TAAGAAAAGA TTCGTCCTGT ATCGTAGTAG CCTCACTTTT AAAAGGGAGA GGGAGAAAGG TTTTCCAGA AAAGGCACTC ATCTrGTATT CCTCATCTTG AAAAGC'TGTC ACTGGTrTA CGAATCTCAG CTAGGACTTC TAAGCCGTTG CAGTAAAACC AGGGCCACCT CATAGCTAGA AAATTGCTCC TGCCTCAATA GTTTCATAGC CACAATCCGT CAAATAATCA CTCTTCATCT TC1'ACAATTA AAATTTTCAT ACTrAACTG TTAGAATAAA TACCTACCCT ATTTTCTATT ATAGTCTCTT CAACTGACCA CTAGATAAAA CGTTGTGAAA TTCCTTTCTC ATATTATATT TAAGCACTAA .AGTACAAAGA AAGCAACTGA GCTTTCGGAT TTATrGAA TTGTTAAATA GCCATTCCTA ACACAAGATG CAATCTTTAT TCTAGACTCA TTTTTTCAAA AGCTCTTTTG GTTG7=CT AAGGAGATTG CTTGAAGCAA ATAAAT'rCCA
AACCAATGAT
TCCACTATI'C
TAACTTTAGT
TTTCACCACT
TTGAATAGAA
TTTATTCACC ATCCAGCAAG GCGCCATAAC GAGAACCACT AGAACCAAGG CAAGGACAAA AATGATGATA AAGTCTGATG 942 TCTGAATGGA AATGTCTAGG CTCGACAAGG TCTTGCTAAA GCCATCTACT TCTGCACCAC CACCAAGGrr AGACTTGA GCCGCCTTAC TAGCCTGrr GGCAACACCT GAAGTCACAT TGGCAACGAC AGTGTTTCCA ATTGCACGGG CAGTGTAArr AGCI'AGGAAG TAAGCAGAAA CTAGAGCAGG GATAGCAATC AAGATAGATT CGGTGATIGAA GCTTGAGGCC GATAGAGAGG AGAATTCCCA CTTCCTTGCG TGAGCAAGAG GGCAAGGAGG CCATCTTGTA CATACCAGAG CGAGTGTGTA GCTCTTCCAG CCAAGTTCTT GTCTGCTGTT TGTATCCATA AAGTTTTGCA CTTGTCAGTA GGTTACTGCT CAACTGTTrC CTTGGCTCCT CCTTGTCCCC TACT'rTCCAG CCTTGTCGTC GTTGGTTAAG AGAACTGAGA AGCTCAAGCT ATAGATTCCT CAAGAGCTGG
TTGATACCAC
ACAAAGAAGG
GCAGTGTGAA
GACTTATTA'r
TTTCATTAT
CCGTGTTTGG
TGCTCTCCTT
TGATGCCATT
TTGCGTCCCC
TGTCTGTAAT
GACCATCAAA
CTGCATCGTA
CTGCCAAGTC
CGACTAGTTT
TTGACCCAAG ATACTTGCCT ACGGGCGTTG ATCCAAAGC ACCCCAGAAG AGGAGGTrrGG GTAGTTAGAG GAGCTCTTGA CAACTCTT'rC ATAACATCAT ATAAATGGCT GTGTCTTCTG AGCTGTGTTT 'rCGTAAAGTT GAGTCCCTTG ATTGTCACTT GATATTAGAG TCCAGTTTAA CTTGTGCAAG AGGATTTTAT ATAAGAACCA GAGACAAACT TGTCTTCTITr AGAGGAGTCA TTGACACCTG TAATCATCAA GCTACTTCCA AAACGCTTGG CACGATCAGC AG'rGAGATTC TTCTTGGTTT CTGGCGTTTC AATCAGGTCA TATCCAGTCA 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 AATCTCCCAT AGCTTGATA CGT'rrGACAT AAGACTCAAT GGCCTTGTTT TCGGTGAT'rr TTTTGATGTC TTCACCCTTG ATATTCCCAG CACCACGAGG GATTGATTTG CATGGAGAAG CTATTGGTGA TATTTTAAA CAGTAGCTCC CTTGATTGAC AAGCCGACCA AACTCAAGCT CGTTCCTTGG TTGACGCGAC GGTCTCCTGA GAAGCCTTG CGCCATGAGG AGAATAATCA GGAAGATGAC AATCGA=N AAAAACTTCC TTGTAACATA GGCAAATGCG TTGTGTAACA TAGATTCCCI' TTCAGATTT TGTTTTAATC ATTCTATTAA AATAAGCTC-A AATTATTTAC TAGTATGCG CGTTTCAGTC AGTTTCTTAT CCTTTAATTC AAGTGTAATA TCTGACGCTT GTGCCACTTC TTTACTGTGA GTTACGACAA TCACACATTT ACCTrGTTTTC TGGGCAAGTG ATTTGAGTAG TTCGACAATA TCTCCAGCAG TrAGGATC CAGATTTCCT GTTGGCTCAT CAGCTAGAAT AACTGGAGCT TCTGAGACCA AACTGCGAGC AATGGCAACA CGTTGCTGTT GACCACCTGA TAACTGGAGA ACATTCCGCr TGATCTGGCT TTCATCCAAA CCAAGCTCAA GAAGTGTATT CTTGCTTGCC 7TTTGTTGA CCAATCGGAT AATCTATCAA GTTATAATTT TGAAAGACCA GGGAAATATG ATTTTCCAGC GGAGAAAGAT GTGCATGCGA TGGTAAGAAT AGCCCTTCTT ACGAATATCC TCTCCTTGAA AAAGGATAGA ACCTTCAACA GGACTATCTA 943 GACCAGCAAG TAGGGACAAG AGTGTGGATT TTCCTGCTCC TGACTCCCCA ATAATACTGT AAAATrTCC GGGTTCAAAA TTATAATTGA TCTGATATAG GACTGCTTCA GCAGTArrCT 5520 5580
TATAACGGTA
TACATGATAA
ATAAGCAACT
ATAAACTGCT
GAGTTTGGAG
GGTAACATCT
AATTTCTTTC
AAGCAGAACT
TGCTTTGGCT
TAGGTAAGTT
TGTAATTGTA ATAAAGI'CAT GA'rT'rCTCCT TCTAACTAA GGTGA'TC TAAATAAGAA TAGGAAACAA AGGGCTACAG
AGAAAAACAT
ACTGTATCTT
GTGATTGCGT
CAAGAGATAC CAAAACTACC TCTAAACAGA CAAGTGCAAG TAAAATCCCC ACTTCATAGA GAATTAAGGC TCCAGCTCCT GCTATCAACA GCAAAG7TGC AACTGAGTCT T'rCAmrGTT AGCCTTGATT TTCCAAGGCC AAGTTTTCTA GATT'rTCTAC ATAGAAGCGT GCTGCACTGA GGCTACTTTC ATAGTCTGTA AAGACTTGAT ATTTCTCTTG TTTTT-TACCA GAAAAGATGC TTCCAGAT'rC AGACTGACCA GCATCCAAGC TCTTAGCCAA TTCTTCGTGG ATAAGGATTr CTTCTTTTAG A'rTGAAAGCC GA.ACTGGTAA CCGTTAAGCT AACCAAGTTA TTGTCTGCAG CGCCAGTCAC TGC'TTCCTTG TCTTTTAGTT TTTCCAGCCC CTTAATCr'rG CTTACAGATG TCTCTATCTT CTTAATAGAA AAAGATGTAT TTTTGTTGGA CTTCATCAGA GTCAAACAGG TCAGAAATAA AA'rAAAACTT CTCAGTCGCT
AGGATTCTGC
GTAAGCTTGC
TTCCI'GCAAC
ATTGTAGGAA
CCCGTTCTCT
TCCCATAAAG
CAAAAGCCTT
CCTGCTTCAT
CTTGAGCTTC
TTTCACTGAA
CGATAATCTC
CAATCTTGTC
TCTTGGAATC
AGGTTACATC
CTGATAAATC
TTGCGACCGT
CTAGGTCTGA
AAAAGATAAG ATGxCTAGTTG CTGATCTCCA CTTGCTAGTA AAATGCTGGA AGCAAAGCTC GATCGACCTC TTGCCTTTTC CAACCAGAGA GACAAAACCA GAAGATGGTC AGGAAGGTTT GTTTrCCTTT TCGACTTGGT GAGTCCGTCC ATrrCCTTAG ACTATTGCCC AAAAGGGTTT GTCAGAAGAC AAGCCTGTGA AACTCTACT G7'TGTCCTT ATGAACCCAA AGACCGTTCT CCCrrT'='GA AGGTGTCGCC CTTGGATGAA TCCTCAAGAG ATCACGCTCC ACGCTCTGCT CTCAAGTTCA GGAGAGACAT CA.ACT'rGAAT GTCTGACCAT 6300 6360 6420 6480 6540 6600 6660 6720 'rGAGTGATTT ATAAAGATTG CTGAAATTCC GGCCA.ATAAG TTCTGCTGAC ATAAGCCCAA TTGGATTCAT TTGTCACCTC CATATTTGTA AGACTATTAT AAAACCCAAA TTATGAAATA CGAAAAAAAA ATATCGAGTA GGGGATAATC TCTAGCCCCT CATACGTGCC GTTCGGCATA CGGCGGTTCA ACTAACTTTT AACGCATGTC ATAATCCAAA CACGAAACCA GTCCACGTTT TTCAAGGACT GGTTTTGATA AAGTACCGAC TTCTGAGCTA CTATAGTAGA TTGAAACTAG AATAGTACAC CTTTCTAC-G 6780 ACCAATAAAA 6840 GATCTTTGGA 6900 TATGAAATAT 6960 CTCACACCAC 7020 GTTCAAGGTA 7080 TAGCACGTTT 7140 CTCTACTTCT 7200 944 AAAATA'rTGT 'rAGAAATCGA lwrTGACTG;TC CTGAACAATT CGTCCTATTC TTATITrCATT TACTATAAT TGATAGTGGT CGCCCCAGCC AGATACCTTA TCTGCTATCC ATTAGGAAC CCCTAACTTrA AGCAATCCCC ATAATCGTCT CGATrC=C TTCCATTGCT TCCAGA'rAAT CACTCGTAGG CGAGTACGCA AGCGCTCATC AATCA'IrCC TrrCGTTTCA CTCAAGGCAC A'TGrrr ATAA'rAATAG TGAGAAAACC TAAGTGAGTG GTACAGCCAC TACCTCGCAT AAGTTGTACC ATCTGAATAA GrGCTACAA CAAAGGTTGG AGCTCCTGCT GGATGATTTT CAT'NmC GTATCTNTA TATTATCAGG CGGTTTAGCA GAAACAATI'T rrACTGTTAC TCCAGCA'N'A TCTN'AGCAT T-rAAT'rTrAC TATGCTAGTG ACTATACTTT TCATATTTAT AACACAGAAT GAAAAAGTGT TGCATCrrT TATCACTACT ACAAATCACG CGGAGGTGAA AT7NTGTCAC ATCAT'rTAAC GGTACATAAT TATCATrC ATGCTCTCCT TCACCTTTrAG TATT'rGCCTC TTTCAATTTT TCAATAATGG
ATTTTTCACT
TT'CTTTTTA
ACGTAAT'rCCT
AAGATTTTGT
TTCGAAGCAC
GAACTAGGAA
CTGGATATGT
TTCTCCAGTr
CTTCAGTAGC
AGGTTGATTA TCAACATTAT ATCAATCGTT ATATATAATG TTGTGAGCTA CTTGAAACAG GCTTACGTCI' AAATGAACTT ATAACATAAA ATTTGCATAA GCAGTCACAG TGTACTI'TCC AGATGGCAAT CCTCTTCTCC AACCTAAAAT CAATTGATA.A TCAACTTT'AA TTTCAAAAGA GC1'GT'rGCAT CAGACGTTTT ATGAATTGTT ATTATAAACA GTTCCTTCAT ATTTAGCTGT AACTGAAATT ATACCCAC'rA CCTCCCTGAT TATCTTCAAT CCCCACTA'rT ATTTGGCTTA GCAACAACTG TT1ATAGTAAA ATAGATTAGG GAAATCAAAG CAGCT'rCTAG GAATGTTTTA CAGCATCAAG CCACTATAAC TCTGCACA'rA AAAATGGAGA 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 AA.ATATTAAC TTCTTTACA-A ACCAACTATA CACAAGGTCA GGTCGGTCAA CTCTT'rCAAC ATC=Tr'ATT GGAGAGAATT GCGGTGCAGA GCACTATTTT CCCAAAGAGA GAGAATGATT
GTTGACAA.AG
TGAAGCCCTG
TAGAGTTGCT
'rCCTGAATCT TCAACrCTTC
GGGCTGTTTT
GATCTTGATC
TTTCTCCCAT
GCCACCATGT
CCGTCTGAGA
CCGCTTCAAT
CGAAAACATA
CCATTTATCA
AGTGAGAACA
CAAAATCATC TGGTGTAGAC ATTCCTTGAT TGGCTTCAAG TCCACGAGTC ACTCCAAAGA TAGAGCTGAG AAAAAGTATG AACACCTTGG TGACCCTGAC CTTGAACAAA TCCCGCTCAG CTTTGATTAA GTCTGATAGG GC'rTGATGTC CCAAACCTGA CCCAACATGA TACAAAGACG AAGTCCAAAG TCATACTCAA CG'rATCACTT AAAA'rATCTC TTACAGAAGT GTA'IrTGTCT TGTTGAAGCA ATCCTGAGCT CCGACCTGTA GCACTGTCTG ACAAT'rCGGA AAAAGAGTCC GCATCATATC TACCCAAGAA GCCAGATTr CCTGCTGAAA ATAAGAAAGA TGGCAATAAA CCAACTGAAT CTrTAAAA ACTTGCGGTG CCTGTCCCTT GCCCTCAACC AGATAGGAAT 945 ACCAAGGG'r TAGCGAACGA GCCTGCTCCT GCTGGGTCAA AAGGGCAACC AACTGCTT'rT 9060 CACGCTCGCT GAGCCCAGCT TCCTCCAGCA TGAGATAGCC CrCTTrCTCT ACTGGTTGGT GTAGTTCTTT TGCCCTCATG TT-CTAGCCCT TCTCAAGACG TTCCAGATTC TCAGTCATAT TGGACATACG ATAAGTCACG ACATCTGCAT GGCCACGTTT GTACATTTCT TCTTCrrI'TT CAATCAGGCG AGCCTGAGCC TTGGCTGACA 'rCGGTTTGGT CATACCTTTG AGGATGAGGT CCCCCTCAAA GGCTAGCGCA ATCCCGTTrAA ACTCCATCCT 'rGGTATCAAG GAGCTTAATT TCTGCTGTCG CAAAGTCACG ATTGGCACGC ATCTCTGCAT CCAAAACTTC CTCAACAAAG AGAGCTTGCT TGACACTTGC ATCATAGTTC ACAACTGTGA 'rACCGTTGGC AGCATI-AAAA T'rAAAGTT' GTAACTCr'rG GGCATCCACA T'rGAGATACT TGAGATTGGT CTCGGCATCG TTACGGTAGT GCTGGGTCGC AAAGAAGAAA GCATCGTGTA CCGTAATGAA GTTACCCAAG ACAAAGCCAT TGTGCATCCA GTAGT'rAGCA GCAAT'rrCAT TGGTGTGGTG TGGAAACTCT GTATCACCTA A.AATCTCTGT CGACA'rGACT GGTCCCCAAG GAC'rATCCCA AGAAATCTCA rITTrAGCCr
CCGTTAGGAC
CGTACTTAGT
TGTGTGGCTA TTGGTATTGC CTGCTCCTCC AACATCAGAG TGCTTCTTGA TGGAG=TAT GACGGCGAAT ATACATAGAA TACACCGCAT TGAGATTGAC ATCAATCACG TGTCCACCTC CCTTGAGTAA CCAATTGGTC ACGGATTT'GG GCCTC7?rGGC GTTTTTGAAT CAAGTCTCA AAATCCACTG CTGAGAAC AAAGGGAGCG CTGAAATCCG AGCCTCAGGA AACCZAGTCTT CCACTTTTTG GATGCACCAT GAAACCAAAC GGAGATAGCC CATAACCGCr TC.AAATCCCG
ACAATTCCAA
CCTGAGTTGA
TCTTCATCCA
TTTCCTGTAA
CGAACTGCCT
CGAAGTACTT
AAATTTCTAA CATATCTGCA TCCATTTGGC CATTTCAAAG TAGCTGCTAC AAACTTATCT ATGGTTGTTC GTAAGTATTC TTTCCGTGAA GTTGATAGGC GCCCATCAAG AG'rTTTAAGG 9120 .9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 GACTTAGACA T'rTTGACATI' GTCGATATTG AAAGCCTTGC CTGTTTTAGC TTCAGACTGG
ACGTCAGCTC
GAACACTCAA
CCTGGTTTGG
TCTACAGGAT TTTCCTTACG AGCCGTTTCT TCATCGGTAC AAATCTTCCA AGGTTATT AGCCAATTTA GCATAGTTGT TAGACATCCC CTIGACTCTC ATACGCAAAG CCTTTCTCGA ATGATGTCTG CCATAAACTC CACTACACGC GGATGGCGAG GCCGTCACAT CCTCACGAAA GGCAGCGATG TACTTATCCG CCTTCTTCCC TGGCACGGTT GATAATCTTA TCATCCACAT
CACCACCGTG
TATGCCAACC
AAGATTTCCA
GACCTGAAGC
GGCATTTTTC
TCAACTCTTC
TCGCAGGTTT
GATATCAATC
CCGACGTCCA
TACAGCAAAG
ACCTAGCTCC
TACACGGAAA
CACAAAACGG
CACGCCCAAT
CAACCTCCTG AGGCGTGATA CTGTAAAATT GGAAATATAG GCAACCTTAT ACCCACGGTA CGGGCGTrrC CTACGTGGAT TTGCCGTCCT CAATCGGGAC ATCATAAATC ATAATCAGGA TCAAGTAAAT TTCAGTCCGA AGATAAACCA CCrGAGTCTG ACATTGCTAT CTTCCACTGC ACGAGGGATT TTGCTAAGTA AAACCGTAGA CGTAATTCGT CCAGCTTTTA ATAAAATATA CTATCCACAA CTTCTCTCGA ATTTGATCCT GATACGAACT GGATAAGGTC TTCTATCCTT CAG'rTACTGG CAAAGTCAGG GACCGAATGG GGAAAGAAGC CCCTGACGGC GATGACCTGG TAGACAGTTA ATAAACCAAC TTTG.GGTCAA AATTAAGCAT TGGCAACAGT 'rAAT'rACTTI' GCTTGAATCA TATAGGTATC GCTCGAGTTT ATTGACATAA CTTTCTTACC ATGAAGACGG 946 CTCAAAATAG CGACGAATCG TATCAAAAGC TACCGTCGAA ATAGTTGTAC ACCGTTGGCC CACAAACATA CATCrGATC AAATTCTCGC AAATCACGAG ACATGGTGTC ATAGATTTTA AAGCTGAAAT CCAAGAACAA TTAGTTTCAT CACTAAAAGT ATATC1'CTAC ACTTCGGAAT -QeeTTGCTCC TTTCTCATTC TTTwGACAAAG CCAATTTI'T CATACAAACG TTTGGCACCT AATCTGAAAT TCCT'rGTCAT T'rTGCTCAAT TAGTTGGTTG GCTTCCATAG CCTTTTCCAC CTTCAGGTTC CAATATTGCT ATTAGTCGAT AAATCAACCG TACAAGTTCC AATAACCTGA TAGTCGGCTT TCTGGATCTT TCAGAGCTTC AGCGACATAT TT1CATGTTCC TCTGAAAATG CCTGAA.ATTT TAATTGACTA ATCTGCTAAC AAAACTTCAA GATGGGAAAC ATTTGCTAAC ACCTAACCAA GTTTCTGTCT CTTCATCCTC GATTAGTCCC ATGATTCTCT AAAAAAATAC GTTCTGTCTG AAAAGTGACT TGTTTCTCTC TCAAAACTAG TAAACAATGC ACCCGCAATC ATGAACCAGT ATCGTCACTT CTACATCT'rG GTCATCTGCA AAGTTCGCCY TTTTCATAAT AAAGGAAAAA GGCGGGCATG GTTAGAGAGA TAGATCGC GATAGGTACC GTCATAGTTT TTrCCCCTCA GATAGCTCCT CTTGCGCTTAA CTTGTTrrCTT CTCTACAAAC CAGACGATCT GTGACTGGCA TCTTTAGCCT 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 TACTCTCGTT TTTCTTCGAC TTCGTGAATG ACAATCTTGG CCGGAATACC GACAACCGTC
ACAGGCTCAT
ACGTCACTAG
CTACATCTGC TACGACAACT GCTGCAGCAC CGACCr'rGGC ATTTTCACCA ATTTCCACAG GCCCGATAAC TTGGGCATC GCTGATATGA GGGCTCCCTT TCGTACAGTC GGATGGCGTT TGCCACAGTC TTTCCCTGT'r CCCCCGAGAG TCACTCCG1TG ATAGAGAAGA ACCCTF'rTT CAACAATCGC TGTCTCTCCA ATCACCAGAC CAGAACCATG GTCAATAAAA ACACCTGAAT CAATCTGGGC TCCTGGATGA ATCTCAATCT GAGTCCAAAA GCGCCAAAAC TGACTGTACA TACCAGCTAA TAGTTTGAAG CCGTGCTTCC AGAGAAAATG CGAGAGACGG 'rGGGCCGCCA AGGCCTTGAC ACCTGGATAA GTCAGCAAAA CCTCCAAAGT GGTGCGGGCC GCTGGATCAT TTTCTTTAC AATATCAATG G~nrCGCGCC ACCACCCCAT ACATTTCTCC T'r'rCTTAT
CTGAATCTTT
GGCGGTGAGG
GTAGAAGAGC
CATCAACTC
TCTCAGAGAT
TCTCGATACG
TGATCr'rcr
GCGCTCAGAC
CTTCATAGAG
ATCCCCGAT
ATGAACAAGG
AACGACTTTA
GTAAA'rTCTT TCrrAGC1T'T GTAATCCTr TTTTTCACCTT TTTCATCATG CTCAGGN'T GCATCGAPAC GGCCT=?~C ATCAATTNTG TCTACCAAAT CCTCTACACG ATTGGTACGA GCATCTGI'CT TATCAAAGAG GTTAACAAAG
TGATGACGTG
GGCGGACGAG
ATAACCTTAA
GTCCAAGCCA
GCACCAAATT
0 0 0 000000 0 0 .000 0 0 0000 0 0000 0 0000 0 0 0 CAGCAATAAT TTCTTTGGCA T'TCCTTCTTC GTCTATATCA CTCCACCCTT ACCCATGACA TCGGAGCAGT TGGAGCCAAT G.GATTTCAAA ACGCGCTTC TCCCTTGAAT CTTGATATCC TGAAGTCCAT ATCTCCAAAG TATTTCCATC TGAGATAAGC CACCAGCCAT AAGGGCAAGA ATTCCAAAAC TTCTGCTACr CI-rGAGCAAG AGCACGCTCA CGTAACGACC TGTTT'CCCCT TCTTGTACTC TGGATCCAAA AGACTGAAAG AGCTTGAGTr GGAAGTCAAC AACCGCATCC CCTTGTCTTC TGTAATTAAA GCACGCTAAA CTTCATCCAC 'r'ITGGCcTCA CGAACCAAAC CGGTTAATAG CATCTTGGTC ACTAGAGTAC ATAGACACAT ATCTTAACAC CTGTrTCAGC GATAATCTTG TCGATGGTTT ATCTrAATCT TGTCCACATC AATCTTGATC GTATCAATTT TCTGGACGAA CTTCTGGAAT GGTTGCTTCA ATGACATCAA T'rGGCTTGAG CAAGAGCCTC CGTCAAGA'rr TCTGCAGTAA ATTTGAAGGG CTGTAATCCC ATCACGAGTA CCTGCAACCT TGATCTTCCA AACCTrGGAT ATCTGTCAAT ACTGTGTAGT CCCATAGCAA TACCAGCTAC TGGCGCCTTG ATTGGCACAC
GTTCCCGCAC
AGACGGATAG
CCAAGGGCAC
ACAGAATAr
CCATCAATGA
TGCCCACGAG
AAAGGACGGA
CGTCGCACTT
AGATAGAAGC TTGAGATGAA GAACCGTTTG CGTAGGGGAA TTrCTTCCAAG CTTGGCAAGA CGTGACCGAT TTCACGACGA CCTGGCGCAC GAGGGAAGTT ATAGTGGTGC ATAAACCGTT TTTGAGTTrC TCCCATCGGA CCCAAGGTCA TAAAGAGACC TGAACCATGT ACACGAGGAA 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 CCACATCACG CATAATACGG TCAAATTCTT CAGTCACTTG GTC~rCACT ACTTGAGTCG GAACTGCCTT ?TGGACGTCA CTGT'rGTAGG CC-ACGTGAAG CAATTCCACT TCTGCT7"M' GGAAGGCAAT CAATTCTTTG ACAGCTTCGT CTTCTGACAA TTCTTTGGCA CCAGACTCTA CTGTCAATTC AAGAAGAGAT TGCTCTGCTT
TTTCATCGAC
CTGCGTGTTC
CGTGGTCCGC
CAGCTTCACG
CTGCAATCAT
CTTTACCGAC
GCCCTTTAAG
CTTACGACCA TCAGGACGCA CATTTGTTCC AAGATTTCAG ATA7TTTCT TCGTAAACGG GGCCAATTTC TCTTCTACTT TTCAGCTTGC AATTCAGCAT AGCAGCAACG ATTTCTTCTT GAGCGCTTCC AACATGATTT CCATGTTGAT AGCGTGCrrG GTTCCAGCTPA GTTCTTGACT TGGGTTGATG ATGATTTGGC 948 CATCTACATA TCCCAC7T=G ACCCCAGCAA TTGGTCCGTC AAATGGAATA TCTGAAATAG ACAGTGCCAA AGATGAACCA AACATAGCAG CCA7TWGTGC AGATGCATTT TCATCATAAG AAAGCACTGT ATTGATGACT TGGACTTCAT TACGGAAACC TTCCGCAAAC TCGGACGGTC AATCAAACGC GCTGTCAAGC TCGCA'rCTGT TGAAGGACGT TCATAAAGCC ACCAGGAAAC 'rTCCCAGCCG CATACATTTT rrCICGTAG GTGGGAAGAA ATCCCCAGTT GCCATT'PTCT TAGACATAAC GGCAGCAG'rC ACTCACCGTA ACGTACGACA ACAGATCCAT TTGCTTGCTT AGCAACCTGA
ATAGGACGAA
CCTTC-ACGTT
TTGACTTGGA
AAGACATG
CCGTCTCTA
too**: :060 a CAPrAACTC ACGACCCGCA TTGGATTGAT GAAATTATAC CAAAGTAAAA ATAGGAAACT ACACAGCT1'T TCGGCCGTGT CATTTCTGTA GAAAAATAGG TTTT'rCCTAA GAAATGAGAC CAAACGAAGA TAGGAAATCG CACAGAGTTG TAGGCAAGT1T GTATCTAAAG CTTNCACGCT CTCGTCAAA'r AACATCGAT TGTATTATTT AATACCTTCA T'r'CTTAAAA AGTGAGGTCT TGGTTTGCTT TTATTATCCT TGTATTTTGC TGTATCAGCA TTACCAAGTC CACGATTTGG GGTCATGCTC ATACT'rGTCT AAGTCTTCAA GAGTCGGAGA TGGGAACATC AATCCCAGCC AAAGTCGTTT GAAACACTTG TTTTGCCATT TTAATCCCCT GCCTTGCCTA CAAAGATCAA GATACCAAGG ACG'TCAAAAG GACGAAGTCT TCGATGAAGA CAAGACAGTT TATCTTT TCAATTACAC AAGATATTTT GGACGGTTCG GCTTCCGAA AAGGTGACG'r CGCACTCGAC GAGTGCTAGG AAGCTTATCT CAAAATTCAA GTCATCAAGA TACCAAGCCG TCAAGCAACT AACGACGGAG CGACTACTCC TAGGGAGATT TATCTTTTC CAGTTTTCAA GATACATCAT TAGAAAGGTT TAATACTAAA AATCGCTATC GGGCGATTAG CTAAATGCTT TACTAACTCT TGACTCACTC GTGTCGTTAA A'rCTTACAGT TTAAATGCAT TCTTTGTATC AAGTACGTAC AGAATTTATT TTATCATATT T'rACCATTAA AAAGGAACCA T'rCCCCTCAC CTGAGAAGAA AGAGACTGGT GATTAAACAA GGCATGGGTT GC'rTGATGGA TTATTCATCG TATAGACATG CACACCGGCA ACATCCTGAG TCCACrGCAT AGGCAAGTCC TGCTGCTCTG ACACTCAG AAGATGGCTT TAAATTTGCG TGGAAGATGG ATATTCTCAC GCCTGATTTC GATTCAGAAT TGGCATA ATT CCTGCATGAA AAGATACACT TGTCCTGAAA ATCATAGAAG CGCTCATTGT AGGCTCGAAC AGCCTGCATC CACTTTCT-rC TTAAGATTTT GGCGAATCTG GATGCCCTTC TGGATAGCAA GCTCCAATAA TCCTTGATAA ACTCAATCAA GTCGGTTGCA TAGCGGAAAT GGAATAATAT CCCCACGAAG AGCCAAGATT TTCTGCACCC ATAGTTTCAG CAACCTTGTC CTTAGTTAGA 'rAAATAGCTG 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 00 S 0 *604
CAAAGAAGAG
GAATATCTGA
TATCAAAGTG
CCTTTTGTGG
CAACTTTGTC
CTGAG?1'ACG
AATCTGATTT
AGGGGTTTGT
?TCCACGTCT
CAAGTCAGCA
949 GCAAGTGGGC AATGGTCGGA ATCGCCAAAT TCGTTTCCTT GATATTAAAT TTA'N'ATTGC ACTCCTGCAT ATCCTGCAAG GC74GAAATAA GGAACACTTC AAATGAGAG'r GACGGTGTNT TGATrrCTTT TTGAACAACC ACTG'rATGGA CAGCTTTAGC TGCTTCAACA AGGCGGATCA 'M'CAAACC ACAGTCAGGG TTGATCCAAA CTTCGATTGT GTTGTCGATT TCGCCTTICAT CCCCAGGTCC CACTTCTGTT TGGAAGrTT TTGAACGGTTr AGCTTrCAAAG GAAATAACG'r TATCTGTAAA 7TCTGAGTAA CACATGTGAG AGTGTACCAA GCGGAAGGCA GGAATAGCCC GGCGGAGTGG CAATTTTTCA CGAAGAGCAG CAGCTTCAAG GTCAAGTACT TCATCCTTGA TOATAGAGAT GTCTTCACGT GGGAATGACC TACCTTTAAC AGGTTTGTTT GTACGACTTT GGTTAAGACG AGTGACATCA CCCCAGATGA ATTGTACCCA TCCATTTTTA GAGAAGAGGT CCATGTCATT ACGC1'CAA.AT TCACCGTGAA ACTTGATCCA TTCGTCAATC GTTTCAGCAA CACCTTTACG GTAAGCCAAA CGTTTGGCAC TCGT'rGITGT TGGAAGAGCT -GGAAGTTTGA CAAAGGC1'GG CAAACGAGTG TAGTCTGCGT AGCTTT=Tr
CTTTCTTGCT
TTGGTACACG
TCGCTTTGAG
CTGCATCCAT
TG'rGGAT'TTG
AGTCAAGGTA
CCTCGTCGAT
TAGCAACGGC
AGTTrAAGCAT
GTGCATAGCT
TTGGTGGT
ATCCTGACAA
CAAGGACATC
GGAPLAGCGTC
GAACTTCTTT
AAGCTTCTTC
CTGTCAAGCC
AGAGrTCTTT TGTTTCTGGG ATACCACGTG TGGCACTTTrA GCAAGGATG AGGTCAGTGG ATATCGTAAA TTCGTCCAAG ATTTCAAGG.T GTTATCGATA GCTGGGATGA TGTCTCTGGC GCTACTG'TTG GTCTTCGTAC CAGTCGC1'AC TTGGATGATT TTCACACCAG GATTTGGAGA GTTGAATCCT GGTA.ACAGGT CCAGTCAACA AGACCAT'rTA APCAGTGATAG TACCCCACGC ATACCGTATG G7'rTTGACCG AAGTACTCA.A AAAGTCAATA TCTTCTTGCC CTACTCTTTT TGAGACAATT TGTTTGAGGG AATGAACCAA 'rTGGATAGCT TCACGTTCTG ACGATACGC GCACGAAGTT GTTGGCTGCA AGAGCTTCTG CATr'rTGGAT AAAGTCAGCC AAACGAACCG TGGCAGTrAC ACTGATAAAA TG.GGGAGCCA TGTTATCA?'r ACCCACGGCT GGGrrGGAG rGGCGTGACAT ATGTAATAAC C7=rCTAGT GAGAAATCCA ATCTTACAAT TTCTCACGCG 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16850 16920 16980 17040 17100 17160 17220 17280 17340 17400 17460 17520 17580 17640 17700 17760 17820 CAGCATTTTC ACCAACACGC AACCTTGACC ATTTCGGATA AGGCAAAGTG GTTCAAGAGT GAAGAAGTGA GCAAGAGCTT CCAAGCTCTT TTCGTAGTTG GAGTCTTGTC AGCTGGGAAG
TCAGTCGCAA
GCATCCAAGT CACGGATTTC ATCCAATTTT TCAACTGCAA GCTG.GTTCAA ATTCTTCATT AGCAGTTGTA AATGGCACAT GTCAAAACGA TGTwr-rCAGC TGGAAT'rTGC TCAAGAACAG TTGCGCCAGA TGTTTTTACC ATTGACAATA CCTACATAGA CCACCTTTAA CGAGTTCAAG AGT'rTTCTTA CCTTCAACAA
AGTCAAGACC
CACCGAAATA
AAAGGITCAA
CCAATTGGAT
CCACTAAGCT
GGAAAGTGAA
GCAACTCATC
GAACGATGTA
C7rTTCTCC
GATAGCATCT
AGTrTGAAGC
GAAGAGAGCT
GCGAGTCGCA
ATCTACGAAG
GGGACCTACA
GAAAATCTTG
GTGGTAGTTA
CTGGTAACCA
950 ACTGGTAAGT T1TACAAGGTC AGCGTATACG TCACGAACAT AAGACT'rCAA GACC7TTTrTI GTCAGCCAAG AGrTGTTGT TT-TTCTTCAG CTGTCAAGTC TI-rrAC.AAGA GCCGCCTCAT
CCAAGTTCAG
TCGTCTGCT'r
AGAACAGGAC
TGACCAGCCA
GTGTTGAACC
CGTCCCAAAG
CCAA'r-r'AGIC AAAAAC,-rCT TCACGCCTTC TTCAAAGTCT GAGTGTTCAA TCCAAGTTCT A TTrACTTG AGTGTCTTTT AT?1'C'TCAT TGGAAGGGCG
TGGTAAGCAG
GACAATTGAA
TGCTTCT
TCAAA'rTrAG
CGAACGTCCC
GACAAGTCCA
AAGTTATCAT
ATGTTCCAGT
TCTTTTCTAA
AG1TTTCAAC GGATGCAGGC ACCACGTTGA ACTGAGAAAA GTCATTTGAT CGAATTTCAG GTTTAGCACG CAAGTC?=~ GCTGCTGCTA AGTATTTTTLC AGT'rGTAAAT TTTAATTCAC
CGAAGTAGCG
AAAGGAAAGC
TGATCCCTTT
AAAGTTCTTC
GGAAT'rCGCC
CTCAAGGTCA
CGCATCTAGG
TTC'NGACA
TTCTGAGATT
CAAACGAGGG AAACCGATGA TTGTAGTTGA CATGATGTGT CCTCCAAAAT TTGTTGTTGA AACTATCT'rA ACAGAAAAGA AAGCGTCTGT ATAATTGTAA AAAATTAGGG 'NTTGATATAG TTTGAA-ACTA TATATCTGTT TCGGACAAAA GAAAAAGACT TGAAGCA-AAC GTCTCAAATC CTT'rGTAATT CT'rAC'rTAC AGCTATATTC CAATTAGAAT ACTAAAACAT GTTATTAGTA AT'rCTTATAA GTGACTATGA 17880 17940 18000 18060 18120 18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19500 19560 19620 CCTG'rTATTA GAAAAGACTA TAACTGATTC TAGTCAACTT GATTGCTAGT GTCTTTCCTA ACGGCTAG GAC=rr'AAG AGTCTTTCCT CTCTCCATCC TAGCTATGAC AGGCTGGCT C7'TrTCTGA CTGATTCCTT GTTCATACCT AGCCTCAATC TCGCATATCA CTTTCAAGGA TTCCTCCTT GCTAAAGAC TTACTCCCAA TAGCACTATT CTTCATCACT TAACCCTCTT TTAAAAAAAT GAGcGAAT'rA TGATTCGATA GArMAcCAG GCC'rArTCT TAAGCGATT'r 'CCTTTTCTA GGATAAACCA AATTTTCCAC GATGAATCCA ATAGTAAATG GTTGAAATTC ATCACCATCA TTTCAGGCGA AAA'TTIrTCG 7rATGTTTTT CTTTTCCTTT AGTTTCTTAA GACTGTTCAG CGTAGTCGGC Tr'rCCCTGTT CAAGTGGGAC ACTGTATCCA ACTGAGGACT ATTCCACTGA CTTCTT-CCAG AACTCGCTCA TGATAGCCAC TCAGATGGAC ATCCTTCCAA T7'N'TTACGT CTATGTATT'r TGGTTAAA GTTGGTGCTA GTTCCTGCTT GCTTAACCCC CCACGTTrAAC
GGTTATGTAT
AGAATAAATC
CCCTT'rAGCC
AGTGGAGAAT
TCTr'rGAAC
ATAGTTIGAG
TTCCATC=T
GCCCT'rTTCC AAGACATTGT GAGCTTI'rCC AAGTAGAGAG CGGAC'IGTCC CACGCITrGAT TTCAGTGTGG GCAATTTCTC TATTTGATTT TCCTT~CTr'rT CGATTAAGCG ACGGCTATCG ATTGTCAAAT GTTTCCTTT TGTAGTATAA TTGTCTTGCA 198 19680 TTTCTGTGCC ?TT'rAATCAT TTCAATCrrA AMrrGGACTT AA'rCTA'rGAG GAAGACAAGA AAAAGAATAT CAATCAAGTA CTCCCAGCAA CCA71rGCAAA rTGAGGTACT CACACAATGA CTTTCGGTCA TTCTCI-TrrC TATCCCTATC ATAACTTATT CTTAACAT?1' GGCTATCTAC CCAACCTATC TTGGCACAGA 7*rTrrACTTc
AAGTCACAAA
TTAAAACATT
Ci~irlrccc
TTTATCCCTT
ACTGCAACTA TGCCTGCTAT TTTAAGTTTC TTATTrTTTr TCCTATCTT'r AATAAACAAA TACGGTTTA CTCTGGCATT T'rGCTCTTAC TATCGC'TCAT T'rCGGAACAG ATAAAACCCT TTCTTCTGCA TCAAATAAGA CTAAAACCTT ACTTGGAACG TCGC'rAA'rCA AATAGAAGCA CAACATATTG AGCGAA'TTTT GACGCCGATA TGGCTATATT CCCTGAACTA GCTACCAATA TCAGAGCTGA CACACAATCA AACTATTGTT TCATCAAGTr GGACTTTCTA TGGCCAACTA ACTTCTCCAC CTACCAATAG TGGAATAGCT CCTGTGACTG TGATTGTCAA GTGTACTT 19740 GTCACATTAG 19800 1'CTCTCTGCC 19860 ATCTTCTAAT 19920 CCCCTTAGCT 19980 TTACAAGAAA 20040 ArrACTATTA 20100 AAAATTAGTA 20160 TAGCCATTTT 20220 GCAAGAAAAC 20280 TGATATTTTC 20340 GAAAAGTTAT 20400 GCTTTCTATA CAGAAGCTAA AACTTTTCAT ACAACACGGT TCGGGACAAT TGTATTACAT TCGAGAAAAC AAAATATACC ACATATCATT GCCTTGCATA CTGCGCCTCC TCTCCCAGGT TTAATGGAAA TCTGC.AAGCA AGACTTAAAC ATCATTCATA ATCAATTGGC TTCAAAATAT CCAAAGGCTA TTATTGCAGG TGA7=NAAT GCAACTATGC GTCATCGAGC ACTTGCAAAA ATAAGCTCTC ATAGGGACGC AGCCAAAGTC CAAAACTTTT TACTATGTTA AAGATTTAGA ACAGAAATCA CATTTAATT
ATTAAATGCA
TAATGCAACA
CATTGTAAGT
ATTTTATATA
CTGCCAC=T TTGAAAGAGG AACTTGGA.AT ATAGATCATA TTTTATTGCC TAAAAACCAC TT'rCAAAACT CTGATCATAG ATGTATTTTT AAATCACCCC TCTAATGT'rC ATAAACTAGA 20460 20520 20580 20640 20700 20760 20820 20880 20940 21000 21060 21120 21180 21240 21300 21360 GGGGGA.ATTrT GTATCCTACT ATCG=TAAC GCACTTCTGC ATTGACTT TCTTCGAGAG ACGCTTGGAT r=TTCCATA TAGCGTGCCA CTTCTTCGTC CGTTAAGCTG TCTTCTGGAT
TTTGGAAGGT
AGACGTCAAA
CAACTTCTTG
GGAATrTGCT
GGTTAAGCTC
CAAGCTATAA
GAGTTTGATA
GTGAGTCACT
CATTTCCACA
AGCTACATAC
GCCATTGACT TCATACCAAG TCCC-AG7=T TCACCTGAGA TCTGTCAAAC GTTTCACGCC GGCAGCTTGG ATAGCATCTA 1'CTGCCTTGA GGAGAAGCGC AACGTCACGG CTGACTGCTG AATGGAACAG CAGGTTCGAG CGCCCCTTCG ATGGCTGAAA GTTTCTGGAA TATCGTAAGC =TGGCAGTG ACTGGATGCA CTTGGCCAAG GAAACCAAGA ACTTGGTCAC CGAGTGAAAT CACGGCTCTA CGACCTGGAT 952 GAAGGCTAAC GATTTrCAGAT GT'rGCTGTAT AGGTTACTTG GAGTCCCAAA CGAGTAAA'rA GGGCTTCAAG GATTCCCTTA GCATAGAAGA AA'rCAACTGG AACTGCTGCT GTT1TGGAAAT C7'rTTCAGC AACCAAGCCI' CTrC7'rTTc ATTACCTGTT TATTCTTACG AGCCACGTTG GGAGGACTGA ACGATCCACA CTGTGAACTC AACTGCTTTT CTGCTCCTC AGCAATGGTA CAGCTGTACC ATCGTCTT CGATTTCTTC AAAGAGATCA CTGTAAAGCT GTCTGCATTT CATCAGCATA AGACAGCTCA CTTCCACATC AGAGGTATCA CTGCAAGCTC TGCAATCATG TAATTCCTTT TTCAAAGCGA TACGGATAGA TTTGCCATTA AAATTT'CTGT AGCCTGACCA TAATCACGAG GTCTGTCTCA CACCATCACG CGCTTCACGC CATGCATAGG 'rrGACCAAAG TGGGACGGAT GCCT'rCG'TC TCACATTGTC CAAGATACGA AAAGGGCATC TGCCGCAGCT CCT'rGTCATA GATGGCTGCC GGTTTGGTGT GATGGAAAGT CCTCACCTGG CACGGCATCT CAACTGAGTC AGAAATTCCC CACCGAT'rTT TCCTTr'rTTG CCATGACC'rT GATCCCAGCA CTTCGCCAAC GTTAATCTGA
GTCAAGGCAA
TGTTCAAAGA
TAGGCAACGG
GTCATTGGCC
'rCAGGAGTTG
CGAACTTGAC
GGAAGGCTGG
GCTTCGATTG
CCAGAAAGAC CAAAGCCAAG ACGACGGAAG C'rTCCGAGGA CACGGTTAAC ATCAGCAAGG AGCTCACCCG CTGAAACGAT ACCCTTACGC CTAGCTGCCG CATCAAGGGC TTCATTAACT GAAGATGACT CAGAACCAAG GTTCAGGCGA AAAACAGCAG CTTCAAGGAT AACACGACTA CCCATAACAC CGGCAAGGGC TACTGGTTTG GCCAAGTCTC GTTCTTCACC GTCCAGGGTC ACACGGATGT CAGTCCCT'rC AAATGTGTCC TAGAGCAGGA TGTAGTTTGT CACGTCTACA ATGAGAAGGT lTTTGCAACCA TTGTGGACTT GCTGCATAGT AAGGCGCCTT GTCTGTCTCA 'rCAT'rAGTT'r CTGTTAGAGT AAA'wTTTTA ACTTCGTGAG CCACTCCACA CATAGAAAGG TCGATGATTT CATCATCCAA GTCTAGGTAA 'rCAGGCAAGA TT'rGGA'rGCC ATCTGCGAAT AATTCACCAA GTGAACAGAT CATTCCAAGT AT'T-TGTAGT TATCAGCGAT ACGAGCTCCT CGCACATTTG GGGCACCACA AACGATCTGA CAA.ACATGGA CGTCACTCTC 'rGGCACATCT
ACGTCTTCTA
GTTGAAGAAA
ACCGTCGCGC
GTTGCCACAT
CCACTTGTCT
GAT=TTCAG
TCAGCA.ACTG
ACTAATTTTT
AAGTCAAAAG
ACGTTATTGA
GGTGCGATAG
ATGCTGACAG
PLAGTTGACTG
GCATCTGCAC
GAAAAGACTT
TCCTTAGGCA
GACTCCAAAC
GGAAGAGCCA
CGCTCTTCTT
TCGCAAGACA
AGGCAAAGCT GTT'GATCTCA TTTCGAAG'TT CTTTTCCAAT CTCATAAAGG GCCAAGrTrrI 'rATCAAGGAT CCCTGAAATC ATATTTTGAC ACATGAGTTC AGTAAGGTTA CTTGGTTGAG TCAGAGCATA GGTGATGATT TCTGTCAAAC GGCCGACTTT TTGTATCACA GTCAATTCAC T'rGGCAAGCG GTCATATCCA TAGATACCAG TGATATCCCA ACGACGACGT GGTACGCTGA 21420 21480 21540 21600 21660 21720 21780 21840 21900 21960 22020 22080 22140 22200 22260 22320 22380 22440 22500 22560 22620 22680 22740 22800 22860 22920 22980 23040 23100 23160 AGACCTCACC GCcGACAATT TTTGAGAGAC CAGCAGCTGG CGATCCCTGT AG?1'GACATT ?T'rCAGCCA ACrC7~T=A ATTC=rAA CCATTTATAA GATACAAGCA TAA~rTC TGArrCGACA CCCTCTACCI' TCCCACATCA ATGTCCACCA CTCCAGAATG ACAGTTGTCA CCTCTTCTTA CCTTAAT~rC CTCTAGI'CT 1-1rCCTTTCC Tl'rCTCAGTA ACCAATCCT TT'rTCAAAAC CATATCGT AGCCAAGCCC AAGAAAAACT TTACCTAGTC CAAATCCTTG TCCTCTAATT2 CTC'TCTCAG'r TCC'rCCTCAT GCATAATGAA GTTTTCAGAC TATAACCCTC GCAAAGGTTT CACGAAAGGT TCTACTTTTC TAATCATTAT CACGGATATC GTTGATTCCA TATCATTTCA ATAGAACAAT ATCTACTT TGACCAACCA ATAAA.ACGCT TGAGCTTTTG ATTTITTGTA GCAAGTTCAA GAATTTTrTGT AGCACATAGA TTGAGCACTT CCCCAGTTGA ATAGGTTTCA GAGTCAGGAT TTCAAAGTAT TCCTGTA.ACT
TAAAATGATG
TATTATGCTC
GTGCGAArrC GACG-rTGAAT
CTTTGAGAAA
TTGGCTAAA'r
CCAAACACCT
AAACAGTTGC
TTCAAAAGCG
ACCAGCTATC
23220 23280 23340 23400 23460 23520 23580 23640 23700 TrCCCAACTC AGTTGACAAA 23760 GCTCTTCCGT ATTATCATAC 23820 TTGTTTGGCA ATTTTAGCCA ACACCTCAAC T'rAAACTGTT C1'GAGAAGCG GACATCTCCT TAACGGAGCA TAGCTACACG CTCTTGTCCA CACTCAT'rrC AAGGACACGT ATCCCATTr
TGGTAGAATC
AGACCAAAGG
GGGTGAACCA
CCTTCTCCAC
AAGTAAGATG
CAAAGCCAGA GTATACAGTC GCATCGATAC TACCGGCCCC CA'rAAr'rTCG ATCCAACCTG 'rTTTCTTACA CACACTTGAA GCAAGAAACA TCCACCTCAA CAGA'rGGCTC GACGCAAACG AA3-rTGACGC TCTTCACCAA ACATTTTTTG CTTGAAGATC AGCCATAGAG ATATTTTTCC CAACTACCAA GGTGACTGTG GGTCGCATCG TCCGTATCGC GACGGAAGAC TCAAAGGACC TTTAGAAAAA TCATGGGCAT CCATAGCACG GGGTACGGAG CAAGATTTCT TCAGTGATAT AGAAAGTATC GGTCTTTTGG AAGGTTCATA CGTTCAAAGT TATAGTAGTC CCACGACTTG ATAACCCATA CCGATGAAGA TATCTTCGA'r AAACGTGACG GTGACCAGTC GCAACTGGAC GACCTGGAAG TAGCCAGTTG AGCCCGACT 1TT'TT CCAAGAGCr'r
TACATTACAG
TGTGA.ATGGG
GACAATCAAC TGAAGCGTTC GCCTTCGATT TGGTGGAATT ACGCCCTGGC GAGATCATCT CGCCTGAACT GGAGACGTGT CTGCATATCA CGAGCTGGGT TTGCTCCACT TCAAAACCAT TTCTTCACTG GT'PTGTGTCA CGTCACATCT ATACTCTCGC AGCTGTTTC'r TCAAAAGCAG 23880 23940 24000 24060 24120 24180 24240 24300 24360 24420 24480 24540 24600 24660 24720 24780 24840 24900 CAGTCAACAC ATCACGAGCT TCATTGACGT GTTTCCCGAT GATTGGACGC ATCTCAGCAG AAACATCTTT CATCCCTTTG AGGATTTCAG TGAGCGAACC CTTTTTACCA AGGACAGAGA CACGCAAATC TTGCA'rCTCT TTCATTTC CAGCAGTAAT CTCTTCAAG CTAGCCAGCG TTTCTTCGCG AAGCGCTTT
CTCGTAGATA
CCATCCGTTT
ACCCAGCTTT
GGCTACI'AGT
CTTGCAACAC
TTATCTAAAT
TAAAATAAAG
ATGCCACCTT
TCCATT'rGAT
TGTACCACTT
TGCAGTGGAC
ACCATCCET
ATCATCTTGG
TCCATTTTCT
ACGTACTCCA
TAAACCACCA
CACACTCATG
ATCTTCAGCT
CGGATAATAA
AGTCGCCCAT
ACATCCCACC
GACTAATTCA
ATCTATTCCT
AAAAGAAAAC
TCATCTGACA
CATA'rAGAGA CTTTCAGAT'r
CTATCI'ACT
TAACTATTTA
AAGT'rTAGAA
AACCGTGATA
TCAATCACTT
CCATCATCGT
ATAGAAACAC
AGAGCTTCAG
ACCGTTAAGT
ACTATAAATA
ATTGGATCAA
ATAATATTCC
TAATAGCTAA
GCAAACTCTA
CCTCTAACAT
ACATCTTTTG
TTAAACTCTG
AATGTCTTc
CACATGCCAA
AGTCAGACCT
GCTTGCAGTC
CCTATTCAAT
TCTGCTTGTT
TAATTTTTGT
TTTAATGCTr
ATAGCTAGTC
CTTATACCA
ACGGCTCTCC TCCCTGATAT ACTTCCCTTG TACTACTAG TTTATCAGAT 7='ACCATT AGCTTATTCT TATCTAAATT TATATAAACC AACAAAATTrA AATTAATTGA CACTCCCCTA TCCAAACTTC TTTATTCCAT AT'rTAATGAA ATCAATAAA.A AACTATTTGA ATAAGGATTC AGTAAAAGAC A TTT TC1=AT ATCGATTT'AA TTrCGATCA.AC ATAAATGAGA AGTCAATACA 'rCCCCAAGAC CAACTTGCAA TAAATGTTCT TATTAAGTTC ATCTTTTATT ATGGGAT'TTG ATAACGGTCA TTTGCCATCC CCACTCTGA.A CTTCTCCTGA ATTATACTGT AGGATAAAAA ATCTACGGTA TGTrA.ATGTC ATTTTCCTTA GCACATCTGA AAATAGATAA GAT'rTGGAGT CATTGGATAA AATTAATCTC ACGAGCAATT CTTGATATAA TTCTTG'rTTC ATIGGTAATTC CAAAACAGAG TATACCTTTC TAAAACTGTT CCGTACC'rTT 'rAGAAAGTTG GTATAGCCCA TAATTTCAAC TTCATATACT GAATTCTATA AGTTGATCTT TAGCACCTA.A TAATATCTAT TTAAAATTAT GACTCTAAAT ALAGGATTTAC GTTGGAAGAG CAGATTGAGT TAATTTTTTA ATAACTCTC AAATATCTTT TTGCATAATT TTTAGATTCT CATACTCATG GCTGGCATAC CTAATACCAT N'TGTA.ACCA AACTTGAGGC GAAAGATTCT CCTTAGG'rAT TTTACTTCGT TAAATGTAAG CGAGCAAATT TTTCATAAAA GCTAAATATA ATGGAGTCTC TGTAGTTCAT CAAACAATTC TCTCCTTTTG GAAAAATTCT TCAGAAAACA AGGATATATC AAGTTATCTT CTGTAGGATT 954
CAATAGTTGA
AAACTCCACT
TCATTTCTAA
CATATTTCCT CCATCAGTCT CCGAGCGTTG ACACGCGGTA ATCCATGCGC AAGTGAATTC 24960 25020 25080 25140 25200 25260 25320 25380 25440 25500 25560 25620 25680 25740 25800 25860 25920 25980 26040 26100 26160 26220 26280 26340 26400 26460 26520 26580 26640 26700
CCACTAGTAA
CCAATATTTA ACTTTATCTT ATGAATCATT CTCCTATCAA
ATAGTCTGAA
ATCATAATAT
AGAGTTACAA
TTCAACCCAG
CCCATCCATG
GTGGTTCTAT
CTTCGTTAGG
GAAAAACA'T
AATCAATACC
ATATTTTCTT
CCCGTGAGCA
TTCTTCCTCA
AAAGCCCATT
TATCAATTTT
ACTCCATGCA ATAGAAGTAC TTCCTTATAT TTATGATAAA 955 TTCTGTTrGCT TCTCCTAATC CACCT'rTGGG TAACACATCC TGAAC'TGATA ATCTCATTA TATGCTCCCr CTACTTGATT AGCTGCAACA CCTCCACCCC ATCTGGAAAA ATGGTCATAA C?1'?CCrCCA TTATAATATT ACCAGTAATT CTCGATTGTC TGA'rrATTAG GTAATACTAA 'rACATCTAGA AAATCATCG AATTACTG.GT GTAACTG=.1 CGTAGCC?1TT AGTCTTGATT AAATTCAAGT AATCAACTGA 7TTTGAAAA CTCTGTCTCC TTCTCTACA TGACTAATAA
AGCCCTTACC
AAAGAAAATC
CCTTAGAATG
TATTCGTTAC
CCA'r-r'CAAA
AACTGACC
CATCACTCT?1
ATGGTGCATA
CTCAAAATGT
TT7"rAGCTCA ACAGTATCTA CAATCCAATr GCGTGCTTAG AACCTTACCT TCACTTGGGA TTTATcCCGG ACATCGCTTA =TATTTGAA ACTCCAGC~r AAA'rATATAA GCTAATACAA TACAATAT 'T GATGGCACAT AGCAAATAGA TATGCTTTAA AATCA'rAGCT GCATAAAGCG GGTAATCCCT GCAAGTAAGG AT'TTTACTC TTTAATGCAA CATTGC1'GGA AGAATTA.ATA AAAAGCCCAA TGCATTCCAG TGTAAGCCAT CCAGCTACAC AA'rTACTCCA ATAGGTCCGA CGTAGGT'rGC AAAAAACTCT ATATr'rCA'rC AACCAAACCA TGTCACAGGT GCACCAA.ATA TGGATG.GAGA AGTACACCTG TGATGCAGAA 'rAAGCTAATA CAAAAAAGCA ATAGTCTGAG CAAGAC?1'TC AACATACCTC GATATACTCA ATGATTCT ACGGAATGAT TTC'rCCTrAT ATAGGAGAAA ATATCATTTT CAACTTCTAA ATTGCTAGAA CT CTTCTT CA'rCGATTCC AGGTAATAAC AACCGAAATG ACCGCCACAA TTA.AAGCATT CAGAATAAAT AAATTGAGGC AACGCTATCA AAGATGGGAC CACTAGTAAG ACCTGCAAAT AATCCCGCTA ATCCACCACC GTT'TTTTATA TTTTAAAGTC ACACCATATA ATGCAGGTTC ATCCAATATG AATTAGTAAC TCAACACCCT TCGGAAAAAT ATTTGTAATT TTCCCATCAA TAATCGCTIAC TCCGTCTCCA A'IrAGTTTAT 26760 26820 26880 26940 27000 27060 27120 27180 27240 27300 27360 27420 27480 27540 27600 27660 27720 27780 27840 27900 27960 28020 28080 28140 28200 28260 28320 28380 28440 CTGAGAAACC TGCTGCAA.AA CAGCCATCGA AGCAGCCCCT CGTCTGGAGT AGCAATAGAT TCATAACAAT AAATCCCATA CATACATT'rG CCCAACTAGA CTACAACTAA GGCAATACAG TAGTAATAGC TAGTGTTAAT TAATAAGAAT TGGAACGACT A.ACTAAGAGG ATTCCCTGA'r CTACAGACAT ACI'AATGTA ACAGCGGTALA GAAATAATAT AATCTGATTG CAATATACCA CCCCTAACAT TGCTGGAATG CTAAAATATT CCC?1'TGTGC GCAAT-TGTT TTGTATTATT TGAGCTAAGT TTGACCCTAA CCGCCCAAAA AAATAGGTGC ATAGCACCAA GAATAGCTAA TTTGATALATC CTTCACCAAC CTTGATACTA ATAATACTAG TTAGCAATTA TTT'NCAkT GATGAACCAT AACTAGCTCG TGCACCATTT GAACAAAATT GATGTTACTT TTAA'I-TTG GGAGCATCCC CAAAAAATGT AGCATI'GGTA AAATGATTAC ATTGGAGTCA TGGAACCAGC CCTTGAACAA CrGAATCGGA
TTCAAAATTG
AATTTGATAT
ATCATCATTG
AACTCTATTG
ATAACTCATG
GCCATCAAAT
ATTGCTACCT
AAGTCTTCAT
CCAAGTTTAA
TGTCCATTCT
ACTAAATT
ACATTTTr
GTATTCTCCT
AAACTAATTC
GTTCACCACA
CATCATCGTC
CGAATTCTTT ATAA'rAATTA GCTACATCAT TACCAAGTAT TTTTCATAAT ACCrATTACA CCTGGTATCT TCTTCACATC CATCTTTTAA TrCTAATCT'r AAACGrTA CACAATGGGT CACCTCCAAT TACATCGAGG ATTmTGTA CCGTATCTTT ATrCTATrAA TCTAAATTr 'TGTTAAGCG ACGAATATGA ACTAGAAGTC AGCAAATAAT TGTACTCCGT 1-rGTATAAAC 28500 28560 28620 28680 28740 28800 28860 28882 TTCATATTCT CTAGGATATT
GG
TATTTTCAT TAATGCTAAC
INFORMATION FOR SEQ ID NO: 141:
SEQUENCE CHARACTERISTICS:
LENGTH: 12835 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141:
GCCTATGTCT TTTT~CAAAAA AATGCTTGAC TTGAGACGGG AACTAGGGAA GTCTAA.AGGC GGAAGGCATT GATTTATACT CTTCGAAAAT CTCTTCAAAC CACGTCAACG T'rATATATGT AACTGAC'TC ACTTGCGGCT AGTTTCCTAG TTTGTAGGAG GTGGCTTATG ATGTCTTGCT TACTTTGCTT TCTTTTTTGA TATTCATGCG ACTTATTAAA ATCAGCATTA CATTTATTAT TCCAATCATC GTCGATGCTT ATCTACAACC TCAAAGCAGT TTTGCTCTTT GATTTTCATP GAGTAT'rATA AAGATTCCTC TCTTAACTTT TGCAAGGCAT TTTCTTGCTT TGGTTTATCG TGATGTTTTG CCCGATCTAG CTAAATTCGA TGGACAAGCA GATTTTCGTA TTCTCCAGTT CAATCTAGGT ATTGTTTTGC TAGGTTTTCA ATATATTGAG TCGCCTTGGA 120 GCT'rTGAGCA 180 TTACTTTCTA 240 AAATTTCTTT 300 ATGACTTATT 360 ATTAAAAATG 420 TTTTATCAAT 480 CTGAAAAATA 540 AGAAAGTTGA 600 ATAATTGCAA 660 TTTTCTGATG 720 TTTACTTGTG 780 GATTATGTGG 840 TCTATGCTGC 900 GCTAGCTATG 960 AAG'PTTTACG ATTGAGTATT GGAAGAGAAG TGAGTTATCA AGGGTTAAAA CTTTGCAAGT TGCAAGTATC CCTTGTTTGA TATATTTAGT GACTGTGCTG
TTATAACCTA
GAAGTGGTTT
TCCTACTAAT
GGAATGTGAC
TTTATAGTGC
TTTCTTTGGG
ACAAAGACTC
CGGTATTTTC
TCGTTCGGCA
CTTGCCTTAC
ACTTrTTTCTC CTCTTGGATG GAATTCTCTA CTAGATGGAG AGATAAAAAG CTATTTGTTC ATCAATGCAA TCTATTrT ACAAATAGTT ATCACCTXI'T TGATGTTTCT TTGGCTTGGT TATATGGTTC CTATGACGAG Tr'rGATGCAA 957 GGGATGTAAG 'rTTGATGAAA TGCT'rGAAAA ATATGAACAT ATTGTTTTAC TACICGTAT GCGGTTCAAA AGCAGGATGC TATAATGACT ATT'CTGAAG4C TT'CATGGTAG GATGGCTCTC CACTTGATTC GCTATCAATC ATTTCTAAAT TTTTTACTCA
ATTCATTTCA.AAACAGTCGC
CTACTGTCTT ATCTGATAGC TTTTTAGCAT r'rGGGGACA CTCTTAAGTT TATTAGTTAT TAAACGATGA GAC1-rGAAAT AATCAGTTrGA AT'rTGGTGTT
CTCTTTACTC
AATGTTTAAG
AGTTCTGATG
TGTTATCT
GAATTTAGAA
TGTCATN'TTA
AAGCTCCTTT
AGATTTGTT
ACTTTTCTTT
CTTATATCCT 7rATATTGTC CCTTACATGG AA'TTTrAACA ATATTTTGCT AAATAGAAAG ATGATTI'TA TAAACCA'rCT ATTGTCAACA TrTCAAGAGAG AATTGATTC AA7*"N1CC ATCCCCAAAC TATTGTTAAA CCTTTCGCTT CTTGAAAGTG ATTTGGCAGA CCATTACCAT TTCGATTATA CAAG4GAAACG AT'rGGTTGTC GTCTGGTTTC TTGCTTTACT TCCTCTAGGA TTACTTGCTC AG'rTAATGAT GTTGTACTTA ACTGATTAGT GCGGGCGCTG GTTTTTCCTT TTTNTCTCTAT AGAATGGATG ATGGATCATA TTGTAACAGT GTATTTAGTA GTTGAI=T AGTCGCTTGG TATAAATGGA CAGAAAATTT AAGACAAATT TAAGAAAGGA ATGGGAAAAG ACCTATrr'rA
TCAATCAGGA
TCTGGCAAGA CGGTTCTTTT AAAGATACTT G7TC'TTCAAG
TTAATTGAAA
AGGTATTTTT
TTACAGGAAT
GCCTTGATTC
GCTTTGGATG
GAAAATGGTC
GATGTATTAG
ATGGTAAAGI' TTACGGGGTA AAGTCGAGTT TTTATCTCAT CATCTAAAGT TACGGAAAAA T'rGAAGACAT TGAATACCGT AAGCCT'rTAT TTCCTCTCCT1 AGAAGAGTGT GAGGTTAACC TCGT'rATCCT GACGTCGCAC TTGTCGAAAA TGGACATATA AAAATTTATC GACTTAAAGG TGATAATGGA GCTGGTTATA 'rTAACrTGA CAAAGGAAAA AAAAATCAT'F ATAT1TCAGGA TGCAGGAATT TTATCCCTGA GAGAAAATTT GGAACTGTTA AGAATrGCCT ATTGGATrCA ATACTATGAT CATT'rATCCT TAGGAACAAA GCAAAAAATG TCTATACTCT TTCTCGATGA ACCTATGAAT AAACAGGTCA TTTTATCTTA CCTGAAAAAA ATATCGGAAG ATA7"rTCAGA CCTTTGTACA CAAATGTAAA GGATATACAA TCCTAGGAGA 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 0 0 TGCTTATGG CACATCTAAA ATCATTTATT ACACGATATT GTTCTGCTGA TCTGGCTGTC ?ITrCTTT1 ATCCCTTGGG AGGATTGACA 'rCTTCATCAT ACAGAAAATC TTGCTAGCT'r ATGGCCTTGC 'rCTCCAAGAA AGTCAGTCTC T'rTGTrTI CYTTTGGATTA ACTTqATTTNAT CACATTTGCC ATrTGCCGA CCAAGGTr'rA TATTGGrr'rA ATAAACCACT TCTGGGGATA TTGGAATTCT GTCCATTCTC GACTGATTTG CTGTCTTTCT TTTTTGGCAA TTAAACAGTC ATAAAAGTCG GAGAGGFAG CTTGAAAACT AACCTCT'T? TCCT'N'TCAA AATGCGGATT 958 CTTCC7"rGAA AATAATCAGT AATTGTGCTA AAATTAAAGG AACATTCTAA AATATTCGGA ATTTAAAGTA AGGAAAAACA TGGCTAATAT TTAAAAACA ATTATCGAAA ATGATAAAGG AGAAATCCCT CGTCrGAAA AGATGGCTGA CAAGCTTTTC AAATACCAAG ACCAAATGGC TGC7*rGACT GACGACCAAC TAAAAGCAAA AACAG7rGAA TTTAAGGAAC GTTATCAAAA TGGAGAATCA CTGGATTCAT TGCTTTACGA AGCA'TrGCG GTTGTCCGTG AAGGTGCAA ACGTGTrCCTA GGTCTCTTCC TGGTGACGTG CCAGAGATC ATACCTCAAT GCCCTTTCAG AGAACGTGAC GCGACTGAGA TAACTTGGCr ACCAAATCTC CTCAACTAAC TCAGAAATCG C1"rATAAGGTr TCAGGTCATG GTACAGGGGA ACGGAAAACC
CCC.GGGATTG
TTGACTGCGA
TTCI'TCACCA
CCATGCCGGT
TGGrGAA'TT
CAATGGAGAA
GATTTGACTA
Tt.ACGTACGTT ACGG'ITAATG AATACCTGTC GTACTCI'TGG CTTGGT'rrGT CAGTAGGGAT AAAAGAAGCC TATGAGTG'rG ATATTACTTA CCTTCGTGAC AACATGGTCG TTCGCGCCGA CTTCGTCGAT CAGGTTGACT CTATCTTrGAT AGGTGCCAAT GCGGT'rGAAA CCAGTCAGTr TTTGAACAAA GATGACTACA TCATCGATGT 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600
AAACATGGTA CAACGTCCGC 'rTAACTATGC TGACGAGGCT CGTACACCTT TGATTGTATC GTATCACATG GCAGACCACT ATGTAAAATC GCAGTCTAAG ACTAT'rGGTT ACTTGAAAAC CTCTATGACA TCGTGCCAAC TACATCATGC CTTGATTGTC GACCAATTTA GCACCAAGCT ATTGAAGCCA CTCAATCACG TACCAAAACC AGGTAAGACI' GAGGAAGAAG AACAAACCGT CCTGTTCAAC TAAGTTTAAA CCGGTTGTCG GGTTGGT ACA GTAGcGGT'rG TGTCTGAT'TC AGGGATTGAC AGGGCTGAAA GCTACTTCAA TCGAAAACGT GGCTTTGACT CACTTTATCG TTCTCGATAT TGACTATGTG GTGAGCGAAG CAGGrCCTAC CATGGAAGGT CGTCGTTATT AAGAAGGTGT GCCAATCCAG GATGAAACCA TCTTCCGTAT GTACAAGAAA TTGTCTGGTA AATTCCGTGA AATCTACAAC ATTCGTGTTA ATAACGCCCT 3660 AGCAAGAAAT 3720 CTGATGGATT 3780 AGACATCTGC 3840 TGACGGGTAC 3900 TTCCAATCCC 3960 GTATCGAATC 4020 AACCTGTCTT 4080 TTGCAGCTGG 4140 S. 55 S S
GTATTGACCA
AAGACGTTAA
CTCAGACCTT
GGCTCGTTAC
CTTTATGCAA
CA.AAAGGGTC
AAGAAATTGG AAACTAGTGA CTACATTTCT TGTTCCTCAC GAAGTCTTGA ATGCCAAAAA CCACTATAGA TGCTGGTCAA CGTGG'rGCCG TTACCATCGC AACCAACATG CAAGCT'rGG'r GAAGGTGTC GTGAACTTGG AGGACTTTGT TGAAAGTCGT CGTATCGATA ACCAGCT'rCG TGGACGTTCA GAAGCCCAAA TCATCA'rGAA GCGGGTCGTG GTACCGACAT GTTATTGGTA CAGAACGTCA GGTCGTCAAG GAGATCCAGG 4200 4260 4320 4380 TGACTCACAA TT-CTACCTAT CTCTTGAAGA TGA'r'r'GA'rG AAACGrTTrG GTTCTCAACG CTTGAAGGGA ATC7MTAAC GCrTGAACAT GTCTGAAGAG GCCATTGAGT CTCGCATGTT 959 GACCCTCAG GTTGAAGCAG ACAAGTCCTT CAATACGATG TTACGATGTC ATCACTGCAG CACGATTGAA CGTGTCGTTG AATTTT'GAAC TTTGCTAAGTI GTCAGGCTTG TCTGATAAGG CGATAGTCAG GTTCAAAAC GATTCrACGA GTGGTGGATA TAACGCGGTT GGACTTCGTG AGGTTTCCGI' A'rCTTTAATG GATGAAAGCA CAAATTrCATG AGCCACTCGC AATATCGCTG GATTGGACGC AATGAACTTT TAAAAGACAA TAAAATGAGA TCGGTATCCG TTTGCTTNAT ATAAATATAG AAGAAG-ICG TCACAGCGAG ATCAAGAGCT GTAATCGGGC CATGCTCATC GCAGTCCTAC AAGAAGAAGT CTCAGAAACG TGTCGAAGGA ATGTCATGCG TGAACAACGT ATCGTGACI'T GCCACCTGAA ATGGTCATGC GCGCCCAAA ACAACTTGCT TCCTGAAGAT CCATCAAGGA AGAGCCNC TACGCGATGA AGAAGCAGTT ACAAGTGGAC AGATCATATC GCTATGCI'CA GAACAACCCT ATATGATTGG T'rCGATTGAG AACAAGAAAG ACCACACGCA CTCACCAAGC AAGTATGCCA GCCCATGTGG TTCTGGTAAG TAGTTTAGAG GCGGATATCT AAGGAGATGA GI-rATGGTAT TGCCITTGTCA AAArTAGAAG AGAAGCCATT ATACCGCGAG
AATAACTACG
GAGATTATCT
ATTCAGTCTA
ATACCCGTAA
ATGCTCAACG
TGATCAAACG
CAAGATGAAA AACTAGAGGC TCTATTACGA TGGAAGACTT CAACCTTCCI' TrAAGG=~A AA.AGAATTCC AAAAAG7 Mr GATGCCCTTG ATCAATT1'CG
GTTGTTGAGT
TTTGATGTGA
ATCAGGCAGA
CACCIGAT
CAACGTCATA TCAGTACAAC GAAGA'r-rGG ATT'TGAGCCA AAATTTAAAA ACTGTCACGG TGTGAAAAGT AAATTTTTAC
TGACAACGA.A
GGCAGATCGT
AAACCCCG1'A CCAACGGAGA TGGCTATAAG GCGCCTAGTC TrATCAATGG AATCAAAGCC GAAACAGGGA 'rGACAACTGC TGATGAAATG GATTT'GATTT CTTACATGGC AG~rGGTGCC GTGGCAAGTG GGGCAGGATT TTCTACTGCI' GTCATGTTTA ATGGGATrrA TGCTGCTCAA GAAGTAGAAA CAACTGGGAA CCCGCTT'rCA TATGGAAAAA ATArrCCCAA CTACTATTAT
GAAGCTGTCC
ATCTTI'ATGG
GCCTTGATC
GT'rCGCCATC CTTTrATCCTG
CGTTCAGTTG
'TTAAAAATC
AACAAACAAA
CACGCTATC
GACAATTTAA
GTCAGGCTr-r GGAGAGGAAA AAGACCAGCG AATTCTCTTC TTGAATACGC TAAGCGTTTG TTA'rGCGTGT TTATACTGCC ACCAGCCTAA CGCGACAGAA TTCACTATCG TGTCATCACA AAAACCTTCC GCTTGTAC-AT A.AGACCAGCA ACACCGCTTT CA.ACCTCTGG AAATCTCAAT GTTTCC7rTT CTTAGGAAAA TTCGTGGTGC TCTTAATGAG TTGATACCAT TGCCCAGTAT 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 GAGAAAATGG CCTTGGAAA.A TCCTTTTATCATCATTGATA CCAATCATGA CAATTCTGGT AAGCAGTATA TTGAACAGAT CCGAA~rGTC CGCCAGACCT TGATTAACCG TGCrTGGAAT 960 GAAAAAArrA AGCAGTTCGT TCGTGGrrTT ATGATTGACT CAAAATGAGC CAGAACTATT TGGTAAGTCT ATCACAGACC ACAGAAGCTC TTGTCAGAGA AATI'ACAAA ACGTTAGGAG AAAACGTCAA GAAATCGATA 'rGGAAGTCAT CAAGGCTGAA CTTGAGACTC AAGGAAAGCC GTGACAGGGA ATTGGCAGA'r CCGTATTCTC 7rCGTGATTG GTCCTGCTC TTCTGATAAT TGCTCGCCGT T'rATCTGCCT TGCAAAAGAA GGTACCGGAT CGTGTATACT GCTAAGCCTC GTACCAAPGG AGACGGCTAT AGATACTTCT AAGGCTCCAA GCCTGATTAA TGGCTTGCAG CCGCGTGATT ACAGAGACTG GTTTGACAAC GGCAGATGAG GATCTGGTG GATGACTTGG TCAGCTACCA TGCCGTTGGA AGAGCACCGC TTTGTGGCTT CTGGGATTGA TGCACCAGTA AGGAAATTTG GGTGTTATGT ?rAACGCCAT CTATGCTGCT TTATCATGGG CAGGAAGTTG AGACATCAGG TAATCCrTG ***AGCAGTCAAC GAGTATGGCA ATTATATGCC GAATTACTAC CATTrGAACGC TATGAAACCA TCGACTTGA AAATCCTTTT TGATAACTCA GGCAAGCA.AT ATATGGAGCA GATCGAAT'r TCGTGATTGG AATGAGAAAA TTAAAAAGAC GGT'rCGAGGA AGCAGATGGT CGTCAAAACC AACCAGAGAT CTTTGGTTGC AGGTTGGGAA AATACAGAGG CCT'rGGTAGA AGAGATTTAT *GAAAAGGATG GAGT'rGGGGA ATCTCAACTC CTTTTGATGA ATTGACATCG AAGAATTGGC TTCGATAGAA AGCGCAGTTA AAGCGTGTAC TGACCGCTCA GGAAATCGAG CGCTTCACCA *ATAGAATATT TAGCTGGTCG CTGGTCGGCT AAGGAGGCCT GCATTAGCA AGCTCGG'rT TCAGGATTTG GAAGTCTTGA TATTTTAGTC AGGCACCATT TTCAGGAAAG AT'rTGGCTGT *TTTGTGACAG CCAGTGTCAT Tr'rGGAGGAA AATCATGAAA CAAGGCTCTG ATTCATCTGG CAGCTATTCG ACAAAATATT CCCTCAAGGA ACGCTCAAGT TGGCTGTGGT TAAGGCCAAT
CTTATCTGGA
CTTGCCTCGG
AATAAGATGG
ACCCAAT1'G1
ATT-ATTTCAG
GAAGAGGCGG
AAGATTTTCA
AAAGGATTAG
AGATGGTCGA
TT-GGGA'rAAC
CATTTATTGA
CTGCGGAAGC
GGGAAGATGA
TCTTGGAATA
TGGTCATGCG
'rTCACCAGCC GCTGTGCGCC AGTTGCACTA ATGCTTTATC CGTCAAATCT
GCTCGTTCTG
GGGATGAAAA
CAAAACAAC
GCCCATGTTA
TATGAAAATC
ATCCTCATTG
GTTCGCCAGA
TTTATGATTG
TCTATTACTG
GTTACCTTGA
GAATGATAGT
CACGACATGA
GTCTCAAAGG
TTTCCAAGGC
ACAATGAACG
ATCCAACCTC
AAACCTTCCT
TCCTCCGTG
TACTCCAAC
ACACCAACCA
CCTTGCAGAA
AA'rCT'ACCT ACCC?1'GCCT
CAAAATAAGT
TGGACACGGA
AGGATTTGCT
ACGCAGGCAA
TATGGGAACG
TGGGGCGCCT
6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 CTATCAGCCA CACCGATCAG GCTAGTCCAC ATAGACCAAC CAGCAAATGG GGGCTCATAT GCTTATCGTC ATGGAGCTGT TGCCGTTCCC AAGGCAATTC AAGATGATGT TGATGGCTTT 'rGCGT'rTCCA ATATCGATGA 961 AGCCATTGAA CT CAGACAAG CTGGACTCAG CAAGCCAATC CTCATTTAG GAGTrTCTGA I AATCGAAGCT GTTGCTCTAG CTAAAGAATA 'rGACTTCACC r'rGACAGTGG CTGGACTGGA GTGGAT'ICAA GCACTCTTAG GATTGATTCA GGGATGGGAC AGATT-TGCTC CAACAACACG TGATGAXGGAA TCAGATGACT TAGTATG.AAG GAAGTTCCAG TGTAGAGACT A=TrCAATG TGGAGCGGTC ?TGGATTTC GGTTCATGTC AAGACAGTTC GCATACGAG CAAGTCATCG CATGCAAAAT TTCTCTGTCT GATGGACCAA ATCACI'ATTC GA'rTGGCTCC AATGGGGA'rA CATTA.ACTAT GAGGTCGTTT ATAAG.GAAGr GGA1TGGT ATTAATrC
AGCTGGTTCA
CGGTTCGTAT
GGACAACI'
TAGAGAGCCA
TCAAGGAATC
CCAG -rAGAA
TGCTAGCAAT
CGGAGATGCC
GGATTGACAG TCCACCTCAA AGTGAGGTTG AGCAG7GCTCA TTTACCCACT TTGCTACTGC CGCT=AAAA CTArTTrAGC TCTGCAACCA CTCr1rTGCA ATGTATGGCC TCAATCCAAG TTGACCT'rGG AGTCTGCTCT TA'rGGAGCAA CTTATCAAGC 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 CTTATGATTT GATACCGGCC CAGCTGGAGC TTGCATGGGC CGACCGTGCC AATCGGGTAT GCAGATGGAT GGACAAGAGA TGGTAGATGG CCAACCTTGC CCAATTGTCG GCAGGCTTTC GATTGCCTAA GCTTTATCCG CTAGGAACCA AGGTAACCTT AGGAAATCAC TGCAACTCAG GTAGCGACCT GCCTCCTCAG CGACCGTATT CCCAGAGAAT
ACCGCGTAAC
AAGAAAGGAG TGGAGCATGA ATCTACATCA ACCCTTGCAT GTCT'rGCCTG AAAGTCAGCA GAAAAATACG CCAAACTAGG AATTGAAAAC TTGCAAGATC CTTTCCTTTC CCTTATGAAG ACTTCAAAAC CAAGCAGGTG CTGGAGCTGG GAAGGCAGT'r CT'I'CTGGTC AGGTAGTGAC TCCTGCTAGT GTCCAGTATT GCGCAATCGC CTGCGTTTTA GTCTCAAGCA GGGAGAGGTC GTTrTTGCCGG TAACCAGCCC TATCTGGCTG ATAAAATAGA G'TCGCGAGCA ACCCTTGCTG ATGGGACCGC GC'rAAGGCTA CTCTGACTGG GATGAAGGTT CTGGCTCAGG CCTCCACCCT GTCTATCGTC TGGCTC-AGGG AATCAGTCAG GCCAGTCTGC CAAGACGGCT TTTCATCAGG GACTGGACCT CVGATAGAA GAAA.ATCTGC ACTAGACAAA TACAAACTCA TGTCCCGTTG TCAGGCAGTC CGTGCTATGC GTh'IrTGGCA GAATACAAGC AGGCTCTTCG CCGTATAAAG ITTGAGGA6AC CCAAATCCAG CTGCAGATGC TCAACTCTGA AAATAGAGTT CAGCGAAGTG GAATTGGTCT CACGAAAAAG TGACAGCAGT TAAAG'rAAGT CTTCCTTTrG AGCTCAGGAA AAGAGTrTTC AGGAAAT'NT AACTGATATC A6AGTCCGACC ATTATTAGALA 8940 GTG'TGGGACC- 9000 TCTTGCT~CTA 9060 AAGACG4GTGA 9120 ATGGTTTCAA 9180 TGAAT'rTCl'T 9240 TCTTGGAAA 9300 TAGAAGATGA 9360 TCAAGGTCAT 9420 CCCAGTCTTT 9480 AITTTTCCAAA 9540 TC=TATTT 9600 GTCTGGTTCI' 9660 CCCTGACCCA 9720 ACCACATGAA 9780
TCGTCTCCTA
TGCGGCAGTG
GCAACACT
TTCCTTGAAA
TTTGATTATA
GArrATTATC
AGGTGACAAT
CACAGCCTr
CAAGGGGATG
ACAGCAG=T
GAGAGTTTAC
GCTG3CAGAAA
GGAACTCACG
GATGAGCAGC
CCAGATGTCC
GGAGATATGG
962 TrGGGCAGTGG AAAAACCGTA GTCGCTGGCT ATCAGGCTGC CCTAA'rGGTA CCAACAGAAA AGAACCTTTT TCCCAAT1'TG AAACTGGCTC AGAGAGAAGT CTTGGAGACC ATTGCCAAGG CTCTGATACA AGATGGGG;TG GAGTATGCTC ACCGT'N-rGG TGTAGGGCAA AGGCGTA'TT TCATGATGAC CCCCACTCCC AT'rCCACGGA ATGTTTCCA'r TATCGACCAG ATGCCAGCAG
TGGCCATGT
TCCTCGCACA
TCTTGACAGG
GTGA13GCTGA GTCTTGG7TTT
TACGGGAAAA
CCCTTCCCAT
GTCGGAAGCC
GGTTAGAGGG
AATCAGAAGC
TTGCAGGCAA
TATTGTGACG CGCTGGATCA GGAAATTCAA AAAGGTTCCC AACATGAGCA ACTACCTCAG AAGTCTATGT CATCTCTCCT GTCTTGACT'r
TTGATTGAAG
ACGACTCAT'r a a a a a TCTAGATTTG AAAAATGCCA TTGCCTTA'rC GGCAGAGGTG GCT~CTTCTAC ATGGTAGGAT GGA'rTTCAAG GAGAGAAAGA CGGATATTCT CAACGTTCCC AATGCGACTG TCATGATTAT ACTTCACCAG CT'rAGAGG'rC GTGTCGGTCG TGCTAATCCC AAGACGGATT CTGGGAAAGA TGGATTTGTC CTTGCGGAGG AAGATTTGAA CAGACAGTCA CACTTCCAG AGTTCCAAGT AGAAGAAGCA AGAAAGGTTG CTAGCTACAT AGAGTGGCGC ATGATTGCCC TTCATCTGGA 'rAAGGAAAAC TTATACTCAA TGAAAATCAA CTCAAAACAC TCT'TTTGAGG TTGTGGATGA TGAGGTGGCA GATAGAACTG ACGAAGTCAG CGTGGTTTGA AGAGATTTC GAAGAGTATT AGAGTCATCA AAAAGAAACG AGGACTCTCA AGAGGAGrrC
GAAGAGTGAC
GGTTTCGACG
CATGGATGCC
GGGGGACAAG
CCGCATGCGT
AATGCGTGGT
GGCTGATATT
TAG7TTCTATA
AAAGAAAGAA
AGAGCAAACT
AACTGACGAA
TAACATATAT
AAGCTAGTTT
GAAAAAGACC AGATCATGCA ACGG?'rA'rTG AGGTTGGGGT GA'rCCCTTCGC TCTCAGTCA CAGTCCTACG CTGTCTCGT ATCATGACAG AALACCACCAA TCTGGTGAGA TTTTTTGGAAC ATCGAAGATT TTCCGATT GAAGCTTGGC AAGAAGATCC CATCTrGGAT'r AAGCTTTCTC AGGAAGCTAA CCGCAGGT'rG GTCAGCTCA.A AACACCG=r ATACGGTAAG GCGACGCTGA TTAGGTrTGG CT=~ATACT 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 TATGACAGTA ACGATTAAAG TAAATTACCA AACCACTTTC CAAAAGAAGG AAGCAAA).AA CTAGTATAAA CAGAAGAGAG AGCCAAATGC TC7'rTrTCG TTTCTAAAAC TACTTTCAGC CCATCATCCT AAAAGTAAAG AATCTAAATT a. a.
CACTT'rCTAT CGAr'rACAAA
CCTATAGAGG
TTACCCTTCT TTCTTGCATT GATTACATAG ATATGCTACA ATAAAAGGAG CATGCTATGA AAAATCCAGC 'rTTGCTAGAA AAGGGATGAG GTTCCGGAAG ACT'rTGA'rGA TTTCTGGCAT
GTTIGTGGTAA
GAAATTAAGA
GGGGAAGTGA
AAAATGTTTC CACGCTTC!CA TCCTACCACT TGGAGGAAAG AGATTTCCAC ATTCCTCAAG TCAAGTGCTA TGAGTTAACA TTTGAAGGAA GCAAGGAAGG AAAGGTCrAT GCACGCATTG ?TrCTTCCAAA GAGTGAGGAG AAGGTCCCAT TAATCTrCCA TT"CATGGT TATATGGGAC 11640 11700 11760 GTGGCTIGGGA CTGGGCCGAC ATGCTGGGCT TCACCGTAGC TGGTTACGGr TGGATGTGCG GGGCCAGTCA GGTTACTCAC AAGACGGCTT GCGTTCTCCT CCGTGAAGGG GCATAT'rATC CGTGGTGCTG TGGAAGGTCG GGACCACCTC ATGTTTATC-r
AGAAGCGTCT
CGCTCAATCC
GGGTGATTGA
TTCACGACCC
TCAAAAATCT
ATGTTTGCTA
GGATATTTAC CAGTT GGT CG TTCTAGCTAT GCTGCCTCAC TCGAATTCAG AAAACAGTTG GATrGGTAAT ACTAGCGAGG CTTCCATCAA ACAGAGGAGG TGCCCATCGT ATCCAAGGTG TCCCATTACC CAGTTTGCGA AAATTGTTGC TAGTCTGTCT AAGGAGGGCC TCTAGCTCTA CCATTTATCC CTTCT'TGTCA CTTACGACGA ACTTTTCCGT AAATCATGGC GACCCTTGCC GTTGI-PTCCA 11820 TTAGGAAATA 11880 TTTTATAAGG 11940 CAGGTTGATG 12000 GTTGCAGCAG 12060 GACTTCAGAC 12120 TATTTCAAGT 12180 TATATCGATG 12240 T'rGGACGACC 12300 GATAAAACCT 12360 GACCAAGTCT 12420
AGGTTAACAT
TTTATAATCG
CATTACGGGC
TCTGACCTGC
ATTTGTCAAT
ATCGCATCAr GCCTGAGTAT GCTCACGAAG CCATGAATGT ACAACTGGCT CTGTGGAAGT GAGA'rTCCTT TTAAATATCT AAAATAAGGA GTCGACTCTA AGCACAAAAT CTTAAAAATT ACAAACACGC ATAGTATCAG GGCATTAAGA AAACTTrATA CTATGCGTTT TATCATGGAA ATATAGTAAA ATGAAATAAG AACAGGACAA ATCGATCAGG ACAGTCAAAT CGATTTCTAA CAATGTTTTA CAAACAAATG CTATTATATT TATAGAATTT TTTGTTCA GATTTGTCAA TCAGAAAGCA AAAGCCGATA CCTATCGAGT AGGGTAGT'rC TGTAGGTGTT AATACTTTTC AAAAATCTCT TCAAACCACG TGTACTATrC TAGTGTCAAT ATTGCTTAAA ATAATTTTTT TTGCTATCGT CAGGCTTGTC TCAGCTTCGC CTTGC 12480 12540 12600 12660 12720 12780 12835
INFORMATION FOR SEQ ID NO: 142:
SEQUENCE CHARACTERISTICS:
LENGTH: 5020 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142:
GGGGATATGA AGAACAAAAG AATATTTAAA GACTTCCAAG CTTCAAAAAT GAGTTTAAAC ATTTACACAA GCCCCTTGTT AGCCTTTGTT TTTGTCTTCA TAGGACAGTT TGTGGCTTTT ACwVMTATG GrTrGCcT GTTAGCTCTC ATCGGACTTG CTAGAAATTT TGGAGAGGCT GGTCAAAATC TTGCAAGCTA CTTGCAGACC TTGCATCAGA GCTTGACGGA TAAAACAAGT GACTTCGr
GACAAGAAAA
CAGCAATCTT
'rTTAGTGGTC
CTTTGTCCTC
CCTGCTTGG
ATCTAGCCTG
TAATTTTAGG ATTACTGGCC TTTGGTAT CTTAACACTG CN'GAGAAAA GACCTATTCG AACCTTGGCA TTTTATAGAG CTG.AAAGGAT TTAGTCTAGG CCTGGCACTT TrCrCTGA TTAGGTCAAT ATCGTTTGGA ATCCATTCAC TTGAATCCTT TTTACTATCC CAI-1-rGAT 'N'TACAGGGG ACAGCAGAAG CTACTTCCTC AATTGGCCTC AAGAACCAAT CTAAAACTAG TTCTTTACCC TGCTTCATAT GGGCAATTCT GGTCTCACCC TGrrCAGATG
AGAATTTCCT
CCrrGTTAGG
ATTCTCTTGC
AAGTGGTGGC
CTATTCTTAT
CTCTATCTCT
AGTAAATCTC T'rTT'rATTCG 'rTGGGGTGTT GCAGGTATTC TTTAGTTAGT GCTCA.ACCGT GATTGGCTAT CAGGTGGTTC CTACTGCTGA TTGTCTATCT CGGTCCGTCC 'T'rCT'CGT AGAGAGGATT~ CTT'ATGATCA TCAACCATCA GGAGATCAAC AGAAAAAC CAGATTCTGA GGTCATTTCT AAAGTCAA'rA TCAGCTCTAT GGGGAGTNTA CTACTATGAT TATTACCAC GGATAGTTCT GTCAATGACG GGAGCGTAAT GATGTTATTG CA.AGGAATAC GCTGATAGTG ACTCTTGAAT GACTTGGTCG AAGATTTCGC GT'rCGTGGGG CTTCGAGTA GAATTr=G AGGTCAGGTG T'rGGGAGAAG CAATGACGAC CACATGGAAG ACCTGTCTT GAAAAGGAAG GAGTTGCCAT CGCTCTTTAC CTTCTCAAAA CTGATACAGT A'rGGTGCTTG GAATTTGCT CAG4GGTAATC TCTrGGGAT CAGaACGTCT CTGATGACCT rTTTrACCACA AGCCAATCAA TTrGGCATA GAAGGTTCCA TT-ATGACAAG TCTGGTATTA TGCTAATAAA T'rAAAGAAAG AAAATGAAAG GATGTGACITT 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 GAAAATACTA 'rAAGTATGCT AAAATAGGAA ATCACATTAC AGATAATCAA TTTAAACTAG CCCAAGCTAT CGAGCAGTTG GTGGjATAACA TCGGGGCGAC TGGAACAGGG AAGACCTATA AACCAACTCT GGT'rATTGCC CACAATAAAA AGGAATTTTT CCCTGAAAAT GCAGTTGACT CAGAGGCCTA TGTCCCTTCT AGCGATACCT AGATTGACAA GCT'rCGCCAC TCAGCTACCT
TAGCACATGG
TATCAAAATA
TTGAbGGGGG
CTATGAGTCA
CTCTGGCTGG
ATTTCGTATC
ATATTGAGAA
CAGCCCTTTT
TGG'rTCGCC TCGTGGCCTC ACTCTCTTGT ATCTATGG'rr TCGrrAGTCT CCGTCCTGC'T CTAGAGATTT CTCGTGATAA ATATTCAGTT TGAACGTAAT GATATTGATT TCCAACGCGG ATGTGGTAGA GATTTTCCCA GCTTCCCGAG ATGAACATGC GAGACGAAAT TGACCGTATT CGTGAAGTTG AGGCTCTGAC TGGATCATTT AGCGA7TTC CCAGCCACAC ACTTGGAC 'rTGCCATTGC AAAGATTCAG GCCGAGTTGG AAGAACAATT GTAAACTGCT TGAAGCCCAG CGrMAAAC AGCGGACAGA 1560 1620 1680 1740 1800 1860 1920 965 GTATGATATC GAAA'rGTTGC CCACATGGA'r GCACGGAGCG TGATTTCTTG AT'rA'GATTG CAATGGAGAC CG7TTCCCGTA 7rTGGACAA'r CCTCCTCTCC CGTTTCAGCG ACACCTCGTG CA'rTCGTCCA ACGGGACTCr TCATGACCTC T'rGGGTGAAA GTGAGATGGG CTATACCAAT AAGGAGAGCC TCCTTATACG ACGAGAGTCA TATGACCATA AAGAAATGCT GTGTTAATTAT GTCCGGAGGA GlTTTGAGAGT GGGGTTGAALA ATTATTCTCG CTTCTCGACT 'rCTTCCCAGA GGGCAAATCA AGGGCATGTA GGTTTCCGT'r TGCCGTCTGC CACG?'rCATC AGATTGTTTA GAGACAGTGA TTGAGCAAAT CGTCCGACTA TGGGACAGAT AATGAGCGTA CCTTTA'rCAC TTCAAGGAAA TGGGTATCAA ACGGAGATTA TCCGTGACCT 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520
AACTTTGACC
GGTCAAGTAC
GCGCTTGGGT
TCCTGAAGTG
ACGTGGACTC
GTATCGGAC
CAAAATCCAG
AATCCGTGAC
TATCAATAGC
GCAAGAAGCA
GGAAGTCAAG
AGATTTGATG
TGATGCTCCC
AAAATCAGAA
AGTTGGGACT
TGGTATTTAT
GA'rAGATAGG
AGGAAATATT
TCCAAAAGT'r
AAGAAAATGG
ATGCACTCGG
ACTATGAAAA
TGGATCCAGA
TCAATGCCCG
CAGAGGATTT
ATATCAAGAC
GTCTTTGATG TCTTGGTCGG AGCCTCGTAG CTATTCTCGA ATCCAGACCA TTGGACGTGC ACGGTTACCC AGTCTATGCA ATGGCCTATA ATGAAGAACA TrGA'N'GCTG TGACCAAGGC CTCAACAAAC AAGAGCGCAA GTTGAAGTGC 'PTGACTTTGA AATTrAACCTG CTCCGTGAAG TGCTGACAAG GAAGGTT'rCC TGCACGTAAT AGCGAAGGTC ACGTGCTATC GATGAAACTG GAATTGACGT 2580 TTCCCAACGA 2640 ATGrrATCAT 2700 CCCGCCGTCG 2760
TGA.ACAGACC
GGTGGAAGTC
CGTTGAAAAA
GACCGACTAC
CTTGGAACGG
TGGTATCGTT CCACAAACCA TCAAGAAAGA AGTTGCTAAG GAAGAAGACA AGGAAGTCGA AGAACTAGTC AAAAAGCTTG ACTAGCAGCT CAGATTCGTG GCCTTGGA'rT AGGGGAATAG TATGA'TTAT TTPLAGAAAGT TCTT'rArGGG AAATGGCTrA TTCACAGCrT AATCCAGTT TATTATGATG ATTATCAGTA TTTTTCAAAT TTTAAAGAAT
AGAAACAAAT
ATATGATGCT
TAAAGAAAGA
GGAAACAGTA
TCGAACTACA
'rCCArTTTTAA GCAACTCAAA TCGCCTrrGGT ATTTTTGTTG ATGATAAACT GTTTCGCGTT ATTGGGTATG TAAAGAAACA AGATGGATGG AATTGGGAAT GATAAAAAAT TCTGGAACAC TGGTATTGGG AAAGTTGCTA TGTTGCAGTG ACGTTTCAGG ATTAC7TGGA GTTGGAGCAT CTGGGTorMA CAAC1-rGGTC GCTATGATGA AAC7"rGCTGA AAAATTAAGA ATGAAAAAAG AAGCrCA'rAT CGT'rATTATC AAGGTAAATA TTTTGACAGT ATAAATATG GTA7T=TGAG 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 AGAAGACTGCG AGAAAATAA ATGACGGT'rA TTATCAAATC AATGGAAACT CCTGAAGAGA TAGAAGGTAA ATCCTTCGTT CACTGGCAAA CGTGGAGAGA GGCTTATGAT GACCTTTTGC CrGCGGAAT1T TCAGGAGACA CAGAAAATAC ATTGATTGCG ACTGTCGTGA TGAGACTATT ATTATGGAAA AGGAATCGCA TTTCTGAAAT TI'TCTTATGG 966 ATGACATTAG AAAGATGTCG ACTCTTTAGT CAAAAG'rATC AT GGATGGTG TGAAGATAGT TGGrTTTATA AGTTATGGCA CA.AGGTGGTG AAATTATTGC TTTATATGTT CAAAAG'N'AG TGAAAGCAGC TTrGACTGAT GTATTGAAAG ATAACAAGCG CGCCATTGCT TTAAAAGAC'r CTTAATCATr
TTCTATCAAA
AAATGGGTT'r
AAA.AACGGAT
GTCTGAAAAT
TACTTTTGAT
GGTAT'rCTAT
TTAATAAATT
GGACAAGAAA AAATACTTGA ACTTGGAAAG CCTATAAAGG TCTAAATAAT TCTCAAAAGT AAAAGCrAAT ATGGTACCAA AGAAAGCGAG TAAATTTATG TCCCGT'rCCC AATTAACAAT TTGAAGACCT CGAAACTCAG CGCGTGGTGA TGCAGTATCG GGTC TGGTTA TGCCTTTCCT GGAGGTCATG TAGAAAATGA TTTAACAAAT ATCTGI'CTGA CGCCCCTGAA AACAATCGCT 0 0 0
TGAGGCTTTT
AAATCCTCAA
CATTTGTTAT
TTCTrGGGrG GATG4GAAATG
CGATTGGGAA
CTCGATATAG
TGTTAACTCA
GTCGCGGCGA
CAAGATATTT
GCGGAGTCTG
CTTGTCGGCA
AAGGCGACTG
CAAAAAGACC
ATGGAAGCTC
AAGAAAATCT
TGGAGGTCTT
GAAATTGGCT
CAGGTCAGTT
TGAACGCGAC
TCATrCGTGA AATCTACGAA GAAACAGGGT TTAAAAATTG GCCACTAGAT ACAGGTGGGC
TGACTATCCA
GCTATATTGT
3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5020 AGTTCTCTGG TACCCTTCAA TCTTCAGAAG AGGGAGAAGT AGATTCCAAA CTTAAATCTG GCCTATGATA TGCI'ACCATT CCGACAAGTC AGAG'TTTTTC TACCCTCGCC GTACAGAAGA TCTAGTCTTT TACTAAATAA CCTAGCTGAT CCAAGGCCTC GTTGTGTCTC GGCTTCAACT AGGTGATAAT GAATACCATC TAAAGTCAGA ACGT'rCAACT TGTTCTAGAA AATGTTGCAC T'rAGTAAGGT TTCAATCTCT CCATAAACAG GATGATCAAT CACCATTATC TACGATAGCA AGTAATTCTC GTCCAATTTC T'rCAACT'rCA TGCTTGACCT TAAATAATTT GTGATGATAA GTATTTGCAT ATAGATATAA CCACGATTGG TAGATAGAAT TGGAGATCCA TCACCTCTTA ATCTTGAACA ATAACTNGTC GAGTGACATG AAAGTGCTCA
INFORMATION FOR SEQ ID NO: 143:
SEQUENCE CHARACTERISTICS:
LENGTH: 4965 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
TAGCATCTTT
AAATTGCAAT
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 143:
AAAAAGTGGC AATCCATTGA TTGGCCACTT CATTTAGAGA ATTATCGTCT CGCCCTTGAA GAAGAACGT'C GTGTAGTACT G AGCTGGATG; GAGTTGGTAG
TCACTTCCC
TAGCTGGTAA
0G1-rrT'GAC
CCCTCGTTCC
TAATTAGAGT
ATCCAGCCAC
CAATAGCAAA
CAGTCGAACC
C1'GTTGATrr
CCCAAATGCT
TA'rTAGAATA
CTAGGGTTAA
TGAGTTACTG CTATCGCTAG AACTACTACT ACTCCCCACA ATACTAGACC AAGCATTCTG GCGATAACI' GTCGCTGGCG CTCCT4GACTT TG'rGACCTCT ACTTCTTTTC CTTcAAcAGA CAAGACTCC GATTTCACTA CACTAGGATC TrGAACTGCTG
ATAATCCGCA
ATI'AGCCCAA
AACCTTCTCT
TAAAGCAA.AG
ATGAGCCATG
ATCATGCCCA
TTCGTrCTTGG
TGGGGAAGCT
ACCTGCTCTA
TCTAGGTGTC
AATCCAATCT
TGTCACACGA
TGCTGAATCC
CGTGACAATG
GAAAGCATGA
GCATTAGCCA
GAGGATAGAA
CATTTACCAG
AATGATTATC
GCCACATATT
TTGGTTGTAC CAGTC'rTCCC AGGTTAGACT TGAAGGTTGT GAGTAGGATT TAAAGAAGTC CTTCTCGTAG CAATCCCTCC TATCCTGATA CTCATACACC TCTGATGATA AACTCCATTA do ATAATCGTCG CACTAGCTTT TGAATAGACT TGAACCGGTT ACTCTACCAT CTGCTGCTC AATCTTTGAA ATCACATGCT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 TTAGCTAAGG 'rCTGATAGCC A'PTGGCAAGC TC 'CAATACC T'rGACATCAA CACCCTTTTC GAATAGTTCA GAGCTTCTCC ATTGGTATGC TGGGCAACTG TGAC'rTCAAT ACCACCACCC GTACTCAGGA ATCTCGTAAC CCA'rCTTTTC CATATAACCC ACGGAGCATA CGATAGGTCC AGTAAGCAGG GATATTCCAT CAAGGTCATC A~rCCTGTTC CCTTGCT'r AGCATACATA ATCGGA'rTGC CATTAGCAAA GTTTGN'GGA TAGT'rAGA'rA AAGCCCTGGT CAATAGCAAT ACCGTAGGCC AGCAAGGGCT GAATCGT'rTC ACTTrCCCATC TGGTAGTAGA AGCTGGCGAA CG'TTTGGTAT CAAAGGCATG ATTA'rrrTGA TTTCTTGAT AATTACGACC ACCTACAAAG CCTAGAATAG CACCTG'rTTG GTTATCCATC AAGACATTCC CTACTTCTAC ACGACCTGTT CCATCGTCTA AAAGATAGCC ATAATCAGCA ACCGCACTTT TCATCTATGG TAGTAGTAAT CTTATAACCA CCA?1'TTCAA CGATAAAACT TCTGAGTTGC CTCA'=TC AACTCCT'rAG GCATGCCAGA ATGAATTTTC TTTrCCTTGGC TGCCAAATrT CGGAGACATT GTCTCTCTGA GCTAGATAGT CATACATACG TTCTTGAGCI' TCTGCCAAAG T'rGTAAAGTA TAAATAGTCT CGTGAAAT'rC CTGTAACCGT GCCCGATGGT AAAAAGTCCT C'rTrAAGGTC ATAATCCTTG TACrGAGAAT ACTCG'rCTTT GCTTAATGCA CCTGTACGAT ACA'rACTGTA AAGAACTGCC TTAGCCCGTC TTAAGCCAAT TTCTAGGTCT TCATCACTCT TCAACTCCCC AGTATTTTCA TAACGAGAGT AAGTAATGGG ACTCTGTGGA AGTCCTGCTA AAAATGCTGC TTGAGGAACA GTCAACTGAC TGGCATCTAC ACCGAAAATT CCCTCAGCTG CTTGCCGAGC CCCTGCAATA Tr=GTCCCT TATTA~TCG GCCAAAGCGA TCTTTATTCA TGGCGCGTTC CAAGGCAAGA AAGGTCGGCG CATCCCCAAC CACCTGCTGT CCACTAGAGG AACCCAAACC TACAAATTTC 968
GCCACATTGA
GCATCCACAA
TTAATTAGIT
CCCAAGGTCG
TCTTCTGTCG
GTGCGCAACA
GATAGGTCGT TAAAATCTCA
TCTCTGCCGC
GCTGG4GTCAA
CACGAATCAC
CTTACGAGCC
GGTTGAACCC
CCCTTCGGT
ACTACACCCT
TTTTCCGAAA
ATCACCGTCC
TATGTTCTTT AAAGTGTTCA T'rMCTCAGA TGAGATAGAA
CGTCCGAATA
ACCAAT'rC?1 CTGTCTGAGG GCAATCCCAG CTCCCAACAT GGTAATCTCI' GAAATAGAAG CACCCGAACC 'rTGTCAAATA TCCTCCTAGA AAACCGAGTA AGCTGGGAAA ATGACTGACT GCCAGGC'A GCTGA'TTTT CAATGATAGC CTTCTTCAGA AATCACTCTC TATGGAAGCA AGATGTCCTT GACCTGATTC AGCCCACTCC GTATCCCAAA CAAAGAGTAA GTTAAATAAG TATCTAAGGT TTTAGATTT TAT'N'TTTTG TTTrrCCTGG TTTGCATGGA TTTCCTCACT GCCACTTTCT TCCCTATTCT GCTTr'rATAC TCAGTAAA.AT TTGGTACTTG AACC'rTTCTT AAAAA'rTCCA CCAT'rrTCG TTATCTAT-rA TACCACAAAA GCTAGGCTAT TGCCCAAGT'r AAAAACACAT GCACATTT A'rGAAGAAGC TTTGCGTAAA ATCCAACTGC TGACAGCCTT TGCAACTAGC AGGTCACAAA ATCCGTCCTT CAAAGATGCT TCAAGTCTAT CCAAGGACAA CTGTCATGGT CAACAACTAC ATATTGGAAA ATACT'rCACG TCGAALACAGG AATTTCTTAC TCGTCCTTAA CCAAGACCAT T~rTAATTCA
GGGAAATTTI'
TGTGATACAA
GATGAGCTAA
GCCCTAGAAG
CACCTAGGCC
CCTTATGCGC
GAACGTAGTC
CT'rTCTCGTT
GACTGGTTTG
GTCAACTACA
TTT'AA'rrGAT
CAATAAAATA
TAGGTAGAAA CAATAATTTT AAGACCGTGG TTTGATATTT AAGGTCAAGT TTCrTATTAT ACCT'rGTCGC AATCTTGACA 'rCGTT'GGCGG TGCTACAGGT TCCAAACAAA AGACACAGTA T'rCTTGACTT TGAAAATGGC GCAGCATCAG CTTCATTGAC TGATGCGTAA GGAATCTGTT
AAAA.AGGAGA
CAAACGACTG
ACTGGCTACG
AGTCGTCCCT
CTCATCGGAG
GATGGCTGGG
GAAAACAAGG
TTCCTCCGTG
AAAAAACGGA
TATGACTTCT
CAGTGGGGAA
CACGTTATCA
GCAAATGCCG
ATGAACGTGA
GATGAGATTG
AAAGTCTTG
C1TTAACATCA 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 ATATGACAGC TGGTACCGAA CTGTTCCACT AATCACAGAT ACTGAGTTCG CTTACCAAAT1 CATGCAAGGG AATG'rCACTC TTCAAATCGG TGGTTCTGAC TTGCTTCGTC GTAAGGCGGA CAAGACTGGT GCAACTGGTA AGAAPATTTGG TAAATCAGAA ACTTCTCCAT ACGAAATGTA CCAAT'rCTGG TTCTTGAAAkA TCTTTACTTT CTTGTCACTT CAAGCAGCGC CACACGAACG CTTGGCTCAA GTTCACGGAG AAGAAGCCTA CAAAGAAGCA
TCTGGCTCAA
TGGACGCTGA
AAGATATTCG
TCCCGAAAAG
CGCTGTTCGC
TAAACAAT
CTCGTGAAGT TGTTACACT'r CTGAGCAACT CTTTGCAGGA AACATCAAAA
TTCGTGGTGT
TCGTCTCATC
CCATCTACG'T
AGTTAGAGAA
ACTAAACTAT
TTTTGCTGTT
GCAAGGTTGT
CAAAGCCAA'r GCCCAACTAC CAAGTACAGG TGGTATAGTr AACTCAAAAC AAACGGCGAC CGCATCCAAG 'rGAACTGACTr GTTATCCCTC TCAACATTTA TCTATAAACA ACCTTTCTGT CAAAGAGCTC AAACAAGGAC CAGACGAAAA CAACAATATC CTGGAACTGC GCCAAGCCCG TGAAGACGTC CAAAACGGAG
AATAACTCTC
TAGATTATGT
CAAACTACTA
ATCTATCTAT
AAGATAGAGA
TTTACGACAA
C
TTT'GCAAAGA TGATGATAAC GAATCCAACT ATAAGATATC AC'rCATCTGC TTAGAAATAT CTGATAGAGT TAAAGCCGCT GAGTCATTCA CTGCTTT'CTG CAGTTTCTCT ACTAACTCAA CCTGATCAAA GGGCAAATCT T'rGACTAATT CCAGAATCAA TTTTTTCCCC TGTGCCTTAA TCAAAGGAGT ATGAATGCAG AACATTCCAA GATTGrAGTG ACTCTrGTAC TCTTCAATTA GCTCATCCTG CATCAAGACA TAATTCCCAA TGATCCCCTT GCT3'GCGATA TAT'rGGAGTT CTATCTCAGC TTGCTTGACG ATGGCATTAG AGGCACTGAT TC'rGAGAATA TC'N'CCTCAC CTATAGGATA ACTAG'rTGTG ATTGTTCCTG GATATTTCTC CCTGTTGTGG CCTCTGGCTG AGCT1'GACTA
GTGGGAAGAA
AAGGAGI'AA
TTTrA.ATAGA
GATTTGAAGG
CGGTATCCTG
CI'TGGAAGAA
CTGCACTCTrC
ATCCATCTCC
ATTTCCCATC
CCTCTGTCCT
GTr'rATCCAA
TCAATTCATT
AAGCATTTTG
TAAGAACTG
TCCCATGCAT
CAATAGGA.TG
TATAGTCTCC
TCTTATCAAA
TCATCTCTGT
TGTCTTGAGT GACGCTGATA AAAATACT'T GTATTGACTr CCTCGAGAAA GGTAACTCCT CAGGCTACGC AGGACAATGC ACTGAACCAA TTAAATAACC AATATTT'rTC 'rTGATGAG3'G ATCCAAACGA TTATCTAACA ATTCATCACC ACACCGATAT AACCATCAAA ATAGTGTGAC AGGTT'rCAAG TCTGTATAGA AATCAAGGTG TCTCCTGTTG GGCTGTTI' GCTTCTTTTC TrGATAAGCC AAGA.ATAAGA TTCTGAACTO A'rATGAATCI' TTrGGCCATCT ATATGAGATT TTCCTCATGT TCAATTCCCT ATA.AATGTGT TCCTCAAGAC AAAAGGTAAC ACCTTTTCAA CAAGAAAGTA TCAACTTCCA
GCTGG
3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4965
INFORMATION FOR SEQ ID NO: 144:
SEQUENCE CHARACTERISTICS:
LENGTH: 3232 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 144:
CAGGGGCGTA 'rTACGTGACA ATTCAATGTA GGCTGTCGCT ACTTCGCCA TCGATAATGT CGGATGATAC TAACGATTAA ACCGAGCAGA AAGGATCCCA
AACTGCAATA
AATGGATCAA
AAAATCTTCT
TGCAAGGTCA
TCCAAAAATA
TTGGATAATA
CAAAGAATGC CTTTTGATAT AGTGGTAGAT GAACCTCCCA 'ICTAGAAATA ATACAG1'TAT TcTA'rTTTI A7*TGCCGTTA TAAGGATrTTT AAACAAAATA 'rTTAAAAC? TT'rGAAAAAG AACGCTTACT TATTAAGGAG GAcATTTTAT
AAACAAGGAT
AAATTCCCCA
ATTGTTCAAC
TGTAGCACTT
TATCATAGAC
AGTTAAGATA
GTCATACAAA
GCCAATGGCG
AGATAC1T1C
ATCCGTGAAT
ATAAAATTTC TGAAATTI'CC TTTTTGTAAT ACACAAAGTA ACAAGCAATG CAGAACGTCA CAACAAGTTA TTCCTAAAGC ACTTC CC AGTGATTTTA
TGTAGATTTC
AGCATTTGGC
GCGTCAGGTT
ATCAATACCT ATGATTTGGA TATA'rCCTA GTGGGGCGGG CTTTTTAG'rT TTTAAAGATT TTCTTGCTTA TTTA'rGATAA AATGGGAGTG TCGCAAAAAA TGACTCATCG TA'rTCAATTT TGAGTAAAAC TACCGATC CCATGTCTAC AGAACATATG GAAGAACTAA ATGACCAGCA GATCGTTCGC CGTGAAAAAA TGGCTGCGCT CCGCGAACAA GGAATCGATC CTTTCGGAAA ACGTTTTGAA CGTACTGCAA ATTCACAAGA ATTAAAAGAT AAATATGCCA ACCTCGATAA AGAACAATTA CACCATAAAA ACGAAACAGC TACTATCGCA GGACGCTTGA TAACCAAACG TGGTA.AAGGA AAAGTTGGTT TTGCCCACCT TCAAGACCGC GAAGGCCAGA rTCAGATCTA CGTrfCGTAAG GATGCTGTCG GTGAAGAAAA CTACGAAATC TTCAAAAAAG CAGACC'rTGG TGACTTCCTT GGTGTCGAAG GTGAAG'rGAT GCGTACGGAT ATGGGAGAAC TCTCTATCAA GGCAACCCAC ATCACACACT TGTCTAAGGC TCTTCGTCCT CTTCCTGAGA AATTCCATGG TTTGACAGAC GTTGAAACAA TTTACCGTAA ACGTTACCT'r GACTTGATTT CTAATCGTGA AAGCTTTGAA CGCTTTGTCA CTCGTTCAAA AATCATCTCT GAAATCCGTC GTTACCTTGA CCAAAAAGGA 'rTCCTTGAAG TGGAAACACC TGTTrrCAT AATGAAGCCG GTGG'rGCTGC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 TGCCCGTCCA 'rTTATCACCC ACCACAATGC GACTGAGCTT CACTTAAAAC GCCTTATCGr CCGTATC'TTC CGTAACGAAG GAATGGACGC AGTTTACCAA GCTTATGCAG ACTTCCAAGA ACACGCTGCT AAATCAGTCA AAGGTGATGG CCAAAACATT GACA'rGGTGC rTCGTATCGC GGGTGGTATG GAACCTGTCT ATGAAATTGG
TACTCATAAC
CATCATGGAC
CCCAGTCAAC
CCTGAGTTCA
TTGACTGAAC
TACCAAGGTA
ATCAGAGAAA
ATCGCTGCTG
CTTCTATCGA
GCATTATCCA
CTGAAATCAA
TTACTGGTGT
AGAAGAAAGT
GATTAACGAA CCATTTAAGC GTG'rTCATA'r GGTGGATGCT CGATTTCTGG CAAGACATGA CTTTGGAAGA AGCTAAAGCT TCCAGTTGAG AA.ACACrACA CTGAGGTTGG TCACATCATC AATGCCTTCT TN'GAAGAGTT 971 TGTTGAAGAA ACTTTAATCC
ACTCGCTAAG
GACTAAGGAG
TrTTrGAACcC TGACTACATr
CGACCGTC
AACAATGAAA
GACTIGAATTT
TTCTAAGACA
CATTTTACGA
AATTGATTTG
TCAACTAGGA
CTATTAGAGG
AAAAATCCTG
TACGGTA.ATG
CAAGCTAAAG
GAAGCTCT'rG
TGCATGCTCC
TAAATTCTTA
AAGGAGAAAA
TTGTTAGAAA
TAGTACGCTG
TTCAGATTCA
TAGTATCTTG
ATGTTCTCGT
AACCAACCTT TGTCTATGGA AAGACCAACG CTTTACTCAC CCTTrACTGA GTrGAACGAC CCAAAGAACT TGGTGAI'GAT AATACGGTAT GCCACCAACA TCACTGATAC AACAACTATC TCCTCTGGGT CTTATCAGAG TGAAGTG'rAG TATATrGAAA TTGGTTTAAA TrCCCTAAGC AAACTTTTCA AAAAGTACTA CTATAAATAA AAAATT1AATA CTTAA.ACAGT ATATATGGGA GTCTTA'N'CA CT-rGTTTTTT CAT'rCATACC TCTCTCAACT CTrCCTCCTC ATGAGGTCAG ATTrCCTCAA AAGGGCAGAC TTCTrTAATG CATCA'rTAAC TCAGGTTGAC TTTTCTAATC GACTAGACAA TT'rGAGGAGC GAAAAAGATG GCGGAAGCGT 'rGAGAAAAAG ATAGAGATTG CATCCAGTAG CTGI'ATCTCC CGTTTCGAGC TCTTTATCAT CCAATCGACC AACTrAGCCG GAAGCGACAG GAATCGACTA GGTGGTTTGG GAATCGGTAT CGTGATGTAT TGCTCTTCCC GATrrrTTGA TTCAAAAAGA T'rGAAATAGT ACACTTTGAT AATTTGTGCA TGTTTTAT'T GAAATTGACT TGGATTCCCC AGTGGGATAG GAAGTTAGCG TTGATATAAG TCCATAGGTC ATAGTATTAG TAGATAGAAT AGATGTAACI' TACAAAACCC TTTTACTTTC TGCI'GTTCCA TCCrCCCTTG GTGCGTCACA GACGCTTTTC TTCTAGGTGG CTAGAATAAA GTGCTGAAAA TCCTTGCGTC CrGTTCGAAC TTGATTGTTA AAGTTTGGAA TAGGCGATAC AGCTCATCAT TTATCGCCAA AAAATCCCT1C CGATAAAGCT GAAACTGGTC ACATCGTAGA AC 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3232 CAGCAAATAA AAACCCAAAT CTGACCTCAT GAGCCACT'rr GTrATCGTTTT TCCTCGCTAG CGA'TT'rTC ATC'rCGACTG TTCATAAGGA ACAGGAAGAT CAATTCGGAA TAGGCATAGA ACATTTTCCC ACCACGTGAA GTCACCTCCA GCTAGATTT CATACGAACT TCGT'r'r'TGA TTAAGGTTGA ACTATCCG'N' CTTCATCTCC TTGA'rGAAAT 'rCTCGGCT'rG ACCACGTCCA TTGGCT'rGTT CCACTCGTCA TATTTGTAAC GAGAGAAATA
INFORMATION FOR SEQ ID NO: 145:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 10711 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 145:
CCGGAGAAAA TGATGAAAAG TTCAAAACTA GCGACTACTT TAGCTGCATG CTCTGGATCA TCATACATTr ATGAGACAGA CCCTGATAAC ACACAAATAT TACCAGTAAC GTGTTGATG 'I~TGTGCCGTC TATGGCTGAG GAT'rGGTCTG C'TATCCGTAA GGATGCAAAA TGGTATACTT TTTGCCCI'G CGGGCGTGAC ATTATTGGCG GGTTCAAGCA CTAAAGGTrGA GAAGACATTC CTCAACTATT TG.ACAACTGC TAAGGCTGCG GTT'rGCAGA
TATCCAAGGA
CTGAAGGTGA
AA'ATCGC
TGGATTGACTr
AGAATACGCG
TACGGGAACT
TACACTTATA
GCAGTCAAAG
CTCAAGACTT
TTGTTCAAGA
CACAAGTAGG
AAAGCTTCTG
TGAATTCAAA
GTCCTTATTT
ACTACTGGGA
TGTAACAGGA TTAAAATATG ATCAATCAAA GCGTTGGATG AATTAAGGCT CTGGATGAAC CTGCTGATAA AAAATCAGAT GCTCTTTACC CCTATGTAAA AGGGGAAATC AAAGATr'rCT
GAATTCTAAG
AGGAGATGAT
GTTGAAATCC
TAAGGACAAT
ACAACCATGG
TTTGCCA.AAG
ATTGTGACCA
GTGCATGTTG
GAAAACTTTA
GCAGAACTTG
AGACAGTTCA
GTG'rGCTTGC
CTACGGATCC
AATCCTCTGT
ACAAAGTTAA
AAGATGGTAG
AGAAGAGTAT
GTACACTTTG
GCCAG=rAAT
AAGTAGTC'C
TGAATTTGCG
ATTGTCArrC
CCT'TACAGCA
GAAGGACAAT
TGACCGTCAG
AAAGGCTCTC
CTATGCCTCT
AACAAACCAG
GAAGAGTTTT
TTGTATAACG
AAAA.ATCCGA
CGGATGGTC
GCTCGTCTrCT ATT1GCTATA
TCCTATAAAT
TTAAACAAGG
CAGTTGAATG
AAGATACCAG CAALACCTGCA ATCCAACAAG TGCAAGTTTC
ACACATCTAA
ATTTCCGTCA
GACAAACTGG
CTCTA'IrACG
GACCAGCGAC
GGCTATTGCC
AGCAAGTAAA
TATCTAGTTG GTACAAATAT GA.ACAAAAGG CATCGACTAA TTTGGATTrG ACCGTACAC ATCTTGCGTA ATCTCTTTGT GATATGGTCA AAGAGAAATT CAT'rCTCAGG ATGCTTT'A TCAGCCTTAC AAGCAGAAGG 240 300 360 420 480 540 600 650 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 CAGATGGTAA AAACTTTGGC GGAAGGATGT TAATCTTGCA CTGAATTTGC TAAAGCTAAA 'rGGATATGCC AGTTGACCAA GCCACCAACA TTTGTTCAAG GGTCACTTAT GGGGATGAAT CAATCCAGAA AAAGCCAAGG AGTCCAATTC CCAATTCATT
AATCCTTGGA
AAGACGAAGT
TATCAGATAA
TCAAACCTTC
ATGTAGCTGC
AGCAACTTTA
AAACAATAT
TGTCGGTTGG
TGTAGGAGAA
TAAAAAAGTA
ACAGCAACTA CAAAAGTTCA GCGCGTCCAA GGAGCTGATA ATGTCATTAT TGATATTCAA ACATATITTG CTGAAAATGC TGCTGGCGAA
TCTATGAAAC
CAACTACAAA
GACTGGGATT
GGTCCAGACT TTGCCGATCC A'rCAACCTAC CTTGATATTA AGTACTAAAA CATAT'rTAGG GTPTGACTCA GGGGAAGATA CGTCTATATG ACTACGAAAA ATTGGTTACT GAGGCTGGTG AAACGCTATG ATAAATACGC TGCAGCCCAA GCTTGGTTGA ATGAGACTAC AGATGTTGCT 973 CAGATAGTGC TITGATTATT CCAACTACAT CTCGTACACG GCG1'CCAATC ?rGTCTAAGA 1800 TGGTACCArr TACAATACCA TTTGCATTGT CAGGAAATAA AGGTACAAGT GAACCAGTC'r 1860 TGTATAAATA C 'TGGAACTT CAAGACAAGG CAGTCACTGT AGATGAATAC CAAAAAGCTC 1920 AGGAAAAATIG GATGAAAGAA AAAGAAGAGT CTAATAAAAA GGCTCAAGAA GATCTCGCAA 1.980 AACATCTGAA ATAACTGTTG CAAAATATAA GAAAGGATTT AGTATTTCCC TGAATGCTG 2040 AATCC~rTr TACATN'TCTA AAGAAAGATT CTAAAATGTA CGGACCCCCA AAAGTTGGAG. 2100 CCTCTTTTTG 'rCAGAATAGA GAAAATrTT GTTAATTTTA CTTGTTTCCT ATTGCTTTCT 2160 CAGCTATTAT 'rTGTTATATT AAAAGTATAA 'N'A~rT'ITTA =TATCAGAG TrAAGCATTG 2220 CAC=TCAGA GGAAGGAGTA TTr=AAAA AGAAAATGTA ACGTNTGCT CAAAAATGAA 2280 AGGATTTAGA AGTTTATGAA TAAAGGATTA ?rrGAAAAAC GTTGTAAATA TAGTATTCGG 2340 AAATTTTCAT TAGGTGTTCC TTCTGTTATG ATTGGAGCTG CATTCTTTGG GACAAGTCCG 2400 GTTCTTGCAG ATAGCGTGCA GTCTGGTTCC ACCGCGAACT TACCAGCTGA ?rAGCTACT 2460 *GCTCTTGCAA CAGCAAAAGA GAATGATGGG CGTGATTTTG AAGCGCCTAA GGTCGGAGAA 2520 *a.GACCAAGGTT CTCCAGAAGT TACAGATGGA CCTAAGACAG AAGAAGAACT ATTAGCACT'r 2580 *aaGAAAAAGAA.A AACCGGCTGA AGAAAAACCA AAAGAGGATA AACCTGCAGC TGCTAAACCT 2640 *.*GAAACACCTA AGACGGTAAC CCCTGAATGG CAAACGGTAG CGAATAAAGA GCAACAGGGA 2700 ACAGTCACTA TCCGAGAAGA AAAAGGTGTC CGCTACAACC AACTATCCTC AACTGCTCAA 2760 *AATGATAACG CAGGCAAACC AGCCCTrG'rr GAAAAGAAGG GCT'rGACCGT TGATGCCAAT 2820 a..GGAAATGCAA CTGTTGATTT AACCTTCAAA GATGATTCTG AAAAGGGCAA ATCACGCTTT 2880 GCTGTCIr TGAAATTTAA AGATACCAAG AATAATG~r TTGTCGG'rTA TGACAAGGAT 2940 *GGCTG CTr GGGAGTATAA ATCI'CCAACA ACTAGCACTT GGTATAGAGG TAGTCGTGTT 3000 *GCTGCTCCTG AAACAGGATC AACAAACCGT CTCTCTATCA CTCTCAAGTC AGACGGTCAG 3060 CTAAATGCCA GCAATAATGA TGTCAATCTC TTTGACACAG TGACTCTACC AGCTGCGGTC 3120 AATGACCATC TTAA.AAATGA GAAGAAGATT CTCTCAAGG CGGGCTCTTA TGACGATGAG 3180 a..CGAACAGTTG TT'AGCGTTAA AACGGATAAC CAAGAGGGGG TAAAAACAGA GGATACCCCT 
3240 aGCTGAAAAAG AAACAGGTCC TGAAGTTGAT GATAGCAAGG TGACTTATGA CACGATTCAG 3300 TCTAAGGTCC TCAAAGCAGT GA?1'GACCAA GCCTTCCCTC c'rGTCAAGGA ATACAGCTTG 3360 AACGGGCATA CTTTGCCAGG ACAGGTGCAA CACTTCAACC AACTCTTTAT CAATAACCAC 3420 CGAATCACCC CTGAAGTCAC TTATAAGAAA ATCAATGAGA CAACAGCAGA GTACTTGATG 3480
AAGCTTCGCG
GACAATCAAT
ATGATGCTCA CTTAATCAAT TGCACTTTGA TGTGACTAAG 974 GCGGAAATGA CAGTACGCTT GCAAGTTGTA ATrGTCAACC ACAATCAAGT CACTCCAGGT CAAAAGATTG ATGACCAAAG CAAAcTACI' TCTTCTATTA GTrrCCTCGG G;TCTCTGN' CTAGTAATCA AACTGGTGCT AAGTrrGATIG GGGCAACCAT ACGCATGTCA GCGCAGATGA TCATATCGAT GTAACCAATC CAATGAAGGA GTACATGT ATGGA~rTGT TTCTACAGAT AAG=CTCG CTGGTGT'-r CAAAACAGCT ATGGTGGTGG TTCGAATGAC TGCACTCGN TGACAGC7TA G'rCGGAAATG CCAAC'TATGT AGGAATCCAC AGCTCTGAAT GGCAATGGGA
CAATGCTTTA
GTCAAACAAT
TTTGGCTAAG
GAGTAACTCT
TAAAGAAACA
AAAAGCTTAT
AAGGGCAT'rG TTTTCCCAGA ATACACGAAG GAACTTCCAA GTGCTAAGGT TGTTATCACT GAAGATGCCA ATGCAGACAA GAACGTTGA'r TGGCAAGATG GTGCCATTCC TTATCGTAGC
ATTATGAACA
ATGAACTTTG
ATCAATCTCC
ATCCTCAAGG
GTTCTCAAGC
ATACAGATGG
TTGGGAAAAA GTTAAGATA TCACAGCTTA
GGCCATGACT CTGGTCACTT GACTTCAAGA CCCTAATTGA AACGCTTCAG AAACTTATCC CCAGATGGAA GCTATAGCTA GCCTATGACC TAGCTCATGG GACGGTC'rCG ACTTTATCTA GCCTGGGCTA CCCACGTTCT GAGTGGGGCC ATGGTGGTGA TACGGTGGCT ACACCAATAA CAAAAAGATG CTTGGGTAGG CTAGGTGGC'r ACACCATGAA TATGTAACCA ACTTrATTGC -AGTAAATGGG AAAATGGTAC ACTCCAGAAA TGCGAGT4GGA AAGTCAAATG ATGTCAATAG GTCATCCAAG ATGGTTCAGC CTI'rCTACTG ATAAGGAAAA ACAAAACCCA TTCCT'rA'GA TCTTGGGCAA GG'rGT'rCTCC GAACTATGCT GATATTGGTA GAAGGCTAAG AAATATG4GAC TGAGTCTAAA TACITrCAATG TGGI'GGAAC TGGCTAGATC TCGTTTGGCA CGTTGGGAAG TGTGGACGTT TGGGGTAATG TGCTAA6AGAA ATTAACAAAC GTACGACTCT ACCTrCCATC AGGTATCAAC AGTGCCATCA GGAC'rACAGA AG~rATGGTG AGACTTTGAA GGCTGGCAGG CCATGACG'rC ATGACTAAGT ACCGGTGACT ATGACCGATA A'N'GGTAGAT GCTGACAATA TCCACAATAT CGCGAACGTA TTACT'rGACT CC 1'GGAACT GATG'rACTAC TCAATACGC
CC'?TGGATGG
TTAAAGGATA
AGCGTATCGG
CTCATCTAGG
AAAAAA'rTCT
AAGGTATCAA
ATTGAAGAA
GTCAATCAGG
AAGGCTGGCG
ACTrGGGCAGC
CCCGCTTTAT
GTGCAGCCAA
GAAGAAGTGA
ACTTCCAACA
CCGTATCGCG
TATCAAGAAA
TGGTAGCGAA
TGGTGT~CGAA
TATCCACGTTI
CCGTAAGAAT
CATGATGCT
AAAACTTGGT
TGATAACGGT
CTTTGCGATC
TGACTTGACC
CCGTAACCAC
CTATCCACTG
CTACAATGGC
CTTCAC'rGTA 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 ACGGTAGCAC C 'ATAAATGG ATAA6AGTAGT TG TAACTCGT CAGTAACGCT CAACGGACGT GGGATGCAAA TGGTAAGAAA AGCCGGTGC AACAACTTGG ACCCTTCCAA G.CGATTGGGC AAAGAGCAAG GTTTACCT
AAGACAGAAG
AATCAACCAT
GGCATGCACA
AGCAAGAACT
ACGTTCTCTA
TCTATGACCA
AACTGTAAAA
TCGTTCGAAA
AGGATTTAAT
GATGGTAAAA
CAAACTAATC
AGCGG3'ACCT CGATGCTT CTAAGGCAGA CAAGGAAACA AAGAAAAAG? AACTATGCCG TTTATGTTGG AATACTGGTG AAAAAGAAGT AAGGCCTACG CCCACAATAC AATTGTCAAG TCTCAAGGGG TAGTCTCACT CAGAAATTAA TGTAGATAAC CGTAGTAATG GACTACTTA'r ACCAATAACT ACGTCGTGAC AATGC'rACAG ACAAGCTAAC TGACCAAGGT TTACCCTACA 'rCN'AGCA CTGAAATGTC ATGGAGTGAA TGAAACA'rTG GACCATTTCA CAAACCATAT GCTTCGTATT CTGGCTTGAA ACCAAATACC CCAAGGCAAC TATCACTGTG CTCTCGCGCT CAACTATGTT T'rGACGATAC AAGTTACTTC CAAAACATGT ACGCCTTCTT' TACAACTGGA AGTCGTGAAG CTGGTGATCA AGCAACTTAC TCAAGCATGT ACGGAGACAA GCATGATACA AATGT'rGCTC AGGGTATCTT CCCATTTGTA CGCACTCACT TGTCTGAAAA ACACAATCCA GTCGATGATG TTATCGAAGG AAATTGGTCA AACTTGGTTT ACCAAACCAT CCCACAAAAC GCGGACGTCT CAAATGTTAC TCTGACATTG TTTGATGAAA ?TCGTACC'rT TGAAAACAAT GGTAAAGGCA CCTTCAAGCA AGACTTTGAA GTGGGTGGTG TCGAAGGTGT TGAAGATAAC TATACACAAC GTGGTTGGAA TGGTAAGAA-A CTCAAGACAA ATGGACTAGT GAGCCGTCGT TTCCGTTTTG AAGCAGGTAA GACCTACCGT 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 GTAACCTTTG AATACGAAGC AGGATCAGAC AATACCTATG GAATTCCAGT CAGGTCGTCG TGGTACTCAA GCAAGCAACT AATAC'rTGGA CAGATTCTAA GAAAGCCAAG AACGCAACCT ACAGGCGATA CTTGGGTAGG TATC2'ACTCA ACTGGAAATG TCTGGTGGAA ATGCCAACTT CCGTGGTTAT AACGACTTCA GAAGA.AATTA CCCTAACAGG TAAGATG'TTG ACAGAAAA'rG ACGGTTGCCA TCACTAACTA CACCAAAGAG TCTATGGATG AACCTCAGTC AGGCCGATGA TGATATCAGT GTGGAAGAAG ATGAAGCTT TGAAGAATGC 'rrTCGTTCAG AAGAAGACGG GCAAGTCTTA CAGCTCCTGC TCAGGCTCAA GAAGGTCTTG GTGTCTAGTC TArGGCA'rAC ATCTTGGAAT GGTGGAGATG GTCT'rGAAAG AACCAACTGA AATCACAGGA CTTCGCTATG C7 GTAGT CGGTAAGGGA TGGAAATGCA TGAATTGCCA TCCTTGTGAC AGGTGCAGAA CAAGTAATAC TCGTGGTGAT TGATGGA'rAA TCTCAAATC CTCTGAAGAA CTAC'N'GCCA C -rTAAAGA GGCGGTCrT CGCGTGCAGA GATTGCCAAG CTTTGGTAGC ACM'GACTTT CAAATGCCTT TGA'rGGCAAT 'rAGGCAAGCC TGCAACTATG TTCCGCGTGG ATCAGGTTCA AATGG'rAACT TGCGAGATGT GAAACTTG'rr GTGACAGATG AGTCTGGCAA GGAGCATACC TTTACTGCAA CTGATICGCC
ATCAAGGCTA
TACCAATCTG
TTGTCAGGCT
AGAAAATTGT
CAGCGGAACT
ATGAAGCAGC
AAATAACAAC AAACCAAAAG CCT-rACTGGT ACCAAGACAT TA'rCTTTACT CGTCCACAGG 'rTGGTTAAG GCTrCAGAAAT
ACGGAGATGG
TAGCAGAAAC
TAACAGACAA
ATATTGACTr 'rGGrAAGACA GAGGAAGTAG CTACG=CA GGCAAGCATC AAATATCCGA CGGATAACCA GAAAGAATGG TGGAATACTT 'rGCAGAT'rAT CTCAACCAAT TAAAAGATTC CCAGATGCTC CAACTGTAGA GAAACCTGAG TTTAAACTTA GATCTTrAGC GGTAAGACGC CAGATTATAA GCAAGAAATA GCTAGACCAG AAACACCTGA CCAGCAACAG GTGAGAGTCA ATCTGACACA GCCCTCATCC TAGCAAGTGT CTATCTGCTC TCTTTGTAGT AAAAACGAAG AAAGACTAGT ATTTAGTAAA AAGATTACGG AAGCAGTCTC TATCTTTTCC AATGAGGT'rT ATAGTACAGA
TGGAGATAAA
ACCTC?1'GAC
AGACAATCAA
TCTCTTGACG
TGCTACGAAA
TTCCGAGCAA.
ACAAATCTTG
TAGTCTAGCC
ACCTC?1'AAC
AAAAGCCTGA
GAAGATGTCT TCTCAGGCTT TTGTTAAGCA CATAAATACA ATAGTGCTAT GACAAAATCA CCCAGAAAAA TCTGGGTGAT AAATGTTATG GTTGTGCTGG TTGAGGATTC TGAT'MrGT'r GATCAGGGGT TG1TATTTGAT TGTTGCGTAT TATTGTTAGG ATTGGTAGTC GTACTATTAT T'rGTGCT'rGG AGTGGTTGAG CTAGACTGTG AAGTTGAACT ATCTGATGAT GAGCTTGAAC TTTCAGTTGA TGGGGGT 'GT TGTGGAGCAG GTGACTTCCA CGTAGAACGA GCACCATTITT TAA.ATACGAA TTCTCCATTT CTGTAGAGCC CCTCTGGTAT ATTCCAATCT TCTGGATTGC 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 'rTCC'rTCAGA CAGGTAGGTC TGCCTACAAG TGGTGTCAGA GCGTATAGCC AGCAAATAGT CAATTTCCTC GTCTGTATAG GATAGGCATT TCCTCCAGTT AGGCTGTCGT TTCCTTCATG CAC'rAAAGAC GACTTTATGG CAGCGTAAGC AGCAGCCATC 'rGTTACTTGA AATGGCATr GGAAAGTCTT GGCGCGGTTG GCGAI'rGTTG CAGGGCGTAT TCATCAGGTG CTACAAATTG 'rAGAGGTTC CTGT IACC CCATAAGTCA AGACTCTTTT GCACGAGTTC CGACATTAGA AGAGGTCTTG ATGTCGTTT'r AGCCTGAGGG AGCCAAGCAA CATCATGTCG GTCAtCATAT GAACTCTTTT TCACTCCCAT ATCATAGAGC GGTAAACTTT GGCAGCGACC GTAAGGCCAT CGGTTAGAAT AGCCTGTCCA TACAGCCArr GAATATTTAC ATATACATTG GTTTA'rAGTA AGTTCCACCA TTT'GCAAACG TTTTCACTAC TTGCTCCATA TT~rTTGTCT GATTCGGTTG
GAGTAGTGAA
AGTCCGACCT
TGCAAGGTGA
TACTTGGGTA
TGTTTAGAGT
TGTTGCCAAA
GTCGATTCCT AGACCATT'rA TTCCACGGCT GGGACGTr'rC GTAGCCCCTA TCCCAGTTAT GTGAACGATA GTAGCAGTTG GATCGCTTTC ATAGTrGATC AAACAGGAGT A'TrGTCCCA GGGTAGTTAT AGGGCTCATC AATCGTAGAC ACCGTACTCC AAGGCAGGAG CATAGTCTGT
CCCAGTCGCG
GGCGTGCTCC
GTGTTTCT ACTGCTTGGT' TAATTCCGAA GGAAACATTA TAGCT=GCA ATGACrTAC CGTTAGAAAC ATCAACAATG CTTGCAATTC ATCGTCTCGA TAGGCAACGT ATTCCTCTGT ATTGTAAATA GTTTTTGAGC TTCTTGGTCT AGCCTGTTTC TTCTTCAACT ACAT'rrGTGT AGACATCCAT CCCAGTTGTG TGA'rTGAYGA CTTCCTTGAG GTAATTATCC CT'rGACTGAT
GTAGAAGCGA
TCCCACAGAT
AGTAGGTTAT
ATGTAAGCAG
TTGACTGCTT
ACCAAG=TC
GCCTrGAGGCA 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 GCTAA'rrACT
TCTCATACTG
GGCGGTCTTG
TGCTGATTG
TTCAGCACAG
GGCTGCTTCT
AGACT'rTA GTCCATCAGT
ATGTAGCCTT
GGATGTGAAT
TGAGGTAAAC
CCATAGTTCC
GATTTTTCAT
AGGGGTCATA
TrAAATTrATT CAT'rAGACAT
AATTGGTGTA
TTCAGATAAG
TTGGTTTGGT
TTCCAGCCAG CAAGGCTAAC TTTGAGCTrGC TGTCTGCATT GAGGTCTTTA CCATAGTAGT GTAGACCTTA TTTATATAGT AGGTCAAGAT TTCTI'GCTTG GTTlGCTTPT G7TTCTAACTG AATCGCTAAC CAAGCTTCCT GAGCCTTACG AGAAATACTC TGGTCGGAAG TCGAAGTTGA AAAGTAAGTC AACTTAA'CA ACTGTTGGGT GAGAGTTGAT CCACCTTGGA GGGAATTC'r TTGCAGATTG CCCAAGAAAG CTCCCAGGAT ACGGATGGTA TCAATCCCCC TCTGGTCGAA GAAGCGATGG TCTTCGATAG AAACGATTGC CTTAACCAAA CAGAACCCAA GTCAGCAATG CAACTAGTTT ACTCTCGGAT CGCCTAAGAC AATdGCTGCG TCTGTGGGAA TATCATTAGC TTGGGCATTG ACGCGGCGTT AGTTGATTTT TATTCTCGTA GA TACTA GAAGT'rGT'rG AGGCTAGGAG CCTTGCTAAC GTAGTAGAALA AAAACTCCTC ATAACCAAGC TTAAGAAGCT AATGC'rCAGA TACTTGATTA GGCGCAGAAT CGTTGGTT'rG T'rCATCTTGT TTTrACCACCT AATAAATGTT CTTTGATAAC ATTGAGATAA GGAATP'rGAG GGAAGGCACC AGCCTTGATT TCATATCCAT ATTCTCGAAT ATATTCAAGT GGCATTGATT TTTGTCCC'rT ATCTTGATGA TAGAAGCGAA TCAAATCGAA TGCCGGCAAT AAGTAGGTTT CT'rCCTGAGA AGAAAAGTGA AGAAGCACAA AGCAGATTCC TTm'TGGGCA AGGACTTGTT CCATATGCTG AATCTGATGT GGATGAAAAT 71='CATCGG AATCGCACGT TTrTTGTITrrG TrTCCTTGAC TTCAAAGTCG ATGTAATATC CATTATAAAC GCCAGAATAG TCCGTCGTTG AAGCTTGTCG AAAATAGGCT TCAACAATCT TGGCACGACT TCGTTGTGGA TAGTCCACTT GTACGA'rTrG AATAGGAGTT GGTTTCTTAT GTATAACAGC CAAGCCCTGA GACAAATAGT AGTCGI'GGT AGCATITGATC ATCTTTTCAA AGGGTACCGA GCTCGAATTrC GTAATCATG'r CATAGCTGTT TCCTCTGTGA AATTGTTATC CGCTCACAAT TCCACACAAC ATACGAGCCG GAAGCATAAA GTGTAAAGCC TGGGGTGCCT AATGAGTGAG 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 CTAACTCACA TTAATTGCGT TGCGCCCACT GCCCGCTTTC CAGTCGGGAA ACCTGTCGTG CCAGCTGCAT TAATGAATCG GCCAACGCGC GGGGAGAGGC GGTTTrGCGTA TN'GGGCGCTC TTCCGCTTCC TCGCTCACTG ACTCGCTGCG C
10620 10680 10711
INFORMATION FOR SEQ ID NO: 146:
SEQUENCE CHARACTERISTICS:
LENGTH: 11887 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 146:
TACATTCATT CCATCGGCTA CTCCATAATA CrTAGATAAA ACCATAGCTG CGGATACTGT AAAGTATTAT CAATTTrTAAT CAA.ATCATCA TTACCGATAA
AAGTCGAATA
TACTTCTGAT
ATCATTAGAC
AGAAATAACA
TGC TTT TGGT AGTATGAACC TCTGGACCTT TTTCTAGTGT CTCAAAGCAG ATACTGTCGA CCCAATTTTT TATAAACAGT CTAT'rCTAAA TrrCTTAATC CTCCA'PTATC ACATAAACAG AGAGGTACTT GCTTGCTTCT TAAAGCATGT TACCCAAGTG CTTTATTGAT AAAAAATTCT GAGTATTGGT CTCTCTTrT TATTATCTGA TAGGCTCCAT GCCGCAGTTG CTCAAAACAC ATACGTTGGT GAAATCTCAG ATAATGAAGA CTCACT'rACC TCATATTCTT CACCCTTACT TAACTGGCTA GCCAATAAAG TACTCGCAAT AATTGAAATA TTTCTTCAITr ATTGTATCCT CCTAATGTAA TTATAGCGTA TACTATAGAA TCAAGAAATC TACCACCTTC TTTAAATACC GTAAACTT'rT CAATTAATGA CTGCGCTTT T CAATCACGCT TTGATACTAA G'rTCAGCCAT GGATTCGTTT TGGAGTAGTC GTTTGGT'rCA AGTTATGAAG CTTAACAACT GGTGGTCAAT TTrATACTCA ATGAAAATCA TGTTT'rGAGG TTGCAGATAG TCTTTCCTTG TTTTCTCAA TCGCAGAGTC CAGCCAATGG GAGAATCTTT TCCATTAATT AGCGACACGT CTCAGCCAGA AAGAGCAAAC TAGGAAGCTA AGCTGACGI'G GTTTGAAGAG 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 ATTTTCGAAG AGTATTAAGA TTATT'rCTTC TACTACTCGA TCTAGGATAT CTACCGTGTC 'I'AGCTTAGCC AAATCGTTT CCTTTAGATA CACATAT TGG TATTT'rCTAG GATCCTTTTC AATCTTTGT T~rTTCGCTT CTGGAAAATA CAATACCTAG AAAAGAAAAT TGATGGCGCA AGTCTTTTGC TGCTTCTAGC TCCTCAAGTA TAGTTCAGGG TGTTCATACA CCAAACTCCC CCACAAGGAT T'N'GTCACGA CTAACTGCTC AGACTGCATT GCTT TCAAAT AGTTAGCAGC CCAGCAACTG TCTGCAAAAT CCCAATCGAT TTTTATAGAG TTTATTTCTT TCAGGCACCG TATAGGCTTC CATGGACCTT GCTTNTTTAG AATCTGCTAA ACTCATCTAA AACTCCTCTT CCCCCACCAA ATGGTGCTCA AAGGCATAGA CAGCCGCCTG GCTACGATCG CTGACTTCAA 12 1320
GTTTGGCAAG
CTGCGATGCG
CAG'rCAATTC
AATA'N'GGAC
CTGATrTCG TTrCATGAAGT ACGTGGGTCT TGACCGTCT TAGCCCTTGG CGATGAGNTG 'VCCATATGAT TGCGGTGCTA CCAGCAGCTA CCTTACTGAC CTTGCTCAAT GGCCAGCTCG CACTAGAAGT CTGAGCATA CATTrGTCCAA ATAAGAGGTC CTAAGGTCGC GrCAATCCCA GACCCAGTTC CAAGGCCAAG CTACATCGTC TTGGAGGTCA CTACTAGTAA AATTTTCATC ATATCAACCC CCAGCCCTTG TCAACCCGCT CCTTGATATT AAACCAATCC CATTGTCCAC ACATCTAGGC AAGATGCCTG CGGAAGATA'r GCTCCTCGAT
TAGCCTTTGG
ACAATCAAAA
TTCATCTCAG
'rCAATCCCT
CACCAGCATC
TC'N'GGCTTC
GCATCACAAT
GAGACCCT1 AAGTAGCTT TCAACCCCAA TTTACTCCTT TATCATTCCT GAGAGAGATA AAGAGGTCAT 1380 GAGAACATCT CGCTCACGCG 1440 TTCA.ACCT'rC TTGCTAACCT 1500 GGCATGAAGC AATTCATCTG 1560 TAAGACTGGC ATGATTTTTT 1620 AGGCCATTCT TT~AAGGATTG 1680 ATCCATGACA ATGACATCTC 1740 GCACGCCTCA CCCACAACr 1800 TCGGACCATT TCATGGTCAT 1860 TATCTAACAG GGGAATACGG 1920 GAACTGTTCC AGCCATATCT 1980 CGTCTAAGCT CCCTAACTGG 2040 CATCTGTCTG ATAGAGGTAG 2100 TAATCAACTC TTGCAGGATA 2160 TATTCI'GCTT GAGACTAACC 2220 GAA'CCCTTC TATCAAGCTC 2280 CCCGCAAATC CTTCGGGC1' 2340
CTTGGGAGCT
TCGCAGTCCA
CACCT'rCAGT GGCArGGcCGG TTTCT'rAGGC
GTCAAGAGTT
TAACTCAAGT
TGCAATTCAA
AGGGTATTGC
AATTTCGTCA
AAAAGAATTT
AAGAGCAAAA
TGGGTCTGCA
CTAAGATCAC TCTTGTCCTC AAGCTCTTT TTCTGCTCCA GTTCAACTGG TCGCAAATGC GTT'rCTAAAA TAGCTGTGAC ACTCTGCAAC AAAGCCTGCT GACTGATACC CGATAAAATC GTATCGTGCA AATCCCCAGC AAT'rCGCTTC TCTTTT'CTCT ATCCAATTTC ATGTGGGCCG CAAACAACTC CTGACTGACT CGTTCCTTCT CGATGATTTC cCCTCCTGA
GCAAGGCTCT
AAGGACTTGA
AATAAACGCT
GAT=CAGC TTTTTGAAGA GCCTCTGTCA AACTGGCATC CAAATCTGGA TCTGCAACCT TGAGATTAGC CTGCATTT'T CT'rAGAGAPA AAAGCTTAAG TTTACCTGA'r GAACCACTTC TTGCCCTGCT GCTCTTCGAT CCCTCGCCAA 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 AACAGGGCTA AGAGACAGGT CATGGACATG CTGAAAACCA ACAATAAAAA GACAAATTTT TCTGTTTTT'r CGACA'rCGTG CAAAAAGATA GACCAGTCAA AATCAAGTAT T'rCCAGCAAG CTGTGGGAGA AAAAAAAGAC AAATAGGAAG GAGGTGAGAG CAATAATGAC ATAGGCTTGT TT'rTCATCC TCTAACCACC TCCACATCAC CAATCATAGT GGTCAAGAAA ATCTrGACAC TCTTGTTACT Cr'rGAGATAG TCTTTTGTTT CrTGATGATA CTGT'rCATT'G CGGAGGGCTC GC7rGGCTG GTTGAAAAAA A'rCAAATCCC CCACATCTAC AGGTACGATG ATTTTGGTCG TGTCATGATT GGTTAAGATG ACCCTCTCCA ACAGATTGAT ATCATCGAAT TGSGCAAGTCT ACCAACGATT TTTCTCCTrC TTAACCGTCA CTrMCCTC GTTCATCATC GGGTAAAGAA TAGCTAGAAT CACAAAAGGA TTGAGCATAA 980 CATAGAGACA GTrA.ACGCTG AGACTGACTT TTCCTACCAT CTTTCTGAGG ATAATGACAT GATGAATAGT GTCCTTGCCC GGTAGCTTGA AAAATIGATGA CGACCTC~rC AAAAACCAAA GAAAGAGGCT ATAGATAACC CGATGAAAAA GAAGAGAATG TGTAGTAGCG AATCAAAAGC CTGATACCAT CAAA.ATCAGA TAAATr'rrCT CATAGGTTCA TCAGTCTGAG GATGGAAAAA
ATGAAGCGAA
AGATTTCCAA
T'rGGTCTGCT 3060 3120 3180 3240 3300 CTAAAAGAAG ATT'ATTTCCC ATAGTATCAG CAGAAAACGC GAAGACAGGC TTCGATAAAT TCTAIrTTAT CACAATTCAA TACGCCTTTT TCATCTGATC TTTTGTAGAT CTGCTATGGT T'rCCAGCCTT GGACAATGCC ACGGTTCGTG ACAATCCCAC GGATTCCGAA CCCCTCCACT AGAAGGGCCT GCATGGTAGA CGGTTTTCGA TATAGGCAAA CCGAATTrCAT AGGCTCTTTC ACAATAGGAA CGCGAATTTG TTCCTTTCTC CCTCGGGCAT ACAGGATTCA TCTCTTCTAC TTCGT-rCCAA GGAGGAGATT TTTTGAGGG CTGCGCTATA TGAGCCAGCT T~rGATTGAT 1AAAAAGGAA AGTCCCACTT TTGTAAAGAG GCAAGGAAGA
TCTTTACCAG
GAAAAATGC1T
AAAAAGATTT
AAAAGTCACC
GCAACAAAAA 3360 GTI'GCCGCrA 3420 AAAAAGAGGA 3480 GCrCCTGTCA 3540 TCCTCTCCCT 3600 AGGCGCTGGT 3660 ATAGTC'rACT 3720 TAGATCTGCT 3780 CAATTCCAGA 3840 CATATCCAGC 3900 TTGGGCATTG 3960 CTTTGCTTrCT TTTAATTTTC CATAAAGAAG GGCACAGTTA AGGGAACACA TAATCAAGCG AATCACTTCT TCAACTGTCr AGGTTTCAAC AGCCTTAGCA CCAAAAACCA AGCACTTAAT AACCALAGAGT TCGACCTTAT CTTTCCATTC 0 0 *0 CTGACCCCAT TGATTGAGGT GCTGGTGCCA CCACGACCCG GATTGTCTTG GCATCCATTC CTTGCTATAA TCTGCTAGAT GAGTAATTCC TGCATGACAT AGTCTGAAGT CCTAACTCGA GGGATGACTA GACTTGACAG AGAACCCGTT ACAAATAAAA TTCTCT'rCCC TTATTACTTC TCGACCAGCA AACTCTGTCG ATGAATCAGC TCCACCTCAT AATCACGCTG GCCACTACGA ATAGGTCCAC TGTACGAACA CAAAGCCCAC TTCCTTGAGG GCGATTGCCA GCTTCTAA-AC TGACA'rGCAC TTGCAATAGA CAGGCTTGTC CAA'rCCAATA AAAAAGAATC ATCCGTTGGA TACCACAGGA TTCCCCCACC CACCAGTCAT GGCATTGATA AAAGATCGAT TTCATCCAGA CAAAGCTATT ATAGGAACTT TTGTCGTCAT GTCCTATCCT T'rTTrAAGG T2PrGGTTGA CCAGCACCAC TACTCTTGGC 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 TTCTGCTCAA GGGCATAGAG GATATGCTCG TCCTTACGAT TTCTTGATAT AAGAGCTCAA TCCCCAGATC GGCCCAACGA TTGCGCATCA AAACTCAGGG CGATGCCACA GTCACCACCA 1. 1-1 1- 981 AACGGTCTGC AAATCTTGAC TGGCTTCTrT CAACTGTCTA AGCAAAGGCG TCTAAATATC TGTACTCAAG CCT'rCTAAAA GCT'rGCTGGC TACTTCTACT TGATCGATAA TCTTTTCTGA 4860 4920 TTTCCCCTGT TCCAAGGCTT CTACCAGAGA AGTCACCGTr 'rT'rGAGG Anw"GATTG ATAT7'TTGCT TGATTTGCTG GACCATCTGA CI'CGATACAG GGTCCATCCC ACTAAGAAAT CACATTCTAA AG7MCGTTIC ACTTGTIGAAA CCAATCACGC TCCAGAACTG 'rCGCCAAGTT TTCTTCTTCT AACCAAGCAG CCGATCAAAT GACTGGTAGA GAACCAAATC CTCTGCCACA ATACAGGCAA
GGAACCATTG
ATCAACAGAA
ACTAGAACCT
AAAAGGTCTT
TTGAATCAAG
ATAGATACGG
TA'rCAAAGCT
TCTCCTCGCT
ACA'rCATACA
AGACCAAACT
AAATTCTGAC
CTATAGTCAG
TAGCTGTCAG
AACTGCCCTC
TAAGCAAGAC AGCGCTAGTC AG=rCAACA GAGCCAGTAA AGCCTTGACA ACCA.AGACAA TTI'TCCCTTC TCGTTCCATT 'NC CCACAGA CACGAACACC GAGGAAGCT CCCATCAAAG AAGTTrAAAAA 4980 CCACTrCCT'r 5040 TTGAAAACCC 5100 CCACCI'TCTG 5160 CGTCGCCCAT 5220 AGAGCTCCTG 5280 CGACGCTGCC 5340 TTTCTAGAGA 5400 CAATCGTTTC 5460 ACATATCTGA 5520 GAATATCCTT 5580 AGAGTTTTCC 5640 ATCAAGCGAT 5700 AAGACCTTAA 5760 AGCTGGCGAA 5820 CTAGCAGTCT 5880 TTGGCAAAAT 5940 CGAACCCAGT 6000 GCAAGTTTA ACAGCAATCA AACGATGACC GAAAATTTCT CATTGGGACC AGCATCCATG CAAAGGCCAT AGCCTCA'rAA TTGTCGTAGC ATGCATAGCC CATTTCCTT GAGATAAATC CGTCGAAAGT CGTCGAGGTT TTTTCTTGTC CTCTAGCACC AAA'rTCTCC ACTATCCTTA
GATTAGGCCT
AAAAAGCAAT
GCTCTAAAAT
TCTTGACTCA
GATAAATGCT
GTAAAGTAC
GAGGCATCCG
AGGGCATTTT
AATCCTrrCT
CCAAGTCTTT
AGGCCTCTCC
TTTT'rACACA
CTCCTGACAG
TTTCTCACGA
TAAGI'CCACT CCGAAATCAA CTCAGCCCTC ATATAGATGC AGCATA'TTCA CCTGCCCAAT TCAGATAAGA AAAGGCTGGA TC'rCCG'rrAA TTCTCCAATC AGCATATCCT GATAGTCCTT TCCACACAAA GTTTCATCCC AACATAATCA TAGCTAGTTT TCCCAGGCTC C'rAGTGCTCC
CTCAGACTGA
AACCTGAGGC AAArGGCT TCCTGTGCCA ACTGACTTCT AACCATTACA AGCCTTGACC AGGGCGGACA AACCACTAGA CCGTAGGCAT ATTG'=~GA GTATCGATAC GGACAAAGCC GGTCAATAAT CTTACTCATC 'TTCGCATGCT CGACCTCATT AAAATTCGTC AGCTG'N'ACA TTGGCTGGTA AAGCCACAA 7'rTCCAAAGT TAGAGAAATA CTGCTAGTAG CAGGCACCAT GTCACGCT'A GAGATTGGTT CAAGTCTGTC TCTACAGGGT ATAAAAACTC CGAGALAGAAG ATCCAATCCA AGCTTGAAAT ACTTGAGGAC ACACCCGCTG CTCACCAGCT GGACGATAAC TTGTAGCTGA CCATTGATGT GGTCGTCTCT GTATACATAT 6060 6120 6180 6240 6300 6360 6420 6480 CTCTTTTTcT TTTTTC'r'1'C 6540 p CCCAATA'N'T GATAATAGCA ATA'orGCGT TIGTCTGAACA GCTCCTTTCT CT'rCTAATCT GGTTACCAAG GCTATGATAC AACCTCCTAG ACCATGGCTA AGAGTCGN'T CAACCAAAAA TTrTTAAATGT AAATGCGCTT GACTGAGGAT AATCGCAACT TCTGCTTGCT GGGTTAATTC CTTGCCCCTrA 7?TTTGAACC.A CTTGGATGGC GGCAATCACC AAATAGGCGG ATAAATCCAT AAAGCGAA'rA GGTTGGTCAC TAAGACAGGT GGCAATCATT TCAGCTCGAT TGACCAAGAT ATAGTAGTCA AATACTGCAC GAA'rCGCCGC CCGTTTCTCA GGGATAGCCG AGTCAATCTC ATACTCCAGT GAGGCATAAA CCGCCATGGA ACTCTCTGCA GGAACTACCT TACAGGTCAC AGGATAACCG TAAACGACCG CATGTTCCCC GACACCAACT TTTTTTGTCA TTTTTTCCTT TACAAGTATT AATTC =TCC TATCTATTTT CCATTCACAA TCGCTTCTTT CATTATTGAA CATCTACGGT CGTATTCACTr GCCAAGGCAC ATTGACCTGG AACCAACCTG TCGTCATAGA AAGAGGAAAG TCGTGAGGGA GCATGCGCCA CTCATTAACA AGTACTCGTT TCGGCCATTT AATCTTCGCC CAT'TCTTGAT CGTTTAAATC AGCTTAACAT TTTTTTGTGA ATACAGGTTC GGAAAACGCC TTATCAAGTA TGCTACGGGA TGTAAAAATA TATACTATAG TAGATTGAAA TTGTTAGAAA TCGATNTGAC TGTCCTGATC TAATAVTGA TAAGTTATCC TAAAAGTATT TCTA.ACTAAA GGATCCTATT CAATTACTAG ATGAGCAACT ATTT'rGGTTA CAATGTCTAC 982 AGGAACGTAC TGTTACAGGC TCTCTATCCA TTCTGCTAGT TCTTGTGCGT GTGTCAAA'TT CCCACCACCG CrCATCTTGG CACCCAGAGC GTCTCCCTCA GGGCTACTGA CTCCAATTTC T'rGTCCCAGT CCrrCAGCAT C -rTTTGTGA TCCCAAGGCA TGCAAAAACG GTACGGCATC 'rTCACGAGTA TGACCATAAA CACCCGTATC CTCAAGTTCT GTAAATCCTA CGTTCTTGAT CTTAGCATCC AAACCACTAG GATTCATATG T'rCTAGTACA TCATGAGGCA GATCAGCCTG TATGCTGATA GCCGCTGACG AACCCATCCC ACAACCAATG CAGGCTTCTG TGATATTCAA CAAGGTATCC TCCTCATAAA GGCGCCAAGG CTCCACCTCC AAAAGAGGCA GGGAAATGGC TATTAAAATT 
ATCTTACTAT GTGCCTGACC T'rACTAGACG AAAAAACGTC 7wTATTTTTCA ATTATA7=T CACAAAAAAA GCGATTGTTT CCCATTCGCC ATTATAGTTC ACAGAATAGC CTGAGCGCTA TAAGCGTACT ACCATCTGCC ACGACGAAAG AAACTCCATA CCATTAAGTA TTGACAACCT GTTTTAGTGA CGTACAAAGT ATAGGTGCGG TGT'N'GGAGA AATAGGGTTC AGTATCATAT GCT 'TGCGTA TCATAACTCT TAAATAATCG ACCACGAAAA 7TTCT'rAAGT AAGTTATGCA CTTAATTTGA CAATTCAAGA CTAGAATAGT ACACCTCTAC TTCTAAAATA GATTTATCCT GTTATTATCT CA'1rTACTA ATTATGTTGT TGTGTTATAG ATTGATTGAA AACTATCACA TACTCAAGGT CAGCTCACAG TAAATTTAAG TCAAACAAAT AATTTAGTCA 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 a.
AAAT'rAAAAA AATAGAGCAA CATAAATATG ATTACAAAAC AGAATGTAAT AGTGTTCTAC AATTTTTACT AGATAAAACT GTAAATTCTG AACGAAGGAT CACTTrCTTCA ACAGAATTG GAAATTTCGT AAX3TAATTTA 'rCATTCCAAC ACGGAATAGC TGGACTACTG TTTCCTCTAA ATAAATTGTA CCCCCCAGAA CTGGA7TTrA AAATACT~CTC TATCATCAAG AAGGCAGTGA
CAATTAGAAC
ATCrATGGTT
CAAACGTCAC
GACACACACA TATGAATATC AATACTCACT GCTATrGGT ACTCCTACAT TTATTTTCTA TCAGTAAAAA TCAATACTAT CGCTAAAAAA TTAATAGCGA ATTATGATAC TCTAGAGCAA
GATGCAGGCT
CTACAATTAG
ATAGACTTTG
ACCAATGACA
TTCCT1ACAAA CA'rTGGGAAA ATCTCGTGTC CTATTATCAT TAATAAAATA CTATCAAT ATACTCTTAA AATTTTCATC CACAATAGTA TAGGGGAAAT TTATCATTAT
0 0 0 *0 00 0 GAGATACAGC CAAACAAAGC CATArCTTT ATTTGCCTAT ATACATTCCA TACTGA.ATTA TAGGAAAq'TT ACAACTTTCT TGTACGATTG 'rGACGGAAAC ATCATCTAAA AATGATGACA 'rCTACAATCA AAACAAATTA AACGAGATGA TCACGGTTrA ACTTCGGAAT AGGAAGCATG TGTGCAGACA TAAGGAGAAA TGCTGCTAAA CAGCTGTTTT ATTATGTGGC TCAGTACAAT TGTCCCCT ACTATCGCAA TCAAACACGC ACTCTGGATT TTCTCGTCTTr GTCGCCCTrr TCTCTGACAC CTTGAGCTAC TTAAGGACGA CCTGCATGAT TTGTCGCCAA TCTGGCTGGC TTATCAACAC TCTGACTTT TGTATGAGGT TGAAAAAAGA ATTrTAGACT ATAGCTTT-GC TCATGGATAT TGTGGAATTG TCrAAAGTCT TAGAACCT'rC TATGTrTTAT AATGATCTCC AAAAAATTAT TAGAAAAAGT TAC-TC'rAAT 'rGGTGCAAAG GAATTTCCGG AATAATCTTA AAAGATATTA TTAGTAAATA TCAAGAATTT
ACTGAAAATT
TATCTTTGTA
GTTTTTAACC
CAAACCACTG
GCATGTTCTG
GATTTGTT'rG GGATATTGCC ACGGAATAAC CTGATGAAAA AAATCCAACA CTGATGTTTC AAGGAGATAG
TAGCTTACTA
GGTAATTTTA
TGGTAAAGCA
8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 GGGTATATTG GTGTCTATTA AATAATAAAT TCCCATTCGA ACTATGAAAT TATTTTGGAC AAACAACATA TATAGACAGT TCATCATTCG GCGACAGTAT TT'rCTACCTC CCCATTATCA T'rCGCTCCGC TAGCGATTTT ACTGATTTCC ATTTCAGAGA 00 0 0 0 0000 0000 Co *0 00 0 0 CTCT'rTCTCG
GCCAAAATCA
TCATTAGTTT
CTGTCTGCCT
GCCATGGGGT
GCA'IrCCTTA
GTCATTGCCT
ATTGAAATGT
GGA7'rCTAGG
AAATCCTGCT
CAGTCATTAT
ACATGATGAA
TCAGGCAGTC
TCAATGT'rAT
TTTTGGGCCT
CACATACAGC
AGATTTTCA-A GAAAATAGAG CTACGCTATT TTGACAGTAT GATTGTCATC ATCAACCTCA CGCCCTCTAC ATCAGTGTAA TCTGATGAGG GrTTTCCGTA AAGTATTCAA ACTATTTCCC GTATGTTATT CGACATACCT ACTGAGTTT AAGAAATATT 0.0* o* TTrCAACATCT TAAACAGCC TG7'rTCTCAC GACCAG'rATG GCI-rCATCCA ACAGACACTA TCATCATGTC TTGTGGAGCT TCCG CAC GCACCTCTTG 'rCCTTTGTCA AGCCTATATC GCATTCTCAG CCCTCGCCTA GGACGGTGGG CTCTGCTCTG TAGCCCTATC CATAGCATCG.
'M~AATTGC TCrrATCTT AAAGCATGTG TAGATTTCAC AGTCACTTCT TTGATTAAGC ACCCATTCGC CATTATAGTT CCTGAGCTAT AAGCATAGTA CCATTACCTG CATN'AGGTA GCTCCTGATG AACGGAGATA TCCATTTG GGTGGTCTAA ATTGCTCCTG AGGAACGGAG ATAATACCAG T'rGTTGGATC CGTCTTTCAC CACCAAGGTA TTAGACTGAT CATAAAGCCA GTATAATAAC GCTTCTCTTC CCAACACTGT CTTTAGTTTT CCATTrTTAT CCAAGTAGTA GGGTAAGGAC CAA'TACT ATCTCTCCAG ATAGAGAATC TCTTTGAGGT CCCCCTTTT TCAGAATCAT CTGCAAATAC CCAACTTGCA TAG7"TTT'NT CACCAGATGA GGCGAAATTA 984 CTGGCTCTGC TCCTGAGGTT AAAAGATACC GTCATACTAC ATTGCCATCT TGGATGTGTC CCCTCGGCTG ATTGCCCTCC GCACAACTGA GCATTGGGCA ACTCCTCGCC CTGCTCTCCA ATCC7rGGCA ATATGACCAG C-AGTAATCTA TTTAAAAATA GT?1'TCTGTG AGATTTCCCT ATIACTCTA ATAACTAGTA GTArr7rCA TGACCAGTTT CATCAG'rrCT ACGATTA'rCG CAAGCAGCTG TCTTTGCCCA TATCCCCAGT GACAAGATGG AGCACAGTGG ACATTCTCGC CCCGTCCCTG CTCTCCCTAT GGCGTTTCGG TGCAGTTAGC ATTGATATTT 'TGTATCTTA TGTCAATGGT TAGTCAAGTT CAACACTCAT AACTAACGAA ATGCTT=AA TCTCCCCAAT CGTCAGGTCA AGTACAACAA GAGTGTTCTA ATATAATTAT AAGCGCCCTG TCATTACCGA GACAGAATAG CCATCTACGG TCGTATTCAC TGCCAAAGCA CCAGTTGCCA TTGACCTGGA ACCAACCTGT CTTCATGTCT1 G'rACCAAG~r
GTACCATTTG
ATAATACCAA
GTAGTACCAC
TAGGTAGTAC
GTTTTCTCCA
ACCTGTCTCT
TTTATCTTCT
AATCTCTAAT
CCACTCTGAC
ACCTTTAGAT
AAAATAATAG
GTAGTAGGT
TGTACTGGTC
CAAAATTTT-C
TAAACTTTAC
GAACCATCTT GATACCAACC AGTTGCCATA TTCCCAAGGT TTTGCCAACC TGTTTTCATA GTGGTACCTT CCTGA'rACCA GCCAGTGGCC TTATTACCTA CATATTGCCA ACCTGTTTGC CAAGTCGAAT CATCGTTTAT CCACCCCGCA TTAATTTCCG TCTTAGCTAG ATA.ATACCAG AAAGAATGAT TTTGATTAAA GTAATAG'rTC GAATCT'rAC GT'rTTTCCCC GTACT'rTCTT GTTTTCCAAC CAACAAACTC 'r'GAGCACT TTTGGAAAAC CTTCTAATCT GATACCATTT GGAAACGGGA TATATTGCCCA GCCGACAACC TACTTACCAT CAATCACTCG CCAGTAGGTT CTTCCGT'rTT CTTGGACAAA CTCCCATCCT CCTAGCAAAC CAAAGAAAAA TACTGTCAGT ATCTATATAC CCTCCAATAT TAAATCCACT CATCGATAGT 'rTGGCTACCT GTAACCATTG 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880
CTCCAGG 11887
INFORMATION FOR SEQ ID NO: 147:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11340 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 147:
CCGGTATGTT
GTACGAATAC
AAACGCTGAT
CATGGAATCC
ACAAGGTGTT
GTATCTAACC
GAAAAAAGTT
GCAAAAACTC
CCCACAACAG
AGCCATTATT
TCTkAGAGAT
TGACTCAGTC
GAATCAACGC
TGTTCCTAAT
CTGGAATACT
CTTAAGGAAT
AAAATATCCG
TTTATCC1TT
TCACAGACAA
ATGACCGATT
ATATTCCTTT
ATCTACGTGA
CTATCAATCG
TTTCAGCTGG
ATCAGTTTTG
ACGTCCCTTG
AACCTTATCA
ACCAATCTAA GCTGGCTGTG CCCTACAGTT TTACAACCCT GTTTTGGAGT CTGGTATTTC GAAAATATGT CTAAGAAAGA CTGAATGCTA ATACAACAAA GCACT'PTCTA GTCTTTACAA GAGGAGGTTG AAAACGATCA GGGGGAACCT TA TTT CTATC GTAATGTAAT TCCACCAAGA AAAAGAAAGA AACCCTTGCT GCCAGAGCTG AAAATATCAA TTTCTAGGTG ATGAAACAGA AGGTTTTCTA ACTTATATCG ATCAAGAGCA CTTTCAAATC GAGCTCTCTC ATCATTCAAC AAAA.ATAAAG AACGAGATTT
GCCCTTCTCT
CTCAATCTAA
AATGTCGCTG
TATAAAACGG
CGTATCGATG
TGGCATCTGG
AAATGATGGT
C'TTTGCTAA
AAAAAACAGA
CTTCTAGCGT
TGTTCGCTTA TCTGAAGCTG TTAATCTAGA TATTGATGTT ACTCGAAAAG GTTGCAAACG ACCTTATTTA GAGALATTATC TGGCCATTCG TACAGCCCTT TTrTTTAACTC TCTACAGAGG TGAGAAAATG GTTGCTAAAT ACTCAGAGGA GCGCCATACA CTAGCAACTA GGCTCTATGA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 TTTTAAAGTG CGTGTAACAC CCCATAAACT TGCGACTAAA TCACAAGTTT TAGTCAGTCA CCAACTAGGA CATGCTAGCA CACAAGTCAC TGACCTCTAT ACCCATA N'G TTAGTGATGA ACAAAAGAAT GCTCTGGATA GTTTATGATT TTACGTATTT TAAATTATGT AAATAAATAT CAAAAAAAGA ATTTATCCAA CTACCGCTTC AGCGATTTCT TCACGGCTAA ATATCAATGG TTTTTAGCGC CI'AAGAACA TCTTCGCGTT ACATCTTCTA CTGCAGCAAC GTCTTCAATA CCAAAGAAGT AGTTGGCCAA CTTCTTTTG TACCAGCGAA GTAGCGTGTG CGTATTTCAC CCCACGAAGG CACCATAAAT CTTGATGTCT TGGA'rTTTTG ATTCAGTAAC GTTAGCAAAG ACTTCAACCT TACCACTAGT GAATPTGATTr AGGTGATTA CCATAGT'rCC ACTCCCAAGT TCCAAACTTA CCACGACG4GA CGTTAAATTC
GTATCCTTGA
TCTGGCTACT
I I Lr'rTGCTA
GATTCAAATT
AAGAGCAAGC
AACI'TCI'AC
TGACCCAGGG
TCT~TCTTTGC
TGCGATTGAT
C7TTTrTCA'r ATTCAT'rGAT
TATCTTTTGA
AACCGTGGTG
CATCAATCTC
TATTGATAAC
T'rCGGCCAAT TC~rTTCrG AAAAGACGTA GTATTCCAAG AGTAAATCAC GGAATrrC AATATTGGr'r ACACGGGCAC GGACGGATTT AACCTTAAGG GCATTTGCGA GGACTGACAA CATGATACGG CCGTTGATAT AGGCT'rGGGC AACGTCATTA CGACCTGTGA ACTCAGCTTT CrOrAr-T1AG AACCPCTTGA AGTCAAA'rGC 'rVCAGTCATC
GACTGCTGATT
CACACCTTrT
ATCAACGTCA
ATTCCCACAG
AACCCCAAGT
CTTATTTTCA
AGCTCCACCA
GTTGATTTCT
AAGTAGGAAG
AGATGATCGT GTAGTTGAGG TTATTTAAAT CGTGGTAAAC CCACTAATAC GGCGAACTAC TCGATAGTGT TCTGGTGACG ATTTGATCCr CATCCAAAAG GTGTCATTTG AATGATT'GAT CTGGAAATAA CCTTCCAGTC GCCATTCCTA GAACATCTGC TGGATGGTCT TCAGCATTC TVATTAATT CTGCGGCTC TTrATCAGCGA TAACTTTTAC GCAGCAAAGT TAAACTTACC AAACCTAC'rG CTGC'TACTTC GCAACTGCAT GATTTCCTT GCGTGAGCCA ACATCTrAG'r CTTT'rATaT ATTCGTTGAC GTGTTTAAAG GCGTATTCTT CCAAGGCAAT ATTAAkAAGCA AATGTATTTC ATGATATCCC TTTACTTTAT ATGATAGAAA CTCAATACCA TT=CCCGAA CATAA'rCACG ACCAACAATG ATAGA'rOGCT 'rGTTAATCCA
TAATCTATCT
AAACGCTTCG
CTCAACAGTG
AGGACCAATA
GAAACCTTCA
GATGGCAACA
TCGTTTTA'TT
TACATCACTT
AT7TrCCATTT
ATGTGTACAC
GCTGCGTCAG
TCGTATTTCT
TTTTCITrAcG
CAGAGTAAGT
CGATGATGCT
CAAGGATTTC
TGAATGGATG
TGGGTGCCCG
TGATGCT'rCG
TCCGTATTTC
1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 ATGCAATAGC ACGACCGTTA CACGGGCTTG TTCTTCTGTC AGGGAGAGTG TAGATGGCTG CAGGAGTCAA ATTCAATTTG AAGGGCAr'r? TCAGCGGAAA CTTCACCCAT GCGGAAAGCT ACCGTTGATG 'TCACCTGGTG CATAAATGCC TGGAACTGAA CTTGATACAA CCACGATCCA ATTCAAACTC AACCTC'CLCA ATACCTTCAA GGTCTGGCAT ACGACCAATT TCTTTTCCTT CAACCTTGAT ACGAAGT'rGA GTACCAGTCA AGATGGTCAT TCCTTTACGC TCCACATCCA TAGCTGGAAC TATACGGTCC AATGTCATGA AGGCCTGACC GACTTCGATA CTr'rCTGGCA CT'rCGTrrCAT TTCAAGAATG ATACCAGGGA CGTTGATCTT GTTGACTTTT
GAAAGAAGAG
CCAI'TrCCT
TCAAGAATCA
C'TTTGCTTGC GATGATATCG CAATGATTTC TTGCAGTTTA AGCGAAGGTT CTTAGAAACT ATCATTTCGA TAACAGTCAC TTTTGAACCA CCGACAACTC CACCACCGAT GATAACAACG TCATCACTAG TCATGACAAG TGGAGATTCC GAACCACCAG CAAGAATGAT TTTCTTGGrr TCAACCAATT CAGAACCA'rr TACCAAGACG TTATGAACAG TAACTCCGT1A GCTACGAAGA ACAACTTTAG ATTTAGTTTC 5'AAAACTTT ATCACCATAC CACGATTGC AGCATGACCG TAGGTCTTGC TTGGAATACA TCCACGGTT ACAACCGCAA CCTTACCGCC GAA'rrGGGCA CCTCCACCAA rCACAACGAT ATCAAAAGCA C7rGCACAG GTACAGCGCT AGCTTCTGGC CTTT CAC CAAGGTAACC GATAAC?='CC 7 TGTCTT 'rAGTGATTGT ACCAATTCCT AGTCCTGCAA CACCACCAAC AAGAGTATTA TCCATATCAA CAGTGAAGTr AGGATTMrCA ATAI-rrrCAA TAATTrTCAGC GTTA'rGAAGG
AAGCACGTTC
GCTTTAATGG
TCATCGCTCT
GATCCTGCTC
CACCAAGTTC AGATTTCTCA CTGCAACATA ACCAGCAGGA TACCATCA'rC GTTTGAGGTA CAGCTGTTGG GA'rGTTTTCC AGAATGGCAA TCAAGTACCC ATGAT'rTCCA AAAGGATTTC ACGATTTGTC CTTCTGTCAT TCTTCCTTCC T'rTATCTATA ATTGGCGTr'r CAATCAACTC
ACGACACGGT
rTGACGACAA
TTAATA.ATCG
GAATTTTGTA
AAGGCTACAA
AATCCATTAT
TTGCCATCT
GCAAGCGAAA
AGAACCTTCT
GTTGGCGCAG
GG'rCAA'rGGT
CTGGCTTCTC
GACCAAAGGA
ACTCACTTG4G CCAGTTCrGA
CCATCCCAAC
CTGTCAATGA
GAAGGTCTGT
TACGAAGAGC
TCAAGTAAGA
ATCTTCTTCG
TCCTTCTTTT
ATCCACGCCG
TCTTAAAAAT
TTTCAAGTCC
TAATCCTAA
GATTGTCGAA
CTGAACACCA
AGCCAATTTA
AAGACTCATC
TGCCATGGCA
AGCGTTGATG
GTTACAGGGA CAGrCACC ATCTCCTTTG GCTCCAA'rT CCATGCTGAC TTTATCAG'rC ACAAAT'rCTC CGACTT'7r' ATTCCATTGG GCrTTTGGCA TAATTACTTC TAACGCCATG GAATACTCTT GCTCTTAAkAT TAACATTGAG TTCATAAACT TAGCACCAGC CATACCATCT CTCATGATTG GGCGAATCAC AATTTCACCA CTGACACCAA GGATAGCTGA GTTGGGTTGG AACArrCCCA AATTACTGAT TGTGAATGT'r CCATCCAAGG TACGGCCAAT AACATCCTTA TTCTCAGCAT TGTAAACAAC AGGTGTCATC AGATTGACAT AGTTGTGAGT GATAATAGTC TATCGTGTT TCATAAGAGT CTTAACAACT 31.80 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 43.40 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 ATTGGAATAC GCTCGATTTT TCAATCTGAG CAGGAGATr ACATCCI'TCT TCATGATTTT 'rTATGTTCGA GGGCAATTCG TAAGTTTCCA CGTCTTCTTT TACAGTAGTC T'rCTTCCAG 'rTGCT'rCCAT CAACATT'rCA GTCATATCAA CT-rCATAGrT ?'rCAACCATG CGTTGGGCA.A TAACCTTACG ACCATATOCT GTTACGTTAT CAGGGACTTC GATGCTATCG TTP'rCGATAT TTTCAGGAAG ACCACGATGA CCCGTTCCTr GGATTTCCTG 7 GCAAGT GGCGAAA'rGC GAACCACGTT GTGGACACGA CCGTTTGCAC CTGAGCCAGA GATTrGGCTCA
GAGGCTGAAG
CATTGGTGTC
TTCCACTTTT
CAGGCCAAA
CCAAGCAATG
TGTGTCTTTA
AACGTCGTAC
988 AGG~rATCC CTAAATCATC CGCTAACTTT CTAGCTGCAG CAGTCGCTCT TCAGCCA'rGA CCTCTCCAAT TCTAT'rTATG ATACAAAGGG CGTCAAAAGC 'rAGGAAATCG ACGA'rGGCTT CGATGAAGCC AACGAGATTT ATCrTTTC GCCCGTGCTC TAATCrAAGA TATTAATGAC TTCTCGTCAG CTTTAT''rA TTrACATAAC AG7T-rTrCGG ATTGCATCTT TGATACTTTC TTGTGCATAA GGCATCGGCA CATCTTCTCC GTCAAATGCT TCTGATTCTG AAATAATAGC GTGGGCATCG TTGACCAGAA CAACCTTACC CTTATCAAGC GGAACAACGG TACGTGGGTC TAATTCTTCA GCAGCTTGAA CCACACGGCG ATCCGTTCCT TGGCGTTTGA TTTCACCA.AC GAAGACCTCT GCACCTAAAA T'rATCTTATG TAACCCTATT AACTGTTGGA ATCAT'rGCAT TGCACAACGG CCAATTGCTG TGAAATTTCA CCGATATAGC 'rA~CCTGTCA
GACTGAAAAA
CGATCTTTTA
GATACAAAGT
CTTTGTTATA
TTTCTAGGTTr
CATCTAGATA
CACTTGTTTT
AGCTTCTTC ACTGAGTTTA TGATGATATC AACAATTTCA ACTGAAATTC CTrTITCTGC AAGCA'TTTT CCATAAGTAA CAACTGTTAC CCCAAGTGGA ATTGTGTACT CTGGATCAAC 0 0 00 TGGCACTTCC CCTTrrTGGT TAAATTCTGA CTTGTACTCA AGTATAATAA CTGGGTTGT ATCACGGATA GAAGACTTAA GCAGGCCTTT CATGTCCGCA GGGCCAG GTGCCACAAC CTTAAGTCCTr CGAATGTGAG TAAACCAAGA CTCTAGAGAT TGTGAGTGCT GGGCGGCAGA GCCAACTCCG TTACCAGCTG CACAACGAAC AGTCATTGGA ACCTGACCTT TACCACCAAA CATGTAACGT G7=TAGCAG CTTGGTTGAC GATAT'TGTCC ATGCGCAATAA CAGAGAAGTC CATGAAGGTC ATATCGACGA 'PrGGACGAAG TCCTGTCATG GCTGCTCCTG CTGCTGCTCC AGAGATGGCA GCTTCAGAAA TCGGACAGTC ACGGACACGT TCTGGACCAA ATTCTTCAAG CATTCCAACA GAAGTACCGA AGTCTCCTCC GAAGACACCG ACGTCTTCTC CCATCAAGAA CACA'TTTCA TCGCGACGCA TTTCCTCAGA CATAGCAAGG ATAATGGTCT CACGGAAGGA CATTGTTTTT GTTTCCATTT TATCTCTTTC TCCTrTAGTCT GCGTAAATAT CTrCAAAGGC TGATTCAACC GGTGGGAATG GGCTTTCCTC TGCAAATTTA ACAGAAGCTT CTACTGCTTC CTTACTTGC GCTTGGATTT CTTCCAATTC 'PTCGGCACTT GCAATGTTAT 'TTrCAATAAG G'rAATTGCGG AGGT'rTTCGA TTGGA'TCTTT TTGTTTCCAC ATCCACI'T CTTCACGCGI' ACGATATTTA CCAGGGTCAG ATGATGAGTG ACCGAGCCAG CGATAAGTTA CACTTTCAAT CAAGACTGGA CCATTGCCAC TGCGAACATG GTCCACAGCT TTCTGAAATC CTTCATAGAC ATCGATGACA TTGTTACCCT CTTCGATGAA CATTCCAGGA ATTCCATAAG CGGCCACG TTGATGGATA TGT'rCTATAT TGGTCATTTT CTTGATATCC GCAGAGATAC CGTAACCGTT GTTAATGCAA TAGAAAATGA CTGGCACGTT CCAGATAGAA GCCATGTTCA CTGCTTCGTG 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 a a a 9 S 9 99 9 9 999 5 *9 95 9 9 99 99 9 9 6 *9 99 99 9 9 ~9.9 99 9.
9 (.9 9t .9 .4 9 9 GAAAACACCT TCATTCGTCG CACCATCTCC AAAGAAGCAG ACAACGATTT TACCGGTATr' TrGCAT'rrGC TGACTGAGGG CTGCACCGAC AGCGATCCCC ATACCACCAC CTACGATACC ATTGGCACCA AGGTTCCCAG CATCAAGGTC AGCGATATGC ATAGATCCAC C~IrCCCTT-r ACAGGTrCCA GTCTATT'rAC CAAGGATTC AGCCATCATT CCGTrGAGGT CAATCCC~rr AGCAATAGCr TGCCCGTGTC CACGGTGGTT TGAGGTAATC AGATCATCTG GATTGAGAGC TAACATAGCC CCCACGTrAG CTGCCTCI'C ACCAACAGAA AAGTGCGTCA TTCCTGCAC TTrCCCTTTC TTrACTAATT GTGCAATTr TAAGTCCATC CGACGGATrT CTTCCATCTT ACGGAACATT TCTAGCAAAA GATTTTTATC TAAACTTGAC ATCTTCTTGC CrTCTAACr TTCTTCTTAC CTTACTAT'rT TACCGCTTrT GGCAAATACT GTCAAACTTT TTCTAAAAGA AATTTCACAA AA'rAAAAAAC AAAACCCCGT GAAAACAAGG GATTTCTTG TCAAGAATAT 7TTTTCACAA ACTTrTAGC ATTTGGA'N'T TGCTAAAGAT TCAAATCTCT TCATAATCAC AGTTAAACGC CAACGGTAGA GCGCCCCGCT CACAATCAAA CTAATAArCA AGCCGATCCA GTAAGAATAA GC1'CCAAAAT CTGTrAGGGA ATCAAATAGC GTAnCACAGG GATTGCTACG CCCCAATAAC CAAGCAAACC AAGGTAAAAA GGAATAACTG TATCCTTATA CCCCCGCAAA ATrCCCTGAA GCGGCGCCGC AAAGGTATCT GCTAACrGGA AGA.AAAGACT ATAAGrTAAA AAACGCACTG TCAAATCGAT AAATTT'rGGG TCGTTACCAT AAAGACTGGC CACATrTCCC CTAAAAATGT AAAGGAAGGT TAAGG'rGAAG GCCGCAAAAA TGAGGGCAGT CCATCTTCCT AGACCAATAT AGGTT'r'rCGC ATCATCAAA'r CGCT'rGGCTC CCACT'rCA'rA GGAAACGACA ATAGCCATAC CCGATGAGAT ACTCATAGGA AAGGCGTACA TAAGACTTGA AAAGTTCATA GCTGACTGGT GACTAGCTA'r AATCAAGGGC GAAAAC TAG CCATAATCAA GCCAACCACT GAAAAGATAG CCACTTCCGC GAAGACAGTr CCCCCAATAG GCAGACCTAA ACGAACTCCT TCCTTAATTT TA'rCCATATT AAGTCAATT CGTT'rCTCAA GGTGTAAGGC TTTGAGCTTC TCCTGI-rTAA ATAAAACCAG AACAGAAATC CCAAGCAAGA CCCAGTACGC CAAGGATGT CCTAAACCAG CACCAGCCCC TCCCAGTTCT GGAACACCAA AGGCACCGTA AATCAAGAGA TAGTTAAATC CCCTATTGAG AGGGAGTAAC AAAAGCATGA GGTACATGGA CAGTTrGGTC AAGCCCAGCG AATCCAGCAA GGAACGAATG ACGCTAAAGA GCAACAAGGG GATAATCCCG ATAGATAAAA ACCAAAGATA GCGAACCGCT ACTGCCGC'rA CTGCTGCTrC TAACCCAATA 'rGATTCAAGA TTATrGGTGC CAAGAAAAGT ACCATCCCCA GCAAGACCAC AGATAGGCCC AAGGCCAAAT AAATAAATTG GTAAAAATCA GACGCAACTr CTTCCTT'rr GCCTCGACCA 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 
7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400
AGATG'GTGAC
GGATTCCAGA
GTCATTGCAG
AAGAAAATTT
CTCT'rTCTAT
CGGAGGTTTA
TCTAACTAAA
TTGAATTTTG
CTCCCTACGT
GTCCCTTACA
TAGCTGC?1'T
CAATGATAGG
TACTGGTTGC
'rA'CAACAAA T1'AAAAATAA TCTGAT'rTAT
TTAGATTTG
CCTCCGTTAT
GAACGAT'rAG
TTTCGAAAAT
GATGGTATAT
GGAAGCAAAA
CACCAAGGCT
CATAGATACA
AGAGGCAGAA
TACTAACTrC
CTAAACCAAA
AAGTAG'rATG
CATATTGAAC
CTGCGGGACA
990 GACACAATCC CTGTTAGAAA TGTAAAGAAA CCAGCCAAGT CCATAGTGTT CTATTGACCT TAATrGGCAA ATTGGTAGAT CAGGATrCGGG TCTCGTAAAC ACrTGTCTT ATACATACTT GAGT'rTCAGA CCATAG?'M' TCAAACTTAG CCAACACGCA CATGTACGAC AATAATAGCT CGCATGGTCA GCrTTI'TCTT TAG~rrCATA GTAAATTCCA CTAT'rAGATT- TCGCTTGTCT AAT'rCATATT CTAACTCCTA TCAAGCTTGA TTAACCTGCC CTTTTAAGGT TTCACCGATG TGATATCTGC TGGACCATTC GGTTGTATGT CATTTTTTCA AGGTCAAGCT GAGAGACAGG 'rGGGAGTCCA
TCAGCCAAGT
AGTAArrCCA
CATGTTTCTA
CAAAGCCGTC AGCCrrGGCA AACC'rGCT'C AAAGTTGTAA TCAAGCTCAA CTCACCAGCT AGCCAGTCAT ACCAGATGGC
TAGACGATTT
AATGGrGAAT 'rCAAAAATAG AC7TGGCTG
TCTACTAAAT
GCTTTGGTAA
TATCCTCAAC ATTTTTTCA TCTACATGAT TGACACCTGA TTTGAGACCT TCGATAACGG GATTCATCTT AGCATTGCTA CCTTrGTGTTA GTGGCGCTAC TTCGCTGTG ACr'rCTGCAC CAC'rrTCTTC CTTAGACAAA TGCTGGATGT TGACATCACG CGCCATCATA GCGTACTCAG GTTCTCTAGC AATATTT'rCA 'rTAAAGCCAA GAAGGCTrGAT AAAGGTATTG AGTTTTTTGC TGCTCTCAAG CGGAATACCG TCATCAGAGA TAAACTCAGT CAAGTTTTTA CCATTAAAGT TAATCTTCTC TTTGGCAGCT GACTGGAGAA rrCGACTGGT ATTAGCCATC ATGACGACAG C-ACCAGTATG AATGTCTTCT TTAtGTGTIT CGACCAAGCC AGGAGCAACC ACAAGACCAG TGATCTCAGA CGCAATTTTG A'rAATTTTCC CCAAACCAGA CTTGGGATCC ATTACACGAC
GAGGCGCGTG
CACGACGGTC
AA.AGAAGTGC
CTAACCCCTG
GAACATGGGC
CCACCCCAGT
GAACACCCTT
CT'rCC'CCAT
AACCAACCGC
TTTTAGTAAT
GTCAGTCGCA A'rAACTGTGA TGATTCCAAA CGAAGCGGTG TTCTGTCTTA CAGAAATGCT AGCAAACTCC ACTACTTTAA TTTAGTTGCA TAGGCAATCA AGCACCGCAG ATATGGAAAT CAAACCTGGA TCTTCCTCAT GGCTTCCTT ACAATCTTAC ACCAGCTTCr AAGAGTGCC T GGTCGCAACT GTCTTGACAT 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 CTGCTTGCAA AGTCTCCACG TAGTAAAACC ACCTGCAGCG
TCTGAAATGG
GCTGCTAGGG
GACCAGGTTC ACGGAAATGA ACATGAATAT TAGCATCAAT CGTTTCTGCT CCTTCTTCCG CATCTTGAAC TAAGACATCA CAAACTTGAT CATN"FGAT TAGTAGCATC TGC1TTTCTCC I. 1-1 -11 991 TTTATTCATA GAAATCAACT 'rGGGTATCCA ACAATTTATC CCCATCATAA CTGAAAAGAA GGGTTr'ATCC TCTAAAAGCC ACTCAACAAA GGrTGGTCA TCGGCTTGCT CAAAACCTCA TCATAGGGAA CCCAT'rCTAG CG'TCCCCTCA TCAAGTCGCC CTCAAACTCC GTCACCTrAA AAACATAGGT GTACCAGTCT TAAATTCAGG AAAAGTGATG ACACCTTTA GAACTGGCTT GGCTTTGAGC CA.AGGATTTC ACGCGCCGCG CATCCTGGG GCGTCrCTCC TCTCTCTAGC CACCAATCCA TTTCCCT TCA TGGACATCAT TGGGTTTCTT ATTACGATGG GTTC'TTTCCC ATTATCAATG TAGCAAATCG TCGCTAACTG AGGCATATTT
ACAAACTTGG
CCTTCCCAAG
TTGCAGTCAA
AAATCTGGTG
CCTGTTTrCTT
TTACCACCCA
AGCATGAGCA
TCTCCTTATC
TAAGCCAATC GATTGGCTCT GCTTGGAACC CCAAAATCCT TCAAGTGATG AGGATTGGTA CAAAAACGAC TGGTCTATCT CCCAGATTtG ACCAGCATGA GAAGCAAGAC TCCTTGCTCA CATCTGACAA T'rCT'ITCAAG CAGAAAAACT CAAGCCC'TGC CCACCTTrAAC TTCTrCAAGC GATAAATAAT CCCCTGAGAA CCTCAGGTAA T-TGCGCCTTA
TGTCCTGTCT
CTATAAACCG
ACTAATGCCT
AGATGATTGA
CCATTGGCCT
GCCCAAGCTG
ATATTTTGCA
GCTTGACCTG
AGTGTTGTCA
TAGACCTGCT
ATCAAAGCAT
C=rAAGAA TGCATTGGCC TTGGAAAAGG ACAAAGGACT TGGATGGGCT GATTCGATAA TCTrCTTACG TGCATAAGCT CCCCAGAGTA CCACCTGAAT CACAGCATCA GTAAAAGGCT GTCCAGCAGG AACAGTCAAA CAAGCATTAA TCAAATCATG AGATTTCTTA ACTCCGATAT AGGATGGTGG AGCTGGGATA GAGTCAGGTA GTCCGTGATA GGGGTCTTGC CCTAGAATA AGAGAGCCTG AAAAACCTTT TCCTTGGGTG CCATAAACTG ATTGATTTrC CCGAAATAAC GCCAAGACGA GTGTTCCATA GCCGACTCGG 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340
INFORMATION FOR SEQ ID NO: 148:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12127 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 148:
AAAAAATAGA C1 TGTTAGAC TATAAATGTA GTAAGCCTAC ACAAGAAAAA TACATAGAGA TAAAGGTGAT TATTATGAAA TTCAAAAAAA TGCTTACTCT TGCAGCCATT GGCTTATCAG GATTTGGGCT TGTTGCCTGT GGCAATCAGT CAGCTGCTTC CAAACAGTCA GCTTCAGGAA CGATTGAGGT GATTTCACGA GAAAATGGCT CTGGGACACG GGGTGCCTTC ACAGAAATCA CAGGGAT1'CT CAAAAAAGAC GGTGATAAAA
AAATTGACAA
CACTTCAAGG TTCAAAATAG
ACATCTCCTT
CTAGTCGAGA
GGTCTTCTAA
GTCAACAAGT
CAAGCCAACA
TGGAAAAATT
CTAATGGGTC
TTTCTAGGGA
ACGGTATT'CC
TTGCAGACGT
TACAGAAG GTTCTCTCAG GGGATCTrA ACGAAATCTG TCAAGGCTTT CACAGTTTTA GATGGTGAAT ACCCTCTTCA TCrMrCAAG CTAGGTCAAG AT'NTATCAG GGTCACAGAT AATAAATTA TTGAAGCTAA CTTATCAGGC AAG'N'CTCTG TTGTAGGTTC CACTGCCAAA ACAGCTGTGA GAA'rGCTAAT GCTATCGCT= AGAGATTGA'r GGTGTCAAGG ACGTCCCTTC AACATTGrTTT CTTTATCCAC TCCAAACAAG AACCGAAACC ACGGAATATA CACTTCAGTA TCrTT~TAA AGTTACGATT GATATTACCT AACCGCTGAT ATTGGTATGG CCATGATGCT ATTGCTTTAG
AGCAGAAGCT
'rTCAGCAGGT
ATTAACTCCT
TGTTGTGGTC
TTTTAGTGGC
TATAAAAAAG AAAATCCAGA ATTACCGCTG TT'AAGGAGAA GAAGAAGGTA AGAGTCTCAC AATAATGACA ATAAGGCAAG CCAAGTCAGT AAATTAACCA CCTGGGACAA GATTAAA'rAA CCATAAATCT C'rAAAGAGAT GCAGACGTT GGGAGCTGTC GTATCTCCCT TACTTTCTTC CCTTCAAAGA AGCAGT=Tr AGGGCAATTT CTATTTTGC'r AATCTGTTTC TTTATT'rTTA CATCGTACAA TAAGATAAAG ACTAGAAAGG ACAAGATGTG TTN'CATGAG TGCAACAGTA GTAATGGCTT ACCTTTCATA
ATGGCTGAAC
AATGTTTGCT
AAGGCAAGTA
ACAAAACAAG
GCTGTTGTAG
GCTAACTACG
GCAAGCTATG
GTGATTGGG
300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 GCTTTGCCCG ==TATTA GGCAGTGAT"T GGCCCAAC GAACATTCCG GTATTTTACC AATGATCGTT GGTT-CCTTAT TAATTAC=~ AGGAGCGATT TGCCAACAGG CATCTTGACA TCGGTGTTTA TGGrTTATTA ?TGTCCAAAG CCCGTCTATG GCTTCTTAAA ATCAGCTATC AACTTGATGG CAGCCATTCC ATCTAT'rGTT TATGG'TTTTT TCCGCCTACA ATTATTGTG CCTTGGATTA GAAGCTT~T AGGAAATGGC ATCAGTGTCC TAACCGCTTC GTTACTATTA GGAATAATGA TTI'TGCCAAC CATTArCACT TTGTCAGA-AT CTGCTATCCG AACAGTTCCC AAAACGTATT ATTCTGGTAG CTTGGCTCTA GGAGCTAGTC ATGAACGGAG TAr7rTAGT GTCATCTTGC CAGCTGCGAG ATCTGGTATT TTATCAGCAG TTATTTTAGG AATCGGTCGC GCAGTAGGTG AAACCATGGC ACCAGCCGAT TATTCCAAGT GGACTCTTTT CAGGAACCAG TTCTGGAAAT GGCTTACGCA TCAGGTCAGC ATAGGGAAGC TTCTCTTrTTT CCTTATTCTC TTGATTAATG CCTACTTTGC CTTATGAGTA AATACCTGCT AAAACI'CTC GT~rATTGTT AGTTATTTG GTGGCAGGCA AACCTTAACA ACCAATATTG CCTTATTGCA ACCTCAGCAG CTACT=rGAAA CCAAAATCAT TTTCAGCTTT AACCTTTGC TCTCTCTTTT TAATCATTGG T=rATCCTC ATCAAACGCT TACCTCATCT AAGTCTATCC 993 ATMCCCTTA TGCC-AGCGAT TATTTCCACC crcITT-C1- GGACI'TATAC TTCTGAGAAC GTTATTCTGG TCTrGGTGC TCrCI-rMA TATCT'rGTGG AATATACAAA AAAAGAcCC GATACCTTAT CTGGGATTCC TTCCATrT GTCTTCTTAG GTTTTCAATA CTCTCT'GTA ?TGCCAGTCA TTATTCGCTC AACAGAAGAA
GCCTTGCCCA
CTTTGTGTTA
TrrGGI'CTGT
TCAGGAATCT
GCCCTTTTAT
rrACGGACTG
ATACTAGCTA
TCTACCAATA
TAGGGATTT
AAATCATGCC
rrC4CATGCT
TA.ACCTCACT
CTG=1AGTGA
T'TTACAAT
7TrGCCCGTAT
CGCCAAGTAG
TGCTGGTTTT
ATTGGCCTCA
CTTCTTTGTA
TATCATGGTG
TAGCATGCGT
TGTTCTACCA
cCrGTTGAA
TCTCATGTCT
CAAGCAAGTr ATGGACTTGG GTTGCCATGC CAGGTATT ACAGCTGCCC TCATGTATAC TCACGCCGrr CTCTAGCCCT GAAGCCTATG CTACCGGCGT AGCTTATTAT CTCGAAAACT ACACCTAGAC TT-ATTTTACG AGAALAGACAG ATTACTGCCT AACCCTTAAC CCGATGAACG AGATGACCAA GATArrTATA CATGG7TT= CAACAGCCTA
GGCAGGTAAG
AGCTGGAGTG
ATTAGGTACC
ACATATGTAT ATGCTGTCAA GTGAGGGGCT ACATGTCAAT GATTTTGATT ATTACTGTr'r TAATGATAAA TACTCTATCA TGTGAAAGGA GCI-rCCTAGT ATGGGAACAT 'TTTCAGTCAG GGGA'rTTrCA AGCCTTAAAA AATATTTCGA T'rCAA'IrACC TGATAGGCCC ATCTGGTTCGT GGCAAATCAA CTTTTCTAAA A'T'rGGTTCC TTCTrGCCAT A'rTGA.AGGCC AAGTCCTC'r GTAGCAAATT CAACCTTAAT CAGCTACGTA AGCGTGTAGG ATCCCTTTGC CATGTCTATC TATGATAACG TGGCTTATGG 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180
CCCAAGGACA
AAAAGGGGCA
CATGGTATTC GAGACAAAAA ACAATTAGAT GCCTTAGTGG AGAAATCTTT GCC-ATrTGG AAGAAGTCAA AGATGATCTT AAAAAGAGTG CCATGTCCTTr ATCTGGCGGT CAGCAGCAAC TCTGT'rAATG GATGAGCCGA CCTCATTCAG CAACTAAAAA GCCTTTGCA'r CTTCACCTTr AGGArrATAC OS TGCGCGAGCT TTAGCAGTAG AACCTGATAT AGACCCTATC TCCACTPTAA AAAT1'GAAGA GA'1rATCATT GTTACCCATA ACATGCAACA TTCTAACA GGAGAAATI-r GCGAATTTCG AGATCAGCGC ACAGAAGACT ATATTTCAGG AGAAATCAAT 'rrGACTTAGA ATTGCATCAA
AGCTTCACGT ATTTCAGATA AAACTGCTTT AGATACCGTT GACGTGTTTA CCAATCCAAA ACGGT'rCGCA TAAGGAAGGA AAAACCTATG TTAGAACAAT CCTT TTrAGG ACTACGGCAA CTGGCCTTAG CC1'CCAAAGA CAAGGAGATG ATCAACCAAG GTCAAAGCGC TATCGAATTG CCACAAGTGT CTGACCTTCG ATTTGTGArr C?1'GTCCTTG
GCAGAGCTAA
ACCTGTCCCC
AGCATCATCT
AAACAGCTTC AAAAGCCTTA TTATCAATAA GGATCATGCT GTTrGTTGGC CTTGCAGCAG CTTCTTGTTC AGACCTTGAA 994 CGTATIGGGAG ACCATATGGC AGGCATTGCC AAAGCTGTTT TGCAACTAAA AGAAAATCAA 3840
CTAGCCCCTG
GA'1rrAl-rGG
GATGAACAGA
GACCAAGAAA
CGCTCGCTGA
TAGTGGATTT
T'ATGCTT
TTTCAAT'rCT T'rCTTTGGGG GGCTG'rCTCG
TCCTTGACTA
ACTGCTAGAC
CCTTATTTG
ACGAAGAACA GTTACACCAA ATGGG;TAAAT 7TGCCITMCC TTTGCACCAA GCCTCAAAAG TTGACCAATA TTATTATGCC TTATCAAAGG TATCCCTCAG CATGCTAGCC CTATTAGTAT TGCTCAAAAA AAATCATTGG ACTTATGAAA ATATCATAGG GCATCTGGAA CTACCTAGAA ACAGGAGAAC
CTCAATT-CC
TTACATGCT
GAATTrAA'rTC CAATGGAACT CAATACCTTT AACKITN'GTG AACGCCTACT AACTAATCCT 'rAAAAGAGAA GTAAAAAAGT TCATTTCACC AATTTAAGCA ATCCTCAACG AGGGAATGCT GAAAACTTTA
GAGTACGATT
GTGTAGATAG
TCAAAGAAAG
AAGTACTCTT
TGAGGAGTTG
GAAAGCAGGA
ATAAGACAGA
CTTrATAA'rC1
TCAAGCGTTT
GACATATTCT
ATACAATCTG
TAGTTCCACC ATGATTAAGA CTACCTCTT? T'rAAACCAC TCGACGTCTC TTCCTTCATA CAAATTCTCA AGTCTATACG CCAGATAAAT CTGATACTCC AACAGAATAA GTCACCTTAT TTTTTAGGAA AAAATCAAGA AAAA'rCACCA GAAGGACAGT ATGAAGTACG TATGATGATG TAGCTGGTGA TGAACTAAAG TTGCCCCAAA ATATGTCTCr CCTATCAAA ACAGTTTCAA CTGTTCCATA TAGAGCTAGA TTTAAAAATC GAGCATCAAA CAAATCTCCT CAATTATGTC ATTTCAACTA GCTTT'rCTGG GGGCAGGGGA AAACATGCCT CCAAGGGAGG AGTCTGCCCT TCGAACATCA GAAATT'AAGC TAAT7TTTTGA ACTGTGTAGT 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 .1220 5280 5340 5400 5460 5520 5580 TCGTTAGTGC CAGA'rATGAA TAATTTGGGA TTCT'rCCTCA GGTAGCCCAT CATAATACTC CTTTTGGGCG ATAGTTTCAT CTTCGTATGT TAGGTATTCC TTATCCAACT ACGTTCAGCA ATATATTTTA AATTrAATTGA CGGATACTTA TTTAGCCAAA CGTAATCTTT
CTATATAACT
ACTTTGTTAC
ATTCAGACTC
TAATTTTTTC
TTCAAAAATC TTA'rCAAAAA AGdAGTCCTC ATCAAGAA.AT TGGCATCAAC TTGTAATCIr TATTCGTCTG GAT'rCTCCAT A'rCACCACAA AATTCTGAAC GCCAAACTCT CGCAACCTAC CTGAATCAAA CTATCTATCA TTCGTCTCGG rT'rTTGTGTA TAGAATATGA TAGAATAAGA CGTCGCCGTC TTGGGGCCT~G TGGACACGAG GTACGTATTT
TGATAAATCT
CACTCTCTTT
ACTTCAATTrC
CAACCCCCAA
TTTCAATT~CT
GACTGATI r'
AAGAACTTCC
GATTTCTTCT
TTTTTCCACC
GAAAGAACTC
GTTCTTCGGG
GGGGAAATCT
TGAGTT'GTTT' ACCTCTATTA TAAGCATATA CACTTTAACT AAAGACTAAG AGTrTATCCC ATACCCCAGT AATGCAAGTG CAAAATCCCC TATCAAGGAG GAAATCATGG AAAAACAAAC AACCGCCCTT TCACAAGTCT TAAATGACAA TCCCGAGCAA ATCAATGAAA TTAATACACA CCATACTAA'r AAGCACTACT 'rTAAAGATGT 995 CG=T~AGAC GAAAATATCA TTGCCTACAC CCACTTAGCA GAAACATTGA AAGA'rGTCGA TCCGATTTTG TT'TGTTGTCC CAACAAAAGT GACACGAC~TT GTTGCCCAGC AAGTTGCACA AACCrTGGAC CATAAGGTTA TCATCATGCA CGCATCAAAG GGATTAGAAC CTGATAGCCA TAAACGArrA TCAACCATTC TTGAAGAAGA AATTCCTGA4A CATCTCCGTA GTGA'rATCG'r CGTG7IrcA GGGCCTAGTC ATCCAGAAGA GACCATTGTG TGCTGCT~TCTr
CTTCCGACTT
AAAGATTTAC AAACAGCTCA ATACGTTCAG 'rATACCAATA CGGATCT'AT CGG=TTGAA CGTGACCTAA CTTTAATAAC A6AGCTATwTTA GTAATCACTA ACTGCTGGTG CTCTTAAAAA TTrTGGTGATA A'rGCTAACGC GGCGTAGCAC TCCGGCGCCAG ATCGTAACCG GAACN'CCAT S S S. 55 TATTATTGCT GTCGGTGCTG AGCCATCATC CCTCGAGGT'r TCCATTGACC TATAGCGGCT CCACTC'rCGT AACTGGAGAG AGAAGCTAAT ATGGGCATGG AGCCCAAGAA CTTGGAGTCT CGCAACCAAT ATCAAAGATG TCAGTGGTCT TAACCCTCTA
GAGCTTTACA
TAGCAGAAAT
TATCTGGTGT
CTGGAGATGC TCTCGGACGA GGAGAATCCC TAATCGAAGG AAT~rCAACG ACTCGAGCAG ATATGCCCAT TACACAGGCT ATTACCAAG CCATTTATGA CATCATGAAC AATGAATPA TAGAAAGGAT TTrTTATGACA TCAAAAGTTA
TAGCTGATAT
CCTATGAACT
TTAT'rATCA
AACCAGA.AAA
GAAAGGCAGT
'rrGCCAAAGA CTCTAAA'rC
AGGACCACTT
TGGTCTTGGA
CACCCGCCTA
G4GGAGATTTG
CATCCCTGCT
AA'rGTTGCCA GCTGGACTAG GAACTCGATT TTTACCAGCA ACCAAGGCCC ATCGTAGACA AACCAACTAT CCAGTTTATC GTGGAAGAAG 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 AGGTA'rTGAA GATATTCTAG TTGTCACTGG TGATTCAAAC TTCGAA7"rGG A.ATATAACCT TAAATCAAAA CGTTCTATT-G GCTAGTTGAT AAAACAACTG TCTCGGAGAT GCTGTTTTGC GCTTGGTGAT GACTTGATGG CATGGATGAC 'rACCAGCGTA CGAAGTATCT GCrrACCGGGG TGTTGAAACC 7TrGrGAAA CCGACGCTAC CTCCTCACGC AGGAAATGAA AT1TCAGCTGA TGCTCGTGAG 'IrCA;AGGGG
ACATGCGTCT
AAGCCAAGCC
ATATCACAGA
CCCACGCCTC
TAI-GCTCC
CAAAGAAAAA
GCATTTTATC
TTTCGTCGGA
CGAAAACCCT
TACTATCGCT
GCAAGGCGAA
GGGAAAACAG ATCTTTTGAA CGCCAAACTC ATCCACGCGG AATGAACCTT T'rGTCGTTAT GTTCCACTTA CCAAACAACT GTCATGCCAG 'rCCCTCATGA GCAAAAGATG GTC'TTTACAG AACCAGCTCC AGAGOACOCT CCTAGCGACC TTGCTATTAT CTGAAATTT TGAGATTCTC GAAAACCAAG CTCCAGGTGC CAGATGCAAT CGACACCCTC AATAAAACAC AACGTGTATT CTCGTTACGA TGTCGGAGAC AAGTTTGGCT TCATGAAAAC ATCCATCGAC TACGCCCTCA AACACCCACA AGTCAAAGAT GATTTGAAGA ATTACCTCAT
CCAAC~TTGGA
ACACATAAAT
CCCTATAC?1'
CAAAAAAGCA
AACTTTGCTA
CTGTTGGAGT
GTGGAAGCTA
GTAATTGCGG
AAkACAATAAA TGGA'rGGTAT
AAAAGATACC
TTCAGATGAA
AGGTAAAGAC
=ITCAAAAA
AACTCTTGAA
TGApACT=
TAAATATCAA
TGCTCAAAAA
GTCTTATAAT
ATATTCAAAT
TGGATCATCA
ATATAAAGGC
AGATGTCTAT
TAGTCAACGC
AAAGCTACTC
AAAATCAAAT
AAAGAATTGA
TAAGTAAATT
GTATTTGT
CCGAATCGr
TGTTGTTTCT
AC1TCATCGTT
TAATAAAAA
ACACTATCAA
AAAGTAATCA
TCXATAACC
AAATCAGACA
996 CTGAGAAGGA ATAACAAAAT CTCTACTTGA ATCTACCTAT TTTCATTAAA ATAAGAGTAG GCGCACTTTT TCAAGlTTGrG AATGGTTCCA AAATAATAAA AAATTAAATC AACCGAGCCG TATAATAAGG GAGAAAGATA AGAAAAAGAT TATGGAGAAC TTTCATCAGT TGCAGTTGGT AACAACAAGC AGAACAACAA AAGAACAAGT TGATAAACTA
AATAAATTAG
TACGGACAAA
TAAxI'rTAAA
AACATAAGTT
GGTGATTT
AAATT~GTAG
GpTCTATTGG
GCAAAAATTG
T7'rGAATCAT TTATTTCTA AATTAAAAGA ACTATCTGAA ACTTCACTTA TATCTTAATA ACAAAGTCAA AGAATCATCT AAAGCAATTG CTGGCTT ATGATGTTAA AGATTCAGAT GACAAATTTA ACAAATGTAA ALAGAAATTAC AAAACAAATT GAT-.rATCA PAAACAAGAGA ATTrTGGAAGA AACTCTTAAA TCTCTA.AATG AAACAAATCG AACTTTTGAA GAAAGAAGAA GAAAAAGCTG GCAAAGGAAT CTTCTAGTCA AAGTAATTCT TCTGGTAGTG GCATCTTCCA ATTCAAATGT AGA'rTATAGT TCATCTGAAC AATTATGGCG GTCAAGATTA TTCTGG'TTCA GGAGATAGTT GAACAATATT CATCTAGCAA TTCAAACAGC GGAGCAAATA ACTGGTGCTG ACGGCTATCA AAGATACTAC TACAAAGATC GATGACGATG GAAATTACCT TGGGAACTTT GGTGGCGGCA TAATAACTAT ?rrAGAGCTG TGTTGTTTCG AATGGTTCCA AI-rT=rAAG TAGCTrTT'r CTTA'rTCAAG TTTACATATT TCAAACCACG TCAGCATCGC CN'ACCGTAG GTATGGTTAC ACAACCTCAA AACCATGTTT TCAGCTGACT TCGTCAGTTC TGCTTTGAGC AACCTGCGGC TAGCTTCCTA GTTTGCTCTT TCGTCACAAT CCCATTCCCT TGTAGAAAAG CAAAATGGCG CGCTCCTAAT CTCTGGCTGG TGTTATACAT CCCr'rTirc CATTTATATA AACATTAGCC ?TAATAAAAA CTAATGAAAA
TATAGTAAAA
GCCTATT
TrGccrAA GTTTAArTT
TAATTTTAAA
AATTTA'rCGA TATTAGGG*Tw 'rACAATTAGA
TTGATCCATC
AAACCGATGC
TAGATTTTCA
AAGATAAAGC
AAAAAGTTGA
ATCTTGTTGA
CTGAAAAAGC
CTTCTAATGA
AA.ACTAATGG
CAACAAATGG
ATGCTACAG
ATAATAATGG
TTGCAGAACC
AAACACATTA
ATACTCAATG
TGAC 1'CGTC
TATCTACAAC
TGAT'INTCAT
AG1YCCTACGA
CCTCTAACTC
7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120
AGTTTCATCT
CTCAAAGCAG
TGAGTATTAG
ACAAGACTAC
GAAAGATAAC
TTCCTGGAAT
CTAGCTGTTG
GCCCATAAAG
AGAGATTGCC
TCTCCTCTAC
TGCTAGAAAT
CAGAACACTT
GATATAAGGA
AGAGCTAGAG
CATCATTCCT
CTGCCTTCCA
GCGCCACCAA
CCALATAATGT
TTGCGAG'ITG
GCGCCTGCTG.
GATAAAAGAT
997 CTGCACCACC GATA'rGGCCT GCTAGG A 'rAACCACAAA AAGTGTCAGA TAGGATTGCC CATAGCGAAG AACAATAATC GCGGCAAATA CTAAGGA'rTT AGGACTAAAT ACAAAAACAA AGAGAAAGAA AAACTGCTTA GAACCrGAAAA AGATAATAAA GTGAAAGCAT ATTAACAATG AA.ATGTTCCC GCAGACAAGA GACGCCAAAC CTGCTCGGGA AAGAGGCGAA ACCCAATATG AACAAAAA'rG 'rAGCTGGCCC ATACATGCCT
CTGCAGTGAC
CAGGGTAACG
TCCTTTrTCAG
GAAAAATGTT
G'rCGTAAAAG
AAATCTCCCT
ACCACAAACT
CAACArrAGT
TCTATCAAAG
GTATAAAGTC
CCAAATCGAA
AAAAATACCA
ATTT1CCTTCA
CTGAATTTCA
ATAATGTATC
AGCCCTCAC
'rCAATTAATA CCAGATAGCG GTCATAATAG CCAGACCAGG AACATGAATC GTAGCTCCAG TAAGGCAAAG CCATGCCCCC CTTGGGATAA TGCCCTGTCA AAGTTTCCGC TAAGAGGAAG AAACTCGTCA CCTCCTGAAC AGGAATATCA CAAGGA'rATA TCGTACTCAA ACTACCACCA CCTCCACCGT ATCCTATCCG ATATCCTTTC AAATCAATCT GAGATGCATC CACCACTTCC AAAGTT'rTTA CCAACT=rG CGGATCATAG GTrTTTGGGTA 'rrAAAACCTT C~TGCCGTCC 9180 9240 9300 9360* 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 TTCACGCCT GCTCAATCAG TTCCTGCGTT TGAAACTCAT GCGATGACCT TGGCTTCTTG ATAAAArGGC TGGTAAAA TC'rATAGCCT G==r~GCTC TTGAGATATA GCCTTCATT AATTCCGATT TCATAGACAA GCCCTCTATT CCCCCTT CCGCAGCCAC CCCAATAGCT AAGACTrCTT CCTTAGGACT CGTAGGGACT ATCGATACCT AGCCAAAACA TCACGCCATC CAAAGTCCTC GCCTGTCATA GCAGGTTCGA TATCAATCAA GAGAAAAAGA GAGGTAGGTT GCCGCTCGG~T TAAACCTTGG CATGCAAGAC TTGCTTGCGT CTTTTTCAGG AAACTAGACA CATTTGAGGG TGATGAAGAG AACCTI'TGAA AGGAGATAAC CTCGATTCCG TCTTTTTCGT CTCAACAGGT AGGTATCCAC TGCAACCCCT 'rCTGCAACTG CAAGGCACGA ATAGTTCCAT CAAAGAAC'TC CATCAGTTCA CTTCTT'TGAG TTCCACTTCG 7TrTTACCCT CTTNTGCACC GTAAAAAAGC TCTGTCTGTG TCACCACTGC TCCCTCGATT CAAAGTAACT ACCCGcC-ACC CTTTGCCTTT G.AAACGGATC CGCGCCAAGG CTGGAT'rGT'r ACTTCCATAT CAAAGGCAGC AAGAGACTCA TGTCCTGTGT ATGACATrGT TGGTGGTTCC AGCTTGAAAA ACGCCGAAGG GGGTTGACAT TGCGGC'TAAC AACTGACTGC ACTTGGGTCA AAGGCGTCAT 'rGGCTTCATG AGCAAAAGCT TTCACCTCGC AAGTITCCTGC AAAGAGCTA
GCGTGGCCAC
TGACTATTAG
998 TCGCAATCTG GCCGACTTTC AAATCTGGAC GAACATIGGAG ACCATAGAAT TGATCTGGCA ACCAATCTCC AAAAGCACCG TCCTCATACA TGAGCATACC ACCAGCTCA TTTTCTTCAG CAGGCTGAAA TAGAAAGAGC AGATTATTCT TGGGTTGCTC CTCAAGGGCG CGCTCAAGAC AGCCTrAAGGC AATGGTCATA T'GAAAATCAT GGACACAGGC ATGCATGCGA CCTTGGTGTT GAGAAGCAAA AGGTAGACCT GTTTGTTCGA CGATAGGCAG GCCATCAATA TCTGTCCGCC AACCAATGGT TCGCTCCGGC TACGAATT TG AACAAAATCC CCTGAGTCTr GAACTCCTCC GAATCKAATC TAACATCTAT CCGCCTCTTT TTTACTTTTA TGTTGAGTTT GGGCATCAAT TTTCTGGGA CATCTrGGGT TGGACTCCTT CGATAACCAC GGTTCAGCAC TAGCTGGCTC TTTTTTCCAA CGATGGCACG CCGATTTCAG CACCGATATT TCCACCTGGT CACGGATAAT TGACTTCCCT GCAGGTAGAC CAAAATCCCT TTGCCCGTAG TCAA'I'TCTC AATCACATCC AAGCCAA'rCT CTGGAATCTG GTGTAAATCT
GTCCCCCAAG
AGCAAATAAG
CGTCTAGCT
CTGTCCTCCG ATATAGCAGA AAGAGGCTGG AAAAAGGGTT CAATTACAAG GTACGAAGCG CATCCTCTAG CGCTGTTTT TTCTT'rGATA ATACGAGCTG GAACACCTGC AACAATAGCT CCTGCTGCGA CAACTGAACC TGCATTAGCA CCGATAAGAA CATTGTCTCC AATCACACC'r GCCAAAACTG CACCTGCACC GCCACCAAGG ATGGCACCCA TGTCAATCAT GATAACAGAT CCCATCATGA TAACAGCATr CGCACCTGGC TCGATACGAG CGTTGATAGC
TACTACCACG
ACTACCGATT
GACACGGACT
AACGTGGCTA
GGTTCCAGCA
GTCACCAATT
ACGCTTATCT
ATTTTCTACC
10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12127 AGCAAAGGAA CTrGCAGAAT'r ACGAGCATCT TGCTCGACAA CATA.ATCTTG AAACCTTCAA GAAGCGGAGC CACATCCTTC CAGTCTCCGA ACAACAGAGC TAGGCACAGC AGTTGCGAGT TGCCCCTCAA TrC'TTCAG CATrGGCGAT AAATTGGATA ATTTCTTGAG
ATAGGTG
INFORMATION FOR SEQ ID NO: 149:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12566 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
ATAGCACATT TCCTAGTTTG AGGTTACTTT GACACTGGTT CGTTCATTTT TGTAGCAGTC
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 149:
CCATCCTTCT G'PTGATGTGA CAGGAATGAT GATAAATCAA CCAGTAGCTA GTCGCGAAGA GGTGACAGAG GCTTTGAGTC ACTTGGCGGT AGAGCACAAT AGTCTCATTG CTCGTCGAAT
CGTTGAGCCA
CCTTCCAGAA
GTCTTACTTG
GCTTGGTTAT
GACGGCCACC
GACCCTGATT
AATGAAGCTG
GGTCTGACCA
ATTGTATCAG
CAACCrTG CCTA'rGGTGC
TATCGGATCA
CAGAAACACG CTTrACCTAT GCCACTTATG GTGAGGGAAA TrrCCTCCAA GGACAGTGCA GAAACGAGTG ATTTATTACG GAAGTTTGGA TGGACTGAGC TTACAGACCA CCTrGAAAGA GAGCrCTT'r GGAGTTGCTC AGTGCTGGTA TCCAGTCTT GGCAACGGTG CAACTGGTCA TTCTACCTTA CTAACTGTCG ATT~GAAAGGG AAACTCCCTC AGCTGTATTG GTGGTCGGAT
'N'TCGAATGG
TACTGATT
AATCCCTTCG
TCAGACCAGT
TCGGATTGCG
TCATTGCTCT
TCTATCTACT
TCAAACGTAT
CGAGTGCGAC
TCAGGCAGGG
GTTAGAAGAT1
GATTCTCTGG
ATTCGCTTAA
GTGAGACAGC
TATCAAGGTG
AGAAGATCCA TTTTCGA'rAG 'rCTTACTAI1, AGCAIr'= CTGCTGACCT TTATGAGTCT GGAAATGGAG AGAGCTAGCA ATAAATGGAG TGGTTGGTCT AGTGCATTTG CCGATGAAGA GACATTTACT GA.AGAACGGT TACCCAATAC CAATTTCTCA GATGGAGCAG AAGTGGACCT GTCAGGGAA'r GTTATCTATG TCTCACCGCG TTCAGAGITT ATGGACAAGA TGCAAAACTI' TGAGAGCTTG CGAGAGCAGI' CTGTCTACTA 'rCTACTrrTAT GGATTGACCT TGGTrrACAG GAAAATAGTC GA'rGACATTG ATGATCGTGG AGCTCTCCTA CCCCACTACC CCAGCCTCA GACCGTTACC AGGAACGCGT AAGGATAATC AGACTCrTTT TATATTATGA AGATGGCAAT CGTCTCAGTC CTATCTGATA GAAGAAAAGA GTCTGAGGGA GAGrTTTGGGC CCAAGGA7TTG TTTACAGATT 'FCAGA.AACAC TACCTCCCAC
TAGCTGGTGA
TTA'rCTGCTC
CCTTGTTTAT
TGCCAGGGAT
TGGTGGATCT
CGCAACTCTr
GTGAAATGCA
GTCTATCCTT
GTGAGTGGCA
GCAATGTTGA
AC1TACACACC
TTACCGTTTC
TGATCTTGCC
ACCTGCAAAA
AGGTAAGGCT
180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 CrTrCATCT GAAAGTGTAG AAGTGACGAG AGCT'rTTACA GAAACAGGAC AGGAACGTTT CCTCTATAAT GATGGGTACA AGACAACACG CCAGTACCTA AAAGATCCGA TITAT'rGTAGT TCTAACCCCG CAAGCGACTG GAACAAGACC TGTTGCAGGG ATGTTGTGGG GAACTACGGC TAATAGTGCC ?TGAAACTAG ATCGATATGG AGACAGCATC ACAGCTCTAA AAGAGAAAGG TCTGTATCAC AAGGTTTcT ACTTGGTAAA AAGCCAGCTA TTTTTTGCCA AGGTACTAAA TGACAAACGG GTGGAGTTTTm ACTCTCTCCT TATTGGGACG A=rGACCC TGTCTACGGC TATCI'rGTTA TTTCATTCCA TGAATCTTCT CTATTTGAG CAGTTCAGAC GGGAACTTAT GATTAAACGT CTTGCTGGTA TGACA.ATCTA TGACTCAT GGCAAGTATT TACTGGCGCA AGGAGGAGTT CTCTTGCTTG GCCTAGTCCT ATCTAGTATT TTGACAAGAG ATGGTTTGAT TAGCGCTCTA GTTGTAGCTr TGTTTACGCT 1000 TAAGGCAGGA CAAAAAAGAA GAAGCTGGTA GCATGGCAGT 1920 TAACGCCCTC TTGATT=AG ATTGAAAGGA AAATAAGA'rG
CGATTTCTC
AGAGTGGAAG
GTGGAAGGGT
TGGlTTTGAAT
CGGAAAGACG
TC1'CTATCAG GAGACCAGAT GGGCTATCTC AAAATTTGGA TTTCGGTTTT AAGTCGGGGC TTTAGAAAAA CTTTATCTGG GGGAGAGGCC CCTTGATT= GGCAGATGAA
ATTGATATTC
CTCAAGCTGG
ACGC74XTG
GGGAAAGATT
TTCAAAA~r G7rGGTCAGA
GTTAATCTAG
CAACGAGTTG
CCAACAGCAG
TAAAAACCAT
TCGGCCTCTT
AAATCTCAAA
GGTATrrGGA
CCCTTGCTAA
CTCTTGATCC
ATCCGATTAT
TTGATATGAG
CC'TAGTGCCA
CTTTTGG'rTC AAGGATTGGA AAAGAAATTT AATGACCGCG AGAAGGGCAA GC~rrATGCC TTAATCGGAA ATA'rCTTGGG AAAGCTAGAA AAGATAGATG TGAATCTCTT GGTGGATTTG AAAGATGAAA CCCTAGTCTG GAATAAGGCT GATGAAATCA AAAATCCGTA TTCGCAGGGT ATCTGAT'rAT CTCATCATGG AAAATATGAC CAATCATCTC TCCCACTCGT GAGTAT"rTC AGAAAACCAA TCAATCAAAG AGTAGAACGT TrGGAAAGGC T'rrAGAACAA AAAATCTATA GACTATTTTG AA.AAATCCAC TGAAAATTCA GAGGAGGTTA CATCATTGCG ACCCATAATC GAAACTTGCT CATGTGTGAA GAGGTGGTTT AGAAGATATC AAATCCGAGT GCATGGCTAT ATTATCGTTT GAAAAATTTA ATGTGGATTT GACCCTACCT TGTTTCGAGA GAACTGCTAA GGAAAGATAG T'rTTTTGGTA AGTAGTA.ACA GATAATAGAA GCAGTGCTGG AGCGTCACGG GATACCAATA TCCTTCAAAA ATCGAAATCG GGCCAGGTAT GTCATGGCTr TTGAGATTGA TTTGATAATG TGACCGTAGT TTGCTTGATT TTGCTAGTAT TGAAGGGCA.A AGGCAAAAGC CCTCAGACGG TTGAACTGAC AGTGGATGAT GTGGAGGAGG GAAAATCGAA GTTATCAAGA AGCTGATTT TTTGAACGCA GGCCACTTTT AAAGATTTCC AAGACTATCT TTCTTCATGA TGAT'TTTCAT TCCCAAAATA CAAGGGGAAT GTGTTACAAT AAGAGAATAG ATGAGAATTG CAGATTATAG CGTGACCAAG 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 TTTrTACCTTT' AAAAAGTCCT AATTCTGGAT ACGGCTGAAA TGGTGCCTTG ACAGAATTTT CCACCGTTG GTGCCAATTT TITGGGCAAAA TrTTTTGACG TTGATGATCA GGTCA.ATGTC 'rGGCTGAGCG TGCAGCCCAA TGGCAGATAC CCTGCGTGAT
TAACGAAGAT
CCTCCCAATC
CTTGATTGAG
GGACCGCATT
GTATTACATG
ATTCTCAACG
AAGGTAGTGG
AGTGGCATTC
'rCAGCCCAGC
ACAGCCAAGG
TTGATTTGGC GCAACATATC CAGAATTTTA AAAATCCTGA CTAATTTGCC TTACTACATC ACGACGCCTA TTCTCATGCA CTrMrTGTrGA GTTTGTGGTC ATGATGCAGA AAGAAG;TAGC CTAACACCAA GGCTTACGGT AGCTTGTCTA TCGCCGTGCA T'rGCCTTTAT CGTGCCTCGT ACGGTCI'TG TGCCAGCGCC APAATGTGGAT TCAGCCATCT TGAAAATGGT GCGTCGTCCA GAGCCAGCCG TAGCAGTAGA
AGATGAGAAC
GTCGAATAAC
GGCI-IrGGAC A?1-rGCCGCGT 7rAAAGCCT1'
CGCGTCGAA
CTGCCGAGGA
'rTCGTCCCC '1rl-rCTTTA 7TGACAGGTT
CAGGCAGGCT
CrAGCAGACG
GGCAGGTTTC
TTTCCGTAAA
AAATTCAGAA
TATTGTCAAT
AGG'r=rCCAA GGCTAGTT ACTTTGGTAA GACTGAAGAG TGTCACCAAG TGTGCGTGGG CACTTAAAGG GCAAGGACTC TAC'TATGTG4G
AAAGGCCATA
GC1TATATCC
AGAGTGATGG
CCCCTTATGT
TCAAAATTCA
ACCCATCGCC
GTCAAGGACA
GAACTCTCA
TAAGATGCAG
CCAGGTTAT
'rGGGGACTGG
CCAACGGAAA
CATGTCC=T
GCAGCACAAG
GGGAGAACTG
TAAAGAGGAA
TGTTCCGAAG
AT'TAACAG CAATTTCCTG CCA1-rGTCTA TATTTCCAAA AGCAGACCTA TGGTGACATC TGTTAACAGG CAAGGTTACG TCAATAAAAT CGCACCAGAC GCGGTCGCCA TACCACTCGA ATACACCAGG ATITrCATCC CTTTCCCAGA GATTGCTACT ATGAGCCGTC TTGTGCCGTC TTGACAATTA CCTGCAATTC TCAGCAAAAA AA7TCCAAAA A'TTCTGGCAG CAGATTATGC ATCGATCAAG CTGTAGTAAT GATCGTTTCT TGGTTCTTTT ATGGA7'TTT GGAAGATAG GGCTATGACT TTGrGACCAG GTCTTTATGG GGCAGACAGG
GCAAGACCTT
AGCTCACCAA
GCTTGGCAGA
GGACAAATCA
CAAACACGCG
GTAGATTTCT
AACAGTCTGG
AAGGAACCTG
GGCATCCATC
GATT-1?rACC CTCCrGTCTT
TCAACTCTTC
AG'rCTAGGTC
AAAATCGCAG
CTCAATCAGG
ACCCATACCC
ACCTTCCGTT
TATAAAAAAG
TGCTCCCTCA
AGCAAC'GG
CTCAATCTTG AAACGGGAGA AATTTCAGAC GCTGTTAGTT TTTACALATCT CAACGGGGGT TTGGACTATG AAGTATCAAG GGCTGAACAC G''AGCCGCAG ATTGTAG CCGTACTTGT AAACCAGCTG 'IrGAAGAGGG TGTTATTGCA CTTAGTGAAA TTGAAA6ATCG TAGAGAAACC TAAGGAGA.AA CCTATCTCTC AATACAAGAT CAACTTTGAA CGTGAAATCA AACGTCTAGA 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 GCAGAATATG CCCATATCGA TATCATGGAC AGTCATTTTG TACCGCAAAT CAGTTTTGGT GCAGGTGTGG TCGAGAGCCT TCGTCCTCAT AGTAAGATGG T-rTTCGATTG CCACTrCATG GTGTCAAACC CTGAGCATCA TCTGGAAGAT 7=TCGCGTG CAGGTGCAGA CATCATCAGT ATCCATGTAG AAGCAACGCC TCATATTCAT GGCGCCCTCC AAAAAATTCG TTCACTCGGA CGrAAGCCCT'r CAGTCGTTAT CAATCCTGCC ACATCAGrTTG AAGCCATCAA GCACCCT= CATCTAGT'rG ACCAAGTT ACTCATGACG GTTAATCCAG GrrTTGGTGG GCAAGCC'1r CTCCCAGAAA CCATGGATAA GGTCCGTGAG TrGGTTGCTC TTCGTGPAGGA AAAAGGTTG AACTT-rGAAA TCGAAGTGGA TGGTGGGATT GATGACCAAA CTATTGCTCA AGCCAAAGAA GCCGGTGCGA CTGTTTTTGT AGCAGGTTCC TATGTCTTTA AGGGAGAAGT CAATGAGCGA
GTACAAACTC
TCATTATCGG
1002 TCAGAAAACA ACTGGACTAG GGTTGCAGTT TTTGCACGCG GAAACCGCGG ACAGATTTTG ATGCTTGT 1'GGGC'rGGAT CGAGGCTCGC TCTGGGTCTT GGAAGAAGAC T'rACCTCTTG CTC'TAGCACT GCGACAGGTG ATTCAAAAAG GTGCCCAGTA TACAGATCTG GAATTGGCTC TCTTAACCAT TATTCCGT GCCTTGGGTG GCCGTATTGA CAATCCTAAG TTGGCACCCI' ATATGCATCA TACrrATTGT CCAGAAGGAA TCAGTCAGCI' CT'rTA'GCCA G?'rCGGGATA GCCAGCTGAC GGAAAATTT TTCTTTAAAA AAGTGTACGC GGTAACTTGC CCAGATGGTI' ATGTGGTCGT AAAG=ACT TATTCTATTA T'rAATTGCCA GGCAGGATAG GCAGGAGAAA CACTTAAGTA CAGACCAGTT GCATTACCGC TT'rGACCAAG ATT'TGGAAGT GGT'rGTCAGC GACCGTTTGC CGGAGATTTrT GATTCTGTGA TTTGTCCAA GCACCACCAG CI7rGAACAA AATCCTCAGG CCATATGTTG GCCAATGTCT AATAGAAATT GAGGATGCGC AGAACC1TCGT TCAGACTACG TATTCTrGGA GCCAAGTATG TTCTAACGAA TATATAGATA ACTGCATAGC AAGGACAGGA ATCTAGCTGG TCTCTTT-CTG
CGGAAGAAGA
AAAAGGATGA
CTCAGGTCAC
TTCTGCCTAG
AAAACTTGAT
ACTATCTAGC
AGTTGACAGA
GGGAACTGTC
GGTAGGATCG
ATTTGGCAAA
AGAGCTTGGA
CCAGACAAC
AAGAAGTGCG
ATCTCCTCCA
AGCAACGTTT
TGACCCAAGT CCGTCAAGAA AACGTCTCCA AGCCTTGCAG TCGAGGAAAA ACI!AGAAAAG CTAAACAACT GGAGTCTGTC TCGGAGCTCT TAACAAGGT'r ALACTGGGGCA AATTATTGAA CGGTTGAAAA CTCTAGTGAA AAGAATACGT CTATCTGCCA AAGAAGCC A TGAGACAGGT CAAGCGTCAA GCGCTTTGCT CCAAT'NTTGG AGTTTTGTTT CGGTCTTCT'r TGATGAT'rTG TATCAGCCCT TCTTAACTCC CCCACCATAT CAGCAAGACT
ATGACAGATA
GAATCA.AATG
ACCTTGCAGA CACGCI'TACA AATCGTGGCC TTGGAGAAAT CrCTCTGGAA CCAAGACGCG GACATCATGA CACCTGCCCA CGAGTGGAGT ATGCCATCAA ATTGACTCX'A AGTTTCCACT GACAAGGATG AGAT'TGAACG GGATCAGGCA GATCATTTGT CAGCCAGTTA GACCAAAAAG GAT'rGAATTG CACCAAGGTC AACTAGAGAC AAGACAGACC GGA.ACAAATG CGCCAGACGG GGCT-rCCTTT GAGACAGTTT GCAGACAGTT GCCCGTGATG AGGGATTCTG GGAGAATTGC GTACGAACGA GAATACGCAA GT'rACCCGGA CAAGGCGACC GGCAGATTAT TACCGCTTW.
CTGTCGTAAG TCACTCCTAG 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 AGGGATATTA GGAACAAGTA CATAGCACCA G=CCGACAG AAGGTCrCTA CTCAGAAATC AGACjGGGAAG AACAGATTAT TGTTGCAGGA CTATCAGTTG GTTTCAAGAC CCTTAATATC CTTGCCAGTG rCAAGACCGA GTT'rGGCAAG
CCTCGGACGA
GTCCGCAATC
CCAAGTACCC
CAAAAGAGTG
7TTGGTGGTA TTCTGGTCAA GGCACAAAAA CATCTCCAAC ATGCCTCTGG CAATATTGAT GAA'rTATTAA 1003 ACCGTCGTAC CATAGCrATC GACGGACGC TCCGTCACAT TGAGTTGTCA GAAGGTGAGC CCCCC'A TCTACTCCAT 7rTCAAGAAA A'rGAGGAAGA ATATGAACAT TAGTCACATC AAAAAAGATG AGTTATTTGA AGGC=r~AC CTAATCAAAT CAGCTGACCT GAGGCAAACT
CCAGCTGCA
AAGCTCTGGG
AAAACTACCT
ATGCCCAACC
AGCCTTTACC TTCCAAGATG TCATAACArr GAGGCC?1'TA ATAGTCGCCA GATTGATGGG CCCCACGTAA GGrTGTCCAC TCA.ATCAAAT TACTCTCCC ATGAAAGGAC GCCGAAGI' TTATAACAAT ACCCCTCAAG CTGCCTCALAC CTGGTGAACC CAATGACCCA GTCAAGGAAA ?1'CGTGACTA CATGTCGCAA GCrGATTTCA AGGTCA.AGTC ACCAGrrGAT ATGATTTCA AAATTGAAAA TCCTGTCTGG 0 CALACGGATTG TCCGAAATCT CTACACCAAG TATGATAAGG AATTCTACTC GCCAACACCA ACCACCATGC C?GAAACG GGCT'rGGCCT ATCATACCGC CG'rTTGGCAG ACGCTATTAG CGAAGTTTAT CCTCAGC1CA ATAAGACCCT GGGATTATCT TGCATCACTT AGCTAAGGTC ATCGAGTTGA CGGGGCCAGA TACACAGTGC GAGGTAATCT TCTTGGACAT ATCGCTCTCA TTGATAGCGA ACAGTTATGG AACTCCGCAT CGATGATACC AAGGAAGAAC TCGT'rTC'
CTATCCAGCT
GACCATGG
GCTCTATGCG
CCACACAGAG
AATTACCAAG
TrrTCATaTC ATCCTCAGTC ACCACGGCTT GCTTGAGTAT GGAAGCCCAG GCAGAGATTA TCCA'rATGAT 'rCACAATCTG GATGCAACCA CTTGCTr'rGG TGGATAAAGG AGAGA'rGACC AATAAAATC'r TTCTATAAAC CAGATTTAGA TTAATAA'TT AAGAAAAATG TCCGTCCACG CATTATGGAA TGATGATGAT GTCAACAC'r TCGCTATGGA TAATCGTTCC AGCATTTTTT AGGATAAGAA 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 TGTTCGTTT TTTATGTGAA TATGGTATA.A TAAGTAAAAG ACAAAAATGA ATACTCTTCG AAAATCTCTr CAAACTAGGG TAGTATCGCC T'rGTCGTATG TATATATGCA GGTATATTAC AGGGTTTGTC AGTTCTA'rTG ACAATCTCAA AACAGTGTTT 'rGAACCACCA GCGACCAGCT TTCTAGTTTG CTTTTGATT TTT'rGAATAA AAATGGAATA GGAAATAGAA ATGAAATTAA GAAGAAGTGA TCCGATGGTT GTCATTTCCA ACTATTTGAT TAATAATCCT1 TATAAAC'rAA CTAGTCTCAA TACT'rMTCT GAAAAGTATG AGTCTGCTAA ATCATCCATC TCAGAAGATA TCGTCA'rTAT CAAACGCGCC TTTrGAGGAAA TTGAAATCGG.TCATATCCAG ACAGTGACTG GGGCTGCCG AGGTGTCATC 1TCACACCGT CTATTTCGAG TCAGGATCCT AAGGAAATCG TTGAAGACTT GCGTACCAAG TTGTCAGAAA GTGACCGTAT CTTGCCAGGT GGTTATATCT ATCTGTCTGA TTTGCTTAGC ACACCAGCCA TCTTGAAAAA TATTGGTCCT ATTATTGCCA AAAGCTT'rAT GGACCAAAAA ATTGACGCGG q-rATGACCGT AGCAACTAAG GCGTGTGCCAC 'IrGCAAATGC
AAATTACCGA
TCGAGAAAAT
ATGACTTCTT
ACTCAGAACT
AGT'rTGACTA
ATGTPGAGGT
1004 AGTTGCCAAT GTCCTCAAPG TCTC1-17"rc CATTGTGCGC CGTG.ACCTCA AGGTTCAACT GTTAGCGTCA ACTATG-rrC AGGTTCAAG;T GCTGACCGTA GI'CCTrTCA AAACGTAGTC TTAAGGCAGG CAGCCGTGTC TTGATTGTGG GAAAGGTGGC GGAACGGTCA ATGGTArGAT 'rAG'rCTCTTG CGCGAGTTCG GGCAGGTGTA GCGGTC1-rra CGGACAA'rGC CCAAGAAGAA CGIYGAAAAGC CAAGTCACTC VrGAAGGTAA CCAATA'TTCA TGTCAAGAAC CAAGCCATCC TGGCAATATC TrrGACGAAG ATAAATAAGA GA'rAGAACTA AAGGTTGGAA 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 CGATTGTCCC ACCCTTTCTT TGCAAACAGA ATAGAAGGAA TCAATAGAGA AGAGTTAGAA GCGA7*rGMT CCGAGTTCCC ATGAGAAGGG GATTCGTGAG AAGGCAAGAG CCGTCAACCA GCTTTAAGGA ATATTTTGCA GTTAAGGCTA CTCCAACTCC AAGAAGAAGG TTG1'GGTCTG GACTGCTCTA GTTATGTAGA TGGACT'rTCT GGGTTCTCAG ATTATGTTCT CTTCCAACAA GCTTATGAAA ACACCATTTA CACTCCCTTT CACTTGTATG AGCTTTTCG TGGAACAAGG AGCTATTrrG AAAATTCTCC GCTTrTTGATG
CACGCCAGAC
CCTATGCACG
TGGAGAGAGT
TTGAACTGGG
ACCAGMTCTT
ACTCCTTCCT
TCTrTrGAACT TTTCTGGCGc
TTGGTGAGGG
TGAATTGGGT
AGCAGCATT
GACAGACATT
TGAAGCCT'TT
AGCCGTCCAAT
GGCTGTrGAA TA7TGGTGTT
AGTTCGTAAG
OCCACCATTA ACrTGGATGC C'TTGAAGA'r CCAGAAATCA TCTCTTGTCG TTATAATCCT ATGGACAATC CTGGGGAGGC TAAGTTTGGC GCTATCTTGA AGGAAAAACG AGCCAAGACT ACCGTGACCC ATCTC'rATTA TCCAGAGTTG ATCAAGGAAA AGTTGGCCAT T'rCGCTAGAC AATTATCATC CAGACCAGGA GCCGAACGAT AGCCA'rAAAC 9660 AAGGAATACG 9720 ATTGAACATC 9780 GGAGGCGTTT 9840 ATGACCAAGG 9900 7=CrGGATTC 9960 GCTCGTCAGC 10020 TTATCAATC 10080 ATCGCCTTGA 10140 CTTGGTCAG4G 10200 CTAGTCACAA 10260 GTGTATGAAG AGGTTCTTAC GTCAGCAGGT TCAAGATTTT CACCGAATTG GGTCGTrA TGCTG.GCACC TCACGGTGCT GAGTCACTCA TAAGAAGGAA ACCTACCGTA ACCTCATGCG TCCAGCATG TACGGAGCTT ATGGACCAGC TGAAGTGGTA GATGTGGTCG CAG'N'AATCG CGAACTGCCT CATACAGAAA GTGCCCACGG ATTTTCAATG GGCTACCAGT TC'rATACCGA AGAAGGTAAA GCCCGTCAAA rrGCAACC1-r ATATGGCTTC GATTTTGAAG CCTATCTAGG TGTGGATGCC TCAGCAGTCA ACCATCATAT TACCAACGTG ACCCA'rCCAG GTTCACTCTG TGAAAACAAT GATAAATTTG TCGGTGATTT GCTCCTCATT CATGATACAG ATAATGCCAA ATTACGTTCT GCGGAAATCC 10320 10380 10440 10500 10560 10620 10680 10740
TCCGCCGTGC
AATAATCTCA
AGAGCGCCCT GAGGACTATT TAATAGATTG AAAATGAAAT GTTTTCCTT CAAGTCGTGA TGAAAAACAG ATTGCTrrCT AAAAAATAGG CAAAAATCr 1005 TATAATAAAA CTATAAAACG 'I-rr-CAAGGA
TTATGGACAA GTCACAGGAA TGGTATTCGC TTTATTGTCT CCCAGACACT TGGGCTATGG GACAGAGGCC 'rTGCGCTACC ACGAGAAGCT CTCTTGCAGA AGGAATCCAC 'rCTACCTCC
TGGTGCA'N'C
=rTGCAGCG
AGTCCAATAA
GTGGTTTCTG
'rrCATTTCCT
ACACCTGTGC
AGGTAACGAT ATGTCTGAAG GACAGAAAGC TrTGGGAG CTGTCACATC CGTTGCCACT GTCACGTGAA CGGACGGTAG CAAATAAG GGTGGGATTA GA7TGCTCTC TTCACCAACC TCTTCCTTTC CCTAATAAAC TGACTrGT' CTTTTGGATA CCAAACCAAT AAAAATATCI' CTGGATTCCC CACGTGCTAG T=~AAGTTC GTCAAGACCC CATGGGTGAG TTCA-AGTGGC ACCAACAGCA GATCGCGTCA TTATATGAAA CGTGTACATG
AAACAATTGA
TAGATGGGCC
ATTG4CCACAA ATGATGTC?1' CAGrCAGTGG
CTAAGGAACA
CACGT'rACCT
TCAAGGAAAT
10800 10860 10920 10980 11040 11100 11160 11220
TGAGAAGTTT
CAACGAAGAA
CCAGTATCTA
GACAGACAGA
TGATAAGrI-r GACAAACTCA TGGCTG'rCAC CAGCACAAGA TTGTCACTAC TCAGATATTG GAAAACCTGT GATGATGACT TGATTGAACT GAAATT'CTAC CTTATCACAC AATTCCATAT TCCCTCGAAG ACAAC1TCATG GATACCGAAA AAGCC1'GATG GAAACATCGG TTTTTrCTTCT TATCTCGAAC TGTGATAGCA GTTGGTTGTT
GAGTCAAACC
GTTATCAAGA
GCTTTTGACT TGCAAAAAGA CTTAGCAAAT G VrGTTTTCC AGCGTTGCGA TTTTGTGTT CAGGGGTAAC GTCTTTTCGT CCACTTGCTT ACTTGCTTTT GGTGGG=rCT TGGCTAGTTC TTCACGGACT TTmT-GCGAA AACGATATAG TTGACGATAA ACTG'rTGGAG AATCATCATG AAACCACCGA AAGTGTGACA CTAGCTGGTG AGAAGAGGGA GAAGACCACG ATCATGACTG AATCATTTC TTGATTTGr'r CTCTTTGCAT TTCATCTTCT ACTCCGTGAA CGA'rrCAAGA TAGTAAAGGA CACCAGCACA GGCAACCAAA A'rCATACTTG AGGAATGCCT AGGTAGCTTG CTTGAGCAAC CCCTTCAGTA TGTTGGGCAG AGCAGAGAAG AAAGGCATTT GAAGGAGCAT AGGGAAACAT CCTACACCGC GATACCGTGC TCT 'rTTGAG CAGCAAAGAG AGCTrG'=TGG GCTTCGAGTT ACTAGTCGCT TCrTTTGAGAC GCGTTTrGGTG TGGC'TCAAGG ACGTGCTTGA CTTTTCAGAG TGAAGCGT 'G CCTTCCATGA TTGGTAGATA CCAACTGGTA GCGTACCATA ATGGTTACGA TAATGATAGC GACACCAAAG CCTAGACCTT GAAGTACTTG ATGGCTTCAG CCATAGGCGC TCCGATCGTA TT'CCAAATAA TGGCTTGTGC 11280 TTCCAGGATT 11340 TCAAAAATGT 11400 GTGAACTTGG 11460 AGAACGCTAA 11520 GATAGAAAAG 11580 CAGCTAAGCC 11640 TT7TVCTTGCT 11700 'rAGAGAAAGC 11760 GTTPTGGACG 11820 CA.ACCCAGTA 11880 GGCTCATGTA 11940 GTGAAAGGAG 12000 GAGAACCTAG 12060 CAAAGTAGAT 12120 CAAACATGCT 12180 TTTC'?TCTTG 12240 GGGCCTTCAT 12300 ACATAATCAA 12360 TATCAGTAGC 12420 ATCCTGTTCC 12480 1006 CTGACCTGTG GTTTTATCGA CATTGACACA GCCAGTCAAG ACAAGCAACA TAGCCACTCC CATAGCCGAG AGTGCA.AAAT CGGGGT INFORMATION FOR SEQ ID NO: 150: SEQUENCE CHARACTERISTICS: LENGTH: 5238 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 150: TGACACTCTG TAGGATTGTC GTrAATTGAT TGCTCGTACT CTCTACAATA ACCACCAAAG TAAAAACGAC ATAGAAAGAT AGCATCAGCT GTAGCCATAG CGCCTTTGAC ACCTTCTGGA TGATTATGAG TTACCTCTGC AGAAAGACTC GTAAGTCCTC TAGATGATGG CCATATACCA GTTTTCGCAT AAAAACCACA GTCCATGATC CAAGCACATG GAGAAATACG CATAGCTGAT CCATTCCCAA AGCTATTATA AGGCTCACGG TTATCGCTGT TTAGCCATGC ATTAAACCGA GCACCGTAAT CAGCATTCGG ATACATTCTG CCATATTTCT TCATCCCTC AATGAAGTCA TCT'PTTTGTC CACCATTCAT AATTGCTTCT GCAACAGCAC AGGTCATAAC CGTGTCATCT GTAAAAAAGC AGTCCTTCCG AAATAAAGGA AAGTCCrrTG TTTTGATATT GTTCCA'rTCG TAAACAGAAC CGACAATATC TCCAATAATT GCTCCAAGCA TCAGATTCCT 
CCTTGTTCAT TTTGATGCTT TTTATATTGG TTATCTACCA TATTTATTTT AGAAAATAAC ATCCTGTTGG 12540 12566 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 ATTTTAAAAA TTTCATTTTT TTCAAAATAG 'rGAAAATTGA 'PTGATTTTAA AGGAGATAGG ACTTCAACAA CAAGTGT'rCT GCCATCGCGA GGCGCATCTT TCCAACATGA TATCGCTTCA TAATCACCTG TATAGGGTCG AACATCTAAT TCAGCTATGA ATTCTACAAC CTCACTAATC CCTATTAAAT CATGGGTPCC ATTAACAACT GGCTTAATAA AT'N'TCCCCA ATTATCAGGT TAAATCTTTC GACCATAAAA CTCTTTAAGC AAGCCCATCA TTTAGTAA TGCTCTAGTC CTTGTTAACT CTTTTATTTC AGAAAAAGAT TAGGCACCTA CTTGAGCATT GTATTTATTA GGTrTTTACCA TTTCTTTCCA CCTAGCTCTA CCATAATTTC CCAATGCATA ACCATCATTT GTAACACCGA TATCTAGTCC ATAAGCTATT TCAATTACAC TTGCA'rCAAA TTGTGCATGA
ACGCGACCAT
CATATAGGAT
CTTCCAGTAA
ATATTCACAA
TCAATAGGAT
TCCATTATAT
TGATATAAAA
CTAACACAAA ACAACGCCAT AGTCGAAAGG TAGACCAATA AGACT'TTA ACCAGCTTTA TCTCTCCTAA AATACCAGCA AGTCATGAAC CGGAACGTTT AATCTACAAC TATATCTTCA TAACTTCTTC TCCTTGTAAG ATTGAAACCT CACTTGGTAA TTTACTTIGT 1007 CTAATATAAA CAACCA'rTrC ATCACTCCTrA TATCACTAGT GrrACACCAA TTTGTAAAAA ATAATACCAA ?N'rCCTCTT ATT?1"NTGA GTAAATAGCC CCCATAATAT CATCGAAATA ATCAACGGTA TrrAGGAGTA ATTCAATAAC ATCTCTAGCA TCTTCTACTA AATNTrCA6AG CATTATTCTA 'rNACGG TGCCAI-r-r- TAACACTAGT GTCAACTGGC TrrT"ATAATA CACAGTAAcT TTAAATC~rr GGTA'rrAAAA TGTCTCAAAT CAGTTATCAA ATCTAGTATA TCTAAACAAG AAAAAGACTT ACACI'CAAGG TGACTCTTTT ACTAGCAAAG GTATATACTC CTGGGAC~rr
?I'TCTCTAGA
CATCACCTCA
GTTAGTCGCA rrCCCCTr 7T'=ATCAT CCAAGCTAAT AGTTAATTCT ATCACAGGTG CATTAGTTT-A AAAGTGGAG.A AAI'MrCACA ATATTTATAG AATTATGACC GGCTACOTrCA GTTTTCTTCC CCCCCTrCGT ACAAGGAACT TTGGTTGACT GGA7TT=AA
AAATAAAATC
ATACTTTCCC
TATAACGTTT
1500 1560 1620 1680 1740 1800 1860 TCCAACTTCT TCTLrAACAT ATCCTTCTAC ATCTTCAATC TCTACAAACA GTGACACAAG AAATGCCAAA CTTCGATCCC rTrTTTTrCTG TAAAGAATCG TTCACTTCCG AAAAAGCTTC TGTCGATTTC ATATCCGCGG CTT-rC-AAGA TTTACCATAG TTCGTTTCTC TTGTTTCGAC ATAGCCTTTA ACTTCATGGT ATTGAATCTC 1920 TTrGGGTCTKA 1980 CTTCACCGTC 2040 AGTCT'rrTGC 2100 TGTTAACGAC 2160 ATAAA'rTTGA 2220 TAGATAAAGG 2280 GATTAAAATA 2340 CTACAAAGGC 2400 ATATGCA'rCA AT'TTTTCAAT AATC'rCTT-rC CAAATAATGT AACAATATTT CCGTTAAAAA CTCCACTTCA AAACCATCT TTTCCCATCT TCTATGGAAT TTCCAATATA ACCATCGATA GTCACCTGAC CAATTCAGGC GTCTTATACC CAGAACACAC ATCCTTCGAT CACTCTATCA TTACATTTTC CTCAGGATCG ThATTrTCCAT ATALATCCGGT CTGTTTCCAG AGTGTATCCC CAAATGCTAC TAAATCTTTA ATCTCTCCAT TTTCATTATC TCTCTGTATC ATCTCATCAT CTTATCGACC rrCGTICTCA
TTTTTGACGGG
AACATA.AATT
ATGrTrTAG
GGGATTTGAG
GAATAATCAT TT'TGGTACAA AGGCTAATGT AAATAAGCAC ATTTCCTACT TACTTTACGA CCTCGTCGCA TTGGCTGAAC ATCTACTTTr ACTTTGCTGA 'rGCTTCAACT CGTACAAGCA GTGATAC4;GC CTCAGCGTGA TGCGTCACTG GGACTCAAAA GGTTCGGGGA ACCTTTTGAG GATT-AACTAC GrTTCTCTAA TAAACTTACA CATTCAACTT GTTCATCATT G'rCCAAACCT ATGTTCAGAT TTTCTTCTAT AATTGGTAGC TTAAAAGTAA TGGA'TrTAG CCATTGTCCG TTAGATTGr-r T1~rCTTCATA AACTTGAATT TCAGAAATCA AAGCTGAAAT TAACTGCCTA CGCTCTACAT CATTCATGAC TTATAGAGC T1'ATCAAAAT AGATCACAAC CTTATATATG TTATCTCCTG TAAGCrmC AGCTTCAATA GTCTGTTTCT TTGCTTrCGC ATCAATTAGT GATGATTCTA ATTCATCTAG 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 TTrTGTCATAC ATACGATATA
ATCTTC.AACA
TCTCAArrCC GTTGATrTTT
TCTAAATTAT
TTTTGGTAAT
TCTTGCATCA
CTCTGCAACA GCATCATCTA ACCTCTTATC ATCTGCCTAT ATCTTTCI-r T'rCTrGATAC TACAATTCCA GAAAGCAAGT C"CTCGA CAT1rMAGCT TTCZATGTATC CC 'TCAGATA TGTACCATGA AC"N'TTTCTA ATTCTTTAAT ATCTTTCTTA TGGGATTTr'r CTAATTCCAT AGTATTTACA TACTGGTCGA CAGCT'rGCCG TCTTCAAGTT CCCTGCTTTT TGAATGCGAC AGCCACAGCT GATAAAACAG ATCTTCAACG CAGATAAGAT ATCAGCGGCA TTTCT'rGCAA TTTTCCAGAT TTTATATCTT GTCAGACTTC CCGGCATCTT AGCTTTCATT CGTGATTTTT TACTCGTGTA TAGAGGTATA TATATACAGG AAGTTCTTCT
TAACACCTGC
GA'rTCTCTAA
AAATCGTTCT
TATATCCATA
CTTCCATTGT
AAATCATTAG
TAACTCCATA
ATCTTGATAA
CCATCATTCG
CATACTCTCC
GTGCCTCTAA
CTTTTATTTT
TCCAAATGCT ATTTTCCCAT ATCAAACAAA GGATTCTTAC CATTCTGTCT GTATTTAGAT ATCCCATTGG CTCCTATCGT TATTGCAACT GCCTCTTCCI' CATTTATAAA CGGAGCAAAG CCACCATTCC ATTTTCCTTC TTGAATACTG ATGTTTTCTC TTTCTATTTC TTTCCCAGCA TCTTAGATG AATCAATGCC 1008 GTCTATCATC TAAATCCTGT TTCCTTCTCT TATAATGCTT CTATTTCCTC AATTAG CTTA AACTTTGTAG AATGACTCTT TATCTAT?1'C ?I'TNCTATT TCAGAGGTAT CCACCTTCAT 'rAGAAGCAAA T'rTCGGATTA CTTACTATCT TGACAATCAC ACAATTCTTC TCTAA'rMC TTXCTGAATG TACACTTATT GGTTACAACC ATAGTAATAA AAATC TTT AT ACTTTGTGCC ACTTGTTCCC AAACAT'rCCC ACTCCACATA TCGGGCATT GTGTGCGTGT ATCTTTCCT TTAT'rCACAT GCTCATATTT TAACCTGAGC AGCTTGCCAA ACTTCATCGG AAACTATAGC TTAGATATTC AT CTT GTTCA ACCTGCTTAT ATTCATTTCT 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 ATCCTGCA'rT ATATCAAGTG CTTAAACACA AGAACAAAAG ATTGAACTGT ATTCTACCTT AACAATTTCA TAATCGTTGT CGAATACCCC TCTATCTGTA TTCTSTTGAC ATAGTATTAA
TAGAAAGAAC
ATACTCCATC
CAATAGACTT
AAATAGCAAA
TTGACGTAGA
CCTCAA'rATA ATTTTTCTAT ATCATATATA ATTTTTTTAA TATAACACTT TTA'IrAGTCC GTCTCAATTT TCTCTTGATT 7TTGCTGGC GTTGGATCCG ATTTTGCAAT CAGATTTGCT ATTAAATCAG ACCTrCCTCTA AACT'N'TATA TCTAATAAITT AAGTCCTF'rc GATTAGCAAC ATAGGAATCT TTrCAACCTTA GAAGTATCAT TACCCTCATT TTTAAGT TTG GACTATCATT TCAAGTATAT GTG=TTGC CATGTCAAAA CTATTTTTCA GTAGAT'rATC TAAATCTAAA GCACCAGCAT CCAATCCATT CCAGTCATTG TCCAATATAT A7TI'GTTAA TTAAGTTTTT TGACATTGAC CACTTCCGCC TCTATTCCGG TCTCATAG CGGATAGGGT
ATGAGCCCGC
ATCCCTCCCT
1009 ATATTCAAAC TCTrACTTAT CGCTCACTTT ClrTrTG.C= AGCAGAACTT 1 rN'GCCGA ATTATTCAGC CGAAAGATCT TGACGGATAG GTTATTACGC TCCAAAAATA ATTAACGTCT TGTCTTGGTC TATrCAATTG TTAAGGTTCA AAATTTATCG AGAGTTATTA ATCTTTTTAA AATr'rGACCA. TCAGAAAATA ?r'rATCTTGA TGTAACAAAA TTCTATAAAT TACCCTCTTA TACTrAACAG TGAAAAGAAG TCT'N'CTTGG TAACCAATTT 'rGAAATAGAA TTTGCTrATA TAAAAAGGTC CAATTCCCAC TGCATAAATA GCAGTGAAAA TTAGACCCTC TTGGTAACTG TCATCTAAAA GTCTTCTA INFORMATION FOR SEQ ID NO: 151: SEQUENCE CHARACTERISTICS: LENGTH: 13425 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 151: GACGATTTAC GAAGAATCGA ACAAGAACCT AAAT CTT GCA GTTCATGCTT ATACTTTTTT GCTCCTATCA ATTCCCAACC TCTATCTCTA AAGAAATCTA GAATCATAGA TACGGTAGAT GAACAAACCA AAACGACTCG TTCTATACCT TTCTTCTTCC TTGTCCAA.AT CAACAATGGT GACATCGTCT GGTTGACATT CCAACCTTTC AAATGCATCT GAAAATCCGA AATrCTACTC GGCGTTATTT ATTAGATAAT TCTGAATCGC AAAAACTATC AACTGAAAGA ACTACCTCCA GAAATAAAAT AGGAAGCATC TAAACGAAAA TATTATGGCA ATACTTTCAA AATGTTTAAA T GAAAGTGCT TATAGAATGA TATGCGGAGA TAATCTGACG
GGTCAAA.ATA
CATGTAAATG
TGCTATTCAT
TCT'rTCCACT
AATAAAACCC
GTGACAAACT
TGTCTTACCC CAAAATTAGA TTTGACTCAA TCTCCAAAAA AATCTATTAT GAAAATCAAA TTGAGAAAAA CGGTAGTAGA CGCATTGTTA AAATCCGTTT GGCATAAAAA AAATTTACGC CAGTTTTTGA ACGGCTGATG TGTGATTATT TATITI'TGAA AAATAGATCA TAATA.AGTCG CCTGCCGAA.A ACAGCGAAAA CGATGCGAAA GTATATTGCA TACTTATTTT TGATATAATA GAAGTATAAT TTGTTCTGAT AATGCGGAGG GTTCAATATG GTTGAGTTTA ATATTAAAGC AAATTTCATC ACTCCTGCTA AAACATGCCT 300 ATATAAGAAA 360 AACACTTTCC 420 GCTAAAAAGA 480 AATCTTTATT 540 TAGATATTTT 600 CCAGCCCGAC 660 ATAGCGGTGT 720 CAACAATTTA 780 AGTTTATTTT 840 TAAAGTCTAA 900 TTGTATCCAA 960
GCAGAGTAT
ATGGAGAAGT
GAAAGAAATG
TTTATAAGTG
AGATI"PTTAG
AGTGAGGAGG
1010 AGGATGGAAA AATGGTGAGC ATATCGCTI'A CGAAGAATAC TTCACTGATG AGTTAGAGGA GATAAGGCTC GCCTAAAGA AGGAAAAAAA TCAGACTATT CCAATTGGA ACTCGAATTG CAATTGTTGA GGCAAAGGAT AATAAACACA AGGATTACAA CAAGCTATTG AATATGGAGA GATTNTAGAT GTTCCATTTG GAATGGTGAT GGC'TTTATTG AACACGACCG TATCACGAGA GAAGAACGTG AGACGAATTC CCTACTCGTG AAGAATTAT'r TrCTCGTATC ACGAAGGAAA GTACGAAATT ACAGAAGCTA TCTCAACTCC ATACTATACA GACGCTCT GCCACGCTAT TATCAGCAAA TAGCTATCAA CCGTACTATT GAAACAGTTG GTCGAATT'GA 1020 CACTGTA'rrA 1080 GCGTTCGAGC 1140 TTTA7TC~rC 1200 AGCTGGAGrr 1260 AAGGATTGAC 1320 CAATGAAAAC 1380 CCAGAGGACA 1440 AAA.ACGAGTA ATGTTrTGTGA TGGCAACAGG TATTCATCGC CTTCGAAAAG CTGTTGGC AACGGGGAAA ACGTTCATGG CTTTCAAAT TAAACGAGTT TTATTCTTAG CAGATAGAAA CTTTAGGCCA TTCGAAAAGG TAATGACGAA AAAANTAAAT TCTTTTGAAA TTTATCTAGG 9.
CATCTTAGTA GACCAAACGA AATTACACCA AAACT'TTTGA GCTTTATCAG CAACTAACTG AGACTTCTTT GATTTAATCG TAACTGGCGT AAGGTAATTG TCTrAAAGAA ACCAAGAATG TA:GTTTAAAA CAGCGAATCG TTTAGATGTG GATGTGGATG ATTAATAGAA GATAGGTACT TAGAACGCAA AGAGT'TGCCA TGATAAAACA ATTGTTT'TTT
TGGCTGAAGA
CTGCTCCTGA
GTGAAGATGG
TAATTGATGA
ATTATTTCAG
CTTCCAATAC
AGGATGGTTT
GTTA'rCGTCC
ACGGCAGGA-A
AGTTTCTTTC
GTGTTGATAT
AACTGAAACA
AGCGCACCGT
TTCTGCGACA
GGAATACT'rr
TTTGGCTCCA
AGAAACTGGA
CATTATCAAA AATrTGACAA GGT'rCAGCTA AGGAAAACAG CAGAT'rGGGA TGACCGCTAC GGTGAGCCA-A TC'rATACTTA 7TXTCGTGTTIA TGAGGGTTAA AAAGTTGATG CTAACGGACA AGATTTGAT AAAACCATTC TGATTATATG AAGCAAAACA TGACCATGCC GAGCGAATGC AGACTA'rCGT TATGTCATGC TCAT'rGATGA
ATGCACGATT
GTGCTGCACT
AAGTAACTGG
1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 TGTAAAAGAG AATCTAGACT TAGTCCAAGA TGACAACGCT GAAGGAAAAG CTCAACTGGA TAACTTTATG GATGTCAAT'I CTAA7TT'C2 CGCTAT'rGTA ACAACGTCTA AATTATTAAC GACAGGAGTT AATGCTAAAA CATGTCGTT GATTGTTTTA GACTCTAATA TCCAATCCAT GACTGAAT'PT AAACAAATTA TTGGTCGTGG 9 9999 *9 9 9 CACACGTCTT TATCCTCAAA AGCGAAAGA AT'rT=rACG ATTATTGATT TACCAATTTG TTTCCTGACC CTGATTTTGA TGGTGATCCA GTGAAGGTGC TGCGAAAACA GTCAGTGGTT CTACGCCCGG TTTCGTAGAT GAGGAACGTG AAAATA'rATC GTTACAGACA AGCACCTTAC CATTCTTAAT TCTAC'TGTTC TGAAAACGGG AAACTGATTA CCGAAAGCCT GACCGACTAC ACTCGAAAGA
TTCGAAATGT
TAGAAACACG
ACCCAGTAGA
AAGTATTGGA
ATATCTI'AGG
1.011 TAGCTACGCC AC~rrAACG AT'N~rrATrCAC AG7MTGCAT ACCCAGATA AGAAGAAGCT TATCTTAGAC GAACTTTATA AAAAAGGAGT TTATCTAGAT GCTATTCGAG AGTCGGACGGG AATATCAGAA CAAGAAATCG ATGATTTTA TrACTCCTA AAACTTGCCT ATGGTCAAAA AGAATTAACC AA.AACGGAAC GTATCAA'rAA ACT1CAAACAA AGCGATATT TATATAAATA TAGTGAGGAA GCGCGTC.CTG TTTGGAAAT TTTACTGAAC AAATACATGG ATAAAGGTAT TGGAGAACTC GAAAGCATTG AAACATTAAA ACTCCAGAA TTTCAGATAT ATrGTGGAAC CTTCAAAATC ATCAATAC~r A'rTTGGAGA TAAAAAACGA TATTTACAAG CAATTAAAGA ATTGGAGCAA GACCTAT'r'A CAGTAGCTTA ATGAAAGGAA AGTATGTCAA TTACATCATT TGTAAAAAGA ATTCAAGATA TCACTCGAAA CCATGCTCGr GTTAATGGTG ATGcCTCAArCC
C S 0 00 0 9 0 00 00 0 0 0 0* 00 0 0 C. 00 00 0 0 CO 00 00 0 0000 0 00 CO 00 0 0 TA'rTGAGCAA ATGTCTrGGT TATTATTCTT GGAATTAGAA GAAGACGAGT ATGAGTCAAT GC'TCATGCT CAA.AATGGGG AACCGTATT TAACAAGT-rA TTCAAAGAGT TGAAAGAGCT AAAAATTTAT GATAGCCGTG AAATGGT'N'G TATCCCAGAG GAATTAAAAT GGCCAAATTG GACAGGCGAT GAAT'rACTTG ATTTGTCAA TGAAATAACT TCAAATATGC CTA7TCGAAA AACGATTGTT AAATCAGCTr TTGAAGATGC ACGCCAAGTC ATCAATGTA TTGATGAAGT GTTTAATGA'r ATTTACGAAA AArr'rr=AA ATTTTATACG CCACG'TGCAG CGACTGATTT AGAATCAATG GCAGACCT'rG CTTGCGGAAC TTTAAGTAGT CAACGTAAAA CTAGTGAAGA TATTGAAAAG AAAGCATTTC CTCATCTTT TGATGACCCT AAAATTGT'rC ATGGAAATAC GAACAACTAT ATGAAAAATC GCGTCTTGTT TGATTTCAAT AGCCCTGAAC ATCGTCATTC AGATATTCAA AATGCTGGGA ACICAGGAGA TATTGCCGAA GTTCTTGACC CAAAACTT'GG AGGAGGCTTC TTGACTTCGA CTCTGAACCC TACCAAAAAA TATAATACAG CTGTTTTTGG AGCAGTTACA AATCTGTTTC TTCACCAAAT TTTGGAGAAA AATGTTCGTG A.ATATACGGA 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 000* 0 0*O0 00 *0 0 0 0 TGATGAAAAA TTGACATTA TTATGATGAA TCCACCTTTT GGAGGGTCAG AATTAGAAAC AATAAAAAAT AACr'rCCAG CAGAATTACG GAGITCTGAA ACAGCTrGATT TATTwrATGC TGTCATTATG TATCGITTGA AAGAAAATGG TCGTGTTCGA GTTATTTTAC CTCATGGTT TCTATTTGGT GAAGGTGTAA AAACTCGCT? GAAACAAAAA CTGGTAGATG AGTTCAACTT GCATACG.ATT ATTAGGTTGC CTCATAGTGT CTTTGCACCG TATACAGGAA 'rCCATACGPA CATTCT'TTTC TTTGATAAAA CAAAGAAAAC AGAAGAAACT TGGTTTTATC GTTTAGATAT GCCAGATGGT TATAAAAATT TCTCGAAAAC TGTTCGTGAC TGGTGGGAAA ATCGTGAAGA TAAGCCGATG AAG'TCAGAAC ACTTCAATCC GATTCrCGAA GGTAA=TCT ACAAATCTA-A ATCATTTACA CCTAGTGAAT AAAAGAGGAA GAGGAAATCT AGCAACTrrA AATCATAAGA CAAATAATGA CACCAGAACA TTAGTGCCGC AAAATCCCAA GAAAAAGAAA AACTTATCAG TTTCGTGGTG ATGATGGGAA GATG~rCCTT ATGATATTCC ATTGTCAGAG GTGGCTCTCC ATAAATTGGA TAAAAATACG GAAAAAATCA AAAAATCAGG TTAACTAATr CTATGAGTr GATGGATGGT TGGCTA'rTTC A'rTCTTTCAT CAAATGTAGT TGGCcrAGTT 'rAAATCCCTT
TTGATAATGT
ACTTAAAGCA
TGACGAACCT
TGAAGGAAAA
ACATTATGGG
'rCATAC?1'GG
ACGACCAATC
TGATACTGAA
GCTTAACAAA
TGGTAGACCT
GAACTATGAA
TT-ATTCTCAA
GAATTATAAT
TGAGTTGATT
ATTAGCTVGAT
AGTATTCTCC
GCAAGTGAAT
ATCAAACGAG
AAGT'IrGCTG
GAGTGGGTGA
AAaCM'TATC
AAGGGTGAAA
ACTAGATTTG
TATA7'TTrA AACTCxr'rAA TTTCTrATCTC AT'rCTTATCC GCTr'rAGAAA
GAATI'TCCAG
GAACAAGACC
CAAAAACTCT
CAAGGAGATG
AAAGATA'T
AATAAAGGTG
r-ATAATGAT TTAGACCAGT GTGACTTTCC CAGAATTATC AAGCGGAAAG ATTTTGCAGT TGTTGGAGGA AAAGAGCGAT GGAAGGGAAA TATTAAAGAG rATAAAGCI' ATAAAAAGGA AACTGAGATA ATGGAAGCAC TCAAGAAATT GGTTrlrCTAC ATTGGTTGAA TTACTTCTCA AGTAGATGGA AGTATATAAA TAATGTTAAA TAAAAAAAGG TACATTTTTG A'rGTTCATGG TGCAXATACAC ATAAACATTA CCTATTCTAT TAATTAGTGG AGCTGTrGTG CTCTCCCCCC ACTATCCGAA AAGTAGATGA ATATGCTGAA ATA.AACTAAA AAAATCTATT CAAATGATGA ATCAGTCGAA TT'GAAGAAGG CAAGAT'rAAA ATAACTCTTA TTATGGGAAT TTTCAATAAA TACAGGTCTT TTAGAATTAT ACGTGGTGGT ACTACATTGA TACACAA'rTC AAAAACTTGA ATAGTGATAA AGTTGCTTCT CAACAACGAA TAGTAGAAGC AATCGAATCA AGTTA'rAATA GACTAGAACA GCrAGATAAA CTTCAATATG CTATGCAAGG AAAA'rTAGT'r GT'TrACTTG AAAAAATACG AGCAGAAAAA AAGAAAGATT TGGACATTTC TATTGTTTCC ATACCTATGA A7'TGGG'rTGT TATAAAAATA TCTTACAAGA AGGGCGATTT AAGCATTAAT AATATTAAGC CTTTAGAA'rT TrCTCTG'rTG ATCTCCT CTG AGCAAGTTTA TTTAAAACAT TTAGAACATA TTGGAAAGTT TGCAAGAATC GGATTATTT TCCAATTAAC ACCATTCGAA TTTAACTTGT CCTCTCCGTT ATrATAAA CAAGC~rrAT A'rAATA~rCC TAAAACTACA TTTGAGGAAC AGGAACTTAT TACTCAAAAA CTTTGAAAAT GATTCT C ATCTCTTCAT 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 AATCAGCTAA TAACACCTGT ATCAACCTCT GATAAAGACT ATGATGGTGT TGTGGCTGGT AGTTCAGAGA TTA~rCAAA ATTTCTATTA CAATTrGAAAG CAATA6ACTAA AC'rATCAGGT CTGAGCGAGC TATTAATTCC GT'rAGCTCCT GTTGAGAAAC T7rTTTGAAAA AGTAAATCAA GATTAGAAAT AGGGATTA.AT AATTCGGAGA TACrGGTACT
TAAGTCGCAA
ATATCGATAT
TAGATAAT
TTAATCCTAA
ATTTAATGTr 'N'CCCTTTGA TAGCATC~rr AAATATCAr AAGI'AATCTC TGATAATATT
AATGGCT'ICA
AAAACTCATT
TTAGAAATA
TA'rAGTGG AA'rAAAGT'rC
GATTCTCTCG
CACGAA'rATC
CI'TAGGTGA
CA?1'AGTTAC CTGATATAGA 'rACCCAAGGT ATT'rCAGTTC CCCAAAAAGTr TT=rCCTrAT TCTCAAGITA ACTAGGCTAG CAAA7TTrAAT TTTCATATAT AGGATAAGAG AAAAATAGCA GGTAGTCAGT ACAATATTTT GTTG'rTCAGT GTTGTTTCGT CTTTrtrCCC 'rrGACCACCT GT-rATTTTTT AGCTGTTTTC CTTAGATAAA AAGCAAGGCA ATA'rCA'rTAT ATGAGGGTAA ATTACGTTAG 0
TCG'TGTCCCA
GCAAA'rAAAT
GCATAATT='
AAAAATAAAT
AATTTTTAG
CCGATTTCTA
CC'TAAAAACC
AGAAAAAGGT
ACAAATTTA
AGTCGCAAAT
AAATCTTCCA
CGCACAGTAT
AAACTCCTAT
G'rGCGTCAA'r
GGGGTGGGAG
TACrACCCGA
AGATGAAATG
CAACAGCCCA
CTAGACCTGT
TCAAATCCC
GCCTAAAAGA
TTAACTrGTTG AGCAACTCCT CTTGCTGTAA AACCACTTCG GCGATTTTCT GATTCTAACC GAATGTACAG TCTACGAATT ??ACCACCTT CATCCTCTAC TTTTAGTTTA ATAAGTTlCAC AAACGACAAA ATGCCATTT AAAATACCAT AATCAGCATG GCTAATGACA TCTTCTAAAA A'TTTCAAATC ATCATGACCA ATAAAAGCCA TGACAGT-T'T AGGTTTAAAA TTGTCTAATA 7TrTGAGTTC GTAAT'rCTCC AAGAAAAATC V1TTCAGCTAA ACCACL'TC TT-CAAATGTA CTTATGTTTG ATAGAAATTC CACCGCACGT ATGGGCGAAA AATTGTTCCA 'TMrATCAAC GGCGATTAAA TATTCTTTTA TAG=rCGT CGCTTTN'CT TCAATAACTG ACTGAACAA'r GCAATTAATA ACCCCCGATA AGACTCTTAT GCCATAAGGA TTTmATrTT CATGGTAAAT TGAATTCCAC ATTAAATCAC CATCTCTTAG TGAATCACCT AAAGTAGAGA TTCN'ATTA GCATAGGGGA CAA'rATGGCA ACN'TCCAA AATGTCrATT TTCTTCATT ATAACCAGAT ATAGGCA'rAT AGCTTCACTG CGTGGAGGAG ATATCTCCAT GCTTCTGCGA ATAATAAGAG CCATAATCAC ACCAATTAAC AATTTTATCT TTCGAGTAGT TTCTATACTT ACTTCGCTAA A.AAA'rTCTTA GATTCA'ETT GTAT'NWLCA TTGGTTCG' AAATTTATTC AACTAAGACA AC'TATTTCTT 'rTGAGTAA.AL G'rCAAAATAA T'7ACACGAGC CCCAG-TGCA CT'rTTTCAA ACTACGITTrA ACGGTTTTTG CTGTACTTTG GATATTTATT TACTCCTTCT AATATCCTTT G'rATTCAAA'r GAACACCATA AACGTACGAA ATTCAAAATC TTTCAACGTA AAA.ACTATTA TACTAAATIA GATTCTGGAT TGTTCAGGAA TAATAATTCT TTN'T=GT AGCAGAGGAA AGAAAATTAT AACTGTAACA 'rGGCTATCTG AGCTAATCGT CCTAACGTAC TAATCTTTCT TTCTGCTAAC 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 TATGAACTGT TT~CGGGATCA ATAAATCTTG TACATCTG AGCAATCACA GGGTATATAG V 0 1TTTGAATGTA GGAGGTTATA AAGGTACTTC C~TCATAATAA C II rr?1r AACTGCCT GTAAAACTTC GACTGATCA ATTGAAGAAT AGAT7=r=I TATAACTTTC AGCATA'T*CA GTTGT'rGT'rC GGATAGTGGG TTGCAGGATA ACTTGTTCCA ATAAATAATA TTrCAAATAT TAGCTA'rCAA ATACTCTTTA CTGTTGAAAA TAAGACACTA CACGTGAAAG ATATTGTAGA ACGTATCTAT ATACCTAAAG ATTTTATCCT CACCCACTCC TGCTTCCA'rC AGCAAAC r TAAAAATACG CCTATAGATA AATTCAATTT mTACTrCCA TTCATCATAA AATTCAATAT T'rCCGCTCGA TCAGGATTCA TAT'rGAGATA ATATAGTAGA TAGTTTCTAG AAATTTTAG TCAAATATAT CTT'rCGAAAA GTAAACTrTGT AGATTTCGAT TTGAATT=r CCATAGTGAC CGCAAAAGAG TCAAGAGGAT GGTTATAGGT ATTTGATGGC 
TTCTATAT'rA GTATTAGTAA TTGCATAACT TGCCCATTCG TCGT'rTAACC
GAGTTATCAT
'rCTTCAAAGA
TCATTTGGGT
AG=IATCTG
TCTACTTTTT
GGGAGAGCAA
GTAGAT11TAT TA~rAACACG GTTITCG'N'AA GTAALAGTATC AG7TTCTAA CTACAGCAAT TTCTGCGAAA CTAATTTTCT TTTTTGTAGT TGATTATGT'r GATTTCTCTG GCT'rATTTTG CAAGTATCAG GAATATCATA CCATAATGTT TCTrATGTGC ATGGGGTTGA AATAGGTTTA ATCGAATATT CAAATCCTCC CTTCAGGATT TTCCCCTTGG AAAATCGACA AGCACAAACA TTGAAATAAG ATGTAAACPA CAGATGTAGT GTACTATTCT AATAr'rTACA GATGTGTAAT TTGAAGTAAC 'rrGTTTTCTT 'rCCTTAATTT TCTTCTACAC TTTrTCGAAAA ATAAATAGCG TTAGACTGCT GTGTGACTGT AGGTCTAAAT AATTATCAAT ATTCCTTrTGG CTTCAAGGAA 1014
CTAAGTCAAT
GAATATTTGA
TCACCCATTC
CTCC7'rGGGA GrTT7,GT CTTGTTrCAAC
GAAATCTT
CTAAAGCTGA
TTAATAATAG
ACAAAAGCCA GACCATTGA'r ATA'TTTTCGA GACTTCCCTC CCAACTTTCT GGTA'rTCAC AACAATAGAA ATGTCCAAAT TTCTrGCTCGT ATTTTTTCAA TAATTTT"CCT TGCATAGCAT ATCTAGCTGT TCTAGTCTAT TTCGATTC? TCTACTATTC ATTAAAATTA TAATCATTGA ATTGATAAAA TTATCTGATA CAAAACAA'rA AATGC'rGTAC AI-rM-'rTAGA TATCGTCTAA AGCACGGGAA GGCGCTTGTT CTTTTTTCTA TCAATACTAG CCCAAAATTC CAATAAATTG AGGAACATCA AT~rCTTGAG TTCAAGTATA TAAAAAGCCG T'rGT-rGATGA GArrTIAGAT ACCTTTTCTG CCTGTAA'rTG CAACCTCGGC AGAAATATTC AAACAGTCGC CATCATCATT ATCGATTAGG AAAGTTAAAT AGTCTCAATT TACTATGCGCT TTTGAAGCTT GCAAAAGT'rA GCCCGATATT cr'rTTTGKAA GTCTGATCAT AAATCTAATT ACCCAAATCG CTATTTTAAC rACCCACAG GCAATCNTTC 1-rCCCATTGT GAAACGAAGG GCrAGTATAG ATGTGATCTC 1015 CGAGAGCAGC ?ITAACCACT ATGGAAGGTC TGTAATACCA CTTrCGATAGG AGCTGGTGCT GAACAGCCAT AGCAACGTAA TTCCCATACC ACGTGAAGCA CAATGTAAAC AGGCGCT'rCA TCATGATGGC AGTATAGTTG TrTCTGACAA CTGCATTCCT TCATCI'CTG TCAAAGCTTT CAAAGCCTTG
CATCAAACAA
CCATAAATGT
TTTGAAT=r
CAACCGCAAC
CAATCTCACG
TGTCATTCAC
GCTC'rGGACC
TC-AAATTACC
TCAGACCTGC
CCGGGTACAA
CA.AACA'rAAC
GGACATATTA
'rGCGTAAAGT
ATCACAAGCA
CTCGTGGTGA
ACGTGTGTTG
TTCAAGTGTT
AAGGrrGAAG
ACGAGGGTCA
AACACTTrCCA
GTACATATCC
CTTGrrCGAC GCTTCCTTGC GCICTTCTGC TGTCATGATG TCGATTTTAT TTTCAATACC ATACAAACCA GGGrrCGCCA TTGGATCCAC 'rGAACGCAAC GGTACGCGCA CAAGTGGCGA ACGTACGA TAACCTGGAA CCAAACGTTT GTATGAGTTA TAAGCATGCT TGATCAAACC GCCTAGGAAA T'rGGATCAT TGGATCAAA GAAGGCGTTA CAGTGCATAC CTGATCCAGC AATACCAAAT CCGTGTN'GC GAGCAATGGT TTTAACAACA CGGACAACTT CATCGTA=~ AAAGTCAATC CTCGCTTCTA C~rCAAATCC CATTTTGGTC TCCGCAAGGT CAGTAGGTGC CAAGTCAAAG GGGTCCCCAT TTTCATCCAA CTTAAATAGG GATTTGAATC CAACTTCTTC CATGTGACGA CCCGCAALATG GTTCACCTC TGTTGTATAG TTTTCATCTC CCCAAGGGAA GACTGTCCAT GACTCATTGA TACGTACAAA ACCrTTCAATA AAGACCTTAT CTAACTGT'rC ATCTGTAGCA ATATCTGAGA ACATAAGACG AA'rAAAGGTA TCTGCAGCTG TGATTGGCAT AAGTTTTCTC CCGCGACCAA AAGGTGACTG TACTGAAGCA ACTGCACGAC CTACTTCAGT CTGACTAACC TATTTT'CT TAATGGCAGC GATATTATAA AGACGATCCA TGTCATTCAA GGAATACATG A'rCAACTCTT GATCTTCATA ATAACGAATC
TGAAGACTTG
TAGATATTTT
ACTTCCAAAA
TCAAGACGAG
CCAGCCCAAG
ACTGTTGGGT
TGGTAAGCTG
?1rcCCTCTG
TTTGGCTTCG
AGCTTA-AAGA
TCATGCTGTC
AAGACATTCA
TAGCCACCCT
AAGAATTCTG
AGAGCTCGTT
ACATCACAGA
GTATCCAAGT
GAAGATCCAT
GGAATT'rCGA
ACATTTTTTT
C'TTAATCTAT
AAACGCCCCT
GCTTTCTTGG
CCTTCAGAGA
CGACGATTI'C
TGACGCGCCG
CGGCGAAATT
AAAGACAAAC
AAAGAGACGA
9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 CGTTTTTCAT GGTTCCCAAA CCTTGACTTC ACCACGAATA GACTACTTGC GGTTGCCTAA GTTGGAGGAd TTCATTGTGA ATTTCGCTTC ACGTTCAGCA TATAATCTTT GArTTCAAC CTTCGTTTCG ATCGGGCNTG ATAGATCGGT CAACTTCATA CTTTT'rCCTT CAT'rTACAAT GTCAATTGAT AATGTTATAA
ACACTGCCGA
TTCC'r'rr
AATGTAACAT
TAGGAAAAAC
CTGTCTATTA
TATTTTTCTT
AGCCATATTT
TAGTCTAAAA
TTTCTCTAA
1016 ATACGATCAA TATCGTAATT TACGATAATT GCGACAAAAA CTCCCATAAA ACACGCACAA ACACGTACAA AA7TtGTCCA CCAC TGGAA TTGATAGGGT ATAGCTGCTA CACCACCAAT GTTAACATGC TGCAGATrGG GTATTTAATA AGAAGAAGAC GAACTCCCAA AATGAACACT GCACCAATTT GAAGACCT AACCCCTGCT TTGTTATTCA TGGC'TACATT AACAACTACC AAGCTCACCC AAAAGGCTrC CAAGGCATAG AGTCCACCGA TACTATTTCC CTCATCAAAA CTCTCCCTCA GGCTAAAAAC CCAGCCAAAA AAGCCAAAAA TCAAGAGAAC CGTTTCTAAT 11640 AATGATAAC 11700 TGTrCATAATG 11760 GTGGAAAAAG 11820 TAGAATACGC 11880 GGCTGTCAAA 11940 TAGAAAAACA 12000 GAAT'rTATAT 12060 TCGCTGArTT 12120 AGGAACTTTC 12180 TGCAAGTCCT 12240 AATCCATGAG 12300 GCA.A'ACCTG TTTTAAAGGT TCGCATACCA TTTAAAAT AACTCATAAT CTCAACTTTC TTATGAGTAA TAGTTGAGAG GAAGCGTTTT ATCCCTCTCT TCTTTGATTT ATTTATAAAA GGAAGAACCT TACCTTCAAG AAGTTCCATT AACTTGTCTG CACCCCCAAG GTTAATCGCT GATTTAACTC CTGG~rGTTT CACGATAGCG TCTGGGTTI'? CAAATACACC CATAGGTCCG GCTTCGTCAA ATTTGGCGAT AGATTTTGGA ACTGCTTCAC CITCAGTGTC ACGCACTTCA GAGTCAACTG GCAAGA'rCAA TT'rACCATTT AATTTGTCTr CTTCTACAAG TGAGTTACCG TAAGTCATCC CACCACCGAT AAGGACGT1'A CCGATCTTGT CTGAAACTTT TGAACCACCA TCAAC'rGCTTr Cr'rGGATGTA GGCAATTTCG TCAACGTTTG CTGAGATACC AACGTTAGAT TCGTT'rACGA AGATACCATC TCCAAGTGAT TTAGATTCTT TCTTGCCGTC AACAT CG GCGGCAGCTG AGTCACCACC ACCGATGATT 'rCCATCACAC CGATTGTACC AGCTTGGAAA TTCCATACGA CTGTTTTGGC ACCAGTCAAA CCCATGTCAA GACCAAGGAA GCCTTCAGAA GTGTAACCAG CAAATGCGTT AGCTTCTTTT GCTTrTTTCAA GAAGAGCTTT CGCAACATCC ATTTCGATAC CTTGTGCTTT GTAGAATGTG TCAGCTTTT'r CAAGCAAGTT TTCGATAACA AGGATAGCCA CGAATGGACC TTCTGGAGTT TTTTCAAGAA GGAAACCAGC AACTGCTTITT 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 AGrT'GAACT GGGATTTATC TATTTCCATT TTATCATAAA TATTTTAAGC AAAAGAAAAG TCTTrTT CTGTCAAGGC GATGCTCCAC CACCCGTACT CCATCTTCAA GAGCGTTGAT ACAACATCTT GACCAACTT'r CCTTTATCAGCTTC~TTCrTTT TGTTCGATGA TGTACTTAAT ACGCCAT=T TCAATGGTAC TGcc~CrCTT
TGCTCCCAAG
CACACGTCCA
AGTTGGAAGA
GTTrGAACTCA
GCGTGTGCAC
GCCCAGTATT
TAACGAGTGT
AATTCAGCAC
TCAGCTGCTA
AGGTGAGAGA
GCTGCTGTGA
GGTCAGCTGT ACCGAATGCA TACCAAGTTC AGGATCGTTT TTTCAACCAA GAGAACTTGT CACGAGTGAC ACCTGGGAAA CAGGAGCAAG TGAT'rTACCA AAAGAATT'GC ACGTCCACCT TACGGTTATC GTTAGTGATT ACACGAACGA GGACTTTr" ACCTTTCAAG TCAACGTCP TAACAGTAAG TTTrTGCCATG TTACAAAAAC TCCGG 13425
INFORMATION FOR SEQ ID NO: 152:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 905 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 152:
GATTrATCCT ACCGGnGAAT CGAATCATTG GCTTTCTGGC ATTCCAGT'rG CCATCGGTAT CTGCTrGAAA ATTATCAGGT GTTCCTAGCC TCCTCAAAGA TGGTTATGGA CAACCTTTAT GGAACCT'rAA GCGCCAGCTT GTCT'rGGTTC CTGGCCTCAG ATGTTGACTG GTTTTAAAAC GCAGG'rGCAA CTCTCATCGT 'rCACGCGTCT ATCATTTCAT CCAAATGCAG GAAACGCTGA ATCATCGCCT TCTTCTT TGC GATAAATATA AATAATGGCA TATCTAAAGC AGCTATCCCY
CTCAA
TTCCGGAGGG GTTCTAGCAG CAATCTTAGG AA'rCTATGAA CCATCCCTTT AAAGACTTTA A.AGAAAATGT v'rrGTACTTr GCTTCTGGGA ATCGGCTTAT TTTCCTACCC GATTGAATAC TTTrGTATTA TGGAGCTTTG CGGGAGCTAT TATCGGTACA ATCAACTCGA GAATCTGACC GAGACAAGAT TGATTrAGCT CATTTCTGGA TTAGGACTCT ATGCCTTAAA TTTTGTCGTT TCTTAACTTC GTCCTAGCAG GCGCACTATT CCCATCAAAT TrACTTTTGA TTTTGGGACT TTTTGATTTC TTGGGAACCT TCTrTCCGAT TTTTTCAAAA TTGA'rAGATT ATGCCTTAAA CATCGGTATC GTCCTATCAA GTACCCTT1T AAGTATCCAA 'rACACAGGAC ITrTCACTTGT GCTGGGAATC TGGCTTGGdTA TTTGGATGAG AAAAAAGTTA AAATCAAAAA AACATTGGTG CATCAGGGGA TTCAAATCAA TGCCCTAGAA
GGCCCTTGGC
CTATGCTCCT
TGGAATTGGT
CAACTACCAC
GATCTTAATT
CGGTTATGTC
TCAATTGGAG
GAACAAATCC
GGAGAGCTTC
INFORMATION FOR SEQ ID NO: 153:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4278 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 153:
CTTGAATTAA ATAAAAACG TCA'rGCGACT AAGCA'r"A CTGATAAGCT TGTTGATCCC AAAGATG'MC GTACGGCTAT CGAAATTCCA ACCTTACCC CAAGCGCCCA CAACAGCCAG CCTTGGAALAT TTGTGG'rGGT ACGTGAGAAA AATGCTGAAC TGG4CAAAGTT TCCAAT'rTTG AACAGGTATC ATCAGCCC? GTAACCATTG CCT7VrTTAC TTAGCCAAAC GTGCTCGTAA GATTGCCCGT GrTGGTGGTG CTAATAACTT CAACTTCAAT A'rTT'rAGAA AAATCTGCCA GCTGAGDTG CCCGTTACAG GTCAGCGACT ACCTAGCITCT CAATGCAGGT TTGGTTGCCA TGAACTTGGT AGCTTAGGT 180 AGATACGGAC 240 TTCTGAAGAG 300 TGAGCAACAA 360 TCTTGCATTG 420 ACAGACCAAG GAATTGGTTC GTT'TGGAAA TCGAAGACCG GAAAAATTGG AACCAAGCTA GAAGAAAAA.A TGACAGCAAT TTGGCTGACT TG'IrTAGCCT GCCCAGCATC CATTTGGGCC GACCGCGATG GCTACCCAAC TAACATTATT CTTGGTTT'TG ACAAATCAAA AGTTAATGAA TTTCCGCCCA GAACTCTTCA TCACAGTGGG TrATACAGAC CCGCTTGCCA GTAGATGAAA TCATCGAGAA AACATAGAAA TGAI-r'rACA GCAGAAGTAG AAAAACGCAA AGAAGACCTC TTTGGAAATC AATTCAGAAC GTGATGACAG CAAGGCTGA'r TGGTCCAGTA AAAGCCTTGG AGAAATTCCT TGAAATCGCA TAAGAATGTT GATAACTATG CAGGACATTT TGAG~rrGGT GATGGAGAAG AAGTTCTCGG AATCTTTGCC CATATGGATG TGGTGCCTGC TGGTAGCGGT TGGGACACAG ACCCTACAC ACCAACTATC AAAGATGGTC GCCTTTATGC GCGCGGGGCT
TCGGACGATA
GGTCTTCCAA
GCAGACATGG
CCAGATGCTG
TTTGCAGGAG
AATATGGTAC
A.AACTAGATG
AAATACAAGG
AATGGCGCAA
GACTACCTTG
ATTGCTCATG
AGGGTCCTAC AACAGCTTGT TACTATGG'rT TGAAAATCAT CAAAGAATTG CTTCTAAGAA AGTTCGCTTC ATCGTTGGAA CAGACGAAGA ATCAGGCTGG ACTACTACTT TGAGCACGTA GGACTTGCCA AACCAGATTT CGGMrCTCA AATrrCCAAT CATCAATGGT GAAAAAGGAA ATATCACCGA ATACCTCCAC AAAATACAGG TGTTGCCCGT CTTCACAGCT TTACAGGTGG 'rTTACGTGAA CAGAATCAGC AACAGCAGTC GTrCAGGTG ACTTGGCTCA CTTGCAAGCT CCTTGTTGC AGAACACAAA CTTAGAGGAG AACTCCAAGA AGAAGCTGGZ: TGACGATCAT TGGTAAATCA GCCCACGGTG CTATGCCTGC TTCAGGTG1'C 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 4 4 0 1500 1560 1620 1680 1740 1800 CTTACCT'rGC
ACATCGCAGG
TGGATGAAAA
CCTCrrCCTC AGCCAGTTTG GCTGCTGG TAAAATTCTC TTGAACGATC ATGAGGGTGA GATGGGTGCT CTTTCTATGA ATGCCGGCGT
TCCAGCCAAA
AAATCTTAAG
CTTCCACTTC
GATGAAACAA GTGCTGATAA CCAGAACAAA 'rCAAGTCAAT CACGGTCACA CGCCrCACTA TACCATTGCC CTCAACATCC GCTATCCAAA AGGAACAAGT CCTTGAAAAC TTGCCAGrrG =CTTTAG CC'TGTCTGAA TGTGCCAATG GAAGATCCAC TTGTGCAAAC CTTGTTGAAT 1019 ATCTATGAAA AACAAACTGG CTTTAAAGGT CATGAACAAG TCATCCGTGG TGGAACCTTT
GGTCGCTTGC
ATGCACCAAG
GCCGAAGCTA
ACT'rTr'r CGGA'rTCGCA
AAACTGCTCG
TTTACTGGAA
TAGAACGCGG AGTTGCCTAC GGTCCTATGr TCCCAGACTC CCAATGAAT'r TATCGCCTTG GATGATCTTT TCCGAGCAGC TrACGAATT GATCAAATAA AACGAThAAA GTCTGAGATC GGAGGGAAAG TAGATGTCTC AAATCGAAAG AATCAAACAG
GAATGCCAGC
CATCAATATC
AGATAAAAGT
TATACAGAGC GTGCATTGA ATCGGTCAGG CTCCGGCACT GGTGACCGCT TGCGGGACTG GCCTCTCTrT
TAAAACTCAA
GCTAGGTGTG
GATTGATACC
AGCAATTTAT
'rrATGCTTGG
GCTATCATGG
GCAGCGCCAA
GAAGCAGGCC
GATGAAGATA
CCAGGACATG
CCGCAGGTCT
CAAGCCTACI'
AAAGACTATC
ATGGCCAAAA
ACCATTTTAT
C
00 0 0 *000 0 0000 0000 0000 00 *0 00 0 CCTrTTACAA TTCAGGTTAT GCAAGTCGGG TGATCTTCCG TACAGGAATT~ GCCTGATA'T ATTTACAGGA GAAAATCAGT TGCCAGCCTA TTTTCCGC'rA ATCCTTGGTT TGAGGCAGAA AGTCAATGAA AATCAAAGAG TTGAAGTTGC AGATAAAACT GTCGTTTGAA GAGATTTTCG ACGCAT'NTrA CCAGCTTTAC TTTGCTG=T TGCCTATGGA TT'rCTACTTr CCTCGTACAG GTTTTGCAGA AAAATGGCAT CAGTTAACCC TCTTGATTGG GCAATATCCC GGGAAGGTAA CGGAGAGGGT GAAACACTAT GT'rCACCCAT CACCACGAAA TCAAATCTGG GTAGTGCCAG ATr'rGAAAAA AAGA.ATTAAA
CAAACTAGGA
GACGAAGTCG
AAGAGTATTA
CAAAACGACC
ATGAGT'rTGC AGCTCTTCAT T'rAGAAGGTG ATAGT'rATGA T'rAATTGGAC AAAGATCGCT GATCGGCGCG TGCTTGCCAA TTCCTATTAG GTGGCATGTT ACC'T7'tGAA GAACTTTAAA GGGTAGGAGA GTTTTAT'rT'AGATAATTC AGCTAGTCGT AGGCTrGCTCA AACTACAGCT GTAACATACG CACGGTAAGG CGACGCTGAC GAAGAAAAAG AATGAAAGAA ATAGCCTTTG AGCTTr'CTrT AGTGGATGTG AGAGAAGTGG CCCACAACCT ACCGC'rTAGT CAATTGGCTG TGCATTATAT TATTTGCAAA TCTGGAATGA AACAAGGTTA TAATGTTATC AATGTCCAGG ATTTTGCATT TCTCCTACI'T GGTGTGGACT 'IrA7TTTAA GAAAATTGALA AACATTTAAT 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 ATTTGCCTCG TGATGCTrT T'rCAGACTCC TAATCGTGGT ATACTAGGTC AGTATTTTAT AAATATGAAG GAGATTrTTA 'rOGCTAAAAA AGGTACCCTA ACAGGTTTGC TCCTGTTTGG AATATTTTTT GGTGCGGGGA ACT'rGATTTT TCCGCCTTCT ACATT'rTCTT CCTGCCATCG CAGGTTTTGT CTTTCAGGC CCTTATTATT GGAACGCTAA ATCCTAALAGG ATATATCTAC GCCTTGGTTT GCGACTCT'rT ACCTCTCAGT TCTTTACTTG CTAGGTCTC TATCTGGAGA GTTGGTATCG CCGTCTTGAC GAGATTTCAA CGAAGATAGC TCAATCGGTC CATTCTTTGC 1020 TACCCC.ACGT ACTGCTACAA CAGCTrACGA AGTAGGGATT AAATAAAGGA CTTGGCIGA TTGrATTTAC GG?1'CTGTAT TTCGCTTAAT CCATCAAAAA, TCTTAGACCG CATTGGACGT AATM'rr GTTATCTTGG TCGI'CTGGG AGCTATCAAA AGCTGCTTCA CTGCTrATCA AGCTTCTGCC TTTGGTACAG ACCTTGGACG CCCTTGCCTC AGTGGCCN'T AGCGTAATCG CTTGGATTT'r CAAGTAAGAA AGAATACA'FT TCAAC'TATTT GCCCTTGCCT TCAGCGCTCT TTACATCGGT TTAGGTTTTC CCAGCTGAAG CGATGAAGGG TGGAACACCA GGTGTTrACA GAAATCTTTG GCTCAACAGC TCAACTCTTC CTTGCAGCTA ACAACGACTG TTGGTTTGAT TGTGTCAACA GCTGAGTTCT ATCAGCTACA AGGTrTATGC GACAGCCTI'T ACCTTGATTG GGTCFGATG CGATTIATC INFORMATION FOR SEQ ID NO: 154: Ci) SEQUENCE CHARACTERISTICS: A) LENGTH: 1953 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear AGCCCCCTTT TGTCGGATGC ?I'TGCGGCAG CCTATTTGAT ATTTTA.ACGC CAGTCI'TGC TATGGTIGGAA. 
CAAGTCCTCA GT= CCTAGA AGGTTACAAT CAGTTCAAAC CTTGAAACAA GGGTTGTTGG TATCGTTGTT TTGGAAATCA TTTCCCAGTA TCTTGTCACA AGCCACTCAA TGGTTACCGT AACCTGCTTC TTAATGAGCG CTTCCCACAA GATTTGCTAT TGCCAATTTG 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4278 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 154: ACCCGATCAA ATGACAAAAG CCTTGCCC'N' AATATTGAAT AA.AAACGGAA GATGTGATTG TGAAAG'NTTr GTAAACTCAA ACCTGGTACA GATGCTACTA GATTGACGGA GGAAATACTT C1'CTGGTATC AACTN'TATCG TCCTTCTATC ATGCCTGGTG CTAACTTTGG TGTCGTAGGT ATGGCCGTAA TGGGTCGTAA CTCGTGGTTA CACAGTTGCT ATCTACAACC GTAGTAAAGA CTTGCCATCC TGAAAAGAAC TTTGTACCAA GCTATGACGT
TCGAAAAACC
TCCAAGCCCT
TCTACAAAGA
GTACTGGGGT
GACAAAAAGA
CAGAAGATGG
AAATGGTTCA
TGATGCAACA
TCGTCGTATC ATGCTGATGG TTCAAGCTGG TCTTCCACAC CTTGACAAGG GTGATATCTT TACCATCCGT CGTAATGAAG AATTGGCAAA TTCTGGTGGT GANAAAAGGTG CCCTTGAAGG AGCCTACGAA TTGGTTGCGG ATGTTCTTGA CAAACCATGT GTGACTTACA 'rCGGTCCTGA CAATGGTATr GAGTACGGTG ATATGCAAT CTTGCTAGGC CTTTCTGCAG AAGATATGC
AGAAATCTCA
TGGAGCTGGT
GATCGCAGAA
GCTAAAGCAC
CAkCTATGTGA
AGCTATGACT
1021 ACAAGGGTGA ATTAGACAGC TACTrGA'rTG AAATCACAGC TGAAATCTTT ACTGAGTGGA TGATATCTTG AGCCGTAAAG
TGATGCTGCA
TGTACCATTG
AGAACGTGTA
GGTAACAAGG
TCACTGATTA
CATGCTAGCA
CAAGGCTGAA 'rrGATTGAAA CGCACAAGGA TTTGCTCAAT TGCAGATATC GCATCTATCT GATTACAGAT GCTTACAACC CI'TGGATCTT ACTGCTAAGT AGCAGGTGTG CCAGTGCCAA AGCTGACCTT CCAGCTAACT CCAACGTAAA GACAAAGAAG TCAGCCATGG GGAAACGGAT AGTTTGGAAC TCCAGAAAGA CTCTCCATGG CTCTTCAGAC ATGATGGCTC AGGATTTTGC ATTTTAGATC ATTGGGAAGA TCATACATCT ATAAGCCAGT CGAGGTCGGG ACT'rCATTGA ACGATGAAGG CCAAGATGGA CCAATCGTAG GAACTGGTAA ATGGACTAGC CAATCATCTC CTGAGTCAGT G~orGCACGC TACATrTCAA AGG'GCTTCC AAAACCAGCT GCCTTCAACT AGATCCGTCA AGCCCTTrAC TTCTCAAAAA TGCGrGTAGC CTCTAAAGAA AACAACTGGA GGCGTGATGG CTGTATCATC CGTTCTCGTT GCGATGCAGA TCTTGCCAAC CTTCrTTGG ACCAACAAGC AGTACGTGAT ATCGTAGCTC CTT'rCTCAGC AGCTATTACr TACrTTTGATA ACTACATCC'r
TTGACCTTGG
CTTACAAAGA
TTGAAGGAGA
TCATTTCATA
ACTTGCCATT
TCTrGCAAAA ACGAGTACTr
TTGCGGTTCA
GCTACCGT'rC 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 TGATCCAAGC ACAACGTGAC TACTTTGGTG CTCACACTTA GAACCTTCCA CTACTCTTGG TATGACGAAA AATAAGTAGG TTTATTAC?1' GAGAAAGAAC GAAATCTAGC TCATTTTTTA GCAGTATCGG GTTGATC'rGG TAGAGGAGGO GCAAAAAGCC AGACTATGAT TTGATGTTAT TGAACGTTAA TCTGGGAGAT AGAAAAATTG AGCCGAACTA AACCTGCC C AGTCATCATG CTTGCAAGAA GAGCTGGAAG TTGTTCAGCG TTTTGCAGTT CCTTATCGAA AATCTGGTAG CGCGTATTTC GGCGATCTrC TCAACACTGC AGTCTGATGA AAGT'rCCAAG GACCTACCGC AATCTTAGGA TAGATGTrGA ACATCACACG GTTTATCGTG GTGAAGAGAT GATTGCTCTG ACACGCCGTG AGTATGACCT TTTGGCGACA CGG INFORMATION FOR SEQ ID NO: 155: SEQUENCE CHARACTERISTICS: A) LENGTH: 6474 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 155: CCGGCAGTAC ACGAGCTTGG GGAACAGCCA CTGGAACGAT GAGGTGTGAG CTCAAAATAT CCTCCAGTTA 74TTTTTr'CCT AATAGTATAC GCTTACAAAG AACCTACTTT CTATTAAACA TN-711-GI-AC AAATN'TATA AA~rAT'rGCC AGATCCCCCT TG=GAAACT TTATCTTTAT CAACAATTAG ATCACTTTGT TTTGTAAATA 1022 CGGAAGAGTG AAAGGATT= ATAATGGAGC GTATACTATG AAAATGTGAA AA'rrAACA'r T'MAATAT CAA'rAGTTAA TCTCTTATCC AAGCT'rCAAG GCCCCTATCC CATCTATTTC GTTCAAAATT -C9-r'ICAATA ArrACGTTAT CTAAAATTTT TATACCGACA ATCAATAGTT TGTAATTATC TGAA'rTATAT TTCATCCCCA GT'TCCAATAT TTGTTTAACT ATGAACTGTT CTA'rCACTGG GGCATT'rATT TCTATAAATr GACAATATCC TCCAAATCCA AAAGAAGGAT AACAAACACC TTTATTACA ACI'?CAGCAT
CTATACTAAC
CAT'rAATTAT
ATTTATATAT
GIrAAATTT GGI-rCATATA ACTTAAAATA GCTGACTCTT TCCTACTATC TTTGGCTTTC TTCTATTTGT GTTTGAAATA TCAATCGCTT C=TTAA TTGTTTAGTA TCTTT1GGGAA TATATAAAA AT'rTCCAAT'r CTTGGATCTA 'rTAAGCTTCT CCTCTCAGCA AAAGAA'rCTA GTTCATTAAA AAAGCAACAC CGAGAGCTAA GAATGTGTTA GAAAAAAGCT 'rAATTGCT'rC TGCTTCAGTA GGAGAAACTA ACATAACATT TTAATATITG GCAGTACI'AT TTCAACTGTC TCATCTCCAA ACCTTCTCTC AAAAATTCAG TTTG'I'TGAA AAGCCGATCG CCTCAGAATC TTCGATACCG ATCATAATCT GTCGGAAGCG TGTATCAAAA AAAGTAATAT TTTAGATGGA AGAATGCCCT AACAACTTA TATTTAGATG AATTACTGCT ATATT~CATAA AGCTGCCCCA TTCCTTAAAT CCAGATTATC AAAATCTCAG TT~lrTCA CA AGTTCCTCTT CATAC'TACTA TCACTTACAT TTCAAGCGCC Cr'rCCAATAC GAGTACTAAT CCAAAGGAAC AACTCTGCAA T'rTTTCTTCC CAACTATGCG ACTTGGATAT AAATT'ATCAT ATATAGAACA GGACAAAAAT GATATTTTT GTATCAAACA GCCTTTTTAA GAACTGTTGA CT'rTAAAATA ATCTTTCCA'r 'AGcGrT'AC TNTGTTCGAT TTCATATGTA TTAAAACTAC CAA'IrrCTC CAATAATATA ATAATCAATA TTATTTTT'AA TTTCAGAAAA TTAAGTTATT CTCGCAAAAA AACTTCATAA GCTCTTCATT TTTTAAATT A'rTTAT'rTTT ACAGAATCTA TATCATATGC CAAA'rAGTAA CGCGTAGGCC AGCCCAACAT GCCCCAAACC AACTACTTCC TTATTTCTTA ATCCAAAATC TAATAGAATA ACAACTCTTT AATATTGTTT AAAAGTTTT CAACTGATTT ATTTATAGCA CAATATTGAT GATATTCTAT CAATATAATT 780 840 900 960 1020 108,0 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860
GATACTTTT
ATGGGAAGTC
TATTCACAAA
TAATT=A
CTCATAATAT
AACATGAGCA
CTTTTCCCA 'rATAACTAAC ATTACTTTAT AACGCATAAA ACATGGTCAC CAAGTGAAAG CGGACAATAT ACGACACATT TGTCGTCTAA ATGCATTAAC AGCTCrTTTA TGATATCATT CTTTAATGTG TCCTCATTTT TTAATTCACT ATAGATATGA CGGTATAGAA AATTGCCATT TCTATCTTTC CTATAGAGAC ATTCATAGTA CGATAAGTGr CTAAAATCAC ATTGTAGACG 7rTCACAACCT AACCTGTCTT CTTTCTTCCT TTCTCAATC GGATATTTCC CAACCI-rACA
CAACTTATGA
TATAGTAATA
GGAAAAAAT
AATTGCTTAG
ACAAGTACAA
GCGTCATCTA
CAGAGGGCTG TAGCTTGG CTCAAAGCCT AACCAGAAAA T~rCTCCTTC TIGAAGTTAAT TTTGAAATAT AATCACCACA AATGTGWAGA TAAAAAGATA TACTTAGTAT TGTTACTCA7 AACCATTCCC TCTACAATPr TCAAAAAAAT TCACCTTATT TATTTCAATA TCCTTTCAAT TTTACAATCT TATTAAAAAA CTGTTGTATC GAAAAATATA AATAAATGAG TTGTACCCGG AACTCACAGT TCTTGTG ATCTAAAAAC TCACTAAGTG CTTAATAA'rG AATATTTCGT ATCATCCTCT AAATTCTCCT GATAAGCTCT TTATCTCTAA
TCTGATTAAA
TAAATAAACA
CAATATrTG
TTCCACATCA
TA'rATATAAA
TATCACCCCA
TAAAATAATT
GACACTATT'r
AAATCTT
TTTACTAATG
ATGTTATCAA
TTATCG'I-TA
AATTAAATAT TTTCA'rACAA ITTGAATATT TAAACAACTA GAACACTATC TTGAAACCTC GATCTGATAT 'TTrTAGAC GAATGCCAAA ACI'ATGCAAA TT'rGGGGTGC AGTTCATAAG TAGTCAAATA ACAGATACCT TCTTCAGTAA T'TACCTCATA ATTTCAACAA TCTCAGACAT TT'rATATGGA TATCTAGGAT ATGAACTGCA CCCCAAAACT TAGACAGAAT AAATCTAACT ATTGGGATAT TT'rTTrTTAG CTAGAACTAG TAGAAATATA TAAGGGTTrTC TCATCTACAT AAAAAAATGA TACTTrC 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 AGCTTCACAA TAGAATCTCA TGTTTCCCTC CCCTATAT'rC 'TAAATAAAA TCCTTTGGAA A=TGATATAT CTTAGTAAAA TATTGTTTAA GTTCCGCATG CGGAGCATGG GTAACAATAA TGACAGTCAA ATCCTCTCTA TCTAATATCT TACGTTCAAT CGCTAACGAA GTrCTCCTAT CGATAGCAGA AGTTCCCTCG TCAATTAATA CTATTTT'c~r ATTTCTAATT AGCCCTCTAG *CTAAAGThAT ?TTTTGTTTC TGCCCTCCTC ACAGTAATCT CCCATCATCA CCAACATAAT *AATCTAAAAT GTTATTAGGA AAATCT=TA CACTCAAACC AACTTGCTCT AAAGACTGTA GTA'N'TCTT.C ATCAGTATAA TTTTCTTCCA ATAAAATATT ATCTCTAATC GTACCTTCAA *ACAAATAAGC 7'rTTTGATCT ACATATAGAA CATTCGAAAC CATATTTAAA TAGGACGTTr ***?TTTATATC ATCCCCGCAG AATCGCAATT CTCCACTATA ATCTCTCAAA AAGCCATTCA ATAATTTTAA TAATG'rAGAT 'rrCCCCCTTC CACTTTCACC TAAAATTAAA TACTNTCAT TACGT'rGAAA ACAAAAATTT AAGTPTTTTA ATATrrCTI-r ATCTCCATAC TTATACCAAA TA7IN'TTGC TTCATATAAC CGAAAATCTC TATTCACCTC ATTTGGTTCG ATATCAT'rCA TTTTATTTGA CTCAAT=GA TTAATTGAAT ACAAT=rAA AAAAATAGGC TTCGTACCAA TAATAGAGGA TAATTCACCT CCTAATTCAC CTCC 'ATTGC TTCAATAGTA CCAATTCA AAAAAACGAG AGATATCTGA AAAAAA.ATAT 1'TTCTACAGT TGTCTTTCTT TGTATAACCA TCTTAGGCAA TACATATAAA AGATTCAAGG TCTCACTAGA TTTAAAAAA GCTTCATTTT T'rMCGATGC AAAGAITTTTT GGTACAAGTA CAGTCAATGA CCAA'rGATAG TGATTAAGAG CTTTTATTAC TAAAAAAAGT TGTTAAACG TrAGCCACGA AAGATATGTT CCTGATGATT AGATGTCTG'r GGCAACTCTA TTTCGAA'rC' CATAATTTTT CACTACCCAG TCAAGGAATA 'ErGACA.AT'rT CAAAAACCGC 'rCTAAATTCA TAATAGTTGC TGCATAAA'rT AGCAATAATT TAAATTTTT' TATATTTGAT TTTATAATAG T'rAATTGAAC ATACTTTTCA TATATACAAT ATT'rCAAAAA AGAAATTCGT TAAAAATT AGCCTTGCAT TTCTCTCCTA TTAATAGTAA TGTAACCAAT ATTCTCTCGT 
TGAAACTCAA A'rATTCTTCA TCTGCATTAA TTGTT'TTTTG GTGAGCGTAG TAATTCACCA AGAATTCTCG TAGTTNTAAT AAAACAACAC CGTCCTGATG TTCTAGCACC TCTTrTAC ACAACCAAAA AACAAACAAG GGATrrCCAA CAAAATCGGT AATAT'rATT CCATAACTAC AAACTAAAGC AGCCATCAAC rTrAAAATTC GATCTGGCAT ATACTTACCT CTAGAA='T TGATACCTAT ATTACTTAAA ATTAACTTAA TTCTA'rGAr'r TCTTAAATTT ACATTACTAT AATTATCATC TTGATGTAAA CAACTTTTCA CAGCTCTAAT 1024 CTAGCGCTGT AAAAATAACA CCTGTTAGTG CTATTCCTTT TATTGCAAGA TAGCCTGI'A TGAGAAACAA GCTAATAGCG CCTGCTAACG TCTTTAATAA AArTCCTGCT TCTTTAATr ACGCTAACAC ATCAAATCCA TTCAATATAG GGTTAGTTAA AT"TTAGACTA ACTTCTCCCA GCATAATCAT TAATGAAAAC AAGGTGGCTA TCACAACTGC A.AATATAGTA CCAGAAATTC CCTGATCATT TAAAGTCTGA ACATCATTAT TACTATGAAA rrCTTCATAG GTAGAGTTAG CAGArrAAA CTCTTGGATC ACTTCAACCT TTATCCCACA CCAGACAATC ATTTGGTAGA TCGCAATTAA TTCATTCAAC ACCAGAGCAT GACCAGCAAC AATAAATATC GTTAATAAAC TATACACAAT AGTTTCTCAC '!PrCAAATT AGAAAAAACC AAAATGATAT AATAACATAT TTCTTCTCTT GCCTTCTTGA TTACTTrrAA CCGCTTTATG rI-rAAAGAAT AATATTTCTT TAAATTAAAA TATTTCCTAC AGTAA'rTATA TGTCACTCCA GTGATACCGT TTTCTTTACT CACTATATCA ATTTGGTATC CTTGAACAAG TGAATCTA'rT TTCTCAAAAC CATTAATTAA TGACGTACCT GCTATATTGT GAACCATTTG CrTTCTCCTrCT TCTCGTGTAC CATTT!GGATA TAAATTCrTC ATTCTACTCT TT=AAAACA ATATTCATCA TCATCGTCTA AAAATGATAT GTrrCTGGCA TTAG7?rGCAC CTAAATCTTC GGTATAGCCA AATTGATGGA TAATTTTATT AATAATTATA ACTTCGATAT TTTTATAACT CAGAGATTCA TACCTATTAT GTGTTGGTAT 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 1025 TATAATACTT ACTAATTCTr GATCTATATT CCTATCCATG CATCATATAC TCTCATGGTT 'rCTACAAACA TTTTTPGCAC TTGATrTACT ATTCTCACCT ATATATTCA AATACTCAGA CACAAGCACA CACTCCCTCA ACATCTTCCT TCTCAAATAA CAATAA'"C ACTTAACCCG CCAACATTAC 'rAGCTAAAAC ACTACTCTrC
AGAAAAATGT
ATCATTGAGT
AAATCCATCA
CGGAG'IrCCT
TCTAATAATT
TTTrTATTT
AAAAAATTAG
ACCCTATG'rr
TGTGACATTG
ACTCTAAAAC ACACATAGGT AT'rCCTTCTG TTrTGGTAAAC TATAGTAGCT GGATAGATTT TATCAGAAGG AATATACAAT AAATCCGATA CACCAAGTAA CCTGAAAT'rA TCTCTACATr CCCACATACT ACCATT'rCCA GCCATAATAA TTTCTGCAAA TTCAAGGAAT CTATCCGGCC a.
0 a. a a TCAAATGGCA AATTTTTCT AAATCACATC 'rrCTCTGACT TTTTTTCTGG ATCCAACCTT TTCT'rTTTTA AAGTCTTGAA GT'rATTTATT GCAACTATCT TTTCAGATAC ACAAATAAAA AGAATTI!CTT TTTTAAGTTA.
TTCAAAGCAG
AAAAATAATT
CCAACATAAC
CACCTACTAC
TCTTATTTTT
AAATGATr ATC1'AAATCG
TATTATACTC
GCATCTCCCA TAGAATATGT TATTCAACCC ATCCATGGCA TTGTTATTTG GAATACAAAA CTATTTGATA CATTAATTCC TCCAATCTTr TTTTrCATAG CCAA.AAATCA AAATAAGTCA TGTTATCACT GTCTTAACCT AAAATAATTA GTTGAGTAGC ATTTTTAACG TGTAAGGCAG AATATCTAAA TTTTTrATAAT AGGGAATAGA GACA TTCCAAATCC AT'rCTTGTCA AGTTTTTTTA ACATATATAA CATGACAGTG 'rATAAGTTGG ATTTTTAATA ATTTAAAXAT TTTCAAAATr ATTTGAACAT TGAGTACAAT CAACATAGGC CATCAATAAC CTTTGAATCT CTAGATACAA TTATCAAAAT INFORMATION FOR SEQ ID NO: 156: SEQUENCE CHARACT'ERISTICS: LENGTH: 4792 base pairs B) TYPE: nucleic acid STRLANDEDNESS: double (D TOPOLOGY: linear 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6474 120 180 240 300 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 156: TATTTAACGA TTTTTTTCAT GTCATTTCCT CCAAAATAGA ATACCTTATA ATCTrAACAG AAAAAGAGCA TTTACGCCAT TATATGATAT CTATCTCTGT GATAAGTTTT TTTTATGGGT AATTTAAAAG ACCAAACGCA AGATGGCAAT CAAGACCACT CCAAAGAGAA CTGTTCCGAC TAGATTGCGG TAGCGAAAGG CTACCCAAGC TGTTGGAAAG ACGGCTAAGA AGTCCAGTCA TTTGATTTGA GGAAGACTGC CAACC2-rACC TGTCACTACG CTTGAAAGAA TCAGGGCAAA 4 0 *9 0 00. 0 0* *0 0 0 0 00 *0 9 0 0 *0 44 0 0 GATAATGGAA ACAG3GCAAAA CTTGACCAAG ATGAAGGGAA TGCTAATAAA AGATACrrAC TCGCAAACAG AACAGCTAGT CAACAACTGC TAGGATAATG ATrGCGAAGC AAAATACCAA TGGATTTGG'r AGCAGGCCAC ATAGCTGTTA AGAT1TGr'rC TrCACCCATC AAAACGCCAT CCAAAGACTG GTATGACGGA GrGATTAGA AAAACCGTCA CAACATGGCA AACTGGGCAC AGGTGTCACA TAGGGCGCAC AGCCGTTGGC ATGGCTGCCT CATATTGTCT TAATAATACT TTGCTCAAAA CACTGTT'rrG T'rGAGGTT GTGGATAGAA TAGAACTGAC GAAGTCAGTA CAGTTTCGAA GAGTACAAGT GCAGATTGAA ATAAGATGAG TTI'TAGCAAT TGTTTCGTAC AAAAAAGTCT GT'rGA'N'TTG CTAGACATAC CGATATTGTA AGGTCTCATC 'rGTCAAGATA AATAAGTCGA 'rGCGrGTAAA TAGCAATAGC TGCCACAGGA TCCCAGCATA AACAAAGAGA CGATAATTCC ACAGGCCAGG GCGCCCCCTC CTAAAATCCT CAATGAAAAT CAAAGAGCAA AGGTTGCAGA TAGAACTGAT
CTCAACAAAA
GCTTGAACCA
CTCATCAAGC
CCGATACTGA
TT'TTCr'CA
ACTAGGAAAC
GAAGTCAGCT
AGAGACGCAA
CAA'rCAGTGC CCAT'2TCAAC
CATAGCCAAG
TCTTTCTCCT
TAGCCGCAGG
CAAAACACTG
AGGTTGTGGA
GGTTTGAAGA
CTATTATATA
TCTATAAATG
CAGAAAAGAG
CAAACATATA
1026 ACTCAAAAA ACCCAACA ATCGCACGCA GGCCCTTATA TCATACGGGG AATCCAAGTC ACCAAGCCAG AGAAAATAAC TGACCATCTA AAACCACCCC CATGCTACAA CCAAGTAGCG GACTGAGACA TCACTGTCAA GAGCAAAAAG AAGGACACCG AGCAGATTGC GGACAGGAAT CCGTCIGC ATAATCTGAA TAAACATCCC AACCAGGGCA AAATCCAAGC CAAAGATTTC CCAGAGCCGT TCCGACI'ACT GTCCCCACAA ACCAAGCCAC CGTGCATCCA CATAGGATTT ACCTTGTCTG TATCGCCAA 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100
CTCACGAAGT
ACCATACCTA
AGGCTGAAAA
AACAAATCGA
TAT'TTAGAT
ACCATCATAA
CAGCTCAAAA CACCGTTTTG CGGCAAAGTG AAGCTGACGT GAATCCAACC ACAGCATGGA 7TrGGGAALAGT AAAATTAAT TCAGTCTATT~ ATAACACATT AAAGACTGGC AATCCAGTCT 00 90 0 9 *0~0
0* 00 0 0 1-rATAGAAAT TCTCCACTAA ATACTTTCAC GAAGAAATAG CAATAAAACA AAGCTAACTG CCAAAGCTAT TGTAAATAAG CTAGGTAAAA CTGCTCCCAT TGACTCCTCA GGTA~rrGTT GAACACCTGC GAAAAAGGCA TCCAAGGTAC TGGCAAATCC TACTAGAAGA AGCAACTGGA TGGAAATGGT ATGTTGCAGA TAGCCACTTA CTAG'TGTGGC TAACAAGGCA AGAGATTGAC GAA'rATTCAG AAGCATAACA AAGGCAACTA CCAGAGTTCC AAAGCTAGTA CCAATGGTTA CAACCGTAAT GGCACCGATA GAGGATTGAA 'rAAAAACGAG TTCTTGCAAT CTAGGAGAGA TAAAGATGAG AATCCAGTCA AAACGAACTG TGACAAGTGA CGCATAGAGA GCTGTTT1TTA CAAGGCTTCC GACAATCAGG GCTGATAATT CAGMGTAA ATTCAAAAAG GGCTGGTTCC TTAAAAATAG AGTGGAAATA CGAACCGTAA CATTTATCAC TGCTTGACTA TAAACAAAAC CAAGAGCACC TTATTCATAT TCCATATCAA TTCGATGAT GCTGGCAAAA GGA=rTACA GAGAGTCCTI' CTTGATAGCT AATCG7rTT AGAGGTCAGT T'rTATGAAG AGGATACCTA AAAATGCGAT TAAAAAGGTA GTAAGGAAAT AAACTGGATG GATAGAATGC CTAGTAAGAC TCCTCCTAGG TTG71=rCAC TAAACTAACA GTTGACTGTT TAAAGCCAAT AGCTTCTGCC CCCCAATAAT TCTAATG.AAA ATCGGAGTGA GCATGGCGCC TGAAAAATAA
GTAGAGATAA
TGGAGCAAAT
TCTACTTTCA
AGAGCGTTCA
ATATTACTGA
AGATGGTCTT
CTCAATGTGT
GACTGCCCTG
ACTGTGTATT
CAGACAAGAG
AAAGTGATAA
TCAAGACACG
GCAGGGTTTC
CATAAT'rCAG TAGTTAAT'rG
TTCAA.ATGGG
GTTAATCAGA CAAATAAATG AGACACTATA GAGTAAAGCA ATGATGTTGA AAATCCGCCA TGAAATCGTG ATCAGTAAPA GAAGGCCAGA TAAAAAATCG CCTAAAATCT CTAT'rTTGAA AA'rCTTTCCC CAAGGATAGA CTACTAGCAA CAAGGAGAAA AAAATTTGC AAAACTAATG AAACTCCCAG AAAGATTTGT ACAACTTCGG TCGCCAAAGGC GGCAAAAGAT CCATCTGCCA TATCCCCAAG CGTTGAAATC CACTGGTTGA GAAATACTTT CATCACAACT CCTTCT'rAAG CCGCGATACT ACTAACAACC AAAATTACAG TAACATCAAA AGCTGACCAA TGCCATTGTA TGACTTTGTC ATTCTAAATA AGACTGCAAA GTCTGTCAAG ACCCAACCGT GATTACTAAT AAGTACATCT AGCCCCCATT TCNTCCTTT A.ATCATGTCT TCAAAAATGG GACCTrGCAAT TGTAGCCCCT GTAAAAGTCG GAGCAGCATG
GACTATATGC
TATAAGACCT
GTGCGAGACC
TTCCAGAGCA
CACAGGATAA
TTGATAAGAA
AGTCCAATAG GCCAATAAAT CCACCCATAT AGAAGACAAA CCAAATAAAA CAGCGGAACC GTCATCACTA ATCCACGATA AAAAAATACA TCAAAAATGC ATT'rCATTTC GAGTAGGTGG AGCAGAGCTA GGAAGGAATA 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 GAAAAGAAAA AAGG'rAACGA AATTCCAAAC AACAAAAGCA GAAAAGATAG GATCCTTTAA ACTTTCTACT ATTGATN'TC CATAGCAATA ACAGCAAATA AAACCACAAG AAAATTCAAC GGCAAAGTCA GATAGCCCAG TAACAAGGTC GCTGCGTAAA
TGCCATTTCC
ATCATATCCG
ACTAGAACAC
CCGACCAAAT
ACAGATAATA
TGAACTTCTG
GTCAGCAATA ACTAGTAGA.A AAACTATAAT AAACTAGCGG TGTGAGATTA TC7"=TCAT ATATCACCTr TCTAATATCC AAATACCAAT AAAGTAACAA TGAGTAAGAA ACTATTCCAT GAAGCATGCA GAGCTATAGC CCAATAGATG GATCGGGTGT AGCGAAACAT CATACAAAAT ATCAAGCCCA TTCCAAAATA CTT1TATGAAA TCTG;TCGTTA TCCAACCATA CTGCAAAACA TGCATAGCGC CAAATATGGC AGCGGAAACA AGAACATCAA GATAGTATCI' CTTAACTTTA 1028 GATAAACTTG TCATCAAAAG ACCACGACAA ACAACCTCTT CTGATACAGG TGCGATAATA CTAGTATAAA GTATTCGCGT AACAAAATAG CTAATTCCTG TrAAATITGGT GGCTACTrCT ACGACTGTAC TrCCATrCTG GGTACGAGGA AAGATATAGG TTGTTAGATr TGCCCACACG AACAATAAGA AAAAAGAAAG AAGGAAAACA CCCAGGTAAG ACCAACGAAA CTGGAAACGA CCACACTCTT TCCAATGTTC ACTTTTGACA AAAGCAATTG TAGCTATAGT TCCCAGAATA AGTACCAATA AAACTTGGAA CACATAGTAC ATATTrATCAG ACAAAGCAAC CATAAAATCT AAGTCTGATG TGACATTAAA AATGAGGTAA TAAGTCAAAA TCAACAAGCC AGTrCCTAGG TGAAATTTCA CTrCT'rTCAT TTTCTTCATC CTATTATCTC CTATAAGAGC CTATCTTCTA 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4792 CGGCGGCCAA ACAATCCATC CTCATCCCTT GATTGCCCCA CTTCTTCCAC CACCTCCCAT GCTTCTGTCA ATTCCATTGT ATTATCTTCA TTTGTAAACC ACAGTATAAC ACGAACA'N'G AGAAAT'rTCA GATTTGCGAT TACTAATGAG GAGTGGAGAA TGCTAAATCT ATAGTCCAAT CAAAAGCTCC ACGATTAGGA ACCAGGGTAA ATTCCTGGGA CGCCCCAACC AGATATACCA AGAATTTACG AGGTTGCCTC CTCTAACATC TrGCAACTCA TTCTGCAAAT TGTAAATTTA ACATCTTTTA CACTC CTT CA ACTTCTGCGA CCTAGGATTT GCTTCAAGTG CTTTACAAGT G CTT ATTTTA GAAAATCGCA TATTTGATAT TTTTTCTTAT TrTGGTGAAT TTGATTACTT CTCTGGTATA ATAAAGTTAC ATATGAAGAA ACAAATTTTA ACATTATTGA AA INFORMATION FOR SEQ ID NO: 157: SEQUENCE CHARACTERISTICS: A) LENGTH: 2156 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 157: CCGTTCTCGG CGACGGCCAT CTGATGAAGC TATTTATGAG GGAAACTGGC AAGCTGGAGA GTCAGAGTAT CTAGTCTTTC ACCGATTGCT GTGGCAGCAG ATGTGCAGGG AAAAGGAGTT GCTCAAACCT TCTTAGAGGG CTTGATTGAA GGTTTTGATT ATCTTGAFT TCGCI'CAGAT ACGCATGCTG AAAACAAGGT TATGCAACAT ATTmTGAAA AACTTGGTTT TAAACAAGTC GGTAAGATGC CAGTAGATGG CGAACGCTTG GCCTATCAAG AATTAAAGAA ATAATGCAAA AGAAGTATGT 
AAAAATCCTC TACTCCTCAC CAATTGGTAT TCTATCACTT GTAGCTGATG
ACCATTAI'TT GTATGGAATT TGGGTPTCAGG AGCAGAAGCA TTTTGAGAGG GOACTAGGAG
ATGAAACGAT AGAAGAAGTT GTTAGTCATC CTATTTTAGA CCCAGTTATT GCTTGCTTAG
ATGATTACTT TAAAGGCAAG CCTCAGGATT TATCCAACTT GC'rCTTG~GCG CCAATCGGAA 540
CGAATmTGA AAAGAGAG?1' TGGGACTATT TACAGGGCAT TCCTTATGGT CAGACAGTGA 600
CCTATGGACA AATTGCTCAA GACCTGCAAC TGGCTTCTGC TCAAGCAATT GGTGGAGCAG 660
TGGGACGCAA TCCTTGGTCT ATCCTAGTAC CTTGTCATCG TGTG rGGA GCAGGCAAGC 720
GTCTGACAGG TTATGCTGCA GGAGTGGAAA AGAAAGCTTG GCTCTTGGAG CATGAAGGAG 780
TAGATTTTAA AGATACAAGC AATAGAAGGA GAACCACATC TTAGAATTTA TCGAATACCC 840
CAAATGTTCA ACrrGTAAAA AAGCAAAACA AGAATTAAAT CAATTAGGTG TGGACTATAA 900
AGCCGTCCAT ATCGTGGAAG AAACACCTAG CCAAGAAGTC ATTTTGAATT GGCTAGAAAC 960
CTCAGGATTT GAATTGAAGC AATTTTTMCAA CACCACTGGT ATCAAATACC GTGAATTAGG 1020
GCTAAAAGAT AAGGTAGGAA GTTTGTCAAA CCAAGAAGCG GCTGAGTGC TAGCAAGTGA 1080
CGGTATGTTG TTAAAACGGC CCATTTTAGT AGAAAATGGA ACTGTTAAGC AAATCGG1-rA 1140
TCGAAAATCT TATGAGGAAC TGGGACTGAA ATAGTTTTTA TCTATCTCTT TGATAGATAA 1200
AATATATAAC TTCCCTGTT'r CAAAGTA'rGA TAAACTAGTA GGTAGACAAA GTCTGTA'rCT 1260
GACCGTAGCA AATAATTTCA TTGACGGCAG A.ACCATGGTA GCATGAATCA TTATCAGAAG 1320
AGGATGTTTr TATGAATGTT ACAACGATTT TAGCATCAGA TTGGTACCAA AACTTGATGC 1380
AATTGATTCC GGATGGCAAG CTGTTTAGCC TACCTCGGT C1'TTGATGGA ATCCCTACAA 1440
TTGTCCAACA ACT'rCCAACA ACAATTATGT TGACAATTGG TGGTGCCCTT TTTGGCTTGG 1500
7TTTGCGCT TCTTiTTGCCC ATTGTGAAGA TCAATCGTGT CAAGATT'rTA TATCCCTrGC 1560
AGGCCTTCTT TGTTAGTTTC TTAAAAGGGA CACcGA'N'TT GGTGCAACTC ATGTTGACCT 1620
ACTACGGAAT CCCTTGCT TTGAAAGCCC TCAA'rCAGCA ATGGGGAACT GGTCTCAATA 1680
TCAATGCGAT TCCAGCTGCA GCTTTTGCGA TTGTCGCCTT TGCCTTTAAT GAGGCAGCTT 1740
ATGCTAGTGA AACCATTCGT GCAGCCATTC TCTCAGTTAA TCCTGGTGAG ATTGAGCCCC 1800
CACGCACTCT GGGTATGACC CGAGCGCAAG T'ITATCGACG AGTGA'rTATT CCTAATGCAG 1860
CGGTGGTAGC TACTCCAACC TTGATTAATT CCCTCATCC TTTGACCAAG GGAACATCTC 1920
TAGCTTTTAG TGCGGGTGTT GTGGAAGCTr TTGCCCAAGC TCAGA'rTCTA GG'rGGAGCTC 1980
ATTATCGCTA TTTTGAACGC TTCATCTCCG rrGCCCTTGT TTATTGGGTA GTCAATATCG 2040
GAAqlrGAAAG CCTCCGTCGT TTCATCGAGA GAAAAATGGC TATTTCTGCA CCTGATACAG 2100
TGCAACAGAT GTGAAAGGAG ACCTTCGTTA ATGATTAAGA TTTCGAATTT AAGCAA 2156
INFORMATION FOR SEQ ID NO: 158:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3140 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 158:
GTATCTCTAC ACATGCTC AATCGATTITr GTTGTCCTCC AATT1'AATTC CTI'ATATGCT
TTGTCTGCAT TTGCATAACA AGTTGCAACG TCTCCTGAAC GTCTTGGAAC TATTrTATAA 120
GGAATAGGGA TCI'TATTAAC ACTTTCAAAT GTATTTACAA GTTGTAATAC ACTAGTGCCT 180
TCTCCCGAGC CTAGGTTATA GATATAAACA TCTGTrr'rTT CAGATACTTT TTCTAAAGCT 240
TTTATATGTC CTATTGCTAA A'rCTACTACA TGGATATAAT CACGCACACC AGTACCATCA 300
AGCGTATCAT AATCATTTCC GAACACACTT AGCTCTGA'rA GCTTACCTAC CGCTACTTGT 360
GCAATATAAG GCATCAAGTT GTTAGGAATT CCTGAGGGAT CTTCCCCAAT CAAACCAGAC 420
TCATGACCAC CAATTGGATT GAAATAACGA ACCAACGCAA TACTCCATTC TGAATCTGCC 480
ACATGAACAT CTTTTAAAAT TTG.CTCAAGC ATCAC'rTTCG TATACCCATA AGGATTTCTC 540
GCACTTGr'rT GCATCGTCTC AATTAGAGGT GACTGATTGT TAATTCCATA TACAGTCGCA 600
CTTGAAGAAA AGACAATCT'r 'ITAACATTA AATTCTGACA TCACTTCAAC AAGTGCCAAT 660
GTACTCATAA TATTATTTTT GTAGTACATC ACAGGCTTTT GCACGGATTC TCCGACAGCT 720
TTATAACCTG CAAAATGAAT TGCAGCATCA ATCGATTCTT GTTCAAA'rAC CTTTCTCAAT 780
GCTTGTTTAT CACAAACATC TAATTCGTAA AACACGGGAC GTATTCCTGT AATTCCTTCA 840
ATACGGTCTA GCACCAAGAT GCTAGAGTTC GAAAGGTTGT CGACAATGAT AACTTCCTTT 900
CCTAAA'TrA GTAATTCTAC TACGGTATGG CTACCAATAT AACCAGCTCC GCCTGTTACC 960
AATATTGCCA TCTGGGTTTC CTCCTrAATTA ATTCCAACCG ACTTAACAAA TCTCATAAAC 1020
GCTTCATGCC CAGACGGTGT ATTCTTATAA ACTCCTGCAT CTTCCAGAAC TCTCGCAA,;,.C 1080
ACTTGTCCTG CTTCGTGTTG AACTACGCTA TTAACCTCTT CTTTATTAAT GCGAGGATAT 1140
TTTTCT'rTCA ATTGTCGGC CCATTCTAAA TGATAATCCG CAATTGCATT ATCCTCTCCT 1200
AAAAGATATT TTCCAACTTC TTCTAAC'TCI' GGT1TTCAAAC CAGGTGGTAA TA'rCGCAAGT 1260
CCCATCAC?1' CGATTAACCC GATATTTTCC T'rTTTAATAT GTTGTACATC TTGATGACGA 1320
TGGAAAACAC CATCTGGGTA
TTGTTCAGTA GTATGATTAT CTCTTAGAAC AATATCTAAT 1380 TCGTATCTCC CGTCCACTTT ACGAGCAATA GGAGTCACCG TATGGTGTGG CACATCTTCA 1440 GTCATAGCAA TGATGTCTAC TTCTAAATCT GAATATTCTC TCCACTTAr'r TAGAATTTTA 1500 GTAGCTAAA'r CATTrACAA
GAGCCTCCTG
ACAATCTCCA
1031 CTAACAAGCG ATTTTATTT TCACTTTGTA ACC'TAATTAC TGACATTGGC TACCAGCATT AACATCCTCA AAGTCTTTAA AACAAAATTC ACTCTCAAAT CCATTGGGAA AATATGTTTC CCTCCCTGGT AGTGGTTATG. ACTAAGAATG AGATAGGAAG ATCAGAATTT GAACCAGCAA AATATCCTGG CAAAA'rATCA ATAAI-rGTTC AAATGVN=A GAGGTAATAG CCA7=GGAC A'rGTTGACTA TCGCATGCTC ATTAAACTAT GAGTAGGGAG AATACTGGAA TCCCCATACT GTrrCAACCG AATAATTCTA TGA?'rCGAAC GTGCTGGATA ATTTATTCGC
TTCAAAAATA
TCGTCACCAA
CCCTGATATC CTrCATM''C AAT~rrTCAG CAGCAATTGT TCTAGCTCTC CGTATT-TAGT
CATACATAGT
TTTT-GGATCT
TGATGCTCCA
AAACAT=rG GATAATTAGT TrTTCGGGTT TTGACAAATT AACTCAATAT TCTTAGCAAT TAAAACTCTr CAACTGCTTC AATCGACTAG GTAAAGGAAC TCCTTr'rACT
TATCGTAATC
AGCAGAAGTT
TTGAGGTGAT
TATGAAATTC
*0 0 TTAATATAAT CACTATCrITr ACTTAACTTA ATATCATATG AACTCCAAAA AATATCATTT ATTAACTCTG CTCCTAAACA TTCC7=TCC AAGGCGATTT CCACTAAGTA ATCT=TT TCAATTCCCT CCTCACCTAT TAAC~CCrAGT TCATTATACA T'rCCGTAT'rC AATTACTCTA TATrTT'TCTT TCTAATTTAT GT'rCCCATAT TGCGTAGTAT GAGCAATTTT ATCAAGGTGA GAAACGT'rCC CATCATCTTC AAATATGTA.A AGCTCTTrCTA TCTCATTCTG TTTTTGTATA TCGATTAAAT CTTTAATTTT ACCG-TTTTT 'rGTTTCAGGT CATTTTCATC GGAAATGCGA ACTCTATTTT TCACATA'rAT TTTGTCAATT TCAACAAAAT 'rATCAATAAT TGTTTTCATA TTTCTATACA TTATCCATTT ATAAATTGCT TCAATAATAT CTAAAGCACT AATTACT'rCA TTCATTATTT TCTTTTCCAT ATTTATACTA ACAACCA'rAT CTAAACATCC AGATTGTTCC TGCAGTCCGA TAAC -rCATG AAGTATTTTT TTATT'TTGAA GAATCTGTAG ATTT=rAAA 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3140 TCTCTATAAC AAGATATAGC CCTATTCATA ATTTTGAAA, TAAT'NMCTT CAAAATTTCA A'rTTCAACAA TTCTATCCCC A'rTCTTCT'rA AGTCATATGA TCAATATTTA GAACTGACTC TCATTTGATA AATATTCAAA TCTAAATTAT TTGATAATTC AATACGTTCA ATGTCAGTTG ATA'rTTTTAT TACACTAATA AACAGGATGT TGTAAACAAA 'rTAACTCATA TCCTT'TTTTA ATTTATGATT AAATCTTCTr TAACTTCTCA TAT'rTATCAA TATAATTTCA TN'TTCTAAAT TAATCAATTC TACTCGTTCT GCACAGATAC CCAAATGGTC ATAACCTI'AA CATTTAGGTA CCTCTTCTTA ACAAAGTTCG INFORMATION FOR SEQ ID NO: 159: 1032 SEQUENCE CHARACTERISTICS: LENGTH: 9048 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 159: CCGGATGATT TCCTGGTCAG ATAGGGGGAA AGTGACTrCC TCAGCAATCG AGGATrCCCT TCACGGATAA TATCGTTCAT ATCAATTAAG TGAGCAGCTT TTCTATTrGCA GACATT'rCT CTCCTTATAT TATGTTTAGT GCAGTTAGCT CCCAAGTGGT ATACTTGGAA TAAGCCACTG TGGATTAGTT CATTT'rCTTT ACATGATATC ACAAAATGAC AAGAAT'rGAA AGCATTATGG CATTTAGGAT TAGATAGGAA GTTCAATTCA ArTGTGAAAG AAATACTTAT CTGTGATATA AAGGCTTGCA TAAGAAAGTA GGGAGAACGA AGATACAAAG AAGACAAAAT GTGGTTTAGC TTTTCGTTTT ATGAAGGGCT TGGTAAACTT TTTAGGAGT GAGCAATAAG GGATTT GTG G CGATACTCT GCTAGCAGTr GGTTTATCAA CTTGTTGTTT GAAAGCTTCC AAGGAATCCC TTGACTAGTC AAAAACGAGA CAAGAGGGGA 
CTAAGCAAAA GTCTCAGGAG TAGGAAGAGG AAAAAACTGC GCCCACGGGG AT'IrGCTCTA CCACGATGGA CTTTTCTTTT CAGCTAAAAA ACCTATGACT TTCATGAAAi TTGAGTAT GTGACTCCTT GGCTCAAGCA GCAGCAGATT TAGCTATTGG TGATTTTGAA GGAACCATTA ATAAGGATCA 0
CGCGTAGAGT
TTGTAATACG
ACTGCCAAAG
CATTACCTCT
TTATAGAAAA
ATAAAAAGAA
CGAAATCAGG
ATCGCAAGTG
TrGGCT'GGT
TACTATTTCT
CAGAATTATG
AGAAGACGGT
AGGGGACTAA
TTATTTAGCG
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 GGTTATCTTC TCTTTAATGC TCCTGNTGAA GTI'ATGGATG CATGTGCTGG ATTTAGCTCA TAATCATATT TTGGATTCGC
ACGGCCGATA
CGTGATCAGG
TATTCCTATG
CTTTCAGATT
GATATCACCA
TTATTGAGAA
CTCCGCTGGT
GTTTCAATGG
TAAACGAAGA
TTATCATGCT
AGCTGGAATC ACTCCAATCG CATTAAGGAA GTGAATGGTA AATTGAGCAG TATATTTCTC TAAGATGAAG GTTGAAATTG TCAGATGGGT GTTGAGTATC GATGATCGAT TrGGGAGCGG TGAAACGGTT GAAAAAGATG CTATTAAGGA GGCAGGTTAT AAATTGAGGG AGTTATTTCA GAGTTTATAC GCACGAACCA TCAAGGTTGC ATrGTTAGCC AGGAAGACTA TAATCGTTAT AACGGGCAGA GAAGGAAGCA GATTGGAACC AACTGAAGAA ATATTATCTT TGGAGGGCAT GAGATAAGAA ACTCATTATC CTATGGGAGA TGAAGAGAAT CAAAAAGCTC TTTATCACAA CCTCACGTTG TTGAACCATC TATTAAATGG GGAACTTCAT TTCCAATCAA CGAATTGAAT GCTAAGTGCA CTGAACGTGG TGTTCTCATG GATGTCACCA TCAAGAAGAA GGATGGAAAA 1500 1033 ACAACTATCG GAACAGCTAA AGCTCATCCT ACT'rGGCTCA ATCGAACACC AAAGGGAACC 'rrrrCACCAG AAGGATATCC CTTGTATCAT TACCAAACTT ATA'rMrGGA AGATTTTATA GAGGATGGCA GTCA'rCGTGA CCAGTTAGAT GAAGCGACI'A AGGAACGAAT TGATACAGCC TATAAAGAAA TGAATGAACA TGTGGGATTG AAGTrGTIATT AGCTTGAATC CAGAGGAAAG TAAATGATCA TTAAGGTAAT TGCGACAGAT ATGGATGGGA CCTTGCTCGA TGCTAGAGGT CAGCTTGATC TCCCACGArr GGAAAAGATT TTAGATCAGT TGGATCAAAG GGGCATTCGT TTTGTCATTG CGACGGGCAA TGAAATTCAC CGCATGAGAC AACTACTGAG TCCCTTGGTG GATCGAGTGG TTCTGGTTGT TGCTAATGGC GCTCGTATTT TTGAAAACAA TGAATTGA'TT CAGGCTCAGA CATGGGATGA CGCCATTGTC AACAAGGC1-r TGACTCATTT- CAAGGGTCCA GCGTGTCAGG ACCAGTTTGT TGTAACGGGG ATGAAGGGTG ATTTTCTCAA GGAAGGTACG AT'rNrACAG ATCTTGAAAG T'TTTATGACT CCAGAAATGA TTGAAAAA'rT CrACCAACGG ATGCAATTTG TGGATGAATT AACATCTGAC CTCTTTGGTG GTGTGCTCAA GATGAGCATG GTTGTTGGTG AGGAACGTTT GAGTTCGGTT TTGGAAGAAA TCALATGCTCT CTTTGATGGC CGTG'rCCGAG CTGTATCCAG TGGCTATGGT TGCATTGATA TCCTCCAAGC TCCGATTCAT AAAGCATGGG GCTTGGAGGA A'rTACTCAAG CGCTGGGACT TGAAA'CCCA AGAAATCATG GCTTTTGGTG ATAGTGAAAA TGATGTTG-AA ATGCTTGAAA TGGCTGGAAT TGCCTATGCG ATGGAAAATG CTGATGAGAA AGCCAAAGCT GTGGCGACTG CTCTAGCACC AGCCAACAGC CAAGGAGGAG TrrATCAAGT CT'rGGAAAAC TGGTTAGAAA AAGGAGA.ATG AAGTGGCAGT ACAGTTATTA GAAAAr'rGGC TCCTA-AAGGA ACAAGAAAAA ATTCAAACTA AGTATCGTCA CCTAAATCAC ATrrTTG TAGAACCAAA CATTCTT=T ATTGGGGATT CCATTGTCGA GTATTATCCT CTACAGGAGC TATT'rGGGAC TTCAAAGACG ATTCTCAATC GAGGAATTCG TGGCTATCAG ACAGGACTGT TACTAGAGAA CCTTGATGCT CATCTATATG 
GTGGAGCAGT AGATAAAATT TTTCTTCTGA TTGGGACAAA TGATATCGGA AAGGATQTTC CTGTGAATGA GGCTCTCAAT AATCTCGAAG CTATCA'PTCA ATCCGTTGCT CCCGATTATC CATTGACAGA GArrAAATTG C7TrrCCATTT TGCCTGTCAA TGAGAGAGAG GAGTACCAGC AGGCAGTCTA TATCCGCTCG AATGAAAAAA TTCAGAACTG GAATCAAGCC TATCAAGAGC TTGCATCTGC CTATA'rGC.AG GTGGAATTTG TGCCAGTAFT TGATTGTTTG ACAGACCAAG CAGGCCAACT CAAAAAAGAA TATACAACTG ATGGACTGCA CCTCAGTATT GCTCTTATC AGGCTI'TGTC AAAATCCTTG AAAGAC'rATC TTTACTAAAT AGCTAAATAA TGTT-AAATTT GAGCATPLATA 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 a a.
TCTTGTAAAA AA'rTCTAAAA TCcTTTAAAA AATC-AGA'rTG TACGGATTAT TCCTACTTTA TN=TATATTG AAACCCTTGG AATGAAGGCC GGTGACCAAA CGGGITCTTGA AAAGCTGCT AAGGTAGAGG GAAGAAAAAA ACTAGCTAGA ATTGAAGGAA TCTTATCTAA AACAGATTCG TACCTCT AAATTTTCTC ACCAGAAGAT ATAGCAAGTC TAGTAGAAGT AGGAGAAAAG 1034 TAAAAAGTGA CGGAGGAATT TATGAATGTA AAAGCTAATA ATAGAAAATT AAA'rGAAACA 7TTGTTAGAAG AATCGGCCTT TCTGTCACTA TTAGAAGAAG CTCCCACGrAT GCGTACTCGT TTrGATTGTCA -A~TGGAAAA 'rCCCrAGAA ATTrCATCGAT TATATAAAGG TCAAAATGGC GA'TTrGA'T TCATTCA'rGC GGAAGATGAC CCTGAATTTC AAACAGATT GGC-ATCAATT TCTTTAAGTA AATTTGAGAT TTCTATGGAA TTACATCTCC CAACTGATA'r
TTCAATCAT
ACTGTGGACA
GACATAGCAA
AAAT'rCTTCC
GGACCAAGAT
CTTTTGCGTA
CTGAAA'rTGG GGCATCCCTT GA'rTrrATTC CAGCTCAGGG ATACGGTTAC CTG.GGACTTA TCTA'rGCTCA AGTTcTTGGT GTCTTCGCCA GAACTTTGAG TCTACTGAAT AT7rATTCC TTGGTAAAGA TAGAAATAAT GTTGAATTGT CC GAAC.A TAITrAAAAAA
TTTTAAGGAC
ATAGAAGAAC AAATCGAGGC AGGGATTTAT AATCAATGGA CAGAGTTCTA TTTAGGCCAG
CGAAAGTTTC
GCAGGA'rTG
CAATGAATTA
TAAGTCTGAA
AGTATGAAGT
CCCGGAGCCCT
AGTGACCCAG
AGCAACTTG
AT'rGATAGAC
CAGCTCTTGA
3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 AGCATGGCTT GCAGACTGAG GCAGGACTAG TTTATGACCT 'rTCGGGT-rcG CACAG7TT'GT ACCTN'CTTGT GGGAAATAGG 'rGGTAATAGA TTTTTTACCT GAGAGTGATT ATCCAGACAT CTCATGCAAC AGACCTTGAT CCTT'rTATTC CTAATCCTGA
AGCTAG'TGTC
TCAATTAGAT
CACTATTCCC
TAAAGGAAGC
TCCA'TTTr TTTrTAAAGGA GATGTTTCAT CTCAACAGAC GAAGTCAGCC GCTGN'GGC 'rI=ATTTTGG AAAGAATTTT TCAAGTCTGG AAACCTTGGG GAATGACGGA AGCTTGCTGT TCCAACAGTr ACAGCTGTAG AGGCAGGCAT GTCTCCTGGG TAGACATGCT GGGAGTGCTG G~rTT TC TCTTTTAGA ACACTATTTA GCAGATGATT TITGCAAGAGA CTTTrGGATGA CAACGAACGT TCTTTAGCAT GGAATTTGGA CGGGCTATAC AGGTACCTTT ATCATGTGGA ATCGTCAGAA TATCGAATCG TACCTATGAA AAGGACGAGA GAGCTCAATG rGATGAACTT GATTCGCAAA GAAGAGTAAG GAGAGACATG TTACTAACA GTrGTGGCTrG GTATTGCTTG GGGGrGTC-A TCNTTwAACA GCCCCTGAAT AGCC='rTTr TATTCGGATG TAATCAAGAT TTGGATGTGA AACTAAGTrT GGGCCAGTTG AGTGCATGAT CCCAACGCi'C GACTATAAAG GATTTACAAA CTTAAATCAA AA7rTTCTC AGGAGA'TTGG CTAGACCA'rA GCAAGAAGCC ACTATTTTCC GATATTAGAC CGCAATCAAG TCAAATAGTT TAAAAGGGAC CGAACGAGTG GCCAATACCT 1035 AATGGCACAC GCAATI'TCGG AATTCTCATC CTCTTGG7* TAGAAAGAGT T'I'GCTGTCTC CGCCTATCTG TCTGCTTTC TTGTCCTGTC GGAATrMAA AGAGATAGT TCCATCATAT GTTGGACCAG TTATCCATGA TT'ATGCTCTG TATATCATTT CA'rTGGTCTG GGAATGGTCA GGCCGATATC CCGACTAGTC CACTGTC=r GCCTATACAG AAGC'TTGTTG GCTTCAATTG TGAACAATTT TATCCCATTG GArrTCTTTG AAAGATTTAT GTTTTTGCGT GGTAATCTAA GAC7rTAC GTATTCAAAA GAATAGCATG TCCTTTTTCA
CTCTGTT
ATGCTACTGC
TTCTATTTT
AGGAGACCAA
TTTATAGCTC
TCGCC-ATCGG
CACCTGCTGG
TACCCATAGC
TAGCAGGT'rr
TTGATTTTCT
CTrrCCTTAA
AGCCAATATC
AT7='CTTGG
TCTTAGAAAA
TTrATTTTCAA
GCAGTACCAT
ACTTrCCAAAT T'rGCCGATAC 7w=AAAGT
TGGTCGGCAT
TTGGCAACTT
ACTITACCTI'
ACTCGAGTAC
ACTAATAGAG
TCTCTTTCAC
GACrAACTTG
AAAGGATAAA
TGCTCTGATT
TGCGGGAACA
TATCAAGGAT
AGGAACCTTC
TCTGTTCTG4G CT!'GA7TAAA GGTCGcCCrr
CCTTGCGTTT
AGGAGCCAGT
GGCGATT7rC
TATGGCAATG
ATAAAAAAGA
GATAAAATTC
C -rCTAGGTA
AGTGGGCAAT
CGTCTTTTAA TCGCTGGTGG ATACTGCTCT TNAAAGGA GGTCr-rMrC TCAACCATT GCGACGGTGC TTCAGTATGT AGGGTGGCAC CGACACTGGG CTGATCGCAA CACATGGGCA GGTCTCTTTT CTGCCTTGAC AAGTGCGGA GCAGCTTGGT CCTTTTACAG GGGrTCTACA GCAGGCATTA TCCTTATCGG CTGATAGGAC CGGTCAAGTC TTTGCCTTCT TAATAATGAA ATATTCTTTG CTGTAACTTT CTCTTTGTCC GTGACAGAGA AAAGCCTTCG CCTACATATT GGAAACCTGG GTCAATCCAA CTTTTCTTTA GCAAGGCGAG 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780
CATACATGGT
AATGACTGGT
CTTCATAAAC
CCGTTGGAAA
AAATGAGCTG
TCCCTCGCAT
GCTCTAAGAC
CCGCAAACTG
ACTCGTCAAT
TCACTTCATT
CC7'rG=rAG
TTGAGAAAGG
CTG7rrGAGAT
GATTTCTTCT
TAGGGGAACT
AATTCCTAGC
GCTAGAGGGA
CTT7?TCTTGT
GCCTTCTTCG
CTCAGCTAAA
ATCC7T=T
TCTATCTTTT
T'rGGTAATTC
AAAAAGATGA
GATAGCCATG TTTTTGCAAG CCT'rAATGAA T'=TCAGGA AGCGGACCCG CTCCATTCGG CTTGCTCATTr GGGAATGGTT GGGATTTrGAC AATTTCAGTA TGGAAACAAA GGTGCCGCTT ATACGGCT'TG GCGGAGGGTC TGGGAAGCCT CTCACCAATA GGATTT7'rAT ATAAGCAGIGT CCCCAATAAA CTAGAAAAAG GTGATAGAAT ATTGAGAAAA AACTGATTTG CAAGATGTAG
GTTTGGAAGA
ATAGA.AGCAA
ATAATATTGT
TTTTTGTAGG
AAACTGGTTG
CC1'ACACGGC
ATGCGACTGA
GCCCAACGGT
AGCATATTTT
TCAAACTTCG
GATATTTCTT
AAAAAATCAT
ATCTGATCAT
CTCTATTGTA
GCCCTATTT
GCAACATrrC CGTATTGGAC TATGGTAGCC TGTTTrrCA GAACTAAAA-A TGTAGGAATr ATTCTATCAG TGACCCAGAA ATCTrCGAAC ATTGACCCAT AAACTTrGGAG CGGTCAATCA ACCCTAACTC GACTG7M~G ATGAGCCATG AGGTACA'rCA GCTGACTCCC 1036 AGTACAACCA GCTGATTTCA CGCCGTATCC GTGAGATTGG GCCATAAAAT TTCAGCTGCT GAAGTTCGTG AACTCAA'rCC GTGGTCCAAA TTCTGTATAT GAAGA'rGT' CATTTGATAT TCGGAATTCC AATTTTGGGA ATCTGTTATG GTATGCAGTr GAAAAG7'rGT TCCTGCAGCGr GATGCTGGAA ATCGTGAATA ACACACCATC AGCGCTT CAATCAACAC CTGATGAACA GTGATGCCCT TACTGAGATT CCTGCTGACT TTG.=CGTAC CATACGCACC CATCGAAAAC CCAGATAAAC ACATrrACCG TATCCAATTC CACCCAGAAG TTCGTCATTC 'rGCCCTTAAC ATTTGTAAC CTAA.AGGTGA GATCAAAAAA ATTCGTGAAA CCGTCGGTGA TGTTGACTCA TCTGTC C GGGTTCTTCT TATCTTCGTA GACCACGGTC TTCTTCGTAA CGGTGGTAAG TTTGGT'ITrA ATATCGTCA.A AC7?TGCTGGC GT'rrCTGACC CTGAACAAAA 'rGTATTCGAT GACGAAGCAA GCAAGCTCAA ATATACAGAT GTTATCGAGT CTGGTACGGA CG'rGG GGTC TTCCAGAAGA TATGCAGTTT TGTATACGGA AATGATATCC TCGTAACTT CTGGTCAATG GATAATTTCA TTGACATGCA
TAAACGTGTC
CCAAAAAGCG
AGGCGAAGCT
AGCAGACGCT
CT'rCTTGGTC
ATTGGCGATC
GATCAA~rTA
GCTAAACGTT
TATCAGGTGG
AATTGATCTG
TGGACATGCT
TCCTTGACA
ACGTAAAATC ATCGGTAACG AGTTTGTCTA AGAT AAA TTCCTTGCTC AAGGTACTTT 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 762G 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 TACAGCTCAA ACTATCAAGT GAATTGATTG AACCACTCAA AAGGATGAAG TTCGTGCTCT TGGTACAGAG CTTGGTATGC CAGACCATAT CAACCATTCC CAGGACCAGG ACTTGCTATC CGTGTCATGG GTGAAATCAC CTrTGAAACCG TTCGTGAATC AGACGCTATT CTTCGTGAAG AAATCGCTAA GACCGCGATA TTTGGCAATA CTTCACTGTT AACACAGGCG TTCCTTCAGT
CACACCACAA
TACTCTTTAC
CGTATGGCGC
TGAAGAGAAA
AGCTGGACTT
CGGTGTTATG
GGTGACGGTC
ATGACTGCTG
GTAAATGAAG
GTACGTATGA CTACACGATT GCkATCCGTG ATTTTGCCALA AATTCCATGG GAAGTACTTC TGGATCATGT TAACCGTATC GTCTACGATA CTATCACT'rC TATCGATGGT AAAAAATCTC AGTACGTATC TTACAAGTAA ACCACCTGCA ACAGTTGAGT GGGAATAATC GCAAAAAAAT TAAAAGCTTT GTAAAATCAA GGATTAAAAA CTGTAACTGG GATTAAAACG GGAACA'rTG CTAAAAAGAA AATAGTTCCA ACTGGTTTAC ATTTGGACAA AAAATTAGAC CGTAGTrrC CTTTTGATAT ATATAATGAG AATTAATGGC TCT'rTGTCAA CTGTAGTGGG CTAAGCTCGA GAAAGGACAA ATTTTGTCCT TTCTTTTTTrG ATATTCAGAG
CGGTTACAGA
TAAATTGAAT
AACCTGCGGT
TTGA.AGTCAG
CGATAAAAAT
1037 CCGTTTTTG AAGT=rCAA AGTrCCGAAA ACCAAAGGCA TTGCGCTTGA TAAGTTTGAT 8640 GAGATTATTG GTCGCTTCCA ATTTGGCGTT AGAATAGTG T AGTTGAAGGG CGTTGACGAT 8700 TTrCTCTTTG TCCTTrAGAA. AGG='TAAA GACAGTCTGA AAAAGACGAT GAACCTGCTT 8760 TAGATTGTCC TCAATGAGTC CGAAAAATI'T CTCCGGTTCC TTA'rTCTGAA AGTG.AAAC.AG 8820 CAAGAGTTGA TAGAGCTGAT AGTGATGTTT CAAGTCTTGT GAATAGCTCA AAAGCTTGT 8880 TAAAATCTCT TTATrGG'rTA AATGCATACG AAAAGTAGGG CGATAAAAAT GTTrATCGCT 8940 GAGTTTACGA CTATCCTGTT GTATGAGCTT CCAGTAGCGC TrGATAGCCT TGTATr'CATG 9000 AGACTTrrCGA TCCAATTGAT TCATGATTTG AACACGCACA CGACTCGG 9048 INFORMATION FOR SEQ ID NO: 160: SEQUENCE CHARACTERISTICS: LENGTH: 10399 base pairs B) TYPE: nucleic acid STRANDEDNESS: double (D TOPOLOGY: linear SEQUENCE DESCRIPTION: SEQ ID NO: 160: GTACCTTTAT TGATGAATGG ACTGTTTAAA TCAGTAGCAC GCCAACCAGA TATGCTTTCT *GAGTTTCGTA GTTTGATGTT TTTAGGTGTT GCCTTTATTG AAGGAACTTT CI'TTGTAACT 120 CTTGTCTTCT CATTTATTAT CAAATAAATA CATGGAACGA GAAGAAAAGG GAGGATTTTA 180 GATGGAAGAA AGTATTAATC CAATCATCTC TATTGGTCCT GTTATCTPCA ATCTGACTAT 240 *GTTAGCCATG ACTTTGTTGA TTGTGGGAGT TATTT'rrGTC TTTATrTATT GGGCAAGCCG 300 CAATATGACC TTGAAACCCA AAGGAAAGCA AAATGTACTT GACTATGTCT ATGACTTTGT 360 TATTGGATTT ACAGAACCTA ACATTGGTC GCGCTACATG AAAGATTACT CACTCTrTTTT 420 *CCTTTGTTA TTCCTTTTCA TGGTGATTGC CAATAACCTT GCCTTAATGA CAAAGCTTCA 480 ***AACGATCGAT GGGACTAACT GGTGGAGTTC GCCAACCGCT AATTTACAGT ATGACTTAAC 540 CTTATCTTTT CTTGTCATTT TGTrGACACA TATAGAAAGC GTTCGTCGTC GTGGATPTAA 600 AAAAAGTATA AAATCTTTTA TGAGTCCTGT TTTTGTCATA CCGATGAATA TCTTGGAAGA 660 ATI'TACAAAC TTCTTATCTT TGGCTTTGCG GATTTTTrGGG AATATCTTTG CAGGAGAGGT 720 CATGACGAGT TTGTTACTTC TTCTTTCCCA CCAAGCTATT TATTGGTATC CAGTAGCCTT 780 TGGAGCTAAT TTGGCTTGGA CTGCAT=~C TGTCTTTATT TCCTGCATCC AAGCT TATGT 840 TTTTACTCTT TTGACATCTG TGTATTTAGG GAATAAGATT AATATTGAAG AGGAATAGAA 900 AGGAGThACT GA'rGCACGTA GCTCTTTTAT TCTIrGCTA TTTTCGAAGA AAGAGCTGAA AAAAAGCAGA AGTArrGGCT CTAAGACAAT CATTGAAAAT CAGATGCTAA AC'TAGAAGCA ATAAAGTAGA AGCTTTACAG CTGGTAAAAT CATCTCACAA ATA'rCGATCA 
GCTAGGAGAA CAGCATGCCT TTTGTCCAAT CTTGAC1TCAA ATCAAGCAAG GGCAGTAGAC GAGTCGGATA TTTATTACAA AACTTTATCC TGTGCTTGTA GATTGCTTGA TACGTCTGCT CATCCTCTA-A 1038 ACACTAGGTG AATTAArCG GTCTTGATTA AAAAATTrGC AAAATTGCTT CAGATAT'rGA CAAAAACGCG AAGATGAATT GCAAAGGAAA CAGCTGAGCA GGACACTTAA AAGAAAAACC AGTGTTAAGG GTGAGGTCGC AACCT'rGACA GTCATGCCCA GCTTAATGGA CAAGAAAACA TGGTACTTGA AAAAGGAGAA TTGTTGAAAA AACAGGCTC AGGAAAAAAC AATTGCTTTT AGGTTCTGGC CTACAATCAC TAATTTATT N'AATCACTIG ATGGTCTAAT ATTACACGCA CAGAGCTGAA GAAGCCCGTC GGCTGCTAGC CGTAAAGAAG AAGTAAG40CT AATATCTTAG CAATCAAGAA ATTGCTCAAA AGATTTGACC ATCAGC3'TAG TAAAGCACTC ATTGATCAGT 960 1020 1080 1140 1200 1260 1320 1380 1440
GTAAAGGTAA
GAAGACCGTA
CCTrTT'rT
TTCCAAGATT
AGAGCAAATC
TTCAAAAATA
ACCCACTTGA AA.AAGAAACA CTCATGAACA GAAGACTCGT
AATCGATTTCG
TTGCTCCCTI'
GATGAAAGTC
AAAAATGTCT C2'GAAAGTAA GGAGTGTAAA AGAACAAATC TCTTTTCAGA 1500 TAAAACAAG1' 1560 CrG?GCGCC 1620 TTTT1TTATGA 1680 AAGTGACGAT 1740 TGATTGAGAA 1800 TCATTrGGTGr, 1860 AACTAAAGT 1920 ACAAGAAATC 1980 GACTGAAACA 2040 AAATGTCATG 2100 CTTGGACTCA 2160 CGATACAATC 2220 TCGIY.=G:G 2280 C. CC C C
C C T7rTGTCA7TT ?TTGCCAATC ACAAGACAAT TG7TTAAAGAA AATTTGAAAT AGAAAGTGGT AGCGCTTTAA TTAAGCAACA AATTGAAAAT GGTGTITGTAA CCTATATCGG GGACGG;TATC AGTGGAGAGT TG'rrGAATTT TGAAAACGGC ACAGACGTTG GTAT'TATCAT CCTAGGTGAC CGCCGTACAG GGAAAATCAT GGAAGTCCCT GATCCGC?2TG CTCGTCCAGT TGACGGTCTT GTAGAAGCAC CAGCTCCTGG TGTTATGCAA GC7~'MAAAG CTATTGACGC CCTTGTACCG CGTGACCGTC AGACAGGGAA AACAACCAT CAAGATATGA TCTGTATCTA CGTCGCGATT GTAGAPLACAC TTCGTCAGTA CGGTGCCTTG TCACAACCAT CTCCATTGCT C -rCCTAGCT TGATGTGAGT ATTAAACAAC GTTCTTTTGG CAATTAACC TTCAAACCCA ATTTTGATGT GCGCGTGCTC ACGGCCTTGA TCTTATGGTA TGGCTCAAAA TTTACAGATA TCCGTGAAGG GTAGGTGAAA GTCTGATTGG GGAGAAATCC ACACTGATAA CGTAAGTCTG TTTCAGAACC ATTGGI'CGTG GTCAACGTGA CGcATTGATA CAATCII'GAA GGACAAAAAG AATCAACAGT GACTACACAA TCGTTGTGAC CCTTATGCTG GGGTTGCTAT
AACTCGTCCA
ATTGCAAAC'
GTTGATTATC
CCAAAAAGAT
'rCGTACGC;A AGCCTCrGC~T
GGCGGAAGAA
2340 2400 2460 2520 2580 2640 2700 T'rA'rGTATC AAGGTAAGCA TGTI-rTGATr CTTATC~rG AACTGTCGCT~ CTTGCTTCGT GATGTTTT= ATCTCCACAG CCGtTTrt-= G-GTGGTGCAT CAATTACAGC CCTACCATTT 1039
GTATACGATG
CGTCCTCCAG
GAGCGCTCAG
ATCGAGACAC
ATCTATCAAA ACAAGCGGTA OTCGTGAAGC CTTCCCAGGG CTAAAG C TGATGAACTr AAGCAGGAGA TATCTCACC TATATCGCAA CCAACCGrGAT TTCTATCACT GATGGACAAA TCTTCCTTGG TTCAATGCAG GTATTCGTCC ACCCATCGAT CCGTTCAT CTGTATCTCG TCTGCACAAA TCAAAGCCAT GAAGAAGGTT GCTGGTACAC TTCGTATCGA CGATGGCCTC
TGTAGGTGGT
CC1-rGCTTCA
AACACAGGCT
CAAACCATTA
TACCGTGAGT TGGAAGCCTT TACTAAGTTT CGTTCTGACT TGGACGCAC TACCGT'rGAG GTCTI'GAAAC AACCTGTTCA 4 S h*
55 4* S AAGT'rGAACC
CCTGTT'GAGA
CC-AGTAGATG
CCAGAGATT'r
GCTIGCGATTA
GCAGTATCTC
ACTAATGCCA
AACTTCCAAG
GGACCTGGTG
GTGGACG
AACAAGTAAC
ATATTGTrCG 'rGGAAACCAT
CAGAGTTTCT
TAAATGATAT
TGCAAATCGr
TTTACGCTCA
CTTCAACTAA
CATTCTTTAT GCTrTGACAC ATGGTTTCTT GGATACTGTT TTTCGAGGAA GAGTTCCATG CCTTCTTTCA TGCTCAACAT
TCGTGATACA
CAATCAATCT
TAAAACAAAA
ATCGGCTGCT
GAAAGTGCGT
TCCGATGTTG
AAAGACTTGC
AGCTTCCAAT
ATCGCCTCAA
AAGCTAGGTC
AAACTTTTGA
ATTAGCCGTT
GGAGGTTATA
GACGGTAAAG
CGCGGTATTC
GTTCGTAAGA
ATCC'rTATCA CTTCAGACCG CGG'IITGGTT GTTATGGAGT TGAAAGAAGA ATACCACCCA GGTGGGATGG GAGCTGATTT CTTThAGGCT GGCTTGTCAG ACCAACCTAG CN'TGATCAA CAGA-AGAAGC AGrCTTGGAT A.AGAATAGAG GTGTCAGATC CAAAAAATAC GAGTCAAATC GTTCTGAAGA AGCTGCTCGC CAGATATCCI TCATGGTAAT CqTGTGA.AGAA GACAGGCTAT ATTCCTCTAT TTrGAAAGC'r GTTTTGAALAT GATCTGTATC AACCACTTTA TGAATTACGT TTATTTCAAA AACTGTTGAA ACAACCACCA TGTCAATACG TTGACTTGGA TCCAAATGAA CCCAGAAGA AATTcTGGAG CCATTATCGA TGCCAAGACA 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 ATGTACCAA.A ATGAACTCTT CTAACCAGTC AAATGCGTGT GCGGATGA.AG AGTACAGCTr CAGTTGTrGC C'TCAGTTTGC GCTGAGAATG CTGCGGGCAT ATCAATGATT TGACAATTCA ACAGAA.ATCG TAGCAGGTC TGATGAGC'rT TATGTTTGCT GGAACAA.ATG CTTCCGATTG GACTTTTGAA TTGGAAACCA AGAAAGTAT'G ATTTACGGTG
GACAGCCAG
GTATAACCGT
TAGTGCC?'TA
CAAACAGCGA CAGATAATGC TAAGAAAGTC GCCAGACAGG CGGCGATTAC ACAAGAAATT GAATAGCCTC TAGTCCAGCT CGTA'rGAAAA TGAACTTAGG ACCTAGTTGA GCTAGGAACC GACAGTATCT TATATAGAAT AGGAGAAGGA GATGAGTTCA GCGTAAAAT'rG CTCAGGTTAT AGGGGAAAAA CDCCTGAGA AACAAAAATC GTCCNGAkAG CATGGAATCA ACAGATGGGT
TTAACAATGC
TAGCC'rGGA TGACTCGTrGG 1040 CGGTCCCCTT GTAGACCTT'r T'GTTGCAC AC7TrGTCGTC TACAAA.AATG ACGAAAGAAA GTTAGGAGAT GGTATGGrrC GTACTATCGC AATGGAAGTA ?3'GGACACAG GTCGTCCAAT ACGTGTrCTTC AACGIrTTGG GAGATACCAT CTCTGTACCA GTAGGTAAAG AAACIrTGGG TGACTTGGAA GCTCCTTTTA CAGAAGACGC AACTTTTGAT GAG7"rGTrCTA CCTCTTCTGA CCTTrCTTGCC CCTTACCTTA AAGGTGGTA-A TAAAACTGTC 'ITAATCCAAG AATTGATTCA
AGAGCGTCAG
AATCC7'rGAA
AGTGGAC'TT
CAACATTGCC
CCAATTCATA AAAAAGCTCC ACAGGGATCA AGGTTATTGA TTCGGTGGTG CCGGAGTTGG CAAGAGCACG GTGGTATTrTC ACTAT'rTGCT GGTGTTGGGG AACGTACTCG TGAGGGGAAT GACCN'TACT GGGAAATGAA AGAATCAGGC GTTATCGAGA AAACAGCCAT GGTCTrcGCT CAGATGAATG AGCCACCAGG AGCACGTATG CGTGTTGCCC TTACTGGTTT GACAATCGCT GAATACTTCC GTGATCTIGGA AGGCCAAGAC CTGCTTCTCT TTATCGATA-A TATCTTCCGT TTCACTCAGG CTGGTTCAGA AGTATCTGCC CTTTTGGGTC GTATGCCATC AGCCGTTGG'T TACCAACCAA CACTTGCTAC GGAAATGGGT CAATTGCAAG AACGTATCAC ATC.AACCA.AG A.AGGGTTCTG TAACCTCTAT 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 520 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180r 6240 CCAGGCTATC TATGTGCCAG CGGATGACTA TACTGACCCA TCACTTGGAT TCAACAACAA ACTTGCAACG TAAGTTGGTA
GCGCCAGCAA
CAATrGGGTA GAAATCG7NrG CGTTGACCCA CTTGCTTCAA CTATGCAGTT GCTGCTGAAG CATTGCTATC CTTG(;TATGG CCGTCGTATC CAGTTCTTCT GCCAGGTTCT TATGTTCCAG TAAA'rACCGAC CACTTGCCAG GCTCACGTGC CTGGCACCT
TAAAACGTGT
ATGAGCTTTC
TGTCACAAAA
TTGCTGAAAC
AAGATGCCTT
CCTTCAACGT TACCATGAAT TGATGAAGAA AAGACCTTGG CTTCAACGTT GCGGAACAAT TGTACGTGGC TTTAAGGAAA CCGTGGTGTA CGTTrTATCG
CAGCCTTCGC
TCTACCCAGC
GAGAAGAGCA
TGCAAGATAT
TTGCTCGCGC
TTACTGGTCA
TCCTTGATGG
AAGATGTGAT
TGCAkAAAGCT GAAAAAATGG GATNTAAGA GGTGATCTAT GGC'TCAGTTA ACTGTCCAGA TCGTGACACC AGATGGTCTC GTCTATGATC ACCATGCCAG CTATGTATCG GTTCGAACTC TGGATGGTGA GATGGGGATC TTGCCACGAC ATGAAATAT GATTGCGGTT TTAC-AGTTG ATGAAGTAAA GGTAAAACGT ATCGATGATA AAGATCACGT GAACTGGATT GCAGTAAACG GTGGCGTTAT TGAAA'PTGCC AATGATATGA TCACAATCGT CGCTGACTCT GCAGAACGTG CTCGTGATAT CGATATCAGT CGTGCAGAAC GTGCCAAACT TCGTGCAGAA CGTGCAATTG AAGAAGCACA AGACAAACAT TTGA'N'GACC AAGALACGTCC TCTAACATT GCTTTGCAAC 1041 GTGCTATTAA CCGTATTAA'r GTCGGAAATA GACTATAAGA CAACTTCATT ?T=ATGCTG 'ITrTAAGGAG CAAAACGGAT GGAAGTCGTT Gr.AGAGrrc GCTAGACGAC CATrGTCACA CATTGTCAGA GTC~wrGA'rcG ACAACAATGA GAAA1TTIT CACGT'GGAGT CTGACCATGC GTTGAACGA TTTCTAATAA CGATGGTATA TACTGCGATA GAATCATGGC CACGGTTAG.A CTTTAGAGAG ATGAATACCA GCGTCCAT TAAAGCCTTC AAATGACCTG CATACGTTCA AATTCGCCAA CGCCATCGTA TATTGAGTTC ACAAATGAGA TAAGCGATTT TATAGTGGITr AGCCTCCC TGGCTTGCTG TGATAGGTAT AGAGCTTAGA GGTCATAGG;T GATGACT'rGG TCAGTTCCCA AGTCGCAGGT rrA.AATCTGT ATAGTGAACA TGGGGGGAAG GTTGATCCAT ATCACTAAGT AGAAGACTAC GTCCCTTGTG ATAGTTrAGCT GCGTAAACCA AGTGGGGAGC TCCTTCTTCA ACAACATGAT CAATTCCCCC CTATCGTCT CAAGGTAGGT TGGACTTGGC CTGTATCAAA GTCTGCCTTG AAACACTTTC TTTCATTACT AAGTAAATI'A AAGATATCCA AAAATGGTAT AATGAGAGAA TTTAGCCTAT CTTGGTTITT GTAACCTITAT TANTGGGACT
TGGCTACCAA
TCAGCTGCAA
TAAATCCCTT
ATACCTCTGT
A'rTAGATGTA
CAATAGAAAG
CAAGTGGTTT
GAATCTTr
AGTTGTGATG
TGA'rTGGATG
CATCGCTCTC
TrrGACCTTr
ATTGGTAGCC
TPTTTCTAGC
CTTGATTTTC
CATCTTCCTG
AATCACGCTT
TTrAACACAGT
CAGTGTATAA
AAAGrrC'rAG
GAGAAGTACG
GTAAAGATA-A
AGCACTTTAA
GAAGTATTTA
TTAGATA-ACA
ATTTTAAGTA
AAAAATGA.AC TTGAAAATAC GCAGACTGCT TCGGGAACAT ATTACG=rA AAGACAGT-rG GTCGGCTGTC AAA'rCAAAAT CTCTAAGCTA CCG'rCCGCAA AGCGTAGAGG TATTTACCGT GTAAGCTTCC GGTAAAGTTG GAT'rAAAACT TCGATAGTAC ATGGAAAATG ATATGGCGTG TAATTTTCCT TCTTGATCGA CACTAGATAG TGGTCAGGTG ATGTGGACCT TGGCCACTGT GCGTrrATAA ACAAGGACTT TTCATCGACA GCAACATAAC CCCGTCAGT'r TGATAGGCTG ATGTTGGTGC TGGTCAA-AGG ATTTGAAAGC TGACCAGTTT ACGTGTATA.A GTTCCAAAAT GACTATTATA 'rCACAAAAAC AAAAGAGTTA TTTTGTTTCA TGGAGCAAAA AGAGAAACAT AGGCAATTAC GGTATTTT'A AGATTAGTTT TCTATTTTCA TTTTCTCTGG =rGTTATAT 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 CCTGTT TAG TAT7TTGTTGA
ATCACTATTG
CCAAATCTGC
ATAGATAGGA
AC7TTAGC
ATCCTATTGT
TCTTTGTTAT
AACGTCAGGT
TTGTTAATGG
GAGAAGCATA AGGTTAATCG TGTTATAGCT TTTATCATTT GGGGCT'rGGC AGTCGCCATT GCAAGAA.ACG TTCCTGTTTA CTTAGAAGAT CAGCACCTGC CAGATGATTT CAGACCTCAA CAGGCTACAG T'TTTGGCA.AG TA.AGGTTTCA TTAGAGCAAG TTTTGACCAA TCTCAGGCAG TCAACTGGT GAGTGCCT TTGATTATCG TTCCTTTCAT GCTCr=AT 1042
ATTAGCGGGG
CTCTTGCGTG
TATTTCACCC AATTCATTCC GTGAATCAAC AGrrGTCCAA GTAATGT'A TCATCTTCTT ACTGCTGGTA I=rAAATCT CTAGTArrGG GT'rTGATTGC GTAGAACAAA CTATTGAAGG ATCCACCCT1A TTAATGTTCT GGAGTTTTAC TTGGTATTCC GAATGGTATA AGGTAGTCAG CAATAGTCAA CAGATGrTAC TTrCGCCAAA GCTTTAGAAA 'rGAAGGGATT' GGTTTC1YATC TCCAGAGGTT CATCTTAATC CrrACCTAT CTTGAGGAAA GAAGGCAGAC CTTTACCAGC GGCCTTGACC TACTCAGAGG GTTGGAAAAT TACCAAGCGG TGAGCAAACG GGCATTTCCA ATTTGAAACG GCTACTGACT AGCTTTTGAG TTGGCCACTC CTTr'rAAGCAG CTTGATACCA AAGAAAATTG AACGAACCTG CTATGT'rCGA GGGCAAGTG.A CAAGA'rrATT CGTCTACGCT GCTCCCTTAT CTTG-,AGCT TGGTCCAGTC ATCCrT =GA CCGT'rTGTC TCTCCATTGA CTT'GTTr'rG TTAACTTCAG GGTTTA'rGCC TCTGCTAAGG TGGTCTATAT GAA'rTAGACG AGGCr'rTGGA GGAGCAAGAT ATGATTCAAG TGATC=rCTG CTCAGGCCAA GGAAATTTAC TAGCTGCAAT TGCTAGCCAG TCCAAGCTGA CAGTGACTGG TGGAAGGTTT GACAGATGTG ATTCTCTCTT GATATTGGGr CTATTCAAGC CTATGCCCAG CCTATCAACG AATTGGCTTT T'I-r'AGAAAA AGCCCTGGAG TTTATI'TGA TCAAGAAGAA TTTC1'CCTGA CTTTGAAGC C7?rCTCAACT GATTGTTGCC ATIGGAAAGG CTTGCGTAAC TTGGACAAGT TTTATCAGA'r CAGTCGCTAT TATrGTAGCA ATGCGGTTAC GCTGGGCT TTCTAGCCAT GCTTCCTGCT AAGTAG'rGAT TGTCTTTATC TTTrTGGGAAG TCAATTAAAC GATCTATGTT TGGTATCTGG TTGTCATTfTC AGCCATTTTC GTGAGGAAGT CAAGAGTGAA TTAACTAAGG CTGAGCATTA TATGAATTGG CAACTTATCT CTGAAAATTG TAGACGATTT GATGGTCAAA TAGAAGAAGC TATGTCTCGT CTTTCGCTCT GCACGTGAGA AATTATTGGA TTGGCAGAGT TGGATAGTGA TTAGATAATC GCTCGATTTA GCCTATGCTC AGTTAGGGAA TTAGAATACG ATGACTTAAC TA'rCAAAAAG CCACCCTCTA TATGAGTrATG GGTACAG7CA ATCGCTAAGC AAGGATTACA CAATTTCTT ATGAATTGCA GAAGACGCTG AGGATACAGA GAGCGTTATG AGGATATTCT TGGATGATTG CTCGTTCTTA CAAGAGTTGA CAGGAGAT TTGCGTGA.AT TGGGACATTT 8040 8100 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 GGCTTTACAT AACGAACATC AAGr'rCAAGA AGCCCTGCGT GAAAAATCCC TTTGAAACTC GCCTCTTGCT AGCTGCTTCA TGATGCTAGT GGTGCAGAAA ATTATCTCCT TACTGCAAAA AGAAATCTTG CTTCGTTTAG CCACTATTTA TCTGGACCAG AGAATTGCAG AGTGAGGAGC CAGAAAATCT T~rGACCAAG TCAAGAAATG GACGATTTGG ATACTGCTTA 
TGAGTATTAT GAAGGACAA'r CCAGAATTI'C TGGAACACTA 'rATCTA'rCTC TGAAGAAGCA AAAGTCCATG CTCACACTTA CTTAAAACTG GTTCCAGATG ATGTGCAAAT 9840 GCAAGAACTG TTTGAGAGAT TG'TAAGAATG TTTAACCCAA ATCATTCATA CCTCTCrCAA 9900 CTAGATGTAA CTTACAAAAC CCCTGACCTC ATGAGCCACT TTCTTCCTCC TCATGAGGTC 9960 AGTTI-ACTr TCTGCTGTTC CAGTATCGTT Tr'rCCTCGCT AGATTCCTC AAAAGGGCAG 10020 ACTCCTCCCT TGGTGCGTCA CACGATTTTT TCATCTCGAC TGTTCTTrAA TGCATCATrA 10080 ACGACGCTTT TCTTCTAGGT GGTrCATAAG GAACAGGAAG ATrCAGGTTG ACrTTCTAA 10140 TCCTAGAATA AAGTGCTGAA AACAATTCGG AATAGGCATA GAGACTAGAC AAT'rTGAGGA 10200 GCTGCTTGCG TCCTGTTCGA ACACATTC CCACCACGTG AAGAAAAAGA TGGCGGAAGC 10260 GTTTGATTGT TAAAGTNTGG AAGTCACCTC CAGCTAGATG 'ITrTGAGAAAA AGATAGAGAT 10320 *TGTAGGCGAT ACAGCTCATC ATCATACGAA T'TCGTTr= ATTAAGGTTG AACTATCCGT 10380 TTrATCGCCA AAAAATCGG 10399 12) INFORMATION FOR SEQ ID NO: 161: SEQUENCE CHARACTERISTICS- A) LE:NGTH: 9409 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 161: GATA.AGATTA AGTTAGAAAA GAAAGAACTA GGACATATCT ACCAGATTCA GCTTTTTAAT 6 AGCTATGGGC AGGAAGAAAT CTATCGTGTG ATTN'GATGG AGACCAATAT TAGTTCGGTT 120 TCAACCAATA TCAAGTATGC TGCTGTCTTG A'rTAATACCA GTCAG'rTGGA ACAGGCTAGT 180 CAAA.AGCATG AGCAATTGAT TGTGGTCGTG ATGGCTAGTT TCTGGATTTT GTCTTTACTT 240 GCCAGTCTCT ATCTAGCTAG GGTCAGTGTT AGGCCCCTGC TTGAGAGTAT GCAGAAGCAA 300 CAGTCTTTrG ITGGAAAATGC CAGTCATGAG TTACGAACTC CACTCGCAGT TTTGCAAAAT 360 *CGCTTAGAGA CCCTTTTTCO TAAGCCAGAA GCTACCATTA TGGATGTGAG CGAAAGCATT 420 GCATCGAGTT TGGAAGAAGT CCGAAATATG CGTTPTTAA CGACAAGCI'T GCTGAACTTA 480 GCTCGGAGAG ATGATGGGAT TAAGCCGGAG CTTGCAGAAG TTCCAACTAG CTTTTTTAAT 540 ACAACTTTCA CAAACTACGA GATGA'N'GCT TCGGAAAATA ATCGTGTCTT CCGrTTPGAA 600 AATCGTATCC ATCGAACAAT TCTCACAGAT CAGCTrCTTC TGAAACAACT GATGACCATT 660 CTTTTCGATA ATGCCCTCAA GTATACTGAG GAGGATGGTG AAAT'IGATT TCTTATCTCG 720 4*
GCGACCGATC GCAATCTTTA T'rTACTTr GATAAAAAGA AAATT GA CCGrTTTTTAT CZTGGTTTT GTTTAGGAT'r ATCCCTACCC GTTACTGTCA AAGATAATAA ACCCAAGGGA ACACCATCTA AAAACAAAAA ATAAAAATAT TCTTCTACGT TTTCG7"TTGA TAATAGACCG GATTGCTGCG GCAAAGGCAA GAGCAGTTGA GATAGCAATA CAGATACAAA AAACAGCGAT GGCATCACAT TGCGCI'TCAG GTCTATAAC TTCTTGATAG TCTACCTCAT AGGATTGTAA AGTACATACC TATTTTATCA TTTTTTCGGC AATAAAGTCC C=ITCATTT TCAAAGCATG TCTTATGGAA AAGATTGTCA TTACAGCAAC ACTCGAAGCT GGCGTAGACC GTATCTATGT AACGACCTrT AGTTATGACC AA?-TACGTGA GGAATTGATC CTTGCGGTCA ATGCTCTCAT TTTCTTAAAC T1rCTTGGAAG AAATCAAGAC CT'rTTACGTA GTTAACCGCG ATGGTTA~rC GGTAACTAGC AGTCGTCAGA T'?AACTTCTG TTTGCCGCGTI GAAATTCCAT CAGCTGAACT 'rGCTGCAAGTT T'rGGTTTACG GTGCTAGCGT AAACTACTAT AACTTTACAC ATATCGATGA GGC1'GAGCCA AGTGATCCAG AGAGCCACTA TATCTI'TGCC AACAATGACC TTGATTTGA'r CTTT'ACTCGC TGGAAACTAG AAGGGCTCTA AAAACTCTTT ATCCAAGCGC GTAGCTTGAT CTT 'GCTG GATGAAGAAG TTCGTAAACT ATI'TTATGAC TACGATCCTG ACATGGTTAG GATGCAALACA TT-rCTTCTCT CAATTTTTCG GGCTAGAATG CTCTATTCGA TGGGATTTTT ATTAATAAAA AATAGGATT TTGGTAATGA GGAAGCTGCT TTTCrMCG GGCATGATTC
AGAGAATTAT-TACAGAAAGG
GCTGATTTTrG GAGA.AATGTG TGCTGAAAGT ATTGAACAAG CGGTGAGAAA GATTTGGTC A.ATCGCTAAC TTGGTTCATC
CCTTGAGATT
GGTTTAATTC
TCTCCTTAAC
TTACAAAAAG
GTATAATTTT
TTGAACAACT
TTCGTCTGCC
ATGCTGGTA-A
1044 TCTGATAATG GAATCGGTAT TTCGACAGAA CGAGTAGACA AGGCTAGAAC CCGGCAAAAA AAGCAAATTG TAGA'rCCTCT AAAAGGAACT ACAATCTNG AAGTGAAGAT TGCCATTrCAG CGCTCCAATT GGGGCGATAT TTTGGATTTA TGAACT=N AAAACAAGTA AGCTGAATCC TAAT'1-TAAT GCTAAAA.AGA TAAAACTAPA 5*
GCACCAAGAT ATGATGGACC GTATCAAGCC AGACTATATT ACGATrGGGG ATGCAGGCGT ATTTAAGACC ATCTACGATG CTrCAACCAT GGGACAAAAG GCTGGCGCAT CTGAGGCTGT TTTCAAAATG CCAGAGATTT TGGAAATTCC CA'rCCATCAT TCTAAACGTC CACTCTTGCA TGAAAAGACG CATAAACGTG ACCTCTTCTT TTCCATTrTTT GAAGATAATC ATrGACCCA GATCAAATTA ACAGAATTGG TGGAGCATGG CACTCCTGGT CAGAACTG TTGAGATTC TCAAGAGGGC AAC 1'TAGTC ATGCTCAAGC TCACCCTAAA AACCGTTTCC TTGATACAGG ATAAAATACA TGATTCGTTG AGAGAAGGAA TATTTCTTCA CTATTTTACA AAAATCAGCA AAGAAAAGTA GTGTTCTTGA GTTTGAAAAT
TATCCTATGT
CGATTGGTAA
AAArrATCCA TTGCAGGTGC CAAATGCCC TCGCTATGT GTGGTGGA'1r AGTGGGAATT GTCGTG-ATTG CGrGATGTCAA TCCACATGAA ACCAACCTCT GGCGCAAGCA TGGAGGA'rGG GATTTTI'GTA A'I-1-A'T-r'T TGAAGGCTAT AGGTCrrwrT CCCTGAACTG TTCCTCTTAA ACACGCACAC 7'T'TTTACG GAAAAAGATG TGGCTGACGC TCTCCTATAT CTATCCTGAG TTCTCCAGAC CTTCTCTGGA GCCACGAAAA T-rGAAGTGAG GGAGGALACAA- AACCTGTCTC TCTGATTCAA CCGCTTTA'rC TCAATCCAAA AAAATCAAAT CATGGAAGAA ATAGCTTAAA GGGACCACAA AAAATCATGA AAATCGCTTA AGACAGAGAC AGGAGATTTG CAGACCTTCG TCATGATGGG GGAAACGTAG TCAAGAAAGG TCGTAACCAG ACTTGA-AGAT
CCCTTCGGATG
CCTGATTTTT
GCCCATAATrG
GAGCTAAGAA
GAAAAATATA
ACAGCCCTTT
ACCCAGCTTC
GAGTCCTACC
r'rGG'rCCAAG
CTATCTCAAG
GAAAGTTTTG
GCGCCGACAG
7TTrrGGT ATAATN-rr ATAATGAAAA TAGAGGCAAC TAGCACAGGT AGTAAGGCTA AGGACGGAGA AATCGTCGAT CACTATACGA CTCATATCAA AGAACTGACA GGA7TGACAG CGCAAGTI'GC CAGuAAAATA TTNGACTTGG rTCAGTTTGA TGCTAATCTC TrGGCGGAAA ACCCTCGTGT TGATACGGTC GAATTGGCCC GCN'GCCGAT Tq-rGTGTCGA GAA7TAGGAA CAGATGCCCA AGCTACAGCA GAA??ACTTC CTAAAGGTCT CTTGGAACGC TTGCTGGAAA TGGTTATTGA GGAAACTTAT CGCAACCAAT TTCAAGGTCT ATATPI'TAAG AAAACGGAAG ACrTTCTAA AAATATTTCT CTGTTGAACC CTAAAGAGGT TGGCT'TGCTA TTGAAAGATG GGATTGGGAA AACCTATGGC TATCT AC 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 GAGCGACAAA ?I'GTTCTTAG TGTTCCGACA GAAGGTAAAC GCCTCAAGGA AGTGTTCCAT AATTATCTGA AGT1'GGATGC CTTrTATCAT TTTAGACGCT TTAAAATGCA AGTCTTGGTC GATGAAATCG GGCAACTCTA CCGTTACCAA AATTTATCAT CCCAGAGCTT AT'rTGTGACG GCAGAGACTT GCAAGCTT AGTGACTAAT AATCCTGAAT TT-GTCAGTGA CCGTTTA7G AAGATTCT1'C
ACAGATATTC
TCCTTGCAGG
TGGC1TTACTG
CATTTCTAG
GAAGATTT
CATGCCTATC
ATTATTGATG
GATATACAAT
CAACAACGGA
TCTGGCAPLAT
GAATTrGGAAG TGGC 'TGAAG
CGTACTCTC
AAGTCCAA.AA
CTATTATCGA
TACTAGAAAG
CTAGGAAAAA
TAGAAGACT
TAACTGAAAC
GAT-rTTGTTA GC'rCTAGAAA ATCTGCTTCA AGAGACCTAC TTTAATTGAT AAGGCTTTAG TAGGAGAAGA AAACAGGCT2' TATr'CGCTT'r GAGTGTCTCT ACTTGATAGA ACAATTTCAG TATCTTAGAT TC-TCTGGACA ATCTCCATCA GTAT71rTCA TGATGAGC1TG GTTCGCTATT TTACAGCTGA AGG'rGATTAC GAGTCAAAAG AAAATTCAGA TTTCTTCTAC AAAATCAGGC 1046 TGTCCTCTTT ACrrCGAG AGTTGCCAAG TCTTGGGAGT ATCGGCTACT C7TGAGA~rA GTrCAGAGGGT
AATCTCGGGG
AAACcCTCr
TCCAGCAACC
TACTTACAGT
GCTTTGAAAA
A1T'rTCAAG
AAGAACCCTT
TTrC7rTGGrCA GACCTrAG GCTATCCTGA AGCTAAATTT GTCAAGATTG AAAACAGGAA CAACAAGTGO TCATGGTCAA AGATTTCCCT C =GTAACAG ACGAAG'rCTAT GCCAGAGAGG TAGCTGCTTT ACTAGTGGAA A7"rCAAGCrT CArTTGGTT CTCTTTACCG CTAAAQACAT GCMTCTAGCA GTATCCGAT TAGCCACTTG GCCCAGTATA AAAATGGGGA TGTTCATCAG CThAAAGAAAC AGGTGAACAA CAAATCTTGC 7"PCGTGCAGC AAGTTTCTCC CCATCCrrT C TGATTCAAC TrGACCGAG GCTTCCTTTC
GAGGGAGTTG
CAAAATCCTC
GACCAAAAAG ATTAATCAAG AACTGAATCA AGAAGGGAAA AATGCC7Ir GTTTAAAACA GGCTTTGGGA AGAAGTATGA ATCATTATCA ATTGCCAATG GACGTGAATA CCAACGTTCC ACGGCAAACA AATAGTAGCA CCGA.AGTTGA CGAGGCTATT AAAGTATAAG GT'rAGTATAT TCTTGCCAGT ATT'TTTTCTA ATGAITATCC CAATAATATT GGC7TTTGAT TGGT'rrTG CCTTTCTATA TA=TAGGC GCTTAGTTGC ATCAACGGCT AACCGTCAGA ATTTATGAAG 'rTACAAAGAA ACATAAGGAA GGATGATTCT CTrTACCATT CGGCTTGGT 'I-'rGTAGCC AAATTATTAT CCCAGTA'N= TTATTAGCAA GGACGGACGA ATCGGATTTIr GGCTTGGCTC CTCAAGGGCA GATTGCCATT ATCTGCTTAT CCCA~CGA GCTTTA7TTGG CTCTGTCCTG AGATTACTCT TAAATCAAAT
GCCATTATTC
TTAACTCTTA
TCTCTAGCAG
GATAGATrr'r
ATGAAACGTT
CTGGTCATCG
CTGCCCATTT
TI-rTGGATAC
AAGAAGCCAC
TTAATGAGCT
CTCTCGACTC
GTGTGGTGGC
TAGGGCAGCA
GTCATGCTCT TTAATACAGA 'rTGGGACTTA TGATCTTGCC GCCAAAAACT GGGTATCAAT ATATCCTATA TCCTCAGTT TCGACACGCA CCTTCCGCT CCAGTCCTAG TTCTTAGC GAGAATCGTC GGAAAACGAT TGTTAAAACC ATCTCTCGAT TTGATAAATA GTATTGTATG AAGAGTCGAT TACAGT'rGC TATCTATATA GCCGTTAGTC GGTCGCCTGG AN'GCCTTGG AT'TTC~wrrGG AAGGTGACCC GATTGTATTT TATAATCCKA AAATGG;AATT ACCC'rATTCC GCCTCGTGTC ATTGTCCAAT GGACT7'rC TTAATTrTC'r ACTTCAAAGT GACTTGGGGA A7TTATCAGGG GT'rrCrGGAk TGCI'GCTTTC TTAGCTATCT 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 :700 5760 5820 5880 5940 6000 6060
ATTTTCTCAG
GTGACTCC
GCT'DTTCTTC
AATCCCTTTG
GAATCCTTTT
TAACAGGAGTr
ACCAGATTGG
AGTTTGCCCA
GGGAGTGGTG GCTTATTTGG GAGTCAGATA TCATTT'rTAC CTTATTGCCC TCTATCTCAT AACCAGTTCT ACACTTATAT AATGCCGACC TACCAAATTA AACAACGACT TACCAGCAGG TCAGGGAT'rT AATGCTTCGA G=TATTCCA GAAGATTT GTT'GATTTAC CG'rATGTI'A TTCCCAGr ?GATTATGA TGTTGCTCI'T CCACATCTTT GAGAATATCG GTGCTGI'GAC GGATTCCrr TrGGTTTGCT
TCCCATTCAA
AAAGCrAGCAG
GTC'ITGCGTC
TCGCATGCAA
ATGATTGTTC
AT'rCAAGAAT CCAATrGCCC GCCT1'CATT
TTTATCGATG
ACGGAAAAAG
?TATATTAGC
GAGCCAATAT
TCCAAGTAAG
TTCCTGGAGG
TGCAAAGC'T
TCAATCAAGC
'rCGCAAGGGG GATCAGCTAT AG.=ACCAGA CTAATCTAGC GTTGTATTAA AACAAATTAA TCACGGCTTT GAAGAAAT'IG CACATGTGAT ATGGT'roGTT AGCAGATCAT GTCN'TGATG TATGCCTGGT TCTGCACATT CGAGCAAGAA GGGAAGAAAC AGAGATATTG AAAAA'rAAGC TGCGACTACTT CCN'GACCG TATCAGTAAT CTGATTGG1'C TGAAGAAAAG AGCGGAAAAG ATAAGGAGAA AATCATGGTA AAGCCI'GAC AGTTGTAGAT
TTGAAGAGCA
GAGATTTATC
TACCTGATAA
TAGCAGCCAT
GATACACTTG
GT'rCAAGAGC AAATCCTTGA TGCTCACTAC G'rCAAGGAAA CAGTAGTGGT TTGACAACCA GTCGGGGTCC TTCAACAGCC CTTGCCTTTG CCTACGAGTT CTAGCAGGGG ACGCAGAGAG TT'rACGAACA GGAATGCTCT AATCAGTAAA ACGGGACTTA TTCTCTCGTr TTTTATGTGG CT'rTTTTCAT AAAAAAA'rGC TATAATGAAG GGTATGAAAT TTAGGTGGAA CTTTACTGGA TAATTATGAA ACTTCAACAG GCACTGTATG GTATCACACA AGACCATGAC AGTGTCTATC CCT'TTGCCGA TTGAGACATT CGCTCCCAAT TTAGAGAATT AATGAAGCCA GAGAGCTTGA ACACCCGATT TTATTTGAAG GACAT'rrCAA ATCAAGGTGG CCGTCATTTT TTGGTCTCTC GAAATTTTAG AAAAAACCTC TATAGCAGCT TATTIACAG GGCT'rrAAGA GAAAGCCAAA TCCCGAATCC ATGCI'TTATI' AGCTCTGGTC TTGTCATTGG TCATCGGCCG ATTGATATCG CTTGATACCC ACTTGTTTAC CAGTATCGTCG AATTTAAGAC AAAGGAATAA GATGACACAA GAAATCAAAA ATCTGCAGGC AAATTCAAGI' TTTAGAGGGC TTAGAGGCTG TTCGTATGCG CAACCTCAAA AGAAGGTCT CACC.ATCTAG TCTGGGAAAT AGGCCTTGGC AGGATTTGCC AGCCATATTC AAGT~rwrAT CTGTTGTGGA TGATGGGCGT GCTATCCCAG TCGATATTCA
ATCGAGATGT
A.AAACTCAGG
ATCACGATTA
CTGCATGT
AAGCTTTAAA
T7TAGAA.AA
GAGTTTCTGA
ATCGAAATGA
AAGTGGTGAC
TAAGAGAAAA
AACCAGGTCA
AAGTATTAGA
ACAGGATTAT
TCCAGGGATG
TGTTGATAAC
TCAGCCAGAT
GGAAAAAACA
AGTAACGGGT
AGACTATCAT
TCAGACCTTG
TTGTGCGGCA
TTATGACGGC
AGATGGTrCAG
GGTGGAGCA.A
CTTTGGTAAA
GAAATCATCG
CATCTGGGAT
TGAAACATTG
GGTTTCTACT
GTACAAGGAA
CCTAT'G4GAA TCACGGN11MG
TTCTAGCTCA
GTATCAGATT
AGCGCAGGA
CATATAAGAA
GATGCCAGTC
TACATITGGAT
TCAATTGACG
GATTCGATTA
GGCCGTCCTG
6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 1048 CTG?=GAGAC CGTCTTACA GTCCTTCACG CTGGAGGAAA AGGI-rCAGG TGGTCTTCAC GGGGTGGGGT CGTCAGTAGT TACACGTTCA 'rGTTCACAAA AATGGTAAGA TrCATTACCA TTGTCGCAGA TCTTGAAATA GTGAGATA CGGATAAAAC CACCCGACCC AAAAATCTTC ACTGAAACAA CAATCTTrTGA GGATTCAAGA Gl'rGGCCTrr CTAAATCGCG GTCTTCAAAT GrrCCCGGT GGTGGATACA TAATGCCCTT TCCACTCAAT AGAATACCGT CGTGGTCATG ACGAACAACT GrrCACTTCA TTTTrGATAAA TTAAATAAAC TTCAATTACA GATAAGCGCC AAGG7rTGGA ACAAACCAAG CATT-ATCATT ATGAAGGTGG GATTGCTAGT TACG77rAAT ATATCAACGA GAACAAGGAT GTAATCTTTG ATGATATCAC AGTTGAGGTA GCCATGCAGT GT'rTCGCCAA TAATATTCAT ACCCATGAAG CCTTGACACG TGTTATCAAC GATTATGCTC
ATACACCAAT
ACACAACTGG
GTGGAACACA
GTAAAAATAA
ATAATTTAAC AGGGGAAGAT GTTCGCGAAG GCTTA6ACTGC CAAATCCACA GTTTGAAGGA CAAACCAACA CCAAATTGGG CTATACAGAC GGTGAGATGG TTACCATGAA AATGTCATGA TGAACAAGGT TTCCGTACAG GTTACTGAAA GACAATGAAG AGTTATCTCA GTTAAACACC AAATAGCGAA GTGGTCAAGA CATGGAAAAT CCACAGAITG TCGTCTGGCT GCCA6AGCGTG CAACCTTCCA GGGAAACTAG CATCGTCGAA GGAGACTCAG T'rACCAATCG
CCAAACGTAT
CCCGTGAAGT
CAGACTGTTC
CT.GGTGGAT.C
GTAAGATTTT
GTAGTCTTT
Gr'rACCAAAA T'rTTTTAAC 7rGCCCAACC
CCT~CTTCAGT
CGTAGAAAAA
CACACGTAAA
TTCTAATAAC
AGCCAAATCT
GAACGTTGAA
CACAGCCATG
ACTCCTTTTG
CTTGATI'TAT
ACCAATCTAT
GAAGCt=T
GGAATTTTG
AAATCTGG7"T
CCTGCTGAAA
GGTCGTAACC
AAAGCAAGTA
GGAACAGGAT
ArTCACCGATG
CC'N'ATATGA
GGTGTCAAGG
CCGATTTCCT
CTGCCAAGGC
TGGAAATTTC
CAGAACTCTT
7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9409 GTGAGTTTCA GGCTATCCT CCAATTCGCG TGGATAAGAT TCTAGCCAAC GAAGAAATTC TTGGCGCAGA ATTTGATGTT 'rCGAAAGCCC CCGATGTCGA 'rGGAGCCCAC ATTCGTACCC AACCAATCCT AGAAGCTGGT TATGTTTATA 7rGAAGCGA GATTAAAGAA 'rATATCCAGC AAGCrTTAGC CCGTTATAGT GAAGGTCGTA TAGGTGAAAT GGACGATCAT CAGCTGTGGG CGGGTGCAGA TCAAGAAATC AAACTCCAAG CCAAACCGAC TATTCAGCGT TATAAGGGGC GATGTGCAGA AAACAACCAT GGATCCCGAA CATCGCTGA TGGCTAGAGT TI'CTGI'AGAT AGCAGATAAA ATCTTTGATA TGTTGATGGG GATCGAGTTG TCCTCGTCG
INFORMATION FOR SEQ ID NO: 162:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6415 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 162:
CCTCGGAAAG TCTTGAAAAT TATGATAGAA TGGTGGAAGG AAAAATTCAG GAGAGTACTA GTGACTCAAA ATGTwrGAAAG AAATATCTGC CTGGTCTAA'r GAAATTCTAT TTATA.AATGC ATAAAGGAAG ATACAGAG'TT GCTAGTGGTr TTAACCTGGG GCTCATTCAA AAGTTrACTGA GAAT1'TGTCT GTGGGGGGCC ACCTTGCATC TTGTT~GAGGA TCTGAGGATA GATATGTTTC AAGGTTGGTT TAGTAAATGA ATTCGACAAT ATGGTTATAA CGACCAACAT TCAAGAAPAAT ACAAGTCATG TTCAGTTTAA TCTTCTCGTA TCCATTGTAA TCAGTGCATA CAATGAAGAA 'rGAAGACTTA AAAAATCAAA CCTATCCTAA AGAGGATATT TATGTCCACA GATGGGACCA CAGCTATCAT TCAGCAATTT TAACTCAATT AGA'rTG1'ATA ACA.ATCCTAA GAAAAATCAA AGTTAAACAT TCTGTAGGGG ACC rTAT=r AAAAATTGAT GACTI'TTGTA ATGAACAATG TGGCTATTAT TCAACAAGGT TAGACCGACG ATTGTCGAAG GAAAAGGAAA ATGGGCAGAG AAATATGTrT GGCAGTAGCA TTGCCAATTA TCGAAATAGT TTCTATTTPT CATGGAATGT ATAAACGAGA GGTTTTCCAG
GCAACTTGGC
AATCCGCTAT
GCTGCATCAA
GTGTTTATCA
CGAACTGAAG ATAATGATAT AGCCCAAGTA T'rCTATCTTA TTGAGTCTTG TGTT'rAGTCT AGCATTGTTA TTAGGTGCCT ATTTTCTACT TTTGTCATTA AATGGATTTC TAATTGTGAT GCCCTITrATT GGGACGATTG TAGGITTAAT TAGAGGA'r1T ATTTATT1TGG ATAAAATAAG CCAAATAAAT
AAGTATTCAA
TTATTTCACT
CCGATCACAT
CTCACTTTGC
TTATTTTCCA
AAATGGAAGA
ATGGTTTGTG
ATGTTCCTTG
TCGTATTCAT
TGACTTTATT
TTCACTTTGC
AGGAGTACAA
TCATTATAGA 660 TCAGTATATT 720 GATTGGCTTG 780 TTTATTTGTT 840 A.ACTTTACTA 900 AAAACATAAA 960 TTATGGCCTT 1020 GAGAACAATA 1080 ATATAGTAAA 1140 CAAAATATGC TATAATA.ACA ACTCTrTAA OGAGGAGTAG ATTTCTATGA ATAAAAAACT AACAGATTAT GTGATTGATC TGGTGGAAAT TTTAAATAAA CAACAAAAGC AroGT=TCTG GGGAATATTT GATATTTTCA GTATGGTGGT TTCCATCATT GTATCTTATA NrTATTTTA TGGGCTGATT AATCCAGCAC CTGTTGACTA CATTATCTAT ACGAGTTTGG CCTTCCTGTT CTATCAATTG ATGATTGGTT TTTGGGGGTT GAACGCGAGC ATTAGTCGTT ACAGCAAGAT TACGGATTTC ATGAAA.ATCT TT1TTTGGTGT GACTGCTAGC AGTGTCTTGT CATATACTAT CTGTTATGCC TTCTTGCCAC TCTTCTCCAT CCGTTTCATC ATTCTCTTTA TCTTGTTGAG TACCTTCTTG ATTTTATTGC 1200
CACGGATTAC
ACCGTCGGAC
AACATCCAAC
GTCAAAAACT
AACGCCATCA
AGCGTATCTT
AAACTGTTG'r
ACCTTTTCGG
GTAAGACCAT
T'rAGTCGCT
TTGTTTATCA
ACATTCAAGA
ATCATGCGGC
AAAACAATAT
AGATGGTTAT
1050 TTGGCAGTTA ATCTACTCCA GACGCAAAAA AGGTAGTGGT GATGGAGAAC CCTTGATIT GGTGCCGGTG ATGGTGGCGC TCTT=rTrG GATAGTTACC CAGTGAATTA GAACTGGTCG GTATTCGA TAAGGATTCT AAGAAAAAGG TGGTGGTA'N' CC'TGTrTTGG GCTCTATGA CAATCTGCCT GAATTAGCCA AATCGAGCGT GTCATCGTTG CGATTCCGTC GCTGGATCCG ?CAGAATATG GCAGATCTGT AATAAGCTGG GTGTCAAATG TTACAAGATG CCTAACGG TCAGGGCCTT CACCAAGCAG GTACTGGCTT CCAAAAAAT GATATTACGG TCGTCAGGAA ATCCG;TCYTG ACGAATCGCG TCTGGGTGCA GAACTrGACAG CTTAGTCACA GGAGCTGGAG GTTCAATCGG TTCTGAAATC TGTCGTCAAG CAATCCTGAA CGCATTGTCT TGCTCCGTCA TGGCGAAAAC TCAATCTACC TGAATTGATT CGTAAGT'rCC AAGGGATTGA TTATGTACCT GTGATTGCGG CTATGATCGT TTGTTGCAAG TCTTTGAGC-A GTACAAACCT GCTATTG'T AGCCCACAAG CATGTTCCTA TGATGGAGCG CAATCCAAAA GAAGCCTTCA CCGTGGAACT TACAATG-TTG C1TAAGGCTGT TGATGAAGCT AAAGTGTCTA GATTTCGACA GATAAGGCAG TCAATCCACC AAATGTTATG GGAGCAACCA 0 C C 8* *S bC
AGCGCGTGGC GGAGTTGATT CAGTTCGTI'T TGGGAATGTT AGATTGC'rGA-AGGTGGGCCT CCATTCCAGA AGCTAGCCGT TCTTTATCCT TGATATGGGC TTCrAAGTGG CCACACTGAA AAAAACTCTA CGAAGAACTC AGATT'rrCGT TGCTAAG4GTT AG=rCCGCAC TCTCAGTGGA CAACCCACAT TGAATAAAA.A TGCGTTT-rAT TATGTAGAGA GTCACTGGCT TrAACCAACC CTTGCTAGCC GTGGTAGTGT GTAACGGTGA CAGACTI'CCC
TAGCCAATCA
CATTCCAGTC
TATGACCCGT
CTGGTTATCC ATGCTGGTGC TTATCCCA.AA
AA.ACCAGTCA
AGTGAAATTC
TTrGGTATCAA
AATGTCATGC
GATGAGTTGA
AGAAAAACGC
CTTATACTCT
AGATTTATGA CTrGGCCCAAG CAATCGTTGA AGTTGGAATC CCGAACTCGT TGATAATCAA CTI'TAGAATC CATCAATCAA AGCAAGCTAT TATCGCCT'T ATAGTATCAA GTrACACAAC TCGAAAATCT CTTCAAACCA AGTTCTATCC ACAACCTCAA CrCAAAACAG TGrr11'GAGc TGAGC'rGAcT TCGTCAGTTC CAGT-CCATC TACAACCTTA
ACCTACTGTG
TTT'GAACGTC
TACTTTATGA
GA'rGGGGAAG
AAGATGG=TC
CGCCCAGGTG
GTTATGGATA
AAGArTGGAG
GCTAATCAAA
CTTGGTAATA
CGTCAACGTC
AACAGTGTTT
TGACtTCGTC
CATCCACAAC
AAACAGTT
1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 .3000 3060 3120 3180 3240 3300 3360
GCCTTGCCGT
TGA~ytGACT AGT'rCTATCC
CTTAAAACAG
ATATGGTTAC TGACTtCGTC TCGTCAGTTC TATCCACAAC ACAACCTCAA AACAGTGTTT TGTTTTAGY -TGACn.-TCG? 1051 7rGAGCTGCC CGCAGCTAGT TTCCTAGTTT GCTCT7TGAT rTCATTGAG TATTACTTCA 'I-T'rCTTCTG AAATGCAATT GTTACCCAGT CTATGCTArr GAAAATACGC CAAAACTICT AAGGGTrGT GAGCGATATA ATCAGGTTGA TAGITTAGTA GATCTGCTTG CTCTCCAAAT
CCCCAAGTGA
GTATCTCCGA
ATGACATCTG
TGGATTTCCA
TAGAGTGGAT
GCTTCATACA
TGGTCT'TTGG
TGGCCAATTT CTGAATACCT GrrCCGAG CTCCCAGCAT ATCAAACTTG TGATG.ATGGC TTGIrCTGGT GCTAGTTGAT GTGTCTGCAA GGC7T=GGA CCTTATGGGG TGCTTCAGGG CTAGAACCAT ALAATGCCATC AAAGAAATGA AGTTTTTTGC CATGTCTTGA GCAGTAGATG 'rATCCTTTGT CGTGGTGATG AACTGCTCGA TAACTCCTCA AGCAAGTCTA TAATCTGAGG AAAGAGTTGA TGCCTTTTGC C1-rATACTAA GAACGATATA TCTGCACGGC TCAGAAATT ACAGGCAGGT CGCAAAACTA CTTTCGAGAG GTGGTCCCAT AAAACCACCA ATAGTTTTGG CATCAGGGCT ACGCACCCCC TGAATCCCGA TAGALACTATC AACGAGGGTT GAGGTCATGG TTTCTCCTAT TTGATAAGCT CGACCAGTAG GGGTGGTAGC GAGTCCACCT AGC'rC'TTAA AGGTATAGGT CCATCCAAAT CGAAAAAAAT TATTCTCCGA AATTTCT TCAGCTGTTT CACGAAAGGC ACTTCATCCA CAGGGA'TTT GCAAAGCTAG CTCCCATGGC GGG'rCACAGA TGAGGCCTAG
AAAGGCATTG
CGCTGTGATA
TGGAG4GCGA
AGTTGGCATG
AGATTCGATA
ArTTACGTTTG CATATTT7TTA CTTGCTCCTA CTTGGTACAT CCTGCCAAGG CCATGTCTGC ACACAGGGA.A CTTCGACCAA ATGACAAAGG CAATAGCTTG GCGGCAGCAC TCATACCAGA GAGATGGAGG CATTGTMTC AATTGT'rGCT CGTGGCTGAG CAGCCAGCAC TTCCAGCGGT TTGACTGCGA TGGCATTTCG TTT'rCGATGT AGTGATCCAA T'TrTCATTGA GGCCAACI'G AGGAAGACTT CTTCACGTTC GCGACAmTC CNTGAAAGTC ATGCTTCCrC CTATTTAAAG CGATAGCCTC ATCACAGTTG G4GCATCGATC
'TGCGATGAAA
ACCTGCAACA
ACTGGCCTGA
GGCTGAACCA
GATGACTAG;T
GCAATT
TGGAGTGGCA
GGCAGCCGAG
TI'GGCAGCA
3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100
TAAGGTGTTC
ACTTrCACCTT
CCAAAGCCAC
TCAATAGCAG
CAGACCAAGC
AGAATCGTAT
TCTCCACCTG
CACCTrGCACC CAGAGTCAAG GACACCCACC CTCAGCACCT CAGCAGCAAA GAGGAAATCC CAGTGAGAAC GGATGGCAGA CCATTTTGGC A'N'GTGTTCA AATCTGACAG AGT=TCCG TCAGGCCACT ACGACATTTA CCAGATTGCG TTCCATGAGA CTGTTGTAAT CATGAG=rCT ATTCTI'TGAT ACAATAAAAC AGGGAT"=? CGAATTTCTT CATAATGGCT T1TTCACCAG GACAGAGGCT TTCATAACT3' GCGACCGGTC AATTCAAACT
CAGATCTGCT
AAATTIGACAT
CGACTGTCAA
TGCTCGACCA
TGTGGAGATG
CTTCGATAAT
1052 CTPTTCACG AGTGACATTC ATCTGGGCGA TATTGATACC ATAGCGGGAA AGCGCCTCTG TAACAAGGGC AATCATACCT GGAATATCTr GATGAACGAT GATGATAGTC GGTGTATrCA TATTGAGAGA GACGGCAAAA CCATrGAG'rT CGGTrACCTG AATATITCCT CCACCGATAG AAATACCAGT CACGCTGATG GGTGAGGGGC ATTGCTGTCT GTCTTGTGGG CATTTrAAC AGTAAT'rTrA GTGGTGTTAG 'PTCTGAATGG TCCAGACAAT CTrGATACCA CGCT1'GTGGC ATTTCAGGAT CATCTGTATC CATTCCTAAA ATACC7VGCAA 'rGACCACGAT AGGTCTTGGC AAATGAGTTA AAAAGrrGGA
CAATTTCCAG
CAAGGGCTAG
A?1'CAACTI'C
CACCAGCGGT
A'rTGAAAACG
AAAGAATGAG
TAAAAATAGT
TAATATTTCT
AGAAAGAAGG
CAGGAGTAGC
CAACTTACAC
CAGTTGAAAA
AAGAGTTGGT
TGTCGGAGTA. TCATCAAAAA TGGAAGAGAC AATCTTCCCA ATACGAACAG ATGGCTACTA GATGGGCCAA TCATAACTGG TCCGATGATA TCAAA GACAG AAGTGATrTC GGGCT'rGGCT
ATCTTTTATG
ACATAAAAGT
AGAATTTTCG
TGCCTTGTTT
TGTTAAAGAA
ATTGGCAGAA
ATCAG'IrTCC CCTTArAAAA ATTCTTATCr CTATTATATC TTAA'rTGTCG A'rGAAAACCT TTCTAATACC TCAAATAGCA ACAAAAAACA CCTrATTxTAG GGAAATAAAA AATAATTTTG GTCA6AGAAAC GGTAATATr'T AAAGGGTATG ATAGAACTAT AATATGAAAT CAATA.ACTAA AAAGATTAAA GCAACTCTTG GCAGTAT'rTG CTCCATCATT TGTATC'rGCT CAAGAATCAT GGTGATACAC TTrCAGAAAT CGCTGAAACT CACAACACAA AACAACCACA TTGATAACAT TCATTTGATT TATGTTGATC
ACTATTTGGA
GTCTGTTCCG
5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6415 TATCGATGGC CCTGTAGCGC CTGTT-GCAAC ACCAGCGCCA GCTACTTATG CGGCACCAGC CGCTCAAGAT GAAACTGTT'r CAGCrCCAGr AGCAGAAACT CCAGTAGTAA GTGAAACAGT TGTTrCAACT GTAAGCGGAT CTGAAGCAGA AGCCAAAGAA TGGATCGCTC AAAAAGAATC AGGTGGTAGT ATACAGCTAC AAATGGACGr TATATCGGAC GTTACCAATT AACAGATTCA TACCTGAACG GTGACTACTC AGCTGAAAAC CAAGAACGGG TACCG
INFORMATION FOR SEQ ID NO: 163:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8494 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 163:
TACCCCTTTC GAATTTTGGC AAAAATTCGG TAAGGCTTTG ATGGTAGTTA TCGCGGTTAT GCCGGCTGCT GCTTTGATGA T7TCA.ATCGG TAAGTCTATC GTGATGATTA ACCCAACCTT TGCACCACTT GTCATCACAC GTGGAATTCT TGAGCAAATC GG'rTGGGGGG TTATCGGTAA CCTTCACATT 'XTIV=CCC TAGCCATTGG AGGAACCTGG GCTAAAGAAC GTGCTGGTCG TGCrTCGCC C GGTCTTG CCTTCATCI'T GATTAACCGT ATCACTGGTA CAATCTTTGG.
TGTATCAGGC GATATGTTGA AAAATCCAGA TGCTATCGTA ACTACTTTCI' 7rGTG=C AATCAAAGTT GCTGATTACT TTATCAGTGT 'rCIrGAAGCT CCAGCCTTGA ACA'rGGGGGT ATTCGTAGGG ATTATCTCAG GT~r=GG GGCAACTGCT TACAACAAAT ACTACAACTT CCGTAAACTT CCTGATGCAC TT-TCATTCT'r CAACGGGAAA CCTTTCGTAC CAT?1'GTAGT TATTCTTCGT TCAGCAATCG CTGCAATTCT ACTTGCTGCT TTCTrGGCCAG TAG'rTCAAAC AGGTATCAAT AACTTCGGTA TCTGGATTGC CAACTCACAA GAAACTGCTC CAArrCTTGC ACCA'rrCTTG TATGGTACTT TGGAACGTTT GCTCTTGCCA TTTGGTCTTC GACTA'rCCCA ATGAACTACA CAGCTCTTGG TGGTACTTAT GACATTTTAA
TAAAGGTACT
AAACCTTAAA
TCGTTTCAAA
TATCTACCGT
CALAGTATTCG GTCAAGACCC ACTATGGCTT GGTACTGATG CTAGTCAATA TCAACACTTG GTTGGACAAA TGATCGG'rTC ATTCG4GTATC AATGTTGATG CTGACAAGAA ACATAAATAC
GCATGGGTAA
TTAGATACAG
'rTGATGGGTG
AAAGGTATGA
GAATACATGT
CCTGCCTrcc
ACCACATGTT
CTGGTGCAC
CAGACCTTGT
TACATCCAGC
TGATTGTTGC
TGATTGCAAC
TCATGT'rCAT
CTATGGCTGA
180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 AGCTCrGMCA ACATTCTTGA CGCAACACCT ATGTATCTTG
CGTCGTAAAC
TGCAATCAGT
TGCTGTAATC
AGGGCGCAAC
CTACGTATGC
GCTGGTAffTG
ATGTACTTTA
GGAAACTACG
CAGGCGTAC
TTTACTCACT
ACTCATTCGG
GTATGGATAT
'rCGCAAACT
AAACTGCTGA
CTGTAAACAT
TGACTCGTCT
'rGAACCAATC
TGTTCA.AGGT
TTCA.ATCGAG
CGTTAACTTrC
CATGATTCAA
AGGTTCAGAA
TATCAACCTI'
TCGTGTAACT
AGTTGCAGCA GGCTCTCAAG CGTTGATGTT GATGCATGTA AGGAAATGCA GAGCAATGGA GGTTCAAGCT ATCTACGCGTC TGATTCAGGT GAAATCATTC CACTGTTCAc TTCAAAGATC TGCTTrTGGAA CAAGTAAAGG TTCTTGACTC GTACACCTAT GTTTGGGTA.A CGTTCTCTT AAATTCAACT ACGCAACTCC GAAACCAGCA GCGAAGI'CAA CTTGGTGGAC GTGTAAACAT GTTAA.AGATG CAGATAAAGT CTTGTCATGA AAGGACAACG TCTGATATCC APLGATATCCT ATGACTCAAC CACAACAAAA GTAGCAGACG GTCAAGTTGT ATCATGGGTG ATGGAN'GC GGTACTGTGT CAAGCATCTT AAGCAGA.AGG AGCTATGGGT CAAAAGCTGA CATTTTGAAA CTGAXACTCr TCCAAGCCAA TTACTGAGGA AC?17TAC'TCA ATCCAGTATT TGCCAAAAA AGTAGAACCT GCAAATGGAA ACATTGTATC TCCAGTTTCA 1054 GGAAGCAGGT CT74CAAGTAT TGGTTCACAT CCCAACAAAA CATGCTTTTG GTANTGTGAC TGGT~TGGAC ACAGTAAGTC rTGAAGGTAA AAAAGTTGCA GCAGGAGATC ACGTGAAACT TCAACAGTAG AGAAAAAACA G=?CTCTTG TTGAGGTTGG AAGCTGATr ACTCAATACT CACAGTTrGGA AGATATTCTT GAAAAGGACr CTCGTCAGAG GTGGAGGN'A CCAAGACCAT TATGTTAGAC CTGGACCTGG GCCTATAACC GTCTAAAACA CCTATTGAAG CTATCATACT CGCCGTCTrG 'rGCCAGTGT'r CATCTCTCTT
TCCTGTCAC
TTGTCTTCAC
ACCATTTACA GTTCATGTTG AGCGACTTG GATGCTATCC AAATGGTGAT GCAATTAAAT CAGCTAAAAC AGCAG~rGCT AAAGTAGAAT CCAACCTCTT AT'TTTCGGAG AAAAGAATGA TGGAGAAAGA AGCAGAGGAA AAATTCCAGA ATGAT'I'GAT TTGTTTTCAA GAAATCAATC ATGACCTTTA TCALAGCTTTG CCAGCAGCTG TCITGTGA AAAGTTGTCT GAGCAAGCGA ATATCGGCTA TAACCGCTAC CACGAAGGTG
CTGAAGGACA
GTGCAGCAGG
CAGTrAAGTT
TGTAATATAC
AATrTrAAC TTTTGTGcGrA
AGGAGATGAC
AGCCTATTCA
AAAATTACTA
TGGCTATCTT
ATCCAACAGA
AGCTAGCAGT
CACGATTTGA
ACAATCCGGC
ACGCATTTGA
GCTGGAAAGG
CGGTGGAAAA
ATGGCTTGAA
CCAGAGAAAT
CCCTAGCTGA
GGTGGGATAA
TTT'=~rTA
AACTGTAGTC
ACGTT1TCCAA GGCTGTCTTG AAAAAATTGA ACAAGCCACT TTTACTAGCT 'rGGACAGGAA GGTTACCAAG CTATTTTAGC TACTCCArI'A AGTTGCTCAkA GAGAAAAGTG GTAGCTATAC TGTrCCGCCT GAACACTGAA CCCCTTCGAA TCGATTATGT CTTTACTACC TTTACA'rGTC G;TATTTGATG GTAACAAGAG TCCACAAGTG TGCTATATTA AACTGGAAAT AATAACTGAA AAGAGGTTGG 'rTCT'rACTAG AGAAGCTACT GGAAATAGCC TAAATAACTC AATATGGTAT AATTGATAAG GTAGATAGAA TCGACGATGT AA'rTTAAAAA CTATATTAGA GAAGCC~rGA AGGAGTTAAA TGCAAGACAA GTTGATTCCT ATI'GTTTTG CAGGTCGTGA CAGGT'rCAGG TAAGACTCAT ACTTTC7TTGT TACCGA7'T= GCGATAGTGT ACAAGCAGTG ATTACTGCAC CGAGTCGTGA AAGTAGCGCG TCAGATITCA GCTCACTCAG ATGTCGAAGT GTGGTACGGA TAAGGCTCGC CAGATTGAGA AAT'rGGCAAG TTGGAACACC AGGCCGTATC TACGACTTG TTAAATCTGG CCAAGACATT TG7TGT'rGAT GAAGCAGATA TGACC=rGGA
GAT=TGATG
GATGGCAAGG
GAAGAATGG
GGAGATTTCA
GGCTTACAAG
GAAATTGATG
AAAGAGTTAG
AGTGATCACT
1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 AACTATAAAA TTCCAGCCTT AGACTACTGT AATGGAATAA TATGTCATTT ACGAAATTTC A7TTACAACT CCAACAGALGG CCTAGTAGGA GAATCAAAAA CCAGCAATTA GATGAAGCTA GTTGGCTACT CAAA'rTrACC TCCTGTGGTI AATTATGTGG CAATCAGCCT CATATTGTTA TGATTTAGC'r ATTCATAAAG TATGGGATTC TTGGAAACTG 1055 TTGATAAGAT TGCTGGCAGT CTTCCAAkAAG ACTTrGCAATT CATCCTCTTC 'rCAGCGACTA TCCCACAAAA ACTGCAACCA TT GAAAA AATACTTATC AAATCCTGTT ATGGAGAAAA TTAAGACCAA AACGGTTATT TCTGACACCA TTGATAATTG GTTCAT'm'CG ACCAAGGGAC ATGATAAGAA 'rGCTCAAATT TACCAGTTGA CTCAGTTGAT GCAGCCGTAT ?TGGCAATGA TrTTGTTAA CACTAAAACG CGTGCTGATG AATTGCATTC ATATCTGACT GCTCAAGGCT 'rGAAGGT'rGC AAAAATCCAT GGCGATATTG CCCCTCGTGA AGGTGCAAAA TCTGGATTI'T GAGTATATTG TCGCAACAGA ACATTGAAGG TG1'CAGCCAT GTCATCAATG ATGCCATTrCC TTCATCGTGT TGG'TCGTACT GGACGAAATG GCCTACCAGG AGCCAAGTGA TGACTCCAT ATCCGTGACI' 'GGAGAAATT AGATGGTCAA AGACGGGGAA T'r'CA.AGATA CCTATGACCG AGAAAAAACA AGATAAACTT GATATCGAAA TGAT'rGGTTT ACGCAAGCGA ATCATGAA'rC
TTI'GGCAGCG
GCAAGACI'TA
TACAGCTATT
GGGA-ATCAAG
TGATCGTCGT
GGTTAAAAAG
TGATGAAAAG
TAAAGCTAAA
ATTTATGACA
AAAATGGTAT
AGTTTTCCTA
CGTGGCATTG
TCT7"=TTG
ACCCTTTATC
TTTAGTCCTA
GCCAACCGTG
AAAAAGAAAA
CGCCGTAAAA
CGCCAAACAT
ACGA.ACTATC
AATGATAAAG
TCAGTACTTT
AAGTCAAACC GGGTTATAAG CCA.AGCGTGC TGAAAATCGC 'rTTAATAGPLA ATTGTTGGAG TAAACCGAAkA CACTACATTA 'rTATATAGTC CCGATAAGAT GTAACTCTAT AACAATATTT ACGTCTGAAT CTGTATCTGA AT'TTTGGATG CTATTrTrAGC TATACTGGTT CTGTCCACGT CGTGTGGTTC GTGATACCAT GCCTGAGACGG TGGGACTACA GTTAACGAAG CCT'rGGAGGT GCAGGTGACC AAGGGCTCAT TTGCCAAT'rG CACTCAGTCA GAAATTAGCT ATC'TCCGTCC GACCGTCCGG TACGTGTAGA AAGAAAAT'rC AATGGGCGCT GCTCGCGGTC GTGCAGAGCG TATTGAGCTC CAACTTTTTT AAGACTGCAA ATTGCGATTA GGTAGGTATT' TATTACGAAG 3720 .3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400
TTTAAGGGGG
GGGGCATCCG
AAAGGATCCA
TTTGGTGAA
TGCAGAGATT
CCCATCTTG
TCGTGGAAAT
GACATTTTA TGTCAGAGCG TAAATTATTC GATAAGATTG CAGACCAAAT TTCAGATGCG GAGGCGCACG TTGCTGCTGA AACAGCTGTA ATTTCTACAA ATGCCTATGT GGATATTAAC GGTTATACCA ATACAGAATA TGGA7=TCT GTGGAACAAT CTCCTGACAT CGCTCAAGGT GCTGATCAAG ATCCACTGGA CTTGATTGGA GT-rTGATTT GCAGTAGATG
TAAATTGGTT
AGATGCAAAA
TACAGTCGTT
CGTCGTCTGG
TCACAACTTA
ATTTCTACTC
AAGGTCATCA
AAACAGA.AGA GCTTATGCCA CAGAACTTCG TAAGTCTGGA CAGTTGAGTA CGATGAAAAT AGCATGATCC AGAGGCCACT AAGAAGTTAT TCCATCTTCT AATGAACAAA TCCATCAAGA TGTGATTGAC
TATCTTGATG
CCTCAAGGGG
'rCTCGTCATG
TCTTATCCGG
GAAGT'GCAGT
TCGGTACAG
cTTCGCCCTG TCGGCT1TACG
GTAGATGCTT
TATAGTTTTT
TACATGTATA
ATCAATTTGA
AAAPLATOATA
AAAG7I1TTAT
ATGGACTCAT
AACCTGTT
TTTCTTTT'CC
GGTATAATAT
CAAATTGCTG
ACGATAGAGG
GCGACCTTGG
TCCATCAACT
GATTTGCAGG
GAGGAAACCC
CTAAA'rCTTT
ACAGACCGGA
CCACCTTATT
1056 ATAAGACAAA A'rrCTTTA'rC AATCCGACAG GTCG71rrMG AATCGGTGGT AC3'CAGGTTT GACTGGTCGT AAGATrATrG TAGATACTrA TGGTGGCTAC GTGGTrGGTGC C7TCCTGGT AAAGATGCGA CTAAGGTGGA TCGTTCAGCC CTCGCTATAT TGCCAAGAAT ATCG=rCAG CAGACCTrGC TAAGAAGGCA TGGCCTATGC TATCGGTGTT GCGCAACCTG TTTCTG?1'CG TATCGATACT GAACAGTAGC TGAAAGTCAA c=GAAAAAG CGGCTCGTCA AATCTTTGAC CAGGGATTAT CCAAArGCTG GACCTCAAGC GTCCAATTTA CCGTCAAACA GTCACATGGG ACGTACAGAT ATTGATCTrC CATGGGAACG TTrGGATAAG TGAALAGAAGC AGTAAAATAA GATrTAAGA GGGGAACCTC CTCTCTINT AACTATACTG GGATACTGTT CTGAAAATCC ATTTrTGCGAA AGTAGAGAT GTAGATTGAA ACTAGAATAG TACACCTCAA CTTCTAAAAC ATT-GTTAGCA CTGTCCTGAT CGATTTCTCC TGTTCTTGTT TCAT~wr'ACT ATATTCmT AAGGTTAAGA TTTCTCCTCG TAATAGATAA TCTTGGGGAT ATTTCAATCC TCGTATCAC T'PGAC'rATC CAAGGTTTTC TAGAGC2ACA GAGTCATGGA GGTTGAGATT TCTCCTGTT GCTrGGACTT CATTCAAAAG TCTGTTACCC CAAACTCIA ATACACTAGC TGT'rTCCATA GCATGACrTC 'rGTACTrAGAC GAATAPAATAG ATAGAACCAC AGAATCTAGT AA.ACCTAGAA TrAAAATTAT 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 TAGCAATAAA AGAAATCTGG AGGATTAGAA G'rTTAGTT TGACAATTGC TTGATGAATG AGTTAGAAGA GGTCAAAAAC TCAGCGGCAG ACTTCCGTC-A GGGGAATCCT GAGCCACGCT CTATGGGCTT GCCAAATAAT GGCTTAGACT AAAAAGAGTC GAACCGAACT TTCTTCTAT ATACTATTTT GAAAAAAGTC CAAGAGAGTG CCTGTCCAAA TGTTCCAGGT AAACCTCAGA VTTTGGCAGA AGTGTITTGCT TAC=CACCA TTGATATTGT TCACNrGAC CAAGCGGCAG TCATGGTATC AACGAAAACA CAGCAGGTGT GGCTTGTATG GAACCTTTGT TACTAAGACA ACCAAGATGT TCCACTTGGT ATTATTI'GGA TTATC"TTA CTCTGGTCGG CA'rGTCTCCA ATTTTCGTGG TCTGACTGAG TrGCCTATGA TTrTGAGACA AACCTC7"GG AATTAAATTG CTATTTTCAA CAAATATCCG TCTATATAGA AGACGAATCT GAGAATACAT CAAACCGACT CTCAAATCCA AATTATCGGA CTCAAGTTTG TCAACTGCGT TAACTCTATC GGAAACGGCC GTCGTTATTC GGCCTAAGAA TGGTTTTCGT GGAATTGGTG GCTTTAGCCCA ATGTTCACGC CTTTTATCAA CG =TAAATC ACAGGTGGCG TTCTGACTGG TCGAGATGCC rTTGAACACA TcCrCTGTGG G'rGCAGGTGG GAACGACCCT TCACAAAGAA GGCGTCAGrG CTr rTGACCG GAACTGAAAG CAATCATGGTI GGAAAAAGGC TTGCGCTATA 
TTGACTAAAT TAAATCGAAA ATrGA.AGAAA GTCAGAAGTT GACTrTATCA ACAGATCAGG GTCAGT'rTGA AGTGATGAAG TCTGCTGACC TCAACN'TGC CTACTrTGAT cTGGAGT'rGG TCAGCCTrCC TTTCTTTGCG ATGGATATCA CGACTGCTAA GAAACCr'I- GAATACC'TTG ACAATCCTTC TCCAACAACC GATAGCAAAA GACGGTrAGT CAAATACTT GAAGTAAAAG AACAAGAA'1r GCGCCAGTAC CAGTTTACCA ATCATTCTT'r TGAAAATCTC ATCCAGAAAA ATCTTC'TCTT T'rrACAGrCC GATATTGTTA ACGCAA'rTCC CAAGACTTGC ?rCTGACTAA AAAGATGGAT CAGGCGCGCG AAGATGAAAT CAAAC'TGATT GCAGTCATGC AGATT TTGGC GGACTCTGGC CAAACAGAAT TGGGACGTAA CCCAAATCCT TATCAAATCA
TACGAGAGCT
AATCTGAAGA
AATTTACCGA GCCTGAGCCT AGTCAAATGT TGAAACAGAT ATGAAAGAAG TAGTTTACAA GATGAGAAAA 'rCc'GATATT TTGACAGATG ATGAGCTTAA AAGTTGATAA TCTTTGCAGA AAGC ATG CCAAGGCCTT TAGAAGATTr
AAGGAGAGAC
AGCA.AGTATG 7260 CA'rTACCAAT 7320 CCGTIGGGAAA 7380 GATGCTAGCC 7440 ATTrTACAGGG 7500 TGGGTATGA'r 7560 GGATGTGGAA 7620 AGAr'rATTTTr 7680 GTCA'rTTGAG 7740 AGGAAAGCTG 7800 CGATGCAGTA 7860 ITTCCAAAAGT GGAGTCAGAA ACAAGGTCIG CTCATCAAGT CGGGGTTTCA ATTTAGCGAA TATAAGGCGA ATTCTGTrAr TGAGGAAGAG AGGACAATAT TTTTGATr'rA ACTCAGTTTA A TTGGTGAG AGACTTGACC TTGCAAGGGG 'rGGACATTTCGGACTTTT ACTCAGGTGA CGCAGATTGC AAGTAGTTTA GGTAGTTATC AGTTTGCATT AAGAGAT'rCG AGAGGACTr'r AT'rTGATTGA GACAGACTAT CAGATTAAGA AAAAGGCACT CTTACAGATT GCTAGTCAGG 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8494 C TGAGCTT TTTGAAGCAA GCTATTTCCT CAGGTCTTTA TGAAAAAGGT TTCCTTTTT TCAATTGACA TTTGTTGAAA CTACTAACCC GCGG
INFORMATION FOR SEQ ID NO: 164:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9707 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 164:
CCGGTCAGTT CGTTCAGTAC AAGGAATCAT AATGAACGAT CAATCAGAAA AAAAGACTAG AAACGA-AGACT GTATGGATAA TCGACCAArr ACCGTIGTCGC GCGAGCTCAT TNCGGCGCGGG CGCCCTATGG CTGGTCAACT TTCTTGAC ACTGCGGTCG 'rCTGGCAAGA TrGCCAGGAG CTTCGGCAGC ACGCCCATGA CGGTACAATC rrACAGGTGG AGAG =rGGC TCAACCAGTG TTACCAAGAA GATAGCCTGA TTTGGGCTG ATGGGGCCAA AGGTTCAGCT TTACTCAATT ATTTTGAAAT TACACAACAG CCAGTAGCCA
GCGCCAGCTT
CCCCCGTCCT
CAAGGA'rcTC
AATCAAGGCT
CATCAAGTCC
AGACATATAC
CTGTCCCAAG
GGTGGTCTAT
1058 GGT7"rrGG ATTCGGT CGGGGCTTG CCCCATGAAG AAATCC'TCTA TAT'rGGAGAT GCTGAGCAAA TTCGTGAATA TACTTrCAG AAAATGAT'rG TCATTGCTTG TAACACTGCG
CAACTAGATA
AGTCAAGGTrG
CGTCAGAAAA
?TTTGCTCCCT
GAAACCC'TGC
7rCCGTC~r GGGTGTAATr GGAAAATCGG AGTGATTGGA TCCATGATCT GGATCCCGAC TGGTTGAGTC ACGTCCCCTG GTCCCTTGGT TGGAAAGGTG a a a TACTCATTAT CCACTCCTIC GCCCTATTAT CCAAAATGTC CATCGATAGT GGGGCAGAGT GCGTACGGGA TATCTCAGTC CAATCGTGGT CGCGATGCTG GACCAC'rCCA TCACCGTTTT AAGTTTTGCA CAAATTGGTG AAGAArGGCT GGAAAAAGAG ATTCATGTGG ACCATGTAGA ATTATGACAA ATAAAATTTA TGA.P.TATA6AG ACTGGTATGT TGGGTCTTAT AGTATTTTTG GTGGCGTTAA CAGTTTGAGC CAGAT'rTTCC TCTGrTTGAA 7rCTCCAAA6A TATrTGGAGA TGAAGAGTAT I'TTCAGTTAC TGTTTTACCC TATGGTTCTA TCTACCGTTT GrrCTccTTT TGCTTAATCA AGAAATGGGA CGAAACTTGG AAGTTrA"rCA ACGTCATGGG TGGTTGAAAA TGGGCAACTC TTGTATGTAG AATTGCCTAA AGAAGGGGTC AT'rTC'1rTGA GACAAGCAAG GTCAGAGAAA CCTTGT'TGAT TGCGACTCGT AAACCAAGGA ATTCCGAGCT ATCTrTGATA AGTTAGGCTA CGATGTGGAA ACTACCCTGA CCTGCCTGAA GTAGCAGAAA CAGGTATGAC CTTr'GAAGAA T'IAAcGCCAGA AACCATTTCT CAATTAACGG GCAAGATGGT TTTGGCAGAT
GATGACCAGG
GACTATAAGA
GGT'rTCCCGC GTG4GTAGACA
GCCCTGCTCT
AATG'rrCATG AACGA6AGGTA
AATCTTAATG
A6ATGCCCGCC GA7TCTGGC
GCAGGTGTGG
GTCTTTGAAC
AATAAGGAAA
GGTGAAAATG
GCTGAATTAA
CTTTTGGAGG
GCGATAGCTT
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 TCAAAGTCGA TGTCCTTGGT GGCTTACCAG GAGCAACTGA CCG'TGAAAAT AATGCCAAAC TCAAGGACCG CTCGGCTCAG TTCCACACAA GTTTAGTTGT TGAAGCAGAC TGGTCAGGTT GCTTGGCTA TGATCCCCTC TTCCTGTAG CCCTGGAAGA AAAAAATAGT CAATCTCACC TATTTCCATC ATGGCAAAGC AAACCATCAT GCGTCTGGTC AGCTrCGTT1'C TCTTGCACGA ATrGGCCATG CCCTAGTCGT AGCCAGCCCA ATATTAACTT TGAACCTAAG CACAAACAGG TGAGTCATCA GTGCCTTAGC CGTTAAGAA6A TGTAATGAGC GATTCCCATG 1059 GATTGTGGAA GAAGTCCGTG ATCGCTATGT GCAAAC'TC GATGCCTGTT TTCATA-ACGG CGA7TCTGAA CTACGTCCGG ArrCTCCACT TTGGGAGGGC CATGGACTTC TACGCCGGCT ACCCAGAACG TCTGGTGACT TATCCAAACT CATGGTCACT T37rTGACA'r CAATTTCAAC GOCTCAGGAG GAAGAGGCCG CTATCTGCCT CTATGGTCAC C?1'GGAAGGC AAGATCCTCT TTCTAA.ATCC AGGTTCTATC CAGAGAATGT CTCTATGCTC GTGTGGAGAT TGATGATAGT GACACGAGAT CACGAGGTGT ATCCAGGTTT GTCCAAGGAG GAGTTTGAGA CTTTCTTGTT GOCCAGGAG GAAAC'TTTT GCTGTGTTGA TTGATACCCA CAATGCGGAT CATGCGACCC ATCCGCGTTG TTAAACGGAA GAGCTI'GGTT CGA.CCAAGAT TTTCAAAAGT TGGACTACTG TTGCATGTGC CAAGTGCrrG AGTCAACCAC GAGGTACCAT TACTTCAAAG TGGACTT =r TTTAGCCGAT GATTGCCAAG TGACCCCTGC TAAAAATCTA TCTTGCTCAG TCAGATGACC TATACCCGTG TTCCCGTTGT GACAGATGAA AAACAG'N'TG TTGGGACGAT TGGACTCAGA GATATTATGG CTTATCAGAT GGAGCATGAC 'rTGAGCCAAG AAATCATGGC GGATACCGAT ATCGTTCATA TGACAAAAAC GGACGTAGCG GTTGTrrCGC C'rGATrTCAC CATTACGGAG GTCTTGCACA AGCTAGTAGA TGAGTCCTTC TTACCGGTTG TGGATGCAGA GGGTATTTTC CAAGGGATTA TTACGCGCAA AGTAAGGAAT ATGAGATTCG AGCAGGGCTT GTCTGTCAAT ACATGGTAGG TGAGCGGATT ATCTAAAAAT CAGCGCCCAG
GTCCATCCTC
ATGCCAATGA
TCCAACCAGT
TCTGAGACCA
AAGCGAAAGA
AAGGCCGTrA ATCCCTCTT GCATGACTT'r GAGACAGGAT ITCAGCCrI'T TTAGAGGAAA 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600
CCTATAAGTA
rTCTCAAGAT
TTTCGGCCTG
ACCGCTTGGA
TAGACTCTTT
TAGAAATGGG
TGATTGGAG CAATTTTTAG TTACCAAGCC CAGCTAGCCA TAACCAATTT CTATACTTTC ATTAGCCAAA CAAGCTGAAA 'rTGGCAGGAA AGCGACCATC GCTCTTGCCC AGTGAGATTT TCTATCAAAA AGGAGAGGTG GACAGCTTT AGAAGACGGA AAAGCCAGAG ATTCTATACC CAGAGGGCCG CTTGCTAGCG CTCTTAATCC TACCATCAA GGTTGCGGAC ATCAATCTGG ATTTTCAGGT GTTGCGAATC AGCAAGGCTT CCCAACAGAG GATTGTCACC ATTCCCACGG CCTTGCTTTC AGAATTGGAA CCCTTGATGG GGCAGACCTA TCT7TTGAA AGAGGAGAGA AACCCTATTC TCGTCAGTGG GCCTTTCGTC AGTAGAATC TTTTrGTCAAG GAGAAAGGTT TTCCATCCTT ATCAGCTCAA GTCTTACGTG AACAGTTAT TCTAAGACAA ATAGAAAACA AGGTCGATTT GTACGAAA'rr GCA.AAAAAAT TAGGATTAAA AACAGTCCTG ACCTTAGAAA AATATAGATA ATGGATAPTA AATTAAAAGA TTTTGAAGGA CCCCTGGACT TGCTCTTGCA TCTGG CT AAGTACCAGA TGGATATCTA 1060 CGATGTGCCC ATTACGGAAG TCATCGAACA GTATCTAGCC TATGTCTCAA CCCTGCAGGC CATGCGTCTG GAACTCACGG GI'GAGTACAT GGTCATGGCT AGTCAGCTCA TGCTGATTAA GAGTCGTAAA CTCCTTCCGA AGGTAGCAGA ACTGACAGAC TTCGGGGATG ACCTGGAGCA GGACCTCCTC TCTCAAATCG AAGAATATCG CAAGTTCAAG C'TCTTGGGTG AGCAC~nrGGA AGCCAAGCAC CAAGAACGGG CCCACTATTA TTCCAAAGCG CCGACAGAGT 'cArTTTACGA AGA'rGCGGAG CrGTGCATG ACAAGACGAC CATTGACCTC 7rTGACT TrTCAAATAT CCTACCCAAG AAAAAAGAGC AGTI'TGCACA AAArCACACG ACGATCTTGC GGGATGAGTA TAAGATTGAG GACATGATGA CTGCAGGAT 'rTG=CAACG AACCCTAGAG 1'TAATCAAAA TATCTATCTC ATGGAAAAGA
TTATCGTGAA
AAGCCCAGAA
CCCAGGAGT
AGGAAGAAAG
AGACTC=TG
TGTCCAAGAG
GATCCTrCGTG
TCAAGTGCCT
ATTGGACGAG ATCAATT'GCG GTCATCACCC TCITTTTGGC CAAGAGGAGA GTTTTGGAGA CAAAGCTAGA CTTGATAGAG Vooo AGGAAAGATG AGTACTTTAC CAAAAATAGA AGCGCTCT'rG TIrGTAGCGG GTGAAGATGG GATTCCGGTC CCCCAGTTAG CTGAACTCCT CTCTCTGCCA CCGACAGGCA TCCAGCAAAG TTAGGAAAA TTAGCCCAGA AGTATGAAAA GGACCCAGAT GACAAGTGGT CCTTATAGAT TGGTGACCAA GCCTCAATTT CTCTAAGGCG CCTATCAACC AGAGCTTGTC TCGGGCTCC TGCCTACAAA CAGCCGATTA CGCGGATAGA AATTGATGCC TGGAGCCTTG GCAAAGTTGC AGGCTTTTGA CCTGATAAAG ATTGGGGCGC CCCAACCTCT ATGTGACTAC GGATTATTTC CCATTTAGAA GAATTACCAG TGA'rrGATGA GCTTGAGATT ATTTGTGAA AGCATAGAAG AAGATGAGAA TCAATAAGTA CCAGTAGGAG AAAAGCAGAA GAGCTGATTA AGCAAGGCT TGGTGCGTGA ACTAGCAACC ACTATCAAGT CAGGCGACAA TCCAGTTTGGC T'rTGATTGA GCAGAGATNT TGAAGGAATA CTTGAGACCT 'rGTCCATTAT ATCCG;TGGAG TTAACTCGAG GAACACGGGA AAAAGGAAGT CTAGATTACA TGGGGATAAA CAAGCCCAAG AAAGCCAATT 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 TA'rTGCCCAC
GCTGACGGTT
GGTCGAAGTT
GCAGGTGTGG
AACGGCCA.AG
GAAGGTCAAC
GTGATrCCA
GTCAAAGAGC
CTATCTACAA CGAAGAAAAG GTCTACTATC TGCTTAACAA ACCACGCGGT GTGTGACACA TGATAAGGGT CGCAAGACGG TTCTCGACCT C=GCCCAAT GTATTTACCC TGTGGGTCGT TlrGGACTGGG ATACATCAGG TGTCTTGA'PT N'GACCAATG
ATGGGGAT
CGCGTGTTAA
TTGATGGTAA
ATCGCTCTGT
'rACAGACGAG ATGAITCACC CTCGTAATGA GATCACA.AG GTTTATGTCG ACTGTCGCC AATAAGGACA ATCTCCGCCC CTTGACCCGT GGTCTTGAGA GAAAACCAAG CCACCTGTTT ATGAAATTCT CAAAGTGGAC CCAGTCAAAA GGTGCACT'rG ACCATCCATG AAGGGCGTAA CCATCAGGTT AAAAAGA'IGr 1061 T'rGAAGCTGT
TGACAGGACT
AcAcCATGGC TGCTCTCCAA GTAGATAAGT TGTCTCGGAC TCGITTCGGA CACCTAGACT CCGTCCAGGA GAATCCCGTC GTCTTAATAA AAAAGAAATC AGCCAACTAC TGTAACTAAG AAATAATGAA ACGAAT1TA ATAGCGCCTG TGCGCTT~rA ccAAcGTTT
CTACATGATT
ATCTCACCAG TCTTTCCACC CTCTI'CTCGC CAGGCTATTG AAAAACATGG GTTTAAGGGG
GATI-rTACGT TGTCATCCCT GGTCGAAAAC CC11'AAACGA AATCAAGAAG GGGAATGAGG CGCATCCTAT CAGOTTTGAG TGAACTGAT ACTCTrTCGAA AATCTCTTCA AACCGCGTCA TGAGCAACCT GCGGCTAGT'r TCCTAGTTTG GTTTGCAAGTG GCTTATTTCA AAGCTTTTTG AAGTCCCCT CCGCTTAGAT ACCAGAGGTC AGCAGCAGGT GTCAGCGA TAAGGGCATT CCCACCGATG GCAAGGGTAC.G'NGATGAC ACTTTCAAAG CTGACTTCTIT GTCCGTGGCG
AGGTAAGGAC
TGGGGTAAAT
AGGATGCGT
GCTTTCATCT
CTCTTTrGrATT
TATGTCT'CA
TGGTGTTAGT
TTTGAGCTGA CTrGTTCCAA GTATTGATGG GCTrGCCTCG CCCGTTrCCAG ACCGCTTTTC AGA1-rrCAAA ATGATAAAAA X'TAGAATGTC AAAATTTTAT GCAACCTCAA iACAGTGTT!' TTCATTGAGT AT'TAAATTGA ATCATGAGTT TTrGTTGATTC TGGATAATCT 'rACCATT'TTT TTCTAGGACA CCGTCG~rGC AAAGAGGATG 'rCAGGGII'GA TGAGTCTTCA AATTTTGTAT
TAGAGTTGTC
TTTCTTTGAC
CAGTTGGTTT
CTGCCATTTT
5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 GAATTTCAAG GTrGGTACA AGAAAGAGAA ACGAGATG TCCTTCATTA AGGACGATCG CAAGGCTT T'TV3TCAGAG 'rrCTTGGATG CTCTrGTCTA GCTTGGTCAA TTCTTCCTTG
GCACCAA.AGG
CCCAAGGCA CT'rGCTAAGG ATTCGATATT GCTTGCI'GG AAGAGAACGG TTGGGGCGAT
ACGTGGCGAA GCGATAATCA TTTCATAGAA CCAACATTT]T.
v'T'rGTAGGC AT'rCCCACGA GCCGAGGTCA AAGGTCACAA ACTI'TrAATG GTTrACCTCTC G7TTGTACTA CATGCACCAA TTTAAGGGAT GTTI-TCATAA 'rTAATCTTAT TGACTAAGAG
AATCAGGCTC
TGACAGTTCC
TAT TTTTrC TCTT'rTCAG
TTGGAGCAGA
GTAGGAGCAA
TTTCTCCT
ACTGAAGGTT
ACCTTGGTA
TTCTTTGAAT
AAGGGCGGCG
CACTAGGTCT
AAATCC'A.AA
AACTTGGAA
GCTACTGvGTC
GAAGCTGGCC
CTTTCATTTT TAGTAGCGAC GCTTTCTGTG TACCAGTTTC GAAGTCCAGT AGTCGTCCTT TTGTCTACGA ATTT MGTGT ATAGCTTCTA AATCAGGTTC TTTAGATAAG TCGGAACAGT GCGCGAATAG TATCCGCAC AGTTTGACCT CGTCCAGTGA TCTGTCTGAC TAGTGCTTGA ACTAGGCCAG TGAAATAtAG TTAAAATG'1G ATAACGATT AGGGAGTCTC CTCTAACTTG AGCTTI'TATG TTACTAGCTA TAGATACAGA TCI'rTGTC AT'rGATATCA GCTAGCGTGA TGGGAATCTC ATAA.AGTTGA 1062 CTCAGCAGGT CAGCCTGCAT GATTTGATCG CTTCTTCCCT TTGzAACGCGA CAATTTCATC TGCATACTGA CTGGCCATGT ATAATGGTCT TGCCGAGTT1C CTCCACCAGT CGTCGAAGAA TGC7TATAT CGAGATTGT= GAGTGGTI'CG TCCAGCAAGA AGTACCATAG CGATAAAGAC GCGCTGGAGT TGCCCCCCTG TCln=TAAGT TGGTCAGTTC TAAATAGTTC AGACTTTCCc GATCTAAGTC GACC'TCGGCT GTACGGAAAA CGTCCAAAAC AATTTGGC7'r GGTAATTGAT rrTCTGTTTT AGGATGGT'rA GAAT'rCCAGC TCTCGATTTC ACG;TCCTTTG ATACTGAGAA AGCCTGCTCA TGATGGAGAG GAGAGTCGAT TTTCCAGCAC GTCAG'I=T GAGGACTGAC TTCAAGCGAA ATGCCTTGCA GA=r'GTCAA TG7MTCCAG3 TTTCACTGAC GAGACCTCCT AGAAGCCACC CACACTCTCA ATGATCATAC TGATACGAAT TGCTAAAGAC CTGGCCGTCC TGA'rATCGTG GAGGACGATG TCTGCATCAT GCTGACGCTT TAAAGTCCGT ACCTGGGCC ACAGGCTATT GATG'TAGCGG 7200 7260 7320 7380 7440
GGATTTTTTC
TGACCAGTTC
GTTCT'rGGGC
CTCCCTGATC
CATTTGGACC
AAATATCCTG
ATATAGTAAG
TTCCAGTGCA
CAGAATGGCC
GTTGGCCAGT
CAAAAGCACG
C1'GAGCCCTT CAATCAAGGC TTGCCCCAAG GTAACTTGTG CTGATAGTCT GTTAAGCTAA TAAATCCAAC TTGACAATCA GGTAGGTGAG AAGGCAGI'GG CCGTTGAGGT TCAACATCGA GTCCCAATAT CCAGTCTrCT 7500 TTCAACAGTC 7560 aAGTTrTGC 7620 TTCTGGTT 7680 AATAAAGGCT 7740 TTrTTTGAATG 7800 ATAAAGAATA 7860 AAGAC'rCGr 7920 AC'rATAAAGA 7980 ATAAAGCCGA 8040 ATTCCCCAGA 8100 TCTCTTTGCA 8160 GCGAGGATGA 8220 AAAAGACTAT 8280 CTGATATTTC 8340 TGCTTCATCA 8400 AGTAAGACTA 8460
AGAAGGCCAT
GGAGCTCTTT
GGTGCAAGAC
TCAGAGAACC
TTTGCAG=T
GAAAGAGACT
GTrGTCTTCAA
GGAATTCTA.A
AGGTCCTACC
CTGTTClTTT
ATCTAGAACG
GATGGCTAGG
ATCGTATTrCG
TCTGAGCGCT
GTALACCT'rGT
GATAGGGGAT
ACTGCTTTTC GAAAGAAkAAA GATTC CAAA ATGGAAGTGT TGAGATGTTG AAAGGAGGCA TTTGGATCCA TTAGGACTTG kAGCAAGGTG AGACAGATCA GCAGGACGAA GACCAGGTCT AAGGCGAGAA AGAAGAGGGA CTGGACAAGA GGGT7TGAG TAGGACGTAG TTTCCGTCAG GGTTTGAAAA AAACGATGAT CTTTTGGGAA AAAAGTAGAG AAGACAAGCI' GT'I'TGCTTTT AGTCTGCATC CTACCGATG.A TTCCTAGCAA GATAGGATAT CGCAAGCCAG TTGCCAAGTT GAAGAAACI'T GCTTTCAAAA ACCAGTAGI1A AAGGATTCAA TTCCCAAAAT ACTAGGCGTC AGGAAGCGAT CTAATGGTCG AAATCCCAGT CGCGATCGCT ACCAACAGAT CGCAACTTCC AAGCAAAGGC TGACAAGTGA GTGATGGCC CCGATGGCAA GAATAATGAG AATCCAGAAG AGCrrGGTAT TTTTCGTCCC CCTCTCCAGA GAAGTAGGAT AAAGACGAGA GAGACTGACA GACAACTCAT AGGGCCTA6AT CAGAACTCGG AACTAGATTG GCACCAACCA GTGCGACCAT GAGTrrGGTT
S. 55 S S 1063 TGACTTAGAT TATCTCCATA GCGCTTGCGA ACAAGATTGG GAACGATAAC TCCGAGAAAT GGTAGGCCAC CCACGGTAAT CATGGTGACG CT'rGTCGTTA GCGCCACCAG AAAGAGGGCC AGTTTCAA GTAGGGAGTA GGAAATCCCC AAACTCTCGC TGGTF'rCTrT CCCTAGATTC ATGATGGTGA AGGTTTGGGA TAArTCCAA ACGGTTATCA GGATGATGAG GCCTA.AGAAG AGCCACTCAT ACTGATGGGT CTGAATCATG GAGAAGGAGC CCTGGGTCCA GGCAGTCATA CTCTGAACCA GATTG.AAACG ATAGGCGATA ACTTCTGTGA CTGAGCCGAT AATCCCGCTA TAGATGATCC CAATCAGAGG CAACATCCAC CTTCC'TT'rA CAGTAAAAAT GGTCATAAAG GCTAGGAAGA AGAGGGTGAA TACGATGGAT GAAACAAAAG CGAAGAGCAT C1 TGI'GGGTC AGACTAGCCG ATGGAAAGAC AAAAAGGCTC AGCACCAT'rC CCAGTrrGGC GGCTTCAGTC GTTCCAACTG TACTCGGTGC AGCAAACTGA T'NTrGGGTAA. TAGTCTGCAT GAGAAGGCCT GCCATACTCA TACTAGAGGC AGTCAGGAGA ATACTGATAG 'PTC TrGGGAG ACGGGACTCT TGAAAGAGGA GCCAGGTCTG CTGGTCCAAA TCAAATAGCT 'ITCCCCATGA AAAATCACTG GTCCCAATGC TAATAGAGAG AAAGACTAGG AGTAGAAGTA AGCCAGG INFORMATION FOR S.EQ ID NO: 165: SEQUENCE CHARACTERISTICS: LENGTH: 5910 base pairs TYPE: nucleic acid STRANDEDNESS: double CD) TOPOLOGY: linear Cxi) SEQUEN4CE DESCRIPTION: SEQ ID NO: 165: CCGCAATTAT GCTTGAAAAG GAGTATACTr ATAAGTAACG CAAACGTTTG CGTCTGAAAA ATACdCAACG TrCCATTATT TTAACACACG AGGTGCTATT ATGAAAAAAC GTCKAAGTGG TGTGTTGATG CACATCTCTT CTCTTCCAGG AGCTTACGGA ATCGGATCAT TTGGTCAAAG TGCTTACGAC TTCGTTGATT TCT'rGGTCCG TACAAAACAA CGTTACTGGC AAATCCTTCC 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 970? 55 S S
55 S S ATTAGGAGCA ACTAGTTACG CACTCATTTT ATCGATTTAG TGAAGGAGTT GACTTTGGTA ACGTCGTCCT CTTTTAGAAA TTTTGAGAAA TTTGCTCAAG
GGGATTCTCC
ATATCTTGGT
GCGATGCGTC
AAGCGGTGAA
ACAACCAATC
TTACCAATCT TTCTCAGCCT TCGCAGGAAA GGAGCAAGGT 'T-GTTGGAAG CAAGTGACCT TGAAGTTGAC TATGCTAAAA TCTACTATGC ACGTTTCTTT GAAGTCGGAG ATGTTAAAGA ATGGCTTGAG CTCTTTGCTG AGTATATGGC GACTGAATGG CCAGATGCAG ATGCTCGTGC TATCAAAGAG TATTTTGACA ATCTTGCTTG TCGTAAAGCT TCAGCACTTG CCGTGTGACT CAATACTTCT CAACCACATC GAAATCGTTG GTGGGCAAAT CCACATCTCr ATGCCCACCA GATGAGTTT GGAAGCAATG CACAAAGACG AATCTACGAT ATCGTTCGTA TGCTGGTTCC GATACAGCAG 1064 AAAGCTA'rCG TGAGCAATI'G GCAGACAAGT TGG ACCA TCTTCCAACA AlrGGrrGAAA TTGAAAGCTT ACGCTAACGA GGGACATGCC AATCTACGTA CGAAGATT CAAGTGATAT
TCAAAACAGA
CTGTAACTGG
GCTACAAATG
TCGACCACTT
CACCTGGTGA
TGTCAA'rGGT AAGGCTACTr GTATCGCAGC TCAGCT'I'TGG GTAATCCAA TCTATGACTG
GTGGATTGAA
CCGTGGCTTC
GTGGGTGAAA
CGCTPGCGTG
GAATCTTACT
GGTCCAGGTT
TGCAGCCGTT AAGGAAGAAC TTGGTGAGCT AAACATCATC GCAGAAGACC GACAGATGAA GTGATCGAAT TGCGTGAACG TACTGGCTTC CCAGGAATGA U. *e
ATTTGCCT'rC AACCCAGAAG AGTT'ATGTAC ACAGGAACAC TGATGATGCG ACTCGTGACT GGTACACGCT ATGCTTCGTA
ACGAAAGCAT
ACGATAACAA
ACATGGCT'CG
TGATAGCCCA CACTTGGCAC TACGGTCrT GGTTGGTACC TTACACGAAC CGTAA.AGAAT CAGTATTTTC ATCAG'I-AGC TTTATGGCAA AAAGCTTCAA 960 GGGAAATCCC 1020 ACAAGCTTTT 1080 TTGGCTrCAT 1140 AGATTCTTCA 1200 CTGCTAACTC 1260 GTAATGAGAT 1320 ACGAAACAGT 1380 'rTGCAACTAT 1440 CCCTTGGTGG 1500 AAGGTTTGCT 1560 AGAAATAAGA 1620 AATCGTTACA 1680 AACTACAGCA 1740 TACATCTCAG 1800 CTTTACCACC 1860 GAGTTGAAr 1920 GCAAGATTTA CTAGAATTGG ATGAGGCACC TCGTATGALAC TTCCCATCTA AAACTGGTCT 'rGGCGTATGA CTGAAGATCA ATTGACACCA GCTGTCGAGG 'rGACTTGACA ACAATTTATC GCCGAATTAA TGAAAATTTG GTAGAT'rTAA CAATAATCAG GAGACAACTA AACATG;T'AT CACTACAAGA ATTTGTACAA ATAAA.ACCAT TGCAGAATGT AGCAATGAAG AGCTTTACCT TGCTCTTCTT AGCTTGCAAG CAGCCAAAAA CCAGTCA.ACA CTGGTAAGAA AAAAGTTTAC CTGAGTTCTT GATTGGTAAA CTCTTGTCAA ACAACT'rGAT TAACCTTGGT ATGTTAAAAA AGAACTTGCA GCTGCZAGGTA AAGACTTGAT CGAAGTTGAA U. Ut U U
TGGAACCATC
TTGCTACTCT
AACAAGTTCT
ACTGGTTrGGT
CAACTCTTTA
T=TGGTAAT
TGGTT'rGAA'r
TAAAAACAAC
CGATATTGAT
GGTGGTTTGG GACGTTTGGC TGCCTGCTTT ATCGACTCAA CGTGACGGTG rTGGTCTTAA CTACCACTTT GGTC'MrCC CAACAAGAAA CAATTCCAA.A TGCATGGTTG ACAGAGCAAA CGTAGCTACC AAGTACCA'rT TGCAGACTT ACTTTGACAT GTTACTGGTT ATGAAACAGC GACTAAAAAC CGC1'TGCG GATTCTTCTA TTATTAAAGA TGGTATCAAC ?TTGACAAGA ACTCTCTTCC 'rrTACCCAGA 'rGATAGTGAC CGTCAAGGTG CAATACTTCA TGGTTTCAAA CGGTCCGCAA TTGATCATCG 1980 2040 2100 2160 2220 2280 2340 2400 TGTTTGACI-r GGATTCAGTT CAGATATCGC TCGCALACTTA AATTGCTCCG TATCTTCCAA 1065 ACGAAGCAAT CGAAAAAGGA AGCAACTTCC ATGACC7TGC TGACACGCA GT7GTCCAAA TCAACGATAC TCACCCATCA ATGGTGATC CTGAATTGAT TCGTCTI'TTG ACTGCACGTG
GTATCGATCT
CAATCCTTGC
TGACGAAGCA ATCTCAATTG TPCGTAGCAT 'rGAAGCGCTr GAAAAATGGC C'TCI-rGAAT'r GACTGCCTAC ACTAACCA.CA ClrGCAAGAA GrTCCTC ACT'rGGTACC AATCATCGAA CTGITTCAAAT CATCGATGAG GATACAGTGT TAACGGGGTT AAGCCrCTA CGACCTTTAC GTCGTTGGCIT TATGCATGCT ATGGTTGGCA CCATGAAGCA TTGTCAAAGA AAAATTGGAA TGAAAGAACA CCAAGGTG GTCTTCACGA GTACAAACGC ACATCAALAGC TGGTAACATC CTCCAGCCTA CACAATCGCT TT-GCTAACGA TCCAGCAGTA TTACTGCAGC AAG'IrTCCTT CTA-AAGAACC TTCAGGTACT GTACTATGGA CGGTGCTAAC TCTTCGGTGA AGATTCAGAA
GAATTGGACC
AGCGGACGTG
GCAGCACTCC
CCAGAAAAGT
GTCGTGrGAA GGCAGACTAC TrCACATGGC TCACATGGAT ATACTGAAAT CTTGAAAAAT TCAACAACAA AACAAACGGT AACCCAAGAT TGrCTCACTA CTTGGATG.AG ArC'rGAG GATGAGCTTG AAAAACTTTT GTCTTrATGAA GACAAAGCAG AGCATCAAGG CTCACA.ACAA ACGTAAATTG GCTCGTCACT GAAATCAATC CAAATTCTAT CI'TTGATATC CAAATCAAAC CAACAAATGA ACGCTTTGTA CGTGATCCAC AA.ATACCTTG CCrGCTCGTC CAATCACAAT Cr'rCTTTGG'r GGTAAAGCAG CAAGACAT'rA TCCATTTAAT CCTTTGCCATG TCAGAAGTA GCTCCACACT TGCAAGTAGT TA7VGGTTGAA AAC'TACAACG ATCCCAGCAT GTGATATCTC AGAACAAATC TCACTTGCTT GGTAACATGA ATTCATG~r GAACGGAGCIYT TT-GACACTTG GTGGAAA'rCG CTGAGTTGGT TGGAGAAGAA AACATCTACA ACTGT'rATCG ACCTI'TACGC AAAAGCAGCT TACAAATCAA GCTATCAAAC CArrGGTTGA CTTCA'rCGTT AGTGATGCAG GAGCGC7TWG AACGTTTTTA CAATGAATTG ATCAACAAAG GATTGGAAG ACTACATCAA AGTCAAAGAG CAAATGCTTG GCATGGTTGG ATAAAGTCAT CGTrTAACAT'r TCTAAAGCAG ACAATCGCTC ACTATAACGA AGACATCTGG CACTI'GAACT TCAAACCACG TCAGCTTTAT CTGCAACCTC AAAGC.AGTC TTCCTAGTrr GCTC~rGAT TTTCATTGAG TATAAGATAC TGTAAAAAAG CGAGITTCGA TTGAAATTCG C"N'rNTAAT TTGTCTAAAA ATAGGGAAAT CCTAGATACA GTGAAGGCTT
AAAGATCCAG
ATCCACTACG
TCTGAGTTGA
ATCACTTTCC
2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140
GCGAATTCTA
TTCTTGCAGC
ACTGGTITCAT
CTGACTACGA
CM'TCTTCTC
AATACTCTTC
TTTGAGCAAC
AAATrTATAC
CGCTCGTGAA
TGGAAACAAA
C.ACTCTTCT'r
AGACCGTGAC
ATCTGACCGT
GAAAATCTCT
TGCGGCTAGC
TAATACATTT
GATGTAGATT 'rGGGTCAA'rC TAAATGCTGG TTTTTACTGT TATTATATTC GCTTACATAA ATATACACAC TTAAATTGGT AAGATTTTTG GAAAAAATAA GCAGCATT'GT CATT'rATGI'A 1066 CCTCAGCCTI' ATATTTTTC AGTATTrATAA TATAATTGTA GTTG?'rTATT ACCTTTCTTG AAAAATTTGG AAAATAGTTT CTAAGrrATT TTAAGAATGT TCTAA.ATTCT ATTTA'rCCAA CGCTTTTCTT CGATTTCGGC GTAG71TGGTT ACCTCATATC GGAAAGAAGC TCT~rATG 'rAATAACT GrrACCTGAT TrrGCAATATT GACGGCAGTG AGGGAAATAA ACCCTACATT C7"=rAGTT CTCvrGCTTC
AACAACTTGC
CTATTACTCA
GTrAACGTTTA
AC-AGAAAGCA
AGTAATTATA
GAACTT'rAAA
TTAATTTTAT
A'TTTATCC
TAAAGCAACA
ACTTT'TTGCT
TCCTAATGCT
TTGCCAAACT
TCGGTTAATT
G=r'GCAAGT ACCA'rTGCTA AA'rTCTTT'rC GCCATCAGGA CCCAAAGCAG TCCTACAAAG CAGGAATTTT TACAGCTTTG ACACT'rCGTA TCAAAGCGCC GAAAAGGA.AT GATCTGGTAT AAGATCATTC TACAATGTAC GACGAAGTCG TCAT'TGCAAT CAACATTGAT AAGATAA.AGA ATATAAACAA TGTATATCTC ATGATTAAAA GTTAATCCTT CATAATTGTG CTAAATAAGT CCCCGGTGAT TCATTTACAA ACCAACTACC TAGATATCTA GCATGr'r'TC CTTTTACTTT AATTAGATTT AAAGCTA'rTG CAGTATCTAA TAdAGCACCC GCTCTAAATG GAGAGTATGT AGGTGGGATT ACTTCTTTGA TCCArrGTTG TGAGACGTCT GTATTGATTT GTTCTTCTGT TACAGAAGTG AAGTTCTrGCA
CCCATTCAAC
ATGITTTTC
GATACGGTGA
CTCTTACTTT
AAACACACTC
CT'TrCyCTC ATTTCTr.CT TCCAG7TTCAG
GTCGACATGT
7w=GAAATA CGAGGGGT7TT 7T?!TTCTTTA GCTGATCCAC CACCTAAAGG CGAAA.ATACG 'rTATACCCA.A CCGTTGTTAG GAA'rGGCATC AATAAATTCA TAGCGA.ATITC GAAATTGCTC AACGAATAGC GCAAGGCCTG CAGTAGTTCC ATTTGATTAA CTGTAATACC GTATA.ATCGC CTTGTAATTG CGATGAAAAG ATTGGATTTC ACAGGTTGAA GTTCCATATT 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5910 a. a TGTTTCAATT TGTGATACTT GTTCAGAAGC GTATACAGCT GAAACACTTG GAATCGCTGA TACA.ATTAAC ACAATrGACG TCAAAAAAAC CGAAATAAAT TTCATTAATT TGTTCATGAG CT'IrTCTCCT 7TTTT-TGC ATCTGCTTAC ATTTTA'rCAT ATACTOTTAT TATAGTCAAA AAAATATGCT ATTATGTTAA AAAAATATTT TTCAAAATAT AAATGGACGG ATTTATTTTG GA7=~ATTT GTTATT-PTGA CCTGCCTCTA TAT'rGGTAAC CATGATTTGT TTACTCTCAA TCATCAAGAA TTCTCTTTrC GTGGTACGCT TrGGGGTCTG GTACTGGCCT TATATCACTT ACTATTCATT GATAAGTTTG TTATATCGAA TCGAAAATAA AGATTAGAGC TATGCTTGAC TGTGTACTTT TAGGATTTAT TTTGGAGGAA GAT=rGTCT CTATTATTTA TTATTTAAA 7TATTTAI-r TTGTATAAGA TCTATTCTTT 1067 IN'FORATION FOR SEQ ID NO: 166: SEQUENCE CHARACTERISTICS:.
LENGTH: 5406 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 166: GGCATAGCGA CTCATTTTTT ATCCGTTACT TCTGGAACCT TAAGGGACAA CCCATAGTTG AAAACGAATC TCATAGATCA GACTTCTTCC AAGGCTGTTA AGCCATATTr TCCTCACTCT G'rTGGCGTAG TTTTCTCAAA TAAAGACTTT CCCCACATCT CACGCAGAAC ATTTTCTTCA GCAAGACGAT ACGAGTCGTA CTCCAAGGTG GCTATCGTCC CAATC7ITCAA GATT'rCACGA CTGGTGTCGG ATCTTGCCCC TGATAGTTTC AACCATGTGA GAGTGATAGC CTGACGAATC AGTCAAACTT GTCAACCGCC ACTGCATACC ACGACCGACA CCGCAAGACG TTGTTTGGCT CCTCTTCATT GGTCAAGAGA CATITGACCTT AGCAGAAGTT CAT7 GCTGAG AACACGCGCA CCTGAATCCG P TGCA.AGAGA GACTTGCAT GATTTCATCA
CAACTGTCCA
CTATCATAGC
TCAAAGTCAT
AACCAAGATT
AAATCCGTGT
TAGTCTTCAA
GCCTTTGCTT
TCAAGTGTGC
CGGTCTGTAA
TAATCCACTG
TCTTCACCGA
ACCTTATCAG
AATTCTTGAA
ACTGGGATAC
CACCAAGTTG
TTCATCAAGC
TAGCGTTTGG
TCGATATCAC
GGAACGACCC
GACCCAATCA
CTTGGATTTC
GGCTGGATAC CAGACTAATT TAACCTCAGT ATCATAAATC TGGTCTGTCA AAAGGTCTGC GTCAATCTCT GTTTGCCCTG TGTCACCGTC GACAATATCG ATTCCCAACT CAGGGTCGAT TTTrGATGTTT 'rCAATTTGCT CTTCTGTATA TAAAATCACG AAGCGGTrTTG CTACGACTTG CAATCTGACG GATACGCTCA CGAGTTACG1' CCATTITTCC ATCATCTAGT CCAAAACGTA GAGTATCTAA GATTTCATCC AATTGCTCAC GATTTTCAAT CACTTCATCT TCGATAAAGT TAGGAGTTTC AAGAGATACT GGTTCTTGGG GTGTCATATC CATTCGTTCA GCAATCI'GTT GGAGATTCCG CTGTTCACGA ACCAATTTAT GGATGGTACG AGCTTGGTCC GCAATAGCAC CATAAGTrGA AAACTTGAAC CCTTTAGAAT CCATAT1TTCC TrCTTGAATC AAGTCAAGGA CAATGCAAAC AACCAAACGA AGATTGGCTT CAGCTTCAAC AGCCAGTGCC AACTCTrTCT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380
CTATTTCTTT
AGTCCTCATC
CTTCGTTATC
CAAGTACATA
GCTGAGTTCT
CGGACAGGGT
GGTTCTCTT
TCTTCAATCC CATCAGCGTC TC1'GTTGCTG TCCCT1 TTG TGTGATAGAA ATGCCTGCAT CAAGGTAAAA GGAATAACCA CTTATGATTA CGGATAAATT 1068 CTGCTACCTG 'rACGTCAAAT G==?1ACTT CTrrrTGr TGTTGCCATT ATTACTCCAT TCTrCTTT TGGGAAATTA AACGTTCCAA TTCTTCTAGG GCTGTAT=T TATCTCCTAC ATGGCTAGCT TCCTGCACCT TCT'TTGAT TCTCATATTG TCCTGATTCA AGAGAGCCTr G'rrTCGAGTC ATCrCTACTT CACTAAG'rrC CTGCGGCGAT ATCTCAGCAG GCAALATCCTG AGCTAAAACT TGGTACCAAG CTCI'TCAAC T'TCCTCTG;TC TGCTCTGCTA AAACTTCTGG AGGAAGATT CCATACTGGC CAAGCAAGTC ATATAAGACC TGAAATTCAG GTGTAGCAAA TGCAAAGTCTr TCTCGCAAAC GGTAATCGTT CAAAACAAGA GGGGATTCCA 'rCATCCGATA GAGTAGATGG GCTTCTGCCC TCATAATAGC CGATAACTGC TTGGTGACAG GCAPGGTGAT TGGCGTCGGT CTGGAAAr'rC CTTCCATGCG A'rrCTGCCTT TGCACCTGAC GACTCTCATT AACAATCTGC TCAATCTCGG TATAATCAAA GGACGCCAGA C1'GTCAGCTA AAATATGAAT ATAGCTGTT TGAGCAGCGA TGGACrTTTrrc TTGACAATC AAGGGAGCTA TT'rTTCAAG AAACTCAATC TGAGCCTGCA GA7rTTACT G??TTCAGGT TTG'TACTGAT GAA'rGTAGAA CTCA.ATCGGA CTAATACGAG TTTTCGT'rAA TAGATAGGCC AAG'rCTTCTG GACCATTTTT TTGTAGATAC TCATCAGGAT CCAAGTTATC ACCALATTTCA TCCAATGCTT TCAATGTCGC AAGAACCAAT 'rTCTTGTA ACCrrrCAG TCCCATCGAC GCCACAGCAT TTTCGATTCC TCCT'rCC-ATC AGGTAAATCT CACTAGCTTT- ATATAATTCG TAACT'rTTGT TAAAAATTGC T'rGTGAATCC G7TrTTGCC AGATACGACC TGTCAGGGGA AACATAATGC GATTGTGAAA ATAAAACAGG CCTGAATCCA GTAAATCCTC GAGATAGTT CGTTCTGGAG GTGCTAAACC CAACCCCCGC TGATAAAGGT AATTrCTGGC AGCA'rGGTAA AATTTGGCTG CATCTTCCTG TGACITCTGC TCACTATAAA GCGGT'TTTTC TTGGACTGCT TCTATAAAGG CGAACCCCTTG TGAGCGACCA CAACCGAAAC ACTGATAAAA TGTrIrrTTCA CCATGAAAAG GACAGAGCCC AATCACATCT CCTATGACTTI CCACAATGTT AGGCATGCTG ACGATTTGCA CGCTTGCCCA GCCTTATCTC ATGC CAACA TGCTCTCGAC AGCCCGATAG GCTGCAATAA TCCAGAAGAT Clwl'rTTGCCC AGTCGATCGG CTGI-TTAT TGAGAAGGCA ATGACCTTC GGTGTCTACA AATTGATTGG TTCACGATAC TGATCAGACA AATCCAAAAA TGT'rTAAGCA cCCTCGCCC ATAGTCGTTG CATATCATAA AGAGCTI'GGT AACCTCAATT CCAACACGCT
CAGGCATATC
CATCGTAA-AC
TCAAGGCTGT
CATCCATGAA
TATCCATATG
ACT'rAGA.ACT
CTTGGTCATT
CATCCGAGAG
AACGTTGATA
CTTCA'TCTGT
TCATGAGAAT
GAGGTGAGGC
GACCTAAG.AT
1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 GTACTCCTCG ATGAACTTAA AGACATCACC CTGCTTGTCC 'rCTACAACAT TGAAAGATGG TAGATAGTTC CGTCCTGCCT TrTCAAAGA GGCA7TTG=~ TTGATCTT CAATGACTTG 1069 T1-rTGTCAACC ATACACAATA CCTCCATG'rr ATCATAGT1-r ACTTTATATA GTATAC?1-rA 3240 TTCAGAAAA AAACTAAACC A~rCACTCA TTTCCCTAC =rATTCAAA TAATCAGAGA T rCA=M' TGC'rIIr~CT TerrGG=TA AATCrGGAT TCTTCA'rGA CAATCAAGCG ATTGCCGTAT T'rGAGAGCAT CTTCCATArG A'rAAGGGCTG TTAGCTGATC 7TTTAACA AAT'rCATCTG TCAATTCCAT CTAGTCTT'rG GATCCAGGGC AGCACTATGC TCATCTAACA 9 AAGGTTGCCA TCAAGAGACT CAAAGCCTGT CT'rTGTCCAC GTATTCAAGT GTTCTCAAG ACCATTI'CCT ACIrrCAA TTATAGCTAG TCAAGCGTCG TGGTAACAAT CCACGCTTT AAAAGATTTT C-AGCGACCGT CATACGGGGA GCTGTCCCCA CGAGACAGGT ACT'rGGCACG CTTCTCGGGT GAAAACTTAG CGGATAGTTC CACTAGTTAG TGATAAGGTC CCTCrATAC CCAGCACCAT TrCCGCCCAA AATCGTGATA AAGTCCCGTT TCATTTAAA-A TAATCTTTC TTCATCAA.AG CCATTTTTAA AAT'rCTACAA T'rGCTGTCAT TTGCTTAACT TGGC-TCCTTT TTGGAATCAT GAGGCAGACT GC'TAAAATCA ACGCACTGTA TAAAGCCA.AG TGCGATAACT GCCCACACTA AAAATTGATA TAGTAACCAA ACGCTCTGCC AAGC7ICAAAC TCTrGA.AAAT 'rTGCAAGCCC CACAACGATA ACCCCGATCC CTCGAGACAC GAGCA.ATGAG GGCACCTGCA AGGGCAATCA CACCATTTGA
GGAGTAATTC
CTGATAAGAA
TGCTTGCCrG
CACCACGAAA
TCTTTGGATC
TGAGATCTTC
TGTTAAAGAG
CAAAAATTTC
CGATTTTGGT
CAAGATTG7'r
TAAACGAAGG
AGCGATAGAA
AACTTCTCCA
ATCGGCATAA
TPACACCAAG
GAGTTGA'rAA 3300 AATTrGCCT 3360 ATGAGTAATC 3420 CAAAGCAACA 3480 AGGTCGCTTC 3540 CTCAA'rCGGT 3600 AAAT'rCATCC 3660 CTTGGCGATT 3720 TTGGAAGACA 3780 ACCTAAAATA 3840 AGTTGATTTT 3900 TAAGGAAACA 3960 TGCATTTTTT 4020 TGCTTAAATG 4080 TAACT7,C;TAT 4140 CCTACAACCA 4200 ATAATCAAAC 4260 CCTTCTTGCT 4320 CCCATGAGCT 4380 CCTGTAGCAA 4440 ACAATACTCA 4500 ACATCCTGAA 4560 ATGATTGAGT 4620 'rrrGTATAAA 4680 CTCGCTAAAA 4740 GGGAAGGAAC 4800 CCCAGACCTA 4860 T'rAATCCTTT 4920 9**9 9 9 9 CCATGCGTCC AGTATGAATC TATAGGCTTG TCCGAGTT'A CAAAGATGAG ACC'TGTCAAG TTrGCTTGGT TCCAAGCAGG GACAAGAAGT CATCACCAAA GAAGGCCTGC TGCCATTCCA ATGGGTrCAC GCCTT'rGGTT CTTCTGTCGT CATATCTGGA GAATAGCCCA GACAAATCCT CCGAAACTTC TAGCCATATC AGGATTATCC GTGTCCAAGA AAAAGAGCAT GAGACCAATA AGTTGATTCA AATCCGAATC AA.AAGGCAAA CCTAAATTCG CACGTCCCAT AATCAACAGC ATCCCTGAGA GCAAGGTTGG GATCTrrCCCT GCCAAACAAC CTGCTCCTAC AGCAACAACT ATCAAAGTGA CAGCAACAGC TCCCCCAAGA AAGTTTAAAA TCCTAAATCT CATAAAGATT TGAGAAATAA TGGAAACAAT CATATTTTAT 1070 CTATATTCAT C7rTT'AAAA AATGGGAAGA GTCTCCTCCT GACTTGTCCT GCTTCTTTGA GAACAGACTC AGGAA'rAGTA TrTTTTATrG ATGACTGACT TACCAGTrGA AAAGACATTG TGCACCTTTC AAGACflTGCA CAATCATTTT ACCTGTTGCC AATTACAACT GATGCCAAAC CACCTACTTC TACCATAGCT TTTCTTAGAA CTTTGATTGC TAGAGACAAC CGTTGGAAAT AATTGGAACC CAAATAGCAT CTACCITGCT AGTCATAACA ATTTGTTGAA GGAACTGCAA AT GTT TCCAC TGTCAGACCT
AAATTC
INFORMATION FOR SEQ ID NO: 167: SEQUENCE CHARACTERISTICS: LENGTH: 9711 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear CCCTACCTTA TTTATTCGAT ATACCTAGTT Cr=TCTAT ACTGGGGTAT CGGCTGGTTT ACACCAAGGT CATGTTGGTC GTCGCACTGG GATAAATTGG CCTGATGCAA TGGTGTTATC GTGACAGTTG AGGCAATTrC GCCTTTTCAG CATAAGCCTT 4980 5040 5100 5160 5220 5280 5340 5400 5406
4 St *4 4 4 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 167: CAGCTTGCTC TTACTATTAT AGCAGATGTT ATAGCTGGAA TTATCTTGTA TTTCGTCTGC AAATGGCTAG ATGGTAAGAA GTAGACCGAA TGACTrAGCCT ATAAACACCC G-TTAAATCGC Sb *S S S 4S4S 4 5.55 4454 S* .5 S S TAAGATACGT CAAAAAAGCC CTTAACTATGCGCACTAGTTA AACCTTATAC ACTAACTACA TTCTAGCATA TAAGCCCAGA rTGTTTAAAG TTCTGAAAGG TCTATAATGA AGTTAGCCAT AGCTCTTATG AACTAGTCGA TTCTCATCA ATGCGCCAAC GCCAGATAGG TTATCTGGGT AGTAGGTTGG CCAGTTGTCC TTG.GCTTGTG CCTCCAAAGA AGATATGGAA ATGTTCTGCC GTCACTAAAC TGAACATACT TGAATTGTCC AGCGTCAGCA ACGCACGCCA CGATTGCCTT TCTTGTAAGT CAAAATTTTC GTATTTCTTG CTTTGTCCAC CTTGAACAAA TTCCATAGTA AGTCACATCT GTATGATAGC CTTTGTATA GTAAGCCTTG ACCAGTCAAC TTAGCCTTGT AGTCAAAGAC TTGGTCAAAC ATAAACTGAT TGCCAGTTAC CTGCATAGTC ACTCAAGGTG CTCGAAGTAA CCATI'TTGGA CTGTCT'rGGT ATCCTCTGCC GGGGCTTTGG TGTCTAATG TATTTCAAGA GTTTTATTTA CTAGTATCAA AAAACCGACT ATTTCTTGGG CGATTTCTTG ATTTCTTCAA AGAGGGCTTC TTAACTGGGG CAACATTG, 0 TCTGTOCCTT CAAAGAGGAA TTACCCACAT ACTTGTAAGT TTATCACTAA TGTTAATCTT TACTCAGCCT GGGTCATCTT GTGCCGTCTT CAAGGAAAGG CGGTCCTTGA CAGCTGCATC TTT'TCAGGTT CAATTGCTGG 1071 GCCTqC?1'GG TCT?GTT GTTr.ICrCCA GCCTrGGTGT TTGACACCT GCTTC?1'TTG TAGATATAGG CGATTI'ATT GATGGCTCTG CATCTGGAGA GCAAGATAGT TAAAGGCTGC CCT'rCTGCGT
TCAAAGGTCT
AGI'TAATGC;
TCTTCTCCAT
CCTGTCGCCT
CATGTTCA
GCCTTGGCAG
TTAGCCGTAT
ATATTGAGT
AGACTGGCTA
ACTAAACTGA
GTTTCTGCAT
GCAAGCAGCA
ACCATAATGG
CCTGTCACCA
GTATCTTCGT
ACAACCCCCG
CCAAAGAGAT
AGAGAAACAG
TAAACCGTAC
ACAGTTGGAG
AGGGTATCAC
AAAAGACTCA
AAGCCTTATC
CTI-rTTATC
CACGAACTGG
GGTCATI'C
TGATGCGTT
TGTTTCATT
ATGGTTCGTA
CI'CCTGCGAC
GTrCAAAGC CrrGAGGrT cCCr7CTGT CAGCTTCIrc AAAGrGTGTT AGCAAGGGCT TTTCTTGACA TACTCTGTCA AAGTCCTG.AG ATTGCGACTT GTrGTGAGrrC ACAAAGCTCT CAAGGC7TGC AATTTTTCGA AGGATAATCT GCTGACAAC TGATXACCAA ACATGGGGGT TCCCTCTTCT TCCTCGCCAC CAcrTT7'rC TTATCCAAGG TTCATAAACG AAGGTATCTG rrCATGAGGT TCTGTCCCAG TTGCT'rGGTA AATTCATAGA TACCATCTGC CTGTTTT1'A 'rTGGAACAAG GTAATAAGCT AATTTI'TC ACGTTCGTCT 'rTAGTATAAA GACAGTTACA AAAATAA'rGG AGTAGGAAAT GTAAAGTCCT GCTACCATTC TAACCGATTTr AAAGTTTC CCCAGACGCA TCGATACCAG AAGAGCTCCT GCTGCAGGAA TGTTAAAAAG AATG.GACATG GTACGAACTC CAAAAGTTAA GATATACATA GGACGAAGAA CAATGACAAA GAGGGAAATG ACCT =rCTT A"rGGTCCAA ACTCATTGAA CTCGAGCTT CCAGACCTGT TGACATGAGG ATAGCTGTCC GGAGATACTC CAGAAAGACC GCCGCAATCA AAATCCCCAA AACCAGACCA AAGGCTACAC TCATCAAACT CTGACGACGC AAGATGAGGA 'rAGCAATAAC CGCCAAAAAG GCGCGTTGTA TTCTCCATCA CGGAALATGTA AAAGGAT'rGA GGACATCAGT TGTG.AGGCAT TTC~rTCAAAA ATTCTGCCA6A GCGAGCAGCT GT TTGAGTCC ATAGI'CCAAG T~rTT'TTGC TTGAGACAAA TATAGGCAGC 'rGCA'rTCTTC TGTCGCGGAT GTGCTCTACT CAAACTCATG GTGATGACCT CTGGCAAGAG CAACATATCG TATCTAGCAA TTTAGGTACC CATCTTGGAT TT'rGGCAACT CACCGATTAG GAGTTCTACA CAGGGTAAAA GGTTGTCACG CCACTAAAAA CAAGGCACAT CCTATTTGAT AAAACGTCTI TAATACTTGC ACTTGCAGGT CCAAAALAGCC AATCGCACTG GGGCAATACT AGCTGGCAAG TCATAAGGGC AATAGCCACC GCAAGCCATC CACAAAGGCC AGAGAAAGGT CAAAATCAAA CACTGATAGT CACGATCGAA TACCCTTGCT CATGACAATC CGAT'1ICCAT AAAGCTCTTG AGACAATGGC AATAGTAGAA CTGAAAGTGA GACGTGGC1'A AGGTTCCCAA TACCGGTGAG TA.AAGTCGTA AGATAATAAA 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 1072 CTAAGC-ATGG CCCACCTCCT GGCCATTCTC ATGAACATTG AAACAACCCC ATGGCGAGTC 7"rGGTTACGG AC'rAGATGAA TATTGCGATC CGCATAATCC ?TAACTTCTT CAGGGTCATG
GGTAATCATC
rCATTTTA
GTCAGAAGCA
CAAGCGTTTG
CTCATCATGA
AAA'rTCATAG
GGCTATTCTC
TGGT'rGCAGA
AGTCAAGGTA
CTTATCATA-A
ACTAAAGCAG
ACTTGTTCAT
CGAGCCAAGT
TCCAACATCC
TTGAGACGAC
AAAACAGCCr TGCCATGATG ATGGGCGCTG TGGTGCATGA GTTCGTAAAA CTTCCTGCAT CCATCCCCGT TGTCGGCTCG TCTAGGATAA ACACATCACG AACATACGCG CAAI'TACCGC TCTCGATGTT CCCACATCCC GCATrCAAAC GACGGAACCA ACCGTACTTG GAAAACCAGC AATTTCTTAC CTTGCGTATT TCGCTGCTTT TGTCCCCCAG ATAGACACCC
AACTC.AGTCC
GCCTl'rTCTC
ATTAAAACTG
TGTCTTTGAA
AGACI'AGCCT TGATATGC'rC CGATAGCGAC CCGACTTGAC GCAATI'TGTrT GAGGAAGATA ATAGCCACCT TTCCAATGCG rAGCCGCTC CATTTTCCCC ATATGTTCAA GAACAGGCTC AT1TCCAAGAC TAGCCTTGAT GAGCGTCGTC ACAAATTCCC CACTATCAAC ACAATAAT'rG TAGAAGGACA AATCCTCTAC TCAAAAACCG CTGALATCACT CGTAATATAT CTCATTATTT GATTTCTCCT TTTTGTTCAT TTGGAGTAAA CTCAGTCGCC AGGTT'AAAAG TGTATGCTCA TGGTGATGGT GCTGCTCCTC CAGTCAACTG ATAAAAAATC ACACGCGCAT CTTTAGAATC CTTCCTGMAC CAAAGACTTA ATGGCCTTGG GGGCCAATTC TGAATTTGTT AAAGATTCCT TGCTCCTGAG TA'NTGGTCAG GGCCACCTCG TGAIIT=CCG CCTGCAAAAT CACCTCATT1C CTCATATCTG ACTCCTTTCC 7TTAGACTT GACATTT'TGT TTACCAGTTA ATTATATCAC AAACTAGTTT CATTCTTGAA CTCTTCTATA TCCATCATAA GTCCCCCAAT CN'TGCTGAA 'rGGTGTGGGA TTGGA'rAGGA AAGGATCAAC ACCAAGGTGA ATGGTGTCCT TCATAAAGAA TATATTGGTA AAACCTTGAC TTTCAACTC CATA'rCCTCT CGTACACCAG CATAGTTCA'r AATCGGGTTT ACCTTAGATT TAGAAAACTG TGGCCACTTG AGATAGCTAA AGCTTTTCTG GATCTGCTCA T'rATAGAAAT AATTTTCCAT
CTAG;TGCAAT
AAAAAAGCAT
CTC7?NTTTA
AAGCAAAAAA
'rrATATTATC
AAAGCGCTCA
TGCCTTGTCA
AGGCTCCCCG
CTAGCGAATC
CCA''rTTrA
TGTTAAAACC
AGAATCCI'?T
TCCCATCTCA
TA.ACTGCCGC
CTGACAAGAG
GACCTATTAG
TGATATCCTT
AGAGAAAAAT
AGAGTCAAGA
TATTGAAAIT
TTCAGATGGT
APAGCCAACC
CCG'rCCTTAG
AGCGATTGGA
'rTTAGATG'TT
CTGACTGACA
CATA.AGGATA
GATTTCAT'GC
TGCTAGCrGT
ACTATTCTTT
AAAAACGTGA
CTTTGACATC
AAGTCGGAGC
AACCCAACCA
AAAAATCTGC
2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 TTCTGCACCG TTTGTTGGTA 1'TAACAGGTG GAATGATAAA AACTGCAAGT CATTATACTC AAT'TTC'rTCA AcTCC TTATTGGAAG TA'TT=C AGCATCTGCTr TTGACAACAT 1073 C7=TnGC CTGATAAGAA AACTGGTCTG GCAAGAT'rr TATCGTACTT AACATAGCCT CTAACCGAAA ACTGACCAAA TAAATACTTA
AAAGGAAGCT
CGACAATTCT
CTGTTGCAGT
AAAACTAGTC
GCTACATGCT
TGGCGTTCAT
TCTTTACTTG
AAGCGAGTCG
AACTGGTCTC
TAAAACGAGC CAATAATTCA ATCATTTCAT CCAACrrCTG AACCAGGTCC TTCATAGCTA CTGCATATTG ACTAGCCTGA TCCCCAGATT CA7"TAAAATA CTGCTGGAAG GCTGCTGGAT
TGTCTGCTGT
CGTrTTGGGAA
GATSTN'TCAG
CATAGCCATT
TTNACTGAAC CACTQACCGr CATCTGTTGC ATI'CCAAAAT AGGACGGTAG GAACGATTGT AACCCATI'CA CTAGAGCCAA GACTrTTrGA CTrTCGCTCCT AGCTCCTAGA TTATGATGCA ACCAGCGATC AAGACCGGTC CAATACCAGC TATGATTTTA GGACACGAAT GTCAAAACGG
AGATAACATA
ATTGGTTAAG
ATTTCTCAGC
AGAAGGGAAC
TAAAACTATC
TCTCAGTAGG
CGAAGATCAT
TTAGCTGTAT
TTCTCAATCT
TAATACCGCA
AAAACGCACA
GATAGTAGTA
ATAAAAGAAA
CCATAAGCGT
TCCAGTCGTC
CCACAATCAA
CCATCATGTC
TGATATCCAT
AAGATTAAGA
GGATGAGCAC CG'rCAAAACG TTTGGATCAG ATAGTGCTCT GCCACTrGC'?G AACGCTrC ATGAGCAGAA AAACCA.ACAA TTAAGCATTT TGTAGCTCCA ACGACCAAAC TCTGTTACAG CTCAACCGTT CCCATACTAT AGAAACATCT TCCATAAACA A=TTATTC ClrTTATTTT ATGACAACAT GACAACATGG CACAACTTGT TTATTCTCCA GCTGTGGTAA CGATGCAGCT CCCCCCTG;TC CTAAAAGATA 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 CCAAGACACC TGCATCAAAA AGATCTTCAT ACTCATCAAT AATTrTCAATA ACI'TCTGATT TTAAACCATA GATTATTCAA GAATCCAGAA AAAGTGACAA CCATGCCAAG CAACTGAATC CAGCGATTCT CAGGTAGGGC AGCCTTCCCT GCTTTTTTCC GTTCCTTATT GAGCGrTT TTCTTGCGAA CCCAGGCATC ATTGATGACC
AAGCCTAGTC
CCCATAATCA
ACT7TrCTTC
GACAGACTCA
TTAAAGTTGA
TAACCTGCAA
AGATTAAAGA
CATGAAAGAG TCCATAGGCG ATArAGTACC GCA'rATTTAC AATGTAGGCC A'rGCTTGAGG TG1-AACAC CATCACCATT CGCATAAAGA AGGTCACACC ATGCCAAAAT 'rTACATTACG ATSTTrAAAG CAAAGTCACG GXACCAGAAG
TATGCCAGCG
TAGGGCTACC
AGTCAAAGAA
AGCCACCTGA
ATTCCAAAAC TCCTTTAAAT CCCTTGATAA AAAGGGCTTG GATTCCCATC AAGTTTGACA TGGCCAAAGC AAACATAGAA GAGTTrCCAGA CCAAAAGTAT ACATAACTCC CAAGGCATAG CTGCAAGGCT AAATTCTTCA GAGGAGGTAG TAACGTCTCT CCTAAAACAT GAGCTAGGAT AAACTTATAC AAAAAGCCCC ACATGATATA TCATCCAGCA TATCCATCAA CTCATCTCGC 'rCAGGAATAG CCTCATAATT
CCCACAGAT
TTCATTAAAT
CGCTTAAAGC GATCGATTGG ACCACTCGAG AATTCCCAGA GGGTAAAATC CTTAATCACT GAACGAAAGG TCAGGTAAGA AATTCCCAAG GCTGGTTSCA CCTTGACAAA GATAATCGCA ACCCACTTGC CATCCTTGCT T'IT1TCCATAA CAGCAAAGGT AAATACCCAA GGCAGCTAGT ACAATAAAGA AGAGACTTAC CAACACTTCA CCTATAAAGA TGGGCAAGGT TGCAGCAATC GGCTC'TAAAT GAG GAAGCTG TTrGAAAAAAC ATCCTTTGAT GTCAATCTTT CCATTTGGAG TAGATGGCAT CA'rATAGGAC ATCATGATGT TATCGATATC 'rCGC'rCAAAC TGCTCACGAA 1074
AAAGTTGGCA
CCATCTCTCA
AACCCAAGCA
AGTAGGGACA
'rGCTTGTAGA
TGATTGGTCT
TACCAGGCAA
ACATAAACAA
TCCATCATCT
TTAGTGGCAA
CTGTCAGGTC
CACCGTCTTT
GATTTTGTAC CTTGTGGTCC TTGTTATAGC GCGGTACTGC GAGACTTGTT GAGGTTTTGA GAGACATCTT CTAACTCAAT TCTGGAAGTC CATGCGTCCG CCGTAGAGAA GCAAGCCCTC CGCCTGTGTG ATAGGCTIGGC AGATCTTCAA ACTCAAAGAA GATTGTTCAT ATAACC'rTT GAAACAGCI'G GCCCAGAAAC CATTTGGCAG 'TTATITTCCT TCCTCGTCAA TGATAAAGCT AGCCGATTGG TAGGCGTTTG AGAGTCGCTA ACATCTCGTC GAGCTACTGT CGCTTCTGTT GGGCCGTAAG CATTGATGAT CGCGCAGTTT TTGAGCTGTT TTGACCGTCA ATTCTTCACC TTCCACGCA'r TTTCTCACTG TTGAAGTATT CAGACAACAT GTGTTGATGT CCAGATAGCG AT'rGGCAATG AAAAGATAGC CCTGAGTGAT GACTCAAGGA AGAGTGAAAA GCGTACCACC AATACATGAC AGACAPAGTCA AAAGAATAAC GTGGCTGTGC GTGTCGCAAA TTCCrrATCC GTAATCATCC AGTTTGTAAA AAATCTGCAC TCCCTTAGGC TTACGAGTCG TACCAGA.AGT CATCTCCCTT GACTGGATGC GTGATTTCAT AGTTATTCCC CCTGAGCTAG ATTTATCATT GCTGTAGAAA CCTCCCAA TAATCAAGCT TGGC'TCTGCT ACTTCTAAAA TAGCTGAAAC TGAAGAGAAG GAAACGCACC GCTCGATGAC AATCCAACC A.AGACTGCCT TCCATTGATA GAAAACTAAC TAAGI'AGAAG AA.AGCAGGAG CAATATTTCC TTCCACCCAC CAACA'rGGTG AGCGTTTCTT GAAAAAGAGA AATACTGAGG ATTGCCATAT CTTATTCACC TCGTTAATCA ACTGTCTCGG TAAAGGAATT TTCCrT7GATG GCC7=GGAA TAACATGACA TAAGCCAATA GACAGCAGAT TCGATAAAGC GCGGTAACCG TTAAACTTAA ATCTGTCATG GTTCCCACAT GGCT'rCTGCT CT'=TCAG AATGATIrCT CCCTGCTCAC TGGAGAATCA GCCTTGCTAT TGTCACGGCA ACTGCTGACA ACGGGCATTT GGGAAACGCr ATCAAAGTAG AALATGCGTGA GGCCATATCT GCAAAGGATG CGCAAAGAGT TGCTTAAACT AAGTGCCAAG GTCGGTGCCC CAGCATTTGC GGACGACTCG GCrGAGGAGA VrATCATG'rG AAAGATAATG TAGTAATTrAT T'rGGGCAAAG GCTTCTTGAA GGGAAAGGCT GAAATGGCA.A TCGCTCCAAG GCCGAATGGC 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 ?800 7860 7920 7980 1075 TATCAATTGG AA'rGTAGGCA T1GACCTGACr TAGTCAGCGC TACAAAGr GCCAACATrr CATArrCTG GCCACCAAAA ACAACCACAG GAGACTrCT AGGCAAGCCT AG7rGGCAA TGACTGCAGC CAAACTATCC CAATCAGCCT TTAAATCGCC ATAAGTGTGT TCCI'GCCCCA AAACATTATA GACAGGATAG CTAGGCTGTG TCTGAGCAAA ATGCTCAATG G7r'rCAATCA TATCTGCTAT TGGTTTATTT GACACAATAG GATAAAGCTT CCTTrGACCCT GACCAAGATA AAGAAAATAC AAGGCTrGTCC GACCAAGAAA
GGATTCTCCT
GCTAAA-aLAG
GAGGTACAAT
GAAAAACCAT TCATTTCTGT AATrTTTCCC TAAAATAAGA =TACCAT TGTACCACTT TATATAGTAT CllTTTCAATT AGATTTCAGC TTAT=TAAG GATTATACAG TN'?TCTATG TCCTGCTTCA AAACTCCATT TCAGGAGACA ATGAAGTAAA CAATATCAAG TrTTTCAAC ACCTGATACT ATGCGCTTTT AACCACTCTC TCA'rTTAAAA TAATCTCGTC 'rGATATAAAT GACAAATGGC TGATAGCCAA AAACTGATGC TAATACCAAA TTAGCAAAAC AAATACTGAA AATGCTAATG TAGAAATCAC TAACTAA.ATG ATT?'rCCTCT ACTGTTTCCT GAAGAAATAC CTTGCGATAA CATACCAACT AAAGCrGAAA ATAATAAAAA ATAGAATAGT CAGTGTCACT ATrTCCATAG C'TACAAGAGG AAATCATTCA TACCTCTCTC AACTAGA'rGT AACTTACAAA CTTTCTTCCT CCTCATGAGG rCAGTTAC TrCTGCTGT CTAGATTTCC TCAAAAGGGC AGACTCCTCC C'TTGGTGCGT ACTGTTTT AATGCATCAT TAACGACGCT TTTCT'rCTAG AGATTCAGGT TGACTTTTCT AATCCTAGAA TAAAGTGCTG TCAAG7TrAAA ATTCATTATA TAAAGCAGCC CTAGAAAGAT TC7TrTTCTCT GTTTCATCAA GTGATTC7'rA CTrAGCTTATT GT'rrACCGTA TGTTTCCAAT TATA7=TCA AATAGAGTGA TCTTCCCATA ATAAAACACA CTGA7TTTA AAGACTTTTT TAAAATAGCT 'rCTATCATCA AC=CCAGTA ATATAGCTCA TTrCA.AGAACG GAATAGACAT ACTTTCAGGA ACTTCTI'rTTA CATCTGTGCG TTTGGAAAAT AAAAAGAATA C~rrCCCCCC ACCCCTGACC TCATGAGCCA TCCAGTATCG ~TTCCTCG CACACGATTT TTTCATCTCG GTGGTTCATA AGGAACAGGA AAAACAATTC GGAATAGGCA GAACACA'rTT TCCCACCACG 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9711
TAGAGACTAG
TGAAGAAAAA
TGTTTGAGAA
TTGATTAAGG
AAATrCTCGG
ACAATTTGAG
GATGGCGGAA
AAAGATAGAG
TGAACTATC
CTTGACCACG
GAGCTGCTTG CcTCCTG=r GCGTTTGATT GTTAAAGTTT GGAAGTCACC TCCAGCTAGA ATTGTAGGCG ATACAGCTCA TCATCATACG AACTTCGTTT CGTTTTATCG CCAAAAAATC CCTCCTCAT CTCCTTGATG TCCACGATAA AGCTGAA.ACT GGTCTTGGCT gTTCCACTCG ATAACATCGT AGAACAACTA TCCTCM C TCATATTTGT AACGAGAGAA 1076 INFORMATION FOR SEQ ID NO: 168: SEQUENCE CHARACTERISTICS: LENGTH: 3025 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 168: CCCCTTTGTC AAAACTGTAA AATTAACGAC TCAACAATTC GGAAAACAAA AACAAATTGA CCTCTGTCAA AACTGCTATA AACAATAGCC TCTTCAAAGG TATGACGGAT CTGAACAATC GATTTCTrCA ATGATCTAAA CAATTTCAGA CCTTCTAGCA ACCCAATCAG GTGGAGGTTA CGGTGvGAAAC GGCGGTTATG ATCTrTACAC CAATCTCAAT AGATTATCAA AACAGATCCT GTGACTrCGA TCCCTTTGGT ATACTCCTCC TATTCCCCCA GTTCCCAAAA TCGTGGATCT C. GCTCAAACTC CGCCACCTAG ACTGAAATTG CCCGTCGTGG CGTGTCATCG AGATTCTCAA GGTGTCGGAA AAACGGCCGT CCACATA-AAC TCCAAGGTAA ACGGGGATTC GAGGACAATr' CCAAGAAAAA GGCCTGCTGG AAGAATTTGC TATTAATGTA AGACATTGAC CCCGTTATTG GGCGCGACGA TGAGATTATC 'rCGTAGAACC AAGAATAATC CTGTCCTTAT CGGTGAACCT TGTCGAAGGT CTAGCTCAGA AAATTGTCGA TGGCGATGTG ACAAGTCATC CGTCTGGATG TGAAGAACGC ATGCAAAAAC CGTGAAGACA TCATCCTCTT TATCGATGAA AGTGATGGTA ATATGGACGC AGGAAATATC CAACTAGTCG GTGCTACTAC CCTCAATGAA GAGCGTCGTA TGCAGCCTGT TAAACTCGAT CTCAAAGGGA TTCAAAAGAA ATACGAAGAT AmTTGAAGCAG CTGCAACTCT TTCCAATCC
ATCCATGAAA
CTCAACCCAG
TACCGTATCA
GAACCAACGG
TACCACCACG
TGGTTAGCTT
TCATGGAAGA
TTGTTGGTGC
CCCTTGCTCG
TTGAAAAGGA
TGGACGAAAC
TTCAATATAC
AGTTCAAGGA
A.ATTCGCAA
TGGTTCTGCG
TGGAGAACTG
TGCTGCCCTC
AATCACTATT
AGATGCTGCG
GCCTGACAAG
TTTTGTGGAT
AGCTACACGA
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 TACATCCAAG ATCGCTTCTT GCCATTGACC TCCTAGATGA CCTAAAGTAA TTGATCAGCG GAAGAAGATT TTGAGAAGGC CAAAAGAAAA AGATCACAGA ATTATCGAGC AGAAAACCAA.
AGCTGGTTCT AAGATGAACT TGACCTTGAA CTTGATTGAG GCTGAAAATC TCAAGTCTCA
GGCCTACTTC
CCAGGATACT
TATCCCTGTT
CGCGACCAGA TGCCAAGTA TAAGGAAATG CCTAGCATCA GCGAGAAAAC TATTGAGCAC GGTGATTTGA AAGAGAAAGA ACAATCTCAA CATGTTATTG GTCAAGATGA TGCAGTCGAT CTCATCCATC TAGCCGAAGA TCTCAAGTCT CCGCCCAATC AAGATTGCCA AGGCTATTCG CCGTAATCGT GTCGGACTTG GTACCCCTAA CGCAT 1077
GGAAGCTTCC
GCTATCGAAC
GAAAAACATA.
GCTGGTcAAT
GTGGAAAAAG
TCI'CGTTGG
TTI,'TGGTTC
GTITGGCTAA
TAACTGAAAA
CTCACCCAGA
5. 55 S S TTGACAGACG GGCAAGGACG AATGCAGGTA CAGGAAAGAC ACCAATCTG TCCTCGGTGA GATGGCA'rTA TCGAATTrAA ATGCTAGCAG ATG=rAACAA AAGGTCAAGG AAAAGTTGGT CGTCGGACrA TTCAAGACTA AGCGAAAAAG ATCTCAAAGC AAAAAAGCTG AAGTTAAAAG.
AAATGAAATr 'rrTCTGCTTC CT'rTGTCCAT TATGATATAT CATTTATAGA AAT'rAANTT TATAGTCAAT TGAAACAAGA AATACCTTTr TGAGGTGCTT GGTGAGTAGG GAGGAAGAGG TTACCCAT'rC TATGGAATCT GGTAAGAGAA ACTTCTGAAA GCCAACTGGT GTCGGTAAGA CAGAACTTTC CAAACAACTG TGCTGATAGr ATGATTCGCT T'rCATATGAG TGAATACATG GTrGGTCGGC GCTCCTCCAG GTrATGTrGG CTATGATGAG AGTTCGCCAC AATCCATA'rT CTCTCATCCT TCTCGATGAA TGTrATGCAC ATGTTTCTTC AAGTC7"GGA CGATGGTCT CACCGTrAGC 'rrCAAGGATG CCATCATTAT CATGACCTCA CGAAGCTAGC GTI'GGATTTG GTGCTGCTAG AGAAGGACGT ACTCGGThAC T*'C=TAGCC CAGAGTTTAT GAACCGTTTI' CGCTCTCAGC AAGGATAACC 'rCCTrCAGAT rGTCGAGCTC GCGCCTCTCT AGCAACAACA 'rTCGTTTGGA TGTAACTGAT TGACCTAGGT TATGATCCAA AAATGGGAGC ACGCCCAcTr TATTrGAGGAC ACAATCACTG ACTACTACCT TGAAAATCCA AGTr'ATGACT AGCAAGGGAA ACA'rTCAGAT TAAATCTGCC TTCTGAAAAA GAAAAATAAA TCCTATAAAA AAGGAGTAGA TTTTrrTACT AAAATAAcrG TAATCTTG ACAGCTTGCC AGTAGACTGA ATCTGAAATA GTACGAAACA ATTGCTAAAA ACTI'TCCCAA TCGA??TG'PT C'TCATCTTAT 'rTCAATCrGC ACAAGACAAA AGAGCCTCAT AAAAGGTATI' GCAACTTGGT T TTGATATGA GCCCATGT1'T TCTCAATAGG ATTGTACTCA 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3025 TAAAAGTTTA TACCCAAACT CTTCACACAA TGCA'rTATCC ATAATAATAA CCGATGG'rGT CCAAGCTTCA AAAAAGTrCGC 'rCGTCATCGT
GAGTTCTAAC
GGTTAATGT
CTCTTCGTAA
GTCAT2'GGAG CGATTAACTC ACCATTCATT TGTTAGACC~T CCAACCAAAG AAATTCTCTC ATATCTCTT CCAGATACTT 'rGCCTCTTCT TAAC'rGACCT TTTAATGAGC GACCATATTC TCGATAAAAA TAAGTATCGA ATCCTGTrC GTCAATCTAA ACAGGTGCTA GGTGCTTTAA ACTATTAAAA TTCTTAAGAA ATAAGGCTAC 71-NrCTGGG TCTGNTCAT AGTAGGTGTA GTTCT TTTT? TTTTCGAGTG TAGCC INFORMATION FOR SEQ ID NO: 169: SEQUENCE CHARACTERISTICS: LENGTH: 4104 base pairs 1078 TYPE: nucleic acid SrRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 169: TTTAAGGTTT TAAA-AAAGT TTTCGAAAGG CGTTGATATC TAAATCGTGG TCAAAGCCGG TATCATAATC TAAATCAGTT N'AGGACTGC
TTCTWPCTT
CAAT TTCC
'CTCCAAAAA
CGGAATCCAA ACAGAGGTAA ACTTGCCTGT ATCAATACTG ACCAACGTAG ATGCCGATGT T'PTTAGCACC CAGTGATGCT GACACCTTCG TTCATATTAG ACATGGTT'r GTCTrCCACG GC'rGTAAGGA GAGGCAGCAT TGTAGAAAAC TTCGGCAGCC TTTTCCAGCT ACATAAGCGT AGACAGCAAC TGGGACATTC ATGACTCTTA TAGGCCTTGT CTATTCCATT GATAAATGAA TTGAGCACCA CTGTGA.ACAC GAACAATAGC ACCTGAAATA GTTGATTTCC TCAGGACGCT GCCAGCCAGA GAGGTCAATA CAAAGCCTGT GCTTCAATCT GTGCTATATT GGATTTTGTT AAGTGGGCGA TTGATGATTA AAATGAACAT CATAATCCCA TGGATGAATT TGTTTTrCTCA TATCTTATAA TTCTACCCTA AAATGGGTTA AGGAAGAGAC TTTAGAGCAT TTTTTCATTC AATATGGTAT AATAAAAGGG AATTTCTACA GAAA.AGAGA.A TTATTTTAGC AGCGGGTAAA GGGACTCGCA TGAAATCTGA AGGTTGCGGG TATTTCTATG TTGGAACATG TTTTCCGTAG AAAAGACAGT AACAGTTGTA GGACACAAGG CAGAATTGGT AGACAGAATT TGTGACTCAA TCTGAACAGT TGGGAACTGG AGCCTATCTT AGAAGGTTTG TCAGGACACA CCTTGGTCAT TCACTGGTGA AAGCTTGAAA AACTTGATTG ATTTCCATAT CTATCTTGAC TGCTGAAACG GATAATCCTr TTGGTTATGG ATGCTGAGGT TC'PTCGTATT CTTGAGCAGA AGGATGCTAC AGGAAATCAA CACTGGAACA TACGTCTTTG ACAACGAGCG ATATCAATAC CAATAACGCT CAAGGCGAAT ACTATATTAC ATM'TTAAG GGAGAGATAA TTTAGATGTG TATTGGTGAA TCCTGAGTCT GAGCCGTAGA TGTTCTTCCA TGAAGTAGAC AGTTTTGCTC GAAAGTTTTrC TCAAGCCAAT AGTAACTAGG TTTTCCATTT CTrGGACACT CGCTTTTGAA GTTCAGTGAT GCATCATTTTr CTTTTGTCCT TTTGTGAGA GGGCATCGTA ATCGGTTTGT CTAAGTGTTT TTAAACGATT GGCTGTCATT AAAAAACTAA ATAAAATAAG AAAATCAAAA AAAATCAAAA AAGAGTGCGG AATGATTTGA GATTATGTCA AATTTTGCCA TTTGCCAAAA GTTTTGCACA TGTGGGAGCT ATCCAACCTG TGAGGAGGTC TTGGCTGGAC TCATGCAGTT ATGATGACAG TGCAGGAGAT ACTCCTTTAA CAATCATAAA AATGTGGCCA ACGAATTGTT CGTAATGACA AGATTTTGAA AAGCAAATCA TTTGTITGhAG GCTTTGAAAA AGACGTCATT GGTATT'ITCC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 .5
1079 GTGAAACTGG TGAAAAAGTr GGCGC1'TATA CTI'TGAAAGA TrTTCATGAA AGTCTTG4GGG TAAATCACCG T43TGGCGCTT GCGACAGCTG AGTCAGTTAT GCGCCGTCGC ATCAATCA'rA AACACATGGT CAACGGT'T A~C.CrrGTCA ATCCAGAAGC AACTTATATC rrGAGATTGC TTCGGAAG?? CAA.ATCGAAG CCAATGTTAC C?1'GAAAGGG TTGGTGCTGA GACTG?1-rTG ACAAACGGTA CTATGTAGT GOACAGCACr GAGCGGTCAT TACCAATTCT ATGATTGAGG AAAGTAGTGT TGCAGACGGT GTCCTTATGC TCACATTCGT CCAAATTCAA GTCTGGGTGC CCAAGrrCAT 'rrGTGAGCGT GAAAGGACT TCAATCGCTG ACAATACCAA GGCTGGTCAT GATATTrGATG
CAALACGAAAA
ATCGGAGCAG
GTGATAGTCG
A'rT=GAAC'? TTGACrrATA TCGGAAACTG TGAAGTGGCA AGCAACGTTA ATTCGGTGC ATGACGGCAA AAACAAATAC AAGACAGWCA CAACCATTAT TGCACr-AGTA GAACT'rGGTG TTACTAAAGA CGTGCCAGCA CATGC'rATT ACGAATATGC AACACG'rCTT CCTCATCATC TTGAAGAAAA AACGCTTAGC CGAAAAGAAA AAGATCAGGT TGAATTACCA GAAGGCAAGG ATGGGGCTGT CTGTGTTTTA GCAGTAACGG ACCGCAAAGC TATCGAGGCT G'rCTCTrACC
TTGAAACAA
ACAATTCCCI'
CTATTGGTCG
CTAAGAACCA
TCTATCAAGG
GAACTGCCCA
ATGAACAAAA
AAATTCCAGC
TGGAACTATT ACAGTCAACT TGTCI'TGTT GGTTCAAAT CGTTGGTGCT GGTTCAACTA CGGTCGTCAG ATCAATAAAG GTAGGAGCCT ATCATGGAGT ACCAATATT'r
ACGGGATTTG
ACTTATTG
CGGAAAATTG
GGAAGAAACA
C~wTTTrAAT AAAACACAGC CCCTG;TGGCA GCTGCCCTTC GTGAAT'rAGA GGAAATTAGA ACTCTTGTAC GATTTTTATT CAGCTATTGG
AAACTGGTCC
ATTTTCCACA
GTCAAGCAGT
GAACTAGGAG
GCCTATACAG
GAGAAGTTAA
GATGAGGATG
CAATCAGGTC
AAAAAATAGA
C'rAATAGACG 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 AACTATATTT AGCAAGCGAT TTGACAAAAG TGGAAAA'rCC GCGTCCGCAG AAACCTTGGA AGTCCTTGAA GTGAGCTTAG AAGAAGCGAA AGAATTAATC ATATCTGC'7A TGCCAAGACA ATTATGGCTG TTCAGTATTG GGAGTTGCAG GGAGGTCAGT ATGGCTAAAT CTTTATTAAC GGATGAAATG ATTGAAAGAG CGAAAAAAT? TCAGGTCCTC CTCTCTTCC CGTIrTGGTT GAAGA'rTCAG GTCGAACCAT TGTCTTCAAT TCTAAGTGA TGTTTTAGCA ATGAAACTTT CTT'rGCTAGA
ATGCCAATCC
CTATTCATAA
ATAAAATCTT
TGATAATGAG
TAAGGATCAT
AAGCCCTCGT
ATTTGCGGTrC
AAGGAATTGA
TCCAGCATTT
CA.AACTAAGA TTTTACCAAC GGTTTTAGCC ACGAAACCTT AT'rGAAAATA CCAAGAGAAA ATCT11TCTCT TGMTTGCT AATGAAAATA GGAAT'rATrG AGATAATGCC CAGGAGCAAG CTGCTATGCC AGAAGAACTG CCTTATCTGG
TTGTTTTT-GG
AAAGTGGAAT
AGGTGGATGC
GGGATGTCGT
ATGCT TATGG CTrCAAATCCA
CAGGAGATAG
A.AGTTI'TAGC
TCCCAGTCTT
TirGArGAGTT
AGGCTTTAGA
GAAAAGCTAG
GCAGAGCTCT
-AAGACACAG
GAATACCTAT CATACAGGAA TGGTAAGGTC ATGTCTGCTA CCT TATTAAT ACGGGTTCAG GATTGCTGAC AAATrAGCCT ACAAATGGCG CAACAACCGC AAAGAGTrTA TCTCAATTGG TTTGTTGCA GGAAATGACA CGTGGAGATG GAGGGGGCAG AGTCATCCGA GCTATGAGTG TATTATCGAA GCTGGACGTC TTAAGCGGAA ATTTGACAGT AAAACGTrTTC AGAGGATATT TAGGATTATC TCGCCAAGCA ATAAAAATGA CAAG
CCATTGCTTC
TG.AGTGTGrGC
CTGGGGCAGT
ATCATGACGT
TCATGAAGTC GTTCTTGTAG GATrTTGGCT GATCA7=TC AGCAGAAGGT ATCGCTGTTG GGATG'rCACA GCr'rGGwr TrrATTTCGA ATCAGACAAA ACCTTTGT'rG ACCAAAACTG GCATCTTGGT TTGATTGCTA AGATAGAAGC GAT'rAAGTCC CATrTCCCAG CTATTGCTCA AGCAGCGCAT GCCC'TCAATC ACAATGCCAA CCATGAAGCA AACATCTTTT GCTCTGCCCA AGTC?1'GTTG ACCTTTTTGA TTTTCTAGCr TATGATAAGA TTTrAAGTAAA 3360 3420 34800 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4104
ATGAGTATTG
ATCALATAACC
AAATGACCGT CAGTGAGATT G'rGTCAAAGA ATTACCAGAA
INFORMATION FOR SEQ ID NO: 170:
SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8876 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 170:
CACGGATAGG
ACAAAGTCCA
GCAGGATTGG
AA.ACCGTAAT
GCCTTI'CCCT
AACGTCCAAC
CTCGGCTTTC ATCAGTCCTC CAGCGATACG TnTGGGTATC CAACGATAGA GCTTGATTCG CCATCCACTG CTTTCAACAG TGAAAATAGA GGTGATAATG CCCAGTTCCT GATAAACC r AGGCTGATTT ACTA.ATAGCA ACTTTCCTCG AATCCTACGC TTACGCTGAT ACCTTTGCTG CTTGGAGTTA CTATTGGGCA AGGATGGTAC TTCCTTAAAA TCCCGATCCr TGTGTTGATA ACAGAGTTCA TGTCGGACAA T TTTCCTAAA GGGATTrAAA A TCCAAATGCC CATCTrTGGG CCTATTCCAC TGGACATGAT GGATAAAAGG GACGTAATCA GTCAGTTTCA TTTGGAGCTA TATCA.ATT CTTAACCCAA ACGCTCACCA GTTTGATAAA CTTGCGACTC ATATGGGAAA GAAAAATCGC CCACCTGTCG AACGTACACG TCTGCCGAAG TCTTCTAGTG AAACCTGCTT GGAGAGACAG ATTAACIT TCACGTICAG AATCTCCAAC TGCCACCACT TGACTAGGGT 1081 TATGGATGAG ACCGTCCTCA TGAATTCCC.A TATCAACAAA AGCACCGAAA TCAACAACCT TACCCACCAC TCCrCTAGC TTTTGTCCAA CCACTAAGTC crrGATATCT AGGACATCTT GGCGAaCACA GGTCCGTCAA AGGAATCACG GAAATCTCGA CCrCrTGA GAAGATCTGC AATGATATCT TTAAGAG1Tr CTGGACCAAG GTCTAACCT Ir.CGCCKI-rr CCrGACTGA AAGCGACTTG AG'rTGCTTT GGGCC?=CIC C rTAGGTCT ?rAATATCTA AACGTTTGAA GAGTTCCrrA ACTGCAGTGT AAT'rCTCTGG al'GAACTCCT GTA'rTrCAA GGATArrGCT ACTTTCAGGCG ATACGAAGGA AAC=TCTTrG A~rGGGCGC
ATTTTCAGAG
ATTGACATTG
TACTTTCTTC
TTTGACCAAT
TTCAACGGTC
ACCACTTCA
AAAAGCTTCA
TTGACCAATT
TTAACAGGA
CTTGGCACCT
ATAG7'rTGl
ACACCAACTT
TCACTGACAT
TCCGCAAGAG
AAGTCTGGAA
TTAACGATAA
CTTTCACGAC
AAATCTGCTA
TA.AATAACCT
GTACGAAAGG
AACCAGCAGC CTGCTCAAAG GCC~rGGCTC CCAGACGAGG GTGAAGTGAT TTrrCCrrCr TCCTCGCCGT ATTTGACAAT TGAGTCCAGC TACGTGTrCAA AGAAGAGCTG GCCTAGCTGT GGTTAACCAC TGTATCGACA ACAAAGTCCA GACTCTCAGA CGTGrrGGTA TTGACCGACA CCAATTGACT TAGGATCGAT GATCTTGCAA ACGACGGGCG ACTCCTCACG AGCAAGTTCG
CATAGCTGAC
TGGCCGTTCC
AATCTCTT
GAGTTCTCAG
CTGGTCAAA
TTCAGGGAAA
ATTTrCCAATG GGCTTC7'rCG
CATTTTTCCT
TCCAAGAACC
ATAGAAATGG CAGAGCGTTT CTGGCAGAAT AGACAGAAGC TCTTTCAGAA CTTCCGCTAC GCAATAATCT CTACACCGTA ATTTGACGAG CTGATGCTGG GTTGCATCCA CGACAGCTAG ACGCGCCCTT TCAGTGGAGC 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 AACCAAGAGG AGATTrGCGCA GATTGTCAGA AAAAAGTGG ATAGCTCCTT CTTCAGCTTT CTCAGTTAAT TCTGTCCGAA TACGACGCTC GATACCAGGC AAGACCTTTT TCTTAACGGA TTGCTCAACA ACTTCATCAA TATAAGCATT AAGAATACGG TCCGTCGCAT crITCAAAACC ATTGAGAGCC AAGGTACGAT AGCCTTGCAT AATCTGAAAA ACCTGCTTTT CATCAAGACT
TTCACCTTG
GATCTTCAAG
AGTTCCA.ACT
'rTCATCCTTG AAACGAGTAG CAAAGAAGCC ACACCAAGTT TCTCCCCACG
GTCTCTCAAA
GCTTGAGAAG
CTGTCTCAGC ACTTCCTCAT AAGTCATAGA ACGCAAGGTC ACATCTTCCG GACCAAAATA TCAACTGCAC CGGTCAAGGC TTCCTTGCCA GTCGCAAATC GAACMTTCA GCTTCTTTCT CTAAGTCAAC TATATTCTGC AAAATCAAGC
AATCATAATA
TAAGTTTAGA
ATAAGGCTTC
CTTCACAGAC
GAGCAAGAGG
AAAGAGTCCA GCTTCACGGG CAATGCTTGC CTTGGTACGA CGCrTTTCCT TATAAGGAAG ATAGAGITT TCAACGCTrC CTAAT7MT'C GGCAACTAAG ATAGCTTC?1' CCAATTCCT GGTCAACTTA CCTTCTTCTT GAATC~rAGC TGTCAGACTT ?1TATCCAAAT CA.ATAATAGC CATG;TCCTTG CGATAACGCG CGATAAAGGG AACGGTATCA ATIrGCTTTA ACGTCACTCc ATCCATAAAT CTATTATACC ACAAGCTAAA AAATATG'rAG GAAATAGATT TATATCCTAC CCACC7T ?rTAACCAAGC CA'rGATATCA AGTTCTGGAA GATGTTCCTG ACCTGGAATA CGATTGACTA AAATTAATTT ATTAGAATAA TCACTAGCTG AATCTTTATT ATCAAATTTA 1082 TAAGACAGCT TCCT1'ACGGT CAr'rGAGATT 'AATCGCC ACCTCATCCA GACTACCAGT AATAGTCGCC CCTTCAGC2'G TCAAACTTAG CAAATCCTGA GAGA7T'r CArATTT'TTT CGTTTCA.AAT TAACTCGTAG AACATTTAAA AGCGCAATAA CTTCACTTA AAGAGCATTG AAAGTATTTA ATGGATCAGA CATAATAGCC ACACATTCAC TTTCAAATT TTTATATGGA GGAAGA'rTAT CCATCTTATT TAAAArr'rCT AAATAAAGAT TATTCCAATT TATGCGrTTT 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 TTTC7rTI'T CCCACTTAGT TCGTGCTTCT TCAATACTAG TCTATATCTC CTAAGCCC CAAAGGATAA ACTTCATGAG AATAATGTAG AAAATGAATA TCCAGCTCGG TGAAATAAGT TCCTCTrCGA AAACAAGTTC TTGTTCCATA TAATAACGAA AATGC1TTTGT AAGTTTATA.A TAATCATCAG GAAGAATAAA TAAACCAACA AAAGCGTGTTC TATATTGAAA ACCAAGCTGT ?TATAAATTA ATCCTCCAAC ACAATTA'rTA CTTATAATCG TCAAGAAAAG GGAAkATTCC TTTCTCTGCA GCTATTAACT TAAAATCTALA TCTATCAAC TATGATAAAC AATATCAGAA
TCTAAATATT
TGTTTTTTCG
TCATAATT
TTATAGATAT
CAAATAAGGT
TTTCCA.AATA
CACCGTCATT
CTCTTTCTAT
CTAATATGAT
ATGTCGGGAA
'rGTTTATATC
ATGAGCGATA
TTTTAACCAA GCACTAAAAT ATCATAGTTT TcTAAGACGG TTTGTAGGAG TCTTTTAGAG ATTAATATAG CTTGCAGTTT AAATTGAT7'r ATTTTTCGTA TTGTTTTCTA ATTCGATGAT A7=rCTGTC AACCAAT'rAT ATCAGCAAAA TGACCAAGAA TrGCCAATTC
CGCAATCTTT
GTTTAGCATC
TAGAGTGAAT
AAAGCTTACT
CTGTATCATC
CCCCCCAAAA
CAACATCAGA
AAAACTrGAA CA'r'CGTTAA TAAAATACTC CATCAACCAA TTGAATATAT 3360 GATTCTATTT 3420 TATAACAGGT 3480 ATAAAGTCTC 3540 ATTrGAATAAT 3600 CATCTrGT 3660 AGGATAAAIIG 3720 ATCGGATAAT 3780 'rGAAGATAGA 3840 CTCATCTATA 3900 CTCTGTAGAT 3960 TGACAAAGAC 4020 GTAAGTCAGT 4080 TAGCACAAT'r 4140
TTTATCGCAT
AACGTAATAT
ATCTTAATAT
TGATTTACAA
TCTAATAAAC
GATACATCT
AA'rCTCCTGT
TAAATCATAG
TAATAACTTC
TATTTTTATC
TTCAAATGTC CAATCAAATA ATGAATCATT AATCATATCA GACAACTCAG CAAAAGAATT ATTCATCTGT TGGCTAATGG AAGCTATCTC TATATCTTTT AATGTTTGTC TCTCCACTA'r TCCTTGATG1' AACAAAACAA CACTAATTGA TTGACTACCT CCCATAATTT TCTGATAATG ATTTTCITTT TAT'rAATTA 1083 ATGATATATA TCAGG'TAATA TCAAGCTATA TTATC'rCTTA GCTACTCAAT 'rAAATI-rr AACTT~IrCCC
ACCTATGGCT
TAAAC1'CCCT
AGAAAAGAAA
AGCCAT?!'TT
TT'CCGCAA
GTCAAATTTA
GATTTGAAAC
AACTAGATGA
CTCTCCA'rTA a a a a a. a CTCAGGGCTG ATTGAAGrTT TCGTrGGGGG AGAATCTm~ TGGAATCGTC CCCATCATCA TCCI1'TCTT GTGACAGCAC CN'TGGCAAC TCCTTCCATG AATACTAGGA A7M=C'TAG CGCTTGTCAT GAGCATGATT TGTGCAGGCC ATTGATGAAT TGCTTCTATA ATACAGGTCT TC7'rTGCC ATCCTGCTCT GGATGCCTTT ATAGGTGCT TCTCGTCArr GGTCCAATGC AATAATAGTA TAATAGAGG1' AGAATCTAGA ATCGAGGTAC CAAAACGAGA CGACTGGAC AAGA740TTTG AAGACTTTGC AAGT-rACTTT CCCTGATGAC AAAGAGAAAA AAGTCAAAGC C~rTcr"CA ACAACTCCCA TCTAGTGTAC 7?CAAACTGG TCATTGAAGC CCTTCCCTTC GTTCTGATAG GAAGCATTGT ATATCACACC 'rGACAAGGTT TATCA~rT!TC TCCCTCGAAA TTrGGGACCTI 'rGTCGGTATA CTTTrCCCTT CI'GTAATG ATCGTT01' GGAAAAAAAG GTTCCAAGT ACAC GGCCGT CTGI'TATCAA TCCCA'NGTT CTr7rTTGCGA CCIWI'TCIGC TCGCCCTArr ACGAGCTCTG GGT'rCCATTC TTGGGCTCT GATTTCTG GCAAGAACCG ATTCAGAAAG AAAATCCTCT T7?CTTACTT GAGTTCTGCA AAAAAAGTT'r TCAGTC!T T'=rGATAC GGGGCGTTAT TT-GGTA1rrG GCTGCCTCTr ACGTTCCGAC TCGGATTCTG ACCTCTATCA GTGCGACCCC TGATGATTTT AGCCTTCT CTTTCGCTCT GTAGTGAGGC CTCTTCTCTC GAG'rTTCGGT TTGGCACCAG TTCTGGCCr TCGATATCAA AAATATTCI'C ATGATGAAAA ATTACTTGAA TCATAACAAT TGTAACTCTT GTCGTCTTAG 'rCTATTCTCT GATTCGATTT TTAGTr'rAG CTGGCTATTT TGAACTGACT CAAACTAAAC CAGTACATCA ACATGCACTA TTCCTATCTG rTTTTATC TTGGCTATCG TTCAATTGTA TATCTGGATG TCATCTGA.AC AGCCGATTAG CCAAGATAAC GAGTA'rTTCT CATCGGCTTA ACTr'rCCCAA C1'G1-AGC'rT GGATTCTCAG TCATTrCCCC CTATCGGAAG GAACGGATC1' AGCCATTCAG CCAATATTTG AAACCAGATA CCAG=CTTA TT?'rTCAAAA GCGAACGGCG GCGGATAAAT ACTTATCCCA AGATAGTATT TATGGAAGTC ATGGAGGCTA TCTACGACTA TCCAGATGAG GTTTACAGGC 7nITGTCTATA ACGACCCCAG TCATGCCAAT 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880
AGCACGATTT
CTTGATI'GGA
ATTTACCTCC
GCCTATATCT
AAGCAACTCA
CTTCTGGCTA
ACTGTT1TCTG
ACAAGCGAAG
TCAGCCTATG
CAGATCACTA
=TGAGCGCA
ATCAGTCACT
GTTATCCTAT
ATCTGTCGGG
CCATGCTGCT
AAACCCACAG
TTCCACTTGT
CTAAAGGTTA
GGACGACAAG
AAAAGGAAAT
ATGAAAACTA
AGACAATCCA
?TrCCGATT CGGCATTATC CACTGTATCG CAGATTCIGG TGTCTATWGA
AGTCAATC
TTGCTGACCA AGGGCAATAC CCGGCAGTAT GAAAACAACA CI'GGATAAC AAACTGGTCA ATCACTACCA TAAAGAACTC AAACAAAACC 7TCCAACCTT AGCTTTACCA AAGTCGATAA ACCAGAAAAT CCCTATGTAT ATAGAGCTTT CAAGATAAAA ACGAACAACT TCrcrCTGrA ATAACAGAAA AAW=CT1GT TTATATGAAA ATTACTGACT TGTAGAT'rT CATCTrATAC CA'rTCCCAGC GCTCATAGAA AATAAGCGAG CCACTCATTC ATTAGACTAG CGATTTCT GTATAAAGCT CATGGCCAAA G7TTTCTAAA AAAATAGTAT CAAAATAGTC 'rTTACGGCTT CCTCTCTCCA TGTAGCrrCA TTAGGATAGC GAGGACTAAT TCTCCCACTT CTCTC~rAAA AGCTTGTATT TrCTCCGTA GcGGAGTATC TTTCATAATT~ 'rATAGCCAAC TCATATCTAT TATACTCAAC ATTCCAG1'GA 7TrACAGCTTr CTCCATATTT TCTGACCAAT GC~rl'GCTTC AGATI'MCT GAACATCTAA GTCCGAAACA ArrTGAGATT TGATATAATT TTTAGTTTCC TATCCAAAGG TAAAATCTTA TCTAAATCTA GATAGCCACC ATCCAAAAGA TTACTTC'rC AAATTCCGA'r GCGAAATAAC GAGCTAAATC TCCTCCAAGA
AGCCAAAGGA
GGAAATCGAC
TTAAGAAAAT
TCGTTTTTG
AATACAAGTA
AGGTCTTGA
'rGGCAA'N'CT
AAACAAGGTA
GCTTCTATAT
TAAGACTGTC
TTAGAACTAA
TCTAACTCTG
ATCAGTTTCT
5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 TCAGACAGAT AGATTCTTCC CATCT-CGAAG ATGCTTTTcA GAAAATCCTT GCTATGATAG TCTACTTAAT AC'rATGCTTA ATTCTTCGCG TTT'r=TTC GGTATTCAAT TGAAATGCCT GGATCGCTT AATCAAGTCT CACTAATATT AGACATTGTA CTTTTAATTC CATTCGA'rrA GATGAAACTA CAAAGAATAA ACTTCCTGCG AAACAAAATA TAACTGATGC ACGA!TrrAAA TAGCTGTATT AAAAACAGCT TAAGCCTAGA AGACC7~T= AAGAAAT'rGC GGCTGATTT' 'rCTACAATTT CAT TTAAA CCATGATTTC TATGGATrrA GAAAATAGAC CTGCGAATCT GCATTGcTTc ccAAccGCC TT-CATCTTTT GTTCAAAGAT GGAATCTCAC CAC7TTG=?G 'rAATCGTAC GTG7=GAGC TGTGGGACTA GCGCCAGAAT AGCCT=TCT TAATCATTTC GTCCTCCTAT TTTCTCTAAG ACTTTAAGAA ATCTTCTCAC TGGTATAGTA GTTCTATGAA CGTCTTGTTG GTGTTCAGCG TATCAACrrA AACACGCAAA ATGGCCACTC TTCAATATGT GGTATTCACG AAAGCAACTT
AATAAAATAT
AGTTG1TGATA
AGCTACAGCC
AT7'C7'=CTG
CTTCTCAGTA
CTGI-CmrT 7rAAA'rTATG
TGATAAGATT
TTATGAAGCA
CACCACT~
GAATGGCCTA 6780 AATTCTGTTT 6840 AGTTCTTGAA 6900 I-rTTTCATTC 6960 ATCTGACGCA 7020 TAGAGTTCAG 7080 ACAAAAAACG 7140 GTTTCT7=G 7200 TCTGTAAAAT 7260 TACrAATACA 7320 7*TAGCATTAG 7380 AGTAAACAAC 7440
GAAGAGATAT
AGGTGGACGA AAACCTAAAT GCGAGAATAC CGCACI'ATG AATCCGTCGG AGCCAATGGG 7500 7560 7620 7680 1085 TTTAAGTAAC rCTrGTrCAA ACACGGTAAT GATGATAC TGTTCATITr TrGCGTGGTT AAATAGAAAT AAAAATCCTA CTTATCATTA TCCATGATAA GGTGCCCAAA TACTTGTAGC TTCGAAAGTC AATTrCTCTA.
GTGGCCTACA AACAGATACT AGCTTTTTCA TTGAATrG3AA GACATCTGTA GTTGACTCAG TTGTGAATCA AAGACCTTTC
AGTGGTGTTA
CATTCCCATC
TCTGGCTATr
AGATTACTGT
AAGGCTTAGC
AATCTTCCTT
CAAATAAGAA
CGATTTCAA0 AACrCCTCTC ACTTCTGAGG AATATCGTAT CTTTGGACAT AGCCAATAAA AACGATrGAA ATAACCCACC AACTrM'CAA CATATCATAA. CACTATTAAA GTTAACCCA CAGTCCCTCG CCTGTATAAT CCGCATACTr ACTAGCAAAT TITAATCGCTr GGTAGGCCTC ACCGTCATCA GCAGGTACTA AGACCCCAAC
ATTCTTTCA
GCAGTTIGGTC
ATCGTATTCG
GGTATTATAG
'rGATAAGAGA
TTTCCATGGA
TTGTTGTTGA
TTTT-CTTGAT
TGCTT'rACAG
AGCTTTGGAA
CCAACAAAAT
GTGTGAGTA.A
TCATTAACGA
GATrTGACCI'
ACTTTCGCTG
CAAGCAGCCA
TTC.ICTm
ATATTGAI'A
CGCCATCCAA ATrGTCGTGC AAGACTACAG ACAGCAT'rCG ATTG'rGAGAA GAATGCTTCC ATCTT1CAG CGTGAACCTT TTGGAACTCT CCAAAATAGA ATATCAAACT CTTCCTTATC CTTTrATICAAT CGCATCA'rrA TCTAGGAAAA GCAACTGGTC TGGTGACTGA ATTrCAhA AGACAATAAC TATrGATACG CGCCCTTCTT TTGATTCCAG AGATGACTGA T'rTTCTCAAC AGGAAGTGAA ATC'rCCTGAT AACCAGTrG AGCCGACAAT GATTAAAAAA 'rGCATCAACA CTATTTGGAT CCAAGTGAGC CTrCTGTACT TACCTGGTTG 'I-TAGGTTGG TGTATGAAGC AACTGCTTTG cCCTTrCATTC GTCCCCTCAG AACTAGCATG AGCCTAAAAA CAAGGCTGAA CAGATTCCTA A'IGTGGCTAA CTTTCTCCTA AATGTCTTGG ATTAAAGI'TT CTTTAACTAT cI'I-rT=ATT TAATGTGTTC ATCGTCTTrC CTCCGG 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8876
INFORMATION FOR SEQ ID NO: 171:
SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14736 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 171:
CGCAAACTTT CGCGGTCGGA AGGTAGTTTT ATGACACGAT TTGAGATACG AGATGATTTC TATCTCGATG GAAAATCAT TAAGATTrA TCTGGTGCCA TTCATTATTT TAGGG'PTCCT CCAGAGGATT GGTATCATTC GCTCTATAAC 'TGAAGGCTC TTGGTTTTAA TACGGTAGAG ACTTATGTrG CTrGGAA =T CTGGATTTAG AGAAA3rTCT CCGTCTCCAT T1TATCTGTGC AAGAACATGC GAATTCGCTC GATCAGTrAT TGCCAAGACT ATGCAGGTTrG AAAAMGAGTA CGACAGCTAA TGGAAGAG'rG CGAGCTACTC TGAAAGCTGG GGTTCTAAGG CACCTTACAA AAATGGCC-AC TCATGTGTAT ATTATCACAC GGGATCCTAA TCTATCAATC TTTACATGTr GCTCGAGGAA CTTTGGACCT GAAGAAGGAA ATCCAACTGC TCAGAGTATC CGCAGTGGA CTAG'rTGAAA AAG'TTrCTTT CTCTATCCTC AAAAGATGGA GAAACAAACT GGGATGCAGA CAGCTGTATG TCGATGGTCA ATTTTT1TATC AAGGTAAAAA GGGCGTGTCA ACTATGGGCA GGGGTCTGTA AGGATCTGCA AATCCTGAGA AAATTGA'TTT TATGACTTTA CAGTCGAAGA GGGG=rGCCT TTGTCAATGG
AC.ACGAGCCT
CCAA.ATAGCG
GGAATGGGAA
ATCCGACCCA
GGTGCCTCGT
TGCTTCTTAC
7rGCGTAACC
AACCTTAATT-
CTTTTCCCAG
GGAG'TTCTGG
GGAATTG4GCA
CCACGGTGGT
TGTGAAGGTG
CAGGATrrGG
TTCGGTG~CCT
GCATATATCG
TTGTTGGACA
GGAGAAGATA
rGTrCCCCTCT
GAAGAGGACC
A'rGCAGGAAT
CATGGTTGGT
GATGCAGTTC
ACAAACTT'rG AGTT1-rATT'r TGAAGGTGAT GTC'TCTACGC AAT7GTGCGT TACCAGCTTG GC'TCTTGACC AGGCAGTrcG T1CGCTACTAT ATGGTGGCAA TAITCCATG AGGCTrACCT GAGAGCGATT TTACATCAGA TGGTCCATGG TCTITTGTAAC AGGAA.ACTTr TCTTTGATGA ACATGGTAAG TCAATCGCTG GAAAGAACCG GAGA.GGTT GGAACAAGGC GTTrCATGAA TGGTGCTCA AT'rAcCATGC CCT'TCTGGAT AGATGATGGC AACACATTTT TGGAGTTGGA TGCTATTCCA TGTCAAGTCC TGTAGAAAGT GCrACCTACT TTATCGAACA TTGATGGTCG AGATAGGGCC AGACAGAGAT TGGGGAAGAT GCCACAAGTT ACGTCrrATG TAAATATCTT GCAGTCAAGA ACCACTCTAC AAAGAGAGTA GTTTGAA-ACC TrAGATAGCT GGAGCTGGGA CAAAGTTATG AGAAGAAAGA Cr'rCGTATCA GTGGGTTAAA ACTCAATATC GAAAGGGCTA TC'rAGGTTAG TAAGTTCTTA GCGGATACC TTTCTTACTA AACTGGAAAC TTCAAAAGGA TGGACTCAAG GCCAAAAGAT ACTTACCTAG GCAGAATCTA GGACGTTTTT 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 .1620 1680 1740 1800 1860 1920 1980 ATATCTTGAT AGAAAATATG AACGTAAGGG AATTCGGACA ACTATCCACT CCCACTAGAC GACAACCAGC CTTTTACGCC ACTTG;TCt-GA GTTTGGTAAG GGAACGTTGG CCCAACTCTC TCACTTTATA TCCCTCATAG CTATCTCAAG CAAGGTGCCA ACCGCATCAT ACAGAAGGTC AATATAAAGA AGAGATTCAT TrAACTCGTA AACCTACACT AAGGGGGAAA AC2'TATGACA ATTGTAGGAT GCCGTATTGA TGGACGTTTG AAGTAGCCAA TCTrGGGCT GGAAAACTAA ATGTCACG CATTATGGTT AAGTTGTCAA CAACGATATT GAAA.AGAGTG GTTTGAAACT TGCGACACCA
AAAACATATA
ATCCACGGAC
GTAGACGACG
CCAGGTGTGA
1087 AATTGAGTAT TlwrGCCAGTT GAGAAAGCTG CAGCCAATAT GCCAACGTCT CTTTATCGT GCTCGTAAAC CAGACCGCTT GTGTACCACr TGAAACCCTT AATG71r.C-CA ATATGTCTCA TTACACGTTC TATCAACGTA GTAGACAAGG ATGTGGAAGA TCTTGGTrGGC AAATACGATA CCTTGG~rrG GITAGAAGCAG AACACCAGAA ACTCGTTCTA CTCCACAAA CTGGCAGAAA AAGG743TTAA ACTTACTGCT CAGATGGTTC CAAATGATCC AATTTCAGAC TqrTTGAGCT TATTAAAATA GGAAAAAAAT T~rrAGGAGG TCATTIGTTAT GATACAATGG TGGCAAATrT TACTTCTCAC 7T'rG'rACTCA GCTTATCAAA TCTGTGATGA GTTGACGATC GTTTCATCTG CAGGTTCCCC TGTA3rrGCT GGTrTCArrA CTGGTTTAAT CATGGGAGAT GTGACTACTG GTTTACTTAT CGGTCGTAAC TTrGCAACTGT TCG7TCTTGG GGTTGGTACC TTCGGTGG'rG CTTCTCGTAT CGACGCAACT TCTGGTGCGG TTCTTGCGAC ACCTTCTCTG TTTCACAAGG AATTGATGCA CCGCTTGCCA TTACTACAAT CCTGTACCA CTAGCAGCTC TCTTGACTTA CTTCGACCTT CT'rGGTCGTA CGAACGC= T GACTATAAAG TCTATCTCGT GCCCTTCCAG AGTACTAGAC TTCGTTGAAG TATGCTTCCA GGTCTTGGAT TCACTACCTT GCTATGGGAT AACAGGTCTT GGTGGCGCTG AAAAATTGGT T'rCGTGAACA TATTTTCCT GCAGTGCTTC TACACCATCA GAAAGTCGGG TGACTACTAC CTTCTTCGCT CACCGTGTGG ATGCTGCAAT GTATTGAACG CAACTACTrG CTTGGTGCGA TTCCGTGGGC TCTTCT'TGC CCrTTGC=T CGTGGTCCCT TTGTACAATC CCTACAAATG GGTTGCAGAT GGCTTGACAC 'rrGCAGGACG 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720
TTGCAATCTT
TTGGT7'rGAC
PTTCTGGTAT
ACTTCAAAGG
ACTTCAAAAA
AAATCGAAGA
GCTTCGTTAC
AGCTA'rGTTG
CGTAGGTACT
TTTGTCTATG
TAGCCAAAAA
TGACGALATTC
'rGTTTACTTT
TGATCTTGCC
CTTCCAG7TTA
ACTGTTCTTT
CTTCC-TGCTG
ATTGGTAT
GTAGCTGTAG
TAATTACAAA
CCAATTAGGT
TCAGTTGCGT
AACGTAACCT
ACTCATATGT
AAGTTGCTGA
CTATCGTAGG
CAGCACCTTC
CTTACAAAAG
TGGAACTACG
AAAATG'TATG
AAGATT7TTAA TCAAATCAAC AAACGTAGCT AACGTATGCA AGCTTCTGGT TACCTTTACA GTGATGGAAC TCCTGAATTG AAAGAAATGA TGAAAGTCA CACCATTCTT CCATACCATT ATCGCTGGTT rrGACCTTGC TACTCAATTC TTrCAATACTT CATGGAAGAA AAAGATGGTG TAGGTTCAAA AGACGCCGTT AACCGTATCA AGACAGGTTT GATGGGACCA TTCGCTCCTC TGGGGATAC AATClnTGGT TCACTTGTAC CTGCTATCAT GGCCTrCAGTC GCAGCAACTA TGGCTATCGC TGGCCAACCT TGGGGGATCT TCCTTTGGAT TGCAGTTCCA GTAGCGTATG ACATCTTCCG TTGGAAACAC TTGGAA?!1G CTTACAAAGA AGGGGTTAAC CTTATCAACA 1088 ACATGCAAAG TACCTTGACA GC"rrGArrG ACGCTGCATC TGGGTGCTCr TGTAGCAACA GPGATTAACrT TGAAM'TTTC AAAAGATGAT TGA?1'TCCAA GACATCTTGA ACCAAATCTT TCI-rACTGC CTTTATCTTC TGGTGCTTG CTAAGAAAGG TCGGTATTAT TATCGTACTI' GCTTGGCTC TTTCTGCCCT TGTAATTCCT TATGACTAAA TCATTAATTT TGGTCAGCCA TTAGAGGTAG CACAGAAATG ATTATGGGCC CACAAGACAA TTCCAGAAGA TG4GCCCACAA GAATTTACTG CTAAATTTGA ATrATT'TCCT AGTCTTTGCG GATCTTCTCG GTGGGACACC TGATCATGGA AGGTCGCAT ATTGACCTTT ACGCAGGGAT AA'rTTATCAA TGCGAGCCTT ACAG4GCGCAG ATGCGGACTA GCATTGlTGAA AGTTAATGAC CTGTT-AGCGG GCTTCGATGA CTTCGAAAA'r CTCTTCAAAC TACGTCAACG TCGCCTTGCC TCGTCAGTCT TATCCGGCAA CC'TCAAAACG GTGTTTTGAG CGGCAACCTC AAAGCACI'GC TTTGAGCAGC CTGCGGCTAG GAACTCGATT CAATTCATGT GACAACGTGA AAATCGTTAG CATGGGAATG TAGCTTACTC CCATTCCCAT ATTTAATAGA CA'rTATACAA AAGAAGACTT GCTCGAATTG GGTGCAGAAA CAACAGCCTG ATGTATGGAG AGAAGCTTTT GAATTTATC GCAGCCTCC TACAAGAAAT CGCTGATAAA CATCACTATA
TGTACTTGGT
TTACAAGTTG
CCCACGT'rrG
TATGAACTCT
TG3TCACT
TGGTCGCTTC
CATTTACACA
AGCTGTTATT
GTCTTCATGA
CCAATCGGTG
CTrCCAGCAA ACTrAAAGCTA GCACT'rGGAA
TGTGAGGAGC
GTAGCTCTTC
GAAGGATTGG
TTGAATGTG GTCGCTCGCT GAATC=TCCA ATCGCATTG CAACACCCGT GCTGCAGAAA TGACGAAGAT GAATAATACT GTAGg'rATAT GTTACTGACT CTGACTPCGT CACTCTTATC 'rTTCCTACAG ATTTTAGTTG AGCATT'rTAT ATAGAATATA AAAAGAGGAA CTCAATGCTA TCACTACGCG TGAAATCTAC AAGCAAALACG TGAAGAAATT TTAAGGTTAT CTTGACAGGT 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520
GCTGGGACTT
GAACGCAAAT
TATTTGAAAA
GAAAGTTTGG
ATTACTTGTG
TTGCTCTTrGC
TCTATGATGT
CGI'IrTTGAAc
GAGCTCGTTG
CTTGCTCATG
CTGCTTATGT GGGAGATACC TTGCTACCTT AT=TAAGGA AGTCTATGAC CGAATTrCAA TGCTATTGCG ACAACAGATA TCGTTGCCAA TCCAGCAACC AAGATGTGGC AACTrGTCCTT GTG;TCT'T7G CTCGTAGTGG GAATTCGCCT CGACTGTTGA =GGCCAAA TCCTTGGTGG ATGAGClwrTA TCAAGTGACG CAGCAGATGG TAAATGGCT CTTCAAGCTC ACGGTGATGA TCGTAATCTC AACCAGCTGT CTCTAATGAT GCTGGATTTG CCATGACT'rC TAGCTTTACG TGACAACTCT CTTGGTCTTT GATCCTACAG AATTTGCTGTr TAAGTCTGAA TTGTATCTAG TCTTGCCCGT AAAGTTTTAG ACAAGGCAGA AGATGTCAAA ATTTAGACTT TAACCGTrGTC ATCTATCTAG GCCCTGGTCC TTTCTTGGA PAAGCTCAGCT CAAGATTTTG GAATTAACTC CTGGTCAAGT TGCGACCATG 1089 TATGAAAGCC CAGTTGGCTT CCGTCACGGT CCAAAATCTC c~rTGGTCT 7rGGTACAAC GACAGACTAC ACTCGTAAGT GAAGTTGCTG OTGACCAGAT TGCTCGTCGT GTrT =CTTT CTTGAAAATG TCAAAGAAGT GGCCCT'rGGT 7GTGGCGCTG G;TCTTCCCTT ACATCGTTTA TGCCCAACTC TTTGCTTTAT AATAAACCAG ATACACCGTC TCCTACAGGT ACAGTAAACC A7'TCACGAAT ATCAAAAGTA AGACAGTGTT TATGAAT1'CT TATCAGATAA ACCATAGA?1' GTCAGTACGC TTTCTATGGT GTAAA.AGCAG AACAGAATGA AAGCATACAC AGAGCGTGTA GGATGTCTTG GCCTATCGAT TTGAGACAGA CGGTGGCTAC TGGTGCGACT ATCTTGCGCT ATGTCGCACC TGACAAGGCCT CTTGGGATT GATGACTTTG ATAGTTATGT AGGCAATAGT AGGTCCTGTA GCGGGTCGTA 'IrGCAGGTGC GACCTTTGAG CCTTGAGGTT AATAATGCTA.GCAACrTGTAA TCACAGTGGT CTTCTTTGAA GTTGAAGAAG TAAGCGATCA TGGCTTGACT TIGGGACAGGA GGGTTCCCTG GAAATCTCAA GAT'rGGATC TGGTGCCTAT GAAATCAGCT ACAAGGTAAC GACCGATCAG CAACCACAGC TA=TCALACT TGTCTGGTGA TTTCACCCAG CCAACTAAAC ACAGAGGGCA TTTACTCAAT CGCTCCTGAC AGAAGCCAAC CGTGATGTGG TCAAACACGT CTACAATGGT TGCAGAAGAA GATGAGCAAA TCCAGCTIGGC ATCAGGTTTG TGCAGGCCAT GACAATGCTG GATTCCTTTA TGACCAAAAT CA.AGACAGPA GCTCCTGCT TTGTGGTCTA CACAGCAAAC CATAGGAGGT CAGCCAATGC TACAGCACAA TGGGATTGCT AGATGCCATTr CACAGTGACC TTAAAGGCCA AGTCM-rC7"r CAGTAAGACA CGTrATGAAC 1-rG~lrGAA GTAAAAGAGT GCTAGGA6ATA GGTACGCAGA GACAAATAGT AGGAAAATAT TTATCAACGA CAATACAGTT ACGACTTGGA CTTGGTTCGT TGAkGTGATCA AGCTTTTGGT TCTTGrAATIGA TATTTACCGT TGAC?1TCACT CAAG=rAGAA- GTGTAGTACA AGGTGTCATA TGACAAGAGG ATTGAA.AT TGI GCrr GACAGAAATA 'rTGGAAATG 'IIAGGGTGA CAACTTGAGG TTATGACII'A GG;AAATTTT6 
ccA6ATGTTAT CCCAAGCATG GAGCAAGTGT CTCAATGGTA AGACCTATGA TCAACrGGTT GGGATTCCAG CTCTACACAG AGCGTACAGA AGTrATCACT TGGAAGAAAC GATACGCTGG TCAATCCAAC ACGAT'rGACC GTCATGTCTT GG'rGT'CCTG CCAAAACTCC ACCTTGTTGA AGGATATCTT GATCATCCAT TTGCCCTTCC TCAGGTCGC'T TCCTGCTTTT T'rGGGATG AAAGTGTCAT CTTGAAGCGC AAGCTTrACC AAAGCTGGTC AAACCTTCAC CATTGCGCCT ACTTPGGGA GATATAACTA AG;CGTTGAAA 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 GCTATCTGMr AATATAATAT TCAAACTACA ATAAGGAGTA AGAALAGAAAC GAAGAAAATT GTATTTGCTA GTGCCTTGGC TTTG.ACCTTG GCTGGAGCAG TI-TrGACAAA TGATGTTTT 1090 GCGAACCACA GACTTGTGGC AACACAAACT ACTGA'rGCTA TCAGAGGTGC TAAAACCTTC TAGTGGCAAT G71rTrGGTrG GCTCCTCATC AACAATCTAT TITGGATGCC ATCAATGCTA GAAGGTTTGG TAGATAAGTA TGTCCCTATC AAATGATCAA AAAATCAAAA TGTArrGACC GAATCAAAGG AGAAT7TGTG
TCTGTAAAGA
CTGACCTAGA
CCCG'TCTTTC
AGCGGCTGAC
AAAGGCAGCT
TAGCAAAGAT
T'GCCAGAG
CTTTGGAGTG
GACGGTTTC
AAAATAGTGG
CTACAGAAGC
CCTTTCCAAC
TAAAAGCTAT
TTCAGACAAC
ATCI'ATAACC ATGGATCATA ?rCTAATAGT ATAATGGGAG TGAACAATGG CGTGCTGAAA GGGAAATCTG GTCACTATGA GCTTTTAAAA ATCCTAACAA GATGCTrCTT CAGAGGAATT AAAATTCrGC ATGGAATCAT AAGCAGATA TGTGGAGAAA GTCGCTAATT AACCCTAAAT TCAATACAAA GCTATTACAA GGCTGGTAGA TATGGTTCTG AGT-rAAAACT AAAGCTACGG a
q* C C TTACACACAT GGGGATGGCA 'GCTCAAAC TCTAGGTGAT CTGTTCAGTG TACAGAAGTG TTGTAGAAAA ACCACTGAAA AATCTAATGG TAAATGGTAT CAGATGGTAA ATGGTACTAT TTTC1'GGTAG CTGCTATTAC ATGGTAGCAG A'rGGTTCTAC AAAATGGCAC TTGGTATTAC TCCGACCACA CTGGTACTAT CAGATGGTTA CCGTGTAAAT AAGCATTCAT CTTACTTAC TGGCCTCTTT TGATTTATAA GTAAGCTAAT A~rMrATAGC GTAATTAGTC TGAAG;TCCAC AATTTACTGT ATTTTTTATA TATCGATTTA ?TGTATAATG TATTATT'T' ATGTTTATAC AAAAGCGACG 'rTAATATGAG AGAAAGTTGT CAGTGGGAT'r GGGATTCAAT CGCTATCGGC GAGGGTAAAT TGACAGACGA
ACTGCCTCAA
GATT=ACAG
TTCTATCAGT
TTCAATGACT
T'rGAGCAA7TT
ACCTTTCAAC
GGTTGGGTGG CGTCTACGTC TGATCAGTCT CTGGTGATGT GAAGACAGGT TGGGTGAAAA TAGGTGTCAT GCAGACTGGA TTTGTAAAAT CAGGTGCTAT GTTTACAGGC TGGGGAACAC 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 CC CC .C CC
~TTTGACGGCT CAGGAGCTAT GAAGACAGGC TGGTACAAGG CIGACGAAG CAGGTACAT GAAGACAGGT TGGT.rAAAG GCCTACGGTT CAGCAGCTTT GGCTGTGAGC ACA.ACAACAC GGTAATGGTG AATGGGTAAA CTAGGCTCAG GCCATAGGTA AAAAAGAATG AACGATAAGA AAGAGGTTGA TGGCGAACAT AGAT'rGGATT CTTGTCGCCT CAATTCAGA CTTTTCTATT CCATTAAAAG CATAACGGT AATCTAA'PT AAAAAATGCT ACTTACTTGT TGAGATGTTA TCTCTGTTTT TTATCGTTA: GTATGCAGAA TATTTTAAG TATATTTCAA TAGAAATTTC ATAAGTAATTr GTTGAAAAGT ACTCAGAAAA 7rCCATACTA TTTTATGCTA TAAAATATAG ATTGATATAA AGAATATAGA CCGAAAAAGC A7TGGTGAGA AACGCCATAG 7TTNCTCGATG GGTATCAGrT ACTCTATCTA GTTCTTTTTr GATGACTCAA CGATAATATG GAAAGTCCAA TTCA'rTATAA GTATATGACC GGAAAAATCC I-rGCTGT'AG AGGCCCTTCC ACAACTGGCI'
GAAGAATCAG
GC1rrAACC TTrGTTTCTA
ATGGGAGTTC
TATAATAGTC
GGTTATCAAT
ACAGTTGAT!G
GATGTAGTTC
GGTGAACCAT
ATGATACTTA TTAC'MGG= CAACTGTTGG TACTITTCCX' AAAGGGAAA.A TGGAAAGAAA AATTSTTGCC GCCCAGTGCT AGC?'rrCTAT CGGAGTCGG ATATTGCTTA TATCAAAACT GGAAATACTC TGCTCAAAGA ATTCAGC'rGA 'rTrAGAATGG CAGGGGATGA TGGACTTTCA 1091
TATAGATCI'C
TTTACTGCAG
CGACTTGTTC
TITGGGTTGA
GAACATTTAC
AAGAAACAGG
AACAGTTTr ACCG.AA'rACA GATTGAGCTT GTTAGTTT-TA ATTTrCTCCT GTTGACTAGC CCAGCCAGAT TTTATCTGCC CAGACCCTCT GAAAATCGAA ATAATACAGA GCTTTCAAGG
TCTTCTAATG
GTAGTTCCAC
GCGGAAGAGG
ACCAAAGGCA
GTCTrACACTA
GCAG'N'CGCG
CCAGGTCATG
GCGACAAAAG
CCGGC'rTTAG
GAAGAAATTC
ATTCATTCGC AAGTCAAGTT CAACAGTGCC AGAACAAGGA AAGTATTGGC G.ACGACAAAT CGCAAGAACC CGGTCATGAG AGCCACTAGA AACCAAAGGI' AGGAAGAACC AGCTTACACA AGGGCAAAGC TACAGTCCC GCACACAAGA ACCCGAACAT AGGTCACTAC ACGAAATAGA AGGATCCAAC AC~rCTGAAA GATAGTCAAC CAAAC'TCTAC AAAAACATCA AACCAAGGAC AGGGGA.AGGT TAGTTTACAA GAAAAATCTT CTATAGCAGC AGACAATCTA GAGCAGAATC CGGATCACAA AGGAGAA'rCT AATCCTGT'GT CTGCTACAAC GCTGCAGAGT GATCGACCAG AGTATAAACT TCCATTGGAA GG1TGAAGCCG CAG'rCCGTGA AGACT'rACCA ACACAAGGAC CCGGACATGA AGGTGAAGCT GAACCGTTAG CAACGAAAGG CACGCAAGAG GAAGAGACTC TAGAGTACAC GGAACCGGTA GAGGGCGAAg cGGCAGTAGA AGAAGAACTT ACGGAAATCC AGAATATTCC TTATACAACA AATCGTCGTA AGATTGAACG ACAAGGGCAA TACATCGTA.A ATGGTAATGT CGTAGAAACT GTCAACGAAG TCGTTAAAGT AGGAACACTT AACTAACAA AAGTGAGAA CAAAAAATCT ACCTCAGCAT ATGTTTCTGC AAAAACGCAA GTCATATAG AAAATCCTGC CAAAGAGCAA TATACAGT'rA AAACACACCT AACTTATAAT ACATCAACTC AAGATTTCCA AT'rAGAGTAT GTAGAATTAT ACGGTAAAGA AAATGATCGT CCGACTGATA CGGCTAA.ATA CTTTGI'AAAA CTACCTGTAA AATC'rATTAC AGAAAATACG 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 GCAGGGACAC GTACAATTCA ATATGAAGAC AAAGAAGTGT CACGAACTGA AGTAGCTCCG GTGAAAGTA AACCTACAGT AGAAATTACA ATAACTGTAA GTTATAACT'r AATAGACACT GTTTTCCATC GAGACAAGCI' AGTTAAAGAG GTAATATCAG G?1'TAGATTA CTACACACCG TTGGTGAAA ATAATGAGGA AAATACTGAA AAGAAAATAG AGATTAAAGA TA~rGATTCA TATCGTAGAT ATTTAAGTCT AAGTGAAGCG G'rGAAATCAG ATCGCTTCAA AGAAATGTAC
GATGCAACGT
TACAAAGATG
ACATCCITA
GCTTCAGATA
GGTCCATTTA
TTGAAGAAAC
ATAAAGTGAC
ATTACACATT
AACAGCTGGT
TGACCGCAGA
CAGGGACCTT
CATTANTrGA
GGTAGCCGTT
TACTGTAGCT
10921 GATCAACTG TCGAACAAGG TACAGACGGT AAATCTAAAG CAGAGCAACC AGGAGTTTAC AC74=TTC CTGATAGTAA AATATTAATA ATGTTGCAGT GTAGCGAGCG CAACAAATAC AATCACCAGG ACAGTAATAA AGTTCGAGAG TTAATAALAGT AACCAAACAG CTGG.AGGG.AT GTTGCTACTG GAGAA.ATACG TCI'ACGTGGC AAAACGGTCG AACAGCCATC CAAAGCAATC TG3'CTGGTGT CTATACATTG TGAGGTGACC TT-AGGCGATA AGCAGACAAG TITATCTCACA GATCGGTTCT GATGGAACAA AATCGTATGC CA7TTATCAT TACATTAA.AT GGTCCTACAG TTAGAGA=T GGATATTAAA AGAAAATG'TC GCAGCGC1TGG CGAAGGCAGC GAATAGCGCG AGAAGGAAAA ATCTCAGGTG CGAAATCTC'r TGCGGGATT-A AGTGATAGAA AACAGC'rCGT 'rTACAGGGAA ACTT-ATCGCA AAATG.ATACT GGAGGAATAG TAGGTAATAT AACAGGAAAT TAGGGTAGAT GCCTTAATCT CTACTAATGC ACGCAATAA'r AGTAGGTAGA TTAGAAAATG GTGCATTGAT ATCTAATTCG AAATGGT1CAA GGATATTCTA GAGTCGGAGG AATAGTAGGA AG.TAAATAAT GTTGTGAGTA ACGTAGATGT TATGTTATCA CCGGTGATCA ATACCCAGCA GCAGATGTGA AAAATGCAAG GATAATAGAA AAGCAGACAG ATrCGCTACA AAATTATCAA AAGACCAAAT GTTGCTGATT ATGGAATCAC AGTAACTCTT GATGATACTG GGCAAGAT'TT CTAAGAGAAG TTGATTATAC AAGACTAAAT AAAGCAGA.AG CTGAAAGAAA AGCAACATAG AAAAACTGAT GCCATTCTAC AATAAAGACC TAGTAGTTCA AAAGTAGCGA CAACAGATAA ACTTTACACT ACAGALATTGT TAGATGTTGT GATGATGAAG TAGTAACGGA TATTAATAAT AAGAAAAATT CAATAAATAA CA'N'TCAAAG ATAATACACT AGAATACCTA GATGTAACAT TCAAAGAAAA AGTCAAG'?AA TCGAATACAA TGTTACAGGA AAAGAATATA TATTCACACC CT?'rCAGACT ATACAGCGAT AACGAATAAC G1,AC'rAAGCG ACTTGCAAAA AAC'rCAGAAG CTACTA-AAAA AGTACTAGGA GCAGCGAATC ATGCAGCCT TAC?1'AGATA GACAATTTGA AGAAGTTAAA C'TAATATAG CAGAACACCT 'rTAGCCATGG ATAAATCAAT CAATACTACA GGAGACGGTG TAGTTGAATA AAAATCAAAA ATAACAAAGA AGCATTTATG CTAGGTCTTA CTTATATGAA GATATTAATT ATGGTAAAAT GAATACAAAA GA=rATCTA CCTACAAGTT GGAAATAATG AGAC7TCAAC GPGGATACT ATTGTCGCAT TAGGAAATAG
TGGAGATGGT
TACATCAGTT
AGACGCGAPA
AAAACGTAAT
AGTAGCTTAT
CTATG4GTAAC
GCCGATGAAA
AG'TTATGI'A
CTTCATAAAC
AGAAGCATT'r
TGTAACACTT
AGATAACCTA
AAGAAAAGTA
CGTAAGTGAG
CCGTTGGTAC
TGACTTTAAC
TGGACTAGAT
10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 1093 AACCTGAGAG CTrCAAATAC TG'rAGGTTTA TA'rGCGAATA AACTTGCATC GAAGATTCAC TCTI-rGAC3-r CCTAGAAGCG TATAGAAAAC ~'T=CTTACC AATAACGAGT GGTTTAAAGA AAATACAAAG GCATATATAG TCGAAATCAA GCAGAAGTAC GAGAAAAACA AGAA'rCACCA ACAGCCG-ATA GAAAATATTC TACGATAGAA TATCAGCACC AAGTTGGGGG CATAAGAGTA TGTrATTACC T'rACCTGAAG AATCTCGTA TAMTCATCG AATATGTCTA CACTTGCAT GAAAGATATC G'rCATAGTGT GGATGGAGTT ATTCTTTCAG GACATGCTT'r GTAAGAA6ATA GAGTTGATAT AGCAGCGAAA AGGCATAGAG ACCArrATGA AATCTTCTTG ACAGTGCTTC AAAAGAAAAA ClTTTCCGTT CTGATAGT r'rCAATGTAA AAGATGAGAC AGGAAGAACI' TAT1TGGGCAA GGTTAACGGA GGCTCTATTA AAGAATTCTT CGGACCTG7?T GGGrAAATGT ATGAGTATAA
GGTAAAAGGA
AAACAAAACA
GTCTGATATT
ATTAGGAGTT
ACTACTAACT
CGGTTCGTAT
ACGAACTTAT
TA7"TTGGTAC
TTATGATGGA
TAAAAACATC
TAGTAG3'CA
AGATGCTTAT
CTACTTTGAA
ACTGCAA'rCT
AGAAAAAGAT
GGAGCGTATG
GGAACGTCGG
GGAAATGGTA
GTAGATAGTG
GATTTGAATA
CAAAGTTATA
GCGATAT'rAG CGyA LGGAAG TTTAACGCAC TTTATACTCA TGAAATGGTT GACGTGAAGG ATTGGGAGCG TAAATTCTCA TATN'TAGCT GATTGCATAC ATATAATCCG TGCATGGATC ATATGATGTA CTCAAAATA.A TGATGTTAAG Tr'r=GTTAG ATAGATAT CATAATTCTG ATTCTGCAAT GAGTTATACG CACTTGGT'rr TTAAATACGT TATATAAAGC 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 TACGTTCGTG ATACTAGACA TAATAAAGAT ACAGATGAAG AAGTACCTAA CTTAACATCG AkATAGACGTA
ATGTTCTCTC
GCTATGATGA TAGTAGAGA.A CTGTATACGC AGCGCTAAGC GTGGAACCTT TCGATTCCGGA TGAGGCGCTT ATOTATACAC TTGATGCGAT GGAAGCAAAA A-AAAAATGGT TTAGAAAAAT AGAAAATTAT ACACATGCAG GAAATAAAGT CCGTCCATTA TTAAACTCAT TAATCGACAA CGACATCATA TATAAACGAA ATGGCTACTA TACTATAAGT AATTCGA.AAG GTGCTCCTGG AGATATTATO GAAAAAGGTT ATCACAAAGG ATTCCTACCT TTTGCCAGCG GAAGCAAAAC ATTCTCATCA GATGATTTAG TATTTAAGAA AGTATTCAAT AAAGCAATCT TTAAACA.ACG TATAGATAAA TTTAGAAAAA TAGCTTATGA TATGTTTCTA ATCAGTACGG TGGCATGGAA GAGATGTTGC GCTGAGTACT CATCATGGGC CAAGATAATC TGAAACCAAT
ATTACTTGCG
ACCAGAAGCA
TTTAGTGACA
TGATTTCAAA
AACAATTCAA TACGAATTAG GAAGTAACTA TAACAACGGC TGCACAAATG CAACAATTAA GATATTACTA ATATAGATCG TGCAACGAGT CATACCCCAG GTAATCCTAA TAGTACAAAA TTAATGAAGC GGCTGCGAAA CAAGTTGGGT GCATTTATTA
AAACAAAAAA
AAATAAGATT
GACAAATCGA
GGAATCATCT
TCTATAATGC
GTAGAGTTTC
CrCC~rTC
TCAACTCATC
ATATCI'CGC
ATTGTTGAGT
TTATGGATCG
AACGACCAAT
ACTACAGATG
AGrGTTTCTT
ATGTAGAGAT
GGTGACAAGG
ACTTTAGAAA TTCTATATAT GTAAGGATGA GGAGTCAGAT TTGATrGAAT GCAGATTGCA TGGATTrCAA TCCCACAGAA AAATACAAAA CAATCCTAGA CAAGTTGTC'r GGCTTGACTT 14400 14460 14520 14580 14640 14700 14736 AATGTTGATT TGAGAAATAA CT'IrGCTAGT CTAGTJ AGATTTTTC TGGGATTGTT T7rrGCTGAG TGGGK TCTrGAGGGA AGTTATATAA TAGTTGTAAT AATTA(
INFORMATION FOR SEQ ID NO: 172:
SEQUENCE CHARACTERISTICS:
LENGTH: 11770 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
%.AAAT
~GCTT
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 172: ACAGGAAAGC ACGATAGCAA
TGTTGCTTAC
AAATCATTTG
GACI'TTAGA
TTTI'TAGTTG
GATTGACAAC
AAATGAAAAA
GATCAAATTA
GGTGATACGA
AT'rACAGACA
GAGATTCTAG
ATTGCACATT
TTTATTATGT
AGCTTTTTAC
TCATGGGTTT
GCGTTAGATT ACATAAAAAA GATGCAGCCT TATTAGAAGC GGTACAAAGA CAGAAATTAC ATGACGGAGA TTAATT'rAAT TAATACTAGA GTA'rTATCAT GGTTTGTTTT GTGCAGCAGG GCGCAATCTA GTGGAGTTGA TATGCGCCAA ATATAGATGT TCAAAAGAAA TTTGTGATAA TCTCT 'rGGA AGATTTAAAA AAGTATCTTC GATTCTCTCT AAAATACAAT TTTAAAAGTT GTGAGTGAAA ATGATAGACT TGTGCT'rATA TAAACAAAAA AGTTGTATTC TATAGTTACA ATAGTGAAAT GAGGAGGAAT AATTTTATAT GCTGGTGATG TGGTACATGT GAACGGTGTG TCATAATCTA CAAACAAAAT AGCTCTCmT GTrCATTCAC CAAAGAAATT ATTAGTTTGA ?,GTATTAAC ATAGAGGAGO TT~rTCACT GGTATGCTrG GGCAGAAATA GAGGCGTNTT TGCACTATTG GGTCCACAAG GTGTGATGTT CCGATAGC'G AATATTCCTC AAAGTTTCGC GTCTTGCGTG CTAATTTAGT TTGGAAGAAG ATGGGGATTT GATTCAGTTT ATCGTTTTTC TAGTTTATCT GTTGTrTTTG AAAGAAAATT TTAAAATTTC TTATGGAAAT GATTGTTCCA CGAAACAACA TATTTATAAA AAGAAGAAAT ACAGTTAGCT 'TTTGGCACA GGAAGCGTLT AAGATCATCT CATGACCAGT GAAAAGAACT TCATAAAAAA AA.AACATAAT GGTGAAGATT TAAATAATAT GAAAATTGCA CTCAGTCTAA ATTAGCGGAT TTGCTTATAC ATTAGATAAA T'TATTCCGAT GATGGACTAT GGTATGTITAG ATGGGAAAAA AGTATTAGAT 1-rGGCCCrAT AAGGAGATT ATTATGTCAA AGATGGATGT TCAGAAAATC TGTGAATATG CGTGGCATTA ACTAGTTGGT AGTIrGTTCT CATTGCTAGT G7'rTTrGGAG T=GCTATT ATGGGTCTAA TAGCTCTAAA AGATGGGATG TGATTATGGG ACAATrGcCG CTAATTG.GAC AGAGCCGTTT CTTTGATTAC 'rGGGrAAGAA ATTGCACCGA TGATGAAGTT TTAGCAATTT 'rGCCATTGAC TTCGAAGGAT TAAATAAGAG ATGCAAGTAT ATTCAGGAAC
TTTCTIII=
CGGAGTAGAG
ATCATCTTAT
TGGAGGCCAA
C'rTCTTTATA
CAAACAGTT
TATTTTAGCG
TCAAGTTCCG
TATATCATTT
AGCTCTGCTT
ATTAGAAAAT
AGGTTCAGGG
ATACCAAGCC
TGTATTTGGA
TGTACTTGCA
AGGGGTAACA
GCAAGGAGTT
CTT'rAAAGTA
GTGTGTTACT
'IrAAAAGCTT
TTTCGAAAGA
AGAATATTTT
AAAACCTTAT
CTTACCAG CTGGAGTrT ATCCCTAAAC AAGGTGAGGC GGAATTATCG GTGCTATCAT AAGAGAAAAA TIaGTTATTAA GAAGCAATGA TTCCAGCAT'r AAGTCATTGA CTAATGGCG TTGCAAGGTT TAACTGGATC
TTGTGGTGGT.T'TGGTGTTCA
TTATCTAATC TTGATGCTAA GGTGCACATA TTGTTAC'rCA ATTACGrG GTCTTGTAGT TTAGGAAAAG TTGCAGCTTT TTTCCGATTG TCATGAATCC GCTGTGATAG TATATGGAGC TTCAAITGCC TATrcTrATG CTAAGAATAG ATCTGTATCT GCATTCTTTA TrTTGCTAAG GATTrGGGCAC GCTATT~AGTA AAGTTTGGTT TATAGGTTTrG GTAGTAGGAA GTATTTATAC GATGCCAGAA CA.AGTTCCAC AAGCTATTGC TC'TAATTTC TTATCI'TCTA TCAT'rGTATA AACATTCATA GAAATGATTr ATTCTCCTAT *rTTGTATGGT GCTATTGGAA TTGCAT'rCTT TGGGCAkATCG GTAGTAAATG GAGTACTOAC rAAAGCTATG TTAGCCTCTG C AATCTATC ACAA7'=TA GATTCATT'r TAATTCTATC TGCCATGCTT TrTGCAGCAA AATCAAAACA TCCAGCAATA TT-AACGTAA ATGAGCCAGT AGTTATGTT'T GTACCTTTCA TTCrTGTTCC TATTGCAACA GGTTTCATGC AGCCATTCTC TATTrTATCA GGATTTTTGG TGGGTGGATG AGCGATGTCT ACATTGGTTT ATTTTCCA1~r AAATGAAATC AAACAATCTT AGAGGTATTT CTAAAAATTA GAGAGTTAAA A'!T=TTCTAG GTArTATATT TTCGAAAGAA ATAAAAATAT TACAGAAGTA GCA.AGAACAA CAATCAAGAC GAACAAATAT CGAGAAGATG TTTTAAATAA TCTAGCAACA CGCGCCTATG AACGATACAA 'rATGCTGAAA GAAATTNGG AAAATATCAC 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760
TTGCCTTGGA
ATTACTCAGC
CAGGATCGTr
GTTAAACTCA
GAAAATTTC-r
AAGGTGCTTA
GGCAGCCTTA
ATTGATGCTG
GTACACCAGC
TGGTGATATT
TAGCTTACCA
CACATTTGTG
ATAAAAATCG
CGATGGTAAA
CTGAAAGGAT
AGAGAGCAGT
AACGTGCATA
GGAACAACCT AATGTCCTAA
TATCTATATT
TCCTATI1T
GCGTCATGGA
TCCG'TTGG
GTCTGTTTAT
TCACTTAGCA
GGCTCGTAAA
GAAGAAGAAT CTATGATTGC GGGAAATCAA GCCTTCCA CCGGAATATA CGCTAGAATT TGTTCTCAAT GAG7IGGATC GATTTTCT ATATTACAGA AGAAACAAAA GAACAAC?1'A GAAAATAATA ATTTACGTGC TAGAGCTGGT GCCTATTAC ATGGAAACAG GATTIrCCG TATGGAAGGT AXGC'rGAATT GTTAACTATC AGAAACTTTT GCAATTrGGT TTAAGAGG'1r GCAAAAGTAG CTCTAGATTT AACAGATCCA GCAAGTATTG
ATAAAGATGC
777'rGAAAA
GAAGTATTGC
CTGAAGAAGT
CTGGAGATGC
TTGAAGAGCG
ATAAATATCA
TrTTTACGAC TCTATATTTA TCGThATCGA TGCTCTTGCT AAAAGTTAG CCGAAAATGC GATTGCAGAT ATTTGCTCTA GAGTCCCATA TCAATCAGTT 7rGGTTATTC AATTATT ATATGGCCGT TTTGATCAAT ATATGTATCC TGCTATTAAA GTATATGCAA AGCGCTTTGT n 0 6* 0 0 0 a 0 00 0 0 AGAAACAGAA GATAGCATTG TAATAACGGT CGCAGTCAAT TGTTACALATT GGTCGACAGA GGTATTAAAA TCAGTTGCrAC TGCAG63TTA GATGCTCGTT TATGCCTGCA TTTAATAATG GGAAGATGAT GCTTATGATT ATGGGGCTAT CGTTGCACAG GATGAATGAT GGAATTGATC TAAGGATATG AACAACT'TT GACACGAATG AGTGTTATTG TGYATATTCTA TGTTCAGCAT
TTGAACCTCT
CACATACATT
CTCGAGATAA
AAACCC-ATCT
TCATGAATGA
ATGAGATTAT
ACAGTGCCAT
GTATGACTTA
CGGCTTCGGG
CTGAATTAGA
TTGAATTC
TGACTGATGA
ATATATCAGG
TGAACCGGCA ACTACTTTC CAGAAGCTAT ACAAA7TGAA TCTAATGGCC ACTCTC=r~ ATATATGAAG GCTGATTTAG AAAGTGGTPA GACAAATCTT TGGATTAAGA CAAT'rACAAT ,TCTTCAGCA GGAAGTCCTT TATATCAAAA GAAGGATGCT GTTAACCCAT TATCTTATTT ACCGCAACCT AATCTAACTG TACGTI'ACCA GTGTATTGAA GTCATGAA.AC TTGGTTrlTGG 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 0 *2.6 0 0**0 0.60 *0 *0 0 0
TATTCCTTCT
TGGATGTGTT
TATGAACTTC
TAAACGGTTT
AAATGCTTGG
TATTGATTTA
TrGTATTGGT
ATTGCAAGTT
TTTATGCAA AAGGAGTATT GAAACGGCAG TTCCAGGGAA CCTAAGGTTC TACTTATCAC GCACCAAGCT TTGGTCGTTT GATAAAACAC TAAGATATTT TCATTGGAAC GAGAAGTTCC CGTGGAAAAC ACCTTAAAGA GGAATTGCAA ATTTGTCGGA CGTATAAGCC CAAGTCAGCT AAGGTCATTC AAGAAATGTT GCTGACAA6AT TGGTTACTGC AATACACGTT ATGGAAGAGG TCAGCCAACG TAGGGCAGGG
AGGTGGAGCA
TTCATTAGCT
TI'CGCATGCA
GATTCATGAT
TGCTTATGAC
GCCTATTGGA
GTATATGATT
GCAATTAAAA AATIGGTGTT TGAGGAAGAA CTGGAAACAG ATTATGCCGG AGAAGAACT GCACCTAAGT ATGGTAATGA TGATGATTAT ATTTATGTTG ATGAAATTGC TAAATATCCT GGAA'rTCGTT ATTCAGGAAC ATCTTCTATC 1097 ACGTGGAACA 'I~rA~CAACTC TTCACCATCA CATAATATGG ATTACCAACA GATGAAATCG GrrAGCCAAA GAAGAAGATA TTTACATGGG TACCA'rATTC GAAACATCCT GAAAAACACA CAATGIrCTT TCTAAGGCAA AAATAAAGAG GTTCTTrTA AAAGTGGTCT GAAArrGC AGAGGGrTCT ATTAATTTTT ACCCTCITATT CA7VM"CAGG TAAAATTCGA AGACAAGCAG ATTACGTGCA A'rAAAGGCGC TACAGTI'ATT CAGGGAT'rA' TAATAGAATG GAAAATCTGA TATTGATAGA CAGAACTCTC AGTAAATAAT GCTTAGCTG ATCAGCTTTC GCCATGCCAT TGTTATTCA.A AATAGTCGTT AGTCCTTCTA GATATATTCA GATTTGGGAA ATTGCCCTAT CGATTGAAG ArrACC1TACA GAAGCTTCTG ACAATGAAAT AGTATTATCG GTCrrGGTGG A?'rGAAAAGC CTGrrATTAT TTATCrTMA TTTATACAGA CCAGATTTrAG TrTrGGTTGA TCTGGI'ATTG CAGATGGTIrr GGAAAXACTA TGTTGGGACA CAGATGGACG CAACGCGGGT ATCAACACGG CCCTACATCT TAGGTGGGC? TCTCTTAAAT AATTAAAACT AATTGCTT AATACAATGT TGTTTCCAGA GAGACTTAAT TGTTCCGTT~ CCCAAGATGA CATTATAGGA TGGAA7rTAT GC7"TGACACA CGCTACCTGG GGTAAC'TTCA 7TGAACGAAT CAAAGATGTA TGATTNCTCA AGATTTTGAA GAGATGATAT ATTTATCAAA TAAAAAA.AGA GGGCTACCAT TAGCTATCGA AGCAGGAGCG ACATTGATTC AAATTCTGTC CTAGTAAGAT TTTAGCTGCA CAGGTGCGCA TGCTGTTACA CTATCCAAAA GGCGGTTGAT CCATTTAGAT AGAGAGGAAA GGGGGAAAAT GCCTTGTI'TG TCTArrT'CC GATCAGTTGG TAGGTATGGT TTCCATATTG CAATCGAGTT GTTGCCTTGG GGGAAAGACG ATTGATAGCG TGCTCCAACA AT'TGCATCGA TGAAGGTGCA TTT-GATCATT TACAAAAGTT ATTTCACAAG AGCAACTTGG CTTGAGGCGC ACAGCAAACA TTGGCTGGAG ACACCGTTAG CAGAGGGTT4G cTTTAAAAT CTGTTTCA6AA CAGACTAA ATCCTCAAAC TTACGAACAT TCTITAATCG GAGACCCTGA TTGACGCTCA GCAGGATACT C1'GCATTCrT CGTACTGAGC ATACI'TGTA TTAAMT'rTAG ATGAGA7MA AATCCCACTA TTGCAAAAAG AGAGAATTGA TTGGCTCTAC GGCATCT'rAA ACGATCCTCA GTACCTrGTTA CTCCAGCTGG ATCACTGCAA CAGCTATTTA GATTACCTAG CTCCATATTA ATTCG;TCAAT TAGCTCTTGC TCCT=TAAAA ATGTAGCACA GCAGGAGCGG ATCTTTTTGA GATTTTTCTG ACGATTGGTT TACATATGAG AATTTTTGCT AAAATGCCAA ATCAATTTTG TTTATGATAT TGTTCGAAAA 7TCGGCGCT ATTTAATGGT CTGAGAAAGA AAAITrGTGAT CAAAAGCTAT TGCAGATTTG CCGACGCACC 
TGTATCTGCT ATCTATTTTA TTCTAAAAAT CCCCTAAGCG 'rTrATTAGCG GTGCGI-rAT GCAGGCAAAT TTGCA.ATTGC GAAGAAATGT 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 GAAGAAACGC TGTTTGCAGA ACACCAGCAT TAGAAAATAT AGTGGAGGAT TACCTGCGGC ATTCATCATT TAACACATGG GAAAA'rAGAC CTAAAGAAGA CCAACAACTC TAAAAGAAAT GGTAAACAAG CAAC'TATGGA TCAGATGTTG CTCALAGCTAT AGGACTACTG TTICCAAAT 1098 TGGrTTACAG GCTATGGCAG CTGGAAGC TAAAGTGCTG TGTTGAAGCT AATACTTTAT TGAG1'GGTCT AGGITTTGAA GCATGCAATT CATAATGGTT TTACTGCATT GACAGGTGAC TGAAAAAGTA GCrATGGAA C1-rrAGTACA ACTATT'ATTrG ACTTGATAAG TATATTGAGT TTTACAAAAA AATTGGTATG GCATTTGGAT CAAG TGGAT ATGATGATTr AATAAAAGTT
GGGTGAGACA.
TATCGCTGTA
GGTAGTCTTT
ATTGAAATAG
AGCAAT'rGTT
CGACATAGGT
GGTCTAAAAT
CTGCATAACA
AGATACACTA
AGTAACATCA
TAAGTGGCAG
ACCTAAGTTT
GATGAGAACA AATCGATTGG TCGTACTATT TCAGATTCAG TGTCGGCTAT 'rTATTGTGAA AAGTA'rnl'G TGCTATACGA ATGGGATAAA AAGTGGGAGA TACAGTITTTT CCTT'rAAACA CAAAATTGGA GAGAGACCAT GGCTACTGGG CAGAACATCA TATGCGCTTG ATATGTTCCC ATTCATCAGA TGCCGTTTAA GATGCCTATG TAAATTCAAA TATTGATCCC TGTATTGAAT GAAAGTAAAA TrAATTTCTA TCTACTATAT GTTCTTCATA TACATTrAATT AGCA~rCCAG GATAAGCTTC TTGACTTACT TAGAGCAATT CATAGTCATC CATTTCAAAT TCCCTCAAAA GAGTTTrTTAC AATCATAAAG TACATTTAAG ACAGGAACAG 'PTATCCGTCT GGAGCTGGTC CCTCAGTCGT TACAAACGTC 7TMrGG'TTTG CCTGCAGAGC AGCGGAAAAC ATTGCCAACT
GATTTCGCCI'
ATAAACAATA
TCTATAGAZAG
TAAATGTTTT
AATCAAAAAG
TTTTATCTrC
CCTTGATTTA
AAAATrAATG
ATGGTATAAT
AAATTGACCC
ATACATCAAA
TGCACGTAGG
CGCAAGGCTA
AATACGCTAT
TCAAACGTCA
6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100
ACACCCAGAA GGTTATACTG CAATGTCCTT CACCCAATGG GGATACTGGT AATGACCCAG AATTAATGCG CT'rGGATTTT CTACTACAAG TGGACTCAAT AGCTGAAGTG CCAGTAAAC1T GCTTCCTGAC GGAACTCI'G ATGGATGCTC AAAATCACGG TTGGTCAGAG TCTATCAAGG TGTAACT'ITC AAAGTAAAAG CACACTN'TC GGTGCGACTT CACAAGTTCA GAGCAAGCAG
GTTGGGATGC
CAGAATTTAC
CTTATGACTG
GGATrTTCAC
CAACCGATAT
GGATCGTGAA GTCAACACAA CAGATCCAAA CAAGCTTT'AC GAAAAAGGCT TGGCCTATCA GGGTrTGAGGA ATTGGGAACT GCCATTGCCA ATGAAGAAGT AGCGTGGAGG CTATCCACTT GTCCGCAAAC CAATGCOCCA CTTACGCAGA GCGCTTGCTC AATGAC -rAG ATGAACTAGA ATATGCAACG CAACTGGATT GGTAAATCAA CTGGTGCCAA GAACAGACAA GGAATTTACA GTCTTTACTA CTCCTCCGGA TCACTGTCTT GGCTCCTGAA CATGAATTAG TAGACGCTAT AAGCTGTAGC AGACTATAAA CACCAAGCCA GCCTTAAGTC TGACTTGGCT CGTACAGACC TTGCTAAAGA CATCAACCCT GTCAATGGTA AGGAAATGCC ?TATGGAACA GGTGCGGTTA TGGCTGTGcc CAAACAATTT GACCTTCCAA TCGTCGAAGT CI'ACACAGAG GATGGCCTGC ATGTCAATTC CGCTATTGCC AAGA?'rGTGG CTTGGTTGGA CTACCGTCTC CGCGACTGGC TC -rTAGCCG CATrCATrGG GAAGATGGAA CTTCAACAGC GCCTGTAACC AAGGATATCC GTCC'N'CAGG AGA'rTGGCTT GAAGTGACTC GTGAAGATGG GCCACAATGG GCTGGTCA.A GCTGGTACTA GAAATTGGCT GATGAGGACC TCCTCAAACA TGCGGAACAT GCTGTACTITC ACTI'GCTTTA CCTCGGTGTT GTTCCGACTA AGGAACCATT GGGAACAAGC 'rACCGTGACC ACCGTGGTGC TGATGGTCC TTCT'rCCATG TAGAAACAGG GTCTAAATCG CTCAAGAACG TTGTTAACCC TACCCTTCGT GTTTATGAAA TGTTTATGGG AGAAGGTTTG GAAGGAAGCC GTAAGTTCCT AGAAATCCTT GCGGAAAACA ATGGTGCTCT TGTTACTGAG CAAATTGAGT CTCTCA;ANN' 1099 AAAAACAGGG G~rTGGACTG GTGCjrATGC AATCTrGGATT GCAGACTATG TCCTTGCTAG TGCCCACGAC CAACGTGACT CGGAATTTGC ACTTGAAGGI' GGAAATGTCG AAGAAGCTGC AGACTTCCTA GATGGATTGA ACAAAGAAGA AGAAAAAGGC TGTGGTCAGG AGAAGCTTAC TCAACGTTAC TGGGGTGAGC CAATTCCAAT TGTTCCTGAA ACTGAATTGC CGCTTGTCrr TACTGaGTGAA AGTCCACTAG CTAACTTGAC TGTCAAAGGT CGTCGTGAAA CCAACACTAT CCTCCGCTAT ATTGACCCGC ACAATACTGA
ATGGTTGCCA
TGCTCGT7TTC
CCAAAAACTC
TCTTGTGGCA
GTAGATATCT ACGTCGGTGG TGGCATAAAT TCCTCTATGA TTTAACCAAG GGATGATT'T ACCGACAAGG T'rGAAAAACG 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 GGAAGAGTTG GAGCAAGCGC CAGCCAAGAT AGACGATGTG GTGGAACAAT ACGGTGCCGA ACCACIGCGAT GCTTCGATTG CTrGGTCAGA 'rGACCCAGTT TACCGTTTGA T'rACAAGTAA TGACAAGGTT TACAACGAAA CAGTCAAAGC CAACACAGCT AT'rGCCCAAC TTATGGTCTT
TCTCAATGCT GCTAACAAGGAAGATAAGCT TTATGTTGAC ATTCATTGCA CCATTTGCAC C'TCACTTGGC AGAAGAACTC AGGTGAGTCA ATCTCTTATG TAGCTTGGCC AACrTGGGAC TGAAATTGAA ATTrGTCGTCC AA.ATCAAAGG AAAACTTCGT AGATCTATCA CGTGAAGAAT TACAAGAAAT CGCTTTAGCT AA'rTGACGGT AAGGAAATCG TGAAACTAAT TGCGGTACCG CGTTAAATAA CGAGTTrATT AGCTCTATCT GCCACCTTCA ASCCAACTAA ATTAGTTAAC ATTGT'rGTGA AATA.AGATAG TATGCCAAAG GCTTTATTCA TGGCAAACAG TCCCAGAAAC GAAAGCAAAT TGGTTGAAGA GCCAAACTCA TGGTTGCTAA GATGAAAAAG TCAAAGCAGA A.ATAAACTCG TTAATATCGT ATAGTCCACT GGACTATTGA GAGTCCTTCA GAGTAGAATC
TGGAGGATTT
GAAAAGTGAA
TCAATGGAAG
GCTAGGATAG
CCTGATACGC
GAACTGGTCA
ATAGACAAGG
AACCGACTAA
GGAACAGAAG
TATGAGTGGA
TI-rATTTATG
GATTGGTTGT
TTGCGTCCTG
GAGGTGTTGA
ATCTCTATCA
rl-rGAATC=r CTTATGAAAG ATAAGGAGAA TAAGATGCC-A CTTATACTCC AGGTGAA?1'G AGGCTCTT'rC AGTGGAAAAG CTCGGGAGAT GTGGACCTAC GCCTrMAAA TCAGATGTTG CAATGGAA GGCTTrGGCA TATGATATAC TATGGGCAAC GTAAATGAAT ATGGI'CAAAT cC'Crr ATTrCTTAGA 'rATAAAGTTT
GATTGGGGAG
AGGGCGI'AT
CATGCGGACG
CTCr'rTCAGG
GCTCGTAAGG
ACTTTTTCCC
TTTCTCCAG
ATrrATI'AGC TGTTTATGGC AGTCAGTAGC AGACATGGAG ACCGrTTA TTATGCAATC TCATGCGAAT TCATCAGAAT AGCTCAGGGG GACACGGATA
TAGAAGTGGG
CCCACTATCT
AATGCGATGC
AAGCAACCTTr
CTATGATTGA
AAAACTrTGA GATGArrACT
GGCTG'TCGGT
AGCTGTCACT
CTTGGCTTGC
TCTTAACCTG
CCGTCAGGCA
TAAGGACTGG
TAAAAATGGA
ATTAAAAAGC
TGGACAAACT
CCTTATCTCA TTCA'rTAGTA ATT-TATCTGG TTCCT7TGGT GGAGATGGT TT=CATCAG GCTATCAGCG TCAAGGGAT GCTAGCTCCT AGGCCTATCA AGTCCAGCTG GCGACAGAAG CTATGGGCTT TGAAATCTTA TCCACCTATG AAAAATAAAA AAAC~rGTTT GTTCTTAAGC TCATTAAATA AAGACCTCCT AACTrrATTT CTCCTAATGA AGCCACCCAA TCAGGTGGCT CTGAGGTGTA AGTCCTCAGC CTGACTATCG AATCGTGGCT CTACGAACAG GAACGTGATA CAAACTCTAA AGTCCAAAAA GGTAGTCGTA AATTCGGACT AAGGTITGI'G TGAAAAAGAT CAGATTTCCT AT=rCACTG TAACCTTTTA TATGTCTGT AGGAGCTTAA C1'ATCGTCGC CCATCCAGAC GAGCAGCGGA ACGFTGGGA GTGGT'rrATA AGGGGCG1'AC AAGAGATACG CCTCAACTCA AAGCTCGATT GGAAATATGG CGACAGCACA AGAGCTTGAG AGAACT'IrAA AAGAAATTGT CAAGCTAGAG GATGTTTTGC ATACCCATCA AACAGAGATG CTGGAGCAGG CACT'GATGG TGATGCTGTG G'TGGGCI'TGA TTTTTGTACA GGATTTGATT GTTTTGCCTA TGATGAAAGA GGCTTTAGGA AA'TT'rAAAG AGACAGAAAA AA.ACGTGGGA TTTATCGTT ACTGTACAGG AATGATTTGG ATAA.ACAGAG AAACT'rTAAG GATGGTCTAG TATCATATAG AATAAAATCC TAAAC?TTI' TCATCACAAT 7"rTTTGCGGT ACGACCGGCA TGTCGTATAT TGAGAGCA GGGAGAGGAA GGGATAGCGA GTAAGGCGTA TATAGCGGAT AAGGAGCr ACCTATATGT GTAAATCACG AGAC;TAAT'rG AAATCI'TCT AGAGTCI'AAA GACTCTGCGT ACGI'CCTCAT ATCTTGTATA AACGAGGAAA 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 GATGTACGAC TTATCCCGTG AGGTTTCATG AGCGCTGAAA GCGTAGTAAC AACGAATCAT GAGAACTCAG CCGAGCCCAT AGTAGTGAGG AAACT'rCCGT AA'rGGAAGTG GAGCGAAGGG 1101 GTGAATACTC AAACAG3TCTG GGGAGAGACT GTPTGAGGTC TGTCGCTAGA AAGAGAAAAC GACAGATCGA AGTAATCCTA CTTCACTI'GT GTCTGTAAAA TGAGTGGTCT GATAGAACTG
GACTTTGAGG
11700 11760 11770
INFORMATION FOR SEQ ID NO: 173:
SEQUENCE CHARACTERISTICS:
LENGTH: 4185 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 173:
CGCGAAACTA CTTrCTTAGT TTTTTCAATT TTTTCAAGCT AAGTAGTAAA CTAACTACTA TTrTCAAGGCC TGATAACTAT ATGGCAGTCG CAATGGTATC
AGCAGATTGC CTTTGATTCG GCCTCC1TGAG ACATCTTGAT GTCAGGCTGA CAGAAMAAGC AACTGACTAG GATGGGTTGT ATAACACTTT CAGAATCATT GTCAATAGAA ATGACTTGAT ATTTCCAAGG GTTGTAAAAT CGTCCCTGAT AAAACAAGGT TGCCAAGAGC AAGGTAATAT ACCATGTGCG TTTTTrCTCT TTCCCAAAGC AATGCGT'rCT AGCGAGCTAA AAATCAAGGG T1TGCATAAGA GAAGCTCT TGGATAATTC
AGTAAAGAAT
AATCTTATAG
CATCAAAAAG
TCTTCCTGCA
GGCACACCAA
ATAATAGCCA
TTAATGGCCA
CCCTGCCAAA
TAGACCATCA
TCTrTAAAGC
AGCATTCTCG
TTCCCAGCTC
GCTTGCATCC
GTTTCTTAGC
GATCGCTCAA
GGACCCGGTC
GCCCTTTTG
AATTTAGCAG ATAAAAGAGC TCCTGGCTGG TCACACTrCT CTCTCCATAA AGTCCAACCC AAACGTTTAA AACGGCAAAT ATCGTCGCAA GAATTTCTGA TAAATAGAGG AGAAAGACTG TATCATAGCT AATCATGGCC GCCAATGATA CTGACAAGCG ATGAATCACA GTATCTCTAT CTCTCTCCTT TCTTrGTAAA ATGCCGTTAA CAAGTrTAAAG ATGGAGGTTT CTTTTAGATT CAGACTGGCT GGAACAGTAT CGGCAATCAA TGAATAATCC AGCATCAATT GCATATCATG ATGTAACTCT TCGAGAAATT CCATAATCTC
AATCTGGAAT
TTTGATTTA-A
GAGGA.ATGG1'
TTAGAGTGTA
CATACTCGGG
AAACGGCTAC
AAkAGATGGC
CCAGXATGAA
TCTGCAAGAT
AGTCTCCTTT
GGCGAACTCC
CGTAATAATG
CATCCCACGC
ATAGCGCAAG
AC1'GGAACCA
GCAAAGATAC
GACACCGATT
AGAAAAGAGA
AAAGGAAACA
AATCAGCAAG
AAAGACAAGT
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 GCTGGTAACC GATTAATTTA ATCCACTGGA TCCACATCTA GGCTTTTACT AACAGCTCAG TTCTCCATCC ACCATGACAA GGTAATCATG ACAATGGTAT AGTATAGTTC TTCTGATCTT *0 .0 0 GACCTGCAGT CGGTTCA'PCT AGGAGAATAA TGGTGACACG TT ITrCTGA CCAAATGACA AAAGTCCACA GA=~CAAG GTr'rCATATA crCGCAAACG GAGCCCTAGA GCCACCTCAT TAGGA'G TACCACATAT CCTACrCGTT TATCCTGTTT ??CCCAAAGA TAGCGTCCTr CTAGAGTTGA TTTCCCTGCT CCAITTTTTC TATCTAAATG TAGGGATTT AAAATCGGTC GTC'rAAAGAG TGACTGCAAT GCrGGGG~rr GACCTTT'rGA GATAGACAAG TTATCCAGAT CACCTAATTrG ACGGAGAGTC GTTAGATAAA AATCAGTCGC AAGCAACTGG TCAGGGCrCC AGACAATCCG ATCCACAGGG CGATGCAGAA TCGTCGTCCC cCCTCCTTA TGAATCTGGT TGGGATCTAG A'rTGGCGAGT GGCTCATCAA CACCAGCCAG ACTGACTCGC TGC -TTGTC GTAAAGGAAG AAGGTCCAGC TTTTCAGCCC GGGCTGTCAC ATCATTTT-CC AGAGCAAACG ACTGCCCATC TGTATCCTGC AAAACTGTGC TATCAAAGGC TACT'rGACCC TTTATCAAAA TGGGAATAAT CCCATTCAAA CACTGACCCA CAATTrAAGAC TrTCTCTCCC TTGTAAATGG GTTGTG~Tc ATAcccGAAA GAGAAATCCT TCTC?1'CArr CGCTTCTTAG ACTCTATTr TCCTCTTAAA ATCTTAGCGC CAAAAAGATT ATAAACATCG AAAAAGACrA GTTGCCCAGC CTCTTCAAAC CAcGTCAGc'r TCGCTrGCC ATCTACAACC TCAAAACCAT GTTTTGAGCc 1102 TrTCAGCTCC
GGGCAGAAAT
CTCTCGTC
CAAAAATCAT
CCGCCCGcTC
CCGTCTGAAT
CGACAATAGC
TATCATCATA
CTTTTrGCCAC 'rCGCTAA'rr
CGCGG=TCCG
CATTAAAAAG
CCTCCTCCAA
CAATCAATTC
TAAGACCAAA AT'rGAAGCA6A AGGCCAATTA CGGAATTCA'r AAT'11rCCTTC TCATCCACAC ATTGGTwrGAA ATCATTTGAT TGCAAC-AGAA TCGCCTTTTA AAA=~ACrr ATAGCCTCG AATC7'r"rCA CCCVrNA.A AGAAAAACAT ACTTCCTCTA TTCATTCTGC AACTGAACCT TTrCTCCTTG ACTAAGTCCA AATT-CCATTT TGAGTCAATA GATACGACCA TCC~rrATCA ACGGTGCTCG ATAATAAGAG GATAATATCC TGACCTGACT ACAAGAGAAT CGGACTTTCA TCAATCAAGA CACCTGACA6A ATCCTGAGCA CGCTGATCCA ATTTATAAAC ACGACCTTTC ATCTCATCTA CCAAATCTTC TGCCACAGAC AAGCCAATAA TAACCAGATG AGAC -rA'CA TAGATGCTCA A'rTCTCCATA TGTCTGACCC 'rTGTAAATALT AGGTAGATTT ACCTGACCCA GATGGTCCAA TCAAGTCTAT CCCTTGCAAG GTCGGTTCTT TCCACTCAAT TaTAGCTTCT TTCATCTTAC TATCATAAAT CAACCCCTTC TGCAGTCTC CCTATCCTAG CTTACTTGCC TAACTAATCT 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 CTTCCCCATC AT'NTATACT GTAGGTATGG TTACTGACTt TGCTTCGTCA GTTCTATCCA
CTTCGAAAAT
CGTCACTC
CAATCTCAAA
CATTGAGTAT
TAAGACAAGA
ACACTG7=~ GAGCAACtGC GGCTAGCT'rC CTAGTTTGCT CTr'rGAT~rr TAGTCC'TTT TCAAACTTCC TGCACGAGTT TGGGTTCCTG CATAGGCAAG 1103 TACACCATTG GCAATTCCCG GTTCCTGCAA TAGCTACAGA ACTTTTTCTG CCGCTrCTTG ACAAGGGCAT TTGCAAGTAG ACACCATTGA TCACGCGAAC CTAGCGATAA TCCAAGTCCA CCAATCAACC CGACAAGCAA ACCCCATACT GAAGCTGGAT ATGACGACAA AGAGGGCAGC ATTTCCATAC TATTCTCCTA AACCGACACC TACATTATAG GATAGAGAGA GTTGATGTAA CATAGACAAC CrGATrnGA GATCACCAAA 'rTCTGGTTCC AACCGAAACG CACATCCAGA TCGCTACGAC TGGAAGCTCT AAGTAATGTG ACCACTGGCC AGGTATAAGA ATGCTCI'GTG CTACAATGCC AACGTGT-r'C ATGATTTTT TGCAGTCTTG CAACCGATAC AAACGTCGTT
ATAAATCACA
TTGAATGAGA
GTAC?1'TCTA
CCATAGACCA
ACCGATAATC
GCTTGTAT
GCCAATTCCG
TTTTATCCTT
ACATCTCCAA
TTAAAAATAA
AAAAGTCCCA
TAACCAACAA
GGrCCAA
GGAACAGCGG
ACAGCAACAA
CTATTTrTCTT
CAACAATCCC
GTGGTGCCAA
GAATATCTTT
CAACTAAACC
GAGAGTCCTr
TAATAGAAAG
TTGTGCAAAT
GACACCCCAA
CCAGTCAAAA
AAAGAGTCCG
GA'TGCCATGA
TAGCGCTTGT
TTGGAATGTTr GATCATCCCG CTTGTTTAAT TGTAAATTTG
GCCTTGGCAA AGGAACCT'rG AGGATGGCGPT GCCCAATTCT TAGACCAGCA TATCAGCATG AGCTTGTAAA ATTCTTCCCG ATATCAATGG CTCCCTTCAC ACAATCTGTT CCACACTGAG AGT'rTAGCAC CAGTATAGGC TrTTTGACGCC TAT'rGGCCAC TTGATAAAGG AAAGCGTCCC GCAACTACAC TC?1TACGTTT CCCTCAGGCC AGTAATCCAC
TATTTCAATG
G'rrGATAGC
CACATCTGCA
ATAGATGGTC
TGTGATAGAG
CAGATGATCT
CTCTGGCCCT
ATAGACATCA
CTCAGAAATC
ATTATCTGGC
CGAACCGACA
CGTCT
GTCCAAGATG
AAACCTAAAC
AATGATTTGC
ACTTCAAAAC
GTCCAAAGCG
TCTATGATGG
ACTTCCTCAA
CGACCGTGGA
TCACGAATGG
GTGACAATGT
CCTGGATCGA
3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4185 INFORMATION FOR SEQ ID NO: 174: SEQUENCE CHARACTERISTICS: LENGTH: 2069 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 174: TGATAGAGP AAAGCCGCTG AGTCATTCAA TCCATCTCCA ACCATCAAAA TAGTGTGACC TGCTTTCTGC AGTr"TCTCTA CTAACTCAAA TTTCCCATCA GGTTTCAAGT CTGTATAGAC CTGATCAAAG GGCAAATCTT TGACTAA'PTC CTCTGTCCTA ATCAAGGTGr CTCCTGTTGC Page(s)l1. 4. 7)0were not lodged with this application 1107 CCC ITTCTT GGAAAAATAA TTTGATTCAG TATGCAGG1TC GGNTTCATAG AAACTACAAG 3120 GG~1C~TGGGCGTAT TTT1CGATTAT GTGGATATC ATGTCCTA TI' AGAAAAG 3180 ATG~TTCAGA AACGACAAGT AGCTTATCGA AAGATGGAT ATCGTGTCAT GGAG 34 GAGAAACAAT TCGTTTATGT TGATAGTAA TATGAGAAGG TG?1'GAGAGA GGACGC 3300 GGGGA .AAGAC AGGAATGTCT GCTTATT1TA CCTTATGG ACCAGACAAA ACTGA'rGAAT 3360 TTCTAAAAG AATTTAGGAT TAGTCAAPT GAGATATGTA TrACCAGAGAC GGT GCAA 3420 AXAGCATGGC TAGArCAGT GAAGAGCCAG AAAATTAAG TGTC:ITrMAC 'rCAATCAAA 3480 ATACI'AACGC CTA'TCT'TT GGTGAATAAG ACTATTGTTT' GGTATGG'rGC AATGCCATTA 3540 TT1AGGGAAGG 'rAGATCGAT GACCATTTA CGTT-TGGAA'r CAGCTAGTAT AGTT'C1GA 3600o cTAGTGGcAG GTT'TACGATA GAGAAAAT ?TAAAAATT CTrATGTATGA TTTTCA~TTC 3660 T-r'rAGTGAGA CTGTTGCCAT TNTCACATI'C GXATCACACA AAATAAAAA ATTAA 3720 GTrACTTGACA AATAGATTGA AA'rATcAAA AATAAAAACG GTTACAGAGI' TATTAAT'rAT 3780 TT~AAGCTTCA TGTCACCAT AAAAGA ATAAAAGAT GTTATCACTA ATACAAGTGA 3840 GcAGGAACCT ATT'TAATCAC ATCAGAAGAA GTTCTGAT CTT IThAGT AGGTTCCTTT 3900 *TATTTAAAA GGGAAATT1T ATGATCA'I'A AACGAATACT AAACCACAAT GCCGTA'TT 3960 *CGCAAAGTAA AAAAATAC GATATTCT~c TTTT1GGAAG GGGAATAGCT TTGGAAGAA 4020 *AAACL'GGAGA TAAAGTAA-AT CCAATTGATA T GAGAAAAG TTTTTTTCTC AAAAATAGAG 4080 ATAATATGAC CCG1TTTACA GAGATGTA TTAACGTTCC ?'rTGGAGT'rG GTGTACATCA 4140 CCGAAAAAAT AATTrAACCTA GGTAAAATAA CATTGGGTAA TAATTNGAT AA'11TATC 4200 ATATTAATT AACGGATCAT ATTTCTTCGA GCATAGAACG TT~ATAAAGAA, GGGA'TTATTA 4260 'lTT-CGAATCC CCTACGCGG GAAATATCA ATATTATAA AGAAGAATT 
GAACTTGGCA 4320 ***AAAGGGCT-1- ACAAATAATA AAAAAAGAGT TAGGTATTA ACTCCAATT GACGA.AGCG 4380 C..ATCATAGC GCT~ACAT'1T GTTAATGCTA ATTTAGAAAA TAATTTTCA- GAGTCGTATA 4440 AAATCAcTGIA AATAATS'ATG GGAATTGAGA WATCArTCA AGA'N'TCI'AT TGTACTGAGT 4500 TTAACCAAGA .IrCTATTGAT TATATAGAT TCA'rAACTCA TATGAAATT'A ?TTGCccATC 4560 CTGGTTGA GAATACAACT TAT1'O1'CG ATGATGA 4597
INFORMATION FOR SEQ ID NO: 176:
SEQUENCE CHARACTERISTICS:
LENGTH: 3984 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 176:
CGGCTTATTT ACTAC'rTGTT CCATCATATA AGGGAATN'T TTATCCACTA AATAAAGAGC TCCTGCACTA TTGGATACAA CTGCCGGAAG TTTGATAGGT AAACTACTAA AATAGGCCGA TGGTGATTG CCTTCGTTAT CAGGTATGCG AGG'rGTTAAG CCGGTCCTGC CGCCTGCTTG TCGTAATATT TCTGCCTCrA AAACTTCTGA CCAGCGAACT TCAGCCCCAT TAATCTCAGA ATTAATCT'IT TCACCTCCAA AAGGATTAAA CTTTACTAAA ATGCGAGAAA GTACATTAAG ACACCTGCTT GAGCCTTTAT AACGCCAT'rC TGGAATATGC ATGAAACCTG
TTGGTACATC
'rCCCTGTlr
TGCTCCATCA
AA7=rATTGC
TTGATAGCTT
ATACGAATCG
CTCTCATATT
AAACAAAGGT
GTACCATCGC
GTGTATCAAT
AGCATCATCT TGATTAATAG CCACTCGTTC TCCAATACAA AGTACAGCAT CTGGTTGATA
CGACTTATAA
TGGTAATAAT
ACCTGTAACC
AATATGTGAA
TGATCTTTCA
AAAACCGTTG
TTTACAGCCT
AATATTTTCA
TAACAATCAT
TATCCATCAA
GAATTTCTAC
CCAAAGCTGG
T=~ATTTTC
TACTAGAGCA
TGcrGCTGGT AGAGCGTTAA AA'rTAGCAGC CATTGGGGTC AATAAGGTCC CACAATAACC TGCTGTCATG GCAAGAGCAC CAGCCACAAT TGGATTAGCT CCCAGAGCAA ATACAAAGGG AACTCCAACA CCTGCTGTA-A TAACGGTGAA TGCTGCAAAA GCATTTCCCA AT-rCCAAGAA CATAGGCCAA AACTCCTATA AAGCGACTAT ATCAGATGAG AGATAACATC ACCAACACCT GCTACAGTAA AATAATTGAG GAACAATCCC ACTTGTTGAA ACTTGCTGAG AACAGACTC'r TAGGGTGAC'r ATT-GGTAATC ACAAGAACAG GCAAGGCTAA TCGAAATCTr GCTAAATTCT GGAATCATTT ATTGCCATCA GCATAACTGG AATAAAAA'N' TTATT'rTTCA GCTTTCATTT CATCTAAGGA TGGCAAGGTT CCGATACCGA TAkATCATTGT GAATAGAACC CTGAAGGAAC AATACCGCTA AAATAGCCCC CAAAGCCCCT TCATTrCGAT'r ATTTCTGAT AAATTGTAGC AAACAAGGCG GCGCTAAGAC CAACGCAAGT ACCTGTTAGA TTCAATATTG CTTGCTTAAA CALATGTTAAC GCATATAGGA ACCACCTATA GTCGAACTGG GTTTGT TTA 120 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 AGCGATAATA GGATTACAAT AACGTAATAG ACAATAGAGT TCTTTATAAC TACAATAGGC TCTAATAGTIT GCTTTGCCAA TCAA~TTT ATCAAATAAA CAATAATAGA TGTAGAAGCA AATACCAATA CTCATATTTG CCAAAATGCA GATGTCCCAA TGTATGGAGA AATTGACAAC CAATCACAAT ATAGGTCAAC CTCTGTCATT TTTGTTCTCC TCCCCTAGTC TTTTTGATA TAATTATAAA TCCCCACTAC AATAAGTGTT ATAACAGCAA ATCCCTCCAT AATTGCTTTC ATAGCCTAAC TGATCTAATG 1109 TTCCCCCTAT CAAG3AGGACT CCCCCAGCAC CTACAAACGT ATrrGAGCA AAGAAAT'rTC CAAAATTTTC ATTCGCAGCC GCACGCGCTT TTATTGTCTC ATCTTCAACC TCTG=rAAcT TTCTACCTAA TTGAGACTCT GCAGCTGCTT CTCCCATAGG TTGAACCAAA GG1TCTGACAA ACTGAGGGTG TCCTCCTAGA CGAATTGAAA AGAAACCAGC TAACTCTCGA ATAAAGAAAT 1680 1740 180Q 1860
AAACTGTATA
GTTGCTTGAG
TGAGCACTCG
CACCTGAAAC
GAAGTTTCCA
TCCAAAGGTT
CTCATTGCTA
TAAAGCTGTA
ACTGTCAGAC CTTTAATCrr TCGAATCAAA TCTGACAGCC CCACAAGAGG CAAGGTAACC AATrCTTTC CCAAAATCTC CAAAAATTCA ACCAAACCAG CTAAGACTAC TCTTGCAATT TTAAAATAAA ACCCACAACA ACTCCTTTAT ATTCAAAATG TATTTAATGT GT'N'TCTAC TGCCATTTTT 'rAAAAACCTC ATTCCTTTAA AGATATATGG GGTACTACCT TACTTACAAC
ATGATTGCTA
ACAGTATTTT
TTC7'rTTTGG GTTATArrCr
ATCCCCCTTA
TGGAGAACCA
TTCCTATTAA TCTAATCCAC TAAAATTTTA TCAAGATCAA TATTTGCTAT TGGATTCCAA 7wTTGTTGTAA CAATATCTr'r ATACCAACTT GTGAGTA'rGG TCGATGATC 1920 ATAAAAATCG 1980 ACGAGAGAAA 2040 GTATCAAATT 2100 TCCATATCAA 2160 TACCATrCCT 2220 'r'rTTCTT1- 2280 T'rGCAATr'rC 2340 TTTTGAGAAT 2400 CAATAATGAA 2460 CCAGATGAAG
TATT'=CAT
CTGT'rGGCAT ATCTTTTCTC TAGGTCAGTG CTATCTG'rCG ACCAAGCTTG AGCTTrGGCCA GCrCTCAGAAA CAGCATCTTC TAACAAT'rrC ACATCCTTAT CTTTTCCTT'r CGTAATACCA 'rCTATAATAr TCAAGTGAGA CGCTGGATCT AAATCATAGT CTN'TGAGA AGGAGCAACA TTATATTTAT CCAAACCAGG TTTAGCTACA AGGTGTTTCA TGGCAGAACC AGATGGA TGATAGCTTG GAGCCCATCC TGTACTGTTC TTGCCGTATT TATCATTTTC CATCAAACCA 2520 2580 2640 2700 2760 TCAATAACAT TTCCAATAAC GTCTGTCCTC GATGTTCGAG TCGCTATACT GTAGCCCAAT CATGCTGGAT CTACTGCATA GACATAAGAA AATGTTGTCG GTGCATCTGC TTCTTTATCA 2820 2880
GTTTTTCCAC
CACTTTT1TCT ACCTTCTG4GC
CGTCACAGGA
ACCTCCAATT
AAGATTATCA
ATAAAGCAAG
AAGCCACTAA
ATTCATAAA
AATAGCTGAC GTGCTCAGGA CCACTCCTGC GAATCTCCTT TGGTTTATTT TA.ATCTACTI TGTTAAGAGC 2940 TTACAATCCA 3000 GCTTCAATAT CGCCAAACTG AATACCCGTC AATTCATTAT CCTACTTCTC TTCACDATA GAATACATGG AAATCATCAC GGAGAAATAA CCGCTGCTGT ACCACAGGCA CCTGCCTCTA ATTGGAACAT CACCCTCAAT AGGAGTTAAT CCCAACGAT GAATACTTGG TAATAGATGG CAAGATAGAT GGACTCAATG
ATAATTTACC
CATGTTGAAT
CAAAACGGTC
Gr'rCTGCCAA
GTGTTACAAA
3060 3120 3180 3240 3300 3360 'rTCATTATCA GCTGTAATTC CAAAGAAGTT AGCTGATCCG ACTTCTTCAA TCTTTGTATG 1110 AGTTGATGGG TCCAGATAGA TAACATCTGA GAAATGACCT GACTTGGCCA TAAGAGACTI' GCAGCATAGT TTCCACCAAC CTTAGCCGCA CCTGTACCAT ACGGTCGTAC TCATCCTGAA TCAAGAAGTr GGTTGGGACC AAACCACCT TCCAACTGGC ATAGCAAAGA TGGTGAAAAT GTACTCTTCT GCCG3-TI-A ATCTCCGACA CCAATCAAAA GAGGGCGAAG ATATA.AGGTT CCACCTGTTC TTrTrCCTGG
TTGGTGCTGC
TAAAGTAATT
CCCCGATAAT
CGTATGGTGG
TGTCTGTCGG
TTTCATCAGG
CAAATGCT'rG TACGTAT'rCT TCATTCGCAC GGACAACTGC TTACAAGCT AACTTGTGGC ATCAAQAGAC GGTCACATGT ACGTTGCAGA ACGGAACAGT TGAACACTGC CATCCTTAGT ACGATAAGCT TTGTCCATAG TGAAGACTTG GAGAAGAC'rC TGAAATATGC
TCTACAAACA
T'rCAAACCTT 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 3984 AAAGTTGCAT CCTCTGrAAG CTCTCCT'rGA TCCCATTGTC CATTGAA A'rGAGCAAGA TAGCGATAAG GTAATTrCAT ATAGGAAAAA CCGAGG'Tr' CCGG INFORMATION FOR SEQ ID NO: 177: SEQUENCE CHARACTERISTICS: IA) LENGTH: 8703 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 177: TATCTAATTA TTGGTrrTTTA TCGCTGACCT TGGCTATTGT TGGGGTTGTT TGCCTACAAC ACCTTTCCTP TTGTTGTCTA TTGCTTGTTT CTCCAGAAGT TCGAAGATTG GCTTTATCAT ACCAAGCTCT ATCAAGCATA TGTAGCTGAT CCAAGTCTAT TGCGCGTGAA CGAAAGAA-AA AAATCATCGT CTCTATCTAC GAATTTCTAT TTATTT~TGCA. CCTCTTTTAC CAGTCAAAAT CGGTCTGGGT TTACCCTTGT TCCAAGCGAT 120 TTTCGTGAGA 180 GTCTTGATGG 240 GCTTTGACCA 300 TCT'rTAPTAC TTATTATCTC TTCAAGGTCA TATTTGCCTT GATAAAATTG AAAGCATATT AATATTCAAG GAGAATCAAA TGATTTACGA AAAAGCGATG CAGGCTCGAG CTCGTCGGAT GACAACACCC AGCTATGGAG TGACTAAGGC AACCATCATG ACCATGATTC GGCCACGTGG TGCTATCATG CTAGAAGACA TTCGTTTGAC TGGAGCTTTA ACTGCTGATA AAAAGT TGGA TTCCAGACAA AGAATAGTTA CATAACAATA TGATATkATA A'N'TTGTGCT GAA.AATGTGA TGAACTCTGT GATAATCTAG AAACAGTAG r
AAATTGAAGT
CTTTACTTGA
CAGTTGGTG
AGCGGTTGAA CTGGCAGCTA ACTACGATAC TGGTGACTTT GTCTATAATG ACCTAGAAAT TGCTCAGGCT GGAAGTCAAG GGGTTGTATT TAAGCCTAAT CTGGAAAAGT TAATTGCTGC 1111 ATCAAAAGGA ATGGAAATTG AGCGGAAGCT ATTGACTGC TGTGTCTGGC GACTCCTTAG TAAAGGTAAA ATTGAAATTC TATCGACCAG GTGGGGGTAA AAGGAACTGC TACCTTTGGG AATTTAATCA AGAAAAGGCT ATAGTTGCCA GATTTGCAA TGAACTTITCA AGTCTTTGAC GCATGAGGGG AAG'TTGTC TTCGGAAGGC T'TGTTTTGAT TCTTTrCACAT GGCCTTTCAT TCAGTCAAGC CGCTGTCACT AAAAACGTTT TGrCACTAT TACCAGGTGG GGCGATTGAC CACAATI-GCA TCGTACTAAG TAGCAGT~rr CACrTATGTT CATCATTATG GTTTTATAGA GGTGACTTTG TCATGACTGT GAACTAAGTG ATGAAGArCA CGTATCCTAA CTCGTGCTGG CACAGAArr TGGAGTACGC CTTGAAAACC GTCAAACCTT GT'rGTTTT= AAAAAATAGA TGAAA7'rrTT AAATCCTATC AAATAGCGAA GTCTCGACAT CTCCATCACT GCTGATAATG TCCTCACGTT TATATGGAAA TCTGG3AGATTr CTTTACCAGA TCAGACTAAG CGTATCATGA GTGGGAAAAA TCGCCTGATA CGTCr'rGATC AAAATCTCTT
CAAGAGACTG
GGAAATGTCC
GTGCAAGATT
GTGACCTCI'A
GTGAGGCTTG
TTATCTGTCA
TCGAGTATCT
AGTGGTATGC
CTCAAGTTCA GGAAAACTAT GGAAACCAGT CAGCTGTATT GCGCCATGAA GGCAATCAAA 4 GGAA'rAAGCT GGAAAAGGGC AGAGAAGGAC AAGTGGAACC AGTCAACCTC AAGCATGACC AAGTAGCTAA TTTGC7"rTCA CAAAAGGGGA TTTATCCAGC CTTCCATATG AGCAAGCGCT ACTGGATTAG TCTGTCCCI-r GATGATACTT TATCAGATGA AGAAGTACTG GAATTGATAG AAAAAAGTTG GAACTTAACC TCTAAAAAAT GAAATA'rTTT AATAATTTTC ATGAACTrrC AATTAGCTAA ATATTCTTTA CTGAAGAGAT TTTrAGAAAA TATAGGATTT ACCACACTAG AGGAATATGG TGCCATCTTC AAATACCTGA TTGAGAATGT CAAGACGGAT CGTCAGATCA TCTATTCGCC TCACTGTCAT GATGACC'rCG GAATGGCAGT GGCAAATAGC CTTGCTGCTG TCAAGAATGG TGCAGGACGT GTTGAAGGGA CTATCAATGG TATTAGGGAG CGAGCTIGAAA ATGCTGCTTT GGAAGAAATT GCAGTGGCTC TCAATATTCG CCAAGATTAC TACCAAGTAG AAACCAGTAT TGTCCTAAAT GAGACCATCA ATACGTCAGA AATGGTTTCT CGCTTCTCTG GTATTCCAGT TCCI'AAAAAC AAAGCCGTCG TTGGTGGCALA TACCT'rCTCC CACGAATCTG GTATTCACCA AGATGGA=T CTTAAAAATC CTCTCACTTA TGAGATCATC ACACCTGAAT TGGTTGGTGT TAAGATTCTG CTTGGAAAAT TATCTiGGTCG CCATGCTTTT GTTGAGAAAC TGAGAGAATT GGCCCTAGAT TTTACAGAAG AGGATATCAA ACCACTCT'N' GCTAACI-rCA AGGCACTGGT CGATAAGAAG CAAGAAATCA CAGATGCAGA TATTCGAGCT TTGCTAGCTG GAACCATGGT TGAAAATCCA GAAGGCTTCC ACTTTGATGA TTTACAACTT CAAACTCATG 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 CAGATAATGA CATTGAAGCC 'rrAATGCGAC
ACCAATCTGT
AGGATCGGGT
AGGGCAAGGT
TCGTTTGGTG
?1TTGGTCACT 1112 CTCGTTAGCC TAGCCAATAT C4GATGGTGAG AAAG'TCGAA'r TCCGT'rGAAG CAATCTTTAA TGCTATCGAT AAGTTCTTTA TCCTACACTA TCGATGCGGT AACAGATGGA ATCCATACCC G7TrGAAAACA GAGiATACAGA AACCA'rCTTT AATGCAGCAG AAGGCTTCTG CTATTGTCTA TATAAACGCT AATACCTTTG GGCTTGAT'rT TCATGTGrrG TrCAAAAACA GAA'rGCAGGT GAGATGGGAC GCACTG C TGTAAACGAG AAGGCTA'rGG CAAAGAAAA'r ACTAGCTCTA AGAAATCATG GAGG7TGGTr TAGAAGTTCT C'rATGAGATT GACAGACGAC CGT'rCGGAGG *9 *4 ACCTGATGAA ACCCTTAAGG 'rAGTCCTCAG TATGATGGAG GGAACTCAA'r CTTTACGCTA GTCACCACTC AAACTGGAAC AGGCGGGATT TACTTTCGAT CTATAGCTAT GAGGAAGTGG CAGAAAAATC GTTACTAGTA GAAAGTAGCT GAGGAAGTCG AGAC'TCAGCT GCTATGCTTA GAATCTT1rM GGAGATATTr TATGCCATCA GCCAGTCATT CAAGTrAGG4GA
CAGTGGTTCG
ATATTCGTCC
GAATTGCTGG
ATCATATTCT
GGAGGCTC'rA
TGCAGATATT
AGCAGATGCT
CCCTGAACAA
TGTAAAAATC
TGTAGACTTT
TGAAGAGCGC
TACCACGAT ATGCCTAGTG GCAGGAGACC GAATTGGCCC GCTGAAAAAA CAGGTTTTGA GATGCACCAT GACCTCCCTT ATCC'rACTAG TAGCTATCGG GGCCTGATG(3 CTCTCCGTAA
TTTGACAGTC
GTCGTCGTGC
AATGCGCGTG
AGCGGATTAT TCGCAAAGCC TTTGAAATTC TCGATAAGCA AAATGTTCTA GCGACCTCAA CACAGGATTT cCCAGA'rGTA ACCTTGGAAC TGATTACCAA TCCTGCTAAG TT'rGATGTTA TATCTGATGA ATCAAGCGTC 'rrATCTGGTA CTGAAAATGG ACCAAGTCTC TATGAACCTA 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 TCAAGCAT-rr 3240 GTGAATTGAC 3300 ATATCAACGA 3360 CAAGAAATCG 3420 AACTCTGGCG 3480 ATCAGCTGGT 3540 'rTGTAACGGA 3600 CACTTGGGGT 3660 TTCACGGTTC 3720 TATCAGTTTC 3780 AACGTGCTGT 3840 AGCACCTGAT A'rrGCAGGTC AAGGAATTGC CAATCCTATT TCCATGATTr CATGATGTTG AGAGATAGTT TCGGACGTTA 'rGAGCATGCA GAGCGTATCA *t TGAGACAAGT CTGGCGGCAG GAA'rrrTAAC GAGAGATATA GGAGGTCAGG CTTCAACAAA GGAAATGACG GAAGCTATTA TTGCAAGGTr ATGAAGTTAG ACGAAAAAAT TACTCTAGTC CrTGATTT GGAATGTCAT CATTTTCTTG ATTTATGGTA TTGACAAATC TAAGGCAAGG ACAAGAGTTr GGCGCATCCC TGAGAAAATC TTACTTATTT TAGCCTTYAC TTTTG.GTGCr TT'rGGTGCCT GGCTAGCAGG AATCATCTT CACCACAAGA CTCGAAAATG GTACTTTAAA ATAG7?rTGGT TTCTTGGGAT CZTGACCACA CTAGTAGCCT TATA7=?AT TTGCAGGTAA TGGATGGCAG GGTCTTCGAG GGAATACGCT GCTTGGGCTC TAGCGGACTA TGG=NAAG GTCG;TGATTG CAGGATCTT CGGTGACA'rT CATTACAATA ATGAACTCAA TAATGGCATG 3900 3960 4020 4080 4140 4200 4260 4320 1113 TTCGCCAATCG rrCAGCCTAG AGAGGrrAGA GAGAAACTAG CAGGTAACTr, TGCGACTTGGA ACAACAAAAA ATCATCTCAC GAGATAGATA CCGAGTGGAA ACATAAACTC CrAAATAGTT TTGcAGTATG AAGAGTTGAT TGCTGcrrAT GAAAAACAAC
TAGAAAAAAT
GCTTAACA'rC CAT'rATAAAA
GCGGATGTAT
GAC -rGAGCG
GCGGTAGCAT
ACTAAAAACA
CCT-rTGAAAA AGAAAAGGAG ATATAGTAAA cATTrccAG~c AATTTTrrAG rrATaAcAAA ACACATTCAC CGTTTGCCCT ACrAAAGTTG TAAGTTCGCA GCCAAAGAAC GGATCAACTT 'rGCGCTrrAG rrGGGATGAA GATATTCTTC ACTCAAACCA TACGGGGATG
CTGAAATAAG
AAACTACAGT
AAGAAGGCTA
crACATTAT
GTAAGCGTAA
CGCAACTCAA
TI'GGTTGTAT
CCCArCTAAA ACCAACCGAC CAGTGAAGA ATTCACCTTC TGGATGATAT CGGTATTACC GACCAGCCTA CTGGCAGGAT A'rGTIAAACAA ATGAATTGGA GGACTATTCT GGATTCAACA CGACA?'N'A AAAGGTGAGG CATGACCAGT GACAAGGCAG CAAACCAGGT GTTGTCTCT CCCAGAAATT GAAGCATTCT CCTTCCTTGG AAACCAGAAG
GCCGTGAAGA
GTACTAGCTG TTTT'GTTATC GGGAAGAAGG 'rAAAATGGTC CTATGAGCAA GGTGTAATGG AGGAGGAGCA CGTTCAACTT TAAAATCATG ATGCACCTGT TTATTAGGAT AGAGAAGAAG TTGTTTAGAA TT1'CTTTTCT TAATAAGTAA TGCCAATA'PT GAAAAGTTAG TTTAGAAGAT ATAAGGCAAC TGTTGGTGAA GGCCTTTGGA TAGGAT"TTTA AAAATAAGTC TTTATGCTAC TTICTCGGTG CTCAGCTTCA AAGATCAAGC TAATAATGAG AAGTTTIGGAA AAGCAGGTGA TACGCCTCAT CTGCTTCAAT TGTCTATGGT CGATAAGGAC CACCAGCTCC AGTTGTGATC CAGATACTT TAACTCATGG TCTAGTGTTA TGAGATATTA T'rMCTATAG GATATGGTAT TAAACATCAT TAGTAAAAGG TTTTATAAAT GGTATAGTCT AAGTTTAATG ATAAATTAAA ACTTA'rTACT
ACAATTGGCT
GACAAAACGA
GGCAAACTCA
CGTAAAGGGC
GACTACCGTC
GATGTACGTG
GCCAAGCTT
TTGAAACTCG
TCCCAGAACA
TTGACATI'GA
AGGTTGAGTA
4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 AAGCTCCTAA CACTGGGC7r TCTATGTAGA AAATATATGT AG'rTAGATTG ATGAATAAAA AAATA.AAGAA GAGTTATTAA AGAAGACTTT CTCCAGGAAT ACAATGTCAA. TCGATGAATA TGTAATAGGA AAGGGACAGC GCTC1'TGAGA AGGGAAAATA CAAAAATCTA TrrCTTGGAA AAAIrTGGTA TTATGGAA TAAAkAAAACA AACAAATATA ATTTCAGAGT TGGATCAGCG A=''CAAAA TTAAAATCAG ATTTGTATGA AATTATCAAA GAAGGTATC GlrwrAACTT TGAAAAGATC AACAAATGAA 'N'rATTGGTC GTTCTGCTAT TCTATACTGA GGGAGATCCT TTCTTTGGTG TAAATATrAA TGAAAATCCT ATrMATA GGTGACAAAA TI'ACTTTGTA TAGTCAGAAA GAATTTTGGA ACCACTTT1GT TTCTCAGACA AACTGGTGTC CAAAACTTAT AGTA?1'CTAA GCTrTTTTATG ATTTTCGTCA TCAATTAACT CTCCTGGCAC GGGAAAAACT ACGAAGATCA AATCGGATT AAGG'rTAAG ACCACTATCA TTTTTAAAGA TTTTTCrTCAG TTGATGAGGC TTGGGAT'rCT TAACAAAAAC ATC= ACTTA GTGCI'GTTCC AGGATCCTCA ATTATAATAA GCAAGAATAC AGAGATTrTGG TTTGAAAGAC TCTTCATCAT CGATCGAC TCTCTATCGA CCCCGGCTAT TACACGAAAC TGATGAAAAG 1114 AATCAAGGTG GACCTTATCT GCAAAATCAT AAAATAATTG CCTGAG'r=G AGCCATCGAA A'rTAGGAACT ATGCr=rG GAAAATAAGG AAGACAATAG TACAATGGAT TCATCAAACA CAATCTCTAT TAAAGTCTCC AAACCTCATC CTCCCCCGTG TATCT~rGCTA AAGAAAT'rGC 'rAAAGAATTA ACGGATGGCA CTACAATTTC ACCCATCATA TGATTATACG GATr'rTGTAG AATGGGGA'rG GAGCTArTrGA GTTTAGGCTA CAGGACGCTA AAAGCAAAAG AAACCCAATT GATTGGAGGA CAAGATAA'T TACTTAGAAT ATATAAATGT TGCTGAAGAA AAAGAATATA TCTGTTAATA GTAGACKAAA TTTGTCAGTA AAT'rATGATA CTACCTAGCA AATATGT"A CGAGTTGTAT AAAGATAAAA
TACAAAAGTG
TATGTTTCCC
AATCGTGGGG
cCTGG'rGAAA
TTCTATATCC
ATGATATTGA TCGTTCAGTG GATACCTTTG TTGAAGTTAC TGTCGAGGGT CAAGCTGGCA AAGAAGCAAA AATTCGTCTA. AGAAACTTGA ACAGTCATTA TCATATrGGA CCAAGTTATT ATGAATTACT CTGGTCTGAT TATATTAAGC ATGATGAGGT TGAAACTrrG GAaACTTTGA AAAAAGATCA GGCAGTAGCT GATGACAATG GATAATCAAC ACAAGATTAT TAAAGAAAAA CCTCTTTTAG ACAGAACCTT GGAAAGTCTA AATGATTwGA CTCATACTCC TG.Arr'GGAT CAGAAAATICA AGACAGGGAA CGTGATTGGT ATTTCCTCAC GATTTTCTGA TGAGAGTAAT GTTCT'rCATA TCAATCTCAC 'rAGTrTAGAT CAACTTT'TGG TGTATCTCTT TCCCAAGTAT
GTGGAAAAAC
CAACAGAAAT
AGA'TCTAA
AAGGAAGTGT
CCGAAAATGT
AT=TGCTAT
TGTTGGATAA
ACGCTGCTAT
TTCTTAAGIT
CTCTCCTAGA
AAAAAGCAT
AAGGCGATGA
rr'rGTTGAAG
TCCCAAGATG
AAGGACCAAA
TrTTCTGGAT
GACCACTTTT
GTTGCTTTGT
CTACAAGCTG
TGTCCTAGAA ACArrGAGAA TGA'rACTGAT A.AGAATTTTG GATT=rGGC GAACTCTTTT TTCTACCCAA TATGCAAATC TTACATCATC GGAACTATGA GCGTCGTCGT 'N'TCCTTTTG AGAGTTGAAT ATCCATGCAG CCAAAATATT CAGGAATTAA GAAGGATGTA GAT7rTGACT AGACTACTTG CGAGGTTCTT TCATCTGACA AATAATGAGC AAACGATGAT GCGGATTACT AATATCCTAA ACTAAGCAAT AACCTATrT CATTrTTTCCA AGATTTTNGA AACAGTCAAT ATGGTCAGGA AAGATTAACG 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 TGCA'IrATCT
CTCGTGAAGA
CTATTCGAAA
CTTAAACAAG
GAGGCTTTAT
AGGTCTTTAT
AAGGAATATC ATCGAI'TTC TCATAACGAC AGrCATGTTA AGGGAGTGAT TGATGTAAGA 7920 AACCATCTCA AGAAAAATCT TCC=-r*ACG GGAAATATTG CCTACGCAAC GAGAGAGTPTC 7980 ACCTATGATA ATCCCCTCAT GCAGTTGGTC CGTCACACTA T'rGAATACAT 'rAAGAATCAG 8040 AAAAGCATG GTCAAGGGGT ACTAGATAAT CTCTCAACTA GTCGTGAAAA CGTATCTGAA 8100 ATCGTGCGTG TAACGCCCTC 'rTATA&ACTA GCTGATCGTG C1'AAGATTAT TCGGGGAAA'P 8160 CAATCTAAAC CTATACGTCA TGCATACTTr CACGAGTACA GAAACTTACA AGAACPTGT 8220 CTGATGATCC TAAACCAAGA AAAGCACGGT TTAGGGTATC AAGATCAAAA AATCTATGGT 8280 ATTCTCTTTG ATGTTGCCTG GCTTTGGGAA GAGTATGTr ACACCrTGT'r CCCAAAAGGT 8340 TTTGTACATC CCAGAAATAA GGATAAGACG GATGGAATTT CAGTATTTTC TGTTGGGAAA 8400 CGAAAAGTAT ATCCAGATNT TTATGACAGA GAACGAAAGA TTGTTCTAGA 'rGCAAAATAT 8460 .AAA.AAACTGG AA'rTGACTGA AAAAGGAATC AACCGTGAGG ACTrKPTCCA GCTGA-rrCC 8520 *TATTC'rTATA ?TTTTAAAAGC TGAGAAGGCT GGACTGATTT TTCC'rAGTAT GGAGCAGTCA 8580 GTAA.ATAGTG AAATAGGAAA AGTAGCTGGC TATGCAGCTC AATTGAAGAA GTGGTCTATT 8640 *CGAATCCCTC AGAATGCCTC ATrCTATAGT ACATTTT'GTA AAATGATGGA AAA'rTCAGAA 8700 *GAG 8703 INFORMATION FOR SEQ I0 NO: 178: SEQUENCE CHARACTERISTICS: LENGTH: 4854 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 178: CATCACCAGT TTTAGATGGC TT-rAACAGTG AAATTATTGC TTTTAATCTT TCTTGTTCGC *CTAATTTAGA-ACAAGTACAA ACAATGTTGG AACAGGCATT.CAAAGAGAAG CACTACCAGA 120 ATACGATTCT CCATAGTGAC CAAGCCTGGC AATATCAACA CGATTCTTAT CATCGGTTCC 180 TAGAGAGTAA GGGAATTCAA GCATCCATGT CACGTAAGGG.CAACAGCCAA GACAACGGTA 240 GGATGGAATC TTTCTTTrGGC ATTTTAAAAT CCGAAATGTT TTATGGCTAT GAGAAAACAT 300 TTAAATCACT TAACCAATTG GAACAAGCCA ?1'ATAGACTA TATTGATTAT TACAACAATA 360 AGAAAATTAA GATAAAACTA AAAGGACTTA GTCCTGTGCA GTACAGAACT AAATCCTTTG 420 GATAAATTAT TTGTCTAACT G=rGGGGGC AGTACACAAG AAAGCGCTTT AAAACCAGTA 480 GACCTTrCA TAAGGTTCGC TTGATGTACC AACTGGGATC TTGTrGGTCT CCAATAGGAG 1116
AAGATGAGGC
TAGGTCCACA
GAGAArI-rCG CTATTGr'rAT GCAGCTGTTG ATGCCCATAC TAGCTGGTGG ATGTAATACT GAGTGGATGA ACCCE-rTT ATCCAGATGA TTATC7=TA CTCGTTATGG ACAATGCTAT TAA.AGATTCC GACTAATA'PT GN'ACCT 'rTATTCCTCC CATTrGAACAA GTGTGGAAAG AGATrCGTAA ACGTGGATT TTTGGAAGAT GTCATGAATC AACTCCAAGA TGTCATACAA AAAGTCCATC G'rrAATCCGA GATGGACTAG AATGCwrrrr TTGAATTGCT TATAAAAAAG CTCCATACAC TGGATGTGTA
TGGTTTCGGT
TG.TCCATACT
AGGCGAATCA
AGAAGAGCTT
ATGGCATAAA
AGAATCAGTA
CACTATATAC
TwrTTCTAA
TCACAAGCTT
TCAAGTACCT
ATACACACCA GAGATGAACC AAGAATAAAG CCTTTCGAAC CGA'N'GGAGA ACGAGGTGAT GAAAACAGAT GAGTATAAAA TAGAGCAATG GGGCTTTArr 0 TGATATAGAG ?TC7TGT=T ?TAGGACAA TTTCTCGGAT ACTTGCAAAC TTTTAAGTT TTTTGATI-rC TTCTGGATGA GTGACGAGAG TGATAACATA ACCTTCCTTG CCCATACGAC CAGTACGGCC AGCACGGTGT GTGTAGGTTT CGCTATCTCT ACGAATATCA AAGTTTACCA CACArCTAG GCTATCGATA TCAATTCCAC GAGCCAAAAG GTCAGTTCCA AGAAGCAGGG TTAG'I-GGTT ATCTT'rAAAC TTTCTAAGA TGA7TTVrCT AAATTTAACA 'N'AACATCAC TAGCGAGGGA AACAGCCAAT ATATCACCAT ACTGTAGTTT TTCCTCGGCA ITrCCCAAGGT CTGACAGGCT ATI'GAAGAAG ACTAGACCAC GGAAATCCTC TACATGAGCC AGTTTTCGTA GCATATCCAC TCGATGACGT TGGTCTACCT GCATGTAGAA ATGCTGGATA TTGTCCAATT 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 TTTGATCAGA GAGATCAATA GTGCGTGTAT TCCGCACAAT CI-rT~GG TCGTGGCACT CATGTAGACC AcrITGGTGGT CACGAGGTGC GTAGTGAGTG CAAAGTGAAT CTGAGAATCA TCTAGTAAT'r GGTCAAAr'rC ATCCAGGATG CATTCATCAT CTTGATTTTT TrAAGTTTAA TGAGTTCAAA GATACGGCCA
TCAAACTTGG
ATT'rTTTCTA
ATGGTI'TCCA
GGAGTTCCAA
TCAGAAT'rTC
GGAAGAGTTG
GTCCAGCAAG
GTCTGAGAAG
CTAGGAGGTT
CTTCGGAAACC
T'rCTGCCTCA TGGCCCCrTr TTAAGACGT AGCAGTCAAT CCGATAGCTT
TT'CCGTAT'T
AC?1'GGTAGG
TTCTCCAGCA
GAGTTGGTCA
AATCTAATGC
GGTGCTAGAA
AGATACGCTA
AGAAGGGGCT
CTCAGTCl-r
CAGCAGTCTG
CAATTTGTCG
CTGCCCACGT
TCAAGAGTTG
GGGTCTTACC
CAAATAGTTG
GCCATTCACT
GCGCATGGTA
ATTCAGATCT
TTTCTGACTT GAACCTGAAA TTTACATACA TCAAAAATCT TTGGGCTTTT TTCTTTTGTA AGTTCCGGTT TGGCTCAC1TC AGTTTGAATG GGGGTGAATT
CGGTAGTTTC
TATAGTAGCT
GTTTTCATTT
CATGAACAGA
GCCTGCATCA TACAGCCAAG TTTGGTAGAG GGTTGCTGGA 'rCATGTGTGC 1117 AAATGCAGCG ACTTCCTCAG TCATCGTATG AGGAGCCTGT TGGATAGGAA GCTGGACTTG ATTTCCTTGG TGGTCGCAA AAATAGCTGA GCGAATATGC GGTrCCATCT GTTGTATAAA 'rCTCGCAAGG AAGATTGGAA GATGTGAACT TGATAGTCTG GGTAGAAGAG GATACCATCT GTCAACCTGT TGAGCATGGT AAGTCGCGTC ATTGGCTT ATAGAGGGGA TAAATCCCCA AATCCATGAG GGCTCCACCA TCAATCGTGT TGAGAGTCAA GTGATG7*'T? TTCCAGCCrr CCAT'ITAGGT CAATGCTrr CCAAAAAGAC GAACACCAGC GCAAAACGGI' CTGAAAAGAC A=~GGTGT? TGTCCAGCCA ACAAGTCAGG ATCTGCTCCT AACACTTGCT TATCTGCTAA GTGGTAATTA CGAGCTGCT'r CAAAGATAAA ATCAAACCAT TC'I-GTGGTT GAGAGACAGC CATCTTGGA-A GAGTAT'TTCG CATAG'rTGAA AAAG7TTG ATAGTAGTAA AGGCTTTCTC ACAGTTATTT TTTTCAGCTG TT'TGAATCAA TGGCTTTTCG AGAATAACAT GT'N'ACCAGC AGACAACGCA GCT'rTTGCCT GAGCAAAATG, TAAGGAGTTT GGACTGGCGA TATAGACTAA ATCAAAAGAA GATTTGAAGA AGACTTCTAA TTGATCGAAT AGTTGGATAT TCTGATAGCG AGAAGCAAAG GT'rGCTGCAG TTTCTrAGTTT TCTAGAATAG ATTGCGACCA GTTGGTATTC TCCACTGGTA TGGGCTGCTT1 CTATGAAA'rG ATGGCTGATA GCGCCAGTTC CGATGACACC 'rAATTAGC ATAAATACTC C="rCCGAT AGACGGGACT ATCCAACAGA GAGGAGAAAA AATAAATAGA TAGAAGCA'rA GAATCTAGCA AGGAGGAAAA GGAGGATTCT CAGACATCTA TATCCGCGAT ATGCTrGGACT TGCCAGCAAA TCACG'rCT'rc CCTTCCATGC CCTACTCAC GCCCGAGTTA GATAATGAA TCCAAGTTAT TTTCATCAAT CGCCTATGGC CTTATCTGTG TCCCCAACGT GAAGGTGATA CCCACCAGAG
TTTAATCCT
TTTCAAAT;A
AACCTAGA'TT
GGTATCAGCC
AAATCTGACG
GTAAGATTTC
CATCGAAATT
CAGTCAGG'TT
CTACAAACAA
TCTTTCA'rTA TAACATAGAT GCTATTAGCT TTCTTTTCCG TAAAAA'rGTG CTATAATAGA CAACTAATGA TTTGTCAATT ATTTTGGAGG GAAGTAACAT TATACTAGTA TAGACGTCT CAGG'rTCATC ATCAGAAT AATCAAAACC TAGAAAAAAT ATCGCACTAG TATACGCTAT TTTCATAGTT TTATAGTAAA 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 CGCAATTGTC GATAGTAA'rT ACTTCTCAGA TGACCTAGCT
ATGAAATGAG
GAAGTATAGG
ATAATCTCTA
TTGTGGATTG
GAATAAAAGT
AACAGGACAA ATCCATCAGG ACAGTCAAAT CGATTTCTAA CAATGTTTTA TCTACTATTC TAGCTTCAAT CTACTACAAA TTCCATAGAT AGAAAACI'AC CAGATACGGA 'rGTTGGAGTT GATGTAAGAT GCTTTGGCTT GCTAGAGGAA CCAAATTGTA TCATTGAAAT TATTGCTCAA A7"rTG'TATG ATATAAATAT AGACTAGGAC GTGGCAGACA CGCGAAAACG AGACATGTAT TATTGGCTTT GATrGGTT TTAGCAATTT CCAGCAAAAA AGTrTTTGAGC GAGTGAGGGA AATCAGAAGG TCCTCTCCAA GGGGAGAAAG 1118 CTA rGCCT ATrAGGCGGA TTTATTGCTT TTAAGATCTA AAAAGATTGA ATCGCTCAAA AAAGAGAAAG ATGATCAATT AGCATTTTCG TCAGGGGCAA GCCGAAGTGA 1'rGCCTAT-rA TGATTTCCIC TGTTAGGGAG CTGATAAATC AAGATGTTAA
GGACAAGCTA
TTAAAGGGA
GATTGAAGAG
GAAAGTAAGG
GTCGTTAATC
ACTGAAAAGA
ACAATCTTGT T TNCTACTAT ACAGAGCAAG GTAATGTGAC CAAACAAATC TATGATTTAG CCAGTCTAGG AAAGGTCAC TTAACAGAAG
AAGAGTCAGG
TTGCTFrAA
ATGGGCAACC
TTFACACTT GACCAACTGT TTTCAGATGC TAGTAAGGCT AAGGAACAGC TGATAAAAGA.
GTTGACCTCC 'rTCATAGAGG ATAAAAAAAT AGAGCAAGAC CAGAGTGACC AGATTGTAAA AAACTTCTCT GACCAAGACT TGTCTGCATG GAATTITGAT TACAAGGATA GTCAGATTAT CCTTTATCCA AGTCCTGTGG T'rGAAAATTT AGAAGAGATA GCCTTGCCAG TATCTGCTr C~rrrGATGTr A'rCCAATCTr CGTACTTACT CGAAAAAGAT GCGGCCTTGT CT?1'GATAAG AAACATCAAA AAGTTGTCGC TCTAACCTTT GATGATGGTC AACGACCCCG CAGGTATTAG AGACCCTAGC TAAATATGAT ATTACAACCG INFORMATION FOR SEQ ID NO: 179: SEQUENCE CHARACTERISTICS: LENGTH: 2186 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
ACCAATCA
CAAATCCAGC
GGGT
4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4854 120 180 240 300 360 420 480 540 600 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 179: TAAACAGGTG TTAGGTGCTC TAAACTATTA A.AATTCTAAG GAAATAAGGC TACTTTTTCT GGGTCTTGTT CATAGTAGGT GTGGTTCTrT T*TTTCGAGTG TAGCCCATAG CTTTGAGCGC ATAGTGGATG GTAGT TGGAT GACAGCCAAA TTCAGAAGCT ATTTCAGTCA A.ATAAGCAriC TCGATTGTCA GTAAGATAGT TTTTAAGTCT ATCTCTATCA ACTTTTCTTG GTTTTGTTCC TTI'ACTTGG TGGTTTAGCT CTCCTGTTTT CTCTTTTAGC TTTAACCAGC CATAAATGGT AT'TACGTGAG ATTTGGAAAA CGTGTGATGC 'N'CTGTTATA CTACCTGTTC GCTCACAATA
AGAGAGAACT
GTACTATI'T
AATATTATT
CTATTATGAA
TTT'TTACGAA
TGGTTCATTT
CTAAATATT
CTATATCAAA
AATCTATPTGA ATATGCCATA TACTATATT CTAAACACTT GAAAATAACA TCTATTN'GTA AGACCACATI' ATTTAGATTT AGAAGATTAT ACCACATTGT AGAAATAATA AAACAAATTA TTATACTATC TTTGAGGTAA TTTAAGAAAA CATCGTGACC GACCAATCAT CAAACTTGTG TCTATAAAGA CGACTTACTA TCGAAGATTT GAGTTACTAT ATCAAT'rACT TAGCAAGAAA 1119 AGTGGAGTTA GACGAGCTGC GCAACTGCCG TAGACCAGGA GATCTGCGAC A'rTrTCAAAC ACATACTATA TCI'TTTTAGA
AATTTGAACT
GATCTAACGC
AGATAGAGGT
ATCTGAATAC
AAACATCATC
TAAATCrGTG CTTTN'TCAAC TCAGATrATA TTCATCAATT ATrATTCGCT TATATAAAAG TGAAATTCAA TA4TTGAAA TGTAGACCTC TA~rG.ACTG CTrGACTGGT CGGTATGTTG AGGTCAATCT CTCACAGAGA.
TGCTrrCCCT TACTTATTGC
GGTACCAGAT
CTACTTTATG
TCT'TCCTTrG
AACAGAAATT
TTACGATGAA
AGTCTATTCA
AGTAGCCAAT
TCATTTGAAG
TTTAACAATT
AAAATTGACT
TCT'TAGCAAA
TAGCAACAAA
AATATCTATC
ATCTCTTTAG
TAAATGATAT TGTCACTAGA TTGGGAAAAC GAACCCTTCT CAGTAGTACA GGTAGCTTAA TCAGCCAAAA TGTTrCAATA TCCCATAATA ATAGTTTACT rTTTTATTCC GT'rCCACGTT GTTTAGAAA.A ATATTATCCC GTTGATTTAG ATCTCAGAGG AATATATAAC TCCATACTGT CAAATCCrAC TATTATTGAG CGCATTGTCC TATCAACAAA TAAGATTCGC AATACCCTAG CTrTGGAAAA TTA'rTTGACA AC'NTrGACAG TTGATGTAAA AGGTAGAGCA TTATTGCAAC GTTTACGACA TCTCTTATTA CCAGACCAGA TGGTATA'PTT GGAATTGAGA CGTAGATATT ATGAGGTTGA TTTTGTTGTT GTAACTGATC AAACAACACT TGCTCCAGAA ACACTAGAAA ATCAATTCCC TAAATATCTA TTAACAATGG GAATCGAGAA GAAAAGCAT'r A'rAGATTGGT 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2186 AAGAAGACAT 'rAGGCATATC CACAACTATA TGT'rGGTAAT TTGGCCACTA CGCTTATTAT GAGAACTTAG ACCACTAGAA ATACGATTCA GCCAACAGCC
TTGGAAAATA
TTAGATAAGT
CAGGTCAGTG
GCCATTAAAG
AATTACAATG
TACTAGAAAA ATAGATAAAT ATAAATCATA CAGCTAATTA CAATGATTCT ACCCAAATCC TAACAAGATA TAGTGAATTT GACACTTGAA AATAGAAATT GGGGATGAAA AAAATTAATA TCA'rAGTCTT ATTAGAGAAT ATTGTAACTG AATTATAATG AAAAAGAGAC GCATAGTATC AGGTATTGAA CAACCTTGAT TCATTrTCTC CTGAAATAGA GCTTTTGCTA TTCAACT'rCT TACCTCTTGG GAAAAA
GGGGATCTAT
AGCATCACCC
TGAGCAATCA
GATTTGCAAC AGTCTGTTAT CGAATACGCT ATATAATACG AATTTCTGGA AGTACTATCA ACT'rTCTCAA ATA.AGATTAA GTClTTAAAA TCAGAAAAGC AATATGCGTT TTATTATGGA AATATTTGCT TCCTATTTTT CTCTATTTCT AATGATTTAC INFORMATION FOR SEQ ID NO: 180: Ci) SEQUENCE CHARACTERISTICS: CA) LENGTH: 3236 base pairs 1120 TYPE: nucleic acid STRANflEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 180: GTCACACGTT TGACTTCACG TATTTCATAA GTATAAACTr -!wrTT'rATC GGTI-AGATAA ATCNTCATGC CATTTTTAGC ATTATCTAAA GGAGAAAATA ACAT'rTTATT AGCATTATCA ACACCAAAGA TATGGTGACT AGCTAGACTA TAA'rTTCCTT CTCCCATTAC TTGCTCGCGT TTCATTGTAC CAGCTCCGTA GAAGAGATTA ACA'rTATCAA GTCCTr'rAAA AATCGGCAAA T'rCATT'rCCA ATTCAGGAAT TGCAATTCCC TGAGAAGTTA GAACAGCTrC CGAAGAGATA TCTGTATCCT GATTTTCTTC TAATTTTTCT GTATTCCAGA CTATGAAAAT ATTTCGAATT AATATCAGAA ATCCTGCTAG GATATTTGTC TTA'I-r'rTTT GAGACATTAT GCTTCACCTT TT'rCTGCCAC CGCAACCGTT GTGAAAGTCA CTTTAACTCC CATAGTTGC.A CGTCCTGTTT TGACACCTGT ATCAGTGATA ATCATCAAAT CCAGCAACCC ATTTN 7MCG GTAATTTTAG TTGTTGGGTA TTCAGTAGCG ACTGTACGCT CCTCATCTTG ATCAGTAATC AAGCTGGCAC CACCTTTCAC ACCAGTGGCG ATACGGCTCA CCAATAACTG GTAA'rrrTTG GCTTTGACAG AATCAAAGTC TTrGATACCT GGCTAACTTG TGAGTAT'rAA AAATCAAAGC AGCAGATTT'r TTCGCTTGTT
AGCATCCCAT
AAAATTGCCT
ATACTTATTG
CAG IGACAGT TrTCT'rr-rA CTGT'rTCGTT TTCTGTCCCA ACTTCTTCTT CTATCTGAGC ATCTTGATCC AGGCGCATTA GTGAAATATr
CCTCATCCCC
CTGTCTGCAT
TACCATATCC
CAACAACTGT
TACCACGAAC
GGCAAGATTG GTTCGAATCA T'rGAACAGTC ATAAGACCGG TCCCTTACCA CCACGACCTT TTTTTCTGTG ATAATAAGAA GTCTCCTTCA CGAAGGTTAA GGCTGATTGA TTAAAGCGAA TCCTTCTGCC AACAAGACAT ACCATT'FTGA CGAATATTGG 120 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 CTGCATAACC AAACTTGGTA CCAATGATAA TATCCATATC TGATTAACTC ATCTTCATCC TTTAAATTCA GCGCTTTGAG CAAACTCCTT AACACTGGTT CTCTTCACAA TACCGTGACG GGTTGTAAAG AAGAGATAAG CATCATCACT GCGATCAGAC TCAACATTGA TAACCGTCTG AATACTTTCG TCTTCA'rCCA -ATTTCAAGAG ATTGACTACT AACCT'rTAAG ACCATAGACA TAGTI'GACAC TAACTCACGA GACCCCCACG TTTTTGAGCA TAGAAAGGGT AATCAAGACA GGTAGCCCTT TGGCAGTCCG CGTCCCTTGT TTGTGAAGAA ACAAAGTCAT CATCTTTCAC GTGAACTCGT CCTGATCCAA TCCGATTCTT CAATCAAGTC ACCATACTCA GGAATTTCAT GAGCAGATGA TCATGGGTGC TCCCGTTCCT TGGACACCAC ACGCTTAATG TAGCCTCTGT CTCATCCTCG AGACTCAAGA 1121 CCrGTCCAAT CATCAACTCT ATTCGTCTTT GATAATTTGA CAA'rCAGAGC CAAGAGGTCA AACCACGAAG ACGCA'rATCA TCATCAACTC AGCTTGAGCT GTCGATATGG TCTAGCGCAA TTCCTTATCA AAACGTGTAC CAAAATCTGA CGAAGAGACA ACCAAAATTG GFrGCATT GGCGTCGCGC TTGACTTCAA GTACGGCGCT TATCAGAAAA TTT-ACGTTrA GAAACACCTT CAGGCTTAGC AAGAATATCT TCATACTCAG ATTGAATCTT ATCGCGTTCC AGGATAGCTT GACTTTGACG TTCAGAAAGC TCCGCATcCG tTCACTAGC ACGGATGATA
TCAAGAGACC
GACGAACAAC
AAAT'rTCGG CGGG'CATrTr
TAACAAATCG
TTCTAAGATA TGAGCGCGCG CACrCr= TG4GTGCTCGA TATACCATrr TGGATAGCGA GAAGAGGTTA TTGAGAATAA AACACCrTTCA CGGTTGCT ACTTCATCCA 1620 GCTAAATCCG 1680 AAACCTGTCA 1740 TTAAACr-rGC 1800 CGAATCAyTC 1860 CTTCCGCTTT 1920 TATAAGCATC 1980 GCATATTGAA 2040 CA'rTGGCTGA 2100 CA'rCACGTAC 2160 CATGCACCTT 2220 ACAATATGCT TGCTGTGATA CCCTCAATGC GTTT=CCTG GGTTTTATTG ACCATGTAAG GAAATTCTGT CGTTTCAATC TCTCTACGAG AACGTAGGAC ATGGATACCT GAT'rrCCCCA TGACAAGAGC TTrCCATCAAG TCCTTGGTAG TCACTTCAGG TACAACGATA CGCTCACGAC CAGTCTTACT AATCGAACCT TTACCTGTTT CATAAGCCTT ACCAGTTGGA AAATCTGCTC CAGGCAAGAC A'FrATCCATG ACCAACT'rCA CTGCATCAAT GGTTGCCATC CCAACCGCGA TACCAGT'rGC TGGCAAGACC AAGCGTTCCC GTTCATTGGC TTTGTTGATA TCACGAAGCA TTTCCAGAGC gGTTTCACCC TCCAT'rAACC
ATCATAGTTA
AACCAACA
AGA'rTATGAG GTGGAATATT AAAAGGTTTG GAAAACGCGC TCAACGAAAT CAACTGTATT ,ATCT'rGCTC ATACGTGCCT CGGTATAACG TTGAGCGGCA GCACTATCTC CATCCATGGA ACCAAAATTC CCATGACCAT CTACAAGCAT GTAACGGTAG CTCCACCATT GAGCCATACG GACCATCGCCT TCATAAATAG AGGAATCCCC G'rGTGGGTGA TATTTACCCA TGACATCCCC TGTAATACGA GCAGATTTTT TATGGGGTTT GTCTGGGCGTC ACACCCAATT CATTCATTCC GTAGAGAATG CCACGGTGAA CACGl"=TAA GCCATCTCGA ACATCAGGAA GAGCTCGCGC TACGATAACA CTCATGGCGT AGTCCATAAA ACTTGCCTTC ATCTCCTTTG TCAGATTGAC ATTCACTAAA TTTATCCT GCATTAATAA ATGCCTCAT'r TCACAATTAG TAAGTAACAA CATTATACCA TAAATTCCCA TCTATTTCAG CCTCTAAACC ACTAAAACGT TTACATCGAG AACTATAAGG CATA'rTCGTG ACAAAGTTTT TTAAAAGTGA TAGAATGAAG TTGTCTAGGG AAAACCCCTA ATAGAATAAG GAGATGGTTA nLACAATGACT CTGACTAACA CACAAA INFORMATION FOR SEQ ID NO: 181: 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3236 1122 SEQUENCE CHARACTERISTICS: LENGTH: 8651 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ, ID NO: 181: AGGTCCTGAA GTATTGGAAC AGGAACGTCA AGAGTTTTTG GAACATTTCA GCAGTCAGTT GAAGTAGTAG CCATCTCAGG TAGTCTGCCA GCTGGCCTTC CTATCCGAGC TTGGTAGAAC ?TTCTAATCA AGCTGGCAAG CATGTACTCT AGGTGCACCA CTTCAGGCTG TTCTTGAATC ACCCCATAAA CCAACAGTCA TAATGAAGAA TTGTCTCAGC TTCTTGGAAG AGAAGTTTCT GAGGATTTGG AGAAGTACTT CAAGAACCTT TGTTTGCAGG GATTGAATGG ATTATCGTTT CAACGGTACT TTTGCCAAAC GGTGGTAAAT CCTG?1'GGAT TCACAAAGAA TCGGATGCAG TCAAGAAAAA ATGACTGGTC AATAGTAAAA CAGGTATAAA TTTCTGATGA AAATGGTATC GCCTCATGGT TAAACACCAA TCTTGGTAGC AGATGAATTG GACTTCCAGC AACTAAAGCT CAGGTTATGA CACAACAAGC AACGTATTAA AGAAGAAGGT GCTCAGACGA ACTCAATCAA TGGCTGAAGA TATCCCATTC CAGGTTCTGT AGAATACGCT ATGGTGACAC 'N'TCTACAAG CTAGATATTC CTGGAGACTC TACTGTCGCA GGAATTTCT
AAAAACTCTT
CAGTTGATTA
TGGACTGC'C
TCAAACCAAA
ATGAATTAAA
CACTTGGTGC
CTAGAATTCA
CAGGACTTCT
TGCTCAATGC
ATGATCAATT
TTAGAAAAAC
GCTTTGAAAC
GAACTTAAAG
CCTGAGTATG
TATGAAAAAA
TGGTCTGCAA
AATTACTCAT
ATGTCAACAT
ATGGCIYTTAA
ATCTCAGCTC
ACAGAAGAAC
ACTAAATATG
CTTGATGAAA
ACAAAACGCT
GCAGATGCAG
GAAAAACAAG
TTCCTTGAAA
AAAGTAAAAC.
CAAGGCAAAT GTCCTTGGTA GGCCAACTAT CAAGCTCTAT CAGAACAAAA ACGTGTACGC TTGCATTTGA CCAACGTGGT CAACTGTGGC CCAAATGGAA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500
CTTCATCTAT
AAGCTGGTCT
TGCCAGACTG
TTAAATTCTT
CCTACATCGA
TCCTTGCTTA
CACACAAAGT
TGAAAGT1TGA TTATACACrr
ATACATCTAC
TCATGAATCA
GCTTTACTAT GATGTAGATA ACGCATCGGT TCTGAGTCTG CGATGAAAAA ATTGCGGATG TATCGGCGCT ATGAAAGTCT AGTTCCTCTT AACATTAAAT GAAGAAGCAG CAGCCTTCTT T'rGAGTGCTG GTGTATCAGC GGTGCGAACT TTAACGGAGT
GCTTCTTGAC
TCTCCTrGCT
CTTGGATGTT
TTTCAGACCC ACGCTTTAAC ATTGATGTTT ATG'rTGAAGc krTCGCTGAAG GTGAAGTAGT CAAACCGCAA GATGAAGCAA CGAACTTGCC TAAACTCTTC CAAGATACTC TTGTATTTGC TCTTTGTGGC CGTGCTACAT GGGCACGATC AGTTGAAGCT TACATCAAAG ATGGTGAAGC AGCAGCTCGC GAATGGtCGC ACAACTGGAT TTCAAAGAAC AGCAACTrCA TGGAAAGAAC CATGAATCTA AAAAAATTA AAAAAAGTTG TGCTATACTT AAATCACAAG TTAATATGAA 1123 7TrGAAAACAT TGACGAACTC AACAAAGTI'C GCGTGTAAGA AAGTCCTCCT AGT'rTAGGAA TATGTAAAGG CTACAAAAT AACTTACTTG TTAGAAAGTA ACTATATGAA G'rATAATAAA AATAGGATAT AG1'ITATTrT ACGAGCTAGG AAGGAAAAAT ACGGAAACAA TATTGCCAGA
ATAAACTATA
TACTAAAAAG
AGCCTTACCT
AGCAAATGAT
TTTAGATGCA
AGTTTAAAG
ATGGTTAGTG
A7'rGTTTTGA CATTTCATTC ATTGTTTTAT AAAAGGAGAA GATAAACGGC CGTTAGTTGT AGGACTAGGT ATTG'rTTCAA 'rA'TCTTATC GTTCTCTATT TGCAGATAGT GCCCTAACTA CAGTAGATAA ATGTTGATGG GAATAAATTT TATAATGTTT CGG'rTTCAGA AAA=rTGA AGATTA=NT TATGTAGATA ATATTGGAAA CTGAAGAGTT AGCAAAAAAT ATTGGTATTT CTGTACAAGA AGATA'rTGTA AATGCTCGTC 'rATAAATTT-A AAAGGCACTC AGCAAGTTTG ATGTATGGAG AGGT'MCGT TTCAATCTTG T'rTCGCTACT GGA'rATGCTG TGGATrrGTT GCTCTAATAA TTATTGGACA GTTGCTGTAG AGATTTACCT TAGAGGrrAT GT=?ATAGT TTCTTGTGTT CTTTT7rATT TTTCTGTCTT GCCTCTGNTT GATGAATTT CTGTAAAAGA GTTACCCAAC GTCCTCAAGT GAGGGGGATG GATGGCAT'rT GAAACAATTT GTGGTCGAT TGGCTGGGCT CTACACTAGA AGTGCCGTTT TTCr'rTATGA ATCATTCTTT GTTTACGAAA GAGGTCCT
GGTGGCTGGG
GCGGTTAATC
GTAAAAACTG
GTGAACCTTG
TAAAAAA.ATA
CTGCTGGAGC
CTG'rACATC
CTGTAGAAAA
TTTACACCAT
ACTGTATTTT
AA'rGTAGCTA
TAAAAACAGA
1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 CTrrGTTTA'r TAGACT'rAAT GAATTTTAAA CCTGTTTTTG TTTTGAT'rrA CAAA-AATAAA AGAACATAGT .TAAGTTTTAA AAAAAGT'rGT ATGTAAAGGT AATCACAAGT TAATACAAGG TGAGTGT'rAC TTACAAAATA ACTTACT'rGT GCTATACTTA TAAGTAATAT TAGGCATGAT CACACGTGAA CTGGTCATTT T'rTGTACTTA TATACC'N'TA TTCTAAATCC AATGAATCAC AATGTCTCGC TTGTAATTGG TAAGGGAATr GCATTCGGAA
TTAGAAATCA
AGATATAAAA
TTGTCAGAAA
AGAAGAAGGG
GCTGATTTTC
GGAGGTTGAC
TGATAAGGGA
GGATTTGATT
TAGTTCATTT
A'rGTA'rCGAA
GAAGAGGTGA
GCTGAAAATC
ATGGCTCTTC AGGTTGAGAA AATCTTTCGG ATGAAGACCG AAGAGTCCAG AGAAAACTTT TCAAAGATGT TCCGCTTGAT TTTATCACZAG AGAAATATCA 'rTATCCGATT CAAGAGTATC TGACCTATGA AATCATTCAT AAGCTATCAA TCTATGTAAC CTTGACAGAT CATATTTACT GTTrCTTA'rCA AGCI'CTAACT CAAGGAAGGT ACAAGGATAG TAATCTGCCA GATATTTCCG
CTAAGTATCC
TAGCAGATCA
AAGTGAAAA
ATGTTGAAGA
ATGATCGCTT
ATAACCAATC
AGATTGGTTC
GTGAACGAGT
1124 TGTCGCTTITT CAAATCGCAA A'rGAAGCr T GAAATTTAC CGTCAGAAGC TT'TCCTGAG GACCAAATTA 'rrCGGATTGC TT'ADCATrTC ATTAATGCTG TGAAGTGGAA CTTGTGGAGT CGA7rGATAA GAGGAAAGAA ATTCTCAGGA AGT~?=AACG GACTATGCAA 7"TCAACGAAC TAAAAAGAAT AACCA~rCT TATGATCCAT ?GAATTA'r? TCTTGGATTA TTTAGACAGA TCTAGAGATG ACN'CTGGAT A'rGGAAGATC ATATTAAACA ATCCTATCCA AAAGCCTTCG CAAGATCTAT GATGTGATTA CGCAACATAC GGGTCTTGAT TTGATAAAA ?TATCTAGTT CTACATATCC AACG?1TTATr GTCATAAAAA TTTATTTAAA ACTATATAAG GAGAA7TCTA TCA'rGAATAG AGAAGAAGTA ACA'rrG~rAG GTTTTGAAAT CGTAGCCTAT GCTGGCGATG CTCGTTCAAA ACTATTGGAA GCCTTGAAGG C'TGCTGAAGC TGGrGATTTT GAAAAAGCGG TCACCACGCG CAAACAAGTC TGTAACCATG ATGCATGGCC GCATCATTTA ATTGAACTC'r ATCGAGAAAG GAAAGCCTTT CG'rGATGGTT TCATTGCAGG GCCTTTIGTAC CAAACTCATG AAACCTTATA GCTATTCTAT TCATTGACTG ACTCAGTAAA ACATTGTTGG CAGCAATTGT CTAGCTACTG GATTCTTGGG ACTGTAGCCA TCTATAAGGT GTTCCACCAA ATATCTCACA TCTCTTTATG CTC'T'GACTT ATCGGTAAAT TCTTCGCACC -A'rCTT'rGGTG CCTTTGCCTN' CCAGCTATCG CAGCTATTAC GGGATGCATG CAGACAAGAT GGTACAGGTG CGACATTGGT ACGCTCTGGT AGAGGAAGCT GGCrAGCTGTA TTGCAGAGGC TATTGACTAA GGAAGC7TCA GGTGAAGGACT TGGCTTATAG AAGACCAC?1' AATGACAACT ATCTTGTTAA AAGATTTGAT ACAACAGAGG AGTTCAA'rAA TGAATAAACT AATTGCATT'r CTTTGAAAAA CTATCTCGTA ATATCTATCT TCGTCCTATT 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 TA'rGCCTGTT ATTCTCTTCT CCGCTrrAAA TGCTCTGATG GGGTATTCTG GCTCTCTTGG CCCGAGCATG GAAAAAACCA
TGGTTTGTTG
CACAAAAGGT
TTGTCGTTAAG
AGTCTTTAAA
ATTAGCACGT
ACTCT'TC'ICA
CTTCTGGTTT
CTATGCCAAT
ATGTTGGCAG
TTGCTTTCAG
AACAACGTCA
GAT=TATTC
TA7TrTrGTTG
GCAGCAGACG
GTTGGGATTC
GCCGAAGTTA
CAAGTATCT
AAGTTGTAGC
TAGCTGTIAC
ATCAAATCAA
CTGATCCTAT
CCTTCCTTC
CTATTrCGTAT CAT'rCACTCT
CTTCTAGTGT
GATACCTTGG
TATCTTGATT
CTTTCTGATG
AACAGCTAAG
GTA'rA'GTCA
CGAAAGTGGT
TGCCTTTGTT
GCCTGACGAA
ATCTGTTGTT
GGCAGAATCA
TATTACCA'TT
ATGGTCCATC TATCG'rTGAA ACTTGAACCT TCTCCAACAA TCTTACTTCT GGTACACAAA TGTTTATCGT TACCATGCGT CGTTCCA'N'T ATr =TCATGT GGTTGACAAA ATCGAAACGT AACCCTGCAA TCGGACGTGC TTCAGTAGT CCTACCTTCT TCGGTGTAAA TGAACCA.ATC rrGrCGTG
ATTGCAAACG
GCTAATCTAC
GTGCTATCAT
TTCC7TTAAGG GAA'rTGAAAG
AAAGCGGGTG
TGTGCAGGTG
CACCTCTTGT
'rATGGATTTT1
CATGGACAAC
TCA7*rcrGC
TCTATGATGA
AAAAAGTTGC
TCGATGCAGC
GAGGAACAAG
*t TACAATCTCC C'rGTGAAAGC GAGTTTGATC T'rG7rATCCT GAAACAGATA AGCTCCCTAT ACTCGTGATG GAAAAGGTC CTC'rGAAATA GTCTCCCATC GTCGGTAAAA AGATATCGTT AGTGAGATGG AGAAGGATGG CAAGAAATAG GTGAAAAAAA AACAGCTGCI' TATCAAGCAG GGATAAATAT CT'rGAGGATA 1125 7'r'rAATCCA A'rCrTCTTCA TTCCAT??AT CTTTGCTCCA CAAATCTTT ATT-GAAACTC 7TTGGAATCAA CTCA-rC.ACT 1'CCAGCTCCA CTAGGTCTAG TTC~rGGAAC TAACTTCCAA 'rGCCCrrCTA ATCGTGGTTG ACGTTGTC-AT TTACTA'rCCA ACAAATTCTT GAAGAAGAAC c'rTCAGGI'AA GTCTAA'rGAT TGCAAACTTC AACACTGCAA AAGCGGATGC TA'IrCTTGAA ACAAAATACC A'rCACTGAAG AAACAAATGT CCI'CGI'C TGGTCTCCTT GCAAATGCTT TCAATAAGGC AGCAGCAGAA AGCAGCAGGC GGCTATCCTG CT1CACCCTGA AATGTTACCA TGCCCCTCAA G'IrGCTTCAA ACTTTGAAGA TATGAAAGCA TAAACTAGCG AAAACAGAAG GCGCTCAATA CATC.AAAT'rA TCTFGCATTC GTACAAGCGC AA=TCGATI'A AGGCTAGAGA GTTACGGAAA TCCrATCGC GAAN'TCCTA TTA~AATTC TTTACCTCCT CA'rGTCACAA 'rTCGGTrACT 'rGGTACAAGA CTCACTGACT CCTCTCCTCT CACTTTNACT TTATTTAALAT TGACAAAAAC ACTTCCAAAA GACTTATTT 7rGGT-GGCGC AAGGTGCTAC ACATACTGAT GGAAACGGAC CAG'rrGCTTG ACTACTGGTA CACTGCCGAA CCAGCTAGTG ATTTTTACAA 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 TCGATATCCA G1-rGACCTCA AGCTAGCAGA AGAGTATGGT GTCAATGCTA TTCGAATTC TATTGCTTGG TCACGTATTT TCCCACTGG TTACGGCCAA GTAAATGCTA AAGGTGTTGA GTTTTATCAT AATTTATTTG CAGAG;TGTCA CAAACGTCAT GTTGAGCCTT 7TrGTAACTCT 'rCATCACTI-r GACACGCCAG AAGCTCTCCA CTCAAATGGA GACTTCTTAA ACCGTGAAAA 'rATCGAACAT 'rT=AGACT ACGCTGCCTT CTGTTrTGAA GAATTTCCAG AAGTAAACTA TTGGACAACC TTrAATGAAA TTGGACCAAT CGGTGATGGT CAATATT1'GG TTGGGAAA'TT CCCTCCAGGT ATCCAGTACG ACCTTGCCAA AGTCmCAA TCACACCACA ATATCATGGT GTCTCATGCA CGCGCGGTAA AAT'rGTACAA AGAGAAAGGC TATAAAGGGG AAATTGGTGT TGTTCACGCC CTGCCAACTA AATATCCTCT AGATCCTGAA AATCCAGCAG ATGTTCGTGC AGCTGAGTTG GAAGATATCA TCCACAATAA ATI'CATCTTA GACGCAACTT ATCTAGGTCG CTATTCAGCT GAAACCATGG AAGGTGTCAA CCATATCTTA TTAGTCAATG GTGGTAGT 1126 GGATCTTCGT GAAGAAGATT TTACAGCAT'r AGAAGCTGCA AAAGACnrGA ATGATTTCCT 
AGGAATCAAC TACTATAG GTGACTGGAT GGAAGCCTTT GATGGAGAAA CTGAAATTAT CCATAATGGT AAAGGTGAAA AAGGAAGCTC TAAGTATCAA ATCAAAGGTG TTGGTCGTCG 6840 6900 6960 TGTAGCTCCT GACTATGTAC CACGCACGGA TTGGGATTGG GTATGACCAA ATCATGCGTG TGAAAATGGT CTCGGCTATA TATTGATTAC GTGAACCAAC TGTAAAAGGT TACTTCATrT GAAACCTTAT GCcTCrcvCT AGCTCACTGG 'rACAAGAAAG AGATATAGAA TTrAGTGAG TGAAGAAAGA TTATCCTAAC AAGATGAGTT CGTTGATAAC AC7TTGGAGGT TlrATCTGAT GGTCAT'rAAT
ACGTAGATTT
TAGCGGAAAC
TCAAAAACAT
GGArG'rCTTC
TGAAACTCAA
TCAGATTATA
GTTCAAAGAT
'rGCTCTCAAA ATTAT1CTACC
TACAAGAAGA
ACTGTTTACG
GCGATTGCAG
TCATGGTCAA
GAACGTTATC
GACTAGTAGA
'rTTATCCAAT
TACCGTG'TTT
CTCAAGGTTT 7020 TTTACATCAC 7080 ATGATGG.TCG 7140 ATGGAGCTAA 7200 ACGGTTATCA 7260 CTAAGAAATC 7320 ATTAGTCATT 7380 CTATTTATGA 7440 GACGAGTGAA 7500 TTGTCA-rCA 7560 TGAAGAA'rGG 7620 AATTTCTCAA 7680 TTGCTTA'rAC 7740 CTCAATGCAT 7800 TCACGTGTTT 7860 CTGAAATTGA 7920 AAAAAAG=r ATATTATAAA TrCGAAAAA GAATTrGAAAA GTCTTGGAAA ATGGTATGTC TCGACTGGTA AAGAATGGAT
GATGATGAGC
GA'rACTATCT
AAAAGTTA.AA
TCAATGAAAA
TGGAAGAATT TAAAAATCTA T71TTTAAATT TTATCAATCC CCTrTGATTC AGArTATG CCGTTTCAAC AATCGTAACC TCTTATATTT AGTACTCTGT AAAACTCTTA TCTAATCACG TCAAAGAGCA AC'PTTAAACT AGGAAGCGAG TCGCAGATTT AGCTrTTGAGG AATTrGGGCAA AAAGTCTTG A'rATAGAAAA ACGCATAGTA TTTCTT CAACACCTGA GTTGTrACCC
TTAAAAAGGA
TTATAAAAAG
TACTATGCGT
AGGCTCTTTC
TTGAATCACT
TATAAAAATC
TTTATTGTGG GAAGATTrAC ACTTTATTAA GGCTTGATGA TAGTTTAGAA TCTGAAACAA AAACTTATTG AACTTGCTAT
CTTTAATGTG.TTTAGATAGC
TAGTATCAAG ATTTGATACA GATCTGCGAG TAAATATTTT TTATTAGAAT TATTTAAAGC AGAGCTCCGT TTTGAATACC AGATTTTGTA GGG'rCAATGT A'rTAAAATCA AATCTGTTAA GATGCGTTGA GCCTCTCCCT CTTCCTCGCT AAAAGTAG(.T ATTACAGCTA ACGAAAGCTT TAGAAAATTG GAGATTAGAG
ACCAACAAAA
TTTTCGTTCG
GCACCTGTAA
CN'AAALATCA
TCACGCGCAA
TATCGCGATA ATT'rCCACCT GAAAAACAGG TAGACTGTrG AAAACTCTAA TGTTGTTCCT CTGCAAAATG GGCTATTTCT 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 GTTACGACGC GGATATTGTC AATAGGCAAC GGTCCAATGA AAATAGTN'C TCITTTCT ACTAGACTGC TG7T1rMCTG CCG~r'1'GGAG GGCTTG7M-T TCAATATTTG ATCGCTCATT AGTCAAAAGG GAGTTGGTTC GAAGTTTTTC AGC1TCCACCA TGCACACGAA TCAGCAAATC TTTATCAGCT 1127 AATTCCTGTA AATAGCGCCT TGCAGTCArA TCrGAAACGG CTAT'rTCGTC CATAATCTGT TAACTGTTA T INFORMATION FOR SEQ ID NO: 182: SEQUENCE CHARACTERISTICS: LENGTH: 3786 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 182: AATCTCCAAT CAGTGCCACT TCAGCTACAA AGAAGAGGAG GGACAGACAA GAATAATTGA TAGAAGGAGT CGGTTTCACT TGATWTGGAG ACTGGCAAGC AGAATGA'PTC CAATGCTAAT ATCGTAGGCT A'rCAAAGAAA GCAAAGAAAC TAGCAATAGC CCAAGAGTTG ACTATATTGT TGGAGAACCT TGTCTAGCGT TAAATCGTCT CACAACGAAA CTACCCAAGA GGAA'rGAAAA CTAGGATAGA GATGATAGAA AAAAGAGTTA AAGGAGCTAG TAATGCTTGC TATATGTCCA TAGTAAGCAT GTTTGATGTG AAGATGCAGA AAACAGAATG AGCAAGAGAA AGGCTGTGTA GATAATAACT CCGTTCACAA TG CTT GACTT GGTCTTGTAA CACACACA-AG AGGGCTGTAA AGTGAGGAmG ATTGGAATTG CCAGTCCTTT TCCTGGTGGA GAAGAAGAGT GTTGTCGCTA CTGCTCAGGG AAGCGACTGT ATAGATACTA AAGAAAAAGG ACI'GTGTGTG ATACTTGTTT AGACCAAAAA TCAAGCACTT GATATAAGGA CTTrCTAAAG CTGAGGAAGA GCTTCTTGGC GTCTGGCTCT TCTTCAGTAG ACTGTCTTCA GGAGCTTCAA AGGGACTTCC TCCTCTAACT CCAACTACT TGTAGGAGAT TTGATCGCTT GCTCTTTCCA 'N'TATCCCTA GA'PTTTGGAG ATTTACTGAT AAGAAGTGGC TCTTTCGTGG TCTCTTCAGC TATAGTGACT 7*rMCTGTIT
CCACTAGCCA
CTTGGTCGGG
TTGCTTTTTG
CTTTAGA.AAG
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 AATTAGATGC CTTCTT TTCT TCTATTTCTG TTCCGCTTC TT7=CCTTC TTGCTGGCTTr TCCAATTCGA CTTCAGCTTG GAGTAT'FNTr TTCAATTGGT GTATCGAGAT CCTCTTGAGC TTGCTCTTCA GGCTTGTTCT CAAACCATTC TTGTTTCATG GTAGAACCTC CGGCTATCGT TTCTTCAGCC TTGTCTGCAA TGCTTGTTGT TT'TTACAAAA TCATTACTTT CTTTTTAGTT AGATAAATAT GTTTCCATAG TAGCAAATGT AAGCGTTTTT GTCAACGTCT GCTTGGTGTG GATATITAGAT CAATATTATC ATCACATCTC GCAATGAGTT GATCCT'rGAC ATCGGTTTTT TCAGTTTTGT AAGGGT'rGCT TAATTCCGTA CCTCTTGATT CAGGCTTTTC TCTTGTGAAT TGGAAGATAG AACCATAGTT 1128 GCTTGAGATG TCCCAG='AA TTCGTTGGCT TTCI'TTCTGG ATCrwTGGCA GTCAGTTCAA CCTTGCCATG GACTTGGATA CTCTGTTGCAC 'rCTAGCTGAC TATCTGTAAG AACTG'rATCA CG~rrGIGAGT TTACTGI'TT TGATACGACT TCCTrCAAr ATTGAGGGTC GCATTTT~CAA GGCTAG-CATT TATGATGGTG GATGTTGATC CCTTTTAGAG TTCTCCCTrT TGGTAGTCGG ACTAGAGTAG CTACTTGCGA TATGAAGAAT CCCACCAATT TTCAGACACT '1rCTTATCAG TGAGACTCAG AGTCTATCG ATGGTGAGCA GAAAGAGATG GATGCTAAGA AATGTGGATT GATGGTGAGC GTGTGTTGGT GGAGAGTAAT TTCTAGGTTT TCTAGGATGA TTCTGAGATA 7'TTCAGCGT GGAAGTGAT AAGATA'N'AA CGATATTCGGG CCGAGGATAT AGCTGTTTGT GTTTGTCCGC GA7"rGGCTGA AGAATAACTT CTTCAAAACG CCAGAAGACA GAAACGGAGT TTCTGAT'rGC TGATAAGATC TGATCATCGA AAGAGTCTGT TCGACTTCCT TGCCAAAGGT TAGC'TTrCC GTACGGCTAT CATAGACAGO TTC=TGGAC ATGGAAAGTA CCCGTCAGAT 'rGGATACCTA CAAAAAGCAG GATAAAGCCG ATAACGGTAG AAAGATGAGA AATCCTTTTG TCCATTTACG CATGCTGATT ACCTCTCT GAACAAATTG TACCAGACGA ACAATGAGTA GACCGAAGAA GCGACTTGCA CAAGTAAAAC TAGCGAAGAA GCACCGATAG CCAGTAA.ACC AGAACCAAAA AGGCTGATTT CGCTTGGGCG AGGACAGTGA AACTTTCAAC TAAAAATAGG TGATACCCAG 'rATGGAAACT GCAAAGAAAC CCAGAATGAC AGTCAAAGCG TTGCGAACAG GGTCACGAGG ATGGCGA'IrC CCAGAGGAAT GCCGATAGGT GGCTCTTAAt
TCACCACACC
CCTTTTTTAA
TAGGAAATOC
ATCAAGATAA
AATCCGCCGA
GCTACAAGAA
GCTGCAAGGA
1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 GGGCTAAC.AA GGCGATATGT AAAATrTGTC GGTrATT'NT TGAGCGGGT GCTTCArrGA TT'TATC GAGAAGATTG GATAGAACTr CGTGGGCCGC TTCTTTGGGA GTTCCCAAAC TAGCGATGAG TTCTTCTTCT CCTTCGACTC CAGCATCGTC AAAGAGCTCT CTGAAATAGT CCATGGCTTC GATACGCTCA GCTTCAGGTA GTTTCTTGAG ATAGAGTTCT AGCTGAGTCA GGTATTCAGT TCTTGTCATG TAGAGTGCCC ATTCATCTTT TAGTArTTGC GCATGCGACC TCCAATTTTT TGAGAATGGG
GCGGATACTC
TAGGGTCAAG
TTGGAACTCT
ATAGAGTGTG
CCTTCTATGA
AGCTGCTCTA
CTAGAATAGG
GATTCTTTCA
CCCTGCTCCA
TGCCATTGAT GGTGTCTGTA TACCACCG'rT TGTCAAGGAG TTGTCAGAAA GCTATTGCCT TATTAGCGAT CAGCTTAATG GTACAGCCAA GATGAGAAAT TCCTTTTCTA ATCTGTAAGA AAAGAAAGCC CTGTCAAGAG AACTAGAAGT TTAAAGGATC GTTTGGCTAA TCTCATAACC ATAAGAATCA TCAATCAAGG CAGAGGATGT TGGAAAGTAC ATGGGAAACC T1r=ATATA TAATTTTTCT ACACATACAT TGTACATCTA AAATGTGTAA AATTTTTATA TATAAAAAAC TTCTAGCTAA TTATCCCC TGTCCACTGT AAAGAGGGCC ACACTCATCA. GGATATCGAT GAGCAAGAGG GCAGCTACAG ATGGTACCCA AGAGTGGAAC AGGTCAAAAC TGTAACCAAA GAGGGTTGGc CCA.AAGGCTG CTAGGATATA GCCTCCTGTT TGAGATAGGC CGGACAATTG GGCTGTCT
TCAGGGGCGC
GCGGTTCCGA
AGCATGGAAA
TTGTCTTGAG TGAAAAGTrG ACCATGAGAT AAGGGAAGAG GGCACTGGTT TGAGGAGATG GATGGCAAGC CAGTAAATGA AATTATTGAT TGGGAAAAAG TGCCGACCAC ACCAGCTAGT GAAACCAGAG TGAGCATGAG CTGACGGTTG CGAGTAGATA AACTGGTTCT CAGGCTrGGG ATGGTCATTG AAAAAGGAAT GCTAATCAGA GATAAGATAG AAGTCAGCAA GCCAGCTTCG TGACTGGATA GACCTGCATG GATAGACATG GTAGGTAACC AGGTCATGAC GGTGTAAA.AG ATTGCCCAAA CCTG=rT~T ACGCATGACC GCTAGTCTAT GATTATAGCG GTGATTTGGG AACG'rGAGGA GAAGGATAAG TCCTTTCCAA
TAGGAA
INFORMATION FOR SEQ ID NO: 183: SEQUENCE CHARACTERISTICS: LENGTH: 3054 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear ATCAAGGATT GAAAACCTGA AAAGATAATA TTTATTTGAC 7TTPTG'TTr GGTTTGTGGA AGCCAGACCA AAAAAGTTGC TAGACAGAGT GAACTGGCT'r GTGTAATGGG CACAGCTAGA 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3786 120 180 240 300 360 420 480 540 600 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 183: TCAGCTAAAA AACATNGCTA AATTGATTGA AGCTGGTGCT ACACATTCCG ATTCAACTTC TCACACGGCG ACCACCAAGA ACAAGGTGAG CGTATGGCAA. C-IT'rAAACT TGCGGAAAAA ATTGCAGGTA AAAAAGTTGG TTrTCCTTCTT GATACAAAAG GACCTGAAAT CCGTACAGAA TTGTTCGAAG GTGAAGCTAA AGAATATI'CA TACAAAACTG GTGAAAAAAT TCGTGTTGCA
ACTAAACAAG
GATATCTATG
CTTCGTGTGG
GGTATCATCG
CTTGCTGAAC
GCAAT'rTCAT GAATCAAATC AACTCGTGAA GTGAT'IGCGT TGAACGTTGC ATGATGTTGA AGTTGGTCGT CAAGTTTTGG TTGACGATGG TTGCTAAAGA TGATGCAACT CGTGAA'LPTG AAGTTGAAGT CTAAACAAAA AGGTGTGAAC ATCCCTAACA CTAAAATTCC GCGATAACGA CGATATCCGT TTCGGTCTTG AACAAGGTAT
TGGTGCTCTT
TA.AACTTGGT
TGAAAACGAT
TTTCCCAGCT
CAACT'rCATC TCGTACGTAC TGCAAAAGAT GTGAACGAAG TTCGTGCA.AT CTGTGAAGAA ACTGGAAACG GACATGTrCA A'rG=CGCT AAAA'rCGAAA ACCAACAAGG TATCGATAAC TTAGATGAAA TCATCGAAGC AGCTGATGGT CAACTACCGT TCGAAATGGT TCCAGTTTAT GCAGGTAAAG 'I"rG1'TATCAC TGCAACAAAC GCAACTCGTT CAGAACTATC AGATGTATC ATST'rGTCAG GCGAGTC1TGC AAACGGTAAA ACAATCGACA AGAACGCTCA AGCI'CTTCTT rTTAGCGTA ACTCTAAGAC AGAAGTAATG ATGGATATCA AATTGGTTGT AACTCTrACT AAATACCGTC CAA.ATGCTGA CATCTTAGCA ATTATGAT1TC CTrCGTGGTGA TATIGGGTATC CAAAAAA'rGA TTATCAAGAA AGTCAATGCT ATGCTTGAAA CAATGACTGA AAAACCACGT AACGCTGTTA TCGACGGAAC TGACGCTACA TACCCACTCG AGTCACTAAC TACAATGGCT AATGAATACG GACCTCTTGA TTCAGATTCA GCTTCTGCTG TTAAAGATGC TACTAGCTCA AAGACAGGTC ATACTGCACG TTTGATTTC
?I'GACATTTG
'rTGATGTTGA
ATGTTCGAAA
ATCGTTATCG
ACTGGGCTGT TATCCCAATG T1'GACAGATG TCGCTGAACG TAAAGCGGTA GAACCAGGTC T'rGCTGGTGT GCCAGTAGGA GAAGCTGTTC ACGAATTGAC AGAACGTGGC CTCCATCTTC AACTGACGAT TCGTTGAGTC AGGCGATGAT GCACAAACAC AATGCCTATC CAGCTTTAGA GCTTCTGTGA CATAATGGAT TGATACTCTI' TATATGTTAC TgACTTCGTC CGGCTAGCT'r CCTAGTTTGC CGCACAGTAC GTTAAGAAAA ATATAAAAAC CTATCATATC TAGGCTTTTT GTATAGAGGG TAAGAAATAG GCAAAACTT'r CGAAAATCTC TTCAAACCAC GTCAGCGTCG CCT'rACCGTA AGTTCTATCT ACAACCTCAA AGCAGTGCTT TGAGCAACtG 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 TCT'rTGATTT TCATTGAGTA TGAAATAAGA TATGCACAAA TTGATTAGAA AGTCAAATGA ATTTCTACA-A ATGTTTAGC AATCG;TAATG TACTTGTCTA GATTCGATCT GATATATTTT" CGATT'rAATG ATATGGTATT TAAAACC'rCC AAAGTAGCTT ACTCCATTCT TTTACTTACG TGAGTGTAGA TGTTATTT'AC TGTTT1TAGCG T7rTGTGTr CCACTCTAAC CATTATAGCA TTCTTCTCAG CTAGTGTACT TTATCGCTT TCTCTAGTT AAGGAGTCTG TGCCI'GAAAA TATGGGAAC'r AAGGGGCTCG AGTAr'rrGCC TTTTGCAAAG TGATCTTAAA TGCCTTTCI'C TAAATTTACA TATCACTATT GTT'rAACAAA ATCTAATCTA T'rAGGTCA CTTATTCTTT 7TTGAAATG TAGAATGAAC TTTTCAAAG "rrCGAAT CTTTTAAAAT CTGTTTGCITT TATATCGCCA TTCTCCCCCC TTTTTlAATT CTCCCTATAT GGTACGAATA TGGTTGCTT CGTCTAGGTG GATGTCGGGG TGAGGCAGCC TTGGCGGAGT TTCTTCACAT AGTTAGTGCC AGCCTGACAG CTTTCCCGAT TATTCGGGAT TGAG'TTTT GTCTACTTGG AAGA'rGCCGA TGGTA1rATA GTCAATCTGT GGGGTA'rrCT TGATAAATAG GTAGTCGCTG T'1"ICTTATCT TTGGCTCCAT GGACT'rGCTG ACGACATAAG CGATTGGGTC c'rAGTCGTCT GGGATAATCG 1131 AAACTCCATA TCTAAATCGT TGTCCTGCAT CGAGCGGCTA CCTGC.AGAGA TAAACTACCT AACACGAGAG TAAGTAGrCT GTCTGTAGTC GTCCAGTCTG ATGATTrTA CGA'rACTTCG TTTMCTGAT CATACAGTrG CC'TCTCGGCA TAGGTCAGAA CTrl-ACCTrG TCCCGTTGGT CGTAGATAGA TTGGATATCG CTAGGAGAAT CCTTTTGAAC AGGGCATCGA TCAAGCTAc'r GAATACTTTA ACTAAGTCAA ATATAGTATT GACCTAACCC TTT?'TCATA ATTTCTAATG GTGTrMAC 'rTATACCTAT AATTCTATT GAGTCCAACC ATTACTAGTC TATATTGTTT TATAGTTGAT ATAGTACGCT GTAGCTGCTA AAACATrTCT AGAAATrAAT TTGACTTTCC GTTCATATCT TATr'rCAATC TATTATGTTT TTCACC'rCTA ACAATCGCAA ATCCATGAAT 
GAAATCGCTT TCTATTTTTG TAAGTAAAGC ATAACACGAA ATGAAAACCT TTGTTGTGTT TTCGTAAAAA ATIrTTGAC AGAGCACGAA INFORMATION FOR SEQ ID NO: 184: SEQUENCE CHARACTERISTICS: LENGTH: 1590 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
TCTGGGTGGT
TGGAGGAAAG
TTTCTTAGTA
CTTAGTACCC
TGAGTTTGGA
TAATAGAGTT
TCTCTTCTTT
ATCCACGAAA
2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3054 ACGC a. a a. a a a a. a a a a. S a a (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 184: TGTGATTTTC yGAAAATTTG GTAAAATATA TCTrAATCAT TTTCAGGAGG ACAAAAATTr GACAAGATAT CAGAATTTAG TAAATGGAAA ATGGAAATCA TCTGAACAAG AAATTACGAT TTATTCACCA ATCAATCAAG AAGAATTGGG TACAGTTCCA GCCATGACTC AGACTGAAGC TGATGAGGCT ATGCAAGCTG CGCGTGCAGC CCTGCCAGCA TGGCGAGCTT TATCAGCAGT TGAACGTGCG GCTTATTTGC ATAAAACAGC AGCTATTTTA GAACGCGATA AGGAAGAAAT TGGTACTATC CTTGCCAAAG AAGTAGCAAA AGGGATTAAA GCAGCAATTG GAGAAGTAGT GCGTACAGCA GACTTGATTC GTTATGCTGC TGAGGAAGGT CTCCGTATCA CTGGACAAGC AATGGAAGGT GGTGGTTTTG AGGCAACAAG TAA.AAACAAA CTGGCTGTTG TCCGTCGTGA ACCAGTTGGT ATCGTGCTAG CGATTGCTCC CTTTAATTAT CCAGTTAATT TATCTGCTTC TA.AAATTGCA CCTGCCTTGA TTGCAGGGAA TGTGGTCATG TTTAAGCCAC CAACACAAGG TTCCATTTCT GGACTCTTGT TGGCTAAAGC ATTTGAAGAA GCAGGGATTC CGGCAGGTGT TTTCAACACC ATTACAGGTC GTGGTTCAGA AATTGGGGAT TATATCATTG AGCACAAAGA 1132 AGTCAACTTC ATCAACTTrA CAGGTrCAAC TCCTATTGGA TGGTATGCGT CCTATCATGT TGGAACTrGG TGGGAAAGAT TGCAGATTrTG GAACALTGCTG CCAAGCAAAT TGTTGCGGGA ACGTTGCACG GCCAT'rAAAC GTGTCATrGr TCTCGAAAGT TTTGCTTCAG GAAGAAGTr'r CTAAATTAAC AGTrGGTGAT GAACGTATG GTCGrrTAGC GCAGCTCTrG TACTAGAAGA GCTTAGCrT ACTCAGGACA GTAGCAGATA AATrAGCTAC CCATTTGACA ATGCTGATAT TACACCTGTr ATTGACAATG AGAAAAAGAA GCTCAGGCTC GCT'TrTGAC CAAGTrACAA TTTACCAATC ATTCGTGTGG AT'rCGGCC TT CAATCATCAG AAAACTTGAA GTAGGTACAG CCCATTCcTr GGTGTCAAAG AGCGATGACA AATGTCAAAT TTGTTTTCCT GGTTTTATTT CTTrTTTGGTA TTATAATAGA CTTCAGCCGA CTTCAT'rTGG GGC7TGAT'rG AGGATGCACA TTACACCAAT CAAACGTGAG GGCAATCTTC TCTGGCCAGT AAGATATGAA AGTGGCATGG GAAGAGCCAT TTGGTCCTGT CTAGTGTAGA GGAAGCTATT GCCTTTGCCA ACGAATCTGA TCTTTACAAA TGATTTCAAA A.AAGCCTTTG AAATTGCTGA TCCACATTAA TAATAAAACC CAGCGTGGTC CAGATAATTT GTTCTGGAGC TGGAGTGCAA GGAATTAAAT ATAGCATTGA CCATTGTTTT TGATCTGAAA TAACGTGTAA AACCAGGAAA TTTTGCTATA AAATAATAAT AAT'rATAGAA AAAATACGAA
TTGAAACCGG
1080 1140 1200 1260 1320 1380 1440 1500 1560 1590
*0 0* :.000 INFORMATION FOR SEQ ID NO: 185: SEQUENCE CHARACTERISTICS: LENGTH: 4848 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 185: CCTGCAGTTG TCAGACCTGT AATTTTCTTT TTATCTGTAA TAAGAATCGT TCCAGCGCCT AGAAAACCCA CACCTCATAT AACTTGAGCT CCTA.ATCGTG TAGGATCTCC TGTCCCAAAT TTATAAGATA CGTATTCATT CGTCATCATA ATCAAACATG CAGCTAGACA AACAATACTA TAAGTTCGGA TGCCTGCAGG CTGGGATTTG CTCCCTCTCT CTAAACCAAT TATACTACCA ATGACTACTG ATAAAACAAT CCTCACAACT ATTTCAATAT TTGATAACCC AAGACTAGTG GCTGTCATGA TTATTTCCTT ACTTTACGCC CCCGTCTTTG TGTGAAGTAT AATACCGTTC CAGAAATAAT CATCAGAACA ATTGTATAAA CAAATACCAG AGCTTGTGCA TTAGATGTTG CTG'rxTCATC ACCTGCACAT CGAATCGTAA TACCTAATGG TTGAGCTAGG GGATGGTAAA GGAATACAGA TAACTCGAAG TCAGTTAATA AAGAGTTAAA GTTTAAAGCA ATAACAGAGA 1133 GAACAACCGG TAAAATAAAT CCATACTTCT TGCTGCATCT 'rTCTATAAGA AAATGGGATT CTACCAAAAT C1'GA7TCAAG CTGCTAAAAG TGTACTTGGT CAAAACGAGA T?'rATGr'rr TTGTCGCAGC AATAATAGAA TACTAAAGAA TAAGCGATAA GAATTGCAAC TGGATCTGTA CTGTGAACAA TCCATh'rGCT TTTGTTrTTT AAGAGCGCT TCTTATTCAT GATAGTAAGC CAGCTAAATC ACGAGAATTC GAAATTCTTT ACCACCAACA GGAA'rGATAA CC~rCATCAT AGTATAAAAA GGTGAAGCAC TCCATCTCAT CATCAACACr AAATAAAATA GCACGTACCA TTTACAACTA TATATCCAAT AAGTAGAA'rT ACCAAACTAC ACAAGAAATT GTGGCTGATT AAAAGTAAAT AATAAACTTA AGTAACCAAG GAAGTAGAGC ACCATATTCA AATAAGAAAT CTGACA.ACAC GCAAATAC AACT'GCGAGA ATTGTTGCTG TAAATAAAGC TGACCAAGAA TGGAGAGAAT GCCGCACTAT TT?1TCTAAAG TAAAGTTTGA TAA'rCTTAAG TTACCTGTTT AA'rGAGTATA ATACTATAAA AA'ITAGTGGA AGCATGAAAA ACAATGTGAG CAATGATA'rr CCAAGGCTTA GACCCAATT TTAGTCTTAG AGATAGA.AAT AAAATTGTAG TTGCAATACC CCCATCCCTG CAAATGTAAT ATCATGGGTG C'rGCTACTGC
S. CCATAATAGT AAGTGCAAAT AGAGTTGGAA TTrAAGGTTGG CAGTAAATGG TTTTGCTCCC ATATTTCGAG CAGCCTCAAT GAATTGTATT TGTTAAAAAC AATGTATGAT AGACTGCACC ATACCCAATA AACCAGTTAG ATTTTGTAAT CAATCCATAA GGACCATAGA CTCCATAAAT TAAAGAGGTC ATATAACCTA AGTAC'rCTGT AAATAGAACA CAAAGAATAC ATGCTAACTT AAAACTGT'rC ATAATACTCT GTACAGCATC AAGGGAAAAT TCTCCTCCT TTGGATAAAT AATAAATGTT ACTAAGAACC TTAAATTTAA TTTATGACGC ATACTGCACC GATAAATAAT TCCACACTT'r CTCCGACAGA TACATTAAGA ATCTGACTTT CAGAAACTTT CTCA.ACATCA ATAATTGTCC CTTTTAGAAT AACTTTCTCT AATCGAA7GT ATCCTT7=T
TAGCAGTTCC
GGTCTAAAGA
CAAATTTATA
ATTTTAAAAT
CTACGACATT
GAAGTGCCCT
'rTACAAATAC
AGATTAACCC
ATAATTTCCA CCTrTCTA TAAAATAATT GCAAGTAGGG AATCATTGGA TTTATAGTT AGATAAACCA CTAAGAAAAA TAACACTACT 'TTCGGAAAA AGTGTGATAG TCAACGCTTC TGAAAATGTC ATAATIGAATA AGGGATAACA TTTTGTAAAA TCCAGTCGCT AAAACCACTC TTAGCACCT TTAATATCAA AACTGTAATA A'rGAGTGAAA CTGAGATTTT AGAACACGAT ATTCAC'rACT AGATCAAAGT TAAACGAATA AGCCAATCTT 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 TCCTTAAAAT TGCAGAACGT CTGATGGTGT TCTAATAGCA CCCTGACTAT CAATACTTGT
TATTGTATAG
AAAATCTTCT
ATCCTCTAAG
TGAATTGTAA CTCCAGAAAA TCAGTTTCAC GATTGAATCG AAAACGCTTG TATTTTTCAA 1134 TAATACTTCG TGGACTGTTT CATCGGTCAA AACATTAATA TCTCCAATAA AATCACATAC 2340 AAATTCAGI-r TGAGAATTAT GATAAATCTC TACTGGTGTA CCGACCTGTT CGATCTATCC 2400 ATTGrrAAAG ACTGCAATTC TATCAGA'rAA AGTCAACGCT TCCTCTTCAT CATGAGTAAC 2460 ATATAAAGTA GTAATACCTA ACrT"r G AAGTCT=TC AACTCT7rM TCAAATCTAC 2520 ACG'rAA7Tr GCGTCAAGGT TTGACAATGG TTCA'rCTAGA C"=AATTT TAGCTTCAAG 2580 AACCAGACCA CGAGCCAATG CTACCCTTTG '1rGTTGACCC CCAGATAATT CTGATACAT'r 2640 ACGCTGTAAC TGTTGATCAG AGATCTTAAT TTTTGCTGCC ACTGCTGATA CTTTAGCTTT 2700 AATAACATCT GGAGCTACCT TCTTAACTTT' TAAACCAAAT GCAATATTAT CAAAAACAGT 2760 CATAGTTGGA AATAGCGCAT AAGATTGAAA TACAATACCA ATTCCACGCT TTTCAGGTTC 2820 CAAATGAG'rG ACATCTGTTC CATTAACrrC AATAr~crCCT GATGATGGAT CTAGAAAACC 2880 TACCAATGCT CTCAAAGTAG TTGATTTACC ACATCCTGAA GGCCCAAGAA ATGTAAAAAA 2940 TTCCCCr'rCA TGTATATCTA AATTCAGATT ATCAATTGCA ACAAAATCAC CATATTTA-AT 3000 TTGAATATTA TCAAATTTAA TCATCTCACT AACTCCCTCT ATTACTAAAC CAAAAGCCTC 3060 *TCTTTATTTC 'rTCCATAAAT TTAGAAATAA TAGAGAGACT TGGACATAAA AATTAACTCT 3120 *TATTTCT'rAT TGTACGTA'rT CTAATTCAGC TTT7TCTACC CATTCATCCA AATGCTTTCC 3180 *AACAGCTTCC CAGTCAATAT TTTGTGGTTT CAC'ITGATCA ACAAALTTTCT TCGTATCTTC 3240 *AGGTAdGATCT TTGAGGGCAT CT'rTATITTGC AGGAATAGAT CCAAAGTTCT TACTATATTC 3300 *TACTTGAATT TCTGATTGAC CAAACCAATC AATAAArrCT TTAGCTAACG CTTGTTTTT 3360 ACTAGTGCTT AAAACCATAG TTTTTCAGT TACAAATGGT ACACCAATCT CAGGAGTCAT 3420 AACTTTGAAA ACAACATTTT GTTCTTTTTG TCCAAC'rAAT GCACCAGAAC CCCACATCAT 3480 *TCCATATTGT ATTGGATCTT CTTTGTCTAA CA'rCTTA ACA ATTGAACTTT CTCCCTTTTG 3540 AAGAGTGTAT GCATTTTTCA AATA1rrCTTT TGCTACTTCC CAACCTT'TTT CGGAAACACC 3600 ***TAATTCACCT TTATCATCAA GGTATCGAAC TAAGATACTT GCTAGAArrG CCCGTCCTG;T 3660 ACCTCCTTGA AGACCAGAAA TTGAATATTT ACCTTTATAC TTACTACCTA A'rTCAGTCCA 3720 ATCTT'rAGGC ATTTCTrlTTA CATCAGGCGC CCCAATTAAA AC'rAATGG'rr GAACAATCAC 3780 *AGGATTATAA TAATTATCTT TATCTGATA.A AGATTGATCA AT'rTATC-rA ACCATTTAGG 3840 C'rTGTACTGT ACrAGTAATT TTTGATCTCT AAT=TTT 
GAATCAACAG CACCAATTCC 3900 AAATACCATA TCTGCAACT1G CATTATTCTT CTCAGCAATA ACACGGTCTG CTAA'1'GAGC 3960 GCCAGCGATA TCAACCATTT T'rATATTAAA ACCAGCTTCT TTTCTTTAG CAGTTAACCA 4020 ATCACCACGA CCA~rTGAGA CTGAGTTCGA ATAGATAACT AATTCTTGAC TTTTATCAGC 4080 rrrcirCA GATGAAGAAG CAGTCGTAGA AGTA.AgAGCA ACTCCCGTTG CAAGTAC.AGT TTCTCCTn'I TTATATTTT ATrTAAATTT TTCAAATACA G1'ATAGTCAA TACGGTTTAC 'rTCAGTTA.AT TCTGCTCTGA CTTTAGTAAA.
CCGAAGATAC CCAATTGAAA TATGGAATrG TGCTTTTCT GAGACATAAG TACGAATTTC AGGT TCAACT AGTATGTT GTTTTCCCAT CAACAATGGA AAAATTCAA GTTGTTTAGC ATCTAGAGGA AGATTACTGC TCCAAAACTC 1135 ATTTGAACCT CCAGAGCAAG C.AGC.AGTGT AGACCAAACT TTCAT11NM TCATGATAAG ='CCTGATAT CGAACAAAT GTCTC.ATATC AGTAATAGTT GGAATCTrCT CTCTTCTTCC TCCrCT TCGG ATATCTATCA TGArrAGGGA T'TCTAATCTC TTTGCAGAAG TTCAGI'ATA CGCATATGAA AAAGTAATCA TGTAT'ITCCT g tTCACGATT T'rCATGGCAC
CTAATAAAAT
TTAGAGGAAT
AACAAACACC
CTrCATCTGC TrTCTTCATC GTAAAGCGT1
AACAATTCAA
AATGGTAATT
GTATAAAGAT
4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4848 TTACAGTCAT GTGAATAGAA TTCC?1'GGAG TTAAAGTAAA CTTATCGATA CTCTATA.ACG TGATTGAATA ATATCAACAA CTTCCATCAA ATC'TTGTTTA TTGCTACAAC TGTATTCCCA GGGAAATGAT TAAAT'rCCCC ATTCTCGG INFORMATION FOR SEQ ID NO: 186: SEQUENCE CHARACTERISTICS: LENGTH: 3763 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 186: GTTATAAGCA ACACCTTCTT AATGAGTTGG TT'rAGCTGGT TCGCAAACTA TGGACTGTTT TGAAGCAAGC TGCCCTCCTT ACAAGCATAG TCCGTrCCAT TGAATCCGAA TAACGACAGT CTCTTTTAGT TTCAACTGG TAAATGACCT CCATCGAAAG TAAACTGTTC ATGTGACTTT GAAACTGTAA AAAGAATTCG ACCAATTCAA CCTCGTCAGT TCTGGAAAGA AAACGGGATA CCAATAATTT TGGAAAGTAG GCATCAGCTG AACCTGTI'AA CAGTTGAAAC AGGAAC'TGGA AGCGGCGTTG GTCATTCGTT ACTAAATACT AAAAAAGTTC CTGAAAAAAG ATA.AGACCAC ATAGTTGGTA AAAAGACTTG TTTTGGAAGT CCTTTCTTTT TGTGTTTN' TCTACACTrA GG'T TGAGGCA
AGGTTGGCTG
ACAATTCT!'T
CAAGGATATC
'rAGAAATCCG
CATACTGGGT
GATGATTTGG
TACCATAAAG
ACCATCCAGG
GCTTGCCATA AG'N'GTGAAA. TGGGTAGAAT CGATATCTAC GGGAAACTCT T'rTTTGTCrA GTAAAAAACA CCCATTGGGT GAAAAAAGAA 1 I I 1136 A'rCTAAGCTA AGGCAAGGAT TCTGGATGGT 'N'?TAGATTT GGGGTGAATA ATI'GGGGATI' TAGGAGAAAT GATGGTATCT TCCAAATCAA AATCAACTTC ACTCCATAGT CTCAACTGAT TGATTTTCCC ATCTTGATAG GTCACATCCT TGTCAAGGAT AAACTGAGTC AACACC'TCAT GTTrGACCTrG ACACCTGA'rG TCATCTACCA AGAGCCAGAC ATCCTCTACC AACATGAGGA TVI'TTCTCCT GTGAAGATAA GGCAAATCAG GTrCTGCrGA CCAATAAGCC CCCrCAATAT AATGCACrCC CTCCCTTrCT TTATGGTGAC AAAACAGGGA GTGAGGATAG TAITrCATA'1- CCCAGGATCC CGTGATTCTT TCCGGAGCTT TCCCATCTAC AATGCAGGTC GAATGACTCC AAGCACTCTT TAAGACATAA CGT'rCATATA TCTCCCGATA AGAATAACCC CCAGCATCI'A TGAAAATAGG TTGGCCTrGA TACTG'rAAGC AAAAACTATT CTCGTCACTA TGACTATGGG 660 720 780 840 900 960 1020 1080 1140 CACTTCCTAG CGGACCATTT GTCCAGAGTC TTCAAAGATC
TTGAAAAATA
ATGGACTTAG
GTCGCTTGAC CT'rTTCTCGC CCCAGGAACA GACCGTTAAG AAGGTCT'rCC TGCTTCAAAA TTTCTGTAGA ATCGCTA'rCA CCAAAAGCCA GAATATAGGT CGCCATCTTT TCCAGCAACT AGAGACACAA ATCCAGCAAG GCTTTATAAA ACTGGCTTCC ATCTCCTAAA ATCTGTGTCT AATGGTAAGC TTCTTCTAGA TCCATCTTAT GAATTGTTTrG TAAAATCCCC CAGTTACTAA TAAAGTCAAT CT~GCTTTTCT AGACTGACCA AGTCAAATTT CAAGAGGAGC AAGAGTAGT'r CCAAG'rrCT AGTCA'rCAAG GATTGAGGAG ATAGAAAGAA CTTGCACTTT 'rGAATATAGT GATAACGATG TTCATCCT'rA GCTGCCAAGC TCTrCT CA AGAGGCTAAG CAA.ATCAACT CCACAGCAGA CAGGCTCAAA AAGTCCGTCC ATCTAAGCCT
CTTGGTAACT
CCTCTACATG
CAATTTGCTG
CTGAAAAGAA
GGGTGTACT
AAATTTTCTC
TCAACCAAGT
AAAArTCT
CCTTATCTCC
ATCTTGCAAG
ATAGAGAATC
TTTCAACTCC
ATGATAGATA
GGCGCGATAG
TAGTTC'N1TC ATGCAGACAT 1200 AATTCCTGCA 1260 TTAACA'rCCA 1320 ATTTCTGTCG 1380 GTCATCATTT 1440 TCTGGAAGCA 1500 GACTGT'rCAA 1560 TCTGAAGCAA 1620 GCAAGCATCG 1680 TAGC7TTTTCA 1740 TCCTCTAGCA 1800 APAAGGAACGA ATACCCGTAT CACCGCTCA ATCCAATCAA TTCTACCAGA TACCCTATCA CCATTCTGGA TCATC=TCAA TGAACAAGGC TCCATATCCC TAAACTGCAA GAGATATTCT TGTCGATTGA GCATATAAGA ATACTTGATC CCATACCATC GGCTGGATTT GATGGATTTT 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 AAGGACTATC AAACATAAAA CGArrGTCCA TCAAGCGTTC AAGGGAACTC TTGACTTTCT CATAGTCTr'r TGAACAGTGC GACAAGATAT AATCACGACA TTGATTTCCA TCGACTCTTT CAAAAAATTG TCTTCTTTCT TCTTTCATTA TCTAT'rACCA GAAAAAGAAC TAC?1'AAAAA GCAGTTCT7r TGTCTTTCCC ATTACACTTT CCTTTTCTAC ATGGATGACC ACACCTTTG CAATCTGCAA GGAGACCAAG TCATCTTGGA TAGAAATGAT 7=TCCATGA ATTCCAGACA ATAACAACAC TTCATCACCA AATGTTAAAG GTTTGCGAAG CAACTTTTGC TGACGAATAG CTGCCAAGAC AATCACTATT CCATAAAACA CCAGCTTC!CG ATAATTCCCA TGATAACTGT C r? 1TCTTTG ATCAAGTAGT AAACTAAAAG AACCTTATCA AGCATTCCCT GAATACTTAC AAGCTAAATA CTCTrGTCGT TGCTCCATCr AATGAAAGCT TGACAGTAAA AGGGGACTCA ATGTTGTATC CATTAAGCTA TAATCTTAAG CCC'TGCAGCA AAGGTrAATCG TACGtTACAC CGATAATATT TCAATCACAG TTGTTCCTAG AAAATCGTAT TCATCGCAAG AACGAAGCTG CGATGGTTGA
GCACCATAAT
GGCAATACGA
TTTGTATCCA
GAAGAACAAG
GAACAATGGA
TAAAATAACG
TGTAAATAGC
GATACTTTGT
CTTAACAGAT
GAAATCGTTG
TACAGACCAG
ATTGGACCGA
GCTAAACAGA
AGTrTATATG TTGI'CCATT GCTGGTAGAA GAGCTGGAGC TTAGCGTCTG CTTTAACTrC GTCGCTGCCA AACCAGCAAT CCATC'GTTC GCT'rAGTTrA TTGACAATTT AATCGC'rGTT CAACCAAGCC TTCTTGAGCA ATTGAGAAAG AGAATCCCCA 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 ATACCTGCCA ATGGTCCCAT CAAGGCCATC TTGATGCTAC GrTTTCTTT TGCCGGACGG CCAT'rTTCCA ACATTACAAG ATGCAAGCTG GTAATAAAAG GCAGGAAGTG TGGGT'rGGTA
TTATAGAATT CACAGTTTTC T4GTTTTTTCA AAGCAGGATA TTCCAAGGCT TGGTAGAAAC CATCACAI-rG GCATATCCCA CTTCCTGATC CTCTCCATAG ACCCTTGATA G?1'ACTATAG 'rTAAATCCAT TTTGACAAAA GAATCCCCGC AAAGACGTTT TAAGATAATC ACGTTTGTT AATTTGTrAG ATCCAGTCAT CGTGTGCTTC CrCCTCTACC ACATGATCCG CTG'TrGG CTrGTTATAA AATTCAATCA AAGCAAAGAT AGTACCTACA ATTGCAATAC CAATTGTTGG GATGTTTAGA TAAGCTGCAC AAACATATCC CAACAAGACA AAGGGAATCA ACTCTTTCTT AGCCATCACT GACAAGATCA TCGCAAAACC GATAGCTGGG AGCATTTTAC CAGCAACTGT CAAACCTGTA AGTAATACCG GTGGAATGTA GTCTACGAGT TTCAACAAGG TATCCATTGA AAGGGCACCA AGCAACCCAA GGTAAATCCA ATAAAGGCAA ACAACCAAAT TGTTGCATTT AGAGTGAACT TAAATTTCTT CAAATTATGG 7rTTCAAGT GCT INFORMATION FOR SEQ ID NO: 187: i) SEQUENCE CHARACTERISTICS: LENGTH: 5053 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 3480 3540 3600 3660 3720 3763 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 187: 1138 CAATCTCTGA GTATGTGCGG TCAATACTAw CAAAGGGAAT CAATTGGmnCT ATAGGTAATG GCAACCACTC CATCAACmT CATAGTCTTG CTCTCTATT'1 G 'ACCATTGA TAGAACATAA TA'rAGAC -rC ATTCCACA TGCATAGCAA ATTCTGAAAA G'TACAATGAT TGCAATCGr TCTG?1'CGAT 7T1rTTrCAT GAATGTAATT CAAAGTN'TA ATCGCTTGTT CCACrTTTT C CrTTT1 ATTAATTACA CGTGAAACAG rrCCAACACT YCCI'GACGTC AAGTAATGr'r ATTATG-ACGC AACATCTCCA GACTAATTTG TTAT7"TCTCT GAAGGGATGC CAGATACTTG TCCTCTAGCG TAGTAATCTG CAAAGTTACT TCTTTAATC AACTCCTGCT TCTAAAGCAA CATCTTTCAT GGTAATTGAT TTTrCTTTGrT AGTATCATGC AAATGCTTTT TAAGCAACTA TCCCATCATG AATAAAATCA CTCCAATTAG AACATCTACA TAAAACACA AAAGCCTG AAAAGCAAGA G=ITAGATG GCTATAAATC G'rCTTTTCTT AACAAAAACA CCCCCAAAAT AGTTCAA.ACG CGAAATAGCG T~rT7TTT ATCATGGCAT TTAACAAGTA TATGTGAG CTCTTAT'rTT CAA'rAGGAGG AATAATAAAA TCTAAAGATT CC?1'TGATAA ??CTAATTCA CAATAATAAG AAAGTGTCTC TGCAACGAAT ACTAAAACCT TTAGTTTAGG CTGATTTTGI' GTGATGAAGG rATAAGATAG ATGATTTACC TCTAAATAGT GAGAAAGCTC TTTTTATA AAGTTCC1'GA TTGTATTCCC TTTTTGATCA CGATACAGAT GTGCGGTATT ATGCAGATGC ATCTGATACT TGACTGAAAT CTGATCAATA CTACCATATT ATCACCTCCT TTTCTCAATC A7TTTGCC CTTTrGAAAA TACTTCAATT TTTrCATGGCT TTTTTCGTAT TAGATGTACA TTTTGCTTAA 
TAGACTTTTT CTGTCTAACT ATTrTTTGGTT ACTCATCTAA ACCGTGTTTA TATTATTTGA TTAGAAATAA TGATATCATA
TTCAATATAT
AGATCATTTA
TTCATGTGTA
CT'rCTATAAA
ATGATTGAAG
TTTGAGGTAC
TCGAATAAAC
ATAGATGAGT
AGGTGAATCT
ATTATTGCTA
AT'IACTCATA
TTTGGTATGA
CTCAAGAC
AATATTTTGA
CAACTCT'rGA
GTCCAAACTT
'TGCATGA'r
AGCAAATTAA
ATCATTGAAC
TCTGAAACAA
AATAAAATAA
CAAATCAGAT
AAATCACTCA
CCAGTTCAAA
ACTGATCAAA
TC.ACCAAATG
TAGAACAAAC
ATTTTGGAAA
ACTCAGTAAA
480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 :440 1500 1560 1620 1680 1740 1800 ACATAACTAT CCTr=TAC GCATTTCATA AAGAGACT TGAAACTAAG ACACAAATAA TTGGCAAACA ACrrCTTCAT GATTCAAAAC TCTGAGCAAC ACCTTCTrr CCTTCTGCCT TGGTCGrrAA AAGAATCTTT ATCTACTTCC ATAAAATGAC TTCGTAACTA GGAGCAACTr TAGCATTCTA TGCGTTGACA TCC?1'ATAAA CCAATTCTAA CAATTGAGAT AGTGGCTCTG TATCCTTATT CTCCATTTClt ATAGATGGTA AGATTTrTTCA CATCTATGAA AAACATITTTT CTAAAGAGAT AT'TGTArrCT GCATTAAAAA ATCCAAACTT CAAAC=TAT TCTATATAGC AATTCATTGG AAAGCTTGT'r ATGAAAAA7N TCAAATGGC 1139
CATTCTAGGA
TCATTTCCAA
ACTTTATTGA
CAAATACCT
GTTGAATGTT
ATGCGTATAC
GATAGATCAT
'rGAAACCAAC rTAGATAATA
TTGCAATTAT
AATAATATTT TTCTGAAAAA TGATTTGAAC AGGGGTCAGA TGGCTAAT AATACGATAG CAGCTTGACA ACCTICATTA TAAAGAAATG ATGCTAAACC CATTAGTAGA AGAATGAAAA CT'GACTGC ACGTTCTGTA GTTTATGTTC AAATAATAAT TATTGTGCAA AAAAGTAACG AATGTCTCTC CTAACTTCAA AT'IGAAATTG CCrTTT'AATC AGCGAAGATG AACTGATATA AAATCTTTA AAGAAGATGA ATTCrAAAA'r CGAAAAATGA AmTCAATAT CACTATCATC GGTATTAATA ATCAAGTCAG GAAAAGCAGA 'rrrAACATGG CAA'NTrAATA AC'rCTGCTAG TTCAGAACGA TCTAATAATT CTAATTGCC,, ATGACTTTrM AATCTCTCAT GAATATCTTr1 CTCTCTTT~AT AAATTATCGG ACCACAAAGA ATAGGTATAC CATGATATAA CGACTTTCC
ATTAAACCTC
TAAAATCTr'r TATCGTAT AATAACACTA CGGAGACAAT ATATAAACAA TrTrT'rATT TTACCGTCTA TTGAGGGCGT GAATACAGAA TCA.AATTCAA CTrAAAGAT TATATTTT'TA ATTTAAAAA TTATATAATA GCAACAAT'rA AAGAATTTrGA TTTTTTAAAA TTATATAATA ATAACAATCG AAATAATTCA CTTTTCTATA 'rTAAAG~rAT ATAATAGTAA TAATCAAAGA PLATTGAI=? T'rGATATTAA AATAAAAAAG GAGGGTAGGC AGTGTTGTGA TCAATTATTG CTGGAGGTCT TATTGGTCTC TTGGCAGGTA AAATCACTAA AAAAGTAGTT CTATGGGAAT CATCGCAAAT 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 TTrGrGGTCCA GTATTCGCTG GTTTAGTCGG GGCATATGCA GGACAATCTC TTTTAGGTAG GCAATCGCTG GAATGGCT GTGTCATTCT TTACAGGTAG AAATCAATGA CGGGAAAAAT CAPLAGAAAAT ATGTTTCATT TGATAGAT'rA TCATCAAGTT CCGTAGTTGA ArrCTATCTT TAGTrTTTATT ATCCATTT'rA TTGAAACTAA AAACGATACA GTTTGGTGAG TGATCATAGA ATAAA'rGTTT CGTTCATGTA GCCAAATAAT TCAAAATGAA AAGTAAAACT TGAAGTTGCA GCCCCATC'T ATTGTAGGTG CAGCGATTGT GAT'rACTGTA AAAGTAAACT T'rTCGCCACT AAAGTTAGCA AACTATTrT AGTTTrAAATG TTAAATCGAA AGGA'rTG'rAT ATGTCAAAAG ATTTTCTCTA TTTTALATCTT GACAATTTTC CTTCCTGTT AGTGATCTAG GTA'rrCATCT ACTTAGCTGG AGACAGAACT GCTACATATG TC'N-rrGGGG GACAGTGGTT CTATCAACTT GTTGTGATGT TTTATCCTAA ACG'rTACTTG GAAATCCAAC TTAAAATTAA AGAAT'rCCGC AATCGAAGCT TTTC'PTAGPA TTGATCAAGA ACCCAACTGT TCATGTAAAT TTACGAAAAA GAAGGTAAAA TTCT'rCCTTC AGACAACATC CCTGACAGAT ATAACTAATG GATTGAAGCA G~TTrTTGGT ATTGAGCG1TC GTAAAAAATT ACCAACCAAA ACCTCAAAAC AAAAAGACTG 1140 TTACTCGTGT GAAGTAAGGA AGTAAAAAAT GGAATGGCTT AAACAATATC GATATCCAAT TATCGCTGGT CTCATAGGCG AACAATATTT GTATTGATrT AAACTATATA GATAAATAAA TCAAACGAAA AAAACACAAA GAAATCAAAG GGGAAC?1'AC CTAGAAAACG TTTCAGGTCT AAAATCGTTA ACAGCGATGA GTTGCAG?'rG ACTTAAACGT TATI'TCTGGC TTC?1'TGATT GTCTCCTTTG GCTTCTTCAA TAGGAGCACT GGGAGITrGCA GCTGGA'rTAT ATATCGAAAA AAAATAAAAA TTACTAATTT- AAr'rAAAGGA GTTTCATATG CACTAACGTA GAAAAGAAAG ATGCTACTGT TGTAGCTCAC TTACGAAGA'r AAAGTTATCC AAAAAATCAT 'rGGTCTTTCA TTTGGOAATC GATGCTGGTT TCTTCTCAAA TCTTAAAGAA
CGTAACAAGT
TAT'rGTTGrAG GGTGT-rAACG 'rACCAAAAAA
GCTAAAATGA
GAAATCAGAG AAATCCTATC TTCAGAAGTT TAGAAGTTGG TAAAACACAA ATGTTCCAGC TTTATATTCA CTGACTTGGA AATT-CTrGAA ATGAAGCAGA CTCAGTAAGC AATTCACTTC AGAACAATTC TTCAAGAAAA AGT'rAGCGAA ATCAACGTAA ACGTTGTCGA CATCAAAACT AAAGAACAGC CTTCAAGATC GCGTATCTGA CGTTGCTGAA TCAACAGGAG GAAAAAGCTA AATCTGGTCT TGGATCTGGT TTCTCAACTG 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5053 GGTGTAGAAG CTGTTAAAGG AACTAAGATA AAATAAATAT CAAGCTAAAG GTTCTATTAA AAAGAAGGTG CAGCTGAAAA GACGCTG'rAC AAGGTGCTGT GGTTAAAAGT TACTTTATCT TCTCAGAAAA CAAAAAGCTA TCTTTCTCTT ATTATA'rTC TCACTTGACG C1TTACCTCTC CAAAGACAGA T'rTCA'rATCA TGCAGCAAAT GGTGTAGTAT AACAGGAGAA ATTATCATCT AGAAGGTGTT GGGAAAGCCA
CTCACGAAAA
CAGTAGAAGA
TCGGTGATGA
AGTTGTTTCT
AGAAGGTCTT
TTTTAGTAAT
AAAGTAAAAG AAGTTGCCGA AAAAACATCT TGAGTGGCGA ATTAGTCAAA AGAGTCTGAG
CACTCGTGTA
AAAATTAAAT
AAAAATGGAA
AGACGC1'AAA
CGATAAATAA
TICAAGATGAT
TAATT'rrGCC
ATGTCAATTC
TTTATCTGCC
TTGGAAGATA
GAGATTCCCA ATTGCG43AAC TCTAGCTTTT AGCAGGTTGT TGGCCATGAG TACGAATCCC AGATGACATC TCTTATAACC CAAACAAACC ATCT'rACGTT TAGCGAAAAT TTIGTCTACCC AAAGTGCCTG ATATTCTTTA GTTTTTAAAC ACTGGTAACG TT'CATTCATA TACAGTCTCT TTT1GAGGGGC TGATTCAGGT TCATAATCGC AGTCAACAT'r GATTTCAAGG CTGTTTGCTT TC'rATCTCCC CG INFORMATION FOR SEQ ID NO: 188: (i) SEQUENCE CHARACTERISTICS: LENGTH: 6492 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 188: AArrCTC~rr 'PTTCCAACAA AATG'TATGAC CTGCACTrGA ATACrrCTCA rrGITTrGTAC ATrCATCTAC TTT'CATATAA TCTTTTACAA TT1TAGACAAT ATTCCAATTA GCCTTATrAA GATGTATAAT AGAAAAGCAA TGATAGATAT GATATTAAAG AAAGGAGATA TC-ATGAA AGATAAGAAA TATC~rGGGG TTrTGCCCAT AGrATATGGA TATT'ATTrAA TCTACAAAT ATCCGGTGCA GAGAGTATAG CATTAAAATC TTAT'N'TTGTC TCAGGAATGT TTTCACATAT AAAAAGGGa.A TCGATGGTCT GGAAAAAGCA GGTCAAATAA GAAAGATTAT AGATGP'A.AT ATGATTCCCG ATAGTTCTCA GGCAATAATC ATAGTAAGTA TA.AGAG7rGG CATAA'rrrrC AATCATAATA TGACATAACA CACTATCCCT TTCAAAACTA TrGTArrAGT AAoTTATAACA TATCAATTAA GCGAATTTAT ATCTAAAAGG GA=TACA.AA AAACTA'r1'G CTTATGTCCA AATTTTCT GCTATATCTG CTGCACTTAC TCTAGATAAG TTAATAA-TA ATrCAAACTT TGTrATTACA CTAACAAGTG GAGCrATATT CTTGGGATTC AGGCTTCAAA CAAA'rTTAAG AGTTTTAGGT TCTTTCACTT AAATCCATC' GCTGCACAAA CTCA'rCAGGr GGTAGCACAC ACACCCGTAC 7-rGACT GC ACTTGGCTTT
CTTGCTCTTA
TTAGGGGCAA TGATGGGCGA GCAAGAATTT ATGAAGATAT C'rAAGTGCTG
AATGTAGAGT
GA'rTATTCCC
ATTGCAATTT
ATI"MACni, AAACTGTTGA GTACGTGAGA GGAA'GCA.AG c'rTTTAAAAG CTTTTATAAG GCGATAAAAG TATCTTGTA-A AAGGCCTTAT GTTTTGTA'rC TAATTATTCC TATAGTTTAT TTTA'rGACTA AZTA-TCAT r7,.2- rTIT TTATCAGGAG CTATA.A7'GG TGGCTmAATTr ACCAAGAATC CCTATCTAAA TTCTAAAAT ATTTAAAGCA ATTACTCAAA GTATGCTTAT AATGGTTATT 'rrTGGACTG GCTTACCTAG CGCAAAGGTG TrCTCTTTGT T'TCATTCATG TGCAGTAGAT ACTTTAGAGG TAATGrCAAT AATlnrAAALA TGATAAAGCT GTCATTGAA.A TGTCGGTTCA TCTGGATCAG TGTTAATAAA GGAAGCATAA CTrTAA'PTAAA GCCATTTCCT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 AGAATGATGT G ,ACTCCATG CGCTTTACGA AGATATGCAA ACTATAATAT AGAATTTGAG ATTTATCCTT TAATTTAGAA GCAAATCAAC AGTAGCAAAA TATATTTCTC AAGGAAA'rTA AAAGACAAAT TAGTGCATGG AA'rGTTAGCT
GAAGGAAAGT
CTTATATCAG
TTGCTTATAA
CCTACGCACT
G'r-1-TTACAA AGATAGOCGG GATAGCAATA AGTGAATATT CTGACGAAGC TTGrrTrCA AGATTCAAAA TTATTCAAGA AGAGCATT'rA TGATAATGTA GCGrrAGCTA ATAAAGATGC GACGAAAGAT GACGTTATGA GAGCCTTAAA ATTAGCAGGA TGCGATTTAA TATTAGACAA ATTCCCAGAA AGAGAAAATA 1142 CAATCATAGG CTCAAAAGGT GTTTAT'rTAT CCGGTGGAGA AAAACAAAGA ATTGCAATTG CTAGAGCAAT TTATTATGGA TGAACCATCA GCATCTATTG ACCCAGATAA CTTTTAAAAA TCTATGAAG GATAAAACAG TTATCATGAT TrAAAGACCT TGATGAAATT A.7rGTCATGG ATAGTG.GAAA ACAAAGAATT AATGTCAAAA GATACAAGGT ATAAGAGCCT TTTAAAGGA'r TCCAAAATTA
CGAGTTTGAA
TGCACACAGG
AATTATAGAA
TGCAAAAAG
CTATC'rACAA AGACGC~t=G GCAAGAGATG TTTAACAGTG AAAGAI"'~rGC TCTTACAGAT CGAATGAATG GAGGGTTTCA GGAGGAGCAA GAAATTTAAG ATGCTTCCTG CCATATTACT AGCAATGGCT TTTATATAGT TCTATCGAAT ACGATAAA'T AGGACAGCGG AGAATTTATC AATGAAAGAG TTTTATAAAA TAAAGCAACA CTGGCTTCA'r TATGA7'NT GCTCAGGAAG ATTCTCAGTT TTGATTTTGA ATATAACACA ACCTATCAAC AAAA'rTACCT CTATCTTACT
TTTTCCTTTA
TT'rGGAAAA
TACCAATGTA
AAAGTGCAGA
T?1'CTA.AACA
GACATTTCAC
ATACCAAAGG
GGCAATGTCA
CCTTTATCTA
AAACAATCAT GGCTGATATT GAAGGCATAG AGCATGCAAT TGGGCGGCAT GGTACTGTTT TTCCCATTAA TATCTGTAAT AGATGGGTTr AGCTGTAATT ATTCCATCTA 'rTTAAGCTT AAAAATATCA GGTTAATGGA CAGAATAGAT ATTATGATGT
TTGTATAAAC
TATGGGCAA.A
TATTTTGCTT
T'7rAAGAATA
TGACATTTCC
GAGCCACTCA
GATGCTAGCG
TATATTTATA
CTTAAGAAAA
ATATAATTTA
AGTACACTTA
CTrwTATATCT
AAATTCTCTC
TGCATCTAAA
AGAAATTCAA
TGATCTAAAA
T'rTTAAAGCT 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 AACTCAGAAA GC7II'CAAGA TCGAAGGATA 1TAAAGATGA AAGGCGGAAG TAACTACAAT CTTGCTGTTG TGATATTTGT AAATATCGAA ATGCAAA'rGG AGATTAAAGC CTTATATAAA AAAATGGAAG ATAGTGAGAA TTT-AACTTT-G TCTATATCTT CAATATTTAG CGGCGTAAAT CTAATT-A'rrA ATAAAGAGAT TACCTTATAG GATATTTACT AGCTGCTATG GAGGGCTTGA TGGAAATATT TTAT?1'ATCC AATCAAGATT TACAAGIAAGG CGATGACTAT GArGT'rGACT TTGCCTACAA TAAAGACCCA AAGCAGGGAC AGGTCACTGC 'I-TGGTAGGT AAACTTATAT CAAGACT'NTA TGATTA'rGAC ATAAAGGAAA TATCAACAGA ATCCCTmr GTTCTCTTTA ATCAAAGCGT TATGGAAAAT GAAGACGrA AAAGAGCAGC AAAACTTGCA AAAGGTTCG ATACACTTAT TGGTGAAAAC AAGATAACAG ACTCTTTAGA CCCAAAATAG AAAGATTAAA AGCTTAAAAA AA'rTTGATAT AAAGTTTTAA ATGGTGTAAG GCAAGTGGCT GCGGTAAAAC AACTA'rCTTG AAGGGACAAA TCTTAATCGA TGGCAAACAT CATAAGGTGT CTATTrCTTT CCAAGATGTG ATTAGAATCG GTAAGC.AAGA TGCAAGTGAC AATTGCACAG ATTTTATAGA AAAAATGGAT GGAGCTGAGC TA'rCAGGAGG AGAAAGACAA 1143 AGATTATCAA TAGCCAGAGC CTTCTAAAA GATGCGCCGA TATTGATCTr AGA'rGAGATA ACAGCAAGCC TTGAT-GAA CAACGAGAAA AAGATTCAAG AGTCTTTAAA TAATTrAGTT AAAGA'rAAAA CTG'rrGTAAT CATTTCACAT AGAATGAAAT CCATAGAAAA TGCAGACAAG ATAGTAGTTC TTCAAAACGG AAGAG'TAGAA AGCGAAGGTA AGCATGAAGA GCNTTACAA AAATCAAAAA TTTACAAAAA TTAATAGAA AAGACAAAAA TGGCAGAAGA ATTTATTTAT TAGGAGGACT ACAATGGATA ATAAAAAATT AAAAGTAAAA GATTTAGTAA GCATCGGTGT I*MTGGCGTA ATrTA7=~G CC~nCA'rGTT TGGAGrrGGT ATGATGGGCT TGATTCCAAT ATTGrrCTTA ATATACCCGA CAGTATTAGC CATAGTTGCA GGAACTGTTG TTATGTTATT TATGGCTAAG GTTCAAAAGC CATGGGCACT ATTTATATTT GATGTTTGCA GCTGGTCATA CCTACGTAGT TGTGGTT'rTA AGCAGAATTA ATTAGAAAGA TTGGTAATTA TAATTCATI-r TGCAATCTTC AGCACATgGA TATGTAGCTC TTTAATGCAA ATATATGGAG TGGTCTTTGA TGACTATGGG AA.AAGArrAT AATAACTTAT CCTCACATGG CTTTAGTAGC CTTAGGTGCT AGCATATATA GGCAAGGC'rC TATTGAAAAA ACACTTTTCA ATAC7"rrACT CCTTGCCrAA TTTTATGGTG CTATCTGAAT GGTATGATAT CACCACTTGT TCACTTATAG TAATGATAAT AAATACAATA TGCTTTCTTA ATGCTTTITAG CAAAAGAAAA 
GTTGATGTA'r TAGAAAAGTT TTC7rAGGAG GAATTCTTGG AATGGATTAT ATTGTGTGGG TAAACCCTAT AGTTAACATG 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 TTTTTGAGTA TACCTATTGT TATTAGAATG TTTATTT'rAC CATTTATGGC AGCAAGCTTT ATGATAAAGA CCTCGGATGT AGGCGCAATA ATT'rCATCGA TGGATAAGCT TAAGATTTCA AAGAATGTAT CCATACCTAT TGCGGTTATG TTTAGATTCT TCCCATCTTT TAAGGAGGAG AAGAAAAACA 'rCAAAATGGC TATGAGAGTA AGAGGGATAA ATTTTAAAAA CCCAGTCAAA TATCTTGAAT ATGTI'rCTGT GCCACTACTC ATTATATCAT CTAATATATC AGATGACATT GCAAAAGCGG CAGAAACAAA GGCAATAGAA AATCCAATTG CCAAGACCAG ATACATTCGC GTAAAGATAC AGCTAATTGA TTTTG'mrAT GCTrAGCGG TTGCTGGACT TATTGTGGGA GGCTTAATAT GGTTGAAATA AAAAATTTAA GTCI'TGATTA TGGTGAAGAG CATATATTAG ATGATATATC ACTATCCATA GCCGAGGGAG AGTGCGTGCT ATTTACAGGA AA.AAGTGGAA ATGGTAAGTC ATCTTTAATA AATTCAATCA ATGGACTAGC TGTAAGGTAT GATAACGCAA AGACAAAGGG CGAAATAATT ATTGATGGTA AGAATATAAA AAATTTGGAA CTTTATCAAA TCTCAATGCT TGTTTCAACT G7TNTTCAA.A ATCCTAAGAC ATATTTTTTT AATGTCAATA CGACATTAGA ATTATTATTT TATTTGGAAA ATATCGGTCT TGCAAGAGAA GAGATGGACA GGCGTTTGA.A GGATATACTT GAGATATTCC TAATCTATC CGGCGGTGAA AAACAAATTC CAAAGATrAT AG?1'ATGGAT GAGCCTTCAT TGGCAAAGAT GCTAAAGATA TTAAAAGAGA GAATTTATTA TTTGATGGAC ATAGTGACC AAAAAACTrA TACTAGAAGT GAATT'm=TAA GTTTAAGAGA TAAAGAATTA AGTAAATTAA 1144 CGATAAAAAKA TCTTTTGA.AC AGAAATA'rAT TTTGCA'rrGC AGCTI'CTrAT ATAGCAGGTA CGAATTTAGA TATT'AAA6AGC ATAAGTGTT AAGGCATA6AG CATAATTGTT GCAGAGCATA GTGTATTTTT AATAGATAAA GGAAAGCTTA AGCTAGATAA AAATGAATTA AATGCTTT-AA AAGTTCCTTA TTrAAAAGAA GGTGGAGAGT 5220 5280 5340 5400 5460 5520 5580 ATCAGATAAA AAATCTTAGT TACAAATTTA CTGATGATGA a a a a a. 
a a a TTTCGTTCAA GCTTG.GGAAA CGCNTTTAAG ATGTTTAATA GAGAGAAGCT ATCTAAAAAA ATCATCAATT ATTCACAGAT ATGAAGAAAA GGCGA.AAATC CCAGAATCCT TGCCTTAGC!T TTrACTAGAC AAAAAAGAGT AAAGGAAACT CACATGAACA ACTATCTTrC GATGGAGGTC AI'TATGGCA TAATAGGATC GGTCTTGAGA AAAAATCAAA GAAAGACTCA AAAACTCTrC GAAGTA'rrCA ACGAGCTrAG ATTTTAAACC CCAATTATTC TAGATCCTGG ATGGTrTCTT TTCCCCTTTA TGGTATAAGT GTTTACCAAA TCATCACTTC ATTTAACCCA GTATGGTGGT GTGTTTAAGC TTAAAAGATA 5640 CAACGGACGA GGAAAATCAA 5700 AGAAGAAATT TATTTrAAGG 5760 ACTTGTTATG CAAGATGTAA 5820 A'rrAGGAGTA AAGAArrT'r 5880 ACCCCAAATC TAAAAACCAT 5940 TTTTCACCCA ATGGGTG1'TT 6000 GTAGAAAAAA ACACAAAAAG 6060 CAAAACAAGT CTTTTACCA 6120 CTrATCNTT TTCAGGAACT 6180 TTAGTALACGA ATGACCAACG 6240 TTCCTCTTTC AACTGTTAAC 6300 GATGCCTACT TTCCAAAATT 6360 GTTTTCTTTC CAGAACTGAC 6420 TTTTTCCCAG T'rGAAACTAA AAGAGCGGAT CCGCTACTGT CGTTATTCGG ATTCAGATAT AGGTTATGGA ACGGACTATG CTTGTAAAGA GTTGGAAGGA GGGCAGCTTG TTCACAGCCA
TTCTAAG'TAT
CCTrGTCCAG
ATTGTCAGCT
ACCTTATCCC
GAGGAAACAG TCCATAGTTT GCGATGCCTC AACCTTGAAT TGGTCGAATT CTrTTTACAT GTTCACCAGC TG
INFORMATION FOR SEQ ID NO: 189:
SEQUENCE CHARACTERISTICS:
LENGTH: 7174 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 189:
AACTGAAGGT AAAGGCTTCG ACGCAGAACG TGACGCTGCC CAAGCTGCCC TTGATGACCT 6480 6492 TAAGAAAGCT CAAGAAGACA ACAACTTGGA CGACATGAAA ACAA6AACTTG CGAALAAAGCT CAAGGAC7"TG CTGTTAAACT CrACGAACAA GCCGCAGCAG TCAAGAAGGA GCAGAAGGCG CACAAGCAAC ACGGAACGCA GGCGATGACG AGAGTTTACG GAAAAGTAAG A'rGAGTGTAT TGGATGAAGA GTATCTAAAA AAGI-rATAA TGAT'TGT AATCAAGCTG ATAACTATAG AACATCAAAA ATAATAT'rCC AATAGAATAT TTAGCTAGAT ATAGAGAATT ATATTAGCTG
AAGCATTGAA
CGCA.ACAAGC
TrCGTAGACOG
AATACACGAA
GATTTTATTG
AACATGATAG
TTGTATCAAA AATGATGAAG CGGTAAGGAA TTT-TGT'rACC TCAG;TATTGT TGTCTGCATT TGTATCGGCG ATGGTACCAG CTATGATATC A'N'AGAAATA CAAACATATA AATTTGTAAT ACCCTTCATA ATTGGTA'rGA TTTGGACACT AGTTGTATrr CrrATGATCA ATTGGAATTA
TA'rAGGCAAA TACTrAAGAAG AGACAAAAAT TAAAATCAAA ATAAAGTTAA TTACr'rATTl CGATAAAAAG GGACGGAATC TCA TTT AGAG'rGTTCG TAACTGAACA CGGGTTrCAG TGAACCCGAC CTAA6ATGGTG GTTCGATTCA GTAACTGAAC ACGGGCTATG GACTGTGCCA CCGTCGTCAA AACTCCTAGA AACAATAC'TG AATTTTATGA
AAAAAGGCTT
GCTGAGGACA
CGTGCTGCCT
GGTTTCGGCG
ATATAAATAT TTCTGTACTr ATAGGATAT ATCGTA6ACT
AGTACAAGGA
ATGACCAGTA
GTTTCAATGG
TTCTTCGGCG GAGGCGGTTC CAGTATCGTG TCAATTTGAC TATCATCGTG AAGCTGGCTG CCAGTCAC'rT GTGGACGCTG CTTGGTATGA TGCG'rCGCCA AAATATCCAT GTACAACCTG GTGAAAATCC CTGCTGGTGT
TGGCTGTGTC
TCGTCTGGGG
'rTCCAAAAAA
AGTTCAAGAA
TGCTCCTGCA
GGCAGGTGGC
TT-CGCGCAA'r CT'rTGAAGAA
TCGTACATGT
TCA'rGGCGC'r
AGTAACCTGT
TCATGGAACA
GGAAACAGGT
TGCAGAGGTT
GGGTTTTG'rC
AATTTCTTAC
GAACATCAA'r
AAAAGATAGT
CGTTTGACGC
GTATCCAAAA
TATCACCCAG
GCC'rATGAGA
GGCGCCAATG
TTCGGTGGTT
CCAAACGCTC
GCTATCTTCG
AATGGATCTG
GGTGTCATTA
GATGTCTGTC
GGTCATGAGA
CAACAAATTC
GCAACCCAGC CTCTGTTTTT TCATCAATAG AAAGGAACAA TAAATATA.AA AGAAAGGAAT AGAAAGGAAT AAGGGTGT'rC TTTTTCTAGG ACGTAAGCGT CCT-TGTATC T'rGAArrATG ACGCTTCGGC AGACGAAATC ATATCAACAA GGAGCCTGGT CTTTGAGTGA CGACCAAAAA
GTGGTTTTGG
TTGAGGATAT
CTCGCCAAGG
GAACTGAGAA
G'rGCTAACCC
ACGTCGATAC
ACGGTCGAGG
AACAAGCTCA
GCCTCGCTGG
TGGACCTGGT
'rTTCTCAAGT
AGATGATCTC
GGAACTTAAG
AGGGACAAGT
GCAGACTCCT
AAAAGAAATC
TAGCGTACAT
TCAAGGTGAA
660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 GCAGGCTTTA ACGGTGGACC TTATGGTGAC TTGTATGTAG TAGTTTCTGT GGAAGCTAGC
GACAAGTTTG
GCGGCTCTTG
CCAGAGG4GAA
CGTGGCGGTG
AACGACCGCC
CCAAAGAAAA
TCGAAAATCT
1146 AACGTGAAGG AACGACTATC TTCTACAATC GTGATACACT AGATATTCCA ACTCTTCACG CTCAGACTGG TAAGAAGTTC CGCCTACGTA CAGTTGGTGA CCAATACG?? ACTGTTAATG
AAAAAGTAGC
AAG~C1'TC7r
CTTCAAACCA
CTTGAAAGAA TT~CGCGGCTG TGACCATATr AAAGATGCCT CGTCAGCGrr GCCTTGCCGT TCAACCTCAA CTTTrc'CCAA GTGA7437MA ATTGGTTATT- GTAACGGGGC ACCGAGCCTT TCGTAACACC GACAGGC -rG CTCTGAC1-r GAAAGTAAAT VIGATGGAGA ATAATACTCT ATATATGTGA CTGACT'rCGT CGTGGCTAGT T1TCCTAGTTT TGACGGACGG TAGTGACCTC CAGTCGTATC TACAACCTCA AAACAGTG=1 TTGAGCAGCC GCTTTT=ACT TTATAGATTT TTTAAGACTT TCCTAAGTAA TAAGTTTTAA AGTTTCCGGA CAGCTGAAAC CTrCGAAGTr CCATACCTAA CAAGCTGTT'r CAGGTGT'rFT AATCATACTC '1TCAAAAATT
ACTTGAACC
CATTACCOCA
TCTTCAAACC
TACTGACTTC GTCAGTTCTA TCCACAACCT TCTATCCACA ACCTTAAAAC GGTGT'rrTGA TTTGAT7TTT ATTGAGTATG AATTACCTAA AATAGATT'GA AATAGAATAT GAACAAATTG TGCTTTKGAA ACTATGGTGT GCTATTCTAA GAAAGTCTTC GATTTAGTTC ACGTCAGCGT CGGCTTGTCA CAAAACAGTG TTTGAGCTGA GCAGTCTGTG CCTAC'1TTC ATTATGATGC ATAGTTGATG ATAAGAGGAT TTTAAAGTAA
TGAAATGGTG
TGGGTATGGT
CTTCCTCAGT
TAGTTTGCTT
GGATATATAT
TCTCTAACAA
1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600
AAAAAAGAGC
GCTAGTCTAG
TATTCCTTCA
ACATCGTCCT
TGGCGATTT
CGTCGGGCTC TTTTTACTTA TCTGGATATC C'ITTTTCCAAG TAAGTTCTTr TNGCTGGTTA TGATAGCTTG AACGCGGTCA TTGTAACCAG ATAACTTCCG ATTCAATTCA CTATAACTT TTTACCTTTT TCTTCAGTTC CCTGCATTrC TN'ATCACA ACCTTAAACT TGTAAGTCAA GTCTTCTTGG ATGATTTGCA GGCTGTITTG GATAATATCC
CTGGTATTCA
GCTGCAGCTC
AGACTTCATC TGTGATGGTT CTGCAA.ATAG CAGTAGGTTG GATAATTTCA TAGCAACTCC TCAAAGTCTfG GTTCGTGGGC GGCACAAGGT GAACGTGAGT ATGATATTCA TACCAGCAGC TTAAGCGTTT TTGATCGTTT GATAAAATCA ATCTTGAGGT ATGAAAAACT GTTTrGACCAG
CAGCGACTTG
CATCGTCAGC
CGACTTCTTC
AGCAAGTTTG
ACI'CTAGCGA
ACAGTTGGAA
CTTAGTGACT TTCATGACTT TT'TGAGCTAC 7w=rGTACT TGGGCAAAGA GTTGGcTGGC GCTCGTAGCA TCCATCTCCA AAAGAT'rGCG ATAGTGTTCT ?1'TGGCACGA CCAAGCTGTG TCCTAGTGTT ACTTGAGAGA TATCAAGAAA GGCAAGGACC TGCTCATCTT CATATACTTT TGAAGCAGGA AT'N'CCCCTG CGATCATTTT ACAAAAAATG CAATCTGACA TAAALATCTAC CTCTACTGTA CTGAATTTG ATATAATATA GCTACATTAT 1147 ACCAGA'rTTG GAGAAAATAT CCTGTTTTGA AAGATGT'GTC GTAGAAATT AAAAACCTGA CAGGTGGCTA TGTTCATGTT CTTTACTGTr GAAAGTGGGC AG7TGTCGG 'TTrGATTGGT CTCAATGGTG CTGGGAAATC AACGACGATC AGTGGCTCCA TCAATATCAA TGGCCTGACT CAGATTGGCT ACAT'rCCTGA GACGCCTAGT ATCGAAACGG TTGCTATGGC TTACGGTATT CCCTTGTrAA AAATGTTCCG TTrGGAACAG AAAGGGATGA ACCAGAAGGT CATGATTATC ATCGTGGA'rG AGCCTTTCCT TGGTCTTGAT TTrGGAAGTGG AGAAGCAAAA GGGCAAGTCT AATGAGATTA TCGGTCTGTT CTGCAAGGAG ATGCGACTAG CTGTATGAGG AATTGACCCT GAGCAAAAAG 'rGGCTTTCGA AAATrAGACT GGTTCCCTGT TGTGCrTTTG TGGTGGATCC CCGCTGGCTA TTTCTGAT?1' ATTCrCATGA GTACCCACGT GGCACC7'rAT
CTACCGCAAG
CAGAGAGCAT
ACGAGTAGAG
3660 3720 3780 3840 3900 3960 TCATTT'rTCA 4020 AAGTCTTrTC 4080 GATTCAGCTr 4140 GC'rGGATTCG 4200 GCGGAGAAGA TGTGTGATGC CTr'rGTCATT CTTCACAAGG GAGAGGTGCG TTCCAAAGGC AATCTCCTGC AAC1'ACGTGA AGCCTTTGAT ATGCCTGAGG CTAGTT'rGAA TGATAT'rTAC TTGGCTCTGA CCAAAGAGGA GGATCTATGA AAGAC?'rGTT TTTAAAGAGA AAGCAGGCCT TTCGTAAGGA GTGTC?=GT TATCTGCGCT ATGTC'TCAA TGACCACTTT GTCTT'GTrCC 4260 4320 4380 4440 TCTGTCCT GTTGGGCTTT AAAATCATTG GCCTATCCTT GAGGAACTGC CACCTATATG AAATTAAGCT CCATCTCAAG AGACCC?'r" CTTGCTGTTA TT'rTTCTGCT CTATGTGCTT CCAGCAAATT TTTCACTGAA AGCGTAAGCA AGTCTTGCTT ACAGCGTTAA GCGTCGTGCC GGAAGATTTG GCAAAATCTC CTAGCCTACC ACTACAGTCA TTGTTTGTAG GAATTACSTC GAGGCTCCAG ACAAGCTCTT ACTCTTACAA CATTTCCTG 4500 TCTTT7TACTT TTACTTTGG 45S60 TCTCTTAGTT GGAGAAGAGG 4620 AGTCT'rTTGG CTCTTTGTAC 4680 AA'rGGGTTAT GGCTTGCCAG 4740
CGTCAAACTG
TTTGCGCCTT
TTATTGGGGG
ACTGGACTGG
CGTTTCI'TG
GCATTTCCCT
TATTTTTAGC
TAGGAAAATA
AC1'GGGACTA CCCTCTT'rAC TTTCCACTTT TGTCAAAAGG TGT'rATTTCT CAAGAAAGCA GCAGGTCAAG GGAATTTCAA TATCTGGACT TTATTTTAAA GGCTGTTCAG AAGCCCTG TATCTGCGTT C'rTATCTGCG AAATGGCGAC CTCTTTGCTC 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 TCACTCTTCG TCTTCTCTTG CTTTCCTTGC TGGCGCAGGT TTrMACGAG CAAGCTTGGA TTGCGACAGC AGTGGTAG'rT CTCTTTAACT ACCTCTTGCT CTTCCAGTTG CTGGCCCTCT ATCATGCC'Ir TGACTACCAG TATTTGACCC AACTCTTTCC GCTGGACAAG GGGCAAAAGG AAAAAGGCTT ACAGGAGGTA GT'rCGAGGA'r TGACCAGTTT TGTTTTACTT GTGGAATTAG T'rCTTGGGT'T GATTACCTTC CAAGAAAAAC TAGCCCTTCT AGCCTTACTA GGAGCTGG?1' 1148 TCGTrTTACT AGTCTTGTAT TTGCCTTATC AGGTAAAACG GCTGATACGA CACTAAAAAA GAAG7TGAGT TCAG GTC ACAGGATAAT GGTTGGTCCG TAGAGACTTA TACTCTTCGA AGCC'rCGTCT TACCGTACTC AAGTACAGCT TGCGGCTAGC qTrCATTGAG TAT'rAACTTG GTCTTGACTT GGTCAAAGTG3 AAGCGGCGCG ACTTGGAGCA TCTGGATCAA GAGCGCTGAG
TCAGATGCAG
TCAACrrCTT
AAATCTCTTC
T'TCCTAGT7'r
GAAGCGGTCA
GACTAACATT
T7'?TGTTACT
AAACCACGTC
GCTC7'TCAT
TAGGCCCGCC
TCTGGTAAA.A T'TTTCTAGT GAAAACCAAG AATACTTT TGCATGC= G GTAGGCTTTG CTGTATCTGT CATGGGAATC TGT'rTTGATA CTG'rTAGTTT rC-ATTGAAAT GTCTrGAATT CATGATGTTC TAGAAAATAA GCTTGGGCTA GACCTTTTCT ATTATTCAGA CGCAGAAAAC ACACATCCAA CTAAAGCCCA CTTCAAAATT TAAGTAATCT AATTGTTTGG TTAA'rCT'GA CTCGGAGAAG AAGGTCAATA CAAAAATGGT TGAAAGAGGG TTCCCAACAG ACTAGAATTG TCCTAGCTCT TTTGAGGTAT TCAATCAAG.A ATCGATTATC AATTGCTCCI' GAAACCAC TTTAAGTCAG TAATCAAAGT ATGAGCTCTT TTGATGGGGT CCTCCTTTAA TCTGGGTGCC AGTCTTACTT CTGGCAACTG ATCACTTTTA ATTCrTwn"' ?TTTATTCAAA TCTTTAA'rTG GCGCTGAC1TG AATTTTATGA TAAAATAGTT GTAAGCTCAT TCC~rrTTAGG AGTTT'TCAAA GACTGTTTAG GATTGGGTGT GTTATTCTTT 'rCTTAGGAGG AGAATCCAAT GAAATATATG AGTCTATAAA GTAAACATCG ACGATATCTA CTATATCCAA TACCGTACAG ATTGTTACAG AAGAAGCTAG T=~AA'rATG TGAGAACCAA TGTCGGGAAA CCTTGATGAG ATGTCATCGA TAAA'rrAAAA TCGATTGATT T'rCAAGAAAG AATCCT??T CGCTGTCAAG TATGCCAGAC GTCGCTATAG AGAAMI'CGT AGAGTAAGAA GA'rCAGAATA 'NTrGTTTTAG AGGA'rGATTT TCCCATGAGA AGACTGGAAG CACI'GTTTCA GCCTTGGCTA GTCGTCAGCG CTTGCCTGTT 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140
AAACGACGAT
TTGGCAAGCC
GGGGGCCCAT
GGAAGTGGCT
TCACTCGGAG
TAAGGCC~nrG
TAGTCAAGAT
ATTTCAGTAT
TATTC'rCTAT
GCAGGAGCCC
CAGCTA'ITCT TTTTGGATA'r AGAAAGATTC GGGATCGGGA TTTATGCCCC TGTCTTT'rCG TCAGCAGAGG AGTTTGAATC AGTAAAAGTC TGGCGGAAGA CCTr=AAAG ACGTTTACTA ACCAAGACAG ACAGGCTGGA CGTCTCTTGC AGTGCCACCG TGAGAAkACTT TTGAAAGCAC ATCATATCAT GGACCAACTG CTGGC'rGAAG TGCATGAGAA TGAGATTCGA AATGAAGAGA TGAAGGGACT TCCTATGCC CTGATTGTCT TTGTGACGAC CTACCAAGTG TCTGCTTTrGG ACTACATTGA TCCGATCGAG ACAGCCCTCC TCTATGCCAA TTGCT'AC TTTAAATCAA AATTTGCCCA TCTCGAAACG TCGCCCAGAG CCCATCGTGT ATTTACAGCG AGTTTAGAGG AGGTTTTCAA CTC7=TCTC ATCAATCCTG CAAATGTGGT 1149 GCATTTGGAT AAGAAAGAAA AACTGCTTTT CTTT INFORMATION FOR SEQ ID NO: 190: SEQUENCE CHARACTERISTICS: LENGTH: 3207 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 190: CCACCAGGGA AAATCATTGA AGTTGGTAGT TACCATCATC TATTCAATAA ATAAGGAGAA CGAGTTTTAT CAGAGACGTT ACCATAACTA ACTATTTACT TTCATCTTGA TTTTCTCCCT CCAAGGAGAA ATCGCCCCTA CAgTGTCATT
CACCAAGAGT
TGTCATGAAT
TGCGACAGTG
TAATGCAGGC GCAAAGTT'rC CCTAATCTr'r TTAGAAGCGT TTAATTATAC CTCTTTCATT TGTTGCCACA AAAGAAATTA CTGTTACTTC CCCTCCATTC AGTCAACCAG TGATAATCCT ATCCTAGCTA ATCATTTAGT GGCAAATCAA GTAGTTGAAA AAGGGCACTT ACTCATCAAA TACTCTGAAA CAATGGAAGA AAGTCAGAAA ACTGCCTTAG CAACTCAATT ACAAAGACTT GAGAAGCAAA AAGAAGGACT TGGAATTTTG AAACAAAGCT TAGAAAAAGC GACTGATCTT TTTTCTGGCG AGGATGAATT TGGCTACCAT AATACCTTTA TGAATTTTAC TAAACAATCC CATGATATTG AACTGGGTAT CACAAAGACT AACACCGAAG T'rTCAAATCA AGCTAATC7?T TCCAATAGCA GTTCATCAGC TATTGAACAA GAAATTACAA AAGTTCAACA ACAAATTGGA GAATATCAAG AGTTGAGAGA TGCTATCATA AATAACAGAG CACGCTTACC AACTGGCAAT CCGCACCAGT CAATTTTCAA TCGTTATCTT GTAGCCTCAC AAGGACAAAC ACAAGGAACT
GCAGACGAGC
GCAACCCTCA
CATTTTTATC TCAAATTAAT CAAAGTATTG AAATTCAGCA AGCTGGTATC GGAAGTGTAG CAGGTCTTGA ATCATCTATC CAACTTATGA TAACAGTTTrA GCAACCAAAA TTGAAGTACT CCGCACTCAG 'TTTTACAGA CAGCCTCACA CCAACAACTA ACTGTGCAGA ATCAATTAAC AGAATTAAAA GTACAACTAG ATCAAGCCAC ACAGCGTTTG GAAAACAATA CCTTAACCTC CCCAAGTAAA GGTATCGTTC ATCTGAACAG CGAATTTGAA GGTAAAAATA GAATTCCAAC TGGTACAGAA ATTGCTCAAA TATTCCCTGT CATCACAGAT ACAAGAGAAG TACTAATCAC TTACTACGTA TCTTCTGACT ATCTACCTCT ACTAGATAAA GGACAAACTG TAAGATTAAA ACTGGAGAAG ATTrGGAAATC ACGGCACCAC CATCATCGGC CAACTTCAGA CAATTGATCA AACTCCTACC AGAACAGAGC AAGGAAATCT CM~AAATTA 840 900 960 1020 1080 1140 1200 1260 1320 1150 ACCGCTCT'rG CAAAACTATC TAACGAGGAT AGTAAACTCA TCCAATATGG CTTACAAGCT CGCGTCACTA GTGTAACTAC AAAGAAAACA TA77"rTGA'N AITrCAAACA TAAAAT~rMA ACACA7TCTG ATTAArTTC AGATAACACT C'rATAACTAT VrTTATCTT ATCAAAAAC ACAATCATAA CATGGATAAG AAACAAAACC TAACTTCA'rT TCAAGAACTA ACAACTACCG AACTCAA'rCA AATTACAGGT GGAGGATTGT GGGAAGATTT ATTATATAAC ATTAATAGAT ATGCrCATTA CATCACATAA GAACTTCATC ATCCAATACA ACTATAAAAA AA'rAAGACCG 1380 1440 1500 1560 1620 1680 AGAAACAAGT ACTCTCGGTC TTATrCA TCATI'CTGTA TGTATCACAG ACGAAAGACT TGATTTTGAC AGGTCGTATT TAGACTGGTA TTAGGATGGC CTTCATGACG GTATAGAGAC CAACrCCTC'r CTCCTCCCCT TTAGAACTGG GAAGAT7TTCA GAAA'rATCGA TGCCCTCTTC TNTGATGGAG T=TCGATGA CTGTCCTCCA 7TTTT'TAAAA AGGCGATTGA AACATGAGGT TGACTAGCTT TTCAATAGCA TTGTCACAAA GGATAGACAC CTCGACCTGA ATCTCCTCAG TAAAAATTTC CCTGCTAGAA CAGGTCATAT TTATTGTTCT CTC'r''TC TGCTCCATAT GGTATAATCA TGACGAAAGC GCGTTCCATA TCTCTATAGC CTTCA.AATAG GTATCCAAT'r GATGAGATGG CGAACAGTCT TTCCATCACT AGATAGTAGC GGCT1'TATCG AGAATCTCCT AGCTAGAGAA ATGAAGAAAT GAACT'rCGAC
GACTTTTGAG
GCAATTTCTG
CCTCCTCTTC
TCCGTACTTC
GCAGGGCCTG
TCTTGATAAC
T'TGATTGAAT
CACCCATTAT
TTCTCAAGCT
TCAAAATTAT
AATGGTTAGA AAATCAAGTA ATTAAAGACA ATGT'rCTTAT GGCTTTATCA CGAATATTTA ACTGCAATCC TTTAAGACGG AATGCCCAGA CGTrAAGCTAG CTTGTAAAGC TCCTCTATAT TAAGTACCTG 1740 TTTCCACALAT 1800 CTCCAAAGGA 1860 TAAAGGTCTC 1920 CCACACTCGC 1980 GACTCATCCC 2040 CTCTGGCTTT 2100 CCAATCTGCC 2160 AGCCATAGAC 2220 'rCAAGAGGTr 2280 GCCGACTATA 2340 AGAGTTTTTC 2400 AAAAGACTAG 2460 AAGACAGACT 2520 AC7=TGAAA 2580 CTrTTcCC
CCCCATAAAA
ACTTTGTTCA
CCAGTTrAATC
AGTAAAATCG
TATACATAAC
AATCTCTCAT
AAGAGTAGGT
TATTCAAAAA
TGAGTCAGGG
TAGTCCAACC ATTCAAAAA CCAGTA.AATG AGTAGCCATC GAGACTCCTC TATAAAAGAG AGTTTTTTAG GAAGCCCTCT ATATACTTOC CCTTGTCCCA AAAATGGAAG CACAAAATAG
ATTCACCAAT
CAATAATAAG
ATCATTGGAA AGAGACCATA AAAGAAAAGG AAAGATAAGC CTATGCCGTA CAAGGGTTCC ATAAAATAAG ATAGGTAAAC 2640 2700 2760 2820 2880 2940 3000 3060 3120 ATTTCCTACT ATATAGCTAA TCATCACAAA AACAAAGGCC AACAGTATCT TCAAAAGAAA GGCCTTAAAA ATCCTCTCGA AAGTAAGATC AATTCCATCC ACCTTAAAGA AGATGACAAT TTCTAGTCCA TTAGTAACAA GTGTATACAA CAATATCCAA GCAATGTTCA TAAATTCTCC TAGCTCAGTG TAATTTATTG ATGGCCTCAG ACACTTCCCT GACCTTATAA CGGGCGATTA GACAACTTCC ACCATTGGGA GAGAAGAGCA GTT TrCTr-r CTTATCCAAA TGCACCACAT TT-GCAGGATT GATGAGAAAA GAGCGGT
INFORMATION FOR SEQ ID NO: 191:
SEQUENCE CHARACTERISTICS:
LENGTH: 10357 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 191:
CTGAATCAAG TGTACTGCAC CAGTTCGTGC ATCAGGCATA ACAACATCTA CAGATATAAT ATTGTTTTCT GAGTCCGCCT AGTAGC'TCT TCAATTTCTT GGAATGACTA TTTTCATCAT CATAAGTTAA AATCATAAAT TTTTCGATAT GAATCATTrC ATCAGAAACT AACTCCATCT TTTTGTAGGA AGAATGTTGA TTAAGATAAA CTGAGCATAT TCAAATAAGT AGCCACTCTT ATTrTTTrTGT ACCAAAGGAA AAGTCGCTTC TTACCCTTTA TAATTAACAA TACTTTCCCA TATTTTCTG AAATTCTAAA TATCCCCAAG TCTGTCCTGC TAATTGTAAT TTATACTCAA TGATGCAAA'r GCAGTATCAA TATGATTAGG TCGCGTCCAT GCATAACCAT CATTGTCTCT CTTTTTCTA GACGTTCATC TACATAATCT T'N'TGCCCTT ATCTACAATT T'rGTGCCT CAAGCGAATC AAAGAGATCC TGA'rTCAACA TCCTCCAAAT ACTTTTTAAT GAATTATACC ATTTTCTTAA AGAAAT'rACT TCTTTTT AAAGTTCTGT GTCAGAGTAA TTTAGAAAAT TATATCTTCT
TCCAATTTTT
GAATTGGAAA
GTGTATTCAT
ATTGGTTTGT
TATwrrGTTTC
ACAAATCTGC
TCGACACTAT
TCATCAAAGT
TAAT TCTTCC
ACAATAATTA
ATAGTAAA-AT
CAATTAAAAA CTGAACAAAT TTATTGGGAA AACCGTAGTG TAATAT'rCCA GATTCAATTC AA.AAAGGAAA GACTTCC?1'T CGTGCCTrTC GGTAAGCTAC TC'NGTCTG ATAAAATCCT CTAT=TCCA ACCGACGATA ACACCATCTG GACTAGATAC ACCAAAACCT GTCAAGACTG AGTGCTTGTC CAAATCTGCA CGGTAAT TGC CGGCATAGAT GAATCCCTCC GCCCCTTCAA TCAAGCTTAC TAAAGGAATC AAGGCGATAT ATTCAAATCG CTTTCTGAAA ATATTTTAGG ACTATAAAAC TGACCTTTCT CCTGCAAAAG CTCTTACTTG CTACTTGTI'T GATTATTTTT GAATCGGCTC TCCTTGGTGG AGAGCTTTTA ACACCGCATT.GAAGCGTTCC AGATCGGCTT GGATGTCGGC CACTTGATGA AGTTGCGCCA CTGATTrCCC TGTCACTCCA 'rTGATGGCAA 'rCAACTCTTT CTGGCGCTCA A'rTCCTGTGG CTGTATTTGC CAAAAATGGT TCTACAAAGT 780 840 900 960 1020 1080 1140 1200 1260 1152 TGGCATGTTC ATGAGCCAGG TCTGGGATAA TCAAGCCCTT CACAGCTGTA TCAGCCAGAT 1320 CTTTGACAAA GTTC'rCCACA CCGTACTGAA AGAGGGGG?1' GAAGTAGCTC ATGATGACCA 138B0 GTGGAATCTC TGTIrCAATG GrTNCAAGG ?N'CAACTAA AGCCTGGGTA GAGGTCCCGT 1440 GGGCTAAACI' GCGCAAGCC-A GCTTCTTCGA TAACAGGTCC ATCTGCAACA GGGTCTGAAA 1500 AGGGAATACC CACTTCAAT GCAGAGACAC CCAAATCTTC TAAAAAGTGA ATTGTTTCAG 1560 CAAGACCCTC CAAACCr'rrC TCGTGGTCAC CAGCCA'rGAT ATAGGGAACA AAAATTCCTT 1620 TTCCAGCTGC rI-rAALTAGCA 1-rTAAT T 'GTTAGTGT CTTAGGCATG AGCTTCTCCC 1680 TTC7"rTGCTG CATCTGCTTC CAACGTCC ?rGACTTGAA CCACATCCTT GTCCCCACGA 1740 CCTGATAGGC AGACAATCAT AGACTTTCT GGTCCAAGTT CTTTGGCCAA TTTCACCGCA 1800 AAGGCGATAG CATGGCTAGA TTCCAAGGCT GGGATAATCC CTTCCACACG AGACAAGACT 1860 TGGAATCCTP CCAACGCCTC TTCGTCTGTC ACAGGGACAT AGCTGGCACG TTTAATATCG 1920 TGGTAGTGAG AATGCTCTGG ACCGATACCA GGATAGTCCA AACCTGCTGA GATAGAGAAG 1980 GCTTCAAGAA TTTGACCATG GGCATCTTGG AGCACATCCA TGAGGGAACC GTGAAGGACA 2040 *CCTGGACGAC CCTTGGTCAA GGTAGCTGCG TGGTGCTCTG TATCCACACC AAGCCCTGCT 21.00 SGCTTCAGTTC CATACATAGC TACTGACTCA TCTTCTACAA AGGGATGGAA GACCCGCATA 2160 .GCATTCGACC CACCACCAAC ACAGGCTACT AGG~CATCTrG GCAGATCTCG ACCTGTCAAG 2220 GTTGT'rTAGC CTCTCGACCG ATGACACTTT GGAAGTCACG AACGATTTCT 2280 GGAAATGGAT GAGGCCCCAA GCCAGAACCA AGGATATAGT GGGTATCGTC GATATTAGCC 2340 ACCCATGAAC GAAGGGCTGC ATTGACCGCA TCCTTGAGCA CGCGCGAACC ATCTGTTACA 2400 S 
*GCCTCGACCT TGGCTCCCAA AAGCTCCATG CGGAAGACAT TGAGGGCTTG GCGTTTGACA 2460 TCTTCCTCAC CCATGTAGAT GGTACATTCC ATGTTAAAGA GGGCTGCAGC AGTTGCAGTT 2520 ***GCCACACCGT GCTGACCAGC ACCCGTTTCT GCGATAATTT TCTTITrTACC CATGCGTTTG 2580 *GCAAGCCAAA C=rGCCTAA GGCATTrGTTA ATCTTG'rGGG CTCCTGTATG GTTAAGGTCT 2640 TCCCGTTTGA GATAAATCTT GGCTCCGCC-A ATATGCTGGG TCAAGTTTTT TGCGTAATAA 2700 AGAGGAG'r'T CACG'TCCTAC GTACTGGCGC AAAAGCTGGT '1?AATTCCTC TTGGAAACTT 2760 *-GGGTCTGCCT GACTTTCACG GTAGGCCT-C TCCAACTCCA AAACTGCTGT CATCAATGTT 2820 TCTGGGACAA AACGTCCGCC GAATTTTCCG TAAAATCCAT CTTATTTGG TTCCTGATAT 2880 GCCATGCTTT ACCCTCTCTA TAAATCTTCT AATCTTTTCA TGATCTTTTT GTCCATCTGT 2940 CTCCACTCCG CTCGATACAT CTACTGCATA GGGAGTAAAG TGTTGAATTG CTT1TACTAC 3000 ATTATCTTCA TTAAGGCCAC CTGCGATAAA GAAGGGCTGT GCTAGTCCAG TCGTATCCAG 3060 1153 TGACCCCAA TCAAAGGGCT TGCCTGAGAA TTGGGGACAT AGGCAAATTC TCAAATAAAT GCCCACTTCC TGCCACACGG GCATCAAAGA GTAGATAATC GCCCATTTCC ATCTACCTGC ACACCTGAA TACTG.GCACA CATCTGCCAC C'TGACCGTGA ACTTGAACCA AGTCCAAGCC GCACTTCTAC CCGACTTGGT GAAACAAA'rA CTCCAACCTr GC7'N'GCCAA CTCAGCTGCC 'rTTCTAAAG TCACCTGTCT AACCCATATA GTCGGCTCCT GCTGAAACGG CTGTr'rCCAC
AACTTTGTCA
TTTCACATCT
TI'TACTAGGT
ATCGCTTCCA
GCAGGAATAA
GCAAAGACAA
CGCTTCT'rrG GTCGATAGTC CACAAATrTr TCTGCGCCAC ATTTTC'rCCC TGCATAAGAG GGGCTAGTCG TTCCGCATCC TGCCCTGTGA CTTCCTCAAA GTAAGGGCCT AAATCTACAC AGTTGCGGTT CTTGACCCCG ATAATCTCAG CTAGATTGTG AGTCTCCACT AAGACTTCCA CCT'rGAG~CG TTCTTCGGAC AAGGCTGCCA TGCGAGCGCG GA'rGATTTGC Tr=CATCGA CTACCTGACT GGAAATTTCC CGTAGATAAT TCAACACCGA AATCATCACT GCTCCGTTTT CCACATCGAG ATTGATATCT CCCAAACTAG GCAAGCGGTC CTGATGATTC TTCAAAAATT GGAITTTGCTC CAGCTrCATC TGCTCCACCT AAAATTCCTG ACTCATTTTT CGTACTCCTG TCTAGCAATC ACTrGACGGG CCAAGGCAAC AACCTTTGTC AATCTGCAAC TCCTTGATTC CTCCCCTAC CAAAATTCCG 7TAAAGTATG AAATGGCAGA TTCAGAAATG TAATAGCGAC TGGTCTGCAA GTCGACCTCA AAGGTAGTCA CACCAAGTCT GTGGGCTACC TCTAGTTCAG GACCAAGCTC TGTCGCGTAG TCATACAGTT
CAATGAGCAA
TGATIAAAGTC
CCAAATGCCC
CTTCATAAGT
GATAACTGTC GCACCTGCAT TTTGT'rGAGC GTCGGAATCT TTTAAAGAAA ACCTCATCTG CTGGGCCTGT TGCACAA'rAT 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 GGCTAGCTTT CTTGACCTCA CTGCCAAGCG ATAGGTCTGG CACGCGCCTT CTGC-rCTAAG
GCGATTACCT
CGCAGAGGCT
A'rTCGTGCTA
TAACAGTCTC
CCCTTCCTTG
AGCATAGAAA CCAAGACCAG CATTCAAGAC TTTCAGAACG CTAAGCAAAA 'rTTCTGCATT CATAGCATAG CCTCCATTC CCAAATCCI'C ATTTCAAGA AGTGCAATCT TGGTTGTTCC TCCAGCAACC ACGATGGCAC GTTTGCGACC TAGGAGTTCT GGACGACTAA TTCCAAGAAG TGGACCAG1'C AAGTTCATAA TCGTTGGAAT TGTCGT'rTCC
TTCCTGAGCA
7rGGAGTAAAG GTTrCAAGCCA
CATATTTTTC
CTGTGrT'CT
TCCCAATTCC
AGTTTTTCAA GGGCCTTrCCC ATGCTATCAA TCTTACCATT AAGAATGCAC T'rGCTTCGTT TTCCCACCAC GAATATCTTC CTTGACAAGC TGATTTCGCC GCTTCATCCA ACCCTTCTGG AAAACCTGAG C'rGTACTT-rC AAAGCCATTG GATGAATCAG AAACGAGCTC CCATGATGTA TTTCATAGCT GGGTGCATAT TTTTAGCGAA GAGAAAGACG ATTCCAGTTT TATCAAAGAC 1154 CTTACCTAGT TCAGCTGGTr TGAGGTCAAG ATrGATTCCC AAGGCTTCGA GGACATCTGC GGAACCAGA'r TTAGAAGATA TCGAGCGT ACCGTGTTTG GCCAI'GTGAA TACCGCCACC AGCCAAGACA AAGGCTGCAG rrGTGGAAAT ATTAAAACTG AAAGACTTGT CCCCACCTGT ACCACAGTXG TCCATGGCAT CATGAATCTC AGTTGGAATA TGCTGrGGCAr GTCCTCI'CAT GACTTGGGCA ATGGCTGTGC GTTCTTCAGG TGTTTCCCCC TTCATCTTAA GAGCTAAGAG 4860 4920 4980 5040 5100
GAGAGAAGCA
CATTTCCACA
AGT?1'CCTCA
TTCCAATGCT
ATCTGCGCTT CAGTTACACG CCCAGTTACG ATACGCTCAA TCACATCCG;T CCTGATAAA'r TTTCAAATTT TGCTAGT'rmr TCAATAATCT CTTCA'rCCT CTT'rACAACC TCCTCGATAA AATTCCGAAT AGAAGACAAG CCGTrC'GGCG CTCTGGATGG TACTGGAAGC CATAAATCGG TAGGTTTTA TGTTGAATCC 5160 5220 5280 5340 5400 CCATGATGGC T'rGG'rCATCA GTCGAACGAG CTGTCACTTC AAAGTCTITCT CAATCAAAAT ACTGTGATAA CGCATGACCG CACGGCCATC CTCAATACCT CAGATGGCGC T'rCAAAGTTG ATATTGCTCT CTTTCCCATG CATGACTTT= CTAGCTTACC ACCAAAGACT TCTGCAATCC CTTrGGTGGCC CAAACAAATC GCTTCT'rGCC TGCAAAATCA CGAATCATGT CTTCrZATCTT TCCAGCATCA
GGCATTT~CCT
TGATACAAAA
GGAGCCAAAC
CCAAGAATCG
ACTGGCCAAC
AGCTTGGAAT
AAGTTATAGG
TTCTAGTCAT
CGTAGACAAT
CAGGACCAGG
CATCATTTCT
TAAAAGAATC
ACATTTTGCT
AGAAAAGACC AGACCATCTrG CAGAACCTGA ACTTCTGCAA ATAGTTGTCA ATCAATAAAA TTC'rTAATGG TT'rT'GGTA
CTTTTTCAGC
AATTCCCAAT
TCATGGTCTT
TTCCCTTTTGG
TTCT'rCATAC
GTATTGGGCC
AGTTCTCCAA
GCGATAGAGT
CCCTGCCCCA GCCTGCACAT AGCCTCTTTG ATTTTTGAGA ATCATGGrrC GGATGGCGAT GGCCAAATCC ATATCACCCG TCGCAGACAA GTAGCCGATT GCCCCAGCGT ATACTCCCCG TTTT'rCCGTT- TCCAGTTCAT AGA'rACGTCT CA'rCGCTCGA ATCTTTGGTG CTCCAGAAAC GGTTCCAGCA GGAAGCGTTG CTTTCAAGGC ATCCATGGCA GTGAGTCTC GAAGCAAACG 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 CCCCTTGACT ACGCTGGTCA AATGCATGAC GTACCGAAG AGTGACTTGG ACACTGGTCG TTTCAGAGAT GCGGCCAATA CAACA'rrCGA TGTrTGCTG ''rcCCrCTC ATCAGAGAGG -G*.CTTCTTCA TCCGTAGCCC CTCTTGGTCG CGTCCCTGCA GCCA'TrrTG ACAGAAACCA AACTTTCTGG ACTAGCTCCG ATCATAGAAA TAAAGGTAAT TAGAAGGAT'r AGTCACGCGG TGGATTTCCA GTAACTTCTG CTGAAAAACG CTGGCTGAGT GTTACGAATC AAGTCACGAG CTGTTTCTAC CAT'rCCCTCA AGCTCCACTT CCATATACTT 'rCGT'rACGCC CCAAGTCTAC AGCTCAGTCG CCAAGGCCT'r ATCGGATTGG TTGTCACGAT ATGATTTGAT AATCCCCAAA AGAT'TTCTGT AGAAGTCAAA ACACATTGGA ACATATCTCC AACTTATGTG GAGCGATATG 1155 CGGT'TTCAAG TCTAACGGAG ATAGATCCAA ATCrrCAAAT TCATTTGCAG CAGGAATGCG TAA'rTCCTCA AGCACTTGGT 'rCAAGGATTT rr1CCAAGGCC TCrrGACTGC GCTCACTATA AAGTGCATCC TCTATGACAT GTATCTTCTC CTTCrGTGG 'rCAAAGACCA TATAGCTCTC ATAGACAAAG AAATGCATGT CTGGCGTCCC AArrGTATCC TCAGGGATr TTCATAAAGC GAAATCATAT CGTAACCCAC AAAACCAA'rG GCTCCACCAC CTCTGAGTGG TGCTGACTCT TATICAATCAC ?rCATAAAGG AAATCCAAGG
AATCACTTGA
ATAGGCTAGG
CrrATG7TTGC
GATTCGTTCC
CCATTTTGAT AGAGAACCCC ATTTCAAAC TTA.ATCTCAA
ATAGAAAAAC
CCCTTTAAGC
ATTGTAATTT
GACCTGTTTC
GCATATAAGC
CCerrCAGT
TTGCGTGAGG
CCTGTCTCAG
CTTGTCTCTC CGAATACTCT CAAGATTGGT GATAAGACAT TCTACTrTCTA GTCCGTGGTG
GACCAATTTC
CAAAAGGTAG
GATCCCGATC
AAACTGGATr
CTAAAATAAC
CTCCATGAAT
ACTGTATGPA
TCAATTATAG
TGCGATAAAG
CrrACAGC'r AAATCCCCAC GCAAAATAAC GATTTCTCCT ATCTCTrCATT GGCACTCCCT TGAGAATGAT GTTTTCrrCT TTCTCTGCTT GTTTTCAGCA ACCACAAGCT ATCTAT'rATT TTTAGCTTC 'rAGTAGTCTG CAGAGACATT GATGAAGAGA TGGTCATCTC TCAAACCGCG TCAACGTCGC CTTGCCGTAG
ACGAAATTCG
ATATCTCCTG
CTCGTTTCAG
CTCTGTGAGA
CAATCGCAGC
GCTACACCT
GTATGGTT=AC
CGGTGCCACC
TAACAGGCTC
ATGAACCCAA
GAAAGAACTG TAATTTTTCC TAGGTCCTTG CCTCCACGAC TATAC'TCTTC GAPAAATCTCT 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 GCAACCTCAA AACAGTGTTT TGAGCTGACT TCGTCAGTTC TGTTTTGAGC TGACTTCGTC AGTTCTATCC ACAACCTCAA TCGTCAGTTC TATCCACA.AC C'TCAAAACAG TGTTTTGAGC GTTTGCTCTT TGATTTTCAT TGAGTATTAC TAGCTTT'N'T
TGACTTCGTC
TATCCACAAC
AACAGTGTTT
AGCCTGCGGC
CCTATTAGTC
AGTTCTATCT
CTCAAAACAG
TGAGCTGACT
TAC?'rTCCTA
CAGCCITTT
GTTTGCTTT AGTAGTAGGC
AAGGCCCTAA
CT'TCCGTGTG
ACATCTCTGC
GT'N'CAAATT
TAAAAGATAA
AGGGCGTTGG
CTTGTTTAAC
CATTTTCAAA
ATGGAGCTGT AGATAGAACT CAAGTTCATC AAAGCGACTT ACCAAACGAC GGATAGAAAA AAGCCCACAC ACAGAATArA TAACGCGGTG CCACCTCAAI' TATA.AAGGGA CTATCCCTTT AACAAGCTGC ACTGTAAG GTGCGCACCG AATTrTrCATT ATCAGCCCAC TTTCACTACT TCCAACCACC TATTCACAAT CACCACAGGC TCCCTGAAGA TCAAAAATAG ATACTrTGTT T'TTCT7r CAAGACTTTT ATGACCATGT CTTCCTTAGA TCGAACATGA TTACTTTTCT GATTTGTTGA ACTTATTTTA TTACGATTTT TTTGAAAATA TCATTCGAAT ACATGTCCC-A CTTCTTAGAA A'rTGGATCCA 1156 ACTCAATAGA AACTGAATGG AGGCTAAACA GAACTTATTT- TAGAACACTC ACTAGGAT'rT TCAAGAATTA AACAATACTA GAAACTCTGT CTCCTAACAA AACTTCAACA GATGTGACAC TTT'CCCC~rr AATAAr-rGCT AAAACACCTT
TTTAGCCAAT
AATAGGAATA
ATAGCACGCA
ATAATTTTGG
GCCATAAATG
TrAACATAAT
TTATCAAATC
CGAATCAAGC
GTACGTCAAGT
ACTTTTAAAT
TGGGAGCAAT TGTAGACAAA GCTGGAGTAT CAATGATAGA AA'rATCATCT GGAATAAGAA CCTGAACCTT 'N'CATCTCCT GAAACAAAAA GCTGCA1rGC ATAAGAATAA ACTGAATCAA CCATAAAGTA ATT =ATCA TTCAGAAAAG CATGGGA'rTC TCCTCCCATA AGCA.ACCACA
CATCTTTT~CC
ATTTAGGAGA
CTATCATTC
AATACTGAGA
TTCCTq-rCTC
TAATGTCCGG
TTGTAGATAA
8400 8460 8520 8580 8640 8700 8760 AACGCACACC 8820 TAT'T'rAAA 8880 TCTT'rCACGA TCCTTATTAA TrTCrrCA GTACAGCTT ATAGACTACT CCAGACGTTT TAAATACTGA ATTCTGATAT GCTACCI'GTG CTTGATATAA TCATCATATC ATAAGTAGCT TGAAAA'rTAT TAT'rAGATAC GAGATTCACC GAAAACAAGA AAAGGCATAT GGTTCTTCTT CATCTACACT 'rTCATAAAAA ACAATAACAC CATCTACTAG TTGAATTACT AATTGTATCC TCCTCTCCAA AGTACTCAAC TATAGCATTA ACACCAAATT CTTTACACGT CCAAATTAAA GGAAAAGAGT CGATTTTTTT TTAGTTAAA TTI-rTTGCAT ACGCATTTGG AATCGCTTGA TAAGTTTCTT CTT'rAACAT'r
CCGTAACACT
TACAGAAATC
AATATACCAC
TACTCCACCA
TTATCTAACA GCGTATGAAA AATATATTTA TAGCTTCTTT AATTCCTCTA TAACTTTTTG TTAATAACTC GTGAAACTGT GTTACCTTTT TCCCATTTAT ACTATACAAT ATTTAATTTT TTTTGGAGAA AAACCTGATA AACGTGCAAT ATCATAAATA ATTTTCATT TCAGTCCTCC ATTACGAACA TTCTAATATT TTTTAACAAG AGAATTTAGT AAATTATTTA AGATCCACAA ATTCACAAAA 7TAATrTTAC APLATATT-CTT CCCCTTCAAA AAAG~rTAAA TTGCATT'rCA CACCTTA1TT rAAGAATG TTTCCAACTr CACGACAALAT AAATTCATAT GAGAAAAAAC TGCCATAAAA TTGTAGATTA AC7rTTCAG TAAAATG"GT AGGATTTATA AAAACATATA ATAGCCTGTC AATGTAACAT 'r'rlAACATAG AGTTAATTTT T'rCTTTAAAG ATAACATT'rG TTATcAACTC ATCACGAGT AAATGAAAGG CAAACACCAT TTCACAAATA TCATAAAAAG AAATAAATTT CTATACTTGT ATCAAACAAT TATTATCAAA ATATTCTATT TTACCTAAAT CAAAATTGAT TTTATAATCT TTCATAAAAA CCTCTGAGCA AAAATCTACT CAAAAATTAG ATGATTAAAA CATCTAAAAA GCAAAAGGAC AAAAACATCT CTCCCTTTGT T'rACTAAATT TCAGCTAATT TCTTCGACAT AAATAACACC TACAATATTA GCAATTTCTT CCATCAGTCG AAGATGTTCA AATCTACCTG ATAATTCCAG AGTAATAAAT GACGCTATTT 7rTTGTCCGG AACATCAAAG TATTCAATTC 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 1157 TGTCAGAA'rT AACATCTCCA AACGCTGTTC TTGAATCGGT CA'1-'CTGATA CCATrrTCTG CACAATAAAC CAATACACGA ?1'ATAGGC?1' CGTAGATT AACCACTATA TACAATTCAA TCATTTTAGA ACGATT'rGC AGATATIrMTT ACTGGTTG GAACATGGAT ATCACACCCC AAACAGAAAT GGCTAC'rAAA AGAGCTCCCT CATAAGG INFORMATION FOR SEQ ID NO: 192: SEQUENCE CHARACTERISTICS: LENGTH: 6867 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 192: 10200 10260 10320 10357 CGGGACATTC TCAATCTTCT AATAAATTCA AAAGGAGTTT TATGTTAAGG TCAACACGCG CAGGTTGACT TCGCAACAAA GTTTACTATC TACCGAATGG ACACGTAAGA TTGGTTI'TAT AATCCACAGG TAATTGAAAA AAACTCGATC CAGCTGACTT ACAGATGGAA CAACCTTGCT GCCATTGAAT ATCTAACTGC GTCT-TTGTT TTTCTCTTCT TTCTATGATA CAATGGAAAA TTT ATCACT TATCCAAATC T CTT GCACCG CTTCTTAACC CTCTGATGAA CACTCTACTA CTACTCCAAG TACACAGAGT TcTCCTAATT~ CCTGAAATGA AACGTGTTGG ACTGCAAAAT TTTTCCTATT GGAACCTTGC CAGCCAACGA TCCGTCTTTA ATCGCACATG GATACTGCTG ATTTTAATGC TGAAGGAGTC 
CTACGATGGT GGTGTGATTG AACTAGGGAA TTCTGGTTTC CAAGAGTCTP GAAAAATATC CAGGACAAAC GCTCATCACA AGGTGCTGAT GACAAGTCAG GAATTGCTGA AATTATGACA TCATCCTGAA ATTAAGCACT GTGAGATTCG TGTTGGTTTT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 GGTCCAGATG AAGAAATCGG TGTTGGTGCC AATAAATrTG ATGCAGAAGA TTTI'GATG;TG GATT'rTGCCT ACACTGTrGA TGGTGGTCCA CTAGGTGAAC TTCAGTACGA GACTTTCTCA GCCGCTGGTG CTGAATTGCA TTTCCAAGGT CGTAATGTCC ACCCTGQTAC TGCCAAAGGG CAGATGGTCA ATGCCCTTCA GCTAGCAATT GA'1TTTCATA ATCAACTTCC AGAAAATGAC CGACCTGAGT TAACTGAAGG TTACCAAGGT TTTTACCATC TAATGGATGT GACAGOTAGT GTTGAGGAGG CGCGTGCAAG CTACATCATT CGTGATTTTG AAAAAGATGC CN~TGAAGCG CGTANAGCAT CCATGCAATC TATCGCTGAT AAGATGAATG AAGAACTTGG GAGCGACCGT GTCACTCTCA ACTTGACAGA CCAGTACTAC AATATGAAAG AAGTCATTGA AAAAGATATG ACTCCAATTA CCATTGCTAA AGCCGTTATG GAAGATCTAG GTATCACGCC TATTATCGAA
CCAATCCCGGG
ATCTTTGCAG
GAACGTGCAG
CCAGCTAC
1158 GTGGAACAGA CGGCTCTAAG ATTCCI'rA TGGGAATCCC GTGGCCAAAA TATGCACGGA CGTNTCGAAT ACGTTAGCCT 'rTGATACCAT CATTGGCATC GTAGCTTATA AAGGCTAAAA TTCGCCTrrC TTTTTATTCT ACTGG7TTTT crGA'rTrCC
AACTCCGAAT
TCAGACTATG,
AGACGACOTA
AGTAG'rTGTA GAAGA'!rCTG TTGTTTCATT
CTTGGTTTGT
rrCTCACCAC
CTTGCGCTAG
'-T~CGTCGCT
TGGTGTTATC
CTTTTGACTrG TTCTGAAGTT GATTCAGCAG G=rAGAATC TCTTCTATTG AGCAGTTTCA ATGTTAGATT CTGCAGTTGC GTTTGGTTGG ACCATTTGCT TCAGCATTTC TTGCTGGACT TGVrCTrCA GATTTGATGA 7ITCAAAACTA GAATAGCTr TGTCGATTCA AGTAAAGCTG TTTTGTCTTT ACTCTTAGCA TCAAAGTCCG CATCAGATCC ATTATTACTT TCGTAGAGTT 1TTGATAGAG TGTTGAAGGG CGCTACCGAT AAGTCTGCAT CAGCCTTGTT TGTCTGATTC CTTCTTCATG TCTACTTCTT CCTTGCTATC
TACAAGTGTC
ACGAGTCAAG
TGTGGCAGCT
GATTTGTTCC
TGTCGCAGAT
GAAAGTTGAT CTAATA.ATGC ATCCACCTTA TCTAAATAAG AGTGAAGCGA CATGAGAATA 'rGAGGATCTT GCTCAGCATT TTCCN'TCT ACATCTTTTA CCTGACTGTT 'rACTTCATCC TTrAGATTTT CTACTTCTTC TGCCAAGGAT AAGAGTTGAT TTGCCT'rGCT CAAAAGACTT
TATTGGTI'GC
GCCTTACAGA
CACAGCAGCA
AACAGGAGAG TTATAATCCA AGATTACAAG cAAT6;rTCGT CCAAGACGCT ATTcGcTTcG TTTAAAGTTT AGGTGTCAAG ACCTCTTTTT AGTGTGCCCA TCAACTAAC'r TTTATTTTTT TCAAACTTTC AGTAAACTGA TCTTTGTAGA TGCTTCTGCT ATCAGCTAGA AGTTGATCTA TCATCAAAAG T'rCCAGGTTG ATAGTTGGAT 'IGCAGGGATG GCTTCATATC CCTTAGTTTC AACCTTGATG TAGTGATTGT AAACCTTCTG AATCTTCACT TATAATTCGA TTGGCATCAA TCATGATGGA CATGTAGTGA CGGATTACTT AATACAGAAC TCCGTGTTAG AGTGTGATC GATTGTTA AGAGATGACT -TCCCCATGTC 'rTACTATATA AGCATCACCT GTATCTCTCA ACATATGTGG CTGCTAATTC ACCTGCCGAC AAGTCACTCT
TATCTACCAT
AATAAGAA.AT
CGGATTCAAT
AAATTTAGAG
CCTAAAGCTA
CTTTTGCCAA
GAA'rCTTGTT
GGTCGCCATG
AACCATGACC
TAGAAGAACT
TAGGAATATA
CTACTCCTAA
CCAGATAAGA
ATGCTTTAAT
AAGTAATCAA
ACTCAATCTG
GACTGCCTTC
'rTTCAAAGCC
AGGAATCACA
ATCTTCTTCC
TCCTACCTCT
GTGATAGTGA
1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 CAATATCATT AGGGTTAAAG CACGAATGAA ATGATAGTGA CCACCATGTG GTACTATAGT AGATTGAAAT AGAATATGAG CAAAT'rGATA AAAGTAATTT CTAACAATGA TTTAGAAACT ATGA'rCTGCT ATrCTAAATT TATATAACCA TCATCGGTAG TATAACGTCC CTGTAATTTT GCTACAGATA
AGGGGATT
CAACTCACTA
CTCTGCACT
1159 AGCTCCTTTA TCGTC-rTTAC CATGTTCTTG 7rTTGGCGA TTGATTTCAT CT7"TGTTCG TACAI'rrCT GCATGAGCTr GATCTTTAAG GTAAACATAA TACN'TCCAT CTACCI-rAAT AATATATCCT CCCTTAACCT AACTGACGAT ATCTrGATCI' T'CGGCTGAT AG7"rGGGGGC ?rrCATTAAT AGCTCTTCAC TAAAGAGCGC ATCAAAAGGA ACTACCAT TATAGTAGTG ATAATGATCG CCATGAGAAG TrrACATAACC TrGATCTGTA ATCTTAATAA CA-rTGTTT TGCTTGAATT CCTTCTTTTT GACTAACCTA C'rCTGGAGTC AAATTTTCAG TCTTCTTAGT GTCTTTATTA CTGTTTACAT ATGAAACACG ATTTTTATCT GTATTGGCCT GTTAGCTATG TTGGTTCAGA GCATAAACAC ACAGACTTAA CGAAAGGATA ACAACAGATC CAGCTGCTAT ATATTTCTTT TTAAA'TTTCA TAATTACCTC ATTTCTATAA TTATTTATAT GATGTCTTCA TTATTAAATG ATTAAATAAA T'rAATTAACC AT'AAT'rAA CTAGTAAATA TT-CCACCre'r TTTAAGTTG TATGTCAAGA AATTTTATAT ATTAATAA'rA AAATGAAATT CTCCCAAAGT CAGAGTTTTTA TTTCTAACTT TTGAGAGAAC TTCAN-rG AT'rCAGACTT TTTCTACTGC TATTCCTTAC GCTATGAGAT CAGATAAATT CTTTTTATC ACTTCTCCAC TAATTCAATC GTTCCATCCA TATTGAATAT TGTAAATTTT TCTAATrTT CTTGTACAGG A'rCTAACATA GGGTCACTCC CCACATTCCC GTTTTCTGGT TTT'ACAGGTT TTTCGTTTGG TrCTGTTGG1' TGGTTCTCAA CTGTTCCAGT TTCACCATTT CCTTGAGGTG CTTCTCCTGT TCCCGATGGT AAATATAATT CAATTGT'rCC CATCCCATAA CTT'rCAGCAA AT'TTrGCTAC TTCTTCTAAC GTTGAATTAC TAGTACTAr'r AACACTATCT AAGCCTAATC ATCTACTGCT G4GAGCTTCCT TTCTGGATTC AAC.ATTCCAT TGCCTCTGGT AAAGAATCTrG AGATACTTr'r CCATTTTCAG AAAATCTGCC ATATTCTTT GTCCATATTA AAcAAGACAT TTTTTCTTGT ACAGGATCCA CCCACN'CA GAAAGTTTT
TTGGCAATCT
CGTAACTAGC
CTAATCCTG
'rATCCCT'rGA
CTGGTTTATT
ATGGTTTATT
TAATGACTTC
TT'rCTAGCTT
CTGTAGGAAC
CTTTTTCTAC
CTTGTGCTTT
3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 CTTCTCACTA GTCTTTGGTT CTTCTACCTr TTCATCAAGT =AAGTTI- ATT-CCTT'rrA AATTGTGGTA AGGTACTTCC ACAATATAAG AGGAATT'rTA TTTCCGGCCG GATTTCACCA ACTTTATAGT TCTGACTAAA GCATCAGTTC ATCCCCTGCA TGGAAAGGAT GAATACT'rGG TTTATCACTT TCGATTGAT'r GTCCAAATAA TTCTGGT'rGT TCCTTGGTTT TATTrTTCTAA ATAAGCATTT CTTTAGGCAC TGCAAATTGA AGAAAATCGT TTGACTGGCC TCATTTTCTT TrCCAAGAT GCATTTGCCA TGAAGGTTAC AATTTCGGAA TCGGTAATTT CCATGAAA'IT CATCAAACAC GGGT'rCACTC TTAAATAAGT ATTr'rGTAAG CTAAAGAGGT 1160 TGGAACTGTA AATGTACCAT CA'rAACTT-AC TTCTGGATAA TCTMGAAG CGA'rAGTATA 4740 CTAAATGTT 7VMCTIA AATAAGGTTG ATCTAATTCA AAC1rGCAA TATTCCCTAC 4800 TCC7?rCTCCA AATACTTTAC CAGATACTTT CTCCAATACT TrCCATCTG GTGTTA~rAA 4860 TTTTACTAGC ATATTGATAC CTAATTr CTCCAATTCA GGCGGAAAAC TAAAAGAAAC 4920 GCGTN"'GA CCATTGGCTA GACTAAAGTT TTGATTATTA AACGTACTAT 7TrTAACAA 4980 ATTAACAACA TTCGTTAATT~ CrCTCCAGT ATAAACTTTA TTCCCTTCTT TTTAGCAAC 5040 TCCTTCTTCG GGTTTAAACA GTTCATAGTT ACTGTGAGAA TGACCAATTC CAACCGGTTT 5100 ATGTTrCATCA ATCGGATCTG CATGATGGTG ATCTCCATGC GGATAAATAA TCGCATTTTT 5160 T'rCTTTATTC ACGACAATAC TTTCACGTTT GACACCATAT TG'rrTCATAA TCCCAGCAAT 5220 T1rT'rCirCG A~rTTTTAT CTAAATCTTT CATTTCTTTG GCATTACTTG GATAATCCTG 5280 TTCATGAGAT GACAAAGAAT CTAATCCATT ATGACTAGTT TTAACTTCCT CTAAATG'rr' 5340 TTGCGCAsCT TAATTTGCTC TTCTGTCAAG TCCTTCTTGA AGAAATAATG ATTGTGGTCT 5400 CCGTGACTCA TGACAAAACC TGATTCATCT TCAGCGATAA TACGA'rTAGC ATCAAATCCG 5460 .TATCCATCTT CTTCATG=~ CTCATGTGAA GTTCCTGGAT TGATTGGAAG AGATGGAGAA 5520 GGTG'PTGCTA GACTATTGTT TGGAAGAGTC GGTTGCCCAA TTTGATTTGA TTTGGAATG 5580 *TAATGGAAAT GATCACCATG TCr'rACAATA TAAGCTGTAG CCGTTTCTTC AACGATATCT 5640 TTTGGATTAA AAATATAACC ATCAGATGCT GAAGAGAGCT CCTTACTTGT CGTTAAAGAA 5700 GAAGGATTGC TTGAAAGACT GCCTAGACTA GACACTACTT CATTAGGTT'r TGCATTTGTA 5760 GAAACTGTAG AACCAGTTCC ACTGATAGGC ACCATTCTGG CAATCTTTTC ?,TCTAAGGCA 5820 GA.AAGCTTGC TGTAAGGAAT AAAGTGGTAA 
TGGTCGCCAT GCGGAATCGC AACTCCAT'rT 5880 GGTGTACGAC TGATAATCTT AGCAGGGTCA AAGACCAGGC CATCTGATTC ACTGTAACGT 5940 *TGGGCGCTAG GTGAATCATA GAGTTCCTTC AAAAGACTCT GGAGATTTTC AGATTTATTT 6000 TAGTTGATCC T?1'TGCTACA GATTGCGTCT TATTGTCACT AGCTGTTGAA 6060 GAATAGCTTA ACTGACTCGG TTGCATAT'TT TTTCCAGCCA GATGTGCTTT AGCTGCTGCT 6120 SAATTrCACTAG CACATAAATC CC71rrCGGA ATGTAGTGAT AGTGACCTCC ATGAGGAACG 6180 *.SATATAAGCAT TACCCGTATC T'rCGATAATA TCAGCTGGAT TAAAGACATA ACCATCA'rTT 6240 GTCGTATATC GTCCCTGAGA CCT'rGCTACA GCAACATTAG AGTTAACCrr CTCATTATCT 6300 TTGACATGT'r CPGTrG ACGATTGATT TCATCTTTAG 7rCGAACArr ATCAGCATGA 6360 GCTGCATCTT 7rACGTAGAC ATAATATTTT CCATCGACCT TGATGATATA ACCACCCTTG 6420 AC'rTCATTG.A CAATATCAGC GTCTTTAAGT TGATAGTTTG GATCCTTCAT CAAGAGTTCT 6480 1161.
TCACTAAAGA GGGCATCATA AGGAACTTC CCA'PrATAGT GACGTTACAT AGCCC'rGATC TGTAATT'N'G ATrACAAT TTCTGGCTAA CCTGGrCTGG TGTCAAGTTr TCACTrTTCT ACATAAGAGA CACGATTATT GTCCTATT TCCTGCGA.AC GCACATAGAC TCAAGGATAC GATAACAGCT GATCCAGCI'G TTCATAAATC CCTCAT1TTCA ATAAATGATG AAGTTTTTC AAATAGTTTT CTAAACCCGG GGGTACC INFORMATION FOR SEQ ID NO: 193: AATGATAGTG GTCACCGTGT CTC.AGCCTG AATTCCTTCT GACTTCACTG GCTGCCATCC GATGCTGG=r TAGTGCATAG CTATATAT'rT TTTACTAAAT TCAACTrCTr TTACTTTATT 6540 6600 6660 6720 6780 6840 6867 SEQUENCE CHARACTERISTICS: LENGTH: 999 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 193: CGTTCTAAAA ATGCAGTACG ACACTTGGAC GCGCTCAAGG GCTGCACAAC AACCAGAAGT AGTATGTCAC AAGCACAATA CGCCTTGTTC CAGGAACTGG CCACACGCTG ACCTTCGTCT TCATACGACG TTTTCGTTAA CGTCACGGTA TCGCTCGTGC
TTTGATTGAG
TATGAAGTTG
TCTTGACATT
TGCAGGTACT
TAAAATCACT
TGTCATCAAC
CGTTATAGGT
CCTTCTTCAA
AAATCAGTTA AAGGTATGCT TCCACACAAT AAAGTATTTG TTGGAGCTGA GCACACTCAC
TCAGGACTTA
GGACGTCGTA
GTTAACAAAA
CAACCATTCG
GGTGGATACG
GTAGACCCAG
TCTAAGGAAA GGAACAATAA AAAACGCTGT TGCACGCGTT AAGATGTTGA AGAGTACATC CAGTTACTTC AACTGTAGGT CTGGTCAATC AGGAGCTATC ACTTCCGCGA TTCATTGAAA CGCGCAGGAC TTCTTACACG TGACTCACGT AAAGTTGAAC GTAAGAAACC AGGTCTTAAG AAAGCTCGTA AAGCATCACA ATTTAGTAAA CGTTAATTCG AAAGAATTAC TATACT'rATA CAGAGCACCT TrCGGGGTGT TCTI'TTTA TACTTTCTTA CTAAATTGGT GCAATTGACA CAGTTGTTGC GACTTTAGTC GCTTACAAAT GTGGCTGCA.A CCTGACATGG TCAGTTGCCT CAAAACGTTA ATCAATACGA TTATATCAAC GrCAAAGC ACTCAAGGGT TTACCCTATG GGTGCTTTTT TCTATACTTT CTAAAAAAGT TTACCCTAAA ATTTGCCCTA AAATTACCCT ACTTATTTTT AAGATGT'rGG TAGGCAACTT GTCCAGCAGA TAATGGAACT ATGTTTGA.AG TATTAACATA AGTCTTAGTT GTAACGGTAT CGCTATGAGT TAATG CTT CA GAAATGGCTT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1162 CTAAGCTCAT TCCTGCTTTT TTAGCAAGTG TCGCTCCTG INFORMATION FOR SEQ 10 NO: 194: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 2315 base pairs TYPE: nucleic acid STRANDEONESS: double CD) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 194: AATATTATCA CTGTTCTTGA AGGCAGAACA CAAGCTGTCA TCCGAAATCA CTTTCTTCGC TACGATAGAG CCGTTCGTTG TCAAGTGAAA ATCATTACGA TGGATATGTT TAGTCCTTAC TATGACTTGG CTAAACAGCT TTTTCCGTGT GCTAAAATCG TTCTAGATCG TTTCCATATT ATCCAACATC TCAGCCGTGC CATGAGTCGT TTTCGTGTTC AAATTATGAA TCAGTTTGAA CGAAAATCTC ATGAATACAA GGCTATCAAG CGTTACTGGA AACTCATCCA ACAGGATAGT CGTAAACTCA GCGATAAACG TTTTTATCGC CCTACTTTTC GAAATTCTTG ACAAGATTTT AAGCTATTCA CAACTCTTAC TTTTTCACTT TCAGAACAAA GACAATCTGA AGCAGOTTCA TCCTCTTTTT AAAGAGAAAA TCGTCAACGC CCTTCAACTA AATAATCTCA TCAAACTTAT CAAACGCAAT AAAAXACGGA TTTTTATCGC TCTGAACATC
GAAGACTTGA
GACCCTGAGA
CAGACTGTCT
CCCTATTCAA
GCCTTTGGTT
AAAAAAGAAA
GCATGCACTr AACAAATAAA AACACCACTA TCAGATCTAT AAT7TTTTCGG ACTCATTGAG TTAAAACCTT TCTAAAGAAC ACGCCAAATT GGAAGCGACC TTCGAAACTT TGAAAACTTC GGACGAAATT TGTCCTTI'CT CAAGCTTAGC TTTTCTTCAA CCCACTACAG TTGACAAAGA GCCTATTTTC GCTGATTCTC CACTACATTT GACTGGATTC TAATTTTTTA GACAAATACA AAAGAGCTAG CTTTAGCTAG CTCTTTTCCT ATGCGGAGAG AGGGACTTGA ACCCTCACGA CCTAAAGCGG TCACAGGATC CTTAGTCCTC CGCGTCTGCC AATTCCGCCA TCCCCCCGTC GATTACTTTA CTAGTATATC AACTTTTGGG ATGCTTGTCA ACACTTTTTT TCAAATTI TCA'I=?CAC CAACCAGGTT ACTCAAAAAG TTCATTTAGA TTTTCATCTA CTAACTTAGC TCCGAGTGTA TTTTTGAAAT GACCTAGCGC AAATTGATGA TITTCAGGCC AGATGGAAGC AACAGCTGGT TTAACAATCT CGATGTCATA TCCTAGATTA TAGGCATCTA TAGCTGTATG TAGGACACAG ATATCCGTCA AGACACCTGT TAAGATAACG GTAGACACTC TACGCTCTCT CAAACGAATA TCTAGGTCAG TCCCTGAAAA AGCTGAGTAA TGGCGTTTAT CCATCCAAAA GACACGACTG TCTGAACCAT GCTCTTGATA AAAGATCCCC AAATCTCCAT ATAAATTCCG TCCACTCGTC CCAATCAGAT 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1163 TATGAGGAGG AAATAACT TA CTrrCCGGAT GGAAACAATC GTTrTCTTCA TGACCATCAA TAGTAA.AGAA GATATAATCT CCTCGTrCAA AAGCTAATCG AGTTACTG CTGATGGCAT CCGAAATCGC CTGAGCTGGA GCACCTGCTG TTAGTTrCCC ACTATCAGCA ACAAAATCTT CTGTATAATC AATCGAAAT AAAGCCTTTG TATTAGTAA AAXAAATATCT GAAATCAAGA CCT TAAG.ATA GGTrCCCTrC AATAATCCCC GCAGACTCAA GTTTACGAAG AGCATrGACA GATACGATCT GCAATCACTG ACGCAGTCAA CTTCCCTTCA AATTGCTGAA ACAGGACOGA GTTCGGAGTA AGAAAGGGTA AGTACGACGA CGAATATTTT TCTCATCTTC TTCACC'NCG2 TCTC rT=CT TCACTrCTTC ATTCCAAGTG AGCGACT'rTC ATCACAGAGC GAGTGAT-rCC TTCC.ATTTA ATTCCCCrAA TTGACCGCCA TGGTGACAGC AAGTTA.AGAA GCrGAATCCC TCTTCGAAT'r TTTTATCATT TGAATCGGTG CAATAGTCGT ATACTCA'rAT CATGCTCAAC 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2315 AACAACCGTA CTGGCAATCT CAACAAGAAC ACGCCAAATA ATCAAAGAAC CAAGGCGAAT CAAGCCATCT GGAAAATCAT C'rCTACTCTC
CA.AGTCCTCA
CCCCGATACA
AATAGGGAAA
AGGCAAGTTT CCTTc'rGTrT CGTAAATCAT ATT~AGCCCCT TGA.ACGTAGT AATCTTAGTT TGGAAGAATT GC'rtACCCGA TCTGTATTrG TTTATAACG CCAAGCAGAC GTCCCTTACT ATrGATAATG CAGGCATTGC A.ATGAATAAT TGACGCGTAA TAGCGTTGTIA AGGGAGCTCA, TCTCG INFORMATION FOR SEQ ID NO: 195:
CATCTGGGAA
CATAAAATAG
ATCCGCTAAC
SEQUENCE CHARACTERISTICS: LENGTH: 6693 base pairs 3) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 195: CGATTTCTTC CATTTCTTCA AATAAGAATA CTTCATCTGA CATATGTGTr ACCTTCTTCA TCAAAAATTA TTTTGTAATC GA'rrACATTG CAGATCGTAA CATAAAGAAA AACAGATGTC AAATATTAAA CGTAAAAACA TGGTCACTAA ACAACTATAA GAGAAA.AGGT AAACCTAGCG ACOCGATGAA CGCTGGGTCG TTTGGTTTCG ATTGCTCTCT TCCTCTT= TTTTCTGTrC TTCTTCTTGT TTTTCTCAG CTTCCTTOGC CTCTTGTTTG GCTT'ITCCT CAGCTTCCAT AATAATTTA TCCCCCACAG TGTAGCTGTA GATTCCAGCT TCCATGTCGA CCACACTCGG TTCTGACAAT TGAGGCTTAA TCTTACTGTA ATATGGCAGT TTCTTACTCA TT'rCAGATAG 1164 AGGAACCAAG ACTTCGTCCG AATCATTCAT GGTCAATCGA ATTAAATCGG ATGTCACCTT GCTTGCGGCT AATTCCAccT 'rTTGGATAGc c~CCTTG.AGT TCTGGCGCTA.A 'TTrGAGCAAG TTCTGAGACA AAAACTTTrGA TTTGTTCACT ATCATTAAAG AGAACTGATA AATAAGTTTC TGGTAAACTG TTCAGACTCA CAGAACTAGT CTCAAGCTGA CCACTGGAAA GAATAGGATA ATGATTTCA CCAGAAATAT ACTACOCCAC AATATCATAT 'rCCT'rGACCT TAATAGTGAA CTTACTTGGA AATTGATAGA CAAGTTGAGC TGATTCAACC CAATAGTTAG ACTTAATCTG CTTTTCATAT TTTGCCTTGT CTAGCAGAAG G7TAATCGTA TAATCCGAAT CCTGAATGCC TGAAGCCTGT CGAATATCAT CAGCTGTAGT TTGCACCGT'r CCCTCAACAC GAATATCTT'r CATGGTCGCA TAACGACTGA GCAAGTAGGC AGAGACAAAC TAAAATCGCG AAGGCTCGCA AGATATGGAT ACCAGGAATC CTTTC-TAGCC TT=rAGCAA GCTTTTTATC TTCT'lrCTCT
CTTAGCTGAT
CTCTCAGAT
CGCCTCTTCT
CTCTTTCAAT
TC=TATGAT
AGAAGCTTTC
TTCCAAACTA
AAAGTAAGCT
GACATGCAATI
TCTrTCTCTT
TCTGAATCTT
TCTTCTCCCA
TCAGCCTTCT
TCT'rCGAGGG AAATCTTT'rT
ATCTTAGCTT
TCCAAGGTCA
GCAT=TCAA
TTTGCTATCG
TGTCAGCCTC
CCTGGCTrGT
TTCGAGCTTG
TTTTTAGATA
TTTCTTTGTC
TCAACAATTG
GGTAATCTTC
AA'rCGCTTTC
TCTGGTCACC
CCAAGAGCTC
CTGTTCCTCc
TGAGGATGCT
TTCACTCTCC
'rCTr-TCCT -r TTCTTGT71T
CTCATTTTTC
ATAAAAATCT
CTTGTGACTT
TTGAAGGTCT
ACGACTAGCT
AAAAATCGTA
ATAGAGATCG
ATCTCCAGTT
CAATTGGTTA
GGGATTAAAG
AATAGAAGCA GACTTGGAAA TTTGCTTTGG CTGGTTT=C 'rTCTCTTTAG ACTCTGGTC ACTTTTTCTT CAGACTCTC TGGTCCTGTT TATCCTCTGA TCCTTCTCCT CAGCTAGAGC CGTTrCTCC
TTATCTTTTG
GCTAGAGATT
AGTAAGTGAG
TCTGCATAGC
TCACGACCAA
TTGGCACCAC
GTCACATAGT
AGATTGATAA
AAGACACGAG
TCGGTTTGAA
ACCGCTCCCA
1"TAGTCGCAA AACAATATCA CCAATTCCA TCAAGGGTTG AAGATTC CTCAACTCAT TCAGACTAGA CTCTGTTAGT 'rCTTTCTTAT GGTCTGTCAC ACCGCCAACA AACAATACAG T74rGGCAAtT
ATTCTGATAA
ACATTTACTT
TCAATTCCTT
AAAGCTTCTC
CTTTCTTAAC
GCGGCACAAT
CTCGTGTCAC
CAACACGAAA
TATTGTAGCG
CGCCTGCAGA
TATCCACCAA
CATrCCcAAC
ATTTATAGGC
GCACTCCTGA
AAAGGGTCTG
CTTTGAAGAC
960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 .TTCATCTGGT TCTGGAGTC'r TT'rTGTCCGA AACCTrGGTC CTTAGCCAAA CTCGAAGCTT C'N'CAAAGGT TGAATACATC GA7=TATTG GCCAAGCCCA TAGACAGGTC AGATTCGTGA ATAAAGACAG CACACGCGCA GCGATAACAG GCGGTACTGA GACAAAGCCC CCCTTTGAAA TGGACGCAGT CGCAACATGA TAAAGAGCGA TTGGACAATT CCCCAACCAA 1165 GTCCAGCATA TTGCCAAG AGAAATAGCG ACGCAATT'r GGTGACATCC AAACCTGACT TAAGGAT1TTC 7rGGTGTC2 ATAGTGGACT TCCCAACCAT CTTCGATGAA CTTGGGCATT- GTGTCCAACC GTCCCCCCAC CTGTAAAGAC AAT7T'rMTC CTACTGTGTC GATAAAGAGG TCGCCACGTA CTTCAAAGT-'r
CCACTCGCAA
ATACCACACT
AACAAAAGAT
TAGAATGGAA
TGTCCCCGAT
TGAGvGGTAAC
CATTGGCAGG
TCGCATCTC
TGACACCTTC
ACTAAGAAGA ACCACATCTC CTTCAGTCGC AATATCTGTC GCCTCCACAT AAGCGACACC TGCAGATTGA CCCAGGATGA CCATCTTCTT ATATTATTCT ?ITrAACTCCG AGCATACATA TCCCAGCTAG AAGCTCATAG GCCTTGCCGGG AGCCTTG'rCT GCTGCCCGTT CAGTCCAGTA ATGTCTGCCA AATCAAGACG ACCTTGCTGT AGTTGATTTA CTGTCGTTAT GTGTTTCACA CCACCGAAG CCAATTCGTC AAACTCATTG CCACGGTCCA AACCACCTGC TGTCAAATCC TGACAAGGCT TTTTGACTAG CCAAGATATT a a a.
AGAATTTAAC ACCCTTGATG TCATCCACAA CTGAAAGAGT TTCCTrGATG GTTTGATGT TCGCAAGGGC A'N'TTCCACA TTGTGGCTAC CTACTTCACC ACGGAAGTAG AGTTGACCAT CCACATCACG AACTGGCT ACAGCAATAG CTGGAACACC GATTTCAPTC GCTGCCATGA CTTCCAGATA AGCTCCATCA ACCTTTTCAA TCT'rGGAAGT CAAGTCTTT-r GCCAAGTCTT
ACTGGAGACG
GTGTTGAAAA
GATTAAAGTT
CTIGCTACATA
TAACCGCAAT
'rGGTACAACA
CAAGACAAGG
TTCCGAAAAT
CTCTGGATGG
GTGGCTTC G
AAATCAGCCTG
GACCCATGCT
AATrTGAA 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 TAACAAGCGT GTCCTTATCT GATGCTATT'r TCCCTGATAA AAGACCATGT TGGCCAGCAG TGGTTGTCTT ACCGTTCGAT CCTGTGATAC CCAA'rTCCAC CTCAGTCAAG ACTGGAATTC
CTGTCATC'TT
AGTCGATATG
CACCCATGAG
GAGCAACCTG
CAGTCAAAAC
CAATAATCGG
CCTTCGCCAA
GTTCTGGATA TTCCACTTGG AGTTGCCATG AGG'rGGTAA TTGGAAAGAA GAAAGTTCCA ACTAGCTGGA TAGCCCATAT TTCCCCAATC ATAGTCGTTG TGCT'rCTGAA ATCAAATAAG AGCCTTTTrCA ATCATGGGAT CTCTTCATCC AAGAGTTCCA CAAACN7'TGG GCAGCTGGAT ACCTACCTTG TCCAACAAAC
TGTTGTAGCG
AAGGATGGCC
TGTCCTCGAA
GAGCTGCAGA
GATCTArrAC
GAAATAAAAA
GATACCTGGA TTTTTCACCA TAAGGGCAAA ACCTGTAATG ACCTTGATCC CTTCTTCCAG AGGTTCCCA TCATTACTG TC-ACAATGGC TTCACCAGAC TTGGCCAAAC CTAAAACAAG 7rrCATGTCT CGAACTCCAT TTTAC'TCC'r AGCCACAAAG TGTGTTTGTG ACTCT'rTCTT
GACTTTCTTA
ACTA?'N=AC
CTAACTGAAT
TTTTTA.AA7'r
CATT'TTTATG
CTTACCATAT
CATCTATGTG ATAAATCGGT AACTCGAATG ACCTGA'rCCA CTTGCTCCCA AATCAGAGGA
TTATGGGTCG
AATTCCTCAG
CGATCCTTTA
AATATAGGTT
AAAGAGATTT
TTGACGGT'TT
CAATAATAAT
AGTITTTTGGG
AAATTATCTT
GC'r'rCAAATC
TCTCTTTC
CGCTTTCAAT
1166 GGTCCGATTC GGA'N'T TTA GTCTAAGGAA GCGGTTGGTT CGCTAGTGCA ACACGTTGTG CAAATAAGAG AGGTTTAC-Ac CTTCAACTTTr TTACCAACTA TAAGCCAAAA TCTTGAAATA AAGAI'TrAG GATGGAAAGT CATCTGCGAG GATCAAAGGT CTTCTCCTCC TGATAACTCA GrrAGAGC TTGTTTCATC AACCCAGATT GAGATTCTCT AGTATCCTAA GTAATCTCTA CATAGA'rGAT TTeCCCTTTG GAACTGCCAT AAGAAAACAG AAGGCTTGAT GTCCTTAAGA TCATA'rGGCT CCAATCCTCC AATCATATTC AAGAGTGTTG TICTTACCZACA CCGA'rrAACG CATAAATT'T CCCACCTTCA AAATGAAGAT TCATATCTGA CGGCTTCCAA A'NTTTAGA TATATTCTTT AGTTCAATCA TCCTATTTC TGTCA'rAGAA ACACGAGATT CTTTCTGCGC TTGACGGTAA AGCGTCAAAA TAGAAAGACC AATAAAGTGA GCAAGCCAAT CACCAAGTCT CGACTGCTTA GCCACTTG'rA
AAATAGCTGA
CTTTCATAAT
CTGCACTAGC
AAATAAAGAG
ACTAGCACCA AATACAAAAC TAGCAAATTG GCTAACCATA TACTGAGCAT GTGTTTCAAA AAATCGTAAA CCTGAAATTC AAGAT'rGACA GAATAAAACA TAAAGTTGCT GTTTCAGT'T TTTAAGATAG GATACTTGCT ATATCC'rTTG ATAAAGAGTT ACTATCACCT GTAGAAGTCG AGATACGGGA TTCTCACCGT AACGTGCGCT TTCATCTCAT ATAACTCAAT CTTTCTTCAA G'rTTAATCAA GATATCTCGG CGGAATTCCT CGAAATATAG GTAACAAGGA ACTGGCTAT-r GAACTTCA'rT ATAACGAGTT CATAAA'r'CC GTTT'rCCAGC
GCGTGAATAC
TATTATAAAC
AATCCAAAGG
AAACTTTCTT
AGCTT'rC-rC AT'rGATAGAC
CACTAAAATC
AAACCGCTTT
AGCACTTGCC
AAGTTCTGCT
GAGA'rGGGCT
ACTGGGACTA
CCAACAATAG CTCCTAAGAT AGATAAACAC TTCTTCCT'rC AAGAGTTCTA GCCCACTCTC CAACTAGATA AGGATATAAA GGATCAGTCA AATACTGAGT TCTCCCATTG AAAGATAACT TCCTCACCAG A'rTTTCCATA TCTCGAGAGC GCAAATGTTC AACTTCTG'rT TGGTCTCAGC ACATAGAG.CG TA'rAGCATC CCTTIG'GGAT TGGCAAAATG ATTGCTTCCT TGGCAAAGGC CCTAAGCCAA AGGAAATCT4G TCAAGTTCTT TCAATCGTTG AAAACAGCTA CTAACTGACA 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 TGGCAGCAAG AGGATAAACT CACCTTTTTG ATCTACCACG ACCTTTTCCT TGTCCAAATA TGAACTATAG GTATCCAGTG TCTCTCCCTG TTCATTTTTT GAGCAGATTA TCCTTTACAT AAAGAGCTTG TrCTTCTrCG ATACCACTTG CTCTGATTr'r CTGTATCTTT TCCTCTATCA GTAATAGTCr GCTCTGTCCT GCCATGCTTG rrTTCAAATT GTAAGACGTC AAACC'rGTCT TAACAGCa'rA CCCTACTGTA CAATAGGGTT AAAGCCATCA AGCGTTTAAG GGGTAATCT'r CCCTTAATAA CGGGAACTAA 1167 TGCTGTAA CTCAAACTCA TTAGGTAAAG GAGCAT'rAGT AAAA'rrGAAA TCGCCAATAA 5820 AAACAACAGA TAGAAACTAA TCCCAAAACC ATAGGTGGCT AACAAGATAG GATAAAACAA 5880 ACCTTGACTA AAAAGAACGA CTCCCCCACC TAGGAAGGAA AGGAGGGCTG ATAGAAGQAG 5940 CCATI'TGATA TCAGTAGATA AAGAATGCCC CATGATGGAT AAGAGAGTCT GACCAGAAAA 6000 GAGT'rATA CCTGCTGCTC TCATTTCCI AATCCGAGTG ATAATCACTA AAGCAAAGAA 6060 AGATAAGCCA AATATTGCTA AACTAATTAA AATAAGGGGA TrrAGTAATA TTCGAA.AAGC 6120 AAGAAAATAG GGCGGTATCT TTCGGTCAGC ACTTGCTTTA TAACCCAAAT CTCCTAATTr 6180 ATCGGCAAGC TTTTCTTTCG TCAAGGAGCC TGACAAAAGG AGATAACTAT TTAGCGGAnT 6240 AtACGTTCAC GACTTTCTrG GCTAGCTTCT TGGAAT'rCTr TTGGTAAAGT TCCCTGACCA 6300 TAAGTTGCAT AAGTAAAGTG AGTCGTCCCA TCCTTACTCG GCTCTACAAT TCTTCTAGCT 6360 ATTAAACTCT GT'rCTGAGTT TGCAAAATTC TCCAATTCCr GTTCAAATAC CTCACGCGTC 6420 GGTTCCTGAG TATCT 'TTTT GACACGAAGT AAAGAAACGG AATCATAGCT TGCATATAAA 6480 *TATTGTGGCG CACGTAAGAC AATAATCCAA GCAAGGAAGA AGCTGAGAAA AAAAGTTGAT 6540 ***AATAATATGA ATAGTTrCT r CATACTAGAC TCCTrGTAAA CAAAATTCCC CCTGTAATTT 6600 CTTACAAGGG GAACGAT'rTA AATCAATGAA CGA'rTAGTCA TAATCACAGT AAAATGCTAC 6660 *TTGTTCTCCC CA'N'TAGTCC AAATCCATGC AGG 6693 0 INFORMATION FOR SEQ ID NO: 196: 
SEQUENCE CHARACTERISTICS: LENGTH: 1847 base pairs TYPE: nucleic acid C) STRANDEDNESS: double TOPOLOGY: linear SEQUENCE DESCRIPTION: SEQ ID NO: 196: .0 CCGGTCTATG TACCCACTAC TTTGGGACAA TATGGGGATC AGCTACCCAA AACTAATCGA GCGT'rTGGTT GACCTTGCCA AGGAAAGTTT TGACAAGCGC GACGATTTGA TATAAAATGA 120 AAGAGAGGGT AGAAGCCAGA ACCATCACTG CACGGTGACT AGAGTTCTCG GACTTCAGCC 180 C TTTTTAAAG GAGTAGAAAT GAAATTAACA ATCCATGAAA TrrGCCCAAGT TGTGGAGCC 240 AAAAATGATA TCAGTATCTT TGAGGACACC CAGTTAGAAA AAGCTGAGTT TGATAGTCGT 300 TTGATTGGAA CTGGAGATTT ATI'TGTGCCA CTTAAAGGTG CGCGTGATGG CCATGACTTT 360 ATTGAAACAG CCT~rGAAAA TGGTGCAGCA GTAACCTTGT CTGAGAAAGA GGTCTCAAAT 420 1168 CATCCTTACA TTCTAGTAGA TGATGTTG ACAGCCTTTC AATCCTTAGC ATCCTACTAT C1 TGAAA.AAA CGACI'GTTGA TGTCTTTGCT GTTACAGGTT CAAATGGCAA GACAACGACT AAGGATATGT TGGCGCATTT ACTGTCAACA AGATACAAGA CCTACAAAAC ACAAGGCAAT TACAATA.ATG AGATTGGCCT TcCCTACACA TTGGT~TTGG AGATGGGACA GGATCACTrG CGTCCAAAAA CAGCCATCGT GACCTTGGTT CGTTCAGAGA TTGCTAAGGG AAAAATGCAA GTTCTTCATA TGCCTGAAGG GGCGATATTC ATCTCTTGTC
AACAGAAAAG
TGAATTGGCT
CTTTAGCGC
cGFTrGGGc
ACCTTCAAGG
GCGACAAATG
ATTCGTTTGG
GCCAATGGAG
ATTTTAGAGA
CGGCTGACCC TATCGTAGAG AAGGGGCAGA GCTGGAAATT CCAATTTCTT AGAGCAACCC CTATGATTGC ATCCTATGTT CCTTCCAAGA TCTTGAATTG CAGATATCCT GTCAGATGTT CTTTCrCTGC CATTCCAGCC GGAGAAGCCC ATI-rGGCCT-r TTTCAAAGAC ATTGCAGACG GAATGGCTTC AGGTTCCTrG GACTATTTGC CAACTGATAA AAAGGTGGTT ACTGACTTGG TTGAGCGCAA AGATAGTCTG CTTGATTTGC CAGTAACTGG CAAGTACAAT
GCCTTGCAAG
ACGCGTAACC
TACAATGCCA
AATGAAGGTG
AAGGAGTT'rC AGAGGAGCAA GTACCGAGTG GAAGAAAGCA ATCCAACTGC TATGAAACTG GCAAGAAAAT TGCAGTGTTG ATAATCAGAT GATTTTGAGC AAAATA'rTGC TGAATrAGCC ACTTCAAGAA AACAGAAGAC GCCTTGGAGC CCATGACCAA 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1847 GCGGATATGA AGGAGCTTGG TGACCAGTCT GTTCAACrTC CT'TrCrCCAG ATGTGCTTGA TACCGTGATT TTCTATGGAG CAATTGGCCA GTCAAATGT'r CCCAATCGGC CACGTTTACT CAGGATCAAT TTGAAGACCT AGTCAAGCAG GTCAAGGAAA ATCCTG;CTCA AAGGCTCTAA CTCTATGAAT CTAGCCAAGT GAAGACAAGT GATTTTGTCA AGTATTTGCA AAGAATGATT AACCTTTACA AAAGATCCGT TTGACCGTGA GCGCTACGAA TGAAATGT'rG AATCAAGCAT CAGACCTTGA TTCCGAAGAA AACTrCTGCT TATGCGACTC CG?'A.ATGGA CGTCCGTGCT GA'FTTGTCTG GTTAGGGGAC AAGGAGAGGA. TAGTTGGGCT INFORMATION FOR SEQ ID NO: 197: i) SEQUENCE CHARACTERISTICS: LENGTH: 1062 base pairs CE) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear TGGTAGAAAG TTTAGAAAAT OCCATTACAG ATACTGGCTT GAC'N'GCGAA GTCTGTTATC GTGGCAGAAG TCTTGAAGCC TGGATTGTTG AGGATGAGAA
TTGCCGG
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 197: 1169 CAAGCGAAAA CAT'rTAT TCCAAATAAA CAGAGCXI-r TAGGAGAACA AGAGAM-mT AATGCCA.AGT CGATCTrGGC CTTGCTAGAC GGTTTGGAGT CACATAGCTA TGATGTAGTC 120 TATC TCCGTC AGCCTCT'rAA TCGTCTCGAA TATATCGAGT GTGCGATACT GGGGCAATCA 180 CAATTTCTCT TTAAGCTCAG TTATGCTGA'r GGTCAAAAGG CTrACCCTGT CGATCr~CCT 240 GACCTACTA.A CAAAGACAGA CTGGCAGATT ATCAAGTCAT rMAGATGC TrrGCrrGCT 300 TATACAGGGA CTGATATTGA AGGGCTAGAT GGr GATT TTGAAGCTTA r-rCCAAGCA 360 AGTATI'CAAG CCTATCTAGC AGACCCTGTA GCTCGT'r-A CGATTTGCCA AGGAAT-TT'r 420 AATCCTATTT TCTTTAGTCG TGAGAACTTG AA AGCr= TAGAGGCAGA TGGCTTGGCI 480 CAGTTTGAAG CGCGTGTGCG TGCGGTTCAA. GAGACAGATG CCTACTTTGC GAGAGTTTCC 540 TTCTATCAGG ATGGAGAAGG AAAAGTGCAT GGCGTTTACC ATCTACCTCA AGGAGTCA.AG 600 ACACTTTTAC CGAGAGAACC GTTTGTTCCT GCAGCCTATA TTGAGCAAT'r GGTGGATAAG 660 GAAGTCCACT GGGAGATTGA CTTGGTTCA.A ATCACAGGAG ATGGCTCTAA ACCAGAAGAC 720 ***TATGA.AGCCA TTGC'rCGCT'r GGACTATGCA AAATTCTTAG ACGTATTACC CCCATCTT'rT 780 *TACCACCAAC TAGACGCCAA TCAAATAGAA GTGCAACCCA TATTAGACAA AGATTTTAAA 840 *ACAT'TAGCAC AAGAAAAGTA AAGCAGAAGC AGGTCAATCG ACTTGCT1-T TTGACATAGA 900 AAA.AATCCTG CCAAGaTGAC AGGATTGCTA CTCAATGAAA ATCAAAGAGC AAACTAGGAA 960 GCTAGCCGCA GCTGTACTTG AGTACGGTAA GGCGAAGCTG ACGTGGTTTG AATTGATTT 1020 *TTGAAGAGTA TGAAGTTTAA AGAAAAGCCA AGATACGAAG AT 1062 INFORMATION FOR SEQ ID NO: 198: SEQUENCE CHARACTERISTICS: LENGTH: 6846 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ 10 NO: 198: *TATCTACAAC CTCAAAAACA TG'TTGawG gCTCGTCAGT cTATCTACAA CCTCAAAAAC .ATGTTTTgAa kGC LcGTCAG tTCTATCTAC AACCTCAAAA ACATGTTTTG AcaGCcTcGT 120 CAGTTCTATC TACAACCTCA AAAACATGTT TTGAGCTGAC TTCGI'AGTT TCATCTACAA 180 CCTCAAAAAC ATGTrGAG CTGACI'CGT TAG?1'TCATC TACAACCTCA AAAACATGTT 240 TTCangnCnT CGTCAGTTCT ATCTGCAACC TCAAAGCAGT GCTTrgagcG C?1'CGTCAGT 300 TCTATCTACA ACCTCAAAAC AGTGTGTTGC TATGG;TATTC ATTAAGTCALA CATCTCTI'GT ATTCCCTGAT TN'TTCTATT TACGrTXTrCG TAAGAGAACT TCACGrCTT CCAACTCrrC ACTAATGGCA CCAAGGTCAT AAAGAGGTTG 
GCGTGAGATT TG 'GAATCAT CACTAGTAAT CTTATCAGCC AAACCCTTCA AGACTCCTGC AGTTGCAT 'T GATGAAATCA AACGCTCTGC 1170 GCAGCCCTrTA ATCACCCGCC TTAAGAGCAC CAAATCAGGA TG3TTGAGCTA CGTTCTGTCA TAGTCCGC'rC
AATCTTCTCG
AACCATGAGG
CrATGCATA ATC~rGGTCA ACATACOCAT CGCAATCGTT O1CAAGTTTG GACGGGTAAA AATTTCAAAA TC7"rCTGGCA CAGAAACACC TGCCAACTCA TCACCTCTCA CAACTGCTGC TAAGGCGTAA CCATCATCAT AGCTATATTT GATTCCTGCT TTTTTCAAGG rITrCCTTGTA ATCCACTAGC GGACCGCTAA CGAAAGCAAT
AG.ATTCAAAT
GCCAACTAAA
ACGCTCATTT
ATTGACACTT
TGAACGCGAA
ATCTACCTGC
ACCAAACCCT
CGAACCTTAC
TCTTTAGCAA
GGCAACTGGT
AArrCTGAC
TTTGAAAAGA
CACTATAAGT
CATTGATGTC
GGTAACTCAC
GCTCAACATC
GAATTTTATC
GGGTATTGAC
GCTATTAGCT AGGACAATAT TGTACTTGTA CAAACTCGAA AAATAACCAT TGGTAATATT TTTACftGCA AGACCACGCG CAACTGCATr TAGCACTTTT TTACGGGTAT TCTCTTTTAC CGTCGCCATG GAAACACCTG CTTCACGAGC ATTCATTCCT 'TTTCCTGTCC TT'rCTATCTC TGAAGCTCTA TATCTACTTA CAAAAGTGAA TGCATCAATT GTTGCTTGCT TATACTCAAT GACAGTTCCT GCGAGAACAA TCGGAGTACG TGTCAAGTGA TACCCCATAT AGATAATGCC AACAGAAACT TCTTTCTCGT TATCTTCATC CATTTCTGCA ATATCATCAA TCCCCTTAGC 'rGGAATCACG ACACCGACAG TGG7"TGTCTT TGGACGATA.A TCCAAACGAT CAATTACCTC ATTTTTATTG CCATTGACCA CACGGCTGAC GACATCATAA ATGGTTACTG TATCA'rCTGC ACACATTCTT TTACALAGTAG AGGTACTGAT GATGTGAAAA TTTCGTTTTC ATATTTCTAC 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 TTATTCCAT'r CTATCACTAA TTGTAAACAC TTTCAAGTGT 'NTTGAAGA 'rTGATTGAAA AAATTTCATA GAAAACCTAG GTTTAGCTCC TTCACCAC CTTAGACTAA ACAAAAACGA GGAAACTAAG CCCTCCTAAA GTTATAG;TAA AATGAAATAA GAACAGCATA AATCGATCAG GACAGTCAAA TCGATTTCTA ACAATGTTTT AGAAGTAGAG GTGTACTATT CTAGTrTCAA TCTACTATAG GTATTGTTCC ATTCACTACC GTCAAT.TTTA CCACATAGTC TTCATGAAAA TATTATATCA TCATAACCAA CCAGATTCTT TCGCGATATT AGCTGCCTCT GTTCGATTAC CTGCATCTAG TTTCGAAAGA ATATTGGTGA CATAGTTI'CG GACTGTTCCG TTGGATAGAT AAAGTTTrGTC TGCAATTTCT TGTAGAGA ACCCTrGAGC AATTCCCTTT AAAACTGCGA TTTCTTGCTC CGTTAATGGA TTGGGATGCA TCATCACCAC TTCCATCAAT TCAGGCGAAT 1171 ACTCCTTIGCG TCCTTCGAG4G ACCGGrGCA AGGTrrGCAT GAGGTCTGCA ATGT~Tl-r C7NTTAATAC ATAAGCATCT ACTCCAGCCT TGACCGCACG TTCAAAATAC CCAGGACGCT TGAAGGCGT1 CACCACAACC ACCTrTGI-rr CAAGCTTTT-C TGC'rCGTrATC CACTCCAGA C7"rCAAGACC TGTCTTAACA GGCATTTCTA CGTCAAGGAT GGCGATATCT ACAGAcCCC 1'rCTAATAG ITTGGA7TGCT TCTTGCCCAT 'rCTTGGCTTG AAAGACAGAC TCTACATCCG GTTGAAGCAT GAGCAACTGG CACA'rGGCAT CTCGCAACAT ACrrTGATCr TCTGCGACTA ATAC"rCAT CTACTTCTC TCCTTATAAA GTAGTCGAAC CTGCACTTCA CrTTGATGTT TCTGACTGAT TACACTTACT TCTCCTGAAA ATGGAAAAAC ACGATTTCGG ACTGTATGGA GCTCATCCCC GCrrATAGAG GCAAAGCCAC AGCCATCATC TrCTCACTGTT AGAA'CGAGTT CTTTrCTCTGT CCGTTrCrAAT T'rCAAGTAGA CTTTAGACGC TrrAGCATGT TTGATGATAT 
TGGTCACTAA TTCAAGCAAA ATCATGGAAG CCGTTGACTC CAATTCCTGA GTTAAGCTAG ACTTGTCCAA G'rGArrCTCA ACTTGAACCT CAATTCCAGC AATTCTAAC ATCTN'TTCA CAGTCTCTAG IrCGGATGTC AAAGIITCTAG ACTTAAGAT'r TITCCACAATG GTTCCCACTT CATTCATGGA tCCTrGCTGA TCTGGTGAAT rrC-rTTAAT TCCTTTCCA CCTGTGGATA AGCCTCCATC TGAAATAACT GCAAGGCTAA ATCTGTCI-rG ACACTCAGCA TAGCAAAGGT ATGTCCCAGA CTA'rCATGCA AATCCTGACC GATACGACTA CGTTCATTTT CAGCAAGCAA TAGATTTATC TGAGCATTrM GCTTGACC'rG AGCTTCTTTC AAATCCTCGA CAATACGAAT CCGAACCAAT CCAAAAGTCA TTAAATCGAC AAAAGTAAGA ATTACAAGTA GA'rAGAA'rAG AAACTCAACT 'rCGATTCTCT GAAAAATCAA CAGTTGCCCC ACAACAAGGA CT'rGAGCAAG AAGAAAAGTC CAGACATGTA AAGACTTTAA ACTACGTACG CTGAAATGAT AACTTAAGAG ATT'GGATAGG AAAAAGAAAA ACCAGA'rATA ATTAACAGCA ACAAAGGCAG TATrCCCAAC TACATAAGTC AGCATGAGGC CCCAATATAG CCAAGATAGG CGCTGGCTCT TAGTTGTTAA AACACCCAAA TATGCCACTA CAAA'rAGAA'r ATCAATCAAT A.AATGCCAGG CAGAAAGCCA CCCAGTCACT ACAGACAGGA TGGGGAAAAT CATAAAAATT AAACTGATCC AAAACATATA ATGTATTCTT TTCAGTCrT CAAGCATTAA GCA'rTCTCCT TA'rGACCTTG AAGGTAAATG GTCAAACCAA ACAAAACTAC TGAAAAAACA AGTAAATAAA CTGTGGC'rGA TAGATTGA'rG CCACCCTCAT TTAAGAAGGT CTTGAGCAAC TCCATCAACT GATAGGTCGG GAGACACTTA CCTACTACTT GCATCCAGTC TGGAAATAAA GAGATAGGCA TCCAGAGTCC ACCTAAAACA GCCAACCCTA GATAAAGAAG ATTGCCCACG ACAGACATCA ACTGACTAGT TGGTAAGAGA 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 GTCAAGGTCA AACCAAGCGC TACGAAGGCA CCAATCCAAT T'rCCAAGAGA CATGTCCACA ACCAAGATTG AAACCAAATA ATCAACCAGC ACCATATA CAGGGCTATG ACGCAATGTT AAAACAACTG GGAATGAGAA GATAGCTGTT AGATAATCAC GCATAAAATT CGCGAGTTCA AATAAATAGA AAGCCGTCGG CATCCCTACT GTCAATAAAA ATTCTATCTT ACTAAGTGCT CTGCG'ITTCT TCAAAGATTG TATCCAACA-A GCCACATCCT GC?1'GAACTA ACAGTTCCCA TAGAGCATCC TG7rTTTG ACCAG7TTTC 1172 ATACTTCCTA CTATCAGCAA AAC1'GCAGCC CCTCTTACAA AATGCCCAAC TGAGAAAACC ATACTTGTTA TCrrGATAG ATAATATTCT 'ETCTGCCAGT TGTTGATCTT GTCGGTATGT GACATCATGG AAAATGCAGT CATGGAGATA CCTGGTGTGT CCTGATAGAT ACCAGAAAAA GACAATAGAT 
AATAGATCAA TTGTCG'rTTG AGCCATCGTT TCATCTTAGT TATCTCCCTT ACTACGATTA TTAACTTCAA TrTT=GTA'r AAAAGCATCT GCTTCGCGTG TGACTAC?1'G AACCAAGTTA GACTGCTCA.A TGACTTCCTT AATTCCCTCA CTACGCATAG CTAGAGGCGT AACCAAAATC CGGTCAGCCG TATGCTCTAC
GTATGCCAGA
CGTATCACGA
CTCTTCAATA
GATTTCCCAA
GACAAGCTTT
TGACAA''T
TTCCTGATCG
CTTTAArrTC GGAAGGATAA AATGCTrTC ATCAACTCTC CCTTATTTAA
TAATGAGACG
AAGCGTTGAC
GGTCGCCCAA
TCTGCGAATT
CTCAAGGAA'r
TGAACGATGA
AGAACTCGTC ACTGACA.AGC AAGCAGACAG TCCAAGAGTG r'rCACCTTCA GCTACCTCAA ATTTAGGCTT TCTACCTTAA TAAGGAAAAC GACGAAAATC gCAAGGTTTG GTCAAACCAA CGATAAAGAA ATAGATGATC TAATTTCAAA CCTTCTCTAA AGGAAAATGA TAGAAACTCT TACTACTAAA CTCACTGCGA CTACAATTAT AGTCT7trTA AATAGAGAAT CCTGACTCCT TGCGCTTrTA GAGTTGAAGT ATCCATGGCA GCAGTTGGTT TCAAGGTCAA GACAAAAGAG AAGAGACGCT GCTC7=TTG TTGCTGGTCA AACTGCAATA TTGGATAGAT ACGTTGAAAG AAAGCAATCA CATTTTCTTG AGGCAGATAA CCTCTAATAT CTTGGATGGA TAC 'rGACCG CTTGTGACCA TGGTCTTCCC AGCACCATTG GGCCCAATCA AGGAAATACC CTTCAAAATA GCCTTGCCCT TCATATTCAT GATATTCTCC TTTCAACCAC ATAAATCCAA ACCCCAAAGC ACCACGAATG CCTGTAAACA TTTCCACTAA CCATACCAAG CCTCTC=CA TTCCTCAAGC TCCTTTTTCA CAAGCCAAGA CATCATTCCA AAGCCAGCAA CATCCAATCC CGAAAACATG AGTTAGGTCA TAATCATTTT ATTTCTCATC TCTTCTTCCT AATCAGAGGA GACAGAAGCT TCTGTCACTA
GGTCCCGAAC
CATCTAAAAA
TTTGCCC~CC
GTTGATCGAT
ACTCTTTGAC
AGTCTAACTG
GTTTATCTCC
AGGCGACGCA
'rGATGTrTTT
TCCATTCTCA
AATTGGCGAA
AGTGACAGGC
CATCTCCGAC
AGAGCTCCCA
TAACTCCTGC
CCATTTCATA
GAAAATATGA
3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 1173 CAAATGTCAT AAAAAArCT GTTCAAAACA AGCAAGATAC ACTATACAAT AAAACACAAT 5700 TAGAAAAATC TAAGGCAACT TCCTCAAAAG AGATATCAAA CCCAATTCAC ACCATAATGT 5760 AAACTAATAC TTATTTAAAA TCAAAAAGAG TAGAAATTTT rA'rCAGACAA ACACATATAT 5820 AGTG'TAT TGA ATCTATAACA GTAGGCCTTA AATACTAAAA TATrCTATA AArATTrA 5880 ACTI-rCCTGA TAGAGCTGTr CATATCTrAT TTCAA'N'CTC TAAATTATAC GT I'GAACAAA 5940 ACCC7rCTAT TT-CTrTCTTA AAGATTATA AGAGTTA'rAA AATCTGTTAA ATTTCAATGT 6000 GTATACCTAA ACTACGGrAT TTAT'rGAAAA GACTGGAGAC AAA.AAGTATA CGCTGCCAAA 6060 ATGAATTACT GAAAATCAAA AAAGAGAGAA CCAAACTGAT TCCCTCTTAA TGTATATAAT 6120 ATCTAGTTTT AAAAATACAC ACTCACATAT CTCTGTAATG AATCGGGAAG ACAGGATTCG 6180 AACCTGCGAC ACCTTGGTCC CAAACCAAGC ACTCTACCAA GCTGAGCTAC TTCCCGAGTT 6240 AAATAGAA.AA ATGCACCCTA GAGGAGTCGA ACCTCTAACC GCCTGATTCG TAGTCAGGTA 6300 CTCTATCCAG TTGAGCTAAG GGTGCTCCAT ATTATCCGCA GGACCGGAAT CGAACCGGTA 6360 CGATCGTTAC CAATCGCAGG ATTTTAAGTC CTGTGCGTCT GCCAGTTCCG CCACCCCGGC 6420 *..C'rCTCTAAGC GAACGACGGG ATCGAACCC GCGACCCCCA CCT'rGGCAAG GTGGTGTTCT 6480 ***ACCACTGAAC TACGTTCGCA CTGTTTTCTT CTATCTAAAA ATGCCGGC TA CATGACTTGA 6540 ACACGCGACC CTCTGA'PTAC AAATCAGATG CTCTACCAAC TGAGCTAAGC CGGCTCATTT 6600 .GTTATATCTT AATGCGGGTT AAGGGACTTG AACCCCCACG CCGTTAAGCG CCAGATCCTA 6660 AATCTGGTGC GTCTGCCAAT TCCGCCAAAC CCGCATATAT GACCCGTACT GGGCTCGAAC 6720 CAGTGACCCA T'rGATTAAAA GTCAATTGCT CTACCAACTG AGCTAACGAG TCTAAAATAA 6780 *cTTGCGTTAC CTTAAACGGT CCCGACGGGA ATCGAACCCG CGATCTcGCC GTGACAAGGC 6840 GACGTG 6846 INFORKATION FOR-SEQ ID NO: 199: 9 SEQUENCE CHARACTERISTICS: LENGTH: 2911 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 199: GAATTCATTT TAAATAAAGA TACGGGAGAG GTAAGTGAAT TAAAACCTCA TAGGGTAACT GTGACCATTC AAAATGGAAA AGAAATGAGT TCAACGATAG TGTCGGAAGA AGATTTTATT 120 1174 TTACCTGTTT ATAAGGGTGA ATTAGAAAAA GGATACCAAT TTGATCGG GGAAAT'I= 
GG7'=CAAG GTAAAAAAGA CGCTGGCTAT GTTA'rTAATC TATCAAAAGA TACC?1TrATA AAACCTGTAT 'rCAAG.AAAAT AGAGGAGAAA AAGGAGGAAG AAAATAAACC TACrrTTGAT GTATCGAAAA AGAAAGATAA CCCACAAGTA AACCATAGTC AATTAAATGA AAGTCACAGA AAAGAGGATr TACAAAGAGA AGAGCArrCA CAAAAATCTG AITCAACTAA GGA7G=rACA GCTACAGTTC TTGATAAAAA CAATATCAG;T AGTAAATCAA CrACTAACAA TCCTAATAAG TTGCCAAAAA CTGGAACAGC AAGCGGAGCC CACACACTAT 'rAGCTGCCGG AATAATGTT ATAGTAGGAA 7TTT~TCTTGG ATTGAAGAAA AAAAATCAAG ATTAAGATAA AAGCTATAGA AAAAAATGGT TTATGTACTG AGATTAGATA CTGAGGTGAT GACATAGTTT TGTGAAAATA GCCATrrATA ACTCAATTAT TTAGTTTACT TTACTTTACT AGTGATACTA TTTGGAGTTA 300 360 420 480 540 600 660 720 780 TTAATGGACT TAGTTTATAT GGTCATATTG ACTAGAAAAT CAATTTCGGC TCTT'rGTCAA ATTTTGTCCT TTCTTTTTTG AGTTTrCGAAA ACCAAAGGCA GTTTGGCATT AGAATAGTGT AGG7TTTTAGA GGATGAACTr CTCCTTATTC TGAAAGTGAA AACTAATGAA TTGATTGAAA AGAGTCTATC AAAATNTAAA CTGTAGTGGG TTGAAGTCAG ATATTCAGAG CGATAAAAAT TTGCGCTTA TAAGTTTGAT PGrrGAAGGG CATTGACAAT GATTCAGATT GTCCTCAATG A.AAGCAAGAG TTGATAGAGA GGGTTAGTAT TGACAATATT GGCTAATAGA GGTGATGAGA CTAAGCTCGA GAAAGGACAA CCGT'r=IrG AAG'TrCAA
GAGATTATTG
CTTCTCTTTA
AGTCCGAAAA
TTATAGTGGT
TTCTGAATAG CTCAAAAGTT TATCTATAGT AGATTGAAAC TAGAATAGTA TCTAAAACAT TGTTAGAAAT CGATTTGACT GTCCTGAATG ATTTGTCCTG ATTTTACTAT AAATCCACGT TTACGAATCT CTTTCCACAC TTG?'rCAATG CTGGTGTCTA TGGAGGAATA AATGCAAAAC CAATATTAGT CGGAATCTTT A'rTTA'rGCCA TATAGCATTG TCCATAACGA GTAAAAGATA ATCATCTGGA AAAGCTCCTA TTCCTAAAGC CCCTTTATAA CCTCTTGCGA GAGAGACTAT CTTACTTCAT GCGGATGAAA CTTCTTATCG GGTTCTAGAC AGTCATAGCC GTCGCT'rCCA
TCTTTGAGGA
ATTTGTCAGG
GTTTCAAGTC
CACCTCTGCT
TTATTATTTC
GGGTTCATCT
AAGGTACTTG
TAAGCTTGTG
TGACTCAGCC
ATCTGACCTA
840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 CTATTGGACC TrN'TTCTG GGAAAGTTGA GAATCAAGCA ATCACCCTGT ACCATCATGA TCAGAGTCGG AGTGGTTCGG TAGTACAAGA ATTCCTAGGA GATTATTCTG GCTATGTTCA TTGTGATATG TTGCGGCAGT AACTTAGGAC TTTAGTCCTC TAGTTCTGCC TATGCGATAG CAGTCCAAGG TTTAGGAGCA AGGCGACGCT AACCTTGGTA AACTGCGAAC CGCTAGAAGC TTATCGTCAA CTGGAAGAAG CTGAACTTGT TGGATGTTGG GCGCATGTGA GAAGCAAATT 1.175 TTTGAAGCG ACCCCCAAGC AAGCAGATAA ATCATCCTTA GGAGCTAAAG GTTTAGCTrA TTG-rGA'rCAG TrANrMCCT 'rGGAAACAGA CTGGGAGGCT TTGCCAGCTG ATGAACGACT ACAGAAACGT CAAGAACATC TCCAGCCCCT AATGGAAGAC TTC?1-rGCTT GGTGCCGCCG TCAGTCAGTT TTAGCAGGTr CAAAACTAGG AAGGGCA.ATr GAATACAGCC TCAAGTATGA AGAAACCTT'r AAGACTA'PTT TGAAAGACGG ACATCTGGTC CTTTCCAATA ATCI'AGCTGA ACCGCCATT AAATCATTGG TTATGGGACG GAGTAAAAGA GTCCAGTGGA CTCTT1'TAGC CTGAGCTCAG TTTAAAAAAG CGAGGGTGGT TATr-rCTCA AAGTTrTGAA GGAGCTAAAG CAAGAGCTAT TGrTATGAGC TTGT*rGGAAA CAGCTAA.ACG TCATCAATTA TAGTGCGTTG AATCTATAAC AGTACGCATC GACTGCTAAA ACATTTCTAT AAATCAATTT TCCTTTCCTA ATCGAT'rTr TCATATCTTA TTTCAATCCA TTATAAATAG CGAGAAATAT CTATCCI'ATC 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2911 TTCTAGAATG TCTTCCAAAC GAGGAAACTC TACCATGGAC TAAAGTTGTA CAAGAAAAGT TCCGTGAGTT CACrAATCTG GAGATTTTrC GATcACTACT TCGTCAGTCT TATCTACAAC TAGCTTCCrA GT'rTACTCTT TGATTTTCAT TATTTGTTTG TGTGTTTA'IT TTTATATAAC AGAGACAAAA AAGAACAGAA AGTAATTGAC INFORMATION FOR SEQ ID NO: SEQUENCE CHARACTERISTICS LENGTH: 6854 base p TYPE: nucleic acid STRANOEDNESS: doubi TOPOLOGY: linear TCGTAAACAA AGAGGTTTTA GAGGTTTATT GCAAATAAGA AATCTCCAGA TTAGGAACTA AATAGAtTCG TTATTGGGCG GTTACGATAT CTCAAAACAG TGTTTTGAGC AACCTGCGAC TGAATATTAG AACAGAAAAA ATGCTTGGAG AAACTATAAA CAAAATAAAA ATATAAAAAA (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 200: GAAAATAAGT C?1'GACAGAA AGCGCTATCA ATGATAGAAT GAATTCAGAT AAAAAGATTT -ATT'TTAAAA CAAAAATGAA ACGTTTCAAA AAAAGAAATA AAGAGACAGC GCCAAGCGCT ATC7TTTCTA GAAAAAAATG AAACGTTTCA AAAAAGGAGG TTGCTATGAA TAGCAAAGCG AAGCAAGTTT 
CTCNTTGGGA AAGAATCAAG AAACAAAAAC TCTTGT 'ATT GATGACTGTC CCCGGTTTAG TTTTAACCTT TATCTTTAAA TACATCCCTA TGTATGGGGT TPTAATCGCA TTTAAAGATT ACAATCCTTr AAAAGGAATT 'PTAGGGAGTG ATTGGAT1TGG TTrTTTCTGAG TTTACAAAAT TCATATCCTC TCCCAACT a. *a AGTATCTATC G~rTTGCr CAACTCTTGA G'rGAAAAAGT ATCTCAGTCG TTG=ATTGT AACAATTC TTTCTATGTT AGACCTTTAT ACATCTr'rAG ACGGCAACAT TGGTAAATGT AATATCTTCC AACGAATCTG CAATTTGTTT TAGCTGCACG CACACATCGT TAAATTTGCC CTTGTATCAG GAGACTATTC GTAGTATTGC TTG7*TGCAGT TAAGGACGAA AGTATGAAAA CTTAAATAAA ATCATTATTG CGTCGTrAGCA TCCTTTATGG AGCCGATTGG ACTGTAGAAG TTTTATCAAT TCTCTAC1'AT GTTTACAGCT TATCCTCTr CTTGATTGTA ACTA'rGTTCT ATTGGGAATG CTCAATACTC TATTATTCTT GCTAGGGCCT CATTGATGG1T GCAAATGAT AATTATGTTT GTTCTCTTCC AATGATTTAT ATCAAGGATC CATTCAGAGC CAACCAGGTC *AcGTTTAGCT GAATGATTA TATGTATCCA TTCTTCCAAA ATAAAAAAAG AAAAAATAAA CAGTTTGTT GACACCTAGT CAAGTCCAGA TTATAAGTTG TGGCTT'r'r'A
CAAAAAACGA
CGGTATGATT
TGGAATGAAG
TGGTATCTGG
AGATCCACCC
GCACATTGAT
TGGAATTATG
AACTTCTGAA
TTACTCAACA
TAACCAAATC
GGTATCTTGT
CCACCAATCA
A'N'CAGCTCA
TCCTCTTCT
CGACTTCT
CAAGGAATGG GCTCGGCTTC A.ACGCTCTAC 'PTAGTAGAAG CAGCCCGACT GGATGGAGCC ATTCCAGCTC TTAAGCCTAT TATGGTTATC AATGTCGGAT ATGAAAAAGC ATTCTTGATG ATTATCTCGA CATATGTCTA TAAAGTTGGT GCGGTTGGT'r TGTTTA.ATGC AGTGATTAAC GTTAAACGCA TGAATAATGG TGAAGGAATT ATTCGATTAT GGATACAAAA TTTGATAGAC GTA'rCTTACT TCTTTATCGT TTTGATGACT TTGCTTCCTT TACTTTATAT ATCCTAAGGT TCTGGTTAGT AGAGGCATTA GCTTTAATCC GTTACCAGCG TGTATTCAGT GACCAATCTA TTCTAAGAGG ACTCTTTTGG ATTTGCAGCT TTAACAGTCT TGCTATCTGT CTAAGAAAGA CTTGGTTGGA CGTCG'rTGGA 'rTAACTACTT TTGGTGGTGG TTTAGTCCCA ACTTACTrGC TCG'rAAAAGA CATGGGCTAT CATTGTTCCA GGTGCTGTTA ACGTI-TGGAA TAGCCAACAC ATTAAAATTA TTCTCGCGAT TATrGCTCAAT TTTTATACGC ACCAAACTrr 'TTTCAGTGGrG AGGACCAATC TGACAAATCC AGAC7rTTT ATr"?CCAAGG
TACAGATTTT
TTTATGCTTT
ATTCCCTGAA CAATTAGT'rG CTNCAAAATC A'rGCTTCCTC 'rGTAGGACAG TGGAACTCAT CAAACTTGGA ACCATTGCAA C1"rGTAC'rTC AAGACATGAT 'rCGAGCACAA GCGGCTATCA AATACGCAAC TATTGTCATT TCCAGCTTC AATACTTTGA TAAAGGAATT ATGGCTGTT AGGAGTTTTC TCATGAAATT CAAAACATTC TTAGCAGTAC TTGCACCTrG TGGCTCAAAA GAAGGTGTAA CATTCCCGCT 'rCA-AGAAAAC AAGCTGCTGT 1620 TTGCAAAACC 1680 ACTTTGATGC 1740 GTAAAATrCT 1800 ATGAAATGA.A 1860 CATTGATTGT 1920 CACTTAAAGG 1980 TCAAAATCAG 2040 AATACAGCTT 2100 AAAACA'N'GA 2160 AG~rrATGAC AGCCAGTTCA CCGTTATCTC C 'AAAGACCC AAAlrAAAAG TTAA'I"MrGC AACCT7'GGA GAAGGAPLACT GGCG'rrCATA TrGACTCGAC CAGAAAAACG TAACTTGGAT ATTCTAGTG GTGATTTACC GAGCTTCAGA TGTGGACI-rG ATGAACTGGG C1TAAAAAAGG ATTTGATTGA TAAATACATG CCAAATCTTA AGAAAATTT'r AGGCCTTGAT GACAGCACCT GATGGGCACA TTACTCATT GAGATGGTAA AGAGTCTAT CACAGTGTCA ACCATATGGC TTAAGAAACT TGTCTTGAA ATGCCAAAAA CTACTGATCA CTTI-CAAAAA CGGGGATCCA AATGGAAATG GAGAGGCTGA TTAGTGGTAA CGGAAACGAA GATTAAAT TCCTATTTGC ACGATGATCA TTTAGTAGTA GGAAATGATrG CCAAAGTTGA ACTATAAAGA AGGTGTCAAA TTTATCCGTC AATTGCAAGA AAGCTTTCGA ACATGATTGG AATAGTTACA TTG.CTAAAC
CAAC'TACCAA
AGATGCTAT1C
TGTTATTATT
'rCCGACTTT-G
CACAACGACG
CCAG'T'rGAAG
TTTACTrTTAC ATGGGATAAG CAGTACTTGC TGGACCAAGT CACGTGACAA GATGGTrATT TTGATGCACA ATACGCTCCA
AATAATGTTA
GGTCAAAAAC
ACCAGTGTAA
CTCCAA'rCTG
CTGGAAGTAA
ACGTrAGCTCG
ACAAAAACCT
TGCAAAATAA
GGATGAGAAA CCAGAGTACA TCCATGGATT GAAGAGCTrG 'rTGGATTAAC AAAGA'rTGGC 'r'rGATYAAA GTCCTAGAAG TGAAATTCCA TTTTCANTA TGCATTTGGT A'rAGGGGATA CTTCACAGCA GATAACGATA AAPAGCCCTG AT'rGATAA.AG TCATGATCAG AAATTrGGTG CGAAAGTTAT GATGTTTAC TACAAACGGT ATGGGATTTG AGAA'rrGACA GCTAAATGGA CTGGGGAACT TACGGAGATG TAGTCTAAAA CACTTACCAC AGTAGGAGGA CCACTAGCTA TGATGCCAAA TGGCGTTTGG CAATAACTAT CCAAGAGTCT AGCAGATATG AATGACTATA TCATACTGAG TGGGATGATT CGCTATTAAA CAAAA.ATACT GGGAGATAAG AAATACACAG CGTTAATACT CAATATAAGC TCCAAATGGA TTTGTCTATT TGATAGTGTT TGGGGGCCTA GGAGCACTTG CCAGTGGCAC 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 ACAAACAACA AAACATCTTT GAATTGGATC TAAACGGAAC TGCACCAGCA GAACTTCGTC TCCTAGATTC ATACTATGGT AAAGTAACAA ATCTTATCAA AGAATATTAT GTTCCTTACA T1'ATGACACA GGAACATTTG GACAAGATTG TCTACCGTAA ACC'rGCTGAA TGGATTGrTAA
AAGCGTCAAA
AAAAGACTGA
CCATGCCTGA
TGAGCAATGT
CCCATATCGA
ATGGCAATAT
S S ACAAGAAAGA ACrGAAAAA TACGGACT'rT C'rGATTACC'r ACCACCAATA CCAACCAAAC AAAAACTAGA GGTTGArrAT TAGAAAAAGC CAATCGITTT ATAGCAGAAA ATAAACATCT CTG.AAGAACA 7rTTCAGCT GATTGGrr GGATCAATGA TTCGTGGAGA ATACCA'rCTC TTNATCAAT TCTATCCATA TGCACTGGGG ACATGCTAAA AGTAAGGACT TGGTGACTTG 1178 TTGCTCCTGA CCAACATTAT GACCGAAATG GTTGTTTCTC AGGCTCTGCC ATTGTCAA(G ATGATCGCCT CTGGC1'CATC TACACTGGAC ATATCGAAGA AGAAACCGGT CTCCGCCAAG TCCAAAATAT GGTATT7TCA GATGACGGGA TTCAC""TGA AAAGATTTCC CAAAATCCAG TTGCAACTGG ATCAGACTTA CCAGATGAL3T TGA'IrGCTGC TGATTTCCGT GATCCAAAAC TCwrTGAAAA AGATGGACGC TATTACTCCG TAGTAGCTGC CAAACACAAG GATAA'rGTGG
GCTGTATCGT
TAAAAGGGGG
GGAAACATTG
TICTACTAGCG
AGAACACCAA
CCTTA7'rA'rG
TCCGATAACC
GGTTTTATGT
TCACCCATGC
TTCACGGGTA
GATCA'rGGCC TAGTAGAATG GCAGT'rCGAA TCCATC?1'Tr GGGAATGCCC AG.ATTrACTTC GAGTTAGATC GTTATCAGCG TGAGGGAGAC TCATATCATA AGGTAGA7-rG GAGAGAAAAA CGTTTTATCC AAGACTTCTA TGCGCCTCAA ACATTGTTGG ACATCAACTC ATCCCTTTTG CAGAATCAGT TCAAGAAATT ACCATCAAAA TCGTCGTATC CCCATGACCA AGAACACAAG AAGATGGCAA ACTAAGACAA AAGATTGTCA TTACCACTTA ATGCGCAGCA AGTTTACATTr AACAGGACAC TAGTCGACGG ATAAAAATTC CATCGAGATT AC7TTAACGGT GCCAGCTGAG GAAAAAGTTC TCT'rrCTAAA CTGATTGCT'r GGATGCAGAC TGGGCATGTG CCATGACTCT TTCCCTGTTA AAAAAGGCCA GGAAA'rGATA TAGA'rTATCT GATCGTAGCC ATCTTATTCA TATG'rAGATA TTGAAGCTAA TTrG'rCAATC AAGGTGAAGC CTATCACGAA TTGATTAAAA A~TAGTGGAAA GAGGAC=N ATCGGGGCGT ACCCTTCCAA ACCTAGAATT CTAAG-ATTGG ATATCAAATC CAAATAGATA TGAATTTGGT TATGACAGTA A-AAAkATTCTA CGTGAAGAAG AGAATTGGAA G=PGTTCTAG AAGCTTGACT GCAACTTATT ATTAAGTTAT TTCTCCTAAA TGTGTTTTGG GTATATAAGC T'rAAGCTrAGT TTTCTAAAGA CAAGGCTTTT CCTGTTGTAT AGATGTTCTT GATTTTCTGG GGGGCG'IrGC CCGCCAATTG GTTAATTATA GATTAACACT TTAAATT'AAG TAGAGATAGC GTTCAGATAT CCCAGTTTCT 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 TTAGTTTATC GTATNTGTAA AATTGGTGTT GGATTATGAT AT'rTGAAAAA AA7Mr~ATTT TTAGATGCCC CCTACAGGGA TGTTT'TGTAA CGTTTAAATG CCCGACT'rAT TGCTTGAAAA TGC7TTGGAGG AACTGATGAA GTTT'rAGGAT TGAGCGAAGT GATGGCTATG TGATTGAACT TGGGTAGAGT TGAACAAGTC GATGCCACTA GCTTGAGAAC GATATACAGT 'rAAAGTTGAG AAGCAAAAAA ACCTTGGTTC TTGTAGGAGA TATCTTGCT AGTTTTGA GTTTGTTCGT AGAA'N'TAAA ATATAGTATA GAACAATGAA AGATTAGGTA TAGAAGGCTT TATTTAGGCA TGCTTATAAC CAGATATCAC ATGAGATTGA TATTATTGAT AAAAATTAAG ATAAGTGAAA TTAGTGAAAG CGTGGATATA AACTTTGACT TTAGACACAT TAGTATATGA AGGTATGAGA AGAGCTTACA AAGGCGAGAG TATTC'TTTrC ATTTGTAGTG 1179 AAGTTAG?1'T TGTTTGCTTC TATTTTAAAG AAAAAAGATT TACTAGAAAA ATTTCAAGAA AAGTGTTA.AT CAAGTATTGA CACT'1-rATCT GGATTTCGGT ATAATATGCT TAGAAAGGAA TCTrTCTAAA TrTTTCGT CCTTATGTGT TAATCAAAGA CGAATACAAA ACATA'T1' TTTACTCTAA AAAGTGTTAA TCAATGATGT ATTT-G'rTAGA GAGGTAGATA AATGGAATTG AGAGCACCAC 
CAGTTATAAT AGAAAAGGAG AAACAAATCA TACATTTGAA GAGCrATrC TTTGGTTGGA AGGTATGGTT AGAAGCGTAT GGGGGTAAAA GAATATTGCT TGCAGTTGTT TAGTGATAAA AAAATTTGTC AACTAGTGAT GTGATAAAAG AGAAGCGACA CCTGAATACA AGTATAAAAC G'rATAATAAA AATATTTTAA C71GAATrAT TGAAACAAAA ACAACCGATT GTTTCTAGAA CGAAACAACA AAGACCAAAA GTrAGAAAGA T'rGGCTAAGT TGTCGCCCGA TTACTGCTAG CTGTGCGTCT ATCTAAACGT AGTTTATGCG ATCATAAGGC TGATGGGTAT TAGATTCTTA CTAT-rACCAC ATCTTTACGA TAATTATAAT TTTATGTrGT ACAAACTGAA TCATTTGCGA ACTTGATTAA AGTCGGATGT TGGCTCTCTG TCTTTAGCAG ATGCGCTTTT AAGAATACCr CTAATACCAT AATTATATGG TTTTA.ACTCG ATGCCAAAAG ATTCACATTT ACCATGCATT ATGCATTTTT CCATATCGAA AT'rGAATTAT AGCTTCGCCT TGCTGTAC'rC TTTCATTGAG TATTACTCT'r TATAGCGAAA ATTTTAGGTT GAGCCTATTr ATTGTAGA.AA ATATTTTTAT ATTAGAGAAG TTCTGGGATT ATCG~ AGTCAAGGTT CTTGCAGGAG CGCTTGCTAG AAAA~ TTAAATFTAT ACTCTTCGAA AATCAAATTC AAAC( AAG'rGCTGTC TGTGGCTAGC TTCTTAGTTT CT' ATGGTAGTTA TTTATGGCAT AATAATATTG ATTTC CTATAATATT TGTAGTGGGT AAACCACTAT AGAT; AAAGTCCCAT ATGA INFORMATION FOR SEQ ID NO: 201: SEQUENCE CHARACTERISTICS: LENGTH: 3895 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear MrAGT rATAAG
.AAGTC
r'rGAT
;GGAGT
~TTATG
5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6854 120 180 (Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 201: TCCTTGCTAA GTTTATACTC AATGAAAATC AA.AGAACAAA CTAGGAAGCT AGCCACAGGT TGCTCAAAGC ACCGCT'rTGA GGTTGCAGAT AAAACTGACA CGGTTTrGAAG AGATTTTCGA AGAGTATTAA TTTACATAAA TAGCCAGTGT ?1'GATAGGGT TTGAGTAGAA TTTTCTCAGA CACTTCTGCA TCITTCATAGT GTCGA2rcA CTTCTrAGC 1180 TTCATATCAA AATCTGTCCA TTTTGGTAGA C'TGCTGGCAA ATAAAAGTTA TTGAGCACTA GTAACTTTTG ATCCTCAAAC TGCGTTCAA AAGCGTAGAC TGTI'TGCTA 'rCTCAAAGG CTCG7GT(GTA AC?1'CCTI'CT GAAATGATTG GCAIrMCTT ACGCATCGAA TCAAGCTTG ATAGAAGGTA AAAATCGGAC CCTGGATTTC ATNTMCTACA TTGATGTATT TATAGGATTr ACCAGCTTTC AACCAAGGAG TGCCTGTTGA AAATCCTGCA TTTCCCAAG CATCCCACTG CATGGGAATG CGTGAATTAT CACGCGACTT AGCr'rGAATA ATCTGGAAGG CTTCTTGCTG ACTCTTTCCT TCTCTAAG GCATCTGATA CGCATTAAGC GATTCGACAT CCACATAATC AGCCATAGAA TCATAGTCTG GGTCAATCA'r CCCGATTTCC TCACCCATGT AGATATAAGG TGTCCCACGT GACAGGTGAA TGC'rGGCTGC TAGCATGGTG GCTCCTTCCT TGCGGAAGTT TTGAATATCG ACAAAACGGT TCAAGGCACG TGGI'TGATCG TGATTATTCC AAAAGAGGGC ACTCCAACCG Tc~rATCAC TCATTTCCTT ACCCCAACTA TGGTAAAGAC TCTTCAACTC TTCAAAATCA AAGGGAGCCA AGGTCCACTT TTGTCCATCC TTATAG'rCCA CCTTCGAGG ATAATTCC'rG ACGATCAGGC GACGAATAGA TTTCCCCAAC TGTCATAAAG CTATCGTCGG AATAGTTATG AACGATGGGT TTGTCTGTAT CCACTGAAAC CTCGTCCTTA CCGATCAAAT CCTTGTCGCG CCAGAAATTA ACAACCTTGA AGTTAAGG1'C AGCCTGGGTC TCATCAAATA AAGGCGTCCA TGCAGAACCA CCAAACTTAG AGAAAAAGTC TTCATAATAC TTATCACCAG 'rCGAACAATG AT'rAAGTACC ATGTCCAGCA ACACCATTTT CTCAAAATCA GCCATATCAC GGACACAGT'r
ATCCAAAAGT
AAGCTGGCTT
TGATCACATC
AAAGCTCCTT
GGTCAAGATA
ACTGCCA.ATC
CTAGGGCTTT
TAAAGTCAA'r ATGAAAATTA AAGGTCATGG TTCCATGGTG GTAGAAGACA GGCTTGGTTC ATCATACGCA CCCTTCATTT TCAGGACAGT AAA'rCGGAAA CCTTTGACAC ACGGACATTG GAATTGCGCC GTATTTCCCA GTATCCCCGA TGTTGGTTGG TCTTGGATGA CTGAAACCAT TCATGCTCTC CTTGTGCTCT TTACCGACAC 960 1020 1080 1140 120 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 CAAAAAGAGG ATCCACTGCC ATATAATCTG AAATATCGTA ACCATTATCC CGTPGAGGGC TTGGATAGAA TGGATTGAGC CAGACCATAT CCACACCTAG 71"rGGCTAAA TAGGGAATrT TTTCGATAAT CCCACCGAAA TCCCCAATAC CGTTTTCAGT GGTGTCTTTG TAAGATTwMG GATAGATTTG ATAGACTACT rPTCCTI-rAT CAAGTGTCAT CTCTTTCTCC TTT1TCTGATA AAAGGGAGGA AGCAGTCTTC CGTCCCTATr TGTGCTATTT CAATTATACT CAATGAAAAT CAAAGAACAA ACTAGGAAGC TAGCCACAGG TTGCTCAAAA CACTATNTG AGGTTGCAGA TAGAGCTGAC GTGGTTTGAA GAGAT=rCG AAGAGTATTA GATTCGTGTA GCGACCATGA GAGATGCTCC AGCTTGGATC G7TGGGAT AAGTTCCCGG AATAGTCGCT TCACCAGACC TGCAGCCTTA TAACGTGATC TCCTTGGACT CCATACCGAT CTCGATGAGC TGGTAGGGAA AAGAACCGTC
GTATAACCAT
ATGACATCCA
ACAACACTTT
AATTCAACTC
ACTGTCCCAT
1181 crGGTTGG;T GATGATAACA GGAGTTTCTG TATCP.AAACG AATCAGTTGC TGACCAACTG CAAAACCTr GCCATCAAGA CCTACTGTAT CCTCGTCAGA GACAATGCCG ATCGCATGCT TAACTGGACA GGTCAACTCA CC7?~oGCTTG cTGATGCAAA AATAGGATCC GTCGCTTGAC TCATAATTTC TACCGAAGTA AGTTCTACTG GAGCAACGAA TTCTCCCTGC AAGTTCGTAT TGCGGAAGAA GAAAGTCAAG AGCA'ITGGAA ATGGCAGCAT GTArrGAGGT TGAATAGAGA G?1'CAATGAC 'rAGACCrrGC CCCATGACAC TCAATTCTTT CACTTGGCCA GTTAGTGGGC GTTCATCCr CACAAATTCT GCTTCTTCTT CGCCCTCrGT ?T'rGTAAAG AGACCAGCCT CAACAATCGC AACTAGCATA CTTCCTGCAA GAATACCI'GG CAAACCACCG ATACCAATAG AAGCCGCAGT TACATTAAAA GTAACGGATA ACA'rGCCTGC AAGGGCTGAA CCAGTCATCC CAGCAACAAA TGGATAAATA TATTTTACGT TAACCCCAAA AAGAGCTGGT TCTGTAACAC CGAGATAGGC TGAAATGGTI' GCAGGAAGTG AAACCTGAGC CTCACGCTCA TCATGGCGAT GCATGAAATA ATAGGCAAAC ACGGCTGAGC CT'rGAGCAAT AT'rAGAAAGA GCAATCATTG GCCATAGGGC AGTCCCACCA GCATCCGCAA TCAAr'rGTGT ATCAATGGCA TTGGTCATAT GGTGCAGACC TGTGATGACA AATGGAGCGT AGAdGCGCC AAAAATTGCA CCGAAGAGCC ATTTAAC'rGG ACCAGTTAAA CCTGCCAAGA CAACTGATGA AAGTCCTTGT CCAATTGTCC AACCGArrGG TCCCAAAACA GTATGAGCCA AAATCAAGGC TGGAATCAAT GACAAGAAAG GTACAAAAAT CATAGAAATG ACTTCTGGGA TATGCTTGTG CCAGAAGATT TCAAGATAAG ACAGACTCAA ACCTGCAAGC AAGGCTGGGA 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720
TAACTTGGGC
TTGCCGCGAT
ATACCAAACA
CAGACCAAGT
AGTGATTCAC
ACCAATCCAC
CTGGAATAAT
TTGGTAACCG ATACGATTAA CAGTAAAATA GCCAAAATTC CAAACCCAGT ATCAGCTGCT GGCGTTGAAG CAACCGCATA GCCATT'GAGC AACTGAGG'G.
GATTCCGAGA ACAATTCCCA AAA'N'TGGCT GGTTCCCATC TTACGAGAAA AATCCCTACT GGTAAGAACT GGAAGATAGC TTCACCAGGC AACCAGAGGA ACCTGCCCAA AACTGAGAGG A'rTCTGTCAT GGTCTTGCCA TCCAACATCG ACCTTCCAAG ACATTACOGA AACCGAGGAT CAATCCTCCG ACTATCAAGG CGGAGTAAAA ATCTCCGCCA CAGTGGTCAT AACACCTTGG ACCACGTTTT GATTACTCTT AGCTGCAGAC TTGGCTGCr CTTTGGAAAC ACCCTCAATA CCTGAAACGG CTGTAAAATC ATTATAAAAG ATGGGCACGT CATTTCCAAT GATTACCTGA AATTCACCTG 1182 CATTTGTAAA GGTTCCTTTA ACAGCTGGAA T'rGACTCGAT AGCrAACA TTAGCCTTCT TATCATCTCC TAAAACAAAC CGCATCCGTG TCGCACAGTG AGTTACGGCA GTCACATT CTTTGCCTCC GATTGCCTGA AGCAGATCTT TGGCTTCTTG TTCAAAT~rT CCCGG INFORMATION FOR SEQ ID NO: 202: SEQUENCE CHARACTERISTICS: CA) LENGTH: 3936 base pairs CB) TYPE: nucleic acid STRANflEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 202: AGGATCGCCC CTCCAGCTAC TAAGTCTCGT GCAGTGCCGA TTTATCAAAC AACATTTTT GT'FN'TGATG ACACGTAGGA AGGTGCCGAT CTGTTTGCCT TGAGGAAACC AGGGAACAT"? TATACTCGTA TCACCAATCC TACAACAGCT GCCCTTGAAG GTGGTGTTGA AGCGCTAgcA ACAGCATCAG GTATGACTGC AGTGACTTAT ACGATTTTGG CGATTGCCCA TGCTGGTGAC CATGTAGTGG CTGCTTCGAC TATTTACGGT GGAACCTTCA ATCTTTTGAA AGAACCCCTT 3780 3840 3895 CCTCGTTATG GTATCACAAC GCTATCAAAG ACAATACCAA ATTCCAGACC TGGAAAAACT
AACCTTTTTC
GCTTGTCTTG
GGCAGAGATT
GACAATACTT TTGCAACACC TTATTTGATT ATTCACTCTG TGACTAAGTT TATCGGTGGG GATAG'TGGTC GTTTTGACTG GACGGCTTCA CCAAGCTCCC ACAATTTGAG CTATACTCGT GTTCGAGTTC AATTGC?1'CG TGATACAGGT TTGCTACAAA GACTTGAAAC CTCTTCAC'TT GATAT-rGATA ATTTGGAGGA AGTAGAAGCA ATTGAAACCT TGGGTAACCC CTTGATTAAT GCTCATAAAC ATCAAATCCC ACTTGTGTCA AACG'TCTTCT CTCATGGCGT TGACATTGCC CATGG'TACAA CTATTGGAGG AATAATTGTC GGGAAAT'rCC CTCAATTTGT TGACGAGGGT GATGTGGGTG CAGCAGCCTT TATTATAGCT GCAGCCTTGT CACCATTCAA TGCTTTCCTC CCTGTGGAAC GCCATGTACA ACAATT'GTTG ATTTTCTTGT CAAC;CATCCT AAGGTAGAAA AGGTAAATTA GCAGATAGTC CTTATCATGC CTTGGCTGAC AAATACTTGC CAAAAGGTGT TTTACCTTCC ACGTCAAACG TGGCGAGGAA GAAGCACGCA AGGTCATPTGA ATCTTTTCTG ACCTTGCAAA CGCGGCAGAT GCTAAATCGC TTGTTGTCCA ACCACTCACG GTCAATTG'rC AGAAAAAGAC CTAGAAGCAG CAGGTGTCAC AAATGCTGAG 840 TCCAAAACTT 900 CGGTTCAATC 960 TAATwrAGAA 1020 TCCAGCAACA 1080 ACCAAACTAA 1140 ATTCGTTTGT CAATCGGTCT TGAAAATGTA GAACA'PTTGA TTGAAGACTr GCGCI'GGCC TTGGAAAAAA TTAAAGTAA AAGAAGATAA ACACTGGGCT TCGACTCACT GTI"TGATT 1200 1260 1183 ?TCCCTCAGG CATGATATAA TCCTTACAGA AGTCTAGAAA GAGGAACCAT A'rGAACGAAA TCAAATGTCC CAACTGTGGG GAAGTCTA CAGTAAATGA GAGTCAGTAT GCCGAACTCT TGTCCCAAGT GAGAACGGCA GAGTTTGATA AGGAACTACA CGATAGGATG AAGCAGGAAC TGGCCTTGGC TGAGCAAAAG GCCATGAATG AGCAACAGAC TAAACTGCC CAGAAGGATC AAGAAATTGC GCAATTACAG AGTCAGATCC AAAACTTTGA TACAGAAAAA GAATTGCCCA ACAAAGAGGT TGAACACACA AGCCATGAGG CTrCTTGGC TAAGGACAAG GAAC2'ACAGC TCTTAGAAAA TCAGTTGGCT ACCTTGCG'rr TCCAGCATGA 'r'r'CTGACCT
AAAATGAATT-
GTGAACAAGT
AGAAAAAGAA
ATC7"rTGGCT
CGAGTTTTAT
CGGGATCAGG TAAAAACCA TCTGTTAAGC AAAAC'TACGA AAGAATTTTA AGGCTCAACA AAATCAACTA CAAAAGACCC ACTACTTG CAGGAAAAGG AGCCCAGCTrC AAGGCAGCTA ATCTACAAAA GCGATTCGGGG TCGrACTTTC GCCTTTCCAA GTCTAAAGGG GACN'TATCT CATGTTTGAG ATGAAAAACG AAAGCCTAGA ACAGTATGCA GAGAGTGACT ATGCTTACTT TGAGAACGAT AACAAGGTCT 'rCCGTGAGTG TGATGAAAAT GGAGTTGAAA AAGCGGACGG AACAGAGAAG AAGCACAAGA ACCCTCGGGA GAAGAACTGT GAGTATGCCG ACTACTT'rAA CACAGGGATT GTTGACGTCA GTCCTCAATT CTTTATCCAA TTGAI-rGGTC
TTA.ACAACGT
CT1'CCCGTGC
TCATTTCTAT
ATGCAGATTr TTACAAGGAA TTGGACAAGG TT'mrGAC CATGCTTGAC GCTGATAATG GTCACGAGTA TGAAAAA.ATG TATGIT=rC TCTTACGTA.A TGCGGCGCTA AATTCCCTAA 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 AATACAAGCA GGAGTTGGCC TTGGTTCGCG AGCAAAATAT TGACA'rTACC CATN'TGAGG AAGATTTGGA TGCCTTAAG CTAGCrTG CTAAGAACTA TAATTCAGC? TCGACTAACT TTGGAAA.AGC TA'IrGATGAA ATCGACAAGG CCATCAAACG TCCTGACCAC ATCTGAAAAC CAACTCCCTT TAGCTAACAA TAAAAAATT GACCCGGAAA AATCCAACAA 'rGAAACGAA S S 9 0 5555
CATGGAAGAG GTTAAGAAAT CAAATTGGAA GATGTCTCTG GTTCCAAGCA CTGAAGGGGG GGAAGCAGGA ATGACCTCC GAAAAT'rGGT CATGGTGGAA TGGCAAGGCG ACACGCATC AGTAGAAAGC AAAAATGAAC GGTATTATTA ATGATGCCGT TTAAACTG CGTAAGATTT CCTTGGATCC GGATGTGGTG GGTGTTTrGC TCGAGTTTAT GCAGGACGAG GGTAAGATCT CGAAGACTGA GGATGCTACT GGGGAAGTGG ATGAAAAGCT TIGTTGATGAA GCGATTGCTA
ACTTAAAAAA
TGGGAACCAA
CGATTGCGGT
ATGAGGGGGA AATCACTCTG GGCTATTCCA TCGCAGAAAC CCCTGTTTTG TCTCTCTTCG GCTTGACTGG GCCTATTACT CAGATTCCCC CTATGTATTC GGCAGTTAAG GTTA.ATGGTC GCAAGCTCTA TGAGTATGCG CGTGCTGGTC 1184 AGGAAGTGGA GCGTCCAGAA CGTCAGGTGA CCATTTATCA ATTTGAGCGA ACAAGTCCGA TTTCTTATGA TGGCCAACTT GCCCGATTCA CT-r-CGTGT AAAATGCAGT AAAGGGACGT
ACATCCGTAC
ATI'TGACTCG
TTrGCTGAAAA GTGACTrGT 'rrATTGAGCT T'PTGTCAGTr GATrTGGGTG AAAAGCI'GG ?I'ATGCGGCT CATATGTCCC TACTAGTGCT GCTGGCTTAC AATTAGAAGA CGCTCTTGCC TGGAGGAAA AGTAGAGGCT GGGCA.ATTAG ATTI'TCTCCA TCCTTAGAG ATTGGGACAG CAAAGTTC CIYAAGTCCAG AAGAGGCTAC AGAAGTTCGC AGACCAAACG GACAAAGAAC TCGCTGCCTT TGAAGATGAT
TTTGGTCGTT
AAATTGTTAG
3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 CCATTCTAGA AAAACGGGGC AGGAATAAAA ATCGGGTGAT ATGGTTTTGG GAAT-rATAAT ATAGAAATTT AGAGGTGTGA TCCATTACAG GTTCTGATGG AATCTCTATA AGCCAAGGAA AGATAACAAT TGCTTGATAA ATTCCAATTG ?TGCGAGTTG AATGAAGCAA TTrAA.AATTC GAACTTAGGC CCAGGATTTG GGTrTTTTAGC TAGATCGTTT A.ACCCCATAC TAATAGTAGA TAGGTACTCA AATA.ATCTAT TTTCAGATA.A ATATTTAGAG GTGTGATAAT TCCATGATGC 0~ 0 .0 06 0* *0 00 0 S GAAATGAGTT TCGAGAAAGG G'rGGAGCAAC TTC??CAACA AAAAGAAATA AATGAAAATA GTGAGTTGAG TCACCTGTTT CGTCTTGCTA TACAAAA'rTT AGACAGAAAT GAAAAATACC AATCGGTCAT GGCCAATTTG AGTCAAGGGT TGTCACTTTA AGGCACCTAA GTCTGTCATT GATTTTGGTT TATGGA CCTCATGACG CATCATTACC 00 00 4040 0 *000 0 t 00 00 4 INFORMATION FOR SEQ ID NO: 203: SEQUENCE CHARACTERISTICS: LENGTH: 3230 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 203: CATCCAGCAA CTCCTCCTCT GAGCGTTTCA AAATT1GATGT AATTTTTCTA CGTPTTCTA ATAAATGTGC CATTTTCAC CTCGAATTTA ATCGCTATCA TTATAACATA AAAACCTCTC TTTTTCAATA ATTATCTGAA AATTCCTTAT TGACTTGCAT TGACTTACAA TTTAATTAAA AACCAGAATA TTTTTAATTA AATTGTTCCT T rCTATTGA CAAGTTGCCT ATTTTTGTGT ATCATAATAT TATAAAAGAT AATATAATAA TPTTATT'rGT CTTTTCACAT TCGGTCTCCT TATATAAAAA AGCGATTCAT TTTGAACCGC T TTTTCTTAT TTATCGCCTT TCTTACGAAT AACAAAGCCT GTTTCCTTTT CGCTTA.AAGT ATTGCGTGGT TTTTTATTAT CCTTACGGTA ACGTTTTTCC TTATCAAAAC GATCGTTGCC ACGACTTCCT TTTTGAACT CATCACGGCG 1185 ACCATTGCCA CGGCGATCAC GCTCTCGACG GTCGTCCCCA CGACGGCCTC CACGACCTCC CTTAGCrA CCACCGAAAC CATTACCTGA TGGT 'TAAAC GGTACTGGtT AATCTCCACT TCTGGAAGCC TATCTGGGTC TTGGACTG'rC AGACTCAAGA CAATTCTTCT GGAGTAAACT CAGCAGCCAA TTTGCGAGCA TCCTTACCAA GTTGGCACGA ATGGT'CAT CTGCAAAA'rC ACGTTCGATT TT'CTTGAGAG 7"rTGMATTGG AAGGAT'rCTT CTACACTTGC AGTTGAGA CCTTTCATGC CAAGTMTCA ATGAMAA GGTAACCCAT TTCGTTTrGGA GCAACAAAAG ACCTGACTTA CCAGCACGAC CTGTACGACC GATACGGTGA ACATAACTCT TGGAATATCG TAGTTGTACA CATGGGTCAC ACCTGAAA'rA TCCAAACCAC GTCTGTCGCA ACCAAAACAT CAAGATTGCC ATTTTTAAAG TCACGAAGGA TITT=GG TCTAGGTCGC CATGAATTCC 
rI'CTGCACGG AAGCCACGAA ACGAGTCAAT TCATCCACAC GGCGTTrTGCT ACGACCAAAT ACAATACA TGCCACATCC ATGAGACGAG TCATGGTGTC AAAT7MTTC'r TGTTCCTTAA GTACTGGTCA ACCAATTCTG TTGTCArC CTTAGCCGCA ATCTTGACAT TTTCATAAAC TGAACACCGA TACGTT'rGAT GCCATCTGGC ATAGTTGCTG AGTTTGACC TTCTCACGPA CACGAAAT AATCCTTCG TTCA T7rCACGTGC TATACAT'rGC
ATTTCTCAAA
CTACC74GTTT GTTTC7'rAGT
TAATAGATTG
CAGGATCTTG
CCGCTGCAAC
CACGAAGACG
TrrTCAAACC
GTTCTGGTTG
CACGGATATA
540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 GTTAAGCATT TCATCCGCTT CT'rGCGTTTA ATCAAGTCCA TTTAAGAGCC TTAATTTGTT TCCCTTACTA CGACCAAAGC TGGAGCGATG ACCAAGGCTT CAAGCCAAAC GCTGCAGTTT CGTCAAGGAT AAGGGTT'rCA ATGTCTTGTA AGAGGCGACC TGGAGTTCCC ACCACAATAT TTTCAATGCT TGATCCGCCA TATACTGAAC GGAAGAGTTC T'rCTTGACT'r TGGACAGC'rA GGATAGTCGC rrCTTCTGTA CGGATT"TT TTCCTGTACC AGTCTGAGCT TGACCGATAA GTTCACGGGC 1320 AGAAAAGCAA 1380 GGAAGCCCAT 1440 ATTTCAAGGC 1500 GGGCACCAGA 1560 CGACTTTGAC 1620 GTTCACGAGT 1680 CAAGGGTAGG 1740 CATCCTTGCC 1800 TTCAAGGGCC AAAGGAATAG TTTGTTCTTG GATAGGACTA ?'rCAATTTCT GCTAGCAAAT CAGCAGACAA G AATTCA GCTTCTACAA AACCAGCTTT TTAAATTTCA CGTTATTCTT CTTTCTAAAG GTGGTGCGAA GCCACCCTAT AGGGCTTAGT TrATACTTT-T CT=PATCA CGTATTTTCA TATAACTAGA TATAAAATCG TGTTGCTTCT TTTCCACAAA AGAAAAGTAC TGTTTTCTTT GCAACCTATC TAGTATAACA CAAGACCAGA GCAAAAGATA GCCCCATT'TC TACAGAAAAT CATGTAAGCG C7TTlrTGACT TTCrTTTrTG ATTGAACGAC CTAGATAATA AGACAAAGCC AAGGCGATAC TGTATAAAAT GAGAAAAACG AACAAGGTTT GTGTGTACGA 1860 1920 1980 2040 2100 2160 2220 1186 ATGAGCCATr T'rATAAGTCT CTGCTAATAA AATAGGTCCC GCTAAACCAG CCA'rTGCCCA AGCTGTTAAA ATATAPLCCAT GCAGAGCGGC CAATTCCTTG GTTCCAAAAA TATCACTGAG ATAAGCTGGA ATCAAAGAAA AACCAGCTCC ATAGCAAGTC ATCAAAATAG ACATAGCAAC TACAAATAAA ACGGAATCTG TAAAGAGCCA AAGTGAGAGA GAAAAGAAAA GA'rrGACAAG CAGTAATATA CTAAAGGTrA GAGGGCGACC GATATAGI'CA GACAAACTCG CCCAGAGCAA GCGACCAAAT CCATTGAAAA 'rCCCCAAAAC ACCCACCATr ACTGCTGCAT GACTTGTAGA CAAGCCAGCC ATCTCCTGTG CCATTGGCGA TGCCGCTGAA ATTAAGCCTA AACCACAAC TATGTTGATA AAGAAAATAA TCCAAAGCAT ATAAAACCGA TrGCTTTTT-A GAGCCTGATT TGCAGCCATr CCT'rGCGTcA AAGAGGCTGT T TTTCTTTC CCTGAAGAAG ATAAAATN'CC AAGCTCTrGC TCATTTGGAC GCTTAATGAA TTGTGAAGCT AGGAGCATGA TAATAAAGTA ACTTGCTCCT AAAATATAAA AAGT'rTCTAC AAGCCCTACC CCTGCGATGA GCGTTGCGC TATGGGACTA GTCALATAAAG AAGCAAAACC AAACCCCATA ATCCCTAAAC CTGTTGCGAG ACCACGI'TA TCAGGAAACC A'N'TTATAAT CGTCGACACA GGGGTAATAT AGCCTGCTCC CAAACCAAGC CCACCTAAAA TGCCATAACC GAGATACAAC AACCACAGCT CTGACGGTCT ATTGCAAATC CTGTTAAGAT 
ATTTCCACCT GCGTATAGAA AAGCAGATAG ACTTCCCATG ACTPTCGGAC CAAATTTTTC TACCAAACGC CCCATAAATG CAGCCGATAA GCCCAAACAA AAGATTGCTA GACTAAAGGC GAAGGCAACA GAAGCCTGAT CCCATCCCGT INFORMATION FOR SEQ ID NO: 204: Ci) SEQUENCE CHARACTERISTICS: CA) LENGTH: 5096 base pairs TYPE: nucleic acid STRANflEDNESS: double TOPOLOGY: linear 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3230 120 180 240 300 360 420 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 204: CCTATGAACA CTGTCC!CAAC TGGGTGTCCT TCTAGGCTAT CTG AAACTAATTC CAAAATCAGA CTGGGTCTTG CTTCGTGCCT GCT GTAAATTCAG ACACCACACC ATGTTCTTCC AAAT'rCTTGG CA GATTTTTCCT CCAAGCTATA GGTCACAAAA CCACCCPTTAA A GTCCTGC CACTCCAGTC CAGCCAT CTTCTGAGCT GAATATC CAACATCCTT TACTTGA AACTCCAGAA CTGCAGC CGCGATGGTT TAGTTTC TTCCCCATAA ~GTCCAA GATTTGATTT
AAATTCGCCA
TTCCCTTGCC
CCATAGCAAA
CGGTAGCTTG GAAAAGACCT TTTTCAGTTC TTCTACCACA ACTCTCGTAA AGAAATTCCT GCCGTCAAAC TCT ATGCTGGCTA AAC' TCGAAAGTCT GGC, TCCAAGATA'r CCAGCGCTTG A'rrCGCCTCT TM-MTACTGC TAGCCN'GT TGACAGACGT 480 AGAGTGACTr CTCCTGTCCTr GCCATAAGGG GCCAAGGTAG GATCGATCTG ArATCAArr 540 AAATCAGCCA AAATCGTAAC CAACTGG.CTC TCGCCAATCC CAAAGAAACG AAAACTCGG 600 GAATACAGCT TGCTCCCTCT CATCAACTrG GCTAGAAGT GGrrTAAGAC CATGGGTTrC 660 AATTCACTTG GCGGACCTGG AAGGACGACA 'rAGGTCACTC CCTrACTTC TAATrTTCCT 720 CCAACAGCCA GTCCTGTTTC CTTTGGCAGT GGAATCGCTC CrrCTACAAT TTGACT 780 CTTTCGTTAT TCGGTGTTCG GGCATAGTCT GCTCGCAGGG TAAAAAAGAT ATCCAACTTC 840 TCCTGAGCCT GAGGATCAAA GACTAATGCT TTCCCTAAAA A7"TTAGCTAG GG7"rTGTTTG 900 GTTAGGTCG'r CCTCAGTTGG CCCCAAACCG CCTGTCAAAA TCACCAGACT GCTACGT'rGA 960 CTGGCAATCT CAAGCAAAGA CAAGAGACCA ACTTCATTGT CTCCTACAGC CGTCTCAAAA 1020 TATACATCTA CCCCAATCTC AGCTAGTT'rT TCCGACAAAA ACTGGGCATT GGTGT'rGACA 1080 ATCTGCCCTG TCAAAATCTC TG=rCCAACA GCAATGATTT CTGCTTTCAT GTTTCCTCCT 1140 *ACCTATCTAT TCGTATTTTT TTGAAAAAAT CGCAGGAAT'r 'TCCTACGAT TGAT'TTTrTT 1200 *ATT'rGTATCA AAAGTTAATT ATCTTCATCA CCAACAGGTG CTCTGCCAAA TAAATCTTCA 1260 **.AATAAAACCG CATTGGTTTC AACrGAGTA ACTTCT'rCTT GTCCCAAAGA ACGTCGGAGT 1320 .AGA'rTTTGCA TTTCCAACA'r ATG'rGCTCTC GAAACAATCT GCTAAGAAAC ACCTTGA.AGT 1380 :*ATCTCTCCTr CACCCTGCAA CTGCTGAGTT TCAA'rGGT1-r TAAATGAATc TTTATAGCCT 1440 AGCAAG'N'AG GGATACT~r TGCAGACAAA TCAATArrGG TCTGCATATT GTCACTCAAA 1500 GCTTTTAGAA TCTCTTGATA ATGACCAATG CTATrrAAAC TGAGAGCTTT 'FTCCATGACT 1560 TT'rTGAATAA CTTCACGTTG ACGTTTTTGA CGACCATAAT CCCCCTCAGG ATCTTGGTAA 1620 CGCATTCGTG CATAGACTAG GGCTTCTTCT CCCCCAATAT GTTGCTCCCC AACACCGATA 1680 GAAATAGTAT TAAATTCTTC TTGGTCACTG ATAGAAA'rTG GGAAACCTAG GATATTATTG 1740 *ACTGTAATAC CTCCTACTGC ATCCACTAGT TTrTTGCAATC CTCTCATATT GACCATCACA 1800 TAGCGATCAA TATGGATATT CATCAT7MT TGAATGGTTT CTATAGCAAG CTCTGCTCCA 1860 CCATCTGCAT ATGCTGAGTT CAGTTTCGCT TCATGAGCCT GACCATTCCC TG3ATTCAATG 1920 CGCGTCAGAA TATCCCGCTC TAAACTCATC ATTGTTGTTT TrTTCGTTTT AGGA'rTCACT 1980 *GTCATCAAGA TCATGCTATC 
ACrTCTACCG ACCCAACTTT CAGTTCGrTC AACATTTCCG 2040 GTGTCCACTC CCATTAACAG AATGCTTAGA GGTTCAGTCG CTTCAATAAC CTTGGTTTCT 2100 TCACCGArTT 7=TATAGGT TTTAGCTAAG GTTTCTGTCC CTTGTTGATAL AA'rAGTATA.A 2160 GCAAAAACAC CTAC'TCCTAC TTTTTAACCA TATTTCTACT CCCTTCTTCT ATATATGCCC ATAAAGA1'GG ACTGC2'GCTT GCCACTTGCT TGTGCCCACT 1188 TACAGTTACA GAAAGTAAAG AACCTATCAG TTTACCCATC CACGCTCTTG GCTACCTTCA CrAGCACCAT 'rCCAATAAT'r AAGTAAACAT CGATAAATTT ATGACA.AAGC CA'rGCr'N'T
GATTACCACT
CTATCCC
71'GGACAGTC
TTCTAGCAAC
ACTTGGAGAC GACGCAGAAT AAACTTCCCA AGCCAT'rAT TGACGGACTC TCTTACGCTC AAGAATGCAA GTAAGGTTAT 2220 2280 2340 2400 2460 2520 2580 2640 CCAATATCTT TTTCCAATCA CAA'rGAAGAG ATCTCCAATA ATCAGCTGTA ATATTTACAA TACCAGCAAT TTTGCCATr CTGATTGTCC GAACTAGCTT GCTTGTTGAC GAA'rA'rTCC AATACCATCT CCGTCTAGGC 'rGGTAAAGTC TGTCTCCAAA CACTAATTCA GCTGCATCTT TGGGCTCTGC TTCCCTA.ATG GAAGCTCCTC TAACAATTTC TCAGCACGCA AACCCTTTGC C-ATCTGCTTC ="TAGAATT TCCAATTCTA AATAAGCATC GATTTCCCCA CTCAATAACA GTCACGCCGC CACCAAAGAT AATCAGCATC TCCTTCAATA CGATAAACAT CTAGGTGATA ACTCTCTCAC GATAGTATAG GTGGGACTTT TAATCATTTG CAAGTCCT'rr AGTAAAGGTC Gr'rTTACCTG CACCCAGTTC CATTCTTTGC TAATAGATGG CCCAAACGCT CCCCTAAGGC TTGTGTACAT ACTCTTATTA TACCAAAAAC TTTTC7,rTG 'IrATCATCAT AACATCCATA AAAAACAGGC TTTCTCTAAA ACCAATACAA GATCTCGGAA AATATGACCA TAAAAGGAAA ATCTCCTCAC 'rAGTCAAGAG
CTCACACGAT
AGCAATTCAT
T'rAAAAAGGC 2700 ACTCCATATT 2760 CTGAAAATTT AAACGGCGTC TGCCAAGGCA TCTCCTAAGA AAACTCATCC AAGTCGATAG AAGTGGAAGT CGACC'rTCAT AGAAATCTGT AATCCTTTTG TCCAGT'rAAG ATTAAAACAT TTGCAACTCT TCTTCATTTT TGTrCTATTTT CCTACTAAAC AGAAAATGAG CGTAACAATG CTTCC'TTT AACCGAATTT 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 GGGACAAGAT AGGCTGCAAA AAACAAGCCC AGTCCAATAT AAATCAGAAG TGAGACAATG GTCATTGGAT TTCT'rAAGAA AAGAAGTGTT GCTAAAATAG TCACCAACAC TGTCTTT CTGTCCAGCA TAGCAAGAAA ATCGCGCACG TAT=TTCA TCTAGCCCAA ATAGGAAAAA GAAGGATGGC AATAAAAAGT GTATTTTTGA TGAACAAGTT ATCTGACAAA ACAAGAACAG AGTAACATAC TGTAAAAAAG CTTCACCGAC TTCTT'ACTGG 'rGC'rTACGGG TATAAAGATA ATrACTCCA GCACAGATTC CCGATGAAAA AAGCTGTACT TTGTTTAAAG GACAAGATGC CTACTCAAAC TGATTTGAAT TAAAGCTAAC AAAAATAAGA AGGGTAAAAA AATCAGCAAA CAACTAATTC TTGCTGCAGC CTCCTAACAA ATTAATTAAG CTAGGACACT ATGGACTTCT CTGAAACGAA AACCATGCTT ATTCCTTCCA TAGGAAACAG TTCTCATTA TTTCATCTTC TC'TCTCCCTT CCTACCAATC A'IrATACTAG GAGAAAAGAG AGAACTGI-r' CTAATCTTCT 1189
CAAATGTCTC
ATCAA-IACTAG
CTGACGTGAT
ACGGGTGVGTT
TAAGGCTGAT
TATAATGTCA
TTTAAGACGC TAAACAAACA CTAGAGACTA ATACTCAATG AAAATCAAAG GTAGCTAGCC ACAGGTTGCr CAAAACAGTG n"rGAGATT GCAGATAGAG TTGAAGAGAT TTTCGAAGAA TATAAATTTG AAATCATGAA AATCCGTCAA G7'rG'rCTC AGTAT-rAATC
AGATTAACTA
GCACCTCACG GAGCGAGACG GACTCAGAGr CACATAATTA TAACrATCAG CtTmCAGGTT ATrTAACGTr TCAGAAAAAC AACAGTATCT AGTTCCTCA AATAA?'n'TC TATCTTCATC AACATTAAAG GATTGTTATA ACTrGTTCTC TGTCTAGAAA AATTGGTAGC ?1'GTCTGTCT AATCTrACAT
TTTGGCTCCA
TTGTTTACAG
GCTTCArTTAT AT'rTrTCCTT TGATACTAGG TAGCCAAATA AAAACTCTTG ATGGTCCAAA GAGTAG'rTTG CCTCATATTC TTGTTCACGA TrCAACGCCT TTAAATAAAT CAGAGTATTT GACGAAATTG CCTCACACTT ACTGTCATAT TCAAGGTAAG TTCGGTCTTC TAATGTTAGA AGA'rA'TCTA AATCCTCATA ATTTC=GCA AACTCTCTTG CTTCTATATA ATAATTrTG GCATTTCC TA CAAGAATAAG TAGAGGAGCC AGTrCAATCG TTTCAAGAGC TTCTTGGATG TAGTGAGCGT AGT'rGTAACG AACTCTGATG T1 TTTGTCT GATACAACTC TATTAAATGA CCCACTAAGG AATAGAAATT AGATAGACTA GAAGAGACT TTAATAATAT ATTTTCCAAT TGA'rAGAACT CAATTATAGA TTTAATCCAT AAAGTGCTrC GTrCTACTTC TATTTTATAA TCTAATAGGC GACCAGATAG ATGTrwrGAAA 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5096 TTAGAGAGGT TAGACTTAAC 'rTCGATTTGT TCATTGAAAA AGTAATCCAA AGGGACTTCA AGTCGTTGAG AGAGTTTGAA TAACAAGTCT GCGGAGGGAA TAAAATGACC TCTTTCAATT TTACI'AATCT GGCTTTGTTC ACAAATTCCT 'rCTGCAAGAG TTTGTTGGGA GAGTCT INFORMATION FOR SEQ ID NO: 205: SEQUENCE CHARACTERISTICS: LENGTH: 2395 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 205: ACAAGATAAA AATAAAGGAT TACAAPGGGG AATATAAAGT AAACCGGTAA ACCTAAAAAG AAAGGAGAAA AGATGAAAAT TGTACTTGTA GGGCATGGAC ATTTTGCTAC AGGGATTTAT AGTTCTTTAC AATTGA'PTGC AGGTAATCAA GAAAATGTGG AGGCGATTGA CTTI'TGGAA GGAATGTCAG CAGATGAACT CAAGCAAAAA ATCTTACTTG CAATTTCAAA TGAAGAAGAA G'TTTTAATCC TAAGTGATCT CTTGGGAGG.A GGAGAAAATC CAGCCAAGAC AATGAATGTT GAAGCAGTCTr TTGCTAGAAT GGCTCATAC GCGGCCCAGG GCGGAGTCGT AAATGGTAAA CAAGAAGATT TCGAATCCGG TATTTAAAGG AAAAAATAAA A'rCGCCTGAG CGCTTCTTAG GCCAGGCAAT CGATAAGGTT ATTCGGCAGT ATTTCCCGAC CCCAGCTACC TTTGATAATG 1190
TCGCCATTCA
'rTTGATGAGG GAATTGCwrTT
GTAA.AAGAAT
AAGTACCACT
TAGAACTCAA
TCTATCCAAT
AGGTTrTCTTC TACCATAATG TCAACTTAGC CATGTTAATG TTGVTAATA6A ATCAGTAGTG CAACCGATGC AGAGGAAGAG GATAAAAAAG GTTACGATTG TCTGACGAAA GAAGAAG'TCG CCTTGACTAT TTCAAGGAAG CATGGATAAC ACGGAATGGA
CCAATGGTTT
TTAAAAACAT
AATTGGATCA
TAAATGGAGA
GCTATCAAGA
ACCGTTTGAT
CAGGCGATCA
'rAATCCGTGA
CCTTTAAAGG
cATGGGGAGT
ACTTG'N'TAA
ATTGGGATTT
CTGGACAGGA GAACTGTGGT TGGCTTATGA ATACAGTCAA CAGGATGCAT CGCTCATAAA AATGTTCT" CT'rTCCTGGA TCGTGTCAAT AAGAGAGTAG CCATGATCTC GGCTTCrrGT AC-ACACCGTC TTGTATGGCT GAATATAAGA TGGAGAGGCT AGAGAAGCAA CCTTGAAAGC TGCAGATAAG TTGATTGAAC AAAAGGTGGT TTTArTCAAG CTTGGGGAGA CTTGGGCAAG AAAGAGCATT TATCGACTGC TTGCTCAATA TCCAACTCTT ATTCT'rTGCT TATCAAGAAA AAAATACTAC GATATTGCAG AAAGCCATT'r CTATGCTTCA CCTAATAATG TGACGCTT~CG TCCTTCCACA CCTTCTAT-r TGATCC'rGAG ACAGGTCAAC TGTAACGAGA CAAGGGTATA GTGATGATTC ATGCTGGGCA CGCGTCAAT CTATGGTATT CCTTTGACr'T ATCGTCACT'r AAAAGACGAG tCC'rGC'rTTG GGGTGTGACC AATrTATT'rCT TGAATCGTCT GCCAAAAGAT CATGTGTCCT 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 GATTTTTrAAT GATGGTAGTG ATCAATCACG AGATTCTTCA GCAACAGCTA TCGCCGTCTG TGGGATTCAT GAAATGCTAA AACATCTCCC AGAGGTGGAT C GACAAAG ATATTTATAA ACATGCTATG CATGCCATGC TTCGTTCCTT GATCGAACAT TATGCAAATG ATCAATTTAC CCCTGGTGGG ACAAGTCTCC AAGGAGTGGA TGAAGGCAAT ATCTGGGGTG TCTACAAAGA CTGGAACCTA TATTrGGTAG ATGACCCGTA TCGATGAACG GTTGATTCAT
TCCACGCTGT
ACTACTATTA
AGGAGAAATA
GGACAACGAC
GTACTCATGG CATTCAGGTA CCTAGAAGCC CTrATCCGTT TGACAATGCC AAATATTATT AACTTTGGGT AAAATACCTA CGGACAAGAT GCAACAA.ACT GGTTGTA6ATA CGGTCATTGT TGCCAATGAC GAAGTAAGCA CTGATGAAAA CAGTTGTGCC AGACTCAGTT GCCATGCGT'r TCTTCCCTTT ATTGATATCA 'rrCACAAGGC TAATCCTGCT CAAACCATCT TTATCGTrcT AAGGACGCTT TAACCTTGGT AGAAGGTGGT GTCACTATCA A.AGAAATCAA
GCAAAAGGTG
AAAGGATGTG
TATTGGGAAC
1191 AI'TCACAATG CCCCTGGTAA AGAGCAAGTG ACACGCTCCA TCTTCCTGGG TGAAGAGGAC AAGGCGGCCC TCAAGGAATT GAGCCAAACT CATCAAGTAA CATTTAATAC GAAAACAACT CCAACAGGAA ATGATGGAGC TGTCAAGTC AACATTATGG ACTATATTA ACAGAGGAGA TCGTTATGTC GATTAA'rGTA TTTCAAGCGA ?rr'rAATrGG ATrTATGGACA GCTr'rCTGTT TTAGTGGAAT GCTGTrAGGA ATT ACACCA ATAGATGTAT TG? TCTGTCA T TTGGTGTCG GAATTAT'rCT AGGTGATCTG TCATGCTCTr GCAATGGGAG CCAATGGTGA ATTGG INFORMATION FOR SEQ ID NO: 206: SEQUENCE CHARACTERISTICS: LENGTH: 3342 base pairs TYPE: nucleic acid STRANOEDNESS: double TOPOLOGY: linear 2100 2160 2220 2280 2340 2395 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 206: CCTTCTTTAG AGGTTAATTT TGCAAAATCG TCGATTGTTA TATAAGGATT ATTATAGAGA a a. a.
a CTGTTCGCAA AGAATCTCTG ATAGCATTGC CATCTGTTCC TTCAAAATAA AAAATTCTGA ATTAGGTTAG CTCCACCTCC TCATTAGTAA TGTTVTCTATT TTGAAATATA TTTTr'rCT TAAGATAAAC ATCTTCCCAC GAAATTAAAA TCAATCGAAA ATATGTTTTT GAATCrTG AATACAAAAC TATCTCTCTA ATCAATTGGT AAACATACCG TAACTAGAAA AAGAATTATA TGCGTACGGC ACAAATCCCA AAAGTGCTAA TATTGCGACA CCCAAAGAAG TAGAACACCA AATTCCTATC ACTATTTTTT ACTCATTTrGA CAATAACCGA ATGCTAATAA CACTGGAAAT GAAATAGAAG AA-AAAGGGAG TAGCAAGCAT CTCTAGTTTA TAAAAAATGA CCTAGTTCAT GTAATGTAAT TGATATTAAC ATAATAGATT AATGAATCAT TTGGAAAAAT TATCAATAAT AGGAACAATA ACGGAATCAA ACATAAATAT ATGACAGAGT ATACCATTCC TCTAAACTAT TAGCTTCAAA AAGGCGTTTT AAAATGTTCG GAATCATAAT TTTCTAAAAT TAATTTTAtG TAATCCGTTG TTTTGTACTT AATTTTCCCT TCAAGTACAT TCCATCAACT GAGCCTCTGC AATATCTTTG AGTGAATTGG ATCTGTCCts CCATATATGA AAATATATCT CTAAGATATT TTACTCTCAG CAACATCTAA TGTTACAACA AAC=TCCAG CACCCCCCAA TCCTTrCAAT AAAGT'TTTr GTGTCCACAG 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 TATTTAATAT TTTCAACATA TTCTCCCAAT ACATCTTCTC TCTGGTAAGC TCTTTCTTGA CTTCAATTTT ATAAGTTGCC TAAT'rGAAAC TTGGTGTAAT CTGACACATT ATCAGAGCCG CTAATCGAAA AAGATGGCTC ATACGTTTTG TAAATATACA 1192 GGAGAAGAGA TAATTATAAT ATCAGACTCT AATAACTCTT 7rTATAAC ACCTCCATCA TCAGCATITAC ?=TCCTATC AATTCCTTTC TTAAACAACT CTIPC1GAATC AGAATTACAT A'N'TCTAGCT CTGAAIGAZA AGGTGTCCTG AAAGATATAT CAACATTA~r TCTACTAGAA ATGATACTTG AAAGTCTCTT AGTATACTCT AAAGrCTrAG AGTTATGATT TCGCACTCCT CCATATATAA ATATTTTATT TCAAGATAGT ATGGTACAAA ATrTAAACCAA AATCAATACT ATAACACTAG CATATACCTC TTATTATCTC AACGAAAAGC AGTCAATCTT GGTATT A CTATAAATTA AGCTAGA'T CAGTGCGATA TACCTTAAAA CAMAATT CATCCTCTCA ATrrGAATTT AGTAGAT'N'T AACAGACTTT TGTTGACTCA A7TTTTGGAG TAATTTTCAT AAGCCGTT'rA GCAA'rTAGAA TACACTATTA AAAATATT TATTGCTTAA TGAGTCGACA CATTArrACA TT1TAGTTTAA
TAGAACTTTT
TATGTT-rTGT AATCA'rTTCT CTTTATrATA ATAGAATTAC ATATrAAACT CCTCTATTTT AGAAACAAAA
CAAGTAATGA
AAGTATAAGC
GGGGATAACT ATCTNrGT CATTCTGATT AATACCAGTC ACACCTGTAT ACAAAGAAAA U 4 U U 4 4* 44 U U U, U 4 .4 4 4 4
ATCTGGGAAA TTGCTTG'N'T GGACGATACG ATACTCTCCT TCTTTTGATT AACACTACAC AATAAAGACT CCAATTCCAT ACTAGTATCC ATTTC TTCA GTAAAAATTT AT'rATGGCCA TACTTCCATG GCAAAATGTA TCATTATCTA
TATTCATITAC
TGTAGTCGAT
AACTAGCTAC
1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 AATTCCCTCT GGAACACTTT GGGGATGATT AACTAATGTC CCAAATTCTC CACTACACCA CTTCAAAGAA TGAA=ITGA TTTTCTCCCT AGGAACTAGT TGTAAAATTA ATTCrT'rATA TTTTTAAGT CTTGTCACTT TATAAATATT T'M!rAATGTA AAAATTACAC CTGATAGTCC ATGGCCAAAA CTATATCCAA AATTACTATT ATCTCrCTCG CTTACATCAT TATATAGCGT ATCACCTAAA CTTAATACTA GCCTTAGAAC ACG~rCCTTC TCTATTCCTC TCCTATAATA TCTTACCAGT CTATTAATTA AAGGTAGAAG ACCATTAATA TAGTCAGACT TGTTTGAAAC ACTTGCAAAA TCAGTCTrT CAAGCTCAGT TAAAACACTC TT'rATATAAT TTAAGCATGC GAGAGTATTT GTATCGTAAT CCTCTATAAT GGATAGAACA ATGAAATATC CTATATCCCC o AGTTAAACCA AATGTGGTCT TAGATAAACA AACAGATGGC GGAATTGCAG ATAACATTTT ATTGTACAGT TGAGTATATG ATGATTTATC T'rrCAATAA'r TTTACATAGT ACATAAACAG TAATATTCCA GCTCTACCCC TATACATATC ATTmnCCCGTT 'rG'TCAAGAC ACCANTTAGA ACCTTTAAAA TTAACAGA TACTCCAAAT TGGATA'rTCG TCATAAATAT TA'ITAATAAC CAAAGAGTCT GCAATArr'rr CTACTTCATT ATGCAGAATA GTAACTAAAC TTTCATrGG GAG7TTTTTT CTATTAGATA AGTTTAATT'r ATATCCTTTT 7MTGCTGAT CAAAGCTTGGG AAAATAAATT TCAATGA'rAT CAAGTTGCTT TTCTAAAT'IT TCCAAATTAT TATTAGGTAA 1193 ATATrrCATA AAATAGTCAr ATCCAGAAAA TTGATGTAGG GAAATAAAAT GATTTCCAAA ATCATCGTAG ATTTCATTGA TATTTGTATC TGTATAAAAA ATCGGAATAT CTAATAACCT cATT-TGTrcA cATTcGcTrG cTAcAATAcc TTGAI-rAGAA AAcrrATTGC TCCAGAGATr TTCCAATGCT mTT CTAT CTAACATTTC TrCATAAAAA TCAGGATGAT ATAAAAAAGA TAGTACTGAA GCATAGCTAT TTGTGTCTCT AAAAAGTACC CT'rGTcTI-TA AACCATACAA GTT rGCTTrT AATAGCATTT TAAATTCTrC TG3TTTTATTT AACTCTTCAA ATATCAGATA AAAATCCCTA AAACCTTTTT TGAAATCTTT TATATACTTA TCAAATTCTA TATCACCATC CCGAACAGGC AGGTTrTTCC CACCTTCAAA ATCAATTTTC CCA6ATATCAA ACTrTACCTr ATCAGTATTT AAATTAATTA AAACTTGACC AGGGATCCTC TA INFORMATION FOR SEQ ID NO: 207: SEQUENCE CHARACTERISTICS: LENGTH: 3454 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 2880 2940 3000 3060 3120 3180 3240 3300 3342 120 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 207: GAGAA-AAGAA. 
TGTTAAAGAA AAATGATATT GTAGAAGTTG AAA GAAGGGGCAG GAGTTGCCAA GGTAGATGGT TTGGTCTTTT TTG AGTGAAAAAA TTCTCATGCG TGTCCTCAAG GTCAATAAAA AGA GAAA.AATACC TTGTCCAGTC ACCACACCGT AATCAAGATC TAG.
TCAGGAATCG CGGATT'rAGG ACACCTTTCT TATCCAGAAC AGC' TTGTTGA TTTGACCCAT TAGAGAA TGCTTTACCG CAAGTCAAGG ACAGTCTCTA CTTGGTATGG AACATCCAGT AATGGTGTCT TGGAAACAGG GAN'TCTTTA TCCAGGATCC CGTCGTTTTG ATTTAAAACC GTGGTGCGTC GTGGTCACTA AAAGTIrTTTC GTGTTGACCA TCTGTCATGC AAAATATCAA CAAGATTGCT GGAATTGCAG CAAGTATCGC AATAAGGCGC ATTTTTCCGT AAGAATTCC TGTCATTGAC CAAGTCCTAG TTATGACGAA AAGGAACAGT TTCAGGACAA ATCATGGTCG ATTGATTGAA CAAGTTATCA CGACCAGAAT ACCAATGCGA
ATG'
AGG'
ATAJ
TAGC
CTGC
TTGGCTT TGGAAAAGTT 180 ATTTGGC TTACCTGCGT 240 rCAACTT TAAAACCAAG 300 TAGAAGT TGCTGAAACG 360 TGCCCGT TCGTCGAGTG 420 ~CCTCAT GCCCCTTGAA 480 CTCTTCG AGACCTGCTC 540 ;ATTGAT TCGGAATCTT 600 TTTTGGTGAC AACTCGTCCA AGCAGTTCCC AGAGATTGTG TTTTTGGTAA GGAGTGGCGC ACTCTTTATG GTCAAGACTA TATTACGGAC CAGATGTTGG GA.AATGACTT CCAAATCGCI' 84 840 1194 CGCCCAGCCT TTTACCAAGT CAATACTGAA A'rCGCGGAGA GACTrTTGCAG AGTTAAAAAA AGATGATGTG ATTATTGATG ATTCCTTAT CAGTCGCCAA GCATGICAAA GAACTCTACG GCAGTAGAGA ATAGCCAGAA GAATGCTTCT TTGAACAAGA TGTGACACGG CTGAAAATGC CATGAAGAAA TGGCTCAAC ATCTTGG?'rG ATCCTCCACG CAAGCGCTrG ACAGAAAGCT AACTCTATCA AACAGCCATT CCTA7"rCTGG TArTTGAACC GTGTTrGAACr GATT-CCACAA TTrACTAATGC CCACTATGTC AAGGTATTCA ACCAACCGTT TTATCAAAGC AAGCGCCCAA ACAGGAGCCG ATCGCATCGC CTATATCTCC TGCAATGTCG CAACCATGGC GCG1GATATT AAACTATACC AAGAGTTGGG ATATGAATTG AAGAAACTCC AGCCGGTGGA TCTATTTCCT CAAACGCATC ACGTCGAGAC GGTAGCACTT AGTGTTGAAA TTGAGCTGGA TGAGATGGAT GCTCAAATCA AAGAATATGT TTGGAATAAA GCACAGATAA AAAAGAAATG TGGAATAGAA GATAAACAAA TTATTCCACA GTGTACACCT AGACACTTCA AAATGATTTA ATAGAAAAGA CAT1TCCTACT TGGTATAGGA ACAGC'rATTA TAGGCTCAAT ATAAAGATTG ATAGGATCAT GATAAAGGCA ACAAAAAATT TTAGGATAAA AAAGATAAAG AGGTTT'ATGA TAAAGTTTGT A'rGGAGGTrGC 7GAACTTGC TTGTTGG'I-IT TA'rGTAAATA TAAGGAGTTC AAGACTTCTA TGGTGTGCTG CTTACAATAT CCATTTTAAT TTTAAAGATT CTrAAAGAG TCTTATTTTG GATAGAAATT AAAAACTCTA TCGTCTTTTT AGGATAAGAG GTCCCACTTA AAACAA'rTTA ATTCTCTAAA ACAAATCACT AAATCAATGT
GAAAAAGAAG
ATGACAC'TAT
'NTCCTTTCTT
TTTTATA'r-I-
TTTGCTAAGT
GAA'rATCTTA
CGTGTCATAA
CCAAAGTTTA
AATGGATATT
TGATGAAAAT
TATACI'CAAA
AAGCCATCAT
ATCACTI'TCT
GCAAGGTATC
AAAGGAGCGT
TGTATGCCTC
TTGTCCAAAC TCGATGTCGA TTGACAAGTG CGGAGAGCAA r'rTGAATTAA AAGT'rTCGAC TTACQAGAAC A'rTACAACAA
TAAGCACATA
AGCAACATAT
ATTATATATT
GTCTAAAAAG
GGATGCTTTG
GCATTTATTA
AATTAGAAAA
TGAAATGATT
TI=TATGA.AA
900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 GTCCTCAN'T GAATAAAGAT CAGTTATAGA GGCAAATAGT AAACTCAAAA AATAAATAGT G'rAAGCAGCA CCCCcAtGAA T'rAA'rATGTA AATCTCAGAC ATTAGGAGGT AAAAATGGTA TGGCAAAATA AGGACGGAAT AACACAACAA AAGATTGAAT GAAATCAATA TTTATGCTAT AATTAAATAA ATTTAATGAA GAAAAAAAGA GGGATATTAT GGCACTTAAC TATAAACCAT TATGGATACA GTTAGCAA.AA AAAGGACTAA AGAAAACAGA TGTAATAGCT ATGGCAGGAC TTACAACAAA 'rGT'rATGGCA CAAATCGGA-A AGGATAAACC AATTACATT'r AAGAATT'TAG AAAGAATATG TAAGGCTTTA TCTTGCACTC CTAATGATAT TATTAGTTTT GAAGATATT TTAGTGACGA GGAATAGAAA ATGACTTTAA GGACAGAAGA TCAAGTTAGG GATTATGCAA 1195 GAGAAGTATA GGCTTTAATG AAGTTGAAGA AAACATCAAT CAAGGTAC"G G'rCAAATAAC TACTrrTAAT CAATTAGGCT TCAAGGGATA ?TCAAATAAC CCAGATGGTr GGTATrTACC TAAAAATATG AATGATGTAG CAATAATCCT TGAAACAAAA TCAGAAGAAA GAGATATrAG CAAACAAATT TT-rATrGATG AG? AATGAA AAATATAGAC ATAATTTAAC TAAAAATAAA AACTAGATCC TT'NrGAAA AAATTATATr A7-rAAATTTG TAACTGTATC TATrGACAAT GATAArATr ATCGATACAA TAGACTTGAA ATATGTTTAA GGAGTTTTA TGAAAaCAAA TrTTTTTCTAA TMGCTATTTT AGCTATGTGT ATAGT'I-I-TA GCGCTTGTTC TTCTAATTCT GTTAAAAATG AAGAAAATAC TTCTAAAGAG CATGCGCCTG ATAAAATAGT TTTAGATCAT GC'NTrCGGTC AAACTATATT AGATAAAAA.A CCTGAAAGAG TTGCAACTAT TCTGGGGA AATCATGATG TAGCATTAGC TTTAGGAATA GTTCCT-rrG GATTTTCAAA AGCAAATTAC GGTGTAAGTG CTGATAAAGG AG'N'TACCA TGGACAGAAG AAAAAATCAA AGAACTAAAT GGTAAAGCTA ACCTATTrGA CGATT'rGGAT GGACTTAACT PTGAAGCAAT ATCAAATTCT AAACCAGATG TTATCTTAGC AGGTTATTCT GGTATAACTA AAGAAGATTA TGACACTCTA TCAAAAATTG CTCCTGTAGC AGCATACAAA TC'rG INFORMATION FOR SEQ ID NO: 208: SEQUENCE CHARACTERISTICS: LENGTH: 3752 base pairs TYPE: nucleic acid C) STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 208: CGGGAGTATA CTTAATATAA TTATAGTCTA AAAATGACTA TCAGAAAAGA GGTAAATTTA GATGAATAAG AAAAAAATGA T1 TTAACAAG TCTAGCCAGC GTCGCTATCT TAGGGGCTGG TTTTGTTACG TCTCAGCCTA CTTTTGTAAG AGCAGAAGAA TCTCCA(;AAG TTGTCGAAAA ATCTTCATTA GAGAAGAAAT 
ATGAGGAAGC AAAAGCAAAA GCTGATACTG CCAAGAAAGA TTACGAAACG GCTAAAAAGA AAGCAGAAGA CGCTCAGAAA AAGTATGAAG ATGATCAGAA GAGAACTGAG GAGAAAGCTC GAAAAGAAGC AGAAGCATCT CAAAAATTGA ATGATGTGGC GCTTGTTGTT CAAAATGCAT ATAAAGAGTA CCGAGAAGTT CAAAATCAAC GTAGTAAATA TAAATCTGAC GCTGAATATC AGAAAAAATT AACAGAGGTC GACTCTAAAA TAGAGAAGC TAGGAAAGAG CAACAGGACT TGCAAAATAA ATTTAATGAA GTAAGAGCAG TTGTAGTTCC 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3454 120 180 240 300 360 420 480 540 1196 TGAACCAAAT GCG?1'GGCTG AGACTAAGAA AAAAGCAGAA CAAGCTAAAG CACAAGAAAA AGTAGCTAAG AGAAAATATG ATTATGCAAC TCTAAAGGTA GCACTAGCGA AGAAAGAAGT AGAGGCTAAG GAACI'CAAA rrGAAAAACT 'rCAATATGAA ATrrCTACTT TGGAACAAGA AGTTGrCTACT CCTCAACATC AAGTAGATAA 11'TCAAAAAA C7rCTTGCTG GTrGGATCC TrATGATGGC ACAGAAGTTA TAGAAGCTAA ATAAAAAAA GGAGAAGCTG AGCTAAACGC TAAACAAGCT GAGTTAGCAA AAAAACAAAC AGAACTTGAA AAACTTCTTG ACAGCCrTGA TCCTGAAGGT AAGACTCAGG ATGAAXTAGA TAAAGAAGCA GAAGAAGCTG AGTTGGATAA AAAAGCTGAT GAACTTCAAA ATAAAGTTGC TGANTTAGAA AAAGAAATTA GTAACCTTGA AATATTACTT GGAGGGGCTG ATCCTGAAGA TGATACTGCT GCTCTTCAAA ATAAATTrAGC 840 900 960 1020 1080 1140 TTGACAGCCI' TGCTAAAAAA GCTGAGTTAG TGATCCTGAA GGTAAGACTC TAAAAAAGCT GATGAACTTC TGAAATATTA CTTGGAGGGG AGCTACTAAA AAAGCTGAAT CAAAAAAACA AACAGAACTT GAA.AAACTTC
AGGATGAA'TT
AAAATAAAGT
CTGATTCTGA
TGGAAAAAAC
AGATAAAGAA GCAGAACAAG CI'GAGTTGGA TCCATTTA GAAAAAGAAA TTAGTAACCT AGATGATACT GCTGCTCTTC AAAATAAATT TCAAAAAGAA TTAGATGCAG CTCTTAATGA 1200 1260 1320 1380 1440 1500 1560 GTTAGGCCCT GATGGAGATG AAGAAGAAAC TCCAGCGCCG AGCTCCTGCA CCAAAACCAG AGCAACCAGC TCCAGCTCCA TGCACCAAAA CCAGAGCAAC CAGCTCCAGC TCCAAAACCA AAAACCAGAG CAACCAGCTA AGCCGGAGAA ACCAGCTGAA ACCAGCCACT CCAAAAACAG GCTGGAAACA AGAAAACGGT GCTCCTCAAC CAGAGCAACC AA.ACCAGAGC AACCAGCTCC GAGCAACCAG CTCCAGCTCC
GAGCCTACTC
ATGTGGTATT
AACCAGAAAA 1620 TCTACA.ATAC 1680 TGATCGTTCA ATGGCAATAG TAACGCCGCT ATGGCAACAG ATCACGTGCT ATGAAAGCAA CAGCAATGGC GCTATGCCGA CGCTAATCGT GATATGGCGA CGCTAATGGT GATATGGCGA CGCTAACGGT GCTATGCCTA CCCTAACGGT TCAATGGCAA AGCATCACGT GCTATGAAAG GTTGGCTCCA AAACAACGGT 'rCATGGTACT ACCTAAACGC GTTGGGTGA-A AGATGGAGAT ACCTGGTAC'? ATCTTGAAC GCCAATGGTT CAAAGTATCA GATAAATGGT ACTATGTCAA CAGGCTGGCT CCAATACAAT GGCTCATGGT ACTACCTCAA CAGGATGGCT CCAATACAAC GGTTCATGGT ATTACCTCAA CAGGATGGGC TAAAGTCAAC GG'rTCATGGT ACTACCrAAA CAGGTTGGGC TAAAGTCAAC GGTTCATGGT ACTACCTAAA CAGGTTGGGT GAAAGATGGA GATACCTGGT ACTATCTTGA CAAGCCAATG GTTCAAAGTA TCAGATAAAT GGTACTATGT 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 CAATGGCTTA GGTGCCCTTG CAGTCAACAC AACTGTAGAT GGCTATAAAG TCAATGCCAA TGGTGAATGG GTTTAAGCCG ATTAAATTAA ATCATGTTAA GAACATTTGA CATrTTAATT TTrGAAACAAA GATAAGG?1C GATTGAATAG ATTTATCTTC GTATrCTrTA GGTACCTATC TTATGATTTC AGGAAATGTC ATTAAAAAA CGACTCATTT TCTCTAACCT GAAAAATAGA TTAGAGAAAA TGGGTTGTTT TATCTATTAT AGTrATITrGA A'rGAAGMnTAA GAAGAAGGTA TACTCACATC ATTCACATAA TCTGTATAT GACTATAAGT TTTAAAAAAC AA'TI'?1'AAG CTCTTCCTTG TCTTCTCrAA CCAAGCGTGT TATAATGAAT ACTGCTCAAG CGACCTTCAA TCGTGA.AGCA CACACGACCT 'rCAATCGTGA ATA.AACGAAT AGATGGGAGA CTrACCATGA G'rGATAACTC
CGGCTCTTTT
ATGACACAGA
TAAAACACGT
GCTCAAGGAG
'rGA.AAACGGC GTTrGTCGTGG GGATGAGTGG TGGTCTTGAT TCGTCGGTGA CAGGGC'TACG ATG1'GATCGG TATCTTCATG AAGAACTGGG GTCTGTAC!GG CGACCGAAGA TTACAAGGAT GTGGTTGCGG CCCTrACTACT CTGTCAATTT TGAAAAAGAG TACTGGGACC GCGGAATACC GTGCAGGOCG CACGCCAAAT CCGGACGTTA TOOCAGACCA GATrGGCATT GCGTTTTTGA GTATTTCCTA
TGTGCA.ACAA
ACTATGTAGC
GGAAATCAAG TTCAAGGCCT GACTGGGCAT TATGCTCGAG TGCTTCGTGG CGTGGACAAT GGCAAGGATC AACAACTTCA AAAA.ACCATG TTCCCAC'rAG TAGCAGAAGA AGCAGGCCTr TCGACTGCTA TCGGAGAAAA GAACTTTAAA AACT'rTCTC:A TGATGACTGT GGATGGTCGC GATATGGGCG GTCAGCGTGG CGGACTCGGT ATCGGTGGGC TTGTCGGAAA AGATCTAAGC AAGAATATTC CGCTCATGTC AACTAGCCTA GAAGCCAGTC AGTTrACGCT AGAATGTACG GCTAAATTCC T'rCATGTCAA AGGAGAAAAG ACAGAGGTCA CAGGACAGGC AGTTGTCTTT TACGATGGCG TTTTGGACTA TGCCATAACC TTGGGGGCAG TGGCGCGTGA TGAGGATGGT ACCGTTCACA AGACCTATTT CCTCAGCCA-A CTTTCGCAAG GACA'I1TGGA AAAGCCTGAA GTACGCAGAC AGAAGAAAGA CTCGACAGGG ATT'rGCTTTA GCAACTACCT GCCAGCTCAG CCTGGTCGCA AGCATGCAGG TCTTATGTAC TATACAATCG AACACGGCGG TGACAATGCC CCTTGGTTCG TCTATGTAGG ACAAGGATTC TACCATGATT AAGI'CCACTT TACTCGTGA6A ATGCCAGAAG 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3752
GTTACCGTCA
TCTTTGCGGA
GG
GCCTGACTCT AAGGTGACCG ACCACAACGC GCGATTACAC INFORMATION FOR SEQ ID NO: 209: SEQUENCE CHARACTERISTICS: LENGTH: 3580 base pairs I(B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 1198 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 209: TATTTATATT TrrrATCTC AGTGCCTTTC CACCTCT'M' T'rTAAACCTT TCTGTCTTAG TGGCATACTT TGATACCTTT TTAGACTTAA AGTCTrTAAT TATCTATAAA GATI'CTCCTA CATCATAATT CATlwTTTA TTTGTC?1'TA TCTTCTTCAT ACCATTTT1-AA GATTGTCACA
TACTGCTTT
ATATCTGTGT
ACATTTTTGA
GATAGGTCTT
GTGAGTTTAA
ATTCTCCATA
ACCACTGCTT TCCATGTATC T7rTTC7TTT AGATTTTTAT AAAAATGGGG GTGGACTTTT TCTT'rATCTA TCTCTATATC TTTCCATGTA ATTCCAATCT CGGTAATTTA AT'TTGCATAT CTCGCAATAC TGTCCTACAT 'rGGATAGTTT ATTTATCATT ATTCTTM=r GCTTAACCTI' TATCTATCTC TCCCTCTCTC GGAGTACCTC TACTGTC'rAT A'PTTGATCTT TATATTCAGT GCTACTTCTT TATTCAATTC CTTrGCTTTAT ATTTATGATA AAAACrCTAT AATTTT'rATC TTTGTAGCAC TAA'rTCTAGG CCTGTTGGTA CTTAACTCGA ATNTTTTAAA GCTTGCCI'AA TAATTGAAGT TTTAT'TTTT AATTTTAAAC AATGAATTTT AAAGACTGCT CCTAAAAATG AAACAGATAT ATC'rATATCT TCGTAGTAAC CTAAGATACC AGTTTTTCCA TCGAGTAAAT ATC'TTTTGG
TAAATAGAAT
CATATCTAGG
AAAATTTCA
ATTGTCAATA
AATAGATGAG
240 300 360 420 480 540 600 660 720 780 840 900
T'rTCCCCTT'r TTTTCGGTA.A TAAATATTTC T?1'TTATTTT GT'rGTCTGAT ATTT'rTCCTA CCTGTCCTTT GTAGGATGAG TA'TTTTCTAG ATT7-rCyTGA ATAACTTTTT ACTTGAAGTT TTAGCTTTTG AACTAGTCGT TGTACTTTCT TTTTGTTTAT TATCAGTCCT GATCTTTTTA ATATTGCTGT TA'rTCTCTAT ATCCTATTTT TCATTCATGA TATTCTTTTA CTAATTTTAT CTTAAATTCT GTGCTGTATT TGCCATTAAA AAACTGACCT CC'rTTAGTTA GTTI-TTTGGC CTAACTTTTG AGGGTCAGTT CAAAATTTGC GACT'rTTAALA TCAATTCCAA TATTCAATTA TTAAGAGTTA ACATGGTGCT TCCCAATAGG AATCATTAGA GGCGAATTGG AAALTAGGGTC ACGTATAATT TTTGCT'rCA.A GATTAAAGAT ATCTTTAACr AGTTTATCAT TTAGTATATC TTCAGGCTrT CCCTCTGCAA CAAGTTTACC TTCI'TAATT GCAAATAGGT AATCAGCGTA TCTTGCTGTT AGA'"TATAT CGTGCAAAAT CATGCAAATG GTTGTCTTAT ATTTTTGGT'r 960 1020 1080 1140 1200 1260 1320 1380 TAGATCAGTC AAGAGGTCTA C'rCATCTAAA AGTAGGATAC CC"TTACCC CCAGAAAGTT AACCATTCAT CTGTTrATTA GTAGGGGAAA CGACCACGGC ATAGTTCTAT TTGATATGAG ATATCCAAGT ?PGTATCTTG GGCTAGGGCT AGAGCTATCC CTTCAACTAG GTTAT'rTGCT AGATCTTCAA
AAGTAGTTGG
ATACTCTTTG
CATTGGCCT'r 1440 1500 1560 1620 1680 1740 TTTCAAGGTC ATCTrTTTCCA AGACTCTTAA AAGGCTTTCT TTACAAGATC AGCTACTGTTr ATTGATTCAG GGATTA'ITCG AGATTGAGGT AATATAGCTA TGTCTrTTTCC TAAATCTT'rT TCTTTATAAG AATTAATTGA ?TTAATATCA AGCAATACTT CTCCCTCTAA TGCTTTATA AGTCGAGACA AG7=1'AAT -GAGTGTTGAT TTCCCACAAC CATGACCC AATAATAACT GATATTN'Trr CTTCAGGTAT 1800 1860 7?PMATATTT A'rATTrCCA AXA'rA rr CATCATAA CCACAGACCT TTCArrATA'r ATTCCTCCTG TTCATrTTA GGTGAACCTA ACAAGCCAGT TACAACACCT ACTCGGATATC GAGAATATGTr CTGATAACAA CTT?~TTGC CAATATTTAA CCTGTAATTG AAGTAGAAAA GAAAGCTCGG GATTTGCTCC AGTCTTTrTAT TAAAAAATAA GGTATGTCAT CTAACTTTGT AATTCATATC TTGCTACT AACTAGTAAA AT'rCCAACCA CCGCAG4GTAA GATTATTT-GA TTAGTAAGTA TATAAGTAT TAGCTGGTAA AATATTGA ATCCAGCTAA TATTGGGCTI' AAGATATACA AGCTATTGGT AAAAAArTAA AACAAGCCTT CACCAAGTTC AATAATTTCT TACTTACTAT TAGAACAAGA
GGCTATGGGA
AGCAGTTAAA
AAGTCCGATT
AACTAATATA
AAJAAGATAAA
CAACAATAAA
CCAGCTAAAA
GATACAGCGC
GCTAT'rTCTT
GTAGCAATAA
2100 2160 2220 2280 2340 2400 2460 GAGCCACTGA GCCATCTCAT AACTTCTTcT AATGAGGTGC CTGCTrTGT GACACCTTGA AAACCAATAC CTAATATTAT CAGTCTTCr AATAATATTA AAGATGATGT TAGTCCACAA -L-TGT'T'A ATACCAATAT GCAAAAGACC ATTATATCAG CA GCAAG AGGATTTCT= GCTGAAAAAC CATCTTTTTT GTTATTGAAA TAATTCCAGT AGCTAGTAAA 2520 AGTTrAAACTA 2580 GCTGCAATAG ATGAAGAACT TGTGACACCG AACATAGTTT GAAAGATAAA TCCTGCCAAT CCAAAAGACC AGCCAGCTAT AATTCCTGCT AATA.ATT'rTG GTAATCTAAT TTCCATAATC GAAAAACTAG CTCCAGGAAC AGT'rrCACTA TTTAAGACTT TAATCAAACT TGAAAAAGAA TAACTTTCA'r CTCCGATAAG TAAAATGAAA AATGATAGAC TGATTATTAT ACTGAGGAAA ATAGTGTTAT TCTATTTTT CTTTTTGAA TACCTATA-AT ATTrAGTTATT AACCCCTCTA TTTTTCATAG TTACATAAAT AAGTACTGGA TTGCAGTAAT TATCCCTACT TCAATT'TCAC CTGGTTTACC TAACATACGG
TAATAAAAAT
TAAATTTTGC
CCCCCGATTA
CCGATTATAT
2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 CACATATAAG CAAGAGCTCT GCACCTATAA AAGATGAAGA AATGGTCATT GTGCGTATAT CTTTGCTTAT AAATAAGCCA CAAAAGTGAG GAACTATAAG ACCTACGAAG CCAATAGGTC CACCAATTGC AGTAATACTT GAACATAAAA GCACACTTGC AATTA'IrGCA AGTGATCTTA TCCTATTAAC ATTAACTCCA AGACCAACAG CCAT7TCATC ACCCATAGcT AAAGCGTTTA AATCTGATGA AATAAATATA GCTATCAAGT GACCTAAAAT TATAAAACGT AGTAGTGTAG ATATAGAAGA TAATGTAGCT GCTCCAAG4GC TACCTATTTG CCAAAATCTA AATTTGTCTA AGACGTTATT ATTCCGTAAA ATTAAAAAAC TTACAAAACT GCTTAAAGCC ATACTAACAC 1200 AAGT'rCCTGA TAAGGCAAGT qr ATAGGGG TAAGGCCTGC TT7TCCGTTA CAGCAATCGC GTATACAAAA ATTGCACTTA CTAAGCCACC AATGATTGCG INFORMATION FOR SEQ ID NO: 210: SEQUENCE CHARACTERISTICS: LENGTH: 11378 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 210: CCAAATTGCT CCACAATTAT TATGGAGTCG TCGTTTGGCA GATCGGCGTG TCAAGAATGG TTGACAGGCA AGATATTGAC CCCCTATGAT ATGAATCGTA CAATATTTTA ACCCGTCTTC ATCGCTCACG TCCGTTGATG ACACAATTGA CTATGCCATG GAAACACCTG TAGATTTACT ACAGTCTTC CAGGAAACGG
ATATGTGTGC
AGCAAATCGT
GTCGTTTGGG
TTTGCGTAAA AATCATTTTA ATTTAGAGAG GACCATGCGA GACAGATAGT GGCTTGATT GTTTGATGTG GCCCATATGC TCAGTGAAGT CATGGCTGAT TTACGTCAGA CTATTCCAG CCATTGTCCA TGGAGATGTA CGACATAGTA ATTGGATTGA ATTTAGTAGA TTGGGATTCG GTTCGCTTGA CCGATCGCAT TCTGCCATTA TATTTCAGAA CATCAGTGGA AGGAATGGTT 3540 3580 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 GACCTACTAC GGTTACAAGT ACAATCAAAC ATTGTCTTAT TTGAGTCAGA TTTCCAAGTA TCGGGAGATT CATGGTTTGC GTCATTTCCG AGAAATCGTA AAGGGGCAAC AGAATACTA CCCTTGCAAG CCAAGGCAAA ATGGCOGGAC GAAGTTGGAA GTGGAAAGGG TGCCTTTGTT AACTATATCG GGATTGATAT TCAAAAGTCT GAAGTTGGAG TGCCTAACAT CAAGCTCTTG TTTGAAGACG GTGAGATTGA TCGCTTGTAT CGCCATGAAA AGCGTCGTTT GACCTACAAG CCTGAAAATG GAGAAATTCA TTCAAGACG GGTATTAAGT AAATTGTATT GGTATGGTCA
TTATATGAAC
AGACAAGTAT
GAGGCAAATC
TTGTTTGGCA
TCAGGTATGG
GTTTTGAGCT
TGTAGATG
CTGAACTTTT
CAAGATTTAG AAAATGTCAA GGAAAGAGAA GATGAGAGTT CCCAGTATGT GGTCCTCAAT ATGATAATCC CATTCATGTG CCAAGCAAAA CCCTGACATC ACGCTTTGGA CAAGGTGCTr GTTCTGACTT AACTGACTAC CAGATCCATG GCCGAAAAAA ACCTTCTTGG ATACCTTCAA ACGTATCTTG GATAACCGTG GCTTGTTTGA GTACAGTTTA GTGAGCTTTT CTCAATATGG CATGAAACTC AATGGTGTCT GGTTAGATTT GCATGCCAGT GATTTTGAAG GCAATGTCAT GACAGAATAC GAGCAAAAAT TCTCAAACAA GGGGCAAGTT ATCTACCGAG TTGAGGCAGA ATTTTAAGAG ATAACCTAAA ATTAGGCTGT ACAAGTGCTT 1201 TTGCTTTACA TAAGTTGGCA AACGTGCrA'r ACTGATACI'A AGAATATGAA AAGTGACGCG GGGAAATATC TTCGCCTCTT CATGAGG ACGTCGACGC AATCGCAACA ATCCTAGAAT TAGTCAGAGA AGTTrGTAGAA CCTGTCATAG AAGCTCCTTT TGAACTCGTG GATATCGAGT ATGGAAAGAT TGGCAGTGAC ATGATTCTCA GTATTTTTrCT CTTGAACGAC ACGGCAGACT TGACAGAAAT TATCACTCCT
ACATCCC'TTC
GAAAACCAAG
AGCCA'rCGAT CCAGAACAAT ATNTCCTAGA GATGCCGTCG CTGGAGCGGT AAGCAAAAGG TC7*TrGAAGG GACTATGGAA TATATCGACA AGACGCGTAA ATCAAAACCA CGTTTACCAG TTAAATTATA AAAGTGAAGA AAACATGAGI' AAAGAAATGC AGGGAATCAA AAAAGAAGAT ATCATCGACG GCAGACGCTA TGGTCACTCA GACAGCGTAG TTACAGTTrTA TACTGTCCGT GAAGTTGTTG GCTTGAAAGA TGCTCTTGCC ATTAATTCAG AAGAAGCACC AGCTGAGTTT GGTCGTGTAG AAAAAATGCG CAAgCAAACA CGTGCCATCA AATTACCAG1T
TGGAAAATAC
AACCTTGTTG
GAAAACCGTC
GAAAAAGAAA
TAGAGGCCTT
CAGTAGTAGA
CTAT'rGACTT ATGAAGTA'1r
CTTATGAACT
CAGCCCAATC
CTTACAATAC
AGATAAACCC GAAGAATTAC GTCCTAGACA CCATCAAGCC CCAGGTTTGG AACGTCCTTT ATCCATGTCG GGCTCTACCA GCCTTCGAAG AGGACGAGTT CAAATTCCAT ACAGTTTAGT GGATAGCTTT TGAGGATTCA CCGCATT'rG GAAGAAGACA GTCGCTTCGT TCCGCTTATC CAACGAAAAA ACAGGTGACT TGATAGCCGT TTGGAAATCA TGGAGACAAA ATCAAGTTTG TGCCAAACAA ACC.ATCATGG TTACAAAGAA CATGAGCAAG 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 AA.ATCATGTC TGGTACAGTA GAACGCTrTG ACAACCGCTT TATCTATGTC C-ATCGAAGC CCAATTGTCA AAACAAGACC AAATTCCTGG AGAACTTT'TT ATCGTA'rCGA AGTTTATGIT TACAAGGTTG AAGACAACCC TCGTGGTGTG TTAGCCGTAG TCATCCAGAA ATGATCAAAC GTTTAATGGA GCAAGAAATT ATGATGGAAC 'rG'!GAkATC ATGAGCGTGG CTCGTGAAGC AGGTGACCGT CTGTTCGTAG CCACAATCCA AACGTGGATG CTATCGGTAC AATCGTTGGA CTAATATCAA GAAGATTACT AGCAAATTCC ACCCAGCTCG rrACGATGCT GCA'rGGTACC AATCGAAGAA AATATCGATG TTATCGAGTG_ GGTAGCAGAT
AACCTTGGTA
G=TCTCATG
AACGTCTTTG
CCAGAAG'r'r'
ACGAAGGTTG
CGTIGGTGGTG
AAAAATGACC
CCAGCTGAAT
TTATCTACAA TGCCATCGCT CCTGCTGAGG TTGACCAAGT TrATC7rTGAT GAAAACGACA GCAAACGTGC CTrTGGTGGTT GTTCCAGATA ACAAGCTTC TCTTGCCATT GGTCGTCGTG GACAAAACGT GCGCTTGGCG GCTCACTTGA CTGGr'rACCG TATCGATATC AAGTCI'GCTA GCGAATTTGA AGCCATGGAA GACGCTGCTT CAGTAGAGTT GGAAGTAGAA AACGATACTG 1202 TAGAAGAATA AAAGCTGCTA GAGGAGGGAA AGATGAAAAC AAGAAAAATC CCTrTGCGCA AGTCTGTTcT GTCTAACGAA GTGATTGATA AGCGTGATTT GCTCCGCATT GTCAAGAACA AGGAAGGACA AGTCTTTATr GATcCTACGG GCAAGGCCAA TGGCCGCGGC GCTTATATCA
AACTAGACAA
GCATGGAAGT
AAAGAAGAGA
CGAGCAGGGC
GCCAAGTTGG
AAAAGTCATT
GCAGTCGGGA
AGGTCTWNA
CGCAAAAGAA
GGATGTGAPA
TGCAGAAGCC CTAGAGGCGA GGAAGAAAGC T1rTATGAcG GTTGCCAC~r GAATAAGCAA GCATCATATC GGGTG.AAGAA TCTTTCTAGC TCATGATGCT ATTATCAAGT AGAAATTGTA AATCGAGAAA GGTT'rTGGCI' TGGAATAGAA GAGGAGGACA CT'rGGAAAAG AAAGTAAAGA AGCCACTCAT CAAGTGTGGA AAAAGAAGAA GGTCN'TAAC CGCAGCTI'rA AGTTGATCGC TTATGTGGAT CACAAAGTGA AAGATAAGTA ATCTCTTGGG GCTTGCTCAG TTrGGTGGTCA AGGCCATTCA AGACGGCAAG GGACCCAATC TGACCAAGAA GATTCAAGAT ACCGTGTTTT CAACACTGGA ATTAAGCATA GTAACAGA'rG CTGGATT'rAC AAAGAAAATG TGAT7"rGTCT AAGAAAAGAT TGTACGAAAT AGTTGTAGCG CCrTGCAAAAG AGTTGGGCTT AGAAGCTG'rC GCTGCAAAAA TTGCTGCCAG
CTTTAAGCCT GCAGCTGCTC CGAAAGTAGA AGCAAAACCT AGAAAAGAAA GCCGAAAAAT CTGAGCCAGC TAAACCAGCT ACCrGCAGCC CCAAAAGCAA GTGCAGAAAA GAAAGCCGAA AGCTGTAGCC AAGGAAGAGG CAAAACCAGC TGAGCCAGTC
GCAGCCCCAA
GTACCTAAGG
AAGTCTGAAC
ACTCCGAAAA
AAGTAAGTGC
AAGAGGCAAA
CAGTAAAACC
CAGAAAAAGT
3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 AGCGGCTAAA CCGCAAAGTC GGCAGAGCGA CGCAAGCAAA TCAGAAAAAC GACGGCCGTA CTTTAATGAC CAAGCTAAGA GCAAGAGGAT AAACGTTCAA AGCCCTAAAA GCAGAGCAAA GTAATTTCAA GGCTGAGCGT ATAAGGGCAA TAACCGTGAC GA.AGCACGTG CTAAAGAGCA CAACAACAAA ACGGAAACCG ATGGTGGA.AA ACA.AGGTCAA AGCAACCGCG ACAATCGTCG AGCAGCAAGG TCAGCAAAAA CGTAGAAATG AGCGCCGTCA ATCAAGCGGC TCCACCTATT GACTTTAAAG CCCGTGCAGC ATGCAGAGTA CGCTCGTTCA AGTGAGGAAC GCTTCAAGCA GTATCAGGCT GCTAAAGAAG CCTTGGCTCA CTTTGAAGAA GCGGCTAAGT TAGCTGAACA CGTCCCTGAG AAAAAAGAAC CTGCAGTGGA CAAAAATCGT GACGATTATG ATCATGAAGA AGCTAACAAA CGCAAGGAAC CAGAGGAAAT AGCACAGCAA GTTCAAGCAG TGGTTGAAGT TACACGTCGT AAAAAACAAG CTCGACCAGA AGATGGTCCT AGAAAACAAC AAAAGAATCG AAGTAGTCAA AATCAAGTGA GAAATCAAAA GAATAGTAAC 'rGGAATAACA ACAAAAAGAA CAAAAAAGGC AATAACAAGA ACAACCGTAA 'rCAGACTCCA AAACCTGTTA CGGAGCGTAA ATTCCATGAA TTGCCAACAG AATTTGAATA 'rACAGATGGT ATGACCGTTG CGGAAATCGC AAAACGTATC AAACGTGAAC CAGCTGAAAT TGN'AAGAAA ClIrTCATGA TGGGTGTCAT GGCCACACAA AACCAATCCT 'rGGATGGGGA AACAATTGAA CTCCTCATGG TGGATTACGG TATCGAAGCC AAACAAAAGG TrGAAAGTGGA TAATGCTGAC ATCGAACGrrT CIrGTCGA AGATGGTTrAT CTCAATGAAG ATGAATTGGT TGAGCGTCCA CCAGTTGTTA CTATCATGGG ACACGTTGAC CACGGTAAAA CAACCCTTT GGATACTCTT CGTAACTCAC GTGTTGCGAC AGGTGAAGCA GGTGGTATTA CTCAGCATAT CGTGCCTAC CAAATCG;TGG AAAATGGTAA GAAGATTACC 'rrCCTTGATA CACCAGGACA CGCGGCCTTT ACATCAATGC GTGCGCGTGG TGCTTCTGTT ACCGATATTA CGATCrTGGT CGTAGCGGCA GA'rGACGGGG TTATGCCTCA GACTA'rTGAA CAAGATTGA'r
TGTGATCTC-A
GCCATCAACC
AAACCAGGTG
AC1TGCTTGGG ACTCAAAAGC AGCTAACGTr CCAATCATCG CTAACCCAGA ACGCGTTATC GGTGAATTGG GTGGAGATTC TGAATTTGT GAAAT'rTCGG
TAGCTATTAA
CAGAGCATGG
CTAAATTCAA
CCAAAATATC GAAGAATTGT TGGAAAC-AGT CCrTCTTG'rG GCTGAAATCC AAGAACTCAA AGCAGACCCA ACAGTTCGTG CGATCGGTAC GGTTATCGAA GCGCGCTTGG ATAAAGGAAA AGGTGCGGTC GCAACCCTTC TTGTACAACA AGGTACCTT'G TGTCGGAAAT ACcTTCGGTC GTGTCCGTGC TATGACCAAC AGTTGCTGGA CCATCAACAC CAGTCTCTAT CACAGGT'N'G TGACCACTTT GCCGTTTACG AGGATGAAAA ATCTGCGCGT AATGTTCAAG ACCCAATCGT GACCTTGGTC GTCGTGTTAA ALACGAACCAC CGATGGCGGG GCAGCAGCTG AAGAGCGTGC 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 CAAACGTGCC CTCATGAAAC AACGTCAAGC TACCCAACGT GTTAGCCTTG TGATACCCTT AAAGCTGGGG AACTCAAATC TGTTAATGTT ATCATCAAGG AGGTTCTGTT GAAGCCCTTT CTGCCTCACT TCAAAAGATT GACGTCGAAG GACTATCGTC CACTCAGCGG TCGGTGCTAT CAACGAATCA GACGTGACCC
AAAACCTCTT
CTGATGTACA
GTGTCAAAGT
?TTCCGAAGC
TTCAAATGCC TVTATCGTTG GTTTCAACGT AGAAGCTGAC GATGTGGAAA TCCGTCTTCA GGAAGAAGCT ATGAAAGGGA TGCTTGATCC GGTTATCCGT CAAACCTTCA AGGTGTCTAA CAACGGTAAG GTTGCCCGTG ACTCTAAAGT TGATGGTGAA CTCGCAAGCT TGAA.ACACTA TCGTGAAGGT GGATTGATGA TCGACGGCTA
ACGCCCTACA
CAGCATTATC
AGAATTTGAA
AGTGGGAACT
CCGTGTTATC
TAAAGACGAC
CAATGATATT
CCACAAGCTC GTCAACAAGC TACAAGGTTA TCGAAGAGAT GAAAAAGTTA TTGGTGAZAGC ATCGGTGGAT TTATGGTTAT CGTGATGGTG TCGTTATCTA GTGAAAGAAG TGACAAACGG AAGATGGCATG ATGTGATTGA GGCGTATGTC ATGGAAGAAA TCAAGAGATA AGATT'TTTTG CTCCTTTCTT AGGTGGTGAG 1204 GGACGCAAGC AAACCGATGG TTTCAT'rGCr TArrmlrGAG CCTAGGGTCT CAAAAATCCC CTG'rGATGGG ACTGATAAAT CAGTTCCATC ACTT'rCACCA CGGCGAAAGA AGCAGATGAC TTCAAArTGA ACTTCG'rTrC AATTTAAACT GAAAATCAAG AACT??AAAA TAGCTAGGTC T~GCTGGCCTA GCNrTTGG1-r CAAAGTAGAG AAACGAATAT CATGGCAAAT CATTTCCG'rA CAGATCGTGT GGGCATCAA ATCAAGCGTG AAGTCAATGA GATTT'rGCAA AAGAAAGTCC
GTGATCCACG
TTGCCAAGGT
TCGGGCTTGA
ACAAAATCCC
ACGAGATGCT
G4GAGGAAAAT TGTCCAAC1' CTGACCATCA TAGATGTTCA GATGCTGGGT CACTTGTCTG TTATTACACC ATTTTGAGTA ACCTTGCTTC GGATAACCAA AAAGCCCAAA AAAAGCAACT GGTACCATCA AACGTGAACT TGCTCGCAAT TTGAAAT'rGT AGATTTGACC T'rCGTCAAAG ACGAGTCCAT CGAGTATGGA AACAAGA'rTG ACCCAATCTG GATAAGAACT AAAGAAGAGG GGTTGCCCCT CrTT'T'GG'r AGGTTGAATT TGAAATGGAA AAATATTCT'r TTATAATAGA TTGAAACTAG AATAGTACGC CTCTACTTCT TGTCCTGTTC TTGTTTCATT AATTAGAAAA TGCTTT7'rT GAGTTTGTAT ATGGCTGAAC ACCTGVTT~ AAGGTTGATC TCCAAAAGAT ATTCCTACCT AT'rAGATCGT GT1AGCTAATG TGTTTIWGCA GGATTACCTG TCAAT'TrAT GCTT'rCTCTC
AAAATATTGT
TTAATA'rAAA
GTAGGAAATA
TAGAAATCGA TTTGACTGTC AAAGGGATTC TGTAT'TTTT TAATATCA'rA AGGTGCAAAA
CTGATCGATT
AATGTTATCT
AAGAAATAAG
AAGACTTAGC TATGCAAGTA TTGCAACAAG TGGTGAAACT GTTCGAAATT TTTAGTGGAT AAGTTTTCCA AAGAATTGGA TA'rrGGAACA AGGTCCAACG CTTGTATTCG GGACAATGTA GAGGGCT'rGC TATGGCAATT TGAAATTGGC 'rCAAGAATTA
ACTCTTCTAT
TTATTAGCGA
ACCATTCCAG
GGTTATATTT
CAAAATACCC
CTCAAGAAAT
GTGGGACTTC
CTGATGTGGC
ATGGTTATGA
TCTTGCTTTA
6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 GGATCTTTGG GCrCACGAG AGGAGTTGAG TCAAGATGCr TCTAGGCGTA ATG'IrAGGGG TGAATGGAAC CGCTGC=rG AATTGCCAAA CAGGTAATGA AAATAGTGCC TAATAAAGCT CCCTA7TTTTG AAAAAAGTCT 'rAAAAATATT TGG;TGTGAAT CAAAGGAATG GGGAAATT'rA TTCCTATCTr TGCAACTATG AAACCAATGG GGGAAAGCT TAGTGAAGTT CAATATCAAG AAGATGTTGA
GGGTGGTATC
GCAGAAAGAA
AACAATCCGA
CTACGTGTTG GTAGTATAAC TTAACAAAGA CGCTTGGTA CTTACCAAGG GAGGGTTGGC A'N'TCAGGTG GTTTAACCTT TTATCCAAGC TAGTCAACTA AAAGAGGCTG AAATCATCAA TTATGGTAAT TAC7'=T~GA' TGACCATGTC GTTGT'rG=rC AGGAGAGTAA TATGAATCCT ATCAAAGCTT TTGCTAAAAT CCCTGCAAGG TGTAAAAGTG ATGAAAACGA TAAAGAAAC TCGGAAACT 7Tr"ATTGCC GACAAGTTAA TGCATACGGC TCGGTGGCTC A'rrAAGCCAG 1205 AGGAGAGAGA ATGAAATTTT TTTGG'rCTTC TTGCTATTCT GGATTGTGAA ATTCTTT'GG ATGATCATCT CIrTGCAGT TAG7TTAA GATATTGGAT TGGCTCTTTA AACTrATC-rA AGAACTAGCA GGAACTCCAC TGCTAGrTr TTAT'rCTCr CAGTAAAATC AlIrTATACT C?1'CGAAAAT CTCTTCAAAC CTATA'rATGT 'rACTGACTTC GTCAGTTCTA TCCACAACCT TTTTATCAAA CCGATTArrC CCAA'rTGCTG TTTTACAAGA GATGGTAATC CAAGTTGCAG TCCATATGGT ATAATATAAG 8460 8520 8580 8640
ACTTCGTCAG
CTACT7TTGCT
ATATCATCGA
TTCTATCTAC
CTTGATTT
TGTGTCAATT
AACCTCAAAA CACTGTTTTC CA'rTGAGTAT TAGAACATAC CCTGTTGCAG AAGTGGTGGA T'AAACCCC TTGCCAATCC CAGGGT=C'rA AGCTAGCAGG AAATTCTAG'r GGAGTTGGGT GTCGTAAAGT ATCACTTAAA CACGTCAGCT TCACC~rGCA 8700 CAAAACGGTC TTTTGAGCTG 8760 AGCAACCTGC GGCTAGCCTC 8820 AATGGACGTC GTCATGGACA 8880 CAAGCATCCA GAAGTCTTGG 8940 CTTAATGCGC AATACAGTTG 9000 AACTCCTATG GACAAGATTG 9060 AGACTAATGA CACATGAACG 9120 GGCGCCTCTC CTGAGTCGGT 9180 ATCGAGATT CCCTTATGGA 9240 GATGTTATGG AACTCTGTGA 9300 a
TACGCACACT
GATTCATATC
TCAAGATCGC
GCACGAGCTG
TGTCCATGCC
GCAAGCGAA'r CCCTACGAAG CTACGGGATA TTTTGTTAGA 'rTTGATGCGA CCTTTACGGG ATGAACTCGG ATTCGGGCGT AATC'1NTA AAAATGCTAT
TCATTGGATT
ATTGCACAAT
CGTGTCAGCC
CACTT'1rTGAA TCCAGGTCAC CCAGT'rCGTG TCTTCAAAGA TCGCATTCGT AGATTGTTAG ATACCTATGA GATGCCTAAG GGTTTGGTGC GTCAGATGGG ACGTAAGGAA GAACTCTTCT TTCCTATCAT AGTTATGTGG GGAGTCGATG ATCAGATTAG CAAGTCACTA CCAGAAGTGT CAAT'rAGCAG CAAAGGTcT
AGAAAATCTG
GTCTATGGA
ACTTGTGGGT
GGAGCGCT AT GGAACTCTr'r
TGTAAAGGAA
GAAGTrrCAG A'rAC'rGAGCA GCTCTCCGTG CCCCCTTGAT GACGAGGAAA TGCTGGCGGA CAATTTGACA TCCATTACCA GGACACGATT CACCTCCCAA CAAACAGCTC TAACGACAGC GCTTTTrGAAG CTTTGCG) C CTCATGATTC TCCTTGAGTC GATGCCTATG GCTATGCCAT TTTATTGAGG AAAAGATTGC CAACAAC'rCA TAGATACGCC 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 AGAG7ITGAA AGIWI'CATTT T'rTTACTCAG GATGACTGGC CATCCGTCCG TCAGAGAAAT AGAGGAGCCT CTACAGCTAG AGAAGGCCAT TTTACCATTA TAGTCAACAG GCT'rGGTA TCATCTCCCT ATGGAGATTA TCAAGGAAGA GTCCATCCTC ?TCAGATTGC GAGGAGAGC GCGTGCCAGA ACGACAGAGC ATACGGCAGA AGGTCAAGTT CCTTTACCCC TAAGGAAAAG GAAGCTGTGC ATGGCTATCT TTCAGTCGAG CAGGCCAATC CCT'rTGTCAA TAAAGAAGAT ATT=CCAGT
TGGACCGCCA
TCATCCTCAA
ATTACAATGA
*5 a S* a a a a a. .a a 1206 CAATACGCCA GCTGATGAGA TGATTTTCAA ACGGACGCCG TCCCAAGTCG GGCGCAATGT CGAACTCTGC CATCCGCCTA AGTACTTGGA CAAGGTCAAA ACTATCATGA AGGGGCTrCG TGAGGGAAGC AAAGACAAGT ATGAAATGTG G~rrCAAGTrCT GAGTCGCGAG GTAAG'1-rrGT CCACATCACC TATGCTGCAG TACACGATGA AGACGGAGAA TTCCAAGGAG TGTTGGAGTA TGTTCAGGA'r ATCCAGCCCT ACCGTGAGAT TGATACGGAC TAT'TTCGTG GATTAGAATA AGGAGAAAAA ATGAGTTACG AACAAGAATT TATGAACAA TTTGAAGCTT GGGTCAATAC CCAAATCATG ATTAACGACA TGGCGCACAA GGAAAGCCAA AAAG~rrACG AAGAAGACCA.
GGACGAGCGT GCCAAAGATG CCATGATTCG CTACGAGAGT CGCTTGGATG CTTATCAGTT CTTGCTTGGT AACTTTGAAA ACTTCAAAGT AGGCAAGGGA TTCCATGATI' TGCCAGAAGG CTTGTTTGGT GAGCGAAATT ATTAAACGAG AAAGATrCTr GATTN'TCAC TAAAATCTTG ATAGAATGTT TATGTTAAAT CCTTGTCAGA GCAGGGATTT TTTATTGAAA GGATTTTATC ATGTCAAAGA AACTCAATCG TAAAAAACAA TTACGAAAT G CCTCCCTCG CGCAGGTGCC TTTTCAAGTA CGGTGACTAA GGTTGTAGAT GAGACAAAAA AAGTCGTGAA GCGTGCAGAA CAGTCAGCAA GCGCAGCTGG TAAGGCTGTT TCTAAAAAAG T'rGAACAAGC AGTAGAAGCT ACCAAAGAGC AAGCTCAAAA AGTAGCTAAT TCTGTAGAAG ATTTTGCAGC AAATTTGGGT GGACTTCCAC T'rGATCGTGC CAAGACT'rTC TATGATGAAG GAATCAAGTC TGCTTCAGAT TrCAAAAACT GGACTGAAAA AGAACTCCTT GCC'PTGAAAG GAATCGGCCC AGCTACCATC AAGAAATTGA AAGAAAATGG CATCAAGTTC AAGTAATTTT TCTTGAGCCT TCCATTTCCG AAAA.AATCTT GCTACAATAG AGCCATTAGA GGTGTTTTGA ATCCCACATT TTACAGAAAG TGGCGGCGCT GAGAAGTCCA CAAATGTGTC AAAACTGGTT GCTAATGGAT GAAAAATTGA AATAAAACTG TCTTTTTGCT TTAAAGACGA GAGTTGCG INFORMATION FOR SEQ ID NO: 211: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 4156 base pairs TYPE: nucleic acid C) STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 211: CCGCGAGCCA CGGCGAAT'rT GCTGCGGGTA TITCATCAGTC AGGATCTATG ATCTTTGGTG AACAAGAAAA GGI'TCAAGTT GTGACCTTTA TGCCAAATGA AGGTCCTGAT GATCTATACG CTAAGTTAA TAACCCTCTT GCTGCA'ITTG ACGCAGAAGA TGAGGTTCTA GTTN'GGCTG 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11378 120 180 1207
ACCTTGGAG
GTAAG'TGC
GCCTCATGGA
TGG=r~CCA TTTAACCAAG CATCATCACA GGACT'rAACT CGCTGCTGCA GGTGTAGAAA CTAGTCGCGT GATGGGAGAA AATCCTGAGC TACCGATGTT GAT'rCAAGCC TACACAGAGC AAGTCGCTGC TAATATCATT AAAGAAGCCA TAAATCCAGT CGAAGAAGTI' GCAAGCGCTG CAGAAGGAAC TGTTATCGGA GACGGTAAAT GTCTAC7'rCA CGGTCAGGTT GCAACTGCTT AAGATGGCAT CAAAGCTCTT CAGCTGCTCC AGTCCCAA TGAAAATCAA TCT'rGCCCGT GGACTCCAGA TTCAAAAGCA ACC'rCGTAA AGAA'rrGATT CAATTCAAAA ACTGATTGAG TCTTGTIrGA AACACCTCAA CTCTTAATGT TGGrrCTATG CTATGGACAA AGAAGACGTT ATGTCCGTAA AGTACCAAAT ATGTCAAATA AGCCATTATT TAGTAGTCGT TGTAGCCTTC ACCAACCACT 'rGTAGCCTGT TTATCCTCGG TGGATCGCTT
CCAG.AAGAGC
ACTGCTATCC
C7rGACACAC AATCGTATCA TCGTTGC7TC AAACAAGCAG CTCCAGGTAA ATTTCAAAAG ACCCACGTTT GATGCCCTTC GTGCCATCGA GCTCACTCAA CAGGTAAAAC GCTACATTG AAAAAATGCG GATTCTAAAA AAGATTTGT'r TATGAAAGGA TT'rTAAACAT TTT'GCAGGTC T'rGAAGGCAT ACCCTTATTG GGCTTGTAAC CAAA'rGATTG CCCTTGGTTG AGATAACGTG GCTAAAGACG TGTCAAGGCT AACGTGGTTC TGGAGAAACA CATGCCCTTA AGGCGGCGTG CCAATCAAGA AT'rGGTCAAT ACCGTTTTGT TGACTTGGGT GTTGAA'N'ro TCACTTGATT A.ACAAAGCCA
GTCTATTAT
CCTCGACCAG
AGGTCACTTG
GTCAAATATC
TCTATGGTTT
TTCCAATTI'C
GAAGCAGGGA
GGTGCTGCTA
420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 TCGCTCCTGA TGCTGCACTr GCTCTGTCG AC'rTTACCAA GACTGGTA'rC GGTGTTGCCC GACTTTTCTT GACAATGAT'r GTTCGTACAA CTGCCGCTAA AAAAGGTGAC rTTCGGCGCTG TCCAAGGACT TCGTATCGCG CTTCCTGCAG TACAAAGTA'r CCTTAGTGCC ATOCCAGACT GTATGGTCGT TGCCGTTGGT TACGCCATGG GGCCA'rTC 'r CGCTCTTGGT TTCGTTCTCG TCGGTGCTAT CCGCGTTGCT ATCGCTCTTA ATGGTGGCGG AGGAGCCGCA ACTTCTAACG CTGCTGCCAT TATCATGGTT CT'rGGTGGTG AAGCGGT'rGC TATCCCTCTr GCTGTAGCTG TTTCAGTTGG TTTCGTTCAT ACTGCAGATG TGGAGCGIGC GCATTTCATC GCGCTACTTT CTCTTCTCCT TATGGTACCA GGCTCAAAGA TGGTATG.GCT TTATCAACAT GATGGCAACT CTGCTGTGTC AGATAT'rACT TCTACCTTCA CC~rrCTAAA
ACTGAAACTG
ATCGGTGGTG
CGTGAAGTAT
CTAATCGGAT
ACTGGTGGAA
ACCCAATCGG CGATATCCTA GAAGACTACT AAGATAAGAA AGGACTGAAA ACATCATGAC TGAAAAACTr CAATTAACTA AATCAGATCG TAAAAAAGTT TGGTGGCGTT CAACCTTCTT ACAAGGGTCT TGGAACTTTG AACGGATGCA 1208 AAAC1'TGGGC SrGGGCTTATA CACTCAI'TCC AGCTATCAAA AAACTCTATA CTAAAAAAGA AGATCAAA'rC GCTGCTCTTG
TGCTCCAGTC
CGATGACGCT
TGACCCAGTA
CCTTACTGGC
GTCATTCTTG
AGCGTCACCT TGAGTTCTTC AACACTCATC CATACGTAC CTCTTGCGCT TGAAGAAGAA CC'TGCTAACG GTGTGGAAAT ATGGGGGTTA GCTATCCAAG GGG~rAAAAT CGGTATGATG GGACCTCTTG ITrCTGGTTTA CAGTACGCCC AATCC~rGGA 'reT6TrCGGTG AATATCTTGG GGCCACTCCT C7TCTT'rGTT GCATGGAACT TGGTATGTTC AAGAGATTGG A'rACAAGGCT GGATCAGAAA
CTGGTATCGG
CTTCACTGC
TGATTCGTAT
'rCACTAAAGA
GGATGTTCAT
=TCTAAAGT
CTAAAGGTAT
AAGTTACTAC
TATGTCTGGT GGTATCCTTC AAGATATCAC TCTGCTTCCTTGTTCAAC GCTGGGTAAA TCAACTAGAT GAAAAGGCTT ATATCCA'rTG CCAAGAAGCA TTCGCACAAG TAGGACAAGG TrCCAACAA AACTTGCATA TGTTGATTCC TTGCATGTAC TTACTTAAGA AAAAAGTATC AGTGGGTATT GTGGCACATG TTCrrCACAT TAAAGGAGCT TCTATCC~rG TATTAAATTT GCTTTCGATG GGATAAATTG CCAGAAGGGT A'GTCTCAA ACTCCTGAAA TGGATTATCA GGACTACTCC TTACTTTACT TCCAATCACT AT'TATCCTTG CCCTCTTCGC 0 S @0b S
GTTCTAAAAT CTGATrCCTr CATGGTGCAA GTACGATTGG TCCTTACGAA CAACGA'N'TG TAACCATCTG TACCGACCTT CCATTTTCGT CCAAGTCAAC TAGTCCAAAC GTCCAGCCITr TAAACATCTG TCGCAAGGAT ACCAAACAGT CTCACCTGCT GAACGTTGAT GTAGCGAACT AAGATTTTCC GTAAGGTTTA TTT'rCTATGC IrCTGCrCCA
GTATGTGTAT
CATGTAATCA
TTTTATTCAG
TAGGCAGCTT
TCGTCCATCC
AGCALACTAAA AAGGAACCAG CCAAGGCTCC CATTGGATCC GTTCT'rCTGC TGTCAGCAAT AAGCGTCTGA GGCAACAAAG 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 GTCTCCCCAT GAGTTTCAA ACCTGTCAAG ACCATGGCGT GTCTTGAGTA AGTTTAA'rcT GCCAGCTTAC GGTTGCTGAG TGCATTrTGGG CAATCGCCAA GCACGGCTAC CAACCACATT
CCTTCCACTT
GGGTCATCAA
CCATGCTTGA
CTGGCCGACA
TTCTTTCAAG
CCCCAACATC
GGTTGATT'rA GC7TTTCACTA 'rTCAAAGTCA
TCAGAACCAA
CGCTCCATTG
TCAACTGTGT
TAGTCTTCTA
TCACTTTTGT
TCAGCAGTTG GAGCATrGAT AACAGAAACG
AAGGAAGATT GACATATTTC TTGTAAAACT AGTTGTTATC TrATCGCGA ThAGCAAAGT TAGCAAGAAA GTTAAAGATT TCTTGCAAGA GATCTGCACC AGAAACAAGC AAGTCACGCA TAAGGATCGC ATTTAGCTCA CGACTGCTGC TAGGCACGAC ACCGTATTTT TCAAAGAGGG CAAACTTGCG TGGTGGAAGT CCTAATGACA GGTCTTCTTT~ CTTAGC'rTGA ACAGTCGCTT AGATTTGAGC ATCTTGACGA AGCAATTTAT TACATGAAAC AGACTCACCA TAAACTGACT AAACGACCAT ATCCCATTCrA CCGCCATCTT CTTGTGGTGT GATTCCTTTT 1209 GTTGAGGTGT T'rGGAGTAAG AAGCTAACtT GCGGCTAGTC AA'rTCTTGGT CTGAAGTCGC 3780 AATGACTTGC TCCAAGAACC AGrNGATTT CTCATACTrA TCCCAGAAGA AAGTGTGGGC 3840 TTGTGACAAC TCAAAGTTCT CCAATTTGTA TTGCGAGATG AGT rTGTGGC GGAAGGTGTT 3900 GAGAGCCGCA AACATCCAGC AACGACCAGA CGCTTTCTGG TTAGTGACCT TGTCCT TGGT 3960 TAAATCCAAT GAGAAAACAG GTGTGTrGTC TACATGGCTT TGGCGACGTr CCAGAGCTGC 4020 AAAAATTCCG TTGTGGCTGG CAGCAT'rr'C AATCGCTTGG TATrTACAT TTGCTTCATA 4080 GTTGGCAAAT AGTTTATCAG TA.AATGATTC TTGAATCGCG T'rCATAGATT CCTCCTTTTA 4140 GTCTACAGTG TATTGG 4156 INFORMATION FOR SEQ ID NO: 212: SEQUENCE CHARACTERISTICS: LENGTH: 3902 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 212: AAAAACAACA AAATAAAACA AAAACAAAAA TATCGAGGTT TAT'rCAAA ACTTTCGATA *TTTTTATTAA GTTATTATTT TGTTGTTTCT AGTTTACI'T TTGATGG TTA AGAGTGGT GG 120 *AGAATTATAC TCAATGAAAA TCAAAGAGCA AACTAGGAAG CTAGCCGCAG GCTGTACTTG 180 *AGTACGGCAA GGCGAAGCTG ACGTGGTTTG AA'IrTGATTT TCGAAGAGTA TTAGTGCAAA 240 CCGTAGTTGT AGTCATCATC TTGCATGGCT TCAACTTCGC CAAGAAGGTA ACCATTTCCG 300 ACTTGAGAGA AGAAGTCATG GTTGGAAGTT CCTGTTGAAA TACCGTTCAT AACGATTGGG 360 *TTGACATCTT CAGCTGAATC TGGGAAAAGT GGATCTTGTC CCATGTTCAT GAGAGCTTTA 420 TTGGCATTGT AGCGAAGGAA GGTTTTAACC TCTTCAGTCC AACCAACACC GTCATAAAGA 480 ***CTCTCTG'rGT AGCCTTCTTC ATTTTCATAA AGAGTATAGA GTAGGTCGTA CATCCA'!rCT 540 TTGAGTTT'PT CTTGCTCTTC TTCAGGTAAT TCATTGAAAC CAAGTTGGAA TTTGTAACCA 600 ATGTAGGTTC CGTGAACAGA CTCGTCACGA ATAATCAATT TAATGATTTC TGCAACGTTG 660 GCAAGTTTGT TGTTACCGAG ATAGTAGAGG GGAGTGAAGA AACCAGAGTA GAAGAGGAAG 720 *GTTTCGAGGA AGACGCTGGC AACT'IrCTTT TCAAGTGGGC 
TGCCGTTTAG GTAGATTTCG 780 TTGACAATCT CAGCCTTCTT TTGTAGGTAA GGATTGGTAT TGGTCCATTC GAAAATTTCT 840 TCAATCTCAG CCTTAGTATT CAAGGTAGAA AAGATTGATG AGTAAGATTT AGCGTGGACA 900 1210 GATTCCATAA A'rrGGATGI-r ATTGAAGACA GCTTCCTCAT GTGCTGTACG GATGTCTCG CGAAGGGC?1 GAACCCCAGT ?rCAGATTGC ATAGTGTCAA GAAGGGrrAA ACCACCAAAA ACTTTTCCGA CCAAGTCTTI CTCTTTGTTA GATAGCTr'rC TCCAGTCATC CAAG'rCGTTT GATAAGGOAA TACGTGTATC GAGCCAAAAT TGCTCCGTCA CT7?t=CCCA AGI'GATTTG TCGATGACAT CTTCGATGGC ATTCCAGTTA ATGGCTTTGT AGTAAGTTTC CATTAAAAT CTCTTTCTGT GTT'rAGTATT GCGAACTCAC AATATTTCT GAGTATCGCA CAAAAAGTCG GAAGCCCGAC TI=?AAAATG
CATAGTAGAT
CTAAATCACA
ACGGACGTAG
GTCACGTGTC
GCGCATGAAG
ATCGATGACT
AGACAAGCCA
TTGArrAT CAGTGCTGCT CAGCTTTCAC ATTGGCC TAGATAGACT TGATTCCCTT
TAGGGAAAAA
GCCGACTTCT
GTTAAAGGCA
AcrTTACCA'r AATTCTATAG 'rTACATAAAT TA'rGTTATGA TAGTGTTTCT ATGCTAGAAA CCACCCTCAT CTGTAAAGGT TAGTTACGAA GGATGGACAA G7TTTGTTTAT
AGGGTGAGTG
TT'ACGCATAT
GCAGCAGGGT
TTTCCCTCTT CCATTCGTAA AGGCCTTG GAATGTCACT AAACTCCTTG ATCCACGTGT 'rCAGTCGCAG CACTAAAC CC-ATATCCTA GGCAGAAGTG TAGTAAGGAA TGCr'rrCTG'r AATAGATTTT ACCAATTTTC TTCTCTTGGC GTTCT'rCGAT TAGAAGCAGA AACGTCGT'TC ATATAGCTGA TACAACCATT TTTGGTGGTA AAGACCATCT TCTTGALACCT TGTCGCGAAG CAGG4GATAAA GACATTT"1TG*AAGAGTTCT'1' TALACACGGTC CAGTTACA'rA CTTGTCAAAG TAACTTCCGT TAGCATAGTC ACGTTGCGTA ATCGGGTGGA TGGCGCTACA GCAAGGCGAT TTCAGCCCAA TCAGCAACAC TGATG'rGGA ACAAATTCAC 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 TGA7T'1'CA AAGTTGTGGA AGGTAATACC ACGTTCACGT GCAATATTGT CAAGGTCCAG TAGTTCATAA GCATAAAGTA GATGCTTGTA AA'rTCAACAG
TTGACTCTAC
ACTCAGGTGA
ACCA'rATTCA ATGAGTTGTT GGGCAACGTA GCTGTGCAGT CCCATGGCAC CGAGACCAAA GGTGTGGCT TGGCTATTTC CATGGTCAAT CGTTGGTACA CCTACGATA'r GTGAACTATC TGTAACGAAA GTAAGGGCAC CATCATGTTA ACCACG'N'GG GAATTCVTGA GCATCG3TTGA GTTACTCATG ATAATCTTTC GAACCATAGC ACGGATAGAA CCACCAAAAP CACGTGAAGT TTGAACCCAG GTTACATGAA ACATCTGTTC CCATTTGAAG TCAAGCTTGG TTC'rTGAACT TGAAGAATCT1 CAGAACACAA CATCAACAGG ATTTGCACGG rTAGCCGTAT CGATGTTGAC TACATAAGGA TAGCCAGACT CT'rGT'rCCAA TTTAGAGATT TCAGTTTCCA AATCCCGCGC CTTGATT'r GTCTTGCGAA TATTTGGATT TGCGACCAAT TCATCGTATT TTTCAGTAAT GTCGATGTAA 'rTGAATGGCA CACCGTATTC TrTrTCTACA GAGTAAGGGC TGAAGACGTA CA'rrTTCA ?'N'TTACGAG CCAATTCGTA GAATT'rATCA GGTACTACAA CACCAAGTGA 1.211 TAGAGTCI'G ACACGTACTTr rCATCAGC GTTCN'TC ?1'AGTTGAAA GGAAAGCGAT GATATCTIGGG TGAAAGACGT TGAGGrAGAC AACACCAGCA CCT TGACGTT GCCCCAArrG GTTGGAGTAA GAGA.AGCTGT CTTCAAAAAG CTTCATAACA GGAACGACAC CTGAAGCAGC TCCTrCATAG CCTTT-GATAG GTGCACCAGC TTC.ACGAAGG TTGCTGAGGG TAATTCCCAC ACCACCACCA ATACGTGAAA GTrGAAGAGC ATCATCCGTC ACT TGGATTA GGAAACAAGA ATTCAAGAAG GAAGGAGTAG CAGG7TGGTA GAT'TGCAACA GCTTCATTCC CA'rCAGCGAA CATATTTTCA AGATAGTATT CACCGTCATT
TGAGTN'CATA
TACCAACTCC
GCGTTGGTGG
A'rAAAGGGCA
AGTCTTTAAG
GAACCCCCGA TAGAGTrCAT CCACGACGAG CACGTCCAGC ATGATTITCAT TGGCAATATC TTGAAGAAGA CACGGTCTTC GCATA'PTGAT TGTAAAATTT ATAAGCTGCC ATGAATGACT TGAAT TGGAA GTTTrGGTCTr TTGATAAATT GAGCTAAT-rC TTCCAAGAAC TCTGGACGGT ATTTGAT AAAGGCTGTT TCGATGTAGT TGTGTTCAAT GAGGTAATTG ATTTrGTCTT TGATTGAATC AAAAACCATA GTGTTTGGAA CTACATTT TTTAAAGAAA GCATCCAAGG CTTCCTTGTC TTTATGAAGC ATGAT'rTGTC CATTAACAGG ACGGrTAATT TCGTTATTAA GACGGAAGTA AGTCACGTCr TCAAGATGTT TTAATCCCAT AAAATrTCCC TTATCTAATT ACAAAAOAAA GGCTTCTAAG TTAGCCCTAA AAGCAGTTTC TTCTGGATGA TGTACTAAGA. TTATGCTAAT TG'rTTCAGTT TTCCTGGTTG GAAACCTGAA AAGACTTCAG TTGG7G1TTTG GATAACAGGA GcTrGCGCTAA AACCGAGCTC TTTAACTTGA TCGACGTACT CAGGTTGCTC ATCAAGATTG ATTTCACGAT AAGAGACATT ATTACTGTCC AAGAAACGCT TGGTC.ATTT ACATTGGACA CAATTGTTT'AGAATAAAC GGTTACCAT'r GTGTAACTCC TCTTCAAAAT TTAATACTAT CTTAGTATAT CAGAAAATAA AATTTTGTCG 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3902 GG INFORMATION FOR-SEQ ID NO: 213: Ci)SEQUENCE CHARACTERISTICS: LENGTH: 2456 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 213: TATTGAAGCT ATTGTAGACT ACAAAGATAA GGANTGTCAG TTAGTAGGCG GTGACACTCA CTGATAACCT AAAAAGGATA GTCAATTATG CTrGTTTACT AACTATTAAC TATGCTAAAT 1212 CAA'rGACGT TGTrACATA AAACTCTATA TCAGAGAAGC GCTAGI-N-A GCArTTTT GTAAAATAGA A-AAAGTGAAG AAAGATAAAA AAATCGAAAT TCAAGTAGCG GA?4GCCAAAG T'rAAGG1-r ATACATTGAC TATCGGTAAA AAAGTTATCG GGACAATTTG CCATTATAAA GAATGGGAAT CTCGATAGTT GCTGTGGAAA '"'rGATTGA AAATTATAAT TTAGCAAAAT TTTCATGATA TAATACTCCA TCTTGATTGT AGCAGAGATA GGACTGTAAA TCCGCCCCTT CGGGTTCGGG GGTTCGAATC CTCATATAGA GTI-rTTCTT AGAGGTATGA AATGAGCAAG TTAATGTTGG TAAAGACAGT GAGAAA'rGC CGAATTAGAC TTTATAAAAA ATTGGAAAAA AAGTCTTG'rT T'rTGAAAT GCGAAGAGGC TAAACGCGC CCTCTCrCTC CATTITCATTA CCTCATGCGT TGGT'rCCkAT CAAAATATCr TAGGGTATTT CTTGCCGGTG CTTAGGAAAA GGAGGATTT'r TTAGATGAAC ATCGGGTATA GCCAAGCGGT CCAGCTACCC CACTTCTTAG TATTTTTATA ATTGAAAGAC AAATTATAAG TATGTCAAGT GAATTTGAAG ATTTGCTAAA 
GAAGTATTGA CAGTTGATGC GGTGTCT'rGA CTCTTCGCGA AAAGTAGGAG AAGTATTGGA AAGGCAAGGG ACT'rTGACTC GTAATAATCA AGATAGAAAG GTGAATGATA TGAACATGTC TTAAGAAAAA CTTGATG=r TAGCGTTrAGT CAAGTTGAGA CTGGTGATGT
GACTCAAGCT
ATTGACAAAC
TGTTCTTGTA
AACGTPGCAA TCTCTGGAAC GATCGTGATG CAGATATCAA CTTCGTCAAG TAGTTGGTAA TGTTAGTGCT 900 TGGTGTTGAA 960 TGACTrTGTT 1020 AGATACTGAT 1080 ACAGTTACAT ACCTTGTATC TA.AAAAACGC CTTGAAGCTC GTTGGTCGCG AAGAAGAAGT TG'rACTGTT AAAGGAACGC TCAGTAGAAT TTGAAGGTCT TCGTGGATTT ATCCCAGCTT GTACGTAACG CTGAGCG'N1T TGTAGGTCAA GAATTITGATA GCTAAAGAAA ACCGCTTCAT CCTTTCACGT CGTGAAGTTG GCTCGCGCTG AAGTATTCGG TAAATTGGCT GTTGGTCATG GCAAAGCATG GGACAAACT GTGCCGTTAA AGGTGGACTT CAATGTTGGA TACTCGTTTC CTAAAATCAA AGAAGTTAAC TTGAAGCAGC TACTGCAGCA 'rTG'rAACTGG TAAAGTTGCT CGTATCACAA GCTTCGGCGC ACTGAATTGT CACATGAACG TTTCGTCGAC CTTGGTGGTG TTGACGGATT GGTTCACTTG TAATGTATCA CCAAAATCAG TTGTAACTGT TGGTGAAGAA 1140 1200 1260 1320.
1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 ATTGAA(ITcA AAATCCTTCA TCTTAACGAA GAAGAAGGAC GTGTATCACT 'ITCACTTAAA GCAACAGTAC CAGGACCATG GGATGGCGTT GAGCAAAAAT TGGCTAAAGG TGATGTAGTA GALAGGAACAG TTAAACGTT GACTGACTTC GGTGCATTTG TTGAAGTATT GCCAGGTATC GATGGACTTG TTCACGTATC ACAAATTTCA CACAAACGCA TTGAAAATCC AAAAGAAGCT CTTAAAGTTG GTCAAGAAGT TCAAGTTAAA GTTCTTGAAG TTAACGCAGA TGCAGAACGC GTGTCACTTT CTATTAAAGC TCTTGAAGAA CGTCCAGCCC AAGAAGAAGG ACAAAAAGAA 1213 GAAAAACGTG CTGCTCGTCC ACGTCGTCCA AGACGTCAAG AAAAGCGTGA CCAGAAACAC AAACAGGATT TTCAATGGCT GATTTGTTTG GTGATATCGA AAT TGAAAAT TCACAAAATC CTTrGTTTAC TAAACAAGGG ATTTrTCTGG ACTGTAGTGG GTrGAAGAAA AGCTAAGCTC GAGAAAGGAC AAxTrGTC TGATATTCAG AGCGATAAAA ATCCGFTT TGAAGTTTTC AAAGTTCCGA CATTGCGCTT GATAAGTTTG ATGAGATTAT TGGTCGCTTC CAGTTTGGCG GTAG'TTGAAG GGTGTrGACA AGCTTTTCTT TATCTrTGAG GAAGGTTTTA TrTCGAACTT
AC=NAATC.A
CTCTTTGTCA
AAACCAAAGG
TTAGAATAGT
GAAAAATAGG ATGAACCTGC T'rAAGATrGT CCTCAATAAG CCTrATTCTG AAAGTGAAAC AGCAAGAGTT GATAGAGCTG INFORMATION FOR SEQ ID NO: 214: SEQUENCE CHARACTERISTICS: LENGTH: 10974 base pairs TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear TCCGAAAA.AT TTCTCCGGTT ATAGTGGTGT TTCAGG (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 214: AAATAGGATA TACAGACATC CTTCTGATCT GCTTTTWACA AAGTCCAATr ATATGCGGAT C'rATACCTCC ACAATGTCCA TTATTATiCC TAACTATAAT ATGAGCCGAA AACACTATAT CCTTAATGTC TCCATATCCA TCAGGGATAT TAATATTTAT TTTTCCACAA CTATATTGCA TTGTAACCAT CTCCTAAAC GACCCATTAT GATATTTGAT AGAGAAATTT TTATGAATAA CTCAATAATT TTATAGTAAA TCATGC'rTAT ATCTCAAAGA TACCTATTTT ATCrTGTCTC GACCTTCTCC AAAGAATTGC TATAATACTA TTACAAATCC ATCTGCACTA CACTTCAAAT TTTAGCACTG TATAAAAACG TTTCAATACA CTAACTTCAA GAAAACTTCC ACTATTAATT GAAAAAATTG ATAGAGATAA ATTAAAA.ATC TATATTGAAA. CTCATCCCGA TCCTTATT'rG ACTGAAATAG CTGCTGAATT CAACTGTCCT CCAACAACTA TTCATTACGC TCTAAAGGCT 1980 2040 2100 2160 2220 2280 2340 2400 2456 120 180 240 300 360 420 480 540 600 660 720 780 840 ATGGGATATA GTCTAAAAAA CGGTTCC!TTA AAGAATTGAA CGGGTTGAGA CCTATrTA CTGGAAGAAG ATATAATTAT ACTGTTATAG AAGTACTAAT GAGCCGTACC TACTCCGAAC TCACTTAAGC TACCTGACTC TCTCGAATAT GATCGAGCCT TTGAATTAAG ATCGAGACAA CCCCTTTTTT GTTTCAATAT AAGACCCAGA AAAAGTAAA1 CTATTTATAT TTATGAGACA TGAGCAGGCA GTTAGTCTCT CGCACACCAG AGATTGCGAT ACTATGGCTC CGATGACCTA TAAAGATACG ATGACGAGTG TTTrAGATACA CCATCCCTTA TAAGGAGCAG GGCATACACT GAGAAAATAT GGGCTTACAT 1214 AC7"=rTCGA AGCTTGCT'rC TCATTATGGA CAATGCAAGG GTTACCACTT CCTACCTAT CAAAAACATC TCAGAATAAT TTCTTGAGGC ACT7'1'TGTCC TArrCTTGTT TACGGAACAG TCGATGGGAC GATGGGGGGA TAACAGTATA CTGG.AGAATT GACAATCTCG TTAGGCAGTT ACAAAACAAC TGTGAACAGA ATGTTTTGGC TCTATAATTT CTGTAGTGGG 'rCAGCCGACT
CATAAAAAAA
GTAGATACCT
AAACAIYTCCA
TAATCCCACC
CAAAAATTCT TACTACCTAC TTTCACAGAA TG.AACATGTG CACCCGAGTA TAATCCCATT A7'1'GTCAAAT TACGATCTT A'rACTCCGTT ATTCGGCAGC TCCTCCAGTT TTGT'TTTA CG'TTATAGCG CGGTTACTTA GAGTCAGACA AGACTTrTGGA CCAGGAATTrA TAGGGTCGTT TCTTGTAGAA AAAAACCCCC ATATGACCTA TAAT-AAAAG CC'rCAACCA ACTCATTAGA AACGGTTCAT ATGGAACAAC 'rTAAGAATAC CACAGATTTG CTCGGATTGG AAGACAAAAA TATCAAAATC TTGTCTG'rTC TGAAATACCA 'rrCCCCCGCT CCTCCTi'GTC CTCATTGTCA AGCCTCTAAA AT'rCCGCTTC TCGACTGTCA GCGCCGCTT CAGTGCAAGA ATTGCCTTAA GAA-AAATTGC CAGATTTCCA ACATGGTGAG GCAGTCTATG ACTGAGATTG CCCACAGATT ACTGAGGGA.A T'rTAAGTr'rG AAACCGATTG TGAGTATAGC 'rrCAAAAAGA GCAAAATGAG
AACCCATCTA
AGGGAAGATG
GGGTTI'ACCC
GGTCCTGTT
GTCGTTAGG
ATCAAATACG
ACGGrACTGC
TCTCAAACAT
CAAAGTTrGGA
ACTTCCAGAA
ATCTCAAAAA
CCATTGTCA.A
TCCTTGAAAA
TCATCCGAAA
TGAGTTGGGA
AGTCCAAATC
900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 ACAAAAAATC GCTCAGCTCC GGCGGTCI'CA ACTTCCACCG GACCAAG'rTG CCAAAAGTTA CTTCATTGCC CAAGA7=IG CATCCTCGCA ATTTTAGACG TCAGAGAGAG GTTCGGGAGC TCGGCTCGCT AAGCAACTCT CCAACATCTC AGCCGAGCTA AAAATCCTTG GAGTATCGG GCTCGGGCTA AATCAGTCCA CAGAAACGAT TCTATCACCC GGCGAACTCA TGCGGTGATT CGAAACCATT TCCAACGCTA 'rGGTCGAGGT CATCACCATG GACATGTACA GCCCTTA'rTA TTCCAAAGGC GAAGArG TGAACCGAGT ACGAATCCAA CGCTCAAGCG CTTTTGCA.AC CTGGACTGAT TTACTACACC CTTGACCGCT TCCACATTGT ATCATGAACC AATTTGACCG CCTCGC= T'CGTT'rCAG AGTATAGCTT CAAGCTCTGT CACGTTTCGA ATGCACTTAA CCCATCGGGA AGTACGAGAT AAGCTGCTTT C TACTCTCA GGATTACAG GTTCACrACG AACTCTATCA ACTCCTGCTC TTTCATTTTC AAGAGAAGAA TGCCGACCA'r TTCTTTGGAT TGATTGAGCA AGAACTGCCA ACGGTTCATC CGC'TTTTCA AACGGTCTTT TGGACTTN'T TAAGGGATAG AGATAAGATT ATCAACGCAC T'rAAGCTGCC TTATTCCAAC GCTAAACTTG AAGCGACCAA TAATT'rGATT 1215 AAGATTATCA AGCGCAAAGC CT'rTGG'x-TC CGGAACTTTA 7 rrGATGAC~r TGAACAT'CAA AAAAGAGAGT ACGAA~rCG ACAATTTTAA AAAACGCATT TACTCTCCAG ATTGCAGCTT TTCGCCrACC CAC1TACACTT ACTGTTTG GCATTGAGCT GGCATAGGCA ATCATACCTG.
GACATCTGTG A'rrrCGGCTG TGCCACAACT AGGA7MTAA GACAAAGAGC CACTCTTTAT TCCATCGTAT CCCAGCCTGC GAAGT~CT 7TTTCCACT CATTG'rCTCC GCAGAGTCGC AGAGGGGGGA CTACCTTC TCTGAGACCT TTATTGGCTG CAGGATATTT CTCCAAAGCC 3rC7TWGTrT
CAAAGGCAAG
CGCTGACGCT
TGATAACCTT
CCACACCACC
TGCCATGAG
GGCTTTCTCC
AGAACGAGAA
GCCCCTGATG
AATGTCCATA ACTGCTGCTT GGAAGCAAGC CTTTTGCTCG GCATTGTGAT GAAGATTGAT CTCCAGAT'rA TCTTCCTTAA 'rCATGGCACG ACCCAGCTCG TCAATC'rCAC GACCTGCAGG ACACAAATCT TCTGTAGACA AAAGGCAGAT TTCAAACCTG GGGGAAATCA TAAA'rATCCT a.
ATCATAAGCC TCACCAACCG CTCCCAAACA TAAACCAACT CTCCAAAGGC TCCACACTCT CAG=GCAAGT CCCTGAGCCC TCCGACCAAG CCI'GGTCCGT TGCTTCTGCC AATGCCTCCT TACTTCGGGC ACTACGCCAC GGACAAGAGC TCATCGTCGT AAATGCTAAA ATATATCTAT
CATCATCACG
CTGTGTGTCC
GAGCTGCCAT
AAGCAAAGGC
ATAGGTCAAC CCCATGACAC GGCCGACCT GGTTTCCCCA ACAATCTTAT AATCTCCGCC GCCGCTGACC AAGAGGCTA GCAAGGGAPA GAGGTGCCCA GCCATGTGAT TAACAGGAAT CTTGGCAGCT GACAAACCA.A CTAGCAAGGC 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 AGGTAACCGC AACAGCTGTC ACGTCCTCTT CGATACAGGC TGTAATCACC TCGACATGGT CAAAACGTTT GTGACTCTCA ATTTGACrAG TTTTCAAGAC GGCGACACTG GTCTCATCAC
CGGTAATCCC
GACGACTGGC
CAATGACATT
AGGATGTCTC
CCTTCATCTA
CTGCGTCATG GTAGTAGGCC TTTCGCTCAG ATGCTTGCGC TCG~wrGATI- GACTGTCTGA ATTGAGCAAA CAAGGcTGAC GCAATCCCCT GCAGGACTTC TGCTTCAAAA AGATTCTCCT CATCATAAGC CAATGCATAC CAAGTCTGGT GAGTCCAAGG AC'TGACTAGG TAAACAGC'rG CAGCTGTTG TTGAATTCGC TTGATTTCTA CTCGGTATGG T'TCTTGAGCC AGrTTCCTC TTTCTCTCTT CATCATA.ATG GCGTCCTCGA CGATAACTCT CATCTTTTCT TTCTTGTAAA CTTCGAGGAA AATTTCCTTG TCTGTCGGCA GACCCTGATA AGCTCCTTTG ACAGCGATTT GCACAGCTAG AAATCCAATC ACTTCTGCCC CTTGGGACAG ATCTGCTTGG ATTTGCTCCA CCATAACAGC GTAGATGGCT TGAGCTAGGT TCATAGGCGT TTAATGTAAG ACTCGCCAGA AGCCTCGACT CGTTTGAGGT AATTCGGCAC AAAATCATGC AAGGAC'rCTG CTTCCTTGTC CCAGGCCAAA AGAGCTAGAT TAGCTGCA'TT 48 4380 1216 GGGCAATGTT TCTTTGTAAT CAGTCCTTGG CAAGTCTTT? TGAATCTGCT CAACAAACGG CCAACTTCT CCGACAAACG TTACCTGACT AGTACCCTTG AC1rrCTA GCACCTrrC AAAAGATAGG TGCGCPCTG CCATGACAGG TTTGGCATTT TCATAAAATC CTGCATAAAC ATTATTGCGA CGCGCATCCA TCAAGGGGAC AAACAAACCT TCTTGTTGAT GGGGCACCAG AGCCAAGAGA CTCGACATAC CAACCAACTC GATGTTCAGG GTGTrGAGCTA AGGTCTTAGC AGTGCTACC GCAATTCGCA AGCCTG'rATA GCTACCCGGC CCTTCAGCTA CCACGATTCG GTCCAAATCC TTGGGTGTCC AATCCAAACT TGCCATCAAA AAATCGATGG CAGGCATAAG AGTAATACTG TGATqrCT TAATATTAAT CGTCG'rCTCG GCAAGAACCT GCTTATCCTC TAAAA'rAGCC AGAGAAAGAG CCTTGCTGGA CGTATCAAAA 'rTCCTATCTT TTTGTCTGCT TACTATTATA CTACAAAAGC TGCCCCCAGA CAAGAGTGCC CTCACTTAAC TAAAAATAAT CCTTTrTCTTTr TCCCAATATA AAAGTGAACA AGAAAAAAGG TTTGACAT'rC TTGACAATCA A7T='TATCC TTATCTGAAA
GCTAATACTT
TGGCACATGG
TTAAAAAAAT
TCATAACACA
GAAT-rTTCTT
GCTCACTTTT
AGGAAAGTTC AATGACAAAT ATGAATTATC AGATA'TTGAT eg f *too 00*0 a GGCGGTCTCG CTCCCTrGG1' GGAACAGCAC TrATAGGT'rC GA'rGAAAGAT TTGAACAATT TGGCTTTGGT GTCGGTGTTG CCTrCGTAAA AAGTTTGG'rA 'rTTA'rGTCr TCATCTCTTA ATCCCTTTAC CATTCGATTT TTTGATCAGA ACAATGACTC ATG'rTAGAA.A AA'rGCAAAGA TATCTrl'GGA
TGGTTTGGCA
ATCGTGAAAT
GTA'rCGCTTI'
AGTCATGCTA
CAGT'N'GCTC
ATCTGTTrGT
CCATAAAAAT
CCACCTCATC
GTAGCAGTAT CTTGGAAGGC TAPTGCAGGT GCTGGT'rATr TrTAGGAGG AGATTAATAT TT'CTAATAAG GAATTGCAAG AAATCAAGGG ATTTATGGCA GGTTATACCA TTGGAAAAGA 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 GATAAGAAAC ACATTTTTAG AGCATrC'rCA ATGATTTGAA ATTGTTTTAT TTTTATGCTT
AAGGATAAAT
CATTACTACC
CAACTCTATT
AATAAGCTT GAAAATTCCA TTGTCATGTC TTGATAGATG GGGTGGAATT TTCGTGTCGT AAATCTACTA TCTCTACATT CCCAAACAAA AAACCCCAGC ATAAGCAGGG CATCTAAGCA TTTAATTCAA AGTAAAATAC AAACCAAACC ACATAGGTCA CGAGGAGGAG AAAAAGCGAG TAGAGAGTCA CAAACGTCAT TTTCCACAAG AACTTGGTTT GTCGTCGTTC CAG?1-rCGCA AATAGAAGAT TCCCCGCATA AACGCAAGCA ACAAAAACAA TAAAAGCTAC CAAGCGAGCT CCGATAGCAA AAGCAAATAA GTTATACATA GGGCAACCTC CTTGACTTAA AATCTATATG GAATTATGAC AAGCAATAAA TTTCACTTCC GTTA'rCAACA TAATACA'N'T TCTT'rATTTT TGAAAACGCT TACCAAAGAA ATCG'rCCCCT AACrrTCTCG TTTCCGTCTT TTACTAATTT TTCA7'rGT GGTATAATTG AAATAATTGT AACGAATCAA GGTCAATCTA GACACAAAA'r GGAATGAAAT CAAGCAAATA TCTGCTAAAA GrrrGGAATA AGCTGACCTG TAAATAGAAA GGAACTATAT GATTTACAAA G7T=TATC AAGAAACAAA AGAACGTAGC CCACGCCGTG AAACAACACG CACGCTTT-AC CTAGACATCG ATGCCAGCTC AGAACTTGAG CTGCTCGCCA ACTTGTCG.AA GAAAATCGCC CAGAGTACAA TATCG.AGTAT TGTCTGACAA ATTrGCTCGAT AATATGGCCT ACACTCTTAA GAAATCGGGA AAAACACTTA GGGATTAAAT TCCCAGAAGA TACATCGTGG ACAATATCGA CACATTGG'rG GGAT'rCCGTT CTTGCCTTGG CTTTGATCCG CTTTACGA.AA TCAACCACAA TACGAAAAAG AAACTGGCGC CTTCGAAA1'r ACCTGAAGAA G'rCGGCGTTT TTGCCATCGG CGGAATTGAA TACCAAGACG AGATTATCAT TGAC7'rGCTT GGTATCGACT ATGTCA'NTCC CCGCGTCAAG GCrGTr'?TAA TCACACACGG CCTACTCAAG CAAGCAAATG TCCCTATTTA
GGCCGTATCA
ATCGAACTCT
ACGGAGT'rCT TGGTCTAG4GA
CGTCGATGCCT
TGACTACTC'T
ACACGAGGAC
TGCTGGACCG
TGGGAAACTC GAAGAACACC GCCTCTTGCG CAACGCCAAA CACCGAGT'rG AGAACGACTC ACTCTATTCC AGAGCCTT'rG ATCGTCTGTA CGCTGACTr TAA CGAC ACCTTTAAAA ATCTCAAGGC AACTTTC~rT GGGAPTGTCA TTCATACTCC TCAAGGGAAA TTACTCCAG TTGGAGAACC TGCGGACTTG CATCGTATCG CTGCGCTTGG GCGGAAGTAC CAACCTTTAC ATCCAAGGTA TTGAAGCACG CAGCAGGCAA CAGAAGCTGC ATGGAAAAGG CCATTGTCAA TTT'ATCGAGC CAAATGAAAT GGTAGTCAGG GTGAGCCTAT GTACAATTAC AACCAGGTGA
TGAAGAAGGC
CAACTCTGAA
TATCArCrTTT
TGTTAAGACT
CGGAATCGAT
CAAAGA'rTAT
GTGCTCTGTC
AAAGTCGTTG
GCATCCTTTG
GGACGCAAGA
CTTGGCTACA
CCTGCAGGAG
TCCTGTC'rGA CTCGACAAAT GTCAGTCCAT TATGAAGATT CCTCAAATAT CTTCCGTC'rC TTGCGGTCT'r TGGTCGTTCT TCAAAGCT~CC TAAGGGAACC AAGTTCTT~AT CCTCTGTACA 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 GGCAGCCCTC TC'rCGTA'rCG CCAACGGAAC CCACCGTCAA TACCGTTATC TTCTCTTCTA GTCCCATCCC TGGAAACAC-T ACTAGTGTCA ACAAGCTGAT TAACATCATT TC1TGAAGCTG GTGTCGAAGT TATCCACGGT AAAGTGAACA ATATCCATAC ATCTGCACAC GGTGGTCAGC AAGAGCAAAA ACTCATGCTC TGC7TTGAT'rA AGCCAAAATA CTTCATGCCT GTCCACGGTG AATACCGCAT GCAAAAAGTC CACGCTGGAC TAGCAGTGGA TACTGGTGTT GAGAAGGACA ATA'rCTTTAT CATGAGCAAT GGCGATGTGC 'rTGCCC'rTAC TGCTGACTCA GCTCGTATCG CAGGTCA'r-r CAACGCCCAA GATATCTATG TCGATGGAAA TCGTA'rCGGT GAAATTGGCG CAGCTGTCCT CAAAGATCGT CGCGATCTA'r CTGAAGACGG TGTCGTTCTG GCAGTTGCAA CTGTTGACTT CAAATCGCAG 1218 ATGATTCTAT CTGGTCCAGA CATCCTCAGC CGAGGCTTTG TCTACATGAG AGAGTCTGGC 7980 GACTTGATTC GCCAAAGCCA GCGTATCCTC TTCAATGCCA TTCGTATCGC ACTGAAAAAT 8040 AAGGATGCTA GCGTGCAATC TGTCAATGGT GCCATTGTCA ACCTIATTCG CCC CCTC 8100 TATGAAAATA CCGAACGTGA ACCGATCATC ATCCCGATGA TCCTCACACC AGAVG.AAGAA 8160 TAAAGCAAGA AAACAGCCCC GTCCTCGGAG CTGN7rrcT CTATCCTTTC TrTTTGAGATT 8220 AAAACTCATA CTCAATGAAA ATCAAAGAGC AAACTAGGAA CCTAGCCCTA GGTTGCTCAA 8280 AGCACTGCTT TGAGGTTGTA GATAGAACTG ACGAAGTCAG TAGCCATACC TACGGCAAGG 8340 CGACGTTGAC GCGGTTTGAA GAGAT'N'TCG AAGAGTATCA ATAAAAATCG AAATCAGACT 8400 AGAACGCTAA GCGAAAGCAT AAC=TGAGTT AGCTCCCATA GTTCGGGAAA CTATGGGAGG 8460 CTGGAGATGA ATCAAAGCCA AGCTTTGAAC TCATTCGTAA GAAGCCGACG ACGTATCATT 8520 TTGAT7TTTG AAGAGTTTTA GAAATAC'rAC GATT=TACC TTCCAGATAC ACCATCAAAA 8580 TAGAAAATC TGCTGGG=T ACTCCCGAAA TACCGCTGGC TTGGCCGATG GTTTCTGGAT 8640 TGATGAGTT'T GAACTTCTGA CGGGCTCGG r'rGCGATAGA ATCAATGTCA TCCCAGTCGA 8700 TATTGGCCGG AATGCGTTTT TCTTCCATGC G=TCATCTT GGCAACCTGG TCCATGGCTT 8760 ***TGGAAATA'rA GCCTTCATAC TTGATTTCTG TT'rCAATCAA TTCGATAATC TTGTCATCCA 8820 AGTC'NCTGC AGCTrGGTCCG ATGAAGGCCA CCACATCTG GTAAGAAACT TCTGGACGGC 8880 
*GAAGGAAT'rC CTTGGCTGTC ACTGCATCGG TCAAGGGTTT GAACCCCATC TCCTCAACCI' 8940 'rGGCATTGGT TT'CCTTGACT GGCTTGAG'N' TGATACTGTC TAGGCGCTTC ATCTCATTAT 9000 CAAATTGATT TTCTTGA'rT TCAAAACGAG CCCAGCGTTC ATCGTCCACA AGGCCAATCT 9060 CGCGTCCCAT CTCAGTCAAG CGCATATCAG CATTGTCATG ACGAAGAATG AGACGGTATT 9120 CAGCACGACT GCTCAAGAGA CGGTAGGGTT CAATGGTTCC CTTCGTCACC AAGTCGTCGA 9180 TCATCACCCC GATATAACCA TCACTGCGCT TCAAAATCAA TTCACGCTTG CCTTGGATTT 9240 TCAGAGCCGC ATTGATACCC GCGATAATCC CTTGGCCTGC TGCCTCTTCG TAACC1'GATG 9300 *TTCCATTTGT CTGACCAGCA GTGAACAGAC CTGAGAT'rTT CTTCTTTCC AAAGTCGCAC 9360 **GCAACTGATG AGGCAAGACC ATATCATACT CAATAGCATA ACCTGTCCGC ATCATCTCTG 9420 CAT'T'CCAA ACCTTTGATG GAATGCACCA AGTCACGCTG GACATCCTCA GGCAGACTGG 9480 *TTGAAAGTCC TTGCACATAG ACTTCCTCAG TATTGCCCCC 'rTCTGGCTCA AGGAAGAGTT 9540 ***GGTGACGTTC CT'rGTCCGCA AAGCGCACAA TCTTGTCTTC AATCGACGGA CAGTAACGAG 9600 GCCCCACTCC CTTGACCACA CCTGTAAACA TAGGCGCACG GTCGTTG ?rTTGGATAA 9660 TCTCATGACT GGTACCATTG GTATAGGTCA ACCACCATGG TACTTGGTrCC I-rGACATA-AT 9720 1219 CCTCA'rCACG
TCACATCGTA
CGA'PTTCGAG
TAGGACCTGA
TCGTCACAAT
CCTTGCCATC
GGT -rTCAAC
GGGCACCGAC
CAATGGTTT
CAGAACCACC
CAAGCAGGAC
CCGCACCAAT
ATTCCTCAAG
AGACTGGAAC
TTTTCTCATC
ACTCGATA'T
GCTGCCCAGC
CCGTGCTTTC
TAACTAGAAG
TGTCTACTTT
AAAACAGTCA
TGAAGTGTAT
ATTGATAGAA.
ACCCAGTTCC
TGAGTACTTG
AACAGCCTTA
TTCCACCAAA
CGTCTrGCGC AGCrGGCCC
GGCCATCTCG
GATAGAGGGA
CTTACACCCC
TACAATAATA
ATGAATGTGT
TAGCTGGATA
'PTCCTGCATG
GTGATAGGAA
'rTGAGCTTTG
ATTGACTTGA
CTTGTGATTT
TGTCCCAAAG
AAACTACAAA
GAGAAATGAT TAGGCACTTC GAAGCCTTGA CACGTGGAGG TTGAGATTGT CAGCTAGGTr AGGTCTCCCA TGATAATTTC GCAGCATATT CTTGATGGGT GTCTCCTGGC TGAATTCTG GGTrCCTGTC TTGAAACGAC AA'rAGAAGCC
CCCACGGAGA
GGCTGTACGC
ACGAAGGGTC
CTTGTCAGCC
AAGCTGTGGT
GCAGTCCCTG
ACACCGACAA
AGATTTTCTT
TGCGCACGAA
ATCTCATCAA
ATCTCCTTAG
TTCCCTGTGT
CCACCGAGGG
TTACAAGGCA
ATACGGCTAG
TCGTATTCTT
CTTAGTTGGC
TTCTGGAGCT
GTCA.ACGGG
ACGCCGTCCA
ACACCCAGTT
ATCGCACCTC
TGGACAATCA
TCTTGTTGAG
TCATGGTTTG
AGTAAAGTTC
TrAGCATCTT
CATCGACTI'C
TGAAAGCCAG
CGGCCAAGGA
CAGTAAAATG
CTTCCCAATC
TATCCAAGTC
CATCTTCAAG
CTTCAAAACG
CTTCCTTCAC
CAATAGTCTA
AGGCTGTAGC
TCATTCTTGT
CATCTGGATG TAAGTCTTGT ACGCACGACA ATCCCCTTGG CATTTCAATA TTGATGGTCG AGCCTCAACC CCAGCGTGTC ATAAGTCATG TTTCTCTCCT TGGTAGGGCT GTTTTTAAAA AATCCACTCA CAGGGCTGCC CAAATCCACC AGATAATGAA ATT'rTCAACC ACAAAACCTA r'rCACGGACT ACCGCGTCTT ATACTTGCCC TTGTCTTTrG CCGAACACCA AAAACCGTAT CCTTTCCCT'r AAACGACACA 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10974 GAAGTGCAGG ACAAAAAAGC CTGCAACATC CAGG INFORMATION FOR SEQ ID NO: 215: SEQUENCE CHARACTERISTICS! A) LENGTH: 987 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 215: CCCGTTATGA TTATGGATAG CGCI'TCAAA T-rTTAAACT CCTATCCCAT CCTTTTATCT ATATAATAAG TGAAAATATA ATAACTGTCA AGTAACTGAA GTGAATTTTA TAAAAAAATT 120 1220 ACAAGCCAAA TTTGTAAAGT TTACACTAAG CCGCTAGgCA ATCGTCTATC AGAATATCCG TTTrATTT'GTC AATAArCCGA GAAAATCTrG CAACGCTrAG AAGTCTATAA AAACTATCAA CATTTATATG ACTTGCGAAT AGCAATCCTG CAAGATAAAA ACATGTGTAA GCAAATCTGC AAAAGCTACG ATAGGCTTGC TATCTGCTAT AAACTTATCC AAAAAGGGTT CTCCCTTCTrG CATCTCAAAA AAGAAGTAGA GACCCATTA'r GAGCTCCTCG ACCTTTTCAT AGAATCGCCG GTTTATCCCA ACTCAATTAT GACATTTTTT GACAAGAAAG AGGCrGATAA TCTACCAACC TTrAGCTrCA TTCGCTrTCT TAGCGACTGC ACCGGTACAA CCATGAGCAA TTGTAGTCGC CTAAACCTTT CCACACTCTA TCTATACAAT TACACTTTAC TGGAGGACGC CAAGAATAAG GTCCGTATTG GC*TTTGTAC AGACGATTCr GAGCTGACCG AGGAAACTC TATGCTGTCT CAACCAAAGA AATTATAAAA AAAGTCGAGG AACGATTTAA CGAGAAAGTA TGACTTTTAC TCAAAAGTCA ATATATCTCA C7T=NCAAC TCTTATTCTG AACCCATCAC TCCATCACTT AATCTGGTAT TCGACTTGGT CATI'CCCCTT TCCTATCTGA TGCGCTA'TrT CAACCAATTT 'rTAGAAATC AGAGGGCGGC TCAAGGCAGA TACCAAGAGA ATGTGACTGA TGAGCCACTA GCACATAATC TGTAGCAAAT TACTTTTGTT CATAATAGGC TCGTCCTTAA CATCAATGAC ATAAGATTCT ACTGCCCAAA CCTTAAG INJFORMATION FOR SEQ ID NO: 216: SEQUENCE CHARACTERISTICS: LENGTH: 2651 base pairs TYPE: nucleic acid C) STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 216: CTGGGTCTrG TTCATAGTAG GTGTGGTtCT TT'rTTTCGAG TGTAGCCCAT AGCTTTGAGC GCATAGTGGA 
TGGTAGTTGG ATGACAGCCA AAGTCAGAAG TCTGGATTGT CAGTAAGA'rA GTTTTTAAGT CTATCTCTAT CCTTTTACTT GGTGGTTTAG CTCTCCTGTT TTCTCTTTTA GTATTACGTG AGATTTGGAA AACGTGTGAT GCTTCTGTTA TAAGAGAGAA CTI-rMTACG AAAATCTATT GAATATGCCA GTGTACTATT TTTGGTTCAT TTTACTATAT TTTATAAGTT CAAAGCACTA TAAAGTAAAT TGAAACAAGA ACAATACAAA CAACCACAAA AAAGCAAGCA TTCACAAGAA TACTTACCTA CTATTTCAGT CAAATAAGCA CAACTTI'TCT TGGTTTTGT'r GCTTTAACCA GCCATAAATG TACTACCTAT TCGCTCACAA TAAGAAGATT ATACCACATT ATAGTGTAGC ATTCCAACTT CAATTCTCGT AAACGGATTG TCATGGGAGG AACAACCGTT CCTCTTrTT ATTACTAAAA TTCAAAGAAT TCCALATGCTT 7"='CAAGAG CAAATCCGTA TA-rCTGGA'r ClrCTGGGC TAC 'rCTATT TCCCGCTGAA C~r=TCCAA ATCATCTGTA AT~CACTCCAT CTACTCCTAA GTGAAGAGAT TTGCTGATAG CTTCTGAATC CAGACATAAA GT7rCTGATC CG~rGTCCA'r AGTTTGCTTA CAAAATA'rrC GAGTACTCCA TAGTATA'rCC TGTCGCTCTr GTNTAGGAA AGACAGAATT
ATTGACAGTC
ATCCAACGTT
CTACGGCrATG
ATGGTAGTCT
TAAAAAGCGG
ATGAAATAAA CTrxGTAGTTC
AAAGACTGGA
TTCATCATGT
AGTTCGTTGG
TCAAAAATAT
CCTGCTAGAT
TTTGATGTCC
CTGGACTATC
CTCGACrGAG GGCATCATAC TGTCrrACTT ATAAATCTTG AGCr1-rGCAG TTTTTACTG GT=rAATTT ATAATCTTCA AAGCTTGAAA
TTTCGACAAC
CATAACGGGC
CAATCCCTTT AAGCTCCTCC TTTCAAGTT AGCATCATC a a a a. *a a a a a. .a a a TGCACGTCCG TC 'CCACCAA GTA'rTrTGAA TCCCATTTGC ACCATGGGAG CCTCCAGATA ATGACACCCC A'rCGCACGAT
CCTGTCAAAAATGAAACAAA
TTAATCACGA CAAAATTCAA GTCrGGTTTG ATTG4GAAACC
AATATAACCT
AAGTTTAAGT
ATCATGACAA
AGTTGTGCTG
CCTCGGTGAG
TCTAAGGCAA
CAATTAGTAA TTTGACCA TTTTAGTCTG GTAGCCATTT CTTGAGGACT TTTATTGATA ACTGCCCATC TTTTGTTTCC TAGT'rTCCAA GGACTCTACT AAATAAGTTG AGGTAGATGA AGAAAAGACT GGCACAAGTC GAAGCATATC CAGCTCCTTT GAGCCATATA ATAGAGAT CTCTCTGAGT GATATCATCT G;TGATCTTTT TCTCTCCTAG TTI'AACCAAA AAATAAGTCA AATACCAAGA ATCAGAGACT 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 ACCAAAGTTT GAGCCAATAA TAAAGGAATC AAAGGAAGAT ArAATAATAA ATGTGCTTTG AGCAAGATGT AAAATAAATT CCAAGCATAA AAAGTAAC1'C TCTTCTTGGT TTTCTCCAAG CTrAAACATCA CTGCTTCTCG AACAGTCAGC TGATCATATA CAATCTTCGG AAGGGCAAAC ATCAATCTGA CAGAGACATA GAGAAAGATA AGAGATAGAA GTAGGATGCT CAGCCACCAC ATCCAATATC TATCTTCTAA ATAAGCTTGG ATAA6ACTCTG GAATGACGAT TTTATTAAG'A TAATAAATCT TCAGCATTTT CCGTATAAA.A GGAAACAGCA TAGCTATATA GAAAALAGATA AACAAGGCTT TAGCGCAAGT TAGCTTT~TC ATAA6ATCCAA AACTTTCATG GAAAACCTTG CGGATATACT CAATTAGCCT TCGCTTTrTCA TTATAGACGA GATGACGAGC ACCAATAAAG AGGAG'rCCTA TTGAAAATA AGCAACCAGA AGGTTAATTA CAATCAAGGC TAAAAAAGCT
AGACTAATCA
TAACCTGTCT
ATGGAGAATG AGTAAGGATC GCTAAGACAT TrGTTATAGGA AATAAAAAGA GATCTAATAA GAAGCTAGCC AACCATGAAT TGAATGGTAC CCACAAATAC TCCACTATCA TAAAAATCA.A GAAAAATAGA AAGAGGATTT TATCAAGATC GAGGTAAATC 1222 TGTTTAAGAC CCAATTT'r-r AGGT'TrTCA GGTTTCATAG GAGACAAGTC CAAGCCACCA AAAGGATTGT TTGATAAGCT CCCTAGCTTG ATCCGACTCT AAGAAGGATT CGTAAACACG CTAAACTATT ATGAGACTGA CCTTGA.AATC CAAGAAATGA GATTGGCAAT ACCATGTAAA TCTGAACTCC GACGTTCAAA CCTTGTACTG TTGGCTATAG TCTAAACCAT GCTCTGCTAA GCACTCCTAG TCAAATAATT ACTT'rCTGTC TTAACAATT- CGCCGTCATC CGAGCATCCT GGCAACAG= TGCAATTTGA AGCTTCATCA TACAAATCCA AATAGGTAAA TCACTTTTAC 2340 2400 2460 2520 2580 2640 2651 CAGCATTGTA G INFORMATION FOR SEQ ID NO: 217: SEQUENCE CHARACTERISTICS: LENGTH: 5638 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 217:
CGTTATAATA AACTTGTGAA AA.AATTAACA AAGGATATCG TTCCTTGAAA GCTATGGAGG AAAATATGGC TGATAAAAAA ACTGTGACAC CAGAGGAAAA GAAACTCGTT GCTGAAAAAC ACGTAGATGA GTTGGTTCAA AAAGCTCTAG T'rGCCCTTGA AGAAATGCGT AAATTGGATC AAGAACAAGT TGACTACATC GTTGCCAAAG CATCAGTAGC AGCTTTGGAT GCCCACGGAG AATTGGCTTT ACATGCCTTT GAAGAAACAG GACGTOOTOT ATTTGAAGAC AAAGCAACTA AGAACTTGTT TGCCTGTGAA CACGTAGTAA ACAACATGCG CCACACTAAG ACAGTTGGCG TTATCGAAGA AGACGATGTA ACAGGATTGA CTCTTATTGC TGAACCAGTT GGTGTTGTTT GTGGTATTAC TCCAACAACA AACCCAACAT CAACAGCAAT CTTCAAATCA TTGATTTCAT TGAAGACACG TAACCCAATC GTCTTTGCCT TCCATCCATC AGCACAAGAA TCATC'rGCTC ATGCAGCTCG TATCGTCCGC GATGCAGCTA TCGCAGCTGG TGCTCCTGAA AACTGTGTGC AATGGA'ITAC TCAACCATCT ATGGAAGCAA CAAGTGCCCT CGACAATCCT TGCAACAGGT GGTAATGCCA TGG'N'AAGGC CAGCTCTTGG GGTAGGTGCC GGAAACGTTC CAGCTTATGT GTCAAGCAGC ACACGATATC GTCATGTCTA AATCATTTGA CTGAACAAGC AGTTATCATT GATAAAGAAA TTTACGATGA CTTACCACAC TrACTT'rGTA AACAAAAAAG AAA.AAGCTCT TATOAACCAC GAAGGTGTTG GGCTTATPTCA TGTGGTAAAC TGAAAAATCA GCAAACATTC TAACOGTATG GTCTGTGCAT ATTTGTAGCA GAGTTCAAAT TCTTGAAGAG TTCTGCTTCG *555 S. 55 S
GCGTCAAAGC AAACAGCAAA AACTGTGCTG GTGCAAAATT GAACGCTCAC ATCGTTGGTA 1020
AACCAGCAAC TTGGArrGCA GAACAAGCAG GATTTACAGT TCCAGAAGGA ACAAACAI'C 'TCTGrCAGA ATGTAAAGAA GTTGGCGAAA ATCAGCCATT GACTCGTGAA AAATTGTCAC CAGTTATTGC AGT'rTwTGAAA TCTGAAAGCC GTGAAGATGG TATTACTAAG GCTCGTCAAA TGGTTGAATT TAACGGTr GGACACTCAG CAGCTATCCA CACAGCTGAC GAAGAATTrGA CTAAAGAA -r TGGTAAAGCT GTTAAAGCTA T'rCGGrIrAT CTGTAACrCA CCTTCTACTT TTrGGTGGTAT CGGGGACGTT TACAATGCCT TCTTGCCATC ATTGACACTT GGATGTGG'r C?1'ACCGACG CAACTCACrr GGGGATAACG TTAGTGCCAT TAACCTCTTG AATATCAAAA AAGTCGGAAG ACGGAGAAAT AACATGCAAT GGATGAAACT TCCTTCAAAA ACATACTTTG AACGTGAI1TC AATTCAATAC CTTCAAAAAT GTCGTGACCT TGAAC =T'C ATGATCGTTA CTGACCATGC CATGGTAGAG CTTGGTTTCC 'rTGATCGTAT CATCGAACAA CTGGACCTTC GTCGCAATPLA GG7'rGTrMAC CAAATCTTTG CGGATGTAGA ACCGGATCCA GATA'rCACAA CTGTAAACCG TGGTACTGAG ATTATGCGTG CCTTCAAACC AGATACCATC ATCGCACTCG GTGTGGGTC TCCAATGGAT GCTGCCAAAG TAATGTGGCT CTTCTACGAG CAACCAGAAG TGGACTTCCG TGACCTTGTC CAAAAATTCA TGGATATCCG TAAACGTGCC TTCAAGTTCC CATTGC7'rGG TAAGAAGACT AAATCATCG CGATTCCAAC TACATCTGGT ACAGGATCTG AAGTAACACC ATTTGCCGTT ATCTCTGATA AAGCAAACAA CCGTAAATAC CCAATCGCTG ACTACTCATT GACACCAACT GTGGCAATCG TAGATCCTGC TTTGGTAT'rG ACAGTTCCAG GArT'rGTTGC TGCTGATACT GGTATGGACG TATTGACTCA CGCCACAGAA GCATACGTAT CACAAATGGC TAGTGACTAC ACTGATCGTT TAGCACTTCA AGCCATTAAA TTGGTCTTG AAAA'rCTCGA AAGCTCAGTr AAGAATGCAG ACTTCCACTC ACGTGAGAAA ATGCATAACG CTTCAACAAT CGCTGG'rATG GCCTTTrCCCA ATCCrTCCT AGGTATTTCT CACTCAATGG CCCATAAGAT TGGTCCGCAA TTCCACACAA TCCACGCTCG TACAAATGCT ATCTTGCTTC CATACGTTAT CCGTTACAAC GGTACACG;TC CAGCTAAGAC AGCAACATGG CCTAAGTACA ACTACTACCG TGCAGATGAA AAATACCAAG ATATCGCACG CATGCT'rGCA CT'rCCAGCTT CTACTCCAGA AGAAGGGGTT GAATCTTACG CAAAAGCTC'r CTACGAACTC CGTGAACGTA TTGGGATCCA AATGAA~rrT ACAGACCAAG GAAMTGACGA AAALAGAATGG AAAGAACAT'r CTCCTAAATT ACCCTTCCTG GCTTATGAAG ACCAATG'rTC ACCAGCTAAC CCACGTCTTC CAATGGTAGA CCATATGCAA GAAATCATCG AAGATGCATA CTATGGCTAC AAAGAAAGAC CAGGACGCCG TAAATAATTG TTTATCAGTC TAGAAGCAAG ACAAAAACTC AATTTGAGGG 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 
1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 AAAGATCCAG TAATTTTTCT ATGATAAAAG AGGATGCCTT TTTATGATAT TGAGGCCTTT AAAATATATA ATAGATTGAA ACTAGAATAG TTCGATTGA CTGTCCTGAT CGATTTGTCC TATAGTATAG TAGACTG.AAT CTAAAATAGT T'rAATTT'rAC ?TTTTCTGATA GAGTTGTTCA 'rAAGAGTAGT ATTTACTAAG GCCCAATTAA ATTCAGCTCA AAACACTGAT TTGAGATrC TACGATAAA-A CGATCACAAG GTGTGTTGCT CTCTATAAAA TAAGTGCGAA GGAAATGAGC CTCAGATArr CTTATATCGA CAAGAAGTTT' TAAATACTGG CTCATTCAAT TTTTCACTAG AGATATAAAC AAATAAATTG GAGCTTAACA GTGGACTATT CTAGATTCAA CATATTATAA TGTGTAATGC AGGATCCAAT CCTr'rCAATC TGTAGTCGTC AT'rAA'rAAAG ACAGATGGGA ATCGATGAAT TCTTAGATA GCAACTGAAT ATGCATAAAT AGCAAGGAGA ATCCTATTTT AGAGTGAATC TTGTGCGCTT CTACTAAGGA TTCTTTTAAA TTTCCTGCCT TAGCTAGT'rG TCCACTATTG GCAATGAGCT GAAAACAGAT CCTATCCATG CAAG'rGCT'rG TTCCAGAACT ATATCAATGG TATCTACAGG GA'rA'I-rCCT CGAArGTG GCCCAATTAG AATGACATCT ATTGATTTTG CTTGGATAGA GATYCAATC TTAACAAGCA TACTTGTCGA CATTCCCGCA TTAACCTTCC AT'rTCTITG= CAACAACTITT AAGAACTCCA AGTGCAAAGA TGA'rGAATTG TGCTAACCAT GCATTT-AAGA ATACTGGTGT CATGAATTCT GTAACTGTTC CTAAG'rAGCT 1224
GCATCCTATC
TTGCCCTTr
TACATATCTG
TGTTCTTA'TT
ACGAAACAAT
CATCTTATTT
AATCAAAGAG
AGATAAGACT
TTTTGA'N'TC TAAAGAGTAT AATCATAGAT TTTTATAGTC CTTTCGT'T AAAATACTAT TTGAGTCA'TT CCCTCATCAT ACATATTAAA AATAATAAGC TAGTA'rAGTA AACTGAAATA TCCATTTCCA GCAATTTTTT AGAAACTACA AAACTAGACT AAAAGAAAAG GA'N'GGATCT ATTTTGTCCA ACr'r'TGGAG GTTCCTACAA ATGACAGTG;T TCCTATTTAT TTTGATAGAG AATCTCTGTT GAAGCCAT GGTCTTCTGC TTCTCCAGTA GCTTCTTTTT GTATGAGATT GTCTTCCGCT TCTTCAACTT TAATTTTCGC GATGGCTTCA ATAAAGGATG ATTTGGCTGC ATATTCCATT TCTTCTCTCA TCTTATTTCT 'r'r'GCTCCAT TCATCATTCC CTAATCCCGC GCAATTTCTT TCACAGCAAG TAACTCATAA GCTTCATGGA TATTCNTT AGC~rCTGTC CCACGTTCAG TCGCACTTTG T'rGCATT'TTT TTACATACTA ATAAAATTTG TTTCATAATC GTCATTAACT TTGA'rAAATG GAATGTATAG AACTAGAACT GCTCTCACGT CCCCTGCTGT AGTCCAAGGA ACTTGTA'rAA ATGCAGGACT GATTAAAATA CCAAGGACTG GAACTGTGAT AACGT'N'?rTG AACACCTGAT TGAAAAACTA GAATAGAAAC CTTCTAAAAC ATTGTTAGAA TCATTTTGAT ATATAAAAAA TGCTAAAACA 'rrTATAGAAA CAATTCACTA TAGTTTAATT CAAACTAGAA AACGAGTGCC AGCCCCCTCA TTAACAGATT 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 AAATGGAATA GCTAATGAAA TGTrATAAAC GATTGGGTAA CCGAATAATA CTGGTTCATT GATATTGAAG ATACCAGGTC CAAAAGATAA TTTAGCCACG TTTTAGAGA CAGCATGCG ACTCACTAAG AATGTTGCTA TTAATAAACA TAATGTAGA'r CCACTACCAC CCAN'AAAGC GAATGTTT-GT ATTTGTGATA GGATGAT GTGTGGAATG GCFGTCCAT TATrrTGCTGC AGTGATGrTT TCAGrAATGr TAATT'AATAG TAAGGITrCT AGGATGGCAC TGTAAATAAC TGC nrGGTGA ATACCAAATA GCCATAACAT GATTAAGCTT GTACCAATAT GACGAATTGG ATTCCTAAA GAGTAAATAA TAATGACCCC TTCTTGAATA AAGA'rrGTAA TGA'rrGAGAT TAAGTTCATT CCAGTTATAT TGAATAATAA TGCTGAAACA GGTCATGACT GGAAGTAATA CGCTAAATGA TCTACTAACA AAGGTTCATT TGTAAAGCTT TAACGTTTGA TAAT'rCAATG CGtACGATAA CCCCGGCGAA CATTGCGCCT GTACCTGTGT GAAATGTTTA CCGCATCTT'r TGCTCCGTCA GGAACTACAG ACCCCAAATA AGGAGATGAC GCTGGTGGAA TATTTTCACC AATA.ATTCTG TTGCAATAAT TGTTGAATGA AAGAACACCT AAACTGTATT TGGCATCATC ACAATTAAAG AAACTAATGA TAGCATTGAT GCTGCTAACG GGTTTTCGAA ATCTCTGTI-r TTAGCTAAGA AATAACCAAC CATTACAGCA ATAATCATAC CTGAAATACT TAAAGTACCG 
TTTGCAATTG TTATTCCCCA ATATrGGAAT CTTGTTAATG TTAAATACCG TGTrGTTCAA AAGAACGATT AAACCTGCCA GTTACGAATG CATCTCTTAG GG7rTTAAA TGAATTTGGT GATGGCA.AAA AAATT'N'TTT GGGGGGGGGG GTrATTAAAC INFORMATION FOR SEQ ID NO: 218: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 4745 base pairs B) TYPE: nucleic acid STRANDEDNESS: double (D TOPOLOGY: linear TATCCCCTTG GAAAATCCAC AAATATATAA TGGCATTACT TCCCTAGTTT ACCAGCAAAG CCCCC7TTT A.AAAAAAA 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5638 120 180 240 300 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 218: CCCGAAGCTG TTGCCCT'rGG AACTCCAAAT GAAGAAACAG CCTTTGTCTT CAACTAT TTT GGTGTGGAAC CACCACGTGT TATCACTTCT GCCAAAGCAG AGGGGGCAGA GCAAGTTATC TTGACTGACC ACAATGAATT CCAACAATCT GTATCAGATA TCGCTGAAGT AGAAGTTTAC GGTGTTGTAG ACCACCACCG TGTGGCTA.AC TTTGAAACTG CAAGCCCACT TTACATGCGT TTGGAGCCAG TTGGATCAGC GTCTTCAATC GTTTACCGTA TGTTCAAAGA ACATGCTGTA 1226 GCTGTGCCTA AAGAGATTGC AGTTGATG CrTTCAGGTT 'rGATTTCAGA TTGAAATCAC CAACAACACA CCCAACAGAT AAAATCATTG CTCCTGAATT GCTGGTGTGA AC ?TGGAAGA ATATCG7TTG GCAATGTTGA AAGCTGGTAC AGCA.AATCTG CTGAAGAATr GATTGATATC GATGCrAAGA CTrrTGAACT AATGTCCGTG TTGCCCAAGT GAACACAGTT GACATCGCTG AAGTTTTGGA GAAA7TTGAAG CTGCAATIGCA AGCTGCCAAC GAATCAAACG GCTACTCTCA ATGATTACAG ATATCGTCAA CTCAAACTCA GAAATCrrGG CTCTTGGTGC
TACCCTTCTT
GGCTGAATTG
CAACTTGGCT
CAACGGAAAT
ACGCCAAGCA
CTTTGTCTITG
CAATATGGAC
AAGGTCGAAG CGGCrTCAA CTTCAAACTT GAAAACAATC ATGCCTTCCT TGCTGGTGCC GTT'rCACG'rA AGAAACAAGT GGGTGTCAGC TCAAAATCGG rTTTTTCTAG GAGTGAAGTA AGACATGGGA CAGGCCATCC GGATGGGATG ATTTATCA'rG CTTCTr'rGAG TCCAATCAT1' GTAAGGAA AGAGCTTGCA TGCAGCTGGT TTTTACTGCT TCCTATGAAA 'rTTGGAGW'rG AGAACTAGGT CTGCCTGTC ATCGCCTCTG TTACAATGTA GT'r'GACGAG ACAGCCATTT GATTCTACGG GAAATTAAGG TGTTAGAAAA TGGCGATTTG AGACT'PCCAC AGGTAACTAT CTAGTGGACA GGCTGGTGTT TATACGACCT CTA'rG?'TAC AACATCTTGG AGCACCCTAC CCCAGTATAT AGCAGAAATC GGGAGCAGGA GAT'rAGTGAT CrCTGAACCA AGCTGGTACC AAGAAAGGAA TCTTCAVGAT TTGGAGAAr'r GATCCGCTAT CTCAATTTCC AGATGT'rGCA GGTACCTCAA T-rGACTGAAA GCTTTAATGC GTAAGATTT AAAGTCTAGT TTGCCTTATA TCGCAAGGAG TTTCGGCTCC ATTTTGTGA GAGATGGGTC AGCCATGTTG CCATTTATTT GTCTG1'CAAG AACCGGCAGA CCAGAA.ATGG ATATCCAGTC AATCCTTCTT TCTATCCAGA CTACCTATTT TTGAAACTAT TT'rTCAGGG AGTATTACAT AATCCTAGTC AGTTGGCAGC TCAGATTTTT AATCCATCTC CTGGATCAGT ATGAGGATGT GTTGATAAAC TCATGGAAGA TATTACCTCA ATTTT'CCTAC T*=TCAAG AAGCTAGTCC TTGCGCA-ATC AAATCAATGC ATGACCCTGT CCAATTATTT 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 GTATATAAAG GCAGGCTTGA TTC'rACGTGA GCTTGAATCA CT'rGATAGTC 'rTGAACTG-GA GGTCTATCAA GCCTTGTTGG AGCAGACTTT'
AAATAAGCGC
TCAAGAGATT
TGAGACGGAA
AGCTA'TTA GTTGAAAACA CGGACI'TGC GCGCATTAAA TTACAAGGTC AAACAGCAGT ATCCTTTGAC AGGAGATGTT AATCCTGAGT ATGCCCTCAA CAAAAAAGAC CAGC'rTATGC AGAAATGCCG AGGCTATATT GTGCAAAA'rG AAGATGGAAA GAGGTTAACT T'rCTACTTAG CGTGATTTCT AGAAAAACAG CACGAGCCCT GTATATGACG GCTTrTTTTT TGATATCTTT GTGGACAGTT GTATGAGTTG GCTATCGAT'r TGTTTCTGAG TACATTGT'rr ATGACATTr
TGAAATTTCT
AGGTTGTCTT
T'rGA'rAAGGA
GACTTITCCTT
AGTATTCGGT
ATCATATGTC
G1,AAAGTTAC ATAAACTATA TGTAACCGGT AACACATArC GGAATAAACT AAAGGAGACA ACTTGAAAAC AAATTGGAAC AAGCAACAGG CGCTGTCAAA GAAGGTT "G TGGAGACAGC AAGACAGAAC TTGAAGGAGC TGTTGAAAAA ACAGTTGCTA AGGCAAAAGA CGTrGTAGAA GACGCAAAAG GTGCTGTAGA ACGTGCCGTr GAAGGTT'rGA AALAACGTTTT TACTAAAGAA TAGGAAAAAA TCAAGGGTr CAT1--ICCCT TGA7N "rT 'rM-rCTTATA AATAATrTTC TGCGACGGC GTATCTCCTG GGTAGGATTC 'N'TCTTGCCC TGGATGAT'rT GGTAACAATC GGCTCCC??A CCCGCAATAA TAACTGCATC TAATTCGTGA TTTGTGATAG CCATTGCCCC CTTGATGGCT TCTTGGCGAT CCGCAATCTT 'rTCAACAGGA TGAT'rGATGT AGCTACTAAT TTCATCTrGCA ATGGCCATTG C.GTC~rCATA GTTAGGGTCA TCAGCAGTCA GAAAGACTTG AATCTCAGGG 'rGTTGATTGA GGAGGAGGCC AAAGTCCTTA CGACGACTTIT CTCCCTTG? T ICCTGTTGAT CCCAGAACCA GAGCAATCTT TCCGCTTTGA TGAGT'rCAA
AAGACCTTGG
GCGATGCCTT~
CCACATTGAT GAGTTTTC CTCCATTrTTT CTGAGTGAGG TTTTGATGTC CTCAAGACT'r AGACTATCCC CATTGTGGGC ATACTCGATG AC'rTCCATAC GACCAGGAAC GCGCGTTGCA GCTCCGAGAC GGAGACAAGC AAGTCCAGCA ATGAGTTGAA TATCATA.ATC TCCAGCGAGT
S S 55 S S 5 GCAACTGCAT T'rTC7"rGGTT GAAGTTGCCA TTACCCGTAG CTGAAAAGCT AAAGGCTTTG CCATAGAAAT CATGGTCTTG ATCTTCAACC TCACTGTTAA TGATGACTGC T1CGGCTCTTT TrrCAAAGC TAGGGTGTTC AATCGGGCCG CCCACATCAA AGGT-rAGACC ATAGACACGT ATGATGAGGT CGGTACGGTC ATTrTTGCACA CTCTCAGGGG TTGTCAACGC TGACTTAAAG GTCGACAACA TAGCAGGTCT ATGCCCTTGA GTTGTCT'rAC CCTTAGTACC AGTAAAGGCA TAGAACTCCA TGGCAATCAA ACTCATG4GC1' A TACCGACTT CGTAGTCCTT TTCAGCTACA AGAAGGTATT CTT?=?AAA CGCAGCGCCT 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 GAATTC'rCGA TTTGGTTATC TGTTCTTTCA AGACTGAGAA TCCATCAAGA GACGC?1'GTG ATATGGTCTrG GGCTGATATT
TTGACCAGAT
GCCTGA'rTCA
AAAGTCTCGC
GATAAGATGT
AGGAGTTTGA
TTCT?1'ATAT
AGCCTTGACT
TCATGTCAAA
CATCAAGAGT
TATAGGCGAA
A.AATTGGCTA
GTGGTCCATG
GTAGAAATAG
TAGGAAAACT
GGAGACTTCC
GAGGTCAATA
TGTGTTCATG
ATAGGCTGCT
&5500 '0.0.0 '00.0 GTTTTTCCTG TGGATTACCA CGTTCACAAT GATGACAGGG TACCAAGCTA ATCCTTGTGT TATAGCAGAA TTTGCGAAAA AAAGAGTGTC TTCTGTTACT TTTCGCCTGT CGTAGCTGA'r CCTTrGGTCALA TAATTTCGCG GCTATCAAAA ATAACTTTGC TGTAGTTGTA GTGGTAA'rGA A.AAAAGGCCA TCTTTCTTTA AAATATCTAA 'rACGGTTTCA 1228 ATCTTAATCA TACTrTCTAT TGTAAACCGA AAGTCGTAAA TTTACAAGTA ACAAGGAAAA 3900 GTTTATAATG GAAGATAAGG AGTrTTCCT AGTrATCAAA ATTGAATGAG GAATCTATGT 3960 CGCACGAAAA CAATCACCAG CAGGCCCAGA TGTTACGGGG GACTGCTTrGG CTAACGGCTA 4020 GTAACTTTAT CAGTCGCCTA CTCGGGGCTG TTT-ACATrAT CCCN'GGTAC ATCTGGATGG 4080 GGGCTTATGC AGCTAAGGCA AATGGTCTCT TTACCATGGG TTACAATATC TATGC*rGT 4140 TCTTGTTGGT TTCAACAGCG GGGATTCCAG TTGCGGTGGC CAAGCAAGTr GCCAAG'rATA 4200 ATACCATGCG AGAAGAAGAG CATAGCTTTG CCCTGATrCG GAGCTTCTTA GGCTrTATGA 4260 CAGGACTAGG CCTCGTTTT GCTTrAGTCT TGTATGTCTT TGCTCCTrGG CTAGCAGACT 4320 TGTCTGGCGT GGGCAAAGAC TTGATCCCAA TCATGCAAAG C'rTGGC'IrGG GGAGTCTTGA 4380 TTTTCCCGTC TATGAGTGTT ATCCGAGGAT TTTTCCAAGG GATGAATAAC CTCAAACCCT 4440 ATGCCATGAG CCAAA'N'GCT GAGCAGGTCA TTCGTGTTAT CTGGATGCTC CTAGCAACCT 4500 TTATCATTAT GAAGCTCGGT TCAGGAGA'rr ATCTAGCAGC CGTTACCCAA TCAACCTTTG 4560 CTGCCTTTGT CGGTATGGTA GCCAGVM'rG CAGTCTTGAT TTATTTCCTT GCCCAAGAAG 4620 GTTCACTCAA AAGAATCTTT GAAACAGGAG ATAAGATTAA CAGTAAGCGT CTCTTGGTTG 4680 *ATACCAT'rAA GGAAGCCATT CCTTTTATCC TGACAGGGTC TGCCATCCAG CTCTTCCAGA 4740 :TTTTG 4745 INFORMATION FOR SEQ ID NO: 219: 9 Ci) SEQUENCE CHARACTERISTICS: A) LENGTH: 1900 base pairs TYPE: nucleic acid 4 STRANDEDNESS: double TOPOLOGY: linear Cxi) SEQUENCE DESCRIPTION: SEQ ID NO: 219: CCTGATTGAC CTTATAATAA GGAACAAAAC ACAATGCACT ACCTTTTCAA CAAAAGAGTr VoteGCTGCTTGAT TAAAACCATC ACACCAGTTA TACCATTTTG CTTCATACCC ATCTTGAGCT 120 AGGATACGAT CTTCTAAATC AAAAACAGAG TAAATCTTTC TTTCCTCGCA AGCTTGCGCA 180 oooTAGAGATGAT ATAGTTCATC ACCACCATCT CTATCCCACT CAGCAGAAAT CGTATCCCGA 240 CCTGCCAATA AAGCCTGATA AGCCCTGTGA TGCCCATCTG TAATCAGCAA ACAATCTCCA 300 AAGGCAAGAA TACTGATTGG ATCGACTTGG A?1'GTTTCTG CCGACTGGTA AAGCATCTGA 360 *ATATCTTGCA ACTTCTTTTC 
TGATAAATAT AGTTGAGTCA GATGAAGATC TGCTATATTG 420 ACTTTCATTT CTTTCTCCTC AAGGGAA'rTC GATACTCACT TCTGTTTGCC TTTAAATCGC 480 1229 CATrGGAAGC GGAgC=rGC ATAAAAGGGA AACTCGATAA ACAGGACTCC CAAGCCCACA CAGAGACTGG CAAGGACGTC TGATGGGTAA 'rGAACTCCCA GATAGACTCT TGATACCAGC ACACTGACTA GGTAGAGGCC AAGGACGA'TT ATCCGCTGAC TAAGAATAAC AATCAAAGTC CCACT'TGGGA AGGAAAATCC CTrCTCCTCC TGGTAGATAT TTTITAAAGGT CACGA'rrAAA AAGAAACTTT1 C'TATCTTCCA TCGCTTACGA GTGATAATCA CTGGGATATC AATCAGACGT TCTGGTAAGT CTCCTCGAAT GGCAGTC'rCA GGGTAAAATT TGACCATGTA GCCAAGAATA ATTAAAAATG T1'rGTrrATC TCTCATAATG CAACCAGAAT GAAACGGAAA AGATAACACC CATTAGGTAG TTGGAAAGTC CCAAAATTr'r CAAAGGAACA GCATAAACAT AGTTGAGAAC AGCTAGAGAG CCTAGTAGGA AACGAAGGGT TACGATGACA AAAACTCCCA AAGCTACGA", TTACGA=T TTCTCCAGAC CTGATCTTTA CCTACCATCA GCGTTACAGC TAGAGAATGC ACCAGATGTA AAATAGCTGG TCGTGGGCGC AGACCrGCCA AAGCC.AGAT T TCCCAGCATG TAAAAGACAA AAGCTGTAAT GACAACCCAA GTGAGGGCTC GAAAAAGAAT AGTCAAATAA ATCGATTGGT CAAAATTGAC CAACATTTCA ACGAAAAGTA AA.AGGGCAAA ACTGCCCTTC TTTTAAGGTr GGT'rTCAAGA GAACATACAA TTCAATCAAG TTAAA.AGGTA ATACCATGGT TCCAATATCA AAGTTAGCAA CAACATGGCC AAGGTTAAAC TGTCCGTTCC TTTTTCCAAA ATTCATCGGC AAACCAATGT
ACTTAGCGTA
CAATAGTTCC
TCAAAGCAAA
AAGTATTCAC
660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1900 TCCTTGGCTG TTAAGAAGCA ATTI-CAAGAG TGAGCGAAGC AAGAGCACTC CTAGAGmCsC AGGCAAATCC ATGACCACCA GACCCACAAG GACTGGCAAG ATACTAAATT CGATCTTGAG GAAAGATGCC GCTGGTAAAA GCGGAAAGTC AAAGTACATC AGCACAAATG AGATGGCTCA TAGAATTGCA ATGGTCGAAA GTCGACGTGT GTTTGTCATA ACAGGTTCCT CCAATTTTCT ATAAAATCAG AAGAAGTTGG AAAGGATTCC TCTATCTATT CTCACTTTTT ATATCCCAAA AGTTCCCTCT 'rACTCTATTA AAGAAAAACA AAGCAAGTGG TTACAATCCG GCTATAAATC TATCAAAACA GACAAGGCTA TTCTTTCGTC TTCTCCCA'rC CAGACTATAC TGTCGGTTGT GGAATCTCAC CACATCACGT TGCGCTCACG GACTTCTTTA INFORMATION FOR SEQ ID NO: 220: SEQUENCE CHARACTERISTICS: LENGTH: 4692 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 1230 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 220: a GGTT"rCCAG CAGGAGCTTC GAATAATGAT ATrN'TITACC TCTCCATATG TCCATACTCC AATGTAAAAC ?TGAAATAAG TTTTTCTTT CAAAAAAAGC TCTTTTA1'TG AAACCGCTTr GGAAAAAATC GAATTTTTTA TAGTT'rTATT TCAAAATAAT ACGAATTI'A TTCAAGTTr AAAAAGCAAT TCAAACTATI' TATCTTCTTT TACTTTT'N'T AGGATCAACA 'rGGCTATTGC GATAGAACCA ATAAAATCAA GTCCCTTTAA TGCTTTCTGG ACTAAATAGA CCATCTTTCT AGCTTTCTrCA CGCTCTGCTT TACTTCCATG TACTCATCGT AGGCGTCI'G ATAACAGCTG TGCCrTCGTG ATGTTAACTG TTCATAAACC TCAGTACCTG TGAGTAAGTT GTAGCCTTAC AATTTCCCCT GGAATCAATG ACGAGTCATT GATAAGAACT GACCAAACGA GrrTGACCAT AGAAAGGCTI' TG4GATAACAG AAATGGTTCA CATTTAATAC AAGTTCATAG CCCTCACGAC TCCTGAAACA GTCCATTTAT TTGCAATTCT GCCTGCAAGC
TCCTTI'ATCA
ATGA'rAGTAA 'rCCA'rCTGGA
AGCTAGAGCA
ACACCTTGAG
TATTATACAG
AGTAATCTAT
CAACAATGCA
CTTATGTGAT AAGAATAACT
GATATTTTAC
ATGCAACCAG
TCCCATTCAT
GTAAAATTCC
GACTGGCATG
TATATTACCT
TACTAACCAA
ACTATACAAG
TAGCAAAAAG
AGTGTGATGT
CAGCTGATGC AGCTCCCAAT GTTTTTTCGT TTTCAT'rTA ACAAAATAAA TCCTCCTCTC TTTTTATtAT TTG'ITGTCAA CTGTGAATAA TATTATATAG ATATAAAATA GATGCCATTA TAAAAGAGA'r GGTGTTAACT AGAGCCGAAA CTCTCTTTTT CTCTAACACT AAAGTAAGCT GAATGACCAT CCCATCTGCT CACCATAGAT T*P'GAAAAAG CCTAACCACC TCCTGAACCT TAGGAATATT TCTGTTGGTA A'rrGAAAAAT TNTCAGAAAA GAGTGCCACT AAAATACATA CCATAGCGAC GATATTGACA TGTCGCAAAT ACATAGAGTA GGAGCAGTAA ATCCTAGG CTICTTCAG CTCTTATTCA GCTGATTTTT TCTTCTTGTT 'GTTAAGGAT TrGTTTACGC AAACGGATAG ACTCAGGCGT CGTTCAAGAA CTCAAGAGAC TCTTCAAGTG TCAAGATACG TTrGGTCCTT AGTAGCTGAA CGAACGTTGG TCATTrGT'TT TCAAGTCATT TTCACGAGAG TTTTCACCGA TGATCATTCC GGTTGACAAA GATCGTACCA CGTTCTTCGA TAGACATGAT CAGCATCGA'r AGAAACAAGG GCACCACGGT GACGTCCACC GCAAGTATTG GTCGAAGGTA TGGTTCATGA TACCGTAACC CAGTTGAGTA TCCAATCAAA CCACGCGCTG GAACAAGGAA TACCAGTTGA AATCATATCC AACATTCAC CTTTrACGTTC ACCCTTrGGTA 'rTCTrCTGGA GTGTCGATTT GTACACGTTC CGTCGATTTC T'TrTACGATA ACTTCTGGAC GAGATACTTG GCATTGTTTC GATAAGGATT GACAAGTGCA ATrCTCCACG CTGGTGAATC AGTTGGGTCA ACACGAAGGG AAACGTCTGT GTTCTTCCAC CTTACGAGAA GTTACCCATT TACCTTCTTT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 ACCAGCAAAT GGTGAG~wrGT TGACCAAGAA AGTCA'1rrGA AGAGTTGCCT TAGGATTGGA AGAGCTTCTA TTCCATACCT GAAACGGCAA CAAACCAAAG AAACCGAAGA AGAAAGGGTA ACIrGGTCCC ACGTCCAACG AAGTCATTCTG GTTATCTACT GGAGCTGGGA TTCTTGGTCA GCTGGATCAT CACTGGGAALA TCAAGCTGGT CrGCA'rCTGT CGGAGTGATG GrTTCACCGA TCAAGTrCACC CGCTTTGGCT TCTTGGATTT GTIrTGTAAC ACGGAAGTTT TTAGTrGTAC
CATCGATGTG
CAAAGATCTC
CACGACGTTC
CGTCAAGTT'r
CGATACCGAT
GCTCATCTCA
CCATAGTCGC
CAACCTTAAC TGTACCACGG AGTCCAAAAG TGACACT'rGG TATGCTCGAT AATCCTGTCA CTGACPAT'GA AGAACTTCCG
AAGACACGAC
AACTIGCAAAG
AAGATTGGG
TTGATCGCTG AAGCATAAAC CGTCATCTGC ACCAAGCTCG ATGAAAAGTT CCAAGACTTC 0 00 00 0 0 0 00 0 0 000000 0 0000 0 0000 00 *0 0 0 0 *000 0 0000 0000 00 00 ATCCACTACT TCTGCTGGAC GAGCTGATGG GACA.AGGTCT TGTTCCAAGG CTTTTTTCAA ATAGGCATCT ACGACCAAGA CAACACCGTC ACCAAAGTCC GCGTGTCCTG G'TCIGTCCAT GGCAGTATTT 'rTAGCAAGGA TGGTAATTCC AGCAC~CrCTT GCCAATT'CAG TCCGTGCATC AACCAGGGTT G7I'=TACCGT GGTCAACGTG TCTTAArrTT GTCATGA'N'T CCTCTATAAT ATACCATAAT TTCAAATAAA TAACATAACT TTCTTTTCAC GTCAAGCC'rT TrCAAAGCGA GTTTAGATAA 'N'TATTAGCT CAAGAAAAAA TCAGAGGGGA AA'rTCTAGTC GATCG'PTGCC CTTrATCGATT
TACGAAACGA
AACCATTTTC
AATGTTGATA
ACGCTCTTT
AAGCGrTTCT
GGCGATAATC
ATTCAAAATT
CAAGCAAGTG
GCGACTTATG
TCAGCCGAA
CAGCCCGCTC
TTGTTAACAA CCACGATTGG GT'IrGTGGCA TGGTrCCTTC ATGATACGCT CAACTTCTCC CGAGTTCCGT TGTAAGCAAC TCGATATCGT TTGAGTCCA'r GATTGTrTCA ATAATTCC GcAkrGT'rAC GGATA'rCTTC 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480
TAT'PTTCTAA
TAAATGTTTT
ATAAGATAGG
GGCCATGAAG
CCTACCTCAA
AGGCTATGAA
CTGAACGArT
CACTCTGCTT
CACAGTATC
CAAGCACTCC
AATATCGATA
CACACCTATC CAGGACTACA AGAACTCCTT TTATGCTTCA TAAACCTrCCT TCATGGACCT GCTTCCATCT ACCGAGATAC AACGGGACTC TCCATCCCCA ATATCATGTC CTGACCATAT CCAAACCTTT CCGCAAAACT AGAGATTCTA CAGAAGGAAA AT'rTCATCAA TTCAGGATC GAATCATTCA GGTGCCGTTA CAGCCAACAA AGACAAGGA.A CTTCCGACCG AACATCCAQ.T CTGACAAGCT CTATGCCGTr GGCCGACTGG CTCCTCTTGA CCGATAACGG TCCCTTGGGC TT'TCAGCTCC GATAAGACTT ACCAAGTTGA GGTTAATGGA CTTCTAACAC CAAAAAGGAA 'rrGTC7'rT= AGATGACACT TCTGCAAGTC sCTCCCTCAG TCAAGCCTCT ATCAAGAAAA TGTTCCTCTC GGTTGGTGTT
GTCTGTAAAC
ATCACCATT'r
AAGGTGACTA
1232 GCCTCAAAAG AATCCAA?1Tr GGGGACTCA CATTGAACCC AGATTrAGCA GAAGGTAACT ACCGCCCTN' GAACCAAAAA GAGTI'ACAAA TCATTAAAAA CTATTrAGAG ATGAGTCGAT AAAACAAAAA AAGCTTTAAA. ACTAAAGCTT ?TTCTrTTA TTACCGAAA AATrAAGGCG ATTGCTACAA TCCAGTTAAC TACAGAAATC ACAATTCCTA AGATATrAAG AATC'TCT AT1'TTATAGT CTAATTGTGA CrCr TTTGG TATGAAATAG CC-AAGACCAA TCCTATGATA CCCAAAATCA GGCCTACAAT TGGAAATAAC AAACCAAGAA TAATCGACAA GATACCCACA AAAAGTGGAT TrTTCTTCTT TTC=TATG TrrCrAAGAAC TCCTTAAATT TTATACAAAT TAATTATACT ATAAAACAAT ACCTTCATCC TATCATTCGA CTAATTTGGA AATAAGG'rTA GCTAGTCTTC ACTTTCCCTT TCCAAGAATC CAAGCCATAA GAAAGGATAT AAATCTCAGA 0 9q 0 040 *0 0 9 *0
0 S. *t 0 0 AAAACCTTGT T'rrTCAAGT GTAGAGAAGG ACAGGTTrAT AGGAATAI'G CGTGCACCAA AATCAAT'TGA CCCGTACGAA ACGGCGAATA CGAAGATAGT CAAAATCCAA GTAACCATTA TTG'rGCTCTC TGCGAAGAAC GCATCGTAAA TCCGAGATAG ATGTAAACAA TAATACCAAA AGACTCATTC TAGCAAAATT1 T'rTTCCTATC TTCTTTAGCG AACAATTACA TT AA.AGAGCTGC A'rrTGTAACT CGTTGCGCAC GTTGGTTT'rC CTTTACGAAG GGCTGCAAGA CTAGTTTTCA ACTGACTTGA GGATATGTT'? TCTGTGGAAT TCTGCTGGGT CGCGCAAATC TCAAGGCTTC AAACTCCTCA TTG'rCCACAA TTTTAGCCGC TAAAGCCCAT CCACGCCAAC ATTGCTAGTA TAAGTGCCCA GTTCTTTTC'r CCATTTTTCT CAATATAATC CAATTCTACC TGC TTCTGCC TCTAGATAG'r CTAATTTATC CATCAACCCT TTCCAACTTC ATCAGTTCAA TATCATATAA GCGTT'N'CCC TCGTTTGAGG AATTGCTGCA CATCATAGAA TGTTTrTCATA TTGTGTTTI'r T'rCAAGAAGA GACTCACACA ATGCTCCTTA ATTCTAAGGC AAGI'A'GGTA CAATAAAAAC ATGGGGATTC 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4692 0* Sb S S .9 *0 S
INFORMATION FOR SEQ ID NO: 221:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 706 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 221:
GCTAAAAAGC TGATAATCTT CGACTCCTGT ATATGATGTG TCTTTTCATG TAAGACACC GCCGCCAGAA TCATGGCAAG AGCTGCAAGA CTGGCAAGTA AGAAGCCGAT AAGATAGGCA AAAAGATAAG TGAATTTGAC AAAGAAAGTC AAAAGAACTA GGAAACCAAA GCCTCCTCCA 180
AAAACTACCA
GTCTGACGAA
GrrAAACTGG
GAGAAGAAAC
CCATT-rGAT
TGAACCTCAA
TAACTCCCAC
TGGCCACCAA
AAGACCATCA
AAGTCTTTCG
CGCCTACAAT
AGAAATAAAC
TGATCAAGAA
TAGCTTGTTT
CTGCCAAACT
CTGTTAGGAG
TTCCAGCAAC
AAACCACAGG
TAAATCCCAG ATTTrATCCA ACTGCT1 A~GCTAAC ATACTrCCTA AAAAGA; AGAGGAATAA GAGGTCACTA GAAAACI GGCAACAGCA GATAAGAGAA AGACCA1 GGATAAGAAC CAAACTGCCA ATCCCC CCAAN'ATGA ACA.AACAAAT GAG4GAAI TTCATAGAAG TTGGrCATAA AGCCTA; ATAGTCTTGG CGAACCAAGA AAGTAA; TGGCACAATC 'rCGATAAAAG CGTCTT
~GAC
TGG
~ACC
CCc kAG
~GAA
GAGGGAAGTC
ATAGACATGA
AATAAACATG
CrTCAACTGA
AATATAGTAG
C'TGAGATTCA
~GAC GCCCGCAATC CCG CATGGTCACC INFORMATION FOR SEQ ID NO: 222: SEQUENCE CHARACTERISTICS: LENGTH: 3236 base pairs TYPE: nucleic acid STPANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 222: CAGCTGATGG GCAATATCAG TCATAGAAAT GTTGATGATA CGAGGGATI' GGTGATTT=T TTTGAACAG TGATAGCACT TGAAACGGCG AGTTGTTTCG AGGTAAGGGA TCTTAGACGG TCCACAATCA GGGCAAGATG GAGCCTCATA ATCCATATTG ATGATATCTA GAATCTTGAT GATAAAATGT AATTGTTCCA TATGATTCTT TAGGTCATAT GGGACTTTTT TTCTACACAA TACCCACTAC AAATAT'rATA GAGCCCGAAA TTGTCTATAG AAGAATAATA AAGATTATCT TTGTGCAAGT TGCACAGAAC TTGTTTATTT CGTTTTCATA TATAATATAA TTATCAAAAG AATCTTAAAA ATATTTACAA A.AAATATCCA AACTTGAACA TCAAAGATAA AGAATTTATC TTT~mCAATT AACTTr'TCAG CAATTT-'TG CTTTACCAGG GGAGTCTCAG CAACCATCAT TTTTCTAAGG AGAATTCTAG AAGGCATACC TrTTTTGAAAG TCATATTTCT TCATTAGACT ATCCAGCTTA GCGATAATTT CTTTGTCGGT GTTTGGGTCT TTAATATCGA GCAGTTTTGT TCTAATGAGT TGTTrTGTCG CTTT'rCATTA AAATA.AGCTC CATAATATCC ATAGGGGATT ATATGGGAAA ACTGATCCTT GTTTCTGCTT TCTTCAAATT CTCCGATATT CTCTAAAGTT TTTTGGTCAT CTTGCCATAG ACAAAAGGAG TTCACCTCAT AACAGCGAAC ACTATTCAGT GTTTTCGTAG GACCTTCAGG
AAATATAAAG
GGTAGAATTG
TGAAGATTTC
ATGTrGGTAAA 1234 TCAACTACAC TCCGTATGAT TGCTGGTCTT GAAGACA'rTA CAGAAGGTAC TGCATCTATC GATCGCGTAG TTGTCAACGA CGTAGCTCCA AAAGACCGTG ATATCGCCAT CGTATTCCAA AACTACGCTC T'rrACCCACA CATGACTGTT TATGACAACA 7GGCTrTcCGG 7rTGAAATTG CGTAAATACA GCAAAGAAGA CATTAACAAA CGTGTTCAAG AAGCAGCTGA AA'rACTTGGA TTGAAACAAT TCTTrGGAACG TAAACCAGC'r GACCTTTCAG GTGGTCAACG CCATGGGGC GTGCGATTGT CCCTGATGCG AAACTATTCT TGATGGACGA AACTTGGATG CCAAACTTCG ATCGCAGCTA CALACTATCTA CGTATCGTTA TTATGTCAC GAACAAATCG GTACTCCTCA TrCATCGGAA GCCCACTAT
TGTATCAATC
TGTAACTCAC
TACTAAGAAC
AGAAGTTTAC
GAACTTCATC
C01'GCTGAAA TCGCTAAAAT GACCAAACAG AAGCGATGAC CCTGCTGGTA CAGGTACTAT
TCAACGTGTT
ACCTTTGTCA
TCACCGTCGT
ACTTGCAGAC
CGGACGTTrA 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 AAAAA'rCCAG TTAACAAATT CGTTGCAGGA ACCGTGAAAT TGGTTGGTAG CGAAATGTT TCTGACGGTT TCCGTTTGAA AGTGCCAGAA GGAGCAT'rGA AAGTTCTTCG TGAAAAAGC TACGAAGGAA AAGAATTGAT CTTTGGTATC CGTCCAGAAG ACGTGAATGC AGAACCTGCT *0 'rTCCTTGA.AA CATTCCCAGA GGTTCAGAAT CTrCACCTTTA GCTCGTCACT ACTTGCAAAC CACTTCTTCG ATGTAGAAAC CTGTGTTGTA AAAGCGACTA CTGTCAAGTT GGTAAAGACG AGGTGCAACA GTTGAGC'rTG TGAAAAAACA ATCTACTAAA TATCAATTGT AGTGGAGAGA 'TTTTTGCTAG AGAAGTCATA TCTC'rGTATC AGAACTGCT'r AGT'rTGTTGC AAAAGTTGAT GATTTGACTT GAACAAAGCA ATAAATAAAA TTCAAAGCAC TATCAGT'rAA TCTAGGGAGA TTATGCATCT ATATTGTGAT TACAAGAAAA GATATCTCTT GAAACAAAAT G CTCTCC
GCTCTTTAAT
TGATTACTGA
GCTAGTrrCC
AGTGAAGCTA
AATTTATC
TTAGAATTG'r .'r.TCAGAAGC ACTCTTCGAA AATCTCTTCA AACCACGTCA ACGTCGCCTT GCCGTACGTA TTTCGTCAGT 'rTTA'rCTGCA ACCTCAAAGA TGTACTTTGA GCAGCTTACG TAGTTTGCTC ?TTTGATT'rCC TATGCGTAAA CTACGTGAGC ATAGTTATGG TCACTrGTAT CAGATAATAT CATTTTGTGT AGGTCATAAG ?r'rTTAGCAA A7TrOACTAT AA'rTGAAT'rC GCTTT'rCATT
TATAATGAAG
AA'TTGCGGAA
ATTTGTGGGT ACCATCTACA GAACTAGAGA GCTAATAATA CTAGTTTATC AAATAATAGA AAAAAACAGA GGTGTTCAAA AAAACGCTTA CGTCCAGGTG 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 CAAAGCGTCC CACAGATTGG TTAATTGCAG AACGAGGATT TTCAAAAGAA AAGAGAATAC TAGAGGTTGC GTGTAATAGG GGAACTACAG CAATTGAGTT GGCACACCT TT'rGcr'GCA AGATAACTGC TGTTGATATG GATGCTCAAG CTTTAGAAGT GGCTAAAAAA TCTGCTGGAA CGGCAGGTGT TGCTCATTTA ATCAG~rTG AAAGAGCAAA TGCAATGAAA CTTCCTTATC
AAGATGCTAG
CTAAGAAAAA
CACATGATGT
TTCATGrAAA
GTTATTGTGA
TTTATGACGA
ATAGAAAGCA
1235 T'IrTrGA'rATT GTTATAAATG AAGCTATGCT GACTATGCAA GCCCATCAAG ATGTGTAATG GAATATCTAA GGGTATTAAA ACCTGGAGGr CTTCTCTTGA GCT'TCTTAAG GAAGC'TAAAG AGTCTATCAG ACAGGAATTA TCACAAGCAA TGTAGGTCCT TTAACTCAAG ATGGTTGGGA ACACGTGATG ATAGAATCAG TGTrGAAAGCA TTGACTGGT G AAATGACATT AATGAAATTA TCGGGTATGA AGGTTTGCTA GGAACTTTGA AAATTTGTGT AAATGCTTGT AAAAAGGAGA GTTTTTA.ACT ATGTATAAAA TGTTNGCTAA GAATAAACAG AAATrGGGCr GGCTAGTAT AAATCGTCAA AACGTTAGAT AATTIATTGAA GTTAACTT=1 TTCTTAAAAA ATATGCTATA ATAGAGAGTA A.AAAACTTTG AAAGAAAGAA
TTATTGCGAT
CCT'rTTCT AAAGATGAAT TrAAAAGATT ACATTGCAAC AATTGAAAAT TATCCAAACG GTACCG
INFORMATION FOR SEQ ID NO: 223:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2885 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 223:
CCTGACTTTT CAAATTGGTT CTATTACTTC TCAAAGGGCG TCTAAGTTAA TITCATT-GTCA T'rGATGCCAA ACGCCTTAAA CGCTCATTGT AAACTATCTT TCACGATTGT TTAGCACGTC ACAGCATTTT TrGCTCCTA.A CTAGCTTCAA ACTCTGTAAG TTGATCATTT CGGTTAGGCT CCCTCATTTA TACCTTTGGC CCTGTGATA.A GCGTATTTCC AATTCTAAAA ATCGCTN'TC TTAAACTCAT CTGCTACATA
AGTTTGCCAC
ATTTCTCACC
CATATTACTC
ACTTGGTTTA TATGGTCGTG GAAAGCATGG
CCATGAAAAG
TTTAACTGAT
TGTCTATTTT TGTTTAGGTT TGAGTGAGTA CCGCTrATAT TTAGAATACG GCTATAATTC TGGTATTAAA AATGGTATTT AGTGTrACCC TCAAGTCCTT AAGCTCATCA CTATCTAGGT AAAGAGTAAA TCCTGCTCCC AGTCACTCTT AGGCT'rAATA ATCATCAATA ATTAAGTAAT CAACAGACTT CATGAGTTCA
TGTTGCACCT
TACAAAAAGC
AATAGCAACT
TTACCATAAT
ACACTCTTAG
GATAAAAGTG
TCCACCCCTC TTTAATTTGT GTTCTCCTTT TGTCTTATAC CCTCATGCCA TCAAGATATr TTrCTGATGTT ACAGCATTAA T'rTTTCCA.AT
TTTGTACCTG
AATCATCAAA
CCCTGTACCT
ACCT'N'TGCA
AGTTAGT
TTGCATATAT GCTCTTATTG CTCATCAACA CATTATAAGT AGTTTAGCAT TCAAATTATC TGGCA -rGTT CACAATAGGG TTCCACCCCA CTACTACATG GGACACAATT CGTCTAGT ATTTTCTGAC CTCCTAAAAT GCTCTACTTG AT'rCGGTTTGC GTI'CCACCGT CAATAAATTG AGTGCTTT ACCCTGACTG AATCTTCTAG CATATACGC CAGCTCGC'TT CATATTCAAC AATCAGAAAT CAGTTGAAAA CAAAT'rCTGC TTGTGTCATC CTCTATAAGA CAACTTTAT T'TTrAAGGTT ACAGTAGCCT CTCTGCTTCG TCTCAAAAA GTCCATTTCA AAAATACCAG GA7=T~ATGG CTTTCATTCA
AGCAATCGCA
TGGGATACAG
'rTCTcrCT
AAATCTCTCA
CGT=C1T
TGTTTTTAG
TCCTGTTTCC
ACAGCCTC?1' 1236 TC7"rCTTCAT CGAATTcTT TTGAT'rTCTG
ATATTTCCTA
GTGTTGGAT
CCTCACGCGC
AACCG=TAAG
TTAACGCCTC
CTTGC7rM CTGTCTTCT TTATTGCC'rC TCCGTTCTCA TTAGCTGTAT TTCATGCTTA AACTAGATTG TAATGATT'rC CCAATCI-rCA TAGCTGGTAG TGCCCTGCTA T'rTC'rAACAA GArTACCTTG ATGTATGCAA ATGGATAAGC TCTGGCCTAA AATTCITGAA TCTGTAACGG TGACA.WrGCT TTACCTGTCT AAGTCGTCTA TrCCACACT GGTTACTTT TTATTTACAA ATG7TrTGGAC
TTGTCTAAAT
TTGTTCG
CTAAATACTC
AGTGCTTAGC
GGGCGAAAAA
ATTCAGGAAG
rTrGTAGCTG
CTAGTGTCAT
AGTTCCATrr
CTTTTCAGCA
TTGGCGTTTA
CATTCCCGTA
TTCAATCCAA
AGAAAGGArr TTTGATGGAT T-rTGTCTGAC 'N'TCCAATAA GATTrCAGCC ATTACCTCAT TGCATTGCCT CCTCAAACTT TTAGAGTTAA AAAGAATATC TGCTCTATAT ACGCCTGTTG A.AGAATGCTT TTCGCA'rACC GTGCTTTTAG AGACCGCTTC AGTAAACGGG ACAACTCATC CTTAGGCCAT 7TTTCGCT GTTAAATTCT CATATGTTGT TTTTTAGCTT ATTTTT~rAC CAATTCAATC ATAGCTATTG AGTCAAGCTA TCGATACCGT GTCTGTTACT GCCTTTAGTA AGGCCATGTT TCTATACTGT CAGACGAAIT TCAGAAAGTT 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 AATTTCTCGC ATTTCTGCGC TAATTCTGTC TATACGTCTA TT-C'rGTCATG TTTTTACCTC TGTTTCTTTG 7TGGTGTGAT TTCTAAACAT CATTGTCTTA ATTTCCTGAT AACTCATTTT CCATATCCTC AAATGCCTGG TACTGCTCCA ACTCCTCACT TATAGCCCCC ACGCTCTTCT CTTAACTGCT TAGCGT'rCAT GCAAGTTCTT CATGGTGCTA TGCcGcTrCT ?TGGGCATT CATGCAAGGT TrrTCTTTTC GGTr'=CTA GCGCCCTCTG
CCTCACGCAT
TATTCTTTAA
TAGGTTGTCC
AGTCAGCCTT
CAGCGATGAT
'rTCAAAGAAT GCTTTGACTA GTTAGTTT GAATTGCCGT ACTGTTTCGG ATAAGTGATC AGAAAAGTAG CCTGT1GCTC GTTCAGAATA TAGGATTTTT TCTAGTATCT AATTTATGGA TNTTAA.ATCC AAGTATTCCC AACTCTTCAA ATTTTCTCTT A'rTAAGCGCG TGATAGTGTG GTGTTGTACT TCAGCACATT CTCGCTTGTG GTGTACGGCT CT"TCrTACC GTCCATGTAA ACTAGTTCCA TrACCGTTCT ACCTCCTGTA TAAATC7 GGT TAGCTTACTT TTTAATTGCC TCCTCTAGCC TCTTTTAG CCTCrAAAAC GGCTT TGGC T AGTGGTTAAT ATrA'rPTACC ACTTGTCTCT ATAAACG'rGT TAGAGGCCTT TATAACGACF TGTATCGCTG TATCGATATC CTCCGTGGAA TAGTAGA?'rr ATTTTCTAAT ATCAT'rCAAG ACTrGTrTAA CC CAT-rCT GAAAGAAATA AAATTACATC TTCTTTATCC TTGGCATCTG C~rTGTCTGA GACAAATTAG AATGTCAATA
CTTGG
2640 2700 2760 2820 2880 2885
INFORMATION FOR SEQ ID NO: 224:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3144 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 224:
TATCAATCCT TTCCCATTAT AGGAGCAACA ATGTATTTT-T ACGAGTCAGT ATCTTGGGAT TACTTGTTAA AACTGGGATA ATTTTCGACT ATTAGAATTG TCAAAACAAT CCGTCTAGGC AGAAGGAAAA ATGTCAAACT TTTATATTGC AACGTCCTGA GGTTTTATCA CCTGCAGGGA ATGGAGCAGA TGCTGTCTTT ATCGGTGGTC ACNTTACTTT CGAACAGATG GAAGAAGGCG TCTATGTAGC GGCTAATATG GTTATGCACG TCCGTAAACT GCGTGATATC GGGATTGCAG TGATTGCAGT GACTGAAGCA CCAGGCCTTG CTAACTATGA AACCCTTGAG TTCTGGAAAG GTGAGGTTTC AATGGAAGAA TTAGCTGAGA CCTTTGTCCA TGGAGCTATG TGTATTTCAT TGAGTATGCG TGATGCCAAC CGTGGTGGAT TTTACGATAT GCCATTTGGG AAAGAACGTA TTTCAATGTC AGCCGTTGAy ATGTCTATGA GAGTGGGAGT AGTCATCTAA GGACTAATTT ACTGGTTTTT ACTTTTCTAG ACTTTNrGAC
GTTTAACAGT
TTGATTTTAT
TATTATGCAA AGTC-TAAAAG CCTTTATTI'A CTATAAAATG AXATAGGAGA AATCATGACA AAAACATTAA CTTTAGAGAA GCTA.AAGGTA GCTGTTCAGT AGGCCTATGG TCTTCGTAGC CGTGCGGGAA TGCAGTTTGC GGCCAAGTAT GGTGCCAAGG AAGGAAATGA AGCTGGTGCT GGTGAGTGGT CAGTTATCGT ATCTGACCCA GCCTTGATTA AAATCCACCT TTCTACCCAA GCCAGTGCCA AGCTAGGCTT GACTCGTGTC GTTTTAGCGC TCCGCAAACG TACAGATGTT GAA.A'NGAAG ACTCTGGACG TTGTACTCTr TCAAACCACA GTTCTCAGTC ATGCCGTTGG AAATACGACC AGAGTTTGCA GGGTGAGATT CCAGAAGAAT TTGACCACAT TCcAGATATG ATTGAAAATG 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020
CTGTGGACAG
CCAACTGCTA
TCAAACAAGA
7"MACTATGG
AGTACAAGTT
GTCAACGAAA
TCTAAAAATC
CAAGGCGCCT
CTrGGTGGAC
TACACCATCT
TGTCGCIAA
CGC'rATT-AAC 1238 GAAGGACGTA TGrAGCTrAT GTG4GATCCCT ATCTTGAAAG GACATGTGGA AGG?'rGCCCA TCACTAYGTA TCAACACTAA TCCTGAAAAG TTTGA6AGCTA ACGTGAACTG GC'TACAGGAT rGCrICCTCGT AAAATCCCTG GGCACAAACA GCAACTATTC TTATGG'rCCA GGTTTCCGTC ATTGAAAC CrATATTOAA CAAATCCAAT GGAACTAT'rG TTCGAGCTCT TAAAGAGGGG TTCGTGCrrA ATGTAGTTGT TAAACGAGAT TAAAGAATGG
GAAAATGAGC
GTrGGTCTT
GAAGGGGACC
GATTGCATG
ACTATTAAAG
C'rTATCAATC
TTAGTTTTAA
CGAAATCCC'r AGTTGrTrGG
ATGATGATC
AAGTTGAGT'r ATGCTAAAGG CA6ATAAAATC GACCGCGCTC TCCCACAACC TGTTCAATCA GGAGACATGG TrrATAAGGA AGATGGAACC AGCGTCACAG AAAACTATGC AAAGCTCCAT ATACAACACT T1GATCCGCAA GAGATTAGCT GTCTT=A 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 .6 9**9 'INTTTAAGT GATAAAGTCG GAGTTTAGGC ATCAAAGCCT ATCAAATTAA ACAAAGAAGC GATGTCTTAG ATATTTTGAA AAAAATTAAT AAGCAGAAAA CTC'rCTATTA 'rTTTGTITGTA GAGAGT'N' 'GTTA.ATAAA ATTTCACAAA ATGACATTTA TATATTGCAT TAAGTTAGAT ATATGATATA ATAT1'GTTAA AAAGAGGCGC AACNTTTTAA AATTAATGAG AATCAAAGAG AAAACCAATA ATA'rTAATGG AGGA6ATAAAA AATGTAAGTA AGCAT'rATGG TCATTCAATC ATTCTCAAAG ATATAAATTT TGCACTTAAC AAGGGTGAA.A TTGTrGGTCT ACCAGGGACA AATGGAGTTG GTAAGAGTAC GTTGATGAAA ATTC=GTTC AGAATAATCA ACCGACT'rCA 2100 GGTAATATTA TAAGCAGTGA TAATGTTGG TATTTAATCG AAGAACCAAA ATTATTTTA 2160 TCTAAAACAG GTTTAGAGAA TTTAAA-ATAT rrGTCAAATT TATATCGTGT TGACTACAAT 2220 CAAGAAAGAT TTAGATGTTT GATCCAAGAG TTAGATTGA CTrCAGTCTAT TAATAAAAAA 2280 GTA.AAGACCT ATTCTTTGGG TACAAAACAA AAATTAGCTT TGCTTCTAAC TCTCGTTACG GAACCTGATA TATTGA'TTT AGATGA.ACCG ACTAATGGTT TAGATATTGA ATCATCACAA
ATAG'='TAG
AGTCATAAAT
C7TTTTGACAT
TCATCAGCTA
GAAGAGGGAT
CGGTTCTAAA AA.AATTAGCT TTACATGAAA ATGTGGGAAT TTAPATATCG TAGAAGACAT TGAAGAAATT TG~rGAGAGAG 7rCT'rTTCTT GGAGAACGGG 'rrCAAAAAGT AGGAAAAGAT AGTC-ATAATT TCTTTGA GATAGCTTT- CAGATACAGA CATN'rCATT ACCAAACAAG AATTTTGGGA TATTrGTTTAG 'rGAGAATTAC TATGTCTGGG AATATTCAAA ATAGTGAGCT T=rAAATrT 2340 2400 2460 2520 2580 2640 2700 2760 2820 TTTAACGAAA ACTCTATTAA AGTAGTTGAT TTGAAACTA AAAAAGAGAC GCTTAAAGAT AT'rTACCTAA ATCG~rCAAA ATAAAGGAAG GTTATAATCA rGAAATTAAA TAAACAGAAG 1239 AATCGGATGA 1-1rACGTCTr GTCTAArl-TT C'TATATGCTA TCTCAGTTTC CATrATTTAT GCT7NTGAATG GCATTGTGTT ACTAGTCATA GrAAGTAAAT TGGGTATTCC AGGTGATTTA GGATTAAATT TTATAGTAGC TATTGTAGTC AATACAA'PTT TGTTAGTCCT GPTTAT'I-I- CTATrATCTT ACATCTA TTTATACAAA TTGAAAAGTG GCTTGGTATW TGGTATTTI'A GTAGCTTrAC TACTCTrTAT CTCTAATATA T1TAAATACGA TGATCATGAA TACTAGTAAT- GATrTGTI'TA TCAAAGCAAT TGAA INFORMATION FOR SEQ ID NO: 225: SEQUENCE CHARACTERISTICS: LENGTH: 3766 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 2880 2940 3000 3060 3120 3144 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 225: TACGGTATTA TTTTTAAGGA GAAAGAATCA TGAAAATCAA AAAATGGC1TT GGTCTAGCAG 44.44.
4 0 *4 4 9 @64 4 4. 6* 4 0 4 4 CCCTTGCTAC AGTCGCAGGT ACAATGCAAC AACTATCAAA GGGACAAAAT CCAAGAATTG CAGACTACTC ACAACCAAAC AACACTATAA CTTCTTGAAC CAGATACTTA CATCTCTCCA ACACTAAAGT AGAAGACATC ACGAAAGCCG TGCGCTTTAT GAACTGCTCT TGCAACAGTT AATTGGACGC TAGCCAAACA ATACCTTCGT TACAGAAGCA ATGAAAAC2'C AAAACAATGG TTGGCTCTTG CAGCrTGCGG AAACTCAGAA AAGAAAGCAG ATCGCAAC'rG TTAACCGTAG CGGTTCTGAA GAAAAACGTT GTTAAAAAAG ACGGAATTAC CT'rGGAATTT ACAGAGTTCA AAAGCAACTG CTGATGGCGA AGTAGATTTG AACGCTTTCC AACTGGAACA AAGAAAACGG AAAAGACCTr GTAGCGATITG ATCCGCCTTT ACTCAGGTTT GAATGGAAGT GCCAACAAGT CCAGCAAACG GAGAAATCGC TGTACCGAAT GACGCTACAA TTGCTTCAAT CAGCTGGCTT GATTAAATTG GATGTTTCTG GCCA.ACATCA AAGAAAATCC AAAGAACTTG AAAATCACTG GCTCGTTCAT TGTCATCAGT TGACGCTGCC GTTGTAAACA AAATTrGGACT ACAAGAAATC ACTTTTCAAA GAACAAGCTG TACA.ACATCA TTGTTGCA.AA AAAAGATTGG GAAACATCAC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 4 444t
9 04*9 4* 44 9. 4 4 CTAAGGCTGA TGCTATCAAG AAAGTAATCG CAGCTTACCA CACAGATGAC GTGAAAAAAG TTATCGAAGA ATCATCAGAT GGTTTGGATC AACCAGTTTGCGTAATAAGAA ACAGGGAGGT GGGAGAGAAA ATTCCACCTC TTGCTTTTGT ATAGAGTATA GATTGTAAAG AAGACTATTC GTTCATAGAA AGGTAGAGAG AATATGGTTT TTCCTAGCGA ACAAGAACAG ATTGAAAAAT
TTGAAAAGGA
AATCAGCTT
TCAAGCGTGT
CACATTTCAA
CTGTGCCAGC
ATGGCTTCAT
GTGCT-rTGAG
TGGAGGGAGC
ACAAACTCCG
AGCTG4GAAAT
CTGATGTGGA
TCCAAGCCTT
AAGAAGTACA
ACCCAGAGGA
TGGCCTTTCT
TCATGTAGCC CAGCATTATT TGCCCAGCAG GTTGGACTCA TGGAGCTGAA GTGGAGATTG GAGTrCGCGT CCAGATGCCA GGATGGGGAT CAGGTCTGGA GTATGGGCGT GGGGT'rGATG AAAATATATG CAGCACCATG GGAGGAATCG GCTTCAACAG TGGGGCGGAT T74GTTGGTCT TTCTGGTGGC AATAAGGGGA TATCCACTCG AGTTATGGTG ACAGTCTCTT CGTGCTGCGG AGAGCCCAAT GAACGAGAAA AGTTAGTCGC ATTTATGGAT AAAACGTTTC TTrTTTCGATC 1240 TTGAGG7N'T GCGTACCTTG ATNCAAGA AGGAAGTCGC AAATTATCTG GGTGAGATTT ATGAGAGCTA TACAGCGCCC T'rrGTCA'rGG AGACCTTGAT TrCTATAAC CACTATGACA CAGAGGATCC krr'rACGCTT TCGGTCCGCA ACGACAAGGG TCATATCACA GCTCGCTTGA ATGATrrACC TGTCAA'rATC AGCTTTATCA ACCTAGATAA GTATTTGGAA A.AGCATGCAG GGGAACAAGG GACCAAAAAT GCCTTGGAAC TTGTGACCTT TGATGCCAAG GTAAAAAGCG GTGTTGTGGA ATCAGCTCCT 'rGG'TATCTCC ATGGCCGTAT CTTGCTTGAA GGCTTGTACG TGGCCTTGCT AGAAACTTAT GGTCAACGAA 'rGGAGrFGCC TCTCTTACAG GAGGAGCGCA
CAGCGCTTAA
GTTATCAAGG TCAGGGTGTT TTCGTCTGGT TCCGGGCCTA ACAAAAATGG CTTTGATAAG GCGATATGAG CGCACCACCC AGGGCGTTI'C AGTCTTGCCG CCCTAGAGGT ACCAATGGTT GAGATGAAAA TGTGCGAA'rC TTAGAAGCTA TGAGTAGAGA AAGAAGAGAA CCATCACAC TACCGAATCG TTGCATATTC TTrGCAAAAAC CATCTGCAGG GTGACCTTGA CGGCAGAGCA CATTTTAACC TGATGAGCCA TCTGAACTCA GCAAGGAAGA T'rGGCAGATC GTGCTGAAAA AAGACTATTT TACCTGCAGA GAACCGCATG ATGTTCTGGA GTAGAATTAT ACTATACCTT ATTCTCAATG TGATCGAGTT ACGACAGCGG GGACAGGACC GCATTCGGTC 'rAGGAAATGC GCTGATTATT ACACCCATAT TATTATCAAG TTAGATCAGA GGTTAAGGAT GTGACCATTC TGGAGCAGGA AAATCAACCC GAAAATTACC ATTGACGACG GTTGCGTCGT AAACGTCAAG AAAGACAGCA GAGGAGAATG AAAGAAGGCT AAAGTAGCTA CTACCCTTCA CAACTATCTG TATCGAAGGA ATCCAGTCTG AGCCAGTGCC AAGCTAGAGG AAAAATTCCG AAACAGCTAG GGGAGAGATG AGCTATCGAA GGCCAAGAAA TTCTATCCAC TATGCATACG GTCTTTGATG CAATAGCCGA GACCACGGTG CGAATTAGTA GAGGAGCTGA TCGATGTGAC Tr'rTCACCA.\' ACATCCAAGA AGGGGATATC TTGTACGGGT GATTAATCTC 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820
ATGTGATT
ATATCGGAAT
TAGCCTTTGC
AGTTGTTGGA
GAGGGCAAAA
TGACGGCAAG
GATTTTCCAG
CCTTAAACAC
CTTGGTTGGT
ACAGCGTGTG
1241 GCAATTGCGC GTGCCTTGGC GCCCTTGATC CGAAGACAAC TTAGGCTTGA CTGTI'GTCTT CGTGTTGCAG TTATGCAGGA TCAAACCCTA AACAACCTTT GCCATGGTCA AAATCGAGAA GTGCAACTCA AGTACGCTGG CATTACCAAG TAATGGCTA.A GTTGGAGAAT TGGTGGTGGT
CAATGATCCA
CAAGCAGATT
GATTACGCAT
TGGGCATTTG
GACTCAAGAC
GCAAGAAATC
AGCTTCAACA
TATTCTCTAT
TT'rGTCAGGT
AAAATCTTGA
TTGGCCTTGT
GAAATGCAGA
ATIGAAGAGG
TTTATCTCAA
GTGGAACACT
GACGAGCCAC
GGGAATATCG
TTTCAGACGA
TGCAAGAT?1'
TTGTCAAAGA
GTAGTGTGCT
CAGCTACAGG
TCTCTCAAAA
GTCAACTrCT
GAACCAAAAA
CATTGCCAAC
TGAAATCT'rC
TATTGACGAA
CAGTCTCTTG
TTTTGAATGA ATTGTACAAG AAATTCTCGA TGGTACTCCT 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3766
GCCATTCGTC
ATTGATTCAA
GGGAACGGCT
CTTGGGGCTA
TAAAGTCGTA
AAGCAGGTGT ACAACTAAAA ACCTATTTAC CAAATGTCTA
ATCTACTTAA
GTGGCAGGTC
TITCTGGATTT'
CTCTTTATAT
TCTTTCTCGT
TAGACAAAAT
CACCACTTTC
TTTCTTTTGC
GAAAAAGCAG CGT'rGGCAGG GTATTGAAGG GAGTACAGTA TAAGATGGGT TCGGCTGGTC GACAGTTCTT TCCTTCATTA CTTGACAGCG CCAGGTGGTG TACCTCAATT T'1-CGTGCGG TCACTTGATT GTTAAAACAA 'rGCCCAAGAA
AGATGGAATC
AGGCAGGCTG
TCGGAGGCTT
TCTTGGAGAA
T'rCCCTTrAT
GTATCGGGCC
CATCCT CTT G GCAATCT =T AAATGCAGCC C'rTGTCCCAC AGTCTr'rGCC TTCTGG
INFORMATION FOR SEQ ID NO: 226:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2520 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 226:
TGTTGCTGAG TTA.ATCGGTA CGTTCATGTT TGTATTCGTC GCGACAGGAG CTGTTGTTTT TGGAAATGGT CTTGATGGCC TTGGTCACCT TGGAATCGCC TTTGCCCTG GTPTGGCAAT CGTGGTCGCA GCCTACTCAA TCGGAACTGT TTCAGGTGCT CACTTCAACC CGGCTGTTTC CATTGCTATG TTTGTAAACA AACGTTTGTC ATCTTCAGAA CTTGTAAACT ACATCCTTGG TCAGGTTGTr GGAGCTTTCA TCGCTTCTGG CGCTGTCTTC TTCCTCTTGG CTAACTCAGG TATCTCAACT GCTAGTCTTG GTGAAAATGC CTTGGCAAAC GGTGTCACTG TCTTTGGTGG TTTCTTGTTT GAAGTCATCG CAACTTTCTT GTTTGTATTG GTTATCATGA CTGTGACTTC AGAAAGCAAG GGCAATGGCG
GAITCTTGTC
AGCTGTCTTG
CCTGIGTGGAG
GGATTGAAGA
GTAGGCGGCG
7TCTTGCAGC 1242 CGA7'rGCTGG ?rGGTAATC GGTTTGTCAT TGATGGCGAT 480 TTACTGGACT TTCAGTAAAC CCAGCTCCTA GC7TGGCACC 540 CASCCTTCAA CAAGN'GGA TT'N'rCATCCT 'rGCACCAATC 600 CCrGTTGCA AAAA.ATTTCC 'rGGAACAGA AGAATAATTG 660 CT~CATCTTCGA GGAACAGGGC TTTwrCGTAT GATACTCTTC 720 TCAGCTTCAT CTTGCCGTAG TATGGTTACT GACTTCGTCA 780 ACAGTGTN GATCTGACrr CGTCAGTTCT ATCIGCAACC 840 r.A=TCGTCA GTTCTATCTC CAACCTCAAA ACAGTGTTTT 900 AAACTCAAAA AGCCTTGCTC GAAAATCTCT TCAAACCACG GTTCTATCCA CAACCTCAAA TCAAALACAGT GTTTTAAGCT AAGCTGACTT CGTCAGTTCT ATCTGCAACC TCAAAACAGT 0.*
GTTCTATCTG
TCAAAACAGT
GATCTGACTT
AACTTCCTAG
CTGGATAAAG
GCTTACCGTA
AGACTTCTAA
AGTTTTCGTG
ACTCGACACA
CAGTGATTTG
CGATTTC'rCT
CTTCGACTGT
CACTAAAGAC
TC-AGAGCACG
GCGATGGCAT
GGACAAATCG
CTATTCTAGT
CAACCTCAAA ACAGTTT AAGCTGACTT GTTTTGATCT GACTTCGTCA GrrCATCCA
GTTTTAAGCT
CGTCAGTTCT
CAACCTCAAA
CGTCAGTTCT ATCCACAACC TCAAAACAGT GCT'rTGAGCA ACcTGCGGCT TTGCTCTrTr GAT'NTCATT GAGTA'rGACT TTAGCGGTTG TCAATTTTCT GAC?'rCG'rCA
ATCCACAACC
ACAGTG?!1TT
GTCGTGTTCG
GTI'GTAGTAG
ATAGCGAGGG
GAAATCTCCG
GAGCTTGTCA
TCCCAGCAGA
ACTAATTTGA
TTCATAGTGT
GACCAAGGCT
CAAAAAGCTC
ATCCCAAACA
ATCAGGACAG
T-rCAATCTAC AAGAGGCGTT CTTCTGCCA GGGTCGATTG AAATGCCACC TCTAGCAAGT TGACCAAGTC TGGT'rTCGGT AGCTAAATAG GGAATGTAGG AAATATCAAT GACATATCGA GGATATGGTG ATTTCGAGGT GATGACGTTG TGCA'rGACCC AGAAAAGGCA AATTGACGTT 'rCATAGTACT CCATTAGGGA GCTAAAAAAT GCCCTCATAC TTAGTTCCTT GCGCCGAGTG AATr'rTCCCC TTTCCCGATG GTGTTGATAC ATATAGTTTG AGGGA'r~rTG CGTCGCAAAG TCTGGCTGAG ACGAATGCCC TGrTCCTTAG GCCGTAGGCA AAGGTGACAG GGTTIG'r'GAA 'rCTTGACCAC CCTTCCAAAA TGGGAAATGT ACCAAATCGA GGITT=A AGTAAAATGA AATAAGAACA GTTPTAGAAG TAGAGCGTIA GAAAAATGGC AAAGGGCAAG TAGCTCAATT CAGCAATGAT TTGACAACTT GGCCGTCTT CGAGCTGTGT TTGGAT=~C AGTTCTTCAG ACAATTTGTC 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220
TCGTAATATT
TCAAATCGAT
TATAGTCTAG
TACTT'GGTCT
TGTGAACACC
CTACTTATAT
TTCTAACPLAT
CATATNT
CTCCTTTGAT
TCCAACTT
AAAAAAGAGA CCAAAGAAAG GGCCTTGATT TGTTCTGCTG TTrTGAAGAGA AGAGTTGGAA ATCAACGTCC A71'=TAACGA TAGACATCAT TCCAAAAGCA TNTCAAGAC ATCTTCTGAA 1243 CAAGA'rTGGA CCTTGCATAC GACATGGACC ACACCAAGTT GCCCAGAAGT CTACTAAGAC CAAACCGTCT ?lrGTTTCTT GTTCGA.ATGT 'rGCATC TGTA ATTGCTTTTG CCAT'rGTA'TT TCTCCTTTTT TrAGTTATAT TGGCTTAAAT CTTGT'rTCAT GAGATAGAAG AAGATATCTC CATAAGTCCC ATGGTAGTCC AAATTATGAC CCTrGTAAGT TAATNr=GG ACAGGGTAGT AkkCTGCGAC GCCGATAAGG CAAGCTTGTT GCGAACGTTC AAAGTCTTCA TAAGACTCGG INFORMATION FOR SEQ ID NO: 227: SEQUENCE CHARACTERISTICS: LENGTH: 5278 base pairs TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 227: ACTCAGTTAG AT~rrTGT7"TT CAAAAACAAC GAAGAAAAAG ACCATGTTGC TCTAC'PTGGA AGAATTGGCT CCGAACGTGT TTATCGATAT ATTAATAAAA AATATTTAGA TTTACCGGAA ACATTCGAAA ATTATAATGT TTTTGTACCA GAAGCTAATG GAAGTGGTGC CTTAGGTGAA GTCTTATCAA CACCCCTAAT CGGGGAACCC CTAATCGGGC ATACAGATAC TTTTI'TATCT ATTGGTAATT TTAAAACAAA ATTTGAAGCC GATGCTTGTA 'rTAAATTrAT TAAAACTAAA TTCGCTAGAG TATTATTAGG TGTTTTGAAA GTTACTCAGC ATAATTCACG CAAAACTTGG 2280 2340 2400 2460 2520 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 TATTACGTCC CCCTCCAAGA CTTTACGGTC AATTCGGACA ACTGATATTG ACCGCCAGCT TGATCAAAAA TATGACTTTT ATTGAGAATC ATGTAAGGGA GATGGATTAG AAA.AGTATTT AATGATCTAA AATGACTATA TAGGATTAGG TCAGGAAGCA TGTACTTATG AGATGAGAAA GTCATTTGTT AGATAAATTG AAAAGGAAAA CTTATGCCAG TAGAAATTAA AACCACTAAA TGCCTACACC ACACCGACAG TAACCAGTAA TGAAGGCTCG TTGATTGGAC ACAATCAGTG CCCCTGAAGA AATTGCCTTT TTATTTGACA AATAGTGCTC
TACGATGCCC
ACTCGTTAGC
GAAATTCATC
ATTAAGATTG
GCTCATATAG
TGACCCTTTT
AAACGTTCAA
CTAAAATCTA
GGTATACAGA
CTACAGATGT AACGCATACA ACGTGATGTC ACACAACGTA TCAAGGAGCA CTTATGGACT GGTGATGCAG CCATGATTTC CACCATTrTCC GTTCTATTTT AATGGAACTC CT'PATACAGA AGAGCCTGAT AAGGGGAAAA CTTTCAAGGA TTTCTTTCCA TGATGTAGAA CGTCGTCCCA AGACGGAATG CTGAAAAATC AAAAAATCTT TTTGATAAGT TTGTTCAGCA TGATTTGTCT GGTTATCAGC CTGGAAAAGG ACAGGACTAT ACTCTGCGAC AAGAGCAAGA AGAAGCAGTT GCTAAGACAT GAATGCCAAG CCACGCTTTG AGCTGTCAAT GTCCTAA'ITG TrrGAAACA TTCATAGCAG TAAGAGTCGT CCAATCTTGT ACAACTTGCT TTTATCAGTC CGATAAACTC AAATGGGTAA TGAAGGAGTT GATACCT'rCA 'rCTGCATI'TG TCAGGTACAT AATCTACAAC 'rGGTc7,rATG AGAAGAGGAA AATCCTTATG TcAGATGATT GGCGAAAAGT TrTrGAC TTAAGTGAAT 1244 TAGCTTAT~r CCAAGAACAT GCTGGAGGCA AGTTTCTCTG GTAAAACCTT GTCTACCTAT GACCTAGCTC GACCGATGGA TAACAAACCG CCCTGCCATT GCTAACI'CAT CGTATGATGA GTCAAACCAC T'rACAAGTTT Gl'rTCTGAAT CAGATAGCCT CACGACAAGA ATTTCTTGGT A'rM'AGCTG ACCATGTAAG TCCAAGACTT GAAAGGA'rCT GTTTATTTAG GTGGAGAGCA C'rGATCTGCA rGGGACTTG TTrGGTTATTG ACGAGGCTCA
AGACTGACCA
CATTTAAAC
CTGATGAGCA
AAAGCTTCCC
TAGAAAAAGG
T'ITwrCGCTAC AGCCI'TAAT AAGATT'CGAC ATTCCCTAAA GGAGATTT'rA GGCTGCTAAG TATTCGTGGT TCAGTTGAAT CTCTTTACCT CGCTCAGATC GATGGTGAAA AGATGATAAA GGGAAATTTA AAGCAATGAA AAATATCCAT TTGGCTTTTA GAACGTCTCG AATCTATGAA AACTATGAGA GAAATTTrTAC
CAGAGGAACA
CTCTTGAGCA
ATCAAATGTC
ATATTGACTA
TTCATGAGCA
TTTrCAACCAA
CTTCGGCCAA
TCGTTCTAGC
CCTTGGACTT
GTCAGCTGAC
TGAAA'rCACC
TGATGTCAGA
AGAACTCCGT
AGCATTAAAA
AATTGGTTAG ATACTCTATC AATGAACTCA AGCATACI'T GCCCTACTAG AAGAACACCC 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 TGCTGGTGAC GGACG'rA'GT GGT'rAGAAAA GCGATAGCAG GACAGGTGTC ACTATCCCTG AGCTCTTTAT ATGCAGGCCG AGGAAALTCAC TTTCGCAAAG GAT'rCTCTTT GATGAGTTTG TTCAGCTACA CGCGAAGAAA AGACCGTGCT GGTAAGATGG GATAAAAGCT AGAGAAGTTC TACTGGTA'N' T'rCCAAGCAA AAAGGAACGG AAGGTACAAG TGATCAGGGA AATGCAGTAG TGGTGAAAAA GTTTATGGAC ACGAACTCAA AAACAGCTGG CCGAAGAAGA CGATAAAGTC AGAATGACAA AACCATTACC AATGGACAGG TGTA'rTGATG
AAACTCAPLAT
CTATCCGTTG
TTATCAPLATT
CCTTCCGTGC TCAAAATCCT TACTCATGGA AAAGAGCCTA TGTATTTGAC NTTGCGCCGG CCAACAACTT ATTGCTTCTA ACTGCAGCTG ATATTAGAGA ATTATTAA.AC TTCTTTCCAA TTGAAATTGA TGCAAAGGCA GT'rCTAACCA TTAAACGAGG ?r"rATGTCC AATCTCTTAT GTCAAACAGT TT'rAGATArr TTAAATGAGC ATAGTTCTGA ?I-rATTAGAT T'rTTCAGATG
GCGATAACAA
AAAGAACCTT
GTAGAGGAAC
TTATTGCC-A
CTCCTCGCCA
TTGATAATAT
TGCCAGTTGA
TTACAGTCGA
TAGACCATGA AATTGTAGTT T'rGGTrGAATC TGTTGCTGAG TCAATGACTT GAGTAAGACC AATCAGCAAA TGCGACTTTT TTAGTCACAA AAGATGAGGA G'rTTCTTCAG TGATTGTAGA
GGAATTGAAA
TACAGCAACA
TCATATCAAG
TAAGATTCAA
ACTAGAACAA
TGAGATAAAA
ACGAGGGTTT
ACTTGATAAT
GATTGATCAG
TAAAGCAACA
TTATT'rTAAA
AATTTTCACT
AGGGATTTTT
TATTGCAGAA
TCCTGAAGAA
GATTATCTAT
GAAGAATT
GGTTGATTCC
GAAAAACGC
CTAACTTTTG
GTATAATGGA
A.ACCTTTTAG
AACTCCCTGA
ATTCTGGGTT
?rTTrTGAGT
CAACAACAGC
C'rGAAAGTGG CTACTGCGA'r
GAACACCTTC
1245 GCAGATTA'r CTCTAAAAAC AAGGGAAACT GAGCAAATTA AGAAACAAAT CrITGAGAATG AAATTCGAAA AAAI'CATATC GAAAGAAAAA ?1'TCTGAAC CAAGACTTGC AACAGCAGCT CAAAMAAGCA AATGATAAAG CGCAAAAAGA GAAGATTG AAAAACC=r AGAAGAAAAT AAACTCATC A'rAAAGAAAA ACACTCAAAA AAGAAGTGGA AAAAATGCCT GAGAAATI'TA TCGAACAGGI CG'rCTGGAAC AGTTGAAACA ATCAGCTCAA GATGAAATTC GTGACCA'TTT GCAAGAACAA TTCCAAGT TA'rTATGGCT TACGGTGATC AAACTCTAAC TTTrGATGCCT TTGTTCCTGA ACArGr -rT TATGAAGTAA CAGGGATTAC TTTAGATATT TGCGAGATGG TGGGCAGGAT 'rTTGCAGGGC TTTGACGAAG CTATTCAAGA ATTITCTTCGC AAGAAAAAGG GATCAAAAAG AAGACATTrr TGACTATATT CCACCGCAGA CCTAAACGAG TGGTGAAAAG GATGGTAGAT GATTTGGA.AA GATGATCCAT CTAAGACTTT TATTGA'rrA TATATGAAGT CT'rGTGAAGC GGTTATA'rAA TAGCAATGGC TTGAAAGAGG CGCTTAAAAC ATATTT'rGGA AAAGCAAGTT 'rATGGA'rTTG AACATTTCCA C'TAAT'rrTAT ATTTGCAAT CTTTCTAAAG crTTTTAGCAG ATACCA'rrCC AGCGGCTAAA GAAGGGAGCA TATTTTGAAA ATAA'rTAAAA AGAAGGCCGA GTCAAAATTC ATAATATTGA GTGCTTTTGT~ ACTGCCCCCC A.AAAGTTAGA CGGGGCAGTT CAGACAATCC TTGGTAT'rAT GCGTTTTATT 'rTGAAATAAG ATATGAACAA ATCAATTAGG AATTTAAAGC AGTAATGGGG GGCTATTTCA ACT-TCAACCT ACTATAATAC TAATTCAAGG AGTTGCTAT AGrrAAATTA GTTrAGAA TTTCCATGCT TCGTCAATGA TAGCTrGTAA TTCTTrAGCA 'rrCGCGTCG TTCAATGGGA TATTTACTGG ACGAACGATA TGGTTGACCG ATAAAGACAT TCTCAACTCC GTATTGACCT AAGTACTGCG TT'rrCATCGT CAAGGATTGC ?TTTAGTGATA ACCGTAGTAT GTTGCACCTT TTT'rGrrGAT GATTGTGTAG GAACAATTCA ATCAAT'rCAG CTTCTTGAAC A?'rTrGAGTG ATCTCTTTGA 3420 AGTTCGCGGA 3480 AGACCAACCA 3540 AGGAAAATCC 3600 CAGGCCTCTA 3660 CCTTTCCAAkA 3720 CTCCGTCTGA 3780 2940 3000 3060 3120 3180 3240 3300 3360
ATATCAGTAG
TTCAAAAGTT
'rrTGAAATCA
CAGAAAAAAT
GTGGGAAGAT
ATTTTATAAC
AGAAAAAAAC
GCTTCTTGCA
GATGCTTGCA
CCATGTGCAC
TCNGGAATA
CGAGCAACGG
GCTGCATCAC
TCTTTAAGGA
ATrCTTCAAG GN'TACACCA CGTGTrCACC CATGATGTAG GTGCTTGACG GAAACGAGCT GGAAACCAGA GAATTTCCA.A GGAAGATACC TTTGAAACCA GG7rTTACC TACAAGGTCA TCACAACAAG GTCAGCGTICT AAGTGAAGGC AAGGGCGTGA GTGGAATTC GATAATTCCA AAGATGAACC TACAGCACCA TCATTrGTTI'T AAACATCTCC
GCGATGTTAG
GCGTGCACTG
GAGTCA.AGTG
CGTGTGACCA
AACGAGCATC
AAGTACCTGA
GTTGAGTAAG TCAAAACGTC GATTCAACAA CTTGAGTTAC AGACGAGTTT CACCTGG?1'T GCACAGTCAG AGTATTGAGC CTAAGGTCAA GCGCATCACC AGCTCT'rGTG CAATTCCTTG TCACCGACAA GGATAACTTI' TTAATTTTAT TAGGGGATTT AACAGCGAAC TCAGAGTCAC CACATCCAAT TN'TCAGCAA ACCGATAACG CGTTCTTTAG AACTGGTTA GCAGCAACAA GATGATTTG TTGATAGCAA TTGAGGTGCA CCTGCAGTGA TGCATAGATT TTTTTAGGTG AACAGCTTTT TCA'rGCAATT GTTAACAAGT GCAAAAGCGT TTTGTGTTGT TTAGTTGAAG TCCCTAGACA ACTTCATT
INFORMATION FOR SEQ ID NO: 228:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1941 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 228:
ATAAGGAATC TCTAAAAAAT T'PTAAGGAGA ATCTAGCAAA TGGATTTCAC ATGGGCACTG
AAGTATGCCA
GCCAACGTTG
GGTTATGGTA
ATCAACCCTG
CTGAATTTT GGGAACTGCC ATTTTGATCA AACT'rAA.AGG TACGAAAGGT CACCAAAGTG TGGGGGTTAT GATCCCAGCC TTGATGTTTG CTTTCACTCT AGGGCTTGCA GTTAGCGGTC
TTCTTGGGAA
GCTGGATCGT
GTAACGTATC
T'TTTCCCTTG
GTACCTI'ACA TTATCGCGCA AGTCTTGGGG GCTATCTTTG GCCA.AGCCTT ACATACCGTC CATTCTACTT GAAAACTGAA AACCCAAATA ACATCTGGG ACTATTTCAA GTATTGACCA TGGTACAAAA GAAAGTCGCT ATGCAGCAAC TTGATTAATG AGTTTGTTGG TTCATTrGTr TrGTrCTrTG CAGCTCTTGG AACTTCTTTrG GTGCTGA.AGT GCTTCAATTC ATGAAACAA-A AGGCAACAGA ACAGTTGATT TTrCTGACTT GGCTATTAAA GCACAGGTGG CTCCACACAC CTrTCTGTGG CTCACTrGGC ACTrGGATrC CTCGTrATGG CTTTGGTAAC GGACCTACAG GACCTGCCTT GAACCCAGCC CGTGACTrGG GACCACGTCT
TGGTGCAGTT
CATCGCTGTT
TGGGA.ATCAC
GGCACAAGTG
AGTTGTGGCA
AACTTTCTCA
TGTCAATGGT
TTTGACTAAA
AGCAGGACAA
TGCTTCAGGA
ATCACTTGGA
CCTTCATGCT
TTCCTTCCCA AATCAGTTCT TGGTGAGCAT AAAGGCGATT GTACCAGTAG TAGCACCTAT CGCAGCAGCA ATTGCGGCAG TATCTCTAAG AAATAGCTCC T'PTAACATTT TcAGACTGGk TCTCTCNT1' kGATT?1'TaG TCCAGCAGTG GTTTAGAAGT CTCAGTGGGC GCAGTTGAAA TAATAGACCC TTGTTTCTAA AATCAAAN'G TGCAG?T1C ATTCTACTAT CCCTATCTTG TAAGTCTGCT TTATAGTGGG CAAACATrGT GAGGAATTGA TTACCTTCC TACTATAAAA TAAGCGATTA GGGGGGCTAT TATGATTGTT ATCGTTTTAT CTGCAATTIT GAAGCTAGCC GCAGGTTrGTT CAAAACACAG AATAGTACAC ATCTACTrCT AAAACATTGT TGCCCTATTC TTGTTTCATI' TTACTATATA GAGTGGATGG ATAATGCTGA AAACTCCTrG ATrAGTTAAA TTTTTACCAA GAATAATTCA CTGAAATTTG ATAAAATAGT AAGGAAAGTT ATATATTTTA TTGGAGGC?1' TTACTCAAAT ACACGTTAAC ATrGGTACTA TCGGACACGT TATCACAACT GTTTTGGCAC G
GAGTGAGCAC
GGAAATGAA.A
TATTCCAGCT
AACATrG'rGA
ATATTATCGG
TTGAAGTTGG
TCAACAAAAT
TCTrCGACCT
ATAC'TCAATG
CAAAATGGTG GTATITCTTGG TAGCTGTATT CAAATTCCTT CATCTATAAG TAAGAGAGGA GAAc'rCTAAA CAAACTCCTC TCAATGGACT ATAGTAGGTT GAAAT 'GGTT TGAATTCTCC AATATTATCG GAGATGGGTT AATAGTCCTC CCTTCT'rTCT GTCAGT'7C TATTTCATTT ACATTGACTC TGCTGAGTCC AAAATCAAAG GGCAAACTAA 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1941 TTTTGAGGTT GTATAGTAGA TTGAAACTAG TAGAAATCGA TTrGACTGTC CTGAACGATT AACCAGAGAC TGTTTACATT TTCAGCAAGT AAGGATAAGI' CTATTTAGTA CTrTCTATTA CAAAAACGTT GTAAAACACT TGCAATTTAG AGACTGTATT GCCTACTGTC 'ATCTATAAA GCCAAAAGAA AAATACGATC G'AGTAAACC TGACCACGGT AAAACTACCC TAACTGCAGC INFOR14ATION FOR SEQ ID NO: 229: SEQUENCE CH ARACTERISTICS: LENGTH: 755 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 229: ATTTGAAGAA ATTGAAGAAA TCGTAGCCCC TACAGATGGT GAATTTTTGG GGGAAGTTT ACTTGGAACT GGGGTAGTTC TCTTAATTGG AGTAGCCTGT TGTTAAAAAG ATAGGGAGTG ATAATCATGC AAGATAACTPT TTTAT'rTGAG GAAATTGAAG AAATTTCAGT ACCAGTTAAT GATTTTTCAG CTGGACTTGC GGTTGTTGAA GTTTGTTCAT A'rPTAAATrT TCCGTATTAG TGATTGCGGA CTGAGGGACT TTAGGTTTGT TATGAGAAIT AC'rATCATT'r ATATATACCA ATGGTTAGAG CAACTAATAG GAAGAAATAT TTTAAAATA ATAAAAGATT TAATAAGAA'r ATACATGAAA T'rAGI'CATGC 1248 AACAGGTATC GGATrTGGTT TAGCAATCCI' TGCTCTTGCT TTACTAACAT CAAGCrrrT CAATTTCAIT TrAGACAGTC TCTrGCAGCA AGAGATAAT AGAATTAGTC ATTATTTTAT AGAGTATG?1' 1-ACT'rAACC CCTCTNTTAT rrAN'AAAGG GTTGATAAGA TrAAGATATT ACCTACTCCT TATGAGGGAC
TCCAGTAAGA
GTCAAGAATT
AACATGTATT
TACCATATCG
AGTTGGGAAA CAGGAAAAAA GACI'ATTAG 'rGTrAGTAGG ATTATCTGAA AAATTAAACT TTGTCATTAT ATT'TGCAACG TCTrGGGACT TrAATATAAC TTTGAAAAAA TCCTATCTTG AATAATTGCT AATAA
INFORMATION FOR SEQ ID NO: 230:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1483 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 230:
CCAGAAAAAC CGTAGTGGAG CTCGTGGAAC AGTGGAATTG ATTTTCCAAA AAGAATACAA TAAATTTTCA AGTATCTCAA AGAGGGAGGC ATAAGATGTC AGATGCATTT ACAGATGTAG CCAAGATGAA AAAAATCAAA GAAGAAATCA AGGCACATGA GGGACAAGTC GTAGAAATGA CTTTGGAGAA TGGTCGTAAG CATCTCTATT TAT'rGTGGAG TTGAATCCTT TACTTACTCA AAAGTGAGAA ATTr1'CTCAC TCGCTTTATA TTATTTTTCA ACAAGI'ATT ACTTGAAGAT ATGCTAGCAG CTGGTGATAG GGGGATGATT ATCCTGCTTA TATTCTCGTC AGTGTACTTC ATrCCGGCAG CTTA'rGGAAA CGTGTAGATA ATACACCGAC CGCCAAAAAA ATAGATTGGG TAAGCTAATT
TTTGGGGATG
GATATTCTTA
AGGAGGAAGA
GTCAGTTAAA
TGTCTATGCC
TTATAAAAAT
TTTTGTAGCC
TGGAAGGAGA TAAACAAGTT CAGAAAAGAA TTTGATTCAT TTCTCCGAAT AATTTAGGTG ATGAAAATTT TACCGTTTAT AAGCTTGTTC CTTTTTTAGT
GAAGTTTATC
AATG'PTTACG
TATC?1'GACT
AAGGCAAT(CA
AGCAAGAGGA
AGTAGGATTG
TATTCCAGAG
GGGAGCCAGG
TTTCGTTTGA
GAAATGGATC GATTGCGCGT AGATTGATCA GTGGCGCATG GTAATGTCAA TGGTTTTGAA GTGCTCGTCG GGAAGG'PTAT CTACTGCAGG AACTTATGGT TGCGAATGAA TGGGGACATC GATTGGTTCC ATTACTTGGT 1249 CATGTGCCT GGGTGTCAAA TGTAATGGGA GATCAGATTG AGATTGAGGA ATATAACTAT 900 GGTTATACAG AATCCTATAA TAAACGAGTT ATAAAAGCAA ACACGATGAC AGGATTTATT 960 CA'NTTAAAG ATTrGGATGG TGGCAG'rGTT GGGAATAGTC AATCCTCAAC TCAACAGGC 1020 GGAACTCATT AT'I TAAGAC CAAGTICTGCT ATTAAAACTG AACCTCTAGC TAGCGGAACT 1080 GTGATTGATT ACTATTATCC TGGGGAGAAG GTTCATTATG ATCAGATACT TGAAkAAAGAC 1140 GGCTATAAGT GGTTGAGTTA TACTGCCrAT AATGGAAGCT ATCGTTATGT TCAATTGGAG 1200 GCTGTGAATA AAAATCCTCT AGGTAAtTCT GI-rCrlrCTT CAACAGGTGG AACTCATrAT 1260 TTTAAGACCA AGTCTGCTAT CAAAACTGAA CCCCTAGTTA GTGCAACTGT GATTGATTAC 1320 TATTATCCTG GAGAGAAGGT TCATTATGAT CAAATTCTCG AAAAAGACGG CTACAAGTGG 1380 TTGAGTTATA CGGCTTATAA CGGAAGTCGT CGCTATATAC AGCTAGAGGG AGTGACTTCT 1440 TCACAAAATT ATCAGAATCA ATCAGGAAAC ATCTCTAGCT A'rG 1483 INFORM4ATION FOR SEQ ID NO: 231: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 1027 base pairs TYPE: nucleic acid STRANDEDNESS: double D) TOPOLOGY: linear SEQUENCE DESCRIPTION: SEQ ID NO: 231: CCCGGAAAAC AAGTTAAAGT TGAAGTTGGT CAAGCAGTTT ACGTTGAAAA ATTGAACGTT *GAAGCTGGTC AAGAAGTTAC TTTTAACGAA TTGTTCTTGT TGGTGGTGAA AACACTGTTG 120 TCGGAACTCC ACTTGTTGCT GGAGCTACTG TAGTTGGAAC TGTTGAAAAA CAAGGAAAAC 180 AAAAGAAAGT GGTTACTTAC AAGTACAAAC CTAAAAAAGG TAGCCACCGT AAACAAGGTC 240 ACCGTCAACC ATATACAAAA GTTGTCATCA ACG CAATCAA CGCTTAATTT TAAGGAGAAC 300 ACATGATACA GGCAGTCTTT GAGAGAGCCG AAGATGGCGA GCTGAGGAGT GCGGAAATTA 360 CTGGACACGC CGAGAGTGGC GAATACGGCT TAGATGTCGT GTGTGCATCG GTTTCTACGC 420 TTGCCATTAA CTTTATCAAT TCTATTGAGA AATTTGCAGG CTATGAACCA ATCCTAGAAT 480 TAAACGAAGA TGAAGGTGGC TATCTGATGG TTGAAATACC AAAAGATCTT CCTTCACACC 540 *AGAGAGAAAT GACCCAGTTA TTCTTTGAAT CATTTTTCTT AGGTATGGCA AACTTATCGG 600 AGAACTATTC TGAGTTCGTC CAAACCAGAG TTATCACAGA AAACTAACAC GGAGGAAAAC 660 ATTATGTTAA AAATGACTCT TAACAACTTG CAAC?1'TTCG CCCACAAAAA AGGTGGAGGT 720 1250 TCTACATCAA ACGGACGTGA TTCACAAGCA AAACGTCTI'G 
GAGCTAAAGC AGCTGACGGA CAAACTGTAA CAGGTGGATC AATCC?1'TAC CGTCAACGTG GTACACACAT CTATCCAGGT GTAAACGTrG GTCGTGGTGG AGATGATACT TTGTTCGCTA AACTTGAAGG CGTAGTACGC ITTGAACGTA AAGGACGCGA TAAAAAACAA GTGTCTGTTr ACCCAATCGC TAAATAAAAA GGTCCATTGA ACCTTrTATC CCGAACCT'rG AAATGTAGAG GTGAGGAAGC TAGAAACAC TTAAAA'r
INFORMATION FOR SEQ ID NO: 232:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1990 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 232:
CGGTTCAAAT GGTGCAGGTA AATCTACCTT AATTAATTCT ATTGTAGGTT TTCAAGAGAT TTATTTAGGA GAAATAGAGT ATTGTGATAA AGATTTGATA GTTAGTTCTC AACCTTTTGC
TCATTTAGGC TTTACTCCTC AAACCACAGT AATTGATTTT TATACTACTG TGTAATATTG GGGCTGAACC TTGCTGGAAA GTTTGGGAAA AATGCTGAGA AATAGCCTTA GAAATTGTTG AGGTGGACAA CTGCAACGCG TATTTTAGAT GAACCTACCG TTTAAAAGAT AAGAGTTTGG ACTCGAAAAG TTTTGTAAAA TGATATGCGT GACTTTGTAG AATTTCTAGA TATCAAATTG TAATGATAGT TTTACAATAG GGTAGGAAAA GCATGTGAAA TTATTrGCAA AGAATAGGAG
GGTTAGCTGA
TCCAGATTGC
TTCCTTTAGA
AAGGAAAAAC
TAAAAAAAAT AATTTGGTAG TAGAGCAATA GCTCATAATC TACTGAATCT GCCGAAAAAT TATTATCATA TCTTCACATG
TGAAGGACAA
AGTTGTGTCA
AAACATTGTC
CAGAT'TTTA
TTTTAATGTA
ACATAAATCT
CATTTTTTGG
TGCAGAATAG
ACATCGAAGA
AAATACTTI' TTTACAAAAT GGCTCCATAT ATAATTCAAC TATCAAATTA AATTTTTCAA AATTTTTAGA AAATTTTAGA TTTAAAGTTC
0 AAGTCCCTAT AGAAGAAAAG ATCTTAGATC TTATCAATGA TTAAAAACTT TTCAACAAGT AAATTAACCT TACAAGAAAG GAGAAAAATG AAGGCTGATC AATTAACCCA CAAATCGGAC TTAGGTTTAA GAGGTCTAGC GATTATTOCT AAAAATGAGA TTATTGCTTT T TTTAGAAGT AAAGGTTTAA TTA'rTrCTCA GTTTCTACAA CCAATCTrAT ATGTTGTTTT TATAATAATA GGATTAAATT CTTCGATAA.A GAACATTCAG TTTAATGATA TAAAAACCTC TTATGCAGAA TATACAATCA TTGGTGTTAT AGCTTTATTG ATAATCGGGC AGATGACTCA AGTTATTTAT 1251 AGGGTGACAA TAGATAAAAA ATATGGGCTA CrCcrA AGTrATGCAG TGGAGTTCGT CCTTTATATT ATAT TAGG GATGAGTATC TATTCTATAT TAGGGTTGAT AGTTCAAGAA ATrATTATAT ATATAATTAC GTTAGCGTTT GAGATAAATA TCGCAATGGA TAGATTrT TATACAGTTr TG=rATCTAT TGTTGTTTTA TTATTP1TGGG ACTCCC?1'GC AATTTTACT ACAATGTTrA TCAATGATTA CAGAAGACGT GATATTGTAA TACGTTTTGT ACTAACACCG CTTGG7r'I-A CAGCTCCTGT TTTCTAC1TTA ATAGATTCTG CTCCTAGTAT TGTGAGATGG ATTGGTCAGT TAAATCCT AACTTATCAA TTAACTA'N'T TGAGAAACTT TTATT2-rAAA AATTCAACAA C'TTTGGAATT AGT'PTTCTTA TTGTTAACAT CATTACTTGT CCTPTATATCT GTATCTTTTA TTATACCAAA GATAAAATrG ATACTGATAG AAAGATAAAA GTTGGGTCAT CCAACTrTTTr TGTTGTCTCC CGAAAACCAC TAGCTATGCT AGTGGTTCCA TAGAGCTTTT AGCGTGGTAA CAAAAAGAAC CTCCTAAAAT GATAAGATAG AAGTGGTT1TC TCCGCCACT.A CAACATATCA TACAGGAGGT ACCTCATGAG AGAGGArAAT CAAAG7TTTAT CACATACCAC ATGGAATTGT AAATATCATA TTGTTTTTGC ACCCAAATAT CGTCGTCAAA TCATTTATCG CAGATACAAA GCTAGTATCG GAAGAATCAT ACGTGACTTA TGTGAGCGTA AGGGTGTAAT AATCCATGAA GCGAATGCTT GTTCAGACCA TArTCACATG CTTATCAGTA TTrCCTCCGAA
ACTTAGTGTT
INFORMATION FOR SEQ ID NO: 233:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4766 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 233:
GAACTATATT GCATATATTT CTAGCAATGA TCATGGCGAA TCTTGGTCTG ATTACCTCCT ATAATGGGAC TTAATCGGAA TGCGCCATAT TTAGGTCCTG CATTGAAAGC TCAACTGGAC GTATTCTTAT TCCGTCTTAC ACTGGTAAAG CATTTATAGT GACGATAATG GAGCATCTTG GAAAGTTAAA GTAGTGCCAC TTGGTCAGCA GAAGCACAAT TTGTAGAATT GAGTCCAGGA GTAATTCAAG TACAAATAAT GGTAAAATTG CATATTTAAC AAGTAAAGAC GCAGGTACTA ACCGGAATAT TTGAAATTTG TTTCAAATCC AAGTTATGGA ACACAATTAT
CACCAACTTT
GACGTGGAAT
AGTCTGCGTT
TTCCTTCTAG
CATATATGCG
CTTGGAGTGC
CAATCATCAA
TTATAGCCAA TIrcATTGATG GTAAAAAGC TGGTCGTAAA CACGGACAAA TTTGGATTGG GCGTTATCAT CACGACGTTG ATTATAGTAA GTTACCAAA'r CATGAAATTG GATTGATGTT AC?1'CATATC AAAAATGTTG TACCATATAT TTAAAGCTGA AATTTGANAAA TATATAAAAA TGTTGGAGCT GGATAlwrG GAGC"TGArT AAAAGTGGTT GCCGTArTG ACCCAAATCA AGATGTTTGT GCAAGTTTAG ATGAACTTGT AGCrrCACCT AGCTACCTTC ACCGTGAACC CGTArN'TGT GAAA.AGCCAA TTGCA7rGTC ATGTAAAGAA AATAATGTCA TCTTTATGGC ACACCATGCT AAAGAATTGA TTACTCAAGG 1252 TGTCATTTTA AGTACTCCAA ACTCCACAAA TCTAATTrAAT GATGATAATA CAArrGATTG CTATGGATAC TCATATTCAA CATTGACAGA TrGAAAAATT GATTCATGGT CTCGTAA'rGA AACATTTAAG ATTGAAGATC TGAAAAAGAA GAGGATAAAA ATTATGGTAA ATTACGGTAT
AGCTCGCTCA
TGGAGAAGAA
AGCACGTrGAA
AGTTGTGAAA
TTATGAACAT
TGGTCACATC
TAAAATCGGT
ATGAACAAAA TTGAAGATGC
GTTGCTCAAG
GATATTGATT
GCTGCTCAAC
'rGTAAAGCCA
ATGAACTTCT
AAAGTTCTT
TGCTCGTACA GGTTGGGAAG AACAACAACC AACTGTATCA TGGAAGAALAC ATCTGGAGGA CATTTGTACC ACCATATTCA TGAATTAGAT TGCATTCAGT AGGACTTCCT GAAAAAGCGA CAATGGTAGG AGGCAATGTA TATCATAAAG
AGTTGGGATC
GTGTGATCGT
ATGGCAAACA
TGGTTGACGC
TTAACGGTGT
ATTGCCATGC
TTCGTTCTCA
TTATCATGG
GTGAAAACTT
GTTA TGCTGT
AAGGAACTGA
AAGGTGAAGG
CAGCTATCTA
GTTGCCCATT
TTAAAGGTGG
480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 TGGTGATGAA GATGATATGC TTTGGAATAT GGTAATGCTT AGGAGCTATC AAACTTGACT AGAATCACAC T'rCTTAGTTC TACCGGTCGT GGTATGGATG ATGGTTGCAA ACATGTATTG TCATT1GTAAA CTTAGAATAC TCTGATGATC TCCGTTGGGG TGAACACTAC GTCTTGATTC TGTTCAATAC TGGCGGTACT CTTCGTGTTA ATGAAACTCA AGAGGAAGAT GATGATCGTA GAGCAATTGC GTACGGTAAA CCAGGAGTAC ATAAAGAAAT GGAATATCTA CATGACATCA AGAAATTACA GAAGAATTTG AAAAACTTCT CAATGGTGTA GCTGCTTTAG AATCAATC(CC TACCGC'TGAT GCATGTACTT TATCAGTTAA AGAAGATCGA AAAGTAAGTC TTTCAGAAAT CACAAATGCT 'rAACTr'rrGT AGTTCTGTGA TACAACTCAT CCCATTTTTT ATCAAAAAGT AGGTATCGAT ATTGGCGGAA TTTGAATCAT TTCAAAGAGA A'AAATCAG GTCTGTGATT AAAACAGAAT AGTAAATTCT TGTCATTATA TAATTTCTAA TGAATAAAGA AATAGAGATG GGACTGCAT AATGCCCAGT AATGAGATCA AAAATGTGGC AGTGTTGAAA TGAAGATTAT CAACAATTAA GGCAGAT=TA TACGATGAGT TTGGAACGAG TAGAAACAAT TATTGACTAT GATTTGGGAA CCAATCAGAT TAATTGGTGA GTATACTTTA AATCATTCAA TTGATGGTGT 1253 TGGGATTTCC ACTGCTGGAG TTG=rAATGC TAATACTGGA GAAATCATCT ATGCAGGCTA TACAATACCA GGGTATATCG GAGTAAAC~r TACTGCCGAA ATAGAAAAAC G?~n'rGGGTT GTATACTN G?'rGAAAATG ATGTTAATTG 'rGCTGCATTA GGTGAATTGT GGAAGGGACA AGCCAAAGAT AAGAAAAATG TAGTAATGGT TACrATTGGA ACAGGTATAG GAGGCAGTAT rATTGrCAAC GGACAAATTG 'rTAACGGATT TAACTATACT CCTG.GTGAAG TAGTATAT TCCTGTAGGT AA'rTCGGATT GGCAAAGTAA ACCTCAACA ACCGCA'IrGA TTCATTATA TCAAAAAAAG AGCTTGAAAA CTAATCAAAC TGGACGTACT TTCTTCACTG ATTTAAGATC TGGAGA'rAAA GTTGCTGAAG ATTAACGATT TCTTATCTAC TAGTAAGGAT ATTT'rGTTAC TAGGTTTTTA CCTAAAAATC AGCTGTAAAA AATTTCTTAG CACAATGACT AACTCTGTAT TATAAAATCA TATGATAATG AAAACTAGAA AAAAGTTATG TTTAGAAATG AAACAACAGA TGCAAATATT CAGTATATTT TAGGGTGGTA GAAAAATATG AAAGTAGTCT TGATGATGTT ATTTAAAGGT TCAGGATAAC AAACTTTr'rA AATNTTTCGTA TTAATCCAGA AATTCTCATA CTGAAATTCA AAGI'TC'N-rA
TTGTGGCAGC
ATACAATTTC
TTTCGACAAT
AGATTTATAC
ATGAAAAAAG
ACCTTATTGA
CATCAAGTAG
GCCATTAGTC
TTAAAAACTA
TGGTCAGAAG
TACATTAGGA AATGAAGCTG TAATAAATAG TATGTAAGAT GCAACATATT GAGAATGTTG TTATAAAGCT GTTCCCAAG TCACGAAGAA TTAGTCAA TCAGGTTAAT AAAACAATCA GAGAGGAGAA TTTGTAGAAA AGATGAAAAG AATCTCTCTA TTCAAGAACT AGAGTCAGTG CTCTAGAAAA AGATGAAGTT ATCATGGGGT TATTGAGCGA TAGAAGCAAC TGTAGAGAA.A TGCAGCAACC TCCGATAACT
GTCGTATAGG
AAGGACTGT
CAACCGATAT
AAGAATTGGA
TAGAAAGCAA
AGGAAAATGA
AAATTATTGG
CTATTTTCTA
CACTT-CCGTG
GTA'rTTCCAA
AATGATGCCT
TTACA-AGAAT
ACCTCTTATG
AAAGTGAATC
GAAAATCTAA CAAAAGGTT TTAGGAG4GTG GGATI'CTGGA GCTAAAAATG CAATGGATAA 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 CTATTCAAAT TTTTCATACT TCTAATTCCA TGAC?rATTT GATGAATCAA CAACAACAT ACCTACCGAA AGAAAACACG TTTAAATTAT TGr'rCTTAAA AAGAATTAGA GAAATTTGGT AAAGCTAATG TT'GCTGAGGG ATCAAATTAA CAGAGTTCAT GAATTAGAAA GACACATTCA AAC'rAATAAT GAGGAAATAG AGCGAT'rAAT AAAGTGGGAA AAA'rrAGAAA 7TGTCCTGC CAA=~AGAA CAAT?11'CTT TCTGTAAAGG AAAAGTCGGA ACAATTCCAA GGACTGAAGA TAATCGCTTA TACAATAGTC rTTAGAAAA CAATATTGAA GTTCAAGAAA TATI1-rCTAA TGATAGAGAG TACGGTGTG TTGTTTTCTA TCAGTCTAGT TACTCTATAG AT1wrMATGA ATACTTATTT GAACCATTTG 1254 ATTA'IrCTAG AAAGGAATTA CCGAAGCAGC GAGTAGTAGA AGTTAATAAC TGAAAAAGAG AATATTATCG CATCGrTGCA TAGATTTACA ATGGCAAATA GACTATATTT TATCrATCTA ATAACTT? GTGCACTCCG CATCTAGTTG CATTAGAAGG TITTATATTT TATAAAAGT'r ATGGATGAGC ATTrTGGACA CGGAAACATT GACGGATAAT CAAGATGAAA TACCTATCAA TTGAACCATr TGAATTATTG ACAGAAATGT ATGCTCTGCC C'TACACCTGT ATTAGCACCA TTTrACI-rA CATTTTTG GCTATGGTT'r ACTATTGTTT T'rAGGAACAA TGTTAGCATT CAGCAACTAA GAGATT TTT A AAATTCTTTA ATATATTAGG GTGGAATCrA TGGCTCATTT TTTGGATATG AGTTGCCATT CTGATGTCAT GAC'rATATTA GTAGTGTCAG TTGTG M1rGG GTTTGTTAGC TTCAGGACTA CAAAAAGTAA GAATGAATAA *CAGGATTTGC GTGGTGTGTT ATTCTG INFORMATION FOR SEQ ID NO: 234: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 2484 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear TTTAGATCAA GAAAACATGC AGATrCAAAG AAATA'rTTGA TGCTCGTCAA ATCTCTAAGA ATGGATAGAA GAAACTCGTA TTCTArI-rAT ATTTATGAAT ATTAACGAAT CATTCTTTAA CAAATATTAT GAGAA.AGATC 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4766 AATGATGGTT GCTGATTTAG AAAAATTTPT CATCTACCTT GGTAGCCGTT GCAATTTGGG TCATCTGATA TCTACAACCT GTTTATTACA GTATTTGCAG ATATGCAGAA GCATATAATT (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 234: CCTTTTAGAA AAAATTAAAG AATACGACAC CATTATCATT CATCGTCATA TGAAACCAGA 9 9 4 CCCTGATGCC TTGGGAAGTC AGGTGGGA'1r GAAAGCCTTG AAAAACCATC AAAGCCGTCG 
GTTTTGATGA ACCAACTC?1' TCTTGTTGAA GATAGAGCCT ACCAAGGCGC ACTTGTCATC TGCTCGTATC GATGATAAGC GCTATAGTCA AGGTGATTTT TCCAAATGAT GATGTATACG GTGACCTGTC TTGGGTCGAT aGaTGATTAC CCTAT'N'GCC CAAACAACCC AACTAGCCTT TGCTCTTTGC AGGAATTGTC GGTGATACAG GTCGCTTCCT GGACTCTTCG CCTGGCTGCT TATTTGAGAG AACATAACTT CTGGAACATC ATTTCCCAGA ACTTGGATGG CTGAGATGGA GTCTGTGATA CAGCTAATAC CTCATTAAGA TTGACCACCA ACTAGTTCAA GTAGCGCTAg GGCAGATCGC GATGCTGAGT CTACCCTTCT ACCACTGCAC TGACTTTGCG GCTCTCACTC GCAAAATGGA CACTATGAGC TACAAAATTG CTAAACTGCA AGGCTACATC TACGACCATC 60 600 1255 TG-GAAGTGGA TGAAAATGGT GCTGCTCGCG TTATCCTGAG TCAGAAAATC ACAATATAAC CGATGCTGAA ACTGCGGCCA ?TGTAGGTGC ACCTGGACGC
TTGAAACAAT
ATTGACAGAG
TGAGTCTCTG
GTAAAGTCCA
CAAGTGGTGC
AC7TGICTTAA
GTAACAATCT
GGGA*=
TCCTATCAAT
TAArrCCTAT
AAACTGATAA
ATGG=rCGCA GTCGAACAGG CTGATGGCCA CTACCCAGTT CGCTTACGCA GAAA'rTGCCA AGGAGCATGA 'rGCTGGAGGC CACCCTCTAG AGCCTAGAAG AAAACGAAAT CATCTACCAA AACTTAGAAG AATACTTGCC AAACTN'TCA GAATCTGATA GACTAGTATA AAGAGACCAT GGCAGAAAGG AAATAI'rGCA AAATGAAAAr C C
AGATATCCAT CCAGAATATC GCCCAGTTGT CCTrAGCGGT 'rCAACAAAAC GCTCTAACGA ATTGATCCGT GTGGAAATT'r CATCAGACTC CACTCAAGCA GATGGACGCG TGGATCGTTT AGAGAACAGT TTGGCTGTT CTTTT=TT' AGACTCATCT GTAGGTTCGA TTTCCATGCT ACGTCCATAA TGAGCTATAC TATTGTCACG ATTCA7'rTTC TTAAAAGCT CTCGGAC'rrT TAAGGTTTCA AAACGAGGAC TTATACTCAT CGA'rTACCGA TGGACTTTAT CACCTCCT'rC CTTCATGCAC ACAACTACT G TTACCArTT AACAGTTGAG rrCGAAGGCG AAACTTACCC ACACCCATTC TACACTGGAC CAACAAAAAA TACGGTCTCA
GTCAAAAGTT
AATAATGATA
TC'rrGAAATC AACTGCTGTT 'rTCATGTTCC ACTAGGCAGG AAGGAAATAG CTGTTTCAAC AACCACACTT TCATTGATGG TCCAAGTGGA TTCCAAATCT TTGGAGGCAA TGGCCTGCTC CTCTTCAA AAAGCATTCT AGTCCATCTC TCCAGTCCTT GTATGACATC TTGAAGTTGA TTA7TCTAA 960 1020 1080 1140 1200 1260 77CATGACAT
TTCCACACTT
TA'IrAGTCGG
AAAGGATAAG
CCTGAAATAA
CAAGAAAGGC
CTTGAAGGAG
CTTCCAAAGT
G?1'CAATGGC
AATATTTAAG
TCgAAAGGCT 'rTCATCTCTG
GTACTTGATT
ATCCACG=rT ACGAATCTCT GTGTGTA'rGG AGGAATAAAG GTAAAATCAA TATGCCATAT AGCATTGTCC ATAACGAGTA
CTTGTGAAAG CTCTTCTAAA AACGCGTTCA TCCACACTCC 7TTTTATAAA GGCATCAATT GTAACAAATT CTCCTGCCTC TG'TAGCCTTC AAATGACGGC TITTCTCTTrCC TCAACTGTCA TATATGCATG GTTACGACCA CCACGTGTTT AGAGTCGAGT CCGAACTCCT CATATTtTT TACGTT'TCGC CAAATCGTTG 7"TrGATTACA GCTrAAAAGC TAGTAGATTG AAACTAGAAT GTCCTCTTCT TGI'TCATTT TACGAGAGTC TT~rAAAAC'r TACTATTATA TAATGCTTTT TCTATAATCT CTTTATAAGA TTTGCCCATC AGTACACCTC TACTTCTAAA ACATTrGTTAG TACTATAGAA CGATTGAAG GCGTTTATAA
AGACGAAATA
AAATCGATTTr
TA=TAGCTG
GTTrrrGATGG TTTGGATTTC TTCTT'rAGTT GATTTCATAT TGATTTTAGT CTGGTATAAA TATTGCTTTC CTCCAAAATG 1256 GTCATAGTTT TACTGGCAAA TCTAACATAT CACGGATAAA 'rrAACAAGTG ATTTCTGAA'r TGCTAAACAT TTrCTTTrCT TATAGCATAC TTTAAGATTT TGTCT 1TGAG AAAGATATT CCAAGAAAAA CGTTCGTTTT TTGG INFORMATION FOR SEQ ID NO: 235: SEQUENCE CHARACTERISTICS: LENGTH: 1766 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 235: CTAGATATAG CTATAATTTT ATTTATAACA AGAGGATAGA A.ATGACCGAA TTAGAAAGAA AAA.ATCGAAA AATTAGCTAA GAAATATTCT GATAACTTAA ACATCAAAGT TCAAGAGAGA GTTCGTGAAA TGGCAAATGA TAATAAGAGC CATTATTTGA TATACAGAGT 'N'TAGGTATT TCATTTGAAG AAGGAGAAAA TATCGATTTG TATCAAAATA AAGGTCGTTT TTTATACAAA TATGCTGGTT CAT'PTTTAGA AGAAGCTGCA GTACTATGCT TTAACGAAAA ATTTGGTACA GAAAATACTT AAAAAGTTAA CATTCCTAAT TCTGA.AAGTA CAAAACCTAA CACTT'TCAA ATrGATTGTr TAGTCGGAGA AAAACACGCA TACGAAATAA GATGGAGACC ATATAACTAA ATACCAATTC GGTTAATGT ACTTAGAAA CATTGTATAA GAACATTTAA GAGCAGTGAC AAAAAAACAG GGGTAA.AATC AAACTATTGA ATCCTCAAGT AAACCCAAAA ATTATCTAAT CGATTGAGGA TTACAAAGAA AAAATAGTGG CAGTA'PTTTC AGAACACACT AGAATAAAAG
CTACTATCCA
CGGTATrGGA
CGGTATTGAT
AAAATGACAG
ATTGATTI'AA
AACAAAAATA
TTTTTGTCTG
GTTCATTGTG
AATAGAACTC
GGGAAATATT
TTACTTAGTA
TATTAAAAGG
TCTATATGGA
TTATGTATPTC
AATGGTGGGA TGCAACTACA TTATTCATAA CAAAGGATAT AAGCTATAAA A.ATTCAGCAA ATTATGGAGA TTCTGCCTGC TTCTAACAGA TATTGCAAAT AGATAACT'rA GAAATATTAA CCCTCCTTTC TrTACACAGA ATTCGAAGAT ACGTGGAC, T 2400 2460 120 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 TAAGATTAGA AGAATGCAAA AGAGTGCTAA ATAAAATTGC AAATCATCAT ATTAGATTAA TTTTAGATAA TATCTTTGGA GTAGATATGT TTCAAAGCGA AATTATATGG AACTATAAAC GGTGGTCTAA TTCAAAAAAG GGATTATTGA ACAATCATCA AAACATTTAC TTTTAT3'CAA AGTCAAAAGA TTTTAAATTT AATACAATTT TTACAGAGTA TTCTTCTACT ACAAATATCG ACCAAATACT AGTGGAACGA AAACGAGATG GAAACTCTAA AACTATATAT AAGGTTGATA ATAATGGTAA CTATATTCTA GCAAAAGAGA AAAATGGAGT TCCCCTTTCA GATGTTTGGA 1257 ATATACCATT TC1-rAATCCA AAAGCTAA.AG AAAGAGTAGG TTATCCTACA CAAAAACCTA 1320 TTCTGTrATT AGAACAAATT ATAAAGA'PTG CTACTGATAA AAATGATATA G?1'TTAGACC 1380 CGTrCrGrGG AAGTGGAACT ACTTrAGTAG CCTCCAAGAT ?TGAATAGA AATTATATGG 1440 GGAT'GAT-rT ATC TGAGGAA GCTATCAATA TAACTCAGCA ACGTCTGGAA AATGTATAA 1500 AAACAAGTrC AAATTTATTG AATAAAGGAA TCGAAGCATA TAGAACCAAA ACTGAGGAAG 1560 AGGAAAACAT TCTrAAATrA TTACAGGCAA AAATrGTrCA AAGAAATAAA GGAATTGATG 1620 GTTTTTACC TAAACrTrTT CAAAAAAAAC CGATACCTAT AA.AAATTCAA AAAAATAATC 1680 AATGTCTGAA rGAGAGTATC TCTrTATTAC AGAATGCTAT AAACTCCAAA AAACTTGATT 1740 T'TGGAGTAGT TATAAAAACT CATTCG 1766 INFORMATION FOR SEQ ID NO: 236: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 748 base pairs TYPE: nucleic acid STRANDEDNESS double TOPOLOGY: linear (Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 236: *CCGAAAATCA AATTCAAACC ACGTCAACGIT CGCCTTGCCG TACTCAAGTA CAGCCTGCGG CTAGTTTCCT AGTTTGCTCT TTGATTTITCA TTGAGTATTA AACTWATTA AATAATATTA 120 GCGCGGAGAA TTTCTAATTC TTCCTTGGTC AAGCGACGCC ATrCCCCTCG TTCTAGGTTC 180 *TCATCTAATA CTAAAGTTCC CATAGTCAAT CGTTGCAAGT CCACCACTTC CTTGCCACAG 240 TAGCCCACCA TACGCTTGAT CTGATGAAAC TTCCCTTCTG CAATGGTCAC ACGGATTTGG 300 CTTTGATTCT TTTCTGTATC TATGGATACA AGCTCCAGTA TAGCGGGTTG ACAGGTAAAG 360 TCTTTGAGAG GAATACCCTC AGCAAATGTC TCCACATCTT CTTGGGTCAT GATTCCCT'rG 420 ACTTGTGCCA GATAAGTcrr GTCCACATGA CGCTTGGGCcG 
AAAGAAGAAC ATGAGCCAGC 480
TGACCATCAT TGGTCAAGAG CAAAAGACCA TGCGTGTCAA TATCCAAGCG TCCTACTGGG 540
AAAACTTCCT TACTCCCCC CAAGTCATCC AACAAGTCCA GAACGGTTC1' GTGCTTGGGA 600
TCCTCAGTCG CTGAGATAAC TCCTTTGGGC TTGTTCATCA TGTAGTAGAC AAACTCTTCA 660
TACTCCAACA CTTGCCCATC AAAGCGAATC TCATCTATTT TTTCATCAAT CTGCAATTTA 720
GCTGATT'TTT CTTTTTGACC ATTTACAG 748
INFORMATION FOR SEQ ID NO: 237:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1449 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 237:
AAAAGATTAC ATTGCAACAA TTGAAAATTA TAGTCCTTTG ATGGCTGATG GAAATGCTTA TGCTACTGAC AAGAAAGTCG ACATGATCCT TTGTCCAGTT GCCTTTGAGT TGGGAATTGG GCCACGCGAA GTTATTTCTG CTGACTATGA GCACGCGGAT GCCATTAAGC CAGGTCAACG AGGTGGAACT GTTAAGGCAA CTATCGAGAT TTGTGCCTTC CTTGTTGAAT TGGATGAATT CTACAAAG'r CTTATGCATT ATTAATGAAA GGATATAAAA ATAGACTATA ACTAGTTAGA TGCAGTATAA TAAAAGGACT AAGTGTTTGA TGAAAATA. CAGTTAGGAG AGGGTTATGC TTGAGATTTT ACGGACAGAG AATATCTTTG ATATCCGTCC TTTGAAGATT TTAATTTTAA CCCAGTTGTT GCGCCACT1TG GCTAA'rACAC TGGAGAGCCA CCGT'rCTAAA ACAACTCGTT TTrCCTGAAGT CAAGGATGAG TATTTTGATG ATTTACCATT TGAGGAAG'rG GACTA'rrGGG AGACTCATGT CTATTCGACC CTTCATATCT GCrATGGTGT AGAAAAATAC CAGATGGACA CCCTAAAACA GGGTCACCTT CTATTTAGAG CACGGCACAC GGAGATTCT AAGGAAGAGG CAGAAGGACC TCAGGTTGCG GTTTCTATTw GTTTTGGTCA TTTGGAGTAT GACCGTGATA TCCAAAGGAA GGCATTACCT TAGCTACGCT GTTCGTGAAA TCGTTCAG'rA GGGACCTGAA GCTCGTGGAT TTATCGTGGG 'TTTGCGCCT GTTCGTAAGC CAGGTAAATT AAAAGAGTAC GGTGTCGATA CCTTGACTAT TGTTCTTATT GTAGATCACC T7=TGCGAC GATTGAAAAA CTTGGTGGTG TTATGGCAGG GAACGGCCGT GAAAAAATTG GTGACTACGA
TCCGTGATAT
ACAGTCCCTA
GAAAAACTAT
GATTTGTCTT
CGATTCGAAT
GGGCTGTNTT CTCTACACTA AATGAAAAC TATATCTTCT CAAACATATG CAATTATTCC TGATA-AAAAA TTGCCAGCTG TCATGGATGA TCAACGTGCT GCCCACCAAG ATCTCATGCC ACAGAAAATG GTCACAGAGA CCCTACJ ACT GGATATTGAT TTTCTCTATA CAGAGCACAT GGAGACCTTC TATAAAACTT GGATGATCAT CACGGGTGCT CCAGTTGAGC AGGAATTTAG ACAGATGCTT GAGTGGTCTA GT'rGGGGGGC TCAGGCTGGG CTTTATCTGC GTAAGCTATC AGGTATTrAT CCTCAGGACA GCTTTGATGA TAGCTATGTA TCCCCTCATT TCTTAAACAA GACCAATCTC GAGATTrTAT TGGCCAGTCG TGATTTACGA GAAATTTATA C7'rGCAAA AGAGTATTTT CGAGATCGTG 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1449
ATGCAGGT'
INFORMATION FOR SEQ ID NO: 238:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 904 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 238:
TACCCGCTTC TTTCAAGAGT TGGAGCAGGG CTTGTTrTGCG ATCTTTTGTC ATAGTTCTrC
CTTTTAACGG
ACTATGGATA
ATTGGTACCA
AACAGCCAAA
TTTTTGTAGG
GATAACCATG
TGCCATCAAG
GGAAGCAGGCT
CGTTAAAAGG
CTTrTTAACT
CTTGGGTTTA
ACCTCTACTT
TCTTATTTCG
CGTTrTCGAA GCACTTTATA GACAGCTAGT GCTAATGTAT AGTCTACCAT ATTCTACCAA ATCCAACTAG TACAAATAGA ACATAAAACA TATTT'rCTAC GAAGTTGCGT AAAAAACGAC ACAGGCCAAT ACTTCACCAA GGGCATGAAC ACAAAGTTGA AAATCCAGGA AGATTTTGGT T-rATCTAGGG TAT=C~GGAA
TAAAGAGCTC
GGATAGCCAG
GGCGACAAGA
GGAATGACAA
GCTGTCATTG
GCATATACAC
TAGATCATTT
CTAAAACATT
CTAAAGCACC AAAAGATATA TGGGAAAAAG CCCGAAAAAC CCATCAAAAA TCCAAAACTA GAGGCTAGGA TGACAAAAAC ACATGGCTAT AAAAATAGCG ATGTGGCTCC CCAAAGTATA TCTTGAAAGG CATAACAAr7 GGAATCAAAA TCGCAATAGC TCATXAATTG TGTCTTTTTC CGTGTATTCA CAAGAATCTC TAGTATGGTA CAATAAACCA GACAATAAAG CAAGAATTTA TTTAGTTAAA AGTTATAGTA GATTGAAACT AGAATAGTCC GTTAGAAATC GATTTGGCTG TCCTGATCGA TTTGTCCTGT GTAAAGAT'rr CATTAUNAAAG AAACTGTATA GAGCAAAATC TTTTACTATA TCCACCTrCA GGTTTGGAAA GCGGAGATTG TTrnTTATTT TTTCCAGGGT TITGTAGTCGT
GGGA
INFORMATION FOR SEQ ID NO: 239: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 946 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 239: CACTCAAACA TGACTTATAT CAAGACGGAT GGACTTCAAG ACGATGCCAA TCGCTTGAAT CGTAACATTC AGTTrGTGT TCG'rGAATrT CTTCATGGTG GACtTCGTGT ATACGGTGGA GCAGCTGTCC GCTTGTCAGC CTTACAAGGA TCAATCGCAG TTGGGGAAGA TGGTCCGACT C GTG CTATGC CAAATCTAAA TGT'rTrCCGT TGGTACCTTG CAGTGACAAG TGAGAAAACA TTGACTGTrG AAGATGGAAC AGACTTCGAC GAAAATGCAG CCGACTTTGA TACCATCTTG GTCI'CAGCTG CTAAAGAATT GGCTAGTCAA TCTACAGATG TC'PTTGATAA ACAAGATGCA GTCCGCCGTC GTGTTGCAGT CGAAATGGGT CTCGATGGTG CCG7r=TAGG TATTGATACT TGGCAGAATA TGOCTTTACT GTAGAAAATC
GCAATGGGAA
ACTTTCTTCG
CTrCCTGTGA CA'rGAACCAG
CCAGCAGATG
CCAACTGCCC
AAGGTTGCTA
ATTGCGACAG
GGCGAAAAAA
GCTrACAAGG
GCAAGTCAAA
CAATCTTGAA CGGGATGGCC TCTTCTCTGA CTATGTGAAG C -PATGTCTr TACCCATGAT TTGAGCAT'IT AGCAGGTCTT CGCGTGAAAC GCAAGCAGCT 'rTGI'CTTGAC ACGTCAAAAT AAGGTGCTTA TGTTGTATAT CTTCAGAGGT TAATCTTGCT TCCGCGTAGT CAGCATGCCA AAGAAATTCT TCCAA.ATGCA ACTGGTACAA ATATG'ITGGT TCGGAGCCTC TGCCCCAGCA CCAAAAGTAT TTGTAAAAGT TGTTCGAA.AC TTGAAATAAT CCTAAAAATC PA3GGCGTAAG CTCTGGTTTT 'FCTTACCAGA AAAAGTAGCT GAAA'rTTGAT ATAGTAGTCC TATGTAAAAG INFORMATION FOR SEQ ID NO: 240: (i) SEQUENCE CHARACTERISTICS: LENGTH: 2764 base pairs TYPE: nucleic acid STRANDEDNESS: double (D) TOPOLOGY: linear AAAGTAAGGT ACAATCTTGT
ACAAAG
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 240: CGGGGCTCCc TAGTTCTTAG GGAGCTATTT TTGTTI=TC AAGAAGTTAT CTTCTTGTAT TTTATACTCA ATGAAAATCA AAGACCAAGC TAGGAAACTA GCCGTAs sTG CTCAAAAC AC TGTTTTGAGG TTGTAGATAA GACTGACAAA GTCAGGAACA CATATCTACG GCAAGGCGAC GTTGACGCGG TTTGAAGAGA TTTTCGAAGA GTATTAGTTG TGAATCTGGT GCAGTCGTCC CAGATTATTC TTATTAGTAG GGTCTTGTTT TCTATATCCC CTCGTAGTTA ACAAGACCTT GAGCATWI'A GAAAGAGGAA 'rATCGTCCAT TGGGAAAGGG GTCTCAAAGT AACCATI'CAA GTCCTTACCA GCACGGGGAA TCTATGTCTA CGAAATATAT TTTTGTAACT A'rTGTGGCAG CGACTCTAC CCGTCTCTTG AAGI'TGACC CTTATATCAA TATTGATCCG GTTTTTGTGA CAGATGACGG AGCTGAGACA
GGTGGTGTGG
AAAAATCGTG
GGAACCATGA
GATTTGGACT
11- 1.261 TGGTCACTA TGAACGTTTC ATCGATATCA ATCTCAACAA ATATTCCAAC CTGACAACTG GGA).AAT'rTA CAGTGAAGTT C?7CGTAAAG AACGCCGTGG AGAA'rACCTT GGGGCAACTG TTCAAGTCAT TCCTCATATC ACAGATGCC? TGAAAGAAAA AATCAAGCCT GCCGCTCTAA CGACCGACTC TGATGTCAI-r ATCACAGAGG TTGGlrGGAAC AGTAGGAGAT A'rCCAGTCCT TGCCA'N'CCT AGAGGCTCTT CGTCAGATGA ACGCAGATGT ATATCCATAC AACCTTGCTT CCTTACCTCA AGGCTGCTGG CCCAACACTC TGTCAAAGAA TTGCGTGGC'r TGGGAATCCA GTACAGAAGA GCCAGCTGGT CAAGGAAT'rA AAAATAAACT CACCAGAAGC CGTTATCCAA TCGTTGGATG TTGAACACCT TGCAGGCACA AGGGATGGAC CAAATTGTTT- GTGATCATTr CGGATATGAC AGAA'rGGTCA GCCATGGTGG ACAAGGTCAT AGATTTCCCT TGTTGGTAAG TA'rGTGGAGT TGCAAGATGC GGG'rGCGGAT AATGTCATGT TGAAATGAAA ACCAAACCAA ACCAAATATG TTGGTTATTC GGCCCAGTTC TGTGATGTGG TTACCAAATT CCACTGAACT GAAATTAGAC GCACCAGCAG GAACCTCAAG AAACAAGTTA C'TATATCTCA GTGGTCGAAG AATCAATTGG GTCAATGCCA TGCGGACGGG ATCATCGTAC AGCCATCCGC TATGCGCGTG .q
C
CCTTGAAACA
ATGATGTGAC
CAGGTGGTT3 CTCTGGCTAT GTCAATGATG CAGAAGTTAA AGCAGAGAAT GTAGCAGAAC TCTrGTCTGA TGGTCAACGT GGTACAGAAG GGAAAATCCA 6
AAAATGATGT TCCAATGTTG CTCGTCACGT TTTAGGTCTT ACCC'rATCAT TGATATCATG GTTTGGGACT TTATCCGTCT ATCAAGAAGT GGTGCAACGC AGCAGTTTGA GGCAGCAGGT AAATCGTGGA AATTCCTGAA CAAGCCGTCC AAACCGACCA GAAGGTGCCA A'rTCTGCAGA CGTGATCAGA TrGATATrGA AAGT'rGAAAC G'rGGCTCTAA CGTCACCGTC ACCGTrATGA TTTGTCTr'I- CAGGAGTTTC AATAAATTCT1 'rTGTAGCTrG GAAGAAC'rCT ACACTGCCTr GCTTCCACCA GAAACAAAAT GGATATGGGT GGAACCcT'rC GGCTGCTGCT GCT'rATCACA GTTTAATAAT GCCTTCCrTG GGAGTCTGCT TGGGAATGCA GTTGACATGT ATCGAGT=T 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280
TCCAGACAAT
TCAGTATCAC
TGTTACTGCA
GAGG7rTTT
CTGCTACCCT
CG'TG4GTAG
CCTGAACTGT
GCAGTTGAGA
GCATACGATG
TGAAATCAAT
ACAGCAATTA GCAAAATCAG AACCTr'rGAG AAAAATCTCA ATAT'rGCAG'r ATATCTGAGG TAGGGGTCC'r CTGTATGTAC AGCGACTCCC TCTGCCCTG TGCTAGTGAA TGGA'rTTATC AGTATATTGA AATGAAATA-A AATTTGAACA AATTAATTCG GAAAGCCAAA TCAATr'rCTA GCAAAGTTTT AGGAACTGGA TTGTATAGTG AATTGAAATA AGATGTGAAC A'rCTCTATCA GGAAAGTCAA A'rrAATTTAT AGAAATATTT TAGCAGTCAA GATGTACTGT TATAGATTCA ATACATTATA CTTT=?AAT TTAATCCACT ATAGTAAAAT ATPCTAACA ATG?rTAGA ATCATTCATA CCTCTCTCAA TTCTTCCTCC TCATGAGGTC AGATTTCCTC AAAAGGGCAG TGTTCTTTA.A TGCATCATTA ATTCAGGTrG ACT?1'TCTAA GAGACTAGAC AATT'rGAGGA
AAGA
1262 GA.AATAATAA CAGGACAAAT AATAGAGGTG TACTATTCTA CTAGATGTAA CTTACAAAAC AGTN'TACTr TCTGCTGTTC ACTCCTCCCT TGGrGCGTCA ACGACGCTTT TCTrCTAGGT CGATCAGGAC AGTCAAATCG GTTTCAATAT ACTATCCCAA CCCTGACCTC ATGAGCCACT CAGTATCGTT TTTCCTCGCT CACGA'rTTTr TCATCTCGAC GGTTCATAAG GAACAGGAAG 2340 2400 2460 2520 2580 2640 2700 2760 2764 TCCTAGAATA AAGTGCTGAA AACAATTCGG AATACGCATA GCTGCTTGCG TCC1'GTTCGA ACACATTTTC CCACCACGTG INFORMATION FOR SEQ ID NO: 241: SEQUENCE CHARACTERISTICS: LENGTH: 1682 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ, ID NO: 241: CCGTTT~TT CATTGTTCAG TACTACAACT TACGTTGTAG CGCCCTGCAC ATTGGTTCGT CTTGITrCAGT
TATCACAACT
TTTTTCATCA
TACCGCCGTG
AACTCTTACT
TTAATTTTAC
AATGAAGATC
AAGAAATATA
TTTCAAAGGT CTTTGTCACT GCTTTCGCITr GTCAACACTT AGTGGTCCTG ACGCAACATA AAAAGGCGGT GTCTTAACCC ATTATACAGT CTTTTCAAAC AACAGCTTCA GTTCGAGCTG ATAGACTTCT ACTAAGCGTA AACCATT'T'r TCTGGTACAT TGCTTCTCTC AAGCGACAAC TATATTAGTA TTTTGAAGAT TTTTAAGTTT TTTTAAACTT CCATAGTCCG TACGGGATTC GAACCCGTGT CTTGACCAAC GGACCTGAGT TGTTATTTTC TTTGTCAACT ACTTrTTTAA ACTTTTITTTA TATGTGGGAA CATATCGACC GACTGGATAT CCAAATCACG AGCCAAGGTC GAAACATTAC AAGTAAGAAT AGTATCTAAT AACTTATCAT CCAGACCTGT ACGTGGTGGG TCAACAATCA AAGCATCTGC TCGGTAGCCT AACGAGGAAT AATCTCTTCT GCCGTTCC-AG CTTCATAATG AGTAT'TGTCA TTTTAGCATT TCGCTTGGCA TCTTCAATAG CTTCTGGAAT AATATCCATA TTTTTACTTT CTTTGCA.AAG GCAAATCCAA TCGTTCCAAC TCCACA-ATAA
TCCTTGTACC
AATCCCATTC
CCTCTGAGTG
GCGTCAATCA
AATGGTCTrC TTTATCAACA TCCAGCGCTT 'FrACTGCTTC GCTATAGAGG ACTTCTGTTT GCTCAGGA'rT TAGTTGATAA AAAGCTCGAG GGGATAGTGA AAATTCATAA TTGAGTACAC 1263 CTTrCTTGAAT ACTCTCTTGC CCCCAGATAA TCTCTGTCTT TTCACCATAT ATCTCACTGG TTTTAGCTGT AT'ITGTATTA ACAGCTACTG TCACAACTTC TGGGAAATCr CTTTTACCA.A TTGAGTrAAA TTAAGCTGGC GGTrTGTAAC AATAATAATC CGGTCTTTC CGCGCGTCGG ACCATAATAG TACCOACACC TAGAACTTTr TGATTGGAAT CTGGTGATAA GTAAGTAATT CTGCTAAGCG ATT'AGCAATC CCTrATCT'IG TACCAGGCAG TCTTTCA.ACT CTACTAAATA GTGAGAGTTT AGCCCGCCTT GACCTGATT TrAAA'1--I-C GAGrCTGAAA TTGTAAC??A A=NTGGTTC CTGCATTCCA ATAGT'rGGAC GAATTTCATA AT!,rCATAT CAAATT'M'r'r CAGCGCTTGA TGAAGTAAGT CCGTCTTGAA CTCCAGCTGC
TTAACCAACT
TGAACCTGTC
CTCTCATCCG
ACI'TGGGT?1r TGTGCA'rATA
GCTCTGTAAT
CC'rGCAGGAG
TTATCATAAT
960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1682 a a a a. .a GCAGGTGCAT GATTTrGGCAG CCTCCGCATT CAN'ATAAAT AGTACAAGATI GGCACAATTC GAAATrTAGA CTTCTTGTTG ACCTTCAGTA ATTrTGCTrC AACAAAGTTG CGTCTAATAG AAGTAATCTG ACAATAGATA TCTTCGCCTr TGAGAGCTCC TGGTACAAAG ACTAATGTTT 'rTTGGTAAAA GCCGATTCCC TCACCGTTAA T'rCCCATGCG CTTGATTTTT AATGGTATT'r INFORMATION FOR SEQ ID NO: 242: SEQUENCE CHARACTERISTICS: LENGTH: 2524 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 242: TTAACT'TTGG TCAATTCTTT AA.AGTCATCC TCTGTAAGCA TGTCTAACCA TTGATGTTTC CCTTTATTGC TAAAATCACC AATTCCGACT ACAGCTATAT CTAAATCTTT CCAACTATTT TTCAAATrTT CAAAATATCT TGATTGCAAA ATACCATCTG CTAACAATTT A~rTTCTTGC ACA.ATCGTTG CATTCATAAA TGTACACTCT CCATGAAATT TTCTAGACAT TTCATAAATC AGTGTATTCA CATGGTATTT AGCGTGTATG TGACTAGGAC CACCTGCTAG AGGATAGAAG TGAACATTTC GGACACTT ACVGTGAATT AAATCTACTA AATTACTTAA AC1w1MCCCC CAAGAAAAGC CAATTTTCAT ATTATCATCA ATTAGATTCC TAAGGACGCC TGCTGCAACT TGAGAAATTC TTTCAGATAA AATTGTrGGA GTATCATCA6A ATTCATTTGG AATAATTTCT AAACTTTCCA AACTGTATTT ?1'CTTTTACA TAATN'TCCA AC'PTAAACAT A1-PGGTATCA AAATTCTCTA TTTCAATrr AACAATTCCT ATAGAGGI-rC TATAAATTCC TAA'r7TTGCT TAATACAGAT AAGCAAT'Nr AGAAACCAGT TCT'rACGAAA CTACCTTAAC CATI'ATCCCA CAAAGTTTIT CGTCTGTTAT TACTTCATAG CCTCTTTTAT CAAATN-rACT TGAGTCAGTT ATATATTGAA CTACCTCACT GCGCATTAAA TAACCATCTG TCCCAACAAA AGC'rTGACAC 1264 ACATTCCTTG CT'rCTCTTAA CATTCTACTA GCTATTTGTG ACTGATTTAA CTTTCAATA TTATTCCTAT CTTGATTCAT ACACTTAACC GCAT7rCTA ATGTAGCTAT ACwrGACTTA AAGCAAA'rCT AGGACAATGA CTTTATCCGA AlrrTTGTTTrA
TCTTACTGTA
CACTGCTGAA
AAAGGTCCTA
CATGAATCAT
T~TrTrTr AT'rATTGTTT
TCATAATTAA
CAGTCACCTG TGAATCTTTC AAGCTCTCAC AAAATT'rGCT GCTTGCAAAT TTCCTCAGCA CATTATCTGA CACCAAT'rr' TTrGACAAACG TACATTTAAG
TCTTTTCCGG
ACATGAAAAG
TGAAACTCAC
ATAAAAAACG
ACTAAAGCAC
ACTrGCTTCCT
TCATCTCCAC
TAAAGCCCAT CTCTTTATCG TCTGTATCAT TTCTTTTAAT
CACCAAGAAC
AATTTGTTAC
AGGTCGATCC
AATAACACCA
AATCGTAACA
AGATTCTATC
GAACAAT'rT TCTCTTAGTT TA'rTTAATAC AGCATATCCA *c 0 6S TGCTCTCTGT GTAATAAACC TTTTGACTCT kATTrATCTA AATCTTTTCT AATCGTTACT TTCGATACAT TThATTTTTC CGATAATGTA TTAACGTCGA TCTTTTCATA TTCTGATACT AATTTAATAA TTTGTTCCAA TCTTTTCATT TTACACCTCC GTTTTATTCT ACCAAAATAA AA.AGCAAAAA ACAACAAATT AACCTTTCGT TCGTAATTGT TTTTCTTTCG TTTTGTGAT 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 AGGATAGACT TATGAAGAGG ATTCAACACT TTTCAATTCA TGTCCTCTGC GCTGTCCATG ATGAAAGATG CTCAACGAGA ATTATTACAG AGGTATTAAA TTATCAGGAG GTGAAATATT AAAGAACATC ACATACACAC ATTrGATTTAA TTCAATATGT AAACATAAAA AAGTGACTGG TTTTCACAAA ATAAAACTAT TTAGAGGATG CAGAAAAATT CTACTCCCTT TTCATCAArT A'rGGATGGAA TCAACGCACT AGGAACTCTT ATGGAALATAT TGACGGTCCG GGTATTCGTA GTGTTCTAAT CCTGAATCTC GAAATTCACC TTAGTCGGTG AGACAAAGAA T=ACGAAG TGCTCAGTTT GAATTTGCTA CTAAAGGAAT TATTr"rAAT CAACTGTTT TTTAA.AAGGA AAAGAATGAA ACCTGAAAAA AAGAAAAGAC TGTAGAAGAA AATCCGGTGG AGGTTTAACT AAGCCATCTT AAAATCAGCT TGCCATTGAA ACTACTGCCT TTGTTGATCA TGAAAAATTT GGATTTTATC TACACAGACC TAAAACATTA TAATTCTATA GGTTTTTAAT CAAATGATTA TTAJAAAACAT TCATTATGCT CGTTTTAAGA ATCCCAGTTA TTCCTAATT'r TA.ACAATAGT CGCTACTCTA TTrAACTCAT TAAATATCGA CCAAGTTCAA TGGTGAAAAC AAATATCGTT TATTAAATCG GAAATATGAA TCATCCwGAA GATCTTATTG ATTATCAAAA GGTATTTCTG 1265 AACCACCATA TPTAA TGTTA TTTCTAGTTT ATTTCCTTGA AATGCTC'rAG CTATTTGCAG ATAACAAGCA TCTATAATAC ATACTrAACT TT-rCAAAAGG rAGCTAAA AAA~rrrAGC CAAACCTTr CTATTrTACC 'rrGCTCTAGA AT7TTAAAC TGCTATACTT ATCACAAAAA
AACG
INFORMATION FOR SEQ ID NO: 243: (i) SEQUENCE CHARACTERISTICS: LENGTH: 2359 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 2400 2460 2520 2524
CGTGC
CAG
GGATG
CCATC
GTCTG
CCTTG
CGCTC
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 243: TTGGG GGCTTGTGGT CAAAAGGAAA GTCAGACAGG AAAGGGGATG AAAATTGTGA T MTA TCCTATCTAC GCTATGGTTA AGGAAGTATC TGGTGACTTG AATGATGTTC ATTCA GTCAAGTAGT GGTATTCACT CCTTTGAACC TTCGGCAAAT GATATCGCAG TATGA TGCAGATGTC TTTGTTTACC ATTCTCATAC ACTCGAATCT TGGGCAGGAA GATCC AAATCTAAAA AAATCCAAAG TGAAGGTCTT AGAGGC'r-CT GAGGGAATCA GAACG TGTCCCTGGA C"TAGAGGATG TGGAAGCAGG GGATGGAGT GATGAAAAAA TATGA CCCTCACACA TGGCTAGATC CTGAAAAAGC TGGAGAAGAA GCCCAAATTA TCGCTGATAA ACTTTCAGAG GTGGATAGTG AGCATAAAGA GACTTATCAA AAGCCTTrAT CAAAAAAGCT CAGGAATTGA CTAAGAAATT CGACTCAGAA AACATTTGTA ACACAACATA CAGCCTTTTC
CCAACCAAAA
T ATCTAGCG
AGAACCAAGT
AAAAATGCGC
TTTGAAAAAG
AAGAGATTTG
CCACGACAAC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 GGCTTAATCA ACTTGGTATT TAACAGAAAT TCAGGAATT ACGCTTCTTC AAAAGTAGCT TGAATCCTTT AGAGTCAGAC AT.ATGAGTAT TCTAGCAGAA CTAG3CAGGTT CAGTGGCAGT GCAGGTATCT CTCCTGAACA GTTAAGACCT ATAAGGTTAA GAAACTCTTG TCAAATCAAC CCACAAAATG ACAAGACCTA GAATTAAAGT GAGGAAAGAA CCTTGCCCTA AGTGTTTGTT1 AACGATTTTT ACAGAAAGTA AGGTGTGGGT CTTAAAACTC TTTAGAAAAT CTTGAAGAAA TGAAAATITAA TAAAAAATAT CCTATGAGCT TGGACGTTAC CA.AGCTGGTC AGGATAAGAA AGAGTCTAAT CGAGTTGCTT ATATAGATGG TGATCAGGCT GGTCAAAAGG CAGAAAACTT GACACCAGAT GAAGTCAGTA AGAGGGAGGG GATCAACGCC GAACAAATTG TTATCAAGAT TACGGATCAA GGTTATGTGA CCTCTCATGG AGACCATTAT 1266 TCCI'ATGAT GCC.ATCA'rCA G'rGAAGAGCT CCTCATGAAA CATTACrATA ATGGCAAGGT
GATCCGAATT
ATTAAGGTAA
CGGACAAAAG
GCAGATAATG
ATCTTCAATG
GACCATTACC
ATCACTrGAA GGA'rPCAAC ATTGTCAATG
ACGGTAAATA
AAGAGATTAA
CTG'rTGCTGC CAkTCTGATAT
ATTACATTCC
CTATGTTTAC
ACGTCAGAAG
AGCCAGAGCC
CATTGAGGAC
TAAGAATGAG
CTTrAAGGATG
CAGGAACGCA
CAAGGACGTT
ACGGGTGATG
TTATCAGCTA
CCCTTCA.A
AATCTGACTG
CGTGAATTGT
GACCCAGCGC
TACCACTTTA
AAATCAAGGG TGGTrATGTC CAGCTCATGC GGATAATATT GTCATAATCA TAACTCAAGA ATACAACGGA TGATGGGTAT CTTATATCGT TCCTCACGGC GCGAGTTAGC TGCTGCAGAA GTTCTAGTTA TA.ATGCAAAT TCACTCCAAC TTATCATCAA ATGCTAAACC CTrATCAGAA A.AATCACAAG TCGAACCGCC TCCCTTATGA ACAAATGTCT GCCTATTGGA ATGGGAAGCA GGGATCTCGT CCAGCTCAAC CAAGATrGTC AGAGAACCAC AATCAAGGGG AAAACATTTC AAGCCTTTA CGCCATGTGG AATCTGATGG CCTTATTTTC AGAGGTGTAG CTGTCCCTCA TGGTAACCAT 1260 1320 1.380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 GAATTGGAAA AACGAATTGC TCGTATTATT CCCCTTCGTT1 ATCGTTCAAA CCATTGGGTA CCAGATTCAA GACCAGAAGA ACCAAGTCCA CAACCGACTC CAGAACCTAG TCCAAGTCCG CAACCAGCTC CAAGCAATCC AATTGATGAG AAATTGGTCA AAGAAGCTGT GGCGATGGTT ATGTCTrTGA GGAGAATGGA GTTCTCGTT ATATCCCAGC TCAGCAGAAA CAGCAGCAGG CATTGATAGC AAACTGGCCA AGCAGGAAAG AAGCTAGGAA CTAAGAAAAC TGACCTCCCA TCTAGTGATC GAGAATTTTA TCGAAAAGTA 2040 CAAGGATCTT 2100 TTTATCTCAT 2160 CAATAAGGCT 2220 TATGACTI'AC TAGCAAGAAT 'rCACCAAGAT TTACTwrGATA ATAAAGGTCG ACAAGTTGAT TTTGAGGCTT TGGATAACCT GTTGGAACGA CTCAAGGATG TCTCAAGTGA TAAAGTCAAG T"TAGTGGAAG ATATTCTTG INFORMATION FOR SEQ ID NO: 244: SEQUENCE CHARACTERISTICS: LENGTH: 1052 base pairs TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 244: TTCTTTCTGC TATAATCGTA TAAAATACTT ACTTTAGGAG 'N'CTTATGAA AGTTGTTAAA TTT GGAGGTA GTTCTCTTGC CTCTGCTAGT CAATTAGAAA AAGTTTTAAA, CATCGTCAAA AGCGA'PTCAG AGCGTCGTTT TGTAGTCGTT TCTGCGCCTG GTAAACGCAA TGCTGAAGAT ACTAAGGTrA CGGATGCCCT GATTAAATAC TACCGCGACT ATGTTGcGGG TAACGATA.Ir AGCAAGAACC AAAGC'rGGAT TATCGACCGC TATGCTGCTA TGGTTAGTGA ATTGGGACTA AAACCAGCTG TGCTAGAAAA AATTrCTAAA AGCATTCACG CCTTGGCCAC TC?1'CCTATT GAAGAAAATG AATTTCTCTA CGATACTTTC CTAGCAGCCG GTGAAAATAA CAATGCCAAA TTGATrGCTG CCTACTTTAA CCAAAATGGT ATCGATGCAC GCTATATGCA CCCTAGAGAA
GCTGGGATTG
AAGATTGAAG
ACTAAGGAAA
ATTGCTGCTG
GCAGCCCACC
ATGCGCGAGT
TACCGTGGAA
CGTATCGTTC
GGCTTTGTCA
AAGGTTCTGC
TGGTCACAAG
AATI'GACAAA
ATCAAATCTG
GTGTCAA.AGC
TGAACCTGGT CACGCTCGCA CACCAATGAA GTC CTT GTCA TACTCTCA CGTGGAGGTT '1GACCTCTAT GAAAACTTTA CTGGTATTAT CCACCAACCA TGGCCTATGC AGGCTTCTCA
CAC
GT(
AAATTCCTCr
TAAAACACAG
GCATTAACAT
AAATCCTGGA
GGTTATCAAG AA TAATGATGAA =r GTCGAAATAC CT( AGAACTTAAC AT CTCGA'rrC
:CTCATG
TACCAACA
[CCAGTTG
CATGAACC
TCATTCCATC AAGTrATGAC TCCTGGrTT CTrrGGTGTC CTGATATTAC AGGTTCTATC CGGACGTTGA TGGTATCTTT CTGAGTTGAC CTACCGTGAA ACGAGGCTCT TCTTCCTGCC ACCCTGACCA TCCAGGTACT TGGGAATGC TGGTGACTCA GTGAGG'rTGG ATITTGGCCGC 540 600 660 720 780 840 900 960 1020 1052 INFORMATION FOR SEQ ID NO: 245: Ci) SEQUENCE CHARACTERISTICS: CA) LENGTH: 855 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 245: CCCTCGAAAA CTAAGCCGAT GAAGTCAGAA CACTTCAATC CTGTTCGTGA CTGCTGGGAA AATCGTGAAG AGATTCTGGA AGGTAAGTTC TACAAATCTA AATCATTTAC ACCTAGTGAA TTGGCTGAGT TGAATTATAA 'PTTAGACCAG TGTGACTTTC CAAAAGAGGA AGAGGAAATC TTAAATCCCT TGAGTTGAT TCAGAATTAT CAAGCGGAAA GAGCAAC'rrT AAATCATAAG ATTGATAATG TATTAGCTGA TATTTTGCAG TTGTTGGAGG ACAAATAATG ACACCAGAAC AACTTAAAGC AAGTATTCTC CAAAGAGCGA TGGAAGGGAA ATTAGTGCCG CAAAATCCCA ATGACGAACC TGCAAGTGAA TTATTAAAGA GAATTAAAGC TGAAAAAGAA AAACTTATCA GTGAAGGAAA AATCAAACGA GATAAAAAGG AAACTGAGAT ATT'rCGTGGT GATGATGGGA 1268 AACAI'ATGG GAAGI'TGCT GATGGAAGCA CTCAAGAAAT CTGATACTTG GGAGTGGGTG AGGATAAAAT CAATTATTG CAGAGAAATc CTTTAGGTAT ATAGATACGT CTAGTATTGA ACTACAAAAA TCTACAATAT CNTTCACCTG AACAAGCGCC ?r'rCGCAGAA TAGTGTCTTA TTTr-CAACAG TTAGACCATA TTAGAGAACT TAAAGAGTAT TTGATAGCTA GTACAGCATT TGATGTTCCT TATGATATTC GAATTT'rGGG CAAAATAAGC TAGAAAAAAG AACATAATCA TTCCCGTGCT AGAAAATTAG TCTAAAAAAT ATTGCTGTAG TAATGTTTrG GGATACTTTA CTTAACGA.AA CATAT INFORMATION FOR SEQ I0 NO: 246: SEQUENCE CHARACTERISTICS: LENGTH: 660 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQtJENCE.DESCRIPTION: SEQ ID NO: 246: TTTAGGAAGG CTATCCGTAA TTr'rACAAAG GATTTAGATA ATTATCAAAA GAGAGATGTT TGGCGAATTT TTCAGTAGCA GCAACGCAAT ATGATGCTrT TGAAAATGGT GAGATAATTT CAGGAAATTA CTTTAGAGGA TGTCCTTGAT GCTGGACATC ATAGTTGA'rr TTACAATATT CCCATCGTAG TAACCTATTA GGATGACAAG TATGAGAAAA AAAACAATTG GAGAGGTTT AGGGA'rTGAG TTTAGATGAA 'rTGCAGAAAA AGACAGAAAT CAATGGAAGC AGACGATTTC GATCAACTTC CAAGTCCTTT AAAAATATGC ATGGGCTGTT GAGTTAGATG ACCAAATTGT GGAGTATGAT TACTTATGAG GAAGTAGATG 
TTGATGAAGA GTTCAAGTAA GAAAAAGAAG AAAAAAACAT CATTTTrACC INFORMATION FOR SEQ ID NO: 247: SEQUENCE CHARACTERISTICS: LENGTH: 1805 base pairs (B3) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear TTACAGAGGA ACATTTAGAT TGAACTCTCT TGAATTTATT TTGAT'rrGCC GAAAATTTTA ATTTAATAGA TGATGGTGAC TAATAGACAC TAGAAAGAAG ACGATTAGCT AGAATCAATC CCAGTTAGAT ATGTTGGAAG TTACACGCGT TCT-rTCTTGA TTTGGATGCT TATGATTCTG TGAGTTGACA GGTCGTAGA2 TTTATTTTAT TTTATCCTGG (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 247: 1269 CCGGTTGCAC ACATCGTCC ATAGTCAACT CTTCAAGTAT ACAAGTAATA ACACCITAAAA TGAAGCTTTr TCTTTAC~r AGCATGCTGA GGrAAAAAAC GCTCA'rCATA ATAGGAACAC TAGAAAA'rCG TCAAATACGC T1GAAAAGACA ACGCCAAGGA ACAAATATGA ATCCTTCACG CAAAAAAGGA GTGTGC?1'GG GCCAGCATGG TCCGTrTTGAT ATTCCCTGTC ATAAAAGCGT ACTTCTCCAA AAGCAGTTGT CACCACI'CCC ATACAGAAGG ATATTATCCA CAGT=GCCG CACAAAAGCA ATAATCATTG AACGACAGAA TAGGTTT CACAATTCTC A.ATTTCCT AC'rCCCATCA TAAACGCTAG CAAGGTGAGA ACCTTGTCCC TT'AATTAATr CTACTGAAAG AAAGACAACA TTTCCAGI-? CCGCGAACAA TAAAAGTGTA AGCATCCACA TATCCAGCAC AACCTTTTAG ACTGACGTGA TATTT'IrCT'r ATAGGTAATA GTAT'M~CTC TTAGAAATAT TGTrACCATTT TCrCTAAA GATTTACTA TTAGC-ATAAA AATAATAATA GACAACTATT TAACATGTTT GCAAACAAAG CATACGAACC TTAGTAAAA TAGAGCCCTC TrTAGCAAAAA TCATTATTTT AATTTATTTC TAACI'CTCAC CAATAAAAGA CTATGTCTTA AAAAAATGCT AGCATATCTC CTATTTCTT T CTGCCA AGAGGCAAAA CAAGAATGGT CTTTTCATCA CAAAACTACT AAGCAGGCTA TTCCZAAATA ATCTCCAAAA TATTATAGGC AATACCCGAC CCAAGGGCGG CACTAGATAG ATAAGATTGC CAAGGGAATC TATAAATCGT TAATAAAAAG TAACA'rCCGA AACA'rTATTT CTCCAGCTAC AAGGG'rATTC AAAACGTCAA AAAAAGTGCT ACCTCA'N'TT ACCTCCCATT AAATCGTAGG CTACCATTTA TATCCAAAAA TAGATAGATG TCATTTCCAT GAAACTAGAA 'rAATCACTCC TNGACATAA.A ATAATAAAAT CAATACTTGG TCAGGCACTT GT'rAACAACA ATGATGAACA TGCAAAACAT AGCCAAGCTG AACTTGCTC GCGACCTTAA CTGGCGATAA CCAGAGCACC TTGAGACTCT CAAATCGATG AAACTACCAA AAACAAGGAG CTAGAACAAT TCTATAATAT TTGTAGTGGG AAAAAAGTCC CATATGACCT ATGGAACAAT TACA'TrTAT TTAGACATCG TCAATAAG4GA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 
1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 GCTTGATG4GC TATGCTACTA AGGA'rTATCC CCT'rGAGATG GATGCGTCAA GCACAAAAAC TATGCAATrr GTTGGCAAAT GAAACTTGTC AGCATTGATT T'rCTGATATG ACCGTTCAAG GAAAAAACTG GGTGCTTTCG GCTTGTCGAT AACAAAGGCT TAAATCCCCT ATGGATATTA ATAATGAAAA GCGACAAAAC CACAAAATTA CTAGACATTA ATAACAATTA GGAGACAAAA AAAGGAACTT TAGAAATCTT TTCAAAAACA AATCGAACAA CTGCTCAAGA TCTTGTCCAA TCAATCCAGC 'rGTCGTTGAC CCATCAACTC TGCTCTTGA.A CTGGGAAATT ACCTTTCTAA AAGAAAGGTG CAAAAATGAC TGGAGCCTAT TrTTGTGTAG AACTCATTAG AAAGAATCAT AAGACCCTAA TATCCAGAT1' 1270 TACACACAAG GwAATCATCG CCAAACTGGr CTATGAAGCT CCATCTTGTC CTGAGTGCGG 1800 1805
AAGTC
INFORMATION FOR SEQ ID NO: 248: SEQUENCE CHARACTERISTICS: LENGTH: 2516 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 248: CTGCATCTAG TTTGTTTCTC
CGTCGCCGCG
GAATTAATAA
CCTATAGTGG
TGTACTATAA
CACAATTTGG
AACTTACTTG
ATGCTACACC
TGTACGAGTC
GAGTTTAGGA
TTTTGCCGCT
TACACTGCCC
ATTGTGTCGA
TTCTTTTCTA
TCCTrATCAG
CAATCAGT
CTCTATTTTr
CCCAGTTGCC
AGAACCTGTG
TGCTCATCTT
ACATTCAGAG
GCAGTATTGG
TTGGGGTTCG
TAATGACAGC
GTCAGATATT
TGGAGGTGGT
AAATCATGTA
TCAGAGTAAG
CTATTTTTAT
GTTCAATCAC
AATTTCCTTG
TCTTCTATTC
TATTCTAGTC
ATTTCTAAAA
GTCTTTTTTC
GAAATTTATA
CTCTATAGTC
GGACTAGCTA
TATATGATGG
ACAACGCTAG
GCTTATTT
CCTACAGT'rT TAGCTAGACA GATACAACTA GTGAGCTTGA ACTGGTGGCG GTAAAAGGTT ACCCCGGTAT ATTCATTGGG GACAATTATA TTGGT'rCTGG AGAATTCATA CTTCAGGTTC CCTTAAAGAT GGTTGATTGT TATAAGGAGG AGTAAGGATG CTCGATTTTA TATGTCTTTA CTrTCAATGG TATTTATAAA GAGGATTTTC ACCGATTAGC AGATAATAGA GGATGCTGAG AGAAGTATCT GGCTATTGTG TCCCCTATTT ATTATTATTA TTGGAGATTA TAGTGGTGCC TTGTAACGAC TCTCTGGTAT GTGCTTTGCT AGTGAAGAAA TTGGTGGTAT T'IrrGGGCT CTTTGGGATA TCAGAAAGAT TATTTGTTAG TTGTTTGGTT GATTGGAGAT TATGATTTAA GAAAGAAAAC TCCTCTGCTG AAATACCTCT AI-rCGTAGCG GTCTGGCTCT AGG.\TrGTCA TACTAGATTA GCTATGGCGC TTGGAATCCA GATTCTTATT GGGTGTAGCA TGAAAAAAGA GAATTTTTCA TT'rGTAATCT TTT'rTGACCC TTCTTTGCAT ATTGAAGGTT TATCGATTTT CTAGTAGCTG CACTTATCTG AGTCATTTTC TAACAGCACA GGTAGTACTG TAATTATrTC GGAATTAATC TTTTAGTGAC TTAAAAGAAT TATTTGATTC GGAGTTTGGG GCGCTGTGTT AAAATAGGAG CTATTTTCAT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320
ATTTTAGGGC
ATCAGTCTTT
GTTTATGGTA
TTTATAAAAA
TATCTAGTAC
TATCTTACTT
CCTTAGTTAG
CATTTTTTCT
AGATTTCTTA
AGTATATTTA
GACTATGTAT AATGAAACAA TTGTTCAAT TTTATTTTAT ATTACTGCTA TCCTGTGTTT 1271 TTGCGCtGTC GCCAATATTC AATCCATCCA AATGTATTAG CAAGATATGA CGACTGGAGT ATATTGCTTT CCGTTCACAT T'PGA'rGAATA ACTATTTTAA 'rAGGTTGGAG TGTCGCATTC TT-rACCAGTT T'rAG?1'TCAA ATTAGcAGCT CTTAGTACGG TTrTTTATrGA TTTTCTAAT TGCA"rAGT AATGGTTTTA GAGGTTGATT ?'N'TAAGAGA ATrTTATGGT ATAAGTATTG ATAGGATTTT Tr"CTCTTA TATACCATAC TA?'rTCTTTT AGT'NTTCTT GGTTTAAAAA A'rCAAACATG AGCTrAGTAT p.
p p 0P *9 p. *p S P P P G'rAGAATCCT1 TATTCTGGAT TTTCAGTATA TGGTAAATTC ATCATAATTC CATTGACTGT GAAATGGGAA AGrTAAGTAG TTTGGGGTT ATGTTCTTTT GGAAATGATA GACTTGTTAT TrAGTCTTAT CTATATGGTT GTATATCGTA GAAGAGCATC C'TCTGGATTA GGTATCAGAT AAAGTTCTAG TGATTCAATT ATAGTTGATT CTTTGAATC GGAGCGAATG TACAAATGAA TTATCAGTTG GACAATGGGA CAATCCGTAT GCATTGATTT ATTTTCTGTT CATAGAAACT TCACATGTGG AGG1'rGAATC TTGGATA'rTG ATTTGTTITAG AGAAATTTTA AAAGGGCCTA GCTrCATTGG TTTATTATTC CGATTTCTTT ATGGAAGTGA TTGGACCTGT TTTCTTTATG TATGTTACAG TTACCAAACT TrGTGTGTTA GTCTTGTTTr CTTTGCTTGC GTTAGTTTCT AA'rGGATCTr AGTrACTT ATATATTGTT CTTTTTTTAT GTCTGAAATC AATTAAGCAC GGATTrGGAC GGCGACTTTA GCTTCTCTTT GGAGATAAAG CAAACA.ATGC TAGTTTCTTT TATC CTTACT TACTATTAGC TTCTGTTTAC ?ITrTTTATTT~ TAATTGGATT ATTGCCAATT ATTGGCTTAC ATTACTATCT GGAGGAGAGT GTAAAAGTTG AGA'rAATCTA TACCAAGTAC GATTATGGTA TTGGTTAGAA ATCTGAGTCA AAACTCTTT ATACATTN'T TC'rAGCAGTT TTCGATTTTC TTCTATTAAG GACTCATITT AATCATGGTA GGGATATAGG AGTTTTGTTT GCTTTATGTT ATACGCACTA TTTTACTCAT GATTGG INFORMATION FOR SEQ ID NO: 249; SEQUENCE CHARACTERISTICS: LENGTH: 1364 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 249: CGGTGTTTT TTGTAAATTT TCTAGCACTT GTATGGTAAA ATAGATACAG GTGTTCATTA AACTAGACTA AAAACCTATT TAAGCAGGCA AAATGAAGAA ATACCAACAA TTATTTAAGC AAATCCAAGA AACCATTCAA AACGAGACTT ACGCTGTCGG AGATTTCCTT CCTAGCGAGC
ACGACCTTAT
CCAAGAGGAA
AACCGTCAAT
AC'TGCGCTCT
ACTGATAACC
GGATGATCTG
CACTCGCCAA
TATTGATTAT
CATGGACATT
CGGACGCCAA
?TTTTGCAAAA
TAATATTTGT
AAGTCCCATA
AACAATTACA
ACATCATCAA
CTTGCCCTGA
CTTACCTCGA
GCTATCATTG
TTCCTCGTAT
ATATTGCTCA
GGAGCAATAT
GGATTGATCA.
TTCCCTGTAT
A.AAACCAACG
GGTTTCCCAG
GTATCCGTTC
ATrTGCTGAGC 1272 CAAGTGAGTC GTGATACCGT AAAAGATAAG AGGGCAAGGT CCAACCTAAC CAGCTACCAA TGGTCAGTCT GGACAAGATT AGTTrCGGAT GGTTTGGAAG TGGATACGGA CTATCTGGAT AGTCTATCTA T-rCTTATATA CCGAAAGcCC TGTCTCTCCT TCTCAAGTCG TCAAAGAAGA GAACrAGTTA AAGAACTTGG ATTATTGATA AAAAATCCTC GTGGTCCGCC AGCGTGTGGT ATGGAACTCA TCCCAAATCT GAAAATGGCC TCAAACTCCT C C 4 mm 0 mm. e em C.
me ce S GCTCAGAAGG AAATCACCAT TGACCACTCA AGCGACCGAG ACAAGATTCT GGCAAAGACC CTTATGTCGT TTCGATTAAA TCAAAAGTCT ATCTCCAAGA T'rCAGTTTA CCGAAAGTCG CCATAAGTTA GAGAAATTTA GATTTGTAGA CGCAAGAAAT AAAAGACTGA GACACCAGAT CTCAGCCTTT TTCGGCTCTA AGTGGGTAAC CCCCCTATGG ATAT-rATGGA GCCTATTITrG TGTAGAAAAA TGACCTATAA TGAAAAGCGA CAAAACAACr CATTAGAAAG ATTCATATGG TT'rTATCACA AAACTGCTCG ATATTAAAGA CCCAAACATC AACATTCTAG TATGGATACC CACAAAGAAA TTATCGCTAA GCTGGATTAT GAGGCTCCAT TTGTGGAAGT CTAATGAAGA AATATGACTT TCAAAAACCG TCTAAGATCC AACAACTGGT ATGCCTACTA GAATTCTCCT 'rAGAAAGCGT CGTTTCAAGT TTCrAAAATG ATGGTCGCTG AAACTTCTAT CGTCAAGAAG AATCATCAAA TATCAACCAA AAAATTGCGC AAAAGTTGAT TGAGAAGATT TCTATGACCG TCAGCTGGCC ATTTCAACTT CAACTGTCAT TCGG 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1364 *m U- S C .me.
INFORMATION FOR SEQ ID NO: 250: SEQUENCE CHARACTERISTICS: LENGTH: 1227 base pairs (B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 250: CCATGAAGAC CGCTTGGAAT TGGAATGGCA CAAGTCTTTG TTGAATGGTC TATTCCCATT GACAATCGGT GGAGGAATTG GACAATCTCG TATGGCCATG TTCCTACTTC GCAAGAGACA CATCGGAGAA GTGCAAACAA GTGTTTGGCC TCAAGAAGTC CGCGATACTT ACGAAAATAT 'TTGTAGAGA ATCGAACCGC AAGGTTCGGT TTrCTPTCrC ?1-rTrGTCTA TAATTTGGTA TAATAAACAG TATGAAAATC
AAGGCAAGAC
GTCCCTACTT
TCGAAGCAGT
GACCATCGTG
GATGGATGCA
GACAAGACCT
TGAAGTC*GGA
ATC!GCGTGGC
CTG4AAAATA
GAAAGGGCAT
GTATCAGGAA TCrATGGGGG ACGTCCCCTC ACTTCGGATA AGGI'TAGGGG AGCCATTTTT CGAGTC7TTGG ACCTTTATGC AGGTAGTGGT
AAGACACTAG
AACATGATTG
GGTTTATCTA
CCCTCCCTAT GCCAAGGAAC T'=TCTGAA GATGTTATGG AATTGCCTGT CTGGGTATCT TGTCAGATAA GAT'rGGCT'rA ATATCATTGA ACGGGCGAGC CCCACAAACA AGGATrTCTC AACATTTGGG AAATGTTAAA AAAGACTGGG GGCTACTTGC AAGCCAGTTT TGATTACTAC ATAGTCGACC TGAACATCTC GTCAGGATAT TGCCTGCTAT ATGTCCAGTG CTGTTTTGGT TCCAGATGAC CAAGGAAGTT TGGAACAGGT ATCTGGGGAA AAATCGTAGC AGATATTGAA TTGTGTGCCA CACGCATAAA GGAAGGAAAA GATNTATGGA TTCACAGGCT CA'rTTGATCC AGACTrTTTG ATAAGCTrTA CCTCTTGAAA ATCGTAAACG GTCGTGTCTT CTCATGATAA CTAGTGCGAG GT'rTGAGAAA AATCATCAGC TGTC'N'CTGA TATATCAGTT CATCAGCCGT GT'rCCCG TT TGACCTCG
A.AA.ATGGCTG
GCCGTTGAAC
GGAGCGAGAC CGTAAGc'rCA GGAA.AArTTC AACTCCTCAA ArTAGTAAGG TGACAGTCTA GATGACAAAT GGGCATCTGG TGTGGGTATI' TN'TTTAATC GGGGTTAGAA AAGGCTGTGA ATTGGTGGTC GATGTCGCAA TGCGTCGGAT TTGCAATATG TATAGAGACT ATTTATTrAC TAGAGAGCTT TTGAAGTTTG
TTTTCTTAGA
AGAGAGAGCT
?TCCAGAAGA
300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1227 INFORMATION FOR SEQ ID NO: 251: SEQUENCE CHARACTERISTICS: LENGTH: 3652 base pairs TYPE: nucleic acid STRANDEDNESS: double (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 251:
CCGGTCA.AGT
ATCAAAATAG
ACAGAAATTG
TCACTACCAG
AACTCATCTA
TAAAAACGCT
CCCA.AGCGTT
GCTTTTCAAC
AATATGAAAT
ACCATTTT'
ATTTCTTCCC ATTTTATTTA CTCACCCGTG TGAGTTTGAA
ATAAGCTTGT
TCCAATCTCT
AAATGTTTTA
AAGCTTTCTT
GCTACAGCAC
GTTCCACGAC
ATAT'rGACAA TTTTTTAGGA GTGGTAACGT TAATGGAACC CGTPTTCCAA.A TCATCTCTTT TGCCCAATCA CT-rCAGAAAG CGATGTTATC CTTTAACCAT TGGCTGCAAT GAGAGCTCAA CAAGCCAGTT TCATGGTCTT TCATT TGCAT GACAGCACGG ACACGGCTAA TTCGTCCACC TCTGTATGGT TTTTAACCTC TGAGCTAACT TCGCAGCCTC TCAACAACAG TCACATCTGC CCCGAAAGAG CATGGGACAT TCAAAAATCT CAGCAAAGAC TCTTGCATCA ACTGAAGAAA TTATCAATCA TTACAGATAA TCAATAGTAA CAGATGAATC TTTTAGGA'rT CAGATTGGTT TAATGGAGTC AATCAAATT GTCCTATGAT TAAAA.AAATT CT'r'r'GGAC'r GAC= ATT TTACCCTCAT CACCTTTTAT ATATTCCCCT TTCATCCTA TACTAGGAAC CT=AGCTTTG 7TrGATCTCA AGGTGAT'rTA GCCTTGGAAT TATTTTTAAT ?1'TGACAAA
TTCTACATGA
AACAACTTGG
AGTAGATAGG
ATGAATAGCA
ACCTACAGGI'
TrTACCTTCT
TGGAACAATT
TTMCCAAAG
GATAAAATTG
AA~wrTCACGG CrAGCACCr
AGAATC'TGGC
GGCTGACTTG
TCTTTCAAAT
GTAATATCTA
GGTTACAATC TTAA'NTTG TCCTTACTTC TAATTATATC TCCGTAAAAT TTGATATAAT TACCCCATTT 7TACCATTTT1 GTAGT'rCCCC ATCATCTCTT CTTTTTAAAA TCCCTGTTTC GCTT'GGAAGA 'rTTTTGGAGC TCCGGCTGGT 'rAGCTTTTTT CTAATCACAG CCCTTATAGC GCTGGAGGTA CAACTGGCGG ATATCCATAG GAAAACTGCT ATCTTCAAGG ATrTGAGATT GI'TATTGATT TGATTGGTGA CGTCCTGACC AACTTGCTAA TCTGGTCAAG GCTACTATAG TTGAAACACC AATATAAAGC CCTCCATATC TTTACCTTCT CTTGCGTCAGT GAAGGAACTA GGCGTGCTGC TTCTACCGTA CACCATCTTT GCATAGGTCT TTTCGGAAG ATTCTTACTT CCGCATCAGA ATAAACAACA ATTGCTTTAC TAGTTCAGGT TCATAGTATC AATCTTTICTA AAAAAAAAGA PTAAAAATCC CAACTTATAA GAAAAGAGGT ACTAGGTGCT GCTATTTATG TGAACGAGGG GCGACAGGCA CCTCATGAAC CTGCTGATTA CAAATCCCTC TATTCTAGTT TGAGCATATT CCCCTTCATA GGGAATCCTA TTGGGAA'TGC AACTGATAr'r CTAGCTCGTA CTTTATCTTA GATTTTTGTA GGTTTCCTAC ACGCTTTTr'GT AGGAGGATAT GCCGGCAAAG GGCGATTAAT GATGACCTCG TAAAGAAAAT TTGAAAATCA GGAAATGATT CATCGAATCG CCTAGGAGAA GGCTTCACCT GTTAGACTAA ATCATCTATC AACC'rTTAA GACCAATTCC GTCAACTGAG TCAAATTTGG TT-CCCCATAA ATAT GAT ATCAAAGGTC ACATT'TTTCC 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160
TTCTCAACAA
TTCTCATGTT
TTGATTTAT
GCT'rTATGAT
GAAGAGGTGT
TCTACTGTAT
ATCCTCAAGC
fTGAAAAAGA 'TTTGGCTrA GAGT'rACTTC
ATACACTCAT
GATTCTCCTA
TGTTTCTCGT
TATCACAAAA
TGTCGGAAGA AATCAAATTG TGAAAACGAA CTTTATAACT ATTACAGAAG CCCATGAAAT ATAAAAAGAG GTAATGTCGT GACCTCAAAA CAGACAACCT MTTT1ATT TTATTTACTC TTCATCAGCC 'NTrAACTGAT CCACTAATTG TCATATCTCG AATGCGATCA AGCCAATAAA CCA'rGACGGT TAAAATCAAA AATATTGACT TCAAAACGTG CTTCTTCTCC CGACACTAGC CATAGCACGA CATCTGCTGG CATATAAGTA TACGACCACG AGCATTACCA GN'TTCCTGC TrCTTTCACA TCTrTCCTxTr CTCATCTTCT TTAAATCTTC TGCTGTTTT TTTTCGCATT CATAGCCTI'G ATTGACTACT AAAATCAAGG CACGTTCAGC AGGGTTCAAA TACTTCTGTC '1'V1GAATCTC AACATCAACA ACATAAACGC CGGTCTAAAA GCACTAAATT CGCTGTCGGA TAACCAATTG TGAACCACCA TACCTCTTGA TGGAAGCGGT GCCCCCAAAA TTTCCATCTA AA.ATAGCTTG ACAGGTIGGAA CAATGATAAC TTGTCAGAAC CAAATGTATA A'rATAAGTTG CAAAGAATTC AGATATAATT CTTCTACACC ATATGCAAAA ACAAATCTGG
TCTTTGGAGA
TGTTGGCAAC
'rTCATrAAAG GTCATAACGA CGATAGGCAA ACGAAATAAT rCTTGATGCC CCTTATGTAT CAACGACTGA ATCAGATGGr G'rGCCAATAT CT'rTTTGGTT TCATAAAATA ATTATATCAT AGCGATAGCT APICTGGAA TTTTTTCACA TGAAGTGTAC CTGTTTTCAA AAAGCACTTT TGAACTTTGC AATATTCTrC TCAAAAACTT GTAGGACATC GATTAGACTT GTTCGGTAAC CATAAAGI'GT CATACTATGC CAACTAACTC CTGAGAACN' TAAATTACTA ATTGGTGCCG GAGG'rACACC TATGGCTGTA AAATTTACAA AATGAGACAA AATTTCCTAA ACTCCCTGAT TTGAAGCAAG TCACTTTCCC AAAACAGCAA AGAAAAACTA GATGACTGCT TTCCAACA.AC ACTGGGCTAT TTTTCTCTCC ATCTGTTAGC TrGGATTCTC TATCTTTTCC CTTATCAGAA GGAACGGCTC AAGCCATTCA GCCAATATTT GAAACCAGAT AGCAGTTCTT ATAGTCAATT CGAGTAGGAA ACTCATATCA ATGTTTAACA GTGTTCTATT AATTAAAGTG CAA~CTAGGA AGTTAGCCGC AGGTCATACT INFORMATION FOR SEQ ID NO: 252: SEQUENCE CHARAC!TERISTICS: LENGTH: 743 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear ACGGATACGA GI'GAACTAA TTCTCCATCA AAGTAATTCT ATCAAAACCT GCAACA.ATAA TTGTGCAGTG AGACTAGCGA TTCGCGCN'T AATTTCTTT ATGATAAGGC TCTAAAGCGA CAAATCCTTT CTCGCAGCCT GCCATCAAAA TAGCCGAGAA TTTTATAGGA ATAGTAATAA CAGAAAATCT GAAATGTTGT ATTCTATCGT TGCTTAACTA TTCAAAATTT TGCAAGGAGT TTATGTATGA AAAAGCA.ATG AAAAGG'TAGA A NTTAGAATC CTTGGGCAAG ATGTTTGAAG TAATGACAAA GAAAAAAGCC TCCCATCTAG TGTGCTTCAG AGACCGTTTC AGCTAAAGAA GACAAGTGAA GGGACGACAA GAAATAAAAT CTGAAGAAAT CCAGATTCAT ACTCAATGAw TTGGGTACGG CA 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3652 1276 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 252: GTACCGTGGT GCCAAAGTAC TTACATCAAA GAA.AATGGAA TTATTATCTA AAATCCGGTG ?TGGTTTTAT CTCAAATTTG TAGTCAAGCT TGGTACTACT GGATAAGGAA TCTTGGTTTI' CTACGA'N'CT CATAGTCAAG TGAATGGATr TGGGATAAGG AAAAGAATGG GTCTACGATT CATGGCGA.AA AATGAGACAG AGGAAAAACT ACAAATGAAA TTATGATTCA GATGGTGAAA TAAGGATAGA AAAAGTGATG AGCAAGGTTG GCTTTTTGAC ACTATGCTGA TAAAGAATGG GCTACATGCC AGCCAATGAA
ATGGGAAAAT
TCAAATCCCG
ATC'TCAAATC
GGCTGAAAA
TGGTTACATG
TGATGGGAAA
AAACAATACC AATCTTGGTT ATTTTCGAGA ATGGTCACTA TGGAT'N'GGG ATA6AGGAATC GAATGGGTCT ACGATTCTCA ACAGCCAATG AATGGATTTG ATAGCTGAAA AAGAATGGGT GGTGGTTACA TGACAGCCAA TCTGATGGGA AAATAGCTGA TACTTCAAAT CTCGTGGCTA AGCGATGGTA AATGGCTTGG GTGCCTGTTA CAGCCAATGT AGTAGTGTCG TATGGCTAGA
CTTGGTACTA CTrCAAATcC AATCTTGGTT TTACCTCAAA CTCATAGTCA AGCTTGGTAC TAGATGGTA TCAGCTTGGA ATGCTGCTTA CTATCAAGTA AGCTTTCCTA TATATCGCAA
ACA
INFORMATION FOR SEQ ID NO: 253: SEQUENCE CHARACTERISTICS: LENGTH: 4010 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 253:
TTTTCGTTGA TGATACGAGC ATCATTTTTG AACAGTGATA ATACCAGTCG T'rTCAAGATA TGGTTTCCGC ACTCAGGGCA TGTGTATCCT TA'PTGATGAT TTTGTGATA.A AATGTAATTG CATTATAGGT CATATGGGAC GGATTTACCC ACTACAAATA AATATAGTAA TCGCGTCATA GATTTGGTGA TTCTTCTTGA GCACTTGAAT CGACGCTTTC AGGAA'N'TTA GAAGGTTTTT AGATGGGGCG TCGTAGTCCA GTCTAAAATC TGGATATTAG TTCC-ATATGA TTCTTTCTAA TTTTTPTCTA CAATAAAATA TTATAGAACC GAATrAATTT ACAAGGTATC TATCATTCAT CGATAGAAGT TTCAGCGACC TAAGGAGAAT TCTAGTAGGC GAAAGTCATA TTTCTTCAAT GrTTTGGCGAT GATT'rCCTTG GGTCTTTAAT GTCTAGTAAT TGAGTTGTTT TGTCGCTTTT GGCTCCATAA TATCTATAGT AATTAGAGAG CCAACTTI'CT GGAGTTCC1TC CTGTATACTA rrAGTAAACT AAAACTATTG 'rrCCACAArC TCGTCGTTTTT TTGGCAAGCT TGAAGTAAGT TGGATAAAGT GTTGCTCTAT T13TGGGAAAA TTTGAAT'rAG
TTTAAAGCCC
CGGATATGAG
ATCCTTTCTC
TTTACTGGGA
GGAAAACTGA CT'rGGCCCAG GATrGGCTTA TGAAGGGCT TTCTATTGAA ATGAGGACTT GCCCATGGAA GAAGATTGAA AATGAACTTG CCCTTCTTCG TCCTCTGAGA AAAGTGTTG ATGGAAGAAG ACTCTGACTT AGCTAAAGGG AALACGTCAAG AGTAGAAGAG AGCATTTGCC TCAACGACAA GAATTAGTCT CGCTTATAAG TGCCAAGCAT TATTCCTAAA GCCCCTTTGG CCATCAGAAG T'rTAATCTGA GGGTTTACCA ATCACACGTA TTTPGAGCCC CTTTATAATC GGA'rGAAACC TCTTATCGGG TT'rGTCTGGG AAAGCTGAGA GAGGATANTT TAATGCCACA ACCTATTGT CCTGTAGAGA GATTCTAAAA AGAGAAATGA TATT-CTGCTT AAAATTCGTA ?'rT'TCAAT CTCTCAA'rCT CGAAATGGTA GAACAGCTT GACAATTCAT CTATCTAGCC TAGGGCAGGT CTATCTCGTA GCAAGGCATT GATTCATTGG CTTATCTGGT TAAAACCCAC CGGTCAAGT'I 7rTCTCTr'rT GTGGTGGACG TAAAGACCGC TGGTrCAAGGA TTTTGGCTAC TATATAAACG CT'rTGAGAAC TACAGAAAAG GATGTCAAAG CTCTCACACC TGAACAAGTA TTCTATCACT CCAAAAATAA AT'rTATCAGA AAGTCGTGAT TCTTTTrAGT TATAATAAAG TTAGGAAATA AGGAGAGGAA AATCA'rTCAA CAACAGAGTG CTACAArTGA TAGTCTCACC TGAACAAGTG GCTTATCTAA CGCAAAAGCT CTATGGAAAA CCCATCTGGA CAACTCAGTC TTT-TGAAGA GGAACAAAAT ACCCAGTTGA AAGAGAAGAA ATCACCTATA AACGTAAGAA CTCTTCTTGC CCAA'rTTGAT TCAGAAGAAG r'rCATCATCA CTGATTGTCA GGGAGATCTA AAAGAGATTG GAGCAAhCCCT 1'TATTCCTGC GCAATTAAAA CGAATAGATC ATATCCAACA GCAGTGATAA AAATCCGAG'r GATAAAATCG 'rGAAAGCTCC CGCATAGCCT TGGCTCAGCT TCTATI'ATCG CTCACACCAT AGGTACCCAA TTATCGCCAA GAAGAAGATT GGCCTAAGAT AGGAAATTGC TAATTGGCAT ATCALAGGCGA GTCAATACTA T=TACGAGA AAAGTTCTTA GAACAAGCTC TrTCTTCATGC TTCTAGAGAG TGATAGTCAG TTGCCTTACT AT'rGGACTTT ATCAAGCAAT CACGCTGTAC CACCATGATC AGCGTCGGAG 'rCCTAGGAGA 'rTATTCTGGC TATGTTCATT GTGACATGTT TAGTCCTCTA GCrCTGCCTA 'rGCGATACCA GTCCAAGGSN GCTTGGTAAA CTGCGAACAG CTAGAAGCTT ATCGTCAACT GATCTTGGGC GCATGTGAGA AGGAAGT=r T'rAAGTCC CATCCTTAGG AGCTAAAGGT TTAGCTTATT GTGATCAGTT 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 TGGrrIAGTrA
GCGGCAGTAA
TAGGAGTAAG
COAAGAAGCT
CCCCAAGCAA
GTACAAGA.AT
CTTAGGACTT
GCGACGCTAA
GCACTTGTTG
GCACATAAAT
ATTTTCCTTG
AGAACATCTC
AGCAGGrCA.
GACTATTTTG
ATCATTGGTT
TAAAAAAGCG
TTATGAGCT
TACGCATCGA
ATATCTTATT
TTCCAAACCA
AAGTTGTACA
CTACTCTGGA
ATCCCATATTr
ATTATAGCAT
AAAAATAAAG
GTAAATAAAT
4GTTTATAAA
GTATCTTTTA
TTACAATTCA
AAAAATT'
1278 GAAAGAGACT GGGAGGCTTT GCCAGCTGAT CAGCCCCTAA 'rGGAAGACTT CTrTCTGG AAACTAGGAA GGGCAATTGA ATACAGCCTC AAAGACGGAC ATCTGGTCCT TTCCAATAA'r ATGGGACGGA GTAAAAGAGT CCAGTGGACT AGGGTGGTTA TT7=CCAAA GTrMGAAGG GTTGGAAACA GCTAAACGTC ATCAATTATA CTGCTAAAAC A'1-r'CTATAA ATCAAT'ITrC TCAATCCATT ATAAATAGCG AGAAATATCT GGAAACTCTC GTAAACAAAG AGGTTTTAGA AGAAAAGTGC AAATAAGAAA 'rCTCCAGATT GAT'N'TTCAA TAGACTTCGT TATTGGACGG ATTC'rCCAAT TCTATAT'rrT ACCTTTCTAA ATAACGCAGA TTCCTTTCAA TCGTATGATTr GCAGTCCGAA GACTGCCGAT ATTTATCTCT AATTTCCCTA AAGATATGGA A'ITrATrAAT GAACGACTrAC AGAAACGTCA TGCCGCCCTC AGTCAG'TTTr AAGTA'rGAAG AAACCTTTAA CTAGCTGAAC GCGCCATIAA CTTTTAGCCT GACTCAGTT AGCTAAAGCA AGAGCTATTG GTGCGTTGAA TCTATAACAG CTTTCCTAAT CGATTTGTTC A'rCCTATCTT CTAGAATGTC GGCCTATTTA CCGTGGACTA AGGAACTATC CGTGAGTTCT ?rACAA'rTrA TTATATGAAA ATGTATAGAT 'rAACTACCTA TACTGCATTA AATTAAGTAA CATCTCTTTA Ar'rAGGTAA ACTATAAATA CATATTATAA ATA.AATATCA ATATGTGTTA CATGTTTAAC AGACTCTATT TCTATATCAT GGGAAACAAT TCAGATATTT 'rTATCATATT AATCGAGAAT 'rCTTGCACAT AAACTCATCG GATGCCTTTC TCATACCTCT CTCAACTAGA CCTCCTCATG AGGTCAGTT TCCTCAAAAG GGCAGACTCC 'rTTAATGCAT CAT1'AACGAC GGTTGAC'rrT TCTAATCC'rA TAGACAATTT GAGGAGCTGC
AAAGATGGCG
TACTGTAAAA
AATTrTTTAGA
AAAATTTCAT
TTATCCATGG
ATCCTGAAGT TAATTTCTA CAATTTACTA GTTCTATAGA
CTCCCACTTC
TTTTATACTT
AAGGT-rCGTC
TTTGCTTT'G
GGAATAATCC ATACCACTTG AACAGATGCT ATTGCAAGCC
AATAAATTCG
TGTAACTTAC
TACTTTrCTGC
.TCCCTTGGTT
GCTTTTC?1TC
GAATAAAGTG
?1'GCGTCCTG TCCAGGCATA AATCTTTTAA AAAACCCCTG ACCTCATGAG TGTTrCCAGTA TCGTTTTTCC CGTCACACGA 7TTTTCATC TAGG'TGGTTC ATAAGGAACA CTGAAAACAA TTCGGAATAG TTCGAACACA TTTTCCCACC
ATTTAAAAAT
ATTAATCAGT
AAAAAAGACA
CCCTCCTGAT
CCCAAATCAT
CCACTTTCTr
TCGCTAGATI'
TCGACTGTTC
CGAAGATTCA
GCATAGAGAC
ACGTGAAGAA
INFORMATION FOR SEQ ID NO: 254: SEQUENCE CHARACTERISTICS: LENGTH: 2789 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 254:
ATGCATCCGT
GAAAATGACA
GAGTATTCAG
TTCGATGACA
CTCGATTCAT
CTGTCGGTTC
TT-CGTT~GTTG
CTACTCTCTC
ATPTCAGCGC
AAACAGATAA
ATTTGTCTGA
ATTTAACAAG
ATCAATCCTG
TTTCAACTCA
CTAGAAGCTA
AGTGGTAAGA
AAGAGCAGAA
GTAATGATAT
ATGCTAAGGA
GAATCCACTT
GACAATTTCT
ATCGGATTAT
TCATTGACCA
AAAAATGACC
TTGTCAAGCC TAAATTGTAA TlrrTrCAA 'rTTAAAACAG AAAAACCCAG TAAAAATATC ATTCCTAGGC CTATTTATGC TATTTCTCTC TGAAAAATAT TCGGTCAAAT GAAGCTGAAC GAACTCATTT TCCCTCGCCT AATTCAATGA TTGTTGGGCT ACATAAGCAT CGTGGGTCAC GATAATGACT GTTTTCCCCT CTCTAAGAGA AACTTCAAGA CCAAATCTCT ATTTCAGGA TCCAGAGAAC ATCGGCTAAA ATCAGCTGGC TGGGTTTTAA GATGGCTCTA GCAACTGCAA TTCGCCCCCA CACAACTCGG AGACCCTTTG ATGCAAAGTA GCTGATAAAC TAAAATCTCT TCCACCTTTT TGAGCTTGTC TTTCTTAGGC AATTTCACAT CACATGAGAT TGTACTCGAC CGTTTCATC-A TCAATCAGCG CAA.AATTTTG GAGATATGTT CACGGATTAT TGTTTGCGAC TTAGCAGAAT TAACCGCTAG CCAAAAATCT CATACCGTCC GCTATAATCA CCATCTATCA AACCCAATAA GTCGACTTCC CACTACCACT CTTACCAACA ATAGCTACCA AATCCCCCTG AGAGATAAGT TATCCAAAAT CACT'rTTCCC CCAATGGTTT TGGTAATATT ATCATAAGAT GCCCCCTTTC AA'rAACTCTA CTAGACTTCT TTTCTCCATC AGCCTAGCAC AAATAGTATA TCCAGACi.TG TAAAACCTGC AAACAGTAGA ACCCATGGCC AAAGAAAATC AAGACTAGAA GAGGGAAACT ATAGCCCAGC
CGAGGAGAGG
CCCTGCGCTT
GACCAAACAA
TCTCTTGTTG
CAACTAACTC
TTAAGCCAGT
AGCTAATAAT
AATCTCCNTC
ACGGTAGCGA TCGACCAGTT TCCACCCCAT CAATAAGAAA GTTGTTACTA GTAAGAAGTA AGCAAAGAGT AGGTTAAAAT TCCGAACAGC AATGGCTTGA ATAGATGAAA ATTTTAAATA TGTAATCTCT TTTTGATGTT GAACCGTATT TGTTGACAGG GAGGCTTTCT CATCCCACAT TGGATTGGAG AGATTTTCCT TTCGCTTATC ATAATAGGCA ATCTCGACAT CCATCTCCTC
AAACTTCTTG
GGAAATCATC
ATCTCGATAA
ATTTCCATCT
TTCAATTTTA
CATATCAGAA
ACTATATGGG
TATCGTTCGT
TTTTGCTGCT
TTATCI'CTT
GGTAGCTTGA
TCTGTAACT
CTACTCTGCA
TAATCTGTAG
AAGGTTCTA
TTGTrTTT
GTCCCTATTT
AGCAGAGAGC
TAAAGCTrGCA 1280 CTTrCATAC1'T CATCGAATGA AAGGCAATTA CACCTTTCGTr ACI'GCTGGC ATCAAAATAA ATCCCN'CCT C'N 1 TAGAAAA TTGCGA~rGG GATATrGC1'G AATCTG?=C GATTGGACAA CATAGCCCGC CTGCGTrTTr TCTACCAAAT ATrTCCCTGA CCCTGCTAGC TCrTC?1'GCC AGGTCAGGTA ATTACCTrGA CTTACCCACT GTTCTAAACT TCTGCCCACC CCAATCAGTA TCATCACATA A1-rGAAGATA AGACCAAATT TGATrGTCAT TTTTrTGGATT AAGAGGTAAG ACAGCAAAAA ATGAGACAAC CACAGCATAG ACTTCCCCAA GAGCTGATT CTTTTTT~rAAT ACCGGTAT~r CATAGTAAAC ATCCACCGTA AA'I-I-rTTAC AGGAAGACTG CCTGATAAAA TCGATAGAAA ACAGATTATC AT'rGAGTTTG GT'rCCTGATA AGCAAGTTCT AGGCCGTCAG TA-AAATAGTT TGAAAGATGA AAAACCr'rrC TCAACCAACT GATAAAGAGA GAAACAAATC TwTG4GCTTA 4 0
TAATCAAGCA AGAAAAACAC GCCTAGATTG ATCACAAGAG CCCCACCTAG AGGTTGCCTT TTACAACATC AGCTAAAACA GCCCTATC'rT GAAAACCAAG
GAGGAGGTAA
TAATTTTTGT
ACCCCAACTC
ATAGCCAAGA
CTTTGTAGG
TGTACAATCC
CTACGAGATG
AGAAAGGTCG
CCATCTGTGG
TCAAAAACAC
TACTCCTI'TG
TTCAAATACA
TTTTCATCTC CATCA'rCGGTTGATACACTG TCACTAACAC AAGAAGCAAA CAAAAACAAT GGCAGATAAA AGCAAATCTC GATTTATGAC rTCCACTGCA TCGGCTCTAG CAAGG'rAGCC TGGTCTATCT TGAkAAAAATC GCTCCATTTC TATCCTTGTC CATCTCTTGT GTrAGAAGTTA TCGTATAGCG ACCATTTAAA TATCCTTGAT ATAGGTT1TGA AAAGTCATAA GCTGAATAGG TTTGGCTTT GAATCGTACC AAGTTTA'rTG GAAATTrCTT TATTACTATA GACTCCTTCA TAAAATCAAG AGAAGAAATC CCAAACTCTT GGTAGGGGAA GGTATCTTTA CAGACTTGAC CACCTCATCA CCACTGTCTG 'IrTTTGATGAT GGAGACTTTA ATACATCCTC AAAAAATCGA AGAACAGACG CTGCAGGTTC GTTAATATCT AATCCAAAGA ATCTACAGG
INFORMATION FOR SEQ ID NO: 255: SEQUENCE CHARACTERISTICS: LENGTH: 2495 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 255:
CTGCGAA'rTT TATTAAAGAT AATGTGTTAA TTACAGCGGC TCACAACTAC TACAGACATG ACTATGGGAA AGAAGCGGAT GATATTTATG CAT'rTGGAAA GATCAAAGTA AAGGAAGTTC CTAAGGCMGC AAGGGAATAT GACTTGGCTT AATTAGGGAC T1-rGGGTCTT CCTACTACTC TCACAGGCTA TCCATCATAT AATTTTAAAA TTTAAGTGA TGATGGCATG TrCTTGGATTr TTCTTCCGGC TGTTAGTCCA AGTCAAGAAC GTTATTT1GA TAT'rAATTCT AAAAAAAI-r?
TTCATCAAAT
ACCAAGTTGA
TAG1TACGAGT
GATCTACACT
CTAATCAAAT
TTCTTAAAGG
GACAACATCA
GTTCCGGTAA
CAAATGGAGC
CATCTGGTGA
TGCAAATCGC
ATGATTCTGA
ATCTAAAAGA
CCGGAGAGAT
GTCCTTCTCC
G'rGTATTACA T'rATGATGCT
TAACAGTGCA
TTACTCTCTT
TAAACAAACG
GATGCTTACA
AATGGTTACA
GTGGATTTAA
ACTAGCCACA
AG4GATGGCAG
AACCTACTGG
GGTTCTCGGC
AAGAATAGAG
AGAATTTGTT
GTTAAATTAA ATGAACGAAA GAAGGATGGA AGAAAATAAA GGTTGGCAGG AGATAAATGA GAT1TCGCAAA AAGTCCATGG GGTAGCCAAA CTATCGATGG TGTTGGAGGA TATATAAAAT T'T'MTCTTCG GTTTGTTAGC T'rTGTCCAAG AAAA'rCGTAG AGAGTGATAG ATGGGAAGTA TrGGCAATATA TACCTGCTCC ATGCTCTTA GACCAGAT'rG GGCAAGCAAG TTTTAGAAGC
AGTCACCGTG
GGAA'TTAGA AATTTAAATT AGAAGAGCCC ATTGGTGCAA GACAGGAATA ACTGTGACTA GTA'rACAGAT AAGAAACAAG TACTTTAGAG GGGTCTAGTG GCATACTTTA GGAGATCGAG TTTGCCAT'1T ATTTAwTCGG TGGTAGTTGG TACCATTATA TACCTGGTAT TATNTAGACA AAAATGCTAT TATCTCAATT TAAAGTTTAT AACT'rCGCTT GAAGCTTTTG AAAAAAATGA GACAAATACA GTATTTGCAG
AACCTACTAC
CTATTATTTT
ACACAAGGGG
GTTT'TATTTT
AAAAACTGCT
ACAAACATCA TGGGGAAGAA TATGATAGCC AAGCAGA GAA ACGAGTCTAT ATCAGCGTAC TTATCATACT TTAAAAACTG GTTGGATTTA TGAAGAGGGT ATTTACAGAA GGATGGTGGC TTITGATTCGC GCATCAACAG ATTGACGGTT CACCTGGTTG GGTTAAGGAT TACCCTCTTA CCTATGATGA AGAGAAGCTA CATGGTACTA TCTAAATCCA GCAACTGGCA TTATGCA.AAC AGGTTGGCAA ATAGATGGTA CTACCTCCAT TCGTCAGGAG CTATGGCAAC TGGCTGGTAT CAACTTGGTA CTATCTAGAT GCTGAAAATG GTGATATGAG AACTGGCTCG GGAACAAATG GTACTATCTC CGTTCATCAG GAGCTATGGC AACTGGTTGG GTTCGACTTG GTACTATCTA AATGCAAGTA ATGGAGATAT GAAAACAGC TCAATGGTAA CTGGTACTAT GCCTATGATT CAGGTGCTTT AGCTGTTAAT
AAAAAGGGGG
GATCCTTTAT
CTTACGATTG
GGTCAAGATG
ACGAATACCA
TATN"TGAAG
CATTGGTATT
GGAGAGCTAG
AAAGCACCTC
TATCTAGGTA
AAGGAAG4GCT CAAAACCTT1G
TATCAGGAAA
TGGTTCCAAG
ACCACAGTAG
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1282 GGGTTAAGTA ATGAAGGCTA ATTGTAAACT
GTGGTTACTA
GTGATGGATA
TAGTATCAAG
TTrAGTrCTG
GTAGTAATCA
CTCATAACCG
GTCTGGGCTG
ATGATAAGAA
GTGCTTCTCT
TGAAAGATTA
CI-rTTGAGTA
TTTAGAGCCT
CTTAAACTAT AATGGTGAAT CTTAACTTTG TATAATAGGT GTTTTTCTGT ACTGCCCTCA TAN'CCACAG GGCTAAGTCC ATATAGTCTA TAATGGCTTG TAAAACAT?1' CCGATTTCAG TTTCCCTTAC GTGACATGGA TCGTGTTGGT ATTGCCAGCC GTGAATGCCT GTTCCAACAT TAGGCGATAA TTTCGCTATT CTTGCTGGAA TGGCAAATTC TCAAATTGGC CTTGAATGAG GGATAAAAGT CTCACAATC AACACTTAGA CAATTAATTT ?TTAGTTTT ACCT TAATTC TTCCAATTGC ?T~GCGACT AATCCCAAAG AAGGACTCCA TGCTTGAATT CCC= ACTCT TTGGTCACTA TGGAGAATCG TGTTTGTACT TGTTCTAAGT AAAGCCATCT AAAACTGGTG
AAAAAACGCA
ATCCGAAGgA
G'PTTATTGTT
GAAACGACTT
TCATACTATT
CTAGGAACCG
TATTCTCGTA
TGGGTGAAGT
ATAAGTAAAG
1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2495 TGTCACATCT GTGTAGCACT TTTCCATTGT
ATTCG
INFORMATION FOR SEQ ID NO: 256: SEQUENCE CHARACTERISTICS: (A) LENGTH: 870 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 256:
TACCACCGTA
GCAAGTCATC
AAACAATATT
TTCCAATTAC
AAATACTTTG
GAGGAACCAT
CAATCTCATC
GGGTCATCTT
TCGAAGCAGC
GCCCTGTACG
CCCCAAAAAC
GTGCTGTTTC
TTCATCCAGC AAGATTGCCA TTrGTCTTTG GGTATTTCGC AGTTCTTTTA CACAAAAATA GTTTCAGGTA CAAAAAGTGG ATCTTGTAAA ATTCTCTTCC GTCAAAACCG TCCACAAAGC CTGCCTTAAG GAGAC'TCTTG GTGTGAATGA ATTGTCCTTA TCCCCATCAT AAACCGGGAT ACCAGAATAA TTTTGTTTTA GATAATGGCT TGACTATCAT CCTGAATATC CACCATAAAG GCATCCGTTC AACCTCTCGT GCCATCAGTT CATCGAGCGA AAAGACACCT TGTAGCATCT AGCATCCAAT GTTTCTTCAC TATTrGTCAG CATATAGGCA ATTCATCAC TTCATCCGCA TCATCGAATG ACATAGGAGT CAAATGGCTC AAGAAATTG
TAAAAGCCAA
AATTGCCAAG
GATGGAAATA
ACAAAAGGAC TGACTAGTTT GCATCCTTTA GATTAAGAGC TAGGTCAAAA ATGCCAAGGA TCCGATCCCA ATGATAATCG GATTCTCTPTA GGATATAATT TAGAAAAGTT GCCACGGCTT TCCTAGAGTA TCAGTTAAAC GCCATTCCCA AGCCAAGAGG CAATCACACG TCCCCCCTGA TAAGATTCTA ATCAGGGTGA TTCCTACCTG GATGGTTGAT AAAAAGTGGT TAGGATTrC rAGTACCTTC AGCAGGCGGA TGTAGCGTCT GTCTCCTTCT TCCGCCTTTT' GTTCAACTCG GGCACGATTA AGAGAAACGG
INFORMATION FOR SEQ ID NO: 257: (i) SEQUENCE CHARACTERISTICS: LENGTH: 1245 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 257:
CGTTCCCAGA AGCCCGCATT AATCCAACTC AGCCTATGTA ACTTGCCTAT TCCGCGACAC ATGCCCAAGA CTATCTCTAT TGCCAGAAAA ALATTCGTAAT GGGCTTTGGC TCAAAGAAAG AAAAAATTGC ACTTTCCTCT CGCTGTGGTG TACGACTTCA AACAAAAGTC TTCTAACAGC AGTTGCCCAC CTGCTTAATT CAGCTrTTTT CTrTGCCTCC TCACAGGTTC GATCCCTGTA GGCTTTT?1'T GCTTTCCTTT TTT'rCACAGC CATAAACTCC TCAACATGGA GATATCTTTA AGTCAGCTAA GCTCGAGAAA AAAAATCCGT TTTTT-GAAGT TTrGATGAGA TrATGGTCG GACGATTTTC TCTTTGTCCT CTGCTTTAGA TTGTCCTCAA CTCATCGCCA ATGTCGTGAT TGATTTGGCC CTTTCTCCAA GCTATGGATA AGGCACTTGC TGACCTCAAA ACATCAGGGC CTGCGTGATG GGCACTACAG 'rGGAAGCAAG GAACTGGGGA CCACACAACT ATCCTGGAAA T'rGGGTCAAG CAAGACTATC CATCACTATT TCCAAGCAGA AGATACTGGT AAATATGAAC GAAGCTATCG ACCGTTTGCG TGAT?TT TGAAAAAGTG CACTTAAGTG 'rTGACCGACr ATGTAGGCCG TCTCACACGG GCGCGGGTTC AATACAAACC TTAGCTCAGC TGGCAGAGCA
AAAAATCTGA
GTATCATATA
ATT'1TTTTGTA
AAACAGCTTC
GTGAAGTTTC
GCGGACTCTT
GGAAAAGCCT
GCCGTGTTT
GGTATGCCCA
TGTCAACTGT
AATCCTTTTC
AATATAGAAA
TTATTAGGGA
AGTTAGAGCG
GGCACCAATA
AATCCGTGGG
TATAATATAG
TACAACAAAC
TAAACGTCAA
AGTGGGTTGA
GGGGGCATAT
TAAAAATTGT
TCACTAATTT
TGTCCTAAAA
GGACAAAT
TrTCAAAGTT
CTTCC.AATTT
AAATACAACA
CGTGCAATTT
TTTCCTCCAA
TrrGGCTCTr TG3TCCTCT
CCGAAAACCA
GGCGTTAGAA
T TTTTGATAT TCAGAGCGAT AAGGCATTGC GCTTGATAAG TAGTGTAGTT GAAGGGCGT TTAGAAAGGT TrNAAAGACA GTCTGA.AAAA TGAGTCCGAA AAATTTCTCC GGTCCTTAT
GAGGAGGAAC
TCTGAAAGTG
AAACAGCAAG AGTTGATAGA GCTGATAGTG ATGTT'rCAAG TCTrG
INFORMATION FOR SEQ ID NO: 258: SEQUENCE CHARACTERISTICS: LENGTH: 1684 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 258:
ATGCCTATGT AACTCCACAT ATGACCCATA GCCACTGGAT AAGCTGAGAG AGCGGCAcCC AGGCTTATGC TAAAGAGA.AA
TAAAAAAGAT
GGTTTGACCC
GAAGCTATCT
AATCTTCAAT
AGACCATCAG
GAAAGCAGCT
AGTCAAAAAC
GTGGTTTGAC
GACTGTCAAG
TAACGCTAGC
AAAACCAAC
GAAACCGCAA
AGAGGAATCA
TGAAGATTTA
GATTCAGGAA ATACTGAGGC AAGAAGGTGC CACTTGATCG
GGTAGTTTAA
GAAGGCCTTT
TACTATGTCG
GACCATGTTC
GAGGAGAAAC
AGCGAGAAAC
GAAGAACCTC
CTTGGAAAAA
TCATACCTCA T'rATGACCAT TACCATAACA ATGAGGCACC TAAGGGGTAT ACTCTTGAGG AACATCCAAA CGAACGTCCG CATTCAGATA AAAGAAACAA AAATGGTCAA GCTGATACCA CTCAGACAGA AAAACCTGAG GAAGAAACCC CAGAGTCTCC AAAACCAACA GAGGAACCAG AGGTCGAGAC TGA.AAAGGTT GAAGAAAAAC TCCAGGATCC AATTATCAAG TCCA.ATGCCA TACTATrTGG CACCCAGGAC AACAATACTA TATTAAAGGA GAGTAAGTAA AGGTAGCAGC AACGGGAAAA CGAAAAATGA GAGCAGAATG
AAAAGGAGCA
TATGCCTTAC
AGTTTGTCTG
CTCCTTCGAC
ACAACCGCGT
ATACTGTAGA
TCAAATTTGA
ATCTTTTGGC
ATGGTTTTGG
ATCAAACGGA
CTCGAGAAGA
AAGAATCACC
TGAGAGAGGC
AAGAGACTCT
TTATGGCAGA
ATTTTCTAAC
TGAGTTCTAG
CACAGGATTA AAAAATAATT AGCTGAAAAA CTATTGGCTT TCCTAAAAAC AGGATAGGAG TTCTCATITrT TTTCATGAAA ATGTGCA.AAA TCTACTTCTA AAACATTGTT AGAAATCGAT ATATCTTAAC AGATAGTGTA AATAAAGATA CACTAGTTAA GGAGTAATGA TGAAAAAAAG AGCTCTTGTC TTAGGAGCAT GTGGTTTCTT GGAT'TACTCT 'rTACTGCTAT TTTAGAAACT 'rTATCGTTTC ACATTCCAAA CACATTGCAG
TATAGTAGAT
TTGACTGTCC
AACTATTTAC
AACAATACTA
GGACATA'rTG
GGGGTGGTTT
AAGGTGTTGT
TGAAACTAGA ATAGTATACC TGTTCTT ATT TCATTTTACT TGGCTAATTA ATCAGTTAAA TTATTGATGG CCAGTCTGTT ATCCTGGATC ATTCTCATCA GATGGAAAGT ATTGGTCTTG.
TGAACTGATT AGTAAAGTAG CTAAAGATGT TCCGATTACT TATGTAAGAG GAACCGAGGG CGGAGGAATT GGAACGAGTT TTGAACAAGT AGATAGGGTT GTTTCCGAAA ATCCAGCAGA TACPT~ACTT GCC?1'TTTTC ACCTAGGTTC TGCc'AAAATG AAC?1'AAAAA TGGTGACTGA TTCAGTGAT
AAAAGTATCA
TCATCAACAG GGIccAAT1' GTAGAAGGTG ccTATAATrc AGCTGCTcTT
=C~AGGCTG
GTGCAGAACT GTCAGTTATT CAAACACAGT TaGCGGAgC t TGAAATCAAT
AAATAAGGAA
'rTACTATA ACTCrTTrA TAGATAAGCT A'rrGaTTATC TCAACTATAA TAATGTI'AAG
TMAA
INFORMATION FOR SEQ ID NO: 259: SEQUENCE CHARACTERISTICS: LENGTH: 970 base pairs (B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 259:
AOGAGTGGAG AnATATGAAG ACACAAATTT TCACATTATT
GAAAATCCTT
TTATTATTTIITL GCCATTTCTA ACTAATCTAT AAGTTCTTTA TATTGCTGAA AAAAAGGGCT AT'rAATTGTG GA ITTT ICTAA TACCTGCAGA GATTGGATAA CTCTTTTTGA TTGCTTCCCT TTGTTTGAAG AAAGACACTC CGATACT'1TT TCAAAAACAT CATACGGTCG TAACATCCTC TGGGATGTAG GAGAAAAGTT TI'CGCTCCAT GAGTTCTGAT CTTCATATAC AATCGATIGT GTACTAACTC
TTTAAATTCT
GGTTGATAAA AAAATCAGAT CTTGATTGCT
CAAGAAGGGC
GAGTTCAA.AC CACGTCCAAG ACTCGATAGC ATAGAGATAG CTCCTCTGCI' AGTGGGTAG-C TTrTATTAG
TGAATGGATG
ATTCAAACGA CGATAGGTCT GCGCCATCTG TTCTTGATCG TAAAGCAGCT ATATCCTGAT GGGCAAAGCG
AITCACAACC
TGGAGATCT TGATAGTI'GT TGAGCI'GTG
CCCAAACTCA
TTGGATAGCT AGAATCAACT TATCCGCAGA
CAGCATAGAC
TCCAACTCGG
AAGATATTTA
TTGGATTTTTr
ATGGTATTGC
GTGGTCAAAA
GCATCTTTAA
GCTGAGATTA
AACGCAATTC
AGCGTTCAAT
AAAATTGCC-A
CTTCGAAGAT
AGAGTCCTTG
CGAGTAAGGA
AAAAGAC.ATA
ACTCGCTATC
TCACGATGC
ACTTCCTCCA ATAGCTGCTC TTTCGACCGA
TTCGCATATG
TCAAAGGTCA
CATTTATACC
TGCCCTAGTT
CAAACTTGGA
CAACTGAGAA, GCTGTTAGAC AcGTAATTCC TTGTAAAATT
CTTTTGCAGG
CCTCACAAGC CACATCTCAC TGC?1'GAGCT CCCCCAGTTC CATTCTCTCA ATCATCTGAC
CACCTCCTAG
INFORMATION FOR SEQ ID NO: 260: SEQUENCE CHARACTERISTICS: LENGTH: 2996 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 260:
0 0 *0 04 0 0 0 GTTGACCACG GGTAAAACTA CCCTAACTGC GCCTrCATCA GT'rAACCAAC CTAAAGACTA CGAACGCGGT ATCACTATCA ACACTGCGCA CGCTCACATC GACGCTCCAG GACACGCGGA TCAAATGGAC GGAGCTATCC TTGTAGTAGC TGAGCACATC CTTCTTrCAC GTCAGGTTGG AGTTGACTTG GTTGACGACG AAGAATTGCT ATTGTCAGAA TACGACTTCC CAGGTGACGA AGCTCTTGAA GGTGACTCTA AATACGAAGA TGAGTATATC CCAGAACCAG AACGTGACAC CGTATTCTCA ATCACTGGAC GTGGTACAGT TAAAGTCAAC GACGAAATCG AAATCGT'rGG TACTGGTGTT GAAATGTTCC GTAAACAACT TGTCCTTCTT CGTGGTGTTC AACGTGATGA AGGTTCAATC AACCCACACA CTAAATTCAA AGGTGGACGT CACACTCCAT TCTTCAACAA TGACGTTACA GGTTCAATCG AACTTCCAGC CGTGACAATC GACG'ITGAGT TGATTCACCC TATCCGTGAG GGTGGACGTA CTGTTGGTTC CGATT'rAGTT CCCAGAAGAA CAATTATTrA AGGTTCTTTr TTTAGATATT GAACTAATAC TATTGAAACT AGAATAGTAC ACATCTACI' TCCTGATCGA TTTGTCN'GT TCTTATPTCA TCAAAACATT GTTTTTAGGT TGTAGATAGA
AGCTATCACA
TGCGTCTATC
CGTTGAGTAC GAAACTGAAA AACGTCACTA CTACGTTAAA AACATGATCA CTGGTGCTGC TTCAACTGAC GGACCAATGC CACAAACTCG TGTTAAACAC CTTATCGTCI' TCATGAACAA TGAATTGGTT -AAATGGAAA TCCGTGACCT TCTTCCAGTT ATCCAAGGTT CAGCACTTAA CATCGTTA'rG GAATTGATGA ACACAGTTGA TGACAAACCA TTGCTTCTTC CAGTCGAGGA TGCTTCAGGA CGTATCGACC GTGGTATCGT TATCAAAGA.A GAAACTCAAA AAGCAGTTGT TGACGAAGGT CTTGCTGGAG ATAACGTAGG AATCGAACGT GGACAAGTTA TCGCTAAACC AGGTGAAGTC TACATCCTTA CTAAAGAAGA CTACCGTCCA CA.ATTCTACT TCCGTACTAC AGGTACTGAA ATGGTA.ATGC CTGGTGATAA AATCGCCGTA GAACAAGGTA CTACATTCTC AGGTATGGTT ACAGAAATCG AAGCTTAATT AGTTAGACAC 'rAAAAGAATC 'ETGCTTGGCA TCAATGAAAA TCAAAGAGCA AACTATAATA CTAAAACATT GTTAGAAATC GA'rTrGACTG TTTTACTATA GAAAG'NAGC TACAGACTGC ACTGACGAAG TCAG tAACAT CTATACGACA ACTGTTTTGG CACGTCGCTr GATGCTGCTC CAGAAGAACG 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 0 0*00 0 *00* *4 04 .0 04 0
AGGCGAAGCT
AAAGCATTAT
CTTTTAAAAG
CTGATGGGCA
GAACAACTCT
CTGTTATAGA
AATATGAGGA
GACGCGGTTT
ACAATAG1TAA
ACTCTCGTAC
AATCTTATAA
ATCAGGAAAG
TTCAATACAC
GTTCGGACTC
GAAGAGATT TCGAAGAGTA TATGAAATCA A'rTAAAGAAG AGCTAAA'rAT CATAAACGCC AGAGATTATA GAACTTTTAT
TAATACTAGA
AAATCCAAAC
TTCAAATCGT
AGTGGTTTGA
CTrATATGAC GGTTGAGCAA CAGGAGAATT TGTTACAATT CCTACACACG TGATGCCTTC CACGTCCAGA ACATCCTAAG TCTCAATTCA AGAAGACAAG TTGGGGTGTT TTTATTATGA AAAACAAGGA TATAGCTTAG TAGGTACATG ATTAAATTGA TCGTTACTAT TC'rCCTGATT GACTAAAGAT AGAGTTTCTC TCAAACTAAT TTATAGAAAT ATTTTAGCAG TrTAGACTGT AATCAAACAA CGA'rrTGGCG GACTCTCTCC TTCAAGAAAC ACGTGGTGGT GAGAAAGTCT TTCTTGCCCG CCATTTGAAG GATGCCTTAT TTCAGGCTTA TAAAAAGGAG TATCAACTGT TGAAGCGCCA TGGTTGGCGA AAAGCAGATG CTCAAACCAT TGTCGCGTCT TGAACTGCAC CCCA.AAAGTT AGACAGAAAA AATTAACTTA TGATGATAAA GTTCAGATCT AGAAGCTTTC AAATAAATTT GGGATAAACA TTGATCG'rrA CGGAATAGAG TTCGTCAAAA TAAAACAAGA AATGATTCAT AAAGTCTGAC TTGAATACTG TCTCCCAAGT CGTACGATAC
CTAAAATCAA
CATCAAAACA
TCTATTTCGT
AATAAGATGT
CCAAGGTGTA
AAATGTA.AAA
CGTAACCATG
GCTACAGAGG
TTAGGTCGTT
AATATTACC
AAAAATAAAG
AATCTAACITT
ATGAACTTAG
ATTCTAATCT
AAGGAAAAAA
ATGAAGGCTG
TTCT-rAACTG
GGAGAGTACC
GAGGAAAAAG
TAGATATTCT
AGCTAGATAA
1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 21.60 2220 2280 2340 24C0 2460 2520 2580 2640 2700 2760 2820 2880 2940 2996 GCTAGCACAA TACAGGAAAA ACGGGTATAC TATTGTTGAG AAAACAAGAG TGAGAGCGGA GAATGCCATC CTAAAAAAGT TAAGAGAAC'r CCGATTGAAG AGAAAGAAGA AAGACAGAAA TTATrCAAGA ATTAATGACT CAGTTTTCGT TCTAAAAGCC AT'TAAACTAG CTCGTTTGAC CTACTACTAT CACTTGAAkAC ACCAGATAAG GACCAAGAGC TTAAAGCTGA AAATTATGCT TATCGTCGGA TTTATT'rAGA TAAAAGAGTT CAAGGCTrGA TAAAAGTACT AAAATATTCT TCTCATAAAG GAGACGTTGG ATTTGAAGGC TCTAAAACAA TGGAA.AAGTG INFORM4ATION FOR SEQ ID2 NO: 2( SEQUENCE CHARACTERISTIC! LENGTH: 837 base pi TYPE: nucleic acid STR.ANDEDNESS: doub] AATTCAATCC ATTTTATCG AACACAAGGG ACTAAGAAAT CGTGGTTATC TGGTAAATCA CAATTTACAA GCTAAAATGC GACAGAAACG CAAGAAGGCA GAGAATCTCA TTCAAGGACA CTACACAGAT GTGACAGAAT TTGCCG 1288 TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 261: CTTATCAACT CCCGACATGG CTCTCAGACC GATGGTGGTC AAGATCAAAC TCTCGAAATA AATCCAAATC CCTAAAAAAA TCAGAACAAG
TCTCATTTCT
TATCTATTAT
TCTTAGGGG
AAGAAGAGAA
ATGGTCTGAC
GGGTCAAGGG
CGCGCTAGGG
CGGTCCGAAA
GATAAT'TCT
AACAAATTAA
TGAGTGATTT
GTTTCTAGAT
ATCTTTTrTA AAGAGTAAAC ATCAAATATA AGTCCGTTTG TGTGGATAAT CTCATCCGGA TTTTATCCGC AACTTGGGCT CTGACTTAGC AGCATCTGCA CTGAAGTTGG TTCGTCTA.AG CAACCCGTTG CTTCrGTCCA GCCCAACCTT AGCCAACTCT TGACAACAAC CAAGCCTTCT ACTGTTGGAA AACCATAGAC TAGAAAAATC AACTGAAAAA AATTGAGACT GCGAGAAAGG TAAAGAAAAT AGTTGCAGTA GCATGAT'PTC TCAGCTAGTC CAACTAACTG AGTTTTCCTT TAACTAGCGA AGAATTCTTT TGTCCGCTCT G TCCAGACT CGATGATTTT CCCCT'rATCT ACAAAGGACA TGTCATGACT GACCAAA.ATC ATAGACTTTT CTACT'rCACC GACCAATTCT AGCAAAACAT CTGGTTTCAT AGCA.AGCGCA CCTGATAAAT GGCGAGGATA ATGGTTTTCA TCCTTGGCAA TCT'rAGTCGC TTCTTGGTCA TTCACATTAT CAAGTGCTGT TCGGCGTTCA AACTrACGAC GTAGGGCAAG GATTTCrTCT CCATCAATCT GAATAGAGCC ACTGTCAGGT TTGATTrTCA GCTCTGAAGA CCAATCA INFORMATION FOR SEQ ID NO: 262: SEQUENCE CHARACTERISTICS: LENGTH: 868 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 262: CCGAACAAAA TGGGCTAATT AGATTATAGT AAGAAAGGTA AGTTAAAAAT GAGAATTGCA ATTGGATGTG ACCACATCGT AACTGATGAA AAAATGGCGG 'TTCAGAATT TTTGAAATCA AAAGGATATG AAGTCATTGA CTTTGGTACC TATGACCATA CACGGACTCA CTACCCAATC TTTGGTAAAA. AAGTAGGGGA AGCTGTAACT AGCGGTCAAG CTGATCTTGG AGTATGTATC TGTGGTACTG GTGTTGGTAT CAACAACGCT GTAAATAAAG TTCCAGGTGT TCGTTCTGCC TTGGTTCGTG ATATGACAAC AGCCCTTTAT GCTAAAGAAC AATTGAACGC TAACGTTATI' GGTTTTGGTG GTAAAATTAC TGGTGAATTG CTTATGTGTG ATATCATCGA AGCTTTCATC CATGCTGAAT ACAAACCAAC TGAAGAAAAC AAAAAATTGA GAAAGTCACA ATGCTCAACA AACAGACGCA AACTTCTTTA GATCGTGGAG AATACCACGA CTAAGAGGrG ACCTATGATr ATCCATCGAT A'1TTCCTATC CCTTGGATGA GTTGAAGATT GGATGTAACC AAAACGGCTG GTGGTAAGGG ACTCAATGTT TGGCGATTCT GT'rCT'rGCTA CTGGrTTAGT GGGTGGCA.AA ACATATCGAT AATCAAGTAA AGAAAGAT'rT CTrCTCAATT TATCGCTATT CrCCACGGAG ACAACCAA INFORMATION FOR SEQ ID NO: 263: SEQUENCE CHARACTERISTICS: LENGTH: 3744 base pairs TYPE: nucleic acid C) STRANDEDNESS: double CD) TOPOLOGY: linear TTGCGAAAAT TG.AACATGTT CAGAATTCCT TGAGAAATGG TTAACAGTCA CAATGAACCC GATACTGTCA ATCGTGTGGT
ACCCGAGTAC
CTTGGTGAGT
AAGGGAGAAA
TTTCAGAATT-
TTTGGTTGA
CTCGTAACTG
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 263:
CCGTTCAAAG
ATAGGTAATT
TATAAAAGAG
ATCAAGAAAT
GTCTTTTGCC
ATGCGACTGA
CTCGGGCTCA
AAGTCCAGAT
GCAATGGTCT
TTTTCCAAAT
T'rAACGGGCA
AGAATGCTGG
CTA?1GTTTC
ATCI'ATAGTT
TCTTCATAAG
TCAATCATGT
AATGTCAAGA
CCAAACCTGC
CATCATTTGT
GTAOACCGCA
GATGAAGCTT
AGTAGAGCAO
CTTCCAAGAG
ACTCGAAAGT CACAGTTCTT TCGTrCTTGC TGGCATCTAT TTAAAACTCC TTTGTTTAAT GCTAACTTTA TTTrACTCCT AAAATGATTG CGCACGCAAC TTTTTTTAA.A ATCATCTTAA TPCCAAGCTT TCTTCGACAG TCTTTTGTAG CGAGGCCAGT CAGGCAGATA AAACTAGAGC GTCTATCTTG ATGGCAACAC ATTTTTAGCT TCCAAGCGAG CCACCATCCT AGAAACTGCG ATCTGGCAGG TCAATCTGGC GTAGAGATT TTCTTCAGCC GTAGAACTCT TTCAAGGTCA GACrTCCTC GCTCTGTTGG ACTTTCAATT TCTTTCTGAC GCCGATTGAA GTCAAACCAT AGGTCATAGT GTCTCCTTTC GTCTCTGCGT CACAAGATGA CAAGCTCTGT TTTTTAGTGG AATTTCTAAC TCCTTATCAC GCTTAGTTTA AAATAAGCAT TTTTAGAGT CATAAATAGA AGAAAGTCCA TTGCGCATGC. AATAATTATA CTACT'TrTCA TTTTATTIT GTGTGA.ATAA TGGGGGAATC ATTCGAATTrC AGA'I-r-rATT TCATTTCTCT GGTCTAATAA AGCTATGCAT ATAGTACTGA TTTTAAACAA GGACCATTAG ATTCCATTAA AGGAGGGAC AGACATGTCG AGGCGGCCAA 1290 AGT7rTT'GAT GTCGGCGTCA GAACTCTCTT CACGTGGGAA AAGAAAGACG TAAACAAGGG 96 960 AACTTAGAGC GGAAAAAGCG AGTCGTCAAA AAGCG1'AAGA GCCT'rTGTAG AGGCTCATCC GCTTTGCCCT CCGTATGGGC AGTTTTAGGG AACAAAAGCC AGACCTT TTACGCGAAA
AAAGATTTAC
TATGCAGGG
CGGACTAATG
CAGTCCTATA
CTCCTAGAGG
AGGTGGAGCA
AGTTTTAAAG
TGAGAAAGTT
TATTGACGAA
GGAGAAAGTC
AAAACTCAAT
CAAATTrAGAG AATGAAAGAA TAATTATGCA 'rGATTTATAC TCATTTGCGT GTcTT'rATAA TATACAAAAT AGTAAAATGC TrTTrTTTT TTrGGGTTCCC TATTCTATAA TTATAGTATG ATACTCGAAA GGAAATTCA AAATATT'rT AGACCGAGTA ACTCGGTTCA AATTAATCAA 'rCTCTTr'r GGCTTGTTTA AGGTCTTGCG CGAAACGGTA GAAGATAAAG TAAGTCATTC AGTTGCTGCG GGAGCTGTCA TAAGTGGAAC TGAGTCAGCA GTACTTGAAA AAACTGTAGA AGTAGTTCTA GGTACGATAT CrACAAGTAA AGAGTCGGCA AGTACATCTIG CATCTGAGTC TACAAGTGCA TCAGAATCAG CAAGTACATC TGTGGTAGGT TCACAAACAG CTGCCGCTAC TCGTAAGAAA CCAGCTAGTG ATTATGTAGC
CAGAT'TAAGG
TCTCGAGTC
ACGGGAATCG
TATGGCAAGA
GGTAGTTI'C
TAAAAATAGG
AAAACTTATC
TAGCAAAAAT
GTAATTTATT
TAGACGTCAG
ATCAGGGAAG
AGGTGGTGTT
AATTACTGGG
CGTTGCAACT
GAAAACGGAT
TTCAGCGAGT
AGCCTCMCC
GGCTTCGACA
AGAAGCAACT
ATCAGTTACA
TCCCTTTrAGA AGAA'rrGAAA NTGCGGCCCG T?1'TGATTGT TCAT AAA AAAGACGACC TTGATATTTT GGATAACCTA ACCGCTACCT CTATCGTCCT TTAGCCGACG GCG'rTTTGAG TAATCAGATA TA'TCGA'TCA AATATAAACC AAAAAT'rAGC TTATAATATA TATATATATA ACCTCAAGTT TCTTGCTATT TATATCCATA CATG.AAAATA AAGGGTGAAT ATAGAGAAAC CATTGGCTAC GGGCCTCGAC GATACTACTC AGGTCATGAC CTTGATATCC TCAAGGGGAT CAAACGAAGG TATTTACAAA GCTTTGGCAA CAAATGATAC TCAACTAGTT TGTCAGCTTC AGCGCTTCGA CCTCAGCAAG AGTATTTCTG CATCATCTAC GCTAAGAAGG TCGAAGAAGA AATGTCAATC TCCAATCTTA 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 TGCTAAGCGA CGCAAL3CGTT CAGTGGATTC CATCGAGCAA TTGCTGGCTT CTATAAAAAA TGCTGCTGTT =rTCTGGCA ATACGATTGT AAATGGCGCC CCTGCAATTA ATGCAAGTCT AAACATTGCT AAAAGTGAGA CAAAAGTTTA TACAGGTGAA GGTGTAGATT TGTTCCAATT TACTATAAAT TGAAAGTGAC AAATCATGGT TCAAAATTGA TACGGTTACG TATGTQAATC CTAAAACAAA TGATCTTGGT AATATATCAA TGGATATTCT ATCTATAAT CAGGTACTTC AACACAAACA ATGTTAACCC TCTTGGTAAA CCTTCAGGTG TAAAGAACTA CATTACTGAC AAAAATCGTA
CGGTATATCG
CCTTTACCTA
GTATGCGTCC
TTGGCAGTGA
GACAGGTTCT
ATCCTATAAT
CCAAATGAAT
AATTACI'GGA
TGGAAPTAAC
ACTrTCACAG
AACAAGTGCG
AGCTTCAGCA
GACAAGTGCC
GGCTrTCAGCA
C-ACCTCAGCT
TGAATCGGCC
CACAAGCGCC
AGCCTCAGCA
ACGAGTGCGT
GAATCGGCCT
CCAGTGCVTC
CATCAACCAG
CCTCGGCTTC
ACATCTACAA TGACGACGCA GGTTTTrG
ACGGATACAT
TACTTCAATG
TCTAAGTCAC
TCGGCTrCAG
AGTACCAGTG
TCGGCTTCAG
AGTACCAGTG
CTAAGAAAGG
CCTTTACATT
GTGGAGGAAA
'rCrCAGTAAG
CATCAACCAG
CTTCAGTCTC
CAAGCACATC
CTTCAGCTTC
GGGTAGTGGG TATACTTGGG GAAATGGTGC ATATGGATTA ACATCATCTT GGACTGTACC TACCCCTTAC GCTGCTAGAA CAGATAGAAT GGTAGTrGAA TC'rAGCACGA CCAGTCAGTC TGCTAGTCA6A AGCGCCTCAG CTTCAGCATC TGCCTCGGCT TC.AGCGTCAA CCAGTGCGTC AGCATCAACA AGTCCTTCAG CCTCAGCATC AGCATCTGAA TCAGCGTCAA CCAGTGCTTC AGCATCAACC AGCGCCTCGG CCTCAGCAAG CGCCTCGGCC TCAGCAAGCA CCTCAGCTTC AGCATCAACG AGTGCTTCGG CT-rCAGCAAG TACGTCAGCT TCAGCGTCAA CCAGTGCTTC GCAAGTATCT CAGCGTCTGA ATCGGCATCA ACGTCAGCCT CAGCAAGCAC CTCAGCTTCT CATCGACAAG CGCCTCACCT TCAGCAAGTA CGTCGGCCTC AACCAGTGCA TCTGAATCGG CTAGTGCATC GGCTTCAGCA TCAACCAGTG 2760 2820 2880 2940 3000 3060 31.20 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3744 'rCTGAATCGG CCTCAACCAG TCAACCAGCG CCTCAGCCTC
TCGGGTTCAG
TCAACAAGTG
CTGAGTCAGC
CAACCAGTGC
AGCCTCAGCG
TGCGTCAGCC
CA'rCAACGAG
CGTCAGCTCA
ATCAACGAGT
GTCACCTCAG
TCGACAAGTG
TCAGCAAGTA
AGCGTCAAAC AGTG
INFORMATION FOR SEQ ID NO: 264: SEQUENCE CHARACTERISTICS: LENGTH: 795 base pairs (B) TYPE: nucleic acid STRANDEDNESS: double (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 264:
CGATAAAGAG GCCTTGAGTA ATCTCAATTT GCAGATTGAA AA'rGGAGAGA TTATGGGCTT GAT-rGGTCAT AATGGGGCTG GA-AAATCOAC CACTATAAAA TCCCTAGTCA GTATCATTTC ACCCAGCAGT GGTCGTATTT TGGTAGACGG TCAGGAGTTA TCGGAAAATC GCTTGGCTAT TAAACGAAAG ATTGGCTACG TAGCAGACTC _GCCTGACTTA TTTTACGCT TAACCCCCAA TGAATTTrTGG GAATTGATCG CCTCATCCTA TGATCTGAGT AGATCTGACT TGGAGGCTAG TCTAGCTAGG CTrATTGAACG TCT?1'CTCAC
TA?'N'GGGTC
ACAGATGATG
AGAGGTGGCA
TTGTGGTAGT
CCI rAGTCTT
TAGTTGATAT
AGCAGGCTAA
GGAATGCGTC
T TGGATGAAC
AAGGAACATG
GAGCAAGTCT
GTAGAGGACT
GCTGGTAGAA
CA.ATATCCTT
GAATC
rrI-rrGATrTr
AGAAAGTCTT
CCTTGACTGG
CACAAAAAGG
GTGATCGGAT
TGAGAAAAGA
TGCTGAAAAT CGCTATCAGG TGTCATCGGA GCACTCTTGT TTTGGATCCC CAGGCIYGCCT GAAGACAGTC TTGTN'TCAA TGCCATTTrG AAAAAGGGGC 'rTACCCAGAC CAGTCTTTGG
TTATTGAAAC
CTGATCCCGA
TTGATTTGAA
CTCATGTCCT
ATTTGATrTA
AAAGTATCTA
AAGAGGAGGT TGCGGATGCG TCI'CAAGGTC ATTA).AAAAT TATTCATCTC AAGAAGCTAA TCTGGCTAAT CrACGAAAGA
INFORMATION FOR SEQ ID NO: 265: SEQUENCE CHARACTERISTICS: LENGTH: 2231 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 265:
TGGTAATGTG CTTGGCAGCw TACCAACGCC AGCCAGCATT TGAGGTCAAA GCCATATTCT GA PTGATGAC ATCCGATGCA CTGCCGCCTT CTCAGATCT TCATCAGGTG AAGCAGGTCC AAGACTGACT CACCGTCCGA AAGAACCCAG ACCAA.AGGAC CAATCTCCGT GCCCTCTTTT TAGGGCTCGT GAACAAGACT CA.AAATCCAG ATCCAAATCG CTACGCCAAC TAGTACCCCT TCCTTGACAC TGCTACTACC ATTTCCCATA TCAAGATCAT TATCTGAGTC ACCAAAAGCC TTCCCAACTC GGCGAATGCC TTCTAATTTA AAAGGATTGC TACGTGTCAA T'PTCAAGTCT TCTGGTGTCA TCAGCATCAA AACTTGGTAG TCTTCCTTPT GGGGAACAAC CTTGCTGACC GTTAAAACAG AGGGAACGAA GCGACTAATT ATGATTTTAG AACCCAACAT GGCATCCTTG
GCGACCGACA
ATGACTTGGT
GAATTTCCCT
TCAAAATCAG
ATAGGCTGAT
ATGCGATTAA
CGTTGGGAAA
GTCCCTAG; -C
AACTTGGAAA
TAGGTTACCG
CGCCCTGTCG
GTGGA TPTCA
GCTTTGACTT
CAGAAAGAGT
'PTCCATGCAG
TTAGCATAGC
CTGTCTrTAC
TCCATCAATT
TGTTCTTTGA
TAATTAGATG GCGCAAATGT TAAAGATATA CTGGCCATTA
CCTTAACAAA
CAATCTTAAT
CATCGATATC
AAAAGGTCCT
CGCATCCTTA
AAAAAAAACA AAACACTCTT GCGATTGTTG ACCAAGGTTC CCATCCTATC CCAATCTCCC CTTTTGTGAT GAGTAAATCA TGCCTAAGAA AATCCTTGTr ACAATGATTA TACCACA'N'T TTACATACGC GTGGAACTAT
1293 GCCGATGCTI' CTGGCGCTGT TG'rGACGAGT TCAGATAATC CCACPGAAG GAATCCAAGT CCACGCCTTG GACTTrTTTTA AAACCCAAAC ATATGCTGGT CCTCTACCAG AAAATTAAAG GGAGTGG'rGA TCACACACGG AACCGATACT ?1'AGAGGAAA ATGGAAGTTC CCCATATGCC TATCGTTCTA ACAGGAGCCA GTAGTGATGG TGrI'TATAAT TACCTAAGTG CTTTACGAGT CTGACAAAGG AGTrTTGGTC GTTATGAACG ATGAAATCCA AAACACATAC GACTAATGTC AGCACCTTCC AGACTCCAAC TCATGAAACA GGAAATCCTC TACTTCAAAA CAGCTGAACC ATCACATACA AGGT'NTAGTC CCTATCATCT CCGCTTATGC 1-rGATATGCT GGATTrAGAA CACTTGGACG GT TTGATTAT ATATTCCCAA AGAAACGGCT CAAAAATTAG AAAGC CITCT CTCTGGTATC ACGATGCTTT AACGGTATTG CCGAGCCTGT GCGTACAGTT GCAAAAAGCA GGCGTTTTCT TTGTTAAACA GCTrGAAACT CCTCATCGCC CTCA.ATGCCG GACTAACACG TGGAAGGCTA ATAC-TCTTCG AAAATCTCTG CAAACCACGT ATGGtACTGA CTTCGTCAGT TTCATCTACA ACCTCAAAAA TCAGTTCTAT CTACAACCTC AAAACATGT TT1TGAGCTGA ACCTCAAAAA CATGTTTTGA GCTGACTTCG TCAGTTCTAT TTTGAGCTGA CTrCGTCAGT TCTATCTACA ACCTCAAAAA TCAGTrCTAT CTACAACCTC AAAAACATGT TTTGAGCTGA ACCTCAAAAA CATGTrFGA GCTGACTTCG T'TAGTTTCAT TTTGAGCTGA C INFORMATION FOR SEQ ID NO: 266: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 1310 base pairs TYPE: nucleic acid STRMJDEDNESS: double TOPOLOGY: linear
CCATGAACCA
ACCTTCCAAG
AGGAAGCAGA TAACTACGAT CAGCCrATTT CCTTGATACC TGCGTAC cCC AATGAGCTCG GGCCAGCGAT GACAGGGCTG CGCTGCCAAG TATGTCACCA ACATGGCCCC CTTGGCTCA TCGTGTTCCC T'rrGACCTTG TGGTATGACA GATGAGCTGA CCAAGCCTTC GGAGCTGGTA GCAAAAAGGA ATTCCAGTCG TTATGCATAC CAGGGTGGGG ACTCAACGCC CAAAAAGCTC ACAGGCTTTG AAAGACTATA CACGTCGCCT TACCGTATG'r CATGTTTTGA GCTGACTTCG CTTCGTCAGT TCTATCTACA CTACAACCTC AAAAACATGT CATGTTTTGA GCTGACTTCG CTTCGTCAGk TCTATCTACA CTACAACCTC AAAAACATGT
TGTGTCCAAC
TCCCCATATC
960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2231 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 266: GAGTCAAAGG CTCCGAGGTT TCAGAAATTC TCTTCTTGA GCTTATGAGG TTCGCCTCAA
CGAGTTTCCA
TCAGGAACGG
TACCAATCCC
AGTCAACGAT
GCACCATTTC
TCCTAAAAGA
1294 GACTT TAC AAGGGGACAG GTGAATATTA TCTAGACCTG AACAGAAGGG AGCAAGATCT ACGCTCATAA CCAGAAGGAA GCTCrATGAG TTGGAGTCTA TC?1'GCCTCG CTATTTTAAT CGCAAACATC CGTCAGATTT ACTCAGTGGA CAAGTCCTTr CTTTrATCAG ACGCACAAGG AGGTTCATGT CTCACGGCAT AAATCTAAGA AACATGAGGT AAAAAACATG AAAAAGAAAG GTTTTAGCAG CTrGGATC'IT GCTGCAAGGG AATTTTrGGAA ATATGGCCTr TACTAGGTAT TGT -rTTTTT GCTTATAAGT CGTCATCTCA CTTCGGCAGT TTrTACAGGT TTACI'GGCGC CATTTGGTAT TGTTTTATTG TTCCTTCTTr GGATGGTAAA CCATTGAGTC CATCCTTAGA TCATCATTGC AAATTACGCT CTAGCATCTT GGTGGTACTT AAAAAAAATG GTGGTACAAT GTAGCGGGAC CTTCTATAAG
TTGGGGATGC
ATATTGAAGT
TGAAAGTAGA
AAACCTTGAT
AAAAAAATCT
GCTCTGATTT
TGTATATT
GATGTAGTGT
TGCTCAGGAA
TAAAATCTAC
GGCCTTCGGG
AACCTCCT
TATCCGTGGA
TCACTTCAAC
CAGTTCTATG
TCGATATTTT
CAAATGCTTT
CCATGAAAGT
TATGACTrGT TACCAG1'TAC GG'rGTTGGTT ATCTGACGCA GGGAAAAAAA CAGTCGTCAC CAAGATCAAG ATCTCGTAGA TATGATAATG CAGAGATGCT AATGCAACCG TCTATGTTCC GGTGCAGCTA AGGCTGACGC GATGTGGCTT TTGGGAAGTT CATCAAAATA GACGTACTAA GTTG'rTAGAC TTTAAAAAAT GGCTTTTACG TTTGATGTAT TAAAAAACGG ATGATATTGG CAGTACCTGG GTrTATGACA CAATCATI'CT CTTATTTGGG TTCAAGTAAG TTCTGGAATG GGATAAGGAA GTCGC-TT1G TGACCAAGTG GAAGTCGCTTr AGGTGA'NTT GCAACTTTAA ACAACACTGG CGTGTAGATT TCCTGTAGCC CCAACCAGCA GGAAATTGTC TACGTTAAAT GAGTAGGAAA T'rGATGCCTT GAAATGCTGC CTTTAA.AAGT CTATGTACTA CAGCGTAGAT ACAGTTTrN TGCCT'rTAAT
AGGGAGAATG
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1310 INFORMATION FOR SEQ ID NO: 267: SEQUENCE CHARACTERISTICS: LENGTH: 5922 base pairs TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 267: ACTCTGAT GATTGGAACG ACAGTCGGTG CCATTGCAGT TACTTCAAAC GTAACGACTT ATGTTGAGTC TGCTGCTGGT ATCGGTGCAG GTGGACGTAC TGGTTTGACA GCCTTGGTTG 1295 TAGCTATCTG TTCCCArr TCAAGCTTCT TTAGCCCACT TCTAGCGATC CTACCAACAG CGGCTACAGC TCCAATCTTG ATTATCGTTG GGATTATCAT GCTTGGTAC TCCATTGGGA TGATATGTCT GAAGCA C C'TCCTC~rT CACATCTA'rC TCAGCTACTC TATCACTCAA GGGATTGCAG TrCTTTCTT GACTTACACT TTGTTAAACG, TCAAGTTAAA GATGN'CATG TCCTTAACTA CATCAGCATG GCCTTATAAT TTAATACAaG GAGATAGCTG ATGAAAGAGA C-AGGCTGGAT 71"=GCTTT TTACTTGCCG CC'TCTAT?'r GACT'rTAAAA GAAGTACCCC TCA'VGAT'rrG GATTTTGGA'r AGAA'rCACCC AGGGGGATIT AAAATATGTG CAAAGAATTG TCCTTTTATA TCAGGTTCCC TGCTACAGTC AGGGCTGATA ?rGAAAAATA
TTTATGGGAT
TTGACTAAGC
GCCTTGTT'rA
CCCCCC'T'T
TTGAATCGTG
CTAGTGGT'rA
GTIGCTGGCC
'IrAGCTAGTT
TATCTAGTTA
ACGACAACAG
AGTTTCTTCT
TTrCAATTCT CGGTrCTGGCT T'rAATTTTTC TTTTTTTAGA ?TGTCCGGTC AAATATACTT CTAACCAGTC TCAGATI'AAT TGCTAGCCTI' GCTTGCTCCG AGATTTTCCG AGGCAAGGAG TGCATCAACC AAGTAAT'N'A C'GGACAGC CTACAAGACC ATGGGATTGC TTCTGTTTrG CTGTTT1AAAA GT TATGT CGTGTCCATC TTTTCTTT GGAGTGACCG ATATGTCAAG AATGTAACCC CAGACTCCTT CAGCAGGCTC GTAAATTGAT CTATTTATTA TGGGAGCTCG GCTAAAGATT TGGCACGTTT GGTTCCATTE' TATTGCAACr G.ATATGCTTC AAAATAG'TTC
TAAAACCAAG
GGGC TGAGT
GTCAAATCAG
GTTGATTTCC
ATTTGTGAGG
AACrTGGGAT
CCTTCTTTAT
CAACGTTTG4G 'rTGGCCTCTTG
AGGAACCGAC
TATGGTAAAA
AAATCTTGTG TCCTGGGArr CTTCCTAAAA TTGTAGTCCG TACGATTGTG TTTGCTTTAT TGATTTATGG AGGTATGTCG ACAGTTCTAT AAATGTCGAT CTTGCTTCAC ATGATTGTTA TGGTCATTAT GAGTCGGACA TTAGGAATTI' CTCTTTCTAC CAGGGAAAGA TGAATGCAAT TAGAAAAATA ATA'rGATGAA AATCCTTGAG TAAAGCCAA'r CATGCAAAGA CAG'rTATTTG TCGGACGGT GGTCAATTTT ?TTGCTCTTGA AGCAGAAGGA GCCAGTATGC TAGATATCGG
CGGAATTATC
GCAGGCGCTC
CGGAGAATCG
ACTCGGCCGG GAAGTAGCTA TGTTGAGATA ATCAAAGCGA 'rTCGCAAGGA AAGTGATGTC GTAGCAGAGG CTGCTTTGGC TGCTGGTGCC GGTGATGAGA AAATGGCCTA TG'rGGTAGCT AACCCAGT1TA TGGCTCGACC TCAGCATCCT GGTcAAAccr TTACAGAAAA AGACTTAGCT GAAGAGGAAA TCCAGCGTGT T'rrCCAGTG CTCATCTCTA T'rGATACTTG GAAGAGTCAA GATCTAGTCA ATGATATCAC TGGTCTTATG GAAGCGAGAg CGAAAGTGGT cATCATGTTT AGTTCGCTTA TCT-rCCCTCA TTTTrGTT GACT=GAAA CAT-TGCCAAT CGAAGAcTTG 1560 1620 1680 1740 1800 1860
ATGGTGGCTT
AATATCCTGT
?rACG-GGACC AAGCgATTTG 1296 TCT'rGAACG AGCACTAGCG AGAGCGGCAG AAGCTGGTAT TGGATCCAGG AATTGGCTTT GGTCTGACCA AGAAAGAAAA TGGATAAACT ACATCAGAAG GGCTATCCAA TCTI'TCTCGG TCATCAATAT CCTAGAGGAG AATGCTTG AAGTCAATCC
TGCACCAGAA
TCTGCTTCTT
AGTGTCGCGC
TGAGACACAG
TGCGAGACAG
'rGAAATTGCC
TAAATAAGAT
CGCATTNTGG
CTGGTT'rCC GAAATCGGGA CACCGCTTCG GGTGTAGAAG TGGTGCGCGT CCATGACGTA TCTGCCATTC GTCTGGCTGA TGAAGCGGAA GAAAGAAATT GAAAACAATC AGTGGA'rTGC CTTGG.AACGA ATGGTGGALAC TGTTAGC~rr CCI'CCATATC GGAGGGACTA ACGGCAAGGG AGAAAAGCTA CGGTTGAGAG TTGGCCGTGC= CCAGATT-AGC ATCAATGGGG AATCGATC'TC CTATCAGTCT TTGCTGGAGG GAGAAGCGGT GATTATCACA GCCCTGG-CCT ATGACTACTT GGAAGTTGGC ATGGGTGGAC TTr'1?GGATAG AAT'rACAACT ATTGGCTTGG ATCATGTGGC AGAGCAGA.AG GCAGGTATTA TCAAACAAGG AGAAGCCTTG GCTGTGATTG ACCGCATTC CGGGACAGAT TATCAGGTTC cGrCATCAAGA TACAAGTGCT GTCAGACAAG GTCGCTTC-A GAATGCTGGG ATGGCCATAG CTTTACTTGA GCGTGGCAAT CCCCATCTCA AACTCAAGGT CTCGCTATT GCrTTTTGA AAAAGATGCT TAC'TCGCCC TATCTCA-TTC ATTACACAGA AGAAGCGAGG CTAGAAGCI'C TCATGGCAGA CGCCAATTTA CAGGGCACAA CCGAGTTTCA TGCCTCAGAC CAACTAGATG TGGCCATCAT TACCAATGTC TGTCAGCCCA T'TTTGACAGG TCTACTTGGT GACACCTTGG AGGTCATAGC GATGCCCTTG GTAACAGGGC GTATTGCTCC GCTCATGTAA CTAGTATCGC GCTAGTCACA GGATGGCAGT AA~rrAGA'rT TAAAACAATA TAACTACCCG ACGGATCAAC 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3360 3420 3480 3540 3600 3660
GGAAGGGAAA
AAGTGTGGTG
GACTAGCCI'G
TACTTTTrTGT
AGCAAGCAAT
AATCGTGTCA
GGCCTI-GTTG
TTGTATCAAA
,C GAGCI'ACT CGCAGCrAAG
GACACATAAA
GAT'rTC?1'G GTCAAGCC?1' GGAAGAAACA AGAGATCCCT TGATGATTI'T GGATGGAGCC GTJ*ACCT'rGC AAGAACGI'TT TGCGGATI'AT ACCAAGGCCT TGGAGGATAT GTTGGACTTG cTAAcCATTr TTGCGGATAG TCGGGCGACG TCTAGAAATC TCAGCTACCA AGATTGGCAT AAAGAAGAGA AACAAACAGT TAGGATTGTC GATGCGCCGA GACTTGCCTA ACAGGGGAAC TCTTTGACTA CTTGGTTTGT ACCAAATAGA CAAGAAGATG GTCGAGAGCT AGTTGGCCAG GGCGTTTGGA CACAATCCCC ATGCTATCAA CATAAGGAAA TCCTCTTCAC CTGGGAGCCA TGCCAGTTAC GATGAAAACG TGCTGAAAGA GATTTTCTAG AGCAGAATTT ACAGGTTCCT TGTATTTCTT GAGCCAAGTG AGGGCCTATC TGATGGAGAG GAAGAACGAG AA'rGGATACA CAAAAGAT'rG AAGCGGCTGT AAAAATGATT ATCGAGGCTG TAGGAGAGGA CGCTAATCGC GAGGGCTIGC 1297 AGCAAACACC TGCTCGTGTA GCCCGTATGT ATCAAGAGAT TTI'TTCAGGT CTTGGTCAAA CAGCACAGGA ACATTTrGTCA AAATCCT'MG AAATTATTGA CGATAATATC G7=GAAA AGGATATCTT 'TrrCCATACC ATGTGTGAAC ACCACTCTT GCCA~wrMAT GGTAGAGCGC ACATTGCCTA CAT'rCCAGAT CGTCGTGGG CAGGCTTGTC TAAGCTAGCC CGTACGGrG AAGTTTAT'rC GAAAAAACCA CAAATTCAAG AACGTrGAA TATCGAACTG GCCGATGCCT TGATGGACTA TCTAGGTGCT TGAGTATGCG TGGTGTTAGA TATTTGAAAC AGATAAGGAT AATCCGCTTC AAGCGGAT?'r A'N'TGGAAAT G?1'TGCCTAT 'rTGTCGTTTC AGCCATCCTA CAGCCTCTGT CCA'rTACGGA GTGAAGATTT GATTGAAACG AAAGGAGCCT TTGTTGTCAT TGAGGCGGAA AAACCACGCA CTGCAACCTT GACGACAGTA CTCCGTGACC AAGCTTATCG 7T'rAATGGGG
CATATG=GA
GCTCGTGGTC
CTATAAAAAG
TTCI'AGAAAG GAA'rCATTAT GGATCAAC7'G CAGATTAACG CATGGTCTTT TTCCTAGTGA GAAACAATTG GGGCAGAAAT TCCTATGATA TGACCAAGGC AGCTACAGAC TTGCATTTAA GAATTGTGTC AGCAGTGGAC GAC'rTGG'rTT CAGGAAACGA GTAGCCTATA AACTGGTGGA ACGTACCTTT GAGT'rN'ATC
CTCTTGTCCA
TAGATACTTG
GCAATATGGG
GCATCCATAT
AGGATAGCTT
TAGAAACCTT
CTCCTTTGAT
TCATATTGCC
TGCGCCTCAT
GAAAAAATAG
TCTTCGAAAA
GGCTAGCTTC
TGGGAGGAGG
AGGGTCGAAA
GTGGAAGGTA
AAAGACATTT
AGAAATGAAC TTGGAACTGA AAAAACCTTG CTCGGTAACC A'rTCATCGCC GCAAGCAACG AGATAAACAA GCAAACTTGA AGCAAGCCAT TC'rCAAAGAG TCCAGTGTCT TAGCGACGGA TGCCAATCAA GTGGTTGAGG TGGAALACCTG GTTAGCCATT GAGTCAGAGC TCGGACGGGT TGATTTGGAC TTGCTC=TTG TGGAGGACCA TCATCCTTAC ATAGCGGAAC GCCTTTTTGT TTTATCCA'rC CGATATTAAA ACAACCGATC AAAAACTCTA GTTTTCAGTT ACTTGCAACT TCTCTTCAAA CCACCTCAGC GTCGCCrAC CTAG7TTTCCT CTTTGATTTT CATTGAG'TAT GGCACCGGTG CATTTGTCAC AGCCTTTATC GCCCTAGGAA TGACAAACTG CGAGCTCGTG GCCTTGCGGT GGAGTGGAGC GCTACCAGCA CAAGACTTGT GAGAGAAGTG CATTGGGGAC GATCCTTTAT ACAGACGACC CCTTGAGTCt TACAGGAAAT CGCAACTTGT ATGATGCTT GAAGGCTAGA GTTrTTTATAC CGTACTCAAG TACAGCTTGC TAAAATAGGT CATTTTCTTC 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400
ATAGI"=C
GGATGGTTAA
ATCTTTCCTT
TACCGTCCAT
AGTCAAAATC
GGTTATTAAG
GTCTAAAACC
AATGGCTGTA
CAATTGAAAC
ATACCAAAAA
AGTACTC?1'G
GGCAGG;TGTT
GGGGATAACG
GACTTGAAAA
TCGAGrrCTT CTTCCAA'rrC GAGTCAATGA 'rGTCATCAGG TTTAAGAAAT GGTCGATGAT CAAGCTGGTA ACAATACCAA TCTGTCCCCT TTCTTTTCCC ATTTTTTGAA AAGGAT?1'TA CGAAGTTAAT AGGTAAATI'C AGTTACCGAA AAATATTr TAGGCGGTAT TGT'PTACCCC GAAGGAACAC CCATCACCGT AACAACATTT ATGGCTAAAC 'rGATGTACCA CTTGGACGTC 1298 AACTAGCAGA TCGCATGTGG GTATTGGTAA AAGCCATATC 'rTA'rCATACA GCAAATAGGA TTAAAAATCA AGAAAAGGTG GTTACAGGGA GAAATAGGGA AAAAATrCCr AAAAATCTAC CCAAATTAAC I-rGATrATAT AACTrTCAGT TACTTTGAGA CATATCTATr GAC1TN'TAGG GGTAAAATTT GGTATGATAG ATTrGAAAGG CCCCGGAACC TTCCAAATAC TT'rTCGATGG AAACAAAAAT CGAACTATAT ATAGGAGAAA TCATGAACAA CAGGCCAAGT TGAACGTAAA TGGTACGTAG TTGACGCAAC TTTCTGCAGT AGTTGCTAGC GT 5460 5520 5580 5640 5700 5760 5820 5880 5922 INFORMATION FOR SEQ ID NO: 268: SEQUENCE CHARACTERISTICS: LENGTH: 1988 base pairs TYPE: nucleic acid CC) STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 268: TAACTATCTA CGATGAGCTG TTGTGATTCT CATTAGTTCC CCTTTCCCAA GAGGCATAGG GGTGCGCATA ATAGATGTGC TCCTCAGAAA ATATATCAAA. CAAGCGATTG AATTCCGTTC CATTATCTGC CGTGATGGAA AGAATCTTGT GTTGTTTTAA GATGAGTTTT AGAGCCTGAT TGACCACCTC AGCAC~rTTA TTTGGAATCA ATCGGATGAT CTGATGTCTA CTCTTTCGAT CCGTCAAGAC AATCAAGCAG TAGTTTTTCG ATCTCGTAAG TAGAACCGTA TCAATCTCAT AATGCCCATT CTCCAAGCGA AGATTGATAG CAGGTTTAAA GTTGGTGCTA GCCTGTTTCT AATCCTGCTr GCTTAACCCC AA7T=TCCAT CCACGTTAAC CCCTTTAGCC ATAACCATCA GGAGAATCTT TTCCTTTAGT TCCTTGGTCA !rT7TTCATA AGACTGI-rGA GCGTAGTCGG CAAGACATG TCGGACTGTC CCACGCTTGA AAGTAGAGAG GCAATNTCTC TATTTGATTT ACGGCTATCG ATrGTCAAAT GTTTGGCTTT TTTC?1'GTGT ?I'GTGGTTGA ACAACAAGTA CTTCAGGCCG CTGTTCGATG GATTGACCAG TAAGCGCTTT TCCTTTTCTA GGGTAAAGCA GATGAATCCA ATAGTAAATG GTTGAAATTC 'N'TCAGGCGA AAATT7TTGG TTATGATAGT AGCTTGATTT CTTGACCGAG CGCTTGCGAT
CAGAATAAAC
TTTCAGTGTG
TCCTTCTT1'
CTCTTTGAAG
ATAGTTTGAG
TTCCATCTTT
CGCCCTTTTC
GAGCTTT TCC
CGATTAAGCG
TGTAGTATAA TTGTCTTGCA TCTCTGTGCC TAACACAGAG GTGCTTTCTT ATGCCTACA-A GAGCTTTCAT TATTTCCATT TTCTTrGGA ?TT ACTGAAG CTAGCAAGTC ?1'ACCTGTAA GGGAATCTCG GATTAATAGA TAGAGAGITT CAACArCACT TCTTTTCTTA ACCTGATAAG TATTGTCCTC AAATCGTCT AGAAAGACAC TAGTTACTCA TATACATCAG TCCACCTCCA ACCTCCATAC 'rCACCTrACA GTCTTGCAAA GAAAAAACTT CTGGTAAGrA TTGACAAAGG TTGCAATAGT TTTGAACCGA GATTT-CTAGC CAATTTCCAA ATCTTCGACA TA.AAGATTAA AACACGAATC AGATTCCCTA GTATAGTGAA ATGAAATAAA CAATGTCTA GAAATCAAAG AATAAAAATC AAAGAGCAAA GGT'rGTAGAT AGAAcTGACG TGACGAAGTC GGCTCAAAAC
AAAACAGG
CCTTGTAGCC
CAAAACAACA,
TTCCTTGCCC
AAGTCTCTGC
GTGGCTCACT
TTTCTTGCGA
ACATGCGCAA
TGTACTATTT
CTAGGA.AACT
AAGTCAGCTC
ATGGTTTTGA
GAT1TCTTTTG TTTC-ACTCTA TTCTGAAAAA CTTGTGTATA ATTTAATGAA AGCAACACAA AATCCGAGAG 7rAGTrrAAA TAAATTG?1'T AAAATATCAA TCTTGAT'rCC TAATTTTGGG CCTACOATTA AATI'TCTAGG TTATAACTGG TATTTATCGA TACTTATGTG CGAGCCTCTC TT1-GTAT'TAT GTAATAATAT C1TTTGCCTAA TGTAGAGACA ATGTTTCTGA TAAAACTCCG GTGCCTGGAA ATTTCGATTC TTAGCTTCAC TT'TCTGCCTC TCGCAGTTCC TCTTTTACAA ACAAATACTC TATCAAACCT GCCAGGAGAT TGCCCTTTTC TTCAGCCTCT TCTCTTTTTG AACGGTTATA TTNATGTGAT TCCTTATTTT CCAATCTAAA ATCGATTAAG GAATTTAATC TAATTTCTAA TAACTTCAAT GCACTATACA TCTAATACTC AGCCGCAGGT TGCTCA.AAAZ ACTGirTTGA AAAACATAGT TTTGAGGTTG TAGATGAAAC GGTTGTAGAT GAAACTGACG AAGTCAGCTC 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1960 1920 1980 1988 INFORMATION FOR SEQ ID NO: 269: SEQUENCE CHARACTERISTICS: LENGTH: 709 base pairs (8B) TYPE: nucleic acid STRANDEDNESS: double CD) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 269: CCGGATATTT GNTTTATGTA ATrTTCTTGC AAGTTTCTTC TTAGTAGCTT GTCAGTCAGG TITCTAATGGT TCTCAGTCTG CTGTGGATGC TATCAAACAA AAAGGGAAAT TAGTTGTGGC AACCAGTCCT GACTATGCAC CCTr1'GAATT TCAATCATTG GTTGATGGAA AGAACCAGGT AGTCGGTGCA. GACATCGACA TGGCTCAGGC TATCGCTGAT GAAcTTGGGG TTAAGTTGGA AATCTCAAGC ATGAGTTTTG ACAATGTTTT AGCAGTrGCA GGAATTAGTG CTACTGACGA ATACTATGAA AACAAGATTA GTTrCTrGGT TTrAACTACC CTAGAAAG'rG CTAATATTGC GGTCAAGGAA CAATTGCCAA AAGTTCAATT CAATGAATTG CAGGCTCGAA AAATAGATGC TTATGCTGCT AAAAACGCTG GCTTAGCTGT CGACGCCAAT GCCGyTGCTC TTAGAAaA'rA INFORMATION FOR SEQ ID NO: 27 SEQUENCE CHARACTERISTICS LENGTH: 1680 base p TYPE: nucleic acid STRANDEDNESS: doubi TOPOLOGY: linear 1300 GACCAGTCTT CAAACTGGTA AGGCTGACCT GAGAAAAGAA. 
GTCTINGATT TTTCAATCCC TCGTAAGGCT GATGTGGAA.A AATACAAGGA AGCCCAAAAA GGGACTCTTC CAGAATCAAT AACTTCCCTA ACTAATATGG GTGAAGCAGT TGTTCATATG GATGAGCCTG TTGCACTTAG CGCAACTCTC AGCTTGAAGA TGAAGGACGG CTGATGATTT GAAAGAAGT (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 270: TATAA6AATGT TAAGTTAAAT TTTATTTTA ACAGGAGTGA TGTGATGGGG CTATTCTAGC ACTTGGAAGA GTATGAAAGC ATATTTATGA AGACGGGGTG GTCAGTTGTA TAGAAAGTGT GTGC'N'TATG ATATAATGTT CAAATTTCTA CTCTTCGACC GGCTTTGACC CAACTTCTA.A TTACGTTCCT ATGAACAAGG AGAGCTGGTT AACACGCGCT -TAGTTTTCTT GCCCAGCGTT CGCTCATGAT CTCTTGGACA TGGCATTACT GCCATTCAGA TCAAGTCTTT ATCTCAGAAC GATTGGCCAG CAAATTGCTG GATTTCAAAA TTCAGAAAGG AACTATAGTG TTTCTAAATT TTTAGAAACC TTCAAAAATT ATTTAGTTTA TAGGAATTCT TTCGATAGTT AGTATGTTC TCGAATTTTT TTAAGTGATT CTTAATGAAT TTTCAGAAAG TCGACCACAC TCTTCTTGAT AAGAAGAAGG AGTTGCGGAT CTCTCTGGAA AGACTTGGAG TTTCTCGTTT ATTTGCTCAT ACCAATTTTA CCTCGCCCAG GCCTCATTGA GCGTGATTAT CAGGACGPT GGCTCAATCT AGTTGCAAAC TCAAAAGCCG GA'N'TAGTAA AGAAAAGACG GATTGCTTTA TGCAGTTCCT GTGA.ATCAAT CAAAACTGAT AAAATTTAAG GCA6ATCAATT AGGTCTAGAA TTACATATAT TATTCTGAAA GA TTT GAGCT AAATTAGTTA ATTGTATGAG GAAAACCTCA AATTGTTCTA TTTGATGCTG CTGAGGATGT ATTCAGGCTT ATA.AAGAT'.A CTGAAGAAAA TCAGTAAACA TTTGGACAGG AAAAAGACGG CAGGGACAAA CACTATCGGG AACTTCTATG CTGCGACAAA GGTCTAGCAC CTTATTTCAA GATGCTCTI'T TTTATGAAAA CTGATGATTG GAGATTCTCT AACCGCCGAC ATTCAAGGTG GCAATAATGC GGGGATTGAC ACTATCTGGT ATAATCCTCA TCACCTCGAA AATCACACAC GCTGGATTIGT TTAGATAAAA TGACTACAAA AAAGCTAATA TAATGAAGCT GTTAATTATG TcAATCrCTT TTrAAGTTr GAAATCTCCA TCATTAGTCA TAATCTTAGA GTAATAGGAA CAATATGAAT CGTATGGCT AGATCATGTA GCAAAATATG TTTAAAAAGA GCGATTTTAC AAGCCCAGCC GACTTACGAA GTCrATTCTT ACCAAGACT ATATTCIGA AAAGATCACA TTTTAAAGGA GACGAGCTAA TT-ACTATTIGA AGAGTACAT GAAATGTrCTG A.AGTTGATTT AATTrACATC TGACACTTGT CAATTAGCAA ATAGTATTA TTGATAAGAA AAATTTCTCT GGCGATTTAA ?TTACTrG AAGAAGGGGA TTATATTGGG AGAAGGGAT CACAAGTAGA ATATATrCC GAATTATCTT ACTAATCGAA AATATAGCCT GTATGGGAGA TTrTCCTCAT GACrTrTTG ATATATACCT CCTACGAACA AAAAGT-rAAT AATATTAAAG AGTATTATCC 
ACCAAGAGAA TGCATTGTAT TrCGATTTT TTCTAATTT 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680
TGACGACTTT N'AGAAAAAA ATTATTTAAA GACTATATGG CAAGTTTCTA AAGAAACTCC
INFORMATION FOR SEQ ID NO: 271:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 598 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 271:
AGCTCGGTAC GTAGTATnTG CCGATAAGA.A CACCGGTAGC CCTTGATGGT TTTTAGATAA TTTCTACCAG ACCGTTTGGC CTGATTGGGA AATTTGGATA AATCA.ATGCT GTAGGTCTGA AZCGCTGAA'r CTrGCCAATC TTTCTCT'N'G GTCCAGATTT GAGGTGCTrTG GACAATCAGC TCATAGTTTTI CTCTCCTTTr CGGCGTGAAG AGTAGGTGAA GCTTTCGTTT GTGAgCTTGA GGTGAGAGAT GGAATCGATA AAATIrGGTTG GCAAGAGTTT AAGAACCTGA CTCGCGGTTT CTGCTAGAAC CTTCCGATTC TCAACTAGAT AGACCTGATC ATCGATPTTT TCTGCGAACT CGATGAC?1'T CTGGACTTTT TTrTCCTCCT CGTAAGTCTC ACTAATCTGT CAGTrATACA AGGTTGTGAT CACT'rCCTGT
TTCAAGAAAA
TTGATGATTT
TAGAGTCGCA
TCGATATAGA
CCTTTGATTT
TCGGTTTCTT
AGTTCTACTT
ATATCCGG
TGGTGCATAA ATGAGTGAAA AGAGGATAGA GAGGATGAGG TGCATCGTGA AATACTTGTT TTTTCATAGT TCTAATTTCT
INFORMATION FOR SEQ ID NO: 272:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1099 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 272:
CCAGCAAATC AATAACTGCA ATTGCTATAA AATGGATTCT ATAGAG~rqr TrCATGACAA GACCTCCCTC TTTTATCTAA CTTCATTCTA GATAAAAATA GCAGAAGGGA GATTCTCTrA TTCTATTATC TAACTTCTTC ATCATTCCAG TAAAAGATGT ATTTACCGAT ATTGGCGAAG ACGAAATTGT AAATCAACCA AAAGCCCCAC TTGGCAATCA GGGACAGTGC AAAGGCAATA CCCCCATATC CGATATAGTT GGTCACAAAG CTCCAAAAGA ATGGGAGTTA CAACTAAAAT AGTTGGCTAG TATTCTTTAT TTGAGTTTCC ACAAATAAAG CTCCGATTGC ATTGAGGATA TTTCCTrGAA TACCAGCTT TGTCAGCTGA TGAGTTGTTA GTTTTAATGC ATTCAAAGCA GTTGTTACGT AGGCAAGGAG ATTCATCTTG GCAAAGAGGA AGGCGATGAT GGAAATGATG
ATGGCCGCCA ATTTTACCTG TTTTTGGCTC AT7'rGGTTGG GTCTGCCTTC TTGCGAAGCT TCCCACTTCT TTATAGCAAA GGTATAAATG AGGAAGGTGA CGGGATAGGT AATGATGGCC GCCTTATTTC CAAGGATATA ATCAATAGCA CCGGACAAAA TGGTATTAAC AATACCAAAG TAATTTCCCC ATTTGCTrAA TTTCCCCGTG AAACGAGTGG ACAACATGGA AATCCCAACG TTGGTTACGG AAATCAATCC AAAGGGTACA AGAGCTGTCC ATGATCCCCA GTCTACAAAT TrATCGAGGT GTGAGTTGAG GTAACCAGAT GCAATCGCAA TCCCAACCAC CAAAGCAACC CCGAAGAGGT CAAACTATTT AGATGTAGCA AAAATTA GTGATTTTTT CATAGGTTAA ACTACCTTTC TTTTTTTCAA ATATTCTCCC ACCAAATGAA AGTAAAATAA AATGATAGAA ATAAAACCCT GAAAATAAAG GTTCTATAAT ATTTGTAGTG GGTAAATCCA CTATAGATAT TATGGAGCCT ATTTTATTGT AGAAAAAAAG TCCCATATGA CCTATAATGA AAAGCGACAA AACAACTCAT TAGAAAGAT
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1099
INFORMATION FOR SEQ ID NO: 273:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2723 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 273:
CTGGGATTCA
TGAAATrcAA AGCrCCTATC
GCAATTAGAG
CCTCTATG=
CGTGAAAAGG AACCCCAGAG TA'rrACCATA GAGAGTCAAC GAGGGAATCT ATAAATACTT 1303 AGTAGCCAGG TGTACTGCTA GAACAGTGAG CCAGATACCT CAGGCTrrAG r'rGAAAATCA TAGCCTTAGC AT1GCCAGACT ATTTTTACTG
AGTCTATGTT
ACCAGCAGGA
AGTrGATTTAC
TCATACTCCG
TGAGACAGTT
GGTTCAGGTG
GATTGTGGGT
TGCTAATTAC
CGAAGAGGGG
GACCACCAAT
GCTTAGTCAA
GTATAATACG
AGATATCATT
CCTCCTCAAA
TCCCAAGTCC
CAAGTTTACC
CCGAAAGCTT CCCC=ATAT ATCAGTCTCT CGAAATGAT'r rT=AACTGG CGTGGCCATT TCTACTAAAG ACAAACGTAG GkkAGAXAAA AA'rACTr"G CCATTCC-AGT GTCAGATCCA ATCTCCTTGG ATCCTGCTGT TTTATACCAT ATGGCAGTAA CAGTGACCTC ACCI-rT'rGAT GATAAGGAGA GTGAAAATTG GCTAGTTGGC GCAGTTCCTA AAAACTTGT T-rrACAAGGA ?TGAGCCTTC TCTTTATTrGT CATTCTTTAT CAAAAGCAGG TAGTGGATTT AGTAGAATCC CGTCGGATTG ACATTTCCGA GAAAGATCAG GATATGTTGG ATCGATTGGA AAAGAATATC AAAdATGCCiA ATATGCGAGc CTTCGCAGGCG C1'GGAGTCT TGCGCATGTA TGCAGTTATG TATGAATTCA GTACTCTCITr GCGTAACAAT CAGGAATTAG AATTTTGCCG TA.AATACAC CTGTATGAALA ATGrTGATGA GC rrCAAG ATTACAAGGA ATCAGGGCTG AGGAT'ITCAA GTGTCAGATC AAGACTTAGG GCCA?1'GATA ATAC'TAGAGG ACGGAGATIT 'rTCATATGGG TTA.ACTTCTC ATGGATATCA ACAG1'GACTA GcTrCTCr CTGACTTTGA GGCAGACTTT A7-iCAAGTCA T'rGCTCkAGG GAATTACTCC TA.ATCGCGGA CATGATATTT ACCAGTTAGA 120 240 300 360 420 480 540 600 660 720 780 840 900 950 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740
CAAATCAATC
CAGAGTCAAG
ATTTCCGACG
TATCTCTGCA
TTAGAGAATA
CTCATrrTAT
ATGAGTTGGC
AAAGAGAGAC
'rGGTTCGCTA
TGAAGATTCC
ATGCCTATC
'ITGCAACCGC
GTTTCAAGAT AGATCCAGAG TGGI'AGAAAA CTATTTCGCG GACAGATAAT GTGATTAGCA TCAAGGCTCT TAAACAGGAT GGTCGATAAT GGTAGAGGAA TGTCGGCTGA AAAGTTGGCA TCAGAGATAT 'PrrGAACACC AAGCCAGCTA CAGTGATCAA CAATGTACAC GAGCGM=rG TGCTCrATTT 'rGGAGACCC TGCAGAGCAA GCCGGTG=C AGTATCGTAT TACAATTCAA ATGTATAAAC TATTATTAGT AGATGATGAG TACATGGTGA ATTCCCTTTG ATAAGTGGGA TATGGAGG1TC GTCGCAACAG CATGGTGTTG ACCACAGGCG GGTTTTGTGG AAATTTTGGT AATATCCGAG AAAAATTAAG AGGCAGTCI'A TCGGGATTGT TATGCCATTA CTATAGAGTC GATGAGTAGA AAGGGAGAAA CAGAAGGCT GAAGCGTTTG CCAGTCATGC CGATGAAGCT CTAGAATATG 'rTCAGGAAAA 'rCCTGTCGAT GTCATCATTT CCGATGTCAA TATGCCAGAC 1304 AAAACAGGGc TTGATA'rGAT TCGGGAGATG AAAGAGATCT TACCAGATGC TGCC'TATATC CTGCTCTCAG GI'ATCAGGA CN'TGATTAT GACTATTTGG TCAAGCCTGT TGATAAGGTA GGTCAGCTCG GCGAGAGAGG GAAGAAAAGT GGArrrGTTA GTrATTTAGG GGATAAGGAG CAAGG?1'CC!T TCACCA'PTCC CTACTATGTC GGCCACCCCC TAGATGGTTT AGTCGTTACA GAACGCTGGA AGCTGAATGC TGAGAAAACC TCTGAGAGTC TCTTTGCCTA TTACGAACCG AA'CAAATCG TAGAAGAGTI' AAATCTCTTG GTT1TCGA'PTA CTAAACAGCr TTPTATCCAG CATCTCAAAG CTGATGATAT GACGGACATT GATGAATTGG TTTCTTATAT CAAGGAAACT AATGAAAATG TGGTCAGTGT GCTGGAAGTC GTAAAAAGAG CAATGAACCT TAGTGTGGTG GAGCTGGGAA ATCTGCTGGA GAAGATTGCA CAGACTCTTA GTCAAGAATT AGACGAGGCT AATTGGTGGA TAGGTCTATC CAAGGAAAAA TTGGGTCAAG ACTGGCAGAT TT TCATTTCT CCTTTGAAG CTCCTTATCA AGAACACT CTCTIr'rACG GTTCTGTAAA TCTGCAGCAG ATTTATAGGG TTATCATTCA GGGAAATCTC GAGAAGGTAG TrCTTGAAAA TACACCTCGT TTTGTCATGG ATGTT TPTCCA TTTATTTGAA GTCAAAACCA TTCATGCTAT TCAATCCTrC CTGATCAGCT TTTTCGGTCA ATACCGTATG ATTGGTCGTG ATTACCAAAA AGAGCTTTCC CTCAAGGATA TCAGTAAGGC CCTCTTTATC AATCCTGTCT ATCTAGGGCA GTTGAT'rAAG CGTGAAACCG ATTCGACCTT TGCAGAGTTA CTAAACAAAC AACGTATrAA GGCTGCCCAG CAGCTCTTGC TTTCAACTAG TIGA INFORMATION FOR SEQ ID NO: 274: SEQUENCE CHARACTERISTICS: LENGTH: 836 base pairs TYPE: nucleic acid C) STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 274: CCGCAGTITT TTTAAACCGT ATATAAGTAT AGCATAGTCA AAAAAAGAAT GCAAGATTT TGCAAACTTT TrTAAAATTT TTCGTAATTT TTCTTTTA.AA GTTCTACTGT 
CAGGACTTGA CCTrGCI'TAA CAACCTGTTC TCCGGCGATA TAAACATCAT CTACATCACT AGATTTAACT GCATAAACCA GGTGAGACAG CATATTTTCC TGAGGTTGGA GATGAATT'PT CCCTTGTGGT TGAATGACCA GAAAATCTGC 'rTGCTTGCCG ACTTCCAGAC TTCCTATCTG ATTI'TCCATT CCAAGGACCT TAGCCCCTTC GATTGTCAGT ACCTTGAGAG CTGTTTCGAT TGGAAACTGG CTGGCATCCC CACTTTTCAT CTTCTGAAGA AGAGCTGCAG TCCTTCCTTC CTCAAACATA 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2723 120 180 240 300 360 420 TCTAGATTGT TATTCGAAGC AACCGAGTCA TGGAGCTGGA TAATTGGAGC AATTCCTGAT GCGATAGCrDA CTTGAGAAGA TGCCAAGCGT 'rGAGCAAATA CGGACGGATG ATCTAAATAA TTGCCGTATC GTTTGAGGAT AATTCCTGAC AGCGGAATAT TTAGCTCTTT TGCCATT1'CC CTATACOGAG AATGAGGTGC TACCATAACC INFORMATION FOR SEQ ID NO: 27 SEQUENCE CHARACTrERIS'TICS LENGTH: 2335 base TYPE: nucleic acid STRANDEDNESS: doub] TOPOLOGY: linear 1305 GTCGCAATTC CGACTGCTAC TCCCGCTT GCCAGI-1-GA GGTTACTGAT AGGATTGTGG TCAATTTCTC TCTCGTTTAA TTCGACCCCG CCCAGTTCTT CAAGAAAAGC AAGGGGGCGT TCCTCCTrGG TCTCCGCCAC ATGGACATGG AAACTCGCTT CCAGCAAGTC TCTACTGCAG TTGAAATTTG GATTrMATA TTTTAA (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 275: ATTTTATTTC ACTrTTTTAGG TCGTCTGGGG CTA'rrCTTAT ATAGCnTC.AA GACCATGGGA GACGGTTTrAC AACAAGCTGC 'rGGAGATCGC CTGGGTTTT AATCCTTTGT 'rrGGAGTTCT GGTTGGTATT GGGATGACTG GGTGTAACAG TTATCACAGT CGGCCTGGTC AGTGCCGGTC ATCGGGATTG TCATGGGTGC TAATA'PTGGG ACAACTGTCA AAATTAGGTA ACTATGCCCT ACCTATGCTC TTTATCGGTG AAAAATCGGA CAGTCAATAA TATCGGACGC ATCCTCTTTG GCCCTCAATC TCATGAGCGG CGCAATGGCI' CCACTCAACG TATATGATTG AGCTAAGTAA GAATCCTGT T rGGGTGTCT CTCTAATTCA GTCTAGTTCT T TTAACCTT ACGTCAGGCT CATCCTTTCT CATCGGTTTT CCGTCTGTCT TT'TTrACG GTGTCGGTGG TATCT!TTTTT ATTTACAGGT CTTAAGGAC TTGTCGCTAC TGGCTTGACC TTGCTAATTC AAGCTTCTTC GGCTACCATT GGGATTTTAC AAAACCTCTA CTAATTGATC TACAGGGAGC TS'TGCCACTT CTATTTGGTG ACAATATCGG ACAGCCATCA TTGCCTCTT AGGGGCTAAT ATTGCAGCTA AACGGGTAGC
CGCCGGCAAT
GACAACCATT
AGGAGCTCAT
*.GTTGCCTTCA ACGTTATCGG AACAGrrGTC CTGATTCATT GGTr'rGAAGC TACGCTAAAT CACGGA&ACCT TT-AATATTAC CAACACCATT TACTTTGTAA CCAAGATT-AT TCCTGGACAG TGCGTAI'rT TT-CTAGTTCC Tr'rTACTGTC CTAGCACCGG AA6ATGACCAT CGCCTTTGCT GTCCAATTTC CA'PTTATCGG AGCTCTGGCT GACGAGGTTG TCAAATACGA ACCCTTATAT 1306 CTTGATGAAC ATTTCATCAA ACAGGCCCCA TCTATCGCTC TAGGAAATGC TAAGAAAGAG 1020 CTCrTGCACT TAGGAAACTA CGCTGCTAAA GCCTTTGACC TTrCCTATAA GTACATCATT 1080 GACTrGGATG AAAAAGTTGC TGAAAAAGGG CATAAAACCG AAGAAGCAAT TAACACCATC 1140 GATGAGCAAT TAACACGTTA TCTCATI'GCC CTrTCA.AGCG AAGCTCTCAG CCAAAAAGAA 1200 AGTGAAGTGC TTACCAATAT CCTTGATTCC TCCCGTGATT TGGAACGGAT TGGAGACCAC 1260 ACGGAGGCTC TACTCAATCT GACTGACTAT CICAACGGA AAAATGTTGA AT'rTTCTGAT 1320 GCCGCCTrGA AAGAATrAGA GGAAGTTTAC CGCCAAACTA GTGAC?1'TAT CAAAGATGCT 1380 CTGGATAGTG TGGAAAACAA TGATATrGAA AAAGCACGCA GTCTTGTAGA ACGTCATGAA 1440 GCAATCAATA AGATAGAACG TGTTCTCAGA AAAACCCACA TCAAACGCCT CAACAAAGGC 1500 GAATGTTCAA CACAAGCTGG GGTCAACTTT ATCGACATCA TCTCACACTA CACTCGTGTA 1560 TCAGACCACG CTATGAACCT TGCTGAAAAG GTrTTTGCAG AACAAATCTA AGAACCAAGA. 
1620 AGCTATCCAT CATAATTGGA TGGCTTTTTA CTrTCCTA AGCAAGACTA GGATGAATGA 1680 AACrGAAAGA GTATTCTGCA GATATATAGT CCCCAATTAT TCACCCCAAA TCTAAAAACC 1740 ***ATCCAGAATC CTI'GCCTT1AG CTTAGATCCT GGATGGTTTC TTTTTTCACC CAATGGGTGT 1800 TT'TTTACTAG ACAAAAAAGA GTTTCCCCTT TATGGTATA.A GTGTAGAAAA AAACACAAAA 1860 AGAAAGGAAA CTCACATGAA CAGTT'rACCA AATCATCACT TCCAAAACAA GTCTT'NTAC 1920 :.*CAACTATCTT TCGATGGAGG 'rCATTTAACC CAGTATGGTG GTCTTATCTT TTTTCAGGAA 1980 CTTTTTTCCC AGTTGAAACT AAAAGAGCGG ATTrCTAAGT ATTTAGTAAC GAATGACCAA 2040 CGCCGCTACT GTCGTTATTC GGATTCAGAT ATCCTTGTCC AGTTCCTCTr TCAACTGTrA 2100 *ACAGGTTATG GAACGGACTA TGCTTGTAA.A GAATTGTCAG CTGATGCCTA CTT'rCCAAAA 2160 TTGTTGGA.AG GAGGGCAGCT TGC TTCACAG CCAACCTTAT CCCGTTTTCT TTCCAGAACT 2220 GACGAGGAAA CAGTCCATAG TTTGCGATGC CTCAACCTTG AATgGkCGAA TTCT'T1'TAc 2280 .AGTTTCACCA GCTAAACCAA C'rCATTGTAG ATATCGATTC TACCCATTTC ACA6AC 2335 INFORMATION FOR SEQ ID NO: 276: SEQUENCE CHARACTERISTICS: LENGTH: 752 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 276: CGGATTCACT GTTGTTGACT AATCAATAAC ACAGTAGAAA ATCTCACAGC ACTCTATTAG TrTcCTTCA
GGCCGTAGAA
GACGTGGGAG
AGCAAACAAC
GAAGCTGTAG
GCTTACAACT
TTGCCAAACG
GAAGAAGCAA
AACGACGGTT
GGACGTCTTG
GTAACAArGG
GCTGACCGTG
TACTAGGCAA
GAGAAAAATA
ATGAAAATCG
TTCGTGCTGC
CACTTGCAAA
TGAACATCGA
GTACTGGTAA
AAGCTGCTGG
1307 G'TGACTGAGG CTTGTACTTG GTAGACTGAA AACCCGCAAG ATTGAACCAC TTACAAGGAG TCTTGAGAAA ATCGACAGCA AGAAACTAAC TTITGCAAAAT CGTTAAAAAA GCrGACCAAC AACTrCACGT GTTCTTGTTT TGCAGACTr'r GTrGGTGAAG GGTACAGCAA GGGAGCTTAA ACrTCATCAT TTCGAGAAGT AATAGAAAAT GGCTAAAAAA CAAAAGCATA CAGTGTrAGAA T-rGATGCAAC TGTAGAAGT'r AAATCCGTGG AGC.AATGGTA TCGCACGTGG TGCAAAAGCT ATGACCTTGT TGCTAAAATC CTGATATGAT GGCTCTTGTT CAAACCCTAA AACTGGTACT GTGGTAAAAT CACTTACCGT GGTTGGACTT CGACGTAGtT ATCCCTACAC GACGTGTCCT TGGACCACGT AACTTGATGC ATGTTGGCAA AGCGGTTGAA GAGTCTAAAG CAGGTAACGT TCAAGCAATC AT INFORMATION FOR SEQ ID NO: 277: SEQUENCE CHARACTERISTICS: LENGTH: 2643 base pairs B) TYPE: nucleic acid STRANDEDNESS: double (D TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 277:
GTCAACATTG
AATGAAATAA
AGAAGTAGAG
CATTTGAACG
GAAGACATTT
CCTTAAAGCC
GJ4TATGAAGA
ATCAAAGATT
TCCCATCCAT
AGCGTTTCT
TCTATGTTTC
ATTTrCAAGGC TGT'rTGCTTT CTATCTCCCC TTTCATAA TGTATAATAA
TAACAGGACG
GTGTACTATT
ATTTCAAATC
AGAGAAATAC
TTGCTACCTA
PTGCTGTCT
GAAAAGACTC
TGTTCTTGAA
GTGCTCACCT
TCCATTATAC
AATTGATCGG GACAGTCAAA TCGAT'rTCTA ACAATGTTT CTAGTTTCAA TCTACTATAT TTTCGTACAG GTGCTTCAAC CTTCTTTTTwG GTAAAGATTC TGAGCTCTTT GATTTGCCTC TGTCTATATC TCTATTTTCA AATGCTPAAC TAACAAATTT AGCCTTGCTC CTGTTTCTGG GGGTTGATAA AAAATCTCCC CTAGCCTGAT T1'TCTGGATA AATCCCACAA ACTCTTGTTC CTTCCAAGGC TTGAAGTGTC AGTAGAAAAG GAATCCTTGG AGGATTTGCC TAGGGAGTTG GACCACTGGC ATACAAATTG TTTCTTCAAA ACGAATTGTC ATCTTTTCCT CACCACCTTA TATTTCTCCC ATTT?1'TACG AATAGATAAG TATGATTGAT 1308 TTTTATTI'rT TTCTCG'TCGG GAGCATTCTA GC1'TCCTTTC 7rGGrTGT CATTGACCGT TTTCCAGAGC AATCCATTA'r CGTCCCTTAG ATTTGATTCC TGCAAAGTTC GCTA'rCCTGT CTGCT'rrACT CTTGG7GGATG ACCTTGGGTA TCTACGACTT CAGCTAATCC TAATAGCTTC GGAAT'rTTGG CTCA'TrTAT CAGTTCAGCC AGTCACTGCG ATTCCTGTCA GACTCCCTTG GATTCTCTCA CAGGTCTTCA ATCGCTCG CTGTCCAC CTGGTATGCC CTC'1-1-GAA'r GCTCTCCTTG GGGCAAGTCG
TCACCATCAG
CTCTGGCTGG
CGATATCCGC
GAATATCCCT
AATCTGGTCA
ATGGGTGCAG
TAAGCTTrAGG ACTCCTCT TeeAATCAC CGCTGGTTTG TACTGGTCTG GATGACTT'rC TGGTCTCCTT CCTCATACTT GGGATTTCCT CTTTTAGCT TGATTCAGTT CGCTTCTGCG GAC'rTCCTTT CGTGCCTTTC TCTTGTGCTC TCGTCTTTAG CGTAACGGAG TTACTGATCT ACGGGTATCC TGGCCTTTCT CCTGCAAAAG AAAAAGGAA
CTCTTACTTG CTAC7TG'r'I ATT-TCTGCCA TATATCCTTC GTACAAAGGG ATCATGTTAT AAATI-rCATT TCGACTGAAA CATTATAGTC GCTAGACTGA
GATTATTTT
ATGAAArrAT
CAAATCGATC
ATATTTCGCT
CTAGATGATT
GGTAAGCTAC TGCTTGTCTG ATAAAATCCA TT-CACAGTTA AATTATAAAT TATTTC7TTT TGTTCTTCTA TCTTCTTGAT TATAAACTG;T AAACGAATAC ACTCAAAACG ACGTCCAGAA ATCCAAGGGT TCAATCATTT TGC=GGA GTCAATAATC TTTGCTTGGT TTTTTAACAA AAATTTGATC TTGcCCGC)ATT TGACGATAAA AGTAAGAATG
ACTGATCAAA
TTTGTTTAGA
TACTCTTTAC
TGTAACCTTT
CTAACTTAAA
AAATATAACT
ACTCTTCT'TT
CGCCATC
AGCTCGATAC
ATCAGGATTC
TGGCAATAAA
TCTAAAGCCT GTATCGAAAC ATTCAAATCC GACTTCAATIA CTGACACGCT TGCCAACCCT CTCTTCAAAT TTGACTAAAA AAACATGATG CAAAATAATT TGCTTCTTGC TCCAAACGAT 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 100 2160 2220 2280 2340 2400 2460 A7TTCATATCT TTATATTI'AT GTAAAAGAAT ATCTCCTAGC TCATGAGCTA AGTCAAAATT TCGACGTACA GATGATTTAT TCGTTCCTAA CACAATATAA GGTCTTCCCA Ar=rGACCA TGCGCTATAA GCATCACTT GGCCATTAA'r TAATCGTTCC ACGATATAGA TGCCTGAACt; TTCTAATTTA TAAAGCAAAT CATGATTATC 7rMrGAAATA CCTAATTTTr CCCTGGCATA AAGAGCCAAT TCCTCAATGG ATTCTCCCTT ATGATAAGAT TCACTCACTA CATTACTTAC .GTCATGAATT ATAATATTAG GTATAATTAC AAAACTTTCA AAATAATCAA TCAAACTATC TACCTTATGT AAATACATAG TrrGAATATC TArrGr=C CGTGTTGCTA GGTCTGCATT TCTAAAG4GCA ATTACAGAAG AATCAAATCG AATGCTCTCT TCTrCCTGTT CAAAATAACT TAAATCAACA TGAAATTGGT TGGCCAAATG CArTT'GGT'r GATAATrTAG GTTTCGTTC GTT'GGACTCA AACTGCCAAA TGGCTTGTTC CGrFAAATTA ATTCTCTGAG CTAATTCTGC 1309 TCTACTrAAA CCATTTAACA GCCGTAATTC TTTCAATACC CGACCATTAA ACATTTACAT ACTCCTTACT ACTr?1'GACC TrCTrGTTTr TCTATTCTTG GAATAATTTC AAAATCTrCT GT7TTCCGATA A?1'CTGAAAA ATTAGGAATA TCTTGATATT TACrCTTTC GAAATGGTAC
GGG
2520 2580 2640 2643
INFORMATION FOR SEQ ID NO: 278:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 582 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 278:
TGACCAGTGG CAAAATGGCT ATCCAAATGC AGATGTTATT GCAAGCCTAC GTAGCCTTGG AAGAGGGAGA ACTGCTAGCC TCCAGAGGAG GCCTATGAAG CTATTTATGA GGGAAACTGG TCTAGTCTTT CACCGTATTG CTGTGGCAGC AGATGTGCAG
TA'
CA,
cc' rGCTGCTG
~GCTGGAG
AAGGCAG
TCGCTCAG
IAAACAAG
TAATGCA
TGACCAAGAG
AGTCAGAGTA
TTGCTCAAAC
ATACGCATGC
TCGGTAAGAT
AAAGAAGTAT
ATCGATGATA TCATCTCAGG CTTCTrAGAG GGCTTGATTG AAGGTTTTGA TTATCTrGAT TT' TGAAAACAAG GTTATGCAAC ATATTTTTGA AAAACTTGGT TT' GCCAGTAGAT GGCGAACGCT TGGCCTATCA AAAATTAAAG AAJ GTAAAAATCC TCTACTCCTC ACCAATTGGT ATTCTATCACTT TTGTATGGAA TTTGGGTTCA GGAGCAGAAG CATTTGAGA GG( ATAGAAGAAG TTGTwAGTCA TCCTATITTTA GACCCAGTTA TT
TAGCTGA TGACCATTAT GGACTAGG AGATGAAACG
INFORMATION FOR SEQ ID NO: 279:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 554 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 279:
CCCAAGCTAC TAAGAGACTA AAACTTGCTA GAGAAGCAAG AGAAAGTGTG AATCTTTTTA A'PTTCATGAT GAATTTCCTT TCTGCTACCA AN'TAGAGAA ATTTTCTCTA ACCAGCAATT CCCCTAGTAT AACAAGTTCA AAAAATGGAG TCAATTTATC TGCTCACGGT CCAGCAGGTA GCCCCGTACT TCTGAGATAA AATAGAGAGA CCCTGTAACG AACAGCAAGT CPTGAGCGTC TGCCCTTTCT TCAAAA'rCGC TGATAAATTC TCGGTAAGAA GAAACTATAT CGTAACCTGT CACATCCCTT TCGTCCAAAG CCCCCTGATA GTCAAAGCCG GTCACCTrGA GTTCCACCTG AGGCAATTT1'T TCAGTCAXGAT AACCCAACAT CCCTTGATAA TCCT rACGTr TCAAGGATCC AAAGAGGAr TIGAGGTCGAT AGCCI'CCTG CTCT1TCT T'rGATAAACr CAGCCAAGCG AGTCAAGGCA GGGAGGTrAT GAGCACCATC CAAATAAATC TGTGGGCGAA TACGCTCCAA GCGAsCAGCC CAAT
INFORMATION FOR SEQ ID NO: 280:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 766 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 280:
CCGGTTTTTC AAATGAATTT CTTGGTrGTG AAATTTTAAT CCCAACAGCA AAAGAAATGA CTTTAAAACC AGAAAGTCAG GCCGTGCTTG TGGAGAGr'r CTACAAGGTA TCAGCTGAGA CTTTGAAAAG GCAAACTGCT CAACACTATC ACCGCAACAT TAAGAGAGAT AAGCTGACCG TTTTCATTAC CTCGGCTrTG TACGGTGTTG GTTTGGATTT TTTGATGAAA TTAAAAGTCG CAGCCTATGA TGAAACTCTG AAGAAGGAAG TTGAGACTGT ATTTTCTAAG GAAATCAGAG ATAGAGGCGG TCAGCTGAAG ATTCACTCAA TAACACT AATAGAAAAT CAAGTACAAA GCTAAAAAAT ATGCTACACT ATCAATATGA ACACAGACTT CCCAAGTATC GAGGCAATTC ATGCCTTGGC TCTCTATTCT GCCAGTCAAT AAGCGGCGGA AGAATTTCAA AATATCCAAG CAGCCTTGAA ACTTTTTGAT GGGCTTATGT AGGCGGAACA AGATTATCTT GAAAATCATG TTCCAGTCTT GTCACCCATG GCTCCTCACC CTGGTAAGAC TTrGAAGAGC CATTGGAAGG AAGTGATTTT CTCTCTCTTG TCATCAGAGT CAAAGATGGT GACCTTCAAA TTCATGGAGG CTATCTCCAA GAAAGCGCGC GGGGCCTTTC CTGTGGGGGa AGCACGTCGC TTGAACTTTG CTGGATTTGT TTACCGAGAA GA'PTTGTCAC AACCACAGGG GGATGG
INFORMATION FOR SEQ ID NO: 281:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 901 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 281:
CCGGCCACGG TTCCATCCAA CTTCACAGGT AACGGTAGAA TTTCACCTAT CCCTCCTATC AAGAAGATAA CCTACTTATC CGTTGCTATG TAGATrTTA AAT'FrTTGGC TAATTGTC1'G ATTGCCTAGT TTTTCGCGTG CAATI'TTGAG GAAT'rTTCCT TTGTCTGTAA AGACTTCGAA GGCTCCAATC TGATTGATAT GGCTCCAAGG TGGGTAAAAT TCCAAAGCCT GATCTCCGAC GAGGTAGGAA GTGCCTGTCG TACTGAGGAG TAGTCTTCCT TACTTTCTCC AAAAAAGGCA
GTGCACTTGA
TGC'TCGCAGT
ATTATACTAA
AATCAGGGTC
AATGGCACCT
GTGGCGGCTG
AATCTGGATA
AAGGAATTTC
TACTGTTT'rG
TTGTAGAGGG
'rrGTGTATGT
ACCCGCAGAC
AGTT'rCTACT
GGAAGTTTGA
GAGTCTTTTG
AATTGTCACT
?r-rCTGAAAG
TTTTGCAA.A
CGACCTTGTC
AAGCAAAGAG
ATTTTGCGTC CAGrGACATT AATTGTTCGA CATTGACATC CCAACN'TCC CAGCGATAGA TTAAGTGATT GGGCCATGCT CTTTAATTGC TGCTTTCTCT :00..
000.
TGGTCTTTAT TGACAACAAA CATAATAGAA ACTTCACTAG AACCTTGAGA ATGTTGATTT TGTTTTCAGA TAGAGCGCGT GTCGCAGTAG CAGTCACTCC TrCATTTTTT CACCAACAAT CATAATGATA GAAAGGTCGT GTTCGATTTC ACTTTAGCCT TTTGAACCAA C-GACGCAGG ATTTCTTCTT CCT1TGATGGG CGAGAACGGA GAATGATAGA AAGAWCGTCG ATACCTGTTG GCATATGI'TC
CATCATCTGG
GATATGGCTC
TGCATGATCT
AGTTAGTTGG
CCAACCGATG
T
INFORMATION FOR SEQ ID NO: 282:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1765 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 282:
CCCTGTTACG TGGATAATAG GGTAAGACTG CTCAGGATTT CCT CTGCATITCGA CCCAAACCTG ATCGAAAATT CAAACCAATC CGA TACTTCAAAC ATACACATCT CCTTGACAAA AGTCCAATCA ATT A.ACAAAT CCACCGCTTG CTATGGA GCCATTCTTC ATCGCAT TAAAGTATGG T'PACTAATAA AAACAAGGCC TAGCTACTAT GAATGTGAAT AGGATTTTCG TCCCGACCTC TTACCTGGTT AGCTAATAAC ATGGGCI'AAA AACATCCACT GGACGTTCCA ACTCTTCCCC ATTTCTGGGA GI'TGGGGTAA AAA'rG~rCAC I'GGACCTTCC AACTCTTCCC AGrrGGGCTG ATACAGTCrC CCAGACTGTA TCACTCCTCC ATAAAGCTGT TTCAATCATG TTCCATTCGT CTTCTGAGTC TTCTGGGA'rT GGTTGCAATT TCCATCTTCG TTTTCGATGA ATGAGTAAGC TTGGA'rrCA ACrrGTCCGT TTCTGCGT'rA ACTGGTACTA GAAGAACATA GT~rTTACCA AAI-rCTT TGTCAAAAGG ATTTCAAACA AGGTTT~CATT TCCTTGCTCA TCTACTAGTG ACGTTCTCG TGC'TCGTGGT TATGATCGTG TGACATAGCC TCGCCTTTAI' TCTATCTAAA TAANTGA AAATCAGCTG AGCTGCTAAC TTATCAATGA
CATTTCTGG
TGAAGACT TC
CGCCTTCTGT
CTTCCTCTTC
TTCCATCAAT
'rGATTAGT'rC
ATTAAAAN'T
CTTrCT'rGCG
CTTATTGCGA
GCGTTCATCC
TGACTAGCTT
AATCGTTCCA
TGT-rCTTCA'r
ATCGCCACCC
TCGACTCCT'I
TACTTACGGA
AATACGTAAC
ACATCTGTCA
GTTTCTTCAG
AATTCCTTAC
CGCCATTTGC
A.ACCAATATT
CAAAACTAGT
CrGCCAATTr T1-rGAGTTTC CTGATATCrG CTTGT'rCAAT CAACATGCGC TCAGCAGCCA CTGTrGTCAA TGATAGTCTA CTGGTAAACC AAAAAACTCT TCTAGCTG CTCCGTAGCT cTACGCGCGC TCCACTTGTA TrGTTCATGT TTTAGCCAA GCCCACTACA CCTTGTAAGT ATCAACCAAT TCCTTAACGC GGTCAAAACC AAATTGGCCT TTATCTGGAT GATTTCAAGC CCTTGAGCTG TAAAACCAAG CGGATCGCTA CTACCGTTTT TGAACCGACG TCCAATCCCA TAATTCTCAT AGG'rrATAGA GTCCTT'rTGAG GTAGTAGCGA ACCAATTCCr CAACGATTrC ATCACGCTCA TTTGATTTCG TGCATTATTA TAACGAGGAA CGTAGGCAGG GTCTCCACTC CTACGATTTG GTrAATTGGG TTGTAaCCCT TATCGTTCAA CGAAGCATAA AAGTTTCGCT AATTTCTTTT TTATTGGAAT CGTCCAATTT AAAACGTACT TAAATCCCAT TCrAACACCC TCTTTCCTTA GAATAG'rACC ATTATAGCAT CTTCTACAA'r TCAGGCAGTC TATrTA'NrG GAT-=rC'rAT TGTTCTGTCG CAATCTATCT GAAATATATT TGCTTGGTTC AT'rTTCAAA AGATT'NCCA CTTCAGATGT TCCAACTGGG AAGCCTTCTT GACATCCAGA ACTTGAAAAT CGTTGTTTGA AGTTCCGTTG CGCTCA.ATAG TTTTCTTTCA AGTTTGAAAC ACGAGCTTCA ATGATAGACT TATCCTTCTC CTCCGCTTCA. AGAAGAGCTT C'TCCACTCCA TGT TG 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1765 INFORMATION FOR SEQ ID NO: 283: Wi SEQUENCE CHARACTRISTICS: LENGTH: 1346 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 1313 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 283:
CTTATCCATT
TGCCCCTAGA
ACAATGTAAC
GTTTrCCTATC
TCTATCCGTC
CTGGAATTAC
GAACAAGGGC
GACGATTTCC
CACTTTCTTG
rGTGAACGAG
AAAATCAAGG
TGAAAAATGG
AACTGAGCCT
GGCACGAACA
AACATCAAGA
GACTCTGTGT
TCTGTTATTC TATAAATCTT AC'TCCTA.AGT ATACCACATT AGAAACGCTC TAGACATTGC CAAGAAGGAA AAAAAAGGGT GAGGTCTGGA ATGAAGAAAC AAAGCAAGTA CAAAGAGGTC TA'rCGAGTCr GGACGATTrC CGACGGGTAG TCGCCTGCCT TGACTTrCAC TGCAGCA.AGG ACACCA7rCA ACGAGCCCTG ATACC'TCTAT GCCAAGCCTC AGAGTGGCTA CTATGTA'rTA CCTAGAAATC GAGGT'rACCG ACGAACATGC CAGTGCCTAT CAATGAAACC TTGATTGGCC GAGAAAACI'A CCTCTTCAAC A'IrAGAAGAC CTAAGACAGT CCATTCACAA ACTCCTCTTT GGCTAACCAA CTAGTACTGA CTCTGGAAC CCAACAAGCC ATCCTrCCT AGACAAGCCA AGGAAATCTT GGTGGAACAG TCGCCTCTTG AT'rGCACAGG GGCTGGACTA TCAAACGATr TACTATGACA ATCAAGAAGG GAGCAAGCTC TCTACTGCAA TrGTTATCC TCrCTCAA.AT CCAACCTACC ATCGGATGAA GAACGAGGCA TTGATGGGAT ATTAAGI'TT TCTACACCAT CAAGACAAAC GATCTATTCT GATrATCTGG GTGATTTGGA GAGCGTGTCA T'rTATATCAA GCACTCATTC TTCCAAATGC TACGACAGCA ACCTCATrAT GAAAAAAATC GTTTGGCTCG 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1346 TGA CTT GGAG GAGCTGGAAG GCCACTTCAA TCCCCGATTI' CACTATCCCC TGGGACATTC TAACTTAGCT GCCAAGTATC A'rGTCTATAT CTCCAAGAAG GGCCAAACCT TCCACTATCT GTCCTTCTCG ACCAGCCTTT TTCCTGCCCT TATCAAAGAA GCATTTGTGG CCTACAAAAA GCAAAAGGCC CTGTCA6TCT ATATTGACAG
AACAGGAAAA
CTATTCTGAG
CGTAGAGGAC
TGATACAGAG
TCGTATTACA
TATCCTAGAC
TCAATTGT'T
CTTGACCAAT CATGAATCTT ACCA-AAAACA AATCGAGGAA AGGATAACTA AAACACCTTG TCCCCT'rCCT CATTATTCCC TACACGATGG YTTATTGCTA GACCTGAGAC AGTATCCTAA AA'rCGCCAGT CTCAAACACA GTCAACTGGG cTTGGAC'rTC TTTGAAGAGG CCTATr'rAAG CACCTG INFORMATION FOR SEQ ID NO: 284: SEQUENCE CHARACTERISTICS: LENGTH: 900 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 1314 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 284: CTATATrCAG AATATGCCAA ATATTAGAA AACTCCCCCA CCCTTTATTA TTAATTTGCT TGGATTGACA AGAATGGAAA TATCCTrGGA TTGTTAAACC CTrGAAGGGG AATCGGAGCA GAGAATATAT TAAATTTTTA TGTCTrCGGT TCTATACTTA GTGCTAGAAG AAGAATTAGG GATACACACA GCCAAGGAAA TACAAGCCTA AAAATTI'AAT AAGGTTGATC AAAAAATTAG TATGAAGAAT TT-ATTGATTA TATAAT'N'TG GTGTGCTTTT GAAAATTTAA TTTCCTATGG
AAATTCGGAA
AAGAATTAAT
TAAATTAATC
'rATTrCC
'N'TGGTTTTA
GCAACGGTAC
TAACAAATAT
'rGGTATAAAT 'rTGCGGAGGG TTTAAGAAAG ATTTTTCTAG AATAATTATC TAGAGAATAA TCTCTAGTAT TTTATTTAGA GAGATAAATT CATTGCGTGA
TT'CATTTGAC
AATTTTGGCC
AGAATACGAG
AGATTTAATC
AAAAGGTTTA
AAATATTTTA TAACATTGTT TGACAAGGAA CCCGTT'=TAC TGAGGCAAAT ATCGGAGTCT TTTTATAGAA ATT'rTATCAA ATTTAGAAAA GCTAAGGGGG AAATTAAATG ATATAAAATT AACTGTTTTG ATACTCTTCT TTGATGACGC
TGATTTTAGT
TGGAAAGGGT
GAAAATTGTT
GTATATCCGA
GGATCACAC
AATCAATAAC
GATAACAATA
TCTACCTCTA
AGCATTTATA
AGATATGCCT
TCAC'rAAATA CTATTGCTGA CCTCGAACTA TTGCTTATTC GAGCAAAAGA AAAATTTACC TGAATATTAT TAT'N'ATTTA ATGGGAGTGA TATACATTTT GTAATAATAG ACTTTGAAAC AArGTTACGG INFORMATION FOR SEQ ID NO: 285: SEQUENCE CHARACTERISTICS: LENGTH: 862 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 285: TTAT'rTAGCA GAGGCAGTTT GCAGATTATT GGTTTAGAAA AGGAAAAGCC TTGGTACACT TGGTCTTTAC CATCTGGCTA GCACCTGACG GAT-rIACAGA CCTTTACTTA GAGGACTTGG CACATGGGAT ATTCGAGAAG GGATATCTAT CAGTTGGGGG TAAATGTGAA GGAT-rTGGTC AGTCAAACAG T 1TTTrATC~A TCC'rATCTCA AACGGATACA GAGGTCGTTC TGGGACTTGG TGATTCAAGC ACAAGAGGGT GGAGAAGTAA GGGAACATTA TTCTTTTGCC GACACGAAAG GCTTTGGCGG ATGTCTTGAA TTCCTCTTGT T GGCGGTGCA GATCACGGTT ACAGTGAGGC AGGGAAATGG CATTGAACTC TATCGAGATA AGCCAGT'rTC ATGGACGTAT TATCGGGGTG ACTGAAGTCC AAAGAGTAGA GCCTTTTATC C'IAGCAGAGG
TTGCGGCTCA
GTACGAGAAT
GGGGCATATT
GTTAGGGC-rC
CCATCATCAT
CCTACCAGGT
TGCCCAACGA
AATCACAGAC
CATCTTTCTG TCAAGGATAG TCGA.AAGTCC! AGACAGTTTT ATCAAACGGT GALGATAAA'r TCAGTGTGCC TAGTGCTAGT TGGATCGCAG CTGGGGACTA TTAGCAGTCA ACGAATGGGG AGGAAAAGGT CTGGATCCGC GTAAACAAGT TTAGCCTACT ATGTCATCGA AGTCGCACAT AAAGAAGAAC TGTTAACGAT GCACAAGAAG TTGACGCACC AATCAAATGG ATGACATCGA TCCA.AT TGGA TCAGATGGCA 'rCGTGACCCG TATTCGTTrA GCTAGATAGA TGGTATGTGA TGAAGGTAGA GCATCAATTG TA INFORMATION FOR SEQ ID NO: 286: SEQUENCE CHARACTERISTICS: LENGTH: 650 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 286: C 0 0@ 0 0 00 09 0 0t 00 0 TCGTTTACAA GATCGCTAAA GCGCTACCTC AATCATAGAT AAGCGTTTCT CCTCATTTAT TCATT'TACTC ITPTATTTTAC TGGCAGGATT AGACTAATTC ACAAATCCAA GCGACTTGAA ACTCTCAATT C TCCATCC TrTCCAAGTC CTCGATGTTG GATAGTGTAA TATCCCCAAT AGCTCACCAT GCCGATyCAr AGCTTCACCC CATCTACGAG ATGCATCTCA TGATCGCGAC CACGALATTCC AGTTCACI'T TTTCTTGCCC AGCAAATACT ACTACTATCG CCAGAGCGAA CAGACTCTGA GATAATT'PTG CGGAATAGTC AAAGGTTALAG CAATATAAAA CTCATTCCTT TT-rCTGTTGC ACACCTCCTA GAAGCATGAT TGTAGGTGTA TTTTACTCGA GCCAATTCA.A TCAAAGCACT GTAAGCGGAA TTCCCAATCA CAATGGGGAG TGGAAACCAT TCTCCCTTCT CCTTGACTTC ATAGGAATAC ATGGCTTCCA AGGTCGCTtG GtAAcCAAGT TCACATCCGT GATACCAAGC
AAGATAGCAC
TCTAATTCCA
CCTCATTTTA
GGGGAGAAAG
TCCATTTTCC
GATTTTCTTG
TAGAATCTTT
ATTATCCTGA
AATCCAAAAA
ACTGTAAGGA
*000 OSbO 00 *0 0 0 INFORMATION FOR SEQ ID NO: 287: SEQUENCE CHARACTERISTICS: LENGTH: 1119 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 1316 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 287: GATAGCAATC CGCTTCAGAA ACTrCTCGCT TACCTCTAAC TCCGATCGCT AGTTTGGGAG AAGATACrTrC CATTCTCATA CTATCTGTTG GCN'GCAGG CTGTAAAAAC AAC1TN'TCTC ?I'GCTACI'TC CTGAAAATCT GAATCTrGCA G~TrGCT TrCAAAATAG TCCTGTACT C GCTCCACATC AAAATTCCCA TAAAATTTC TTGCAAATTA TATCAGTTGC TAAAGGTGTA AATCTGGATC ATCTTGGTAC TGGAAGCTTC AGTAAAGTGT AAAAATAATC CGTTGCTGAA TATCTGCACC TAGACTCGTA ATTTATGTTC AAGAAAATGA CTGTGACAAA CGTATCTACC ATTCCTTTTT AGGCAAAAGA TTrCTTTTAC AGCTGGATAG CATAAAGTAA ATCGCTTGTA A'r'rTGTTCA AGCTTTGCAA GCTAAAGACA GAGACATGTT TACAGGTTTG TAAAACTrTG GTTAGATTGA TTTGGGAAAT GGACTCCTCA CTTCCAACTA CCAGGATACA AATTCGCTAA AG?1'GAAAAG AATAAACACG ATTTCTCGTT CTGCTGAAT AATATCCTGC GCTGATGTTA CCAAT'rCATC AAGTAAATCT AAAAGATAGT TTGTTrrTG AAAGCTTGTA AAAGCCGACA TCAAATCACT AGAATCTTCT GCAATTCCTC CAGGATATTG TTT-rACATCT GAACCAAACI' GTACAGTGAC ACTCCCGTAA
TCTGTCAGAA
AAATTT'rCTA
AAGGCATTAC
CTCTCAAATA
CCG'rCAACTT
ACCTCTTTAA
CGATAAACCA
CTATTCCTTC
CATCTTTGTC
CAAATAAGGC
GCAACTGTCA
TATTTTTCTT
GTTTCACATT
TCCAACTTTT
ATCCGTTGGC
CAAAAACAAC
ATTAGCTACT
AAAGTCTGCT
CAAACGAGTT
CTTTGTCATT
CTACA.AATAG
GAAGAT=NC
240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1119 120 180 240 ATTTrGATAA GCACGTTCAA TCAATGAAGA ATGATTATCT TGAGAAAGTA ACAACGACCA ACGAATCATT TCCTTGGTCT GATTTAACTC AAACTCTGTA AAAAAACCTT TTTTrAAATC AAGCCGTTGA TTATTCATCA ATTTACGAGC CTGGTTACG INFORMATION FOR SEQ ID NO: 288: SEQUENCE CHARACTERISTICS: LENGTH: 540 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 288: ACGCCCTCGC GGGGACATGA CGAATTCCCC GTrCATCACG AACGCCGCCG AGGAGTGGGG GGTGCCGTCC AAGTCAAAAG CCGCCCCACA TCGATTCAGT TCCCCGACGA ACAGCCCTTT CCCCCAGCGT TCCTGGCTTT GCAACCGTTT CACAACAGCC TCGTA.AAGTA GGCCGGACAA GGCAGACGGA CTCCAAAGGA GTTCTTCCAT CTGCAAGTGC GCCTGCGTTA TGTGATCCCG 1317 GTCTTTTGCA TGTGTGTGGC ATGAA'rGCTG TTCCCAATCC CACTCCAGAA CATTCTCCTC AAAAGTGCGC AACGTCCCCC TGAA'rGAATC CTGCCTTGTA GTCGTGACCA TTCCTATGAA GGGTCGCAGA GGATNT"CCC CGAGTGCAAG CGCATCCTCC GGCTCAAATC GGGTGCATT CACAGTCCCG CTCAACGCTA GCCCGATCCC 7nTTGGCAT GGTGACTCAA GCGTCCTTTC AAACAAAAGC TCCTCATCCG CTCCAACCGG CCCGACGTAG ACGCGTAGAC CGAAGTCGTC INFORMATION FOR SEQ ID NO: 289: SEQUENCE CH{ARACTERISTICS: LENGTH: 1949 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 289: AAAGAA'rTCG ACCAATTCAA GGTTGAGGCA TCGCAAACTA TGGACTGTTC TCTGGACAGA AAACGGGATA AGGTTGGCTG TGAAGCAAGC TGCCCTCCTA TGGAA.AGTAG GCATCACCTG ACAATTCT'rT ACAAGCATAG TCCGT1TCCAT CACTTGAAAG AGGAACTGGA GTC-ATrCGFr ACTAAATACT CTGAAAAAAG ATAAGACCAC AAAAGACTTG TTTTGGAAGT TGTGTTTTTTr TCTACACTTA CCCATTGGGT GAAAAAAGAA TNTTAGATTT GGGGTGAATA ATGACTCTTC ACGAATTGAC CAAGGATATC TGAATCCGAA TAACGACAGT TAGAAATCCG CTCTTTAGT TTCAACTGGG CATACTGGGT TAAATGACCT CCATCGAAAG GATGATTTGG TAAACTGTTC ATGTGAGTTT TACCATAAAG GGGAAACTCT TTTTTCTCTA ACCATCCAGG ATCTAAGCTA AGGCAAGGAT ATTGGGGT'rT TACAATATCA ACTCCCATGA GTGATGACTG TCCTTCCTTT TGCATAATTA CCCCGTCAGT CCAACAATTr 120 AACCTGTTAA 180 AGCGGCGTTG 240 AAAA.AAGTTC 300 ATAGTTGGTA 360 CCTTCTTTT 420 GTAAAAAACA 480 TCTGGATGGT 540 TAGTCATGAG 600 CCTCCGAAAC 660 
ACAAAAATCC 720 ATCGCTTATC 780 ATTATTTAAG 840 CTGCAGTTGC 900 C -rTAAGACC 960 CTGCAGATGT 1020 CTGCAGCAAC 1080 ACAAAAAAAG GGGTAGACAA TCTAGTGTCT ACCCCCGAAA GTTTATTAAA TGCCAAAGAA '1~TTTTGGCAG GAAACCAAAT CAATTTATCA GTTTCTATCA GCTCTCAAAG ACTGGTAAAT AGGGATTCCG CAATCAAATT GCGATACTCT AGTAACTGAA GCTCCAGC?1 CTTCCAATTT- AGCTGTATT TCTTCAGCT AACGCCTTICT TTAACAAGTG CTGGTGCACC GTCAACAAGT TCTTTAGCTT AAGACCAGTG ATTCACGTA CAACTTTGAT AACGCCAACT TT-FrTGTCGC CAATTCAACG TCGAATGAAT CTr'rAGCAGC ACCAGCATCA GCTGCATCAG 1318 AGCTACAGGA GCAGCTGCAG TTACACCAAA TTCTTCTTCC CAATTCAAGG ATTGAAGCTT C?1'TA.ATTTC TGTTATTTCC TCCAAATAAG 1TTrTAAATTr CTGTGTAGCT TAAGATTAAG CCGCGTCTTC AGCAACGTTG CGCACTGGCG GTTTGGAAGA G~rrGCAAGTG ACCACCTTTA ATTTCAAGTG
CTTGAAGTAC
CAAGAATCTC
CTTCAGCGTT
TGCGATAACA TCTTCATTAG AAAATGCTAC ATCTTCAAGA CCAGcTT'N" CAGCTGCACG CTCAACTTCG CTTrCCACGAA GCTCACGACG ACGAGCGTCT ACAACGACGA TAGATGCAGC AGTTCCGCTT TTrTAGCAAT AA'rTGCTrCA TTGC7"GGG AATTTTTCAA AAAGAAAAAC CGCTTCTTTT 'rACATGATAC GTT'rTGTCCT ACTGTCTTAG GCAGTTTTTT TAGATACGG
AGCAATAATG
TATAATAGTT
TTGCTTTCT
AGAAAGGAGC
TrCTTTAGAT
TTTAGAAAAG
TGCAGA'rGGT
ACGCAAGATT
AAGAACTGTA
AGCTTTCATT
CTCATTAGTG
GCGCCCAATC
CGGTAGGATA
ATAGCTTTTA CAAGGTCGTr TTTTCAATGT 'rCAATGCCAT TTTTTCGTAG CTAGksTACG GCAACCGCTT TGACTGCAAG ATAGAAAGAA GTCCT'rCGCG GCGACAGCGC CTTCGATTGC TCGTTCAAGA TFTCGCTGG CCAACAAATA CAGATGCAAG GAGTTI'TTAA TAACTTATA TCTTGCTCAA CTGTCAAACC TTTTCAGCTA tACGTCAACT TGTTCACCTC CGTAATTATT CTAGACACGA AAGTACAATA TTTATGAGTC GAGCTCCCCT 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1949 INFORMATION FOR SEQ ID NO: 290: SEQUENCE CHARACTERISTICS: LENGTH: 1023 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTrION: SEQ ID NO: 290: GGACTGTTrTG ATCTTATACA GTAGCTGCTT GATCCAAGCT TTCACCGATA GCGGCTAGGC GCTCGATAAC TTCAGCTTGT GTCAATTCAT CACGGCACTC GTGTGAGCAT CCACGAAGGT
TACGACGGTT
ACCAGTCTTT
CAAAGACGTA
TTGCGATTCC
AGAATTTCTC
ACAGAATGGA TTTCCACAGT CCCTACGATA GTTGGGTTGA CATTTTCCCA TCCCAAAGCT TCCGTGCA.AT TGGCCGACAT ACAGCGAACG CCACCTGTAC
TTTTTGAAAC
ACTTGTCTTC
TGACATAACG
CATGGTTGAC
CACCTTGAAC
CTTTGTAGCC
AGTAAACCAC
ATAGCGGTTA CGTGGGTGA ATTTTCTTCT GATGTCAAGA TTCACATGGT GTTCCATCAA ATCAACGGCA ATACGCTCGT TTCTGGGTCT TTACCGTAAG TTCACGGACC ATCCAGCCTG CACACGCTTG TCCATGAATT GTTGCGAATA. TCTGGGCGAA TTTCCTTGTT ATCACGGACC CATTGTGGTA ACTCACGGAA 1319
TAGCTCCACG
TATCTTTATC
CAAGTGGGT'r TCTrGTAGCG AGAGTTC~rrC
AGTCACCTGA
CGATTGATTT
TATAAAGGTA
GAAATGTCCT
AAGAAXGCGCT
GATGTCATTG
AGGTCGTACT
TCTTGAACT
TCAAAGTCGT
AACAAACATC! 'rCTGAAGG CATTCCTGGA AGGCTGTGAA AACTGTrCCG TTAATTCCCT ACAGAAAGCC AAGTGGTCTG GTAAAG'rAAG ACACGAATAT CATAATCGTT ACGTGTGTCA AGGACAACGG CTTTTGGAGA CAAGTAAGCA CCTGrTGTTI' TGTCTTCCAA ACCAAGGTGG ACAATTTCT'r C? GTTCATT ?TICTTCGTCA ATCTTGAACC CGTA9TCCAT GTATTTTTGA GT rGTTrCA'r CGTCAGCGAC TAGGATACGG CCTrTAAGGn CAGCAAATTG CTCTGCATTT TCAATTGGAG CTTTTGkCaw AAGATTTGTA TCTCTTTATC 600 660 720 780 840 900 960 1020 1023
TAT
INFORMATION FOR SEQ ID NO: 291: SEQUENCE CHARACTERISTICS: LENGTH: 3831 base pairs TYPE: nucleic acid STRANDEDNESS: double CD) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 291: ACTATGAACA, AGACCCAGAA ACCTAGCACC TGTTTAGATT GTCGCTCATT AAAAGGTCAA TTTCTTTGGT TGCAGGTCTA CGATGACGAG CGACTTTT CACCATCGGT TATTATTATG TATGCGAAGA GTTTGGGCAT CTATTGAGAA AACATGGGCT ATACCTTTTA CGAGGCTT ACATTTTTCG GTTCTTTGTC AAATTTCGTC C 1TT1CTTTTT GATAAGTTG ATGAGATTAT GGCGTTGACG ATTTTCTCTT TGGAAAAGCA AGAGCTGATA AAAGTAGCCT TATTTCTTAA GAATTTAAT AGTTTAAAC GACGAAACAG GATTCGATAC TTA'rTTTTAT CGAGAATATG TTAATAAGAG GCAAAGTATC TGGA.AGAAGA TATCAGAGGA ACAAATGGTG AATTAATCGC TCCAATGACT TACGAAGAGA GAAGCTTGGT TTCAGAATTT TCTCTTACCA ACATTAAACA GATAATGTAA GATTCCATAG AATGGGGAAG CTAGAACTTT AAACTTTTAC CTCTTCCTCC CTACTCGCCT GAGTACAATC CATATCAAAA AGCACCTCAA AAAGGTATTA CCAAGTTGCA TTATCCTGCT CTTGTTTCAA TTGACTATAT TAGAGGCGAG AACTGTAGTG GGTTGAAGAA AGCGAAGATC TAGAAAGGAC TGAAGTTTTC AAAGTTCCTA AAACCAAAGG CATTGTGCTT TGGTGGCTTC CAG M TGGCG TTGGAATAAG GTAATTGAAG TATCTTTGAG GAAGGTTTTA AACAAAGTCT GAAACAGAGG GAGATTATAG TGGTGTTTAA AGTCTTCGGA ATAGCTCAAA 1320 AG ATCTA GAATTTCTTr ATTAGTCAAG TGCATACGAA AAGTAGGGCG ATAAAA'rCGT ?TATCACTCA GT'rTCTGACT ATCrrGI-rGA ATGAGCTTCC AGTAGCGCTT GATAGCCrrG TA7TCATGGG ATTTCGGATG ATGGCTTGTG 7rCTGCTCTC AAGAACAGTY ATGATATTGA GTTATCAAA GTCCTGAGCA ATAAAGCTCA TCTCCATCTC CCGA'N'GAAA CAGTCACTCC CCGGACTGTT TCAACs'rCCT AGGACATAA'r CTCAGGAAGA-e~eGAAAAAT CATGCTCAAA G'rGAAAATCA TTGTTTTC GAATGACAGr TGAACTTGAA ATAGACAACT CATCATCAAT GTCGGTCATA GAAGTCr'r' TAATTAGCTT CTGAGCAATC "TTGGTTGA TGATACAAGG AATTTGATGA TTCTTCTGA CGATAGAAGT CTCAGCGAGC TCCATTTTTG GCACTTAAAA CGGCCTrC TAACAAGAAT TCTAGTTTGA Al'r-=-AT CAGAACCATA ATACCTATAT AAAAATATTA TAGTTCTAAT AGGATT'rACC
AGCAATGATA
ACTAGAAAAT
CAAAAGTT7r AAGGCGGTCT TTTAGAACT TCTCAACTTT TCCTAT'NT AGTGAATGTG TAAAATTTTT TATGTTACAA AAAATTTATG TGGTCCTACA TTTGGTGATA TTATTTAGTG AGATATGGCA TGCTTATCGT TTGTATCTTG TAA=rTAGT AAATTTCCGA ACTAATTTTT CTAAATCATT CTTGATAAAA AGGCGAT'T'r TTAATrGTr GAAATTTAGG ATCTTGTTGA GCCTGGTAT'r TGCTTAGAATA AGTTrATAAA AGCAGATGGC TAATTTCTAT TAGCAAATNT GlTrTCTATTT TTAACAATTC AGGAATTGAT AAAGAAAAGG AGTATIGAT GATAGTATTG AAGAAGAGTA GGAACATGTT- CATTTTAAAT TTTT'AATTTA ATr'TGACTG
TTGCTTGTCG
AAAAATTGGT
ACTAATTTAC
TTTrTAATAGT
TITATTATAAT
TAACGATTTT
AATGAATCGG
TCTT'TTATGG
TGGAAATACC
AAATTGTAAG
GATGGCTCAG
GGTAAAGGGA
GGTGGTGATA
AT'rGTTTACC ATTATCGTGT GGTTTTATTT CTTGTTGAGG AAAGATGATA GTAAATAGCT AAATCTrTCT ATTGTTTCTT A'rATAA'TGC AGGTGAGAGT ACAAGATTAT GACTTCAGTT AGATTACAGA CTTCCTTTCA ATGC1'GGTCA CACGATTGTG
CCTGCCATGT
GTTGTTGTAG
GCGAATGCAG
ATTGACGGTA
ATATCTGTCA
ATGTGAGAAA GGAAGAGCCT GTACCCA-ATG GGGTGATGA-A AAGTGATTGC ACGTTACCAA AGAAATTTAA GTTGCACTTG TTrGGGAATGG TATGGTTGTA ATTCCATCTG GGATTTTCTT CCCTGAAAAA AATCCTAAAT CTCTTGTAAA AGAGTTGAGC ?TATCI'TCATG ACAACGTGT AACAAC1'GAT AACTrCG'A TTCTGATCG TGCGCATGTT ATTTTGCCTT ATCATATCGA GTrGGATCGC TTGCAAGAAG AAGCTAAGGG CGACAATAAG ATTGGTACGA CAATTAAGGG AATTGCTCCA GCTTATATGG ACAAGGCTGC TCGTGTTGGA ATTCGTATTG CAGATCTTTT AGATAAAGAT ATTTCCGTG AGCGTTTAGA ACGTAACCTI' GCTGAAAAGA ATCGTrCTTN' TGAAAAATTG TATGACAGTA AAGCGA'N'GT TTTCGA'rGAT ATTrrGAAG AATATTACGA ATATGGTCAA.
GTTATCTTGA ATGATGCGCT TGATAATGGC GTTATGCTAG ATATCGACCA AGGTACTTAT GGTGGTGTGA. CAATTGGTTC TGGTGTCGGT TGTAAAGCTT ATACGAGTCG TGTAGGAGAT GTGGGAGAAC GTATCCGTGA AGTGGGTCAT CGTGTAGGTT GGTrrGACTC AGTTGTGATG AACCTTTCTT TGAACTCTAT TGATGTTTTG GCCTATGATC TTGACGGTCA ACGTATTGAC CGTTGCAAGC CTATCTATGA AGAGTTGCCA AATrTGGAAG ATCTTCCTGA GAATGCGCGT GGCGTTCGTA TrCTACTTT CTCAGTAGGr AGTGTTTGGT CCTAAGAGAT TTTTAAGATT GGTTACAAGA AGACCTCCTA ACTTGTTGTA TAATCTCCCT ATAGAGTCAC CGCA'rTCGGT ATAATAAAAT CGATAAGTAG GAAAAGAGAA AAGTCTTrAT GAGGGAGGCT T'FGAGAGAGG CAATTGGTTG TG'rGATTGTC AAAGATGGGG AGGAATTACA GCGAGCGGTT ATGCATGCGG GTGAGGAGAG TGCGCTTGCT GGATTGCACA
CAAATCAAGA
AAACGTGTGC
CCATTTGTTA
CCAAGCAAGA
GGTCCTr'rCC
GAATATGGTA
AATACGI'GAT AGATACATCT TTTTTGAAGG TGCACAAGGT CGTCATCAAA CCCTGTAGCT TTGACAAGGT TGTAGGTGTA CAACTGAGTT GXTGATGAA CAACAACTGG TCGTCCACGT CGTCATAGCC GTCGTGTTTC TGGTATTACT AGCGGTTTGG ATACTGTGAA AATCTGTGTG TACTATCCAG CTAGTCTTGA ACAATTGAAA GGTTGGTCAG AAGATATTAC CGGAGTTCGC AACTATGTTC GTCGTGTGAC TAATTGGTT CCTGGTCGTG AACAAACAAA 'rAT'rrTAGAA TGTTTAAGAT AGGTCGGGTA TACTATAGAC ACAAATATCC TAAACTTrTC TTTTCATA.A GGCTTwrTTT GTGTTGGGAT TCATGATATA 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3831
AAGAGATGTA
CTGAGATTGC
AAATCATTGG
AAATTATGGC
TTATACGCTT GAAGAAAAAG TCTTGAACAC GATGAAATTC TCGTGGGCAT AATGCGCGTG TATAGAGGAT GCGAACTrGA CTTTTTGIGA CCATTGAACC G INFORMATION FOR SEQ ID NO: 292: SEQUENCE CHARACTERISTICS: LENGTH: 1441 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 292: CCGCTGTI'CC AACCGCAACA TACCATAGTC CGTACGGGAT TCGA.ACCCGT GTTACCGCCG TGAAAAGGCG GATGACTTAA CCCCTTGACC AACGGACCTG AGTTGTTATT TTCAACTCTr ACTANTATAC AGTC'TTCA AACTTTGTCA ACTACTTTTT CTAATTTTTG TTTATTlTTTT
CAACTTATAG
AGCACGTTCT
ATTATAGTCC
TCTACAAGCA
TAATACN'rTC 1322 TAAAAAAAGC CAGAATTATA CTGACTCT'rC TATCGCTCAT TAAACTTAGA TTTCCCCACC AATAAGGGAT TAGTTCTGCG ACTTTAACTG 'FN'TrCTTAT
ATCATGAATT
CCGCAAGGCA
TTAACCTTAG
TTCTGCGCAG AGATGGAAAA TCCATCTCCT GCTrCTACTG ATGTGGATTG TATAGTTTCT ATCCrCI'TA TTTAGAGATT GTCCAAGCAA GAAAATTGTA TACGAATAGG CATGAAAAGA TTTGTGGGAA GATAT'T'rCT CTGCATCTTT ANTTCAGCA T'rAAGCTCTA TGGCTGAACT TCCACCATAA GGTGGTTGT TTTGTCCTGA AAATTrGGTAC ATATTGAAGA, CACCACAGOT TCCCTCCATA CAGAATCCTG CAGCTACAAC ATGATTGGCA TAAACAAAGT GTGCTTCTTC GTACATCTTT TCCCAGATGT TCTTTTAGCA TGTrTTCGAT ATGCTGAATT TCrGGTAATT CI'GGCCCATG CA7='CGCCT TTTrTTCCCTr TAATACCTGT 'rrCTTTTTGG; GTCACAAATT CATCATCTGT CATCGCTTCA
AAAGGAATTC
CICGAAAGGC
GGGCCGCCCG
TAAATATTTG
CTGATACTTC
CCATTATTGT
GATTr'rTCAC
GAAACTGCGA
ACTGCTTTAA
AGTTTGGCTT
TCTGCTrTCTC
TCTACTGATT
CGGCCTGCTT
T'rGATATAAT
CTGAGGCGGT
CCACCTGGGT
TTCTTTCGGT
CCAGTTTCAG
AGAGCGGGTA
TGAATGCTTC
TCAAT'rCTGG
TCATTTGTGG
CCTCTAAGAA
CATTGCTCAT
TTTCATCAAA
TCCAACCAAG
AATCTTCGAT
AGTTGATAAT
AAGAACTGTT GGAACTGTTT CACCCGTCAT GACTTCCCGC GAAATCTGAG AAGAAAAGAT CTGTCAATGG GATAATCTCA TTTATAGAGC TCAACTAATT TTTCAGCCT'r GTCAGTCAAA TGGTTTTGCC ATTTCAAAGA TGGTTTCAAG GTCTGCATTC CCAGTCTAGT TTTT'rCTGAT CAAAGGC'rGC TGGTGACTTG AAGTTTAATG AATTCTTCAC GAGAGAAAAT CTCATCCCCA AAGAGCAATA AAG'rrAAAGA CTGCTTCTGG AAGGTAACCT AAAT'rGAAGT GTATTAGTAT CACGTTTAGA TAACTTCrTA CAAGTGTCAT GTGACCGAAC TC'rGGAGCTT CCTCAACCTA INFORMATION FOR SEQ ID NO: 293: SEQUENCE CHARACTERISTICS: LENGTH: 4398 base pairs TYPE: nucleic acid STRANOEDNESS: double CD TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 293: CGGCTTATGT AGTGGCAATC TT'rCTACGTA AGCGAAACGA GGAGATTA GAGOCGCTAG AAGAAAAAAA AGAAGAACTA TACAATCTTC CAGTAAATGA TGAACTAGAA GCTGTAAAAA 1323 ATATGCACTT GATTGGACAA AGTCAAGTGG CT-rCCGTGA ATTTATCTCT CAACTCTf GCCCATATTG AAAATAATCT ACCATTCATT TCGN'?I'CTC AAGGCCAGTC ATCAAArrGA CTTTGATTGA AGAAGATATT GCGGCAATTC AATCTAAAAA TAGTGGTCGT GTTCTTCATG GAGTTGCTGA AAATTCAGAA CAGTATGGTC AAAATATCCA ATCTGAATTT TCACAATTTG AAGCCGCAGT GA'TTTGGAT AATACAGAAA ATCGTGTTCC AGCC'TTCGTT ACGACGCTT=
GCAATGCTT
CTTTGGAT
AAGCCTTGGA
TAACCTTGAA
ATCACATTT
CTACAGAATT
ATGGAATCAA AAATGGGTCG CTTGAAGCA GAAGGCTATA CCAAATTGAG AGTCAAATTA GGCAGAC7-rA GAGAAGCAAG ATTTGAGGAA C -rCAGCATA TGAAATTGAA AAACAATTAG '1rCATCGGGT GACCCTGTGG GGCCTTAAGT CATATTGTGG CCCAGATCAA TTACAGGATT TTrTTGTTGAA ACGGATATTG CCAAGAGAAT ATTCGTCAGT AGAGGAAATC AATCCIGT GGAAAATCTA CT'rGCAACTC ATTGGGAGAA GATATTGCAC CCATGTTCGT CCTATTCAGA TTCAAATCA.A GAAGAACCAA ACAAACTCA.A CTAAAAGATA
90 49 99 9 4 9* 99 0 0 900094 0 ~4 9 9 9090 00 99 *999 0 *0S9 TGGAAGCCGG TTATCGTAAA AAGCGCGTTTr CCACTTGCTT TGGAATTGGA TAATGCCGAA ATGATATIT'1' TACTCGAGAA TrCCAACTTA 'rCTTCAACAT GTTTGAACAA GACCTATTTA CAGAATTAGA GAGTTrGAG CCCAAGCTTA TTCAGTCTT TTGAAGATGA GCAAATrrCA ATGCACGTCA AAAGGCCAAT AAAAACGCAA TCTGCCAGGT ATAATACCGA GGATTTAATG CCCGAGTTCT TGAAATTGCA TTGTACAATA TGCAACTTTG CTAATTGATG CTAATTATCA TATGAAGCAT TCAAGAAAAA TATGAGAATG GACAGGCACA ATTGCTGCTC AGAAAGTAGT ATGAAAGAGA ATAATACT CTTCCTGAGA CAGC'rGCAAG GCAGCTATTG TTGAGGTAAC GAAGAAAATC T'rGAGGATT 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 GTTAGTGAGC GCCTGACACA AATTGAGAAA GATGATATTA GTTTATGTCA ATCGTCTCCA TACTA'rCAAG ATTCCACAA.A CTTTCTTGAA GTTATTCTT GT'rGAGTTAG AACAAAAAAT GATTAACATT ACGAATGATA TGGAAGCTTT AGAAACGGAA ACAGAGCAAC TCTTGCAATA TTCTAACCGC
CGATACATGG
ACGGCAAGCA
GAkATCTGTTA
ACTTATAATA
TATCGCTCAT
TTGATGAACG CATTCAAGAA GCATTTAACG AaGCTTTAGA TATTTTTGAA AAAGAATTTG ATTATCACGC TrCA~rrGAC AAGATCTC AAGCATTGGA AGTGGCAGAG CCTGGTGTAA CCAATCGCTT TGTTACCTCA TATGAGAAAA CACCTGAXAC GATTCGTTT TAATAAAAGA AAAAGATr'r AT'VGTGTGAG GAGCAGAATC AAATCTTTTT CTATAGTTGT GGGGAGATTT ACTTCATTITr CTCCTGAGAT TGAGT'rTTTG CCCACCCGAT TTATCCACTA CCTCAAAACA GTGTTTTATA CTCT'rCGAAA ATCTTTTCAA ATCACGTCAG CGTCGCCTTA CCGTACTCAA 1324 GTACAGCCTG AGGCTAGCTT CTTAGTTTGC ?rTTGA?1'T TCGCCAGTC TTATCTGCAG CTTCAA.ATCT GTACTTCrAG AACGAAGWI' ATIGAAAAAT CTCCAGACTA GAGAACTCAC A==ATT TGCACNT 'C TTGTACAACr TTAGTCCACG TCTTTGTTTA CGAGAGTTTC CTCGTTTGGA AGACATTCTA TCGCTATTTA TACTAGACTA AAATCAAAAA GCATTATATA
TCATTTAGTA
TAAC"IrGGTA
GGATACTCC
GTAAATAGAC
GAAGATAGGA
ATAGTGATAT
CTCGTACAGC
CAGTACAAAT
TTAAAGTGAT
ACCGTCCAAT
TAATCTGGAG
CrCTAAAACC TAGATATTTrC
GAAATCAACT
TAAATATCAT
CGATCAGGAC
AAAGAAGAAA TCCAAACCAT AAACGCCTTC AAATCGrTCT AGTCAAATTG ATTTCTAACA ATTATA'N'TC GTC'rGATGGG GAAATAAGAT GTGAACAACT AGCCAAGGTG TACTGTTATA CGAAATGTAA AAAAATATGA GGTCGTAACC ATGCATATAT AAGGCTGCAG AGGCAGGAGA GAGT'rAGGTC G 'TCCTACAC CGAAA'rATTA TGCCACGTCC TCTAAAAATA AAATC1TCAAT TCGTAACGTT CGCTTGATGT ATC'GTTGG GCTCCAATAG TCGCTATTGT TA'rGCAGCTG TACATGTAAT ACTGAGTGGA TGATTATCTT TTACTCGN'A TCCGACTAAT A'rTGGTTTTA ACAAGTGTGG AAAGAGATTC AGATGTCATG AATCAACTCC CATCGTTAAT CGGAGATGGA TCCTCATTTC AATAGAAATC AGGACAAATC GATCAGGACA ACTATTCTAG TTTCAATCTA CAAAACACTT TTAAAAGACT ATAGTAAAAT GAAATAAGAA ATGTT=TAGA AGTAGAGGTG CAAATCTTAT AAAGAGATTA CTATCAGGAA AGTCAAATTA GA'rTCAATAC ACTATACACT GGAGT'ICGGA CTCGACTCTC GACAGT'rGAG GAAAAGAAAG ATTTGGTTACA ATTCGATGCCT ACGTGATGCC TTCTATCAAC AGAACATCCT AAGAAAGCAG 'rCAAGAAGAA AAGAAAGCGC ACCAACATGA GGC7 GGTTTC TACTATTCTA GTTTCAATCT TAGAAC7TTTT ATAGTAGATT ATTATAGAA ATATTrAGC GTA.ATCAAAC AACGATT'IGG TCCTTCAAGA AACACGTGGT TCTTT1CTTGC CCGCCATTG TATTTCAGGC T'TATAAAAAG TGTTGAAGTG CCATGGTTGG ACGCTCAAAC CATTGTCGCG TTTAAAACCA GTAGACCTTT GGTAGAATCA GTAAACTGGC 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 GAGTAGGTCC ACATATCCAT AGTCACTATA TACGAGAATT TTGATGCCCA TACAGGCGAA TCATTTTTCI' TAATAGCTGG TGAACGCCTT TNTAGAAGAG CTTTCACAAG CTTATCCAGA TGGACAATGC TATATGGCAT AAATCAAGTA CCTTAAAGAT CCTTTrATTCC TCCATACACA CCAGAGATGA ACCCCATTGA GTAAACCTGG A TTrAAGAAT AAAGCC -rTC AAACTTTGGA AAGATGTTAT ACAAGGAT'rG GAGAAGGAGG TGATAAAGTC CTAGAATGCT 'I-NrGAAAAC AGATGAGTAT AAAAAGAAAG ACGACTTTCT GATGGAT'rTA TAGTAAAATG AAATAAGAAC GTCAAATCGA ?TTCTAACAA TGTT1'TACAA GCACAGGTGT CTATANTrr GGAGTGATAG AAAAGCCCTT CATAAGCTAG TCTACTTGT1T CCGT'rCTCAA
AAGCGGTCTT
AAGTGGGTT
CAAACAAGGT
TCCGATAATT
TATGACGCGC
TTAGCAGTAG
CAAAAGTTAG
ATGATAAAGT
ATAAATTTGG
GAATAGAGT
TGATTAATAA
CAGGTGCGAG
AGCGTTrTATA.
'rACGTCCACC
TAACTACATA
GAACTTGACC
A7IT"rATC
TTACGTAACT
ATTTCAAAAT
ACAGAAAAAA
TCAGATCTAT
GATAAA'rAAT
CGTCAAAAAA
AGTCTTAC
AGCTI-rGACA TCTrT TCTG TAGTAGCCAA AATCCTTGAC ACAAAAGAGA AAGACTTGAC GGCTAATGAG TCTATrCCCT TACTTAGCCA AGTCAGTT CATCCCAGTA AAGGGCTTTA CGGAGAAAGA ATCCAAT'rCA GCCTCATATC TGTCTTGCCA TCATAGTACA ATACC'TTTCC AATTAGGATA GATACCTTGT1 TAAATCACTT AGT'rGAATTA TAGTATACTG GAAGTTGGGG TGTAACTAGC TGCCTAGTTT GATAAAAACG CATAGTATCA TCTAACTTTT GGGGTGTTTT
GATCTTTGCT
GGTATTGAAA
TGTTATGAAA
TCTTCATTGA
TGTACTGCCC
TTAAGTTATG
3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4398 GAACTTAGAA AACAAGGATA TAGCTTAGAG AAGCTTTCAA TCTAATCTrA GC'rATATGAT TAAATTGATT GATCGTTACG GGAAAAAATC GTTACTATTT TCCTGATTrA AAACAAGAAA INFORMATION FOR SEQ ID NO: 294: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 718 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 294: AGATTTTT"AG ACTTTGTCTT TAATCGTTTC GGCTATTAAC TTTAGCAGGA GGGATTATCC TGAGCTTATA TGCfAGAACAT GGTTATAGCT CTCTTTACAA GCAAAA'N'T GTCTCAAGCA GTCTAGTNT GACCTATGGT TTGTATCTCT ATTrTGATTGC GACCCTTTrG AATGTCCTAG TATCTTTAAA ATTACAAGTT TATT1'TGCCT TGATTGGCAT CTTTATGAGT CTAGCAGCTG TTGTAGCAAT TOOTTATTAT ATGCCTGCCC ATTTCTTTAT CAGTGATATG TTGGA.ACCTG TTTTTAGGGA TGATTGCGAC ACCTTCTTTT TTGGTCTAGC GCCGGCTAGT TTCGGGAATA CAGTTTGAAG ACCTGATTrTT CTATAGCTTT
GCCACCTTGA
GAGGCTTGGT
TTAGGTGTGG
TGGTGCAATT
TAGTTGCCCT
TGTCCTATCG
TGGCTAAGGT
TGCTATTTTT
TCTATGAAAT
GCCTCATCAG ACCATTGTTC GATCTTrTTTG GCTTATACAG AAATAGTCTC AAATTATCCT TCTCCTTGGG ACTGTGCTAC TGTAGGAATT GGGATGTGGC CATCCATGAA AAATTGGCGT 1326 CAAAATAGAA TCAAGCAGTT TTGGCTACAT ACGCTTCrAA GAACCTATAG TTCAGTGATG ATCATTATCA TTGCGAGI-r TGCAATCTTA CTCTCTTACG CTf-rCTGGGA TTCACGTG INFORMATION FOR SEQ ID NO: 295: SEQUENCE CHARACTERISTICS: LENGTH: 718 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 295:
TCGGTACCAA
TTAGGTCrAG
GAAAT'TTCTG
AATTCTGGAT TTATACTAC TTTTCCCTGA ATCATTAAAA CAGAAGAAGC AAATAAGATA
AAAGATCCAA
GAATTTGAGA
AATGATCCTG
GATGATAAAA CTCTTGAGC TTTACAAAAA GATCCI'CTTT AAAAATGGTG CCGT'rGCTGT AATTCCAGAT AATACACCGT ACACCACTTT CAATAAACTA TACTATTGAA GAATACCTAA AAAAATGCGA AATAAAAAAC AAATAAACCT AGGCATAATT TCTTATTACA ATATTTTTGT CATTAAAGCT TGGAACAAAA TTTAGCAGCT TTTGGAATGG GTAATACAAA TGATGATTTT rAGAATACCT AGAACTATTT 'N'GCAATTrI! AGCAGGTTCT ATTGATGCAA TCAGTTACTA GAAACCCAAT AGCTGATCCA ACGAGCAACT CTTAGTGTAG TAATTGGTCC TTCtTTTTAG GAGCAAATTA TTTAACAGAT GTGAAGATAG TTTTGCAAAG ATGTAATCAT A.ACTTATGGT TAGGTAAAAT AAATGCAATT TAGCAGCCTC ATGCACTCCA ATCIT-r'rAGG AAATGCATGC TTTATAATCT GCCTAGGTCT GAAATTAATA TCAGAGATTT ATTAAATCAA TrATATATAA AGTCTTGCCA TAAGCGGTGT GGTATACTCG GTATAAACAC GGAAT-rCATC AAGCATAA INFORMATION FOR SEQ ID NO: 296: SEQUENCE CHARACTERISTICS: LENGTH: 1436 base pairs TYPE: nucleic acid STRANDEDNESS:- double D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 296: GAACTAATCA TTTACAGG ATGAGATTTA CACCAGAGAG TT'rGAAGGCT TTATCAAAGG TTTTT1CTI'G CATAATGACT TTT-CCTCGTT TCCACTI'AAT TTGTGTCTA CTTTATI'ATA CCAAGTCCAC sCTTAAGTTA GATAATAAAT CTAACTTAAG CAAGCTAGAA GGATGAGAAT CCAGGTGGTC AAGAGTCCCA AACTTAAGCT GATGGGGACA CCCAGAATAA TTTGCTTTTT 1327 GAAGGCAAGG CCACGTTCCT CTATATTGGG AAGTG3AGAGT TGAATGAGAG AACCAGCTGA
TGAAAAGGGT
AATATCAATC
TACAACGGAT
AGGTAACCAG
TGACTTGACC
GGGAACCTTA
GAGATAl-rAG
TGAGGATT
AGGGTGGAAC
AAATGAGGAA
GCTAGAGACA
GCTAAAATGG
TAGATAGAGC
GAGCACTOAT
'rAAAGAGTGA
TGGTTGTTGT
TTAGTA.AGCT
CT'rCTTGCTT GCCAATAACG GTGGCTGTTG TGAGTAAGTG GATAGCAATG ATGGGAAAGA GGGCTGGAGC CATCACTCCG GCTATCACAC AAAAGAACAG CATGAGGTGC CC'TATCAGTG TGACTAAACC CATGCCGCAG AGCATGATAA 'rTGTAGCCCA CCCTAATTTG AGCCTTAAGG CGAGGCAGAC CATGAGTATT GAGACAAAGC GTTTGGGAAA ATGAGATGCA GAGCAAGGTG GTTTGTCTT1- TGAGTTTGTT CTTCCCTTAC TGGGTAGATA ATGCTGACGA TCCCATTTGC TTAAACAGGC CCCTCCTGAA GCTCCCCAAT TTGACAGAGG GTAATCGCTA TAAAGCAGAC AAAAGGrTTG GCGTGTGCGG TAGAGAATGT GTTATAAAAG AGAGAGACGC AAGTTCTTTG GGGCTTAATC CAATATCAAA TGTTTTTTGA ACAAGGGAAA AAGCCAAACC GAACCTTGCT GAGGAGTGGT TATAGTGACT GTAACAGGAT TAAAGATATG ATTGCCAAGT CTTGAAAGAC AATGCCTGAG TGACGGCTTG AGCTCCAATC GAGGACAGCA AACGGCCATA CCATCAGGTA TAAAATCATG GTTrGAGCCAA AACATCAAGA TAAG'rAGCTA TCCAGGCGAT AAAACCATGC TGCTGATCAT GGTTGGTCAA TAGTCAAGGA AATAAAAGCA AGACGATGAG GAAAAAGCTT GCTCTTCCCA CTACTGGTTA TCAAATTAGC AAAGGGTGTT TGTCCGCTTT GTAG'rGAAAA ATCCAGCACC TAGAGGGCGT TAGGGTGGGT GTACCGTTAG TTGTTGCAAC 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1436 TAAAAATGGT AAAAAAGAGT GAGGTrGGCC AAAAATGAAG CCATGAGAGT GGTTGCGA'rG AGGTAAGAAA AAGCA.ATAGC CAGCAGGCCA ATATTGATTT TGGTGCGGTA ACCAATTCCA INFORMATION FOR SEQ ID NO: 297: SEQUENCE CHARACTERISTICS: LENGTH: 1696 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear ATGGCTAGAG CAATGG (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 297: CCA'N'TGGGA AAGAACGTAA GAGTTTGCAG GGTGAGATTC CAGAAGAATT TTCAATGTCA GCCGTTGACA TGTCTATGAT TGACCACATT CCAGATATGA TTGAAAATGG TGTGGACAGT CTAAAAATCG AAGGACGTAT GAAG'rCTATT CACTACGTAT CAACAGTAAC CAACTGCTAC AAGGCGGCTG TGGATGCCTA TCTrGAA.AGT TTGGTGGACG AGATGTGGAA GGTTGCCCAA
ACACCATCTG
GTCGCTGA.AG
GTCATTAACG
TATATTGAAG
GAACTATTGA
AAAGAAGGAC
GAAAGGAAAA
T'rCATTACCA ATGCTT1GTTA
CTTTGATACA
AATATAAAGA
ATCATAAGGA
ATTTCAAGAG
AAAATGAGCA
TGGTTCTA
AAGGGGACCA
ATTTGCATGA
CTATTAAGGT
TICATCAATCT
GGAAATGATA
TAGCAAGCTG
TCGGTGTCAT
GGATAAGCCT
AAGCTTAAGT
ACGCTXTTT
AAAATGAAGT
GTTGTTTGGT
TGATGAWGC
AGTTGAG7r
TGCCAAAGGC
1328 CCTGAAAAGT TTGAAGCrAT CGTGAACTGG CTACAGGA'T GCTCGCCGTA AAATTCCTGA GCACAAACAG CAACAATTCG TATGGTCCAG GTTTCCGTCA AA'rAAAA'rCG ACCGCGCTCC
CAAACAAGAC
TTACTATGGT
GTACAAGT
TCAACGAAAT
TrTTGAAACC
AAATCCAATG
TCGTGCATTA
TCGAGCTTAA
GCCTCAACCC GTrCAATCAG T'rATAAGGAA GATGGAACCA GAGGCACAGG GTTITCTTAGT GATATTATTG CTT'rACAATG GATTCATTAG AACATCACCC AT'N'TATGTG GTGTrTGTCT TGCCCCTTTT GTTTTCTCG AAATAGCAAA TCATCTAGTT AAATCTTCCC ACAATAAAAC TTTAAGATTT TAAAGACTTT TTAATTTTCG ACTGTTTAAT
GAGATATGGT
rGrCTCACAGT AATACCTGAT ACTATGCGT1' GACTACTTGT TAAAACTGGG AAGCTTAGAA 'rTGTCAAAAC T1'CTCAGTCA TGCCCTTGGA GAGTT'rGCAG CGTGAGATTC TGACCATATC TCAGATATGA GGAGTCTATT CACTATOTAT TCTTGAAAGT CCTGAAAAGT GGTTGCCCAA CGTGAACTGG GTTGTTTGGT GCTCGTCGTA GGATAAGCAA ACAAGATGCA CTATGATTGT AAAAACTATT TTTTGAGCCG TATCCCTTAT AAAACTACTA ACATATAAGC CTTTAATCCA GGTTGCCKAA TTGAAGTAGG AGAA.AACTCA GCATAATATC AAGATTGTTC T'rTCCTT'rAT CTGGTATTTT AG'r'ATTATG CAAAGTCTAA GATGCCAACC GTGGTGGATG CCATTTGGGA AAGAACGTAA GCCGTTGATA TGTCTA'rGAT CTAAAAATCG AAGGACGTAT AAGGCGGCTG TGGATGCCTA 'NTGGTGGACG AGATGTGGAA ACACCATCTG AAAATGAGCA GTCGCTGAAG TGGTTTCTTA 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 AATCCGTCTA GAGTATGCGT AGTACGACCT TTACGATATG CAGAAGAAT'T 'IrCAATGTCA TTGAAAATGG TGTGGACAGT CAACAGTAAC CAACTGCTAC TTGAAGCTAT CAAACAAGAC CTACAGGAT TTACTATGGT AAATCCCTGA GTACAAGTTT TGATGATGCG GCGGTA 1696 INFORMlATION FOR SEQ ID NO: 298: SEQUENCE CHARACTERISTICS: LENGTH: 1022 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 1329 (Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 298: CCGAGTrTAT TATGGTTTCT AAGAAAGTCT TTATAGTCAA CAT1T'PTAAC CTrCAGTACT CACCAGGTAG TGGACGC'PTC 7'TTTTlr-A TGCCGAGTTG TAAAAAAAAT CTCTrGGAGA TTATCGGAGC CTTGATAGCA CACACGATAG TTTCATCTCA TCTTGCTTGA GGCGATTTTG TGGTCAAAGA TGGTGGTGCC TCTTAACAAA. 
CGAGCACATT TTGCTGCGGA TTCAATTGCC TCGGAATTTA 'rCTCAAAGAT TGAATTTGCT TGCAALTAAGA AGCAAAT'PTA AGTATGCGAT GCTGCAGGTG CAGTTGGGGC CTCTTTCCAT TCGTTTTTGC GTCACTTCAA ACATGATGTT AAAACAGC1TG AGATTTTACT TCGTTCGA'rG TTCGCAGGTG TGACTTGATT AATAAAATTG TTGGGGCTTG GCCTACATTG CTTGACTGCT GGTAGTTTCT ATACTGTACC TTGTTCAACC GGGTGGGGCT TTGCTCATTC GGCAGCCTAT GCOAATCTGA GGTGTrGTTG AGATGAAGTT AGGCCGCTCA AATGAATTGG GCAAATAT 'rTGrAAATAT TGCGATTCTG TCATTTATTT AAACTTTGGC TTGTGTTGTC AGCTATTTAC ATGTTTGTrAT GCGGCGAACT 'TGCTrCTrT CGCGATTGTG AAATTCAGTG AACTTCGGTG TTGGAAATAT G CTTCGCCAC TGGGGTGTGA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1022 CTTTCATCGG AAACTTTATC GGAGGAGGCC ATAAAAACGA AGATACTrAr G1AGATTAAG CA7=TTCAAA ATAAGGTAAT AGCTATTTCT ArACTATAAC TCAAGGTCCT ACAATATCCT CTTGTGATTT TAAATTVTGAA ACTCTACAAC
TCTTGATGGG
AAA.ATGAGCA
TATATCAAAA
TAATAAAATA
TCTTCCATAT GCCTTCCTCA CGATTGAGTC GTGC'TTTT TATAGAAAAC TGATATTTGT ATATGGAGGT CACCTTATGA TCAAGTTGTG GCTCCAGCTA TACATGCTGr, INFORMATION FOR SEQ ID NO: 299: SEQUENCE CHARACTERISTICS: LENGTH: 663 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 299: CCPAAGTAA TCTCTGATAA TATT'PTCTTT ATTAGCATAG GGGAATATCG ATATAATGGC TCATTATGA GTGGCAGGAA TATCCAATAT GGCAACTTNT CCAATAGATA A'rTTAAAACT CATTAATAAA GL'CCTTTAG GTGAAATGTC TA'PTTTCTTT GATTTTAATG CTAATTTAGA 1330 AATAGATTCT CTCGCATTAG TTACATAACC AGATATAGGC AGGTATTCA GTTCCCCAAA AAGTAGCTTC ACTGCGTGGA ATATCTGATA TAGATACCCA GGAGTTTTC CTATrCTGAA GGGATT CAT ATATAGGATA TCTCCTTGGG AAACAATAGA GTTAACTAGG CTAGCAAATT AGAGGTTGTT TCG'rC'rrGT AATGTCCAAA TCTTTCTT TA'TTTTCA AGTAAAACTr T'EGCATAGCA TATTGAAGAA TTCTAGTCTA TTATAACTTT
TAATATATCT
TCCCATAATA
TAATCTTGCC
CGACTGATTC
TAGATTTTT
CAGCATATTC
CCATGCTTCT
AGAGTTATCA
TrCTTCAAAG AGTT TGTT TqTCTGCTCG ATCATI'TGGG TCTrGTTCAA CTAATTTTCC TACTTTATCT GGAA.ATTCrT TATCTAGCTG ATCTACT'r TCTAAAGCTG Ar1rCGATTGC
'FTC
INFORMATION FOR SEQ ID NO: 300: (i) SEQUENCE CHARACTERISTICS: LENGTH: 881 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 300:
5 5 CGTCGCTGAA CATGTCAACA CATAATTTTC TTTAGAAAAT ATCCTATGAC CCTTGAGGAA 'FTCGTCGACC AGAAGTGGTA AAAACAG'FGA GTACGAAGCA GCTTAGAAAC AAAAATCCGC AAGTAGCGAT TGGTAAAACA ATAT'TATCGT AGGTTCAGCT CAATTGG43CA GGCCTTGATT TTGGTAGCTA TGATGTAAAA AGTGGGGAGG CGATGTGCTT CTTCAAACCA CGTCAGCGTC CTACAACCTC AAAACAGTGT TATGTTTTGA GCTGACTTCG CTTCGTCAGT TTCATCTACA GCAAATTAAA CTAAACAAAC TAAAATTATG AT'TATCAGAA GAAAGTTGAG AAAAATGGCA AAGGAAAAAC TTGAAAAAGA ATTAGAAGAA GAACGCATTA AGATTGCCCG TTCATACGGT GCTAAGGATG AACAAGCCTF TG'TCGAAGGA TATGCTIGAAA TCGTCAATAG CGACGCAGTT GTCACCATCC AAGAAATTGG TGAGGACGAA GGTGCAGATG CCTTTGTAGG TAAGGTTTCA GGCAAGAAAA CAGGTGATAC AGCAACCATT ATCT1'GAAGG TTGAAAAAAC AGCCTAAAAA CACTCACTCC ?N'TTCCATT TTGCTACTCT GCCT'GCCGT ATGTATGGTT ACTGACTN'G TTTGAGCTAA CTrCGTCAGT TrCATCTACA TCAGTI'TCAT CTACAACCTC AAAACCATGT ACCTCAAAAC TATGTTTTGA G
TGATACTTCA
GAAAAAACAT
TTGAAATTGG
GACCTTTCAG
CAAATCTCTA
GCCCAGGACG
GAAGAAGTT'
AATGAAAGCC
GAAACGCCTG
CAGAAAAAGG
TCGAAAATCT
TCAGTTTCAT
ACCTCAAAAC
TFTGAGCCCA
1331 INFORMATION FOR SEQ ID NO: 301: SEQUENCE CHARACTERISTICS: LENGTH: 949 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 301: CCTrTTAA TACAAGTTIAT TTTGA'rPTAA CCGGCTrGTC TGGCAATCGT ATCTGCATAC AATGAACCTG GTCTGTTCCA CTGCTATCGT AATGTAAGGT CAACGATGGC ATAGGTCTCT GGTGTCCCTT AGGAAGATTT TATTTACCCC AGTCGCAATG GCATGATT~T ATTTGCGGTC GAAGAGCTGT CTGTAGTGCT TGCCATCAGC AATTCCCAAA CAATA'rrTGT TGCAGCTrGC CTTGTGGTGC CAACAAGGTC CAAGAATrGC GTGAATATAA CTGCAATCCA AGGTTCCAAT GAGTCAGTAA TACAGCAAGA AATGGAAAAG ATAAACCGCA AATTTTGCTC CTGCTTCGAT GCCCAAATTT CTGGATGCTC GTCTTCTCTG CCAATTCTCT TTGrCl-rAT CTCCCTCATA 'rrCACGATAC TGTCCCAGTC ACCACCGTCT TAGGTAAAAA TTGGTTGTTA CGCTGACCTG GTATTTGCCC TrAAAGCCAC CTGTVTGCAT CTGCCCGTTC T'rGAGCTGTC TGCAAAGCTG AGTGCTACTC TCACTCCCGA TT'rCGCAACT TGATTCCAAT CATATAGGCA GCAGCCTTCT AGGAGTCACC AAAATCATAT ATCCrTG'rAA TTCTCAGGAT TTTATTCTGG CTATTATTrA CGCGTTAATC TGTGCTCCAG TGAGTCACCA ATTAACATAG TGCCATCACC TTGG'rCTGGC TTCAAGCCAT TGACAGTCAA G'rCTGTCTCA AACGCTCCCA ACCGTGCAGA CAATGATGGT CAAGATTCCT GTACCTGCTG GGCAGGGGAC GAAsGGTTTG GACAATAGGT GTGTTC=TC ACATAAAATG ACAGACTGGC AAAGCCATAA GAACAAATCA AGA'TrTGATG TCAACTGTGA GAAAATGATA TAGAAAGGCC TAGCTAGTAT CCGCTAAAAA GCTGATAAT INFORMATION FOR SEQ ID NO: 302: SEQUENCE CH~ARACTERISTICS: A) LENGTH: 622 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 302: AAGATATATT TTTTACACAG AAGTATGCAA AAGTAAAGAG TGCAAAAAAT GGAATTAAAG 1332 CG~AAAM'AAA AGCCGTGTAC AGGCGACCAA ACCAACGTAC ACGGCTAAGG AAAAATAACA 120 AAACTCAAGC AAAGGCAAGG CGCGTGGTTT TGTTAGGTAT TTAGCAAGGG GACAAACCCC 180 TTTGTAAATA ATCTCCTCI' ATTTTIATCAA AATrAGAGGA AP.ATGACAAC TrAATrTATA 240 AAAAGGAAAA ATGGAGGATA TAAATGGAAA TTCrGTCTAA AGAAATACAG TTACAGGGCT 300 TACAACTTCT TAAACAGACT CTTGAAACTT TAGTTGAGCT A~tdAAACAA CGATCTAGTA 360 AGTTAGATrT AAX'TCTCGT AAAGAATTAA TGGATCTGCT AGGTATAAGT GCTACAACCC 420 TTGATAACTG GGAGGATCTT GGTCPAAAC GATATCAGAC TCCGATGGAT GGAGCTAAGA 480 AAGTATTCTA 
TCGTCCGTCA GATGTGTATT TATTNTTAGC AATAAAATAG GAGT'rATGAA 540 ATGAAAATTG TTACTTTCAA ACCAACTAAA CAAATAGACG ATGGCTTTTA ACTGCCAGGT 600 ATTGACATrC TATTTGTCTC AG 622 INFORMATION FOR SEQ ID NO: 303: SEQUENCE CHARACTERISTICS: LENGTH: 1929 base pairs TYPE: nucleic acid C) STRAflDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 303: CGCTAACTTG CAAACAAAAG AAGAACGCAA ACTCCACAAA TCCTTTACGC AGAAACTCAA TCTC.ATCTAC TTACCTTGCT GACTTGGTAG AGTATGTTGC AGACAAAGAC TTCTCAGTAA 120 *ACGTAATTTC TAAATCAGGT ACAACAACTG AACCAGCGAT TGCTTT-CCGT GTCTTTAAAG 180 AACTCTTGGT TAAGAAATAC GGTCAAGAAG AAGCTAACAA ACGTATCTAT GCAACAACTG 240 ACCGCCAAAA GGGTGCTGTT AAGG?1'GAAG CAGACGCTAA CGGTTGGGGA ACATTTGTTG 300 .TTCCAGATGA TATCGGTGGA CGCTTCTCAG TATTGACAGC CGTTGGTTTG CTTCAATCG 360 *CAGCATCAGG ACCTGACATA AAAGCTCTTA TGGAAGGTGC GAATGCAGCT CGCAAAGACT 420 ACACTTCAGA CAAAATCTCT GAAAACGAAG CTTACCAATA CGCAGCTGTT CGTAACATCC 480 TTTATCGTAA AGGCTATGCA ACTGAGATCT TGGTAAACTA TGAGCCATCA CTTCAATACT 540 *..TCTCAGAATG GTGGAAACAA TTGGCTGGTG AATCAGAAGG AAAAGACCAA AAAGGTATCT 600 ACCCAACTTC AGCCAACTTC TCAACTGACT TGCACTCACT TGGTCAATTT ATCCAAGAAG 660 GAACTCGTAT CATGTI'TGAA ACAGTTGTCC GTGTTGACAA ACCTCGTAAA AACGTGCTTA 720 TTCCTACTTT GGAAGAAGAC CTTGACGGAC TTGGTTACCT TCAAGGAAAA GACGTTGACT 780 TTGTAAACAA AAAAGCAACT GACGGTGTTC TTCTTGCCCA CACAGATGGT GATGTACCAA 840 1333 ACATGTATGr GACTCTTCCA GAGCAAGACG C?1'rCACrCT TGGFACACT ATCTACTrCT TCGAATrGGC GTGT TGAAGC
TGAGCAAAGA
TACTCTCTTT
AAAAGGCAGA
AATTGCCCTT
TTATAAACGT
ACT'rAACGCA
ATCCATAGAA
CGCCTAGATA
TCAGGTTACT TGAATGCTAT CAACCCATTT GACCAACCAG AACATGN'G CCCI'CTTGG AAAACCAGGA TTGrAAGAAT CGTCTATAAT AGAAG1AAAAG ATTGGACTCA GCCAAGACTT ATAGGAGAAA CTATGTCAAA CACCAAGTCC AACAGGACTA.
TGTATGCGCG CCATCATGGT GCCATGTCGA GGATGGTGAA GGGATGAAAG TCCAGAATCA AAAAATATAT TGACCAACTA
CTACACATCG
GGAACATTTC
CGTTCACAAC
GAAATGCTCG
TCATCCGTAT
TTGAAAACCT
CATGAGAATT ATCGCCAGTC TTAGCTGAAG GAAAAGCCTA AGTGGTTTGC CCACTCTTT GTGATATAAT ATAGAAAGCA AGATATCCGC GTACGTTACG TACAGCATTG ?1'TAATTACT CGAAGATACT GACCGTAAAC TCGCTGGT TA GGCATGGATT TGAGCGTTTG GACTTGTATC TAAATCTTAC GTi'ACAGAAG CGAAACACCA CGCTACATCA CATCGCAGAA CGTGAAGCAG AGGTATC'rAC AAGTGGCATG CGGTGGTGAC TGGGTTATCC TATCGATGAC CACGATATGC TACACCAAAA CAGCTTATGG CATGACCTTG ATTATCCACT 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1929 AAGAGTTGGC AGCTGAACGC GAACGCCAAG ATGAATACCT TGGTATGAGI' GAAGAAGAAA CAGGGATCAT CCCAACTGTT CGTGTGCrG ATATGGTCAA AGGCGATATC GAA'TTGCAAG
AAGTAGCTGG
A.AGCAGCTTA
TCAATGAGTC
GTGGCAATAT
AAAAGAAAGA CGGTTACCCA ACT TACKACT TTGCCGTTGT AAATCTCTCA TGTTATCCGT GGAGATGACC ATATTGCTAA TCTATGAAGC TC~rGGTTGG GAAGCTCCAG AGTTCGGTCA
CTGAAACTG
INFORMATION FOR SEQ ID NO: 304: SEQUENCE CHARACTERISTICS: LENGTH: 708 base pairs (B TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 304: A.AATTTAAGA AAAAGGAGAC ACATCATGTC TAAAAAAGTA TTATTTATCG TCGGATCACT ACGTCAAGGT TCTTTCAACC ACCAAATGGC GCTCGAAGCT GAGAAAGCAC TTGCTGGTAA AGCGGAAGTT AGCTACCTTG ATTATTCAGC CCTTCCTCTC TTCAGCCAAG ATTTGGAAGT TCCAACACAT CCAGCTGTAG CTGCTGCTCG TGAAGCAGT' CTCGTTGCGG ATGCTATCTG 1334 GATTTTCTCT CCAGTCTACA ACTTCTCTAT CCCTGGTACA GTGAAAAACT TGCTTGACTG GCTATCTCGT GCCCTTGACT TGTCTGATAC ACGTGGCCTT TCTGCCCTTC AAGACAAGTT TGTCACAGTA TCATCTGTAG CCAATGCAGG GCACGATCAA CTN'TCGCTA TCTACAAAGA CCTCTTGCCA TTTATCCGTA CACAAGGCCT TGGTGATTTC ACTGC'rGCAC GTGTrAATGA c~cTGCCTGG GCAsACGGAA AATTGGTTCT TGAAGAAACA GTCCTAAACT CACTTGAAAA ACAAGCTCAA GACTrGGTCG AAGCTATCAA GTAACTAACA CTCAATAAAA ATCAAAAAGC AAACTAkGAA GCTArCCGCA AGCTACTCaA gCACTGCTr'r GAGGTTGTAG ATAGAACTGA CGAGTGThnA ACATATATAC GGTAAGGCGA CACTGACGTG GCTTGA.AII INFORMATION FOR SEQ ID NO: 305: SEQUENCE CHARACTERISTICS: LENGTH: 781 base pairs TYPE: nucleic acid CC) STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 305: CTTCTTTTCT TGGAAATAGG TGTATAATAC AAGAAAAGTT TTATCCATC-A ACAAGAAGAA TATTTGAAAG ATAAGCTAGA AGTTGTCGAA GACGGAATGC AGGACAACCT GTCTGGTGTG ATCCCTGATG CTACTTATGA AGTGGTGCAC GCTCGTrTG GCTTTGGTGA AGGAGAGGGT GArGAGGATT CCTTGGATGC AACCCACTCT GTTATCCCAA ATGGTAAGCG TAACATCGTT AAGGCTATTC GCCTGACTGA GCTAGCTGTT CCAAAACAAA TTACCTTTAT CCATACAGAA CCGAAAGAAC GTGAAAATGC GATTTGTAAA GGTGGCGAGT TGCCAGATGG TAAACCGCAC ACAAGCGAGT CTGAGAATGG CTACAAGGGT
GTTTATTAAA
ATTTCCTTTG
GTTCAAGGTC
GAAAATCCAG
TCACTTGCTA
CTCTTTGTCC
GTTTATGTTG
TATCTAAAAG
GA.AGCCCGCT
GAATTGGTAG
GAATTTGGAC
GATGGACGTG
TTTTGAGGA GTTGTCTATG TCAAAAACAC TTTTACCCAG CTATCTTGAG TAAGGTCGGT TATCGGTCAA GGTTCTCCAA AATGGAAACG CCACACCTTG ACATGAAAGC CCTTCGTCCA ACCAGTGGGA CTGGGAGAAG AAACAGTTGA GAAGATTTAT ATGACATCGA GTCTATCTLC AACGCTACCC AGACTTGACA CCGTCTTTTT GATTGGTATC CACCAGACTA TGATGACTGG CTGGAATGAG CTAAATGGTG ATATTCTTGT INFORMATION FOR SEQ ID NO: 306: Wi SEQUENCE CHARACTERISTICS: 1335 LENGTH: 846 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 306: CCCGCATCTT GTAGGGTTTr AACGGGCACG ATN'TCATAT CCGTC?1'GAT TGTTTrAGCC GCTTCTAGGG CTGTTrGGTA GTTGTTTTrC GCGTCCGGAT GCGCCTTG TTrTCCG CTAACAGGGT TATCAGGAGC AAAGAAAATA GCAGCACCTG CCCTAGCCGA AGCTACAACC TTCTTATCAA TACCTCCAAT GTCTCCCACA TrACCATCGC GGTCAATGGT ACCTGTACCG GCAACAATAC GACCAT'rACG AAGATCTGGG TGAGCTATTT GAGTATAGAT AGCTAGAC1TA AACATGAGAC CAGCAC?1'GG ACCCCAATA CCAGCTGTrG AAAAGCTAAT TGGGACATTG CTGATTACCT CTGTACGGTC AATCAAGCCG ATTCCAATTC CATTTI'TGCC ATTTTCCAAG GTGATGATTT TTCCTTCTGC AGACTTGGTI' TGCCCATCCT CI'CATAGGT GACC'N'GACG GAATCCCCTA ATTTTrGAGA ACTGACGTAA TCAATCAAGT CTTTGGAACT ATCAAAGGTC TGATCATTGA CTGCTGTGAC TGTATCAGAG ATATTGAGAA TCCGTCACAT TCAAAACATA AACTCCAAAG TACTrGAGTT ?r'rAGTCCTT GA'rACTTGGC CATATTTTGC GATGTTTGCA ATAAATTCAA CATCGGAAGA ACCACCTGTA GTCTCCTGAG GGTGTCAACC AAGCATAAAT CATATGAGCT AAAGTGGCAT
AATTGT
INFORMATION FOR SEQ ID NO: 307: SEQUENCE CHARACTERISTICS: LENGTH: 829 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear TCCCTTTAAA GGTTGAATTA CGATATCCTT ACCAGCTGTT TGTAGAATTG ATTGATTCGC CACTACGAAT ATCTGTAAAA GTTGAACACC AACCGTAACG (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 307: GCGATCTGCT TGGGCTN'TC CTATTACCTT ATCTAATAAA TAGGTACGCA GACTCATAAC CATATAAAGT CCACCCCCCA TGGCACCGAC AAGAGCTACA TAA.AAGAAGC TCCACAAACG TCCACTTGGT TGGAAGAAAA ATCCTAACAG CCACTGGATG GTTCCTATTA ACAGAAACAT GACTAGGGTC AGCAAACTGA TTAAAATGGT TCGCTTCA.AA ATCACCTTGC GCTTGACACC 1336 AGTTACTTTA CAAATATCCC GATACATCAA GACGTTAGGA
T'GAAATCAAA
GGCAATAGAA
CATTGGAGAC
TAAGCCCAGA
CATAACCACT
AACGAGACGA
ACCAACACTC
GGCTGAGAAA
GGACCATAAC TGTGGAAGAG GGCGATGGTA CCATAGATAA AATAGAGAAC GGCCI'GCGG AAGACCATGT ACAAGCCrAA AATAATAGAC GCCAAACTAT CTGGCT'rACC ATAGAAGACC CCAACCGTTG CTGGTAGCAA GAACATAAAG GAAGCTGCT'T TCAAGTCCCC CTTGACATAG CCAATCGAAA CCCCTACAGA AATCAAAATC TAAGAAAACA TGACAACCAA GTCCTCATTG ATGATGAGAG CAATGGTTGT GGTAGTTGCA AGACTAGCN' TTGCGGAACA TGCCCTGAAG TGCAAAACTG CAAAGACAAA GTATAAAGAG GTTCTCCTAC AGTAGGGTGA GACTGTCCTG T'rTCCGTCA AA.AGTGGCAA ATCGTGATTT TATTAGGATT CTGTAGTTCG TAAACCACCT 0
CATACTATTG ATAAAGGrCA GCTGAGTCCA AATCTGGAAG AGCTGGATG INFORMATION FOR SEQ ID NO: 308: SEQUENCE CHARACTERISTICS: LENGTH: 464 base pairs TYPE: nucleic acid STRANDEDNESS double D, TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 308: CGAACATCTT GCTGGCTGAT TCGTCTGCCG CCATCGCAGC CCCGAACACA TGGCAAGCGG GCTCAATCCG CACATGGGAT CCGTGCCA.AA GCCCCGCGTG GCTCATCTAG TAACGTATGA GGTTTGCCTT CGCTGTCCAT AAACCGATAT CACTGCTCGT TCTCCGCGGA GGGGAAACCG ACTGCGGTAG GATGAACTCC GATCACGACC TACCAGGTGC GGCTCGTTGA AGCTGTTGCC GCTTAGCAGC CCACGCATTC CCAGAACTCA. ACGGGGGTTT GATCGGCGTT CGGTTGCTGA GGTGCACGGG ATGCGAAGTG GCCACTTCTG GCACACCGTT CTTGTCTTCG TTGGGAGGGT GGCCAGCGTT TCGGCGATGA GGCGCACGCA GGCC INFORMATION FOR SEQ ID NO: 309: SEQUENCE CHARACTERISTICS: LENGTH: 982 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 309:
TTGCGACCCA
TGCATCATTT
TCAATCGCAC
AGAGAAGAGA
AGCCTCGCCA
CTAATA.ACTC
TAGAGAGCAA
464 CCGTCTATAA TGGTAATAGA TTTITATTTGG GAGCAGGTAT CTTGGCCTGC CTTCTCAT
TCCCTGTGGT
TCTCGTCCTA
ATGCCGTTGT
TGGCGCGCCC GT'rTTTGCCA TAAACAACTG GATGCTGGTT CTTGCTTGGT TTTGGTCTCA AGGTTTTTAT GTCATTTCTA TCAAAAAATG CCATCCTATC TTGGTACTTA GGAGGATTCT TTTTCATAGG CATGCTCCTA CATCCCTrTC TGACCTTTAG TTCCAAGAAG TTGCTCCAAT ATATCTCGCA GGTCTTCGCA GTTGGCCAAT CTTCACTCCC TGTCATCCTG TCCACTATCT CAATAGCTCT GATTATTGCC TACCTCTrCC AGCGTTTCrT TGCCCTGGAT ACAAAACTGG CTACCTTGGT TGGAGTAGGT TCTTCrATCT GTGGGGGTTC TGCCATTGCA GCGACAGgCC CGTTATTGAT GCTAAGGAAA AGGAAGTAGC CCAAGCCATT TCCGTTATCT TTTTCTTCAA TGTCTTGGCT GCGCTCATCT TTCCAACCCT CGGCACCTGG CTTCATCTAT CCAATGAAGG CTTCGCCCTC TTTGCAGGGA CTGCGGTCAA CGACACTTCC TCTGTAACGG CTCCCGCCAG CGCT'rGGGAC AGTCT1rrACC AAAGCAATAC CCTCGAGTCT GCAACCATTG TTAAACTCAC ACGTACTTTG GCCATTATCC CTATCACCCT CT'CTATCC TACTGGCAAA GTCGCCAACA AGAAAACAAG CAAACCCTGC AACTGAAAAA AGTCTTCCCA CTTTTTATCC TTTACTTTAT CCTTGCCTCT CTCCTCACTA CACTAC'rCAC CTC'TCTAGGT GTGTCCAGTA GTTTCTTTAC TCCTCTCAAA GAACTCTCTA AATTCCTTAT TGTCATGGAC ATGAGTGCTA TCGGTCTCAA AACCAATCTG GTCGCTATGG rCAAATCCAG TGGA.AAATCC ATTCATCATG GA INFORMATION FOR SEQ ID NO: 310: SEQUENCE CHARACTERISTICS: LENGTH: 1939 base pairs TYPE: nucleic acid STRANOEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 310: CTAGCTGCCA ATATGATTGG GGTGCAGAAG CGCGTGATTA TCTTTA.ATCT TGGCTTGGTT CCTGTGGTCA TGTTTAACCC AGTGCTTCTG TCCTTTGAAG GATCCTATGA GGCAGAAGAA GGCTGTTTGT CCTTGGTAGG TGTGAGATCA ACTAAGCGTT ATGAAACCAT AAGGCTTGCC TATCGTGACA GCAAGTGGCA GGAACAGACC ATTACCTTGA CAGGCTTCCC AGCTCAGATT TGCCAGCATG AGCTGGATCA CTTGGAAGGA CGAATCATTT AGGAGGAAAG CAAATGAAAC GAATAGTCTT TGAACTTATT TTTATCGCAA CGACCTGGTA TATCTTTTTA CCGCCCCTTA 1338 ACCI'GACCAG CTGGGAArI'T CTCTTCTTCC TCTGTGGGCA ?TTGTTAG'N' GTGGCAATAT TATTTGGCI-r TGGCAAGGGG ATAAACCTTG TCAAAACCCT TCATGTrGCGC CACGGTAAGG CGGAAGCTGC CTTAAATCTT GAGCGTTTCA AAATCAATCG GTTAGGGAAA ATTCTGTTAG CTrCGATrGG AGGAATTCTT CTCTTGGCAG TTCAGGCTAA AAATTATGCC AATGTAGTCA CTAAGAGTGA CACCAGTAAG GTTCCTATCC ACCGCTACTT GGOTTCCCTA ACCGA'AAGG CTT TGGTTTc CTTGGTAACT TCCAGCATGT CGGTTACGGA AAAAGAC'TTT ACTGAATTTC TAGATAGAAG TACTGCTGAA 
AAAATCGAG TGTCGCAATA CGTAGCGGCA GATACCTATA GGGTCACACC ACTAGAATAT GCAGACCCTA TCGGTGAGTA TATTAAGGTG GACATGGTAA CACCAATCAA GTATTCAGAC TCGGAGTATT CCCAATTGAC AATTGATGGG TCAAATGGTT TAACAATCAA CTGGAAATGC GGATTTGGTG TTAACCGTGA TGTCAAACGT CATCTTTTGA GGTGGACGAT AATrTGGACT TGCTGT'rCCT AAACCAAGGA ATACAGCTrA
AAACCTTATC
GCCAAGGGAA
GACTTGAAGA
CACCTGCGCT TGAAGTACCC GACCAAAATC TTTAAAACTC
TACCAAAAGC
ACAAATGGAG
GAGGGCAATC CTTTCATGT AGCAACGGTT CGTCCTGCT'r CAGTCATTAT CTTGGATGCT TCAGATGTTC CAGAATGGGT AGGAAACCAT TGAGCAAATC AACTACAACG GCAAGTACAA TGATT'rCCAA GAAAAACGTG ACCCAGACTA CCAATGGCTA ATGACArCTA TCTCTACACA GGTGTGACGT CGGCTAATGC TCATCCTTGA AAATATGCGA ACAGGAGAAA TCACTAAGrA AAGAATCAGC CCGTGAATCA GCAGAAGGTG CTGTTCAGGA TCCCAATCCT CATCAACCTC AATGACAAGC CTCTCrACAT CTGGCTTGGT CAAAGAGTAC GCCCTGG'rAG ACGCAGTCGA CTACTACAGT GGAAGAGATG CTCAGCAAGT ATGCCAATAA ATGCAACGAC AGAAAGCATC AATGGAGTAG TAGCAGACCT GAGACACTGT CTACTTCTITr AAAGT'rGATG GCAACATCTA GGACAGCATC TA'rCCAGCAG GGACGGTTTC TTGAATGCCA TAATTACTTG TCTATCGGTA GGATGAGAGT*AA'rCTTGGT-r 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1939
TAGCTTGGCT
GAAATCCTAC
CATGGGCTTG
GTACCAAAAT
AAACCACCTT
CAAATCAGCT
CAAGGTCAAG
TCTGCGACAG
AAAGCAACCT
AAGGACAATG
GT'rATCGTTG
GAAATTGACA
GTTATCAAL;G
GCTTCAGTAT
GGAAAAGACA
CCGATGACCT TCCTTACCTT GAAAATGGTA ALTTATCTCAA GACCTTTAAG CTACGGTAAA AATAAGGTAA ATTAAGCCG- INFORMATION FOR SEQ ID NO: 33 Mi SEQUENCE CHARACTERISTrICS LENGTH: 907 base pa TYPE: nucleic acid AAACCTTCGA AGGTCAAGTA AATAGGTTrTT TTTCAGAAAG TATATGTTAT 1339 STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 311: CCTGCTAATA GAGAGAAAGA, CTAGGAGTAG AAGTAAGCCA AITAAATAAT GAGAAAGI'T CATACCCCGT CCTTTCATGT AGATTTGGTA TCGAAAGATA TCTGCGGATA TAAATGTAAC
ATTATTTTTC
GCAACAAGAC
ATAATATCTT
TAATCTGTCA ATAAAATTTC TGACAATTTA ATAAATACAA TTTCTCCTTT GTTATCCTAT TCTAAAATGT r=ACCTTA CGAGGGAGTA GCTAGCCGTC CAATCAAGAT ATTGTTTAGC
CAAGGAGAGA
ATCTGATAAA
TTTTGAAGCA
TCTGCTAGGA CAC1TGGCTGG GTCACTAGCA CGTCGAGCAA CAATCTCGTG TGGGATTTTT TAATTTAGTA ATTCTTCAGC AGTTTTAAAC ATTTCTTT"GA TAGTA'rAGCC T'T'r'rTAGTT a a a CCTAAGTTAA AGATTTGAGA AGAACTG-CT TCr'rGAAATA TGAGCCTATG CAAGGTCCAA GACATAAATG TAATCTCGAA TCGTAGTCAT C1'CCAAATAT TTTTAAGCTA TCATTTTGTC TT'rGGAATGA TGTGAGTTGG ATTT rTCACA CGCAGACCGT CCAGCAACAT TAAAGTAACG GAAAATAACA TATTTCCAGT TAAATCATTC GTTCGCCCAT CAGTTTTGTC TCTGCATAAG GTATCTTCAG TCACCGGCTT GTCAATACAG TTA'N'?CCAT AACATGATTT TTTGAATGCC AACTTCAGAT AAGACTTTGA
ACGTTGG
GGTAGTTCAT
TACATGAACC
CCAATGCGGT
TTGAAGCATC
CGTAGCGATT
GGTTGACAGG
AGAGAGAAGC
GAACTTGGTT
TCCTT'rAACA
GTCACGTGTA
CTTGTTGATA
CATTTCAGCCC
GCCATCCAG
GTCGAGCAGG
AGTCGAAGAG
CATACCACCA
INFORMATION FOR SEQ ID NO: 312: SEQUENCE CHARACTERISTICS: A) LENGTH: 2170 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 312: CCACATAAAG GTAAATATCT TTGTACTAT CTTGGGCATC CAAGAAAAGC AATTGGGCAA TAACAGAGTT AGCCATATTG TCTTCAACCG GACCTGTCAG CATAATGATG CGGTCTT-rGA GAAGACGTGA GTAAATATCG TAAGAACGTT CTCCACGGCT TGT1TTGTTCA ATAACTACAG GAATCATTCA T7rrCTCCTTT TGAGTTAA TT'PTGTTGGT CAAATGACTG AAGATAAGAC 1340 TATTrATAATA TCTTGGI'CAA AAAAGGTCAA ATI'TTGCTC AAAAACCCAA CCTCCTT'rCG 'rGACTGGAAA TACI'TTCCA TT-ATI-rGTA CCGAACAAGC GGTCTCCAGC TTCG~rAAA CGTTCATCCA AGGCTGCTGT
ATCTCCAAGA
AAAAMrrCT
AAGGCAGACA
TGCTGAGCCA
TGCT'TTCATT AGACAGAAAC AGTCATTrCTT CTT,CGATC CCT~GGAACGA TATAACCCTG ACATCTGGAT GAGCTTCTTG AATTTGATAT TTGATGCGCC CCTGTTGCCA ACATT TC
AAGGGCTTTT
ACGT?1'rTTA
TACTACAAAA
TTGAAcTGr
GTTCAAGAGA
CTTACCTGCC
TAGTGGAAGA
ACGAAAAGCT
TGGGTGATTA
CTTATTATAC
ACATTAGATC
CTGTCCCT
AAACCACGCG
GGGTTACATA
TGACCTCAGC
TACAATGCTC
TTACCAGAAT
CTTGGGCTGC
T'rCTCTATTT
TGCCATCCTT
ACACCCTCTG GAGCAGATAC AGAG2AATCAA CAGCCAAGAT ATTrTGACCTT GGTCAATGTC TCTTCATCAC GGTACATACC CCATCAACCA TCCCGATACC AATTG'1TT GAACTGrVr TCACGAAGTA CTTCATACCC TTTGTAGAAG TATCTGTACG ATAACTTCAA TTrTTTCCCAT CAAAAAACGG TTTAAAAATC AGCCTCrrrA AGAGCTGTCT TTCTTGATAA AGGTATTCG CTCAGGCAAT 'rTCACCAAGT ATTCAACTGG GATGTGGCCA ACTTTAGCAG CTGGAACCAA TCCACGCAAG ATTG4GGACGA TGCCAATT'r TGTAATTGGT CTTTCGATTT CCACATCTTC CATCAACATT GCAATCTCAT CTAC'TAGCTC ACGCAAGATT GACAATTTGT GTTGAA'rCAG TTT'rGGAATT CCTTCTTTCA ATrrATTCTT TTTCTAAACC ATTTATTTr'T GATAATTTT GTACTGTCTC AAGTGGTA.AA TGGGTCAATT CGTAGTCGTC CATCGGTAC 'rGGTTGATAT ATTGTTTTGT ACAGTTGAGA CAAGGAAAAT CACCACGCTC AGCACCTTGA AGGATAGCAT CCTTCAAT GACCAAACAT TCGTGATCAA
CTTGCAGCCG
GGCTGTAAAG
GTGAAGGGTG
AGTCCCTGAC
CGCGCCCACT
AAAATACTCA
TTTAAAAAAT
GATCCAT=r
ACCTGAAGCA
CCTTTr'GGGAA
CGAACGCAGT
ACCGAACCAT
TTAGCACGTT
TCCCAGGCCA
1200 1260 1320 1380 TGTAACCAGT GGAAATAACC TACAAGTGGA ACGATTCGCA GTCTTTTTC AGTCATCTCI' *0 00 0 9 00.0 0 0000 000 S 0.00 00 .0 6 0 9 GATAAGAGCT GACACCTTTT GGACAAAGAT TTTTTCCACA GGTAAACCTA AATCTGCAAT CTT-TTCAGCT1 AGAAGGACAG AGACGATGGC TGAGCTCCAG GATTTCCCAT GrTATTTTTC TAGAAATTCC TGGTAATCCA AGGCCAATTG ArTACTCTA ACCCAAACCA AATAAAGGTC TGTCTTTAAA TGGACAGTTC GTTI'AAAGAC GCTCTCTAAA AAGGCCGCAC GCGAAACACC TGCACGTTTG TCCTT'TTTT CCAAGAGTTG CAAGAGGGCT TTATTATCTT 1440 ATTAGTAGAG 1500 TT'rCTCCTTT 1560 GGTACCTTCA 1620 AAGGAATGAA 1680 TGCATGGCTT 1740 GCT'rCCTTTC 1800 TCGTAATGAT 1860 ATTTCCTCTT 1920 ACCAATTCAG 1980 GTTTCAATGG 2040
TGGCCTCCCG
GCAGCTGTTC
TGGAGTCATA
GAAAAGGTGA
CATAATATTG
ATTGCGATAA
AAATACTAAT CTTGGTCAGT 1341 CTrCTCTGGr TAATAAATTG GATrCTTGGT TTGATTrTCT GAGATTTTCA AGAGACTN'T CAGAGATTCT ACGTTCAGAC ATAACATTTT C7?PTCTACTT GTCACAACAG ACGGATGATG CTTTTGTrTC INFORMATION FOR SEQ ID NO: 313: SEQUENCE CHARACTERISTICS: LENGTH: 539 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 313: ATCTGCACGA ATCAGGGCTr TCTAAGTGAC AGGACATTCA TATGTCACGT TATACAGGAC TTTCACTTAC AGGTACAGGT AAAGAATTGG GACCAAACAA CCGTTCTA.AA TrGTCAGAAT TTCGTTTCAC TTACGGTGTA GGTGAAAAAC AAATCAAAGG CGGAATCCTA GGTTTCAACT ACGTTGTTTA CCGTCTTGGT CTCGCGACTA ACGGTCACAT CCTTGTTGAC GGGAAACGCG GTCAACTGAT CTCAGTTCGT GAAArATCAT TATTTCCACC GAAATATTAT CATCTTGGAA ACAAGCTCGT CACGTCGTAA CTACGTACCA ACGGTTrGCA ATTGGCTGAA AATTCCGTAA CTTGTTCGTA TTATGCTTCT TTTGGAACGT CTCGTCGTCA AGCTCGTCAA TTGATATCCC ATCATrCCGC TGAAAGTTCC AGCAATCCTT
TTATATCAGG
CGTCTTGGCC
GGACAACACG
AAACAAAAAC
CAAGCTACAA
CGTTTrGGATA T'rCGTAAACC
GTAACTCCAG
GAAGCAGTA
INFORMATION FOR SEQ ID NO: 314: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 667 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ, ID NO: 314: CCGGTTTGC TCCTTCTCTA CGGCTACGAC GTGATGTATC TCTGATGATA TCCACTGTTT CTGTMGCAGG CGTAGGTGTT TCTGGACCTG CTTGTTCTGC TT7TTTCTCT GCCGTCGTAT AGGAAACAGC TACCCP TGTT GGGGPTCAT TGTATTCTCT TTCAAGTTTC TTAGGTCTAA CAGGACCTGG ACCTGGTCTT GATCCACTTT CTTCCGCTGG AGAAGAAGGT ACATCTTGAC TTGGATGACT TGGAACACCA GGAGTTTCTC TTTGAATCTC ATCTGCTGGA GAAGCTGGTA 1342 CACCTTGACT TGGGTGAGTA GGCACGGTAG GAGCTT'r TCT CATAATCTCC TCTACCGTTG ACAAGGAATC AGCCATGAGT TCTTCAGTTG AAGGTTCAT TGCAGGAGTG CGAACTACTG CCTCATCTTC TTTCAGAACT TCATCATAGC CTrl-rACTTT ?rCTAAATCT CTCAGAATCT GCTCTTTAAA GCGTAATTTC TC rCGCTC TTGACTrTTC ACTCAAAAGT '1-rwrCCTCCT TGTTGAGAAT CCATAATATT AGAGCTGAGA AGTCCAAAAA AAGCAATCTA TGATACTTT CCTAACGGAT TTTGTCATrT CCCAGACCAT ATCATACCAT GTTTCCCCTG CAAAGGTTGA
CTGGGAA
INFORMATION FOR SEQ ID NO: 315: SEQUENCE CHARACTERISTICS: LENGTH: 1483 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 315: GGGAAGCCAA GGTATTTTAT AGAGTCTTCT AATGCAATTA TTCTGCTGAA ATGATTGCCT TAATAGTATC AGCTCTAATG TGAATCTACA GTAACATCTA CAGAATTAPC AAAAAAGAAT AGGTGATGGG ATTCATGATG AGGGCTAGGT GGAGGAAATG rrTTTAAAA AGTCATACAC AAATATTAAG AATCACCCTT GCAAGTAGAA TGGGGCCCAA CGGTGCTTTG AATGAAGAAG TGCATTTGCT ATTGGGAATT TTATCAAGGG CATGCTATTC TTPTTCT1'GGG CAAGCCTTAC CATTCAGATT GAACCATTAA AAAATCTGAA AATGTGACTA CGGATGAAGT TGTTACTAGT TCTTCACCGA TGGCTACAAA CTAATGATTT AGATAATTCA CCAACTGTTA ATCAGAATCG CTAATTCAAC CACTALATGGT 'IrAGATAATT CGTTAAGTGT GTACTATTCG TTCCAATTCA CAATTAGACA ACAGAACAGT CTAATGAAAA TAAGAGTTAT AAGGAAGATG TTATAAGTGA TTGAAGATAC TGCTTTAAGT GTAAAAGATT ATGGTGCGGT ATCGACAAGC AAT'rCAAGAT GCAATAGATG CTGCAGCTCA TATATTTTCC TGAAGGAACT TATTTAGTAA AAGAAATTGT ACTTAGAATT GAATGAGAAA GCTACAATTC TAAATGGTAT CCATTGTTTT TATGACAGGT TTATTTACGG ATGATGGTGC CAGAAGATAT TAGTTATTCT GGTGGTACGA TTGATATGAA GAACTAAAGC AAA.AAATCTA CCACTTATAA ATTCTTCAGG CAAATAACGT AACTATAAA.A AATGTAACAT TCAAGGATAG AAATTGCAGG TTCGAAAAAT GTATTAGTrG ATAA'rTCTCG CCAAAACGAT GAAGGATGGG CAAATCATAA GTAAGGAGAG CTAGAAAAGG TTTCCTTAT GCCT'rGAATG ATGATGGGAA TTCAAAATTrC CTA'ITTTGGC AAAAGTGATA AATCTGGGGA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 a a. a.
ATrAGTAACA GCAATTGGCA CACACTATCA AACATTCTCG ACACAGAACC CCTCTAATAT TAAAATTCAA AATAATCATT T'rGATAACAT GATGTATGCA GGTGTACGTT TrACAGGATT CACTGATGTA TTAATCAAAG GAAATCGCTT TGATAAGAAA GTTAAAGGAG AGAGTGTACA TTATCGAGAA AGCGGAGCAG CTTAGTAAA TGCrTATAGC TATAAAAACA CTAAAGACCT ATTAGATTTA AATAAACAGG TGGTTATCGC CGAAAATATA TTTAATATI'G CCGATCCTAA AACAAAAGCG ATACGAGTTG CAAAAGATAG TGCAGAaTW'r TTAGGAAAAG TATCAGATAT TACTGTAACA AAkAAATGTAA TTAATAATAA TTCTAAGGAA ACAGAACAAC CAAATATTGA ATTATTACGA GTTAGTGATA ATTrAGTAGT CrCAGAGAAT AGT INFORMATION FOR SEQ ID NO: 316: SEQUENCE CHARACTERISTICS: LENGTH: 2453 base pairs TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 316: 1080 1140 1200 1260 1320 1380 1440 1483 CCTGAACOCT TTTTTATAAA TATCATAAAG ACGCGAATTA AAATTCATTG CATACTCCAT GTTAAAATCA TCTAAATTCT GACTCCAATA TGGTTGATTA TTCAATAAAT TTAAGTTGGT CAACTTTTTT TCTTCAATTT GAGTTTCTTT AATATTATTT CTATTATCAA. TAAGTATATT ATATTTGGAG TTTATTATCT CCATATAATC ACTAACCATA TTATACTCGC TTAATTTATC CGAATTCTTT ATGTAAAAAT CGTTAAAAAC TACTAATATT TCTCTTTCAT TTATTAAACG CCAATCTGAT TTATCAAGTG TCTCTAAGCG CGCTTCTAAA AAACTCATTT TTGAAAAGAC TAATAACAAA ACCAATCCCA TAATATCCTC TTCATAAAAC CCTGGAGTTC CA.AATAGAGG CCTTAGGGCA TGCTCAAAGT CTATAATATA ATTAAATGAT AAATCTCTAT AGGAAAGATT AATTAATGTT AAAAACCAAT CATAtGAGCC TGCAATAATA AACTCAAATT CCACAAAATA TTTTGGAGTA AATTCCTCCT TTTCCAATTC ATTCACAGAA TCTCTATTTG TAAAATCAAC CAACGATAAA TCACTAGCTT CTTTTAATAA AGAATAAACT CGCTTTTGAG TTTATAAACT CCACCTTTGG CATT'TTAGA AATCACTTCC AAAATAATAT AATAGTGTTA TATCTTGGAA. TATAGTAATC CCTTATTGGA ACATTCACAT T'PTCTTATCT CTrTATCCT TGAAAGTGCT ATCTN'TACG AACTCCCCAT TACAACCTCG CTAAGTTGAA ATCTGAAATC TGATGGTATG TTTACACCCT
TATTAAATAC
ATTGATCAGG
TTGAAGGGAT
ATCTGTAATA
TTACACCTT
1344 GTAACAAACG 'ITGAAACTCT TT~AT1TACTT TTGGATAAAT A'rACAATATT TCrAATTrG'r TGTAATGAAT TTCCCGACTT TTTAATGCTA ACCAAAATT GAATTTTTTA GCAACCAAAT GTGAATAACC ATTAAGCCCT G'rATTTTGCA TGAAATTAT CTTCTrCrCT CTAGAAAATA 'rAGCATTTAA TAT TGAAGCG CTCAGGTGTA
AAGAAAGTTC
TAAAATCAAA
TTTTAAATCC
960 1020 1080 1140 CTrAGA'rTGG GTGATATTAG ACGGCAAAITT AcTAATXTTA TATrCTAA'rA ATAAATrATG CAAATAGTTT AAATACTTTT CGTAATTCAT CCA'rCATTrA AAGCCAAACA ATTTAAACCG T'rCAGTCCTC TATT'ITGTAA TTCCTTCACC GGCAATTTAT AGGACTTCAA GATAAAACCA CCAGTI'TCAG AATAATN' TAGAAAAATA ATTCTTTTAA CATCGTATPI' ATTrTTCATA AAAATTr'rAA ATTTCTATAC AACAA'rCCGA CACT CTT AAT ATATAAAAAA TGAATTAATC TTcACTTTCT ATATCATAGT AAGGAATTCT GTTTTATATA TTAACAArTA TGCGGATTGT CTATdtGCCC CAGAAGGCGA TGCAACGCTA TCATrTAAGT CACGGA'rTAG CAATGCI'CC ATrACrTT CAGGATTATC AAAAAATrC'r TTCAAGAAAG C1TTTCATTTG ACTACTCATT AATCAAAACT TACTTGTACA TTGGAAACAC ACACTTT'CAA AATCAAATTG CTAAAAATAA ATATATCTCA AAAAATTGTT TTGAAATTAG AACTAGTTAT AAAAACTTCG CCCATCATAA TAATTT7MrIG ATTTrTAAGT GACTATAATC GGATCAATCC CTCTAGCCAT C -rATGAACr GTATGCGTC'r
ATTAAGAAAT
TGATAATAAA
AATAATTNTA
TAAAAGAGAT
TCTAGTGATr 'rCGGCCACrC
GAGTAGTCTC
AGAAACCTCT
AT'ATCCCTA
AAATCTTGTC
TTTAAT
TTCTcrCTTC CCAACAACr'r
TCTATTTCAG
CTrCTCCATA TGTrGATAAT
TGCTATATCT
AAGTATTATA
CTGATAATTC
ATATAACCAK- TGTTCAT4CAC
ATTTTCTCGA
ATCTGACAAT
ATCCGGAATA
TTCCTTAAAA AGCTCACAAT ACAATTTGAA CATTTCACAT GACTAAGATT TCCTAATTAA ATTGAAAATT GA.AATTTTAT TAACAAAATG GCAAGTGCTA AAAGAGCATA ATCATCCATA CGACAATTCC AAATTT'rCTA CCATATTTCC TTGAAGTTCA 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2453 TAAAATTATC 1200 TT'rCATAGTC 1260 AATTTTAGA 1320 CAATGTAACT 1380 ATATAGCTCC TTTTCTATTA CTTTATTGG C'TCTATTCTA CGCTTTCATA TTGCTGCATG TTTTTAAAG CTAA'rTTAG ATrTAATTAC TAAATTAAAA TAGGT'rTCTG TACTTATAGG
AATATCTATT
TCCTATCTAT
TGAGTTCCTC
TAAGTAAAAC AAAAATTTTA AAATACCATT CGCAGGACCT CAGACAGTCC CGG INFORMATION FOR SEQ ID NO: 317: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 1049 base pairs TYPE: nucleic acid STRANOEDNESS: double TOPOLOGY: linear A Al 1345 (xi) SEQUENCE D3ESCRIPTION: SEQ ID NO: 317: CCAA'ITTGAA GGCTCTAAAA TCCAGCAgTA CTCAAAAACT ATTGCTTTTA ATCTTTCTTG GCATTCAAAG AGAAGCAC'rA CAACACGATT CTTATCATCG AAGGGCAACA GCCCAGACAA CAATGGAAAA GTGCTACACA GATIGTGACAG AATTTGCCAT TrACTrATCA CCAGTN' 'AG ATGGCTTTAA CAGCGAAAT'r TTCGCCTAAT TTAGAATAAG TACAAACAAT GTTGGAACAG TGAGAATACG ATTCTCCATA GTGACCAAGG GTTCCTAGAG AGTAAGGGAA TTCAAGCATC CGGCATGATG GAATCTrTCT TTGGCATTTT
CTIGGCAATAC
CATGTCACGC
GAAATCGGAG
ATGTTTTATG GTTATGAGAA GAACTTTAGA TCTTTAGAAA ACCTTGAACA AGCTATTGTG GACTACATTG ATTATTACAA, CAACAAGAGA ATTAAGGTAA AGCTAAAAGG ACTTAGCCCT 0 0. tO 0 0 0 0t 04 GTGCAATACA GAACTAAATC ATTTTTGGTA TATATAAAAT GGATGTAACT TACTATATTC GTTCTGT?'rC TCGAATAAAT AATAATCATC CACGATATAA ATTrTTTrTAG CATGTGAGCT CATCATAAT'r CACAAAAGGT TAATTGCCCG ATAAACATTT TATTT'ATTGC AGAGTCCTTA CTTCGGATAA ATTAATTGTC TAACTTTTGG TTGTAGGAGC TATATCTACA ATTTTATAT'r ACAATGTTAT CCAGTGT' TTCTCrAATA TCTTCAAAGT TrAACCCGTC AACTTGTTCC AATTCATCAG TTAAATTAGT AGTATAACTT TCATTTTTTA TATCATCAAG AGCTGTCCAT CTTGACTGCT TGATGATTAC TTTTTGCCCG
GGTGCAGTAC
CCCAGTTTAT
TTTAAGGAGT
TGAACAAGAA
TTATCGGCTA
TCTCCTTCAG
TCCGATTTTC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 CCTTTATTTG ATCTCTTAAT CTTGAAACTT CACATGTGGT AATTTTTrCC ATrTTTGTATT TTGAAAATAA ATCCTTTTTT *0 0 0 *000 0**0 *000 00.0 09 OS V 0 0 CTTCTTCTGA AAATAAATCC ATTTTCCGG INFORMATION FOR SEQ ID NO: 318: SEQUENCE CHARACTERISTICS: LENGTH: 776 base pairs TYPE: nucleic acid STRANOEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 318: TTAGTTGGTT AGAATCAGAA AATCGCCGAA GTGGTTATTT ATTTTTGAAT AAATTTAACG AACCAATTAC ACCAAGAGGA G'TrGCTCAAC AGTTAAAAAA TTATGCTGAT AAATACAAAA TGAATCCTAA AGTAATTTAC CC 'CATTCTT TTAGGCATTT ATTTGCTAAG AATTTTTTAG CGAAGTATAA TGATATTGCC CTCGAATTTA TCTAAGGAAA A'rTGGTAAAA AATAACAGGT 'rATrATGGGA ATA'rACCTAT AATACAGGTC TTTCTTACAA ATACGTGGTG GTAATATrAA CATACACAAT TCATCTCCTC rTATCAACCT' CrMAGAACA TrGCTGCAG
ACAGCTACTG
GGTCAAAT
GAA'IrGGGTT
GAAGGGCGAT
GCCTTTAGAA
TGAGCAAGTT
TATTGGAAAG
ATTTGATGGG
AACAACAAAA
ACTACCTGCT
GI'ATAAAAA
TTAAGCAI'A
TTTrCTCTGT
TATTTAAAAC
TTTGCAAGAA
ACACGAAAGT ATAGAAACTA TATTGTAGAT AAAATTGTTA ATT7MTTGA
TAAAAGATAT
ATAATAAAGG
TTATGGCTCT
r TTTTCAATA
TGTTAGAATT
TGGATAATGA TTACTACATr ATAATCAGCT AATAACACCT G~GGCTG GTGGATGTAT TTrCCAATTA ACACCATTCG AAArGTcTAT TATGTAACTT GTCCTCTCCG TTATTTTATA
TCGAGAAAGA
AAAGTGCAGA
AACAATTGAA
CTATGATGGT
GATGATGTCA
AGCAAT
INFORMATION FOR SEQ ID NO: 319: SEQUENCE CHARACTERISTICS: LENGTH: 658 base pairs 3) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 319: TGCAATGCGG CGGCTGCATA ACATCGCATC GAGCGAGCAC GGACGAAGAG CATCAGGGGC CCCGACGGCG AGGATCTCGT GAAAATGGCC GCTTTTCTGC CAGGACATAG CGTTGGCTAC CGCTTCCTCG TGCTTTACGG CTTCTTGACG AGTTCTTCTG AGCGGGCTTT TT PTTCCTGA CGTCGGCATC CAGGAAACCA GTGGGGAGGC ACGATGGCCG CGCTTGATCC GGCTACCTGC CCATTCGACC ACCAAGCGAA GTACTCGGAT GGAAGCCGGT CT-rGTCGATC TCGCGCCACC GAACTGT'rCG CCAGGCTCAA CGTGACCCAT GGCGATGCCT GCTTGCCGAA ATTCATCGAC TGTGGCCGGC TGGGTGTGGC CCGTGATATT GCTGAAGAGC TrGGCGGCGA TATCGCCGCT CCCGATTCGC AGCGCATCGC AGCGGGACTC TGGGGTTCGA TGTCGACAGC GGCTGGACGA CCTCGCGGAG TTCTACCGGC GCAGCGGCTA TCCGCGCATC CATGCCCCCG CTTTGGTCCC GGATCAATTC GCGCGACCGG
AGGATGATCT
GGCGCGCATG
TATCATGGTG
GGACCGCTAT
ATGGGCTGAC
CTTCTATCLCr
CCGCCTAATG
AGTGCAAATC
AACTGCAGGA
ATCGATCC
INFORMATION FOR SEQ ID NO: 320: (i) SEQUENCE CHARACTERISTICS: LENGTH: 1475 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 320: CCGGCTTAAT T'N'TAGAAAA AAAGAAAATT AATCGTTCCA CTCATTCTAC AATTGACTCG TAGCAGAAAA ACTACAGGTC TTATTGACGG TAAACCTTAT GTATTACTGA AGATATTTTA TGCATATTGC CACTGCTACA ATTACTTGCA GCTCCTTCCA ATAACGGAAC TCCGTTTGAG
CGTGGGCAGG
GAAAGAGATA.
AGGATTATTC
GCTTrTGCAGA
GTTCTGGAAC
CAAAAATCGA
AAAATCTTAC
ACGGAACCGG
TACTCGATTA
GAACCTTTGT
TCCGGGGACT
ACTTCAAATT
GTCCAGTTTA
AAACTTATAT
TTTACAATTA
GAGCTTCTTC
TATTGAAGT
GTCGTCATCG
TCTCTCTCGT GGCAGCTCAA GACAAAAATA TCTGAAGATG AGAATTTGCA AATGAATTT TAATATTTAC CGCCTGCGTA GAGTACCGAT GTTATTCCAG CATTGAAGGA AAGTTAGGAT TAGTTCAGAA AATGAGCAAC AGAACAAGTG GCTTATTTGG CTATGA'rTTA T'rTGAA'rTTA AA.AATGAAGC CAATCTI'FA GTCCCATCTA TGCTTAAATG ATTCTNT'FGC ATTACGACAT TCCTCCTAGG AGAAAATGTG CAGACTCTAG TTTAAGAAAA ATTTAAAACA GGGCAAGAAG GTT'rCTCTT'r TCTAAATAAG ATGGCTTTAA AAGAGTGATC GTTGTATCCA TCATGTTGAA AAA'FATCTTC GTATAGCTTA TAGAGTAGGT ACTGAAATTG TTCACCTGAT CTACTTCTTA TAGTTATTTA GTTTTAAATA GTGT'FTCKAA CATTCTTACA CTGACGAGAA GTTTTrGAGT CTTTTCTTGT AACACA'FATA GTATACTGTG GTTAGAATAG TAGACTGTGA CTTCTAACAA ATTGCTAGAA ATGAATTTCA ATCTCCCAAT TTA~rTGTTC ATATCTTCTT TTAATATATT AAATA.AATTC TAAATCATAA TCATr'rAAAA AAATTTTATT Tr=ATTTTT CATTACGAAT AATATAGATG AAGGCGAAAG AGTATGAAAA CAGAACTGTT TCTTTTGCTA TTAGTTCAAA AGGAGAAAAA ATGAAAGTAGAAAATATTTC GTATAGGGTG GATCATCGTA AATTGT'FTGA TAATATT'rCT TTTGATACTT CGAGTTCAGA CGTGACATTA ATTACTGGTA AAAATGGTAC AGGAAAGTCA ACTFTACTAT AGTAGATTGA AACTAGAATA GTACACATCT ACTTCTAAAA TATTGTTAGA AATCCATTTG ACTATCCTGA TCTAT'NTGTC CTG1-rCTTAT TTCATTTCAC TATATCTCAA ATTGAGTATG ACGAAGTGCG CTCCCATGTC CTGGGAACGC ACTTTCTTCA TATN'?I'CAT ATTC'IrGAAT CCATCGATAA ACACTATTGG GATGAATTTT TAAAGTTGAA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1475 CTAATCATTT TTACAGGATG AGA'PTTACAC CAGAG INFORMATION FOR SEQ ID NO: 321: 1348 SEQUENCE CHARACTERISTICS: LENGTH: 560 base pairs TYPE: nucleic acid STRANI)EDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 321: GAAATATATA TACTTCATCT TAATAGTGAG CAAGCTAAAC TTAGCATTTC ATGCCCTCAT ATGGGATGTT CTTTGACTAA ATAATATGAT TATCGAGATA TATC'TGGATA AATGAACTAA TAAGTCTGAC GCGTAGACTT ATCAAAGTCA TrGGCATACA CCACTATGAA CTCGTTGGTC TGTTCAAATC CCAACACATT ACCTGAGAAG AAAGTTGCAA TGT'rGTT'rTT GGTGCGGGTT TGAATTTAAA AAATTTGTTA TGTAGTACCT AATCTAAGGA ATTAGAACAA TGCCTCTAAT TTTTCTTTAA TACACTGAAA CATTGATGAT TCTGGCTGTA TTTTTGAAAC AGCTCTrCTT TGCTCCTGGA AAATATCTTC AGAAGTTATA T'rCTCTATTC CTAACGCTAC 
TTGAGTTTTT TTTCTAAAAT AT'rCTTTTCC GTTGCCATCT TTAGAAAAAT CATAACCTTC CCTATCTACG CTGTTACACA AATTAGCTAA AAAArACTCT GGGGTTGGGA AAGGAAGATA AGAAaCGTAT TTAGCCCATA ATCTATAAAG INFORMATION FOR SEQ ID NO: 322: Ci) SEQUENCE CHARACTERISTICS: A) LENGTH: 643 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 322:
.5 0 CCGCCCGGCC ACCGCTGCCT ATCCTCGGGA AGACACGGTG CGGTACGACC TCGTACTACT CGAACTGATC GGACATGGGT GCGAACACTT GCCTGAGGAA CTACGGGTGG TGGCTrTTCC TACACCCTCC GAAGAGATTG TACTTGTGGA TCGGGCGACT GGAGGAATGG CGCCTTTTTC CAAACCTGGC TCAAGATGTA CGCATCTCAA TACCGGAGTA CCCCCCGAAC GCACCGTTTG TTCAGGTCGA GCCTTTTTCA GGTCTGGCCT GAGGGTCACC TGGAGTGAAC CTAGAACGAT TTCGCCGACG GCCTCGTCCG TTGTCATCCA CAGAGAAAAA ATCGTTGGAC TGCGTGTCGG GAAGAACGGC TCCGGGTTTG ATGACGAGGG GAACGGCATT GTGAGGCACG CTGTCAGGGA CGGTTTGACC AA.AGTGGCAT CACATGGTGT GGCGGAAGGG GAATCGTCAC AGGA.AGGAGT GATAGAGCAT TTTTCTGCAG CGAACTACCA TGCCTGGGTA GGCAGCCGAG AAGAACTCTT 1.349 ATATCCCTTA ATGCCFI'TCA CCATGI'CAAT TGATATCTAC GAACTGGCCA GCTrAT'rGTG GCATTTAGAC GGTCAAACGG AACGAGCACG TAGGGTACTG TGC INFORMATION FOR SEQ ID NO: 323: Ci) SEQUENCE CHARACTERI STI CS: LENGTH: 780 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 323: GGTACCCACT CATTCTTGAT AGTCAAGAAG AGGAAAAAAA AACTTCTTGA GGCTGGTGTA CTAAGTACAT CTTTACTGAA AATACGCTGA CCAAGCATAC TGTTCGTTGG TACTAAGAAA GTCAATACTT CATCAACCAC AAAAACGTAT CGCTCGTTTG TTCTTCCTA.A GAAAGAAGTT TGGGCGGTAT CGAAGATATG AGAGCAAATC GCTGTTAAAG CACCAATACT GATCCAGATG TGCTGTTAAA 1'TGATCACAG CAAAAAGGAG AAATACTCAT GGCAGTAATT CACTTTGGTC ACCAAACTCG TCGCTGGAAT CGTAACGGAA TCCACGTTAT CGACTTGCAA GACTTCATGC GTGATGCAGC AGCTAACGAT CAAGCAGCTG ATGCAGTTGC TGAAGAAGCA CGTTGGTrGG GTGGAACTCT TACAAACTGG AAAGAAATFrA AACGTATGGA AGAAGATGGA GCACTTCTTA ACAAACAACG TGCGCGTCTTr CCTCGTATCC CAGATGTGAT GTACGTAtTG AAGCTAAAAA ATTGGGAATC CCAG'rTGTAG ATATCGATGT AATCATCCCA GCTAACGATG CTAAATTGGC TGACGCTATT ATCGAAGGAC
TCAATGAAAC
CCTAAGATGG
CAAACTGTAA
GCAGTTGTAT
GTACGTTCAG
GGAACAATCC
ACTTTCGAAG
GAAAAATTCT
ACCCACATAA
CGATGGTrGA
ACGCTATCCG
GTCAAGGTGT
GAATTGTGAA CAGTTGCCCT TGGGTCGTTT TGCGAGTTGA INFORMATION FOR SEQ ID NO: 324: SEQUENCE CHARACTERISTICS: LENGTH: 624 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 324: CGGGAAAAAT CAGATTGTGG GTTCAGATAT CGAATTAGCC AAGGCTATCG CAACAAAACT AGGTGTCGAA TTGGAACTAT CTCCCATGAG TT'mGATAAT GTACTGGCTA GTGTTCAATC AGGAAAAGCC GACCTrGCCA TGACTTTTCC ATTCCCTACT GACTACTrAT CAGTCTGTAA GATrCAAGAG ACGATGGCGA AAATGGGAAT TTAATCACAG ACCTGTTTCC AAGGGATTTG TGAAAAAGAG CAAGATGATT AGAGGCAGTT CGATAAAACC TTGAGGAAGC CTTATAAGCA 1350 TATCAGGTGT TrCAAGACA GATGAACGGA GCAAGGTGTT ATACTGCAAA AAATAAACTC ATTGTCAAAA AATCTGACTT ACGACTTGGC GCAGAAAAAG GTTGGAGCGC AGAAAGGTTC AAGATT-TGCT ACAAAATTCT TCCCTCGTAT CTCTGCCTAA ATrTAAAATC AGGACAAGTG GATGCCGTTA TCTrTGAAGA TGGAAAATAA TCCTGATTTA GCAATCGCAG ACCTCAATT CCTACGCGGT AGCCATgAAA AAAGATAGCA AGAAATTGA-A ATT-CAAAAGT TGAAGGAGTC TGGGGAATTA GACAAACTCA
TCCA
INFORMATION FOR SEQ ID NO: 325: SEQUENCE CHARACTERISTICS: LENGTH: 1237 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 325: TCTTATGAAG CCGA.AGCGTG ATTTATGGCG CTAGTGCCAT CAGCGTATAT TATGAGCAGG TTTATGAGGA GAGGCAGACT TGGTTTCTCT AGTCCTGCGA TCTTACTTGC ATTCCGGTAG GTCAATTTCG GAAAGTGTTG TGTACTCTGA CAGGAAAAAC AAGTTGCATA CATTTTA'rrT CAGGAATCTG ACGGTATTTT TGATTCAGTA .CAGCAACATT CCAAGTATTT GGAATCTnTG
TGCAACGGCT
ACGTATTGTA
TATTCATAAG
TCAGACTAAT
TTACTCCATG
TGCGACACTA
GCAGATTCAT
TCTTCGTGAG
TCGTGATGCC
GATAGGTTTG GTCTGCAGAA AGTGACAAAT GCTGAGAAAC AGTCCCGGGC TGAACTGACT TATCATCATA CCATTGATGC GACTACAGAG GAACTATTGT CTCGAAGAGG CTTTAGCTTC GAGTTGTTTC AAGATATATT TGAACCCTCG ATCACAAAGA ATGAACCTGT TTTGAATGGT ATTCAAATGA CCTTGGATTA TGATTTTAAT ACCCAGGCGG ATATGGTTAA AAAAATCCAG CCATTTCGCG AAGGAAACAC TCGGACGGTA T'N'GGTTTNG ATATTGATAA TACACCATTT T-rAGTGTT-AG ATAATGCAAA GATTTTACAG GAAAATCTCT TGCTCGGTGG TCAAAATGAT GACCTCGATC TT'TCATAATC CTAATACTGA TAAATATTCT CACAAGAAAA CGTATATCAT GTTGAAGAAA AGCTAAGTTC GAGAAAGGGC CGACGTCCTG AGTTTTTAAC AGCTTTTTTT TTGTCTTCAG AAAAAATGTA TCTAGATTA GTAAACATTG AATTTTAGGA AAAAATGAAG CAAAG'NrGG CTCTTTGTCA ATTGTAGTGG AAAT-rrCGGC C -rTCCTTTT TGATGTTCAG, CAAAGTTTCG AAAACCAAAG GCATTGCGCT CCAGTY-rGGC ATTAGAATAG TGTAGTTGAA GGAAGGGTT AAAGACAGTC TGAAAAATAG GTTCGAAAAA TTTCCCGGG TCCTTATTCT GATAGTGATG TATCAAGTCT TGTGAATAGC INFORMATION FOR SEQ ID NO: 32 SEQUENCE CHARACTERISTICS LENGTH: 461 base pa TYPE: nucleic acid STRANDEDNESS: doubl TOPOLOGY: linear AGCGATAAAA ATCCGGT Trr TrGAAGTT TGATAAGTTT GATGAGATTA TTGGGCGCTT GGGCGTrGAT AACCTITTCT TrATCTTTGA GATGAACCTG CTTAAGATTG TCCTCGATAA GAAAGTGAAA CAGCAAGAGT TTGAAGAGCC
TCAAAAG
960 1020 1080 1140 1200 1237 (xi) SEQUENCE DESCRIPTION: SEQ ID HO: 326: TTTGATTTTT CTGAATTAGA AGAGATTGAA ?TGCCTGCAT CTCTAGAATA TATTGGAACA AGTGCATTTT CTTTr'AGTCA AAAATTGAAA AAGc'rAACCT TTTCCTCAAG T'rCAAAATTA GAATTAATAT CACATGAGGC TTTTGCTAAT TTATCAAATT TAGAGAAACT AACATTACCA AAATCGGTTA AAACATTAGG AAG'rAATCTA T'rTAGACTCA CTACTAGCTr AAAACkTGTT GATGTTGAAG AAGGAAATGA ATCGTTTGCC TCAGTTGATG GTGT TTGTT TTCAAAAGAT AAAACCCAAT TAATTTATTA TCCAAGTcAA AAAAATGACG AAAGTTATAA AACGCCTAAG GAGACAAAAG AACTTGCATC ATATTCGTTT AATAAAAATI' CTTACTTGAA AAAACTCGAA TTGAATGAAG GTTTAGAAAA AATCGGTACT TTTGCATTTG C INFORMATION FOR SEQ ID NO: 327: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 1436 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 327: TAACATTTAG GTACCTCTTC TTAACAAAGT TCAATAGTAA CAATTAATAT TTTAAACAAT ATATCAAACA TCAATGACTA GAATACTrGC ATCATCCTTC TTTCCATAGA TTGGATCAAT AGCAGAAGAA TTAAATCTCA TCTTAATTAA. CTCTTCAAAA GTTTTATI'?1 GATTATTTTG ATAGAArrCA TAAAAGCCAT CGCTCATTAA AflAATAATA
TGATTACAT
ACCAArrGAA AGTTArrTCC TTTrrTCTGAA
GTACACCTCT
CTATTATTAT
TTTAACCTCG
AACCGACAAA
TTTAGAATTA
GCATGGTCTA. AAAATCTCTC AATTTCTGA rl-rrrrGTAA TCTTTTATCT CArP'PrrCCC ATTTCCTTTA CTAACATGAC AGTITAGCAA ATAflGGTAA ACTTCTAAAA CATTGTTAGA TTCATNTTAC TATACTCTGT AGTAAAGCAG TTTCAAATAT CTTTCTGATA AAATATGTAC TATAAATCTG TAGCTCCACC 1352
AACATTTGT
ATCCAACGAA
AATAAr'rTTT TrrrTCAA.AT
GCAGTCACCT
GCGATAATAT
AATCGATTTG
TAATTTATAT
TTGTrTAAGA
AACTTCTGAG
AATAATCCAA
TCACTAGTAA
CCTATCCAGT
TTArrrAAAA
AAGTTATCTA
AGCATCATAT
CATCTATTTG
ACCCACTCGG
CACTATTTGT
CTCTATGATC
ACTCCAACT
TACAATATCC TCATTTTCTA CGGAACTTCC TTGTATCGAA CATAATACTr CAACCCTTTT AGTGTCAAXAA GTAAACCAAT GTrCCATAAT CTTATTCGAA CCAGTCTrrG GTAATPI'TTG TAGATTTATT AATATGATrT TCAGTrTCTC TGCCATCTCC CTTCTGTCTT ATrATCTTGT T'rATTGTCGA TCTTGTCATT TTGAGrrAAA CTCTCCGTTC TrCTG GTT AC TATCAATTAC rrrCCrCTT'r GTTrrTTTTCT TTTrCGTTTT TATCACTTAA AAAGCCCATT C'TCCGTTACA ATATTGAAAT TACCATCGCT TCCCATTT'GC ATTAGATTTG ATGAATGATA TATACTTACC TATTTAAAAC GGTTATTTTA CCCTTTGAAT CCTCAATAAC AGTATATT4GA AACTAGAATA ACTGTCCTGA TTGATTTGrC GAGTTTAAAC CGATTrCATC GTTTrrGATT CTrTACAAT'r ACTGAATAAC CTATCTCC'rC AAATACTGAT rrrGTGAACC CAAATTTTAT TTATCTTTAC TCCTGTCACT GTTAAGAATA T'rrkACATCT ACTATYTCTT AACTATTTTA TAGTTI'ACTT CATTTGTCTA TTATCTTTAC ATTATTTGAA TTAGATTGTT ATTATTTGTT ACAATTI'TGT ATCACGTATA ACAGGTTCTT GGATAAATTA TAAAATTGGT AATCCTTCT TTACCC 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1436 INFORMATION FOR SEQ ID NO: 328: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 646 base pairs TYPE: nucleic acid CC) STRANDEDNESS: double CD) TOPOLOGY: linear Cxi) SEQUENCE DESCRIPTION: SEQ ID NO: 328: CCGGCAGACA GGAGAAGGTG ?1'AAATATCA ATCTCAAATG GTTCGTCAAT GGITTCTGAT ACGTATTTTC CGTCTTTCI' CCGTTGCTrG ACACACTCTG TGAGGAGATA TTCGATTTGC CCATTGACTG AACGAAAGTC GTCTTCTGCC CATGATGCGA GTGCAGCGTA TAACTNTGTT 1353 GAGAGTCGAA GGGGGATCTG CTrTTT=A GCTTCAGCCA TCTTTAGTAA TGTTGACAAT T13GTTGTGCA TCATGATrGC CACAAAGAAC GACAAGGAGA TGGCAGCTrT TCGTTCTTCG TCAAGrrCTA CCAATTCCCC TTCATI'GAGC
AGGCTTCCTG
TTGAAACCA
CGTTCTAGTG
CCATTCAAC CATrCCTACA GCACCATCTA CAATCATCTT CCGTGCATCA ATAATGGCAG ATGCT*MTTG GCGTrGAAGC ATAACGGCAG CAATTTCTGG AGCATAAGCT AGGTAAGTGA TACGTGCTTC AAGGATTTCC AAGCCAGCAT CCTCAACACG ACTTrGGATT TCTrCACGAA.
TACGGGTAGC AACAATTTCC CTAGAGCCAC GGAGACTACC 'rTCATCTGCG TGCCCATCAC CCGGAGTATC CACA'rrAGGA GACACATCCT AAGGATAGAT GCGGAC INFORMATION FOR SEQ ID NO: 329: SEQUENCE CHARACTERISTICS: LENGTH: 1653 base pairs TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 329: GTTGCAGGTG CAGTAGGTGT TACTTCAGAT ACATT'rGAAC GTGCAGAGGC TCTTTTTGAG
GCAGGAGCGG
AAAATTGCCG
ACTGCTGAAG
GGACCAGGTT
GCTATCTACG
GGGATCAAGT
CTTGGATCTA
CGTAAATTCA
GACCGTTATT
GGTCGTGTTG
CGCTCTGGTA
'rTTATTGAAA
AATGAGGCAC
ATGCGATTGT TATTGATACT AGATTCGTGC TCATTTCCCA GTGCACGTGC CCTTTATGAA CTATCTGTAC TACTCGTGTG ATGCTGCAGC TGTTGCGCGC ATTCTGGAGA TATTGTAAAA
GCACATGGTC
GATCGGACTT
GCGGGTGTAG
AITTGCTGGTG
GAATATGGTA
GCACTTGCTG
ATTCTGCAGG TGTCTTGCGT TGATTGCTGG AAATATTGCT ACGTTGTTAA GGTTGGTATT TTGGTGITrCC GCAAGTAACA AAACGATTAT TGCTGACGGT CAGGTGGAAA TGCTGTTATG AAACTGAAAT CTTCCAAGGA CTATGAAGAA AGGT TCAAGC TGTTTGCTGG AACTGATGAA GCTCCAGGCG AGACTTACCG TGGTATGGGA TCAATTGCTC 'rCCAAGGTTC TGTCAATGAA GCAAACAAGC TTGTTCCAGA AGGAATTGAA CTTATAAAGG AGCGGCAGCT GATATTGTTT TCCAAATGAT TGGTGGTATT TGGGTrACTG TGGTGCAGCT AACCTTAAAG AACTACACGA TGTCTGGTGC TGG?1'TGAAA GAAAGCCATC CTCATGATGT CAAATTATTC TATGTAAAAA ACAATGAAAA GAACI'CCAGT
TAATGCTCAA
GCAAATTACT
GAAAACAGGA.
GTTCTT'N'AC AATGTTGTCA ATTTCCATTT ACAGCAGCTT TACCATCCTG AATACTGAAG 1354 ATACTrAGAT 7rrCTGGCAG ATTTGAAGA TGGTCTAAGC TI-rGTTGT GATAAAGGTT 960 'rGGATTGATT GAGAAATCGT TTCTAATAAT TTTAACTGTC TAGTGTrGTC AAGTTCACTC 1020 ATCACATCGT CAAGCAGTAA TATAGGAGAT TCTGTGGTAA TGCII'?CCAT TAATrCGATT 1080 TCTGCTA.ATT TrATCGAGAG GACGAGACTA CGATGTTGAC CTTGGC TTCC GAAACTAGCA 1140 TCCATCCCAT TTATATAAAA AGAAATGTCA TCTCGATGAG GACCGACACC AGTATTCTTT 1200 TTAAATAAAT CTCTGGATCT ACITTrTCT AAAGCAATTT TGAAAGA'rTC GGATAAGTTT 1260 TGTTTGTCAG TTATATTGAC AGAAGATTGA TAGGATATTG ACAACTCTTC GATCTGATTA 1320 GAGAGTrCAA AATGT'rTCTT ACGCCCAAAT GATTCTAGTT TT?1TATGAA ATCTAAGCGG 1.380 TGATTCATTA CACGACATCC ATAATCAACT AGCTGATCAT CTAACACAGA AAGGAATGTT 1440 TCATCTATrT TTTGAGCTGA TTTTAGGTAA GTGTTTCTTT GCTTTAGGAT GTGGT'rATAA 1500 TTG4GTTAAGT CAGATAAATA GATTGGCTrA ATTTGCCCAA GTTCCATATC AATGAATTTT 1560 CGTCGAATCG AAGGTGCTCC ?I'TAATTAGT TGTAAATCTT CAGGAGCAAA 'rAAGACAACA 1620 .*TTCATGTGTC CTACATAATC 'rGAAAGGCGT GCC 1653 INFORMATION FOR SEQ I0 NO: 330: SEQUENCE CHARACTERISTICS: LENGTH: 1340 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 330: GAAACACTGT AT1TTCAAAGC ATTTTTTGTT AGTTTAAAAT TACTCCCATT CTTCTTTTCC AAACGTACAA TATATCCAAA ACCATTCAAA ATACTAGATT CrATTTTTTA TAATATCACT 120 AAATCCACCT AATTATAGGA CGTTTCAGA TTTI-rAGTCC CAGTCCCAGT ACCGGAGAAA 180 *TATrGTTIA ATATAATATC TC7'rTTTGTC TTCTAAGCTC TTAAAAGCAA AAGAACAAfGT 240 AAAGAGTCAA GACAAGGATA AAAAGTCCAT ATTAGGGCAA ATAAAAAGCT TTAAGACAGA 300 TGACAAATCT AAGTCAAATA AGAAAGACCA TAGCAAAGGT GCAGAGAGAT AAATATTGGC 360 GGTCTTCGGA CTGCCTTTAT TTT TTTATCC ATTTTTCAAA TCAAATTTAT TCAGACTATA 420 ***TATGCACATA TACACTTAAA TTCATATAAA AACATGGCTT GTAAAAAATT ACTTTAATCA 480 CAATAATCGC ATTTAAAATT GTGATGTTTG CAAGCTAAAT TACGGACTTC ACTTGGAAGT 540 TTTCCC'TTGT ATCTTTTATA ATAGATAGAA AATTTGCTGG CAGATGAATA TCCAACAGAT 600 TCTGCTATCT CNTATAGG TAGTTCAGTG TTAAAAGA.A GAGTCAGC TACATTCATT 660 CTT-rrCTr- GAGTGTACTC AA'rrTAGTAC CACTCAT AAATGATCAT CTAAGAATCT 
TrATM-I-1-I- TTCTATTTAA GCTTTAAGA AAAAATCAGC ACTCTTTCTA AGGCCTrTTGT GATTCTGGAT TAATGTTAAT ATGGAAACAG CAAGATAACA TNATCAAAAT CAAAAGTACA TCTCCGTTTG CACTGACAAT GTGCTATTTT GAACTACTTC TGTAATGCTr
AGATNTN'
TGCTACATCT
GTATGTGTCA
GGCAGGAGCG
AAGTAT'rATT
AGATGCTAAA
ACAATTCTCG
TAGAGAGTTT
GTAACTTGAA
CTCTTTGATA
1355 TGACAATATT TTT CCTTAAA TAAATTTTTr TCAAGCGTGC CTTGAT -rAC ATTCGTrGCA TCAAGTGCTT TATCA'PCATC ATTACTATAC TTATCCATTC TCCATCTTAC AATTTrAATAT
TGATTCGGTT
TGTT'rCTA
GAAGCAAGGT
TTAGCTC'N'T
AATTTCAATT
ATTTGCCTTT
T'rCCATTGCC
TGAATAAAAA
TTrAAAACCm
ATCTTNTATA
ATTAAACTTT
CATACTATAA
AATGAAGATG
TCTAATAAAA AAACAAAATT GCGGTAATAG TTTGATACGG TAAATTGAAA CATAGTCTGA TAAAAATCAT CTATATCGAT CCTCCTrCAT AAAACCGGTA 1340
INFORMATION FOR SEQ ID NO: 331:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 607 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 331:
TATGTTCGTG
GTAAAAGAAA
GAAATTGAGA
TTTGCACATA
AGTGATTTAC
GAATTTTTAG
ATTCAAAATG
TATTGTCAAC
TCCA'PTTTGG
ATGGATTGAA
ATGAGTTTTT AAGTAGGAAA CTCTTTTTTC ACCCGTAGTA AAAAACAATT GCTAGCAAGT AAGAATTGGA TAAATTGTTT GAAATCGTAT TTTAGCTGAA CCAATGATCG AATAGATTTC TATTAGAATC ATTTGGCTT AACGTGCTAA CCTCTCAGAT TTTGGAACTT GTTGATAATG GGTTTGATCC GGCCTTATTT TTAGCAGCTG ATATGGATGA TTCTTTTTAT TTTCATGATG AACGTCTTCA ATTGGAATAT ACTCCACAAA GTTCTTATTC TTGTTTCCAA TTTTTCCTAG GTGA~rTTAA TGAGGTTGAA AAAGGTCGAA AAGGAGATGT GAAGGTTCAG GAAGGTATGG TTCGGAAAAA TGTGGGACAA TCTAAATATG GTGATGAGCA ACATTTACCC TTTGCTCACT CTAAGCTCTT TACAAATGTC
CTTATTCTAA
AATTAGGTTA
TGAATGGI'T
TATCCTTCAG
TCATTACTGT
ACTTGGTGGA
CGGGAAA
INFORMATION FOR SEQ ID NO: 332:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 900 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 332:
TTAAAATACC GAATTTTGTT TTGTCCTCTA TTTCAACATT GTGAATCGCC TCAGGCAGAG AACCGATACT AAAGATATAA CCAAAATAGT TGGTTAAATC AAAATCCAGT TCGTCAATTG GTTTTGTAAT GAGGTTACCC GTACCGCCTG
TGTCATTTGC
CGCCATCGAT
TI'TACCGATA TCAATCTAT GTCTTGATTG ATTTCCAAAA
TCTCATCAAT
CATCATACTG
TAATTTTGGT
AATCCAAAGC
ATTTTATCCT
CCTATTATAC
'PTATTCTATT
CCTTATCCTT
CTGCGAGAGT
CTCTACTTTC
AACGATAGTT
ACCTGAAATG ACTrCATTGA CTCACGAGAA GCTTCTTCAG TTCAACATAT TCAAAGTATT CTrCTCGCCA CCAGAAGTAG TAATTTTAAA CAGAAATGTT AATGAAAATA CAGAAAAGAG TTCCCATCGC CTAACTACAT GATCCAATCA GGAATACCGT ATCACTGTCG CCACCAAGTG AAGAAAGGCG ATAATGGCTT AGGACGGATT TCATCTAAAG GGATAATCCC TAACTTAGGA ATGTAGTCTC CAG'IrCCATC TCCACCAAAC ACAACCACTG CAAAATGTGT TGCATCCAGC GCTTTTTCGG CTTTTGCTTT AT'rCTCCAGC TTCTT'rGT GGTTGATAAT TACCATTGCT TrTTTCATTG TACA'N-rCGT CGTATGCAAG TAAATGTAAT AAATCTGACG TACTGGAGAT TAATACGCTT CCTT'rAAGGG TTCATCCAAG TAAGAATAGG AAGCTGCCTC TGCTAWGCTA CAAGTGATTG AGATGGCATT TCTTATCGCA TCTTCGAAGT GAGGGACAGT TTCCTGACAT GTTTCGTTAA TTTGAGATAG ATTGTAATCG TAT'rCTTTTT C. *C INFORMATION FOR SEQ ID NO: 333: SEQUENCE CHARACTERISTICS: LENGTH: 533 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 333: CCTTTCTGGC ACACTGGTCT TGGAATACGG CAAAACCTCT GAAAATATCT ATGCTGGAAT GGACGAGGAA TACCGTCG?1' ATCAGCCTGC CATCATCACT TGGTACGAAA CAGCCAAACA TGCTN'TGAT CGCCOACAGA TTGGCAAAAT ATGGGTGGAA TCGAAAACGA CCTCAAGGGC 0
5.59 1357 GGTCTCTACA GCTTTAAATC CAAGTrCA.AT CCGACCATTG AGGAATTCGC TGGTGAGTC AACCTGCCAA CTAAT1CCTCT TTACCACCTC TCCAATCTGG CCTACACTCT CAGAAAGAAA CTGCGCAGa.A GCATrAACAG AAAGGAAGCC TATGACCTTT AAACTrCTCA GCCAAGAAGA ATTCA'rCCAG CATACCTCAG CTAGATCCCA ACGCTCN'r? ATGCAGACCG TAGAAATGGC AGAGCTGCTG AGCA.AGCGTG GCTrCAGTAC CCAGTATGTC GGc'rACACTG ACCCACAAGG GAAGGTAGTG GTGTCAGCTG TCCTCTACAG CATGCCTATG ACTGGTGGCC 'rrC INFORMATION FOR SEQ ID NO: 334: SEQUENCE CHARACTERISTICS: LENGTH: 544 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 334: CCAGCAAACT AGGAAGCI'AG CCGTAGTTGC TCAAAGCACA GCTTTGAGGT TGTAGATAAG ACTGACGAAG TCATGTACAA AACACTGT'rT TGAGGTTGCA GATAGAACTG ACGAAGTCAC TCAAAACACT GTTTTGAGGT TGCAGATAGA ACTGACGAAG TCAC'rCAAAA CACTGT'N'TG AGGTTGCAGA TAGAACTGAC GAAGTCAnnA ACCACACCTA CGGCA.AAGTG AATCTGAAGT GGTTTGAAGA GAGTACAACT TGTCTTTTAG AAAAGGAGCC TATAATGAAA GTCTTrCAGC ATGTAAATAT CGTGACTTGT GATCAAGATT TCCATG N'TA TCTTGATGGA ATCTTAGCAG TCAAGGATTC TCAAATCGTC TATGTCGGTC AAGATAAGCC AGCGTTT'rTA GAGCAAGCTG AGCAGATTAT AGACTATCAG GGAGCTTGGA TTATGCCrGG TTTGGTCAAT TGTCACACCC ATTCrGCAAT GACAGGTCTG AGAGGGATCC GAGATGACAG CAATCTCCAT GAATGGCTCA
ATGA
INFORMATION FOR SEQ ID NO: 335: SEQUENCE CHARACTERISTICS: LENGTH: 349 base pairs TYPE: nucleic acid C) STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 335: CCAGGAACTC AAATGTAAGT AGGGGTrCCT TTTTTGTATA TwrTTCAAAT AACGCCTCTA 1358 CACTATrGT AGCAAATTCA CCAACTACAG TTGTATCTTA GTrAAAATAA GTTAGAATAT GTAAGTGAGT ACCAGATATA CCAAGACATC GTCACCATCT AAGGTATATT CAAAATACAA AAGTTGACCA ACTAGATrTC TGAATATCCT TATATATCCA TTCTrAAA.AT TGGTTTAAAT AGCGTAGTCT TTTAAACTAG ?TTGAGAAT CCAAAAAATC TTCCTACATA TGTAAGAAGA TTTrTTTAG'rT CAGAATGATT AGaTTTAGCT AATGGATACC TATCCTACC INFORMATION FOR SEQ ID NO: 336: SEQUENCE CHARACTERISTICS: LENGTH: 1206 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 336: CTCCGATAAC CACACCAGCA ATGGAAATAA TTCCATCGTT AGCATCAAGA ACACCCGCAC GCAGGATATTr TAAACGACCT GCAAAATTTG AATCAATTrC GTGATTTGTT TCTGACGCTA AATTTCAAGT TCAAGTTAGC TTTTTAGGAT AGTTGTTAAT TTGGTTCCCT TAGGTAACCA ATCCTCfTAGA G'rCGACCTGC GATGACTGGG GAAAACCCTG CGCCAGCTGG CGTAATAGCG CCTGAATGGC GAATGGGGCC ACACCGCATA TGGTGCACTC CCGACACCCG CCAACACCCG TTACAGACAA GCTGTGACCG ACCGAAACGC GCGAAACGAA GATAAGGATG GTTrCTTAGA TATTTGTTTA TTTGTCTAAA ATAAATGCGT CAATAATATT CCTTATACCC TTTTTTGCGG TGAAAGTTTA AGATGCTGAA ATCTCCAnCA GCAGTTAAGA CATCAAGAAG TCTTCTCTGG GTGACTTGTA GTCCAAGCAT CCACTTTTCG ATGAATGCGA CTTCTTTGGG AGTCATTTTC TCTACGAATG AGCCTG'rTGT GATTCTCATT AGTTCCCGGG AGGCATGCAA G CTT GGCACT GGCCGTCGTT 'ITACAACGTC GCGTTACCCA ACTrAATCGC CTTGCAGCAC ATCCCCCTTT AAGAGGCCCG CACCGATCGC CCTT-CCCAAC AGTTGCGCAG TGATGCGGTA TTTTCTCCTT ACGCATCTGT GCGGTATTTC TCAGTACAAT CTGCTCTGAT GCCGCATAGT TAAGCCAGCC CTGACGCGCC CTCACGGGCT TGTCTGCTCC CGGCATCCGC TCTCCGGGAG CTGCATGTG'r CAGAAGTTTT CACCGTCATC AGGGCCTCGT. 
GATACGCCTA TTTTTATAGG TTAATGTCAT CGTCAAGTGG CACTTATCGG GGAAATGTGC GCCGAGACCC TACATTCAAA TATGTATCCG CTCGTGAGAA AATAAACCTG GAAAAATGAA GAGTATGAGT ATTCTACATT TCCGTGTCGC CATGTTGCCT TCCTGT'rT GCTCACCCAG AAAACGC'rGG AAATCATTTG GGTGCACAAC TGGGGTTACA TCCAACTGGA TCCTCTGACA GTTGTACACG CCGCAAGAAC TAT'rCCCGAT 180 240 300 360 420 480 54 0 600 660 720 780 840 900 960 1020 1080 1140 1359 GAATGAGCAA CTTrAAAAG TCCTGCGAAT GrrGGGGCGG TAATAATCCC CGTGTTGTAG
GCCCGG
INFORMATION FOR SEQ ID NO: 337: SEQUENCE CHARACTERISTICS: LENGTH: 813 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 337: CTGCTCAACT CAGACAGTCA GAAAAAGGCC ATCAGGTTAT TACCGTGAAC TGGGCTrAGA CCAGACCAAG TTTGGGATTT ATGGTTCAAC GT'rCAGAGGA TTCTACATTA CAAATCCCAA TTCCAGCCTO AAGATCAATT TTGCAGACTA GAGCCAGTGA CATCAACT1TT1 CTATCAATAC GGTGTCAACA AGGCCTI'GC GATTTGATTG CCTTTGGAGA GGTTATGCCA TGAAAAATGC CTTACCAACG ACCAAGATGG AA'rrTCTGAC TTTACCAAAA GAACCATCAA AAAAGTTGCT TATTACGACA GGTCGCCCTT ACCGTATGTC AAAAGATTTT CACTCCTATG ATTAACTTCA ACGGATCCCT TACTCATTTA TGAAAAGTGT TTGACTGTAG ACAAAAAATA TCTGCTAGAT CATTCAAGCC GAT-rTTATCG CTGGAGAATA TCGTAAAAAA TGAAGAAATT GCCAATCCCA AACTATTTGG TGTAGAAGCT CCAGCCTGAA TTGGTGACCA AGGACCCTAA CTGTATCCTC CAAATATTCC TTGGCAAAAG AAATGAACGC CTTCTACCAG CTGGGGAGGT CCGCTCAATA TCCTTGAATG TACCCCAAAA TTTGGACTAC TTGCTCAAGA TAATGAATCG TGACAAAAAA TGAACACAAT GATACCGAAA TGCTCGCTTT TGCTGGGAAG CAATCCAGAG CTACTCC CTT ATGCAGATGA GCAAATTTCC GGTTGCCAAA ACCCTACAAG ACTTATTCTT ATAACCTATA CTGATACTCA ATGAGGGGCA AAGAGCGAAC TTA INFORMATION FOR SEQ ID NO: 338: SEQUENCE CHARACTERISTICS: LENGTH: 683 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: li near (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 338: CCTAGATAAA TGATATAATT CTATTATTGT TCGTAAAAAT TAAAAGGAGA TTGATGATGG 813 1360 ACAAATTATT TAAACTAAAA GAGAACGGTA CAGACGTTCG TACAGAGGTT CTCGCTGGTT 120 TAACAACTTT C7"TGCAATG AGCTATA'rTC TCTTTGTAA.A CCCACAAATA CTTTCACAAA 180 CAGGAATGCC TGCTCAGGGC GTCTrCCTAG CGACGATTAT TGGTGCAGTA GCGGGTACCT 240 TGATGATGGC TTTTTATGCT AACTrACCTT ATGCCCAAGC GCCAGGTATG GGACTCAATG 300 CC1-rCTTrAC CTTTACAGTT GTATTCGGGC TTGGTTATTC TrGGCAAGAA GCCCTAGCTA 360 TGGTCTrCAT CTGTPGGGATT ATTTrCATTGA. 
TrATTACCTT GACAAATGTr CGTAAAATGA 420 TCATTGAATC GATTCCCAAT GCTCTTCGCT CAGCTATTTC AGCTGGTATC GGTGTCTTCC 480 'rTGCCTATGT AGGGATTAAG AATGCTGGAC TTTrGAAATT CACGATTGAT CCAGGCAACT 540 ATACTGTTGT AGGAGAAGGG GCTGACAAAG CTCAAGCAAC GATTGCAGCA AACTCTTCAG 600 CAGTTCCAGG ATTGGTCAGC TTTAATAATC CAGCTGTTTT AGTGGCTCTT GCAGGACTTG 660 CCATTACTAT CTTCTTTGTC ATC 683 INFORMATION FOR SEQ ID NO: 339: Ci) SEQUENCE CHARACTERISTICS: C A) LENGTH: 852 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear xi) SEQUENCE DESCRIPTION: SEQ ID NO: 339: CTACTTTACA TGGAAGTAGT CACTGAATTC CAGTTAGAAA 'rTACTorGMTA ACTACGTTTT GAGGAGGAGT AAAATGCTTT CCTACGTTCG ATATTACCCA CTAGCGATAG CTAAATTAAT 120 GTGTCTGTGC TCTCCTAAAA TCTGCTGATT TATTACTGAC TAATACAGGA GGTTI=TTT 180 ATGgACAGAC AATCATATCT GCTATTGGTG TTTATATTTC CACCAGTATC GATTATTTAA 240 TTATTTTAAT TATT-rTATT'r GCACAGCTAT CACAGAATAA ACAGAAATGG CATATTTATG 300 CGGGGCAATA TCTAGGCACA GGCTTACTTG TAGGGGCGAG Tq-rAGTTGCT GCTTATGTCG 360 *TTAATTTCGT GCCTGAAGAA TGGATGGTTG GATTGCTTGG TTTAATCCCT ATCTATTTAG 420 GGATTCGCTT TGCAATTGTT GGAGAAGATG CGGAAGAAGA AGAGGAAGAA ATTATTGAAA 480 GATTAGAACA AAGCAAGGCA AATCAACTGT TTTGGACAGT TACATTGCTG ACAATrGCGT 540 CTGGCGGAGA TAATTTAGGT ATCTATATAC CTTATTTTGC TTCGTTAGAT TGGTCACAGA 600 *CCCTCGTGGC CTTGCTTGTG TTTGTAATCG GCATAATTAT CT TTTGCGAG ATTAGTCGGG 660 TGTTATCCTC TATTCCGTTA ATATTCGAGA CAATTGAAAA ATACGAGCGA ATCAT-TGTGC 720 CCTTAGTATT CA'rrCTACT'r GGACTATACA TCATGTATGA AAATGGCACG ATAGAGACTT 780 TrCTGATCGT GTAGA'rTTTT TTGTTrCACT AGGGATTTAG CCCGAGCTCA AATCAGCTCT CTGATTTTCA GA INFORMATION FOR SEQ, ID NO: 340: SEQUENCE CHARACTERISTICS: LENGTH: 754 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 340: CCGCACAAAA GCGCATAGTA TCAAGATTCT GATAA.ATAGT TAGTCT=rT TAAAGACCGG CACCGCGCCT AGGATAACAA TTTTAGCAAT AAGAACGGAA CCTAA.AATTC GGACATCCAC AGAGAACAGA GTTAGTAAAC CTAAAATCAC ATAGCTAATT TTCCTGTTAG ATAGATTGG AAAGAGGAGG GCGTAA.ATCA GAGGACCTGC A'rCTTTTGTC CAATAATGAG CAAGTAAAGC CAAGGCAAAC 
GCAAAGAGGA GCTGCAACCA GAAATAAGTC CACGACTCTT TTCGACGCCA TTCATAGATT TTGAAAAACT CCATAACGCC GAAGGTTGCG TCAAGACTTC TCTGGCTATT CAGACGTCTT TCATAAAGCC CAAAAATTCT
CAAATGATGG
TAAGAGAACA
AAGAAAATAA
CAACCCTTGT
CAAAATCATC
AAACTCACTA
TAAGCCTTGT
GATAAAACAG
TTTTCCACAC
CCCA
ACA'rAGTA.AT TGAGATAACT AAGGCACTGC CTGGTAGGGT TAAAGCATGA CCAAGATAGC AAAGCCTGAT AGATALATGCC TGACCAAATA AGATCAAAAA GGAGACTTrAG CATCTGATGG TAAAAGCTTT TTGCAAGAAA AAAAAC'rCAA TAAACCTGTT CTTCATAGAG GC'ITGGGGGG ATAAAGCCTT GATACTATGC CT'rrTTAATG ATCTTTCAAA CTCTGCATAC TGGCATTGAT CAAGATAAAC CAAAACATCA TAACAACAAG INFORMATION FOR SEQ ID NO: 341: SEQUENCE CHARACTERISTICS: LENGTH: 707 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 341: GGGGATAACT CTAGGAGTAC CGCTATTACT CGACTTAATG AGTGCACAAG AAGTCAGGAT TTTATGCAG GTTGGGCGCT TCATCAGACA GGGAAGATTT ACAGCGACTA TTATGGAAGT 1362 CAAGGTTTGC rTATTATTT GCTGACTTAC GTGAGTCAGG GCGGATTTTT TTTGAGTGGT TAGCCTTGGT AGCAGGAGGA TI'rrCCTTr TTAGATCAGC ACAGAGCAAG GAGACCAAGC TGGACATCTG GTGACTATTI' TTTACATGCT CTGCT~wrTG GTGGAGGCTA TGCGACTCTT TTAGCGCTTC CTTTCTTATT AGITAGTTG CGGCTTACCT AAGCAATCCA AGCCATGATA AGGGATTTGT
CTTTGCCATC
GGACACCTTG
AGTTACAGGT
CGCAGCCTTT
ACGGATTGGG
TATTGCTGTA
TGGGTTTTAT
CTACTATAGT
CTAGCTTTGG
GTGAGTTTAG
CAGTTTCTTG
CAGGCGGATT TTTCTrTTGCT CCCTTATCAT CGCTCCTGTT GCTT GTTGGT CTTTAACCTr GOCATAGAC GCTTTGCGCA CAGTGGCTTT AGGTTITTTCA CTTGTCTTTT ATCCAACTGC GCTGCAACAG GAAGTTGG GGATGCGWTT AGTGGTATTC GTTATCCTAT TGACACTAT CGCTTTGATT TTACTTCTAA AATTTTAGAG AATATGTTTT TTTAAGG INFORMATION FOR SEQ ID NO: 342: SEQUENCE CHARACTERISTICS: LENGTH: 762 base pairs TYPE: nucleic acid STRANDEDNESS: double (0 TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 342: 0 0.00* 000.
GGATTTTGAA AAACCATACC GATTTGACGA CGTATATTCC CGTTGGCCAT CAATTACAAT CTCTCCGGAT TCTGCTTCCA ACCGTCGTTG ATTTACCACT ACCATTATGC CCTACAATCG ACGTGAAAgT AATATCCTTrC ACATCGTAGT AGTTCTGATT GATTTTTTAC ATCAATTATT GATTTCATTT CGAACCAAAT ACTACCCTTG AAATAGTCAT AGCCAGAGTA GATAGTCAAA AACTTGACCA AGCAAAGTCC AATIGTAATAG CAAGAAAATA AGTTTTAATT TTTCCAGGCA TTGCTGCTGC TAAAATTGTT AAGCCTTAAA CCTGTCACAG CTAACTCACG ACAGATAATC AAACATTTTC CTCAGTCAAA GTAAGCCATC AATTAATCGA AAAGCCATTC TCCACGTTTC 'rrCTTTATAG CGAAAAGAAA GTCCCTTTAA ATACATAGGC AATAAGGCTA CATAAAGTAG ATGGCAAACA TCTGACTAAA CCACCAGTTT- CAACCAATAA ACTGCAACAA TCCAAGCCGG ACTAGTAACT TATCCGCCAT T'rACGAGCTA AATATCCATC GCAACTATAT GACTCTCTAT
AA
AGCCATACCT AACTCAATCA ACATAATAAA AG AGGATCTGCA AATTTACCAA AATTACTGAC CA TAAATAGTCG GTAATACTGG CAACAGCAAA GA CGAATTTCCT ATCGTTAAAA TAAAGATAAA AA INFORMATION FOR SEQ ID NO: 343:
CCGACATA
CATTrCCAT
TAATAGCT
TAGGTATA
Ci) SEQUENCE CHARACTERISTICS: CA) LENGTH: 482 base pairs TYPE: nucleic acid STRANDEDNESS: double CD) TOPOLOGY: linear (xi) SEQUENCE-DESCRIPTION: SEQ ID NO: 343: C?1'TTGATAC ACTrAAACTA TGAATACAAA TCTCAAGCCC AAACTTCAGC TGCGACTGCC TTTGCCTGTC CTATCTGTCA AGAAAATCTG ACTCTGTTAG CAAGTGCTGC AACCGTCATT C~TTTGACTT GGCGAAATr GGCTATGTCA TCAAATCAAG CAATCTGCTA ACTACGACAA GGAAAATTTr CAAAACCGTC AGAAGCCGGC TTT'rACCAAG CTATCTTAGA TGCTGTATCT GACTTGCTTG AACTACCACA ACAATTTrGG ATATCGGTTG TGGTGAAGGA T'rCTATTCTC AGAAAGTCAC TCTGAAAAAA CTTTCTATGC CTNTGACATC TCCAAAGATT CGCGGCTAAA AGTGAACCCA ACTGGGCAGT CAATTGGTTC GTTGGCGACT
GTTTTGCTTC
AGACTAATTT
ATCTAGTCCC
AACAAATCCT
CAAGCTCAAA
GCAAACTACA
CAGTCCAAAT
TGGCACGACT
a p a a a INFORMATION FOR SEQ ID NO: 344: SEQUENCE CHARACTERISTICS: LENGTH: 520 base pairs TYPE nucleic acid C STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 344: TTTATTTTTA TAAAGTCAAT ACCTGTCTTT ACTTTTTCTT AAAAAAAGTT TATTATGTTC TTTAAGGAGG TGTAAAACAT GAAAATAAAT AATAAACTCG TTGGAGAACG TATTCAAAAT ATCCGTTTAA*GCCATGGCGA CTCTATGGAA AAATTTGGAG AAAAATTTAA TACTAGCA-AA GGTACAGTPTA ACAACTGGGA AAAAGGTCGC AATTTACCAA ATAAAGAAAA CCTACTAAAA ATTGCATCTA TTGGAAAAAT GAGTGTTGAA GAGTTACTCT ACGGCGATTA CAATACTTAT CTACACTTAA AGATTATGGA TI'TAGCTCCT GAATGTATAA AAAATTA'rGA TGAGTATAAC TCTTTACACG ATGATATAAC AAATAAAGCG TTACAGATCG CTCAAAATAC CATTTCTAAG ATTGATTATC AAATTTCAGA CGAAACGATC AAAAAATTTA TTGATTTAGC TATCGAACAA TCGAGAGATT TGCAAGGAAA TTTGTTGAAA AATAACGGGT 1364 INFORMATION FOR SEQ ID NO: 345: SEQUENCE CHARACTERISTICS: LENGTH: 1003 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 345:
GCATCAAATC CCCCATCAAA GAAGTrCTCT AAGTGCTTAA TGACAAGTAC A.ATGTTCACT GAACCATTGG TGAGCGC1TAT GGTGCCGTTG TCAA.ACAGTT GGAAACCAAT CCTTGGAACC AAGCTTTCGA AGAAACAGAT GGGCTGCTCC GGCGTGTTrGA TGGGGAAATC TATCTGGATG TGGTGGCCCA CCACATCAAC GCTATGCAGT ATTTTGGCTG GAAGGTTGGG AAGTTCTTCT ATCAATTTGA ACAAGCTCAC GAATTGCTCC TGGTTT-TAAA TGTTCCTGAT GGGACTAATT TGGTGGATTA TGACCCTGTT AAGCCACAGT GAAAAAAGAA GTTGAGAATA ATCCCAACTT GAGCTGCTTT TTTACGGTTT TCTTCGATGA CTTTCTTT'TT AAATGCGTAT ACTGCACCTG TTACAAGACC TTTAGCGAAT CCTTTAGCCA CCAGCCTCCT CAAGAGGTCA CATTTTTCTG
TTAAGAAACA
GCCGCAATAT
CGTGCGCCTT
CGACC'rrGAC
ATGTGCCTTT
ACTTCATCAA
GTCGGGAgCC
TCTTTGATAT
TGAAGTT TGA CTT-rTGTTTC
AAGCTGCTTT
CAACGGCAGC
TGAGTCTTCC
ACTGACCTTT
CGACATTATC AATAAGCTTC TATTTCGCTC TGGGAT'rACC TCAGACCATG TTTGATG'rTC CCAGCGCTCC AATG-ATATGC GCAGATGATG ATTGCCAAAC CAACCTCCAT ATCTATGATA GTCAAACTGC CAACCACGCT CAAAGCAGAA GA7rGAGT CCTAGCTATT TAAAAGAATA TTAACGTGAT ACGCGGCGAC TTGCTCTTCT GGTTCGATTA GACAGTTCCT GCGACACCTG TCCTTTATAT TCTCAATCAG TTGTGTTATA ATAATAGTAA GGAT'N'ACCA AGACCAGTCA AATAGCTTAG ACTGGAATGA CTGGGAAGTT GGAGACACGG CGAAAAAATG GGAATTTTTC AAGGAAAAAA GATGAGAACA AAA INFORMATION FOR SEQ ID NO: 346: SEQUENCE CHARACTERISTICS: LENGTH: 750 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 346: CCGCACGTAC TATTCCAGAT GCCGAGGAAG TGGACCTCAT CCTCGTTGGC GCAACTGGTC TCAACGCCTT TGAACGCCTC TTGGTCGGCT AGGTCGATTT GCTGGTTGTG AGAGAACAAG CCCCTAGCTC C-1rTTT ACGAT'rTATT CTGGCGCTGC AGTTCCTTT rAATAGCAGG 'rGGTrTTAAG ATTTTATGGG TCACTGGATC CCCCATATTG GCCTGATGGA. CAATATCAAA AAAACTGCCG TAGGTGAAGT AAAGCGTGTC TTGCTGAGCA GGTGTCTrCT TGGCTACTTT TTGCGATACA GCTTGACCAA AATC'TTCTTC CACTTCT ATTTTAAAAC CAGCCCTATG TrCTTCTTGG GTTCGT'rCAT CCATCATGTG GTCACGGCTG ACAAAGACTT TTTCTGAAGA 1365
CTTCATCTGA
AAAAAACCTr
TCTCTCTTTA
TTCTGGAGCA
AAAATGAGCC
AATACGTTCT
AATCAAGGCA
ATCTGCTrGCC
AGAAGGACTG
GGTTGCACCC
GTGGAAAGTC
ATACATACTC CGCCATGCTA ATAATCACAA AGAAAAGGAG
TGGCGTTCGT
TATTTTTCTT
TITGCCATCTG
GGGTCCACCC
TCCACTTGCC
TTATCAAGGG
GCTGCTCGA.A
TCTAAATCCC
TTGACCTTAT
AAGCCTTGAG
CCCAATTATC
GAAAAATTTT
CCATCAAGAC
CTATCAAATC
CCTGATGAAG
CAAACTrCCAC
AAGCTCGAGG
TGAAATGATA
INFORMATION FOR SEQ ID NO: 347:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 596 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 347:
CGCAACATAC GGATAACCTC CAAAGAATAT TGTTAGAGTC TTGTTCAAAA CAATCATCAA GTTGATCTTG AGGATAAGTG TACTTACCGC ATTGCAAGCG ATCTTTCGC ATNTTCCAAA
AGTTAGACCA GATATCATAG TGAACTGGGA TACGAAGAAG GTCGATAGAT GTCAkITTGT TATTCAAAGC AACATCAAT TTAAAGTCTT GAGA.ATCTGC ACCATGATAG ATGGTTCCAC TTTGAGCCAT TTCTTCATCT GTAACAGCCA ACCGTTCACT GGGAGAGTTA CCAAGCAAGT TTTTATATTA TAGCAAAGCT TTAAATTGA.A AACCACGTGG ATGATGGTAT TCTACTAAGT CAACTTCCCA GATAAATGGA TGGAAATCGT GTTCTAGAAT CTCATTAGTA GAAGCCATGA
TA.ATGACTTT
CTTGGATACC
TACCATGTTT
CTGGTGTTTC
AGCCAGCAgT GGTACGCAGA TTTTCTGCCA TACCGGATTT TCACCATAGT TGCAAAATAG TTTGAGAAGT AAAGATATAG TTAACAGCCT TCACCGCCTG TCTCATCAGC ACGGTCAAAT GATTCTACTG CATGAA INFORMATION FOR SEQ ID NO: 348: 1366 SEQUENCE CHARACTERISTICS: CA) LENGTH: 673 base pairs TIYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 348: CAGAGTCAAC AGCCTGAGTT GAAGGCAACT TTAGACACAG CAGTTACGAC AGCTGAATGA GCTCCTCCAT CAGrlTT?1C TTTAATGAGT CCAGCTACAT CTrCAACTTC GAGGCCGTTA ATCACAATGT CAGCGCCTAC TTCTTTTGCA AGGGCAAGTr TGTCATrGTT GATATCGACT GCGATAACAT GAGCATTGAA TACTTTTTTA GCGTATTGAA CAGCGAGGTT ACCAAGTCCA CCAGCACcCT AAAGAACAAC CCATTGGCCT GGTTCAACTT TTGCTTCTTr GATAGCTTTA boo*: :0,90 0 t TAGGTTGTTA CTCCAGCACA TGTGATAGAA GAAGCTTGGG ACTTI'GAC.AG CATAGTCAGC AGTTACGATA CATTGTTCAG TAGCCAGCAT TTTTCACTGT ACGGCAAAGG GTTTCGCGAC GTGCCACATC CTTCAAAGAA CCAAGCAACG CTGACGCGGT ACATCTGGAG CAATCTCTrT AACGATACCG ATACCTTCGT ACTTGACCAA AGTCACCATG AGCAACGTGG AGGTCGGTGT ACTTCTACAA GTG INFORMATION FOR SEQ ID NO: 349: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 198 base pairs TYPE: nucleic acid CC) STRANDEDNESS: double TOPOLOGY: linear CTGGATCAAG TCCGTCAGGA CCATACCACC GTCT-ACTGAG CAGTTGTACA GTATTCGCAA.
CACCGACTTT AAGGCTTTTC GCCCAAGAAC ACGTCCTGGG GGCAAACGCC CACAGTATTC
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 349: GTACCCTACA AATGCTTrAC AGTATGGGTT GAGGGTGGTC AATGGAACTA TGGAGTAGGT TGGACAGGAA CTTTTGGATA TTCTGATTAC TTACATTCTA CTCGATATCA TACAGCAACT GTrAGACATC GGGGTAGAAC CTCTAAGGAT TATGCAAAAC CTGAGGCATG GGCTAGAGCT TCCCTCACCA AGATTCCG INFORMATION FOR SEQ ID NO: 350: Ci) SEQUENCE CHARACTERISTICS: CA) LENGTH: 891 base pairs TYPE: nucleic acid 1367 STRANDEDNESS: double TOPOLOGY: linear (Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 350: GCTTCTTCTA TAGACAAAAA TATCATGGGT AAAATAATCA AGGCTATAGC TAGAAGGAGG GACCAATCCA CTACTAATCC TAAGAACAAA ACACTCAAGA GAGCAGAAGA CAGAGGTTcA CTGGCACTGA TAACGGCAAC AAAAAAGCAA AAGCCGTTCC TCAAGAGTAA AGGAAAGCTG
CACCAAAGGA
AAAGAAAGCG
ATAAACCGGC
ATCATCCCCC ACCCAACCGT AGGAACAAAA ATAACATrAA ACATAACACC CATGGCACTC ATGGA'rAACT GAGAGAGGTC TCCCTr'rGTC AAAACATAGA AAACAGCGCT TTTTGACGCT AGGATAAAGA CAGGCCTAAT AAACTGTAAA GAAACCAAGG ACACAGCCTr CATGGAAATG ATAATGAGGC AAATCAAGAT ACTCCAAATA GAGAGGACAT TGCTrAAACAA ACCTGCCAAA CCATAACGCT TAGCAAAAGG TrGGGGCAAG AGCA.AACCTG TTATAAGAGC TAGCGGCGTC GCCATCAAGC AAACACCCAG CATGGCAACC CGTrTTGAT AAACCAAGCG A'N'GTAAAAG ATAGTTCCTG TCGrAGCATT TGAGTATTCT ACACAGAGAT AGAAAAAATA CTGAACTGAA AAAATCCCCA AAATAGCATA GGCTAAAAAG GGCAGGTAAT TrMTT~GTC TCGCCAAATA TCTAGCACTT GCGATTTTAA TTGTATTGCA GACCAAATGA GTACAAGAC'r CCCTGCCAGT GTCAAACGCA TAGAGGTAAT CCAGCCCGAA GACACCTGAT AATGAGTAAA GAAGTACTCT CC'TAAAATTC CACAGAI-CC CCATATTAAG CCGGATAGGA GCGAATAAAT TTTTCCGTTA ACAATCTTrT TCTGATACTG A INFORMATION FOR SEQ ID NO: 351: Ci) SEQUENCE CHARACTERISTICS: CA) LENGTH: 325 base pairs CB) TYPE: nucleic acid STRANOEDNESS: double CD TOPOLOGY: linear Cxi) SEQUENCE DESCRIPTION: SEQ ID NO: 351: GAAAGCGTTC AATAGAACAT TGCTTTTTTA TTTTrAGAGT AAGCTAAGCC CTrCAGCATC TGCGATGATG GTTACATCAG GGTGAT7rTG GAGGCTACTr GCAGGTAGGT TCTCAGTCAC TGGGCCAGAT ACTGTTCCGG CAATGGCTTC TGCTTTCGAC TCACCGTAAG CAAAAAGAAT AATAGACTTG GCATCCAAAA TGTTTTTA.AT CCCCATTGAA ATAGCTTGGG TTGGGACGTC T-rCAATCTrG GCAAAGAAGC GTGCATTGGC TTCGATAGTA GACTGGTCAA GTTCTACTAG 180 240 300 360 420 480 540 600 660 720 780 840 891 120 180 240 300 1368 ATGCG71rGA CTGTCAAATG CAGTG INFORMATION FOR SEQ ID NO: 352: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 344 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 352: CAAGAGCAGT TTGATGATTT TTGATAAGCA TGCGAATTTA AAATACAAAT ATGGCAATCG CAAGTTTTGG TGTAGAGGCT ATTATGTAGA TACGGTAGGC CGTAATCAGA AAGTGATAGC TGAATATATT CAGAATCAAT TACAAGAAGA CAGAGTAGCA GACCTAGCTC ACGTTATTCG AGTCAGTAGA TCCGTTTACT GGCGAAATAA ATAAGAGGAA GTAACGTnAkA GTGCTTTAGC ACCTGCTCGG GAAAGTGGTG CGCGAGGAAG CTATTTCAGG ATGCTTTGCC CCTGGCCGGT AGAAGCGTTA TAGCCGCAGA CTACGACACT TCACACTGGT GGTT INFORMATION FOR SEQ ID NO: 353: Ci) SEQUENCE 
CHARACTERISTICS: LENGTH: 692 base pairs B) TYPE: nucleic acid STRANDEDNESS: double CD) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 353: 325 CCCTATCCCT GCTATTGGGG CTGCTCTCAT TGGACCTGTrr CCCTTCACTC TGCAAAACTT CCGAGAGAGG CTGTACTTTC TCCTGGACTC GTCTTTGCAG GAGGTGGAGC TGGTTTTCAG TGGTTTTATC TCGTTTACTC TGGACTTACT GTTAAGATTT TTCTTGCAAA CCTCTTGGGT AGCTTGCATT TCCTAGCTGG AATGGCATTT TTTATCATTC CAGACCTTGG CAAACTTCTA CAACGCCTTA AAAATCAGGC TTACTTrACT TCAATATCCT TTTCTTTTAT TTTGAAAACT GGAAGCTAGC CGCAGGCTnG CAAAACACTG
TGCAATCGGC
TATCTTCTTC
GCTTTAGTTG
TCCTCTCTAA
GATGCCCTTG
GAAAAAGCTC
GCTATTAGTT
AACTAAAAAA
TATACTCAAT
TrTTGAGG-r
TTGATTCTAC
TAGGTGCTAT
GCCCTACTGC
TGTCTTTAGA
CGGTCTTCCT
AGGCTATCTT
TGCTGCTTTG GCACAAATCA GTCTTCCAAT CCAACAGCAA GAGTGGTGTT TCTTTGTCGG CGGGATTCTC TTGCTGTGGG GGTT CTT CCC TTATTAGCCG TCCCCTACTT GGATATCGAG TTATCATGAC GAAAATCAAA GAGCAAACTA GTGGATGAAA CTGACGAGTA 1369 AnATC'TCATA CATACGGCAA CGCAAAGC1TG AC INFORMATION FOR SEQ ID NO: 354: SEQUENCE CHARACTERISTICS: LENGTH: 1005 base pairs TY(PE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 354: 692 GTGATGGACT ACTGGTTCAA AACGCATCCA GTAGCCAGTA ACTTTTTCA TACTrACACC GTTAATTCTT TCCGCACTTC CATCCAATGG GAGCCTGATC CAAAAGGTAT TGCTTrCTAC CAGATGGATC TTGT1GATGAA TTTACATCAT TACGG'TGGTT GGGAA.AGCAA ACATGTAGTG TTCACATGCT TTGGAGATIAA GGTTCATTAC CCAGAAGCAG GGTACTTATA TGCTTTCCAT GCCGTACAAG TCATCTATAA TCTAAACCTT TCATTAGAAC TTGATGGAAA GATTGGGATT AGTAATTCTC CAGAAGACTT AGAAGCAAGT TTCTTGAATC CAG.CTGTTAA AGGAACT=C GATGGCGTGT TATGGAGTCA 'rACCGAAAAA
GAAGATTTCC
AGTCGACTCA
AATGCCATCA
TTTGATTTAC
GAGTTATTCG
TGGACAACTT
TATCCAAATC
GC1'AGTGCAA ATT'TrAAACT
CGATTTACAG
CCAGAAAGAT
ACTTGATGAA GGAAATTCGA TCAAGAATTT AGAGACAG4GT TTGAAGAAGC TAAAAAGAAC CAGTGGAACT TCTTCAAAAA 'rGAAGTTTGC CAAGACTGCT TCAATGAGCC AATCGTCATT TAAAAGGAAA GGGAAAAGAG A.AGTGATTCA ACTATATCGC TGACACCTGC T'rATCCAAGA ATGACTTCTT TAACAAAGTC TGGTAAAACA GCTAGAGAGA GAAGATTrTT TCGATAATGT CGGACCTCTr GAGCTTCAAC TGATGAAATC AAATACGGTT GATTTrCTTG GAGTAAACTA CTACCATCCA AAACGTGTTC AAGCACAAGC AAA'rCC'rGAG GAATATCAGA CGCCCTGGAT GCCAGACCAA TACTT1CAAAG AGTATGAATG GCTGGAGCGT CGCATGAATC CATATCGTGG TrGGGAAATT TTTCCGAAAG CCATrTATGA TATTGCTATG ATTGTGAAGG AAGAATATGG TAATATCCCA TGGI-I-ATCA GTGAA INFORMATION FOR SEQ ID NO: 355: Wi SEQUENCE CHARACTERISTICS: LENGTH: 973 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 1370 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 355: CCGACAAGCA ATATTAAAAA GAGTAAACTA TATAGTGAAT CAAATATACT TAAGAAAAGA AGCAGGTTCA GTGGCAGTCC TTGCCCTAAG AGCTGGTCAG GATAAGAAAG AGTCTAATCG TCAAAAGGCA GAAAACI'TGA CACCAGATGA ACAAAT'rGTT ATCAAGATTA CGGATCAAGG TTACTATAAT GGCAAGGTTC CTTATGATGC T'rAACTAGTT AATTAACCGG TTTATTACTr GGAAAGAATG AAAATTAATA AAAAATATCT rGTTrTTCC TATGAGCTrG GACGTrACCA AGTTGCTTAT -A?*GATGGTG ATCAGGCTGG
AGTCAGTAAG
TTATGTGACC
CATCATCAGT
TGTCAATGAA
TAAGGATGCA
TCCGAATTAT
TAAGGTAAAC
GACAAAAGAA
AGATAATGCT
CAG'rTGAAGG
GGTAAATACT
ATTCAGACAT
ATG'N'TACCT
AGGGAGGGGA TCAACGCCGA TCTCATGGAG ACCATTATCA GA.AGAGCTCC TCATGAAAGA ATCAAGGGTG GTTATGTCAT GCTCATGCGG ATAATATTCG CATAATCATA ACTCAAGAGC ACAACGGATG ATGGGTATAT TATATCGTTC CTCACGGCGA GAGTTAGCTG CTGCAGAAGC TCTAGTTATA ATGCAAATCC
CTTCAATGCA
CCATTACCAT
GAGATTAAAC GTCAGAAGCA GGAACGCAGT GTTGCTGCAG CCAGAGCCCA AGGACGTTAT TCTGATATCA TTGAGGACAC GGGTGATGCT TACATTrCCTA AGAATGAGTT ATCAGCTAGC GGGAAGCAGG GATC'TCGTCC TrTCTTCAAGT AGCTCAACCA AGATTGTCAG TCAAGGGGGA AACATTTCAA CATGTGGGAT CTG AGAACCACAA TCTGACTGTC ACTCCAACTT ATCATCAAAA GCCTTTTACG TGAATTGTAT GCTAACCCTT ATCAGAACGC INFORMATION FOR SEQ ID NO: 356: SEQUENCE CHARACTERISTICS: LENGTH: 843 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (XI) SEQUENCE DESCRIPTION: SEQ ID NO: 356: GGTCGCATCT GCAATATCTG TCGCCTCCAC ATAAGCGACA CCAGCCTTGT CTGCTGCCCG TTTGACACGT TCTGCAGATT GACCCAGGAT GACCATCTTC TTGAGTCCAG TAATGTCTGG CACCAATTCG TCAAACTCAT TGCCACGGTC CAAACCACCT GCAATCAAGA CGACCTTGCT GTTGTCAAAT CCTGACAAGC TTTTTGAGTA GCCAAGATAT TAGTTGATTT ACTGTCGTTA TAGAATT'rAA CACsCTTGAT GTCATCCACA AACTGGAGAC GGTG'TTTGAC ACCACCGAAG GCTGAAAGAG TTTCCTTGAT GGTTTGATTG TCCACATCAC GAAGCTTGGC TACAGCAATA 1371 GTCGCAAGGG CATrTTCCAC ATTGTGGCTA CCTGGAACAC CGATTrCATT CGCTGCCATG 420 ACTACTTCAC CACGGAAGTA GAGTTGACCA TCTrCCAGAT AAGCTCCATC AACCT'I-1-CA 480 AGTGTTGAAA ATGGTACAAC AGTGGCTTCT GTCI'TGGAAG TCAAGTCTTT TGCCAAGTCT 540 TGATTAAAGT TCAAGACAAG GAAATCAGCT GC'TG'TCATCT TGTTCTGGAT ATTCCACTTG 600 GCTGCTACAT ATI'CCGAAAA TGACCCATGG TAGTCGATAT GAGT'rGGCAT GAGGTTGGTA 660 ATAACCGCAA TC'TCTGGATG GAATTCTTGA ACACCCATGA GTTGGAAAGA AGAAAGTTCC 720 ATAACAAGCG TGTCCTTATC TGATGCTATT 'rGAGCAACCT GACTAGCTGG ATAGCCGATA 780 TTCCCTGATA AAAGACCATG T'rGGCCAGCA GCAGTCAAAA CTTCCCCGGn TCCTCTAGAG 840 TCG 843 INFORM4ATION FOR SEQ 10 NO: 357: SEQUENCE CHARACTERISTICS: LENGTH: 807 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTrION: SEQ 10 NO: 357: .rrTTTTTTAT ATTTTTTTTA TTTATTATTT TTTGGCAAAA AAGACCAATT TGCTTTGGAG CATTGCTTCT GCA'N'AAATT GTCTATT 7" GCTCCTGCTG TTACGCTCTT TGTATCATGT 120 *****ATTAACTAGC AAGTGCAACT TGCAAACTAC TAGTAAGAGG AGAAAAACAA AATGGTATG 180 *ACTGACCCAA TCGCAGACTT CCTAACTCGT ATTCGTAATG CTAACCAAGC TAAACACGAA 240 GTACTTGAAG TACCTGCATC AAACATCAAA AAAGGGATTG CTGAAATCCT 
TAAACGCGA-A 300 GGTTTTGTAA AAAACG?1'GA AATCATTGAA GATGACAAAC AAGGCGTCAT CCGTGTAT-T 360 CT'rAAATACG GACCAAATGG TGAGAAAGTT ATCACTAACT TGXAACGTGT TTCTAAACCA 420 *fee GGACTTCGTG TCTACAAAAA ACGTGAAGAC CTCCAAAAG TrCTTAACGG ACTTGGAATT 480 GCCATCCTTT CKACTTCTGA AGGT TTCT ACTGATAAAG AAGCACGCCA AAAGAATGTT 540 GGTGGTGAGG TTATCGCTTA CG~TTGGTAA AATCAAGATA CAAAGCTCGT AAAGAACAAA 600 .GCAAAATTAG GAAGTTGGAG AAGTTTGTTT ACAAACAGGC CAACTTATCT AT'rTTGCACA 660 GT'rCTTAGAG CGTGTTCAGT TCAGCTCTTG AGCTAAGTAA GTATCTGAAC CCCGTGAAAA 720 CTGGCCGTCC TGGCATGTTC GGGTAACAGG AGAnAATAAA CATGTCACGT ATTGGTAATA 780 AGTT~CAGCTA AGGCCTTCGT AAAAGTT 807 1372 INFORMATION FOR SEQ ID NO: 358: SEQUENCE CHARACTERISTICS: LENGTH: 653 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 358: CCCAGTATTT TTGTCCAAGC TTAACCATCT tTGAACAAAA CGTATT1GACC ATATGTTGC ATGCATCAAA TAGAAATTGA AGTCAGCTAG AACCTCGTTC CAAGTATGAG TTGACAGAGG TATAGATAGG GAAGTGTCGG GGACAGGAGG TAGGATGGAA TCTTTCTGAT TTGGCAAAGG ATCAGGCAGA TCATTTGTCA GCCAGTT~AGA CCAAAAAGAT ACGACCAGAA AAGGATGATA CAGATCTGGA ArrGGCTCTC TCCTCAGGCT CAGGTCACTA TTTTrCGGTGC CTTGGGTGGC CAATGTCTTT CTGCCTAGCA ATCCTAAGTT GGCACCCTAT GGATGGGCAA AACTTGATA CTTATTGTCC AGAAGGAATC AGACTACGAC TATCTAGCCT TTATGCCAGT TCGGGATAGC AAALATTTr= CTTTAAAAAA GTGTACGCTT CTAACGAATA TAACTTGCCC AGATGGTTAT GTGGTCGTAC TGCATAGCAA AGTTTACTTA TTcTATTATr k.ATTGCCAAT CTAGCTGGTC CAGGATAGGC AGGAGAAACA C'rAAGTAAG AGCTTGGAGG GACCAGCTGG ATTACCGCTT TGACCAAGCC AGACAAGCCA TTGGAAGTGG TTGTCAGCGA CCG TTr GCAA GAA S S 0 0@ S V 0 S S
INFORMATION FOR SEQ ID NO: 359: SEQUENCE CHARACTERISTICS: (A) LENGTH: 641 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
CACCP
GACTC
GTGCG
GCTGC
AAAA')
ACCAC
GTATC
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 359: TGTGA TGTGACGCTG GCCACAGCTG TCAGAAATCT GGCGAGCCAT CGTGTGCAAT .TTCCC GATGTAATCT TGTrCATAGT CCN'TGATGA ATATGTTCAA GCTGTAGAAG :CTTCC TGAACAcTTA TCAACTGTTA CAGGCGAGTT GACCAGTCAG GAAACAGATG ;TACAC ACTTGCCAAC ACTTCTTCAT CCCGCATrA CCTAAAACAA GCCTTCCAAG AGCA CTCCAGA CACTGTA(G AACCC TCAC TATTATCACT GGTGGACACA
AXGGA
.TGTGG
CCAGTTGACC TATGCTTGGA CTGTAGCGTG GACGAAGTTC AAACACTTTT GCAGAATGCG ACCGCGAGAT GGAAAc!GCGT
CCACATGATA
TTGCCAAGG
1373 TCAACCAAGT AGGAAACTTT GTrAAAAGTA ACTrGCTCAA CGTGGAAG GGTAAAATTG CTACGGATAA GGCTC.AAAGT GACTATCTCT TTACTGTCAT TAACACAGGC TGCATGATA AGGTCGATAC rGCAGCACA GTGATTGATG TGGCGACTTG TGATTTCAAG GAATTGCACC CAACAGAAGG CTACAAAAAG ATGGCTGCTC TTATC?1'GCC G INFORMATION FOR SEQ ID NO: 360: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 1958 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 360: CCTCAAGGCC A.A'TTGAAGG CTCTAAAACA TTTGCCATTC CAGCAAGTAC TCAAAAGCTT AGCGAAATTA TTGCTTTTAA TCTTTCGACT TTAGAACAGG CATTCAAAGA GAAGCACTAC TGGCAATATC AACACGATTC TTATCATCGG ATGTCACGCA AGGGCAACAG CCAAGACALAC AAATCCGAAA TGTTTTATGG CTATGAGAAA GCCATTATAG ACTATATTGA TTACTACAAC CTTAGTCCTG TGCAGTACAG AACTAAATCC GGTCAGTACA AAACTCTTGC TACTATGCGT CTCAAATCGA GTTTTTACTC AATTTTCTTA C'rCTGAGTAG AGTGTCTTGAL TATTGGCTTC CAAGAAAAAT TCTTGAATGG TTTCGATPTrC GATGAGGATT TCATAGTGAA GCGGAGCTTG ATAA.ACCTCA ACAAGGGTTG CATCGGTACT TGAGPTGTC GTATTGATAA GCTTCATAAT CACTTTTTGA ATAAAGTCGC TTGATTTAPA GCTAGCCGCA GGCTATACTT GACTACGGTA TTCGAAGAGT ATTACCCAAT CTTATGCTGT TACTTATCAC CAGTTTTAGA TCACCCAACT TAGAACA.AGT GAGAATACGA TTCTCCATAG TTCCTAGAGA GTAAGGGAAT
TGGCT'TA.AT
ACAAACAATG
TGACCAAGC
TCAAGCATCT
ATGGAAAAGT GCTACACAGA TGTGACAGAA GGTATGATGG AATCTTTCT TGGCATTTTA ACAT'rrAAAT CACTTA.ACCA ATTGGAACAA AACAAACGAA TTA.AGGTAAA ACTA.AAAGGA TTTGGATAAA TTAATTGTCT A.ACTTTTTGG TTTATTATTG AAAGACTTAT TGGACrrTCT CTTGATTGGG ATTGAAATC CAATTAATTT ATCAACAGAO GCCI-rATCAA TTTTACGTTT AGGCTCACGA ATAGCACGGT GTTTGTTTGA GGTAAAAATA ACATCTGTAT TCCCTGCAGA TTCTAGCTGA CTT-TTACAA GTTGCGAGTG AT1TTCCTCCG ATTTTCTAAT TCTATTATAG CTCAATGAAA ATCAAAGAGC AAACTAGGAA AGGCGACGCT GACGTGGTTT GAATTTTATT T'rrTTCCAAG ATTCAATGGC CCATTTATGG 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 CTACCACGTT TAAGGTTTTT GATAGCCTCG TTTTCTAGTG GCTTTTGTAC 'TTTTGAAA TAGTAGTAGG TATCACGATG ACGAGAATAG CCAATrTCrG CT1GTGAAACC AAGCTC?1'CA TTTTCACCTG CTrCA.ATTTC TCCACATGGT AGAACAATTX' GrrrG=C AGGA'PTAGGG TAGTCTGTAT TCACTTTTTT TCrCCGAAAG CTAGTATCGT TATTATTATA GTGAAATGAA 1374
TCAATAGGGA
GGAGTrGCTT
AAATATTCGT
ATCAACTCAT
AGGAACCAAG
ATAACTGCAT
TrGGGTTTGC
CCAAAAATAG
ACCAGGCAAT ATGATTAAAG CATAGAGGTA GGCAGGATTG CAGC'TTGTCC GTAATAGGTA GCTTTAGGGC TTCCTGATGA CACCATT'rGG T'rCTTGAACA ATACGCCATA GCGAGCAATA CATTGCATTT TCCTCATTAT TACACAATGT GGTATAATCT TCTTA'rTGTG AGCGAACAGG AATACCATTT ATGGCTGGIT AAAGGAATAA AACCAAGAAA CCAGACGCTT ATTTGACTGA TATGCGCTCA AAGCTATGGG TCTTATGGCA TATTCAATAG ATTT'rCGTAA AAAAGTTCTC TAGTATAACA CAAGCATCAC ACGTTT'rCCA AATCTCACGT AAAGCTA-AAA GAGAAAACAG GAGAGCTAAA CCACCAAGTA GG'N'GATAGA GATAGACTTA AAAACTATCT TACrGACAAT AATAGCTTCT GAATTrTGGCT GTCATCCAAC TACCATCCAC tACACTCGAA AAAAAAAAGA AC'rACACCTA CTATGAAC INFORMATION FOR SEQ ID NO: 361: SEQUENCE CHARACTERISTICS: LENCTH: 851 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 361: T ATGA.AATTA AGTTATGATG ATAAAGTTCA GATCTATGAA CTTAGAAAAC AAGGATATAG CTTAGAGAAG CTTTCAAATA ATTGATTGAT CGI-rACGGAA TGATTTAAAA CAAGAAATGA 'rTCTCTTGAA TACGGTCTCC GAAAAACGGG TATACTATTG CCATCCTAAA AAAGTTAAGA AGAAATTGTT TAAGAATTAA ACTAGCTCGT TCGACCTACT AGACCTTAAA GCTGAAATTC
AATTTGGGAT
TAGAGTTCGT
TTAATAAAGT
CAAGTCGTAC
TTGAGAAACC
GAACTCCGAT
TGACTGAGT
ACTATCACTT
AATCCA'TTTT
AmACAATTCT AATCTTAGGT ATATGAT'rAA CAAAAAAGGA AAA.AATCGTT ACTATTCTCC CTGACATGAA GGCTGGACTA AAGATAGAGT GATACTTCTT AACTGGCTAG CACAATACAG AAGAGGGAGA GTACCTGAGA GCGGAGAATG TGAAGGAGGA AAAAGACAAA GAAGAAAGAC TTCGTTAGAT CTTCTTTTA.A AAGTCATTAA GAAACAGCTA GATAAACCAG ATAAGGACCA TATCGAACAC AAAGGAAATT ATGCTTATCG 1375 TCGGATTTAT TTAGAACTAA GAAATCGTCG TTATCTGGTA AATCATAAAA GAGTTCAAGG CTTGATGAAA GTACTCAATt TACAAGCTAA AACGCGACAG AAACGAAAAT ATTCTTCTCA TAAAGGAGAC GTTGGCA.AGA AGGCAGAGAA TCTCATrCAA GGCCAATTTG AAGGCrCTAA AACAATGGAA CAGTGCTACA CAGATGTGAC AGAA?1-rGCC A'PTCCAGTAA GTACTTAAAA GCTrTACTTA T INFORMATION FOR SEQ ID NO: 362: SEQUENCE CHARACTERISTICS: LENGTH4: 1168 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 362:
GGGTAGAATC
CCAATTrCAAG
AACGGGATAA
CATCAGCTCA
GGAACTGGAC
CTAAATACTT
TAAGACCACC
TTTGGAAGTG
CTACACTTAT
AAAAAAGAAA
GGGTGAATAA
TCATCAACCT
AATATAGTCA
GTAATACCTT
CAGGTGAGTA
ACTTACCCAT
TTGGTAAGAG
AAGTTATTGG
GATATCTCCA ATGAGTrGGr tTAGCTGGTG AAACTGTAAA GTrGAGGCAT CGCAAACTAT GGACTGTTTC CTCGTCAGTT GGTTGGCTGT GAAGCA.AGCT GCCCTCCTTC CAACAAT'rTT CAATTCTTTA CAAGCATAGT CCGTTCCATA ACCTGTTALAC AAGGATATCT GAATCCGAAT AACGACAGTA GCGGCGTTGG AGAAATCCGC TCTTTTAGTr TCAACTGGGA AAAAAGTTCC ATACTGGG?1' AAATGACCTC CATCGAAAGA TAGTGGTAA ATGATTTGGT AAACTGTTCA TGTGAGTTTC CTTTCT'TN ACCATAAAGG GGAAACrCTr TrGTCTAG TAAAAAACAC CCATCCAGGA TCTAAGCTAA GGCAAGGATT CTGGATGGTT TTGGGGTTTT AGCTGCTrGC GGCCAATCAG GTTCAGATAC TT-AGTGGAAA TCCAACTACA 'rr AACTATC TATTAGACTA ATTGAAACAA GAACAAGACA AAAGAGCCTC ATAAAAGGTA TTTGAGGTGC 7TTTTGATAT GAGCCCATGT. T'rTCTCA.ATA GGGAGGAAGA GGTAAAAG?1' TATACCCAAA CTCTTCACAC TCTATGGAA'r CTTGCATTAT CCATAATAAT AACCGATGGT AAGATCGw
CTGGAAAGAA
GGAAAGTAGG
AGTTGAAAGA
TCATTCGTTA
TGAAA.AAAGA
AAAGACTTGT
GTGTT'TTT
CCATTGGGTG
TTTAGATTTG
AAAAACTTAC
TTACGCTGAT
TTGCAACTTG
GGATTGTACT
AAGAG1-rCTA
GTGTTTAATG
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 AAAI'TCTGA AACCAAGCTT CAAAAAAGTC GCTCGTCATC. GTCTCTTCGT AGCGATTAAC TCACCATTTG TTAGACCTGC AACCAAAGAA ATCCTCTGAT 1376 ATCT'rCTrCC AGATACTTTG CCTCTTCTTA ACTGACCTTT TAATGAGCGA CCATATTCTC GATAAAAATA AGTATCGAAT CCTGTI'TC INFORMATION FOR SEQ ID NO: 363: SEQUENCE CHARACTERISTICS: LENGTH: 4483 base pairs TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 363:
GTCAGCTrCA GCAAGCCCAT GTCAACCAGT GCGTCGGCTT GTCGGCCTCA GCAAGCGCAA TrCAGCAAGC ACAAGTGCGT GAGTGCGTCT GAGTCAGCAT ATCTGCATCA ACCAGTGCGT CAGTGCTTCA GCCTCAGCGT ATCAACCAGT GCG'rCAGCCT ATCGGCTTCA GCATCAACCA AAGTACCAGT GCTTCAGTCT CTCGGCTTCA GCAAGCACAT AAGTACCAGT GCGTCAGCCT
CAGCGTCGAC
GTACCTCAGC
CAGCCTCAGC
CAACGAGTAC
CAGCCTCAC
CGACAAGTGC
CAGCAAGTAC
GTGCCTCGGC
CAGCATCAAC
CAGCATCTGA
CAGCGTCGAC
AAGTGCTTCG GCTTCAGCAT CAACGAGTGC GTCAG CTTCC GCCTCAACCA GTGCGTCGGC
AAGTATCTCA
GTCAGCCTCA
ATCGACAAGC
GTCGCCCTCA
TAGTGCATCA
TTCAGCGTCA
AAGTGC'PTCA
ATCAGCGTCG
AAGTGCGTCA
TTCGGCGTCA
AAGTGCCTCG
CTCAGCCTCA
GAGTGCGTCC
GCGI'CTGAAT CGGC.ATCAAC GCAAGCACAT CAGCTTCTGA GCCTCAGCTT CAGCAAGTAC ACCAGTGCAT CTGAATCGGC GCTTrCAGCAT CAACGAGTGC ACCAGTGCGT CAGCTTCAGC GCCTCAGCAT CGACAAGTC ACAAGCGCCT CAGCTTCAGC GCCrCAGCAA GTACTAGTGC ACCAGTGCAT CAGAGTCAGC GCI'TCAGCAA GCACCAGTGC ACCAGTGCGT CAGCCTCAGC GCTTCAGCAA GTACTAGCGC CAGCTTCTCA ATCTGCATCA ACCAGTGCGT CCGCTTCAGC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 ATCAGCTTCA GCATCAACGA GTGCATCGGC AAGTACCAGT GCGTCAGCTT CCGCATCAAC GTCGGCTTCA GCAAGTACTA GCGCCTCAGC AAGTATCT CA GCGTCTGAAT CGGCATCAAC
CTCAGCCTCA
ATCAACGAGT
GCGTCAACAA GTGCATCGGC GCGTCCGCTT CAGCAAGTAC TTCAGCGTCA ACGAGTGCGI' CTGAATCGGC TAGCGCCTCA GCCTCAGCGT CAACAAGTGC ATCGGCTTCA GCATCAACGA GTGCGTCCGC TTCAGCAAGT ACTAGCGCCT CAGCCTCAGC GTCAACAAGT GCATCGGCTT CAGCGTCAAC GAGTGCGTCT GAGTCAGCAT CAACGAGTGC
GTCAGCCTCA
ATCGACAAGC
GCAAGCACAT CAGCTTCTGA ATCTGCATCA GCCTCAGCTT CAGCAAGTAC CAGTGCGTCA ACCAGTGCGT CAgCCTCAGC GcTCAGCGTC GACAAGTGCs 1377 TCrCTTICAG CAAGTACCAG TGCGTCAGCC TCGACAAGTG CGTCGGCCTC AACCAGTGCA
TCAGCAAGTA
AGTGCATCAG
TCAGCAAGTA
CTAGCGCCTC
CTTCAGCAAG
CCAGTGCG'rC TCAGCGTCTG AATCAGCATC TCAGCATCAA CAAGTGCTTC AGTGCTTCAG TCTCAGCGTC TCAGCAAGCA CCAGTGCTTC AGTGCGTCAC CTCAGCAAGC CGC.ATCAACA AGCGCCTCG TGCATCAGCT TCAGCCTCAA
AGCCT'CAGCA
TACTAGCCC
AGCCTCAr3CG
AACAAGTGCG
AGCTTCAGCA
AACCAGTGCC
GGCTTCAGCG
ACATCAGCTT
CCTCAGCAAG
CAAGTGCTTC
TCAGCAACTA
'rCTGAATCGG
TCAACGAGTG
TCAGCCrCAG
TCGACAAGTG
TCGGCTTCAG
AGTACCAGTG
TCTGAATCCG
TCAACGAGTG
CTGAATCTGC
TACAAGTGCT
AGCCTCACCC
CACAAGTGCG
AGCATCAGCA
AGCAAGTACC AGTGCGTCAG cTTCAGCAAG TGCTTCGGCT TCGGCATCAA CAAGTGCC'rC CCAGTGCkTC AGCCTCAGCG
CATCAACCAG
CGTCCGCcTC
CGTCGACAAG
CGTCGCTTC
CATCAACGAG
CGTCGGCTTC
CATCAACAAG
CGTCTGAGTC
ATCAACCAGT
TCACCC CAG
TCAACCAGTG
TCAGCTTCAG
TCAACGAGTG
CAGCCTCAGC
CA.ACGAGTGC
CGGCTTCAGC
CAACGAGTGC
CGGCTTCAGC
GTACCAGCGC
CAGCCTCAGC
CAACGAGTGC
CGGCTTCAGC
CAACAAGTGC
TGCGTCAGCC
AGCAAGTACT
1380 1440 1500 GCAAGTACTA GTGCATCAGC GCGTCTGAAT CGGCATCAAC GCGTCAACCA GTGCATCAGT GCCTCAGCCT CAGCAAGTAT GCAAGTACTA GTGCATCAGC GCCTCAGCTT CAGCAAGCAC GCAAGCACCA GTGCCTCAGC GCGTCGGCTT CAGCAAGTAC GCATCAACAA GTGCT'rCAGC GCTTCAGTCT CAGCGTCAAC GCAAGCACCA GTGCGTCGGC GCGTCTGAAT CGGCATCAAC GCAAGCACAT CAGCTTCTGA ATCAGCATCA ACCAGTGCAT GAGTGCATCA GCATCAGCAT C'TCAGCAAGC ACCAGTGCGT CTCAGCGTCT GAA'rCGGCAT ATCAGCATCA ACGAGTGCAT CAGTGCG'rCA GCCTCAGCAA TTCAGCAAGT ACCAGTGCGT CTCAGCGTCT GAATCAGCAT TTCAGCAAGT ACCAGTGCGT CAC'rGCCTCT GAATCAGCAT CGCCTC.AGCT 1560 AGCAAGTACC 1620 TGCATCAGCT 1680 AGCATCAACG 1740 TGCCTCGGCT 1800 AGCATCAACG 1860 GCGTCACTTC 1920 CATCAACCAG 1980 CCTrCGGCTTC 2040 CATCAACCAG 2100 CGTCAsCTCA 2160 AAGTATCTCA 2220 ATCGGCTCA 2280 ATCAACCACT 2340 GTCAGcCTCA 2400 AAGTACCAGC 2460 CTCAGCCTCA 2520 GTCGACAAGT 2580 ATCAGCTTCA 2640 ATCAACGAGT 2700 CTCGGCTTCA 2760 ATCGACAAGT 2820 GTCAGCCTCA 2880 GTCAACCAGT 2940 GTCGGCCTCA 3000 TTCAGCAAGC 3060
TTC.AGCAAGT
GAGTGCTTCG
ATCTGCATCA
ACTAGTGCAT CGGC= CAGC GCTTCAGCAT CAACGAGTGC ACCAGTGCGT CCGCTTCAGC GCTTCAGCA'r CAACGACTGC GCGTCGGCTT CAGCGTCGAC AAGTCTTCG GCAAGCGCAA GTACCTCAGC GTCAGCTTCC GCC 'CAACCA GTGCGTCCGC 1378 ACAAGTGCGT CAGCCTCAGC AAGTATCTCA GCGTCTGAAT CGGCP.TCAAC GAGTGCGTCG GCCTCAGCAA GCGCAAGTAC CTCAGCGTCA GCTTCCGCCT CAACCAG1TGC GTCGGCTTCA GCAAGCACAA GTGCGTCAGC CTCAGCAAGr ATCTCAGCGT CTGAATCGGC ATCAACGAGT GCGTCTGAGT CAGCATCAAC GAGTACGTCA GCCTCAGCAA GCACATCAGC TTCTGAATCG
GCATCAACCA
GC1-rCAGCCT
ACCAGTGCGT
GTGCGTCAGC CTCAGCATCG CAGCGTCGAC AAGTGCGTCG CAGCCTCAGC AAGTACTAGT ACAAGCGCCT CAGCTTCAGC AAGTACCAGT GCCTCAACCA GTGCATCTGA ATCGGCATCA GCATCAGCTT CAGCATCAAC GAGTGCATCG GCTTCAGCAT CAACCAGTGC CTCGGCTTCA ACCAGTGCT'r CAGTCTCAGC ATCAACAAGT GCT'rCAGCA-A GCACATCAGC ACCAGTGCGT CAGCCTCAC G CTT CAGCAT CAACGAGTGC ACCAGTGCGT CAGCTI'CCGC GCTTCAGCAA GTACTAGCGC
ATCTGAATCA
GTCGACAAGT
ATCGGCTTCG
ATCAACAAGT
CTCAGCCTCA
GCGTCAACCA G'rGCGTCAGC GCTTCAGCCT CAGCA'rCGAC GCGTCGACAA GCGCCTCAGC GCGTCAGCCT CAGCAAGTAC GCGTCAACCA GTGCATCAGA GCCTCGGCTT CAGCAAGCAC GCCTCAACCA GTGCGTCAGC
TTCAGCAAGT
AAGTGCCTCG
TrCAGCAAGT
TAGTGCATCA
GTCAGCAAGT
CAGTGCGTCG
CTCAGCAAGT
3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4483 ATCTCAGCGT CTGAATCGGC ATCAACGAGT GCGTCCGCTT CAGCAAGTAC TAGCGCCTCA GCCTCAGCGT CAACAAGTGC ATCGGCTTC-A GCGTCAACGA GTGCGTCTGA ATCGGCATCA ACGAGTGCGT CCGCTTCAGC GCT'rCAGCAT CAACGAGTGC ACAAGTGCAT CGGGTTCAGC CCTCAkCAAG CACATCAGCT AAGCGCCTCG GCCTCACCAA TTCAGCCTCA ACAAGTGCT CCAGTGCGTC ACTTCAGCAA TTCGGCATCA ACAAGTGCCT AAGTACTAGC GCCTCAGCCT CAGCcrTCAAC AAGTGCATCG GTCCGCTTCA GCAAGTACTA GCGCCTCAGC CTCAGCGTCA GTCAACGAGT GCGTC'rGAGT CAGCATCAAC GAGTCCGTCA TC'rGAATCTG CATCAACCAG TGCGTCACTT CCGCATCAAC GTACAAGTGC TTCAGCCTCA GCATCAACCA GTGCATCAGC CAGCCTCAGC CTCAGACCAG 'rGCCTCGGCT TCAGCAAGTA GCACAAGTGC GTCAGCTTCA GCATCAACCA GTGCTTCGU;Z CAGCATCAGC ATCAACGAGT GCG INFORMATION FOR SEQ ID NO: 364: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 2550 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ, ID NO: 364: 1379
GTACCTCAGC
CCTCAGCAAG
CAAGTACCTC
GTCASCTCAG
'DCAACGAGTA
TCAGCCTCAG
GTCC?1'CCGC CTCAACCAGT GCGTCCGCTT CAGCAAGCAC AAGTGCGTCA TArCTCAGCG TCTGAA'rCGG CATCAACGAG TGCGTCCCCC 'rCAGCAAGCG AGCGTCACTT CCGCCTCAAC CAGTGCCI'CG GC~rCAGCAA GCACAAGTGC CAAGTATCTC AGCGTCTGAA TCGGCATCAA CGAGTGCGTC TGAGTCAGCA CGTCAGCCTC AGCAAGCACA TCAGCTTCTG AATCGGCATC AACCAGTGCG CATCGACAAG CGCCTCAGCT TCAGCAAGTA CCAGTGCTTC AGCCTCAGCG 120 180 240 300 360 TCGACAAGTG CGTCGGCCTC AACCAGTGCA 'rCACCAAGTA CTAGTGCATC AGCTTCAGC-A AGTGCCTCGG CTTCACCGTC AACCAGTGCG TCAGCATCAA CAAGTGCTTC AGCCTCAGCA TCTGAATCGG CATCAACCAG TCAACGAGTG CATCGGCr'rC TCAGCTTCAG CAAGTACCAG TCGACAAGTG CCTCGGCTTC TGCGI'CAGCC 420 AGCATCAACC 480 TGCTTCAGTC 540 AGCAAGCACA 600 TC.AA'rCCGCA 660 TCAGCATCTG AATCAGCGTC GACAAGTGCG TCAACCAGTG CGTCAGCCTC AGCAAGTACT TCGGCTTCGG CG'TCAACCAG TGCATCAGAG TCAACAAGTG CCTCGGCTTC AGCAAGCACA TCGGCTTCAG CAAGTACCAG TGCTTCAGCT AGCACCTCAG CTTCTGAATC GGCCTCAACC TCTGAATCGG CCTCAACCAG CGCCTCAGCC AGCACAAGCG CCTCGGGTTC AGCATCAACG TCAGCC'rCAG CATCAACAAG TGCGTCAGCC TCAACGAGTG CGTCTGAGTC AGCATCAACG TCAGCCTCAG CAAGTATCTC ACCGTCTGAA AGTACTAGCG CCTCAGCATC AGCGTCAACA TCTGAGTCAG CATCAACGAG TACGTCAGCC TCAACCAGTG CGTCAGCCTC AGCATCGACA TCAGCCTCAG CAAGTACCAG TGCrrCAGCC AGTGCATCTG AATCGCGCATC AACCAGTGCG CAGCATCAAC GAGTGCATCG GCTTCGGCGT GTGCGTCAC t TCCGCATCAA CAAGTGCCTC AGCGTCAACC AGTGCNTCGG CTTCAGCAAG 'rCGGCCTCAA
AGTGCATCAG
TCAGCAACTA
TCAGCATCTG
TCAGCATCAA
AGCGCCTCGG
TCAGCATCAA
AGTACGTCAG
TCAGCAAGTA
AGTACGTCtAG TCGGCATCA6A
AGTGCTTCGG
TCAGCAAGCA
AGCGCCTCAG
CCAGTGCATC
CTTCACCATC
CCAGTGCGTC AGCT1TCCGCA AkATCAGCGTC AACCAGTGCT
AACC-AGTGCA
CCAGCGCCTC
CCTCAGCAAG
CGAGTGCTTC
CTTCAGCGTC
GGCCTCAGCA
CACCTCAGCT
GGCI'TCAGCA
AACCAGTGCT
TCTCAGCGTC TGAATCGGCA CCTCAGCAAG CACAAGTGCT CGAGTGCGTC CGCTTCAGCA CTTCAGCGTC AACGAG'rGCG CATCAGCTTC TCAATCTGCA CTTCAGCAAG TACCAGTGCG 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 TCAGCGTCGA CAAGTGCGTC GGCCTCAACC TCAGCTCAGC AAGTACTAGT GCATCAGCTT CAACCAGTGC ATCAGAGTCA GCAAGTACCA GGC1'TCAGCA AGCACATCAG CATCTGAATC TACCAGTGCT TCAGCTTCAG CATCAACCAG CGCCTCGGCC TCAGCAAGCA CCTCAGCTTC AGCAAGCACC TCAGCTTCTG AATCGGCCTC TGCTTCGGCT TCAGCAAGCA CAAGCGCCTC AGCGTCAACC ACTGCTTCAG CCTCAGCATC AGCGTCTGAA TCGGCATCAA CGAGTGCGTC AGCAAGCACC TCAGCI-rCTG AATCGGCCTC CGCCTCAGCT TCAGCAAGTA CCAGTGCTTC AACCAGTGCA TCTGAATCGG CATCAACCAG GGCTTCAGCA TCAACCAGTG CCTCGGCTTC TACCAGTGCT TCAGTCTCAG CATCAACAAG GGCTCAGCA AGCACATCAG CATCTGAATC TACCAGTGCG TCAGCCTCAG CGTCGACAAG AGCTTCAGCA TCAACGAGTG CATCGGCTTC TACCAGTGCG TCAGTTCACG CATCAACAAG 1380
TGAATCGGCC
AACCAGCGCC
GGGTTCAGCA
AACAAGTGCG
TGAGTCAGCA
AACCAGTGCG
AGCCTCAGCG
TGCGTCAGCC
AGCGTCAACC
TGCTTCAGCC
AGCGTCGACA
TGCGTCAGCT
GGCGTCAACC
TCAACCAGCG CCTCGGCCTC TCAGCCTCAG CATCAACGAG TCAACGAGTA CGTCAGCTTC TCAGCCTCAG CAAGTATCTC TCAACGAGTA CGTCAGCCTC TCAGCCTCAG CATCGACAAG TCGACAAGTG CGTCGGCCTC TCAGCAAGTA CTAGTGCATC AGTGCGTCAG CTTCAGCAAG TCAGCATCGA CAAGTGCCTC AGCGCCTCAG CTTCAGCAAG ACAGCAAGTA CTAGTGCATC AGTGCATCAG AGTCAGCAAG 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2550 INFORMATION FOR SEQ ID NO: 365: SEQUENCE CHARACTERISTICS: LENGTH: 1436 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 365: ACCCAGCAAG TACTAGTGCA TCGGCTTCAG CCAGTGCCTC AGCCTCAGCA AGTATCTCAG CTCAGCAAGT ACTAGTGCAT CAGCATCAGC CAGCGCCTCA GCTTCAGCAA GCACCAGTGC
CAAGCACCAG
CGTCTGAATC
ATCAACGAGT
TGCGTCGGCT TCAGCATCAA GGCATCAACG AGTGCGTCAC rCATCCCTT CAGCAAGTAC
TCAGCAAGCA
AGTGCGTCGG
TCAGCATCAA
AGTGCGTCCG
TCAGCGTCAA
CCAGTCCCTC
CTTCAGCAAG
CA.AGTGCTTC
CTTCAGCA.AG
CGAGTGCGTC
AGCTTCAGCA
TACCTCAGCG
ACCTTCAGCA
TACTAGCGCC
TGAGTCAGCA
CTCAsCTCAG CAAGTACCAG ACTACCAGTG CGTCAGCCTC TCTGAATCAG CATCAACGAG
CGCCTCAGCC
AGCGTCGACA
TGCATCAGCT
AGTATCTCAG
TCAGCATCAG
TCAACGAGTA
CGTC TGA.ATC GGCATCAACG CGTCAACAAG TGCTTCGGCT CGTCAGCCTC AGCAAGCACA CATCGACAAG CGCCTCAGCT TCACCTTCTG AATCTGCATC AACCAGTGCG TCAGCCTCAG TCAGCAAGTA CCAGTGCGTC! AgCCTCAGCA 1381
AGTACCAGTG
TCGGCATCAA
AGTGCGTCCG
AG'rGCGTCGG CCTCAACCAG AGTACTAGCG CCTCAGCCTC TCAGCTrCAG CAAGTACTAG AGTACCAGTG CGTCAGCCTC TCTGAATCAG CATCAACAAG TCAACAAGTG C'rTCAGCTTC TCAGTCTCAG CGTCAACCAG AGCACCAGTG CTTCGGCTTC TCAGCCTCAG CAAGCACATC TCAACAAGCG CCTCGGCCTC TCAGCTTCAG CCTCAACAAG AGTACCAGTG CGTCAGCTTC TCGGCTTCGG CATCAACAAG
TGCATCTGAA
AGCATCAACG
CGCCTCAGCC
AGCGTCGACA
TGCGTCGGCT
AGCAAGTACC
TGCCTCTGAA
AGCGTCAACG
AGCTTCTGAA
AGCAAGTACA
TGCTTCAGCC
AGCA.AGCACA
TGCCTCAGCA
C7TCAGCCTC AGCGTCGACA CCAGTGCGTC AGCCTCAGCA CTTCAGCAAG TACTAGTGCA TCAGCGTCGA CAAGCGCCTC AGCTrCAGCA AGTGCGTCGG CTrCAGCAAG TACCTCAGCG TCAGCATCAA CGAGTGCATC AGCTTCAGCA AGTGCGTCGG CTTCAGCATC AACGAGTGCT TCCGCATCAA CAAGTGCCTC GGCTTCAGCA AGTGCGTCrG AGTCAGCATC AACGAGTGCG TCTGCATCAA CCAGTGCGTC AGCTTCCGCA AGTG CTTCAG CCTCAGCATC AACCAGTGCA
TCAGCGTCAA
AGTGCGTCAG
TCAGCATCAA
CCAGTGCCTC GGCTTCAGCA CTTCAGCATC AACCAGTGCT CGAGTGCGTC AGCCGG INFORMATION FOR SEQ ID NO: 366: SEQUENCE CHARACTERISTICS: LENGTH: 735 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO; 366: GCAGTTGCCA CACCGTGCTG CG TWTGGCAA GCCAAACTTG AGGTCTTCCC GTTTGAGATA AATAAAGAGG AGTTTCACGT AACTTGGGTC TGCCTGACT ATGTTTCTGG GACAAAACGT GATATGCCAT GCTTTACCCT TCTGTCTCCA CTCCGCTCGA ACTACATTAT CTTCATTAAG ACCAGCACCC GTTCCTGCGA TAATrMCTT TTTACCCATG TCCTAAGGCA TTGTTAATCT TGTGGGCTCC TGTATGGTTA AATCTTGCTC CGCCAATATG CTGGGTCAAG TTl'rTGCGT CCTACGTACT GGCGCAAAAG CTGGTTTAAT TCCTCTTGGA TCACGGTAGG CCTTCTCCAA CTCCAAAACT GCTGTCATCA CCGCCGAATT TTCCGTAAAA TCCATCTTTA TTTGGTTCCT CTCTATAAAT CTTCTAATCT TTTCATGATC TTTTTGTCCA TACATCTACT GCATAGGGAG TAAAGTGTTG AATTGCTTTT GCCACCTGCG ATAAAGAAGG GCTGTGCTAG TCCAGTCGTA 1382 TCCAGTTGAC CCCAATCAAA GGGCTGGCCA CTTCCTGCCA CAGGGGCATC AAAGAGTAGA TAATC'rGCCT GAGAATrGGG GACATGCCCA TTTCCATCTA CCTGCACAGC CTGAATACTG GCACAAGGCA AATTCTCAAA TAAATCATCT GCCACCTGAC CGTGAACTTG AACCAAGTCC AAGCCGGGGA TCCTC INFORMATION FOR SEQ ID NO: 367: SEQUENCE CHARACTERISTICS: LENGTH: 1702 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 367: a.
a TACTAGCGCC TCAGCCTCAG CGTCAACAAG CGCTTCAGCA AGTACTAGCG CCICAGCCTC AACGAG'TGCG TCTGAGTCAG CATCAACGAG TGAATCTGCA TCAACCAGTG CGTCAGCCTC TACCAGTGCG TCAGCCTCAG CGTCGACAAG AGCCTCAGCA AGTACCAGTG CGTCAGCCTC TGCATCTGAA TCGGCATCAA CCAGTGCGTC AGCATCAACG AGTGCATCGG CTTCAGCATC AGCGTCAACA AGTGCATCG4G TGCGTCAGCC TCAGCAAGCA AGCATCGACA AGCGCCTCAG TGCGTCGGCT TCAGCAAGTA AGCGTCGACA AGTGCGTCGG AGCCTCAGCA AGTACTAGTG AACCAGTGCA TCAGAGTCAG GGCTTCAGCA AGTACTAGCG AACCAGCGCC TCGGCCTCAG GGCTTCAGCA TCAACGAGTG TGCGTCAgCT TCCGCATCAA AGCGTCAACA AGTGCTTCAG AGCGTCTGAA TCGGCATCAA AGCAAGCACC AGTGCGTCGG TGCCTCAGCT TCAGCAAGTA AGCAAGCACA AGTGCTrCAG TGCGTCCGCT TCAGCAAGTA -AGCGTCAACG
AGTGCGTCTG
AGCTTCTGAA TCTGCATCAA AGCAAGTACC AGTGCGTCAG TGCGTCGGCC TCAACCAGTG TACTAGCGCC TCAGCCTCAG
CAAGTGCCTC
CT'rCCGCGTC
CAAGTGCCTC
TGCATCGGCT TCAGCATCAA CGAGTGCGTC CTTrCAGCGTC
CATCAGCTTC
CTTCAGCAAG
CCAGTGCGTC
CCTCAACCAG
CATCAGC'TC
CAAGTACCAG
CCTCAGCCTC
CAAG'rATCTC
CATCAGTCTC
CATCAACCAG
CATCGGCT-2C
CATCAACGAG
CTTCGGCTTC
CAAGCACATC
CCTCAGCTTC
CGTCGACAAG
CCTCAGCAAG
CTAGTGCATC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 CCTCAGCAAG CACCAGCGCG TCTGAATCCG CCTCAGCATC TGAATCAGCA TCAACAAGTG CCTCAGCAAG TATC'rCAGCG TCTGAATCGG CTAGCGCCTC AGCATCAGCG TCAACAAGTG AGTCAGCATC AACGAGTACG TCAGCCTCAG CCAGTGCGTC AGCCTCAGCA TCGACAAGCG CCTCAGCAAG TACCAGTGCT TCAGCCTCAG CATCTGAATC GGCATCAACC AGTGCGTCAG CATCAACGAG TGCGTCCGCT TCAGCAAGTA 1383 AGCATCAGCA TCAACGAGTG CATCGGCTTC AGCAAG'PACC AGCGCCTCAG CTTCAGCAAG CACCAGTGCG TCAGCCTCAG CAAGTACCAG CGCCTCAGCC TCAGCAAGCA CCAGTGCCTC AGc7rCAGCA AGTACCAGTG CGTCAGCCTC AGCGTCGACA AGTGCGTCGG CTrCAGCAAG TACCTCAGCG TCTGAATCAG CATCAACGAG TGCATCAGCT TCAGCATCA.A CAAGTGCTTC AGCrTCAGCA AGTACCAGTG CGTCGGCTTC AGCATCAACG AGTGCI'TCAG TCTCAGCGTC AACCAGTGCC 'rCTGAATCAG CA'rCAACAAG 'rGCCTCGGCT TCAGCAAGCA CCAGTGCGTC GGCTI'CAGCA AGTACTAGTG CATCGGCTTC AGCATCGACA AGTGCGTCTG AATCGGCATC AACGAGTGCT TCGGCTTCAG CATCAACGAG TGCGTCAGCC TCAGCAAGCA CATCAGCTTrC TGAATCTGCA 'rCAACCAG'rG CC INFORMATION FOR SEQ ID NO: 368: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 941 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 1260 1320 1380 1440 1500 1560 1620 1680 1702 0 .0 0 0 *J 0 *0 0 0 00 .0 0 0 S 9 00**0* *0.0 *fl.
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 368: ACCAGTGCAT CAGCT'rCAGC CTCAACAAGT GCTTCAGCAA GTACCAGTGC GTCACTTCAG CAGTGCTTCG GCTrCGGCAT CAACAAGTGC TCAGCAAGTA CTAGTGCATC AGCATCAGCA TCAkGCGTCTG AATCGGCATC AACGAGTGCA TCAGCGTCAA CCAGTGCATC AGTCTCAGCA AGTGCCTCAG CCTCAGCAAG TATCTCAGCG TCAGCAAGTA CTAGTGCATC GGCTTCAGCA AGTGCCTCAG CCTCAGCAAG TATCTCAGCG GCTTCAGCCT CAGCGTCAAC CAGTGCCTCG CAAGCACAAG TGCGTCACTT CAGCATCAAC
CTCAGCATCA
TCAACCAGTG
TCAGCATCAG
AGCACCAGTG
TCTGAATCGG
AGCACCAGTG
TCTGAATCGG
GCATCAACGA GTGCGTCACC CATCAGCCTC AGCAAGTATC CATCAACGAG TGCATCCGCT CGTCGGCTTC AGCATCAACG CATCAACGAG TGCGTCAGCC CGTCGGCTTC AGCATCALACC CATCAACGAG TGCGTCAGCC CATCGGCTTC AGCAAGTACC CAAGTACCAG CGCCTCAGCC TCAGCAAGTA CTAGTGCATC AGCATCAGCA TCAACGAGTG AGCGCCTCAG CTrCAGCAAG CACCAGTGCG TCAGCCTCAG
TCAGCAAGCA
AGTGCGTCGG
TCAGCATCAA
CCAGTGCCTC
CTTCAGCAAG
CAAGTGCTTC
AGCTTCAGCA AGTACCAGTG CGTCAGCCTC TACCTCAGCG TCTGAATCAG CATCAACGAG AGCTTCAGCA AGTACCAGTG CGTCGGCTTC
AGCGTCGACA
TGCATCAGCT
AGCATCAACG
1384 AGTGCTrCAG TCTCAGCGTC AACCAGTGCC TCTGAATCAG CATCAACAAG TGCCTCGGCT TCAGCAAGCA CCAGTGCGTC GGCTTCAGCA AGTACTAGTG C INFORMATION FOR SEQ ID No: 369: SEQUENCE CHARACTERISTICS: LENGTH: 869 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 369: CAGCAAGTAC TAGTGCATCA GTGCATCAGA GTCAGCAAGT CAGCAAGCAC CAGTGCGTCG GTGCGTCAGC CTCAGCAAGT CAGCAAGTAC TAGCGCCTCA GTGCGTCTGA ATCGGCATCA CAGCGTCAAC AAGTGCATCG GCGCCTCAGC CTCAGCGTCA CAGCATCAAC GAGTGCGTCA GTGCGTCAGC CTCAGCATCG CAGCGTCGAC AAGTGCGTCG GTGCGTCAGC CTCAGCGTCG CAACCAGTGC GTCAGCCTCA CGCCTTCAGC ATCAACCAGT GCTTCAGCAT CAACGAGTGC ATCGGCTTCT GCGTCAACCA ACCAGTGCGT CAGCTTCCGC ATCAACAAGT GCCTCGCCTT GCTTCAGCAA GTACTAGCGC CTCAGCCTCA GCC'rCAACCA ATCTCAGCGT CTGAATCGGC ATCAACGAGT GCG-CCGCTT GCCTCAGCCT CAACAAGTGC ATCGGCTTCA GCGTCAACGA ACGAGTGCGT CCGCTTCAGC AAGTACTAGC GCCTCAGCCT GCTTCAGCAT CAACGAGTGC GTCCGCTTCA GCAAGTACTA ACAAGTGCAT CGGCTTCAGC GTCAACGAGT GCGTCTGAGT GCCTCAGCAA. GCACATCAGC TTCTGAATCT GCATCAACCA ACAAGCGCCT CAGCTTCAGC AAGTACCAGT CCGTCAGCCT GCTTCAGCAA GTACCAGTGC GTCAGCCTCA GCAAGTACCA ACAAGTGCGT CGGCCTCAAC CAGTCCATCT GAATCGGCAT GCAAGTACTA GTGCATCAGC TTCAGCATCA ACGAGTGCAT GCATCAGAGT CAGCAAGTAC CAGTGCGTCA GnTTCCGCAT GCAACAAGTG CCTCGGC?1'C AGCAAGTAC INFORMATION FOR SEQ ID NO: 370: SEQUENCE CHARACTERISTICS: LENGTH: 750 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 370: TCAACAAGTG CCTCAGCATC AGCATCAACG AGTGCGTCAG CCTCAGCAAG TACTAGTGCA 1.385 TCAGCATcAG CATcAACCAG TGCATCAGCC TCAACGAGTG CATCAGCATC AGCATCAACG TCAGTrCTCAG CAAGCACCAG TGCGTCGGCT AGrATCTCAG CGTCTGAATC GGCATCAACG TCGGCTTCAG CAAGCACCAG TGCGTCGGCT AGTATCTCAG CGTCTGAATC GGCATCAACG TCAGCATCAG CATCAACGAG TGCATCGGCT AGCACCAGTG CGTCAGCCTC AGCAAGTACC TCAGCTTCAG CAAGTACCAG TGCGTCAGCC ACTACCTCAG CGTCTGAATC AGCATCAACG TCAGCTTCAG CAAGTATCTC AGCGTCTGAA.
AGTACTAGCG CCTCAGCATC AGCCTCAACG TCAGCAAGTA TCTCAGCGTC TGAATCGGCA AGTGCATCGG CTTCAGCGTC AACCAGTGCA TCAGCATCAA CGAGTGCCTC AGTGCGTCAG CCTCAGCAAG TC.ACCATCAA CCAGTGCCTC
TCAGCAAGTA
AGCGCCTCAG
TCAGCGTCGA
AGTGCATCAG
TCGGCATCAA
CCAGCGCCTC
CCTCACCAAG
CAAGTGCGTC
CTrTCAGCATC
CGAGTGCGTC
AGCCTCAGCA
TACTAGTGCA
AGCCTCAGCA
TACTAGTGCA
AGCTTCAGCA
CACCAGTGCC
GGCrTCAGCA
AACAAGTGCT
CCCT'rCAGCA INFORMATION FOR SEQ ID NO: 371: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 957 base pairs TYPE: nucleic acid CC) STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ, ID NO: 371: CCGGAAAACA GCTCTGGCGC TTGGTCTTGC CCAGCGTATT GCTAGTGGTG ACGTGCCTGC GGAAATGGCT AAGATGCGCG CTTCCGTGGT GACTTTGAAG CCAAGTCATC CTCTTTATCG TTCGACTCTG GATGCGGCC-A GGTTGGTGCC ACTACTCAGG TCGTCGTTTC GCTAAAGTGA ACAAGGTTTG AAGGCGACTI' TGAAACAGCG GTTAAGATGG TATCGATCTC TTGGATGAGG AGACGA'PTCA GATTTGAGTC TGTTAGAACT TGATTTGATG AATGTCGTTG AACGCATGAA TAATATCATC AAGGATA'ITG ATGAACTCCA CACCATCATCGCGTTCTGGTA ATATCTTGAA ACCAGCCTTG GCGCGTGGAA AAGAATATCA AAAACATATC GAAAAAGATG CGATITGAAGA ACCAAGTGTG GCAGATACTA ATGAGAAACA TCACCGTGTA CAAATCACAG CTCATCGTTA TTTAACCAGT CGTCACTTGC CGGCAGCAAC AGTGCAAAAT AAGGCAAAGC CAGCTGACAA GGCCCTGATG GATGGCAAGT
CAGGGACACG
AAGAAGATGG
GCGGGATTGA
CTTTGAGAAC
CGGCACTTTC
TGACTAT
ATGAAGCGGT
CAGACTCTGC
ATGTAAAAGC
GGAAACAGGC
1386 AGCCCAGCTA ATCGCAAAAG AAGAGGAAGT ACCTGTCTAC AAAGAC?1'GG TGACAGAGTC TGATATI-TTG ACCACCTrGA GTCGCTTGTC AGGAATCCCA GTTCAAAAAC TGACTCAAAC GGATGCTAAG AAGTATTTAA ATCTrGAAGC AGAACTCCAT AAACGGGTA TCGGTCAAGA TCAAGCTGTT TCAAGCATTA GCCGTGCCAT TCGCCGCAAC CAGTCAGGGA TrCGCAGTCA TAAGCGTCCG ATTGGTTCCT TTATGTTCCT AGGGCCTACA GGTGTCGGGG TATCCGA INFORMATION FOR SEQ ID NO: 3172: SEQUENCE CHARACTERISTICS: LENGTH: 807 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 372: CAAAGCGCCT CAGCTTCAGC ATCAACAAGT GCGTCGGCTr CAGCATCAAC GCTTCAGCGT CAACCAGTGC GTCACATTCA GCAAGTACCA GTGCTTCAGT ACAAGTGCTT CAGCCTCAGC ATCGACAAGT GCCTCGGCTT CAGCAAGCAC
CAGTGCCTCG
CTCAGCATCA
ATCAGCATCT
GAATCAGCGT CAACCAGTGC ACCAGCGCCT CGGCCTCAGC GCCTCAGCAA GCACCTCAGC ACGAGTGCTT CGGCTTCAGC GCTTCAGCGT CAACCAGTGC ATCTCAGCGT CTGAATCGGC GCCTCAGCAA GCACCTCAGC ACAAGCGCCT CAGCTTCAGC TTCGGCTTCA GCAAGTACCA GTGCTTCAGC 'ITCAGCATCA AAGCACCTCA GCTTCTGAAT CGGCCTCAAC CAGCGCCTCG TTCTGAATCG GCCTCAACCA GCGCCTCAGC CTCAGCATCA AAGCACAAGC GCCTCGGGTT CAGCATCAAC GAGTACGTCA TTCAGCCTCA GCATCAACA.A GTGCGTCAGC CTCAGCAAGT ATCAACGAGT GCGTCTGAGT CAGCATCAAC GAGTACGTCA 'rrCTGAATCG GCCTCAACCA GTGCGTCAGC CTCAGCATCG AAGTACCAGT GCTTCAGCCT CAGCGTCGAC AAGTGCGTCG GCCTCAACCA GTGCATCTGA ATCGGCATCA GCATCGGCTT CAGCATCAAC CAGTGCCTCG GCAAGTACCA TGTGCTTCAT GTCTCAG INFORMATION FOR SEQ ID NO: 31 SEQUENCE CHARACTERISTICS LENGTH: 1068 base TYPE: nucleic acid STRANDEDNESS: doub] TOPOLOGY: linear ACCAGTGCGT CAGCCTCAGC AAGTACTAGT GCTTCAGCGT CAACCAGTGC GTCAGCTTCA 1387 Cxi) SEQUENCE DESCRIPTION: SEQ ID NO: 373: CATCGGCTTC AGCATCAACG AGTGCGTCCG CTTCAGCAAG TACTACCGCC TCAGCCTCAG CGTCAACAAG TGCATCCGCr TCAGCGTCAA CGAGTGCGTC TGAGTCAGCA TCAACGAGTG CGTCACCTCA GCAAGCACAT CAGCTTCrGA ATCTGCATCA ACCAGTGCGT CACCTCAGCA TCGACAAGCG CCTCAGCTTC AGCAAGTACC AGTGCGTCAC CTCAGCGTCG ACAAGTGCGT CGGCTrCAGC AAGTACCAGT GCGTCASCTC ACAAGTGCGT CGCCCTCAAC CAGTGCATCT CAAGTACTAG TGCATCAGCT TCAGCA'rCAA CATCAGAGTC AGCAAGTACC AGTGCGTCAG CAAGTACTAG CGCCTCACCC TCAGCGTCAA CCT4CGGCCTC AGCAAGTATC TCAGCGTCTG CATCAACGAG TGCATCAGTC TCAGCALAGCA CGTCTGAATC CGCATCAACC AGTGCCTCAG CATCAACAAG TGCATCGGCT TCAGCAAGCA CGTCTGA.ATC GGCATCAACG AGTGCGTCCG CGTCAACAAG TGCTrCGGCT TC-AGCGrCAA CGTCAGCCTC AGCAAGCACA 'rCAGCT'rCTG CATCGACAAG CGCCTCAGCT 'rCAGCAAGTA CTTCAGCCTC AGCGTCGACA AGTGCGTCGG AGCAAGTACC AGTGCGTCAC CTCAGCGTCG GAATCGGCAT CAACCAGTGC GTCACCTCAG CGAGTGCATC GCTCAGCA TCAACCAGTG cTTCCGCATC A.ACAAGTGCC TCGGCT'rCAG CAAG'TGCTTC AGCI"ICCGCG TCAACCAGCG AATCGGCATC AACAAGTGCC TCGGCTTCAG CCAGTGCGTC GGCCTCAGCA AGCACCAGCG CTrCAGCAAG TACCTCAGCA TC1'GAATCAG CAAGTGC1'TC AGCCTCAGCA AGTATCTCAG C'rTCAGCAAG TACTAGCGCC TCAGCATCAG CGAGTGCGTC TGAGTCAGCA TCAACGAGTA AATCTGCATC AACCAGTGCG 
TCAGCCTCAG CCAGTGCGTC AGCCTCAGCA AGTACCAGTG GCTCAACCAG TGCATCTG
.5 6 5 0 5 0 6 5665 56 05 S SS*0 5 5.66 06 *6 6 6 INFORMATION FOR SEQ ID NO: 374: SEQUENCE CHARACTERISTICS: LENGTH: 620 base pairs TYPE: nucleic acid STRANDEDNESS: double CD) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 374: CAGCATCAAC GAGTGCTTCA GTTTCAGCGT CAACCAGTGC CTCTGAATCA GCTrCAACAA GTGCCTCGGC TTCAGCAAGC CCCAGTGCGT CGGCTTCAGC AAGTACTAGT GCATCGGCTT CAGCATCGAC AAGTGCGTCT GAATCGGCAT CAACGAGTGC TTCGGCTTCA GCATCAACGA GTGCGTCAGC CTCAGCAAGC ACATCAGCTT CTGAA'PCTGC ATcAACCAGT GCGTCCGy'TT 1388 CAGCGTCAAC CAGTGCGTCG GCTTCAGCGT CGACAAGTGC GTGCGTCGGC CTCAGCAAGC GCAAGI'ACCT CAGCGTCAGC CGGCTrCAGC AAGCACAAGT GCGTCAGCCT CAGCAAGTAT CAACGAGTGC GTCTAGTA GCATCAACGA GTACGTCAGC CTGAATCTGC ATCAACCAGT GCGTCAGCCT CAGCATCGAC GTACCAGTGC TTCAGCCTCA GCGTCGACAA GTGCGTCGGC CGGCATCAAC CAGTGCGTCA INFORMATION FOR SEQ ID NO: 375: SEQUENCE CHARACTERISTICS: LENGTH: 720 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear TTCGGCTTCA GCATCAACGA TTCCGCCTCA ACCAGTGCGT CTCACGTCT GAATCGGCAT CTCAGCAAGC ACATCAGCTT AAGCGCCTCA GCTTCAGCAA CTCAACCAGT GCATCTGAAT (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 375:
GTATTGGGGC
AACCATTTAG
T'rATTTGGTG
GGCCAGTAGG
AGTGGTGCGT
ACCTATCTAC
CAGTGTGACA
GCGTATTTAC
TGATGGGGAC
CGCGCGTGTT
GATTGATGGT
ATTTGCTAAT
GCCCCAACCT CTATGTGACT ACGGATTATT TCCTAGATTA CATGgGGATA AAGAATTACC AGTGATTGAT AAAGGATAGA AGAAGATGAG AGAAAAGCAG AAGAGCTGAT GAACTAGCAA CCACTATCAA AACGAAGAAA AGGTCTACTA GATGATAAGG GTCGCAAGAC CCTGTGGGTC GTTTGGACTG TTTACAGACG AGATGATTCA AAAGGTGTGG CCAATAAGGA AAGAAAACCA AGCCATAATA AAAAATACTG TATTATTACC GAGCTTGAGA TTCAAGCCCA AGAAAGCCAA AATCAATAAG TATATTGCCC ACGCAGGTGT TAAGCAAGGC TTGGTGACGG TTAACGGCCA GTCAGGCGAC AAGGTCGAAG TTGAAGGTCA TCTGCTTAAC AAACCACGCG GTGTGATTTC GGTTGTCGAC CTCTTGCCCA ATGTCAAAGA GGATACATCA GGTGTCTTGA TTTTGACCAA CCCTCGTAAT GAGATTGACA AGGTTTATGT CAATCTCCGC CCCTTGACCC GTGGTCTTGA TATAGGTTT'r GTAGCCTCTA CACCATAAAT CC'rAAGCT GCGA.AATTAT TCAAGTTCTT INFORMATION FOR SEQ ID NO: 376: SEQUENCE CHARACTERISTICS: LENGTH: 648 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 1389 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 376: CGCCATTTCC CATCGTACCG CCGAAAATCC CGTTCTCAAA AAAAGTGACC GCTCTCI'CAT CGATGTCAAG GGGGGCGCAC ACTskATTGC 'rACCTACACC GGTCCCATGA CTTrTGAAGGG CTI"TGGAAAT CCAATCGATA rCTCAGATAT AGTCGCCAA'r CGTATTCAAA CAGAATTCCA CAATGATAAA AAACCAAATC CACTCTGGTG TArNATCCTC CCTATCCTAA CCATCATCTT AGATAAGAAA AGAGAAGAAC TTGCATAGAA TTCATTrATA TAGTAGATTG CwACTAGAAT AAATCGATTT GACTGTCCTG ATCGATTTGT CAGCGCCTCA GCCA'rCAAAT ATCCTA'PCAA CATTTTCCA AGTGGTAGCC GCCACTCAAA
CAAAATGGCC
CTTGATTAGC
CAAGAAAATG
ACCTGGAC
GTTTATCCGC
TAGCTrTATC
GAAATGAACC
AGTACACCTC
CCTAATCTTA
AAGGTCCGTA
CCTGAACGTG
TCATGCCGGT
TCGA'rATGAA- AATCATGAAG GCATTGAAAC GAAGAAACCA AACAATGGCA ATCCCTGCCC TCATCCTTGC GCAAGcTrCA TCTGGAACCC rTGGCCAAAC AGCTAAGGTT TACTTCTAAA ACATTTTTAG =rCAxT INFORM4ATION FOR SEQ ID NO: 377: SEQUENCE CHARACTERISTICS: LENGTH: 690 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 377: G1'GCATCGCT TTCAGCATCG ACAAGTGCGT CTGAATCGGC ATCAACGAGT GCTTCGGCTT CAGCATCAAC GAGTGCGTCA. GCTTCAGCAA GCACATCAGC TTCTGAATCT GCATCAACCA GTGCGTCCGC TTCAGCGTCA ACCAGTGCGT CGGCTTCAGC GTCGACAAGT GCTTCGGCTT CAGCATCAAC GAGTGCGTCG GCCTCAGCAA GCGCAAGTAC CTCAGCGTCA GCTTCCGCCT CAACCAGTGC GTCCGCTTCA GCAAGCACAA GTGCGTCAGC CTCAGCAAGT ATCTCAGCGT CTGAATCGGC ATCAACGAGT GCGTCGGCCT CAGCAAGCGC AAGTACCTCA GCGTCAGCTT CCGCCTCAAC CAGTGCGTCG GCTTCAGC.AA GCACAAGTGC. GTCAGCCTCA. GCAAGTA'ICT CAGCGTCTGA ATCGGCATCA ACGAGTGCGT CTGAGTCAGC ATCAACGAGT ACGTCAGCCT CAGCAAGCAC ATCAGCTTCT GAATCGGCAT CAACCAGTGC GTCAGCCTCA GCATCOACAA GCGCCTCAGC TrCAGCAAGT ACCAGTGCTT CAGCCTCAGC GTCGACAAGT GCGTCGGCCT CAACCAGTGC ATCTGAATCG GCATCAACCA GTGCGTCAGC CTCAGCAAGT ACTAGTGCAT 1390 CAGCTTCAGC ATCAACGAGT GCATCGGCTT INFORMATION FOR SEQ ID NO: 378: SEQUENCE CHARACTERISTICS: CA) LENGTH: 1003 base pairs TYPE: nucleic acid CC) STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 378: 690 CGAGATTCTC TGGAGTTATG GATGTCGTTC ATATGGGGGG AACAGAATCC TC'rCTTGATT CTTTTTGTCA ACTGTAGTGG GTTGATATAA 1'AGTTTATT CAGTACAAAT CTGAGAGAAG TATCTTGATT GAGCTTGTAG AGAAGAATTA CACAGGATAA CCTGATGCAT TAGACAATTT TAAACCCCAA TTAGCTTAGA TCCTGGATGG AAGAGTT'rCC CCTTTATCGT TGAACAGTTT ACCAAATCAT GAGGTCATTT AACCCAGTAT AACTAAAAGA GCGGATTTCT ATTCGGATTC AGATATCCTT ACTATGCTTG TAAAGAATTG AGCTTGCTTC ACAGCCAACC ATAGTTTGCG ATGCCTCAAC 'TAAcGGGTC
TTATGI!GTGG
GCTCATAAAG
TTTTTTAGCG
TTA?'rCACCC T'rTC'TTTTTT
ATAAGTGTAG
CAATATGTGC ACGTrGGAAT GTTAGTGCTT GAAGACAAGC TAGTCATTAG GCTGGTTTGT TAGTATTAGT GAGTGGGATA AAAGTTTCAT AAGATTTATA TACTAGTGGT GTTTTTGGGG ?TrTTATACT TACAGTTGTT CTGCTCC4AAA GAGATTGATT ATTTTGATAT CAAAAAAATG ACAATGCTTG CTACTrCCTT CTGTCGAATT CAAATCTAAA AACCATCCAG AATCCTTGCC CACCCAATGG GTGT'TrTTA CTAGACAAAA AAAAAAACAC AAkAAAGAAAG GAAACTCACA CAC"FTCCAAA ACAAGTCTTT TTACCAACTA GGTGGTCTTA. TCTTTTTTCA CGAACTTTTT AAGTATTTAG TAACGAATGA CCAACGCCGC GTCCAGTTCC TCTTTCAACT GTTAACAGGT TCAGCTGATG CCTACTTTCC AAAATTATTG TTATCCCGTT TTCTTTCCAG AACTGACGAG CTTGAATTGG TCGAATTCTT TTT
TCTTTCGATG
TCCCAGTTGA
TACTGTCGTT
TATGGAACGG
GAAGGAGGGC
GAAACAGTCC
INFORMATION FOR SEQ ID NO: 379: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 738 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: double (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 379:
CCGATGATTC
TGAAATACTA
ACAACCCCGT
CTACTGGCAG
TATTrACTA CC~r'ITrGTGG
TAGGAAGCTA
GAC'TCAAAAC
TGAGGTTG
CTGACGAAGT
ACACCGTTTT
CTGACGTGGT
ATGAAGGTTT
TGATTGGTTT
A'rTGTTTTAG
CAAAATCAAC
AGCAACACCT
TTAACCTTTA
ATGGTCTGA
GCCGCAckGC
ACCGTTTTGA
GATAGAACTG
CGcTCAAAAC
GAGGTTGTGG
T'rGAAGAGTA 1391 GCTCPTACT TTGCTGGGAA TT TTGAGGTA GATCTATGAT CTATTATCCT ATCTCTTCCT TTGATTGTAT TGGTAACrTAT TATTrCCAT GGArGCCACT AGTAATATTG GTAAACCAAG TGGTCAAGGT GCTCACTTTA ?'rGGTGAGrr TGGCTTTA'rr TGGTGATTAC TTATAAATAA AAGAAAACET CAGATATTCA.
AGTTrTC?1- TTTATACTCA ATGAAAATCA AAGAGCAAAC 'PCA-AAACACC CT1"TTGAC--T TGTAGATATA ACTGACGAGC GGTGTAGAT ATAACTGACG AGcGACTCAA ACGAGcGACT CAAAACACCG TTTTGAGGTT ACCGTTTTGA GGTTGTGGAT AGAACTGACG ATAGAACTGA CGAAGCgaaC ATATATACAG TTACTGTCTA TATTTTGGT AAAAATCAAC AACACCGTT'r
GTGCATAGAA
AAtgC tCAAA
CAACGGCGACG
TTACTTGG
INFORMATION FOR SEQ ID NO: 380: SEQUENCE CHARACTERISTICS: LENGTH: 695 base pairs B) TYPE: nucleic acid STRANDEDNESS: double -TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 380: CCGTCTTATC AAAGAGGTTA CGTAAACTTC ATCCACTTTG TAATAGCATC TTGGTCACTA TAACACCTGT TTCAGCGATA TAATCTTC CACATCAATC GACGAACTTC TGGAATGGTT CTTGAGCAAG AGCCTCCGTC GAACGGCTGT AATCCCATCA CTTCCAAACC 'N'GGATATCT TAGCAATACC AGCTACTGGC ACAAAGGCAC CAAATTTCTC GATACGAACG ACTTTAGCAC GCTTCACGAA CCAAACCAGC GAGTAGATAG ACACATTTCC ATCTTGTCGA TGGTTTCTCC TTGATCGTAT CAATTTTCGG
GCTTCAATGA
AAGATTTCTG
CGAGTACCTG
GTCAATACTG
GCCTTGATTG
CATCAAGGAT
CAGTAATCCC
CAACCTTGAA
TGTAGTTATT
GCOACACCACC
AATAATTTCT TTGGCACGGT TTCTTCGTCT ATATCAATCT ACCCTTACCG ATGACAATCT AGCAGTTGGA GCCA.AT'rCTG TTCAAAACGC GCTTTCTTGG TTGAATCTTG ATATCCATT GTCCATATCT CCAAAGTGAT TCCATCTGAG ATAAGCCCCA AGCCATAAGG GCAAGAGTTC 1392 CCGCACAGAT AGAAGCTTGA GATGAAGAAC CGN'GATTC CAAAAC?1TCT GCTACTAGAC 660 GGATAGCGTA GGGGAATTCT TCCAAGCrrG GCAGG 695 INFORMATION FOR SEQ ID NO: 381: SEQUENICE CHARACTRISTICS: LENGTH: 691 base pairs TYPE: nucleic acid S'rRANDEDNESS: double CD) TOPOLOGY: linear (xi) SEQUEN4CE DESCRIPTION: SEQ ID NO: 381: GACATCT'TAT CTAAATACAT GCTAATATAT ?rAGATACAA ACATTCCAAC TTGATAATrT TCACTCATCT r'rCATCATTrC CTrA'rACAAC TATGCAGTAT AAATAGAATA GTTLrCTCAT 120 CAGAATGAGA CTATTTTAAT ATTAGATCCC CAATTAT1CA CCCCAAATCT AAAA.ACCATC 180 CAGAATCCTT GCCTTAGCTT AGATCCTGGA TGGTrTCTTT TTrCACCCAA TGGGrGrr 240 *TTACTAGACA AAAAAGAGTT TCCCCTTTAT CGTATAAGTG TAGAAAAAAA CACAAAAAGA 300 AAGGAAACTC ACATGAACAG TTTACCAAAT CATCACTTCC AAAACAAGTC TTT TACCAA 360 *C'ATCTTTCG ATGGAGGTCA T?1'AACCCAG TATGGTGGTC TTATC1T=T TCAGGAACTT 420 *TTTTCCCAGT TGAAACTAAA AGAGCGGATT TCTAAGTATT TAGTAACGAA TGACCAACGC 480 *CGCTACTGTC GTTATTCGGA TTCAGATATC CTTGTCCAGT TCCTCTTT'CA ACTGTTAACA 540 GGTTATGGAA CGGACTATGC TTGTAAAGAA TTGTCAGCTG ATGCCTACTT TCCAAAATTG 600 .T-rGGAAGGAG GGCAGCTTGc TTCACAGCCA ACCTTATCCC Gw'TrCT'rTC CAGAACTGAC 660 .GAGGAAACAG TCCATAGTTT GCGATGCCTC A 691 INFORMATION FOR SEQ ID NO: 382: i) SEQUENCE CHARACTERISTICS: LENGTH: 750 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 382: *ATCTCTCTGC GTAATGGTCC TCAGATAACT CTGATGATGT GTGGCGATAT AGAACTGAGC CAAGTTATGC CTAAAGGGCC TTAGGAATAG GAGCTTTCAC AAGCTTATCC AGATGATTAT 120 CTrTACTCG TTATGGACAA TGCTATATOG CATAAATCAA GTACCIAAA GATTCCGACT 180 AATATTGGCT TTGCATAT TCCTCCATAC ACACCAGACA TGAACCCCAT TGAACAAGTG 240 1393 TGGAAAGAGA TrCGTAAACG TGGATT~TAAG ATACAAGGAC TGGAGAAGGA GGTGATAAAG CT TTGAAA ACAGATGAGT ATAAAAAGAA CTGATGAATT- TATAGTAAAA TGAAATAAGA TTTTAGAAGC AGAGGTGTAC TATrCTAGTr AAGCCCI'CA TCAGCCAATC TACTrGTTCA
AATAAAGCCT
TCCATCGTTA
AGTCCTCATT
ACAGGATAGT
TAAATCCACT
GGTGCGAGAG
TTrCGAACTTr GGAAGATGTC ATCGGAGACG GACTAGAATG TCAATAGAAA TCACGACTTT CAAATCGATT TCTAACAATG ATAT TTGGGG AGTGATAGAA CTTTGACATC CTTTTCTGTA ATATCCAAAA TCCTTGACCA AAAAGAGAAA GACTTGATCG CTGGACCAAG TCAGTN'TCC GTTCTCAAAG CGT TCCCAGTAAA. GAACTTTAAA GCGGTCTTTA CGT( GAGAAAGGAT CCAATTCAAA GTGGGTTTGG INFOR.MATION FOR SEQ ID NO: 383: SEQUENCE CHARACTERISTICS: LENGTH: 738 base pairs B) TYPE: nucleic acid STRANI3EDNESS double (D TOPOLOGY: linear
TATATA
CCACCAC
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 383: TCAAATTCTT CGTGGTCCGC ATATCTnTCT TCGTACACGG CAGTCACTTG ACTCGAGTCG CAGCTTCACG GGCCAATTTC TCTTCTACTT GAACTGCCTT CTGTTGTAGG CTGCAATGAT TTCAGCTTGC AATTCAGCAT CCACGTGAAG TCTGCTTTTT CTTTACCGAC AGCAGCAACG ATTTCTTCTT GGAAGGCAAT ACAGCTTCGT GCCCT'rTAAG, GAGCGCTTCC AACATGATTT CTTCTGACAA CCAGACTCTA CCATGTTGAT AGCGTGCTTG GTTCCAGCTA CTGTCAATTC TGCTCTGCTT GTTCTTGACT TGGGTTGATG ATGATTTGGC CATCTACATA ACCCCAGCAA TTGGTCCGTC AAATGGAATA TCTGAAATAG ACAGTG,(CAA
GTCTTTCACT
TTGGAGGTCA
CAATTCCACT
CAATTCTTTG
TTCTTTGGCA
AAGAAGAGAT
TCCCACTC
AGATGAACCA
AACATAGCAG CCATTGGTGC AGATGCATTT TCATCATAAG AAAGCACTGT ATTGATGACT TGGACTTCAT TACGGAAACC TTCCGCA.AAC ATAGGACGAA TCGGACGGTC AATCAAACGC GCTGTCAAGG TCGCATCTGT TGAAGGACGT CCTTCACGTT TCATAAAGCC ACCAGGAAAC TTCCCAGCCG CATACATTPTT TTCTTCGTAG TTGACTTGGA GTGGGAAGAA ATCCTCAGTT GCCATTTTCT GGGGATCC INFORMATION FOR SEQ ID NO: 384: 1394 Ci) SEQUENCE CHARACTERISTICS: CA) LENGTH: 657 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18*-- CCCCCTATTT ACCGTGGACT TAGGAACTAT ATATGAGTTC GTTACTrTCG AAACTTTGAA GGGGTCAAAC TCAGTAACTT AAGGTCCCAA ATAGCCAAGA CCTTTAGTTC GCCTGAGGAC TGGTCAGGTA GTCATCAGCC TCGCAGAAAC CATCATAATT CCATGCCATC TAATrGTGGT CTGCCAAAGC TAAGGCCTrC TTAAATGGTA GTCAAGCAAT AAAGTTGTAC AAGAAAAGTG CAAATAAGAA ATCTCCAGAT TCTAGTCTGG AGA'r1-TCA ATAGACTTCG TTATTGGGCG AACTTCAAAA AACGGATT TATCGCTTTC AAATTCTTTT ATTCGCCTTG TAGACTTCAT GACGCTCAGG GTATACTTTC ATCGTCAGCG ATATTATCTG AATCATCTCC TTrCTTGTTCT AGCCTTGACA CGCGCCAGAA TTCTCTAGGG CTAAAAGGCT CCTAATTCCA AGGCCAAAAC CTTATCAAAT TCATCACTTT GGAGTTTTGA CGCCTTTGGC TCTCAGCCGC TTACAAACTT AACATGATAT CAAGCAAGAT AAAATCAAAG GGTTCTGTTT CGTCCATTTG TCACCAATTG AGTAGAAAAG CCTTCCTTAC TTCAGAATGT GTTCTTCATC ATCCACrAAT AAGACTT INFORMATION FOR SEQ ID NO: 385: Wi SEQUENCE CHARACTERISTICS: LENGTH: 586 base pairs B) TYPE: nucleic acid STRANOEDNESS: double (D TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 385: CCGCATCAGC ATCAACGAGT GCATCGGCTT CACGTCAACC AGTGCATCAG TCTCAGCA)'.G CACCAGTGCG TCGGCTTCAG CATCAACGAG TGCCTCAGCC TCAGCAAGTA TCTCAGCGTC TGAATCGGCA TCAACGAGTG CGTCAGCTCA GCAAGTACTA GTGCATCGGC TTCAGCAAGC ACCAGTGCGT CGGCTTCAGC ATCAPLCCAGT GCCTCAGCCT CAGCAAGTAT CTCAGCGTCT CAATCGGCAT CAACGAGTGC GTCACCTCAG CAAGTACTAG TGCATCAGCA TCAGCATCAA CGAGTGCATC GGCTTCAGCA AGTACCAGCG CCTCAGCTTC AGCAAGCACC AGTGCGTCAC CTCAGCAAGT ACCAGCGCCT CAGCCTCAGC AAGCACCAGT GCCTCAGCTT CAGCAAGTAC CAGTGCGTCA CCTCAGCATC GACAAGTGCG TCGGCTTCAG CAAGTACCTC AGCGTCTGAA 1395 TCAGCATCAA CGAGTGCGTC AGCTTCAGCA TCAACCAGTG CCTC-AGCCTC AGCAAGTATC 540 
AGTGCGTCAG CTTCAGCATC AACGAGTGCG TCAGCTGCAG CAAGTA .586 INFORMATION FOR SEQ ID NO: 386: SEQUENCE CHARACTERISTICS: CA) LENGTH: 451 base pairs TYPE: nucleic acid STRANflEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 386: CGTCGGCTTC AGCATCAACG AGTGCATCAG CTrCAGCATC AACAAGTGCT TCAGCTTCAG CAAGTACCAG TGCGTCGGCT TCAGCATCAA CGAGTGCTI'C AGTCTCAGCG TCAACCAGTG 120 CCTCTGAATC CGCATCAACA AGTGCCTCGG CTTCAGCAAG CACCAGTGCT TCGGC. TTCAG 180 CGTCAACGAG TGCGTCTGAG TCAGCATCAA CGAGTGCGTC ACCTCAGCAA GCACATCAGC 240 **.TTCTGAATCT GCATCAACCA GTGCGTCAGC TTCCGCATCA ACAAGCGCCT CGGCCTCAGC 300 *..AAGTACAAGT GCTTCAGCCT CAGCATCAAC CAGTGCATCA GCTTCAGCCT CAACAAGTGC 360 ***TTCAGCCTCA GCGTCAACCA GTGCCTCGGC TTCAGCAAGT ACCAGTGCGT CAGTTcAGCA 420 ***AGCACAAGTG CGTCAATTTA GCATCAACCA G 451 INFORMATION FOR SEQ ID NO: 387: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 425 base pairs TYPE: nucleic acid STRANDEDNESS: double CD) TOPOLOGY: linear Cxi) SEQUENCE DESCRIPTION: SEQ ID NO: 387: TcTCAGCA:AG CACCATTG.CG TCGGCTTCAT CAAGCACCAG CGG cTAA TCCGCATCAA CCAGTGCTTC AGCTTCAGCC AAGTTACCTC AGCATCTGAA 'PCAGCATCAA CAAGTGCATC 120 GGCTTCAGCA AGCACAAGTG CTTCAGCtCA GCAAGTATCT CAGCGTCTGA ATCGGCATCA 180 *ACGAGTGCGT CCGCTTCAGC AAGTACTAGC GCCTCAGCAT CAGCGTCAAC AAGTGCTTCG 240 GCTTCAGCGT CAACGAGTGC GTCTGAGTCA GCATCAACGA GTACGTCAGC CTCAGCAAGC 300 ACATCAGCTT CTGAATCTGC ATCAACCAGT GCGTCAGCCT CAGCATCGAC AAGCGCCTCA 360 GCTTCAGCAA GTACCAGTGC GTCAGCCTCA GCAAGTACCA GTGCTTCAGC CTCAGCGTCG 420
ACAAG 425
INFORMATION FOR SEQ ID NO: 388: SEQUENCE CHARACTERISTICS: LENGTH: 572 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 388: AGAGGATCCC CGGATCCTCA GTCGCTGAGA TAACTCCTTT GGGCTTGTTC AGACAAACTC TTCATACTCC AACACTTGCC CATTTTATGC GAATCTCATC
ATCATGTAGT
TATTTTTTCT
CCCAGCCTTG
TTTATCTAAT
-TT7rTT-TGCA ATTTAGCTGA TrTTTCTTTT TrA AGCAAGTTTT TGACCTCAGT CCGACTTCCC ACCI CTCATAGAAC TATTATATCA TATCAAAAGG AGG4 TTTCATACTC TTCAAAAATC TCTTCAAACC GCG ACTGACTTCG TCAGTTCTAT CTGCAACCTC AAA TCTATCTGCA ACCTCAAAGC AGTGCTTTGA GCX '1TrGATTTWC ATTGAGTATC AGAT'rrAGGA AAT AAAACAATCA AGGCTCCTAA AATCGCTGGG AT INFORMATION FOR SEQ ID NO: 389: SEQUENCE CHARACTERISTICS: A) LENGTH: 505 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
CTAGTAC
TCAACGT
PCAGTGT
rCTGCG
~AACTTC
AATGACCAAC CTCCTTTTCG CGCCTTGCCG TATATATGTT TTTGAGCTGA CTTCGTCAGT GCTAGTTTCC kAGTkTGCTC CTCGkCTCCA AAAAAkAGCT CCATTTA CAGTCACGCG GCACAGG CAACTAAAAA (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 389: CAACAAGTGC CTCGGCTTCA GCATGCACAA GTGCTTCAGC TTCAGCATGT ACCTGAGCUGT CTGAATCAGC ATCAACGTGT GCGTCCGCTT CAGCATGTAC TGCTGCCTCA CCATCAGCGT CAAcAwGTGC TTCGGCTTCA GCGTCAACGA GTGCGTCTGA GTCAGCATCA ACGAGTACGT CAGCCTCAGC AAGCACATCA GCTTCTGAAT CTGCATCAAC CAGTGCGTCA GCCTCAGCAT CGACAAGCGC CTCAGCTTCA GCAAGTACCA GTGCGTCAGC CTCAGCAAGT ACCAGTGCTT CAGCCTCAGC GTCGACAAGT GCGTCGGCCT CAACCAGTGC ATCTGAATCG GCATCAACCA GTGCGTCAGC CTCAGCAAGT ACTAGCGCCT CAGCCTCAGC ATCAACGAGT GCGTCCGCTT 1397 CAGCAAGTAC TAGTGCATCA GCATCAGCAT CAACGAGTGC ATCGGCTTCA GCAAGTACCA 480 GCGCCTCAGC TTCAGCAAGC ACCGG 505 INFORMATION FOR SEQ ID NO: 390: SEQUENCE CHARACTERISTICS: LENGTH: 447 base pairs TYPE: nucleic acid STRANDEDNESS; double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ, ID NO: 390: GCTAAGACTA CCTCATTAGG GGCATAGGCT GCTAAAATAA CTGCAGCTGT GGTTAATGAC AATACTGTAC TTr'rTTrCAT TTTAATTCCT TACATATTTA TATAACTTCC AATAGATAAT 120 AAACTTTAAC TTTGCTAGCC TITTGT-rATAA AAAGTrTAC TAAGTATTA'r CTAGGAAATA 180 *GAGTAGTACA TTTATATATA ATTGTTATCT CTCTATAAAA ACAGTATATC ATTTAAAAAA 240 *ATTTAAGTCA AAAAAATTAA CATTAGTTAA TTTATTTTT'r AGCACACATT AAAAAATAAG 300 ATTAGTACTC AATGAAAATC AAAGAGCAAA CTAGGAAACT AGCCGCAGAT TGCTCAAAAC 360 *AGTGPTTGA GGTTGTAGAT GGAATGACGT AGTCAGCTCA AAACACTGTT TTGAACTrGT 420 *GGATAGAACT GACGAAGTCG GTACCGA 447 9 INFORMATION FOR SEQ ID NO: 391: SEQUENCE CHARACTERISTICS: LENGTH: 572 base pairs S TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 5 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 391: **AGCACTTGTC GTTGAATTCT ACAACAAAAT GT'rGTAATAT TTTArTGAAT AAGATAGGCC TTGATATTAA GCACTTTGGG ACGTTCTCCC TTAGTGCTTr TTTGATTTCT CTTAGTATCC 120 AGCTATAATC GTTGAGACAT AACTAGACCG ATATAGTCCA AAGTGATATA GTAAAATGAA 180 **CCAAAAATAG TACACAATGT GGTATAATCC TPTATGGCA TATTCAATAG ATTI'TCGTA.A 240 AAAAGTTCTC TCTTATTGTG AGCGAACAGG TAGTATAACA GAAGCATCAC 
ACGTTTTCCA 300 AATCTCACGT AATACCATI' ATGGCTGGTT AAAGCTAAAA GAGAAAACAG GAGAGCTAAA 360 CCACCAAGTA TAGTGTATTG AATCTATAAC AGTACACCTT GGCTGCTAAA ATATTTCTAT 420 AAATTAA'rTT GACTTTCCTG ATAGAGATGT TCACATCTTA Tr1'CAAACTA CTATATAAGT 480 TCTATAATCT CTTTATAAGA ?1,TGCCCATC AGACAAAATA GAACGArrG AAGGCGTTTA 540 TGATATTTrAG CTGTACGAGA GTCTTTTAAA AG 572
PAGE 1399 MISSING UPON TIME OF PUBLICATION
DENMARK
The applicant hereby requests that, until the application has been laid open to public inspection (by the Danish Patent Office), or has been finally decided upon by the Danish Patent Office without having been laid open to public inspection, the furnishing of a sample shall only be effected to an expert in the art. The request to this effect shall be filed by the applicant with the Danish Patent Office not later than at the time when the application is made available to the public under Sections 22 and 33(3) of the Danish Patents Act. If such a request has been filed by the applicant, any request made by a third party for the furnishing of a sample shall indicate the expert to be used. That expert may be any person entered on a list of recognized experts drawn up by the Danish Patent Office or any person approved by the applicant in the individual case.
SWEDEN
The applicant hereby requests that, until the application has been laid open to public inspection (by the Swedish Patent Office), or has been finally decided upon by the Swedish Patent Office without having been laid open to public inspection, the furnishing of a sample shall only be effected to an expert in the art. The request to this effect shall be filed by the applicant with the International Bureau before the expiration of 16 months from the priority date (preferably on the Form PCT/RO/134 reproduced in annex Z of Volume I of the PCT Applicant's Guide). If such a request has been filed by the applicant, any request made by a third party for the furnishing of a sample shall indicate the expert to be used. That expert may be any person entered on a list of recognized experts drawn up by the Swedish Patent Office or any person approved by the applicant in the individual case.
UNITED KINGDOM
The applicant hereby requests that the furnishing of a sample of a microorganism shall only be made available to an expert. The request to this effect must be filed by the applicant with the International Bureau before the completion of the technical preparations for the international publication of the application.
NETHERLANDS
The applicant hereby requests that until the date of a grant of a Netherlands patent or until the date on which the application is refused or withdrawn or lapses, the microorganism shall be made available as provided in Rule 31F(1) of the Patent Rules only by the issue of a sample to an expert. The request to this effect must be furnished by the applicant with the Netherlands Industrial Property Office before the date on which the application is made available to the public under Section 22C or Section 25 of the Patents Act of the Kingdom of the Netherlands, whichever of the two dates occurs earlier.
SINGAPORE
The applicant hereby requests that the furnishing of a sample of a microorganism shall only be made available to an expert. The request to this effect must be filed by the applicant with the International Bureau before the completion of the technical preparations for international publication of the application.
NORWAY
The applicant hereby requests that, until the application has been laid open to public inspection (by the Norwegian Patent Office), or has been finally decided upon by the Norwegian Patent Office without having been laid open to public inspection, the furnishing of a sample shall only be effected to an expert in the art. The request to this effect shall be filed by the applicant with the Norwegian Patent Office not later than at the time when the application is made available to the public under Sections 22 and 33(3) of the Norwegian Patents Act. If such a request has been filed by the applicant, any request made by a third party for the furnishing of a sample shall indicate the expert to be used. That expert may be any person entered on a list of recognized experts drawn up by the Norwegian Patent Office or any person approved by the applicant in the individual case.
AUSTRALIA
The applicant hereby gives notice that the furnishing of a sample of a microorganism shall only be effected prior to the grant of a patent, or prior to the lapsing, refusal or withdrawal of the application, to a person who is a skilled addressee without an interest in the invention (Regulation 3.25(3) of the Australian Patents Regulations).
FINLAND
The applicant hereby requests that, until the application has been laid open to public inspection (by the National Board of Patents and Registration), or has been finally decided upon by the National Board of Patents and Registration without having been laid open to public inspection, the furnishing of a sample shall only be effected to an expert in the art.
ICELAND
The applicant hereby requests that, until the application has been laid open to public inspection (by the Icelandic Patent Office), or has been finally decided upon by the Icelandic Patent Office without having been laid open to public inspection, the furnishing of a sample shall only be effected to an expert in the art.

Claims (4)

The Claims Defining the Invention are as Follows
1. An isolated nucleic acid molecule comprising: nucleotides 1885-3876 of SEQ ID NO: 32; a nucleotide sequence at least 95% identical to or or a degenerate variant of any one of to
2. A vector comprising an isolated nucleic acid molecule of claim 1.
3. An isolated fragment of the Streptococcus pneumoniae genome, that specifically modulates the expression of nucleotides 1885-3876 of SEQ ID NO: 32, wherein said fragment consists of a nucleotide sequence from about 10 to 200 bases in length which is selected from the 200 consecutive bases which are 5' to ORF ID NO:3 or a degenerate variant thereof.
4. A non-human organism which has been altered to contain an isolated nucleic acid molecule of claim 1.
5. A non-human organism which has been altered to contain the fragment of claim 3.
6. A method for regulating the expression of an isolated nucleic acid molecule of claim 1 comprising the step of covalently attaching to said nucleic acid molecule a second nucleic acid molecule consisting of an isolated fragment of claim 3.
7. A method of preparing a homolog of an isolated nucleic acid molecule including nucleotides 1885-3876 of SEQ ID NO: 32 comprising the steps of: (a) screening a genomic DNA library using a probe derived from nucleotides 1885-3876 of SEQ ID NO: 32 as a target sequence; (b) identifying members of said library which contain sequences that hybridize to said target sequence; and (c) isolating the nucleic acid molecules from said members identified in step (b).
8. A method of preparing a homolog of an isolated nucleic acid molecule comprising nucleotides 1885-3876 of SEQ ID NO: 32, the method comprising the steps of: (a) isolating mRNA, DNA, or cDNA produced from an organism; (b) using primers derived from nucleotides 1885-3876 of SEQ ID NO: 32 to amplify nucleic acid molecules from the isolated mRNA, DNA or cDNA; (c) isolating the amplified nucleic acid molecules produced in step (b).
9. An isolated polypeptide encoded by an isolated nucleic acid molecule of claim 1.
10. An isolated polypeptide of claim 9 comprising at least 17, 20 or 50 amino acids.
11. An antibody which selectively binds to any one of the polypeptides of claim 9 or 10.
12. A method for producing a peptide, polypeptide or protein in a host cell comprising the steps of: (a) incubating a host containing a heterologous nucleic acid molecule whose nucleotide sequence consists of an isolated nucleic acid molecule of claim 1, under conditions where said heterologous nucleic acid molecule is expressed to produce said peptide, polypeptide or protein, and (b) isolating the peptide, polypeptide or protein.
AU33351/01A 1996-10-31 2001-03-30 Streptococcus pneumoniae polynucleotides and sequences Ceased AU777190B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US60/029960 1996-10-31
AU69090/98A AU6909098A (en) 1996-10-31 1997-10-30 Streptococcus pneumoniae polynucleotides and sequences

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
AU69090/98A Division AU6909098A (en) 1996-10-31 1997-10-30 Streptococcus pneumoniae polynucleotides and sequences

Related Child Applications (1)

Application Number Title Priority Date Filing Date
AU2004231248A Division AU2004231248B2 (en) 1996-10-31 2004-11-23 Streptococcus pneumoniae Polynucleotides and Sequences

Publications (2)

Publication Number Publication Date
AU3335101A AU3335101A (en) 2001-12-13
AU777190B2 AU777190B2 (en) 2004-10-07

Family

ID=33163451

Family Applications (1)

Application Number Title Priority Date Filing Date
AU33351/01A Ceased AU777190B2 (en) 1996-10-31 2001-03-30 Streptococcus pneumoniae polynucleotides and sequences

Country Status (1)

Country Link
AU (1) AU777190B2 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113290076B (en) * 2021-04-29 2022-11-15 邯郸钢铁集团有限责任公司 Control method for improving threading efficiency of hot galvanizing outlet double-coiler
CN113136444B (en) * 2021-05-10 2024-04-19 临沂大学 Microdroplet digital PCR detection method for enterococcus faecalis in medical food
CN116574634B (en) * 2023-03-13 2023-11-03 广东悦创生物科技有限公司 Streptococcus salivarius thermophilus subspecies JF2 and application thereof in preparation of anti-inflammatory and lipid-relieving food and drug

Also Published As

Publication number Publication date
AU3335101A (en) 2001-12-13

Similar Documents

Publication Publication Date Title
AU745787B2 (en) Enterococcus faecalis polynucleotides and polypeptides
KR101914245B1 (en) Composition Containing Bacterial Strain
AU762606B2 (en) Chlamydia pneumoniae genomic sequence and polypeptides, fragments thereof and uses thereof, in particular for the diagnosis, prevention and treatment of infection
AU754264B2 (en) Chlamydia trachomatis genomic sequence and polypeptides, fragments thereof and uses thereof, in particular for the diagnosis, prevention and treatment of infection
AU2021290210A1 (en) Compositions comprising bacterial strains
KR100923598B1 (en) Surface Proteins of Streptococcus pyogenes
AU2016357553A1 (en) Compositions comprising bacterial strains
WO1998018931A2 (en) Streptococcus pneumoniae polynucleotides and sequences
TW202223083A (en) Use of compositions comprising bacterial strains
JPH09322781A (en) Staphylococcus aureus polynucleotide and sequence
AU2015327511B2 (en) Biomarkers for rheumatoid arthritis and usage thereof
AU2018232902A1 (en) Complete genome sequence of the methanogen methanobrevibacter ruminantium
KR102191537B1 (en) Selection and use of lactic acid bacteria preventing bone loss in mammals
AU2022256122A1 (en) Novel Proteins From Anaerobic Fungi And Uses Thereof
RU2673715C2 (en) Haemophilus parasuis vaccine serovar type 4
CN112243377A (en) Bacteriophage for treating and preventing bacterially-associated cancer
AU2016295176A1 (en) Genetic testing for predicting resistance of gram-negative proteus against antimicrobial agents
KR20200019882A (en) Compositions Containing Bacterial Strains
KR20200038970A (en) Composition comprising a bacterial strain
AU777190B2 (en) Streptococcus pneumoniae polynucleotides and sequences
KR20190059562A (en) Novel Bacillus subtilis having proteolytic activity and uses thereof
KR20060060389A (en) Genome sequence of zymomonas mobilis zm4 and novel gene involved in production of ethanol
AU713692B2 (en) Nucleic acid and amino acid sequences relating to helicobacter pylori for therapeutics
KR20240021274A (en) Bacteriophages against vancomycin-resistant enterococci
AU710880B2 (en) Nucleic acid and amino acid sequences relating to helicobacter pylori for diagnostics and therapeutics