AU5417201A - Polycystic kidney disease gene - Google Patents

Polycystic kidney disease gene

Info

Publication number
AU5417201A
AU5417201A AU54172/01A AU5417201A AU5417201A AU 5417201 A AU5417201 A AU 5417201A AU 54172/01 A AU54172/01 A AU 54172/01A AU 5417201 A AU5417201 A AU 5417201A AU 5417201 A AU5417201 A AU 5417201A
Authority
AU
Australia
Prior art keywords
pkd1
seq
gene
sequence
nucleic acid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU54172/01A
Inventor
Timothy Burn
Timothy Connors
Timothy Dackowski
Gregory Feng
Qian Feng
William Germino
Katherine Klinger
Gregory Landes
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Genzyme Corp
Johns Hopkins University
Original Assignee
Genzyme Corp
Johns Hopkins University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Genzyme Corp, Johns Hopkins University filed Critical Genzyme Corp
Priority to AU54172/01A priority Critical patent/AU5417201A/en
Publication of AU5417201A publication Critical patent/AU5417201A/en
Priority to AU2004212565A priority patent/AU2004212565A1/en
Abandoned legal-status Critical Current

Landscapes

  • Peptides Or Proteins (AREA)

Description

Regulation 3.2
AUSTRALIA
Patents Act 1990
DIVISIONAL APPLICATION
Name of Applicant: Genzyme Corporation AND Johns Hopkins University
Actual Inventor(s): KLINGER, Katherine; BURN, Timothy; CONNORS, Timothy; DACKOWSKI, William; GERMINO, Gregory and QIAN, Feng
Address for Service: DAVIES COLLISON CAVE, Patent Attorneys, Level 3, 303 Coronation Drive, Milton, Queensland, 4064, Australia
Invention Title: "Polycystic kidney disease gene"
Details of Parent Application No: 32119/97
The following statement is a full description of this invention, including the best method of performing it known to me/us:
POLYCYSTIC KIDNEY DISEASE GENE
This application is a continuation-in-part of U.S. patent application Serial No. 08/381,520, filed January 31, 1995.
FIELD OF THE INVENTION
The present invention pertains to the diagnosis and treatment of polycystic kidney disease in humans, using DNA sequences derived from the human PKD1 gene and the protein or proteins encoded by that gene.
BACKGROUND OF THE INVENTION
Autosomal dominant polycystic kidney disease (APKD), also called adult-onset polycystic kidney disease, is one of the most common hereditary disorders in humans, affecting approximately one individual in a thousand. The prevalence in the United States is greater than 500,000, with 6,000 to 7,000 new cases detected yearly (Striker et al., Am. J. Nephrol., 6:161-164, 1986; Iglesias et al., Am. J. Kid. Dis., 2:630-639, 1983). The disease is considered to be a systemic disorder, characterized by cyst formation in the ductal organs such as kidney, liver, and pancreas, as well as by gastrointestinal, cardiovascular, and musculoskeletal abnormalities, including colonic diverticulitis, berry aneurysms, hernias, and mitral valve prolapse (Gabow et al., Adv. Nephrol., 18:19-32, 1989; Gabow, New Eng. J. Med., 329:332-342, 1993).
The most prevalent and obvious symptom of APKD, however, is the formation of kidney cysts, which result in grossly enlarged kidneys and a decrease in renal concentrating ability. Hypertension and endocrine abnormalities are also common in APKD patients, appearing even before symptoms of renal insufficiency. In approximately half of APKD patients, the disease progresses to end-stage renal disease; accordingly, APKD is responsible for 4-8% of the renal dialysis and transplantation cases in the United States and Europe (Proc. European Dialysis and Transplant Assn., Robinson and Hawkins, eds., 17:20, 1981).
Thus, there is a need in the art for diagnostic and therapeutic tools to reduce the incidence and severity of this disease.
APKD exhibits a transmission pattern typical of autosomal dominant inheritance: each offspring of an affected individual has a 50% chance of inheriting the causative gene. Linkage studies indicated that a causative gene is present on the short arm of chromosome 16, near the α-globin cluster; this locus was designated PKD1 (Reeders et al., Nature, 317:542, 1985). Though other PKD-associated genes exist, such as, for example, PKD2, PKD1 defects appear to cause APKD in about 85-90% of affected families (Parfrey et al., New Eng. J. Med., 323:1085-1090, 1990; Peters et al., Contrib. Nephrol., 97:128-139, 1992).
The PKD1 gene has been localized to chromosomal position 16p13.3. Using extensive linkage analysis, in conjunction with the identification of new markers and restriction enzyme analysis, the gene has been further localized to an interval of approximately 700 kb between the markers ATPL (ATP6C) and CMM65 (D16S84). The region is rich in CpG islands that are thought to flank transcribed sequences, and it has been estimated that this interval contains at least 20 genes. The precise location of the PKD1 gene was pinpointed by the finding of a PKD family whose affected members carry a translocation that disrupts a 14 kb RNA transcript associated with this region, as reported by the European PKD Consortium (EPKDC), Cell, 77:881, 1994, describing approximately 5631 bp of DNA sequence corresponding to the 3' end of the putative PKD1 cDNA sequence.
Notwithstanding knowledge of the partial PKD1 3' cDNA sequence, several significant impediments stand in the way of determining the complete sequence of the PKD1 gene.
For the most part, these impediments arise from the complex organization of the PKD1 locus. One serious obstacle is that sequences related to the PKD1 transcript are duplicated at least three times on chromosome 16 proximal to the PKD1 locus, forming PKD1 homologues. Another obstacle is that the PKD1 genomic interval also contains repeat elements that are present in other genomic regions. Both of these types of sequence duplications interfere with "chromosome walking" techniques that are widely used for identification of genomic DNA. This is because these techniques rely on hybridization to identify clones containing overlapping fragments of genomic DNA; thus, there is a high likelihood of "walking" into clones derived from PKD1 homologues instead of clones derived from the authentic PKD1 gene. In a similar manner, the PKD1 duplications and chromosome 16-specific repeats also interfere with the unambiguous determination of a complete cDNA sequence that encodes the PKD1 protein. Thus, there is a need in the art for genomic and cDNA sequences corresponding to the authentic PKD1 gene. This includes identification of segments of these sequences that are unique to the expressed PKD1 and are not present in the duplicated homologous sequences also present on chromosome 16.
SUMMARY OF THE INVENTION
The present invention involves an isolated normal human PKD1 gene having the sequence set forth in Figure 1, sequences derived therefrom such as the sequence set forth in Figure 2, an isolated nucleic acid having the PKD1 cDNA sequence set forth in Figure 3, and sequences derived therefrom. The PKD1 gene is a genomic DNA sequence whose altered, defective, or non-functional expression leads to adult-onset polycystic kidney disease. The invention also encompasses DNA vectors comprising these nucleic acids, cells transformed with the vectors, and methods for producing PKD1 protein or fragments thereof.
In another aspect, the invention involves isolated oligonucleotides that hybridize only to the authentic expressed PKD1 gene, and not to PKD1 homologues.
In yet another aspect, the invention involves isolated mutant PKD1 genes, and their cDNA cognates, which contain alterations in nucleotide sequence relative to the normal PKD1 gene, and whose presence in one or more copies in the genome of a human individual is associated with adult-onset polycystic kidney disease.
In still another aspect, the invention involves isolated oligonucleotides that discriminate between normal and mutant versions of the PKD1 gene.
In still another aspect, the invention involves methods for identifying a human subject carrying a mutant PKD1 gene, comprising: a) obtaining a sample of biological material from the subject, and b) detecting the presence of the mutant gene or its protein product.
In still another aspect, the invention involves methods and compositions for treating APKD or disease conditions having the characteristics of APKD. Such methods encompass administering an isolated human PKD1 gene, or fragments of the gene, under conditions that result in expression of therapeutically effective amounts of all, or part of, the PKD1 protein. The invention also encompasses compositions for treating APKD that comprise all or part of the PKD1 DNA of Figures 1, 2 and 3, or the PKD1 protein encoded by the DNA of Figures 1, 2 or 3.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1A shows the DNA sequence of the human PKD1 locus between chromosomal markers ATPL (ATP6C) and D16S84. (SEQ ID NO:1).
Figure 1B shows the DNA sequence of 53,526 bases comprising the normal human PKD1 gene. (SEQ ID NO:2).
Figure 2 shows a partial DNA sequence of 894 bases within the 5' region of normal human PKD1 DNA. (SEQ ID NO:3)
Figure 3 shows the full-length sequence of normal human PKD1 cDNA and corresponding amino acid sequence. (SEQ ID
Figure 4A shows a comparison of the DNA sequence of the 5' region of DNAs derived from the authentic PKD1 gene (SEQ ID NO:19) and PKD1 homologues (SEQ ID NO:18). A 29-base pair gap must be introduced into the sequence of the authentic gene to align the two sequences. In addition, the authentic PKD1 and the PKD1 homologue differ at position 418 of this figure.
Figure 4B shows the DNA sequence of an oligonucleotide that can be used to discriminate between the authentic PKD1 sequence and PKD1 homologues. The star denotes a polymerization-blocking modification. (SEQ ID
Figure 5 shows the region of chromosome 16 containing the PKD1 locus. The upper panel shows NotI restriction sites, as well as previously identified genetic markers in this region. The bottom panel shows P1 clones covering this region.
Figure 6 shows the restriction map of the 91.8B P1 clone containing the PKD1 gene and flanking regions with only the relevant sites indicated (B=BamHI, C=SacI, E=EcoRI, N=NotI, S=SalI, X=XhoI and V=EcoRV). The NotI site in parentheses is methylated in genomic DNA. The position of the 1.9 kb BamHI-BamHI fragment is shown by the shaded box; the striped box denotes the location of the 2.5 kb polypurine/polypyrimidine tract. The arrows indicate the position and orientation of the next most centromeric transcript (NCT), TSC-2 and PKD1 genes. The location of relevant cosmid clones is shown by open boxes. Restriction fragments used to generate sequencing templates are shown at the bottom with quotation marks denoting that the site is vector derived. Pools used in fluorescence in situ hybridization (FISH) are indicated by brackets at the bottom.
Figure 7 shows a comparison between the previously reported (EPKDC) partial PKD1 cDNA sequence (SEQ ID NOS: 20, 21, 24, 25, 28 and 29) and the sequence reported herein (SEQ ID NOS: 22, 23, 26, 27, 30 and 31). The upper sequence is that reported for the cDNA (EPKDC), while the lower sequence is the genomic sequence of the present invention. Discrepancies are highlighted by lower case in the cDNA (EPKDC) sequence and by boxes in the genomic sequence with the corresponding changes in amino acids denoted with X's. The altered carboxy-terminal residues resulting from the frame shift are shown above the genomic sequence and the previously predicted residues are shown in lowercase. An in-frame termination codon is indicated by an underline in the genomic sequence.
Figure 8 shows an illustration of the PKD1 genomic structure as predicted by GRAIL2. The predicted exons are represented as boxes along the genomic sequence. The reported cDNA is at the top right. The position of the 20 kb GC-rich region is indicated by the striped box at the bottom. The stippled box above exons 3 and 4 in the gene model indicates the position of the predicted LRR and carboxy-flanking region. The extent of the published cDNA is shown by the open (coding region) and cross-hatched (3' untranslated region) boxes. The filled black box indicates the relative position of the exon which was absent in the predicted gene model, while the asterisk designates the exon which contains an unspliced intron. The position of the GC-rich region is marked by the striped box below the GC-content bar.
Figure 9 shows a schematic structure of the predicted PKD1 protein. Multiple domains are depicted based on sequence homology, including two copies of a leucine-rich repeat (LRR) near the N-terminus, which is flanked by a cysteine-rich cluster. Three perfect copies and 12 related copies of a domain of unknown function (Pmel-17 or Ig-like repeat) are shown. The predicted 7 (or more) membrane-spanning domains are indicated. The exons encoding the various domains are listed.
Figure 10 shows the RT-PCR and cDNA products comprising the PKD1 cDNA. The EPKDC 3' cDNA sequence is shown by the striped box. The full-length cDNA is shown in black. Shaded boxes denote individual cDNAs and RT-PCR products. The cross hatched box denotes the RT-PCR products containing alternatively spliced exons and an unspliced exon which do not maintain the open reading frame. Alternatively spliced exons and insertions are designated by thin lines and inverted triangles, respectively. Open boxes designate the position of open reading frames. The stippled box denotes the 5' untranslated region.
Figure 11 shows a schematic structure of the full-length PKD1 cDNA in pCMV-SPORT vector. Thin line represents PKD1 cDNA with restriction sites used to assemble individual cDNA clones. Thick line represents pCMV-SPORT vector, which contains SP6 and T7 RNA polymerase promoters to generate RNA for in vitro translations, CMV promoter, SV40 origin of replication and polyadenylation signal for expression in mammalian cells.
Figure 12 shows a schematic of the full-length PKD1 product and its truncated products. Black box represents signal peptide. Leucine rich repeat (LRR) and Ig-like (Ig-like) domains are indicated by shaded boxes. The eleven predicted transmembrane regions are also indicated by black bars and numbered.
Figure 13 shows regions of homology in the PKD1 gene between sequences encoded by GRAIL2-predicted exons and proteins present in SwissProt and PIR databases. (SEQ ID NOS: 32-55). Positions where the PKD1 sequence matches the consensus sequence are shaded.
Figure 14 shows the results of exon trapping within the PKD1 locus.
Figure 15 shows the regions of PKD1 protein used as fusion proteins for generation of domain specific polyclonal antibodies. The predicted structure of the PKD1 protein is shown above. Each fusion protein consists of the carrier glutathione-S-transferase (GST) or maltose binding protein (MBP) and the indicated region of PKD1 polypeptide. PKD1 corresponding residues of each fusion protein are shown.
Figure 16 shows the two constructs used for immunoprecipitation: SrfIA, which corresponds to the N-terminal half of the PKD1 protein, and BRASH 7, which corresponds to the C-terminal half of the PKD1 protein as shown. Epitopes for anti-fusion proteins FP-LRR, FP-46-lc and FP-46-2 polyclonal antibodies used for immunoprecipitations are also indicated.
DETAILED DESCRIPTION OF THE INVENTION
All patent applications, patents, and literature references cited in this specification are hereby incorporated by reference in their entirety. In case of conflict or inconsistency, the present description, including definitions, will control.
Definitions:
1. "APKD" as used herein denotes adult-onset polycystic kidney disease, which is characterized by the development of renal cysts and, ultimately, renal failure, and may alternatively or in addition involve cysts in other organs including liver and spleen, as well as gastrointestinal, cardiovascular, and musculoskeletal abnormalities.
2. The term "PKD1 gene" refers to a genomic DNA sequence which maps to chromosomal position 16p13.3 and gives rise to a messenger RNA molecule encoding the PKD1 protein. The PKD1 gene encompasses the sequences shown in Figures 1 and 2, which include introns and putative regulatory sequences. The term "authentic" is used herein to denote the genomic sequence at this location, as well as sequences derived therefrom, and serves to distinguish these authentic sequences from "PKD1 homologues" (see below).
3. "PKD1 complementary DNA (cDNA)" is defined herein as a single-stranded or double-stranded intronless DNA molecule encompassing the sequence shown in Figure 3, that is derived from the authentic PKD1 gene and whose sequence, or complement thereof, encodes the PKD1 protein shown in Figure 3.
4. A "normal" PKD1 gene is defined herein as a PKD1 gene whose altered, defective, or non-functional expression leads to adult-onset polycystic kidney disease. A normal PKD1 gene is not associated with disease and thus is considered to be a wild-type version of the gene. Included in this category are allelic variants in the PKD1 gene, also denoted allelic polymorphisms, i.e. alternate versions of the PKD1 gene, not associated with disease, that may be represented at any frequency in the population. Also included are alterations in DNA sequence, whether recombinant or naturally occurring, that have no apparent effect on expression or function of the PKD1 gene product.
5. A "mutant" PKD1 gene is defined herein as a PKD1 gene whose sequence has been modified by transitions, transversions, deletions, insertions, or other modifications relative to the normal PKD1 gene, which modifications cause detectable changes in the expression or function of the PKD1 gene product, including causing disease. The modifications may involve from one to as many as several thousand nucleotides, and result in one or more of a variety of changes in PKD1 gene expression, such as, for example, decreased or increased rates of expression, or expression of a defective RNA transcript or protein product. Mutant PKD1 genes encompass those genes whose presence in one or more copies in the genome of a human individual is associated with APKD.
6. A "PKD1 homologue" is a sequence which is closely related to PKD1, but which does not encode the authentic expressed PKD1 gene product. Several examples of such homologues that map to chromosomal location 16p13.1 have been identified and sequenced by the present inventors.
7. A "PKD1 carrier" is defined herein as an individual who carries at least one copy of a disease-producing mutant PKD1 gene. Since the disease generally exhibits an autosomal dominant pattern of transmission, PKD1 carriers have a high probability of developing some symptom of PKD. Thus, a PKD1 carrier is likely to be a "PKD patient."
8. As referred to herein, a "contig" is a continuous stretch of DNA or DNA sequence, which may be represented by multiple, overlapping, clones or sequences.
9. As referred to herein, a "cosmid" is a DNA plasmid that can replicate in bacterial cells and that accommodates large DNA inserts from about 30 to about 45 kb in length.
10. The term "P1 clones" refers to genomic DNAs cloned into vectors based on the P1 phage replication mechanisms. These vectors generally accommodate inserts of about 70 to about 105 kb (Pierce et al., Proc. Natl. Acad. Sci., USA, 89:2056-2060, 1992).
11. As used herein, the term "exon trapping" refers to a method for isolating genomic DNA sequences that are flanked by donor and acceptor splice sites for RNA processing.
12. The term "single-strand conformational polymorphism analysis" (SSCP) refers to a method for detecting sequence differences between two DNAs, comprising hybridization of the two species with subsequent mismatch detection by gel electrophoresis (Ravnik-Glavac et al., Hum. Mol. Genet., 3:801, 1994).
13. "HOT cleavage" is defined herein as a method for detecting sequence differences between two DNAs, comprising hybridization of the two species with subsequent mismatch detection by chemical cleavage (Cotton, et al., Proc. Natl. Acad. Sci., USA, 85:4397, 1988).
14. "Denaturing gradient gel electrophoresis" (DDGE) refers to a method for resolving two DNA fragments of identical length on the basis of sequence differences as small as a single base pair change, using electrophoresis through a gel containing varying concentrations of denaturant (Guldberg et al., Nuc. Acids Res., 22:880, 1994).
15. As used herein, "sequence-specific oligonucleotides" refers to related sets of oligonucleotides that can be used to detect allelic variations or mutations in the PKD1 gene.
16. As used herein, "PKD1-specific oligonucleotides" refers to oligonucleotides that hybridize to sequences present in the authentic expressed PKD1 gene and not to PKD1 homologues or other sequences.
17. "Amplification" of DNA as used herein denotes a reaction that serves to increase the concentration of a particular DNA sequence within a mixture of DNA sequences. Amplification may be carried out using polymerase chain reaction (PCR) (Saiki et al., Science, 239:487, 1988), ligase chain reaction (LCR), nucleic acid-specific based amplification (NSBA), or any method known in the art.
18. "RT-PCR" as used herein refers to coupled reverse transcription and polymerase chain reaction. This method of amplification uses an initial step in which a specific oligonucleotide, oligo dT, or a mixture of random primers is used to prime reverse transcription of RNA into single-stranded cDNA; this cDNA is then amplified using standard amplification techniques, e.g. PCR.
19. A PKD1 gene or PKD1 cDNA, whether normal or mutant, corresponding to a particular sequence is understood to include alterations in the particular sequence that do not change the inherent properties of the sequence. It will be understood that additional nucleotides may be added to the 5' and/or 3' terminus of the PKD1 gene shown in Figure 1B, or the PKD1 cDNA shown in Figure 3, as part of routine recombinant DNA manipulations. Furthermore, conservative DNA substitutions, i.e. changes in the sequence of the protein-coding region that do not change the encoded amino acid sequence, may also be accommodated.
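The notion of a conservative DNA substitution in definition 19 can be illustrated with a minimal Python sketch. It assumes the standard genetic code and compares the translations of two equal-length coding sequences; the function names and the toy 9-base sequences are hypothetical and are not taken from the PKD1 sequence listings.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acids."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def is_conservative_substitution(reference: str, variant: str) -> bool:
    """True if the variant has the same length and encodes the same protein."""
    return len(reference) == len(variant) and translate(reference) == translate(variant)


# Hypothetical toy sequences (illustration only).
reference = "ATGCTGAAA"   # Met-Leu-Lys
silent = "ATGTTAAAG"      # Met-Leu-Lys, different codons
missense = "ATGCCGAAA"    # Met-Pro-Lys

print(is_conservative_substitution(reference, silent))    # True
print(is_conservative_substitution(reference, missense))  # False
```

Insertions and deletions are deliberately out of scope here: the sketch only covers base substitutions within an unchanged reading frame.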
The present invention encompasses the human gene for PKD1. Mutations in this gene are associated with the occurrence of adult-onset polycystic kidney disease. A "normal" version of the genomic sequence, corresponding to 53,526 bases of the PKD1 gene, is shown in Figure 1B.
The PKD1 gene sequence was determined using the strategy described in Example 1. Briefly, a series of cosmid and P1 DNA clones was assembled containing overlapping human genomic DNA sequences that collectively cover a 700 kilobase segment of chromosome 16 known to contain the PKD1 locus. To identify transcribed sequences within this 700 kb segment, including those sequences encoding PKD1, both exon trapping and cDNA selection techniques were employed. At the same time, direct DNA sequencing of the human DNA sequences contained in the genomic clones was performed, using techniques that are well-known in the art. These included the isolation of subclones from particular cosmid or P1 clones. Nested deletions were created from selected subclones, and the nested deletions were then subjected to direct DNA sequencing using the ALF™ automated sequencer (Pharmacia, Uppsala, Sweden).
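The idea that overlapping clones collectively cover the 700 kb segment can be sketched as an interval-merging check. This is a minimal Python illustration, not the inventors' procedure: the clone coordinates (in kb) are invented, and `min_overlap` is a hypothetical parameter standing in for the overlap needed to join two clones into one contig.

```python
def merge_into_contigs(clones, min_overlap=1):
    """Merge (start, end) clone intervals that overlap by at least min_overlap."""
    contigs = []
    for start, end in sorted(clones):
        if contigs and start <= contigs[-1][1] - min_overlap:
            # Clone overlaps the current contig: extend it if the clone reaches further.
            contigs[-1] = (contigs[-1][0], max(contigs[-1][1], end))
        else:
            # No sufficient overlap: this clone starts a new contig.
            contigs.append((start, end))
    return contigs


def covers_region(clones, region_start, region_end, min_overlap=1):
    """True if a single contig spans the whole region."""
    return any(
        start <= region_start and end >= region_end
        for start, end in merge_into_contigs(clones, min_overlap)
    )


# Hypothetical clone positions in kb (invented for illustration).
clones = [(0, 90), (75, 170), (160, 200), (190, 290), (280, 320),
          (310, 410), (400, 500), (490, 590), (580, 620), (610, 700)]

print(covers_region(clones, 0, 700))

# Dropping one clone leaves a gap, so the set splits into two contigs.
gapped = [c for c in clones if c != (280, 320)]
print(merge_into_contigs(gapped))
```

In practice overlaps are established experimentally (for example by hybridization or restriction mapping) rather than from known coordinates; the sketch only shows the bookkeeping once positions are assigned.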
The full-length sequence of PKD1 cDNA is shown in Figure 3.
The present invention encompasses isolated oligonucleotides corresponding to sequences within the PKD1 gene, or within PKD1 cDNA, which, alone or together, can be used to discriminate between the authentic expressed PKD1 gene and PKD1 homologues or other repeated sequences. These oligonucleotides may be from about 12 to about 60 nucleotides in length, preferably about 18 nucleotides, may be single- or double-stranded, and may be labelled or modified as described below. An example of an oligonucleotide that can be used in this manner is shown in Figure 4B. The discrimination function of this oligonucleotide is based on a comparison of the sequence of the authentic PKD1 gene with three cDNAs derived from the PKD1 homologues, which revealed that homologue cDNAs contain a 29 bp insertion relative to the authentic PKD1 sequence (Figure 4A). The oligonucleotide shown in Figure 4B is modified at its 3' terminus so that it does not support polymerization reactions, and is designed to hybridize specifically to the homologue sequence and not to the authentic PKD1 sequence. When this oligonucleotide is included in amplification reactions, it selectively prevents the amplification of PKD1 homologue sequences. In this manner, authentic PKD1 sequences are selectively amplified and PKD1 homologues are not. These oligonucleotides or their functional equivalents thus provide a basis for testing for the presence of mutations in the authentic PKD1 gene in a human patient (see Example 5 below).
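The blocking strategy described above can be sketched in Python under strong simplifying assumptions. Hybridization is modeled as an exact string match to either strand, and a bound 3'-blocked oligonucleotide is assumed to suppress amplification completely. All sequences below are invented toy examples, not the sequences of Figures 4A or 4B, and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def anneals(template: str, oligo: str) -> bool:
    """Crude stand-in for hybridization: exact match to either strand."""
    return oligo in template or reverse_complement(oligo) in template


def is_amplified(template: str, forward: str, reverse: str, blocker: str) -> bool:
    """Model a PCR in which a bound, 3'-blocked oligo suppresses amplification."""
    primers_bind = forward in template and reverse_complement(reverse) in template
    return primers_bind and not anneals(template, blocker)


# Toy sequences, invented for illustration (NOT taken from the SEQ ID listings).
FLANK_5 = "ATGGCCTTGACCGTA"
FLANK_3 = "GGATCCAAGTTCGCA"
INSERT_29 = "ACGTTGCAAGGCTTACCGATAGGCTAACG"  # 29 bases present only in the homologue

authentic = FLANK_5 + FLANK_3
homologue = FLANK_5 + INSERT_29 + FLANK_3

forward_primer = FLANK_5[:12]
reverse_primer = reverse_complement(FLANK_3[-12:])
# 18-mer spanning the 5' junction of the insertion, so it matches the homologue only.
blocker = FLANK_5[-6:] + INSERT_29[:12]

print(is_amplified(authentic, forward_primer, reverse_primer, blocker))  # True
print(is_amplified(homologue, forward_primer, reverse_primer, blocker))  # False
```

Because both primers match the shared flanking sequence, only the blocker distinguishes the two templates, which mirrors the role the text assigns to the Figure 4B oligonucleotide.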
The present invention encompasses isolated DNA and RNA sequences, including sense and antisense sequences, derived from the sequences shown in Figures 1, 2, and 3. The particular sequences may represent "normal" alleles of PKD1, including allelic variants, or "mutant" alleles, which are associated with disease symptoms. PKD1-derived sequences may also be associated with heterologous sequences, including promoters, enhancers, response elements, signal sequences, polyadenylation sequences, and the like. Furthermore, the nucleic acids can be modified to alter stability, solubility, binding affinity, and specificity. For example, PKD1-derived sequences can be selectively methylated.
The DNA may comprise antisense oligonucleotides and may further include nuclease-resistant phosphorothioate, phosphoroamidate, and methylphosphonate derivatives, as well as "protein nucleic acid" (PNA) formed by conjugating bases to an amino acid backbone as described in Nielsen et al., Science, 254:1497, 1991. The DNA may be derivatized by linkage of the α-anomer nucleotide, or by formation of a methyl or ethyl phosphotriester or an alkyl phosphoramidate linkage. Furthermore, the nucleic acid sequences of the present invention may also be modified with a label capable of providing a detectable signal, either directly or indirectly. Exemplary labels include radioisotopes, fluorescent molecules, biotin, and the like.
In general, nucleic acid manipulations according to the present invention use methods that are well known in the art, as disclosed in, for example, Molecular Cloning, A Laboratory Manual (2nd Ed., Sambrook, Fritsch and Maniatis, Cold Spring Harbor), or Current Protocols in Molecular Biology (Eds. Ausubel, Brent, Kingston, Moore, Seidman, Smith and Struhl, Greene Publ. Assoc., Wiley-Interscience, NY, NY, 1992).
The invention also provides vectors comprising nucleic acids having PKD1 or PKD1-related sequences. A large number of vectors, including plasmid, phage, viral and fungal vectors, have been described for expression in a variety of eukaryotic and prokaryotic hosts, and may be used for gene therapy as well as for simple protein expression.
Advantageously, vectors may also include a promoter operably linked to the PKD1-encoding portion, particularly when the PKD1-encoding portion comprises the cDNA shown in Figure 3 or derivatives or fragments thereof. The encoded PKD1 may be expressed by using any suitable vectors, such as pREP4, pREP8, or pCEP4 (InVitrogen, San Diego, CA), and any suitable host cells, using methods disclosed or cited herein or otherwise known to those skilled in the relevant art. The particular choice of vector/host is not critical to the operation of the invention.
Recombinant cloning vectors will often include one or more replication systems for cloning or expression, one or more markers for selection in the host, e.g. antibiotic resistance, and one or more expression cassettes. The inserted PKD1 coding sequences may be synthesized, isolated from natural sources, or prepared as hybrids, for example.
Ligation of the PKD1 coding sequences to transcriptional regulatory elements and/or to other amino acid coding sequences may be achieved by known methods. Suitable host cells may be transformed/transfected/infected by any suitable method including electroporation, CaCl2-mediated DNA uptake, fungal infection, microinjection, microprojectile, or other established methods.
Appropriate host cells include bacteria, archaebacteria, fungi, especially yeast, and plant and animal cells, especially mammalian cells. Of particular interest are E. coli, B. subtilis, Saccharomyces cerevisiae, Sf9 cells, C129 cells, 293 cells, Neurospora, and CHO cells, COS cells, HeLa cells, and immortalized mammalian myeloid and lymphoid cell lines. Preferred replication systems include M13, ColE1, SV40, baculovirus, lambda, adenovirus, artificial chromosomes, and the like. A large number of transcription initiation and termination regulatory regions have been isolated and shown to be effective in the transcription and translation of heterologous proteins in the various hosts.
Examples of these regions, methods of isolation, manner of manipulation, and the like, are known in the art. Under appropriate expression conditions, host cells can be used as a source of recombinantly produced PKD1.
This invention also contemplates the use of unicellular or multicellular organisms whose genome has been transfected or transformed by the introduction of PKD1 coding sequences through any suitable method, in order to obtain recombinantly produced PKD1 protein or peptides derived therefrom.
Nucleic acids encoding PKD1 polypeptides may also be incorporated into the genome of recipient cells by recombination events. For example, such a sequence can be microinjected into a cell, and thereby effect homologous recombination at the site of an endogenous gene encoding PKD1, an analog or pseudogene thereof, or a sequence with substantial identity to a PKD1-encoding gene. Other recombination-based methods such as nonhomologous recombinations or deletion of endogenous gene by homologous recombination, especially in pluripotent cells, may also be used.
The present invention also encompasses an isolated polypeptide having a sequence encoded by the authentic PKD1 gene, as well as peptides of six or more amino acids derived therefrom. The polypeptide(s) may be isolated from human tissues obtained by biopsy or autopsy, or may be produced in a heterologous cell by recombinant DNA methods as described above. Standard protein purification methods may be used to isolate PKD1-related polypeptides, including but not limited to detergent extraction, and chromatographic methods including molecular sieve, ion-exchange, and affinity chromatography using e.g. PKD1-specific antibodies or ligands. When the PKD1 polypeptide to be purified is produced in a recombinant system, the recombinant expression vector may comprise additional sequences that encode additional amino-terminal or carboxy-terminal amino acids; these extra amino acids act as "tags" for immunoaffinity purification using immobilized antibodies or for affinity purification using immobilized ligands.
Peptides comprising PKD1-specific sequences may be derived from isolated larger PKD1 polypeptides described above, using proteolytic cleavages by e.g. proteases such as trypsin and chemical treatments such as cyanogen bromide that are well-known in the art. Alternatively, peptides up to residues in length can be routinely synthesized in milligram quantities using commercially available peptide synthesizers.
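The cleavage step can be previewed in silico. The following Python sketch applies the commonly cited rules that trypsin cuts on the C-terminal side of lysine (K) or arginine (R) unless the next residue is proline (P), and that cyanogen bromide cuts on the C-terminal side of methionine (M). It ignores missed cleavages and other exceptions, and the toy peptides are hypothetical rather than taken from the PKD1 sequence.

```python
def cleave(protein: str, cut_after: str, not_before: str = "") -> list[str]:
    """Split a protein C-terminal to any residue in cut_after,
    unless the following residue is in not_before."""
    fragments = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1:i + 2]  # empty string at the C-terminus
        blocked = bool(next_residue) and next_residue in not_before
        if residue in cut_after and not blocked:
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])  # C-terminal remainder
    return fragments


def trypsin_digest(protein: str) -> list[str]:
    """Simplified trypsin rule: cut after K or R, but not before P."""
    return cleave(protein, cut_after="KR", not_before="P")


def cnbr_cleave(protein: str) -> list[str]:
    """Simplified cyanogen bromide rule: cut after M."""
    return cleave(protein, cut_after="M")


# Hypothetical toy peptides (illustration only).
print(trypsin_digest("MAKRPLLRGK"))  # ['MAK', 'RPLLR', 'GK']
print(cnbr_cleave("MAKMRPLM"))       # ['M', 'AKM', 'RPLM']
```

A length filter could be applied to the resulting fragments to keep only peptides of a size useful for a given purpose.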
The present invention encompasses antibodies that specifically recognize the PKD1 polypeptide(s) encoded by the gene shown in Figures 1 and 2 or the cDNA shown in Figure 3, and/or fragments or portions thereof. The antibodies may be polyclonal or monoclonal, and may be produced in response to the native PKD1 polypeptide or to synthetic peptides as described above. Such antibodies are conveniently made using the methods and compositions disclosed in Harlow and Lane, Antibodies, A Laboratory Manual, Cold Spring Harbor Laboratory, 1988, other references cited herein, as well as immunological and hybridoma technologies known to those in the art. Where natural or synthetic PKD1-derived peptides are used to induce a PKD-specific immune response, the peptides may be conveniently coupled to a suitable carrier such as KLH and administered in a suitable adjuvant such as Freund's. Preferably, selected peptides are coupled to a lysine core carrier substantially according to the methods of Tam, Proc. Natl. Acad. Sci. USA 85:5409-5413, 1988. The resulting antibodies may be modified to a monovalent form, such as, for example, Fab, F(ab')2, Fab', or Fv. Anti-idiotypic antibodies may also be prepared using known methods.
In one embodiment, normal or mutant PKD1 polypeptides are used to immunize mice, after which their spleens are removed, and splenocytes are used to form cell hybrids with myeloma cells and obtain clones of antibody-secreting cells according to techniques that are standard in the art. The resulting monoclonal antibodies are screened for specific binding to PKD1 proteins or PKD1-related peptides.
In another embodiment, antibodies are screened for selective binding to normal or mutant PKD1 sequences. Antibodies that distinguish between normal and mutant forms of PKD1 may be used in diagnostic tests (see below) employing ELISA, EMIT, CEDIA, SLIFA, and the like. Anti-PKD1 antibodies may also be used to perform subcellular and histochemical localization studies. Finally, antibodies may be used to block the function of the PKD1 polypeptide, whether normal or mutant, or to perform rational drug design studies to identify and test inhibitors of the function (e.g., using an anti-idiotypic antibody approach).
Identification of Disease-Causing Mutations in PKD1
In one mode of practice of the present invention, the isolated and sequenced PKD1 gene is utilized to identify previously unknown or mutant versions of the PKD1 gene.
First, human subjects with inherited polycystic kidney disease are identified by clinical testing, pedigree analysis, and linkage analysis, using standard diagnostic criteria and interview procedures, and DNA or RNA samples are obtained from the subjects (see below).
A variety of techniques are then employed to pinpoint new mutant sequences. First, PKD1 DNA may be subjected to direct DNA sequencing, using methods that are standard in the art. Furthermore, deletions may be detected using a PCR-based assay, in which pairs of oligonucleotides are used to prime amplification reactions and the sizes of the amplification products are compared with those of control products. Other useful techniques include Single-Strand Conformation Polymorphism analysis (SSCP), HOT cleavage, denaturing gradient gel electrophoresis, and two-dimensional gel electrophoresis.
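The size comparison underlying the PCR-based deletion assay can be sketched as follows. This is only an illustration: the amplicon names, the product sizes, and the 5 bp tolerance are hypothetical and are not taken from this specification.

```python
def flag_size_changes(control_sizes, sample_sizes, tolerance_bp=5):
    """Compare amplicon sizes (bp) from a test sample with control products.

    Returns a dict mapping amplicon name to (sample size - control size) for
    every amplicon whose size differs by more than tolerance_bp. Negative
    values suggest a deletion, positive values an insertion.
    """
    changes = {}
    for name, control in control_sizes.items():
        if name not in sample_sizes:
            continue  # no product was measured for this primer pair
        diff = sample_sizes[name] - control
        if abs(diff) > tolerance_bp:
            changes[name] = diff
    return changes


# Hypothetical primer pairs and product sizes, for illustration only.
control = {"ampA": 280, "ampB": 265, "ampC": 300}
sample = {"ampA": 281, "ampB": 215, "ampC": 300}
print(flag_size_changes(control, sample))
```

In this toy data set only the second product is flagged, as a candidate 50 bp deletion; the exact alteration would still have to be confirmed by direct sequencing.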
A confounding and complicating factor in the detection of a PKD1 mutation is the presence of PKD1 homologues at several sites on chromosome 16 proximal to the transcribed gene. In analysis of mutations in PKD1, it is critical to distinguish between sequences derived from the authentic PKD1 gene and sequences derived from any of the homologues. Thus, an important feature of the present invention is the provision of oligonucleotide primers that discriminate between authentic PKD1 and the homologues. A detailed comparison of the sequences of the authentic PKD1 gene and the homologues enables the design of primers that discriminate between the authentic PKD1 gene or cDNA and the homologues. Primers that conform to this criterion, such as those disclosed in Figure 4B, may be used in conjunction with any of the analytical methods described below.
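One way to picture the primer criterion is a check that a candidate primer matches the authentic site exactly while mismatching the homologue near its 3' end. The sketch below assumes this 3'-end heuristic, which is a common primer-design rule of thumb and is not stated in this specification; the sequences are invented toy strings, not PKD1 or homologue sequence.

```python
def mismatches(primer, template_site):
    """Return 0-based positions where a primer and an aligned site differ.

    Both strings are written 5'->3' in the same orientation as the primer.
    """
    if len(primer) != len(template_site):
        raise ValueError("primer and site must be the same length")
    return [i for i, (p, t) in enumerate(zip(primer, template_site)) if p != t]


def discriminates(primer, authentic_site, homologue_site, three_prime_window=3):
    """True if the primer matches the authentic site exactly and mismatches the
    homologue site within its last three_prime_window bases (assumed heuristic)."""
    if mismatches(primer, authentic_site):
        return False
    cutoff = len(primer) - three_prime_window
    return any(pos >= cutoff for pos in mismatches(primer, homologue_site))


# Hypothetical 16-mers, for illustration only.
primer = "ACGTTGCAGGCTAAGC"
authentic = "ACGTTGCAGGCTAAGC"
homologue_3prime = "ACGTTGCAGGCTAAGT"   # differs at the final base
homologue_5prime = "ACCTTGCAGGCTAAGC"   # differs only near the 5' end
print(discriminates(primer, authentic, homologue_3prime))
print(discriminates(primer, authentic, homologue_5prime))
```

With sequences that are more than 97% identical, few positions will qualify, which is why a detailed base-by-base comparison of the authentic gene and the homologues is needed before primers are chosen.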
For SSCP, primers are designed that amplify DNA products of about 250-300 bp in length across non-duplicated segments of the PKD1 gene. For each amplification product, one gel system and two running conditions are used. Each amplification product is applied to a 10% polyacrylamide gel containing 10% glycerol. Separate aliquots of each amplimer are subjected to electrophoresis at 8W at room temperature for 16 hours and at 30W at 4°C for 5.4 hours. These conditions were previously shown to identify 98% of the known mutations in the CFTR gene (Ravnik-Glavac et al., Hum. Mol. Genet., 3:801, 1994).
For "HOT" cleavage, amplification reactions are performed using radiolabelled PKD1-specific primers. Each radiolabelled amplification product is then mixed with a fold to 100-fold molar excess of unlabelled amplification products produced using the identical primers and DNA from APKD-affected or -unaffected subjects. Heteroduplex formation, chemical cleavage, and gel analysis are then performed as described (Cotton, et al., Proc. Natl. Acad. Sci., USA, 85:4397, 1988). Bands on the gel that are smaller than the homoduplex result from chemical cleavage of heteroduplexes at base pair mismatches involving cytidine or thymidine. Once a mutation has been identified by this procedure, the exact location of the mismatch(es) is determined by direct DNA sequencing.
Mutations are also identified by "broad range" DDGE (Guldberg et al., Nuc. Acids Res., 22:880, 1994). The use of GC-clamped PCR primers and a very broad denaturant gradient enables the efficient detection of mutant sequences. This method can also be combined with non-denaturing size fractionation in a two-dimensional system. An apparatus is used that permits automated two-dimensional electrophoresis, and the second dimension considerably increases the resolution of mutations.
After the presence of a mutation is detected by any of the above techniques, the specific nucleic acid alteration comprising the mutation is identified by direct DNA sequence analysis. In this manner, previously unidentified PKD1 mutations may be defined.
Once a previously unidentified PKD1 mutation is defined, methods for detecting the particular mutation in other affected individuals can be devised, using a variety of methods that are standard in the art. For example, oligonucleotide probes may be prepared that allow the detection and discrimination of the particular mutation. It will be understood that such probes may comprise either the mutant sequence itself, or, alternatively, may flank the mutant sequence. Furthermore, the oligonucleotide sequence can be used to design a peptide immunogen comprising the mutant amino acid sequence. These peptides are then used to elicit antibodies that distinguish between normal and mutant PKD1 polypeptides.
Diagnostic Tests for PKD1 Mutations
Mutant PKD1 genes, whether identified by the methods described above or by other means, find use in the design and operation of diagnostic tests. Tests that detect the presence of mutant PKD1 genes, including those described below and in Example 5, can be applied in the following ways:
To determine donor suitability for kidney transplants. In general, it is desirable to use a close relative of the transplant recipient. When the recipient is a patient suffering from familial APKD, it is important to ascertain that the donor relative does not also carry the familial mutant PKD1 gene.
To screen for at-risk individuals in APKD-affected families. Presymptomatic individuals who have a high probability of developing APKD can be identified, allowing them to be monitored and to avail themselves of preventive therapies.
To target hypertensive patients for antihypertensive treatment. Hypertension is also linked to APKD. Screening of hypertensive patients for the presence of mutant PKD1 genes can be used to identify patients for preemptive regulation of blood pressure to prevent later kidney damage.
To perform prenatal screening. Most PKD1-linked PKD is of the adult-onset type. In a small subset of families carrying a mutation in PKD1 genes, however, juvenile onset is common and signifies a more severe form of the disease. In these families, prenatal screening can be useful for genetic counselling purposes.
In general, the diagnostic tests according to the present invention involve obtaining a biological sample from a subject, and screening the sample, using all or part of the PKD1 gene of this invention, for the presence of one or more mutant versions of the PKD1 gene or its protein product. The subject may be a fetus in utero, or a human patient of any age.
In one embodiment, a sample of genomic DNA is obtained from a human subject and assayed for the presence of one or more disease-associated PKD1 mutations. This DNA may be obtained from any cell source or body fluid. Non-limiting examples of cell sources available in clinical practice include blood cells, buccal cells, cervicovaginal cells, epithelial cells from urine, fetal cells, or any cells present in tissue obtained by biopsy. Body fluids include blood, urine, cerebrospinal fluid, amniotic fluid, and tissue exudates at the site of infection or inflammation. DNA is extracted from the cell source or body fluid using any of the numerous methods that are standard in the art. It will be understood that the particular method used to extract DNA will depend on the nature of the source. The minimum amount of DNA to be extracted for use in the present invention is about 25 pg (corresponding to about 5 cell equivalents of a genome size of 3 x 10^9 base pairs).
In this embodiment, the assay used to detect the presence of mutations may comprise restriction enzyme digestion, direct DNA sequencing, hybridization with sequence-specific oligonucleotides, amplification by PCR, single-stranded conformational polymorphism analysis, denaturing gradient gel electrophoresis (DDGE), two-dimensional gel electrophoresis, in situ hybridization, and combinations thereof.
In a preferred embodiment, RNA is isolated from a PKD1-expressing cell or tissue, preferably lymphocytes, using standard techniques including automated systems such as that marketed by Applied Biosystems, Inc. (Foster City, CA). The RNA is then subjected to coupled reverse-transcription and PCR amplification (RT-PCR). The resulting DNA may then be screened for the presence of mutant sequences by any of the methods outlined above (see Example 5 below).
As discussed above, any nucleic-acid-based screening method for PKD1 mutations must be able to discriminate between the authentic PKD1 gene present at chromosome location 16p13.3 and PKD1 homologues present at 16p13.1 and other locations. The oligonucleotides (SEQ ID Nos: 10 and 13-15) are examples of primers that discriminate between the authentic and homologue sequences, and these oligonucleotides or their equivalents form an important part of any such diagnostic test. Furthermore, nucleotides 43,823 through 52,887 of the PKD1 sequence of Figure 1B represent a sequence that is unique to the authentic PKD1 gene and is not present in the homologues. Thus, oligonucleotides derived from this region can be used in a screening method to ensure that the authentic PKD1 gene, and not the homologues, is detected.
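A simple coordinate check illustrates how a candidate oligonucleotide might be tested against the unique segment cited above (nucleotides 43,823 through 52,887). The sketch assumes 1-based, inclusive coordinates in the numbering of the Figure 1B sequence; the oligonucleotide positions used are hypothetical.

```python
# Boundaries of the segment stated above to be unique to the authentic gene
# (assumed here to be 1-based and inclusive).
UNIQUE_START = 43823
UNIQUE_END = 52887


def within_unique_region(oligo_start, oligo_end):
    """True if an oligonucleotide spanning oligo_start..oligo_end (1-based,
    inclusive) lies entirely inside the unique segment."""
    if oligo_start > oligo_end:
        raise ValueError("start must not exceed end")
    return UNIQUE_START <= oligo_start and oligo_end <= UNIQUE_END


# Hypothetical 20-mer positions, for illustration only.
print(within_unique_region(44000, 44019))   # entirely inside the segment
print(within_unique_region(43810, 43829))   # overlaps the 5' boundary
print(UNIQUE_END - UNIQUE_START + 1)        # length of the segment in bp
```

Under the inclusive-coordinate assumption the segment is 9,065 bp long, which leaves ample room for primer pairs that lie wholly within it.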
In another embodiment, the assay used to detect the presence of a mutant PKD1 gene involves testing for mutant gene products by an immunological assay, using one of many methods known in the art, such as, for example, radioimmunoassay, ELISA, immunofluorescence, and the like. In this embodiment, the biological sample is preferably derived from a PKD1-expressing tissue such as kidney. The PKD1 polypeptide may be extracted from the sample. Alternatively, the sample may be treated to allow detection or visualization of specifically bound antibodies in situ as occurs in, for example, cryosectioning followed by immunofluorescent staining.
The antibodies may be monoclonal or polyclonal, and may be raised against intact PKD1 protein or against natural or synthetic peptides derived from PKD1. In a preferred embodiment, the antibodies discriminate between "normal" and "mutant" PKD1 sequences, and possess a sufficiently high affinity for PKD1 polypeptides so that they can be used in routine assays.
It will be understood that the particular method or combination of methods used will depend on the particular application. For example, high-throughput screening methods preferably involve extraction of DNA or RNA from an easily available tissue, followed by amplification of particular PKD1 sequences and hybridization of the amplification products with a panel of specific oligonucleotides.
Therapeutic Applications
The present invention encompasses the treatment of PKD using the methods and compositions disclosed herein. All or part of the normal PKD1 gene disclosed above can be delivered to kidney cells or other affected cells using a variety of known methods, including e.g. liposomes, viral vectors, recombinant viruses, and the like. The gene can be incorporated into DNA vectors that additionally comprise tissue-specific regulatory elements, allowing PKD1 expression in a tissue-specific manner. This approach is feasible if a particular mutant PKD1 allele, when present in a single copy, merely causes the level of the PKD1 protein to diminish below a threshold level necessary for normal function; in this case, increasing the gene dosage by supplementing with additional normal copies of the PKD1 gene should correct the functional defect. In another embodiment, a mixture of isolated nucleic acids, such as that set forth in Figure 2 and at least a portion of the normal PKD1 gene, may be delivered to kidney or other affected cells in order to treat APKD. Alternatively, it may be desired to limit the expression of a mutant PKD1 gene, using, for example, antisense sequences. In this embodiment, antisense oligonucleotides may be delivered to kidney or other cells.
For therapeutic uses, PKD1-related DNA may be administered in any convenient way, for example, parenterally in a physiologically acceptable carrier such as phosphate buffered saline, saline, deionized water, or the like. Typically, the compositions are added to a retained physiological fluid such as blood or synovial fluid. The amount administered will be empirically determined using routine experimentation. Other additives, such as stabilizers, bactericides, and the like, may be included in conventional amounts.
This invention also encompasses the treatment of APKD by protein replacement. In one embodiment, protein produced by host cells transformed or transfected with DNA encoding the PKD1 polypeptide of the present invention is introduced into the cells of an individual suffering from altered, defective, or non-functional expression of the PKD1 gene. This approach compensates for the absence of PKD1 protein, or for the presence of a defective PKD1 protein, by adding functional PKD1 protein. The PKD1 protein used in augmentation may comprise a subcellular fragment or fraction, or may be partially or substantially purified. In any case, the PKD1 protein is formulated in an appropriate vehicle, such as, for example, liposomes, that may additionally include conventional carriers, excipients, stabilizers, and the like.
It will be understood that the therapeutic compositions of the present invention need not in themselves constitute an effective amount, since such effective amounts can be reached by administering a plurality of such therapeutic compositions.
The following examples are intended to illustrate the invention without limiting the scope thereof.
Example 1: Cloning and Sequencing of the Human PKD1 Gene
A. Methods:
Employing an ordered sequencing approach, restriction fragments from cDEB11 and cGGG10.2 cosmids were subcloned into either pBLUESCRIPT (Stratagene, La Jolla, CA) or pGEM (Promega, Madison, WI). Plasmids were purified by CsCl density centrifugation in the presence of ethidium bromide. Nested deletions were generated from each plasmid using ExoIII (Henikoff, Methods Enzymol. 155: 156-165, 1987) and additional enzymatic reagents provided by the Erase-A-Base kit (Promega, Madison, WI). The resulting nested clones were analyzed electrophoretically after appropriate restriction enzyme digestion and were ordered into a nested set of templates for sequencing. A minimum tiling series of plasmids, each differing by approximately 250 bp from flanking clones, was identified and used for sequencing.
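The selection of a minimum tiling series from a set of nested deletions can be pictured as a greedy choice over deletion endpoints. The sketch below is an assumption about how such a selection could be automated, not the procedure actually used here, where clones were ordered by gel analysis; the endpoint positions are hypothetical.

```python
def minimum_tiling(endpoints, step=250):
    """Choose a subset of deletion endpoints (bp from the insert start) so that
    consecutive choices are as far apart as possible without exceeding `step`.

    If the next available clone is more than `step` away, it is taken anyway,
    leaving a gap that would have to be closed by other means.
    """
    pts = sorted(set(endpoints))
    if not pts:
        return []
    tiling = [pts[0]]
    i = 0
    while i < len(pts) - 1:
        j = i
        # advance to the furthest endpoint still within `step` of the current one
        while j + 1 < len(pts) and pts[j + 1] - pts[i] <= step:
            j += 1
        if j == i:
            j = i + 1  # gap larger than `step`: accept the next clone
        tiling.append(pts[j])
        i = j
    return tiling


# Hypothetical deletion endpoints, for illustration only.
print(minimum_tiling([0, 100, 240, 300, 480, 500, 760]))
```

The greedy rule keeps the number of templates small while ensuring that adjacent reads can still overlap when read lengths exceed the chosen step.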
Plasmid DNAs were prepared for sequencing in one of two ways. Initially, all clones of interest were cultured in 2 mL of Super Broth (Tartof et al., BRL Focus 9: 12, 1987) for 20 hours at 37°C. Sets of 12-24 were processed simultaneously using a modified alkaline SDS procedure followed by ion-exchange chromatography as described by the manufacturer (Easy-Prep, Pharmacia, Piscataway, NJ). Plasmid DNA yields ranged from 2.5 to 25 µg. Poor growing clones, or those whose plasmids generated sequence of unacceptable quality, were recultured in 100 mL of Luria's Broth and the plasmid DNA isolated using Qiagen columns (Qiagen, San Diego, CA).
Dideoxy sequencing reactions were performed on deletion clones using the Auto-Read Sequencing Kit (Pharmacia, Piscataway, NJ) and fluorescein-labeled vector primers (M13 universal, M13 reverse, T3, T7 and SP6). Reaction products were separated on 6% denaturing acrylamide gels using the ALF™ DNA Sequencer (Pharmacia, Piscataway, NJ).
Second strand sequencing was performed using either an opposing set of nested deletions or primer walking. For primer walking, custom 17-mers, staggered every 250 bp, were purchased from a commercial supplier (Protogene, Palo Alto, CA). Template DNAs prepared by Qiagen or CsCl density gradients were sequenced using the unlabeled 17-mers by inclusion of fluor-dATP labeling mix in the sequencing reactions as described by the manufacturer (Pharmacia, Piscataway, NJ). In all cases, except the 2.5 kb GC-rich region, single-stranded DNA was rescued from deletion clones using helper phage VCSM13 (Stratagene) as described by the manufacturer.
Single-stranded templates from the 2.5 kb GC-rich region were sequenced using fluorescein-labeled universal primer and the Sequitherm Long Read cycle sequencing kit (Epicentre Technologies, Madison, WI) (Zimmerman et al., Biotechniques 17: 303-307, 1994). All processed sequencing data was transferred to a Quadra 700 Macintosh computer and assembled using the SEQUENCHER (Gene Codes, Ann Arbor, MI) sequencing assembly program. For differences that could not be resolved by examining the chromatograms, templates were either resequenced or primers proximal to the ambiguity were designed and used for resolution of the sequence difference.
Cycle sequencing was performed using the Sequitherm cycle sequencing kit as described by the manufacturer (Epicentre Technologies, Madison, WI). Reaction products were separated on denaturing acrylamide gels and subsequently detected by autoradiography.
B. Sequencing Strategy:
A 700 kb region of chromosome 16 containing the PKD1 locus is shown in Figure 5 (top panel). A contig covering this region was assembled from overlapping P1 clones (shown in the middle panel). The contig was assembled by unidirectional chromosomal walking from the ends of the interval (ATPL and D16S84) and bidirectional walking from several internal loci (D16S139 and KG8). One of the clones, 91.8B (ATCC Accession No. 98056), spans the entire PKD1 interval and includes cosmids cDEB11 (ATCC Accession No. 98057), cGGG10.2 (ATCC Accession No. 98058), and substantial portions of cosmids 2H2 and 325A11 (Stallings, R.L. et al., Genomics 13:1031, 1992). The P1 clone 91.8B (shown schematically in Figure 6) was used as a second genomic template to confirm discrepancies between the published cDNA sequence (EPKDC, Cell, 1994, supra) and the cosmid-derived genomic sequence.
Preliminary experiments revealed the presence of multiple repetitive elements in the cGGG10.2 cosmid. Therefore, an ordered approach based on nested deletions, rather than random shotgun subcloning, was used to sequence the PKD1 gene. Restriction fragments derived from the inserts of both cGGG10.2 and cDEB11 were subcloned into high-copy number plasmids as a preliminary step to the generation of nested deletions. Unidirectional deletions were prepared and sequenced, using the ALF™ automated sequencing system (Pharmacia, Uppsala, Sweden).
C. Primary Structure of the PKD1 Locus:
The primary sequence of the locus encompassing the PKD1 gene is 53,577 bp in length. This locus is GC-rich with a CpG/GpC dinucleotide ratio of 0.485. The primary sequence of the PKD1 gene within this locus is 53,526 bp in length. The present sequence was analyzed for transcriptional elements and CpG islands using GRAIL2 (Uberbacher, E.C. et al., Proc. Natl. Acad. Sci., USA 88:11261, 1991) and XGrail client server (Shah et al., User's Guide to GRAIL and GENQUEST, Client-Server Systems, available by anonymous ftp to arthur.epm.ornl.gov (128.219.9.76) from directory pub/xgrail or pub/xgenquest, as file manual.grailgenquest, 1994). Ten CpG islands were identified (Figure 8).
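A dinucleotide ratio of the kind quoted above can be computed by counting overlapping CG and GC dinucleotides on one strand. The sketch below assumes that the CpG/GpC ratio means count(CG) divided by count(GC); the input is a toy string, not the PKD1 sequence.

```python
def count_dinucleotide(seq, dinuc):
    """Count overlapping occurrences of a dinucleotide on the given strand."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


def cpg_gpc_ratio(seq):
    """Return count(CG) / count(GC), assumed here to be the CpG/GpC ratio."""
    gpc = count_dinucleotide(seq, "GC")
    if gpc == 0:
        raise ValueError("sequence contains no GpC dinucleotides")
    return count_dinucleotide(seq, "CG") / gpc


# Toy sequence for illustration only.
toy = "GCGCAGCTCG"
print(count_dinucleotide(toy, "CG"), count_dinucleotide(toy, "GC"))
print(round(cpg_gpc_ratio(toy), 3))
```

A ratio well below 1, such as the value of 0.485 reported for this locus, indicates that CpG dinucleotides are under-represented relative to GpC even in GC-rich sequence.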
Forty-eight exons were predicted on the coding strand by the GRAIL program. The quality of 39 of the 48 exons was "excellent", six were considered "good", and three were deemed "marginal". These data were analyzed using the gene model feature of GRAIL2. The final gene model contained 46 exons.
Comparison of the present genomic sequence with the previously reported partial cDNA sequence (EPKDC, Cell, 1994, supra) revealed several differences (Figure The first and most significant difference is the presence of two additional cytosine residues at position 4566 of the reported sequence. The presence of these two cytosine residues results in a frame shift in the predicted protein coding sequence, leading to the replacement of 92 carboxy-terminal amino acids with a novel 12-amino acid carboxy terminus.
Seven of the twelve amino acids of the new carboxy terminus are charged or polar. Additional sequence differences are located at positions 3639-3640 and 3708-3709 of the published EPKDC sequence (Figure A GC dinucleotide pair is present at each of these positions in the present sequence, while a CG pair is found in the reported sequence. In each case, histidine and valine residues would replace the previously predicted glutamine and leucine residues, respectively.
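The way in which two additional residues shift the reading frame and replace the downstream amino acids can be illustrated with a toy open reading frame. The sequence and insertion point below are invented for illustration and bear no relation to the PKD1 coding sequence; the translation uses the standard genetic code.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1; a trailing partial codon is ignored
    and stop codons are shown as '*'."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def insert_bases(dna, position, bases):
    """Insert `bases` before the 0-based `position` of `dna`."""
    return dna[:position] + bases + dna[position:]


# Toy open reading frame, for illustration only.
original = "ATGGCCAAAGGGTTTTGA"
shifted = insert_bases(original, 6, "CC")
print(translate(original))
print(translate(shifted))
```

In the toy example every residue downstream of the insertion changes and the original stop codon is no longer read in frame, which is the same kind of effect as the replacement of the predicted carboxy terminus described above.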
D. Identification of Protein Coding Regions:
Exons predicted by the GRAIL2 program with an "excellent" score were used to search the SwissProt and PIR databases (Bairoch and Boeckmann, Nuc. Acids Res. 20:2019-2022, 1992) using the BLASTP program (Altschul et al., J. Mol. Biol. 215:403-410, 1990). Exons 3 and 4 of the gene model were predicted to encode peptides with homology to a number of leucine-rich repeat (LRR)-containing proteins involved in protein-protein interactions (Figure 13). In addition to the LRR itself, sequences amino- and carboxy-flanking to the LRR may also be conserved in proteins of the leucine-rich glycoprotein (LRG) family, either singly or together.
Exon 3 encodes residues homologous to the LRR from leucine-rich α2 glycoprotein, members of the GPIb.IX complex which comprise the von Willebrand factor receptor, as well as to the Drosophila proteins chaoptin, toll, and slit. The latter are involved in adhesion, dorsal-ventral polarity, and morphogenesis, respectively.
Sequences predicted by GRAIL2 to be encoded by exon 4 were found to have homology to the conserved region carboxy terminal to the LRR in all of the above proteins except chaoptin, which lacks this conserved region. Homology was also observed between the exon 4-encoded sequences and the trk proto-oncogene, which encodes a receptor for nerve growth factor. Further examination of the predicted PKD1 peptide revealed additional regions of weaker homology with conserved regions of the trk tyrosine kinase domain. None of the more proximal exons in the gene model appear to encode a peptide with homology to the conserved amino-flanking region seen in a subset of the LRR-containing proteins.
Exon trapping, RT-PCR, and Northern blot analysis revealed that GRAIL2-predicted exons 3 and 4 are present in expressed sequences. During initial exon trapping experiments using genomic P1 and cosmid clones from the PKD1 locus, an exon trap was identified that contained both of these exons. In separate experiments, the presence of the LRR-carboxy-flanking motif in transcribed sequences was confirmed by RT-PCR using as a template RNA from fetal kidney and from adult brain. On a Northern blot, an RT-PCR fragment containing this motif detected the 14 kb PKD1 transcript and several other transcripts of 21 kb, 17 kb, and 8.5 kb.
A region of homology was also observed between the GRAIL2-predicted peptide and the human gp100/Pmel17 gene products, as well as with bovine RPE1. Three copies of a 34 amino acid segment that is also present in the Pmel-17 and gp100 gene products were deduced (Kwon et al., Proc. Natl. Acad. Sci., USA 88:9228-9232, 1991; Adema et al., J. Biol. Chem. 269:20126-33, 1994) within the larger context of immunoglobulin repeat motifs. The RPE1 gene product has significant homology to gp100 and may represent the bovine homolog (Kim and Wistow, Exp. Eye Res. 55:657-662, 1992).
GRAIL2-predicted exons 9, 22, and 28, upstream of the 3' cDNA, showed strong homology to EST T03080 255 bp), EST T04943 189 bp) and EST T05931 233 bp) In addition, nucleotides 10378-10625 of GRAIL-predicted intron 1 showed strong homology to a region of the Apo CII gene 263 bp).
The identification of a number of transmembrane domains and a leucine-rich repeat motif possessing conserved carboxy-flanking regions raises interesting speculations about potential protein function. LRR motifs have been shown to be involved in protein-protein interactions, while the conserved carboxy-flanking region is associated with proteins which interact with the extracellular matrix. These data suggest that the PKD1 gene product may be a membrane glycoprotein that functions in cell-matrix or cell-cell interactions. Less commonly, LRR motifs have been identified in receptors involved in signal transduction (McFarland et al., Science 245:494-499, 1989). Thus an alternative hypothesis is that the gene product is a receptor for a soluble factor(s). In either case, PKD1 would function to mediate interactions with the extracellular environment. If so, ligands for the gene product as well as downstream intracellular effectors are obvious candidates for the non-chromosome 16-linked forms of the disease. A model of the predicted PKD1 protein structure is shown in Figure 9.
E. Repeated Sequences:
The PKD1 locus was searched for known classes of repetitive DNA by FASTA comparison against the repeat database of Jurka et al., J. Mol. Evol. 35:286-291, 1992. This search identified 23 Alu repeats but no other repetitive elements. The Alu repeats are organized into three clusters of four or more Alu repeats, three clusters of two Alu repeats, and two singlet Alu repeats (Figure 8).
The PKD1 sequence interval contained two dinucleotide repeats and a single tetranucleotide repeat ((TTTA)6). The TG dinucleotide repeats are present at positions 209-224 and 52,698-52,715. The tetranucleotide repeat is located at position 7796-7819. No trinucleotide repeats >5 were identified. Only the most 3' TG8 repeat is known to be polymorphic.
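Simple sequence repeats of this kind can be located with a regular-expression scan. The following sketch reports 1-based, inclusive coordinates in the style used above; the test sequence is an invented toy string built from a (TG)8 and a (TTTA)6 tract, not the PKD1 interval, and the minimum copy numbers are arbitrary choices.

```python
import re


def find_repeats(seq, unit, min_copies):
    """Find perfect tandem runs of `unit` with at least `min_copies` copies.

    Returns a list of (start, end, copies) with 1-based inclusive coordinates.
    """
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit.upper()), min_copies))
    hits = []
    for match in pattern.finditer(seq.upper()):
        copies = (match.end() - match.start()) // len(unit)
        hits.append((match.start() + 1, match.end(), copies))
    return hits


# Toy sequence for illustration only: flanking bases around two repeat tracts.
toy = "ACGT" + "TG" * 8 + "CCAA" + "TTTA" * 6 + "GG"
print(find_repeats(toy, "TG", 5))
print(find_repeats(toy, "TTTA", 3))
```

The reported spans follow directly from unit length times copy number: a (TG)8 tract covers 16 bp, as at positions 209-224, and a (TTTA)6 tract covers 24 bp, as at positions 7796-7819.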
In addition to the more usual repetitive elements, the PKD1 gene contains several types of repeated sequences that either do not appear in existing data bases, or do not appear in the extreme form seen at this locus. The most striking repeat is a 2.5 kb segment within the 4 kb BamHI-SacI fragment. A significantly shorter C-T rich region is also found in the adjoining 1.8 kb SacI-BamHI fragment. These regions proved very difficult to sequence unambiguously due to the high GC content, to the purine asymmetry with respect to each strand, and to the length of the repeat. The coding strand in this region has an extreme pyrimidine bias, being 96% C-T, and could not be sequenced using T7 DNA polymerase or Sequenase. This was true regardless of the template type (plasmid, single-stranded phage, or strand-separated single-stranded DNA). In both cases, the noncoding strand, which is G-A rich, was successfully sequenced with both T7 DNA polymerase and Sequenase, although run lengths were noticeably abbreviated compared to all other regions sequenced. Compressions on the non-coding strand were resolved by conventional and cycle sequencing using single-stranded template. The extreme purine asymmetry of strands in this segment may promote localized triple strand conformation under the appropriate conditions (pH, divalent cations, supercoiling), and may be a major cause of the difficulty in sequencing this segment.
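The strand asymmetry described above can be quantified as the fraction of pyrimidines on one strand, which is necessarily mirrored by the purine content of its complement. The sketch below uses an invented 25-base string chosen so that the fraction comes out at 96%; it is not sequence from this locus.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def pyrimidine_fraction(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are C or T."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("no unambiguous bases in sequence")
    return sum(b in "CT" for b in bases) / len(bases)


# Toy strand for illustration only: 24 pyrimidines and one purine.
coding = "CT" * 12 + "G"
noncoding = reverse_complement(coding)
print(round(pyrimidine_fraction(coding), 2))
print(round(pyrimidine_fraction(noncoding), 2))
```

Because the two fractions always sum to one for complementary strands, a coding strand that is 96% C-T implies a noncoding strand that is 96% G-A.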
The other unusual repeat was located in the 7.6 kb XhoI fragment. This repeat is 459 bp in length and consists of 17 tandem copies of a perfect 27 bp repeat.
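Whether a segment consists of perfect tandem copies of a shorter unit can be checked by searching for the shortest unit whose exact repetition reproduces the segment. The 27-base unit below is a hypothetical placeholder, since the actual repeat unit is not reproduced in this passage.

```python
def tandem_unit(seq):
    """Return (unit, copies) for the shortest unit whose exact repetition
    reproduces seq; a non-repetitive seq returns (seq, 1)."""
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    for k in range(1, n + 1):
        if n % k == 0 and seq[:k] * (n // k) == seq:
            return seq[:k], n // k


# Hypothetical 27 bp unit, for illustration only.
unit = "ACGTTGCAAGGCTTACCGATGGATCCA"
segment = unit * 17
found_unit, copies = tandem_unit(segment)
print(len(found_unit), copies, len(segment))
```

Seventeen perfect copies of a 27 bp unit give 17 x 27 = 459 bp, in agreement with the length stated for this repeat.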
Example 2: PKD1 cDNA Sequences Obtained Through Exon Trapping and cDNA Selection Techniques
The 700 kb interval of chromosome 16 that includes the PKD1 gene appears to be particularly rich in CpG islands and, by association, is most likely rich in expressed sequences as well. To purify and sequence expressed PKD1 sequences, an exon-rescue vector, pSPL3, was used to recover sequences from cosmids that contain both a splice acceptor and splice donor element; this method is designated "exon trapping." Exon trapping is a highly efficient method for isolating expressed sequences from genomic DNA. The procedure utilizes the pSPL3 plasmid, which contains rabbit β-globin coding sequences separated by a portion of the HIV tat gene, or improved derivatives of SPL3 lacking cryptic (interfering) splice sites. Fragments of cloned PKD1 genomic DNA were cloned into the intron of the tat gene, and the resulting subclones were transfected into COS-7 cells. Sequences in the vector allow for both relaxed episomal replication of the transfected vectors, as well as transcription of the cloned genomic DNAs. Exons within the subcloned genomic DNAs spliced into the globin/tat transcript were recovered using RT-PCR, using primers containing tat splice donor and acceptor sequences. A major advantage of exon trapping is that expression of the cloned DNA is directed by a viral promoter; thus, developmental or tissue-specific expression of gene products is not a concern.
PKD1-containing genomic clones, in the form of either cosmid or P1 DNA, were either double digested with BamHI and BglII or partially digested with Sau3A and shotgun cloned into BamHI-digested and dephosphorylated pSPL3 (GIBCO BRL, Bethesda, MD) or its derivatives. Plasmid minipreps were electroporated into COS-7 cells, and trapped exons were recovered by RT-PCR, followed by subcloning, using standard procedures.
Trapped exons from the PKD1 locus are shown in Figure 14 (bottom). The trapped exons were subjected to automated DNA sequencing as above, allowing their alignment with the genomic PKD1 DNA.
Example 3: Construction of Full-length PKD1 cDNA
In the case of PKD1, the identification of cDNAs which are specific for the 5' end of the PKD1 locus is particularly difficult since multiple transcribed copies of homologous sequences are also present at 16p13.1 (EPKDC, Cell, 1994 supra). Regions of both genomic DNA and cDNA derived from the homologues were sequenced and compared with the present PKD1 sequence. In this data set, the PKD1 and homologous sequences were greater than 97% identical at the nucleotide level. Therefore, direct comparisons of potential PKD1 cDNAs and genomic sequence are required to definitively map a cDNA to the PKD1 locus, and to verify that the correct sequence is encoded by the cDNA.
Multiple approaches were required to assemble the full-length PKD1 cDNA. Seven cDNAs were used to construct the full-length cDNA. Five of these cDNAs were recovered from screening cDNA libraries: the BRL Gene-Trapper brain library, and cDNA libraries constructed from fetal brain and from the somatic cell hybrid 145.19. The 145.19 cell line contains the PKD1 locus, but does not include the PKD1 homologs in its human component.
A. cDNA Library Construction and Screening
The somatic cell hybrid library was constructed using both oligo(dT) and random hexamer priming and poly(A)-containing RNA from the 145.19 cell line. The duplex cDNA was linked and then ligated into lambda ZAP EXPRESS (Stratagene, La Jolla, CA) to yield a library consisting of several million independent plaques. Fourteen clones were positive by colony hybridization using a PKD1-specific probe, with inserts ranging in size from 2.6 to 9 kb. Consistent with the RT-PCR products derived from the 145.19 cell line, substantial alternative splicing or incomplete splicing was evident. Interestingly, the missing exons appeared to comprise one or more distinct protein domains.
Two additional libraries were constructed using fetal brain cDNA cloned into lambda ZAP EXPRESS and the replacement vector, lambda DASH (Stratagene, La Jolla, CA).
Additionally, a variation of the cDNA selection methodology was used to screen oligo(dT)-primed, unidirectional cDNA libraries (in phagemids). Briefly, single-stranded library DNA was prepared from cultures of the adult brain cDNA library. A single biotinylated 17-mer derived from the sense strand of the gene-specific portion of the predicted PKD1 cDNA was used for hybrid selection. Hybrid-bound cDNAs released by denaturation were made double-stranded using the same oligonucleotide as a gene-specific primer and Klenow, and then introduced into E. coli by electroporation. Colony hybridization was used to identify the PKD1 clones from the enriched brain cDNA population. The cloned brain inserts ranged in size from 0.7 to 2.5 kb. The sequences of the two largest cDNAs were virtually identical to each other as well as to the genomic sequence.
Example 4: Expression of Full-Length PKD1 cDNA
Full-length PKD1 cDNA was cloned into three expression vectors, pCMV-SPORT, pcDNA3, and pCEP4 (total construct sizes ranging from 18 to 24.2 kb). The schematic structure of full-length PKD1 cDNA in pCMV-SPORT is shown in Figure 11.
pCMV-SPORT and pcDNA3 have small differences in cloning sites and some other small features, but share the basic features of flanking T7 and SP6 promoters, CMV enhancer-promoter sequences for high level transcription, and eukaryotic polyadenylation and transcription sequences which enhance RNA stability. The SV40 origin of replication allows growth in eukaryotic cells, while the ColE1 origin allows growth in E. coli. The vector pcDNA3 confers neomycin resistance in eukaryotes, while ampicillin resistance is used for selection in E. coli.
pCEP4 is an EBV-based vector which is maintained extrachromosomally in primate cells. Like pCMV-SPORT and pcDNA3, pCEP4 contains the CMV enhancer and promoter, and the ColE1 origin of replication and ampicillin resistance are used for maintenance. However, hygromycin resistance is used for selection in eukaryotic cells. The use of the EBV origin of replication and hygromycin resistance are important features for studies of PKD1 transformed cell lines, since as a function of the transformation procedure they already contain SV40 large T antigen and are G418 resistant.
A. In vitro Expression
The T7 promoter feature of pcDNA3 was used to analyze the protein product encoded by the PKD1 cDNA, employing the TNT Coupled Reticulocyte Lysate System (Promega, Madison, WI). This system enables large amounts of RNA to be synthesized from the T7 promoter, and the RNA to be translated into protein in the rabbit reticulocyte lysate.
Since conventional molecular weight standards only extend up to ~216 kD, the size estimates of in vitro synthesized polycystin, ~462 kD (non-glycosylated), would be speculative at best. For this reason, a series of 3' deleted PKD1 cDNA plasmid templates encoding truncated proteins of predicted size were constructed (Figure 10). The protein products of these deletion clones as well as the full-length PKD1 cDNA were analyzed using the TNT system.
Newly synthesized protein was labeled by inclusion of radioactive amino acids, initially 35S-methionine. The synthesized proteins were then resolved by electrophoresis on a 3-12% gradient SDS-PAGE gel. The mobility of the protein product produced from each of the truncated clones was consistent with its predicted molecular size. These results are consistent with assembled PKD1 cDNA expression vectors directing in vitro synthesis of polycystin.
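The "predicted molecular size" of a truncated product can be approximated from its residue count. The sketch below assumes a rule-of-thumb average of 110 Da per residue, which is an illustrative assumption and not a value from this specification; the residue count of 4302 is taken from the residue numbering given for the cytoplasmic tail in Example 7, and the function names are hypothetical.

```python
AVG_RESIDUE_DA = 110.0  # assumption: rule-of-thumb average residue mass


def estimate_mass_kd(n_residues, avg_residue_da=AVG_RESIDUE_DA):
    """Rough non-glycosylated mass in kD for a protein of n_residues."""
    if n_residues < 0:
        raise ValueError("residue count must be non-negative")
    return n_residues * avg_residue_da / 1000.0


def above_marker_range(n_residues, top_marker_kd=216.0):
    """True if the estimated mass exceeds the largest size standard."""
    return estimate_mass_kd(n_residues) > top_marker_kd


# Full-length product versus a hypothetical 1000-residue truncation.
for residues in (4302, 1000):
    print(residues, round(estimate_mass_kd(residues), 1), above_marker_range(residues))
```

An estimate of this kind lands in the same range as the ~462 kD figure quoted above, and it shows why only the shorter deletion products fall within the span of conventional standards.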
B. In vivo Expression: PKD1 cDNA Transfection in Human Embryonic Kidney (HEK) 293 cells
cDNA constructs containing full-length PKD1 cDNA or portions thereof were transfected into HEK 293 cells and assayed for PKD1 expression using Northern analysis, 48 hours post-transfection. An insertless vector, pcDNA3, was used in parallel as a control for transfection. A Northern blot was probed with a PKD1-specific probe and then subsequently re-probed with a β-actin cDNA to normalize the respective lanes. The results showed that the PKD1 mRNA is increased at least two-fold in HEK 293 cells that received the PKD1 cDNA construct.
Example 5: Diagnostic Tests for PKD1 Mutations
Whole blood samples collected in high glucose ACD Vacutainers™ (yellow top) were centrifuged and the buffy coat collected. The white cells were lysed with two washes of a 10:1 mixture of 14 mM NH4Cl and 1 mM NaHCO3, and their nuclei were resuspended in nuclei-lysis buffer (10 mM Tris, pH [illegible], 0.4 M NaCl, 2 mM EDTA, 0.5% SDS, 500 µg/ml proteinase K) and incubated overnight at 37°C. Samples were then extracted with a one-fourth volume of saturated NaCl and the DNA was precipitated in ethanol. The DNA was then washed with ethanol, dried, and dissolved in TE buffer (10 mM Tris-HCl, pH [illegible], 1 mM EDTA).
A. Test I
Long PCR conditions were used with a 4-part reaction mixture. Part 1 contained the following components:
3.3X XL Buffer              12 µl
dNTPs (2 mM each)           8 µl
Forward primer (20 µM)      1-5 µl
Reverse primer (20 µM)      1-5 µl
Blocking oligo (2 mM)       1.5 µl
Mg(OAc)2 (25 mM)            4.4 µl
water                       to 40 µl
Part 1 can be assembled as a single reaction component or in batch (10, 50, 100 reaction equivalents) and then dispensed as 40 µl aliquots into individual reaction tubes.
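Batch assembly of Part 1 amounts to scaling the per-reaction volumes and topping up with water to 40 µl per reaction. The following is a minimal sketch of that arithmetic for the Test I recipe, assuming no pipetting overage is added; the function and dictionary names are hypothetical.

```python
# Fixed per-reaction volumes (µl) from the Test I, Part 1 recipe.
PART1_UL = {
    "3.3X XL Buffer": 12.0,
    "dNTPs (2 mM each)": 8.0,
    "Blocking oligo": 1.5,
    "Mg(OAc)2 (25 mM)": 4.4,
}
PART1_TOTAL_UL = 40.0


def part1_batch(n_reactions, forward_ul, reverse_ul):
    """Return total volumes (µl) of each Part 1 component for a batch."""
    if n_reactions < 1:
        raise ValueError("need at least one reaction")
    if not (1.0 <= forward_ul <= 5.0 and 1.0 <= reverse_ul <= 5.0):
        raise ValueError("primer volumes must be within the 1-5 µl range")
    per_rxn = dict(PART1_UL)
    per_rxn["Forward primer"] = forward_ul
    per_rxn["Reverse primer"] = reverse_ul
    # Water makes each reaction up to 40 µl.
    per_rxn["water"] = PART1_TOTAL_UL - sum(per_rxn.values())
    return {name: round(vol * n_reactions, 2) for name, vol in per_rxn.items()}


batch = part1_batch(10, forward_ul=2.0, reverse_ul=2.0)
for name, volume in batch.items():
    print(f"{name:22s} {volume:7.1f} µl")
```

In practice a small overage is often prepared to cover pipetting losses; that choice is left out here because the text does not specify one.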
Part 2 comprises carefully adding one AmpliWax PCR Gem 100 (or comparable product) to each Part 1 reaction tube. The tubes were incubated at 75-80°C for 5 min. to melt the wax bead. The reactions were then cooled, allowing the wax to solidify.
In Part 3, the following components were added to the cooled reaction mixture of Part 2:
3.3X XL Buffer              18 µl
rTth DNA Polymerase, XL     2 µl
In Part 4, the following components are added to the reaction mixture of Part 3:
human DNA                   0.2-1 µg
water                       to [illegible]
The forward primer used in the reaction described above comprises an oligonucleotide that hybridizes to both authentic PKD1 and PKD1 homologue sequences. An example of such a primer is:
5'-CACGACCTGTCCCAGGCAT-3' (SEQ ID NO:6) (corresponding to nucleotides 4702-4720 of SEQ ID NO:1).
The reverse primer comprises a sequence derived from a 3' region of the authentic PKD1 gene, which may or may not be present in the PKD1 homologues. Examples of such 3' regions and corresponding reverse primers are:
3' sequence:                                reverse primer:
5'-CTGGCGGGCGAGGAGAT-3' (SEQ ID NO:7)       5'-ATCTCCTCGCCCGCCAG-3' (SEQ ID NO:56)
5'-CTTTGACAAGCACATCT-3' (SEQ ID NO:8)       5'-AGATGTGCTTGTCAAAG-3' (SEQ ID NO:57)
5'-CAACTGGCTGGACAACA-3' (SEQ ID NO:9)       5'-TGTTGTCCAGCCAGTTG-3' (SEQ ID NO:58)
The blocking oligonucleotide comprises:
5'-AGGACCTGTCCAGGCATC-3' (SEQ ID NO:[illegible])
Importantly, this oligonucleotide must be incapable of supporting polymerization. One example is an oligonucleotide in which the 3' terminal nucleotide comprises a dideoxynucleotide. It will be understood that any modification that achieves this effect may be used in practicing the invention. Under appropriate conditions, the blocking oligonucleotide hybridizes efficiently to PKD1 homologues but inefficiently to the authentic PKD1 sequence. Thus, the amplification products in this diagnostic test are derived only from the authentic PKD1 gene.
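Each reverse primer listed for Test I is the reverse complement of its paired 3' sequence, which can be checked mechanically. The sketch below does so for the three pairs given above; the helper name and the dictionary layout are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# (3' region, reverse primer) pairs as listed for Test I.
PAIRS = {
    "SEQ ID NO:7 / NO:56": ("CTGGCGGGCGAGGAGAT", "ATCTCCTCGCCCGCCAG"),
    "SEQ ID NO:8 / NO:57": ("CTTTGACAAGCACATCT", "AGATGTGCTTGTCAAAG"),
    "SEQ ID NO:9 / NO:58": ("CAACTGGCTGGACAACA", "TGTTGTCCAGCCAGTTG"),
}

for label, (region, primer) in PAIRS.items():
    print(label, reverse_complement(region) == primer)
```

A check of this kind is a cheap guard against transcription errors when primer sequences are copied from a listing into an order form.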
Twenty-five to thirty-eight cycles of amplification were performed, using a standard DNA thermal cycler and the following primer-dependent conditions for each cycle:
SEQ ID NO:56: 94°C, 30 seconds; 62°C, 30 seconds; and 72°C, 34 minutes.
SEQ ID NO:57: 94°C, 30 seconds; 56°C, 30 seconds; and 72°C, 37 minutes.
SEQ ID NO:58: 94°C, 30 seconds; 58°C, 30 seconds; and 72°C, [illegible] minutes.
The 72°C extension cycle was lengthened 5 seconds each subsequent cycle. The primary PCR product can be analyzed immediately for mutations or, alternatively, can be used as a template for secondary PCR using a collection of paired amplimers to generate an overlapping set of smaller amplicons. The smaller amplicons can then be analyzed for mutations.
B. Test II
Long PCR conditions were used with a 4-part reaction mixture. Part 1 contained the following components:
3.3X XL Buffer              12 µl
dNTPs (2 mM each)           8 µl
Forward primer (20 µM)      1-5 µl
Reverse primer (20 µM)      1-5 µl
Mg(OAc)2 (25 mM)            4.4 µl
water                       to 40 µl
Part 1 can be assembled as a single reaction component or in batch (10, 50, 100 reaction equivalents) and then dispensed as 40 µl aliquots into individual reaction tubes.
Part 2 comprises carefully adding one AmpliWax PCR Gem 100 (or comparable product) to each Part 1 reaction tube. The tubes were incubated at 75-80°C for 5 min. to melt the wax bead. The reactions were then cooled, allowing the wax to solidify.
In Part 3, the following components were added to the cooled reaction mixture of Part 2:
3.3X XL Buffer              18 µl
rTth DNA Polymerase, XL     2 µl
In Part 4, the following components are added to the reaction mixture of Part 3:
human DNA                   0.2-1 µg
water                       to [illegible]
Twenty-five to thirty-eight cycles of amplification were performed, using a standard DNA thermal cycler and the following protocol for each cycle: 94°C, 30 seconds; 61°C, 30 seconds; and 72°C, 11 minutes. The 72°C extension cycle was lengthened 5 seconds each subsequent cycle. The primary PCR product can be analyzed immediately for mutations or, alternatively, can be used as a template for secondary PCR using a collection of paired amplimers to generate an overlapping set of smaller amplicons. The smaller amplicons can then be analyzed for mutations.
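The incremental extension used in Test II can be written out as a per-cycle schedule. This is a sketch under one reading of the text, namely that the first cycle uses the stated 11-minute extension and each later cycle adds 5 seconds to the previous one; the function name is hypothetical.

```python
def extension_times(n_cycles, base_s=11 * 60, increment_s=5):
    """Extension time in seconds for each cycle, growing by increment_s."""
    if n_cycles < 1:
        raise ValueError("need at least one cycle")
    return [base_s + increment_s * i for i in range(n_cycles)]


times = extension_times(25)
total_hours = sum(times) / 3600.0
print(f"first {times[0]} s, last {times[-1]} s, total extension {total_hours:.1f} h")
```

The total is a useful planning figure, since at 25 to 38 cycles the extension steps alone account for most of the run time.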
The forward primer used in the reaction described above comprises an oligonucleotide that hybridizes to both authentic PKD1 and PKD1 homologue sequences. An example of such a primer is:
5'-CTGCACTGACCTCACGCATGT-3' (SEQ ID NO:11)
The reverse primer comprises a sequence derived from the authentic PKD1 gene and is not present in the PKD1 homologues. Thus, the amplification product in this diagnostic test is derived only from the authentic PKD1 gene. An example of a suitable reverse primer is:
5'-GCGCTTTGCAGACGGTAGGCG-3' (SEQ ID NO:14)
C. Test III
Long PCR conditions were used with a 4-part reaction mixture. Part 1 contained the following components:
3.3X XL Buffer              12 µl
dNTPs (2 mM each)           8 µl
Forward primer (20 µM)      1-5 µl
Reverse primer (20 µM)      1-5 µl
Mg(OAc)2 (25 mM)            4.4 µl
water                       to 40 µl
Part 1 can be assembled as a single reaction component or in batch (10, 50, 100 reaction equivalents) and then dispensed as 40 µl aliquots into individual reaction tubes.
Part 2 comprises carefully adding one AmpliWax PCR Gem 100 (or comparable product) to each Part 1 reaction tube. The tubes were incubated at 75-80°C for 5 min. to melt the wax bead. The reactions were then cooled, allowing the wax to solidify.
In Part 3, the following components were added to the cooled reaction mixture of Part 2:
3.3X XL Buffer              18 µl
rTth DNA Polymerase, XL     2 µl
In Part 4, the following components are added to the reaction mixture of Part 3:
human DNA                   0.2-1 µg
water                       to 40 µl
Twenty-five to thirty-eight cycles of amplification were performed, using a standard DNA thermal cycler and the following protocol for each cycle: 94°C, 30 seconds; 65°C, 30 seconds; and 72°C, 11 minutes. The 72°C extension cycle was lengthened 5 seconds each subsequent cycle. The primary PCR product can be analyzed immediately for mutations or, alternatively, can be used as a template for secondary PCR using a collection of paired amplimers to generate an overlapping set of smaller amplicons. The smaller amplicons can then be analyzed for mutations.
The forward primer used in the reaction described above comprises an oligonucleotide that hybridizes to both authentic PKD1 and PKD1 homologue sequences. An example of such a primer is:
5'-ACGTTGGGCTCCTGGGCAACC-3' (SEQ ID NO:12)
The reverse primer comprises a sequence derived from the authentic PKD1 gene and is not present in the PKD1 homologues. Thus, the amplification product in this diagnostic test is derived only from the authentic PKD1 gene. An example of a suitable reverse primer is:
5'-AGGTCAACGTGGGCCTCCAAGTAGT-3' (SEQ ID NO:13)
For RT-PCR, first strand cDNA synthesis is performed using the reverse primer (SEQ ID NO:14) and SuperScript II™ according to the manufacturer's recommended conditions (Life Technologies, Inc., Gaithersburg, MD). PCR is then performed using 1-50% of the first strand reaction under the reaction conditions described above, with the modification that the extension cycle is conducted at 72°C for only 6 min. (due to the smaller product size).
D. Test IV
To analyze PKD1 mRNA for mutations, RNA is isolated from the white blood cells as a requisite template for RT-PCR. Whole blood samples collected in high glucose ACD Vacutainers (yellow top) were centrifuged and the buffy coat collected (4-20 x 10^6 cells/10 ml of blood). RNA can be isolated directly from white blood cells or after standard short-term culturing of white blood cells in the presence of a mitogen such as phytohemagglutinin (48-72 hours). RNA is isolated using standard conditions such as guanidinium isothiocyanate:acid phenol extraction (Chomczynski and Sacchi, Anal. Biochem. 162:156-159, 1987).
For RT-PCR, first strand cDNA synthesis is performed using the reverse primer (below) and a commercially available reverse transcriptase, such as, for example, SuperScript II™, according to the manufacturer's recommended conditions (Life Technologies, Inc., Gaithersburg, MD). PCR is then performed using 1-50% of the first strand reaction under the reaction conditions described below.
The reverse primer comprises a sequence derived from both the authentic PKD1 gene and the PKD1 homologues. In contrast, the forward primer is specific for the authentic PKD1 locus and will not allow amplification of cDNAs derived from the homologous loci. Thus, the resulting RT-PCR amplification product in this diagnostic test is derived only from authentic PKD1 RNA.
The forward primer used in this reaction comprises an oligonucleotide that hybridizes only to authentic PKD1 and not to homologue sequences. An example of such a primer is:
5'-AGCGCAACTACTTGGAGGCCC-3' (SEQ ID NO:[illegible])
An example of a suitable reverse primer is:
5'-GCCAAAGGGAAAGGGATTGGA-3' (SEQ ID NO:16)
The amplification aspect of the RT-PCR reactions was performed using standard conditions as described below, including a "hot-start" step:
Taq Buffer                  8 µl
dNTPs (2 mM each)           7 µl
Forward Primer (100 µM)     0.4-1.5 µl
Reverse Primer (100 µM)     0.4-1.5 µl
DNA                         0.2-1.0 µg
water                       to 80 µl
Amplification was initiated using a single "hot-start" step, followed by twenty-five to thirty-eight cycles of amplification using a standard DNA thermal cycler. The single "hot-start" step consisted of 80°C for 3-5 minutes, after which time 1 µl of Taq polymerase was added to each reaction tube. Each subsequent cycle consisted of the following: 94°C, 20 seconds; 64°C, 30 seconds; and 72°C, 2 minutes.
The primary PCR product can be analyzed immediately for mutations or, alternatively, can be used as a template for secondary PCR using a collection of paired amplimers to generate an overlapping set of smaller amplicons. The smaller amplicons can then be analyzed for mutations.
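When secondary amplimer pairs are designed, primer composition is usually checked against the planned annealing temperature. The sketch below computes GC fraction and a Wallace-rule melting estimate, Tm = 2(A+T) + 4(G+C) in °C. The Wallace rule is a crude approximation that is not used in this specification and is least reliable for primers of about 20 bases or more, so the result is only a rough sanity check. The function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Wallace-rule melting estimate in °C: 2 per A/T plus 4 per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Test IV forward primer as given in the text.
forward = "AGCGCAACTACTTGGAGGCCC"
print(round(gc_fraction(forward), 2), wallace_tm(forward))
```

More accurate nearest-neighbor models exist and would normally be preferred for final primer design.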
The PCR and RT-PCR products obtained above were analyzed for the presence of specific PKD1 mutations as follows: 8 µl of the amplified products were added to 50 µl of a denaturing solution (0.5 mM NaOH, 2.0 M NaCl, 25 mM EDTA) and spotted onto nylon membrane filters (ICN Biotrans). The denatured DNA was then fixed to the nylon filters by baking the filters at 80°C for 15 minutes under vacuum.
Oligonucleotides that detect PKD1 mutations were chemically synthesized using an automated synthesizer and radiolabeled with γ-32P using polynucleotide kinase, by methods that are standard in the art.
Hybridizations were carried out in plastic bags containing the filters prepared above, to which one or more labeled oligonucleotides were added in a hybridization buffer (tetramethylammonium chloride (TMAC), 0.6% SDS, 1 mM EDTA, 10 mM sodium phosphate pH 6.8, 5X Denhardt's Solution, and 40 µg/ml yeast RNA). Oligonucleotide concentrations in the pools ranged from 0.03 to 0.15 pmol/ml hybridization solution.
Hybridizations were allowed to proceed overnight at 52°C, with agitation. The filters were then removed from the bags and washed for 20 min. at room temperature with wash buffer (3.0 M TMAC, 0.6% SDS, 1 mM EDTA, 10 mM sodium phosphate, pH [illegible]), followed by a second wash in the same buffer for 30 min. at 52°C. The filters were dried and exposed to Kodak X-OMAT film.
It will be understood that the enzymes and nucleotides used in the above reactions may be obtained from any manufacturer, such as GIBCO-BRL, Promega, New England Biolabs, and the like.
Example 6: Antipolycystin Antibodies
A. Production and Characterization of Polyclonal Antisera Against Synthetic C-Terminal Peptide.
A peptide (C)SRTPLRAKNKVHPSST (SEQ ID NO:17) representing the last 16 carboxy-terminal amino acids of the predicted PKD1 gene product was synthesized. A cysteine residue that is not predicted from the DNA sequence was appended to the amino terminus to facilitate coupling to KLH carrier protein. Two rabbits (A and B) were immunized with the peptide as described in Cheng et al., EMBO J. 7:3845-3855, 1988.
Polyclonal anti-peptide antisera were diluted from 1:10 up to 1:10,000, and immunoreactivity was determined by ELISA according to conventional procedures (Cheng et al., EMBO J. 1988, supra). Antisera produced by both rabbits were epitope mapped by the SPOTs method (Blankenmeyer-Menge and Frank, in INNOVATION AND PERSPECTIVES IN SOLID PHASE SYNTHESIS, Epton, R. Ed., Chapman and Hall Medical, London, 1990, pp. 1-10). Briefly, overlapping 8 amino acid long peptides were synthesized simultaneously on a cellulose membrane and assayed for immunological reactivity. Positive peptides were aligned and the epitope was identified by determining sequence homologies. Interestingly, antisera A and B had at least 2 non-overlapping epitopes each, thus increasing the possibility that these antibodies will recognize the PKD1 gene product.
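The set of overlapping 8-residue peptides used for SPOTs mapping can be enumerated with a sliding window over the 16-residue peptide. The sketch below assumes a one-residue offset between neighbouring spots, which the text does not state; the function name is hypothetical.

```python
def overlapping_peptides(seq, length=8, step=1):
    """Return every window of `length` residues, advancing by `step`."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]


# The 16 carboxy-terminal residues of SEQ ID NO:17, without the added cysteine.
PEPTIDE = "SRTPLRAKNKVHPSST"

spots = overlapping_peptides(PEPTIDE)
for index, spot in enumerate(spots, start=1):
    print(index, spot)
```

Two reactive windows that share no residues would indicate two non-overlapping epitopes, which is the situation reported for antisera A and B.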
B. Domain Specific Fusion Proteins
Four fusion clones were constructed to contain different domains of polycystin such that the correct open reading frame was maintained, as shown in Figure 15. Three of the expression constructs were cloned in the pGEX vectors designed for the expression of foreign sequences as glutathione S-transferase (GST) fusion proteins in E. coli. These are FP-LRR, which contained the leucine-rich repeat (LRR); FP46-1c, containing 83 C-terminal amino acids; and FP46-2, which has 77 amino acids internal to FP46-1c.
The fourth fusion construct was cloned into a maltose binding protein (MBP) vector, and encoded 205 amino acids at the carboxy terminus, thus overlapping two of the GST fusion proteins. The overlapping carboxy-fusion products provide an additional layer of antibody reagent confirmation. They allow one to verify that positive antibody reactions are not artifactual, since similar, if not identical, patterns of antibody reactivity should be seen with antibodies raised against these overlapping proteins. Two different 'carrier' fusion proteins also allow one to purify antibody raised against a fusion product using the alternate carrier protein as the affinity ligand. This helps to eliminate antibodies raised against the carrier protein itself.
GST fusion proteins were purified from extracts of transformed bacteria using glutathione-Sepharose (Pharmacia) as described in Smith and Johnson, Gene 67:31-40, 1988. MBP fusion proteins were purified on amylose resin (NEB, Beverly, MA).
C. Generation and characterization of polyclonal antibodies to domain specific polycystin fusion proteins.
Antibodies against the fusion proteins were raised in rabbits using published procedures (Cheng et al., EMBO J. 1988, supra) with 200 µg of protein. These respective antibodies specifically recognized PKD1 protein as part of the fusion protein construct used as immunogen (FP-LRR, FP46-1c, FP46-2 and MAL-BD-3). Further, after sufficient antibody purification, these antibodies did not bind the irrelevant antigens GST or MBP, nor did they cross-react with polycystin domains not present in the immunogen, which were included as controls.
In vitro synthesized polycystin protein was used to test the domain specific antibodies. In addition to the full-length PKD1 cDNA, two shorter clones which each expressed only a subset of the PKD1 domains were constructed in expression vectors as shown in Figure 16. The BRASH 7 clone contains the carboxy terminal epitopes, as well as the transmembrane domains, while SrfIA contains the amino terminus, the LRR, and the majority of the Ig-like domains. Both are efficiently expressed in the TNT in vitro transcription/translation system.
D. Immunoprecipitation
Antipolycystin antibodies were incubated with either Protein A Sepharose or Protein G Sepharose to generate antibody coupled beads. These beads were then incubated with 35S-labeled protein synthesized in vitro from the expression clones. The void and retained fractions were collected and analyzed by SDS gel electrophoresis. Sepharose alone was included as a control against artifactual binding, a concern due to the large size of polycystin, the presence of the large number of Ig-like repeats, and the lectin domain. Antibodies to irrelevant antigens were also included as controls. If the antibody specifically bound the antigen, a protein species of the correct molecular mass will be detected on the gel in the bead fraction. If not, the expressed protein will appear in the void volume on the gel.
Each of the anti-fusion protein antibodies coupled to Sepharose A specifically immunoprecipitated protein expressed by clones which contained the matching antigenic domain. The antibodies did not immunoprecipitate protein expressed from irrelevant domains of polycystin (domains not used as immunogen to generate that particular antibody), nor did they recognize other irrelevant antigens (luciferase). These results confirm that these polyclonal antibodies specifically recognize the carboxy terminus and LRR domains of polycystin.
Example 7: Identification of Proteins that Interact with PKD1
Further characterization of the PKD1 protein can be accomplished through identification of other proteins which normally interact with the PKD1 protein. Those of skill in the art are familiar with a variety of approaches useful for such purposes, including, but not limited to, immunoprecipitation of protein complexes using antipolycystin antibodies, screening of expression libraries with labeled in vitro synthesized polycystin, and use of yeast systems that exploit the interaction of DNA binding and activation domains.
For example, one such approach is the two-hybrid yeast system (Fields and Song, Nature 340:245-6, 1989; Finley and Brent, Proc. Natl. Acad. Sci., USA 91:12980-84, 1994) which enables the identification of genes which encode proteins that interact with PKD1. This technique relies on the fact that eukaryotic transcriptional activators, such as GAL4, function utilizing two essential and discrete domains, an amino terminal DNA binding domain and a carboxy terminal transcriptional activation domain (Ma and Ptashne, Cell 51:113-119, 1987). The two-hybrid system exploits the observation that a functional transcriptional activator can be generated even when the two domains are encoded by different hybrid polypeptides, so long as the spatial relationship between the two essential domains is similar to the native transcriptional activator. The yeast two-hybrid system has been used successfully to screen cDNA expression libraries in search of proteins that interact with Yin-Yang-1 (Shrivastava et al., Science 262:1889-92, 1993), E12 (Staudinger et al., J. Biol. Chem. 268:4608-11, 1993), H-Ras (Vojtek et al., Cell 74:205-214, 1993), Pr55gag (Luban et al., Cell 73:1067-78, 1993), p110RB (Durfee et al., Genes Dev. 7:555-69, 1993), and p53 (Iwabuchi et al., Oncogene 8:1693-96, 1993).
A. Hybrid Construction
Several constructs of the PKD1 regions as fusion proteins with the GAL4 DNA binding domain were prepared. The constructs were: a BD-3 fusion between the GAL4 DNA-binding domain and the cytoplasmic tail of the PKD1 protein (amino acid residues 4097-4302) using the pGBT9 vector; a BD-1 clone containing a DNA-binding domain and the LRR region of polycystin (amino acid residues 27-360); and a BD-2 clone which contains a DNA-binding domain and the region of Ig-like repeats (amino acid residues 713-2324).
B. Transformation of constructs into yeast
Competent yeast cells (HF7c), containing the lacZ reporter gene, are obtained by the LiAc method. Briefly, overnight cultures are diluted to an OD600 of 0.2 and grown for an additional 3 hr. Cells are collected, washed in H2O and resuspended in 0.1 M LiAc in TE. Competent cells (0.1 ml) are mixed with 0.1 µg of plasmid-construct DNA and 100 µg of carrier DNA. 50% PEG400 (0.6 ml) is added and the mixture is incubated at [illegible]°C for 1 h. Following this incubation, the cells are heated to 42°C for 10 min. and plated on minimal medium (Difco Yeast Nitrogen Base without amino acids, supplemented with auxotrophic requirements). Yeast transformants are selected after 3 days of culture.
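Diluting an overnight culture to the starting density follows the usual C1V1 = C2V2 relation. The sketch below assumes OD600 scales linearly with cell density over the range involved, which holds only approximately for dense cultures; the function name and example numbers are hypothetical.

```python
def dilution_volumes(stock_od, target_od, final_volume_ml):
    """Return (culture_ml, medium_ml) needed to reach target_od."""
    if stock_od <= 0 or target_od <= 0 or final_volume_ml <= 0:
        raise ValueError("all inputs must be positive")
    if target_od > stock_od:
        raise ValueError("cannot dilute to a higher density than the stock")
    culture_ml = target_od * final_volume_ml / stock_od
    return culture_ml, final_volume_ml - culture_ml


# Hypothetical overnight culture at OD600 4.0, diluted to OD600 0.2 in 100 ml.
culture_ml, medium_ml = dilution_volumes(4.0, 0.2, 100.0)
print(f"{culture_ml:.1f} ml culture + {medium_ml:.1f} ml fresh medium")
```

Dense overnight cultures are usually measured after a preliminary dilution so that the reading falls in the linear range of the spectrophotometer.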
C. Colony lift filter assay for β-galactosidase
VWR grade 410 filters are layered over agar plates containing transformants on selection medium and transferred to a pool of liquid nitrogen for 10 sec. Filters, colony side up, are placed on another filter that is presoaked in X-gal solution. After two hours, filters are analyzed for the presence of blue, β-galactosidase producing colonies (not shown). Alternatively, individual colonies from different transformations can be streaked onto the same plate and processed for β-galactosidase activity.
While the present invention has been described with respect to what are presently considered to be the preferred embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. To the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
Throughout this specification, unless the context requires otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or integer or group of elements or integers but not the exclusion of any other element or integer or group of elements or integers.
The reference to any prior art in this specification is not, and should not be taken as, an acknowledgment or any form of suggestion that that prior art forms part of the common general knowledge in Australia.
Page(s) [illegible] are claims pages; they appear after the sequence listing.
SEQUENCE LISTING
GENERAL INFORMATION:
APPLICANT: KLINGER, KATHERINE W; LANDES, GREGORY M; BURN, TIMOTHY C; CONNORS, TIMOTHY D; DACKOWSKI, WILLIAM; GERMINO, GREGORY; QIAN, FENG
(ii) TITLE OF INVENTION: POLYCYSTIC KIDNEY DISEASE GENE
(iii) NUMBER OF SEQUENCES: 58
(iv) CORRESPONDENCE ADDRESS:
ADDRESSEE: GENZYME CORPORATION
STREET: ONE MOUNTAIN ROAD
CITY: FRAMINGHAM
STATE: MASSACHUSETTS
COUNTRY: USA
ZIP: 01701
COMPUTER READABLE FORM:
MEDIUM TYPE: Floppy disk
COMPUTER: IBM PC compatible
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release Version #1.25
(vi) CURRENT APPLICATION DATA:
APPLICATION NUMBER: US
FILING DATE:
CLASSIFICATION:
(viii) ATTORNEY/AGENT INFORMATION:
NAME: LASSEN, ELIZABETH
REGISTRATION NUMBER: 31,845
REFERENCE/DOCKET NUMBER: GEN4-17.8
(ix) TELECOMMUNICATION INFORMATION:
TELEPHONE: 508-872-8400
TELEFAX: 508-872-5415
INFORMATION FOR SEQ ID NO:1:
SEQUENCE CHARACTERISTICS:
LENGTH: 53577 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
TGTAAACTTT TTGAGACAGC ATCTCACCCT GTTCCCCAGG CTGGAGTGCA GTGGTGTGAT
CATGGCTC,
TGTAATAAj
TTCGTGGG)
TTGAGACAC
CTGCAAcT'I
GATTACAGG
TCCATATT(G
CCCAAAATG
AACACAACAI
CTCTTTTCC4
ACTTACTTT(
CTCACTC=G
CTCCTGGGT]
TGCCACCATC
CAGGATGGCIl
ATTACAGGCA
CTGTCACCCA
GGTTCCAGTG
CATGCCTGG4C
GGTCTCGAAC
ACAGGTGTGA
CCCCCTTTTA
AAGAAACATT
TTCTCTGTC
TCACTCTTGT
CTCCTGGGTT
TGGCACCACC
CAGGCTGGTC
GATTACAGGC
ATTTCCAGGT
CTAAACAATT
TGCAGCGTCA ACCTCCTGGG CCTCCTGCAA TGTCTTTGTT GATGTTCTAT TTTGTTTTTG GTCTTGCTCT TGTTGCCCAG CACCTCTTGG GTTCAAGAGA
TCTACTTGAT
TTTCAAAATc
TGTGTGTGTG
GCTGGAGTGC
TTCTCCTGCC
ATTTTGTATT
CTGTAAACTT CGAGGGAAGG TTTGTATTTC ACAGTTTAGC TGTGTGTTTT
GTGTTTTTTT
AATGGTGTGA TCTTGGCTCA TCAGCCTTCC
GAGTAGCTAG
TTTAGTAGAG ATG7GGGTTTC
CACCCCGCTA
C GCCGCCACC G TCAGGCTGG C TGGGATTAC.
3TTCATAATAJ 2 ATTTAACAC( 3TTTGCAGTT.
r' cAccAGGcJ
'CATGCGATTC
*CCCAGCCAA'I
CAATCTCTTG-
*TGAGCCACTG
GGCTGGAGTG
ATTCTCCTGC
TAATTTTTGT
TCTTGGCCTC
GCCACTGTGC
GGCTAACAAT
TCCCTCACGT
TCACCACATG
TGCCCAGGCT
CAAGCGATTC
CCCAGCTAAT
TCGAACTCCT
ATGAGCCACC
AACTAATTTG
GCATTTTATT
CTCAAACTCC CGACCTCAGG TGATCCGCCC
ACCTCAGCCT
A GGCGTGAGTC r ATTCTACATA
-TTTTGCCTTA
V CCTGTCTTTT r' GGAGTGAAGT
TCCTGCCTCA
lTrTGTATTT
ACCTCGTGAT
TGCCTGGCCT
CAGTGGGGTA.
CTCAGCCTCC
ATTTTTAGTA
ATGTGACCCG(
CTGGCCTGGC
IJ
TATTCACTGT
CTTCTTCCCTC
GATTTTGTTT1
GGAGTGCCATG
TCCTGTCTCAG
TTTTGTATTTT
GACCTTGTGAT
ACGCCCGGCC C CTTCTTTAAA
C.
CCACAACCGC C
ACCGCACCTG
GACCATACCT
GGTTTATTTT
TTTTTTTTTT
GGCGGGATCT
GCTTCCCGAA
TTAGTAGACA
CCACCTGCCT
TTTTTTTTCT
ACCTCAGGTC I
CGAGTAGCTG
3AGACGGGGT T CTGCCTTGG C ETTCTTGTTT
C
'AATAAAAAC
C
;AACCAAACAA
'TTTGTTTCT 1 GCACAATCT C CCTCCTGAG
T
TAGTAGAGA C( CTGCCCACC
T
CCATGGTIT
ATATGTCTT TJ 7'rCAAACAA T(
GCCAATGTTC
GTTATGTGTA
TCTGGTATCA
TTTTTTTTTT
CGG3CTCACTG3
TAGCTGAGAC
CGGGGTTTCA
CCGCCTCCCA
VTTTGAGATG
~CTGCGACCT(
;GATTACAGGC
TTGCCACGT
CTCCCAAAG
TTTTCTCCT C CTCAGGTCT G
GATCTCTGGC
TGTTTTTTGA
AkGCTCACTG C kGCTGGGAT T 3GGGTTTCA C4 PGG CCTCCC A [CAAATAGT T' rCTATTTAA G ATTGAGAC T
TATTTTTGAG
GATAAACAGA
ATACTGGCAC
GAGACAGAGT
CAACCTCTAC
CACAACTGTG
CCATAcTGGc
AAGTGCTGGG
3AGTCTCACT
CGCCTCCCG
'ACCCACCAC
.GCCAGGTT
IGCTGGGATT
TTCTAGTTT
TATTMATC
ACATTTTATT
GATGGAGTC
AACCTCCAC
k~CAGGCGCG
CAIGTTGGT
V ATGCTGG
I'AGAATTTC
!LAATCCT'T
rGGTTAATC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920
TGTTTTGCTC
AATCTTAAGT
TACCATCCAC
GCTCTGTTGC
CTACTGCCTC
TTTITGTATT
TGATCTTGTG
CACTCCCAGC
ACTCTrTTTGcC
CTGGGCTCAC
CAGTTTGGCT GGTCTCCAAC
T
ACCGGAGTGA G
CTCATTTAAT
AATAAATAAG
G(
TCTTTCAATA AC CCCTCTGTCT
C]
GTCGGGAGGC
TC
AACCGCGGAA
GA
CTGGCCCACA
CC
ACGCACTTTA
GC
CTGCAGTGCG
AC
GCGTGGGGCG
GA(
AGCCTCCGGA
TGC
CCTGCACCCG
CC(
AGGAGCCGCG
C
GAGCGGCCTG
CC
CCATCCAGCC
CC
GCGGGCGGGC cT CGCCCGCC!CG
CCT
GCCCCGGGCG
CGG
ATTTGGC
CTGTCAT
CTGCTGn
CCAGGCT(
AGGCTCCC
TTTAGTAG
1TCTGCCC
AGTTCTT
~CAGGCGG4
CGCTCCT(
ATTTTGGC
CCTGGAC
CCACTGTG
AAGGGAAT
AATAAAT
CTCTCACi
.'CTCTCAG(
CGGGTAC
AGGATCAC
ACAGGAGA
CTGCAGCG
GGCGGTGC
~CTTCCGG
MCAGTCCC
CGCCCCC
G-GGCCCG
CCGAGCC
GCCGCC!
GGGACGG
GGCGCTG
TGCCGG
AGC AGTTTCTTGT GGCTGTq TTG ACTGCAATTA
AAAGCTG
~TG ACCTGGTAAA
TTTCTTT'
3GA GTGCAGTGGC
ACAACCT(
TA GTAGCTGGGA
TTATAGG
AG ATGAGGTTTC
ACCATGTJ
GC CTCGGCCTCC cAAAGTGC TT TTTCTTTTTT
CCATTTTT
3A GTGCAGTGGC
ACAATCAC
C GGCCTCAGCC
TCTCGAGT
~T AATTTTTGTA GAAAcGGG( 'C AAGGGATCCA
CCTTCCTCC
C CCTGCTGCA
AATTTCTT;
A ATTGTAGCAC
ACTTTTTC
GAATGGATGGG
GAATGAAGG
CATCAACCT~C
CATTGCCTG
SCAGGAAA~CCT
GGGGTAGGG.
P GACTCGGCC
GCGCACGGA(
GGTGGAGCCT
GTGGCTGCTC
AGGGCGGAGC
AGATGGCACC
GGGCGGAGCG
TGAAAAATAC
TTCCAGACGC
TCCGCCCCAC
ACGCCCCGCC
CTGCTGCCGA
TCATCGCTGG
CCCGGTCGCG
CCTCGCCCCG
TCC!GCCCCGC
C.ACTGCAGCG
CCAGCGTCCG
CCGAGCGGGC
GTCGCTCAGC
ATGCCGTCCG
CGGGCCCCG;C
CGGGGCCArG CGCGC GC=GC GCCCTGGGCC TGGGCCTOGrn -TCT TCCCTCCACT
GGAGTCCTTG
GGT TTGGAATACA
ATCGCAGCCT
TTT TTTTTTTGAG
ACGGAGTCTT
CTG CCTCCCAGGT
TCAAGCGATT
['CC CTGCCACCAT
GCCCAGCTGA
'CG CTAGGCTGGT
CTCGAACTTC
TG GGATTACAGG
CATGAGCCAC
TT TTTTTTCGAG
ACAGGATCTT
GG CTCAGCGCAG
CCACTGCCTA
kC CTGGGACTAC
AAGCGTCAGC
*T CTCGCCATGT TGCCCAGGcTr *C CCTCTCAjAjG
TTCTCGGATT
LACTGTCTGTG
CCTCAGTGAC
'A GACCTGTGA GATTcAATGG ATGTGGGTTTC
CTCCCTCTTG
T TCTCTCTC TT CCCCCTCTCT G~ GCTTC'GA#GC CA~cCGCTGC SATCGCGGGG
AAGGATCCAC
CAGGAGGAGG
AACCCGCCGC
CTGCCCACCG
CTTCCCGCCC
CTCGTGCTCC
TCGGCCGACT
CTCCCATCGc
CCCCGGGAAC
CCCTGTGCAG
CCCACCCTGA
CTGTGCCCAJA
GGGCGGAG
CCCGCGGG
GAGGAGGAGG
AGCGGCCGC
CGAGCTCCCG
AGCAGGTCGC
GGCCGCACC
CTGAGCTGCG
GCCTCCGC
CICTAACGA'rG
CCGCCCGCCG
GCTCCCGC CTrCCGc3G CTGCGGCCCA GCCCCCGGCG 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780
CCCTGCGAC
CCCCCTGCCT
CCGCCTGCC(
TCCCCGCGG2
GGACGGGCGC
CGCTCCCAGG.
GAGCCGCGGT
CACCCGGGGG
GGACAAACAG
AGAGGTGGGG
ATTTGATGGA
AAGAAGGGGG
GTGAGGGGTA
TCAGGCGCTC
3CGTCAACTG( SCGCCACAGCc
;GCGTGGGCGC
GGCGAGGCTC
TGGCACGGCC
CTCTGCAGAC
TCGCTAATTG
ATGGCTGTAG
TGTGCAAGAA
GTGAACGGTG
GGGGCAGGGT
TGCTATTGGG
TCGGGCCGC(
CTGTGAGTAC
GTTCCCTGGc
CGGCGCGGC.P
CCGGGGAGCC
GCCAGCGGGG
GAGAGGAATT
GGGGCGGCAG
TTGGGCTGAT
TGAGCAkAAGA
GGGAGGTGGG
TTCCAAGGCT
3 GGCTGCGGCc GCTCGGTCCC
GCGCTGCGCA
CGGGCCCAGC GGCACCCGGG
AGAGGCCGCG
CCGGGACGG AAGCAGGACG
CGGGCCAGGA
CGGCGGG4CCC TGCTAAATAA
GGAACGCCTG
GAAAAACCCC GGGTCTGGAG
ACAGACGTCC
GCGGGGCGCG GAGGCCGCGC
TCAGCTGGGA
GGGATCGGCC TGGGGCTGCG
GGGTACCCGG
GGAAGAGTTC CAGGAGGGT CTGGAAjGG GCTTAGGAAG GGGCGATGAG
GTGGGTCCAG
CCGTGAGGCT GGAGGCTGGC
CACGGGAGGT
CTCGCGGGrG GGCTGGGGTC
ATGAAGGGCC
ATCCTGAGAA CAGGGGTGAG GGGGGATTc CGTGGGGGGT TAAAGCCTTG
TCATGTTCGC
TATGGAGAC(
AATCCTGACI
GAAACAGG71
CAGGGGTTTC
ACTAAGATAA
CTTGTATTTC
ATCCCTGGGG
GGTTTTGTAA
TGTCCTrTGGG
AATGGAGCCC
TGTCTCGCGG
TTGCATCAGA
CCAGGCCGGT
TGGTGTCCAT
TGAGCGGAGA
CTGCCTTGTG
GTTGGTGGCC
3CTGCCCAGA(
PCTGACCATC(
TGGAGAGGTC(
GAACTATGAC
GCAGACAGTI
CCAGGCTCCA
GACGTGGCAC
CGTITCTGGG
GCTTTCTGG
GTGCCCCTCG
CTGTCTCTGT
CCTACCCCAC
CACTG2GGGAC
GAGAGCAGCC
GCAAGGCCCG
TTGTCCTCTT
TGTCCCAGCC
3 CCAGGTCTG
GAGGCATAG
ACACGACCTD
GTGCCCAGGI
GTCCCCAGCc
GGCTCCTCG(
ATCCCCAGGC
TCACTCCCGC
GTGGTCTCGA
GGGCCACATT
GGAGATGGCC
CCGTTGTTTG
TCTGTCCAGG
CCTCAGGAGC
TGTTCTCCAG
AGGCTCTGGT
TGAGCTGGCA
TTTCGGGAGA
GCCAGGCTCC
GACCGTGGAG
TCCCAGGCAT
CCCAGGGTTG
TAAAAACAAC
TGTTGGGGGT
ATTTGCATTT
CACAGCCGGG
GATTGAAAAG
CTGGGGAGAG TCTTGGGACC *e.
CGGGACAGTG
TTGCTAAACA
CTGTGGCCAC
GGGTIGGGAGA
GCTCCTGCGC
TCCTCCTGCC
TGATGCTGTA
GCCTGGTGGT
TGTCCGGGAG
GCCCTTGGCA
CCTGGGGTTTq
AGATTCCGAA
TCTCCTTGGG
TTGGGTGGGT
CCTTCCTTAG
AGAATGGGTT
TCCCTGACTG
TGGCAACAGc
GCTGAGGGCT
TCCTG7CTTCC.
AGAAGGGCGC
CAGCAGTGGA i
GGAGGAGGGG
TGCCAGGCCC
AGGTGGCCTT
CGTCATGCGG
CACAGATGAG
ATGTGCATAG
GGCGGAGGGG
AGTCTGATGC
TGCGTGCTGG
TCTGGCATTT
GGGAGCCGTG
CTCCTGGACc
CGGACGCGTG'
kCCCACAGAA 7CTC743TCTG
AGCACCTGA
'GGTGGCTGC
;CCCCCGCCC
7ACCCTGGGA 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 CAACAGGGCA CAGGGTGACC TCATGTGGGC AGGTGGGTC TGTTCTGTAC ACACCTGG 5640
CCGCCGCT
GGAAGGATI
AGCCATTT=
GGAAGAGGC
AAGGGTCC7
GGTGCCTCI
TTCCAGGTG
GCCGAGAGG
CTCGAAGCT'
AGAGCCTGC(
TTTTTTTTT'.
CGATCTCGG(
TCCCGAGTAC
TAGAGACAGC
CGCCCGCCTC
GACCCATGT'r CGCAATAcTG
TGTGTTCACA
AATTAGTTGA
TTGTCTCCAG
GTTCAAGCGA
CACCACCGTG
GG GAGAGTTC
GAAGGCCC'
rG CTGTCTAC( 3G TCAAGCqrGC ~A GAGCGTTGC 'C CACTTGTGG
AGGACAGCC
T GGTGGCTTT r CGTGGGGAG,
-CTGCCCTCCC
r TTTTTTGAGI
-TCATGGCAAC
CTGGGATTAC
GTTTCTCCAT
GGCCTCCCAA
TTGAACCAAA
CAGACCCACC
TTTTTGGTTT
TGTCTTTTTT
GCCGCAGTGC
TTCTCCTlGCC
CCCAGCTAAT
TO GAAGGTGGGG
TGAGGGGACC
TG GCTGGCCCCC CAGGCcACC X TGCAAACTCC
TCCTCGG.JGA
GAGAGOTGAAG
GACACAGATC
'T GTCTGGGTGT
CACCAGTAGC
CTCCCTGGCTG
CTGAAGCTCA
G GGGCTTCTGA GGCCACAGCC T GGAAAAGATG
GCGTACTGCA
A. CGTGGGCGA
GCCGTGGCTG
2TGCCCCGACC CTTCTCCCTC C SCAGAGTTCAC
TCTTGTTGCC
CTCCGCCTCC TGGGTTCAAG
C
AGGCGTGCAC CACCATGCCT
G
ATTGGTCAGG CTGGTCTTGA
A
AGTGCTGGGA TTACAGGCAT
G.
TTCCAGCCAC CCTTTTATCT
G(
TAACACAACA GACAGTTCCT
TC
AATAGTT'rGA ATTAAGAGCC
AX
TTTTTCTTTT TTTTTTTTTT
TT
AGTGGCATGA TCTCAGCTCA
CC
TCAGCCTCCC GAGTACCTGG
TA
TTTTGTATTT TTAGTAGAGA CG CATGGCA2 TCTGTGC2
GACGGCTG
ACAGCTGC
CTTCCTGG
GCAGGGAC,
TGCCTTGG(
k.AACGTGCJ kCTCACAGI
~TGACCCAM
LAGGCTGGA
'GCTTTTTC
GCTAATTT
CTCCTGAC(
AGCCACCA(
2 AAGCATT]
ATGCCACC
ATAAGGTC
'TGAGACGG
GCAACCTC
GCTGGGTT
GGGTTTTA
GGCCTCCC
k.AATATAT IrCAGTGG 3CCTCAGC
L'ATTTTTA
'AGGTGAT
~CCAGACA
LAGAATGA
'AGCTAGC
AC TAGGGCCTTA GT GGGGCAGCCC GG TTTTCCCCAG TG GCAGGTGTTC
GOGGCTCACGCA
1G CTGTGTCCAG ~T TAATGATGCT PG0 CTCTGCGTGG C CCCCCACc.Dc
'TGTTTTTTTT
OTGCAATGGCA
C TGCCTCAG~C
TGTATTTTTAG
C TCAGATGATC
GCCCAGCCCT
[TGGAGGGCAT
GAAGGCCTGG
CACACACTGC
AGTCTTGCTC
CGACTCCCTG
TACAGGCATG
CTGTGTTGGC
AAAGTGCTGG
ACTTTTTTTT
CGCGATCTCA
CTCTCCAGTA
GGAGAGACOG
CCGCCTGCCT
AAAATATATG
AGAATTCTAC
TGATGTCTmJ' 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 CAGGATGGTC TCGATCTCCT
GACCTCGTGA
GATTACAGGC GTGAGCCACC
GCACCCGGCC
TTTTTTTGAG ACGGAGTTTC
GCTCTTGTTG
CCTCACGGCA ACCTCCGCCTr
CCCGGGTTCA
GCTOGGATTA CAGGCATGTG
CCACCATGCC
GGTTTCTCCA CGTTGGTCAG
GCTGGTCTCA
TGGCCTCCCA AAGTTGGG
ATTACAGGTG
TGTGTCTTTA AGGCTGGTCA
AGCAAAGCAG
CTGGCTGTGA TCAATTCGT'r GTG7AACACCA TCTGCCCAcc
AATGTCTTTT
CCCAGGCTGG
AGTGATTCC
TGGCTAATTT
AACTCCTGAC
TGAGCCAACG
TAGGACTGGA
CTGTGCTTGG
TC4
AG~
CT
TG
CGC
AcC
TGTTTTGTTT
GATCTCGGCT
CTGAGTAGCT
TAGTAGAGAT
ATCTGCTTGC
TATTTATTTA
GGAGTGCAGC
TCCTGCCTCA
TTGTATTTTT
CCTCGTGATC
CTGTCTTTTA
ATTTGTTGAAC
CTCCATGGTG T CG'IrGGAGCC
G
GATGTCATCA
G
CCCTGGCTGG G
GCACATCGAGG
GCCTTGCCCT T AGCAGGCGCA G 4 GAGCGACCCC T GGCTGGCCAG G( AGGACTAGGG A' AGGGCTCCGT T
GAGATGGATGA
CCAGGTTTCA A TCGGGGGCAG
G
AGAGCCCTGG
CA
CCTCACCCGG GG
TGCCTGCGTGAG
GGACAAAAAT GT TGTTTGAGAC GGAGTCTGG CACTGCAGCC TCCATCTCC4 GGGATTAGAG GCGCGCGCC GGGGTTTCAC CATGTTGGTC CTCGGCCTCC
CAAAGTGCTC
TTTATTTATT TTTATTATTT AGTGCCATCT
CAGCTCACTG
GCCTCCTGAG
TAGCCTGGAC
1GTAGAGACG GGGTTTCACC 2TCCCGCCTC AGCCTCCCAA ATGTCCGAT GATGTCTAGG ;AAACTGGCT CCTGCAGCCT ~CACCTCCGT GGTGCTGTGA CTCTGTCACCC AGOCTGGAGG ACAATGGTG3T CGGGTTCAAGC GATTCTCCTG
CCTCAGCCTC
SCCACGCCCGG CTAATTTTTA
AAAATATTTT
AGGCTGGTCT TGAACTCTTG
GCCTTAGGTG
GGATTACAGG TGTGAGTGAT
GTATTTTATT
GAGATGGAGT CTCACTCTGT
TGCCCAGGCT
CAAGCTCCGC CTCCTGGGTT
CACGCCATTC
TGGTGCCCGC CACCATGCCC
AGCTAATTTT
GTGTTAGCCA GGATGGTCTG
GATCTCCTGA
AGTGCTGGGA TTACAGGCTT
GAGCCACCGC
AGCTTCCCTT CCTCTCTTTT
TCCTTGTGCA
GGATTTCTCG CTGTGTCTG
GGGGTGCCAC
GTGTGTGCTT TGTGTTTCT GTAAATTGGT
ACATCCCA'
CTCAGGGCC
GGTCACCTC
GACTTGGTP
TCCCACCTG
A.CCCCGGGG
%.GGCCTGAT
CACGGAGG
7TGTCACCA
LTCACTGGA
'GGGAGGG
,GCTCCCAC
GTAGGGCC
GCGAGCG
TCCCTGAG
AAACGGTA
GGACAAGA
r' TGTCCCAGA(
CCCTGCTCT)
GGTCTGCTGC
LGGTGCTTGG'I
CATCCCTGAA
CCCCTGCCTG
GCCTTAGAGc
GGCCCTGGCT
AGGCCTCCAT
GCCCGCGTTC
GGCTGGCCTG
AGGGAGCTGC
TCTGTCAGGA
GAGGCAGTGG
CCGGGTCTTA
GTTTCTTTAT
AGTCACACGC
3GTTGTCCTG k AAGGCCACr'
TGTCTCGCAK
TCACTGATG'
TGACAGGAGI
GGATTGGCO']
GCAACTGCCP.
CCCATTTCTC
GAGCCCTCAG
AACCAACACG
GAAGGACCCC
CCAGAGAGAG
GAGCCTAGGA
TGAGGACCTG
CGTG7GCTCCC
TAGACGCGGA
TCACTCCTGT
G CTGGCACTGG r' CTGGTGCTGG k. ATGCTG4GGG'r P AAAATATAGG
GTGTGGGAGA
CGGGGAAGAC
GAGACACAGC
GTCCCTGGAT
CAGAAGGAGG
CAGATGATTC
CAGTGCAGGT
TCCCCAAGGG
GAGGCCTGTG
CATCCTGCAT
GCACTCGGGCC
TGCAAACTCGC
ACGCGATTGC c
CCTAGGTGTA
TTGCCACTCA
CCAGGACTGG
AGCACCCGGG
GTGTAGGGAC
AGGCATTCTG
TTCCTTGGGG
CCTGAGAGCG
GCCACCCTCG
TCCAAGGACA
GACATTGAAG
GCAAGGTGAC
rCTTCTAGGA 3TCCAGCTGG
TTCAGAACG
~CAAACTN'GT
7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 GGGGGAAGGG ATGGGGAGGC TTTGGTTGTG TCTGCAGCAG TTGGGAATGT GGGGCACCCG 9360
AGCTCCCAC
TCGGCTTGC.
TTAAACATG<(
GGTGGATCG(
CTCTAAAAjAJ
TCAGGAGACI
ATCACACCAC
AAAAAAAAAA
GCCAACAGCG
CCACTGTCCA
TGGCTTGAGG
ATTAGTGGTG
GTCTATGAGG
CTAGGAAACG
r GCAGAGGCGA CTGTGGAGA kTCCAGATCAT ACAGGGAACA ACAAGGGcGTC ACTCACGCC~ TTGAGCCCAG GAGTTTGACJ TAAAkAGAACA TTGGCCGGGC GAGGTGGGAC ATCACTTGAG TGCACTCCAG GCTGGGTCAC AAAAATCACA GGATCTGAAC TGTGAGAAGA TGGTCGGCCT GCCGGGCGCG
GTGCCTCACG
CCAGGAGTTT GAGGCCAGCC GCGCCTGTAG TCCCAGCTAG TTGAGACTGC AGTTAGCTGT CAGAGAGCACC TGCAGGTCAT CCATGCAGTA CTATGATTCA ACAACAGACA
GGGACCCCGT
r GGAATCCCAG CAGTTTGG3cA
GGCCAGGGTG
CCAGCCTGG CAACAGGGTG
AGACCCCGGT
GTGGTGGTAT GCATCTGTGG
TCCCAGCTAT
CCGAGGAGGT CAAGGCTGCA
GTGAGCTGTG
AGAGCAAGAC CCTGTCTCAA AAAAAAAAA AGAGATTTCT CCAAAGAAGA
CGCACAGAG
CATTAGTCAT GAGGGAAJACG TAAATCAAAA CCTGTAATCC CAGCACTTTA
GGAGAGCAGA
TGGGCAACAT AGCGAGACCA
ATAAATAGAT
TTGGGAGGCT GAGGGGGGAG
GATTCCCTGA
GATGGTGCCA CTGCACTCCA
GCCTGGGCGA
AAAACAGGGT GGGCGCGGTG
GTTCACGCCT
GTCTTTAAAA
AAAAAAAAAA
GTAATCTCAG CACTTTGGGA GGCCAAGGTG GGGGGATCAC
AAGGTCAGGA
AGCCTGACtA
TCGTGGGCGC
GGGAGGCGGA
GCGAGACTCT
GCTACTTTGG-r
GGCAAGATTG
AAAAGGTCTA
AGCCCTCTGT
CTGGTCTCAT
ATCAGCATGC
GCCCTGGTTT
TTGCACTCCC
GCCCCTCCCAC
CCCACCTCTCC
TCTCCCTCCC
ACATGGTGAA
CTGTAATCCC
GGTTGCAGTG
GTCTCAAAAA
GGGCTGAGGC
CACCATTGCA
GGAAGAGTCC
GTTCTTGTCT
rTTAcACAcc
TCTGGGGCAG
b.TAGACAGAC
AGGGGCTG
:CTCTCCCTC
:CTCCCTGCC2
"GCCAGCCCC
ACCCCGTTCT
AGCTAATTAG
AGCCAATATC
AAAAAAATGC
AGGAGAATCG
CTCCAGCCTG
GCACCCTCTC
CTCCATACCT
AGGAAATTGA
ACCCCTGCAG
AGAGGTGGCA
k.GGGGCCCTG 2CTGCCAGCC kGCCCCTCCC
P'CCCACCTCT
ACTAAAAATA
GAGGCTGAGG
ACACCACTGC
TGAGCGTG4GT
CTTGAACCTG
GGAGACAGAG
CCCGCGGTGG
CATCACGGCA
GGCTCTTTGA
CCGCACAGGG
GTGGCGCTTC
CGCCCAGGTG
CTCCCACCT C k.CCTCTCCCT
CAAAAATTAG
CAGGAGAATC
ACTCTAGCCT
GGCGCATGcc
GGAGGCAGAG
TGAAACTCTG
CCACGCCGGG
CCGCAGGGTT
GAAGCCGTGG
TGCCTGGGGc 2GAGTCGGAC
AGCTGCTTG
TCCCTCCCTC
~CCTGCCAGCC
GTTTGTGACC
CGAGGTGTGG
ACTTGAAccc
GGTCAACAGA
TGTAGTCTCA
GTCGCAGTGA
TCTCAAAAAG
CTCCGCGCTG
GCAGCCACTC
TGATGATTTC
CCACACTAGT
rGCGATGTGC 3GTGCTGCCA
;CCAGCCCCT
CCTCCCACC
::ACCTCTCCC
9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 CCTCCCTGC CAGCCCCTCC TCCCTGCCAG CCCCTCCCAC CTCTCCCTCC CTCCAGCCCC TCCCACCTCT CCCTCCCTGC
CAGCCCCTCC
CTCCCACCTC
CCTCTCCCTC
CTTCTCTCTA
TTGCGCCCTG
GGTCCAGTCT
CCTGCAGGGC
TGTTAAAGTA
CACCTCTCCC
TCCCTCCCTG
CCTGCCAGCC
GTTTCCTGTT
GAGTCAGACC
CTCAGCCTCA
AGTGTAGCAG
GCTCTGTCGG
TCCCTGCCAG CCCCTCCCAC CCAGCCCCTC CCACCTCTCC CCTCCCACCT CTCCCTCCCT CAGTTTCAGG AAGGAGGCTG TGGGTTCACG TCCCAGCGCC GTTTCCTCAC
CTGTAAAGTG
TCACCTGGCT CAGCCACTGG TTCCCTCAGG GGTTCCGGGG
CTCTCCCTCC
CTCCCTGCCA
GGCTCATCCC
GGAACCCAGA
TCCACCTCTG
G43CTCCATGA
CAGCCCCAAC
GCCCATTCCC
CTGCCAGCCC
GCCCCTCCCA
TGCTGTGTCC
TGTAGGGAAT
GTGTGACCTT
TTAGATGCAC
AATCATACCT
CTGTCCTCCA
11280 11340 11400 11460 11520 11580 11640 11700
TGCACTGTG,
ACACTGTGC(
TGGACTCCT)
TGTGGGGGC']
TTGGTGCCTI
GAGCCGCGAC
GCCTGAGTTC
GTGGTGGCGG
TTGTGCCAAG
CAGGGCATCT
TGGCCCATGA
CAAATGTGGG
TGCCACTGCC
TGCCTGGGAT
ClrGGAGCTCT CCGC7VCTGT
GCACGTCTGC
GAAAATGCAG
TTGCCTAGGG
AGGCGGGCTG
ATTCCTGACC
GAAAGGGCTT
k. GACCTGCCC'
TGTGCTTAG,
k .TGGACCCCT(
"CACAAGACC(
"TTGACGCGTJ
TACCGTCCTC
CGCTCAGTGC
TGTGCGCTGI
CCTGAGCCTC
CTATGAGGGC
GTGGGTGATG
TCCCGCATCT
CTCGCTCCCC
CTGCTTT~CCA
GCAGTGGCCA
CTGGGCCTGG
GGGCACAGCA
CTCACGCCCT
GCCAGGGTGC
GGTCCTTTTC
TTAGCAGCGC
TGTGCCCAGG
T GCCACAGAGC k CCAGACACTG
AGCACCCAGC
GGCCCACTCC
[CTGGTGTGTG
ACTCCTTTTG
CCGCCCTGAT
CGCTGGTGGG
GACGTCCCCC
CTCCTGCTGG
CGCTGGCCAC
GCAAGCCCCT
CACCTTGGGG
GCAAGGAATA
CGGCAGCCCA
C
CCTGCACTCT
GCGGGGGCGCC
TTCCCAGAAC 1 CCAGGCACTG G
TCCCTGCAGCT
CATGATCTGAA
AGAGTGTAAC
GACGACGGGA
CTCGGTGCCT
TGCTTGTGCC
TGAGACGTGC
TTCTTITGAC
GTGCGGACCC
CACCGAGAGT
TTCCCGGCTT
1GCCGTCTCT
CATCTGGTGA
CCTGGGTCC
PGCCTCTCCCC
rACTTTGGAGC
;CCCGCCAAGC
;CTGCCTGCG C 'ACAGTCTCC C
£CCTCGCTCTT
TGGCAGGAGA
'CCCGAGGCC C
GACAGGCTGG
AGCCTGAGGG
GCCAGTGCAG
TCAGCGCAGG
TACATCTGGG
GGGGCTGGGA
GTAAGCTGGC
CGCTG4CATTC
CTTTGGGAGC
rCTGT'rGGCT
GTGGATCTCC
AGTGGCCGG
'CTAGGGTAT
~CTGCTCGTG
;GAGACACAC
'ACCCTGGAA ~TCCAGCTGG C
TGCAGAGTG
CATGGCTTG.G
AGGGCTACA TI AGCCCTGGC C
CTTCTGTGAG
CCTTCTGGTT
TGAGAGCCAG
CCTGGGCGGG
GCCGCGTGGCc
TGTTTGCCCA
AGTGTTGGCA
GAGTGGCACT
TTGCTGTTAG
TTTGGGGAGG
CTTCTGAGGC
rCTGCCATCC
GCACCGCTGC
3GGGTGGTTC
GGGAGACCC
kTGTTCTTTT
.GAAAACATC
:TGAGGCCGG
LGCGCAGCTG
~CAGCTGTCC
CTGGGGCTG
CAGCCTGGC
GCCACCTCA
GAGGCAGGA
11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 CAGAGGCGGA AGCCAGCTCT ATGAGGCCAG CGCTGGGCAA GCCCATGCCC AGGGAACGTC ACAGCTGTGG GAGTACAGGG 13080 GCTCCGGGTT CTGAGCCCGT
CCACTGTGCA
CATCATTGGC TlGTGCCCACA
GCCGAGTIGGG
GTGCTGCTGC CCTCTCCAGG
GCACTGCTGT
TGCTTGCCCC CCCCCCCACC ATCCTCTTCC CCCTGCTGGC TGCCCGGGGA GGTGTTTGGG TGAGCCTCAG TGTGCCCATC AGGAGCdTAA TGAAGGGTGG GAGTGGAGAG GGATGCAAGG CGTGCAGGGG CACCAGGAGC
CGGCCCTCAT
CCCCAGCGGC AGGTAGCATA CGTATGAAGC CGTCTCCAGA CGGCCCTTTT
TGAGCTGGCTC
CGCAGACTCC CCTTTCTCAT CTCCCTCGTT
C
TGCCTGCCTG TGGGTCCCGG GAGGACCTGA CTGGGGACAG TTCAGCCGTG GGAGGGATCT G TCCCCAGCTA GTCTCACACC CCGTGTCTGG G
TCGTGGCC(
TGATGGGAJ
GCCCGCACZ
TACCTTGC~j
GGATGGTGT
GGTCAGTGC
GGGTCACAAi rCTCCCCTr% 3CTCTCCTT(
TGTTT'ITC(
AGCCTCCG.P
;GCTGCCCA7
TAAGGACAG
ACCCAGAGA
GAGGCTGCC
CATIGGAAGT
rTCAAATGA rGGAAGCCG ;TCATGGCT
CCCTGTCC
T GGCCTCAGGA
TGGCTCGTAC
T CCGGCTGCCC
CGCTGGATCT
3 CCGGGCGCJAG ATGGCCAGTT P' TCCTCCATTG
ACACACTOGA
7GGGGGAGGAG GAGGGCCCCT GCACCTGCCC
ACACAGGCTG
GCCTGGCTCC
ATGTCAGCTG
AACTGGAAGG
GTGGCCCCGA
CTACACCCCA
CAGGTGGCT
ATCTGTGTAG
GCAAGGACAT
GGCCGGAGTC TCCATCCCTG GTCACCCCCG
GCATCTCATC
AATGCCGCTG
AGCCTGGGGC
CCCTCGTGCA GGGCTCTGTT
GCTTGGGGCC
CAGGGTTGTG
ATCTGCCCAG
CCGTTAATTT
AAGGAGCACA
GGGCCGCAGT
GCCCATTTCC
GGCGGAGGGG.
CCTGGGTCTG
GGCGGCCCCT
GGGGTGCCGG
TTCT'rrGGGG
GCCTGGGGCAC
TTCCCTCTGC
TCAGTCCCAC
TI
AGAAGGCCCA
TGGCAGCCTC
GCCGGCCTCC
CCCGGGGCTG
AAACAGGATC
GCGCAGAGGG
GGGGGTGGCA
TCACCCTGGG
AACCAGCTCC
GCGTCCTGTC
2CGAGCCGGT 3CCTGGCGCAC 3CTCCGTCCGC
TCAGCTCTG
GTGGATTGG G 'GTTGCATTC C
CCCCAAGACT
GTCCTGTATC
CTG4GCCCTCC
CACTGTTTTT
ATTTCCGGCC
AAACAGATGA
CTG3CCGCCTG
CCTGCTCTCG
I'CCAGGAGAG
2rGCCCCTGC
XCCCCCAGCC
ACCCGCTGC
CCCCCAGAC
LGTGTGAGAC
IAGGGCCCCG
CCGACCCCGC
TCCCGGCCC
A
TC
GGAGGGACGG GCCTGGGGGT
GACGGGGCCT
CGAGGGAGGA
CAGTTTCCAG
rGACCACACG
CCCACTCAGC
['CTGAGGCAG
3GAGCTGCCC
~CTCCCCCG
.'GCCAGCCTC
GGCAGGGGGC
GCGGTTACAT
TGAGGCGGCC
TGTCTGGAAT
ATCTGGTCTG
GTCTGGTTTG
CACACCCAGG
CCCAGGAGCG
ACCCCCACcC
GGTAGGCGGA
GCACCGTCCC
CCTCACTCTC
CGGCTGGCCC
TCTCCAGCGC
CTCGTGGGGC
GCCAGCAGcc
AAACTGCAGC
CTTAAATAGA
AGCGAGGAAG
CCACACTGCA
CCTCTTGCTG
ATGGGGCCTC
P.AGCTCCGCA
A.GCGCGGGCG
GTGCAGATGT
SGGAGTGGGC
rCTGGGGCCA
;AAGTTCTCC
CTGTGGGGC
13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 CCAACTGTGG GCAGAGCCCA GGGGGAGGGC AGGAGAGCCA GCGCCTGGCT
GGGAACACCC
CTGAGGGGCC GAGGCTCCAG GGCGAGGGGG CCCGACCTGG
GGTTCACACG
GGGCAGACCC GCTGCAGCAT GAGACACGTG TCAGCTACCT
CGGGCCGGCA
GCTGCCCACA GCCCTGGGAC GTGGCCCCAC CTGTGACGGG
TGTGGAGGGG
GCCTGGCCAC ACCCTCTGCT GTTGCTGCTC CTGCTCCAGG A ITGGCAAGG GGGGTGAAGA CCCGTACTGT GOCCACACAC CTGGGACTTC
CTTCTCCACC
CCAGCAGCCG CTAAGGAGCC CGCTGGGTCC CACGCTAGGA
TGGTCCTAAC
TTCCAGA'rCG GACGCTCGGC.GCTGGGGACC CCTTGTGTCC
CGGGGCTGGG
GCCCCCATGG GGGTGTACTC CTCCCGACAA GCTTGGCTTC
AGCTTCCCTG
CTGGCCCTCG GGCACCCATC AGGCTGTCCC TGTGCACCTG
GCTCCCACCC
TAGCAGGAAC TGGGTGAGG AGTGCGTGGG GCAGCAAGGG
CCTGGGACCCC
TGCACTCTGC TCTGTGCTCT TGCCTGGGCT TAGGGCCGCT CGGTGGTCCT
G
GCCTGGGCCC TGCTGTGTCC CCCATCCTTG CAGGGAACCA GAACGTG7GGG
G
AGACAGCGGC GATGATGTCA CCTGGCGGGT GCAGAGGAAG
CCCGAGGGGCG
GGCTGGCGCG AGGCTGCCTG GCTAGGCCTT GGCGTTCCCC CAGAACGGCG
A
CAGATGGAGA CGTGAAAAAG TACGGGAGCA AGCGAGGTGA
GGACTCCACGG
TGCTGTTCC(
CCAGGGGGC)
TGCACCTGGC
GCCGAGCGGC
GGAGAAGGAC
GCAGGGCTGC
GGGATGGGCG
CTGGCTTGTC
GCCAAACATC
CCCCATGCCT
TGTCCCTTCC
AGGGCTCCTG
CGCCGCAGCC
TCCCCCTCGG
AGAATGGCCT
-TGTCCCTGA)
k. CCTGAGTCC9J
-TCCTGTCTGC
GTCCCCAAGz
CCAGCCTGGA
CTGCTGGCGT
GGGAGCCGGG
AGCTCCCAGC
TTrCAGGCTTT
GGGGGCAGGT
AGGGGGCCTG
CGCCTCGGGT
GCTGGAAAGT
GCATGGTGCT
CTTGCTGGTC
IGCCCACACC'.
ACCCAGGGCJ
GCCCCAGCTo *CCT'rGCTGCz
*GCCTGGCACC
GGAAGAAGTG
GGGCTCTGGG
AGCAGCCACT
TCCTTCTTTC
GCGAGAGCCT
CCTGTGTTCA
CCCGGTGCCr
CCCTCCTCAG
TGGGCAGTGG
TCCCAGCCAC
r GAGTCCTGCC k. GACGCTTCCP
CATTCCACTC
TrTCTGGGCC
CAGGGAGTGC
TCCATGGCAC
GTCCTCGGCT
CTTGATGGAT
CTTTCTCCCG
GTGCCCCTCC
CCGTGGCCTC
CATTTCTCCC
GTCTAACTGC
GTGTGAGTCC
CACCCTGTCC
CAGGGCAGAT
LCACCCTGGGG
CCCTGGGCCC
TTGGGCTGGG
ATGGCCAGAA
CCCCAGGCCT
GACCTGCCCC
TI'TCCAGAAA
TGGCCTGGGT
CTGGGGCAGT
TGCAGCACCTC
TAAAGCATTGC
AGTTCCTCACC
AGCTGCCTCA C CACCCCACGG
C
G4
G]
cc
GG
CIA
PT
CCCGGGTGGC
GGCTGGCCCT
CAGCCTCCAG
GTGCTGGAA
CAGTGGTGCC
TCCTCCCGCC
GCACCGTCCT
GGAGCACATC
PTCCAGCTCA
'AGAGGACCC
;CTGCCAGAT
CAGGGCArC2
GGGTGGGGG
TGGCAAAAG
GGACCCCTG
~TTCCACAC
~TGGGGGAC
;GGAGCTCG
."GGGGCCG
:GGTGAcAG
TTCACAGT
CCCCTGCC
GAGGGT
GAGCTGCT
CACAGCTG-
CGCCCCTT
rCTGCTGC 2ACAGTGT
-=TTCTCGI
15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 GTGGATGCCT AGCAGCGCGG CTGTGGGCCC ACCCATCCTT ATGGGCAGTG
GGGAGCACCT
1 CAGCCCGTGT CCCTACCTTG GTGTAGAGGA GGGGACGGCA GAGAAGCAGG
GTTCAGTTAG
GGGGGAAGTG GTGGCCCTGC CGGAGGGGCC GTTCCCTGTG TGCCTGGCCC
CCAGATCCTC
TCCCCTCCCG GAGCCCAGGG CACAGGCATA GGCTCTCTGA GTGTCCCACA
GCCCCTGGGG
GAAGGGAA'CT GCACCCCCAA CCGTGCCCTC CATCCGCAGA TGGAACGAGA
AGCTCCGGGA
GCCAGTGCCC AGCGTCTCAT CTGTCTGGGC ACCCAGCCCA GGTGAGGGCC
TGGCTCCACC
GTCCGTGGCT GGTGCTGCTT CCTGGCACGG AGAAGGCCTC GGCTGCTCTG
TCCCCTCAGC
TGGGGTGGCC TCTGGTCCCC TTCTTTGTTG GTTCCCTTCT CAAGCTCTTG
CCCTGGCCCC
GGGCCCCACC GGGCAGCCTG TGTGTGCGTC TCTCCTGCGC CGGGTAGGCT
CCTGTGGGAG
CGGAGCTCCG GTGGGAGGAG CAGGGCTGGA GGCTGGCAGG GGCTGOGCGG
GTGTTCAGGG
ATGGAGGCCG CCCCGGCTTG GGGCTGGCTG CCGGGTGGTC ATTGCTGGGA
AGAGCAAGTC
TAGGCGGAGG CACCTGCTGG GTCACTCGTG GGGAGGGTGA CACCTGGGGA
AGTAGAGCC
CGTGGCAGGA GGTGAGGCCT CGGGGTCCTG GGGAGCAGGG GGGTGGTTG
CAGACCTIGCG
GAGCCATAGT CCTGTGCCAG GAGCACTACT GGGAGTGCGT GGGACCAGGA
GGGGTGCCCA
GGGTGGGCGG CAGAGTGACC CCCGAGGTGC TTGAGGCCGA GGGGAGcqG AGTTCTCG3T TTGCCCCAGC TCTCTGTCTA CTCACCTCCG CATCACCAGC TCCAGGACCT
GGTTTGTAAC
TCGGGCAGCT CTGAAAAGAG AGACATGCTG CCGCCCTGTG GTTTCTGTTG
CTTTTTCTTC
16860 16920 16980 17040 17100 17160 17220 17280 17340 17400 17460 17520 17580 17640 17700 17760 17820 17880 17940 18000 18060 18120 18180 18240 18300 18360 18420 18480 18540 18600 18660
ACTGACTACT
GCCTGGTTTT
CTCCATCCTT
GGTGCTGGAG
TGATTCTCCT
GACATGGGAT
TCTTTTTTTG
TTTTTTTTTA
TGCAGGGGTG
GCCTAAGCCT
GTTTTTCCTA
TTTTTGGAGT
TTTTTATTTT
CGATCTCAGC
CCTGAGTAGC
ACTAATTCTG
CTCCTGACCT
AGCCATTGCA
TTTTCTTC
CAGTGGCGTG
CTCAGCCTCC
ATTTTTAGTA
ATGATCCACC
GGCCATCTTT
TAGTTTTGGT AGAGACAGGG
CAGGTGATGC
CCCGGCTCTT
TTTTTT TT
ATCTTGGCTC
TGAGTGGCTG
GAGACAGGGT
CACCTTGGCc
CTTTCCTTGC
GCCCGCCTCA
TCCCCTTCTC
CTTTTGAGAT
ACTGCAACCT
GCACTACAGG
TTCACCCTGT
TCCCAAAGTT
1 .TICCTT.L
CGGCTGTGAC
TTTCTCTTTC
TTGAGATGGA
TCACTGCAJAC
TGGAATTACA
TGTCTCCGTG
GCCTCCCAAA~
CTTTTCTTCT
GGAGTCTcGC
TCGCCTCCCG
CTCCCGCCGC
TGGCCAGc3AT(
CTGGCATTAC
CAATTGTGCT
TTTCCTCCCT
GCTTCACTCT
CTCTGCCTCG
GGTGCTTGCC
TTGGTCGGTC
GTGCTGGGAT
CTICTCTCCTC
TCTGTCACCA
3GTTCACGTG
ATGCCCGGC
;GTCTCGATC
GGAGTGAGC(
TCTTCTAATT
CCCTCTCACC
TGCAGGATGG
CGGGTTCAAG
ACCACGCCCG
TGGTCTTGAA
TACAGGCAGG
CCTTTCTTTC
GGCTGGATTG
ATTCTCCTGC
TATkIT'T''GC
IVTTGATCTC
:ACCGTGCCC
TTTTCTTTCG AGACCGGTC GCCCAGGCTG GACTGCAz~GTG GCACAATCAT AGCTCACTGC AGCCTCGACT
TCCCTGGCTC
AAGCCATCCT TCCTCCTCAG CCCCCCGAG.
CTGGCTGATT CTTTTTTTCC
TTGTAGAGA
CAAACTCCTG GCCTTCCCAA
AGCACTGGG
CTTTTCTTCT TTTTAACTGcG
AATAGTTGA,
TATTTTW3C CTTTAGTATG
TCGTGTAAG'
TTTCTAATTT TA TrTATATT TTGCGTAGA) TGGTCTTTGA TGTTTTATTT
ATTAATTATC
TCGCCGTTTC ACCCAGGCTG
GAGTACAGTC
TCTCTGGGCT CAAGTGATTT
TTCTCTCCTC
ATGCCGCCAT GCCTGGCTAA
TGTGTATTTT
AGGGTGGTTT CAAAATCCTG
GGCCCAGGCG
GTTACCGGCG TGTGCCCAGT
GCCTGGCCGT
CTCGAGGTGG CGCCTGCTCC
CCTGTGCTCC
CACAGTCATA CCTGGTGrG
GTCCCACAGT
ATGGGGGCCC CTCGAGTCTG
TGTGGGGGCT
GGGGGACTGT GGACAGGGGA
TGGGGGGCCT
T AGCTGGAACT
ACAGTTAC,
,T GGGGTCTTGC
TATGCTGT(
T TTACAGGCAT
AAGCCACCI
C GTTTTCTTTA
TTAGCTGTG
r TGCTAGTGCT
TTTCTGAGA
k. GTTGTGTATT
TTAGATGGA
TATGTATTTA
TTTATTTT'P
ATGCGATCTC
AGCTCCCTG'
TACCTCCCGA
GTACTTGGG)
*TTGTAGATAC
GGGGTCTCAC
ATCCTTCCGT
CTCAGCTCCC
CTTGGAGGTC
TTGTTTCTCTI
CTGGTAGCCT
GGTAGTGAGC
GGGACCACCC
TGTTGGGTTC
GTGGACAGG
TTGGGAGACC
TGGCCCTGCG TGGGATGGGT ~C ACTACCATC C ATCCTGGTCT *C ACCCAGTTTC
GTCAGGAGGG
,T TGTAGTTTGT
GTTAGGTCGC
GAGGTAGAGTC
r AGCCTTGACC SCCCCAGGCt3C
TGTGTTGCC
ACGGTGCTGT
GGGTTTATGC
CTGCTI'CTCA
AGAACAGGAG
ITGGCTCTGT
TGGGGGTCCG
CTGAGATGGG
AGTGTTCATT
AAGGGTGGGA
CTCTGGCCTA
CCACAACCTG*
GCTGTGAGTG
CCGCCTCTTG
CCAGGGATAT
TTAATTTAAG
CCCCAATATC
rcClrGGGGGC 3AGGCGGTGC 7TTCCCACCA. 'CGTTTGAGT 18720 18780 18840 18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19500 19560 19620 19680 19740 19800 19860 19920 19980 20040 20100 20160 20220 20280 20340 20400 TGCCCTTCCT GGCCCTGGGT GGACAGGTCC ATGTGGCACT CGGCATAGGG
C
C
TGCAGAGGGc
TGGGTCTTCC
GGCCATAGCT
CTAGGGGCTC
CTCCGGGCGC
TCCCCCAGTC
CTGCTCAGCT
AAGCAACAAC
TGAAATGTAA
CCTTCTGCCC
AG'DTCACTGC
TGTCAGGGTG
GACCTGGTCC
*TGAGGCCCCC
*ATCAGAAAGT
TGGGGATGCT
TGGCCCTGAC
TGGACGTTGG
GTGCCAGCAT
GTGGGGGCTT
AAGATTTCTA
GTTGTGGTrC
TCCCAGTTGG
CTGCTTGGAG
GCTCCAGGGC
CCAACACCTG
CCCCTCCTGA
GGCAATGT
CCACGGCCAC
GCTCCTGGCG
GCGGGGCTCA
CCATCAGCTT
CGTTAGAAGA
TTTGGGTGGG
TCCGTGTCCC
CCCCCCAGTG
CTGGTTGCCA
CCCCTGCCCT
CCTCTGGGAG
GGGATGGGCC
TCACTCCTCA
AACCTCTCGG
CTCCGGGTGG
TGCCGAATCc
AGGAATATTT
GTCCTGGCTG
CTTCCAGGCT
CCGGCTTGGT
GTGGGGGGCT
GCAGAAACCT
TGGGGAGCTC
CAGGGAAGGCc
GAGACGTCTC
CGCTGGCAGA
GCTGGCGGCA
CCCGTCTCTT
GCTAATTTAT
GACCCCAGGC
TGAGACCAGA
TGGGGCAGGG
GGCATAGACC(
AGGCCTCTCC TGGCTTGGr'r
TCCCCAGAG
GTGACTGTGG CCTGGCGTGG CTGCCGCGAT GGGCGGAGGA GCAGCAGGTG CGGTGtTGC 20520 AGCCCGAGC AGCCACGTGT GCTGGGCCTG
GCTCCCTGGC
TCCCCTTGCT GGACAGTGGC TGTGGTGAGT
GCCGGTGGGT
CAGCCAGT GGACCTGGGC CCTGCAGACA
CTGGGCAGG
GGGGGCCTCC GGGCCAAG AACAGCATGG
GAGCCTGTGA
GGCGTGGGGT GGAGCCAGGA GGAGCAGJAC
CCGGGGTCCA
AGGAGTATGT CGCCTGCCTC CCTGACAACA GCTCAGG4CAC CAGCTGCCCA CGAAGGCCTG CTTCAGCCAG
AGGCCTGCAG
GCCAGGGCCT CGCAGCCCTC TCGGAGCAGG
GCTGGTGCCT(
CCAGTGCCTC CTTTGCCTGC CTGTCCCTCT GCTCCGGCCC
C
CCTGTAGGGG CCCCACCCTC CTCCAGCACG TCTTCCCTGC
C
TGGGGCCCCA CGGACCTCTG GCCTCTGGCC AGCTAGCAGC
C
TCCCTGTCAC TGCCACACGC 7GGACTTCG GAGACGGCTC
C
GGCCGGCTGC CTCGCATCGC TATGTGCTGC CT'GGGCGCTA
T
CCCTGGGGGC CGGCTCAGCC CTGCTGGGGA CAGACGTGCA
Q(
CCCTGGAGCT CGTGTGCCCG TCCTCGGTGC AGAGTGACGA
G~
AGAACCGCGG TGGTTCAGGC CTGGAGGCCG CCTACAGCAT
CC
CGGCCCGAGG TGAGTGTCTG CTGCCCACTC CCCTTCCTCC
CC
CAGAGCCTGG TACCCCCGTC TTGGGCCCAC ACTGACCGTT
GA
TCTCCAGCGG TGCACCCGCT CTGCCCCTCG GACACGGAGA
TC
TGGCCAGCCT
CTGCTTGGCA
GGGGCCAGCT
CTGTCCTTCC
CTCAGGAAJGG
CCTCTCTGGG
GTGCGGCGGG
CGGATGTGGG
GTGGCTGCCT CTTCTAGG2YG CGTGGCAGCA
GTGTCCTTTT
CGCCTTCTGC
TTCTCCACCG
'TGTGGGGCG
GCCCAGCCCT
CCGCCACCT
CCTGCCCCCA
TCCCCAGGG GCCAcccTGG 'TTCCACATC
GCTGCCCCGC
GCCGAGGTG GATGCcCG CACGTGACG GCCGTGCTG 3TGGAAGCG
GCACCTGCCG
k.GCCTCGAC
CTCAGCATCC
TGGCCCTG GGCGAGGAGC *AGGGCCAT
CCAGATGGG
CACCCTCG TTCCCACCGG TTCCCTGG CAACGGGCAC 20580 20640 20700 20760 20820 20880 20940 21000 21060 21120 21180 21240 21300 21360 21420 21480 21540 21600 21660 21720 21780 21840 21900 U1960 ~2020 208*0 214-0 2200 2260 2320 2380 a a
TGCTACCGCC
TGGGCCGGGG
CGGGTCACCA
CACTGGGGCA
CCCTGGTGAG
GGTGGACATG
LTGCAL15GGG
GGCTGGTGGG
AGTGATGGGG
CCCTGCCATC
*TGGTGGTGGA
CCGCCCTGGc
GGTGCCTGCC
GAGACTGCGG
CAGGTGGCGC
AGCTGGGGGC
ATCCCTGCCG
CCCTGGTCTC
CTCCCAGCGG
CCACACCCGC
GAAGGCGGCC
AATGGTGGAC
CCCACCCCCC
CTGGGGAGTC
CGGCCGGTGG
AGCCTCCGGA
GTACCCACAG
CGGGCTCTGA
GCTGCTGTGA
CCCCAGGAGC
TGGCTGCAGG
AGTCCCGCCG
GAGGGGCCAT
TCAGGAGGAA
GGCCGTTCCTr
CACTCCTGG
GCCCCGTGGG
GCCTCAGTTT
GGGTGGGAGG.
CTAGACGTGT4 GGCGAGGcCT
CGCAGGAGCA
TGCAGCGCTT
AGGTTGGGAG
GGAGGTGGGA
GTCAGCTCTG
CACGCCATAC
TGGGTGCTGC
CCCCATCTGG
ATGGAGGAGT
GGATCGGCTT
GTGTCAGGcc
CCTGGTCTCC
ATCTCTGAAG
GCTGGGccGG
CAGATGCAGA
GGGAGGTGGc
TGTGAGCCTG
A.AAGGGGGAC
3CCCTGAGCC
:TCGACTGTG
CAGGGGGTYG AGGTGGGCCC
AGCGCCGCAG
AACTGGCTGC CCGGGGAGCC ACACCCAGCC ACAGCCGAGC ACTGCGTCCG
GCTCGGGCCC
ACCGGGTGGT GTAACACCGA CCTGTGCTCA CCCGGAGGTG TGCGGGGGGC CAGGCAGGGG CCGAGCGCCC GCGGTGGAGC CTGGGCTGAG GGCTCGGTCC CCAGTCTGTT CGTCCTGGTG CACTCGCCAC CCCAGGCCCA GTGCAGGATG GGGACCTGCA GGGACCCCTG ACGCCTCTGG AGCCCGTGGA GGTAGTCGGC CCCCCACGTT GGCCTTGCCT GCCCTGCCCA CTGTGGGTCT TTGTGCCCAG TGAAGATGGT TGGGAAAATC GCGCCGCACA
GCTACGTCT
CCTGAGACGC TGGCTGTGG' GAGGAGGGGC 'rGGTGGGGG( TCCTGGGCCC TGGCCCGGCG CCGAGAACCT
CCTCGTGGGA
CACAGCAGGA
CGGCCTCTCA
CTACAACCG CCCTCCTGCC CGCCAAAAAA
CTTGGGGGCC
CAGAGTGCAG AGAGGAAAGC ATTACCTCCA GGCCTTTTCT CTIGAGCGTGT GTGAGTTATT
TCCTGCCCC(
TGTGGGGGT(
GGAGTCTGG(
GGTCATGGTI
TGGGACCCAC
CACAGCAGG7
GGCCTCTCTG
GGTGGACAGA
GGGTGGGGCC
CTGGGCTGGG
GGTGCAGGCA
TCCCCTCCTC
CAGGACCCAG
CTGCIrGCCG
GCCAGGGCTA
GGGGCCCCCC
TCAGGCCACC
TGCAGGCCCC
CTCTCCCAGT
CTCTCCCAGC
-CATGGACAG'
-TGATGGCTC(
CTTCAGGCTC
TTCCCGGGCC
GAGCTCCGGC
GGGACTCTGG
AGCTTTCACG
GCAGGGGTTC
TGCCTGvGGGG AGTGvCTGCCC GTTG3GGCATC
ACAGGGACCC
CTGGCCCCCG
CTGGACGCCT
CCCGGGGCCC
GCGCAGTACT
TTCCI)GTCTG
TCCCACGTCC
GGGAGAAGCT
ACAGClr.CTC TTCCACCGGA GTCTTCCTCT 3 CTCTCCTTCC
GGATGGGGTC
TGCGTCTGAG
GGCCCGCCCA
GTGGTGGGTG
TCTGCTGGTC
CAGAGACACC
CCGGCCTGGG
AGGTGGGGAG.
TCTGACGGTG
CGGAGAACGG
CGTGCATGCC
CCTGCCACCC
CCTATGCGCT
CGGTGTCG
CTGCCCAGGG
TTGCCACCTC CTCGCCTGGG C CTTCCTGTCT G
CTCCCCTCTT
TGTG\GAGCTG
CCGTGAAGCC
GCTGCGGCTG
GTGGGTGGTG
CTGTGGCCAC
AGCTCATTCC
TCAGTCGGCT
ACCTGTCCTC
GCCTGTGGGC
CAGCGAGCCT
k.GGGGGACGC
XCAGGCCTGCC
k.TGGAGAGAG
:CCTGACCTG
N'cGGTCTG .'GACCTCCGA
C
CCTTGGCAC
;CCAGGTCTTG
CCTGAAAGGC
CGAGCGACAG
CCTGGGAAGT
AGGCGGCCCC
TTCCTCACCA
CAGGTGTACC
GGCGCCGCAG
CAGAGTGGTT
AGGTGTCCTG
GGCCGGAGAC
ACAGCAAGGC
k.AATCAGGGC
,AGAGCAGGTC
['GGTGCCCTG
~CCAATGGCT G .ITCCTCTTCT C ;GTCTGTTCC C r3CACCAGACA CTCTGCAGT G AGCTGTGCC T GCCTGTGTC C G CGAGCTGCAG P TAGGGGCCTG
GTTTTCGGGC
CCTCACTGTG
GCGCCCAGTG
GCCCCGCACG
TGCCCCTGGA
TTAATGTTGC
GTTTACTCAC
AGGTCAGGGG
GAGCCAGGCC
TCGGGTAGGG
CTGCCCACCA
CGGCCGAATT
GGCTCCTCAG
GACTCGGGGT
CCCAGTCTTA
GGGGTGGATT
GGACGCAGCA
7AGGATTGCT
=CAACACCC
~CCCOGACAA
,AGCCAACAT
;CACGTCAGG
~CGTPrCCCGC
TGCATCTCC
.CACCCAGCC
CCCTCGGCC
CCTCTTCCT
22440 22500 22560 22620 22680 22740 22800 22860 22920 22980 23040 23100 23160 23220 23280 23340 23400 23460 23520 23580 23640 23700 23760 23820 23880 23940 24000 24060 24120 24180 24240
GTGTCCCCCG GTCTGCAACT GTCCTGCCTG TCCTTGTCAC GAGCACGTG GGGAGGCTCC TTGAGGTGTG
GCTGACGAA
GTCCACGGGC
CATGACCGT
CTCCACGGCC
AGGATGTCC
GGCCCTGGCG CCCTCCTGC2 TACCTdTCCG
CCAACGCCTC
TGGGCCTGCC
CTGGGCTTGA
GTGGGCAATG
GCTGGGCTGC
GGCTCAGCCT
CCTGGGGGCA
GTGCCCGGCT
CTCAGTGAGG
CTCAGCCTGC
GAGGCCCGTG
CCCAGGCAGG'
AGGGCAGCAG
CGGGCTACCT(
ACTCATTCCA C
CTGCCTGTGC
GGCCCAACCC
GCGTGTCCAG
GGGTCATCTA
TGGTGCTCCA
GTGTCAGCGC
GCCCCTGGGA
GGGAGCACGT
GGGTGACGGC
TACTGCAGGG
TGCCTGCAGA
:CCAGTTACT
;GTGGGCTT
TGTCTCATT
jAGTCTCTCA kGTCTGAACC
;GGGACAGGG(
;CCATCAGAA)
CTCCCCTCAC
1'GAGAAGAC
GGGAAACTGA
CGGACAGTG A
:CTCTCCTC
kCAGGCTAAG GCGGGGAGCCC TGCGTGTCCA CCCTCATCCG
TCGTGCGGGG
AGGACGTGAT GCAGCCCTGC CTCCCTCTCC
ACAGGTCACC
r CATGCTCCCT GGTGACCTCG TTGGCTTGCA
GCACGACGCT
k CTGCTCGCCG GCTCCCGGCC ACCCTGGTCC
CCGGGCCCCG
GTCATGGCTG CCCCACTTGC CAGCCCAGCT
GGAGGGACT
CCTGCGGCTG CTTGCAGCCA CGGAACAGCT
CACCGTGCTG
TGGACTGCGG CTGCCTGGGC GCTATGAGGT
CCGGGCAGAG
GCACAACCTC TCCTGCAGCT TTGACGTGGT
CTCCCCAGTG
CCCTGCCCCC CGCGACGGCC GCCTCTACGT
GCCCACCAAC
GGTGGACTCT GGTGCCAACG CCACGGCCAC
GGCTCGCTGG
CCGCTTTGAG AATGTCTGCC CTGCCCTGGT
GGCCACCTTC
GACCAACGAT ACCCTGTTCT CAGTGGTAGC
ACTGCCGTG
GGTGGACGTG GTGGTGAA ACAGCGCCAG
CCGGGCCAAC
GGAGGAGCCC ATCTGTGGCC TCCGCGCCAC
GCCCAGCCCC
GAGCCCCCGT
TTTGAGGTGG
TGGGTACATG
CTTGGGAGGG
AGTGTAGGTG
CCGGGAGGGT
CTTAC4GAT
GGCTGGAGAG
CCCGGGCACC
TCCAGGAGGC
CCCGCAGAGG
2
C
C
I
L.
C
C
k.GTCCTAGTG
'AGGGTGCTC
;GGGACGTCG
AATTCCTGG
CACAAAATG
GCCCTCTGC
CTGATGGGA
rGTCTCCAT rGGCGTGAC rGCTCCGAG
AGAGAGAG
GCACAGAG
;TGTCCCAG
,TGCCCCGC(
rGCAGAGTCC
GTGAGTA'TGG
ACACAGGGCG
GCCCCGGGCA
AAAGTCACGG
AATTTAAAAC
TGTG3TTCTCG
AACTGCGGGC
CTTGCGGGTA
CTGTGCAGCC
CTCTCTGGGT
GTGGACTCTT
D.GGTTGAGGC2 MCGGGCyrTG TGGGACTCT c
CCGAGGCTCC
TGAGGCCTGG
GGTCCTGCTG
CTCTGACAGT
TCTGCTCCCT
CCTGGCTAA
TGCCCGCGGT
CCTGCCTCTT
TGTCCTGGGT
3AGGAGCTGGC
['GTAGCTGGT)
kTTAGTAGTAC
;CTCCCATGGC
:CAGCCCGACG
ACCACCAGCC
CTTCCCAGTG
GCTGGCTCCT
GGCTCCGCTA
GACCTCACAC
GCGAGTGGCT
GCCACCATGC
CACCAGGGGC
rCTGTAAGcc
;GCAAGAGCG
4CTAGGTTITG
'TACATGGCT
ATGCAGAGC
;GGAGGTGTG
'ACCATTCCC
24300 24360 24420 24480 24540 24600 24660 24720 24780 24840 24900 24960 25020 25080 25140 25200 25260 25320 25380 25440 25500 25560 25620 25680 25740 25800 25860 25920 25980 26040 26100 =TCACAGAG CCCAGGCTGA rACAGCCCCG TGGTGGAGGC CGGCTCGAC
ATGGTCTTCC
CAACGACAAG CAGTCCCTGA CCTTCCAGAA~ CGTGGTCTTC AATGTCATITr ATCAGAGCGyC GGCGG'rC'
CAGGGCW
GACGCTGC
CCTGCCAC
AACGTCAO
GTCTCCACj
CTGGTGGAC
GGGCACCAC
GCTTTTCTC
AGGCTGCAC
GGGGCGTGA
CCCACCCCT
ATGGGGAGC,
ACCCCTCGGI
GTGAGGGAT(
GCTCCCCAGJ
CTGATGGCTI
AAGCTGGGCC
GCCCCGAGTG
TACACCTGTG
AGGTGCAGCC
CAGGTGGGGG
CCGTGCTGGC
CCTCCCTGCC
TCACCTTCTA
GGGACGGCTC
GGGGCACCTA
CGGATGTGCG
AGCAGGGCGC
CCTTCGACAT
ACCTGCGGGC
TC AAGCTCTCA .GG GCGGGCTCC.
CA TGGCTGTGG CT GGGCTCACT( Z7G TGAACTACAl IG TGCCGGCCGI 'T CGGCCGTGGA ,G CTCTTGTCCC ;A GCCTCGGTTT G GGAGCCGGGA C TGCAGAGTGG G TCCCGGTTCA P, GGCCCTCCAC r GGCCCAGGTG 3AGGGGGTGAG
SCAAGCTGCCC
AGGCCCTACT
CCTGAACTGC
CAGCTGCACT
CTCCGTCCCG
GTTCACCCCG
TCC!GCGGCT
ATCTAATGCC
CTCC!GTGGCT
C
CCCGCACCCG C
-CCCTGTCCTG
CCACGTGCGC
C
CGTCTTTGAGG
CCCCGTGGTG G GCGGGACGGC
A
ACAGAACTrGCA GTAGGTGGGCG GGGGTGGGGA GCGGAGGGGA TGGGCCGcG A CCTTCACCTC TGCCTTCTGC TCTGCTTCAT
GCTGCCCGAG
STGAGTGGAGG GAGGGACGCC AATCAGGGCC
AGGCCTCTCA
SACGCCTGTCC CTGCAGCTGA CGGCCTCCAA CCACGTGAGCy CGTAACCGTG GAGCGGATGA ACAGGATGCA
GGGTCTCCAG
GCTGTCCCCC AATGCCACGC TAGCACTOAC
GGCGGGCGTG
GGTCGCCTTC CTGTGAGTGA CTCGGGGCC
GGTTTGGGGT
AGCCCCAGCC TCAGCCGAGG GACCCCCACA
TCACGGGGTT
CCCTGTCTGT TGGGAGGTAAJ CTGCGTGCAC
AGGAGCCCTG
GAGGCCTCAG CACAGCCGGG TGGGCCCTCA
ATGGAGGCCC
AGCCTCGGCT GGGTCCCAAG CACCCCCTGC
CCCGCCACCG
CTCACTGCGT CCCACCGCCC CGGCAGG rGj ACCTTTGGGG CAGTTCCAGC CTCCGTACAA CGAGTCCTTC
CCGGTTCCAG
CTGGTGGAGC ACAATGTCAT GCACACCTAC
GCTGCCCC!AG
a a a GGGGCCAC'pD
CCCTTCCTC(
GGGGTGAGGC
GCCCCCCGCc
TGGAGGCGG'I
CACGTGGCT1 3GCTCCTCAC 3CCCCACTCG
E'TCGAGAACC
;TGGGTGTGA
:TGCCCTCGC
,CCCAGAGCC
TGGAGGTCA
AGCTCCGCG
T CAGCGCC!G CC!GTGCTlGT
CAGTGACCG
3CCTTTCAGG( :CCAACAGCcC
AGGGGCCAGC-
CTCCCCGGGC
GCGTCCTCGC
'GGGAGCCTGG
GCGGGGGGCT
GGCCTGTCCC
GGACGCAGCA
GTGACGGCGT
CTGGGGTGT
AGCCGGCTGC
ACAACACGGT
.GACTCAGCGT
CGGTGCAGAC
CGGGCCCGGA
TGGGTGCGGC
-TCTGAGCACG
*TCACTGTGAC
CGTGGGGGGA
CTGGCTCTTG
CAGGCAGCCC
GACCCTTAAG
TCTGCCGAGC
CACAGGTGAG
GGTGCCTGTG
CC'IGGTGGCC
TCTTTACACG
CAACCACACC
GAGCGGTGCG
GGACATGAGC
GGTCCCCCCA
CTCACCTGGG
GTGGACAGGG
CTGCTCTGCT
TCAGTGCTGC
GCTGGGCCGC
GGGTGGGGAG
TACCTCCTGA
AGCGTGCGCG
GGCCGGCCCG
TGGGACTTCG
PATGCCTCGA
3CGGCCCAGG 26160 26220 26280 26340 26400 26460 26520 26580 26640 26700 26760 26820 26880 26940 27000 27060 27120 27180 27240 27300 27360 27420 27480 27540 27600 27660 27720 27780 27840 27900 27960 GGGCGACAAC ATCACGTGGA GGCAACAGTG GAGCATGTGT CAGCCCCGCC GGCCACCTGG CCCGGAGCCT
GCACGTGCTG
GCATCCCCAC
GCAGCCTGAC
ACCTCTTCGA
CTGGACCTTC
CGGTGACACA
CAACTTCACG
GCGTGAACAG
GGCGCATTAC
CCCTGCAGCC
AGAGAGGCAG
CCTGGCCCCC GTTCCCCTAC CCCGTGCCAG
GGGCCCTGAG
CAGTCACCGC
GTCCAACAAC
AGCCCGTGCT GGTCACCAGC CGTACCTGr'r
CTCTGCTGTGC
ACGGTGGGTG GCTCGAGGT
C
CCGTTAGGTG GCCGGCTGGA
A
GAAGCGOCGC GTGCGGGGGC rD GAGCGTGAGC TTCAGCACGT
C(
CTGTGACCGC TGCACGCCCA
T(
GGGCACCTTC AATATCATCG
TC
CTTCGTCTAT GTCCTGCAGC
T
CCCCACCAAC CACACGGTAC
AC
CAGCTGGACT GCCTGGAGGG
AC
GCTCACCGTC TCGAGGCCGG
CA
AGCGCCTGGG
CCGACTGCAC!C
GTCTTCGT
GCGCGGCT
GGGGATGG4
CGGAGCGG(
TTCACCAGC
TTTGTGCAG
CGCTACAcc 3TGACGTTC kTCTCTGC'I
LTCAAGGTC,
;GCCGTGGG(
CGGAGGTCI
TGAGGTGAC
CGTCGTCAA
GCTGGAGGc :7CCTGGGGG
'ACGGCTGA
ATAGAGGG
;CTGCAGGC
'AGGGGCCC
CCTACCAT
TGGACTTC
GTCCTTG
'CACACCC
MCACCGTG
~CGGAGGC
;CACCAAT
'CACCATG
ICAGCTGGC
CC TGGAGGTGCT GCGCGTTGAA
CCCGCCGCCT
CA CGGCCTACGT CACCGGGAJAC
CCGGCCCACT
DT CCTCCAACAC GACCGTGCG
GGGTGCCCGA
A CGTTCCCCCT GGCGCTGGTG
CTGTCCAGCC
A TCTGCGTGGA GCCAGAG4GTG
GGCAACGTCA
;C TCGGGGACGA GGCCTGGCTG
GTGGCATGTG
T GGGACTTTGG CACCGAGA
GCCGCCCCCA
A TCTACCGAGA CCCAGGCTCC
TATCTTGTGA
GCCAATGACTC AGCCCTGGTG
GAGGTGCAG
~ATGGCTCCCTj TGGGCTGGAG
CTGCAGCAGC
GCCCCGCCAG CTACCTGTG
GATCTGGGGG
CCCACGCTTA CAACAGCACA
GGTGACTTCA
CCGCAGCGAG GCCTGGCTCA
ATCTGACGGT
TGCA.AGCCCC ACGGTGGTGC CCCTGAATG(2 CGGCAGTGAT GTGCGCTATT
CCWGGGTGCT
TCCTACCATC TCTTACACCT
TCCGCTCCGT
GAACGAGGTG GGCTCCGCCC
AGGACAGCAT
GCTGCAGGTG GTGGGCGGTG
GCCGCTACTT
CGTGGTTAGG GATGGCACCA
ACGTCTCCTA
OGCCCTGGCC GGCAGCGGCA
AAGGCTTCTC
GTGCAGCTGC GGGCCACCAA
CATGCTGGGC
GTGGAGCCTG TGGGGTGGCT
GATGGTGGCC
28020 28080 28140 28200 28260 28320 28380 28440 28500 28560 28620 28680 28740 28800 28860 28920 28980 29040 29100 29160 29220 29280 29340 29400 29460 29520 29580 29640 29700 29760 29820
GCCTCCCCGA
GGCAGTGGTG
CCATTTACCA
AACCCGCTGG
CTCAGCATCA
TTTTGGGGGC
AGCAGCAAGC
CGGCTCAATG
ACCCAGCTGC
TCGTATACAC
CCCATAGCTT
GCTCAGCCAA
GGGCCAGCGA
AGCTGGCCAC
GTGGCCCTCA
CCTCCAACGC
CG
7Dc
CGC
G
GC
TGI
AGT
AGCGTCACCC
GAGGAGGGGC
GGCCTGCACT
GAAGTGGATG
k.GCTTCGTGG 3TGAGCTGGT 3TCTTCCCGG
'TCTCAGCCA
TCAGTGCCGA
TGAGCTGGGA
TGGTCACCAT
TGCAGGTGCC
CGGCCGGGTC
GCTGGGCTGT
ATGCTGGCAC
GCTGGCTGGT
GACCTCCGAG
GACGGCAGGG
TGTGAGTGGC
CTCTGTGCCC
GCCCGGCGGC
CTTCTCCAC
2 2 2 2 2 GAGCCCATCG TGGGCCTGGT GCTGTGGGCC AGCAGCAJAGG TGGTGGCGCC CGGGCAGCcTG
GTCCATTT
GGGGCCAAi
GACCACGTX
ATCGTGGTC
GCCACGGGC
GCCTGGTAC
GACGTCACC
GCCCTGGGC
GCCCTGCAG
CCCAGCCCC
ACAGATGA04
AACGCCTMC
GCCTGCCGG(
CAGCGCAAC9
TACCGCTGGC
GCCCTGCCCC
GTGGGGCACT
ATCCAGGCCA
TACCGCGTGT
AACCTGGAGG
CAGGTCAGTG
AGAGAACACC
CAGCATGGGG
TGGGGCTCTC
GCGCTGAACT
GCTGGCGrGG
ACCAACCAGA
CCTGGTCATC
CT0TaITCTGA
ACCTGCCACA
AGATCCTGCT GGCTGCCGGC CCGAGGTGCT CCCCGGGCCC TGAGCGTGCG GGGCAAAAAC TGGAGGCCGT GACTGGGCTG CTGAGAGGAA CTTCACAGCC TCTCGCTGCA GAAGGTCCAG ACACGCCCGT GGCCGCGGGG TCAGCTGTCA CCTTCCGCCT
GCAGGTCGGC
CGTTTCTCCC ACAGCTTCCC
CCGCGTCGGA
CACGTGAGCT GGGQCCCAGGC
GCAGGTGCGC
CAGGTGCCCA ACTGCTGCGA
GCCTGGCATC
CGCGTGCAGC GCGGCTCTCG
GGTCGCCTAC
GGCGACTCGC TG0TCATCCT
GTCGGGCCGC
CTGTTGGAGA TCCAGGTGCG CGCCTTCAAC A GTGAGAACC A GCGGCCCCTI
GGCGTGTGG(
7CCAGGGCCGJ
ACCTGGTGAC
3AGCCGGAGG']
ACTTGGAGGC
AGGTGTATCC
GCGTGGACGT
ACTGCTTTGT
ATGTGACGGT
GGTCAGACAC
ACGGCGACCA
CGTGGCAGGG
CAGCTTrGCCA
AGGGGGTCTC
AGGCCACGTC
TTGGGCCCCG
AGTACACCTT
CGGTGGGTGC
TACTGTTTTC
TGCAAATTCT
GTTCCACGTA
G CACGCTGGTG 3 CTTCACCAAC
SCTACCACTGG
~GCACTCCTAC
CTTCTTCGTG
GGACGTGGTC
CCACGTTGAC
CACCGCCAGC
GAGCCGGCCT
GTTTGTCGTG
GGCCCCCGAG
ACGGGACCTG
GACGCCGCTC
CCGTCCTCCA
CCAGGGCTGGC
CCGCGCTGTC GCCCCTTGTr c
CGGGAGCAGCA
CAGCCTGACC G
CGCCCGCCCCT
CGTGTTTTAGT
ATGTAACACG A CAGTCTTCAA~ G
CTGGAGGTTC
CGCTCGGCGC
GACITTTGGGG
CTGAGGCCTG
GCGCAGGCCA
CTGCCCCTGC
CTGCGCGACT
TGCCAGCGGC
CGGCTGGTGC
TCATTTGGGG
CGCCTGGTGC
3TGCTGGATGC kGTTTCCACTC ['GCCCCTCAC C 'CCGTCCTCA
G
'CCTGGGCCG
TCGGCCTGCA
,CGGTCACCAT
TGTGGAAGG
C
CGGCCACTrr G GCTGGTGGA G4 CAGCCTGCT 1 CCACATATG Ca
AGGACGCCGT
AGTTTGAGGC
ATGGGTCGCC
GGGACTACCG
CGGTGACCGT
AGGTGCTGAT
GCGTCACCTA
CGGGGCGCCC
TGCCGCGGCT
k.CACGCCACT
XATCATTGAC
;GAGCGAGTCC
X3GCCTGTGT CGTCCACAC C :TGCCTGGTG0 GCTCTGCTrT GAGGGAGGC TCCACGGGA G CGGCCGCAA 0 CCTTGGACA 0G 3CCGCACGC T AGCTTrGCT'
CCAGTATGTG
CGCCACCAGC
AGGGCAGGAC
CGTGCAGGTG
CCAGGTGCTG
GCGGCGATCA
CCAGACTGAG
AGCGCGTGTG
3GCGCTGCCT 3ACACAGAGC
;GGTGGCTCA
TrACGACCCC
GCTTCGACA
'CATGAGCCC
GCCCCGTCC
AAAACTGGA
GGCGGGTGT
CGGCTGGCG
3;AGGAGGCC 2CCAGCCTC 2TCCCCTCT rCCTTCCAA 29880 29940 30000 30D60 30120 30180 30240 30300 30360 30420 30480 30540 30600 30660 30720 30780 30840 30900 30960 31020 31080 31140 31200 31260 31320 31380 31440 31500 31560 31620 31680 S. S S 9S
TCTAGTGGC AAAAGCTACA CAGTCCCCTA GCAATACCAA CAGTGAGGA GAGCCCCTTC CCACCCCAGA GGTAGCCACT
GTCCCCA
CTCTCTG
GTCAGGC(
CAGCTGG(
GTTCCATC
GTCAGGCC
TCCCCCCA
TACACCAG
TGGAAGGT
GCCCCTGG<
CAGGCCAA2
TGGCCGGG~
AGTGAGCCG
CAAGCGAGG
GGGGACGGG
ACCCCTTCID
ATGAGACCA(
TGCGGGACc(
AGGGCTGCGC
TCTTCCCACT
AGTGCAGGCC
TCCCGTGATG
CGCTGGTGTA
TCTACAAGGG
TCGAGGTGGG
ACAGGTGAGC
TGCTCGTAGG
AGTCTGGC TG
CCAGCACGTC
CTUGGAGGGG
TGGAGCTTTG
0CC CATGTCCCTG
TTGCTGC
GAC CGGCCAGGAG
GCTTCGT
300 GCCTGCTGGT
GCCCCAG.
MT GATTOGGT
CTTCCCA
~CC AGCTCAGCCT
CCTGACC(
~TA CGTGGCAGCT
GCGGTCC]
.G GGCCCTCAGT
GAGCATTG
ACGCACACTCC
AGTGTCCT
S0 CAAACCGGAT
GAGTATCC
~AACCTGGAGT
TTGGGAGC
~T AGACCTGTGT
TGGAGGTA,
~CCCATTGTGT
CCTTGGAG
C AGCTCCTACG
TGTACTTGC
GGTGAGTGTTG
AGCGGGGTG
GCCTGCAGGCA
GAAGTGGGG
CCTGCAGCGG
TGGGCTGCA
-CACATCCACG
GGCACTGCA,
CGAGGGATAC
ACCTTCACG(
CTCCATCCGC CTGTCCCCc GGGCGCTGTGjy CACGcCCjc TGCGTOGGGOG GAGCAGCGGX CCGTGGGGAC
CGTCCCTCAG
CGCCCTGCTG CTGCGGCGCT1 CAGCCTCTCC
AGCTACGGAG
CCTGGCCGTG
GTGGTGCAGG
CAGGCCGTGG
GAGGGCGCCC
TCTTTGGCCA
TCACCCTCCC
CACGGGCTCA
CCGCTAGTGT
ATCGAGTACT CGTTGGCCCp ACGTCACATC
TGCTGCATGC
CAGAGGGCTC ATCCCGGG3CC ;TG TGGTGGGCCG GTTCTCACCC
TCACGCTCCC
GAC CCTGAGCCCG TGOTGGCTGC
TCCTGCTGCT
AT GGGCGTCT03T TCCCCAGTCC
CTGCTTTCCT
3AG GGGTCGTCTG AGGGGAGGGT
GTGGGAGCAG
AG GCCCTGGCTA AGGGCTGCAG
GAGTCTGTGA
'CA CACCCACACA TACGTCTCTT
CTCACACGCA
;CC TGCCTCCTGC TAG0GTCCAG
CTGGGTCCAG
CT GCCCTGTGTA TGCCCTTCCG
CCGTCCAAGT
TG GGAGGGAGTG AGCTCACCGG
CAGTGGCCAG
AG CATCCTCCAT GGGTCCCCCA
GTCCTTCCAG
~C CCCACTC~CC CGCCAGGTGC
TGATCCGGAG
VG0 TGTGTCCTGC AAGGCACAGG
CCGTGTACGA
AGGGCCGCTGC CTCAATTGCA
GCAGCGGCTC
*T GGGCGGGCTG GGGATGGTC
CCATGGCCGA
C TGACAGGGCA GAGGGTTGCG
CCCCCTCACC
C GTACGTCA CAACAAGACG
CTGGTGCTGG
GGCATGCGACT GGTGCTGCGG
CGGGGCGTGC
CTCACGGTGCT GGGCCGCTCT
GGCGAGGAGG
'ACCGCCCG~C GCTOGGGC
TCTTGCCGCC
~CCACCAAGGT GCACTTCGA
TGCACGGGTG
ATCCCCCGAC TCTGTGACGT
CACGGAGCCC
GCTGGCATGA CGCGGAGGAT GCTGGCGCCC 3 GTCGCCAGGGc CCACTGCGAG GAGTTCTGTG 3 CCGTG CTGCC CCCGr'rTC AGGCCACACT 3 ACCAGCTGGG AGCCGCGrG GTCGCCCTCA 3 CCGAGACTGC~ CACCTGCTCA, CCACCCCCTrC 3 AGAGCCCAAC GGCAGCGCAA CGGGGCTCAC 3 GCTCCCAGGG CTGCTGCGGC AGGCCGATCC 3 GGTCACCGTG CTGAACGA'30 TGAGTGCAGC 3: GTGCTTGGGA CCAAGACCTG TACCCCTGCC 3: CCAGAATA ATCCCAGTGA CCCTGAAGCA 3: 31740 31800 31860 31.920 31980 32040 32100 32160 32220 32280 32340 32400 32460 3252'0 32580 32640 32700 32760 32820 32880 32940 ~3000 ~3060 3120 318'0 3240.
3 30.0: 3360 3420 3480 ~540
GCACCCCG.
CCACAATG4 ACAG3CTCA(
CGGGAGTCC
TGCCCCCTG
CGGCAGCAC
CACACTGTG
CCCCACCTG
AGACGGGCA(
TCCTCGGTC~
GCCTCTCGGC
GCTGCTCT13T
CCCCTCCTCC
CCCCCCAGCC
CTTCCCCTCC
CGTCCCCCAG
CCCCTCcTCC
CCCTCCCCTC
AC CTTCCGCTCC CAGCAGCCAC
ACCCACCGGG
CA GCCCCCGCCC AGGAGGGCCC
ATGTGCTTAC
'T GTTGTGGTC AGTGCCCGCA
TCACACAGCG
,T GGGCATCTGC TGGCCTCCTG3
CCGGCCTCCT
,C CTGCCCCAGT ACGAGCGGGC
CCTGGACGG
C GAGCCCAGAT ACGCAAGAAC
ATCACGGAGA
G ATGACATCCA GCAGATCGCT
GCTGCGCTG
C TCACCCTGCC CCGCATGCCT
GCCACGGCAC
G CTTGGCCGAG GAGCTGAGCC TCCAGcCTG
C
V' CTGACCTGCT TCAGTAGCCT CAGCCGTTCT
G
GGACCCAGGG TGTAAAGAGG GGCCCAGATG'1 GCCCTCCACT CTCCCCTCCC CTCCCCTCCC
C
CCTCCCCTAG CCCTTCCCCT CCTCCCCTCC C CTTCCCC'rCC TCCCCTCCCC TAGCCCTTCC
C(
TCCCCTCCCC TAGAccTTCC CCTCACCTCC
I
CCCCTCCCTC CCCTAGCCCC TCCCCTCCCC
CJ
CCC TCCCTCT TCCTCCCCCT CCCCTCCTCC
CC
CTGTCCCCCC TCCTCCCCTC CTCCCTCCrc
CC
CCCTCTCCGG
CGTCTGCTTT
CCTGTTTTGC
CCATGAAGA
TCTAGCACGT
AACTGCACCC
GCGCTGCTGA
CAGCTTGCTG
GCGCAGAGcc
CAAGCACGAG
CTCTGGTGTC
CCTGAGGGTC
CCCAG'rGCAT
GGTAGGATGG
PGGGTTCAGC
CCCCCAGGGC
CTCCTTCCT
GCCATGGCGT
;TCCTGTGTG
AACGCAGGGT
GGGGAGGGA
CTAAGAAGAT
CTTCCCTCC
CCTAGCCCCT
CTAGCCCTT
TCCCTTCTTC
CTCCTCCCC
TCCCCTACCC
CCGCTGAG
CCCCTCCACT
L'TCTCCCC
TCCTCCCC!CT
CTTCCTCC CCTCTCCTCC 33600 33660 33720 33.780 33840 3390o 33960 34020 34080 34140 3420o 34260 34320 34380 34440 34500 34560 34620 34680 34740 34800 ~4860 14920 4980 5040 5100 5160 5220 5280 5340 5400 CCCCCTCCTC CCTCCTCCCT CCTCCCCCTC CTCCTCC'rCC CCTCCTCCCT
CCTCCCCTCC
TCCCCTCCCC TCCTCCCCCT CCCCCCTCCC
TTCCTCCCCC
CTCTCCTCCT
CCCATCCCTC
0O 0 S S 55 CTTCCATT'rC
TTTCTTCTCC
T'rCTCCTCTC CCCCTCcTCC
CCCCTCCTTC
CCGGCTCCTC
TCCTCCCCTC
CTCCTCCTAT
TCC!CTCCTCC
TCCCTCCCTT
TTTTCATCCTr CCTCCqrT'
CTCCTCCCAT
TCCTCCCCTC
CTCATCCCCC
CTCCCATCCC
CCCTGCCCTC
TCTCTCCTCC
TCCcTTCTTc
CCTCCCATTC
TACCCCTCCTr
CTCATCCCCC
TCcTCTCCTT
TCCTCCCCGI
CTCTCCTCcr
CTCCC!CTTCT
CCTCCTTTCC
CCCCTCCTcc
CTCCTCCCT
TCCTCTCCTT
CCcTCCTcCT
*TCCCCCCTCC
TCCCATTCTC
CACCTCCCCT
CCCCT'rCTc TCCTcTTTTC
CCCCTCCCAT
CCTCCCACCC
CCCTCCTAAC
CCTCCTCCCC
TCCCCTCCCC
TCTCCGCTCC
TCTTCTCCCC
CCTCTTCTCC
TCCCCCT CCT
CCCTCTCCTC
CCCCCTCCTC
CTCTCCTCCC
CATGCCCCCT
3 3 3 3 3 3' 3' TCCCCCTCCT CTCCTCCCCT CCTTCCTCCT
CCTCTCCTCC
CCTCCCCTCC TCCCATCCCC CTCCTCCCCT CCTCCCTCCT~
CCCATCCCAT
TCCTCCCCTT CTCTCCCCTC CTCTCCTCCC CTCCTCTCCT CTCCTCCTCT
CCTCCCCTCC
TCCCATCCCC
TCTCCTCCCC
TTCCTCCCCT
CTCCTCCTCT
TCCCCTTTTC
CTCTCTTTCT
CTTCCCTCCT
TTCCCCTTCC
TCTCTCCCCT
CCCCTCCTCT
IJ
CCCTCCCCTC
TI
TCCCCTCCCC
TI
TCTTCCTTCC
T
CTCCTCCCCT T1 TTCCCTCCCC
T
CTTTCCTCTT
C(
TCCCTCCCCT T9] TCCCCCTTCT
CT
CCCTTCTCTC
CC
CCCCTCCCCT
CT
CTCTCCCCTC
CC
CTCCTCTCCC CCi TGAAGAGGTG
CC'
GGTCATGC.AG
AG(
CTCTCCAGCT
C
TGTGCA.AGGA
GT
CAGGGAGC'rC
GT
GCTCATCCTrG
A
CATCCTCAACAC
CCCTTTCTC
TGC
TCCCCCTCCC
CTC
CCTCCTC
TCATCCC(
CCCCCTCC
CCCCTCCC
CCTTTTCI
TTCCTCCC
CCTTTCCT
rTTCCCCT
PCTTTTCC(
~CCCCTCC(
'TCTCCTCC
CCTCTTCC
CTCTTCCC
CTCTTCCC
CCTCTTCCQ
,CCTCCCC'
'CTCCCCCJ
'CCCCTCCC
CCTTCTCT
CCCCCTTC
CTCTCCTC
CTTTTCTC
PTGTGTGG
~CACAGAA
kGGGCTGG
;GGGCCAG
,TGCCGCT
GCAGAGA
ACAGGTG
CCA TCCCCCCTCC CCT CCTCTCTCCT 'TT CCCCCTCCTC CC TTCTCTT'jT VCT TCCTCTCCTC TT TCCTTCTCCC CC CCTCCTCCTT CT CCTTTCTCCT T CTTCCCCTCC ~C TCCTCTTCCC
C
C CTCCTCTCCC
C
C TCCCCTTCCC
C
C TCCCCTCCTC
T
C TCCCCTTTTC
T'
C TCCCCTCTTC
C
r CCTCCTCCCT
C(
PTCTCTCCCCT
CC
CTCTCCCCCT
TC
CTCCCCTTCT
CI
TCTCCCCTCC
CC
TCCCCCTTCC
CT
CACTCCCCTC TCi TCGGTGGGCT
GC
AATGCTTAGTGA
GGGTGGGAGCCC
GAGCGGGGCT
G
CGTGCCTGJA, GC1 CCACCGCGGG
CAC
CCGCGGCCCG
TGC
TCTCCTC
CCCCTCOi
CCCCTCC'
CCCTCCT(
CCCTTCTC
CTGTTCTC
TTCTCTGT
rCCTTTCCl
CTCCTCT
TCTCCTC(
TCTTCCCC
TCCCCTCC
TCCCTCCC
TCCCTCTC
CCTCTCCTI
XCTTTCC(
~CCTCTCC(
~TCTCCCC]
*CCCCCTTC
TCTCCCCT
CTCCTCTC
CTCTCTCC
kTCACGTG 3GAGGCTG 3GTGAGGA iCACTGCT LGACGCT3
CGTGACG
CCCATGC
7CC ACTCCTCTCC 'CC TCCTCTCCTT TC TCCCCATCCC
CCCCTTCCTCC
CTCCTGTCCT
.T CCCTTCCCTT 'f CTCTTCCTTT 'C TCCCCTTCTC C CCCTCCCCTC
C
T CTTCCCCTCC
C
r CCCCTCCTCT
T
V CTTCCCTCCC
C
-TCTTCCCCTC C4 *TTGTCTTCCC
IN
CTCTTCCCTC cC CTCTTCCCCT
CC
CCCTTCWCTC
CC
CTCCTCTCCC CC TCTCCCCTCC CC GTCCTCTCCT
CT
CCCCTTCTCT
CC
TCCCCACT(
CCCTCCTCC
CCTTCCCC
TCCCCTCTT
CCCTCCCTT
CTCCCCTrw
CCCCTCCAC'
rTCCTTTTCC TCTTCCCCq
CTCCTCTT'I
CCCTCCCCT
TTCCCCTCC
CCTTCTCTT
CCCTCCTC
CTCTTCCT
CCTCCGCT
~CTCCCCTC
'TTCTCTCC
CCTTCTCT
CCACCCTT
IC 35460 'T 35520 'T 35580 'C 35640 T 3570o r 35760 S35820~ 35880 35940 36000 36060 3612.0 361.80 362:4.0 36300 36360 36420 36480 36540 36600 36660 36720 36780 36840 36900 36960 37020.- 37080 37140 37200 37260 es
CCTCCTCCTC
GTCCCCAGGT
TGGGGGTCCA
CCCGTGTAA
GGCTCCACAC
CACAAGCTG
CCCACCGCCA
CACCCGCCCG
CGCTCTCATG
GGAGGCCCTG
GTCAAGTGGG
GAGGAGGGCG
AGGGGCCCAG
AGGCCATGAT
TCGGAGACAG
CCCCGTGCGG
CCCATCTTCG
CATTCTCCTC
CTCCCTC
CTCCCCCCAA~
CCCCCAJA TTCCCATCCT CCGCGTCGCC
TTTGCCCCAT
CATCCCCCTC CCCCAXATCC .00. a 60
CCCCTCCC
CCCCCCCG'
CCTTCCCCI,
TCTTTTCT(
GTCCTCATc
TCCTTCTGC
CCTCCGGCTr
ACCTCACGC
CAGCCCTCA
CTGACCTCT
ACGCTGGCV3A TGCTATGGc( GCCCTGGCCj CCCTTTGGC9
ACACAGGCCC
AAGGTGCCCA
AACTCCGTTG
AACCCTGCGG
GGGTGGGGCA
TTCCATGGGT
AGAGAGGAGC
TTTCAGGCCC
TGAGCTTCCC
AACCTGAGCC
GCTCGGCTAG
CCTTCTTCAT
CATGGGTCAG
GACCGGGTAT
CTACCCCTTC
GCCACTTCCG
ACTTCAGCGA
TTCCCTATTA CCATCCCTTT -CTCCCCGTCC TTTTGTCCAT -CCTTATCCCC CTTCCCCTCC ACCCTTTTCC TTCCTTTTTC CCATCACCTT CCCCCTCCCC TGCACCTCGC TCTCTGCCCC CCCCTTTTrG CCTGCCCCCA TGTCTGCAGG AGACCTCATC AGCTGGCAGC CGAGTCACCA CCCTCATGCG CATCCTCATG GCGAGGAGAT CGTGGCCCAG TCTCCATCTC
TCTCCCCTTT
TCCCCTCATC
TTCCTCATCC
CTTTCCCCCT GCTCCTCTTC CTCCCTCTCC
CCATCATCCC
CCTCCACCAC
TCTCTCTCCA
CTCAGGTTCC
CCCTTTCTCC
CCCTCCCTCT
ACCTCCCTGT
CACCTGGCCA
GCTCGGACGT
rCTCGGATGG
TGGCGTCCCA
::GCTCCCGCG TGCTCAACGA 3GCAAGCGCT
CGGACCCGCGC
TCTCCATTTC
CCCTCATCCC
TTCTCCCTTC
CCTCATCTTC
GCTTCCCCCT
CAGCCCCCAC
CTCTGCACPG
GCGGGCACCA
GGCCTACAAC
3GAGCCCCTG 7 AGCCTGCTG 3GCGCCCCAG k ACCTCAGTGJ
ATATCAGCA)
GCGCCCAGA]
ACAACTCGGP
TGGTCCAGCC
CCGGGCTGCA
CACGCGGCCC
GTCTCTGGG
TGGGGGCCAC
GTGGCAGAGG
GGCAGGCGTG
CTACCTGGCA
CAGGAGGATC
TTCCCCGGGG
CATTGCCTGG
GGGCTCTGAG
CCTGCCCAGG
CTGGTCGGCG
GGAGGACATG
3 GCCTGGCTG k. CGTGGTGCA(
CTACACCGT(
CCCCATCGAC
CTGGGCTGCC
CCAGGCCTCC
TCTGCAGCTC
CCTGGCCTTG
AAATTTGCTT
GGAGAAGCAG
GTGGGCTCAG
TGACCTGCGC
GTCTACCTAC
CGCCCAGAGT
TGAGCTCTGC
GTTACTIGGCC
ACTGCGACAT
AGCAGAGACC
CTGCAGGTGT
GTGTGGCGGA
LALTTCTCCA TCCCCGAGGC TTTiCAGCGGG 3CTCATCTC TGGTGGACTC
CAATCCCTTT
TCCACCA.AGG TGGCCTCGAT
GGCATTCCAG
CGGCTGGCCT CAGAGCGCGC
CATCACCGTG
CGGGGCCACC GCAGCTCCGC
CAACTCCGCC
GTCGGTGCTG TGGTCACCCT
GGACAGCAGC
AACTATACGC TGCTGGACGG
TGCGTGCAGC
TTCTTGGGGG GAAGGCGTTT CTCGTAGGGC TCTGTTTCAP GGGCTGC'rGG GGGCCTGGCC GTGccAGCTC TGGTGCAGAG GCTCCTA'rGC GAGGGCCATCY GTGGGTGTCC
CCCGGGTGGT
GTTCTGCCCC AGGCCACTAC CTGTCTGAGG ACTCGGAGCC CCGGCCCAAT
GAGCACAACT
CACTCCAGGG TGCTGACCAC
CGGCCCTACA
GGGCCAGCCT GGCAGGGCAG
GGCAGGGCAT
CCATGGGG~c GGCAGGcAGC GAGGGGAcTG CCAACCTGGC GGAGCCTGGG
CTCACGTCCG
CAGCGGGGAG TTACCATCTG
AACCTCTCCA
CCGTGGGCCT GTACACGTCC CTGTGCCAGT CAGAGGGGCT GCTGCCCCG GAGGAGACCT 37320 37380 37440 37500 37560 37620 37680 37740 37800 37860 37920 37980 38040 38100 38160 38220 38280 38340 38400 38460 38520 38580 38640 38700 38760 38820 38880 38940 3-9000 39060 39120 CGCCCCGCCA
GGCCGTCTGC
TGCCCCCAAG
CCATGTCCGC
CTGCAGAGTC GAGGAGGGCC CGTGACCTCC TGGccCGGC ATCGTCATGC
TGACATGTGC
CACAAGCTGG ACCAGTTGGA GGCCGCTTCA AGTACGAGAT CGCAGCGGGG TGGCAGGGCC AGTGAGTCTC
GTCGCAGGCG
GTGTGTTCAT
CCTGGGAATG
CCTAGCTGGA AGTAGGTGCC TGGCCCCACA AGTGACGTGA
G
TCCTGACTGC ATAGCTCGTC
TI
TTCAGTACCT CCATTTTCCT
T
GTTTTTCTTT TTTTAGAGAC
G
GATCTTGGCT CACAGCAACT Ti CCTGAGTAC TGGGAGTACA
Q<
GCCAGGCGAA CTCCTGACCT
C~
GACAGGTIGTG AGCCACCACA
CC
CTCACCCG(
TTTGTGTTJ
TGGGTGGGC
GGCTGTGTC
TGTGTGCCT
TGCCAGCCG
CCTCGTCAA
rCCCCTGCT( rCAGAACAAc
.CCTCGTGAC
~GTCAGTCAG
;CATCGCTAC
CTCAGACGG
TCATTGGAA
GAGTCTCAC
CCAGCTCCC
3TGCACACC kGGTGATCC
TGGCTGTG
C ACCTCACCGC CTTCGGCGCC
AGCCTCTTCG
CTGTGAGTGA CCCTG3TGCTC
CTGGGAGCCT
*T CGGCTCTATC CTGAGAAGGC
ACAGCTTGCA
TCACAGGGC CGACAGCGGA
TGTWACTAC
G GTGACCTACA TGGTCATGGC
CGCCATCCTG
G GGCCGCGCCA TCCCTTTCTG
TGGGCAGCGG
G ACAGGCTGGIG GCCGGGGCTC
AGGTGAGG
TCACTGGCTG TGCTGGTTGC
ACCCTCTGG
;GCAGTTTTTG CAGTGCTGTG
TGAAGGGCTC
CACTCACTGT CCCTGAGGAC
TAGGACAGCT
GGTGGGCAGC CCACGTTCTG
CACAGTAGCG
CACTGTGGGA GACTGTGCAT
CCACCCGCGA
AGGCGCCAGC ACCCTCCCCG
TGGCTGTP'TC
TTGCCCTTCT GGCATTCCCT TTTTGTT'rTC TCTGTTGCCC AGGCTGGAGT
GCAATGGCAT
GGGTTTAAGC CATTCCCCTT
AAGCGATTCT
ACCACACCCA GTTAATTTTT
CACCATGTCA
GCCTGCCTCG GCCTGCCAGA
GTGCTGGGAT
TTCCCATTTT TTATCTCTGT
GCTGCTTTCC
39180 39240 39300 39360 39420 39480 39540 39600 39660 39720 39780 39840 399 00 39960 40020 40080 401 40200 40260 40320 40380 40440 10500 10560 10620 0680 0 7A 0 0800 0860 0920 0980 TCTTCATTGC CCAGTTCrrT CTTTTGATTA CCTACTTrrTA AAAACTGTCG
GCCGGGCGCG
GTGGCTCACA
GAGATCGAGA
TTAGCCCGGC
AATGGCGTGA
GCCTGGGTGA
GGTCTGTCAC
AGGGACGGGT
CAGTGCGGGT
AAGTTACAGT
AAATTGCAGT
TGTGGCTCCT
CCTGTAATCC
CCATCCTGGC
GTAGTGGcAG
ACCCGGGAGG
CACAGCAAGA
TGGGAGAGGA
GTGTGGTGCG
CACTGGTTGT
TCTTTCCATG
AACCGCAGCT
TGAGTGCGCA
GAGCACTTTG
TAACGGTGAA
GCGCCTGTAG
CGGAGCTTC
CTCCATCTCA
GGTGACACAG
GGTCACCGGT
GGTGTGGACT
TAACTTAATC
CCTTGTGTAT
CAGGCCAAAG
GGAGGCCAGC
ACCCTGTCTC
TCCCAGCTCC
AGTGAGCTA
AAAAAAAAAG
CTTCACGCTT
TGTGGCATGA
GAGGCGTGTG
ATGTCCTTA
GGCAGAGCCG
CTGAGATGAC
CAGGCAAATC
TAATAAAAJAG
TTGGGAGACT
GATTGCGCCA
AAAAAAAATA
TGCAGTCTGT
CTGAGGCGTG
CAGCCATGTT
GGTCCTGCTG
TGCAAAGCCG
TTGCCTGGGA
ACGGGGTCAG
TACAAAAAAA
GAGGCAGGAG
CTGCACTCcA
CTGTCACCTG
GCATGAACTG
GACAGGTGTG
TGCATGTCAC
TTAATTGGAC
GGACTGCq
TGCCACACT
4 4 4 4 4 4 4
GTTGGGCI
GTGGGCA'j
AGAGICCT'I
AGCGTGTc
CTGCCCCT
CAGGGCTC
GCAGCGCCI
TGGTGGAG
AGGTTGGGC
GGCGACGCI
TTTGACAAC
ATCCAGAGG
TGGTACGGG
CTTCCTGCCI
AGCACGGGGi
GTGTCCAGC(
CGGAGCAAG(
GCACCATGCC
AGGCTGCTCC
GAGAGAAGAC
ACGGACGACT
CGCTTCCGAA
AGTTAGTCCC
CCCAACACTT
GGCAACATAG
CGCCTGTAAT
CGAGACCAGC
TGGGCATGGT
CATGAACCCA
G(C AGACCGAG( CA TGCTGTAT( C ACCGCAACI 'A AGATCCGAC .T GCCCCCGCA LG CCCTGCCTG T CTTCCTGGT, A GGAGGTGCTD A. GGGTGGTCC' 3, CCCTTTTGCC -ACATCTGGC'.j
CCACCTGCTG-
CTGTTGGCGA
*AGCCCTTCCT
ATGTGTCCAG
TGGTTGTCTA
TGGGCTGGGG
TAGGGCCGCC
GGAAGGGTCA
TCAGAAGCCA
GCACAGCAGC
'C TCCCACCCCTj CCCTCTTGCC
TCCCAGGTAC
G3 GGTGGACAGC CGGAGCGGCC
ACCGGCACCT
CCTGGACATC TTCCGGATCG
CCACCCCGCA
GTGGCACGAC AACAAAGGTT
TGTGCGGACC
r TGGGGCGCCC TGCGAGCCTG
ACCTCCCTCC
3 TTCCTGCAGC ACGTCATCGT CAGGGAcCTG AATGACTGGC TTTCGGTGGA GACGGAGGCC GCCGCGAGTA AGGCCTCGTT
CCATGGTCCC
GCCCCGTGGC CTCCTGCAGT
GCGGCCCTCC
CTTCCGGCGc cTGcTGGTGG CTGAGCTGCA
C
CTCCATATG GACCGGCCGC CTCGTAGCCG CGTTCTC!CTC ATCTGCCTCT
TCCTGGGCGCC
CTCTGCCTAC AGGTGGGTGC
CGTAGGGGTCG
GCCCCTCAC CTCACCTGTG TGGCCTCCTC
T
GCTGAGCCCG CTGAGCGTCG ACACAGTCGC
ID
TCCCGTCTAc CTGGCCATCC TTTTTCTCT
CC
CTGGGGACCC GGGAGTACTG GGAATGGAGC
C~
CACGGCCCAC
GGACGGCGAC
CAGCCTGGGT
CTGCCAAGCT
TGCGCCTCTG
CAGAcGGcAC
AACGGGGGCC
kICTCCGTGGG TGCCTTcTA
;CGTGGCTTC
TTCACTcGC
'AACGCCGTG
GGGCAGCCT
CCTCCACAC
~TTGGCCTG
CGGATGTcc Ir'GGCTCG ;TCCAcCAA
LGGGAGAAA
CAGACGCC
G3CTGTGCA
AGAGGCTG
CCTGTAAT
CCAGCCTG
3TGGCTCA 2AGGAGTT 4 k.ACTTAGC 4 ;AGAATGG 4 ~CATCCTG 4 41040 41100 41160 41220 41280 41340 41400 41460 41520 41580 41640 41700 41760 41820 41880 41940 42000 42060 42120 42180 42240 42300 42360 42420 ~2480 L2540 L2600 2660 2720 2780 2840
ACTTTCCAGT
ACACACTTGA
GAATGGTGAA
ACGCCAGATA
GCTGCAGccA
GCAGCCTTAG
AGAACGAGGG
ACTCAGAAGA
GAGGGAAAGG
CTAGACTGAC
CACTTT(3CTA ATGCACTCCA GAAGAAAATC TCAGTACATC
TATAGGAGT
TTAGAAACGT
CAGGTGGCCG
CAAGACCCCA
CCCAGCACTT
CTGGCCAACA
GGCGGGCGCC
GGAGGCGGAG
CCCAGTGGCC
AGGTGGGCGG
TCTATATAAA
TGGGAGGCCG
CAATGAAACC
TGTAGTCCCA
CTTGCAGTGA
TCTCCAAAAA
TAGAATITTT
GGGCCGGGTG
ATCTGAGTCC
ACATTAJAJA,
AGGCGGGCAG
CCGACTCTAC
GCTACTCGAG
GCCGAGATTG
AAAAAAAAAA
CTAAGCAGTT
TGGTGGCTCA
AGGAGTTGA
GGGCCAGGCG
ATCACTTGAG
TACAAATACA
AGGCTGAGGC
CGCCACTGCA
CC
AG
CG
GA,
G
GCi GGCAACGGAG
CAAGACTCCA
CAGGCTCAGA GCCTTCACGA AAATCCCACA
AAGAAAAGCT
AAGGAAGAAT TAACACCAAT 4 4 CCTTCACAGA
CTCTTTCC
GGGAGGCCGC ACCCCTTAi ACCCCAGCAG
CACACAGA
CGTGGCAGGA ACCAGAGGC AT'GGCCGGGT
GTGGTGGC]
GATCACTTGA
GGTCAGGAG
ACTAAAAATA
CAAAAAATT
GGAGGCTGAG
CCAGGACAJA
TGCGCCATTG
CACTCCAGC
ATTTAAAACT CTGTTCCTT, GCGAGGGGCT GCCATcACG( CTGTCAGATG
GCACCGGGC
TGAGCAGGTC
TGAGCTGCCC
CCGAGCCCCA CACCTGcCGG GTGCTGGACA
GCTCCTTCCT
GGGGTCCTGG GCTGGGcTGG TGCACCTCTC AGCAGGCCTT AA GAATACAG
GGAATGCACAC
3A AGGCGCATP M ACATGTGGC *C ACACCTGTA *T TCCAAGACC.
'A GCTGGGCAD T CGCCTGAAC(
TGGGTGACAC
PGCTGCACCAG
3ACGGTGCAC.
rCTGTCCTGTC
CCCGCTGACC
GCAGCAGGTG
CACGTTCTCA
GG4GTCCTGCC
TGTTGGACAG
AAACCTCGAG
GTCCTTGCTG
AAGCCCAGGG
GGTACCTTGC
CTCAGTTGGC
GCACGGGGCC
CCCTACTCGC
CTGAAGCTCA
TGATCCAGCAC
I'GGAAACGGA
C
AGAGGAGGGG
C
3GGGTGCCAGG 3GAAAGAGCC A GGTGGGAACG CTTCCCATTC
ATACGGAAAC
~G TGGGGTCCTC AAGAGGTTAC ATG4CAAACTA LGCCGCGACCA GGAGGGGTG CTCCCGAGTc T GCTCGTATTT AAGTTAATTA
AAATGGAACG
ATCCCAGCACT TTGGGAGCG
GAGGCGGGCA
A GCCTGGCCAA CACAGTGAAAJ
CCCCGTCTCT
3GTGGCAG C CCTGTAJATCC
CAGCTACTCA
3CGGGAGGTGG AGGTTGCAGT
GAGCTGAGAT
CGAGACTCCA TCTAAAAAAG
AAAATATGAJA
TCTGCTGTCA AGTGTTCAGT
GGCACACGTC
TGTCCCATAT ATCCAGCATT1 CTAGGAcAT'r TGCTGAGGAG GTGGCTTCTC
ATCCCTGTCC
ACTGCCCTCG TCCTGCAGGT
GGCTGGGAGC
CTGGACATCG ACAGCTGCCT
GGACTCGTCC
GGCCTCCACG CTGAGGTGAG
GACTCTACTG
GCCTTGGCGC AGCTTGGACT
CAAGACACTG
ATGAAGAGTG ACTTGT'rTCT
GGATGATTCT
42900 42960 43020 43080 43140 43200 43260 43320 43380 43440 43500 43*560 43'620 43680 43740 43800 43860 43920 43980 44040 44100 44160 44220 44280 14340 14400 44"60 4520 4580 4640 4700 *00*
AAGAGGTGG(
CGGGGGTGTC
CCGGGAGCAC
GTGTGTGTGA
GGCCCTCCGG
GTAGCAATCT
ACGGCTTCTC
TGGGG7GAGA
TGTACCTCTA
CCCTACCCAA
TCGGGGGAGG
CGCCTGGc3GT
TCTGAACCTC
TTCCCTAGAG
CGGGCTGCGT
GTTTGCTCGG
CACATCCCCT
CGAGGGAACG
GCGGCAGCTG
CCTGGCCAGC
GGAGGGGGCT
GATGAAGACC
GACACCCACA
GGGGAThCC
AGGGTGGGGT
TGTTGTCTGT(
cc
GG
TG
TG
;CG
CA
CTGGTGC;z
TGTCTGW-
TCCGTGCG
ACCCGCGC
GACCTGCT
3CGGGCCA
CCAAATC
-=GCAGC
CCTTGCC
)GCTCAGC
GGACTCA
'AGGGCTG
TGGGATC
LGGTCACTGTC
GCTCCATGTC
TGACTGGAcc
CACCTGCAGT
CAGTGACCCG
TGGGCTGGGC
CTTCTCAGCA
TGGGCCCACC
GAGGGGGTcA
AGCCTGTGAG
GGCCAGGCAG
TGGCTGCACC
TCTGGGGTGC
GTCACACCAC
GGGGTGGGCT
CTGGTGTGCT
TCCATTGTGG-
CCAGAGGAGG
TCAGGTGAGC
CTATGCCTCC
GCAGCCCAGC
TGTCCGGCTC
CCGTGGTTCC
ACTTCACTTC
4 4 4 4 CCTCTAGGA GGGAGCAGGC TCATGGGGCT 'rrGTAGGAGC AGAAAGGCTC
CTGTGTGAGG(
CTGGCCGGGG
CCACGTTTTT
TAAATACCCC ATTTTTGGCC GGCCGAGGTG GCCAGATGAC AAACCCCGTC TCTACTAAAA
TCCCAGTTAC
AGTGAGCCGA
AAAAACAAAA
TAGAGGTTAC
CGCACCCACG
AGGTGTGCTG
CAGTGGCATG
TGCACAGCGG
CAGAATTACC
GCTGCTTTTG
GTTAAGCAACC
CTTCCTGATT'
TGGTTG3GTTT
AGCCCCAGGTA
TTTCTGACGGC
TTCCTCAACC C TTGGGTTTGT C GCTCCTTTCC c.
CCGGCGCCTc T CTCTGCCAGC G GAATAGTIGA
GC
TGCCCAGACT
C
CAAGCAGTTA
T
CCGGCTAATT
T
TGGTCTTGAA ci CAGGCGTGAG
C
AGACAGGGTC
T
TCGGGAGACT
GATCGCGCCA
AAATTCCTCA
CTTGTATGTA
GCGTGTTCCC
AGGTCCACAC
CAGTGCAGGT
CTGCGGTGTC
rAATGACGCA ATCCACATC
'AAGAACTAA
.ACAAAAGGCA
'AAAAATACAC
,TCTCCGGCC C AGCTCAGAC T TTCCCCGTC C( CTCCACCCC T A.AC CAGAG C 3CCCr.CA T kGCAGCCTGC TC
CTGGGTI'A
;AGTGCACT
CG
TGCCTCAG CC "ITGTGT CT CCTGACCT CC
ACCACGCCTG
ACTGTCTC CA
ATCTTGGTCT
GCCGCGGTC
CTGACGTCAG
ATACAAA
GACGTAGGAG
CTCCACTCCA
ATTTCTTGGT
GTCACCCACA
ACGCGTGTGA
GCCCTG4CCG
GCGGTCCCC
PCGGTTTGGG
ETTCTCAGAAC
'AGTTTCATT
T
LGAGGTATGA
A
.CATAACCAT C
AGTATCGGGA
AAAGCTCTC T(x CCGCATCCA C) CTTGCTCGG
AC
TGCGTCCT
TC
kCGGCCTTC CC 'CCCTGCCA CT CCTGAGTC AG
TTTTATTTTT
~ATGATCT CO4 TCCCAAGT AG(
GTTTTAGTAC
TGATCCAC ccI
?LCCAAGTTGA
1CTCCTGA C CACAGCAGTG
AGAAATTATG
GCTCACACGT
GTAATCCCAG
CAGTTCGAGA
CCACCCTGGC
TTAGCCGCGC
ATGCTGGCAG
AATCGATTCA
ACCTGGTACC
GCCTGGGCAA
CAAGAGCGAJA
TGTTTTCTAA
CTTATCAACA
rACTCACCCA
CATGGCAGCC
CCCGGGCTC TGCCATGCCC E'TGCACTGCA
GCTGCCTCCAC
GCCCACAG
GCCACACCAC
'AACTACCCC CTGTGACATT 'ACATCCCTC GCACTAAGTC
G
GTGTTAT TCCTTrTGAGT
C
CTGCCCCTG GACTCAAACAA
GCCGGTTT
CACTTTGGCA
CAACATGC(
GCGCCTCTAC
TGAAGGTTG'I
ACTCCGTCTC
AATGGTCATA
3GCGGCGGAG
['CCTATGCTC
;GATTCACTG
LGGGCCTCCA
IGCACAGCAA
TGCGTGACT
CTTCTCATT
~AAGGAAAA
~TAAATTGC
k.TGCTGACC
~CTCCCGGC
TCCCGCG
ALCATGAGG(
kiATCCAGC(
,AAATCCAC
~CAGAGCGC
ACTGCTTC
CTCACTC
'TGCGCCTC
CCATGCCT
CCCTGAG
P.TTI'ATTT
CTCACTC
TAAGATT
k.GAGGAGG kTCTCAGC
;GCTACGT
.ICAAGTGA
CATCTTTATG
GTAGTCACTA
ATTCAGTCCT
GTGCCC'CA
GGCCTGACCA
AGCCTCCAGC
AK
Cl
CC
C
r44760 ~'44820 '44880 44940 45000 45060 45120 4518o 45240 45300 45360 45420 45480 45540 45600 45660 45720 45780 45840 45900 45960 46020 46080 46140 46200 46260 46320 46380 46440 46500 46560 AGCCTGCACC CTCCcQ'CCTG CCCTATTrCGC CATTCTCCGT TCATTTGTCT
TGCTATAAAG
TTTTGACATG CAGTCTCTGT.
CAAAGTCTCC
CTCCCACGTT
ACAGCCCCC
CCOCCACAG
TTTCACCATC
TTACCCAGCC
CTCCCAAAAT
GCTCACATTA
CATTTTTTAA
TTTTTTGTAA
TCCTCCTCCC TCAGCCTCCT
GAAGTGCT
GAGAGACA
GGTGATG
TCTCGCAG'
TAGCAGCTC
GGTCACTGC
GGGATTTGP,
GCATGGGAC
CCAATCATA
TGTAAGGGC
GGGACGCAG4
GCACCACTC(
GACCCAAGA(
CCTGGAGG4CC
GAAGACAGAC
GAACTGGGA~z
GGATGCCCC'I
ACTGGTGGAG
GCTCAGCCTG
CCCCCCGGGC
CCTCGGCTGG
ACGGGGGACC
ACTGGTGGCC
GACGCCTGTG
GGCCAAGGAA
TGCGGCCTGT
GACACTCCTG
CTITTGCCCGG
CCCAGCTCCA
CTGTGTGTICT
GGTAGAGTCT
GG GATTACAG GC AAAACAGG 3C TGACGAGG rT GGTCTCGT(
GGGACCGG(
'C TCCTGGAGI ~A AGGCATTC.P 'A GACACACTG G TTTGCAGGG A GCTGGTCAG 3CGGACGCCTI
-CACTCTCGT'
3GCTCAAGALAj
-GGGATGAATJ
ACGCTGGCGC
CAGCCCCAGC
AGCCCCTCCC
GGTCTGCGGA
CTCCTGGTGG
GTGAGTGTTG
GAGCCACTGA
CCTCTGAAGC
AAGCGGCTGC
AGCGCACGTG
GAAGCCCGCA
GCCCCTGCCA
TCCCCCT'rTC
TCTCCCTCCT
ATGCCCACTC
GCCCCCAGGT
TGGGGCGCCC
TTGAGACACT GCGCCCAGCC
AAGAGTGTC'
GCATTCAGTG CAGTGTGACC
CTGGGTCAG(
GCAGGTACGG GAGAGCGTCC
TGAGAGCCCC
TCCCCCTCAA CGTGTCTTCG CTGCCTCTG7 ATATCAGCAT GGTGGCCCGA
TGCAGTGGCA
ACAAGCAGAT CTCTGGCCTC
AGGGAGCCCT
ATGTTT CCTT GTCCAGAAGT
TAATTTTAGG
CGTCTCTAGA TTGTAGAGAT GCTTGTTGGA 9 a T TGAAGGGGG G CTGTGGGCG.
S ACTTCGGT@i r TGGGGTAGG< k CTGCCCGCC( r' CACAGCCTAc
TGCAGAGGC'I
CAGCGAGGCE
TGTGAGCTGC
AGCGCCTGCT
CTGTGGCTGT
CGTGGCTCCT
AGGTGAGGGG
CACCCCCTCC
ACCCGGATGA
TGCCCCGCGT
AGGTCAAGAG
CCTCCGTCTC
CTCACCTCAG
ACCCCACGCC
CTGCCTGGCC
GGGGTTCCGG
CCTGCCAGCT
G CTCATTGCIz A TGGGTTTAI
:CTGGAGTGC-
3 TCTTCCGGC AGO 1'AACA
CATGTCCCT
GGGGGAGCDI
GTCCAGGAC
C~TCTCACAG(
GCCGGCCTM
GGCTGTCTC~z
GTCCAGCAGC
GCTGCCAGGCG
CCAGGTCTTG
AGATGACACC
ACGGCCACCC
GCTACATGGC
TTGTCTCCCA
AAGGCCCTTA
CCCCACTTGC
CTGAAGGCCC
GCAGGGTGTG
CACCTTCCTG
&C CCTGAGAGAc 'C AGCAGCAAG4C ;C TCTTGGTTrc T TTTTGTCGGG T GGGCTTGGCT C AGGTCCAGCA
GGGCCACCCA
N GGTGTGCTTG 3TCTGTCTCTG 3TGTGCCTCCC
GGGTGGGTGG
*GCCAGCTTCC
*GTAGGCTACA
C
CTGGAAGCCC
CTGGTAGAGA
CACGGCTTTG
C
ATGCTGCGGGT
CCTCCCACCCA
GGGGTTCAAT
G
TGCCCCAGTC
C'
CTAAGCACCA
C
TGC
T
rGCCATT Al CAGCCACACC
T
r TTTATCCTCC
SCCGTTCTTTC
GGACTCGGCG
ACCTCTTCTC
CAGCCTCGGT
ACACACTGTT
CCATAAACCT
TGGTTGOAGAC
TGTGCACTGC
GGGCGGGAGA
CTGGCTCCCA
GGGACCCTGT
GCAACTGCCT
CTCCTGGGGA
GCCCAGGCCT
CGTAGCCCCG
CTTCCCCAGG
TGGCCCACGG
3TGCGAGCTT
EGGCCTCATT
;GCCTCCATC
.GTACTTCTC
~CCCGGCTOT
ACTCTTCCT
GAGCCTGGG
TGCACGCAG
CTCTGCAGC
CTGCCAGGG
TGCAGTGGC
CCCTGGCCA
3CCGCAGCC 46620 4668o 46740 46800 46860 46920 46980 47040 47100 47160 47220 47280 47340 47400 47460 47520 47580 47640 47700 47760 47820 47880 47940 48000 48060 48i120 481.80 48240 48300 48360 18420 ATGGCTCCAG CCGTTGCCAA GGGCCCCCCC GTCCACCAGA GGCCAGCTAT GGGGATGCCT GCAGGAGCTG CACAGCCGGG GTCTGTCTTC TGGGCTTTAG CATGTGTAGA TACCTTTGTG CAGAGGGGAG GGACACAGGT CTGTCTGCCC CAGAACATCC CAGGGCAGTG GGAGCCATGT GGGCAGTAGG GGCTGGAGCG' GGATGGCCCA CGTGCTGCTG CCCCACGGCT GCGGCAGGTG C AGCCCTGCTG
TCACTGTGGG
GCCTCCTGGT
GTACATGCTT
CATGCCATGG GCACGCCTAC CCTTCCTGGC
CATCACGCGG
TTTTGCCTTT AGTCCAGCCA GCTGCTAGAA CTGGAGGTAG CCGTGTCTTG
CAGTGCACAG
CCAGGATAAG GCTGAGAAGC TCCCTGGGTC TCTGGTGGCC PGTGACTGAT GCTGTGGCAG CCTACGTCC ACGGGAACCA C ~GGCTGCAGG AAGGTGAGCT ,GAGAGcAGC CTTTAGCGGA
G
CCTCTTAAC ACCGCCGTTT c
CTGGGGCCAG
TTTCTGCTGG
CGTCTGCAAA
TACGGGCATC
GACCCTAGGG
GTGCTGCTGG
GACGGGCCCA
CCAGGTCTAG
GCTCACTCGA
'TCTGAGGAG
TCCAGCCCA
;GCAGGGCGT G ~CTCTGGCAT C CTTCTCTGT
A
TTAAATCGTT
CCCTAGCTGT
CCTCTTGTTG
GTGACCTTTG
c TGGTAACGTT GTCTAATTGA TGGCTGCTGG GAGGGTTCCC TGGGGTGGCG
GCTGACCACI
TGACCCTGCI
GCGCCATCAA
CGGTGCACTG
GACATGTGGA
CATCAGTAGG
TGACAGACAA
CCGTGGCCAG
GGCGGGCATG
TCTGGCCAT
~AGCTGGGGC
;CCCCAAGAC
AGCCCTGCT
.TATGAGAGA
CGAACCAGA
CCTGGCCCC
G3GGTCCACA PiGAGTCCTC
CAGAGCGAG
CGGCCTCCT
kccccTCTC
GTGCAGGA
;CACACT
;TCACCAGC
GCCCCTCA
7ACTGACCC
CCGGCCGT
GCCCTGGC
CTG3CCTCT
CGTCCCGC
48480 48540 48600 48660 48720 48780 48840 48900 48960 49020 49080 49140 49200 49260 49320 49380 49440 49500 49560 49620 49680 49740 49800 49860 49920 49980 50040 50100 50160 50220 GCTCAGGCGA GCTGGCCAGC AGGAAACACT CCTGTTGGGT
GGCCTGGGGC
CGTGCTCGGC
TCTGTGTGTT
CGCAGGAGGC
ACAATGGCTC GGGGACGTGG
GGCCCCGGGC
GACCAGCCTG
CGGCAGGGCA
GCTGGGCCTG
GCTGGACAAC
CAGGAGCCCA
GCCACGCCTG
GCACCCTCTG
GGGGCTGCAC
CGCCCTCAGC
GCTCACCTCG
GTCTACGCCA
GCTCCGGGGT
TGGTCCTGGG
AGCCTGGAGG
AGGTGGGAGC
CCCTCACTCC
CACTGCGCGG
CGCAGGAGCC
GCCGCCGTCA
GTCCGCCCCT
GTACGCCCGT
TCAGCACTCT
TTCAGCACCA
GCCTATTCAG
AGGACAAGGG
GCCGGAAGGG
GCTCCTGTGC
AGAGCCGCGA
TCCCTCCCCT
TCCGGCCCCC
rCCCCGCAGC 3CGCTGTGTT
:GCTGCGCCT
I'TGCGCTGCG
ACCCAGACCC
GCGATTAcGA
CGCCGGATCT
AGTAGTTCTC
CTGGGGTGCG
CGTGTATGAC
CCGGCTGCGC
GCCCTCTCCG
GCTGGCCTAG
TCCCGCCCTG
(fCTGGAGCTC
CGAGTTCCCG
CCGCCTCAGC
TTTGATGAGG
TCCCGGCCCC
CGTTGGCTGG
GCTGGGGTGA
CAGGAGTGCC
GCACCCACGC
AGCGGGGGCT
TTCCTGCAGC
GGGTGGCCGC
GCGGCTTCCA
CCACCCGCTC
A.CGCGCTACA
3CGGCCGGCC( 3CGGGCCTCT CCCCGGCCAG ACCCCGCGCC
TCCCACCGGC
CCCCTCGCGG GGCCCCGCCC GGCAGCGTCT CACCCCTCGC AGCGCCCCGC CCCCTCGCAG 50280
CGTCCCGCCC
CCCGGCAGCG
GACCGCGCCC
CCCGTACTTG
GGCTGCTGGT
CTGACCGCCA
AGGTGGCGCA
TGGTCAAGGT
GCAGACAGAT
TCTTTGGCAAC
TGGTGCTCGG
AGGGCGTCTT
GCTCGTGTCTT
CCCTGGGACT
G
GTGTGTGGGG C
CCTCGCAGGG
TCCCGCCCCC
CCCACAGGTG
GCACAGGGAA
GGCGCTGACG
GTGGACCCGT
GCTGAGCTCC
GAGGGCTGGG
PTCTCGTCCG
;ACATTATGCC
;GTAGCCTAC
G
,GCTCAGCTC
CCTGTGTGG A GGCTCTCTA C TCTGGGCAC T A.CCACGCCT T4 T'GGAGTTGT T( STACGGCCC A( rGACTGAGC C( GCTGCCCT C2 TGGCTCCG AlI CCTGGGCC G GGCCCTGC TC GCAGCAGC TG 1'GGCCCAT cc I'GTGGACC TO4 CAGCAGCA
T
I'CAGTATT AC~ r'CCCCAGA GAC
CCCCGCCC
TCGCAGGG
TGCCTGCT
GGGCGCTG
GCGGCCACI
TTCGTGCG
GCAGCCCG~
CCGGTGGGC
'AGGCTGCC
GAGCTCTG
CCCAGCTG
,GCTCAGCT
.CTCCCTC'D
CCTGTGTCC
GCGGCTGT(
GCGTGGAGI
CCTGCGCAC
;TGGGGGGG
CTGTGCCG
'CGCTCCTC
'CCTCGCA
CTGGGGAC
CCCAGTT
ACAGCCT
CCGGGCCT
CCACTGG
IAGTCCTC
[W"ITTGCC
CG GCAGCGT( ~CC CCGCCCCC GC TGTTCGCC GC GCGTGCTG G0 CACTGGTA 2G GCCGCCCG VG GCCTGGCG~ *G CGGGGCTG( C AGCAGCTAC C CAGAGCTCC G CCATCCTGG
GTACGCCCTC
3GAGCGTGGC
TGCCGAGTC
SGGGCGCCCT,
SGCTGTACCG<
GCTGCGCC'rc
AGAGGGACAC
CCCCCAGTTC
CAGGGGCTCC
CCCCTCCACC
AAGGTGTGAG
TGACCGACTrC
GCAAGGCC!GC
GCGGCCAGCA
CCCCAGCAGG
CTTCCTGGCG
GCTrGTCAAGG 7CC GCCCCCT( 3GC AGCGTCCC .GT GCACTTCC CG GCTCGGAC CG CCTCGCC CG CCGCTTCA G~C CTCOCTGC 3G CGCACACC *G CTTCGTGC(
GGGGTCAC
~T AGGTGACTG 'A CTGGTGTCG C CAGGCCCTG C TGGCACCTG 1CGOCTGGGo.A
SCCGGCCTG
-TGGATGGGCC
-GCCCTGG4GCj
*CGCCACAAAG
AAGGTATCCC
TCCTCCAGCC
CCTGAGCCCT
AACCAGGCCA
AGGAGCAGCC
CTGCCCAGCC
ACACCCCT4pC
GGGGTGGGCC
CCGAGGGCCA
CCGCTGGCGC
CTACGAGATG
CAAGGAGGTG
TGCAGCCGGA
AGGGATGGAG
CCCACCCAGC
GCTGAGCGTG
AGCCGTGTTC
CTACCAGCTG
CGGATCTTCC
GGCCAGTCGG
CAAGGTCCAc
AGTGGACACCC
CTGCACGTAGC
TTTAAAGAGGC
CGT AGGGCCCCGC 'TC CCGCCCTCCT ~CC GTGGCCGAGG ;CC TGGGCGCGGT AG CTGGGTGCCG CT AGCTTCGACC TC TTCCTGCTTT 2C AGGGCTGCAA C CAGTGGTCCG :C TTGGGCCTXG C GCOGCCGGGG C CTTCCCCGCA T TGGTGCTGTrc T CACCCCTGCT 3CTGTTATTC'p
SAGCCCCAGA
TCAGCAAGGT
CTGCCCAGGG
TCCGCTT'rGA
CGGATGTGCC
AGCTGGATGG
CCCGCCTCCA
CAGAGGACGT
GGGCGCCCGC
GCCTTGCCCG
GGGCCAAGAA
GTGGAGTCGG
GGCAGAATGG
GCTTCAGCAC
50340 50400 50460 50520 50580 50640 50700 50760 50820 50880 50940 51000 51060 51120 51180 51240 51300 51360 51420 51480 51540 51600 51660 51720 51780 51840 51900 51960 52020 52080 52140 ~CAGGCAG GGGCATCTGT 2TGTGTGGCC AACCAGGACC CAGGGTCCCC
TCCCCAGCTC
GACACAOCAG TATTOGACGG TTTCTAGCCT COGTC ATATCCCATC CTGAATGC AATTATTC CCGAGGAIC CAGGTACAGC GGGCTGTGCC CGGCCCCACC CCCTGGGCAG ATGTCCCCCA TGCTGGCTTC AGGGAGGGTT AGCCTGCACC GCCGCCACCC TGCCCCTAAG TCCAGTTCCT ACCGTACTCC CTGCACCGTC TCACTGTGTG TCTCGTGTCA TGGTGTTAAA ATGTGTATAT TTTTGTATGT CACTATTTTC ACTAGGGCTG GCCCAGAGCT GGCCTCCCCC AACACCTGCT GCGCTTGGT.A GGTGTGGTGG
CTGCTAAGGC
TTATTACCTC
GTAATTTATA
AGGGGCCTGC
CGTTATGGCA.
GCCCGGCTG(
GCCAGGCAC'
AAGAGAGCAC
TCAGAGGACC
GAGGAAGGT(.
CCCAGGCAGC
GGCATGGCCG
AATAAAAGAG
TTATTTATTT
CTCACACAAA
CATAGCCAGG
GCACCAGAGG
GGCTTGGGCA
CTTGGGCAGC
GCTTGATGTG
GCTATGATGC
TGGCACCTGG
CAGGGCCATC
GGTG3TCC!ACA
TGCTTGGAT(
P CTCATCACC(
CGCCCAGGCC
-CCAGGGTGG1
ACTGTGTGTC
*CTCAAGGCCC
CTTCTAGAGC
CTGTCTGACT
CACTGACAGG
CTCGGTGAAG
TGTGGGCTCG
TAGGCTGGGG
GTAAGTCTGG
CAGCACACCC
GCGGAGCGGG
ACCTGTGAGG
GCCTGTCCCA
T13GCGGGCCA
AGGCCCTCCA
3CGAGCTTGG<
CAGAGGCCT'
TGCTGGCATC
TAGAGGAAAI
TGTGTGTGTC
TCGGAGCTGG
CTCGACACCC
GCAATCTGTG
CAATACCGTC
TCCTCCACCG
GCTGGAGTCT
TTGGAGTAGG
GAGGCGTGGC
CGCCTGAGGG
CAATCCACTT
CCATCTGGGG
CCAGCTCACG
CGAAGGGCAG
TGTCTGGGGA
-CTTGGGCCGG
r GTCATCCTCC
AGGTCTGGGC
GACTCCTCCT
CGCGCGCGCA
CTGTGCCTGC
CCCCAACCCC
CCTCTATGTC
CAAGGCCAGT
AGGAGATGAG
GTGCAGGGGC
CGGCTTCCTCC
AACCGCTCTG C AGCCCCATAT T1
GGAGGGGTAG
ACGTAGGCAG
CCTGGACCCA C
TGCTGGGGGC
CTTGCCCCAG
AAGTAGCAGG
GGGGGCTGGC
CGCGCGAGTG
TTCTGTGTAC
CGCACCAAGC
TGTGCACTGG;
GCAGGAGGGA
"CGCTTCCGC
V'TTGCTATGGC
;CAGATCTGA CCACACACC C ~CCCTACCCG C
,TATCGGTGG
GGGTGAGCTC
CCCCACTCAC
ACAGCTGTCT
GCCAGGTAGC
ACTAGGCATG
TCCCAGGGTG
TGCTGTATGG
CACTTCTGTG3 1GACAAAGTC
GGTCAGGACT
3GGCCCCGGC
[GGCCCACCT
ACGGAGGGT
GGCAGAGGC
GCCCCACAG
TGGCGGAGC
K3TTGGAGCG
ACTATCAGG
ATTTGCGTG
52200 52260 52320 52380 52440 52500 52560 52620 52680 52740 52800 52860 52920 52980 53040 53100 53160 53220 53280 53340 53400 53460 53520 53577
GTTGCGGTCA GACACGATCT TGGCCACGCT GACTTGGTGG TCACGCCAGG.CCCAGGG
INFORMATION FOR SEQ ID NO:2:
SEQUENCE CHARACTERISTICS:
LENGTH: 53526 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
TGTAk-ACTTT TTGAGACAGC ATCTCACCCT
GTTCCCCAGG
CATGGCTCAC TGCAGCGTCA ACCTCCTGGG
TCTACTTGAT
TGTAATAAJAC CCTCCTGCAJA TGTCTTTGTT
TTTCAJATC
TTCGTGGGTT GATGTTCTAT TTTGTTTTTG
TGTGTGTGTG
TTGAGACACA GTCTTGCTCT TGTTGCCCAG
GCTGGAGTGC
CTGCAACTTC CACCTCTTGG GTTCAAGAGA
TTCTCCTGCC
GATTACAGGC GCCGCCACCA CACCCCGCTA
ATTTTGTATT
TCCATATG TCAGGCTGGT CTCAAACTCC
CGACCTCA
CCCAAAATGC TGGGATTACA GGCGTGAGTC
ACCGCACCTGC
AACACAACAG TTCATAATAT ATTCTACATA GACCATACCT
G
CTCTTTTCCC ATTTAACACC TTTTGCCTTA OGTTTATTTT
T
ACTTACTTTG TTTGCAGT'rT CCTGTCTTTT TTTTTTTTTT
T
CTCACTCTGT CACCCAGcGCT GGAGTGAAGT GGCGGGATCT
C
CTCCTGGGTT CATGCGATTC TCCTGCCTCA
GC'ITCCCGAA~T
TGCCACCATG CCCAGCCAAT TTTTGTATTT TTAGTAGACA
CC
CAGGATGGCT CAATCTCTTG ACCTCGTGAT CCACCTGCCT
C(
ATTACAGGCA TGAGCCACG TGCCTGGCCT TITTTTTCT
TI
CTGTCACCCA GGCTGGAGTG CAGTGGGGTA ACCTCAGGTC
AC
GGTTCCAGTG ATTCTCCTGYC~ CTCAGCCTCC CGAGTAGCTG
O
CATGCCTGGC TAATTTTTGT ATTTTTAGTA GAGACGGOT
TT
CTGGAGTGCA
GTGGTGTGAT
CTGTAAACTT CGAGGGAAGG TTTGTATTTC
ACAGTTTAGC
TGTGTGTTTT
GTGTTTTTTT
AATGGTGTGA
TCTTGGCTCA
TCAGCCTTCC
GAGTAGCTAG
rTTAGTAGAG
ATGGGGTTTC
rGATCCGCCC
ACCTCAGCCT
;CCAATGTTC
TATTTTTGAG
;TTATGTGTA GATAAACAGA CTGGTATCA
ATACTGGCAC
TTTTTTTTT
GAGACAGAGT
GGCTCACTG
CAACCTCTAC
P.GCTGAGAC
CACAACTGTG
3GGGTTTCA
CCATACTGSC
MGCCTCCCA
AAGTGCTGGG
~TTGAGATG GAGTCTrCACT ~TGCGACCT
CCGCCTCCCG
~ATTACAGG CACCCACCAC 120 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800
GGTCTCGAAC
ACAGGTGTA
CCCCCTTTTA
AAGAAACATT
TTGCTCTT
TCACTCTTGT
CTCCTGGGT
TGGcACCACC
TCTTGGCCTC
GCCACTGTGC
GGCTAACAAT
TCCCTCACGT
TCACCACATG
TGCCCAGGCT
CAAGCGATTC
CCCAGCTAAT
ATGTGACCCG
CTGGCCTGGC
TATTCACTGT
CTTCTTCCCT
GATTTTGTTT
GGAGTGCCAT
TCCTGTCZTCA
TTTGTATTT
CCTGCCTTGG
TTTCTTGTTT
TAATAAAAC
GAACCAAACA
TTTTGTTTCT
GGCACAATCT
GCCTCCTGAG
TGCCACGT
CCTCCCAAAG
CTTTTCTCCT
CCTCAGGTCT
AGATCTCTGG
TTG WrTTTTG
CAGCTCACTG
TAGCTGGGAT
TGGCCAGGTT
TGCTGGGATT
CTTCTAGTTT
GTATTTTATC
CACATTTTAT
AGATGGAGTC
CAACCTCCAC
TTAGTAGAGA CG TTCA CCATGTTGGT CAGGCTGGTC
TCGAACTCCT
GATTACAGGC
ATGAGCCACC
GACCTTGTGA
TCTGCCCACC
ACGCCCGGCC
CCCATGGTTT
TTGGCCTCCC
AAAGTGCTGG
TTCAAATAGT TTAGAATTTC ATTTCCAGGT AACTAATTTG CTTCTTTAAA CTAAACAAT'r GCATTTTATT CCACAACCGC
CATATGTCTT
CTTCAAAcAAJ TTCTATTTAA GAAATCCTTT TCATTGAGAC TTGGTTAATC
TGTTTTGC'I
AATCTTAAC
TACCATCCA
GCTCTGTTG
CTACTGCCT
TTTTTGTAT'
TGATCTTGT(
CACTCCCAG(
ACTCTTTTGC
CTGGGCTCAC
CAGTTTG7GC'I
GGTCTCCAAC
ACCGGAGTGA
CTCATTTAAT
AATAAATAAG
TCTTTCAATA
CCCTCTGTCT
GTCGG7GAGGC
AACCGCGGAA
CTGGCCCACA
ACGCACTTTA
CTGCAGTGCG
GCGTGGGGCG
AGCCTCCGGA
CCTGCACCCG
AGGAGCCGCG
GAGCGGCCTG
CCATCCAGCC
'C ATTTGGCAC ;T CTGTCATT' .C CTGCTGTcYI C CCAGGCTGG C AGGCTCCCT r TTTAGTAGA 3ATCTGCCCG
CAGTTCTT=
-CCAGGCGGG
-ACGCTCCTC(
AATTTTGGC'
*TCCTGGACTC
GCCACTGTGC
AAAGGGAATA
GCAATAAATG
AGCTCTCACC
CTCTCTCAGC
TGCGGGTACT
GAAGGATCAG
CCACAGGAGA
GCCTGCAGCG
ACGGCGGTGC
GAGCTTCCGG
TGCCAGTCCC
CCCCGCCCCC
GCGGGGCCCG
GCCCCGAGCC
CGCGCCCGCC
AGTTTCTTGT
ACTGCAATTA
ACCTGGTAAA
GTGCAGTGGC
GTAGCTGGGA
ATGAGGTTTC
CTCGGCCTCC
TTTCTTTrTrT
GGCTGTTTCT
AAAGCTGGGT
TTTCTTTTTT
ACAACCTCTG
TTATAGGTGC
ACCATGTTGG
CAAAGTGCTG
CCATTTTTTT
GTGCAGTGGC ACAATCACGG
GGCCTCAGCC
P' AATTT'ITGTP.
AAGGGATCCA
CCTGCTGGCA
ATTGTAGCAC
AATGGATGGG
ATCAACCTCC
CAGGAAACCT
GACTCGGGCC
GGTGGAGCCT
AGGGCGGAGC
GGGCGGAGCG
TTCCAGACGC
AGGCCCCGCC
TCATCGCTGG
CCTCGCCCCG
CACTGCAGCG
CCGAGCGGGC
ATGCCGTCCG
TCTCGjAGTAC
GAAACGGGGT
CCTTCCTCc
AATTTCTTAA
ACTTTTTCTA
GAATGAAGGA
CATTGCCTGT
GGGGTAGGGA
GCGCACGGAG
GTGGCTGCTG
AGATGGCACC
TGAAAAATAG
TCCGCCCCAC
CTGCTGCCGA
CCCGGTCGCG
TCCGCCCCGCC
CCAGCGTCCG I GTCGCTCAGC I
CGGGCCCCGCC
TC-CCTCCACT
TTGGAATACA
TTTTTTTGAG
CCTCCCAGGT
CTGCCACCAT
CTAGGCTGGT
GGATTACAGG
rTTTTTCGAG
CTCAGCGCAG
:!TGGGACTAC
TCGCCAT
CTCTCAAAG *CTGTCTGTG C
,AGCTGTGAA
GTGGGTTTC C CTCTCTCTT C GCTTGGAGC C rCGCGGGAGA AGGAGGAGG A L'GCCCACCG C, rCGTGCTCC 'ID rCGCATGCG CC CTGTGGAG C 'GTGGCGAA C 'CGCGCGGG GI ;CGGGCGGC CC ;CAGGTCGC C.
GGAGTCCTTG
ATCGCAGCCT
ACGGAGTCTT
TCAAGCGATT
GCCCAGCTGA
CTCGAACTTC
CATGAGCCAC
ACAGGATCTT
CCACTGCCTA
bjAGCGTGAGC
['GGCCAGGCT
rTCTGGGATT
CTCAGTGAC
'ATTCAATGG
TCCCTCTTG
CCCCTCTCT
AGCGGGTGC
AGGATCCAC
ACCCGCCGC
TTCCCGCCC
GGCCGACT
CCCGGGAAC
3GAGGGTGA: 3GGGCGGAG3 kGGAGGAGG
;AGCTCCCG
~CGCAGCC
1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 .TGAGCTGCG GCCTCCGCGC GCGCGCGGCC CTGGGGACGG CGGGGCCATG CGCGCGCTGC CCTAACGATG CCGCCCGCCG 3660
CGCCCGCCCG
GCCCCGGGCG
CCGCCTGCCG
TCCCCGCGGA
GGACGGGCGG
CGCTCCCAGG
GAGCCGCGGT
CACCCGGGGG
GGACAAACAG
GAGAGGTGGG(
GATTTGATGG
GAAGAAGGGG
TGTGAGGGGT
CTCAGGCGCT C CCGTGGGGGG
T
TTATGGAGAC G GAATCCTGAC 'DN
CCTGGCGCTG
CGGCTGCGGG
CGTCAACTGC
CGCCACAGCG
GCGTGGGCGG
GGCGAGGCTC
TGGCACGGCC
CTCTGCAGAC
TCGCTAATTG
3ATGGCTGTA( kTGTGCAAGA I ;GTGAACGGT C ,GGGGCAGGG TGCTATTGG G TAAAGCCTT G CTGCCCAGA G ~TGACCATC Ci
GCCCTGGGCC
CCCTGCGAGC
TCGGGCCGCG
CTGTGAGTAG
GTTCCCTGGC.
CGGCGCGGCA
CCGGGGAGCC
GCCAGCGGGG
3AGAGGAATT 3GGGGCGGCA
~TTGGGCTGA
;TGAGCAAAG)
VGGAGGTGG
G
TTCCAAGGC TCATGTTCG C CCAGGTCTG T GAC.GCATAG G
TGGGCCTGTG
CCCCCTGCCT
GGCTGCGGAC
CGGGCCCAGC
CCGGGAcGGG
CGGCGGGCCC
GAAAAACCCC
GCGGGGCGCG
GGGATGCGGc 3GGAAGAGTTC
['GCTTAGGAA
CCGTGAGGC1
;CTCGCGGGTG
'ATCCTGAGA
A
TTTCGGGAGA
GCCAGGC'rC C' GACCGTGGA G2
GCTCGGGGCG
CTGCGGCCCA
GCTCGGTCCC
GGCACCCGGG
AAGCAGGACG
TGCTAAATAAJ
GGGTCTGGAG
"AGGCCGCGC
-TGGGGCTGC
~CAGGAGGTG 9]
;GGGCGATGAC
IGGAGGCTGG c
GGCTGGGGTC
CAGGGGTGAG
TAAAAACAA C PGTTGGGGG T NTTTGCATT
T
CTGGCGGGGG
GCGCCCGGCG
GCGCTGCGCA
AGAGGCCGCG
CGGGCCAGGA
GGAACCCCTG
ACAGACGTCC
rCAGCTGGGA 3GGGTACCCG
'CTGGAAAAG
;GTGGGTCCA
CACGGGACG
ATGAAGGGC
GGGGGATTG
AiGGTGGCCT
CGTCATGCG
ACAGATGA
3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520
GGAAACAGGT
GCAGGTTT
GACTAAGATA
CCTTGTATTT
GATCCCTGGG
TGGTTTTGTA
GTGTCCTTGG
CAATGGAGCC
GTGTCTGG.CG
ATTGCATCAG
GCCAGGCCGG
TTGGAGAGGT
GGAACTATGA
AGCAGACAGT
CCCAGGCTCC
GGACGTGGCA
ACGTTTCTGG
GGCTTTGCTG
CGTGCCCCTC.
GCTGTCTCTG
ACCTACCCCA
ICACTGGGGA
GACACGACCT
GGTGCCCAGG
TGTCCCCAGC
AGGCTCCTCG
CATCCCCAGG
GTCACTCCCG
GGTGGTCTCG
GGGGCCACAT
TGGAGATGGC
CCCGTTGTTT
CTCTGTCCAG4 CCCTCAdGAG4
GTGTTCTCCA
ACCCAGGGTT
GCTGGGGAGA
CCGGGACAGT
CTTGCTAAAC
CCTGTGGCCA
AGGGTGGGAG
TGCTCCTGCG
CTCCTCCTGC
GTGATGCTGT
GCCCTGGTGG
CTGTCCGGGA
GGCCCTTGGC
GGATTGAAAA
GTCTTGGGAC
GTCTCCTTGG
ATTGGGTGGG
CCCTTCCTTA
AAGAATGGGT
CTCCCTGACT
CTGGCAACAG
AGCTGAGGGC
TTCCTGCTrC
GAGAAGGGCG
AICAGCAGTG
GGGCGGAGGG
CAGTCTGATG
GTGCGTGCTG
TTCTGGCATT
GGGGAGCCGT
TCTCCTGGAC
GCGGACGCGT
CACCCACAGA
TCCTCTGTCT
CCAGCACCTG
CTGGTGGCTG
GTCCCAGGCA TCACAGCCGC GATGTGCATA ATGGTGTCCA TGAGAGCAGC CTGAGCGGAG AGCAAGGCCC 0 CCTGCCTrGT GTTGTCCTCT TAGGCTCTGG TCCTGGGGTTr TGGAGGAGGG
GGACCCTGGG
AGTTGGTGGC CTGTCCCAGC GCAACAGGGC ACAGGGTGAC GCCGCCGCTG GGAGAGTTCT AGGAAGGATG TGAAGGCCCT
CTGAGCTGGC
CTCATGTGGG
GGAAGGTGG4G
GGCTGGCCCC
AAGATTCCGA
CAGGTGGGTG
GTGAGGGGAC
CCAGGCCACC
CTCCTCGGGG
GGACACAGAT
ATGCCAGGCC
CTGTTCTGTA
CCA.TGGCAAA
CTCTGTGCTG
AGACGGCTGG
CACAGCTGCT
CCCCAAGTGT
CACACCTGGG
CTAGGGCCTT
TGGGGCAGCC
GTTTTCCCCA
GGCAGGTGTT
CAGCCATTTT GCTGTCTACC CTGCAAACTC
GGGAAGAGG
CAAGGGTCC"
AGGTGCCTC'
GTTCCAGGTC
TGCCGAGAGC
GCTCGAAGCI3
CAGAGCCTGC
T'TTTTTTTT
ACGATCTCGG
CTCCCGAGTA
GTAGAGACAG
CCGCCCGCCT
TGACCCATGT
TCGCAATACT
GTGTGTTCAC
CAATTAGTTG
CTTGTCTCCA
GGTTCAAGCG
GCACCACCGT
CCAGGATGGT
GGATTACAGG
TTTTTTTA
ACCTCACGGC
AGCTGGGATT
GGGTTTCTCC
TTGGCCTCCC
3GTCAAGCTG k AGAGCGTTG P' CCACTTGTGi 3GAGGACAGC
'TGGTGGCTT!
TCGlr7GGGA(
CCTGCCCTCC
TTTTrTTTGAC
CTCATGGCAP
GCTGGGWATTP.
GGTTTCTCCA
CGGCCTCCCA
TTTGAACCAA
GCAGACCCAC
ATTTrTGGTT
ATGTCTTTTT
GG4CCGCAGTG
ATTCTCCTGC
GCCCAGCTAA
CTCGATCTCC
CGTGAGCCAC
GACGGAGTTT
AACCTCCGCC
ACAGGCATGT
ACGTTG3GTCA
AAAGTGTTGG
G GAGAGGTGA C TGTCTGGGTI 3 CTCCCTGGC :D GGGGCTTCT( r TGGAAAAGA' 3ACGTGGGCAC
CTGCCCCGAC
ACAGAGTTCA
CCTCCGCCTC
CAGGCGTGCA
TATTGGTCAG
AAGTGCTGGG
ATTCCAG4CCA
CTAACACAAC
TAATAGTTI2G
TTTTTTCTTT
CAGTGCA'rG
CTCAGCCTCC
TTTTTGTATT
TG3ACCTCGTG
CGCACCCGGC
CGCTCTTGTT
TCCCGGGTTC
GCCACCATGC
GGCTGGTCTC
GA ITACAGGT
A
TCACCAGTAG CCTTCCTGGG
GGGCTCACGC
r GCTGAAGCTC AGCAGGGACA GCTGTGTCCA SAGGCCACAGC CTGCCTTGGG TTAATGATGC V GGCGTACTGC AAAACGTGCT GCTCTGCGTG AGCCGTGGCT GACTCACAGA CCCCCCACCC CCTTCTCCCT CCTGACCCAT GTGTTTTTTT CTCTTGTTGC CAAGGCTGGA GTGCAATGGC CTGGGTTCAA GCGCTTTTTC CTGCCTCA.GC CCACCATGCC TGGCTAATTT TGTATTTTTA GCTGGTCTTG AACTCCTGAC CTCAGATGAT ATTACAGOCA TGAGCCACCA CGCCCAGCCC CCCTTTTATC TGCAAGCATT TTGGAGGGCA AGACAGTTCC TTCATGCCAC CGAAGGCCTG AATTAAGAGC CAAATAAGGT CCACACACTG TTTTTTTTTT TTTTGAGACG GAGTCTTGCT ATCTCAGCTC ACCGCAACCT CCGACTCCCT CGAGTACCTG GTAGCTGGGT TTACAGGCAT TTTAGTAGAG ACGGGGTTTT ACTGTGTTGG ATCTGCCCAC CTCGGCCTCC CAAAGTGCTG CAATGTI'CTT 'TAAAAATATA' TACTTTTT GCCCAGGCTG GAGTGCAGTG GCGCGATCTC AAGTGATTCT CCTGCCTCAG CCJTCTCCAGT CTGGCTAATT TTGTATTrTTT AGGAGAGACG AAACTCCTGA CCTCAGGTGA TCCGCCTGCC GTGAGCCAAC GCGCCCAGAC AAAAATATAT 5580 5640 5700 57 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380
GTGTGTCTTT
CCTGGCTGTG
TTGTTTTGTT
TGATCTCGGC
CCTGAGTAGC
TTAGTAGAGA
GATCTGCTTG
TTATTTATTT
TGGAGTGCAG
CTCCTGCCTC
TTTGTATTTTJ
ACCTCGTGAT C
CCTGTCTTTT
AATTTGTTGA
A~
CCTCCATGGT G TCGTTGGAGC C AGATGTCATC A ACCCTGGCTG G GGCACATCGA
G
GGCCTTGCCCT'
CAGCAGGCGC
AC
GGAGCGACCC CJ GGGCTGGCCA
C
GAGGACTAGGG2 GAGGGCTCCG
T
AGAGA~TVGAT
G
GCCAGGTTTC A CTCGGGGGCA G AAGAGCCCTIG GC GCCTCACCCG G
AAGGCTGGTC
ATCAATTCGT
TTGTTTGAGA
TCACTGCAGC
TGYGGATTAGA
TGfGGGTTTCA
CCTCGGCCTC
A ITTATTTAT
CAGTGCCATC
k.GCCTCCTGAC
['AGTAGAGACC
CTCCCGCCT C LAATGTCCGA
~GAAACTGGCT
TCACCTCCGT
GACATCCCA GCTCAGGGC C 13GGTCACCT C 3GACTTGGT A MTCCCACCT GC ACCCCGGG C E'AGGCCTGA TNC CCACGGAG C LTTGTCACC A 'ATCACTGG AG ,TGGGGAGG C AGCTCCCA C
GGTAGGCC
AGCCGAGC C GTCCCTGA GC( AAG4CAAAG
TGTGAACA,
CGGAGTCTC
CTCCATCT(
GGCGCGCCC
CCATGTTGC
CCAAAGTGC
TTTTATTAT
r'CAGCTCAC
TAGCCTG
GGTTTCAi
AGCCTCCC
rGATGTCTA(
CCTGCAGC(
GGTGCTGTC
TGTCCCAG~z
CCCTGCTC'I
3GTCTGCTG 3GTGCTTGG
'ATCCCTGA
CCCTGCCT
CCTTAGAG
KGCCCTGGC
GGCCTCCA
CCCGCGTT
GCTGGCCT
GGGAGCTG
CTGTCAG
ikGGCAGTG
~CGTCTT
GTAGGACTGG AGAAAGAATG
AAGAATTCTA
ACTGTGCTTG GACCACCTAG
CTGATGTCTT
CTCTGTCACC CAGGCTGGAG
GACAATGGTG
CGGGTTCAAG CGATTCTCCT
GCCTCAGCCT
ACCACGCCCG GCTAATTTTT
AAAAATATTT
CAGGCTGGTC TTGAACTCTT
GGCCTTAGGT
GGGATTACAG GTGTGAGTGA
TGTATTTTAT
TGAGATGGAG TCTCACTCTG
TTGCCCAGGC
GCAAGCTCCG CCTCCTGGGT
TCACGCCATT
CTGCTGCCCG CCACCATGCC CAGCTAATTT C CTGTTAGC k. AAGTGCTGG 3 GACCTTCCCI
TGGATTTCT(
AGTGTGTGC'.
LGGTTGTCCTC
AAAGGCCAC9
CTGTCTCGCP.
TTCACTGATG
ATGACAGGAG
GGGATTGGCG
CGCAACTGCC
TCCCATTTCT
TGAGCCCTCA
CAACCAACAC
GGAAGGACCC
CCCAGAGAGA
AGAGCCTAGG
GTGAGGACCT
C AGGATGGTCT G ATTACAGGCT r TCCTCTCTTT
CTGTGTCTT
P' TTGTGTTTCT
GCTGGCACTG
TCTGGTGCTG
AATGCTGG
TAAAATATAG
AGTGTGGGAG
TCGGGGAAGA
ACAGACACAC
GGTCCCTGGA
GCAGAAGGAGC
GCAGATGATTC
CCAGTGCAG
GTCCCCAAG
AGAGGCCTGTG
GCATCCTGCAT
GGATCTCCTG
TGAGCCACCG
TTCCTTGTGC
GGGGCTGCCA
TGTAAATTCG
GCCTAGGTGT
GTTGCCACTC
TCCAGGACTG
GAGCACCCGG
AG.TGTAGGGA
MAGGCATTCT
=TCCTTGGG
E'CTGAGAGC
;GCCACCCTC
.MCCAAGGAC
'GACATTGAA
GCAAGGTGA
,tCTTCTAGG
GCTCCAGCTG
7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240
ACCTGGCTCC CGCACTCCG GTCCCTGCGT GAGAAACGCT AGTTTCTTTA TTAGACGCGG ATGCAACTC
GCCAAACTTG
TGGACAAAAA TGTGGACAAG AAGTCACACG CTCACTCCTG TACGCGATTG CCGGCAGGGG TGGGGGAAGG GATGGGGAG CTTTGGTTGT GTCTGCAGCA GTTGGGAATG
TGGGGCACCC
GAGCTCCCA
ATCGGCTTGi
TTTAAACATX
GGGTGGATC(
TCTCTAAAA?
TTCAGGAGAC
GATCACACCA
AAAAAAAAAA
GGCCAACAGC
ACCACTGTCC
ATGGCTTGAG
TATTAGTGGT
AGTCTATGAG
ACTAGGAAAC
C TGCAGAGGCG
CATCCAGATCA
3 GACAAGGGGT 3CTTGAGCCCA
ATAAAAGAAC
TGAGGTGGGA
CTGCACTCCA
AAAAAATCAC
GTGTGAGAAG
AGCCGGGCGC
GCCAGGAGTT GGCGCCTGTA
G
GTTGAGACTG C
GGTCTTTAAAA
ACTGTGGAGA
TACAGGGAAC
CACTCACGCC
GGAGTTTGAC
ATTGGCCGGG
CATCACTTGA
GGCTGGGTCA
AGGATCTGAA
DTGGTCGGCC
)GTGCCTCAC
"GAGGCCAGC
TCCCAGCTA
AGTTAGCTG
.AAAAAAAAA
'GGCCAAGGT
ACCCCG'ITC
AGCTAATTAG
AGCCAATAT
C
A.AAAAAATG
C
A.GGAGAATC G
"TCCAGCCTG
3CACCCTCT C( 7TCCATACC T kGGAAATTG A kCCCCTGCA G( kGAGGTGGC A GGGGCCCT G CTGCCAGC CC
CAGAGAGCAC
ACTATGATTC
TGGAATCCCA
ACCAGCCTGG
CGTGGTGGTA
GCCGAGGAGG
CAGAGCAAGA
CAGAGATTTC'
TCATTAGTCA
GCCTGTAATc
CT'GCAGGTCA
AACAACAGAC
GCAGTTTGGG
GCAACAGGGT
TGCATCTGTG
TCAAGGCTGC
CCCTGTCTCA
TCCAAAGAAG
TGAGGGAAAC
XCAGCACTTT
JrGGGCAACA TAGCGAGACC TGTAATCTCA GCACTTTGG
CAGCCTGACC
GTCGTGGGCG
CGGGAGGCGG
AGCGAGACTC
AGCTACTTTG
AGGCAAGATT
GAAAAGGTCT
GAGCCCTCTG
CCTGGTCTCA
CATCAGCATG
TGCCCTGGTT
CTrGCACTcC
AGCCCCTCCC
*AACATGGTGA
CCTGTAATCC
AGGTTGCAGT
TGTCTCAAAIA
GGGGCTGAGG
GCACCATTGC
AGGAAGAGTC
TGTTtrGTC
ITTTACACAC
CTCTG.GGGCA I
TATAGACAGA
CCGAGGGGCT
ACCTCTCCCT
3TTGGGAGGC GATGGTGcc
LAAAACAGGG
rGGGGATCA
'ACTAAAAAT
GAGGCTGAG
ACACCACTG
TGAGCGTGG
~TTGAACCT
3GAGACAGA :7CCGCGGTG4
:ATCACGGC
;GCTCTG
:CGCACAGG
;TGGCGCTTC
.VCCCAGGTC
~CTCCCACC
TGAGGGGGGA
ACTGCACTCC
TGGGCGCGGT
CAAGGTCAGG
ACAAAAATTA
GCAGGAGAAT
CACTCTAGCC
TGGCGCATGC
GGGAGGCAGA
GTGAAACTCT
GCCACGCCGG
k~CCGCAGGGT k.GAAGCCGTG-
'TGCCTGGGGC
:CGAGTCGGGC
CAGCTGCTT C .'CTCCTCCC
I
TCCATGCAGT
AGGGACCCCG
AGIGCCAGGGT
GAGACCCCG
GTCCCAGCTA
AGTGAGCTGT
AAAAAAAAAA
b.CGCACAGAT
,TAAATCAA
kGGAGAGCAG kATAAATAGA 3GATTCCCTG
.GCCTGGGCG.
;GTTCACGCC
GTTTGTGAC
CGAGGTGTG
A.CTTGAACC
GGTCAACAG
T'GTAGTCTC
-TCGCAGTG
PCTCAAAAA
.'TCGCGCT
3CAGCCACT
PGATGATTT
'CACACTAG
GCGATGTG
;GTGCTGCC
9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 TCCCACCTCT CCCTCCCTGC CAGCCCCTCC CACCTCTCCC TCCCTGCCAG
CCCCTCCCAC
11100 CTCTCCCTCC CTGCCAGccc CTCCCTGCCA
GCCCCTCCCA
CCAGCCCCTC CCACCTCTCC CCTCCCACCT
CTCCCTCCCT
ACCTCTCCCT CCCTGCCAGc CCTTCTCTCT
AGTTTCCTGT
TTTGCGCCCT GGAGTCAGAC TGGTCCAGTC TCTCAGCCTC CCCTGCAGGG CAGTGTAGCA TTGTTAAAGT AGCTCTGTCG ATGCACTGTG AGACCTGCCC GACACTGTGC
CTGTGCTTAG
GTGGACTCCT
ATGGACCCCTC
CTGTGGGGGC TCACAAGACC C CTCCCACCTC
TCCCTCCCG
CCTCTCCCTC CCTCCAGCCC CTCCCTGCCA
GCCCCTCCCA
GCCAGCCCCT
CCCACCTCTC
CCCTCCCACC TCTCCCTCCC TCAGTTTCAG
GAAGGAGGCT
CTGGGTTCAC
GTCCCAGCGC
PGTTTCCTCA
CCTGTAAAGT
3TGACCTGGC
TCAGCCACTG
3TTCCCTCAG
GGGTTCCGGGC
'GCCACAGAG CAGAGTGTAA
C
LCCAGACACT GGACGACGGG 'AGCACCCAG CCTCGGTGCC
T
GGCCCACTC CTGCTTGTGC C CCAGCCCCTC
CCACCTCTCC
CTCCCACCTC
TCCCTCCCTG
CCTCTCCCTC
CCTGCCAGCC
CCTCCCTGCC
AGCCCCTCCC
TGGCTCATCC
CTGCTGTGTC
GGGAACCCAG ATGTAGGGAJA CTCCACCTCT
GGTGTGACCT
GGGCTCCATG
ATTAGATGCA
'CAGCCCCAA CAATCATACC ;GCCCATTCC
CCTGTCCTCC
AGCCTGAGG GTGAGAGCCA .GCCAGTGCA GCCTGGGCGG TCAGCGCAG GGCCGCGTGG TACATCTGG GTGTTTGCCC
ATTGGTGCCT
AGAGCCGCGA
TGCCTGAGTT
GGTGGTGGCG
GTTGTGCCAA
CCAGGGCATC
CTGGCCCATG
CCAAATGTGG
CTGCCACTGC
CTGCCTGGGA
TCTGG-AGCTC
CCCGC-TGCTG
GGCACGTCTG
GGAAAATGCA
CTTGCCTAGGC
TTTGACGCGT
GTACCGTCCT
CCGCTCAGTG
GTGTGCGCTG
GCCTGAGCCT
TCTATGAGGG
AGTGGGTGAT
GTCCCGCATC
CCTCGCTCCC
rTTGCTTTCC
TGCAGTGGCC
PC'TGGGCCTG
MGGGCACAGC
;CTCACGCC.C I ;GCCAGGGTG C
TCTGGTGTGT
CACTCCTTTT
CCCGCCCTGA
TCGCTGGTGG
CGACGTCCCC
CCTCCTGCTG
GCGCTGGCCA
TGCAAGCCCC
CCACCTTGGG
PGCAAGGAAT.
kCGGCAGCCC 3CCTGCACTC kGCGGGGGCG
~TTCCCAGAAC
CCAGGCACTC
GTGAGACGTG
GTTCTTTTGA
TGTGCGGACc
GCACCGAGAG
CTTCCCGGCT
GAGCCGTCTC
CGGGGCTGGG
CGTAAGCTGG
CCGCTGCATT
TCTTTGGGAG
TTCTGTTGGC
TGTGGATCTC
CCATCTGGTG ACAGTGGCCG
TCCCTGGGTC
GTGCCTCTCc
ATACTTTGGA
PGCCCGCCAA
PGCTGCCTGc
CACAGTCTC
:ACCTCGCTrC
;GTGGCAGGA
'TCCCGAGGC
CCCTAGGGTA
CCCTGCTCGT
GGGAGACACA
GCACCCTGGA
GCTCCAGCTG
CCTGCAGAGT
TTCATGGCTT
GAAGGGCTAC
CCAGCCCTGG
AAGTGTTG4GC
CGAGTGGCAC
CTTGCTGTTA
CTTTGGGGAG
TCTTCTGAGG
CTCTGCCATC
GGCACCGCTG
TGGGGTGGTT
GGGGGAGACC
CAITTTCTTT
A~TGAAAACAT
GCTGAGGCCG
GAGCGCAGCT
3GCAGCTGTC
MTCTGGGGCT
XCCAGCCTGG
3 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 GAGGCGGGCT GGGTCCTTTT CTCCCTGCAG CATTCCTGAC CTAGCAGCG CCATGATNCTG AAGACAGGCT GGCTTCTGTGx AGGCCAccTC AGAAAGGGCT TTGTGCCCAG GCAGAGGCGG AAGCCAGCTC TTCCTTCTGG TTGAGGCAGG 13020
AATGAGGCC
GGCTCCGGC-
CCATCAT'rG
TGTGCTGCT
TTGCTTGCC
ACCCTGCTG
TTGAGCCTC,
GTGAAGGGT(
GCGTGCAGGC
ACCCCAGCGC
TCGTCTCCAC
TCGCAGACTC
GTGCCTGCCT
CCTGGGGACA
CTCCCCAGCT
TGCTTGGGGC
CCAGGGTTGT
CATCTGCCCA
CCCGTTAATT
AAAGGAGCAC
GGGGCCGCAG
AGCCCATTTC
GGGCGGAGGG
CCCTGGGTCT
AGGCGGCCCC
GGGGGTGCCG
TTTCTGG
CGCCTGGGGC
ATTCCCTCTG
,A GCGCTGGGC ;T TCTGAGCCC G CTGTGCCCA G CCCTCTCCA C CCCCCCCCA 2CTGCCCCGGG k. GTGTGCCCA! 3GGAGTGGAG
GCACCAGGA(
CAGGTAGCAJ
ACGGCCCMT
CCCTTTCTCA
GTGGGTCCCG-
GTTCAGCCGT
AGTCTCACAC
CTGGCAGCCT
GGCCGGCCTC
GCCCGGGGCT
TAAACAGGAT
AGCGCAGAGG
TG7GGGGTGGC
CTCACCCTGG
GAACCAGCTC
GGCGTCCTGT
TCCGAGCCGG
GGCCTGGCGC
GGCTCCGTCC
ACTCAGCTCT
CTGTGGATTG
.A AGCCCATGCC 'G TCCACTGTGC ,C AGCCGAGTGG G GGCACTGCTG C CATCCTCTTC G AGGTGTTTGG r CAGGAGCGTA Ek GGGATGCAAG 3CCGGCCCTCA r' ACGTATGAAG
TTGAGCTGGC
TCTCCCTCGT I
GGAGGACCTG
GGGAGGGATC
CCCGTGTCTGC
CGTCCTGTAT C CC'rGGCCCTC C GCACTGTTT T CATTTCCGGC C
GAAACAGATGA
ACTGCCGCCT G GCCTGCTCTC G CTCCAGGAGA G CCTGCCCCTG C( TCCCCCCAGC C( ACACCCGCTG C~ GGCCCCCAGA C( GAGTGTGAGA Cl GGAGGGCCCC G
CAGGGAACGT
ATCGTGG4CCC
GTGATGGGAT
TGCCCGCACA
CTACCTTGGC
GGGATGGTGT
AGGTCAGTGC
GGGGTCACAA
TTCTCCCC TT
CGCTCTCCTT
I'GTGTTTTTCC
rcAGCCTCCG '.GGCTGCCCA
=GAAGGACA
GACCCAGAGA
AGAGGCTGC C
CCATGGAAGT
TTTCAAATGA
CTGGAAGCC Gi GGTCATGGC T TCCCCTGTC C' 3GAGGGACG G "ACGGGGCC T
'*GAGGGAGG.A
AGTTTCCAG)
rGACCACAC G'] CCACTCAG C CTGAGGCA G ;GAGCTGCC CC CACAGCTGTG GGAGTACAGG TGGC-CTCAGG ATGGCTCGTA TCCGGCTGCC CCGCTGGATC GCCGGGCGCA GATGGCCAGT TTCCTCCATT GACACACTGG TGGGGGAGGA GGAGGGCCCC
AGCACCTGC
CGCCTGGCT
GAACTGGAA
2CTACACCC
ATCTGTGTM
GGCCGGAGr.
GTCACCCCC
'AATGCCGCI
.CCCTCGTGC
ACCCCCACC
GGTAGGCGG
GCACCGTCC
CCTCACTCT
2GGCTGGCC rCTCCAGCG 77CCTGGGGG 3GCAGGGGG
;CGGTTACA
['AGGCGGC
WqrCGGA
LTCTGGTCT
'TCTGGTTT
'ACACCCAG
C CACACAGGCT C CATGTCAGCT 3 GGTGGCCCCG
ACAGGTGGGC
k. GGCAAGGACA r~ CTCCATCCCT
GGCATCTCAT
GAGCCTGGGG
AGGGCTCTGT
CCTCGTGGGG
AGCCAGCAGC
CAAACTGCAG
CCTTAALATAG
CAGCGAGGAA
CCCAQACTGC
TCCTCTTGCT
CATGGGGCCT
TAAGCTCCGC
CAGCGCGGGC
TGTGCAGATG
GGGGAGTGGG
GTCTGGGGCC
GGAAGTTCTrC 1308o 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 .14400 14460 14520 14580 14640 14700 14760 14820 9 S
CTCAGTCCCA CTGTTGCATT CCCCGACCCC GGCTCCCCCG GCCCAGGAGC GCCTGTGGGG
CAGAAGGCC
GCCAACTGT
CCTGAGGGG
CGGGCAGAC
TGCTGCCCA
GGCCTGGCC~
AGGGGTGAAC
CCCAGCAGCC
CTTCCAGATC
TGCCCCCATG
CCTGGCCCTC
ATAGCAGGAA
CTGCACTCTG
TGCCTGGGCC
CAGACAGCGG
.C AGCCCCAAGA CTTCCCGGC 'G GGCAGAGCCC AGGGGGAGG C CGAGGCTCCA GGGCGAGG@ C CGCTGCAGCA
TGAGACACG'
-AGCCCTGGGA CGTGGCCCCj 'CACCCTCTGC TGTTGCTGC'I ACCCGTACTG
TGGCCACACA
*GCTAAGGAGc CCGCTGGGTC *GGACGCTCGG CGCTGGGGAC GGGGTGTACT CCTCCCGACA GGGCACCCAT
CAGGCTGTCC
CTGGGGTGAG GAGTGCGTGG CTCTGTGCTC TTGCCTGGGC CTGCTGTGTC CCCCATCCTT CGATGATGTC ACCTGGCGGG C CTGCCAQGCC CAGGCTTCAC
CCACCCTCGC
G CAGGAGAGCc AGCGCCTGGC
TGGGAACACC
SGCCCGACCTG GGG TTCACAC GCCCGGGTGG r GTCAGCTA~C TCGGGCCGGC
AGGCTGGCCC
SCCTGTGACGG GTGTGGAGGG
GCAGCCTCCA
CCTGCTCCAG GATTGGCAAG
GGTGCTGGGA
CCTGGGACTT CCTTCTCCAC
CCAGTGGGC
CCACGCTAG2 ATGGTCCTp.A
CTCCTCCCC
CCCTTGTGTC CCGGGGCTGG
GGCACCGTCC
AGCTTGGCTT CAGCTTCCCT
GGGAGCACAT
CTGTGCACCT GGCTCCCACC
CTTCCAGCTC
GGCAGCAAGG GCCTGGGACC CCAGAGGACC TTAGGGCCGC TCGGTGGTCC
TGCTGCCAGA
GCAGGGAJACC AGAACGTGcGj
GGCAGGGCAT
TGCAGAGGAA GCCCGAGG(Y3
CGGGGTGGGG
14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540
15600 15 660 15720 GGGCTG7GCGC GAGGCTGCCT GGCTAGGCCT TGGCGTTCCC CCAGAACGyc GATGGCAAAA
GCAGATGGAG
GTGCTGTTCC
ACGTGAAAAA
CTGTCCCTGA
CCCAGGGGGC ACCTGAGTCC a
CTGCACCTGG
GGCCGAGCGG
GGGAGAAGGA
GGCAGGGCTG
TGGGATGGGC
CCTGG. TTGT
GGCCAAACAT
TCCCCATGCC
GTGTCCCTTC
TAGGGCTCCT
CCGCCGCAGC
TTCCCCCTCG
CTCCTGTCTG
GGTCCCCAAG
GCCAGCCTGG
CCTGCTGGCG
GGGGAGCCGG
CAGCTCCCAG
CTTCAGGCTT
TGGGGGCAGG
CAGGGGGCCT
GCGCCTCGGG
CGCTGGAAAG'
GGCATGGT(GC
GTACGGGAGC
AGCCCACACC
TACCCAGGGC
GGCCCCAGCT
ACCTTGCTGC
AGCCTGGCAC
TGGAAGAAGT
GGGGCTCTGG.
CAGCAGCCAC
rTCCTTCTTT rGCGAGAGCC 3CCTITGTTC
PCCCGGTGCC
PCCCTCCTCA
.TGGGCAGTG
AAGCGAGGTG
TGAGTCCTGyC
AGACGCTTCC
TCATTCCACT
ATTTCTGGGC
GCAGGGAGTG
GTCCATGGCA
GGTCCTCGGC
TCTTGATGGA
CCTTTCTCCC
TGTGCCCCTC
ACCGTGGCCT
PCATTTCTcC 3GTCTAACTGC 3GTGTGAGTCC
AGGACTCCAC
CCAGGGCAGA
ACACCCTGGG
GCCCTGGGCC
CTTGG3GCTGG
CATGGCCAGA
CCCCCAGGCC
TGACCTGCCC
rTTTCCAGAA.
GTGGCCTGGG
CCTGGGGCAG
TGCAGCACC
'TAAAGCATT
AGTTCCTCA
'AGCTGCCTC
GGGGACCCCT
TGCTTCCACA
GGCTGGGGGA
CTGGGAGCTc
GGTGAGGGCC
ACCGGTGACA
TGG3TTCACAG
C.CACCCCTGC
AATGAGGTGT
TGGGAGCTGC
rTTCACAGCT
TCTCGCCCCT
3GTTCTGCCT
GGCACAGTG
kCCCTGTCTC 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 GAGAATGGCC TCTTGCTGGT TGTGGATGCC TAGCAGCGCG TCAGCCCGTG TCCCTACCTT CTCCCAGCCA
CCACCCTGTC
GCTGTGGGCC CACCCATCCT GGTGTAGAGG AGGGGACGGC
CCACCCCACG
TATGGGCAGT
AGAGAAGCAG
GTGCCTGGCC
AGTGTCCCAC
ATGGAACGAG
GGGGGGAAG
CTCCCCTCC
GGAAGGGAA
AGCCAGTGCi
CGTCCGTGG(
CTGGGGTGG(
CGGGCCCCAC
GCGGAGCTC(
GATGGAGGCC
CTAGGCGGAC
CCGTGGCAGG
GGAGCCATAG
AGGGTGGGCG
TTTGCCCCAG
CTCGGGCAGC
CACTGACTAC
TGCCTGGTTT
CCTCCATCCT
GGGTGCTGGA
GTGATTCTCC
GACTAATTCT
ACTCCTGACC
GAGCCATTGC
CTTTTCTTT
GCAGTGGCGT
CCTCAGCCTC
CATTTTTAGT
CATGATCCAC
GGTGGCCCTG
GGAGCCCAGG
TGCACCCCCA
CCGGAGGGGC
GCACAGGCAT
ACCGTGCCCT
CGTTCCCTGT
AGGCTCTCTG
CCATCCGCAG
CAGCGTCTCA TCTGTCTGGG CACCCAGCCC AGGTGAGGGC
-TGGTGCTGC
-CTCTGGTCC,
CGGGCAGCC'
-GGTGGGAGG
*GCCCCGGCT'.
GCACCTGCTC
AGGTGAGGCC
TCCTGTGCC2!
GCAGAGTGAC
CTCTCTGTC'I
TCTGAAAAGA
TGACATGGGA
TTCTTTTTTT
TTTTTTTTTT
GTGCAGGGGT
TGCCTAAGCC
GTAGTTTTGG
TC-AGGTGATG
ACCCGGCTCT
CTTTTT~TT
GATCTL'GGCT
CTGAGTGGCT
AGAGACAGGG
CCACCTTGGC
T TCCTGGCAC( C CTTCTTTGT'.
r GTGTGTGCG' k. GCAGGGCTIGC P GGGGCTGGCI
;GGTCACTCGU
TCGGGGTCCT
GGAGCACTAC
CCCCGAGGTG
ACTCACCTCC
GAGACATGCT
TGTTTTTCCT
GTTTTTGGAG
ATTTTTATTT
GCGATCTCAG
TCCTGAGTAG
TAGAGACAGG
CGCCCGCCTC
TTCCCCTTCT
TCTTTTGAGA
CACTGCAACC
GGCACTACAG
TTTCACCCTG
CTCCCAAAGT
GAGAAGGCCT CGGCTGCTCT P GGTTCCCTTC
PCTCTCCTGCG
,AGGCTGGCAG
'GCCGGGTGGT
GGGGAGGGTG
GGGGAGCAGG
TGGGAGTGCG
CTTGAGGCCG
GCATCACCAG
GCCGCCCTGT
ACGGCTGTGA
TTTTCTCTTT-
TTTGAGATGG
CTCACTGCAA
CTGGAATTAC2
GTGTCTCCGT(
AGCCTCCCAA)
CCT'rTTCTTC TGGAGTCTCG C
TTCGCCTCCC
GCTCCCGCCG C TTGGCCAGGA TCTGGCATTA C
TCAAGCTCTT
CCGGGTAGGC
GGGCTGGGCG
CATTGCTGGG
ACACCTG3GGG
GGGGTGGTGT
TGGGACCAGG
AGGGGAGGTG
CTCCAGGACC
GGTTTCTGTT
CCAATTGTGC
CTTTCCTCCCJ
k.GCTTCACTC 9J ZCTCTGCCTC C kGGTGCTTGC C ;TTGGTCGGT C LGTGCGGGA
'I
'CTCTCTCCT C ~TCTGTCACC A IGGTTCACGT G 'CATGCCCGG C GGTCTCGAT C AGGAGTGAG
C
GCGGGGATGG 16740 GGGGAGCACC 16800 GGTTCAGTTA 16860 CCCAGATCCT 16920 AGCCCCTGGG 16980 AAGCTCCGGG 17040 CTGGCTCCAC 17100 GTCCCCTCAG 17160 GCCCTGGCCC 17220 rCCTGTGGGA 17280 GGTGTTCAGG 17340 AAGAGCAAGT 17400 PiAGTAGAGGC 17460 3CAGACCTGC 17520 kGGGGTGCCC 17580 3AGTTCTCGG 17640 VGTTTGTAA 17700 C1'TTTCTT 17760 TrCTTCTAAT 17820 CCCTCTCAC 17880 TGCAGGATG 17940 CGGTA 18000 ACCACGCCC 18060 TGGTCTTGA 18120 rACAGGOAG 18180 ::CTTT'CTTr 18240 3GCTGGATT 18300 L'TCTCCTG 18360 PAATTTTTG 18420 VCTTGATCT 18480 'ACCGTGCC 18540
CGGCCATCTI
CGCCCAGGCX
CAAGCGATCC
CCTGGjCTGAT
TCAAACTCCT
CCTTTTCTTC
GTATTTTTGG
TTTTCTAATT
CTGGTCTTTG
CTCGCCGTTT
CTCTCTGGGC
CATGCCGCCA
TCTTTCCTTG CTTTCTCTTT
GTTTTCTTT(
GGACTGCAGT GGCACAATCA
TAGCTCACTC
TTCCTCCTCA GCCCCCCGAG TAGCTGGAJAC TCTTTTTTTC CTTGTAGAGA TGGGGTCTTG, GGCCTTCCCA AAGCACTGGG TTTACAGGCA TTTTTAACTG GAATAGTTGA CGTTTTCTTT CCTTITAGTAT GTCGTGTAAG TTGCTAGTGC TTAITTATAT TTTGCGTAGA AGTTGTGTAT ATGTTTTATT TATTAATTAT
GTATGTATTT
CACCCAGGCT GGAGTACAGT GATGCGATCT TCAAGTGATT TTTCTCTCCT CTACCTCCCG TGCCTGGCTA ATGTGTATTT TTTGTAGATA GAGACCGGGT CTTGCTCTGT CAGCCTCGAC TTCCCTGGCT TACAGTTACA
CACTACCATG
CTATGCTGTC CATCCTGGTC TAAGCCACCA CACCCAGTTT ATTAGCTGTG
TGTCAGGAGG
TTTTCTGAGA TTGTAGTTTG TTTAGATGGA
GTTAGGTCGG
ATTTATTTpT
GAGGTAGAGT
CAGCTCCCTG TAGCCTTGAC AGTACTTGGG ACCCCAGGCG CGGGGTCTCA CTGTGTTGCC 18600 18660 18720 18780 18840 18900 18960 19020 19080 19140 19200
C-AGGGTGGT'.
TGTTACCGGC
CCTCGAGGTC
ACACAGTCAM
GATGGGGGCC
TGGGGGACTG
GTGCCCTTCC
GTGCAGAGGG
TTGGGTCTTC
AGGCCATAGC
ACTAGGGGCT
GCTCCGGGCG
GTCCCdCAGT GC743CiCAGC
TAAGCAACAA
GTGAAATGTA
CCCTTCTGCC
CAGTTCACTG3 r TCAAAATCC'
-GTGTGCCCAC
GCGCCTGCTC
ACCTGGTTG7
CCTCGAGTCT
TGGACAGGG
TGGCCCTGGG
CTGAGGCCCC
CATCAGAAAG
TTGGGGATGC
CTGGCCCTrGA
CTGGACGTTG
CGTGCCAGCA
TGTGGGGGCT
CAAGATTTCT
AGTTGTGGTT
CTCCCAGTTG
CCTGCTTGGA
rGGGCCCAGGC
TGCCTGGCCG
CCCTGTGCTC
GGTCCCACAG
GTGTGGGGGc ATGyGGGGGCC.
TGGACAGGTC
CAGGCCTCTC
TCCCCTCCTG
TGGCAATGTG
CCCACGGCCA
GGCTCCTGGC
TGCGGGGCTC
TCCATCAGCT
ACGTITAGAAG
CTTTGGGTGGC
GTCCGTGTCC C GCCCCCCAGT G
GATCCTTCCG
TCTTGGAGGT
CCTGGTAGCC
TGGGACCACC
TGTGGACAGG
TTGGCCCTGC
CATGTGGCAC
CTGGCTTGGT
ACCTCTGGGA
TGGGATGGGC
CTCACTCCTC2 3AACCTCTCG C k.CTCCGGGTGC r~TGCCGAATc C AGGAATATT
;GTCCTGGCT
:CTTCCAGGCT
;CCGGCrTGGT
TCTCAGCTCC
CTTGTTTCTC
TGGTAGTGAG
CTGTTGGGTT
GTTGGGAGAC
GTGGGATGGG
TCGGCATAGG
rTCCCCAGAT 3TGGGGAGCT
CAGGGAAGG
kGAGACGTCT
~CGCTGGCAG
;GCTGGCGGC
.CCCGTCTCT G3CTATTTA GACCCCAGG
C
TGAGACCAG
TGGGGCAGG
CACGGTGCTG
TGGGT N'ATG
CCTGCTTCTC
CAGAACAGGA
CTTGGCTCTG
TTGGGGGTCC
GCTGAGATGG
GAGTGTTCAT
CAAGGGTGGG
CCTCTGGCCT
XCCACAACCT
!LGCTGTGAGT
kCCGCCTCT
"'CCAGGGATA
TTAATTTAA
CCCCAATAT
,TCCTGGGGG
19320 19380 19440 19500 19560 19620 19680 19740 19800 19860 19920 19980 20040 20100 20160 20220 20280 20340 20400 CTGTCAGGGT GGCTCCAGGO CCTGGTTGCC AGTGGGGGGC TOCCATAGAC
CCTTCCCACC
AGACCTGGT
TGTGACTGTI
CAGCCCGAGC
ATCCCCTTG(
CCAGCCAGGI
GGGGGGCCTC
GGGCGTGGGG
GAGGAGTATG
TCAGCTGCCC
GGCCAGGGCC
TCCAGTGCCT
ACCTGTAGGG
GTGGGGCCCC
C CCCAACAC 3GCCTGGCG 3CAGCCACG
-TGGACAGT
GGGACCTGC
CGGGCCAAC
TGGAGCCAG
TCGCCTGCC
ACGAAGGCC
TCGCAGCCC'
CC'rTTGCC'B
GCCCCACCC'
ACGGACCTC]
CT
TG
TG
T
T
GCCCCTGCCC TGCAGAAACC GCTGCCGCGA TGGG3CGGAGG TGCTGGGCCT GGCTCCCTGG CTGTGGTGAG TGCCGGTGGG CCCTGCAGAC ACTGGGCAG GAACAGCATG GGAGCCTGTG AGGAGCAGAA CCCGGGGTCC CCCTGACAAC AGCTCAGGCA GCTTCAGcCA
GAGGCCTGCA
TGAGTGGGAP.
AGCAGCAGGT
CTGGCCAGCC
TIGGGGCCAGC
GCTCAGGAJAG
AGTGCGGCGG
AGTGGCTGCC
CCGTGGCAGC
3CGCCTTCTG
CCCGTTTGAC
GCGGGTGGTG
TCTGCTTGGC
TCTGTCCTTC
GCCTCTCTGG
GCGGATGTGG
TCTTCTAGGT
AGTGTCCTTT
CTTCTCCACC
CTCCCTGTCA
CTGCCACAC(
GGGCCGGCTG
CCTCGCATCC
GCCCTGGGGG
CCGGCTCAGC
GCCCTGGAGC TCGTGTGCCC CAGAACCGCG GTGGTTCAGG CCGGCCCGAG GTGAGTGTCT GCAGAGCCTG
GTACCCCCGT
GTCTCCAGCG GTGCACCCGC CTGCTACCGC CTGGTGGTGG C'rGGGCCGGG GCCGCCCTGG CCGGGTCACC AGGTGCCTGC GCACTGGGGC AGAGACTGCG GCCCTPGGTGA GCAGGTGGCG AGGTGGACAT GAGCTGGGGG CCTGCACGGG GATCCCTGCC GGGCTGGTIGG GCCC1X3GTCT CAGTGATGGG GCTCCCAGCG T' CTCGGAGCAG 3CCTGTCCCTC P CCTCCAGCAC
GGCCTCTGGC
CTGGGACTTC
CTATGTGCTG
CCTGCTGGGG
GTCCTCGGTG
CCTGGAGGCC
GCTGCCCACT
CTTGGGCCCA
TCTGCCCCTC
AGAAGGCGGC
CAATGGTGGAC
CCCCACCCCCC
GCTGGGGAGTC
CCGGCCGGTG
CAGCCTCCGG
GGTACCCACAG
TGCTCCGGCc
GTCTTCCCTG
CAGCTAGCAG
GGAGACGGc'r
CCTGGGCGCT
ACAGACGTGC
CAGAGTGACG
GCCTACAGCA
CCCCTTCCTC
CACTGACCGT
3GACACGGAG
.'TGGCTGCAG
!AGTCCCGCCC
~GAGGGGCCA
"TCAGGAGGA
;GGCCGTTCC 1
,CACTCCTGG
GCCCCGTGG G
CCCCGCCACC
CCTCCCCAGG
CCTTCCACAT
CCGCCGAGGT
ATCACGTGAC
AGGTGGAAGC
AGAGCCTCGA
TCGTGGCCCT
CCCAGGGCCA
TGACACCCTC
kTCTTCCCTG 3CGCAGGAGC
.TGCAGCGCTJ
M'GGTTGGGAC
LGGAGGTGGG
WGCAGCTcr G r-ACGCCATA C :TGGGTGCTG C
CCCCATCTGG
TCCTGCCCCC
GGCCACCCTG
CGCTGCCCCG
GGATGCCGCT
GGCCGTGCTG
GGCACCTGCC
CCTCAGCATC
GGGCGAGGAG
rCCAGATGGG 3TTCCCACCG 3CAACGGGCA kGTGTCAGGC
='CTGGTCTC
!ATCTCTGAA
LGCTGGGCCG
CAGATGCAG
'GGGAGGTGG
TGTGAGCCT
AAAGGGGGA
GCCCTGAGC
GGCTGGTGCC TGTTGGGGC GGcccAGccc 20460 20520 20580 20640 20700 20760 20820 20880 20940 21000 21060 21120 21180 21240 21300 21360 21420 21480 21540 21600 21660 21720 21780 21840 21900 21960 22020 22080 22140 22200 22260 CCGGGCTCTG AGCCTCAGpT GGCTGCTGTG AGGGTGGGAG
GATGGAGGAG
CCCCTGCCAT CCCACACCCG CCCCCAGGAG CCTAGACGTG TGGATCGGCT TCTCGACTGT
GCAGGGG
GAACTrG( CACCGGG3
GCCCGGA(
GCCGAGCG
CGGCTCGG
GCACTCGC
GGGGACCT
GAGCCCGT
AGGCCTTGC
CTTr3TGCCC
CATTACCTC
GTCCTGCCC
CTGTGGGGG
GGGAGTCTG
AGGTCATGG
TTGGGACCC)
GCACAGCAGC
TGGCCTCTCI
AGGTG4GACAG
TGGGTGGGGC
ACTGGGCTGG
TGGTGCAGGC
CTCCCCTCCT
ACAGGACCCA
TrGC'TCC
GGCCAGGGCT
CGGGGCCCCC
CTCAGGCCAC
CTGCAGGCCC
CCTCTCCCAG
GTG GAG4GTGGGCC CAGCGCCGCA GGGCGAGGCC TTCAGCCTGG
AGAGCTGCCA
CTG CCCGGGGAGC CACACCCAGC CACAGCCGAG CACTGCGTCC
GGCTCGGGCC
POG TGTAACACCG ACCTGTGCTC AGCGCCGCAC AGCTACGTCT GCGAGCTtJCA GT GTGCGGGGGG CCAGGCAGGG GCCTGAGACG CTGGCTGTGG
TTAGGGGCCT
CC CGCGGTGGAG CCTGGGCTGA GGAGGAGGGG CTGGTGGGGG
GGTTTTCG
TC CCCAGTCTGT TCGTCCTGGT GTCCTGGGCC CTGGCCCGGC
GCCTCACTGT
CA CCCCAGGCCC AGTGCAGGAT GCCGAGAACC TCCTCGTG
AGCGCCCAGT
GC AGGGACCCCT GACGCCTCTG GCACAGCAGG ACGGCCTCTC
AGCCCCGCAC
*G AGGTAGTCGG CCCCCCACGT TCTACAACCT GCCCTCCTGC
CTGCCCCTGG
C TGCCCTGCCC ACTGTGGGTC TCGCCAAA ACTTGGGGC CTTAATG3TTG 'A GTGAAGATGG TTGGGAAAAT CCAGAGTGCA GAGAGGAAAG CGTTTACT2A C AGGCCTTTTC TCTGAGCGTG TGTGAGTTAT TCCTGAAAGG
CAGGTCAGG
C CCATGGACAG TTTCCACCGG AGTCTTCCTC TCGAGCGACA GGAGCCAGG3C T CTGATGGCTC GCTCTCCTTC CCTCCCCTCT TCCTGGGAAG
TTCGGGTAG
*GCTTCAGGCT GGGAlrGGGGT CTGTGGAGCT GAGGCGGCCC
CCTGCCCACC
r ATTCCCGGGC CTGCGTCY3A GCCGTGAAGC CTTCCTCACC
ACGGCCGAAT
I GGAGCTCCG CGGCCCGCCC AGCTGCGGCT GCAGGTGTAC
CGGCTCCTCA
TGGGACTCTG GG PGGTGGGT GGTGGGTGGT GGGCGCCGCA
GGACTCGGG
GAGCTTTCAC GTCTGCTGGT CCTGTGGCCA CCAGAGTGGT TCCCAGTCTpqT AGCAGGGG'rT CCAGAGACAC CAGCTCATTCY CAGGTGTCCT
GGGGGTGGAT
CTGCCTGGGG GCCGGCCTGG GTCAGTCGGC TGGCCGGAGA
CGGACGCAC
GAGTGCTGCC CAGGTGGGGA GACCTGTCCT CACAGCAGG CCAGGArrC AGTTGGGCAT CTCTGACGGT GGCCTTV CAAATCAGGG CCCCAACACC 2 CACAGGGACC CCGGAGAACG GCAGCGAGCC TGAGAGCAGG TCCCCGGACA 2 GCTGGcCCcc GCGTGCATrC CAGGGGGACG CTGGTGCCCT GGAGCCAj~cA 2 GCTGGACGCC! TCCTGCCACC CCCAGGCCTG CGCCAATGGC TGCACGTCAG 2 ACCCGGGGCC CCCTATGCGC TATGGAGJAGA GTTCCTCTTC TCCGTTCCCG 2 CGCGCAGTAC TCGGTGTGTG GCCCTGACCTI GGGTCTGTTC CCTGCATCWT 2 CTTCCTGTCT GCTGCCCAGG GTCTGGGTCT GTGCACCAGA CACACCCAGC 2 CTCCCACGTC C'PIGCCACCT CTGACCTCCG ACCTCTGCAG.TGCCCTCGGC 2~ TGGGAGAAGc TCTCGCCTGG GCCCTTGGCA CGAGCTGTGC CTCCTCT'pCC 2~ 22320 22380 22440 22500 22560 22620 22680 22740 22800 22860 22920 2-2980 23040 23100 23160 23220 23280 23340 23400 23460 23520 ~3580 ~3640 3700 3-760 3820 3480 3940 4000 4060 4120
TCTCTCCCAG CACAGCTGCT CC ITCC TOTGTCCCCC GGTCTGCAjAC
TGTCCT
CTTGAGGTGT GGCTGACGAAJ
GCGGGG,
GGTCCACGGG CCATGACCGT
GAGGAC(
CCTCCACGGC CAGGATGTCC TcATGC'j TGGCCCTGCy~ GCCCTCCTGC
ACTGCTC
GTACCTCTCC GCCAACGCCT
CGTCATG
TTGGGCCTGC CCTGCCTGTG
CCCTGCO
GCTGGGCTTG AGGCCCAACC
CTGGACT(
GGTGGGCAAT GGCGTGTCCA GGCACAc GCTGGGCTGv CGGGTCATCT
ACCCTC
CGGCTCAGCC TTGGTGCTCC
AGGGAC
GCCTGGGGGC AGTGTCAGCG CCCGCTpT'j' CGTGCCCGGC TGCCCCITGG
AGACCAC(
GCTCAGTGAG GGGGAGCACG
TGGTGGACC
CCTCAGCCTG CeGGTACGG
CGGAGGAGC
CGAGGCCCGT GTACTGCAG
GAGTCCTAG;
CCCCAGGCAG GTGCCTGCAG
ACAGGGTGC
GAGGGCAGCA GCCCAGTTAC
TGGGGACGT,
TCGGGCTACC TGGTGGCTT
TAAATTCCT(
AACTCATTCC ACTGTCTCAT TTCACAAAAj CGAGCCCCCG TGAGTCTCTC
ACGCCCTCW.
TTTTGAGGTG GAGTCTGAAC CCCTGATGGy; CTGGGTACAT GGGGGACAGc3
GCTGTCTCCA
CCTTGGGAGG GGCCATCAGA
AATGGCGTGA
CAGTGTAGGT GCCTCCCCTC
ACTGCTCCGA
GCCGGGAGcyG TCTGAGAAGA
CTCAGAGAGA
GCTTTACAGA TGGGGAAACT
GAGOCACAGA
T'GGCTGGAGA GCCGGACAGT
GAGTGTCCCA
CCCCGGGCc CTCCTCPCCTP
CTGTGCCCCG
GTCCAGGAGG3 CGACAGGCTA
AGGGCAGAGT
TUTC TGCCAGGTCT TGGCCTGTGT CCTCTCCCCG 24180 GCCT GTCCTTGTCA CGAGCACTGT GGGGAGGCTC 24240 'GCC CTGCGTGTCC ACCCTCATCC GTCGTG3CGG3G 24300 3TGA TGCAGCCCTG CCTCCCTCTC CACACGTCAC 24360 'CCC TGGTGACCTC GTTGGCTTGC AGCACGACGC 24420 'GCC GGCTCCCGGC CACCCTGGTC CCCGGGCCCC 24480 GCT GCCCCACTTG CCAGCCCAGC TOGAGGGCAC 24540 3CT GCTTGCAGCC ACGGAACAGC TCACCGTGCT 24600 3CG GCTGCCTGGG, CGCTATGAGG TCCGGGCAGA 24660 .CT CTCCTGCAGC TTTGACGTG TCTCCCCAGT 24720 -CC CCGCGACGt3C CGCCTCTACG TGCCCACCA 24780 TC TGGTGCCAAC GCCACGGCCA CGGCTCGCG 24840 SA GAATGTCTGC CCTGCCCTGG TGGCCACCTT 24900 -A TACCCTGTTC2 TCAGTGGTAG CACTGCCGG 24960 3T GGTGGTGA AACAGCGCCA GCCGGGCCAA 25020 'C CATCTGTGGC CTCCGCGCCA CGCCCAGCCC 25080 T GGTGAGTATG GCCGAGGCTC CACCACCAC 25140 T CACACAGGOC GTGAGGCCTG GCTTCCCAGT 25200 SGGCCCCGGGC AGGTCCTGCT GGCTGGCTCC 25260 3 GAAAGTCACG GCTCTGACAG TGGCTCCGCT 25320 P' GAATTTAAA CTCTGCTCCC TGACCTCACA 25380 CTGTGTTCTCv GCC!TGGCTAA AGCGAGTGGC 25440 AAACTGCGGG CTGCCCGCGG TGCCACCAG 25500 TCTTGCGGGT ACCTGCCT2T TCACCAGGGG 25560 CCTGTGCAC CTGTCCTGGG TTCTGTAAGC 25620 GCTCTC'rGGG.TGAGGAGCTGv GGGCAAGAGC 25680 GGTGGACTCYp TTGTAGCTGG TACTAGGTwrT 25740 GAGGTTGAGG CATTAGTAGT ACTACATGGC 25800 GCCCGGGCTTj GGCTCCCATG GCATGCAGAG 25860 CGTGGGACpC~ TCC.AGCCCGA CGGGAGGTGT 25920 CCTCCACAGA GCCCAGGC1YG ACACCATTCC 25980
CCCCGCAGAG GTACAGCC( TCAACGACAA
GCAGTCCC]
CGGCGGTCTT
CAAGCTCTC
GCAGGGCGGG
GGCGGGCTC
GGACGCTGCC ATGGCTGTG ACCTGCCACC
TGGGCTCAC
CAACGTCACC GTGAACTAC, GGTCTCCACA GTGCCGGCC( GCTGGTGGAC
TCGGCCGTGC
TGGGCACCAG GCTCTTGTCC TGCI'rTTCTG AGCCTCGGT1 GAGGCTGCAC GGGAGCCGGG CGGGGCGTGA cTGcAGAGTG GCCCACCCCT
GTCCCGGTTC
GATGGGGAGC
AGGCCCTCCA
GACCCCTCGG TGGCCCAGGT GGTGAGGGAT GAGGGGGTGA AGCTCCCCAG TCAAGCTGCC C GTGGTGGAGG CCGCCTCGGA CATGGTCTTC
CGGTGGACCA
[G ACCTTCCAGA ACGTGGTCTT CAATGTCATT
TATCAGAGCG
'A GTAGGTGGGC GGGGGTGGG AGGGGAGGGG
ATGGGGCGGG
ACCTTCACC2'j CTGCCTTCTG CTCTGCTTCA
TGCTGCCCGA
GGTGAGTGGAG GGAGGGACGC CAATCAGGGC
CAGGCCTCTC
T GACGCCTGTpC CCTGCAGCTG ACGGCCTCCA
ACCACGTGAG
ACGTAACCGT GGAGCGGATG AACAGGATGC
AGGGTCTGCA
3 TGCTGTCCCC CAATGCCACG CTAGCACTGA
CGGCGGGCGT
;AGGTGGCCTT CCTGTGAGG ACTCGGGGGC
CGGTTTGGGG
CAGCCCCAGC CTCAGCCGAG GGACCCCCAC
ATCACGGGGT
TCCCTGTCTPG TTGGGAGGdTA ACTGGGTGCA
CAGGAGCCCT
AGAGGCCTCA GCACAGCCGG GTGGGCCCTG
AATGGAGGCC
GAGCCTCGGC TGGGTCCCAA GCACCCCCTG
CCCCGCCACC
ACTCACTGCG TCCCACCGCC CCGGCAGGTG
GACCTTTGGG
CCAGTTCCAG CCTCCGTACA ACGAGTCCTT
CCCGGTTCCA
GCTGGTGGAG CACAATGTCA TGCACACCTA
CGCTGCCCCA
GGGGGCCACT GCCTTTCAG CTCTGAGCAC
GGGTCCCCCC
CCCCTTCCTC CCCAACAGCC CTCACTGTGA
CCTCACCGG
26040 26100 26160 26220 26280 26340 26400 26460 26520 26580 26640 26700 26760 26820 26880 26940 27000 27060 27120 27180 27240 27300 27360 27420 27480 27540 27600 27660 27720 27780 27840
GCTGATGGCT
GAAGCTGGGC
TGCCCCGAGT
GCTACACCTG
GCAGGTGCAG
AGCAGGTGGG
GACCGTGCTG
CGCCTCCCTG
CGTCACCTTC
CGGGGACGGC
GAGGGGCACC
GGCGGAT
GGAGCAGGGC
TAGGCCCTAC
CCCTGAACTG
GCAGCTGCAC
TGCTCCGTCC
CCGTTCACCC
GGTGCCGCGG
GCATCTAATG
CCCTCCGTGG
TACCCGCACC
TCCCCTGTCC
TACCACGTGC
TGGGGTGAGG
CGCCCCCCGC
TTGGAGGCGG
CGCACGTGGC
CGGGCTCCTC
CTGCCCCACT
CCTTCGAGAA
CTGTGGGTGT
CGCTGCCCTC
TGACCCAGAG
GCCTGGAGGT
GAGGGGCCAG
CCTCCCCGGyG
TGCCGTCCTC
TTGGGAGCCT
AGGCGGGGGG
CGGGCCTGTC
CCTGACGCAG
GAGTGACGGC
GCCTGGGGGT
CCAGCCGGCT
CAACAACACG
CGGACTCAGCC
CGCGGTG4CAG
GCGTGGGGGG
CCTGGCTCTT
GCCAGGCAGC
GGGACCCTTA
CTTCTGCCGA
CCCACAGGTG
CAGGTGCCTG
GTCCTGGTGG
3TCTTTACA 3CCAACCACA
'TGAGCGGTG
,IWACATGA
iCGGGCGACA) AGTGGAcAGG GCTGCTCTroC
CCTCAGTGCT
AGGCTGGGCC
GCGGGTGGGG
AGTACCTCCT
TGAGCGTGCG
CCGGCCGGCC
CGTGGGACTT
2CTATGCCTC
'GGCGGCCCA
;CCTGGCCGT
CATCACGTG
2 2 2 2 2 2 2 CGCGTCT'rrG
AGGAGCTCCG
GCCCCCGTGG
TGGTCAGCGC
GACCTTCGA
GTACCTGCG
GGCCCGG3AG, CTGCATccc(
CTACCTCTTC
GACGGTGACj
CCGCGTGAAC
CACCCTGCAG
TGCCTGGCCC
CACCCGTGCC
GACAGTCACC
GGAGCCCGTG
GCCGTACCTGx
GGACGGTGGG
CACCGTTAGG
C ATGGGGGAC G GCACAGAAC
CCTGCACGTG
-ACGCAGCCT(
-GACTGGACC
CACAACTTCI
*AGGGCGCATT7
CCAGAGAGGC
CCGTTCCCCT
AGGGGCCCTG
GCGTCCAACA
CTGGTCACCA
TTCTCTGCTG
TGGCTCGAGG
GTGGCCGGCT
G GCACCGTGCT GTCGGGCCCG GAGGCAACAG
TGGAGCATGT
T GCACAGTGAC CGTGGGTGCG GCCAGCCCCG
CCGGCCACCT
*TGGTCTTCGT CCTGGAGGTG CTGCGCGTTG
AACCCGCCGC
SACGCGCGGCT CACGGCCTAC GTCACCGGGA
ACCCGGCCCA
TCGGGGATGG CTCCTCCAAC ACGACCGTGC GGGGGTG4CCC CGCGGAGCGG CACGTTCCCC CTGGCGCTGG
TGCTGTCCAG
ACTTCACCAG CATCTGCGTG GAGCCAGAGG
TGGGCAACGT
AGTTTGTGCA G4CTCGGGGAC GAGGcCTGGC
TGGTGGCATG
ACCGCTACAC CTGGGACT'rT GGCACCGAGO
AAGCCGCCCC
AGGTGACGTT CATCTACCGA GACCCAGGCT
CCTATCTTGT
ACATCTCTC TGCCAATGAC TCAGCCCTGG
TGGAGGTGCA
GCATCAAGGT CAATGGCTCC CTTGGGCTGG
AGCTGCAGCA
TGGGCCGTGG GCGCCCCGCC AGCTACCTGT
GGGATCTGGG
GTCCGGAGGT CACCCACGCT TACAACAGCA
CAGGTGACTT
GGAATGAGGT GAGCCGCAGC GAGGCCTGGC
TCAATGT.GAC
GGTGAAGCGG CGCGTGCGGG GGCTCGTCGT CAATGCAAGC
CGCACGGTGG
TGGGAGCGTG.AGCTTCAGCA CGTCGCTGGA
GGCCGGCAG']
GCTCTGTGAC CGCTGCACGC CGTGGGCACC
TTCAATATCA
CATCTTCGTC TATGTCCTGC CTTCCCCACC AACCACACGG CTACAGCTGG ACTGCCTGGA CTCGCTCACC GTGCTCGAGG GGGCAGCGCC TGGCCGACT GGCCGCCTCC
CCGAACCCAG
TGGTGGCAGT
GGTGTCGTAT
CGAGCCATTT
ACCACCCATA
AGGGAACCCG
CTGGGCTCAG
TGGCCTCAGC ATCAGGGCCA GCCCTTTTGG
GGGCAGCTGG
CCATCCCTGC
TCGTCACGGC
AGCTCATAGA
TACAGCTIGCA
GGGACAGGGG
CCGGCACCTA
GCACCATGGA
CTGCCGTCZA
ACACTTGGTC
GCTTCCCCAC
CCAACGCCAC
GCGAGCCCGG
CCACGGGCAC
GGGTCCTACC
TGAGAACGAG
GGGGCTGCAG
GGCCGTGGTT
CCCGGCCCTG
CCATGTGCAG
CTTCGTGGAG
CACAAGCGTc
CTTGGAGGAG
ACCCGGCCTpG
CGTGGAAGTG
AGGCAGCTTC
CAATGTGAGC
GATGTGCGCT
'ATCTCTTACA
GTGGGCTCCG
GTGGTGGGCG
AGGGATGG3CA
GCCGGCAGCG
CTGCGGGCCA
CCTGTGCGGT
ACCCTCAGTG-
.GGGCTGAGCT 4
CACTTGGTCA
GATG-TGCAGG
GTGGCGGCCG
TGGTGCTGG
TGCCCCTGAJA
ATTCCTGGT
CCTTCCGCTC
CCCAGGACAG
GTGGCCGCTA
CCAACGTCTC
GCAAAGGCTT
CCAACATGCT
GGCTGATGGT
CCGAGCTGGC:
GGGAGACCTC
CCATGACGGC
L'GCCTGTGAG
3GTCCTCTGT 27900 27960 28020 28080 28140 28200 28260 28320 28380 28440 28500 28560 28620 28680 28740 28800 28860 28920 28980 29040 29100 29160 29220 29280 29340 29400 29460 29520 29580 29640 29700 CGGCAGCAGC AAGCGIYGGCC CTCATGTCAC CATGGTCTTC CCGGATGCTG GCACCTrTCTC
CATCCGGC
GGAGGAGC,
GCTGGTCC,
CGGCGGGG(
CGGAGACC)
GCGCATCG1
CATCGCCAC
CTACGCCTG
CCGCGACGT
CAACGCCCT
TGTGGCCCT4
CAGCCCCAG
GGACACAGA
GGTGAACGCC
GCTGGCCTGC
ATCACAGCGC
TGAGTACCC
TGTGGCCC!TG
GCCTGT3GGG
GAGCATCCAG
CTCATACCGC
CCCCAACCTG
GACACAGGTC
GCICCAGAGAA
GTCCCAGCAT
TGGATGGGGC
GTGTGCGCTG
GGCGGCTGGC
GGCCACCAAC
CCTCCCTGGT
CTCTCTGTTT
TC AATGCCTC CC ATCGTGGG TTTCAGATi 'C AACCCCGA( C GTGGTGAGC 'G GTGCTGGAC G GGCACTGAC G TACTTCTCG C ACCTACACG G GGCAGTGAG 3CAGAGCGGCI
-CCCCGGCGD
r GAGCCCAGG(
-TCCAACCTGC
*CGGGAGCCG(
*AACTACTTIGC
TGGGAGGTGI
CCCGGCGTGG
CACTACTGCT
GCC!AATGTGA
GTGTGGTCAG
GAGGACGGCG
AGTGCGTGGC
CACCCAGCTT
GGGGAGGGGG
TCTCAGGCCA
AACTTTGGGC
GTGGAGTACA
CAGACGGTGG
CATCTACTGT
CTGATGCAAA
CA ACGCAGTC CC TGGTGCTG CC TGCTGGCTI 3G .TGCTCCCC( IG TGCGGGGC) ;G CCGTGAGT( ;A GGAACTTC.P 'C TGCAGAAGG C CCGTGGCCG A ACCGCAC!GC C C(CTGCTTCA, 3 TGGCCTACCJ 3 CCGAGCACT(
-TGAGCTTCT'
AGGTGGACG]
AGGCCCACG'I
ATCGCACCGC
ACGTGAGCCG
TTGTGTTTGT
CGGTGGCCCC
ACACACGGGA
ACCAGACGCC
AGGGCCGTCC
GCCACCAGGGC
TCTCCCGCGC
CGTCGCCCCT
CCCGCGGGAG
CCTTCAGCCT
GTGCCGCClCG
TTTCCGTGTT
TTCTATGTAA
CTGGGTCTCA
GCCACGTAC.'
GGCCAGCAGC
AAGGTGGTGG
CGGCTCAGCT
GTCACCTTCC
GCCCCGTTTC
TCCCACAGCT
AAACCACGTG AGCTGGGCCC GCTGCAGGTG
CCCAACTGCT
AGCCCGCGTG
CAGCG(CGGCT
CCAGGGCGAC TCGCTGGTCA C GGGGCTGTT T GGTGCTGGA C CAACCGCTCI k CTGGGACTr
-CTACCTGAG(
[P CGTGGCGCAc 7GGTCCTGCCC
TGACCTGCGC
CAGCTGCCAG
GC(CTCGGCTG
CGTGTCATTT
CGAGCGCCTG
C(CTGGTGCTG
GCTCAGTTTC
TCCATGCCCC
CTGGCCCGTC
TGTCTCCTGG
TGTTCTCGGC
CAGCACGGTC
GACCGTGTGG
CCCCTCGGCC
TTAGTGCTGG
CACGACAGCC
G GAGATCCAGG G GTTCAGGAC!G 3G(CGCAGTTTG r GGGGATGGGT 3C(CTGGGGACT 3G(CCACGGTGA
CTGCAGGTGC'
GACTGCGTCA.
CGGCCGGGGC
GTGCTGCCGCC
GGGGACACGCC
GTGCCCATCA
TI
GATGGGAGCG
CACTG7GGCCT
G
TCACCCGT(CCA
-CTCAGTGCCT
G
GCCGGGCTCT
G
CTGCAGAGGG
A
ACCATTCCAC
G
AAGGCCGGCC
G
AC 'TGCCTTG
G
TGGAGGCCGC
A(
TGCTTCAG(CT T
ACCTCACGGC
CGCCCGGGCA
G(CCTGCAGGT
T(CCCCCGCGT
AGGCGCAGGT
GCGAGCCTGG
CTCGGGTCGC
TCCTGTCGGG
TGCGCGCCTT
CCGTCCAGTA
AGGCCGCCAC
CGCCAGGGCA
AC(CGCGTGCA.
CCGTCCAGGT
TGATGCGGCG
CCTACCAGAC
3CCCAGCGCG
GCTGGCGCT
ACTGACACA
'TGAGGGTGG
.GTCCTACGA
TGTGGCTTC
CACCCATGA
GTGGGCCCC
CTTTAAAAC
3GCTGGCGG 3GAGCGGCT 2AAGGAGGA k.CAGCCCAG
MCTCTC(CC
M'CTTCCTT
29760 29820 29880 29940 30000 30060 30120 30180 30240 30300 30360 30420 30480 30540 30600 30660 30720 30780 30840 30900 30960 31020 31080 31140 31200 31260 31320 31380 31440 31500 31560
CCAAACCTGC
CACAGTTCCA
TACACAGTCC CCTAGCAATA CACTGTCCCC
AGCCCATGTC
TCCCCTCTCT
GGACCGGCCA
TGCTGTCAGG CGGGGCCTGC
CGTACAGTCT
CCAACAGTGA
CCTGTTGCTG
GGAGGCTTG4G
TGGTGCCCCA
TCAAGCCACA TATGCTCTAG
TGGCAAAAGC
GGAAGAGCCC CTTCCCACCC
CAGAGGTAGC
GATGTGGTGG GCCGGTTCTC
ACCCTCACGC
TGACCCTGAG CCCGTGGTGG
CTGCTCCTGC
GAGTGGGCGT CTGTTCCCCA
GTCCCTGCTT
TCCTCAGCTG GCCTGATTGG GGGTCTTCC
GCAGGTTCC
GTGAGTCAC
CGCATCCCC
CCAGTACAC
AAGTTGGAA,
CCAGGCCCC'
CCAGCAGGC(
GGAGTGGCC(
ACGAAGTGAc
GCTCCAAGCC
CCGAGGGGAC
CACCACCCCI
CTGGATGAGA
GTGCTGCGGG
GAGGAGGGCT
CGCCTCTTCC
GGTGAGTGCA
GCCCTCCCGT
GCCCCGCTGG
TGTGTCTACA
CACTTCGAGG
CTCAACAGGT
CCTCTGCTCG
TCACAGTCTG
ATCCCCAGCA
'A TCCCAGCTC r3 CCTACGTGC C CAGGGGCCC C AGAACGCAC G GTGGCAAAC r GGGAAACCTI
-AAATAGACC'
3GGTGCCCAT J
CCGCAGCTC(
AGGGGTGAGIl *GGGGCCTGC4A
TCTGCCTGCA
CCACCACATC
ACGGCGAGGG
GCGCCTCCAT
CACTGGGCGC
GGCCTGCGTG
GATGCCGTGG
TGTACGCCCT
AGGGCAGCCT
TGGGCCTGGC
GAGCCAGGCC
TAGGTCTTTG
GCTGCACGGG
CGTCATCGAG
:A GCCTCCTG.Z ;C AGCTGCGGT T CAGTGAGCA A CTCCAGTGT C GGATGAGTA' 3 GAGTTTGGG, r GTGTTGGAG( P' GTGTCCTTGC
TACGTGTACJ
GTTGAGCGGC
GGCAGAAGTG-
GCGGTGGGCTI
CACGGGCAGT
ATACACCTTC
CCGCCTGTCC
TGTGCACGCC
GGGGGAGCAG
GGACCGTCCC
GCTGCTGzCGG
CTCCAGCTAC
CGTGGTGGTG
GTGGGAGGGC
GCCATCACCC
CTCACCGCTA
TACTCGTTGG
C AGAGGGGT( LC CCAGGCCCJ 'C CTCACACCC
TGCCTGCCI
CTCTGCCCT
T CCTGGGAGG k~ GCAGCATCC 3 TAACCCCAC 3 AGTGTGTGT(
TGGAGGGCC(
GTGTGGGCGC
GGGCTGACAC
GCACGTACGI
GCAGGCATGC
ACGCTCACGG
CCCAACCGCC
CTCACCACCA
CGGGATCCCC
TCAGGCTGGC
CGCTGTCGCC
GGAGCCGTGC
CAGGACCAGC
GCCCCCGAGA
TCCCAGAGCC
GTGTGCTCC
CCCTGGTCAC
C
'A
G
T
r
TCTGAGGGGA
GCTAAGGGCT
CACATACGTC
CTIGCTAGGGT
TGTATGCCCT
AGTGAGCTCA
CCATGGGTCC
CCCACGCCAG
GGGTGTGGGA
GCAGGAGTCT
TCTTCTCACA
CCAGCTGGGT
TCCGCCGTCC
CCGGCAGTGG
CCCAGTCCTT
GTGCTGATCC
C
SCTGCAAGGCA
CAGGCCGTGT
3CTGCCTCAAT
TGCAGCAGCG
GCTGGGGATG GGTCCCATGG GGCAGAGGGT
TGCGCCCCCT
TCAGCAACAA
GACGCTGGTG
GACTGGTGCT GCGGCGGGGC TGCTGGGCCG
CTCTGGCGAG
CGCCGCTGGG
GGGCTCTTGC
AGGTGCACTT CGAATGCACG CGACTCTGTG ACGTCACGGA ATGACGCGGA,
GGATG.CTGGC
AGGGCCACTG
CGAGGAGTTC-
TGCCCCCGGG TTrTCAGGCCA TGGGAGCCGC
TGTGGTCGCC
CTGCCACCTG
CTCACCACCC
CAACGGCAGC
GCAACGGGGC
AGGGCTrGCTrG
CGGCAGGCCG
CGTGCTGAAC GAGGTGAGTG 31620 31680 31740 31800 31860 31920 31980 32040 32100 32160 32220 32280 32340 32400 32460 32520 32580 32640 32700 32760 32820 32880 32940 33000 33060 33120 33180 33240 33300 33360 33420 CAGCCTGGGA GGGGACGTCA TGCCTGGAGC
TTTGCAGAGG
AGCAGCACCC CGACCTTCCG CTTTCCACAA TGCAGCCCCC AGAAACAGCT CAGTGTTGTG ACCCCGGGAG TCGTgrGGCAT GCTGTGCCCC CTGCCTGCCC ACGAGCGGCA
GCACCG.'GCC
GGGTCCACAC TGTGGATGAC2 GATGGCCCCA CCTGCTCACC( AGGGCAGACG GGCAGCTTGG C GGCGTTCCTC GGTCTCTGAC C AGGGTGCCTC TCGGGGGACCC AAGATGCTGC TCTGTGCCCT
C
CCCCTCCCCT CCTCCCCTCC C
CATCTGCTC
GCTCATCCC
CTCCCAGCA
GCCCAGGAG
GGTCAGTGCi CTG4CTGGCC' CAGTACGAGc
CAGATACGCI
%TCCAGCAGPA
~TGCCCCGCA
~CGAGGAGCT
TGCTTCAGT
'AGGGTGTAA'
CACTCTCCC
CTAGCCCTT
CTCCTCCCC
CCCTAGAC
;C ATGCGTGC'r 'G GGCCCCAGA G CCAcACCCA G GCCCATGTGi 2CGCATCACA( P' CCTGCCGGCc -GG~cccTGGi
AGAACATCAC
TCGCTGCTGC
TGCCTGCCAG
GAGCCTccAG
AGCCTCAGCC
AGAGGGGCCC
CTCCCCTCCC
CCCCTCCTCC
TCCCCTAGCC
CTTCCCCTCA
T GGGACCAAGA G ATAAATCCCA C CGGGCCCTCT
TTACCCTGTT
AGCGTCTAGC
TCCTGCGCTG
CGTGGCGGCA
GGAGACTCTG
GCTGGCCCAG
GGCACTGGGT I1 CCTGGGCTCC TI GTTCTGTCCT
G
AGATGTGGGGA
CTCCCCCTTC C CCTCCCCTAG C( CTTCCCCTCC T( CCTCCTCCCG C
CCTGTACCC,
GTGACCCTpI
CCGGCGTCT(
TTGCCCATGI
ACGTAACTGC
CTGACAGC~pr
GAGCCCAAGC
3TGTCCCTGA
M'CATGGTAG
'CAGCCcccc
TCCTGCCAT
TGTGAACGC
GGGACTAAG
CTCCCCTAG
CCTTTCCCT
CCCTCCCC
TCTTCCCCC(
TACCCCTTCC
CCACTCGTCC
CCCCTCCCC I
CCTCCCCCTC
CTCCTCCCCC
CCTCCTCCCC
TCCCCCTCTC
TCCCCCTTCC
GCTCCTT
TCCCCTTCTC
TCTCCCCCCT
CTCCTCCCCT
TCCTCCCGGC
TCCTCTCCTC
CAGCCCTTC4
-CCTCCTCCC(
-CCCAGCCCCJ
CCTCCCCCTC
*CCCTCCTGTC
TCCTCCCTCC
TCCCCTCCTC
CTCCTCCCAT
ATTTCTCCCT
TCTCCTCCCT
CTCTCTTTTC
CCTCCCCTCC
CCTTCCTCCT
TCCTCTCCTC
CCCTCCTCAT
c4 CCCTCCCCTA GCCCCTCCCC TCCCCCTTCC
TCCCCTCCTC
CCTCTTCCTC
CCCCCTCCTC
TCCCTCCTCC
CCCCTCCCCC
CCCTCCTCCC
CCTCCCCCTG
CCCTTTCTCT
ATCCTTCCCT
TTCCTCCTCC
CCCATTACCC
CCCTCCTCAT
CCCCCTCCTC
CCCCTCCCCI
CCCTCCTCCC
CCCTCCTCCT
CTCCCTTCCT
ATCCCTC'CT
CCCTCCTCTC
CCTCCCTCCC
TCTTCCCTCC
CATTCCCCCT
CTCCTCTCCT
CCCCCTCCTC
TCCTTCCCTC
'CCTCCCCCTT
*TCCTCCCCTC
CCTCCCCTCC
CCCCCTCCCC
CCCGTTCCCA
CTCCTCACCT
CTTCTCCCCT
TTTCCTCCTC
CCTCCCCCCT
CCCCTCCTCC
TCCTTCCCTC
CTCCTATCCC
CCTCCCCTCT
CTCCCCCCTC
TCCCTCCTCC
CCTCCCCTCC
TTCTCTCCCC
CCCCTTCTCC
TCTCCTCTTC
TrlTTCCCTCT
CCCATTCCCC
CACCCCCCTC
CTAACCCCCC
33480 33540 33600 33660 33720 33780 33840 33900 33960 34020 34080 34140 34200 34260 34320 34380 34440 34500 34560 34620 34680 34740 34800 34860 34920 34980 35040 35100 35160 35220 35280 CTCCCCTCCT CCTATTCCCC CTCCTCTCCT CCCCTCCTTC CTCCTCCTCr
CCTCCCATGC
CCCCTCCTCC CCTCCTCCCA TCCCCCTCCT CCCCTCCTCC CTCCTCCCAT TCCTCTCCTC CCCTTCTCTC CCCTCCTCTC CTCCCCWCCT CTCCTCTCCT
CCCATCCCCC
CCTCTCCTCC
35340 35400
CCTCCTCCC
ACTCCTCTC
CTCCTTTCC
CCCTTCTCC
TCTTCTCCC
CCTTTCTCT
CTTTTCTTC
CCACTTTCC4
TTTCCTCTC'.
CCCCTCCCC'.
TCTTTCCCTc
CCCCTTCCCC
CCTCCTCTTC
CTCTTCTCCI
TCCTCTTCCC
TTCCTCTTTC
CCGCTTCCCT
CCCTCTCCCC
TCTCCCCCTT
TCTCTCCCCT
CCCTTCTCTC
CCCCTCTCCT
TCATGTGAAG
CCCTGGGTCA
GTGGGCTCTC
GGGCGTGTGC
CCCAGCAGGG
ATGATGCTCA
GACAGCATCC
'A TCCCCCCTC 'C TCCCCTCAI 'T CCCCTCCCC T CCTCTCCCC C TTTTCCCTT C TTTCTTTCC C CTCCTCCTT 2CTTCCTTTC( r CCCCTTCTIT.
CCTCTTCCCC
-CCCTCTTCTC
-TCCCCTCCTC
*CTTCCTCTC7
CCCCTTCTCT
TCCCCTCCTC
CTCTTCCCCT
CCCCTTTCTC
CTTCTCTCCC
CTCTCCCCCT
CCCCTCTCCC
CCCTCCCCTC
CTCCCCCCTT
AGGTGCCTTG
TGCAGAGCCA
CAGCTGCAGG
AAGGAGTGGG
AGCTCGTATG
TCCTGCAGGC
TCAACATCAC
'C TCCCATCCCC 'C CCCCTCCTCT 'C TCCTTCCCCC T CCCCCTTCTC T TCTCTTCCTC T CCCTTTCCTT r CCTCCCCTCC
-CCTCTCCTTT
P TCCCTCTTCC
TCCCCTCCTC
CTCCCCTCCT
TTCCCTCCCC
TCCCCTCCCC TCCCCTCCCC TTCCCTCCCC CCCCTCCTCC T CCCCTTCTCT C CTCCCCTCTC C TCTCTCTCCC C CCTTCTCTCC C' TCCTCTCCCC C TTCTCCACTC C( TGTGTCGGT G CAGAAAATGC T GCTGGGGGTG G GCCAGGAGCG G
CCGCTCGTGCC'
CCTCCTCTCC
CTCCTCCCCT
TCCTCCCCCT
TTTTTCCCTC
TCCTCCCCTT
CTCCCCTGTT
TCCTTTTCTC
CTCCTTCCTT
CCTCCCCTCC
rTCCCCTCTC
TCCCCTCTTC
PTCCCCTCCC C =CTCTTCCC TI 'rTTCTTCCC T 'CTCCCCTC T CCCTCCCCT T CCCTCCCCT C CCCTTCTCT C' TTCTCTCCC C( CTCCCCTCT C( rTCCCTCTC C' CCTCTCCTC T GCCTGCATC AC ['AGTGAGGA GG 'AGCCAGGT G~ ;GCTGGACA C )GAAGCAGA CC
TCCCCACTCC
CCCCCTCCTC
CCTTCTCCCC
CTCCTCCCTT
CTCCCCTCCT
CTCCTCCCTT
TGTTTCTCTT
rCCTCTCcCCC
TCTTCCCCTC
TCTCCTCCCC
TCCTTCCCTC
ATCCCCCTTC
CCTCCTCCCC
GTCCTCCCTC
CCCTTCTCCC
CCTTTCCCCT
TTCTCTTCCT
CCCTCCTCTT
7TCCTCTTCC CCTCCCCTCC
~CCCTCCCC]
~TCCTCTTCC
'CCCCTCTTC
CTCCTTGTC
CCTCCTCTT
TCCCCTCTT
TCCCCCCTT
CCCTCTCCT
CTTCTCTCC
CCTGTCCT
rCTCCCCCT
TCCCCTCC
GTGGTCCC
;CTGTGGG
LGGACCCGT
GCTGSGCTC
CTGCACAA
'ACGCCCAC
,TGCCACCC
CCTCTTCCCT
CTCCCCTTCC
CCCTCCCCTT
TTCCCTGCCC
CCCTCCCCTC
CCCCTCCCCT
CTCTCCCCTC
CTCCCCCTTC
CCTCCCCCCT
CTCCTCTCCA
TCTCTCCCCT
TCCTCCGCTC
CAGGTGGAGG*
GTCCAGTCAA.
GTAGAGAGGA
CACACAGGGG
GCTGGAGGCC
CGCCATCGGA
GCCCGCCCCG
35460 35520 35580 35640 35700 35760 35820 35880 35940 36000 36060 36120 36180 36240 36300 36360 36420 36480 36540 36600 36660 36720 36780 36840 36900 36960 37020 37080 37140
AGAGACCACC
GCGGGCACCG
AGGTGCCGCG GCCCGTGCCC
TO
TGCGGCCCTT TCCTCTGCCT CCCTCCTCCC
CCCAACCGCG
CTTCGTCCCC CTCCCCTCCC CCCAATTCCC
ATCCTCATCC
TCCTCCCCCT CCCCCTTCCC TATTACCATC
CCTTTTCTCC
ATTTCCCCCC CCGTCCTCCC CGTCCTTTPG
TCCATTCCCC
ATCCCCCTTC CCCTCCCTTA TCCCCCTTCC
CCTCCCTTTC
CCTTCTCTTT TCTCTACCCT TTTCCTTCCT
TTTTCCTCCC
TCTTCGTCCT CATCCCCATC ACCTTCCCCC
TCCCCCCTCC
CCCCTTCCTT CTGCCTGCAC CTCGCTCTCT
GCCCCCTCAG
CCCACCCTCC GGCTCCCCCT TTTTGCCTGC CCCCACCCTC
C
CACTGACCTC ACGCATGTCT GCAGGAGACC
TCATCCACCT
CACCACAGCC CTCAGAGCTG GGAGCCGAGT
CACCATCTCGG
ACAACCTGAC CTCTGCCCTC ATGCGCATCC TCATGCGCTC
C
CCCTGACGCT GGCGGG3CGAG GAGATCGTGG CCCAGGGCAA~
G(
TGCTGTGCTA TGGCGGCGCC CCAGGGCCG GCTGCCACTT
C~
GCGGGGCCCT GGCCAACCTC AGTGACGTrvG TGCAGCTCAT C9] CCTTTCCCTT~ TGGCTATATC AGCAACTACA CCGTCTCCAC
C~
TCCAGACACA GGCCGGCGCC CAGATCCCCA TCGAGCGGCT
CC
CCGTGAAGGT GCCCAACAAC TCGGACTGGG CTGCCCGGGG~
C
CCGCCAACTC CGTTGTGTC CAGCCCCAGG CCTCCGTCGG
TG
GCAGCAACCC TGCGGCCcGG CTGCATCTGC AGCTCAACTA
TA(
GCAGCGGGTG GGGCACACGC GGCCCCCTGG
CCTTGTCTTGG
AGGGCTTCCA TGGGTGTCTC TGGGAATr TGCTTTCTGT
TTC
TGGCCAGAGA GGAGCTGGGG GCCACGGAGA AGCAGGTGCC
AGC
TATGCTTTCA GGCCCGTGGC AGAGGGTGGG CTCAGGAGG
CC
GTGGTTGAGC TTCCCGGCAG GCGTGTGACC TGCGCGT'rCCC TGAGGAACCT GAGCCCTACC TGGCAGTCTA CCTACACTCG
GAG
CAACTGCTCG GCTAGCAGGA GGATCCGCCC
AGAGTCACTCCA
CTACACCTT TTCATTTCCC CGGGGTGAGC TCTGCGGGCC
AGC
GGCATCAG GTCAGCATTG CCTGGGTTAC TGGCCCCA
GG.
GACTGGACCG GGTATGGGCT CTGAGACTrGC GACATCCAAC
CTG
GTCCGCTACC CCTTCCCTGC CCAGGAGCAG AGACCCAGCG
GGG~
TCGCCTTTGC
CCCATCCCAT
CCCTCCCCCA
ATTCCCATTC
ATCTCTCTCC
CCTTTTCTCC
TCATCTTCCT
CATCCCCCTC
CCCCTGCTCC
TCTTCTTCTC
TCTCCCCATC
ATCCCCCTCA
NCCACTCTCT
CTCCAGCTTC
;TTCCCCCTT
TCTCCCAGCC
CTCTACCTC
CCTGTCTCTG
GCCAGCTCG
GACGTGCGGS
ATGGTGGCG
TCCCAGGCCT
CGCGTGCTC
AACGAGGAGC
CGCTCGGAC
CCGCGGAGCC
E'CCATCCCC
GAGGCTTTCA
[TTCTGGTG
GACTCCAATC
LAGGTGGCC
TCGATGGCAT
CCTCAGAG
CGCGCCATCA
ACCGCAGC
TCCGCCAACT
CTGTGGTC
ACCCTGGACA
GCTGCTG
GACGGTGCGT
3GGGAAGG CGTTTCTCGT 3 ATGGGCT GCTGGGGGCC 3 TCTGGTG CAGAGGCTCC 3 ,TCGTGGG TTCCCCCGG 3 CCAGGCC ACTACCTGTC 3 CCCCGGC CCAATGAGCA 3 GGTGCTG ACCACCGGCC 3: CTGGCAG GGCAGGGCAG 31 'kCGGCAG GCAGCGAGGG 3 3CGGAGC CTGGGCTCAC 3f kGTTACC ATCTGAACCT 39S 37200 37260 37320 37380 37440 37500 37560 37620 37680 37740 37800 378.60 37-920 379 38040 38100 38160 38220 38280 38340 ~8400 8460 8520 8580 8640 8760 8820 3880 ~940 '000
CTCCAGCC
CCAGTACT
GACCTCGC
CTTCGTGC,
AGCCTCTG
TTGCACGT(
ACTACATCC
TCCTGCAC.P
AGCGGGGCC
AGGGGCGCA
CTGGGAGTG
GGCTCGTGnl
CAGCTCCTA(
TAGCGTGGC(
CGCGATCCT'(
GTTTCTTCAC
TTTTCGTTT7I GGCATGATC7
ATTCTCCTGA
TGTCAGCCAG
GGGATGACAG
TTTCCTCTTC
GCGCGGTGGC
GTCAGGAGAT
AAAAATTAGC
AGGAGAATGG
CTCCAGCCTG
ACCTGGGTCT
AACTGAGGGA
GTGTGCAGTG
GTCACAAGTT
AC
'GI
GT
TTCCGCTGGT CGGCGCTGCA GGTGTCCGTG GGCCTGTACA CGTCCCTGTrG AGCGAGGAGG ACATGGTGTG GCGGACAGAG GGGCTGCTGC
CCCTGGAGGA
CGCCAGGCCG TCTGCCTCAC CCGCCACCTC ACCGCCTTCG
GCGCCAGCCT
CCAAGCCATG TCCGCTTTGT GTTTCCTGTG AGTGACCCTG
TGCTCCTGGG
GAGTCGAGGA GGGCCTGGGT GGGCTCGGCT CTATCCTGAG
AAGGCACAGC
CCTCCTGGGC CCGGCGGCTG TGTCCTCACA GGAGCCGACA
GCGGATGTAA
CATGCTGACA TGTGCTGTGT GCCTGGTGAC CTACATGGTC
ATGGCCGCCA
GCTGGACCAG TTGGATGCCA GCCGGGGCCG CGCCATCCCT
TTCTGTGGGC
TTCAAGTAC GAGATCCTCG TCAAGACAG CTGGGGCCGG
GGCTCAGGTG
GGGGTGGCA GGGCCTCCCC TGCTCTCACT GGCTGTGCTG
GTTGCACCCT
;TCTCGTCGC AGGCGTCAGA ACAAGGCAGT TTTTGCAGTG
CTGTGTGAAG
'TCATCCTGG GAATGACCTC GTGAGCACTC ACTGTCCCTpc
AGGACTAGA
TGGAAGTAG GTGCCAGTCA GTCAGGGTGG GCAGCCCACG
TTCTGCACAG
CACAAGTGA CGTGAGCATC GCTACCACG TGGGAGACTG
TGCATCCACC
CTGCATAGC TCGTCTCTCA GACGGAGGCG CCAGCACCCT
CCCCGTGGCT
ACCTCCATT TTCCTTTCAT TGGAATTGCC CTTCTGGCAT TCCCTTT'rTG 39060 39120 39180 39240 39300 39360 39420 39480 39540 39600 39660 39720 39780 39840 39900 39960 40020 40080 40140 40200 40260 40320 40380 40440 40500 40560 40620 40680 40740 40800 40860
TCTTTTTTTI
TG-GCTCACACG
GTAGCTGGGA
GCGAACTCCT
GTGTGAGCCA
ATTGCCCAGT
TCACACCTGT
CGAGACCATC
CCGGCGTAGT
CGTGAACCCG
GGTGACACAG
GTCACTGGGA
CGGGTGTGTG
CGGGTCACTG
ACAGTTCTTT
GAGACGGAGT
ICAACTTCCAG
GTACAGGTGC
GACCTCAGGT
CCACACCTGG
TCTTTCTTTT
AATCCGAGCA
CTGGCTAACG
GGCAGGCGCC
GGAGGCGGAG
CAAGACTCCA
GAGGAGGTGA(
GTGCGGGTCAC
CTCACTCTGTr
CTCCCGGGTT
ACACCACCAC
GATCCGCCTG
CTGTGTTCCC
GATTACCTAC
CTTTGGGAGG
GTGAAACCCT
T(GTAGTCCCA
'TTGCAGTGA
rCTCAAAAAA
'ACAGCTTCA
:CGGTTGTGG
TGCCCAGGCT
TAAGCCATTC
ACCCAGTTAA
CCTCGGCCTG
ATTTTTTATC
TTTTAAAAAC
TGTCGGCCGG
CCAGGCAGGC
AAATCACGGG
GTCTrCTAATA AAAAGTACAA GCTCCTTGGG
AGACTGAGGC
GCTGAGATTG
CGCCACTGCA
AAAAGAAzAAA AAATACTlGTC CGCTTTIGCAG
TCTGTGCAG
CATGACTGAG GCGTGGACAG GGAGTGCAAjT
CCCTTAAGCG
TTTTTCACCA
CCAGAGTGCT
4 4 4 4 4 GTTGTGGTGT GGACTGA~c
GTGTGCAGC
CCATGTAACT TAATCATTC
CTTGAGGTCC
ATGTTTGCAT
TGCTGTTAAT
TGGACAAATT GCAGTAACCG
CAGCTCCTT(
GCCTGTGTGG CTCCTTGAGT
GCGCACAGG(
CACGTGTTGG GCAGCAGACC
GAGCCTCCCI
CCC-ACGTGGG CATCATGCTG
TATGGGGTGC
GCGACAGAGC CTTCCACCGC
AACAGCCTGG
TGGGTAGCGT GTGGAAGATC CGAGTG7,GGC AAGCTCTGCC CCTCTGCCCC
CGCATTGGGG
CTCTGCAGGG CTCAGCCCTG
CCTGGTTCCT
GGCACGCAGC GCCTTCTTCC
TGGTCAATGA
GGGCCTGGTG GAGAAGGAGG
TGCTGGCCGC
GTGGGAGGTT GGGCAGGGTG
GTCCTGCCCC
TTCTAGGCGA CGCAGCCCTT
TTGCGCTTCC
GCTTCTTTGA CAAGCACATC
TGGCTCTCCA
3TGTATGGCAG AGCCGTGCAA
AGCCGGGACT
CAAAGCTGAG ATGACTTGCC
TGGGATGCCA
CCCCTCCCTC TTGCCTCCCA
GGTACCACG
ACAGCCGGAG CGGCCACCGG
CACCTGGACG
ACATCTTCCG GATCGCCACC
CCGCACAGCC
ACGACAACAA AGGTTTGTGC
GGACCCTGCC
CGCCCTGCGA GCCTGACCTC
CCTCCTGCGC
GCAGCACGTC ATCGTCAGGG
ACCTGCAGAC
CTGGCTTTCG GTGGAGACGG
AGGCCAACGG
GAGTAAGGCC TCGTTCCAG
GTCCCACTCC
GTGGCCTCCT GCAGTGCGGC
CCTCCCTGCC
GGCGCCTGCT GGTGGCTGAG
CTGCAGCGG
TATGGGACCG GCCGCCTCGT
AGCCGTTTCA
CTCGCATCCA
CCGTGTGGTA
AGCCTCTTCC
CACACAGCAC
GCCTGGTGTC
TGTCCCGGAG
CCTCGGCACC
ACCAAAGGCT
AGAAAGAGAG
ACGCCACGGA
GTGCACGCTT
GGC'PGAGTTA
GTAATCc~cAA~
GCCTGGGCAA
GC'rCACGCCTC GAGTTCGAGA c
TTAGCTGGGC
GAGGGCCACC
CGGGGCTGT9
TGCCCAGCCC
GGGGCATGTG
CAGCGTGGTT
CAAGGTGGGC
ATGCCTAGGG
GCTCGGGAAG
AAGACTCAGA
CGACTGCACA
:CGAAAGCA
'TCCCTTAGA
:ACTTCAGGT
ATAGCAAGA
;TAATCCCAG
~CAGCCTGGC
LTGGTGGCGG
TGCTGCGTTC
GGCGACTCTG
TCCTCATCTG
CCTACAGGTG
CCTCTTCCTG
GGTGCCGTAG
TTCCTGCCCC TCAGCCTCAC CTGTGTGGcC TCCAGGCTGA G4CCCGCTGAG
CGTCGACACA
GTCTATCCCG TCTACCTGGC
CATCCTT'TTT
TGGGGCTGGG GACCCGGGAG
TACTGGGAAT
CCGCCACTTT CCAGTGCTGC
AGCCAGAGGG
GGTCAACACA CTTGAGCAGC
CTTAGCTAGA
AGCCAGAATG GTG2AAAGAAC
GAGGGCACTT
GCAGCACGCC AGATAACTCA
GAAGAAGCAA
CTCCAGAAGA AAATCTCAGT
ACATCTATAG
AACGTCCCAG TGGCCGGGCC
GGGTGTGGTG
GGCCGAGGTG GGCGGATC'rG
AGTCCAGGAG
CCCCATCTAT ATAWAA'r'r
AAAAAGGGCC
CACTTGGGA GGCCGAGGCG
GGCAGATCAC
CAACACAATG AAACCCCGC
TCTACTACAA
GCGCCTGTAG TCCCAGCTAC
TCGAGAGGCT
GGCGCCAACG
GGGTCGGGG3C
TCCTCTCCTC
GTCGCTGTTG
CTCTTCCGGA
GGAGCCTGGG
AAAGGCGTCC
CTGACCAGGG
TGCTAAGCAG
GCACGCGGCT
GAAGTGAAGA
;CTCACGCCT
rTTGAGACCA kGGCGCGGTG
TGAGGTCAG
kTACAAAAAC 40920 40980 41040 41100 41160 41220 41280 41340 41400 41460 41520 41580 41640 41700 41760 41820 41880 41940 42000 42060 42120 42180 42240 42300 42360 42420 42480 42540 42600 42660 42720 AATGGCATGA ACCCAGGAGG CGGAGCTTGC AGTGAGCCGA GATTGCGCCA
CTGCACT~CA
TCCTGGGCAA
CGGAGCAAGA
AAGCTCAGGC TCAGAGCCTr' CCAATCCTTC
ACAGACTCTT
GAAACGGGAG GCCGCACCCC AACTAACCCC AGCAGCACAC GAGTCCGTGG CAGGAACCAG GAACGATGGC CGGGTGTGGT GGGCAGATCA CTTGAGGTCA TCTCTACTAA AAATACAAAA ACTCAGGAGG CTGAGCCAGG GAGATTGCGC CATTGCACTCC ATGAAATTTA AAACTCTGTT C ACGTCGCGAG GGGCTGCCAT C ACATTCTGTC AGATGGCACC TGTCCTGAGC AGGTCTGAGCT GGAGCCCGAG CCCCACACCT G CGTCCGTGCT GGACAGCTCC TACTG3GGGGT CCTGGGCTGG G CACTGTGCAC CTCTCAGCAG GC ATTCTAAGAG GTGGGTTCCC T GGTGCCGGGG GTGTGCGGGC TC ACCACCCGGG AGCAGGTTTGM
C]
GGGCTGTGTG TGTGACACAT CC GTGCTGGCCC! TCC!GGCGAGG G TGTGGGTAGC AATCTGCGGC AG GGAGGACGGC TTCTCCCTGG CC TGAGCTG3GGG TGAGAGGAGG
G
CCTCCTGTAC CTCTAGATGA AG.
CCAGCCCCTA CCCAAGACAC
CC
GGCTCTCGGG GGAGGGGGGA T74 GTTCCCGCCT GGGGTAGGGT GG( CTCCATCTCC AAAAAAAAAA AAAAAAAATC
CCACAAAGAA
CACGATAGAA TTTTTCTAAG CAGTTAAGGA
AGAATTAACA
TCCAAGAATA CAGCAGGTGG GAACGCTTCC
CATTCATACG
TTAGGAATGC ACACGTG3GGG TCCTCAA3AG
GTTACATGCA
AGAGAAGGCG CATAAGCCGC GACCAGGAGG
GGTTGCTCCC
AGGCCACAG TGGCTGCTCG TATTTAAGTT
AATTAAAT
GGCTCACACC TGTAATCCCA GCACTTTGGyG
AGGCGGAGGC
GGAGTTCCAA GACCAGCC']G GCCAACACAG
TGAAACCCCG
FLATTAGCTGG GCATGGTGGC AGGCACCTGT
AATCCCAGCT
%CAATCGCCT GAACGCGGGA GGTGGAGGTT
GCAGTGAGCT
-AGCCTGGGT GACAGCGAGA CTCCATCTAA
AAAAGAAAAT
~CTTAGCTGC ACCAGTCTGC TGTCAAGTGT
TCAGTGGCAC
ACGGACGGT GCAGATGTCC CATATATCCA GCATTCTAGG G3GCTCTGTC CTGTCTGCTG AGGAGGTGGC
TTCTCATCCC
GCCGCCCGC
CCGGGCAGC
TCCTCACG'I
CTGGGGGTC
CTTTGTTG
kGAGAAACC
;CGTGTCCT
"CGGAAGCC
:CCTGGTAC
ACGCTCAG
CTGGCACG
AGCCCCTA
GCTCTGAA
ACCTGATC
1CATGGAA
~CCAGAGG
~GTGGGGT
AG4GTGCTGG2
TCTCAGGCC']
CTGCCGCCTI
GACAGATGAA
TCGAGCCCTG
TGCTGGGTGT
CAGGGTGTCC
CTTGCTGACC
TTGGCCGGAC
GGGCCAGGCG
CTCGCCTGCC
GCTCACCCTT
CAGCAGGTCC
ACGGACCTGCc
AGGGGCCGG
GCCAGGG4CAG
CATCGACAG(
CCACGCTGAC
GGCGCAGCTI
GAGTGACTTG
GTGCAGGTCA
CTGTGGCTCC
GTGCGTGACT
CGCGCCACCT
CTG4CTCAGTG
GGCCATGGGC
AAATCCTTCT
GCAGCTGGGC
TTIGCCGAGGG
TCAGCAGCCT
ACTCAGGCCA
GGCTGTGGCT
TGCCTGGACT
GTGAGGACTC
GGACTCAAGA
TTTCTGGATG
CTGTGTCTGG
ATGTGGTCAC
GGACGGGGGT
GCAGTCTGGT
ACCCGTCCAT
TGGGCCCAGA
CAGCATCAGG
CCACCCTATG
GGTCAGCAGC
GTGAGTGTCC
GGCAGCCGTG
GCACCACTTC
TGACCACTGC CCTCGTCCTG
CAGGTGGCTG
42780 42840 42900 42960 43020 43080 43140 43200 43260 43320 43380 43440 43500 43560 43620 43680 43740 43800 43860 43920 43980 44040 44100 44160 44220 44280 44340 44400 44460 44520 44580
ACTTCTCTK
CCTTCCCTC
TGAGGCTGG
GTTTTTAAA
TGGGAGGCC
TGGCGAAACI
TGTAGTCCC
4
GTTGTAGTG~
GTCTCAAAAZ
TCATATAGAG
CGGAGCGCAC
TGCTCAGGTG
CAGTGCAGTG
CTGCATGCAC
AGCAACAGAA
TGACTGCTGC
TCATTGTTAA
GAAAACTTC
ATTGCTGGTT
TGACCAGCCC
GGGGATTTCT
,A ACCTCTGTTG TCTGTGGAAA GAGCCTCAqyG
GGATCCCCAG
T AGGGAGGGAG CAGGCTCATG GGGCTTTGTA GGAGCAGAAA C CGGGGCCACG TTTTTATCTT GGTCTCAGAG
CAGTGAGAAA
.T ACCCCATTTT TGGCCGGGCG CGGTGGCTCA
CACGTGTAAT
G AGGTGGCAG ATGACCTGAG GTCAGCAGTT
CGAGACCAGC
C CCGTCTCTAC TAAAAATACA AAAAATTAGC
CGGGCATGCT
k. GTTACTCGGG AGACTGAGGT AGGAGAATCG
ATTGAACCTG
k. GCCGAGATCG CGCCACTGCA CTCCAGCCTG
GGCAACAAGA
CAAAAAAJATT CCTCAATTTC TTGGTTG3TTT
TGTAACTTAT
GTTACCTTGT ATGTAGTCAC GCACATAGTC
ACGCACATGGC
*CCACGGCGTG TTCCCACGCG TGTGACCCCG GGCTCTGCCA
I
TGCTGAGGTC CACACGGCCC TGCCGTTGCA CTGCAGCTGC
C
GCATGCAGTG CAGGTGCGGT GCCCCGGAGC CACAGGCCAC
A
AGGGGCTGCG GTGTCTGGGT TTGGGTAACT
ACGCCCTGTGA
TTACCTAATG ACGCATTTCT CAGAACACAT
CCCTGGCACTA
TTTTGCATCC ACATCTAGTT TGATTTGTGT
GTTATTCCTT
GCAACCAAGA ACTAAAGAGG TATGAACTY2 CCCTGGACTC
A
TGATTTACAA AAGGCAGATA ACCATCACAT GAGGGCATCT
T'
GGCCCCAGAA
GGCTCCTGTG
TTATGGGCGG
CCCAGCACTT
CTGG3CCAACA
GGCAGGCGCC
GTAGGTGAG
GCGAAACTCC
AAPCAAATGG
AGCcGGCcG
'GCCCTCCTA
TGCAGGATT
CCACAGGGC
CATTTGCAC
AGTGGTGCG
77AGTGCTTC t.ACAAAAAG
PATGAATAA
44640 44700 44760 44820 44880 44940 45000 45060 45120 45180 45240 45300 45360 45420 45480 45540 45600
GGTTTTAAAA
CAGGTATCTC
GACGGCAGCT
ATACAGAGTA
CGGCCCAAAG
CAGACTCCGC
CCGGCTTCCT CAACCCTTGG CCGTCCCTTG
GAGGCTTGGG
CCGTGGCTCC
TCCTGGCGGC
TCCGTCTCTG
TAAAGGAATA
TCTGTTGCCC
ACGTTCAAGC
CACAGCCGGC
TTTGTCCTGG
TTTCCCAACG
GCCTCTGCCC
CCAGCGAGAG
GTTGAGGCTG
AGACTGGAGT
AGTTATCTGC
TAATTTTTTG
ACCCCTCTGC
CAGAGCAcGG
TGGCATTCCC
CCTGCTCCCT
GGTTATTTTT
GCAGTCGCAT
CTCAGCCTCC
TGTG3GT
TGGGGAAATC
CTCTGTGAAA
ATCCACACAG
CTCGGACAGT
GTCCTTCCTC
CCTTCCCTGC
TCCACTCCA
GAGTCAGACC
TATTTTTATT
GATCTCGGCT
2AAGTAGCTA
I'TAGTAGAGA
CAGGGGTAG~i
TCCAGATTCA
AGCGCGTGGC
GCTTCGGGCr
ACTGCAGCCT
GCCTGAGCCT
TGCCTCCCTA
CTGAGTCATT
TATTTTTTTG
CACTGCAAAG
AGATTACAGG
GGAGGTTTCA
-CACTACATGC
GTGCTTCCGC
CCTCACCCTC
GACCAGGTCG
CCAGCGCGTC
GCACCCTCCG
TTGGCCATTC
TGTGTTGCTA
AGATGGAGTC
TCTGCCTCCC
CGCCCGCCGC
45720 45780 45840 45900 45960 46020 46080 46140 46200 46260 46320 46380 46440 CAGGCTGGTC TTGAACTCCT GACCTCGTGA TCCACCCATC TCAGCCTCCC AAAATGCTGA
I
GATTACAG
TGTAAAGAC
CTCcTGAJAc
CCTCCGAGA
CTTTCGGTG
CGGCGTCTC
TTCTCTAGC,
TCGGTGGTC~
CTGTTGGGA
AACCTGCATC
GAGACCCAAgI
ACTGCTGTAA
GGAGAGGGAC
TCCCAGCACC
CCTG3TGACCC
TGCCTCCTGG
GGGGAGAAGA
GCCTGAACT
CCCCGGGATG
CCAGGACTGG
CACGGGCTCA
AGCTTCCCCC
TCATTCCTCG
CCATCACGGG4
TTCTCACTGG
GCTGTGACGC
TTCCTGGCcA
CTGGGTGCGGC
CGCAGGACAC1 GCAGCCTTTG
C
CAGGGCCCAG C ,C GTGAGCCACC ACGCCTGACC
AAGTTGAGGC
A GGGTCTCACT GTCTCCAACT
CCTGAGCTCA
*T GCTGGGATTA CAGGCTTGAG
ACACTGCGCC
ACAGCAAAAC AGGAAGCATT
CAGTGCAGTG
ATGGGCTGACG AGGGCGCAGG
TACGGGAGAG
G CAGTTGGTCT CGTCCTCCCC
CTCAACGTGT
GCTCTGGGAC CGGGCATATC AGcATGGTGG CTGGCTCCTG GAGACACAG
CAGATCTCTG
LTTGAAAGGCA TTCATATGr'. TCCTTGTCCA
C
GGACAGACAC ACTGGCGTCT
CTAGATTGTA
CATAGTTTGC AGGGTTGAAG GGGGGCTCAT GGGCAGCTGG TCAGGCTGTG
GGCGATGGGTT
GCAGGCGGAC GCCTGACTTC GGTGCCTGGA
G
ACTCCCACTC TCGTTTGGGG TAGGGTCTTC
C
AAGAGGCTCA AGAAACTGCC CGCCCAGGTT
A
AGGCCGGGAT GAATTCACAG CCTACCATGT
C(
CAGAGACOCT GGCGCTGCAG AGGCTGGGG
A(
GGGAACAGCC CCAGGCAGCG AGGCTGTCCA
GC
TAGGTCATTT
TTTAATTTTT
AGTGATCCTC
CTGCCTCAGC
CAGCCAAGAG
TGTCTTTTAT
TGACCCTGGG
TCAGGCCGTT
CGTCCTGAGA
GCCCGGGACT
CTTCGCTGCC
TCTGTACCTC
CCCGATGCAG TGGCACAGCC 3CCTCAGGGA GCCCTACACA AAGTTAATT
TTAGGCCATA
;AGATGCTTG T'rGGATGGTT 'GCACCCTGA GAGAcTGTGC TATCAGCAG CAAGCGGGCG TGGCTCTTG GTTCCCT(C GGCTTT'ITG TCGGGGGGAc kCATGGGCT TGGCTGCAAC CTCAGGTC
CAGCACTCCT
~CTGGGGCC
ACCCAGCCCA
,ACAGGTGT GCTTGCGTAG 46500 46560 46620 46680 46740 46800 46860 46920 46980 47040 47100 47160 47220 47280 47340 47400 47460 47520 47580 47640 47700 47760 47820 47880 47940 48000 48060 48120 48180 48240 48300
CCCCTAGCCC
TGGAGGGTC'l GCCTGCTCC7
CGGGCGTGAG
GCTGGGAGCC
GGACCCCTCT
rGGCCAAGCG
'TGTGAGCGC
kGGAAGAAGC
.CTGTGCCCC
CCTGTCCCC
.CCGGTCTCC
*TCCAATGCC
CTCCCTGTGA
GCGGAAGCGC
GGTGGCTGTG
TGTTGCGTG
ACTGAAGGTG
GAAGCCACCC
GCTGcAcccG
ACGTGTGCCC
CCGCAAGGTC
TGCCACCTCC
CTTTCCTCAc
CTCCTACCCC
CACTCCTGCC
GCTGCCTCTC
CTGCTGccGG
GCTGTGGCTG
CTCCTGTCcA
AGGGGGCTGC
CCTCCCCAGG
GATGAAGATG
CGCGTACGGC
AAGAGGCTAC
GTCTCTTGTC
ACAGGTCTG]
CCTGGTGTGC
TCTCAGGcjTC
GCAGCGCCAG
CAGGGGTAGG
TCTTGCTGGA
ACACCCTGGT
CACCCCACG
ATGGCATGCT
TCCCACCTCC
CTCTGCTTCC
CTCCCTGGCC
GGTGGGTGCG
CTTCCTGGCC
CTACAGGCCT
AGCCCTG3TAC
AGAGAGCCCG-
CTTTGCACTC
GCGGGTGAGC
CACCCATGCA
TCAATGYCTCT
CAGTCCCTGC
CACCACTGCA
CTCAGAAGGC
CCTTAGGGGT
ACGCCCCCCA
CTTGCTGCCC
TGGCCCTGAA GGCCCCTAAG
GTGGCCTG
GGCCAGGT,
CAGCCATG
CCACAGGG(
CTGCTGGCC
ATCAAGCAC
CACTGGTC'I
GTGGACATG
GTAGGCAGA
GACAACTGT,
GCCAGCAGG
GCATGGGMC
GCCATGGAT(
GGGGCCCCCI
AAGACTTAA7
CTGCTCCCTP.
AGAGATGGTA
CCAGAGCTCA
GCCCCGGCCT
CCACACGTGC
TCCTCACAAT
GCGAGGGCCC
C'rCCTGACCA
CTCTCCGGCA
CAGGAGCTGG
AACTCGGCTGG
CCAGCCAGGA
CCTCAGCCAC
GACCCGCACC
GCCGTGGGGC
CTGGCCGCCC
3 TGTCTGCCCC
AGTCTTGGGC
TCCAGCCGTT
CCCCCGTCCA
GCTATGGGGA
AGCTGCACAG
TCTTCTGGGC
GTAGATACCT
GGGAGGGACA
TGCCCCAGAA
CAGTGGGAGC
GTAGGGGCTG
CAGGTGGGGT TCCGGGCAGG CGCCCCCTGC CAGCTCACCT GCCAAAGCCC TGCTGTCACT CCAGAGCCTC
CTGGTGTACA
TGCCTCATGC
CATGGGCACG
CCGGGCCTTC CTGGCCATCA TTTrAGTTTTG CCTTTAGTCC TTGTGGCTGC TAGAACTGGA CAGGTCCGTG
TCTTGCAGTG
CATCCCCAGG ATAAGGCTGA CATGTTCCCT GGGTCTCTGG GAGCGTGTGA CTGATGCTGT GTGTGTGCTG
CCATTACCCJ
TCCTGCAGCC ACACCTGCCG GTGGGCTGGG
GCCAGGCTGA
TG CTTTTTCT GCTGGTGACC CCTACCGTCT GCAAAGCGCC CGCGGTACGG GCATCCGGTG AGCCAGACCC
TAGOGGACAT
GGTAGGTGCT GCTGGCATCA CACAGGACGG GCCCATGACA GAAGCCCAGG TCTAGCCGG 1'GGCCGCTCA CTCGAGGCGG 3GCAGGTCTG AGGAGCTCTG GCCCACGTGC TGCTGCCCTA CGTCCACGGG AACCAGTCCA
GCCCAGAGCT
CGGCTGCGGC
ITCGTTCCTCI
GCTGTGTGAC
ACGTTGTCTA
GGCGAGCTGG
GGGGCTCTGT
TCGGCCGCAG
GGCTCGGGGA
CGGGCGTCTA
GCCTGGCTCC
GGGCATGGTC
GCCTGAGCCT
ACAACAGGTG
GCCCACCCTC
GCCTGCACTG
CTCTGCGCAG
TGC-ACGCCGC
TCAGCGTCCG
AGGTGCGGCT
TGTTGAGAGA
CTTTGCCCTC
ATTGATGGCT
CCAGCAGGAA
GTGTTTCAGC
GAGGCTTCAG
CGTG3GGCCTA
CGCCAAGGAC
GGGGTGCCGG
ClrGGGGCTCC
GGAGGAGAGC
GGAGCTCCCT
ACTCCTCCGG(
CGCGGTCCCCC
GAGCCGCGCT C
CGTCACGCTGC
CCCCTTTGCG C
GCAGGAAGGT
GCAGCCTTTA
TTAACACCGC
GCTGGGAGGG
ACACTCCTGT
ACTCTACCCA
CACCAGCGAT
TTCAGCGCCG
AAGGGAGTAG
AAGGGCTGGG
TGTGCCGTGT
'GCGACCGGC
CCCTGCCCT
CCCCGCTGG
CAGCTCCCG
;TrTCCr )6 'GCCTCGAGT TGCGCCGCC
GAGCTGGCAG
GCGGAGCTCT
CGTTTCCT'rC
TTCCCTGGGG
TGGGTTTTGA
GACCCTCCCG
TACGACGTTG
GATCTGCTGG
TTCTCCAGGA
GTGCGGCACC
ATGACAGCGG
TGCGC2TrCCT =rCGGGGTG C 'CTAGG4CGGC CCTGCCACC C kGCTCACGCG
C
CCCGGCGGC C 'CAGCGCGGG C
GGCGTGCCCC
GGCATCAGCC
TCTGTATATG
TGGCGCCGAA
TGAGGCCCTG
GCCCCAGGGT
GCTGGGAGAG
GGTGAGCAGA
3TGCCGCGGC
ACGCCACCC
;GGCTACGTG
C-AGCTGCAC
~CCGCAGTCA
7TCCACAGCC
GCTCCTACT
TACAGCCCG
GGCCGCGCC
CTCTCGCTG
48360 48420 48480 48540 48600 48660 48720 48780 48840 48900 48960 49020 49080 49140 49200 49260 49320 49380 49440 49500 49560 49620 49680 49740 49800 49860 49920 49980 50040 50100 50160
CCTCTGCT
CCCGCCCC
CGCAGCGT
CCCGCCCC
CTCCTGAC(
CGAGGCCCC
GCGGTGGC
TGCCGCTGA
CGACCAGGT
GCTTTTGGT
TGCAAGCAG
GTCCGTCTT
CCTGGTGGT(
CGGGGAGGG(
CCGCAGCTCC
CTGTGCCCTC
CTGCTGTGTG
ATTCTCCGCT
CAGGACTACG
AAGGTCAAGG
CAGGGTGCAG
TTTGAAGGGA
GTGCCCCCAC
GATGGGCTGA
CTCCAAGCCG
GACGTCTACC
CCCGCCGGAT
GCCCcGGCCA
AAGAACAAGG
GTCGGAGTGG
AATGGCTGCA
CA CCTCGGTP CT CGCGGGGC CC CGCCCCCT GG CAGCGTCC :G CGCCCCCC, T ACTTGGCA(
CTGGTGGC(
~C CGCCAGTGC G GCGCAGCTC C AAGGTGAGG
ACAGATTTCT
r GGCAAGACA GCTCGGGGTAi
-GTCTTAGCTD
TGTCTTCCqX
GGACTGGGC]
TG3GGGCTCTC
GGCGCTACCP
AGATGGTGGA
AGGTGGGTAC
CCGGACTGAC
TGGAGCCGCT
CCAGCGCTGG
GCGTGAGCCT
TGTTCGAGGC
AGCTGGAGCA
CTTCCCGTGG
GTCGGGGTGT
TCCACCCCAG
ACACCGCTCA
CGTAGGTTCC
.G CCCGTCCCCG 'C CGCCCGGCAG G CAGGGCCCCG G CCCCCTCGCA :AGGTG3TGCCT
GGGAAGGGCG
TGACGGCGGC
CCCGTTTCGT
LGCTCCGCAGCC
CTGGGCCGGT
GTCCGCAGGC TATGCCGAGCcT CCTACGCCCA G AGCTCAGCTC
A
TGTGGACTCC
C
CTCTACCCTG 7X GGCACTGCGG C' CGCCTTGCGT
G
GTTGTTCCTG CC
GCCAGACC
CGTCTCACC
CCCCGGCAG
GGGCCCCGC
GCTGCTGTT
CTGGCGCGT(
CACGGCACTc 3CGCGGCCGC
~CGTGGCCTG
;GGCGCGGGG
'GCCCAGCAG
CTGCCAGAG
CTGGCCATC
GCTGTACGC
rCTGGAGCG 3TCCTGCCG rGTGGGGCG
'AGAGCTGT
;CAGGCTGC
.C GCGCCTCCCA
CCGGCAGCGT
C CTCGCAGCG3C
CCCGCCCCCT
C GTCCCGCCCC
CTCGTAGGC
C CCGGCAGCGTj
CCCTCCCGCC
c GCCGTGCACT
TCGCCGTGC
3CTGCGGCTCG GAGCcTGGGyC GTACGCCTG CCCAGCTGGyG CCGCGCCGCT
TCACTAGCTT
GCGGCCTCGC
TGCTCTTCCT
CTGGGCGCAC ACCCCAGGC CTACGCTTCG
TGCGCCAGTG
CTCCTGGGGG TCACCTTGG CTGGTAGGTG ACTGcGcG~c CCTCACTGGT
GTCGCCTTCC
TGGCCCAGGC
CCTGTTGGTG
AGTCCTGGCA
CCTGTCACCC
CCCTACGGCT
GGGGGCTGTT
ACCGGCCGC
CTGGGAGCCC
GCCTCTGGAT GGGCCTCAGC 50220 50280 50340 50400 50460 50520 50580 50640 50700 50760 50820 50880 50940 51000 51060 51120 51180 51240 51300 51360 51420 51480 51540 51600 51660 51720 51780 51840 51900 51960 52020 GGCCCAGTGG GGGGGAGAGG GACACGCCCT
GGGCTCTGCC
TGAGCCCCTG TGCCGCCCCC
GCCCTCTCGC
CTCCGATGCC
GGGCCGGCTG
CCTGCTCACC
GCAGCTGCAC
CCCATCCCCG
GGACCTGGCC
CAGCACTTAG
GTATTACTTT
CCAGAGAGCA
TCCTCCAGG
TCGCACCCCT
GGGACAAGGT
CAGTTTGACC
AGCCTGCAAG
GGCCTGCGGC
ACTGGCCCCA
TCCTCCTPrCC
CTGCCGCTGT
GGCAGGGGCA
AGTTCCG4CCA
GCTCCAAGGT
CCACCTCCTC
GTGAGCCTGA
GACTCAACCA
GCCGCAGGAG
CAGCACTGCC
GCAGGACACC
TGGCGGGGGT
CAAAGTCCGC
ATCCCCGGAT
CAGCCAGCTG
GCCCTCCCGC
.GGCCACAGAG
CAGCCGGGCG
CAGCCGCCTT
CCTTCGGGCC
GGGCCGTGGA
5 CAAGGCCGAG
GGCCAGGCAG
TCTGTCTGTC TGTGGGCTTC AGCACTTTAA AGAGGCTGTG TGGCCAACCA GGAAGGACAC AGCAGTATTG
GACGGTTTCT
GTCCTCAGGT ACAGCGGGCT
GTGCCCGGCC
AAGGCTGCTG GCTTCAGGGA
GGGTTAGCCT
ACCTCTCCAG TTCCTACCGT
ACTCCCTGCA
TTATATGGTG TTAAAATGTG
TATATTTTTG
CCTGCGCCCA GAGCTGGCCT
CCCCCAACAC
TGGCAGCCCG GCTGCTGCTT
GGATGCGAGC
TGTCTGCCAG GCACTCTCAT
CACCCCAGAGC
GTAGCAAGAG AGCAGCGCCC AGGCCTGCTG
G
GCATGTCAGA GGACCCCAGG GTGGTTAGAG
G
GGGTGGAGGA AGGTGACTGT GTGTGTGTGT
G
TATGGCCCAG GCAGCCTCA2A GGCCCTCGGA
G
CTGTGGGCAT GGCCGCTTCT AGAGCCTCGA
C,
AAGTCAATAA AAGAGCTGTC TGACTGCAAT
C'
GGACTTTATT TATTTCACTG ACAGGCAATA
C(
CCGGCCTCAC ACAAACTCGG TGAAGTCCTC C1 CACCTCATAG CCAGGTGTGG GCTCGGCTGG
AG
AGGGTGCACC AGAGGTAGGC TGGGGTTGGA
GI
GAGGCGGCTT GGGCAGTAAG TCTGGGAGGC
GT
CACAGCTTGG GCAGCCAGCA CACCCCGCTG
A
*.:GAGCGCTTGA TGTGGCGGAG CGGGCAATCC
AC
AGCGGCTATG ATGCACCTGT
GAGGCCATCTGG
*CAGGTGGCAC CTGGGCCTGT CCCACCAGCT
CA(
CGTGCAGGGC CATCTGGCGG GCCACGAAGG
GC)
*CGCTGG
GGACCCAGGG TCCCCTCCCC
AGCTCCCTTG
AGCCTCTGAG ATGCTAATTT
ATTTCCCCGA.
CCACCCCCTG GGCAGATGTC
CCCCACTGCT
GCACCGCCGC CACCCTGCCC CTAAGTTATr' CCGTCTCACT GTGTGTCTCG
TGTCAGTAAT
TATGTCACTA TTTTCACTAG
GGCTGAGGGG
CTGCTGCGCT TGGTAGGTGT
GGTGGCGTTA
VTGGCCTTG GCCGGTGCTG
GGGGCACAGC
;CCTTGTCAT CCTCCCTTGC
CCCAGGCCAG
;CATCAGGTC TGGGAAGTA
GCAGGACTAG
;AAAAGACTC CTCCTGGGGG
CTGGCTCCCA.
TGTGCGCGC GCGCACGCGC
GAGTGTGCTG
CTGGCTGTG CCTGCTTCTG
TGTACCACTT
1CCCCCCCA ACCCCCGCAC
CAAGCAGACA
rGTGCC!TCT ATGTCTGTGC
ACTGGGGTA
-GTCCAAGG CCAGTGCAGG
AGGGAGGGCC
~CCGAGGAG ATGAGGCGCT
TCCGC!TGGCC
;TCTGTGCA GGGGCT2-rGC
TATGGGACGG
'AGGCGGCT TCCTCGCAGA
TCTGAAGGCA
GGCAACCG CTC TGCCCAC
ACACCCGCCC
GGAGCCCC ATATTCCCTA
CCCGCTGGCG
TTGGAGGG GTAGATATCG
GTGGGGTTGG
3GACGTAG GCAGG GTG AGCTCACTAT 2GCCTGGA CCCACCCCCA
CTCACATTTG
~TTGCG GTCAGACACG
ATCTTGGCCA
52080 52140 52200 52260 52320 52380 52440 52500 52560 52620 52680 52740 52800 52860 52920 52980 53040 53100 53160 53220 53280 53340 53400 53460 53520 3526

INFORMATION FOR SEQ ID NO:3:
SEQUENCE CHARACTERISTICS:
LENGTH: 894 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
GGTGTGAGGG GTAGGGGCAG GGTGGGAGGT GGGCTCGCGG GTGGGCTGGG GTCATGAAGG 60
GCCTCAGGCG CTCTGCTATT GGGTTCCAAG GCTATCCTGA GAACAGGGGT GAGGGGGGAT 120
TGCCGTGGGG GGTTAAAGCC TTGTCATGTT CGCTTTCGGG AGATAAAAAC AACAGGTGGC 180
CTTTATGGAG ACGCTGCCCA GAGCCAGGTC TGTGCCAGGC TCCTGTTGGG GGTCGTCATG 240
CGGAATCCTG ACTCTGACCA TCCGAGGCAT AGGGACCGTG GAGATTTGCA TTTCACAGAT 300
GAGGAAACAG GTTTGGAGAG GTGACACGAC CTGTCCCAGG CATCACAGCC GGGATGTGCA 360
TAGCAGGGGT TTGGAACTAT GAGGTGCCCA GGACCCAGGG TTGGATTGAA AAGGGCGGAG 420
GGGACTAAGA TAAGCAGACA GTTGTCCCCA GCGCTGGGGA GAGTCTTGGG ACCAGTCTGA 480
TGCCTTGTAT TTCCCAGGCT CCAGGCTCCT CGCCGGGACA GTGTCTCCTT GGGTGCGTGC 540
TGGATCCCTG GGGGACGTGG CACATCCCCA GGCTTGCTAA ACATTGGGTG GGTTCTGGCA 600
TTTGGTTTTG TAACGTTTCT GGGTCACTCC CGCCTGTGGC CACCCTTCCT TAGGGGAGCC 660
GTGTGTCCTT GGGGCTTTGC TGGGTGGTCT CGAGGGTGGG AGAAGAATGG GTTCTCCTGG 720
ACCAATGGAG CCCGTGCCCC TCGGGGCCAC ATTGCTCCTG CGCTCCCTGA CTGCGGACGC 780
GTGTGTCTCG CGGCTGTCTC TGTGGAGATG GCCTCCTCCT GCCTGGCAAC AGCACCCACA 840
GAATTGCATC AGACCTACCC CACCCGTTGT TTGTGATGCT GTAGCTGAGG GCTC 894

INFORMATION FOR SEQ ID NO:4:
SEQUENCE CHARACTERISTICS:
LENGTH: 14060 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
NAME/KEY: CDS
LOCATION: 135..13040
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
GCTCAGCAGC AGGTCGCGGC CGCAGCCCCA TCCAGCCCGC GCCCGCCATG CCGTCCGCGG
GCCCCGCCTG AGCTGCGGCC TCCGCGCGCG GGCGGGCCTG GGGACGGCGG GGCCATGCGC 120
GCGCTGCCCT AACG ATG CCG CCC GCC C CCC GCC CGC CTG C CTG GCC 170
                Met Pro Pro Ala Ala Pro Ala Arg Leu Ala Leu Ala
                1 5
CTG GGC CTG CCC CTG TGG CTC CCC CCG CTG GCG COG GGC CCC GCG CGC 218
Leu Gly Leu Gly Leu Trp Leu Gly Ala Leu Ala Gly Gly Pro Gly Arg
20
GGC TGC GGG CCC TGC GAG CCC CCC TGC CTC TGC GC CCA GCC CCC GC 266
Gly Cys Gly Pro Cys Glu Pro Pro Cys Leu Cys Gly Pro Ala Pro Gly
35
GCC
Ala GCC TG CCC GTC Ala Cys Arg Val
AAC
Asn 50 TGC TCG CCC CC Cys Ser Gly Arg CTG COG ACG CTC Leu Arg Thr Leu CCC CC CTC CGC Pro Ala Leu Arg CCC C GACG CC Pro Ala Asp Ala CC CTA GAG Ala Leu Asp CTC TCC GAG ValgSeL His AAC CTG CTC Asn Leu Leu CTC GCA GAG Leu Ala Glu CC CTC GACGOTT Ala Leu Asp Val
CG
Gly CTC CTG C AAC CTC TCG CC Leu Leu Ala Asn Leu Ser Ala CTC OAT ATA AC Leu Asp Ile Ser AAC.AAG ATT TCT Asn Lys Ile Ser
ACC
Thr 105 TTA CAA CAA Leu Clu Clu 362 410 458 506 554 602 GGA ATA Gly Ile 110 TTT GCT AAT TTA Phe Ala Asn Leu
TTT
Phe 115 AAT TTA ACT OAA Asn Leu Ser Glu
ATA
Ile 120 AAC CTG ACT COO Asn Leu Ser Gly CCC TTT GAG TCT Pro Phe Glu Cys
GAG
Asp 130
CCC
Arg TGT GGC CTC CC Cys Gly Leu Ala CTG GTGCGAG
CCC
Val Val Gln Pro 150
TG
Trp 135 CTG CCC CCA TOG Leu Pro Arg Trp
CC
Ala GAG GAG GAG Glu Glu Gln GAG GTC Gln Val 145 GAG CCA CCC AG Glu Ala Ala Thr TCT CCT Cys Ala 155 a GCG CCT GCC Gly Pro Gly GAG ACT GCC Asp Ser Gly 175 TCA CCC ACC Ser Cly Thr 190
TC
Ser 160 CTG CCT CCC GAG Leu Ala Gly Gin
CCT
Pro 165 CTG CTT GC Leu Leu Cly ATC CCC TTG CTC le'Pro Leu Leu 170 TCT COT GAG GAG Cys Gly Glu Glu
TAT
Tyr 180 GTC CCC TCC CTG Val Ala Cys Leu GAG AAC AGC Asp Asn Ser GTG GGA GCA Val Ala Ala TCC TTT TCA OCT Ser Phe Ser Ala GCC TTC TG TTC Ala Phe Cys Phe 215 CAC GAA CCC CTG His Olu Cly Leu 650 698 746 794 842 890 CTT GAG Leu Gin 205 CCA GAG CC Pro Glu Ala TG AGC Cys Ser 210 GAG GAG Glu Gin ACC CCC GAG Thr Gly Gin
C
Gly 220 a CTC GCA 0CC CTC Leu Ala Ala Leu
TCG
Ser 225 GGC TCC Gly Trp, CTG TOT GGG Leu Cys Gly C CCC GAG Ala Ala Gin 235 GGC CCC CCC Cly Pro Pro 250 CCC TCG ACT Pro Ser Ser
CC
Ala.
240 TGG TTT CCC TG Ser Phe Ala Cys
CTC
Leu 245 TGC CTC TG TCC Ser Leu Cys Ser CCA CCT CCT Pro Pro Pro 255 GCC CCC ACC TGT Ala Pro Thr Cys
AGG
Arg 260 GGC CCC ACC CTC CTC CAG CAC GTC Cly Pro Thr Leu Leu Gin His Val 265 938 TTC CCT Phe Pro 270 GCC TCC CCA CCC Ala Ser Pro Gly ACC CTG GTG GG Thr Leu Val Gly
CCC
Pro 280 CAC GGA CCT CTG His Gly Pro Leu TCT GGC CAG CTA Ser Gly Gin Leu
GCA
Ala 290 CCC TTC CAC ATC Ala Phe His Ile GCT CC Ala Ala 295 CCG CTC CCT Pro Leu Pro
GTC
Val 300 ACT GCC ACA CGC Thr Ala Thr Arg
TG
Trp 305 GAC TTC GGA GAC Asp Phe Cly Asp
GC
Gly 310 TCC GCC GAG CTG Ser Ala Glu Val GAT CC Asp Ala 315 GCT GGG CCG Ala Gly Pro GTG ACG GCC Val Thr Ala 335 GCC TCG CAT CC Ala Ser His Arg
TAT
Tyr 325 GTG CTG CCT GGG Val Leu Pro Gly CGC TAT CAC Arg Tyr His 330 CTG CCC ACA Leu Gly Thr GTC CTG CCC CTC Val Leu Ala Leu CCC GCC Gly Ala 340 CCC TCA CC Gly Ser Ala
CTG
Leu 345 CAC GTG Asp Val 350 CAG CTG GAA CC Gin Val Glu Ala
OCA
Ala 355 CCT GCC CCC CTG Pro Ala Ala Leu
GAG
Glu 360 CTC CTC TGC CCC Leu Val Cys Pro TCG CTC CAG ACT Ser Val Gin Ser CAC GAG Asp Giu 370 AGC CTC GAC Ser Leu Asp
CTC
Leu 375 ACC ATC CAC AAC Ser Ile Gin Asn
CC
Arg 380 986 1034 1082 1130 1178 1226 1274 1322 1370 1418 1466 1514 1562 1610 GGT GGT TCA GCC Gly Cly Ser Gly
CTG
Leu 385 GAG CCC GCC TAC Ciu Ala Ala Tyr
AC
Ser 390 ATC GTG CCC CTG Ile Val Ala Leu CCC GAG Cly Ciu 395 o 9* GAG CCC CC Glu Pro Ala TTC CCT GCC Phe Pro Gly 415 CC CTG CAC CCC Ala Val His Pro
CTC
Leu 405 TGC CCC TCG CAC Cys Pro Ser Asp ACC GAG ATC Thr Giu Ile 410 AAG C CC Lys Ala Ala AAC CCC CAC TGC Asn Cly His Cys
TAC
420 CCC CTG GTG GTG Arg Leu Val Val
GAG
Glu 425 TGC CTG Trp Leu 430 CAG CCAG GAG Gin Ala Gin Glu TGT GAG GCC TCG Cys Gin Ala Trp
CC
Ala 440 CCC GCC CCC CTG Gly Ala Ala Leu
GCA
Ala 445 ATC GTG GAC ACT CCC CCC GTG GAG CC Met Val Asp Ser.Pro Ala Val Gin Arg
TTC
Phe 455 CTG GTC TCC Leu Val Ser ACC ACC TGC CTA Thr Arg Cys Leu CGG GTC Arg Val 460 CCC CTC Gly Val 475
GAC
Asp 465 GTG TCC ATC GC Vai Trp Ile Gly
TTC
Phe 470 TCG ACT GTG CAG Ser Thr Val Gin GAG GTG GC Ciu Val Gly CCA CC CCGCGAG CCC GAG Pro Ala Pro Gin Giy Clu 480 485 CCC TTC ACC CTG Ala Phe Ser Leu GAG AGC TGC Glu Ser Cys 490 CAG AAC TGG CTG Gin Asn Trp Leu 495 CCC GGG GAG CCA Pro Gly Glu Pro 500 CAC CCA GCC ACA GCC GAG CAC TGC His Pro Ala Thr Ala Glu His Cys 505 GTC CGG Val Arg 510 CCG CAC Pro His 525 CTC GGG CCC ACC Leu Gly Pro Thr
GGG
Gly 515 TGG TGT AAC ACC Trp Cys Asn Thr GAC CTG TGC TCA GCG Asp Leu Cys Ser Ala 520 AGC TAG GTC Ser Tyr Val
TG
Cys 530 GAG CTG GAG CCC Glu Leu Gin Pro
GGA
Gly 535 GOC GGA GTG GAG Gly Pro Val Gin
GAT
Asp GCC GAG AAG CTC Ala Glu Asn Leu
CTC
Leu 545 OTO GGA GCG CCC Val Gly Ala Pro
AGT
Ser 550 GOG GAC Gly Asp CTG AGG CCT Leu Thr Pro GTG GAG GTC Val Giu Val 575
CTG
Leu 560 GCA GAG CAG GAC Ala Gin Gin Asp
GGC
Gly 565 CTC TGA GCC Leu Ser Ala CTG GAG GGA CCC Leu Gin Gly Pro 555 CCG CAC GAG CCC Pro His Glu Pro 570 CGT GAA GCC TTC Arg Giu Ala Phe 585 CGG CCC GCC GAG Arg Pro Ala Gin ATG OTA TTC CG Met Val Phe Pro CTG CGT CTG AC Leu Arg Leu Ser GTC ACC AGG GCC OAA TTT Leu Thr Thr Ala Giu Phe 590
GG
Gly 595 ACC CAG GAG CTC Thr Gin Giu Leu
CGG
Arg 600 1658 1706 1754 1802 1850 1898 1-946 1994 2042 2090 2138 2186 2234 2282 CTG CGG Leu Arg 605 CTG GAG GTG Leu Gin Val
TAG
Tyr 610 COG CTC CTG AGC Arg Leu Leu Ser ACA GCA CG Thr Ala Gly 615 ACC CCG GAG Thr Pro Glu AAC GC AGC GAG Asn Cly Ser Glu GAG AGG AGG TCC Glu Ser Arg Ser CCG GAC Pro Asp 630 AAC AGG ACC Asn Arg Thr CAG CTG Gin Leu 0CC CCC GG Ala Pro Ala TG TTG CCG Cys Leu Pro 655 ATG CCA GGG Met Pro Gly GGA CC Gly Arg 645 TOG TGC CCT GGA Trp, Cys Pro Gly GCC AAC ATC Ala Asn Ile 650 CCC AAT GGC Ala Asn Oly CTG GAG GCC TCC Leu Asp Ala Ser
TGC
Cys 660 CAC CCC GAG CC His Pro Gin Ala
TG
Cys 665 TG ACG Cys Thr 670 TCA GO CCA GGG Ser Gly Pro Gly CCC GGG CCC CCC Pro Gly Ala Pro
TAT
Tyr 680 GOG CTA TGG AGA Ala Leu Trp Arg TTG CTC TTC TCG Phe Leu Phe Ser
GTT
Val 690 CCC GCG 000 CCC Pro Ala Gly Pro
CCC
Pro 695 GCG GAG TAC TCG Ala Gin Tyr Ser ACC CTC GAG GOC Thr Leu His Gly GAG GAT GTC Gin Asp Val 705 CTC ATG CTC Leu Met Leu 710 CCT GOT GAG CTC Pro Cly Asp Leu OTT GOC Val Gly 715 TTG CAG GAG Leu Gin His
GAG
Asp 720 GCT C CGCT GGG Ala Gly Pro Cly GCC GTC Ala.Leu 725' GTG CAC TG Leu His Cys TCG CCC GCT Ser Pro Ala 730 2330 CCC GGC CAC Pro Gly His 735 CCT GOT CCC CGG GCC CCG TAC CTC TCC GCC Pro Gly Pro Arg Ala Pro Tyr Leu Ser Ala 740 745 AAC 0CC TCG Asn Ala Ser TOO 0CC TGC Trp Ala Cys TCA TGG Ser Trp 750 CTG CCC CAC TTG OCA Leu Pro His Leu Pro 755 0CC CAG CTO GAG Ala Gin Leu Glu
GGC-ACT
Gly Thr 760
CCT
Pro 765 0CC TOT 0CC CTG Ala Cys Ala Leu
CG
Arg 770 CTG CTT OCA 0CC Leu Leu Ala Ala ACO GAA Thr Olu 775 CAG CTC ACC Gin Leu Thr CTO CTG GOC TTO Leu Leu Gly Leu
AGO
Arg 785 CCC AAC CCT OGA Pro Asn Pro Oly
CTO
Leu 790 COG CTG CCT Arg Leu Pro GAO GTC CG Olu Val Arg TOC AGC TTT Cys Ser Phe 815
OCA
Ala 800 GAG OTG GOC AAT Glu Val Gly Asn OTO TCC AGO CAC Val Ser Arg His 000 COC TAT Gly Arg Tyr 795 AAC CTC TCC Asn Leu Ser 810 OTC ATC TAC Val Ile Tyr GAC OTO GTC TCC Asp Val Val Ser
CCA
Pro 820 OTO OCT 000 CTG Val Ala Oly Leu
CG
Airg 825 CCT 0CC Pro Ala 830 CCC COC GAC Pro Arg Asp GOC COC Oly Arg 835 CTC TAC OTO CCC Leu Tyr Val Pro
ACC
Thr 840 AAC GOC TCA 0CC Asn Gly Ser Ala OTO CTC CAG OTO Val Leu Gin Val TCT GOT 0CC AAC Ser Oly Ala Asn 0CC Ala 855 ACO 0CC ACO OCT Thr Ala Thr Ala 2378 2426 2474 2522 2570 2618 2666 2714 2762 2810 2858 2906 2954 3002 3050 TOG CCT 000 C Trp Pro Gly Oly
AGT
Ser 865 GTC AGC 0CC COC Val Ser Ala Arg
TTT
Phe 870 GAG AAT GTC Olu Asn Val CTG OTG 0CC Leu Val' Ala CTG TTC TCA Leu Phe Ser 895 TTC OTO CCC GOC Phe Val Pro Gly CCC TOG GAG ACC Pro Trp Olu Thr TOC CCT 0CC Cys Pro Ala 875 AAC OAT ACC Asn Asp Thr 890 GAG CAC OTO 0Th His Val a. 'a a 4' OTG OTA OCA CTG Val Val Ala Leu TOG CTC AGT GAG Trp Leu Ser Olu
GG
Gly 905 GTO GAC Val Asp 910 OTG OTO OTG GAA Val Val Val Glu
AAC
Asn 915 AGC 0CC AGC COO Ser Ala Ser Arg AAC CTC AOC CTO Asn Leu Ser Leu
CG
Arg 925 OTO ACO GCG GAG Val Thr Ala Glu
GAO
Glu 930 CCC ATC TOT GGC Pro Ile Cys Gly
CTC
Leu 935 COC 0CC ACG CCC Arg Ala Thr Pro
AGC
Ser 940 CCC GAG 0CC COT Pro Olu Ala Arg
OTA
Val 945 CTG CAG GOA GTC Leu Oln Gly Val
CTA
Leu 950 OTO AGG TAC Val Arg Tyr AGC CCC OTO Ser Pro Val 955 AAC GAC AAG Asn Asp Lys 970 OTO GAG 0CC Val Giu Ala TCG GAC ATO GTC Ser Asp Met Val CGO TOO ACC ATC Arg Trp Thr Ile CAG TOO CTG Gin Ser Leu 975 ACC TTC CAG AAC Thi- Phe Gin Asn GTC TTC AAT Val Phe Asn GTC ATT Val Ile 985 TAT CAG AGO Tyr Gin Ser 3098 GCG GCG Ala Ala 990 GTC ACC Val Thr 1005 GTC TTC AAG Val Phe Lys GTG AAC TAO Vai Asn Tyr CTC TCA Leu Ser 995 AAC GTA Asn Vai 1010 CTG AOG GCC TOO Leu Thr Ala Ser ACC GTG GAG CGG Thr Val Giu Arg 1015 AAO- CAC GTG AGO AAO Asn His Val Ser Asn 1000 ATG AAC AGG ATG Met Asn Arg Met
CAG
Gin 1020 GGT CTG CAG GTO Gly Leu Gin Val TOO ACA Ser Thr 1025 GTG CCG GC Val Pro Ala GTG CTG TOO COO AAT Val Leu-Ser Pro Asn 1030 GCC ACG Ala Thr CTA GCA CTG Leu Ala Leu AOG GOG Thr Ala 1040 GGC GTG CTG Gly Val Leu GTG GAC Val Asp 1045 TOG GCC GTG Ser Ala Val GAG GTG GCC Giu Val Ala 1050 TTC CTG TGG ACC Phe Leu Ti-p Thr 1055 TTT GGG GAT Phe Gly Asp GGG GAG Gly Glu 1060 CAG GOC CTC Gin Ala Leu CAC CAG TTC CAG His Gin Phe Gin 1065 OCT CCG TAO Pro Pro Tyr 1070 AAC GAG TOO Asn Glu Ser TTC COG Phe Pro 1075 GTT CCA GAO Val Pro Asp 000 TOG Pro Ser GTG CTG GTG GAG CAC Val Leu Val Giu His 1085 AAT GTC ATG Asn Val Met 1090 CAC ACC TAO GOT GCC His Thi- Tyr Ala Ala 1095 GTG GCC CAG Val Ala Gin OCA GGT GAG Pro Gly Glu 1100 CTG ACG CAG Leu Thi- Gin 3146 3194 3242 3290 3338 3386 3434 3482 3530 3578 3626 36-74 TAC CTC CTG ACC GTG CTG GCA TOT AAT GOC TTC GAG AAC Tyr Leu Leu Thr Val Leu Ala Ser Asn Ala Phe Glu Asn 1105 1110 CAG GTG COT Gln Val Pro GTG AGC Val Ser 1120 GTG CGC GOC TCC CTO Val Arg Ala Ser Leu 1125 CCC TOO GTG Pro Ser Val GCT GTG GGT Ala Val Gly *3 p
GTG AGT GAC GGC GTC Vai Ser Asp Gly Val 1135 CTG GTG GOC GGC CGG CCC Leu Val Ala Gly Arg Pro 1140 GTC ACC TTO TAC CCG Val Thi- Phe Tyr Pro 1145 AOG TGG GAC TTO GGG Thi- Trp Asp Phe Gly 1160 CAC CCG CTG His Pro Leu 1150 COO TCG OCT Pro Ser Pro GGG GGT Gly Gly 1155 GTT CTT TAO Val Leu Tyr GAO GGC Asp Gly 1165 TOO COT GTC Ser Pro Val OTG ACC Leu Thr 1170 CAG AGO CAG Gin Ser Gin CCG GCT Pro Ala 1175 TAT GCC TOG AGG GOC ACC TAO Tyr Ala Ser Arg Gly Thr Tyr 1185 GTG AGO GGT CG GCG GOC CAG Val Ser Gly Ala Ala Ala Gin 1200 CAC GTG OGO OTG GAG His Val Arg Leu Glu 1190 GOG GAT GTG CGC GTC Ala Asp Val Arg Val 1205 GCC AAO CAC ACC Ala Asn His Thr 1180 GTC AAC AAC ACG Val Asn Asn Thr 1195 TTT GAG GAG CTO Phe Glu Glu Leu 1210 3722 3770 CGC GGA CTC AGC GTG Arg Gly Leu Ser Val 1215 GTG GTG GTC AGC GCC Val Val Val Ser Ala 1230 TTC GAC ATG GGG GAC Phe Asp Met Gly Asp 1245 GAC ATG AGC CTG GCC Asp Met Ser Leu Ala 1220 GCG GTG CAG ACG GGC Ala Val Gin Thr Gly 1235 GTG GAG CAG GGC GCC CCC Val Ciu Gin Gly Ala Pro 1225 GAC AAC- ATC ACG TGG ACC Asp Asn Ile Thr Trp Thr 1240 GGC ACC Gly Thr 1250 GTG CTG TCG Val Leu Ser CCC CCG Gly Pro 1255 GAG GCA ACA Giu Ala Thr
GTC
Val 1260 3818 3866 3914 3962 4010 4058 GAG CAT GTG TAC Glu His Val Tyr CTG CGG. OCA Leu Arg Ala 1265 CAG AAC TGC ACA Cln Asn Cys Thr 1270 CTG ACC CTG Val Thr Val GCT CC Gly Ala 1275 GCC ACC CCC Ala Ser Pro GCC GGC Ala Gly 1280 CAC CTG CCC CGG AGC CTG His Leu Ala Arg Ser Leu 1285 CAC GTC CTG GTC TTC His Val Leu Val Phe 1290 GTC CTG GAG GTG Val Leu Clu Val 1295 CTG CGC GTT Leu Axg Val CAA CCC GCC Clu Pro Ala 1300 GCC TGC ATC CCC ACG CAC Ala Cys Ile Pro Thr Gin 1305 CCT GAC GCG Pro Asp Ala 1310 CCC CTC ACG Arg Leu Thr GCC TAC CTC Ala Tyr Val 1315 ACC GGG AAC CCG Thr Gly Asn Pro 1320 CCC CAC TAC Ala His Tyr CTC TTC Leu Phe 1325 CAC TGC ACC Asp Trp Thr TTC CCC CAT Phe Cly Asp 1330 CCC TCC TCC Gly Ser Ser 133 TTC ACG CCC Phe Thr Arg 1350
AAC
Asn 5 ACC ACC CTG CCC Thr -Thr Val Arg 1340 CCC TCC CCC Cly Cys Pro CTG CC CTG Leu Ala Leu ACG CTG ACA CAC AAC Thr Val Thr His Asn 1345 GTG CTG TCC Val Leu Ser 1360 AGC CCC GTG AAC AG Ser Arg Val Asn Arg 1365 AGC CCC ACC TTC CCC Ser Cly Thr Phe Pro 1355 CC CAT TAC TTC ACC Ala His Tyr Phe Thr 1370 ACC CTG CAG CCA GAG Thr Leu Gin Pro Glu 1385 9 eb 59** z AGC ATC TGC GTG Ser Ile Cys Val 1375 GAG CCA GAG Glu Pro Clu CTC CCC AAC Val Cly Asn.
1380 4106 4154 4202 4250 4298 4346 4394 4442 4490 ACC CAG TTT Arg Gin Phe 1390 GTG CAG CTC Val Gin Leu GCC CAC Gly Asp 1395 GAG CCC TGG Clu Ala Trp CTG CTG Leu Val 1400 GCA TCT CC Ala Cys Ala TCC CCC Trp Pro 1405 CCC TTC CCC TAC CCC TAC ACC TCC GAC TTT CCC ACC GAG CAA Pro Phe Pro Tyr Arg Tyr Thr Trp Asp Phe Cly Thr Clu Clu 1410 1415 1420 CCC CCC CCC ACC CCT CCC ACC GC Ala Ala Pro Thr Arg Ala Arg Cly 1425 CAC CCA CCC TCC TAT CTT GTG ACA Asp Pro Gly Ser Tyr Leu Val Thr 1440 CCT GAG CTG ACO TTC ATC TAC CGA Pro Glu Val Thr Phe le Tyr Arg 1430 1435 GTC ACC CCC TCC AAC AAC ATC TCT Val Thr Ala Ser Asn Asn Ile Ser 1445 -1450 GCT GCC AAT GAC TCA Ala Ala Asn Asp Ser 1455 GCC CTG GTG GAG GTG Ala Leu Val Glu Val 1460 CAG GAG CCC GTG CTG GTC Gin Glu Pro Val Leu Val 1465 ACC AGC ATC Thi- Ser Ilie 1470 AAG GTC AAT Lys Val Asn GOC TCC Gly Ser 1475 CTT GGG CTG GAG -CTG Leu Gly Leu Glu Leu 1480 CAG CAG CCG Gin Gin Pro TAC CTG Tyr Leu 1485 TTC TCT GCT Phe Ser Ala GTG GGC CGT Val Gly Arg 1490 GGG CGC CCC GCC Gly Arg Pro Ala 1495 AGC TAC CTG Ser Tyr Leu
TGG
Ti-p 1500 GAT CTG GGG GAC Asp Leu Gly Asp GGT GGG Gly Gly 1505 TGG CTC GAG Ti-p Leu Giu GGT CCG GAG Gly Pro Glu 1510 GTC ACC CAC GCT Val Thi- His Ala TAC AAC AGC ACA GGT GAC TTC ACC OTT AGG GTG GCC GGC TGG AAT GAG Tyr Asn Ser Thr Gly Asp Phe Thi- Val Arg Val Ala Gly Ti-p Asn Glu 1520 1525 1530 4538 4586 4634 4682 4730 4778 4826 4874 4922 GTG AGC CGC AGC Val Ser Arg Ser 1535 GAG GCC TGG Giu Ala Ti-p CTC AAT Leu Asn 1540 GTG ACG GTG AAG CGG CGC GTG Val Thr Val Lys Arg Arg Val 1545 CGG GGG CTC Arg Gly Leu 1550 GTC GTC AAT Val Val Asn GCA AGC Ala Ser 1555 CGC ACG GTG GTG Arg Thr Val Val 156i GAG GCC GGC AGT Giu Ala Gly Ser 1575 CCC CTG AAT GGG Pro Leu Asn Gly AGC GTG Ser Val 1565 AGC TTC AGC Ser Phe Ser ACG TCG CTG Thi- Ser Leu 1570 GAT GTG CC Asp Val Arg
TAT
1580 TCC TGG GTG CTC Ser Ti-p Val Leu TGT GAC Cys Asp 1585 CGC TGC ACG Arg Cys Thi- CCC ATC Pro Ile 1590 CCT GGG OCT CCT ACC Pro Gly Gly Pro Thi- 1595 ATC TCT TAC Ile Ser Tyr ACC TTC CC Thi- Phe Arg 1600 TCC GTG GGC ACC Ser Val Gly Thr 1605 TTC AAT ATC Phe Asn le ATC GTC ACG Ile Val Thr 1610 a. q a.
TTT TGG Pro Phe Trp 1825 GCT GTG CCC Ala Val Pro 1840 GGG CAG CTG Gly Gin Leu GCC ACG Ala Thr 1830 GGC ACC AAT GTG AGC Gly Thr Asn Val Ser 1835 GGC GGC AGC AGC Gly Gly Ser Ser 1845 ACC ATG GTC TTC Thr Met Val Phe 1855 CCG GAT GCT Pro Asp Ala GGC ACC TTC Gly Thr Phe 1860 AAG COT GGC Lys Arg Gly TCC ATC CGG Ser Ilie Arg 186~ TAC AAC CTC Tyr Asn Leu 1880 CCT CAT GTC Pro His Val 1850 CTC AAT GCC Leu Asn Ala ACO GCG GAG Thr Ala Glu 5594 5642 5690 5738 5786 5834 5882 TCC AAC GCA Ser Asn Ala 1870 GTC AGC TGG Val Ser Trp GTC TCA GCC ACG Val Ser Ala Thr 1875 p GAG CCC Giu Pro 1885 ATC GTG GGC Ile Val Gly CTG GTG Leu Val 1890 CTG TGG GCC Leu Trp, Ala AGC AGC Ser Ser 1895 AAG GTG GTG Lys Val Val
GCG
Ala 1900 CCC GGG CAG CTG GTC CAT TTT CAG Pro Gly Gin Leu Val His Phe Gin 1905 ATC CTG CTG GCT Ile Leu Leu Ala 1910 GCC GGC TCA GCT Ala Gly Ser Ala 1915 GTC ACC TTC Val Thr Phe CGC CTG CAG GTC Arg Leu Gin Val 1920 GGC GGG GCC AAC CCC GAG Gly Giy Ala Asn Pro Giu 1925 GTG CTC CCC Val Leu Pro 1930 5930 GGG CCC GGT TTC TCC CAC AGC TTC CCC CGC GTC GGA GAC GAG GTG GTG Gly Pro Arg Phe Ser His Ser Phe Pro Arg Val Gly Asp His Val Val 1935 .1940 1945 AGC GTG CGG Ser Val Arg 1950 GGC AAA AAC CAC GTG Gly Lys Asn His Val 1955 AGC TGG GCC GAG GCG GAG GTG CGC Ser Trp Ala Gin Ala Gin Val Arg 1960 ATC GTG Ile Val 1965 GTG CTG GAG Val Leu Giu GGC GTG Ala Val 1970 AGT GGG CTG Ser Gly Leu GAG GTG CCC Gin Val Pro 1975 GAG CCT GOC Giu Pro Gly GAG GGC GG Gin Arg-Gly ATC GCC ACG Ile Ala Thr 1985 GGG ACT GAG Gly Thr Glu AGG AAG TTG ACA Arg Asn Phe Thr 1990 AAC TGG TG Asn Gys Cys 1980 GGG CGG GTG Ala Arg Val 1995 GTG GAG AAG Leu Gln Lys 5978 6026 6074 6122 6170 6218 62:66 TGT GG Ser Arg 2000 GTG GGG TAG Val Ala Tyr GCC TG Ala Trp, 2005 TAG TTG TG Tyr Phe Ser GTG GAG GGG GAG Vai Gin Gly Asp 2015 TGG GTG GTG Ser Leu Val ATG CTG TG Ile Leu Ser 2020 GCG CG GAG GTG ACG TAG Cly Arg Asp Val Thr Tyr 2025 AGG GGG GTG Thr Pro Val 2030 GGG GGG GGG Ala Ala Gly CG TTG GAG Leu Leu Clu 2035 ATG GAG GTG G Ile Gin Val Arg 2040 GGG TTG AAG Ala Phe Asn GGG GTG Ala Leu 2045 GG AGT GAG Gly Ser Giu AAG GG AG Asn Arg Thr 2050 GTG OTG GTG GAG Leu Val Leu Glu 2055 GTT GAG GAG Val Gin Asp
CG
Ala 2060 6314 GTG GAG TAT GTG Val Gin Tyr Val GGG GTG Ala Leu 2065 GAG AGG GGG Gin Ser Oly GGG TGG Pro Gys 2070 TTG AGG AAG Phe Thr Asn G TG Arg Ser GGG GAG TTT Ala Gin Phe GAG GGG Giu Aia 2080 GGG AGG AGG Ala Thr Ser GGG AGG GGG Pro Ser Pro 2085
GAC TGG GAG TTT His Trp Asp Phe 2095 GGG GGT GTG GGG TAG Arg Arg Val Ala Tyr 2090 GAG AGA GAT GAG GGG Asp Thr Asp Clu Pro 2105 GO OAT GGG Gly Asp Oly TGG GGA GGG GAG Ser Pro Giy Gin 2100 AGO GGG GAG A-rg Ala Glu 2110 GAG TGG TAG His Ser Tyr GTG AGG GCT Leu Arg Pro 2115 COG GAG TAG CG Gly Asp Tyr-Arg 2120 OTG GAG OTO Val Gin Val 6362 6410 6458 6506 6554 6602 6650 AAC GCC Asn Ala 2125 TCC AAG GTG OTO AGC Ser Asn Leu Val Ser 2130 TTG TTC GTG Phe Phe Val GCG GAG Ala Gin 2135 GCG AG GTG Ala Thr Val
ACC
Thr OTG GAG GTG GTG GGG TG GG Val Gin Vai Leu Ala Gys Arg 2145 GTG GAG GTG CTO ATO GGG COA Leu Gin Val Leu Met Arg Arg 2160 GAG GGG GAG GTO GA GOTO GTC GTG CG Giu Pro Glu Val Asp Val Val Leu Pro 2150 2155 TCA GAG GG AAG TAG TTG GAG GG GAG Ser Gin Arg Asn Tyr Leu Glu Ala His 2165 2170 p GTT GAC CTG CGC GAC Val Asp Leu Arg Asp 2175 TGC GTC ACC TAC GAG Cys Val Thr Tyr Gin 2180 ACT GAG TAC GGC TGG GAG Thr Glu Tyr Arg Trp Giu 2185 GGG GG. GA GCG CGT GTG Gly Arg Pro Ala Arg Val 2200 GTG TAT GGC Val Tyr Arg 2190 AGC GGG AGG TGC! GAG GGG CG Thr Ala Ser Gys Gin Arg Pro 2195 GCC GTG Ala Leu 2205 CCC GGC GTG Pro Gly Val GAG GTG Asp Val 2210 AGG CCC GCT Ser Arg Pro CGG GTG Arg Leu 2215 GTG GTG CGG Val Leu Pro
GG
Arg 2220 GTG CG GTG GGT Leu Ala Leu Pro GTG GGG Val Gly 2225 GAG TAC TGC His Tyr Gys TTT GTG Phe Val 2230 TTT GTG GTG Phe Val Val TGA TTT Ser Phe 2235 6698 6746 6794 6842 6890 6938 6986 GGG GAG AG Gly Asp Thr GGA CTG ACA Pro Leu Thr 2240 GAG AGG ATG GAG GGC AAT CTG Gin Ser Ile Gin Ala Asn Val 2245 AGG GTG GCG Thr Val Ala 2250 CCC GAG GG CG Pro Giu Arg Leu 2255 TGA GAG AGA GG Ser Asp Thr Arg 2270 GTG GGG ATG Val Pro Ile ATT GAG OCT Ile Giu Oly 2260 GG TCA TAG CG GTG TG Gly Ser Tyr Arg Val1 Trp, 2265 GAG CTG GTG GTG Asp Leu Val Leu 2275 GAT COG AGG Asp Oly Ser GAG TGG TAG GAG CG Oiu Ser Tyr Asp Pro 2280 AAG GTG Asn Leu 2281; GAG GAG GGG Olu Asp Gly GAG GAG Asp Gin 2290 AGG GGG GTG Thr Pro Leu AGT TTG Ser Phe 2295 GAG TOG GGG His Trp Ala
TGT
Cys 2300 GTG GGT TGG AGA Val Ala Ser Thr GAG AGO Gin Arg 2305 GAG GGT GGG Oiu Ala Gly GGG TOT GCG Gly Gys Ala 2310 GTG AAG TTT 000 Leu Asp Phe Gly 2315 CG GG GGG Pro Arg Gly AG AOC Ser Ser 2320 AGG GTG AGC Thr Val Thr ATT GGA Ile Pro 2325 GGG GAG GG Arg Glu Arg CTG GCG, GCT Leu Ala Ala 2330
CCC
Pro 2540 CCG GGT TTC AGO Pro Gly Phe Arg CCA CAC Pro His 2545 TTC GAG GTG Phe Glu Val GGC CTG Gly Leu 2550 GCC GTG GTG Ala Val Val GTG CAG Val Gin GAC CAG CTG Asp Gln Leu GGA GCC Gly Ala 2560 GCT OTG GTC Ala Val Val GCC CTC Ala Leu 2565 AAC AGG TCT Asn Arg Ser TTO GCC ATC Leu Ala Ile ACC CTC CCA GAG Thr Leu Pro Giu 2575 CCC AAC GGC Pro Asn Gly AGC GCA ACG Ser Ala Thr 2580 GGO CTC ACA GTC TGG CTG Gly Leu Thr Val Trp Leu 2585 7754 7802 7850 7898 7946 7994 8042 8090 CAC GGG CTC His Gly Leu 2590 ACC GCT AGT Thr Ala Ser GTG CTC CCA GGG Val Leu Pro Gly 2595 CTO CTG CGG Leu Leu Arg 2600 CAG GCC GAT Gin Ala Asp CCC CAG Pro Gin 2605 CAC GTC ATC His Val Ile GAG TAC Giu Tyr 2610 TOG TTO; CCC CTG GTC Ser Leu Ala Leu Val 2615 GTG GCG GCA GAG CCC Val Ala Ala Giu Pro 2630 ACC GTO CTG Thr Val Leu
AAC
Asn 2620 GAG TAC GAG Oiu Tyr Giu CAG CAC CGA Gin His Arg CGG GCC CTG GAC Arg Ala Leu Asp 2625 GCC CAG ATA COO Ala Gin Ile Arg 2640 AAG CAC GAG CGG Lys His Oiu Arg 2635
AAO
Lys AAC ATC Asn Ile 2645 AOG GAG ACT Thr Giu Thr CTG GTG TCC Leu Val Ser 2650 I. CTG AGG GTC CAC Leu Arg Val His 2655 ACT GTG GAT GAC ATC Thr Val Asp Asp Ile 2660 CAG CAG ATC GCT OCT GCG CTG Gin Gin Ile Ala Ala Ala Leu 2665 GCC CAG TGC Ala Gin Cys 2670 ATG GGG CCC Met Giy Pro AGC AGO Ser Arg 2675 GAG CTC GTA TGC CGC Glu Leu Val Cys Arg 2680 TCG TGC CTG Ser Cys Leu AAG CAG Lys Gin 2685 ACG CTG CAC Thr Leu His AAG CTG Lys Leu 2690 GAG GCC ATG Giu Ala Met ATG CTC Met Leu 2695 ATC CTG CAG Ile Leu Gin
OCA
Ala 2700 GAG ACC ACC OCG Glu Thr Thr Ala GGC ACC Oly Thr 2705 OTG ACG CCC Val Thr Pro ACC 0CC Thr Ala 2710 ATC GGA GAC Ile Gly Asp AGC ATC Ser Ile 2715 CTC AAC ATC Leu Asn Ile GCA CCA CAG Ala Pro Gin 2735 ACA GGA Thr Gly 2720 GAC CTC ATC Asp Leu Ile
CAC
His 2725 CTG 0CC AGC Leu Ala Ser GAG TCA CCA Giu Ser Pro TCG GAC GTO CG Ser Asp Val Arg 2730 TCT COO ATO OTG Ser Arg Met Val 2745 CCC TCA GAO CTO Pro Ser 0Th Leu OGA 0CC Oly Ala 2740 OCO TCC( Ala Ser G 2750 COC TCC C Arg Ser A 2765 AG 0CC TAC AAC ln Ala Tyr Asn CTG ACC Leu Thr 2755 TCT 0CC CTC ATO CGC Ser Ala Leu Met Arg 2760 ATC CTC ATO Ile Leu Met GC OTO CTC ~rg Val Leu AAC GAG Asn Glu 2770 GAO CCC CTO Giu Pro Leu ACO CTG Thr Leu 2775 OCG 0CC GAO Ala Oly Giu
GAG
Oiu 2780 8138 8186 8234 8282 8330 8378 8426 8474 8522 8570 8618 8666 8714 8762 8810 ATC GTO 0CC CAG Ile Val Ala Gin 0CC AAG Oly Lys 2785 COC TCO GAC Arg Ser Asp CCO COO Pro Arg 2790 AOC CTG CTG Ser Leu Leu TOO TAT Cys Tyr 2795 GGC GGC 0CC Gly Oly Ala CCA 000 Pro Gly 2800 CCT 0CC TGC Pro Gly Cys CAC TTC His Phe 2805 TCC ATC CCC Ser Ile Pro GAG OCT TTC Giu Ala Phe 2810
AGC 000 0CC CTG Ser Oly Ala Leu 2815 0CC AAC CTC Ala Asn Leu AGT GAC Ser Asp 2820 OTO OTG CAG Val Val Gin CTC ATC TTT CTG Leu Ile Phe Leu 2825 OTO GAC TCC Val Asp Ser 2830 AAT CCC TTT Asn Pro Phe CCC TTT Pro Phe 2835 GGC TAT ATC Oly Tyr Ile AGC AAC Ser Asn 2840 TAC ACC OTC Tyr Thr Val TCC ACC Ser Thr 2845 AAG OTO 0CC Lys Val Ala TCG, ATO Ser Met 2850 OCA TTC CAG Ala Phe Gin ACA CAG Thr Gin 2855 GCC 0CC 0CC Ala Oly Ala
CAG
Gin 2860 ATC CCC ATC GAG Ile Pro Ile Giu CGG CTO 0CC Arg Leu Ala 2865 TCA GAG COC 0CC ATC Ser Glu Arg Ala Ile 2870 ACC OTG AAG OTO Thr Vai Lys Val 2875 CCC AAC AAC Pro Asn Asn TCG GAC TGG 0CT 0CC CGO 0CC CAC CGC AGC TCC 0CC AAC Ser Asp Trp Ala Ala Arg Oly His Arg Ser Ser Ala Asn 2880 2885 2890 TCC GCC AAC TCC GTT GTG GTC CAG CCC CAG GCC TCC GTC GGT GCT GTG Ser Ala Asn Ser Val Val Val Gin Pro Gin Ala Ser Val Gly Ala Val 2895 2900 2905 8858 8906 GTC ACC CTG GAC Val Thr Leu Asp 2910 A-AC TAT ACG CTG Asn Tyr Thr Leu 2925 AGC A.GC A-AC CCT GCG GCC Ser Ser Asn Pro Ala Ala 2915 CTG GAC GGC CAC TA.C CTG Leu Asp Gly His Tyr Leu 2930 GOG CTG CAT CTG CAG CTC Gly Leu His .Leu Gin Leu 2920 TCT GAG GAA Ser Giu Glu 2935 CCT GAG CCC Pro Oiu Pro 2940 TAC CTG GCA GTC TP.C CTA C-AC Tyr Leu Ala Val Tyr Leu His 2945 TGC TCG GCT AGC AGG AGO ATC Cys Ser Ala Ser Arg Arg Ile 2960 TCG GAG CCC CGG Ser Glu Pro Arg 2950 CGC CCA GAG TCA Arg Pro Glu Ser 2965 CCC AAT GAG C-AC AAC Pro Asn Glu His .'Asn 2955 CTC C-AG OCT GCT GAC Leu Gin Gly Ala Asp 2970 C-AC CGO CCC TAC ACC His Arg Pro Tyr Thr 2975 GGG AGT TAC CAT CTG Gly Ser Tyr His Leu 2990 TTC TTC ATT TCC Phe Phe Ile Ser 2980 AAC CTC TCC AGC Asn Leu Ser Ser 2995 CTG TAC ACG TCC Leu Tyr Thr Ser 3010 CCG GGG AGC AGA Pro Gly Ser Arg 298 C-AC TTC CGC TGG His Phe Arg Trp, 3000 GAC CC-A GCG Asp.Pro Ala TCG GCG CTG Ser Ala Leu 8954 9002 9050 9,098 9-146 9194 9242 C-AG GTG Gin Val 3005 TCC GTO GGC Ser Val Gly CTG TGC C-AG Leu Cys Gin TAC TTC AGC Tyr Phe Ser
GAG
Glu GAG 0-AC A~TG GTG Glu -Asp Met Val TGG CGG Trp Arg 3025 ACA GAG GO Thr Giu Oly CTG. CTG CCC Leu Leu Pro 3030 CTG GAG GAG ACC Leu Olu Glu Thr TCG CCC CGC Ser Pro Arg CA.G GCC Gin Ala 3040 OTC TGC CTC Val Cys Leu ACC CGC C-AC Thr Arg His 3045 CTC ACC 0CC TTC GOC Leu Thr Ala Phe Oly 0CC AGC CTC TTC GTG -Ala Ser'Leu Phe Val 3055 CCC CC-A AGC C-AT GTC Pro Pro Ser His Val 3060 CGC TTT OTO TTT CCT GAG Arg Phe Val Phe Pro Oiu 3065 CCG -AC-A GCG Pro Thr Ala 3070 CTG OTG ACC Leu VAl Thr 3085 OAT GTA AAC Asp Val Asn TAC ATC Tyr Ile 3075 OTC ATO CTG Val Met Leu AC-A TOT Thr Cys 3080 OCT GTO TGC Ala Val Cys 9290 9338 9386 9434 9482 9530 TAC ATG GTC P.TG Tyr Met Val Met 3090 0CC 0CC ATC Ala Ala Ile CTG C-AC Leu His 3095 AAG CTG 0-AC LYS Leu Asp
CAG
Gin 3100 TTG 0-AT 0CC AGC Leu Asp Ala Ser COO GOC Arg Gly 3105 COC 0CC ATC Arg Ala Ile CCT TTC TGT Pro Phe Cys 3110 COC TTC AAG Arg Phe Lys TAC GAG P.TC Tyr Giu Ile 3120 000 C-AG CGO GGC Gly Gin Arg Oly 3115 GOC COO GGC TCA Gly Arg Oly Ser 3130 CTC OTC AAG ACA GGC TGO Leu Val Lys Thr Gly Trp 3125 GGT ACC ACG 0CC CAC GTG GOC ATC ATG CTG TAT GG GT0 GAC AGC CG Oly Thr Thr Ala His Val Gly Ile Met Leu Tyr Gly Val Asp Ser Arg 3135 3140 3145 AGC GGC CAC COG CAC Ser Gly His Arg His 3150 CTG GAC GGC Leu Asp Gly 3155 GAC AGA 0CC TTC CAC CGC AAC AGC Asp Arg Ala Phe His Arg Asri Ser 3160 CTG GAC Leu Asp 3165 ATC TTC CGG Ile Phe Arg ATC GCC ACC Ile Ala Thr 3170 CCG CAC AGC CTG Pro His Ser Leu 3175 AAG ATC CGA GTG Lys Ile Arg Val TOG CAC GAC Trp His Asp 3185 AAC AAA GOG CTC AGC Asn Lys Gly Leu Ser 3190 GGT AGC GTG TG Gly Ser Val Trp 3180 CCT 0CC TOG TTC Pro Ala Trp Phe 3195 CGC AGC GCC TTC Arg Ser Ala Phe 9578 9626 9674 9722 9770 9818 9866 9914 CTG CAG CAC Leu Gin His GTC ATC GTC Val Ile VJal 3200 AGO GAC CTG CAG ACO OCA Arg Asp Leu Gin Thr Ala 3205 TTC CTG GTC AAT GAC Phe Leu Val Asn Asp 3215 TOG CTT TCG OTO GAG Trp Leu Ser Val Giu 3220 ACG GAG 0CC AAC 000 0c Thr Glu Ala Asn Gly Gly 3225 OTO OTO GAG Leu Val Giu 3230 AAG GAG GTO Lys Giu Val CTG CC Leu Ala 3235 GCG AGC GAC OCA 0CC Ala Ser Asp Ala Ala 3240 CTT TTG COC Leu Leu Arg TTC CG Phe Arg 3245 COC CTG CTG Arg Leu Leu GTG OCT Vai Ala 3250 GAG CTG CAG Giu Leu Gin CGT GOC Arg Oly 3255 TTC TTT GAC Phe Phe Asp
AAG
Lys 3260 CAC ATC TGG CTC His Ile Trp Leu TCC ATA Ser Ile 3265 TOG GAC COO Ti-p Asp Arg CCG CCT COT Pro Pro Arg 3270 AGC COT TTC ACT Ser Arg Phe Thr COC ATC CAG Arg Ile Gin AGO 0CC ACC Arg Ala Thr 3280 TOC TOC OTT CTC CTC Cys Cys Val Leu Leu 3285 ATC TGC CTC TTC CTG Ilie Cys Leu Phe Leu 3290 66 6
6 6*66 6666 66 6 66* 6 GOC 0CC AAC 0CC Gly Ala Asn Ala 3295 GTO TOO TAC Val Trp Tyr 000 OCT Gly Ala 3300 GTT GOC OAC Val Gly Asp TCT 0CC TAC AOC Ser Ala Tyr Ser 3305 ACO 000 CAT Thr Gly His 3310 GTG TCC AGO Val Ser Arg CTG AGC Leu Ser 3315 CCG CTO AGC Pro Leu Ser OTC GAC Val Asp 3320 ACA GTC OCT Thr Val Ala 9962 10010 10058 10106 10154 10202 10250 OTT GGC Val Gly 3325 CTG GTG TCC Leu Vai Ser AOC OTO Ser Val 3330 OTT GTC TAT Val Val Tyr CCC CTC Pro Vai 3335 TAC CTG 0CC Tyr Leu Ala
ATC
le 3340 CTT TTT CTC TTC COG ATO TCC Leu Phe Leu Phe Arg Met Ser 3345 CCC ACA CCT 0CC 000 CAG CAG Pro Thr Pro Ala Oly Gin Gin 3360 COG AGC AAG GTO OCT Arg Ser Lys Val Ala 3350 OTO CTG GAC ATC GAC Vai Leu Asp Ile Asp 3365 COG AGC CCG AGC Gly Ser Pro Ser 3355 AGC TOC CTG GAC Ser Cys Leu Asp 3370 TCG TCC GTG CTG Ser Ser Val Leu 3375 GAC AGC TCC TTC CTC ACG TTC TCA GGC CTC CAC GCT Asp Ser Ser Phe Leu Thr Phe Ser Gly Leu His Ala 3380 3385 10298 GAG GCC TTT GTT Giu Ala Phe Val 3390 AAG AGT CTG GTG LYS Ser Leu Val 3405 GAC CTG CTC AGT Asp Leu Leu Ser GGA CAG ATG AAG Gly Gin Met Lys 3395 TGC TGG CCC TCC Cys Trp Pro Ser 3410 AGT GAC TTG TTT- CTG GAT GAT TCT Ser Asp Leu Phe Leu Asp Asp Ser 3400 GGC GAG GGA ACG CTC AGT TGG CCC Gly Glu Gly Thr Leu Ser Trp Pro 3415 3420 GAC CCG A~sp Pro 3425 TCC ATT GTG GGT AGC AAT CTG Ser Ie Val Gly Ser Asn Leu 3430 OCA CGG GGC CAG GCG GGC Ala Arg Gly Gin Ala Gly 3440 CAT GOG CTG GGC His Gly Leu Gly 3445 CCA GAG GAG Pro Glu Giu CGG CAG CTG Arg Gin Leu 3435 GAC GGC TTC Asp Gly Phe TCC CTG GCC AGC Ser Leu Ala Ser 3455 CCC TAC TCG Pro Tyr Ser CCT GCC Pro Ala 3460 AAA TCC TTC LYS Ser Phe TCA GCA TCA GAT Ser Ala Ser Asp GAA GAC CTG Giu Asp Leu 3470 ATC CAG CAG Ile Gin Gin GTC CTT Val Leu 3475 GCC GAG GGG Ala Glu Gly GTC AGC AGC CCA GCC Val Ser Ser Pro Ala CCT ACC Pro Thr 3485 CAA GAC ACC Gin Asp Thr CAC ATG GAA His Met Giu 3490 ACG GAC CTG CTC AGC Thr Asp Leu Leu Ser 3495 ACG CTG GCG CTG CAG Thr Leu Ala Leu Gin 3510 AGC ACT CCT GGG Ser Thr Pro Gly GAG AAG ACA GAG Glu Lys Thr Giu 3505 AGC CTG TCC Ser Leu Ser 3500 AGG CTG GGG Arg Leu Gly 3515 CCC CAG GCA Pro Gin Ala 10346 10394 10442 10490 10538 105 86 10634 10682 10730 10778 10826 10874 10922 10970 GAG CTG GGG Glu Leu Gly CCA CCC AGC Pro Pro Ser 3520 CCA GGC CTG AAC TGG Pro Gly Leu Asn Trp 3525 GAA CAG Glu Gin 3530 GCG AGG CTG TCC AGG Ala Arg Leu Ser Arg 3535 ACA GGA CTG GTG Thr Giy Leu Val 3540 GAG GGT CTG CGG AAG CGC CTG Glu Gly Leu Arg Lys Arg Leu 3545 CTG CCG GCC Leu Pro Ala 3550 TGG TGT GCC Trp 
Cys Ala TCC CTG Ser Leu 3555 GCC CAC GGG Ala His Gly CTC AGC Leu Ser CTG CTC CTG Leu Leu Leu GTG GCT Val Ala 3565 GTG GCT GTG Val Ala Val OCT GTC Ala Val 3570 TCA GGG TGG Ser Gly Trp, GTG GGT GCG AGC TTC Val Gly Ala Ser Phe
CCC
Pro CCG GGC GTG AGT GTT GCG TGG CTC Pro Gly Val Ser Val Ala Trp Leu 3585 GCC TCA TTC CTC GGC TGG GAG CCA Ala Ser Phe Leu Gly Trp Glu Pro 3600 CTG TCC AGC AGC GCC Leu Ser Ser Ser Ala 3590 CTG AAG GTC TTG CTG Leu Lys Val Leu Leu 3605 AGC TTC CTG Ser Phe Leu 3595 GAA GCC CTG Glu Ala Leu 3610 TAC TTC TCA CTG Tyr Phe Ser Leu 3615 GTG GCC AAG CGG CTG CAC CCG GAT GAA GAT GAC ACC Val Ala Lys Arg Leu His Pro Asp Glu Asp Asp Thr 3620 3625 11018 CTG GTA GAG AGC CCG GCT GTG ACC Leu Val Giu Ser Pro Ala Val Thi 3630 3635 GTA CGG CCA CCC CAC GGC TTT GCA Val. Arg Pro Pro His Gly Phe Ala 3645 3650 CGC AAG GTC AAG AGG CTA CAT GGC Arg Lys Val Lys Arg Leu His Gly 3665 ATG CTT TTT CTG CTG GTG ACC CTG Met Leu Phe Leu Leu Val Thr Leu 3680 CCT GTG AGC GCA Pro Val Ser Ala 3641 CTC TTC CTG GCC Leu Phe Leu Ala 3655 ATG CTG CGG AGC Met Leu Arg Ser 3670 CGT GTG CCC CGC Arg Val Pro Arg AAG GAA GAA Lys Giu Glu
GCC
Ala 3660 CTC CTG GTG TAC Leu Leu Val Tyr CTG GCC AGC TAT GGG GAT GCC TCA Leu Ala Ser Tyr Gly Asp Ala Ser 3685 3690 TGC CAT GGG CAC GCC TAC Cys His Gly His Ala Tyr 3695 CGT CTG CAA AGC GCC Arg Leu Gin Ser Ala 3700 ATC AAG CAG GAG CTG le Lys Gin Glu Leu 3705 GAG GAG CTC TGG CCA Glu Glu Leu Trp Pro 3720 CAC AGC CGG His Ser Arg 3710 GCC TTC CTG Ala Phe Leu GCC ATC Ala Ile 3715 ACG CGG TCT Thr Arg Ser TGG ATG Trp Met 3725 GCC CAC GTG Ala His Val CTG CTG CCC Leu Leu Pro 3730 TAC GTC CAC GGG AAC Tyr Val His Gly Asn 3735 CAG TCC AGC Gin Ser Ser S3740 11066 11114 11162 11210 11258 11306 11354 11402 11450 11498 11546 11594 11642 11690 CCA GAG CTG GGG Pro Glu Leu Gly CCC CCA CGG CTG Pro Pro Arg Leu 3745 CGG CAG GTG Arg Gln Val 3750 CGG CTG CAG Arg Leu Gin GAA GCA Glu Ala CTC TAC CCA Leu Tyr Pro GAC CCT Asp Pro 3760 CCC GGC CCC Pro Gly Pro AGG GTC Arg Val 3765 CAC ACG His Thr GGA GGC TTC AGC Gly Gly Phe Ser 3775 ACC AGC GAT Thr Ser Asp TAC GAC Tyr Asp 3780 GTT GGC TGG Val Gly Trp GCG CCG GAT Ala Pro Asp AAT GGC TCG Asn Gly Ser 3790 GGG ACG TGG Gly Thr Trp GCC TAT TCA Ala Tyr Ser 3795 TGG TCC Trp Ser 3805 TGC TCG GCC GCA Cys Ser Ala Ala 3770 GAG AGT CCT CAC Glu Ser ProHis 3785 CTG CTG GGG GCA Leu Leu Gly Ala GGC TAC GTG CAG Gly Tyr Val Gin 3820 CTG CGC TTC CTG Leu Arg Phe Leu 3835 GTG TTC CTG GAG Val Phe Leu Glu 3850 TGG GGC TC Trp Gly Ser TGT GCC GTG Cys Ala Val 3810 GAG CTG GGC CTG AGC CTG GAG GAG Glu Leu Gly Leu Ser Leu Glu Glu 3825 CAG CTG CAC AAC TOG CTG GAC AAC Gin Leu His Asn Trp Leu Asp Asn 3840 TAT GAC AGC GGO Tyr Asp Ser Gly 3815 AGC CGC GAC CGG Ser Arg Asp Arg 3830 AGO AGC CGC GCT Arg Ser Arg Ala 3845 CTC ACG CGC TAC Leu Thr Arg Tyr 3855 CGC CTC GAG TTC Arg Leu Giu Phe 3870 AGC CCG GCC GTG GGG CTG CAC GCC GCC GTC ACG CTG Ser Pro Ala Val Gly Leu Hius Ala Ala Val Thr Leu 3860 3865 CCG GCG CCC GGC Pro Ala Ala Oly 3875 CGC CCC CTG CC GCC CTC AGC GTC Arg Ala Leu Ala Ala Leu Ser Val 3880 CGC CCC Arg Pro 3885 TTT GCG CTC CCC CC Phe Ala Leu Arg Arg 3890 CTC AGC GCG Leu Ser 
Ala CCC CTC Gly Leu 3895 TCO CTO CCT Ser Leu Pro
CTG
Leu 3900 CTC ACC TCG CTG Leu Thr Ser Val TGC CTG Cys Leu 3905 CTC CTG TTC Leu Leu Phe GCC GTO Ala Val 3910 CAC TTC CC His Phe Ala OTO GCC Val Ala GAG GCC CGT ACT TOO CAC AGO GAA Glu Ala Arg Thr Trp His Arg Giu 3920 COG CCC TG Gly Arg Trp 3925 OCA GCC TGG GCG Gly Ala Trp Ala 3935 CGG TOG CTG CTG GTG GCG CTG Arg Trp Leu Leu Val Ala Leu 3940 CCC OTO CTO COG CTC Arg Val Leu Arg Leu 3930 ACC CC CCC ACO GCA Thr Ala Ala Thr Ala 3945 CCC CAC TOG ACC COT Arg Gin Trp Thr Arg CTG OTA CC Leu Val Arg 3950 CTC CCC CAC Leu Ala Gin CTO COT Leu Gly 3955 CCC OCT GAC Ala Ala Asp TTC ACT AC Phe Thr Ser 3975 TTC OTO Phe Val 3965 CCC OGC CC Arg Oly Arg CCC CCC CC Pro Arg Arg 3970
TTC
Phe GAC CAC OTO Asp Gin Val
C
Ala 11738 11786 11834 11882 11930 11978 1202 6 12074 12122 12170 12218 12266 12314 12362 12410 CAG CTO ACC TCC Gin Leu Ser Ser OCA CC Ala Ala 3985 COT GCC CTO Arg Cly Leu 0CC CC Ala Ala 3990 TCO CTG CTC Ser Leu Leu TTC CTG Phe Leu CTT TTC CTC Leu Leu Val AAC OCT CC Lys Ala Ala 4000 CAG CAG CTA CC Gin Gin Leu Arg 4005 TTC OTO CC Phe Val Arg CAG TOO TCC Cln Trp Ser p
55 5 5 @5S 55* GTC TTT OGC AAO Val Phe Cly Lys 4015 ACA TTA TOC Thr Leu Cys CCA OCT CTO Arg Ala Leu 4020 CCA GAO CTC CTO 000 CTC Pro Olu Leu Leu Oly Val 4025 ACC TTO OGC Thr Leu Oly 4030 CTO OTO OTO Leu Val Val CTC COO OTA CC Leu Cly Val Ala 4035 TAC GCC CAG Tyr Ala Gin CTC CTC Leu Leu 4045 OTO TCT TCC Val Ser Ser TOT OTO CAC Cys Val Asp 4050 CTG CCC ATC Leu Ala Ile GCC CAC CC Ala Gin Ala TCC CTC TOG AGC OTO Ser Leu Trp Ser Val 4055 CTC TTO OTO Leu Leu Val GAG TCC TG Oiu Ser Trp CTG TOC CCT 000 Leu Cys Pro Cly 4065 CAC CTO TCA CCC His Leu Ser Pro 4080 ACT COO CTC TCT ACC CTG TOT CCT CC Thr Oly Leu Ser Thr Leu Cys Pro Ala 4070 4075 CTO CTO TOT OTO COO CTC TOG OCA CTG Leu Leu Cys Val Gly Leu Trp Ala Leu 4085 4090 S.
136 CCC CTG TGC GGC Arg Leu Trp, Cly 4095 TAC GACGCCC TTG Tyr His Ala Leu 4110 GCC CTA CCC CTG GGG Ala Leu Arg Leu Gly 4100 CGT GGA GAG CTC TAC Arg Gly Giu Leu Tyr 4115 GGT GTT ATT CTC CGC TGG CGC Ala Val Ile Leu Arg Trp Arg 4105 CCC CCC GCC TGG GAG CCC CAG Arg Pro Ala Trp Clu Pro Gin 4120 GAG TAG GAG ATG Asp Tyr Ciu Met 4125 GTG GAG TTG TTC GTG Val Glu Leu Phe Leu 4130 CCC ACC CTC CC Arg Arg Leu Arg 4135 CTG TGC ATC Leu Trp Met 4140 GCC CTG AGG AAC GC AAG GAG TTG CC Gly Leu Ser Lys Val Lys Ciu Phe Arg 4145 GAG AAA Hius Lys 4150 GTC CCC TTT Val Arg Phe CAA CCC Giu Gly 4155 ATG GAG CCG GTG Met Clu Pro Leu 416( CAT GTC CCC GCA Asp Val Pro Pro 4175 TOG TCG AC GAG Ser Ser Ser Gin 4190 CCC TGT CCC TGG Pro Ser Arg Ser 0 TGG AGG Ser Arg 4165 CCC TCG AAG Gly Ser Lys GTA TGG CCC Val Ser Pro 4170 12458 12 506 12554 12602 12650 12698 12746 12794 12842 CCC AGG GGT Pro Ser Ala CCC TGG Gly Ser 4180 CAT CCC TG Asp Ala Ser GAG CCC TGG ACC His Pro Ser Thr 4185 GTG GAT CCC CTC Leu Asp Gly Leu 4195 AGG GTG AGC Ser Val Ser CTC CCC CCC GTG CCC Leu Gly Arg Leu Gly 4200 ACA AGG Thr Arg 4205 TCT GAG GCT Cys Glu Pro GAG CCC Glu Pro 4210 TGC CGG GTG Ser Arg Leu CAA CCC GTC Gin Ala Val 4215 TTC GAG CC Phe Glu Ala 4220 CTC CTG ACC GAG Leu Leu Thr Gin TTT GAG Phe Asp 4225 CGA CTG AAG Arg Leu Asn GAG CCC ACA Gin Ala Thr 4230 GAG GAG GTC TAG Giu Asp Val Tyr 4235 CAG GTG GAG Gin Leu Glu GAG GAG CG Gin Gin Leu 4240 GAG AC CTG CAA GC His Ser Leu Gin Gly 4245 CCT CCC CGA TGG CCC Arg Gly Pro Ser Pro 4260 S.
p osi, S CC CCC CCC GCA TGT TCG Ala Pro Ala Gly Ser Ser 4255 CCCC AGG AG GC CGC Arg Arg Ser Ser Arg 4250 CGC CTG CCC CGA CA Gly. Leu Arg Pro Ala 4265 GTG GAG GTG CCC ACT Val Asp Leu Ala Thr 4280 CTC CCC AGC CCC CTT CCC CCC CC Leu Pro Ser Arg Leu Ala Arg Ala 4270 4275 CCC CCC AGG AGC ACA. CCC CTT CCC Cly Pro Ser Arg Thr Pro Leu Arg 4285 4290 ACT CCC GGT Ser Arg Gly CCC AAG AAG AAG GTG Ala Lys Asri Lys Val 4295 GAG CCC AGG His Pro Ser 12890 12938 12986 13034 13090 13150 13210 13270 AGO ACT TAGTGGTGCT TGGTGCGGC CCTGGGCCGT GGAGTGCAG
TGGAGCGC
Ser Thr TCGTATTAG TTTCGCCG TGCAAGGGG CAGGCCAGG CAGAATCGGT
GGAGGTACCT
TGCCAA
GTGTGCAA
CAGGCAGC CATTCTGT CTGTCTCCCGC TTGAGGAGTT
TAAAGAGGGT
GGAGCACGA CCCTCGGCTG CGCGCTCC TTCCCAAGGA
GAGAGCGTA
TTGGACGGTT TCTAGCCTCT GAGATGCTAA TTTATTTCCC CGAGTCCTCA
GGTACAGCGG-
GCTGTGCCCG GCCCCACCCC CTGGGCAGAT GTCCCCCACT GCTAAGGCTG
CTGGCTTCAG
GGAGGGTTAG CCTGCACCGC CGCCACCCTG CCCCTAAGTT ATTACCTCTC
CAGTTCCTAC
CGTACTCCCT GCACCGTCTC ACTGTGTGTC TCGTGTCAG3T AATTTATATG
GTGTTAAAAT
GTGTATATTT TTGTATGTCA CTATTTTCAC TAGGGCTGAG GGGCCTGCGC
CCAGAGCTGG
CCTCCCCCAJA CACCTGCTGC GCTTGGTAGG TGTGGTGGCG TTATGGCAGC
CCGGCTGCTG
CTTGGATGCG AGCTTGGCCT TGGGCCGGTG CTGGGGGCAC AGCTGTCTGC
CAGGCACTCT
CATCACCCCA GAGGCCTTGT CATCCTCCCT TGCCCCAGGC CAGGTAGCAA
GAGAGCAGCG
CCCAGGCCTG CTGGCATCAG GTCTGGGC. GTAGCAGGAC TAGGCATGTC
AGAGGACCCC
AGGGTGGTTA GAGGAAAAGA CTCCTCCTGG GGGCTGGCTC CCAGGGTGGA
GGAAGGTGAC
TGTGTGTGTG TGTGTGTGCG CGCGCGCACG CGCGAGTGTG CTGTATGGCC
CAGGCAGCCT
CAAGGCCCTC GGAGCTGGCT GTGCCTGCTT CTGTGTACCA CTTCTGTGGG
CATGGCCGCT
TCTAGAGCCT CGACACCCCC CCAACCCCCG CACCAAGCAG ACAAAGTCA
TAAAAGAGCT
GTCTGACTGC
13330 13390 13450 13510 13570 13630 13690 13750 13810 13870 13930 13990 14050 14060

INFORMATION FOR SEQ ID
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 4302 amino acids
TYPE: amino acid
TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
Met Pro Pro Ala Ala Pro Ala Arg Leu Ala Leu Ala Leu
1 5 10
Leu Trp Leu Gly Ala Leu Ala Gly Gly Pro Gly Arg Gly
20 25
Cys Glu Pro Pro Cys Leu Cys Gly Pro Ala Pro Gly Ala
35 40
0055 Gly Leu Gly Cys Gly Pkh Val Asn Cys Ser Gly Arg Gly Leu Arg Thr Leu Gly Pro Ala Leu Arg 50 55 Ile Pro Ala Asp Ala Thr Ala Leu Asp Val Ser His Asn Leu Leu Arg 65 70 75 Ala Leu Asp Val Gly Leu Leu Ala Asn Leu Ser Ala Leu Ala Glu Leu 90 Asp Ile Ser Asn Asn Lys Ile Ser Thr Leu Glu Glu Gly Ile Phe Ala 100 105 110 Asn Leu Phe Asn Leu Ser Glu Ile Asn 115 120 Cys Asp Cys Gly Leu Ala Trp Leu Pro 130 135 Val Arg Val Val Gin Pro Giu Aia Ala 145 150 Leu Ala Gly Gin Pro Leu Leu Gly Ile 165 Gly Giu Giu Tyr Vai Ala Cys Leu Pro 180 185 Ala Ala Val Ser Phe Ser Ala Ala His 195 200 Ala Cys Ser Ala Phe Cys Phe Ser ThrC 210 215 Ser Glu Gin Gly Trp Cys Leu Cys Gly 1 225 230 Ser Phe Ala Cys Leu Ser Leu Cys Ser G 245 2 Pro Thr Cys Arg Gly Pro Thr Leu Leu G 260 265 Pro Gly Ala Thr Leu Val Gly Pro His G 275 280 Leu Ala Ala Phe His Ile Ala Ala Pro L4 290 295 Trp Asp Phe Gly Asp Gly Ser Ala Giu Vz 305 310 Ala Ser His Arg Tyr Val Leu Pro Gly Ar 325 33 Leu Ala Leu Gly Ala Gly Ser Ala Leu Le 340 345 Glu Ala Ala Pro Ala Ala Leu Giu Leu Va 355 360 Ser Asp Giu Ser Leu Asp Leu Ser Ile Gi1 370 375 Leu Giu Ala Ala Tyr Ser Ile Val Ala Lei 385 390 Ala Val His Pro Leu Cys Pro Ser Asp Th 405 41( Gly His Cys Tyr Arg Leu Val Val Giu Ly 420 425
L
A
T
1 31 ii .0
U
n
LI
r) ~eu Ser Gly Asn 125 rg Trp Ala Giu 140 hr Cys Ala Gly 155 ro Leu Leu Asp 70 ~P Asn Ser Ser .u Gly Leu Leu 205 y Gin Gly Leu 1 220 a Ala Gin Pro S 235 Y Pro Pro Pro P 0 i His Val Phe P 2 Pro Leu Ala S 285 Pro Val Thr A: 300 Asp Ala Ala G] 315 Tyr His Val T11 Gly Thr Asp Va 35 Cys Pro Ser Se 365 Asn Arg Gly G1: 380 Gly Giu Glu Pr4 395 Giu Ile Phe Pr Ala Ala Trp, Let 430 Pro Phe Glu Gin Pro Gly Ser Gly 175 Gly Thr 190 31n Pro Ila Ala ;er Ser 1 'ro Pro A 255 ro Ala S er Gly G la Thr A: -y Pro A: 3; ir Ala Va 335 1 Gin Va 0 r Val G1 y' Ser Gi o Ala Ar 40 3Gly AsI 415 i Gin Al Giu Gin Ser 160 Cys Val .Giu eu lia la.
er in rg La n y 0 Gin Giu Gin Cys Gin Ala Trp Ala Giy Ala Ala Leu Ala Met Val Asp 435 440 445 Ser Pro Ala Val Gin Arg Phe Leu Val Ser Arg Val Thr Arg Cys Leu 450 455 460 Asp Val Trp Ile Gly Phe Ser Thr Val Gin Gly Vai Giu Val Giy Pro 465 470 475 480 Ala Pro Gin Giy Giu Ala Phe Ser Leu Giu Ser Cys Gin Asn Trp Leu 485 490 495 Pro Gly Glu Pro 'His Pro Ala Thr Ala Glu His Cys Val Arg Leu Gly 500 505 510 Pro Thr Gly Trp Cys Asn Thr Asp Leu Cys Ser Ala Pro His Ser Tyr 515 520 525 Val Cys Giu Leu Gin Pro Gly Gly Pro Val Gin Asp Ala Giu Asn Leu 530 535 540 Leu Vai Gly Ala Pro Ser Giy Asp Leu Gin Gly Pro Leu Thr Pro Leu 545 550 555 560 Ala Gin Gin Asp Gly Leu Ser Ala Pro His Glu Pro Val Glu Val Met 565 570 575 Val Phe Pro Gly Leu Arg Leu Ser Arg Glu Ala Phe Leu Thr Thr Ala 580 585 590 Giu Phe Giy Thr Gin Giu Leu Arg Arg Pro Ala Gin Leu Arg Leu Gin 595 600 605 Val Tyr Arg Leu Leu Ser Thr Ala Gly Thr Pro Giu Asn Giy Ser Giu 610 615 620 Pro Giu Ser Arg Ser Pro Asp Asn Arg Thr Gin Leu Ala Pro Ala Cys 625 630 635 640 Met Pro Gly Gly Arg Trp Cys Pro Gly Ala Asn Ilie Cys Leu Pro Leu 645 650 655 Asp Ala Ser Cys His Pro Gin Ala Cys Ala Asn Gly Cys Thr Ser Gly 660 665 670 Pro Gly Leu Pro Gly Ala Pro Tyr Ala Leu Trp Arg Glu Phe Leu Phe *675 680 685 Ser Val Pro Ala Giy Pro Pro Ala Gin Tyr Ser Val Thr Leu His Gly :::690 695 700 Gin Asp Val Leu Met Leu Pro Gly Asp Leu Val Gly Leu Gin His Asp 705 710 715 720 Ala Gly Pro Gly Ala Leu Leu His Cys Ser Pro Ala Pro Gly His Pro C725 730 735 Gly Pro Arg Ala Pro Tyr Leu Ser Ala Asn Ala Ser Ser Tip Leu Pro 740 745 750 140 His Leu Pro Ala Gin Leu Glu Gly Thr Trp Ala Cys Pro Ala Cys Ala 755 760 765 Leu Arg Leu Leu Ala Ala Thr Giu Gin Leu Thr Val Leu Leu Gly Leu 770 775 780- Arg Pro Asn Pro Gly Leu Arg Leu Pro Gly Arg Tyr Glu Val Arg Ala 785 790 795 800 Glu Val Gly Asn Gly Val Ser Arg His Asn Leu Ser Cys Ser Phe Asp 805 810 815 Val Val Ser Pro Val Ala Gly Leu Arg Val Ile Tyr Pro Ala Pro Arg 820 825 830 Asp Gly Arg Leu Tyr Val Pro Thr Asn Gly Ser Ala 
Leu Val Leu Gin 835 840 845 Val Asp Ser Gly Ala Asn Ala Thr Ala Thr Ala Arg Trp Pro Gly Gly 850 855 860 Ser Val Ser Ala Arg Phe Glu Asn Val Cys Pro Ala Leu Val Ala Thr 865 870 875 880 Phe Val Pro Gly Cys Pro Trp Giu Thr Asn Asp Thr Leu Phe Ser Val 885 890 895 Val Ala Leu Pro Trp Leu Ser Giu Gly Giu His Val Val Asp Val Val 900 905 910 Val Giu Asn. Ser Ala Ser Arg Ala Asn Leu Ser Leu Arg Val Thr Ala 915 920 925 Giu Giu Pro Ile Cys Gly Leu Arg Ala Thr Pro Ser Pro Giu Ala Arg 930 935 940 Val Leu Gin Gly Val Leu Val Arg Tyr Ser Pro Val Val Giu Al'a Gly 945 950 955 960 Ser Asp Met Val Phe Arg Trp Thr Ile Asn Asp Lys Gin Ser Leu Thr 965 970 975 Phe Gin Asn Val Val Phe Asn Val Ile Tyr Gin Ser Ala Ala Val Phe 980 985 990 Lys Leu Ser Leu Thr Ala Ser Asn His Val Ser Asn Val Thr Val Asn 995 1000 1005 Tyr Asn Val Thr Val Glu Arg Met Asn Arg Met Gin Gly Leu Gin Val 1010 1015 1020 Ser Thr Val Pro Ala Val Leu Ser Pro Asn Ala Thr Leu Ala Leu Thr 1025 1030 1035 1040 Ala Gly Val Leu Val Asp Ser Ala Val Glu Val Ala Phe Leu Trp Thr 1045 1050 1055 Phe Gly Asp Gly Glu Gin Ala Leu His Gin Phe Gin Pro Pro Tyr Asn 1060 1065 1070
*Q Glu Ser Phe Pro Val Pro Asp Pro Ser Val Ala Gin Val Leu Val Glu 1075 1080 1085 His Asn Val Met His Thr Tyr Ala Ala Pro Gly Glu TPyr Leu Leu Thr 1090 1095 1100 Val Leu Ala Ser Asn Ala Phe Glu Asn Leu Thr Gin Gin Val Pro Val 1105 1110 1115 1120 Ser Val Arg Ala Ser Leu Pro Ser Val Ala Val Gly Val Ser Asp Gly 1125 1130 1135 Val Leu Val Ala Gly Arg Pro Val Thr Phe Tyr Pro His Pro Leu Pro 1140 1145 1150 Ser Pro Gly Gly Val Leo Tyr Thr Trp Asp Phe Gly Asp Gly Ser Pro 1155 1160 1165 Val Leu Thr Gin Ser Gin Pro Ala Ala Asn His Thr Tyr Ala Ser Arg 1170 1175 1180 Gly Thr Tyr His Val Arg Leu Giu Val Asn Asn Thr Val Ser Gly Ala 1185 1190 1195 1200 Ala Ala Gin Ala Asp Val Arg Val Phe Glu Glu Leu Arg Gly Leu Ser 1205 1210 1215 Val Asp Met Ser Leu Ala Val Glu Gin Gly Ala Pro Val Val Val Ser 1220 1225 1230 Ala Ala Val Gin Thr Gly Asp Asn Ile Thr Trp Thr Phe Asp Met Gly 1235 1240 1245 Asp Gly Thr Val Leu Ser Gly Pro Glu Ala Thr Val Giu His Val Tyr 1250 1255 1260 Leo Arg Ala Gin Asn Cys Thr Val Thr Val Gly Ala Ala Ser Pro Ala 1265 1270 1275 1280 Gly His Leo Ala Arg Ser Leo His Val Leu Val Phe Val Leu Glu Val 1285 1290 1295 *Leo Arg Val Glu Pro Ala Ala Cys Ile Pro Thr Gin Pro Asp Ala Arg 1300 1305 1310 Leo Thr Ala Tyr Val Thr Gly Asn Pro Ala His Tyr Leo Phe Asp Trp, 1315 1320 1325 Thr Phe Gly Asp Gly Ser Ser Asn Thr Thr Val Arg Gly Cys Pro Thr :::1330 1335 1340 Val Thr His Asn Phe Thr Arg Ser Gly Thr Phe Pro Leo Ala Leu Val 1345 1350 1355 1360 Leu Ser Ser Arg Val Asn Arg Ala His Tyr Phe Thr Ser Ile Cys Val 1365 1370 1375 Glu Pro Glu Val Gly Asn Val Thr Leu Gin Pro Glu Arg Gin Phe Val *1380 1385 1390 142 Gin Leu Gly Asp Glu Ala Trp Leu Val Ala Cys Ala Trp Pro Pro Phe 1395 .1400 1405 Pro Tyr Arg Tyr Thr Trp Asp Phe Gly Thr Giu Glu Ala Ala Pro Thr 1410 1415 1420 Arg Ala Arg Gly Pro Giu Val Thr Phe Ile Tyr Arg Asp Pro Gly Ser 1425 1430 1435 1440 Tyr Leu Val Thr Val Thr Ala Ser Asn Asn le Ser Ala Ala Asn Asp 1445 1450 1455 Ser Ala Leu Val Giu Val Gin Glu Pro Val Leu Val Thr Ser Ile Lys 
1460 1465 1470 Val Asn Gly Ser Leu Giy Leu Glu Leu Gin Gin Pro Tyr Leu Phe Ser 1475 1480 1485 Ala Val Gly Arg Gly Arg Pro Ala Ser Tyr Leu Trp Asp Leu Gly Asp 1490 1495 1500 Gly Gly Trp Leu Glu Gly Pro Glu Val Thr His Ala Tyr Asn Ser Thr 1505 1510 1515 1520 Gly Asp Phe Thr Val Arg Val Ala Gly Trp Asn Giu Val Ser Arg Ser 1525 1530 1535 Glu Ala Trp, Leu Asn Val Thr Val Lys A-rg Arg Val Arg Gly Leu Val 1540 1545 1550 Val Asn Ala Ser Arg Thr Val Val Pro Leu Asn Gly Ser Val Ser Phe 1555 1560 1565 Ser Thr Ser Leu Glu Ala Gly Ser Asp Val Arg Tyr Ser Trp, Val Leu 1570 1575 1580 Cys Asp Axg Cys Thr Pro Ile Pro Gly Gly Pro Thr Ile Ser Tyr Thr 1585 1590 1595 1600 Phe Arg Ser Vai Gly Thr Phe Asn Ile Ile Val Thr Ala Glu Asn Giu 1605 1610 1615 *Val Gly Ser Ala Gln Asp Ser Ile Phe Val Tyr Val Leu Gin Leu Ile **.1620 1625 1630 Giu Gly Leu Gln Val Val Gly Giy Gly Arg Tyr Phe Pro Thr Asn His 1635 1640 1645 *Thr Val Gin Leu Gin Ala Val Vai Arg Asp Gly Thr Asn Val Ser Tyr 1650 1655 1660 Ser Trp Thr Ala Trp Arg Asp Arg Gly Pro Ala Leu Ala Gly Ser Gly 1665 1670 1675 1680 Lys Gly Phe Ser Leu Thr Val Leu Glu Ala Gly Thr Tyr Hi Val Gin 1685 1690 1695 Leu Arg Ala Thr Asn Met Leu Gly Ser Ala Trp Ala Asp Cys Thr Met *1700 1705 1710 Asp Phe Val Giu Pro Val Gly Trp Leu Met Val Ala Ala Ser Pro Asn 171S 1720 1725 Pro Ala Ala Val Asn Thr Ser Val Thr Leu Ser Ala Glu Leu Ala Gly 1730 1735 174'0 Gly Ser Gly Val Val Tyr Thr Trp Ser Leu Glu Glu Gly Leu Ser Trp 1745 1750 1755 1760 Glu Thr Ser Glu Pro Phe Thr Thr His Ser Phe Pro Thr Pro Gly Leu 1765 1770 1775 His Leu Val Thr Met Thr Ala Gly Asn Pro Leu Giy Ser Ala Asn Ala 1780 1785 1790 Thr Val Glu Val Asp Val Gln Val Pro Val Ser Gly Leu Ser Ile Arg 1795 1800 1805 Ala Ser Glu Pro Gly Giy Ser Phe Val Ala Ala Gly Ser Ser Val Pro 1810 1815 1820 Phe Trp Gly Gin Leu Ala Thr Gly Thr Asn Val Ser Trp Cys Trp Ala 1825 1830 1835 1840 Val Pro Gly Gly Ser Ser Lys Arg Gly Pro His Val Thr Met Val Phe 1845 1850 1855 Pro Asp Ala Gly Thr Phe Ser Ile Arg Leu Asi Ala Ser 
Asn Ala Val 1860 1865 1870 Ser Trp Vai Ser Ala Thr Tyr Asn Leu Thr Ala Giu Giu Pro Ile Val 1875 1880 1885 Gly Leu Val Leu Trp Ala Ser Ser Lys Val Val Ala Pro Gly Gin Leu 1890 1895 1900 Val His Phe Gin Ile Leu Leu Ala Ala Gly Ser Ala Val Thr Phe Arg 1905 1910 1915 1920 Leu Gin Val Gly Gly Ala Asn Pro Glu Val Leu Pro Gly Pro Arg Phe 1925 1930 1935 Ser His Ser Phe Pro Arg Vai Gly Asp His Val Vai Ser Val Arg Gly 1940 1945 1950 Lys Asn His Val Ser Trp Ala Gin Ala Gin Val Arg Ile Val Val Leu 1955 1960 1965 *Glu Ala Val Ser Gly Leu Gin Val Pro Asn Cys Cys Glu Pro Gly Ile 1970 1975 1980 Ala Thr Gly Thr Giu Arg Asn Phe Thr Ala Arg Val Gin Arg Gly Ser 1985 1990 1-995 2000 a Arg Val Ala Tyr Ala Trp Tyr Phe Ser Leu Gin Lys Val Gin Gly Asp 2005 2010 2015 Ser Leu Val Ile Leu Ser Giy Arg Asp Val Thr Tyr Thr Pro Val Ala 2020 2025 2030 144 Ala Gly Leu Leu Glu Ile Gin Val Arg Ala Phe Asn Ala Leu Gly Ser 2035 2040 2045 Glu Asn Arg Thr Leu Val Leu Glu Val Gin Asp Ala Val Gin Tyr Val 2050 2055 206Oi Ala Leu Gin Ser Gly Pro Cys Phe Thr Asn Arg Ser Ala Gin Phe Giu 2065 2070 2075 2080 Ala Ala Thr Ser Pro Ser Pro Arg Arg Val Ala Tyr His Trp Asp Phe 2085 2090 2095 Gly Asp Gly Ser Pro Gly Gin Asp Thr Asp Giu Pro Arg Ala Giu His 2100 2105 2110 Ser Tyr Leu Arg Pro Gly Asp Tyr Arg Val Gin Val Asn Ala Ser Asn 2115 2120 2125 Leu Val Ser Phe Phe Val Ala Gin Ala Thr Val Thr Val Gin Val Leu 2130 2135 2140 Ala Cys Arg Glu Pro Giu Val Asp Val Val Leu Pro Leu Gin Val Leu 2145 2150 2155 2160 Met Arg Arg Ser Gin Arg Asn Tyr Leu Glu Ala His Val Asp Leu Arg 2165 2170 2175 Asp Cys Val Thr Tyr Gin Thr Glu Tyr Arg Trp Glu Val Tyr Arg Thr 2180 2185 2190 Ala Ser Cys Gin Arg Pro Gly Arg Pro Ala Arg Val Ala Leu Pro Gly 2195 2200 2205 Val Asp Val Ser Arg Pro Arg Leu Val Leu Pro Arg Leu Ala Leu Pro 2210 2215 2220 Val Gly His Tyr Cys Phe Val Phe Val Val Ser Phe Gly Asp Thr Pro 2225 2230 2235 2240 Leu Thr Gin Ser Ile Gin Ala Asn Val Thr Val Ala Pro Giu Arg .Leu *2245 2250 2255 Val Pro Ile Ile Glu Gly Gly Ser Tyr Arg 
Val Trp Ser Asp Thr Arg *2260 2265 2270 Asp Leu Val Leu Asp Gly Ser Giu Ser Tyr Asp Pro Asn Leu Glu Asp 2275 2280 2285 Gly Asp Gin Thr Pro Leu Ser Phe His Trp Ala Cys Val Ala Ser Thr 2290 2295 2300 Gin Arg Glu Ala Gly Gly Cys Ala Leu Asn Phe Gly Pro Arg Gly Ser 2305 2310 2315 2320 0 Ser Thr Val Thr Ile Pro Arg Glu Arg Leu Ala Ala Gly Val Giu Tyr 2325 2330 2335 Thr Phe Ser Leu Thr Val Trp Lys Ala Gly Arg Lys Giu Glu Ala Thr 2340 2345 2350 Asn Gin Thr Val Leu Ile Arg Ser Gly Arg Val Pro Ile Val Ser Leu 2355 2360 2365 Glu Cys Val Ser Cys Lys Ala Gin Ala Val Tyr Giu Val Ser Arg Ser 2370 2375 23-80 Ser Tyr Val Tyr Leu Glu Gly Arg Cys Leu Asn Cys Ser Ser Gly Ser 2385 2390 2395 2400 Lys Arg Gly Arg Trp Ala Ala Arg Thr Phe Ser Asn Lys Thr Leu Val 2405 2410 2415 Leu Asp Giu Thr Thr Thr Ser Thr Gly Ser Ala Gly Met Arg Leu Val 2420 2425 2430 Leu Arg Arg Gly Val Leu Arg Asp Gly Glu Gly Tyr Thr Phe Thr Leu 2435 2440 2445 Thr Val Leu Gly Arg Ser Gly Glu Glu Glu Gly Cys Ala Ser Ile Arg 2450 2455 -2460 Leu Ser Pro Asn Arg Pro Pro Leu Gly Gly Ser Cys Arg Leu Phe Pro 2465 2470 2475 2480 Leu Gly Ala Val His Ala Leu Thr Thr Lys Val His Phe Giu Cys Thr 2485 2490 2495 Gly Trp, His Asp Ala Giu Asp Ala Gly Ala Pro Leu Val Tyr Ala Leu 2500 2505 2510 Leu Leu Arg Arg Cys Arg Gin Gly His Cys Glu Giu Phe Cys Val Tyr 2515 2520 2525 Lys Gly Ser Leu Ser Ser Tyr Gly Ala Val Leu Pro Pro Gly Phe Arg 2530 2535 2540 Pro His Phe Giu Val Gly Leu Ala Val Val Val Gin Asp Gin Leu Gly 2545 2550 2555 2560 Ala Ala Val Val Ala Leu Asn Arg Ser Leu Ala Ile Thr Leu Pro Giu 2565 2570 2575 Pro Asn Gly Ser Ala Thr Gly Leu Thr Val Tx-p Leu His Gly Leu Thr 2580 2585 2590 a, Ala Ser Val Leu Pro Gly Leu Leu Arg Gin Ala Asp Pro Gin His Val 2595 2600 2605 S le Giu Tyr Ser Leu Ala Leu Val Thr Val Leu Asn Giu Tyr Glu Arg *2610 2615 2620 Ala Leu Asp Val Ala Ala Giu Pro Lys His Giu Arg Gin His Arg Ala 2625 2630 2635 2640 Gin Ile Arg Lys Asn Ile Thr 0Th Thr Leu Val Ser Leu Arg Val His 2645 2650 2655 Thr Val Asp Asp Ile Gin Gin 
Ile Ala Ala Ala Leu Ala Gin Cys Met 2660 2665 2670 a a. 146 Gly Pro Ser Arg Glu Leu Val Cys Arg Ser Cys Leu Lys Gin Thr Leu 2675 2680 2685 His Lys Leu Glu Ala Met Met Leu Ile Leu Gin Ala Giu Thr Thr Ala 2690 2695 2700- Gly Thr Val Thr Pro Thr Ala Ile Gly Asp Ser Ile Leu Asn Ile Thr 2705 2710 .2715 2720 Gly Asp Leu Ile His Leu Ala Ser Ser Asp Val Arg Ala Pro Gin Pro 2725 .2730 2735 Ser Glu Leu Gly Ala Glu Ser Pro Ser Arg Met Val Ala Ser Gin Ala 2740 2745 2750 Tyr Asn Leu Thr Ser Ala Leu Met Arg Ile Leu Met Arg Ser Arg Val 2755 2760 2765 Leu Asn Giu Giu Pro- Leu Thr Leu Ala Gly Glu Giu Ile Val Ala Gin 2770 2775 2780 Gly Lys Arg Ser Asp Pro Arg Ser Leu Leu Cys Tyr Gly Gly Ala Pro 2785 2790 2795 2800 Gly Pro Gly Cys His Phe Ser Ile Pro Glu Ala Phe Ser Gly Ala Leu 2805 2810 2815 Ala Asn Leu Ser Asp Val Val Gin Leu Ile Phe Leu Val Asp Ser Asn 2820 2825 2830 Pro Phe Pro Phe Gly Tyr Ile Ser Asn Tyr Thr Val Ser Thr Lys Val 2835 2840 2845 Ala Ser Met Ala Phe Gin Thr Gin Ala Gly Ala Gin Ile Pro Ile Glu 2850 2855 2860 Arg Leu Ala Ser Glu Arg Ala Ile Thr Val Lys Val Pro Asn Asn Ser 2865 2870 2875 2880 Asp Trp Ala Ala Arg Gly His Arg Ser Ser Ala Asn Ser Ala Asn Ser 2885 2890 2895 Val Val Val Gin Pro Gin Ala Ser Val Gly Ala Val Val Thr Leu Asp 2900 2905 2910 Ser Ser Asn Pro Ala Ala Gly Leu His Leu Gin Leu Asn Tyr Thr Leu *2915 2920 2925 Leu Asp Gly His Tyr Leu Ser Glu Glu Pro Glu Pro Tyr Leu Ala Val 2930 2935 2940 Tyr Leu His Ser Giu Pro Arg Pro Asn Glu His Asn Cys Ser Ala Ser 2945 2950 2955 2960 Arg Arg Ile Arg Pro Glu Ser Leu Gin Gly Ala Asp His Arg Pro Tyr 2965 2970 2975 Thr Phe Phe Ile Ser Pro Gly Ser Arg Asp Pro Ala Gly Ser Tyr His 2980 2985 2990 1I47 Leu Asn Leu Ser Ser His Phe Arg Trp, Ser Ala Leu Gin Val Ser Val 2995 3000 3005 3010 *e y Tr r Leu Cys Gin Tyr Phe Ser Glu Glu Asp met Val 3003015 302'0 Trp Arg Thr Giu Gly Leu Leu Pro Leu Giu Glu Thr Ser Pro Arg Gin 3025 3030 3035 3040 Ala Val Cys Leu Thr Arg His Leu Thr Ala Phe Gly Ala Ser Leu Phe 3045 3050 3055 Val Pro Pro 
Ser His Val Arg Phe Val Phe Pro Giu Pro Thr Ala Asp 3060 3065 3070 Val Asn Tyr Ile Val Met Leu Thr Cys Ala Val Cys Leu Val Thr Tyr 3075 3080 3085 Met Val Met Ala Ala Ile Leu His Lys Leu Asp Gin Leu*Asp Ala Ser 3090 3095 3100 Axg Gly Arg Ala Ile Pro Phe Cys Gly Gin Arg Gly Arg Phe Lys Tyr 3105 3110 3115 3120 Giu Ile Leu Val Lys Thr Gly Trp Gly Arg Gly Ser Gly Thr Thr Ala 3125 3130 3135 His Val Giy Ile Met Leu Tyr Gly Val Asp Ser Arg Ser Gly His Arg 3140 3145 3150 His Leu Asp Gly Asp Arg Ala Phe His Arg Asn Ser Leu Asp Ile Phe 3155 3160 3165 Arg Ile Ala Thr Pro His Ser Leu Gly Ser 'Val Trp Lys Ile Arg Val 3170 3175 3180 Trp His Asp Asn Lys Gly Leu Ser Pro Ala Trp, Phe Leu Gin His Val 3185 3190 3195 3200 Ile Val Arg Asp Leu Gin Thr Ala Arg Ser Ala Phe Phe Leu Vai Asn 3205 3210 31 Asp Trp Leu Ser Val Glu Thr Glu Ala Asn Gly Gly Leu Val Giu Lys 3220 3225 3230 *Giu Val Leu Ala Ala Ser Asp Ala Ala Leu Leu Arg Phe Arg Arg Leu 3235 3240 3245 Leu Val- Ala Glu Leu Gin Arg Gly Phe Phe Asp 'Lys His Ile Trp Leu 325*0 3255 3260 Ser Ile Trp Asp Arg Pro Pro Arg Ser Arg Phe Thr Arg Ile Gin Arg 3265 3270 3275 3280 0 Ala Thr Cys Cys Val Leu Leu Ile Cys Leu Phe Leu Gly Ala Asn Ala 3285 3290 3295 Val Trp Tyr Gly Ala Val Gly Asp Ser Ala Tyr Ser Thr Gly His Val 3300 3305 3310 0Q Ser Arg Leu Ser Pro Leu Ser Val Asp Thr Val Ala Val Gly Leu Val 3315 3320 3325 Ser Ser Val Val Val Tyr Pro Val Tyr Leu Ala Ile Leu Phe Leu Phe 3330 3335 3340 Arg Met Ser Arg Ser Lys Val Ala Gly Ser Pro Ser Pro Thr Pro Ala 3345 3350 3355 3360 Gly Gin Gin Val Leu Asp Ile Asp Ser Cys Leu Asp Ser Ser Val Leu 3365 3370 3375 Asp Ser Ser Phe Leu Thr Phe Ser Gly Leu His Ala Glu Ala Phe Val 3380 3385 3390 Gly. 
Gin Met Lys Ser Asp Leu Phe Leu Asp Asp Ser Lys Ser Leu Val .3395 3400 3405 Cys Trp Pro Ser Gly Glu Gly Thr Leu Ser Trp Pro Asp Leu Leu Ser 3410 3415 3420 Asp Pro Ser Ile Val Gly Ser Asn Leu Arg Gin Leu Ala Arg Gly Gin 3425 3430 3435 3440 Ala Giy His Giy Leu Giy Pro Glu Giu Asp Gly Phe Ser Leu Ala Ser 3445 3450 3455 Pro Tyr Ser Pro Ala Lys Ser Phe Ser Ala Ser Asp Giu Asp Leu Ile 3460 3465 3470 Gin Gin Val Leu Ala Glu Gly Val Ser Ser Pro Ala Pro Thr Gin Asp 3475 3480 3485 Thr His Met Giu Thr Asp Leu Leu Ser Ser Leu Ser Ser Thr Pro Gly 3490 3495 3500 Glu Lys Thr Glu Thr Leu Ala Leu Gin Arg Leu Gly Giu Leu Gly Pro 3505 3510 3515 3520 Pro Ser Pro Gly Leu Asn Trp Glu Gin Pro Gin Ala Ala Arg Leu Ser *3525 3530 3535 Arg Thr Gly Leu Val Giu Gly Leu Arg Lys Arg Leu Leu Pro Ala Trp *3540 3545 3550 *Cys Ala Ser Leu Ala His Gly Leu Ser Leu Leu Leu Val Ala Val Ala 3555 3560 3565 Val Ala Val Ser Gly Trp Val Gly Ala Ser Phe Pro Pro Gly Val Ser 3570 3575 3580 Val Ala Trp Leu Leu Ser Ser Ser Ala Ser Phe Leu Ala Ser Phe Leu 3585 3590 3595 3600 Gly Trp Glu Pro Leu Lys Val Leu Leu Giu Ala Leu Tyr Phe Ser Leu 3605 3610 3615 Val Ala Lys Arg Leu His Pro Asp Glu Asp Asp Thr Leu Val Glu Ser 3620 3625 3630 Pro Ala Val Thr Pro Val Ser Ala Arg Val Pro Arg Val Arg Pro Pro 3635 3640 3645 His Gly Phe Ala Leu Phe Leu Ala Lys Glu Glu Ala.Arg Lys Val Lys 3650 3655 3660 Arg Leu His Gly Met Leu Arg Ser Leu Leu Val TPyr Met Leu Phe Leu 3665 3670 3675 3680 Leu Val Thr Leu Leu Ala Ser Tyr Gly Asp Ala Ser Cys His Gly His 3685 3690 3695 Ala T yr Arg Leu Gin Ser Ala Ile Lys Gin Glu Leu His Ser Arg Ala 3700 3705 3710 Phe Leu Ala Ile Thr Arg Ser Giu Glu Leu Trp Pro Trp Met Ala His 3715 3720 3725 Val Leu Leu Pro Tyr Val His Gly Asn Gin Ser Ser Pro Giu Leu Gly 3730 3735 3740 Pro Pro.Arg Leu Arg Gin Val Arg Leu Gin Glu Aia Leu Tyr Pro Asp 3745 3750 3755 3760 Pro Pro Gly Pro Arg Val His Thr Cys Ser Ala Ala Gly Gly Phe Ser 3765 3770 3775 Thr Ser Asp Tyr Asp Val Gly Trp Glu Ser Pro His Asn Gly Ser Gly 3780 3785 
3790 Thr Trp, Ala Tyr Ser Ala Pro Asp Leu Leu Gly Ala Trp, Ser Trp Gly 3795 3800 3805 Ser Cys Ala Val Tyr Asp Ser Gly Gly Tyr Val Gin Glu- Leu Gly Leu 3810 3815 3820 Ser Leu Giu Glu Ser Arg Asp Arg Leu Arg Phe Leu Gin Leu His Asn 3825 3830 3835 3840 Trp Leu Asp Asn Arg Ser Arg Ala Val Phe Leu Glu Leu Thr Arg Tyr 3845 3850 3855 Ser Pro Ala Val Giy Leu His Ala Ala Val Thr Leu A-rg Leu Giu Phe *3860 3865 3870 Pro Ala Ala Gly Arg Ala Leu Ala Ala Leu Ser Val Arg Pro Phe Ala 3875 3880 3885 Leu Arg..Arg Leu Ser Ala Gly Leu Ser Leu Pro Leu Leu Thr Ser Val 3890 3895 3900 Cys Leu Leu Leu Phe Ala Val His Phe Ala Val Ala Giu Ala Arg Thr 3905 3910 3915 3920 a Trp His Arg Glu Gly Arg Trp, Arg Val Leu Arg Leu Gly Ala Trp Ala 3925 3930 3935 Arg Trp Leu Leu Val Ala Leu Thr Ala Ala Thr Ala Leu Val Arg Leu *3940 3945 3950 Ala Gin Leu Gly Ala Ala Asp Arg Gin Trp Thr Arg Phe Val Arg Gly 3955 3960 3965 Arg Pro Arg Arg Phe Thr Ser Phe Asp Gin Val Ala Gin Leu Ser Ser 3970 3975 3986 Ala Ala Arg Gly Leu Ala Ala Ser Leu Leu Phe Leu Leu Leu Val Lys 3985 3990 3995 4000 Ala Ala Gin Gin Leu Arg Phe Val Arg Gin Trp.Ser Val Phe Gly Lys 4005 4010 4015 Thr Leu Cys Arg Ala Leu Pro Giu Leu Leu Gly Val Thr Leu Gly Leu 4020 4025 4030 Val Val Leu Gly Val Ala Tyr Ala Gin Leu Ala Ile Leu Leu Val Ser 4035 4040 4045 Ser Cys Val Asp Ser Leu Trp Ser Val Ala Gin Ala Leu Leu Val Leu 4050 4055 4060 Cys Pro Gly Thr Gly Leu Ser Thr Leu Cys Pro Ala Giu Ser Trp His 4065 4070 4075 4080 Leu Ser Pro Leu Leu Cys Val Gly Leu Trp Ala Leu Arg Leu Trp Gly 4085 4090 4095 Ala Leu Arg Leu Gly Ala Val Ile Leu Arg Trp Arg Tyr His Ala Leu 4100 4105 4110 Arg Gly Glu Leu Tyr Arg Pro Ala Trp Glu Pro Gin Asp Tyr Giu Met 4115 4120 4125 Val Giu Leu Phe Leu Arg Arg Leu Arg Leu Trp Met Gly Leu Ser Lys 4130 4135 4140 Val Lys Giu Phe Arg His Lys Val Arg Phe Giu Gly Met Glu Pro Leu 4145 4150 4155 4160 Pro Ser Arg Ser Ser Arg Gly Ser Lys Val Ser Pro Asp Val Pro Pro *4165 4170 4175 Pro Ser Ala Gly Ser Asp Ala Ser His Pro Ser Thr Ser Ser Ser 
Gin *4180 4185 4190 *Leu Asp Gly Leu Ser Val Ser Leu Gly Arg Leu Gly Thr Arg Cys Glu 4195 4200 4205 Pro.Glu Pro Ser Arg Leu Gin Ala Val Phe Glu Ala Leu Leu Thr Gin 4210 4215 4220 Phe Asp Arg Leu Asn Gin Ala Thr Glu Asp Val Tyr Gin Leu Giu Gin 4225 4230 4235 4240 Gin Leu His Ser Leu Gin Gly Arg Arg Ser Ser Arg Ala Pro Ala Gly 4245 4250 4255 Ser Ser Arg Gly Pro Ser Pro Gly Leu Arg Pro Ala Leu Pro Ser Arg **4260 4265 4270 Leu Ala Arg Ala Ser Arg Gly Val Asp.Leu Ala Thr Gly Pro Ser Arg 4275 4280 4285 Thr Pro Leu Arg Ala Lys Asn Lys Val His Pro Ser Ser Thr 4290 4295 4300 INFORMATION FOR SEQ ID NO:6: SEQUENCE CHARACTERISTICS: LENGTH: 19 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: CACGACCTGT
CCCAGGCAT 19

INFORMATION FOR SEQ ID NO:7:
SEQUENCE CHARACTERISTICS:
LENGTH: 17 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
CTGGCGGGCG AGGAGAT 17

INFORMATION FOR SEQ ID NO:8:
SEQUENCE CHARACTERISTICS:
LENGTH: 17 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
CTTTGACAAG CACATCT 17

INFORMATION FOR SEQ ID NO:9:
SEQUENCE CHARACTERISTICS:
LENGTH: 17 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
CAACTGGCTG GACAACA 17

INFORMATION FOR SEQ ID NO:10:
SEQUENCE CHARACTERISTICS:
LENGTH: 18 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
AGGACCTGTC CAGGCATC 18

INFORMATION FOR SEQ ID NO:11:
SEQUENCE CHARACTERISTICS:
LENGTH: 21 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
CTGCACTGAC CTCACGCATG T 21

INFORMATION FOR SEQ ID NO:12:
SEQUENCE CHARACTERISTICS:
LENGTH: 21 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
ACGTTGGGCT CCTGGCGAAC C 21

INFORMATION FOR SEQ ID NO:13:
SEQUENCE CHARACTERISTICS:
LENGTH: 25 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
AGGTCAACGT GGGCCTCCAA GTAGT 25

INFORMATION FOR SEQ ID NO:14:
SEQUENCE CHARACTERISTICS:
LENGTH: 21 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
GCGCTTTGCA GACGGTAGGC G 21

INFORMATION FOR SEQ ID NO:15:
SEQUENCE CHARACTERISTICS:
LENGTH: 21 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:
AGCGCAACTA CTTGGAGGCC C 21

INFORMATION FOR SEQ ID NO:16:
SEQUENCE CHARACTERISTICS:
LENGTH: 21 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
GCCAAAGGGA AAGGGATTGG A
21 INFORMATION FOR SEQ ID NO:17: 154 SEQUENCE CHARACTERISTICS: LENGTH: 17 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17: Cys Ser Arg Thr Pro Leu Arg Ala Lys Asn Lys Val His Pro Ser Ser 1 5 10 Thr INFORMATION FOR SEQ ID NO:18: SEQUENCE CHARACTERISTICS: LENGTH: 160 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18: GGAAACAGGT TTGGAGAGGT GACACGACCT GTCCCAGGCA TCACAGCCAG GACAGGACCT GTCCAGGCAT CACAGCCGGG ATGTGCATAG CAGGGGTTTG GAACTATGAG GTGCCCAGGA 120 CCCAGGGTTG GATTGAAAAG GGCGCAGGGG ACTAAGATAA 160 INFORMATION FOR SEQ ID NO:19: SEQUENCE CHARACTERISTICS: LENGTH: 131 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19: GGAAACAGGT TTGGAGAGGT GACACGACCT GTCCCAGGCA TCACAGCCGG GATGTGCATA t: GCAGGGGTTT GGAACTATGA GGTGCCCAGG ACCCAGGGTT GGATTGAAAA GGGCGCAGGG 120 0.0000 GACTAAGATA A 131 131 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 60 base pairs TYPE: nucleic acid 155 STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..60 (xi) SEQUENCE DESCRIPTION: SEQ ID CGC CCG CGC CGC TTC ACT AGC TTC GAC CAG GTG GCG CAC GTG AGC TCC 48 Arg Pro Arg Arg Phe Thr Ser Phe Asp Gin Val Ala His Val Ser Ser 1 5 10 GCA GCC CGT GGC Ala Ala Arg Gly INFORMATION FOR SEQ ID NO:21: SEQUENCE CHARACTERISTICS: LENGTH: 20 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: Arg Pro Arg Arg Phe Thr Ser Phe Asp Gin Val Ala His Val Ser Ser 1 5 10 Ala Ala Arg Gly INFORMATION FOR SEQ ID NO:22: S SEQUENCE CHARACTERISTICS: LENGTH: 60 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..60 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: S CGC CCG CGC CGC TTC ACT AGC 
TTC GAC CAG GTG GCG CAG CTG AGC TCC 48 Arg Pro Arg Arg Phe Thr Ser Phe Asp Gin Val Ala Gln Leu Ser Ser 1 5 10 1..15 GCA GCC CGT GGC Ala Ala Arg Gly 20 o f 9* 1 r INFORMATION FOR SEQ ID NO:23: SEQUENCE CHARACTERISTICS: LENGTH: 20 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: Arg Pro Arg Arg Phe Thr Ser Phe Asp Gin Val Ala Gin Leu Ser Ser 1 5 10 Ala Ala Arg Gly INFORMATION FOR SEQ ID NO:24: SEQUENCE CHARACTERISTICS: LENGTH: 60 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..60 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: GCT GCC CAG CAC GTA CGC TTC GTG CGC CAG TGG TCC GTC TTT GGC AAG Ala Ala Gin His Val Arg Phe Val Arg Gin Trp Ser Val Phe Gly Lys 1 5 10 ACA TTA TGC CGA Thr Leu Cys Arg INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 20 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID Ala Ala Gin His Val Arg Phe Val Arg Gin Trp Ser Val Phe Gly Lys 1 5 10 Thr Leu Cys Arg 0G e o o INFORMATION FOR SEQ ID NO:26: SEQUENCE CHARACTERISTICS: LENGTH: 60 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..60 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: GCT GCC CAG CAG CTA CGC TTC GTG CGC CAG TGG TCC GTC TTT GGC AAG 48 Ala Ala Gin Gin Leu Arg Phe Val Arg Gln Trp Ser Val Phe Gly Lys 1 5 10 ACA TTA TGC CGA Thr Leu Cys Arg INFORMATION FOR SEQ ID NO:27: SEQUENCE
CHARACTERISTICS:
LENGTH: 20 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: Ala Ala Gin Gin Leu Arg Phe Val Arg Gin Trp Ser Val Phe Gly Lys 1 5 10 Thr Leu Cys Arg S@oo INFORMATION FOR SEQ ID NO:28: SEQUENCE CHARACTERISTICS: LENGTH: 81 base pairs o. TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear S(ii) MOLECULE TYPE: cDNA 0 (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..81
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: GCC ACT GGC CCC Ala Thr Gly Pro AGC AGG ACA Ser Arg Thr GTC CTC CTT Val Leu Leu CCT TCG GGC CAA GAA CAA GGT CCA Pro Ser Gly Gin Glu Gin Gly Pro 10 CCT GGC GGG Pro Gly Gly CCC CAG CAG CAC TTA Pro Gin Gin His Leu INFORMATION FOR SEQ ID NO:29: SEQUENCE CHARACTERISTICS: LENGTH: 27 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: Leu Ala Thr Gly Pro Ser Arg Thr Pro Ser Gly Gin Glu Gin Gly Pro 1 5 10 Pro Gin Gin His Leu Val Leu Leu Pro Gly Gly INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 81 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (ix) FEATURE: NAME/KEY: CDS LOCATION: 1..64 (xi) SEQUENCE DESCRIPTION: SEQ ID r r CTG GCC ACT Leu Ala Thr 1 CAC CCC AGC His Pro Ser GGC CCC AGC AGG ACA CCC CTT CGG GCC AAG AAC AAG GTC Gly Pro Ser Arg Thr Pro Leu Arg Ala Lys Asn Lys Val 5 10 AGC ACT T AGTCCTCCTT CCTGGCG Ser Thr INFORMATION FOR SEQ ID NO:31: SEQUENCE CHARACTERISTICS: LENGTH: 21 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: Leu Ala Thr Gly Pro Ser Arg Thr Pro Leu Arg Ala Lys Asn Lys Val 1 5 10 His Pro Ser Ser Thr INFORMATION FOR SEQ ID NO:32: SEQUENCE CHARACTERISTICS: LENGTH: 34 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: Ala Leu Thr His Gly His Ser Leu Leu Arg Asp Val Ser His Asn Leu 1 5 10 Leu Arg Ala Leu Asp Val Gly Leu Leu Ala Asn Leu Ser Ala Leu Ala 25 Glu Leu INFORMATION FOR SEQ ID NO:33: SEQUENCE CHARACTERISTICS: LENGTH: 34 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide ee (xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: Leu His Gly Leu Lys Ala Leu Gly His Leu Asp Leu Ser Gly Asn Arg 1 5 10 Leu Arg Lys Leu Pro Pro Gly Leu Leu Ala Asn Phe Thr Leu Leu Arg 25 Thr Leu INFORMATION FOR SEQ ID NO:34: SEQUENCE CHARACTERISTICS: LENGTH: 34 amino acids 
c* TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: Pro Ala Leu Pro Ala Arg Thr Arg His Leu Leu Leu Ala Asn Asn Ser 1 5 10 Leu Gin Ser Val Pro Pro Gly Ala Phe Asp His Leu Pro Gin Leu Gin 25 Thr Leu INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 34 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID Gly Gin Thr Leu Pro Ala Leu Thr Val Leu Asp Val Ser Phe Asn Arg 1 5 10 Leu Thr Ser Leu Pro Leu Gly Ala Leu Arg Gly Leu Gly Glu Leu Gin 25 Glu Leu INFORMATION FOR SEQ ID NO:36: SEQUENCE CHARACTERISTICS: LENGTH: 34 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: Thr Ala Phe Pro Val Asp Thr Thr Glu Leu Val Leu Thr Gly Asn Asn 1 5 10 Leu Thr Ala Leu Pro Pro Gly Leu Leu Asp Ala Leu Pro Ala Leu Arg 25 Thr Ala INFORMATION FOR SEQ ID NO:37: SEQUENCE CHARACTERISTICS: LENGTH: 34 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: Leu Glu His Gin Val Asn Leu Leu Ser Leu Asp Leu Ser Asn Asn Ala 1 5 10 Leu Thr His Leu Pro Asp Ser Leu Phe Ala His Thr Thr Asn Leu Thr 25 Asp Leu INFORMATION FOR SEQ ID NO:38: SEQUENCE CHARACTERISTICS: LENGTH: 34 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: Ile Arg His Leu Arg Ser Leu Thr Arg Leu Asp Leu Ser Asn Asn Gin 1 5 10 Ile Thr Ile Leu Ser Asn Tyr Thr Phe Ala Asn Leu Thr Lys Leu Ser 25 Thr Leu INFORMATION FOR SEQ ID NO:39: SEQUENCE CHARACTERISTICS: LENGTH: 34 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide SEQUENCE DESCRIPTION: SEQ ID NO:39: *o Phe Gly Asn Met Pro His Leu Gln Trp Leu Asp Leu Ser Tyr Asn Trp 1 5 10 Ile His Glu Leu Asp Phe Asp Ala Phe Lys Asn Thr Lys Gln Leu Gin 20 25 Leu Val INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 17 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE 
TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID Leu Asp Leu Ser Asn Leu Thr Leu Pro Gly Leu Leu Ala Leu Leu Thr 1 5 10 Leu INFORMATION FOR SEQ ID NO:41: SEQUENCE CHARACTERISTICS: LENGTH: 32 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: Gly Pro Gln His Leu Pro Leu Pro Cys Arg Asn Leu Ser Gly Asn Pro 1 5 10 Phe Glu Cys Asp Cys Gly Leu Ala Trp Leu Pro Arg Trp Ala Glu Glu 25 INFORMATION FOR SEQ ID NO:42: SEQUENCE CHARACTERISTICS: LENGTH: 32 amino acids TYPE: amino acid TOPOLOGY: linear S(ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: Gln Pro Asn Trp Asp Met Arg Asp Gly Phe Asp Ile Ser Gly Asn Pro 1 5 10 Trp Ile Cys Asp Gin Asn Leu Ser Asp Leu Tyr Arg Trp Leu Gin Ala 25 S INFORMATION FOR SEQ ID NO:43: SEQUENCE CHARACTERISTICS: LENGTH: 32 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: Thr Val Gin Gly Leu Ser Leu Gin Glu Leu Val Leu Ser Gly Asn Pro 1 5 10 Leu His Cys Ser Cys Ala Leu Arg Trp Leu Gln Arg Trp Glu Glu Glu 25 INFORMATION FOR SEQ ID NO:44: SEQUENCE CHARACTERISTICS: LENGTH: 32 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: Phe Asp His Leu Pro Gin Leu Gin Thr Leu Asp Val Thr Gin Asn Pro 1 5 10 Trp His Cys Asp Cys Ser Leu Thr Tyr Leu Arg Leu Trp Leu Glu Asp 25 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 32 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID Phe Phe Gly Ser His Leu Leu Pro Phe Ala Phe Leu His Gly Asn Pro 1 5 10 Trp Leu Cys Asn Cys Glu Ile Leu Tyr Phe Arg Arg Trp Leu Gln Asp 25 INFORMATION FOR SEQ ID NO:46: SEQUENCE CHARACTERISTICS: LENGTH: 32 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:46: Leu Asp Ala Leu Pro Ala Leu Arg Thr Ala His Leu Gly Ala Asn Pro 1 5 10 Trp Arg Cys Asp Cys Arg Leu Val 
Pro Leu Arg Ala Trp Leu Ala Gly 25 INFORMATION FOR SEQ ID NO:47: SEQUENCE CHARACTERISTICS: LENGTH: 32 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: Leu Asn Arg Thr Met Lys Trp Arg Ser Val Lys Leu Ser Gly Asn Pro 1 5 10 Trp Met Cys Asp Cys Thr Ala Lys Pro Leu Leu Leu Phe Thr Gin Asp 25 INFORMATION FOR SEQ ID NO:48: SEQUENCE CHARACTERISTICS: LENGTH: 32 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide S(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48: Phe Glu Asp Leu Lys Ser Leu Thr His Ile Ala Leu Gly Ser Asn Pro o :1 5 10 Leu Tyr Cys Asp Cys Gly Leu Lys Trp Phe Ser Asp Trp Ile Lys Leu 25 INFORMATION FOR SEQ ID NO:49: SEQUENCE CHARACTERISTICS: LENGTH: 15 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:49: Leu Leu Ser Gly Asn Pro Trp Cys Asp Cys Leu Trp Leu Arg Trp 1 5 10 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 35 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID Thr Ala Thr Arg Trp Asp Phe Gly Asp Gly Ser Ala Glu Val Asp Ala 1 5 10 Ala Gly Pro Ala Ala Ser His Arg Tyr Val Leu Pro Gly Arg Tyr His 25 Val Thr Ala INFORMATION FOR SEQ ID NO:51: SEQUENCE CHARACTERISTICS: LENGTH: 35 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:51: e e Val Leu Tyr Thr Trp Asp Phe Gly Asp Gly Ser Pro Val Leu Thr Gin S1 5 10 Ser Gln Pro Ala Ala Asn His Thr Tyr Ala Ser Arg Gly Thr Tyr His 20 25 S* Val Arg Leu INFORMATION FOR SEQ ID NO:52: SEQUENCE CHARACTERISTICS: LENGTH: 35 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:52: Val Ala Tyr His Trp Asp Phe Gly Asp Gly Ser Pro Gly Gin Asp Thr 1 5 10 15 10 Asp Glu Pro Arg Ala Glu His Ser Tyr Leu Arg Pro Gly Asp Tyr Arg 25 Val Gin Val INFORMATION FOR SEQ ID NO:53: SEQUENCE CHARACTERISTICS: LENGTH: 35 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:53: Leu Ser Tyr Thr Trp Asp Phe Gly Asp Ser Ser Gly Thr Leu Ile Ser 1 5 10 15 Arg Ala Pro Val Val Thr His Thr Tyr Leu Glu Pro Gly Pro Val Thr 25 Ala Gin Val INFORMATION FOR SEQ ID NO:54: SEQUENCE
CHARACTERISTICS:
LENGTH: 35 amino acids
TYPE: amino acid
TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:54:
Leu Ser Tyr Thr Trp Asp Phe Gly Asp Ser Thr Gly Thr Leu Ile Ser
Arg Ala Leu Thr Val Thr His Thr Tyr Leu Glu Ser Gly Pro Val Thr
Ala Gln Val

INFORMATION FOR SEQ ID NO:55:
SEQUENCE CHARACTERISTICS:
LENGTH: 22 amino acids
TYPE: amino acid
TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:55:
Tyr Thr Trp Asp Phe Gly Asp Gly Ser Leu Pro Ala His Thr Tyr Leu
Pro Gly Tyr Val Gln Val

INFORMATION FOR SEQ ID NO:56:
SEQUENCE CHARACTERISTICS:
LENGTH: 17 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:56:
ATCTCCTCGC CCGCCAG 17

INFORMATION FOR SEQ ID NO:57:
SEQUENCE CHARACTERISTICS:
LENGTH: 17 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:57:
AGATGTGCTT GTCAAAG 17

INFORMATION FOR SEQ ID NO:58:
SEQUENCE CHARACTERISTICS:
LENGTH: 17 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:58:
TGTTGTCCAG CCAGTTG 17
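As a consistency check on the oligonucleotide entries, the following is a minimal Python sketch, not part of the specification, that tests which listed 17-mers are reverse complements of one another. The dictionary `LISTED` and both function names are hypothetical; the sequences are copied from SEQ ID NO:7 to NO:9 and NO:56 to NO:58 with the spacing removed.

```python
# Oligonucleotides copied from the sequence listing, spaces removed.
# Keys are the SEQ ID numbers; the dictionary itself is illustrative only.
LISTED = {
    7: "CTGGCGGGCGAGGAGAT",
    8: "CTTTGACAAGCACATCT",
    9: "CAACTGGCTGGACAACA",
    56: "ATCTCCTCGCCCGCCAG",
    57: "AGATGTGCTTGTCAAAG",
    58: "TGTTGTCCAGCCAGTTG",
}

# Translation table for the four unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def reverse_complement_pairs(listed: dict) -> list:
    """Return sorted (a, b) SEQ ID pairs, a < b, where b is the reverse complement of a."""
    return sorted(
        (a, b)
        for a in listed
        for b in listed
        if a < b and reverse_complement(listed[a]) == listed[b]
    )


if __name__ == "__main__":
    for a, b in reverse_complement_pairs(LISTED):
        print(f"SEQ ID NO:{b} is the reverse complement of SEQ ID NO:{a}")
```

Run on the sequences as printed, the check pairs SEQ ID NO:56, NO:57 and NO:58 with SEQ ID NO:7, NO:8 and NO:9 respectively. The same routine can flag transcription errors when a listing is re-keyed.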

Claims (38)

1. Isolated nucleic acid encoding human PKD1 polypeptide.
2. Isolated nucleic acid according to claim 1, wherein said nucleic acid is DNA comprising the sequence set forth in SEQ ID NO:2.
3. Isolated nucleic acid according to claim 1, wherein said nucleic acid is RNA.
4. Isolated nucleic acid according to claim 1, wherein said nucleic acid is cDNA comprising the sequence set forth in SEQ ID NO:4.
5. Isolated nucleic acid that hybridizes under stringent conditions to the nucleic acid of claim 1.
6. Isolated nucleic acid according to claim comprising the sequence set forth in SEQ ID NO:3.
7. Isolated polypeptide encoded by the nucleic acid of claim 1.
8. Isolated polypeptide according to claim 7 comprising the amino acid sequence set forth in SEQ ID
9. A vector comprising the isolated nucleic acid of claim 1.
10. A vector according to claim 9 further comprising a transcriptional regulatory element operably linked to said nucleic acid, said element having the ability to direct the expression of genes of prokaryotic or eukaryotic cells and their viruses or combinations thereof.
11. A host cell comprising the vector of claim 9.
12. A method for producing PKD1 protein, which comprises: culturing the host cell of claim 11 in a medium and under conditions suitable for expression of said protein, and isolating said expressed protein.
13. Isolated human PKD1 gene, comprising the DNA sequence set forth in SEQ ID NO:2 having modifications selected from the group consisting of: transitions, transversions, deletions, and insertions.
14. The gene of claim 13, comprising a DNA sequence whose presence in one or more copies in the genome of a subject is associated with adult-onset polycystic kidney disease in said subject.
15. A recombinant vector comprising the DNA sequence of claim 13.
16. The vector of claim 15 further comprising a transcriptional regulatory element operably linked to said DNA sequence, said element having the ability to direct the expression of genes of prokaryotic or eukaryotic cells and their viruses or combinations thereof.
17. A host cell comprising the vector of claim
18. A method for producing mutant PKD1 protein, which comprises: culturing the host cell of claim 17 in a medium and under conditions suitable for expression of said protein, and isolating said expressed protein.
19. An isolated nucleic acid comprising: 5'-AGGACCTGTCCAGGCATC-3' (SEQ ID NO:10).
20. An isolated nucleic acid comprising: 5'-GCGCTTTGCAGACGGTAGGCG-3' (SEQ ID NO:14).
21. An isolated nucleic acid comprising: 5'-AGGTCAACGTGGGCCTCCAAGTAGT-3' (SEQ ID NO:13).
22. An isolated nucleic acid comprising: 5'-AGCGCAACTACTTGGAGGCCC-3' (SEQ ID NO:15).
23. A diagnostic method for screening human subjects to identify PKD1 carriers, which comprises the steps of: obtaining a sample of biological material from said subject; and assaying for the presence of mutant PKD1 genes or their protein products in said biological material.
24. The method of claim 23 wherein said biological material comprises nucleic acid, and said assaying comprises: selectively amplifying the authentic PKD1 gene, or fragments thereof, from said biological material, and detecting the presence of normal and mutant PKD1 genes using an analytical method selected from the group consisting of: restriction enzyme digestion, direct DNA sequencing, hybridization with sequence-specific oligonucleotides, single-stranded conformational polymorphism analysis, denaturing gradient gel electrophoresis (DGGE), two-dimensional gel electrophoresis, and combinations thereof.
25. The method of claim 24 wherein said amplifying is performed in the presence of at least one oligonucleotide selected from the group consisting of 5'-AGGACCTGTCCAGGCATC-3' (SEQ ID NO:10), 5'-GCGCTTTGCAGACGGTAGGCG-3' (SEQ ID NO:14), 5'-AGGTCAACGTGGGCCTCCAAGTAGT-3' (SEQ ID NO:13), and 5'-AGCGCAACTACTTGGAGGCCC-3' (SEQ ID NO:15).
26. The method of claim 23, wherein said assaying step comprises an immunoassay employing an antibody specific for said PKD1 gene product.
27. An isolated antibody directed against a peptide comprising the sequence (C)SRTPLRAKNKVHPSST (SEQ ID NO:17), or immunogenic fragments thereof.
28. The antibody of claim 27 immunoreactive with a polypeptide encoded by the gene sequence set forth in SEQ ID NO:2.
29. An isolated antibody immunoreactive with a polypeptide encoded by the sequence set forth in SEQ ID NO:2.
30. An isolated antibody immunoreactive with a polypeptide comprising the amino acid sequence set forth in SEQ ID
31. A method for treating a disease condition having the characteristics of APKD, which comprises administering to cells having defective PKD1 gene function a normal human PKD1 gene or fragments thereof, wherein said administration results in expression of therapeutically effective amounts of normal PKD1 protein or fragments thereof.
32. The method of claim 31, wherein said normal human PKD1 gene comprises the DNA sequence of SEQ ID NO:2.
33. The method of claim 31, wherein said normal human PKD1 gene comprises the cDNA sequence of SEQ ID NO:4.
34. A method for treating a disease condition having the characteristics of APKD, which comprises administering to cells having defective PKD1 gene function therapeutically effective amounts of a normal PKD1 protein or fragments thereof.
35. The method of claim 34, wherein said PKD1 protein is encoded by the DNA sequence set forth in SEQ ID NO:2.
36. The method of claim 34, wherein said PKD1 protein is encoded by the cDNA sequence set forth in SEQ ID NO:4.
37. The method of claim 35, further comprising administering a polypeptide having an amino acid sequence set forth in SEQ ID
38. A composition for treating a disease condition having the characteristics of APKD, said composition comprising an isolated human PKD1 gene having the DNA sequence of SEQ ID NO:2, or fragments thereof, and a pharmaceutically acceptable carrier or diluent.
39. The composition of claim 38 comprising a vector into which said PKD1 gene is incorporated.
40. The composition of claim 38, further comprising an isolated human PKD1 gene having the DNA sequence of SEQ ID NO:4 or fragments thereof.
41. A composition for treating a disease condition having the characteristics of APKD, said composition comprising a normal PKD1 protein encoded by the DNA sequence of SEQ ID NO:2, or fragments thereof, and a pharmaceutically acceptable carrier or diluent.
42. The composition of claim 41 further comprising a polypeptide encoded by the DNA sequence of SEQ ID NO:4 or fragments thereof.
43. A unicellular or multicellular organism whose genome comprises a recombinant PKD1 gene or fragments thereof.
44. The organism of claim 43 wherein said PKD1 gene has the DNA sequence of SEQ ID NO:2 or fragments thereof.
45. The organism of claim 43 wherein said PKD1 gene has the sequence of SEQ ID NO:4 or fragments thereof.

DATED this 2nd day of July 2001
Genzyme Corporation AND Johns Hopkins University
DAVIES COLLISON CAVE
Patent Attorneys for the applicant
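The amplification claims above rely on oligonucleotides that anneal to one template and not to a closely related one. The sketch below is an illustration under stated assumptions, not the applicants' protocol: it treats annealing as an exact string match, which ignores mismatch tolerance and reaction conditions. It compares the 18-mer 5'-AGGACCTGTCCAGGCATC-3' and the 19-mer of SEQ ID NO:6 against the 160-base and 131-base sequences given as SEQ ID NO:18 and SEQ ID NO:19. The function and variable names are hypothetical, and no claim is made here about which template represents which locus.

```python
# Sequences copied from the sequence listing, spaces and position numbers removed.
SEQ_ID_18 = (
    "GGAAACAGGTTTGGAGAGGTGACACGACCTGTCCCAGGCATCACAGCCAG"
    "GACAGGACCTGTCCAGGCATCACAGCCGGGATGTGCATAGCAGGGGTTTG"
    "GAACTATGAGGTGCCCAGGACCCAGGGTTGGATTGAAAAGGGCGCAGGGG"
    "ACTAAGATAA"
)
SEQ_ID_19 = (
    "GGAAACAGGTTTGGAGAGGTGACACGACCTGTCCCAGGCATCACAGCCGG"
    "GATGTGCATAGCAGGGGTTTGGAACTATGAGGTGCCCAGGACCCAGGGTT"
    "GGATTGAAAAGGGCGCAGGGGACTAAGATAA"
)
OLIGO_18MER = "AGGACCTGTCCAGGCATC"   # 18-mer recited in the amplification claims
SEQ_ID_6 = "CACGACCTGTCCCAGGCAT"     # 19-mer from the listing

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def _all_hits(haystack: str, needle: str) -> list:
    """Return every 0-based start index of needle in haystack (overlaps allowed)."""
    hits, start = [], haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def primer_sites(template: str, primer: str) -> dict:
    """Exact-match sites of primer on the template as written and on its reverse complement."""
    return {
        "forward": _all_hits(template, primer),
        "reverse": _all_hits(reverse_complement(template), primer),
    }


if __name__ == "__main__":
    for name, template in (("SEQ ID NO:18", SEQ_ID_18), ("SEQ ID NO:19", SEQ_ID_19)):
        print(name, len(template), "bases")
        print("  18-mer:", primer_sites(template, OLIGO_18MER))
        print("  SEQ ID NO:6:", primer_sites(template, SEQ_ID_6))
```

Under the exact-match assumption, the 18-mer is found only in the 160-base sequence, while the 19-mer of SEQ ID NO:6 is found in both, which is the kind of asymmetry a selective amplification step would exploit.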
AU54172/01A 1996-05-24 2001-07-02 Polycystic kidney disease gene Abandoned AU5417201A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU54172/01A AU5417201A (en) 1996-05-24 2001-07-02 Polycystic kidney disease gene
AU2004212565A AU2004212565A1 (en) 1996-05-24 2004-09-17 Polycystic kidney disease gene

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US08655360 1996-05-24
US08658136 1996-06-03
AU54172/01A AU5417201A (en) 1996-05-24 2001-07-02 Polycystic kidney disease gene

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
AU32119/97A Division AU3211997A (en) 1996-05-24 1997-05-22 Polycystic kidney disease gene

Related Child Applications (1)

Application Number Title Priority Date Filing Date
AU2004212565A Division AU2004212565A1 (en) 1996-05-24 2004-09-17 Polycystic kidney disease gene

Publications (1)

Publication Number Publication Date
AU5417201A true AU5417201A (en) 2001-09-20

Family

ID=3740170

Family Applications (2)

Application Number Title Priority Date Filing Date
AU54172/01A Abandoned AU5417201A (en) 1996-05-24 2001-07-02 Polycystic kidney disease gene
AU2004212565A Abandoned AU2004212565A1 (en) 1996-05-24 2004-09-17 Polycystic kidney disease gene

Family Applications After (1)

Application Number Title Priority Date Filing Date
AU2004212565A Abandoned AU2004212565A1 (en) 1996-05-24 2004-09-17 Polycystic kidney disease gene

Country Status (1)

Country Link
AU (2) AU5417201A (en)

Also Published As

Publication number Publication date
AU2004212565A1 (en) 2004-10-14

Similar Documents

Publication Publication Date Title
CA2395781C (en) Detection and treatment of polycystic kidney disease
CN107941681B (en) Method for identifying quantitative cellular composition in biological sample
KR102657306B1 (en) Use of markers including filamin a in the diagnosis and treatment of prostate cancer
KR102613599B1 (en) Prediction method for risk of ischemic stroke onset
ES2792126T3 (en) Treatment method based on polymorphisms of the KCNQ1 gene
KR20150023904A (en) Use of markers in the diagnosis and treatment of prostate cancer
TWI849576B (en) Use of gene marker
KR20210136038A (en) Piezoelectric mechanosensitive ion channel component 1 (PIEZO1) variants and uses thereof
KR101695866B1 (en) Phosphodiesterase 9a as prostate cancer marker
US6566061B1 (en) Identification of polymorphisms in the PCTG4 region of Xq13
KR102661616B1 (en) GPR156 variants and their uses
US6071717A (en) Polycystic kidney disease gene and protein
WO1997044457A1 (en) Polycystic kidney disease gene
AU5417201A (en) Polycystic kidney disease gene
KR101818352B1 (en) Biomarkers for diagnosis and prognosis of bladder cancer and uses thereof
RU2812362C2 (en) Piezotype mechanesentive ion channel component 1 (piezo1) and its use
KR102326582B1 (en) Marker for diagnosing hearing impairment and deafness and use thereof
US20020081678A1 (en) Isolated nucleic acid molecules encoding human transporter proteins, and uses thereof
US20040161759A1 (en) Test and model for inflammatory disease
US20030027153A1 (en) Methods and compositions for diagnosing and treating neuropsychiatric disorders such as schizophrenia
US20020123083A1 (en) Nucleic acid endocing growth factor protein
KR20230057410A (en) Treatment of sepsis using PCSK9 and LDLR modulators
JP2002355069A (en) Inspection method of chronic rheumatoid arthritis by novel genetic polymorphism
JP2003180359A (en) New gene and protein encoded with the same
US20020164709A1 (en) Nucleic acid endocing growth factor protein

Legal Events

Date Code Title Description
MK5 Application lapsed section 142(2)(e) - patent request and compl. specification not accepted