AU8933001A - 3' genomic promoter region and polymerase gene mutations responsible for attenuation in viruses of the order designated mononegavirales - Google Patents



Publication number
AU8933001A
Authority
AU
Australia
Prior art keywords
leu
virus
ser
ile
arg
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU89330/01A
Inventor
Brian R. Murphy
Valerie B. Randolph
Mohinderjit S. Sidhu
Joanne M. Tatem
Stephen A. Udem
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wyeth Holdings LLC
US Department of Health and Human Services
Original Assignee
US Department of Health and Human Services
US Government
American Cyanamid Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by US Department of Health and Human Services, US Government, American Cyanamid Co
Priority to AU89330/01A
Publication of AU8933001A
Assigned to the Government of the United States of America, as represented by the Secretary of the Department of Health and Human Services, and Wyeth Holdings Corporation (reassignment: amend patent request/document other than specification (104)). Assignors: American Cyanamid Company; the Government of the United States of America, as represented by the Secretary of the Department of Health and Human Services
Priority to AU2004237877A
Legal status: Abandoned (current)


Landscapes

  • Medicines Containing Antibodies Or Antigens For Use As Internal Diagnostic Agents (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Description

S&F Ref: 457034D1
AUSTRALIA
PATENTS ACT 1990
COMPLETE SPECIFICATION FOR A STANDARD PATENT
ORIGINAL
Name and Address of Applicants: American Cyanamid Company, Five Giralda Farms, Madison, New Jersey 07940, United States of America; and The Government of the United States of America as represented by The Department of Health and Human Services, Suite 325, 6011 Executive Boulevard, Rockville, Maryland 20852, United States of America
Actual Inventor(s): Stephen A. Udem, Mohinderjit S. Sidhu, Joanne M. Tatem, Brian R. Murphy, Valerie B. Randolph
Address for Service: Spruson Ferguson, St Martins Tower, Level 31 Market Street, Sydney NSW 2000 (CCN 3710000177)
Invention Title: 3' Genomic Promoter Region and Polymerase Gene Mutations Responsible for Attenuation in Viruses of the Order Designated Mononegavirales
The following statement is a full description of this invention, including the best method of performing it known to me/us:-
Documents received on: 18 NOV 2001 Batch No: 5845c
3' GENOMIC PROMOTER REGION AND POLYMERASE GENE MUTATIONS RESPONSIBLE FOR ATTENUATION IN VIRUSES OF THE ORDER DESIGNATED MONONEGAVIRALES
Field Of The Invention
This invention relates to isolated, recombinantly-generated, attenuated, nonsegmented, negative-sense, single stranded RNA viruses of the Order designated Mononegavirales having at least one attenuating mutation in the 3' genomic promoter region and having at least one attenuating mutation in the RNA polymerase gene. This invention was made with Government support under a grant awarded by the Public Health Service. The Government has certain rights in the invention.
Background Of The Invention
Enveloped, negative-sense, single stranded RNA viruses are uniquely organized and expressed. The genomic RNA of negative-sense, single stranded viruses serves two template functions in the context of a nucleocapsid: as a template for the synthesis of messenger RNAs (mRNAs) and as a template for the synthesis of the antigenome strand. Negative-sense, single stranded RNA viruses encode and package their own RNA-dependent RNA polymerase. Messenger RNAs are only synthesized once the virus has been uncoated in the infected cell. Viral replication occurs after synthesis of the mRNAs and requires the continuous synthesis of viral proteins. The newly synthesized antigenome strand serves as the template for generating further copies of the negative-strand genomic RNA.
The polymerase complex initiates and carries out transcription and replication by engaging the cis-acting signals at the 3' end of the genome, in particular, the promoter region. Viral genes are then transcribed from the genome template unidirectionally, from its 3' end to its 5' end. Less mRNA is always made from the downstream genes (e.g., the polymerase gene) than from their upstream neighbors (e.g., the nucleoprotein gene). Therefore, there is always a gradient of mRNA abundance according to the position of the genes relative to the 3' end of the genome.
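The gradient can be pictured with a deliberately simple model. Assuming (hypothetically) that only a fixed fraction of polymerases continues past each gene junction, mRNA abundance falls geometrically with distance from the 3' end. The sketch uses the measles virus gene order (N, P, M, F, H, L); the function name and the 0.7 junction fraction are illustrative assumptions, not measured values.

```python
# Measles virus gene order, read 3' -> 5' along the genome.
MEASLES_GENE_ORDER = ["N", "P", "M", "F", "H", "L"]


def relative_mrna_abundance(genes, reinitiation_fraction):
    """Toy model of the polar transcription gradient.

    Assumes (hypothetically) that at every gene junction only a fixed
    fraction of polymerases go on to transcribe the next downstream gene,
    so abundance falls geometrically with distance from the 3' end.
    The 3'-proximal gene is normalized to 1.0.
    """
    if not 0.0 < reinitiation_fraction <= 1.0:
        raise ValueError("reinitiation_fraction must be in (0, 1]")
    return {gene: reinitiation_fraction ** index for index, gene in enumerate(genes)}


if __name__ == "__main__":
    # 0.7 is an illustrative value, not a measured one.
    for gene, level in relative_mrna_abundance(MEASLES_GENE_ORDER, 0.7).items():
        print(f"{gene}: {level:.3f}")
```

A single constant per junction is the simplest assumption that reproduces a monotonic gradient; real junction-specific attenuation would need a separate value for each gene boundary.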
Following the 1993 reclassification by the International Committee on Taxonomy of Viruses, an Order designated Mononegavirales was established. This Order contains three families of enveloped viruses with single stranded, nonsegmented RNA genomes of minus polarity (negative-sense). These families are the Paramyxoviridae, Rhabdoviridae and Filoviridae. The family Paramyxoviridae has been further divided into two subfamilies, Paramyxovirinae and Pneumovirinae. The subfamily Paramyxovirinae contains three genera, Paramyxovirus, Rubulavirus and Morbillivirus. The subfamily Pneumovirinae contains the genus Pneumovirus.
The new classification is based upon morphological criteria, the organization of the viral genome, biological activities and the sequence relationships of the proteins. Among enveloped viruses, the distinguishing morphological feature of the subfamily Paramyxovirinae is the size and shape of its nucleocapsids (diameter 18 nm, 1 µm in length, pitch of … nm), which have left-handed helical symmetry. The biological criteria are: 1) antigenic cross-reactivity between members of a genus, and 2) the presence of neuraminidase activity in the genera Paramyxovirus and Rubulavirus and its absence in the genus Morbillivirus. In addition, variations in the coding potential of the P gene are considered, as is the presence of an extra gene (SH) in Rubulaviruses.
Pneumoviruses can be distinguished from Paramyxovirinae morphologically because they contain narrow nucleocapsids. In addition, pneumoviruses differ markedly in the number of protein-encoding cistrons (10 in pneumoviruses versus 6 in Paramyxovirinae) and have an attachment protein that is very different from that of Paramyxovirinae. Although the paramyxoviruses and pneumoviruses have six proteins that appear to correspond in function (N, P, M, G/H/HN, F and L), only the latter two proteins exhibit significant sequence relatedness between the two subfamilies. Several pneumoviral proteins lack counterparts in most of the paramyxoviruses, namely the nonstructural proteins NS1 and NS2, the small hydrophobic protein SH, and a second protein, M2. Some paramyxoviral proteins, namely C and V, lack counterparts in pneumoviruses. However, the basic genomic organization of pneumoviruses and paramyxoviruses is the same. The same is true of rhabdoviruses and filoviruses. Table 1 presents the current taxonomic classification of these viruses, together with examples of each genus.
Table 1
Classification of Nonsegmented, Negative-Sense, Single Stranded RNA Viruses of the Order Mononegavirales
Family Paramyxoviridae
  Subfamily Paramyxovirinae
    Genus Paramyxovirus
      Sendai virus (mouse parainfluenza virus type 1)
      Human parainfluenza virus (PIV) types 1 and 3
      Bovine parainfluenza virus (BPV) type 3
    Genus Rubulavirus
      Simian virus 5 (SV5) (canine parainfluenza virus type 2)
      Mumps virus
      Newcastle disease virus (NDV) (avian paramyxovirus 1)
      Human parainfluenza virus types 2, 4a and 4b
    Genus Morbillivirus
      Measles virus (MV)
      Dolphin morbillivirus
      Canine distemper virus (CDV)
      Peste-des-petits-ruminants virus
      Phocine distemper virus
      Rinderpest virus
  Subfamily Pneumovirinae
    Genus Pneumovirus
      Human respiratory syncytial virus (RSV)
      Bovine respiratory syncytial virus
      Pneumonia virus of mice
      Turkey rhinotracheitis virus
Family Rhabdoviridae
  Genus Lyssavirus
    Rabies virus
  Genus Vesiculovirus
    Vesicular stomatitis virus
  Genus Ephemerovirus
    Bovine ephemeral fever virus
Family Filoviridae
  Genus Filovirus
    Marburg virus
For many of these viruses, no vaccines of any kind are available. Thus, there is a need to develop vaccines against such human and animal pathogens. Such vaccines would have to elicit a protective immune response in the recipient. The qualitative and quantitative features of such a favorable response are extrapolated from those seen in survivors of natural virus infection, who, in general, are protected from reinfection by the same or highly related viruses for some significant duration thereafter.
A variety of approaches can be considered in seeking to develop such vaccines, including the use of: purified individual viral protein vaccines (subunit vaccines); inactivated whole virus preparations; and live, attenuated viruses.
Subunit vaccines have the desirable feature of being pure, definable and relatively easily produced in abundance by various means, including recombinant DNA expression methods. To date, with the notable exception of hepatitis B surface antigen, viral subunit vaccines have generally elicited only short-lived and/or inadequate immunity, particularly in naive recipients.
Formalin-inactivated whole virus preparations of polio (IPV) and hepatitis A have proven safe and efficacious. In contrast, immunization with similarly inactivated whole viruses, such as the respiratory syncytial virus and measles virus vaccines, elicited unfavorable immune responses and/or response profiles that predisposed vaccinees to exaggerated or aberrant disease when subsequently confronted with the natural or "wild-type" virus.
Early attempts (1966) to vaccinate young children used a parenterally administered formalin-inactivated RSV vaccine. Unfortunately, several field trials of this vaccine revealed serious adverse reactions, i.e., the development of a severe illness with unusual features following subsequent natural infection with RSV (Bibliography entries …). It has been suggested that this formalinized RSV antigen elicited an abnormal or unbalanced immune response profile, predisposing the vaccinee to RSV disease. Thereafter, live, attenuated RSV vaccine candidates were generated by cold passage or chemical mutagenesis. These RSV strains were found to have reduced virulence in seropositive adults.
Unfortunately, they proved either over- or underattenuated when given to seronegative infants; in some cases, they were also found to lack genetic stability. Another vaccination approach, using parenteral administration of live virus, was ineffective, and efforts along this line were discontinued. Notably, these live RSV vaccines were never associated with disease enhancement such as that observed with the formalin-inactivated RSV vaccine described above. Currently, there are no RSV vaccines approved for administration to humans, although clinical trials are now in progress with cold-passaged, chemically mutagenized strains of RSV designated A2 and B-1.
Appropriately attenuated live derivatives of wild-type viruses offer a distinct advantage as vaccine candidates. As live, replicating agents, they initiate infection in recipients during which viral gene products are expressed, processed and presented in the context of the vaccinee's specific MHC class I and II molecules, eliciting humoral and cell-mediated immune responses, as well as the coordinate cytokine patterns, which parallel the protective immune profile of survivors of natural infection.
This favorable immune response pattern contrasts with the limited responses elicited by inactivated or subunit vaccines, which typically are largely restricted to the humoral arm of immune surveillance. Further, the immune response profile elicited by some formalin-inactivated whole virus vaccines, e.g., the measles and respiratory syncytial virus vaccines developed in the 1960s, has not only failed to provide sustained protection but has in fact led to a predisposition to aberrant, exaggerated, and even fatal illness when the vaccine recipient later confronted the wild-type virus.
While live, attenuated viruses have highly desirable characteristics as vaccine candidates, they have proven difficult to develop. The crux of the difficulty lies in the need to isolate a derivative of the wild-type virus that has lost its disease-producing potential (i.e., virulence) while retaining sufficient replication competence to infect the recipient and elicit the desired immune response profile in adequate abundance.
Historically, this delicate balance between virulence and attenuation has been achieved by serial passage of a wild-type viral isolate through different host tissues or cells under varying growth conditions (such as temperature). This process presumably favors the growth of viral variants (mutants), some of which have the favorable characteristic of attenuation.
Occasionally, further attenuation is achieved through chemical mutagenesis as well.
This propagation/passage scheme typically leads to the emergence of virus derivatives that are temperature sensitive, cold-adapted and/or altered in their host range, one or all of which are changes from the wild-type, disease-causing viruses, i.e., changes that may be associated with attenuation.
Several live virus vaccines, including those for the prevention of measles and mumps (which are paramyxoviruses), and for protection against polio and rubella (which are positive strand RNA viruses), have been generated by this approach and provide the mainstay of current childhood immunization regimens throughout the world.
Nevertheless, this means for generating attenuated live virus vaccine candidates is lengthy and, at best, unpredictable, relying largely on the selective outgrowth of those randomly occurring genomic mutants with desirable attenuation characteristics.
The resulting viruses may have the desired phenotype in vitro, and even appear to be attenuated in animal models. However, all too often they remain either under- or overattenuated in the human or animal host for whom they are intended as vaccine candidates.
Even among the vaccines currently in use, there is still a need for more efficacious vaccines. For example, the current measles vaccines provide reasonably good protection. However, recent measles epidemics suggest deficiencies in the efficacy of current vaccines. Despite maternal immunization, high rates of acute measles infection have occurred in children under one year of age, reflecting the vaccines' inability to induce anti-measles antibody levels comparable to those developed following wild-type measles infection. As a result, vaccine-immunized mothers are less able to provide their infants with sufficient transplacentally derived passive antibodies to protect the newborns beyond the first few months of life.
Acute measles infections in previously immunized adolescents and young adults point to an additional problem. These secondary vaccine failures indicate limitations in the current vaccines' ability to induce and maintain antiviral protection that is both abundant and long-lived (11,12,13). Recently, yet another potential problem was revealed: the hemagglutinin protein of wild-type measles isolates from the past 15 years has shown a progressively increasing distance from the vaccine strains (14).
This "antigenic drift" raises legitimate concerns that the vaccine strains may not contain the ideal antigenic repertoire needed to provide optimal protection. Thus, there is a need for improved vaccines.
Rational vaccine design would be assisted by a better understanding of these viruses, in particular, by the identification of the virally encoded determinants of virulence as well as those genomic changes which are responsible for attenuation.
Summary Of The Invention
Accordingly, it is an object of this invention to identify those regions of the genome of the RNA viruses of the Order Mononegavirales where mutations result in attenuation of those viruses.
It is a further object of this invention to produce recombinantly-generated viruses which incorporate such attenuating mutations in their genomes.
It is still a further object of this invention to formulate vaccines containing such attenuated viruses.
These and other objects of the invention as discussed below are achieved by the generation and isolation of recombinantly-generated, attenuated, nonsegmented, negative-sense, single stranded RNA viruses of the Order Mononegavirales having at least one attenuating mutation in the 3' genomic promoter region and having at least one attenuating mutation in the RNA polymerase gene.
In the case of measles virus, at least one attenuating mutation in the 3' genomic promoter region is selected from the group consisting of nucleotide 26 (A → …), nucleotide 42 (A → T or A → C) and nucleotide 96 (G → …), where these nucleotides, as well as others delineated in this application (unless stated otherwise), are presented in positive strand, antigenomic, that is, message (coding) sense; and at least one attenuating mutation in the RNA polymerase gene is selected from the group consisting of nucleotide changes which produce changes in an amino acid selected from the group consisting of residues 331 (isoleucine → threonine), 1409 (alanine → threonine), 1624 (threonine → alanine), 1649 (arginine → methionine), 1717 (aspartic acid → alanine), 1936 (histidine → tyrosine), 2074 (glutamine → arginine) and 2114 (arginine → lysine).
In the case of human parainfluenza virus type 3, at least one attenuating mutation in the 3' genomic promoter region is selected from the group consisting of nucleotide 23 (T → …), nucleotide 24 (C → T), nucleotide 28 (G → T) and nucleotide 45 (T → …), and at least one attenuating mutation in the RNA polymerase gene is selected from the group consisting of nucleotide changes which produce changes in an amino acid selected from the group consisting of residues 942 (tyrosine → histidine), 992 (leucine → phenylalanine) and 1558 (threonine → isoleucine).
In the case of human respiratory syncytial virus subgroup B, at least one attenuating mutation in the 3' genomic promoter region is selected from the group consisting of nucleotide 4 (C → G) and the insertion of an additional A in the stretch of A's at nucleotides 6-11, and at least one attenuating mutation in the RNA polymerase gene is selected from the group consisting of nucleotide changes which produce changes in an amino acid selected from the group consisting of residues 353 (arginine → lysine), 451 (lysine → arginine), 1229 (aspartic acid → asparagine), 2029 (threonine → isoleucine) and 2050 (asparagine → aspartic acid).
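One way to keep the claimed combinations straight is to encode the sites listed above as data and test whether a candidate carries at least one promoter change and at least one polymerase change. This is a minimal sketch under stated assumptions: it records positions only, not the individual base or residue substitutions, and the names `ATTENUATING_SITES`, `meets_minimum` and `minimal_pairs` are hypothetical.

```python
from itertools import product

# Attenuating sites listed above. Promoter entries are nucleotide positions
# (antigenomic, message sense); polymerase entries are L protein residues.
# Only positions are recorded here, not the individual substitutions.
ATTENUATING_SITES = {
    "measles": {
        "promoter_nt": (26, 42, 96),
        "polymerase_aa": (331, 1409, 1624, 1649, 1717, 1936, 2074, 2114),
    },
    "hPIV3": {
        "promoter_nt": (23, 24, 28, 45),
        "polymerase_aa": (942, 992, 1558),
    },
    "RSV_B": {
        # "A-ins-6-11" labels the extra A inserted into the run of A's at nt 6-11.
        "promoter_nt": (4, "A-ins-6-11"),
        "polymerase_aa": (353, 451, 1229, 2029, 2050),
    },
}


def meets_minimum(virus, promoter_changes, polymerase_changes):
    """True if a candidate carries at least one listed promoter change
    and at least one listed polymerase change for the given virus."""
    sites = ATTENUATING_SITES[virus]
    has_promoter = bool(set(sites["promoter_nt"]) & set(promoter_changes))
    has_polymerase = bool(set(sites["polymerase_aa"]) & set(polymerase_changes))
    return has_promoter and has_polymerase


def minimal_pairs(virus):
    """All combinations of exactly one promoter site and one polymerase site."""
    sites = ATTENUATING_SITES[virus]
    return list(product(sites["promoter_nt"], sites["polymerase_aa"]))


if __name__ == "__main__":
    for virus in ATTENUATING_SITES:
        print(virus, len(minimal_pairs(virus)), "minimal promoter + polymerase pairs")
```

Counting the minimal pairs (3 × 8 = 24 for measles, 4 × 3 = 12 for PIV-3, 2 × 5 = 10 for RSV subgroup B) gives a sense of how many single-promoter, single-polymerase constructs a systematic screen would involve before larger combinations are considered.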
In another embodiment of this invention, attenuated virus is used to prepare vaccines which elicit a protective immune response against the wild-type form of the virus.
In yet another embodiment of this invention, an isolated, positive strand, antigenomic message sense nucleic acid molecule (or an isolated, negative strand genomic sense nucleic acid molecule) having the complete viral nucleotide sequence (whether of wild-type virus or of virus attenuated by non-recombinant means) is manipulated by introducing one or more of the attenuating mutations described in this application to generate an isolated, recombinantly-generated attenuated virus. This virus is then used to prepare vaccines which elicit a protective immune response against the wild-type form of the virus.
In still another embodiment of this invention, such a complete wild-type or vaccine viral nucleotide sequence is used: to design PCR primers for use in a PCR assay to detect the presence of the corresponding virus in a sample; or to design and select peptides for use in an ELISA to detect the presence of the corresponding virus in a sample.
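As an illustration of the PCR embodiment, a primer pair can be read directly off a positive-sense sequence: the forward primer is copied from the 5' end of the target region and the reverse primer is the reverse complement of its 3' end. This is a minimal sketch on a made-up template; the function names are hypothetical, and the Wallace rule (about 2 °C per A/T plus 4 °C per G/C) is only a rough screen for short oligonucleotides, not a substitute for a dedicated primer-design tool.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def gc_fraction(primer):
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Rough melting temperature (deg C) for short oligos: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def design_primer_pair(template, start, end, primer_len=20):
    """Pick primers flanking template[start:end] (0-based, end-exclusive).

    The template is a positive-sense DNA copy; for an RNA virus the assay
    would be run on cDNA made by reverse transcription.
    """
    if end - start < 2 * primer_len:
        raise ValueError("target region too short for two non-overlapping primers")
    forward = template[start:start + primer_len].upper()
    reverse = reverse_complement(template[end - primer_len:end])
    return forward, reverse


if __name__ == "__main__":
    # Made-up template for illustration only; not a viral sequence.
    template = "GATCCTAGCAAGTCGACTGGTACCGTTAGCATGCAATCGGATCCAGTCAGTACGTTGACA"
    fwd, rev = design_primer_pair(template, 0, len(template))
    for name, primer in (("forward", fwd), ("reverse", rev)):
        print(name, primer, f"GC={gc_fraction(primer):.2f}", f"Tm~{wallace_tm(primer)}C")
```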
Brief Description Of The Figures
Figure 1 depicts the passage history of the Edmonston measles virus. The abbreviations have the following meanings: HK, human kidney; HA, human amnion; CE(am), chick embryo; CEF, chick embryo fibroblast; DK, dog kidney; WI-38, human diploid cells; SK, sheep kidney; plaque cloning. The number following each abbreviation represents the number of passages.
Figure 2 depicts a map of the measles virus genome showing putative cis-acting regulatory elements at and near the genome and antigenome termini. Top: a schematic map of the measles virus genome, beginning at the 3' end with 52 nucleotides of leader sequence (l) and ending at the 5' terminus with 37 nucleotides of trailer sequence. Gene boundaries are denoted by vertical bars; below each gene is the number of cistronic nucleotides. Bottom: an expanded schematic view of the 3' extended genomic promoter regions of the genome and antigenome, showing the position and sequence of the two highly conserved domains, A and B.
The intervening intergenic trinucleotide is denoted as well. Nascent 5' RNAs encompassing the A' to B' regions are presumed to contain the regulatory sequence at which N protein encapsidation initiates.
Figure 3 depicts a genetic map of the RSV subgroup B wild-type strains designated 2B and 18537 (top portion), the intergenic sequences of those strains (middle portion) and the 68-nucleotide overlap between the M2 and L genes (bottom portion). The RSV 2B strain has six fewer nucleotides in the G gene, encoding two fewer amino acid residues in the G protein, as compared to the 18537 strain. The 2B strain has 145 nucleotides in the 5' trailer region, as compared to 149 nucleotides in the 18537 strain. The 2B strain has one more nucleotide in each of the NS-1, NS-2 and N genes, and one fewer nucleotide in each of the M and F genes, as compared to the 18537 strain.
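The per-region differences listed for Figure 3 can be tallied as a consistency check: a 6-nucleotide difference in G corresponds to exactly two codons, matching the two-residue difference in the G protein. The tally below covers only the regions itemized in this description (intergenic differences are shown in the figure but not listed in the text), so its total is not a whole-genome length difference. The names used are hypothetical.

```python
# Per-region length differences of RSV 2B relative to 18537, as itemized in
# the Figure 3 description (negative = 2B is shorter). Intergenic differences
# are shown in the figure but not itemized in the text, so they are omitted.
DIFF_2B_VS_18537 = {
    "NS-1": +1,
    "NS-2": +1,
    "N": +1,
    "M": -1,
    "F": -1,
    "G": -6,
    "trailer": 145 - 149,
}


def net_difference(diffs):
    return sum(diffs.values())


def codon_difference(nt_difference):
    """Convert an in-frame nucleotide difference into a codon (residue) count."""
    if nt_difference % 3 != 0:
        raise ValueError("difference is not a whole number of codons")
    return nt_difference // 3


if __name__ == "__main__":
    print("net difference over itemized regions:", net_difference(DIFF_2B_VS_18537))  # -9
    print("G protein residue difference:", codon_difference(DIFF_2B_VS_18537["G"]))  # -2
```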
Detailed Description Of The Invention
Transcription and replication of negative-sense, single stranded RNA viral genomes are achieved through the enzymatic activity of a multimeric protein acting on the ribonucleoprotein core (nucleocapsid).
Naked genomic RNA cannot serve as a template. Instead, these genomic sequences are recognized only when they are entirely encapsidated by the N protein into the nucleocapsid structure. It is only in that context that the genomic and antigenomic terminal promoter sequences are recognized to initiate the transcriptional or replication pathways.
All paramyxoviruses require the two viral proteins, L and P, for these polymerase pathways to proceed. The pneumoviruses, including RSV, also require the transcription elongation factor, M2, for the transcriptional pathway to proceed efficiently.
Additional cofactors may also play a role, including perhaps the virus-encoded NS1 and NS2 proteins, as well as perhaps host-cell encoded proteins.
However, considerable evidence indicates that it is the L protein that performs most, if not all, of the enzymatic processes associated with transcription and replication, including initiation and termination of ribonucleotide polymerization, capping and polyadenylation of mRNA transcripts, methylation and perhaps specific phosphorylation of P proteins. The L protein's central role in genomic transcription and replication is supported by its large size, its sensitivity to mutations, and its catalytic level of abundance in the transcriptionally active viral complex (16).
These considerations led to the proposal that L proteins consist of a linear array of domains whose concatenated structure integrates discrete functions. Indeed, three such delimited, discrete elements within the negative-sense virus L protein have been identified based on their relatedness to defined functional domains of other well-characterized proteins. These include: a putative RNA template recognition and/or phosphodiester bond formation domain; an RNA binding element; and an ATP binding domain. All prior studies of L proteins of nonsegmented negative-sense, single stranded RNA viruses have revealed these putative functional elements (17).
Without being bound by the following, it is reasonable to presume that these non-protein coding, promoter and other cis-acting genomic regulatory domains are important determinants of the efficiency with which transcription and replication by measles virus (MV) and other viruses of the Order Mononegavirales are carried out, in association with the L protein, and that they may therefore be virulence determinants for these viruses as well.
In summary, the invention is believed to encompass a coordinate set of changes between the cis-acting regulatory signal (i.e., the 3' genomic promoter region) and the polymerase gene that results in attenuation of the virus while retaining sufficient ability of the virus to replicate. Attenuation is optimized by rational mutations of the 3' genomic promoter region and the polymerase gene, which provide the desired balance of replication efficiency, so that the virus vaccine is no longer able to produce disease, yet retains its capacity to infect the vaccinee's cells, to express sufficiently abundant gene products to elicit the full spectrum and profile of desirable immune responses, and to reproduce and disseminate sufficiently to maximize the abundance of the immune response elicited.
Without being bound by the following, attenuating mutations in the extended promoter (3' genomic promoter region) and in the polymerase gene are believed to affect the display of cis-acting signals and the conformation of the polymerase complex engaging these signals. For example, when encapsidated, the promoter RNA is coiled in a helical array. Changes in promoter sequence may affect the positions at which the conserved signals are displayed relative to one another. Specifically, the measles wild-type 3' genomic promoter region has a pyrimidine (uracil) at positions 26 and 42 (the antigenomic message sense sequences have the purine adenine). The vaccine strains have purines at those positions (the antigenomic message sense sequences have the corresponding pyrimidines; see Table 3 in Example 1 below). The larger purines may change the distance and/or angular display between the conserved domains of the promoter (e.g., in measles, positions 1-11 and 87-98), resulting in an altered spatial presentation of the cis-acting signals to the polymerase.
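The switch between the two sense conventions is a base-pairing step: because the antigenome is the exact complement of the genome, antigenome nucleotide n (counted from its 5' end) pairs with genome nucleotide n (counted from its 3' end). The sketch below, with hypothetical helper names, shows why an antigenomic A → pyrimidine change at positions 26 and 42 places a purine in the genomic promoter.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}
# Watson-Crick partner in RNA for a base written in either DNA or RNA letters.
RNA_PARTNER = {"A": "U", "T": "A", "U": "A", "G": "C", "C": "G"}


def genomic_base(antigenomic_base):
    """Genome (negative-sense) base paired with an antigenome base.

    Because the antigenome is the full complement of the genome, antigenome
    nucleotide n (from its 5' end) pairs with genome nucleotide n (from its 3' end).
    """
    return RNA_PARTNER[antigenomic_base.upper()]


def base_class(base):
    b = base.upper()
    if b in PURINES:
        return "purine"
    if b in PYRIMIDINES:
        return "pyrimidine"
    raise ValueError(f"unknown base: {base!r}")


if __name__ == "__main__":
    # Antigenomic (message-sense) bases: wild-type A; vaccine-type pyrimidine T or C.
    for label, anti in (("wild-type", "A"), ("vaccine-type", "T"), ("vaccine-type", "C")):
        g = genomic_base(anti)
        print(f"{label}: antigenome {anti} -> genome {g} ({base_class(g)})")
```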
Animal studies have demonstrated a decrease in viral replication sufficient to avoid illness but adequate to elicit the desired immune response. This likely represents a decrease in transcription, a decrease in the expression of virally encoded proteins, a decrease in antisense templates and, therefore, the production of fewer new genomes. The resulting attenuated viruses are significantly less virulent than the wild-type virus.
The attenuating mutations described herein may be introduced into viral strains by two methods. (1) Conventional means, such as chemical mutagenesis during virus growth in cell cultures to which a chemical mutagen has been added, selection of virus that has been subjected to passage at suboptimal temperature in order to select temperature sensitive and/or cold adapted mutations, identification of mutant virus that produces small plaques in cell culture, and passage through heterologous hosts to select for host range mutations. These viruses are then screened for attenuation of their biological activity in an animal model. Attenuated viruses are subjected to nucleotide sequencing of their 3' genomic promoter region and polymerase genes to locate the sites of attenuating mutations. Once this has been done, method (2) is then carried out.
(2) A preferred means of introducing attenuating mutations comprises making predetermined mutations using site-directed mutagenesis. These mutations are identified either by method (1) or by reference to closely related viruses whose attenuating mutations are already known. One or more mutations are introduced into each of the 3' genomic promoter region and the polymerase gene. Cumulative effects of different combinations of coding and non-coding changes can also be assessed.
The mutations to the 3' genomic promoter region and polymerase gene are introduced by standard recombinant DNA methods into a DNA copy of the viral genome. This may be a wild-type or a modified viral genome background (such as viruses modified by method (1)), thereby generating a new virus. Infectious clones or particles containing these attenuating mutations are generated using the cDNA "rescue" system, which has been applied to a variety of viruses, including Sendai virus; measles virus (19); respiratory syncytial virus; rabies (21); vesicular stomatitis virus (VSV); and rinderpest virus; these references are hereby incorporated by reference. See, for measles virus rescue, published International patent application WO 97/06270, designating the United States; for PIV-3 rescue, U.S. provisional patent application 60/047575; and for RSV rescue, published International patent application WO 97/12032, designating the United States; these applications are hereby incorporated by reference.
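At the sequence level, site-directed mutagenesis amounts to placing predetermined substitutions at defined positions in the cDNA copy. The sketch below, with hypothetical function names and a toy open reading frame, applies 1-based substitutions with a reference-base check and confirms the resulting codon change (here an isoleucine → threonine codon change of the kind listed for the polymerase gene). Only a handful of codons are included in the lookup table; a real workflow would use the full standard genetic code.

```python
# Small subset of the standard genetic code, enough for the toy example below.
CODON_SUBSET = {"ATG": "Met", "ATT": "Ile", "ACT": "Thr", "GCT": "Ala"}


def apply_substitutions(seq, substitutions):
    """Apply predetermined point substitutions to a DNA sequence.

    substitutions: iterable of (position, ref, alt) with 1-based positions.
    The reference-base check catches numbering mistakes, for example mixing
    genome-sense and antigenome-sense coordinates.
    """
    bases = list(seq.upper())
    for pos, ref, alt in substitutions:
        if not 1 <= pos <= len(bases):
            raise IndexError(f"position {pos} outside sequence of length {len(bases)}")
        if bases[pos - 1] != ref.upper():
            raise ValueError(f"expected {ref} at {pos}, found {bases[pos - 1]}")
        bases[pos - 1] = alt.upper()
    return "".join(bases)


def residue_at(orf, aa_position):
    """Translate the codon for a 1-based residue position (subset table only)."""
    codon = orf[3 * (aa_position - 1): 3 * aa_position]
    return CODON_SUBSET[codon]


if __name__ == "__main__":
    orf = "ATGATTGCT"  # toy ORF: Met-Ile-Ala
    mutant = apply_substitutions(orf, [(5, "T", "C")])
    print(residue_at(orf, 2), "->", residue_at(mutant, 2))  # Ile -> Thr
```

The reference check is the main design choice: a substitution is refused unless the expected base is found, which makes coordinate or sense-convention errors fail loudly instead of silently producing the wrong construct.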
Briefly, all Mononegavirales rescue systems can be summarized as follows. Each requires a cloned DNA equivalent of the entire viral genome placed between a suitable DNA-dependent RNA polymerase promoter (e.g., the T7 RNA polymerase promoter) and a self-cleaving ribozyme sequence (e.g., the hepatitis delta ribozyme), inserted into a propagatable bacterial plasmid. This transcription vector provides the readily manipulable DNA template from which the RNA polymerase (e.g., T7 RNA polymerase) can faithfully transcribe a single-stranded RNA copy of the viral antigenome (or genome) with the precise, or nearly precise, 5' and 3' termini. The orientation of the viral genomic DNA copy and the flanking promoter and ribozyme sequences determines whether antigenome or genome RNA equivalents are transcribed. Also required for rescue of new virus progeny are the virus-specific trans-acting proteins needed to encapsidate the naked, single-stranded viral antigenome or genome RNA transcripts into functional nucleocapsid templates: the viral nucleocapsid (N or NP) protein, the polymerase-associated phosphoprotein (P) and the polymerase (L) protein. These proteins comprise the active viral RNA-dependent RNA polymerase, which must engage this nucleocapsid template to achieve transcription and replication.
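The role of the flanking elements can be sketched as string operations. This is a simplification under stated assumptions: T7 RNA polymerase is taken to start transcription at the base immediately after the 17-nucleotide core TAATACGACTCACTATA, typically on one or more G residues that end up as extra 5' nucleotides (one reason the 5' terminus may be only "nearly precise"), and the ribozyme is modeled as cutting exactly at its own 5' boundary, giving a precise 3' terminus. The ribozyme is a placeholder token, the viral sequence is a toy fragment, and the function names are hypothetical.

```python
T7_CORE = "TAATACGACTCACTATA"   # T7 promoter core; transcription starts at the next base (+1)
RIBOZYME = "<HDV-ribozyme>"     # placeholder token, not the real ribozyme sequence
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    return "".join(COMPLEMENT[b] for b in reversed(dna.upper()))


def build_cassette(antigenome_cdna, genome_sense=False, extra_5p_gs="GG"):
    """Assemble promoter + viral cDNA + ribozyme.

    With genome_sense=False the insert is oriented to yield antigenome RNA;
    flipping it (reverse complement) yields genome-sense RNA instead.
    extra_5p_gs models the G residues T7 prefers at +1, which become extra
    5' nucleotides on the transcript.
    """
    viral = reverse_complement(antigenome_cdna) if genome_sense else antigenome_cdna.upper()
    return T7_CORE + extra_5p_gs + viral + RIBOZYME


def runoff_transcript(cassette):
    """RNA from the +1 position up to the ribozyme, which is modeled as
    cleaving exactly at its own 5' boundary (giving a precise 3' end)."""
    start = cassette.index(T7_CORE) + len(T7_CORE)
    end = cassette.index(RIBOZYME)
    return cassette[start:end].replace("T", "U")


if __name__ == "__main__":
    toy = "ACCAAACAAAG"  # short toy fragment standing in for a full-length cDNA
    print(runoff_transcript(build_cassette(toy)))                     # GGACCAAACAAAG
    print(runoff_transcript(build_cassette(toy, genome_sense=True)))  # GGCUUUGUUUGGU
```

Flipping `genome_sense` changes only the orientation of the insert, which is the point made in the text: the same promoter and ribozyme yield antigenome-sense or genome-sense RNA depending on how the viral cDNA is placed between them.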
The trans-acting proteins required for measles virus rescue are the encapsidating protein N and the polymerase complex proteins, P and L. For PIV-3, the encapsidating protein is designated NP, and the polymerase complex proteins are also referred to as P and L. For RSV, the virus-specific trans-acting proteins include N, P and L, plus an additional protein, M2, the RSV-encoded transcription elongation factor.
Typically, these viral trans-acting proteins are generated from one or more plasmid expression vectors encoding the required proteins, although some Sor all of the required trans-acting proteins may be produced within mammalian cells engineered to contain and express these virus-specific genes and gene o*e: products as stable transformants.
The typical (although not necessarily exclusive) circumstances for rescue include an appropriate mammalian cell milieu in which T7 polymerase is present to drive transcription of the antigenomic (or genomic) single-stranded RNA from the viral genomic cDNA-containing transcription vector.
Either cotranscriptionally or shortly thereafter, this viral antigenome (or genome) RNA transcript is encapsidated into functional templates by the nucleocapsid protein and engaged by the required polymerase components produced concurrently from co-transfected expression plasmids encoding the required virus-specific trans-acting proteins. These events and processes lead to the prerequisite transcription of viral mRNAs, the replication and amplification of new genomes and, thereby, the production of novel viral progeny, i.e., rescue.
For the rescue of rabies virus, VSV and Sendai virus, T7 polymerase is provided by the recombinant vaccinia virus VTF7-3. This system, however, requires that the rescued virus be separated from the vaccinia virus by physical or biochemical means, or by repeated passaging in cells or tissues that are not good hosts for poxviruses. For MV cDNA rescue, this requirement is avoided by creating a cell line that expresses T7 polymerase as well as the viral N and P proteins; rescue is achieved by transfecting the genome expression vector and the L gene expression vector into this helper cell line. The host-range vaccinia virus mutant MVA-T7, which expresses T7 RNA polymerase but does not replicate in mammalian cells, has also been exploited to rescue RSV, rinderpest virus and MV.
After simultaneous expression of the necessary encapsidating proteins, synthetic full-length antigenomic viral RNAs are encapsidated, replicated and transcribed by the viral polymerase proteins, and the replicated genomes are packaged into infectious virions. In addition to such antigenomes, genome analogs have now been successfully rescued for Sendai virus and PIV-3 (25,27).
The rescue system thus provides a composition comprising (i) a transcription vector comprising an isolated nucleic acid molecule encoding a genome or antigenome of a nonsegmented, negative-sense, single-stranded RNA virus of the Order Mononegavirales having at least one attenuating mutation in the 3' genomic promoter region and at least one attenuating mutation in the RNA polymerase gene, together with (ii) at least one expression vector comprising at least one isolated nucleic acid molecule encoding the trans-acting proteins necessary for encapsidation, transcription and replication (e.g., N, P and L for measles virus; NP, P and L for PIV-3; N, P, L and M2 for RSV). Host cells are then transformed or transfected with these vectors and cultured under conditions that permit their co-expression, so as to produce the infectious attenuated virus.
The rescued infectious virus is then tested, first by in vitro means, for its desired phenotype (temperature sensitivity, cold adaptation, plaque morphology, and transcription and replication attenuation).
The mutations at the cis-acting 3' genomic promoter region are also tested using the minireplicon system where the required trans-acting encapsidation and polymerase activities are provided by wild-type or vaccine helper viruses, or by plasmids expressing the N, P and different L genes harboring gene-specific attenuating mutations (19,28).
If the rescued virus displays the attenuated phenotype, challenge experiments are conducted in an appropriate animal model. Non-human primates provide the preferred animal model for the pathogenesis of human disease. These primates are first immunized with the attenuated, recombinantly-generated virus and then challenged with the wild-type form of the virus.
Monkeys are infected by various routes, including but not limited to intranasal, intratracheal or subcutaneous routes of inoculation (29).
Experimentally infected rhesus and cynomolgus macaques have also served as animal models for studies of vaccine-induced protection against measles. Protection is measured by such criteria as disease signs and symptoms, survival, virus shedding and antibody titers. If the desired criteria are met, the attenuated, recombinantly-generated virus is considered a viable vaccine candidate for testing in humans. The "rescued" virus is considered to be "recombinantly-generated", as are its progeny and later generations, which also incorporate the attenuating mutations.
Even if a "rescued" virus is underattenuated or overattenuated relative to optimum levels for vaccine use, the information is valuable for developing such optimum strains.
Optimally, a codon containing an attenuating point mutation may be stabilized by introducing a second, or a second plus a third, mutation in the codon without changing the amino acid encoded by the codon bearing only the attenuating point mutation.
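The stabilization idea can be made concrete with a small search over the standard genetic code. The sketch below is illustrative only: it assumes that "stabilized" means every codon for the wild-type residue is at least two substitutions away, and it uses the aspartic acid to alanine change at L protein position 1717 purely as an example, since the codons actually used are not stated here. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code (DNA alphabet); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def stabilized_codons(wild_aa: str, attenuating_aa: str, min_distance: int = 2) -> list[str]:
    """Codons for attenuating_aa lying >= min_distance substitutions from
    every codon that encodes wild_aa, so that a single point reversion
    cannot restore the wild-type residue."""
    wild_codons = [c for c, aa in CODON_TABLE.items() if aa == wild_aa]
    return sorted(
        c for c, aa in CODON_TABLE.items()
        if aa == attenuating_aa
        and min(hamming(c, w) for w in wild_codons) >= min_distance
    )


# Example: Asp -> Ala, the kind of change described at L protein position 1717.
print(stabilized_codons("D", "A"))  # ['GCA', 'GCG']
```

For instance, if an attenuating change were GAT→GCT, a further silent change to GCA would leave both aspartic acid codons (GAT, GAC) two substitutions away. Raising `min_distance` to 3 models the "second plus a third" mutation case, which is only achievable for some amino acid pairs.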
Infectious virus clones containing the attenuating and stabilizing mutations are also generated using the cDNA "rescue" system described above.
Measles virus serves as a useful model for this invention because sequence data are now available, as described herein, for the disease-causing wild-type virus and for the disease-preventing vaccines, which have a demonstrated history of efficacy.
Measles virus was first isolated in tissue culture in 1954 (31) from an infected patient named David Edmonston. This Edmonston strain became the progenitor of many live-attenuated measles vaccines, including Moraten (Attenuvax™; Merck Sharp & Dohme, West Point, PA), the current vaccine in the United States, which was licensed in 1968 and has proven efficacious.
Aggressive immunization programs instituted in the mid to late 1960s resulted in a precipitous drop in reported measles cases, from nearly 700,000 in 1965 to 1,500 in 1983. In parallel, other vaccine strains were also developed from the Edmonston strain (see Fig. 1): Schwarz (Institut Merieux, Lyon, France), Zagreb (Zagreb, Yugoslavia) and AIK-C (Japan). These other vaccines have also proven efficacious and have been used extensively. An early, reactogenic, underattenuated vaccine strain (Rubeovax™; Merck Sharp & Dohme) produced measles-like illness in children, and its use was therefore discontinued. It was, however, successfully attenuated further to produce the Moraten vaccine strain (see Fig. 1). Live measles virus vaccine is thus a success story in the development of an efficacious vaccine and provides a model for understanding the molecular mechanisms of viral vaccine attenuation among nonsegmented, negative-sense, single-stranded RNA viruses.
Because of its significance as a major cause of human morbidity and mortality, measles virus (MV) has been studied extensively. MV is a large, relatively spherical, enveloped particle composed of two compartments, a lipoprotein membrane and a ribonucleoprotein particle core, each having distinct biological functions. The virion envelope is a host cell-derived plasma membrane modified by three virus-specified proteins. The hemagglutinin (H; approximately 80 kilodaltons) and fusion (F1,2; approximately 60 kD) glycoproteins project from the virion surface and confer host cell attachment and entry capacities on the viral particle (16).
Antibodies to H and/or F are considered protective, since they neutralize the virus' ability to initiate infection (34,35,36). The matrix (M; approximately 37 kD) protein is the amphipathic protein lining the membrane's inner surface and is thought to orchestrate virion morphogenesis and thus consummate virus reproduction. The virion core contains the 15,894-nucleotide genomic RNA, upon which template activity is conferred by its intimate association with approximately 2,600 molecules of the approximately 60 kD nucleocapsid (N) protein (38,39,40). Loosely associated with this approximately one-micron-long helical ribonucleoprotein particle are enzymatic levels of the viral RNA-dependent RNA polymerase (L; approximately 240 kD), which, in concert with the polymerase cofactor (P; approximately 70 kD) and perhaps yet other virus-specified as well as host-encoded proteins, transcribes and replicates the MV genome sequences (41).
To date, the entire nucleotide sequences (only for the Edmonston B laboratory strain and the AIK-C vaccine strain), coding potential, and organization of the MV genome have been reported (33).
The six virion structural proteins are encoded by six contiguous, non-overlapping genes arrayed in the order 3'-N-P-M-F-H-L-5'. Two additional MV gene products of as yet uncertain function have also been identified. These two nonstructural proteins, known as C (approximately 20 kD) and V (approximately 45 kD), are both encoded by the P gene: the former by a second reading frame within the P mRNA, and the latter by a cotranscriptionally edited P gene-derived mRNA, which encodes a hybrid protein having the amino-terminal sequences of P and a new zinc finger-like, cysteine-rich carboxy-terminal domain (16).
In addition to the sequences encoding the virus-specified proteins, the MV genome contains distinctive non-protein coding domains resembling those directing the transcriptional and replicative pathways of related viruses (16,42). These regulatory signals lie at the 3' and 5' ends of the MV genome and in short internal regions spanning each intercistronic boundary.
The former encode the putative promoter and/or regulatory sequence elements directing genomic transcription, genome and antigenome encapsidation, and replication. The latter signal transcription termination and polyadenylation of each monocistronic viral mRNA and then reinitiation of transcription of the next gene. In general, the MV polymerase complex appears to respond to these signals much as do the RNA-dependent RNA polymerases of other nonsegmented negative-strand RNA viruses (16,42,43,44).
Transcription initiates at or near the 3' end of the MV genome and then proceeds in a 5' direction, producing monocistronic mRNAs (40,42,45). As the polymerase traverses the MV genomic template, it encounters putative stop/start signals which, in 3' to 5' order, are: a semi-conserved transcription termination/polyadenylation signal (A/G U/C U A A/U N N A4, where N may be any of the four bases) at which each monocistronic RNA is completed; a non-transcribed intergenic trinucleotide punctuation mark (CUU, except at the H:L boundary, where it is CGU); and a semi-conserved start signal for transcription initiation of the next gene (AGG A/G N N C/A A A/G G A/U, where N may be any of the four bases) (45,46). Since some polymerase complexes fail to reinitiate, the abundance of each MV mRNA diminishes with the distance of the encoding gene from the genomic 3' end.
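The stop/intergenic/start layout described above can be expressed as a regular expression over a gene junction. This is a sketch under assumptions: it treats the quoted consensus strings as written in message (antigenome) sense, 5'→3', reads "A4" as four A residues, and uses a toy sequence constructed to satisfy the consensus rather than a real MV junction. The names `GENE_END`, `GENE_START` and `find_junctions` are hypothetical.

```python
import re

# Consensus signals transcribed from the text, assumed here to be written in
# message (antigenome) sense, 5'->3', RNA alphabet. N = any base.
GENE_END = r"[AG][UC]UA[AU][ACGU]{2}A{4}"      # A/G U/C U A A/U N N A4
INTERGENIC = r"(?:CUU|CGU)"                      # CGU only at the H:L boundary
GENE_START = r"AGG[AG][ACGU]{2}[CA]A[AG]G[AU]"   # AGG A/G N N C/A A A/G G A/U
JUNCTION = re.compile(f"({GENE_END})({INTERGENIC})({GENE_START})")


def find_junctions(seq: str) -> list[tuple[int, str, str, str]]:
    """Return (start, gene_end, intergenic, gene_start) for each match."""
    return [(m.start(), m.group(1), m.group(2), m.group(3))
            for m in JUNCTION.finditer(seq)]


# Toy sequence built to satisfy the consensus (not a real MV junction).
toy = "GGG" + "AUUAAGCAAAA" + "CUU" + "AGGAUUCAAGU" + "CCC"
print(find_junctions(toy))  # [(3, 'AUUAAGCAAAA', 'CUU', 'AGGAUUCAAGU')]
```

Because the signals are only semi-conserved, a pattern like this would flag candidate junctions for inspection rather than define them.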
The 3'-to-5' gradient in MV mRNA abundance corresponds directly to the relative abundance of each virus-specified protein, indicating that MV protein expression is ultimately controlled at the transcriptional level (44).
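The polar gradient can be illustrated with a deliberately simple model. The sketch assumes, hypothetically, a single constant probability that the polymerase reinitiates at each gene junction; the text gives no junction-specific values, and the value 0.7 below is arbitrary. The MV gene order 3'-N-P-M-F-H-L-5' is used.

```python
# Simplified model of polar transcription: each junction is crossed with the
# same (hypothetical) reinitiation probability, so gene k (0-based from the
# genomic 3' end) is transcribed at reinit_prob ** k relative to N.
MV_GENE_ORDER = ["N", "P", "M", "F", "H", "L"]  # 3' -> 5' on the genome


def relative_mrna_levels(reinit_prob: float, genes=MV_GENE_ORDER) -> dict[str, float]:
    """Relative mRNA level per gene, normalized so that the first gene is 1.0."""
    if not 0.0 < reinit_prob <= 1.0:
        raise ValueError("reinit_prob must be in (0, 1]")
    return {gene: reinit_prob ** k for k, gene in enumerate(genes)}


levels = relative_mrna_levels(0.7)  # 0.7 is an arbitrary illustrative value
for gene, level in levels.items():
    print(f"{gene}: {level:.3f}")
```

With any probability below 1, the levels fall monotonically from N to L, which is the qualitative pattern the text attributes to failed reinitiation; real attenuation may differ from junction to junction.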
The 3' and 5' MV genomic termini contain non-protein coding sequences with distinct parallels to the leader and trailer RNA encoding regions of VSV. Nucleotides 1-55 define the region between the genomic 3' terminus and the beginning of the N gene, while 37 additional nucleotides are found between the end of the L gene and the 5' terminus of the genome. However, unlike VSV, or even the paramyxoviruses Sendai virus and NDV, MV does not transcribe these terminal regions into short, unmodified, positive-sense leader RNAs (47,48,49). Instead, leader readthrough transcripts, including full-length polyadenylated leader:N, leader:N:P and leader:N:P:M RNAs, and of course full-length antigenome MV RNAs, are transcribed (48,49). Thus, the short leader transcript, the key operational element determining the switch from transcription to replication of the VSV single-stranded, negative-polarity genome (50,51,52), seems to be absent in MV. This has prompted consideration and exploration of alternative models for this crucial reproductive event (42).
Measles virus, like all other Mononegavirales except the rhabdoviruses, appears to have extended its terminal regulatory domains beyond the confines of the leader and trailer encoding sequences. For measles, these regions encompass the 107 3' genomic nucleotides (the "genomic promoter region", also referred to as the "extended promoter", which comprises 52 nucleotides encoding the leader region, followed by three intergenic nucleotides and 52 nucleotides encoding the 5' untranslated region of N mRNA) and the 109 5'-end nucleotides (69 encoding the 3' untranslated region of L mRNA, the intergenic trinucleotide and 37 nucleotides encoding the trailer).
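The segment lengths above can be laid out as genome coordinates. This sketch assumes numbering in antigenomic (message) sense, so that position 1 is the genomic 3' end, matching the convention of the sequence listings; the helper name is hypothetical.

```python
# Segment lengths (nt) for the MV genome ends as given in the text; positions are
# numbered in antigenomic (message) sense, so position 1 is the genomic 3' end.
GENOME_LENGTH = 15_894
PROMOTER_3 = [("leader", 52), ("intergenic", 3), ("N mRNA 5' UTR", 52)]
END_5 = [("L mRNA 3' UTR", 69), ("intergenic", 3), ("trailer", 37)]


def segment_coordinates(segments, start=1):
    """Lay segments end to end and return (name, first, last) positions."""
    coords, pos = [], start
    for name, length in segments:
        coords.append((name, pos, pos + length - 1))
        pos += length
    return coords


five_prime_start = GENOME_LENGTH - sum(n for _, n in END_5) + 1
for row in segment_coordinates(PROMOTER_3) + segment_coordinates(END_5, five_prime_start):
    print(row)
```

Under this numbering the promoter region spans positions 1-107, and the 109-nucleotide 5' region begins at 15,786. That is immediately after the L translation stop codon at nucleotides 15,783-15,785 cited later in this section, which serves as a simple consistency check on the figures.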
Within these 3'-terminal approximately 100 nucleotides of both the genome and the antigenome are two short regions of shared nucleotide sequence. First, 14 of the 16 nucleotides at the absolute 3' ends of the genome and antigenome are identical. Second, internal to those termini, an additional region of 12 nucleotides of absolute sequence identity has been located. Their position at and near the sites at which transcription of the MV genome must initiate and replication of the antigenome must begin suggests that these short, unique sequence domains encompass an extended promoter region.
These discrete sequence elements may dictate alternative sites of transcription initiation (e.g., the internal domain mandating transcription initiation at the N gene start site, and the 3'-terminal domain directing antigenome production) (42,48,53). In addition to their regulatory role as cis-acting determinants of transcription and replication, these 3' extended genomic and antigenomic promoter regions encode the nascent 5' ends of antigenome and genome RNAs, respectively. Within these nascent RNAs reside as yet unidentified signals for N protein nucleation, another key regulatory element required for nucleocapsid template formation and consequently for amplification of transcription and replication. Figure 2 schematically shows the location and sequence of these highly conserved, putative cis-acting regulatory domains.
Terminal non-protein coding regions similar in location, size and spacing are present in the genomes of other members of the family Paramyxoviridae, though only 8-11 of their absolute terminal nucleotides are shared with MV (42,54). The genomic termini of the morbillivirus canine distemper virus (CDV) display a greater degree of homology with its MV relative: 73% of the nucleotides of the leader and trailer sequences of these two viruses are identical, including 16 of 18 at the absolute 3' termini and 17 of 18 at their 5' ends. No accessory internal CDV genomic domain sharing homology with that of the MV extended promoter has been found. However, there are two 20-nucleotide stretches, between CDV genomic nucleotides 85 and 104 and between nucleotides 15,587 and 15,606, in which 15 of the nucleotides are complementary (GenBank accession number AF 14953). This indicates that CDV, like MV, contains an additional region within its non-coding 3' genomic and antigenomic ends that may provide important cis-acting promoter and/or regulatory signals. Additionally, the precise length of the 3' leader region (55 nucleotides) is identical among several members of the family Paramyxoviridae (MV, CDV, PIV-3, BPV-3, SV and NDV). Further evidence for the importance of these extended, non-protein coding regions comes from analyses of a large number of distinct copy-back defective interfering viruses (DIs) recently cloned from subacute sclerosing panencephalitis (SSPE) brain tissue. No DI with a stem shorter than the 95 5'-terminal genomic nucleotides was found, indicating that the minimal signals needed for MV DI RNA replication and encapsidation extend well beyond the 37-nucleotide trailer sequence to encompass the additional internal putative regulatory domain (56).
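The two kinds of comparison used above, sequence identity between aligned termini and complementarity between two stretches of the same genome, can be sketched as follows. The sketch assumes ungapped alignments and counts only Watson-Crick pairs; the toy 4-nucleotide stretches are hypothetical and are not CDV sequence.

```python
# Watson-Crick pairs only (G-U wobble pairs are deliberately not counted).
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def complementary_positions(s1: str, s2: str) -> int:
    """Count paired positions when s1 and s2, both written 5'->3' from the same
    strand, are aligned antiparallel (s1 against s2 read backwards)."""
    if len(s1) != len(s2):
        raise ValueError("stretches must have equal length")
    return sum((a, b) in WC_PAIRS for a, b in zip(s1, reversed(s2)))


def percent_identity(s1: str, s2: str) -> float:
    """Percentage of aligned positions carrying the same base (no gaps)."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(a == b for a, b in zip(s1, s2)) / len(s1)


# Toy 4-nt stretches (hypothetical, not CDV sequence).
print(complementary_positions("AUGC", "GCAA"))  # 3
print(percent_identity("AUGC", "AUGA"))         # 75.0
```

Applied to the two 20-nucleotide CDV stretches, a count of 15 corresponds to 75% complementarity. The 73% leader/trailer identity figure is the kind of value `percent_identity` returns for an ungapped alignment.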
As exemplified in part by measles virus, this invention is directed to the concept that important virulence/attenuation determinants reside in viral genomic non-protein coding regulatory regions and in the trans-acting transcription/replication enzyme complex with which these cis-acting elements must interact. The cis-acting domains are found both at the 3' and 5' ends of the MV genome, flanking the six contiguous genes encoding viral structural proteins, and within the MV genome as short regions encompassing internal intergenic boundaries. The former encode the putative promoter and/or regulatory sequence elements directing the vital processes of genomic transcription, genome and antigenome encapsidation, and replication. The latter signal transcription termination and polyadenylation of each monocistronic viral mRNA and then reinitiation of transcription of the next gene. The transcription/replication enzyme, the RNA-dependent RNA polymerase, can modulate transcriptional and/or replicative efficiency, thereby determining the abundance of cytopathic viral gene products and/or virion progeny.
Proof of the concept of this invention for measles virus is obtained by first determining the nucleotide sequences of the non-coding regulatory regions (i.e., the genomic promoter region) and the coding region of the L gene (with predicted amino acid sequences) of the progenitor Edmonston wild-type MV isolate, together with those of available measles vaccine strains derived from this isolate (see Figure 1). Other, independent wild-type isolates were also examined for comparative purposes.
The nucleotide sequences (in positive-strand, antigenomic, message sense) of four wild-type and five vaccine measles strains, as well as the deduced amino acid sequences of the RNA polymerase (L protein) of these measles viruses, are set forth as follows with reference to the appropriate SEQ ID NOS. contained herein:

Virus                 Nucleotide Sequence    L Protein Sequence
Wild-Type
  Edmonston           SEQ ID NO:1            SEQ ID NO:2
  1977                SEQ ID NO:3            SEQ ID NO:4
  1983                SEQ ID NO:5            SEQ ID NO:6
  Montefiore          SEQ ID NO:7            SEQ ID NO:8
Vaccine
  Rubeovax™           SEQ ID NO:9            SEQ ID NO:10
  Moraten             SEQ ID NO:11           SEQ ID NO:12
  Zagreb              SEQ ID NO:13           SEQ ID NO:14
  AIK-C               SEQ ID NO:15           SEQ ID NO:16

Each measles virus genome listed above is 15,894 nucleotides in length. Translation of the L gene starts with the codon at nucleotides 9234-9236; the translation stop codon is at nucleotides 15783-15785. The translated L protein is 2,183 amino acids long.
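The stated protein length follows from the codon coordinates by simple arithmetic. The sketch below assumes that the coordinates run from the first base of the start codon to the last base of the stop codon, inclusive; the function name is hypothetical. The same check applies to the HPIV-3 L gene coordinates given later in this section.

```python
def protein_length(start_first_nt: int, stop_last_nt: int) -> int:
    """Amino acids encoded by an ORF whose coordinates run from the first base
    of the start codon to the last base of the stop codon (inclusive)."""
    span = stop_last_nt - start_first_nt + 1
    if span % 3:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3 - 1  # the final codon is the stop codon


print(protein_length(9234, 15785))  # MV L gene: 2183
print(protein_length(8646, 15347))  # HPIV-3 L gene: 2233
```

The span 9234-15785 is 6,552 nucleotides, or 2,184 codons, one of which is the stop codon, giving 2,183 amino acids.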
Note that nucleotide 2499 of the 1983 wild-type measles virus is shown as a single base in SEQ ID NO:5; in fact, the base at this position is a mixture of two nucleotides. Similarly, nucleotide 2143 of the Rubeovax™ vaccine virus is shown as a single base in SEQ ID NO:9; of nine clones sequenced, seven carried one base at this position and two carried another, so either base can occur there. In addition, the Schwarz vaccine virus genome is identical to that of the Moraten vaccine virus (SEQ ID NO:11) except at nucleotides 4917 and 4924, where Schwarz carries a different base. Nucleotide differences distinguishing the 3' genomic promoter region, and nucleotide and amino acid differences distinguishing the L gene and L protein sequences, of the Edmonston wild-type isolate, the vaccine strains and the other independently isolated wild-type viruses were then compared and aligned (see Tables in Example 1 below).
As shown in Table 3, the 3' genomic promoter region (in antigenomic, message sense) of the derivative vaccine strains differs from that of the progenitor wild-type MV isolate at three positions: at nucleotide 26, from A to T; at nucleotide 42, from A to T or from A to C; and, in the case of Zagreb only, at nucleotide 96, from G to A. In addition, the other examined wild-type isolates differed from both the progenitor wild-type isolate and the vaccine strains at position 50 by carrying an "A". The predicted amino acid sequences of the L genes of the measles vaccine strains (Rubeovax™, Moraten, Schwarz, AIK-C and Zagreb) and of the wild-type isolates (1977, 1983 and Montefiore) differ from that of the progenitor strain (Edmonston) at 49 positions in the 2,183-amino-acid open reading frame (see Tables 4 and 5 in Example 1 below).
These amino acid differences can be divided into four categories:
1) Positions where one vaccine strain differs from the progenitor as well as from the other vaccine and wild-type strains, suggesting a potential attenuation site.
2) Specific differences between all wild-type and all vaccine sequences; these may also constitute important attenuation sites.
3) Residues where chronologically newer wild-types differ from older wild-types, which may be attributable to genetic drift.
4) Positions where one or more vaccine strains and/or wild-type strains share a common amino acid that differs from all the other strains; these changes may represent lineage-specific, potentially attenuating changes within the vaccine strains and relatedness among the wild-type isolates, respectively.
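The four categories can be expressed as a rough decision procedure. The sketch below is a simplification under stated assumptions: one residue per strain at a given L position, Moraten and Schwarz treated as one entry (their genomes differ only at nucleotides 4917 and 4924, outside the L gene), Edmonston as the progenitor, and 1983 and Montefiore as the newer wild-types. The strain keys, helper names, rule order and example residues are illustrative choices, not taken from the Tables.

```python
from typing import Dict, Optional

# Hypothetical strain keys; Moraten and Schwarz share one entry because their
# L genes are identical (their genomes differ only at nucleotides 4917 and 4924).
WILD_TYPES = ["Edmonston", "1977", "1983", "Montefiore"]
VACCINES = ["Rubeovax", "Moraten/Schwarz", "Zagreb", "AIK-C"]
PROGENITOR = "Edmonston"
NEWER_WILD_TYPES = {"1983", "Montefiore"}


def categorize(residues: Dict[str, str]) -> Optional[int]:
    """Assign one L-protein position to category 1-4 (None if invariant).

    `residues` maps every strain key above to its amino acid at the position.
    """
    ref = residues[PROGENITOR]
    differing = {s for s, aa in residues.items() if aa != ref}
    if not differing:
        return None
    wild = {residues[s] for s in WILD_TYPES}
    vacc = {residues[s] for s in VACCINES}
    if len(wild) == 1 and len(vacc) == 1 and wild != vacc:
        return 2  # every wild-type differs from every vaccine
    if len(differing) == 1 and differing <= set(VACCINES):
        return 1  # a single vaccine departs from all other strains
    if differing == NEWER_WILD_TYPES and len({residues[s] for s in differing}) == 1:
        return 3  # newer wild-types share a change: likely drift
    return 4      # any other shared or lineage-specific departure


def position(default: str, changes: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Helper: every strain carries `default` except those in `changes`."""
    table = {s: default for s in WILD_TYPES + VACCINES}
    table.update(changes or {})
    return table


print(categorize(position("D", {v: "A" for v in VACCINES})))            # like 1717: 2
print(categorize(position("T", {"AIK-C": "A"})))                        # like 1624: 1
print(categorize(position("K", {"1983": "R", "Montefiore": "R"})))      # 3
print(categorize(position("A", {"Rubeovax": "T", "Moraten/Schwarz": "T"})))  # 4
```

A change seen only in Montefiore falls through to category 4 here, whereas the text treats such Montefiore-only positions as ambiguous between categories 3 and 4; a fuller version would need that extra case.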
There were four category 1 changes, where one vaccine differed from the other vaccines as well as from the wild-type strains. Two of these were in Moraten and Schwarz (amino acids 331 and 2114) and two were in AIK-C (amino acids 1624 and 2074). These mutations are of special interest because all of these viruses are good vaccines; these positions are therefore considered sites of attenuation.
Only one position, 1717, fits into category 2, with all wild-types having aspartic acid and all vaccines having alanine. Interestingly, this position lies in one of two areas where the L genes of measles and canine distemper virus (which are otherwise highly homologous) do not show exceptional conservation. This difference makes it more likely that 1717 is a key position for an attenuating mutation in measles.
There were five positions, 149, 636, 720, 2017 and 2119, at which both chronologically newer wild-types (1983 and Montefiore) differ from the older wild-types (Edmonston and 1977); these therefore fit into category 3. These differences suggest genetic drift rather than sites of attenuating mutations.
Not included in this total are 16 positions where Montefiore (the 1989 isolate) differed from the rest (see Tables in Example 1); these could reflect either genetic drift (category 3) or random change (category 4). The remaining 23 positions are category 4, with one or more of the viruses differing from the consensus. Three of these positions (1409, 1649 and 1936) are potentially attenuating category 4 mutations: changes where two vaccine strains share a common change from the progenitor wild-type strain. These changes may be connected with the vaccine lineage leading to the Rubeovax™ and Moraten vaccines (Figure 1).
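As a quick bookkeeping check, the per-category counts given in the preceding paragraphs can be summed against the 49 L-protein differences reported earlier. The dictionary labels are descriptive shorthand introduced here, not terms from the Tables.

```python
# Position counts as stated in the text for the 49 L-protein differences.
CATEGORY_COUNTS = {
    "category 1 (single vaccine)": 4,          # 331, 2114, 1624, 2074
    "category 2 (all WT vs all vaccines)": 1,  # 1717
    "category 3 (newer WT drift)": 5,          # 149, 636, 720, 2017, 2119
    "Montefiore only (category 3 or 4)": 16,
    "category 4 (remaining)": 23,
}

total = sum(CATEGORY_COUNTS.values())
print(total)  # 49
assert total == 49, "counts do not add up to the 49 positions reported"
```

The counts reconcile exactly with the 49 positions reported above.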
Applicants have found that their AIK-C vaccine strain nucleotide sequence differs from the published sequence (33) at 21 positions, including one insertion and one deletion. Several of these differences result in coding changes including two in the L gene (at amino acids 1477 and 2008).
Thus, the additional changes accrued within the L gene sequence as the measles progenitor strain is progressively attenuated to achieve a replicative capacity optimized for live vaccine purposes appear to be constrained and delimited. Presumably, this limited tolerance in the number and location of L gene changes is imposed not only by the need to preserve the multifunctional capacities of the polymerase, but also by the preexisting 3' promoter changes with which the evolving L protein must interact to achieve transcription and replication. In other words, optimal virus attenuation requires coordinate (i.e., linked) changes in the polymerase protein and in the cis-acting regulatory elements on which it acts.
The 3' leader displays the least tolerance for change, allowing highly selected changes during the attenuation process at nucleotide position 26 (always from A to T) and at position 42 (from A to T or from A to C) (in antigenomic, message sense). In the case of Zagreb only, there is a single further change, from G to A at position 96, which may be important when combined with Zagreb L gene-specific changes. The 3' leader region seems to have undergone only one instance of genetic drift since 1954, a change to "A" at position 50 (see Table 3).

The net change in the 3' genomic promoter region during the attenuation process is the replacement of two pyrimidines by two purines (in genomic sense) in all MV vaccine strains. The coevolution of the L gene during these attenuation processes is believed to reflect selection of subtle changes favoring reproduction of the viruses in different host cells. All the vaccine strains were grown in chick embryo (CE) or chick embryo fibroblast (CEF) cells during their attenuation (Figure 1). In addition, some vaccine strains were exposed to unique host cells: the Zagreb vaccine was grown in dog kidney cells and human diploid cells, while the AIK-C vaccine was adapted to sheep kidney cells. Moraten and Rubeovax™ were developed exclusively in CE and CEF cells.
Some of the lineage-specific L gene changes (position 1649 in the Rubeovax™, Moraten and Schwarz vaccines, and position 1717 in all vaccines) represent a subset of adaptations of the L gene to the 3' leader that modulate the transcription/replication processes for vaccine attenuation. Additionally, individual vaccine-specific changes (category 1) may provide further fine-tuning of virus replication/transcription for each vaccine strain.
Based on Table 3 and the foregoing discussion, the key attenuating mutations for the MV 3' genomic promoter region are nucleotide 26 (A→T), nucleotide 42 (A→T or A→C) and nucleotide 96 (G→A) (in antigenomic, message sense).
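The statement that the net promoter change is "two pyrimidines replaced by two purines in genomic sense" can be checked by complementing the antigenome-sense substitutions. The sketch works in the DNA alphabet of the sequence listings, where the genomic RNA would carry U in place of T. It takes the first alternative at position 42 as A→T, the reading consistent with that pyrimidine-to-purine statement. Function names are hypothetical.

```python
# Convert antigenome-sense substitutions (DNA alphabet, as in the SEQ IDs) to
# genome sense by complementing both bases, then classify purine/pyrimidine.
# In genomic RNA the complement of A is U; T is used here to match the DNA listing.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}


def to_genomic_sense(wt: str, mut: str) -> tuple[str, str]:
    """Complement an antigenome-sense wt->mut substitution to genome sense."""
    return COMPLEMENT[wt], COMPLEMENT[mut]


def base_class(base: str) -> str:
    return "purine" if base in PURINES else "pyrimidine"


# (position, antigenome-sense wild-type base, vaccine base), per the text.
MV_PROMOTER_CHANGES = [(26, "A", "T"), (42, "A", "T"), (42, "A", "C"), (96, "G", "A")]

for pos, wt, mut in MV_PROMOTER_CHANGES:
    g_wt, g_mut = to_genomic_sense(wt, mut)
    print(f"nt {pos}: genome sense {g_wt}->{g_mut} "
          f"({base_class(g_wt)} -> {base_class(g_mut)})")
```

The shared changes at positions 26 and 42 each become pyrimidine→purine in genome sense, matching the net change described for all vaccine strains. The Zagreb-only change at 96 becomes C→T, pyrimidine to pyrimidine, and so does not enter that count.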
Based on Table 4 and the foregoing discussion, the key attenuating sites for the L protein are amino acid residues 331 (isoleucine→threonine), 1409 (alanine→threonine), 1624 (threonine→alanine), 1649 (arginine→methionine), 1717 (aspartic acid→alanine), 1936 (histidine→tyrosine), 2074 (glutamine→arginine) and 2114 (arginine→lysine). It is understood that the nucleotide changes responsible for these amino acid changes are not limited to those set forth in Table 4 of Example 1 below; all nucleotide changes that result in codons translated into these amino acids are within the scope of this invention.
Human parainfluenza virus type 3 (HPIV-3) is another nonsegmented, negative-sense, single-stranded, enveloped RNA virus. HPIV-3 belongs to the family Paramyxoviridae (see Table 1). The genome of HPIV-3 is 15,462 nucleotides long and encodes six non-overlapping protein-encoding genes. Five of the genes each encode a single virion structural protein, designated NP (corresponding to the N protein of MV), M, F, HN (hemagglutinin-neuraminidase) and L. The sixth mRNA encodes the P protein; it also encodes the C protein from an overlapping 5'-proximal open reading frame (ORF), and the D protein by an RNA editing mechanism.
Like MV, HPIV-3 has a 3' non-protein coding leader region of 55 nucleotides, but, unlike measles (where it is 37 nucleotides), it has a 44-nucleotide 5' trailer region. The polymerase transcribes the genome in a linear, sequential, start-stop manner guided by transcription signals in the RNA template.
Attempts to develop a live attenuated HPIV-3 vaccine by passaging the wild-type JS strain through cell culture at suboptimal temperature have produced promising results. Several "cold passage" (cp) mutants were isolated for evaluation from different passage levels of the JS strain. One such mutant resulted from 45 serial passages and was designated cp45. This virus exhibited three interesting properties: cold adaptation (ca), the ability to replicate efficiently at a suboptimal temperature; temperature sensitivity (ts), the inability to replicate in vitro at temperatures greater than or equal to 39°C; and small plaque morphology. This mutant appeared to be a promising vaccine candidate because its ca, ts and small plaque phenotype is stable after passage in cell culture, its replication is restricted in both the upper and lower respiratory tract of hamsters, and it induced significant protection in hamsters against subsequent challenge with wild-type HPIV-3 (58,59).
Evaluation of this strain in the rhesus monkey showed the attenuating mutations in cp45 to be a combination of ts and non-ts mutations. Subsequent evaluation in chimpanzees indicated that cp45 appeared to be satisfactorily attenuated while still able to induce a high level of protection against wild-type virus challenge. Later preliminary clinical evaluation of cp45 in seronegative human infants and small children suggested that this candidate vaccine strain is suitably infectious and attenuated, as well as moderately immunogenic (61).
The cp45 strain has been grown in both fetal rhesus lung (FRhL) and Vero cells as follows. The PIV-3 cp45 virus grown in FRhL cells was prepared by inoculating confluent FRhL cell monolayers in tissue culture flasks at an MOI of 0.1-1.0. The infected cell cultures were fed with EMEM medium and incubated at 32°C. About seven days later, when maximal cytopathic effects (syncytia) were observed, the virus was harvested by subjecting the cultures to one freeze-thaw cycle, pooling the fluids and then storing the virus.
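For readers reproducing the FRhL step, the usual arithmetic behind an MOI of 0.1-1.0 can be sketched as follows. The cell count and stock titer below are hypothetical. The Poisson estimate of the initially infected fraction is the standard textbook approximation and is not specified in the text.

```python
import math


def inoculum_pfu(cell_count: float, moi: float) -> float:
    """Infectious units (PFU) needed to infect `cell_count` cells at `moi`."""
    return cell_count * moi


def inoculum_volume_ml(cell_count: float, moi: float, titer_pfu_per_ml: float) -> float:
    """Volume of virus stock delivering the required PFU."""
    return inoculum_pfu(cell_count, moi) / titer_pfu_per_ml


def fraction_infected(moi: float) -> float:
    """Poisson estimate of the fraction of cells receiving >= 1 infectious unit."""
    return 1.0 - math.exp(-moi)


# Hypothetical flask: 1e7 cells, stock titer 1e7 PFU/mL.
for moi in (0.1, 1.0):
    print(f"MOI {moi}: {inoculum_volume_ml(1e7, moi, 1e7):.2f} mL, "
          f"~{fraction_infected(moi):.0%} of cells infected initially")
```

At the low end of the range only about a tenth of the cells are infected at the outset, so the roughly seven-day culture period relies on further rounds of spread before maximal syncytia appear.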
The PIV-3 cp45 virus grown in Vero cells was prepared by inoculating a continuously stirred bioreactor culture of confluent Vero cell monolayers on microcarrier beads. The infected bioreactor culture was maintained at 30°C, and the virus was harvested 4-5 days later, when syncytial CPE was observed. The culture fluid containing the virus was stored at -70°C.
The nucleotide sequences (in positive-strand, antigenomic, message sense) of the HPIV-3 JS wild-type strain (89) and of the cp45 vaccine strain grown in FRhL and Vero cells, as well as the deduced amino acid sequences of the RNA polymerase (L protein) of these HPIV-3 viruses, are set forth as follows with reference to the appropriate SEQ ID NOS. contained herein:

Virus                 Nucleotide Sequence    L Protein Sequence
Wild-Type
  JS                  SEQ ID NO:17           SEQ ID NO:18
Vaccine
  FRhL cp45           SEQ ID NO:19           SEQ ID NO:20
  Vero cp45           SEQ ID NO:21           SEQ ID NO:22

Each PIV-3 virus genome listed above is 15,462 nucleotides in length. Translation of the L gene starts with the codon at nucleotides 8646-8648; the translation stop codon is at nucleotides 15345-15347. The translated L protein is 2,233 amino acids long.
As detailed in Example 2 and Table 6 therein, based upon the differences between the wild-type JS strain and the FRhL-grown cp45 mutant vaccine strain, the key attenuating mutations for the HPIV-3 3' genomic promoter region are at nucleotide 23 (T→C), nucleotide 24 (from C), nucleotide 28 (from C) and nucleotide 45 (from T) (in antigenomic, message sense). As also detailed in Example 2 and Table 6, key attenuating sites for the L protein of HPIV-3 include amino acid residues 942 (tyrosine→histidine), 992 (leucine→phenylalanine) and 1558 (threonine→isoleucine). It is understood that the nucleotide changes responsible for these amino acid changes are not limited to those set forth in Example 2 below; all nucleotide changes that result in codons translated into these amino acids are within the scope of this invention.
Human respiratory syncytial virus (RSV) is yet another nonsegmented, negative-sense, single-stranded, enveloped RNA virus. RSV belongs to the subfamily Pneumovirinae and the genus Pneumovirus (see Table 1).
Two major subgroups of human RSV, designated A and B, have been identified based on reactivities of the F and G surface glycoproteins with monoclonal antibodies (62). More recently, the A and B lineages of RSV strains have been confirmed by sequence analysis (63,64). Bovine, ovine and caprine strains of this virus have also been isolated. The host specificity of the virus is clearly associated with the G attachment protein, which is highly divergent between the human and the bovine/ovine strains (65,66), and may be influenced, at least in part, by receptor binding.
RSV is the primary cause of serious viral pneumonia and bronchiolitis in infants and young children. Serious disease, in the form of lower respiratory tract disease (LRD), is most prevalent in infants less than six months of age. It most commonly occurs on the nonimmune infant's first exposure to RSV. RSV is additionally associated with asthma and hyperreactive airways, and it is a significant cause of mortality in "high-risk" children with bronchopulmonary dysplasia and congenital heart disease (CHD). It is also one of the common viral respiratory infections that predispose children to otitis media. In adults, RSV generally presents as uncomplicated upper respiratory illness. In the elderly, however, it rivals influenza as a predisposing factor in the development of serious LRD, particularly bacterial bronchitis and pneumonia. Disease is always confined to the respiratory tract, except in the severely immunocompromised, in whom dissemination to other organs can occur. Virus is spread to others by fomites contaminated with virus-containing respiratory secretions, and infection is initiated through the nasal, oral or conjunctival mucosa.
RSV disease is seasonal, and virus is usually isolated only in the winter months, from November to April in northern latitudes. The virus is ubiquitous, and over 90% of children have been infected at least once by 2 years of age. Multiple strains cocirculate. There is no direct evidence of antigenic drift of the kind seen with influenza A viruses. However, sequence studies demonstrating an accumulation of amino acid changes in the hypervariable regions of the G and SH proteins suggest that immune pressure may drive virus evolution.
In mouse and cotton rat models, both the F and G proteins of RSV elicit neutralizing antibodies, and immunization with these proteins alone provides long-term protection against reinfection (67,68).
In humans, complete immunity to RSV does not develop, and reinfections occur throughout life (69,70). There is evidence, however, that immune factors protect against severe disease. A decrease in the severity of disease is associated with two or more prior infections. There is also evidence that children infected with one of the two major RSV subgroups may be somewhat protected against reinfection with the homologous subgroup. These observations suggest that a live attenuated virus vaccine may provide protection sufficient to prevent serious morbidity and mortality.
Infection with RSV elicits both antibody-mediated and cell-mediated immunity. In some studies, serum neutralizing antibody to the F and G proteins has been associated with protection from LRD, although a reduction in upper respiratory disease (URD) has not been demonstrated.
High levels of serum antibody in infants are associated with protection against LRD. In addition, administration of intravenous immunoglobulin with high RSV-neutralizing antibody titers has been shown to protect high-risk children against severe disease (70,72,73). The role of local immunity, and of nasal antibody in particular, is being investigated.
The RSV virion consists of a ribonucleoprotein core contained within a lipoprotein envelope. The virions of pneumoviruses are similar in size and shape to those of all other paramyxoviruses.
When visualized by negative staining and electron microscopy, virions are irregular in shape and range in diameter from 150 to 300 nm. The nucleocapsid of this virus is a symmetrical helix similar to that of other paramyxoviruses, except that the helical diameter is 12-15 nm rather than 18 nm. The envelope consists of a lipid bilayer that is derived from the host membrane and contains virally coded transmembrane surface glycoproteins. The viral glycoproteins mediate attachment and penetration and are organized separately into virion spikes. All members of the Paramyxovirinae subfamily have hemagglutinating activity. This function is not a defining feature for pneumoviruses, however: it is absent in RSV but present in Pneumovirus of mice (PVM). Neuraminidase activity is present in members of the genera Paramyxovirus and Rubulavirus and is absent in Morbillivirus and PVM. RSV possesses two subgroups, designated A and B. The wild-type RSV (strain 2B) genome is a single strand of negative-sense RNA of 15,218 nucleotides (SEQ ID NO:23), which is transcribed into ten major subgenomic mRNAs. Each of the ten mRNAs encodes a major polypeptide chain:
- three are transmembrane surface proteins (G, F and SH);
- three are the proteins that associate with genomic RNA to form the viral nucleocapsid (N, P and L);
- two are nonstructural proteins (NS1 and NS2), which accumulate in infected cells but are also present in the virion in trace amounts and may play a role in regulating transcription and replication;
- one is the nonglycosylated virion matrix protein (M);
- the last is M2, another nonglycosylated protein recently shown to be an RSV-specified transcription elongation factor (see Figure).

These ten viral proteins account for nearly all of the viral coding capacity.
The viral genome is encapsidated with the major nucleocapsid protein (N) and is associated with the phosphoprotein (P) and the large polymerase protein (L). These three proteins have been shown to be necessary and sufficient for directing RNA replication of cDNA-encoded RSV minigenomes. Further studies have shown that the M2 protein (ORF 1) is required for transcription to proceed with full processing (74).
When the M2 protein is missing, truncated transcripts predominate, and rescue of the full length genome does not occur (74).
Both the M (matrix) protein and the M2 protein are internal virion-associated proteins that are not present in the nucleocapsid structure. By analogy with other nonsegmented negative-stranded RNA viruses, the M protein is thought to render the nucleocapsid transcriptionally inactive before packaging and to mediate its association with the viral envelope. The NS1 and NS2 proteins have been detected only in very small amounts in purified virions, and at this time they are considered nonstructural. Their functions are uncertain, though they may be regulators of transcription and replication. Three transmembrane surface glycoproteins are present in virions: G, F and SH. G (attachment) and F (fusion) are envelope glycoproteins known to mediate attachment and penetration of the virus into the host cell. In addition, these glycoproteins represent major independent immunogens. The function of the SH protein is unknown, although a recent report has implicated it in the fusion function of the virus (78).
The genomes of two wild-type RSV subgroup B strains (2B and 18537) have now been sequenced in their entirety (see SEQ ID NOS:23 and 25, discussed below).
Genomic RNA is neither capped nor polyadenylated (79).
In both the virion and intracellularly, genomic RNA is tightly associated with the N protein.
The 3' end of the genomic RNA consists of a 44-nucleotide extragenic leader region that is presumed to contain the major viral promoter (Fig.). The 3' genomic promoter region is followed by ten viral genes in the order 3'-NS1-NS2-N-P-M-SH-G-F-M2-L-5' (Fig. 3).
The L gene is followed by an extragenic trailer region of 145-149 nucleotides (see Figure). Each gene begins with a conserved nine-nucleotide gene-start signal, 3'-GGGGCAAAU. The exception is the L gene, whose ten-nucleotide gene-start signal is 3'-GGGACAAAAU; it differs by an A in place of the fourth G and by one additional A. For each gene, transcription begins at the first nucleotide of the signal. Each gene terminates with a semi-conserved 12-14-nucleotide gene-end signal (G U/G U/A ANNN U/A A, where N can be any of the four bases), which directs transcription termination and polyadenylation (Fig.). The first nine genes are non-overlapping and are separated by intergenic regions that range in size from 3 to 56 nucleotides for RSV B strains (Fig.). The intergenic regions do not contain any conserved motifs or any obvious features of secondary structure. In a minireplicon system, they have been shown to have no influence on the expression of the preceding and succeeding genes (Fig.). The last two RSV genes overlap by 68 nucleotides (Fig.), because the gene-start signal of the L gene is located inside of, rather than after, the M2 gene. This 68-nucleotide overlap sequence encodes the last 68 nucleotides of the M2 mRNA (exclusive of the poly(A) tail), as well as the first 68 nucleotides of the L mRNA.
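The gene-start signals described above can be located by a simple exact-match search. The Python sketch below makes three assumptions: sequences are written 3' to 5' in genomic sense, as the signals are written in the text; only exact matches are reported; and the toy fragment is hypothetical rather than an RSV sequence. A zero-width regular-expression lookahead is used so that overlapping occurrences are not missed.

```python
import re

GENE_START = "GGGGCAAAU"     # conserved gene-start signal, read 3' -> 5'
L_GENE_START = "GGGACAAAAU"  # L gene variant: A at position 4 plus one extra A


def find_signal(template_3to5: str, signal: str) -> list[int]:
    """1-based positions, counted from the 3' end, at which `signal` begins.

    The lookahead makes overlapping occurrences visible; no mismatches are tolerated.
    """
    return [m.start() + 1 for m in re.finditer(f"(?={signal})", template_3to5)]


toy = "UUACGGGGCAAAUACGGGACAAAAU"  # hypothetical fragment, not an RSV sequence
print(find_signal(toy, GENE_START))    # [5]
print(find_signal(toy, L_GENE_START))  # [16]
```

Because the L signal is not a simple superstring of the consensus signal, a search for `GENE_START` alone would miss the L gene. This is why the sketch searches for the two signals separately.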
Ten different species of subgenomic polyadenylated mRNAs and a number of polycistronic polyadenylated read-through transcripts are the products of genomic transcription (74).
Transcriptional mapping studies using UV light-mediated genomic inactivation showed that RSV genes are transcribed in their 3'-to-5' order from a single promoter near the 3' end. Thus, RSV RNA synthesis appears to follow the single-entry, sequential transcription model proposed for all Mononegavirales (16,81). According to this model, the polymerase (L) contacts genomic RNA in the nucleocapsid form at the 3' genomic promoter region and begins transcription at the first nucleotide. RSV mRNAs are co-linear copies of the genes, with no evidence of mRNA editing or splicing.
Sequence analysis of intracellular RSV mRNAs showed that synthesis of each transcript begins at the first nucleotide of the gene-start signal. The 5' ends of the mRNAs are capped with the structure m7G(5')ppp(5')Gp, in which the G of the terminal Gp is the first template-encoded nucleotide of the mRNA. The mRNAs are polyadenylated at their 3' ends. Both of these modifications are thought to be made co-transcriptionally by the viral polymerase. Three regions of the RSV 3' genomic promoter have been found to be important as cis-acting elements:
- the first ten nucleotides (presumably acting as a promoter);
- nucleotides 21-25;
- the gene-start signal located at nucleotides 45-53.

Unlike other Paramyxovirinae, such as measles, Sendai and PIV-3, RSV was found to be highly tolerant of insertions, deletions and substitutions in the remainder of the leader and in the non-coding region of the NS1 gene (83).
Additionally, saturation mutagenesis was carried out within the first 12 nucleotides of the 3' genomic promoter region. In this procedure, each base is replaced independently by each of the other three bases, and the mutants are compared for translation and replication efficiencies. The analysis showed that a U-tract located at nucleotides 6-10 is highly intolerant of substitutions. In contrast, the first five nucleotides were relatively tolerant of a number of substitutions. Two of the substitutions at position four were up-regulatory mutations, producing a four- to 20-fold increase in RSV-CAT RNA replication and transcription. Using a bicistronic minireplicon system, gene-start and gene-end motifs were shown to be signals for mRNA synthesis. These motifs appear to be self-contained and largely independent of the nature of the adjoining sequence (84).
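The mutant panel in such a saturation scan can be enumerated directly: 12 positions × 3 alternative bases = 36 single-substitution mutants. The Python sketch below generates that panel for an RNA string. The input sequence is a hypothetical placeholder, not the RSV leader. Assaying each mutant for transcription or replication efficiency lies outside what the code does.

```python
BASES = "ACGU"


def saturation_mutants(seq: str, window: int = 12) -> list[tuple[int, str, str, str]]:
    """Every single-base substitution within the first `window` nucleotides.

    Returns tuples of (1-based position, original base, new base, mutant sequence).
    """
    mutants = []
    for i, original in enumerate(seq[:window]):
        for base in BASES:
            if base != original:
                mutant = seq[:i] + base + seq[i + 1:]
                mutants.append((i + 1, original, base, mutant))
    return mutants


leader = "ACGCGAAAAAAUGCGUAC"  # hypothetical placeholder, not the RSV leader sequence
variants = saturation_mutants(leader)
print(len(variants))  # 36, i.e. 12 positions x 3 alternative bases
```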
The L gene-start signal lies 68 nucleotides upstream of the M2 gene-end signal, resulting in gene overlap (Fig. 3). The presence of the M2 gene-end signal within the L gene causes a high frequency of premature termination of L gene transcripts. Full-length L mRNA is made only when the polymerase fails to recognize the M2 gene-end motif, so it is much less abundant and L mRNA transcription is much lower. The gene overlap seems incompatible with a model of linear sequential transcription. It is not known whether the polymerase that exits the M2 gene jumps backward to the L gene-start signal, or whether there is a second, internal promoter for L gene transcription. It is also possible that the L gene is accessible to a small fraction of polymerases that fail to start transcription at the M2 gene-start signal and slide down the M2 gene to the L gene-start signal.
The relative abundance of each RSV mRNA decreases with the distance of its gene from the promoter, presumably because of polymerase fall-off during sequential transcription. Gene overlap is a second mechanism that reduces the synthesis of full-length L mRNA. In addition, certain mRNAs have features that might reduce the efficiency of translation. The initiation codon for SH mRNA is in a suboptimal Kozak sequence context, and the G ORF begins at the second methionyl codon in the mRNA.
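The promoter-distance gradient can be illustrated with a deliberately simple model. In this model, the polymerase detaches with a fixed probability at every gene junction, so the k-th gene from the promoter (counting from 0) is reached with probability (1 - p)^k. The Python sketch below applies that assumption to the RSV gene order given above. The fall-off value of 0.2 is arbitrary, and the model ignores the additional reduction of L mRNA caused by the M2/L overlap.

```python
GENE_ORDER = ["NS1", "NS2", "N", "P", "M", "SH", "G", "F", "M2", "L"]


def relative_abundance(genes: list[str], fall_off: float) -> dict[str, float]:
    """mRNA level of each gene relative to the promoter-proximal gene.

    Assumes a fixed, hypothetical probability `fall_off` that the polymerase
    detaches at each gene junction, so gene k (0-based) is reached with
    probability (1 - fall_off) ** k.
    """
    if not 0.0 <= fall_off < 1.0:
        raise ValueError("fall_off must be in [0, 1)")
    return {gene: (1.0 - fall_off) ** k for k, gene in enumerate(genes)}


levels = relative_abundance(GENE_ORDER, fall_off=0.2)  # 0.2 chosen only for illustration
for gene, level in levels.items():
    print(f"{gene:>3}  {level:.3f}")
```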
RSV RNA replication is thought (74) to follow the model proposed from studies with vesicular stomatitis virus and Sendai virus (16,81). This model involves a switch from the stop-start mode of mRNA synthesis to an antiterminator read-through mode. The read-through mode produces positive-sense replication-intermediate (RI) RNA, an exact complementary copy of genomic RNA, which in turn serves as the template for the synthesis of progeny genomes. The switch to the antiterminator mode is proposed to involve co-transcriptional encapsidation of the nascent RNA by N protein (16,81).
RNA replication in RSV, like that in other nonsegmented negative-strand RNA viruses, is dependent on ongoing protein synthesis. Predicted RI RNA has been detected for the standard virus as well as for the RSV-CAT minigenome (74,85). In both the standard virus and the minigenome system, RI RNA was 10- to 20-fold less abundant intracellularly than the progeny genome.

The nucleotide sequences (in positive-strand, antigenomic, message sense) of various wild-type, vaccine and revertant RSV strains, as well as the deduced amino acid sequences of the RNA polymerase (L protein) of these RSV viruses, are set forth as follows with reference to the appropriate SEQ ID NOS. contained herein:

Virus                     Nucleotide Sequence   L Protein Sequence
Wild-type: 2B             SEQ ID NO:23          SEQ ID NO:24
Wild-type: 18537          SEQ ID NO:25          SEQ ID NO:26
Vaccine: 2B33F            SEQ ID NO:27          SEQ ID NO:28
Vaccine: 2B20L            SEQ ID NO:29          SEQ ID NO:30
Revertant: 2B33F TS(+)    SEQ ID NO:31          SEQ ID NO:32
Revertant: 2B20L TS(+)    SEQ ID NO:33          SEQ ID NO:34

Each RSV virus genome encodes an L protein that is 2,166 amino acids long. Genome length and other nucleotide information are as follows:

Virus                     Genome Length   L Start Codon   L Stop Codon
Wild-type: 2B             15218           8502-8504       15000-15002
Wild-type: 18537          15229           8509-8511       15007-15009
Vaccine: 2B33F            15219           8503-8505       15001-15003
Vaccine: 2B20L            15219           8503-8505       15001-15003
Revertant: 2B33F TS(+)    15219           8503-8505       15001-15003
Revertant: 2B20L TS(+)    15219           8503-8505       15001-15003

As detailed in Example 3 below (especially Tables 7 and 8), the key attenuating mutations in the RSV subgroup B 3' genomic promoter region are nucleotide 4 (C→G) and the insertion of an additional A in the stretch of A's at nucleotides 6-11 (in antigenomic, message sense). As also detailed in Example 3 below, the key potentially attenuating sites in the L protein of RSV are amino acid residues 353 (arginine→lysine), 451 (lysine→arginine), 1229 (aspartic acid→asparagine), 2029 (threonine→isoleucine) and 2050 (asparagine→aspartic acid). It is understood that the nucleotide changes responsible for these amino acid changes are not limited to those set forth in Example 3 below. All nucleotide changes that produce codons translated into these amino acids are within the scope of this invention.
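The residue numbers above can be traced back to genome coordinates. The Python sketch below converts a 1-based nucleotide position into a codon number and a base-within-codon for the L open reading frame, using the 2B L start codon at 8502-8504 from the table above. It also shows how a one-base insertion, such as the extra A in the leader, moves every downstream coordinate by one. That shift explains why the L start codon sits at 8503 in the vaccine strains. The positions used (9559, 9853, 12186, 14587 and 14649) are taken from Tables 7 and 8. The sketch assumes that those tables use strain 2B numbering, an assumption consistent with the residue numbers listed there. Function names are illustrative.

```python
def codon_of(position: int, start_codon_first: int) -> tuple[int, int]:
    """(codon number, base within codon 1-3) for a nucleotide in an open reading frame.

    Coordinates are 1-based genome positions in antigenomic (message) sense.
    """
    offset = position - start_codon_first
    if offset < 0:
        raise ValueError("position lies upstream of the start codon")
    return offset // 3 + 1, offset % 3 + 1


def shift_for_insertion(position: int, insertion_after: int, length: int = 1) -> int:
    """Coordinate of the same base in a genome carrying an insertion after `insertion_after`."""
    return position + length if position > insertion_after else position


L_START_2B = 8502  # first base of the L start codon in strain 2B

for pos in (9559, 9853, 12186, 14587, 14649):
    print(pos, codon_of(pos, L_START_2B))
# 9559 (353, 2) / 9853 (451, 2) / 12186 (1229, 1) / 14587 (2029, 2) / 14649 (2050, 1)

# One extra A in the leader A-tract (nucleotides 6-11) moves every downstream
# coordinate by one; which slot in the tract receives it does not matter here.
print(shift_for_insertion(L_START_2B, insertion_after=11))  # 8503
```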
The attenuated viruses of this invention exhibit a substantial reduction of virulence compared to the wild-type viruses that infect human and animal hosts. The degree of attenuation is such that symptoms of infection will not arise in most immunized individuals. At the same time, the virus retains sufficient replication competence to be infectious in the vaccinee and to elicit the desired immune response profile.
The attenuated viruses of this invention may be used to formulate a vaccine. To do so, the attenuated virus is adjusted to an appropriate concentration and formulated with any suitable vaccine adjuvant, diluent or carrier. Physiologically acceptable media may be used as carriers. These include, but are not limited to, an appropriate isotonic medium, phosphate-buffered saline and the like. Suitable adjuvants include, but are not limited to, MPL™ (3-O-deacylated monophosphoryl lipid A; RIBI ImmunoChem Research, Inc., Hamilton, MT) and IL-12 (Genetics Institute, Cambridge, MA).
In one embodiment of this invention, the formulation including the attenuated virus is intended for use as a vaccine. The attenuated virus may be mixed with cryoprotective additives or stabilizers, such as:
- proteins (e.g., albumin, gelatin);
- sugars (e.g., sucrose, lactose, sorbitol);
- amino acids (e.g., sodium glutamate);
- saline;
- other protective agents.

This mixture is either maintained in a liquid state or desiccated or lyophilized for transport and storage, and then mixed with water immediately prior to administration.
Formulations comprising the attenuated viruses of this invention are useful for immunizing a human or animal subject to induce protection against infection by the wild-type counterpart of the attenuated virus. Thus, this invention further provides a method of immunizing a subject to induce protection against infection by an RNA virus of the order Mononegavirales. The method comprises administering to the subject an effective immunizing amount of a vaccine formulation incorporating an attenuated version of that virus, as described hereinabove.
A sufficient amount of the vaccine, in an appropriate number of doses, must be administered to the subject to elicit an immune response. Persons skilled in the art will readily be able to determine such amounts and dosages. Administration may be by any conventional effective route, such as intranasal, parenteral or oral administration, or by topical application (for example, as an aerosol spray) to any mucosal surface, such as the intranasal, oral, eye, vaginal or rectal surface. The preferred route of administration is intranasal.
In another embodiment of this invention, an isolated nucleic acid molecule having the complete viral nucleotide sequence of either the wild-type viruses or the vaccine viruses described herein is used for two purposes. It is used to generate oligonucleotide probes, from either the positive-strand antigenomic message sense or the negative-strand complementary genomic sense. It is also used to express peptides, from the positive-strand antigenomic message sense only. The probes and peptides are used to detect the presence of those wild-type and/or vaccine strains in samples of body fluids and tissues. The nucleotide sequences are used to design highly specific and sensitive diagnostic tests to detect the presence of the virus in a sample.
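A probe written in antigenomic (message) sense hybridizes to genomic RNA. Its reverse complement, written 5' to 3', hybridizes to mRNA and antigenomic RNA. The minimal Python sketch below derives one form from the other. It assumes DNA oligonucleotides written 5' to 3' and reads any U in the input as T. The input fragment is hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(oligo: str) -> str:
    """Reverse complement of a DNA oligonucleotide written 5'->3'; U pairs as T."""
    return oligo.upper().translate(DNA_COMPLEMENT)[::-1]


message_sense = "ATGCC"                   # hypothetical fragment in antigenomic sense
print(reverse_complement(message_sense))  # GGCAT
```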
Polymerase chain reaction (PCR) primers are synthesized with sequences based on the viral wild-type or vaccine sequences described herein. The test sample is subjected to reverse transcription of RNA. The selected cDNA regions are then amplified by PCR; these regions correspond to those parts of the nucleotide sequences described herein that contain nucleotides distinct for a defined strain of virus. Amplified PCR products are identified on gels, and their specificity is confirmed by hybridization with specific nucleotide probes.
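One common way to turn a strain-specific nucleotide into a discriminating PCR assay is to place it at the 3'-terminal base of a primer, because extension from a mismatched 3' end is inefficient. This allele-specific design is a general technique assumed here for illustration; it is not necessarily the design used in this invention. The Python sketch below lists the positions at which two aligned sequences differ and, for each such position, the primer-length window of the strain that ends on it. The sequences are hypothetical.

```python
def distinct_positions(reference: str, strain: str) -> list[int]:
    """1-based positions at which two aligned, equal-length sequences differ."""
    if len(reference) != len(strain):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(reference, strain)) if a != b]


def allele_specific_primers(reference: str, strain: str, length: int = 20) -> dict[int, str]:
    """Map each distinct position to the `length`-nt window of `strain` ending on it."""
    return {
        pos: strain[pos - length:pos]
        for pos in distinct_positions(reference, strain)
        if pos >= length
    }


wild_type = "ACGTACGTACGTACGTACGT"  # hypothetical aligned fragments
vaccine = "ACGTACGCACGTACATACGT"
print(distinct_positions(wild_type, vaccine))          # [8, 15]
print(allele_specific_primers(wild_type, vaccine, 5))  # {8: 'TACGC', 15: 'GTACA'}
```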
ELISA tests are used to detect the presence of antigens of the wild-type or vaccine viral strains.
Peptides are designed and selected to contain one or more distinct residues, based on the wild-type or vaccine sequences described herein. These peptides are then coupled to a carrier protein such as keyhole limpet hemocyanin (KLH) and used to immunize animals (e.g., rabbits) for the production of monospecific polyclonal antibody. A selection of these polyclonal antibodies, or a combination of polyclonal and monoclonal antibodies, can then be used in a "capture ELISA" to detect antigens produced by those viruses.
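Selecting peptides that contain distinct residues amounts to taking a window of the protein around each position at which the strains differ. The Python sketch below does this under three assumptions: the two protein sequences are already aligned, the default window is 15 residues (7 on each side), and the example sequences are hypothetical. The text does not specify a window size or any further screening criteria.

```python
def peptide_windows(wild_type: str, strain: str, flank: int = 7) -> list[tuple[int, str]]:
    """(1-based residue number, peptide of `strain` centred on it) for each differing residue.

    Windows are clipped at the protein termini, so peptides near the ends are shorter.
    """
    if len(wild_type) != len(strain):
        raise ValueError("protein sequences must be aligned to the same length")
    windows = []
    for i, (a, b) in enumerate(zip(wild_type, strain)):
        if a != b:
            start, end = max(0, i - flank), min(len(strain), i + flank + 1)
            windows.append((i + 1, strain[start:end]))
    return windows


print(peptide_windows("MKTAYIAKQR", "MKTAHIAKQR", flank=2))  # [(5, 'TAHIA')]
```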
Samples of the Moraten measles virus vaccine strain were deposited by Applicants with the American Type Culture Collection, 12301 Parklawn Drive, Rockville, Maryland 20852, under the provisions of the Budapest Treaty for the Deposit of Microorganisms for the Purposes of Patent Procedures ("Budapest Treaty"), and have been assigned ATCC accession number VR2587. Samples of the HPIV-3 virus Vero-grown cp45 vaccine strain were deposited by Applicants with the American Type Culture Collection, 12301 Parklawn Drive, Rockville, Maryland 20852, under the provisions of the Budapest Treaty, and have been assigned ATCC accession number VR2588.
Samples of the 2B wild-type RSV virus were deposited by Applicants with the American Type Culture Collection, 12301 Parklawn Drive, Rockville, Maryland 20852, under the provisions of the Budapest Treaty and have been assigned ATCC accession number VR2586.
Given these three deposited strains and the sequence information provided herein for these and other strains, one can use the site-directed mutagenesis and rescue techniques described above to introduce mutations into (or restore a wild-type genotype in) all the strains described herein. One can also take these strains and make additional mutations from the panel of mutations set forth in Tables 3, 4 and 6-8 below.
In order that this invention may be better understood, the following examples are set forth. The examples are for the purpose of illustration only and are not to be construed as limiting the scope of the invention.
Examples

Standard molecular biology techniques are used according to the protocols described in Sambrook et al. (86).
Example 1 Measles

The Moraten MV vaccine virus was grown once, directly from the Attenuvax™ vaccine vial (Lot #0716B). The Schwarz vaccine virus was grown once (Lot 96G04/M179 G41D). The Zagreb and Rubeovax™ vaccine viruses were each grown twice in Vero cells before RNAs were made for sequence analysis. MV wild-type isolate Montefiore (56) was passaged 5-6 times in Vero cells before extraction of RNA. Similarly, MV wild-type isolates 1977 and 1983 (14) were grown 5-7 times before extraction of materials for analysis. The Edmonston wild-type isolate received from Dr. J. Beeler (CBER) (see Fig. 1) was the original Edmonston isolate. Before receipt, it had already been passaged seven times in human kidney cells and three times in Vero cells. It was passaged once more in Vero cells before use for sequence analysis.
RNA was prepared by infecting Vero cells at a multiplicity of infection of 0.1 to 1.0. The infection was allowed to reach maximum cytopathology before the cells were harvested. Total RNA from measles virus-infected cells was extracted using Trizol™ reagent (Gibco-BRL).
The total RNA isolated from Vero cell passage material was amplified by the reverse transcriptase-PCR (RT-PCR) procedure (Perkin-Elmer/Cetus). The amplification used measles (Edmonston B strain)-specific primer pairs spanning the 3' and 5' promoter regions and the L gene of the viral genome. Table 2 presents these primer sequences. The primers of SEQ ID NOS:35-54, 74, 77 and 78 are in antigenomic message sense. The primers of SEQ ID NOS:55-73, 75, 76 and 79 are in genomic negative sense.
Table 2
Primers for PCR and Sequencing MV L Genes and Genomic Termini
(First and Last are the genome positions covered by each primer; ? = not legible)

Antigenomic (message) sense
SEQ ID NO   First    Last     Sequence (5'-3')
35          9047     9070     CATATCACTCACTCTGGGATGGAG
36          9371     9390     TCAGAACATCAAGCACCGCC
37          9741     9760     ACAGTCAAGACTGAGATGAG
38          10001    10020    AAGAGTCAGATACATGTGGA
39          ?        10370    ACATGAATCAGCCTAAAGTC
40          10674    10698    CCGAAAGAGTTCCTGCGTTACGACC
41          ?        11102    CAGTCCACACAAGTACCAGG
42          ?        ?        GTCAGAAGCTGTGGACCATC
43          11841    11860    AATATTGCTACAACAATGGC
44          ?        ?        ACTCTTCATTCCTAGACTGG
45          12542    ?        GTCCAATTATGACTATGAAC
46          12891    ?        AGACAGACATGAAGCTTGC
47          ?        ?        CCAACAAGGAATGCTTCTAG
48          13551    13575    ACAGCACTATCTATGATTGACCTGG
49          13930    13949    GCAACATGGTTTACACATGC
50          ?        14299    AGATTGAGAGTTGATCCAGG
51          ?        ?        AGGAGATACTTAAACTAAGC
52          14981    15000    TAAGCTTATGCCTTTCAGCG
53          15337    15356    TTAACGGACCTAAGCTGTGC
54          15671    15690    GAAACAGATTATTATGACGG
74          ?        ?        ACCAACAAAGTTGGGTAAGG
77          ?        ?        TAACAGTCAAGGAGACCAG
78          ?        ?        GGGAAGCTTAACCCTAATCCTGCCCTAGGTGG

Genomic (negative) sense
SEQ ID NO   First    Last     Sequence (5'-3')
55          ?        ?        CGGGCTATCTAGGTGAACTTCAGG
56          ?        ?        ATTTGGATATGGAATATGAG
57          ?        ?        ACTCAACTGAACTACCAGTG
58          ?        ?        AAGAACATCATGTATTTCAG
59          10549    10530    TTATCAACGCACTGCTCATG
60          10919    10895    ATTTTCAGCAATCACTTGGCATGCC
61          11280    11261    GCCTCTGTGCAAACAAGCTG
62          ?        11619    TCTCTAGTTACTCTAGCAGC
63          12010    11991    AGGTCGTTGTTTGTGAGGAG
64          12361    12342    TCGTCCTCTTCTTTACTGTC
65          ?        ?        CCGTCCTCGAGCTAGCCTCG
66          13052    ?        CTCCTCCAGGCTCACATTGG
67          ?        ?        GGGTTGGTACATAGCTCTGC
68          ?        ?        CACCCATCTGATATTTCCCTGATGG
69          ?        ?        TGGTTGACAGTACAAATCTG
70          ?        ?        CTGAAATGGGAAGATTGTGC
71          ?        ?        AGCAATCTACACTGCCTACC
72          15180    ?        TCACAGATGATTCATTATC
73          ?        ?        GATCCTAGATATAAGTTCTC
75          100      78       GGGGGATCCATCCCTAATCCTGCTCTTGTCCC
76          200      ?        GATTCCTCTGATGGCTCCAC
79          15894    ?        ACCAGACAAAGCTGGGATAGA

Overlapping PCR fragments of the complete viral genome were sequenced directly on both strands, without cloning, to obtain the consensus sequence. Sequencing used the dideoxy terminator cycle sequencing method (ABI PRISM 377 sequencer and ABI PRISM sequencing kit).
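The coordinates in Table 2 can be cross-checked against the primer lengths. An antigenomic-sense primer reads upward from its first to its last genome position, and a genomic-sense primer reads downward. In both cases, the inclusive span should equal the primer length. The small Python check below covers four entries whose coordinates are clearly legible in the table. It does not handle primers that appear to carry 5' tails outside the genome coordinates, such as the GGGGGATCC and GGGAAGCTT extensions on SEQ ID NOS:75 and 78.

```python
# (SEQ ID NO, first genome position, last genome position, primer sequence 5'->3')
# Antigenomic-sense primers run upward; genomic-sense primers run downward.
PRIMERS = [
    (35, 9047, 9070, "CATATCACTCACTCTGGGATGGAG"),
    (48, 13551, 13575, "ACAGCACTATCTATGATTGACCTGG"),
    (59, 10549, 10530, "TTATCAACGCACTGCTCATG"),
    (61, 11280, 11261, "GCCTCTGTGCAAACAAGCTG"),
]


def span_matches(first: int, last: int, sequence: str) -> bool:
    """True if an inclusive genome span has exactly as many bases as the primer."""
    return abs(last - first) + 1 == len(sequence)


for seq_id, first, last, sequence in PRIMERS:
    sense = "antigenomic" if last > first else "genomic"
    print(seq_id, sense, len(sequence), span_matches(first, last, sequence))
```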
To determine the sequence at the absolute termini, a ligation procedure described previously was used. To test this hypothesis, the nucleotide sequences of the non-protein-coding regulatory regions and the L gene were determined for three groups: the progenitor Edmonston wild-type MV isolate, the available vaccine strains derived from this isolate, and other wild-type strains. Nucleotide differences (in antigenomic, message sense) and amino acid differences were then compared and aligned as set forth in Tables 3-5.

Table 3
Differences in MV 3' Genomic Promoter Region Nucleotide Sequence
Nucleotide number:                                        26  42  50  96
Edmonston w-t:                                            A   A   G   G
Vaccines (Rubeovax™, Moraten, Schwarz, Zagreb, AIK-C):    G G G G G G G A G G
Wild-types (1977, 1983, Montefiore):                      A G A G A G

Table 4
Differences in MV L Nucleotides and Amino Acids Between Edmonston Wild-Type and Vaccine Strains
Amino acid position   331  1409  1624  1649  1717  1887  1936  2074  2114
Edmonston w-t codon   ATT  GCA   ACC   AGG   GAT   AAC   CAT   CAA   AGA
Mutation codon        ACT  ACA   GCC   ATG   GCT   GAC   TAT   CGA   AAA
Edmonston w-t         I    A     T     R     D     N     H     Q     R
Rubeovax™ vac.        I    A     T     M     A     D     H     Q     R
Moraten vac.          T    A     T     M     A     D     H     Q     K
Schwarz vac.          T    A     T     M     A     D     H     Q     K
Zagreb vac.           I    T     T     R     A     N     H     Q     R
AIK-C vac.            I    T     A     R     A     N     Y     R     R

Table 5
Differences in MV L Nucleotides and Amino Acids Between Wild-Type Strains
(Residues are listed in the order Edmonston w-t, 1977 w-t, 1983 w-t, Montefiore w-t; rows with fewer than four residues are reproduced as printed.)
Amino acid   Edmonston    Mutation   Residues
position     w-t codon    codon
81           GCC          ACC        A A T A
618          GTC          GCC        V A V V
122          GAT          AAT        D N D D
149          GTT          ATT        V V I I
252          ACA          GCA        T T T A
331          ATT          GTT        I V
441          AAA          AGA        K K K R
447          AAA          AGA        K K K R
500          GAT          AAT        D D N D
513          GTG          ATG        V M M M
570          AAA          AAT        K K N K
637          GTA          ATA        V I V V
641          GAC          AAT        D D D N
645          GAT          AAT        D N D D
613          TAC          CAC        Y Y
621          AGT          AAT        S N S S
623          AGG          AAG        R R K R
626          AGA          AAA        R R R K
628          GCA          GAA        A A A E
632          ATA          GTA        I I I V
636          CAA          CAT        Q Q H H
650          ATG          ATA        M M M I
652          GCT          ACC        A A A T
1860         GTA          ATA        V V V I

Table 5 (continued)
Amino acid   Edmonston    Mutation   Residues
position     w-t codon    codon
720          ATC          GTC        I I V V
723          TAT          TGC        Y C C C
794          CGG          TGG        R W R R
914          CGG          CAG        R Q R R
970          GCC          TCA        A A S A
1044         GGA          AGA        G G G R
1294         AGC          ACC        S S T S
1569         GTT          ATT        V V I V
1705         ATC          GTC        I I I V
1745         AAT          ACT        N N N S
1865         TTC          TAC        F Y F F
1936         CAT          TAT        H H Y H
2007         GAC          GGC        D D D G
2013         GAT          GGT        D D G D
2017         ACT          ATT        T T I I
2030         AAT          AGT        N N N S
2096         ATA          GTA        I
2119         AAG          CGG        K K R R
2165         GTC          ATC        V V I V
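The codon pairs in Table 4 can be checked against the one-letter residues listed for each strain by translating them with the standard genetic code. The Python sketch below copies the codon data from Table 4. It reports each change in the common wild-type/position/mutant shorthand (e.g. D1717A), a notation adopted here for convenience only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
CODE = dict(zip(("".join(c) for c in product("TCAG", repeat=3)),
                "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))

# (residue number, Edmonston wild-type codon, mutation codon) as listed in Table 4
TABLE_4 = [
    (331, "ATT", "ACT"), (1409, "GCA", "ACA"), (1624, "ACC", "GCC"),
    (1649, "AGG", "ATG"), (1717, "GAT", "GCT"), (1887, "AAC", "GAC"),
    (1936, "CAT", "TAT"), (2074, "CAA", "CGA"), (2114, "AGA", "AAA"),
]

substitutions = [f"{CODE[wt]}{pos}{CODE[mut]}" for pos, wt, mut in TABLE_4]
print(" ".join(substitutions))
# I331T A1409T T1624A R1649M D1717A N1887D H1936Y Q2074R R2114K
```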
Example 2 PIV-3

Table 6 compares the sequences (in antigenomic message sense) of the parental wild-type JS strain of PIV-3 virus and of the FRhL-grown and Vero-grown forms of the cp45 mutant. Where a codon change does not result in an amino acid change, Table 6 states "none", followed by the name of the unchanged amino acid.
Table 6
Sequence Comparison of Vero- and FRhL-grown cp45 and JS Strains
Gene region   Nucleotide position   Change         Amino acid change (number in L)
3' leader     23                    T→C            non-coding
3' leader     24                    C→T            non-coding
3' leader     28                    G→T            non-coding
3' leader     45                    T→A            non-coding
NP UTR        62                    (from A)       non-coding
NP coding     397                   GTC→GCC        Val→Ala
NP coding     1275                  TCT→GCT        Ser→Ala
P coding      2080                  AAT→AAC        none/Asn
M coding      4347                  CCC→ACC        Pro→Thr
F coding      5536                  AAC→AAT        none/Asn
F coding      6329                  ATA→GTA        Ile→Val
F coding      6419                  GCA→ACA        Ala→Thr
HN coding     6847                  GGT→GGC        none/Gly
HN coding     7956                  GTT→GCT        Val→Ala
L coding      9323                  TAT→TAC        none/Tyr (226)
L coding      9971                  GAA→GAG        none/Glu (442)
L coding      11469                 TAC→CAC        Tyr→His (942)
L coding      11621                 TTG→TTT        Leu→Phe (992)
L coding      12581                 TTC→TTT        none/Phe (1312)
L coding      13318                 ACT→ATT        Thr→Ile (1558)
Total: 20 mutations

Sequence analysis of the parental wild-type JS strain of PIV-3 virus and the FRhL-grown cp45 mutant showed that the latter contained 20 nucleotide changes.
Four changes were in the noncoding 3' leader region, at nucleotide positions 23 (T→C), 24 (C→T), 28 (G→T) and 45 (T→A) (in antigenomic, message sense). In the genomic, negative sense, the change at position 28 is from a smaller pyrimidine to a larger purine. This change may alter the size of the region flanked by the conserved regions of the 3' genomic promoter region, resulting in an altered spatial presentation of the cis-acting signals to the polymerase.
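The genome is the complement of the antigenome, so a substitution reported in antigenomic (message) sense appears on the genomic template as the complementary change. The short Python sketch below assumes, as the reasoning above requires, that the antigenomic change at position 28 is G to T (U in RNA). Complementing that change gives C to A on the genome, i.e. a pyrimidine-to-purine change. The names are illustrative.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}  # larger, two-ring bases; C and U are the smaller pyrimidines


def to_genomic_sense(change: tuple[str, str]) -> tuple[str, str]:
    """Express an antigenomic-sense substitution as the change on the genomic template."""
    before, after = change
    return COMPLEMENT[before], COMPLEMENT[after]


def base_class(base: str) -> str:
    return "purine" if base in PURINES else "pyrimidine"


genomic = to_genomic_sense(("G", "U"))  # position 28: G->T in DNA notation, G->U in RNA
print(genomic, [base_class(b) for b in genomic])  # ('C', 'A') ['pyrimidine', 'purine']
```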
Nine changes were coding changes in the NP, M, F, HN and L genes. The other seven changes were non-coding or silent changes in the NP, P, F, HN and L genes or in the NP untranslated region (UTR). The mutant has been shown to have poor transcription activity at non-permissive temperatures because of its ts phenotype, and this ts phenotype has now been mapped to the viral L gene (88). The cp45 virus has been shown to function normally with regard to mutations in the HN and F glycoproteins. This supports the implication that mutations in the 3' leader and the L gene contribute to the attenuating phenotype of this virus.
Thus, two sets of changes contributed significantly to the attenuation phenotype of the candidate cp45 vaccine strain: the four 3' leader-specific changes in FRhL-grown cp45, and the three coding changes in the L gene at amino acid positions 942 (Tyr→His), 992 (Leu→Phe) and 1558 (Thr→Ile).
The first two amino acid changes in the L protein (at positions 942 and 992) map to one of the highly conserved areas among all Paramyxovirus L genes. The third amino acid change (at position 1558) maps to the area joining two conserved blocks, corresponding to the change at amino acid 1717 in the MV vaccine strains. The published literature (89) sets forth only 18 changes between the antigenomic message sense sequences of the JS and FRhL-grown cp45 strains. Sixteen of these changes were found by applicants.
The published literature did not report four changes found by applicants (nucleotide changes in antigenomic, message sense):
- nucleotide 45 in the 3' leader (T→A);
- nucleotide 62 in the NP UTR (a change from A);
- nucleotide 397 (T→C), which produces the NP amino acid change Val→Ala;
- nucleotide 1275 (T→G), which produces the NP amino acid change Ser→Ala.
Example 3 RSV Subgroup B

The temperature-sensitive (ts) phenotype is strongly associated with attenuation in vivo, and some non-ts mutations may also be attenuating. Both ts and non-ts attenuating mutations were identified by sequence analysis together with evaluation of the ts, cold-adapted (ca) and in vivo growth phenotypes of RSV mutants and revertants.
The genomes of the following five RSV 2B strains have now been completely sequenced: the 2B parent, 2B33F, one revertant of 2B33F, 2B20L and one revertant of 2B20L. The 2B33F and 2B20L strains are ts and ca and are described in U.S. Serial No. 08/059,444, which is hereby incorporated by reference. The regions containing the mutations in 2B33F and 2B20L were identified first, and two further sets of "revertant" isolates were then sequenced in those regions. These were nine isolates of 2B33F obtained after in vitro passaging at 39°C and in vivo passaging in African Green Monkeys or chimpanzees, and nine isolates of 2B20L obtained after in vitro passaging at 39°C. The ts, ca and attenuation phenotypes of many of these revertants have now been characterized and assessed. Correlations between the ts phenotype, vaccine attenuation and sequence changes have been identified.
A summary of results is presented in Tables 30 7-12.
Table 7
Sequence comparison between RSV 2B and 2B33F strains
Gene/region | Nucl. pos.† (3' end of vRNA) | RSV 2B | RSV 2B33F | RSV 2B33F 5a revertant | Amino acid change
Genomic promoter | 4 | C | G | G | non-coding
Genomic promoter | 6 | - | extra A | extra A | non-coding
M | 4175 | T | C | C | non-coding
M | 4199 | T | C | C | non-coding
SH | 4329 | T | C | C | Phe-Leu
SH | 4409 | T | C | C | none, Ile (36)
SH | 4420 | T | C | C | Ile-Thr
SH | 4442 | T | C | C | none, His (47)
SH | 4454 | T | C | C | none, Cys (51)
SH | 4484 | T | C | C | none, Tyr (61)
SH | 4497 | T | C | C | Stop-Gln (66)
SH | 4505 | T | C | C | none, Ser (68)
SH | 4525 | T | C | C | Ile-Thr
SH | 4526 | T | C | C | Ile-Thr
SH | 4542 | T | C | C | Stop-Gln (81)
SH | 4561 | T | C | C | Leu-Pro (87)
SH | 4575 | T | C | C | Trp-Arg (92)
SH | 4598 | T | C | C | none, Thr (99)
L | 9559 | G | A | A | Arg-Lys (353)
L | 9853 | A | G | A | Lys-Arg (451)*
L | 12186 | G | A | A | Asp-Asn (1229)
L | 14587 | C | T | T | Thr-Ile (2029)
L | 15071 | A | G | G | non-coding
† For 2B33F and 2B33F 5a, nucleotide position numbers are one larger than for 2B in the M, SH and L genes.
* At position 9853, the Lys-Arg change has reverted back to Lys in the 2B33F 5a strain.

Table 8
Sequence comparison between RSV 2B and 2B20L strains
Gene/region | Nucl. pos.† (3' end of vRNA) | RSV 2B | RSV 2B20L | RSV 2B20L R1 revertant | Amino acid change
Genomic promoter | 4 | C | G | G | non-coding*
Genomic promoter | 6 | - | extra A | extra A | non-coding*
L | 8963 | C | T | T | none, Thr (154)
L | 13347 | A | A | G | Asn-Asp (1616)
L | 14587 | C | T | T | Thr-Ile (2029)
L | 14649 | A | G | G | Asn-Asp (2050)
L | 14650 | A | A | T | Asn-Asp-Val (2050)
† For 2B20L and 2B20L R1, nucleotide position numbers are one larger than for 2B in the L gene.
* Mutation common to the 2B33F and 2B20L strains.
At position 14650, the mutation suppresses the ts phenotype in the 2B20L R1 revertant.

Table 9
RSV 2B, ts and Revertant Strains
Strain | Sample source | ts: 39/32°C EOP | ca: 20/32°C yield | Plaque morphology | In vivo growth*: cotton rat (nasal turbinates, lungs); AGM (nasal wash, bronchial lavage)
RSV 2B | Wild-type parent strain | 0.7 | 0.0001 | WT | 5.58 5.8a 5.80 4.76 3.9b 5.2b (4/4) (4/4)
RSV 2B33F | ca, ts mutant isolated from 2B, cold-passaged x 33 | 0.00007 | 0.04 | sp/int/wt | 6a 3.0e <0.9 <1.9b <1.2b (0/4) (1/4) (0/4)
RSV 2B33F 5a | 2B33F spinner passage, plaque picked at 39°C | 0.5 | 0.03 | WT | 1l.7 3.5a 4.2e 4.0e (4/4)
RSV 2B33F 4a | 2B33F spinner passage, plaque picked at 39°C | 0.7 | 0.01 | WT | -1.71 3.8 ND ND (4/4)
RSV 2B33F 3b | 2B33F spinner passage, plaque picked at 39°C | 0.5 | 0.04 | WT | 2.96 ND ND (4/4)
AGM pp2 | 2B33F-infected AGM #A2, d7 nasal wash, plaque picked at 32°C | 0.3 | 0.00002 | sp, int | 2.0b 1.6b ND ND (4/4)
AGM pp4 | 2B33F-infected AGM #A2, d7 nasal wash, plaque picked at 32°C | 0.1 | 0.008 | sp, int | ND (4/4)
AGM pp6 | 2B33F-infected AGM #A4, d12 nasal wash, plaque picked at 32°C | 0.000004 | -0.00005 | wt | j.S5b <1.l ND ND (0/4)
AGM pp7 | 2B33F-infected AGM #A4, d12 nasal wash, plaque picked at 32°C | 0.000004 | 0.007 | sp/int/wt | l1.4b <1.b ND ND (0/4)
Chimp pp1A | 2B33F-infected chimp #1552, d4 tracheal lavage, plaque picked at 32°C | 0.5 | ND | WT | ND ND
Chimp pp3A | 2B33F-infected chimp #1560, d6 tracheal lavage, plaque picked at 32°C | 0.7 | ND | WT | 2.4 :0c ND ND (3/4)
Chimp pp5A | 2B33F-infected chimp #1563, d10 nasal swab, plaque picked at 32°C | 0.7 | ND | WT | 2.3c 3.0 ND ND (4/4)
RSV 2B20L | ca, ts mutant isolated from 2B, cold-passaged x 20 | 0.0002 | 0.02 | int/wt | <1.9d (0/4) (0/4) (0/2) (0/2)
RSV 2B20L R1 TS(+) | 2B20L spinner passage, plaque picked at 39°C | 0.6 | ND | | 12.3c 15 ND ND (4/4)
RSV 2B20L R2 TS(+) | 2B20L spinner passage, plaque picked at 39°C | 0.6 | ND | WT | 2.5c 2.70 ND ND (4/4)
RSV 2B20L R9 TS(+) | 2B20L spinner passage, plaque picked at 39°C | 0.8 | ND | WT | 2.2c 4.0 ND ND
RSV 2B20L R10 TS(+) | 2B20L spinner passage, plaque picked at 39°C | 0.7 | ND | WT | 2.6 3.2 ND ND (4/4) (4/4)
* In vivo growth measured as log10 mean virus titer (# infected/# total). ND = not done; WT/wt = wild-type plaque size; sp = small plaque size; int = intermediate plaque size. Doses: a 10^6.7 PFU IN; b 10^3.6 PFU IN; c 10^6.3 PFU IN; d 10^5.9 PFU IN; e 10^6.6 PFU IN+IT; 10^6… PFU IN+IT.

Table 10
2B33F Revertants
Gene | Base no.† | In vitro: 5a 4a 3b | AGM: pp2 pp4 pp6 pp7 | Chimp: pp1A pp3A pp5A
M | 4176, 4200 | S S S | S S S S | S S S
SH | 14 bases* | S S S | S S S S | S S S
L | 9560 | S S S | S S S S | S S S
L | 9854 | 2B 2B 2B | 2B S S S | ND 2B 2B
L | 12187 | S S S | S S S S | S S S
L | 14588 | S S S | S S S S | ND S S
L | 15072 | S S S | S S S S | S S S
Phenotype: ts | | 2B 2B 2B | r r S S | 2B 2B 2B
Phenotype: ca | | S S S | 2B S 2B S | ND ND ND
Attenuated | | r r r S S ND r r
† These 2B33F revertant base numbers are one larger than for 2B in the M, SH and L genes.
* Bases 4330, 4410, 4421, 4443, 4455, 4485, 4498, 4506, 4526, 4527, 4543, 4562, 4576 and 4599.
S = same base as 2B33F; 2B = reversion to the 2B base, or complete reversion in phenotype; r = moderate reversion in phenotype; slight reversion in phenotype is indicated by a separate symbol; ND = not done.

Table 11
2B20L Revertants
Gene | Base no.† | In vitro isolates: R1 R2 R3A R4A R5A R6A R7A R8A R9A R10
L | 8964 | S S S S S S S S S S
L | 13348 | C* S ND S S ND S S S S
L | 14588 | S S S S S S S S S S
L | 14650 | S S 2B S 2B 2B S S 2B 2B
L | 14651 | A* A* S A* S S A* A* S S
Phenotype: ts | | 2B 2B ND ND ND ND ND ND 2B 2B
Attenuated | | r r ND ND ND ND ND ND r r
† These 2B20L revertant base numbers are one larger than for 2B in the L gene.
S = same base as 2B20L; 2B = reversion to the 2B base; r = moderate reversion in phenotype; * = base change different from both 2B and 2B20L; ND = not done.

Table 12
RSV 2B, ts and Revertant Strains: Phenotype Summary
Virus isolate | Source | In vitro phenotype (ts, ca) | In vivo attenuation (cotton rat, AGM)
RSV 2B | Wild-type parent strain
RSV 2B33F | ca, ts mutant isolated from 2B, cold-passaged x 33
RSV 2B33F-5a TS(+) | 2B33F spinner passage, plaque picked at 39°C
RSV 2B33F-4a TS(+) | 2B33F spinner passage, plaque picked at 39°C | ND
RSV 2B33F-3b | 2B33F spinner passage, plaque picked at 39°C | ND
AGM pp2 | 2B33F-infected AGM A2, d7 nasal wash, plaque picked at 32°C | ND
AGM pp4 | 2B33F-infected AGM A2, d7 nasal wash, plaque picked at 32°C | ND
AGM pp6 | 2B33F-infected AGM A4, d12 nasal wash, plaque picked at 32°C | ND
AGM pp7 | 2B33F-infected AGM A4, d12 nasal wash, plaque picked at 32°C | ND
Chimp pp1A | 2B33F-infected chimp #1552, d4 tracheal lavage, plaque picked at 32°C | ND ND ND
Chimp pp3A | 2B33F-infected chimp #1560, d6 tracheal lavage, plaque picked at 32°C | ND ND
Chimp pp5A | 2B33F-infected chimp #1563, d10 tracheal lavage, plaque picked at 32°C | ND ND
RSV 2B20L | ca, ts mutant isolated from 2B, cold-passaged x 20
RSV 2B20L R1 | 2B20L spinner passage, plaque picked at 39°C | ND ND
RSV 2B20L R2 | 2B20L spinner passage, plaque picked at 39°C | ND ND
RSV 2B20L R9 TS(+) | 2B20L spinner passage, plaque picked at 39°C | ND
RSV 2B20L R10 | 2B20L spinner passage, plaque picked at 39°C | ND ND
ND = not done; - = wild-type phenotype (not temperature sensitive, not cold adapted, not attenuated); successively higher grades indicate increasing levels of temperature sensitivity, cold adaptation or attenuation.
Several significant observations can be drawn from these data:
a. As shown in Tables 7 (for 2B33F) and 8 (for 2B20L), relatively few sequence changes were identified in the two mutant strains. RSV 2B33F differs from parental RSV 2B by two changes in the 3' genomic promoter region, two changes in the non-coding region of the M gene, and four coding changes plus one non-coding (poly(A) motif) change in the L gene, which encodes the RNA-dependent RNA polymerase. In addition, 14 changes mapped to the SH gene alone. RSV 2B20L differs from its RSV 2B parent at only seven nucleotide positions, of which three are shared with 2B33F: the two changes in the 3' genomic promoter and one coding change in the L gene. Two additional changes unique to 2B20L mapped to the coding region of the L gene.
Potentially attenuating mutations in the non-coding 3' genomic promoter region and in the RNA-dependent RNA polymerase gene have thus been identified.
b. Two ts mutations can be identified in the L gene of the attenuated virus strains 2B33F and 2B20L:
(i) In 2B33F, a mutation at nucleotide position 9853 (A→G), leading to a coding change in the L protein at amino acid 451 (Lys→Arg), is clearly associated with the ts and attenuation phenotypes. Reversion at this site alone in the 2B33F 5a strain is responsible for complete restoration of growth at 39°C (Table 9) and partial reversion of attenuation in animals. This association with the ts and attenuation phenotypes was also supported by partial sequence analyses of six additional "full TS revertants" (designated 4a, 3b, pp2, 3A, 5a, 5A) isolated from cell culture and from chimps, in which only the nucleotide 9853 mutation reverted (Tables 10-12); note that one AGM (African Green Monkey) isolate that reverted at 9853 only partially reverted in ts phenotype. This amino acid 451 mutation (Lys→Arg) is amenable to stabilization in cDNA infectious clone constructs by inserting a second mutation into the codon, thereby lessening the likelihood that it will revert back to Lys.
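The stabilization idea can be checked against the standard genetic code. Lys is encoded by AAA and AAG, and the A→G change at the second position of codon 451 yields AGA or AGG (Arg), which a single G→A back-mutation would return to Lys. The sketch below computes, for every Arg codon, the minimum number of substitutions needed to encode Lys again; the helper names are hypothetical, and which second mutation applicants would actually introduce is not stated here:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_for(aa: str) -> list[str]:
    """All codons (DNA alphabet, message sense) encoding one-letter amino acid `aa`."""
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def steps_to(codon: str, target_aa: str) -> int:
    """Minimum number of single-base substitutions needed to encode `target_aa`."""
    return min(hamming(codon, t) for t in codons_for(target_aa))


for arg_codon in codons_for("R"):
    print(f"{arg_codon}: {steps_to(arg_codon, 'K')} substitution(s) from Lys")
# AGA: 1, AGG: 1, CGA: 2, CGC: 3, CGG: 2, CGT: 3
```

By this count, one additional A→C change at the first position (AGA→CGA or AGG→CGG) keeps Arg while doubling the number of substitutions a Lys revertant would need.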
(ii) In 2B20L, a mutation at base 14,649 (A→G), leading to a coding change in the L protein at amino acid position 2,050 (Asn→Asp), appears to be associated with the ts and attenuation phenotypes. In the revertants, this aspartic acid at amino acid 2050 invariably either reverts back (Asp→Asn) or changes to a different amino acid (Asp→Val) through a nucleotide substitution at position 14,650 (A→T) (Tables 8 and 11). This observation is based on complete sequence analysis of revertant R1 and partial sequencing of selected regions in several additional revertants (R2, R4A, R7A, R8A) (Table 11). An additional mutation is seen in the R1 revertant at nucleotide position 13,347 (amino acid 1616, Asn→Asp), accompanying the above reversion. However, the effect of this mutation on the ts phenotype is not known; the L gene of the other revertants has not been sequenced completely.
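The substitutions at L amino acid 2050 can be traced codon by codon. The third base of this codon is not given above, so the sketch assumes a T (AAT for Asn); because Asn (AAT/AAC), Asp (GAT/GAC) and Val (GTN) are all unchanged by a T/C difference at the third position, the result does not depend on that assumption. Helper names are illustrative:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
THREE_LETTER = {"N": "Asn", "D": "Asp", "V": "Val"}


def substitute(codon: str, position: int, base: str) -> str:
    """Return `codon` with the base at 1-based `position` replaced by `base`."""
    i = position - 1
    return codon[:i] + base + codon[i + 1:]


parent = "AAT"                          # assumed 2B codon for Asn-2050 (third base assumed)
mutant = substitute(parent, 1, "G")     # first codon position = nucleotide 14,649, A->G
revertant = substitute(mutant, 2, "T")  # second codon position = nucleotide 14,650, A->T

for label, codon in [("2B", parent), ("2B20L", mutant), ("R1", revertant)]:
    print(label, codon, THREE_LETTER[CODON_TABLE[codon]])
# 2B AAT Asn
# 2B20L GAT Asp
# R1 GTT Val
```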
c. Three base changes are common to the 2B33F and 2B20L strains of virus:
(i) A change at position 14,587 (C→T), with a corresponding change (Thr→Ile) at amino acid 2029, is present in both 2B33F and 2B20L (Tables 7 and 8). This nucleotide substitution was found to be present in … of the population of the progenitor RSV 2B strain and may have been preferred during the attenuation process. No wild-type base was found at this position in the 2B33F and 2B20L viruses.
(ii) Two mutations are seen in the 3' genomic promoter region of 2B33F and 2B20L: nucleotide 4 (C→G) and the insertion of an extra A in the stretch of A's at positions 6-11 (in antigenomic, message sense). When the sequences of selected revertants were analyzed, these mutations were found to have been retained in the 2B33F TS(+)5a (Table 7) and 2B20L TS(+)R1 (Table 8) revertants. These non-coding, cis-acting mutations remained associated with partial viral attenuation.
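The one-base offset noted in the footnotes to Tables 7, 8, 10 and 11 follows from this insertion: every position downstream of the lengthened A-run is shifted by one in 2B33F and 2B20L. A minimal Python sketch of the conversion, assuming the extra A is the only length difference between the genomes upstream of the positions being converted; the constants and function names are illustrative:

```python
A_RUN_START, A_RUN_END = 6, 11  # parental (2B) positions of the A-run, antigenomic sense
INSERTED = 1                    # one extra A in 2B33F and 2B20L


def mutant_to_parent(pos: int) -> int:
    """Map a 2B33F/2B20L antigenomic position to the equivalent 2B position."""
    if pos < A_RUN_START:
        return pos                      # upstream of the insertion: unchanged
    if pos <= A_RUN_END + INSERTED:
        raise ValueError("inside the lengthened A-run: no unique parental position")
    return pos - INSERTED               # downstream: shifted by the insertion


def parent_to_mutant(pos: int) -> int:
    """Inverse mapping for positions outside the A-run."""
    if pos < A_RUN_START:
        return pos
    if pos <= A_RUN_END:
        raise ValueError("inside the A-run: no unique mutant position")
    return pos + INSERTED


# 2B33F revertant positions (Table 10 numbering) back to 2B numbering (Table 7).
for mutant_pos in (4176, 4200, 9560, 9854, 12187, 14588, 15072):
    print(mutant_pos, "->", mutant_to_parent(mutant_pos))
```

The printed values (4175, 4199, 9559, 9853, 12186, 14587, 15071) match the 2B positions listed in Table 7.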
Expression analysis of these cis-acting changes using the RSV-CAT minireplicon system has shown that the 3' genomic promoter nucleotide 4 (C→G) change upregulates transcription/replication in this in vitro system when the 2B progenitor virus or either the 2B33F or 2B33F TS(+)5a virus provides the helper L gene functions (the N, P and M2 genes are identical in these viruses). Complementation analysis of the 2B33F 3' genomic promoter with helper functions provided by the progenitor RSV 2B virus or by the 2B33F and 2B33F TS(+)5a viruses has also been conducted with this RSV-CAT minireplicon system. All three viruses supported transcription/replication mediated by both the 2B and the 2B33F 3' genomic promoters; however, the 2B33F and 2B33F TS(+)5a viruses preferred the 2B33F 3' genomic promoter. This analysis clearly shows coevolution of the 3' genomic promoter changes and the RNA-dependent RNA polymerase gene during the vaccine attenuation process. Reversion of the ts phenotype in the 2B33F mutant 5a, which sequence analysis attributed to reversion of the single L protein amino acid 451 (Arg→Lys), was clearly demonstrated by its support of RSV-CAT minireplicon transcription/replication functions at 37°C. The 2B33F virus did not provide helper functions to the RSV-CAT minireplicon (with either the 2B or the 2B33F 3' genomic promoter) at 37°C.
d. The biased hypermutation of the SH gene seen in 2B33F is present in all 2B33F revertants, regardless of phenotype, and is not seen in 2B20L, which is ts, ca and attenuated. Thus, there are no data at this time that associate this mutation with any biological phenotype.
Another wild-type RSV, designated 18537, was also sequenced and compared with the sequence of the wild-type RSV 2B strain. With one exception, the two wild-type strains were identical at all of the critical residues described above. For 2B, the codon ACA at nucleotides 14586-14588 encodes Thr at amino acid 2029 of the L protein, whereas for 18537 the codon ATT at nucleotides 14593-14595 encodes Ile at amino acid 2029 (the L gene start codon is at nucleotides 8509-8511 in 18537, compared with 8502-8504 in 2B).
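The coordinates quoted above follow from the rule that codon n of an open reading frame whose start codon begins at nucleotide s occupies nucleotides s + 3(n - 1) through s + 3(n - 1) + 2. A short Python check of that arithmetic, using the start positions given in the text and the standard genetic code; the helper names are illustrative:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_span(start_codon_first_nt: int, residue: int) -> tuple[int, int]:
    """Nucleotide span (antigenomic sense) of codon `residue` (1-based)."""
    first = start_codon_first_nt + 3 * (residue - 1)
    return first, first + 2


def codon_position(start_codon_first_nt: int, nucleotide: int) -> tuple[int, int]:
    """Return (residue number, position within codon 1-3) for a nucleotide in the ORF."""
    offset = nucleotide - start_codon_first_nt
    return offset // 3 + 1, offset % 3 + 1


L_START = {"2B": 8502, "18537": 8509}  # first nucleotide of the L start codon, per the text

print(codon_span(L_START["2B"], 2029))         # (14586, 14588)
print(codon_span(L_START["18537"], 2029))      # (14593, 14595)
print(codon_position(L_START["2B"], 9853))     # (451, 2)
print(codon_position(L_START["2B"], 14587))    # (2029, 2)
print(CODON_TABLE["ACA"], CODON_TABLE["ATT"])  # T I
```

The same rule places nucleotide 9853 at the second base of codon 451, consistent with the AAx (Lys) to AGx (Arg) change described for 2B33F.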
Example 4
PCR Assay to Detect Measles Virus
A 21-year-old patient was admitted to a hospital with a three-week history of progressive nonproductive cough, shortness of breath, and fever. His symptoms failed to improve following treatment with clarithromycin for seven days or after a similar course of treatment with atovaquone. Concomitant complaints of right upper quadrant abdominal pain proved recalcitrant to omeprazole and antacids. Relevant past medical history included Factor VIII deficiency and HIV infection diagnosed 3-4 years prior to this hospital admission. One year earlier, he had received a booster immunization of measles-mumps-rubella (MMR) vaccine, as required for college enrollment.
Bronchoalveolar lavage and transbronchial biopsies performed two days after admission to the hospital demonstrated reactive hyperplasia and alveolar lining cell desquamation with minimal chronic inflammation. No microorganisms were revealed by Gram, methenamine silver, or PAS stains. CT scans of the chest showed multiple, ill-defined, confluent nodules at the left lung base. Despite administration of empiric antimicrobials for the opportunistic bacterial, mycobacterial, and fungal pathogens commonly responsible for pulmonary complications of advanced HIV disease, the patient became and remained febrile to 39°C. A left-sided pleural effusion developed; diagnostic thoracentesis showed it to be exudative but otherwise non-diagnostic. Bronchoalveolar lavage performed three weeks later demonstrated only alveolar histiocytes, some of which were hemosiderin-laden, a few lymphocytes, and neutrophils. Fite, AFB, and methenamine silver stains again were negative.
Two weeks thereafter, a wedge resection of the left lung was performed through CT-guided minithoracotomy. Multiple tissue sections revealed nodular areas of acute and chronic inflammation with regions of necrosis and fibrosis. Numerous multinucleated giant cells were present, some of which contained both intracytoplasmic and intranuclear inclusions suggestive of measles virus giant cell pneumonia. Special stains for bacteria, fungi, P. carinii, and acid-fast organisms again gave negative results. Electron microscopic examination of sections of this lung biopsy revealed particles morphologically consistent with paramyxoviruses such as measles virus.
Serum anti-measles IgM titers determined by a solid-phase hemadsorption assay were negative, as was a subsequent IgM capture immunoassay.
Two weeks later, Rhesus monkey kidney (RMK) tissue culture cells inoculated with the patient's lung biopsy material revealed cytopathic changes characteristic of measles virus infection.
Confirmation was obtained using an immunofluorescence assay with monoclonal antibodies directed against measles virus. On the basis of this diagnosis, oral ribavirin (1000 mg B.I.D.) was given for 14 days. Unfortunately, the patient progressively deteriorated and died two months later.
To ascertain the nature of the measles virus present in the patient, reverse transcription and PCR amplification of virus obtained from infected tissues were performed, followed by sequence analysis. The measles virus isolated from Rhesus monkey kidney cells inoculated with tissue from this patient's lung biopsy was propagated by two serial passages in the continuous Vero (monkey kidney) tissue culture cell line. Total infected-cell RNA was extracted at the second Vero cell passage using TRIzol reagent (Life Technologies, Grand Island, NY) according to the manufacturer's protocol. Total RNA was similarly extracted from the patient's lung biopsy material. The measles virus vaccine strain (Moraten) currently used in the United States as a component of the trivalent MMR vaccines was obtained in its univalent form (Attenuvax, Merck Sharp & Dohme). This virus was passaged once in Vero cells, and total vaccine-infected cellular RNA was then extracted as described above.
Each of these RNA preparations was reverse transcribed (RT) to cDNA using random hexameric primers and Moloney murine leukemia virus reverse transcriptase (Perkin-Elmer/Cetus RT-PCR kit reagents, Perkin-Elmer-Cetus, Branchburg, NJ). The cDNA was then amplified by PCR using measles virus-specific oligodeoxynucleotide primer pairs whose design was based on the Edmonston measles virus sequence described above. These PCR products comprised a set of overlapping DNA fragments spanning the entire 15,894-nucleotide measles genome. A consensus genomic sequence was established by direct analysis of each PCR product, without cloning, using the dideoxy terminator cycle-sequencing method specified by the manufacturer (ABI PRISM 377 sequencer and ABI PRISM DNA sequencing kit; Perkin-Elmer/Cetus, Foster City, CA). Both strands of the PCR-amplified DNA products were analyzed to eliminate possible sequencing ambiguities.
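Tiling a 15,894-nucleotide genome with overlapping amplicons is easy to check before sequencing: sort the amplicons and look for any stretch not covered by at least one of them. A minimal Python sketch; the amplicon coordinates below are hypothetical and are not the primer set used in this example:

```python
def coverage_gaps(amplicons: list[tuple[int, int]], genome_length: int) -> list[tuple[int, int]]:
    """Return uncovered 1-based intervals of a genome tiled by amplicons.

    `amplicons` are inclusive (start, end) genome coordinates and need not be
    sorted. An empty result means every base is covered at least once.
    """
    gaps = []
    next_uncovered = 1
    for start, end in sorted(amplicons):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= genome_length:
        gaps.append((next_uncovered, genome_length))
    return gaps


MEASLES_GENOME_NT = 15894
# Hypothetical amplicon coordinates, for illustration only.
tiles = [(1, 2100), (1900, 4200), (4000, 6300), (6100, 8400), (8200, 10500),
         (10300, 12600), (12400, 14700), (14500, 15894)]
print(coverage_gaps(tiles, MEASLES_GENOME_NT))                  # []
print(coverage_gaps(tiles[:3] + tiles[4:], MEASLES_GENOME_NT))  # [(6301, 8199)]
```

Dropping one amplicon from the hypothetical set exposes the uncovered stretch, which is the situation the overlapping design is meant to prevent.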
The nucleotide sequences of selected regions of the measles virus genomes present in the patient's viral isolate, as well as in the diseased lung tissue, were compared with that of the Moraten vaccine virus and with the nucleotide sequences of other measles virus wild-type and vaccine strains. This sequence analysis revealed identity to the Moraten vaccine strain rather than relatedness to past or currently circulating wild-type viruses or to other measles vaccine strains.
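Assigning the patient's isolate to the closest reference strain can be sketched as a percent-identity comparison over pre-aligned regions, followed by choosing the best-scoring reference. This is a simplified illustration under stated assumptions (equal-length, pre-aligned fragments; hypothetical strain labels and sequences), not the analysis actually performed:

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("need non-empty sequences of equal (aligned) length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def closest_reference(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Return the name and identity of the reference most similar to `query`."""
    scores = {name: percent_identity(query, seq) for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical aligned fragments; real comparisons use much longer regions.
references = {
    "vaccine_strain": "ATGGCTTCAGGAAGCATCA",
    "wild_type_A":    "ATGGCCTCGGGAAGTATCA",
    "wild_type_B":    "ATGGCATCAGGTAGCATTA",
}
isolate = "ATGGCTTCAGGAAGCATCA"
print(closest_reference(isolate, references))  # ('vaccine_strain', 100.0)
```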
Example 5
ELISA to Detect RSV
An ELISA test is used to detect the presence of RSV. Peptides are designed and selected, based on homologies to the RSV sequences described herein, to be specific either for all subgroup B strains or for individual wild-type, vaccine or revertant RSV subgroup B strains described herein. These peptides are then coupled to KLH and used to immunize rabbits for the production of monospecific polyclonal antibody. A selection of these polyclonal antibodies, or a combination of polyclonal and monoclonal antibodies, is then used in a "capture ELISA" to detect the presence of an RSV antigen.
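One simple way to shortlist candidate peptides from aligned sequences, offered only as an illustration and not as applicants' selection procedure, is to scan fixed-length windows that are identical in every strain of the target group and differ from all other strains at the same alignment position. The sequences and the window length k below are hypothetical; real peptide design would also weigh antigenicity, surface exposure and solubility, which this sketch ignores:

```python
def conserved_windows(targets: list[str], others: list[str], k: int) -> list[tuple[int, str]]:
    """Return (1-based start, peptide) for k-residue windows shared by all targets
    and absent, at the same alignment position, from every non-target sequence.

    All sequences are assumed to be pre-aligned protein sequences of equal length.
    """
    length = len(targets[0])
    if any(len(s) != length for s in targets + others):
        raise ValueError("all sequences must be aligned to the same length")
    hits = []
    for start in range(length - k + 1):
        window = targets[0][start:start + k]
        if "-" in window:
            continue  # skip windows that span alignment gaps
        shared = all(t[start:start + k] == window for t in targets)
        unique = all(o[start:start + k] != window for o in others)
        if shared and unique:
            hits.append((start + 1, window))
    return hits


# Hypothetical aligned fragments (not real RSV sequences).
subgroup_b = ["MSKNKDQRTAKTLERTWDTL", "MSKNKDQRTAKTLEKTWDTL"]
subgroup_a = ["MEKNKDQRTAKTLERTWDTL"]
print(conserved_windows(subgroup_b, subgroup_a, k=6))  # [(1, 'MSKNKD'), (2, 'SKNKDQ')]
```

Passing a single strain as the target group and all other strains as `others` gives the strain-specific case instead of the group-specific one.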
Bibliography
1. Kapikian, et al., Am. J. Epidemiol., 89, 405-421 (1969).
2. Chin, et al., Am. J. Epidemiol., 89, 449-463 (1969).
3. Fulginiti, et al., Am. J. Epidemiol., 89, 435-448 (1969).
4. Prince, et al., J. Virology, 57, 721-728 (1986).
5. Kim, et al., Pediatrics, 52, 56-63 (1973).
6. Hodes, et al., Proc. Soc. Exp. Biol. Med., 145, 1158-1164 (1974).
7. Belshe, and Hissom, J. Med. Virol., 10, 235-242 (1982).
8. Black, et al., Am. J. Epidemiol., 124, 442-452 (1986).
9. Lennon, and Black, J. Pediatrics, 108, 671-676 (1986).
10. Pabst, et al., Pediatr. Infect. Dis. J., 11, 525-529 (1992).
11. Centers for Disease Control, MMWR, 369-372 (1991).
12. Centers for Disease Control, MMWR, 41:S6, 1-12 (1992).
13. King, et al., Pediatr. Infect. Dis. J., 10, 883-887 (1991).
14. Rota, et al., Virology, 188, 135-142 (1992).
15. Rota, et al., Virus Res., 31, 317-330 (1994).
16. Lamb, and Kolakofsky, pages 1177-1204 of Volume 1, Fields Virology, B.N. Fields, et al., Eds. (3rd ed., Raven Press, 1996).
17. Sidhu, et al., Virology, 193, …-81 (1993).
18. Garcin, et al., EMBO J., 14, 6087-6094 (1995).
19. Radecke, et al., EMBO J., 14, 5773-5783 (1995).
20. Collins, et al., Proc. Natl. Acad. Sci. USA, 92, 11563-11567 (1995).
21. Published European Patent Application No. 702,08…
22. Published International Application No. WO 96/1040…
23. Baron, and Barrett, J. Virology, 71, 1265-1271 (1997).
24. Published International Application No. WO 97/0627…
25. U.S. Provisional Patent Application No. 60/047575.
26. Published International Application No. WO 97/12032.
27. Kato, et al., Genes to Cells, 1, 569-579 (1996).
28. Sidhu, et al., Virology, 208, 800-807 (1995).
29. Shaffer, M.F., et al., J. Immunol., 41, 241-256 (1941).
30. Enders, et al., N. Engl. J. Med., 263, 153-159 (1960).
31. Enders, and Peebles, Proc. Soc. Exp. Biol. Med., 86, 227-286 (1954).
32. Schwarz, Am. J. Dis. Child., 103, 216-219 (1962).
33. Griffin, and Bellini, pages 1267-1312 of Volume 1, Fields Virology, B.N. Fields, et al., Eds. (3rd ed., Raven Press, 1996).
34. Birrer, et al., Virology, 108, 381-390 (1981).
35. Birrer, et al., Nature, 293, 67-69 (1981).
36. Norby, et al., pages 481-507, in The Paramyxoviruses, D. Kingsbury, Ed. (Plenum Press, 1991).
37. Peebles, pages 427-456, in The Paramyxoviruses, D. Kingsbury, Ed. (Plenum Press, 1991).
38. Egelman, et al., J. Virol., 63, 2233-2243 (1989).
39. Udem, et al., J. Virol. Methods, 8, 123-136 (1984).
40. Udem, and Cook, J. Virol., 49, 57-65 (1984).
41. Moyer, and Horikami, pages 249-274, in The Paramyxoviruses, D. Kingsbury, Ed. (Plenum Press, 1991).
42. Blumberg, et al., pages 235-247, in The Paramyxoviruses, D. Kingsbury, Ed. (Plenum Press, 1991).
43. Barrett, et al., pages 83-102, in The Paramyxoviruses, D. Kingsbury, Ed. (Plenum Press, 1991).
44. Tordo, et al., Sem. in Virology, 3, 341-357 (1992).
45. Cattaneo, et al., EMBO J., 6, 681-688 (1987).
46. Crole, et al., Virology, 164, 498-506 (1988).
47. Banerjee, and Barik, Virology, 188, 417-428 (1992).
48. Castaneda, and Wong, J. Virol., 63, 2977-2986 (1989).
49. Chan, et al., pages 221-231, in Genetics and Pathogenicity of Negative Stranded Viruses, B.W.J. Mahy and D. Kolakofsky, Eds. (Elsevier Biomedical Press, 1989).
50. Blumberg, et al., Cell, 23, 837-845 (1981).
51. Blumberg, et al., Cell, 32, 559-567 (1983).
52. Kolakofsky, and Blumberg, B.M., pages 203-213, in Virus Persistence, B.W.J. Mahy, et al., Eds. (Cambridge University Press, 1982).
53. Castaneda, and Wong, J. Virol., 64, 222-230 (1990).
54. Curran, and Kolakofsky, D., Virology, 182, 168-176 (1991).
55. Sidhu, et al., Virology, 193, 66-72 (1993).
56. Sidhu, et al., Virology, 202, 631-641 (1994).
57. Collins, et al., pages 1205-1241 of Volume 1, Fields Virology, B.N. Fields, et al., Eds. (3rd ed., Raven Press, 1996).
58. Crookshanks, and Belshe, J. Med. Virol., 13, 243-249 (1984).
59. Crookshanks-Newman, and Belshe, J. Med. Virol., 18, 131-137 (1986).
60. Hall, et al., Virus Res., 22, 173-184 (1992).
61. Karron, et al., J. Inf. Dis., 172, 1445-1450 (1995).
62. Anderson, et al., J. Infect. Dis., 151, 626-633 (1985).
63. Collins, pages 103-162 of The Paramyxoviruses, D.W. Kingsbury, Ed. (Plenum Press, NY and London, 1991).
64. Sullender, J. Virology, 65, 5425-5434 (1991).
65. Lerch, et al., J. Virology, 64, 5559-5569 (1990).
66. Mallipeddi, and Samal, J. Gen. Virol., 74, 2787-2791 (1993).
67. Johnson, et al., J. Virology, 61, 3163-3166 (1987).
68. Stott, et al., J. Virology, 61, 3855-3861 (1987).
69. Henderson, et al., N. Engl. J. Med., 300, 530-534 (1979).
70. Hall, et al., J. Infect. Dis., 163, 693-698 (1991).
71. Mufson, et al., J. Gen. Virol., 66, 2111-2124 (1985).
72. Glezen, et al., Am. J. Dis. Child., 140, 543-546 (1986).
73. Hemming, et al., Clin. Microbiol. Rev., 8, 22-33 (1995).
74. Collins, P.L., et al., pages 1313-1351 of Volume 1, Fields Virology, B.N. Fields, et al., Eds. (3rd ed., Raven Press, 1996).
75. Ling, and Pringle, J. Gen. Virol., 70, 1427-1440 (1989).
76. Yu, et al., J. Virology, 69, 2412-2419 (1995).
77. McIntosh, and Chanock, pages 1045-1072 of Virology, B.N. Fields, et al., Eds. (2nd ed., Raven Press, 1990).
78. Heminway, et al., page 167 of Abstracts of the IX International Congress of Virology, P17-2 (1993).
79. Mink, et al., Virology, 185, 615-624 (1991).
80. Dickens, et al., J. Virology, 52, 364-369 (1990).
81. Wagner, and Rose, pages 1121-1135 of Volume 1, Fields Virology, B.N. Fields, et al., Eds. (3rd ed., Raven Press, 1996).
82. Barik, J. Gen. Virol., 74, 485-490 (1993).
83. Collins, et al., pages 259-264 of Vaccines 93: Modern Approaches to New Vaccines Including Prevention of AIDS, F. Brown, et al., Eds. (Cold Spring Harbor Laboratory Press, NY, 1993).
84. Kuo, et al., J. Virology, 70, 6892-6901 (1996).
85. Huang, and Wertz, J. Virology, 43, 150-157 (1982).
86. Sambrook, et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989).
87. Ray, et al., J. Virol., 69, 1959-1963 (1995).
88. Ray, et al., J. Virol., 70, 580-584 (1996).
89. Stokes, et al., Virus Research, 43-52 (1993).
90. U.S. Patent Application No. 08/059,444.
Page(s) 4 b 21 are claims pages; they appear after the sequence listing.
SEQUENCE LISTING
(1) GENERAL INFORMATION:
(i) APPLICANT: Udem, Stephen A.
Sidhu, Mohinderjit S.
Tatem, Joanne M.
Murphy, Brian R.
Randolph, Valerie B.
(ii) TITLE OF INVENTION: 3' Genomic Promoter Region and Polymerase Gene Mutations Responsible for Attenuation in Viruses of the Order Designated Mononegavirales
(iii) NUMBER OF SEQUENCES: 79
(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: American Home Products Corporation
(B) STREET: One Campus Drive
(C) CITY: Parsippany
(D) STATE: New Jersey
(E) COUNTRY: United States
(F) ZIP: 07054
(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release Version #1.30
(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: US
(B) FILING DATE:
(C) CLASSIFICATION:
(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Gordon, Alan M.
(B) REGISTRATION NUMBER: 30,637
(C) REFERENCE/DOCKET NUMBER: 33,294 PCT
(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 973/683-2157
(B) TELEFAX: 973/683-4117
(2) INFORMATION FOR SEQ ID NO:1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15894 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: RNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
ACCAAACAAA GTTGGGTAAG GATAGATCAA TCAATGATCA TATTCTAGTG
CACTTAGGAT
TCAAGATCCT
TAAGGAGCTT
GTGGAGCCAT
TTACCACTCG
GCGGGCCCAA
GTCAATTGAT
TCCAGAGTGA
ATGAGGCGGA
GATGGTTCGA
TGATTCTGGG
CAGACACGGC
TAGTTGGTGA
AGGACCTCTC
GAAACAAACC
GATTAGCCAG
GACTGCATGA
AAATGGGGGA
GTGCAGGATC
ACTCCATGGG
GGCAAGAGAT
GTATCACTGC
ATTATCAGGG
AGCATTGTTC
CAGAGGAATC
ATCCAGACTT
ACTAACAGGG
TCAGAGGATC
CCAGTCACAA
CCAATACTTT
GAACAAGGAA
TACCATCCTA
AGCTGATTCG
ATTTAGATTG
CTTACGCCGA
CAGGATTGCT
TTTTATCCTG
ATTTGCTGGT
AACTGCACCC
ATACCCTCTG
AGGTTTGAAC
GGTAAGGAGG
CGAGGATGCA
ACAAGAGCAG
AAAAGAAACA
AAACACATTA
CTGGACCGGT
GCACTAATAG
ACCGATGACC
TCTGGCCTTA
TCACATGATG
ATCTCAGATA
GCCCAAATTT
GAGCTAAGAA
GAGAGAAAAT
TTCATGGTCG
GAAATGATAT
ACTATTAAGT
GAGTTATCCA
TACATGGTAA
CTCTGGAGCT
TTTGGCCGAT
TCAGCTGGAA
AGGCTTGTTT
GATTAGGGAT
AGGACAAACC
TTATAGTACC
TGGTCAGGTT
GTATATTATC
CTGACGTTAG
CCTTCGCATC
ATCCAATTAG
TTGAAGTGCA
GGGTCTTGCT
GGTGGATA;A
GGTTGGATGT
CTCTAATCCT
GTGACATTGA
TTGGGATAGA
CACTTGAGTC
TCCTGGAG,
ATGCCATGGG
CTTACTTTGA
AGGTCAGTTC
CAGAGATTGC
ATCCGAGATG
ACCCATTACA
AATCCCTGGA
AATTGGAAAC
CTTATTTGTG
CATAAGGCTG
AAGAGGTACC
TAGTGATCAA
AGACCCTGAG
CGCAAAGGCG
GTACACCCAA
GGTGAGGAAC
GGATATCAAG
TACATATATC
AACTATGTAT
CTTGATGAAC
CTCAATTCAG
AGTAGGAGTG
TCCAGCATAT
CACATTGGCA
AATGCATACT
GCCACACTTT
TCAGGATCCG
GATTCCTCAA
CCGGATGTGA
GAGTCTCCAG
TTAGAGGTTG
AACATGGAGG
TCCAGGTTCG
GGATTCAACA
GTTACGGCCC
CAAAGAAGGG
AGGATTGCCG
AGAACACCCG
GTAGAGGCAG
CCTGCTCTTG
CTTTACCAGC
AACAAGTTCA
GAACTTGAAA
TTTAGATTAG
TCTGAACTCG
ACTGAGGACA
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320
AGATCAGTAG
GTGAGAATGA
GAGAAGCCAG
CCCATCTTCC
CGCAGGACAG
CGGAAGAACA
ACTAGGTGCG
AAAACTTAGG
GAGCCGATGG
CTCAAGGCCG
ATATCAGACA
GGTCTCAGCA
CGCGGCCAGG
AATCTCCAGG
GCGGTTAAGG
AGCACCCTCT
GATACCGAGG
GCTTCTGATG
AGAGGCAACA
GGTAGGGCCA
TTTGGAACGG
CCCTCGGAAC
GCCGCACTGA
AATAATGAAG
AAAACAGCCT
CTGCTGTTAT
AGCGGTTGGA
GCTACCGAGA
GGAGAGCTAC
AACCGGCACA
TCGAAGGTCA
AGGCTCAGAC
AGAGGCCGAG
AACCAGGTCC
CAGAAGAGCA
AGCCCATCGG
ACCCAGGACA
AACCATGCCT
GACCTGGAGA
CATCAAGCAC
GAATCCAAGA
CAGGAGGAGA
GATATGCTAT
TTGAAACTGC
ACTTTCCGAA
GCACTTCCGA
AGATCGCGTC
CATCAGGGCC
TACAGGAGTG
AAGGGGGAGA
TGGCCAAAT
TGAAGGGAGA
CCCAGACAAG
TTGGGGGGCA
AGAGAAACCG
CCCCTAGACA
GCTGACGCCC
ACGGACACCC
GACCAGAACA
ACACAGCCGC
GGCACGCCAT
CTCACTGGCC
GGAGCGAGCC
CTCAGCAATT
GAGCGATGAC
TGGGTTACAG
TGCTGACTCT
CAATGAATCT
CACTGACCGG
AGAAGGAGGG
GCTTGGGAAA
GACACCCATT
TTTATTGACA
AGGTGCACCT
GACACCCGAA
CTATTATGAT
ACACGAGGAT
AGTTGAGTCA
CCCAAGTATC
AGGAAGATAG
GGCCCAGCAG
TTGACACTGC
TGCTTAGGCT
CTATAGTGTA
ACATCCGCCT
CAGCCCATCA
GTCAAAAACG
ATCGAGGAAG
ACCTGCAGGG
GGATCAACTG
GACGCTGAAA
TGTTATTATG
ATCATGGTTC
GAAAACAGCG
GGATCTGCTC
GAGATCCACG
ACTCTCAATG
AAAAAGGGCA
GGTGGTGCAA
GCGGGGAATG
TCTGGTACCA
GATGAGCTGT
AATCAGAAGA
ATTAAGAAGC
ATTTCTACAC
GAGGGTCAAA
AGCAAGTGAT
ATCGGAGTCC
GCAAGCCATG
CAATGACAGA
ACCCTCCATC
ACCATCCACT
GACTGGAATG
CTATGGCAGC
AAGAGAAGGC
AAGGCGGTGC
CTTTGGGAAT
TTTATGATCA
AATCAGGCCT
ATGTGGATAT
CCATCTCTAT
AGCTCCTGAG
TTCCTCCGCC
CAGACGCGAG
CCCAATGTGC
TCCCCGAGTG
CAATCTCCCC
TCTCTGATGT
TAATCTCCAA
AGATCAACAG
GGTGATCAAA
CAGAGTCGAG
GCGAGAGCTG
AGCCAAGATC
GCAGGAATCT
AATCTTCTAG
ATTGTTATAA
CCCACGATTG
CATCCGGGCT
ATGGTCAGAA
AGGCAGTTCG
ACCTCGCATC
CCCCCCAAGA
CAGCGGTGAA
TGATGGTGAT
TGGCGAACCT
GGGGTTCAGG
ACTCCAATCC
CCCGGACCCC
ATTAGCCTCA
TCGAAAGTCA
TGTGAGCAAT
GAGATCCCAG
CCAAGATATT
GCTAGAATCA
GCAAAATATC
1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880
AGCATATCCA
AAGGATCCCA
GGCAGAGATT
CTCCAAGGAA
CTAAAGCCGA
GCATCACGCA
CGTTACCTGA
CAGATGCTGA
CCAGTCGACC
GCCTCCCAAG
AAGGGTCGAT
TCAGAGTCAT
TGCTGGGGGT
CCCTGCCCTT
CTGAGCTTGA
ACAACACCCC
TCAACGCAAA
TCCGTGTTGT
GAAGAATGCT
GGATTGACAA
CAACATTTAT
ATTATTGCAA
GCACCAGTCT
GGTTCAAGAA
TCTGGAGGAG
AAGAATTCCG
CCCTGGAAGG
ACGACCCCAC
CAGGCCGAGC
TGACAAATGG
TCGGGAAAAA
GTGTAATCCG
TGACTCTCCT
TGAAGATAAT
CAACTAGTAC
TTCCACAATG
CGCTCCGATA
AGATCCTGGT
TGTTGAGGGC
AGGTGTTGGC
CATAGTTGTT
ACTAACTCTC
CCAAGTGTGC
TTATATGAGC
GGAATTCAGA
GGCGATAGGC
GGTCCACATC
AATGAAAATC
TCACATTAGA
GACCTTATGT
CAGATGCAAG
CATTTACGAC
ACACCTCTCA
TGCAGATGTC
ACTGGCCGAA
ACGGACCAGT
GATGAGCTCA
CTCCATTATA
TGATGATATC
AATGAAGTAG
AACCTAAATC
ACAGAGATCT
CAACCCACCA
CTAGGCGACA
AGCGATCCCC
AGATCCACAG
AGACGTACAG
CTCACACCTT
AATGCGGTTA
ATCACCCGTC
TCGGTCAATG
CCTGGGAAGA
GGGAACTTCA
GAAAAGATGG
AGCACAGGCA
TACCCGCTGA
ATAGTAAGAA
GACGTGATCA
AGCATCATGA
GAAATCAATC
GTTCTCAAGA
TCCAGAGGAC
GCCGTCGGGT
AAATCCAGCC
AAAGGAGCCA
CTACAGCTCA
CATTATAAAA
ACGACTTCGA
CCTACAGTGA
GGAAGGATGA
TAGGGCCTCC
CAAAGCCCGA
CAGGGCTCAA
GGAGAAAGGT
ATCTGATACC
TTTCGGATAA
CAGTGGCCTT
TCATCGACAA
GGAGAAAGAA
GCCTGGTTTT
AGATGAGCAA
TGGATATCAA
TCCAGGCAGT
TAAATGATGA
TCGCCATTCC
CCGACTTGAA
AACCCGTTGC
AGCTGCTGAA
TTGTTCCTGA
GGCTAGAGGA
ATGATCTTGC
ACTTACCTGC
AACTTAGGAG
CAAGTCGGCA
TGGCAGGCTG
ATGCTTTATG
AATCGGGCGA
AGAACTCCTC
TGAAAAACTG
CCTAACAACA
GCTCGATACC
CGGGTATTAC
CAACCTGCTG
TACAGAGCAA
GAGTGAAGTC
TGCACTTGGT
GACTCTCCAT
TGAAGACCTT
TTGCAGCCA
CCAAGGACTA
TGGACTTGGG
ACCCATCATA
CAGCCGACAA
GGAATTTCAG
CACCGGCCCT
GGATCGGAAG
CAAGTTCCAC
CAACCCCATG
CAAAGTGATT
TGGGACATCA
GTGCCCCAGG
TACATGTTTC
GCATTTGGGT
AAAGAGGCCA
GTGTTCTACA
GGGAGTGTCT
CCGCAGAGGT
ACCGTTCCTA
GTGACCCTTA
CTTCCTGAGG
TACTCTGCCG
GGGATAGGGG
GCACAACTCG
AATCGATTAC
TCAGTTCCTC
TTCAAAGTTC
2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440
TGTAGACCGT AGTGCCCAGC AATGCCCGAA AACGACCCCC CTCACAATGA CAGCCAGAAG 4500
GCCCGGACAA
GCCGACGGCA
CCACCAGCCA
TGCCCCCGAT
ATTGGAAGGC
GACCGAGGTG
ACTAAACAAA
CGGCGCCGCG
CCCCGGTGCC
AATCCAAGAC
GAGGAAGCCC
GGGCCACCAG
ACCCCAGCCC
CGAAGGACCC
CTCCTCCTTT
CCCACCCCTA
GGTCTCAAGG
ACCGGTCA;A
AGCTACAAAG
ATAACTCTCC
ACAGTTTTGG
CAGAGTGTAG
GCCCTAGGCG
CTGAACTCTC
GAGGCAATCA
AAAAGCCCCC
AGCGCGAACA
CCCCAATCTG
CCAAACCACC
CCCTCCCCCT
ACCCAACCGC
ACTTAGGGCC
CCCCCAACCC
CACAGGCAGG
GGGGGGGCCC
ACCCACCCCA
CTCCCAGACT
CGATCCGGCG
CCGAACCGCA
TCTCGAAGGG
AAGGAGACAC
TGAACGTCTC
TCCATTGGGG
TTATGACTCG
TCAATAACTG
AACCAATTAG
CTTCAAGTAG
TTGCCACAGC
AAGCCATCGA
GACAAGCAGG
TCCGAAAGAC
CCAGGCGGCC
CATCCTCCTC
AACCGCATCC
CTTCCTCAAC
AGGCATCCGA
AAGGAACATA
CCGACAACCA
GACACCAACC
CCCCAAAAAA
CACACGACCA
CGGCCATCAC
GGGAGCCACC
AAGGACATCA
ACCAAAAGAT
CGGGAATCCC
TGCCATATTC
CAATCTCTCT
TTCCAGCCAT
CACGAGGGTA
AGATGCACTT
GAGACACAAG
TGCTCAGATA
CAATCTGAGA
GCAGGAGATG
TCCACGGACC
CCAGCACAGA
GTGGGACCCC
CCACCACCCC
ACAAGAACTC
CTCCCTAGAC
CACACCCAAC
GAGGGAGCCC
CCCGAACAGA
AGGCCCCCAG
CGGCAACCAA
CCCGCAGAAA
CAACCCGAAC
GTATCCCACA
CAATCCACCA
AGAATCAAGA
ATGGCAGTAC
AAGATAGGGG
CAATCATTAG
GAGATTGCAG
AATGCAATGA
AGATTTGCGG
ACAGCCGGCA
GCGAGCCTGG
ATATTGGCTG
AAGCGAGAGG
ACAGCCCTGA
CGAGGACCAA
CGGGAAAGAA
CACAACCGAA
AGATCCTCTC
AGAACCCAGA
CCAACCAATC
CCCAGCACCC
GGGCCGACAG
ACCAGAACCC
GGAAAGGCCA
CAGCACCCAA
GCCTCTCCAA
CACCCGACGA
CTCATCCAAT
TGTTAACTCT
TGGTAGGAAT
TCATAAAATT
AATACAGGAG
CCCAGAATAT
GAGTAGTCCT
TTGCACTTCA
AAACTACTAA
TTCAGGGTGT
CCAGCCAGCA
CACAAGGCCA
CCCCCAAGGC
ACCCCCAGCA
CCGCACAAGC
TCCCCGGCAA
CCCCGGCCCA
CCGCCGGTTC
AACCATCGAC
CCAGCACCGC
AGACCACCCT
CAACCCGCGC
GAGCGATCCC
GTCCCCCGGT
CACTCAACTC
GTCCATCATG
CCAGACACCC
AGGAAGTGCA
AATGCCCAAT
ACTACTGAGA
AAGACCGGTT
GGCAGGTGCG
CCAGTCCATG
TCAGGCAATT
CCAAGACTAC
4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000
ATCAATAATG
CTCGGGCTCA
CGGGACCCCA
ATCAATAAGG
AGCAGAGGAA
AGTATAGCCT
GTCTCGTACA
CAAGGGTACC
GTGTGCAGCC
TCCACCAAGT
TCACAAGGGA
ACGATCATTA
GTAGTCGAGG
TACTTGCACA
AATCTGGGGA
CAGATATTGA
GTGTGTCTTG
AACAAAAAGG
ACATCAAAAT
CACAAGTCTC
TATCTCCGGC
ATCATCCACA
TCCCAAGGGA
TTTGCTGGCT
CATTAGACTT
TCTAGATGTA
AGCTGATACC
AATTGCTCAG
TATCTGCGGA
TGTTAGAAAA
TAAAGGCCCG
ATCCGACGCT
ACATAGGCTC
TTATCTCGAA
AAAATGCCTT
CCTGTGCTCG
ACCTAATAGC
ATCAAGACCC
TGAACGGCGT
GAATTGACCT
ATGCAATTGC
GGAGTATGAA
GAGGGTTGAT
GAGAACAAGT
CCTATGTAAG
CTCTTCGTCA
TTCCCTCTGG
ATGTCACCAC
AGTAGGATAG
GTTCTGTTTG
CATCGGGCAG
ACTAACTCAA
GTCTATGAAC
ATACTATACA
GATATCTATC
GCTCGGATAC
GATAACTCAC
GTCCGAGATT
TCAAGAGTGG
TTTTGATGAG
GTACCCGATG
TACACTCGTA
CAATTGTGCA
TGACAAGATC
GACCATCCAA
CGGTCCTCCC
TAAGTTGGAG
AGGTTTATCG
AGGGATCCCC
TGGTATGTCA
GTCGCTCTGA
TCAAGCAACC
CCGAACAATA
AACGAGACCG
TCATTAACAG
TCATGTCTCT
CCATCTACAC
TCGAGCATCA
CAACTATCTT
GAAATCCTGT
CAGGCTTTGA
AGTGGAGGTG
GTCGACACAG
AAGGGGGTGA
TATACCACTG
TCATCGTGTA
AGTCCTCTGC
TCCGGGTCTT
TCAATCCTTT
CTAACATACA
GTCGGGAGCA
ATATCATTGG
GATGCCAAGG
AGCACTAGCA
GCTTTAATAT
AGACCAGGCC
TCCTCTACAA
ACCGCACCCA
TCGGTAGTTA
GATAAATGCC
AGAACATCTT
GAGCTTGATC
CGCAGAGATC
GGTCAAGGAC
GTGATTTAAT
CATTATTTGG
GCTATGCGCT
ATTTACTGGG
AGTCCTACTT
TTGTCCACCG
TGCCCAAGTA
CTTTCATGCC
TCCAAGAATG
TTGGGAACCG
GCAAGTGTTA
TTGCTGCCGA
GGAGGTATCC
AGAGGTTGGA
AATTGTTGGA
TAGTCTACAT
GTTGCTGCAG
TAAAGCCTGA
CTCTTGAAAC
GCATCAAGCC
ATTAAAACTT
TTCTACAAAG
ATGATTGATA
GGGTTGCTAG
CATAAAAGCC
GTGCTGACAC
CGGCCAGAAG
CCCCAGCTTA
TGGAGGAGAC
CATCTTAGAG
CATTGTCCTC
GCTAGAGGGG
TGTTGCAACC
AGAGGGGACT
CCTCCGGGGG
GTTCATTTTA
CACAACAGGA
TCACTGCCCG
AGACGCTGTG
CGTAGGGACA
GTCATCGGAC
CCTGATTGCA
GGGGCGTTGT
TCTTACGGGA
ACAAATGTCC
CACCTGAAAT
AGGGTGCAAG
ATAACCCCCA
GACCTTATGT
CCATTGCAGG
TCAGCACCAA
CACTCTTCAA
6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560
AATCATCGGT
CATCTCTGAC
TTGGTGTATC
GGCTGCTGAA
CAATCAGTTC
ATTCTCAAAC
ATCTATAGTC
TAATCTGAGC
AGGTGTTATC
GCAACCAGTC
AGCCCTTTGT
CAGCTTCCAG
CCCCTTATCA
TATCGCTGAC
AATGGAGACA
CGAGTGGGCA
GAGTCTGACA
CGGTTCAGGG
GCCAATGAAG
GGTTAGTCC
AACATACCTA
ACCTGGTCAA
TGTGGTTTAT
GCCTATWAG
CTGGTGCCGT
GATGAAGTGG
AAGATTAAAT
AACCCGCCAG
GAGCTCATGA
CTAGCTGTCT
ATGTCGCTGT
ACTATGACAT
AGCAAAAGGT
AGAAATCCGG
AGTAATGATC
CACGGGGAAG
CTCGTCAAGC
ACGGATGATC
AATCAAGCAA
TGCTTCCAAC
CCATTGAAGG
GTTGAGCTTA
ATGGACCTAT
AACCTAGCCT
AACCTCTTCA
CCTGCGGAGG
GATCTCCAAT
TACGTTTACA
GGGGTCCCCA
CACTTCTGTG
GCCTGAGGAC
TCCTTAATCC
AGAGAATCAA
ATGCATTGGT
CAAAGGGAAA
CCCTGTTAGA
CCCAGGGAAT
CAGAGTTGTC
GTTTGGGGGC
TCAGCAACTG
ATTCTATCAC
TAGGTGTCTG
CAGTGATAGA
AATGGGCTGT
AGGCGTGTAA
ATAACAGGAT
AAATCAAAAT
ACAAATCCAA
TAGGTGTAAT
CTGTCCCAAT
TGGATGGTGA
ATGTTTTGGC
GCCCAGGCCG
TCGAATTACA
TGCTTGCGGA
ACCTCAGAGA
GGATAGGGAC
ATTGGATTAT
GAACTCAACT
CTGCTCAGGG
CTTGTATTTA
GTATGGGGGA
ACAACTGAGC
TCCGGTGTTC
TATGGTGGCT
AATTCCCTAT
GAAATCCCCA
CAGGCTTTAC
CCCGACAACA
GGGTAAAATC
TCCTTCATAC
TGCTTCGGGA
CCACAACAAT
CAACACATTG
TAAGGAAGCA
TGTCAAACTC
AACCTACGAT
CTCATTTTCT
AGTGGAATGC
CTCAGAATCT
TTCACTGACC
TACGACTTCA
GATCAATACT
CTACTGGAGA
CCCACTACAA
AGTCGAGGTT
ACTTACCTAG
ATGTACCGAG
CATATGACAA
TTGGGGGAGC
CAGGGATCAG
ACCGACATGC
CTCTCATCTC
CGAACAGATG
CAAGCACTCT
GGGGTCTTGT
TTCGGGCCAT
GTGTATTGGC
GAGTGGATAC
GGCGAAGACT
AGTTCCAATC
ACTTCCAGGG
TACTTTTATC
TTCACATGGG
GGTGGACATA
TAGTGAAATT
GAGATCTCAC
GTGCAGATGT
CCAGAACAAC
TCAGAGGTCA
ACAATGTGTC
TGGAAAAGCC
TGTTTGAAGT
ACTATCTTGA
TCAAACTCGC
GGAAAGGTGT
AATCCTGGGT
ACAGAGGTGT
ACAAGTTGCG
GCGAGAATCC
CTGTTGATCT
TGATCACACA
TGACTATCCC
CGAGATTCAA
GCCATGCCCC
TGGTGATTCT
TTGAACATGC
CTTTTAGGTT
ACCAAAAACT
TCACTCACTC
7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACCCGG GAAGATGGAJ,
CCAATCGCAG
ATAGGGCTGC TAGTGAACCA ATCACATGAT GTCACCCAGA CATCAGGCAT
ACCCACTAGT
9180
GTGAAATAGA
CGCTATCTGT
ATAAGATAGT
CTACACTGTG
TAAACAATGT
CTCATATTCC
CGAGGAAGAT
AGGTTTTCCA
AGGACATCAA
AGCCCTTTCT
CCCATACTTG
TGCTAATCTC
TGACATTTGA
CCGCTATGAC
AACTGATAGA
TGGAGCCTCT
CTTTCCTTAA
ATGAAGGTAC
TACATCTGAC
CAGTAACGGC
AGACTCTGAT
GGCACGGAGG
ATGCTCAAGC
TTGCTGGAGT
ACCTAAAGGA
CATCAGAATT
CAACCAGATC
AGCCATCCTG
TCAGAACATC
GGAAGTTGGG
ATATCCAAAr
CCGTGAACTC
ATGCTTAAGG
GGAGAAAGTT
GTTTTGGTTT
CCATAGGAGG
TCGTGACCTT
ACTGGTTTTG
TATTGATGCT
TGGTTTCTTC
TTCACTTGCT
CCACTGCTTT
TTATCATGAG
AGGGGAGATT
TGCTGAAAAT
GAAAGGTCAT
CAGTTGGCCA
TTCAGGTGAA
GAAATTTGGC
CAAGGCACTT
AAGAAAAACG
TTATACCCTG
GAGTATGCTC
AAGCACCGCC
AATGTCATCA
TGTAATCAGG
CTCAAAAAGG
GACACTAACT
ATTAACTTGG
ACAGTCAAGA
AGACACACAC
GTTGCTATAA
ATGTATTGTG
AGGTATACAG
CCTGCACTCG
TACCTGCAGC
ACTGAAATAC
TTAATTGAAG
TTCTCATTTT
GTTAGGAAAT
GCCATATTTT
CCGCTGACCC
GGGTTAACAC
TGCTTTATGC
GCTGCTCTC
TAGGGTCCAA
AAGTTCACCT
GAGTCCCTCA
TAAAAAACGG
AGTCCAAGCT
ATTTATTTAA
GGAATTCGCT
CACGGCTTGG
GAGTTTACAT
CTGAGATGAG
CTGTATTCTT
TCAGTAAAGA
ATGTCATAGA
AGCTTCTAGG
GGAATCCA~c
TGAGGGATAT
ATGATGTTCT
CTCTAGATTA
TCAGAAGTTT
ACATGAATA
GTGGAATCAT
TCCCCCTGCA
ATGAGCAGTG
CTCTTAGCCT
AAAGGGAATG
GTGGTTCCCC
AGATAGCCCG
CGCTTACAGC
ATTTTCCAAC
TAGGAGTTAT
CATAGAAGAC
GTACTCCAAA
CCTAGGCTCC
GCACAGCTCC
GTCAGTGATT
CACTGGTAGT
GTCTCAACAT
GGGGAGGTTA
AAGAGTCAGA
TTATCAAATT
AACAGTAGAA
TGACCAAAAC
CATTTTCATA
CGGCCACCCC
GCCTAAAGTC
AATCAACGGC
rGCTGCAGAC
CGTTGATAAC
GGATAGTGAT
GGATTCAGTT
GTTATGGACT
ATAGTTACCA
CTGGAGGACC
CAAATGATTA
CCGGCCCACT
AAAGAGTCAA
GTCAGTGATA
GAATTGAGGG
CAGTGGTTTG
AAATCACAAA
TCAGTTGAGT
GTATATTACC
ATGACAGAGA
TACATGTGGA
GTAGCCATGC
CTCAGAGGTG
GGGTTTTCTG
ACTGATGACA
AGACTTGAAG
ATTGTGTATG
TATCGTGACA
kCAATCCGGA
EGGAAATCTT
!TGACAATGT
~ACCCGAALAG
9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680
AGTTCCTGCG
TTAATGATTC
TCCATGACCC
GTAGACTTTT
TAATCTCAAA
ATTTGACTAA
GTCACAGGGG
GGAACGTGAG
ACACTGATCA
ATCTCAAGAA
TAAATGAGAT
CTGTCCTGTA
ATAAAGTCCC
GTCAGAAGCT
GAGTAAGGAT
TACCCAGCAC
ACTTTGTAAT
CAATTGTTTC
TGTCCCAATC
AAACAAGGGC
ATGACCGTTA
CTCTTGGCTT
ACAACGACCT
TGAATATGAG
ATCTCAAGAG
TTACGACCCT
GAGCTTTGAC
TGAGTTCAAC
TGCTAAAATG
CGGGATTGGC
GGCACTCCAC
GGGGCCAGTC
AGCAGCAAAA
TCCGGAGAAT
GTACTGCCTT
TTACGGATTG
TGTAAGTGAC
CAATGATCAA
GTGGACCATC
TGCTTCGTTA
ATGGCCCTAC
TCTTAGGCAA
ATCACATTTT
ACTCAAGAGC
AGCATGCAGT
CCTTGCATAT
CACAATCAAT
CTTAATAAGG
CAGGCTGTTT
AATGATTCTC
CCCAAGGGAA
CCATATGATG
CTGTCTTACA
ACTTACAAAA
AAATATTTTA
ACTCTAGCTG
TTAAAAACCT
GGGTTTATAG
ATGGAAGCTT
AATTGGAGAT
CCCTCATTTT
CCTCATTGCC
ATCTTCATTA
AGCACCATTC
GTGCAAGGGG
AACCTTAAGA
AGGCTACATG
TTTGTCTATT
ATCGCAAGAT
AATATTGCTA
TCCCTGAACG
TCAACCATGA
ATGGCACTGT
GTCAGAAACA
GCCTCACTAA
CCGGGTCACG
TGATAATGTA
GCCTGAAAGA
TGAGGGCATG
AGGACAATGG
TCTCAGGAGT
ACTCCCGAAG
GGTTCCCTCA
ACGAGACAGT
ATGAGACCAT
TCCAGTGGCT
CCCCCGACCT
AGTACCCTAT
CCTATCTATA
ACAATCAGAC
AACGGGAAGC
ATATTGGCCA
CAAAAGGAAT
GTGTATTCTG
CAACAATGGC
TCCTAAAAGT
CCCGGGATGT
TGCCCGCTCC
TCGGTGATCC
TGCCTGAAGA
GAGGCTTGTA
TGTTGTAAGT
AKAGGAGATC
CCAAGTGATT
GATGGCCAAG
CCCCAAAGAT
CCCAGTCCAC
AGTAATTCGG
CAGTGCATTT
CAGCTTGTTT
GCATAAGAGG
TGACGCCCAT
GGGAGGTATA
CCTGGCTGCT
CATAGCCGTA
TGCTAGAGTA
TCACCTCAAG
ATATTATGAT
GTCAGAGACT
TAAAAGCATC
GATACAGCAA
AGTCATACCC
TATTGGGGGG
AGTAACATCA
GACCCTCCAT
GATGTTTTCC
GGAGCTTACC
AAGGAAACAG
GCTGAAAATC
GATGAGCACG
CTCAAAGAAA
ACAAGTACCA
CAGGACCAAG
ATCACGACTG
GCACAGAGGC
CTTGAGACCT
ATCCCGTTAT
GAAGGGTATT
TATGAGAGCG
ACAAAAAGGG
ACTAGAGATT
GCAAATGAGA
GGGCTACTTG
ATAGTTGATG
GAGAGAGGTT
ATTCTGATCT
CTCCTCACAA
ATGAATTATC
TCAATTGCTG
CAAGTAATGA
10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG
TTTGTCCTGA
TCCATAGTCC
AGGGACTGGC
TCCTGGATCA
AAGGCCTGAT
TGTCCAATTA
GAAATGTCCT
ATATGTGGGC
TAGAATCTAT
GATCAGTCAA
AGGAAACATC
TGAAGCTTGC
CAGTGTACTC
CTAGGCAAAG
CGACTAATTT
CCCTTGTCCG
CAGATAAGAA
TTTTAGAAAC
TTCACGTCGA
CCCGCAAGCT
CTTTAATTGA
AATTTGTTAC
CTATGATTGA
TAGGGGATGA
TCACTATCTA
AAACCCAATG
GGCATTCCTC
TAGTGTCACA
TCGAGCCAGC
TGACTATGAA
CATTGACAAA
GAGGCTAGCT
GCGAGGCCAC
CTACGGATGG
ATCCTTGAGA
CTTCGTAAGA
ATGGGCTTAC
GGCCAATGTG
AGCGCATAGG
AGTGGCGAGG
GGTTGATACT
ATTGTTTCGA
AACAGATTGT
AGAGCTGAGG
CAGAGATGCA
ATGGTCCACA
CCTGGTAACA
CGATATCAAT
CTTGGGCCAG
TTAAAAGGAT
ATGGACAGGC
GGGGCAAGAG
ATGAGGAAGG
CAATTCAGAG
GAGTCATGTT
CGAGGACGGC
CTTATTCGGC
TTTTTTGTCC
GTCCCATATA
GCCCCAAGTC
GGTGATGATG
AGCCTGGAGG
TTGAGGGATC
TATACCACAA
AACTTTATAT
CTCGAGAAAG
TGCGTGATCC
GCAGAGCTAT
ACAAGGCTAT
CCCCAACTAT
AAATTTGAGA
AGTTTCATAA
TGTGCGGCCA
TATTCCATGA
ATATTATAGT
AGTCTATTGC
GGGGGTTAAC
CAGGGATGGT
CAGTGCAGCT
CTATTTACGG
GTCATGAGAC
CCTCGGGTTG
TTGGTTCTAC
GATCCTTGCG
ATAGCTCTTG
AGCTAAGGGT
GTAGCACTCA
TCTCCAACGA
ACCAACAAGG
ATACCGGATC
CGATGATAGA
GTACCAAcCCC
ACACCCAGAG
ATCACATTTT
AGGACCATAT
CTGAGTTTCT
TCAATTGGGC
TGACAGTAA
ACCTAGGGCA
AGGCATGCTG
CTCTCGAGTG
OCTATTGACA
GGCGAGAGCT
CCTTGAGGTC
ATGTGTCATC
CCAACTGGAT
CACTGATGAG
ATCTGCTGTT
GAACGAAGCC
GATCACTCCC
AGTGAAATAC
CAATCTCTCA
AATGCTTCTA
ATCTAACACG
TCATCCCAGG
ATTGATATAT
CCATAGGAGG
AGCTAAGTCC
GAATGAAATT
GCTCATAGAG
ATTTGATGTA
GAAGAGGACG
GCTCATGAAA
GATACCACAA
ATAACCAGAT
GGAAGAAAGA
CTAAGAAGCC
CCTGATGTAC
TGCGAGTGTG
GATATTGACA
AGAACAGACA
AGAATAGCAA
TGGTTGTTGG
ATCTCAACTT
TCAGGTACAT
TTTGTCATAT
GGGTTGGGTG
GTATTACATC
ATACCCAGCT
GATAATGCAC
CACCTTGTGG
ACAGCACTAT
rcAGCTCTCA
CCAAGATTAT
7ATTATCATA 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCATC GTTCCTTTCT
AGAATGAGCA
AAGGAGTGTT
GGCATTGTGG
CAACTGTGTG
AAGAGTTAGA
GATTCGACAA
GGACCTGCCC
ATATCAAGGC
TTGTAGACCA
GATTGAGAGT
CAAAGATCGG
ATGATGTTGC
GGGGCAATCT
CTTGCTACAA
ACGGCTTGTT
AACTAAACAA
AATTAGCACC
TTGTCAA&GT
TCAATTTCAT
AGACCTTGCC
TGGCTCTGCT
GGGATTTTGT
TATACCCTAG
AGGCTAACCG
GGACTTCACC
CAkTTGTGGG
CTATAGAGCA
TAAGGTGCTT
TATTATAGAG
CAACATGGTT
AGAGTTCACA
CATCCAGGCA
ACCAATTCGA
AGAGGCTAGG
TTACTCATGC
TGATCCAGGA
CAGCAACAAC
AAAATTGCTC
CGCCAATTAT
AGCTGTTGAG
CTTGGGTGAG
GTGCTTCTAT
CTATCCCTCC
GCTCTTTAAC
AGTTAGTAAT
TAACAAAGAT
CCTGGGCAAA
TCAGGGATTT
ATACAGCAAC
GCTAATGAAT
TGGACTTATA
AGACGCAGTT
GGTGCTGATC
GTCAATGCTC
CCTATCCATG
TACACATGCT
TTTCTCTTGT
AAACACTTAT
GGTCTAAGAC
TTATCTCCAG
TCTCTGACTT
TTCATTTTCG
ATCTCAAATA
AAAGATATCA
GAAATCCATG
ATATCAACAT
GGATCGGGTT
AATAGTGGGG
GAAGTTGGCC
GGGAGGCCCG
ATCCCTACCT
ACTATAGAGA
ATAGGATCAA
ATAAGTTATG
TTCATATCTA
CCTGAAAAGA
GGTCACATCC
AGTAGAGGTG
AATTGCGGGT
TAAGCCACCC
GTCCTTCACT
ATATGACCTA
GTGAAAGCGA
GTGTTCTGGC
CGGTAGAGAA
CAGGATCTTC
ATCTCCGGCG
ACGCCCTCGC
TGAGCATCAA
ACACAAGCAA
CTTTCCGCAG
TAATTAGGAG
CTATGTTGAT
TTTCCGCCAA
TTGTCGAACA
AAGTCACGTG
CTAGTGTGGG
AGCTAGAGGA
TACTGGTGAT
TAGGGTCTCA
CTGAATCTTA
TTAAGCAGCA
TATCCATTAA
ATATCAATCC
TGGCAATTAA
AAAGATCTAC
TGATGCTCAA
CCTCGACCTG
CGAGGATGTA
AGATTTGTAC
ATGTGCAGTT
GTGGAACATA
AGGATCGATC
TGAGGTmAT
GGATTTCAGA
GCACAATCTT
AATCGGGTTG
ATGCCTTGAG
CACTTATAAG
TTCTAGATCT
CAGAATGGGA
GGTAGGCAGT
GTTTATCCAT
ATTGGCAGCC
TAAGCTTATG
TTATAGAGAA
TTTGGTTATG
GATAATTGAA
GCAACTAAGC
TACTCTGAAA
CGGACCTAAG
AAGAAATTCT
AACTTGCACA
TTGTTGAATG
GTACCGGACA
TGTCAACCAG
CTAACCGACC
AATCCAATTA
AAACAGATAA
GTCAGTCAGC
CCCCCACACG
CCCATTTCAG
AACTCATCTG
CCAGGGGAAG
GAGATACTTA
GGTCAAAGGG
GTAGGTAATA
GTAGATTGCT
TCAGATATAG
ATCTTATCGA
CCTTTCAGCG
GTGAACCTTG
ACAGATCTCA
TCATCTGTGA
TGCATACAAG
AAACTTACAC
CTGTGCAAAG
13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT TTTGGGGGCA CATTCTTCTT TACTCCGGGA ACAGAAAGTT ATCTCAAGTC CGGCTATCTG ATACTAGACT TACACCAGAA CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ACTAATTGGT TGAACTCCGG AACCCTAATC CTGCCCTAGG TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC
GCTTAATTCT
TCAACAAGGG
ATCTAGGATC
GATAAATAAG
TATCTTCGTT
ACGTGAGTGG
ATACAGTGCC
TGGTTAGGCA
CAGCTTTGTC
ATACTCATCC
ATGTTCCACG
ACCCGCAAAT
TTTATCCAGA
AAGAATCTAT
GTTTTTAAGG
CTGATTAAGG
TTATTTGCAA
TGGT
15420 15480 15540 15600 15660 15720 15780 15840 15894
INFORMATION FOR SEQ ID NO:2:
SEQUENCE CHARACTERISTICS:
LENGTH: 2183 amino acids
TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: Met Asp Ser Leu Ser Val Asn Gln Ile Leu 1 5 10 Asp Ser Pro Ile Val Thr Asn Lys Ile Val 25 Arg Val Pro His Ala Tyr Ser Leu Glu Asp 35 40 Ile Lys His Arg Leu Lys Asn Gly Phe Ser 55 Asn Val Glu Val Gly Asn Val Ile Lys Ser 70 Ala His Ser His Ile Pro Tyr Pro Asn Cys 90 Tyr Pro Glu Val His Leu Ala Ile Leu Pro Thr Leu Asn Gln Met Glu Tyr Ala Cys Gln Asn Ile Ile Asn Lys Leu Arg Ser Tyr Pro Asn Gln Asp Leu Phe Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys 100 105 110
Gly Arg Ile 145 Trp Ser Pro Leu Phe 225 Thr Arg Gly Ala Leu 305 Phe Ile Phe Asn Asp 130 Lys Phe Val Val Val 210 Glu Glu Val Asn Tyr 290 Asn Ser Phe Arg Ser 115 Thr Glu Glu Ile Phe 195 Ala Leu Thr Arg Pro 275 Leu His Asp Ile Ser Leu Tyr Asn Ser Lys Val Pro Phe 165 Lys Ser 180 Phe Thr Ile Ile Val Leu Ala Met 245 Tyr Met 260 Thr Tyr Gln Leu Cys Phe Glu Gly 325 Thr Asp 340 Phe Gly Ser Arg Ile 150 Leu Gln Gly Ser Met 230 Thr Trp Gln Arg Thr 310 Thr Asp His Lys Val 120 Leu Gly 135 Asn Leu Phe Trp Thr His Ser Ser 200 Lys Glu 215 Tyr Cys Ile Asp Lys Leu Ile Val 280 Asp Ile 295 Glu Ile Tyr His Ile His Pro Arg Ser Leu Gly Phe Thr 185 Val Ser Asp Ala Ile 265 Ala Thr His Glu Leu 345 Leu Asp Gly Val Thr 170 Cys Glu Gln Val Arg 250 Asp Met Val Asp Leu 330 Thr Glu Lys Ser Tyr 155 Val His Leu His Ile 235 Tyr Gly Leu Glu Val 315 Ile Gly Ala Val Glu 140 Met Lys Arg Leu Val 220 Glu Thr Phe Glu Leu 300 Leu Glu Glu Val Phe 125 Leu His Thr Arg Ile 205 Tyr Gly Glu Phe Pro 285 Arg Asp Ala Ile Thr 365 Gln Arg Ser Glu Arg 190 Ser Tyr Arg Leu Pro 270 Leu Gly Gln Leu Phe 350 Ala Cys Glu Ser Met 175 His Arg Leu Leu Leu 255 Ala Ser Ala Asn Asp 335 Ser Ala Leu Asp Gln 160 Arg Thr Asp Thr Met 240 Gly Leu Leu Phe Gly 320 Tyr Phe Glu 355 360 Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr 370 Leu Met Lys Gly Arg Ala His Gly Lys 465 Pro Arg Val Asn Leu 545 Glu Met Val Val Val 625 Asp Ala Glu Cys 450 Asp Lys Leu Ile Leu 530 Phe Asn Ala Ser Leu 610 Arg Arg Asp Gln 435 Phe Lys Glu Val Met 515 Ser Ala Leu Lys Gly 595 Lys Ala His Thr 420 Cys Met Ala Phe Asp 500 Tyr Tyr Lys Ile Asp 580 Val Thr Ala His Ala Ile 390 Gly Gly Ser 405 Ile Arg Asn Val Asp Asn Pro Leu Ser 455 Leu Ala Ala 470 Leu Arg Tyr 485 Val Phe Leu Val Val Ser Ser Leu Lys 535 Met Thr Tyr 550 Ser Asn Gly 565 Glu His Asp Pro Lys Asp Tyr Ser Arg 615 Lys Gly Phe 630 Phe Trp Ala Trp 440 Leu Leu Asp Asn Gly 520 Glu Lys Ile Leu Leu 600 Ser Ile Cys Pro Gln 425 Lys Asp Gln Pro Asp 505 Ala Lys Met Gly Thr 585 Lys
Pro Gly Gly Pro 410 Ala Ser Ser Arg Pro 490 Ser Tyr Glu Arg Lys 570 Lys Glu Val Phe Met 650 Ile 395 Leu Ser Phe Asp Glu 475 Lys Ser Leu Ile Ala 555 Tyr Ala Ser His Pro 635 Ile Thr Gly Ala Leu 460 Trp Gly Phe His Lys 540 Cys Phe Leu His Thr 620 Gln Ile Leu Glu Gly 445 Thr Asp Thr Asp Asp 525 Glu Gln Lys His Arg 605 Ser Val Asn Pro Gly 430 Val Met Ser Gly Pro 510 Pro Thr Val Asp Thr 590 Gly Thr Ile Gly Leu 415 Leu Lys Tyr Val Ser 495 Tyr Glu Gly Ile Asn 575 Leu Gly Arg Arg Tyr 400 His Thr Phe Leu Tyr 480 Arg Asp Phe Arg Ala 560 Gly Ala Pro Asn Gln 640 Asp Gln Asp Thr Asp His Pro Glu Asn Glu Ala Tyr Glu Thr Val 655 Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg 660 665 670 Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly 675 680 685 Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val 690 695 700 Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile 705 710 715 720 Pro Leu Tyr Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met 725 730 735 Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile 740 745 750 Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser 755 760 765 Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro 770 775 780 Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr 785 790 795 800 Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His 805 810 815 His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr 820 825 830 Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys 835 840 845 Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr 850 855 860 Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu 865 870 875 880 Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val 885 890 895 Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met 900 905 910 Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile 915 920 925 Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr
Leu Asn 930 935 940 Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser 945 950 955 960 Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu 965 970 975 Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu 980 985 990 Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser 995 1000 1005 Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His 1010 1015 1020 Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu 1025 1030 1035 1040 Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val 1045 1050 1055 Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg 1060 1065 1070 Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala 1075 1080 1085 Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser 1090 1095 1100 Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly 1105 1110 1115 1120 Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu 1125 1130 1135 Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg 1140 1145 1150 Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly 1155 1160 1165 His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser 1170 1175 1180 Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp 1185 1190 1195 1200 Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr 1205 1210 1215 Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser 1220 1225 1230 Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala 1235 1240 1245 Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg 1250 1255 1260 Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile 1265 1270 1275 1280 Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln 1285 1290 1295 Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr 1300 1305 1310 Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp 1315 1320 1325 Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly
Val Leu 1330 1335 1340 Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val 1345 1350 1355 1360 Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp 1365 1370 1375 His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu 1380 1385 1390 Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp 1395 1400 1405 Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe 1410 1415 1420 Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr 1425 1430 1435 1440 Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met 1445 1450 1455 Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile 1460 1465 1470 Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly 1475 1480 1485 Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro 1490 1495 1500 Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg 1505 1510 1515 1520 Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro 1525 1530 1535 Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His 1540 1545 1550 Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met 1555 1560 1565 Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu 1570 1575 1580 Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val 1585 1590 1595 1600 Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala 1605 1610 1615 Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg 1620 1625 1630 Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala 1635 1640 1645 Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val 1650 1655 1660 Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys 1665 1670 1675 1680 Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala 1685 1690 1695 Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn 1700 1705 1710 Met Ser Ile Lys Asp Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu 1715 1720 1725 Leu Lys Asp Ile Asn Thr Ser Lys His
Asn Leu Pro Ile Ser Gly Gly 1730 1735 1740 Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn 1745 1750 1755 1760 Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg 1765 1770 1775 Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly 1780 1785 1790 Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe 1795 1800 1805 Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu 1810 1815 1820 Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val 1825 1830 1835 1840 Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp 1845 1850 1855 Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr 1860 1865 1870 Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys 1875 1880 1885 Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala 1890 1895 1900 Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro 1905 1910 1915 1920 Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His 1925 1930 1935 Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser 1940 1945 1950 Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met 1955 1960 1965 Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr 1970 1975 1980 Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys 1985 1990 1995 2000 Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro 2005 2010 2015 Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly 2020 2025 2030 Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp 2035 2040 2045 Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr 2050 2055 2060 Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met 2065 2070 2075 2080 Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile 2085 2090 2095 Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly 2100 2105 2110 Asn Arg Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr 2115 2120 2125 Leu Ile Leu Asp Leu
His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys 2130 2135 2140 Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val 2145 2150 2155 2160 Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly 2165 2170 2175 Tyr Ser Ala Leu Ile Lys Asp 2180
INFORMATION FOR SEQ ID NO:3:
SEQUENCE CHARACTERISTICS:
LENGTH: 15894 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
SEQUENCE DESCRIPTION: SEQ ID NO:3:
ACCAAACAAA GTTGGGTAAG GATAGATCAA TCAATGATCA TATTCTAGTA CACTTAGGAT
TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTC 120
TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180
GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCGGGA GATTCCTCAA 240
TTACCACTCG
GCGGGCCCAA
GTCAATTGAT
TCCAGAGTGA
ATGAGGCGGA
GATGGTTCGA
TGATTCTGGG
CAGACACGGC
TAGTTGGTGA
AGGACCTCTC
GGAACAAACC
GATTAGCCAG
GACTGCATGA
AAATGGGGGA
GTGCAGGATC
ACTCCATGGG
GGCAAGAGAT
GTATCACTGC
GGATCAGTAG
ATGAAAATGA
GAGAAGCCAG
CCCATCCTCC
CGCAGGACAG
CGGAAGAACA
ACTAGGTGCA
AAAACTTAGG
ATCTAGACTT
ACTAACAGGG
TCAGAGGATC
CCAGTCACAA
CCAATATTTT
GAACAAGGAA
TACCATCCTA
AGCTGATTCG
ATTTAGATTG
CTTACGCCGA
CAGGATTGCT
TTTTATCCTG
ATTTGCTGGT
AACTGCACCA
ATACCCTCTG
AGGTTTGAAC
GGTGAGGAGG
CGAAGATGCA
AGCGGTTGGA
GCTACCGAGA
AGAGAGCTAC
AACCGACACA
TCGAAGGTCA
AGGCTCAGAC
AGAGGCCGAG
AACCAGGTCC
CTGGACCGGT
GCACTAATAG
ACCGATGACC
TCTGGCCTTA
TCACATGATG
ATCTCAGATA
GCTCAAATTT
GAGCTAAGAA
GAGAGAAAAT
TTCATGGTCG
GAAATGATAT
ACTATTAAGT
GAGTTATCCA
TACATGGTAA
CTCTGGAGCT
TTTGGCCGAT
TCAGCTGGAA
AGGCTTGTTT
CCCAGACAAT
TGGGGGGGTA
AGAGAAACCA
CCCTTAGACA
GCTGACGCCC
ACGGACACCC
GACCAGAACA
ACACAGCCGC
TGGTCAGGTT
GTATATTATC
CTGACGTTAG
CCTTCGCATC
ATCCAAGTAG
TTGAAGTGCA
GGGTCTTGCT
GGTGGATAAA
GGTTGGATGT
CTCTAATCCT
GTGACATTGA
TTGGGATAGA
CACTTGAGTC
TCCTGGAGAA
ATGCCATGGG
CTTACTTCGA
AGGTCAGTTC
CAGAGATCGC
CCCAAGTGTC
AGGAAGATAT
GGCCCAGCAG
TTGACACTGC
TGCTCAGGCT
CTAGAGTGTA
ACATCCGCCT
CAGCCCACCA
AATTGGAAAC
CTTATTTGTG
CATAAGGCTG
AAGAGGTACC
TAGTGATCAA
AGACCCTGAG
CGCAAAGGCG
GTACACCCAA
GGTGAGGAAC
GGATATCAAG
TACATATATC
AACTATGTAT
CTTGATGAAT
CTCAATTCAG
AGTAGGAGTG
TCCAGCATAT
CACATTGGCA
AATGCATACT
ATTCCTACAC
GAGGGTCAAA
AGCAAGTGAC
ATCGGAGTCC
GCAAGCCATG
CAATGACAGA
ACCCTCCATC
ACCATCCACT
CCGGATGTGA
GAGTCTCCAG
TTAGAGGTTG
AACATGGAGG
TCCAGGTTCG
GGATTCAACA
GTTACGGCCC
CAAAGAAGGG
AGGATTGCCG
AGAACACCCG
GTAGAGGCAG
CCTGCTCTTG
CTTTACCAGC
AACAAGTTCA
GAACTTGAAA
TTCAGACTAG
TCTGAACTCG
ACAGAGGACA
GGTGATCAAA
CAGAGTCGGG
GCGAGAGCTA
AGCCAAGATC
GCAGGAATCT
GATCTTCTAG
ATTGTTATAA
CCCACGATTG
300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 GGGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT
CTCAAGGCCG
ATATCAGACA
GGTCTCAGCA
CGCGGTCAGG
AATCTCCAGG
GCGGTTAAGG
AGCACCCTCT
GATACCGAGG
GCTTCTGATG
AGAGGCAACA
GGTAGGGCCA
TTTGGAACGG
CCCTCGGAAC
GCCGTACTGA
AATAATGAAG
AAAACAGCCT
CTGCTGTTAT
AGCATATCCA
AAGGATCCCA
GGCAGAGATT
ATCCAAGGAA
CTAAAGCCGA
GCATCACGCA
CGTTACCTGA
CAGATGCTGA
AGCCCATCGG
ACCCAGGACA
AACCATGCCT
GATCTGGAGA
CATCAAGCAC
GAATCCAAGA
CAGGAGGAGA
GATATGCTAT
TTGAAACTGC
ACTTTCCAAA
GCACTTCCGA
AGATCGCGTC
CATCAGGGCC
TACAGGAGTG
AAGGGGGAGA
TGGCCAAAAT
TGAAGGGGGA
CCTTGGAAGG
ACGACCCCAC
CAGGCCGAGC
TGACAAATGG
TCGGGAAAAA
GTGTAATCCG
TGACTCTCCT
TGAAGATAAT
CTCACTGGCC
GGAGCGAGCC
CTCAGCAATT
GAGCGATGAC
TGGGTTACAG
TGCTGACTCT
CAATGAATCT
CACTGACCGG
AGAAGGAGGG
GCTTAGGAAA
GACACCCATT
TTTATTGACA
AGGTGCACCT
GACACCCGAA
TTATTATGAT
ACACGAGGAT
AGTTGAGTCA
ACACCTCTCA
TGCAGATGTC
ACTGGCTGAA
ACGGACCAGT
GATGAGCTCA
CTCCATTATA
TGATGACATC
AATGAAGTAG
ATCGAGGAAG
GCCTGCA.AGG
GGATCAACTG
GACGCTGAAA
TGTTATTATG
ATCATGGTTC
GAAAACAGCG
GGATCTGCTC
GAGATCCACG
ACTCTCAATG
AAAAAGGGCA
GGTGGTGCAA
GCGGGGAATG
TCTGGTACCA
GATGAGCTGT
AATCAGAAGA
ATCAAGAAGC
AGCATCATGA
GAAATCAATC
GTTCTCAAGA
TCCAGAGGAC
GCCGTCGGGT
AAATCCAGCC
AAAGGAGCCA
CTACAGCTCA
CTATGGCAGC
AAGAGAAGGC
AAGGCGGTGC
CTTTGGGAAT
TTTATGATCA
AATCAGGCCT
ATGTGGATAT
CCATCTCTAT
AGCTCCTGAG
TTCCCCCGCCC
CAGACGCGAG
CCCAATGTGC
TCCCCGAGTG
CAATCTCCCC
TCTCTGATGT
TAATCACCAA
AGATCAACAG
TCGCCATTCC
CCGACTTGAA
AACCCGTTGC
AGCTGCTGAA
TTGTTCCGGA
GGCTAGAGGA
ACGATCTTGC
ACTTACCTGC
ATGGTCAGAA
AAGCAGTCCG
ACCTCGCATC
CCCCTCAGGA
CAGCGGTGAA
TGATGGTGAT
TGGCGAACCT
GGGGTTCAGG
ACTCCAATCC
CCCGGACCCT
ATTAGCCTCA
TCGAAAGTCA
TGTGAGCAAT
GAGATCCCAG
CCAAGATATT
GCTAGAATCA
GCAAAATATC
TGGACTTGGG
ACCCATCATA
CAGCCGACAA
GGAATTTCAG
CACCGGCCCT
GGATCGGAAG
CAAGTTCCAC
CAACCCCATG
1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 CCAGTCGACC TAGCTAATAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT
GCCTCCCAAG
AAGGGTCGAT
TCAGAGTCAT
TGCTGGGGGT
CTCTGCCCTT
CTGAGCTTGA
ACAACACCCC
TCAACGCAAA
TCCGTGTTGT
GAAGAATGCT
GGATTGACAA
CAACATTTAT
ATTATTGCAA
GCACCAGTCT
GGTTCAAGAA
TCTGGAGGAG
AAGAATTCCG
TGTAGACCGT
GCCCGGAAAA
GCTGACGGCA
CCACCAGCCA
TGCCCCCCAC
ACTGGAAGAG
GACCGAGGTG
TTCCACAATG
CGCTCCGATA
AGATCCTGGT
TGTTGAGGAC
AGGTGTTGGC
CATAGTTGTT
ACTAACTCTC
CCAAGTGTGC
TTATATGAGC
GGAATTCAGA
GGCGATTGGC
GGTCCACATC
AATGAAAATC
TCACATTAGA
GACCCTATGT
CAGATGCAAG
CATTTACGAC
AGTGCCCAGC
AAAGGCCCCC
AGCACGAACA
TCCCAATCTG
CCAAACCACC
CCCTTCCCCT
ACCCAACCGC
ACAGAGATCT
CAACCCACCA
CTAGGCGACA
AGCGATCTCC
AGATCCACAG
AGACGTACAG
CTCATACCTT
AATGCGGTTA
ATCACCCGTC
TCGGTCAATG
CATGGGAAGA
GGGAACTTCA
GAAAAGATGG
AGCACAGGCA
TACCCACTGA
ATAGTAAGA
GACGTTATCA
AATGCCCGAA
TCCGAAAGAC
CCAGGCGGCC
CATCCTCCTC
AACCGCATCC
TTCCCTCAAC
AGGCACCCGA
ACGACTTCGA
CCTACAGTGA
GAAAAGATGA
TAGGGCCTCC
CAAAACCCGA
CAGGGCTCAA
GGAGAAAGGT
ATCTGATACC
TTTCAGATAA
CAGTGGCCTT
TCATCGACAA
GGAGAAAGAA
GCCTGGTTTT
AAATGAGCAA
TGGATATCAA
TCCAGGCAGT
TAAATGATGA
GACGACCCTC
TCCACAGACC
CCAGCACAcIA
GTAGGACCCC
CTACCACCCC
ACAAGAACTC
CTCCCTAGAC
CAAGTCGGCA
TGGCAGGCTG
ATGTTTTATG
AATCGGGCGA
AGAACTCCTC
TGAAAAACTG
CCTAACAACA
GCTGGATACC
CGGGTATTAC
CAACCTGCTG
TGCAGAGCAA
AAGTGAAGTC
TGCACTTGGT
GACTCTCCAT
TGAAGACCTT
TTTGCAGCCA
CCAAGGATTA
CTCACAATGA
AAATGAGAGG
ACAGCCCTGA
CGAGGACCAA
CGGGAAAGAA
CACAACCGAA
AGATCCTCTC
TGGGACATCA
GTGCCCCAGG
TACATGTTTC
GCATTTGGGT
AAAGAGGCCA
GTGTTCTACA
GGGAGTGTCT
CCGCAGAGGT
ACCGTTCCTA
GTGACCCTTA
CTTCCTGAGG
TACTCTGCCG
GGGATAGGGG
GCACAACTCG
AATCGATTAC
TCAGTTCCTC
TTCAAAGTTC
CAGCCAGAAG
CCAGCCAGCA
CATAAGGCCA
CCCCCAAiGGT kCCCCCAGCA 7CACACAAGC
CCCTGGCAA
3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGCCCA 4920
CGGCGCCGCG
CCCCGGTGCC
AATCCAAGAC
GAGGAAGCCC
GGGCCACCAG
ACCCCAGCCC
CGAAGGACCC
CTCTTCCTCT
CCCGTCCCTA
GGTCTCAAGG
ACCGGTCAAA
AGCTACAAAG
ATAACTCTCC
ACAGTTTTGG
CAGAGTGTAG
GCCCTAGGCG
CTGAACTCTC
GAGGCAATCA
ATCAATAATG
CTCGGGCTCA
CGGGACCCCA
ATCAATAAGG
AGCAGAGGAA
AGTATAGCCT
GTCTCGTACA
CAAGGGTACC
CCCCCAACCC
CACAGGCAGG
GGGGGGGCCC
ACCCACCCCA
TTCCCAGACT
CGATCCGGCG
CCGAACCGCA
TCTCGAAGGG
AAGGAGACAC
TGAACGTCTC
TCCATTGGGG
TTATGACTCG
TCAATAACTG
AACCAATTAG
CTTCAAGTAG
TTGCCACAGC
AAGCCATCGA
GACAAGCAGG
AGCTGATACC
AATTGCTCAG
TATCTGCGGA
TGTTAGAAAA
TAAAGGCCCG
ACCCGACGCT
ACATAGGCTC
TTATCTCGAA
CCGACAACCA
CACACCAACC
CCCCAAAAAA
CACACGACCA
CGGCCATCAC
GGCAGCCACC
AAGGACATCA
ACTAAAAGAT
CGGGAATCCC
TGCCATATTC
CAATCTCTCT
TTCCAGCCAT
CACGAGGGTA
AGATGCACTT
GAGACACAAG
TGCTCAGATA
CAATCTGAGG
GCAGGAGATG
GTCTATGAAC
ATACTATACA
GATATCTATC
GCTCGGATAT
GATAACTCAC
GTCCGAGATC
TCAAGAGTGG
TTTTGATGAG
GAGGGAGCCC
CCCGAACAGA
AGGCCCCCAG
CGACAACCAA
CCCGCAGAAA
CAACCCTAAC
GTATCCCACA
CAATCCACCA
GGAATTAAGA
ATGGCAGTAC
AAGATAGGGG
CAATCATTAG
GAGATTGCAG
AATGCAATGA
AGATTTGCAG
ACAGCCGGCA
GCAAGTCTGG
ATATTGGCTG
CAACTATCTT
GAAATCCTGT
CAGGCTTTGA
AGTGGAGGTG
GTCGACACAG
AAGGGGGTGA
TATACGACTG
TCATCGTGTA
CCAACCAATC
CCCAGCACCC
GGGCCGACAG
ACCAGAACCC
GGAAAGGCCA
CAGCACCCAA
GCCTCTCCAA
CATCCGACGA
CTCATCCAAT
TGTTAACTCT
TGGTAGGAAT
TCATAAAATT
AATACAGGAG
CCCAGAATAT
GAGTAGTCCT
TTGCACTTCA
AAACTACTAA
TTCAGGGTGT
GTGATTTAAT
CATTATTTGG
GCTATGCGCT
ATTTACTGGG
AGTCCTACTT
TTGTCCACCG
TGCCCAAGTA
CTTTCATGCC
CCGCCGGCTC
AGCCATCGAC
CCAGCACCGC
AGACCACCCT
CAACCTGCGC
GAGCGATCCC
GTCCCCCGGT
CACTCAACTC
GTCCATCATG
CCAAACACCC
AGGAAGTGCA
AATGCCCAAT
ACTACTGAGA
AAGACCGTTT
GGCAGGTGCG
CCAGTCCATG
TCAGGCAATT
CCAAGACTAC
CGGCCAGAAG
CCCTAGCTTA
CGGAGGAGAT
CATCTTAGAG
CATTGTCCTC
GCTAGAGGGG
TGTTGCAACC
AGAGGGGACT
4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480
GTGTGCAGCC
TCCACCAAGT
TCACAAGGGA
ACGATCATTA
GTAGTCGAGG
TACTTGCACA
AATCTGGGGA
CAGATATTGA
GTGTGTCTTG
AACAAAAAGG
ACATCGAAAT
CACAAGTCTC
TATCTCCGGC
ATCATCCACA
TCCCAAGGGA
TTTGCTGGCT
CATTAGACTT
TCTAGATGTA
AATCATCGGT
CATCTCTGAC
TTGGTGTATC
GGCTGCTGAA
CAATCAGTTC
ATTCTCAAAC
ATCTATAGTC
TAATCTGAGC
AAAATGCCTT
CCTGTGCTCG
ACCTAATAGC
ATCAAGACCC
TGAACGGCGT
GAATTGACCT
ATGCAATTGC
GGAGTATGAA
GAGGGTTGAT
GAGAACAAGT
CCTATGTAAG
CTCTTCGTCA
TCCCCTTTGG
ATGTCACCAC
AGTAGGATAG
GTTCTGTTCG
CATCGGGCAG
ACTAACTCAA
GATGAAGTGG
AAGATTAAAT
AACCCGCCAG
GAGCTCATGA
CTAGCTGTCT
ATGTCGCTGT
ACTATGACAT
AGCAAAGGGT
GTACCCGATG
TACACTCGTA
CAATTGTGCA
TGACAAGATC
GACCATCCAA
CGGTCCTCCC
TAAGTTGGAG
AGGTTTGTCG
AGGGATCCCC
TGGTATGTCA
GTCGCTCTGA
TCAAGCAACC
CCGAACAATA
AACGAGACCG
TTATCAACAG
TCATGTTTCT
CCATCTACAC
TTGAGCATCA
GCCTGAGGAC
TCCTTAACCC
AGAGAATCAA
ATGCATTGGT
CAAAGGGAAA
CCCTGTTGGA
CCCAGGGAAT
CAGAGTTGTC
AGTCCTCTGC
TCTGGGTCTT
TCAATCCTTT
CTAACATACA
GTCGGGAGCA
ATATCATTGG
GATGCCAAGG
AGCACTAGCA
GCTTTAATAT
AGACCAGGCC
TCCTCTACAA
ACCGCATCCA
TCGGTAGTTA
GATAAATGCC
AGAACACCTT
GAGCTTGATC
CGCAGAGATC
GGTCAAGGAC
ACCTCAGAGA
GGATAGGGAG
ATTGGATTAT
GAACTCAACT
CTGCTCAGGG
CTTGTATTTA
GTACGGGGGA
ACAACTGAGC
TCCAAGAATG
TTGGGAACCG
GCAAGTGTTA
TTGCTGCCGA
GGAGGTATCC
AGAGGTTGGA
AATTGTTGGA
TAGTCTACAT
GTTGCTGCAG
TAAAGCCTGA
CTCTTGGAAC
GCATCAAGCC
ATTAAAACTT
TTCTACAAAG
ATGATTGATA
GGGTTGCTAG
CATAAAAGCC
GTGCTGACAC
TTCACTGACC
TACGACTTCA
GATCAATACT
CTACTGGAGA
CCCACTACAA
AGTCGAGGTT
ACTTACCTAG
ATGTACCGAG
CCTCCGGGGG
GTTCATTTTG
CACAACAGGA
TCACTGCCCG
AGACGCTGTG
CGTAGGGACA
GTCATCGGAC
CCTGATTGCA
GGGGCGTTGT
TCTTACAGGA
ACAAATGTCC
CACCTGAAAT
AGGGTGCAAG
ATAACCCCCA
GACCTTATGT
CAATTGCAGG
TCAGCACCAA
CACTCTTCAA
TAGTGAAATT
GAGATCTCAC
GTGCAGATGT
CCAGAACAAC
TCAGAGGTCA
ACAATGTGTC
TGGAAAAGCC
TGTTTGAAGT
6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATTTTGA 8100
GCAACCAGTC
AGCCCTTTGT
CAGCTTTCAG
AGTAATGATC
CACGGGGGAG
CTCGTCAAGC
CCCCTTCTCA ACGGATGACC
TATCGCTGAC
AATGGAGACA
CGAGTGGGCA
GAGTCTAACA
CGGTTCAGGG
GCCAATGAAG
GGTTAGTCCC
AACATACCTA
ACCTGGTCAA
TGTGGTTTAT
GCCTATAAAG
CTGGTGCCGT
TGGGATGGTG
ATAGGGCTGC
GTGAAATAGA
CGCTATCTGT
ACAAGATAGT
CTACACTGTG
TAAACAATGT
CTCATATTCC
CGAGGAAGAT
AATCAAGCAA
TGCTTCCAGC
CCATTGAAGG
GTTGAGCTTA
ATGGACCTAT
AACCTAGCCC
AACCTCTTCA
CCTGCGGAGG
GATCTCCAAT
TACGTTTACA
GGGATCCCCA
CACTTCTGTG
GGCATGGGAG
CAGTGAACCA
CATCAGAATT
CAACCAGATC
AGCCATCCTG
TCAGAACATC
GGAAGTTGGG
ATATCCAAAC
CCGTGAACTC
TCAGCAACTG
ATTCTATCAC
TAGGTGTCTG
CAGTGATAGA
AATGGGCTAT
AGGCGTGTAA
ATAACAGGAT
AAATCAAAAT
ACAAGTCCAA
TAGGTGTAAT
CTGTCCCAAT
TGGATGGTGA
ATCTTTTGGC
GCCCAAGCCG
TCGAATTACA
TGCTTGCGGA
TCAGCTGCAC ATCACATGAT
AAGAAAAACG
TTATACCCCG GAGTATGCTC AAGCACCGCC
AATGTCATCA
TGTAATCAGG
CTCAAAAAGG
TATGGTGGCT
AATTCCCTAT
GAAATCCCCA
CAGGCTTTAC
CCCGACAACA
GGGTAAAATC
TCCTTCATAC
TGCTTCGGGA
CCACAACAAT
CAACACATTG
TAAGGAAGCA
TGTCAAACTC
AACCTACGAT
CTCATTTTCT
AGTGGAATGC
CTCAGAATCT
AGTCACCCGG
GTCACCCAGA
TAGGGTCCAA
AAGTTCACCT
GAGTCCCTCA
TAAAAAACGG
AGTCCAAGCT
ATTTATTTAA
GAAATTCGCT
TTGGGGGAGC
CAGGGATCAG
ACCGACATGC
CTCTCATCTC
AGAACAGATG
CAAGCACTCT
GGAGTCTTGT
TTCGGGCCAT
GAGTATTGGC
GAGTGGATAC
GGCGAAGACT
AGTTCCAATC
ACTTCCAGGG
TACTTTTATC
TTCACATGGG
GGTGGACATA
GAAGATGGAA
CATCAGGCAT
GTGGTTCCCC
AGATAGCCCG
CGCTTACAGC
ATTTTCCAAC
TAGGAGTTAT
CATAGAAGAC
GTACTCTAAAC
TCAAACTCGC
GGAAAGGTGT
AATCCTGGGT
ACAGAGGTGT
ACAAGTTGCG
GCGAGAATCC
CTGTTGATCT
TGATCACACA
TGACTATCCC
CGAGATTCAA
GCCATGCCCC
TGGTGATCCT
TTGAACATGC
CTTTTAGGTT
ACCAAAAACT
TCACTCACTC
CCAATAGCAG
ACCCACTAGT
GTTATGGACT
ATAGTTACCA
CTGGAGGACC
7AAATGATTA
XGGCCCACT
AAAGAGTCAA
GTCAGTAATA
8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600
AGGTTTTCCA
AGGACATCAA
AGCCCTTTCT
CCCATACTTG
TGCTAATCTC
TAACATTTGA
CTGCTATGAC
AATTGATAGA
TGGAGCCTCT
CTTTCCTTAA
ATGAAGGTAC
TACACCTGAC
CAGTAACGGC
AGACTCTGAT
GGCACGGAGG
ATGCTCAAGC
TTGCTGGAGT
ACCTAAAGGA
AGTTCCTGCG
TTAATGATTC
TCCATGACCC
GTAGACTTTT
TAATCTCAAA
ATTTGACTAA
GTCACAGAGG
GGAACGTGAG
ATGCTTGAGG
GGAGAAAGTT
GTTTTGGTTT
CCATAGGAGG
TCGTGACCTT
GCTGGTTTTG
CATTGATGCT
TGGTTTCTTC
TTCACTTGCT
CCACTGCTTT
TTATCACGAG
AGGGGAGATT
TGCTGAAAAT
GAAAGGTCAT
CAGTTGGCCA
CTCAGGTGAA
GAAATTTGGC
CAAGGCACTT
TTACGACCCC
GAGCTTTGAC
TGAGTTCAAC
TGCTAAAATG
CGGGATTGGC
GGCACTCCAC
GGGGCCAGTC
GGCAGCAAAA
GACACTAATT
ATTAACTTGG
ACAGTCAAGA
AGACACACAC
GTTGCTATAA
ATGTATTGTG
AGATATACAG
CCTGCACTCG
TACCTGCAGC
ACTGAAATAC
TTAGTTGAAG
TTCTCATTTT
GTTAGGAAAT
GCCATATTTT
CCGCTGACCC
GGATTAACAC
TGCTTCATGC
GCTGCTCTCC
CCCAAGGGAA
CCATATGATA
CTGTCTTACA
ACTTACAAAA
AAATATTTTA
ACTCTGGCTG
CTAAAAACCT
GGGTTTATAG
CACGGCTTGG
GAGTTTACAT
CTGAGATGAG
CTGTATTCTT
TCAGTAAAGA
ATGTCATAGA
AGCTTCTAGG
GGAATCCAAC
TGAGGGATAT
ATGATGTTCT
CTCTAGATTA
TCAGAAGTTT
ACATGAATCA
GTGGAATCAT
TCCCCCTGCA
ATGAGCAGTG
CTCTTAGCCT
AAAGGGAATG
CCGGGTCACG
TGATAATGTA
GCCTGAAAGA
TGAGGGCATG
AGGACAATGG
TCTCAGGAGT
ACTCCCGAAG
GGTTCCCTCA
TCTAGGCTCC
GCACAGCTCC
GTCAGTGATT
CACTGGTAGT
GTCTCAACAT
GGGGAGGTTA
AAGAGTCAGA
TTATCAAATT
AACGGTAGAA
TGACCAAAAC
CATTTTCATA
CGGCCACCCC
GCCTAAAGTC
AATCAACGGC
TGCTGCAGAC
CGTTGATAAC
GGATAGTGAT
GGATTCAGTT
GAGGCTTGTA
TGTTGTAAGT
AAAGGAGATC
CCAAGTGATT
GATGGCCAAG
CCCTAAAGAT
CCCAGCCCAC
GATAATTCGG
GAATTGAGGG
CAATGGTTTG
AAATCACAAA
TCAGTTGAGT
GTATATTACC
ATGACAGAGA
TACATGTGGA
GTAGCCATGC
CTCAGAGGTG
GGGTTTTCTG
ACTGATGACA
AGACTTGAAG
ATTGTGTATG
TATCGTGACA
ACAATCCGGA
TGGAAATCTT
CTGACAATGT
TACCCGAAAG
GATGTTTTCC
GGAGCTTACC
AAGGAAACAG
GCTGAAAATC
GATGAGCACG
CTCAAAGAAA
ACAAATACCA
CAGGACCAAG
9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160
ACACTAATCA
ATCTCAAGAA
TAAATGAGAT
CTGTCCTGTA
GCAAAGTCCC
GTCAGAAGCT
GAGTAAGGAT
TACCCAGCAC
ACTTTGTAAT
CAATTGTTTC
TGTCCCAATC
AAACAAGGGC
ATGACCGTTA
CTCTTGGCTT
ACAACGACCT
TGAATATGAG
ATCTCAAGAG
CACAGCAACC
TTGTATGTGT
TCCACAGTCC
AGGGACTGGC
TCCTGGATCA
AAGGCCTGAT
TGTCCAATTA
GAAATGTCCT
ATATGTGGGC
TCCGGAGAAT
GTACTGCCTT
TTACGGATTA
TGTAAGTGAC
CAATGACCAA
GTGGACCATC
TGCTTCATTA
ATGGCCTTAC
TCTTAGGCAA
ATCACATTTT
ACTCAAGAGC
AGCATGCAGT
CCTTGCATAT
CACAATCAAT
CTTAATAAGG
CAGGCTGTTT
AATGATTCTC
GGGGGACTCT
CCAGAGCATC
AAACCCAATG
AGCATTCCTC
TAGTGTCACA
TCGAGCCAGC
TGACTATGAA
CATTGACAAA
GAGGCTAGCT
ATGGAAGCTT
AATTGGAGAT
CCCTCATTTT
CCTCATTGCC
ATCTTCATTA
AGCACCATTC
GTGCAAGGGG
AACCTTAAGA
AGGCTACATG
TTTGTTTATT
ATCGCAAGAT
AATATTGCTA
TCCCTGAACG
TCAACCATGA
ATGGCACTGT
GTCAGAAACA
GCATCACTGA
TCATTCCTAG
ACTAGACTCC
TTAAAGGGAT
ATGGACAGGC
GGGGCAAGAG
ATGAGGAAGG
CAATTCAGAG
GAGTCATGTT
CGAGGACGGC
ACGAGACAGT
ATGAGACCAT
TTCAGTGGCT
CCCCCGACCT
AGTACCCTAT
CCTATTTATA
ACAATCAGAC
AATGGGAAC
ACATTGGCCA
CAAAAGGAAT
GTGTATTCTG
CAACAATGGC
TCCTAAAAGT
CCCAGGATGT
TGCCCGCTCC
TCGGTGATCC
TGCCTGAAGA
ACTGGGCTAG
TCAAGAACAT
TATTCCATGA
ATATTATAGT
AGTCTATTGC
GGGGGTTAAC
CAGGGATGGT
CAGTGCAGCT
CTATTTACGG
CAGTGCATTT
CAGCTTGTTT
GCATAAGAGG
TGACGCCCAT
GGGAGGTATA
CCTGGCTGCT
CATAGCTGTA
TGCTAGAGTA
TCACCTCAAG
ATATTATGAT
GTCAGAGACT
TAAAAGCATC
GATACAGCAG
AGTCATACCC
TATTGGGGGG
AGTAACATCA
GACCCTCCAT
CGACCCTTAC
AACTGCAAGG
TGACAGTAAA
ACCTAGGGCA
AGGCATGCTA
CTCTCGAGTG
GCTATTAACA
GGCGAGAGCC
CCTTGAGGTC
ATCACAACTG
GCACAGAGGC
CTTGAGACCT
ATCCCGTTAT
GAAGGGTATT
TATGAGAGCG
ACAAAAGGG
ACTAGAGATT
GCAAATGAGA
GGGCTACTTG
ATAGTTGATG
GAGAGAGGTT
ATTCTGATCT
CTCCTCACAA
ATGAATTATC
TCAATTGCTG
CAAGTAATGA
TCAGCAAATC
TTTGTCCTAA
GAAGAGGACG
GCTCATGAAA
GATACCACAA
ATAACCAGAT
GGAAGAAAGA
CTAAGAAGCC
CCTGATGTAC
11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720
TAGAATCTAT
GATCAGTCAA
AGGAAACATC
TGAAGCTTGC
CAGTGTACTC
CAAGGCAAAG
CGACTAATTT
CCCTTGTCCG
CAGATAAGAA
TTTTAGAAAC
TTCACGTCGA
CTCGCAAGCT
CTTTAATTGA
AATTTGTTAC
CTATGATTGA
TAGGGGATGA
TCACTATCTA
GACCATCAGG
AAGGAGTGTT
GGCACTGTGG
CAACTGTGTG
AAGAGTTAGA
GATTCGACAA
GGACCTGCCC
ATATCAAGGC
GCGAGGCCAC
CTACGGATGG
ATCCTTGAGA
CTTCGTAAGA
ATGGGCTTAC
GGCTAATGTG
AGCACATAGG
AGTGGCAAGG
GGTTGATACT
ATTGTTTCGA
AACAGATTGT
AGAGCTGAGG
CAGAGATGCA
ATGGTCCACA
CCTGGTAACA
CGATATCAAT
CTTGGGCCAG
GAAATATCAG
TAAGGTGCTT
TATTATAGAG
CAACATGGTT
AGAGTTTACA
CATCCAGGCA
ACCAATTCGA
GGAGGCTAGG
CTTATTCGGC
TTTTTTGTCC
GTCCCATATA
GCCCCAAGTC
GGTGATGATG
AGCCTGGAGG
TTGAGGGATC
TATACCACAA
AACTTTATAT
CTCGAGAAAG
TGCGTGATCC
GCAGAGCTGT
ACAAGGCTAT
CCCCAACTAT
AAATTTGAGA
AGTTTCATAA
TGTGCGGCCA
ATGGGTGAGC
GTCAATGCTC
CCTATCCATG
TACACATGCT
TTTCTTTTGT
AAACACTTGT
GGTCTAAGAC
TTATCTCCAG
GTCATGAGAC
CCTCGGGTTG
TTGGTTCTAC
GATCCTTGCG
ATAGCTCTTG
AGCTAAGGGT
GTAGCACTCA
TCTCCAACGA
ACCAACAAGG
ATACCGGATC
CAATGATAGA
GTACCAACCC
ACACCCAGAG
ATCACATTCT
AGGACCATAT
CTGAGTTTCT
TCAATTGGGC
TGTTGTCATC
TAAGCCACCC
GTCCTTCACT
ATATGACCTA
GTGAAAGTGA
GTGTTCTGGC
CGGTAGAGAA
CAGGATCTTC
ATGTGTCATC
CCAACTGGAT
CACTGATGAG
ATCTGCTGTT
GAACGAAGCC
GATCACTCCC
AGTGAAATAC
CAATCTCTCA
AATGCTCCTA
ATCTAACACG
TCATCCCAGG
ATTGATATAT
CCATAGGAGG
AGCTAAGTCC
GAATGAAATT
GCTTATAGAG
ATTTGATGTA
GTTCCTTTCT
AAAGATCTAC
TGATGCTCAA
CCTCGACCTG
CGAGGATGTA
AGATTTGTAC
ATGTGCAGTT
GTGGAACATA
TGCGAGTGTG
GATATTGACA
AGAACAGACA
AGAATAGCAA
TGGTTGTTGG
ATCTCAACTT
TCAGGTACAT
TTTGTCATAT
GGGTTGGGCG
GTATTACATC
ATACCCAGCT
GATAATGCAC
CACCTTGTAG
ACAGCACTAT
TCAGCTCTCA
CCAAGATTAT
CATTATCATA
AGAATGAGCA
AAGAAATTCT
AACTTGCACA
TTGTTGAATG
GTACCGGACA
TGTCAACCAG
CTAACCGACC
AATCCAATTA
12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTTCGGCG AGGATCGATC AAACAGATAA
GATTGAGAGA
CAAAGATCGG
ATGATGTTGC
GGGGCAATCT
CTTGCTACAA
ACGGCTTATT
AACTAAACAA
AATTAGCACC
TTGTCAAGGT
TCAATTACAT
AGACCTTACC
TGGCTCTGCT
GGGATTTTGT
TATACCCCAG
AGGCTAACCG
GGACTTCACC
CAATTGTGGG
CTATAGAGCA
AATTGATCCA
TCTACAGGGA
CTTACCCCGT
TTTGGGGGCA
ATCTCAAGTC
CCAAGTCAGA
TAACAGTCAA
TGATCCAGGA
CAGCAACAAC
AAAATTGCTC
CGCCAATTAT
AGCTGTTGAG
CTTGGGTGAG
GTGCTTCTAT
CTATCCCTCC
GCTCTTTAAC
AGTTAGTAAT
TAACAAAGAT
CCTGGGCAAA
TCAGGGATTT
ATACAGCAAC
GCTAATGAAT
TGGACTTATA
AGACGCAGTT
GGTGCTGATC
CCATGATGTT
GTTGGCAAGA
ATTGGTAAGT
CATTCTTCTT
CGGTTATCTG
GAAACAGATT
GGAGACCAAG
TTCATTTTCG
ATCTCAAATA
AAAGATATCA
GAAATCCATG
ATATCAACAT
GGATCGGGTT
AATAGTGGGG
GAAGTTGGCC
GGGAGGCCCG
ATCCCTACCT
ACTATAGAGA
ATAGGATCAA
ATAAGTTATG
TTCATATCTA
CCTGAAAAGA
GGTCACATCC
AGTAGAGGTG
AATTGCGGGT
GCCTCAGGGC
TTCAAGGACA
AGCAGGCAAC
TACTCCGGGA
ATACTAGACT
ATTATGACGG
GAATGGTATA
ACGCCCTCGC
TGAGCATCAA
ATACAAGCAA
CTTTCCGCAG
TAATTAGGAG
CTATGTTGAT
TCTCTGCCAA
TTGTCGAACA
AAGTCACATG
CTAGTGTGGG
AGCTAGAGGA
TACTGGTGAT
TAGGGTCTCA
CTGAATCTTA
TTAAGCAGCA
TATCCATTAA
ATATCAATCC
TGGCAATTAA
AAGATGGATT
ACCAAAGAAG
GAGAACTTAT
ACAGAAAGTT
TACACCAGAA
GGGGTTTGAA
AGTTAGTCGG
TGAGGTAAAT
GGATTTCAGA
GCACAATCTT
AATCGGGTTG
ATGCCTTGAG
CACTTATAAG
TTCTAGATCT
CAGAATGGGA
GGTAGGCAGT
GTTTATCCAT
ATTGGCAGCC
TAAGCTTATG
TTATAGAGAA
TTTGGTTATG
GATAATTGAA
GCAACTAAGC
TACTCTGAAA
CGGACCTAAA
GCTTAATTCT
TCAACAAGGG
ATCTAGAATC
GATAAATAAG
TATCTTCGTT
ACGTGAGTGG
ATACAGTGCC
GTCAGTCAGC
CCCCCACACG
CCCATTTCTG
AACTCATCTG
CCAGGGGAAG
GAGATACTTA
GGTCAAAGGG
GTAGGTAATA
GTAGATTGCT
TCAGATATAG
ATCTTATCGA
CCTTTCAGCG
GTGAACCTTG
ACAGATCTCA
TCATCTGTGC
TGCATACAAG
AAACTTACAC
CTGTGCAAAG
ATACTCATCC
1iTGTTCCACG
ACTCGCAAAT
TTTATCCAGA
AAGAATCTAT
GTTTTTAAGG
TGATTAAGG
14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 ACTAATTGGT TGAACTCCGG AACCCTAATC CTGCCCCAGG TGGTTAGGCA
TTATTTGTAA
TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC
TGGT 15894
INFORMATION FOR SEQ ID NO:4: SEQUENCE CHARACTERISTICS: LENGTH: 2183 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: Met Asp Ser Leu Ser Val Asn Gln Ile Leu 1 Asp Arg Ile Asn 65 Ala Ile Gly Arg Ile 145 Trp Ser Val Lys 50 Val His Glu Asn Asp 130 Lys Phe Pro Pro His Glu Ser Asp Ser 115 Thr Glu Glu Ile Val His Ala Arg Leu Val Gly His Ile Lys Glu 100 Leu Tyr Asn Ser Lys Val Pro Phe 165 Thr Tyr Lys Asn 70 Pro Ser Ser Arg Ile 150 Leu Asn Ser Asn 55 Val Tyr Thr Lys Leu 135 Asn Phe Lys Leu 40 Gly Ile Pro Arg Val 120 Gly Leu Trp 10 Ile Val 25 Glu Asp Phe Ser Lys Ser Asn Cys 90 Lys Ile 105 Ser Asn Leu Gly Gly Val Phe Thr 170 Tyr Ala Pro Asn Lys 75 Asn Arg Lys Ser Tyr 155 Val Pro Ile Thr Gln Leu Gln Glu Val Glu 140 Met Lys Glu Val Leu Glu Leu Cys Met Ile Arg Ser Asp Leu Leu Leu 110 Phe Gln 125 Leu Arg His Ser Thr Glu His Leu Tyr Ala Gln Asn Ile Asn Tyr Pro Phe Asn Lys Lys Cys Leu Glu Asp Ser Gln 160 Met Arg 175 Ser Val Ile Lys 180 Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr 10185io Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp .L -11 n 205
Leu Phe 225 Thr Arg Gly Ala Leu 305 Phe Ile Phe Asn Leu 385 Arg Ala His Val 210 Glu Glu Val
Asn Tyr 290 Asn Ser Phe Arg Val 370 Met Asp Ala Glu Ala Leu Thr Arg Pro 275 Leu His Asp Ile Ser 355 Arg Lys Arg Asp Gln 435 Ile Val Ala Tyr 260 Thr Gln Cys Glu Thr 340 Phe Lys Gly His Thr 420 Cys Ile Leu Met 245 Met Tyr Leu Phe Gly 325 Asp Gly Tyr His Gly 405 Ile Val Ser Met 230 Thr Trp Gln Arg Thr 310 Thr Asp His Met Ala 390 Gly Arg Asp Lys 215 Tyr Ile Lys Ile Asp 295 Glu Tyr Ile Pro Asn 375 Ile Ser Asn Asn Glu Cys Asp Leu Val 280 Ile Ile His His Arg 360 Gln Phe Trp Ala Trp 440 Ser Asp Ala Ile 265 Ala Thr His Glu Leu 345 Leu Pro Cys Pro Gln 425 Lys Gln Val Arg 250 Asp Met Val Asp Leu 330 Thr Glu Lys Gly Pro 410 Ala Ser His Ile 235 Tyr Gly Leu Glu Val 315 Val Gly Ala Val Ile 395 Leu Ser Phe Val 220 Glu Thr Phe Glu Leu 300 Leu Glu Glu Val Ile 380 Ile Thr Gly Ala Tyr Gly Glu Phe Pro 285 Arg Asp Ala Ile Thr 365 Val Ile Leu Glu 445 Tyr Arg Leu Pro 270 Leu Gly Gln Leu Phe 350 Ala Tyr Asn Pro Gly 430 Val Leu Leu Leu 255 Ala Ser Ala Asn Asp 335 Ser Ala Glu Gly Leu 415 Leu Lys Thr Met 240 Gly Leu Leu Phe Gly 320 Tyr Phe Glu Thr Tyr 400 His Thr Phe Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu 450 455 Lys 465 Pro Arg Met Asn Leu 545 Glu Met Val Val Val 625 Asp Ser Tyr Leu 705 Pro Gly Asp Lys Leu Ile Leu 530 Phe Asn Ala Ser Leu 610 Arg Gln Ala Glu Pro 690 Tyr Leu Gly Lys Glu Val Met 515 Ser Ala Leu Lys Gly 595 Lys Ala Asp Phe Thr 675 Ser Val Cys Ile Leu Leu 485 Val Val Ser Met Ser 565 Glu Pro Tyr Lys Asn 645 Thr Ser Phe Asp Val 725 Gly Ala 470 Arg Phe Val Leu Thr 550 Asn His Lys Ser Gly 630 His Thr Leu Gln Pro 710 Pro Tyr Ala Tyr Leu Ser Lys 535 Tyr Gly Asp Asp Arg 615 Phe Pro Asp Phe Trp 695 His Asn Cys Arg Pro 490 Ser Tyr Glu Arg Lys 570 Lys Glu Ala Phe Met 650 Lys Arg Lys Pro Ile 730 Leu Glu 475 Lys Ser Leu Ile Ala 555 Tyr Ala Ser His Pro 635 Glu Tyr Leu Arg Asp 715 Phe Trp Trp Gly Phe His Lys 540 Cys Phe Leu His Thr 620 Gln Ala Cys Asn Leu 700 Leu Ile Thr Asp Thr Asp Asp 525 Glu Gln Lys His Arg 605 Asn Ile Tyr Leu Glu
685 Glu Asp Lys Ile Ser Gly Pro 510 Pro Thr Val Asp Thr 590 Gly Thr Ile Glu Asn 670 Ile Thr Ala Tyr Ser Val Ser 495 Tyr Glu Gly Ile Asn 575 Leu Gly Arg Arg Thr 655 Trp Tyr Ser His Pro 735 Thr Tyr 480 Arg Asp Phe Arg Ala 560 Gly Ala Pro Asn Gln 640 Val Arg Gly Val Ile 720 Met Ile Pro Leu Ser 785 Arg His Ser Ser Arg 865 Arg Ile Thr Arg Met 945 Ile Thr Asp Tyr Val 770 Thr Asp Leu Lys Ile 850 Ala Gly Gln Gln Met 930 Ser Ala Leu Trp Leu 755 Gln Trp Tyr Lys Gly 835 Ala Ala Tyr Gln Asp 915 Ala Arg Asp His Ala 995 740 Tyr Gly Pro Phe Ala 820 Ile Arg Cys Asp Ile 900 Val Leu Leu Leu Gln 980 Ser Leu Asp Tyr Val 805 Asn Tyr Cys Ser Arg 885 Leu Val Leu Phe Lys 965 Val Asp Ala Asn Asn 790 Ile Glu Tyr Val Asn 870 Tyr Ile Ile Pro Val 950 Arg Met Pro Ala Gln 775 Leu Leu Thr Asp Phe 855 Ile Leu Ser Pro Ala 935 Arg Met Thr Tyr Tyr 760 Thr Lys Arg Ile Gly 840 Trp Ala Ala Leu Leu 920 Pro Asn Ile Gln Ser 745 Glu Ile Lys Gln Val 825 Leu Ser Thr Tyr Gly 905 Leu Ile Ile Leu Gln 985 Ala Ala Val Thr 780 Trp Glu Ala 795 Arg Leu His 810 Ser Ser His Leu Val Ser Glu Thr Ile 860 Thr Met Ala 875 Ser Leu Asn 890 Phe Thr Ile Thr Asn Asn Gly Gly Met 940 Gly Asp Pro 955 Ala Ser Leu 970 Pro Gly Asp Asn Leu Val Lys Arg Ala Arg Asp Ile Phe Phe 830 Gln Ser 845 Val Asp Lys Ser Val Leu Asn Ser 910 Asp Leu 925 Asn Tyr Val Thr Met Pro Ser Ser 990 Cys Val 1005 Val Val Gly 815 Val Leu Glu Ile Lys 895 Thr Leu Leu Ser Glu 975 Phe Gln Pro Thr 800 His Tyr Lys Thr Glu 880 Val Met Ile Asn Ser 960 Glu Leu Ser 750 Ser Gly Val Arg Ile Ala Ser 1000 Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His 1010 1015 1020 Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu 1025 1030 1035 1040 Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val 1045 1050 1055 Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg 1060 1065 1070 Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala 1075 1080 1085 Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100 Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly 1105 1110 1115 1120 Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu 1125 1130 1135 Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg 1140 1145 1150 Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly 1155 1160 1165 His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser 1170 1175 1180 Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp 1185 1190 1195 1200 Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr 1205 1210 1215 Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser 1220 1225 1230 Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala 1235 1240 1245 Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg 1250 1255 1260 Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile 1265 1270 1275 1280 Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln 1285 1290 1295 Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr 1300 1305 1310 Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp 1315 1320 1325 Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu 1330 1335 1340 Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val 1345 1350 1355 1360 Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp 1365 1370 1375 His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu 1380 1385 1390 Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp 1395 1400 1405 Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe 1410 1415 1420 Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr 1425 1430 1435 1440 Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met 1445 1450 1455 Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile 1460 1465 1470 Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly 1475 1480 1485 Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His
Arg Pro 1490 1495 1500 Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg 1505 1510 1515 1520 Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro 1525 1530 1535 Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His 1540 1545 1550 Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met 1555 1560 1565 Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu 1570 1575 1580 Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val 1585 1590 1595 1600 Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala 1605 1610 1615 Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg 1620 1625 1630 Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala 1635 1640 1645 Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val 1650 1655 1660 Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys 1665 1670 1675 1680 Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala 1685 1690 1695 Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn 1700 1705 1710 Met Ser Ile Lys Asp Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu 1715 1720 1725 Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly 1730 1735 1740 Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn 1745 1750 1755 1760 Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg 1765 1770 1775 Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly 1780 1785 1790 Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe 1795 1800 1805 Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu 1810 1815 1820 Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val 1825 1830 1835 1840 Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp 1845 1850 1855 Val Gly Ser Val Asp Cys Phe Asn Tyr Ile Val Ser Asn Ile Pro Thr 1860 1865 1870 Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys 1875 1880 1885 Asp Thr Ile Glu Lys Leu Glu Glu
Leu Ala Ala Ile Leu Ser Met Ala 1890 1895 1900 Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro 1905 1910 1915 1920 Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His 1925 1930 1935 Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser 1940 1945 1950 Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met 1955 1960 1965 Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr 1970 1975 1980 Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys 1985 1990 1995 2000 Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro 2005 2010 2015 Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly 2020 2025 2030 Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp 2035 2040 2045 Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr 2050 2055 2060 Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met 2065 2070 2075 2080 Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile 2085 2090 2095 Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly 2100 2105 2110 Asn Arg Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr 2115 2120 2125 Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys 2130 2135 2140 Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val 2145 2150 2155 2160 Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly 2165 2170 2175 Tyr Ser Ala Leu Ile Lys Asp 2180
INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 15894 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID ACCAAACAAA GTTGGGTAAG GATAGATCAA TCAATGATCA
TCAAGATCCT
TAAGGAGCTT
GTGGAGCCAT
TTACCACTCG
GCGGGCCCAA
GTCAATTGAT
TCCAGAGTGA
ATGAGGCGGA
GATGGTTCGA
TGATTCTGGG
CAGACACGGC
ATTATCAGGG
AGCATTGTTC
CAGAGGAATC
ATCCAGACTA
ACTAACAGGG
TCAGAGGATC
CCAGTCACAA
CCAATACTTT
GAACAAGGAA
TACCATTCTA
AGCTGATTCG
ACAAGAGCAG
AAAAGAAACA
AAACACATTA
CTGGACCGGT
GCACTAATAG
ACCGATGACC
TCTGGCCTTA
TCACATGATG
ATCTCAGATA
GCCCAAATTT
GAGCTAAGAA
GATTAGGGAT
AGGACAAACC
TTATAGTACC
TGGTCAGGTT
GTATATTATC
CTGACGTTAG
CCTTCGCATC
ATCCAAGTAG
TTGAAGTGCA
GGGTCTTGCT
GGTGGATAAA
TATTCTAGTA
ATCCGAGATG
ACCCATTACA
AATCCCTGGA
AATTGGAAAC
CTTGTTTGTG
CATCAGGCTG
AAGAGGTACC
TAGTGATCAA
AGACCCTGAG
CGCGAAGGCG
GTACACCCAA
CACTTAGGAT
GCCACACTTT
TCAGGATCCG
GATTCCTCAA
CCGGATGTGA
GAGTCTCCAG
TTAGAGGTTG
AACATGGAGG
TCCAGGTCCG
GGATTCAACA
GTTACGGCCC
CAAAGAAGGG
120 180 240 300 360 420 480 540 600 660 720 TAGTTGGTGA ATTCAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGGACACCCG
GGAACAAACC
GATTAGCCAG
GACTGCATGA
AAATGGGAGA
GTGCAGGATC
ACTCCATGGG
GGCAAGAGAT
GTATCACTGC
GGACCAGTAG
GTGAGAATGA
GAGAAGCCAG
CCCATCTTCC
CGCAGGACAG
TGGAAGAACA
ACTAGGTGCG
AAAACTTAGG
GGGCCGATGG
CTCAAGGCCG
ATATCAGACA
GGTCTCAGCA
TGCGGTCAGG
AATCTCCAGG
GCGGTTAAGG
AGCACCCTCT
GATACCGAGG
AAGGATTGCT
TTTTATCCTA
ATTTGCTGGT
AACTGCACCC
ATACCCCCTG
AGGTTTGAAC
GGTGAGGAGG
TGAGGATGCA
AGCGGTTGGA
GCTACCAGGA
GGAGAGCTAC
AACCAGCGCA
TCGACGGTCA
AGGCTCAGAC
AGAGGCCGAG
AACCAGGTCC
CAGAAGAGCA
AGCCCATCGG
ACCCAGGACA
AACCATGCCT
GATCTGGAGA
CATCAAGCAC
GAATCCAAGA
CAGGAGGAGA
GATATGCTAT
GAAATGATAT
ACTATTAAGT
GAGTTATCCA
TACATGGTAA
CTCTGGAGCT
TTTGGTCGAT
TCAGCTGGGA
AGGCTTGTTT
CCCAGACAAG
TTGGGGGGCA
AGAGAAACCG
CCCCTAGACA
GCTGACGCCC
ACGGACACCC
GACCAGAACA
ACACAGCCGC
GGCACGCCAT
CTCACTGGCC
GGACCGAACC
CTCAGCAATT
GAGCGATGAC
TGGGTTACAG
TGCTGACTCT
CGATGAATCT
CACTGACCGG
GTGACATTGA
TTGGGATAGA
CACTTGAGTC
TCCTGGAGAA
ATGCCATGGG
CTTACTTTGA
AAGTCAGTTC
CAGAGATTGC
CCCAAGTGTC
AGGAAGATAG
GGTCTAGCAG
TTGACACTGC
TGCTCAGGCT
CTAGGGTGTA
ACATCCGCCT
CAGCCAACCA
GTCAAAAACG
GTCGAGGAAG
ACCCGCAAGG
GGATCAACTG
AACGCTGAAA
TGTTATCATG
ATCATGGTTC
GAAAACAGCG
GGATCTGCTC
TACATATATC
AACTATGTAT
CTTGATGAAT
CTCAATTCAG
AGTAGGGGTG
TCCAGCATAT
CACATTAGCA
AATGCACACT
ATTTCTACAC
GAGGGTCAAA
AGCAAGCGAT
ATCGGAGTCA
GCAAGCCATG
CAATGACAGA
ACCCTCCATC
ACCATCCACT
GACTGGAATG
CCATGGCAGC
AAGAGGAGGC
AAGGCAGTGC
CTTTGGGAAT
TTTATGATCA
AATCAGGCCT
ATGTGGATAT
CCATCTCTAT
GTAGAGGCAG
CCTGCTCTTG
CTTTACCAGC
AACAAGTTCA
GAACTTGAAA
TTTAGATTAG
TCTGAACTCG
ACTGAGGACA
GGTGATCAAA
CAGAGTCGGG
GCGAGAGCTG
GGCCAAGATC
GCAGGAATCT
GATCTTCTAG
ATTGTTATAA
CCTACGACTG
CATCCGGGCT
ATGGTCACAA
AGGCAGTTCG
ACCTCGCATC
CCCCTCAAGA
CAGCGGTGAA
TGATGGTGAT
TGGCGAACCT
GGGGTTCAGG
900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340
GCTTCTGATG
AGAGGCAACA
GGTAGGGCCA
TTTGGAGCGG
TTGAAACTGC
ACTTCCCGAA
GCACTTCCGA
AGATCGCGTC
CCCTCGGAAC
CATCAGGGCC
GCCGCACTGA
AATAATGAAG
AAAACAGCCT
CTGCTGTTAT
AGCATATCCA
AAGGATCCCA
GGCAGAGATT
CTCCAAGGAA
CTAAAGCCGA
GCATCACGCA
CGTTACCTGA
CAGATGCTGA
CCAATCGACC
GCCTCCCAAG
AAGGGTCGAT
TCAGAGTCAT
TGCTGGGGGT
CCCTGCCCTT
CTGAGCTTGA
ACAACACCCC
TCAACGCAAA
TACAGGAGTG
AAGGGGGAGA
TGGCCAAAAT
TGAAGGGAGA
CCCTGGAAGG
ACGACCCCAC
CAGGCCGAGC
TGACAAATGG
TCGGGAAAAA
GTGTAATCCG
TGACTCTCCT
TGAAGATAAT
TAATTAGTAC
TTCCACAATG
CGCTCCGATA
AGATCCTGGT
TGTTGAGGAC
AGGTGTTGGT
CATAGTCGTT
ACTAACTCTC
CCAAGTGTGC
AGAAGGAGGG
GCTTGGGAAA
GACACCCATT
TTTATTGACA
AGGTGCACCT
GACACCCGAA
TTATTATGAT
ACACGAGGAT
AGTTGAGTCA
ACACCTCTCA
TGCAGATGTC
ACTGGCCGAA
ACGGACCAGT
GATGAGCTCA
CTCCATTATA
TGATGATATC
AATGAAGTAG
kGCCTAIAATC PiCAGAGATCT
:!PACCTACCA
7TAGGCGACA AGCGATCCCC AGATCCACAG AGACGTACAG 7TCACACCTT
LATGCGGTTA
GAGATCCACG
ACTCTCAATG
AAAAAGGGGA
GGTGGTGCAA
GTGGGGAATG
TCTGGTACCA
GATGAGCTGT
AATCAGAAGA
ATTAAAAAGC
AGCATCATGA
GAACTCAATC
GTTCTCAAGA
TCCAGAGGAC
GCCGTCGGGT
AAATCCAGCC
AAAGGAGCCA
CTACAGCTCA
CATTATAAAA
ACGACTTCGA
CCTACAGTGA
GGAAGGATGA
TAGGGCCTCC
CAAAACCCGA
CAGGGCTCA
'GAGAAAGGT
ATCTGATACC
AGCTCCTGAG
TTCCTCCGCC
CAGACGCGAG
CCCAATGTGC
TCCCCGAGTG
CAATCTCCCC
TCTCCGATGT
TAATCTCCIJ
AGATCAACAG
TCGCCATTCC
CCGACCTGA
AACCCGTTGC
AGCTGCTGA
TTGTTCCTGA
GGCTAGAGGA
ACGATCTTGC
ACTTACCTGC
AACTTAGGAG
CAAGTCGGCA
TGGCAGGCTG
ATGCTTTACG
AATCGGGCGA
PGAACTCCTC3 rGAAAAACTGC ::CTAACAA&CA
C
GCTGGATACC
C
ACTCCAATCT
CCCGAACCCC
ATTAGCCTCA
TCGAAAGTA
TGTGACAT
GAGATCCCAG
CCAAGACATC
GCTAGAATCA
GCAAAATATC
TGGACTTGGG
ACCCATCATA
CAGCCGACAA
GGAATTTCAA
CACCGGCCCC
GGATCGGAAG
CAAGTTCCAC
CAACCTCATG
CAAAGTGATT
TGGGACATCA GTGCCCCAGG TACATGTTTC
GCATTTGGGT
AAAGAGGCCA
TGTTCTACA
GGGAGTGTCT
CGCAGAGGT
2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900
C
TCCGTGTTGT
GAAGAATGCT
GGATTGACAA
CAACATTTAT
ATTATTGCAA
GCACCAGTCT
GGTTCAAAAA
TCTGGAGGAG
AAGAATTCCG
TGTAGACCGT
GCCCGGACAA
GCTGACGGCA
CCACCAGCCA
CGCCCCCGAC
ACTGGAAGGC
GATCGAGGTG
ACTAAACAAA
CGGCGCCGCG
CCCCGGTGCC
AATTCAAGAC
GAGGAAGCCC
GGGCCACCAG
ACCCCTGCCC
CGAAGGGCCC
CTCCCCCTCT
CCCACCCCTA
TTATATGAGC
AGAATTCAGA
AGCGATTGGC
GGTCCACATC
AATGAAAATC
TCACATTAGA
GACCTTATGT
CAGATGCAAG
CATTTACGAC
AGTGCCCAGC
AAAAGCCCCC
AGCGTGAACA
TCCCAATCTG
CCAGACCACC
CCCTCCCCCT
ACCCAACCGC
ACTTAGGGCC
CCCCCACCTC
CACAGGCAGG
GGGGGGCCCC
ACCCACCCCA
TTCCCAGACT
TGATCCGGTG
CCGAACCGCA
TCTCGAAGGG
AAGGAGACAC
ATCACCCGTC
TCGGTCAATG
CCTGGGAAGA
GGGAACTTCA
GAAAAGATGG
AGCACAGGCA
TACCCACTGA
ATAGTAAGAA
GACGTGATCA
AATACCCGAA
TCCAAAAGAC
CCAGGCGGCC
CGTCCTCCTC
AACCGCATCC
TTCCCTCAAC
AGGCATCCGA
AAGGAACATA
CCGACAACCA
CACACCAACC
CCCCAAAAAA
CACACGACCA
CGGCCATCAC
GGCGGCCACC
AAAGACATCA
ACCAAAAGAT
CGGGAATCCC
TTTCGGATAA
CAGTGGCTTT
TCATCGATAA
GGAGAAAGAA
GCCTGGTTTT
AAATGAGCAA
TGGATATCAA
TCCAGGCAGT
TAAATGATGA
AACGACCCCC
TCCACGGACC
TGGGCACAGA
GTGGGACCCC
CCACAGCCCC
GCAAGAACTC
CTCCCTAGAC
CACACCCGAC
GAGGGAGCCC
CTCGAACAGA
AGGCCCCCAG
CAGGAACCGA
CCCGCAGA-
CAACCCGAAC
GTATCCCACA
CAATCCACCA
AGAATCAAGA
CGGGTATTAC
CAACCTGCTG
TGCAGAGCAA
GAGTGAAGTC
TGCACTTGGT
GACTCTCCAT
TGAAGACCTT
TTTGCAGCCA
CCAAGGACTA
CTCATAATGA
AAGTGAGAGG
ACAGCCCCGA
CGAGGACCAA
CGGGAAAGAG
CACAACCGAA
AGATCCTCTC
AGAACCCAGA
CCAACCAATC
CCCAGCACCC
GGGCCGACAG
ACCAGAATCC
GGAAAGGCCA
CAGCACCCAA
GCCTCTCCAA
CACCCGACGA
CTCATCCAAT
ACCGTTCCTA
GTGACCCTTA
CTTCCTGAGG
TACTCTGCTG
GGGATAGGGG
GCACAACTCG
AATCGATTAC
TCAGTTCCCC
TTCAAAGTTC
CAGCCAGAAG
CCAGCCAGCA
CACAAGGCAA
CCCCCAAGGT
ACCCCCAGCA
CCGCACAAGC
CCCCCGGCAA
CCCCGGCCCA
CCGCCGGCTC
AGCCATCGAC
CCAGCACCGC
AGACCACCCT
CAACCCGCGC
GAGCGATCCC
GTCCCCCGGT
CACTCAATTC
GTCCATCATG
3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460
GGTCTCAAGG
ACCGGTCAAA
AGCTACAAAG
ATAACTCTCC
ACAGTTTTGG
CAGAGTGTAG
GCCCTAGGCG
TTGAACTCTC
GAGGCAATCA
ATCAATAATG
CTAGGGCTCA
CGGGACCCCA
ATCAATAAGG
AGCAGAGGAA
AGTATAGCCT
GTCTCGTACA
CAAGGGTACC
GTGTGCAGCC
TCCACCAAGT
TCACAAGGGA
ACGATCATTA
GTGGTCGAGG
TACTTGCACA
AATCTGGGGA
CAGATATTGA
TGAACGTCTC
TCCATTGGGG
TTATGACTCG
TCAATAACTG
AACCAATTAG
CTTCAAGTAG
TTGCCACAGC
AAGCCATCGA
GACAAGCAGG
AGCTGATACC
AATTGCTCAG
TATCTGCGGA
TGTTAGAAAA
TAAAGGCCCG
ATCCGACGCT
ACATAGGCTC
TTATCTCGAA
AAAATGCCTT
CCTGTGCTCG
ATCTAATAGC
ATCAGGACCC
TGAACGGCGT
GAATTGACCT
ATGCAATTGC
GGAGTATGAA
TGCCATATTC
CAATCTCTCT
TTCCAGCCAT
CACGAGGGTA
AGATGCACTT
GAGACACAAG
TGCTCAGATA
CAATCTGAGA
GCAGGAGATG
GTCTATGAAC
ATACTATACA
GATATCTATC
GCTCGGATAC
GATAACTCAC
GTCCGAGATT
TCAAGAGTGG
TTTTGATGAG
GTACCCGATG
TACACTTGTA
CAATTGTGCA
TGACAAGATC
GACCATCCAA
CGGTCCTCCC
TAAGTTGGAG
AGGTTTATCG
ATGGCAGTAC
AAGATAGGGG
CAATCATTAG
GAGATTGCAG
AATGCAATGA
AGATTTGCTG
ACAGCCGGCA
GCGAGCCTGG
ATATTGGCTG
CAACTATCTT
GAAATCCTGT
CAGGCTTTGA
AGTGGAGGTG
GTCGACACAG
AAGGGGGTGA
TATACCACTG
TCATCGTGTA
AGTCCTCTGC
TCCGGGTCTT
TCAATCCTTT
CTAACATACA
GTCGGGAGCA
ATATCATTGG
GATGCCAAGG
AGCACTAGCA
TGTTAACTCT
TGGTAGGGAT
TCATAAAATT
AATACAGGAG
CCCAGAATAT
GAGTTGTCCT
TTGCACTTCA
AAACTACTAA
TTCAGGGTGT
GTGATTTAAT
CACTATTTGG
GCTATGCGCT
ATTTACTGGG
AGTCCTACTT
TTGTCCACCG
TGCCCAAGTA
CTTTCATGCC
TCCAAGAATG
TTGGGAACCG
GCAAGTGTTA
TTGCTGCCGA
GGCGGTATCC
AGAGGTTGGA
AATTGTTGGA
TAGTTTACAT
CCAAACACCC
AGGAAGTGCA
AATGCCCAAT
ACTACTGAGA
AAGACCGGTT
GGCGGGTGCG
CCAGTCCATG
TCAGGCAATT
CCAAGACTAC
CGGCCAGAAG
CCCCAGCTTA
TGGAGGAGAT
CATCTTAGAG
CATTGTACTC
GCTAGAAGGG
TGTTGCAACC
AGAGGGGACT
CCTCCGGGGG
GTTCATTTTA
CACAACAGGA
TCACTGCCCG
GGACGCTGTG
CGTAGGGACA
GTCATCGGAC
CCTGATTGCA
5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACAGGA
ACATCAAAAT
CACAAGTCTC
TGTCTCCGGA
ATCATCCACA
TCCTAGGGGA
TTTGCTGGCT
CATAAGACTT
TCTAGATGTA
GATCATCGGT
CATCTCTGAC
TTGGTGTATC
GGCTGCTGAA
CAATCAGTTC
ATTCTCAAAC
ATCTATAGTC
TAATCTGAGC
AGGTGTTATC
GCAACCAGTC
AGCCCTTTGT
CAGCTTCCAG
CCCCCTATCA
TATCGCTGAC
AATGGAGACA
CGAGTGGGCA
GAGTCTGACA
CCTATGTAAG
CTCTTCGTCA
TTCCCTCTGG
ATGTCACCAC
AGTAGGATAG
GTTCTATTCG
CATCGGGCAG
ACTAACTCAA
GATGAAGTGG
AAGATTAAAT
AACCCGCCAG
GAACTCATGA
CTAGCTGTCT
ATGTCGCTGT
ACTATGACAT
AGTAAAGGGT
AGAAATCCGG
AGTAATGATT
CACAGGGAAG
CTCGTCAAGC
ACGGATGATC
AATCAAGCAA
TGCTTCCAGC
CCATTGAAGG
GTTGAGCTTA
GTCGCTCTGA
TCAAGCAACC
CCGAACAATA
AACGAGACCG
TTATTAACAG
TCATGTTTCT
CCATCTACAC
TCGAGCATCA
GCCTGAGGAC
TCCTTAATCC
AGAGAATCAA
ATGCATTGGT
CAAAGGGAAA
CCCTGTTGGA
CCCAGGGAAT
CAGAGTTGTC
GTTTGGGGGC
TCAGCAACTG
ATTCTATCAC
TAGGTGTCTG
CAGTGATAGA
AATGGGCTGT
AGGCGTGTAA
ATAACAGGAT
AAATCAAAAT
TCCTCTACAA
ACCGCATCCA
TCGGTAGTTA
GATAAATGCC
AGAACATCTT
GAGCTTGATC
CGCAGAGATC
GGTCAAGGAC
ACCTCAGAGA
GGATAGGGAG
ATTGGATTAT
GAACTCAACT
CTGCTCAGGG
CTTGTATTTA
GTACGGGGGA
ACAACTGAGC
TCCGGTGTTC
CATGGTGGCT
AATTCCCTAT
GAAATCCCCA
CAGGCTCTAC
CCCGACAACA
GGGTAAAATC
TCCTTCATAC
TGCTTCAGGA
CTCTTGAAAC
GCATCGAGCC
ATTAAAACTT
TTCTACAAAG
ATGATTGATA
GGGTTGCTAG
CATAAAAGCC
GTGCTGACAC
TTCACCGACC
TACGACTTCA
GATCAATACT
CTACTGGAGG
CCCACTACAA
AILTCGAGGTT
ACTTACCTAG
ATGCACCGAG
CATATGACAA
TTGGGGGAGC
CAGGGATCAG
ACCGACATGC
CTCTCATCTC
CGGACAGATG
CAAGCACTCT
GGGGTCTTGT
TTCGGGCCAT
ACAAATGTCC
CACCTGAAAT
AGGGTGCAAG
ACAACCCCCA
GACCTTATGT
CCATTGCAGG
TCAGCACCAA
CACTCTTCAA
TAGTGAAATT
GAGATCTCAC
GTGCAGATGT
CCAGGGTAAC
TCAGAGGTCA
ACAATGTGTC
TGGAAAAGCC
TGTTTGAAGT
ACTATTTTGA
TCAAATTCGC
GGAAAGGTGT
AATCCTGGGT
ACAGAGGCGT
ACAAGTTGCG
GCGAGAATCC
CTGTTAATCT
TGATCACACA
7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580
CGGTTCAGGG
GCCAATGAAG
GGTTAGTCCC
AACATACCTA
ACCTGGTCAA
TGTGGTTTAT
GCCTATAAGG
CTGGTGCCGT
TGGGATGGTG
ATAGGGCTGC
GTGAAATAGA
CGCTATCTGT
ATAAGATAGT
CTACACTGTG
TAAACAATGT
CTCATATTCC
CAAGGAAGAT
AGGTTTTCCA
AGGACATCAA
AGCCCTTTCT
CCCATACTTG
TGCTAATCTC
TGACGTTTGA
CCGCTATGAC
AACTGATAGA
TGGAGCCTCT
ATGGACCTAT
AACCTAGCCT
TACCTCTTCA
CCTGCGGAGG
GATCTCCAAT
TACGTTTACA
GGGGTCCCCA
CACTTCTGTG
GGCATGGGAG
CAGTGAACCA
CATCAGAATT
CAACCAGATC
AGCTATCCTG
TCAGAACATC
GGAAGTTGGG
ATATCCAAAT
CCGTGAGCTC
ATGCCTGAGG
GGAGAAAATT
GTTTTGGTTT
CCATAGGAGG
TCGTGACCTT
ACTGGTCTTG
CATTGATGCT
TGGTTTCTTC
TTCACTTGCT
ACAAATCCAA
TAGGTGTAAT
CTGTTCCAAT
TGGATGGTGA
ATGTTTTGGC
GCCCAAGCCG
TCGAATTACA
TGCTTGCGGA
TCAGCTGCAC
ATCACATGAT
AAGAAAAACG
TTATACCCTG
GAGTATGCTC
AAGCACCGCC
AATGTCATCA
TGTAATCAGG
CTCAAAAAGG
GACACTAACT
ATTAACTTGG
ACAGTCAAGA
AGACACACAC
GTTGCTATAA
ATGTATTGTG
AGGTATACAG
CCTGCACTCG
TACCTGCAGC
CCACAACAAT
CAACACATTG
TAAGGAAGCA
TGTCAAACTC
AACCTATGAT
CTCATTTTCT
AGTGGAATGC
CTCAGAATCT
AGTCACTCGG
GTCACCCAGA
TAGGGTCCAA
AAGTTCACCT
GAGTCCCTCA
TAAAAAACGG
AGTCCAAGCT
ATTTATTTAA
GAAATTCGCT
CACGGCTTGG
GAGTTTACAT
CTGAGATGAG
CAGTATTCTT
TCAGTAAAGA
ATGTCATAGA
AGCTTCTAGG
GGAATCCAAC
TGAGGGATAT
GTGTATTGGC
GAGTGGATAC
GGCGAGGACT
AGTTCCAATC
ACTTCCAGAG
TACTTTTATC
TTCACATGGG
GGTGGATATA
GAAGATGGAA
CATCAGGCAT
GTGGTTCCCC
AGATAGCCCG
CGCATACAGC
ATTTTCCAAC
TAGGAGTTAT
CATAGAAGAC
GTACTCCAAA
CCTAGGCTCC
GCACAGCTCC
GTCAGTGATT
CACTGGTAGT
GTCTCJZLCAT
GGGGAGGTTA
AAGAGTCAGA
TTACCAAATT
AACAGTAGAA
TGACTATCCC
CGAGATTCAA
GCCATGCCCC
TGGTGATTCT
TTGAACATGC
CTTTTAGGTT
ACCAAAAACT
TCACTCACTC
CCAACCGCAG
ACCCACTAGT
CTTATGGACT
ATAGTTACCA
CTGGAGGACC
CAAATGATTA
CCGACCCACT
AAAGAGTCAA
GTCAGTGATA
GAATTGAGGG
CAATGGTTTG
AAATCACAAA
TCAGTTGAGT
GTATATTACC
ATGACAGAGA
TACATGTGGA
GTAGCCATGC
CTCAGAGGTG
8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140
CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG
ATGAAGGTAC TTATCATGAG
TACATCTGAC
CAGTAACGGC
AGACTCTGAT
GGCACGGAGG
ATGCTCAAGC
TTGCTGGAGT
ACCTAAAGGA
AGTTCCTGCG
TTAATGATTC
TCCATGACCC
GTAGACTTTT
TAATCTCAAA
ATTTGACTAA
GTCACAGGGG
AGAACGTGAG
ACACTGATCA
ATCTCAAGAA
TAAATGAGAT
CTGTCCTCTA
GCAAAGTCCC
GTCAGAAGCT
GAGTAAGGAT
TACCCAGCAC
ACTTTGTAAT
AGGGGAGATT
TGCTGAAAAT
GAAAGGTCAT
CAGTTGGCCA
TTCAGGTGAA
GAAATTTGGC
CAAGGCACTT
TTACGACCCT
GAGCTTTGAC
TGAGTTCAAC
TGCTAAAATG
CGGGATTGGC
GGCACTCCAC
GGGGCCAGTC
AGCAGCAAAA
TCCGGAGAAT
GTACTGCCTT
TTACGGATTA
TGTAAGTGAC
CAATGACCAA
GTGGACCATC
TGCTTCGTTA
ATGGCCTTAC
TCTTAGGCAA
TTAATTGAAG
TTCTCATTTT
GTTAGGAAAT
GCCATATTCT
CCCCTGACCC
GGGTTAACAC
TGCTTTATGC
GCTGCTCTCC
CCCAAAGGAA
CCATATGACA
CTGTCTTACA
ACTTACAAAA
AATTATTTTA
ACTCTAGCTG
TTAAAAACCC
GGGTTTATAG
ATGGAGGCTT
AATTGGAGAT
CCCTCATTTT
CCTCATTGCC
ATCTTCATTA
AGCACCATTC
GTGCAAGGGG
AACCTTAAGA
AGGCTACATG
CCCTAGATTA
TCAGAAGTTT
ACATGAATCA
GTGGAATCAT
TCCCCCTGCA
ATGAGCAGTG
CTCTTAGCCT
AAAGGGAATG
CTGGGTCACG
TGATAATGTA
GCCTGAAAGA
TGAGGGCATG
AGGACAATGG
TCTCAGGAGT
ACTCCCGAAG
GATTCCCTCA
ACGAGACAGT
ATGAGACCAT
TCCAGTGGCT
CCCCTGACCT
AGTACCCTAT
CCTATTTATA
ACAATCAGAC
AACGGGAAGC
ACATAGGCCA
CATTTTCATA
CGGCCACCCC
GCCTAAAGTC
AATCAACGGC
TGCTGCAGAC
CGTTGATAAC
GGATAGTGAT
GGATTCAGTT
GAGGCTTGTA
TGTTGTAAGT
AAAGGAGATC
CCAAGTGATT
GATGGCCAAG
CCCCAAAGAT
CCCAGTCCAC
TGTAATTCGG
CAGTGCATTT
CAGCTTATTT
GCATAAGAGG
TGACGCCCAT
GGGAGGTATA
CCTGGCTGCT
CATAGCCGTA
TGCTAGAGTA
TCACCTCAAG
ACTGATGACA
AGACTTGAAG
ATTGTGTATG
TATCGTGACA
ACAATCCGGA
TGGAAATCTT
CTGACAATGT
TACCCGA;LAG
AATGTTTTCC
GGAGCTTACC
AAGGAAACAG
GCTGAAAATC
GACGAGCACG
CTCAAAGAAA
ACAAGTACCA
CAGGACCAAG
ATCACGACTG
GCACAAAGGC
CTTGAAACCT
GTCCCGTTAT
GAAGGGTATT
TATGAGAGCG
ACAAAAAGGG
ACTAGAGATT
GCAAATGAGA
10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700
CAATTGTCTC
TGTCCCAATC
AAACAAGGGC
ATGACCGTTA
CTCTTGGCTT
ACAACGATCT
TGAATATGAG
ATCTCAAGAG
CACAACAACC
TTGTATGCGT
TCCATAGTCC
AGGGACTGGC
TCCTGGATCA
AAGGCCTGAT
TGTCCAATTA
GAAATGTCCT
ATATGTGGGC
TAGAATCTAT
GATCAGTCAA
AGGAAACATC
TGAAGCTTGC
CAGTGTACTC
CAAGGCAAAG
CGACTAATTT
CCCTTGTCCG
ATCACATTTT
ACTCAAGAGC
AGCATGCAGT
CCTTGCATAT
CACAATCAAT
CTTAATAAGG
CAGGCTGTTT
AATGATTCTC
GGGGGACTCT
CCAGAGCATC
AAACCCAATG
GGCATTCCTC
TAGTGTCACA
TCGAGCCAGC
TGACTATGAA
CATTGACAAA
AAGGCTAGCT
GCGAGGCCAC
CTACGGATGG
ATCCTTGAGA
CTTCGTAAGA
ATGGGCTTAT
GGCCAATGTG
AGCGCATAGG
AGTGGCAAGG
TTTGTCTATT
ATCGCAAGAT
AATATTGCTA
TCCCTGAACG
TCAACCATGA
ATGGCACTGT
GTCAGAAACA
TCATCACTAA
TCATTCCTAG
ACTAGACTCC
TTAAAAGGGT
ATGGACAGGC
GGGGCAAGAG
ATGAGGAAGG
CAPLTTTAGAG
GAGTCATGTT
CGAGGACGGC
CTTATTCGGC
TTTTTTGTCC
GTCCCATATA
GCCCCAAGTC
GGTGATGATG
AGCCTGGAGG
TTGAGGGATC
TATACCACAA
CAAAAGGAAT
GTGTATTCTG
CAACAATGGC
TCCTAAAAGT
CCCGGGATGT
TGCCCGCTCC
TCGGTGATCC
TGCCTGAAGA
ACTGGGCTAG
TCAAGAACAT
TATTCCATGA
ATATTATAGT
AGTCTATTGC
GGGGGTTAAC
CAGGGATGGT
CAGTGCAGCT
CTATTTACGG
GCCATGAGAC
CCTCGGGTTG
TTGGTTCTAC
GATCCTTGCG
ATAGCTCTTG
AGCTAAGGGT
GTACCACTCA
TCTCCAACGA
ATATTATGAT
GTCAGAGACT
TAAAAGCATC
GATACAGCAA
AGTCATACCC
TATCGGGGGG
AGTAACATCA
GACCCTTCAT
CGACCCTTAC
AACTGCAAGG
TGACAGTAAA
ACCTAGGGCA
AGGCATGCTA
CTCTCGAGTG
GCTATTGACA
GGCTAGAGCC
CCTTGAGGTC
ATGTGTCATC
CCAACTGGAT
CACTGATGAG
ATCTGCTGTT
GAACGAAGCC
GATCACTCCC
AGTGAAATAC
CAATCTCTCA
GGGCTACTTG
ATAGTTGATG
GAGAGAGGTT
ATCCTGATCT
CTCCTCACAA
ATGAATTATC
TCAATTGCTG
CAAGTAATGA
TCAGCAAATC
TTTGTCCTGA
GAAGAGGACG
GCTCATGAAA
GATACCACAA
ATAACCAGAT
GGAAGAAAGA
CTAAGAAGCC
CCTGATGTAC
TGCGAGTGTG
GATATTGACA
AGAACAGACA
AGAATAGCAA
TGGTTGTTGG
ATCTCAACTT
TCAG~jTACAT
TTTGTCATAT
11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260
CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAGGG AATGCTTCTA GGGTTGGGTG
TTTTAGAAAC
TTCACGTCGA
CCCGCAAGCT
CTTTAATTGA
AATTTGTTAC
CTATGATTGA
TAGGGGATGA
TCACTATCTA
GACCATCAGG
AAGGAGTGTT
GGCATTGTGG
CAACTGTGTG
AAGAGTTAGA
GATTCGACAA
GGACCTGCCC
ATATCAAGGC
TTGTAGACCA
GATTGAGAGT
CAAAGATCGG
ATGATGTTGC
GGGGTAATCT
CTTGCTACAA
ACGGCTTGTT
AACTAAACAA
AATTAGCACC
TTGTCAAAGT
ATTGTTTCGA
AACAGATTGT
AGAGCTTAGG
CAGAGATGCA
ATGGTCCACA
CCTGGTAACA
CGATATCAAT
CTTGGGCCAG
GAAATATCAG
TAAGGTGCTT
TATTATAGAG
CAACATGATT
AGAGTTCACA
TATCCAGGCA
ACCAATTCGA
AGAGGCTAGG
TTACTCATGC
TGATCCAGGA
CAGCAACAAC
AAAATTGCTC
CGCCAATTAT
AGCTGTTGAG
CTTGGGTGAG
GTGCTTCTAT
CTATCCCTCC
GCTCTTTAAC
CTCGAGAAAG
TGCGTGATCC
GCAGAGCTAT
ACAAGGCTAT
CCCCAACTAT
AAATTTGAGA
AGTTTCATAA
TGTGCAGCCA
ATGGGTGAGC
GTCAATGCTC
CCTATCCATG
TACACATGCT
TTTCTTCTGT
AAACACTTGT
GGTCTACGAC
TTATCTCCAG
TCTCTGACTT
TTCATTTTTG
ATCTCAAATA
AAAGATATCA
GAAATCCACG
ATATCAACAT
GGGTCGGGTT
AATAGTGGGG
GAAGTTGGTC
GGGAGGCCCG
ATACCGGATC
CGATGATAGA
GTACCAACCC
ACACCCAGAG
ATCACATTTT
AGGACCATAT
CTGAGTTTCT
TCAATTGGGC
TGTTGTCTTC
TAAGCCACCC
GTCCTTCACT
ATATGACCTA
GTGAAAGCGA
GTGTTCTAGC
CTGTAGAGAA
CAGGGTCTTC
ATCTCCGGCG
ACGCCCTCGC
TGAGCATCAA
ACACAAGCAA
CTTTCCGCAG
TAATTAGGAG
CTATGTTGAT
TTTCCGCCAA
TTGTCGAACA
AAGTCACGTG
ATCTAACACG
TCATCCCAGG
ATTGATATAT
CCATAGGAGG
AGCTAAGTCC
GAATGAAATT
GCTTATAGAG
ATTTGATGTA
GTTCCTTTCT
AAAGATCTAC
TGATGCTCAA
CCTCGACCTG
CGAGGATGTA
AGATTTGTAC
ATGTGCAGTT
GTGGAACATA
AGGATCGATC
TGAGGTAAAT
GGATTTCAGA
GCACAATCTT
AATCGGGTTA
ATGCCTTGAG
CACTTATAAG
TTCTAGATCT
CAGAATGGGA
GGTAGGCAGT
GTATTACATC
ATACCCAGCT
GATAATGCAC
CACCTTGTGG
ACAGCACTAT
TCAGCTCTCA
CCAAGATTAT
CATTATCATA
AGAATGAGCA
AAGAAATTCT
AACTTACACA
TTGTTGAATG
GTACCGGACA
TGTCAACCAG
CTAACCGATC
AATCCAATTA
AAACAGATAA
GTCAGTCAGC
CCTCCACACG
CCCATTTCAG
AACTCATCCG
CCAGGGGAAG
GAGATACTAA
GGTCAAAGGG
GTAGGTAATA
GTAGATTGCT
13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820
TCAATTTCAT
AGACCTTACC
TGGCTCTGCT
GGGATTTTGT
TCTACCCTAG
AAGCTAACCG
GGACTTCACC
CAATTGTGGG
CTATAGAGCA
AATTGATCCA
TCTACAGGGA
CTTACCCCGT
TTTGGGGGCA
ATCTCAAGTC
CTAAGTCAGA
TAACAATCAA
ATTAATTGGT
AGTCAGTAAT
TAACAAAGAT
CCTTGGCAAA
TCAGGGATTT
ATACAGCAAC
GCTAATGAAT
TGGACTTATA
AGACGCAGTT
GGTGCTGATC
CCATGATGTT
GTTGGCAAGA
ATTGGTAAGT
TATTCTTCTT
CGGTTACCTG
GAAACAGATT
GGAGACCAAA
TGGACTCCGG
ATCCCTACCT
ACTATAGAGA
ATAGGATCAA
ATAAGTTATG
TTCATATCTA
CCTGAAAAGA
GGTCACATCC
AGTAGAGGTG
AATTGCGGGT
GCCTCAGGGC
TTCAAAGACA
AGCAGGCAAC
TACTCCGGGA
ATACTAGACT
ATTATGACGG
GAATGGTATA
GACCCTAATC
CTAGTGTGGG
AGCTAGAGGA
TACTGGTGAT
TAGGGTCTTA
CTGAATCTTA
TTAAGCAGCA
TATCCATTAA
GTATCAACCC
TGGCAATTAA
AAGATGGATT
ACCAAAGAAG
GAGAACTTAT
ACAGAAAGTT
TACACCAGAA
GGGGTTTAAA
AGTTAGTCGG
CTGCCCTAGG
TTTCTATTCC
GTTTATCCAT
ATTAGCAGCC
TAAGCTTATG
TTATAGAGAA
TTTAGTCATG
GATAATTGAA
GCAACTAAGC
TATTCTGAAG
CGGACCTAAA
GCTTAACTCT
TCAACAAGGG
ATCTAGGATC
GATAAATCGG
TATCTTCGTT
ACGTGAGTGG
ATACAGTGCC
TAGTTAGGCA
CAGCTTTGTC
TCAGATATAG
ATCTTATCGA
CCTTTCAGCG
GTGAACCTTG
ACAGATCTCA
TCATCTGTGC
TGCATACAAG
AAACTTACAC
CTGTGCAAAG
ATACTCATCC
ATGTTCCATG
ACCCGCAAAT
TTTATCCAGA
AAGAATCTAT
GTTTTTAAGG
CTGATTAAGG
TTATTTGCAA
TGGT
14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15894
TATATTAAAG AAAACTTTGA AAATACGAAG
INFORMATION FOR SEQ ID NO:6:
SEQUENCE CHARACTERISTICS:
LENGTH: 2183 amino acids
TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
Met Asp Arg Ile Asn Thr Ile Gly Arg 145 Trp Ser Pro Leu Phe 225 Thr Arg Gly Asp Ser Val Lys Val His Glu Asn Asp 130 Lys Phe Val Val Val 210 Glu Glu Val Asn Ser Pro Pro His Glu Ser Asp Ser 115 Thr Glu Glu Ile Phe 195 Ala Leu Thr Arg Pro Leu Ile His Arg Val His Lys 100 Leu Asn Lys Pro Lys 180 Phe Ile Val Ala Tyr 260 Thr Ser Val Ala Leu Gly Ile Glu Tyr Ser Ile Phe 165 Ser Thr Ile Leu Met 245 Met Tyr Val Thr Tyr Lys Asn 70 Pro Ser Ser Arg Ile 150 Leu Gln Gly Ser Met 230 Thr Trp Gln Asn Asn Ser Asn Val Tyr Thr Lys Leu 135 Asn Phe Thr Ser Lys 215 Tyr Ile Lys Ile Gln Lys Leu Gly Ile Pro Arg Val 120 Gly Leu Trp His Ser 200 Glu Cys Asp Leu Val Leu 10 Val Asp Ser Ser Cys 90 Ile Asp Gly Val Thr 170 Cys Glu Gln Val Arg 250 Asp Met Tyr Ala Pro Asn Lys 75 Asn Arg Lys Ser Tyr 155 Val His Leu His Ile 235 Tyr Gly Leu Pro Ile Thr Gln Leu Gln Glu Val Glu 140 Met Lys Arg Leu Val 220 Glu Thr Phe Glu Val Glu Cys Ile Ser Leu Leu 110 Gln Arg Ser Glu Arg 190 Ser Tyr Arg Leu Pro 270 Leu His Tyr Gln Ile Tyr Phe Lys Cys Glu Ser Met 175 His Arg Leu Leu Leu 255 Ala Ser Leu Ala Asn Asn Pro Asn Lys Leu Asp Gln 160 Arg Thr Asp Thr Met 240 Gly Leu Leu 275 280 285
Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe 290 295 300
Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly 305 310 315 320
Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr 325 330 335
Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe 340 345 350
Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu 355 360 365
Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr 370 375 380
Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr 385 390 395 400
Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His 405 410 415
Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr 420 425 430
His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe 435 440 445
Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu 450 455 460
Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr 465 470 475 480
Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg 485 490 495
Arg Leu Val Asn Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp 500 505 510
Met Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe 515 520 525
Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg 530 535 540
Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala 545 550 555 560
Glu Asn Leu Ile Ser Asn Gly Ile Gly Asn Tyr Phe Lys Asp Asn Gly 565 570 575
Met Val Val Val 625 Asp Ser Tyr Leu Leu 705 Pro Gly Pro Leu Ser 785 Arg Ala Ser Leu 610 Arg Gln Ala Glu Pro 690 Tyr Leu Gly Tyr Val 770 Thr Asp Lys Gly 595 Lys Ala Asp Phe Thr 675 Ser Val Cys Ile Leu 755 Gln Trp Tyr Asp Glu 580 Val Pro Thr His Ala Lys Thr Asp 645 Ile Thr 660 Ile Ser Phe Phe Ser Asp Lys Val 725 Glu Gly 740 Tyr Leu Gly Asp Pro Tyr Phe Val 805 His Lys Ser Gly 630 His Thr Leu Gln Pro 710 Pro Tyr Ala Asn Asn 790 Ile Asp Asp Arg 615 Phe Pro Asp Phe Trp 695 His Asn Cys Ala Gln 775 Leu Leu Leu 600 Ser Ile Glu Leu Ala 680 Leu Cys Asp Gln Tyr 760 Thr Lys Arg Thr 585 Lys Pro Gly Asn Lys 665 Gln His Pro Gln Lys 745 Glu Ile Lys Gln Lys Glu Val Phe Met 650 Lys Arg Lys Pro Ile 730 Leu Ser Ala Arg Arg 810 Ala Ser His Pro 635 Glu Tyr Leu Arg Asp 715 Phe Trp Gly Val Glu 795 Leu Leu His Thr 620 His Ala Cys Asn Leu 700 Leu Ile Thr Val Thr 780 Ala His His Arg 605 Ser Val Tyr Leu Glu 685 Glu Asp Lys Ile Arg 765 Lys Ala Asp Thr 590 Gly Thr Ile Glu Asn 670 Ile Thr Ala Tyr Ser 750 Ile Arg Arg Ile Leu Gly Lys Arg Thr 655 Trp Tyr Ser His Pro 735 Thr Ala Val Val Gly 815 Ala Pro Asn Gln 640 Val Arg Gly Val Val 720 Met Ile Ser Pro Thr 800 His
His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr 820 825 830
Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys 835 840 845
Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr 850 855 860
Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu 865 870 875 880
Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val 885 890 895
Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met 900 905 910
Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile 915 920 925
Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn 930 935 940
Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser 945 950 955 960
Ile Ala Asp Leu Lys Arg Met Ile Leu Ser Ser Leu Met Pro Glu Glu 965 970 975
Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu 980 985 990
Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser 995 1000 1005
Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His 1010 1015 1020
Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu 1025 1030 1035 1040
Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val 1045 1050 1055
Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg 1060 1065 1070
Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala 1075 1080 1085
Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser 1090 1095 1100
Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly 1105 1110 1115 1120
Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu 1125 1130 1135
Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg 1140 1145 1150
Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly 1155 1160 1165
His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser 1170 1175 1180
Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp 1185 1190 1195 1200
Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr 1205 1210 1215
Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser 1220 1225 1230
Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala 1235 1240 1245
Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg 1250 1255 1260
Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile 1265 1270 1275 1280
Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Thr Thr Gln 1285 1290 1295
Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr 1300 1305 1310
Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp 1315 1320 1325
Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu 1330 1335 1340
Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val 1345 1350 1355 1360
Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp 1365 1370 1375
His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu 1380 1385 1390
Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp 1395 1400 1405
Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe 1410 1415 1420
Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr 1425 1430 1435 1440
Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met 1445 1450 1455
Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile 1460 1465 1470
Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly 1475 1480 1485
Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro 1490 1495 1500
Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg 1505 1510 1515 1520
Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro 1525 1530 1535
Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His 1540 1545 1550
Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met 1555 1560 1565
Ile Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu 1570 1575 1580
Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val 1585 1590 1595 1600
Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala 1605 1610 1615
Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg 1620 1625 1630
Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala 1635 1640 1645
Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val 1650 1655 1660
Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys 1665 1670 1675 1680
Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala 1685 1690 1695
Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn 1700 1705 1710
Met Ser Ile Lys Asp Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu 1715 1720 1725
Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly 1730 1735 1740
Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn 1745 1750 1755 1760
Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg 1765 1770 1775
Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly 1780 1785 1790
Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe 1795 1800 1805
Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu 1810 1815 1820
Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val 1825 1830 1835 1840
Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp 1845 1850 1855
Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr 1860 1865 1870
Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys 1875 1880 1885
Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala 1890 1895 1900
Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro 1905 1910 1915 1920
Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser Tyr 1925 1930 1935
Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser 1940 1945 1950
Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met 1955 1960 1965
Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr 1970 1975 1980
Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys 1985 1990 1995 2000
Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Gly Ile Asn Pro 2005 2010 2015
Ile Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly 2020 2025 2030
Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp 2035 2040 2045
Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr 2050 2055 2060
Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met 2065 2070 2075 2080
Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile 2085 2090 2095
Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly 2100 2105 2110
Asn Arg Lys Leu Ile Asn Arg Phe Ile Gln Asn Leu Lys Ser Gly Tyr 2115 2120 2125
Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys 2130 2135 2140
Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val 2145 2150 2155 2160
Phe Lys Val Thr Ile Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly 2165 2170 2175
Tyr Ser Ala Leu Ile Lys Asp 2180
INFORMATION FOR SEQ ID NO:7:
SEQUENCE CHARACTERISTICS:
LENGTH: 15894 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: RNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
ACCAAACAAA GTTGGGTAAG GATAGATCAA TCAATGATCA TATTCTAGTA CACTTAGGAT
TCAAGATCCT
TGAGGAGCTT
GTGGAGCCAT
TTACCACTCG
GCGGGCCCAA
GTCAATTGAT
TTCAGAGTGA
ATGAGGCGGA
GATGGTTCGA
TGATTCTGGG
CAGACACGGC
TAGTTGGTGA
AGGACCTCTC
GGAACAAACC
GATTAGCCAG
GACTGCATGA
AAATGGGAGA
GCGCAGGATC
ACTCCATGGG
GGCAAGAGAT
ATTATCAGGG
AGCATTGTTC
CAGAGGAATC
ATCCAGACTA
ACTAACAGGG
TCAGAGGATC
CCAGTCACAA
CCAATACTTT
GAACAAGGAA
TACCATTCTA
AGCTGATTCG
ATTTAGATTG
TTTACGCCGA
TAGGATTGCT
TTTTATCTTG
ATTTGCTGGT
AACTGCACCC
ATACCCTCTG
AGGTTTGAAC
GGTGAGGAGG
ACAAGAGCAG
AAAAGAAACA
AAACACATTA
CTGGACCGGT
GCACTAATAG
ACCGATGACC
TCTGGCCTTA
TCACATGATG
ATCTCAGATA
GCCCAGATCT
GAGCTAAGAA
GAGAGAAAAT
TTCATGGTGG
GAAATGATAT
ACTATTAAGT
GAGTTATCCA
TACATGGTAA
CTCTGGAGCT
TTTGGTCGAT
TCAGCTGGAA
GATTAGGGAT
AGGACAXACC
TTATAGTACC
TGGTCAGGTT
GTATATTATC
CTGACGTTAG
CCTTCGCATC
ATCCAAGCAG
TTGAAGTGCA
GGGTCTTGCT
GGTGGATAAA
GGTTGGATGT
CTCTAATCCT
GTGACATTGA
TTGGGATAGA
CACTTGAGTC
TCCTAGAGAA
ATGCCATGGG
CTTACTTTGA
AGGTCAGTTC
ATCCGAGATG
ACCCATTACA
AATTCCTGGA
AATTGGAAAC
CTTATTTGTG
CATCAGGCTG
AAGAGGTACC
TAGTGATCAA
AGATCCTGAG
CGCAAAGGCG
GTACACCCAA
GGTGAGGAAC
GGATATCAAG
TACATATATC
AACTATGTAT
CTTGATGAAT
CTCAATTCAG
AGTAGGAGTG
TCCAGCATAT
CACATTGGCA
GCCACACTTT
TCAGGATCCG
GATTCCTCAA
CCGGATGTGA
GAGTCTCCAG
TTAGAGGTTG
AACATGGAGG
TCCAGGTCCG
GGATTCAACA
GTTACGGCCC
CAAAGAAGGG
AGGATTGCCG
AGGACACCCG
GTAGAGGCAG
CCTGCTCTTG
CTTTACCAGC
AACAAGTTCA
GAACTTGAAA
TTTAGATTAG
TCCGAACTCG
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260
GTATCACTGC
GGATCAGTAG
GTGAGAATGA
GAGAAGCCAG
CCCATCCTCC
CGCAGGACAG
TGGAAGAACA
ATTAGGTGCG
AAAACTTAGG
GAGCCGATGG
CTCAAGGCCG
ATATCAGACA
GGTCTCAGCA
CGCGGTCAGG
AATCTCCAGG
GCGGTTAAGG
AGCACCCTCT
GATACCGAGG
GCTTCTGATG
AGAGGCAACA
AGTAGGGCCA
TTTGGAACGG
CCCTCGGAAC
GCCGCACTGA
AATAATGAAG
AAAACAGCCT
CGAGGATGCA
AGCGGTCGGA
GCTACCAGGA
GGAGAGCTAC
AACCAGCATG
TCGAAGGTCA
AGGCTCAGAC
AGAGGCCGAG
AACCAGGTCC
CAGAAGAGCA
AGCCCATCGG
ATCCAGGACA
AACCATGCTT
GATCTGGAGA
CATCAAGCAC
GAATCCAAGA
CAGGAGGAGA
GATATGCTAT
TTGAAACTGC
ACTTTCCGAA
GCACTTCCGA
AGATCGCGTC
CGTCAGGGCC
TACAGGAGTG
AAGGGGGAGA
TGGCCAAAAT
AGGCTTGTTT
CCCAGACAAG
TTGGGGGGCA
AGAGAAACCG
CCCCTAGACA
GCTGACGCTC
ACGGACACCC
GACCAGAACA
ACACAGCCGC
GGCACGCCAT
CTCACTGGCC
GGACCGAGCC
CTCAGCAATT
AAGCGATGAC
TGGGTTACAG
TGCTGACTCT
CGATGAATCT
CACTGACCGG
AGAAGGAGGG
GCTTGGGAAA
GACACCCATT
TTTATTGACA
AGATGCACCT
GACACCCGAA
CTATTATGAT
ACACGAGGAT
CAGAGATTGC
CCCAAGTATC
AGGAAGACAG
AGTCCAGCAG
TTGACACTGC
TGCTCAGGCT
CTAGGGTATA
ACATCCGCCT
CAGCCAACCA
GTCAAAAACG
GTCGAGGAAG
GCCTGCAAGG
GGATCAACTG
GACGCTGAAA
TGTTATCATG
ATCATGGTTC
GAAAACAGCG
GGATCTGCTC
GAGATCCACG
ACTCTCAATG
AAAAAGGGGA
GGTGGTGCAA
GCGGGGAATG
TCTGGTACCA
GATGAGCTGT
AATCAGAAGA
AATGCATACT
ATTTCTACAC
GAGGGTCAA
AGCAAGTGAT
ATCGGAGTCA
GCAAGCCATG
CAATGACAGA
ACCCTCCATC
ACCATCCACT
GACTGGAATG
CCATGGCAGC
AAGAGGAGGC
AAGGCGGTGC
CTTTGGGAAT
TTTATGATCA
AATCAGGCCT
ATGTGGATAT
CCATCTCTAT
AGCTCCTGAA
TTCCTCCGCC
CAGACGCGAG
CCCAATGTGC
TCCCCGAGTG
CAATCTCCCC
TCTCCGATGT
TAATCTCCAA
ACTGAGGACA
GGTGATCAAA
CAGAGTCGGG
GCGAGAGCTG
GGCCAAGATC
GCAGGAATCT
GATCTTCTAG
ATTGTTATAA
CCCACGACTG
CATCCGGGCT
ATGGTCAGAA
AGGCAGTTCG
ACCTCGCATC
CCCCTCAAGA
CAGCGGTGAA
TGATGGTGAT
TGGCGAACCT
GGGGTTCAGG
ACTCCAATCC
CCCGAACCCC
ATTGGCCTCA
TCGAAAGTCA
TGTGAGCAAT
GAGATCCCAG
CCAAGACATC
GCTAGAATCA
1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820
TTGCTGTTAT
AGCATATCCA
AAGGATCCCA
GGCAGAGATT
CTCCAGGGAA
CTAAAGCCGA
GCATCACGCA
CGTTACCTGT
CAGATGCTGA
CCAGTCGAC
GCCTCCTAAG
AAGGGTCGAT
TCAGAGTCAT
TGCTGGGGGT
CCCTGCCCTT
CTGAGCTTGA
ACAACACCCC
TCAATGCAAA
TCCGTGTTGT
GAAGAATGCT
GGATTGACAA
CAACATTTAT
ATTATTGCAA
GCACCAGTCT
GGTTCAAGAA
TCTGGAGGAG
TGAAGGGAGA
CCCTGGAAGG
ACGACCCCAC
CAGGCCGAGC
TGACTAATGG
TCGGGAAAAA
GTGTAATCCG
TGACTCTCCT
TGAAGATAAT
TAATTAGTAC
TTCCACAATG
CGCTCCGATA
AGATCCTGGT
TGTTGAGGAC
AGGTGTTGGT
CATAGTTGTT
ACTAACCCTC
CCAAGTGTGC
TTATATGAGC
GGAATTCAGA
GGCGATTGGC
GGTCCACATC
AATGAAAATC
TCACATTAGA
GACCTTATGT
CAGATGCAAG
AGTTGAGTCA
ACACCTCTCA
TGCAGATGTC
ACTGGCCGAA
ACGGACCAGT
GGTGAGCTCA
CTCCATTATA
TGATGATATC
AATGAAGTAG
AACCTAAATC
ACAGAGATCT
CAACCTACCA
CTAGGTGATA
AGAGATCCCC
AGATCCACAG
AGACGTACAG
CTCACACCTT
AATGCGGTTA
ATCACCCGTC
TCGGTCAATG
CCTGGGAAGA
GGGAACTTCA
GAAAAGATGG
AGCACAGGCA
TACCCACTGA
ATAGTAAGAA
ATTAAGAAGC
AGCATCATGA
GAACTCAATC
GTTCTCAAGA
TCCAGAGGAC
GCCGTCGGGT
AAATCCAGCC
AAAGGAGCCA
CTACAGCTCA
CATTATAAAA
ACGACTTCGA
CCTACAGTGA
GGAAGGATGA
TAGGGCCTCC
CAAAACCCGA
CAGGGCTCAA
GGAGAAAGGT
ATCTAATACC
TTTCGGATAA
CAGTGGCCTT
TCATCGACAA
GGAGAAAGAA
GCCTGGTTTT
AAATGAGCAA
TGGATATCAA
TCCAGGCAGT
AGATCAACAG
TTGCCATTCC
CCGACCTGAA
AGCCCGTTGC
AGCTGCTGAA
TTGTCCCTGA
GGCTAGAGGA
ACGATCTTGC
ACTTACCTGC
AACTTAGGAG
CAAGTCGGCA
TGGCAGGCTG
ATGCTTTATG
AATCGGGCGA
GGAACTCCTC
TGAAAAACTG
CCTAACAACA
GCTGGACACC
CGGGTATTAC
CAACCTGCTA
TGCAGAGCAA
GAGTGAAGTC
TGCACTTGGT
GACTCTCCAT
TGAAGACCTT
TTTGCAGCCA
GCAAAATATC
TGGACTTGGG
ACCCATCATA
CAGCCGACAA
GGAATTTCAA
CACCGGCCCT
GGATCGGAAG
CAAGTTCCAC
CAACCCCATG
CAAAGTGATT
TGGGACATCA
GTGCCCCAGG
TACATGTTTC
GCATTCGGGT
AAAGAGGCCA
GTGTTCTACA
GGGAGTGTCT
CCGCAGAGGT
ACCGTTCCCA
GTGACCCTCA
CTTCCTGAGG
TACTCTGCCG
GGGATAGGGG
GCACAACTCG
AATCGGTTAC
TCAGTTCCTC
2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380
AAGAATTCCG
TGTAGACCGT
GCCCGGACAA
GCCGACAGCA
CCACCAGCCA
CGCTCCGGAC
ACTGGAAGGC
GACCGAGGTG
ACTAAACAAA
CGGCACCGCG
CCCCGGTGCC
AATCCAAGAC
GAGGAAGCCC
GGGCCACCAG
ACCCCAGGCC
TGGGGGACCC
CTCCTCCTCT
CCCACCCCTA
GGTCTCAAGG
ACCGGTCAAA
AGCTACAAAG
ATAACTCTCC
ACAGTTTTGG
CAGAGTGTAG
GCCCTAGGCG
CTGAACTCTC
CATTTACGAC
AGTGCCCAGC
AAAAGCCCCC
AGTGTGGACA
TCCCAATCCG
ACAGACCACC
CCCTTCCCCC
ACCCAACCGC
ACTTAGGGCC
CCCCCACCCC
CACAGGTAGG
GGGGGGCCCC
ACCCACCCCA
CTCCCAGACT
CGATCCGGCG
CCAAACCGCA
TCTCGAAGGG
AAGGAGACAC
TGAATGTCTT
TCCATTGGGG
TTATGACTCG
TCAATAACTG
AACCAATTAG
CTTCAAGTAG
TTGCCACAGC
AAGCCATCGA
GACGTGATCA
AATACCCGAA
TCCAAAAGAC
CCAGGCGGCC
CGTCCTCCTC
AGCCGCATCC
CTCCCCCAAC
AGGCATCCGA
AAGGAACACA
CCGAAAACCA
CACACCAACC
CCCCAAAAAA
CACACGACCA
CGGCCATCAC
GGAAGCCACC
AAAGACATCA
ACCAAAAGAT
CGGGAATCCC
TGCCATATTC
CAATCTCTCT
TTCCAGCCAT
CACGAGGGTA
AGATGCACTT
GAGACACAAG
TGCTCAGATA
CAATCTGAGA
TAAATGATGA
AACGACCCCC
TTCACGGACC
CAAGCACAGA
GTAGGACCCC
CCACAGCCCT
GCAAGAACCC
CTCCCTAGAC
CACACCCGAC
GAGGGAGCCC
CCCGAACAGA
AGGCCCCCAG
CGGCAACCAA
CCCGAAAAAA
CAACCCGAAC
GTATCCCACC
CAATCCACCA
AGAATCAAGA
ATGGCAGTAC
AAGATAGGGG
CAATCATTGG
GAAATTGCAG
AATGCAATGA
AGATTTGCGG
ACAGCCGGCA
GCAAGCCTGG
CCAAGGACTA
CTCATAATGA
AAGCGAGAGG
ACAGCCCCGA
CGAGGACCAA
CGGGAAAGGA
CACAACCGAA
AGACCCTCCC
AGAACCCAGA
CCAACCAATC
CCCAGCACCC
GGGCCGACAG
ACCAGAGCCC
GGAAAGGCCA
CAGCACCCAA
GCCTCTCCAA
CATCCGACGA
CTCATCCAAT
TGTTAACTCT
TGGTAGGGAT
TCATAAAATT
AATACAGGAG
CCCAGAATAT
GAGTTGTCCT
TTGCACTTCA
AAACTACTAA
TTCAAAGTTC
CAGCCAGAAG
CCAGCCAGCA
CACAAGGCCA
CCCCCAAGGT
ACCCCCAGCA
CCGCACAAGC
TCCCCGGCAT
CCCCGGCCCG
CCGCCGCCCC
AGCCACCGAC
CCAGCATCGC
AGACCACCCT
CAACCCGCGC
GAGCGATCCC
GTCCCCCGGT
CACTCAATTC
GTCCATCATG
CCAAACACCC
AGGAAGTGCA
AATGCCCAAT
ACTACTGAGA
AAGACCGGTT
GGCAGGTGCG
CCAGTCCATG
TCAGGCAATT
4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940
GAGGCAATCA
ATCAATAATG
CTAGGGCTCA
CGGGACCCCA
ATCAATAAGG
AGCAGAGGAA
AGTATAGCCT
GTCTCGTACA
CAAGGGTACC
GTGTGCAGCC
TCCACCAAGT
TCACAAGGGA
ACGATCATTA
GTGGTCGAGG
TACCTGCACA
AATCTGGGGA
CAGATATTGA
GTGTGTCTTG
AACAAAAAGG
ACATCAAAAT
CACAAGTCTC
TGTCTCCGGC
ATCATCCACA
TCCTAAGGGA
TTTGCTGGCT
CATTAGACTC
GGCAAGCAGG
AGCTGATACC
AATTGCTCAG
TATCTGCGGA
TATTAGAAAA
TAAAGGCCCG
ATCCGACGCT
ATATAGGCTC
TTATCTCGAA
AAAATGCCTT
CCTGTGCTCG
ACCTAATAGC
ATCAAGACCC
TGAACGGTGT
GAATTGACCT
ATGCAATTGC
GGAGTATGAA
GAGGGTTGAT
GGGAACAAGT
CCTATGTAAG
CTCTCCGTCA
TTCCCTCTGG
ATGTCACCAC
AGTAGGATAG
GTTCTATTCG
CATCGGGCAG
GCAGGAGATG
GTCTATGAAC
ATACTATACA
GATATCCATC
GCTCGGATAC
GATAACTCAC
GTCCGAGATT
TCAAGAGTGG
TTTTGATGAG
GTACCCGATG
TACACTCGTA
CAATTGTGCA
TGACAAGATC
GACCATCCAA
CGGTCCTCCC
TAAGCTGGAG
AGGTTTATCG
AGGGATCCCC
TGGTATGTCA
GTCGCTCTGA
TCAAGCAACC
CCGAACGATA
ACCGAGACCG
TTATTAACAG
TCATGTTTCT
CCATCTACAC
ATATTGGCTG
CAACTATCTT
GAAATCCTGT
CAGGCTTTGA
AGTGGAGGTG
GTCGACACAG
AAGGGGGTGA
TATACCACTG
TCATCGTGTA
AGTCCTCTGC
TCCGGGTCTT
TCAATCCTCT
CTAACATACA
GTCGGGAGCA
ATATCATTGG
GATGCCAAGG
AGCACTAGCA
GCTTTAATAT
AGACCAGGCC
TCCCCTACAA
ACCGCATCCA
TCGGTAGTTA
AATAAATGCC
AGAACATCTT
GAGCTTGATC
CGCAGAGATC
TTCAGGGTGT
GTGATTTAAT
CATTATTTGG
GCTATGCGCT
ATTTACTGGG
AGTCCTACTT
TTGTCCACCG
TGCCCAAGTA
CTTTCATGCC
TCCAAGAATG
TTGGGAACCG
GCAAGTGTTA
TTGCTGCCGA
GGAGGTATCC
AGAAGTTGGA
AATTGCTGGA
TAGTTTACAT
GTTGCTGCAG
TAAAGCCTGA
CTCTTGAAAC
GCATCAAGGC
ATTAAAACTT
TTCTACAAAG
ATGATTGATA
GGGTTGCTAG
CATAAGAGCC
CCAAGACTAC
CGGCCAGAAG
CCCCAGCTTA
TGGGGGAGAT
CATCTTAGAG
CATTGTCCTC
GCTAGAGGGG
TGTTGCAACC
AGAGGGGACT
CCTCCGGGGG
GTTCATTTTA
CACAACAGGA
TCACTGCCCG
GGACGCGGTG
CGTAGGGACA
GTCATCGGAC
CCTGATTGCA
GGGGCGTTGT
TCTTACAGGG
ACAGATTTCC
CACCCGAAAT
AGGGTGCAAG
ACAACCCCCA
GACCTTATGT
CCATTGCAGG
TCAGCACCAA
6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500
TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA
GATCATCGGT
CATCTCTGAC
TTGGTGTATC
GGCTGCTGAA
CAATCAGTTC
ATTCTCAAAC
ATCTATAGTC
TAATCTGAGC
AGGGGTTATC
GCAACCAGTC
AGCCCTCTGT
CAGCTTCCAG
CCCCCTATCA
TATCGCTGAC
AATGGAGACA
CGAGTGGGCA
GAGTCTGACA
CGGTTCAGGG
GCCAATGAAG
GGTTAGTCCC
AACATACCTA
ACCTGGTCAG
TGTGGTTTAT
GCCTATAAAG
CTGGTGCCGT
GATGAAGTGG
AAAATTAAAT
AACCCGCCAG
GAACTCATGA
CTAGCTGTCT
ATGTCGCTGT
ACCATGACAT
AGTAAAGGGT
AGAAATCCGG
AGTAATGATT
CACAGGGAAG
CTCGTCAAGC
ACGGATGATC
AATCAAGCAA
TGCTTCCAGC
CCATTGAAGG
GTTGAGCTTA
ATGGACCTAT
AACCTAGCCT
AACCTCTTCA
CCTGCGGAGG
GATCTCAT
TATGTTTACA
GGGGTCCCAA
CACTTCTGTG
GCCTGAGGAC
TCCTTAATCC
AGAGAATCAA
ATGCATTGGT
CAAAGGGAAA
CCCTGTTGGA
CCCAGGGAAT
CAGAGTTGTC
GTTTGGGGGC
TCAGCAACTG
ATTCTGTCAC
TAGGTGTCTG
CAGTGATAGA
AATGGGCTGT
AGGCGTGTAA
ATAACAGGAT
AAATCAAAAT
ACAAAACCAA
TAGGTGTAAT
CTGTTCCAAT
TGGATGGTGA
ATGTTTTGGC
GCCCAGGCCG
TCGAATTACA
TGCTTGCGGA
ACCTCAGAGA
GGATAGGGAG
ATTGGATTAT
GAACTCAACT
CTGCTCAGGG
CTTGTATTTA
GTACGGGGGA
ACAACTGAGC
TCCGGTGTTC
CATGGTGGCT
GGTTCCCTAT
GAAATCCCCA
TAGGCTTTAC
CCCGACAACA
GGGTAAAAAC
TCCTTCATAC
TGCTTCAGGA
CCACAACAAT
CAACACATTG
CAAGGAAGCA
TGTCAAACTC
AACCTACGAT
CTCATTTTCT
AGTGGAATGC
TTCAGAATCT
TTCACTGACC
TACGACTTCA
GATCAATACT
CTACTGGAGG
CCCACTACAA
AGTCGAGGTT
ACTTACCTAG
ATGCACCGAG
CATATGACAA
TTGGGGGAGC
CAGGGGTCAG
ACCGACATGC
CTCTCATCTC
CGGACAGATG
CAAGCACTCT
GGGGTCTTGT
TTCGGGCCAT
GTGTATTGGC
GAGTGGATAC
GGCGAGGACT
AGTTCCAATC
ACTTCCAGGG
TACTTTTATC
TTCACATGGG
GGTGGACATA
TAGTGAAATT
GAGATCTCAC
GTGCAGATGT
CCAGGGCAAC
TCAGAGGTCA
ACAATGTGTC
TGGGAAAGCC
TGTTTGAAGT
ACTATTTTGA
TCAGGTTCGC
GGAAAGGTGT
AATCCTGGGT
ACAGAGGTGT
ACAAGTTGCG
GCGAGAATCC
CTGTTAATCT
TGATCACACA
TGACTATCCC
CGAGATTCAA
GCCATGCCCC
TGGTAATTCT
TTGAACATGC
CTTTTAGGTT
ACCAAAAACT
TCACTCACTC
7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060
TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACTCGG GAAGATGGAA CCAATCGCAG 9120
ATAGGGCTGC
GTGAAATAGA
CGCTATCTGT
ATAAGATAGT
CTACACTGTG
TAAACAATGT
CTCATATTCC
CAAGGAAGAT
AGGTTTTCCA
AGGACATCAA
AGCCCTTTCT
CCCATACTTG
TGTTAATCTC
TGACGTTTGA
CCGCTATGAC
AACTGATAGA
TGGAGCCACT
CTTTCCTTAA
ATGAAGGTAC
TACATCTGAC
CAGTAACGGC
AGACTCTGAT
GGCACGGAGG
ATGCTCAAGC
TTGCTGGAGT
CAGTGAACCG
CATCAGAATT
CAACCAGATC
AGCTATCCTG
TCAGAACATC
GGAAGTTGGG
ATATCCAAAT
CCGTGAGCTC
ATGCCTGAGG
GGAGAAAATT
GTTTTGGTTT
CCATAGGAGG
TCGTGACCTT
ACTGGTTTTG
CATTGATGCT
TGGTTTCTTC
TTCACTTGCT
CCACTGCTTT
TTATCATGAG
AGGGGAGATT
TGCTGAAAAT
GAAGGGTCAT
CAGTTGGCCA
TTCAGGTGAA
GAGATTTGGC
ATCACATGAT
AAGAAAAACG
TTGTACCCTG
GAGTATGCTC
AAGCACCGCC
AATGTCATCA
TGTAATCAGG
CTAAAAAAGG
GACACTAACT
ATTAACTTGG
ACAGTCAAGA
AGACACACAC
GTTGCTATAA
ATGTATTGTG
AGGTATGCAG
CCTGCACTCG
TACCTGCAAC
ACTGAAATAC
TTAATTGAAG
TTCTCATTTT
GTCAGGAAAT
GCCATATTTT
CCCCTGACCC
GGGTTAACAC
TGTTTTATGC
GTCACTCAGA
TAGGGTCCAA
AAGTTCACCT
GAGTCCCTCA
TAAAAAACGG
AGTCCAAGCT
ATTTATTTAA
GAAATTCGCT
CACGGCTTGG
GAGTTTACAT
CTGAGATGAG
CTGTATTCTT
TCAGTAAGGA
ATGTCATAGA
AACTTCTAGG
GGAATCCAAC
TGAGGGACAT
ATGATGTTCT
CCTTAGATTA
TCAGAAGTTT
ACATGAATCA
GTGGAATCAT
TCCCCCTGCA
ATGAGCAGTG
CTCTTAGCCT
CACCAGGCAT
GTGGTTTCCC
AGATAGCCCG
CGCTTACAGC
ATTCTCCAAC
TAGGAGTTAT
CATAGAAGAC
GTACTCCAAA
CCTAGGCTCC
GCACAGCTCC
GTCAGTGATT
CACTGGTAGT
GTCTCAACAT
GGGGAGGTTA
AAGAGTCAGA
TTATCAAATT
AACAGTAGAA
TGACCAAAAC
CATTTTCATA
CGGCCACCCC
GCCTAAAGTC
AATCAACGGC
TGCTGCAGAC
CGTTGATAAC
GGACAGTGAT
ACCCACTAGT
GTCATGGACT
ATAGTTACCA
CTTGAGGACC
CAAATGATTA
CCGGCCCACT
AAAGAGTCAA
GTCAGTGATA
GAATTGAGGG
CAATGGTTTG
AAATCACAAA
TCAGTTGAGC
GTATATTACC
ATGACAGAGA
TACATGTGGA
GTAGCTATGC
CTCAGAGGTG
GGGTTTTCTG
ACTGATGACA
AGACTTGAAG
ATTGTGTATG
TATCGTGACA
ACAATCCGGA
TGGAGATCAT
CTGACAATGT
9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620
ACCTAAAGGA
AGTTCCTGCG
TTAATGATTC
TCCATGACCC
GTAGACTTTT
TAATCTCAAA
ATTTGACTAA
GTCACAGGGG
GGAACGTTAA
ACACTGATCA
ATCTCAAGAA
TAAATGAGAT
CTGTCCTCTA
GCAAAGTCCC
GTCAGAAGCT
GGGTAAGGAT
TACCCAGCAC
ACTTTGTAAT
CAATTGTTTC
TGTCCCAATC
AAACAAGGGC
ATGACCGTTA
CTCTTGGCTT
ACAACGATCT
TGAACATGAG
ATCTCAAGAG
CAAGGCACTT
TTACGATCCT
GAGCTTTGAC
TGAGTTCAAT
CGCTAAAATG
CGGGATTGGC
GGCACTCCAC
GGGGCCAGTC
AGCAGAAAAA
TCCGGAGAAT
GTACTGCCTT
TTACGGATTA
TGTAAGTGAT
CAATGACCAA
GTGGACCATC
TGCCTCGTTA
ATGGCCTTAC
TCTTAGGCAA
ATCACATTTT
ACTCAAGAGC
AGCATGCAGT
TCTTGCATAT
CACAATCAAT
CTTAATAAGG
CAGGCTGTTT
AATGATTCTC
GCTGCTCTCC
CCCAAGGGAA
CCATATGATA
CTGTCTTACA
ACTTACAAAA
AAGTATTTTA
ACTCTGGCTG
TTAAAAACCT
GGGTTTGTAG
ATAGAAACCT
AATTGGAGAT
CCCTCATTTT
CCTCATTGCC
ATCTTCATCA
AGCACCATTC
GTGCAAGGGG
AACCTTAAGA
AGGCTACATG
TTTGTCTATT
ATTGCAAGAT
AATATTGCTA
TCCCTGAACG
TCAACCATGA
ATGGCACTGT
GTCAGAAACA
GCATCACTAA
AAAGGGAATG
CCGGGTCACG
TGATAATGTA
GCCTGAAAGA
TGAGGGCATG
AGGACAATGG
TCTCAGGAGT
ACTCCCGAAG
GATTCCCTCA
ACGAGACAGT
ATGAGACCAT
TTCAGTGGCT
CCCCCGACCT
AGTACCCTAT
CCTACTTATA
ACAATCAGAC
AACGGGAAGC
ACATTGGCCA
CAAAAGGAAT
GTGTATTCTG
CAACAATGGC
TCCTAAAAGT
CCCGAGATGT
TGCCCGCTCC
TCGGTGATCC
TGCCTGAAGA
GGATTCAGTT
GAGGCTTGTA
TGTCGTAAGT
AAAGGAGATC
CCAAGTGATC
GATGGCCAAG
CCCCAAAGAT
CCCAGTCCAC
TGTAATTCGG
CAGCGCATTT
CAGCTTATTT
GCATAAGAGG
TGACGCCCAT
GGGAGGTATA
CCTGGCTGCT
CATAGCCGTA
TGCTAGAGTA
TCACCTCAAG
ATATTATGAT
GTCAGAGACT
TAAAAGCATC
GATACAGCAA
AGTCATACCC
TATTGGGGGG
AGTAACATCA
GACCCTCCAT
TACCCGAAAG
GATGTTTTCC
GGAGCCTACC
AAGGAAACAG
GCTGAAAATC
GATGAGCACG
CTCAAAGAAA
ACAAGTACCA
CAGAATCAAG
ATCACGACTG
GCACAGAGGC
CTTGAAACCT
GTCCCGTTAT
GAAGGGTATT
TATGAGAGCG
ACAAAAAGGG
ACTAGAGATT
GCAAATGAGA
GGGCTACTTG
ATAGTTGATG
GAGAGAGGTT
ATTTTGATCT
CTCCTCACAA
ATGAATTATC
TCAATTGCTG
CAAGTAATGA
10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180
CACAACAACC
TTGTATGCGT
TCCATAGTCC
AGAGACTGGC
TCCTGGATCA
AAGGCCTGAT
TGTCCAATTA
GAAATGTCCT
ATATGTGGGC
TAGAATCTAT
GATCAGTCAA
AGGAAACATC
TGAAGCTTGC
CAGTGTACTC
CAAGGCAAAG
CGACTAATTT
CCCTTGTCAG
CAGATAAGAA
TTTTAGAAAC
TTCACGTCGA
CCCGCAAGCT
CTTTAATTGA
AATTTGTTAC
CTATGATTGA
TAGGGGATGA
TCACCATCTA
GGGGGACTCT
CCAGAGCATC
AAACCCAATG
GGCATTCCTC
TAGTGTCACA
TCGAGCCAGC
TGACTATGAA
CATTGACAAA
AAGACTAGCT
GCGAGGCCAC
CTACGGATGG
ATCCTTGAGA
CTTCGTAAGA
ATGGGCTTAC
GGCCAATGTG
AGCGCATAGG
AGTGGCAAGG
AGTTGATACT
ATTGTTTCGA
AACAGATTGT
AGAGCTGAGG
CAGAGATGCA
ATGGTCCACA
CCTGGTAACA
CGATATCAAT
CTTGGGCCAG
TCATTCCTAG
ACTAGACTCC
TTAAAAGGGT
ATGGACAGGC
GGGGCAAGAG
ATGAGGAAGG
CAATTTAGAG
GAGTCATGTT
CGAGGACGGC
CTTATTCGGC
TTTTTTGTCC
GTCCCATATA
GCCCCAAGTA
GGTGATGATG
AGCCTGGAGG
TTGAGGGATC
TATACCACAA
AACTTTATAT
CTCGAGAAAG
TGCGTGATCC
GCAGAGCTAT
ACAAGGCTAT
CCCCAACTAT
AAATTTGAGA
AGTTTCATAA
TGTGCAGCCA
ACTGGGCTAG
TCAAGAACAT
TATTCCATGA
ATATTATAGT
AGTCTATTGC
GGGGGTTAAC
CAGGGATGGT
CAGTGCAGCT
CTATTTACGG
GTCATGAGAC
CCTCGGGTTG
TTGGTTCTAC
GATCCTTGCG
ATAGCTCTTG
AGCTAAGGGT
GTAGCACTCA
TCTCCAACGA
ACCAACAAGG
ATACTGGATC
CGATGATAGA
GTACCAACCC
ACACCCAGAG
ATCACATTCT
AGGACCATAT
CTGAGTTTCT
TCAATTGGGC
CGACCCTTAC
AACTGCAAGG
TGACAGTAAA
ACCTAGGGCA
AGGCATGCTA
CTCTCGAGTG
GCTATTGACA
GGCTAGAGCC
CCTTGAGGTC
ATGTGTCATC
CCAACTGGAT
CACTGATGAG
ATCTGCCGTT
GAACGAAGCC
GATCACTCCC
AGTGAAATAC
CAATCTCTCA
AATGCTTCTA
ATCTAACACG
TCATCCCAGG
ATTGATATAT
CCATAGGAGG
AGCTAAGTCC
GAATGAAATT
GCTTATAGAG
ATTTGATGTA
TCAGCAAATC
TTTGTCCTAA
GAAGAGGACG
GCTCATGAAA
GATACCACAA
ATAACCAGAT
GGAAGAAAGA
CTAAGAAGCC
CCTGATGTAC
TGCGAGTGTG
GATATTGACA
AGAACAGACA
AGAATAGCAA
TGGTTGTTGG
ATCTCGACTT
TCAGGTACAT
TTTGTCATAT
GGGTTGGGTG
GTATTACATC
ATACCCAGCT
GATAATGCAC
CACCTTGTGG
ACAGCACTAT
TCAGCTCTCA
CCAAGATTAT
CATTATCATA
12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740
GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCTTC GTTCCTTTCT AGAATGAGCA 13800
AAGGAGTGTU
GGCATTGTGG
CAACTGTGTG
AAGAGTTAGA
GATTCGACAA
GGACCTGCCC
ATATCAAGGC
TTGTAGACCA
GATTGAGAGT
CAAAGGTCGG
ATGATGTTGC
GGGGTAGTCT
CTTGCTACAA
ACGGCTTGTT
AACTAAACAA
AATTAGCACC
TTGTCAAGGT
TCAATTTCAT
AGACCTTACC
TGGCTCTACT
GGGATTTTGT
TCTACCCTAG
AAGCTAACCG
GGACTTCACC
TAAGGTGCTT
TATTATAGAG
CAACATGGTT
AGAGTTCACA
CATCCAGGCA
ACCGATTCGA
AGAGGCTAGG
TTACTCATGC
TGATCCAGGA
CAGCAACAAC
AAAATTGCTC
TGCCAATTAT
AGCTGTTGAG
CTTGGGTGAG
GTGCTTCTAT
CTATCCCTCC
GCTCTTTAAC
AGTCAGTAAT
CAACAAAGAT
CCTTGGCAA
TCAGGGATTT
GTACAGCAAC
GCTAATGAT
TGGACTTATA
GTCAATGCTC
CCTATCCATG
TACACATGCT
TTTCTTTTGT
AAACACTTGT
GGTCTAAGGC
TTATCTCCAG
TCTCTGACTT
TTCATTTTTG
ATCTCAAATA
AAAGATATCA
GAAATCCATG
ATATCAACAT
GGGTCGGGTT
AATAGTGGGG
GAAGTTGGCC
GGGAGGCCCG
ATCCCTACCT
ACTATAGAGA
ATAGGATCAA
ATAAGCTATG
TTCATATCTA
CCTGAAAAGA
GGTCACATCC
TAAGCCACCC
GTCCTTCACT
ATATGACCTA
GTGAAAGCGA
GTGTTCTGGC
CGGTAGAGAA
CAGGATCTTC
ATCTCCGTCG
ATGCCCTCGC
TGAGCATCAA
ACACAAGCA
CTTTCCGCAG
TAATTAGGAG
CTATGTTGAT
TTTCCGCCA
TTGTCGAACA
AAGTCACGTG
CTAGTGTGGG
AGTTAGAGGA
TACTGGTGAT
TAGGGTCTCA
CTGAATCTTA
TTAAGCAGCAC
TATCTATCAA C
AAAGATCTAC
TGATGCTCA
CCTTGACCTG
TGAGGATGTA
AGATTTGTAC
ATGTGCAGTT
GTGGAACATA
AGGATCTATC
TGAGGTAAAT
GGATTTCAGA
GCACAATCTT
AATCGGGTTA
ATGCCTTGAG
CACTTATAAG
TTCTAGATCT
CAGAATGGGA
GGTAGGCAGT
k.TTTATCCAT k.TTGGCAGCC
CAAGCTTATG
~TATAGAGAA
~TTAGTTATG2 7ATAATTGAA,
;CAACTAAGc I
AAGAAATTCT
AACTTGCACA
TTGTTGAATG
GTACCGGACA
TGTCAACCAG
CTAACCGATC
AATCCAATTA
AAACAGATAA
GTCAGTCAGC
CCTCCACACG
CCCATTTCAG
AACTCATCTG
CCAGGGGAAG
GAGATACTA
GGTCAAAGGG
GTAGGTAATA
ATAGATTGCT
rCAGATATAG kTCTTATCGA
CCTTTCAGCG
'TGAACCTTG
kCAGATCTCA
LCATCTGTGC
13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240
CAATTGTGGG AGGCGCAGTT AGTAGAGGTG ATATCAACCC TATTCTGA
AAACTTACAC
15300
CTATAGAGCA
AATTAATCCA
TCTACAGGGA
CTTACCCCGT
TTTGGGGGCA
ATCTCAAGTC
CCAAGTCAGA
TAACAGTCAA
ATTAATTGGT
GGTGCTGATC
CCATGATGTT
GTTGGCAAGA
ATTGGTAAGT
TATTCTTCTT
CGGTTATCTA
GAAACAGATT
GGAGACCAAA
TGAACTCCGG
AGTTGCGGGT
GCCTCAGGGC
TTCAAAGACA
AGTAGGCAAC
TACTCCGGGA
ATACTAGACT
ATTATGACGG
GAATGGTATA
AACCCTAATC
TGGCAATTAA
AAGATGGATT
ACCAAAGAAG
GAGAACTTGT
ACAGAAAGTT
TACACCAGAA
GGGGTTTAAA
AGTTAGTCGG
CTACCCTAGG
TTTCTATTCC
CGGACCTAAG
GCTTAACTCT
TCAACAAGGG
ATCTAGGATC
GATAAATCGG
TATCTTCGTT
ACGTGAGTGG
ATACAGCGCT
TAGTTAGGCA
CAGCTTTGTC
CTGTGCAAAG
ATACTCATCC
ATGTTCCACG
ACTCGCAAAT
TTTATCCAGA
AAGAATCTAT
GTTTTTAAGG
CTGATTAAGG
TTATTTGCAA
TGGT
15360 15420 154 15540 15600 15660 15720 15780 15840 15894 TATATTAAAG AAAACTTTGA
AXATACGAAG
INFORMATION FOR SEQ ID NO:8:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 2183 amino acids
TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr 1 5 10 Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala 25 Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro 40 Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn 55 Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys 70 75 Pro Glu Val His Leu Ile Leu Thr Leu Gln Met Leu Arg Glu Tyr Ala Cys Gln Asn Ile Ile Asn Ser Tyr Pro Leu Phe Asn Ala His Ser His Ile Pro Tyr Pro Asn Cys 90 Asn Gln Asp Ile Glu Asp Lys 100 Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys 105 110 Gly Arg Ile 145 Trp Ser Pro Leu Phe 225 Thr Arg Gly Ala Leu 305 Phe Ile Phe Asn Asp 130 Lys Phe Val Val Val 210 Glu Glu Val Asn Tyr 290 Asn Ser Phe Arg Ser 115 Thr Glu Glu Ile Phe 195 Ala Leu Thr Arg Pro 275 Leu His Asp Ile Ser 355 Leu Asn Lys Pro Lys 180 Phe Ile Val Ala Tyr 260 Thr Gln Cys Glu Thr 340 Phe Tyr Ser Ile Phe 165 Ser Thr Ile Leu Met 245 Met Tyr Leu Phe Gly 325 Asp Gly Ser Arg Ile 150 Leu Gln Gly Ser Met 230 Thr Trp Gln Arg Thr 310 Thr Asp His Lys Leu 135 Asn Phe Thr Ser Lys 215 Tyr Ile Lys Ile Asp 295 Glu Tyr Ile Pro Val 120 Gly Leu Trp His Ser 200 Glu Cys Asp Leu Val 280 Ile Ile His His Arg 360 Ser Leu Gly Phe Thr 185 Val Ser Asp Ala Ile 265 Ala Thr His Glu Leu 345 Leu Asp Gly Val Thr 170 Cys Glu Gln Val Arg 250 Asp Met Val Asp Leu 330 Thr Glu Lys Ser Tyr Val His Leu His Ile 235 Tyr Gly Leu Glu Val 315 Ile Gly Ala Val Glu 140 Met Lys Arg Leu Val 220 Glu Ala Phe Glu Leu 300 Leu Glu Glu Val Phe 125 Leu His Thr Arg Ile 205 Tyr Gly Glu Phe Pro 285 Arg Asp Ala Ile Thr 365 Gln Arg Ser Glu Arg 190 Ser Tyr Arg Leu Pro 270 Leu Gly Gln Leu Phe 350 Ala Cys Glu Ser Met 175 His Arg Leu Leu Leu 255 Ala Ser Ala Asn Asp 335 Ser Ala Leu Asp Gln 160 Arg Thr Asp Thr Met 240 Gly Leu Leu Phe Gly 320 Tyr Phe Glu 155 Asn Leu 385 Arg Ala His Gly Lys 465 Pro Arg Met Asn Met Val Val Val 625 Asn Val 370 Met Asp Ala Glu Cys 450 Asp Lys Leu Ile Leu 530 Phe Asn Ala Ser Leu 610 Lys Gln Arg Lys Arg Asp Gln 435
Phe Lys Glu Val Met 515 Ser Ala Leu Lys Gly 595 Lys Ala Asp Lys Gly His Thr 420 Cys Met Ala Phe Asp 500 Tyr Tyr Lys Ile Asp 580 Val Thr Glu Thr Tyr His Gly 405 Ile Val Pro Leu Leu 485 Val Val Ser Met Ser 565 Glu Pro Tyr Lys Asp Met Ala 390 Gly Arg Asp Leu Ala 470 Arg Phe Val Leu Thr 550 Asn His Lys Ser Gly 630 His Asn 375 Ile Ser Asn Asn Ser 455 Ala Tyr Leu Ser Lys 535 Tyr Gly Asp Asp Arg 615 Phe Pro Pro Cys Pro Gln 425 Arg Asp Gln Pro Asp 505 Ala Lys Met Gly Thr 585 Lys Pro Gly Asn Lys Gly Pro 410 Ala Ser Ser Arg Pro 490 Ser Tyr Glu Arg Lys 570 Lys Glu Val Phe Ile Val Ile 395 Leu Ser Phe Asp Glu 475 Lys Ser Leu Ile Ala 555 Tyr Ala Ser His Pro 635 Glu Ile 380 Ile Thr Gly Ala Leu 460 Trp Gly Phe His Lys 540 Cys Phe Leu His Thr 620 His Thr Val Ile Leu Glu Gly 445 Thr Asp Thr Asp Asp 525 Glu Gln Lys His Arg 605 Ser Val Tyr Tyr Asn Pro Gly 430 Val Met Ser Gly Pro 510 Pro Thr Val Asp Thr 590 Gly Thr Ile Glu Glu Thr Gly Tyr 400 Leu His 415 Leu Thr Arg Phe Tyr Leu Val Tyr 480 Ser Arg 495 Tyr Asp Glu Phe Gly Arg Ile Ala 560 Asn Gly 575 Leu Ala Gly Pro Arg Asn Arg Gln 640 Thr Val 650
Ser Tyr Leu Leu 705 Pro Gly Pro Leu Ser 785 Arg His Ser Ser Arg 865 Arg Ile Ala Glu Pro 690 Tyr Leu Gly Tyr Val 770 Thr Asp Leu Lys Ile 850 Ala Gly Gln Phe Thr 675 Ser Val Cys Ile Leu 755 Gln Trp Tyr Lys Gly 835 Ala Ala Tyr Gln Thr Ser Phe Asp Val 725 Gly Leu Asp Tyr Val 805 Asn Tyr Cys Ser Arg 885 Leu Thr Leu Gln Pro 710 Pro Tyr Ala Asn Asn 790 Ile Glu Tyr Val Asn 870 Tyr Ile Phe Trp 695 His Asn Cys Ala Gln 775 Leu Leu Thr Asp Phe 855 Ile Leu Ser Ala 680 Leu Cys Asp Gln Tyr 760 Thr Lys Arg Ile Gly 840 Trp Ala Ala Leu Leu 920 665 Gln His Pro Gln Lys 745 Glu Ile Lys Gln Val 825 Leu Ser Thr Tyr Gly 905 Arg Leu Lys Arg Pro Asp 715 Ile Phe 730 Leu Trp Ser Gly Ala Val Arg Glu 795 Arg Leu 810 Ser Ser Leu Val Glu Thr Thr Met 875 Ser Leu 890 Phe Thr Asn Leu 700 Leu Ile Thr Val Thr 780 Ala His His Ser Ile 860 Ala Asn Ile Glu 685 Glu Asp Lys Ile Arg 765 Lys Ala Asp Phe Gln 845 Val Lys Val Asn 670 Ile Thr Ala Tyr Ser 750 Ile Arg Arg Ile Phe 830 Ser Asp Ser Leu Ser 910 Tyr Ser His Pro 735 Thr Ala Val Val Gly 815 Val Leu Glu Ile Lys 895 Thr Gly Val Val 720 Met Ile Ser Pro Thr 800 His Tyr Lys Thr Glu 880 Val Met Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg Thr Arg Asp Val Val Ile Pro Leu Thr Asn Asn Asp 925 Leu Leu Ile
Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn 930 935 940
Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser 945 950 955 960
Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu 965 970 975
Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu 980 985 990
Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser 995 1000 1005
Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His 1010 1015 1020
Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu 1025 1030 1035 1040
Glu Asp Glu Arg Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val 1045 1050 1055
Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg 1060 1065 1070
Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala 1075 1080 1085
Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser 1090 1095 1100
Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly 1105 1110 1115 1120
Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu 1125 1130 1135
Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg 1140 1145 1150
Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly 1155 1160 1165
His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser 1170 1175 1180
Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp 1185 1190 1195 1200
Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr 1205 1210 1215
Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser 1220 1225 1230
Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala 1235 1240 1245
Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg 1250 1255 1260
Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile 1265 1270 1275 1280
Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln 1285 1290 1295
Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr 1300 1305 1310
Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp 1315 1320 1325
Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu 1330 1335 1340
Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val 1345 1350 1355 1360
Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp 1365 1370 1375
His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu 1380 1385 1390
Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp 1395 1400 1405
Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe 1410 1415 1420
Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr 1425 1430 1435 1440
Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met 1445 1450 1455
Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile 1460 1465 1470
Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly 1475 1480 1485
Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro 1490 1495 1500
Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg 1505 1510 1515 1520
Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro 1525 1530 1535
Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His 1540 1545 1550
Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met 1555 1560 1565
Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu 1570 1575 1580
Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val 1585 1590 1595 1600
Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala 1605 1610 1615
Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg 1620 1625 1630
Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala 1635 1640 1645
Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val 1650 1655 1660
Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys 1665 1670 1675 1680
Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala 1685 1690 1695
Glu Val Asn Val Ser Gln Pro Lys Val Gly Ser Asn Asn Ile Ser Asn 1700 1705 1710
Met Ser Ile Lys Asp Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu 1715 1720 1725
Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly 1730 1735 1740
Ser Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn 1745 1750 1755 1760
Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg 1765 1770 1775
Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly 1780 1785 1790
Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe 1795 1800 1805
Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu 1810 1815 1820
Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val 1825 1830 1835 1840
Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp 1845 1850 1855
Val Gly Ser Ile Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr 1860 1865 1870
Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys 1875 1880 1885
Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala 1890 1895 1900
Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro 1905 1910 1915 1920
Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His 1925 1930 1935
Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser 1940 1945 1950
Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met 1955 1960 1965
Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr 1970 1975 1980
Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys 1985 1990 1995 2000
Ile Gln Ala Ile Val Gly Gly Ala Val Ser Arg Gly Asp Ile Asn Pro 2005 2010 2015
Ile Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Ser Cys Gly 2020 2025 2030
Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp 2035 2040 2045
Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr 2050 2055 2060
Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met 2065 2070 2075 2080
Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Val 2085 2090 2095
Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly 2100 2105 2110
Asn Arg Lys Leu Ile Asn Arg Phe Ile Gln Asn Leu Lys Ser Gly Tyr 2115 2120 2125
Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys 2130 2135 2140
Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val 2145 2150 2155 2160
Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly 2165 2170 2175
Tyr Ser Ala Leu Ile Lys Asp 2180
INFORMATION FOR SEQ ID NO:9:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 15894 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: RNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
ACCAAACAAA GTTGGGTAAG GATAGTTCAA TCAATGATCA TCTTCTAGTG CACTTAGGAT 60
TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT 120
TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180
GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA 240
TTACCACTCG
GCGGGCCCAA
GTCAATTGAT
TCCAGAGTGA
ATGAGGCGGA
GATGGTTCGA
TGATTCTGGG
CAGACACGGC
TGGTTGGTGA
AGGACCTCTC
GAAACAAACC
GATTAGCCAG
GACTGCATGA
AAATGGGGGA
GTGCAGGATC
ACTCCATGGG
GGCAAGAGAT
GTATCACTGC
AGATCAGTAG
GTGAGAATGA
GAGAAGCCAG
CCCATCTTCC
CGCAGGACAG
CGGAAGAACA
ACTAGGTGCG
AAAACTTAGG
ATCCAGACTT
ACTAACAGGG
TCAGAGGATC
CCAGTCACAA
CCAATACTTT
GAACAAGGAA
TACCATCCTA
AGCTGATTCG
ATTTAGATTG
CTTACGCCGA
CAGGATTGCT
TTTTATCCTG
ATTTGCTGGT
AACTGCACCC
ATACCCTCTG
AGGTTTGAAC
GGTAAGGAGG
CGAGGATGCA
AGCGGTTGGA
GCTACCGAGA
GGAGAGCTAC
AACCGGCACA
TCGAAGGTCA
AGGCTCAGAC
AGAGGCCGAG
AACCAGGTCC
CTGGACCGGT
GCACTAATAG
ACCGATGACC
TCTGGCCTTA
TCACATGATG
ATCTCAGATA
GCCCAAATTT
GAGCTAAGAA
GAGAGAAAAT
TTCATGGTCG
GAAATGATAT
ACTATTAAGT
GAGTTATCCA
TACATGGTAA
CTCTGGAGCT
TTTGGCCGAT
TCAGCTGGAA
AGGCTTGTTT
CCCAGACAAG
TTGGGGGGCA
AGAGAAACCG
CCCCTAGACA
GCTGACGCCC
ACGGACACCC
GGCCAGAACA
ACACAGCCGC
TGGTCAGGTT
GTATATTATC
CTGACGTTAG
CCTTCGCATC
ATCCAATTAG
TTGAAGTGCA
GGGTCTTGCT
GGTGGATA.AA
GGTTGGATGT
CTCTAATCCT
GTGACATTGA
TTGGGATAGA
CACTTGAGTC
TCCTGGAGAA
ATGCCATGGG
CTTACTTTGA
AGGTCAGTTC
CAGAGATTGC
CCCAAGTATC
AGGAAGATAG
GGCCCAGCAG
TTGACACTGC
TGCTTAGGCT
CTATAGTGTA
ACATCCGCCT
CAGCCCATCA
AATTGGAAAC
CTTATTTGTG
CATAAGGCTG
AAGAGGTACC
TAGTGATCAA
AGACCCTGAG
CGCAAAGGCG
GTACACCCAA
GGTGAGGAAC
GGATATCAAG
TACATATATC
AACTATGTAT
CTTGATGAAC
CTCAATTCAG
AGTAGGAGTG
TCCAGCATAT
CACATTGGCA
AATGCATACT
ATTTCTACAC
GAGGGTCAAA
AGCAAGTGAT
AACGGAGTCC
GCAAGCCATG
CAATGACAGA
ACCCTCCATC
ACCATCCACT
CCGGATGTGA
GAGTCTCCAG
TTAGAGGTTG
AACATGGAGG
TCCAGGTTCG
GGATTCAACA
GTTACGGCCC
CAAAGAAGGG
AGGATTGCCG
AGAACACCCG
GTAGAGGCAG
CCTGCTCTTG
CTTTACCAGC
AACAAGTTCA
GAACTTGAAA
TTTAGATTAG
TCTGAACTCG
ACTGAGGACA
GGTGATCAAA
CAGAGTCGAG
GCGAGAGCTG
AGCCAAGATC
GCAGGAATCT
AATCTTCTAG
ATTGTTATAA
CCCACGATTG
300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800
GAGCCAATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT
CTCAAGGCCG
ATATCAGACA
GGTCTCAGCA
CGCGGTCAGG
AATCTCCAGG
GCGGTTAAGG
AGCACCCTCT
GATACCGAGG
GCTTCTGATG
AGAGGCAACA
GGTAGGGCCA
TTTGGAACGG
CCCTCGGAAC
GCCGCACTGA
AATAATGAAG
AAAACAGCCT
CTGCTGTTAT
AGCATATCCA
AAGGATCCCA
GGCAGAGATT
CTCCAAGGAA
CTAAAGCCGA
GCATCACGCA
CGTTACCTGA
CAGATGCTGA
AGCCCATCGG
ACCCAGGACA
AACCATGCCT
GACCTGGAGA
CATCAAGCAC
GAATCCAAGA
CAGGAGGAGA
GATATGCTAT
TTGAAACTGC
ACTTTCCGAA
GCACTTCCGG
AGATCGCGTC
CATCAGGGCC
TACAGGAGTG
AAGGGGGAGA
TGGCCAAAAT
TGAAGGGAGA
CCCTGGAAGG
ACGACCCCAC
CAGGCCGAGC
TGACAAATGG
TCGGGAAAAA
GTGTAATCCG
TGACTCTCCT
TGAAGATAAT
CTCACTGGCC
GGAGCGAGCC
CTCAGCAATT
GAGCGATGAC
TGGGTTACAG
TGCTGACTCT
CAATGAATCT
CACTGACCGG
AGAAGGAGGG
GCTTGGGAAA
GACACCCATT
TTTATTGACA
AGGTGCACCT
GACACCCGAA
CTATTATGAT
ACACGAGGAT
AGTTGAGTCA
ACACCTCTCA
TGCAGATGTC
ACTGGCCGAA
ACGGACCAGT
GATGAGCTCA
CTCCATTATA
TGATGATATC
AATGAAGTAG
ATCGAGGAAG
ACCTGCAGGG
GGATCAACTG
GACGCTGAAA
TGTTATTACG
ATCATGGTTC
GAAAACAGCG
GGATCTGCTC
GAGATCCACG
ACTCTCAATG
AAAAAGGGCA
GGTGGTGCAA
GCGGGGAATG
TCTGGTACCA
GATGAGCTGT
AATCAGAAGA
ATTAAGAAGC
AGCATCATGA
GAAATCAATC
GTTCTCAAGA
TCCAGAGGAC
GCCGTCGGGT
AAATCCAGCC
AAAGGAGCCA
CTACAGCTCA
CTATGGCAGC
AAGAGAAGGC
AAGGCGGTGC
CTTTGGGAAT
TTTATGATCA
AATCAGGCCT
ATGTGGATAT
CCATCTCTAT
AGCTCCTGAG
TTCCTCCGCC
CAGACGCGAG
CCCAATGTGC
TCCCCGAGTG
CAATCTCCCC
TCTCTGATGT
TAATCTCCAA
AGATCAACAG
TCGCCATTCC
CCGACTTGAA
AACCCGTTGC
AGCTGCTGAA
TTGTTCCTGA
GGCTAGAGGA
ATGATCTTGC
ACTTACCTGC
ATGGTCAGAA
AGGCAGTTCG
ACCTCGCATC
CCCCCCAAGA
CAGCGGTGAA
TGATGGTGAT
TGGCGAACCT
GGGGTTCAGG
ACTCCAATCC
CCCGGACCCC
ATTAGCCTCA
TCGAAAGTCA
TGTGAGCAAT
GAGATCCCAG
CCAAGATATT
GCTAGAATCA
GCAAAATATC
TGGACTTGGG
ACCCATCATA
CAGCCGACA
GGAATTTCAG
CACCGGCCCT
GGATCGGAAG
CAAGTTCCAC
CAACCCCATG
1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360
CCAGTCGACC
GCCTCCCAAG
AAGGGTCGAT
TCAGAGTCAT
TGCTGGGGGT
CCCTGCCCTT
CTGAGCTTGA
ACAACACCCC
TCAACGCAAA
TCCGTGTTGT
GAAGAATGCT
GGATTGACAA
CAACATTTAT
ATTATTGCAA
GCACCAGTCT
GGTTCAAGAA
TCTGGAGGAG
AAGAATTCCG
TGTAGACCGT
GCCCGGACAA
GCCGACGGCA
CCACCAGCCA
TGCCCCCGAT
ATTGGAAGGC
GACCGAGGTG
CAACTAGTAC
TTCCACAATG
CGCTCCGATA
AGATCCTGGT
TGTTGAGGAC
AGGTGTTGGC
CATAGTTGTT
ACTAACTCTC
CCAAGTGTGC
TTATATGAGC
GGAATTCAGA
GGCGATAGGC
GGTCCACATC
AATGAAAATC
TCACATTAGA
GACCTTATGT
CAGATGCAAG
CATTTACGAC
AGTGCCCAGC
AAAAGCCCCC
AGCGCGAACA
CCCCAATCTG
CCAAACCACC
CCCTCCCCCT
ACCCAACCGC
AACCTAAATC
ACAGAGACCT
CAACCCACCA
CTAGGCGACA
AGCGATTCCC
AGATCCACAG
AGACGTACAG
CTCACACCTT
AATGCGGTTA
ATCACCCGTC
TCGGTCAATG
CCTGGGAAGA
GGGAACTTCA
GAAAAGATGG
AGCACAGGCA
TACCCGCTGA
ATAGTAAGAA
GACGTGATCA
AATGCCCGAA
TCCGAAAGAC
CCAGGCGGCC
CATCCTCCTC
AACCGCATCC
CTTCCTCAAC
AGGCATCCGA
CATTATAAAA
ACGACTTCGA
CCTACAGTGA
GGAAGGATGA
TAGGGCCTCC
CAAAGCCCGA
CAGGGCTCAA
GGAGAAAGGT
ATCTGATACC
TTTCGGATAA
CAGTGGCCTT
TCATCGACAA
GGAGAAAGAA
GCCTGGTTTT
AAATGAGCAA
TGGATATCAA
TCCAGGCAGT
TAAATGATGA
AACGACCCCC
TCCACGGACC
CCAGCACAGA
GTGGGACCCC
CCACCACCCC
ACAAGAACTC
CTCCCTAGAC
AACTTAGGAG
CAAGTCGGCA
TGGCAGGCTG
ATGCTTTATG
AATCGGGCGA
AAAACTCCTC
TGAAAAACTG
CCTAACAACA
GCTCGATACC
CGGGTATTAC
CAACCTGCTG
TACAGAGCA
GAGTGAAGTC
TGCACTTGGT
GACTCTCCAT
TGAAGACCTT
TTTGCAGCCA
CCAAGGACTA
CTCACAATGA
AAGCGAGAGG
ACAGCCCCGA
CGAGGACCAA
CGGGAAAGAA
CACAACCGAA
AGATCCTCTC
CAAAGTGATT
TGGGACATCA
GTGCCCCAGG
TACATGTTTC
GCATTTGGGT
AAAGAGGCCA
GTGTTCTACA
GGGAGTGTCT
CCGCAGAGGT
ACCGTTCCTA
GTGACCCTTA
CTTCCTGAGG
TACTCTGCCG
GGGATAGGGG
GCACAACTCG
AATCGATTAC
TCAGTTCCTC
TTCAAAGTTC
CAGCCAGAAG
CCAGCCAGCA
CACAAGGCCA
CCCCCAAGGC
ACCCCCAGCA
CCGCACAAGC
TCCCCGGCAA
3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920
ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGCCCA
CGGCGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC CCCCGGTGCC
CACAGGCAGG
AATCCAAGAC
GAGGAAGCCC
GGGCCACCAG
ACCCCAGCCC
CGAAGGACCC
CTCCTCCTCT
CCCACCCCTA
GGTCTCAAGG
ACCGGTCAAA
AGCTACAAAG
ATAACTCTCC
ACAGTTTTGG
CAGAGTGTAG
GCCCTAGGCG
CTGAACTCTC
GAGGCAATCA
ATCAATAATG
CTCGGGCTCA
CGGGACCCCA
ATCAATAAGG
AGCGGAGGAA
AGTATAGCCT
GTCTCGTACA
CAAGGGTACC
GGGGGGGCCC
ACCCACCCCA
CTCCCAGACT
CGATCCGGCG
CCGAACCGCA
TCTCGAAGGG
AAGGAGACAC
TGAACGTCTC
TCCATTGGGG
TTATGACTCG
TCAATAACTG
AACCAATTAG
CTTCAAGTAG
TTGCCACAGC
AAGCCATCGA
GACAAGCAGG
AGCTGATACC
AATTGCTCAG
TATCTGCGGA
TGTTAGAAAA
TAAAGGCCCG
ATCCGACGCT
ACATAGGCTC
TTATCTCGAA
GACACCAACC
CCCCAAAAAA
CACACGACCA
CGGCCATCAC
GGGAGCCACC
AAGGACACCA
ACCAAAAGAT
CGGGAATCCC
TGCCATATTC
CAATCTCTCT
TTCCAGCCAT
CACGAGGGTA
AGATGCACTT
GAGACACAAG
TGCTCAGATA
CAATCTGAGA
GCAGGAGATG
GTCTATGAAC
ATACTATACA
GATATCTATC
GCTCGGATAC
GATAACTCAC
GTCCGAGATT
TCAAGAGTGG
TTTTGATGAG
CCCGAACAGA
AGGCCCCCAG
CGGCAACCAA
CCCGCAGAAA
CAACCCGAAC
GTATCCCACA
CAATCCACCA
AGAATCAAGA
ATGGCAGTAC
AAGATAGGGG
CAATCATTAG
GAGATTGCAG
AATGCAATGA
AGATTTGCGG
ACAGCCGGCA
GCGAGCCTGG
ATATTGGCTG
CAACTATCTT
GAAATCCTGT
CAGGCTTTGA
AGTGGAGGTG
GTCGACACAG
AAGGGGGTGA
TATACCACTG
TCATCGTGTA
CCCAGCACCC
GGGCCGACAG
ACCAGAACCC
GGAAAGGCCA
CAGCACCCAA
GCCTCTCCA
CACCCGACGA
CTCATCCAAT
TGTTAACTCT
TGGTAGGAAT
TCATAAAATT
AATACAGGAG
CCCAGAATAT
GAGTACTCCT
TTGCACTTCA
AAACTACTAA
TTCAGGGTGT
GTGATTTAAT
CATTATTTGG
GCTATGCGCT
ATTTACTGGG
AGTCCTACTT
TTGTCCACCG
TGCCCAAGTA
CTTTCATGCC
AACCATCGAC
CCAGCACCGC
AGACCACCCT
CAACCCGCGC
GAGCGATCCC
GTCCCCCGGT
CACTCAACTC
GTCCATCATG
CCAAACACCC
AGGAAGTGCA
AATGCCCAAT
ACTACTGAGA
AAGACCGGTT
GGCAGGTGCG
CCAGTCCATG
TCAGGCAATT
CCAAGACTAC
CGGCCAGAAG
CCCCAGTTTA
TGGAGGAGAC
CATCTTAGAG
CATTGTCCTC
GCTAGAGGGG
TGTTGCAACC
AGAGGGGACT
4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480
GTGTGCAGCC
TACACCAAGT
TCACAAGGGA
ACGATCATTA
GTAGTCGAGG
TACTTGCACA
AATCTGGGGA
CAGATATTGA
GTGTGTCTTG
AACAAAAAGG
ACATCAAAAT
CACAAGTCTC
TATCTCCGGC
ATCATCCACA
TCCCAAGGGA
TTTGCTGGCT
CATTAGACTT
TCTAGATGTA
AATCATCGGT
CATCTCTGAC
TTGGTGTATC
GGCTGCTGAA
CAATCAGTTC
ATTCTCAAAC
ATCTATAGTC
TAATCTGAGC
AAAATGCCTT
CCTGTGCTCG
ACCTAATAGC
ATCAAGACCC
TGAACGGCGT
GAATTGACCT
ATGCAATTGC
GGAGTATGAA
GAGGGTTGAT
GAGAACAAGT
CCTATGTAAG
CTCTTCGTCA
TTCCCTCTGG
ATGTCACCAC
AGTAGGATAG
GTTCTGTTTG
CATCGGGCAG
ACTAACTCAA
GATGAAGTGG
AAGATTAAAT
AACCCGCCAG
GAGCTCATGA
CTAGCTGTCT
ATGTCGCTGT
ACTATGACAT
AGCAAAAGGT
GTACCCGATG
TACACTCGTA
CAATTGTGCA
TGACAAGATC
GATCATCCAA
CGGTCCTCCC
TAAGTTGGAG
AGGTTTATCG
AGGGATCCCC
TGGTATGTCA
GTCGCTCTGA
TCAAGCAACC
CCGAACAATA
AACGAGACCG
TCATTAACAG
TCATGTTTCT
CCATCTACAC
TCGAGCATCA
GCCTGAGGAC
TCCTTAATCC
AGAGAATCAA
ATGCATTGGT
CAAAGGGAAA
CCCTGTTAGA
CCCAGGGAAT
CAGAGTTGTC
AGTCCTCTGC
TCCGGGTCTT
TCAATCCTTT
CTAACATACA
GTCGGGAGCA
ATATCATTGG
GATGCCAAGG
AGCACTAGCA
GCTTTAATAT
AGACCAGGCC
TCCTCTACAA
ACCGCACCCA
TCGGTAGTTA
GATAAATGCC
AGAACATCTT
GAGCTTGATC
CGCAGAGATC
GGTCAAGGAC
ACCTCAGAGA
GGATAGGGAG
ATTGGATTAT
GAACTCAACT
CTGCTCAGGG
CTTGTATTTA
GTATGGGGGA
ACAACTGAGC
TCCAAGAATG
TTGGGAACCG
GCAAGTGTTA
TTGCTGCCGA
GGAGGTATCC
AGAGGTTGGA
AATTGTTGGA
TAGTCTACAT
GTTGCTGCAG
TAAAGCCTGA
CTCTTGAAAC
GCATCAAGCC
ATTAAAACTT
TTCTACAAAG
ATGATTGATA
GGGTTGCTAG
CATAAAAGCC
GTGCTGACAC
TTCACTGACC
TACGACTTCA
GATCAATACT
CTACTGGAGA
CCCACTACAA
GGTCGAGGTT
ACTTACCTAG
ATGTACCGAG
CCTCCGGGGG
GTTCATTTTA
CACAACAGGA
TCACTGCCCG
AGACGCTGTG
CGTAGGGACA
GTCATCGGAC
CCTGATTGCA
GGGGCGTTGT
TCTTACGGGA
ACAAATGTCC
CACCTGAAAT
AGGGTGCAAG
ATAACCCCCA
GACCTTATGT
CCATTGCAGG
TCAGCACCAA
CACTCTTCAA
TAGTGAAATT
GAGATCTCAC
GTGCAGATGT
CCAGAACAAC
TCAGAGGTCA
ACAATGTGTC
TGGAAAAGCC
TGTTTGAAGT
6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040
AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATCTTGA 0060
GCAACCAGTC
AGCCCTTTGT
CAGCTTCCAG
CCCCTTATCA
TATCGCTGAC
AATGGAGACA
CGAGTGGGCA
GAGTCTGACA
CGGTTCAGGG
GCCAATGAAG
GGTTAGTCCC
AACATACCTA
ACCTGGTCAA
TGTGGTTTAT
GCCTATAAAG
CTGGTGCCGT
TGGGATGGTG
ATAGGGCTGC
GTGAAATAGA
CGCTATCTGT
ATAAGATAGT
CTACACTGTG
TAAACAATGT
CTCATATTCC
CGAGGAAGAT
AGTAATGATC
CACGGGGAAG
CTCGTCAAGC
ACGGATGATC
AATCAAGCAA
TGCTTCCAAC
CCATTGAAGG
GTTGAGCTTA
ATGGACCTAT
AACCTAGCCT
TACCTCTTCA
CCTGCGGAGG
GATCTCCAAT
TACGTTTACA
GGGGTCCCCA
CACTTCTGTG
GGCATGGGAG
TAGTGAACCA
CATCAGAATT
CAACCAGATC
AGCCATCCTG
TCAGAACATC
GGAAGTTGGG
ATATCCAAAT
CCGTGAACTC
TCAGCAACTG
ATTCTATCAC
TAGGTGTCTG
CAGTGATAGA
AATGGGCTGT
AGGCGTGTAA
ATAACAGGAT
AAATCAAAAT
ACAAATCCAA
TAGGTGTAAT
CTGTCCCAAT
TGGATGGTGA
ATGTTTTGGC
GCCCAAGCCG
TCGAATTACA
TGCTTGCGGA
TCAGCTGCAC
ATCACATGAT
AAGAAAAACG
TTATACCCTG
GAGTATGCTC
AAGCACCGCC
AATGTCATCA
TGTAATCAGG
CTCAAAAAGG
TATGGTGGCT
AATTCCCTAT
GAAATCCCCA
CAGGCTTTAC
CCCGACAACA
GGGTAAAATC
TCCTTCATAC
TGCTTCGGGA
CCACAACAAT
CAACACATTG
TAAGGAAGCA
TGTCAAACTC
AACCTACGAT
CTCATTTTCT
AGTGGAATGC
CTCAGAATCT
AGTCACCCGG
GTCACCCAGA
TAGGGTCCAA
AAGTTCACCT
GAGTCCCTCA
TAAAAAACGG
AGTCCAAGCT
ATTTATTTAA
GGAATTCGCT
TTGGGGGAGC
CAGGGATCAG
ACCGACATGC
CTCTCATCTC
CGAACAGATG
CAAGCACTCT
GGGGTCTTGT
TTCGGGCCAT
GTGTATTGGC
GAGTGGATAC
GGCGAAGACT
AGTTCCAATC
ACTTCCAGGG
TACTTTTATC
TTCACATGGG
GGTGGACATA
GAAGATGGAA
CATCAGGCAT
GTGGTTCCCC
AGATAGCCCG
CGCTTACAGC
ATTTTCCAAC
TAGGAGTTAT
CATAGAAGAC
GTACTCCAA
TCAAACTCGC
GGAAAGGTGT
AATCCTGGGT
ACAGAGGTGT
ACAAGTTGCG
GCGAGAATCC
CTGTTGATCT
TGATCACACA
TGACTATCCC
CGAGATTCAA
GCCATGCCCC
TGGTGATTCT
TTGAACATGC
CTTTTAGGTT
ACCAAAAACT
TCACTCACTC
CCAATCGCAG
ACCCACTAGT
GTTATGGACT
ATAGTTACCA
CTGGAGGACC
CAAATGATTA
CCGGCCCACT
AAAGAGTCAA
GTCAGTGATA
8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600
AGGTTTTCCA
AGGACATCAA
AGCCCTTTCT
CCCATACTTG
TGCTAATCTC
TGACATTTGA
CCGCTATGAC
AACTGATAGA
TGGAGCCTCT
CTTTCCTTAA
ATGAAGGTAC
TACATCTGAC
CAGTAACGGC
AGACTCTGAT
GGCACGGAGG
ATGCTCAAGC
TTGCTGGAGT
ACCTAAAGGA
AGTTCCTGCG
TTAATGATTC
TCCATGACCC
GTAGACTTTT
TAATCTCAAA
ATTTGACTAA
GTCACAGGGG
ATGCTTAAGG
GGAGAAAGTT
GTTTTGGTTT
CCATAGGAGG
TCGTGACCTT
ACTGGTTTTG
TATTGATGCT
TGGTTTCTTC
TTCACTTGCT
CCACTGCTTT
TTATCATGAG
AGGGGAGATT
TGCTGAAAAT
GAAAGGTCAT
CAGTTGGCCA
TTCAGGTGAA
GAAATTTGGC
CAAGGCACTT
TTACGACCCT
GAGCTTTGAC
TGAGTTCAAC
TGCTAAAATG
CGGGATTGGC
GGCACTCCAC
GGGGCCAGTC
GACACTAACT
ATTAACTTGG
ACAGTCAAGA
AGACACACAC
GTTGCTATAA
ATGTATTGTG
AGGTATACAG
CCTGCACTCG
TACCTGCAGC
ACTGAAATAC
TTAATTGAAG
TTCTCATTTT
GTTAGGAAAT
GCCATATTTT
CCGCTGACCC
GGGTTAACAC
TGCTTTATGC
GCTGCTCTCC
CCCAAGGGAA
CCATATGATG
CTGTCTTACA
ACTTACAAAA
AAATATTTTA
ACTCTAGCTG
TTAAAAACCT
CACGGCTTGG
GAGTTTACAT
CTGAGATGAG
CTGTATTCTT
TCAGTAAAGA
ATGTCATAGA
AGCTTCTAGG
GGAATCCAAC
TGAGGGATAT
ATGATGTTCT
CTCTAGATTA
TCAGAAGTTT
ACATGAATCA
GTGGAATCAT
TCCCCCTGCA
ATGAGCAGTG
CTCTTAGCCT
AAAGGGAATG
CCGGGTCACG
TGATAATGTA
GCCTGAAAGA
TGAGGGCATG
AGGACAATGG
TCTCAGGAGT
ACTCCCGAAG
CCTAGGCTCC
GCACAGCTCC
GTCAGTGATT
CACTGGTAGT
GTCTCAACAT
GGGGAGGTTA
AAGAGTCAGA
TTATCAAATT
AACAGTAGAA
TGACCAAAAC
CATTTTCATA
CGGCCACCCC
GCCTAAAGTC
AATCAACGGC
TGCTGCAGAC
CGTTGATAAC
GGATAGTGAT
GGATTCAGTT
GAGGCTTGTA
TGTTGTAAGT
AAAGGAGATC
CCAAGTGATT
GATGGCCAAG
CCCCAAAGAT
CCCAGTCCAC
GAATTGAGGG
CAGTGGTTTG
AAATCACAAA
TCAGTTGAGT
GTATATTACC
ATGACAGAGA
TACATGTGGA
GTAGCCATGC
CTCAGAGGTG
GGGTTTTCTG
ACTGATGACA
AGACTTGAAG
ATTGTGTATG
TATCGTGACA
ACAATCCGGA
TGGAAATCTT
CTGACAATGT
TACCCGAAAG
GATGTTTTCC
GGAGCTTACC
AAGGAAACAG
GCTGAAAATC
GATGAGCACG
CTCAAAGAAA
ACAAGTACCA
GGAACGTGAG AGCAGCAA GGGTTTATAG GGTTCCCTCA AGTAATTCGG CAGGACCAAG
ACACTGATCA
ATCTCAAGAA
TAAATGAGAT
CTGTCCTGTA
ATAAAGTCCC
GTCAGAAGCT
GAGTAAGGAT
TACCCAGCAC
ACTTTGTAAT
CAATTGTTTC
TGTCCCAATC
AAACAAGGGC
ATGACCGTTA
CTCTTGGCTT
ACAACGACCT
TGAATATGAG
ATCTCAAGAG
CACAACAACC
TTGTATGTGT
TCCATAGTCC
AGGGACTGGC
TCCTGGATCA
AAGGCTTGAT
TGTCCAATTA
GAAATGTCCT
ATATGTGGGC
TCCGGAGAAT
GTACTGCCTT
TTACGGATTG
TGTAAGTGAC
CAATGATCAA
GTGGACCATC
TGCTTCGTTA
ATGGCCCTAC
TCTTAGGCAA
ATCACATTTT
ACTCAAGAGC
AGCATGCAGT
CCTTGCATAT
CACAATCAAT
CTTAATAAGG
CAGGCTGTTT
AATGATTCTC
GGGGGACTCT
CCAGAGCATC
AAACCCAATG
GGCATTCCTC
TAGTGTCACA
TCGAGCCAGC
TGACTATGAA
CATTGACAAA
GAGGCTAGCT
ATGGAAGCTT
AATTGGAGAT
CCCTCATTTT
CCTCATTGCC
ATCTTCATTA
AGCACCATTC
GTGCAAGGGG
AACCTTAAGA
AGGCTACATG
TTTGTCTATT
ATCGCAAGAT
AATATTGCTA
TCCCTGAACG
TCAACCATGA
ATGGCACTGT
GTCAGAAACA
GCCTCACTAA
TCATTCCTAG
ACTAGACTCC
TTAAAAGGAT
ATGGACAGGC
GGGGCAAGAG
ATGAGGAAGG
CAATTCAGAG
GAGTCATGTT
CGAGGACGGC
ACGAGACAGT
ATGAGACCAT
TCCAGTGGCT
CCCCCGACCT
AGTACCCTAT
CCTATCTATA
ACAATCAGAC
AACGGGAAGC
ATATTGGCCA
CAAAAGGAAT
GTGTATTCTG
CAACAATGGC
TCCTAAAAGT
CCCGGGATGT
TGCCCGCTCC
TCGGTGATCC
TGCCTGAAGA
ACTGGGCTAG
TCAAGAACAT
TATTCCATGA
ATATTATAGT
AGTCTATTGC
GGGGGTTAAC
CAGGGATGGT
CAGTGCAGCT
CTATTTACGG
CAGTGCATTT
CAGCTTGTTT
GCATAAGAGG
TGACGCCCAT
GGGAGGTATA
CCTGGCTGCT
CATAGCCGTA
TGCTAGAGTA
TCACCTCAAG
ATATTATGAT
GTCAGAGACT
TAAAAGCATC
GATACAGCAA
AGTCATACCC
TATTGGGGGG
AGTAACATCA
GACCCTCCAT
CGACCCTTAC
AACTGCAAGG
TGACAGTAAA
ACCTAGGGCA
AGGCATGCTG
CTCTCGAGTG
GCTATTGACA
GGCGAGAGCT
CCTTGAGGTC
ATCACGACTG
GCACAGAGGC
CTTGAGACCT
ATCCCGTTAT
GAAGGGTATT
TATGAGAGCG
ACAAAAAGGG
ACTAGAGATT
GCAAATGAGA
GGGCTACTTG
ATAGTTGATG
GAGAGAGGTT
ATTCTGATCT
CTCCTCACAA
ATGAATTATC
TCAATTGCTG
CAAGTAATGA
TCAGCAAATC
TTTGTCCTGA
GAAGAGGACG
GCTCATGAAA
GATACCACAA
ATAACCAGAT
GGAAGAAAGC
CTAAGAAGCC
CCTGATGTAC
TAGAATCTAT
GATCAGTCAA
AGGAAACATC
TGAAGCTTGC
CAGTGTACTC
CTAGGCAAAG
CGACTAATTT
CCCTTGTCCG
CAGATAAGAA
TTTTAGAAAC
TTCACGTCGA
CCCGCAAGCT
CTTTAATTGA
AATTTGTTAC
CTATGATTGA
TAGGGGATGA
TCACTATCTA
GACCATCAGG
AAGGAGTGTT
GGCATTGTGG
CAACTGTGTG
AAGAGTTAGA
GATTCGACAA
GGACCTGCCC
ATATCAAGGC
TTGTAGACCA
GCGAGGCCAC
CTACGGATGG
ATCCTTGAGA
CTTCGTAAGA
ATGGGCTTAC
GGCCAATGTG
AGCGCATAGG
AGTGGCGAGG
GGTTGATACT
ATTGTTTCGA
AACAGATTGT
AGAGCTGAGG
CAGAGATGCA
ATGGTCCACA
CCTGGTAACA
CGATATCAAT
CTTGGGCCAG
GAAATATCAG
TAAGGTGCTT
TATTATAGAG
CAACATGGTT
AGAGTTCACA
CATCCAGGCA
ACCAATTCGA
AGAGGCTATG
TTACTCATGC
CTTATTCGGC
TTTTTTGTCC
GTCCCATATA
GCCCCAAGTC
GGTGATGATG
AGCCTGGAGG
TTGAGGGATC
TATACCACAA
AACTTTATAT
CTCGAGAAAG
TGCGTGATCC
GCAGAGCTAT
ACAAGGCTAT
CCCCAACTAT
AAATTTGAGA
AGTTTCATAA
TGTGCGGCCA
ATGGGTGAGC
GTCAATGCTC
CCTATCCATG
TACACATGCT
TTTCTCTTGT
AAACACTTAT
GGTCTAAGAC
TTATCTCCAG
TCTCTGACTT
GTCATGAGAC
CCTCGGGTTG
TTGGTTCTAC
GATCCTTGCG
ATAGCTCTTG
AGCTAAGGGT
GTAGCACTCA
TCTCCAACGA
ACCAACAAGG
ATACCGGATC
CGATGATAGA
GTACCAACCC
ACACCCAGAG
ATCACATTTT
AGGACCATAT
CTGAGTTTCT
TCAATTGGGC
TGTTGTCATC
TAAGCCACCC
GTCCTTCACT
ATATGACCTA
GTGAAAGCGA
GTGTTCTGGC
CGGTAGAGAA
CAGGATCTTC
ATCTCCGGCG
ATGTGTCATC
CCAACTGGAT
CACTGATGAG
ATCTGCTGTT
GAACGAAGCC
GATCACTCCC
AGTGAAATAC
CAATCTCTCA
AATGCTTCTA
ATCTAACACG
TCATCCCAGG
ATTGATATAT
CCATAGGAGG
AGCTAAGTCC
GAATGAAATT
GCTCATAGAG
ATTTGATGTA
GTTCCTTTCT
AAAGATCTAC
TGATGCTCAA
CCTCGACCTG
CGAGGATGTA
AGATTTGTAC
ATGTGCAGTT
GTGGAACATA
AGGATCGATC
TGCGAGTGTG
GATATTGACA
AGAACAGACA
AGAATAGCAA
TGGTTGTTGG
ATCTCAACTT
TCAGGTACAT
TTTGTCATAT
GGGTTGGGTG
GTATTACATC
ATACCCAGCT
GATAATGCAC
CACCTTGTGG
ACAGCACTAT
TCAGCTCTCA
CCAAGATTAT
CATTATCATA
AGAATGAGCA
AAGAAATTCT
AACTTGCACA
TTGTTGAATG
GTACCGGACA
TGTCAACCAG
CTAACCGACC
AATCCAATTA
AAACAGATAA
GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC
CAAAGATCGG
ATGATGTTGC
GGGGCAATCT
CTTGCTACAA
ACGGCTTGTT
AACTAAACAA
AATTAGCACC
TTGTCAAAGT
TCAATTTCAT
AGACCTTGCC
TGGCTCTGCT
GGGATTTTGT
TATACCCTAG
AGGCTAACCG
GGACTTCACC
CAATTGTGGG
CTATAGAGCA
AATTGATCCA
TCTACAGGGA
CTTACCCCGT
TCTGGGGGCA
ATCTCAAGTC
CCAAGTCAGA
TAACAGTCAA
CAGCAACAAC
AAAATTGCTC
CGCCAATTAT
AGCTGTTGAG
CTTGGGTGAG
GTGCTTCTAT
CTATCCCTCC
GCTCTTTAAC
AGTTAGTAAT
TGACAAAGAT
CCTGGGCAAA
TCAGGGATTT
ATACAGCAAC
GCTAATGAAT
TGGACTTATA
AGACGCAGTT
GGTGCTGATC
CCATGATGTT
GTTGGCAAGA
ATTGGTAAGT
CATTCTTCTT
CGGCTATCTG
GAAACAGATT
GGAGACCAAA
ATCTCAAATA
AAAGATATCA
GAAATCCATG
ATATCAACAT
GGATCGGGTT
AATAGTGGGG
GAAGTTGGCC
GGGAGGCCCG
ATCCCTACCT
ACTATAGAGA
ATAGGATCAA
ATAAGTTATG
TTCATCTCTA
CCTGAAAAGA
GGTCACATCC
AGTAGAGGTG
AATTGCGGGT
GCCTCAGGGC
TTCAAAGACA
AGCAGGCAAC
TACTCCGGGA
ATACTAGACT
ATTATGACGG
GAATGGTATA
TGAGCATCAA
ACACAAGCAA
CTTTCCGCAG
TAATTAGGAG
CTATGTTGAT
TTTCCGCCAA
TTGTCGAACA
AAGTCACGTG
CTAGTGTGGG
AGCTAGAGGA
TACTGGTGAT
TAGGGTCTCA
CTGAATCTTA
TTAAGCAGCA
TATCCATTAA
ATATCAATCC
TGGCAATTAA
AAGATGGATT
ACCAAAGAAG
GAGAACTTAT
ACAGAAAGTT
TACACCAGAA
GGGGTTTGA
AGTTAGTCGG
GGCTTTCAGA
GCACAATCTT
AATCGGGTTG
ATGCCTTGAG
CACTTATAAG
TTCTAGATCT
CAGAATGGGA
GGTAGGCAGT
GTTTATCCAT
ATTGGCAGCC
TAAGCTTATG
TTATAGAGAA
TTTGGTTATG
GATAATTGAA
GCAACTAAGC
TACTCTGAA
CGGACCTAAG
GCTTAATTCT
TCAACAAGGG
ATCTAGGATC
GATAAATAAG
TATCTTCGTT
ACGTGAGTGG
ATACAGTGCC
CCCCCACACG
CCCATTTCAG
AACTCATCTG
CCAGGGGAGG
GAGATACTTA
GGTCAAAGGG
GTAGGTAATA
GTAGATTGCT
TCAGATATAG
ATCTTATCGA
CCTTTCAGCG
GTGAACCTTG
ACAGATCTCA
TCATCTGTGA
TGCATACAAG
AAACTTACAC
CTGTGCAAAG
ATACTCATCC
ATGTTCCACG
ACCCGCAAAT
TTTATCCAGA
AAGAATCTAT
GTTTTTAAGG
CTGATTAAGG
ACTAATTGGT TGAACTCCGG AACCCTAATC CTGCCCTAGG TGGTTAGGCA TTATTTGCAA
TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894
INFORMATION FOR SEQ ID
SEQUENCE CHARACTERISTICS:
LENGTH: 2183 amino acids
TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID
Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15
Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30
Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45
Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60
Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80
Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95
Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110
Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu
115 120 125
Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp
130 135 140
Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160
Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg
165 170 175
Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190
Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205
Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220
Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240
Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly
245 250 255
Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270
Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu
275 280 285
Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300
Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320
Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr
325 330 335
Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350
Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu
355 360 365
Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380
Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400
Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415
Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430
His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe
435 440 445
Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu
450 455 460
Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480
Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg
485 490 495
Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp
500 505 510
Val Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525
Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540
Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560
Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly
565 570 575
Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala
580 585 590
Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro
595 600 605
Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn
610 615 620
Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Val Ile Arg Gln
625 630 635 640
Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val
645 650 655
Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670
Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685
Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700
Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile
705 710 715 720
Pro Leu Tyr Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735
Gly Pro Leu Ser 785 Arg His Ser Ser Arg 865 Arg Ile Thr Arg Met 945 Ile Thr Asp Gly Tyr Val 770 Thr Asp Leu Lys Ile 850 Ala Gly Gln Arg Met 930 Ser Ala Leu Trp Ile Leu 755 Gln Trp Tyr Lys Gly 835 Ala Ala Tyr Gln Asp 915 Ala Arg Asp His Ala 995 Glu 740 Tyr Gly Pro Phe Ala 820 Ile Arg Cys Asp Ile 900 Val Leu Leu Leu Gln 980 Ser Gly Leu Asp Tyr Val 805 Asn Tyr Cys Ser Arg 885 Leu Val Leu Phe Lys 965 Val Asp Cys Ala Gln 775 Leu Leu Thr Asp Phe 855 Ile Leu Ser Pro Ala 935 Arg Met Thr Tyr Gln Tyr 760 Thr Lys Arg Ile Gly 840 Trp Ala Ala Leu Leu 920 Pro Asn Ile Gln Ser Lys 745 Glu Ile Lys Gln Val 825 Leu Ser Thr Tyr Gly 905 Leu Ile Ile Leu Gln 985 Ala Trp Gly Val Glu 795 Leu Ser Val Thr Met 875 Leu Thr Asn Gly Asp 955 Ser Gly Leu Ile Arg 765 Lys Ala Asp Phe Gln 845 Val Lys Val Asn Asp 925 Asn Val Met Ser Cys Ser 750 Ile Arg Arg Ile Phe 830 Ser Asp Ser Leu Ser 910 Leu Tyr Thr Pro Ser 990 Val Thr Ala Val Val Gly 815 Val Leu Glu Ile Lys 895 Thr Leu Leu Ser Glu 975 Phe Ile Ser Pro Thr 800 His Tyr Lys Thr Glu 880 Val Met Ile Asn Ser 960 Glu Leu 1000 1005
Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020
Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu
1025 1030 1035 1040
Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055
Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070
Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085
Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100
Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120
Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135
Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg
1140 1145 1150
Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165
His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180
Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200
Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215
Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser
1220 1225 1230
Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245
Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg
1250 1255 1260
Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280
Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln
1285 1290 1295
Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr
1300 1305 1310
Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325
Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340
Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val
1345 1350 1355 1360
Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375
His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390
Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405
Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420
Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440
Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455
Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470
Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485
Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500
Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520
Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro
1525 1530 1535
Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550
Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565
Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu
1570 1575 1580
Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val
1585 1590 1595 1600
Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615
Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630
Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645
Met Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660
Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680
Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695
Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710
Met Ser Ile Lys Ala Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725
Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740
Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760
Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775
Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly
1780 1785 1790
Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805
Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820
Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val
1825 1830 1835 1840
Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855
Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr
1860 1865 1870
Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asp Lys
1875 1880 1885
Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900
Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920
Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His
1925 1930 1935
Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950
Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met
1955 1960 1965
Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980
Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys
1985 1990 1995 2000
Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro
2005 2010 2015
Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly
2020 2025 2030
Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp
2035 2040 2045
Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr
2050 2055 2060
Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met
2065 2070 2075 2080
Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile
2085 2090 2095
Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly
2100 2105 2110
Asn Arg Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr
2115 2120 2125
Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys
2130 2135 2140
Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val
2145 2150 2155 2160
Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly
2165 2170 2175
Tyr Ser Ala Leu Ile Lys Asp
2180
INFORMATION FOR SEQ ID NO:11:
SEQUENCE CHARACTERISTICS:
LENGTH: 15894 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: RNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
ACCAAACAAA GTTGGGTAAG GATAGTTCAA TCAATGATCA TCTTCTAGTG CACTTAGGAT
TCAAGATCCT
TAAGGAGCTT
GTGGAGCCAT
TTACCACTCG
GCGGGCCCAA
GTCAATTGAT
TCCAGAGTGA
ATGAGGCGGA
GATGGTTCGG
TGATTCTGGG
CAGACACGGC
ATTATCAGGG
AGCATTGTTC
CAGAGGAATC
ATCCAGACTT
ACTAACAGGG
TCAGAGGATC
CCAGTCACAA
CCAATACTTT
GAACAAGGAA
TACCATCCTA
AGCTGATTCG
ACAAGAGCAG
AAAAGAAACA
AAACACATTA
CTGGACCGGT
GCACTAATAG
ACCGATGACC
TCTGGCCTTA
TCACATGATG
ATCTCAGATA
GCCCAAATTT
GAGCTAAGAA
GATTAGGGAT ATCCGAGATG
AGGACAAACC
TTATAGTACC
TGGTGAGGTT
GTATATTATC
CTGACGTTAG
CCTTCGCATC
ATCCAATTAG
TTGAAGTGCA
GGGTCTTGCT
GGTGGATAAA
ACCCATTACA
AATCCCTGGA
AATTGGAAAC
CTTATTTGTG
CATAAGGCTG
AAGAGGTACC
TAGTGATCAA
AGACCCTGAG
CGCAAAGGCG
GTACACCCAA
GCCACACTTT
TCAGGATCCG
GATTCCTCAA
CCGGATGTGA
GAGTCTCCAG
TTAGAGGTTG
AACATGGAGG
TCCAGGTTCG
GGATTCAACA
GTTACGGCCC
CAAAGAAGGG
TAGTTGGTGA
AGGACCTCTC
GAAACAAACC
GATTAGCCAG
GACTGCATGA
AAATGGGGGA
GTGCAGGATC
ACTCCATGGG
GGCAAGAGAT
GTATCACTC
AGATCAGTAG
GTGAGAATGA
GAGAAGCCAG
CCCATCTTCC
CGCAGGACAG
CGGAAGAACA
ACTAGGTGCG
AAAACTTAGG
GAGCCAATGG
CTCAAGGCCG
ATATCAGACA
GGTCTCAGCA
CGCGGTCAGG
AATCTCCAGG
GCGGTTAAGG
ATTTAGATTG
CTTACGCCGA
CAGGATTGCT
TTTTATCCTG
ATTTGCTGGT
AACTGCACCC
ATACCCTCTG
AGGTTTGAAC
GGTAAGGAGG
CGAGGATGCA
AGCGGTTGGA
GCTACCGAGA
GGAGAGCTAC
AACCGGCACA
TCGAAGGTCA
AGGCTCAGAC
AGAGGCCGAG
AACCAGGTCC
CAGAAGAGCA
AGCCCATCGG
ACCCAGGACA
AACCATGCCT
GACCTGGAGA
CATCAAGCAC
GAATCCAAGA
GAGAGAAAAT
TTCATGGTCG
GAAATGATAT
ACTATTAAGT
GAGTTATCCA
TACATGGTAA
CTCTGGAGCT
TTTGGCCGAT
TCAGCTGGAA
AGGCTTGTTT
CCCAGACAAG
TTGGGGGGCA
AGAGACCG
CCCCTAGACA
GCTGACGCCC
ACGGACACCC
GGCCAGAACA
ACACAGCCGC
GGCACGCCAT
CTCACTGGCC
GGAGCGAGCC
CTCAGCAATT
GAGCGATGAC
TGGGTTACAG
TGCTGACTCT
GGTTGGATGT
CTCTAATCCT
GTGACATTGA
TTGGGATAGA
CACTTGAGTC
TCCTGGAGAA
ATGCCATGGG
CTTACTTTGA
AGGTCAGTTC
CAGAGATTGC
CCCAAGTATC
AGGAAGATAG
GGCCCAGCAG
TTGACACTGC
TGCTTAGGCT
CTATAGTGTA
ACATCCGCCT
CAGCCCATCA
GTCAAAAACG
ATCGAGGAAG
ACCTGCAGGG
GGATCAACTG
GACGCTGAAA
TGTTATTACG
ATCATGGTTC
GGTGAGGAAC
GGATATCAAG
TACATATATC
AACTATGTAT
CTTGATGAAC
CTCAATTCAG
AGTAGGAGTG
TCCAGCATAT
CACATTGGCA
AATGCATACT
ATTTCTACAC
GAGGGTCAAA
AGCAAGTGAT
AACGGAGTCC
GCAAGCCATG
CAATGACAGA
ACCATCCATC
ACCATCCACT
GACTGGAATG
CTATGGCAGC
AAGAGAAGGC
AAGGCGGTGC
CTTTGGGAAT
TTTATGATCA
AATCAGGCCT
AGGATTGCCG
AGAACACCCG
GTAGAGGCAG
CCTGCTCTTG
CTTTACCAGC
AACAAGTTCA
GAACTTGAAA
TTTAGATTAG
TCTGAACTCG
ACTGAGGACA
GGTGATCAAA
CAGAGTCGAG
GCGAGAGCTG
AGCCAAGATC
GCAGGAATCT
AATCTTCTAG
ATTGTTATAA
CCCACGATTG
CATCCGGGCT
ATGGTCAGAA
AGGCAGTTCG
ACCTCGCATC
CCCCCCAAGA
CAGCGGTGAA
TGATGGTGAT
AGCACCCTCT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT
GATACCGAGG
GCTTCTGATG
AGAGGCAACA
GGTAGGGCCA
TTTGGAACGG
CCCTCGGAAC
GCCGCACTGA
AATAATGAAG
AAAACAGCCT
CTGCTGTTAT
AGCATATCCA
AAGGATCCCA
GGCAGAGATT
CTCCAAGGAA
CTAAAGCCGA
GCATCACGCA
CGTTACCTGA
CAGATGCTGA
CCAGTCGACC
GCCTCCCAAG
AAGGGTCGAT
TCAGAGTCAT
TGCTGGGGGT
TCCTGCCCTT
CTGAGCTTGA
ACAACACCCC
GATATGCTAT
TTGAAACTGC
ACTTTCCGAA
GCACTTCCGG
AGATCGCGTC
CATCAGGGCC
TACAGGAGTG
AAGGGGGAGA
TGGCCAAAAT
TGAAGGGAGA
CCCTGGAAGG
ACGACCCCAC
CAGGCCGAGC
TGACAAATGG
TCGGGAAAAA
GTGTAATCCG
TGACTCTCCT
TGAAGATAAT
CAACTAGTAC
GTCCACAATG
CGCTCCGATA
AGATCCTGGT
TGTTGAGGAC
AGGTGTTGGC
CATAGTTGTT
ACTAACTCTC
CACTGACCGG
AGAAGGAGGG
GCTTGGGAAA
GACACCCATT
TTTATTGACA
AGGTGCACCT
GACACCCGAA
CTATTATGAT
ACACGAGGAT
AGTTGAGTCA
ACACCTCTCA
TGCAGATGTC
ACTGGCCGAA
ACGGACCAGT
GATGAGCTCA
CTCCATTATA
TGATGATATC
AATGAAGTAG
AACCTAAATC
ACAGAGACCT
CAACCCACCA
CTAGGCGACA
AGCGATTCCC
AGATCCACAG
AGACGTACAG
CTCACACCTT
GGATCTGCTC
GAGATCCACG
ACTCTCAATG
AAAAAGGGCA
GGTGGTGCAA
GCGGGGAATG
TCTGGTACCA
GATGAGCTGT
AATCAGAAGA
ATTAAGAAGC
AGCATCATGA
GAAATCAATC
GTTCTCAAGA
TCCAGAGGAC
GCCGTCGGGT
AAATCCAGCC
AAAGGAGCCA
CTACAGCTCA
CATTATAAAA
ACGACTTCGA
CCTACAGTGA
GGAAGGATGA
TAGGGCCTCC
CAAAGCCCGA
CAGGGCTCAA
GGAGAAAGGT
CCATCTCTAT
AGCTCCTGAG
TTCCTCCGCC
CAGACGCGAG
CCCAATGTGC
TCCCCGAGTG
CAATCTCCCC
TCTCTGATGT
TAATCTCCAA
AGATCAACAG
TCGCCATTCC
CCGACTTGAA
KACCCGTTGC
AGCTGCTGAA
TTGTTCCTGA
GGCTAGAGGA
ATGATCTTGC
ACTTACCTGC
AACTTAGGAG
CAAGTCGGCA
TGGCAGGCTG
ATGCTTTATG
AATCGGGCGA
AAAACTCCTC
TGAAAAACTG
CCTAACAACA
GGGGTTCAGG
ACTCCAATCC
CCCGGACCCC
ATTAGCCTCA
TCGAAAGTCA
TGTGAGCAAT
GAGATCCCAG
CCAAGATATT
GCTAGAATCA
GCAAAATATC
TGGACTTGGG
ACCCATCATA
CAGCCGACAA
GGAATTTCAG
CACCGGCCCT
GGATCGGAAG
CAAGTTCCAC
CAACCCCATG
CAAAGTGATT
TGGGACATCA
GTGCCCCAGG
TACATGTTTC
GCATTTGGGT
AAAGAGGCCA
GTGTTCTACA
GGGAGTGTCT
TCAACGCAAA
TCCGTGTTGT
GAAGAATGCT
GGATTGACAA
CAACATTTAT
ATTATTGCAA
GCACCAGTCT
GGTTCAAGAA
TCTGGAGGAG
AAGAATTCCG
TGTAGACCGT
GCCCGGACAA
GCCGACGGCA
CCACCAGCCA
TGCCCCCGAT
ATTGGAAGGC
GACCGAGGTG
ACTAAACAAA
CGGTGCCGCG
CCCCGGTGCC
AATCCAAGAC
GAGGAAGCCC
GGGCCACCAG
ACCCCAGCCC
CGAAGGACCC
CTCCTCCTCT
CCAAGTGTGC
TTATATGAGC
GGAATTCAGA
GGCGATAGGC
GGTCCACATC
AATGAAAATC
TCACATTAGA
GACCTTATGT
CAGATGCAAG
CATTTACGAC
AGTGCCCAGC
AAAAGCCCCC
AGCGCGAACA
CCCCAATCTG
CCAAACCACC
CCCTCCCCCT
ACCCAACCGC
ACTTAGGGCC
CCCCCAACCC
CACAGGCAGG
GGGGGGGCCC
ACCCACCCCA
CTCCCAGACT
CGATCCGGCG
CCGAACCGCA
TCTCGAAGGG
AATGCGGTTA
ATCACCCGTC
TCGGTCAATG
CCTGGGAAGA
GGGAACTTCA
GAAAAGATGG
AGCACAGGCA
TACCCGCTGA
ATAGTAAGAA
GACGTGATCA
AATGCCCGAA
TCCGAAAGAC
CCAGGCGGCC
CATCCTCCTC
AACCGCATCC
CTTCCTCAAC
AGGCATCCGA
AAGGAACATA
CCGACAACCA
GACACCAACC
CCCCAAAAAA
CACACGACCA
CGGCCATCAC
GGGAGCCACC
AAGGACATCA
ACCAAAAGAT
ATCTGATACC
TTTCGGATAA
CAGTGGCCTT
TCATCGACAA
GGAGAAAGAA
GCCTGGTTTT
AAATGAGCAA
TGGATATCAA
TCCAGGCAGT
TAAATGATGA
AACGACCCCC
TCCACGGACC
CCAGCACAGA
GTGGGACCCC
CCACCACCCC
ACAAGAACTC
CTCCCTAGAC
CACACCCAAC
GAGGGAGCCC
CCCGAACAGA
AGGCCCCCAG
CGGCAACCAA
CCCGCAGAAA
CAACCCGAAC
GTATCCCACA
CAATCCACCA
GCTCGATACC
CGGGTATTAC
CAACCTGCTG
TACAGAGCAA
GAGTGAAGTC
TGCACTTGGT
GACTCTCCAT
TGAAGACCTT
TTTGCAGCCA
CCAAGGACTA
CTCACAATGA
AAGCGAGAGG
ACAGCCCTGA
CGAGGACCAA
CGGGAAAGAA
CACAACCGAA
AGATCCTCTC
AGAACCCAGA
CCAACCAATC
CCCAGCACCC
GGGCCGACAG
ACCAGAACCC
GGAAAGGCCA
CAGCACCCAA
GCCTCTCCAA
CACCCGACGA
CCGCAGAGGT
ACCGTTCCTA
GTGACCCTTA
CTTCCTGAGG
TACTCTGCCG
GGGATAGGGG
GCACAACTCG
AATCGATTAC
TCAGTTCCTC
TTCAAAGTTC
CAGCCAGAAG
CCAGCCAGCA
CACAAGGCCA
CCCCCAAGGC
ACCCCCAGCA
CCGCACAAGC
TCCCCGGCAA
CCCCGGTCCA
CCGCCGGCTC
AACCATCGAC
CCAGCACCGC
AGACCACCCT
CAACCCGCGC
GAGCGATCCC
GTCCCCCGGT
CACTCAACTC
CCCACCCCTA
GGTCTCAAGG
ACCGGTCAAA
AGCTACAAAG
ATAACTCTCC
ACAGTTTTGG
CAGAGTGTAG
GCCCTAGGCG
CTGAACTCTC
GAGACAATCA
ATCAATAATG
CTCGGGCTCA
CGGGACCCCA
ATCAATAAGG
AGCGGAGGAA
AGTATAGCCT
GTCTCGTACA
CAAGGGTACC
GTGTGCAGCC
TACACCAAGT
TCACAAGGGA
ACGATCATTA
GTAGTCGAGG
TACTTGCACA
AATCTGGGGA
CAGATATTGA
AAGGAGACAC
TGAACGTCTC
TCCATTGGGG
TTATGACTCG
TCAATAACTG
AACCAATTAG
CTTCAAGTAG
TTGCCACAGC
AAGCCATCGA
GACAAGCAGG
AGCTGATACC
AATTGCTCAG
TATCTGCGGA
TGTTAGAAAA
TAAAGGCCCG
ATCCGACGCT
ACATAGGCTC
TTATCTCGAA
AAAATGCCTT
CCTGTGCTCG
ACCTAATAGC
ATCAAGACCC
TGAACGGCGT
GAATTGACCT
ATGCAATTGC
GGAGTATGAA
CGGGAATCCC
TGCCATATTC
CAATCTCTCT
TTCCAGCCAT
CACGAGGGTA
AGATGCACTT
GAGACACAAG
TGCTCAGATA
CAATCTGAGA
GCAGGAGATG
GTCTATGAAC
ATACTATACA
GATATCTATC
GCTCGGATAC
GATAACTCAC
GTCCGAGATT
TCAAGAGTGG
TTTTGATGAG
GTACCCGATG
TACACTCGTA
CAATTGTGCA
TGACAAGATC
GACCATCCAA
CGGTCCTCCC
TAAGTTGGAG
AGGTTTATCG
AGAATCAAGA
ATGGCAGTAC
AAGATAGGGG
CAATCATTAG
GAGATTGCAG
AATGCAATGA
AGATTTGCGG
ACAGCCGGCA
GCGAGCCTGG
ATATTGGCTG
CAACTATCTT
GAAATCCTGT
CAGGCTTTGA
AGTGGAGGTG
GTCGACACAG
AAGGGGGTGA
TATACCACTG
TCATCGTGTA
AGTCCTCTGC
TCCGGGTCTT
TCAATCCTTT
CTAACATACA
GTCGGGAGCA
ATATCATTGG
GATGCCAAGG
AGCACTAGCA
CTCATCCAAT
TGTTAACTCT
TGGTAGGAAT
TCATAAAATT
AATACAGGAG
CCCAGAATAT
GAGTAGTCCT
TTGCACTTCA
AAACTACTAA
TTCAGGGTGT
GTGATTTAAT
CATTATTTGG
GCTATGCGCT
ATTTACTGGG
AGTCCTACTT
TTGTCCACCG
TGCCCAAGTA
CTTTCATGCC
TCCAAGAATG
TTGGGAACCG
GCAAGTGTTA
TTGCTGCCGA
GGAGGTATCC
AGAGGTTGGA
AATTGTTGGA
TAGTCTACAT
GTCCATCATG
CCAAACACCC
AGGAAGTGCA
AATGCCCAAT
ACTACTGAGA
AAGACCGGTT
GGCAGGTGCG
CCAGTCCATG
TCAGGCAATT
CCAAGACTAC
CGGCCAGAAG
CCCCAGTTTA
TGGAGGAGAC
CATCTTAGAG
CATTGTCCTC
GCTAGAGGGG
TGTTGCAACC
AGAGGGGACT
CCTCCGGGGG
GTTCATTTTA
CACAACAGGA
TCACTGCCCG
AGACGCTGTG
CGTAGGGACA
GTCATCGGAC
CCTGATTGCA
GTGTGTCTTG
AACAAAAAGG
ACATCAAAAT
CACAAGTCTC
TATCTCCGGC
ATCATCCACA
TCCCAAGGGA
TTTGCTGGCT
CATTAGACTT
TCTAGATGTA
AATCATCGGT
AATCTCTGAC
TTGGTGTATC
GGCTGCTGAA
CAATCAGTTC
ATTCTCAAAC
ATCTATAGTC
TAATCTGAGC
AGGTGTTATC
GCAACCAGTC
AGCCCTTTGT
CAGCTTCCAG
CCCCTTATCA
TATCGCTGAC
AATGGAGACA
CGAGTGGGCA
GAGGGTTGAT
GAGAACAAGT
CCTATGTAAG
CTCTTCGTCA
TTCCCTCTGG
ATGTCACCAC
AGTAGGATAG
GTTCTGTTTG
CATCGGGCAG
ACTAACTCAA
GATGAAGTGG
AAGATTAAAT
AACCCGCCAG
GAGCTCATGA
CTAGCTGTCT
ATGTCGCTGT
ACTATGACAT
AGCAAAAGGT
AGAAATCCGG
AGTAATGATC
CACGGGGAAG
CTCGTCAAGC
ACGGATGATC
AATCAAGCAA
TGCTTCCAC
CCATTGAAGG
AGGGATCCCC
TGGTATGTCA
GTCGCTCTGA
TCAAGCAACC
CCGAACAATA
AACGAGACCG
TCATTAACAG
TCATGTTTCT
CCATCTACAC
TCGAGCATCA
GCCTGAGGAC
TCCTTAATCC
AGAGAATCAA
ATGCATTGGT
CAAAGGGAAA
CCCTGTTAGA
CCCAGGGAAT
CAGAGTTGTC
GTTTGGGGGC
TCAGCAACTG
ATTCTATCAC
TAGGTGTCTG
CAGTGATAGA
AATGGGCTGT
AGGCGTGTAA
ATAACAGGAT
GCTTTAATAT
AGACCAGGCC
TCCTCTACAA
ACCGCACCCA
TCGGTAGTTA
GATAAATGCC
AGAACATCTT
GAGCTTGATC
CGCAGAGATC
GGTCAAGGAC
ACCTCAGAGA
GGATAGGGAG
ATTGGATTAT
GAACTCAACT
CTGCTCAGGG
CTTGTATTTA
GTATGGGGGA
ACAACTGAGC
TCCGGTGTTC
TATGGTGGCT
AATTCCCTAT
GAAATCCCCA
CAGGCTTTAC
CCCGACAAA
GGGTAAAATC
TCCTTCATAC
GTTGCTGCAG
TAAAGCCTGA
CTCTTGAAAC
GCATCAAGCC
ATCAAAACTT
TTCTACAAAG
ATGATTGATA
GGGTTGCTAG
CATAAAAGCC
GTGCTGACAC
TTCACTGACC
TACGACTTCA
GATCAATACT
CTACTGGAGA
CCCACTACAA
GGTCGAGGTT
ACTTACCTAG
ATGTACCGAG
CATATGACAA
TTGGGGGAGC
CAGGGATCAG
ACCGACATGC
CTCTCATCTC
CGAACAGATG
CAAGCACTCT
GGGGTCTTGT
GGGGCGTTGT
TCTTACGGGA
ACAAI-TGTCC
CACCTGAAAT
AGGGTGCAAG
ATAACCCCCA
GACCTTATGT
CCATTGCAGG
TCAGCACCAA
CACTCTTCAA
TAGTGAAATT
GAGATCTCAC
GTGCAGATGT
CCAGAACAAC
TCAGAGGTCA
ACAATGTGTC
TGGAAAAGCC
TGTTTGAAGT
ACTATCTTGA
TCAAACTCGC
GGAAAGGTGT
AATCCTGGGT
ACAGAGGTGT
ACAAGTTGCG
GCGAGAATCC
CTGTTGATCT
GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA
CGGTTCAGGG
GCCAATGAAG
GGTTAGTCCC
AACATACCTA
ACCTGGTCAA
TGTGGTTTAT
GCCTATAAAG
CTGGTGCCGT
TGGGATGGTG
ATAGGGCTGC
GTGAAATAGA
CGCTATCTGT
ATAAGATAGT
CTACACTGTG
TAAACAATGT
CTCATATTCC
CGAGGAAGAT
AGGTTTTCCA
AGGACATCAA
AGCCCTTTCT
CCCATACTTG
TGCTAATCTC
TGACATTTGA
CCGCTATGAC
AACTGATAGA
ATGGACCTAT
AACCTAGCCT
TACCTCTTCA
CCTGCGGAGG
GATCTCCAAT
TACGTTTACA
GGGGTCCCCA
CACTTCTGTG
GGCATGGGAG
TAGTGAACCA
CATCAGAATT
CAACCAGATC
AGCCATCCTG
TCAGAACATC
GGAAGTTGGG
ATATCCAAAT
CCGTGAACTC
ATGCTTAAGG
GGAGAAAGTT
GTTTTGGTTT
CCATAGGAGG
TCGTGACCTT
ACTGGTTTTG
TATTGATGCT
TGGTTTCTTC
ACAAATCCAA
TAGGTGTAAT
CTGTCCCAAT
TGGATGGTGA
ATGTTTTGGC
GCCCAAGCCG
TCGAATTACA
TGCTTGCGGA
TCAGCTGCAC
ATCACATGAT
AAGAAAAACG
TTATACCCTG
GAGTATGCTC
AAGCACCGCC
AATGTCATCA
TGTAATCAGG
CTCAAAAAGG
GACACTAACT
ATTAACTTGG
ACAGTCAAGA
AGACACACAC
GTTGCTATAA
ATGTATTGTG
AGGTATACAG
CCTGCACTCG
CCACAACAAT
CAACACATTG
TAAGGAAGCA
TGTCAAACTC
AACCTACGAT
CTCATTTTCT
AGTGGAATGC
CTCAGAATCT
AGTCACCCGG
GTCACCCAGA
TAGGGTCCAA
AAGTTCACCT
GAGTCCCTCA
TAAAAAACGG
AGTCCAAGCT
ATTTATTTAA
GGAATTCGCT
CACGGCTTGG
GAGTTTACAT
CTGAGATGAG
CTGTATTCTT
TCAGTAAAGA
ATGTCATAGA
AGCTTCTAGG
GGAATCCAAC
GTGTATTGGC
GAGTGGATAC
GGCGAAGACT
AGTTCCAATC
ACTTCCAGGG
TACTTTTATC
TTCACATGGG
GGTGGACATA
GAAGATGGAA
CATCAGGCAT
GTGGTTCCCC
AGATAGCCCG
CGCTTACAGC
ATTTTCCAAC
TAGGAGTTAT
CATAGAAGAC
GTACTCCAAA
CCTAGGCTCC
GCACAGCTCC
GTCAGTGATT
CACTGGTAGT
GTCTCAACAT
GGGGAGGTTA
AAGAGTCAGA
TTATCAAATT
TGACTATCCC
CGAGATTCAA
GCCATGCCCC
TGGTGATTCT
TTGAACATGC
CTTTTAGGTT
ACCAAAAACT
TCACTCACTC
CCAATCGCAG
ACCCACTAGT
GTTATGGACT
ATAGTTACCA
CTGGAGGACC
CAAATGATTA
CCGGCCCACT
AAAGAGTCAA
GTCAGTGATA
GAATTGAGGG
CAGTGGTTTG
AAATCACAAA
TCAGTTGAGT
GTATATTACC
ATGACAGAGA
TACATGTGGA
GTAGCCATGC
TGGAGCCTCT
CTTTCCTTAA
ATGAAGGTAC
TACATCTGAC
CAGTAACGGC
AGACTCTGAT
GGCACGGAGG
ATGCTCAAGC
TTGCTGGAGT
ACCTAAAGGA
AGTTCCTGCG
TTAATGATTC
TCCATGACCC
GTAGACTTTT
TAATCTCAAA
ATTTGACTAA
GTCACAGGGG
GGAACGTGAG
ACACTGATCA
ATCTCAAGAA
TAAATGAGAT
CTGTCCTGTA
ATAAAGTCCC
GTCAGAAGCT
GAGTAAGGAT
TTCACTTGCT
CCACTGCTTT
TTATCATGAG
AGGGGAGATT
TGCTGAAAAT
GAAAGGTCAT
CAGTTGGCCA
TTCAGGTGAA
GAAATTTGGC
CAAGGCACTT
TTACGACCCT
GAGCTTTGAC
TGAGTTCAAC
TGCTAAAATG
CGGGATTGGC
GGCACTCCAC
GGGGCCAGTC
AGCAGCAPAkA
TCCGGAGAAT
GTACTGCCTT
TTACGGATTG
TGTAAGTGAC
CAATGATCAA
GTGGACCATC
TGCTTCGTTA
TACCTGCAGC
ACTGAAATAC
TTAACTGAAG
TTCTCATTTT
GTTAGGAAAT
GCCATATTTT
CCGCTGACCC
GGGTTAACAC
TGCTTTATGC
GCTGCTCTCC
CCCAAGGGAA
CCATATGATG
CTGTCTTACA
ACTTACAAAA
AAATATTTTA
ACTCTAGCTG
TTAAAAACCT
GGGTTTATAG
ATGGAAGCTT
AATTGGAGAT
CCCTCATTTT
CCTCATTGCC
ATCTTCATTA
AGCACCATTC
GTGCAAGGGG
TGAGGGATAT
ATGATGTTCT
CTCTAGATTA
TCAGAAGTTT
ACATGAATCA
GTGGAATCAT
TCCCCCTGCA
ATGAGCAGTG
CTCTTAGCCT
AAAGGGAATG
CCGGGTCACG
TGATAATGTA
GCCTGAAAGA
TGAGGGCATG
AGGACAATGG
TCTCAGGAGT
ACTCCCGAAG
GGTTCCCTCA
ACGAGACAGT
ATGAGACCAT
TCCAGTGGCT
CCCCCGACCT
AGTACCCTAT
CCTATCTATA
ACAATCAGAC
AACAGTAGAA
TGACCAAAAC
CATTTTCATA
CGGCCACCCC
GCCTAAAGTC
AATCAACGGC
TGCTGCAGAC
CGTTGATAAC
GGATAGTGAT
GGATTCAGTT
GAGGCTTGTA
TGTTGTAAGT
AAAGGAGATC
CCAAGTGATT
GATGGCCAAG
CCCCAAAGAT
CCCAGTCCAC
AGTAATTCGG
CAGTGCATTT
CAGCTTGTTT
GCATAAGAGG
TGACGCCCAT
GGGAGGTATA
CCTGGCTGCT
CATAGCCGTA
CTCAGAGGTG
GGGTTTTCTG
ACTGATGACA
AGACTTGAAG
ATTGTGTATG
TATCGTGACA
ACAATCCGGA
TGGAAATCTT
CTGACAATGT
TACCCGAAAG
GATGTTTTCC
GGAGCTTACC
AAGGAAACAG
GCTGAAAATC
GATGAGCACG
CTCAAAGAAA
ACAAGTACCA
CAGGACCAAG
ATCACGACTG
GCACAGAGGC
CTTGAGACCT
ATCCCGTTAT
GAAGGGTATT
TATGAGAGCG
ACAAAAAGGG
TACCCAGCAC ATGGCCCTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT
ACTTTGTAAT TCTTAGGCAA AGGCTACATG ATATTGGCCA TCACCTCAAG
GCAAATGAGA
CAATTGTTTC
TGTCCCAATC
AAACAAGGGC
ATGACCGTTA
CTCTTGGCTT
ACAACGACCT
TGAATATGAG
ATCTCAAGAG
CACAACAACC
TTGTATGTGT
TCCATAGTCC
AGGGACTGGC
TCCTGGATCA
AAGGCTTGAT
TGTCCAATTA
GAAATGTCCT
ATATGTGGGC
TAGAATCTAT
GATCAGTCAA
AGGAAACATC
TGAAGCTTGC
CAGTGTACTC
CTAGGCAAAG
CGACTAATTT
ACTCAAGAGC
AGCATGCACT
CCTTGCATAT
CACAATCAAT
CTTAATAAGG
CAGGCTGTTT
AATGATTCTC
GGGGGACTCT
CCAGAGCATC
AAACCCAATG
GGCATTCCTC
TAGTGTCACA
TCGAGCCAGC
TGACTATGAA
CATTGACAAA
GAGGCTAGCT
GCGAGGCCAC
CTACGGATGG
ATCCTTGAGA
CTTCGTAAGA
ATGGGCTTAC
GGCCAATGTG
AGCGCATAGG
ATCGCAAGAT
AATATTGCTA
TCCCTGAACG
TCAACCATGA
ATGGCACTGT
GTCAGAAACA
GCCTCACTAA
TCATTCCTAG
ACTAGACTCC
TTAAAAGGAT
ATGGACAGGC
GGGGCAAGAG
ATGAGGAAGG
CAATTCAGAG
GAGTCATGTT
CGAGGACGGC
CTTATTCGGC
TTTTTTGTCC
GTCCCATATA
GCCCCAAGTC
GGTGATGATG
AGCCTGGAGG
TTGAGGGATC
ATCACATTTT
TTTGTCTATT
CAAAAGGAAT
GTGTATTCTG
CAACAATGGC
TCCTAAAAGT
CCCGGGATGT
TGCCCGCTCC
TCGGTGATCC
TGCCTGAAGA
ACTGGGCTAG
TCAAGAACAT
TATTCCATGA
ATATTATAGT
AGTCTATTGC
GGGGGTTAAC
CAGGGATGGT
CAGTGCAGCT
CTATTTACGG
GTCATGAGAC
CCTCGGGTTG
TTGGTTCTAC
GATCCTTGCG
ATAGCTCTTG
AGCTAAGGGT
GTAGCACTCA
ATATTATGAT
GTCAGAGACT
TAAAAGCATC
GATACAGCA
AGTCATACCC
TATTGGGGGG
AGTAACATCA
GACCCTCCAT
CGACCCTTAC
AACTGCAAGG
TGACAGTAAA
ACCTAGGGCA
AGGCATGCTG
CTCTCGAGTG
GCTATTGACA
GGCGAGAGCT
CCTTGAGGTC
ATGTCTCATC
CCAACTGGAT
CACTGATGAG
ATCTGCTGTT
GAACGAAGCC
GATCACTCCC
AGTGAAATAC
GGGCTACTTG
ATAGTTGATG
GAGAGAGGTT
ATTCTGATCT
CTCCTCACAA
ATGAATTATC
TCAATTGCTG
CAAGTAATGA
TCAGCAAATC
TTTGTCCTGA
GAAGAGGACG
GCTCATGAAA
GATACCACAA
ATAACCAGAT
GGAAGAAAGA
CTAAGAAGCC
CCTGATGTAC
TGCGAGTGTG
GATATTGACA
AGAACAGACA
AGAATAGCAA
TGGTTGTTGG
ATCTCAACTT
TCAGGTACAT
11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 CCCTTGTCCG AGTGGCGAGG TATACCACAA TCTCCAACGA CAATCTCTCA
TTTGTCATAT
CAGATAAGAA
TTTTAGAAAC
TTCACGTCGA
CCCGCAAGCT
CTTTAATTGA
AATTTGTTAC
CTATGATTGA
TAGGGGATGA
TCACTATCTA
GACCATCAGG
AAGGAGTGTT
GGCATTGTGG
CAACTGTGTG
AAGAGTTAGA
GATTCGACAA
GGACCTGCCC
ATATCAAGGC
TTGTAGACCA
GATTGAGAGT
CAAAGATCGG
ATGATGTTGC
GGGGCAATCT
CTTGCTACAA
ACGGCTTGTT
AACTAAACAA
AATTAGCACC
GGTTGATACT
ATTGTTTCGA
AACAGATTGT
AGAGCTGAGG
CAGAGATGCA
ATGGTCCACA
CCTGGTAACA
CGATATCAAT
CTTGGGCCAG
GAAATATCAG
TAAGGTGCTT
TATTATAGAG
CAACATGGTT
AGAGTTCACA
CATCCAGGCA
ACCAATTCGA
AGAGGCTATG
TTACTCATGC
TGATCCAGGA
CAGCAACAAC
AAAATTGCTC
CGCCAATTAT
AGCTGTTGAG
CTTGGGTGAG
GTGCTTCTAT
CTATCCCTCC
AACTTTATAT
CTCGAGAAAG
TGCGTGATCC
GCAGAGCTAT
ACAAGGCTAT
CCCCAACTAT
AAATTTGAGA
AGTTTCATAA
TGTGCGGCCA
ATGGGTGAGC
GTCAATGCTC
CCTATCCATG
TACACATGCT
TTTCTCTTGT
AAACACTTAT
GGTCTAAGAC
TTATCTCCAG
TCTCTGACTT
TTCATTTTCG
ATCTCAAATA
AAAGATATCA
GAAATCCATG
ATATCAACAT
GGATCGGGTT
AATAGTGGGG
GAAGTTGGCC
ACCAACAAGG
ATACCGGATC
CGATGATAGA
GTACCAACCC
ACACCCAGAG
ATCACATTTT
AGGACCATAT
CTGAGTTTCT
TCAATTGGGC
TGTTGTCATC
TAAGCCACCC
GTCCTTCACT
ATATGACCTA
GTGAAAGCGA
GTGTTCTGGC
CGGTAGAGAA
CAGGATCTTC
ATCTCCGGCG
ACGCCCTCGC
TGAGCATCA
ACACAAGCA
CTTTCCGCAG
TAATTAGGAG
CTATGTTGAT
TTTCCGCCAA
TTGTCGAACA
AATGCTTCTA
ATCTAACACG
TCATCCCAGG
ATTGATATAT
CCATAGGAGG
AGCTAAGTCC
GAATGAAATT
GCTCATAGAG
ATTTGATGTA
GTTCCTTTCT
AAAGATCTAC
TGATGCTCAA
CCTCGACCTG
CGAGGATGTA
AGATTTGTAC
ATGTGCAGTT
GTGGAACATA
AGGATCGATC
TGAGGTAAAT
GGCTTTCAGA
GCACAATCTT
AATCGGGTTG
ATGCCTTGAG
CACTTATAIA
TTCTAGATCT
CAGAATGGGA
GGGTTGGGTG
GTATTACATC
ATACCCAGCT
GATAATGCAC
CACCTTGTGG
ACAGCACTAT
TCAGCTCTCA
CCAAGATTAT
CATTATCATA
AGAATGAGCA
AAGAAATTCT
AACTTGCACA
TTGTTGAATG
GTACCGGACA
TGTCAACCAG
CTAACCGACC
AATCCAATTA
AAACAGATAA
GTCAGTCAGC
CCCCCACACG
CCCATTTCAG
AACTCATCTG
CCAGGGGAGG
GAGATACTTA
GGTCAAAGGG
GTAGGTAATA
13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760
TTGTCAAAGT
TCAATTTCAT
AGACCTTGCC
TGGCTCTGCT
GGGATTTTGT
TATACCCTAG
AGGCTAACCG
GGACTTCACC
CAATTGTGGG
CTATAGAGCA
AATTGATCCA
TCTACAGGGA
CTTACCCCGT
TCTGGGGGCA
ATCTCAAGTC
CCAAGTCAGA
TAACAGTCAA
ACTAATTGGT
GCTCTTTAAC
AGTTAGTAAT
TGACAAAGAT
CCTGGGCAAA
TCAGGGATTT
ATACAGCAAC
GCTAATGAAT
TGGACTTATA
AGACGCAGTT
GGTGCTGATC
CCATGATGTT
GTTGGCAAGA
ATTGGTAAGT
CATTCTTCTT
CGGCTATCTG
GAAACAGATT
GGAGACCAAA
TGAACTCCGG
GGGAGGCCCG
ATCCCTACCT
ACTATAGAGA
ATAGGATCAA
ATAAGTTATG
TTCATCTCTA
CCTGAAAAGA
GGTCACATCC
AGTAGAGGTG
AATTGCGGGT
GCCTCAGGGC
TTCAAAGACA
AGCAGGCAAC
TACTCCGGGA
ATACTAGACT
ATTATGACGG
GAATGGTATA
AACCCTAATC
AAGTCACGTG
CTAGTGTGGG
AGCTAGAGGA
TACTGGTGAT
TAGGGTCTCA
CTGAATCTTA
TTAAGCAGCA
TATCCATTAA
ATATCAATCC
TGGCAATTAA
AAGATGGATT
ACCAAAGAAG
GAGAACTTAT
ACAAAAAGTT
TACACCAGAA
GGGGTTTGAA
AGTTAGTCGG
CTGCCCTAGG
TTTCTATTCC
GGTAGGCAGT
GTTTATCCAT
ATTGGCAGCC
TAAGCTTATG
TTATAGAGAA
TTTGGTTATG
GATAATTGAA
GCAACTAAGC
TACTCTGAAA
CGGACCTAAG
GCTTAATTCT
TCAACAAGGG
ATCTAGGATC
GATAAATAAG
TATCTTCGTT
ACGTGAGTGG
ATACAGTGCC
TGGTTAGGCA
CAGCTTTGTC
GTAGATTGCT
TCAGATATAG
ATCTTATCGA
CCTTTCAGCG
GTGAACCTTG
ACAGATCTA
TCATCTGTGA
TGCATACAAG
AAACTTACAC
CTGTGCAAAG
ATACTCATCC
ATGTTCCACG
ACCCGCAAAT
TTTATCCAGA
AAGAATCTAT
GTTTTTAAGG
CTGATTAAGG
TTATTTGCAA
TGGT
14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15894 TATATTAAAG AAAACTTTGA AAATACGAAG
INFORMATION FOR SEQ ID NO:12:
SEQUENCE CHARACTERISTICS:
LENGTH: 2183 amino acids
TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu 1 5 10 15
Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala 20 25 30
Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn 35 40 45
Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn 50 55 60
Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro 65 70 75 80
Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn 85 90 95
Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys 100 105 110
Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu 115 120 125
Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp 130 135 140
Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln 145 150 155 160
Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg 165 170 175
Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr 180 185 190
Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp 195 200 205
Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr 210 215 220
Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met 225 230 235 240
Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly 245 250 255
Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu 260 265 270
Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu 275 280 285
Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe 290 295 300
Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly 305 310 315 320
Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Thr Glu Ala Leu Asp Tyr 325 330 335
Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe 340 345 350
Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu 355 360 365
Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr 370 375 380
Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr 385 390 395 400
Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His 405 410 415
Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr 420 425 430
His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe 435 440 445
Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu 450 455 460
Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr 465 470 475 480
Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg 485 490 495
Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp 500 505 510
Val Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe 515 520 525
Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg 530 535 540
Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala 545 550 555 560
Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly 565 570 575
Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala 580 585 590
Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro 595 600 605
Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn 610 615 620
Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Val Ile Arg Gln 625 630 635 640
Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val 645 650 655
Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg 660 665 670
Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly 675 680 685
Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val 690 695 700
Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile 705 710 715 720
Pro Leu Tyr Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met 725 730 735
Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile 740 745 750
Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser 755 760 765
Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro 770 775 780
Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr 785 790 795 800
Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His 805 810 815
His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr 820 825 830
Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys 835 840 845
Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr 850 855 860
Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu 865 870 875 880
Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val 885 890 895
Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met 900 905 910
Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile 915 920 925
Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn 930 935 940
Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser 945 950 955 960
Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu 965 970 975
Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu 980 985 990
Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser 995 1000 1005
Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His 1010 1015 1020
Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu 1025 1030 1035 1040
Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val 1045 1050 1055
Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg 1060 1065 1070
Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala 1075 1080 1085
Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser 1090 1095 1100
Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly 1105 1110 1115 1120
Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu 1125 1130 1135
Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg 1140 1145 1150
Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly 1155 1160 1165
His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser 1170 1175 1180
Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp 1185 1190 1195 1200
Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr 1205 1210 1215
Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser 1220 1225 1230
Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala 1235 1240 1245
Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg 1250 1255 1260
Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile 1265 1270 1275 1280
Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln 1285 1290 1295
Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr 1300 1305 1310
Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp 1315 1320 1325
Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu 1330 1335 1340
Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val 1345 1350 1355 1360
Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp 1365 1370 1375
His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu 1380 1385 1390
Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp 1395 1400 1405
Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe 1410 1415 1420
Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr 1425 1430 1435 1440
Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met 1445 1450 1455
Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile 1460 1465 1470
Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly 1475 1480 1485
Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro 1490 1495 1500
Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg 1505 1510 1515 1520
Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro 1525 1530 1535
Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His 1540 1545 1550
Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met 1555 1560 1565
Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu 1570 1575 1580
Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val 1585 1590 1595 1600
Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala 1605 1610 1615
Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg 1620 1625 1630
Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala 1635 1640 1645
Met Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val 1650 1655 1660
Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys 1665 1670 1675 1680
Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala 1685 1690 1695
Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn 1700 1705 1710
Met Ser Ile Lys Ala Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu 1715 1720 1725
Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly 1730 1735 1740
Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn 1745 1750 1755 1760
Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg 1765 1770 1775
Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly 1780 1785 1790
Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe 1795 1800 1805
Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu 1810 1815 1820
Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val 1825 1830 1835 1840
Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp 1845 1850 1855
Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr 1860 1865 1870
Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asp Lys 1875 1880 1885
Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala 1890 1895 1900
Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro 1905 1910 1915 1920
Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His 1925 1930 1935
Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser 1940 1945 1950
Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met 1955 1960 1965
Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr 1970 1975 1980
Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys 1985 1990 1995 2000
Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro 2005 2010 2015
Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly 2020 2025 2030
Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp 2035 2040 2045
Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr 2050 2055 2060
Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met 2065 2070 2075 2080
Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile 2085 2090 2095
Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly 2100 2105 2110
Asn Lys Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr 2115 2120 2125
Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys 2130 2135 2140
Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val 2145 2150 2155 2160
Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly 2165 2170 2175
Tyr Ser Ala Leu Ile Lys Asp 2180
INFORMATION FOR SEQ ID NO:13:
SEQUENCE CHARACTERISTICS:
LENGTH: 15894 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: RNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
ACCAAACAAA GTTGGGTAAG GATAGTTCAA TCAATGATCA TTTTCTAGTG
CACTTAGGAT
TCAAGATCCT
TAAGGAGCTT
GTGGAGCCAT
TTACCACTCG
GCGGGCCCAA
GTCAATTGAT
TCCAGAGTGA
ATGAGGCGGA
GATGGTTCGA
TGATTCTGGG
CAGACACGGC
TAGTTGGTGA
AGGACCTCTC
GAAACAAACC
GATTAGCCAG
GACTGCATGA
AAATGGGGGA
GTGCAGGATC
ACTCCATGGG
GGCAAGAGAT
ATTATCAGGG
AGCATTGTTC
CAGAGGAATC
ATCCAGACTT
ACTAACAGGG
TCAGAGGATC
CCAGTCACAA
CCAATACTTT
GAACAAGGz.A
TACCATCCTA
AGCTGATTCG
ATTTAGATTG
CTTACGCCGA
CAGGATTGCT
TTTTATCCTG
ATTTGCTGGT
AACTGCACCC
ATACCCTCTG
AGGTTTGAAC
GGTAAGGAGG
ACAAGAGCAG
AAAAGAAACA
AAACACATTA
CTGGACCGGT
GCACTAATAG
ACCGATGACC
TCTGGCCTTA
TCACATGATG
ATCTCAGATA
GCTCAAATTT
GAGCTAAGAA
GAGAGAAAAT
TTCATGGTCG
GAAATGATAT
ACTATTAAGT
GAGTTATCCA
TACATGGTAA
CTCTGGAGCT
TTTGGCCGAT
TCAGCTGGAA
GATTAAGGAT
AGGACAAACC
TTATAGTACC
TGGTCAGGTT
GTATATTATC
CTGACGTTAG
CCTTCGCATC
ATCCAATTAG
TTGAAGTGCA
GGGTCTTGCT
GGTGGATAAA
GGTTGGATGT
CTCTAATCCT
GTGACATTGA
TTGGGATAGA
CACTTGAGTC
TCCTGGAGA
ATGCCATGGG
CTTACTTTGA
AGGTCAGTTC
ATCCGAGATG
ACCCATTACA
AATCCCTGGA
AATTGGAAAC
CTTATTTGTG
CATAAGGCTG
AAGAGGTACC
TAGTGATCAA
AGACCCTGAG
CGCAAAGGCG
GTACACCCAA
GGTGAGGAAC
GGATATCAAG
TACATATATC
AACTATGTAT
CTTGATGAAC
CTCAATTCAG
AGTAGGAGTG
TCCAGCATAT
CACATTGGCA
GCCACACTTT
TCAGGATCCG
GATTCCTCAA
CCGGATGTGA
GAGTCTCCAG
TTAGAGGTTG
AACATGGAGG
TCCAGGTTCG
GGATTCAACA
GTTACGGCCC
CAAAGAAGGG
AGGATTGCCG
AGAACACCCG
GTAGAGGCAG
CCTGCTCTTG
CTTTACCAGC
AACAAGTTCA
GAACTTGAAA
TTTAGATTAG
TCTGAACTCG
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260
GTATCACTGC
AGATCAGTAG
GTGAGAATGA
GAGAAGCCAG
CCCATCTTCC
CGCAGGACAG
CGGAAGAACA
ACTAGGTGCG
AAAACTTAGG
GAGCCGATGG
CTCAAGGCCG
ATATCAGACA
GGTCTCAGCA
CGCGGTCAGG
AATCTCCAGG
GCGGTTAAGG
AGCACCCTCT
GATACCGAGG
GCTTCTGATG
AGAGGCAACA
GGTAGGGCCA
TTTGGAACGG
CCCTCGGAAC
GCCGCACTGA
AATAATGAAG
AAAACAGCCT
CGAGGATGCA
AGCGGTTGGA
GCTACCGAGA
GGAGAGCTAC
AACCGGCACA
TCGAAGGTCA
AGGCTCAGAC
AGAGGCCGAG
AACCAGGTCC
CAGAAGAGCA
AGCCCATCGG
ACCCAGGACA
AACCATGCCT
GACCTGGAGA
CATCAAGCAC
GAATCCAAGA
CAGGAGGAGA
GATATGCTAT
TTGAAACTGC
ACTTTCCGAA
GCACTTCCGG
AGATCGCGTC
CATCAGGGCC
TACAGGAGTG
AAGGGGGAGA
TGGCCAAAAT
AGGCTTGTTT
CCCAGACAAG
TTGGGGGGCA
AGAGAAACCG
CCCCTAGACA
GCTGACGCCC
ACGGACACCC
GGCCAGAACA
ACACAGCCGC
GGCACGCCAT
CTCACTGGCC
GGAGCGAGCC
CTCAGCAATT
GAGCGATGAC
TGGGTTACAG
TGCTGACTCT
CAATGAATCT
CACTGACCGG
AGAAGGAGGG
GCTTGGGAAA
GACACCCATT
TTTATTGACA
AGGTGCACCT
GACACCCGAA
CTATTATGAT
ACACGAGGAT
CAGAGATTGC
CCCAAGTATC
AGGAAGATAG
GGCCCAGCAG
TTGACACTGC
TGCTTAGGCT
CTATAGTGTA
ACATCCGCCT
CAGCCCATCA
GTCAAAAACG
ATCGAGGAAG
ACCTGCAGGG
GGATCAACTG
GACGCTGAAA
TGTTATTATG
ATCATGGTTC
GAAAACAGCG
GGATCTGCTC
GAGATCCACG
ACTCTCAATG
AAAAAGGGCA
GGTGGTGCAA
GCGGGGAATG
TCTGGTACCA
GATGAGCTGT
AATCAGAAGA
AATGCATACT
ATTTCTACAC
GAGGGTCAAA
AGCAAGTGAT
ATCGGAGTCC
GCAAGCCATG
CAATGACAGA
ACCCTCCATC
ACCATCCACT
GACTGGAATG
CTATGGCAGC
AAGAGAAGGC
AAGGCGGTGC
CTTTGGGAAT
TTTATGATCA
AATCAGGCCT
ATGTGGATAT
CCATCTCTAT
AGCTCCTGAG
TTCCTCCGCC
CAGACGCGAG
CCCAATGTGC
TCCCCGAGTG
CAATCTCCCC
TCTCTGATGT
T&ATCTCCAA
ACTGAGGACA
GGTGATCAAA
CAGAGTCGAG
GCGAGAGCTG
AGCCAAGATC
GCAGGAATCT
AATCTTCTAG
ATTGTTATAA
CCCACGATTG
CATCCGGGCT
ATGGTCAGAA
AGGCAGTTCG
ACCTCGCATC
CCCCCCAAGA
CAGCGGTGAA
TGATGGTGAT
TGGCOAACCT
GGGGTTCAGG
ACTCCAATCC
TCCGGACCCC
ATTAGCCTCA
TCGAAAGTCA
TGTGAGCAAT
GAGATCCCAG
CCAAGATATT
GCTAGAATCA
1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820
CTGCTGTTAT
AGCATATCCA
AAGGATCCCA
GGCAGAGATT
CTCCAAGGAA
CTAAAGCCGA
GCATCACGCA
CGTTACCTGA
CAGATGCTGA
CCAGTCGACC
GCCTCCCAAG
AAGGGTTGAT
TCAGAGTCAT
TGCTGGGGGT
CCCTGCCCTT
CTGAGCTTGA
ACAACACCCC
TCAACGCAAA
TCCGTGTTGT
GAAGAATGCT
GGATTGACA
CAACATTTAT
ATTATTGCAA
GCACCAGTCT
GGTTCAAGAA
TGAAGGGAGA
CCCTGGAAGG
ACGACCCCAC
CAGGCCGAGC
TGACAAATGG
TCGGGAAAAA
GTGTAATCCG
TGACTCTCCT
TGAAGATAAT
CAACTAGTAC
TTCCACAATG
CGCTCCGATA
AGATCCTGGT
TGTTGAGGAC
AGGTGTTGGC
CATAGTTGTT
ACTAACTCTC
CCAAGTGTGC
TTATATGAGC
GGAATTCAGA
GGCGATAGGC
GGTCCACATC
AATGAAAATC
TCACATTAGA
GACCTTATGT
AGTTGAGTCA
ACACCTCTCA
TGCAGATGTC
ACTGGCCGAA
ACGGACCAGT
GATGAGCTCA
CTCCATTATA
TGATGATATC
AATGAAGTAG
AACCTAAATC
ACAGAGATCT
CAACCCACCA
CTAGGCGACA
AGCGATCCCC
AAATCCACAG
AGACGTACAG
CTCACACCTT
AGTGCGGTTA
ATCACCCGTC
TCGGTCAATG
CCTGGGAAGA
GGGAACTTCA
GAAAAGATGG
AGCACAGGCA
TACCCGCTGA
ATTAAGAAGC AGATCAACAG AGCATCATGA TCGCCATTCC GAAATCAATC CCGACTTGAA GTTCTCAAGA AACCCGTTGC TCCAGAGGAC AGCTGCTGAA GCCGTCGGGT TTGTTCCTGA AAATCCAGCC GGCTAGAGGA AAAGGAGCCA ATGATCTTGC CTACAGCTCA ACTTACCTGC CATTATAAAA AACTTAGGAG ACGACTTCGA CAAGTCGGCA CCTACAGTGA TGGCAGGCTG
GGAAGGATGA
TAGGGCCTCC
CAAAGCCCGA
CAGGGCTCAA
GGAGAAAGGT
ATCTGATACC
TTTCGGATAA
CAGTGGCCTT
TCATCGACAA
GGAGAAAGIA
GCCTGGTTTT
AAATGAGCAA
TAGATATCIA
ATGCTTTATG
AATCGGGCGA
AAAACTCCTC
TGAAAAACTG
CCTAACAACA
GCTCGATACC
CGGGTATTAC
CAACCTGCTG
TACAGAGCAA
GAGTGAAGTC
TGCACTTGGT
GACTCTCCAT
TGAAGACCTT
GCAAAATATC
TGGACTTGGG
ACCCATCATA
CAGCCGACAA
GGAATTTCAG
CACCGGCCCT
GGATCGGAAG
CAAGTTCCAC
CAACCCCATG
CAAAGTGATT
TGGGACATCA
GTGCCCCAGG
TACATGTTTC
GCATTTGGGT
AAAGAGGCCA
GTGTTCTACA
GGGAGTGTCT
CCGCAGAGGT
ACCGTTCCTA
GTGACCCTTA
CTTCCTGAGG
TACTCTGCCG
GGGATAGGGG
GCACAACTCG
AATCGATTAC
2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCACT TTTGCAGCCA TCAGTTCCTC
AAGAATTCCG
TGTAGACCGT
GCCCGGACAA
GCCGACGGCA
CCACCAGCCA
TGCCCCCGAT
ATTGGAAGGC
GACCGAGGTG
ACTAAACAAA
CGGCGCCGCG
CCCCGGTGCC
AATCCAAGAC
GAGGAAGCCC
GGGCCACCAG
ACCCCAGCCC
CGAAGGACCC
CTCCTCCCCT
CCCACCCCTA
GGTCTCAAGG
ACCGGTCAAA
AGCTACAAAG
ATAACTCTCC
ACAGTTTTGG
CAGAGTGTAG
GCCCTAGGCG
CTGAACTCTC
CATTTACGAC
AGTGCCCAGC
AAAAGCCCCC
AGCGCGAACA
CCCCAATCTG
CCAAACCACC
CCCTCCCCCT
ACCCAACCGC
ACTTAGGGCC
CCCCCAACCC
CACAGGCAGG
GGGGGGGCCC
ACCCACCCCA
CTCCCAGACT
CGATCCGGCG
CCGAACCGCA
TCTCGAAGGG
AAGGAGACAC
TGAACGTCTC
TCCATTGGGG
TTATGACTCG
TCAATAACTG
AACCAATTAG
CTTCAAGTAG
TTGCCACAGC
AAGCCATCGA
GACGTGATCA
AATGCCCGAA
TCCGAAAGAC
CCAGGCGGCC
CATCCTCCTC
AACCGCATCC
CTTCCTCAAC
AGGCATCCGA
AAGGAACATA
CCGACAACCA
GACACCAACC
CCCCAAAAAA
CACACGACCA
CGGCCATCAC
GGGAGCCACC
AAGGACATCA
ACCAAAAGAT
CGGGAATCCC
TGCCATATTC
CAATCTCTCT
TTCCAGCCAT
CACGAGGGTA
AGATGCACTT
GAGACACAAG
TGCTCAGATA
CAATCTGAGA
TAAATGATGA
AACGACCCCC
TCCACGGACC
CCAGCACAGA
GTGGGACCCC
CCACCACCCC
ACAAGAACTC
CTCCCTAGAC
CACACCCAAC
GAGGGAGCCC
CCCGAACAGA
AGGCCCCCAG
CGGCAACCAA
CCCGCAGAAA
CAACCCGAAC
GTATCCCACA
CAATCCACCA
AGAATCAAGA
ATGGCAGTAC
AAGATAGGGG
CAATCATTAG
GAGATTGCAG
AATGCAATGA
CCAAGGACTA
CTCACAATGA
AAGCGAGAGG
ACAGCCCTGA
CGAGGACCAA
CGGGAAAGAA
CACAACCGAA
AGATCCTCTC
AGAACCCAGA
CCAACCAATC
CCCAGCACCC
GGGCCGACAG
ACCAGAACCC
GGAAAGGCCA
CAGCACCCAA
GCCTCTCCAA
CACCCGACGA
CTCATCCAAT
TGTTAACTCT
TGGTAGGAAT
TCATAAAATT
AATACAGGAG
CCCAGAATAT
TTCAAAGTTC
CAGCCAGAAG
CCAGCCAGCA
TACAAGGCCA
CCCCCAAGGC
ACCCCCAGCA
CCGCACAAGC
TCCCCGGCAA
CCCCGGCCCA
CCGCCGGCTC
AACCATCGAC
CCAGCACCGC
AGACCACCCT
CAACCCGCGC
GAGCGATCCC
GTCCCCCGGT
CACTCAACTC
GTCCATCATG
CCAAACACCC
AGGAAGTGCA
AATGCCCAAT
ACTACTGAGA
AAGACCGGTT
GGCAGdTGCG
CCAGTCCATG
TCAGGCAATT
4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 AGATTTGCGG GAGTAGTCCT ACAGCCGGCA TTGCACTTCA GCGAGCCTGG AAACTACTAA GAGGCAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC
ATCAATAATG
CTCGGGCTCA
CGGGACCCCA
ATCAATAAGG
AGCAGAGGAA
AGTATAGCCT
GTCTCGTACA
CAAGGGTACC
GTGTGCAGCC
TCCACCAAGT
TCACAAGGGA
ACGATCATTA
GTAGTCGAGG
TACTTGCACA
AATCTGGGGA
CAGATATTGA
GTGTGTCTTG
AACAAAAAGG
ACATCAAAAT
CACAAGTCTC
TATCTCCGGC
ATCATCCACA
TCCCAAGGGA
TTTGCTGGCT
CATTAGACTT
AGCTGATACC
AATTGCTCAG
TATCTGCGGA
TGTTAGAAIA
TAAAGGCCCG
ATCCGACGCT
ACATAGGCTC
TTATCTCGAA
AAAATGCCTT
CCTGTGCTCG
ACCTAATAGC
ATCAAGACCC
TGAACGGCGT
GAATTGACCT
ATGCAATTGC
GGAGTATGAA
GAGGGTTGAT
GAGAACAAGT
CCTATGTAAG
CTCTTCGTCA
TTCCCTCTGG
ATGTCACCAC
AGTAGGATAG
GTTCTGTTTG
CATCGGGCJAG
GTCTATGAAC
ATACTATACA
GATATCTATC
GCTCGGATAC
GATAACTCAC
GTCCGAGATT
TCAAGAGTGG
TTTTGATGAG
GTACCCGATG
TACACTCGTA
CAATTGTGCA
TGACAAGATC
GACCATCCAA
CGGTCCTCCC
TAAGTTGGAG
AGGTTTATCG
AGGGATCCCC
TGGTATGTCA
GTCGCTCTGA
TCAAGCAACC
CCGAACAATA
AACGAGACCG
TCATTAACAG
TCATGTTTCT
CCATCTACAC
CAACTATCTT
GAAATCCTGT
CAGGCTTTGA
AGTGGAGGTG
GTCGACACAG
AAGGGGGTGA
TATACCACTG
TCATCGTGTA
AGTCCTCTGC
TCCGGGTCTT
TCAATCCTTT
CTAACATACA
GTCGGGAGCA
ATATCATTGG
GATGCCAAGG
AGCACTAGCA
GCTTTAATAT
AGACCAGGCC
TCCTCTACAA
ACCGCACCCA
TCGGTAGTTA
GATAAATGCC
AGAACATCTT
GAGCTTGATC
CGCAGAGATC
GTGATTTAAT
CATTATTTGG
GCTATGCGCT
ATTTACTGGG
AGTCCTACTT
TTGTCCACCG
TGCCCAAGTA
CTTTCATGCC
TCCAAGAATG
TTGGGAACCG
GCAAGTGTTA
TTGCTGCCGA
GGAGGTATCC
AGAGGTTGGA
AATTGTTGGA
TAGTCTACAT
GTTGCTGCAG
TAAAGCCTGA
CTCTTGAAAC
GCATCAAGCC
ATTAAAACTT
TTCTACAAAG
ATGATTGATA
GGGTTGCTAG
CATAAAAGCC
CGGCCAGAAG
CCCCAGCTTA
TGGAGGAGAC
CATCTTAGAG
CATTGTCCTC
GCTAGAGGGG
TGTTGCAACC
AGAGGGGACT
CCTCCGGGGG
GTTCATTTTA
CACAACAGGA
TCACTGCCCG
AGATGCTGTG
CGTAGGGACA
GTCATCGGAC
CCTGATTGCA
GGGGCGTTGT
TCTTACGGGA
ACAAATGTCC
CACCTGAAAT
AGGGTGCAAG
ATAACCCCCA
GACCTTATGT
CCATTGCAGG
TCAGCACCAA
6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500
TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA
AATCATCGGT
CATCTCTGAC
TTGGTGTATC
GGCTGCTGAA
CAATCAGTTC
ATTCTCAAAC
ATCTATAGTC
TAATCTGAGC
AGGTGTTATC
GCAACCAGCC
AGCCCTTTGT
CAGCTTCCAG
CCCCTTATCA
TATCGCTGAC
AATGGAGACA
CGAGTGGGCA
GAGTCTGACA
CGGTTCAGGG
GCCAATGAAG
GGTTAGTCCC
AACATACCTA
ACCTGGTCAA
TGTGGTTTAT
GCCTATAAAG
CTGGTGCCGT
GATGAAGTGG
AAGATTAAAT
AACCCGCCAG
GAGCTCATGA
CTAGCTGTCT
ATGTCGCTGT
ACTATGACAT
AGCAAAAGGT
AGAAATCCGG
AGTAATGATC
CACGGGGAAG
CTCGTCAAGC
ACGGATGATC
AATCAAGCAA
TGCTTCCAAC
CCATTGAAGG
GTTGAGCTTA
ATGGACCTAT
AACCTAGCCT
TACCTCTTCA
CCTGCGGAGG
GATCTCCAAT
TACGTTTACA
GGGGTCCCCA
CACTTCTGTG
GCCTGAGGAC
TCCTTAATCC
AGAGAATCAA
ATGCATTGGT
CAAAGGGAAA
CCCTGTTAGA
CCCAGGGAAT
CAGAGTTGTC
GTTTGGGGGC
TCAGCAACTG
ATTCTATCAC
TAGGTGTCTG
CAGTGATAGA
AATGGGCTGT
AGGCGTGTAA
ATAACAGGAT
AAATCAAAAT
ACAAATCCAA
TAGGTGTAAT
ATGTCCCAAT
TGGATGGTGA
ATGTTTTGGC
GCCCAGGCCG
TCGAATTACA
TGCTTGCGGA
ACCTCAGAGA
GGATAGGGAG
ATTGGATTAT
GAACTCAACT
CTGCTCAGGG
CTTGTATTTA
GTATGGGGGA
ACAACTGAGC
TCCGGTGTTC
TATGGTGGCT
AATTCCCTAT
GAAATCCCCA
CAGGCTTTAC
CCCGACAZACA
GGGTAAAATC
TCCTTCATAC
TGCTTCGGGA
CCACAACAAT
CAACACATTG
TAAGGAAGCA
TGTCAAACTC
AACCTACGAT
CTCATTTTCT
AGTGGAATGC
CTCAGAATCT
TTCACTGACC
TACGACTTCA
GATCAATACT
CTACTGGAGA
CCCACTACAA
GGTCGAGGTT
ACTTACCTAG
ATGTACCGAG
CATATGACAA
TTGGGGGAGC
CAGGGATCAG
ACCGACATGC
CTCTCATCTC
CGAACAGATG
CAAGCACTCT
GGGGTCTTGT
TTCGGGCCAT
GTGTATTGGC
GAGTGGATAC
GGCGAAGACT
AGTTCCAATC
ACTTCCAGGG
TACTTTTATC
TTCACATGGG
GGTGGACATA
TAGTGAAATT
GAGATCTCAC
GTGCAGATGT
CCAGAACAAC
TCAGAGGTCA
ACAATGTGTC
TGGAAAAGCC
TGTTTGAAGT
ACTATCTTGA
TCAAACTCGC
GGAAAGGTGT
AATCCTGGGT
ACAGAGGTGT
ACAAGTTGCG
GCGAGAATCC
CTGTTGATCT
TGATCACACA
TGACTATCCC
CGAGATTCAA
GCCATGCCCC
TGGTGATTCT
TTGAACATGC
CTTTTAGGTT
ACCAAAAACT
TCACTCACTC
7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060
TGGGATGGTG
ATAGGGCTGC
GTGAAATAGA
CGCTATCTGT
ATAAGATAGT
CTACACTGTG
TAAACAATGT
CTCATATTCC
CGAGGAAGAT
AGGTTTTCCA
AGGACATCAA
AGCCCTTTCT
CCCATACTTG
TGCTAATCTC
TGACATTTGA
CCGCTATGAC
AACTGATAGA
TGGAGCCTCT
CTTTCCTTAA
ATGAAGGTAC
TACATCTGAC
CAGTAACGGC
AGACTCTGAT
GGCACGGAGG
ATGCTCAAGC
TTGCTGGAGT
GGCATGGGAG
TAGTGAACCA
CATCAGAATT
CAACCAGATC
AGCCATCCTG
TCAGAACATC
GGAAGTTGGG
ATATCCAAAT
CCGTGAACTC
ATGCTTAAGG
GGAGAAAGTT
GTTTTGGTTT
CCATAGGAGG
TCGTGACCTT
ACTGGTTTTG
TATTGATGCT
TGGTTTCTTC
TTCACTTGCT
CCACTGCTTT
TTATCATGAG
AGGGGAGATT
TGCTGAAAAT
GAAAGGTCAT
CAGTTGGCCA
TTCAGGTGAA
GAAATTTGGC
TCAGCTGCAC
ATCTCATGAT
AAGAAAAACG
TTATACCCTG
GAGTATGCTC
AAGCACCGCC
AATGTCATCA
TGTAATCAGG
CTCAAAAAGG
GACACTAACT
ATTAACTTGG
ACAGTCAAGA
AGACACACAC
GTTGCTATAA
ATGTATTGTG
AGGTATACAG
CCTGCACTCG
TACCTGCAGC
ACTGAAATAC
TTAATTGAAG
TTCTCATTTT
GTTAGGAAAT
GCCATATTTT
CCGCTGACCC
GGGTTAACAC
TGCTTTATGC
AGTCACCCGG
GTCACCCAGA
TAGGGTCCAA
AAGTTCACCT
GAGTCCCTCA
TAAAAAACGG
AGTCCAAGCT
ATTTATTTAA
GGAATTCGCT
CACGGCTTGG
GAGTTTACAT
CTGAGATGAG
CTGTATTCTT
TCAGTAAAGA
ATGTCATAGA
AGCTTCTAGG
GGAATCCAAC
TGAGGGATAT
ATGATGTTCT
CTCTAGATTA
TCAGAAGTTT
ACATGAATCA
GTGGAATCAT
TCCCCCTGCA
ATGAGCAGTG
CTCTTAGCCT
GAAGATGGAA
CATCAGGCAT
GTGGTTCCCC
AGATAGCCCG
CGCTTACAGC
ATTTTCCAAC
TAGGAGTTAT
CATAGAAGAC
GTACTCCAAA
CCTAGGCTCC
GCACAGCTCC
GTCAGTGATT
CACTGGTAGT
GTCTCAACAT
GGGGAGGTTA
AAGAGTCAGA
TTATCAAATT
AACAGTAGAA
TGACCAAAAC
CATTTTCATA
CGGCCACCCC
GCCTAAAGTC
AATCAACGGC
TGCTGCAGAC
CGTTGATAAC
GGATAGTGAT
CCAATCGCAG
ACCCACTAGT
GTTATGGACT
ATAGTTACCA
CTGGAGGACC
CAAATGATTA
CCGGCCCACT
AAAGAGTCAA
GTCAGTGATA
GAATTGAGGG
CAGTGGTTTG
AAATCACAAA
TCAGTTGAGT
GTATATTACC
ATGACAGAGA
TACATGTGGA
GTAGCCATGC
CTCAGAGGTG
GGGTTTTCTG
ACTGATGACA
AGACTTGAAG
ATTGTGTATG
TATCGTGACA
ACAATCCGGA
TGGAAATCTT
CTGACAATGT
9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 ACCTAAAGGA CAAGGCACTT AGTTCCTGCG TTACGACCCT TTAATGATTC GAGCTTTGAC TCCATGACCC TGAGTTCAAC
GTAGACTTTT
TAATCTCAAA
ATTTGACTAA
GTCACAGGGG
GGAACGTGAG
ACACTGATCA
ATCTCAAGAA
TAAATGAGAT
CTGTCCTGTA
ATAAAGTCCC
GTCAGAAGCT
GAGTAAGGAT
TACCCAGCAC
ACTTTGTAAT
CAATTGTTTC
TGTCCCAATC
AAACAAGGGC
ATGACCGTTA
CTCTTGGCTT
ACAACGACCT
TGAATATGAG
TGCTAAAATG
CGGGATTGGC
GGCACTCCAC
GGGGCCAGTC
AGCAGCAAAA
TCCGGAGAAT
GTACTGCCTT
TTACGGATTG
TGTAAGTGAC
CAATGATCAA
GTGGACCATC
TGCTTCGTTA
ATGGCCCTAC
TCTTAGGCAA
ATCACATTTT
ACTCAAGAGC
AGCATGCAGT
CCTTGCATAT
CACAATCAAT
CTTAATAAGG
CAGGCTGTTT
GCTGCTCTCC
CCCAAGGGAA
CCATATGATG
CTGTCTTACA
ACTTACAAAA
AAATATTTTA
ACTCTAGCTG
TTAAAAACCT
GGGTTTATAG
ATGGAAGCTT
AATTGGAGAT
CCCTCATTTT
CCTCATTGCC
ATCTTCATTA
AGCACCATTC
GTGCAAGGGG
AACCTTAAGA
AGGCTACATG
TTTGTCTATT
ATCGCAAGAT
AATATTGCTA
TCCCTGAACG
TCAACCATGA
ATGGCACTGT
GTCAGAAACA
AAAGGGAATG
CCGGGTCACG
TGATAATGTA
GCCTGAAAGA
TGAGGGCATG
AGGACAATGG
TCTCAGGAGT
ACTCCCGAAG
GGTTCCCTCA
ACGAGACAGT
ATGAGACCAT
TCCAGTGGCT
CCCCCGACCT
AGTACCCTAT
CCTATCTATA
ACAATCAGAC
AACGGGAAGC
GGATTCAGTT
GAGGCTTGTA
TGTTGTAAGT
AAAGGAGATC
CCAAGTGATT
GATGGCCAAG
CCCCAAAGAT
CCCAGTCCAC
AGTAATTCGG
CAGTGCATTT
CAGCTTGTTT
GCATAAGAGG
TGACGCCCAT
GGGAGGTATA
CCTGGCTGCT
CATAGCCGTA
TGCTAGAGTA
TACCCGAAAG
GATGTTTTCC
GGAGCTTACC
AAGGAAACAG
GCTGAAAATC
GATGAGCACG
CTCAAAGAAA
ACAAGTACCA
CAGGACCAAG
ATCACGACTG
GCACAGAGGC
CTTGAGACCT
ATCCCGTTAT
GAAGGGTATT
TATGAGAGCG
ACAAAAAGGG
ACTAGAGATT
GCAAATGAGA
GGGCTACTTG
ATAGTTGATG
GAGAGAGGTT
ATTCTGATCT
CTCCTCACAA
ATGAATTATC
TCAATTGCTG
10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 ATATTGGCCA TCACCTCAAG CAAAAGGAAT ATATTATGAT GTGTATTCTG GTCAGAGACT CAACAATGGC TAAAAGCATC TCCTAAAAGT GATACAGCAA CCCGGGATGT AGTCATACCC TGCCCGCTCC TATTGGGGGG TCGGTGATCC AGTATACATCA ATCTCAAGAG AATGATTCTC GCCTCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC
TTGTATGTGT
TCCATAGTCC
AGGGACTGGC
TCCTGGATCA
AAGGCCTGAT
TGTCCAATTA
GAAATGTCCT
ATATGTGGGC
TAGAATCTAT
GATCAGTCAA
AGGAAACATC
TGAAGCTTGC
CAGTGTACTC
CTAGGCAAAG
CGACTAATTT
CCCTTGTCCG
CAGATAAGAA
TTTTAGAAAC
TTCACGTCGA
CCCGCAAGCT
CTTTAATTGA
AATTTGTTAC
CTATGATTGA
TAGGGGATGA
TCACTATCTA
CCAGAGCATC
AAACCCAATG
GGCATTCCTC
TAGTGTCACA
TCGAGCCAGC
TGACTATGAA
CATTGACAAA
GAGGCTAGCT
GCGAGGCCAC
CTACGGATGG
ATCCTTGAGA
CTTCGTAAGA
ATGGGCTTAC
GGCCAATGTG
AGCGCATAGG
AGTGGCGAGG
GGTTGATACT
ATTGTTTCGA
AACAGATTGT
AGAGCTGAGG
CAGAGATACA
ATGGTCCACA
CCTGGTAACA
CGATATCAAT
CTTGGGCCAG
ACTAGACTCC
TTAAAAGGAT
ATGGACAGGC
GGGGCAAGAG
ATGAGGAAGG
CAATTCAGAG
GAGTCATGTT
CGAGGACGGC
CTTATTCGGC
TTTTTTGTCC
GTCCCATATA
GCCCCAAGTC
GGTGATGATG
AGCCTGGAGG
TTGAGGGATC
TATACCACAA
AACTTTATAT
CTCGAGAAAG
TGCGTGATCC
GCAGAGCTAT
ACAAGGCTAT
CCCCAACTAT
AAATTTGAGA
AGTTTCATAA
TGTGCGGCCA
TCAAGAACAT
TATTCCATGA
ATATTATAGT
AGTCTATTGC
GGGGGTTAAC
CAGGGATGGT
CAGTGCAGCT
CTATTTACGG
GTCATGAGAC
CCTCGGGTTG
TTGGTTCTAC
GATCCTTGCG
ATAGCTCTTG
AGCTAAGGGT
GTAGCACTCA
TCTCCAACGA
ACCAACAAGG
ATACCGGATC
CGATGATAGA
GTACCAACCC
ACACCCAGAG
ATCACATTTT
AGGACCATAT
CTGAGTTTCT
TCAATTGGGC
AACTGCAAGG
TGACAGTAAA
ACCTAGGGCA
AGGCATGCTG
CTCTCGAGTG
GCTATTGACA
GGCGAGAGCT
CCTTGAGGTC
ATGTGTCATC
CCAACTGGAT
CACTGATGAG
ATCTGCTGTT
GAACGAAGCC
GATCACTCCC
AGTGAAATAC
CAATCTCTCA
AATGCTTCTA
ATCTAACACG
TCATCCCAGG
ATTGATATAT
CCATAGGAGG
AGCTAAGTCC
GAATGAAATT
GCTCATAGAG
ATTTGATGTA
TTTGTCCTGA
GAAGAGGACG
GCTCATGAAA
GATACCACAA
ATAACCAGAT
GGAAGAAAGA
CTAAGAAGCC
CCTGATGTAC
TGCGAGTGTG
GATATTGACA
AGAACAGACA
AGAATAGCAA
TGGTTGTTGG
ATCTCAACTT
TCAGGTACAT
TTTGTCATAT
GGGTTGGGTG
GTATTACATC
ATACCCAGCT
GATAATGCAC
CACCTTGTGG
ACAGCACTAT
TCAGCTCTCA
CCAAGATTAT
CATTATCATA
12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740
GACCATCAGG
AAGGAGTGTT
GGCATTGTGG
CAACTGTGTG
AAGAGTTAGA
GATTCGACAA
GGACCTGCCC
ATATCAAGC
TTGTAGACCA
GATTGAGAGT
CAAAGATCGG
ATGATGTTGC
GGGGCAATCT
CTTGCTACAA
ACGGCTTGTT
AACTAAACAA
AATTAGCACC
TTGTCAAAGT
TCAATTTCAT
AGACCTTGCC
TGGCTCTGCT
GGGATTTTGT
TATACCCTAG
AGGCTAACCG
GGACTTCACC
GAAATATCAG
TAAGGTGCTT
TATTATAGAG
CAACATGGTT
AGAGTTCACA
CATCCAGGCA
ACCAATTCGA
AGAGGCTAGG
TTACTCATGC
TGATCCAGGA
CAGCAACAAC
AAAATTGCTC
CGCCAATTAT
AGCTGTTGAG
CTTGGGTGAG
GTGCTTCTAT
CTATCCCTCC
GCTCTTTAAC
AGTTAGTAAT
TAACAAAGAT
CCTGGGCAAA
TCAGGGATTT
ATACAGCAAC
GCTAATGAAT
TGGACTTATA
ATGGGTGAGC
GTCAATGCTC
CCTATCCATG
TACACATGCT
TTTCTCTTGT
AAACACTTAT
GGTCTAAGAC
TTATCTCCAG
TCTCTGACTT
TTCATTTTCG
ATCTCAAATA
AAAGATATCA
GAAATCCATG
ATATCAACAT
GGATCGGGTT
AATAGTGGGG
GAAGTTGGCC
GGGAGGCCCG
ATCCCTACCT
ACTATAGAGA
ATAGGATCAA
ATAAGTTATG
TTCATATCTA
CCTGAAAAGA
GGTCACATCC
TGTTGTCATC
TAAGCCACCC
GTCCTTCACT
ATATGACCTA
GTGAAAGCGA
GTGTTCTGGC
CGGTAGAGAA
CAGGATCTTC
ATCTCCGGCG
ACGCCCTCGC
TGAGCATCAA
ACACAAGCAA
CTTTCCGCAG
TAATTAGGAG
CTATGTTGAT
TTTCCGCCAA
TTGTCGAACA
AAGTCACGTG
CTAGTGTGGG
AGCTAGAGGA
TACTGGTGAT
TAGGGTCCCA
CTGAATCTTA
TTAAGCAGCA
TATCCATTA
GTTCCTTTCT
AAAGATCTAC
TGATGCTCAA
CCTCGACCTG
CGAGGATGTA
AGATTTGTAC
ATGTGCAGTT
GTGGAACATA
AGGATCGATC
TGAGGTAAAT
GGCTTTCAGA
GCACAATCTT
AATCGGGTTG
ATGCCTTGAG
CACTTATAAG
TTCTAGATCT
CAGAATGGGA
GGTAGGCAGT
GTTTATCCAT
ATTGGCAGCC
TAAGCTTATG
TTATAGAGA
TTTGGTTATG
GATAATTGA
GCAACTAAGC
AGAATGAGCA
AAGAAATTCT
AACTTGCACA
TTGTTGAATG
GTACCGGACA
TGTCAACCAG
CTAACCGACC
AATCCAATTA
AAACAGATAA
GTCAGTCAGC
CCCCCACACG
CCCATTTCAG
AACTCATCTG
CCAGGGGAGG
GAGATACTTA
GGTCAAAGGG
GTAGGTAATA
GTAGATTGCT
TCAGATATAG
ATCTTATCGA
CCTTTCAGCG
GTGAACCTTG
ACAGATCTCA
TCATCTGTGA
TGCATACAAG
13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC
CTATAGAGCA
AATTGATCCA
TCTACAGGGA
CTTACCCCGT
TTTGGGGGCA
ATCTCAAGTC
CCAAGTCAGA
TAACAGTCAA
ACTAATTGGT
GGTGCTGATC
CCATGATGTT
GTTGGCAAGA
ATTGGTAAGT
CATTCTTCTT
CGGCTATCTG
GAAACAGATT
GGAGACCAAA
TGAACTCCGG
AATTGCGGGT
GCCTCAGGGC
TTCAAAGACA
AGCAGGCAAC
TACTCCGGGA
ATACTAGACT
ATTATGACGG
GAATGGTATA
AACCCTAATC
TGGCAATTAA
AAGATGGATT
ACCAAAGAAG
GAGAACTTAT
ACAGAAAGTT
TACACCAGAA
GGGGTTTGAA
AGTTAGTCGG
CTGCCCTAGG
CGGACCTAAG
GCTTAATTCT
TCAACAAGGG
ATCTAGGATC
GATAAATAAG
TATCTTCGTT
ACGTGAGTGG
ATACAGTGCC
TGGTTAGGCA
CTGTGCAAAG
ATACTCATCC
ATGTTCCACG
ACCCGCAAAT
TTTATCCAGA
AAGAATCTAT
GTTTTTAAGG
CTGATTAAGG
TTATTTGCAA
15360 15420 15480 15540 15600 15660 15720 15780 15840 15894 TAGATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT INFORMATION FOR SEQ ID NO:14: SEQUENCE CHARACTERISTICS: LENGTH: 2183 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) Met 1 Asp SEQUENCE DESCRIPTION: SEQ ID NO:14: Asp Ser Leu Ser Val Asn Gln Ile Leu 5 10 Ser Pro Ile Val Thr Asn Lys Ile Val Tyr Pro Glu Val His Leu Ala Ile Leu His 25 Glu Arg Val Pro Ile Lys His Ala Tyr Ser Leu 40 Gly Asp Pro Thr Leu Met Glu Tyr Ala Cys Gln Asn Ile Ile Asn Arg Leu Lys Val Asn 55 Val Phe Ser Asn Gln Leu Asn Ala Glu Val Gly Asn 70 Ile Lys Ser Lys 75 Asn
Arg Ser Tyr Pro Asn His Ser His Ile Pro Tyr Pro Asn Cys Gln Asp Leu Phe Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Gly Arg Ile 145 Trp Ser Pro Leu Phe 225 Thr Arg Gly Ala Leu 305 Phe Ile Phe Asn Asp 130 Lys Phe Val Val Val 210 Glu Glu Val Asn Tyr 290 Asn Ser Phe Arg Ser 115 Thr Glu Glu Ile Phe 195 Ala Leu Thr Arg Pro 275 Leu His Asp Ile Ser 355 100 Leu Asn Lys Pro Lys 180 Phe Ile Val Ala Tyr 260 Thr Gln Cys Glu Thr 340 Phe Ser Arg Ile 150 Leu Gln Gly Ser Met 230 Thr Trp Gln Arg Thr 310 Thr Asp His Lys Leu 135 Asn Phe Thr Ser Lys 215 Tyr Ile Lys Ile Asp 295 Glu Tyr Ile Pro Val 120 Gly Leu Trp His Ser 200 Glu Cys Asp Leu Val 280 Ile Ile His His Arg 360 105 Ser Asp Leu Gly Gly Val Phe Thr 170 Thr Cys 185 Val Glu Ser Gln Asp Val Ala Arg 250 Ile Asp 265 Ala Met Thr Val His Asp Glu Leu 330 Leu Thr 345 Leu Glu Arg Lys Ser Tyr 155 Val His Leu His Ile 235 Tyr Gly Leu Glu Val 315 Ile Gly Ala Val Glu 140 Met Lys Arg Leu Val 220 Glu Thr Phe Glu Leu 300 Leu Glu Glu Val Phe 125 Leu His Thr Arg Ile 205 Tyr Gly Glu Phe Pro 285 Arg Asp Ala Ile Thr 365 110 Gln Arg Ser Glu Arg 190 Ser Tyr Arg Leu Pro 270 Leu Gly Gln Leu Phe 350 Ala Glu Leu Leu Lys Cys Glu Ser Met 175 His Arg Leu Leu Leu 255 Ala Ser Ala Asn Asp 335 Ser Ala Lys Leu Asp Gln 160 Arg Thr Asp Thr Met 240 Gly Leu Leu Phe Gly 320 Tyr Phe Glu Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr 370 375 380 Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr 385 390 395 400 Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His 405 410 415 Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr 420 425 430 His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe 435 440 445 Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu 450 455 460 Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr 465 470 475 480 Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg 485 490 495 Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp
500 505 510 Val Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe 515 520 525 Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg 530 535 540 Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala 545 550 555 560 Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly 565 570 575 Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala 580 585 590 Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro 595 600 605 Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn 610 615 620 Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Val Ile Arg Gln 625 630 635 640 Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val 645 650 655 Ser Tyr Leu Leu 705 Pro Gly Pro Leu Ser 785 Arg His Ser Ser Arg 865 Arg Ile Ala Phe Glu Thr 675 Pro Ser 690 Tyr Val Leu Tyr Gly Ile Tyr Leu 755 Val Gln 770 Thr Trp Asp Tyr Leu Lys Lys Gly 835 Ile Ala 850 Ala Ala Gly Tyr Gln Gln Ile Thr 660 Ile Ser Phe Phe Ser Asp Lys Val 725 Glu Gly 740 Tyr Leu Gly Asp Pro Tyr Phe Val 805 Ala Asn 820 Ile Tyr Arg Cys Cys Ser Asp Arg 885 Ile Leu Thr Leu Gln Pro 710 Pro Tyr Ala Asn Asn 790 Ile Glu Tyr Val Asn 870 Tyr Ile Asp Phe Trp 695 His Asn Cys Ala Gln 775 Leu Leu Thr Asp Phe 855 Ile Leu Ser Leu Ala 680 Leu Cys Asp Gln Tyr 760 Thr Lys Arg Ile Gly 840 Trp Ala Ala Leu Lys 665 Gln His Pro Gln Lys 745 Glu Ile Lys Gln Val 825 Leu Ser Thr Tyr Gly Lys Arg Lys Pro Ile 730 Leu Ser Ala Arg Arg 810 Ser Leu Glu Thr Ser 890 Phe Tyr Leu Arg Asp 715 Phe Trp Gly Val Glu 795 Leu Ser Val Thr Met 875 Leu Thr Cys Asn Leu 700 Leu Ile Thr Val Thr 780 Ala His His Ser Ile 860 Ala Asn Ile Leu Glu 685 Glu Asp Lys Ile Arg 765 Lys Ala Asp Phe Gln 845 Val Lys Val Asn Asn 670 Ile Thr Ala Tyr Ser 750 Ile Arg Arg Ile Phe 830 Ser Asp Ser Leu Trp Tyr Ser His Pro 735 Thr Ala Val Val Gly 815 Val Leu Glu Ile Lys 895 Arg Gly Val Ile 720 Met Ile Ser Pro Thr 800 His Tyr Lys Thr Glu 880 Val 910 Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile 915
920 925 Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn 930 935 940 Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser 945 950 955 960 Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu 965 970 975 Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu 980 985 990 Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser 995 1000 1005 Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His 1010 1015 1020 Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu 1025 1030 1035 1040 Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val 1045 1050 1055 Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg 1060 1065 1070 Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala 1075 1080 1085 Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser 1090 1095 1100 Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly 1105 1110 1115 1120 Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu 1125 1130 1135 Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg 1140 1145 1150 Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly 1155 1160 1165 His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser 1170 1175 1180 Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp 1185 1190 1195 1200 Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr 1205 1210 1215 Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser 1220 1225 1230 Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala 1235 1240 1245 Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg 1250 1255 1260 Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile 1265 1270 1275 1280 Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln 1285 1290 1295 Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr 1300 1305 1310 Sle Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp 1315 1320
1325 Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu 1330 1335 1340 Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val 1345 1350 1355 1360 Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp 1365 1370 1375 His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu 1380 1385 1390 Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp 1395 1400 1405 Thr Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe 1410 1415 1420 Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr 1425 1430 1435 1440 Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met 1445 1450 1455 Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile 1460 1465 1470 Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly 1475 1480 1485 Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro 1490 1495 1500 Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg 1505 1510 1515 1520 Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro 1525 1530 1535 Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His 1540 1545 1550 Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met 1555 1560 1565 Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu 1570 1575 1580 Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val 1585 1590 1595 1600 Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala 1605 1610 1615 Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg 1620 1625 1630 Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala 1635 1640 1645 Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val 1650 1655 1660 Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys 1665 1670 1675 1680 Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala 1685 1690 1695 Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn 1700 1705 1710 Met Ser Ile Lys Ala Phe Arg Pro Pro His Asp Asp Val Ala
Lys Leu 1715 1720 1725 Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly 1730 1735 1740 Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn 1745 1750 1755 1760 Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg 1765 1770 1775 Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly 1780 1785 1790 Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe 1795 1800 1805 Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu 1810 1815 1820 Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val 1825 1830 1835 1840 Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp 1845 1850 1855 VlGly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr 1860 1865 1870 Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys 1875 1880 1885 Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala 1890 1895 1900 Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro 1905 1910 1915 1920 Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His 1925 1930 1935 Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser 1940 1945 1950 Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met 1955 1960 1965 Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr 1970 1975 1980 Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys 1985 1990 1995 2000 Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro 2005 2010 2015 Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val
Leu Ile Asn Cys Gly 2020 2025 2030 Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp 2035 2040 2045 Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr 2050 2055 2060 Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met 2065 2070 2075 2080 Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile 2085 2090 2095 Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly 2100 2105 2110 Asn Arg Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr 2115 2120 2125 Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys 2130 2135 2140 Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val 2145 2150 2155 2160 Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly 2165 2170 2175 Tyr Ser Ala Leu Ile Lys Asp 2180 INFORMATION FOR SEQ ID NO:15: SEQUENCE CHARACTERISTICS: LENGTH: 15894 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:15: ACCAAACAAA GTTGGGTAAG GATAGTTCAA TCAATGATCA TCTTCTAGTG CACTTAGGAT TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT 120 TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180 GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA
TTACCACTCG
GCGGGCCCAA
GTCAATTGAT
TCCAGAGTGA
ATGAGGCGGA
GATGGTTCGA
TGATTCTGGG
CAGACACGGC
TAGTTGGTGA
AGGACCTCTC
GAAACAAACC
GATTAGCCAG
GACTGCATGA
AAATGGGGGA
GTGCAGGATC
ACTCCATGGG
GGCAAGAGAT
GTATCACTGC
AGATCAGTAG
GTGAGAATGA
GAGAAGCCAG
CCCATCTTCC
CGCAGGACAG
CGGAAGAACA
ATCCAGACTT
ACTAACAGGG
TCAGAGGATC
CCAGTCACAA
CAAATACTTT
GAACAAGGAA
TACCATCCTA
AGCTGATTCG
ATTTAGATTG
CTTACGCCGA
CAGGATTGCT
TTTTATCCTG
ATTTGCTGGT
AACTGCACCC
ATACCCTCTG
AGGTTTGAAC
GGTAAGGAGG
CGAGGATGCA
AGCGGTTGGA
GCTACCGAGA
GGAGAGCTAC
AACCGGCACA
TCGAAGGTCA
AGGCTCAGAC
CTGGACCGGT
GCACTAATAG
ACCGATGACC
TCTGGCCTTA
TCACATGATG
ATCTCAGATA
GCCCAAATTT
GAGCTAAGAA
GAGAGAAAAT
TTCATGGTCG
GAAATGATAT
ACTATTAAGT
GAGTTATCCA
TACATGGTAA
CTCTGGAGCT
TTTGGCCGAT
TCAGCTGGAA
AGGCTTGTTT
CCCAGACAAG
TTGGGGGGCA
AGAGAAACCG
CCCCTAGACA
GCTGACGCCC
ACGGACACCC
TGGTCAGGTT
GTATATTATC
CTGACGTTAG
CCTTCGCATC
ATCCAATTAG
TTGAAGTGCA
GGGTCTTGCT
GGTGGATAAA
GGTTGGATGT
CTCTAATCCT
GTGACATTGA
TTGGGATAGA
CACTTGAGTC
TCCTGGAGAA
ATGCCATGGG
CTTACTTTGA
AGGTCAGTTC
CAGAGATTGC
CCCAAGTATC
AGGAAGATAG
GGCCCAGCAG
TTGACACTGC
TGCTTAGGCT
CTATAGTGTA
AATTGGAAAC
CTTATTTGTG
CATAAGGCTG
AAGAGGTACC
TAGTGATCAA
AGACCCTGAG
CGCAAAGGCG
GTACACCCAA
GGTGAGGAAC
GGATATCAAG
TACATATATC
AACTATGTAT
CTTGATGAAC
CTCAATTCAG
AGTAGGAGTG
TCCAGCATAT
CACATTGGCA
AATGCATACT
ATTTCTACAC
GAGGGTCAAA
AGCAAGTGAT
ATCGGAGTCC
GCAAGCCATG
CAATGACAGA
CCGGATGTGA
GAGTCTCCAG
TTAGAGGTTG
AACATGGAGG
TCCAGGTTCG
GGATTCAACA
GTTACGGCCC
CAAAGAAGGG
AGGATTGCCG
AGAACACCCG
GTAGAGGCAG
CCTGCTCTTG
CTTTACCAGC
AACAAGTTCA
GAACTTGAAA
TTTAGATTAG
TCTGAACTCG
ACTGAGGACA
GGTGATCAAA
CAGAGTCGAG
GCGAGAGCTG
AGCCAAGATC
GCAGGAATCT
AATCTTCTAG
300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 ACTAGGTGCG AGAGGCCGAG GACCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA
AAAACTTAGG
GAGCCGATGG
CTCAAGGCCG
ATATCAGACA
GGTCTCAGCA
CGCGGTCAGG
AATCTCCAGG
GCGGTTAAGG
AGCACCCTAT
GATACCGAGG
GCTTCTGATG
AGAGGCAACA
GGTAGGGCCA
TTTGGAACGG
CCCTCGGAAC
GCCGCACTGA
AATAATGAAG
AAAACAGCCT
CTGCTGTTAT
AGCATATCCA
AAGGATCCCA
GGCAGAGATT
CTCCAAGGAA
CCAAAGCCGA
GCATCACGCA
CGTTACCTGA
AACCAGGTCC
CAGAAGAGCA
AGCCCATCGG
ACCCAGGACA
AACCATGCCT
GACCTGGAGA
CATCAAGCAC
GAATCCAAGA
CAGGAGGAGA
GATATGCTAT
TTGAAACTGC
ACTTTCCGAA
GCACTTCCGG
AGATCGCGTC
CATCAGGGCC
TACAGGAGTG
AAGGGGGAGA
TGGCCAAAAT
TGAAGGGAGA
CCCTGGAAGG
ACGACCCCAC
CAGGCCGAGC
TGACAAATGG
TCGGGAAAAA
GTGTAATCCG
TGACTCTCCT
ACACAGCCGC
GGCACGCCAT
CTCACTGGCC
GGAGCGAGCC
CTCAGCAATT
GAGCGATGAC
TGGGTTACAG
TGCTGACTCT
CAATGAATCT
CACTGACCGG
AGAAGGAGGG
GCTTGGGAAA
GACACCCATT
TTTATTGACA
AGGTGCACCT
GACACCCGAA
CTATTATGAT
ACACGAGGAT
AGTTGAGTCA
ACACCTCTCA
TGCAGATGTC
ACTGGCCGAA
ACGGACCAGT
GATGAGCTCA
CTCCATTATA
TGATGATATC
CAGCCCATCA
GTCAAAAACG
ATCGAGGAAG
ACCTGCAGGG
GGATCAACTG
GACGCTGAAA
TGTTATTATG
ATCATGGTTC
GAAAACAGCG
GGATCTGCTC
GAGATCCACG
ACTCTCAATG
AAAAAGGGCA
GGTGGTGCAA
GCGGGGAATG
TCTGGTACCA
GATGAGCTGT
AATCAGAAGA
ATTAAGAAGC
AGCATCATGA
GAAATCAATC
GTTCTCAAGA
TCCAGAGGAC
GCCGTCGGGT
AAATCCAGCC
AAAGGAGCCA
ACCATCCACT
GACTGGAATG
CTATGGCAGC
AAGAGAAGGC
AAGGCGGTGC
CTTTGGGAAT
TTTATGATCA
AATCAGGCCT
ATGTGGATAT
CCATCTCTAT
AGCTCCTGAG
TTCCTCCGCC
CAGACGCGAG
CCCAATGTGC
TCCCCGAGTA
CAATCTCCCC
TCTCTGATGT
TAATCTCCAA
AGATCAACAG
TCGCCATTCC
CCGACTTGAA
AACCCGTTGC
AGCTGCTGAA
TTGTTCCTGA
GGCTAGAGGA
ATGATCTTGC
CCCACGATTG
CATCCGGGCT
ATGGTCAGAA
AGGCAGTTCG
ACCTCGCATC
CCCCCCAAGA
CAGCGGTGAA
TGATGGTGAT
TGGCGAACCT
GGGGTTCAGG
ACTCCAATCC
CCCGGACCCC
ATTAGCCTCA
TCGAAAGTCA
TGTGAGCAAT
GAGATCCCAG
CCAAGATATT
GCTAGAATCA
GCAAAATATC
TGGACTTGGG
ACCCATCATA
CAGCCGACAA
GGAATTTCAG
CACCGGCCCT
GGATCGGAAG
CAAGTTCCAC
1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG
CCAGTCGACC
GCCTCCCAAG
AAGGGTCGAT
TCAGAGTCAT
TGCTGGGGGT
CCCTGCCCTT
CTGAGCTTGA
ACAACACCCC
TCAACGCAAA
TCCGTGTTGT
GAAGAATGCT
GGATTGACAA
CAACATTTAT
ATTATTGCAA
GCACCAGTCT
GGTTCAAGAA
TCTGGAGGAG
AAGAATTCCG
TGTAGACCGT
GCCCGGACAA
GCCGACGGCA
CCACCAGCCA
TGCCCCCGAT
ATTGGAAGGC
GACCGAGGTG
CAACTAGTAC
TTCCACAATG
CGCTCCGATA
AGATCCTGGT
TGTTGAGGAC
AGGTGTTGGC
CATAGTTGTT
ACTAACTCTC
CCAAGTGTGC
TTATATGAGC
GGAATTCAGA
GGCGATAGGC
GGTCCACATC
AATGAAAATC
TCACATTAGA
GACCTTATGT
CAGATGCAAG
CATTTACGAC
AGTGCCCAGC
AAAAGCCCCC
AGCACGAACA
CCCCAATCTG
CCAAACCACC
CCCTCCCCCT
ACCCAACCGC
AACCTAAATC
ACAGAGATCT
CAACCGACCA
CTAGGCGACA
AGCGATCCCC
AGATCCACAG
AGACGTACAG
CTCACACCTT
AATGCGGTTA
ATCACCCGTC
TCGGTCAATG
CCTGGGAAGA
GGGAACTTCA
GAAAAGATGG
AGCACAGGCA
TACCCGCTGA
ATAGTAAGAA
GACGTGATCA
AATGCCCGAA
TCCGAAAGAC
CCAGGCGGCC
CATCCTCCTC
AACCGCATCC
CTTCCTCAAC
AGGCATCCGA
CATTATAAAA
ACGACTTCGA
CCTACAGTGA
GGAAGGATGA
TAGGGCCTCC
CAAAGCCCGA
CAGGGCTCAA
GGAGAAAGGT
ATCTGATACC
TTTCGGATAA
CAGTGGCCTT
TCATCGACAA
GGAGAAAGAA
GCCTGGTTTT
AAATGAGCAA
TGGATATCAA
TCCAGGCAGT
TAAATGATGA
AACGACCCCC
TCCACTGACC
CCAGCACAGA
GTGGGACCCC
CCACCACCCC
ACAAGAACTC
CTCCCTAGAC
AACTTAGGAG
CAAGTCGGCA
TGGCAGGCTG
ATGCTTTATG
AATCGGGCGA
AAAACTCCTC
TGAAAAACTG
CCTAACAACA
GCTCGATACC
CGGGTATTAC
CAACCTGCTG
TACAGAGCAA
GAGTGAAGTC
TGCACTTGGT
GACTCTCCAT
TGAAGACCTT
TTTGCAGCCA
CCAAGGACTA
CTCACAATGA
AAGCGAGAGG
ACAGCCCTGA
CGAGGACCAA
CGGGAAAGAA
CACAACCGAA
AGATCCTCTC
CAAAGTGATT
TGGGACATCA
GTGCCCCAGG
TACATGTCTC
GCATTTGGGT
AAAGAGGCCA
GTGTTCTACA
GGGAGTGTCT
CCGCAGAGGT
ACCGTTCCTA
GTGACCCTTA
CTTCCTGAGG
TACTCTGCCG
GGGATAGGGG
GCACAACTCG
AATCGATTAC
TCAGTTCCTC
TTCAAAGTTC
CAGCCAGAAG
CCAGCCAGCA
TACAAGGCCA
CCCCCAAGGC
ACCCCCAGCA
CCGCACAAGC
TCCCCGGCAA
3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGCCCA
CGGCGCCGCG
CCCCGGTGCC
AATCCAAGAC
GAGGAAGCCC
GGGCCACCAG
ACCCCAGCCC
CGAAGGACCC
CTCCTCCTCT
CCCACCCCTA
GGTCTCAAGG
ACCGGTCAAA
AGCTACAAAG
ATAACTCTCC
ACAGTTTTGG
CAGAGTGTAG
GCCCTAGGCG
CTGAACTCTC
GAGGCAATCA
ATCAATAATG
CTCGGGCTCA
CGGGACCCCA
ATCAATAAGG
AGCAGAGGIA
AGTATAGCCT
GTCTCGTACA
CCCCCAACCC
CACAGGCAGG
GGGGGGGCCC
ACCCACCCCA
CTCCCAGACT
CGATCCGGCG
CCGAACCGCA
TCTCGAAGGG
AAGGAGACAC
TGAACGTCTC
TCCATTGGGG
TTATGACTCG
TCAATAACTG
AACCAATTAG
CTTCAAGTAG
TTGCCACAGC
AAGCCATCGA
GACAAGCAGG
AGCTGATACC
AATTGCTCAG
TATCTGCGGA
TGTTAGAAAA
TAAAGGCCCG
ATCCGACGCT
ACATAGGCTC
CCGACAACCA
GACACCAACC
CCCCAAAAAA
CACACGACCA
CGGCCATCAC
GGGAGCCACC
AAGGACATCA
ACCAAAAGAT
CGGGAATCCC
TGCCATATTC
CAATCTCTCT
TTCCAGCCAT
CACGAGGGTA
AGATGCACTT
GAGACACAAG
TGCTCAGATA
CAATCTGAGA
GCAGGAGATG
GTCTATGAAC
ATACTATACA
GATATCTATC
GCTCGGATAC
GATAACTCAC
GTCCGAGATT
TCAAGAGTGG
GAGGGAGCCC
CCCGAACAGA
AGGCCCCCAG
CGGCAACCAA
CCCGCAGAAA
CAACCCGAAC
GTATCCCACA
CAATCCACCA
AGAATCAAGA
ATGGCAGTAC
AAGATAGGGG
CAATCATTAG
GAGATTGCAG
AATGCAATGA
AGATTTGCGG
ACAGCCGGCA
GCGAGCCTGG
ATATTGGCTG
CAACTATCTT
GAAATCCTGT
CAGGCTTTGA
AGTGGAGGTG
GTCGACACAG
AAGGGGGTGA
TATACCACTG
CCAACCAATC
CCCAGCACCT
GGGCCGACAG
ACCAGAACCC
GGAAAGGCCA
CAGCACCCAA
GCCTCTCCAA
CACCCGACGA
CTCATCCAAT
TGTTAACTCT
TGGTAGGAAT
TCATAAAATT
AATACAGGAG
CCCAGAATAT
GAGTAGTCCT
TTGCACTTCA
AAACTACTAA
TTCAGGGTGT
GTGATTTAAT
CATTATTTGO
GCTATGCGCT
ATTTACTGGG
AGTCCTACTT
TTGTCCACCG
TGCCCAAGTA
CCGCCGGCTC
AACCATCGAC
CCAGCACCGC
AGACCACCCT
CAACCCGCGC
GAGCGATCCC
GTCCCCCGGT
CACTCAACTC
GTCCATCATG
CCAAACACCC
AGGAAGTGCA
AATGCCCAAT
ACTACTGAGA
AAGACCGGTT
GGCAGGTGCG
CCAGTCCATG
TCAGGCAATT
CCAAGACTAC
CGGCCAGAAG
CCCCAGCTTA
TGGAGGAGAC
CATCTTAGAG
CATTGTCCTC
GCTAGAGGGG
TGTTGCAACC
4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420
CAAGGGTACC
GTGTGCAGC
TACACCAAGT
TCACAAGGGA
ACGATCATTA
GTAGTCGAGG
TACTTGCACA
AATCTGGGGA
CAGATATTGA
GTGTGTCTTG
AACAAAAAGG
ACATCAAAAT
CACAAGTCTC
TATCTCCGGC
ATCATCCACA
TCCCAAGGGA
TTTGCTGGCT
CATTAGACTT
TCTAGATGTA
AATCATCGGT
CATCTCTGAC
TTGGTGTATC
GGCTGCTGAA
CAATCAGTTC
ATTCTCALC
ATCTATAGTC
TTATCTCGAA
AAAATGCCTT
CCTGTGCTCG
ACCTAATAGC
ATCAAGACCC
TGAACGGCGT
GAATTGACCT
ATGCAATTGC
GGAGTATGAA
GAGGGTTGAT
GAGAACAAGT
CCTATGTAAG
CTCTTCGTCA
TTCCCTCTGG
ATGTCACCAC
AGTAGGATAG
GTTCTGTTTG
CATCGGGCAG
ACTAACTCA
GATGAAGTGG
AAGATTAAAT
AACCCGCCAG
GAGCTCATGA
CTAGCTGTCT
ATGTCGCTGT
ACTATGACAT
TTTTGATGAG
GTACCCGATG
TACACTCGTA
CAATTGTGCA
TGACAAGATC
GACCATCCAA
CGGTCCTCCC
TAAGTTGGAG
AGGTTTATCG
AGGGATCCCC
TGGTATGTCA
GTCGCTCTGA
TCAAGCAACC
CCGAACAATA
AACGAGACCG
TCATTAACAG
TCATGTTTCT
CCATCTACAC
TCGAGCATCA
GCCTGAGGAC
TCCTTAATCC
AGAGAATCAA
ATGCATTGGT
CAAAGGGAAA
CCCTGTTAGA
CCCAGGGAAT
TCATCGTGTA
AGTCCTCTGC
TCCGGGTCTT
TCAATCCTTT
CTAACATACA
GTCGGGAGCA
ATATTATTGG
GATGCCAAGG
AGCACTTGCA
GCTTTAATAT
AGACCAGGCC
TCCTCTACAA
ACCGCACCCA
TCGGTAGTTA
GATAAATGCC
AGAACATCTT
GAGCTTGATC
CGCAGAGATC
GGTCAAGGAC
ACCTCAGAGA
GGATAGGGAG
ATTGGATTAT
GAACTCAACT
CTGCTCAGGG
CTTGTATTTA
GTATGGGGGA
CTTTCATGCC
TCCAAGAATG
TTGGGAACCG
GCAAGTGTTA
TTGCTGCCGA
GGAGGTATCC
AGAGGTTGGA
AATTGTTGGA
TAGTCTACAT
GTTGCTGCAG
TAAAGCCTGA
CTCTTGAAAC
GCATCAAGCC
ATTAAAACTT
TTCTACAAAG
ATGATTGATA
GGGTTGCTAG
CATAAAAGCC
GTGCTGACAC
TTCACTGACC
TACGACTTCA
GATCAATACT
CTACTGGAGA
CCCACTACAA
GGTCGAGGTT
ACTTACCTAG
AGAGGGGACT
CCTCCGGGGG
GTTCATTTTA
CACAACAGGA
TAACTGCCCG
AGACGCTGTG
CGTAGGGACA
GTCATCGGAC
CCTGATTGCA
GGGGCGTTGT
TCTTACGGGA
ACAAATGTCC
CACCTGAAAT
AGGGTGCAAG
ATAACCCCCA
GACCTTATGT
CCATTGCAGG
TCAGCACCAA
CACTCTTCAA
TAGTGAAATT
GAGATCTCAC
GTGCAGATGT
CCAGAACAAC
TCAGAGGTCA
ACAATGTGTC
TGGAAAAGCC
6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980
TAATCTGAGC
AGGTGTTATC
GCAACCAGTC
AGCCCTTTGT
CAGCTTCCAG
CACCTTATCA
TATCGCTGAC
AATGGAGACA
CGAGTGGGCA
GAGTCTGACA
CGGTTCAGGG
ACCAATGAAG
GGTTAGTCCC
AACATACCTA
ACCTGGTCAA
TGTGGTTTAT
GCCTATAAAG
CTGGTGCCGT
TGGGATGGTG
ATAGGGCTGC
GTGAAATAGA
CGCTATCTGT
ATAAGATAGT
CTACACTGTG
TAAACAATGT
AGCAAAAGGT
AGAAATCCGG
AGTAATGATC
CACCGGGAAG
CTCGTCAAGC
ACGGATGATC
AATCAAGCAA
TGCTTCCAC
CCATTGAAGG
GTTGAGCTTA
ATGGACCTAT
AACCTAGCCT
TACCTCTTCA
CCTGCGGAGG
GATCTCCAAT
TACGTTTACA
GGGGTCCCCA
CACTTCTGTG
GGCATGGGAG
TAGTGAACTA
CATCAGAATT
CAACCAGATC
AGCCATCCTG
TCAGAACATC
GGAAGTTGGG
CAGAGTTGTC
GTTTGGGGGC
TCAGCAACTG
ATTCTATCAC
TAGGTGTCTG
CAGTGATAGA
AATGGGCTGT
AGGCGTGTAA
ATAACAGGAT
AAATCAAAAT
ACAAATCCAA
TAGGTGTAAT
ATGTCCCAAT
TGGATGGTGA
ATGTTTTGGC
GCCCAAGCCG
TCGAATTACA
TGCTTGCGGA
TCAGCTGCAC
ATCTCATGAT
AAGAAAAACG
TTATACCCTG
GAGTATGCTC
AAGCACCGCC
AATGTCATCA
ACAACTGAGC
TCCGGTGTTC
TATGGTGGCT
AATTCCCTAT
GAAATCCCCA
CAGGCTTTAC
CCCGACAACA
GGGTAAAATC
TCCTTCATAC
TGCTTCGGGA
CCACAACAAT
CAACACATTG
TAAGGAAGCA
TGTCAAACTC
AACCTACGAT
CTCATTTTCT
AGTGGAATGC
CTCAGAATCT
AGTCACCCGG
GTCACCCAGA
TAGGGTCCAA
AAGTTCACCT
GAGTCCCTCA
TAAAAAACGG
AGTCCAAGCT
ATGTACCGAG
CATATGACAA
TTGGGGGAGC
CAGGGATCAG
ACCGACATGC
CTCTCATCTC
CGAACAGATG
CAAGCACTCT
GGGGTCTTGT
TTCGGGCCAT
GTGTATTGGC
GAGTGGATAC
GGCGAAGACT
AGTTCCAATC
ACTTCCAGGG
TACTTTTATC
TTCACATGGG
GGTGGACATA
GAAGATGGAA
CATCAGGCAT
GTGGTTCCCC
AGATAGCCCG
CGCTTACAGC
ATTTTCCAAC
TAGGAGTTAT
TGTTTGAAGT
ACTATCTTGA
TCAAACTCGC
GGAAAGGTGT
AATCCTGGGT
ACAGAGGTGT
ACAAGTTGCG
GCGAGAATCC
CTGTTGATCT
TGATCACACA
TGACTATCCC
CGAGATTCAA
GCCATGCCCC
TGGTGATTCT
TTGAACATGC
CTTTTAGGTT
ACCAAAAACT
TCACTCACTC
CCAATCGCAG
ACCCACTAGT
GTTATGGACT
ATAGTTACCA
CTGGAGGACC
CAAATGATTA
CCGGCCCACT
8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA
CGAGGAAGAT
AGGTTTTCCA
AGGACATCAA
AGCCCTTTCT
CCCATACTTG
TGCTAATCTC
TGACATTTGA
CCGCTATGAC
AACTGATAGA
TGGAGCCTCT
CTTTCCTTAA
ATGAAGGTAC
TACATCTGAC
CAGTAACGGC
AGACTCTGAT
GGCACGGAGG
ATGCTCAAGC
TTGCTGGAGT
ACCTAAAGGA
AGTTCCTGCG
TTAATGATTC
TCCATGACCC
GTAGACTTTT
TAATCTCAAA
ATTTGACTAA
CCGTGAACTC
ATGCTTAAGG
GGAGAAAGTT
GTTTTGGTTT
CCATAGGAGG
TCGTGACCTT
ACTGGTTTTG
TATTGATGCT
TGGTTTCTTC
TTCACTTGCT
CCACTGCTTT
TTATCATGAG
AGGGGAGATT
TGCTGAAAAT
GAAAGGTCAT
CAGTTGGCCA
TTCAGGTGAA
GAAATTTGGC
CAAGGCACTT
TTACGACCCT
GAGCTTTGAC
TGAGTTCAAC
TGCTAAAATG
CGGGATTGGC
GGCACTCCAC
CTCAAAAAGG
GACACTAACT
ATTAACTTGG
ACAGTCAAGA
AGACACACAC
GTTGCTATAA
ATGTATTGTG
AGGTATACAG
CCTGCACTCG
TACCTGCAGC
ACTGAAATAC
TTAATTGAAG
TTCTCATTTT
GTTAGGAAAT
GCCATATTTT
CCGCTGACCC
GGGTTAACAC
TGCTTTATGC
GCTGCTCTCC
CCCAAGGGAA
CCATATGATG
CTGTCTTACA
ACTTACAAAA
AAATATTTTA
ACTCTAGCTG
GGAATTCGCT
CACGGCTTGG
GAGTTTACAT
CTGAGATGAG
CTGTATTCTT
TCAGTAAAGA
ATGTCATAGA
AGCTTCTAGG
GGAATCCAAC
TGAGGGATAT
ATGATGTTCT
CTCTAGATTA
TCAGAAGTTT
ACATGAATCA
GTGGAATCAT
TCCCCCTGCA
ATGAGCAGTG
CTCTTAGCCT
AAAGGGAATG
CCGGGTCACG
TGATAATGTA
GCCTGAAAGA
TGAGGGCATG
AGGACAATGG
TCTCAGGAGT
GTACTCCAAA
CCTAGGCTCC
GCACAGCTCC
GTCAGTGATT
CACTGGTAGT
GTCTCAACAT
GGGGAGGTTA
AAGAGTCAGA
TTATCAKATT
AACAGTAGAA
TGACCAAAAC
CATTTTCATA
CGGCCACCCC
GCCTAAAGTC
AATCAACGGC
TGCTGCAGAC
CGTTGATAAC
GGATAGTGAT
GGATTCAGTT
GAGGCTTGTA
TGTTGTAAGT
AAAGGAGATC
CCAAGTGATT
GATGGCCAAG
CCCCAAAGAT
GTCAGTGATA
GAATTGAGGG
CAGTGGTTTG
AAATCACAAA
TCAGTTGAGT
GTATATTACC
ATGACAGAGA
TACATGTGGA
GTAGCCATGC
CTCAGAGGTG
GGGTTTTCTG
ACTGATGACA
AGACTTGAAG
ATTGTGTATG
TATCGTGACA
ACAATCCGGA
TGGAAATCTT
CTGACAATGT
TACCCGAAAG
GATGTTTTCC
GGAGCTTACC
AAGGAAACAG
GCTGAAAATC
GATGAGCACG
CTCAAAGAAA
9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA GGAACGTGAG AGCAGCAAAA GGGTTTATAG GGTTCCCTCA AGTAATTCGG CAGGACCAAG
ACACTGATCA
ATCTCAAGA
TAAATGAGAT
CTGTCCTGTA
ATAAAGTCCC
GTCAGAAGCT
GAGTAAGGAT
TACCCAGCAC
ACTTTGTAAT
CAATTGTTTC
TGTCCCAATC
AAACAAGGGC
ATGACCGTTA
CTCTTGGCTT
ACAACGACCT
TGAATATGAG
ATCTCAAGAG
CACAACAACC
TTGTATGTGT
TCCATAGTCC
AGGGACTGGC
TCCTGGATCA
AAGGCCTGAT
TGTCCAATTA
GAAATGTCCT
TCCGGAGAAT
GTACTGCCTT
TTACGGATTG
TGTAAGTGAC
CAATGATCAA
GTGGACCATC
TGCTTCGTTA
ATGGCCCTAC
TCTTAGGCAA
ATCACATTTT
ACTCAAGAGC
AGCATGCAGT
CCTTGCATAT
CACAATCAAT
CTTAATAAGG
CAGGCTGTTT
AATGATTCTC
GGGGGACTCT
CCAGAGCATC
AAACCCAATG
GGCATTCCTC
TAGTGTCACA
TCGAGCCAGC
TGACTATGAA
CATTGACAA
ATGGAAGCTT
AATTGGAGAT
CCCTCATTTT
CCTCATTGCC
ATCTTCATTA
AGCACCATTC
GTGCAAGGGG
AACCTTAAGA
AGGCTACATG
TTTGTCTATT
ATCGCAAGAT
AATATTGCTA
TCCCTGAACG
TCAACCATGA
ATGGCACTGT
GTCAGAAACA
GCCTCACTAA
TCATTCCTAG
ACTAGACTCC
TTAAAAGGAT
ATGGACAGGC
GGGGCAAGAG
ATGAGGAAGG
CAATTCAGAG
GAGTCATGTT
ACGAGACAGT
ATGAGACCAT
TCCAGTGGCT
CCCCCGACCT
AGTACCCTAT
CCTATCTATA
ACAATCAGAC
AACGGGAAGC
ATATTGGCCA
CAAAAGGAAT
GTGTATTCTG
CAACAATGGC
TCCTAAAAGT
CCCGGGATGT
TGCCCGCTCC
TCGGTGATCC
TGCCTGAAGA
ACTGGGCTAG
TCAAGAACAT
TATTCCATGA
ATATTATAGT
AGTCTATTGC
GGGGGTTAAC
CAGGGATGGT
CAGTGCAGCT
CAGTGCATTT
CAGCTTGTTT
GCATAAGAGG
TGACGCCCAT
GGGAGGTATA
CCTGGCTGCT
CATAGCCGTA
TGCTAGAGTA
TCACCTCAAG
ATATTATGAT
GTCAGAGACT
TAAAAGCATC
GATACAGCAA
AGTCATACCC
TATTGGGGGG
AGTAACATCA
GACCCTCCAT
CGACCCTTAC
AACTGCAAGG
TGACAGTAAA
ACCTAGGGCA
AGGCATGCTG
CTCTCGAGTG
GCTATTGACA
GGCGAGAGCT
ATCACGACTG
GCACAGAGGC
CTTGAGACCT
ATCCCGTTAT
GAAGGGTATT
TATGAGAGCG
ACAAAAAGGG
ACTAGAGATT
GCAAATGAGA
GGGCTACTTG
ATAGTTGATG
GAGAGAGGTT
ATTCTGATCT
CTCCTCACAA
ATGAATTATC
TCAATTGCTG
CAAGTAATGA
TCAGCAAATC
TTTGTCCTGA
GAAGAGGACG
GCTCATGAAA
GATACCACAA
ATAACCAGAT
GGAAGAAAGA
CTAAGAAGCC
11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660
ATATGTGGGC
TAGAATCTAT
GATCAGTCAA
AGGAAACATC
TGAAGCTTGC
CAGTGTACTC
CTAGGCAAAG
CGACTAATTT
CCCTTGTCCG
CAGATAAGAA
TTTTAGAAAC
TTCACGTCGA
CCCGCAAGCT
CTTTAATTGA
AATTTGTTAC
CTATGATTGA
TAGGGGATGA
TCACTATCTA
GACCATCAGG
AAGGAGTGTT
GGCATTGTGG
CAACTGTGTG
AAGAGTTAGA
GATTCGACAA
GGGCCTGCCC
GAGGCTAGCT
GCGAGGCCAC
CTACGGATGG
ATCCTTGAGA
CTTCGTAAGA
ATGGGCTTAC
GGCCAATGTG
AGCGCATAGG
AGTGGCGAGG
GGTTGATACT
ATTGTTTCGA
AACAGATTGT
AGAGCTGAGG
CAGAGATACA
ATGGTCCACA
CCTGGTAACA
CGATATCAAT
CTTGGGCCAG
GAAATATCAG
TAAGGTGCTT
TATTATAGAG
CAACATGGTT
AGAGTTCACA
CATCCAGGCA
ACCAATTCGA
CGAGGACGGC
CTTATTCGGC
TTTTTTGTCC
GTCCCATATA
GCCCCAAGTC
GGTGATGATG
AGCCTGGAGG
TTGAGGGATC
TATACCACAA
AACTTTATAT
CTCGAGAAAG
TGCGTGATCC
GCAGAGCTAT
ACAAGGCTAT
CCCCAACTAT
AAATTTGAGA
AGTTTCATAA
TGTGCGGCCA
ATGGGTGAGC
GTCAATGCTC
CCTATCCATG
TACACATGCT
TTTCTCTTGT
AAACACTTAT
GGTCTAAGAC
CTATTTACGG
GTCATGAGAC
CCTCGGGTTG
TTGGTTCTAC
GATCCTTGCG
ATAGCTCTTG
AGCTAAGGGT
GTAGCACTCA
TCTCCAACGA
ACCAACAAGG
ATACCGGATC
CGATGATAGA
GTACCAACCC
ACACCCAGAG
ATCACATTTT
AGGACCATAT
CTGAGTTTCT
TCAATTGGGC
TGTTGTCATC
TAAGCCACCC
GTCCTTCACT
ATATGACCTA
GTGAAAGCGA
GTGTTCTGGC
CGGTAGAGA
CCTTGAGGTC
ATGTGTCATC
CCAACTGGAT
CACTGATGAG
ATCTGCTGTT
GAACGAAGCC
GATCACTCCC
AGTGAAATAC
CAATCTCTCA
AATGCTTCTA
ATCTAACACG
TCATCCCAGG
ATTGATATAT
CCATAGGAGG
AGCTAAGTCC
GAATGAAATT
GCTCATAGAG
ATTTGATGTA
GTTCCTTTCT
AAAGATCTAC
TGATGCTCAA
CCTCGACCTG
CGAGGATGTA
AGATTTGTAC
ATGTGCAGTT
CCTGATGTAC
TGCGAGTGTG
GATATTGACA
AGAACAGACA
AGAATAGCAA
TGGTTGTTGG
ATCTCAACTT
TCAGGTACAT
TTTGTCATAT
GGGTTGGGTG
GTATTACATC
ATACCCAGCT
GATAATGCAC
CACCTTGTGG
ACAGCACTAT
TCAGCTCTCA
CCAAGATTAT
CATTATCATA
AGAATGAGCA
AAGAAATTCT
AACTTGCACA
TTGTTGAATG
GTACCGGACA
TGTCAACCAG
CTAACCGACC
12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 ATATCAAGGC AGAGGCTAGG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA
TTGTAGACCA
GATTGAGAGT
CAAAGATCGG
ATGATGTTGC
GGGGCAATCT
CTTGCTACAA
ACGGCTTGTT
AACTAAACAA
AATTAGCACC
TTGTCAAAGT
TCAATTTCAT
AGACCTTGCC
TGGCTCTGCT
GGGATTTTGT
TATACCCTAG
AGGCTAACCG
GGACTTCACC
CAATTGTGGG
CTATAGAGCA
AATTGATCCA
TCTACAGGGA
CTTACCCCGT
TTTGGGGGCA
ATCTCAAGTC
CCAAGTCAGA
TAACAGTCAA
TTACTCATGC
TGATCCAGGA
CAGCAACAAC
AAAATTGCTC
CGCCAATTAT
AGCTGTTGAG
CTTGGGTGAG
GTGCTTCTAT
CTATCCCTCC
GCTCTTTAAC
AGTTAGTAAT
TAACAAAGAT
CCTGGGCAAA
TCAGGGATTT
ATACAGCAAC
GCTAATGAAT
TGGACTTATA
AGACGCAGTT
GGTGCTGATC
CCATGATGTT
GTTGGCAAGA
ATTGGTAAGT
CATTCTTCTT
CGGCTATCTG
GAAACAGATT
GGAGACCAAA
TCTCTGACTT
TTCATTTTCG
ATCTCAAATA
AAAGATATCA
GAAATCCATG
ATATCAACAT
GGATCGGGTT
AATAGTGGGG
GAAGTTGGCC
GGGAGGCCCG
ATCCCTACCT
ACTATAGAGA
ATAGGATCAA
ATAAGTTATG
TTCATATCTA
CCTGAAAAGA
GGTCACATCC
AGTAGAGGTG
AATTGCGGGT
GCCTCAGGGC
TTCAAAGACA
AGCAGGCAAC
TACTCCGGGA
ATACTAGACT
ATTATGACGG
GAATGGTATA
ATCTCCGGCG
ACGCCCTCGC
TGAGCATCAA
ACACAAGCA
CTTTCCGCAG
TAATTAGGAG
CTATGTTGAT
TTTCCGCCAA
TTGTCGAACA
A.AGTCACGTG
CTAGTGTGGG
AGCTAGAGGA
TACTGGTGAT
TAGGGTCTTA
CTGAATCTTA
TTAAGCAGCA
TATCCATTA.A
ATATCAATCC
TGGCAATTAA
AAGATGGATT
ACCGAAGAAG
GAGAACTTAT
ACAGAALAGTT
TACACCAGAA
GGGGTTTGAA
AGTTAGTCGG
AGGATCGATC
TGAGGTAAAT
GGCTTTCAGA
GCACAATCTT
AATCGGGTTG
ATGCCTTGAG
CACTTATAAG
TTCTAGATCT
CAGAATGGGA
GGTAGGCAGT
GTTTATCCAT
ATTGGCAGCC
TAAGCTTATG
TTATAGAGAA
TTTGGTTATG
GATAATTGAA
GCAACTAAGC
TACTCTGAAA
CGGACCTAAG
GCTTAATTCT
TCAACAAGGG
ATCTAGGATC
GATAAATAAG
TATCTTCGTT
ACGTGAGTGG
ATACAGTGCC
AAACAGATAA
GTCAGTCAGC
CCCCCACACG
CCCATTTCAG
AACTCATCTG
CCAGGGGAGG
GAGATACTTA
GGTCAAAGGG
GTAGGTAATA
GTAGATTGCT
TCAGATATAG
ATCTTATCGA
CCTTTCAGCG
GTGAACCTTG
ACAGATCTCA
TCATCTGTGA
TGCATACAAG
AAACTTACAC
CTGTGCAAAG
ATACTCATCC
ATGTTCCACG
ACCCGCAAAT
TTTATCCAGA
AAGAATCTAT
GTTTTTAAGG
CTGATTAAGG
14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 a a 228 ACTAATTGAT TGAACTCCGG AACCCTAATC CTGCCCTAGG TGGTTAGGCA TTATTTGCAA 15840 TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894 INFORMATION FOR SEQ ID NO:16: SEQUENCE CHARACTERISTICS: LENGTH: 2183 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID, NO:16: Met Asp Ser Leu Ser Val Asn Gin Ile Leu Tyr Pro Giu Val His Leu 1 5 3.0 *Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Giu Tyr Ala 25 *p*Arg Val Pro His Ala Tyr Ser Leu Giu Asp Pro Thr Leu Cys Gin Asn 40 Ile Lys His Arg Leu Lys Asn Giy Phe Ser Asn Gin Met Ile Ile Asn 55 Asn Val Giu Vai Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro 70 75 **Aia His Ser His Ile Pro Tyr Pro Asn Cys Asn Gin Asp Leu Phe Asn 90 Ile Glu Asp Lys Giu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys ***100 105 110 Giy Asn Ser Leu Tyr Ser Lys Vai Ser Asp Lys Vai Phe Gin Cys Leu 115 120 125 Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Giu Leu Arg Giu Asp 130 135 140 Ile Lys Giu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gin 145 150 155 160 Trp Phe Giu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Giu Met Arg 165 170 175 229 Ser Pro Leu Phe 225 Thr Arg Gly Ala Leu 305 Phe Ile Phe Asn 385 Arg Ala His Val Val Val 210 Glu Glu Val Asn Tyr 290 Asn Ser Phe Arg Val 370 Met Asp Ala Glu Ile Phe 195 Ala Leu Thr Arg Pro 275 Leu His Asp Ile Ser 355 Arg Lys Arg Asp Gin 435 Lys 180 Phe Ile Val Ala Tyr 260 Thr Gin Cys Glu Thr 340 Phe Lys Gly His Thr 420 Cys Ser Thr Ile Leu Met 245 Met Tyr Leu Phe Gly 325 Asp Gly Tyr His Gly 405 Ile Val Gin Gly Ser Met 230 Thr Trp Gin Arg Thr 310 Thr Asp His Met Ala 390 Gly Arg Asp Thr Ser Lys 215 Tyr Ile Lys Ile Asp 295 Glu Tyr Ile Pro Asn 375 Ile Ser Asn Aen His Ser 200 Glu Cys Asp Leu Val 280 Ile Ile His His Arg 360 Gin Phe Trp Ala Trp 440 Thr 185 Val Ser Asp Ala Ile 265 Ala Thr His Glu Leu 345 Leu Pro Cys Pro Gin 425 Lys Cys Glu Gin Val Arg 250 Asp Met Val Asp Leu 330 Thr Glu Lys Gly Pro 410 Ala Ser His Leu His Ile 235 Tyr Gly Leu Glu Val 315 Ile Gly Ala Val Ile 395 Leu Ser Phe Arg Leu Vai 220 Glu Thr Phe Glu Leu 300 Leu Glu Glu Vai Ile 380 Ile Thr Gly Ala Arg Ile 205 Tyr Gly Glu Phe Pro 285 Arg Asp Ala Ile Thr 365 Val Ile Leu Glu Gly 445 Arg 190 Ser Tyr Arg Leu Pro 270 Leu Gly Gin Leu 
Phe 350 Ala Tyr Asn Pro Gly 430 Val His Arg Leu Leu Leu 255 Ala Ser Ala Asn Asp 335 Ser Ala Glu Gly Leu 415 Leu Lys Thr Asp Thr Met 240 Gly Leu Leu Phe Gly 320 Tyr Phe Glu Thr Tyr 400 His Thr Phe Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu 230 450 455 460 Lys Asp Lys Ala Leu Ala Ala Leu Gin Arg Giu Trp Asp Ser Val Tyr 465 470 475 480 Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg 485 490 495 Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp 500 505 510 Val Ile Met Tyr Val Vai Ser Giy Ala Tyr Leu His Asp Pro Glu Phe 515 520 525 Asn Leu Ser Tyr Ser Leu Lys Glu Lys Giu Ile Lys Giu Thr Gly Arg 530 535 540 Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gin Val Ile Ala 545 550 555 560 *Glu Asn Leu Ile Ser Asn Giy Ile Giy Lys Tyr Phe Lys Asp Asn Giy 565 570 575 Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala 580 585 590 ***Val Ser Giy Vai Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro 595 600 605 0 0 0..:Val Leu Lye Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn 610 615 620 Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gin Val Ile Arg Gin *625 630 635 640 Asp Gin Asp Thr Asp His Pro Giu Asn Met Giu Ala Tyr Glu Thr Val 645 650 655 .Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg a..660 665 670 Tyr Giu Thr Ile Ser Leu Phe Ala Gin Arg Leu Asn Giu Ile Tyr Gly 675 680 685 Leu Pro Ser Phe Phe Gin Trp Leu His Lys Arg Leu Giu Thr Ser Val 690 695 700 Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile 705 710 715 720 Pro Leu Tyr Lys Val Pro Asn Asp Gin Ile Phe Ile Lys Tyr Pro Met 725 730 735 231 Gly Gly Ile Giu Gly Tyr Cys Gin Lys Leu Trp, Thr Ile Ser Thr Ile 740 745 750 9 .9 9 9.
9* 9 9 9***99 9 9 999w 9*9*99 9 999*99 9* 9 *999 9 *999 Pro Tyr Leu Vai 770 Ser Thr 785 Arg Asp His Leu Ser Lys Ser Ile 850 Arg Ala 865 Arg Giy Ile Gin Thr Arg Arg Met 930 Met Ser 945 Ile -Aia Thr Leu Leu 755 Gin Trp Tyr Lys Giy 835 Aia Ala Tyr Gin Asp 91i5 Ala Arg Asp His Tyr Giy Pro Phe Ala 820 Ile Arg Cys Asp Ile 900 Vai Leu Leu Leu Gin 980 Leu Asp Tyr Vai 805 Asn Tyr Cys Ser Arg 885 Leu Vai Leu Phe Lys 965 Val Al a Asn Asn 790 Ile Giu Tyr Vai Asn 870 Tyr Ile Ile Pro Vai 950 Arg Met Ala Gin 775 Leu Leu Thr Asp Phe 855 Ile Leu Ser Pro Aia 935 Arg Met Thr Tyr 760 Thr Lys Arg Ile Gly 840 Trp Ala Aia Leu Leu 920 Pro Asn Ile Gin Giu Ile Lys Gin Val1 825 Leu Ser Thr Tyr Giy 905 Leu Ile Ile Leu Gin 985 Ser Ala Arg Arg 810 Ser Leu Glu Thr Ser 890 Phe Thr Gly Gly Al a 970 Pro Gly Val Giu 795 Leu Ser Val1 Thr Met 875 Leu Thr Asn Giy Asp 955 Ser Gly Val Thr 780 Al a His His Ser Ile 860 Ala Aen Ile Aen Met 940 Pro Leu Asp Arg 765 Lys Ala Asp Phe Gin 845 Val Lys Val Asn Asp 925 Asn Val Met Ser Ile Arg Arg Ile Phe 830 Ser Asp Ser Leu Ser 910 Leu Tyr Thr Pro Ser 990 Ala Ser Val Pro Val Thr 800 Gly His 815 Val Tyr Leu Lys Giu Thr Ile Giu 880 Lys Val 895 Thr Met Leu Ile Leu Asn Ser Ser 960 Giu Glu 975 Phe Leu Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gin Ser 995 1000 1005 232 Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His 1010 1015 1020 Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu 1025 1030 1035 1040 Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val 1045 1050 1055 Pro Arg Ala Ala His Giu Ile Leu Asp His Ser Val Thr Gly Ala Arg 1060 1065 1070 Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala 1075 1080 1085 Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser 1090 1095 1100 Asn Tyr Asp Tyr Glu Gin Phe Arg Ala Gly Met Val Leu Leu Thr Gly 1105 1110 1115 1120 Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gin Leu *0t1125 1130 1135 SAla Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg 
Gly Arg *1140 1145 1150 Pro Ile Tyr Gly Leu Giu Val Pro Asp Val Leu Giu Ser Met Arg Gly 1155 1160 1165 0**His Leu Ile Arg Arg His Giu Thr Cys Val Ile Cys Glu Cys Gly Ser 1170 1175 1180 Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gin Leu Asp Asp **1185 1190 1195 1200 Soso*: Ile Asp Lye Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr 1205 1210 1215 0 a 0 '0064,Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser 0001220 1225 1230 Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala 1235 1240 1245 Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg 1250 1255 1260 Gin Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile 1265 1270 1275 1280 Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gin 233 1285 1290 1295 Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr 1300 1305 1310 Ile Ser Asn Asp Asn Leu Her Phe Val Ile Ser Asp Lys Lys Val Asp 1315 1320 1325 Thr Asn Phe Ile Tyr Gin Gin Gly Met Leu Leu Gly Leu Gly Val Leu 1330 1335 1340 Giu Thr Leu Phe Arg Leu Giu Lys Asp Thr Gly Her Her Asn Thr Val 1345 1350 1355 1360 Leu His Leu His Val Giu Thr Asp Cys Cys Val Ile Pro Met Ile Asp 1365 1370 1375 His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Giu Leu 1380 1385 1390 *Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp 1395 1400 1405 .seThr Thr Arg Leu Tyr Thr Gin Her His Arg Arg His Leu Val Giu Phe 1 410 1415 1420 *see 0. 
0 w.Val Thr Trp Her Thr Pro Gin Leu Tyr His Ile Leu Ala Lys Ser Thr i1425 1430 1435 1440 0Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met 1445 1450 1455 Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Her Phe Ile 1460 1465 1470 CThr Giu Phe Leu Leu Ile Giu Pro Arg Leu Phe Thr Ile Tyr Leu Gly 1475 1480 1485 .,*Gin Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro 1490 1495 1500 Her Gly Lys Tyr Gin Met Gly Giu Leu Leu Her Her Phe Leu Ser Arg 1505 1510 1515 1520 Met Her Lye Gly Val Phe Lys Val Leu Val Asn Ala Leu Her His Pro 1525 1530 1535 Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Giu Pro Ile His 1540 1545 1550 Gly Pro Her Leu Asp Ala Gin Asn Leu His Thr Thr Val Cys Asn Met 1555 1560 1565 234 Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Giu Giu 1570 1575 1580 Leu Giu Giu Phe Thr Phe Leu Leu Cys Giu Ser ASP Giu Asp Val Val 1585 1590 1595 1600 P ro Asp Arg Phe Asp Asn Ile Gin Ala Lys His Leu Cys Val Leu Ala 1605 1610 1615 Asp Leu Tyr Cys Gin Pro Gly Ala Cys Pro Pro Ile Arg Gly Leu Arg 1620 1625 1630 Pro Val Giu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Giu Ala 1635 1640 1645 Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val 1650 1655 1660 Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys 1665 1670 1675 1680 *Gin Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala *1685 1690 1695 .00 0 Giu Val Asn Val Ser Gin Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn o 1700 1705 1710 Met Ser Ile Lys Ala Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu .01715 1720 1725 0 00.. Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly 1730 1735 1740 o Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn *.1745 1750 1755 1760 Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg 1765 1770 1775 000.
Cys Leu Giu Pro Gly Giu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly 1780 1785 1790 Ser Met Leu Ile Thr Tyr Lys Giu Ile Leu Lye Leu Asn Lys Cys Phe 1795 1800 1805 Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gin Arg Giu Leu 1810 1815 1820 Ala Pro Tyr Pro Ser Giu Val Gly Leu Val Glu His Arg Met Gly Val 1825 1830 1835 1840 235 Gly Asn Ile Val Lys Vai Leu Phe Asn Gly Arg Pro Glu Val Thr Trp 1845 1850 1855 Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr 1860 1865 1870 Ser Ser Val Gly Phe Ile His Ser Asp Ile Giu Thr Leu Pro Asn Lys 1875 1880 1885 Asp Thr Ile Giu Lys Leu Giu Giu Leu Ala Ala Ile Leu Ser Met Ala 1890 1895 1900 Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro 1905 1910 1915 1920 Phe Ser Gly Asp Phe Val Gin Gly Phe Ile Ser Tyr Vai Gly Ser Tyr 1925 1930 1935 Tyr Arg Giu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser 1940 1945 1950 **.Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met 1955 1960 1965 Asn Pro Giu Lys Ile Lys Gin Gin Ile Ile Glu Ser Ser Val Arg Thr .1970 1975 1980 Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gin Leu Ser Cys 1985 1990 1995 2000 Ile Gin Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro 2005 2010 2015 Thr Leu Lys Lys Leu Thr Pro Ile Giu Gin Val Leu Ile Asn Cys Gly 2020 2025 2030 Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Giu Leu Ile His His Asp **2035 2040 2045 .Val Ala Ser Gly Gin Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr 2050 2055 2060 Arg Giu Leu Ala Arg Phe Lys Asp Asn Arg Arg Ser Gin Gin Gly Met 2065 2070 2075 2080 Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gin Arg Giu Leu Ile 2085 2090 2095 Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly 2100 2105 2110 Asn Arg Lys Leu Ile Asn Lye Phe Ile Gin Asn Leu Lys Ser Gly Tyr 236 2115 2120 2125 Leu Ile Leu Asp Leu His Gin Aen Ile Phe Val Lys Asn Leu Ser Lys 2130 2135 2140 Ser Glu Lys Gin Ile Ile Met Thr Gly Gly Leu Lys Arg Giu Trp Val 2145 2150 2155 2160 Phe Lys Val Thr Vai Lys Glu Thr Lys Giu Trp Tyr Lys Leu Vai Gly 
2165 2170 2175 Tyr Ser Ala Leu Ile Lye Asp 2180 INFORMATION FOR SEQ ID NO:i7: SEQUENCE CHARACTERISTICS: LENGTH: 15462 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17: ACCAAACAAG AGAAGAAACT TGTCTGGGAA TATAAATTTA ACTTTAAATT AACTTAGGAT a. a.
a a a.
a a a
TAAAGACATT
TATTTGATAC
TCATTCCTGG
ATAATGAGAA
AACATGCACA
AGCTCTACCT
AGAAAGATCT
ATGAAAAGAC
TGCAGAACGG
CATGTTTAGG
GACTAGAAGG
ATTTAATGCA
ACAGAAAAAT
AATGACATTA
AAGGGCAGGG
AACAACAAAT
AAAACGGCAA
AACTGATTGG
CAGGAACAAT
AGCTCTTATA
TCAAGAAAAG
CGTAGGCAAG
ACTGTCTCTA
GCTCTTCTAT
TTCTTGGTGT
GGAAGTAATG
AAGTATGGAG
ATATTTGGAA
TCAACAATTG
ATACAGATCT
GGAACTCTAT
AAAACATAAC
TATTCGCCCT
TTCTATCTCA
CTTTATTGTC
CAGATGTCAA
GATTTGTGGT
GTGACCTGGA
AAGACCTTGT
GGATAGTTCT
AATTTCAAAA
AAAATCAGCC
TGGACCGACA
TTCACTAGAT
AATGGCTTAT
GTATGTCATA
TAAGACGAGA
TTATGATCAG
CCACACATTT
GGTCAAAGCT
ATGTTGAGCC
GGTGGAGCTA
ATAACTGATG
AATGAGAAAC
GCCAATCCAG
TACATGATTG
GAGATGATAT
GAAACTATGT
GGGTATCCAT
ATCACTAGTA
120 180 240 300 360 420 480 540 600 660 TCTCAGGGTT AAGAAAAGGC TTTTTCACCC GATTGGAAGC TTTCAGACAA GATGGAACAG 237
S.
S
S*
4* 5 S S
S.
S
TGCAGGCAGG
CTCAACAGAG
ATGACCTCAC
GTCTCGCTTC
CTCTATCCAC
CAAAGGGACC
CACCAGGCAA
GAGCCATGCA
GACALAGCAGT
GAGTGACACA
AGACATCTTT
CAGAACAATT
TTCAATATGC
CTGACAATAT
ACAAGAAGAA
AAATAGATGA
ATCAATAATA
GGTAALATTTA
AAACTATCAA
CTCGGCCCTC
AAACGACACA
CAAACCAACA
GTCATCACAC
GAGAGGACCT
AATCCCCAGA
CAATGAAATT
GCTGGTATTG
CTTGGTAACT
AACCATAGAA
ATTCTTCA.AT
TCTCAGACCA
ACGCGCTCCT
CTATCCTGCC
ACAGTATGTG
AGCACGTGAT
CGAATCTAAA
CCACAAACCG
CGAACATAGA
CTGGGCAGA.A
CAAGACCGAA
ACAAAGCAGT
TCTGTTTA.AC
AATAPLGAAAA
GAGTCTGCTT
ATCATGGATT
AACATCATTG
ATCAACACAA
GAAACAAGTG
GAATGTACAA
GGGAGAAGAA
AGCATCACAG
AGAAAGATGG
AGCGGTGACA
CTTATGGTTG
A.AGAATATAC
ACAATCAGAT
GATATCAATA
TTCATCTGTA
ATATGGAGCT
ACGGGA.AGAT
GCCGAAGCTC
GAA.AGCTTGA
ACAGGTGGAT
GCAGATCAAG
GGAAATAGAA
CAP.CAAAACA
CAACCACCCA
GCATTTGGAA
ACTTAGGATT
GAAACTCAAT
CTTGGGAAGA
AATTCATACT
GAACCCAGCA
AGAAAGATAG
CAGAAGCAAA
GCAGCTCAGA
ATTCTAAAAA
ATAAGGACTC
CAGTGGATCA
AAACATTAAT
AAATTGTTGG
ATGGAATTGA
GATTAAAAGC
TCCTCAGAGA
ATGCAATGGG
CATATCTAGA
AAATGAGCTC
AGAGACATAT
CAGCCATAGA
AACAAAATGG
GCGATGATCA
TCAGAGACAG
CTAATCCCAC
GCAACTAATC
AAAGAATCCT
CAATAGAGAG
GGAATCAAGA
CAGCACCGAC
ACTCAGTGCC
TGGATCAACT
AGATAGAAAT
TAGTAGAGCT
TGGAACCCA
TATTGAGGGG
GATTGGGTCA
AACAATGAAT
CAACTACATA
GACCAGAATG
TTTGATGGAA
TCCTATACAT
GGTGGCAGTT
CATTGATATG
AACACTGGA.A
AAGGAACATA
GATGGCAATA
AGAACCTCAA
GACTGAGCAA
ACTAAACAAG
AAACAGAACA
GAATCAACAT
ATCATACCGG
TTGATGGAAA
GATAAATCAA
CCCCAAGAAG
ACCATCTGTC
GACAAAAATA
ATTGATCAGG
GAGACTGTGG
AACACGGAGG
AAAATGCGAC
ATCATGCGGT
ACCAGCAGAA
AGAGATGCAG
GCAGCTTTGA
CTGTATTTAT
GGTGAGTTCG
GTACAAAATA
TTCCAGCTAG
GATGAACTTG
AACAGTTCAG
GATGAAGAGC
TCATCCATAA
GCTACAGAAT
AGACTCAACG
AACCAGGACG
TTTAATCTAA
AATATAGGGT
GCGATGCTAA
CTAATATCTC
ACTTATCGGA
AACCAGAAAT
GACAGTCCGG
AAACTGTACA
TCTCTGGAGG
ATATTGATCT
AATCTGCAAA
780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280
S.
S S *50S
S
5555 238
S.
S*
S.
S
S
5..
S
TGTTCCAAGC
TGATCATGGA
TACTGCTGCA
AAGTTCTTCA
CTGGTTTALAG
ATCAAKAGGG
AACAGAAATA
CAACAACACC
AACTTATACA
AAATGGAAAG
TCTATTGCAG
ACGAGTTGTA
CCTGGCAGGA
AAATGAA.ATG
GATAGAAAAT
TATGACTGAG
CAAAACAAAA
ACAAGGCATT
CGATGTACAA
AATACCCAAA
TCTCTCACp.A
AGAAGTATCT
CAAAGAAACG
ACACAA.AATC
AGGATTAAAG
CATTCCCAGA
GAGATATCAG
AGAAGCCTGG
ACACCAGATG
ACACATCAAG
AAATCAAAAG
CAGAAGAAAA
CAGACAGAAT
GACCGGAACG
AAAGAATCGA
GAAAGGAAGG
AATCTTGGTG
TGTGTAGCAA
TTAGTCATAG
CTAAACCTCA
CAAAGAGAAC
AGAGGAGGAA
TTGAAAGAAG
GACAAGAATA
GTTAAATCAG
AAAGTGAGCA
AGCACAAAAC
GAATTAATGG
ACACCGAACA
AAGCAGAATG
AATAAATTAA
ATCATCATTC
GAAGTGATGA
AATCTATCAG
ATGAAGAAGA
AAGATGACAA
ATACCGACAA
TCTCAAAGAC
CATCAGAAAC
AACAGACAAG
TCCGAACAAA
ATACAGAAGA
TAATTCALATC
ATGTACTAAA
GGGTTTCAAT
AAGCAGATCT
AACTGTCATT
AGAAAGACCA
AAAAGATCAA
TACCCGATCT
AGATATTAAG
GTACAATGAG
AATCATACAT
ACATGTTCAA
AACAGACAAG
AAACAACAGA
TCCTTGTCCA
TCTGAAAATG
CATATTTACA
TACACCTGAT
AATACTAATG
AAGAATTAAA
CCAGATACCA
AACAACCACC
ACAATCCTCA
CACAACTCCT
CTCTGAATCC
GAGCAATCGA
CACATCAAAA
CAATGTAGAT
GGACAACGAC
AAAGAAAATG
GATCACGTCA
AAATGAATCC
GAAGACCAGG
ATATCGACAT
TTCATACAAT
ATCACTAGTT
AAACGAACTC
TGAAGATGTC
AAACAACAGT
TATCAATCAA
AAATGAGTAT
GTCATATAGA
ACAGAACAAA
ACAAGATCAA
AA.AAATAGTA
AAAGGGGGAA
ACATCAGACT
AACACCGACA
TCATGGAATC
CCAACAACAA
AA.ACCCAAGA
TTTACAGAGA
CTAGATTTAT
ACTGCATCAA
ACAAAATTA
GACGAATCAC
CTAATTTCAA
AATGAGAGAG
TTTGACCCAC
GCAGGAGATA
GAGTCAAATG
GCAGTCATCA
AALACGTTGCA
AACAATTGCC
AGATCAAAAC
TATACAAATA
AACTAACTCT
ACCATTACCA
GTAGAAACAG
TAAGTGTTGT
GGACAAAGAA
AAGGGAAAGA
ACAGATCCAC
CAILAGGGGCA
TCATCATCGA
CTTCCAGATC
CACAAAAGAC
GGGCAATTAC
ATCAAGACAA
AGATAGATTT
CACAGATACA
ATAGAAGATT
ATCTCAAAAT
TATCCATGAT
TTATGGAGGC
CACTAGAGAA
CAACAAGACT
ACAACAGCAA
AAAATGATGA
AATGATCCAA
CTGTCAACAC
AGAAAAACTT
GCAATATACA
CTCAAAGTCA
2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 239
C
CC..
C C *C C
C
C
ATGAACAGAG
ACGGATCCCG
ACAAATACGG
GATCATTACC
CAACCAAACT
CGGTACAAAA
TGTTCGATGC
AATTTAGAGT
CTAAGTCAAT
TAAAAACAGG
AAAAATCACT
ACTCTGTTGA
TAGTTGGAGG
GTCAGCTGGT
ATCTAGTTAT
CTTTACCTGG
AACAATGGAA
AAGGATAATC
TCGCAAGAAT
ACAGAACACC
GAGACCGGCA
AACCATGATC
ATTGGTCAAC
TTTGAGCCTC
ATACAAGAAG
AGATGTGATA
GAAAGCAGTA
GTATTTAGAT
GAGTGTGAAT
AATCGGATTG
GGATATAGAA
TATAAAACCA
CAACAAAGTT
AATCTTCGTG
GGCATCACTA
GGTTCAGACT
GAATTTCATG
ATACTGTAAA
AATCAGTCTT
ATTCAAAAGA
CTGGGCTTCA
CGAGTTCAGA
CTAGTAATCT
AAAAACTTAG
AAGAGAGAAG
AGAACAACAA
ACACAACAAG
ATGGCATCTT
AGTCCCAAAG
ATACCAAAAA
TTATTGGATA
GTAACCAATC
CCCCACATTA
GTCTTCTTAC
GATCTCGACA
GCTAAGTACA
GTGAGAAGAA
GAACTGTACC
GCTCTTGCTC
AATTGTACGG
TCTCTACCCA
GATTCTAALAG
GTCCATCTCG
CAGAAAATCG
CATGTCAATG
GAGATTTGTT
TCAGTAGAGA
TACTATCCTA
CTATTTTAGT
GACAAAAGAG
GGACCAAAAA
AATCAAAACA
CACTGAALCAC
TCTGCCAAAT
GGATGAAGAT
TAGAAGACTC
GACTGATCAT
AAGAATCCAA
GAGTTGCA
TCGGCTTCTT
GTGACCCGAG
CTGGGAATGA
CAGTCAAAGC
CATGGTCCAA
CTCAATGTCT
CAATTGGATC
ACACAATATC
GGATAGTTCA
GATTGATCAA
AGAAAATGAG
CAACTGGGTC
ATCCTTTAAT
TTACAAGAGT
ATATTATTGC
CCGGACGTAT
GTCAATACCA
AGTCAAATAG
TCCAACTCAC
AATGCCA.ACT
AGATATCACA
ATCACAAAAC
TAACTCTTGT
CCCTTTA.TAT
TGAAAACACT
GATCGGAAAT
CGAGATGGAA
TTACAAAGTT
CCAGGAATTG
GAAAGAGATG
TAGACTAAGA
TCCACTAGAT
AATAACCTTG
AATCAATCTG
AATTTTGGAT
AAGAAAAGTA
ATTGATATTT
CATATCAAAA
GGATCTAAAT
GGATGCAATT
AAAAGGAGTT
CTATTAAGCC
ACAACTATTA
GAGAAATCAA
TCAAAACAAA
TCAATACTGC
AAACTACAGC
TTTGAAACAA
GGTGACCAAC
GATGGATTAA
GATCCCAGAA
CCACCAAAAC
CGAATCAAAG
TGTGGCTCTG
TTACAAGCCG
GTTGTTTACA
AAAGGAATGC
AGGAGCATAA
TTCAAAATTC
CAGGTACACA
GAGAAAGGCG
GGCAGAATGT
TCTTTAGGAC
ACACTAGCAA
CCGCATCTCA
TTCCAACCTT
GGGAAAATCA
GAAGCAAATA
GCAGTCACAC
AACAAAAGGT
AATTCCAAAA
TAATTATTAC
ACGTAGGTGT
GATATCTAAT
AGATCAAGCA
GATTACAGAA
CAA.AACGATT
3900 3960 4020 4080 41.40 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 240 CTTTGGAGGG GTAATTGGAA CCATTGCTCT GGGAGTAGCA ACCTCAGCAC AAATTACAGC
C.
a a
C
C a..
a
GGCAGTTGCT
AATTAGGGAC
AGCAATTAAA
AGGTTGTGAA
AACAAACATA
TATAGCATCA
ATATGATATC
CTTGAATGAT
CACTCAGATC
CCCTCTTCCC
ATGTATAGAA
TGAAATAGAG
AGACATTGTT
CACCTGTACA
AATTATAACA
TAAAGAAGGA
ACTTGATCCA
AAAAGAATGG
TAGCACTACA
GATAATTACA
TGACAAGCCA
TATAAAAAAC
CCCAATAGAC
TGCTGGTAAT
AATATACATA
CTGGTTGAAG
ACAILACAAAG
TCAGTCCAGG
GCAGCAGGAC
TTTGGTGATA
TTATACCGCA
TATGATCTGT
TACTCAATCA
TACAAAGTAG
AGCCATATCA
GCATTCAGCA
AGCTGCTTAT
CCAAGATATG
TGCAACGGAA
CATAAAGAAT
ACTCTTGCAT
ATTGACATAT
ATAAGAAGGT
ATCATAATTA
ATTGCAATTA
TATGTACTAA
TTAGGAGTAA
AAATCCAAAT
GAGCTGGAGA
TTATGGACAA
CCAAGCAGGC
CAGTGCAGTC
ATTATGTTAA
TTCA.ATTAGG
ACATAGGATC
CAAATATCAC
TATTTACAGA
CCCTCCAAGT
ATTCCATATC
TGACGAALAGG
GCTATATATG
CAGGAAACAT
CATTTGTCAA
TTGGTAATAG
GTAGTACAAT
TCTATACACC
CAATCGAGCT
CAAATCAAAA
TTTTGATAAT
AGTATTACAG
CAAACAAATA
AGTTACGCAA
TCGAGATGGA
CGTCTATGGC
TAATCCTGGT
AAGATCAGAC
AGTTCAGAGC
CAAAGAAATC
AATTGCATTA
GTTACAAGAA
AGAAATATTC
ATCAATAAAG
CAGACTCCCT
ATATAACATC
GGCATTTCTA
CCCTTCTGAT
ATCCCAATGT
TGGAGGAGTG
AATCAATCAA
AGGTATCAAC
AAATGATATA
CAACAAGGCC
ACTAGATTCT
GATCATTATA
AATTCAAAAG
ACATATCTAC
TCCAACTCTA
ATACTGGAAG
TACTCATGGC
GTTATTATCA
AT CGAAAAA C
TCCATAGGAA
GTGCCATCGA
ACACAGCATT
AAAGGAATAA
ACAACATCAA
GTGAGAGTTA
TTATTAACTA
CAAAACAGAG
GGTGGAGCAG
CCAGGATTTG
CCAAGAACAA
GTTGCAAACT
CCACCTGATC
GGAATGCTGT
ACACTAAACA
AAATCAGATC
ATTGGAAATT
TTGTTTATAA
AGAAATCGAG
AGATCATTAG
CTCATATAAT
CATACCAATC
AACAAGCTCA
ATAGTCTTCA
TCAAAGAAGC
ATTTAATAGT
TTGCGAGGCT
ACTCAGALATT
AATTACAAGG
CAGTTGATAA
TAGATGTTGA
GGCTGCTGAA
AATGGTATAT
ACGTCAA.AGA
TATTAAACCA
CGGTCACATC
GTATA.ACA.AC
AAGGAGTAAA
TCAATACAAA
ATTCTGTTGC
TAGAAGAATC
GGCATCAATC
TTAATATAAC
TGGATCAA;A
ATATTAWAAT
TGAGGAAGGA
ACGGAAAGGA
CTAATAAGAT
TCATAGTGCT
5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 241 A.ATTAATTCC ATCAAAAGTG AAAAGGCCCA CGAATCATTG CTGCAAGACA TAAATAATGA a a a a a a a,
GTTTATGGAA
GTCAGGAGTG
ATCATTGACA
TGATAATCAA
TCCAGATGAT
AAGGTTAATG
AACTCCGTCT
AGGTTGTCAG
CTCAGACTTG
TAGGAAGTCA
CAAAGTTGAT
TGTCAATTAT
TCAACCATAT
AATATTTCTC
AACTGGGTGC
TTCAGATAGG
AAAATTGAAA
ACTTCTACTA
ACAATTAGGA
TAATGTGCTA
ATGTATAACA
ATCTGTCATA
AACCGAAAGA
AACAAGCTGC
TAAAAGCTTA
ATTACAGAAA
AATACAAGGC
CAACAGATGT
GAAGTGCTGC
TTTTGGAGAT
CCAGGGCCGG
TTAGTTATA.A
GATATAGGAA
GTACCTGACT
TGTTCTCTAG
GAAAGATCAG
GATGGTTCAA
GCTGCACTAT
GGGTATGGAG
CCCGGGAAAA
AGGATGGTCA
GTATGGACGA
GGTAACAAGA
ATAATTGATA
TCAAGACCAG
GGAGTATATA
TTAGACTCAC
GTAAACGAGC
ATTACACACT
AACACATTTC
AGATCCAAAT
TTCTTACALAT
CAGATCTTAG
CACAAAGAAT
GCACGTCTGG
GATTATTAGC
ATGATCTGAT
AATCATATCA
TAAATCCTAG
CACTCCTAAA
ATTATGCATC
TCTCAACAAC
ACCCATCTGT
GTCTTGAACA
CACAGAGAGA
ACTCCATCAT
TATCTATGCG
TCTATATATA
TTACTGATTA
GAAACAATGA
CTGATGCATA
AAAA.ATCGAG
TGGCCATCCT
ATAACAAAGG
AACCCATGTT
GGCATCGGAT
TCAGAGTCAT
GAAATTCATT
AACACATGAT
TCTTCCATCT
TATGCCAACG
TTATGCTTAT
AGTCTTACAG
GATCTCTCAT
TACAGATGTA
ATCAGGCATA
AAGATTTAAG
TGGACCAGGG
TCCAATAAAT
CTGTAATCAA
TGTTGTTGAC
ACAAAATTAC
TACAAGATCT
CAGTGATATA
ATGTCCATGG
TCCACTCAAT
AGTGAACCCA
AAACAGAACA
ATATTGTTTT
GTTCAAAACA
AATACCAATG
GTCCAGAATT
AGTGmATTA
GTAGGTATAA
TTAATGAAAA
ACTGTTGATG
ACCTCAA.ATC
ATAGGGATAA
ACCTTTAACA
TATCAACTGT
GAAGATATTG
AATAATAACA
ATATACTACA
GAGAATGTAA
GCGTCTCATA
AAAGGCTTAA
TGGGGGTCAG
ACAAGTTGGC
AGGATAAAAT
GGACATTCAT
CCCACAGGGA
GTCATAACTT
CTCTCAGCTG
CATATAGTAG
GAGATTCCAA
ATCTAATACA
ACATACCAAT
CAATTAGAAA
AACCTTTAAA
CTCCAAAAAT
GCTGTGTTAG
TAATTACTCG
TAACTGTAA-A
TAAATGACAA
GTTCAACTCC
TACTTGATAT
TAAGCTTTGA
AAGGCAAAAT
TCTGCA.ACAC
GTCCATGGTT
ACTCAATTCC
AAGGAAGGTT
ATAGCAAGTT
GGACATGGCA
GTCCAGATGG
GCATTGTGTC
ACTCAACAGC
GATATACAAC
AAATAAATCA
AAAGCTGCAG
7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 '8280 8340 8400 8460 8520 242
C
C
C
TTAATCATAA
ATCAGCAATC
GGGAAATGGA
ACCTTAACTC
CTCAGCCTTA
TTAATAAATT
AAGTGAATGA
TCAAATTATA
GAACATATAG
TAGCCTCAAA
AAGTTCACAC
TCAAGTATGA
AGGATTATALA
TGATATTAGA
ATTGTGACGT
TACAATCTAT
TTATGGGAGA
TTCAAACTCA
AGATGGAATT
TTGATAAAAT
CTTTTTTTAG
GAAAATATAT
TCTTCTGTAC
TGACATTACC
TATCATATGA
TCATAGAGCC
TTAACCATAA
AGACAATAGA
CACTGAATCT
TCCTATCGTT
TGATATGGAT
GGATAAAAGA
CTTAGGAAAA
TATACCTGGT
TCAAATGACT
AAATGATGGA
AACCTATAAA
TATGAGAAGA
CTTGTTAGA.A
TAAACAAAAC
AGTCGAAGGC
GTATCAGAAA
AAAGACATTT
TGATCCTGTT
AATATTTGAA
TTTAGATATA
AACATTTGGG
GTATATTGGA
AATAATAATT
TGATCATGCA
AAATGCTGTT
TCAGTTAGAT
TATGCATCA-A
CAAAAGGGAA
AACAATGGCA
AAAGGTAAAA
GACGACTCAA
CAACGATCTA
TACACATTTA
ATTAACAGTA
GATGGATTA.A
AGCAATTATG
TCAGATAAAT
TTACAAAAAG
GACCAGAAGA
TATAATGGTT
CGATGGAATA
GGTAATAACC
GATGTGATAT
AAACAACTAA
TCTAGAGAAT
TTTAATAAGT
CATCCTCCAT
AAAJCAATTAA
AACGGATATA
CACGAATTCA
GATTATTACC
GAGGATTTGA
TCTATCTATA
ATATAAAkAAA
CTGTATCTGA
TAGCACAATT
TACTAGTTAT
TTAGAAGATT
TCAGATATCC
AAGTGACTGA
GAGATCTATG
ATCTTAATGA
GGTATAATCC
CTCGAAATGA
ATTTCTTATT
ATCTAATTAC
TAAGTGCATG
TGTGGGAAGT
CGTTATTAGA
GAGGAGCTTT
CGATTAAGGA
CTACAAkTAGA
TAGAAGCTAG
AATTTGACAC
GAGAGAGGCA
TCATAA.ATGC
AGAGCTTTAT
CAATTTATAT
ATACAAGTAT
CTTAGGAGCA
CATACTCTAT
ACACACTATT
CACTAGACAG
AA.AATTAATA
AGAA.ATGTCA
ATTATTACTT
GATTAATGTG
AGAAATTAAT
ATTCAAAACA
GATCACTTTT
GATACATCCA
TCCTGAATTA
TGCTAAGTTA
GATAGATAAA
ACCACTTGCA
TTTAAATCAT
ATTTCTGAGT
TGAAATAGCA
TATTGCAGCA
TATTAATAAA
TGGTGGACAG
TTACGGTTCA
AGGAATAAAA
GAAAGATAAA
ATGATAAGTA
AAGCGTGCTC
CCTGAGTGTC
ATGAGTCTAC
AAAATAAAAC
TTAACTGAAA
AAAGAAATGT
AAAGCAGATA
CTATCAAAAT
AATATATCGA
TGGTTTACTA
AATGTTGGGA
GAATTGGTTT
GTATTGATGT
GATCCAAAAT
TTGTTTCCAA
TTATCCTTA
GTGTTATCCG
GTAGATTACA
GAGATTTTCT
GAAAAGGTTA
TGTCATGCTA
TGGCCTCCTG
AACTCTGCGA
TTCAATAAAT
GCATTATCTC
8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 243
S
*5
CAAAAAAATC
CATCCAACGA
ATCAGATATT
CTTATAGTCT
ACAAAATGAG
TCTTTCAAGA
TATCAATATC
ATGACCTTAA
AGAAATTTGA
TCCTAACAAC
TTGGAGAAAC
GTCTTGAAGG
ATATATCATT
TAGAAGGATT
CTGTTAGAAT
TAACCACAAG
ATGTAGTGAG
AATTAAATGA
ATGGGAGAAT
CAGTAATAGA
TTGAGAATGG
AACTATATAT
AGTATTTTAG
GATTCAATTA
CCGCATTGGC
ATAGGATTAT
AAATTGGGAC
ATCACGAAGA
GGATTATGTA
TAAAGAAAAA
AGCTACACA.A
AAATGGGATG
AGGAGTTCCA
AACCTACAAT
ATTCAAGTCA
AGATCTCAAA
TTGCAACCAA
AAGTACAATC
AGAGGATCAC
TTGTCAAAAA
AGGCGTGAGG
AGTACCCAAC
ATTTTTTGAT
AACGATTATA
TCTTCCTCAA
CGAAACAAGA
TTATTCACCT
TGCCCTTGGG
GAATCCAAAT
CATGGCCATG
TGATATTAAA
GAATCAAGAA
ACAGTTTATC
TTAGTTGAAG
GA.ATCTGGGG
GAGATCAAAC
GTTTTATCAG
GTGAAGGGAG
CGGTATAATG
AAAATAAGTA
ACGGATATCT
AAATACTGTC
ATATTTGGAT
TATGTAGGTG
CCTGATTCTG
TTATGGACAC
GTGACTGCAA
AATTATGACT
TCATTAAGAG
AGTAGCAAGA
GCTCTAAA-AG
TCAGCATCTT
GTTCTAGGAT
ATGAATATCA
TGGATGCAAT
TCAAGATGTT
AGATTTATTA
CCAGGTGAGT
CTGCATCTAA
TATTTATAGC
ACTGGTTAGA
AGGAAGGTAG
AGACACTACT
AGATTGAATT
AAGTGTACAA
ATCTTAATTT
ACAATGATGG
TTAATTGGAG
TAAATAAATT
ATCCTTACTG
GTTTTTACGT
TCATATCTAT
TGGTTCAAGG
ACAGAGTTAA
AAGTGATGGA
TGTTCATATA
CATTATCTAG
CAAATTTGGC
ATGCATGCTC
ATCCAACTAT
ATGCCTCTTT
TTGTAAGGAA
AGGCGAATCT
CATCTTTTTT
TTTACTGTAC
AGATAGTAA-A
TGATCCAGAA
ACTCTTTGCA
TGCAAATAAC
ACTTAAGAGA
TAATTCTAAA
GTCTTCTAAT
ATACGAGAC'r
ATATGAATCA
GTTTAATTGG
TCCTCCATCA
TCATAACCCA
AAGTGCAATA
AGACAATCAA
GAAGGAGATA
TGATCTAGGT
TAGCAAAAGA
ATGTGTCTTC
AACATCATTT
AATTTTTAAG
AACACAGAAT
AATACCTGCT
TATTGGTGAT
ATTAGACCGA
GGACTGGGCT
CGTACTAACG
TTTGATCCTC
TTTAATATTT
AAAATGACAT
ATAGGAAA.AT
TTAACAACCA
AGCCATACAG
CAGAAATCAA
GTGAGCTGTT
ACAGCTCTAT
TTACACCCTC
GATAAAGAAC
AGAGGGGGTA
CATCTAGCAG
GCTATAGCTG
GTTTATAAAG
CATGAACTTA
ATCTATTATG
TGGTCAGAGA
GCAAAAGCAA
AACATTCAAC
ATCAGAGATC
AGTGTTGGGG
CCATCAGTTG
AGTGTTCTTT
TCAGATCCAT
10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640
S.
S S S S
SSSS
244
ATTCATGCAA
GGAATGTATT
TAGAAGAAGA
TTGCACATGA
TAGATACGAC
TGTTGAGGA.A
GACTAATTGT
CATTGCGACA
CGCCTGACCC
TATGTTATTC
A.AATAGGATC
CTGATGAAAG
CCGCAATAAG
TGGAAGCCTC
TAACACCGGT
TGAALATTCTC
ACATGTCTAT
TGTTAACAGG
ACCCTATAGT
ATGAACATAT
TTATTTATGA
ACCATTCTTA
CAATATGTAC
AAGAGATAAT
CTCTTGACAT
ACACTCTTTA
TTTACCACAA
ACAAGATTCA
TGAAGAATTA
TATTCTAGAT
AAAATCACTA
AATCAGTAAT
AAGTGATAAA
AAAGATGTGG
ATTAGAATTA
TTCAGATGGC
AGCAGAAACA
ATCTGAAGCA
AATAGCAATG
ACAGATAGCA
AGCTACATCA
CAGTACATCA
CAAAGAAGCT
ATTAAGTGTT
TATGCATCTG
TAATCCAGAG
TAAAGACCCA
CACAATTGAT
TGCA.ATTACA
AGTTATTGCA
ACTTGTATTT
TAGTCTAAAA
TCT CAAAATA
CCAAATCCAT
GCTGAGTTCC
AATTCTCTCA
ATTCGGGTTG
TACGATCTAG
ATCAAGTATG
ATTCATTTAT
CTATCTGGGG
ACA.AAC COAT
GGTATATCGT
CAATTAGGAT
ATATATACAT
CAAACACGTG
ACAA.ATTTAT
TTGATCAGAG
AATGAAACCA
TTCGAATATT
CACATAGAAG
TCTACATTAG
CTCAALAGATG
ATGAATTATT
ATAGCAGATA
AATGATGATG
CTCAAGACAT
ATAGAAGGTA
TAACCACCAT
TATTATCTGG
TGATGGACAG
CAGGA.ATTAG
GCATAAATAG
TACAATATGA
AAGATATGTG
CAGGAGGAAG
TAGTAATAAC
ATACTTGGAT
CATTAAGAGT
ATATCAAGAA
GGGCATTTGG
CAAATTTTAC
CACACAGATT
TCAGCAGATT
AAGATACTAA
TATTTAGATT
ATGAGTGTTG
AATTAATTCG
TGGACTTATC
GGGATGATAC
CTATGTCACA
ATATTAATAG
TTGGTGGATT
GGGATCTCAT
GATAAAAAAT
ATTATTCACA
GAAGGTAATT
AAATGCCATA
AGGAGGACTG
AACACTAAGT
TTCGGTAGAC
GATGATAAGT
AGGATCAGAA
GTATTTACCC
TCCTTATTTT
TCTTAGTAAA
TAATGATGAG
ACTAGATAGT
AAAGGATACT
CATAACAATG
TCTTATTTAT
AAAAGAAACC
TATTAAAGAA
ATATCCTGAA
AAAACTTATG
TGACATCATA
ATTAGATCGA
CTTAATCACT
ATTAGTAAAT
TTGGGATTAT
ATAACAGCAA
AATACAATGA
CTCCCTAGAG
GCTGGAATGT
ACATATAGTT
AGGACTTTGC
CTTGCCATAG
GGACTTGAAA
CATTGTAAAA
GGTAATATCA
GGATCAGTCA
CCTGCAAAAG
ATATCTTGGA
CTCAAAATTT
GCAACTCAGA
TCCAATGATA
CAA.CAAATAA
ACAGGACACA
AGTTTTAATG
AGTAATGAAT
GTTATTAAAG
CATGCAATTT
GATAATTTAA
GAATTTTTGA
CAATTTGCAT
ATAATGAGAA
11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 245 a a.
a.
a a a a a a *aa.
a
CACTGAGAGA
AAGTATTCAA
CTAGTCAAGA
TGAGAGAATG
TTGCAAATGA
CAGAAATTGC
TATTGAAACA
TATCTGGATT
TCAAATATCT
TAGAAGATGA
ATAATAAAGG
TTAAAATCAG
GTGGTTTGAC
TCAACAGCAC
ATAAAGACAA
ATGCCACATT
TTGGTCAACG
GAAATGTGAC
CATGGATAGG
TTGGATTAGT
ATGAACATTA
TTTCCAAAAT
TATATTGGAA
TATATCTAAT
CAAAACTTAA
TACTTCCCAT
GAGGTTCTGG
CCAGATAAkA.
GTTGAATGGT
TAGGAAACAA
ATCTTTCGGA
ATATCTTGAA
ATTAATTAAA
AAGGATTCGC
AAATATGCTG
GAATAAAATT
ATCTATALACA
ACTTCCTCAA
TAGTTGTCTG
GGACAGGCTC
AGGACCTGCA
AGAATTGAAA
ACAGATTCTT
AAATATGGAA
ACATTGTGAT
TAGTGTTATA
TATACCTACA
AGATGTAAGT
TTCGAAAGAT
AAGATTGTCA
TCAATATTAA
GATTGTGGAG
CTTGCCCTAT
GTATCACTTG
GCCTTTATTT
CCTAACCTGT
TTAA.ATATTA
TCGTTCCCAT
GGTATTAGTC
GATAACATTG
AACAATTTCT
AGTGATTCTG
GGAGGGAATT
AAAGCTCTTG
TTCCTGGGAG
GTTAATTATT
ATATTTCCTT
AACAGGGTAA
TGTGAGAGCT
ATGGAAGGAG
AGAATTACAT
ATCACTCCGA
ATAATATCAC
GCATATTGTA
CTCTTGGAAG
AAGTATTATC
TTTTAAACCC
CTATATGTGA
AAATATACAT
CTAGACACCT
TAAACTTAAC
AAGAAGACCC
CAACTGTAAC
CACCTGAGGT
TCAAAACTAT
GGGGACTAGC
ATGATAATGA
ATCTATCGCA
AGTTATCACA
AAGGAGCAGG
ATAATTCAGG
CAGAGGTATC
ALAGTACTGTT
TAATATGGAG
CTATCGGTAA
ACTTGATTGG
ATTGGTCTAG
TCAAAACTTC
CTATAATGGA
AAAATAATCT
TAATGCATTA
TATTTATGGT
ATATTCACTA
TTGTGACAGC
TTCATTTGTT
ATACTTGGAG
TACTCTTAAA
ATACGTAAGA
AATTGATGAT
AAATGATAAC
ACTTA.AGAAC
TAGACTAGAT
TCAATTGAGA
AATTTTAATG
AGCTATGCTA
TTTGAATATA
ATTAGTAGGT
CAATGGGAAT
TGAATTAAAT
ATCAGAAGAA
GGATGATGAT
AATACTTTAT
TAATCCTGCA
ACCTAGTGAA
ATTAAAATGG
TCTCATCCTA
CCTAATACTG
GATCTATTTA
GATATGGAAG
TGTTGTTTAG
AGACTTGATC
TATGTACAAA~
AAGACTGCAA
TGGGATCCGG
TGTAATAAAG
TATCAAGTCC
GCTAATACAA
TTATTCGGAA
AkAGGAAGTCA
GCATGTTATG
ACAGATGTAA
AAAAAATTAG
CCTAATTCAA
GATAAGTCCA
ACTGTTCTAC
GTTGTTTTAG
CTATATAAAT
TCAACAGAAT
ATTGTTTTAT
ATCATTTTAT
13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 CAAAGAAGAG GAATAATGAA TGGTTACATC ATGAAATCA AGA.AGGAGAA AGAGATTATG 246 GAATCATGAG ACCATATCAT ATGGCACTAC AAATCTTTGG ATTTCAAATC AATTTAAATC
ATCTGGCGAA
GTTTTCAGCG
AGAGACATAA
GACTGCTATC
TTACAGGTCG
TAGCTGATAC
GAGAGTGTAT
AATTGATTGG
AGTTATTAGA
TGAAGATATA
TGTAATATAT
AGAATTTTTA
AACAATAAAG
ATTAGGCGGA
GAGAAGACTA
CTTTCCTGAT
TGATTTAGAA
AGGATCAATA
TGGTGCTAAA
AAACTACAAT
TCCTAACCTT
TCAACCCCAG
GATGTTTTAT
AGATATAACA
GTATTAAGTT
GAAAAATTTG
TCATTAAAGT
TCATATTGGT
TTATTAGGAA
CAACATGATG
TATCTTTAAG
ATCTGACTAA
TTGAATGGAT
TATTCCCACT
GGATTTCATT
AACATAGAGC
TATTGTCGAA
TTCTAACCA
TTCCCAGACA
AATTTGATAT
CCTAGGAATA
TATCAACAAT
TAATATA-ACT
GAAAAATAAG
ATCATTATCG
ACAGACTGGA
AAACATCATT
AGA-AGTTAAA
ATATAAAGAA
CGATTAAAAC
GACAAAAAGT
ATAATCCAAA
CATGATGATA
GGAAAGTTAA
ACTCGATTAC
TATGTATCAT
AAGAATTACA
ATACTTATGA
CCCGAAGACC
ATAA.ATACAA
AAGAAAAACA
14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15462
S
ATATACCAAA. CAGAGTTCTT CTCTTGTTTG GT INFORMATION FOR SEQ ID NO:18: Wi SEQUENCE CHARACTERISTICS: LENGTH: 2233 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID 140:18: Met Asp Thr Glu Ser Asn Asn Gly Thr Val Ser 1 5 10 Glu Cys His Leu Asn Ser Pro Ile Val Lys Gly 25 His Thr Ile Met Ser Leu Pro Gin Pro Tyr Asp 40 Ile Leu Val Ile Thr Arg Gin Lys Ile Lys Leu 55 Asp Ile Leu Tyr Pro Lys Ile Ala Gin Leu Met Asp Asp Asp Ser Asn Lys Leu Asp Lys 247 Arg Asn Glu Leu Arg Gly 145 His Phe Ile Asn Asn 225 S*Asp Pro Ile Ser Val 305 Glu Asp Gin Asp Met Leu Asp 130 Ser Thr Thr Thr Phe 210 Tyr Val Lys Asp Leu 290 Lye Leu Tyr Arg Leu Phe Leu 115 Leu Asn Thr Ile Phe 195 Leu Asn Val Leu Lye 275 Leu Gin Ile Ile Ser Gly Lys 100 Lys Trp Tyr Tyr Lys 180 Asn Leu Gly Glu Gin 260 Leu Glu Leu Phe Asp Ile Lye Leu Ala Ile Asp Lys 165 Tyr Val Ile Tyr Gly 245 Ser Phe Pro Arg Glu 325 Lye Arg 70 Tyr Tyr Asp Asn Leu 150 Ser Asp Gly His Leu 230 Arg Met Pro Leu Gly 310 Ser Ile Arg Thr Ile Arg Val 135 Asn Asp Met Lye Pro 215 Ile Trp Tyr Ile Ala 295 Ala Arg Leu Leu Phe Pro Thr 120 Leu Glu Lye Arg Asp 200 Glu Thr Asn Gin Met 280 Leu Phe Glu Asp Lye Ile Gly 105 Tyr Ser Glu Trp Arg 185 Tyr Leu Pro Ile Lye 265 Gly Ser Leu Ser Ile Ile 75 Tyr Asn Gin Leu Asn 155 Asn Gin Leu Leu Leu 235 Ala Asn Lye Ile His 315 Lye Aen Leu Pro Ser Met Ala 140 Asn Pro Lye Leu Ile 220 Val Cys Asn Thr Gin 300 Val Glu Lye Thr Glu Lye Thr 125 Ser Ile Phe Ala Glu 205 Leu Leu Ala Leu Phe 285 Thr Leu Phe Ser Glu Met Val 110 Asp Lye Ser Lye Arg 190 Asp Asp Met Lye Trp 270 Asp His Ser Leu Thr Lys Ser Thr Gly Asn LyE Thr 175 Asn Gin Lye Tyr Leu 255 Glu Val Asp Glu Ser 335 Ile Vai Lys Glu Leu Asp Val 160 Trp Glu Lye Gin Cys 240 Asp Val Ile Pro Met 320 Val Asp 248 340 345 350 Glu Ile Ala Giu Ile Phe Ser Phe Phe Arg Thr Phe Gly His Pro Pro 355 360 365 Leu Giu Ala Ser Ile Ala Ala Giu Lys Val Arg Lys Tyr Met Tyr Ile 370 375 380 Gly Lys Gln Leu Lys Phe Asp Thr Ile Asn Lye Cys His Ala Ile Phe 385 390 395 400 Cys Thr Ile Ile Ile Asn Giy Tyr Arg Glu Arg His Gly Gly Gin Trp 405 410 415 Pro Pro Val Thr Leu Pro Asp His Ala His Giu 
Phe Ile Ile Asn Ala 420 425 430 Tyr Gly Ser Asn Ser Ala Ile Ser Tyr Giu Aen Ala Val Asp Tyr Tyr 435 440 445 9Gin Ser Phe Ile Gly Ile Lye Phe Asn LyE; Phe Ile Glu Pro Gin Leu .450 455 460 *99Asp Giu Asp Leu Thr Ile Tyr Met Lye Asp Lys Ala Leu Ser Pro Lye *465 470 475 480 9*9Lye Ser Aen Trp Asp Thr Val Tyr Pro Ala Ser Asn Leu Leu Tyr Arg 485 490 495 Thr Asn Ala Ser Asn Giu Ser Arg Arg Leu Val Giu Val Phe Ile Ala 9...500 505 510 Asp Ser Lye Phe Asp Pro His Gin Ile Leu Asp Tyr Vai Giu Her Gly 515 520 525 99*9Asp Trp Leu Asp Asp Pro Giu Phe Asn Ilie Ser Tyr Her Leu Lys Giu 530 535 540 Lye Giu Ile Lye Gin Giu Giy Arg Leu Phe Ala Lys Met Thr Tyr Lys 545 550 555 560 Met Arg Ala Thr Gin Val Leu Ser Glu Thr Leu Leu Ala Aen Aen Ile 565 570 575 Gly Lye Phe Phe Gin Giu Asn Giy Met Val Lye Gly Giu Ile Giu Leu 580 585 590 Leu Lye Arg Leu Thr Thr Ile Her Ile Ser Giy Val Pro Arg Tyr Asn 595 600 605 Giu Vai Tyr Aen Asn Ser Lye Ser His Thr Asp Asp Leu Lye Thr Tyr 610 615 620 249 Asn 625 Phe Ser Tyr Leu Ile 705 Ser Gly .Ser *9 Met 999*** Asn 785 995999 Val 9 Glu Ser Ala Arg 865 Asn Lys Glu Cys Glu Asn 690 Tyr Leu Gly Ala Val 770 Asn Arg Leu Lys Leu 850 Ser Gly Ile Phe Phe Ser 675 Lys Val Glu Ile Ile 755 Gin Tyr Phe Lye Arg 835 Ser Ala Tyr Ser Lye Leu 660 Thr Leu Gly Asp Glu 740 His Gly Asp Phe Leu 820 Ile Arg Ser Ser Asn Ser 645 Thr Ala Phe Asp His 725 Gly Leu Asp Tyr Asp 805 Asn Tyr Cys Ser Pro 885 Leu 630 Thr Thr Leu Asn Pro 710 Pro Phe Ala Asn Arg 790 Ser Glu Tyr Val Asn 870 Val Asn Asp Asp Phe Trp 695 Tyr Asp Cys Ala Gin 775 Val Leu Thr Asp Phe 855 Leu Leu Leu Ile Leu Gly 680 Leu Cys Ser Gin Val 760 Ala Lys Arg Ile Gly 840 Trp Ala Gly Ser Tyr Lys 665 Glu His Pro Gly Lye 745 Arg Ile Lys Glu Ile 825 Arg Ser Thr Tyr Ser Asn 650 Lye Thr Pro Pro Phe 730 Leu Ile Ala Glu Val 810 Ser Ile Glu Ser Ala 890 Asn 635 Asp Tyr Cys Arg Ser 715 Tyr Trp Gly Vai Ile 795 Met Ser Leu Thr Phe 875 eye Gin Gly Cys Asn Leu 700 Asp Val Thr Val Thr 780 Val Asp Lye Pro Val 860 Ala Ser Lye Tyr Leu Gin 685 Glu 
Lye His Leu Arg 765 Thr Tyr Asp Met Gin 845 Ile Lys Ile Ser Glu Asn 670 Ile Gly Glu Asn Ile 750 Val Arg Lye Leu Phe 830 Ala Asp Ala Phe Lye Thr 655 Trp Phe Ser His Pro 735 Ser Thr Val Asp Gly 815 Ile Leu Glu Ile Lye 895 Lye 640 Val Arg Gly Thr Ile 720 Arg Ile Ala Pro Val 800 His Tyr Lye Thr Glu 880 Asn 250 Ile Gin Gin Leu Tyr Ile Ala Leu Gly Met Asn Ile Asn Pro Thr Ile 900 905 910 Thr Gin Asn Ile Arg Asp Gin Tyr Phe Arg Asn Pro Asn Trp Met Gin 915 920 925 Tyr Ala Ser Leu Ile Pro Ala Ser Val Gly Gly Phe Asn Tyr Met Ala 930 935 940 Met Ser Arg Cys Phe Val Arg Asn Ile Gly Asp Pro Ser Val Ala Ala 945 950 955 960 Leu Ala Asp Ile Lys Arg Phe Ile Lys Ala Asn Leu Leu Asp Arg Ser 965 970 975 Val Leu Tyr Arg Ile Met Asn Gin Giu Pro Gly Glu Ser Ser Phe Leu 980 985 990 Asp Trp Ala Ser Asp Pro Tyr Ser Cys Asn Leu Pro Gin Ser Gin Asn ease 995 1000 1005 000Ile Thr Thr Met Ile Lys Asn Ile Thr Ala Arg Asn Val Leu Gin Asp :001010 loi5 1020 ***Ser Pro Asn Pro Leu Leu Ser Gly Leu Phe Thr Asn Thr Met Ile Glu *1025 1030 1035 1040 Glu Asp Giu Giu Leu Ala Glu Phe Leu Met Asp Arg Lys Val Ile Leu 1045 1050 1 055 .Pro Arg Vai Ala His Asp Ile Leu Asp Asn Ser Leu Thr Gly Ile Arg 1060 1065 1070 Asn Ala Ile Ala Gly Met Leu Asp Thr Thr Lys Ser Leu Ile Arg Val 1075 1080 1085 Gly Ile Asn Arg Gly Gly Leu Thr Tyr Ser Leu Leu Arg Lys Ile Ser @.1090 1095 1100 *Asn Tyr Asp Leu Val Gin Tyr Giu Thr Leu Ser Arg Thr Leu Arg Leu 1105 1110 1115 1120 Ile Val Ser Asp Lys Ile Lys Tyr Giu Asp Met Cys Ser Val Asp Leu 1125 1130 1135 Ala Ile Ala Leu Arg Gin Lys Met Trp Ile His Leu Ser Gly Gly Arg 1140 1145 1150 Met Ile Ser Gly Leu Giu Thr Pro Asp Pro Leu Giu Leu Leu Ser Gly 1155 1160 1165 Vai Val Ile Thr Gly Ser Glu His Cys Lys Ile Cys Tyr Ser Ser Asp 251 1170 1175 1180 Gly Thr Asn Pro Tyr Thr Trp Met Tyr Leu Pro Gly Asn Ile Lys Ile 1185 1190 1195 1200 Gly Ser Ala Giu Thr Giy Ile Ser Ser Leu Arg Val Pro Tyr Phe Gly 1205 1210 1215 Ser Val Thr Asp Giu Arg Ser Giu Ala Gin Leu Gly Tyr Ile Lys Asn 1220 1225 1230 Leu Ser 
Lys Pro Ala Lys Ala Ala Ile Arg Ile Ala Met Ile Tyr Thr 1235 1240 1245 Trp Ala Phe Gly Asn Asp Giu Ile Ser Trp, Met Glu Ala Ser Gin Ile 1250 1255 1260 Ala Gin Thr Arg Ala Asn Phe Thr Leu Asp Ser Leu Lys Ile Leu Thr 1265 1270 1275 1280 Pro Val Ala Thr Ser Thr Asn Leu Ser His Arg Leu Lys Asp Thr Ala 1285 1290 1295 *Thr Gin Met Lys Phe Ser Ser Thr Ser Leu Ile Arg Val Ser Arg Phe *1300 1305 1310 Soo le Thr Met Ser Asn Asp Asn Met Ser Ile Lys Glu Ala Asn Giu Thr 1315 1320 1325 Lys Asp Thr Asn Leu Ile Tyr Gin Gin Ile Met Leu Thr Gly Leu Ser 1330 1335 1340 Phe Giu Tyr Leu Phe Arg Leu Lys Glu Thr Thr Gly His Asn Pro 1345 1350 1355 1360 Ile Val Met His Leu His Ile Giu Asp Glu Cys Cys Ile Lys Giu Ser 1365 1370 1375 Phe Asn Asp Giu His Ile Asn Pro Giu Ser Thr Leu Giu Leu Ile Arg 1380 1385 1390 Tyr Pro Glu Ser Asn Glu Phe Ile Tyr Asp Lys Asp Pro Leu Lys Asp 1395 1400 1405 Val Asp Leu Ser Lys Leu Met Val Ile Lys Asp His Ser Tyr Thr Ile 1410 1415 1420 Asp Met Asn Tyr Trp Asp Asp Thr Asp Ile Ile His Ala le 'Ser Ile 1425 1430 1435 1440 Cys Thr Ala Ile Thr Ile Ala Asp Thr Met Ser Gin Leu Asp Arg Asp 1445 1450 1455 252 Aen Leu Lys Glu Ile Ile Val Ile Ala Asn Asp Asp Asp Ile Asn Ser 1460 1465 1470 Leu Ile Thr Giu Phe Leu Thr Leu Asp Ile Leu Val Phe Leu Lys Thr 1475 1480 1485 Phe Gly Gly Leu Leu Val Asn Gin Phe Ala Tyr Thr Leu Tyr Ser Leu 1490 1495 1500 Lys Ile Glu Gly Arg Asp Leu Ile Trp Asp Tyr Ile Met Arg Thr Leu 1505 1510 1515 1520 Arg Asp Thr Ser His Ser Ile Leu Lys Val Leu Ser Asn Ala Leu Ser 1525 1530 1535 His Pro Lys Val Phe Lys Arg Phe Trp Asp Cys Gly Val Leu Asn Pro 1540 1545 1550 Ile Tyr Gly Pro Asn Thr Ala Ser Gin Asp Gin Ile Lys Leu Ala Leu 1555 1560 1565 *Ser Ile Cys Glu Tyr Ser Leu Asp Leu Phe Met Arg Giu Trp Leu Aen :1570 1575 1580 Gly Val Ser Leu Giu Ile Tyr Ile Cys Asp Ser Asp Met Glu Val Ala *1585 1590 1595 1600 *Asn Asp Arg Lys Gin Ala Phe Ile Ser Arg His Leu Ser Phe Val Cys 1605 1610 1615 **Cys Leu Ala Glu Ile Ala Ser Phe Gly Pro Asn Leu Leu Aen Leu Thr 1620 1625 
1630 Tyr Leu Glu Arg Leu Asp Leu Leu Lys Gin Tyr Leu Glu Leu Asn Ile 1635 1640 1645 *Lye Glu Asp Pro Thr Leu Lye Tyr Val Gin Ile Ser Gly Leu Leu Ile 1650 1655 1660 Lys Ser Phe Pro Ser Thr Val Thr Tyr Val Arg Lye Thr Ala Ile Lye 1665 1670 1675 1680 Tyr-Leu Arg Ile Arg Gly Ile Ser Pro Pro Glu Val Ile Asp Asp Trp, 1685 1690 1695 Asp Pro Val Giu Asp Glu Asn Met Leu Asp As Ile Val Lye Thr Ile 1700 1705 1710 Aen Asp Asn Cys Asn Lye Asp Asn Lye Gly Aen Lye Ile Asn Asn Phe 1715 1720 1725 253 Trp Gly Leu Ala Leu Lys Asn Tyr Gin Val Leu Lys Ile Arg Ser Ile 1730 1735 1740 Thr Ser Asp Ser Asp Asp Asn Asp Arg Leu Asp Ala Asn Thr Ser Gly 1745 1750 1755 1760 Leu Thr Leu Pro Gin Gly Gly Asn Tyr Leu Ser His Gin Leu Arg Leu 1765 1770 1775 Phe Gly Ile Asn Ser Thr Ser Cys Leu Lys Ala Leu Glu Leu Ser Gin 1780 1785 1790 Ile Leu Met Lys Giu Val Asn Lys Asp Lys Asp Arg Leu Phe Leu Gly 1795 1800 1805 Giu Gly Ala Gly Ala Met Leu Ala Cys Tyr Asp Ala Thr Leu Gly Pro 1810 1815 1820 Ala Val Asn Tyr Tyr Asn Ser Gly Leu Asn Ile Thr Asp Val Ile Gly 1825 1830 1835 1840 Gin Arg Glu Leu Lys Ile Phe Pro Ser Glu Val Ser Leu Val Giy Lys 1845 1850 1855 Lys Leu Gly Asn Val Thr Gin Ile Leu Asn Arg Val Lys Val Leu Phe 1860 1865 1870 *Asn Gly Asn Pro Asn Ser Thr Trp Ile Gly Asn Met Giu Cys Glu Ser *1875 1880 1885 Leu Ile Trp Ser Giu Leu Asn Asp Lys Ser Ile Gly Leu Val His Cys **1890 1895 1900 Asp Met Giu Gly Ala Ile Gly Lys Ser Giu Giu Thr Val Leu His Glu 1905 1910 1915 1920 His Tyr Ser Val Ile Arg Ile Thr Tyr Leu Ile Gly Asp Asp Asp Val *1925 1930 1935 *Val Leu Val Ser Lys Ile Ile Pro Thr Ile Thr Pro Asn Trp Ser Arg *1940 1945 1950 Ile Leu Tyr Leu Tyr Lys Leu Tyr Trp Lys Asp Val Ser Ile Ile Ser 1955 1960 1965 Leu Lys Thr Ser Asn Pro Ala Ser Thr Giu Leu Tyr Leu Ile Ser Lys 1970 1975 1980 Asp Ala Tyr Cys Thr Ile Met Glu Pro Ser Glu Ile Val Leu Ser Lys 1985 1990 1995 2000 Leu Lys Arg Leu Ser Lau Leu Glu Giu Asn Asn Leu Leu Lys Trp Ile 254 2005 2010 2015 Ile Leu Ser Lys Lys Arg Asn Asn Giu Trp Leu His His Giu 
Ile Lys 2020 2025 2030 Glu Gly Giu Arg Asp Tyr Gly Ile Met Arg Pro Tyr His Met Ala Leu 2035 2040 2045 Gin Ile Phe Gly Phe Gin Ile Asn Leu Asn His Leu Ala Lys Giu Phe 2050 2055 2060 Leu Ser Thr Pro Asp Leu Thr Asn Ile Asn Asn Ile Ile Gin Ser Phe 2065 2070 2075 2080 Gin Arg Thr Ile Lys Asp Val Leu Phe Glu Trp Ile Asn Ile Thr His 2085 2090 2095 Asp Asp Lys Arg His Lys Leu Gly Giy Arg Tyr Asn Ile Phe Pro Leu 2100 2105 2110 Lys Asn Lys Giy Lys Leu Arg Leu Leu Ser Arg Arg Leu Val Leu Ser 2115 2120 2125 Trp Ile Ser Leu Ser Leu Ser Thr Arg Leu Leu Thr Giy Arg Phe Pro 2130 2135 2140 *Asp Giu Lys Phe Giu His Arg Ala Gin Thr Gly Tyr Val Ser Leu Ala 2145 2150 2155 2160 Asp Thr Asp Leu Giu Ser Leu Lys Leu Leu Ser Lys Asn Ile Ile Lys *2165 2170 2175 Asn Tyr Arg Glu Cys Ile Gly Ser Ilie Ser Tyr Trp Phe Leu Thr Lys 2180 2185 2190 Giu Val Lys Ile Leu Met Lys Leu Ile Gly Gly Ala Lys Leu Leu Gly 2195 2200 2205 Ile Pro Arg Gin Tyr Lys Glu Pro Giu Asp Gin Leu Leu Glu Asn Tyr 2210 2215 2220 Asn Gin His Asp Giu Phe Asp Ile Asp 2225 2230 INFORMATION FOR SEQ ID NO:19: SEQUENCE CHARACTERISTICS: LENGTH: 15462 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear 255 (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19: ACCAAACAAG AGAAGAAACT TGCTTGGTA. TATAAATTTA ACTTAAAATT AACTTAGGAT 0* *0 0
C
C C
C
TTAAGACATT
TATTTGATAC
TCATTCCTGG
ATAATGAGAA
AACATGCACA
AGCTCTACCT
AGAAAGATCT
ATGAAAAGAC
TGCAGAACGG
CATGTTTAGG
TCTCAGGGTT
TGCAGGCAGG
CTCAACAGAG
ATGACCTCAC
GTCTCGCTTC
CTCTATCCAC
CAAAGGGACC
CACCAGGCAA
GAGCCATGCA
GACAAGCAGT
GAGTGACACA
AGACATCTTT
GACTAGAAGG
ATTTAATGCA
ACAGAAAAAT
AATGACATTA
A.AGGGCAGGG
AACAACAAAT
AAAACGGCAA
AACTGATTGG
CAGGAACAAT
AGCTCTTATA
AAGAAAAGGC
GCTGGTATTG
CTTGGTAACT
AACCATAGAA
ATTCTTCAAT
TCTCAGACCA
ACGCGCTCCT
CTATCCTGCC
ACAGTATGTG
AGCACGTGAT
CGAAGCTAAA
CCACAAACCG
TCAAGAA.AAG
CGTAGGCAAG
ACTGTCTCTA
GCTCTTCTAT
TTCTTGGTGT
GGAAGTAATG
AAGTATGGAG
ATATTTGGAA
TCAACAATTG
ATACAGATCT
TTTTTCACCC
AGCGGTGACA
CTTATGGTTG
AAGAATATAC
ACAATCAGAT
GATATCAATA
TTCATCTGTA
ATATGGAGCT
ACGGGAAGAT
GCCGAAGCTC
GAAAGCTTGA
ACAGGTGGAT
GGAACTCTAT
AAAACATAAC
TATTCGCCCT
TTCTATCTCA
CTTTATTGTC
CAGATGCCAA
GATTTGTGGT
GTGACCTGGA
AAGACCTTGT
GGATAGTTCT
GATTGGAAGC
CAGTGGATCA
AAACATTAAT
AAATTGTTGG
ATGGAATTGA
GATTAAAAGC
TCCTCAGAGA
ATGCAATGGG
CATATCTAGA
AAATGAGCTC
AGAGACATAT
CAGCCATAGA
AATTTCAAAA
AAAATCAGCC
TGGACCGACA
TTCACTAGAT
AATGGCTTAT
GTATGTCATA
TAAGACGAGA
TTATGATCAG
CCACACATTT
GGTCAAAGCT
TTTCAGACAA
GATTGGGTCA
AACAATGAAT
CAACTACATA
GACCAGAATG
TTTGATGGAA
TCCTATACAT
GGTGGCAGTT
CATTGATATG
AACACTGGAA
AAGGAACATA
GATGGCAATA
ATGTTGAGCC
GGTGGAGCTA
ATAACTGATG
AATGAGAAAC
GCCAATCCAG
TACATGATTG
GAGATGATAT
GAAACTATGT
GGGTATCCAT
ATCACTAGTA
GATGGAACAG
ATCATGCGGT
ACCAGCAGAA
AGAGATGCAG
GCAGCTTTGA
CTGTATTTAT
GGTGAGTTCG
GTACAAAATA
TTCCAGCTAG
GATGAACTTG
AACAGTTCAG
GATGAAGAGC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 256 CAGAACAATT CGAACATAGA GCAGATCAAG AACAAAATGG AGAACCTCAA TCATCCATAA
TTCAATATGC
CTGACAATAT
ACA.AGA.AGAA
AAATAGATGA
ATCAATAATA
GGTA.AATTTA
AAACTATCAA
CTCGGCCCTC
AAACGACACA
CAAACCAACA
GTCATCACAC
GAGAGGACCT
AATCCCCAGA
CAATGAAATT
TGTTCCAAGC
TGATCATGGA
TACTGCTGCA
AAGTTCTTCA
CTGGTTTAAG
ATCAAAAGGG
AACAGAAATA
CAACAACACC
AACTTATACA
AAATGGAAAG
CTGGGCAGALA
CAAGACCGAA
ACAAAGCAGT
TCTGTTTALAC
AATAAGAAAA
GAGTCTGCTT
ATCATGGATT
AACATCATTG
ATCAACACAA
GAAACAAGTG
GAATGTACAA
GGGAGAAGAA
AGCATCACAG
AGAAAGATGG
GAGATATCAG
AGAAGCCTGG
ACACCAGATG
ACACATCAAG
AAATCAAAAG
CAGAAGAAAA
CAGACAGAAT
GACCGGAACG
AAAGAATCGA
GAAAGGAAGG
GGAILATAGALA
CAACAAAACA
CAACCACCCA
GCATTTGGA.A
ACTTAGGATT
GAA.ACTCAAT
CTTGGGAAGA
AATTCATACT
GA.ACCCAGCA
AGAAAGATAG
CAGAAGCAAA
GCAGCTCAGA
ATTCTAAAAA
ATAAGGACTC
GAAGTGATGA
AATCTATCAG
ATGAAGA.AGA
AAGATGACAA
ATACCGACAA
TCTCAALAGAC
CATCAGAAAC
AACAGACAAG
TCCGAACAAA
ATACAGAAGA
GCGATGATCA
TCAGAGACAG
CTA.ATCCCAC
GCAACTAATC
AAAGAATCCT
CAATAGAGAG
GGA.ATCAAGA
CAGCACCGAC
ACTCAGTGCC
TGGATCAACT
AGATAGAALAC
TAGTAGAGCT
TGGAACCCAA
TATTGAGGGG
CATATTTACA
TACACCTGAT
AATACTAATG
AAGAATTAAA
CCAGATACCA
AACAACCACC
ACAATCCTCA
CACAACTCCT
CTCTGAATCC
GAGCAATCGA
GACTGAGCAA
ACTAAACAAG
AAACAGAACA
GAATCAACAT
ATCATACCGG
TTGATGGAAA
GATAAATCAA
CCCCAAGAAG
ACCATCTGTC
GACAAAAATA
ATTGATCAGG
GAGACTGTGG
AACACGGAGG
AAAATGCGAC
ACAGAACAA
ACAAGATCAA
AAAAATAGTA
AA.AGGGGGAA
ACATCAGACT
AACACCGACA
TCATGGAATC
CCAACAACAA
AAACCCAAGA
TTTACAGAGA
GCTACAGAAT
AGACTCAACG
AACCAGGACG
TTTAATCTAA
AATATAGGGT
GCGATGCTAA
CTAATATCTC
ACTTATCGGA
AACCAGAAAT
GACAGTCTGG
AAACTGTACA
TCTCTGGAGG
ATATTGATCT
AATCTGCAAA
GTAGAAACAG
TAAGTGTTGT
GGACAAAGAA
AAGGGAAAGA
ACAGATCCAC
CAAAGGGGCA
TCATCATCGA
CTTCCAGATC
CACAAAAGAC
GGGCAATTAC
1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 TCTATTGCAG AATCTTGGTG TAATTCAATC CACATCAAALA CTAGATTTAT ATCAAGACAA 257
ACGAGTTGTA
CCTGGCAGGA
AAATGAAATG
GATAGAAAAT
TATGACTGAG
CAAAACAAAA
ACAAGGCATT
CGATGTACAA
AATACCCAAA
TCTCTCACAA
AGAAGTATCT
CAAAGAAACG
ACACAAAATC
AGGATTAAAG
CATTCCCAGA
ATGAACAGAG
ACGGATCCCG
ACAAATACGG
GATCATTACC
CAACCAAACT
CGGTACAAAA
TGTTCGATGC
AATTTAGAGT
CTAAGTCA.AT
TAAAAACAGG
AAAAATCACT
TGTGTAGCAA
TTAGTCATAG
CTAAACCTCA
CAAAGAGAAC
AGAGGAGGAA
TTGAAAGAAG
GACAAGAATA
GTTAAATCAG
AAAGTGAGCA
AGCACAAAAC
GAATTAATGG
ACACCGAACA
AAGCAGAATG
AATAAATTAA
ATCATCATTC
GAAAGCAGTA
GTATTTAGAT
GAGTGTGAAT
AATCGGATTG
GGATATAGAA
TATAAILACCA
CAACAAAGTT
AATCTTCGTG
GGCATCACTA
GGTTCAGACT
GAATTTCATG
ATGTACTAAA
GGGTTTCAAT
AAGCAGATCT
AACTGTCATT
AGAAAGACCA
AAAAGATCAA
TACCCGATCT
AGATATTAAG
GTACAATGAG
AATCATACAT
ACATGTTCAA
AACAGACAAG
AAACAACAGA
TCCTTGTCCA
TCTGAAAATG
CCCCACATTA
GTCTTCTTAC
GATCTCGACA
GCTAAGTACA
GTGAGAAGAA
GAACTGTACC
GCTCTTGCTC
APLTTGTACGG
TCTCTAACCA
GATTCTAAAG
GTCCATCTCG
CAATGTAGAT
GGACAACGAC
AAAGAAAATG
GATCACGTCA
AAATGIAATCC
GAAGACCAGG
ATATCGACAT
TTCATACAAT
ATCACTAGTT
AAACGA.ACTC
TGAAGATGTC
AAACAACAGT
TATCAATCAA
A7LATGAGTAT
GTCATATAGA
GAGTTGCCAA
TCGGCTTCTT
GTGACCCGAG
CTGGGAATGA
CAGTCAAAGC
CATGGTCCAA
CTCAATGTCT
CAATTGGATC
ACACAATATC
GGATAGTTCA
GATTGATCAA
ACTGCATCAA
ACAAAATTAA
GACGAATCAC
CTAATTTCAA
AATGAGAGAG
TTTGACCCAC
GCAGGAGATA
GAGTCAAATG
GCAGTCATCA
AAACGTTGCA
AACAATTGCC
AGATCAAAAC
TATACAAATA
AACTAACTCT
ACCATTACCA
GATCGGAAAT
CGAGATGGAA
TTACAAAGTT
CCAGGAATTG
GAAAGAGATG
TAGACTAAGA
TCCACTAGAT
AATAACCTTG
AATCAATCTG
AATTTTGGAT
AAGAAAAGTA
AGATAGATTT
CACAGATACA
ATAGAAGATT
ATCTCAAAAT
TATCCATGAT
TTATGGAGGC
CACTAGAGAA
CAACAAGACT
ACAACAGCAA
AAAATGATGA
AATGATCCAA
CTGTCAACAC
AGAAAALACTT
GCAATATACA
CTCAAAGTCA
CCACCAAAAC
CGAATCA.AAG
TGTGGCTCTG
TTACAAGCCG
GTTGTTTACA
AAAGGAATGC
AGGAGCALTAA
TTCAAAATTC
CAGGTACACA
GAGAAAGGCG
GGCAGAATGT
3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 258
ACTCTGTTGA
TAGTTGGAGG
GTCAGCTGGT
ATCTAGTTAT
CTTTACCTGG
AACAATGGAA
AAGGATAATC
TCGCAAGAAT
ACAGAACACC
GAGACCGGCA
AACCATGATC
ATTGGTCAAC
TTTGAGCCTC
ATACAAGAAG
AGATGTGATA
CTTTGGAGGG
GGCAGTTGCT
AATTAGGGAC
AGCAATTAAA
AGGTTGTGAA
AACAAACATA
TATAGCATCA
ATATGATATC
CTTGAATGAT
CACTCAGATC
CCCTCTTCCC
ATACTGTAA
ATTCAAAAGA
CTGGGCTTCA
CGAGTTCAGA
CTAGTALATCT
AAAA.ACTTAG
AAGAGAGAAG
AGAACAACAA
ACACAACAAG
ATGGCATCTT
AGTCCCAAAG
ATACCAAAAA
TTATTGGATA
GTAACCAATC
GTAATTGGAA
CTGGTTGAAG
ACAkAATAAAG
TCAGTCCAGG
GCAGCAGGAC
TTTGGTGATA
TTATACCGCA
TATGATCTGT
TACTCAATCA
TACAAAGTAG
AGCCATATCA
CAGAAAATCG
CATGTCAATG
GAGATTTGTT
TCAGTAGAGA
TACTATCCTA
CTATTTTAGT
GACAAAAGAG
GGACCAA.AAA
AATCAAAACA
CACTGAACAC
TCTGCCAAAT
GGATGAAGAT
TAGAAGACTC
-GACTGATCAT
AAGAATCCAA
CCATTGCTCT
CCAAGCAGGC
CAGTGCAGTC
ATTATGTTAA
TTCAATTAGG
ACATAGGATC
CAAATATCAC
TATTTACAGA
CCCTCCAAGT
ATTCCATATC
TGACGAALAGG
AGAAAATGAG
CAACTGGGTC
ATCCTTTAAT
TTACAAGAGT
ATATTATTGC
CCGGACGTAT
GTCAATACCA
AGTCAAATAG
TCCAACTCAC
AATGCCAACT
AGATATCACA
ATCACAAAAC
TAACTCTTGT
CCCTTTATAT
TGAAAACACT
GGGAGTAGCA
AAGATCAGAC
AGTTCAGAGC
CAAAGAAATC
AATTGCATTA
GTTACAAGAA
AGAAkATATTC
ATCAATAAAG
CAGACTCCCT
ATATAACATC
GGCATTTCTA
ATTGATATTT
CATATCAAAA
GGATCTAAAT
GGATGCAATT
AAAAGGAGTT
CTATTAAGCC
ACAACTATTA
GAGAAATCAA
TCAAAACAAA
TCAATACTGC
AAACTACAGC
TTTGALACAA
GGTGACCAAC
GATGGATTAA
GATCCCAGAA
ACCTCAGCAC
ATCGAAAAAC
TCCATAGGAA
GTGCCATCGA
ACACAGCATT
AAAGGAATAA
ACAACATCAA
GTGAGAGTTA
TTATTAACTA
CAA.AACAGAG
GGTGGAGCAG
TCTTTAGGAC
ACACTAGCAA
CCGCATCTCA
TTCCAACCTT
GGGAAAATCA
GAAGCAAATA
GCAGTCACAC
AA.CAAAAGGT
AATTCCAAAA
TA.ATTATTAC
ACGTAGGTGT
GATATCTAAT
AGATCAAGCA
GATTACAGAA
CAAAACGATT
AAATTACAGC
TCAAAGAAGC
ATTTAATAGT
TTGCGAGGCT
ACTCAGAATT
AATTACAAGG
CAGTTGATAA
TAGATGTTGA
GGCTGCTGAA
AATGGTATAT
ACGTCAAAGA
4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 259
U
U
U
U
ATGTATAGAA
TGAAATAGAG
AGACATTGTT
CACCTGTACA
AATTATAACA
TAAAGAAGGA
ACTTGATCCA
A.AAAGAATGG
TAGCACTACA
GATAATTACA
TGACAAGCCA
TATAAAAAAC
CCCAATAGAC
TGCTGGCAAT
AATATACATA
AATTAATTCC
GTTTATGGA
GTCAGGAGTG
ATCATTGACA
TGATAATCAA
TCCAGATGAT
AAGGTTAATG
AACTCCGTCT
AGGTTGTCAG
CTCAGACTTG
TAGGAAGTCA
GCATTCAGCA
AGCTGCTTAT
CCAAGATATG
TGCAACGGAA
CATAAAGAAT
ACT CTTGCAT
ATTGACATAT
ATAAGAAGGT
ATCATAATTA
ATTGCAATTA
TATGTACTAA
TTAGGAGTAA
AAATCCAAAT
GAGCTGGAGA
TTATGGACAA
ATCAAALAGTG
ATTACAGAAA
AATACA.AGGC
CAACAGATGT
GAAGTGCTGC
TTTTGGAGAT
CCAGGGCCGG
TTAGTTATAA
GATATAGGAA
GTACCTGACT
TGTTCTCTAG
GCTATATATG
CAGGAAACAT
CATTTGTCAA
TTGGTAATAG
GTAGTACAGT
TCTATACACC
CAATCGAGCT
CA.AATCAAAA
TTTTGATAAT
AGTATTACAG
CAAACA.AATA
AGTTACGCAA
TCGAGATGGA
CGTCTATGGC
TAATCCTGGT
AAAAGGCCCA
AGATCCAAAT
TTCTTACAAT
CAGATCTTAG
CACAAAGAAT
GCACGTCTGG
GATTATTAGC
ATGATCTGAT
AATCATATCA
TAAATCCTAG
CACTCCTAAA
CCCTTCTGAT
ATCCCAATGT
TGGAGGAGTG
AATCALATCAA
AGGTATCILAC
AAATGATATA
CAACAAGGCC
ACTAGATTCT
GATCATTATA
AATTCAAAAG
ACATATCTAC
TCCAACTCTA
ATACTGGAAG
TACTCATGGC
GTTATTATCA
CGAATCATTG
GGCATCGGAT
TCAGAGTCAT
GAAATTCATT
AACACATGAT
TCTTCCATCT
TATGCCAACG
TTATGCTTAT
AGTCTTACAG
GATCTCTCAT
TACAGATGTA
CCAGGATTTG
CCAAGAACAA
GTTGCAAACT
CCACCTGATC
GGAATGCTGT
ACACTAAACA
AAATCAGATC
ATTGGAAATT
TTGTTTATAA
AGAAATCGAG
AGATCATTAG
CTCATATAAT
CATACCAATC
AACAAGCTCA
ATAGTCTTCA
CTGCAAGACA
A.ATACCAATG
GTCCAGAATT
AGTGAAATTA
GTAGGTATAA
TTAATGAkA
ACTGTTGATG
ACCTCAAATC
ATAGGGATA.A
ACCTTTAACA
TATCAACTGT
TATTAAACCA
CGGTCACATC
GTATAACAAC
AAGGAGTAAA
TCAATACAAA
ATTCTGTTAC
TAGALAGAATC
GGCATCAATC
TTAATATAAC
TGGATCAAAA
ATATTAAA.AT
TGAGGAAGGA
ACGGAAAGGA
CTAATA.AGAT
TCATAGTGCT
TAA.ATAATGA
ATCTAATACA
ACATACCAAT
CAATTAGAAA
AACCTTTAAA
CTCCAAAAAT
GCTGTGTTAG
TAATTACTCG
TAACTGTAAA
TAAATGACAA
GTTCAACTCC
6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620
U.
260 et S.
S *5 5
S
t be
*SESSS
5 *.a
S
.555 5,5.55
S
5* S S
S*
CAAAGTTGAT
TGTCAATTAT
TCAACCATAT
AATATTTCTC
AACTGGGTGC
TTCAGATAGG
AA.AATTGAAA
ACTTCTACTA
ACAATTAGGA
TAATGTGCTA
ATGTATAACA
ATCTGTCATA
AACCGAAAGA
AACAAGCTGC
TAA.AAGCTTA
TTAATCATAA
ATCAGCAPLTC
GGGAAATGGA
ACCTTAACTC
CTCAGCCTTA
TTAATAAATT
AAGTGAATGA
TCAA.ATTATA
GAACATATAG
TAGCCTCAA)
AAGTTCACAC
GAAAGATCAG
GATGGTTCAA
GCTGCACTAT
GGGTATGGAG
CCCGGGAAAA
AGGATGGTCA
GTATGGACGA
GGTAACAAGA
ATAATTGATA
TCAAGACCAG
GGAGTATATA
TTAGACTCAC
GTAAACGAGC
ATTACACACT
AACACATTTC
TTAACCATAA
AGACAATAGA
CACTGAATCT
TCCTATCGTT
TGATATGGAT
GGATAAAAGA
CTTAGGAAAA
TATACCTGG7 1TCAAATGAC7
LAAATGATGG.A
'AACCTATAAP
ATTATGCATC
TCTCAACAAC
ACCCATCTGT
GTCTTGAACA
CACAGAGAGA
ACTCCATCAT
TATCTATGCG
TCTATATATA
TTACTGATTA
GAAACAATGA
CTGATGCATA
AAAA.ATCGAG
TGGCCATCCT
ATAACAAAGG
AACCCATGTT
TATGCATCAA
CAAAAGGGAA
AACAATGGCA
AALAGGTAAAA
GACGACTCAA
CAACGATCTA
*TACACATTTA
ATTAAiCAGTA
*GATGGATTAA
AGCAATTATC
LTCAGATAAAI
ATCAGGCATA
AAGATTTAAG
TGGACCAGGG
TCCAATAAAT
CTGTAATCAA
TGTTGCTGAC
ACAAAATTAC
TACAAGATCT
CAGTGATATA
ATGTCCATGG
TCCACTCAAT
AGTGAACCCA
AAACAGAACA
ATATTGTTTT
GTTCAAAACA
TCTATCTATA
ATATAAAAAA
CTGTATCTGA
TAGCACAATT
TACTAGTTAT
TTAGAAGATT
TCAGATATCC
'AAGTGACTGA
GAGATCTATG
IATCTTAATGA
GGTATAATCC
GAAGATATTG
AATAATAACA
ATATACTACA
GAGAATGTAA
GCGTCTCATA
AAAGGCTTAA
TGGGGGTCAG
ACAAGTTGGC
AGGATAAAAT
GGACATT CAT
CCCACAGGGA
GTCATAACTT
CTCTCAGCTG
CATATAGTAG
GAGATTCCAA
ATACAAGTAT
CTTAGGAGCA
CATACTCTAT
ACACACTATT
CACTAGACAG
AAAATTAATA
AGAAATGTCA
ATTATTACTT
GATTAATGTG
AGAAATTAAT
ATTCAAAACA
TACTTGATAT
TAAGCTTTGA
AAGGCAAAAT
TCTGCAACAC
GTCCATGGTT
PCTCAATTCC
AAGGAAGGTT
ATAGCAAGTT
GGACATGGCA
GTCCAGATGG
GCATTGTGTC
ACTCAACAGC
GATATACAAC
AAATAAATCA
AAAGCTGCAG
ATGATAAGTA
AAGCGTGCTC
CCTGAGTGTC
ATGAGTCTAC
AAAATAAAAC
TTAACTGAAA
AAAGAAATGT
AAAGCAGATA
CTATCAAAAT
AATATATCGA
TGGTTTACTA
7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 261
S
TCAAGTATGA
AGGATTATAA
TGATATTAGA
ATTGTGACGT
TACAATCTAT
TTATGGGAGA
TTCAAACTCA
AGATGGAATT
TTGATAAAAT
CTTTTTTTAG
GAAAATATAT
TCTTCTGTAC
TGACATTACC
TATCATATGA
TCATAGAGCC
CAAAAAAATC
CATCCAACGA
ATCAGATATT
CTTATAGTCT
ACAAAATGAG
TCTTTCAAGA
TATCAATATC
ATGACCTTAA
AGAAATTTGA
TCCTAACAAC
TTGGAGAAAC
TATGAGAAGA
CTTGTTAGAA
TAAACAAAAC
AGTCGAAGGC
GTATCAGAAA
AAAGACATTT
TGATCCTGTT
AATATTTGAA
TTTAGATATA
AACATTTGGG
GTATATTGGA
AATAATAATT
TGATCATGCA
GAATGCTGTT
TCAGTTAGAT
AAATTGGGAC
ATCACGAAGA
GGATTATGTA
TAAAGAAAAA
AGCTACACAA
AAATGGGATG
AGGAGTTCCA
AACCTACAAT
ATTCAAGTCA
AGATCTCAAA
TTGCAACCAA
TTACAAA.AAG
GACCAGAAGA
TACAATGGTT
CGATGGAATA
GGTAATAACC
GATGTGATAT
AAACAACTAA
TCTAGAGAAT
TTTAATAAGT
CATCCTCCAT
AAACAATTAA
AACGGATATA
CACGAATTCA
GATTATTACC
GAGGATTTGA
ACAGTTTATC
TTAGTTGAAG
GAATCTGGGG
GAGATCAAAC
GTTTTATCAG
GTGAAGGGAG
CGGTATAATG
AAAATAAGTA
ACGGATATCT
AAATACTGTC
ATATTTGGAT
CTCGAAATGA
ATTTCTTATT
ATCTAATTAC
TAAGTGCATG
TGTGGGAAGT
CGTTATTAGA
GAGGAGCTTT
CGATTAAGGA
CTACAATAGA
TAGAAGCTAG
AATTTGACAC
GAGAGAGGCA
TCATAAATGC
AGAGCTTTAT
CAATTTATAT
CTGCATCTAA
TATTTATAGC
ACTGGTTAGA
AGGAAGGTAG
AGACACTACT
AGATTGAATT
AAGTGTACAA
ATCTTAATTT
ACAATGATGG
TTAATTGGAG
TAAATAAATT
GATCACTTTT
GATACATCCA
TCCTGAATTA
TGCTAAGTTA
GATAGATAAA
ACCACTTGCA
TTTAAATCAT
ATTTCTGAGT
TGAAATAGCA
TATTGCAGCA
TATTAATAAA
TGGTGGACAG
TTACGGTTCA
AGGAATAAAA
GAAAGATAAA
TTTACTGTAC
AGATAGTAA.A
TGATCCAGAA
ACTCTTTGCA
TGCAAATAAC
ACTTAAGAGA
TAATTCTAAA
GTCTTCTAAT
ATACGAGACT
ATATGIAATCA
GTTTAATTGG
AATGTTGGGA
GAATTGGTTT
GTATTGATGT
GATCCAAAAT
TTGTTTCCAA
TTATCCTTAA
GTGTTATCCG
GTAGATTACA
GAGATTTTCT
GAAAAGGTTA
TGTCATGCTA
TGGCCTCCTG
AACTCTGCGA
TTCAATAAAT
GCATTATCTC
CGTACTAACG
TTTGATCCTC
TTTAATATTT
AAAATGACAT
ATAGGAAAAT
TTAACAACCA
AGCCATACAG
CAGAAATCAA
GTGAGCTGTT
ACAGCTCTAT
TTACACCCTC
9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 262
GTCTTGA.AGG
ATATATCATT
TAGA.AGGATT
CTGTTAGAAT
TAACCACAAG
ATGTAGTGAG
AATTAAATGA
ATGGGAGAAT
CAGTAATAGA
TTGAGAATGG
AACTATATAT
AGTATTTTAG
GATTCAATCA
CCGCATTGGC
ATAGGATTAT
ATTCATGCAA
GGAATGTATT
TAGAAGAAGA
TTGCACATGA
TAGATACGAC
TGTTGAGGAA
GACTAATTGT
CATTGCGACA
CGCCTGACCC
TATGTTATTC
AAATAGGATC
AAGTACAATC
AGAGGATCAC
TTGTCAAAAA
AGGCGTGAGG
AGTACCCAAC
ATTTTTTGAT
AACGATTATA
TCTTCCTCAA
CGAAACAAGA
TTATTCACCT
TGCCCTTGGG
GAATCCAAAT
CATGGCCATG
TGATATTAAA
GAATCAAGAA
TTTACCACAA
ACAAGATTCA
TGAAGAATTA
TATTCTAGAT
AAAATCACTA
AATCAGTAAT
AAGTGATAAA
AAAGATGTGG
ATTAGAATTA
TTCAGATGGC
AGCAGAAACA
TATGTAGGTG
CCTGATTCTG
TTATGGACAC
GTGACTGCAA
AATTATGACT
TCATTAAGAG
AGTAGCAAGA
GCTCTAAZAAG
TCAGCATCTT
GTTCTAGGAT
ATGAATATCA
TGGATGCAAT
TCAAGATGTT
AGATTTATTA
CCAGGTGAGT
TCTCAAA.ATA
CCAAATCCAT
GCTGAGTTCC
AATTCTCTCA
ATTCGGGTTG
TACGATCTAG
ATCAAGTATG
ATTCATTTAT
CTATCTGGGG
ACAA.ACCCAT
GGTATATCGT
ATCCTTACTG
GTTTTTACGT
TCATATCTAT
TGGTTCAAGG
ACAGAGTTAA
A.AGTGATGGA
TGTTCATATA
CATTATCTAG
CAAATTTGGC
ATGCATGCTC
ATCCAACTAT
ATGCCTCTTT
TTGTAAGGAA
AGGCGAATCT
CATCTTTTTT
TAACCACCAT
TATTATCTGG
TGATGGACAG
CAGGAATTAG
GCATAAATAG
TACAATATGA
AAGATATGTG
CAGGAGGAAG
TAGTAATAAC
ATACTTGGAT
CATTAAGAGT
TCCTCCATCA
TCATAACCCA
AAGTGCAATA
AGACAATCAA
GAAGGAGATA
TGATCTAGGT
TAGCAAAAGA
ATGTGTCTTC
AACATCATTT
AATTTTTAAG
AACACAGAAT
AATACCTGCT
TATTGGTGAT
ATTAGACCGA
TGACTGGGCT
GATAAAAAAT
ATTATTCACA
GAAGGTAATT
AAATGCCATA
AGGAGGACTG
AACACTAAGT
TTCGGTAGAC
GATGATAAGT
AGGATCAGAA
GTATTTACCC
TCCTTATTTT
GATAAAGAAC
AGAGGGGGTA
CATCTAGCAG
GCTATAGCTG
GTTTATAAAG
CATGAACTTA
ATCTATTATG
TGGTCAGAGA
GCAAAAGCAA
AACATTCAAC
ATCAGAGATC
AGTGTTGGGG
CCATCAGTTG
AGTGTTCTTT
TCAGATCCAT
ATAACAGCAA
AATACAATGA
CTCCCTAGAG
GCTGGAATGT
ACATATAGTT
AGGACTTTGC
CTTGCCATAG
GGACTTGAAA
CATTGTAAAA
GGTAATATCA
GGATCAGTCA
10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 263 a. a.
a a a a a. a a a a *0
C
CTGATGAAAG
CCGCAATAAG
TGGAAGCCTC
TAACACCGGT
TGAAATTCTC
ACATGTCTAT
TGTTAACAGG
ACCCTATAGT
ATGAACATAT
TTATTTATGA
ACCATTCTTA
CAATATGTAC
AAGAGATIAT
CTCTTGACAT
ACACTCTTTA
CACTGAGAGA
AAGTATTCAA
CTAGTCAAGA
TGAGAGA.ATG
TTGCAAATGA
CAGAALATTGC
TATTGAAACA
TATCTGGATT
TCAAATATCT
TAGAAGATGA
ATAATAAAGG
AT CTGAAG CA
AATAGCAATG
ACAGATAGCA
AGCTACATCA
CAGTACATCA
CAA.AGAAGCT
ATTAAGTGTT
TATGCATCTG
TAATCCAGAG
TAAAGACCCA
CACAATTGAT
TGCAATTACA
AGTTATTGCA
ACTTGTATTT
TAGTCTAAAA
TACTTCCCAT
GAGGTTCTGG
CCAGATAAAA
GTTGAATGGT
TAGGAAACAA
ATCTTTCGGA
ATATCTTGAA
ATTAATTAAA
AAGGATTCGC
AAATATGCTG
GAATArT
CAATTAGGAT
ATATATACAT
CAAACACGTG
ACAAATTTAT
TTGATCAGAG
AATGAAACCA
TTCGAATATT
CACATAGAAG
TCTACATTAG
CTCAAAGATG
ATGAATTATT
ATAGCAGATA
AATGATGATG
CTCAAGACAT
ATAGAAGGTA
TCALATATTAA
GATTGTGGAG
CTTGCCCTAT
GTATCACTTG
GCCTTTATTT
CCTAACCTGT
TTAAATATTA
TCGTTCCCAT
GGTATTAGTC
GATAACATTG
AACAATTTCT
ATATCAAGAA
GGGCATTTGG
CAAATTTTAC
CACACAGATT
TCAGCAGATT
AAGATACTAA
TATTTAGATT
ATGAGTGTTG
AATTAATTCG
TGGACTTATC
GGGATGATAC
CTATGTCACA
ATATTAATAG
TTGGTGGATT
GGGATCTCAT
AAGTATTATC
TTTTAAACCC
CTATATGTGA
AALATATACAT
CTAGACACCT
TAA.ACTTAAC
A~AGAGACCC
CAACTGTAAC
CACCTGAGGT
rCAA.AACTAT
GGGGACTAGC
TCTTAGT)AJA
TAATGATGAG
ACTAGATAGT
AAAGGATACT
TATAACAATG
TCTTATTTAT
AAAAGAAACC
TATTAXAGAA~
ATATCCTGAA
AAAACTTATG
TGACATCATA
ATTAGATCGA
CTTAATCACT
ATTAGTAAAT
TTGGGATTAT
TAATGCATTA
TATTTATGGT
ATATTCACTA
TTGTGACAGC
TTCATTTGTT
ATACTTGGAG
TACTCTTAAA
ATACGTAAGA
AATTGATGAT
AAATGATAAC
ACTTAAGAAC
CCTGCAAA-AG
ATATCTTGGA
CTCAAAATTT
GCAACTCAGA
TCCAATGATA
CA.ACA2LATAA
ACAGGACACA
AGTTTTAATG
AGTAATGAAT
GTTATTAAAG
CATGCAATTT
GATAATTTAA
GAATTTTTGA
CAATTTGCAT
ATAATGAGAA
TCTCATCCTA
CCTAATATTG
GATCTATTTA
GATATGGAAG
rGTTGTTTAG kGACTTGATC 1ATGTACAAA kAGACTGCAA
LGGGATCCGG
rGTAATAAAG rATCAAGTCC 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 264
S
TTAAAATCAG
GTGGTTTGAC
TCAACAGCAC
ATAAAGACAA
ATGCCACATT
TTGGTCAACG
GAAATGTGAC
CATGGATAGG
TTGGATTAGT
ATGAACATTA
TTTCCAAAAT
TATATTGGAA
TATATCTAAT
CAAAACTTAA
CAAAGAAGAG
GAATCATGAG
ATCTGGCGAA
GTTTTCAGCG
AGAGACATAA
GACTGCTATC
TTACAGGTCG
TAGCTGATAC
GAGAGTGTAT
AATTGATTGG
AGTTATTAGA
TGAAGATATA
ATCTATAACA
ACTTCCTCAA
TAGTTGTCTG
GGACAGGCTC
AGGACCTGCA
AGAATTGAAA
ACAGATTCTT
AALATATGGAA
ACATTGTGAT
TAGTGTTATA
TATACCTACA
AGATGTAAGT
TTCGAALAGAT
AAGATTGTCA
GAATAATGAA
ACCATATCAT
AGAATTTTTA
AACAATAAAG
ATTAGGCGGA
GAGAAGACTA
CTTTCCTGAT
TGATTTAGAA
AGGATCAATA
TGGTGCTAAA
AAACTACAAT
TCCTAACCTT
AGTGATTCTG
GGAGGGAATT
AAAGCTCTTG
TTCCTGGGAG
GTTAATTATT
ATATTTCCTT
AACAGGGTAA
TGTGAGAGCT
ATGGAAGGAG
AGAATTACAT
ATCACTCCGA
ATAATATCAC
GCATATTGTA
CTCTTGGAAG
TGGTTACATC
ATGGCACTAC
TCAACCCCAG
GATGTTTTAT
AGATATAACA
GTATTAAGTT
GAAAA.ATTTG
TCATTAAAGT
TCATATTGGT
TTATTAGGAA
CAACATGATG
TATCTTTAAG
ATGATAATGA
ATCTATCGCA
AGTTATCACA
AAGGAGCAGG
ATAATTCAGG
CAGAGGTATC
AAGTACTGTT
TALATATGGAG
CTATCGGTAA
ACTTGATTGG
ATTGGTCTAG
TCAAAACTTC
CTATAATGGA
AAAATAATCT
ATGAAJATCA-A
AAATCTTTGG
ATCTGACTAA
TTGAATGGAT
TATTCCCACT
GGATTTCATT
AACATAGAGC
TATTGTCGAA
TTCTAACCAA
TTCCCAGACA
AATTTGATAT
CCTAGGAATA
TAGACTAGAT
TCAATTGAGA
AATTTTAATG
AGCTATGCTA
TTTGAATATA
ATTAGTAGGT
C-AATGGGAAT
TGAATTA.AAT
ATCAGAAGAA
GGATGATGAT
AATACTTTAT
TAATCCTGCA
ACCTAGTGAA
ATTAAAATGG
AGAAGGAGAA
ATTTCAAATC
TATCAACAAT
TAATATAACT
GAAAAATAAG
ATCATTATCG
ACAGACTGGA
AAACATCATT
AGAAGTTAAA
ATATAAAGAA
CGATTAAAAC
GACAAAAAGT
GCTALATACAA
TTATTCGGAA
AAGGAAGTCA
GCATGTTATG
ACAGATGTAA
AAAAAATTAG
CCTAATTCAA
GATAAGTCCA
ACTGTTCTAC
GTTGTTTTAG
CTATATAAAT
TCAACAGAAT
ATTGTTTTAT
ATCATTTTAT
AGAGATTATG
AATTTAAATC
ATAATCCA.AA
CATGATGATA
GGAAAGTTAA
ACTCGATTAC
TATGTATCAT
AAGALATTACA
ATACTTATGA
CCCGALAGACC
ATAAATACAA
AAGAAAAACA
13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 6 265 TGTAATATAT ATATACCAAA CAGAGTTCTT CTCTTGTTTG GT INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 2233 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID 15462 Met Asp Thr Giu Ser Asn Asn Gly Thr Val Ser Asp Ile Leu Tyr Pro 0000.
0 .so.
Giu His Ile Arg Asn Glu Leu Arg Gly 145 His Cys Thr Leu Gin Asp Met Leu Asp 130 Ser Thr His Ile Val Arg Leu Phe Leu 115 Leu Asn Thr Leu Met Ile Ser Gly Lys 100 Lys Trp Tyr Tyr Asn Ser Thr Ile Lys Leu Ala Ile Asp Lys 165 Ser Leu Arg Arg 70 Tyr Tyr Asp Asn Leu I50 Ser Pro Pro Gin 55 Arg Thr Ile Arg Val 135 Afin Asp Ile Gin 40 Lys Leu Phe Pro Thr 120 Leu Giu Lys Val 25 Pro Ile Lys Ile Gly 105 Tyr Ser Giu Trp Lys Tyr Lys Leu Arg 90 le Ser Lys Ile Tyr Gly Asp Leu Ile 75 Tyr Asn Gin Leu Asn 155 Asn Lys Met Asn Leu Pro Ser Met Ala 140 Asn Pro Ile Asp Lys Thr Giu Lys Thr 125 Ser Ile Phe Ala Asp Leu Giu Met Val 110 Asp Lys Ser Lys Gin Asp Asp Lys Ser Thr Gly Asn Lys Leu Ser Lys Val Lys Giu Leu Asp Val 160 Trp 170 175 Phe Thr Ile Lys Tyr Asp Met Arg Arg Leu Gin Lys Ala Arg Asn Giu 266 Ile Asn Asn 225 Asp Pro Ile Ser Val Glu .Asp Glu Leu Gly 385 Cys Pro Tyr Thr Phe 195 Phe Leu 210 Tyr Asn Val Val Lys Leu Asp Lys 275 Leu Leu 290 Lys Gin Leu Ile Tyr Ile Ile Ala 355 Giu Ala 370 Lys Gin Thr Ile Pro Vai Giy Ser 435 180 Asn Leu Gly Giu Gin 260 Leu Giu Leu Phe Asp 340 Giu Ser Leu Ile Thr 420 Asn Val Gly Lys Asp Ile Tyr Giy 245 Ser Phe Pro Arg Giu 325 Lys Ile Ile Lys Ile 405 Leu Ser His Leu 230 Arg Met Pro Leu Gly 310 Ser Ile Phe Ala Phe 390 Asn Pro Ala Pro 215 Ile Trp Tyr Ile Al a 295 Ala Arg Leu Ser Ala 375 Asp Gly Asp Ile Glu Thr Asn Gin Met 280 Leu Phe Giu Asp Phe 360 Glu Thr Tyr His Ser 440 185 Tyr Leu Pro Ile Lys 265 Gly Ser Leu Ser Ile 345 Phe Lys Ile Arg Ala 425 Tyr 190 Asn Leu Leu Glu Asp Gin Lys 205 Vai Giu Ser 250 Gly Glu Leu Asn Ile 330 Phe Arg Val Asn Giu 410 His Giu Leu Leu 235 Al a Asn Lys Ile His 315 Lys Asn Thr Arg Lys 395 Arg Giu Asn Ile Leu 220 Val Leu Cys Ala Asn Leu Thr Phe 285 Gin Thr 300 Val Leu Giu Phe Lys Ser Phe Gly 365 Lys Tyr 380 Cys His His Gly Phe Ile Ala Val 445 Asp Lys Gln Met Tyr Cys 240 Lys Leu Asp 255 Trp Giu Val 270 Asp Val Ile His Asp Pro Ser Glu Met 320 Leu Ser Val 335 Thr Ile Asp 350 His Pro Pro Met Tyr Ile Ala Ile Phe 400 Gly Gin Trp 415 Ile 
Asn Ala 430 Asp Tyr Tyr Gin Ser 450 Phe Ile Gly Ile Lys 455 Phe Asn Lys Phe Ile 460 Giu Pro Gin Leu 267 Asp 465 Lys Thr Asp Asp Lys 545 Met Gly Leu Glu Asn 625 Phe Ser Tyr Leu Ile 705 Giu Asp Leu Thr Ile Tyr Met Lye Asp Lys Ala Leu Ser Pro Lys 470 475 9.
C
9 Ser Asn Ser Trp 530 Glu Arg Lys Lys Val 610 Lys Glu Cys Giu Asn 690 Tyr Asn Ala Lys 515 Leu Ile Al a Phe Arg 595 Tyr Ile Phe Phe Ser 675 Lys Val Trp Ser 500 Phe Asp Lys Thr Phe 580 Leu Asn Ser Lys Leu 660 Thr Leu Gly Asp 485 Asn Asp Asp Gin Gin 565 Gin Thr Asn Asn Ser 645 Thr Ala Phe Asp His 725 Thr Giu Pro Pro Giu 550 Val Giu Thr Ser Leu 630 Thr Thr Leu Asn Pro 710 Val Ser His Glu 535 Gly Leu Asn Ile Lys 615 Asn Asp Asp Phe Trp 695 Tyr Pro Arg 505 Ile Asn Leu Glu Met 585 Ile His Ser Tyr Lys 665 Glu His Pro Al a 490 Leu Leu Ile Phe Thr 570 Val1 Ser Thr Ser Asn 650 Lys Thr Pro Pro Phe 730 Ser Val Asp Ser Al a 555 Leu Lys Gly Asp Asn 635 Asp Tyr, Cys Arg Ser 715 Asn Glu Tyr Tyr 540 Lys Leu Gly Val Asp 620 Gln Gly Cys Asn Leu 700 Asp Leu Val Val 525 Ser Met Al a Glu Pro 605 Leu Lys Tyr Leu Gin 685 Glu Lys Leu Phe 510 Glu Leu Thr Asn Ile 590 Arg Lys Ser Glu Asn 670 Ile Gly Glu Tyr 495 Ile Ser Lys Tyr Asn 575 Glu Tyr Thr Lys Thr 655 Trp Phe Ser His 480 Arg Ala Gly Glu Lye 560 Ile Leu Asn Tyr Lys 640 Val1 Arg Gly Thr Ile 720 Ser Leu Giu Asp Pro Asp Ser Gly Tyr Val His Asn Pro Arg 735 268 Gly Gly Ile Giu Gly Phe Cys Gin
V.
V
V. V V
C
1 740 Ser Ala Ile His 755 Met Val Gin Gly 770 Asn Aen Tyr Asp 785 Val Arg Phe Phe Giu Leu Lys Leu 820 Ser Lys Arg Ile 835 Ala Leu Ser Arg 850 Arg Ser Ala Ser 865 ksn Gly Tyr Ser Ile Gin Gin Leu 900 ['hr Gin Asn Ile 915 ['yr Ala Ser Leu 930 'let Ser Arg Cys 945 eu Ala Asp Ile Pal Leu Tyr Arg 980 ~sp Trp Ala Ser 995 Leu Asp Tyr Asp 805 Asn Tyr Cys Ser Pro 885 Tyr Arg Ile Phe Lys 965 Ile Asp Ala Asn Arg 790 Ser Glu Tyr Val Asn 870 Val Ile Asp Pro Vai 950 Arg Met Pro Ala Gin 775 Val Leu Thr Asp Phe 855 Leu Leu Ala Gin Al a 935 Arg Phe Asn Tyr Val 760 Ala Lys Arg Ile Gly 840 Trp Ala Gly Leu Tyr 920 Ser Asn Ile Gin Ser 1000 Lys 745 Arg Ile Lye Giu Ile 825 Arg Ser Thr Tyr Gly 905 Phe V.al Ile Lys G1u 985 Cys Leu Ile Ala Glu Val 810 Ser Ile Glu S er Ala 890 Met Arg Gly Gly Ala 970 Pro Asn Trp Gly Val Ile 795 Met Ser Leu Thr Phe 875 Cys Aen Aen Gly Asp 955 Asn Gly Leu Thr Val Thr 780 Val1 Asp Lys Pro Val 860 Al a Ser Ile Pro Phe 940 Pro Leu Giu Pro Leu Arg 765 Thr Tyr Asp Met Gin 845 Ile Lys Ile Asn Aen 925 Aen Ser Leu Ser Gin Ile 750 Val Arg Lys Leu Phe 830 Ala Asp Ala Phe Pro 910 Trp His Val Asp Ser 990 Ser Ser Thr Val Asp Gly 815 Ile Leu Giu Ile Lys 895 Thr Met Met Ala Arg 975 Phe Ile Al a Pro Val 800 His Tyr Lys Thr Giu 880 Asn Ile Gin Ala Ala 960 Ser Phe 1005 Ile Thr Thr Met Ile Lye An Ile Thr Ala Arg Asn Val Leu Gin Asp 269 1010 1015 1020 Ser Pro Asn Pro Leu Leu Ser Gly Leu Phe Thr Asn Thr Met Ile Giu 1025 1030 1035 1040 Glu Asp Giu Giu Leu Ala Giu Phe Leu Met Asp Arg Lys Val Ile Leu 1045 1050 1055 Pro Arg Val Ala His Asp Ile Leu Asp Asn Ser Leu Thr Gly Ile Arg 1060 1065 1070 Asn Ala Ile Ala Gly Met Leu Asp Thr Thr Lys Ser Leu Ile Arg Val 1075 1080 1085 Gly Ile Asn Arg Gly Gly Leu Thr Tyr Ser Leu Leu Arg Lys Ile Ser 1090 1095 1100 Asn Tyr Asp Leu Val Gin Tyr Giu Thr Leu Ser Arg Thr Leu Arg Leu 1105 1110 1115 1120 Ile Val Ser Asp Lys Ile Lys Tyr Glu Asp Met Cys Ser Val Asp Leu 1125 1130 1135 Ala Ile Ala Leu Arg Gin Lys Met Trp Ile His Leu Ser Gly Gly Arg 1140 1145 1150 
Met Ile Ser Gly Leu Glu Thr Pro Asp Pro Leu Glu Leu Leu Ser Gly 1155 1160 1165 Val Val Ile Thr Gly Ser Glu His Cys Lys Ile Cys Tyr Ser Ser Asp 1170 1175 1180 Gly Thr Asn Pro Tyr Thr Trp Met Tyr Leu Pro Gly Asn Ile Lys Ile 1185 1190 1195 1200 Gly Ser Ala Glu Thr Gly Ile Ser Ser Leu Arg Val Pro Tyr Phe Gly 1205 1210 1215 Ser Val Thr Asp Glu Arg Ser Glu Ala Gln Leu Gly Tyr Ile Lys Asn 1220 1225 1230 Leu Ser Lys Pro Ala Lys Ala Ala Ile Arg Ile Ala Met Ile Tyr Thr 1235 1240 1245 Trp Ala Phe Gly Asn Asp Glu Ile Ser Trp Met Glu Ala Ser Gln Ile 1250 1255 1260 Ala Gln Thr Arg Ala Asn Phe Thr Leu Asp Ser Leu Lys Ile Leu Thr 1265 1270 1275 1280 Pro Val Ala Thr Ser Thr Asn Leu Ser His Arg Leu Lys Asp Thr Ala 1285 1290 1295 270 Thr Gln Met Lys Phe Ser Ser Thr Ser Leu Ile Arg Val Ser Arg Phe 1300 1305 1310 Ile Thr Met Ser Asn Asp Asn Met Ser Ile Lys Glu Ala Asn Glu Thr 1315 1320 1325 Lys Asp Thr Asn Leu Ile Tyr Gln Gln Ile Met Leu Thr Gly Leu Ser 1330 1335 1340 Val Phe Glu Tyr Leu Phe Arg Leu Lys Glu Thr Thr Gly His Asn Pro 1345 1350 1355 1360 Ile Val Met His Leu His Ile Glu Asp Glu Cys Cys Ile Lys Glu Ser 1365 1370 1375 Phe Asn Asp Glu His Ile Asn Pro Glu Ser Thr Leu Glu Leu Ile Arg 1380 1385 1390 Tyr Pro Glu Ser Asn Glu Phe Ile Tyr Asp Lys Asp Pro Leu Lys Asp 1395 1400 1405 Val Asp Leu Ser Lys Leu Met Val Ile Lys Asp His Ser Tyr Thr Ile 1410 1415 1420 Asp Met Asn Tyr Trp Asp Asp Thr Asp Ile Ile His Ala Ile Ser Ile 1425 1430 1435 1440 Cys Thr Ala Ile Thr Ile Ala Asp Thr Met Ser Gln Leu Asp Arg Asp 1445 1450 1455 Asn Leu Lys Glu Ile Ile Val Ile Ala Asn Asp Asp Asp Ile Asn Ser 1460 1465 1470 Leu Ile Thr Glu Phe Leu Thr Leu Asp Ile Leu Val Phe Leu Lys Thr 1475 1480 1485 Phe Gly Gly Leu Leu Val Asn Gln Phe Ala Tyr Thr Leu Tyr Ser Leu 1490 1495 1500 Lys Ile Glu Gly Arg Asp Leu Ile Trp Asp Tyr Ile Met Arg Thr Leu 1505 1510 1515 1520 Arg Asp Thr Ser His Ser Ile Leu Lys Val Leu Ser Asn Ala Leu Ser 1525 1530 1535 His Pro Lys Val Phe Lys Arg Phe Trp Asp Cys Gly Val Leu Asn Pro
1540 1545 1550 Ile Tyr Gly Pro Asn Ile Ala Ser Gln Asp Gln Ile Lys Leu Ala Leu 1555 1560 1565 271 Ser Ile Cys Glu Tyr Ser Leu Asp Leu Phe Met Arg Glu Trp Leu Asn 1570 1575 1580 Gly Val Ser Leu Glu Ile Tyr Ile Cys Asp Ser Asp Met Glu Val Ala 1585 1590 1595 1600 Asn Asp Arg Lys Gln Ala Phe Ile Ser Arg His Leu Ser Phe Val Cys 1605 1610 1615 Cys Leu Ala Glu Ile Ala Ser Phe Gly Pro Asn Leu Leu Asn Leu Thr 1620 1625 1630 Tyr Leu Glu Arg Leu Asp Leu Leu Lys Gln Tyr Leu Glu Leu Asn Ile 1635 1640 1645 Lys Glu Asp Pro Thr Leu Lys Tyr Val Gln Ile Ser Gly Leu Leu Ile 1650 1655 1660 Lys Ser Phe Pro Ser Thr Val Thr Tyr Val Arg Lys Thr Ala Ile Lys 1665 1670 1675 1680 Tyr Leu Arg Ile Arg Gly Ile Ser Pro Pro Glu Val Ile Asp Asp Trp 1685 1690 1695 Asp Pro Val Glu Asp Glu Asn Met Leu Asp Asn Ile Val Lys Thr Ile 1700 1705 1710 Asn Asp Asn Cys Asn Lys Asp Asn Lys Gly Asn Lys Ile Asn Asn Phe 1715 1720 1725 Trp Gly Leu Ala Leu Lys Asn Tyr Gln Val Leu Lys Ile Arg Ser Ile 1730 1735 1740 Thr Ser Asp Ser Asp Asp Asn Asp Arg Leu Asp Ala Asn Thr Ser Gly 1745 1750 1755 1760 Leu Thr Leu Pro Gln Gly Gly Asn Tyr Leu Ser His Gln Leu Arg Leu 1765 1770 1775 Phe Gly Ile Asn Ser Thr Ser Cys Leu Lys Ala Leu Glu Leu Ser Gln 1780 1785 1790
Ile Leu Met Lys Glu Val Asn Lys Asp Lys Asp Arg Leu Phe Leu Gly 1795 1800 1805 Glu Gly Ala Gly Ala Met Leu Ala Cys Tyr Asp Ala Thr Leu Gly Pro 1810 1815 1820 Ala Val Asn Tyr Tyr Asn Ser Gly Leu Asn Ile Thr Asp Val Ile Gly 1825 1830 1835 1840 Gln Arg Glu Leu Lys Ile Phe Pro Ser Glu Val Ser Leu Val Gly Lys 272 1845 1850 1855 Lys Leu Gly Asn Val Thr Gln Ile Leu Asn Arg Val Lys Val Leu Phe 1860 1865 1870 Asn Gly Asn Pro Asn Ser Thr Trp Ile Gly Asn Met Glu Cys Glu Ser 1875 1880 1885 Leu Ile Trp Ser Glu Leu Asn Asp Lys Ser Ile Gly Leu Val His Cys 1890 1895 1900 Asp Met Glu Gly Ala Ile Gly Lys Ser Glu Glu Thr Val Leu His Glu 1905 1910 1915 1920 His Tyr Ser Val Ile Arg Ile Thr Tyr Leu Ile Gly Asp Asp Asp Val 1925 1930 1935 Val Leu Val Ser Lys Ile Ile Pro Thr Ile Thr Pro Asn Trp Ser Arg 1940 1945 1950 Ile Leu Tyr Leu Tyr Lys Leu Tyr Trp Lys Asp Val Ser Ile Ile Ser 1955 1960 1965 Leu Lys Thr Ser Asn Pro Ala Ser Thr Glu Leu Tyr Leu Ile Ser Lys 1970 1975 1980 Asp Ala Tyr Cys Thr Ile Met Glu Pro Ser Glu Ile Val Leu Ser Lys 1985 1990 1995 2000 Leu Lys Arg Leu Ser Leu Leu Glu Glu Asn Asn Leu Leu Lys Trp Ile 2005 2010 2015 Ile Leu Ser Lys Lys Arg Asn Asn Glu Trp Leu His His Glu Ile Lys 2020 2025 2030 Glu Gly Glu Arg Asp Tyr Gly Ile Met Arg Pro Tyr His Met Ala Leu 2035 2040 2045 Gln Ile Phe Gly Phe Gln Ile Asn Leu Asn His Leu Ala Lys Glu Phe 2050 2055 2060 Leu Ser Thr Pro Asp Leu Thr Asn Ile Asn Asn Ile Ile Gln Ser Phe 2065 2070 2075 2080 Gln Arg Thr Ile Lys Asp Val Leu Phe Glu Trp Ile Asn Ile Thr His 2085 2090 2095 Asp Asp Lys Arg His Lys Leu Gly Gly Arg Tyr Asn Ile Phe Pro Leu 2100 2105 2110 Lys Asn Lys Gly Lys Leu Arg Leu Leu Ser Arg Arg Leu Val Leu Ser 2115 2120 2125 273 Trp Ile Ser Leu Ser Leu Ser Thr Arg Leu Leu Thr Gly Arg Phe Pro 2130 2135 2140 Asp Glu Lys Phe Glu His Arg Ala Gln Thr Gly Tyr Val Ser Leu Ala 2145 2150 2155 2160 Asp Thr Asp Leu Glu Ser Leu Lys Leu Leu Ser Lys Asn Ile Ile Lys 2165 2170 2175 Asn Tyr Arg Glu Cys Ile Gly Ser Ile Ser Tyr Trp Phe Leu Thr Lys
2180 2185 2190 Glu Val Lys Ile Leu Met Lys Leu Ile Gly Gly Ala Lys Leu Leu Gly 2195 2200 2205 Ile Pro Arg Gln Tyr Lys Glu Pro Glu Asp Gln Leu Leu Glu Asn Tyr 2210 2215 2220 Asn Gln His Asp Glu Phe Asp Ile Asp 2225 2230
INFORMATION FOR SEQ ID NO:21: SEQUENCE CHARACTERISTICS: LENGTH: 15462 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:
ACCAAACAAG AGAAGAAACT TGCTTGGTAA TATAAATTTA
ACTTAAAATT
TTAAGACATT GACTAGAAGG TCAAGAAAAG GGAACTCTAT AATTTCAAAA TATTTGATAC ATTTAATGCA CGTAGGCAAG AAAACATAAC
AAAATCAGCC
TCATTCCTGG ACAGAAAAAT ACTGTCTCTA TATTCGCCCT
TGGACCGACA
ATAATGAGAA AATGACATTA GCTCTTCTAT TTCTATCTCA
TTCACTAGAT
AACATGCACA AAGGGCAGGG TTCTTGGTGT CTTTATTGTC AATGGCTTAT AGCTCTACCT AACAACAAAT GGAAGTAATG CAGATGCCAA
GTATGTCATA
AGAAAGATCT AAAACGGCAA AAGTATGGAG GATTTGTGGT TAAGACGAGA
AACTTAGGAT
ATGTTGAGCC
GGTGGAGCTA
ATAACTGATG
AATGAGAAAC
GCCAATCCAG
TACATGATTG
GAGATGATAT
120 180 240 300 360 420 480 274 ATGAAAAGAC AACTGATTGG ATATTTGGAA GTGACCTGGA TTATGATCAG GAAACTATGT
TGCAGA.ACGG
CATGTTTAGG
TCTCAGGGTT
TGCAGGCAGG
CTCAACAGAG
ATGACCTCAC
GTCTCGCTTC
CTCTATCCAC
CAXAGGGACC
CACCAGGCAA
GAGCCATGCA
GACAAGCAGT
GAGTGACACA
AGACATCTTT
CAGAACAATT
TTCAATATGC
CTGACAATAT
ACAAGAAGAA
AAATAGATGA
ATCAATAATA
GGTAAATTTA
AAACTATCAA
CTCGGCCCTC
AAACGACACA
CAAACCAACA
CAGGAACAAT
AGCTCTTATA
AAGAAAAGGC
GCTGGTATTG
CTTGGTAACT
AACCATAGAA
ATTCTTCAAT
TCTCAGACCA
ACGCGCTCCT
CTATCCTGCC
ACAGTATGTG
AGCACGTGAT
CGAAGCTAAA
CCACAAACCG
CGAACATAGA
CTGGGCAGAA
CAAGACCGAA
ACAAAGCAGT
TCTGTTTAAC
AATAAGAAAA
GAGTCTGCTT
ATCATGGATT
AACATCATTG
ATCAACACAA
GAAACAAGTG
TCAACAATTG
ATACAGATCT
TTTTTCACCC
AGCGGTGACA
CTTATGGTTG
AAGAATATAC
ACAATCAGAT
GATATCAATA
TTCATCTGTA
ATATGGAGCT
ACGGGAAGAT
GCCGAAGCTC
GAAAGCTTGA
ACAGGTGGAT
GCAGATCAAG
GGAAATAGAA
CAACAAAACA
CAACCACCCA
GCATTTGGAA
ACTTAGGATT
GAAACTCAAT
CTTGGGAAGA
AATTCATACT
GAACCCAGCA
AGAAAGATAG
AAGACCTTGT
GGATAGTTCT
GATTGGAAGC
CAGTGGATCA
AAACATTAAT
AAATTGTTGG
ATGGAATTGA
GATTAAAAGC
TCCTCAGAGA
ATGCAATGGG
CATATCTAGA
AAATGAGCTC
AGAGACATAT
CAGCCATAGA
AACAAAATGG
GCGATGATCA
TCAGAGACAG
CTAATCCCAC
GCAACTAATC
AAAGAATCCT
CAATAGAGAG
GGAATCAAGA
CAGCACCGAC
ACTCAGTGCC
TGGATCAACT
CCACACATTT
GGTCAAAGCT
TTTCAGACAA
GATTGGGTCA
AACAATGAAT
CAACTACATA
GACCAGAATG
TTTGATGGAA
TCCTATACAT
GGTGGCAGTT
CATTGATATG
AACACTGGAA
AAGGAACATA
GATGGCAATA
AGAACCTCAA
GACTGAGCAA
ACTAAACAAG
AAACAGAACA
GAATCAACAT
ATCATACCGG
TTGATGGAAA
GATAAATCAA
CCCCAAGAAG
ACCATCTGTC
GACAAAAATA
GGGTATCCAT
ATCACTAGTA
GATGGAACAG
ATCATGCGGT
ACCAGCAGAA
AGAGATGCAG
GCAGCTTTGA
CTGTATTTAT
GGTGAGTTCG
GTACAAAATA
TTCCAGCTAG
GATGAACTTG
AACAGTTCAG
GATGAAGAGC
TCATCCATAA
GCTACAGAAT
AGACTCAACG
AACCAGGACG
TTTAATCTAA
AATATAGGGT
GCGATGCTAA
CTAATATCTC
ACTTATCGGA
AACCAGAAAT
GACAGTCTG
600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 275 GTCATCACAC GAATGTACAA CAGAAGCAAA AGATAGAAAC ATTGATCAGG AAACTGTACA
GAGAGGACCT
AATCCCCAGA
CAATGAAATT
TGTTCCAAGC
TGATCATGGA
TACTGCTGCA
AAGTTCTTCA
CTGGTTTAAG
ATCAAAAGGG
AACAGAAATA
CAACAACACC
AACTTATACA
AAATGGAAAG
TCTATTGCAG
ACGAGTTGTA
CCTGGCAGGA
AAATGAAATG
GATAGAAAAT
TATGACTGAG
CAAAACAAAA
ACAAGGCATT
CGATGTACAA
AATACCCAAA
TCTCTCACAA
AGAAGTATCT
GGGAGAAGAA
AGCATCACAG
AGAAAGATGG
GAGATATCAG
AGAAGCCTGG
ACACCAGATG
ACACATCAAG
AAATCAAAAG
CAGAAGAAAA
CAGACAGAAT
GACCGGAACG
AAAGAATCGA
GAAAGGAAGG
AATCTTGGTG
TGTGTAGCAA
TTAGTCATAG
CTAAACCTCA
CAAAGAGAAC
AGAGGAGGAA
TTGAAAGAAG
GACAAGAATA
GTTAAATCAG
AAAGTGAGCA
AGCACAAAAC
GAATTAATGG
GCAGCTCAGA
ATTCTAAAAA
ATAAGGACTC
GAAGTGATGA
AATCTATCAG
ATGAAGAAGA
AAGATGACAA
ATACCGACAA
TCTCAAAGAC
CATCAGAAAC
AACAGACAAG
TCCGAACAAA
ATACAGAAGA
TAATTCAATC
ATGTACTAAA
GGGTTTCAAT
AAGCAGATCT
AACTGTCATT
AGAAAGACCA
AAAAGATCAA
TACCCGATCT
AGATATTAAG
GTACAATGAG
AATCATACAT
ACATGTTCAA
TAGTAGAGCT
TGGAACCCAA
TATTGAGGGG
CATATTTACA
TACACCTGAT
AATACTAATG
AAGAATTAAA
CCAGATACCA
AACAACCACC
ACAATCCTCA
CACAACTCCT
CTCTGAATCC
GAGCAATCGA
CACATCAAAA
CAATGTAGAT
GGACAACGAC
AAAGAAAATG
GATCACGTCA
AAATGAATCC
GAAGACCAGG
ATATCGACAT
TTCATACAAT
ATCACTAGTT
A;ACGAACTC
TGAAGATGTC
GAGACTGTGG
AACACGGAGG
AAAATGCGAC
ACAGAACAAA
ACAAGATCAA
AAAAATAGTA
AAAGGGGGAA
ACATCAGACT
AACACCGACA
TCATGGAATC
CCAACAACAA
AAACCCAAGA
TTTACAGAGA
CTAGATTTAT
ACTGCATCAA
ACAAAATTAA
GACGAATCAC
CTAATTTCAA
AATGAGAGAG
TTTGACCCAC
GCAGGAGATA
GAGTCAAATG
GCAGTCATCA
AAACCTTGCA
AACAATTGCC
TCTCTGGAGG
ATATTGATCT
AATCTGCAAA
GTAGAAACAG
TAAGTGTTGT
GGACAAAGAA
XAGGGAAAGA
ACAGATCCAC
CAAAGGGGCA
TCATCATCGA
CTTCCAGATC
CACAAAAGAC
GGGCAATTAC
ATCAAGACAA
AGATAGATTT
CACAGATACA
ATAGAAGATT
ATCTCAAAAT
TATCCATGAT
TTATGGAGGC
CACTAGAGAA
CAACAAGACT
ACAACAGCA
AAAATGATGA
AATGATCCAA
2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 276
CAAAGAAACG
ACACAAAATC
AGGATTAAAG
CATTCCCAGA
ATGAACAGAG
ACGGATCCCG
ACAAATACGG
GATCATTACC
CAACCAAACT
CGGTACAAAA
TGTTCGATGC
AATTTAGAGT
CTAAGTCAAT
TAAAAACAGG
AAAAATCACT
ACTCTGTTGA
TAGTTGGAGG
GTCAGCTGGT
ATCTAGTTAT
CTTTACCTGG
AACAATGGAA
AAGGATAATC
TCGCAAGAAT
ACAGAACACC
GAGACCGGCA
ACACCGAACA
AAGCAGAATG
AATAAATTAA
ATCATCATTC
GAAAGCAGTA
GTATTTAGAT
GAGTGTGAAT
AATCGGATTG
GGATATAGAA
TATAAAACCA
CAACAAAGTT
AATCTTCGTG
GGCATCACTA
GGTTCAGACT
GAATTTCATG
ATACTGTAAA
AATCAGTCTT
ATTCAAAAGA
CTGGGCTTCA
CGAGTTCAGA
CTAGTAATCT
AAAAACTTAG
AAGAGAGAAG
AGAACAACAA
ACACAACAAG
AACAGACAAG
AAACAACAGA
TCCTTGTCCA
TCTGAAAATG
CCCCACATTA
GTCTTCTTAC
GATCTCGACA
GCTAAGTACA
GTGAGAAGAA
GAACTGTACC
GCTCTTGCTC
AATTGTACGG
TCTCTAACCA
GATTCTAAAG
GTCCATCTCG
CAGAAAATCG
CATGTCAATG
GAGATTTGTT
TCAGTAGAGA
TACTATCCTA
CTATTTTAGT
GACAAAAGAG
GGACCAAAAA
AATCAAAACA
CACTGAACAC
AAACAACAGI
TATCAATCAA
AAATGAGTAT
GTCATATAGA
GAGTTGCCAA
TCGGCTTCTT
GTGACCCGAG
CTGGGAATGA
CAGTCAAAGC
CATGGTCCAA
CTCAATGTCT
CAATTGGATC
ACACAATATC
GGATAGTTCA
GATTGATCAA
AGAAAATGAG
CAACTGGGTC
ATCCTTTAAT
TTACAAGAGT
ATATTATTGC
CCGGACGTAT
GTCAATACCA
AGTCAAATAG
TCCAACTCAC
P.ATGCCAA~CT
AGATCAAAAC
TATACAAATA
AACTAACTCT
ACCATTACCA
GATCGGAAAT
CGAGATGGAA
TTACAAAGTT
CCAGGAATTG
GAAAGAGATG
TAGACTAAGA
TCCACTAGAT
AATAACCTTG
AATCAATCTG
AATTTTGGAT
AAGAAAAGTA
ATTGATATTT
CATATCAAAA
GGATCTAAAT
GGATGCAATT
AAAAGGAGTT
CTATTAAGCC
ACAACTATTA
GAGAAATCAA
TCAAAACAAA
TCAATACTGC
CTGTCAACAC
AGAAAAACTT
GCAATATACA
CTCAAAGTCA
CCACCAAAAC
CGAATCAAAG
TGTGGCTCTG
TTACAAGCCG
GTTGTTTACA
AAAGGAATGC
AGGAGCATAA
TTCAAAATTC
CAGGTACACA
GAGAAAGGCG
GGCAGAATGT
TCTTTAGGAC
ACACTAGCAA
CCGCATCTCA
TTCCAACCTT
GGGAAAATCA
GAAGCAAATA
GCAGTCACAC
AACAAAAGGT
AATTCCAAAA
TAATTATTAC
3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100
AACCATGATC ATGGCATCTT TCTGCCAAAT AGATATCACA AAACTACAGC ACGTAGGTGT 5160 277
ATTGGTCAAC
TTTGAGCCTC
ATACAAGAAG
AGATGTGATA
CTTTGGAGGG
GGCAGTTGCT
AATTAGGGAC
AGCAATTAAA
AGGTTGTGAA
AACAAACATA
TATAGCATCA
ATATGATATC
CTTGAATGAT
CACTCAGATC
CCCTCTTCCC
ATGTATAGAA
TGAAATAGAG
AGACATTGTT
CACCTGTACA
AATTATAACA
TAAAGAAGGA
ACTTGATCCA
AAAAGAATGG
TAGCACTACA
GATAATTACA
TGACAAGCCA
AGTCCCAAAG
ATACCAAAA
TTATTGGATA
GTAACCAATC
GTAATTGGAA
CTGGTTGAAG
ACAAATAAAG
TCAGTCCAGG
GCAGCAGGAC
TTTGGTGATA
TTATACCGCA
TATGATCTGT
TACTCAATCA
TACAkAGTAG
AGCCATATCA
GCATTCAGCA
AGCTGCTTAT
CCAAGATATG
TGCAACGGAA
CATAAAGAAT
ACTCTTGCAT
ATTGACATAT
ATAAGAAGGT
ATCATAATTA
ATTGCAATTA
TATGTACTA
GGATGAAGAT
TAGAAGACTC
GACTGATCAT
AAGAATCCAA
CCATTGCTCT
CCAAGCAGGC
CAGTGCAGTC
ATTATGTTAA
TTCAATTAGG
ACATAGGATC
CAAATATCAC
TATTTACAGA
CCCTCCAAGT
ATTCCATATC
TGACGAAAGG
GCTATATATG
CAGGAAACAT
CATTTGTCAA
TTGGTAATAG
GTAGTACAGT
TCTATACACC
CAATCGAGCT
CAAATCAAAA
TTTTGATAAT
AGTATTACAG
CAAACAAATA
ATCACAAAAC
TAACTCTTGT
CCCTTTATAT
TGAAAACACT
GGGAGTAGCA
AAGATCAGAC
AGTTCAGAGC
CAAAGAAATC
AATTGCATTA
GTTACAAGAA
AGAAATATTC
ATCAATAAAG
CAGACTCCCT
ATATAACATC
GGCATTTCTA
CCCTTCTGAT
ATCCCAATGT
TGGAGGAGTG
AATCAATCAA
AGGTATCAAC
AAATGATATA
CAACAAGGCC
ACTAGATTCT
GATCATTATA
AATTCAAAAG
ACATATCTAC
TTTGAAACAA
GGTGACCAAC
GATGGATTAA
GATCCCAGAA
ACCTCAGCAC
ATCGAAAAAC
TCCATAGGAA
GTGCCATCGA
ACACAGCATT
AAAGGALATIA
ACAACATCAA
GTGAGAGTTA
TTATTAACTA
CAAAACAGAG
GGTGGAGCAG
CCAGGATTTG
CCAAGAACAA
GTTGCAAACT
CCACCTGATC
GGAATGCTGT
ACACTAAACA
AAATCAGATC
ATTGGAAATT
TTGTTTATAA
AGAAATCGAG
AGATCATTAG
GATATCTAAT
AGATCAAGCA
GATTACAGAA
CAAAACGATT
AAATTACAGC
TCAAAGAAGC
ATTTAATAGT
TTGCGAGGCT
ACTCAGAATT
AATTACAAGG
CAGTTGATAA
TAGATGTTGA
GGCTGCTGAA
AATGGTATAT
ACGTCAAAGA
TATTAAACCA
CGGTCACATC
GTATAACAAC
AAGGAGTAAA
TCAATACAAA
ATTCTGTTAC
TAGAAGAATC
GGCATCAATC
TTAATATAAC
TGGATCAAAA
ATATTAAAAT
5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 278 TATAAAAAAC TTAGGAGTAA AGTTACGCAA TCCAACTCTA CTCATATAAT TGAGGAAGGA
CCCAATAGAC
TGCTGGCAAT
AATATACATA
AATTAATTCC
GTTTATGGAA
GTCAGGAGTG
ATCATTGACA
TGATAATCAA
TCCAGATGAT
AAGGTTAATG
AACTCCGTCT
AGGTTGTCAG
CTCAGACTTG
TAGGAAGTCA
CAAAGTTGAT
TGTCAATTAT
TCAACCATAT
AATATTTCTC
AACTGGGTGC
TTCAGATAGG
AAAATTGAAA
ACTTCTACTA
ACAATTAGGA
TAATGTGCTA
ATGTATAACA
AAATCCAAAT
GAGCTGGAGA
TTATGGACAA
ATCAAAAGTG
ATTACAGAAA
AATACAAGGC
CAACAGATGT
GAAGTGCTGC
TTTTGGAGAT
CCAGGGCCGG
TTAGTTATAA
GATATAGGAA
GTACCTGACT
TGTTCTCTAG
GAAAGATCAG
GATGGTTCAA
GCTGCACTAT
GGGTATGGAG
CCCGGGAAAA
AGGATGGTCA
GTATGGACGA
GGTAACAAGA
ATAATTGATA
TCAAGACCAG
GGAGTATATA
TCGAGATGGA
CGTCTATGGC
TAATCCTGGT
AAAAGGCCCA
AGATCCAAAT
TTCTTACAAT
CAGATCTTAG
CACAAPLGAAT
GCACGTCTGG
GATTATTAGC
ATGATCTGAT
AATCATATCA
TAAATCCTAG
CACTCCTAAA
ATTATGCATC
TCTCAACAAC
ACCCATCTGT
GTCTTGAACA
CACAGAGAGA
ACTCCATCAT
TATCTATGCG
TCTATATATA
TTACTGATTA
GAAACAATGA
CTGATGCATA
ATACTGGAAG
TACTCATGGC
GTTATTATCA
CGAATCATTG
GGCATCGGAT
TCAGAGTCAT
GAAATTCATT
AACACATGAT
TCTTCCATCT
TATGCCAACG
TTATGCTTAT
AGTCTTACAG
GATCTCTCAT
TACAGATGTA
ATCAGGCATA
AAGATTTAAG
TGGACCAGGG
TCCAATAAAT
CTGTAATCAA
TGTTGCTGAC
ACAAAATTAC
TACAAGATCT
CAGTGATATA
ATGTCCATGG
TCCACTCAAT
CATACCAATC
AACAAGCTCA
ATAGTCTTCA
CTGCAAGACA
AATACCAATG
GTCCAGAATT
AGTGAAATTA
GTAGGTATAA
TTAATGAAAA
ACTGTTGATG
ACCTCAAATC
ATAGGGATAA
ACCTTTAACA
TATCAACTGT
GAAGATATTG
AATAATAACA
ATATACTACA
GAGAATGTAA
GCGTCTCATA
AAAGGCTTAA
TGGGGGTCAG
ACAAGTTGGC
AGGATAAAAT
GGACATTCAT
CCCACAGGGA
ACGGAAAGGA
CTAATAAGAT
TCATAGTGCT
TAAATAATGA
ATCTAATACA
ACATACCAAT
CAATTAGAAA
AACCTTTAAA
CTCCAAAAAT
GCTGTGTTAG
TAATTACTCG
TAACTGTAA
TAAATGACAA
GTTCAACTCC
TACTTGATAT
TAAGCTTTGA
AAGGCAAAAT
TCTGCAACAC
GTCCATGGTT
ACTCAATTCC
AAGGAAGGTT
ATAGCAAGTT
GGACATGGCA
GTCCAGATGG
GCATTGTGTC
6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 279
ATCTGTCATA
AACCGAAAGA
AACAAGCTGC
TAAAAGCTTA
TTAATCATAA
ATCAGCAATC
GGGAAATGGA
ACCTTAACTC
CTCAGCCTTA
TTAATAAATT
AAGTGAATGA
TCAAATTATA
GAACATATAG
TAGCCTCAAA
AAGTTCACAC
TCAAGTATGA
AGGATTATAA
TGATATTAGA
ATTGTGACGT
TACAATCTAT
TTATGGGAGA
TTCAAACTCA
AGATGGAATT
TTGATAAAAT
CTTTTTTTAG
TTAGACTCAC
GTAAACGAGC
ATTACACACT
AACACATTTC
TTAACCATAA
AGACAATAGA
CACTGAATCT
TCCTATCGTT
TGATATGGAT
GGATAAAAGA
CTTAGGAAAA
TATACCTGGT
TCAAATGACT
AAATGATGGA
AACCTATAkA
TATGAGAAGA
CTTGTTAGAA
TAAACAAAC
AGTCGAAGGC
GTATCAGAA
AAAGACATTT
TGATCCTGTT
AATATTTGAA
TTTAGATATA
AACATTTGGG
AAAAATCGAG
TGGCCATCCT
ATAACAAAGG
AACCCATGTT
TATGCATCAA
CAAAAGGGAA
AACAATGGCA
AAAGGTAAAA
GACGACTCAA
CAACGATCTA
TACACATTTA
ATTAACAGTA
GATGGATTAA
AGCAATTATG
TCAGATAAAT
TTACAAAAAG
GACCAGAAGA
TACAATGGTT
CGATGGAATA
GGTAATAACC
GATGTGATAT
AAACAACTAA
TCTAGAGAAT
TTTAATAAGT
CATCCTCCAT
AGTGAACCCA
AAACAGAACA
ATATTGTTTT
GTTCAAAACA
TCTATCTATA
ATATAAAAAA
CTGTATCTGA
TAGCACAATT
TACTAGTTAT
TTAGAAGATT
TCAGATATCC
AAGTGACTGA
GAGATCTATG
ATCTTAATGA
GGTATAATCC
CTCGAAATGA
ATTTCTTATT
ATCTAATTAC
TAAGTGCATG
TGTGGGAAGT
CGTTATTAGA
GAGGAGCTTT
CGATTAAGGA
CTACAATAGA
TAGAAGCTAG
GTCATAACTT
CTCTCAGCTG
CATATAGTAG
GAGATTCCAA
ATACAAGTAT
CTTAGGAGCA
CATACTCTAT
ACACACTATT
CACTAGACAG
AAAATTAATA
AGAAATGTCA
ATTATTACTT
GATTAATGTG
AGAAATTAAT
ATTCAAAACA
GATCACTTTT
GATACATCCA
TCCTGAATTA
TGCTAAGTTA
GATAGATAAA
ACCACTTGCA
TTTAAATCAT
ATTTCTGAGT
TGAAATAGCA
TATTGCAGCA
ACTCAACAGC
GATATACAAC
AAATAAATCA
AAAGCTGCAG
ATGATAAGTA
AAGCGTGCTC
CCTGAGTGTC
ATGAGTCTAC
AAAATAAAAC
TTAACTGAAA
AAAGAAATGT
AAAGCAGATA
CTATCAAAAT
AATATATCGA
TGGTTTACTA
AATGTTGGGA
GAATTGGTTT
GTATTGATGT
GATCCAAAAT
TTGTTTCCAA
TTATCCTTAA
GTGTTATCCG
GTAGATTACA
GAGATTTTCT
GAAAAGGTTA
8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 GAAAATATAT GTATATTGGA AAACAATTAA AATTTGACAC TATTAATAAA TGTCATGCTA 280
TCTTCTGTAC
TGACATTACC
TATCATATGA
TCATAGAGCC
CAAAAAAATC
CATCCAACGA
ATCAGATATT
CTTATAGTCT
ACAAAATGAG
TCTTTCAAGA
TATCAATATC
ATGACCTTAA
AGAAATTTGA
TCCTAACAAC
TTGGAGAAAC
GTCTTGAAGG
ATATATCATT
TAGAAGGATT
CTGTTAGAAT
TAACCACAAG
ATGTAGTGAG
AATTAAATGA
ATGGGAGAAT
CAGTAATAGA
TTGAGkATGG
AACTATATAT
AATAATAATT
TGATCATGCA
GAATGCTGTT
TCAGTTAGAT
AAATTGGGAC
ATCACGAAGA
GGATTATGTA
TAAAGAAAAA
AGCTACACAA
AAATGGGATG
AGGAGTTCCA
AACCTACAAT
ATTCAAGTCA
AGATCTCA7A
TTGCAACCAA
AAGTACAATC
AGAGGATCAC
TTGTCAAAAA
AGGCGTGAGG
AGTACCCAAC
ATTTTTTGAT
AACGATTATA
TCTTCCTCAA
CGAAACAAGA
TTATTCACCT
TGCCCTTGGG
AACGGATATA
CACGAATTCA
GATTATTACC
GAGGATTTGA
ACAGTTTATC
TTAGTTGAAG
GAATCTGGGG
GAGATCAAAC
GTTTTATCAG
GTGAAGGGAG
CGGTATAATG
AAAAJTAAGTA
ACGGATATC2
AAATACTGTC
ATATTTGGAT
TATGTAGGTG
CCTGATTCTG
TTATGGACAC
GTGACTGCAA
AATTATGACT
TCATTAAGAG
AGTAGCAAGA
GCTCTAAAAG
TCAGCATCTT
GTTCTAGGAT
ATGAATATCA
GAGAGAGGCA
TCATAAATGC
AGAGCTTTAT
CAATTTATAT
CTGCATCTAA
TATTTATAGC
ACTGGTTAGA
AGGAAGGTAG
AGACACTACT
AGATTGAATT
AAGTGTACIA
ATCTTAATTT
ACAATGATGG
TTAATTGGAG
TAAATAAATT
ATCCTTACTG
GTTTTTACGT
TCATATCTAT
TGGTTCAAGG
ACAGAGTTAA
AAGTGATGGA
TGTTCATATA
CATTATCTAG
CAAATTTGGC
ATGCATGCTC
ATCCAACTAT
TGGTGGACAG
TTACGGTTCA
AGGAATAAAA
GAAAGATAAA
TTTACTGTAC
AGATAGTAAA
TGATCCAGAA
ACTCTTTGCA
TGCAAATAAC
ACTTAAGAGA
TAATTCTAIA
GTCTTCTAAT
ATACGAGACT
ATATGAATCA
GTTTAATTGG
TCCTCCATCA
TCATAACCCA
AAGTGCkATA
AGACAATCAA
GAAGGAGATA
TGATCTAGGT
TAGCAAAAGA
ATGTGTCTTC
AACATCATTT
AATTTTTAAG
AACACAGAAT
TGGCCTCCTG
AACTCTGCGA
TTCAATAAAT
GCATTATCTC
CGTACTAACG
TTTGATCCTC
TTTAATATTT
AAAATGACAT
ATAGGAAAAT
TTAACAACCA
AGCCATACAG
CAGAAATCAA
GTGAGCTGTT
ACAGCTCTAT
TTACACCCTC
GATAAAGAAC
AGAGGGGGTA
CATCTAGCAG
GCTATAGCTG
GTTTATAAAG
CATGAACTTA
ATCTATTATG
TGGTCAGAGA
GCAAAAGCAA
AACATTCAAC
ATCAGAGATC
9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 281
AGTATTTTAG
GATTCAATCA
CCGCATTGGC
ATAGGATTAT
ATTCATGCAA
GGAATGTATT
TAGAAGAAGA
TTGCACATGA
TAGATACGAC
TGTTGAGGAA
GACTAATTGT
CATTGCGACA
CGCCTGACCC
TATGTTATTC
AAATAGGATC
CTGATGAAAG
CCGCAATAAG
TGGAAGCCTC
TAACACCGGT
TGAAATTCTC
ACATGTCTAT
TGTTAACAGG
ACCCTATAGT
ATGAACATAT
TTATTTATGA
ACCATTCTTA
GAATCCAAAT
CATGGCCATG
TGATATTAAA
GAATCAAGAA
TTTACCACAA
ACAAGATTCA
TGAAGAATTA
TATTCTAGAT
AAAATCACTA
AATCAGTAAT
AAGTGATAAA
AAAGATGTGG
ATTAGAATTA
TTCAGATGGC
AGCAGAAACA
ATCTGAAGCA
AATAGCAATG
ACAGATAGCA
AGCTACATCA
CAGTACATCA
CAAAGAAGCT
ATTAAGTGTT
TATGCATCTG
TAATCCAGAG
TAAAGACCCA
CACAATTGAT
TGGATGCAAT
TCAAGATGTT
AGATTTATTA
CCAGGTGAGT
TCTCAAAATA
CCAAATCCAT
GCTGAGTTCC
AATTCTCTCA
ATTCGGGTTG
TACGATCTAG
ATCAAGTATG
ATTCATTTAT
CTATCTGGGG
ACAAACCCAT
GGTATATCGT
CAATTAGGAT
ATATATACAT
CAAACACGTG
ACAAATTTAT
TTGATCAGAG
AATGAAACCA
TTCGAATATT
CACATAGAAG
TCTACATTAG
CTCAAAGATG
ATGAATTATT
ATGCCTCTTT
TTGTAAGGAA
AGGCGAATCT
CATCTTTTTT
TAACCACCAT
TATTATCTGG
TGATGGACAG
CAGGAATTAG
GCATAAATAG
TACAATATGA
AAGATATGTG
CAGGAGGAAG
TAGTAATAAC
ATACTTGGAT
CATTAAGAGT
ATATCAAGIA
GGGCATTTGG
CAAATTTTAC
CACACAGATT
TCAGCAGATT
AAGATACTAA
TATTTAGATT
ATGAGTGTTG
AATTAATTCG
TGGACTTATC
GGGATGATAC
AATACCTGCT
TATTGGTGAT
ATTAGACCGA
TGACTGGGCT
GATAAAAAAT
ATTATTCACA
GAAGGTAATT
AAATGCCATA
AGGAGGACTG
AACACTAAGT
TTCGGTAGAC
GATGATAAGT
AGGATCAGAA
GTATTTACCC
TCCTTATTTT
TCTTAGTAAA
TAATGATGAG
ACTAGATAGT
TAAGGATACT
TATAACAATG
TCTTATTTAT
AAAAGAAACC
TATTAAAGAA
ATATCCTGAA
AAAACTTATG
TGACATCATA
AGTGTTGGGG
CCATCAGTTG
AGTGTTCTTT
TCAGATCCAT
ATAACAGCAA
AATACAATGA
CTCCCTAGAG
GCTGGAATGT
ACATATAGTT
AGGACTTTGC
CTTGCCATAG
GGACTTGAAA
CATTGTAAAA
GGTAATATCA
GGATCAGTCA
CCTGCAAAAG
ATATCTTGGA
CTCAAAATTT
GCAACTCAGA
TCCAATGATA
CAACAAATAA
ACAGGACACA
AGTTTTAATG
AGTAATGAAT
GTTATTAAAG
CATGCAATTT
11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 282 CAATATGTAC TGCAATTACA ATAGCAGATA CTATGTCACA ATTAGATCGA GATAATTTAA
AAGAGATAAT
CTCTTGACAT
ACACTCTTTA
CACTGAGAGA
AAGTATTCAA
CTAGTCAAGA
TGAGAGAATG
TTGCAAATGA
CAGAAATTGC
TATTGAAACA
TATCTGGATT
TCAAATATCT
TAGAAGATGA
ATAATAAAGG
TTAAAATCAG
GTGGTTTGAC
TCAACAGCAC
ATAAAGACAA
ATGCCACATT
TTGGTCAACG
GAAATGTGAC
CATGGATAGG
TTGGATTAGT
ATGAACATTA
TTTCCAAAAT
AGTTATTGCA
ACTTGTATTT
TAGTCTAAAA
TACTTCCCAT
GAGGTTCTGG
CCAGATAAAA
GTTGAATGGT
TAGGAAACAA
ATCTTTCGGA
ATATCTTGAA
ATTAATTAAA
AAGGATTCGC
AAATATGCTG
GAATAAAATT
ATCTATAACA
ACTTCCTCAA
TAGTTGTCTG
GGACAGGCTC
AGGACCTGCA
AGAATTGAAA
ACAGATTCTT
AAATATGGAA
ACATTGTGAT
TAGTGTTATA
TATACCTACA
AATGATGATG
CTCKAGACAT
ATAGAAGGTA
TCAATATTAA
GATTGTGGAG
CTTGCCCTAT
GTATCACTTG
GCCTTTATTT
CCTAACCTGT
TTAAATATTA
TCGTTCCCAT
GGTATTAGTC
GATAACATTG
AACAATTTCT
AGTGATTCTG
GGAGGGAATT
AAAGCTCTTG
TTCCTGGGAG
GTTAATTATT
ATATTTCCTT
AACAGGGTAA
TGTGAGAGCT
ATGGAAGGAG
AGAATTACAT
ATCACTCCGA
ATATTAATAG
TTGGTGGATT
GGGATCTCAT
AAGTATTATC
TTTTAkACCC
CTATATGTGA
AAATATACAT
CTAGACACCT
TAAACTTAAC
AAGAAGACCC
CAACTGTAAC
CACCTGAGGT
TCAAAACTAT
GGGGACTAGC
ATGATAATGA
ATCTATCGCA
AGTTATCACA
AAGGAGCAGG
ATAATTCACG
CAGAGGTATC
AAGTACTGTT
TAPLTATGGAG
CTATCGGTA
ACTTGATTGG
ATTGGTCTAG
CTTAATCACT
ATTAGTAAAT
TTGGGATTAT
TAATGCATTA
TATTTATGGT
ATATTCACTA
TTGTGACAGC
TTCATTTGTT
ATACTTGGAG
TACTCTTAAA
ATACGTAAGA
AATTGATGAT
AAATGATAAC
ACTTAAGAAC
TAGACTAGAT
TCAATTGAGA
AATTTTAATG
AGCTATGCTA
TTTGAATATA
ATTAGTAGGT
CAATGGGAAT
TGAATTAAAT
ATCAGAAGAA
GGATGATGAT
AATACTTTAT
GAATTTTTGA
CAATTTGCAT
ATAATGAGAA
TCTCATCCTA
CCTAATATTG
GATCTATTTA
GATATGGAAG
TGTTGTTTAG
AGACTTGATC
TATGTACAAA
AAGACTGCAA
TGGGATCCGG
TGTAATAAAG
TATCAAGTCC
GCTAATACAA
TTATTCGGAA
AAGGAAGTCA
GCATGTTATG
ACAGATGTAA
AAAAAATTAG
CCTAATTCAA
GATAAGTCCA
ACTGTTCTAC
GTTGTTTTAG
CTATATAAAT
13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 283
TATATTGGAA
TATATCTAAT
CAAAACTTAA
AGATGTAAGT
TTCGAAAGAT
AAGATTGTCA
CAAAGAAGAG
GAATAATGAJA
GAATCATGAG
ATCTGGCGAA
GTTTTCAGCG
AGAGACATAA
GACTGCTATC
TTACAGGTCG
TAGCTGATAC
GAGAGTGTAT
AATTGATTGG
AGTTATTAGA
TGAAGATATA
TGTAATATAT
ACCATATCAT
AGAATTTTTA
AACAATAAAG
ATTAGGCGGA
GAGAAGACTA
CTTTCCTGAT
TGATTTAGAA
AGGATCAATA
TGGTGCTAAA
AAACTACAAT
TCCTAACCTT
ATAATATCAC
GCATATTGTA
CTCTTGGAAG
TGGTTACATC
ATGGCACTAC
TCAACCCCAG
GATGTTTTAT
AGATATAACA
GTATTAAGTT
GAAAAATTTG
TCATTAAAGT
TCATATTGGT
TTATTAGGAA
CAACATGATG
TATCTTTAAG
TCAAAACTTC
CTATAATGGA
AAAATAATCT
ATGAAATCAA
AAATCTTTGG
ATCTGACTAA
TTGAATGGAT
TATTCCCACT
GGATTTCATT
AACATAGAGC
TATTGTCGAA
TTCTAACCAA
TTCCCAGACA
AATTTGATAT
CCTAGGAATA
CTCTTGTTTG
TAATCCTGCA
ACCTAGTGAA
ATTAAAATGG
AGAAGGAGAA
ATTTCAAATC
TATCAACAAT
TAATATAACT
GAAAKATAAG
ATCATTATCG
ACAGACTGGA
AAACATCATT
AGAAGTTAAA
ATATA&AGAA
CGATTAAAAC
GACAAAAAGT
GT
TCAACAGAAT
ATTGTTTTAT
ATCATTTTAT
AGAGATTATG
AATTTAAATC
ATAATCCAAA
CATGATGATA
GGAAAGTTAA
ACTCGATTAC
TATGTATCAT
AAGAATTACA
ATACTTATGA
CCCGAAGACC
ATAAATACAA
AAGAAAAACA
14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15462 ATATACCA
CAGAGTTCTT
INFORMATION FOR SEQ ID NO:22: SEQUENCE CHARACTERISTICS:
LENGTH: 2233 amino acids TYPE: amino acid STRANDEDNESS: TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:
Met Asp Thr Glu Ser Asn Asn Gly Thr Val Ser Asp Ile Leu Tyr Pro 1 5 10 Glu Cys His Leu Asn Ser Pro Ile Val Lys Gly Lys Ile Ala Gln Leu 284 His Ile Arg Asn Glu Leu Arg Gly 145 His Phe Ile Asn Asn 225 Asp Pro Ile Ser Thr Leu Gln Asp Met Leu Asp 130 Ser Thr Thr Thr Phe 210 Tyr Val Lys Asp Leu 290 Ile Val Arg Leu Phe Leu 115 Leu Asn Thr Ile Phe 195 Leu Asn Val Leu Lys 275 Leu Met Ile Ser Gly Lys 100 Lys Trp Tyr Tyr Lys 180 Asn Leu Gly Glu Gln 260 Leu Glu Ser Thr Ile Lys Leu Ala Ile Asp Lys 165 Tyr Val Ile Tyr Gly 245 Ser Phe Pro Leu Arg Arg 70 Tyr Tyr Asp Asn Leu 150 Ser Asp Gly His Leu 230 Arg Met Pro Leu Pro Gln 55 Arg Thr Ile Arg Val 135 Asn Asp Met Lys Pro 215 Ile Trp Tyr Ile Ala 295 Gln 40 Lys Leu Phe Pro Thr 120 Leu Glu Lys Arg Asp 200 Glu Thr Asn Gln Met 280 Leu 25 Pro Ile Lys Ile Gly 105 Tyr Ser Glu Trp Arg 185 Tyr Leu Pro Ile Lys 265 Gly Ser Tyr Lys Leu Arg 90 Ile Ser Lys Ile Tyr 170 Leu Asn Val Glu Ser 250 Gly Glu Leu Asp Leu Ile 75 Tyr Asn Gln Leu Asn 155 Asn Gln Leu Leu Leu 235 Ala Asn Lys Ile Met Asn Leu Pro Ser Met Ala 140 Asn Pro Lys Leu Ile 220 Val Cys Asn Thr Gln 300 Asp Lys Thr Glu Lys Thr 125 Ser Ile Phe Ala Glu 205 Leu Leu Ala Leu Phe 285 Thr Asp Leu Glu Met Val 110 Asp Lys Ser Lys Arg 190 Asp Asp Met Lys Trp 270 Asp His Asp Asp Lys Ser Thr Gly Asn Lys Thr 175 Asn Gln Lys Tyr Leu 255 Glu Val Asp Ser Lys Val Lys Glu Leu Asp Val 160 Trp Glu Lys Gln Cys 240 Asp Val Ile Pro 285 Val Lys Gln Leu Arg Gly Ala Phe Leu Asn His Val Leu Ser Glu Met 305 310 315 320 Glu Leu Ile Phe Glu Ser Arg Glu Ser Ile Lys Glu Phe Leu Ser Val 325 330 335 Asp Tyr Ile Asp Lys Ile Leu Asp Ile Phe Asn Lys Ser Thr Ile Asp 340 345 350 Glu Ile Ala Glu Ile Phe Ser Phe Phe Arg Thr Phe Gly His Pro Pro 355 360 365 Leu Glu Ala Ser Ile Ala Ala Glu Lys Val Arg Lys Tyr Met Tyr Ile 370 375 380 Gly Lys Gln Leu Lys Phe Asp Thr Ile Asn Lys Cys His Ala
Ile Phe 385 390 395 400 Cys Thr Ile Ile Ile Asn Gly Tyr Arg Glu Arg His Gly Gly Gln Trp 405 410 415 Pro Pro Val Thr Leu Pro Asp His Ala His Glu Phe Ile Ile Asn Ala 420 425 430 Tyr Gly Ser Asn Ser Ala Ile Ser Tyr Glu Asn Ala Val Asp Tyr Tyr 435 440 445 Gln Ser Phe Ile Gly Ile Lys Phe Asn Lys Phe Ile Glu Pro Gln Leu 450 455 460 Asp Glu Asp Leu Thr Ile Tyr Met Lys Asp Lys Ala Leu Ser Pro Lys 465 470 475 480 Lys Ser Asn Trp Asp Thr Val Tyr Pro Ala Ser Asn Leu Leu Tyr Arg 485 490 495 Thr Asn Ala Ser Asn Glu Ser Arg Arg Leu Val Glu Val Phe Ile Ala 500 505 510 Asp Ser Lys Phe Asp Pro His Gln Ile Leu Asp Tyr Val Glu Ser Gly 515 520 525 Asp Trp Leu Asp Asp Pro Glu Phe Asn Ile Ser Tyr Ser Leu Lys Glu 530 535 540 Lys Glu Ile Lys Gln Glu Gly Arg Leu Phe Ala Lys Met Thr Tyr Lys 545 550 555 560 Met Arg Ala Thr Gln Val Leu Ser Glu Thr Leu Leu Ala Asn Asn Ile 565 570 575 286 Gly Leu Glu Asn 625 Phe Ser Tyr Ile 705 Ser Gly Ser Met 785 Val Glu Ser Lys Lys Val 610 Lys Glu Cys Glu Asn 690 Tyr Leu Gly Ala Val 770 Asn Arg Leu Lys Phe Arg 595 Tyr Ile Phe Phe Ser 675 Lys Val Glu Ile Ile 755 Gln Tyr Phe Lys Arg 835 Phe 580 Leu Asn Ser Lys Leu 660 Thr Leu Gly Asp Glu 740 His Gly Asp Phe Leu 820 Ile Gln Thr Asn Asn Ser 645 Thr Ala Phe Asp His 725 Gly Leu Asp Tyr Asp 805 Asn Tyr Glu Thr Ser Leu 630 Thr Thr Leu Asn Pro 710 Pro Phe Ala Asn Arg 790 Ser Glu Tyr Asn Ile Lys Asn Asp Asp Phe Trp 695 Tyr Asp Cys Ala Gln 775 Val Leu Thr Asp Gly Met 585 Ser Ile 600 Ser His Leu Ser Ile Tyr Leu Lys 665 Gly Glu 680 Leu His Cys Pro Ser Gly Gln Lys 745 Val Arg 760 Ala Ile Lys Lys Arg Glu Ile Ile 825 Gly Arg 840 Val Lys Gly Glu Ile Glu Leu 590 Ser Thr Ser Asn 650 Lys Thr Pro Pro Phe 730 Leu Ile Ala Glu Val 810 Ser Ile Gly Asp Asn 635 Asp Tyr Cys Arg Ser 715 Tyr Trp Gly Val Ile 795 Met Ser Leu Val Asp 620 Gln Gly Cys Asn Leu 700 Asp Val Thr Val Thr 780 Val Asp Lys Pro Pro 605 Leu Lys Tyr Leu Gln 685 Glu Lys His Leu Arg 765 Thr Tyr Asp Met Gln Arg Lys Ser Glu Asn 670 Ile Gly Glu Asn Ile 750 Val
Arg Lys Leu Phe 830 Ala Tyr Thr Lys Thr 655 Trp Phe Ser His Pro 735 Ser Thr Val Asp Gly 815 Ile Leu Asn Tyr Lys 640 Val Arg Gly Thr Ile 720 Arg Ile Ala Pro Val 800 His Tyr Lys 845 Ala Leu Ser Arg Cys Val Phe Trp Ser Glu Thr Val Ile Asp Glu Thr 287 850 855 860 Arg Ser Ala Ser Ser Asn Leu Ala Thr Ser Phe Ala Lys Ala Ile Glu 865 870 875 880 Asn Gly Tyr Ser Pro Val Leu Gly Tyr Ala Cys Ser Ile Phe Lys Asn 885 890 895 Ile Gln Gln Leu Tyr Ile Ala Leu Gly Met Asn Ile Asn Pro Thr Ile 900 905 910 Thr Gln Asn Ile Arg Asp Gln Tyr Phe Arg Asn Pro Asn Trp Met Gln 915 920 925 Tyr Ala Ser Leu Ile Pro Ala Ser Val Gly Gly Phe Asn His Met Ala 930 935 940 Met Ser Arg Cys Phe Val Arg Asn Ile Gly Asp Pro Ser Val Ala Ala 945 950 955 960 Leu Ala Asp Ile Lys Arg Phe Ile Lys Ala Asn Leu Leu Asp Arg Ser 965 970 975 Val Leu Tyr Arg Ile Met Asn Gln Glu Pro Gly Glu Ser Ser Phe Phe 980 985 990 Asp Trp Ala Ser Asp Pro Tyr Ser Cys Asn Leu Pro Gln Ser Gln Asn 995 1000 1005 Ile Thr Thr Met Ile Lys Asn Ile Thr Ala Arg Asn Val Leu Gln Asp 1010 1015 1020 Ser Pro Asn Pro Leu Leu Ser Gly Leu Phe Thr Asn Thr Met Ile Glu 1025 1030 1035 1040 Glu Asp Glu Glu Leu Ala Glu Phe Leu Met Asp Arg Lys Val Ile Leu 1045 1050 1055 Pro Arg Val Ala His Asp Ile Leu Asp Asn Ser Leu Thr Gly Ile Arg 1060 1065 1070 Asn Ala Ile Ala Gly Met Leu Asp Thr Thr Lys Ser Leu Ile Arg Val 1075 1080 1085 Gly Ile Asn Arg Gly Gly Leu Thr Tyr Ser Leu Leu Arg Lys Ile Ser 1090 1095 1100 Asn Tyr Asp Leu Val Gln Tyr Glu Thr Leu Ser Arg Thr Leu Arg Leu 1105 1110 1115 1120 Ile Val Ser Asp Lys Ile Lys Tyr Glu Asp Met Cys Ser Val Asp Leu 1125 1130 1135 288 Ala Ile Ala Leu Arg Gln Lys Met Trp Ile His Leu Ser Gly Gly Arg 1140 1145 1150 Met Ile Ser Gly Leu Glu Thr Pro Asp Pro Leu Glu Leu Leu Ser Gly 1155 1160 1165 Val Val Ile Thr Gly Ser Glu His Cys Lys Ile Cys Tyr Ser Ser Asp 1170 1175 1180 Gly Thr Asn Pro Tyr Thr Trp Met Tyr Leu Pro Gly Asn Ile Lys Ile 1185 1190 1195 1200 Gly Ser Ala Glu Thr Gly Ile Ser Ser Leu Arg Val Pro Tyr Phe
Gly 1205 1210 1215 Ser Val Thr Asp Glu Arg Ser Glu Ala Gln Leu Gly Tyr Ile Lys Asn 1220 1225 1230 Leu Ser Lys Pro Ala Lys Ala Ala Ile Arg Ile Ala Met Ile Tyr Thr 1235 1240 1245 Trp Ala Phe Gly Asn Asp Glu Ile Ser Trp Met Glu Ala Ser Gln Ile 1250 1255 1260 Ala Gln Thr Arg Ala Asn Phe Thr Leu Asp Ser Leu Lys Ile Leu Thr 1265 1270 1275 1280 Pro Val Ala Thr Ser Thr Asn Leu Ser His Arg Phe Lys Asp Thr Ala 1285 1290 1295 Thr Gln Met Lys Phe Ser Ser Thr Ser Leu Ile Arg Val Ser Arg Phe 1300 1305 1310 Ile Thr Met Ser Asn Asp Asn Met Ser Ile Lys Glu Ala Asn Glu Thr 1315 1320 1325 Lys Asp Thr Asn Leu Ile Tyr Gln Gln Ile Met Leu Thr Gly Leu Ser 1330 1335 1340 Val Phe Glu Tyr Leu Phe Arg Leu Lys Glu Thr Thr Gly His Asn Pro 1345 1350 1355 1360 Ile Val Met His Leu His Ile Glu Asp Glu Cys Cys Ile Lys Glu Ser 1365 1370 1375 Phe Asn Asp Glu His Ile Asn Pro Glu Ser Thr Leu Glu Leu Ile Arg 1380 1385 1390 Tyr Pro Glu Ser Asn Glu Phe Ile Tyr Asp Lys Asp Pro Leu Lys Asp 1395 1400 1405 289 Val Asp Leu Ser Lys Leu Met Val Ile Lys Asp His Ser Tyr Thr Ile 1410 1415 1420 Asp Met Asn Tyr Trp Asp Asp Thr Asp Ile Ile His Ala Ile Ser Ile 1425 1430 1435 1440 Cys Thr Ala Ile Thr Ile Ala Asp Thr Met Ser Gln Leu Asp Arg Asp 1445 1450 1455 Asn Leu Lys Glu Ile Ile Val Ile Ala Asn Asp Asp Asp Ile Asn Ser 1460 1465 1470 Leu Ile Thr Glu Phe Leu Thr Leu Asp Ile Leu Val Phe Leu Lys Thr 1475 1480 1485 Phe Gly Gly Leu Leu Val Asn Gln Phe Ala Tyr Thr Leu Tyr Ser Leu 1490 1495 1500 Lys Ile Glu Gly Arg Asp Leu Ile Trp Asp Tyr Ile Met Arg Thr Leu 1505 1510 1515 1520 Arg Asp Thr Ser His Ser Ile Leu Lys Val Leu Ser Asn Ala Leu Ser 1525 1530 1535 His Pro Lys Val Phe Lys Arg Phe Trp Asp Cys Gly Val Leu Asn Pro 1540 1545 1550 Ile Tyr Gly Pro Asn Ile Ala Ser Gln Asp Gln Ile Lys Leu Ala Leu 1555 1560 1565 Ser Ile Cys Glu Tyr Ser Leu Asp Leu Phe Met Arg Glu Trp Leu Asn 1570 1575 1580 Gly Val Ser Leu Glu Ile Tyr Ile Cys Asp Ser Asp Met Glu Val Ala 1585 1590 1595 1600 Asn Asp Arg Lys Gln Ala Phe Ile Ser Arg
His Leu Ser Phe Val Cys 1605 1610 1615 Cys Leu Ala Glu Ile Ala Ser Phe Gly Pro Asn Leu Leu Asn Leu Thr 1620 1625 1630 Tyr Leu Glu Arg Leu Asp Leu Leu Lys Gln Tyr Leu Glu Leu Asn Ile 1635 1640 1645 Lys Glu Asp Pro Thr Leu Lys Tyr Val Gln Ile Ser Gly Leu Leu Ile 1650 1655 1660 Lys Ser Phe Pro Ser Thr Val Thr Tyr Val Arg Lys Thr Ala Ile Lys 1665 1670 1675 1680 Tyr Leu Arg Ile Arg Gly Ile Ser Pro Pro Glu Val Ile Asp Asp Trp 1685 1690 1695 Asp Pro Val Glu Asp Glu Asn Met Leu Asp Asn Ile Val Lys Thr Ile 1700 1705 1710 Asn Asp Asn Cys Asn Lys Asp Asn Lys Gly Asn Lys Ile Asn Asn Phe 1715 1720 1725 Trp Gly Leu Ala Leu Lys Asn Tyr Gln Val Leu Lys Ile Arg Ser Ile 1730 1735 1740 Thr Ser Asp Ser Asp Asp Asn Asp Arg Leu Asp Ala Asn Thr Ser Gly 1745 1750 1755 1760 Leu Thr Leu Pro Gln Gly Gly Asn Tyr Leu Ser His Gln Leu Arg Leu 1765 1770 1775 Phe Gly Ile Asn Ser Thr Ser Cys Leu Lys Ala Leu Glu Leu Ser Gln 1780 1785 1790 Ile Leu Met Lys Glu Val Asn Lys Asp Lys Asp Arg Leu Phe Leu Gly 1795 1800 1805 Glu Gly Ala Gly Ala Met Leu Ala Cys Tyr Asp Ala Thr Leu Gly Pro 1810 1815 1820 Ala Val Asn Tyr Tyr Asn Ser Gly Leu Asn Ile Thr Asp Val Ile Gly 1825 1830 1835 1840 Gln Arg Glu Leu Lys Ile Phe Pro Ser Glu Val Ser Leu Val Gly Lys 1845 1850 1855 Lys Leu Gly Asn Val Thr Gln Ile Leu Asn Arg Val Lys Val Leu Phe 1860 1865 1870 Asn Gly Asn Pro Asn Ser Thr Trp Ile Gly Asn Met Glu Cys Glu Ser 1875 1880 1885 Leu Ile Trp Ser Glu Leu Asn Asp Lys Ser Ile Gly Leu Val His Cys 1890 1895 1900 Asp Met Glu Gly Ala Ile Gly Lys Ser Glu Glu Thr Val Leu His Glu 1905 1910 1915 1920 His Tyr Ser Val Ile Arg Ile Thr Tyr Leu Ile Gly Asp Asp Asp Val 1925 1930 1935 Val Leu Val Ser Lys Ile Ile Pro Thr Ile Thr Pro Asn Trp Ser Arg 1940 1945 1950 Ile Leu Tyr Leu Tyr Lys Leu Tyr Trp Lys Asp Val Ser Ile Ile Ser 1955 1960 1965 Leu Lys Thr Ser Asn Pro Ala Ser Thr Glu Leu Tyr Leu Ile Ser Lys 1970 1975 1980 Asp Ala Tyr Cys Thr Ile Met Glu Pro Ser Glu Ile Val Leu Ser Lys 1985 1990 1995 2000 Leu Lys Arg 
Leu Ser Leu Leu Glu Glu Asn Asn Leu Leu Lys Trp Ile 2005 2010 2015 Ile Leu Ser Lys Lys Arg Asn Asn Glu Trp Leu His His Glu Ile Lys 2020 2025 2030 Glu Gly Glu Arg Asp Tyr Gly Ile Met Arg Pro Tyr His Met Ala Leu 2035 2040 2045 Gln Ile Phe Gly Phe Gln Ile Asn Leu Asn His Leu Ala Lys Glu Phe 2050 2055 2060 Leu Ser Thr Pro Asp Leu Thr Asn Ile Asn Asn Ile Ile Gln Ser Phe 2065 2070 2075 2080 Gln Arg Thr Ile Lys Asp Val Leu Phe Glu Trp Ile Asn Ile Thr His 2085 2090 2095 Asp Asp Lys Arg His Lys Leu Gly Gly Arg Tyr Asn Ile Phe Pro Leu 2100 2105 2110 Lys Asn Lys Gly Lys Leu Arg Leu Leu Ser Arg Arg Leu Val Leu Ser 2115 2120 2125 Trp Ile Ser Leu Ser Leu Ser Thr Arg Leu Leu Thr Gly Arg Phe Pro 2130 2135 2140 Asp Glu Lys Phe Glu His Arg Ala Gln Thr Gly Tyr Val Ser Leu Ala 2145 2150 2155 2160 Asp Thr Asp Leu Glu Ser Leu Lys Leu Leu Ser Lys Asn Ile Ile Lys 2165 2170 2175 Asn Tyr Arg Glu Cys Ile Gly Ser Ile Ser Tyr Trp Phe Leu Thr Lys 2180 2185 2190 Glu Val Lys Ile Leu Met Lys Leu Ile Gly Gly Ala Lys Leu Leu Gly 2195 2200 2205 Ile Pro Arg Gln Tyr Lys Glu Pro Glu Asp Gln Leu Leu Glu Asn Tyr 2210 2215 2220 Asn Gln His Asp Glu Phe Asp Ile Asp 2225 2230 INFORMATION FOR SEQ ID NO:23: SEQUENCE CHARACTERISTICS: LENGTH: 15218 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: ACGCGAAAAA ATGCGTACTA CAAACTTGCA CATTCGAAAA AAATGGGGCA AATAAGAACT
TGATAAGTGC
TGATAAAGGT
CATGTTATAC
CAATTAAATT
ATAACAATAT
ACATATGGGA
ATTGTGAAAT
AAATATCTGA
AGACATGTGT
ATAAACTCAC
ATTGATGATC
AGAAATCATC
TGATGAAAGA
AGTAGGGAGT
CATGCCTATA
ACACACTCCT
CCAACCAAAC
ATTTTAGTAA
TATTTAAGTC
TAGATTACAA
TGATAAATTA
AAACGGCATA
TGTAGTGAAA
ATTGATTGAG
CAAATTTTCT
CTTACTTGGG
TTATTACCAT
CTAATCAATC
ACAGACATGA
ACACACAAAT
CAAGCTACAT
ACCAAATACA
TTTATCAATC
ATAATATACA
CAAACTATTC
TTAAAAATL
TAACCTTTTC
AATTTATTTG
ATTCTTCTGA
GTTTTTATAC
TCTAACTTTA
TTGACACACT
AAAAGACTAA
CTTGATCTCA
TTTAGTTAAT
AAACCATGAG
GACCCCTGTC
TCATATACTT
TTACATTCTT
AAAAATACAC
ACGGCGGGTT
AATATGACCT
CTCAAACAAC
AAGTAAAGCC
AATCAGAAAT
ACAATGACGA
CCAATGCATT
ATGTTATAAC
CAACAATGCC
GCTCTCAATT
GTGACTCAGT
ATTCATGAAT
ATAAAAACTC
CACTACAAAT
AATGGATTCA
GATAAACAAT
AGTCAATTAT
TGAATATAAT
TCTAGAATGT
CAACCCGTGA
AGTGCTCAAT
AATAACATAA
GGGGTGCAAT
AGTAGCATTG
AGCCAAAGCA
AAGCAGTGAA
AATACTACAA
AAACGGTTTA
AATGACTAAT
TATGTTTAGT
ATCAAAGGGA
GACAACACTA
ATAATAACAT
GAATGTATTG
GAGATGAAGC
ACAAAATATG
ATTGGCATTA
ATTCCAACAA
AGTTAAGAAG
ATTGGGGCAA
TCACTGAGCA
TTAAAAATAA
GCAATACATA
GTGTGCCCTG
AATGGAGGAT
ATGGATGATA
TATATGAATC
CTAATTCAAT
AATGGGGCAA
CTATGCAAAG
CTCTTACCAA
TAAGAAAACT
TACTGCACAA
GCACTTTCCC
AGCCTACAAA
AAAAACCAAC
GAGCTAATCC
ATACAAAGAT
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 GGCTCTTAGC AAAGTCAAGT TGAATGATAC ATTAAATAAG GATCAGCTGC TGTCATCCAG
CAAATACACT
AAAACACCTA
CACAGGATTA
GATACTTAAA
TCAAGATATA
AGAAATACAA
GATGGGAGAA
GTGTATAGCT
AGTAATTAGG
ACCAAAGGAT
TGTTTTCGTG
AATCTTTGCA
AGTTTTAGCC
GGAGCAAGTT
CCATATATTG
AAGTGTGGTC
AAGAAACCAG
AGTAATAAAC
ACTCAACCCC
GTCAACATGG
AAATTCCTAG
AGCATAATAT
GGCACCAACA
TACCCAAGAA
ATTCAACGTA
AACAAACTAT
ATAGGTATGT
GATGCTGGAT
AATGGAAAGG
GTCAATATTG
GTGGCTCCAG
GCACTTGTGA
AGGGCAAACA
ATAGCTAACA
CACTTTGGCA
GGATTGTTTA
AAATCTGTAA
GTGGAAGTCT
AACAATCCAA
CTAGGCAATG
GATCTTTATG
TACAGTGTAT
AAIAGAAGATG
AGAAGTTTGC
AATCAATAAA
CTGTTAACTC
TCATCAATCC
AACCCCTAGT
GTACAGGAGA
GTGGTATGCT
TATATGCTAT
ATCATGTTAA
AAATGAAATT
AGATAGAATC
AATATAGGCA
TAACCAAATT
ATGTCTTAAA
GTTTTTATGA
TTGCACAATC
TGAATGCCTA
AAAATATCAT
ATGAGTATGC
AAGCATCATT
CAGCAGGTCT
ATGCAGCTAA
TAGACTTAAC
ATGTAGAGCT
ACCTGAATTT
GGGCAAGTTC
AATAGATATA
AACAAGTGAA
AAGCTTCAAA
TAATATTGAC
ATTAATCACT
GTCCAGGTTA
AGCTAATGGA
CGAAGTATTA
TAGAAAGTCC
TGATTCTCCA
AGCAGCAGGA
AAACGAAATA
AGTGTTTGAA
ATCCACAAGA
TGGTTCAGGG
GCTAGGACAT
ACAGAAGTTG
GCTGTCATTA
AGGCATAATG
AGCATATGCA
AGCAGAAGAA
TTAAGTTAC
CATGGAGAAG
GCATCATCCA
GAAGTAACTA
GCCGACAGTA
GAAGATCTCA
ACTCCCAATT
GAAGATGCAA
GGAAGGGAAG
GTAGATATAA
ACATTATCAA
TACAAAAAAA
GACTGTGGGA
GACAGATCAG
AAACGATACA
AAACACCCTC
GGGGGTAGTA
CAAGTAATGC
GCTAGTGTCC
GGAGGAGAAG
ACTCAATTTC
GGAGAGTATA
GAGCAACTCA
TTGGAAGCCA
AAAAAATACG
ATGCAAATAA
AAGATCCTAA
AAGAGAGCCC
CCCCAGAAAC
CCCCAAGTGA
ATGATGTGCA
ATCATAAATT
ACACTATAAA
CAACATATCG
GCTTGACATC
TGCTAAAAGA
TGATAATACT
GTCTTACAGC
AGGGCCTCAT
ATCTTATAGA
GAGTTGAAGG
TAAGATGGGG
AGGCAGAAAT
CTGGATTCTA
CCAACTTCTC
GAGGTACACC
AAGAAAATGG
TAAAGCATCA
GGGCAAATAA
CAAAGCTACC
GAAGAAAGAT
GATAACATCT
AAAAGCCAAC
CAACCCTTTT
1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 TCTAAGTTGT ACAAGGAC AATAGAAACA TTTGATAACA ATGAAGAAGA ATCTAGCTAC TCATATGAAG AGATAAATGA TCAAACAAAT GACAACATTA CAGCAAGACT AGATAGAATT
GATGAAAAAT
CCCACTTCAG
GAAAAAATAA
AGGAATGAGG
ACTTCCAAAA
GATGATTTTT
CAATCCATTG
CAATCAATCA
AAAGAACAAG
TACACAGCAG
ATATGGGTGC
AGCATCAATA
ATTAACTCAA
GTATCATTAG
GCATGCAGTC
ATGAAGACAT
ACATCAAAAA
CTGAACTCAC
ATTATTCCTT
AAATATATCA
AGCATATATT
CTAGAGGATT
CACACTATAT
CACTCAAAAT
AATAAAACCA
TAAGTGAAAT
CTCGCGATGG
GAGCGGAAGC
AAAGCGAAAA
AATTGAGTGA
GATCAGCGAT
AATCAACTGC
ACCAATTGAT
ATGGGGCAAA
CTGTTCAGTA
CTATGTTCCA
TACTAGTGAA
GAAGTGCTGT
ATGAAAGAAG
TAACATGCTT
TCAACCCCAC
GAGTAATAAT
TAGAAAATAT
ATGCAGGATT
AACCACAGAG
ATGTGACTAC
AAACTTAATT
CCAAACATCA
CCAAAATCAC
AAATATGGGG
ATTAGGAATG
AATAAGAGAT
ATTAATGACC
AATGGCAAAA
CTTGTTGGAA
CAACTCACTC
CAGACCGAAC
CAATCAGCAA
TATGGAAACA
CAATGTTCTA
GTCATCTGTG
GCAGATCTCT
GCTGGCTCAA
CAAATTAGCA
AAAAGTAAAA
TCATGAGATC
ACCAACCTAT
AGCAACCACC
AGTGTTAGTT
TCAATTTATA
TAATTGGAAG
ATCAACACTG
TAAACATCTA
TACCAGCCAC
TAAATAGACA
CTCCATACAT
GCTATGGTTG
AATGATAGGT
GACACCTCAG
GACAACGATA
AGCAATCAAC
AAACAAACGT
CCCGACAAAA
TACGTGAACA
GAAAAAGATG
CCAGCAGACT
ACGCCCAAAG
ATGCCTAGTA
TATGATGTAA
AGTATGTTAA
ATTGCTCTAT
CTAAGATCAA
GAATTCAAAA
ATCACAGTTA
GTAGATCTTG
CATACAGCTA
AATGACAGGT
CACTACACAC
TATCTGCTAG
TTAGTTAGAG
TAGTAGTTGC
GTCTAAGAGA
TAGAGGCTAT
ATGAAGTGTC
GTGACAATGA
AACATCAATA
CCATCAGTAG
TTAACAATAT
AGCTTCACGA
ATGATCCTGC
TGCTCATAAA
GACCTTCACT
ATTTCATCAT
CTACACCTTG
CTACAGTCAA
GTGAATTTGA
TTAGTGTCAA
ATGCTATCAC
CTGACAATAA
GTGCCTACCT
CACGTTTTTC
CCACATATAT
TTCATCACAC
ACCTAGAGTG
TTCAATCAAT
AAGTGCAGGA
AGAGATGATA
GGCAAGACTT
TCTTAATCCA
TCTATCACTT
AAACAGACAT
AACCACCAAC
AGTAACAAAA
AGGCTCCACA
ATCACTAACA
AGAACTTGCA
ACGAGTCACG
AAGCGCAAAT
TGAAATCAAA
AGATCTTACC
AAATATTATG
GAACAAGGAT
CAATGCAAAA
AGGAGCATTC
AGAAAAAGAG
AATCAAACCA
CCTCAAACTA
AAACCAATCC
CGAATAGGTA
CTTAACAACC
2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260
ATTTATACCG
CAATAGAATT
TAATCTTTTT
ATAAAGCATT
GTTCTACCAT
TTCTCACAGA
ATTTAAAAAT
ATGCAACCAT
ATACTCTTAA
TAGCACAAAT
CCATAATATT
CAATAAAAAA
GGGTTAGTTC
CATCACCCAA
CCACTTCAAC
AAGATGATTA
AACTTTGCAA
TCAAACCCAC
CCAAAACGAC
CAGAAAGAGA
CAGTCCAACA
CACCCACAGC
AGTTATTCAA
AACCTGGGGC
CTTGCTATTA
TCGACATGTA
CCAATTCAAC
CACAAGCAAA
ACTAATTATA
CTGTAACAAA
TATGCTGTGT
GTCATGGTGT
TAACATAATG
GTCCAAACAC
TCATCTAATT
AGCACTATCA
CATCATCTCT
CCACACTGAA
ATCCAAGCAA
TACAAAATCA
ACAGACCAAC
CCATTTTGAA
ATCCATCTGC
AAACAAACCA
GAAAAAAGAA
CACCAGCACC
GCAATCCCTC
ATCCGAGCCC
AAACTACATC
AAATAACCAT
ATGCATTGTA
GTGCAGTTAG
ACATATACTA
TTTTGGCCCT
ATCACTATTA
ACTCTTGAAC
CAAATTATAA
CGCAAAACCA
ATGAATTGTT
AAGAATCAAC
GTAATATCCT
GTTTTGGCAA
GCCAATCACA
AAAAACATCA
CCCACAACCA
GAAACACACC
AAGCCAAGCA
GTGTTCAACT
AAAACAATAC
ACCACCAAAA
ACTACCACCA
TCACAATCCA
CTCTCAACCA
TCCACACCAA
TTAGCAGAGA
GGAGTTGATG
CCTCACCTCA
CAGAGGTTAT
TAAATCTTAA
ATTTTACACT
TGATTGCAAT
TAGGACAGAT
TCCTGTATAT
CGCTAACTAT
AGTATGAGAT
GCACTGCCAG
CTTGTTTATA
TGATAATCTC
AAGTTACACT
CCACCTACCC
CATCACCAAT
ATACAACAGC
CAAAACCACG
TCGTTCCCTG
CAAGCAACAA
CCACAAACAA
ACCCAACAAA
CTGCACTCGA
CCCCCGAAAA
ACTCCACCCA
ACCGTGATCT
ATCCACAAGT
AGTCAGAACA
TTTAGTGCTT
AATGGGAAAT
AATACATATG
ACTAAATAAG
GTATCAAATC
ATAAACAAAC
CATGGTAGCA
CAAAAACAAC
GACTCTAGAA
CAGATTAAAT
AACCTCTCTC
AACAACGGTC
TACTCAAGTC
CCACACAAGT
ACAAACCAAA
TCCAAAAAAT
CAGTATATGT
ACCAAAGAAG
AAGAGACCCA
AAAACTAACC
CACAACCACA
CACACCCAAC
AAAAACCCAG
ATCAAGCAAG
CAAGTGCAAT
TAACTGAGGA
TAAGAACAGG
ACATCCATCA
ATCTTAACTC
CTAAGTGAAC
AACACATAGA
AAATCCAATC
TAGAGTAGTT
ATTGGGGCAA
AAGACCTGGG
TTAAAATCTA
ATAATTGCAG
ACAGTTCAAA
TCACCAGAAA
TCAGCTACAA
GGCAGAACCA
CCACCAAAAA
GGCAACAATC
AAACCAACCA
AAAACACCAG
CTCAAGACCA
TTAAAACACA
TCCACACAAA
CCACATGCTT
AACGAAATTA
CTTCCTAACT
GTTTTACCAA
TTGGTATACT
4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820
AGTGTCATAA
AAAGTAAAAC
CTACTTATGC
ATGAACTACA
CGAAGATTTC
TCAAAAGTTC
AACAAAGCTG
CTCAAGAATT
TCCAACATTG
AGAGAATTTA
AGTGAGTTAC
TCAAGCAATG
GAAGTCCTTG
AAATTGCACA
ACAAGGACTG
GCTGACACTT
TTACCAAGTG
ATTATGACAT
TCATGCTATG
TTTTCTAATG
ACTTTATACT
ATAAATTACT
GTCAATGAAA
AATGTAAATA
ATTGTAGTAT
ACACCAGTTA
CAATAGAATT
TTATGAAACA
AAAACACACC
CAATCAATAC
TAGGCTTCTT
TACACCTTGA
TAGTCAGTTT
ACATAAATAA
AAACAGTTAT
GTGTCAATGC
TATCATTAAT
TTCAGATAGT
CATATGTTGT
CATCGCCTCT
ATAGAGGATG
GTAAAGTACA
AAGTCAGCCT
CAAAAACAGA
GTAAAACTAA
GTTGTGACTA
ATGTAAACAA
ATGACCCTCT
AAATCAATCA
CTGGCAAATC
TGTTATCATT
CACTAAGCAA
AAGTAATATA
AGAATTAGAT
AGCTGTCAAC
CACTAAAAAC
GTTAGGTGTG
AGGAGAAGTG
ATCAAATGGG
CCAATTATTA
AGAATTCCAG
AGGTGTAACA
CAATGATATG
AAGGCAACAA
ACAGCTGCCT
ATGCACTACC
GTATTGTGAT
GTCCAATCGA
TTGTAACACT
CATAAGCAGC
ATGCACTGCA
TGTGTCAAAC
GCTGGAAGGC
AGTGTTTCCT
AAGTTTAGCT
TACTACAAAT
AATAGCTATT
AGACCAACTA
AAAGAAACCA
AAGTATAAGA
AACCGGGCCA
CTAAATGTAT
GGATCTGCAA
AACAAGATCA
GTCAGTGTTT
CCCATAGTAA
CAGAAGAACA
ACACCTTTAA
CCTATAACAA
AGTTATTCCA
ATCTATGGTG
AACATCAAAG
AATGCAGGAT
GTATTTTGTG
GACATATTCA
TCAGTAATTA
TCCAACAAAA
AAAGGAGTAG
AAGAACCTTT
TCTGATGAGT
TTTATTCGTA
ATTATGATAA
GGTTTACTGT
AGTGGAATCA
AATGCAATGG
ATGCAGTAAC
GAAGAGAAGC
CAATAAGCAA
TAGCAAGTGG
AAAATGCTTT
TAACCAGCAA
ATCAACAGAG
GCAGATTGTT
GCACTTACAT
ATGATCAGAA
TCATGTCTAT
TAATAGATAC
AAGGATCAAA
CAGTATCCTT
ACACTATGAA
ATTCCAAGTA
CTTCTCTTGG
ATCGTGGGAT
ATACTGTGTC
ATGTAAAAGG
TTGATGCATC
GATCTGATGA
CTACAATTAT
TGTATTGTAA
ATAATATTGC
AACTGACACT
AGAATTACAG
ACCACAGTAT
GAAGAGGAIA
TATAGCTGTA
GTTGTCTACA
AGTGTTAGAT
CTGTCGCATC
GGAAATCACC
GTTGACAAAC
AAAATTAATG
AATAAAGGAA
ACCTTGCTGG
TATTTGTTTA
CTTTCCACAG
CAGTTTGACA
TGACTGCAAA
AGCTATAGTG
TATAAAGACA
AGTGGGCAAC
GGAACCTATA
AATATCTCAA
ATTACTACAT
TATAGTAATC
AGCCAAAAAC
ATTCAGCAAA
5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380
TAGACAAAAA
AACTTACAAC
ACCCACATAA
TCAACCACTA
GAAATCCTTG
GTCATAATTA
ACAAGATACT
CTGAACTGGA
TAGGATCTAT
TTGAGATCAA
AGATAAGAGT
AAACCATCCA
CATTAGATAT
AAAATGACCA
ATATTGATCT
AACCATAACT
TGGACACCTA
GAAGATATAT
CATCCAACCA
GAAACTCTGC
AGTGTAATGC
ACTTAATTAG
CACAGTCATT
TCCAGTCATT
ACTTACTTAA
TCTTGAATAA
ACCACCTGAT
AAATATTTCA
CTAAGCTAGA
AATCAACACA
TAAATTTGAG
CTTTGAATGG
CAAGTCAATG
TAGAACAGAA
AAACAACATA
TAGTGATGAC
GTACAATACT
TCTGCTCAAG
CCACAAAAGC
AACCAAAAAT
CAAGTGAAAG
ATTTGGATAA
AAAACTTATT
ATACAGTATA
TAAAACTATT
TAATGTGTAT
TTTAGGGAGT
TAGACAAAGC
AATATCTAGA
ACTTATGACA
AAAAATAATA
ACTAGGATTA
CATGTTTCAA
ACATCACAGT
TCCTTAACTT
TCATTCACAA
ATTAGAGGTC
CCTCCTCATG
GACAAAAGCA
GAATATGCTC
ACAAAACAAT
ATTAAAAAGC
GTTATATCAT
AGACTACCAG
ATAACCATAA
AATGATATTA
CATGGTTGCT
CCACCAGCGT
AGATGCCACT
TATATTAGTG
TTGATAAGGT
CTAACTGATA
TATCTTTTTA
CCACTACTAG
TATCATAAAG
TATAAAAGTA
CGAAGAGCCA
AAGGAAAAGG
CAACAATCTG
ACAGGCTGAA
ATAGTTACAT
AATTAACAGC
ATTGCTTGAA
CATTACTAGT
TAGACACTTT
TTGGTATAGT
CAGCATGTGT
TTAGAGATAA
ACATTGAGAG
CAGACGTGCT
GCAATCCAAA
CCGGATAAAT
ACATTCAATC
TTATTAAATC
CAACAATTTC
TCATAATGCT
TATGGGACAA
GTTATTTAAA
ACGGCCCTTA
AGCATATGAA
GTGAACTGAA
TGTCCTCGTC
TAGAAATAAG
ACAGAGTTAA
CTGACCACCA
TCATTTCCTC
AAAAACCTCA
TGGGGCAAAT
TGGTAGAAGA
GAGGCAAAAC
GTCTGAAATA
TGGAGTGCTA
TGCTATGAGT
TGAAGAACCC
CAATAGAAAA
GAAGAAGACA
AGAGTCAACT
ATCCTTGTAG
ATAAAAACAT
ATATATTTGA
TCCAACATCT
TGACCATAAC
AATGGATCCC
AGGTGTTATC
TCTTAAAAAT
TCTTAAAAAA
ATTAGAAGAA
TGAACAAATT
TGATGTAAAG
GCCCAACAAT
ATCCCAAATC
ACATCATGCT
AGTATCACAA
ATGTCGCGAA
TGTCACTACA
TTCATGTTAA
AGTGGAGCTG
GAGAGTTACA
AAACTTCTTA
AATTCACCTA
AACAACAAGC
ATAAAGAACA
GTGAATGATC
TATATCATCC
ATTACAATTT
TGAAATTCAT
TAACATCCCT
GACTCTATGT
ATTATTAATG
TCTTTTTCAG
GATTACACCA
CTAACTATAA
CCAACTTATT
GCTACAACTA
GTGTACGCCA
AATTCAGGTG
7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940
ATGAAAACTC
ATCAATCATA
CAACACTCTT
GGTTCAATTT
AAAGTCATGG
ATCAATATGG
ATCAATTTTT
GGATAAGTAA
ATGTTGTGTT
AAGGCTTCTA
CAGAAGAAGA
CAGCTATTAA
CAGTGTCTGA
TGATTAAGCT
GAATCTTTGG
GTAATGAAAC
ATAGAATCAT
TTGTCCTACC
AAATCACAGA
TGCCTAAAAA
ATCTAATATG
AACATGAAAA
TGAGAGATAA
TCAACAACTC
GAATGTTTGC
TAGCTGAAAA
AGTACTTACA
TACAAATTCA
GAAAAAATTG
ATATACAAAA
GTTTATATTA
TTGTATCGTT
GACATGGAAA
TTGTTTAAAT
ATCACAATTA
CATAATAAAA
TCAATTTAGG
GGCTCAAAAG
TAATATCATA
TGCAGGTGAT
ACATCCAATG
TAAGTTCTAC
AAAAGGGTTT
TCTAAGATGG
AAATGATTTG
AGTGGATCTT
GACTAGTTTT
GTTGAAGTTC
TAAATTCAAT
TAATCACGTG
TATGCAACCA
TATTTTACAA
ACCATAATTA
GACAAAAGTC
ATGTGTTCAA
TTAAATAACA
ATAGATAATC
TATCATAAAG
GACATCAGCC
ACATTAAACA
TTTCTTTATG
GAAGTAGAGG
AAACGATTTT
GACCTACTAT
AATGGTA.AT
AATAATCTCA
GTCGATGAAA
TTATTAAGTA
GTAAATACCT
TTAAACTACT
ATTATTTTAT
GAAATGATAA
CCTAGAAATT
TCTGAAAGCG
GAATGCGATC
GTATCACTAA
GGTATGTTTA
TTCTTCCCTG
AAGATGATAT
ACTCAGTAAA
TGCAACATCC
TATTAACACA
AAACTTTAAG
GACTCAAAAA
TTAGCAGATT
AAAGCTTAGG
GAGATTGTAT
GATTTATTAT
ATAATAGCAT
CAAGAGTATG
GGATAATCCT
ATAACTTGAG
GACAAGCAAT
GTCTAAGTAC
ACAACAGATG
ATAAACTTAA
CAGGATTGCG
TAAATGACAA
ACATGCCATC
ACAGATCGAG
TATACAATTG
CTGGTAAAGA
GGCAAATCCA
AGAGTTTGAC
ACTTTCGGCT
TCAAAATATC
TCCATCATGG
ATATCGATCA
TGGTTTTCAG
AATCACAACT
AAATGTTTGC
GCTGAGATGT
ACTGAAATTA
GTCTTTAATT
GCTAAATAAC
TCACACTTTA
ATTAAGTAAA
TGAGCTATAT
GGATTCTGTA
ATTAAGAGGT
GCCCACCTTA
TACTTATCCA
GTTCTATCGT
AGCCATTTCA
ACATATACAA
AAGAGTACTA
TGTAGTCAAT
AAGAGAGCTC
AATCTTAGCA
AAGATATGGT
GTGGAAAACA
ACTATCAAAA
TTAATACACT
AATGAGGTAA
TTTATTTTAA
ACTACTTACA
TTAATTACTT
GGATTCAATA
TTTCATAATG
CTAAACATAA
ATCACAGATG
TTAGACAAGA
TTTCTTAAAT
TTTCTCTTCA
AGAATTAACT
GCTTTCATTT
AGGAATGCTA
TCTCTACTTG
GAGTTTCATC
CCTCCAA.AG
AATTATATAG
GAGTATTACT
CAAAGCTATC
AGTGTAGGTA
GAGAAAATGA
GATCTAGAGC
9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 TTCAAAAGAT ATTAGAATTA AAAGCAGGAA TAAGCAACAA GTCAAATCGT TATAATGATA
ACTACAACAA
CATTTAGATA
AATCTCTGTT
GACATGCACC
GTGGATTATA
TTGAAGCTAT
TGATAAATGG
AGACCCATGC
AGTATGCAGG
AGTTCATGAG
TCCTGAGAGT
CTATAGGCAG
TATTTAGGIA
GTAACAATAA
ATCTTGATAG
GTGGTGATCC
AAGCTATAGT
AGCTCCAGGA
ATAAAAATCC
AAAGGCAAGC
TAGCCCCAAA
TAAATGACAT
AAAGTTTACC
TAACTAATAT
ATATGATGAG
TTATATCAGT
TGAAACATCA
CTCTTGGTTG
TCCTTTCATA
CAGATATCAT
ATCATTATTA
TGATAATCAG
ACAAGCAGAT
TATAGGCCAT
CAAAACAATC
AGGTCCATGG
CTTAACACAG
CATTTGGTTA
GCTATATTTA
CATTGATATG
TAATTTGTTA
ACATTCAGTG
TCTTCCAGAT
CAATGCCGAG
TAAAATTACT
CAAAATATTT
TATGCAAAAT
TTTTTATAAA
ACTTGAAAAA
GAAAAATATA
AAATGTTCTA
TGTATCTGCA
CATTTAACAA
AAGGATCATG
ATGGGTGGTA
GATCTAATAT
TCAATTGATA
TATTTGTTAG
AAGCTTAAGG
CAGCACAATG
ATAAACACGA
GAGTTAGAAT
TACAATCAAA
GATATATTGA
GCTTTATCAT
TATCGAAGCT
TTTGTGTTGA
GATAGACTGA
TTTGTAACAT
AGTGAGATTA
TCTAAAAGTG
ATAGAACCAA
GCAGAAAAAA
ACATCAGCAA
ACTTACTTA
TCATTACAGA
GTGATGTATT
TACCTCTTGT
TTGTTAATCT
TTGAGGGCTG
CTCTCAAAGG
TAAGCAAACC
CATTAAATAG
GAACAGAGAC
GAGTGTACTA
TACTTGATGA
ACAGAGGAGA
TTGCTTTGCA
AAGTATTAAA
TGTATATGAA
TTTATAGGAG
GCTATTATAC
ACAAATTCTT
TGATGAGGGA
ATAGATTAGC
CACAACATTA
CTTACCCTCA
TAGTTAATCT
TAGATACAC
TAAGGATACT
TCTTAGCAAA
AGATGAACTG
CACAATAATA
TAATGAGGTT
GTGTCAAAAA
GAAATTCTCT
AGTTAGACTT
CCTTAAATTG
CTATATATCC
TCCAGCCAGT
TTTTAAAGTT
AAGCTTATTA
ACTCCGAAAT
ACACTTAAAA
TTTGCCTATG
AACTCCAGAC
TGGTCACGAT
GACATGTGTC
TCCACAGGCT
AGTAACAGAA
TACTACCACT
TGGATTAAGA
TATATCAGGA
TGATATTAAT
TCCACTAGAT
TTCAATCAGG
CATGGAGTAC
TGTACATATA
GATGAACAAA
CTGTGGACCA
ATCACAGCTC
ATAGAGGGTC
TTATATAAAG
CGAGATATGC
ATCAAAAAAG
AGTTTAGAAT
TGCAGTTTAA
CATGCATTAT
ACTTTTTTTA
CTGTTTGGTG
TTCCTTACAG
TTACAAGATA
ATCACATTTG
TTAGGGTCTG
GTCTTAAGTA
GAGATTGATC
GTTGTTTATG
ACAAAATCCA
AGGGCTACTG
TGTAACAAAG
10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060
ACAAAAGAGA
GAGAAAGATC
CAATGGACAT
ATGTTAATAG
CGCAGGAGAA
ACCAAATAGA
AATTCATGGA
TGTTTCCACA
GTGAATTCCC
CTATCAATCA
ATTGCATAAG
CTAATAGAAT
TTACAGGAGA
TACCAGATAA
AATCTGGATC
ATAATGCTTA
TGAAAGATTC
TGTTCATTAA
AAGGTTATGG
TGGAGTTAAT
TCATAAAATA
TTAAGTTGTG
TTAACATAGA
GAATGGGGTT
AATTTTACAC
GTTATTAAGT
TTGGTCATTA
TAAATATACA
TTTAACTCGT
AAAAACAATG
TTTATTAGCA
AGAACTGAGT
ATATCTAAGT
TGCATCAATA
TGTATTAACA
TTTTGGTCTT
TATTCTCATA
TGTTGATATC
AATAAGTTTA
TCACATCAAC
TATTTTAAGT
AAAAGGTATT
TTTGAATGTT
TAAAGCAAAA
AGACAGTAGC
CATAGTCAAT
GTTTTTAAAA
TTATCACCCA
AATAAATGTA
ATCAAATCTC
TTAGAAAATC
TCCAATATAG
ACTAGCACTA
GGTGAAAGAG
CCAGTGTACA
AAATTAGACT
ACTGGAACAC
GTCAATTATT
CCAGCTTATA
GAAAAGTATG
AGCCTGATGT
CCGAAGCTGA
ATCAAGTTGA
ACCCAATATG
TCTAATTTAA
ACTAATTTAG
TTTGAAAAAG
TTCTTTAATG
TTAGAATGTG
TACTGGAAAT
CAAGACACAA
CGCCTTAATA
ACACACATGA
GATAAATTAA
TTTTACATTA
TTAGTATAAC
TAGGAGTAAC
TAGCCAGTGG
GACCCACCA
ACAGACAAGT
GGGTATATGC
TTGGACTGTC
TACACCGTTT
GAACAACAAA
GAGATGAAGA
CGGTTGTGGA
ATGAGATACA
AGCAAGTGAT
TAGAATTATT
TATTAGTACA
CTGGACATTG
ATTGGGGAGA
CTTATAAGAC
ATATGAACAC
CTATGTCTAA
GTTTGCGTAG
ATGCTAAATT
AAGCTATATT
CCATTAAAAA
GTTATAACTT
TGAATTAAGC
ATCGCCAAGT
TATAATAATA
GCCATGGGTA
TTTAACCAAA
ATCCATAGAC
ATATGAAAAA
AACAGTCAGT
TTATCATTTT
TATCGACATT
ACAATTCACA
TTTGATGA
ACAAAAGCAG
CTTAAGTAAC
TAAAATGTCT
GATTCTGATT
GGGGTACATA
TTATTTGCTA
TTCAGATCTT
AGTTTTCCTA
AATAAAAGGC
TACCGTATGC
ATCTTACATA
TAAAAACAAA
TTCAGACAAC
AAGTATGTAA
ATTATGTTCA
GAAAAATATA
GGCTCATCCA
AAGCAAAGAG
AACAAAGATG
GCCAAAAAGT
AGTAGACCAT
GATACTAGTC
GTGTTTCAAA
AACATATGTC
CCTCCTATAT
CACATGTTCC
AAAGCACTTA
GATTATTTTC
ATTCAACTTA
ACTGATCATA
TGTTTTCATA
CTTTGTGTTT
GAACAAAAAG
TGTCACAGTT
CCTTGGGTTG
GATTTAGTTA
TTCAATGATG
ACTCATTTGC
12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 TAACAAAACA AATAAGAATT GCTAATTCAG AATTAGAAGA TAATTATAAC AAACTATATC
ACCCAACCCC
ACAAACCTZA
GTAAAATGCA
TGTACAATTT
CAAAATCTAA
CATCACTTTA
CCACAGGATG
GTTGTATAGC
TTCATCCAGA
CTATTGAATT
CCATTCCTGC
CAGAACCTAT
AAATTATAAT
GATGCATTTT
TTACTATATT
TAATCCTTAC
AATTGACACT
TCGATGCAAA
TTAAGACTTC
TAGCTGGACG
TAAAATGGCT
ACATGATAGA
AGCTCAAGAA
AGTTTAAAAT
AGTTATTAAA
TTTAGTCTTA
AGAAACTTTA
ATTTTGTATA
TATTAAATCT
ATTTCCAATT
CCAACTTTAC
TTGCATGCTT
CAAGATCAGT
ATTCATAGGT
CATAAGATAC
TCTAAGGTTA
TACAGATGCA
TAGCATCTTT
TGAATGGAGT
AATTGCAAAA
AAAAACTTAC
AATAGGCCCT
TTCAAGAACT
TATTAAAAGC
ATTGTCAAAA
TAATGAAGTA
AGATCATGTT
GTCCACATAT
GCTGATTAAA
ATCATTAACA
AAATATACAA
AGGGGTTAAA
GAAAATATGT
AGTGGAAATA
TCCACTGTTA
GTTGTGATAG
ACCACCACTT
CCTTGGCATC
ATAGAGTATA
GAAGGAGCTG
ATTTACAGAA
TACAACGGGC
ACTAATAACA
GTCTGCGATG
AAGCATGTAA
TATCATGCTC
GTGTGCCTAG
GCAAATATAC
AAAAATTTCA
TTAATACCTT
TTGAAGAGTG
TTCAGCAACA
TTAAATTTTA
CCTTACTTAA
ATAACAGGTA
AGTTTGGTCA
ACTTTTCAAT
TAAAAGTCTA
CATTAATTCC
CCGAATCTAT
CCACAAGATT
ACAAGATTAT
CACATCAGAC
ATGTCAATAG
TTTTAAAAGA
GTAACTTATT
GTTTAAAAGA
ATATAXACAT
TTCATTGGTC
CTGAATTACC
GAAAGTGCAA
AAGATGACAT
GTAGCAAGTT
TTCCTGTTTT
TTATGCCTAA
TCCTTTGTTA
TAGTTAATGG
AGCTTATAAA
GATCAGCTGA
GTGAATTGTT
GTGTGCTATA
AATTTAGATG
AATTTAGCAT
AAACTAACAA
TGTTAAAAGT
GATGATGTCA
CAATTATAGC
AGATCATTCA
ATCTTTAGTA
ATTTAACTTT
TCTTAAGATT
ATTACGTACG
TTGCAATGAT
AGATTATGGT
TTATTTACAT
TGTTACACC
GTACTGTTCT
TGATTTCAAA
AAAAGGATCT
TGATGTTGTA
AAAAACTGAC
CCCTATAACA
AGATATATTA
CCACAAGCAT
ACTTAATTAC
AAATAGTTTA
CAACCTTCCC
CTAACACATC
ATTGATTCCA
TTATACATGT
AATAATAGTA
ACATTCTCTA
AAACAAGACT
GGTAATACAG
AGGAATAGTG
GTATTTAGTT
AAGGACCCCA
GTAGTAGAAC
CATAGTTTAC
GAGAATTTAA
ATAAAATTTG
AATTGGAGTA
TCTGTAAATA
TTAGATAACA
GAAGTTTACT
CAAAATGCTA
AAGGAATCTA
AAAAAAGGAA
TCATATTCTA
ATGAATATCC
AATCATTTAT
ACAACCAATG
AACGAACAGT
ATTATATTAT
AAATTATCAT
GCATTCACAA
13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 CACAACGAGA CATTAGTTTT TGACACTTTT TTTCTCGT 15218 INFORMATION FOR SEQ ID NO:24: SEQUENCE CHARACTERISTICS: LENGTH: 2166 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp 1 5 10 Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly 25 Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu 40 Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu 50 55 Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys 70 75 Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser 90 Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile 100 105 110 Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu 115 120 125 Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn 130 135 140 Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile 145 150 155 160 Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Ser 165 170 175 His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys 180 Leu Met Cys Ser Met Gln His Pro
.5* Asn Giu 225 Gly Gly Lys Ser Phe 305 Leu Gly Arg Ile Asp 385 Leu Asn Met Leu 210 Val1 Phe Leu Asp Asn 290 Asn Lys Phe Lys Lys 370 Lys Ser Asn Val 195 Tyr Lys Gin Lys Ile 275 Cys Asn Leu Ile Arg 355 Ala Thr Lys Leu Asp 435 Thr S er Phe Lys 260 Ser Leu Val Phe Met 340 Phe Gin Val Phe Ser 420 Glu Lys His Ile 245 Ile Leu Asn Val His 325 Ser Tyr Lys Ser Leu 405 Giu Arg Leu Giy 230 Leu Thr Ser Thr Leu 310 Asn Leu Asn Asp Asp 390 Lys Leu Gin Asn 215 Phe Asn Thr Arg Leu 295 Ser Giu Ile Ser Leu 375 Aen Leu Tyr Ala 200 Asn Ile Gin Thr Leu 280 Aen Gin Gly Leu Met 360 Leu Ile Ile Phe Met 440 185 Pro Ile Leu Tyr Thr 265 Aen Lye Leu Phe Asn 345 Leu Ser Ile Lye Leu 425 Asp Ser Leu Ile Gly 250 Tyr Val Ser Phe Tyr 330 Ile Asn Arg Asn Leu 410 Phe Ser Trp Thr Asp 235 Cys Asn Cys Le u Leu 315 Ile Thr As n Val Gly 395 Aila Arg Val1 Leu Gin 220 Aen Ile Gin Leu Gly 300 Tyr Ile Giu Ile Cys 380 Lye Giy Ile Arg Ile 205 Tyr Gin Val Phe Ile 285 Leu Gly Lys Giu Thr 365 His Trp Asp Phe Ile 445 190 His Arg Thr Tyr Leu 270 Thr Arg Asp Glu Asp 350 Asp Thr Ile Asn Gly 430 Asn Trp Ser Leu His 255 Thr Trp Cys Cys Val 335 Gin Ala Leu Ile Asn 415 His Cys Phe Aen Ser 240 Lye Trp Ile Gly Ile 320 Glu Phe Al a Leu Leu 400 Leu Pro Asn Giu Thr Lye Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala 450 455 460 304 Phe Ile Tyr Arg Ile Ile Lye Gly Phe Val Asn Thr Tyr Asn Arg Trp 465 470 475 480 a. a.
a a Pro Tyr Leu Lys Pro 545 His Asp Asn Asn Val 625 Ile Giu Leu Asn Asn 705 Asp Thr Lys Ile Lys 530 Lys Ile Arg Giu Ser 610 Gly Leu Ser Lys Asn 690 Gin Giu Leu Leu Ile Val Asp Gin S er Cys 595 Asn Arg Ala Leu Ala 675 Tyr Ala Leu Arg Asn 500 Leu Asp Leu Asn Arg 580 Asp His Met Giu Thr 660 Gly Ile Phe His Asn 485 Thr Ser Leu Ile Tyr 565 Arg Leu Vai Phe Lys 645 Arg Ile Ser Arg Gly 725 Al a Tyr Giy Giu Trp 550 Ile Vai Tyr Val Ala 630 Met Tyr Ser Lys Tyr 710 Val Ile Pro Leu Met 535 Thr Giu Leu Aen Ser 615 Met Ile Gly Aen Cys 695 Giu Gin Val Ser Arg 520 Ile Ser His Giu Cys 600 Leu Gin Ala Asp Lys 680 Ser Thr S er Leu Leu 505 Phe Ile Phe Giu Tyr 585 Val Thr Pro Giu Leu 665 Ser Ile Ser Leu Pro 490 Leu Tyr Asn Pro Lye 570 Tyr Val1 Gly Gly Asn 650 Giu Asn Ile eye Phe 730 Leu Glu Arg Asp Arg 555 Leu Leu Asn Lys Met 635 Ile Leu Arg Thr Ile 715 Ser Arg Ile Giu Lye 540 Aen Lye Arg Gin Giu 620 Phe Leu Gin Tyr Asp 700 Cys Trp Trp Thr Phe 525 Al a Tyr Phe Asp Ser 605 Arg Arg Gin Lye Asn 685 Leu Ser Leu Leu Giu 510 His Ile Met Ser Asn 590 Tyr Giu Gin Phe Ile 670 Asp Ser Asp His Asn Tyr 495 Aen Asp Leu Pro Ser Pro Pro Ser 560 Giu Ser 575 Lye Phe Leu Asn Leu Ser Ile Gin 640 Phe Pro 655 Leu Giu Aen Tyr Lye Phe Val Leu 720 Leu Thr 735 305 Ile Pro Leu Val Thr Ile Ile Cys
9 9 *9*9 Ile Leu Trp 785 Lye Ile Asp Ala Asp 865 Pro Ile Gin Arg Ala 945 His Leu Leu Lye Asp 755 Tyr Arg 770 Thr Ile Phe Ser Ser Lys Tyr Leu 835 Gly Ile 850 Met Gin Ala Ser Leu Asp Glu Leu 915 Asn Ile 930 Leu Cys Leu Lye Tyr Met Tyr Arg 995 His Val Tyr His Glu Ala Ile Thr 805 Pro Val 820 Leu Ala Gly His Phe Met Ile Lys 885 Asp Phe 900 Glu Tyr Trp Leu Asn Asn Thr Phe 965 Asn Leu 980 Ser Phe Val Met Ile 790 Ala Arg Leu Lye Ser 870 Lye Lys Arg Tyr Lye 950 Phe Pro Tyr Asn Gly 775 Ser Leu Leu Asn Leu 855 Lys Val Val Gly Asn 935 Leu Asn Met Arg Leu 760 Gly Leu Ile Ile Ser 840 Lye Thr Leu Ser Glu 920 Gin Tyr Leu Leu Arg 1000 Thr 745 Asn Ile Leu Asn Glu 825 Leu Gly Ile Arg Leu 905 Ser Ile Leu Asp Phe 985 rhr Tyr Glu Glu Asp Gly 810 Gly Lye Thr Gin Vai 890 Glu Leu Ala Asp Ser 970 Gly Pro Arg Val Gly Leu 795 Asp Gin Leu Glu His 875 Gly Ser Leu Leu Ile 955 Ile Gly Asp His Asp Trp 780 Ile Asn Thr Leu Thr 860 Asn Pro Ile Cys Gin 940 Leu Asp Gly Phe Ala Glu 765 Cys Ser Gin His Tyr 845 Tyr Gly Trp Gly Ser 925 Leu Lys Met Asp Leu Pro 750 Gin Gin Leu Ser Ala 830 Lye Ile Vai Ile Ser 910 Leu Arg Val Ala Pro 990 Thr Pro Ser Lye Lye Ile 815 Gin Glu Ser Tyr Asn 895 Leu Ile Asn Leu Leu 975 Asn Glu Phe Gly Leu Gly 800 Asp Ala Tyr Arg Tyr 880 Thr Thr Phe His Lys 960 Ser Leu Ala 1005 Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu 306 1010 1015 1020 Gin Asp Lys Leu Gin Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu 1025 1030 1035 1040 Thr Cys Vai Ile Thr Phe Asp Lys Asn Pro Asn Ala Giu Phe Val Thr 1045 .1050 1055 Leu Met Arg Asp Pro Gin Ala Leu Gly Ser Giu Arg Gin Ala Lys Ile 1060 1065 1070 Thr Ser Giu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala 1075 1080 1085 Pro Asn Lys Ile Phe Ser Lys Ser Ala Gin His Tyr Thr Thr Thr Giu 1090 1095 1100 Ile Asp Leu Asn Asp Ile Met Gin Asn Ile Glu Pro Thr Tyr Pro His 1105 1110 1115 1120 Gly Leu Arg Val Val Tyr Giu Ser Leu Pro Phe Tyr Lys Ala Glu Lys 1125 1130 1135 Sle Val Asn Leu Ile Ser Giy Thr Lys Ser Ile Thr Asn Ile Leu Giu 1140 1145 
1150 Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met 1155 1160 1165 Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys 1170 1175 1180 Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr 1185 1190 1195 1200 Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile 1205 1210 1215 Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asp Ile Lys Tyr 1220 1225 1230 Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val 1235 1240 1245 Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly 1250 1255 1260 Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val 1265 1270 1275 1280 Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp 1285 1290 1295 Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu 1300 1305 1310 Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe 1315 1320 1325 Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser 1330 1335 1340 Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn 1345 1350 1355 1360 Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr 1365 1370 1375 Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly 1380 1385 1390 Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn 1395 1400 1405 Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro 1410 1415 1420 Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile 1425 1430 1435 1440 Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr 1445 1450 1455 Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile 1460 1465 1470 Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn 1475 1480 1485 Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile 1490 1495 1500 Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu 1505 1510 1515 1520 Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn 1525 1530 1535 Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys 
Ala 1540 1545 1550 Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu 1555 1560 1565 Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu 1570 1575 1580 Gln Lys Val Ile Lys Tyr Ile Val Asn Gln Asp Thr Ser Leu Arg Arg 1585 1590 1595 1600 Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn 1605 1610 1615 Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His 1620 1625 1630 Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met 1635 1640 1645 Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe 1650 1655 1660 Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe 1665 1670 1675 1680 Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser 1685 1690 1695 Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr 1700 1705 1710 Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys 1715 1720 1725 Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr 1730 1735 1740 Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe 1745 1750 1755 1760 Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile 1765 1770 1775 Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu 1780 1785 1790 Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser 1795 1800 1805 Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val 1810 1815 1820 Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp 1825 1830 1835 1840 Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala 1845 1850 1855 Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg 1860 1865 1870 Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile 1875 1880 1885 Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu 1890 1895 1900 Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser 1905 1910 1915 1920 Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp 1925 1930 1935 Ala Glu Leu Pro Val Thr Ala Asn Trp 
Ser Lys Ile Ile Ile Glu Trp 1940 1945 1950 Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys 1955 1960 1965 Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu 1970 1975 1980 Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu 1985 1990 1995 2000 Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile 2005 2010 2015 Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Thr Leu Ser Arg 2020 2025 2030 Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp 2035 2040 2045 Ala Asn Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys 2050 2055 2060 Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly 2065 2070 2075 2080 Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn 2085 2090 2095 Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His 2100 2105 2110 Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met 2115 2120 2125 Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr 2130 2135 2140 Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr 2145 2150 2155 2160 Asn Leu Pro Asn Glu Gln 2165 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 15229 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID ACGCGAAAAA ATGCGTACTA CAAACTTGCA CATTCGGAAA AAATGGGGCA AATAAGAATT TGATAAGTGC
TGATAAAGGT
CATGTTATAC
CAATTAAATT
ACAACAATAT
ACATATGGGA
ATTGTGAAAT
AAATATCTGA
AGACATGTGT
ATAAACTCAC
ATTGATGATC
AGAAATCATA
TGATGAAAGA
TATTTAAATC
TAGATTACAA
TGACAAATTA
AAACGGCATA
TGTAGTGAAA
ATTGATTGAG
CAAATTTTCT
TTTACTTGGG
TTATCACCAT
CTAATCAGTC
ACAGACATGA
ACACACAAAT
CAAGCTACAT
TAACCTTTTC
AATTTATTTG
ATTCTTCTGA
GTTTTTATAC
TCTAACTTTA
TTGACACACT
AAAAGACTAA
CTTGATCTCA
TTTAGTTAAT
AAACCATGAG
GACCCCTGTC
TCATATACTT
TTACATTCTT
AATCAGAKAT
ACAATGACGA
CCAATGCATT
ATGTTATAAC
CAACAATGCC
GCTCTCAATC
GTGACTCAGT
ATTCATGAAT
ATAAAACCTC
CACTACAAAT
GATGGAATCA
GATAAACAAT
AGTCAATTAT
GGGGTGCAAT
AGTAGCATTG
AGCCAAAGCA
AAGCAGTGAA
AATATTACAA
AAATGGTCTA
AATGACTAAT
TATGTTTAGT
ATCAAAGGGA
GACAACACTA
ATAATAACAT
GAATGTATTG
GAGATGAAGC
TCACTGAGCA
TTAAAAATAA
GTAATACATA
GTGTGCCCTG
AACGGAGGAT
ATGGATGATA
TATATGAATC
CTAATTTAAT
AATGGGGCAA
CTATGCAAAG
CTCTCACCAA
TAAGAAAACT
TATTGCACAA
120 180 240 300 360 420 480 540 600 660 720 780 840
AGTAGGGAGT
CATGCCTATA
ACACACTCCT
CATCCAAACT
TTTTAGTAAT
GCTCTTAGCA
AAATACACTA
AAACACCTAA
ACAGGATTAA
ATACTTAAAG
CAAGATATAA
GAAATACAAG
ATGGGAGAAG
TGTATAGCTG
GTAATTAGGA
CCAAAGGATA
GTTTTTGTGC
ATCTTTGCAG
GTTCTAGCCA
GAACAAGTTG
CATATATTGA
AGTGTGGTCC
AGAAACCAAG
GTAATAAACT
CTCAACCCCA
ACCAAATACA
TTTATCAATC
ATAATATACA
AAGCTATTCC
TAAAAATAAA
AAGTCAAGTT
TTCAACGTAG
ACAAACTATG
TAGGTATGTT
ATGCTGGATA
ACGGAAAGGA
TCAATATTGA
TGGCTCCAGA
CACTTGTAAT
GGGCAAACAA
TAGCTAACAG
ACTTTGGCAT
GATTATTTAT
AATCTGTAAA
TGGAAGTTTA
ACAATCCAAA
TAGGCAATGC
ATCTATATGA
ACAGTGTATT
AAGAAGATGA
AGAAATACAC
ATGACGGGTT
AATATGACCT
TCAAACAACA
GGCAGAGCCA
AAATGATACA
TACAGGAGAT
TGGTATGCTA
ATATGCTATG
TCATGTTAAA
AATGAAATTC
GATAGAATCT
ATATAGGCAT
AACCAAGTTA
TGTCTTAAAA
TTTTTATGAA
TGCACAATCA
GAATGCCTAT
AAATATCATG
TGAGTATGCA
AGCATCATTG
AGCAGGTCTA
TGCAGCCAAA
AGACTTAACA
TGTAGAGCTT
TGAATATAAT
TCTAGAATGT
CAACCCGTAA
GTGCTCAACA
ATAACATAAA
TTAAATAAGG
TTAATCACTG
TCCAGGTTAG
GCTAATGGAG
GAAGTATTIA
AGAAAGTCCT
GATTCTCCAG
GCAGCAGGAG
AACGAAATAA
GTGTTTGAAA
TCCACAAGAG
GGTTCAGGGC
CTAGGACATG
CAGAAGTTGG
CTGTCATTAA
GGCATAATGG
GCATATGCAG
GCAGAAGAAT
TAAGTTAACA
ACAAAATATG
ATTGGCATTA
ATTCCAACAA
GTTAAGAAGG
TTGGGGCAAA
ATCAGCTGCT
CTCCCAATTA
AAGATGCAAA
GAAGGGAAGA
TAGATATAAC
CATTATCAAG
ACAAAAAAAT
ACTGTGGGAT
ATAGATCAGG
AACGCTACIA
AACACCCTCA
GGGGTAGTAG
AAGTAATGCT
CTAGTGTCCA
GAGGAGAAGC
CTCAATTTCC
GAGAGTATAG
AGCAACTCAA
TGGAAGCCAT
AAAAATACGG
GCACTTTCCC
AGCCTACAAA
AAAACTAACC
AGCTAATCCA
TACAAAGATG
GTCATCCAGC
TGATGTGCAA
TCATAAATTC
CACTATAAAG
AACATATCGT
CTTGACATCA
GCTAAAAGAG
GATAATACTG
TCTTACAGCA
GGGCCTCATA
TCTTATAGAT
AGTTGAAGGA
AAGATGGGGA
GGCAGAAATG
TGGATTCTAC
TAACTTCTCA
AGGTACACCA
AGAAAATGGA
AAAGCATCAA
GGCAAATAAG
900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 TCAACATGGA GAAGTTTGCA CCTGAATTTC ATGGAGAAGA TGCAAACAAC AAAGCTACCA
AATTCCTAGA
GCATAATATC
GCACCAACAT
ACCCAAGAAA
CTAAGTTGTA
CATATGAAGA
ATGAAAAATT
CCACCTCAGC
AAAAAATAAG
GGAATGAGGA
CTTCCAAA;A
ATGATTTTTG
AATCCATTGA
AATCAATCAA
AAGAACAAGA
ACACAGCAGC
TATGGGTGCC
GCATCAACAT
TTAACTCAAG
TATCATTAGA
CATGCAGTCT
TGAAAACATT
CATCAAAAAG
TGAACTCACT
TTATTCCCTA
AATATATCAA
ATCAATAAAG
TGTTAACTCA
CATCKATCCA
ACCCCTAGTA
CAAAGAAACA
AATAAATGAT
AAGTGAAATA
TCGCGATGGA
AGCGGAAGCA
AAGCGAAAAA
ATTGAGTAAT
ATCAGTGATC
ATCAACTGCC
CCAATTGATC
TGGGGCA.AT
TGTTCAGTAC
TATGTTCCAG
ACTAGTGAAG
AAGTGCTGTG
TGAAAGAAGC
AACATGCTTA
CAATCCCACT
AGTAATAATA
AGAAAATATA
TGCAGGATTA
GCCACAGAGT
GGCAAGTTTG
ATAGATATAG
ATAAGTGAAG
AGCTTCAAAG
ATAGAAACAT
CAAACAAATG
TTAGGAATGC
ATAAGAGATG
TTAATGACCA
ATGGCAAAAG
TTGTTGGAAG
AACTCACTCA
AGACTGAACA
AATCAGCGAC
ATGGAAACAT
AATGTTCTAG
TCATCTGTGC
CAGATCTCCA
CTGGCACAAA
AAATTAGCAT
AAAGTAAAAA
CATGAGATTA
CCAACCTATC
GCAACCACCG
GTATTAGTTA
CAATTTATAG
CATCATCCAJA
AAGTAACTAA
CTGATAGTAC
AAGATCTCAC
TTGATAACAA
ACAACATTAC
TCCATACATT
CTATGGTTGG
ATGATAGGTT
ACACCTCAGA
ACAACGATAG
GCAATCAACA
CACAAACGTC
CTAACAAAAT
ACGTGAACAA
AAAAAGATGA
CAGCAGACTT
CGCCCAAAGG
TGCCTAGTAG
ATGATGTAAC
GTATGTTAAC
TTGCTCTATG
TAAGATCAAT
AATTCAAAAA
TCACAGTTAC
TAGATCTTGG
AGATCCTAAG
AGAGAGCCCG
CCCAGAAGCT
CCCAAGTGAC
TGAAGAAGAA
AGCAAGACTA
AGTAGTTGCA
TCTAAGAGAA
AGAGGCTATG
TGAAGTGTCT
TGACAATGAT
ACATCAATGA
CATCAGCAGA
TAACAATATA
GCTTCACGAG
TGATCCTGCA
GCTCATAAAA
ACCTTCACTA
TTTTATCATA
TACACCTTGT
TACAGTCAAA
TGAATTTGAA
TAGTGTCAAA
TGCTATCACC
TGACAATAAA
GGCCTACCTA
AAGAAAGATA
ATAACATCTG
AAAGCCAACT
AACCCCTTTT
TCTAGCTACT
GATAGAATTG
AGTGCAGGAC
GAAATGATAG
GCAAGACTTA
CTTAATCCAA
CTATCACTTG
AACAGACATC
ACTACCAACC
GTAACAAAAA
GGCTCCACAT
TCACTAACAA
GAACTTGCAA
CGAGTCACGA
AGTGCAAATG
GAAATCAAAG
GATCTTACCA
AATATTATGA
AACAAGGACC
kLATGCGAAAA
GGAGCATTCA
GAAAAAGAGA
2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 GCATATATTA TGTGACTACA AATTGGAAGC ATACAGCTAC ACGTTTTTCA ATCAAACCAC
TAGAGGATTA
ACATTATATC
CATTTAAAAT
AATAAAACCA
ATTTATACCG
CAATAGAATT
TAATCTCTTT
ATAAAACATT
GTTCTACCAT
TTCTCACAGA
ATTTAAAAAT
ATGCAGCCAT
ATACTCTTAA
TAGCACAAAT
CCATAATATT
CAATAAAAAA
GGGTCAACTC
TATCACCAAA
CCACTTCAAC
AACCAMAAGA
ATAATCAACT
CAACCATCAA
CACCAGCCAA
AGACCACAGA
AATACACAAT
AACTTAATTA
CAAACATCAT
CCAAAATTAC
AAATATGGGG
CCAATTCAGT
CACAAGCAAA
ACTAATTATA
CTGCAACAAA
TATGCTGTGT
GTCATGGTGG
TAACATAATG
GTCCAAGCAC
TCATCTAATT
AGCACTGTCA
CATCATCTCT
CCACACTGAA
ATCCAAACAA
TACAAAATCA
ACAGACCAAC
TGATTACCAT
CTGCAAATCC
ACCCACAAAC
AATGCCAAAA
AAGAGACACC
CCAACAGCAA
TCAACACTAA
GAGCATTTAC
TTCCAGCTAT
TAAATAGACA
ACATATACTA
TTTTGGCCTT
ATCACTATTA
ACTCTTGAAC
CAAATTATAA
CGCAAAACCA
ATGAATTATT
AAGAATCGGC
GTAATATCCT
GTTTTGGCAA
GCCAATCACA
AAAAACATCT
CCCACAACCA
GAAACACACC
AAGCCAAGCA
TTTGAAGTGT
ATCTGCAAAA
AAACCAACCA
AAAGAAATCA
AGCATTTCAC
TCCCTCCACT
ATGACAGGTC
ACTACACACT
CATCTGTTAG
TTAGTTAGAG
TAAATCTCAA
ATTTTACACT
TGATTGCAAT
TAGGACAGAT
TCTTGTATAT
CGCCAACCAT
GGTATGAGAT
GCACTGCCGG
CTTGTTTATA
TGATAATCTC
AAGTTACACT
CCACCTACCT
CATCACCAAT
ATACAACAGC
CAAAATCACG
TCAATTTTGT
CAATACCAAG
CCAAAACCAC
TCACCAACCC
AATCCACCGT
CAACCACCTC
TTTACCATAT
ACCTAGAGTG
TTCAATCAAT
AATGGGAAAT
AATACATATG
ACTAAATAAG
GTATCAAATC
ATAAACAAAC
CATGATAGCA
CAGGAACAAC
GACTCTAGAA
CAGATTAAAT
AACCTCTCTC
AACAACGGTT
TACTCAAGTC
CCACACAAAT
ACAAACCAAA
TTCAAAAAAT
TCCCTGTAGT
CAACAAACCA
AAACAAAAGA
AGCAAAAAA
GCTCGACACA
CGAAAACACA
AAATCAATCT
CGAATAGGTA
CTCAACAACC
ACATCCATCA
ATCTTAACTC
CTAAGTGAAC
AACACATAGT
AAATCCAATC
TAGAGTAGTT
ATTGGGGCAA
AGGACCTGGG
TTAAAATCTA
ATAATTGCAG
ACAGTTCAA6A
CCACCAGAAA
TCAGCCACAA
GGCAGAATCA
CCACCAAAAA
ATATGTGGTA
AAGAAAAAAC
GACCCCAAAA
CCAACCCTCA
ATCACTCCAA
CCCAGCTCCA
CACATATATC TTCAAACTAT 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520
CACAAATACC
ATGCTTAGTT
AAATTAAACC
CTAACTCTTG
TACCAATCGA
TATACCAGTG
GACACTAAAG
TTACAGCTAC
CAGTACATGA
AGGAAACGAA
GCTGTATCCA
TCTACAI.ACA
TTAGATCTCA
CGCATCTCCA
ATCACCAGAG
ACAAACAGTG
TTAATGTCAA
AAGGAAGAAG
TGCTGGAAAT
TGTTTAACAA
CCACAGGCTG
TTAACATTAC
TGCAAAATTA
ATAGTGTCAT
AAGACATTTT
GGCAACACTT
CACAGCATCC
ATTCAAAAAC
TGGGGCAAAT
CTGTTAATGC
CATGTAGTGC
TCATAACAAT
TAAAACTTAT
TTATGCAAAA
ACTACACAAT
GATTTCTGGG
AAGTTTTACA
AAGCTGTAGT
AGAATTACAT
ACATTGAAAC
AATTTAGTGT
AGTTACTATC
GCAATGTTCA
TCCTTGCATA
TACACACATC
GGACTGATAG
ATACTTGCAA
CAAGTGAAGT
TGACATCAAA
GCTATGGAAA
CTAATGGTTG
TATACTATGT
GAGCCCTCCA
TACATCTTAG
AACCATGGAG
ATTGTACCTC
AGTTAGCAGA
AGAATTAAGT
AAAACAAGAA
CACGCCAGCT
CAATACCACA
CTTCTTGTTA
CCTTGAAGGA
CAGTCTATCA
AAATAkCCGA
AGTTATAGAA
TAATGCAGGT
ATTGATCAAT
GATAGTAAGG
TGTTGTACAG
ACCTCTATGC
AGGATGGTAT
AGTACAGTCC
CAGCCTTTGT
AACAGACATA
AACTAAATGC
TGACTATGTG
AAACAAGCTG
CATTAAATCC
CAGAGAACCG
TTGCTGATCC
ACCTCAAGTC
GGTTATTTTA
AATATAAAAG
TTAGATAAGT
GCCAACAACC
AAAAACCTAA
GGTGTAGGAT
GAAGTGAACA
AATGGGGTCA
ATATTACCCA
TTCCAGCAGA
GTAACAACAC
GATATGCCTA
CAACAAAGTT
CTACCTATCT
ACCACCAACA
TGTGATAATG
AATCGAGTAT
AACACTGACA
AGCAGCTCAG
ACTGCATCCA
TCAAACAAAG
GAAGGCAA
TAATTAAAAA
TGATCTATCA
ACAGGTCAAG
AGAACATAAC
GTGCTTTAAG
AAACCAAATG
ATAAGAATGC
GGGCCAGAAG
ATGTATCAAT
CTGCAATAGC
AAATCAAAAA
GTGTTTTAAC
TAGTAAATCA
AGAATAGCAG
CTTTAAGCAC
TAACAAATGA
ATTCTATCAT
ATGGTGTAAT
TCAAAGAAGG
CAGGATCAGT
TTTGTGACAC
TATTCAATTC
TAATTACTTC
ATAAAAATCG
GAGTAGATAC
ACCTTTATGT
ACCTAGTCAC
AGCAAGAACA
TGCAATCTTC
TGAGGAGTTT
AACAGGTTGG
CAATGGAACT
AGTAACAGAA
AGAAGCACCA
AAGCAAGAAA
AAGTGGTATA
TGCTTTGTTG
CAGCAAAGTG
ACAGAGCTGT
ATTGTTGGAA
TTACATGTTA
CCAGAAAAAA
GTCTATAATA
AGATACACCT
ATCAAATATT
ATCCTTCTTC
TATGAACAGT
CAAGTATGAC
TCTTGGAGCT
TGGGATTATA
TGTGTCAGTG
AAAAGGGGAA
5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 CCTATAATAA ATTACTATGA TCCTCTAGTG TTTCCTTCTG ATGAGTTTGA TGCATCAATA TCTCAAGTCA ATGAAAAXAT
CTACATAATG
GTAATCATTG
AAAAACACAC
AGCAAATAGA
CAAATCAACT
TCATGCTACC
ATCACAATCA
TCGCGAAGAA
CACTACAGTC
ATGTTAAACA
GGAGCTGCTG
AGTTACATAG
CTTCTTATTG
TCGCCTAAGA
AACAAGCAAA
AAGAACACAT
AATGATCAAA
ATCATCCATA
ACAATTTAAC
AATTCATTGG
CATCCCTGAA
TTTATATCAT
ATTAATGGAA
TTTTCAGAAT
TAAATACTGG
TAGTATTGTT
CAGTTACACT
CAAAAAACTA
TAACAACAAA
TACACAACTA
AACACTAAAT
ATCCTTGTAA
ATAATTATTT
AGATACTTAA
AACTGGATAG
GATCAPLTAAA
AGATCAPLCAG
TAAGAGTGTA
CCATCCATCT
TAGATATCCA
ATGACCAAAC
TTGATTTCAA
CATAACCATT
ACACCTAAAA
GATATATATA
TCAACCATA.A
ACTCTGCCAA
GTAATGCTT!I
CAATCAAAGT
CAAATCTACT
ATCATTAATA
AAGCAAAGAC
CTTAATCATG
TATTTCAACA
AGCTAGATCT
CGACACATCA
ATTTGAGATT
TGAATGGCCT
GTCAATGGAC
AACAGAAGAA
CAACATAACA
TGATGACATT
CAATACTGTT
GCTCAAAAGA
CAAAAGCATA
CAAAAATAAT
GTGAAAGCAT
TGGATAACCA
ACTTATTAGA
CAGTATATAT
AACAACCTTA
TGTGTATCTA
AGGGAGTTAC
TTAGCTTTTA
ACAAATATTA
GCTATTGGTT
CAACTAAGTG
TTTCAACAAC
TCATAGCACA
TCAACTCATA
TTCACAAAAT
AGAGGTCATT
CCTCATGCAT
AAAAGCATAG
TATGCTCTTG
AAACAATCAG
AAAAAACTGA
ATATCATACA
CTACCAGCAG
ACCATAAGCA
GATATTACCG
GATTGCTACA
CCAGTGTTTA
TGCCACTCAA
ATTAGTGTCA
ATAAGGTTAT
ACTGATAGTT
CTTTTTAACG
TTCGTAGATC
TGATAACTAC
TACTGTTGTA
GAATCAATAA
AATCTGCTGA
GGCTGAATCA
GTTACATAAA
TAACAACTGG
GCTTGAATGG
TACTAGTGAG
ACACTTTGTC
GTATAGTTGG
CATGTGTTGC
GAGATAACGA
TTGAGAGCAA
ACGTGCTGAA
ACTCAAAAGA
GATAAATATC
TTCAATCATA
TTAAATCATA
CAATTTCTCC
TAATGCTTGA
GGGACAAAAT
ATCTAAAAGG
GCCCCTATCT
TGATGAATTA
AATTATTATA
TTGCAAAGCC
TATTGCATTC
CCACCAATCC
TTTCCTCATA
AACCCCAAGT
GGCAAATATG
TAGAAGATGT
GCAAAACTTC
GGAAATAAGT
AGTGCTAGAG
TATGAGTAAA
AGAACCCAAT
TAGAAAAAAC
GAAGACAATA
GTCAACCGTG
CTTGTAGTAT
AAAACATATT
TATTTGATGA
AACATCTTAA
CCATAACAAT
GGATCCCATT
TGTTATCTCT
TAAAAATGAT
7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640
TACACCAACT TAATTAGTAG ACAAAGCCCA CTACTAGAGC ATATGAATCT
AAAAAAACTA
ACTATAACAC AGTCATTAAT ATCTAGATAT CATAAAGGTG AACTGAAGTT
AGAAGAACCA
ACTTATTTCC AGTCATTACT
TATOACATAT
ACAACTAATT
TACGCCATCT
TCAGGTGATG
GAAAACAATC
ATCAAAACAA
ATACACTGGT
GAGGTAAAAA
TACTTAAAAA
TGAATAAACT
AAAACTCAGT
AATCATATAC
CACTCTTGAA
TCAATTTATA
GTCATGGGTT
AATAATACGA
GGGACTAAAG
TCTTACAACC
AAATTCAGAC
AAAATTGATG
TACAAAATTA
TATATTAATA
AAAAGTATGT
AGAGCTATAG
GAAAAGGACA
ATAATCAAAG
AAAAATCATT
TGTTCAATGC
AATAACATAT
GATAATCA;A
CCTCGTCTGA
AAATAAGTGA
GAGTTAAGCC
ATGATATACT
CAGTAAATCA
AACATCCTCC
TAACACAATA
CTTTAAGTGA
AC
TC
cl
T
A
ATTTTAAATC AATATGGTTG TATCGTTTAT CATAAAGGAC
TCAAAAAAAT
ACTTACAATC AATTTTTGAC
A
ATTACTTGGA TAAGTAATTG
TTCAATAATG TTGTGTTATC
CATAATGAAG GCTTCTACAT AACATAACAG AAGAAGATCA ACAGATGCAG
CTATTAAGGC
GACAAGACAG
TGTCTGATAA
CTTAAATTGA
TTAAGCTTGC
CTCTTCAGAA
TCTTTGGACA
ATTAACTGTA
ATGAAACCAA
TTCATTTATA GAATCATAAA AATGCTATTG
TTCTACCTCT
CTACTTGAAA TCACAGAGAA TTTCATCTGC CTAAAAAAGT CCAAAAGATT TAATATGGAC
LTGGAAAGAC
TTAAATACA
~CAACTATTT
kATAAAAGAA kTTTAGGAA.A
TCAAAAAAAC
TATCATAKAT
AGGTGATAAiT
TCCAATGGTC
GTTCTACTTA
GGGGTTTGTA
AAGATGGTTC
AGATTTGATJ
GGATCTTGA)
TAGTTTTCC'
ATCAGCCTTA GCAGATTAAA T TTAAATAAAA
GCTTAGGGCT
CTTTATGGAG
ATTGTATACT
GTAGAGGGAT TTATTATGTC
CGATTTTATA
ATAGCATGCT
CTACTATCAA
GAGTATGTCAC
GGTAAATGGA
TKATCCTATT
AATCTCAILTA
ACTTGAGTGA
GATGAAAGAC
AAGCAATGGA
TTAAGTAATC TAAGTACGTT AATACCTACA
ACAGATGGCC
AACTATTATA
AACTTAATAC
ATTTTATCAG GATTGCGGTT ~ATGATAATAA
ATGACAAAGC
r AGAAATTACA TGCCATCACA
AAATTGCT
;TAAAGGTG
M.CAATAAT
LCAGCTGTG
kATATCACT rCATGGTTA
CGATCAAAT
TTTCAGTTT
ACAACTACT
GTTTGCTTA
,AGATGTGGA
,AAATTATTC
~TTAATTCTA
U ATAACATC
MACTTTATTA
AAGTAAATTT
GCTTTATTTT
TGCTGTAAGA
AAGAGGTGCT
CACTTTAAGG
TTATCCATCT
CTATCGTGAG
CATTTCACCT
TATACAAAAT
8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200
TATATAGAAC
TATTACTTGA
AGCTATCTCA
GTAGGTAGAA
AAAATGATAG
CTAGAGCTTC
AATGATAACT
AATCAAGCAT
GGAGTACAAT
ACATATAGAC
GAACAAAGTG
TGGACCATTG
ACAGCTCTAA
GAGGGTCAGA
TATAAAGAGT
GATATGCAAT
AAAAAAGTCC
TTAGAATCTA
AGTTTAATAT
GCATTATGTC
TTTTTTAATC
TTTGGTGGTG
CTTACAGAAG
CAAGATAAGC
ACGTTTGATA
GGGTCTGAAA
ATGAAAAGTT
GAGATAATAA
ACAACTCTAA
TGTTTGCTAT
CCGAAAATAT
AAAAGATATT
ACAACAATTA
TTAGATATGA
CTCTGTTCTC
ATGCACCTCC
GATTATACAG
AAGCTATATC
TAAATGGTGA
CCCATGCTCA
ATGCGGGCAT
TCATGAGCAA
TGAGAGTAGG
TAGGTAGCTT
TTAGGAACAT
ACAATAAGCT
TTGATAGTAT
GTGATCCTAA
CTATAGTACA
TCCAGGATCT
AAAATCCCAA
GGCAAGCAAA
GAAGTTCTCT
ATTCAATGAA
CCATGTGGTA
GCAACCAGGT
TTTACAATTC
AGAATTAAAA
TATCAGTAAA
AACATCATGT
TTGGTTGCAT
TTTTATAAAG
ATATCATATG
ATTATTAGAT
TAATCAGTCA
AGCAGATTAT
AGGCCACAAG
AACAATCCAG
TCCATGGATA
AACACAGGAG
TTGGTTATAC
ATATTTAGAT
TGATATGGCT
TTTGTTATAT
TTCAGTGTTT
TCCAGATGAT
TGCCGAGTTT
AATTACTAGT
GAAAGTGACA
TGCGATCTAT
TCACTAACTG
ATGTTTAGGC
TTCCCTGAGA
GCAGGAATAA
TGTTCTATCA
ATCTGCAGTG
TTAACAATAC
GATCATGTTG
GGTGGTATTG
CTAATATCTC
ATTGATATAA
TTGTTAGCAT
CTCAAGGGAA
CACAATGGAG
AATACAATAC
TTAGAATATA
AATCAAATTG
ATATTGAAG
TTAACATTGT
CGAAGCTTTT
GTGTTGAGCT
AGACTGAACA
GTAACATTGA
GAGATTAATA
GATCAAGAAG
ACAATTGTGT
GTAAAGAAAG
AAATTCAAAT
GTTTGACAAG
GCAACAAGTC
TTACAGACCT
ATGTATTAGA
CTCTTGTCAC
TTAATCTTAA
AAGGCTGGTG
TCAAAGGGAA
GTAAIACCAGT
TAAATAGCCT
CAGAGACCTA
TGTACTATCC
TTGATGATTT
GAGGAGAGAG
CTTTGCAACT
TATTAAAACA
ATATGAATTT
ATAGGAGAAC
ATTATACTGG
AATTCTTGAC
AGTACTAGAG
GGTCAATCAA
AGAGCTCAGT
CTTAGCAGAG
ATATGGTGAT
AAATCGTTAT
TAGCAAATTC
TGAACTGCAT
AATAATATGT
TAAAGTTGAT
TCAAAAACTG
ATTCTCTATC
TAGACTTATA
TAAATTGCTA
TATATCCCGA
AGCCAGTATC
TAAAGTTAGT
CTTATTATGC
CCGAAATCAT
CTTAAAAACT
GCCTATGCTG
TCCAGACTTC
TCACGATTTA
ATGTATCATC
10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 TGAGAGATCC ACAGGCTTTA GATTAGCAGT GACAGAAGTC
TTAAGTATAG CTCCAAACAA ILATATTTTCT AAAAGTGCAC AACATTATAC TACCACTGAG
ATTGATCTAA
GTTTATGAAA
AAATCCATAA
GCTACTGATA
AACAAAGACA
TATGTAAGAG
ATGTTCACAA
AAATATAATG
TCATCTACGC
CAAAGAGACC
AAAGATGAAT
AAAAAATTGT
AGACCATGTG
ACTAGTCCTA
TTTCAAAATT
ATATGTCCTA
CCTATATTTA
ATGTTCCTAC
GCACTTAAT
TATTTTCATA
CAACTTATGA
GATCATATGT
TTTCATAAAG
TGTGTTTTGG
CAAAAAGTCA
ATGATATTAT
GTTTACCTTT
CTAATATACT
TGATGAGGAA
AAAGAGAGTT
AAAGATCTTG
TGGACATTAA
TTAATAGTTT
AGGAGAAAAA
AAATAGATTT
TCATGGAAGA
TTCCACAATA
AATTCCCTGC
TCAACCATGT
GCATAAGTTT
ATAGAATTAT
CAGGAGATGT
CAGATAAAAT
CTGGATCTCA
ATGCTTATAT
AGGATTCAAA
TCATTAATTT
GTTATGGTAA
AGCTAATAGA
TAAAATACAT
GCAAAATATA
TTATAAAGCA
TGAAAAAACA
AAATATAACT
ATTAAGTTTA
GTCGTTATCC
ATATACAACT
AACTCGTGGT
AACAATGCCA
ATTAGCAA;A
ACTGAGTACT
TCTAAGTGTC
ATCAATACCA
ATTAACAGAA
TGGTCTTAGC
TCTCATACCG
TGATATCATC
AAGTTTAACC
CATCAACTCT
TTTAAGTACT
AGGTATTTTT
GAATGTTTTC
AGCAAAATTA
CAGTAGCTAC
AATCAATCAA
GAACCAACTT
GAAAAAATAG
TCAGCAATAG
TTACTTATAA
GAAAATCTTA
AATATAGTAG
AGCACTATAG
GAAAGAGGAC
GTGTACAATA
TTAGACTGGG
GGAACACTTG
AATTATTTAC
GCTTATAGAA
AAGTATGGAG
TTAATGTCGG
AAGCTGAATG
AAGTTGAAGC
CAATATGTAG
ILATTTAATAT
AATTTAGCTG
GAAAAAGATT
TTTAATGCTT
GAATGTGATA
TGGAAATCTA
GACACAJAGTT
ACCCTCATGG
TTAATCTTAT
ATTCAACTGA
GGATACTTCC
GTATAACTGA
GAGTAACATC
CCAGTGGTAT
CTACTAAGCC
GACAAGTTTT
TATATGCATC
GACTGTCATA
ACCGCTTAAC
CAACAAATTA
PLTGAAGATAT
TTGTGGAACA
AGATACATTT
AAGTGATACA
AATTATTCCT
TAGTACATAA
GACATTGGAT
GGGGAGAGGG
ATAAGACTTA
TGAACACTTC
TGTCTmAGT
TGCATAGAAT
ATTAAGAGTT
ATCAGGAACA
TATTAATAGG
ACTAGATTGT
ATTAAGCAAG
GCCAAGTATT
AATTATAGAA
ATGGGTAGGT
AACCAAAAAG
CATAGACAAC
TGAGA7LAGCC
AGTCAGTAGT
TCATTTCGAT
CGACATTGTG
ATTCACAAAC
GATGAAACCT
AAAACAGCAC
AAGTAACAAA
AATGTCTGAT
TCTGATTATT
GTATATAACT
TTTGCTATGT
AGATCTTCTT
TTTCCTAGAA
AAAAGGTTGT
11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320
CATAGTTTTA
TGGGTTGTTA
TTAGTTAGAA
AATGATGAkAT
CATTTGCTAA
CTATATCACC
AATAGTAATA
TTCTCCAATA
CAAGACTTGT
AATACAGCAA
AATAGTGCAT
TTTAGTTCCA
GACCCCAGTT
GTAGAACTTC
AGTTTACCTA
AATTTAACCA
AAATTTGCAG
TGGAGTAAAA
GTAAATAGAT
GATAACATTA
GTTTACTTAG
AATGCTAAAT
GAATCTATCG
AAAGGAATTA
TATTCTATAG
AATATCCTAA
AGTTATGGTT
ACATAGATTA
TGGGGTTAAT
TTTACACATC
CAAAACAAAT
CAACCCCAGA
AACCTAAATT
AAACGCATAT
ACAATTTATT
AATCTAACCA
CACTTTATTG
CAGGATGCAA
GTATAGCATT
ATCCAGACAT
TTGAATTTCT
TTCCTGCTAC
AACCTATTAG
TTATAATTGA
GCATTTTAAT
CTATATTAAA
TCCTTACAAT
TGATTCTTTC
ATGCAAATAT
AGACTTCATT
CTGGACGTAA
AATGGCTAGA
TTTAAAACGC
TCACCCAACA
AAATGTAGAT
AAATCTCTTT
AAGAATTGCT
AACTTTAGAA
TGGTATAAGT
TAAATCTTCC
TCCAATTGTC
ACTCTACACT
CATGCTTCCT
GATCAGTATA
CATAGGTGAA
AAGATACATT
AAGGTTATAC
AGATGCAACT
CATTTTTGTC
ATGGAGTAAG
TGCAAAATAT
AACTTACGTG
AGGCCCTGCA
AAGGACTAAA
TAAAAGCTTA
GTCAAAATTG
TGAAGTATTC
TCATGTTTTA
CTTAATAATG
CACATGAAAG
AAATTAACCA
TACATTAGTT
AATTCAGAAT
AATATGTCAT
GGAAATACCG
GCTGTTATTA
GTGATAGACA
ACCACTTCAC
TGGCATCATG
GAGTATATTT
GGAGCTGGTA
TACAGAAGTT
AACGGGCATA
AATAACATTC
TGCGATGCTG
CATGTAAGAA
CATGCCCAAG
TGCCTAGGTA
AATATACTTC
AATTTCATTA
ATACCTTTCC
AAGAGTGTAG
AGCAACAAGC
AACTTTAGAT
CTAAATTTAC
CTATATTATC
TTAAAAATAA
ATAACTTTTC
TAGAAAATAA
TAATTCCTGT
AATCTATGAT
CAAGATTCAA
GGATTATAGA
ATCAGACATC
TCAATAGATT
TAAAAGATCT
ACTTATTATT
TAAAAGATTG
TAAACATAGA
ATTGGTCTTA
AATTACCTGT
AGTGCAAGTA
ATGATATTGA
GCAAGTTAAA
CTGTTTTTAA
TGCCTAAAAA
TTTGTTACCC
TTAGTGGAGA
TTATAAACCA
CAGCTGAACT
CGTATGCCCT
TTACATAGAT
AAATAAATTC
AGATAACACT
TTATAACAAA
CAAAAGTAAT
GACGTCAACA
TTATAGTAAA
TCATTCAGGT
TTTAGTAAGG
TAACTTTGTA
TAAGATTAAA
ACGTACAGTA
CAATGATCAT
TTATGGTGAG
TTTACATATA
TACAGCCAAT
CTGTTCCTCT
TTTCAAATTA
AGGATCTGAA
TGTTGTGCAA
AACTGACAAA
TATAACAAAA
TATATTATCA
CAAGCATATG
TAATTACAAT
13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 CATTTATATA TGATAGAGTC CACATATCCT TACTTAAGTG ACCAATGAGC TCAAGAAGCT GATTAAAATA ACAGGTAGTG GAACAGTAAC TTAAAACATC ATTAACAAGT TTGATCAAAT ATATTATAGT TATTAAAAAA TATATATGCA
AACTTTTCAA
AAAGTTATCA TTTTGGTCTT AAGGGGTTGA
ATAAAAATCT
TGCATTTACA ACACAACGAG ACATTAGTTT TTGACACTTT
INFORMATION FOR SEQ ID NO:26: SEQUENCE CHARACTERISTICS: LENGTH: 2166 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: linear (ii) MOLECULE TYPE: protein AATTGTTAAA CAGTTTAACA TACTATACAA CCTTCCCAAC TTAGATGCTA ACACATCATA TAATTTAGCA TATTGATTCC AAAACTAACA ATTATACATG
TTTTCTCGT
14940 15000 15060 15120 15180 15229 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu 1 Ser Tyr Leu Ser Tyr Leu 35 Ile Ser Arg Thr Asp Lys 20 Phe Gly Val Ile Ser Phe 25 Ser Glu Cys Asn Asn Gly Pro Leu Lys Asn Asp Tyr Leu Ala Leu Gly Thr Asn Leu Lys Lys Leu Gln Ser Pro Thr Ile Leu 55 Ile Leu Glu His Met Asn Lys Thr Gln Ser Leu Leu 70 Ser Arg Tyr His 75 Gly Glu Leu Lys Glu Glu Pro Thr Glu Tyr Phe Gln Ser Leu 90 Thr Leu Met Thr Tyr Lys Ser Lys Ile Met Ser Ser Ser 100 Gln Ile Ala Thr 105 Asn Leu Leu Lys 110 Ile Arg Arg Ala Ile Glu Ile Ser 120 Asp Val Lys Val Tyr Ala Ile Leu 125 Asn Lys Leu Gly Leu Lys 130
Ser 145 Leu His Leu Asn Giu 225 Asp Giy Lys Ser Phe 305 Leu Gly Arg Ile Asp 385 Gly Ser Ser Met Leu 210 Val Phe Leu Asp Asn 290 Asn Lys Phe Lys Lys 370 Lys Asp Al a Val Cys 195 Tyr Lys Gin Lys Ile 275 Cys Aen Leu Ile Arg 355 Aia Thr Glu Val Asn 180 Ser Thr Ser Phe Lys 260 Ser Leu Vai Phe Met 340 Phe Gin Val Asn Ser 150 Giu Asn 165 Gin Asn Met Gin Lys Leu His Gly 230 Ile Leu 245 Ile Thr Leu Ser Asn Thr Val Leu 310 His Asn 325 Ser Leu Tyr Asn Lys Aen Ser Asp 390 Giu Lys Asp 135 Val Leu Thr Asn Gin Ser Ile Thr Ile 185 His Pro Pro 200 Asn Asn Ile 215 Phe Ile Leu Asn Gin Tyr Thr Thr Thr 265 Arg Leu Asn 280 Leu Aen Lys 295 Ser Gin Leu Giu Gly Phe Ile Leu Aen 345 Ser Met Leu 360 Leu Leu Ser 375 Asn Ile Ile Arg Val Thr Ile 155 Tyr Thr 170 Lys Thr Ser Trp Leu Thr Ile Asp 235 Gly Cys 250 Tyr Aen Val Cys Ser Leu Phe Leu 315 Tyr Ile 330 Ile Thr Asn Asn Arg Val Asn Gly 395 Lys 140 Ile Asn Thr Leu Gin 220 Asn Ile Gin Leu Gly 300 Tyr Ile Giu Ile Cys 380 Lys Pro Lys Ser Leu Ile 205 Tyr Gin Val Phe Ile 285 Leu Gly Lys Giu Thr 365 His Trp Asn Asp Asp Leu 190 His Arg Thr Tyr Leu 270 Thr Arg Asp Giu Asp 350 Asp Thr Ile *Asn Asp Lys 175 Lys Trp Ser Leu His 255 Thr Trp Cys Cys Val 335 Gin Ala Leu Ile Asn Ile 160 Asn Lye Phe Asn Ser 240 Lys Trp Ile Gly Ile 320 Giu Phe Al a Eaeu Leu 400 Leu Ser Lye Phe Leu Lye Leu Ile Lye Leu Ala Gly Asp Asn Aen Leu 322 405 410 Asn Asn Leu Ser Giu Leu Tyr Phe Leu Phe Met Giu Phe 465 Pro Tyr Leu Lys Pro 545 His Asp Asn Asn Val 625 Ile Glu Val1 Thr 450 Ile Thr Lys Ile Lys 530 Lys Ile Arg Glu Ser 610 Gly Leu Ser Asp 435 Lys Tyr Leu Leu Ile 515 Val Asp Gin Ser Cys 595 Asn Arg Al a Leu Arg Tyr Ile Asn 485 Thr Ser Leu Ile Tyr 565 Arg Leu Val Phe Lys 645 Arg Gin Leu Ile 470 Aila Tyr Giy Giu Trp 550 Ile Val Tyr Val Ala 630 Met Tyr Ala Leu 455 Lye Ile Pro Leu Met 535 Thr Glu Leu Asn Ser 615 Met Ile Gly Met 440 Ser Gly Val S er Arg 520 Ile Ser His Giu Cys 600 Leu Gin Al a Asp 425 Asp Asn Phe Leu Leu 505 Phe Ile Phe Glu Tyr 585 Val Thr Pro Glu Leu 665 Ala Leu Val 
Pro 490 Leu Tyr Asn Pro Lye 570 Tyr Val Gly Gly Asn 650 Glu Arg Val1 Ser Asn 475 Leu Glu Arg Asp Arg 555 Leu Leu Asn Lys Met 635 Ile Leu Ile Arg Thr 460 Thr Arg Ile Giu Lye 540 Asn Lye Arg Gin Giu 620 Phe Leu Gin Phe Ile 445 Leu Tyr Trp Thr Phe 525 Ala Tyr Phe Asp Ser 605 Arg Arg Gin Lys Gly 430 Asn Arg Asn Leu Glu 510 His le Met Ser Asn 590 Tyr Giu Gin Phe le 670 His Cys Gly Arg Asn 495 Lye Leu Ser Pro Glu 575 Lys Leu Leu Ile Phe 655 Leu Pro Aen Ala Trp 480 Tyr Asp Pro Pro Ser 560 Ser Phe Aen Ser Gin 640 Pro Glu Lou Lye Ala Gly Ile Ser Aen Lye Ser Asn Arg Tyr Asn Asp Aen Tyr 675 680 685 323 Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe 690 695 700 Asn Gin Ala Phe Arg Tyr Giu Thr Ser Cys Ile Cys Ser Asp Val Leu 705 710 715 720 Asp Giu Leu His Gly Val Gin Ser Leu Phe Ser Trp Leu His Leu Thr 725 730 735 Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe 740 745, 750 Ile Lys Asp His Val Val Asn Leu Asn Lys Vai Asp Giu Gin Ser Giy 755 760 765 Leu Tyr Arg Tyr His Met Giy Gly Ile Glu Giy Trp Cys Gin Lys Leu 770 775 780 Trp Thr Ile Giu Aia Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Giy *785 790 795 800 .Lys Phe Ser Ile Thr Aia Leu Ile Asn Giy Asp Asn Gin Ser Ile Asp .*.805 810 815 Ile Ser Lys Pro Vai Arg Leu Ile Giu Gly Gin Thr His Aia Gin Aia 820 825 830 .eo Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr 835 840 845 Ala Giy Ile Gly His Lys Leu Lys Gly Thr Giu Thr Tyr Ile Ser Arg **850 855 860 Asp Met Gin Phe Met Ser Lys Thr Ile Gin His Asn Gly Val Tyr Tyr 865 870 875 880 o .6 Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr 885 890 895 Ile Leu Asp Asp Phe Lys Vai Ser Leu Giu Ser Ile Gly Ser Leu Thr 900 905 910 Gin Giu Leu Giu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe 915 920 925 Arg Asn Ile Trp Leu Tyr Asn Gin Ile Ala Leu Gin Leu Arg Asn His 930 935 940 Aia Leu Cys His Asn Lye Leu Tyr Leu Asp Ilie Leu Lys Vai Leu Lys 945 950 955 960 324 His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Thr 965 
970 975 Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu 980 985 990 Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala 995 1000 1005 Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu 1010 1015 1020 Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu 1025 1030 1035 1040 Thr Cys Ile Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr 1045 1050 1055 Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile 1060 1065 1070 Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala 1075 1080 1085 Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu 1090 1095 1100 Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His 1105 1110 1115 1120 Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys 1125 1130 1135 Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu 1140 1145 1150 Lys Thr Ser Ala Ile Asp Ser Thr Asp Ile Asn Arg Ala Thr Asp Met 1155 1160 1165 Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys 1170 1175 1180 Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr 1185 1190 1195 1200 Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile 1205 1210 1215 Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asp Ile Lys Tyr 1220 1225 1230 Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val 1235 1240 1245 Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly 1250 1255 1260 Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val 1265 1270 1275 1280 Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp 1285 1290 1295 Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu 1300 1305 1310 Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe 1315 1320 1325 Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser 1330 1335 1340 Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn 1345 1350 1355 1360 Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr 
1365 1370 1375 Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly 1380 1385 1390 Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn 1395 1400 1405 Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro 1410 1415 1420 Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile 1425 1430 1435 1440 Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr 1445 1450 1455 Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile 1460 1465 1470 Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn 1475 1480 1485 Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile 1490 1495 1500 Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu 1505 1510 1515 1520 Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn 1525 1530 1535 Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala 1540 1545 1550 Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu 1555 1560 1565 Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu 1570 1575 1580 Gln Lys Val Ile Lys Tyr Ile Ile Asn Gln Asp Thr Ser Leu His Arg 1585 1590 1595 1600 Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn 1605 1610 1615 Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His 1620 1625 1630 Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met 1635 1640 1645 Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe 1650 1655 1660 Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe 1665 1670 1675 1680 Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser 1685 1690 1695 Glu Leu Glu Asn Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr 1700 1705 1710 Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys 1715 1720 1725 Pro Lys Phe Gly Ile Ser Gly Asn Thr Glu Ser Met Met Thr Ser Thr 1730 1735 1740 Phe Ser Asn Lys Thr His Ile Lys Ser Ser Ala Val Ile Thr Arg Phe 1745 1750 1755 1760 Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe 
Pro Ile Val Val Ile 1765 1770 1775 Asp Arg Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu 1780 1785 1790 Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser 1795 1800 1805 Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val 1810 1815 1820 Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp 1825 1830 1835 1840 Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala 1845 1850 1855 Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg 1860 1865 1870 Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile 1875 1880 1885 Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu 1890 1895 1900 Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser 1905 1910 1915 1920 Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp 1925 1930 1935 Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp 1940 1945 1950 Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys 1955 1960 1965 Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu 1970 1975 1980 Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu 1985 1990 1995 2000 Lys Gly Ser Glu Val Tyr Leu Val Leu Thr Ile Gly Pro Ala Asn Ile 2005 2010 2015 Leu Pro Val Phe Asn Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg 2020 2025 2030 Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp 2035 2040 2045 Ala Asn Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys 2050 2055 2060 Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Ser Gly 2065 2070 2075 2080 Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn 2085 2090 2095 Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His 2100 2105 2110 Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met 2115 2120 2125 Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr 2130 2135 2140 Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile 2145 2150 Asn Leu Pro Asn Glu Gln 2165
INFORMATION FOR SEQ ID NO:27:
SEQUENCE CHARACTERISTICS: LENGTH: 15219 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: ACGGGAA.A.7AA AATGCGTACT ACAAACTTGC ACATTCGA;A1 TTGATAAGTG CTATTTAAGT CTAACCTTTT CAATCAGAAA
I
ATGATAAAGG TTAGATTACA AAATTTATTT GACAATGACG ACATOTTATA CTGATAAATT AATTCTTCTG ACCAATGCAT
'T
ACAATTAAAT TAAACGGCAT AGTTTTTATA
CATGTTATAA
GATAACAATA TTGTAGTGAA ATCTAACTTT ACAACAATGC
C
TACATATGGG AATTGATTGA GTTGACACAC TGCTCTCAAT
T
AATTGTGAAA TCAAATTTTC TAAAAGACTA AGTGACTCAG
TI
CAAATATCTG ACTTACTTGG GCTTGATCTC AATTCATGAJA
T
Thr Gly Ser Val Leu Tyr 2155 2160 ~AAATGGGGC
AAATAAGAAC
GGGGTGCAA TTCACTGAGC LAGTAGCATT
GTTAAAAATA
AGCCAALAGC
AGCAATACAT
:AAGCAGTGA AGTGTGCCCT AATACTACA
AAATGGAGGA
AAACGGTTT AATGOATGAT 'AATGACTAA TTATATGAAT TATGTTTAG TCTAATTCAA 120 180 240 300 360 420 480 540 TAGACATGTG TTTATTACCA TTTTAGTTAA TATAAAAACT CATCAAAGGG AAATGGGGCA
AATAAACTCA
GATTGATGAT
AAGAAATCAT
TTGATGAAAG
AAGTAGGGAG
CCATGCCTAT
AACACACTCC
CCCAACCAAA
CATTTTAGTA
TGGCTCTTAG
GCAAATACAC
AAAAACACCT
TCACAGGATT
AGATACTTAA
GTCAAGATAT
CAGAAATACA
AGATGGGAGA
TGTGTATAGC
CAGTAATTAG
TACCAAAGGA
ATGTTTTCGT
GAATCTTTGC
GAGTTTTAGC
TGGAGCAAGT
ACCATATATT
CCTAATCAAT
CACAGACATG
CACACACAAA
ACAAGCTACA
TACCAAATAC
ATTTATCAAT
TATAATATAC
CCAAACTATT
ATTAAAAATA
CAAAGTCAAG
TATTCAACGT
AAACAAACTA
AATAGGTATG
AGATGCTGGA
AAATGGAAAG
AGTCAATATT
AGTGGCTCCA
TGCACTTGTG
GAGGGCAAAC
TATAGCTAAC
GCACTTTGGC
AGGATTGTTT
CAAATCTGTA
TGTGGAAGTC
GAACAATCCA
CAAACCATGA
AGACCCCTGT
TTCATATACT
TTTACATTCT
AAAAAATACA
CACGGCGGGT
AAATATGACC
CCTCAAACAA
AAAGTAAAGC
TTGAATGATA
AGTACAGGAG
TGTGGTATGC
TTATATGCTA
TATCATGTTA
GAAATGAAAT
GAGATAGAAT
GAATATAGGC
ATAACCAAAT
AATGTCTTAA
AGTTTTTATG
ATTGCACAAT
ATGAATGCCT
AAAAATATCA
TATGAGTATG
AAAGCATCAT
GCACTACAAA
CAATGGATTC
TGATAILACAA
TAGTCAATTA
CTGAATATAA
TTCTAGAATG
TCAACCCGTG
CAGTGCTCAA
CAATAACATA
CATTAAATAA
ATAATATTGA
TATTAATCAC
TGACAACACT
AATAATAACA
TGAATGTATT
TGAGATGAAG
TACAAAATAT
TATTGGCATT
AATTCCAACA
TAGTTAAGAA
AATTGGGGCA
GGATCAGCTG
CACTCCCAAT
TGAAGATGCA
TGTCCAGGTT AGGAAGGGAA AAGCTAATGG AGTAGATATA TCGAAGTATT AACATTATCA CTAGAAAGTC CTACAAAAAA
ACTATGCAAA
TCTCTTACCA
GTAAGAAAAC
CTACTGCACA
GGCACTTTCC
AAGCCTACAA
AAAAAACCAA
GGAGCTAATC
AATACAAAGA
CTGTCATCCA
TATGATGTGC
AATCATAAAT
GACACTATAA
ACAACATATC
AGCTTGACAT
ATGCTAAAAG
ATGATAATAC
GGTCTTACAG
AAGGGCCTCA
CATCTTATAG
AGAGTTGAAG
CTAAGATGGG
CAGGCAGAAA
GCTGGATTCT
CCCAACTTCT
660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100
ATGATTCTCC
TAGCAGCAGG
AAAACGAAAT
AAGTGTTTGA
CATCCACAAG
ATGGTTCAGG
TGCTAGGACA
CACAGAAGTT
TGCTGTCATT
AGACTGTGGG
AGACAGATCA
AAAACGATAC
AAAACACCCT
AGGGGGTAGT
GCAAGTAATG
TGCTAGTGTC
GGGAGGAGIA
AACTCAATTT
CAAGTGTGGT CCTAGGCAAT GCAGCAGGTC TAGGCATAAT GGGAGAGTAT
AGAGGTACAC
2160
CAAGAAACCA
GAGTAATAAA
GGATCTTTAT
CTACAGTGTA
AACTCAACCC
CAAAGAAGAT
AGTCAACATG
CAAATTCCTA
TAGCATAATA
TGGCACCAAC
CTACCCAAGA
TTCTAAGTTG
CTCATATGAA
TGATGAAAAA
ACCCACTTCA
AGAAAAAATA
TAGGAATGAG
AACTTCCAAA
TGATGATTTT
TCAATCCATT
CCAATCAATC
AAAAGAACAA
ATACACAGCA
AATATGGGTG
AAGCATCAAT
GATTAACTCA
TGTATCATTA
GAGAAGTTTG
GAATCAATAA
TCTGTTAACT
ATCATCAATC
AAACCCCTAG
TACAAGGAAA
GAGATAAATG
TTAAGTGAAA
GCTCGCGATG
AGAGCGGAAG
GAAAGCGAAA
AAATTGAGTG
TGATCAGCGA
GAATCAACTG
AACCAATTGA
GATGGGGCAA
GCTGTTCAGT
CCTATGTTCC
ATACTAGTGA
AGAAGTGCTG
GATGAAAGAA
GATGCAGCTA
TTAGACTTAA
GATGTAGAGC
CACCTGAATT
AGGGCAAGTT
CAATAGATAT
CAACAAGTGA
TAAGCTTCAA
CAATAGAAAC
ATCAAACAAA
TATTAGGAAT
GAATAAGAGA
CATTAATGAC
TAATGGCAAA
ACTTGTTGGA TCAACTCACT
CCAGACCGAA
TCAATCAGCA ATATGGAAAC ACAATGTTCT
AGTCATCTGT
AGCAGATCTC
TGCTGGCTCA
GCAAATTAGC
AAGCATATGC
CAGCAGAAGA
TTTAAGTTAA
TCATGGAGAA
CGCATCATCC
AGAAGTAACT
AGCCGACAGT
AGAAGATCTC
ATTTGATAAC
TGACAACATT
GCTCCATACA
TGCTATGGTT
CAATGATAGG
AGACACCTCA
AGACAACGAT
CAGCAATCAA
CAAACAAACG
ACCCGACAAA
ATACGTGAAC
AGAAAAAGAT
GCCAGCAGAC
TACGCCCAAA AATGCCTAGT ATATGATGTA
AGAGCAACTC
ATTGGAAGCC
CAAAAAATAC
GATGCAAATA
AAAGATCCTA
AAAGAGAGCC
ACCCCAGAAA
ACCCCAAGTG
AATGAAGAAG
ACAGCAAGAC
TTAGTAGTTG
GGTCTAAGAG
TTAGAGGCTA
GATGAAGTGT
AGTGACAATG
CAACATCAAT
TCCATCAGTA
ATTAACAATA
AAGCTTCACG
GATGATCCTG
TTGCTCATAA
GGACCTTCAC
AATTTCATCA ACTACACCTT
AAAGAAAATG
ATAAAGCATC
GGGGCAAATA
ACAAAGCTAC
AGAAGAAAGA
CGATAACATC
CAAAAGCCAA
ACAACCCTTT
AATCTAGCTA
TAGATAGAAT
CAAGTGCAGG
AAGAGATGAT
TGGCAAGACT
CTCTTAATCC
ATCTATCACT
AAAACAGACA
GAACCACCAA
TAGTAACAAA
AAGGCTCCAC
CATCACTAAC
AAGAACTTGC
TACGAGTCAC
TAAGCGCAAA GTGAAATCAA AGCATGCAGT CTAACATGCT TAAAAGTAAA AAGTATGTTA ACTACAGTCA
AAGATCTTAC
CATGAAGACA
GACATCAAAA
TCTGAACTCA
AATTATTCCT
CAAATATATC
GAGGATATAT
ACTAGAGGAT
ACACACTATA
CCACTCAAAA
AAATAAAACC
CATTTATACC
ACAATAGAAC
CTAATCTTTT
CACAAAGCAT
AGTTCCACCA
CCTCTCACAG
TATTTAAAAA
AATGCAACCA
GATACTCTTA
ATAGCACAAA
GCCATAATAT
ACAATAAAAA
AGGGTTAGTT
ACATCACCCA
ACCACTTCAA
TTCAACCCCA
AGAGTAATAA
CTAGAAAATA
TATGCAGGAT
AAACCACAGA
TATGTGACTA
TAAACTTAAT
TCCAAACATC
TCCAAAATCA
AAAATATGGG
GCCAATTCAA
TCACAAGCAA
TACTAATTAT
TCTGCAACAA
TTATGCTGTG
AGTCACGGTG
TTAACATAAT
TGTCCAAACA
ATCATCTAAT
TAGCACTATC
TCATCATCTC
ACCACACTGA
CATCCAAGCA
ATACAAAATC
CACAGACCAA C
CTCATGAGAT
TACCAACCTA
TAGCAACCAC
TAGTGTTAGT
GTCAATTTAT
CTAATTGGAA
TATCAACACT
ATAAACATCT
CTACCAGCCA
GTAAATAGAC
CACATATACT
ATTTTGGCCC
AATCACTATC
AACTCTTGAA
TCAAACCATA
TCGCAAAACC
GATGAATTGT
CAAGAATCAA
TGTAATATCC AGTTTTGGCA
TGCCAATCAC
AAAAAACATC
ACCCACAACC
AGAAACACAC
CAAGCCAAGC
CATTGCTCTA
TCTAAGATCA
CGAATTCAAA
TATCACAGTT
AGTAGATCTT
GCATACAGCT
GAATGACAGG
ACACTACACA
CTATCCGCTA
ATTAGTTAGA
ATAAATCTTA
TATTTTACAC
ATGATTGCAA
CTAGGACAGA
ATCCTGTATA
ACGCTAACCA
TAGTATGAGA
CGCACTGCCA
TCTTGTTTAT
ATGATAATCT
AAAGTTACAC ACCACCTACC ACATCACCAA CATACAACAG
C
ACAAAACCAC
G
TGTGAATTTG
ATTAGTGTCA
AATGCTATCA
ACTGACAATA
GGTGCCTACC
ACACGTTTTT
TCCACATATA
CTTCATCACA
GACCTAGAGT
GTTCAATCAA
AAATGGGAAA
TAATACATAT
CACTAAATA
TGTACCAAAT
TACAAACAAA
TCATGGTAGC
TCAAAAACAA
GGACTCTAGA
ACAGATTAAA
CAACCTCTCT
TAACAACGGT
TACTCAAGT
C
TCCACACAAG
CACAAACCAA
A~
TCCAAAAALA
T
AAAATATTAT
AGAACAAGGA
CCAATGCAAA
AAGGAGCATT
TAGAAAAAGA
CAATCAAACC
TCCTCAAACT
CAAACCAATC
GCGAATAGGC
TCTTAACAAC
TACATCCATC
GATCTTAACT
GCTAAGTGAA
CAACACACAG
CAAATCCAAT
ATAGAGTAGT
CATTGGGGCA
AAAGACCTGG TTTAAAATCT
CATAATTGCA
CACAGTTCAA
CTCACCAGAA
TTCAGCTACA
AGGCAGAACC
TCCACCAAAA
AAAGATGATT ACCATTTTGA AGTGTTCAAC TTCGTTCCCT GCAGTATATG
TGGCAACAAT
CAACTTTGCA
ATCAAACCCA
GCCAAAACGA
ACAGAAAGAG
ACAGTCCAAC
ACACCCACAG
TAGTTATTCA
AAACCTGGGG
TCTTGCTATT
ATCGACATGT
TAGTGTCATA
TAAAGTAAAA
GCTACTTATG
TATGAACTAC
ACGAAGATTT
ATCAAAAGTT
AAACAAAGCT
TCTCAAGAAT
CTCCAACATT
CAGAGAATTT
CAGTGAGTTA
GTCAAGCAAT
AGAAGTCCTT
GAAATTGCAC
AACAAGGACT
AATCCATCTG
CAAACAAACC
CGAAAAAAGA
ACACCAGCAC
AGCAATCCCT
CATCCGAGCC
AAAACTACAT
CAAATAACCA
AATGCATTGT
AGTGCAGTTA
ACAATAGAAT
CTTATGAAAC
CAAAACACAC
ACAATCAATA
CTAGGCTTCT
CTACACCTTG
GTAGTCAGTT
TACATAAATA
GAAACAGTTA
AGTGTCAATG
CTATCATTAA
GTTCAGATAG
GCATATGTTG
ACATCGCCTC
GATAGAGGAT
CAAAACAATA
AACCACCAAA
AACTACCACC
CTCACAATCC
CCTCTCAACC
CTCCACACCA
CTTAGCAGAG
TGGAGTTGAT
ACCTCACCTC
GCAGAGGTTA
TAAGTAATAT
AAGAATTAGA
CAGCTGTCAA
CCACTAAAAA
TGTTAGGTGT
AAGGAGAAGT
TATCAAATGG
ACCAATTATT
TAGAATTCCA
CAGGTGTAAC
TCAATGATAT
TAAGGCAACA
TACAGCTGCC
TATGCACTAC
GGTATTGTGA
CCAAGCAACA
ACCACAAACA
AACCCAACAA
ACTGCACTCG
ACCCCCGAAA
AACTCCACCC
AACCGTGATC
GATCCACAAG
AAGTCAGAAC
TTTTAGTGCT
AAAAGAAACC
TAAGTATAAG
CAACCGGGCC
CCTAAATGTA
GGGATCTGCA
GAACAAGATC
GGTCAGTGTT
ACCCATAGTA
GCAGAAGAAC
AACACCTTTA
GCCTATAACA
AAGTTATTCC
TATCTATGGT
CAACATCAAA
TAATGCAGGA
AACCAAAGAA
AAAGAGACCC
AAAAACTAAC
ACACAACCAC
ACACACCCAA
AAAAAACCCA
TATCAAGCAA
TCAAGTGCAA
ATAACTGAGG
TTAAGAACAG
AAATGCAATG
AATGCAGTAA
AGAAGAGAAG
TCAATAAGCA
ATAGCAAGTG
AAAAATGCTT
TTAACCAGCA
AATCAACAGA
AGCAGATTGT
AGCACTTACA
AATGATCAGA
ATCATGTCTA
GTAATAGATA
GAAGGATCAA
TCAGTATCCT
GAAACCAACC
AAAAACACCA
CCTCAAGACC
ATTAAAACAC
CTCCACACAA
GCCACATGCT
GAACGAAATT
TCTTCCTAAC
AGTTTTACCA
GTTGGTATAC
GAACTGACAC
CAGAATTACA
CACCACAGTA
AGAAGAGGAA
GTATAGCTGT
TGTTGTCTAC
AAGTGTTAGA
GCTGTCGCAT
TGGAAATCAC
TGTTGACAAA
AAAAATTAAT
TAATAAAGGA
CACCTTGCTG
ATATTTGTTT
TCTTTCCACA
GGCTGACACT TGTWAGTAC AGTCCAATCG AGTATTTTGT CACACTATGA ACAGTTTGAC
ATTACCAAGT
AATTATGACA
GTCATGCTAT
ATTTTCTAAT
CACTTTATAC
AATAAATTAC
AGTCAATGAA
TAATGTAAAT
CATTGTAGTA
CACACCAGTT
ATAGACAAAA
CAACTTACAA
TACCCACATA
ATCAACCACT
AGAAATCCTT
AGTCATAATT
AACAAGATAC
GCTGAACTGG
ATAGGATCTA
ATTGAGATCA
AAGATAAGAG
CAAACCATCC
ACATTAGATA
CAAAATGACC
CATATTGATC
GAAGTCAGCC
TCAAAAACAG
GGTAAAACTA
GGTTGTGACT
TATGTAAACA
TATGACCCTC
AAAATCAATC
ACTGGCAAAT
TTGTTATCAT
ACACTAAGCA
AACCACCTGA
CAAATATTTC
ACTAAGCTAG
AAATCAACAC
GTAAATTTGA
ACTTTGAATG
TCAAGTCAAT
ATAGAACAGA
TAAACAACAT
ATAGTGATGA
TGTACAATAC
ATCTGCTCAA
TCCACAAAAG
AAACCAAAA
TCAAGTGAAA
TTTGTAACAC
ACATAAGCAG
AATGCACTGC
ATGTGTCAAA
AGCTGGAAGG
TAGTGTTTCC
AAAGTTTAGC
CTACTACAAA
TAATAGCTAT
AAGACCAACT
TCATGTTTCA
AACATCACAG
ATCCTTAACT
ATCATTCACA
GATTAGAGGT
GCCTCCTCAT
GGACAAAAGC
AGAATATGCT
AACAAAACAA
CATTAAAAAG
TGTTATATCA
GAGACTACCA
CATAACCATA
TAATGATATT
GCATGGTTGC
TGACATATTC
CTCAGTAATT
ATCCAACAAA
CAAAGGAGTA
CAAGAACCTT
TTCTGATGAG
TTTTATTCGT
TATTATGATA
TGGTTTACTG
AAGTGGAATC
ACAACAATCT
TACAGGCTGA
TATAGTTACA
AAATTAACAG
CATTGCTTGA
GCATTACTAG
ATAGACACTT
CTTGGTATAG
TCAGCATGTG
CTTAGAGATA
TACATTGAGA
GCAGACGTGC
AGCAATCCAA
ACCGGATAAA
TACATTCAAT
AATTCCAAGT
ACTTCTCTTG
AATCGTGGGA
GATACTGTGT
TATGTAAAAG
TTTGATGCAT
AGATCTGATG
ACTACAATTA
TTGTATTGTA
AATAATATTG
GCTGACCACC
ATCATTTCCT
TAAAAACCTC
CTGGGGCAAA
ATGGTAGAAG
TGAGGCAAAA
TGTCTGAAAT
TTGGAGTGCT
TTGCTATGAG
ATGAAGAACC
GCAATAGAAA
TGAAGAAGAC
AAGAGTCAAC
TATCCTTGTA
CATAAAAACA
ATGACTGCAA
GAGCTATAGT
TTATAAAGAC
CAGTGGGCAA
GGGAACCTAT
CAATATCTCA
AATTACTACA
TTATAGTAAT
AAGCCAAAAA
CATTCAGCAA
AATCCCAAAT
CACATCATGC
AAGTATCACA
TATGTCGCGA
ATGTCACTAC
CTTCATGTTA
AAGTGGAGCT
AGAGAGTTAC
TAAACTTCTT
CAATTCACCT
AAACAACAAG
AATAAAGAAC
TGTGAATGAT
GTATATCATC
TATTACAATT
TAACCATAAC TATTTGGATA ACCACCAGCG TTTATTAAAT CATATATTTG
ATGAAATTCA
TTGGACACCT AAAAACTTAT TAGATGCCAC TCAACAATTT CTCCAACATC TTAACATCCC
TGAAGATATA
TCATCCAACC
GGAAACTCTG
GAGTGTAATG
AACTTAATTA
ACACAGTCAT
TTCCAGTCAT
AACTTACTTA
ATCTTGAATA
GATGAAAACT
AATCAATCAT
ACAACACTCT
TGGTTCAATT
AAAAGTCATG
AATCAATATG
AATCAATTTT
TGGATAAGTA
AATGTTGTGT
GAAGGCTTCT
ACAGAAGAAG
GCAGCTATTA
ACAGTGTCTG
TTGATTAAGC
AGAATCTTTG
TGTAATGAAA
TATACAGTAT
ATAAAACTAT
CTAATGTGTA
CTTTAGGGAG
GTAGACAAAG
TAATATCTAG
TACTTATGAC
AAAAAATAAT
AACTAGGATT
CAGTACTTAC
ATACAAATTC
TGAAAAAATT
TATATACAAA
GGTTTATATT
GTTGTATCGT
TGACATGGA
ATTGTTTAAA
TATCACAATT
ACATAATAA
ATCAATTTAA
AGGCTCAAAA
ATAATATCAT
TTGCAGGTGA
GACATCCAAT
CTAGGTTCTA
ATATATTAGT
TTTGATAAGG
TCTAACTGAT
TTATCTTTTT
CCCACTACTA
ATATCATAAA
ATATAAAAGT
ACGAAGAGCC
AAAGGAAAAG
AACCATAATT
AGACAAAAGT
GATGTGTTCA
ATTAAATAAC
AATAGATAAT
TTATCATAAA
AGACATCAGC
TACATTAAAC
ATTTCTTTAT
AGAAGTAGAG
GAAACGATTT
GGACCTACTA
AAATGGTAAA
TAATAATCTC
GGTCGATGA
CTTATTAAGT
GTCATAATGC
TTATGGGACA
AGTTATTTAA
AACGGCCCTT
GAGCATATGA
GGTGAACTGA
ATGTCCTCGT
ATAGAAATAA
GACAGAGTTA
AAAGATGATA
CACTCAGTAA
ATGCAACATC
ATATTAACAC
CAAACTTTAA
GGACTCAA
CTTAGCAGAT
AAAAGCTTAG
GGAGATTGTA
GGATTTATTA
TATAATAGCA
TCAAGAGTAT
TGGATAATCC
AATAACTTGA
AGACAAGCAA
AGTCTAAGTA
TTGACCATAA
AAATGGATCC
AAGGTGTTAT
ATCTTAAAAA
ATCTTAAAAA
AATTAGAAGA
CTGAACAAAT
GTGATGTAAA
AGCCCAACAA
TACTTTCGGC
ATCAAAATAT
CTCCATCATG
AATATCGATC
GTGGTTTTCA
AAATCACAAC
TAAATGTTTG
GGCTGAGATG
TACTGAAATT
TGTCTTTAAT
TGCTAAATA
GTCACACTTT
TATTAAGTA
GTGAGCTATA
TGGATTCTGT
CATTAAGAGG
CGACTCTATG
CATTATTAAT
CTCTTTTTCA
TGATTACACC
ACTAACTATA
ACCAACTTAT
TGCTACAACT
GGTGTACGCC
TAATTCAGGT
TGTGGAAAAC
CACTATCAAA
GTTAATACAC
AAATGAGGTA
GTTTATTTTA
TACTACTTAC
CTTAATTACT
TGGATTCAAT
ATTTCATAAT
TCTAAACATA
CATCACAGAT
ATTAGACAAG
ATTTCTTAAA
TTTTCTCTTC
AAGAATTAAC
TGCTTTCATT
TATAGAATCA TAAAAGGGTT TGTAAATACC TACAACAGAT GGCCCACCTT AAGGAATGCT
ATTGTCCTAC
GAAATCACAG
CTGCCTAAAA
GATCTAATAT
GAACATGAAA
TTGAGAGATA
CTCAACAACT
AGAATGTTTG
ATAGCTGAAA
CTTCAAAAGA
AACTACAACA
GCATTTAGAT
CAATCTCTGT
AGACATGCAC
AGTGGATTAT
ATTGAAGCTA
CTGATAAATG
CAGACCCATG
GAGTATGCAG
CAGTTCATGA
GTCCTGAGAG
TCTATAGGCA
ATATTTAGGA
TGTAACAATA
AATCTTGATA
CTCTAAGATG
AAAATGATTT
AAGTGGATCT
GGACTAGTTT
AGTTGAAGTT
ATAAATTCAA
CTAATCACGT
CTATGCAACC
ATATTTTACA
TATTAGAATT
ATTATATCAG
ATGAAACATC
TCTCTTGGTT
CTCCTTTCAT
ACAGATATCA
TATCATTATT
GTGATAATCA
CACAAGCAGA
GTATAGGCCA
GCAAAACAAT
TAGGTCCATG
GCTTAACACA
ACATTTGGTT
AGCTATATTT
GCATTGATAT
GTTAAACTAC
GATTATTTTA
TGAAATGATA
TCCTAGAAAT
CTCTGAAAGC
TGAATGCGAT
GGTATCACTA
AGGTATGTTT
ATTCTTCCCT
AAAAGCAGGA
TAAATGTTCT
ATGTATCTGC
GCATTTAACA
AAAGGATCAT
TATGGGTGGT
AGATCTAATA
GTCAATTGAT
TTATTTGTTA
TAAGCTTAAG
CCAGCACAAT
GATAAACACG
GGAGTTAGAA
ATACAATCAA
AGATATATTG
GGCTTTATCA
TATAAACTTA
TCAGGATTGC
ATAAATGACA
TACATGCCAT
GACAGATCGA
CTATACAATT
ACTGGTAAAG
AGGCAAATCC
GAGAGTTTGA
ATAAGCAACA
ATCATTACAG
AGTGATGTAT
ATACCTCTTG
GTTGTTAATC
ATTGAGGGCT
TCTCTCAAAG
ATAAGCAAAC
GCATTAAATA
GGAACAGAGA
GGAGTGTACT
ATACTTGATG
TACAGAGGAG
ATTGCTTTGC
AAAGTATTAA
TTGTATATGA
ATACTTATCC
GGTTCTATCG
AAGCCATTTC
CACATATACA
GAAGAGTACT
GTGTAGTCAA
AAAGAGAGCT
AAATCTTAGC
CAAGATATGG
AGTCAAATCG
ATCTTAGCAA
TAGATGAACT
TCACAATAAT
TTAATGAGGT
GGTGTCAAAA
GGAAATTCTC
CAGTTAGACT
GCCTTAAATT
CCTATATATC
ATCCAGCCAG
ATTTTAAAGT
AAAGCTTATT
AACTCCGAAA
AACACTTA;A
ATTTGCCTAT
ATCTCTACTT
TGAGTTTCAT
ACCTCCAAAA
AAATTATATA
AGAGTATTAC
TCAAAGCTAT
CAGTGTAGGT
AGAGAAAATG
TGATCTAGAG
TTATAATGAT
ATTCAATCAG
GCATGGAGTA
ATGTACATAT
TGATGAACAA
ACTGTGGACC
TATCACAGCT
TATAGAGGGT
GTTATATAAA
CCGAGATATG
TATCAAAAAA
TAGTTTAGAA
ATGCAGTTTA
TCATGCATTA
AACTTTTTTT
GCTGTTTGGT
GGTGGTGATC CTAATTTGTT ATATCGAAGC TTTTATAGGA GAACTCCAGA
CTTCCTTACA
GAAGCTATAG
AAGCTCCAGG
GATAAAAATC
GAAAGGCAAG
TACATTCAGT
ATCTTCCAGA
CCAATGCCGA
CTAAKATTAC
ATAGCCCCAA
ACAAAATATT
CTAAATGACA
GAAAGTTTAC
ATAACTAATA
GATATGATGA
GACAAAAGAG
AGAGAAAGAT
ACAATGAACA
AATGTTAATA
ACGCAGGAGA
GACCAAATAG
GAATTCATGG
TTGTTTCCAC
TGTGAATTCC
CCTATCAATC
AATTGCATAA
CCTAATAGAA
TTTACAGGAG
CTACCAGATA
AAATCTGGAT
CATAATGCTT
TTATGCAAAA
CTTTTTATAA
TACTTGAAAA
GGAAAAATAT
AGTTATTAAG
CTTGGTCATT
TTAAATATAC
GTTTAACTCG
AAAAAACAAT
ATTTATTAGC
AAGAACTGAG
AATATCTAAG
CTGCATCAAT
ATGTATTAAC
GTTTTGGTCT
TTATTCTCAT
ATGTTGATAT
AAATAAGTTT
CTCACATCA
ATATTTTAAG
GTTTGTGTTG
TGATAGACTG
GTTTGTAACA
TAGTGAGATT
TTCTAAAAGT
TATAGAACCA
AGCAGAAAAA
AACATCAGCA
AACTTTACTT
TTTAGAAAAT
ATCCAATATA
AACTAGCACT
TGGTGAAAGA
GCCAGTGTAC
AAAATTAGAC
TACTGGAACA
TGTCAATTAT
ACCAGCTTAT
AGAAAAGTAT
TAGCCTGATG
hCCGAAGCTG
CATCAAGTTG
AACCCAATAT
CTCTAATTTA
TACTAATTTA
AGCTATTATA
AACAAATTCT
TTGATGAGCG
AATAGATTAG
GCACAACATT
ACTTACCCTC
ATAGTTAATC
ATAGATACAA
ATAAGGATAC
CTTAGTATAA
GTAGGAGTAA
ATAGCCAGTG
GGACCCACCA
AACAGACAAG
TGGGTATATG
CTTGGACTGT
TTACACCGTT
AGAACAACAA
GGAGATGAG
TCGGTTGTGG
AATGAGATAC
AAGCAAGTGA
GTAGAATTAT
ATATTAGTAC
GCTGGAATT
CTGGTCACGA
TGACATGTGT
ATCCACAGGC
CAGTAACAGA
ATACTACCAC
ATGGATTAAG
TTATATCAGG
CTGATATTAA
TTCCACTAGA
CTGAATTAAG
CATCGCCAAG
GTATAATAAT
AGCCATGGGT
TTTTAACCAA
CATCCATAGA
CATATGAAAA
TAACAGTCAG
ATTATCATTT
ATATCGACAT
AACAATTCAC
ATTTGATGAA
TACAAAAGCA
TCTTAAGTAA
ATAAAATGTC
GGATTCTGAT
TTTACAAGAT
CATCACATTT
TTTAGGGTCT
AGTCTTAAGT
TGAGATTGAT
AGTTGTTTAT
AACAAAATCC
TAGGGCTACT
TTGTAACAAA
CAAGTATGTA
TATTATGTTC
AGAAAAATAT
AGGCTCATCC
AAAGCAAAGA
CAACAAAGAT
AGCCAAAAAG
TAGTAGACCA
TGATACTAGT
TGTGTTTCAA
AAACATATGT
ACCTCCTATA
GACATGTTC
CAAAGCACTT
TGATTATTTT
TATTCAACTT
ATGAAAGATT
ATGTTCATTA
AAAGGTTATG
TTGGAGTTAA
GTCATAAAAT
TTTAAGTTGT
GTTAACATAG
AGAATGGGGT
GAATTTTACA
CTAACAAAAC
CACCCAACCC
AACAAACCTA
AGTAAAATGC
TTGTACAATT
GCAAAATCTA
GCATCACTTT
TCCACAGGAT
AGTTGTATAG
CTTCATCCAG
CCTATTGAAT
ACCATTCCTG
GCAGAACCTA
AAAATTATAA
AGATGCATTT
ATTACTATAT
TTAATCCTTA
CAAAAGGTAT
ATTTGAATGT
GTAAAGCAAA
TAGACAGTAG
ACATAGTCAA
GGTTTTTAAA
ATTATCACCC
TAATAAATGT
CATCAAATCT
AAATAAGAAT
CAGAAACTTT
AATTTTGTAT
ATATTAAATC
TATTTCCAAT
ACCAACTTTA
ATTGCATGCT
GCAAGATCAG
CATTCATAGG
ACATAAGATA
TTCTAAGGTT
CTACAGATGC
TTAGCATCTT
TTGAATGGAG
TAATTGCAAA
TAAAAACTTA
CAATAGGCCC
TTTTGAAAAA
TTTCTTTAAT
ATTAGAATGT
CTACTGGAAA
TCAAGACACA
ACGCCTTAAT
AACACACATG
AGATAAATTA
CTTTTACATT
TGCTAATTCA
AGAAAATATG
AAGTGGAAAT
TTCCACTGTT
TGTTGTGATA
CACCACCACT
TCCTTGGCAT
TATAGAGTAT
TGAAGGAGCT
CATTTACAGA
ATACAACGGG
AACTAATAAC
TGTCTGCGAT
TAAGCATGTA
ATATCATGCT
CGTGTGCCTA
TGCAAATATA
GATTGGGGAG
GCTTATAAGA
GATATGAACA
TCTATGTCTA
AGTTTGCGTA
AATGCTAAAT
AAAGCTATAT
ACCATTAAAA
AGTTATAACT
GAATTAGAAG
TCATTAATTC
ACCGAATCTA
ACCACAAGAT
GACAAGATTA
TCACATCAGA
CATGTCAATA
ATTTTAAAAG
GGTAACTTAT
AGTTTAAAAG
CATATAAACA
ATTCATTGGT
GCTGAATTAC
AGAAAGTGCA
CAAGATGACA
GGTAGCAAGT
CTTCCTGTTT
AGGGGTACAT
CTTATTTGCT
CTTCAGATCT
AAGTTTTCCT
GAATAAAAGG
TTACCGTATG
TATCTTACAT
ATAAAAACAA
TTTCAGACAA
ATAATTATAA
CTGTTAAAAG
TGATGATGTC
TCAATTATAG
TAGATCATTC
CATCTTTAGT
GATTTAACTT
ATCTTAAGAT
TATTACGTAC
ATTGCAATGA
TAGATTATGG
CTTATTTACA
CTGTTACAGC
AGTACTGTTC
TTGATTTCA TAAAAGGATC TTGATGTTGT
AACTGATCAT
ATGTTTTCAT
TCTTTGTGTT
AGAACAAAA
CTGTCACAGT
CCCTTGGGTT
AGATTTAGTT
ATTCAATGAT
CACTCATTTG
CAAACTATAT
TAATAATAGT
AACATTCTCT
CAAACAAGAC
AGGTAATACA
AAGGAATAGT
TGTATTTAGT
TAAGGACCCC
GGTAGTAGAA
TCATAGTTTA
TGAGAATTTA
TATAAAATTT
CAATTGGAGT
TTCTGTAAAT
ATTAGATAAC
TGAAGTTTAC
ACAAAATGCT
AAATTGATAC
ATCGATGCAA
ATTAAGACTT
ATAGCTGGAC
CTAAAATGGC
TACATGATAG
GAGCTCAAGA
TAGTTTAAAA
TAGTTATTAA
TTTTAGTCTT
ACACAACGAG
TTTCAAGAAC
ATATTAAAAG
CATTGTCAAA
GTAATGAAGT
TAGATCATGT
AGTCCACATA
AGCTGATTAA
TATCATTAAC
AGAATATACA
AAGGGGTTAA
TAAAAATTTC
CTTAATACCT
ATTGAAGAGT
ATTCAGCAAC
TTTAAATTTT
TCCTTACTTA
KATAACAGGT
AAGTTTGGTC
AACTTTTCAA
ATAAAAGTCT
ATTATGCCTA
TTCCTTTGTT
GTAGTTAATG
AAGCTTATAA
AGATCAGCTG
AGTGAATTGT
AGTGTGCTAT
AAATTTAGAT
TAATTTAGCA
AAAACTAACA
AAAAAACTGA
ACCCTATAAC
GAGATATATT
ACCACAAGCA
AACTTAATTA
TAAATAGTTT
ACAACCTTCC
GCTAACACAT
TATTGATTCC
ATTATACATG
CAAGGAATCT
AAAAAAAGGA
ATCATATTCT
TATGAATATC
CAATCATTTA
AACAACCAAT
CAACGAACAG
CATTATATTA
AAAATTATCA
TGCATTCACA
ACATTAGTTT TTGACACTTT TTTTCTCGT

INFORMATION FOR SEQ ID NO:28:
SEQUENCE CHARACTERISTICS:
LENGTH: 2166 amino acids
TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) Met SEQUENCE DESCRIPTION: SEQ ID NO:28: Asp Pro Ile Ile Asn Gly Asn Ser Ala 10 Asn Val Tyr Leu Thr Asp Ser Tyr Leu Lys Giy Val Ile Ser Phe 25 Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Ile Ser Arg Gin Ser Pro Leu Leu Glu 55 Thr Ile Thr Gin Ser Leu Ile Ser Arg 70 Ser Giu Cys Asn Lys Asn Asp His Met Asn Tyr His Lys Ala Leu Gly Thr Asn Leu Lys Lys Leu Giy Giu Leu 339 Leu Glu Met Ser Ile Arg Asn Lys 130 Ser Gly 145 Leu Ser His Ser Leu Met Asn Leu 210 Glu Val 225 Phe Leu LysAs Ser Asn 290 Phe Asn 305 *Leu Lys *see Gly Phe Giu Pro Ser Ser 1.00 Arg Ala 115 Leu Gly Asp Glu Ala Val Val Asn 180 Cys Ser 195 Tyr Thr Lys Ser Gin Phe Lys Lys 260 Ile Ser 275 Cys Leu Asn Val Leu Phe Ile Met 340 Thr Tyr Giu Gin Ile Glu Leu Lys Asn Ser 150 Giu Asn 165 Gin Asn Met Gin Lys Leu His Gly 230 Ile Leu 245 Ile Thr Leu Ser Asn Thr Val Leu 310 His Asn 325 Ser Leu Phe Ile Ile Giu 135 Vai Asn Ile His Asn 215 Phe Asn Thr Arg Leu 295 Ser Glu Ile Gin Ala Ser 120 Lys Leu Gin Thr Pro 200 Asn Ile Gin Thr Leu 280 Asn Gin Giy Leu Ser Thr 105 Asp Asp Thr Ser Ile 185 Pro Ile Leu Tyr Thr 265 Asn Lys Leu Phe Asn 345 Leu 90 Thr Val Arg Thr Tyr 170 Lys S er Leu Ile Gly 250 Tyr Val Ser Phe Tyr 330 Ile Leu As n Lys Val Ile 155 Thr Thr Trp Thr Asp 235 Cys Asn Cys Leu Leu 315 Ile Thr Met Leu Val Lys 140 Ile Asn Thr Leu Gin 220 Asn Ile Gin Leu Gly 300 Tyr Ile Giu Thr Leu Tyr 125 Pro Lys Ser Leu Ile 205 Tyr Gin Val Phe Ile 285 Leu Giy Lys Glu Tyr Lys 110 Aia Asn Asp Asp Leu 190 His Arg Thr Tyr Leu 270 Thr Arg Asp Giu Asp 350 Lys Lys Ile Asn Asp Lys 175 Lys Trp Ser Leu His 255 Thr Trp Cys Cys Val 335 Gin Ser Ile Leu Asn Ile 160 Ser Lys Phe Asn Ser 240 Lys Trp Ile Gly Ile 320 Glu Phe 340 Lys Ile Asp 385 Leu Asn Met Glu Phe 465 Pro Tyr Leu Lys Pro 545 His Asp Aen Asn Val Lye Lye 370 Lys Ser Asn Val Thr 450 Ile Thr Lye Ile Lys 530 Lye Ile Arg Glu Ser 610 Gly Arg 355 Ala Thr Lye Leu Asp 435 Arg Tyr Leu Leu Ile 515 Va1 Asp Gin Ser Cys 595 Asn Arg Phe Gin Val Phe Ser 420 Glu Phe 
Arg Arg Asn 500 Leu Asp Leu Asn Arg 580 Asp His Met Tyr Lye Ser Leu 405 Glu Arg Tyr Ile Asn 485 Thr Ser Leu Ile Tyr 565 Arg Leu Val Phe Asn Asp Asp 390 Lye Leu Gin Leu Ile 470 Ala Tyr Gly Glu Trp 550 Ile Val Tyr Val Ala Ser Leu 375 Asn Leu Tyr Ala Leu 455 Lys Ile Pro Leu Met 535 Thr Glu Leu Asn Ser 615 Met Met 360 Leu Ile Ile Phe Met 440 Ser Gly Val Ser Arg 520 Ile Ser His Glu Cys 600 Leu Gin Leu Ser Ile Lye Leu 425 Asp Ser Phe Leu Leu 505 Phe Ile Phe Glu Tyr 585 Va1 Thr Pro Asn Arg Asn Leu 410 Phe Ser Leu Val Pro 490 Leu Tyr Asn Pro Lys 570 Tyr Val Gly Gly Asn Val Gly 395 Ala Arg Val Ser Asn 475 Leu Glu Arg Asp Arg 555 Leu Leu Asn Lye Met Ile Cys 380 Lys Gly Ile Arg Thr 460 Thr Arg Ile Glu Lye 540 Asn Lye Arg Gin Glu 620 Phe Thr 365 His Trp Asp Phe Ile 445 Leu Tyr Trp Thr Phe 525 Ala Tyr Phe Asp Ser 605 Arg Arg Asp Thr Ile Asn Gly 430 Asn Arg Asn Leu Glu 510 His Ile Met Ser Asn 590 Tyr Glu Gin Ala Leu Ile Asn 415 His Cys Gly Arg Asn 495 Aen Leu Ser Pro Glu 575 Lye Leu Leu Ile Ala Leu Leu 400 Leu Pro Asn Ala Trp 480 Tyr Asp Pro Pro Ser 560 Ser Phe Asn Ser Gln 0 o t *as* 0 0.
341 625 Ile Glu Leu Asn Asn 705 Asp Ile Ile Leu Trp 785 Lys Ile Asp Ala Asp 865 Pro Leu Ser Lys Asn 690 Gin Glu Pro Lys Tyr 770 Thr Phe Ser Tyr Gly 850 Met Ala Ala Leu Ala 675 Tyr Ala Leu Leu Asp 755 Arg Ile Ser Lys Leu 835 Ile Gin Ser Glu Lys 645 Thr Arg 660 Gly Ile Ile Ser Phe Arg His Gly 725 Val Thr 740 His Val Tyr His Glu Ala Ile Thr 805 Pro Vai 820 Leu Ala Gly His Phe Met Ile Lys 885 630 Met Tyr Ser Lys Tyr 710 Val Ile Val Met Ile 790 Ala Arg Leu Lys Ser 870 Lys Ile Gly Asn Cys 695 Glu Gin Ile Asn Gly 775 Ser Leu Leu Asn Leu 855 Lys Val Ala Asp Lys 680 Ser Thr Ser Cys Leu 760 Gly Leu Ile Ile Ser 840 Lys Thr Leu Glu Leu 665 Ser Ile Ser Leu Thr 745 Asn Ile Leu Asn Glu 825 Leu Gly Ile Arg 635 Asn Ile 650 Glu Leu Asn Arg Ile Thr Cys Ile 715 Phe Ser 730 Tyr Arg Glu Val Glu Gly Asp Leu 795 Gly Asp 810 Gly Gin Lys Leu Thr Glu Gin His 875 Val Gly 890 Leu Gin Tyr Asp 700 Cys Trp His Asp Trp 780 Ile Asn Thr Leu Thr 860 Asn Pro Gin Lys Asn 685 Leu Ser Leu Ala Glu 765 Cys Ser Gin His Tyr 845 Tyr Gly Trp Phe Ile 670 Asp Ser Asp His Pro 750 Gin Gin Leu Ser Ala 830 Lys Ile Val Ile Pho 655 Leu Asn Lys Val Leu 735 Pro Ser Lys Lys Ile 815 Gin Glu Ser Tyr Asn 895 640 Pro Glu Tyr Phe Leu 720 Thr Phe Gly Leu Gly 800 Asp Ala Tyr Arg Tyr 880 Thr e g.
Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr 900 905 910
Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe 915 920 925
Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His 930 935 940
Ala Leu Cys Asn Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys 945 950 955 960
His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Ser 965 970 975
Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu 980 985 990
Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala 995 1000 1005
Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu 1010 1015 1020
Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu 1025 1030 1035 1040
Thr Cys Val Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr 1045 1050 1055
Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile 1060 1065 1070
Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala 1075 1080 1085
Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu 1090 1095 1100
Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His 1105 1110 1115 1120
Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys 1125 1130 1135
Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu 1140 1145 1150
Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met 1155 1160 1165
Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys 1170 1175 1180
Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr 1185 1190 1195 1200
Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile 1205 1210 1215
Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asn Ile Lys Tyr 1220 1225 1230
Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val 1235 1240 1245
Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly 1250 1255 1260
Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val 1265 1270 1275 1280
Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp 1285 1290 1295
Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu 1300 1305 1310
Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe 1315 1320 1325
Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser 1330 1335 1340
Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn 1345 1350 1355 1360
Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr 1365 1370 1375
Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly 1380 1385 1390
Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn 1395 1400 1405
Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro 1410 1415 1420
Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile 1425 1430 1435 1440
Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr 1445 1450 1455
Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile 1460 1465 1470
Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn 1475 1480 1485
Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile 1490 1495 1500
Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu 1505 1510 1515 1520
Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn 1525 1530 1535
Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala 1540 1545 1550
Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu 1555 1560 1565
Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu 1570 1575 1580
Gln Lys Val Ile Lys Tyr Ile Val Asn Gln Asp Thr Ser Leu Arg Arg 1585 1590 1595 1600
Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn 1605 1610 1615
Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His 1620 1625 1630
Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met 1635 1640 1645
Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe 1650 1655 1660
Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe 1665 1670 1675 1680
Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser 1685 1690 1695
Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr 1700 1705 1710
Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys 1715 1720 1725
Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr 1730 1735 1740
Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe 1745 1750 1755 1760
Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile 1765 1770 1775
Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu 1780 1785 1790
Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser 1795 1800 1805
Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val 1810 1815 1820
Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp 1825 1830 1835 1840
Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala 1845 1850 1855
Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg 1860 1865 1870
Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile 1875 1880 1885
Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu 1890 1895 1900
Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser 1905 1910 1915 1920
Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp 1925 1930 1935
Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp 1940 1945 1950
Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys 1955 1960 1965
Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu 1970 1975 1980
Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu 1985 1990 1995 2000
Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile 2005 2010 2015
Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg 2020 2025 2030
Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp 2035 2040 2045
Ala Asn Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys 2050 2055 2060
Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly 2065 2070 2075 2080
Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn 2085 2090 2095
Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His 2100 2105 2110
Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met 2115 2120 2125
Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr 2130 2135 2140
Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr 2145 2150 2155 2160
Asn Leu Pro Asn Glu Gln 2165

INFORMATION FOR SEQ ID NO:29:
SEQUENCE CHARACTERISTICS:
LENGTH: 15219 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: RNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:
ACGGGAAAAA AATGCGTACT ACAAACTTGC ACATTCGAAA AAAATGGGGC AAATAAGAAC 60
TTGATAAGTG CTATTTAAGT CTAACCTTTT CAATCAGAAA TGGGGTGCAA TTCACTGAGC 120
ATGATAAAGG TTAGATTACA AAATTTATTT GACAATGACG AAGTAGCATT GTTAAAAATA 180
ACATGTTATA CTGATAAATT AATTCTTCTG ACCAATGCAT TAGCCAAAGC AGCAATACAT 240
ACAATTAAAT
GATAACAATA
TACATATGGG
AATTGTGAAA
CAAATATCTG
TAGACATGTG
AATAAACTCA
GATTGATGAT
AAGAAATCAT
TTGATGAA.AG
AAGTAGGGAG
CCATGCCTAT
AACACACTCC
CCCAACCAAA
CATTTTAGTA
TGGCTCTTAG
GCAAATACAC
AAAAACACCT
TCACAGGATT
AGATACTTAA
GTCAAGATAT
CAGAAATACA
AGATGGGAGA
TGTGTATAGC
CAGTAATTAG
TACCAAAGGA
TAAACGGCAT
TTGTAGTGAA
AATTGATTGA
TCAAATTTTC
ACTTACTTGG
TTTATTACCA
CCTAATCAAT
CACAGACATG
CACACACAAA
ACAAGCTACA
TACCAAATAC
ATTTATCAAT
TATAATATAC
CCAAACTATT
ATTAAAAATA
CAAAGTCAAG
TATTCAACGT
AAACAAACTA
AATAGGTATG
AGATGCTGGA
AAATGGAAAG
AGTCAATATT
AGTGGCTCCA
TGCACTTGTG
GAGGGCAAAC
TATAGCTAAC
AGTTTTTATA
ATCTAACTTT
GTTGACACAC
TAAAAGACTA
GCTTGATCTC
TTTTAGTTAA
CAAACCATGA
AGACCCCTGT
TTCATATACT
TTTACATTCT
AAAAAATACA
CACGGCGGGT
AAATATGACC
CCTCAAACAA
AAAGTAAAGC
TTGAATGATA
AGTACAGGAG
TGTGGTATGC
TTATATGCTA
TATCATGTTA
GAAATGAAAT
GAGATAGAAT
GAATATAGGC
ATAACCAAAT
AATGTCTTAA
AGTTTTTATG
CATGTTATAA
ACAACAATGC
TGCTCTCAAT
AGTGACTCAG
AATTCATGAA
TATAAAAACT
GCACTACAAA
CAATGGATTC
TGATAAACAA
TAGTCAATTA
CTGAATATAA
TTCTAGAATG
TCAACCCGTG
CAGTGCTCAA
CAATAACATA
CATTAAATAA
ATAATATTGA
TATTAATCAC
TGTCCAGGTT
AAGCTAATGG
TCGAAGTATT
CTAGAAAGTC
ATGATTCTCC
TAGCAGCAGG
AAAACGAAAT
AAGTGTTTGA
CAAGCAGTGA
CAATACTACA
TAAACGGTTT
TAATGACTAA
TTATGTTTAG
CATCAAAGGG
TGACAACACT
AATAATAACA
TGAATGTATT
TGAGATGAAG
TACAAAATAT
TATTGGCATT
AATTCCAACA
TAGTTAAGAA
AATTGGGGCA
GGATCAGCTG
CACTCCCAAT
TGAAGATGCA
AGGAAGGGAA
AGTAGATATA
AACATTATCA
CTACAAAAA
AGACTGTGGG
AGACAGATCA
AAAACGATAC
AAAACACCCT
AGTGTGCCCT
AAATGGAGGA
AATGGATGAT
TTATATGAAT
TCTAATTCAA
AAATGGGGCA
ACTATGCAAA
TCTCTTACCA
GTAAGAAAAC
CTACTGCACA
GGCACTTTCC
AAGCCTACAA
AAAAAACCAA
GGAGCTAATC
AATACAAAGA
CTGTCATCCA
TATGATGTGC
AATCATAAAT
GACACTATAA
ACAACATATC
AGCTTGACAT
ATGCTAAAAG
ATGATAATAC
GGTCTTACAG
AAGGGCCTCA
CATCTTATAG
ATGTTTTCGT
GAATCTTTGC
GAGTTTTAGC
TGGAGCAAGT
ACCATATATT
CAAGTGTGGT
CAAGAAACCA
GAGTAATAAA
AACTCAACCC
AGTCAACATG
CAAATTCCTA
TAGCATAATA
TGGCACCAAC
CTACCCAAGA
TTCTAAGTTG
CTCATATGAA
TGATGAAAAA
ACCCACTTCA
AGAAAAAATA
TAGGAATGAG
AACTTCCAAA
TGATGATTTT
TCAATCCATT
CCAATCAATC
AAAAGAACAA
GCACTTTGGC
AGGATTGTTT
CAAATCTGTA
TGTGGAAGTC
GAACAATCCA
CCTAGGCAAT
GGATCTTTAT
CTACAGTGTA
CAAAGAAGAT
GAGAAGTTTG
GAATCAATAA
TCTGTTAACT
ATCATCAATC
AAACCCCTAG
TACAAGGAAA
GAGATAAATG
TTAAGTGAAA
GCTCGCGATG
AGAGCGGAAG
GAAAGCGAAA
AAATTGAGTG
TGATCAGCGA
GAATCAACTG
AACCAATTGA
GATGGGGCAA
ATTGCACAAT
ATGAATGCCT
AAAAATATCA
TATGAGTATG
AAAGCATCAT
GCAGCAGGTC
GATGCAGCTA
TTAGACTTAA
GATGTAGAGC
CACCTGAATT
AGGGCAAGTT
CAATAGATAT
CAACAAGTGA
TAAGCTTCAA
CAATAGAAAC
ATCAAACAAA
TATTAGGAAT
GAATAAGAGA
CATTAATGAC
AAATGGCAAA
ACTTGTTGGA
TCAACTCACT
CCAGACCGAA
TCAATCAGCA
ATATGGAAAC
CATCCACAAG
ATGGTTCAGG
TGCTAGGACA
CACAGAAGTT
TGCTGTCATT
TAGGCATAAT
AAGCATATGC
CAGCAGAAGA
TTTAAGTTAA
TCATGGAGAA
CGCATCATCC
AGAAGTAACT
AGCCGACAGT
AGAAGATCTC
ATTTGATAAC
TGACAACATT
GCTCCATACA
TGCTATGGTT
CAATGATAGG
AGACACCTCA
AGACAACGAT
CAGCAATCAA
CAAACAAACG
ACCCGACAAA
ATACGTGAAC
AGGGGGTAGT
GCAAGTAATG
TGCTAGTGTC
GGGAGGAGAA
AACTCAATTT
GGGAGAGTAT
AGAGCAACTC
ATTGGAAGCC
CAAAAAATAC
GATGCAAATA
AAAGATCCTA
AAAGAGAGCC
ACCCCAGAAA
ACCCCAAGTG
AATGAAGAAG
ACAGCAAGAC
TTAGTAGTTG
GGTCTAAGAG
TTAGAGGCTA
GATGAAGTGT
AGTGACAATG
CAACATCAAT
TCCATCAGTA
ATTAACAATA
AAGCTTCACG
AGAGTTGAAG
CTAAGATGGG
CAGGCAGAAA
GCTGGATTCT
CCCAACTTCT
AGAGGTACAC
AAAGAAAATG
ATAAAGCATC
GGGGCAAATA
ACAAAGCTAC
AGAAGAAAGA
CGATAACATC
CAAAAGCCAA
ACAACCCTTT
AATCTAGCTA
TAGATAGAAT
CAAGTGCAGG
AAGAGATGAT
TGGCAAGACT
CTCTTAATCC
ATCTATCACT
AAAACAGACA
GAACCACCAA
TAGTAACAAA
AAGGCTCCAC
ATACACAGCA GCTGTTCAGT ACAATGTTCT AGAAAAAGAT GATGATCCTG CATCACTAAC
AATATGGGTG CCTATGTTCC AGTCATCTGT GCCAGCAGAC TTGCTCATAA AAGAACTTGC
AAGCATCAAT
GATTAACTCA
TGTATCATTA
AGCATGCAGT
CATGAAGACA
GACATCAAAA
TCTGAACTCA
AATTATTCCT
CAAATATATC
GAGCATATAT
ACTAGAGGAT
ACACACTATA
CCACTCAAAA
AAATAAAACC
CATTTATACC
ACAATAGAAT
CTAATCTTTT
CATAAAGCAT
AGTTCTACCA
CTTCTCACAG
TATTTAAAAA
AATGCAACCA
GATACTCTTA
ATAGCACAAA
GCCATAATAT
ATACTAGTGA
AGAAGTGCTG
GATGAAAGAA
CTAACATGCT
TTCAACCCCA
AGAGTAATAA
CTAGAAAATA
TATGCAGGAT
AAACCACAGA
TATGTGACTA
TAAACTTAAT
TCCAAACATC
TCCAAAATCA
AAAATATGGG
GCCAATTCAA
TCACAAGCAA
TACTAATTAT
TCTGTAACAA
TTATGCTGTG
AGTCATGGTG
TTAACATAAT
TGTCCAAACA
ATCATCTAAT
TAGCACTATC
TCATCATCTC
AGCAGATCTC
TGCTGGCTCA
GCAAATTAGC
TAAAAGTAAA
CTCATGAGAT
TACCAACCTA
TAGCAACCAC
TAGTGTTAGT
GTCAATTTAT
CTAATTGGAA
TATCAACACT
ATAAACATCT
CTACCAGCCA
GTAAATAGAC
CACATATACT
ATTTTGGCCC
AATCACTATT
AACTCTTGAA
TCAAATTATA
TCGCAAAACC
GATGAATTGT
CAAGAATCAA
TGTAATATCC
AGTTTTGGCA
TGCCAATCAC
GCATACAGCT
GAATGACAGG
ACACTACACA
CTATCTGCTA
ATTAGTTAGA
ATAAATCTTA
TATTTTACAC
ATGATTGCAA
CTAGGACAGA
ATCCTGTATA
ACGCTAACTA
TAGTATGAGA
CGCACTGCCA
TCTTGTTTAT
ATGATAATCT
AAAGTTACAC
ACACGTTTTT
TCCACATATA
CTTCATCACA
GACCTAGAGT
GTTCAATCAA
AAATGGGAAA
TAATACATAT
TACTAAATAA
TGTATCAAAT
TATAAACAAA
TCATGGTAGC
TCAAAAACAA
GGACTCTAGA
ACAGATTAAA
CAACCTCTCT
TAACAACGGT
TACGCCCAAA GGACCTTCAC AATGCCTAGT AATTTCATCA ATATGATGTA ACTACACCTT AAGTATGTTA ACTACAGTCA CATTGCTCTA TGTGAATTTG TCTAAGATCA ATTAGTGTCA CGAATTCAAA AATGCTATCA TATCACAGTT ACTGACAATA AGTAGATCTT GGTGCCTACC
TACGAGTCAC
TAAGCGCAAA
GTGAAATCAA
AAGATCTTAC
AAAATATTAT
AGAACAAGGA
CCAATGCAAA
AAGGAGCATT
TAGAAAAAGA
CAATCAAACC
TCCTCAAACT
CAAACCAATC
GCGAATAGGT
TCTTAACAAC
TACATCCATC
GATCTTAACT
GCTAAGTGAA
CAACACATAG
CAAATCCAAT
ATAGAGTAGT
CATTGGGGCA
AAAGACCTGG
TTTAAAATCT
CATAATTGCA
CACAGTTCAA
ACAATAAAAA
AGGGTTAGTT
ACATCACCCA
ACCACTTCAA
AAAGATGATT
CAACTTTGCA
ATCAAACCCA
GCCAAAACGA
ACAGAAAGAG
ACAGTCCAAC
ACACCCACAG
TAGTTATTCA
AAACCTGGGG
TCTTGCTATT
ATCGACATGT
TAGTGTCATA
TAAAGTAAAA
GCTACTTATG
TATGAACTAC
ACGAAGATTT
ATCAAAAGTT
AAACAAAGCT
TCTCAAGAAT
CTCCAACATT
CAGAGAATTT
ACCACACTGA
CATCCAAGCA
ATACAAAATC
CACAGACCAA
ACCATTTTGA
AATCCATCTG
CAAACAAACC
CGAAAAAAGA
ACACCAGCAC
AGCAATCCCT
CATCCGAGCC
AAAACTACAT
CAAATAACCA
AATGCATTGT
AGTGCAGTTA
ACAATAGAAT
CTTATGAAAC
CAAAACACAC
ACAATCAATA
CTAGGCTTCT
CTACACCTTG
GTAGTCAGTT
TACATAAATA
GAAACAGTTA
AGTGTCAATG
AAAAAACATC
ACCCACAACC
AGAAACACAC
CAAGCCAAGC
AGTGTTCAAC
CAAAACAATA
AACCACCAAA
AACTACCACC
CTCACAATCC
CCTCTCAACC
CTCCACACCA
CTTAGCAGAG
TGGAGTTGAT
ACCTCACCTC
GCAGAGGTTA
TAAGTAATAT
AAGAATTAGA
CAGCTGTCAA
CCACTAAAAA
TGTTAGGTGT
AAGGAGAAGT
TATCAAATGG
ACCAATTATT
TAGAATTCCA
CAGGTGTAAC
ACCACCTACC
ACATCACCAA
CATACAACAG
ACAAAACCAC
TTCGTTCCCT
CCAAGCAACA
ACCACAAACA
AACCCAACAA
ACTGCACTCG
ACCCCCGAAA
AACTCCACCC
AACCGTGATC
GATCCACAAG
AAGTCAGAAC
TTTTAGTGCT
AAAAGAAACC
TAAGTATAAG
CAACCGGGCC
CCTAAATGTA
GGGATCTGCA
GAACAAGATC
GGTCAGTGTT
ACCCATAGTA
GCAGAAGAAC
AACACCTTTA
CTACTCAAGT
TCCACACAAG
CACAAACCAA
GTCCAAAAAA
GCAGTATATG
AACCAAAGAA
AAAGAGACCC
AAAAACTAAC
ACACAACCAC
ACACACCCAA
AAAAAACCCA
TATCAAGCAA
TCAAGTGCAA
ATAACTGAGG
TTAAGAACAG
AAATGCAATG
AATGCAGTAA
AGAAGAGAAG
TCAATAAGCA
ATAGCAAGTG
AAAAATGCTT
TTAACCAGCA
AATCAACAGA
AGCAGATTGT
AGCACTTACA
CTCACCAGAA
TTCAGCTACA
AGGCAGAACC
TCCACCAAA
TGGCAACAAT
GAAACCAACC
AAAAACACCA
CCTCAAGACC
ATTAAAACAC
CTCCACACAA
GCCACATGCT
GAACGAAATT
TCTTCCTAAC
AGTTTTACCA
GTTGGTATAC
GAACTGACAC
CAGAATTACA
CACCACAGTA
AGAAGAGGAA
GTATAGCTGT
TGTTGTCTAC
AAGTGTTAGA
GCTGTCGCAT
TGGAAATCAC
TGTTGACAAA
CAGTGAGTTA CTATCATTAA TCAATGATAT GCCTATAACA AATGATCAGA AAAAATTAAT
GTCAAGCAAT
AGAAGTCCTT
GAAATTGCAC
AACAAGGACT
GGCTGACACT
ATTACCAAGT
AATTATGACA
GTCATGCTAT
ATTTTCTAAT
CACTTTATAC
AATAAATTAC
AGTCAATGAA
TAATGTAAAT
CATTGTAGTA
CACACCAGTT
ATAGACAAAA
CAACTTACAA
TACCCACATA
ATCAACCACT
AGAAATCCTT
AGTCATAATT
AACAAGATAC
GCTGAACTGG
ATAGGATCTA
ATTGAGATCA
GTTCAGATAG
GCATATGTTG
ACATCGCCTC
GATAGAGGAT
TGTAAAGTAC
GAAGTCAGCC
TCAAAAACAG
GGTAAAACTA
GGTTGTGACT
TATGTAAACA
TATGACCCTC
AAAATCAATC
ACTGGCAAAT
TTGTTATCAT
ACACTAAGCA
AACCACCTGA
CAAATATTTC
ACTAAGCTAG
AAATCAACAC
GTAAATTTGA
ACTTTGAATG
TCAAGTCAAT
ATAGAACAGA
TAAACAACAT
ATAGTGATGA
TAAGGCAACA
TACAGCTGCC
TATGCACTAC
GGTATTGTGA
AGTCCAATCG
TTTGTAACAC
ACATAAGCAG
AATGCACTGC
ATGTGTCAAA
AGCTGGAAGG
TAGTGTTTCC
AAAGTTTAGC
CTACTACAAA
TAATAGCTAT
AAGACCAACT
TCATGTTTCA
AACATCACAG
ATCCTTAACT
ATCATTCACA
GATTAGAGGT
GCCTCCTCAT
GGACAAAAGC
AGAATATGCT
AACAAAACAA
CATTAAAAAG
AAGTTATTCC
TATCTATGGT
CAACATCAAA
TAATGCAGGA
AGTATTTTGT
TGACATATTC
CTCAGTAATT
ATCCAACAAA
CAAAGGAGTA
CAAGAACCTT
TTCTGATGAG
TTTTATTCGT
TATTATGATA
TGGTTTACTG
AAGTGGAATC
ACAACAATCT
TACAGGCTGA
TATAGTTACA
AAATTAACAG
CATTGCTTGA
GCATTACTAG
ATAGACACTT
CTTGGTATAG
TCAGCATGTG
ATCATGTCTA
GTAATAGATA
GAAGGATCAA
TCAGTATCCT
GACACTATGA
AATTCCAAGT
ACTTCTCTTG
AATCGTGGGA
GATACTGTGT
TATGTAAAAG
TTTGATGCAT
AGATCTGATG
ACTACAATTA
TTGTATTGTA
AATAATATTG
GCTGACCACC
ATCATTTCCT
TAAAAACCTC
CTGGGGCAAA
ATGGTAGAAG
TGAGGCAAAA
rGTCTGAAAT rTGGAGTGCT
TTGCTATGAG
TAATAAAGGA
CACCTTGCTG
ATATTTGTTT
TCTTTCCACA
ACAGTTTGAC
ATGACTGCAA
GAGCTATAGT
TTATAAAGAC
CAGTGGGCAA
GGGAACCTAT
CAATATCTCA
AATTACTACA
TTATAGTAAT
AAGCCAAAAA
CATTCAGCAA
AATCCCAAAT
CACATCATGC
AAGTATCACA
TATGTCGCGA
ATGTCACTAC
CTTCATGTTA
AAGTGGAGCT
AGAGAGTTAC
TAAACTTCTT
CAATTCACCT
CTTAGAGATA ATGAAGAACC AAGATAAGAG TGTACAATAC TGTTATATCA TACATTGAGA GCAATAGAAA
AAACAACAAG
CAAACCATCC
ACATTAGATA
CAAAATGACC
CATATTGATC
TAACCATAAC
TTGGACACCT
TGAAGATATA
TCATCCAACC
GGAAACTCTG
GAGTGTAATG
AACTTAATTA
ACACAGTCAT
TTCCAGTCAT
AACTTACTTA
ATCTTGAATA
GATGAAAACT
AATCAATCAT
ACAACACTCT
TGGTTCAATT
AAAAGTCATG
AATCAATATG
AATCAATTTT
TGGATAAGTA
AATGTTGTGT
GAAGGCTTCT
ACAGAAGAAG
ATCTGCTCAA
TCCACAAAAG
AAACCAAAAA
TCAAGTGAAA
TATTTGGATA
AAAAACTTAT
TATACAGTAT
ATAAAACTAT
CTAATGTGTA
CTTTAGGGAG
GTAGACAAAG
TAATATCTAG
TACTTATGAC
AAAAAATAAT
AACTAGGATT
CAGTACTTAC
ATACAAATTC
TGAAAAAATT
TATATACAAA
GGTTTATATT
GTTGTATCGT
TGACATGGAA
ATTGTTTAAA
TATCACAATT
ACATAATAAA
ATCAATTTAG
GAGACTACCA
CATAACCATA
TAATGATATT
GCATGGTTGC
ACCACCAGCG
TAGATGCCAC
ATATATTAGT
TTTGATAAGG
TCTAACTGAT
TTATCTTTTT
CCCACTACTA
ATATCATAAA
ATATAAAAGT
ACGAAGAGCC
AAAGGAAAAG
AACTATAATT
AGACAAAAGT
GATGTGTTCA
ATTAAATAAC
AATAGATAAT
TTATCATAL
AGACATCAGC
TACATTAAAC
ATTTCTTTAT
AGAAGTAGAG
GAAACGATTT
GCAGACGTGC
AGCAATCCAA
ACCGGATAAA
TACATTCAAT
TTTATTAAAT
TCAACAATTT
GTCATAATGC
TTATGGGACA
AGTTATTTAA
AACGGCCCTT
GAGCATATGA
GGTGAACTGA
ATGTCCTCGT
ATAGAAATAA
GACAGAGTTA
AAAGATGATA
CACTCAGTAA
ATGCAACATC
ATATTAACAC
CAAACTTTAA
GGACTCAAAA
CTTAGCAGAT
AAAAGCTTAG
GGAGATTGTA
GGATTTATTA
TATAATAGCA
TGAAGAAGAC
AAGAGTCAAC
TATCCTTGTA
CATAAAAACA
CATATATTTG
CTCCAACATC
TTGACCATAA
AAATGGATCC
AAGGTGTTAT
ATCTTAAAA
ATCTTAAAAA
AATTAGAAGA
CTGAACAAAT
GTGATGTAAA
AGCCCAACAA
TACTTTCGGC
ATCAAAATAT
CTCCATCATG
AATATCGATC
GTGGTTTTCA
AAATCACAAC
TAAATGTTTG
GGCTGAGATG
TACTGAAATT
TGTCTTTAAT
TGCTAAATAA
AATAAAGAAC
TGTGAATGAT
GTATATCATC
TATTACAATT
ATGAAATTCA
TTAACATCCC
CGACTCTATG
CATTATTAAT
CTCTTTTTCA
TGATTACACC
ACTAACTATA
ACCAACTTAT
TGCTACAACT
GGTGTACGCC
TAATTCAGGT
TGTGGAAAAC
CACTATCAAA
GTTAATACAC
AAATGAGGTA
GTTTATTTTA
TACTACTTAC
CTTAATTACT
TGGATTCAAT
ATTTCATAAT
TCTAAACATA
CATCACAGAT
GCAGCTATTA AGGCTCAAAA GGACCTACTA TCAAGAGTAT GTCACACTTT ATTAGACAAG
ACAGTGTCTG
TTGATTAAGC
AGAATCTTTG
TGTAATGAAA
TATAGAATCA
ATTGTCCTAC
GAAATCACAG
CTGCCTAAAA
GATCTAATAT
GAACATGAAA
TTGAGAGATA
CTCAACAACT
AGAATGTTTG
ATAGCTGA7A
CTTCAAAAGA
AACTACAACA
GCATTTAGAT
CAATCTCTGT
AGACATGCAC
AGTGGATTAT
ATTGAAGCTA
CTGATAAATG
CAGACCCATG
GAGTATGCAG
CAGTTCATGA
ATAATATCAT
TTGCAGGTGA
GACATCCAAT
CTAAGTTCTA
TAAAAGGGTT
CTCTAAGATG
AAAATGATTT
AAGTGGATCT
GGACTAGTTT
AGTTGAAGTT
ATAAATTCAA
CTAATCACGT
CTATGCAACC
ATATTTTACA
TATTAGAATT
ATTATATCAG
ATGAAACATC
TCTCTTGGTT
CTCCTTTCAT
ACAGATATCA
TATCATTATT
GTGATAATCA
CACAAGCAGA
GTATAGGCCA
GCAAAACAAT
AAATGGTAAA
TAATAATCTC
GGTCGATGAA
CTTATTAAGT
TGTAAATACC
GTTAAACTAC
GATTATTTTA
TGAAATGATA
TCCTAGAAAT
CTCTGAAAGC
TGAATGCGAT
GGTATCACTA
AGGTATGTTT
ATTCTTCCCT
AAAAGCAGGA
TAAATGTTCT
ATGTATCTGC
GCATTTAACA
AAAGGATCAT
TATGGGTGGT
AGATCTAATA
GTCAATTGAT
TTATTTGTTA
TAAGCTTAAG
CCAGCACAAT
TGGATAATCC
AATAACTTGA
AGACAAGCAA
AGTCTAAGTA
TACAACAGAT
TATAAACTTA
TCAGGATTGC
ATAAATGACA
TACATGCCAT
GACAGATCGA
CTATACAATT
ACTGGTAAAG
AGGCAAATCC
GAGAGTTTGA
ATAAGCAACA
ATCATTACAG
AGTGATGTAT
ATACCTCTTG
GTTGTTAATC
ATTGAGGGCT
TCTCTCAAAG
ATAAGCAAAC
GCATTAAATA
GGAACAGAGA
GGAGTGTACT
TATTAAGTAA
GTGAGCTATA
TGGATTCTGT
CATTAAGAGG
GGCCCACCTT
ATACTTATCC
GGTTCTATCG
AAGCCATTTC
CACATATACA
GAAGAGTACT
GTGTAGTCAA
AAAGAGAGCT
AAATCTTAGC
CAAGATATGG
AGTCAAATCG
ATCTTAGCAA
TAGATGAACT
TCACAATAAT
TTAATGAGGT
GGTGTCAA
GGAAATTCTC
CAGTTAGACT
GCCTTAAATT
CCTATATATC
ATCCAGCCAG
ATTTCTTAAA
TTTTCTCTTC
AAGAATTAAC
TGCTTTCATT
AAGGAATGCT
ATCTCTACTT
TGAGTTTCAT
ACCTCCAAAA
AAATTATATA
AGAGTATTAC
TCAAAGCTAT
CAGTGTAGGT
AGAGAAAATG
TGATCTAGAG
TTATAATGAT
ATTCAATCAG
GCATGGAGTA
ATGTACATAT
TGATGAACAA
ACTGTGGACC
TATCACAGCT
TATAGAGGGT
GTTATATAAA
CCGAGATATG
TATCAAAAAA
GTCCTGAGAG
TCTATAGGCA
ATATTTAGGA
TGTAACAATA
AATCTTGATA
GGTGGTGATC
GAAGCTATAG
AAGCTCCAGG
GATAAAAATC
GAAAGGCAAG
ATAGCCCCAA
CTAAATGACA
GAAAGTTTAC
ATAACTAATA
GATATGATGA
GACAAAAGAG
AGAGAAAGAT
ACAATGGACA
AATGTTAATA
ACGCAGGAGA
GACCAAATAG
GAATTCATGG
TTGTTTCCAC
TGTGAATTCC
CCTATCAATC
AATTGCATAA
TAGGTCCATG
GCTTAACACA
ACATTTGGTT
AGCTATATTT
GCATTGATAT
CTAATTTGTT
TACATTCAGT
ATCTTCCAGA
CCAATGCCGA
CTAAAATTAC
ACAAAATATT
TTATGCAAAA
CTTTTTATAA
TACTTGAAA
GGAAAAATAT
AGTTATTAAG
CTTGGTCATT
TTAAATATAC
GTTTAACTCG
AAAAAACAAT
ATTTATTAGC
AAGAACTGAG
AITATCTAAG
CTGCATCAAT
ATGTATTAAC
GTTTTGGTCT
GATAAACACG
GGAGTTAGAA
ATACAATCAA
AGATATATTG
GGCTTTATCA
ATATCGAAGC
GTTTGTGTTG
TGATAGACTG
GTTTGTAACA
TAGTGAGATT
TTCTAAAAGT
TATAGAACCA
AGCAGAAAAA
AACATCAGCA
AACTTTACTT
TTTAGAAAAT
ATCCAATATA
AACTAGCACT
TGGTGAAAGA
GCCAGTGTAC
AAAATTAGAC
TACTGGAACA
TGTCAATTAT
ACCAGCTTAT
AGAAAAGTAT
TAGCCTGATG
ATACTTGATG
TACAGAGGAG
ATTGCTTTGC
AAAGTATTAA
TTGTATATGA
TTTTATAGGA
AGCTATTATA
AACAAATTCT
TTGATGAGGG
AATAGATTAG
GCACAACATT
ACTTACCCTC
ATAGTTAATC
ATAGATACAA
ATAAGGATAC
CTTAGTATAA
GTAGGAGTAA
ATAGCCAGTG
GGACCCACCA
AACAGACAAG
TGGGTATATG
CTTGGACTGT
TTACACCGTT
AGAACAACAA
GGAGATGAAG
TCGGTTGTGG
ATTTTAAAGT
AAAGCTTATT
AACTCCGAAA
AACACTTAAA
ATTTGCCTAT
GAACTCCAGA
CTGGTCACGA
TGACATGTGT
ATCCACAGGC
CAGTAACAGA
ATACTACCAC
ATGGATTAAG
TTATATCAGG
CTGATATTAA
TTCCACTAGA
CTGAATTAAG
CATCGCCAAG
GTATAATAAT
AGCCATGGGT
TTTTAACCAA
CATCCATAGA
CATATGAAAA
TAACAGTCAG
ATTATCATTT
ATATCGACAT
AACAkTTCAC
TAGTTTAGAA
ATGCAGTTTA
TCATGCATTA
AACTTTTTTT
GCTGTTTGGT
CTTCCTTACA
TTTACAAGAT
CATCACATTT
TTTAGGGTCT
AGTCTTAAGT
TGAGATTGAT
AGTTGTTTAT
AACAAAATCC
TAGGGCTACT
TTGTAACA7A
CAAGTATGTA
TATTATGTTC
AGAAAAATAT
AGGCTCATCC
AAAGCAAAGA
CAACAAAGAT
AGCCAAAAAG
TAGTAGACCA
TGATACTAGT
TGTGTTTCAA
AAACATATGT
CCTAATAGAA TTATTCTCAT ACCGAAGCTG AATGAGATAC ATTTGATGAA
ACCTCCTATA
TTTACAGGAG
CTACCAGATA
AAATCTGGAT
CATAATGCTT
ATGAAAGATT
ATGTTCATTA
AAAGGTTATG
TTGGAGTTAA
GTCATAAAAT
TTTAAGTTGT
GTTAACATAG
AGAATGGGGT
GAATTTTACA
CTAACAAAAC
CACCCAACCC
AACAAACCTA
AGTAAAATGC
TTGTACAATT
GCAAAATCTA
GCATCACTTT
TCCACAGGAT
AGTTGTATAG
CTTCATCCAG
CCTATTGAAT
ACCATTCCTG
ATGTTGATAT
AAATAAGTTT
CTCACATCAA
ATATTTTAAG
CAAAAGGTAT
ATTTGAATGT
GTAAAGCAAA
TAGACAGTAG
ACATAGTCAA
GGTTTTTAAA
ATTATCACCC
TAATAAATGT
CATCAAATCT
AAATAAGAAT
CAGAAACTTT
AATTTTGTAT
ATATTAAATC
TATTTCCAAT
ACCAACTTTA
ATTGCATGCT
GCAAGATCAG
CATTCATAGG
ACATAAGATA
TTCTAAGGTT
CTACAGATGC
CATCAAGTTG
AACCCAATAT
CTCTAATTTA
TACTAATTTA
TTTTGAAAAA
TTTCTTTAAT
ATTAGAATGT
CTACTGGAAA
TCAAGACACA
ACGCCTTAAT
AACACACATG
AGATAAATTA
CTTTTACATT
TGCTAATTCA
AGAAAATATG
AAGTGGAAAT
TTCCACTGTT
TGTTGTGATA
CACCACCACT
TCCTTGGCAT
TATAGAGTAT
TGAAGGAGCT
CATTTACAGA
ATACAACGGG
AACTAATAAC
AAGCAAGTGA
GTAGAATTAT
ATATTAGTAC
GCTGGACATT
GATTGGGGAG
GCTTATAAGA
GATATGAACA
TCTATGTCTA
AGTTTGCGTA
AATGCTAAAT
AAAGCTATAT
ACCATTAAAA
AGTTATAACT
GAATTAGAAG
TCATTAATTC
ACCGAATCTA
ACCACAAGAT
GACAAGATTA
TCACATCAGA
CATGTCAATA
ATTTTAAAG
GGTAACTTAT
AGTTTAAAAG
CATATAAACA
ATTCATTGGT
TACAAAAGCA
TCTTAAGTAA
ATAAAATGTC
GGATTCTGAT
AGGGGTACAT
CTTATTTGCT
CTTCAGATCT
AAGTTTTCCT
GAATAAAAGG
TTACCGTATG
TATCTTACAT
ATAAAACAA
TTTCAGACAA
ATAATTATAA
CTGTTAAAAG
TGATGATGTC
TCAATTATAG
TAGATCATTC
CATCTTTAGT
GATTTAACTT
ATCTTAAGAT
TATTACGTAC
ATTGCAATGA
TAGATTATGG
CTTATTTACA
GCACATGTTC
CAAAGCACTT
TGATTATTTT
TATTCAACTT
AACTGATCAT
ATGTTTTCAT
TCTTTGTGTT
AGAACAAAAA
CTGTCACAGT
CCCTTGGGTT
AGATTTAGTT
ATTCAATGAT
CACTCATTTG
CAAACTATAT
TAATAATAGT
AACATTCTCT
CAAACAAGAC
AGGTAATACA
AAGGAATAGT
TGTATTTAGT
TAAGGACCCC
GGTAGTAGAA
TCATAGTTTA
TGAGAATTTA
TATAAAATTT
GCAGAACCTA
AAAATTATAA
AGATGCATTT
ATTACTATAT
TTAATCCTTA
AAATTGATAC
ATCGATGCAG
ATTAAGACTT
ATAGCTGGAC
CTAAAATGGC
TACATGATAG
GAGCTCAAGA
TAGTTTAAA
TAGTTATTAA
TTTTAGTCTT
TTAGCATCTT
TTGAATGGAG
TAATTGCAAA
TAAAAACTTA
CAATAGGCCC
TTTCAAGAAC
ATATTAAAAG
CATTGTCAA
GTAATGAAGT
TAGATCATGT
AGTCCACATA
AGCTGATTAA
TATCATTAAC
AAAATATACA
AAGGGGTTAA
TGTCTGCGAT
TAAGCATGTA
ATATCATGCT
CGTGTGCCTA
TGCAAATATA
TAAAATTTC
CTTAATACCT
ATTGAAGAGT
ATTCAGCAAC
TTTAAATTTT
TCCTTACTTA
AATAACAGGT
AAGTTTGGTC
AACTTTTCAA
ATAAAAGTCT
GCTGAATTAC
AGAAAGTGCA
CAAGATGACA
GGTAGCAAGT
CTTCCTGTTT
ATTATGCCTA
TTCCTTTGTT
GTAGTTAATG
AAGCTTATAA
AGATCAGCTG
AGTGAATTGT
AGTGTGCTAT
AAATTTAGAT
TAATTTAGCA
AAAACTAACA
TTTTCTCGT
CTGTTACAGC
AGTACTGTTC
TTGATTTCAA
TAAAAGGATC
TTGATGTTGT
AAAAAACTGA
ACCCTATAAC
GAGATATATT
ACCACAAGCA
AACTTAATTA
TAAATAGTTT
ACAACCTTCC
GCTAACACAT
TATTGATTCC
ATTATACATG
CAATTGGAGT
TTCTGTAAAT
ATTAGATAAC
TGAAGTTTAC
ACAAAATGCT
CAAGGAATCT
AAAAAAAGGA
ATCATATTCT
TATGAATATC
CAATCATTTA
AACAACCAAT
CAACGAACAG
CATTATATTA
AAAATTATCA
TGCATTCACA
ACACAACGAG ACATTAGTTT
TTGACACTTT
INFORMATION FOR SEQ ID
SEQUENCE CHARACTERISTICS: LENGTH: 2166 amino acids TYPE: amino acid STRANDEDNESS: TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID
Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp 1 5 10
Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly 25
Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu 40
Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu 55
Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys 70 75
Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser 90
Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile 100 105
Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu 115 120 125
Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn 130 135 140
Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile 145 150 155 160
Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Ser 165 170 175
His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys 180 185 190
Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe 195 200 205
Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn 210 215 220
Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser 225 230 235 240
Gly Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys 245 250 255
Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp 260 265 270
Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile 275 280 285
Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly 295
Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile 310
Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu 325 330 335
Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe 340 345 350
Arg Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala 355 360 365
Ile Lys Ala Gln Lys Asp Leu Leu Ser Arg Val Cys His Thr Leu Leu 370 375 380
Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu 385 390 395 400
Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu 405 410 415
Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro 420 425 430
Met Val Asp Glu Arg Gln Ala Met Asp Ser Val Arg Ile Asn Cys Asn 435 440 445
Glu Thr Lys Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala 450 455 460
Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp 465 470 475 480
Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr 485 490 495
Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Asn Asp 500 505 510
Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro 515 520 525
Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro 530 535 540
Pro Lys Asp Leu Ile Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser 545 550 555 560
His Ile Gln Asn Tyr Ile Glu His Glu Lys Leu Lys Phe Ser Glu Ser 570 575
Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe 580 585 590
Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gln Ser Tyr Leu Asn 595 600 605
Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser 610 615 620
Val Gly Arg Met Phe Ala Met Gln Pro Gly Met Phe Arg Gln Ile Gln 625 630 635 640
Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro 645 650 655
Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gln Lys Ile Leu Glu 660 665 670
Leu Lys Ala Gly Ile Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr 675 680 685
Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe 690 695 700
Asn Gln Ala Phe Arg Tyr Glu Thr Ser Cys Ile Cys Ser Asp Val Leu 705 710 715 720
Asp Glu Leu His Gly Val Gln Ser Leu Phe Ser Trp Leu His Leu
Thr 725 730 735
Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe 740 745 750
Ile Lys Asp His Val Val Asn Leu Asn Glu Val Asp Glu Gln Ser Gly 755 760 765
Leu Tyr Arg Tyr His Met Gly Gly Ile Glu Gly Trp Cys Gln Lys Leu 770 775 780
Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly 785 790 795 800
Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp 805 810 815
Ile Ser Lys Pro Val Arg Leu Ile Glu Gly Gln Thr His Ala Gln Ala 820 825 830
Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr 835 840 845
Ala Gly Ile Gly His Lys Leu Lys Gly Thr Glu Thr Tyr Ile Ser Arg 850 855 860
Asp Met Gln Phe Met Ser Lys Thr Ile Gln His Asn Gly Val Tyr Tyr 865 870 875 880
Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr 885 890 895
Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr 900 905 910
Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe 915 920 925
Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His 930 935 940
Ala Leu Cys Asn Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys 945 950 955 960
His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Ser 965 970 975
Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu 980 985 990
Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala 995 1000 1005
Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu 1010 1015 1020
Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu 1025 1030 1035 1040
Thr Cys Val Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr 1045 1050 1055
Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile 1060 1065 1070
Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala 1075 1080 1085
Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu 1090 1095 1100
Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His 1105 1110 1115 1120
Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys 1125 1130 1135
Ile Val Asn Leu Ile
Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu 1140 1145 1150
Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met 1155 1160 1165
Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys 1170 1175 1180
Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr 1185 1190 1195 1200
Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile 1205 1210 1215
Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asp Ile Lys Tyr 1220 1225 1230
Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val 1235 1240 1245
Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly 1250 1255 1260
Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val 1265 1270 1275 1280
Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp 1285 1290 1295
Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu 1300 1305 1310
Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe 1315 1320 1325
Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser 1330 1335 1340
Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn 1345 1350 1355 1360
Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr 1365 1370 1375
Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly 1380 1385 1390
Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn 1395 1400 1405
Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro 1410 1415 1420
Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile 1425 1430 1435 1440
Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr 1445 1450 1455
Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile 1460 1465 1470
Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn 1475 1480 1485
Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile 1490 1495 1500
Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu 1505 1510 1515 1520
Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn 1525 1530 1535
Ala Tyr
Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala 1540 1545 1550
Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu 1555 1560 1565
Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu 1570 1575 1580
Gln Lys Val Ile Lys Tyr Ile Val Asn Gln Asp Thr Ser Leu Arg Arg 1585 1590 1595 1600
Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn 1605 1610 1615
Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His 1620 1625 1630
Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met 1635 1640 1645
Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe 1650 1655 1660
Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe 1665 1670 1675 1680
Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser 1685 1690 1695
Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr 1700 1705 1710
Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys 1715 1720 1725
Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr 1730 1735 1740
Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe 1745 1750 1755 1760
Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile 1765 1770 1775
Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu 1780 1785 1790
Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser 1795 1800 1805
Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val 1810 1815 1820
Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp 1825 1830 1835 1840
Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala 1845 1850 1855
Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg 1860 1865 1870
Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile 1875 1880 1885
Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu 1890 1895 1900
Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser 1905 1910 1915 1920
Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp
1925 1930 1935
Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp 1940 1945 1950
Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys 1955 1960 1965
Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu 1970 1975 1980
Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu 1985 1990 1995 2000
Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile 2005 2010 2015
Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg 2020 2025 2030
Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp 2035 2040 2045
Ala Asp Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys 2050 2055 2060
Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly 2065 2070 2075 2080
Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn 2085 2090 2095
Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His 2100 2105 2110
Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met 2115 2120 2125
Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr 2130 2135 2140
Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr 2145 2150 2155 2160
Asn Leu Pro Asn Glu Gln 2165

INFORMATION FOR SEQ ID NO:31:
SEQUENCE CHARACTERISTICS: LENGTH: 15219 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:
ACGGGAAAAA AATGCGTACT ACAAACTTGC ACATTCGAAA AAAATGGGGC AAATAAGAAC
TTGATAAGTG
ATGATAAAGG
ACATGTTATA
ACAATTAAAT
GATAACAATA
TACATATGGG
AATTGTGAAA
CAAATATCTG
TAGACATGTG
AATAAACTCA
GATTGATGAT
AAGAAATCAT
TTGATGAAAG
AAGTAGGGAG
CCATGCCTAT
AACACACTCC
CCCAACCAAA
CATTTTAGTA
TGGCTCTTAG
GCAAATACAC
AAAAACACCT
TCACAGGATT
AGATACTTAA
GTCAAGATAT
CTATTTAAGT
TTAGATTACA
CTGATAAATT
TAAACGGCAT
TTGTAGTGAA
AATTGATTGA
TCAAATTTTC
ACTTACTTGG
TTTATTACCA
CCTAATCAAT
CACAGACATG
CACACACAAA
ACAAGCTACA
TACCAAATAC
ATTTATCAAT
TATAATATAC
CCAAACTATT
ATTAAAAATA
CAAAGTCAAG
TATTCAACGT
AAACAAACTA
AATAGGTATG
AGATGCTGGA
A7LATGGAAAG
CTAACCTTTT
AAATTTATTT
AATTCTTCTG
AGTTTTTATA
ATCTAACTTT
GTTGACACAC
TAAAAGACTA
GCTTGATCTC
TTTTAGTTAA
CAAACCATGA
AGACCCCTGT
TTCATATACT
TTTACATTCT
AAAAAATACA
CACGGCGGGT
AAATATGACC
CCTCAAACAA
AAAGTAAAGC
TTGAATGATA
AGTACAGGAG
TGTGGTATGC
TTATATGCTA
TATCATGTTA
GAAATGAAAT
CAATCAGAAA
GACAATGACG
ACCAATGCAT
CATGTTATAA
ACAACAATGC
TGCTCTCAAT
AGTGACTCAG
AATTCATGAA
TATAAAAACT
GCACTACAAA
CAATGGATTC
TGATAAACAA
TAGTCAATTA
CTGAATATAA
TTCTAGAATG
TCAACCCGTG
CAGTGCTCAA
CAATAACATA
CATTAAATAA
ATAATATTGA
TATTAATCAC
TGTCCAGGTT
AAGCTAATGG
TCGAAGTATT
TGGGGTGCAA
AAGTAGCATT
TAGCCAAAGC
CAAGCAGTGA
CAATACTACA
TAAACGGTTT
TAATGACTAA
TTATGTTTAG
CATCAXAGGG
TGACAACACT
AATAATAACA
TGAATGTATT
TGAGATGAAG
TACAAAATAT
TATTGGCATT
AATTCCAACA
TAGTTAAGAA
AATTGGGGCA
GGATCAGCTG
CACTCCCAAT
TGAAGATGCA
AGGAAGGGAA
AGTAGATATA
AACATTATCA
TTCACTGAGC
GTTAAAAATA
AGCAATACAT
AGTGTGCCCT
AAATGGAGGA
AATGGATGAT
TTATATGAAT
TCTAATTCAA
AAATGGGGCA
ACTATGCAAA
TCTCTTACCA
GTAAGAAAAC
CTACTGCACA
GGCACTTTCC
AAGCCTACAA
AAAAAACCAA
GGAGCTAATC
AATACAAAGA
CTGTCATCCA
TATGATGTGC
AATCATAAAT
GACACTATAA
ACAACATATC
AGCTTGACAT
CAGAAATACA
AGATGGGAGA
TGTGTATAGC
CAGTAATTAG
TACCAAAGGA
ATGTTTTCGT
GAATCTTTGC
GAGTTTTAGC
TGGAGCAAGT
ACCATATATT
CAAGTGTGGT
CAAGAAACCA
GAGTAATAAA
AACTCAACCC
AGTCAACATG
CAAATTCCTA
TAGCATAATA
TGGCACCAAC
CTACCCAAGA
TTCTAAGTTG
CTCATATGAA
TGATGAAAAA
ACCCACTTCA
AGAAAAAATA
TAGGAATGAG
AGTCAATATT
AGTGGCTCCA
TGCACTTGTG
GAGGGCAAAC
TATAGCTAAC
GCACTTTGGC
AGGATTGTTT
CAAATCTGTA
TGTGGAAGTC
GAACAATCCA
CCTAGGCAAT
GGATCTTTAT
CTACAGTGTA
CAAAGAAGAT
GAGAAGTTTG
GAATCAATAA
TCTGTTAACT
ATCATCAATC
AAACCCCTAG
TACAAGGAAA
GAGATAAATG
TTAAGTGAAA
GCTCGCGATG
AGAGCGGAAG
GAAAGCGAAA
GAGATAGAAT
GAATATAGGC
ATAACCAAAT
AATGTCTTAA
AGTTTTTATG
ATTGCACAAT
ATGAATGCCT
AAAAATATCA
TATGAGTATG
AAAGCATCAT
GCAGCAGGTC
GATGCAGCTA
TTAGACTTAA
GATGTAGAGC
CACCTGAATT
AGGGCAAGTT
VAATAGATAT
CAACAAGTGA
TAAGCTTCAA
CAATAGAAAC
ATCAAACAAA
TATTAGGAAT
GAATAAGAGA
CATTAATGAC
AAATGGCAAA
CTAGAAAGTC
ATGATTCTCC
TAGCAGCAGG
AAAACGAAAT
AAGTGTTTGA
CATCCACAAG
ATGGTTCAGG
TGCTAGGACA
CACAGAAGTT
TGCTGTCATT
TAGGCATAAT
AAGCATATGC
CAGCAGAAGA
TTTAAGTTAA
TCATGGAGAA
CGCATCATCC
AGAAGTAACT
AGCCGACAGT
AGAAGATCTC
ATTTGATAAC
TGACAACATT
GCTCCATACA
TGCTATGGTT
CAATGATAGG
AGACACCTCA
CTACAAAAAA
AGACTGTGGG
AGACAGATCA
AAAACGATAC
AAAACACCCT
AGGGGGTAGT
GCAAGTAATG
TGCTAGTGTC
GGGAGGAGAA
AACTCAATTT
GGGAGAGTAT
AGAGCAACTC
ATTGGAAGCC
CAAAAAATAC
GATGCAAATA
AAAGATCCTA
AAAGAGAGCC
ACCCCAGAAA
ACCCCAAGTG
AATGAAGAAG
ACAGCAAGAC
TTAGTAGTTG
GGTCTAAGAG
TTAGAGGCTA
GATGAAGTGT
ATGCTAAAAG
ATGATAATAC
GGTCTTACAG
AAGGGCCTCA
CATCTTATAG
AGAGTTGAAG
CTAAGATGGG
CAGGCAGAAA
GCTGGATTCT
CCCAACTTCT
AGAGGTACAC
AAAGAAAATG
ATAAAGCATC
GGGGCAAATA
ACAAAGCTAC
AGAAGAAAGA
CGATAACATC
CAAAAGCCAA
ACAACCCTTT
AATCTAGCTA
TAGATAGAAT
CAAGTGCAGG
AAGAGATGAT
TGGCAAGACT
CTCTTAATCC
AACTTCCAAA AAATTGAGTG ACTTGTTGGA AGACAACGAT AGTGACAATG ATCTATCACT
TGATGATTTT
TCAATCCATT
CCAATCAATC
AAAAGAACAA
ATACACAGCA
AATATGGGTG
AAGCATCAAT
GATTAACTCA
TGTATCATTA
AGCATGCAGT
CATGAAGACA
GACATCAAAA
TCTGAACTCA
AATTATTCCT
CAAATATATC
GAGCATATAT
ACTAGAGGAT
ACACACTATA
CCACTCAAAA
AAATAAAACC
CATTTATACC
ACAATAGAAC
CTAATCTTTT
CACAAAGCAT
AGTTCCACCA
CCTCTCACAG
TGATCAGCGA
GAATCAACTG
AACCAATTGA
GATGGGGCAA
GCTGTTCAGT
CCTATGTTCC
ATACTAGTGA
AGAAGTGCTG
GATGAAAGAA
CTAACATGCT
TTCAACCCCA
AGAGTAATAA
CTAGAAAATA
TATGCAGGAT
AAACCACAGA
TATGTGACTA
TAAACTTAAT
TCCAAACATC
TCCAAAATCA
AAAATATGGG
GCCAATTCAA
TCACAAGCAA
TACTAATTAT
TCTGCAACAA
TTATGCTGTG
AGTCACGGTG
TCAACTCACT
CCAGACCGAA
TCAATCAGCA
ATATGGAAAC
ACAATGTTCT
AGTCATCTGT
AGCAGATCTC
TGCTGGCTCA
GCAAATTAGC
TAAAAGTAIA
CTCATGAGAT
TACCAACCTA
TAGCAACCAC
TAGTGTTAGT
GTCAATTTAT
CTAATTGGAA
TATCAACACT
ATAAACATCT
CTACCAGCCA
GTAAATAGAC
CACATATACT
ATTTTGGCCC
AATCACTATC
AACTCTTGAA
TCAAACCATA
TCGCAAAACC
CAGCAATCAA
CAAACAAACG
ACCCGACAAA
ATACGTGAAC
AGAAAAAGAT
GCCAGCAGAC
TACGCCCAAA
AATGCCTAGT
ATATGATGTA
AAGTATGTTA
CATTGCTCTA
TCTAAGATCA
CGAATTCAAA
TATCACAGTT
AGTAGATCTT
GCATACAGCT
GAATGACAGG
ACACTACACA
CTATCCGCTA
ATTAGTTAGA
ATAAATCTTA
TATTTTACAC
ATGATTGCAA
CTAGGACAGA
ATCCTGTATA
ACGCTAACCA
CAACATCAAT
TCCATCAGTA
ATTAACAATA
AAGCTTCACG
GATGATCCTG
TTGCTCATAA
GGACCTTCAC
AATTTCATCA
ACTACACCTT
ACTACAGTCA
TGTGAATTTG
ATTAGTGTCA
AATGCTATCA
ACTGACAATA
GGTGCCTACC
ACACGTTTTT
TCCACATATA
CTTCATCACA
GACCTAGAGT
GTTCAATCAA
AAATGGGAAA
TAATACATAT
CACTAAATAA
TGTACCAAAT
TACAAACAAA
TCATGGTAGC
AAAACAGACA
GAACCACCAA
TAGTAACAAA
AAGGCTCCAC
CATCACTAAC
AAGAACTTGC
TACGAGTCAC
TAAGCGCAAA
GTGAAATCAA
AAGATCTTAC
AAAATATTAT
AGAACAAGGA
CCAATGCAAA
AAGGAGCATT
TAGAAAAAGA
CAATCAAACC
TCCTCAAACT
CAAACCAATC
GCGAATAGGC
TCTTAACAAC
TACATCCATC
GATCTTAACT
GCTAAGTGAA
CAACACACAG
CAAATCCAAT
ATAGAGTAGT
TATTTAAAAA TTAACATAAT GATGAATTGT TAGTATGAGA TCAAAAACAA
CATTGGGGCA
AATGCAACCA
TGTCCAAACA
GATACTCTTA
ATCATCTAAT
ATAGCACAA
GCCATAATAI
ACAATAAAAA
AGGGTTAGTT
ACATCACCCA
ACCACTTCAA
AAAGATGATT
CAACTTTGCA
ATCAAACCCA
GCCAAAACGA
ACAGAAAGAG
ACAGTCCAAC
ACACCCACAG
TAGTTATTCA
AAACCTGGGG
TCTTGCTATT
ATCGACATGT
TAGTGTCATA
TAAAGTAAA
GCTACTTATG
TATGAACTAC
ACGAAGATTT
ATCAAAAGTT
TAGCACTATC
TCATCATCTC
ACCACACTGA
CATCCAAGCA
ATACAAAATC
CACAGACCAA
ACCATTTTGA
AATCCATCTG
CAAACAAACC
CGAAAAAAGA
ACACCAGCAC
AGCAATCCCT
CATCCGAGCC
AAAACTACAT
CAAATAACCA
AATGCATTGT
AGTGCAGTTA
ACAATAGAAT
CTTATGAAAC
CAAAACACAC
ACAATCAATA
CTAGGCTTCT
CTACACCTTG
CAAGAATCAA
TGTAATATCC
AGTTTTGGCA
TGCCAATCAC
AAAAAACATC
ACCCACAACC
AGAAACACAC
CAAGCCAAGC
AGTGTTCAAC
CAAAACAATA
AACCACCAAA
AACTACCACC
CTCACAATCC
CCTCTCAACC
CTCCACACCA
CTTAGCAGAG
TGGAGTTGAT
ACCTCACCTC
GCAGAGGTTA
TAAGTAATAT
AAGAATTAGA
CAGCTGTCAA
CCACTAAAAA
TGTTAGGTGT
AAGGAGAAGT
CGCACTGCCA
TCTTGTTTAT
ATGATAATCT
AAAGTTACAC
ACCACCTACC
ACATCACCAA
CATACAACAG
ACAAAACCAC
TTCGTTCCCT
CCAAGCAACA
ACCACAAACA
AACCCAACAA
ACTGCACTCG
ACCCCCGAAA
AACTCCACCC
AACCGTGATC
GATCCACAAG
AAGTCAGAAC
TTTTAGTGCT
AAAAGAAACC
TAAGTATAAG
CAACCGGGCC
CCTAAATGTA
GGGATCTGCA
GAACAAGATC
GGACTCTAGA AAAGACCTGG
ACAGATTAAA
CAACCTCTCT
TAACAACGGT
CTACTCAAGT
TCCACACAAG
CACAAACCA
GTCCAAAjAJA
GCAGTATATG
AACCAAAGAA
AAAGAGACCC
AAAAACTAAC
ACACAACCAC
ACACACCCAA
AAAAAACCCA
TATCAAGCAA
TCAAGTGA
ATAACTGAGG
TTAAGAACAG
AAATGCAATG
AATGCAGTAA
AGAAGAGAAG
TCAATAAGCA
ATAGCAAGTG
AAAAATGCTT
TTTAAAATCT
CATAATTGCA
CACAGTTCAA
CTCACCAGAA
TTCAGCTACA
AGGCAGAACC
TCCACCAAAA
TGGCAACAAT
GAAACCAACC
AAAAACACCA
CCTCAAGACC
ATTAAAACAC
CTCCACACAA
GCCACATGCT
GAACGAAATT
TCTTCCTAAC
AGTTTTACCA
GTTGGTATAC
GAACTGACAC
CAGAATTACA
CACCACAGTA
AGAAGAGGAA
GTATAGCTGT
TGTTGTCTAC
AAACAAAGCT
TCTCAAGAAT
CTCCAACATT
CAGAGAATTT
CAGTGAGTTA
GTCAAGCAAT
AGAAGTCCTT
GAAATTGCAC
AACAAGGACT
GGCTGACACT
ATTACCAAGT
AATTATGACA
GTCATGCTAT
ATTTTCTAAT
CACTTTATAC
AATAAATTAC
AGTCAATGAA
TAATGTAAAT
CATTGTAGTA
CACACCAGTT
ATAGACAAAA
CAACTTACAA
TACCCACATA
ATCAACCACT
AGAAATCCTT
AGTCATAATT
GTAGTCAGTT
TACATAAATA
GAAACAGTTA
AGTGTCAATG
CTATCATTAA
GTTCAGATAG
GCATATGTTG
ACATCGCCTC
GATAGAGGAT
TGTAAAGTAC
GAAGTCAGCC
TCAAAAACAG
GGTAAAACTA
GGTTGTGACT
TATGTAAACA
TATGACCCTC
AAAATCAATC
ACTGGCAAAT
TTGTTATCAT
ACACTAAGCA
AACCACCTGA
CAAATATTTC
ACTAAGCTAG
AAATCAACAC
GTAAATTTGA
ACTTTGAATG
TATCAAATGG
ACCAATTATT
TAGAATTCCA
CAGGTGTAAC
TCAATGATAT
TAAGGCAACA
TACAGCTGCC
TATGCACTAC
GGTATTGTGA
AGTCCAATCG
TTTGTAACAC
ACATAAGCAG
AATGCACTGC
ATGTGTCAAA
AGCTGGAAGG
TAGTGTTTCC
AAAGTTTAGC
CTACTACAAA
TAATAGCTAT
AAGACCAACT
TCATGTTTCA
AACATCACAG
ATCCTTAACT
ATCATTCACA
GATTAGAGGT
GCCTCCTCAT
GGTCAGTGTT
ACCCATAGTA
GCAGAAGAAC
AACACCTTTA
GCCTATAACA
AAGTTATTCC
TATCTATGGT
CAACATCAAA
TAATGCAGGA
AGTATTTTGT
TGACATATTC
CTCAGTAATT
ATCCAACAAA
CAAAGGAGTA
CAAGAACCTT
TTCTGATGAG
TTTTATTCGT
TATTATGATA
TGGTTTACTG
AAGTGGAATC
ACAACAATCT
TACAGGCTGA
TATAGTTACA
AAATTAACAG
CATTGCTTGA
GCATTACTAG
TTAACCAGCA
AATCAACAGA
AGCAGATTGT
AGCACTTACA
AATGATCAGA
ATCATGTCTA
GTAATAGATA
GAAGGATCAA
TCAGTATCCT
GACACTATGA
AATTCCAAGT
ACTTCTCTTG
AATCGTGGGA
GATACTGTGT
TATGTAAAAG
TTTGATGCAT
AGATCTGATG
ACTACAATTA
TTGTATTGTA
AATAATATTG
GCTGACCACC
ATCATTTCCT
TAAAAACCTC
CTGGGGCA;A
ATGGTAGAAG
TGAGGCAAAA
AAGTGTTAGA
GCTGTCGCAT
TGGAAATCAC
TGTTGACAAA
AAAAATTAAT
TAATAAAGGA
CACCTTGCTG
ATATTTGTTT
TCTTTCCACA
ACAGTTTGAC
ATGACTGCAA
GAGCTATAGT
TTATAAAGAC
CAGTGGGCAA
GGGAACCTAT
CAATATCTCA
AATTACTACA
TTATAGTAAT
AAGCCAAAAA
CATTCAGCAA
AATCCCAAAT
CACATCATGC
AAGTATCACA
TATGTCGCGA
ATGTCACTAC
CTTCATGTTA
6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740
AACAAGATAC
GCTGAACTGG
ATAGGATCTA
ATTGAGATCA
AAGATAAGAG
CAAACCATCC
ACATTAGATA
CAAAATGACC
CATATTGATC
TAACCATAAC
TTGGACACCT
TGAAGATATA
TCATCCAACC
GGAAACTCTG
GAGTGTAATG
AACTTAATTA
ACACAGTCAT
TTCCAGTCAT
AACTTACTTA
ATCTTGAATA
GATGAAAACT
AATCAATCAT
ACAACACTCT
TGGTTCAATT
AAAAGTCATG
AATCAATATG
TCAAGTCAAT
ATAGAACAGA
TAAACAACAT
ATAGTGATGA
TGTACAATAC
ATCTGCTCAA
TCCACAAAAG
AAACCAAAAA
TCAAGTGAAA
TATTTGGATA
AAAAACTTAT
TATACAGTAT
ATAAAACTAT
CTAATGTGTA
CTTTAGGGAG
GTAGACAAAG
TAATATCTAG
TACTTATGAC
AAAAAATAAT
AACTAGGATT
CAGTACTTAC
ATACAAATTC
TGAAAAAATT
TATATACAAA
GGTTTATATT
GTTGTATCGT
GGACAAAAGC
AGAATATGCT
AACAAAACAA
CATTAAAAAG
TGTTATATCA
GAGACTACCA
CATAACCATA
TAATGATATT
GCATGGTTGC
ACCACCAGCG
TAGATGCCAC
ATATATTAGT
TTTGATAAGG
TCTAACTGAT
TTATCTTTTT
CCCACTACTA
ATATCATAAA
ATATAAAAGT
ACGAAGAGCC
AAAGGAAAAG
AACCATAATT
AGACAAAAGT
GATGTGTTCA
ATTAAATAAC
AATAGATAAT
TTATCATAAA
ATAGACACTT
CTTGGTATAG
TCAGCATGTG
CTTAGAGATA
TACATTGAGA
GCAGACGTGC
AGCAATCCAA
ACCGGATAAA
TACATTCAAT
TTTATTAAAT
TCAACAATTT
GTCATAATGC
TTATGGGACA
AGTTATTTAA
AACGGCCCTT
GAGCATATGA
GGTGAACTGA
ATGTCCTCGT
ATAGAAATAA
GACAGAGTTA
AAAGATGATA
CACTCAGTAA
ATGCAACATC
ATATTAACAC
CAAACTTTAA
GGACTCAAAA
TGTCTGAAAT
TTGGAGTGCT
TTGCTATGAG
ATGAAGAACC
GCAATAGAAA
TGAAGAAGAC
AAGAGTCAAC
TATCCTTGTA
CATAAAAACA
CATATATTTG
CTCCAACATC
TTGACCATAA
AAATGGATCC
AAGGTGTTAT
ATCTTAAAAA
ATCTTAAAAA
AATTAGAAGA
CTGAACAAAT
GTGATGTAAA
AGCCCAACAA
TACTTTCGGC
ATCAAAATAT
CTCCATCATG
AATATCGATC
GTGGTTTTCA
AAATCACAAC
AAGTGGAGCT
AGAGAGTTAC
TAAACTTCTT
CAATTCACCT
AAACAACAAG
AATAAAGAAC
TGTGAATGAT
GTATATCATC
TATTACAATT
ATGAAATTCA
TTAACATCCC
CGACTCTATG
CATTATTAAT
CTCTTTTTCA
TGATTACACC
ACTAACTATA
ACCAACTTAT
TGCTACAACT
GGTGTACGCC
TAATTCAGGT
TGTGGAAAAC
CACTATCAAA
GTTAATACAC
AAATGAGGTA
GTTTATTTTA
TACTACTTAC
7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300
AATCAATTTT
TGGATAAGTA
AATGTTGTGT
GAAGGCTTCT
ACAGAAGAAG
GCAGCTATTA
ACAGTGTCTG
TTGATTAAGC
AGAATCTTTG
TGTAATGAAA
TATAGAATCA
ATTGTCCTAC
GAAATCACAG
CTGCCTAAAA
GATCTAATAT
GAACATGAAA
TTGAGAGATA
CTCAACAACT
AGAATGTTTG
ATAGCTGAAA
CTTCAAAAGA
AACTACAACA
GCATTTAGAT
CAATCTCTGT
AGACATGCAC
TGACATGGAA
ATTGTTTAAA
TATCACAATT
ACATAATAAA
ATCAATTTAA
AGGCTCAAAA
ATAATATCAT
TTGCAGGTGA
GACATCCAAT
CTAAGTTCTA
TAAAAGGGTT
CTCTAAGATG
AAAATGATTT
AAGTGGATCT
GGACTAGTTT
AGTTGAAGTT
ATAAATTCAA
CTAATCACGT
CTATGCAACC
ATATTTTACA
TATTAGAATT
ATTATATCAG
ATGAAACATC
TCTCTTGGTT
CTCCT TCAT
AGACATCAGC
TACATTAAAC
ATTTCTTTAT
AGAAGTAGAG
GAAACGATTT
GGACCTACTA
AAATGGTAAA
TAATAATCTC
GGTCGATGAA
CTTATTAAGT
TGTAAATACC
GTTAAACTAC
GATTATTTTA
TGAAATGATA
TCCTAGAAAT
CTCTGAAAGC
TGAATGCGAT
GGTATCACTA
AGGTATGTTT
ATTCTTCCCT
AAAAGCAGGA
TAAATGTTCT
ATGTATCTGC
GCATTTAACA
AAAGGATCAT
CTTAGCAGAT
AAAAGCTTAG
GGAGATTGTA
GGATTTATTA
TATAATAGCA
TCAAGAGTAT
TGGATAATCC
AATAACTTGA
AGACAAGCAA
AGTCTAAGTA
TACAACAGAT
TATAAACTTA
TCAGGATTGC
ATAAATGACA
TACATGCCAT
GACAGATCGA
CTATACAATT
ACTGGTAAAG
AGGCAAATCC
GAGAGTTTGA
ATAAGCAACA
ATCATTACAG
AGTGATGTAT
ATACCTCTTG
GTTGTTAATC
TAAATGTTTG
GGCTGAGATG
TACTGAAATT
TGTCTTTAAT
TGCTAAATAA
GTCACACTTT
TATTAAGTAA
GTGAGCTATA
TGGATTCTGT
CATTAAGACG
GGCCCACCTT
ATACTTATCC
GGTTCTATCG
AAGCCATTTC
CACATATACA
GAAGAGTACT
GTGTAGTCAA
AAAGAGAGCT
AAATCTTAGC
CAAGATATGG
AGTCAAATCG
ATCTTAGCAA
TAGATGAACT
TCACAATAAT
TTAATGAGGT
CTTAATTACT
TGGATTCAAT
ATTTCATAAT
TCTAAACATA
CATCACAGAT
ATTAGACAAG
ATTTCTTAAA
TTTTCTCTTC
AAGAATTAAC
TGCTTTCATT
AAGGAATGCT
ATCTCTACTT
TGAGTTTCAT
ACCTCCAAAA
AAATTATATA
AGAGTATTAC
TCAAAGCTAT
CAGTGTAGGT
AGAGAAAATG
TGATCTAGAG
TTATAATGAT
ATTCAATCAG
GCATGGAGTA
ATGTACATAT
TGATGAACAA
9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 AGTGGATTAT ACAGATATCA TATGGGTGGT ATTGAGGGCT GGTGTCAAA ACTGTGGACC
ATTGAAGCTA
CTGATAAATG
CAGACCCATG
GAGTATGCAG
CAGTTCATGA
GTCCTGAGAG
TCTATAGGCA
ATATTTAGGA
TGTAACAATA
AATCTTGATA
GGTGGTGATC
GAAGCTATAG
AAGCTCCAGG
GATAAAKATC
GAAAGGCAAG
ATAGCCCCAA
CTAAATGACA
GAAAGTTTAC
ATAACTAATA
GATATGATGA
GACAAAAGAG
AGAGAAAGAT
ACAATGAACA
AATGTTAATA
ACGCAGGAGA
GACCAAATAG
TATCATTATT
GTGATAATCA
CACAAGCAGA
GTATAGGCCA
GCAAAACAAT
TAGGTCCATG
GCTTAACACA
ACATTTGGTT
AGCTATATTT
GCATTGATAT
CTAATTTGTT
TACATTCAGT
ATCTTCCAGA
CCAATGCCGA
CTAAAATTAC
ACAAAATATT
TTATGCAA;A
CTTTTTAT;A
TACTTGAA;A
GGAAAAATAT
AGTTATTAAG
CTTGGTCATT
TTAAATATAC
GTTTAACTCG
AAAAAACAAT
ATTTATTAGC
AGATCTAATA
GTCAATTGAT
TTATTTGTTA
TAAGCTTAAG
CCAGCACAAT
GATAAACACG
GGAGTTAGAA
ATACAATCAA
AGATATATTG
GGCTTTATCA
ATATCGAAGC
GTTTGTGTTG
TGATAGACTG
GTTTGTAACA
TAGTGAGATT
TTCTAAAAGT
TATAGAACCA
AGCAGAAAAA
AACATCAGCA
AACTTTACTT
TTTAGAAAAT
ATCCAATATA
AACTAGCACT
TGGTGAAAGA
GCCAGTGTAC
AAAATTAGAC
TCTCTCAAAG
ATAAGCAAAC
GCATTAATA
GGAACAGAGA
GGAGTGTACT
ATACTTGATG
TACAGAGGAG
ATTGCTTTGC
AAAGTATTAA
TTGTATATGA
TTTTATAGGA
AGCTATTATA
AACAAATTCT
TTGATGAGGG
AATAGATTAG
GCACAACATT
ACTTACCCTC
ATAGTTAATC
ATAGATACAA
ATAAGGATAC
CTTAGTATAA
GTAGGAGTA
ATAGCCAGTG
GGACCCACCA
AACAGACA-G
TGGGTATATG
GGAAATTCTC
CAGTTAGACT
GCCTTAAATT
CCTATATATC
ATCCAGCCAG
ATTTTAAAGT
AAAGCTTATT
AACTCCGAAA
AACACTTAAA
ATTTGCCTAT
GAACTCCAGA
CTGGTCACGA
TGACATGTGT
ATCCACAGGC
CAGTAACAGA
ATACTACCAC
ATGGATTAAG
TTATATCAGG
CTGATATTAA
TTCCACTAGA
CTGAATTAAG
CATCGCCAAG
GTATAATAAT
AGCCATGGGT
TTTTAACCAA
CATCCATAGA
TATCACAGCT
TATAGAGGGT
GTTATATAAA
CCGAGATATG
TATCAAAAAA
TAGTTTAGAA
ATGCAGTTTA
TCATGCATTA
AACTTTTTTT
GCTGTTTGGT
CTTCCTTACA
TTTACAAGAT
CATCACATTT
TTTAGGGTCT
AGTCTTAAGT
TGAGATTGAT
AGTTGTTTAT
AACAAAATCC
TAGGGCTACT
TTGTAACAIA
CAAGTATGTA
TATTATGTTC
AGAAAAATAT
AGGCTCATCC
AAAGCAAAGA
CAACAAAGAT
10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420
GAATTCATGG
TTGTTTCCAC
TGTGAATTCC
CCTATCAATC
AATTGCATAA
CCTAATAGAA
TTTACAGGAG
CTACCAGATA
AAATCTGGAT
CATAATGCTT
ATGAAAGATT
ATGTTCATTA
AAAGGTTATG
TTGGAGTTAA
GTCATAAAAT
TTTAAGTTGT
GTTAACATAG
AGAATGGGGT
GAATTTTACA
CTAACAAAAC
CACCCAACCC
AACAAACCTA
AGTAAAATGC
TTGTACAATT
GCAAAATCTA
GCATCACTTT
AAGAACTGAG
AATATCTAAG
CTGCATCAAT
ATGTATTAAC
GTTTTGGTCT
TTATTCTCAT
ATGTTGATAT
AAATAAGTTT
CTCACATCAA
ATATTTTAAG
CAAAAGGTAT
ATTTGAATGT
GTAAAGCAAA
TAGACAGTAG
ACATAGTCAA
GGTTTTTAAA
ATTATCACCC
TAATAAATGT
CATCAAATCT
AAATAAGAAT
CAGAAACTTT
AATTTTGTAT
ATATTAAATC
TATTTCCAAT
ACCAACTTTA
ATTGCATGCT
TACTGGAACA
TGTCAATTAT
ACCAGCTTAT
AGAAAAGTAT
TAGCCTGATG
ACCGAAGCTG
CATCAAGTTG
AACCCAATAT
CTCTAATTTA
TACTAATTTA
TTTTGAAAAA
TTTCTTTAAT
ATTAGAATGT
CTACTGGAAA
TCAAGACACA
ACGCCTTAAT
AACACACATG
AGATAAATTA
CTTTTACATT
TGCTAATTCA
AGAAAATATG
AAGTGGAAAT
TTCCACTGTT
TGTTGTGATA
CACCACCACT
TCCTTGGCAT
CTTGGACTGT
TTACACCGTT
AGAACAACAA
GGAGATGAAG
TCGGTTGTGG
AATGAGATAC
AAGCAAGTGA
GTAGAATTAT
ATATTAGTAC
GCTGGACATT
GATTGGGGAG
GCTTATAAGA
GATATGAACA
TCTATGTCTA
AGTTTGCGTA
AATGCTAAAT
AAAGCTATAT
ACCATTAAAA
AGTTATAACT
GAATTAGAAG
TCATTAATTC
ACCGAATCTA
ACCACAAGAT
GACAAGATTA
TCACATCAGA
CATGTCAATA
CATATGAAAA
TAACAGTCAG
ATTATCATTT
ATATCGACAT
AACAATTCAC
ATTTGATGAA
TACAAAAGCA
TCTTAAGTAA
ATAAAATGTC
GGATTCTGAT
AGGGGTACAT
CTTATTTGCT
CTTCAGATCT
AAGTTTTCCT
GAATAAAAGG
TTACCGTATG
TATCTTACAT
ATAAAAACAA
TTTCAGACAA
ATAATTATAA
CTGTTAAAAG
TGATGATGTC
TCAATTATAG
TAGATCATTC
CATCTTTAGT
GATTTAACTT
AGCCAAAAAG
TAGTAGACCA
TGATACTAGT
TGTGTTTCAA
AAACATATGT
ACCTCCTATA
GCACATGTTC
CAAAGCACTT
TGATTATTTT
TATTCAACTT
AACTGATCAT
ATGTTTTCAT
TCTTTGTGTT
AGAACAAAAA
CTGTCACAGT
CCCTTGGGTT
AGATTTAGTT
ATTCAATGAT
CACTCATTTG
CAAACTATAT
TAATAATAGT
AACATTCTCT
CAAACAAGAC
AGGTAATACA
AAGGAATAGT
TGTATTTAGT
12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980
TCCACAGGAT
AGTTGTATAG
CTTCATCCAG
CCTATTGAAT
ACCATTCCTG
GCAGAACCTA
AAAATTATAA
AGATGCATTT
ATTACTATAT
TTAATCCTTA
AAATTGATAC
ATCGATGCAA
ATTAAGACTT
ATAGCTGGAC
CTAAAATGGC
TACATGATAG
GAGCTCAAGA
TAGTTTAAAA
TAGTTATTAA
TTTTAGTCTT
GCAAGATCAG
CATTCATAGG
ACATAAGATA
TTCTAAGGTT
CTACAGATGC
TTAGCATCTT
TTGAATGGAG
TAATTGCAAA
TAAAAACTTA
CAATAGGCCC
TTTCAAGAAC
ATATTAAAAG
CATTGTCAAA
GTAATGAAGT
TAGATCATGT
AGTCCACATA
AGCTGATTAA
TATCATTAAC
AGAATATACA
AAGGGGTTAA
TATAGAGTAT
TGAAGGAGCT
CATTTACAGA
ATACAACGGG
AACTAATAAC
TGTCTGCGAT
TAAGCATGTA
ATATCATGCT
CGTGTGCCTA
TGCAAATATA
TAAAAATTTC
CTTAATACCT
ATTGAAGAGT
ATTCAGCAAC
TTTAAATTTT
TCCTTACTTA
AATAACAGGT
AAGTTTGGTC
AACTTTTCAA
ATAAAAGTCT
ATTTTAAAAG
GGTAACTTAT
AGTTTAAAAG
CATATAAACA
ATTCATTGGT
GCTGAATTAC
AGAAAGTGCA
CAAGATGACA
GGTAGCAAGT
CTTCCTGTTT
ATTATGCCTA
TTCCTTTGTT
GTAGTTAATG
AAGCTTATAA
AGATCAGCTG
AGTGAATTGT
AGTGTGCTAT
AAATTTAGAT
TAATTTAGCA
AAAACTAACA
TTTTCTCGT
ATCTTAAGAT
TATTACGTAC
ATTGCAATGA
TAGATTATGG
CTTATTTACA
CTGTTACAGC
AGTACTGTTC
TTGATTTCAA
TAAAAGGATC
TTGATGTTGT
AAAAAACTGA
ACCCTATAAC
GAGATATATT
ACCACAAGCA
AACTTAATTA
TAAATAGTTT
ACAACCTTCC
GCTAACACAT
TATTGATTCC
ATTATACATG
TAAGGACCCC
GGTAGTAGAA
TCATAGTTTA
TGAGAATTTA
TATAAAATTT
CAATTGGAGT
TTCTGTAAAT
ATTAGATAAC
TGAAGTTTAC
ACAAAATGCT
CAAGGAATCT
AAAAAXAGGA
ATCATATTCT
TATGAATATC
CAATCATTTA
AACAACCAAT
CAACGAACAG
CATTATATTA
AAAATTATCA
TGCATTCACA
14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15219 ACACAACGAG ACATTAGTTT TTGACACTTT
INFORMATION FOR SEQ ID NO:32:
SEQUENCE CHARACTERISTICS:
LENGTH: 2166 amino acids
TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:
Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp
1
Ser Ser Ile Thr Leu Met Ile Asn Ser 145 Leu His Leu Asn Glu Tyr Tyr Ser Ile Glu Ser Arg Lys 130 Gly Ser Ser Met Leu 210 Val Leu Leu Arg Thr Glu Ser Arg 115 Leu Asp Ala Val Cys 195 Tyr Lys Lys Phe Gln Gln Pro Ser 100 Ala Gly Glu Val Asn 180 Ser Thr Ser 5 Gly Asn Ser Ser Thr Glu Ile Leu Asn Glu 165 Gln Met Lys His Val Gly Pro Leu 70 Tyr Gln Glu Lys Ser 150 Asn Asn Gln Leu Gly Ile Pro Leu 55 Ile Phe Ile Ile Glu 135 Val Asn Ile His Asn 215 Phe Ser Tyr 40 Leu Ser Gln Ala Ser 120 Lys Leu Gln Thr Pro 200 Asn Ile Phe 25 Leu Glu Arg Ser Thr 105 Asp Asp Thr Ser Ile 185 Pro Ile Leu 10 Ser Lys His Tyr Leu 90 Thr Val Arg Thr Tyr 170 Lys Ser Leu Ile Glu Asn Met His 75 Leu Asn Lys Val Ile 155 Thr Thr Trp Thr Asp 235 Cys Asp Asn Lys Met Leu Val Lys 140 Ile Asn Thr Leu Gln 220 Asn Asn Tyr Leu Gly Thr Leu Tyr 125 Pro Lys Ser Leu Ile 205 Tyr Gln Ala Thr Lys Glu Tyr Lys 110 Ala Asn Asp Asp Leu 190 His Arg Thr Leu Asn Lys Leu Lys Lys Ile Asn Asp Lys 175 Lys Trp Ser Leu Gly Leu Leu Lys Ser Ile Leu Asn Ile 160 Ser Lys Phe Asn Ser 240 225 230
Gly Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys
245 250 255
Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp
260 265 270
Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile
275 280 285
Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly
290 295 300
Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile
305 310 315 320
Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu
325 330 335
Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe
340 345 350
Lys Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala
355 360 365
Ile Lys Ala Gln Lys Asp Leu Leu Ser Arg Val Cys His Thr Leu Leu
370 375 380
Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu
385 390 395 400
Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu
405 410 415
Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro
420 425 430
Met Val Asp Glu Arg Gln Ala Met Asp Ser Val Arg Ile Asn Cys Asn
435 440 445
Glu Thr Lys Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala
450 455 460
Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp
465 470 475 480
Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr
485 490 495
Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Asn Asp
500 505 510
Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro
515 520 525
Lys Lys 530 Pro Lys 545 His Ile Asp Arg Asn Glu Asn Ser 610 Val Gly 625 Ile Leu Glu Ser Leu Lys Asn Asn 690 Asn Gln 705 Asp Glu Ile Pro Ile Lys Leu Tyr 770 Asp Gln Ser Cys 595 Asn Arg Ala Leu Ala 675 Tyr Ala Leu Leu Asp 755 Arg Leu Asn Arg 580 Asp His Met Glu Thr 660 Gly Ile Phe His Val 740 His Tyr Ile Trp 550 Tyr Ile 565 Arg Val Leu Tyr Val Val Phe Ala 630 Lys Met 645 Arg Tyr Ile Ser Ser Lys Arg Tyr 710 Gly Val 725 Thr Ile Val Val His Met Thr Glu Leu Asn Ser 615 Met Ile Gly Asn Cys 695 Glu Gln Ile Asn Gly 775 Ser His Glu Cys 600 Leu Gln Ala Asp Lys 680 Ser Thr Ser Cys Leu 760 Gly Phe Glu Tyr 585 Val Thr Pro Glu Leu 665 Ser Ile Ser Leu Thr 745 Asn Ile Pro Lys 570 Tyr Val Gly Gly Asn 650 Glu Asn Ile Cys Phe 730 Tyr Glu Glu Arg 555 Leu Leu Asn Lys Met 635 Ile Leu Arg Thr Ile 715 Ser Arg Val Gly Asn Lys Arg Gln Glu 620 Phe Leu Gln Tyr Asp 700 Cys Trp His Asp Trp 780 Tyr Phe Asp Ser 605 Arg Arg Gln Lys Asn 685 Leu Ser Leu Ala Glu 765 Cys Met Ser Asn 590 Tyr Glu Gln Phe Ile 670 Asp Ser Asp His Pro 750 Gln Gln Pro Glu 575 Lys Leu Leu Ile Phe 655 Leu Asn Lys Val Leu 735 Pro Ser Lys Ser 560 Ser Phe Asn Ser Gln 640 Pro Glu Tyr Phe Leu 720 Thr Phe Gly Leu Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro 535 540
Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly
785 790 795 800
Lys Phe Ser Ile Ser Lys Asp Tyr Leu 835 Ala Gly Ile 850 Asp Met Gln 865 Pro Ala Ser Ile Leu Asp Gln Glu Leu 915 Arg Asn Ile 930 Ala Leu Cys 945 His Leu Lys Leu Tyr Met Leu Tyr Arg 995 Ile Val His 1010 Gln Asp Lys 1025 Thr Cys Val Leu Met Arg Ile Pro 820 Leu Gly Phe Ile Asp 900 Glu Trp Asn Thr Asn 980 Ser Ser Leu Ile Asp Thr Ala Leu Ile Asn 805 Val Arg Leu Ile Glu 825 Ala Leu Asn Ser Leu 840 His Lys Leu Lys Gly 855 Met Ser Lys Thr Ile 870 Lys Lys Val Leu Arg 885 Phe Lys Val Ser Leu 905 Tyr Arg Gly Glu Ser 920 Leu Tyr Asn Gln Ile 935 Asn Lys Leu Tyr Leu 950 Phe Phe Asn Leu Asp 965 Leu Pro Met Leu Phe 985 Phe Tyr Arg Arg Thr 1000 Val Phe Val Leu Ser 1015 Gln Asp Leu Pro Asp 1030 Thr Phe Asp Lys Asn 1045 Pro Gln Ala Leu Gly Gly Asp Asn Gln Ser Ile Asp 810 815 Gly Gln Thr His Ala Gln Ala 830 Lys Leu Leu Tyr Lys Glu Tyr 845 Thr Glu Thr Tyr Ile Ser Arg 860 Gln His Asn Gly Val Tyr Tyr 875 880 Val Gly Pro Trp Ile Asn Thr 890 895 Glu Ser Ile Gly Ser Leu Thr 910 Leu Leu Cys Ser Leu Ile Phe 925 Ala Leu Gln Leu Arg Asn His 940 Asp Ile Leu Lys Val Leu Lys 955 960 Ser Ile Asp Met Ala Leu Ser 970 975 Gly Gly Gly Asp Pro Asn Leu 990 Pro Asp Phe Leu Thr Glu Ala 1005 Tyr Tyr Thr Gly His Asp Leu 1020 Asp Arg Leu Asn Lys Phe Leu 1035 1040 Pro Asn Ala Glu Phe Val Thr 1050 1055 Ser Glu Arg Gln Ala Lys Ile 1070 Thr Glu Val Leu Ser Ile Ala 1060 Thr Ser Glu Ile Asn Arg 1065 Leu Ala Val 1075 1080 1085
Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu
1090 1095 1100
Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His
1105 1110 1115 1120
Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys
1125 1130 1135
Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu
1140 1145 1150
Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met
1155 1160 1165
Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys
1170 1175 1180
Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr
1185 1190 1195 1200
Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile
1205 1210 1215
Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asn Ile Lys Tyr
1220 1225 1230
Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val
1235 1240 1245
Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly
1250 1255 1260
Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val
1265 1270 1275 1280
Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp
1285 1290 1295
Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu
1300 1305 1310
Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe
1315 1320 1325
Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser
1330 1335 1340
Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn
1345 1350 1355 1360
Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr
1365 1370 1375
Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly
1380 1385 1390
Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn
1395 1400 1405
Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro
1410 1415 1420
Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile
1425 1430 1435 1440
Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr
1445 1450 1455
Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile
1460 1465 1470
Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn
1475 1480 1485
Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile
1490 1495 1500
Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu
1505 1510 1515 1520
Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn
1525 1530 1535
Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala
1540 1545 1550
Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu
1555 1560 1565
Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu
1570 1575 1580
Gln Lys Val Ile Lys Tyr Ile Val Asn Gln Asp Thr Ser Leu Arg Arg
1585 1590 1595 1600
Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn
1605 1610 1615
Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His
1620 1625 1630
Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met
1635 1640 1645
Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe
1650 1655 1660
Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe
1665 1670 1675 1680
Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser
1685 1690 1695
Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr
1700 1705 1710
Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys
1715 1720 1725
Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr
1730 1735 1740
Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe
1745 1750 1755 1760
Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile
1765 1770 1775
Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu
1780 1785 1790
Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser
1795 1800 1805
Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val
1810 1815 1820
Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp
1825 1830 1835 1840
Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala
1845 1850 1855
Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg
1860 1865 1870
Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile
1875 1880 1885
Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asi Ile Asp Tyr Gly Glu
1890 1895 1900
Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser
1905 1910 1915 1920
Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp
1925 1930 1935
Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp
1940 1945 1950
Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys
1955 1960 1965
Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu
1970 1975 1980
Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu
1985 1990 1995 2000
Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile
2005 2010 2015
Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg
2020 2025 2030
Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp
2035 2040 2045
Ala Asn Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys
2050 2055 2060
Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly
2065 2070 2075 2080
Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn
2085 2090 2095
Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His
2100 2105 2110
Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met
2115 2120 2125
Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr
2130 2135 2140
Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr
2145 2150 2155 2160
Asn Leu Pro Asn Glu Gln
2165
INFORMATION FOR SEQ ID NO:33:
SEQUENCE CHARACTERISTICS:
LENGTH: 15219 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: RNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:
ACGGGAAAAA AATGCGTACT ACAAACTTGC ACATTCGAA AAAATGGGGC
AAATAAGAAC
TTGATAAGTG
ATGATAAAGG
ACATGTTATA
ACAATTAAAT
GATAACAATA
TACATATGGG
AATTGTGAAA
CAAATATCTG
TAGACATGTG
AATAAACTCA
GATTGATGAT
AAGAAATCAT
TTGATGAAAG
AAGTAGGGAG
CCATGCCTAT
AACACACTCC
CCCAACCAAA
CATTTTAGTA
CTATTTAAGT
TTAGATTACA
CTGATAAATT
TAAACGGCAT
TTGTAGTGAA
AATTGATTGA
TCAAATTTTC
ACTTACTTGG
TTTATTACCA
CCTAATCAAT
CACAGACATG
CACACACAAA
ACAAGCTACA
TACCAAATAC
ATTTATCAAT
TATAATATAC
CCAAACTATT
ATTAAAAATA
CTAACCTTTT
AAATTTATTT
AATTCTTCTG
AGTTTTTATA
ATCTAACTTT
GTTGACACAC
TAAAAGACTA
GCTTGATCTC
TTTTAGTTAA
CAAACCATGA
AGACCCCTGT
TTCATATACT
TTTACATTCT
AAAAAATACA
CACGGCGGGT
AAATATGACC
CCTCAAACAA
AAAGTAAAGC
CAATCAGAAA
GACAATGACG
ACCAATGCAT
CATGTTATAA
ACAACAATGC
TGCTCTCAAT
AGTGACTCAG
AATTCATGAA
TATAAAAACT
GCACTACAAA
CAATGGATTC
TGATAAACA
TAGTCAATTA
CTGAATATA
TTCTAGAATG
TCAACCCGTG
CAGTGCTCAA
TGGGGTGCAA
AAGTAGCATT
TAGCCAAAGC
CAAGCAGTGA
CAATACTACA
TAAACGGTTT
TAATGACTAA
TTATGTTTAG
CATCAAAGGG
TGACAACACT
AATAATAACA
TGAATGTATT
TGAGATGAAG
TACAAAATAT
TATTGGCATT
AATTCCAACA
TAGTTAAGAA
TTCACTGAGC
GTTAAAAATA
AGCAATACAT
AGTGTGCCCT
AAATGGAGGA
AATGGATGAT
TTATATGAAT
TCTAATTCAA
AAATGGGGCA
ACTATGCAAA
TCTCTTACCA
GTAAGAAAAC
CTACTGCACA
GGCACTTTCC
AAGCCTACAA
AAAAAACCAA
GGAGCTAATC
AATACAAAGA
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 CAATAACATA AATTGGGGCA TGGCTCTTAG CAAAGTCAAG TTGAATGATA CATTAAATAA GGATCAGCTG
CTGTCATCCA
GCAAATACAC
AAAAACACCT
TCACAGGATT
AGATACTTAA
GTCAAGATAT
CAGAAATACA
TATTCAACGT
AAACAAACTA
AATAGGTATG
AGATGCTGGA
AAATGGAAAG
AGTCAATATT
AGATGGGAGA
AGTGGCTCCA
TGTGTATAGC
CAGTAATTAG
TACCAAAGGA
ATGTTTTCGT
GAATCTTTGC
GAGTTTTAGC
TGGAGCAAGT
ACCATATATT
CAAGTGTGGT
CAAGAAACCA
GAGTAATAAA
AACTCAACCC
AGTCAACATG
CAAATTCCTA
TAGCATAATA
TGGCACCC
CTACCCAAGA
TTCTAAGTTG
TGCACTTGTG
GAGGGCAAAC
TATAGCTAAC
GCACTTTGGC
AGGATTGTTT
CAAATCTGTA
TGTGGAAGTC
GAACAATCCA
CCTAGGCAAT
GGATCTTTAT
CTACAGTGTA
CAAAGAAGAT
GAGAAGTTTG
GAATCAATA
TCTGTTAACT
ATCATCAATC
AAACCCCTAG
TACAAGGAAA
AGTACAGGAG
TGTGGTATGC
TTATATGCTA
TATCATGTTA
GAAATGAAAT
GAGATAGAAT
GAATATAGGC
ATAACCAAAT
AATGTCTTAA
AGTTTTTATG
ATTGCACAAT
ATGAATGCCT
AAAAATATCA
TATGAGTATG
AAAGCATCAT
GCAGCAGGTC
GATGCAGCTA
TTAGACTTAA
SATGTAGAGC
CACCTGAATT
RGGGCAAGTT
2AATAGATAT NAACAAGTGA2 rAAGCTTCAA I MATAGAAAC I
ATAATATTGA
TATTAATCAC
TGTCCAGGTT
AAGCTAATGG
TCGAAGTATT
CTAGAAAGTC
ATGATTCTCC
TAGCAGCACG
AAAACGAAAT
AAGTGTTTGA
CATCCACAAG
ATGGTTCAGG
TGCTAGGACA
CACAGAAGTT
TGCTGTCATT
TAGGCATAAT
AAGCATATGC
CAGCAGAAGA
rTTAAGTTAA rCATGGAGAA
CGCATCATCC
kGAAGTAACT kGCCGACAGT
~GAAGATCTC
~TTTGATAAC
CACTCCCAAT
TGAAGATGCA
AGGAAGGGAA
AGTAGATATA
AACATTATCA
CTACAAAAAA
AGACTGTGGG
AGACAGATCA
AAAACGATAC
AAAACACCCT
AGGGGGTAGT
GCAAGTAATG
TGCTAGTGTC
GGGAGGAGAA
AACTCAATTT
GGGAGAGTAT
AGAGCAACTC
ATTGGAAGCC
CAAAAAATAC
GATGCAAATA
AAAGATCCTA
AAAGAGAGCC
ACCCCAGAAA
ACCCCAAGTG
k~ATGAAGAAG
TATGATGTGC
AATCATAAAT
GACACTATAA
ACAACATATC
AGCTTGACAT
ATGCTAAAAG
ATGATAATAC
GGTCTTACAG
AAGGGCCTCA
CATCTTATAG
AGAGTTGAAG
CTAAGATGGG
CAGGCAGAAA
GCTGGATTCT
CCCAACTTCT
AGAGGTACAC
AAAGAAAATG
1TAAAGCATC
GGGGCAAATA
ACAAAGCTAC
k.GAAGAAAGA
CGATACATC
2AAAAGCCAA ~CAACdCTTT
LATCTAGCTA
1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 CTCATATGA GAGATAAATG ATCAAACAAA TGACAACATT ACAGCAAGAC
TAGATAGAT
2760 TGATGAAAAA TTAAGTGAAA TATTAGGAAT GCTCCATACA TTAGTAGTTG CAAGTGCAGG
ACCCACTTCA
AGAAAAAATA
TAGGAATGAG
AACTTCCAAA
TGATGATTTT
TCAATCCATT
CCAATCAATC
AAAAGAACAA
ATACACAGCA
AATATGGGTG
AAGCATCAAT
GATTAACTCA
TGTATCATTA
AGCATGCAGT
CATGAAGACA
GACATCAAAA
TCTGAACTCA
AATTATTCCT
CAAATATATC
GAGCATATAT
ACTAGAGGAT
ACACACTATA
CCACTCAAAA
AAATAAAACC
CATTTATACC
GCTCGCGATG
AGAGCGGAAG
GAAAGCGAAA
AAATTGAGTG
TGATCAGCGA
GAATCAACTG
AACCAATTGA
GATGGGGCIA
GCTGTTCAGT
CCTATGTTCC
ATACTAGTGA
AGAAGTGCTG
GATGAAAGIA
CTAACATGCT
TTCAACCCCA
AGAGTAATAA
CTAGAWAATA
TATGCAGGAT
AAACCACAGA
TATGTGACTA
TAAACTTAAT
TCCAAACATC
TCCAAAATCA
AAAATATGGG
GCCAkATTCA
GAATAAGAGA
CATTAATGAC
AAATGGCAAA
ACTTGTTGGA
TCAACTCACT
CCAGACCGAA
TCAATCAGCA
ATATGGAAAC
ACAATGTTCT
AGTCATCTGT
AGCAGATCTC
TGCTGGCTCA
GCAAATTAGC
TAAAAGTAAA
CTCATGAGAT
TACCAACCTA
TAGCAACCAC
TAGTGTTAGT
GTCAATTTAT
CTAATTGGAA
TATCAACACT
ATAAACATCT
CTACCAGCCA
GTAAATAGAC
CACATATACT
TGCTATGGTT
CAATGATAGG
AGACACCTCA
AGACAACGAT
CAGCAATCAA
CAAACAAACG
ACCCGACAAA
ATACGTGAAC
AGAAAAAGAT
GCCAGCAGAC
TACGCCCAAA
AATGCCTAGT
ATATGATGTA
AAGTATGTTA
CATTGCTCTA
TCTAAGATCA
CGAATTCAAA
TATCACAGTT
AGTAGATCTT
GCATACAGCT
GAATGACAGG
ACACTACACA
CTATCTGCTA
ATTAGTTAGA
ATAAATCTTA
GGTCTAAGAG
TTAGAGGCTA
GATGAAGTGT
AGTGACAATG
CAACATCAAT
TCCATCAGTA
ATTAACAATA
AAGCTTCACG
GATGATCCTG
TTGCTCATAA
GGACCTTCAC
AATTTCATCA
ACTACACCTT
ACTACAGTCA
TGTGAATTTG
ATTAGTGTCA
AATGCTATCA
ACTGACAATA
GGTGCCTACC
ACACGTTTTT
TCCACATATA
CTTCATCACA
GACCTAGAGT
GTTCAATCAA
AAATGGGAAA
AAGAGATGAT
TGGCAAGACT
CTCTTAATCC
ATCTATCACT
AAAACAGACA
GAACCACCAA
TAGTAACAAA
AAGGCTCCAC
CATCACTAAC
AAGAACTTGC
TACGAGTCAC
TAAGCGCAAA
GTGAAATCAA
AAGATCTTAC
AAAATATTAT
AGAACAAGGA
CCAATGCAAA
AAGGAGCATT
TAGAAAAAGA
CAATCAAACC
TCCTCAAACT
CAAACCAATC
GCGAATAGGT
TCTTAACAAC
TACATCCATC
2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 ACAATAGAAT TCACAAGCAA ATTTTGGCCC TATTTTACAC TAATACATAT GATCTTAACT
CTAATCTTTT
CATAAAGCAT
AGTTCTACCA
CTTCTCACAG
TATTTAAAAA
AATGCAACCA
GATACTCTTA
ATAGCACAAA
GCCATAATAT
ACAATAAAAA
AGGGTTAGTT
ACATCACCCA
ACCACTTCA
AAAGATGATT
CAACTTTGCA
ATCAAACCCA
GCCAAAACGA
ACAGAAAGAG
ACAGTCCAAC
ACACCCACAG
TAGTTATTCA
AAACCTGGGG
TCTTGCTATT
ATCGACATGT
TAGTGTCATA
TACTAATTAT
TCTGTAACAA
TTATGCTGTG
AGTCATGGTG
TTAACATAAT
TGTCCAAACA
ATCATCTAAT
TAGCACTATC
TCATCATCTC
ACCACACTGA
CATCCAAGCA
ATACAAAATC
CACAGACCAA
ACCATTTTGA
AATCCATCTG
CAAACAAACC
CGAAAAAAGA
ACACCAGCAC
AGCAATCCCT
CATCCGAGCC
AAAACTACAT
CAAATAACCA
AATGCATTGT
AGTGCAGTTA
ACAATAGAAT
AATCACTATT
AACTCTTGAA
TCAAATTATA
TCGCAAAACC
GATGAATTGT
CAAGAATCAA
TGTAATATCC
AGTTTTGGCA
TGCCAATCAC
AAAAAACATC
ACCCACAACC
AGAAACACAC
CAAGCCAAGC
AGTGTTCAAC
CAAAACAATA
AACCACCAAA
AACTACCACC
CTCACAATCC
CCTCTCAACC
CTCCACACCA
CTTAGCAGAG
TGGAGTTGAT
ACCTCACCTC
GCAGAGGTTA
TAAGTAATAT
ATGATTGCAA
CTAGGACAGA
ATCCTGTATA
ACGCTAACTA
TAGTATGAGA
CGCACTGCCA
TCTTGTTTAT
ATGATAATCT
AAAGTTACAC
ACCACCTACC
ACATCACCAA
CATACAACAG
ACAAAACCAC
TTCGTTCCCT
CCAAGCAACA
ACCACAAACA
AACCCAACAA
ACTGCACTCG
ACCCCCGAAA
AACTCCACC
AACCGTGATC
GATCCACAAG
AAGTCAGAAC
TTTTAGTGCT
AAAAGAAACC
TACTAAATAA
TGTATCAAAT
TATAAACAAA
TCATGGTAGC
TCAAAAACAA
GGACTCTAGA
ACAGATTAAA
CAACCTCTCT
TAACAACGGT
CTACTCAAGT
TCCACACAAG
CACAAACCAA
GTCCAAAAAA
GCAGTATATG
AACCAAAGAA
AAAGAGACCC
AAAAACTAAC
ACACAACCAC
ACACACCCAA
AAAAAACCCA
TATCAAGCA
TCAAGTGCAA
ATAACTGAGG
TTAAGAACAG
AAATGCAATG
GCTAAGTGAA
CAACACATAG
CAAATCCAAT
ATAGAGTAGT
CATTGGGGCA
AAAGACCTGG
TTTAAAATCT
CATAATTGCA
CACAGTTCAA
CTCACCAGAA
TTCAGCTACA
AGGCAGAACC
TCCACCAAAA
TGGCAACAAT
GAAACCAACC
AAAAACACCA
CCTCAAGACC
ATTAAAACAC
CTCCACACAA
GCCACATGCT
GAACGAAATT
TCTTCCTAAC
AGTTTTACCA
GTTGGTATAC
GAACTGACAC
4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 TAAAGTAAAA CTTATGAAAC AAGAATTAGA TAAGTATAAG AATGCAGTAA
CAGAATTACA
5940
GCTACTTATC
TATGAACTAC
ACGAAGATTT
ATCAAAAGTT
AAACAAAGCT
TCTCAAGAAT
CTCCAACATT
CAGAGAATTT
CAGTGAGTTA
GTCAAGCAAT
AGAAGTCCTT
GAAATTGCAC
AACAAGGACT
GGCTGACACT
ATTACCAAGT
AATTATGACA
GTCATGCTAT
ATTTTCTAAT
CACTTTATAC
AATAAATTAC
AGTCAATGA
TAATGTAAAT
CATTGTAGTA
CACACCAGTT
CAAAACACAC
ACAATCAATA
CTAGGCTTCT
CTACACCTTG
GTAGTCAGTT
TACATAAATA
GAAACAGTTA
AGTGTCAATG
CTATCATTAA
GTTCAGATAG
GCATATGTTG
ACATCGCCTC
GATAGAGGAT
TGTAAAGTAC
GAAGTCAGCC
TCAAAAACAG
GGTAAAACTA
GGTTGTGACT
TATGTAAACA
TATGACCCTC
AAAATCAATC
ACTGGCAAAT
TTGTTATCAT
ACACTAAGCA
CAGCTGTCAA
CCACTAAAAA
TGTTAGGTGT
AAGGAGAAGT
TATCAAATGG
ACCAATTATT
TAGAATTCCA
CAGGTGTAAC
TCAATGATAT
TAAGGCAACA
TACAGCTGCC
TATGCACTAC
GGTATTGTGA
AGTCCAATCG
TTTGTAACAC
ACATAAGCAG
AATGCACTGC
ATGTGTCAAA
AGCTGGAAGG
TAGTGTTTCC
AAAGTTTAGC
CTACTACAAA
TAATAGCTAT
AAGACCAACT
CAACCGGGCC
CCTAAATGTA
GGGATCTGCA
GAACAAGATC
GGTCAGTGTT
ACCCATAGTA
GCAGAAGAAC
AACACCTTTA
GCCTATAACA
AAGTTATTCC
TATCTATGGT
CAACATCAAA
TAATGCAGGA
AGTATTTTGT
TGACATATTC
CTCAGTAATT
ATCCAACAAA
CAAAGGAGTA
CAAGAACCTT
TTCTGATGAG
TTTTATTCGT
TATTATGATA
TGGTTTACTG
AAGTGGAATC
TCAATAAGCA
ATAGCAAGTG
AAAAATGCTT
TTAACCAGCA
AATCAACAGA
AGCAGATTGT
AGCACTTACA
AATGATCAGA
ATCATGTCTA
GTAATAGATA
GAAGGATCAA
TCAGTATCCT
GACACTATGA
AATTCCAAGT
ACTTCTCTTG
AATCGTGGGA
GATACTGTGT
TATGTAAAAG
TTTGATGCAT
AGATCTGATG
ACTACAATTA
TTGTATTGTA
AATAATATTG
AGAAGAGGAA
GTATAGCTGT
TGTTGTCTAC
AAGTGTTAGA
GCTGTCGCAT
TGGAAATCAC
TGTTGACAAA
AAAAATTAAT
TAATAAAGGA
CACCTTGCTG
ATATTTGTTT
TCTTTCCACA
ACAGTTTGAC
ATGACTGCAA
GAGCTATAGT
TTATAAAGAC
CAGTGGGCAA
GGGAACCTAT
CAATATCTCA
AATTACTACA
TTATAGTAAT
AAGCCAAAAA
CATTCAGCAA
AGAAGAGAAG CACCACAGTA 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 ATAGACAAAA AACCACCTGA TCATGTTTCA ACAACAATCT GCTGACCACC
AATCCCAAAT
CAACTTACAA CAAATATTTC AACATCACAG TACAGGCTGA ATCATTTCCT CACATCATGC
TACCCACATA
ATCAACCACT
AGAAATCCTT
AGTCATAATT
AACAAGATAC
GCTGAACTGG
ATAGGATCTA
ATTGAGATCA
AAGATAAGAG
CAAACCATCC
ACATTAGATA
CAAAATGACC
CATATTGATC
TAACCATAAC
TTGGACACCT
TGAAGATATA
TCATCCAACC
GGAAACTCTG
GAGTGTAATG
AACTTAATTA
ACACAGTCAT
TTCCAGTCAT
AACTTACTTA
ATCTTGAATA
ACTAAGCTAG
AAATCAACAC
GTAAATTTGA
ACTTTGAATG
TCAAGTCAAT
ATAGAACAGA
TAAACAACAT
ATAGTGATGA
TGTACAATAC
ATCTGCTCAA
TCCACAAAAG
AAACCAAAAA
TCAAGTGAAA
TATTTGGATA
AAAAACTTAT
TATACAGTAT
ATAAAACTAT
CTAATGTGTA
CTTTAGGGAG
GTAGACAAAG
TAATATCTAG
TACTTATGAC
AAAAAATAAT
AACTAGGATT
ATCCTTAACT
ATCATTCACA
GATTAGAGGT
GCCTCCTCAT
GGACAAAAGC
AGAATATGCT
AACAAAACAA
CATTAAAAAG
TGTTATATCA
GAGACTACCA
CATAACCATA
TAATGATATT
GCATGGTTGC
ACCACCAGCG
TAGATGCCAC
ATATATTAGT
TTTGATAAGG
TCTAACTGAT
TTATCTTTTT
CCCACTACTA
ATATCATAAA
ATATAAAAGT
ACGAAGAGCC
AAAGGAAAAG
TATAGTTACA
AAATTAACAG
CATTGCTTGA
GCATTACTAG
ATAGACACTT
CTTGGTATAG
TCAGCATGTG
CTTAGAGATA
TACATTGAGA
GCAGACGTGC
AGCAATCCAA
ACCGGATAAA
TACATTCAAT
TTTATTAAAT
TCAACAATTT
GTCATAATGC
TTATGGGACA
AGTTATTTAA
AACGGCCCTT
GAGCATATGA
GGTGAACTGA
ATGTCCTCGT
ATAGAAATAA
GACAGAGTTA
TAAAAACCTC
CTGGGGCA
ATGGTAGAAG
TGAGGCAA
TGTCTGAAAT
TTGGAGTGCT
TTGCTATGAG
ATGAAGAACC
GCAATAGA
TGAAGAAGAC
AAGAGTCAAC
TATCCTTGTA
CATAAAAACA
CATATATTTG
CTCCAACATC
TTGACCATAA
AAATGGATCC
AAGGTGTTAT
ATCTTAAAAA
ATCTTAAAAA
AATTAGAAGA
CTGAACAAAT
GTGATGTAAA
AGCCCAACAA
AAGTATCACA
TATGTCGCGA
ATGTCACTAC
CTTCATGTTA
AAGTGGAGCT
AGAGAGTTAC
TAAACTTCTT
CAATTCACCT
AAACAACAAG
AATAAAGAAC
TGTGAATGAT
GTATATCATC
TATTACAATT
ATGAAATTCA
TTAACATCCC
CGACTCTATG
CATTATTAAT
CTCTTTTTCA
TGATTACACC
ACTAACTATA
ACCAACTTAT
TGCTACAACT
GGTGTACGCC
TAATTCAGGT
7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 GATGAAAACT CAGTACTTAC AACTATAATT AAAGATGATA TACTTTCGGC TGTGGAAAC AATCAATCAT ATACAAATTC AGACAAAAGT CACTCAGTAA ATCAAAATAT CACTATCAA 9060
ACAACACTCT
TGGTTCAATT
AAAAGTCATG
AATCAATATG
AATCAATTTT
TGGATAAGTA
AATGTTGTGT
GAAGGCTTCT
ACAGAAGAAG
GCAGCTATTA
ACAGTGTCTG
TTGATTAAGC
AGAATCTTTG
TGTAATGAAA
TATAGAATCA
ATTGTCCTAC
GAAATCACAG
CTGCCTAAAA
GATCTAATAT
GAACATGAAA
TTGAGAGATA
CTCAACAACT
AGAATGTTTG
ATAGCTGAAA
TGAAAAAATT
TATATACAkA
GGTTTATATT
GTTGTATCGT
TGACATGGAA
ATTGTTTAAA
TATCACAATT
ACATAATAAA
ATCAATTTAG
AGGCTCAAAA
ATAATATCAT
TTGCAGGTGA
GACATCCAAT
CTAAGTTCTA
TAAAAGGGTT
CTCTAAGATG
AAAATGATTT
AAGTGGATCT
GGACTAGTTT
AGTTGAAGTT
ATAAATTCAA
CTAATCACGT
CTATGCAACC
ATATTTTACA
GATGTGTTCA
ATTAAATAAC
AATAGATAAT
TTATCATAAA
AGACATCAGC
TACATTAAAC
ATTTCTTTAT
AGAAGTAGAG
GAAACGATTT
GGACCTACTA
AAATGGTAAA
TAATAATCTC
GGTCGATGAA
CTTATTAAGT
TGTAAATACC
GTTAAACTAC
GATTATTTTA
TGAAATGATA
TCCTAGAAAT
CTCTGAAAGC
TGAATGCGAT
GGTATCACTA
AGGTATGTTT
ATTCTTCCCT
ATGCAACATC
ATATTAACAC
CAAACTTTAA
GGACTCAAAA
CTTAGCAGAT
AAAAGCTTAG
GGAGATTGTA
GGATTTATTA
TATAATAGCA
TCAAGAGTAT
TGGATAATCC
AATAACTTGA
AGACAAGCAA
AGTCTAAGTA
TACAACAGAT
TATAALACTTA
TCAGGATTGC
ATAAATGACA
TACATGCCAT
GACAGATCGA
CTATACAATT
ACTGGTAAAG
AGGCAAATCC
GAGAGTTTGA
CTCCATCATG
AATATCGATC
GTGGTTTTCA
AAATCACAAC
TAAATGTTTG
GGCTGAGATG
TACTGATT
TGTCTTTAAT
TGCTAAATAA
GTCACACTTT
TATTAAGTAA
GTGAGCTATA
TGGATTCTGT
CATTAAGAGG
GGCCCACCTT
ATACTTATCC
GGTTCTATCG
AAGCCATTTC
CACATATACA
GAAGAGTACT
GTGTAGTCAA
AAAGAGAGCT
AAATCTTAGC
CAAGATATGG
GTTAATACAC
AAATGAGGTA
GTTTATTTTA
TACTACTTAC
CTTAATTACT
TGGATTCAAT
ATTTCATAAT
TCTAAACATA
CATCACAGAT
ATTAGACAAG
ATTTCTTAAA
TTTTCTCTTC
AAGAATTAAC
TGCTTTCATT
AAGGAATGCT
ATCTCTACTT
TGAGTTTCAT
ACCTCCAAAA
AAATTATATA
AGAGTATTAC
TCAAAGCTAT
CAGTGTAGGT
AGAGAAAATG
TGATCTAGAG
9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 CTTCAAAAGA TATTAGAATT AAAAGCAGGA ATAAGC"CA AGTCAAATCG TTATKATGAT AACTACAACA ATTATATCAG TAAATGTTCT ATCATTACAG ATCTTAGCAA ATTCAATCAG
GCATTTAGAT
CAATCTCTGT
AGACATGCAC
AGTGGATTAT
ATTGAAGCTA
CTGATAAATG
CAGACCCATG
GAGTATGCAG
CAGTTCATGA
GTCCTGAGAG
TCTATAGGCA
ATATTTAGGA
TGTAACAATA
AATCTTGATA
GGTGGTGATC
GAAGCTATAG
AAGCTCCAGG
GATAAAAATC
GAAAGGCAAG
ATAGCCCCAA
CTAAATGACA
GAAAGTTTAC
ATAACTAATA
GATATGATGA
GACAAAAGAG
ATGAAACATC
TCTCTTGGTT
CTCCTTTCAT
ACAGATATCA
TATCATTATT
GTGATAATCA
CACAAGCAGA
GTATAGGCCA
GCAAAACAT
TAGGTCCATG
GCTTAACACA
ACATTTGGTT
AGCTATATTT
GCATTGATAT
CTAATTTGTT
TACATTCAGT
ATCTTCCAGA
CCAATGCCGA
CTAAAATTAC
ACAAAATATT
TTATGCAAAA
CTTTTTATAA
TACTTGAAAA
GGAAAALATAT
AGTTATTAAG
ATGTATCTGC
GCATTTAA
AAAGGATCAT
TATGGGTGGT
AGATCTAATA
GTCAATTGAT
TTATTTGTTA
TAAGCTTAAG
CCAGCACAAT
GATAAACACG
GGAGTTAGAA
ATACAATCAA
AGATATATTG
GGCTTTATCA
ATATCGAAGC
GTTTGTGTTG
TGATAGACTG
GTTTGTAACA
TAGTGAGATT
TTCTAAAAGT
TATAGAAcCA
AGCAGAAAAA
AACATCAGCA
AACTTTACTT
TTTAGAAAAT
AGTGATGTAT
ATACCTCTTG
GTTGTTAATC
ATTGAGGGCT
TCTCTCAAAG
ATAAGCAAAC
GCATTAAATA
GGAACAGAGA
GGAGTGTACT
ATACTTGATG
TACAGAGGAG
ATTGCTTTGC
AAAGTATTALA
TTGTATATGA
TTTTATAGGA
AGCTATTATA
AACAAATTCT
TTGATGAGGG
AATAGATTAG
GCACAACATT
ACTTACCCTC
ATAGTTAATC
ATAGATACAA
ATAAGGATAC
CTTAGTATA6A
TAGATGAACT
TCACAATAAT
TTAATGAGGT
GGTGTCAAAA
GGAAATTCTC
CAGTTAGACT
GCCTTAALATT
CCTATATATC
ATCCAGCCAG
ATTTTAAAGT
AAAGCTTATT
AACTCCGAAA
AACACTTAAA
ATTTGCCTAT
GAACTCCAGA
CTGGTCACGA
TGACATGTGT
ATCCACAGGC
CAGTAACAGA
ATACTACCAC
ATGGATTAAG
TTATATCAGG
CTGATATTAA
TTCCACTAGA
CTGAATTAAG
GCATGGAGTA
ATGTACATAT
TGATGAACAA
ACTGTGGACC
TATCACAGCT
TATAGAGGGT
GTTATATAAA
CCGAGATATG
TATCAAAAA6A
TAGTTTAGALA
ATGCAGTTTA
TCATGCATTA
ALACTTTTTTT
GCTGTTTGGT
CTTCCTTACA
TTTACAAGAT
CATCACATTT
TTTAGGGTCT
AGTCTTAAGT
TGAGATTGAT
AGTTGTTTAT
AACAAAATCC
TAGGGCTACT
TTGTAACAA
CAAGTATGTA
10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120
AGAGAAAGAT
ACAATGGACA
AATGTTAATA
ACGCAGGAGA
GACCAAATAG
GAATTCATGG
TTGTTTCCAC
TGTGAATTCC
CCTATCAATC
AATTGCATAA
CCTAATAGAA
TTTACAGGAG
CTACCAGATA
AAATCTGGAT
CATAATGCTT
ATGAAAGATT
ATGTTCATTA
AALAGGTTATG
TTGGAGTTAA
GTCATAAAAT
TTTILAGTTGT
GTTAACATAG
AGAATGGGGT
GAATTTTACA
CTAACAAAAC
CTTGGTCATI
TTAAATATAC
GTTTAACTCG
AAAAAACAAT
ATTTATTAGC
AAGAACTGAG
AATATCTAAG
CTGCATCAAT
ATGTATTAAC
GTTTTGGTCT
TTATTCTCAT
ATGTTGATAT
AAATAAGTTT
CTCACATC-
ATATTTTAAG
CAAAAGGTAT
ATTTGAATGT
GTAAAGCAAA
TAGACAGTAG
ACATAGTCAA
GGTTTTTAAA
ATTATCACCC
TAATAAATGT
CATCAAATCT
AAATAAGAAT
ATCCAATATA
AACTAGCACT
TGGTGAAAGA
GCCAGTGTAC
AAAATTAGAC
TACTGGAACA
TGTCAATTAT
ACCAGCTTAT
AGAAAAGTAT
TAGCCTGATG
ACCGAAGCTG
CATCAAGTTG
AACCCAATAT
CTCTAATTTA
TACTAATTTA
TTTTGAAAAA
TTTCTTTAAT
ATTAGAATGT
CTACTGGAAA
TCAAGACACA
ACGCCTTGAT
AACACACATG
AGATAAATTA
CTTTTACATT
TGCTAATTCA
GTAGGAGTAA
ATAGCCAGTG
GGACCCACCA
AACAGACAAG
TGGGTATATG
CTTGGACTGT
TTACACCGTT
AGAACAACAA
GGAGATGAAG
TCGGTTGTGG
AATGAGATAC
AAGCAAGTGA
GTAGAATTAT
ATATTAGTAC
GCTGGACATT
GATTGGGGAG
GCTTATAAGA
GATATGAACA
TCTATGTCTA
AGTTTGCGTA
AATGCTAAAT
AAAGCTATAT
ACCATTAA
AGTTATAACT
GAATTAGAAG
CATCGCCAAG
GTATAATAkAT
AGCCATGGGT
TTTTAACCAA
CATCCATAGA
CATATGAAAA
TAACAGTCAG
ATTATCATTT
ATATCGACAT
AACAATTCAC
ATTTGATGAA
TACAAAAGCA
TCTTAAGTAA
ATAAAATGTC
GGATTCTGAT
AGGGGTACAT
CTTATTTGCT
CTTCAGATCT
AAGTTTTCCT
GAATAAAAGG
TTACCGTATG
TATCTTACAT
ATAAAAACAA
TTTCAGACAA
ATAATTATAA
TATTATGTTC
AGAAAAATAT
AGGCTCATCC
AAAGCAAAGA
CAACAAAGAT
AGCCAAAAAG
TAGTAGACCA
TGATACTAGT
TGTGTTTCAA
AAACATATGT
ACCTCCTATA
GCACATGTTC
CAALAGCACTT
TGATTATTTT
TATTCAACTT
AACTGATCAT
ATGTTTTCAT
TCTTTGTGTT
AGAACAAAAA
CTGTCACAGT
CCCTTGGGTT
AGATTTAGTT
PTTCAATGAT
CACTCATTTG
CAAACTATAT
12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 CACCCAACCC CAGAAACTTT AGAAAATATG TCATTAATTC CTGTTAAAAG
TAATAATAGT
AACAAACCTA
AGTAAAATGC
TTGTACAATT
GCAAAATCTA
GCATCACTTT
TCCACAGGAT
AGTTGTATAG
CTTCATCCAG
CCTATTGAAT
ACCATTCCTG
GCAGAACCTA
AAAATTATAA
AGATGCATTT
ATTACTATAT
TTAATCCTTA
AAATTGATAC
ATCGATGCAG
ATTAAGACTT
ATAGCTGGAC
CTAAAATGGC
TACATGATAG
GAGCTCAAGA
TAGTTTAA;A
TAGTTATTAA
TTTTAGTCTT
ACACIAACGAG
AATTTTGTAT
ATATTAAATC
TATTTCCAAT
ACCAACTTTA
ATTGCATGCT
GCAAGATCAG
CATTCATAGG
ACATILAGATA
TTCTAAGGTT
CTACAGATGC
TTAGCATCTT
TTGAATGGAG
TAATTGCAAA
TAAAAACTTA
CAATAGGCCC
TTTCAAGAAC
TTATTAAAAG
CATTGTCAAA
GTAATGAAGT
TAGATCATGT
AGTCCACATA
AGCTGATTAA
TATCATTAAC
AAAATATACA
AAGGGGTTAA
ACATTAGTTT
AAGTGGAAAT
TTCCACTGTT
TGTTGTGATA
CACCACCACT
TCCTTGGCAT
TATAGAGTAT
TGAAGGAGCT
CATTTACAGA
ATACAACGGG
AACTAATAAC
TGTCTGCGAT
TAAGCATGTA
ATATCATGCT
CGTGTGCCTA
TGCAAATATA
TAAAAATTTC
CTTALATACCT
ATTGAAGAGT
ATTCAGCAAC
TTTAAATTTT
TCCTTACTTA
AATAACAGGT
AAGTTTGGTC
AACTTTTCAA
ATAAAAGTCT
TTGACACTTT
ACCGAATCTA
ACCACAAGAT
GACAAGATTA
TCACATCAGA
CATGTCAATA
ATTTTAAAAG
GGTAACTTAT
AGTTTAAJ.AG
CATATAAACA
ATTCATTGGT
GCTGAATTAC
AGAAAGTGCA
CAAGATGACA
GGTAGCAAGT
CTTCCTGTTT
ATTATGCCTA
TTCCTTTGTT
GTAGTTAATG
AAGCTTATAA
AGATCAGCTG
AGTGAATTGT
AGTGTGCTAT
AAATTTAGAT
TAATTTAGCA
AAAACTAACA
TTTTCTCGT
TGATGATGTC
TCAATTATAG
TAGATCATTC
CATCTTTAGT
GATTTAACTT
ATCTTAAGAT
TATTACGTAC
ATTGCAATGA
TAGATTATGG
CTTATTTACA
CTGTTACAGC
AGTACTGTTC
TTGATTTCAA
TAAAAGGATC
TTGATGTTGT
AAAAAACTGA
ACCCTATAAC
GAGATATATT
ACCACAAGCA
AACTTAATTA
TAALATAGTTT
ACAACCTTCC
GCTAACACAT
TATTGATTCC
ATTATACATG
AACATTCTCT
CAAACAAGAC
AGGTAATACA
AAGGAATAGT
TGTATTTAGT
TAAGGACCCC
GGTAGTAGAA
TCATAGTTTA
TGAGAATTTA
TATAAAATTT
CAATTGGAGT
TTCTGTAAAT
ATTAGATAAC
TGAAGTTTAC
ACAAAATGCT
CAAGGAATCT
AAAAAAAGGA
ATCATATTCT
TATGAATATC
CAATCATTTA
AACAACCAAT
CAACGAACAG
CATTATATTA
kAAATTATCA rGCATTCACA 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15219
INFORMATION FOR SEQ ID NO:34: SEQUENCE CHARACTERISTICS: LENGTH: 2166 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp 1 5 10 Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly 25 Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu 40 Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu 50 55 Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys 70 75 Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser 90 Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile 100 105 110 Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu 115 120 125 Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn 130 135 140 Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile 145 150 155 160 Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Ser 165 170 175 His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys 180 185 190 Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe 195 200 205 Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn 210 215 220 Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser 225 230 235 240 Gly Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys 245 250 255 Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp 260 265 270 Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile 275 280 285 Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly 290 295 300 Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile 305 310 315 320 Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu 325 330 335 Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe 340 345 350 Arg Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala 355 360 365 Ile Lys Ala Gln Lys Asp Leu Leu Ser Arg Val Cys His Thr Leu Leu 370 375 380 Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu 385 390 395 400 Leu Ser Lys Phe Leu Lys Leu
Ile Lys Leu Ala Gly Asp Asn Asn Leu 405 410 415 Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro 420 425 430 Met Val Asp Glu Arg Gln Ala Met Asp Ser Val Arg Ile Asn Cys Asn 435 440 445 Glu Thr Lys Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala 450 455 460 Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp 465 Pro Thr Tyr Lys Leu Ile Lys Lys 530 Pro Lys 545 His Ile Asp Arg Asn Glu Asn Ser 610 Val Gly 625 Ile Leu Glu Ser Leu Lys Asn Asn 690 Asn Gln 705 Asp Glu Leu Leu Ile 515 Val Asp Gln Ser Cys 595 Asn Arg Ala Leu Ala 675 Tyr Ala Leu Arg Asn 500 Leu Asp Leu Asn Arg 580 Asp His Met Glu Thr 660 Gly Ile Phe His Asn 485 Thr Ser Leu Ile Tyr 565 Arg Leu Val Phe Lys 645 Arg Ile Ser Arg Gly 725 470 Ala Tyr Gly Glu Trp 550 Ile Val Tyr Val Ala 630 Met Tyr Ser Lys Tyr 710 Val Ile Pro Leu Met 535 Thr Glu Leu Asn Ser 615 Met Ile Gly Asn Cys 695 Glu Gln Val Ser Arg 520 Ile Ser His Glu Cys 600 Leu Gln Ala Asp Lys 680 Ser Thr Ser Leu Leu 505 Phe Ile Phe Glu Tyr 585 Val Thr Pro Glu Leu 665 Ser Ile Ser Leu Pro 490 Leu Tyr Asn Pro Lys 570 Tyr Val Gly Gly Asn 650 Glu Asn Ile Cys Phe 730 475 Leu Glu Arg Asp Arg 555 Leu Leu Asn Lys Met 635 Ile Leu Arg Thr Ile 715 Ser Arg Ile Glu Lys 540 Asn Lys Arg Gln Glu 620 Phe Leu Gln Tyr Asp 700 Cys Trp Trp Thr Phe 525 Ala Tyr Phe Asp Ser 605 Arg Arg Gln Lys Asn 685 Leu Ser Leu Leu Glu His Ile Met Ser Asn 590 Tyr Glu Gln Phe Ile 670 Asp Ser Asp Asn 495 Asn Leu Ser Pro Glu 575 Lys Leu Leu Ile Phe 655 Leu Asn Lys Val 480 Tyr Asp Pro Pro Ser 560 Ser Phe Asn Ser Gln 640 Pro Glu Tyr Phe Leu 720 Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala 740 745 735 Pro Pro Phe 750 Ile Lys Asp His Val Val Asn Leu Asn Glu Val Asp Glu Gln Ser Gly 755 760 765
Leu Tyr 770 Trp Thr 785 Lys Phe Ile Ser Asp Tyr Ala Gly 850 Asp Met 865 Pro Ala Ile Leu Gln Glu Arg Asn 930 Ala Leu 945 His Leu Leu Tyr Leu Tyr Arg Tyr Ile Glu Ser Ile Lys Pro 820 Leu Leu 835 Ile Gly Gln Phe Ser Ile Asp Asp 900 Leu Glu 915 Ile Trp Cys Asn Lys Thr Met Asn 980 Arg Ser 995 His Met Ala Ile 790 Thr Ala 805 Val Arg Ala Leu His Lys Met Ser 870 Lys Lys 885 Phe Lys Tyr Arg Leu Tyr Asn Lys 950 Phe Phe 965 Leu Pro Phe Tyr Gly Gly Ile 775 Ser Leu Leu Leu Ile Asn Leu Ile Glu 825 Asn Ser Leu 840 Leu Lys Gly 855 Lys Thr Ile Val Leu Arg Val Ser Leu 905 Gly Glu Ser 920 Asn Gln Ile 935 Leu Tyr Leu Asn Leu Asp Met Leu Phe 985 Arg Arg Thr 1000 Glu Asp Gly 810 Gly Lys Thr Gln Val 890 Glu Leu Ala Asp Ser 970 Gly Pro Gly Leu 795 Asp Gln Leu Glu His 875 Gly Ser Leu Leu Ile 955 Ile Gly Asp Trp 780 Ile Asn Thr Leu Thr 860 Asn Pro Ile eye Gln 940 Leu Asp Gly Phe Cys Gln Ser Leu Gln Ser His Ala 830 Tyr Lys 845 Tyr Ile Gly Val Trp Ile Gly Ser 910 Ser Leu 925 Leu Arg Lys Val Met Ala Asp Pro 990 Leu Thr 1005 Lys Lys Ile 815 Gln Glu Ser Tyr Asn 895 Leu Ile Asn Leu Leu 975 Asn Glu Leu Gly 800 Asp Ala Tyr Arg Tyr 880 Thr Thr Phe His Lys 960 Ser Leu Ala Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu
1015 1020 Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu 1025 1030 1035 1040 Thr Cys Val Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr 1045 1050 1055 Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile 1060 1065 1070 Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala 1075 1080 1085 Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu 1090 1095 1100 Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His 1105 1110 1115 1120 Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys 1125 1130 1135 Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu 1140 1145 1150 Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met 1155 1160 1165 Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys 1170 1175 1180 Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr 1185 1190 1195 1200 Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile 1205 1210 1215 Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asp Ile Lys Tyr 1220 1225 1230 Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val 1235 1240 1245 Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly 1250 1255 1260 Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val 1265 1270 1275 1280 Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp 1285 1290 1295 Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu 1300 1305 1310 Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe 1315 1320 1325 Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser 1330 1335 1340 Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn 1345 1350 1355 1360 Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr 1365 1370 1375 Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly 1380 1385 1390 Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn 1395 1400 1405 Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile
His Leu Met Lys Pro 1410 1415 1420 Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile 1425 1430 1435 1440 Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr 1445 1450 1455 Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile 1460 1465 1470 Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn 1475 1480 1485 Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile 1490 1495 1500 Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu 1505 1510 1515 1520 Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn 1525 1530 1535 Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala 1540 1545 1550 Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu 1555 1560 1565 Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu 1570 1575 1580 Gln Lys Val Ile Lys Tyr Ile Val Asn Gln Asp Thr Ser Leu Arg Arg 1585 1590 1595 1600 Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asp 1605 1610 1615 Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His 1620 1625 1630 Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met 1635 1640 1645 Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe 1650 1655 1660 Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe 1665 1670 1675 1680 Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser 1685 1690 1695 Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr 1700 1705 1710 Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn As Ser Asn Lys 1715 1720 1725 Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr 1730 1735 1740 Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe 1745 1750 1755 1760 Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile 1765 1770 1775 Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu 1780 1785 1790 Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser 1795 1800 1805 Leu Tyr Cys Met Leu
Pro Trp His His Val Asn Arg Phe Asn Phe Val 1810 1815 1820 Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp 1825 1830 1835 1840 Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala 1845 1850 1855 Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg 1860 1865 1870 Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile 1875 1880 1885 Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu 1890 1895 1900 Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser 1905 1910 1915 1920 Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp 1925 1930 1935 Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp 1940 1945 1950 Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys 1955 1960 1965 Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu 1970 1975 1980 Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu 1985 1990 1995 2000 Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile 2005 2010 2015 Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg 2020 2025 2030 Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp 2035 2040 2045 Ala Val Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys 2050 2055 2060 Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly 2065 2070 2075 2080 Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn 2085 2090 2095 Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His 2100 2105 2110 Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met 2115 2120 2125 Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr 2130 2135 2140 Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr 2145 2150 2155 2160 Asn Leu Pro Asn Glu Gln 2165
INFORMATION FOR SEQ ID NO:35: SEQUENCE CHARACTERISTICS: LENGTH: 24 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:35: CATATCACTC ACTCTGGGAT GGAG 24
INFORMATION FOR SEQ ID NO:36: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: TCAGAACATC AAGCACCGCC
INFORMATION FOR SEQ ID NO:37: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: ACAGTCAAGA CTGAGATGAG
INFORMATION FOR SEQ ID NO:38: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: AAGAGTCAGA TACATGTGGA
INFORMATION FOR SEQ ID NO:39: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:39: ACATGAATCA GCCTAAAGTC
INFORMATION FOR SEQ ID NO:40: SEQUENCE CHARACTERISTICS: LENGTH: 25 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:40: CCGAAAGAGT TCCTGCGTTA CGACC
INFORMATION FOR SEQ ID NO:41: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: CAGTCCACAC AAGTACCAGG
INFORMATION FOR SEQ ID NO:42: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: GTCAGAAGCT GTGGACCATC
INFORMATION FOR SEQ ID NO:43: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: AATATTGCTA CAACAATGGC
INFORMATION FOR SEQ ID NO:44: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single
TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: ACTCTTCATT CCTAGACTGG
INFORMATION FOR SEQ ID NO:45: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:45: GTCCAATTAT GACTATGAAC
INFORMATION FOR SEQ ID NO:46: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:46: AGAACAGACA TGAAGCTTGC
INFORMATION FOR SEQ ID NO:47: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: CCAACAAGGA ATGCTTCTAG
INFORMATION FOR SEQ ID NO:48: SEQUENCE CHARACTERISTICS: LENGTH: 25 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:48: ACAGCACTAT CTATGATTGA CCTGG
INFORMATION FOR SEQ ID NO:49: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:49: GCAACATGGT TTACACATGC
INFORMATION FOR SEQ ID NO:50: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:50: AGATTGAGAG TTGATCCAGG
INFORMATION FOR SEQ ID NO:51: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:51: AGGAGATACT TAAACTAAGC
INFORMATION FOR SEQ ID NO:52: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:52: TAAGCTTATG CCTTTCAGCG
INFORMATION FOR SEQ ID NO:53:
SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:53: TTAACGGACC TAAGCTGTGC
INFORMATION FOR SEQ ID NO:54: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:54: GAAACAGATT ATTATGACGG
INFORMATION FOR SEQ ID NO:55: SEQUENCE CHARACTERISTICS: LENGTH: 24 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:55: CGGGCTATCT AGGTGAACTT CAGG 24
INFORMATION FOR SEQ ID NO:56: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:56: ATTTGGATAT GGAATATGAG
INFORMATION FOR SEQ ID NO:57: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:57: ACTCAACTGA ACTACCAGTG
INFORMATION FOR SEQ ID NO:58: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:58: AAGAACATCA TGTATTTCAG
INFORMATION FOR SEQ ID NO:59: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:59: TTATCAACGC ACTGCTCATG
INFORMATION FOR SEQ ID NO:60: SEQUENCE CHARACTERISTICS: LENGTH: 25 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic)
INFORMATION FOR SEQ ID NO:61: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:61: GCCTCTGTGC AAACAAGCTG
INFORMATION FOR SEQ ID NO:62:
SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:62: TCTCTAGTTA CTCTAGCAGC
INFORMATION FOR SEQ ID NO:63: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:63: AGGTCGTTGT TTGTGAGGAG
INFORMATION FOR SEQ ID NO:64: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:64: TCGTCCTCTT CTTTACTGTC
INFORMATION FOR SEQ ID NO:65: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:65: CCGTCCTCGA GCTAGCCTCG
INFORMATION FOR SEQ ID NO:66: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:66: CTCCTCCAGG CTCACATTGG
INFORMATION FOR SEQ ID NO:67: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:67: GGGTTGGTAC ATAGCTCTGC
INFORMATION FOR SEQ ID NO:68: SEQUENCE CHARACTERISTICS: LENGTH: 25 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:68: CACCCATCTG ATATTTCCCT GATGG
INFORMATION FOR SEQ ID NO:69: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:69: TGGTTGACAG TACAAATCTG
INFORMATION FOR SEQ ID NO:70:
SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:70: CTGAAATGGG AAGATTGTGC
INFORMATION FOR SEQ ID NO:71: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:71: AGCAATCTAC ACTGCCTACC
INFORMATION FOR SEQ ID NO:72: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:72: TCACAGATGA TTCAATTATC
INFORMATION FOR SEQ ID NO:73: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:73: GATCCTAGAT ATAAGTTCTC
INFORMATION FOR SEQ ID NO:74: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:74: ACCAAACAAA GTTGGGTAAG G 21
INFORMATION FOR SEQ ID NO:75: SEQUENCE CHARACTERISTICS: LENGTH: 32 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:75: GGGGGATCCA TCCCTAATCC TGCTCTTGTC CC 32
INFORMATION FOR SEQ ID NO:76: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:76: GATTCCTCTG ATGGCTCCAC
INFORMATION FOR SEQ ID NO:77: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:77: TAACAGTCAA GGAGACCAAA G 21
INFORMATION FOR SEQ ID NO:78: SEQUENCE CHARACTERISTICS: LENGTH: 32 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:78: GGGAAGCTTA ACCCTAATCC TGCCCTAGGT GG 32
INFORMATION FOR SEQ ID NO:79: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: RNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:79: ACCAGACAAA GCTGGGAATA GA 22
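A flattened listing like this one is easy to corrupt when it is retyped or scanned, so it can help to check each entry mechanically. The minimal Python sketch below assumes only the twenty standard three-letter amino acid codes and the A/C/G/T/U alphabet. It converts a protein entry to one-letter form and counts the bases in a nucleotide entry, so the result can be compared with the declared LENGTH. The helper names are hypothetical. This sketch rejects IUPAC ambiguity codes such as K rather than interpreting them.

```python
# Standard three-letter amino acid codes (the 20 common residues only).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def protein_to_one_letter(listing: str) -> str:
    """Convert a three-letter residue listing to one-letter codes.

    Numeric tokens (the position markers interleaved in the listing) are
    skipped; any other unrecognised token raises ValueError, so misread
    codes are reported instead of being silently dropped.
    """
    out = []
    for token in listing.split():
        if token.isdigit():
            continue
        if token not in THREE_TO_ONE:
            raise ValueError(f"unrecognised residue token: {token!r}")
        out.append(THREE_TO_ONE[token])
    return "".join(out)


def nucleotide_length(entry: str) -> int:
    """Count the bases in a space-separated nucleotide entry.

    Numeric tokens (such as the running length printed at the end of an
    entry) are ignored. Only A, C, G, T and U are accepted.
    """
    seq = "".join(t for t in entry.split() if not t.isdigit()).upper()
    bad = set(seq) - set("ACGTU")
    if bad:
        raise ValueError(f"non-nucleotide characters: {sorted(bad)}")
    return len(seq)


if __name__ == "__main__":
    # First sixteen residues of SEQ ID NO:34, with their position markers.
    print(protein_to_one_letter(
        "Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp 1 5 10"))
    # MDPIINGNSANVYLTD
    # SEQ ID NO:75, declared LENGTH: 32 base pairs.
    print(nucleotide_length("GGGGGATCCA TCCCTAATCC TGCTCTTGTC CC 32"))
    # 32
```

Running the length check over every entry and comparing each count with its LENGTH field flags any entry where a base was dropped or a stray character crept in.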

Claims (24)

1. An isolated, recombinantly-generated, attenuated, nonsegmented, negative-sense, single stranded RNA virus of the Order Mononegavirales having at least one attenuating mutation in the 3' genomic promoter region and having at least one attenuating mutation in the RNA polymerase gene.
2. The virus of Claim 1 wherein the virus is from the Family Paramyxoviridae.
3. The virus of Claim 2 wherein the virus is from the Subfamily Paramyxovirinae.
4. The virus of Claim 3 wherein the virus is from the Genus Morbillivirus.
5. The virus of Claim 4 wherein the virus is measles virus.
6. The measles virus of Claim 5 wherein: the at least one attenuating mutation in the 3' genomic promoter region is selected from the group consisting of nucleotide 26 (A→…), nucleotide 42 (A→T or A→C) and nucleotide 96 (G→…), where these nucleotides are presented in positive strand, antigenomic, message sense; and the at least one attenuating mutation in the RNA polymerase gene is selected from the group consisting of nucleotide changes which produce changes in an amino acid selected from the group consisting of residues 331 (isoleucine→threonine), 1409 (alanine→threonine), 1624 (threonine→alanine), 1649 (arginine→methionine), 1717 (aspartic acid→alanine), 1936 (histidine→tyrosine), 2074 (glutamine→arginine) and 22.14 (arginine→lysine).
7. The virus of Claim 3 wherein the virus is from the Genus Paramyxovirus.
8. The virus of Claim 7 wherein the virus is human Parainfluenza virus type 3 (PIV-3).
9. The PIV-3 of Claim 8 wherein: the at least one attenuating mutation in the 3' genomic promoter region is selected from the group consisting of nucleotide 23 (T→C), nucleotide 24 (C→…), nucleotide 28 (G→…) and nucleotide 45 (T→…), where these nucleotides are presented in positive strand, antigenomic, message sense; and the at least one attenuating mutation in the RNA polymerase gene is selected from the group consisting of nucleotide changes which produce changes in an amino acid selected from the group consisting of residues 942 (tyrosine→histidine), 992 (leucine→phenylalanine) and 1558 (threonine→isoleucine).
10. The virus of Claim 3 wherein the virus is from the Genus Rubulavirus.
11. The virus of Claim 2 wherein the virus is from the Subfamily Pneumovirinae.
12. The virus of Claim 11 wherein the virus is from the Genus Pneumovirus.
13. The virus of Claim 12 wherein the virus is human respiratory syncytial virus (RSV) subgroup B.
14. The virus of Claim 13 wherein: the at least one attenuating mutation in the 3' genomic promoter region is selected from the group consisting of nucleotide 4 (C→…) and the insertion of an additional A in the stretch of A's at nucleotides 6-11, where these nucleotides are presented in positive strand, antigenomic, message sense; and the at least one attenuating mutation in the RNA polymerase gene is selected from the group consisting of nucleotide changes which produce changes in an amino acid selected from the group consisting of residues 353 (arginine→lysine), 451 (lysine→arginine), 1229 (aspartic acid→asparagine), 2029 (threonine→isoleucine) and 2050 (asparagine→aspartic acid).
15. The virus of Claim 1 wherein the virus is from the Family Rhabdoviridae.
16. The virus of Claim 1 wherein the virus is from the Family Filoviridae.
17. A vaccine comprising an isolated, recombinantly-generated, attenuated, nonsegmented, negative-sense, single stranded RNA virus of the Order Mononegavirales according to Claim 1 and a physiologically acceptable carrier.
18. The vaccine of Claim 17 comprising a measles virus according to Claim 5 and a physiologically acceptable carrier.
19. The vaccine of claim 18 comprising a measles virus according to claim 6 and a physiologically acceptable carrier.
20. The vaccine of claim 17 comprising a PIV-3 according to claim 8 and a physiologically acceptable carrier.
21. The vaccine of claim 20 comprising a PIV-3 according to claim 9 and a physiologically acceptable carrier.
22. The vaccine of claim 17 comprising an RSV subgroup B according to claim 13 and a physiologically acceptable carrier.
23. The vaccine of claim 22 comprising an RSV subgroup B according to claim 14 and a physiologically acceptable carrier.
24. A method for immunising an individual to induce protection against a nonsegmented, negative-sense, single stranded RNA virus of the Order Mononegavirales which comprises administering to the individual the vaccine of any one of claims 17 to 23.
25. A vaccine of any one of claims 17 to 23 when used for immunising an individual to induce protection against a nonsegmented, negative-sense, single stranded RNA virus of the Order Mononegavirales.
26. Use of a vaccine of any one of claims 17 to 23 in the manufacture of a medicament for immunising an individual to induce protection against a nonsegmented, negative-sense, single stranded RNA virus of the Order Mononegavirales.
27. An isolated nucleic acid molecule comprising a measles virus sequence in positive strand, antigenomic message sense selected from the group consisting of 1977 wild-type strain (SEQ ID NO:3), 1983 wild-type strain (SEQ ID NO:5) where the nucleotide 2499 is G or C, Montefiore wild-type strain (SEQ ID NO:7), Rubeovax™ vaccine strain (SEQ ID NO:9), where the nucleotide 2143 is T or C, Moraten vaccine strain (SEQ ID NO:11), Schwarz vaccine strain (SEQ ID NO:11), where the nucleotide 4917 is C and the nucleotide 4924 is C, and Zagreb vaccine strain (SEQ ID NO:13), and the complementary genomic sequences thereof.
28. An isolated nucleic acid molecule comprising a PIV-3 sequence in positive strand, antigenomic message sense selected from the group consisting of cp45 vaccine strain grown in foetal rhesus lung cells (SEQ ID NO:19) and cp45 vaccine strain grown in Vero cells (SEQ ID NO:21), and the complementary genomic sequences thereof.
29. A composition which comprises a transcription vector comprising an isolated nucleic acid molecule encoding a genome or antigenome of a nonsegmented, negative-sense, single stranded RNA virus of the Order Mononegavirales having at least one attenuating mutation in the 3' genomic promoter region and having at least one attenuating mutation in the RNA polymerase gene, together with at least one expression vector which comprises at least one isolated nucleic acid molecule encoding the trans-acting proteins necessary for encapsidation, transcription and replication, whereby upon expression an infectious attenuated virus is produced.
30. The composition of claim 29 wherein the transcription vector comprises an isolated nucleic acid molecule which encodes a measles virus according to claim 5 and the at least one expression vector comprises at least one isolated nucleic acid molecule encoding the trans-acting proteins N, P and L.
31. The composition of claim 30 wherein the transcription vector comprises an isolated nucleic acid molecule which encodes a measles virus according to claim 6.
32. The composition of claim 29 wherein the transcription vector comprises an isolated nucleic acid molecule which encodes a PIV-3 according to claim 8 and the at least one expression vector comprises at least one isolated nucleic acid molecule encoding the trans-acting proteins NP, P and L.
33. The composition of claim 32 wherein the transcription vector comprises an isolated nucleic acid molecule which encodes a PIV-3 according to claim 9.
34. The composition of claim 29 wherein the transcription vector comprises an isolated nucleic acid molecule which encodes an RSV subgroup B according to claim 13 and the at least one expression vector comprises at least one isolated nucleic acid molecule encoding the trans-acting proteins N, P, L and M2.
35. The composition of claim 34 wherein the transcription vector comprises an isolated nucleic acid molecule which encodes an RSV subgroup B according to claim 14.
36. A method for producing infectious attenuated nonsegmented, negative-sense, single stranded RNA virus of the Order Mononegavirales which comprises transforming or transfecting host cells with the at least two vectors of claim 29 and culturing the host cells under conditions which permit the co-expression of these vectors so as to produce the infectious attenuated virus.
37. The method of claim 36 wherein the virus is the measles virus of claim 5.
38. The method of claim 37 wherein the virus is the measles virus of claim 6.
39. The method of claim 36 wherein the virus is the PIV-3 of claim 8.
40. The method of claim 39 wherein the virus is the PIV-3 of claim 9.
41. The method of claim 36 wherein the virus is the RSV subgroup B of claim 13.
42. The method of claim 41 wherein the virus is the RSV subgroup B of claim 14.
43. An isolated, recombinantly generated, attenuated, nonsegmented, negative sense, single stranded RNA virus of the Order Mononegavirales, substantially as hereinbefore described with reference to any one of the examples.
44. A vaccine comprising an isolated, recombinantly generated, attenuated, nonsegmented, negative sense, single stranded RNA virus of the Order Mononegavirales, substantially as hereinbefore described with reference to any one of the examples.
45. An isolated nucleic acid molecule comprising a measles virus sequence in positive strand, antigenomic message sense substantially as hereinbefore described with reference to any one of the examples.
46. An isolated nucleic acid molecule comprising a PIV-3 sequence in positive strand, antigenomic message sense, substantially as hereinbefore described with reference to any one of the examples.
47. A composition which comprises a transcription vector comprising an isolated nucleic acid molecule encoding a genome or antigenome of a nonsegmented, negative-sense, single stranded RNA virus of the Order Mononegavirales substantially as hereinbefore described with reference to any one of the examples.
48. A method for producing infectious attenuated nonsegmented, negative-sense, single stranded RNA virus of the Order Mononegavirales substantially as hereinbefore described with reference to any one of the examples.
Dated 6 November 2001
AMERICAN CYANAMID COMPANY
THE GOVERNMENT OF THE UNITED STATES OF AMERICA AS REPRESENTED BY THE DEPARTMENT OF HEALTH AND HUMAN SERVICES
Patent Attorneys for the Applicant/Nominated Person
SPRUSON & FERGUSON
AU89330/01A 1996-09-27 2001-11-08 3' genomic promoter region and polymerase gene mutations responsible for attenuation in viruses of the order designated mononegavirales Abandoned AU8933001A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU89330/01A AU8933001A (en) 1996-09-27 2001-11-08 3' genomic promoter region and polymerase gene mutations responsible for attenuation in viruses of the order designated mononegavirales
AU2004237877A AU2004237877A1 (en) 1996-09-27 2004-12-10 3' genomic promoter region and polymerase gene mutations responsible for attenuation in viruses of the order designated mononegavirales

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US60026823 1996-09-27
AU89330/01A AU8933001A (en) 1996-09-27 2001-11-08 3' genomic promoter region and polymerase gene mutations responsible for attenuation in viruses of the order designated mononegavirales

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
AU44278/97A Division AU4427897A (en) 1996-09-27 1997-09-19 3' genomic promoter region and polymerase gene mutations responsible for attenuation in viruses of the order designated mononegavirales

Related Child Applications (1)

Application Number Title Priority Date Filing Date
AU2004237877A Division AU2004237877A1 (en) 1996-09-27 2004-12-10 3' genomic promoter region and polymerase gene mutations responsible for attenuation in viruses of the order designated mononegavirales

Publications (1)

Publication Number Publication Date
AU8933001A true AU8933001A (en) 2002-01-24

Family

ID=3763538

Family Applications (2)

Application Number Title Priority Date Filing Date
AU89330/01A Abandoned AU8933001A (en) 1996-09-27 2001-11-08 3' genomic promoter region and polymerase gene mutations responsible for attenuation in viruses of the order designated mononegavirales
AU2004237877A Abandoned AU2004237877A1 (en) 1996-09-27 2004-12-10 3' genomic promoter region and polymerase gene mutations responsible for attenuation in viruses of the order designated mononegavirales

Family Applications After (1)

Application Number Title Priority Date Filing Date
AU2004237877A Abandoned AU2004237877A1 (en) 1996-09-27 2004-12-10 3' genomic promoter region and polymerase gene mutations responsible for attenuation in viruses of the order designated mononegavirales

Country Status (1)

Country Link
AU (2) AU8933001A (en)

Also Published As

Publication number Publication date
AU2004237877A1 (en) 2005-01-13

Legal Events

Date Code Title Description
TC Change of applicant's name (sec. 104)

Owner name: THE GOVERNMENT OF THE UNITED STATES OF AMERICA AS

Free format text: FORMER NAME: AMERICAN CYANAMID COMPANY, THE GOVERNMENT OF THE UNITED STATES OF AMERICA AS REPRESENTED BY THE DEPARTMENT OF HEALTH AND HUMAN SERVICES