AU761093B2 - Construction of production strains for producing substituted phenols by specifically inactivating genes of the eugenol and ferulic acid catabolism - Google Patents

Publication number
AU761093B2
Authority
AU
Australia
Prior art keywords
ala
leu
arg
val
gly
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU10413/00A
Other versions
AU1041300A (en)
Inventor
Jorg Overhage
Horst Priefert
Jurgen Rabenhorst
Alexander Steinbuchel
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Symrise AG
Original Assignee
Haarmann and Reimer GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haarmann and Reimer GmbH
Publication of AU1041300A
Application granted
Publication of AU761093B2
Assigned to SYMRISE GMBH & CO. KG (Alteration of Name(s) in Register under S187; Assignor: HAARMANN & REIMER GMBH)
Anticipated expiration
Ceased

Classifications

    • C12N1/20 Bacteria; Culture media therefor
    • C12N9/0083 Oxidoreductases (1.) acting on paired donors with incorporation of molecular oxygen (1.14); Miscellaneous (1.14.99)
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/0004 Oxidoreductases (1.)
    • C12N9/1029 Acyltransferases (2.3) transferring groups other than amino-acyl groups (2.3.1)
    • C12N9/88 Lyases (4.)
    • C12Y203/01016 Acetyl-CoA C-acyltransferase (2.3.1.16)

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Biomedical Technology (AREA)
  • Medicinal Chemistry (AREA)
  • Molecular Biology (AREA)
  • Tropical Medicine & Parasitology (AREA)
  • Virology (AREA)
  • Preparation Of Compounds By Using Micro-Organisms (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Enzymes And Modification Thereof (AREA)

Description

WO 00/26355 PCT/EP99/07952
Constructing production strains for the preparation of substituted phenols by specifically inactivating genes of eugenol and ferulic acid catabolism
The present invention relates to the construction of production strains and to a process for preparing substituted methoxyphenols, in particular vanillin.
DE-A 4 227 076 (process for preparing substituted methoxyphenols, and microorganism which is suitable for this purpose) describes the preparation of substituted methoxyphenols using a novel Pseudomonas sp. The starting material in this context is eugenol and the products are ferulic acid, vanillic acid, coniferyl alcohol and coniferyl aldehyde.
An extensive review of the biotransformations that are possible using ferulic acid, written by Rosazza et al. (Biocatalytic transformation of ferulic acid: an abundant aromatic natural product; J. Ind. Microbiol. 15:457-471), also appeared in 1995.
The genes and enzymes for synthesizing coniferyl alcohol, coniferyl aldehyde, ferulic acid, vanillin and vanillic acid from Pseudomonas sp. were described in EP-A 0 845 532.
The enzymes for converting trans-ferulic acid into trans-feruloyl-SCoA ester and subsequently into vanillin, and also the gene for cleaving the ester, were described by the Institute of Food Research, Norwich, GB, in WO 97/35999. In 1998, the content of the patent also appeared in the form of scientific publications (Gasson et al. 1998. Metabolism of ferulic acid to vanillin. J. Biol. Chem. 273:4163-4170; Narbad and Gasson 1998. Metabolism of ferulic acid via vanillin using a novel CoA-dependent pathway in a newly isolated strain of Pseudomonas fluorescens. Microbiology 144:1397-1405).
DE-A 195 32 317 describes the use of Amycolatopsis sp. for obtaining vanillin from ferulic acid fermentatively in high yields.
The known processes suffer from the disadvantage that they either achieve only very low yields of vanillin or make use of relatively expensive starting compounds. While the last-mentioned process (DE-A 195 32 317) does achieve high yields, the use of Pseudomonas sp. HR199 and Amycolatopsis sp. HR167 for biotransforming eugenol into vanillin requires a fermentation which is carried out in two steps, consequently leading to substantial expense and consumption of time.
The reference to any prior art in this specification is not, and should not be taken as, an acknowledgement or any form of suggestion that the prior art forms part of the common general knowledge in Australia.
Throughout this specification and the claims which follow, unless the context requires otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated integer or group of integers or steps but not the exclusion of any other integer or group of integers or steps.
The present invention constructs organisms which are able to convert the relatively inexpensive raw material eugenol into vanillin in a one-step process.
The invention is achieved by means of constructing production strains of unicellular or multicellular organisms, which strains are characterized in that enzymes of eugenol and/or ferulic acid catabolism are inactivated such that the intermediates coniferyl alcohol, coniferyl aldehyde, ferulic acid, vanillin and/or vanillic acid accumulate.
The production strain may be unicellular or multicellular. Accordingly, the invention can relate to microorganisms, plants or animals. Furthermore, use can also be made of extracts which are obtained from the production strain. According to the invention, preference is given to using unicellular organisms. These latter organisms can be microorganisms or animal or plant cells. According to the invention, particular preference is given to using fungi and bacteria. The highest preference is given to bacterial species. Those bacteria which may in particular be used, after their eugenol and/or ferulic acid catabolism has been altered, are species of Rhodococcus, Pseudomonas and Escherichia.
In the simplest case, known, conventional microbiological methods can be used for isolating the organisms which may be employed in accordance with the invention.
Thus, the enzymic activity of the proteins involved in eugenol and/or ferulic acid catabolism can be altered by using enzyme inhibitors. Furthermore, the enzymic activity of the proteins involved in eugenol and/or ferulic acid catabolism can be altered by mutating the genes which encode these proteins. Such mutations can be generated in a random manner by means of classical methods, for example by using UV irradiation or mutation-inducing chemicals.
Recombinant DNA methods, such as deletions, insertions and/or nucleotide exchanges, are likewise suitable for isolating the novel organisms. Thus, the genes of the organisms can, for example, be inactivated using other DNA elements (Ω elements). Suitable vectors can likewise be used for replacing the intact genes with gene structures which are altered and/or inactivated. In this context, the genes which are to be inactivated, and the DNA elements which are employed for the inactivation, can be obtained by means of classical cloning techniques or by means of polymerase chain reactions (PCR).
For example, in one possible embodiment of the invention, eugenol catabolism and ferulic acid catabolism can be altered by inserting Ω elements, or introducing deletions, into appropriate genes. In this context, the abovementioned recombinant DNA methods can be used to inactivate the functions of the genes which encode dehydrogenases, synthetases, hydratase-aldolases, thiolases or demethylases, such that production of the relevant enzymes is blocked. Preferably, the genes are those which encode coniferyl alcohol dehydrogenases, coniferyl aldehyde dehydrogenases, feruloyl-CoA synthetases, enoyl-CoA hydratase-aldolases, beta-ketothiolases, vanillin dehydrogenases or vanillic acid demethylases. Very particular preference is given to genes which encode the amino acid sequences specified in EP-A 0845532 and/or nucleotide sequences which encode their allelic variations.
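The two inactivation routes named above (insertion of an Ω element, or deletion of a segment) can be illustrated with a toy string model. This is only a sketch: the gene sequence, the choice of GAATTC as the cutting motif and the cassette placeholder are invented for illustration and are not taken from the specification.

```python
# Toy model: a gene is a plain string and a restriction site is a string motif.
# GENE, the GAATTC motif and OMEGA_KM are hypothetical placeholders.

def inactivate_by_insertion(gene: str, site: str, cassette: str) -> str:
    """Insert a resistance cassette in the middle of the first site."""
    cut = gene.find(site)
    if cut < 0:
        raise ValueError("site not found in gene")
    cut += len(site) // 2
    return gene[:cut] + cassette + gene[cut:]

def inactivate_by_deletion(gene: str, site: str) -> str:
    """Remove the segment between the first and the last occurrence of a site."""
    first = gene.find(site)
    last = gene.rfind(site)
    if first < 0 or first == last:
        raise ValueError("two occurrences of the site are needed")
    # One copy of the site is kept, as after religation of compatible ends.
    return gene[:first] + gene[last:]

GENE = "ATGAAAGAATTCCCCGGGTTTGAATTCAAATAA"  # invented, contains two GAATTC motifs
OMEGA_KM = "<OmegaKm>"                      # stand-in for the resistance cassette

insertion_mutant = inactivate_by_insertion(GENE, "GAATTC", OMEGA_KM)
deletion_mutant = inactivate_by_deletion(GENE, "GAATTC")
print(insertion_mutant)
print(deletion_mutant)
```

In both cases the reading frame of the toy gene is interrupted or shortened, which is the effect the recombinant methods aim for; the real constructs are defined by the restriction sites and cassettes actually used.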
The invention accordingly also relates to gene structures for preparing transformed organisms and mutants.
Preference is given to employing gene structures in which the nucleotide sequences encoding dehydrogenases, synthetases, hydratase-aldolases, thiolases or demethylases are inactivated for isolating the organisms and mutants. Particular preference is given to gene structures in which the nucleotide sequences encoding coniferyl alcohol dehydrogenases, coniferyl aldehyde dehydrogenases, feruloyl-CoA synthetases, enoyl-CoA hydratase-aldolases, beta-ketothiolases, vanillin dehydrogenases or vanillic acid demethylases are inactivated. Very particular preference is given to gene structures which exhibit the structures given in Figures 1a to 1r having the nucleotide sequences which are depicted in Figures 2a to 2r and/or nucleotide sequences encoding their allelic variations. In this context, particular preference is given to nucleotide sequences 1 to 18.
The invention also encompasses the part sequences of these gene structures as well as functional equivalents. Functional equivalents are to be understood as meaning those derivatives of the DNA in which individual nucleobases have been exchanged (wobble exchanges) without the function being altered. Amino acids may also be exchanged at the protein level without this resulting in an alteration in function.
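A wobble exchange in the sense used above can be shown with a small example: two DNA sequences that differ only in third codon positions and still translate to the same amino acids. The sketch below uses a deliberately partial excerpt of the standard genetic code; both sequences are invented for illustration.

```python
# Partial excerpt of the standard genetic code (only the codons used here).
CODON_TABLE = {
    "ATG": "Met",
    "GCT": "Ala", "GCC": "Ala",
    "CTG": "Leu", "CTC": "Leu",
    "GGT": "Gly", "GGC": "Gly",
}

def translate(dna: str) -> list[str]:
    """Translate complete codons of a DNA string into three-letter amino acids."""
    usable = len(dna) - len(dna) % 3
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3)]

original = "ATGGCTCTGGGT"  # hypothetical sequence
variant = "ATGGCCCTCGGC"   # same sequence with third-position exchanges only

print(translate(original))
print(translate(variant))
```

Both calls yield Met, Ala, Leu, Gly, so the variant is a functional equivalent at the protein level even though three nucleobases differ.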
One or more DNA sequences can be inserted upstream and/or downstream of the gene structures. By cloning the gene structures, it is possible to obtain plasmids or vectors which are suitable for the transformation and/or transfection of an organism and/or for conjugative transfer into an organism.
The invention furthermore relates to plasmids and/or vectors for preparing the organisms and mutants which are transformed in accordance with the invention.
These organisms and mutants consequently harbour the gene structures which have been described. The present invention accordingly also relates to organisms which harbour the said plasmids and/or vectors.
The nature of the plasmids and/or vectors depends on what they are being used for. In order, for example, to be able to replace the intact genes of eugenol and/or ferulic acid catabolism in pseudomonads with the genes which have been inactivated with omega elements, there is a need for vectors which, on the one hand, can be transferred into pseudomonads (conjugatively transferable plasmids) but which, on the other hand, cannot be replicated in these organisms and are consequently unstable in pseudomonads (so-called suicide plasmids). DNA segments which are transferred into pseudomonads with the aid of such a plasmid system can only be retained if they become integrated by homologous recombination into the genome of the bacterial cell.
The described gene structures, vectors and plasmids may be used for preparing different transformed organisms or mutants. The said gene structures can be used for replacing intact nucleic acid sequences with altered and/or inactivated gene structures. In the cells, which can be obtained by transformation or transfection or conjugation, the intact gene is replaced, by homologous recombination, with the altered and/or inactivated gene structure, as a consequence of which the resulting cells now only possess the altered and/or inactivated gene structure in their genome.
In this way, preferably genes can be altered and/or inactivated, in accordance with the invention, such that the relevant organisms are able to produce coniferyl alcohol, coniferyl aldehyde, ferulic acid, vanillin and/or vanillic acid.
Mutants of the strain Pseudomonas sp. HR199 (DSM 7063), which was described in detail in DE-A 4 227 076 and EP-A 0845532, are examples of production strains which have been constructed in this way in accordance with the invention, with the corresponding gene structures ensuing, inter alia, from Figures 1a to 1r, in combination with Figures 2a to 2r:
1. Pseudomonas sp. HR199calAΩKm, which contains the ΩKm-inactivated calA gene in place of the intact calA gene encoding coniferyl alcohol dehydrogenase (Fig. 1a; Fig. 2a).
2. Pseudomonas sp. HR199calAΩGm, which contains the ΩGm-inactivated calA gene in place of the intact calA gene encoding coniferyl alcohol dehydrogenase (Fig. 1b; Fig. 2b).
3. Pseudomonas sp. HR199calAΔ, which contains the deletion-inactivated calA gene in place of the intact calA gene encoding coniferyl alcohol dehydrogenase (Fig. 1c; Fig. 2c).
4. Pseudomonas sp. HR199calBΩKm, which contains the ΩKm-inactivated calB gene in place of the intact calB gene encoding coniferyl aldehyde dehydrogenase (Fig. 1d; Fig. 2d).
5. Pseudomonas sp. HR199calBΩGm, which contains the ΩGm-inactivated calB gene in place of the intact calB gene encoding coniferyl aldehyde dehydrogenase (Fig. 1e; Fig. 2e).
6. Pseudomonas sp. HR199calBΔ, which contains the deletion-inactivated calB gene in place of the intact calB gene encoding coniferyl aldehyde dehydrogenase (Fig. 1f; Fig. 2f).
7. Pseudomonas sp. HR199fcsΩKm, which contains the ΩKm-inactivated fcs gene in place of the intact fcs gene encoding feruloyl-CoA synthetase (Fig. 1g; Fig. 2g).
8. Pseudomonas sp. HR199fcsΩGm, which contains the ΩGm-inactivated fcs gene in place of the intact fcs gene encoding feruloyl-CoA synthetase (Fig. 1h; Fig. 2h).
9. Pseudomonas sp. HR199fcsΔ, which contains the deletion-inactivated fcs gene in place of the intact fcs gene encoding feruloyl-CoA synthetase (Fig. 1i; Fig. 2i).
10. Pseudomonas sp. HR199echΩKm, which contains the ΩKm-inactivated ech gene in place of the intact ech gene encoding enoyl-CoA hydratase-aldolase (Fig. 1j; Fig. 2j).
11. Pseudomonas sp. HR199echΩGm, which contains the ΩGm-inactivated ech gene in place of the intact ech gene encoding enoyl-CoA hydratase-aldolase (Fig. 1k; Fig. 2k).
12. Pseudomonas sp. HR199echΔ, which contains the deletion-inactivated ech gene in place of the intact ech gene encoding enoyl-CoA hydratase-aldolase (Fig. 1l; Fig. 2l).
13. Pseudomonas sp. HR199aatΩKm, which contains the ΩKm-inactivated aat gene in place of the intact aat gene encoding beta-ketothiolase (Fig. 1m; Fig. 2m).
14. Pseudomonas sp. HR199aatΩGm, which contains the ΩGm-inactivated aat gene in place of the intact aat gene encoding beta-ketothiolase (Fig. 1n; Fig. 2n).
15. Pseudomonas sp. HR199aatΔ, which contains the deletion-inactivated aat gene in place of the intact aat gene encoding beta-ketothiolase (Fig. 1o; Fig. 2o).
16. Pseudomonas sp. HR199vdhΩKm, which contains the ΩKm-inactivated vdh gene in place of the intact vdh gene encoding vanillin dehydrogenase (Fig. 1p; Fig. 2p).
17. Pseudomonas sp. HR199vdhΩGm, which contains the ΩGm-inactivated vdh gene in place of the intact vdh gene encoding vanillin dehydrogenase (Fig. 1q; Fig. 2q).
18. Pseudomonas sp. HR199vdhΔ, which contains the deletion-inactivated vdh gene in place of the intact vdh gene encoding vanillin dehydrogenase (Fig. 1r; Fig. 2r).
19. Pseudomonas sp. HR199vdhBΩKm, which contains the ΩKm-inactivated vdhB gene in place of the intact vdhB gene encoding vanillin dehydrogenase II.
20. Pseudomonas sp. HR199vdhBΩGm, which contains the ΩGm-inactivated vdhB gene in place of the intact vdhB gene encoding vanillin dehydrogenase II.
21. Pseudomonas sp. HR199vdhBΔ, which contains the deletion-inactivated vdhB gene in place of the intact vdhB gene encoding vanillin dehydrogenase II.
22. Pseudomonas sp. HR199adhΩKm, which contains the ΩKm-inactivated adh gene in place of the intact adh gene encoding alcohol dehydrogenase.
23. Pseudomonas sp. HR199adhΩGm, which contains the ΩGm-inactivated adh gene in place of the intact adh gene encoding alcohol dehydrogenase.
24. Pseudomonas sp. HR199adhΔ, which contains the deletion-inactivated adh gene in place of the intact adh gene encoding alcohol dehydrogenase.
25. Pseudomonas sp. HR199vanAΩKm, which contains the ΩKm-inactivated vanA gene in place of the intact vanA gene encoding the α-subunit of vanillic acid demethylase.
26. Pseudomonas sp. HR199vanAΩGm, which contains the ΩGm-inactivated vanA gene in place of the intact vanA gene encoding the α-subunit of vanillic acid demethylase.
27. Pseudomonas sp. HR199vanAΔ, which contains the deletion-inactivated vanA gene in place of the intact vanA gene encoding the α-subunit of vanillic acid demethylase.
28. Pseudomonas sp. HR199vanBΩKm, which contains the ΩKm-inactivated vanB gene in place of the intact vanB gene encoding the β-subunit of vanillic acid demethylase.
29. Pseudomonas sp. HR199vanBΩGm, which contains the ΩGm-inactivated vanB gene in place of the intact vanB gene encoding the β-subunit of vanillic acid demethylase.
30. Pseudomonas sp. HR199vanBΔ, which contains the deletion-inactivated vanB gene in place of the intact vanB gene encoding the β-subunit of vanillic acid demethylase.
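The thirty strains listed above follow a regular scheme: each of ten genes is combined with one of three inactivation types (ΩKm insertion, ΩGm insertion, deletion). The following sketch regenerates the names under that reading; writing the markers as "ΩKm", "ΩGm" and "Δ" is an assumption about how the designations are meant to be spelled.

```python
# Genes in the order in which they appear in the list of strains.
GENES = ["calA", "calB", "fcs", "ech", "aat", "vdh", "vdhB", "adh", "vanA", "vanB"]
# Assumed spelling of the three inactivation types.
MARKERS = ["ΩKm", "ΩGm", "Δ"]

def strain_names(parent: str = "Pseudomonas sp. HR199") -> list[str]:
    """Combine every gene with every inactivation type, gene by gene."""
    return [f"{parent}{gene}{marker}" for gene in GENES for marker in MARKERS]

names = strain_names()
for number, name in enumerate(names, start=1):
    print(f"{number}. {name}")
```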
The invention additionally relates to a process for the biotechnological preparation of organic compounds. In particular, this process can be used to prepare alcohols, aldehydes and organic acids. The latter are preferably coniferyl alcohol, coniferyl aldehyde, ferulic acid, vanillin and vanillic acid.
The above-described organisms are employed in the novel process. The organisms which are very particularly preferred include bacteria, in particular the Pseudomonas species. Specifically, the abovementioned Pseudomonas species can preferably be employed for the following processes:
1. Pseudomonas sp. HR199calAΩKm, Pseudomonas sp. HR199calAΩGm and Pseudomonas sp. HR199calAΔ for preparing coniferyl alcohol from eugenol.
2. Pseudomonas sp. HR199calBΩKm, Pseudomonas sp. HR199calBΩGm and Pseudomonas sp. HR199calBΔ for preparing coniferyl aldehyde from eugenol or coniferyl alcohol.
3. Pseudomonas sp. HR199fcsΩKm, Pseudomonas sp. HR199fcsΩGm, Pseudomonas sp. HR199fcsΔ, Pseudomonas sp. HR199echΩKm, Pseudomonas sp. HR199echΩGm and Pseudomonas sp. HR199echΔ for preparing ferulic acid from eugenol or coniferyl alcohol or coniferyl aldehyde.
4. Pseudomonas sp. HR199vdhΩKm, Pseudomonas sp. HR199vdhΩGm, Pseudomonas sp. HR199vdhΔ, Pseudomonas sp. HR199vdhΩGmvdhBΩKm, Pseudomonas sp. HR199vdhΩKmvdhBΩGm, Pseudomonas sp. HR199vdhΔvdhBΩGm and Pseudomonas sp. HR199vdhΔvdhBΩKm for preparing vanillin from eugenol or coniferyl alcohol or coniferyl aldehyde or ferulic acid.
5. Pseudomonas sp. HR199vanAΩKm, Pseudomonas sp. HR199vanAΩGm, Pseudomonas sp. HR199vanAΔ, Pseudomonas sp. HR199vanBΩKm, Pseudomonas sp. HR199vanBΩGm and Pseudomonas sp. HR199vanBΔ for preparing vanillic acid from eugenol or coniferyl alcohol or coniferyl aldehyde or ferulic acid or vanillin.
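The assignment of mutants to products given above amounts to a lookup from the inactivated gene to the intermediate that accumulates. A minimal sketch of that lookup follows; the dictionary only restates the five processes listed, and the helper name genes_to_block is hypothetical. The vdhB gene is left out because it is named above only in combination with vdh.

```python
# Inactivated gene -> intermediate that the mutant is used to prepare,
# as stated in the list of processes.
ACCUMULATED_PRODUCT = {
    "calA": "coniferyl alcohol",
    "calB": "coniferyl aldehyde",
    "fcs": "ferulic acid",
    "ech": "ferulic acid",
    "vdh": "vanillin",
    "vanA": "vanillic acid",
    "vanB": "vanillic acid",
}

def genes_to_block(product: str) -> list[str]:
    """Return the genes whose inactivation is listed for a given product."""
    return sorted(g for g, p in ACCUMULATED_PRODUCT.items() if p == product)

print(genes_to_block("ferulic acid"))
print(genes_to_block("vanillin"))
```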
Eugenol is the preferred substrate. However, it is also possible to add further substrates or even to replace the eugenol with another substrate.
Suitable nutrient media for the organisms which are employed in accordance with the invention are synthetic, semisynthetic or complex culture media. These media may comprise carbon-containing and nitrogen-containing compounds, inorganic salts, where appropriate trace elements, and vitamins.
Carbon-containing compounds which may be suitable are carbohydrates, hydrocarbons or organic standard chemicals. Examples of compounds which may preferably be used are sugars, alcohols or sugar alcohols, organic acids or complex mixtures.
The sugar is preferably glucose. The organic acids which may preferably be employed are citric or acetic acid. Examples of the complex mixtures are malt extract, yeast extract, casein or casein hydrolysate.
Inorganic compounds are suitable nitrogen-containing substrates. Examples of these are nitrates and ammonium salts. Organic nitrogen sources can also be used. These sources include yeast extract, soya bean meal, casein, casein hydrolysate and corn steep liquor.
Examples of the inorganic salts which may be employed are sulphates, nitrates, chlorides, carbonates and phosphates. The metals which the said salts contain are preferably sodium, potassium, magnesium, manganese, calcium, zinc and iron.
The temperature for the culture is preferably in the range from 5 to 100°C. The range from 15 to 60°C is particularly preferred, with 22 to 37°C being most preferred.
The pH of the medium is preferably 2 to 12. The range from 4 to 8 is particularly preferred.
In principle, any bioreactor known to the skilled person can be employed for carrying out the novel process. Preferential consideration is given to any appliance which is suitable for submerged processes. This means that vessels which do or do not possess a mechanical mixing device may be employed in accordance with the invention.
Examples of the latter are shaking apparatuses, and bubble column reactors or loop reactors. The former preferably include all the known appliances which are fitted with stirrers of any design.
The novel process can be carried out continuously or batchwise. The fermentation time required for achieving a maximum quantity of product depends on the specific nature of the organism employed. However, in principle, the fermentation times are between 2 and 200 hours.
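The stated ranges for temperature, pH and fermentation time can be collected in a small lookup that reports the narrowest range a given value falls into. This is merely an illustrative sketch; the parameter keys and the function name are hypothetical, and only the numerical limits are taken from the text.

```python
# (label, lower limit, upper limit), ordered from narrowest to widest range.
RANGES = {
    "temperature_c": [
        ("most preferred", 22, 37),
        ("particularly preferred", 15, 60),
        ("preferred", 5, 100),
    ],
    "ph": [
        ("particularly preferred", 4, 8),
        ("preferred", 2, 12),
    ],
    "fermentation_h": [
        ("stated range", 2, 200),
    ],
}

def classify(parameter: str, value: float) -> str:
    """Return the label of the narrowest stated range that contains value."""
    for label, low, high in RANGES[parameter]:
        if low <= value <= high:
            return label
    return "outside stated range"

print(classify("temperature_c", 30))
print(classify("ph", 9))
```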
The invention is explained in more detail below while referring to examples:
Mutants of the eugenol-utilizing strain Pseudomonas sp. HR199 (DSM 7063) were generated in a targeted manner by specifically inactivating genes of eugenol catabolism by means of inserting omega elements or introducing deletions. The omega elements employed were DNA segments which encoded resistances to the antibiotics kanamycin (ΩKm) and gentamycin (ΩGm). These resistance genes were isolated from Tn5 and the plasmid pBBR1MCS-5 using standard methods. The genes calA, calB, fcs, ech, aat, vdh, adh, vdhB, vanA and vanB, which encode coniferyl alcohol dehydrogenase, coniferyl aldehyde dehydrogenase, feruloyl-CoA synthetase, enoyl-CoA hydratase-aldolase, beta-ketothiolase, vanillin dehydrogenase, alcohol dehydrogenase, vanillin dehydrogenase II and vanillic acid demethylase, were isolated from genomic DNA of the strain Pseudomonas sp. HR199 using standard methods and cloned into pBluescript SK-. By means of digesting with suitable restriction endonucleases, DNA segments were removed from these genes (deletion) or substituted with Ω elements (insertion), resulting in the respective gene being inactivated. The genes which had been mutated in this manner were recloned into conjugatively transferable vectors and subsequently introduced into the strain Pseudomonas sp. HR199. Suitable selection was used to obtain transconjugants which had replaced the respective functional wild-type gene with the newly introduced inactivated gene. The insertion and deletion mutants which were obtained in this way now only possessed the respective inactivated gene. This procedure was used to obtain both mutants possessing only one defective gene and multiple mutants, in which several genes had been inactivated in this manner.
These mutants were employed for biotransforming a) eugenol into coniferyl alcohol, coniferyl aldehyde, ferulic acid, vanillin and/or vanillic acid; b) coniferyl alcohol into coniferyl aldehyde, ferulic acid, vanillin and/or vanillic acid; c) coniferyl aldehyde into ferulic acid, vanillin and/or vanillic acid; d) ferulic acid into vanillin and/or vanillic acid, and e) vanillin into vanillic acid.
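The biotransformations a) to e) follow from reading the catabolic route as a linear sequence of intermediates: every compound can be converted into any compound further downstream. The sketch below encodes that reading; treating the route as strictly linear is a simplification made for illustration.

```python
# Intermediates in the order in which they appear in the biotransformations.
PATHWAY = [
    "eugenol",
    "coniferyl alcohol",
    "coniferyl aldehyde",
    "ferulic acid",
    "vanillin",
    "vanillic acid",
]

def possible_products(substrate: str) -> list[str]:
    """Return all intermediates downstream of the given substrate."""
    position = PATHWAY.index(substrate)
    return PATHWAY[position + 1:]

for substrate in PATHWAY[:-1]:
    print(substrate, "->", ", ".join(possible_products(substrate)))
```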
Materials and Methods
Conditions for growing the bacteria.
Strains of Escherichia coli were propagated at 37°C in Luria-Bertani (LB) or M9 mineral medium (J. Sambrook, E. F. Fritsch and T. Maniatis. 1989. Molecular cloning: a laboratory manual. 2nd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York). Strains of Pseudomonas sp. were propagated at [illegible]°C in Nutrient Broth (NB, wt/vol) or in mineral medium (MM) (H. G. Schlegel, et al. 1961. Arch. Mikrobiol. 38:209-222) or HR mineral medium (HR-MM) (J. Rabenhorst, 1996. Appl. Microbiol. Biotechnol. 46:470-474). Ferulic acid, vanillin, vanillic acid and protocatechuic acid were dissolved in dimethyl sulphoxide and added to the respective medium to give a final concentration of 0.1% (wt/vol).
Eugenol was either added directly to the medium to give a final concentration of 0.1% (vol/vol) or applied to filter paper (circular filter 595, Schleicher & Schuell, Dassel, Germany) in the lids of MM agar plates. When transconjugants and mutants of Pseudomonas sp. were being propagated, tetracycline, kanamycin and gentamycin were employed in final concentrations of 25 µg/ml, 100 µg/ml and 7.5 µg/ml, respectively.
Qualitative and quantitative detection of metabolic intermediates in culture supernatants.
Culture supernatants were analysed by high pressure liquid chromatography (Knauer HPLC) either directly or after dilution with doubly distilled H2O. The chromatography was carried out on Nucleosil 100 C18 (7 µm, 250 × 4 mm). 0.1% (vol/vol) formic acid and acetonitrile were used as the solvent. The course of the gradient employed for eluting the substances was as follows:
00:00 - 06:30    26% acetonitrile
06:30 - 08:00    100% acetonitrile
08:00 - 12:00    100% acetonitrile
12:00 - 13:00    26% acetonitrile
13:00 - 18:00    26% acetonitrile
Purification of vanillin dehydrogenase II.
The purification was carried out at 4°C.
Crude extract.
Pseudomonas sp. HR199 cells which had been propagated on eugenol were washed in 10 mM sodium phosphate buffer, pH 6.0, then resuspended in the same buffer and disrupted by being passed twice through a French press (Amicon, Silver Spring, Maryland, USA) at a pressure of 1000 psi. The cell homogenate was subjected to an ultracentrifugation (1 h, 100,000 × g, 4°C), resulting in the soluble fraction of crude extract being obtained as the supernatant.
Anion exchange chromatography on DEAE Sephacel.
The soluble fraction of the crude extract was dialysed overnight against 10 mM sodium phosphate buffer, pH 6.0. The dialysate was loaded onto a DEAE-Sephacel column (2.6 cm × 35 cm, bed volume [BV]: 186 ml) which had been equilibrated with 10 mM sodium phosphate buffer, pH 6.0, and which had a flow rate of 0.8 ml/min. The column was rinsed with two BV of 10 mM sodium phosphate buffer, pH 6.0. The vanillin dehydrogenase II (VDH II) was eluted with a linear salt gradient of from 0 to 400 mM NaCl in 10 mM sodium phosphate buffer, pH 6.0 (750 ml). 10 ml fractions were collected. Fractions having a high VDH II activity were combined to form the DEAE pool.
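The column figures quoted above can be checked with a few lines of arithmetic. The sketch below computes the bed volume of a cylindrical column from its diameter and height and estimates the nominal NaCl concentration reached at a given 10 ml fraction of the linear gradient. It assumes an ideal linear gradient and ignores the dead volume of the column and tubing, so real elution positions will be shifted; the function names are hypothetical.

```python
import math

def bed_volume_ml(diameter_cm: float, height_cm: float) -> float:
    """Volume of a cylindrical column bed in ml (1 cm^3 = 1 ml)."""
    return math.pi * (diameter_cm / 2) ** 2 * height_cm

def nacl_mm_at_fraction(n: int, fraction_ml: float = 10.0,
                        gradient_ml: float = 750.0,
                        start_mm: float = 0.0, end_mm: float = 400.0) -> float:
    """Nominal NaCl concentration (mM) at the end of fraction n."""
    eluted = min(n * fraction_ml, gradient_ml)
    return start_mm + (end_mm - start_mm) * eluted / gradient_ml

bv = bed_volume_ml(2.6, 35)
rinse_minutes = 2 * bv / 0.8  # two bed volumes at 0.8 ml/min

print(round(bv))                 # bed volume in ml
print(round(rinse_minutes))      # duration of the rinse in minutes
print(nacl_mm_at_fraction(30))   # nominal mM NaCl after 300 ml of gradient
```

The computed bed volume rounds to the 186 ml stated for the 2.6 cm × 35 cm column, and the 750 ml gradient corresponds to 75 fractions of 10 ml.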
Determining the vanillin dehydrogenase activity.
The VDH activity was determined at 30°C using an optical enzymic test. The reaction mixture, whose volume was 1 ml, contained 0.1 mmol of potassium phosphate (pH [illegible]), 0.125 µmol of vanillin, 0.5 µmol of NAD, 1.2 µmol of pyruvate (Na salt), lactate dehydrogenase (1 U; from pig heart) and enzyme solution. The oxidation of vanillin was monitored at λ = 340 nm (ε vanillin = 11.6 cm²/µmol). The enzyme activity was given in units, with 1 U corresponding to the quantity of enzyme which converts 1 µmol of vanillin per minute. The protein concentrations in the samples were determined using the method of Lowry et al. (O. H. Lowry, N. J. Rosebrough, A. L. Farr and R. J. Randall. 1951. J. Biol. Chem. 193:265-275).
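The conversion from a measured absorbance change to enzyme units follows from the Lambert-Beer law: with the extinction coefficient given in cm²/µmol and the assay volume in ml, the converted amount per minute is ΔA/min × V / (ε × d). The sketch below applies this to the vanillin dehydrogenase test; the absorbance slope, the protein amount and the 1 cm light path are hypothetical example values, not data from the specification.

```python
def activity_units(delta_a_per_min: float, epsilon_cm2_per_umol: float,
                   volume_ml: float = 1.0, path_cm: float = 1.0) -> float:
    """Enzyme activity in U (µmol of substrate converted per minute)."""
    return delta_a_per_min * volume_ml / (epsilon_cm2_per_umol * path_cm)

def specific_activity(units: float, protein_mg: float) -> float:
    """Specific activity in U per mg of protein."""
    return units / protein_mg

# Hypothetical reading: absorbance changes by 0.116 per minute at 340 nm.
units = activity_units(0.116, epsilon_cm2_per_umol=11.6)
print(round(units, 4))                            # U in the 1 ml assay
print(round(specific_activity(units, 0.05), 3))   # U/mg for 0.05 mg protein
```

The same function can be reused for the other optical tests by substituting the extinction coefficient stated for the respective assay.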
Determining the coniferyl alcohol dehydrogenase activity.
The CADH activity was determined at 30°C using an optical enzymic test in accordance with Jaeger et al. (E. Jaeger, L. Eggeling and H. Sahm. 1981. Current Microbiology. 6:333-336). The reaction mixture, whose volume was 1 ml, contained 0.2 mmol of tris/HCl (pH [illegible]), 0.4 µmol of coniferyl alcohol, 2 µmol of NAD, 0.1 mmol of semicarbazide and enzyme solution. The reduction of NAD was monitored at λ = 340 nm (ε = 6.3 cm²/µmol). The enzyme activity was given in units, with 1 U corresponding to the quantity of enzyme which converts 1 µmol of substrate per minute. The protein concentrations in the samples were determined by the method of Lowry et al. (O. H. Lowry, N. J. Rosebrough, A. L. Farr and R. J. Randall. 1951. J. Biol. Chem. 193:265-275).
Determining the coniferyl aldehyde dehydrogenase activity.
The CALDH activity was determined at 30 °C using an optical enzymic test. The reaction mixture, whose volume was 1 ml, contained 0.1 mmol of tris/HCl (pH 8.8), 0.08 µmol of coniferyl aldehyde, 2.7 µmol of NAD and enzyme solution. The oxidation of coniferyl aldehyde to ferulic acid was monitored at λ = 400 nm (ε = 34 cm²/µmol). The enzymic activity was given in units (U), with 1 U corresponding to the quantity of enzyme which converts 1 µmol of substrate per minute. The protein concentrations in the samples were determined by the method of Lowry et al. (O. H. Lowry, N. J. Rosebrough, A. L. Farr and R. J. Randall. 1951. J. Biol. Chem. 193:265-275).
Determining the feruloyl-CoA synthetase (ferulic acid thiokinase) activity.
The FCS activity was determined at 30 °C using an optical enzymic test which was a modification of that of Zenk et al. (Zenk et al. 1980. Anal. Biochem. 101:182-187). The reaction mixture, whose volume was 1 ml, contained 0.09 mmol of potassium phosphate (pH [...]), 2.1 µmol of MgCl2, 0.7 µmol of ferulic acid, 2 µmol of ATP, 0.4 µmol of coenzyme A and enzyme solution. The formation of the CoA ester from ferulic acid was monitored at λ = 345 nm (ε = 10 cm²/µmol). The enzymic activity was given in units (U), with 1 U corresponding to the quantity of enzyme which converts 1 µmol of substrate per minute. The protein concentrations in the samples were determined using the method of Lowry et al. (O. H. Lowry, N. J. Rosebrough, A. L. Farr and R. J. Randall. 1951. J. Biol. Chem. 193:265-275).
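The four optical tests above all convert a measured absorbance change into enzyme units in the same way. The sketch below illustrates that conversion under stated assumptions: a cuvette path length of 1 cm (not given in the text), a linear absorbance change, and the extinction coefficients quoted above in cm²/µmol (the unit of the FCS coefficient is taken to match the other three). The function and dictionary names are hypothetical.

```python
# Extinction coefficients quoted in the assay descriptions, in cm^2/umol.
# Since 1 cm^3 = 1 ml, cm^2/umol is the same as ml/(umol*cm).
EXTINCTION_CM2_PER_UMOL = {
    "VDH": 11.6,    # vanillin, 340 nm
    "CADH": 6.3,    # NAD reduction, 340 nm
    "CALDH": 34.0,  # coniferyl aldehyde, 400 nm
    "FCS": 10.0,    # feruloyl-CoA, 345 nm (unit assumed)
}

def activity_units(delta_a_per_min: float, assay: str,
                   volume_ml: float = 1.0, path_cm: float = 1.0) -> float:
    """Enzyme activity in U (umol substrate converted per minute).

    U = (dA/min * V) / (epsilon * d); path_cm = 1.0 is an assumption.
    """
    eps = EXTINCTION_CM2_PER_UMOL[assay]
    return abs(delta_a_per_min) * volume_ml / (eps * path_cm)

def specific_activity(delta_a_per_min: float, assay: str,
                      protein_mg: float, **kwargs) -> float:
    """Specific activity in U per mg of protein in the cuvette."""
    return activity_units(delta_a_per_min, assay, **kwargs) / protein_mg

if __name__ == "__main__":
    # Hypothetical reading: absorbance falls by 0.058 per minute with 0.5 mg protein.
    print(f"{specific_activity(-0.058, 'VDH', protein_mg=0.5):.4f} U/mg")
```

The absolute value is taken because some assays follow a decrease in absorbance (substrate consumption) and others an increase (product formation).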
Electrophoretic methods.
Protein-containing extracts were fractionated under native conditions in 7.4% (wt/vol) polyacrylamide gels using the method of Stegemann et al. (Stegemann et al. 1973. Z. Naturforsch. 28c:722-732) and under denaturing conditions in 11.5% (wt/vol) polyacrylamide gels using the method of Laemmli (Laemmli, U. K. 1970. Nature (London) 227:680-685). Serva Blue R was used for non-specific protein staining. For specifically staining the coniferyl alcohol dehydrogenase, coniferyl aldehyde dehydrogenase and vanillin dehydrogenase, the gels were rebuffered for [...] min in 100 mM KP buffer (pH 7.0) and subsequently incubated at 30 °C in the same buffer, to which 0.08% (wt/vol) NAD, 0.04% (wt/vol) p-nitro blue tetrazolium chloride, 0.003% (wt/vol) phenazine methosulphate and 1 mM of the respective substrate had been added, until corresponding colour bands became visible.
Transfer of proteins from polyacrylamide gels to PVDF membranes.
Proteins were transferred from SDS-polyacrylamide gels to PVDF membranes (Waters-Millipore, Bedford, Mass., USA) using a Semidry Fastblot appliance (B32/33, Biometra, Göttingen, Germany) in accordance with the manufacturer's instructions.
Determining N-terminal amino acid sequences.
N-terminal amino acid sequences were determined using a Protein Peptide Sequencer (Type 477 A, Applied Biosystems, Foster City, USA) and a PTH analyser in accordance with the manufacturer's instructions.
Isolating and manipulating DNA.
Genomic DNA was isolated using the method of Marmur (J. Marmur. 1961. J. Mol. Biol. 3:208-218). Other plasmid DNA and/or DNA restriction fragments were isolated and analysed using standard methods (J. Sambrook, E. F. Fritsch and T. Maniatis. 1989. Molecular cloning: a laboratory manual. 2nd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York).
Transferring DNA.
Competent Escherichia coli cells were prepared and transformed using the method of Hanahan (D. Hanahan. 1983. J. Mol. Biol. 166:557-580). Conjugative plasmid transfer between plasmid-harbouring Escherichia coli S17-1 strains (donor) and Pseudomonas sp. strains (recipient) was performed on NB agar plates in accordance with the method of Friedrich et al. (Friedrich et al. 1981. J. Bacteriol. 147:198-205), or by means of a "minicomplementation method" on MM agar plates containing 0.5% (wt/vol) gluconate as the C source and 25 µg of tetracycline/ml or 100 µg of kanamycin/ml. In this case, cells of the recipient were applied in one direction as an inoculation streak. After 5 min, cells of the donor strains were then applied as inoculation streaks, with these streaks crossing the recipient inoculation streak. After incubating at 30 °C for 48 h, the transconjugants grew directly downstream of the crossing site, whereas neither the donor strain nor the recipient strain was able to grow.
Hybridization experiments.
DNA restriction fragments were fractionated electrophoretically in a 0.8% (wt/vol) agarose gel in 50 mM tris - 50 mM boric acid - 1.25 mM EDTA buffer (pH 8.5) (J. Sambrook, E. F. Fritsch and T. Maniatis. 1989. Molecular cloning: a laboratory manual. 2nd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York). The transfer of the denatured DNA out of the gel onto a positively charged nylon membrane (pore size: 0.45 µm; Pall Filtrationstechnik, Dreieich, Germany), the subsequent hybridization with biotinylated or digoxigenin-labelled DNA probes, and the preparation of these DNA probes, were all performed using standard methods (J. Sambrook, E. F. Fritsch and T. Maniatis. 1989. Molecular cloning: a laboratory manual. 2nd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York).
DNA sequencing.
Nucleotide sequences were determined "non-radioactively" by the dideoxy chain termination method of Sanger et al. (Sanger et al. 1977. Proc. Natl. Acad. Sci. USA 74:5463-5467) using a "LI-COR DNA Sequencer Model 4000L" (LI-COR Inc., Biotechnology Division, Lincoln, NE, USA) and a "thermo sequenase fluorescent labelled primer cycle sequencing kit with 7-deaza-dGTP" (Amersham Life Science, Amersham International plc., Little Chalfont, Buckinghamshire, England), in each case in accordance with the manufacturer's instructions.
Synthetic oligonucleotides were used to carry out sequencing in accordance with the "primer-hopping strategy" of Strauss et al. (E. C. Strauss et al. 1986. Anal. Biochem. 154:353-360).
Chemicals, biochemicals and enzymes.
Restriction enzymes, T4 DNA ligase, lambda DNA and enzymes and substrates for the optical enzymic tests were obtained from C.F. Boehringer & Söhne (Mannheim, Germany) or from GIBCO/BRL (Eggenstein, Germany). [γ-32P]ATP was from Amersham/Buchler (Braunschweig, Germany). Oligonucleotides were obtained from MWG-Biotech GmbH (Ebersberg, Germany). Type NA agarose was obtained from Pharmacia-LKB (Uppsala, Sweden). All other chemicals were from Haarmann & Reimer (Holzminden, Germany), E. Merck AG (Darmstadt, Germany), Fluka Chemie (Buchs, Switzerland), Serva Feinbiochemica (Heidelberg, Germany) or Sigma Chemie (Deisenhofen, Germany).
Examples
Example 1
Constructing omega elements which mediate resistances to kanamycin (ΩKm) or gentamycin (ΩGm).
For constructing the ΩKm element, the 2099 bp BglI fragment of transposon Tn5 (E. A. Auerswald, G. Ludwig and H. Schaller. 1981. Cold Spring Harb. Symp. Quant. Biol. 45:107-113; E. Beck, G. Ludwig, E. A. Auerswald, B. Reiss and H. Schaller. 1982. Gene 19:327-336; P. Mazodier, P. Cossart, E. Giraud and F. Gasser. 1985. Nucleic Acids Res. 13:195-205) was isolated on a preparative scale. The fragment was shortened to approx. 990 bp by treating it with Bal 31 nuclease. This fragment, which now only comprised the kanamycin resistance gene (encoding an aminoglycoside-3'-O-phosphotransferase), was then ligated to SmaI-cut pSKsym DNA (a pBluescript SK- derivative which contains a symmetrically constructed multiple cloning site [SalI, HindIII, EcoRI, SmaI, EcoRI, HindIII, SalI]). It was possible to reisolate the ΩKm element from the resulting plasmid as a SmaI fragment, an EcoRI fragment, a HindIII fragment or a SalI fragment.
For constructing the ΩGm element, the 983 bp EaeI fragment of the plasmid [...] (M. E. Kovach, P. H. Elzer, D. S. Hill, G. T. Robertson, M. A. Farris, R. M. Roop and K. M. Peterson. 1995. Gene 166:175-176) was isolated on a preparative scale and then treated with mung bean nuclease (progressive digestion of single-stranded DNA molecule ends). This fragment, which now only comprised the gentamycin resistance gene (encoding a gentamycin-3-acetyltransferase), was then ligated to SmaI-cleaved pSKsym DNA (see above). It was possible to reisolate the ΩGm element from the resulting plasmid as a SmaI fragment, an EcoRI fragment, a HindIII fragment or a SalI fragment.
Example 2
Cloning the genes from Pseudomonas sp. HR199 (DSM7063) which were to be inactivated by inserting Ω elements or by means of deletions.
The fcs, ech, vdh and aat genes were cloned separately proceeding from the E. coli S17-1 strains DSM 10439 and DSM 10440 and using the plasmids pE207 and pE5-1 (see EP-A 0845532). The given fragments were isolated on a preparative scale from these plasmids and treated as described below.
For cloning the fcs gene, the 2350 bp SalI/EcoRI fragment from plasmid pE207 and the 3700 bp EcoRI/SalI fragment from plasmid pE5-1 were cloned together in pBluescript SK- such that the two fragments were joined together by way of the EcoRI ends. The 6050 bp SalI fragment was isolated on a preparative scale from the resulting hybrid plasmid and shortened to approx. 2480 bp by treatment with Bal 31 nuclease. PstI linkers were subsequently ligated to the ends of the fragment and, after digestion with PstI, the fragment was cloned into pBluescript SK- (pSKfcs). After transformation of E. coli XL1-Blue, clones were obtained which expressed the fcs gene and exhibited an FCS activity of 0.2 U/mg of protein.
For cloning the ech gene, the 3800 bp HindIII/EcoRI fragment from plasmid pE207 was isolated on a preparative scale and shortened to approx. 1470 bp by treating it with Bal 31 nuclease. EcoRI linkers were then ligated to the ends of the fragment and, after digestion with EcoRI, the fragment was cloned into pBluescript SK- (pSKech).
For cloning the vdh gene, the 2350 bp SalI/EcoRI fragment from plasmid pE207 was isolated on a preparative scale. After cloning into pBluescript SK-, the fragment was truncated at one end by approx. 1530 bp using an exonuclease III/mung bean nuclease system. An EcoRI linker was then ligated to the end of the fragment and, after digestion with EcoRI, the fragment was cloned into pBluescript SK- (pSKvdh). Following transformation of E. coli XL1-Blue, clones were obtained which expressed the vdh gene and exhibited a VDH activity of 0.01 U/mg of protein.
For cloning the aat gene, the 3700 bp EcoRI/SalI fragment from plasmid pE5-1 was isolated on a preparative scale and shortened to approx. 1590 bp by treating it with Bal 31 nuclease. EcoRI linkers were then ligated to the ends of the fragment and, after digestion with EcoRI, the fragment was cloned into pBluescript SK- (pSKaat).
Example 3
Inactivating the above-described genes by inserting Ω elements or by deleting constituent regions of these genes.
Plasmid pSKfcs, which contained the fcs gene, was digested with BssHII, resulting in a 1290 bp fragment being excised from the fcs gene. Following religation, the deletion derivative of the fcs gene (fcsΔ, see Figs. 1i and 2i) was obtained in cloned form in pBluescript SK- (pSKfcsΔ). In addition, after the fragment had been excised, the omega elements ΩKm and ΩGm were ligated in its place. This resulted in the Ω-inactivated derivatives of the fcs gene (fcsΩKm, see Figs. 1g and 2g, and fcsΩGm, see Figs. 1h and 2h) being obtained in cloned form in pBluescript SK- (pSKfcsΩKm and pSKfcsΩGm). No FCS activity could be detected in crude extracts of the resulting E. coli clones, whose hybrid plasmids possessed an fcs gene which was inactivated by deletion or by Ω element insertion.
Plasmid pSKech, which contained the ech gene, was digested with NruI, resulting in a 53 bp fragment and a 430 bp fragment being excised from the ech gene. After religation, the deletion derivative of the ech gene (echΔ, see Figs. 1l and 2l) was obtained in cloned form in pBluescript SK- (pSKechΔ). In addition, after the fragments had been excised, the omega elements ΩKm and ΩGm were ligated in their place. This resulted in the Ω-inactivated derivatives of the ech gene (echΩKm and echΩGm) being obtained in cloned form in pBluescript SK- (pSKechΩKm and pSKechΩGm).
Plasmid pSKvdh, which contained the vdh gene, was digested with BssHII, resulting in a 210 bp fragment being excised from the vdh gene. After religation, the deletion derivative of the vdh gene (vdhΔ, see Figs. 1o and 2o) was obtained in cloned form in pBluescript SK- (pSKvdhΔ). In addition, after the fragment had been excised, the omega elements ΩKm and ΩGm were ligated in its place. This resulted in the Ω-inactivated derivatives of the vdh gene (vdhΩKm and vdhΩGm) being obtained in cloned form in pBluescript SK- (pSKvdhΩKm, see Figs. 1m and 2m, and pSKvdhΩGm, see Figs. 1n and 2n). No VDH activity could be detected in crude extracts of the resulting E. coli clones, whose hybrid plasmids possessed a vdh gene which was inactivated by deletion or by Ω element insertion.
Plasmid pSKaat, which contained the aat gene, was digested with BssHII, resulting in a 59 bp fragment being excised from the aat gene. After religation, the deletion derivative of the aat gene (aatΔ, see Figs. 1r and 2r) was obtained in cloned form in pBluescript SK- (pSKaatΔ). In addition, after the fragment had been excised, the omega elements ΩKm and ΩGm were ligated in its place. This resulted in the Ω-inactivated derivatives of the aat gene (aatΩKm, see Figs. 1p and 2p, and aatΩGm, see Figs. 1q and 2q) being obtained in cloned form in pBluescript SK- (pSKaatΩKm and pSKaatΩGm).
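The approximate insert sizes that result from the deletions and Ω insertions in Examples 2 and 3 follow from simple subtraction and addition. The sketch below is illustrative only. It uses the approximate fragment sizes stated above, assumes the ΩKm element is about 990 bp (the size given in Example 1), ignores the few base pairs contributed by linkers, and omits vdh because the size of the cloned vdh insert is not stated unambiguously. All names are hypothetical.

```python
# gene -> (approx. cloned insert size in bp, sizes of the excised fragments in bp)
CLONED_INSERTS = {
    "fcs": (2480, [1290]),
    "ech": (1470, [53, 430]),
    "aat": (1590, [59]),
}

OMEGA_KM_BP = 990  # approximate size of the OmegaKm element (Example 1)

def deletion_allele_bp(gene: str) -> int:
    """Approximate insert size after excision and religation."""
    insert_bp, excised = CLONED_INSERTS[gene]
    return insert_bp - sum(excised)

def omega_allele_bp(gene: str, cassette_bp: int = OMEGA_KM_BP) -> int:
    """Approximate insert size when a cassette replaces the excised fragment(s)."""
    return deletion_allele_bp(gene) + cassette_bp

if __name__ == "__main__":
    for gene in CLONED_INSERTS:
        print(gene, deletion_allele_bp(gene), omega_allele_bp(gene))
```

Such size estimates are useful mainly as a plausibility check when restriction digests of the hybrid plasmids are inspected on a gel.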
Example 4
Subcloning the Ω element-inactivated genes into the conjugatively transferable "suicide plasmid" pSUP202.
In order to be able to replace the intact genes in Pseudomonas sp. HR199 with the Ω element-inactivated genes, there is a need for a vector which can, on the one hand, be transferred into pseudomonads (conjugatively transferable plasmids) but which, on the other hand, cannot replicate in these bacteria and is consequently unstable in pseudomonads ("suicide plasmid"). DNA segments which are transferred into pseudomonads using such a plasmid system can only be retained if they are integrated by means of homologous recombination (RecA-dependent recombination) into the genome of the bacterial cell. In the present case, the "suicide plasmid" pSUP202 (Simon et al. 1983. In: A. Pühler. Molecular genetics of the bacteria-plant interaction. Springer Verlag, Berlin, Heidelberg, New York, pp. 98-106) was used.
Following digestion with PstI, the inactivated genes fcsΩKm and fcsΩGm were isolated from plasmids pSKfcsΩKm and pSKfcsΩGm and ligated to PstI-cleaved pSUP202 DNA. The ligation mixtures were transformed into E. coli S17-1. Selection took place on tetracycline-containing LB medium which also contained kanamycin or gentamycin, respectively. Kanamycin-resistant transformants whose hybrid plasmid (pSUPfcsΩKm) contained the inactivated gene fcsΩKm were obtained. The corresponding hybrid plasmid (pSUPfcsΩGm) of the gentamycin-resistant transformants contained the inactivated gene fcsΩGm.
Following EcoRI digestion, the inactivated genes echΩKm and echΩGm were isolated from plasmids pSKechΩKm and pSKechΩGm and ligated to EcoRI-cleaved pSUP202 DNA. The ligation mixtures were transformed into E. coli S17-1. Selection took place on tetracycline-containing LB medium which also contained kanamycin or gentamycin, respectively. Kanamycin-resistant transformants whose hybrid plasmid (pSUPechΩKm) contained the inactivated gene echΩKm were obtained. The corresponding hybrid plasmid (pSUPechΩGm) of the gentamycin-resistant transformants contained the inactivated gene echΩGm.
Following EcoRI digestion, the inactivated genes vdhΩKm and vdhΩGm were isolated from plasmids pSKvdhΩKm and pSKvdhΩGm and ligated to EcoRI-cleaved pSUP202 DNA. The ligation mixtures were transformed into E. coli S17-1. Selection took place on tetracycline-containing LB medium which also contained kanamycin or gentamycin, respectively. Kanamycin-resistant transformants whose hybrid plasmid (pSUPvdhΩKm) contained the inactivated gene vdhΩKm were obtained. The corresponding hybrid plasmid (pSUPvdhΩGm) of the gentamycin-resistant transformants contained the inactivated gene vdhΩGm.
Following EcoRI digestion, the inactivated genes aatΩKm and aatΩGm were isolated from plasmids pSKaatΩKm and pSKaatΩGm and ligated to EcoRI-cleaved pSUP202 DNA. The ligation mixtures were transformed into E. coli S17-1. Selection took place on tetracycline-containing LB medium which also contained kanamycin or gentamycin, respectively. Kanamycin-resistant transformants whose hybrid plasmid (pSUPaatΩKm) contained the inactivated gene aatΩKm were obtained. The corresponding hybrid plasmid (pSUPaatΩGm) of the gentamycin-resistant transformants contained the inactivated gene aatΩGm.
Example 5
Subcloning the deletion-inactivated genes into the conjugatively transferable "suicide plasmid" pHE55, which possesses the "sacB selection system".
In order to be able to replace the intact genes in Pseudomonas sp. HR199 with the deletion-inactivated genes, there is a need for a vector which possesses the properties which have already been described in the case of pSUP202. In contrast to the Ω element-inactivated genes, deletion-inactivated genes carry no antibiotic resistance, so there is no direct way of selecting for successful replacement of the genes in Pseudomonas sp. HR199, and another selection system had to be used.
In the "sacB selection system", the replacing, deletion-inactivated gene is cloned in a plasmid which possesses the sacB gene in addition to an antibiotic resistance gene. Following the conjugative transfer of this hybrid plasmid into a pseudomonad, the plasmid is integrated by means of homologous recombination at the site in the genome at which the intact gene is located (first crossover). This results in a "heterogenotic" strain which possesses both an intact gene and a deletion-inactivated gene, with these genes being separated from each other by the pHE55 DNA. These strains exhibit the resistance which is encoded by the vector and also possess an active sacB gene. The intention is that the pHE55 DNA, together with the intact gene, should then be removed from the genomic DNA by means of a second homologous recombination event (second crossover). This recombination event results in a strain which now only possesses the inactivated gene. In addition, the pHE55-encoded antibiotic resistance and the sacB gene are both lost. If strains are streaked on sucrose-containing media, the growth of strains which express the sacB gene is inhibited, since the gene product converts sucrose into a polymer which is accumulated in the periplasm of the cells. The growth of cells which no longer carry the sacB gene as a result of the second recombination event having taken place is consequently not inhibited.
In order to have a possibility of selecting phenotypically for the integration of the deletion-inactivated gene, this gene is not exchanged for an intact gene; instead, use is made of a strain in which the gene to be replaced is already "labelled" by the insertion of an Ω element. When successful replacement takes place, the resulting strain loses the antibiotic resistance which is encoded by the Ω element.
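The selection logic of the sacB system described above can be summarized as a small genotype-to-phenotype model. This is a simplified sketch under stated assumptions, not a description of the authors' procedure: the class and function names are hypothetical, tetracycline resistance is taken as the pHE55-encoded marker, and sucrose inhibition is treated as all-or-nothing.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Strain:
    omega_allele: bool       # Omega element-labelled gene present (antibiotic resistance)
    deletion_allele: bool    # deletion-inactivated gene present
    vector_integrated: bool  # pHE55 DNA in the genome (tetracycline resistance + sacB)

def phenotype(s: Strain) -> dict:
    """Expected plate phenotypes for a given genotype (simplified)."""
    return {
        "omega_antibiotic_resistant": s.omega_allele,
        "tetracycline_resistant": s.vector_integrated,
        "grows_on_sucrose": not s.vector_integrated,  # sacB inhibits growth
    }

def is_desired_mutant(s: Strain) -> bool:
    """Sucrose-tolerant, tetracycline-sensitive, Omega marker lost."""
    p = phenotype(s)
    return (p["grows_on_sucrose"]
            and not p["tetracycline_resistant"]
            and not p["omega_antibiotic_resistant"])

# Omega-labelled recipient, e.g. a strain carrying fcs inactivated by OmegaKm
recipient = Strain(omega_allele=True, deletion_allele=False, vector_integrated=False)
# First crossover: hybrid plasmid integrated, both alleles present
heterogenote = Strain(omega_allele=True, deletion_allele=True, vector_integrated=True)
# Second crossover, outcome A: vector leaves together with the Omega allele
deletion_mutant = Strain(omega_allele=False, deletion_allele=True, vector_integrated=False)
# Second crossover, outcome B: vector leaves together with the deletion allele
revertant = Strain(omega_allele=True, deletion_allele=False, vector_integrated=False)

if __name__ == "__main__":
    for name, s in [("recipient", recipient), ("heterogenote", heterogenote),
                    ("deletion mutant", deletion_mutant), ("revertant", revertant)]:
        print(name, phenotype(s), "desired:", is_desired_mutant(s))
```

The model makes the role of the Ω label explicit: both second-crossover outcomes grow on sucrose, and only the loss of the Ω-encoded resistance distinguishes the deletion mutant from a strain that has reverted to the recipient genotype.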
Following digestion with PstI, the inactivated gene fcsΔ was isolated from plasmid pSKfcsΔ and ligated to PstI-cleaved pHE55 DNA. The ligation mixture was transformed into E. coli S17-1. Selection took place on tetracycline-containing LB medium. Tetracycline-resistant transformants, whose hybrid plasmid (pHEfcsΔ) contained the inactivated gene fcsΔ, were obtained.
Following digestion with EcoRI, the inactivated gene echΔ was isolated from plasmid pSKechΔ and treated with mung bean nuclease (generation of blunt ends). The fragment was ligated to BamHI-cleaved and mung bean nuclease-treated pHE55 DNA. The ligation mixture was transformed into E. coli S17-1. Selection took place on tetracycline-containing LB medium. Tetracycline-resistant transformants, whose hybrid plasmid (pHEechΔ) contained the inactivated gene echΔ, were obtained.
Following digestion with EcoRI, the inactivated gene vdhΔ was isolated from plasmid pSKvdhΔ and treated with mung bean nuclease. The fragment was ligated to BamHI-cleaved and mung bean nuclease-treated pHE55 DNA. The ligation mixture was transformed into E. coli S17-1. Selection took place on tetracycline-containing LB medium. Tetracycline-resistant transformants, whose hybrid plasmid (pHEvdhΔ) contained the inactivated gene vdhΔ, were obtained.
Following digestion with EcoRI, the inactivated gene aatΔ was isolated from plasmid pSKaatΔ and treated with mung bean nuclease. The fragment was ligated to BamHI-cleaved and mung bean nuclease-treated pHE55 DNA. The ligation mixture was transformed into E. coli S17-1. Selection took place on tetracycline-containing LB medium. Tetracycline-resistant transformants, whose hybrid plasmid (pHEaatΔ) contained the inactivated gene aatΔ, were obtained.
Example 6
Generating mutants of the strain Pseudomonas sp. HR199 in which genes of eugenol catabolism have been specifically inactivated by inserting an Ω element.
The strain Pseudomonas sp. HR199 was employed as the recipient in conjugation experiments in which strains of E. coli S17-1 harbouring the hybrid plasmids of pSUP202 which are listed below were used as donors. The transconjugants were selected on gluconate-containing mineral medium which contained the antibiotic corresponding to the Ω element. It was possible to distinguish between "homogenotic" (replacement of the intact gene with the Ω element insertion-inactivated gene by means of a double crossover) and "heterogenotic" (integration of the hybrid plasmid into the genome by means of a single crossover) transconjugants on the basis of the pSUP202-encoded tetracycline resistance.
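The distinction drawn above between homogenotic and heterogenotic transconjugants amounts to a two-plate screen. The following minimal sketch shows how replica-plating results could be tabulated; the function names and the colony labels are hypothetical, and growth is treated as a simple yes/no observation.

```python
def classify_transconjugant(grows_on_omega_antibiotic: bool,
                            grows_on_tetracycline: bool) -> str:
    """Classify a colony from its growth on the two selective media."""
    if not grows_on_omega_antibiotic:
        return "no Omega marker integrated"
    if grows_on_tetracycline:
        # pSUP202 backbone still present: single crossover
        return "heterogenotic"
    # Omega marker retained, vector lost: double crossover
    return "homogenotic"

def screen(colonies: dict) -> dict:
    """Map colony label -> class for (omega_growth, tetracycline_growth) tuples."""
    return {label: classify_transconjugant(*growth)
            for label, growth in colonies.items()}

if __name__ == "__main__":
    # Hypothetical replica-plating results
    results = screen({"c1": (True, True), "c2": (True, False), "c3": (False, False)})
    for label, cls in results.items():
        print(label, cls)
```

Only colonies classified as homogenotic are candidates for the mutants described in the following paragraphs, in which the replacement was then verified by DNA sequencing.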
The mutants Pseudomonas sp. HR199 fcsΩKm and Pseudomonas sp. HR199 fcsΩGm were obtained after conjugating Pseudomonas sp. HR199 with E. coli S17-1 (pSUPfcsΩKm) and E. coli S17-1 (pSUPfcsΩGm), respectively. The replacement of the intact fcs gene with the ΩKm-inactivated or ΩGm-inactivated gene (fcsΩKm and fcsΩGm, respectively) was verified by means of DNA sequencing.
The mutants Pseudomonas sp. HR199 echΩKm and Pseudomonas sp. HR199 echΩGm were obtained after conjugating Pseudomonas sp. HR199 with E. coli S17-1 (pSUPechΩKm) and E. coli S17-1 (pSUPechΩGm), respectively. The replacement of the intact ech gene with the ΩKm-inactivated or ΩGm-inactivated gene (echΩKm and echΩGm, respectively) was verified by means of DNA sequencing.
The mutants Pseudomonas sp. HR199 vdhΩKm and Pseudomonas sp. HR199 vdhΩGm were obtained after conjugating Pseudomonas sp. HR199 with E. coli S17-1 (pSUPvdhΩKm) and E. coli S17-1 (pSUPvdhΩGm), respectively. The replacement of the intact vdh gene with the ΩKm-inactivated or ΩGm-inactivated gene (vdhΩKm and vdhΩGm, respectively) was verified by means of DNA sequencing.
The mutants Pseudomonas sp. HR199 aatΩKm and Pseudomonas sp. HR199 aatΩGm were obtained after conjugating Pseudomonas sp. HR199 with E. coli S17-1 (pSUPaatΩKm) and E. coli S17-1 (pSUPaatΩGm), respectively. The replacement of the intact aat gene with the ΩKm-inactivated or ΩGm-inactivated gene (aatΩKm and aatΩGm, respectively) was verified by means of DNA sequencing.
The mutant Pseudomonas sp. HR199 fcsΩKmvdhΩGm was obtained after conjugating Pseudomonas sp. HR199 fcsΩKm with E. coli S17-1 (pSUPvdhΩGm). The replacement of the intact vdh gene with the ΩGm-inactivated gene (vdhΩGm) was verified by means of DNA sequencing.
The mutant Pseudomonas sp. HR199 vdhΩKmaatΩGm was obtained after conjugating Pseudomonas sp. HR199 vdhΩKm with E. coli S17-1 (pSUPaatΩGm). The replacement of the intact aat gene with the ΩGm-inactivated gene (aatΩGm) was verified by means of DNA sequencing.
The mutant Pseudomonas sp. HR199 vdhΩKmechΩGm was obtained after conjugating Pseudomonas sp. HR199 vdhΩKm with E. coli S17-1 (pSUPechΩGm). The replacement of the intact ech gene with the ΩGm-inactivated gene (echΩGm) was verified by means of DNA sequencing.
Example 7
Generating mutants of the strain Pseudomonas sp. HR199 in which genes of eugenol catabolism have been specifically inactivated by deleting a constituent region.
The strains Pseudomonas sp. HR199 fcsΩKm, Pseudomonas sp. HR199 echΩKm, Pseudomonas sp. HR199 vdhΩKm and Pseudomonas sp. HR199 aatΩKm were employed as recipients in conjugation experiments in which strains of E. coli S17-1 harbouring the hybrid plasmids of pHE55 which are listed below were used as donors. The "heterogenotic" transconjugants were selected on gluconate-containing mineral medium which contained the antibiotic corresponding to the Ω element in addition to tetracycline (pHE55-encoded resistance). After streaking out on sucrose-containing mineral medium, transconjugants were obtained which had eliminated the vector DNA by means of a second recombination event (second crossover). By streaking out on mineral medium which was without antibiotic or which contained the antibiotic corresponding to the Ω element, it was possible to identify the mutants in which the Ω element-inactivated gene had been replaced with the deletion-inactivated gene (no antibiotic resistance).
The mutant Pseudomonas sp. HR199 fcsΔ was obtained after conjugating Pseudomonas sp. HR199 fcsΩKm with E. coli S17-1 (pHEfcsΔ). The replacement of the ΩKm-inactivated gene (fcsΩKm) with the deletion-inactivated gene (fcsΔ) was verified by means of DNA sequencing.
The mutant Pseudomonas sp. HR199 echΔ was obtained after conjugating Pseudomonas sp. HR199 echΩKm with E. coli S17-1 (pHEechΔ). The replacement of the ΩKm-inactivated gene (echΩKm) with the deletion-inactivated gene (echΔ) was verified by means of DNA sequencing.
The mutant Pseudomonas sp. HR199 vdhΔ was obtained after conjugating Pseudomonas sp. HR199 vdhΩKm with E. coli S17-1 (pHEvdhΔ). The replacement of the ΩKm-inactivated gene (vdhΩKm) with the deletion-inactivated gene (vdhΔ) was verified by means of DNA sequencing.
The mutant Pseudomonas sp. HR199 aatΔ was obtained after conjugating Pseudomonas sp. HR199 aatΩKm with E. coli S17-1 (pHEaatΔ). The replacement of the ΩKm-inactivated gene (aatΩKm) with the deletion-inactivated gene (aatΔ) was verified by means of DNA sequencing.
Example 8
Biotransforming eugenol into vanillin using the mutant Pseudomonas sp. HR199 vdhΩKm.
The strain Pseudomonas sp. HR199 vdhΩKm was propagated in 50 ml of HR-MM containing 6 mM eugenol to an optical density (OD600nm) of approx. 0.6. After 17 h, it was possible to detect 2.9 mM vanillin, 1.4 mM ferulic acid and 0.4 mM vanillic acid in the culture supernatant.
Example 9
Biotransforming eugenol into ferulic acid using the mutant Pseudomonas sp. HR199 vdhΩGmaatΩKm.
The strain Pseudomonas sp. HR199 vdhΩGmaatΩKm was propagated in 50 ml of HR-MM containing 6 mM eugenol to an optical density (OD600nm) of approx. 0.6. After 18 h, it was possible to detect 1.9 mM vanillin, 2.4 mM ferulic acid and 0.6 mM vanillic acid in the culture supernatant.
Example 10
Biotransforming eugenol into coniferyl alcohol using the mutant Pseudomonas sp. HR199 vdhΩGmaatΩKm.
The strain Pseudomonas sp. HR199 vdhΩGmaatΩKm was propagated in 50 ml of HR-MM containing 6 mM eugenol to an optical density (OD600nm) of approx. 0.4. After 15 h, it was possible to detect 1.7 mM coniferyl alcohol, 1.4 mM vanillin, 1.4 mM ferulic acid and 0.2 mM vanillic acid in the culture supernatant.
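The product concentrations reported in Examples 8 to 10 can be put on a common footing by expressing them as molar fractions of the 6 mM eugenol supplied. The sketch below does this under the assumptions that eugenol was the only precursor of the listed products and that each product derives from eugenol with 1:1 stoichiometry; the dictionary and function names are hypothetical.

```python
EUGENOL_MM = 6.0  # initial eugenol concentration in Examples 8 to 10

# Concentrations (mM) detected in the culture supernatant, as reported
EXAMPLES = {
    8: {"vanillin": 2.9, "ferulic acid": 1.4, "vanillic acid": 0.4},
    9: {"vanillin": 1.9, "ferulic acid": 2.4, "vanillic acid": 0.6},
    10: {"coniferyl alcohol": 1.7, "vanillin": 1.4,
         "ferulic acid": 1.4, "vanillic acid": 0.2},
}

def molar_fractions(products: dict, substrate_mM: float = EUGENOL_MM) -> dict:
    """Fraction of the supplied eugenol recovered as each product (mol/mol)."""
    return {name: conc / substrate_mM for name, conc in products.items()}

def recovered_fraction(products: dict, substrate_mM: float = EUGENOL_MM) -> float:
    """Fraction of the supplied eugenol accounted for by all listed products."""
    return sum(products.values()) / substrate_mM

if __name__ == "__main__":
    for example, products in EXAMPLES.items():
        fractions = molar_fractions(products)
        summary = ", ".join(f"{n}: {f:.0%}" for n, f in fractions.items())
        print(f"Example {example}: {summary}; total {recovered_fraction(products):.0%}")
```

The remainder not accounted for may include residual eugenol, other intermediates and material incorporated into biomass, none of which is quantified in these examples.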
Example 11
Fermentatively producing natural vanillin from eugenol in a 10 l fermenter using the mutant Pseudomonas sp. HR199 vdhΩKm.
The production fermenter was inoculated with 100 ml of a 24-hour-old preliminary culture which had been propagated at 32 °C on a shaking incubator (120 rpm) in a medium which was adjusted to pH 7.0 and which consisted of 12.5 g of glycerol/l, [...] g of yeast extract/l and 0.37 g of acetic acid/l. The fermenter contained 9.9 l of medium of the following composition: 1.5 g of yeast extract/l, 1.6 g of KH2PO4/l, 0.2 g of NaCl/l, 0.2 g of MgSO4/l. The pH was adjusted to pH 7.0 with sodium hydroxide solution. After sterilization, 4 g of eugenol were added to the medium. The temperature was 32 °C, the aeration was 3 Nl/min and the stirrer speed was 600 rpm.
The pH was maintained at pH 6.5 with sodium hydroxide solution.
At 4 hours after the inoculation, continuous addition of eugenol was begun such that 255 g of eugenol had been added to the culture when fermentation ended after [...] hours. 40 g of yeast extract were also fed in during the fermentation. At the end of the fermentation, the concentration of eugenol was 0.2 g/l. The content of vanillin was 2.6 g/l. 3.4 g of ferulic acid/l were also present.
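A rough molar balance for the fermentation can be derived from the figures above. The sketch below is an estimate under several assumptions that the text does not state: the final culture volume is taken as 10 l (9.9 l of medium plus 0.1 l of inoculum, ignoring feed volumes, base additions, sampling and evaporation), the 255 g is taken as the total amount of eugenol supplied, and 1:1 stoichiometry is assumed. The molar masses are standard values; all names are hypothetical.

```python
# Molar masses in g/mol
M_EUGENOL = 164.20       # C10H12O2
M_VANILLIN = 152.15      # C8H8O3
M_FERULIC_ACID = 194.18  # C10H10O4

def moles(mass_g: float, molar_mass: float) -> float:
    return mass_g / molar_mass

def molar_yield(product_g_per_l: float, product_molar_mass: float,
                eugenol_fed_g: float = 255.0, volume_l: float = 10.0) -> float:
    """Moles of product in the broth per mole of eugenol supplied.

    volume_l = 10.0 and eugenol_fed_g = 255.0 are assumptions (see lead-in).
    """
    product_mol = moles(product_g_per_l * volume_l, product_molar_mass)
    return product_mol / moles(eugenol_fed_g, M_EUGENOL)

if __name__ == "__main__":
    print(f"vanillin:     {molar_yield(2.6, M_VANILLIN):.1%} of eugenol supplied")
    print(f"ferulic acid: {molar_yield(3.4, M_FERULIC_ACID):.1%} of eugenol supplied")
    print(f"residual eugenol: {0.2 * 10.0:.1f} g")
```

Because the fermentation time is not legible in the source, no volumetric productivity is calculated here.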
The vanillin which is obtained in this way can be isolated by known physical methods such as chromatography, distillation and/or extraction and used for preparing natural flavourings.
Explanatory notes regarding the figures:
FIG. 1a to 1r: Gene structures for isolating organisms and mutants
calA*: Part of the inactivated gene for coniferyl alcohol dehydrogenase
calB*: Part of the inactivated gene for coniferyl aldehyde dehydrogenase
fcs*: Part of the inactivated gene for feruloyl-CoA synthetase
ech*: Part of the inactivated gene for enoyl-CoA hydratase-aldolase
vdh*: Part of the inactivated gene for vanillin dehydrogenase
aat*: Part of the inactivated gene for beta-ketothiolase
While the restriction enzyme cleavage sites labelled were used for the construction, they are no longer functional in the resulting construct.
FIG. 2a: Nucleotide sequence of the calAΩKm gene structure
FIG. 2b: Nucleotide sequence of the calAΩGm gene structure
FIG. 2c: Nucleotide sequence of the calAΔ gene structure
FIG. 2d: Nucleotide sequence of the calBΩKm gene structure
FIG. 2e: Nucleotide sequence of the calBΩGm gene structure
FIG. 2f: Nucleotide sequence of the calBΔ gene structure
FIG. 2g: Nucleotide sequence of the fcsΩKm gene structure
FIG. 2h: Nucleotide sequence of the fcsΩGm gene structure
FIG. 2i: Nucleotide sequence of the fcsΔ gene structure
FIG. 2j: Nucleotide sequence of the echΩKm gene structure
FIG. 2k: Nucleotide sequence of the echΩGm gene structure
FIG. 2l: Nucleotide sequence of the echΔ gene structure
FIG. 2m: Nucleotide sequence of the vdhΩKm gene structure
FIG. 2n: Nucleotide sequence of the vdhΩGm gene structure
FIG. 2o: Nucleotide sequence of the vdhΔ gene structure
FIG. 2p: Nucleotide sequence of the aatΩKm gene structure
FIG. 2q: Nucleotide sequence of the aatΩGm gene structure
FIG. 2r: Nucleotide sequence of the aatΔ gene structure
EDITORIAL NOTE NO 52818/99
Sequence listing pages 1-75 is part of the description.
Claim pages are to follow.
Sequences

CTGCAGCCAG GGCTGAAAAG GAGGGATTCA GTGAGGTCAT GAAGGGAGGG GACGGCGCCT  60
GGCTCCAATT GCTCGATGGC GCCGCGATTG AGTGTCTTGG GCGCGGTCTT GGAGAGTTCG  120
GCTAGGGAGA TAAATTTGCT GGCCATGGTG GCGGCCCCTG ATGCGTTGGA TGATTTTCTG  180
CATTCTGCAT CATGAAATTC ATGAAATCAT CACTTTTCGG GGGGTGGGTG CACGGGATTG  240
AAGGTTGCTA GGAGAGTGCA TTGCTCGTAA GCCCAGGAAG CACGCGGGTT TCAGCATGGT  300
GCATGGAAAT GGCATGAGCT TTGCTGGATA TGATTAGAGA CATTAACTAT TTTGGCGGAA  360
TGGAAGCACG ATTCCTCCCC CGGTAGAGCC GTAACCGCGA CATTCAGCAC CGTAAAAAGG  420
AAAGAGC ATG GTG TCC TCC CAA CTG ACC AAC AAG AAA ATC GTC GTC ACC GGA  472
        Met Val Ser Ser Gln Leu Thr Asn Lys Lys Ile Val Val Thr Gly
        1               5                   10
GGT ATC GGT GCC GAA ACT GCC CGC GTT CTG CCC TCT CAC GGC CCC ACA  520
Gly Ile Gly Ala Glu Thr Ala Arg Val Leu Arg Ser His Gly Ala Thr
                                    25
GTG ATT GGC GTA GAT CGC AAC ATG CCG AGC CTG ACT CTG GAT GCT TTC  568
Val Ile Gly Val Asp Arg Asn Met Pro Ser Leu Thr Leu Asp Ala Phe
                                40
GTT CAC GCT GAC CTC AGC CAT CCT GAA GGC ATC GAT AAG GCC ATC GGG  616
Val Gln Ala Asp Leu Ser His Pro Glu Gly Ile Asp Lys Ala Ile
                            55                  60      62
ACAGCAAGCG AACCGGAATT GCCAGCTGGG GCGCCCTCTC GTAAGGTTGG GAAGCCCTGC  676
AAAGTAAACT GGATCGCTTT CTTGCCGCCA ACGATCTCAT GGCGCAGGGG ATCAAGATCT  736
CATCAAGAGA CAGCATCAGG ATCGTTTCGC ATC ATT GAA CAA CAT GGA TTC CAC  790
                                 Met Ile Glu Gln Asp Gly Leu His
                                 1
CCA GGT TCT CCG GCC CCT TCG GTG GAG AGC CTA TTC GGC TAT CAC TGG  838
Ala Gly Ser Pro Ala Ala Trp Val Glu Arg Leu Phe Gly Tyr Asp Trp
                        15
CCA CAA CAG ACA ATC GGC TCC TCT CAT CCC CCC GTG TTC CGG CTG TCA  886
Ala Gln Gln Thr Ile Gly Cys Ser Asp Ala Ala Val Phe Arg Leu Ser
                    30                  35
GCG CAG GGG CCC CCG GTT CTT TTT GTC AAG ACC GAC CTC TCC GGT GCC  934
Ala Gln Gly Arg Pro Val Leu Phe Val Lys Thr Asp Leu Ser Gly Ala
                                    50
CTC AAT CAA CTG CAG GAC GAG GCA GCG CGG CTA TCC TGG CTG GCC ACC  982
Leu Asn Glu Leu Gln Asp Glu Ala Ala Arg Leu Ser Trp Leu Ala Thr
                                65
ACG GGC GTT Thr Gly Val CCT TGC GCA GCT Pro Cys Ala Ala
GTG
Val CTC GAC GTT GTC Leu Asp Val Val GAA GCG GGA Glu Ala Gly AGG GAC Arg Asp TGG CTG CTA TTG Trp Leu Leu Leu
GGC
Cly 95 GAA GTG CCC GGG Glu Val Pro Gly
CAG
Gin 100 GAT CTC CTG TCA Asp Leu Leu Ser
TCT
Ser 105 CAC CTT GCT CCT His Leu Ala Pro GAG AAA GTA TCC Glu Lys Val Ser ATG GCT GAT GCA Met Ala Asp Ala CGG CGG CTG CAT Arg Arg Leu His
ACG
Thr 125 CTT CAT CCG GCT Leu Asp Pro Ala
ACC
Thr 130 TGC CCA TTC GAC Cys Pro Phe Asp CAC CAA His Gin 135 GCG AAA CAT Ala Lys His GTC CAT CAG Val Asp Gin 155 ATC GAG CCA CCA Ile Glu Arg Ala
CGT
Arg 145 ACT CGG ATG GAA Thr Arg Met Glu CCC GGT CTT Ala Cly Leu 150 CCG CCA GCC Ala Pro Ala CAT CAT CTG GAC Asp Asp Leu Asp
CAA
Glu 160 GAG CAT CAG GGG Glu His Gin Cly
CTC
Leu 165 1030 1078 1126 1174 1222 1270 1318 1366 1414 1462 1510 GAA CTC Glu Leu 170 TTC CCC AGG CTC Phe Ala Arg Leu GCG CCC ATC CCC Ala Arg Met Pro GGC GAG CAT CTC Gly Glu Asp Leu GTC ACC CAT GGC Val Thr His Gly
CAT
Asp 190 CCC TCC TTC CCC Ala Cys Leu Pro ATC ATG GTG GAA Ile Met Val Glu
AAT
Asn 200 GGC CCC TTT TCT Gly Arg Phe Ser
GGA
Gly 205 TTC ATC CAC TGT Phe Ile Asp Cys
GGC
Cly 210 CGG CTC GGT CTG Arg Leu Cly Val GCG GAC Ala Asp 215 CGC TAT CAC Arg Tyr Gin GGC GGC GAA Cly Cly Glu 235 ATA GCG TTG GCT Ile Ala Leu Ala
ACC
Thr 225 CGT CAT ATT GCT Arg Asp Ile Ala CAA GAG CTT Glu Glu Leu 230 ATC CCC GCT Ile Ala Ala TGG CCT GAC CGC Trp Ala Asp Arg CTC GTC CTT TAC Leu Val Leu Tyr
GGT
Cly 245 CCC CAT Pro Asp 250 TCC CAC CCC ATC Ser Gin Arg Ile TTC TAT CCC CTT Phe Tyr Arg Leu CAC GAG TTC Asp Glu Phe TTC 1558 Phe 264 TGAGCGGGAC TCTGGGGTTC CAAATGACCG ACCAAGCGAC ATT GCA TTC ATC TGT CCT GAG GAG TCA CGT TGG Ile Ala Phe Met Cys Ala Glu Giu Ser Arg Trp 230 235 GCCCTG GCC GCG GTG Ala Ala Val 225 ATC AAC GGC ATA AAT Ile Asn Cly Ile Asn 240 1613 1661 -3- ATT CCA GTG GAC GGA GGT TTG GCA TCG ACC TAC GTG TAA GTTCGTGGAC Ile Pro Val Asp Gly Gly Leu Ala Ser Thr Tyr Val 245 250 255 GCCCTTTGCA CGCGCACTAT ATCTCTATGC AGCAGCTGAA AGCAGCTTTG GTTTTGATCG GAGGTAGCGG GCGGAAAGGT GCAGAATGTC TAAATAATAA AGGATTCTTG TGAAGCTTTA GTTGTCCGTA AACGAAAATA AAAATAAAGA GGAATGATAT GAAAGCAAGT AGATCAGTCT GCACTTTCAA AATAGCTACC CTGGCAGGCG CCATTTATGC AGCGCTGCCA ATGTCAGCTG CAAACTCGAT GCAGCTGGAT GTAGGTAGCT CGGATTGGAC GGTGCGTTGG GGACAACACC CTCAAGTATA GCCTTGCCTC TCGCCTGAAT GAGCAAGACT CAAGTCTGAC AAATGCGCCG ACTGTCAATG GTTATATCCG GATATTCAAA GTCAGGGTGA TCGTAACTTT GACCGGGGGC TTGGTATCCA ATCGTCTCGA TATTCTGGCT GCAG FIG. 2a: 1710 1770 1830 1890 1950 2010 2070 2130 2164 -4- CTGCAGCCAC GGCTGAAAAG GAGGGATTCA GTGAGGTCAT GAAGGGAGGG GACGGCGCC' GGCTCCAATT GCTCGATGGC GCCGCGATTG AGTGTCTTGG GCGCGGTCTT GGAGAGTTC GCTAGGGAGA TAAATTTGCT GGCCATGGTG GCGGCCCCTG ATGGGTTGGA TGATTTTCT CATTCTGCAT CATGAAATTC ATGAAATCAT CACTTTTCGG GGGGTGGGTG CACGGGATT( AAGGTTGCTA GGAGAGTGCA TTGCTCGTAA GCCCAGGAAC CACGCGGGTT TCAGGATGG' GCATGGAAAT GGCATGAGCT TTGCTGGATA TGATTAGAGA CATTAACTAT TTTGCCGGA.
TGGAAGCACG ATTCCTCGCC CCCTAGAGCG GTAACCGCGA CATTCAGGAC CGTAAAAAGI AAAGAGCATG CAA CTG ACC AAC AAC AAA ATC GTC GTC ACC GGA GTG TCC Ti Met Gin Leu Thr Asn Lys Lys Ile Val Val Thr Gly Val Ser S 1 5 10 GGT ATC GGT GCC CAA ACT CCC CGC GTT CTC CGC TCT CAC GOC GCC ACA Gly Ile Gly Ala Giu Thr Ala Arg Val Leu Arg Ser His Gly Ala Thr 25 GTG ATT GGC GTA GAT CGC AAC ATG CCG AGC CTG ACT CTG GAT GCT TTC Val Ilie Gly Val Asp Arg Asn Met Pro Ser Leu Thr Leu Asp Ala Phe 40 GTT CAG GCT GAC CTG AGC CAT CCT GAGGGGAGAG GCGGTTTGCG TATTGGGCGC Val Gin Ala Asp Leu Ser His Pro
T
G
3
G
er 120 180 240 300 360 420 472 520 568 622 682 742 802 862 922 982 1033 1081 1129 1177
ATGCATAAAA
GCATGATGAA
CCCATGGACG
CGGTTCGTAA.
CCGAACGCAG
TTGTACAGTC
TGATGTTATG
ACTGTTGTAA
CCTGAATCGC
CACACCGTGG
ACTGTA-ATGC
CGGTGGTAAC
TATGCCTCGG
GAGCAGCAAC
TTCATTAAGC ATTCTCCCGA CAGCGGCATC AGCACCTTGT AAACGGATGA AGGCACGAAC AAGTAGCGTA TGCGCTCACG GGCGCAGTGG CGGTTTTCAT GCATCCAAGC AGCAAGCGCG C ATG TTA CGC AGC AGC Met Leu Arg Ser Ser 1 5 CATGGAAGCC ATCACAAACG CCCCTTGCGT ATAATATTTG CCACTTGACA TAACCCTGTT CAACTGGTCC AGAACCTTGA GGCTTCTTA.T GACTCTTTTT TTACGCCCTG GGTCGATCTT AAC CAT GTT ACC CAC Asn Asp Val Thr Gin TCA ACT ATC CCC ATC Ser Ser Met Cly Ile AAA TCC ATG CCC CCT Lys Ser Met Arg Ala GTA CCC ACC TAC TCC Val Ala Thr Tyr Ser CAG CCC ACT CCC CCT AAA ACA AAG TTA CCT GC Gin Ciy Ser Arg Pro Lys Thr Lys Leu Ciy Cly 20 ATT CCC ACA TGT AGG CTC CCC CCT GAC CAA CTC Ile Arg Thr Cys Arg Leu Cly Pro Asp Gin Val 35 GCT CTT GAT CTT TTC GCT CCT GAG TTC GGA CAC Ala Leu Asp Leu Phe Gly Arg Glu Phe Giy Asp CAA CAT CAG CCG GAC TCC GAT Gin His Gin TTC ATC Phe Ile Pro Asp Ser Asp 65 GCG CTT GCT GCC Ala Leu Ala Ala 80 TAC CTC GGG Tyr Leu Gly TTC GAC CAA Phe Asp Gin Asn Leu Leu Arg Ser Lys AAC TTG CTC CGT AGT AAG
ACA
Thr
GAA
Glu GCG GTT GTT GGC Ala Val Val Gly CTC GCG GCT TAC Leu Ala Ala Tyr
GTT
Va1
CTC
Leu CTG CCC AGG TTT Leu Pro Arg Phe
GAG
Glu 100
GAG
Glu CAG CCG CGT AGT Gin Pro Arg Ser GAG ATC Glu Ile 105 TAT ATC TAT Tyr Ile Tyr GCC ACC GCG Ala Thr Ala 125 GCT TAT GTG Ala Tyr Val GCA GTC TCC Ala Val Ser
GGC
Gly 115
AAG
Lys CAC CGG AGG His Arg Arg ATC AAT CTC Ile Asn Leu CAT GAG GCC His Giu Ala
AAC
Asn 135
GAT
Asp CAG GGC ATT Gin Gly Ile 120 GCG CTT GGT Ala Leu Gly CCC GCA GTG Pro Ala Val ATC TAC GTG Ile Tyr Val GCA GAT TAC GGT Ala Asp Tyr Gly 140 GCT CTC Ala Leu TAT ACA AAG Tyr Thr Lys
TTG
Leu 160
GCC
Ala ATA CGG GAA Ile Arg Glu
GAA
Glu 165 ATG CAC TTT Met His Phe
GAT
Asp 170 1225 1273 1321 1369 1417 1465 1513 1567 1616 1665 1725 1785 1845 1905 1965 2025 2085 GAC CCA AGT Asp Pro Ser TAA CAATTCGTTC AAGCCGAGAT CGGCTTCCCT G ATT GCA TTC ATG TGT GCT GAG GAG TCA CGT TGG ATC AAC GGC ATA AAT Ile Ala Phe Met Cys Ala Giu Giu Ser Arg Trp Ile Asn Gly Ile Asn 228 235 240 ATT CCA GTC Ile Pro Va 245
GCCCTTTGCA
GAGGTAGCGG
GTTGTCCGTA
GCACTTTCAA
CAAACTCGAT
CTCAAGTATA
ACTGTCAATG
GAC GGA GGT TTG GCA TCG ACC TAC L Asp Gly Gly Leu Ala Ser Thr Tyr 250
CGCGCACTAT
GCGGAAAGGT
AACGAAAATA
AATAGCTACC
GCAGCTGGAT
GCCTTGCCTC
GTTATATCCG
ATCTCTATGC
GCAGAATGTC
AAAATAAAGA
CTGGCAGGCG
GTAGGTAGCT
TCGCCTGAAT
GATATTCAAA
AGCAGCTGAA
TAAATAATAA
GGAATGATAT
CCATTTATGC
CGGATTGGAC
GAGCAAGACT
GTCAGGGTGA
GTG TAA GTTCGTGGAC Va1 255 AGCAGCTTTG GTTTTGATCG AGGATTCTTG TGAAGCTTTA GAAAGCAAGT AGATCAGTCT AGCGCTGCCA ATGTCAGCTG GGTGCGTTGG GGACAACACC CAAGTCTGAC AAATGCGCCG TCGTAACTTT GACCGGGGGC TTGGTATCCA ATCGTCTCGA TATTCTGGCT GCAG 2119 FIG. 2b: -6-
CTGCAGCCAG
GGCTCCAATT
CCTAGCGAGA
CATTCTGCAT
AAGGTTGCTA
GCATGGAAAT
TGGAACCACG
AAAGACCATC
Met 1 GCCTGAAAAG GAGGGATTCA GTCAGGTCAT GAAGGGAGGG CCTCGATGGC GCCGCGATTG ACTGTCTTGG GCGCGGTCTT TAAATTTGCT GGCCATGGTG GCGGCCCCTG ATGGGTTGGA CATGAAATTC ATGAAATCAT CACTTTTCGG GGGGTGGGTG GGAGAGTGCA TTGCTCGTAA GCCCAGGAAG CACGCGGGTT GGCATGAGCT TTGCTGGATA TGATTAGAGA CATTAACTAT ATTCCTCGCC CGGTAGAGCG GTAACCGCGA CATTCAGGAC CAA CTG ACC AAC AAG AAA ATC GTC GTC ACC GGA Gin Leu Thr Asn Lys Lys Ile Val Val Thr Gly
GACGGCGCCT
GGAGAGTTCG
TGATTTTCTG
CACGGGATTG
TCAGGATGCT
TTTGGCGGAA
CGTAAAAAGG
GTG TCC TCC Vai Ser Ser 120 180 240 300 360 420 472 520 568 617 GCT ATC GGT CCC GAA ACT GCC CGC GTT CTG CGC Giy Ile Giy Aia Giu Thr Aia Arg Vai Leu Arg 25 GTG ATT GGC GTA GAT CGC AAC ATG CCC ACC CTC Vai Ilie Gly Vai Asp Arg Asn Met Pro Ser Leu 40 GTT CAG GCT GAC CTC AGC CAT CCT GAA GGC ATC Val Gin Aia Asp Leu Ser His Pro Ciu Giy Ile 55 58 ATT CCA GTG GAC GGA GGT TTG GCA TCG ACC TAC Ilie Pro Vai Asp Ciy Giy Leu Aia Ser Thr Tyr 245 250 CCCCTTTGCA CGCCCACTAT ATCTCTATCC AGCAGCTCAA CACGTAGCGG GCGGAAAGGT GCAGAATGTC TAAATAATAA GTTGTCCGTA AACCAAAATA AAAATAAAGA GGAATGATAT GCACTTTCAA AATACCTACC CTGGCAGGCG CCATTTATGC CAAACTCGAT GCAGCTGGAT GTAGGTACCT CGGATTGGAC CTCAAGTATA GCCTTGCCTC TCGCCTGAAT GAGCAAGACT ACTCTCAATG GTTATATCCC GATATTCAAA GTCAGCGTCA TTCCTATCCA ATCGTCTCGA TATTCTCGCT GCAG FIG. 2c: TCT CAC CCC CCC ACA Ser His Ciy Aia Thr ACT CTC CAT CCT TTC Thr Leu Asp Aia Phe CATC AAC CCC ATA AAT Asn Ciy Ile Asn 240 CTC TAA CTTCCTCCAC Val 255 ACCACCTTTC CTTTTCATCG ACGATTCTTC TCAACCTTTA CAAACCAACT AGATCACTCT ACCGCTCCCA ATCTCACCTC CCTGCCTTCC CCACAACACC CAACTCTCAC AAATCCGCCC TCCTAACTTT CACCCCCCCC 726 786 846 906 966 1026 1086 1120 -7-
GAATTCCGCG
GGTAGGGTCT
TGCGTTTGCC
TTAACTCGCG
GTCTCGCCCT
CGATTAACAT
CTCCAGCTCA
AGAATAACAA
TATCGCCCCG
TTTTCTTGC
GCTTCGCTTC
TAAGCATTCT
TTGAGGCCGA
AATTAAAATA
AGGGCAATTT
TTGACTCCTC
TTCTATCAGC
CATCCTTGTT
GCGATGAACC
GTCATTTTTT
TTCTTGGGCG
AGCAAACCGC
TTGGGCTATT
AGCAGGTCAG
GCCCCGCTTT
GCCTGAA.CCT
GCATCGAGAT
TGGTGGCTTT
CTTGCGCG
ATGGTTTCTT
GGCTGAGCAG
CG ATG AGC
CGAAAGTCAT
TCGTTGACAT
CCTGAGGTCA
GAACAGCCTG
TCGAAGCGAT
ATGTGAATTT
TTGCCTCTA.T
GGTCTTAGCC
AGGGCACAGG
GGATTTTTCC
ATGAAAGGTG
GCTCCACTAC
GTCTGGCATA
ATGGTTATTC
120 180 240 300 360 420 ATT CTT GGT TTG AAT Met Ser Ile Leu 1 GGT GCC CCC Gly Ala Pro GTC GGA GCT Val Gly Ala GAG CAG Giu Gin CAG GGG Gin Gly CTG CCC TCC GCT Leu Gly Ser Ala Gly Leu Asn GAT CCC ATC Asp Arg Met CTG CGT CTG Leu Arg Leu AAC AAC Lys Lys ACT AGG Ser Arg GCG CAC CTG GAG Ala His Leu Ciu CCT GCA AAC Pro Ala Asn 30
ATT
Ile
TTG
Leu
GAA
Glu CTC CAT CCT Leu Asp Arg
GCC
Ala 45
TCT
Ser CCA ATG CTT Ala Met Leu AAT CCT GAA Asn Arg Ciu
GCA
Ala ATT CCC GAC GCC Ile Ala Asp Ala
GTT
Val1 GCT GAC TTT Ala Asp Phe CCC AGC CGT Arg Ser Arg GAG CAA Ciu Gin ACA CTC CTT Thr Leu Leu CCC GAG CAC Arg Giu His TTT CCA GGG Phe Pro Gly CAC ATT GCT GC Asp Ile Ala Cly
TCG
Ser
GAG
Giu GCA AGC CTC Ala Ser Leu GCC AAA TGG Ala Lys Trp CCC CAA CAT Pro Ciu His
CAC
His 100
CTG
Leu AAG CAT AGC Lys Asp Ser AAC CCG ATG Lys Aia Met GGT GTC GTT Gly Val Val CC GAG CCA Ala Glu Ala 105 GCG CTC Cly Val
CGC
Arg 110
AAC
Asn GAC TTT CAG Giu Phe Gin ATT ACT CCC Ile Ser Pro 120
CTG
Leu
TCC
Trp 125
CCA
Ala TTC CCT ATC Phe Pro Ile
CTA
Val1 130
CC
Ala CCC TTT GC Ala Phe Gly 809 857 905 CCC CCC ATA Ala Cly Ile
TTC
Phe 140
CCC
Arg CCA CGT AAT Ala Gly Asn
CC
Arg 145
CTT
Leu ATG CTC AAC Met Leu Lys CCC TCC Pro Ser 150 CCT CGT Ala Arg GAG CTT ACC Giu Leu Thr
CCC
Pro 155 ACT TCT CC Thr Ser Ala CC GAC CTA Ala Glu Leu
ATT
Ile 165 TAC TTC GAT GAA ACT GAG CTG ACT ACA GTG CTG GGC GAC Tyr Phe Asp Glu Thr Giu Leu Thr Thr Val Leu Gly Asp 170 175 180 GGT GCG CTG TTC AGT GCT CAG CCT TTC GAT CAT CTG ATC Gly Ala Leu Phe Ser Ala Gin Pro Phe Asp His Leu Ile 185 190 195 GGC ACT GCC GTG GCC AAG CAC ATC ATG CGT GCC GCG GCG Gly Thr Ala Val Ala Lys His lie Met Arg Ala Ala Ala 200 205 210 GTG CCC GTT ACC CTG GAA TTG GGT GGC AAA TCG CCG GTG Val Pro Val Thr Leu Glu Leu Gly Gly Lys Ser Pro Val 220 225 CGC AGT GCA GAT ATG GCG GAC GTT GCA CAA CGG GTG TTG Arg Ser Ala Asp Met Ala Asp Val Ala Gin Arg Val Leu 235 240 ACC TTC AAT GCC GGG CAA ATC TGT CTG GCA CCG GAC TAT Thr Phe Asn Ala Gly Gin Ile Cys Leu Ala Pro Asp Tyr 250 255 260 CCG GAA GGGACAGCAA GCGAACCGGA ATTGCCAGCT GGGGCGCCCT Pro Glu 265 TGGGAAGCCC TGCAAAGTAA ACTGGATGGC TTTCTTGCCG CCAAGGA GGGATCAAGA TCTGATCAAG AGACAGGATG AGGATCGTTT CGC ATG GCT GAA GTC Ala Glu Val TTC ACC GGC Phe Thr Gly GAT AAC CTA Asp Asn Leu 215 ATC GTT TCC Ile Val Ser 230 ACG GTG AAA Thr Vai Lys 245 GTG CTG CTG Val Leu Leu
CTGGTAAGGT
TCT GATGGCGCAG ATT GAA CAA 1001 1049 1097 1145 1193 1241 1297 1357 1412 1460 1508 1556 1604 1652
GAT
Asp
GGC
Gly
TTC
Phe
CTG
Leu
TGG
Trp
GGA
Gly
TAT
Tyr
CGG
Arg
TCC
Ser
CTG
Leu
TTG
Leu
GAG
Asp
CTG
Leu
GGT
Gly
GCC
Ala
CAC
His
TGG
Trp
TCA
Ser
GCC
Ala
ACG
Thr
GCA
Ala
GCA
Ala
GCG
Ala
CTG
Leu
ACG
Thr
GGT
Gly 10
CAA
Gin
CAG
Gin
AAT
Asn
GGC
Gly
TCT
Ser
CAG
Gin
GGG
Gly
GAA
Glu
OTT
Va1 75
CCC
Pro
ACA
Thr
CGC
Arg
CTG
Leu 60
CCT
Pro
GCC
Ala
ATC
Ile
CCC
Pro 45
CAG
Gin
TGC
Gys
GCT
Ala
GGC
Gly 30
GTT
Val
GAG
Asp
OCA
Ala
TGG
Trp 15
TGC
Cys
GTT
Leu
GAG
Glu
OCT
Ala Met Ile Glu Gin 1 GTG GAG AGG CTA TTC Val Glu Arg Leu Phe TCT OAT GCC GCC GTG Ser Asp Ala Ala Val TTT GTC AAG ACC GAC Phe Val Lys Thr Asp OCA GCG CGG CTA TCG Ala Ala Arg Leu Ser GTG CTC GAC GTT GTC Val Leu Asp Val Val -9-
ACT
Thr GAA GCG GGA AGG Glu Ala Gly Arg TGG CTG CTA TTG Trp Leu Leu Leu
GGC
Gly GAA GTG CCG GGG Glu Val Pro Gly GAT CTC CTG TCA Asp Leu Leu Ser
TCT
Ser 105 CAC CTT GCT CCT His Leu Ala Pro
GCC
Ala 110 GAG AAA GTA TCC Glu Lys Val Ser ATC ATG Ile Met 115 GCT GAT GCA Ala Asp Ala TTC GAC CAC Phe Asp His 135 CGG CGG CTG CAT Arg Arg Leu His
ACG
Thr 125 CTT GAT CCG GCT Leu Asp Pro Ala ACC TGC CCA Thr Cys Pro 130 ACT CGG ATG Thr Arg Met CAA GCG AAA CAT Gin Ala Lys His ATC GAG CGA GCA Ile Glu Arg Ala
CGT
Arg 145 GAA GCC Glu Ala 150 GGT CTT GTC GAT Gly Leu Val Asp
CAG
Gln 155 GAT GAT CTG GAC Asp Asp Leu Asp
GAA
Glu 160 GAG CAT CAG GGG Glu His Gin Gly GCG CCA GCC GAA Ala Pro Ala Glu
CTG
Leu 170 TTC GCC AGG CTC Phe Ala Arg Leu
AAG
Lys 175 GCG CGC ATG CCC Ala Arg Met Pro
GAC
Asp 180 GGC GAG GAT CTC Gly Glu Asp Leu GTG ACC CAT GGC Val Thr His Gly
GAT
Asp 190 GCC TGC TTG CCG Ala Cys Leu Pro AAT ATC Asn Ile 195 1700 1748 1796 1844 1892 1940 1988 2036 2084 2132 2180 2235 2283 2331 ATG GTG GAA Met Val Glu GGT GTG GCG Gly Val Ala 215
AAT
Asn 200 GGC CGC TTT TCT Gly Arg Phe Ser TTC ATC GAC TGT Phe Ile Asp Cys GGC CGG CTG Gly Arg Leu 210 CGT GAT ATT Arg Asp Ile GAC CGC TAT CAG Asp Arg Tyr Gin
GAC
Asp 220 ATA GCG TTG GCT Ile Ala Leu Ala
ACC
Thr 225 GCT GAA Ala Glu 230 GAG CTT GGC GGC Glu Leu Gly Gly
GAA
Glu 235 TGG GCT GAC CGC Trp Ala Asp Arg
TTC
Phe 240 CTC GTG CTT TAC Leu Val Leu Tyr
GGT
Gly 245 ATC GCC GCT CCC Ile Ala Ala Pro TCG CAG CGC ATC Ser Gin Arg Ile TTC TAT CGC CTT Phe Tyr Arg Leu GAC GAG TTC Asp Glu Phe CGC CAT GCC His Ala 444 445 TTC TGA GCGGGACTCT GGGGTTCGAA ATGACCGACC AAGCGACGCC Phe 264 AAG CCT GTT CTC GTG CAA AGT CCT GTG GGT GAG TCG AAC Lys Pro Val Leu Val Gln Ser Pro Val Gly Glu Ser Asn 450 455 TTG GCG Leu Ala 460 ATG CGC GCA CCC Met Arg Ala Pro TAC GGA GAA GCG ATC CAC GGA CTG CTC TCT Tyr Gly Glu Ala Ile His Gly Leu Leu Ser 465 470 GTC CTC CTT TCA ACG GAG TGT TAG AACCGTTGGT AGTGGTTTTG GACGGGCCCA Val Leu Leu Ser Thr Glu Cys 475 480 481 GGAGCATGCG CTTCTGGGCC CGTTTCTTGA GTATTCATTG GATAGTCACG CGTGGTAGC TCGAGCCTGC ACAGCTGATG AGCACCCTGG AAGGCGCGCT GTACGCGGAC GACTGGGTT ATCTTCGCCA TTCATGACGG AACTCCGTTC CCCAGTACCG CGATGACTAT TTTGCCTCT CCGATGTCCG ATTCCACGCC GCCTGACGCT AAGCGGGGGC GGGGGCGCCC GCATCCCAG CCAGACAGCA ACAAATGAGT AGGCTCTTGG ATGCCGCGGC GGCTGAGATT GGTAACGGC ATTTCGTCAA TGTGACGATG GATTCGATTG CCCGTGCTGC CGGCGTCTCA AAAAAAACG TGTACGTCTT GGTGGCGAGC AAGGAAGAAC TCATTTCCCG GTTAGTGGCT CGAGACATG
T
T
c
T
2385 2445 2505 2565 2625 2685 2745 2805 2822
CCAACCTTGA
FIG. 2d:
GGAATTC
GAATTCCGCG
GGTAGGGTCT
TGCGTTTGCC
TTAACTCGCG
GTCTCGCCCT
CGATTAAGAT
CTCCAGCTCA
AGAATAACAA
TATCGCCCGG
TTTTCTTGGC
GCTTCGCTTC
TAAGCATTCT
TTGAGCCCA
AATTAAAATA
AGGGCAATTT
TTGACTCCTC
TTCTATCAGC
CATGCTTCTT
GCGATGAACC
GTCATTTTTT
TTCTTGGGCG
AGGAAACCGC
TTGGGCTATT
AGGAGGTCAG
GGGCCGCTTT CGAP GCCTGAACCT TCGI GCATCGAGAT GCTG TGGTGGCTTT GAAC CTTGGCGGCG TCG ATGGTTTCTT ATGI CGCTGAGCAG TTGC CG ATG AGC ATT Met Ser Ile
AGTCAT
TGACAT
;ACGTCA
AGCCTG
LAGCGAT
~GAATTT
CTCTAT
GGTGTTAGCC
AGGGCAGACG
GGATTTTTCC
ATGAAAGGTG
GCTCCACTAC
GTCTGGCATA
ATGGTTATTC
120 180 240 300 360 420 473 CTT CCT TTG AAT Leu Gly Leu Asn GCT CCC CCG Gly Ala Pro AAG AAG, GCG Lys Lys Ala GTC GGA GCT GAG Val Gly Ala Clu CAG CTG Gin Leu 15 GGG CCT Gly Pro GGC TCG GCT Gly Ser Ala
CTT
Leu
GAG
Glu GAT CGC ATG Asp Arg Met CTG CGT CTG Leu Arg Leu CAC CTG GAG Hius Leu Clu GCA AAC Ala Asn AGT AGG Ser Arg
TTG
Leu
GAA
Giu CTG CAT CGT Leu Asp Arg
CCG
Ala ATT GCA ATG CTT Ile Ala Met Leu AAT CGT GAA Asn Arg Giu
ATT
Ile CCC GAC CC Ala Asp Ala TCT GCT GAC TTT Ser Ala Asp Phe AAT CGC AGC CGT Asn Arg Ser Arg GAG CAA Ciu Gin 665 ACA CTC CTT Thr Leu Leu CCC GAG CAC Arg Ciu His TTT CCA CCC Phe Pro Cly
TGC
Cys ATT CCT GC Ile Ala Cly CTC CCA ACC CTC Val Ala Ser Leu CTC CCC AAA TCC Val Ala Lys Trp
ATC
Met 95
CTT
Val1 GAG CCC CAA CAT Clu Pro Ciu His
CAC
His 100
CTC
Leu AAC CAT AC Lys Asp Ser AAC CC ATC Lys Aia Met CCT CTC CTT Gly Val Val CC GAG CCA Ala Clu Ala 105 CCC CTC Cly Val
CC
Arg 110 GAG TTT CAC Giu Phe Gin
CCC
Pro ATT ACT CCC Ile Ser Pro 120
CTC
Leu TCC AAC TTC Trp Asn Phe 125 CCT ATC Pro Ile CTC CCC TTT CCC Leu Ala Phe Cly
CCC
Pro 135 857 905 CCC GC Ala Cly ATA TTC Ile Phe 140 CCC CCC Pro Arg 155 CCA CCA CCT AAT CCC CCC ATC CTC AAC Ala Ala Cly Asn Arg Ala Met Leu Lys 145 ACT TCT CCC CTC CTT CC GAG CTA ATT Thr Ser Ala Leu Leu Ala Ciu Leu Ile 160 165 CCC TCC Pro Ser 150 CCT CCT Ala Arg GAG CTT ACC Ciu Leu Thr -12- TAC TTC GAT Tyr Phe Asp 170 GAA ACT GAG CTG Giu Thr Glu Leu
ACT
Thr 175
CCT
Pro ACA GTG CTG GGC Thr Val Leu Gly
GAC
Asp 180
ATC
Ile GCT GAA GTC Ala Giu Val TTC ACC GGC Phe Thr Gly GGT GCG Gly Ala 185 GGC ACT Giv Thr CTG TTC AGT GCT Leu Phe Ser Ala
GAG
Gin 190
CAG
His TTG GAT CAT Phe Asp His
GTG
Leu 195
GG
Ala GGC GTG GCG Ala Val Ala
AAG
Lys 205
GAA
Glu ATG ATG GGT Ile Met Arg
GCG
Ala 210
TG
Ser GCG GAT AAC Ala Asp Asn
CTA
Leu 215 CCC GTT ACC Pro Val Thr TTG GGT GGG Leu Gly Gly
AAA
Lys 225
CAA
Gin CGG GTG ATG Pro Val Ile CGC AGT GGA Arg Ser Ala ACC TTG AAT Thr Phe Asn 250
GAT
Asp 235
GCC
Ala GGG GAC GTT Ala Asp Val
GCA
Ala 240
CTG
Leu GGG GTG TTG Arg Val Leu
AG
Thr 245
GTG
Val GTT TC Val Ser 230 GTG AAA Val Lys CTG GGG Leu 262 GGG CAA ATC Gly Gin Ile
TGT
Cys 255 GCA GCG GAG Ala Pro Asp
TAT
Tyr 260
GAGAGGCGGT
GCCGACATGG
CTTGTCGCCT
CGAACCCAGT
TCACGCAAGT
TTGATGGCTT
GCGCGTTACG
TTGCGTATTG
AAGCCATGAC
TGCGTATAAT
TGACATAAGC
GGTCGAGAAC
GTTATGAGTG
CCGTGGGTG
GGCGATGGA
AAACGGGATG
ATTTGCCGAT
GTGTTGGGTT
CTTGACCGAA
TTTTTTTGTA
ATGTTTGATG
TAAAAACTGT
ATGAACCTGA
GGACGCAGAC
CGTAAACTGT
CGGAGCGGTG
GAGTGTATGG
TTATGGAGCA
TGTAATTCAT
ATCGCCAGG
GGTGGAAACG
AATGCAAGTA
GTAACGGCGC
GTGGGGATC
GCAACG ATG Met 1
TAAGCATTCT
GCATGAGGAC
GATGAAGGCA
GCGTATGCGC
AGTGGCGGTT
CAAGCAGCAA
TTA CGC Leu Arg 1001 1049 1097 1145 1193 1241 1301 1361 1421 1481 1541 1601 1656 1704 1752 1800 1848 1896 1944 AGC AGC Ser Ser AAC GAT GTT AG Asn Asp Val Thr
GAG
Gin 10 GAG GGG ACT CG Gin Gly Ser Arg
GGT
Pro AAA ACA AAG TTA Lys Thr Lys Leu
GT
Gly
GAA
Gin GGG TGA AGT ATG Oly Ser Ser Met ATG ATT GG ACA Ile Ile Arg Thr AGG GTG GG GGT Arg Leu Gly Pro
GAG
Asp GTG AAA TGG Val Lys Ser
ATG
Met OCT GGT GTT Ala Ala Leu
GAT
Asp 45
CG
Gin GTT TTC GGT GGT Leu Phe Gly Arg GAG TTG Giu Phe OGA GAG GTA Oly Asp Val GGG AAG TTG Gly Asn Leu AGG TAG TGG GAA Thr Tyr Ser Gin CCG GAG TGG Pro Asp Ser GAT TAG GTG Asp Tyr Leu 0CC TTG GAG Ala Phe Asp GGT AGT AAG Arg Ser Lys ATG GGG GTT Ile Ala Leu
GGT
Ala CAA GAA GGG GTT GTT GGG GGT GTG GGG GGT TAG GTT GTG CCC AGG TTT 13- Gin Giu Ala Val Val Gly Ala 90 Leu Ala Ala Tyr Va1
CTC
Leu Leu Pro Arg Phe GAG CAG CCG Glu Gin Pro 100 GAG CAC CGG Glu His Arg CAT GAG GCC His Giu Ala TAC GGT GAC Tyr Gly Asp 150 GAA GAA GTG Glu Giu Val 165 CGT AGT Arg Ser AGG CAG Arg Gin 120 AAC GCG Asn Ala GAG ATC Glu Ile TAT ATC TAT Tyr Ile Tyr GCA GTC TCC Ala Val Ser 105
GGC
Gly
GGC
Gly 115 ATT GCC ACC Ile Ala Thr
GCG
Ala 125
GTG
Val ATC AAT CTC Ile Asn Leu CTC AAG Leu Lys 130 CTT GGT GCT Leu Giy Ala
TAT
Tyr 140
CTC
Leu ATC TAC GTG Ile Tyr Val CCC GCA GTG Pro Ala Val TAT ACA AAG Tyr Thr Lys
TTG
Leu 160
GCC
Ala CAA GCA GAT Gin Ala Asp 145 GGC ATA CGG Gly Ile Arg ACC TAA CAA Thr 177 ATG CAC TTT GAT Met His Phe Asp 170 GAC CCA AGT Asp Pro Ser
ACC
Thr 175 TTCGTTCAAG CCGAGATCGG CTTCCCTG CAA AGT CCT GTG GGT GAG TCG AAC Gin Ser Pro Val Gly Giu Ser Asn 451 455 1992 2040 2088 2136 2184 2236 2284 2338 2398 2458 2518 2578 2638 2698 2758 2775 TTG GCG Leu Ala 460 GTC CTC Val Leu 475
GGAGCATG
TCGAGCCT
ATCTTCGC
CCGATGTC
CCAGACAC
ATTTCGTC
TGTACGTC
CCAACCT9 FIG. 2e: ATG CGC GCA Met Arg Ala CTT TCA ACG Leu Ser Thr ;CG CTTCTGGGC 'GC ACAGCTGAJ CA TTCATGAC( CG ATTCCACG( ;CA ACAAATGA( AA TGTGACGA TT GGTGGCGA( ?GA GGAATTC Cc P2 C TAC GGA GAA GCG ATC CAC GGA CTG CTC TCT o0 Tyr Gly Giu Ala Ile His Gly Leu Leu Ser 465 470 GAG TGT TAG AACCGTTGGT AGTGGTTTTG GACGGGCCCA Glu Cys 480 481 C CGTTTCTTGA GTATTCATTG GATAGTCACG CGTGGTAGC' 'G AGCACCCTGG AAGGCGCGCT GTACGCGGAC GACTGGGTT( ;G AACTCCGTTC CCCAGTACCG CGATGACTAT TTTGCCTCT' C GCCTGACGCT AAGCGGGGGC GGGGGCGCCC GCATCCCAG( 3T AGGCTCTTGG ATGCCGCGGC GGCTGAGATT GGTAACGGC.
rG GATTCGATTG CCCGTGCTGC CGGCGTCTCA AAAAAAACGI ,C AAGGAAGAAC TCATTTCCCG GTTAGTGGCT CGAGACATG'
T
r
A
T
GAATTCCGCG
GGTAGGGTCT
TGCGTTTGCC
TTAACTCGCG
GTCTCGCCCT
CGATTAAGAT
CTCCAGCTCA
TATCGCCCGG
TTTTCTTGGC
GCTTCGCTTC
TAAGCATTCT
TTGAGGCCGA
AATTAAAATA
AGGGCAATTT
TTCTATCAGC
CATGCTTGTT
GCGATGAACC
GTCATTTTTT
TTCTTGGGCG
AGGAAACCGC
TTGGGCTATT
GGGCCGCTTT
GCCTGAACCT
GCATCGAGAT
TGGTGGCTTT
CTTGGCGGCG
ATGGTTTCTT
GGCTGAGCAG
CGAAAGTCAT
TCGTTGACAT
GCTGAGGTCA
GAACAGCCTG
TCGAAGCGAT
ATGTGAATTT
TTGCCTCTAT
GGTGTTAGCC
AGGGCAGAGG
GGATTTTTCC
ATGAAAGGTG
GCTCCACTAC
GTCTGGCATA
ATGGTTATTC
120 180 240 300 360 420 AGAATAACAA TTGACTCCTC AGGAGGTCAG CG ATG AGC ATT Met Ser Ile CTT GGT TTG AAT Leu Gly Leu Asn GGT GCC CCG Gly Ala Pro AAG AAG GCG Lys Lys Ala GTC GGA GCT GAG Val Gly Ala Glu CAG CTG Gin Leu 15 GGG CCT Gly Pro GGC TCG GCT Gly Ser Ala
CTT
Leu GAT CGC ATG Asp Arg Met CAC CTG GAG His Leu Glu GCA AAC Ala Asn AGT AGG Ser Arg
TTG
Leu
GAA
Glu GAG CTG CGT CTG Glu Leu Arg Leu CTG GAT CGT Leu Asp Arg
ATT
Ile
GCG
Ala 45
TCT
Ser GCA ATG CTT Ala Met Leu
CTG
Leu
AAT
Asn AAT CGT GAA Asn Arg Giu GCC GAC GCG Ala Asp Ala
GTT
Val1
GAC
Asp GCT GAC TTT Ala Asp Phe CGC AGC CGT Arg Ser Arg 665 713 ACA CTG CTT Thr Leu Leu CGC GAG CAC Arg Giu His TTT CCA GGG Phe Pro Gly ATT GCT GGC Ile Ala Gly GTG GCA AGC CTG Val Ala Ser Leu GTG GCC AAA TGG Val Ala Lys Trp
ATG
Met 95
GTT
Val1 GAG CCC GAA CAT Giu Pro Glu His
CAC
His 100
CTG
Leu AAG GAT AGC Lys Asp Ser AAG GCG ATG Lys Ala Met GGT GTC GTT Gly Val Val GCG GAG GCA Ala Giu Ala GAG TTT CAG Giu Phe Gin 105 GOG GTC Gly Val
CCG
Pro 115
CTG
Leu ATT AGT CCC Ile Ser Pro 120
CTG
Leu
TGG
Trp 125
GCA
Ala TTC CCT ATC Phe Pro Ilie GCC TTT GGG Ala Phe Gly
CCG
Pro 135 GCC GGC ATA Ala Gly Ile
TTC
Phe 140
CGG
Arg GCA GGT AAT Ala Gly Asn
CGC
Arg 145
CTT
Leu GCC ATG CTC AAG Ala Met Leu Lys CCG TCC Pro Ser 150 GCT CGT Ala Arg GAG CTT ACC Giu Leu Thr ACT TCT GCC Thr Ser Ala
CTG
Leu 160 GCG GAG CTA Ala Giu Leu 15 TAC TTC GAT Tyr Phe Asp 170 GGT GCG CTG Gly Ala Leu GAA ACT GAG CTG ACT ACA GTG CTG GGC GAC GCT GAA GTC Glu Thr Glu Leu Thr Thr Val Leu Gly Asp Ala Giu Val 175 180 TTC AGT GCT CAG CCT TTC GAT CAT CTG ATC TTC ACC GGC Phe Ser Ala Gin Pro Phe Asp His Leu Ile Phe Thr Gly 185 GGC ACT Glv Thr 195
GCG
Ala GCC GTG GCC Ala Val Ala ATC ATG CGT Ile Met Arg GCG GAT AAC Ala Asp Asn 200
GTG
Va1
CTA
Leu 215 CCC GTT ACC Pro Val Thr
CTG
Leu 220
ATG
Met TTG GGT GGC Leu Gly Gly
AAA
Lys 225 CGC AGT GCA Arg Ser Ala ACC TTC AAT Thr Phe Asn 250 TTG GCG ATG Leu Ala Met 460 GCG GAC GTT Ala Asp Val GCA CAA Ala Gin 240 TCG CCG GTG ATC GTT TCC Ser Pro Val Ile Val Ser 230 CGG GTG TTG ACG GTG AAA Arg Val Leu Thr Val Lys 245 CC GTG GGT GAG TCG AAC Val Gly Glu Ser Asn 454 455 ATC CAC GGA CTG CTC TCT Ile His Gly Leu Leu Ser 470 GGG CAA ATC Gly Gin Ile TGT CTG GCA Cys Leu Ala 255 257 GGA GAA GCG Gly Glu Ala 1001 1049 1097 1145 1193 1240 1288 1342 1402 1462 1522 1582 1642 1702 1762 1779 CGC GCA CCC Arg Ala Pro
TAC
Tyr 465 CTC CTT TCA ACG Leu Leu Ser Thr GAG TGT Glu Cys 480 481 TAG AACCGTTGGT AGTGGTTTTG GACGGGCCCA
GGAGCATGCG
TCGAGCCTGC
ATCTTCGCCA
CCGATGTCCG
CCAGACAGCA
ATTTCGTCAA
TGTACGTCTT
CCAACCTTGA
FIG. 2f:
CTTCTGGGCC
ACAGCTGATG
TTCATGACGG
ATTCCACGCC
ACAAATGAGT
TGTGACGATG
GGTGGCGAGC
GGAATTC
CGTTTCTTGA
AGCACCCTGG
AACTCCGTTC
GCCTGACGCT
AGGCTCTTGG
GATTCGATTG
AAGGAAGAAC
GTATTCATTG
AAGGCGCGCT
CCCAGTACCG
AAGCGGGGGC
ATGCCGCGGC
CCCGTGCTGC
TCATTTCCCG
GATAGTCACG
GTACGCGGAC
CGATGACTAT
GGGGGCGCCC
GGCTGAGATT
CGGCGTCTCA
GTTAGTGGCT
CGTGGTAGCT
GACTGGGTTC
TTTGCCTCTT
GCATCCCAGC
GGTAACGGCA
AAAAAAACGC
CGAGACATGT
CTGCAGCCGA
CCCGCGGCAC
CTCGCCTCAT
CAAGGCCAGT
AACCGCAGGC
CTCCGGTTGA
GCATCGATTG
TATCCAATCT
TTCAATCTCT
CGCGGAGAGT
ATCATCATGC
GGTTACGCAA
AGCACTTTAC
AAATCGATCT
AACTTGATAA
CTCGAAGAGG
TCTGCTCAGC
GACGCTGGAG
CCAGCTGCGC
TCGGGCGCCG
AAACAGAGCT
AGAGTACAGT
CACGCTACCG
GTATTGTCCG
TGGCTGACCA TTCAGAATGG CGGGCATCAT GCCCGCGGCG GTTCTCCGGT CTTGGTGGAT GAACGCCGAG TCCACATTGC CAGTGTGTCG ATTGGTCATC G ATG CGT TCT CTC GAG Met Arg Ser Leu Giu 120 180 240 300 356 GC CTT CTT CCC Ala Leu Leu Pro
TTC
Phe CCG GGT CGA ATT Pro Gly Arg Ile
CTT
Leu 15
GTT
Val1 GAG CGT CTC GAG Glu Arg Leu Giu CAT TGG His Trp, GCT AAG ACC Ala Lys Thr COG GAA TGG Cly Glu Trp GCC ATC GCA Ala Ile Ala CCA GAA CAA ACC Pro Glu Gin Thr
TGC
Cys
GCG
Ala GCT GCC AGG Ala Ala Arg CGT CGT ATC AGC Arg Arg Ile Ser GAA ATG TTC Glu Met Phe
CAC
His
GCA
Ala GCG GCA AAT Ala Ala Asn AAC GTC CGC Asn Val Arg GAG CGT CCG Glu Arg Pro CAG AGC TTG Gin Ser Leu TAC GGA CTA Tyr Gly Leu CTG CTT Leu Leu
TCG
Ser
CTT
Leu ATC GTC TCT Ile Val Ser
GGA
Gly
GGC
Gly GAC CTG GAA Asp Leu Clu
CAT
His
CCG
Pro CAG CTG GCA Gin Leu Ala 596 644 OCT ATO TAT Ala Met Tyr
GCG
Ala
CAA
Gin ATT CCC TAT Ile Pro Tyr
TGC
Cys 95
CTG
Leu GTC TCT CCT Val Ser Pro TCA CTO CTG Ser Leu Leu CTG CAA CCG Leu Gin Pro 120
TCG
Ser 105
GGA
Gly GAT TTG GCG Asp Leu Ala
AAG
Lys 110
GCC
Ala COT CAC ATC Arg His Ile
GTA
Val1 115
TTC
Phe GCT TAT Ala Tyr 100 GGT CTT Gly Leu CAG GGG Gin 132 CTG GTC TTT Leu Val Phe
GCT
Ala 125 GAT GCA GCA Asp Ala Ala
CCT
Pro 130 ACAGCAAGCG AACCGGAATT GCCAGCTGGO OCOCCCTCTG AAAGTAAACT GCATGGCTTT CTTOCCCCCA AOOATCTGAT
GTAAGGTTGG
GGCGCAGGGO
GAAGCCCTGC
ATCAAGATCT
GATCAAGAGA CAGGATGAGG ATCGTTTCGC ATG ATT OAA CAA GAT GGA TTG CAC Met Ile Glu Gin Asp Gly Leu His 1 GCA GOT TCT CCG GCC GCT TOO GTG GAG AGG CTA TTC 0CC TAT CAC TGG Ala Oly Ser Pro Ala Ala Trp Val Clu Arg Leu Phe Cly Tyr Asp Trp 15
GCA
Ala CAA CAG ACA ATC Gin Gin Thr Ile
GGC
Gly TGC TCT GAT GCC Cys Ser Asp Ala GTG TTC CGG CTG Val Phe Arg Leu GCG CAG GGG CGC Ala Gin Gly Arg GTT CTT TTT GTC Val Leu Phe Val ACC GAC CTG TCC Thr Asp Leu Ser GGT GCC Gly Ala CTG AAT GAA Leu Asn Glu ACG GGC GTT Thr Gly Val
CTG
Leu CAG GAC GAG GCA GCG CGG CTA TCG TGG Gin Asp Glu Ala Ala Arg Leu Ser Trp 65 CTG GCC ACG Leu Ala Thr GAA GCG GGA Glu Ala Gly CCT TGC GCA GCT Pro Cys Ala Ala
GTG
Val CTC GAC GTT GTC Leu Asp Val Val AGG GAC Arg Asp TGG CTG CTA TTG Trp Leu Leu Leu GAA GTG CCG GGG Glu Val Pro Gly
CAG
Gin 100 GAT CTC CTG TCA Asp Leu Leu Ser
TCT
Ser 105 CAC CTT GCT CCT His Leu Ala Pro GAG AAA GTA TCC Glu Lys Val Ser ATG GCT GAT GCA Met Ala Asp Ala
ATG
Met 120 CGG CG CCTG CAT Arg Arg Leu His
ACG
Thr 125 CTT GAT CCG GCT Leu Asp Pro Ala
ACC
Thr 130 TGC CCA TTC GAC Cys Pro Phe Asp CAC CAA His Gln 135 1010 1058 1106 1154 1202 1250 1298 1346 1394 1442 1490 1538 1586 1634 GCG AAA CAT Ala Lys His GTC GAT CAG Val Asp Gin 155 ATC GAG CGA GCA Ile Glu Arg Ala ACT CGG ATG GAA Thr Arg Met Glu GCC GGT CTT Ala Gly Leu 150 GCG CCA GCC Ala Pro Ala GAT GAT CTG GAC Asp Asp Leu Asp GAG CAT CAG GGG Glu His Gin Gly GAA CTG Glu Leu 170 TTC GCC AGG CTC Phe Ala Arg Leu
AAG
Lys 175 GCG CGC ATG CCC Ala Arg Met Pro
GAC
Asp 180 GGC GAG GAT CTC Gly Glu Asp Leu
GTC
Val 185 GTG ACC CAT GGC Val Thr His Gly
GAT
Asp 190 GCC TGC TTG CCG Ala Cys Leu Pro
AAT
Asn 195 ATC ATG GTG GAA Ile Met Val Glu GGC CGC TTT TCT Gly Arg Phe Ser TTC ATC GAC TGT Phe Ile Asp Cys
GGC
Gly 210 CGG CTG GGT GTG Arg Leu Gly Val GCG GAC Ala Asp 215 CGC TAT CAG Arg Tyr Gin GGC GGC GAA Gly Gly Glu 235
GAC
Asp 220 ATA GCG TTG GCT ACC CGT GAT ATT GCT Ile Ala Leu Ala Thr Arg Asp Ile Ala 225 GAA GAG CTT Glu Glu Leu 230 ATC GCC GCT Ile Ala Ala TGG GCT GAC CGC Trp Ala Asp Arg
TTC
Phe 240 CTC GTG CTT TAC Leu Val Leu Tyr
GGT
Gly 245 -18- CCC GAT TCG GAG CCC ATC GCC TTC TAT CGC CTT CTT GAG GAG TTC TTC Pro Asp Ser Gin Arg Ile Ala Phe Tyr Arg Leu Leu Asp Clu Phe Phe 250 255 260 264 TGACCGCGAC TCTGGGGTTC GAAATGACCG ACCAAGCGAC CCCCCT GTT TTG CAA Val Leu Gin 563 565 TGG CGG TCG GCG AAA GTT GAT GCG CTG TAT CGT GGT GAA GAT CAA TCC Trp Arg Ser Ala Lys Val Asp Ala Leu Tyr Arg Gly Glu Asp Gin Ser 570 575 580 ATG CTG CCT CAC GAG CC ACA CTC TGA CTTCGTCAGG CCCGGCTTAC Met Leu Arg Asp Glu Ala Thr Leu 585 589 TCGGCGTTTT CCCACACTGC CTTGGTTGCC CCAGTGCGCA CCCCCTGGAT TG GGTGCCCTCT CCCTGCTCTC CCCTATCCAC TTAGGGGTAA AGGTCCCTCG CCG ATCCGTCCGT CGCTTGAACC ACAAATGGTC GATACCGTAC TCGCAGCCTC TA CCAAGCTTTG ATCCTTACCT GCTCCCGCGC CACATTCGCT TGTACACCGG TG TCGGTTCCGG CCTTGGCGGT GCAGCCCATT TCCGGCACAG CCTTCGAACT C GCCGGCGAGC AGATTTCCCA ACCCGCTGAT CACGTCCTGT CTGTCCCGGG CT FIG. 2g: 1682 1737 1785 1832 1892 1952 2012 2072 2132 2188 AiTTGC CCC
ACTTCTG
TGCCTCAA
TTCCCAAG
TTCGGCAG
GCAG
CTGCAGCCGA
CCCGCGGCAC
CTCGCCTCAT
CAAGGCCAGT
AACCGCAGGC
CTCCGGTTGA
GCATCGATTG
TATCCAATCT
TTCAATCTCT
CGCGGAGAGT
ATCATCATGC
GGTTACGCAA
AGCACTTTAC
AAATCGATCT
AACTTGATAA
CTCGAAGAGG
TCTGCTCAGC
GACGCTGGAG
CCAGCTGCGC
TCGGGCGCCG
AAACAGAGCT
AGAGTACAGT
CACGCTACCG
GTATTGTCCG
TGGCTGACCA TTCAGAATGG CGGGCATCAT GCCCGCGGCG GTTCTCCGGT CTTGGTGGAT GAACGCCGAG TCCACATTGC CAGTGTGTCG ATTGGTCATC G ATG CGT TCT CTC GAG Met Arg Ser Leu Giu GCG CTT CTT CCC Ala Leu Leu Pro
TTC
Phe CCG GGT CGA Pro Gly Arg GCT AAG ACC Ala Lys Thr GGG GAA TGG Gly Glu Trp CCC ATC GCA Ala Ilie Ala CCA GAA CAA ACC Pro Giu Gin Thr ATT CTT Ile Leu 15 TGC GTT Cys Val GCG GAA Ala Ciu GAG CGT CTC GAG Giu Arg Leu Giu CAT TGG His Trp 404 452 GCT GCC AGG Ala Ala Arg CCT ATC AGC Arg Ile Ser ATG TTC Met Phe
CAC
His
GCA
Ala GCG GCA AAT Ala Ala Asn AAC GTC CGC Asn Val Arg GAG CGT CCG Giu Arg Pro CAG AGC TTG Gin Ser Leu CTC CTT Leu Leu
CTT
Leu 60
AAT
Asn TAC GCA CTA Tyr Cly Leu
TCG
Ser ATC GTC TCT Ile Val Ser GAC CTG GAA Asp Leu Giu CTT CAG CTG GCA Leu Gin Leu Ala
CCC
Cly GCT ATG TAT Ala Met Tyr
C
Ala
CAA
Gin ATT CCC TAT Ile Pro Tyr GTG TCT CCT Val Ser Pro TCA CTG CTG Ser Leu Leu CTG CAA CCG Leu Gin Pro 120
TCG
Ser 105
GGA
Gly GAT TTG GCG Asp Leu Ala
AAG
Lys 110
GCC
Ala CCT CAC ATC Arg His Ile CCT TAT Ala Tyr 100 GGT CTT Cly Leu CAG CCC Gin 132 CTC GTC TTT Leu Val Phe
GCT
Ala 125 CAT CCA CCA Asp Ala Ala
CCT
Pro 130
CACAGCCCT
CCCACATGC
CTTGTCCCCT
CGAACCCACT
TCACCCAACT
TTCATCGCTT
TTCCTATTC
AACCCATCAC
TGCGTATAAT
TGACATAAC
GCTCCAGAAC
CTTATCACTC
GGCGCATCCA
AAACCCCATC
ATTTGCCCAT
CTGTTCGCTT
CTTGACCCAA
TTTTTTTCTA
TAAAAACTCT
ATCAACCTCA
GGACGCACAC
CCTAAACTCT
CGCACCCTC
CAGTCTATC
TCTAATTCAT
ATCCCCAGCC
CGTCGAAACC
AATCCAAGTA
GTAACGCCGC
CTCGCCCATC
TAAGCATTCT
CCATCACCAC
CATGAAGGCA
CCTATGCGC
AGTGCCTT
CAAGCAGCAA
800 860 920 980 1040 1100 GCGCGTTACG CCGTGGGTCG ATGTTTGATG TTATGGAGCA GCAACG ATG TTA CGC Met Leu Arg 1 AGC AGC AAC GAT GTT ACG Ser Ser Asn Asp Val Thr CAG GGC AGT CGC Gin Gly Ser Arg
CCT
Pro AAA ACA AAG TTA Lys Thr Lys Leu
GGT
Gly GGC TCA AGT ATG Gly Ser Ser Met
GGC
Gly ATC ATT CGC ACA Ile Ile Arg Thr AGG CTC GGC CCT Arg Leu Gly Pro CAA GTC AAA TCC Gin Val Lys Ser CGG GCT GCT CTT GAT CTT TTC GGT CGT Arg Ala Ala Leu Asp Leu Phe Gly Arg GAG TTC Glu Phe GGA GAC GTA Gly Asp Val GGG AAC TTG Gly Asn Leu ACC TAC TCC CAA Thr Tyr Ser Gin CAG CCG GAC TCC Gin Pro Asp Ser GAT TAC CTC Asp Tyr Leu GCC TTC GAC Ala Phe Asp CTC CGT AGT AAG Leu Arg Ser Lys TTC ATC GCG CTT Phe Ile Ala Leu
GCT
Ala CAA GAA Gin Glu GCG GTT GTT GGC Ala Val Val Gly
GCT
Ala 90 CTC GCG GCT TAC Leu Ala Ala Tyr
GTT
Val CTG CCC AGG TTT Leu Pro Arg Phe 1155 1203 1251 1299 1347 1395 1443 1491 1539 1587 1635 1683 1735 1783
GAG
Glu 100 CAG CCG CGT AGT Gin Pro Arg Ser ATC TAT ATC TAT Ile Tyr Ile Tyr
GAT
Asp 110 CTC GCA GTC TCC Leu Ala Val Ser GAG CAC CGG AGG Glu His Arg Arg
CAG
Gin 120 GGC ATT GCC ACC Gly Ile Ala Thr CTC ATC AAT CTC Leu Ile Asn Leu CTC AAG Leu Lys 130 CAT GAG GCC His Glu Ala TAC GGT GAC Tyr Gly Asp 150
AAC
Asn 135 GCG CTT GGT GCT Ala Leu Gly Ala
TAT
Tyr 140 GTG ATC TAC GTG Val Ile Tyr Val CAA GCA GAT Gin Ala Asp 145 GGC ATA CGG Gly Ile Arg GAT CCC GCA GTG Asp Pro Ala Val
GCT
Ala 155 CTC TAT ACA AAG Leu Tyr Thr Lys
TTG
Leu 160 GAA GAA Glu Glu 165 GTG ATG CAC TTT Val Met His Phe ATC GAC CCA AGT Ile Asp Pro Ser ACC GCC ACC TAA CAA Thr Ala Thr 175 177 TTCGTTCAAG CCGAGATCGG CTTCCCCT GTT TTG CAA TGG CGG TCG GCG AAA Val Leu Gin Trp Arg Ser Ala Lys 563 565 570 GTT GAT GCG CTG TAT CGT GGT GAA GAT CAA TCC ATG CTG CGT GAC GAG Val Asp Ala Leu Tyr Arg Gly Glu Asp Gin Ser Met Leu Arg Asp Glu 575 580 585 -21- GCC ACA CTG TCA GTTGGTCAGG GGGGGCTTAC TCGGCGTTTT CCGACACTGC Ala Thr Leu 589 GTTGGTTGCG GCAGTGCGCA CCCCCTGGAT TGATTGCGGG GGTGCCCTGT CGCTGGTGTC GCCTATCGAC TTAGGGGTAA AGGTCGCTCG CGAAGTTCTG ATGCGTGCGT CGCTTGAACC ACAAATGGTC GATAGCGTAC TCGCAGGCTC TATGGCTCAA GCAAGCTTTG ATGCTTACCT GCTCCCGCGG CACATTGGCT TGTACAGCGG TGTTCCCAAG TCGGTTCCGG CCTTGGGGGT GCAGCGCATT TGCGGCACAG GCTTCGAACT GCTTCGGCAG GCCGGCGAGC AGATTTCCCA AGGCGCTGAT CACGTGCTGT GTGTCGCGGG CTGCAG FIG. 2h: 1835 1895 1955 2015 2075 2135 2171 -22- CTGCAGCCGA GCATCGATTG AGCACTTTAC CCAGCTGCGC TGGCTGACCA TTCAGAATGG
CCCGCGGCAC
CTCGCCTCAT
CAAGGCCAGT
AACCGCAGGC
CTCCGGTTGA
TATCCAATCT
TTCAATCTCT
CGCGGAGAGT
ATCATCATGC
GGTTACGCAA
AAATCGATCT
AACTTGATAA
CTCGAAGAGG
TCTGCTCAGC
GACGCTGGAG
TCGGGCGCCG
AAACAGAGCT
AGAGTACAGT
CACGCTACCG
GTATTGTCCG
CGGGCATCAT GCCCGCGGCG GTTCTCCGGT CTTGGTGGAT GAACGCCGAG TCCACATTGC CAGTGTGTCG ATTGGTCATC G ATG CGT TCT CTC GAG Met Arg Ser Leu Glu GCG CTT CTT CCC Ala Leu Leu Pro
TTC
Phe
CCA
Pro CCG GGT CGA ATT Pro Gly Arg Ile GAG CGT CTC GAG Glu Arg Leu Glu CAT TGG His Trp GCT AAG ACC Ala Lys Thr GGG GAA TGG Gly Glu Trp GCC ATC GCA Ala Ile Ala GAA CAA ACC Glu Gln Thr
TGC
Cys
GCG
Ala GCT GCC AGG Ala Ala Arg CGT ATC AGC Arg Ile Ser GAA ATG TTC Glu Met Phe
CAC
His
GCA
Ala GCG GCA AAT Ala Ala Asn AAC GTC CGC Asn Val Arg GAG CGT CCG Glu Arg Pro CAG AGC TTG Gln Ser Leu TAC GGA CTA Tyr Gly Leu CTG CTT Leu Leu
TCG
Ser
CTT
Leu ATC GTC TCT Ile Val Ser
GGA
Gly
GGC
Gly GAC CTG GAA Asp Leu Glu
CAT
His
CCG
Pro CAG CTG GCA Gln Leu Ala 596 644 GCT ATG TAT Ala Met Tyr ATT CCC TAT Ile Pro Tyr
TGC
Cys
CTG
Leu GTG TCT CCT Val Ser Pro TCA CTG CTG Ser Leu Leu CTG CAA CCG Leu Gln Pro 120 GCT GTT TTG Ala Val Leu
TCG
Ser 105
GGA
Gly
CAA
Gln 565
TCC
Ser GAT TTG GCG Asp Leu Ala CGT CAC ATC Arg His Ile CTG GTC TTT Leu Val Phe TGG CGG TCG Trp Arg Ser ATG CTG CGT Met Leu Arg
GCT
Ala 125
GCG
Ala
GAC
Asp 585 GAT GCA GCA Asp Ala Ala
CCT
Pro 130
CTG
Leu
GTA
Val 115
TTC
Phe
TAT
Tyr 575 GCT TAT Ala Tyr 100 GGT CTT Gly Leu CAG CGC Gln Arg 133 CGT GGT Arg Gly 562
GAA
Glu AAA GTT GAT GCG Lys Val Asp Ala 570 GAG GCC ACA CTG Glu Ala Thr Leu 589 GAT CAA Asp Gin 580 TGA GTTGGTCAGG GGGGGCTTAC TCGGCGTTTT CCGACACTGC GTTGGTTGCG TGATTGCGGG GGTGCCCTGT CGCTGGTGTC GCCTATCGAC GCAGTGCGCA CCCCCTGGAT TTAGGGGTAA AGGTCGCTCG 23 CGAAGTTCTG ATGCGTGCGT CGCTTGAACC ACAAATGGTC GATAGCGTAC TCGCAGGCTC 1017 TATGGCTCAA GCAAGCTTTG ATGCTTACCT GCTCCCGCGG CACATTGGCT TGTACAGCGG 1077 TGTTCCCAAG TCGGTTCCGG CCTTGGGGGT GCAGCGCATT TGCGGCACAG GCTTCGAACT 1137 GCTTCGGCAG GCCGGCGAGC AGATTTCCCA AGGCGCTGAT CACGTGCTGT GTGTCGCGGG 1197 CTGCAG 1203 FIG. 2i: -24- GAATTCCCCT GGCGACGAAA GGGCGGCAGG GCTTGGGTTA ATCGTTAACC GTTTGAAATT GGGTACGCCT TTGCGTGCGC TTTGATCTGC AATTGAGAGA ACTATAGGTT CGCAGTAGCT GGTGCACG ATG AAT AGC TAG GAT GGG Met Asn Ser Tyr Asp Gly 1 5 CCGCATGGCC ACGGCTGGGC GGTAACTGAT GCTTGCCAAA TTTCGGGAG AGAATCATGG GCTTCCGTGG CTTGAATCAG AAAAATAGTT TTTGCTCACC CACCAAATCG ACAGGACTGG GGT TGG TCT ACC GTT GAT GTG AAG Arg Trp Ser Thr Val Asp Val Lys
GTT
Val
AAC
Asn GAA GAA GGT ATC Glu Glu Gly Ile
GCT
Ala 20 TGG GTG AGG CTG Trp Val Thr Leu GCA ATG AGC Ala Met Ser
GCA
Pro
GAG
Asp ACT GTC AAT GGA Thr Leu Asn Arg AAC CGC CCG GAG AAG GGC Asn Arg Pro Giu Lys Arg 25 ATG GTG GAG GTT GTG GAG Met Val Glu Val Leu Glu GTT GTT GTG AGT GGT GCA Leu Val Leu Thr Gly Ala 120 180 240 290 338 386 434 482 531 591 651 703 GTG GTG GAG GAG Val Leu Glu Gin GGG GAA TCC TGG Gly Glu Ser Trp ACC GAT GGT GGG Thr Asp Ala Gly GCA GAT GCT Ala Asp Ala
CGC
Arg 55 ACC GGG GGG ATG Thr Ala Gly Met 70 CCC GAA ATT GTG Pro Glu Ile Leu 85 GAC CTG AAG Asp Leu Lys GAA GAG AAG Gin Glu Lys AAGCGAACCG GAATTGCGAG GTGGGGCGCC CTCTGGTAAG AAACTGGATG GGTTTGTTGC CGCCAAGGAT GTGATGGCGC AGAGAGAGGA TGAGGATGGT TTCGC ATG ATT GAA CAA Met Ile Giu Gin 1 GGT TGT CCG GCC GGT TGG GTG GAG AGG GTA TTG Gly Ser Pro Ala Ala Trp Val Giu Arg Leu Phe 15 20 CAA GAG ACA ATC GGG TGG TGT GAT GCC GCC GTG Gin Gin Thr Ile Gly Gys Ser Asp Ala Ala Val 35 GAG GGG GGC CCG GTT CTT TTT GTG AAG ACC GAG Gin Gly Arg Pro Vai Leu Phe Val Lys Thr Asp 50 AAT GAA CTG GAG GAG GAG GCA GGG GGG GTA TCG Asn Giu Leu Gin Asp Glu Ala Ala Arg Leu Ser 65 GAG TAT TTC CGC GAG Glu Tyr Phe Arg Glu ATT CGT GGGGGAGAGC Ile Arg 90 91 GTTGGGAAGC CCTGCAAAGT AGGGGATGAA GATCTGATCA GAT GGA TTG CAC GCA Asp Gly Leu His Ala GGC TAT GAG TGG GCA Gly Tyr Asp Trp Ala TTC CGG CTG TGA GCG Phe Arg Leu Ser Ala GTG TCC GGT GCG CTG Leu Ser Gly Ala Leu TGG GTG GCC ACG ACG Trp Leu Ala Thr Thr 799 GGC GTT Gly Val CCT TGC GCA GCT Pro Cys Ala Ala
GTG
Val 80 CTC GAC GTT GTC Leu Asp Val Val
ACT
Thr GAA GCG GGA AGG Glu Ala Gly Arg
GAC
Asp TGG CTG CTA TTG Trp Leu Leu Leu GAA GTG CCG GGG Glu Val Pro Gly GAT CTC CTG TCA Asp Leu Leu Ser CAC CTT GCT CCT His Leu Ala Pro
GCC
Ala 110 GAG AAA GTA TCC Glu Lys Val Ser ATG GCT GAT GCA Met Ala Asp Ala ATG CGG Met Arg 120 CGG CTG CAT Arg Leu His AAA CAT CGC Lys His Arg 140
ACG
Thr 125 CTT GAT CCG GCT Leu Asp Pro Ala
ACC
Thr 130 TGC CCA TTC GAC Cys Pro Phe Asp CAC CAA GCG His Gln Ala 135 GGT CTT GTC Gly Leu Val ATC GAG CGA GCA Ile Glu Arg Ala ACT CGG ATG GAA Thr Arg Met Glu
GCC
Ala 150 GAT CAG Asp Gln 155 GAT GAT CTG GAC Asp Asp Leu Asp GAG CAT CAG GGG Glu His Gln Gly
CTC
Leu 165 GCG CCA GCC GAA Ala Pro Ala Glu
CTG
Leu 170 TTC GCC AGG CTC Phe Ala Arg Leu
AAG
Lys 175 GCG CGC ATG CCC Ala Arg Met Pro GGC GAG GAT CTC Gly Glu Asp Leu GTG ACC CAT GGC Val Thr His Gly
GAT
Asp 190 GCC TGC TTG CCG Ala Cys Leu Pro
AAT
Asn 195 ATC ATG GTG GAA Ile Met Val Glu AAT GGC Asn Gly 200 1039 1087 1135 1183 1231 1279 1327 1375 1423 1471 1525 1573 CGC TTT TCT Arg Phe Ser TAT CAG GAC Tyr Gln Asp 220
GGA
Gly 205 TTC ATC GAC TGT Phe Ile Asp Cys
GGC
Gly 210 CGG CTG GGT GTG Arg Leu Gly Val ATA GCG TTG GCT Ile Ala Leu Ala CGT GAT ATT GCT Arg Asp Ile Ala
GAA
Glu 230 GCG GAC CGC Ala Asp Arg 215 GAG CTT GGC Glu Leu Gly GCC GCT CCC Ala Ala Pro GGC GAA Gly Glu 235 TGG GCT GAC CGC Trp Ala Asp Arg CTC GTG CTT TAC Leu Val Leu Tyr GGT ATC Gly Ile 245
GAT
Asp 250 TCG CAG CGC ATC GCC TTC TAT CGC CTT Ser Gin Arg Ile Ala Phe Tyr Arg Leu GAC GAG TTC Asp Glu Phe TTC TGA Phe 264 GCGGGACTCT GGGGTTCGAA ATGACCGACC AAGCGACGCC CC GAG CAG GGC ATG Glu Gin Gly Met 255 GGC TTG CAG ACC TAC Gly Leu Gin Thr Tyr 270 AAG CAG Lys Gin 260 TTC CTT GAC GAG Phe Leu Asp Glu AAA AGC ATC AAG CCG Lys Ser Ile Lys Pro 265 26 AAG CGC TGA TAAATGCGCC GGGGCCCTCG CTGCGCCCCC GGCCTTCCAA TAATGACAAT 1632 Lys Arg 275 276 AATGAGGAGT GCCCAATGTT TCACGTGCCC CTGCTTATTG GTGGTAAGCC TTGTTCAGCA 1692
TCTGATGAGC
GCTGCTGCCA
GAATGGGCGG
CTAGAGGACC
TGGTATGGGT
FIG. 2j:
GCACCTTCGA
GTTTGGAAGA
CGCTTGCTCC
GTTCTTCCGA
TTAACGTTTA
GCGTCGTAGC
TGCGGACGCC
GAGCGAACGC
GTTCACCGCC
CCTGGCGGCG
CCGCTGACCG
GCAGTGGCCG
CGTGCCCGAC
GCAGCGAGTG
GGCATGTTGC
GAGAAGTGGT ATCGCGCGTC CTGCACAGGC TGCGTTTCCT TGCTGCGAGC GGCGGATCTT AAACTGGCGC AGCGGGAAAC
GGGGAATTC
1752 1812 1872 1932 1981 27 GAATTCCCCT GGCGACGAAA GGGCGGCAGG GCTTGCGTTA ATCGTTAACC GTTTGAAATT GGGTACGCCT TTCCGTGCGC TTTGATCTGC AATTGACAGA ACTATAGGTT CGCAGTAGCT GGTGCACG ATG AAT AGC TAC GAT GGC Met Asn Ser Tyr Asp Gly 1 5 CCGCATGGCC ACGGCTGGGC GGTAACTGAT CCTTGCCAAA TTTCGGCGAG AGAATCATGC GCTTCCGTGC CTTGAATCAG AAAAATAGTT TTTGCTCACC CACCAAATCC ACAGCACTGG CGT TGG TCT ACC GTT GAT GTG AAG Arg Trp Ser Thr Val Asp Val Lys 120 180 240 290 338
GTT
Val
AAC
Asn GAA GAA GGT ATC Glu Glu Gly Ile TGG GTC ACG CTG Trp Val Thr Leu GCA ATG AGC Ala Met Ser
CCA
Pro
CAC
Asp ACT CTC AAT Thr Leu Asn GCA GAT GCT Ala Asp Ala GTG CTG GAG CAG Val Leu Glu Gln GGC GAA TCC TGG Gly Glu Ser Trp CGA GAG Arg Glu 40 CGC CTC Arg Val 55 GAC CTG Asp Leu
AAC
Asn 25
ATG
Met
CTT
Leu CGC CCC GAG AAG CGC Arg Pro Glu Lys Arg GTC GAG GTT CTC GAG Val Glu Val Leu Glu GTT CTC ACT GGT GCA Val Leu Thr Gly Ala ACC GCG GGC ATG Thr Ala Gly Met 70 AAG GAG TAT Lys Glu Tyr TTC CGC GAG Phe Arg Glu ACC GAT GCT Thr Asp Ala
GCGGTTTGCC
CATGGAAGCC
CCCCTTCT
CCACTTCACA
CAACTGGTCC
CCCTTCTTAT
TTACCCCTG
P CCC CCC GAA ATT CTG CAA GAG AAC i Cly Pro Glu Ile Leu Gin Ciu Lys
TATTCGCC
ATCACAAACG
ATAATATTTG
TAAGCCTGTT
AGAACCTTGA
CACTCTTTTT
GGTCGATCTT
ATGCATAAAA
GCATGATGAA
CCCATCCACC
CGGTTCGTAA.
CCGAACGCAG
TTCTACACTC
TCATGTTATG
ACTGTTGTAA
CCTGAATCGC
CACACCCTCC
ACTGTAATGC
CGGTGGTAAC
TATCCCTCG
GAGCAGCAAC
ATT CCT CCCCCCACAC Ile Arg 90 91 TTCATTAAGC ATTCTGCCCA CACCCATC ACCACCTTCT AAACGGATGA AGGCACGA.AC AAGTAGCGTA TCCTCACC GCCCACTCC CCCTTTTCAT GCATCCAAGC AGCAAGCGCG C ATG TTA CCC ACC AC Met Leu Arg Ser Ser 1 ACA AAC TTA CCT GC Thr Lys Leu Gly Cly CCC CCT CAC CAA CTC Cly Pro Asp Gin Val 531 591 651 711 771 831 891 947 995 1043 AAC CAT CTT ACC CAC CAC CCC ACT CCC CCT AAA Asn Asp Val Thr Gin Gin Cly Ser Arg Pro Lys 15 TCA ACT ATC CCC ATC ATT CCC ACA TCT ACC CTC Ser Ser Met Cly Ilie Ile Arg Thr Cys Arg Leu 30 -28- AAA TCG ATG CGG GCT GCT CTT GAT GTT TTG Lys Ser Met Arg Ala Ala Leu Asp
CAG
Gin Leu Phe GGT CGT GAG TTG GGA GAG Gly Arg Glu Phe Gly Asp GTA GCC Val Ala TTG GTC Leu Leu TAC TCC CAA Tyr Ser Gin GGG GAC TC Pro Asp Ser
CAT
Asp
CC
Ala CTG GGC AAC Leu Gly Asn GGT AGT AAG Arg Ser Lys
AGA
Thr 75 TTC ATC CG CTT Phe Ile Ala Leu
GCG
Ala
GCT
Ala 80
CTG
Leu TTC GAC CAA Phe Asp Gln
GAA
Glu GTT GTT GGC Val Val Gly CTC GCG GCT TAC Leu Ala Ala Tyr CCC AGG TTT Pro Arg Phe GAG CAG Glu Gln 100 CCG CGT AGT Pro Arg Ser CGG AGG CAG Arg Arg Gln 120 GCC AAC GCG Ala Asn Ala
GAG
Ciu 105
GGG
Gly TAT ATC TAT Tyr Ile Tyr
GAT
Asp 110
GTG
Leu CA GTG TGG Ala Val Ser ATT CG AGG Ile Ala Thr
GG
Ala 125
CTG
Val1 ATG AAT GTG Tie Asn Leu
GTG
Leu 130
CA
Ala GGG GAG GAG Gly Glu His 115 AAG GAT GAG Lys His Giu GAT TACGGCT Asp Tyr Cly 1091 1139 1187 1235 1283 1331 1379 1427 1476 GTT CCT GGT Leu Cly Ala 135 CAC CAT Asp Asp 150
TAT
Tyr 140
GTG
Leu ATG TACGCTC Ile Tyr Val CCC CA CTC Pro Ala Val TAT ACA AAC Tyr Thr Lys
CAA
Gin 145
GC
Cly
ACC
Thr 177 ATA CCC CAA CAA Ile Arg Giu Clu 165 TAA CAATTGTTC CTC ATGCGAG TTT Val Met His Phe
CAT
Asp 170 GC GGA ACT Asp Pro Ser
ACC
Thr 175 AACCGACAT CGGTTGGGCC GAC CCC ATC AAGCGAG TTG GTT GC GAG 1526 Clu Gin Cly Met Lys Gin Phe Leu Asp Giu 255 260 AAA AC ATG AAC CCC CCC TTGCGAG ACC TAG AAC Lys Ser Ile Lys Pro Cly Leu Gin Thr Tyr Lys 265 270 275 GGGCCCGTCG CTGCGGGGG GGGGTTGGAA TAATGAGAAT TGACGTGGGG GTGGTTATTC CTCCTA.AC TTCTTCGCA CGCGTAC GCGCTGAGGG CAGAACTCCT ATCGCGCGC TCGCG GACTGCGC GTGGAGCCG CTCCCTCGTTTGGT CACGAACGGCCGGCGC TCGCGAC GCGATGTT CCC TCA TAAATCGCG Arg 276 AATCACGACT GGGGAATGTT TTGATCAC G CAGGTTGGA CGTGCA CTTTCGAACA GAATCCCGCGG GCGTTCTGG GTACAAGGC CTTTTCGA 1575 1635 1695 1755 1815 1875 -29 GTTCACCGCC GCAGCGAGTG AAACTGGCGC AGCGGGAAAC TGGTATGGGT TTAACGTTTA 1935 CCTGGCGGCG GGCATGTTGC GGGGAATTC 1964 FIG. 2k: 30 GAATTCCCCT GGCGACGAAA GGGCGGCAGG CCGCATGGCC ACGGCTGGGC GGTAACTGAT GCTTGCGTTA ATCGTTAACC GTTTGAAATT CCTTGCCAAA TTTCGGCGAG AGAATCATGC GGGTACGCCT TTCCGTGCGC TTTGATCTGC GCTTCCGTGC CTTGAATCAG AAAAATAGTT AATTGACAGA ACTATAGGTT CGCAGTAGCT TTTGCTCACC CACCAAATCC ACAGCACTGG GGTGCACG ATG AAT AGC TAC GAT GGC CGT TGG TCT ACC GTT GAT GTG AAG Met Asn Ser Tyr Asp Gly Arg Trp Ser Thr Val Asp Val Lys
GTT
Val
AAC
Asn GAA GAA GGT ATC Glu Glu Gly Ile
GCT
Ala 20
ACT
Thr TGG GTC ACG Trp Val Thr CTC AAT CGA Leu Asn Arg CTG AAC Leu Asn GAG ATG Glu Met CGC CCG GAG AAG Arg Pro Glu Lys GCA ATG AGC Ala Met Ser GTC GAG GTT Val Glu Val
CTG
Leu GTG CTG GAG Val Leu Glu GGC GAA TCC Gly Glu Ser ACC GAT GCT Thr Asp Ala
CAG
Gln
TGG
Trp
GGC
Gly GCA GAT GCT Ala Asp Ala GTG CTT GTT CTG Val Leu Val Leu CTG AAG GAG TAT Leu Lys Glu Tyr ACC GCG GGC Thr Ala Gly CCC GAA ATT Pro Glu Ile 85 TTC CTT GAC Phe Leu Asp
ATG
Met 70 ACT GGT GCA Thr Gly Ala TTC CGC GAG Phe Arg Glu CGC GAG CAG Arg Glu Gln 92 255 GGC TTG CAG Gly Leu Gln 270 482 530 CTG CAA GAG AAG Leu Gln Glu Lys ATT CGT Ile Arg 90 GGC ATG AAG CAG Gly Met Lys Gln 260 ACC TAC AAG CGC Thr Tyr Lys Arg 275 276 GAG AAA Glu Lys 265 AGC ATC AAG CCG Ser Ile Lys Pro TGA TAAATGCGCC GGGGCCCTCG CTGCGCCCCC GGCCTTCCAA
TAATGACAAT
TTGTTCAGCA
ATCGCGCGTC
TGCGTTTCCT
GGCGGATCTT
AGCGGGAAAC
FIG. 2l:
AATGAGGAGT
TCTGATGAGC
GCTGCTGCCA
GAATGGGCGG
CTAGAGGACC
TGGTATGGGT
GCCCAATGTT
GCACCTTCGA
GTTTGGAAGA
CGCTTGCTCC
GTTCTTCCGA
TTAACGTTTA
TCACGTGCCC
GCGTCGTAGC
TGCGGACGCC
GAGCGAACGC
GTTCACCGCC
CCTGGCGGCG
CTGCTTATTG
CCGCTGACCG
GCAGTGGCCG
CGTGCCCGAC
GCAGCGAGTG
GGCATGTTGC
GTGGTAAGCC
GAGAAGTGGT
CTGCACAGGC
TGCTGCGAGC
AAACTGGCGC
GGGGAATTC
GAATTCCAAT AATGACAATA ATGAGGAGTG CCCA ATG TTT CAC GTG CCC CTG CTT Met Phe His Val Pro Leu Leu
ATT
Ile
CCT
Arg
TTG
Leu
GAA
Glu
GCG
Ala
AGT
Ser
CC
Ala
GGC
Gly 120
CGA
Arg
GTA
Val
ACC
Thr
ATT
Ile
GGT
Gly
AGC
Ser
GAA
Glu
TGG
Trp
GCG
Ala
GAA
Glu
GCG
Ala 105
GAT
Asp
CAC
Gln
ATC
Ile
GTG
Val
GGT
Gly 185
GT
Gly
CCC
Pro
CAT
Asp
C
Ala
CAT
Asp
ACT
Thr
GC
Gly
GTC
Val
CCA
Pro
CTT
Leu
CTC
Val 170
CAC
Gln
AAC
Lys
CTC
Leu
CC
Ala
C
Ala
CTT
Leu
GC
Gly
ATG
Met
ATT
Ile
TGT
Cys
GC
Gly 155
TTC
Leu
CTC
Val1 CCT TCT Pro Cys ACC GCA Thr Gly GAC GCC Asp Ala 45 CTT CCT Leu Ala CTA GAG Leu Glu CCA CC Ala Ala TTC CCC Leu Arg CCC TCC Pro Ser 125 CCC GTC Cly Val 140 CTA CCC Val Arg AAA AC Lys Ser TTG CAT Leu His
TCA
Ser
CAA
Glu 30
CCA
Ala
CCC
Pro
CAC
Asp
GCA
Gly
CAA
Glu 110
AAT
Asn
CTG
Val
GCT
Ala
TCT
Ser
CAT
Asp 190 CCA TCT Ala Ser 15 CTC CTA Val Val GT CC Val Ala ACC CAA Ser Clu CCT TCT Arg Ser 80 AAC TCC Asn Trp 95 CCC CC Ala Ala CTG CCC Val Pro CTC GCT Leu Cly CTT CC Val Ala 160 GAG CTG Glu Leu 175 CCT GGT Ala Cly
CAT
Asp
TCC
Ser
GCT
Ala
CC
Arg 65
TCC
Ser
TAT
Tyr
CC
Ala
GGT
Gly
ATT
Ile 145
ATG
Met
ACT
Ser
CTC
Leu GAG CCC ACC TTC GAG CCT Clu
CC
Arg
CCA
Ala 50
CCT
Arg
GAG
Glu
CCC
Gly
ATC
Met
AC
Ser 130
CC
Ala
CCC
Pro
CCC
Pro
CCC
Gly Arg Thr CTC CCT Val Ala CAC CCT Gin Ala CCC CGA Ala Arg TTC ACC Phe Thr TTT AAC Phe Asn 100 ACC ACA Thr Thr 115 TTT GCC Phe Ala CCT TGG Pro Trp TTC CCA Leu Ala TTT ACC Phe Thr 180 CAT GC Asp Gly 195 Phe
GCT
Ala
CC
Ala
CTC
Leu
CC
Ala
GTT
Val
CAC
Gln
ATC
Met
AAT
Asn
TC
Cys 165
CAT
His
CTC
Val Glu
CC
Ala
TTT
Phe
CTG
Leu
GCA
Ala
TAC
Tyr
ATT
Ile
CC
Ala
CCT
Ala 150
GC
Gly
CC
Arg
GTC
Val Arg
ACT
Ser
CCT
Pro
CGA
Arg
CC
Ala
CTG
Leu
CAC
Gln
GTT
Val 135
CCC
Pro
AAT
Asn
CTG
Leu
AAT
Asn 535 CTC ATC AGC AAT GCC CCC CAA CAC CCT CCT CC GTC CTC GAG CGA CTC Ile Ser Asn Ala Pro 205 Gin Asp Ala Pro Val Val Glu Arg Leu 215 -32- ATT GCA AAT CCT Ile Ala Asn Pro OTA CGT CGA Val Arg Arg GTO AAC Val Asn 225 TTC ACC GOT TCG Phe Thr Gly Ser ACC CAC Thr His 230 GTT GGA CGG Val Gly Arg GTG CTG GAA Val Leu Glu 250
ATC
Ile 235 ATT GGT GAG CTG Ile Gly Glu Leu
TCT
Ser 240 GCG CGT CAT CTG Ala Arg His Leu AAG CCT GCT Lys Pro Ala 245 GAC GAT GCC Asp Asp Ala TTA GGT GGT AAG Leu Gly Gly Lys
GCT
Ala 255 CCG TTC TTG GTC Pro Phe Leu Val
TTG
Leu 260 GAC CTC Asp Leu 265 GAT GCG GCG Asp Ala Ala GTC GAA Val Glu 270 GCG GCG GCC TTT Ala Ala Ala Phe
GGT
Gly 275 GCC TAC TTC AAT Ala Tyr Phe Asn
CAG
Gln 280 GGT CAA ATC TGC Gly Gln Ile Cys
ATG
Met 285 TCC ACT GAG CGT Ser Thr Glu Arg ATT GTG ACA GCA Ile Val Thr Ala
GTC
Val 295 GCA GAC GCC TTT Ala Asp Ala Phe
GTT
Val 300 GAA AAG CTG GCG Glu Lys Leu Ala
AGG
Arg 305 AAG GTC GCC ACA CTG CGT Lys Val Ala Thr Leu Arg 310 GCT GGC GAT Ala Gly Asp GCC AAT GCA Ala Asn Ala 330 AAT GAT CCG CAA Asn Asp Pro Gln
TCG
Ser 320 OTC TTO GGT Val Leu Gly TC0 TTG ATT OAT Ser Leu Ile Asp 325 OAT GCG CTC GGG Asp Ala Leu 340 342 GGT CAA COC ATC Gly Gin Arg Ile OTT CTG GTC OAT Val Leu Val Asp OACAGCAAOC OAACCGOAAT TGCCAGCTGG GGCOCCCTCT OGTAAGGTTG OGAAOCCCTO CAAAGTAAAC TGGATGGCTT TCTTOCCGCC AAGGATCTOA TGGCGCAGOG OATCAAGATC TGATCAAGAG ACAGGATOAG GATCOTTTCG C ATO ATT OAA CAA OAT OGA TTG Met Ile Glu Gin Asp Gly Leu 1015 1063 1123 1183 1235 1283 1331 1379 1427 CAC OCA GGT His Ala Oly TCT CCG GCC OCT Ser Pro Ala Ala
TGG
Trp 15 GTG GAG AGO CTA TTC GOC TAT GAC Val Glu Arg Leu Phe Gly Tyr Asp TGG GCA Trp Ala CAA CAG ACA ATC Gin Gin Thr Ile TGC TCT OAT 0CC Cys Ser Asp Ala GTG TTC COG CTG Val Phe Arg Leu TCA GCG CAG GGG COC CCG GTT CTT TTT GTC AAG ACC GAC CTG TCC GGT Ser Ala Gin Gly Arg Pro Val Leu Phe Val Lys Thr Asp Leu Ser Gly 45 50 0CC CTO AAT GAA CTG CAG GAC GAG GCA GCG CGG CTA TCG TOG CTG 0CC Ala Leu Asn Olu Leu Gin Asp Glu Ala Ala Arg Leu Ser Trp Leu Ala 65 -33- ACG ACG GGC Thr Thr Gly GGA AGG GAC Gly Arg Asp CCT TGC GCA GCT Pro Cys Ala Ala
GTG
Val CTC GAC GTT GTC Leu Asp Val Val ACT GAA GCG Thr Glu Ala GAT CTC CTG Asp Leu Leu TGG CTG CTA TTG Trp Leu Leu Leu GAA GTG CCG GGG Glu Val Pro Gly
CAG
Gin 100 TCA TCT Ser Ser 105 CAC CTT GCT CCT His Leu Ala Pro
GCC
Ala 110 GAG AAA GTA TCC Glu Lys Val Ser
ATC
Ile 115 ATG GCT GAT GCA Met Ala Asp Ala
ATG
Met 120 CGG CGG CTG CAT Arg Arg Leu His CTT GAT CCG GCT Leu Asp Pro Ala
ACC
Thr 130 TGC CCA TTC GAC Cys Pro Phe Asp CAA GCG AAA CAT Gin Ala Lys His
CGC
Arg 140 ATC GAG CGA GCA Ile Glu Arg Ala ACT CGG ATG GAA Thr Arg Met Glu GCC GGT Ala Gly 150 CTT GTC GAT Leu Val Asp GCC GAA CTG Ala Glu Leu 170
CAG
Gin 155 GAT GAT CTG GAC Asp Asp Leu Asp GAG CAT CAG GGG Glu His Gin Gly CTC GCG CCA Leu Ala Pro 165 GGC GAG GAT Gly Glu Asp TTC GCC AGG CTC Phe Ala Arg Leu
AAG
Lys 175 GCG CGC ATG CCC Ala Arg Met Pro
GAC
Asp 180 1475 1523 1571 1619 1667 1715 1763 1811 1859 1907 1955 2003 2057 2105 CTC GTC Leu Val 185 GTG ACC CAT GGC Val Thr His Gly GCC TGC TTG CCG Ala Cys Leu Pro
AAT
Asn 195 ATC ATG GTG GAA Ile Met Val Glu
AAT
Asn 200 GGC CGC TTT TCT Gly Arg Phe Ser TTC ATC GAC TGT Phe Ile Asp Cys CGG CTG GGT GTG Arg Leu Gly Val
GCG
Ala 215 GAC CGC TAT CAG Asp Arg Tyr Gin
GAC
Asp 220 ATA GCG TTG GCT Ile Ala Leu Ala
ACC
Thr 225 CGT GAT ATT GCT Arg Asp Ile Ala GAA GAG Glu Glu 230 CTT GGC GGC Leu Gly Gly GCT CCC GAT Ala Pro Asp 250 TGG GCT GAC CGC Trp Ala Asp Arg
TTC
Phe 240 CTC GTG CTT TAC Leu Val Leu Tyr TCG CAG CGC ATC Ser Gin Arg Ile TTC TAT CGC CTT Phe Tyr Arg Leu
CTT
Leu 260 GGT ATC GCC Gly Ile Ala 245 GAC GAG TTC Asp Glu Phe CG GCC CAG Ala Gin 421 GTG CAT GAC Val His Asp TTC TGA GCGGGACTCT GGGGTTCGAA ATGACCGACC AAGCGACGCC Phe 264 CGC GTC GAT TCG GGC ATT TGC CAT ATC AAT GGA CCG ACT Arg Val Asp Ser Gly Ile Cys His Ile Asn Gly Pro Thr 425 430 435 -34- GAG GCT CAG ATG CCA TTC GGT GGG GTG AAG TCC AGC GGC TAC GGC AGC Giu Ala Gin Met Pro Phe Gly Gly Val Lys Ser Ser Gly Tyr Gly Ser 440 445 450 TTC GGC AGT CGA GCA TCG ATT GAG CAC TTT ACC CAG CTG CGC TGG CTG Phe Gly Ser Arg Ala Ser Ile Giu His Phe Thr Gin Leu Arg Trp Leu 455 460 465 470 ACC ATT CAG AAT GGC CCG CGG CAC TAT CCA ATC TAA ATCGATCTTC Thr Ile Gin Asn Gly Pro Arg His Tyr Pro Ile 475 480 481 2153 2201 2247 2307 2367 2427 2487 2539
GGGCGCCGCG
ACAGAGCTGT
AGTACAGTGA
CGCTACCGCA
ATTGTCCGGA
FIG. 2m:
GGCATCATGC
TCTCCGGTCT
ACGCCGAGTC
GTGTGTCGAT
TGCGTTCTCT
CCGCGGCGCT
TGGTGGATCA
CACATTGCAA
TGGTCATCCT
CGAGGCGCTT
CGCCTCATTT
AGGCCAGTCG
CCGCAGGCAT
CCGGTTGAGG
CTTCCCTTCC
CAATCTCTAA
CGGAGAGTCT
CATCATGCTC
TTACGCAAGA
CGGGTGGAAT
CTTGATAAAA
CGAAGAGGAG
TGCTCAGCCA
CGCTGGAGGT
TC
GAATTCCAAT AATGACAATA ATGAGGAGTG CCCA ATG TTT CAC GTG CCC CTG CTT Met Phe His Val Pro Leu Leu ATT GGT GGT Ile Gly Gly AAG CCT TGT TCA Lys Pro Cys Ser TCT GAT GAG CGC Ser Asp Glu Arg
ACC
Thr TTC GAG CGT Phe Glu Arg CGT AGC Arg Ser CCG CTG ACC GGA Pro Leu Thr Gly GTG GTA TCG CGC Val Val Ser Arg GCT GCT GCC AGT Ala Ala Ala Ser
TTG
Leu GAA GAT GCG GAC GCC GCA GTG GCC GCT Glu Asp Ala Asp Ala Ala Val Ala Ala CAG GCT GCG TTT Gln Ala Ala Phe 199 247 GAA TGG GCG GCG Glu Trp Ala Ala GCT CCG AGC GAA Ala Pro Ser Glu
CGC
Arg CGT GCC CGA CTG Arg Ala Arg Leu CTG CGA Leu Arg GCG GCG GAT Ala Ala Asp AGT GAA ACT Ser Glu Thr CTA GAG GAC CGT Leu Glu Asp Arg TCC GAG TTC ACC Ser Glu Phe Thr GCC GCA GCG Ala Ala Ala GTT TAC CTG Val Tyr Leu GGC GCA GCG GGA Gly Ala Ala Gly
AAC
Asn TGG TAT GGG TTT Trp Tyr Gly Phe GCC GCG Ala Ala 105 GGC ATG TTG CGG Gly Met Leu Arg
GAA
Glu 110 GCC GCC GCC ATG Ala Ala Ala Met
ACC
Thr 115 ACA CAG ATT CAG Thr Gln Ile Gln GAT GTC ATT CCG Asp Val Ile Pro AAT GTG CCC GGT Asn Val Pro Gly TTT GCC ATG GCG Phe Ala Met Ala
GTT
Val 135 CGA CAG CCA TGT Arg Gln Pro Cys
GGC
Gly 140 GTG GTG CTC GGT Val Val Leu Gly GCG CCT TGG AAT Ala Pro Trp Asn CCT CCG Ala Pro 150 GTA ATC CTT Val Ile Leu ACC GTG GTG Thr Val Val 170
GGC
Gly 155 GTA CGG GCT GTT Val Arg Ala Val
GCG
Ala 160 ATG CCG TTG GCA Met Pro Leu Ala TGC GGC AAT Cys Gly Asn 165 CAT CGC CTG His Arg Leu TTG AAA AGC TCT Leu Lys Ser Ser CTG ACT CCC TTT Leu Ser Pro Phe
ACC
Thr 180 ATT GGT Ile Gly 185 CAG GTG TTG CAT Gln Val Leu His CCT GGT CTG GGG Ala Gly Leu Gly
GAT
Asp 195 GGC GTG GTG AAT Gly Val Val Asn ATC AGC AAT GCC Ile Ser Asn Ala
CCC
Pro 205 CAA CAC GCT CCT Gln Asp Ala Pro
GCG
Ala 210 GTG GTG GAG CGA Val Val Glu Arg 679 ATT GCA AAT CCT Ile Ala Asn Pro GTA CGT CGA GTG Val Arg Arg Val
AAC
Asn 225
GCG
Ala TTC ACC GGT TCG Phe Thr Gly Ser ACC CAC Thr His 230 727 775 GTT GGA CGG Val Gly Arg GTG CTG GAA Val Leu Glu 250 GAC CTC GAT Asp Leu Asp
ATC
Ile 235
TTA
Leu GGT GAG CTG Gly Glu Leu
TCT
Ser 240
CCG
Pro CGT CAT CTG Arg His Leu GGT GGT AAG Gly Gly Lys
GCT
Ala 255
GCG
Ala TTC TTG GTC Phe Leu Val
TTG
Leu 260 GCC Ala AAG CCT GCT Lys Pro Ala 245 GAC GAT GCC Asp Asp Ala TAC TTC AAT Tyr Phe Asn GCG GCG GTC Ala Ala Val
GAA
Glu 270 GCG GCC TTT Ala Ala Phe
GGT
Gly 275 265 CAG GGT Gln Gly CAA ATC TGC Gln Ile Cys 280
GCA
Ala
ATG
Met 285
GAA
Glu TCC ACT GAG CGT Ser Thr Glu Arg ATT GTG ACA GCA Ile Val Thr Ala
GTC
Val 295 GAC GCC TTT Asp Ala Phe
GTT
Val 300
AAT
Asn AAG CTG GCG Lys Leu Ala
AGG
Arg 305
GTC
Val GTC GCC ACA Val Ala Thr CTG CGT Leu Arg 310 ATT GAT Ile Asp GCT GGC GAT Ala Gly Asp GCC AAT GCA Ala Asn Ala 330 GAT CCG CAA Asp Pro Gln
TCG
Ser 320 TTG GGT TCG Leu Gly Ser GGT CAA CGC ATC Gly Gln Arg Ile CTGGGGAGAG GCGGTTTGCG TATTGGGCGC
ATGCATAAAA
GCATCATCAA
CCCATGGACG
CGGTTCGTAA
CCGAACGCAG
TTCTACACTC
TGATCTTATG
ACTGTTGTAA
CCTGAATCGC
CACACCGTGG
ACTGTAATGC
CGGTCGTAAC
TATCCCTCGG
GACCAGCAAC
TTCATTAACC ATTCTGCCGA CAGCGGCATC ACCACCTTGT AAACCGATCA ACGCACGAAC AAGTACCGTA TGCGCTCACC GGCGCAGTGG CGGTTTTCAT GCATCCAACC AGCAAGCGCC C ATO TTA CCC ACC AGC Met Leu Arg Ser Ser CATGCAACCC ATCACAAACC CGCCTTGCGT ATAATATTTC CCACTTCACA TAACCCTCTT CAACTCGTCC ACAACCTTGA GGCTTGTTAT CACTGTTTTT TTACGCCGTG GGTCCATOTT AAC CAT OTT ACO CAG Asn Asp Val Thr Gin TCA ACT ATG GGC ATC Ser Ser Met Cly Ile AAA TCC ATC CGG GCT Lys Ser Met Arg Ala 871 919 967 1015 1069 1129 1189 1249 1309 1369 1429 1480 1528 1576 CAG GGC ACT CCC CCT AAA ACA AAC TTA GGT GGC Cmn Gly Ser Arg Pro Lys Thr Lys Leu Cly Cly 20 ATT CCC ACA TOT AGO CTC GGC CCT CAC CAA CTC Ile Arg Thr Cys Arg Leu Cly Pro Asp Cmn Val 35 -37- GCT CTT GAT CTT TTC GGT CGT GAG TTC GGA GAC GTA GCC ACC TAC TCC 1624 Ala Leu Asp Leu Phe Gly Arg Glu Phe Gly Asp Val Ala Thr Tyr Ser 50 CAA CAT CAG CCG GAC TCC GAT TAC CTC GGG AAC TTG CTC CGT AGT AAG 1672 Gin His Gin Pro Asp Ser Asp Tyr Leu Gly Asn Leu Leu Arg Ser Lys 65 ACA TTC ATC GCG CTT GCT GCC TTC GAC CAA GAA GCG GTT GTT GGC GCT 1720 Thr Phe Ile Ala Leu Ala Ala Phe Asp Gin Glu Ala Val Val Gly Ala 80 85 CTC GCG GCT TAC GTT CTG CCC AGG TTT GAG CAG CCG CGT AGT GAG ATC 1768 Leu Ala Ala Tyr Val Leu Pro Arg Phe Glu Gin Pro Arg Ser Glu Ile 100 105 TAT ATC TAT GAT CTC GCA GTC TCC GGC GAG CAC CGG AGG CAG GGC ATT 1816 Tyr Ile Tyr Asp Leu Ala Val Ser Gly Glu His Arg Arg Gin Gly Ile 110 115 120 GCC ACC GCG CTC ATC AAT CTC CTC AAG CAT GAG GCC AAC GCG CTT GGT 1864 Ala Thr Ala Leu Ile Asn Leu Leu Lys His Glu Ala Asn Ala Leu Gly 125 130 135 GCT TAT GTG ATC TAC GTG CAA GCA GAT TAC GGT GAC GAT CCC GCA GTG 1912 Ala Tyr Val Ile Tyr Val Gin Ala Asp Tyr Gly Asp Asp Pro Ala Val 140 145 150 GCT CTC TAT ACA AAG TTG GGC ATA CGG GAA GAA GTG ATG CAC TTT GAT 1960 Ala Leu Tyr Thr Lys Leu Gly Ile Arg Glu Glu Val Met His Phe Asp 155 160 165 170 ATC GAC CCA AGT ACC GCC ACC TAA CAATTCGTTC AAGCCGAGAT CGGCTTCCCA 2014 Ile Asp Pro Ser Thr Ala Thr 175 177 A TTG GCC CAG CGC GTC GAT TCG GGC ATT TGC CAT ATC AAT GGA CCG ACT 2063 Leu 
Ala Gin Arg Val Asp Ser Gly Ile Cys His Ile Asn Gly Pro Thr 420 425 430 435 GTG CAT GAC GAG GCT CAG ATG CCA TTC GGT GGG GTG AAG TCC AGC GGC 2111 Val His Asp Glu Ala Gin Met Pro Phe Gly Gly Val Lys Ser Ser Gly 440 445 450 TAC GGC AGC TTC GGC AGT CGA GCA TCG ATT GAG CAC TTT ACC CAG CTG 2159 Tyr Gly Ser Phe Gly Ser Arg Ala Ser Ile Glu His Phe Thr Gin Leu 455 460 465 CGC TGG CTG ACC ATT CAG AAT GGC CCG CGG CAC TAT CCA ATC TAA 2204 Arg Trp Leu Thr Ile Gin Asn Gly Pro Arg His Tyr Pro Ile 470 475 480 481 ATCGATCTTC GGGCGCCGCG GGCATCATGC CCGCGGCGCT CGCCTCATTT CAATCTCTAA 2264 CTTGATAAAA ACAGAGCTGT TCTCCGGTCT TGGTGGATCA AGGCCAGTCG CGGAGAGTCT 2324 -38- CGAAGAGGAG AGTACAGTGA ACGCCGAGTC CACATTGCAA CCGCAGGCAT CATCATGCTC 2384 TGCTCAGCCA CGCTACCGCA GTGTGTCGAT TGGTCATCCT CCGGTTGAGG TTACGCAAGA 2444 CGCTGGAGGT ATTGTCCGGA TCCGTTCTCT CGAGGCGCTT CTTCCCTTCC CGGGTGGAAT 2504 TC 2506 FIG. 2n: -39- GAATTCCAAT AATGACAATA ATGAGGAGTG CCCA ATG TTT CAC GTG CCC CTG CTT Met Phe His Val Pro Leu Leu ATT GGT GGT Ile Gly Gly AAG CCT TGT TCA Lys Pro Cys Ser TCT GAT GAG CGC Ser Asp Glu Arg
ACC
Thr TTC GAG CGT Phe Glu Arg CGT AGC Arg Ser CCG CTG ACC GGA Pro Leu Thr Gly GTG GTA TCG CGC Val Val Ser Arg GCT GCT GCC AGT Ala Ala Ala Ser
TTG
Leu GAA GAT GCG GAC Glu Asp Ala Asp
GCC
Ala 45 GCA GTC GCC GCT Ala Val Ala Ala
GCA
Ala CAG GCT GCG TTT Gln Ala Ala Phe 199 247 GAA TGG GCG GCG Glu Trp Ala Ala GCT CCG AGC GAA Ala Pro Ser Glu
CGC
Arg CGT GCC CGA CTG Arg Ala Arg Leu CTG CGA Leu Arg GCG GCG GAT Ala Ala Asp AGT GAA ACT Ser Glu Thr
CTT
Leu CTA GAG GAC CGT Leu Glu Asp Arg TCC GAG TTC ACC Ser Glu Phe Thr GCC GCA GCG Ala Ala Ala GTT TAC CTG Val Tyr Leu GGC GCA GCG GGA Gly Ala Ala Gly
AAC
Asn TGG TAT GGG TTT Trp Tyr Gly Phe GCG GCG Ala Ala 105 GGC ATG TTG CGG Gly Met Leu Arg
GAA
Glu 110 GCC GCG GCC ATG Ala Ala Ala Met
ACC
Thr 115 ACA CAG ATT CAG Thr Gln Ile Gln
GGC
Gly 120 GAT GTC ATT CCC Asp Val Ile Pro AAT GTC CCC GGT Asn Val Pro Gly TTT GCC ATG GCG Phe Ala Met Ala
GTT
Val 135 CGA CAG CCA TGT Arg Gln Pro Cys
GGC
Gly 140 GTG GTC CTC GGT Val Val Leu Gly GCG CCT TGG AAT Ala Pro Trp Asn CCT CCC Ala Pro 150 GTA ATC CTT Val Ile Leu ACC GTG GTC Thr Val Val 170
GGC
Gly 155 GTA CGG GCT GTT Val Arg Ala Val
GCG
Ala 160 ATG CCC TTG GCA Met Pro Leu Ala TGC GGC AAT Cys Gly Asn 165 CAT CGC CTC His Arg Leu TTG AAA AGC TCT Leu Lys Ser Ser
GAG
Glu 175 CTG ACT CCC TTT Leu Ser Pro Phe
ACC
Thr 180 ATT GGT Ile Gly 185 CAG GTG TTG CAT Gln Val Leu His CCT GGT CTG GGG Ala Gly Leu Gly GGC GTG GTG AAT Gly Val Val Asn
CTC
Va1 200 ATC ACC AAT GCC CCC CAA CAC CCT CCT GCG GTG GTG GAG CGA Ile Ser Asn Ala Pro Gin Asp Ala Pro Ala Val Val Clu Arg CTG 679 Leu 215 ATT GCA AAT CCT GCG GTA CGT CGA GTG AAC TTC ACC GGT TCG ACC CAC 727 Ile Ala Asn Pro Ala Val Arg Arg Val Asn Phe Thr Gly Ser Thr His 220 225 230 GTT GGA CGG ATC ATT GGT GAG CTG TCT GCG CGT CAT CTG AAG CCT GCT 775 Val Gly Arg Ile Ile Gly Glu Leu Ser Ala Arg His Leu Lys Pro Ala 235 240 245 GTG CTG GAA TTA GGT GGT AAG GCT CCG TTC TTG GTC TTG GAC GAT GCC 823 Val Leu Glu Leu Gly Gly Lys Ala Pro Phe Leu Val Leu Asp Asp Ala 250 255 260 GAC CTC GAT GCG GCG GTC GAA GCG GCG GCC TTT GGT GCC TAC TTC AAT 871 Asp Leu Asp Ala Ala Val Glu Ala Ala Ala Phe Gly Ala Tyr Phe Asn 265 270 275 CAG GGT CAA ATC TGC ATG TCC ACT GAG CGT CTG ATT GTG ACA GCA GTC 919 Gin Gly Gin Ile Cys Met Ser Thr Glu Arg Leu Ile Val Thr Ala Val 280 285 290 295 GCA GAC GCC TTT GTT GAA AAG CTG GCG AGG AAG GTC GCC ACA CTG CGT 967 Ala Asp Ala Phe Val Glu Lys Leu Ala Arg Lys Val Ala Thr Leu Arg 300 305 310 GCT GGC GAT CCT AAT GAT CCG CAA TCG GTC TTG GGT TCG TTG ATT GAT 1015 Ala Gly Asp Pro Asn Asp Pro Gin Ser Val Leu Gly Ser Leu Ile Asp 315 320 325 GCC AAT GCA GGT CAA CGC ATC CAG GTT CTG GTC GAT GAT GCG CTC GCA 1063 Ala Asn Ala Gly Gin Arg Ile Gin Val Leu Val Asp Asp Ala Leu Ala 330 335 340 AAA GGC GCG CAATGGAA TTG GCC CAG CGC GTC GAT TCG GGC ATT TGC CAT 1113 Lys Gly Ala Leu Ala Gin Arg Val Asp Ser Gly Ile Cys His 345 346 420 425 430 ATC AAT GGA CCG ACT GTG CAT GAC GAG GCT CAG ATG CCA TTC GGT GGG 1161 Ile Asn Gly Pro Thr Val His Asp Glu Ala Gin Met Pro Phe Gly Gly 435 440 445 GTG AAG TCC AGC GGC TAC GGC AGC TTC GGC AGT CGA GCA TCG ATT GAG 1209 Val Lys Ser Ser Gly Tyr Gly Ser Phe Gly Ser Arg Ala Ser Ile Glu 450 455 460 CAC TTT ACC CAG CTG CGC TGG CTG ACC ATT CAG AAT GGC CCG CGG CAC 1257 His Phe Thr Gin Leu Arg Trp Leu Thr Ile Gin Asn Gly Pro Arg His 465 470 475 TAT CCA ATC TAA ATCGATCTTC GGGCGCCGCG GGCATCATGC CCGCGGCGCT 1309 Tyr Pro Ile 480 481 CGCCTCATTT CAATCTCTAA CTTGATAAAA 
ACAGAGCTGT TCTCCGGTCT TGGTGGATCA 1369
AGGCCAGTCG CGGAGAGTCT CGAAGAGGAG AGTACAGTGA ACGCCGAGTC CACATTGCAA 1429
CCGCAGGCAT CATCATGCTC TGCTCAGCCA CGCTACCGCA GTGTGTCGAT TGGTCATCCT 1489
CCGGTTGAGG TTACGCAAGA CGCTGGAGGT ATTGTCCGGA TCCGTTCTCT CGAGGCGCTT 1549
CTTCCCTTCC CGGGTGGAAT TC 1571
FIG. 2o:
GAATTCCGCG GTCGGCGAAA GTTGATGCGC TGTATCGTGG TGAAGATCAA TCCATGCTGC GTGACGAGGC CACACT GTG AGT TGG TCA GGG GGG GCT TAC TCG GCG TTT TCC Met Ser Trp Ser Gly Gly Ala Tyr Ser Ala Phe Ser 1 5 GAC ACT GCG TTG GTT GCG GCA Asp Thr Ala Leu Val Ala Ala
GTG
Val CGC ACC CCC TGG Arg Thr Pro Trp
ATT
Ile GAT TGC GGG Asp Cys Gly 160 208 GGT GCC Gly Ala CTG TCG CTG GTG Leu Ser Leu Val CCT ATC GAC TTA Pro Ile Asp Leu
GGG
Gly GTA AAG GTC GCT Val Lys Val Ala
CGC
Arg GAA GTT CTG ATG Glu Val Leu Met
CGT
Arg 50 GCG TCG CTT GAA Ala Ser Leu Glu CAA ATG GTC GAT Gln Met Val Asp
AGC
Ser GTA CTC GCA GGC Val Leu Ala Gly ATG GCT CAA GCA Met Ala Gln Ala
AGC
Ser TTT GAT GCT TAC Phe Asp Ala Tyr CTG CTC Leu Leu CCG CGG CAC Pro Arg His TTG GGG GTG Leu Gly Val GGC TTG TAC AGC Gly Leu Tyr Ser GTT CCC AAG TCG Val Pro Lys Ser GTT CCG GCC Val Pro Ala CTT CGG CAG Leu Arg Gln CAG CGC ATT TGC Gln Arg Ile Cys
GGC
Gly 100 ACA GGC TTC GAA Thr Gly Phe Glu
CTG
Leu 105 GCC GGC Ala Gly 110 GAG CAG ATT TCC CAA GGC GCT GAT CAC Glu Gln Ile Ser Gln Gly Ala Asp His 115
GTG
Val 120 CTG TGT GTC GCG Leu Cys Val Ala GAG TCC ATG TCG Glu Ser Met Ser AAC CCC ATC GCG Asn Pro Ile Ala
TCG
Ser 135 TAT ACA CAC CGG Tyr Thr His Arg GGG TTC CGC CTC Gly Phe Arg Leu GCG CCC GTT GAG Ala Pro Val Glu AAG GAT TTT TTG Lys Asp Phe Leu TGG GAG Trp Glu 155 GCA TTG TTT GAT CCT GCT CCA GGA CTC GAC ATG ATC GCT Ala Leu Phe Asp Pro Ala Pro Gly Leu Asp Met Ile Ala ACC GCA GAA Thr Ala Glu 170 AAC CTG GGGACAGCAA GCGAACCGGA ATTGCCAGCT GGGGCGCCCT CTGGTAAGGT Asn Leu 174 TGGGAAGCCC TGCAAAGTAA ACTGGATGGC TTTCTTGCCG CCAAGGATCT GATGGCGCAC GGGATCAAGA TCTGATCAAG AGACAGGATG AGGATCGTTT CGC ATG ATT GAA CAA Met Ile Glu Gln 1
GAT
Asp GGA TTG CAC GCA Gly Leu His Ala TCT CCG GCC GCT Ser Pro Ala Ala GTG GAG AGG CTA Val Glu Arg Leu GGC TAT GAC TGG Gly Tyr Asp Trp
GCA
Ala CAA CAG ACA ATC Gln Gln Thr Ile
GGC
Gly 30 TGC TCT GAT GCC Cys Ser Asp Ala GCC GTG Ala Val TTC CGG CTG Phe Arg Leu CTG TCC GGT Leu Ser Gly
TCA
Ser GCG CAG GGG CGC Ala Gln Gly Arg
CCG
Pro 45 GTT CTT TTT GTC Val Leu Phe Val AAG ACC GAC Lys Thr Asp CGG CTA TCG Arg Leu Ser GCC CTG AAT GAA Ala Leu Asn Glu CAG GAC GAG GCA Gln Asp Glu Ala
GCG
Ala TGG CTG Trp Leu GCC ACG ACG GGC Ala Thr Thr Gly CCT TGC GCA GCT Pro Cys Ala Ala CTC GAC GTT GTC Leu Asp Val Val
ACT
Thr GAA GCG GGA AGG Glu Ala Gly Arg TGG CTG CTA TTG Trp Leu Leu Leu
GGC
Gly 95 GAA GTG CCG GGG Glu Val Pro Gly
CAG
Gln 100 GAT CTC CTG TCA Asp Leu Leu Ser
TCT
Ser 105 CAC CTT GCT CCT His Leu Ala Pro
GCC
Ala 110 GAG AAA GTA TCC Glu Lys Val Ser ATC ATG Ile Met 115 GCT GAT GCA Ala Asp Ala TTC GAC CAC Phe Asp His 135
ATG
Met 120 CGG CGG CTG CAT Arg Arg Leu His
ACG
Thr 125 CTT GAT CCG GCT Leu Asp Pro Ala ACC TGC CCA Thr Cys Pro 130 ACT CGG ATG Thr Arg Met 859 907 955 1003 1051 1099 1147 1195 1243 1291 1339 1387 1435 CAA GCG AAA CAT Gln Ala Lys His ATC GAG CGA GCA Ile Glu Arg Ala
CGT
Arg 145 GAA GCC Glu Ala 150 GGT CTT GTC GAT Gly Leu Val Asp GAT GAT CTG GAC Asp Asp Leu Asp GAG CAT CAG GGG Glu His Gln Gly GCG CCA GCC GAA Ala Pro Ala Glu
CTG
Leu 170 TTC GCC AGG CTC Phe Ala Arg Leu
AAG
Lys 175 GCG CGC ATG CCC Ala Arg Met Pro
GAC
Asp 180 GGC GAG GAT CTC Gly Glu Asp Leu
GTC
Val 185 GTG ACC CAT GGC Val Thr His Gly
GAT
Asp 190 GCC TGC TTG CCG Ala Cys Leu Pro AAT ATC Asn Ile 195 ATG GTG GAA Met Val Glu GGT GTG GCG Gly Val Ala 215
AAT
Asn 200 GGC CGC TTT TCT Gly Arg Phe Ser
GGA
Gly 205 TTC ATC GAC TGT Phe Ile Asp Cys GGC CGG CTG Gly Arg Leu 210 CGT GAT ATT Arg Asp Ile GAC CGC TAT CAG Asp Arg Tyr Gln GAC ATA GCG TTG GCT ACC Asp Ile Ala Leu Ala Thr 220 225 -44- GCT GAA Ala Glu 230 GAG CTT GGC GGC Glu Leu Gly Gly TGG GCT GAC Trp Ala Asp CGC TTC CTC GTG CTT TAC Arg Phe Leu Val Leu Tyr 240 GCC TTC TAT CGC CTT CTT Ala Phe Tyr Arg Leu Leu 255 260
GGT
Gly 245 ATC GCC GCT CCC Ile Ala Ala Pro
GAT
Asp 250 TCG CAG CGC ATC Ser Gin Arg Ile GAC GAG TTC TTC TGA GCGGGACTCT GGGGTTCGAA ATGACCGACC AAGCGACGCC Asp Glu Phe Phe 264 CA TTG AGG GCG CAA GAG GAG AAA TGG ATT GAC CAA GAG ATC GTG GCT Leu Arg Ala Gin Glu Glu Lys Trp Ile Asp Gin Glu Ile Val Ala GTT ACG GAT Val Thr Asp GAA CTG CCT Glu Leu Pro 230
GAA
Glu 215 CAG TTC GAT TTA Gln Phe Asp Leu
GAG
Glu 220 GGC TAC AAC AGT Gly Tyr Asn Ser CGA GCA ATT Arg Ala Ile 225 ATC CGC GGC Ile Arg Gly CGG AAG GCA AAA Arg Lys Ala Lys TTG ATC GTG ACA Leu Ile Val Thr
GTC
Val 240 CTA GCA Leu Ala 245 GTC TTT GAA GCC Val Phe Glu Ala
CTT
Leu 250 TCC CGA TTG AAG Ser Arg Leu Lys GTT CAT TCT GGC Val His Ser Gly 1483 1531 1586 1633 1681 1729 1777 1825 1873 1921 1969 2017 2065 2113
GGG
Gly 260 GTG CAG ACT GCG Val Gln Thr Ala
GGC
Gly 265 AAC AGC TGT GCC Asn Ser Cys Ala
GTA
Val 270 GTG GAC GGC GCC Val Asp Gly Ala
GCG
Ala 275 GCG GCT TTG GTG Ala Ala Leu Val
GCT
Ala 280 CGA GAG TCG TCT Arg Glu Ser Ser
GCG
Ala 285 ACA CAG CCG GTC Thr Gln Pro Val TTG GCT Leu Ala 290 AGG ATA CTG Arg Ile Leu CTC GGC CCT Leu Gly Pro 310
GCT
Ala 295 ACC TCC GTA GTC Thr Ser Val Val ATC GAG CCC GAG Ile Glu Pro Glu CAT ATG GGG His Met Gly 305 AGT GAT CTT Ser Asp Leu GCG CCC GCG ATT Ala Pro Ala Ile CTG CTG CTT GCG Leu Leu Leu Ala
CGT
Arg 320 AGT TTG Ser Leu 325 AGG GAT ATC GAC Arg Asp Ile Asp
CTC
Leu 330 TTT GAG ATA AAC Phe Glu Ile Asn
GAG
Glu 335 GCG CAG GCC GCC Ala Gin Ala Ala
CAA
Gin 340 GTT CTA GCG GTA Val Leu Ala Val CAT GAA TTG GGT His Glu Leu Gly GAG CAC TCA AAA Glu His Ser Lys AAT ATT TGG GGC Asn Ile Trp Gly
GGG
Gly 360 GCC ATT GCA CTT Ala Ile Ala Leu GGA CAC Gly His 365 CCG CTT GCC Pro Leu Ala GCG ACC Ala Thr 370 GGA TTG CGT CTC TGC ATG ACC CTC GCT CAC CAA Gly Leu Arg Leu Cys Met Thr Leu Ala His Gin 375 380 TTT CGA TAT GGA ATT CCC TCG GCA TGC ATT GGT Phe Arg Tyr Gly Ile Ala Ser Ala Cys Ile Gly 390 395 GCG GTT CTT TTA GAG AAT CCC CAC TTC GGT TCG Ala Val Leu Leu Giu Asn Pro His Phe Gly Ser 405 410 TCG ATG ATT AAC AGA GTT GAC CAC TAT CCA CTG Ser Met Ile Asn Arg Val Asp His Tyr Pro Leu 420 425 430 CTTTGTTGCT TTGAGGTGGC GCACGAAGGA GGGCTCGAAA GAAGGAACAG GGAACATGAT TAGTTTCGCT CGTATGGCAG AAACTTGCCC TTGCCTTCGC ACTCGTATTA TGTGTCGGGC TTCTACAGTG TACATACCTT GTCAGGGTTG GTGGGAATTC FIG. 2p: TTG CAA GCT AAT AAC Leu Gin Ala Asn Asn 385 GGG GGA CAG GGG ATG Gly Giy Gin Gly Met 400 TCC TCT GCA CGA AGT Ser Ser Ala Arg Ser 415 AGC TAA CGGGCATCTC Ser 431 ATCTCTGCTA AAAACAAGAA AAAGTTTAGG AGTCCAGGCT TGATTGTTAC CGGCACGGGT 2161 2209 2257 2306 2366 2426 2486 2526 -46 GAATTCCGCG GTCGGCGAAA GTTGATGCGC TGTATCGTGG TGAAGATCAA TCCATGCTGC GTGACGAGGC CACACT GTG AGT TGG TCA GGG GGG GCT TAC TCG GCG TTT TCC Met Ser Trp, Ser Gly Gly Ala Tyr Ser Ala Phe Ser
GAC
Asp
GGT
Gly
CGC
Arg
GTA
Val
CCG
Pro
TTG
Leu
GCC
Ala
GCA
Ala 125
GGG
Gly
GCA
ACT GCG Thr Ala GCC CTG Ala Leu GAA GTT Clu Val CTC GCA Leu Ala CGG CAC Arg His GGG GTC Gly Val GGC GAG Gly Clu 110 GAG TCC Clu Ser TTC CC Phe Arg TTC TTT
TTG
Leu
TCG
Ser
CTC
Leu
GGC
Gly
ATT
Ile
CAG
Gln
CAG
Gln
ATG
Met
CTC
Leu
CAT
GTT GCG GCA CTC CCC ACC CCC TCC ATT CAT TCC CCC Val
CTC
Leu
ATG
Met
TCT
Ser
GGC
Gly
CGC
Arg
ATT
Ile
TCC
Ser
CCT
Gly 145
CCT
Ala
GTG
Val
CGT
Arg 50
ATG
Met
TTG
Leu
ATT
Ile
TCC
Ser
CCT
Arg 130
GCG
Ala
GCT
Ala
TCG
Ser 35
GCG
Ala
GCT
Ala
TAC
Tyr
TC
Cys
CAA
Gln 115
AAC
Asn
CCC
Pro
CCA
Val 20
CCT
Pro
TCC
Ser
CAA
Gln
AGC
Ser
GC
Gly 100
GC
Gly
CCC
Pro
GTT
Val
GGA
Arg
ATC
Ile
CTT
Leu
CCA
Ala
CGT
Gly 85
ACA
Thr
GCT
Ala
ATC
Ile
GAG
Glu
CTC
Thr
GAC
Asp
GAA
Glu
AGC
Ser 70
GTT
Val
GGC
Gly
GAT
Asp
GCG
Ala
TTC
Phe 150
GAC
Pro
TTA
Leu
CCA
Pro 55
TTT
Phe
CCC
Pro
TTC
Phe
CAC
His
TCG
Ser 135
AAG
Lys
ATG
Trp Ile GGG GTA Gly Val CAA ATG Gin Met GAT GCT Asp Ala AAG TCG Lys Ser GAA CTG Glu Leu 105 GTG CTG Val Leu 120 TAT ACA Tyr Thr CAT TTT Asp Phe ATC GCT Ile Ala Asp
AAG
Lys
GTC
Val
TAC
Tyr
GTT
Val
CTT
Leu
TGT
Cys
CAC
His
TTG
Leu
ACC
Cys Gly GTC CCT Val Ala CAT AGC Asp Ser CTG CTC Leu Leu CCG CC Pro Ala CGG CAG Arg Gin GTC GCG Val Ala CGC GGC Arg Gly 140 TGG GAG Trp Ciu 155 CCA CAA 160 208 256 304 352 544 Ala Leu Phe Asp Pro Ala Pro Cly Leu Asp Met Thr Ala Ciu 170 AAC CTG GGGGAGAGGC GGTTTGCGTA TTGGGCGCAT CCATAAAAAC TCTTCTAATT Asn Leu 174 CATTAACCAT TCTGCCCACA TCCAAGCCAT CACAAACCCC ATCATCAACC TCAATCCCCA GCGGCATCAC CACCTTGTCG CCTTGCCTAT AATATTTCCC CATGGACGCA CACCCTCCAA ACCCATGAAC CCACCAACCC AGTTGACATA AGCCTGTTCC CTTCCTAAAC TCTAATCCAA GTAGCGTATG CGCTCACGCA ACTGGTCCAG AACCTTCACC CAACCCAC CTCCTAACCC -47- CGCAGTGGCG GTTTTCATGG CTTGTTATGA CTGTTTTTTT GTACAGTCTA TGCCTCGGGC ATCCAAGC AGCAAGCGCG TTACGCCGTC GGTCGATGTTTG ATGTTATGGA GCAGCAACG ATG TTA CGC AGC Met Leu Arg Ser 1
AGC
Ser 5 AAC GAT GTT ACG CAG CAG GGC AGT CGC Asn Asp Val Thr Gin Gin Gly Ser Arg 10 CCT AAA Pro Lys ACA AAG TTA Thr Lys Leu GGC CCT GAC Cly Pro Asp GGC TCA ACT ATG Cly Ser Ser Met ATC ATT CCC ACA Ile Ile Arg Thr TGT AGC CTC Cys Arg Leu CTT TTC GGT Leu Phe Cly CAA CTC AAA TCC Gin Vai Lys Ser
ATC
Met 40 CGG GCT CCT CTT Arg Aia Aia Leu
CAT
Asp CGT GAG Arg Glu TTC GCA GAC CTA Phe Gly Asp Vai
GCC
Ala 55 ACC TAC TCC CAA Thr Tyr Ser Gin
CAT
His CAC CCC GAC TCC Gin Pro Asp Ser
CAT
Asp TAC CTC GGG AAC Tyr Leu Cly Asn CTC CCT ACT AAC Leu Arg Ser Lys TTC ATC GCG CTT Phe Ile Ala Leu GCC TTC CAC CAA Ala Phe Asp Gin GCG GTT CTT CGC Ala Val Val Gly CTC GCG GCT TAC Leu Aia Ala Tyr GTT CTG Vai Leu 948 1007 1055 1103 1151 1199 1247 1295 1343 1391 1439 1487 1535 1589 1637 CCC AGG TTT Pro Arg Phe CTC TCC GGC Val Ser Cly 115
GAG
Glu 100 CAC CCC CCT ACT GAG ATC TAT ATC TAT Gin Pro Arg Ser Clu Ile Tyr Ile Tyr 105 CAT CTC CCA Asp Leu Ala 110 CTC ATC AAT Leu Ile Asn GAG CAC CGG AGG Glu His Arg Arg
CAC
Gin 120 GGC ATT CCC ACC Cly Ile Aia Thr
GCG
Ala 125 CTC CTC Leu Leu 130 AAC CAT GAG GCC Lys His Glu Ala GCG CTT GGT CCT Ala Leu Cly Ala
TAT
Tyr 140 GTG ATC TAC GTG Val Ile Tyr Val
CAA
Gin 145 CCA CAT TAC GGT Ala Asp Tyr Gly CAT CCC CCA GTG GCT CTC TAT ACA AAG Asp Pro Aia Vai Ala Leu Tyr Thr Lys 155
TTG
Leu 160 GGC ATA CGG GAA Cly Ile Arg Glu
CAA
Glu 165 GTG ATC CAC TTT Val Met His Phe
CAT
Asp 170 ATC CAC CCA AGT Ile Asp Pro Ser ACC GCC Thr Ala 175 ACC TAA CAATTCCTTC AAGCCGAGAT CGGCTTCCCA TTC AGG GCG CAA GAG GAG Thr Leu Arg Aia Gin Glu Clu 177 197 200 AAA TGG ATT CAC CAA GAG ATC CTG CCT CTT ACG GAT Lys Trp Ile Asp Gin Glu Ile Val Aia Vai Thr Asp 205 210 GAA CAC TTC CAT Glu Gin Phe Asp 215 -48
TTA
Leu
TTG
Leu 235
TCC
Ser
AGC
Ser
TCG
Ser
GTC
Val
CGC
Arg 315
TTT
Phe
GAA
Glu
GCA
Ala
CTC
Leu
GCA
Ala 395
CAC
His
GGC
Gly
ATC
Ile
TTG
Leu
GCC
Ala
GCG
Ala 285
ATC
Ile
CTG
Leu
ATA
Ile
GGT
Gly
GGA
Gly 365
CAC
His
ATT
Ile
GGT
Gly
TAC
Tyr
GTG
Val
AAG
Lys
GTA
Val 270
ACA
Thr
GAG
Glu
CTT
Leu
AAC
Asn
ATT
Ile 350
CAC
His
CAA
Gln
GGT
Gly
TCG
Ser
AAC
Asn
ACA
Thr
CCT
Pro 255
GTG
Val
CAG
Gln
CCC
Pro
GCG
Ala
GAG
Glu 335
GAG
Glu
CCG
Pro
TTG
Leu
GGG
Gly
TCC
Ser 415
AGT
Ser
GTC
Val 240
GTT
Val
GAC
Asp
CCG
Pro
GAG
Glu
CGT
Arg 320
GCG
Ala
CAC
His
CTT
Leu
CAA
Gln
GGA
Gly 400
TCT
Ser
CGA
Arg 225
ATC
Ile
CAT
His
GGC
Gly
GTC
Val
CAT
His 305
AGT
Ser
CAG
Gln
TCA
Ser
CC
Ala
GCT
Ala 385
CAG
Gln
GCA
GCA
Ala
CGC
Arg
TCT
Ser
GCC
Ala
TTG
Leu 290
ATG
Met
GAT
Asp
CC
Ala
AAA
Lys
C
Ala 370
AAT
Asn
GGG
Gly
CGA
ATT
Ile
GGC
Gly
GGC
Gly
C
Ala 275
GCT
Ala
GGG
Gly
CTT
Leu
CC
Ala
CTT
Leu 355
ACC
Thr
AAC
Asn
ATC
Met
AGT
GAA
Glu
CTA
Leu
CCC
Gly 260
CCG
Ala
AGG
Arg
CTC
Leu
ACT
Ser
CAA
Gln 340
AAT
Asn
GGA
Gly
TTT
Phe
GCG
Ala
TCG
CCT
Pro 230
GTC
Val
CAG
Gln
TTG
Leu
CTG
Leu
CCT
Pro 310
AGG
Arg
CTA
Leu
TCG
Trp
CGT
Arg
TAT
Tyr 390
CTT
Leu
ATT
CGG
Arg
TTT
Phe
ACT
Thr
GTG
Val
GCT
Ala 295
GCG
Ala
GAT
Asp
CCG
Ala
GCC
Gly
CTC
Leu 375
CGA
Gly
TTA
Leu
AAC
GCA
Ala
GCC
Ala
GGC
Gly 265
CGA
Arg
TCC
Ser
GCC
Ala
GAC
Asp
CAC
Gln 345
CC
Ala
ATG
Met
CC
Ala
AAT
Asn
GTT
AAA
Lys
CTT
Leu 250
AAC
Asn
GAG
Glu
GTA
Val
ATT
Ile
CTC
Leu 330
CAT
His
ATT
Ile
ACC
Thr
TCG
Ser
CCC
Pro 410
GAC
Asp 1685 1733 1781 1829 1877 1925 1973 2021 2069 2117 2165 2213 2261 2309 Ala Arg Ser Ser Met Ile Asn Arg Val CAC TAT CCA CTG AGC TAA CCCCCATCTC CTTTCTTCCT TTGAGGTGGC His Tyr Pro Leu Ser 430 431 -49 GCACGAAGGA GGGCTCGAAA ATCTCTGCTA AAAACAAGAA GAAGGAACAG GGAACATGAT 2369 TAGTTTCGCT CGTATGGCAG AAAGTTTAGG AGTCCAGGCT AAACTTGCCC TTGCCTTCGC 2429 ACTCGTATTA TGTGTCGGGC TGATTGTTAC CGGCACGGGT TTCTACAGTG TACATACCTT 2489 GTCAGGGTTG GTGGGAATTC 2509 FIG. 2q: GAATTCCGCG GTCGGCGAAA GTTGATGCGC TGTATCGTGG TGAAGATCAA TCCATGCTGC GTGACGAGGC CACACT GTG AGT TGG TCA GGG GGG GCT TAC TCG GCG TTT TCC 112 Met Ser Trp Ser Gly Gly Ala Tyr Ser Ala Phe Ser 1 5 GAC ACT GCG TTG GTT GCG GCA GTG CGC ACC CCC TGG ATT GAT TGC GGG 160 Asp Thr Ala Leu Val Ala Ala Val Arg Thr Pro Trp Ile Asp Cys Gly 20 GGT GCC CTG TCG CTG GTG TCG CCT ATC GAC TTA GGG GTA AAG GTC GCT 208 Gly Ala Leu Ser Leu Val Ser Pro Ile Asp Leu Gly Val Lys Val Ala 35 CGC GAA GTT CTG ATG CGT GCG TCG CTT GAA CCA CAA ATG GTC GAT AGC 256 Arg Glu Val Leu Met Arg Ala Ser Leu Glu Pro Gln Met Val Asp Ser 50 55 GTA CTC GCA GGC TCT ATG GCT CAA GCA AGC TTT GAT GCT TAC CTG CTC 304 Val Leu Ala Gly Ser Met Ala Gin Ala Ser Phe Asp Ala Tyr Leu Leu 70 CCG CGG CAC ATT GGC TTG TAC AGC GGT GTT CCC AAG TCG GTT CCG GCC 352 Pro Arg His Ile Gly Leu Tyr Ser Gly Val Pro Lys Ser Val Pro Ala 85 TTG GGG GTG CAG CGC ATT TGC GGC ACA GGC TTC GAA CTG CTT CGG CAG 400 Leu Gly Val Gin Arg Ile Cys Gly Thr Gly Phe Glu Leu Leu Arg Gin 100 105 GCC GGC GAG CAG ATT TCC CAA GGC GCT GAT CAC GTG CTG TGT GTC GCG 448 Ala Gly Glu Gin Ile Ser Gln Gly Ala Asp His Val Leu Cys Val Ala 110 115 120 GCA GAG TCC ATG TCG CGT AAC CCC ATC GCG TCG TAT ACA CAC CGG GGC 496 Ala Glu Ser Met Ser Arg Asn Pro Ile Ala Ser Tyr Thr His Arg Gly 125 130 135 140 GGG TTC CGC CTC GGT GCG CCC GTT GAG TTC AAG GAT TTT TTG TGG GAG 544 Gly Phe Arg Leu Gly Ala Pro Val Glu Phe Lys Asp Phe Leu Trp Glu 145 150 155 GCA TTG TTT GAT CCT GCT CCA GGA CTC GAC ATG ATC GCT ACC GCA GAA 592 Ala Leu Phe Asp Pro Ala Pro Gly Leu Asp 
Met Ile Ala Thr Ala Glu 160 165 170 AAC CTG GCG CGC A TTG AGG GCG CAA GAG GAG AAA TGG ATT GAC CAA GAG 641 Asn Leu Ala Arg Leu Arg Ala Gin Glu Glu Lys Trp Ile Asp Gin Glu 175 176 197 200 205 ATC GTG GCT GTT ACG GAT GAA CAG TTC GAT TTA GAG GGC TAC AAC AGT 689 Ile Val Ala Val Thr Asp Glu Gin Phe Asp Leu Glu Gly Tyr Asn Ser 210 215 220 CGA GCA ATT GAA CTG CCT CGG AAG GCA AAA TTG TTG ATC GTG ACA GTC 737 Arg Ala Ile Glu Leu Pro Arg Lys Ala Lys Leu Leu Ile Val Thr Val 225 230 235 240 -51- ATC CGC GGC CTA Ile Arg Gly Leu
GCA
Ala 245 GTC TTT GAA GCC Val Phe Glu Ala
CTT
Leu 250 TCC CGA TTG AAG Ser Arg Leu Lys CCT GTT Pro Val 255 CAT TCT GGC His Ser Gly GGC CCC GCG Gly Ala Ala 275
GGG
Gly 260 GTG CAG ACT GCG Val Gin Thr Ala
GGC
Gly 265 AAC AGC TGT GCC Asn Ser Cys Ala GTA GTG GAC Val Val Asp 270 ACA CAG CCG Thr Gin Pro GCG GCT TTG GTG Ala Ala Leu Val CGA GAG TCG TCT Arg Glu Ser Ser
GCG
Ala 285 GTC TTG Val Leu 290 GCT AGG ATA CTG Ala Arg Ile Leu ACC TCC GTA GTC GGG ATC GAG CCC GAG Thr Ser Val Val Gly Ile Giu Pro Glu ATG GGG CTC GGC Met Gly Leu Gly
CCT
Pro 310 GCG CCC GCG ATT CGC CTG CTG CTT GCG Ala Pro Ala Ile Arg Leu Leu Leu Ala 315
CGT
Arg 320 AGT GAT CTT AGT Ser Asp Leu Ser
TTG
Leu 325 AGG GAT ATC GAC Arg Asp Ile Asp
CTC
Leu 330 TTT GAG ATA AAC Phe Giu Ile Asn GAG GCG Glu Ala 335 CAG CCC GCC Gin Ala Ala TCA AAA CTT Ser Lys Leu 355
CAA
Gin 340 GTT CTA GCG GTA Val Leu Aia Val CAT GAA TTG GGT His Glu Leu Gly ATT GAG CAC Ile Giu His 350 CAC CCG CTT His Pro Leu AAT ATT TGG GGC Asn Ile Trp Gly
GGG
Gly 360 CCC ATT GCA CTT Ala Ile Ala Leu
GGA
Gly 365 CCC GCG Ala Ala 370 ACC GGA TTG CGT Thr Gly Leu Arg
CTC
Leu 375 TGC ATG ACC CTC Cys Met Thr Leu
GCT
Ala 380 CAC CAA TTG CAA His Gin Leu Gin
GCT
Ala 385 AAT AAC TTT CGA Asn Asn Phe Arg GGA ATT CCC TCG Gly Ile Ala Ser TGC ATT GGT GGG Cys Ile Gly Gly
GGA
Gly 400 1025 1073 1121 1169 1217 1265 1313 1373 1433 1493 1543 CAG GGG ATG GCG Gin Cly Met Ala CTT TTA GAG AAT Leu Leu Giu Asn
CCC
Pro 410 CAC TTC GGT TCG His Phe Gly Ser TCC TCT Ser Ser 415 GCA CGA AGT Ala Arg Ser
TCG
Ser 420 ATG ATT AAC AGA Met Ile Asn Arg
GTT
Va1 425 GAC CAC TAT CCA Asp His Tyr Pro CTG AGC TAA Leu Ser 430 431 CGGGCATCTC CTTTGTTGCT TTGAGGTGGC CCACGAGGA GGGCTCCAAA ATCTCTGCTA AAAACAACAA CAAGGAACAG GGAACATGAT TAGTTTCGCT CGTATGGCAG AAACTTTAGG AGTCCAGGCT AAACTTCCCC TTGCCTTCGC ACTCGTATTA TGTGTCGGGC TGATTGTTAC CGGCACGGGT TTCTACAGTG TACATACCTT GTCAGGGTTG CTGGCAATTC FIG. 2r: 52 Sequence 1
CTGCAGCCAG
GGCTCCAATT
GCTAGGGAGA
CATTCTGCAT
AAGGTTGCTA
GCATGGAAAT
TGGAAGCACG
AAAGAGCATG
TGCCGAAACT
CATGCCGAGC
CGATAAGGCC
GGTTGGGAAG
CAGGGGATCA
TGGATTGCAC
ACAACAGACA
GGTTCTTTTT
GCGGCTATCG
TGAAGCGGGA
TCACCTTGCT
GCTTGATCCG
TACTCGGATG
CGCGCCAGCC
CGTGACCCAT
ATTCATCGAC
CCGTGATATT
TATCGCCGCT
AGCGGGACTC
TCATGTGTGC
TGGCATCGAC
AGCAGCTGAA
TAAATAATAA
GGAATGATAT
CCATTTATGC
CGGATTGGAC
GAGCAAGACT
GTCAGGGTGA
GCAG
GGCTGAAAAG
GCTCGATGGC
TAAATTTGCT
CATGAAATTC
GGAGAGTGCA
GGCATGAGCT
ATTCCTCGCC
CAACTGACCA
GCCCGCGTTC
CTGACTCTGG
ATCGGGACAG
CCCTGCAAAG
AGATCTGATC
GCAGGTTCTC
ATCGGCTGCT
GTCAAGACCG
TGGCTGGCCA
AGGGACTGGC
CCTGCCGAGA
GCTACCTGCC
GAAGCCGGTC
GAACTGTTCG
GGCGATGCCT
TGTGGCCGGC
GCTGAAGAGC
CCCGATTCGC
TGGGGTTCGA
TGAGGAGTCA
CTACGTGTAA
AGCAGCTTTG
AGGATTCTTG
GAAAGCAAGT
AGCGCTGCCA
GGTGCGTTGG
CAAGTCTGAC
TCGTAACTTT
GAGGGATTCA
GCCGCGATTG
GGCCATGGTG
ATGAAATCAT
TTGCTCGTAA
TTGCTGGATA
CGGTAGAGCG
ACAAGAAAAT
TGCGCTCTCA
ATGCTTTCGT
CAAGCGAACC
TAAACTGGAT
AAGAGACAGG
CGGCCGCTTG
CTGATGCCGC
ACCTGTCCGG
CGACGGGCGT
TGCTATTGGG
AAGTATCCAT
CATTCGACCA
TTGTCGATCA
CCAGGCTCAA
GCTTGCCGAA
TGGGTGTGGC
TTGGCGGCGA
AGCGCATCGC
AATGACCGAC
CGTTGGATCA
GTTCGTGGAC
GTTTTGATCG
TGAAGCTTTA
AGATCAGTCT
ATGTCAGCTG
GGACAACACC
AAATGCGCCG
GACCGGGGGC
GTGAGGTCAT
AGTGTCTTGG
GCGGCCCCTG
CACTTTTCGG
GCCCAGGAAG
TGATTAGAGA
GTAACCGCGA
CGTCGTCACC
CGGCGCCACA
TCAGGCTGAC
GGAATTGCCA
GGCTTTCTTG
ATGAGGATCG
GGTGGAGAGG
CGTGTTCCGG
TGCCCTCAAT
TCCTTGCGCA
CGAAGTGCCG
CATGGCTGAT
CCAAGCGAAA
GGATGATCTG
GGCGCGCATG
TATCATGGTG
GGACCGCTAT
ATGGGCTGAC
CTTCTATCGC
CAAGCGACGC
ACGGCATAAA
GCCCTTTGCA
GAGGTAGCGG
GTTGTCCGTA
GCACTTTCAA
CAAACTCGAT
CTCAAGTATA
ACTGTCAATG
TTGGTATCCA
GAAGGGAGGG
GCGCGCTCTT
ATGGGTTGGA
GGGGTGGGTG
CACGCGGGTT
CATTAACTAT
CATTCAGGAC
GGAGTGTCCT
GTGATTGGCG
CTGAGCCATC
GCTGGGGCGC
CCGCCAAGGA
TTTCGCATGA
CTATTCGGCT
CTGTCAGCGC
GAACTGCAGG
GCTGTGCTCG
GGGCAGGATC
GCAATGCGGC
CATCGCATCG
GACGAAGAC
CCCGACGGCG
GAAAATGGCC
CAGGACATAG
CGCTTCCTCG
CTTCTTGACG
CCTGGCCGCG
TATTCCAGTG
CGCGCACTAT
GCGGAAAGGT
AACGAAAATA
AATAGCTACC
GCAGCTGGAT
GCCTTGCCTC
GTTATATCCG
ATCGTCTCGA
GACGGCGCCT
GGAGAGTTCG
TGATTTTCTG
CACGGGATTG
TCAGGATGGT
TTTGGCGGAA
CGTAAAAAGG
CCGGTATCGG
TAGATCGCAA
CTGAAGGCAT
CCTCTGGTAA
TCTGATGGCG
TTGAACAAGA
ATGACTGGGC
AGGGGCGCCC
ACGAGGCAGC
ACGTTGTCAC
TCCTGTCATC
GGCTGCATAC
AGCGAGCACG
ATCAGGGGCT
ACGATCTCGT
GCTTTTCTGG
CGTTGGCTAC
TGCTTTACG
AGTTCTTCTG
GTGATTGCAT
GACGGAGGTT
ATCTCTATGC
GCAGAATGTC
AAAATAAAGA
CTGGCAGGCG
GTAGGTAGCT
TCGCCTGAAT
GATATTCAA
TATTCTGGCT
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2164
CTGCAGCCAG
GGCTCCAATT
GCTAGGGAGA
CATTCTGCAT
AAGGTTGCTA
GCATGGAAAT
TGGAAGCACG
AAAGAGCATG
TGCCGAAACT
CATGCCGAGC
AGGCGGTTTG
GACATGGAAG
GTCGCCTTGC
ACCCAGTTGA
CGCAACTGGT
ATGGCTTGTT
CGTTACGCCG
CGATGTTACG
CATTCGCACA
TTTCGGTCGT
CCTCGGGAAC
GGTTGTTGGC
CTATATCTAT
CATCAATCTC
AGATTACGGT
GATGCACTTT
CTTCCCTGAT
CAGTGGACGG
ACTATATCTC
AAGGTGCAGA
AAATAAAAAT
CTACCCTGGC
TGGATGTAGG
GCCTCTCGCC
ATCCGGATAT
CTCGATATTC
GGCTGAAAAG
GCTCGATGGC
TAAATTTGCT
CATGAAATTC
GGAGAGTGCA
GGCATGAGCT
ATTCCTCGCC
CAACTGACCA
GCCCGCGTTC
CTGACTCTGG
CGTATTGGGC
CCATCACAAA
GTATAATATT
CATAAGCCTG
CCAGAACCTT
ATGACTGTTT
TGGGTCGATG
CAGCAGGGCA
TGTAGGCTCG
GAGTTCGGAG
TTGCTCCGTA
GCTCTCGCGG
GATCTCGCAG
CTCAAGCATG
GACGATCCCG
GATATCGACC
TGCATTCATG
AGGTTTGGCA
TATGCAGCAG
ATGTCTAAAT
AAAGAGGAAT
AGGCGCCATT
TAGCTCGGAT
TGAATGAGCA
TCAAAGTCAG
TGGCTGCAG
GAGGGATTCA
GCCGCGATTG
GGCCATGGTG
ATGAAATCAT
TTGCTCGTAA
TTGCTGGATA
CGGTAGAGCG
ACAAGAAAAT
TGCGCTCTCA
ATGCTTTCGT
GCATGCATAA
CGGCATGATG
TGCCCATGGA
TTCGGTTCGT
GACCGAACGC
TTTTGTACAG
TTTGATGTTA
GTCGCCCTAA
GCCCTGACCA
ACGTAGCCAC
GTAAGACATT
CTTACGTTCT
TCTCCGGCGA
AGGCCAACGC
CAGTGGCTCT
CAAGTACCGC
TGTGCTGAGG
TCGACCTACG
CTGAAAGCAG
AATAAAGGAT
GATATCAAAG
TATGCAGCGC
TGGACGGTGC
AGACTCAAGT
GGTGATCGTA
Sequence 2
GTGAGGTCAT
AGTGTCTTGG
GCGGCCCCTG
CACTTTTCGG
GCCCAGGAAG
TGATTAGAGA
GTAACCGCGA
CGTCGTCACC
CGGCGCCACA
TCAGGCTGAC
AAACTGTTGT
AACCTGAATC
CGCACACCGT
AAACTGTAAT
AGCGGTGGTA
TCTATGCCTC
TGGAGCAGCA
AACAAAGTTA
ACTCAAATCC
CTACTCCCAA
CATCGCGCTT
GCCCAGGTTT
GCACCGGAGG
GCTTGGTGCT
CTATACAAAG
CACCTAACAA
AGTCACGTTG
TGTAAGTTCG
CTTTGGTTTT
TCTTGTGAAG
CAAGTAGATC
TGCCAATGTC
GTTGGGGACA
CTGACAAATG
ACTTTGACCG
GAAGGGAGGG
GCGCGGTCTT
ATGGGTTGGA
GGGGTGGGTG
CACGCGGGTT
CATTAACTAT
CATTCAGGAC
GGAGTGTCCT
GTGATTGGCG
CTGAGCCATC
AATTCATTAA
GCCAGCGGCA
GGAAACGGAT
GCAAGTAGCG
ACGGCGCAGT
GGGCATCCAA
ACGATGTTAC
GGTGGCTCAA
ATGCGGGCTG
CATCAGCCGG
GCTGCCTTCG
GAGCAGCCGC
CAGGGCATTG
TATGTGATCT
TTGGGCATAC
TTCGTTCAAG
GATCAACGGC
TGGACGCCCT
GATCGGAGGT
CTTTAGTTGT
AGTCTGCACT
AGCTGCAAAC
ACACCCTCAA
CGCCGACTGT
GGGGCTTGGT
GACGGCGCCT
GGAGAGTTCG
TGATTTTCTG
CACGGGATTG
TCAGGATGGT
TTTGGCGGAA
CGTAAAAAGG
CCGGTATCGG
TAGATCGCAA
CTGAGGGGAG
GCATTCTGCC
TCAGCACCTT
GAAGGCACGA
TATGCGCTCA
GGCGGTTTTC
GCAGCAAGCG
GCAGCAGCAA
GTATGGGCAT
CTCTTGATCT
ACTCCGATTA
ACCAAGAAGC
GTAGTGAGAT
CCACCGCGCT
ACGTGCAAGC
GGGAAGAAGT
CCGAGATCGG
ATAAATATTC
TTGCACGCGC
AGCGGGCGGA
CCGTAAACGA
TTCAAAATAG
TCGATGCAGC
GTATAGCCTT
CAATGGTTAT
ATCCAATCGT
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2119
Sequence 3
CTGCAGCCAG
GGCTCCAATT
GCTAGGGAGA
CATTCTGCAT
AAGGTTGCTA
GCATGGAAAT
TGGAAGCACG
AAAGAGCATG
TGCCGAAACT
CATGCCGAGC
CGATCAACGG
GTGGACGCCC
TGATCGGAGG
GCTTTAGTTG
CAGTCTGCAC
CAGCTGCAAA
AACACCCTCA
GCGCCGACTG
GGGGGCTTGG
GGCTGAAAAG
GCTCGATGGC
TAAATTTGCT
CATGAAATTC
GGAGAGTCCA
GGCATGAGCT
ATTCCTCGCC
CAACTGACCA
GCCCGCGTTC
CTGACTCTGG
CATAAATATT
TTTGCACGCG
TAGCGGGCGG
TCCGTAAACG
TTTCAAAATA
CTCGATGCAG
AGTATAGCCT
TCAATGGTTA
TATCCAATCG
GAGGGATTCA
GCCGCGATTG
GGCCATGGTG
ATGAAATCAT
TTGCTCGTAA
TTGCTGGATA
CGGTAGAGCG
ACAAGAAAAT
TGCGCTCTCA
ATGCTTTCGT
CCAGTGGACG
CACTATATCT
AAAGGTGCAG
AAAATAAAAA
GCTACCCTGG
CTGGATGTAG
TGCCTCTCGC
TATCCGGATA
TCTCGATATT
GTGAGGTCAT
AGTGTCTTGG
GCGGCCCCTG
CACTTTTCGG
GCCCAGGAAG
TGATTAGAGA
GTAACCGCGA
CGTCGTCACC
CGGCGCCACA
TCAGGCTGAC
GAGGTTTGGC
CTATGCAGCA
AATGTCTAAA
TAAAGAGGAA
CAGGCGCCAT
GTAGCTCGGA
CTGAATGAGC
TTCAAAGTCA
CTGGCTGCAG
GAAGGGAGGG
GCGCGGTCTT
ATGGGTTGGA
GGGGTGGGTG
CACGCGGGTT
CATTAACTAT
CATTCAGGAC
GGAGTGTCCT
GTGATTGGCG
CTGAGCCATC
ATCGACCTAC
GCTGAAAGCA
TAATAAAGGA
TGATATGAAA
TTATGCAGCG
TTGGACGGTG
AAGACTCAAG
GGGTGATCGT
GACGGCGCCT
GGAGAGTTCG
TGATTTTCTG
CACGGGATTG
TCAGGATGGT
TTTGGCGGAA
CGTAAAAAGG
CCGGTATCGG
TAGATCGCAA
CTGAAGGCAT
GTGTAAGTTC
GCTTTGGTTT
TTCTTGTGAA
GCAAGTAGAT
CTGCCAATGT
CGTTGGGGAC
TCTGACAAAT
AACTTTGACC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1120
Sequence 4
GAATTCCGCG
GGTAGGGTCT
TGCGTTTGCC
TTAACTCGCG
GTCTCGCCCT
CGATTAAGAT
CTCCAGCTCA
AGAATAACAA
CGGTCGGAGC
AGGGGCCTGC
TGGAAAATCG
AGCAAACACT
ACGTGGCCAA
GCGTTGAGTT
TACTGGCCTT
CGTCCGAGCT
ATGAAACTGA
AGCCTTTCGA
CCGCGGCGGA
TTTCCCGCAG
ATGCCGGGCA
AACCGGAATT
GGATGGCTTT
CAGGATGAGG
CTTGGGTGGA
CCGCCGTGTT
CCGGTGCCCT
GCGTTCCTTG
TGGGCGAAGT
CCATCATGGC
ACCACCAAGC
ATCAGGATGA
TCAAGGCGCG
CGAATATCAT
TGGCGGACCG
GCGAATGGGC
TATCGCCCGG
TTTTCTTGGC
GCTTCGCTTC
TAAGCATTCT
TTGAGGCCGA
AATTAAAATA
AGGGCAATTT
TTGACTCCTC
TGAGCAGCTG
AAACTTGGAG
TGAAGCAATT
GCTTTGCGAC
ATGGATGGAG
TCAGCCGCTG
TGGGCCGCTG
TACCCCGCGG
GCTGACTACA
TCATCTGATC
TAACCTAGTG
TGCAGATATG
AATCTGTCTG
GCCAGCTGGG
CTTGCCGCCA
ATCGTTTCGC
GAGGCTATTC
CCGGCTGTCA
GAATGAACTG
CGCAGCTGTG
GCCGGGGCAG
TGATGCAATG
GAAACATCC
TCTGGACGAA
CATGCCCGAC
GGTGGAAAAT
CTATCAGGAC
TGACCGCTTC
TTCTATCAGC
CATGCTTGTT
GCGATGAACC
GTCATTTTTT
TTCTTGGGCG
AGGAAACCGC
TTGGGCTATT
AGGAGGTCAG
GGCTCGGCTC
CTGCGTCTGA
GCCGACGCGG
ATTGCTGGCT
CCCGAACATC
GGTGTCGTTG
GCCGGCATAT
ACTTCTGCCC
GTGCTGGGCG
TTCACCGGCG
CCCGTTACCC
GCGGACGTTG
GCACCGGACT
GCGCCCTCTG
AGGATCTCAT
ATGATTGAAC
GGCTATGACT
GCGCAGGGGC
CAGGACGAGG
CTCGACGTTG
GATCTCCTGT
CGGCGGCTGC
ATCGAGCGAG
GAGCATCAGG
GGCGAGGATC
GGCCGCTTTT
ATAGCGTTGG
CTCGTGCTTT
GGGCCGCTTT
GCCTGAACCT
GCATCGAGAT
TGGTGGCTTT
CTTGGCGGCG
ATGGTTTCTT
GGCTGAGCAG
CGATGAGCAT
TTGATCGCAT
GTAGGCTGGA
TTTCTGCTGA
CGGTGGCAAG
ACAAGGCGAT
GGGTCATTAG
TCGCAGCAGG
TGCTTGCGGA
ACGCTGAAGT
CCACTGCCGT
TGGAATTGCG
CACAACGGGT
ATGTGCTGCT
GTAAGGTTGG
GGCGCAGGGG
AAGATGGATT
GGGCACAACA
GCCCGGTTCT
CAGCGCGGCT
TCACTGAAGC
CATCTCACCT
ATACGCTTGA
CACGTACTCG
GGCTCGCGCC
TCGTCGTGAC
CTGGATTCAT
CTACCCGTGA
ACGGTATCGC
CGAAAGTCAT
TCGTTGACAT
GCTGAGGTCA
GAACAGCCTG
TCGAAGCGAT
ATGTGAATTT
TTGCCTCTAT
TCTTGGTTTG
GAAGAAGGCG
TCGTGCGATT
CTTTGGCAAT
CCTGAAGGAT
GTTTCCAGGG
TCCCTGGAAC
TAATCGCGCC
GCTAATTGCT
CGGTGCGCTG
GGCCAAGCAC
TGGCAAATCG
GTTGACGGTG
GCCGGAAGGG
GAAGCCCTGC
ATCAAGATCT
GCACGCAGGT
GACAATCGGC
TTTTGTCAAG
ATCGTGGCTG
GGGAAGGGAC
TGCTCCTGCC
TCCGGCTACC
GATGGAAGCC
AGCCGAACTG
CCATGGCGAT
CCACTGTGGC
TATTGCTGAA
CGCTCCCGAT
GGTGTTAGCC
AGGGCAGAGG
GGATTTTTCC
ATGAAAGGTG
GCTCCACTAC
GTCTGGCATA
ATGGTTATTC
AATGGTGCCC
CACCTGGAGC
GCAATGCTTC
CGCAGCCGTG
AGCCGCGAGC
GCGGAGGCAC
TTCCCTATCG
ATGCTCAAGC
CGTTACTTCG
TTCAGTGCTC
ATCATGCGTG
CCGGTGATCC
AAAACCTTCA
ACAGCAAGCG
AAAGTAAACT
GATCAAGAGA
TCTCCGGCCG
TGCTCTGATG
ACCGACCTGT
GCCACGACGG
TGGCTGCTAT
GAGAAAGTAT
TGCCCATTCG
GGTCTTGTCG
TTCGCCAGGC
GCCTGCTTGC
CGGCTGGGTG
CAGCTTGGCG
TCGCAGCGCA
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160
TCCCCTTCTA
CGACCAAGCG
AACTTGGCGA
TCAACGGAGT
GGGCCCGTTT
TGATGAGCAC
GACGGAACTC
ACGCCGCCTG
TGAGTACGCT
CGATGGATTC
CGAGCAAGGA
TC
TCGCCTTCTT
ACGCCCGCCA
TGCGCGCACC
GTTAGAACCG
CTTGAGTATT
CCTGGAAGGC
CGTTCCCCAG
ACGCTAAGCG
CTTGGATGCC
GATTGCCCGT
AGAACTCATT
GACGAGTTCT
TGCCAAGCCT
CTACGGAGAA
TTGGTAGTGG
CATTGGATAG
GCGCTGTACG
TACCGCGATG
GGGGCGGGGG
GCGGCGGCTG
GCTGCCGGCG
TCCCGGTTAG
TCTGAGCGGG
GTTCTCGTGC
GCGATCCACG
TTTTGGACGG
TCACGCGTGG
CGGACGACTG
ACTATTTTGC
CGCCCGCATC
AGATTGGTAA
TCTCAAAAAA
TGGCTCGAGA
ACTCTGGGGT
AAAGTCCTGT
GACTGCTCTC
GCCCAGGAGC
TAGCTTCGAG
GGTTCATCTT
CTCTTCCGAT
CCAGCCCAGA
CGGCAATTTC
AACGCTCTAC
CATGTCCAAC
TCGAAATGAC
GGGTGAGTCG
TGTCCTCCTT
ATGCGCTTCT
CCTGCACAGC
CGCCATTCAT
GTCCGATTCC
CAGCAACAAA
GTCAATGTGA
GTCTTGGTGG
CTTGAGGAAT
2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2822
Sequence 5
GAATTCCGCG
GGTAGGGTCT
TGCGTTTGCC
TTAACTCGCG
GTCTCGCCCT
CGATTAAGAT
CTCCAGCTCA
AGAATAACAA
CGGTCGGAGC
AGGGGCCTGC
TGGAAAATCG
AGCAAACACT
ACCTGGCCAA
GCGTTGAGTT
TACTGGCCTT
CGTCCGACCT
ATGAAACTGA
AGCCTTTCGA
CCGCGGCGGA
TTTCCCGCAG
ATGCCGGGCA
GGGCGCATGC
CAAACGGCAT
TATTTGCCCA
CCTGTTCGGT
CCTTGACCGA
GTTTTTTTGT
GATGTTTGAT
GGCAGTCGCC
CTCGGCCCTG
GGAGACGTAG
CGTAGTAAGA
GCGGCTTACG
GCAGTCTCCG
CATGAGGCCA
CCCGCAGTGG
TATCGCCCGG
TTTTCTTGGC
GCTTCGCTTC
TAAGCATTCT
TTGACGCCGA
AATTAAAATA
AGGCCAATTT
TTGACTCCTC
TGAGCAGCTG
AAACTTGGAG
TGAAGCAATT
GCTTTGCGAC
ATGGATGGAG
TCAGCCGCTG
TGGGCCGCTG
TACCCCGCGG
GCTGACTACA
TCATCTGATC
TAACCTAGTG
TGCAGATATG
AATCTGTCTG
ATAAAAACTG
GATGAACCTG
TGGACGCACA
TCGTAAACTG
ACGCAGCGGT
ACAGTCTATG
GTTATGGAGC
CTAAAACAAA
ACCAAGTCAA
CCACCTACTC
CATTCATCGC
TTCTGCCCAG
GCGAGCACCG
ACGCGCTTGG
CTCTCTATAC
TTCTATCAGC
CATGCTTGTT
GCGATGAACC
GTCATTTTTT
TTCTTGGGCG
AGGAAACCGC
TTGGGCTATT
AGGAGGTCAG
GGCTCGGCTC
CTGCGTCTGA
GCCGACGCGG
ATTGCTGGCT
CCCGAACATC
GGTGTCGTTG
GCCGGCATAT
ACTTCTGCCC
GTGCTGGGCG
TTCACCGGCG
CCCGTTACCC
GCGGACGTTG
GCACCGGACT
TTGTAATTCA
AATCGCCAGC
CCGTGGAAAC
TAATGCAAGT
GGTAACGGCG
CCTCGGGCAT
AGCAACGATG
GTTAGCTGGC
ATCCATGCGG
CCAACATCAG
GCTTGCTGCC
GTTTGAGCAG
GAGGCAGGGC
TGCTTATGTG
AAAGTTGGGC
GGGCCGCTTT
GCCTGAACCT
GCATCGAGAT
TGGTGGCTTT
CTTGGCGGCG
ATGGTTTCTT
GGCTGAGCAG
CGATGAGCAT
TTGATCGCAT
GTAGGCTGGA
TTTCTGCTGA
CGGTGGCAAG
ACAAGGCGAT
GGGTCATTAG
TCGCAGCAGG
TGCTTGCGGA
ACGCTGAAGT
GCACTGCCGT
TGGAATTGGG
CACAACGGGT
ATGTGCTGGG
TTAAGCATTC
GGCATCAGCA
GGATGAAGGC
AGCGTATGCG
CAGTGGCGGT
CCAAGCAGCA
TTACGCAGCA
TCAAGTATGG
GCTGCTCTTG
CCGGACTCCG
TTCGACCAAG
CCGCGTAGTG
ATTGCCACCG
ATCTACGTGC
ATACGGGAAG
CGAAAGTCAT
TCGTTGACAT
GCTGAGGTCA
GAACAGCCTG
TCGAAGCGAT
ATGTGAATTT
TTGCCTCTAT
TCTTGGTTTG
GAAGAAGGCG
TCGTGCGATT
CTTTGGCAAT
CCTGAAGGAT
GTTTCCAGGG
TCCCTGGAAC
TAATCGCGCC
GCTAATTGCT
CGGTGCGCTG
GGCCAAGCAC
TGGCAAATCG
GTTGACGGTG
GGAGAGGCGG
TGCCGACATG
CCTTGTCGCC
ACGAACCCAG
CTCACGCAAC
TTTCATGGCT
AGCGCGTTAC
GCAACGATGT
GCATCATTCG
ATCTTTTCGG
ATTACCTCGG
AAGCGGTTGT
AGATCTATAT
CGCTCATCAA
AAGCAGATTA
AAGTGATGCA
GGTGTTAGCC
AGGGCAGAGG
GGATTTTTCC
ATGAAAGGTG
GCTCCACTAC
GTCTGGCATA
ATGGTTATTC
AATGGTGCCC
CACCTGGAGC
GCAATGCTTC
CGCAGCCGTG
AGCCGCGAGC
GCGGAGGCAC
TTCCCTATCG
ATGCTCAAGC
CGTTACTTCG
TTCAGTGCTC
ATCATGCGTG
CCGGTGATCG
AAAACCTTCA
TTTGCGTATT
GAAGCCATCA
TTGCGTATAA
TTGACATAAG
TCGTCCAGAA
TGTTATGACT
GCCGTGGGTC
TACGCAGCAG
CACATGTAGG
TCGTGAGTTC
GAACTTGCTC
TGGCGCTCTC
CTATGATCTC
TCTCCTCAAG
CGGTGACGAT
CTTTGATATC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160
GACCCAAGTA
TGTGGGTGAG
CTCTGTCCTC
AGCATGCGCT
GAGCCTGCAC
CTTCGCCATT
GATGTCCGAT
AGACAGCAAC
TTCGTCAATG
TACGTCTTGG
AACCTTGAGG
CCGCCACCTA
TCGAACTTGG
CTTTCAACGG
TCTGGGCCCG
AGCTGATGAG
CATGACGGAA
TCCACGCCGC
AAATGAGTAG
TGACGATGGA
TGGCGAGCAA
AATTC
ACAATTCGTT
CGATGCGCGC
AGTGTTAGAA
TTTCTTGAGT
CACCCTGGAA
CTCCGTTCCC
CTGACGCTAA
GCTCTTGGAT
TTCGATTGCC
GGAAGAACTC
CAAGCCGAGA
ACCCTACGGA
CCGTTGGTAG
ATTCATTGGA
GGCGCGCTGT
CAGTACCGCG
GCGGGGGCGG
GCCGCGGCGG
CGTGCTGCCG
ATTTCCCGGT
TCGGCTTCCC
GAAGCGATCC
TGGTTTTGGA
TAGTCACGCG
ACGCGGACGA
ATGACTATTT
GGGCGCCCGC
CTGAGATTGG
GCGTCTCAAA
TAGTGGCTCG
TGCAAAGTCC
ACGGACTGCT
CGGGCCCAGG
TGGTAGCTTC
CTGGGTTCAT
TGCCTCTTCC
ATCCCAGCCC
TAACGGCAAT
AAAAACGCTG
AGACATGTCC
2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2775
Sequence 6
GAATTCCGCG
GGTAGGGTCT
TGCGTTTGCC
TTAACTCGCG
GTCTCGCCCT
CGATTAAGAT
CTCCAGCTCA
AGAATAACAA
CGGTCGGAGC
AGGGGCCTGC
TGGAAAATCG
AGCAAACACT
ACGTGGCCAA
GCGTTGAGTT
TACTGGCCTT
CGTCCGAGCT
ATGAAACTGA
AGCCTTTCGA
CCGCGGCGGA
TTTCCCGCAG
ATGCCGGGCA
CGGAGAAGCG
GTAGTGGTTT
TGGATAGTCA
CTGTACGCGG
CGCGATGACT
GCGGGGGCGC
GCGGCTGAGA
GCCGGCGTCT
CGGTTAGTGG
TATCGCCCGG
TTTTCTTGGC
GCTTCGCTTC
TAAGCATTCT
TTGAGGCCGA
AATTAAAATA
AGGGCAATTT
TTGACTCCTC
TGAGCAGCTG
AAACTTGGAG
TGAAGCAATT
GCTTTGCGAC
ATGGATGGAG
TCAGCCGCTG
TGGGCCGCTG
TACCCCGCGG
GCTGACTACA
TCATCTGATC
TAACCTAGTG
TGCAGATATG
AATCTGTCTG
ATCCACGGAC
TGGACGGGCC
CGCGTGGTAG
ACGACTGGGT
ATTTTGCCTC
CCGCATCCCA
TTGGTAACGG
CAAAAAAAAC
CTCGAGACAT
TTCTATCAGC
CATGCTTGTT
GCGATGAACC
GTCATTTTTT
TTCTTGGGCG
AGGAAACCGC
TTGGGCTATT
AGGAGGTCAG
GGCTCGGCTC
CTGCGTCTGA
GCCGACGCGG
ATTGCTGGCT
CCCGAACATC
GGTGTCGTTG
GCCGGCATAT
ACTTCTGCCC
GTGCTGGGCG
TTCACCGGCG
CCCGTTACCC
GCGGACGTTG
GCACCGTGGG
TGCTCTCTGT
CAGGAGCATG
CTTCGAGCCT
TCATCTTCGC
TTCCGATGTC
GCCCAGACAG
CAATTTCGTC
GCTGTACGTC
GTCCAACCTT
GGGCCGCTTT
GCCTGAACCT
GCATCGAGAT
TCGTGGCTTT
CTTGGCGGCG
ATGGTTTCTT
GGCTGAGCAC
CGATGAGCAT
TTGATCGCAT
GTAGGCTGGA
TTTCTGCTGA
CGGTGGCAAG
ACAAGGCGAT
GGGTCATTAG
TCGCAGCAGG
TGCTTGCGGA
ACGCTGAAGT
GCACTGCCGT
TGGAATTGGG
CACAACGGGT
TGAGTCGAAC
CCTCCTTTCA
CGCTTCTGGG
GCACAGCTGA
CATTCATGAC
CGATTCCACG
CAACAAATGA
AATGTGACGA
TTGGTGGCGA
GAGGAATTC
CGAAAGTCAT
TCGTTGACAT
GCTGAGGTCA
GAACAGCCTG
TCGAAGCGAT
ATGTGAATTT
TTGCCTCTAT
TCTTGGTTTG
GAAGAAGGCG
TCGTGCGATT
CTTTGGCAAT
CCTGAAGGAT
GTTTCCAGGG
TCCCTGGAAC
TAATCGCGCC
GCTAATTGCT
CGGTCCGCTG
GGCCAAGCAC
TGGCAAATCG
GTTGACGGTG
TTGGCGATGC
ACGGAGTGTT
CCCGTTTCTT
TGAGCACCCT
GGAACTCCGT
CCGCCTGACG
GTAGGCTCTT
TGGATTCGAT
GCAAGGAAGA
GGTGTTAGCC
AGGGCAGAGG
GGATTTTTCC
ATGAAAGGTG
GCTCCACTAC
GTCTGGCATA
ATGGTTATTC
AATGGTGCCC
CACCTGGAGC
GCAATGCTTC
CGCAGCCGTG
AGCCGCGAGC
GCGGAGGCAC
TTCCCTATCG
ATGCTCAAGC
CGTTACTTCG
TTCAGTGCTC
ATCATGCGTG
CCGGTGATCG
AAAACCTTCA
GCGCACCCTA
AGAACCGTTG
GAGTATTCAT
CGAAGGCGCG
TCCCCAGTAC
CTAAGCGGGG
GGATGCCGCG
TGCCCGTCCT
ACTCATTTCC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1779
Sequence 7
CTGCAGCCGA
CCCGCGGCAC
CTCGCCTCAT
CAAGGCCAGT
AACCGCAGGC
CTCCGGTTGA
TTCTTCCCTT
AACAAACCTG
AAATGTTCCA
AGCGTCCGCT
CTATGTATGC
ATTTGGCGAA
ATGCAGCACC
GTAAGGTTGG
GGCGCAGGG
AAGATGGATT
GGGCACAACA
GCCCGGTTCT
CAGCGCGGCT
TCACTGAAGC
CATCTCACCT
ATACGCTTGA
CACGTACTCG
GGCTCGCGCC
TCGTCGTGAC
CTGGATTCAT
CTACCCGTGA
ACGGTATCGC
TCTGAGCGGG
CGGTCGGCGA
GCCACACTGT
CGGCAGTCCG
ACTTAGGGGT
TCGATAGCGT
GGCACATTGG
TTTGCGGCAC
ATCACGTGCT
GCATCGATTG
TATCCAATCT
TTCAATCTCT
CGCGGAGAGT
ATCATCATGC
GGTTACGCAA
CCCGGGTCGA
CGTTGCTGCC
CAACGTCCGC
GCTTATCGTC
GGGCATTCCC
GCTGCGTCAC
TTTCCAGGGG
GAAGCCCTGC
ATCAAGATCT
GCACGCAGGT
GACAATCGGC
TTTTGTCAAG
ATCGTGGCTG
GGGAAGGGAC
TGCTCCTGCC
TCCGGCTACC
GATGGAAGCC
AGCCGAACTG
CCATGGCGAT
CGACTGTGGC
TATTGCTGAA
CGCTCCCGAT
ACTCTGGGGT
AAGTTGATGC
GAGTTGGTCA
CACCCCCTGG
AAAGGTCGCT
ACTCGCAGGC
CTTGTACAC
AGGCTTCGAA
GTGTGTCGCG
AGCACTTTAC
AAATCGATCT
AACTTGATAA
CTCGAAGAGG
TCTCCTCAGC
GACGCTGGAG
ATTCTTGAGC
AGGGCGGCAA
GCCATCGCAC
TCTGGAAATG
TATTGCCCGG
ATCGTAGGTC
ACACCAAGCG
AAAGTAAACT
GATCAAGAGA
TCTCCGGCCG
TGCTCTGATG
ACCGACCTGT
GCCACGACGG
TGGCTGCTAT
GAGAAAGTAT
TGCCCATTCG
GGTCTTGTCG
TTCGCCAGGC
GCCTGCTTGC
CGGCTGGGTG
GAGCTTGGCG
TCGCAGCGCA
TCGAAATGAC
GCTGTATCGT
GGGGGGGCTT
ATTGATTGCG
CGCGAAGTTC
TCTATGGCTC
GGTGTTCCCA
CTGCTTCGGC
GGCTGCAG
CCAGCTGCGC
TCGGGCGCCG
AAACAGAGCT
AGAGTACAGT
CACGCTACCG
GTATTGTCCG
GTCTCGAGCA
ATGGGGAATG
AGAGCTTGCT
ACCTGGAACA
TGTCTCCTGC
TTCTGCAACC
AACCGGAATT
GGATGGCTTT
CAGGATGAGG
CTTGGGTGGA
CCGCCGTGTT
CCGGTGCCCT
GCGTTCCTTG
TGGGCGAAGT
CCATCATCGC
ACCACCAAGC
ATCAGGATGA
TCAAGGCGCG
CGAATATCAT
TGGCGGACCG
GCCAATGGGC
TCGCCTTCTA
CGACCAAGCG
GGTGAAGATC
ACTCGGCGTT
GGGGTGCCCT
TGATGCGTGC
AAGCAACCTT
AGTCGGTTCC
AGGCCGGCGA
TGGCTGACCA
CGGGCATCAT
GTTCTCCGGT
GAACGCCGAG
CAGTGTCTCG
GATGCCTTCT
TTGGGCTAAG
GCGTCGTATC
TCCTTACGGA
TCTTCAGCTG
TTATTCACTG
GGGACTGGTC
GCCAGCTGGG
CTTGCCGCCA
ATCGTTTCGC
GAGGCTATTC
CCGGCTGTCA
GAATGAACTG
CGCAGCTGTG
GCCGGGGCAC
TGATGCAATG
GAAACATCGC
TCTGGACGAA
CATGCCCGAC
GGTGCAAAAT
CTATCAGGAC
TGACCGCTTC
TCGCCTTCTT
ACGCCCCTGT
AATCCATGCT
TTCCGACACT
GTCGCTGCTG
GTCGCTTGAA
TGATGCTTAC
GGCCTTGGGG
GCAGATTTCC
TTCAGAATGG
GCCCGCGGCG
CTTGGTGGAT
TCCACATTGC
ATTGGTCATC
CTCGAGGCGC
ACCCGTCCAG
AGCTACGCGG
CTATCGGCAG
GCATTTGGGG
CTGTCGCAAG
TTTGCTGCCG
GCGCCCTCTG
AGGATCTGAT
ATGATTGAAC
GGCTATGACT
GCCCAGGGGC
CACGACGAGG
CTCGACGTTG
GATCTCCTGT
CGGCGGCTGC
ATCGAGCGAG
GAGCATCAGG
GGCGAGGATC
GGCCGCTTTT
ATAGCGTTGG
CTCGTGCTTT
GACGAGTTCT
TTTGCAATGG
GCGTGACGAG
GCGTTGGTTG
TCGCCTATCG
CCACAAATGG
CTGCTCCCGC
GTGCAGCGCA
CAAGGCGCTG
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2188
Sequence 8
CTGCAGCCGA
CCCGCGGCAC
CTCGCCTCAT
CAAGGCCAGT
AACCGCAGGC
CTCCGGTTGA
TTCTTCCCTT
AACAAACCTG
AAATGTTCCA
AGCGTCCGCT
CTATGTATGC
ATTTGGCGAA
ATGCAGCACC
TGTAATTCAT
ATCGCCAGCG
CGTGGAAACG
AATGCAAGTA
GTAACGGCGC
CTCGGGCATC
GCAACGATGT
TTAGGTGCCT
TCCATGCGGG
CAACATCAGC
CTTGCTGCCT
TTTGAGCAGC
AGGCAGGGCA
GCTTATGTGA
AAGTTGGGCA
CAATTCGTTC
TGCGCTGTAT
TCAGGGGGGG
TGGATTGATT
GCTCGCGAAG
GGCTCTATGG
AGCGGTGTTC
GAACTGCTTC
GCGGGCTGCA
GCATCGATTG
TATCCAATCT
TTCAATCTCT
CGCGGAGAGT
ATCATCATGC
GGTTACGCAA
CCCGGGTCGA
CGTTGCTGCC
CAACGTCCGC
GCTTATCGTC
GGGCATTCCC
GCTGCGTCAC
TTTCCAGGGG
TAAGCATTCT
GCATCAGCAC
GATGAAGGCA
GCGTATGCGC
AGTGGCGGTT
CAAGCAGCAA
TACGCAGCAG
CAAGTATGGG
CTGCTCTTCA
CGGACTCCGA
TCGACCAAGA
CGCGTAGTGA
TTGCCACCGC
TCTACGTGCA
TACGGGAAGA
AAGCCGAGAT
CGTGGTGAAG
CTTACTCGGC
GCGGGGGTGC
TTCTGATGCG
CTCAAGCAAG
CCAAGTCGGT
GGCAGGCCGG
G
AGCACTTTAC
AAATCGATCT
AACTTGATAA
CTCGAAGAGG
TCTGCTCAGC
GACGCTGGAG
ATTCTTGAGC
AGGGCGGCAA
GCCATCGCAC
TCTGGAAATG
TATTGCCCGG
ATCGTAGGTC
GAGAGGCGGT
GCCGACATGG
CTTGTCGCCT
CGAACCCAGT
TCACGCAACT
TTCATGGCTT
GCGCGTTACG
CAACGATGTT
CATCATTCGC
TCTTTTCGGT
TTACCTCGGG
AGCGGTTGTT
GATCTATATC
GCTCATCAAT
AGCAGATTAC
AGTGATGCAC
CGGCTTCCCC
ATCAATCCAT
GTTTTCCGAC
CCTGTCGCTG
TGCGTCGCTT
CTTTGATGCT
TCCGGCCTTG
CGAGCAGATT
CCAGCTGCGC
TCGGGCGCCG
AAACAGAGCT
AGAGTACAGT
CACGCTACCG
GTATTGTCCG
GTCTCGAGCA
ATGGGGAATG
AGAGCTTGCT
ACCTGGAACA
TGTCTCCTGC
TTCTGCAACC
TTGCGTATTG
AAGCCATCAC
TGCGTATAAT
TGACATAAGC
GGTCCAGAAC
GTTATGACTG
CCGTGGGTCG
ACGCAGCAGG
ACATGTAGGC
CGTGAGTTCG
AACTTGCTCC
GGCGCTCTCG
TATGATCTCG
CTCCTCAAGC
GGTGACGATC
TTTGATATCG
TGTTTTGCAA
GCTGCGTGAC
ACTGCGTTGG
GTGTCGCCTA
GAACCACAAA
TACCTGCTCC
GGGGTGCAGC
TCCCAAGGCG
TGGCTGACCA
CGGGCATCAT
GTTCTCCGGT
GAACGCCGAG
CAGTGTGTCG
GATGCGTTCT
TTGGGCTAAG
GCGTCGTATC
TCCTTACGGA
TCTTCAGCTG
TTATTCACTG
GGGACTGGTC
GGCGCATGCA
AAACGGCATG
ATTTGCCCAT
CTGTTCGGTT
CTTGACCGAA
TTTTTTTGTA
ATGTTTGATG
GCAGTCGCCC
TCGGCCCTGA
CAGACGTAGC
GTAGTAAGAC
CGGCTTACGT
CAGTCTCCGG
ATGAGGCCAA
CCGCAGTGGC
ACCCAAGTAC
TGGCGGTCGG
GAGGCCACAC
TTGCGGCAGT
TCGACTTAGG
TGGTCGATAG
CGCGGCACAT
GCATTTGCGG
CTGATCACGT
TTCAGAATGG
GCCCGCGGCG
CTTGGTGGAT
TCCACATTGC
ATTGGTCATC
CTCGAGGCGC
ACCCGTCCAG
AGCTACGCGG
CTATCGGCAG
GCATTTGGGG
CTGTCGCAAG
TTTGCTGCCG
TAAAAACTGT
ATGAACCTGA
GGACGCACAC
CGTAAACTGT
CGCAGCGGTG
CAGTCTATGC
TTATGGAGCA
TAAAACAAAG
CCAAGTCAAA
CACCTACTCC
ATTCATCGCG
TCTGCCCAGG
CGAGCACCGG
CGCGCTTGCT
TCTCTATACA
CGCCACCTAA
CGAAAGTTGA
TGTGAGTTGG
GCGCACCCCC
GGTAAAGGTC
CGTACTCGCA
TGGCTTGTAC
CACAGGCTTC
GCTGTGTGTC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2171
Sequence 9
CTGCAGCCGA
CCCGCGGCAC
CTCGCCTCAT
CAAGGCCAGT
AACCGCAGGC
CTCCGGTTGA
TTCTTCCCTT
AACAAACCTG
AAATGTTCCA
AGCGTCCGCT
CTATGTATGC
ATTTGGCGAA
ATGCAGCACC
ATCGTGGTGA
GGCTTACTCG
TTGCGGGGGT
AGTTCTGATG
GGCTCAAGCA
TCCCAAGTCG
TCGGCAGGCC
CAG
GCATCGATTG
TATCCAATCT
TTCAATCTCT
CGCGGAGAGT
ATCATCATGC
GGTTACGCAA
CCCGGGTCGA
CGTTGCTGCC
CAACGTCCGC
GCTTATCGTC
GGGCATTCCC
GCTGCGTCAC
TTTCCAGCGC
AGATCAATCC
GCGTTTTCCG
GCCCTGTCGC
CGTGCGTCGC
AGCTTTGATG
GTTCCGGCCT
GGCGAGCAGA
AGCACTTTAC
AAATCGATCT
AACTTGATAA
CTCGAAGAGC
TCTGCTCAGC
GACGCTGGAG
ATTCTTGAGC
AGGGCGGCAA
GCCATCGCAC
TCTCGAAATG
TATTGCCCGG
ATCGTAGGTC
GCTGTTTTGC
ATGCTGCGTG
ACACTGCGTT
TGGTGTCGCC
TTGAACCACA
CTTACCTGCT
TGGGGGTGCA
TTTCCCAAGG
CCAGCTGCGC
TCGGGCGCCG
AAACAGAGCT
AGAGTACAGT
CACGCTACCG
GTATTGTCCG
GTCTCGAGCA
ATGGGGAATG
AGAGCTTGCT
ACCTGGAACA
TGTCTCCTGC
TTCTGCAACC
AATGGCGGTC
ACGAGGCCAC
GGTTCCGGCA
TATCGACTTA
AATGGTCGAT
CCCGCGGCAC
GCGCATTTGC
CGCTGATCAC
TGGCTGACCA
CGGGCATCAT
GTTCTCCGGT
GAACGCCGAG
CAGTGTGTCG
GATGCGTTCT
TTGGGCTAAG
GCGTCCTATC
TCCTTACGGA
TCTTCAGCTG
TTATTCACTG
GGGACTGGTC
GGCGAAAGTT
ACTGTGAGTT
GTGCGCACCC
GGGGTAAAGG
AGCGTACTCG
ATTGGCTTGT
GGCACAGGCT
GTGCTGTGTG
TTCAGAATGG
GCCCGCGGCG
CTTGGTGGAT
TCCACATTGC
ATTGGTCATC
CTCGAGGCGC
ACCCGTCCAG
AGCTACGCGG
CTATCGGCAG
GCATTTGGGG
CTGTCGCAAG
TTTGCTGCCG
GATGCGCTGT
GGTCAGGGGG
CCTGGATTGA
TCGCTCGCGA
CAGGCTCTAT
ACAGCGGTGT
TCGAACTGCT
TCGCGGGCTG
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1203
Sequence 10
GAATTCCCCT
GCTTGCGTTA
GGGTACGCCT
AATTGACAGA
GGTGCACGAT
GTATCGCTTG
ATCGAGAGAT
TTCTGACTGG
AGACCGATGC
GGAATTGCCA
GGCTTTCTTG
ATGAGGATCG
GGTGGAGAGG
CGTGTTCCGG
TGCCCTGAAT
TCCTTGCGCA
CGAAGTGCCG
CATGGCTGAT
CCAAGCGAAA
GGATGATCTG
GGCGCGCATG
TATCATGGTG
GGACCGCTAT
ATGGGCTGAC
CTTCTATCGC
CAACCGACGC
CTTGCAGACC
AATAATGACA
CCTTGTTCAG
GTATCGCGCG
GCTGCGTTTC
GCGGCGGATC
GCAGCGGGAA
C
GGCGACGAAA
ATCGTTAACC
TTCCGTGCGC
ACTATAGGTT
GAATAGCTAC
GGTCACGCTG
GGTCGAGGTT
TGCAGGCGAA
TGGCCCCGAA
GCTGGGGCGC
CCGCCAAGGA
TTTCGCATGA
CTATTCGGCT
CTGTCAGCGC
GAACTGCAGG
GCTGTGCTCG
GGGCAGGATC
GCAATGCGGC
CATCGCATCG
GACGAAGAGC
CCCGACGGCG
GAAAATGGCC
CAGGACATAG
CGCTTCCTCG
CTTCTTGACG
CCCGAGCAGG
TACAAGCGCT
ATAATGAGGA
CATCTGATGA
TCGCTGCTGC
CTGAATGGGC
TTCTAGAGGA
ACTGGTATGG
GGGCGGCAGG
GTTTGAAATT
TTTGATCTGC
CGCAGTAGCT
GATGGCCGTT
AACCGCCCGG
CTGGAGGTGC
TCCTGGACCG
ATTCTGCAAG
CCTCTGGTAA
TCTGATGGCG
TTGAACAAGA
ATGACTGGGC
AGGGGCGCCC
ACGAGGCAGC
ACCTTGTCAC
TCCTGTCATC
GGCTGCATAC
AGCGAGCACG
ATCAGGGGCT
AGGATCTCCT
GCTTTTCTGG
CGTTGGCTAC
TGCTTTACGG
AGTTCTTCTG
GCATGAAGCA
GATAAATGCG
GTGCCCAATG
GCGCACCTTC
CAGTTTGGAA
GGCGCTTGCT
CCGTTCTTCC
GTTTAACGTT
CCGCATGGCC
CCTTGCCAAA
GCTTCCGTGC
TTTGCTCACC
GGTCTACCGT
AGAAGCGCAA
TGGAGCAGGA
CGGGCATGGA
AGAAGATTCG
GGTTGGGAAG
CAGGGGATCA
TGGATTGCAC
ACAACAGACA
GGTTCTTTTT
GCGGCTATCG
TGAAGCGGGA
TCACCTTGCT
GCTTGATCCG
TACTCGGATG
CGCGCCAGCC
CGTGACCCAT
ATTCATCGAC
CCGTGATATT
TATCGCCGCT
AGCGGGACTC
GTTCCTTGAC
CCGGGGCCCT
TTTCACGTGC
GAGCGTCGTA
GATCCGGACG
CCGAGCGAAC
GAGTTCACCG
TACCTGGCGG
ACGGCTGGGC
TTTCGGCGAG
CTTGAATCAG
CACCAAATCC
TGATGTGAAG
CGCAATGAGC
CGCAGATGCT
CCTGAAGGAG
TCGGGGACAG
CCCTGCAAAG
AGATCTGATC
GCAGGTTCTC
ATCGGCTGCT
GTCAAGACCG
TGGCTGGCCA
AGGGACTGGC
CCTGCCGAGA
GCTACCTGCC
GAAGCCGGTC
GAACTCTTCG
GGCGATGCCT
TGTGGCCGGC
GCTGAAGAGC
CCCGATTCGC
TGGGGTTCGA
GAGAAAAGCA
CGCTCCGCCC
CCCTGCTTAT
GCCCGCTGAC
CCGCAGTGGC
GCCGTGCCCG
CCGCAGCGAG
CGGGCATGTT
GGTAACTGAT
AGAATCATGC
AAAAATAGTT
ACAGCACTGG
GTTGAAGAAG
CCAACTCTCA
CGCGTGCTTG
TATTTCCGCG
CAAGCGAACC
TAAACTGGAT
AAGAGACAGG
CGGCCGCTTG
CTGATGCCGC
ACCTGTCCGG
CGACGGGCGT
TGCTATTGGG
AAGTATCCAT
CATTCGACCA
TTGTCGATCA
CCAGGCTCAA
GCTTGCCGAA
TGGGTGTGGC
TTGGCGGCGA
AGCGCATCGC
AATGACCGAC
TCAAGCCGGG
CCGGCCTTCC
TGGTGGTAAG
CGGAGAAGTG
CGCTGCACAG
ACTGCTGCGA
TGAAACTGGC
GCGGGGAATT
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 1981
Sequence 11
GAATTCCCCT
GCTTGCGTTA
GGGTACGCCT
AATTGACAGA
GGTGCACGAT
GTATCGCTTG
ATCGAGAGAT
TTCTGACTGG
AGACCGATGC
GTATTGGGCG
CATCACAAAC
TATAATATTT
ATAAGCCTGT
CAGAACCTTG
TGACTGTTTT
GGGTCGATGT
AGCAGGGCAG
GTAGGCTCGG
AGTTCGGAGA
TGCTCCGTAG
CTCTCGCGGC
ATCTCGCAGT
TCAAGCATGA
ACGATCCCGC
ATATCGACCC
AGGGCATGAA
GCTGATAAAT
GGAGTGCCCA
TGAGCGCACC
TGCCAGTTTG
GGCGGCGCTT
GGACCGTTCT
TGGGTTTAAC
GGCGACGAAA
ATCGTTAACC
TTCCGTGCGC
ACTATAGGTT
GAATAGCTAC
GGTCACGCTG
GGTCGAGGTT
TGCAGGCGAA
TGGCCCCGAA
CATGCATAAA
GGCATGATGA
GCCCATGGAC
TCGGTTCGTA
ACCGAACGCA
TTTGTACAGT
TTGATGTTAT
TCGCCCTAAA
CCCTGACCAA
CGTAGCCACC
TAAGACATTC
TTACGTTCTG
CTCCGGCGAG
GGCCAACGCG
AGTGGCTCTC
AAGTACCGCC
GCAGTTCCTT
GCGCCGGGGC
ATGTTTCACG
TTCGAGCGTC
GAAGATGCGG
GCTCCGAGCG
TCCGAGTTCA
GTTTACCTGG
GGGCGGCAGG
GTTTGAAATT
TTTGATCTGC
CGCAGTAGCT
GATGGCCGTT
AACCGCCCGG
CTGGAGGTGC
TCCTGGACCG
ATTCTGCAAG
AACTGTTGTA
ACCTGAATCG
GCACACCGTG
AACTGTAATG
GCGGTGGTAA
CTATGCCTCG
GGAGCAGCAA
ACAAAGTTAG
GTCAAATCCA
TACTCCCAAC
ATCGCGCTTG
CCCAGGTTTG
CACCGGAGGC
CTTGGTGCTT
TATACAAAGT
ACCTAACAAT
GACGAGAAAA
CCTCGCTGCG
TGCCCCTGCT
GTAGCCCGCT
ACGCCGCAGT
AACGCCGTGC
CCGCCGCAGC
CGGCGGGCAT
CCGCATGGCC
CCTTGCCAAA
GCTTCCGTGC
TTTGCTCACC
GGTCTACCGT
AGAAGCGCAA
TGGAGCAGGA
CGGGCATGGA
AGAAGATTCG
ATTCATTAAG
CCAGCGGCAT
GAAACGGATG
CAAGTAGCGT
CGGCGCAGTG
GGCATCCAAG
CGATGTTACG
GTGGCTCAAG
TGCGGGCTGC
ATCAGCCGCA
CTGCCTTCGA
AGCAGCCGCC
AGCGCATTGC
ATGTGATCTA
TGGCCATACG
TCGTTCAAGC
GCATCAAGCC
CCCCCGGCCT
TATTGGTGGT
GACCGGAGAA
GGCCGCTGCA
CCGACTGCTG
GACTGAAACT
GTTGCGGGGA
ACGGCTGGGC
TTTCGGCGAG
CTTGAATCAG
CACCAAATCC
TGATGTGAAG
CGCAATGAGC
CGCAGATGCT
CCTGAAGGAG
TCGGGGGAGA
CATTCTGCCG
CAGCACCTTG
AAGGCACGAA
ATGCGCTCAC
GCGGTTTTCA
CAGCAAGCC
CAGCAGCAAC
TATGGGCATC
TCTTGATCTT
CTCCGATTAC
CCAAGAAGCG
TAGTGAGATC
CACCGCGCTC
CGTGCAAGCA
GGAAGAAGTG
CGAGATCGGC
GGGCTTGCAG
TCCAATAATG
AAGCCTTGTT
GTGGTATCGC
CAGGCTGCGT
CGAGCGGCGG
GGCGCAGCGG
ATTC
GGTAACTGAT
AGAATCATGC
AAAAATAGTT
ACAGCACTGG
GTTGAAGAAG
CCAACTCTCA
CGCGTGCTTG
TATTTCCGCG
GGCGGTTTGC
ACATGGAAGC
TCGCCTTGCG
CCCAGTTGAC
GCAACTGGTC
TGGCTTGTTA
GTTACGCCGT
GATGTTACGC
ATTCGCACAT
TTCGGTCGTG
CTCGGCAACT
GTTGTTGGCG
TATATCTATG
ATCAATCTCC
GATTACGGTG
ATGCACTTTG
TTCCCCGAGC
ACCTACAAGC
ACAATAATGA
CAGCATCTGA
GCGTCGCTGC
TTCCTGAATG
ATCTTCTAGA
GAAACTGGTA
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1964
Sequence 12
GAATTCCCCT
GCTTGCGTTA
GGGTACGCCT
AATTGACAGA
GGTGCACGAT
GTATCGCTTG
ATCGAGAGAT
TTCTGACTGG
AGACCGATGC
AGTTCCTTGA
GCCGGGGCCC
GTTTCACGTG
CGAGCGTCGT
AGATGCGGAC
TCCGAGCGAA
CGAGTTCACC
TTACCTGGCG
GGCGACGAAA
ATCGTTAACC
TTCCGTGCGC
ACTATAGGTT
GAATAGCTAC
GGTCACGCTG
GGTCGAGGTT
TGCAGGCGAA
TGCCCCCGAA
CGAGAAAAGC
TCGCTGCGCC
CCCCTGCTTA
AGCCCGCTGA
GCCGCAGTGG
CGCCGTGCCC
GCCGCAGCGA
GCGGGCATGT
GGGCGGCAGG
GTTTGAAATT
TTTGATCTGC
CGCAGTAGCT
GATGGCCGTT
AACCGCCCGG
CTGGAGGTGC
TCCTGGACCG
ATTCTGCAAG
ATCAAGCCGG
CCCGGCCTTC
TTGGTGGTAA
CCGGAGAAGT
CCGCTGCACA
GACTGCTGCG
GTGAAACTGG
TGCGGGGAAT
CCGCATGGCC
CCTTGCCAAA
GCTTCCGTGC
TTTGCTCACC
GGTCTACCGT
AGAAGCGCAA
TGGAGCAGGA
CGGGCATGGA
AGAAGATTCG
GCTTGCAGAC
CAATAATGAC
GCCTTGTTCA
GGTATCGCGC
GGCTGCGTTT
AGCGGCGGAT
CGCAGCGGGA
TC
ACGGCTGGGC
TTTCGGCGAG
CTTGAATCAG
CACCAAATCC
TGATGTGAAG
CGCAATGAGC
CGCAGATGCT
CCTGAAGGAG
TCGCGAGCAG
CTACAAGCGC
AATAATGAGG
GCATCTGATG
GTCGCTGCTG
CCTGAATGGG
CTTCTAGAGG
AACTGGTATG
GGTAACTGAT
AGAATCATGC
AAAAATAGTT
ACAGCACTGG
GTTGAAGAAG
CCAACTCTCA
CGCGTGCTTG
TATTTCCGCG
GGCATGAAGC
TGATAAATGC
AGTGCCCAAT
AGCGCACCTT
CCAGTTTGGA
CGGCGCTTGC
ACCGTTCTTC
GGTTTAACGT
Sequence 13
GAATTCCAAT AATGACAATA
TGGTAAGCCT
AGAAGTGGTA
TGCACAGGCT
GCTGCGAGCG
AACTGGCGCA
GGAAGCCGCG
TAGCTTTGCC
TGCTCCGGTA
GGTGTTGAAA
TGATGCTGGT
TGCGGTGGTG
GACCCACGTT
GGAATTAGGT
CGAAGCGGCG
TCTGATTGTG
ACTGCGTGCT
TGCAGGTCAA
AATTGCCAGC
CTTTCTTGCC
GAGGATCGTT
TGGAGAGGCT
TGTTCCGGCT
CCCTGAATGA
CTTGCGCAGC
AAGTGCCGGG
TGGCTGATGC
AAGCGAAACA
ATGATCTGGA
CGCGCATGCC
TCATGGTGGA
ACCGCTATCA
GGGCTGACCG
TCTATCGCCT
AGCGACGCCC
ATGACGAGGC
TGTTCAGCAT
TCGCGCGTCG
GCGTTTCCTG
GCGGATCTTC
GCGGGAAACT
GCCATGACCA
ATGGCGGTTC
ATCCTTGGCG
AGCTCTGAGC
CTGGGGGATG
GAGCGACTGA
GGACGGATCA
GGTAAGGCTC
GCCTTTGGTG
ACAGCAGTCG
GGCGATCCTA
CGCATCCAGG
TGGGGCGCCC
GCCAAGGATC
TCGCATGATT
ATTCGGCTAT
GTCAGCGCAG
ACTGCAGGAC
TGTGCTCGAC
GCAGCATCTC
AATGCGGCGG
TCGCATCGAG
CGAAGAGCAT
CGACGGCGAG
AAATGGCCGC
GGACATAGCG
CTTCCTCGTG
TCTTGACGAG
GGCCCAGCGC
TCAGATGCCA
ATGAGGAGTG
CTGATGAGCG
CTGCTGCCAG
AATGGGCGGC
TAGAGGACCG
GGTATGGGTT
CACAGATTCA
GACAGCCATG
TACGGGCTGT
TGAGTCCCTT
GCGTGGTGAA
TTGCAAATCC
TTGGTGAGCT
CGTTCTTGGT
CCTACTTCAA
CAGACGCCTT
ATGATCCGCA
TTCTGGTCGA
TCTGGTAAGG
TGATGGCGCA
GAACAAGATG
GACTGGGCAC
GGGCGCCCGG
GACGCAGCGC
GTTGTCACTG
CTGTCATCTC
CTGCATACGC
CGAGCACGTA
CAGGGGCTCG
GATCTCGTCG
TTTTCTGGAT
TTGGCTACCC
CTTTACGGTA
TTCTTCTGAG
GTCGATTCGG
TTCGGTGGGG
CCCAATGTTT
CACCTTCGAG
TTTGGAAGAT
GCTTGCTCCG
TTCTTCCGAG
TAACGTTTAC
GGGCGATGTC
TGGCGTCGTG
TGCGATCCCG
TACCCATCGC
TGTCATCAGC
TGCGGTACGT
GTCTGCGCGT
CTTGGACGAT
TCAGGGTCAA
TGTTGAAAAG
ATCGGTCTTG
TGATGCGCTC
TTGGGAAGCC
GGGGATCAAG
GATTGCACGC
AACAGACAAT
TTCTTTTTGT
GGCTATCGTG
AAGCGGGAAG
ACCTTGCTCC
TTGATCCGGC
CTCGGATGGA
CGCCAGCCGA
TGACCCATGG
TCATCGACTG
GTGATATTGC
TCGCCGCTCC
CGGGACTCTG
GCATTTGCCA
TGAAGTCCAG
CACGTGCCCC
CGTCGTAGCC
GCGGACGCCG
AGCGAACGCC
TTCACCGCCG
CTGGCGGCGG
ATTCCGTCCA
CTCGGTATTG
TTGGCATGCG
CTGATTGGTC
AATGCCCCGC
CGAGTGAACT
CATCTGAAGC
GCCGACCTCG
ATCTGCATGT
CTGGCGAGGA
GGTTCCTTGA
GGGGACAGCA
CTGCAAAGTA
ATCTGATCAA
AGGTTCTCCG
CGGCTGCTCT
CAAGACCGAC
GCTGGCCACG
GGACTGGCTG
TGCCGAGAAA
TACCTGCCCA
AGCCGGTCTT
ACTGTTCGCC
CGATGCCTGC
TGGCCGGCTG
TGAAGAGCTT
CGATTCGCAG
GCGTTCGAAA
TATCAATGGA
CGGCTACGC
TGCTTATTGG
CGCTGACCGG
CAGTGGCCGC
GTGCCCGACT
CAGCGAGTGA
GCATGTTGCG
ATGTGCCCGG
CGCCTTGGAA
GCAATACCGT
AGGTGTTGCA
AAGACGCTCC
TCACCGGTTC
CTGCTGTGCT
ATGCGGCGGT
CCACTGAGCG
AGGTCGCCAC
TTGATGCCAA
AGCGAACCGG
AACTGGATGG
GAGACAGGAT
GCCGCTTGGG
GATCCGCCG
CTGTCCGGTG
ACGGGCGTTC
CTATTGGGCG
GTATCCATCA
TTCGACCACC
GTCGATCAGG
AGGCTCAAGG
TTGCCGAATA
GGTGTGGCGC
GGCGGCGAAT
CGCATCGCCT
TGACCGACCA
CCGACTGTGC
AGCTTCGGCA
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160
GTCGAGCATC
GGCACTATCC
CTCATTTCAA
CCAGTCGCGG
CAGGCATCAT
GTTGAGGTTA
CCCTTCCCGG
GATTGAGCAC
AATCTAAATC
TCTCTAACTT
AGAGTCTCGA
CATGCTCTGC
CGCAAGACGC
GTGGAATTC
TTTACCCAC
GATCTTCGGG
GATAAAAACA
AGAGGAGAGT
TCAGCCACGC
TGGAGGTATT
TGCGCTGGCT
CGCCGCGGGC
GAGCTGTTCT
ACAGTGAACG
TACCGCAGTG
GTCCGGATGC
GACCATTCAG
ATCATGCCCG
CCGGTCTTGG
CCGAGTCCAC
TGTCGATTGG
GTTCTCTCGA
AATGGCCCGC
CGGCGCTCGC
TGGATCAAGG
ATTGCAACCG
TCATCCTCCG
GGCGCTTCTT
2220 2280 2340 2400 2460 2520 2539
Sequence 14
GAATTCCAAT
TGGTAAGCCT
AGAAGTGGTA
TGCACAGGCT
GCTGCGAGCG
AACTGGCGCA
GGAAGCCGCG
TAGCTTTGCC
TGCTCCGGTA
GGTGTTGAAA
TGATGCTGGT
TGCGGTGGTG
GACCCACGTT
GGAATTAGGT
CGAAGCGGCG
TCTGATTGTG
ACTGCGTGCT
TGCAGGTCAA
CTGTTGTAAT
CTGAATCGCC
ACACCGTGGA
CTGTAATGCA
GGTGGTAACG
ATGCCTCGGG
AGCAGCAACG
AAAGTTAGGT
CAAATCCATG
CTCCCAACAT
CGCGCTTGCT
CAGGTTTGAG
CCGGAGGCAG
TGGTGCTTAT
TACAAAGTTG
CTAACAATTC
TTTGCCATAT
AGTCCAGCGG
AATGACAATA
TGTTCAGCAT
TCGCGCGTCG
GCGTTTCCTG
GCGGATCTTC
GCGGGAAACT
GCCATGACCA
ATGGCGGTTC
ATCCTTGGCG
AGCTCTGAGC
CTGGGGGATG
GAGCGACTGA
GGACGGATCA
GGTAAGGCTC
GCCTTTGGTG
ACAGCAGTCG
GGCGATCCTA
CGCATCCAGG
TCATTAAGCA
AGCGGCATCA
AACGGATGAA
AGTAGCGTAT
GCGCAGTGGC
CATCCAAGCA
ATGTTACGCA
GGCTCAAGTA
CGGGCTGCTC
CAGCCGGACT
GCCTTCGACC
CAGCCGCGTA
GGCATTGCCA
GTGATCTACG
GGCATACGGG
GTTCAAGCCG
CAATGGACCC
CTACGGCAGC
ATGAGGAGTG
CTGATGAGCG
CTGCTGCCAG
AATGGGCGGC
TAGAGGACCG
GGTATGGGTT
CACAGATTCA
GACAGCCATG
TACGGGCTGT
TGAGTCCCTT
GCGTGGTGAA
TTGCAAATCC
TTGGTGAGCT
CGTTCTTGGT
CCTACTTCAA
CAGACGCCTT
ATGATCCGCA
TGGGGAGAGG
TTCTGCCGAC
GCACCTTGTC
GGCACGAACC
CCGCTCACGC
GGTTTTCATG
GCAAGCGCGT
GCAGCAACGA
TGGGCATCAT
TTGATCTTTT
CCGATTACCT
AAGAAGCGGT
GTGAGATCTA
CCGCGCTCAT
TGCAAGCAGA
AAGAAGTGAT
AGATCGGCTT
ACTGTGCATG
TTCGGCAGTC
CCCAATGTTT
CACCTTCGAG
TTTGGAAGAT
GCTTGCTCCG
TTCTTCCGAG
TAACGTTTAC
GGGCGATGTC
TGGCGTGGTG
TGCGATGCCG
TACCCATCGC
TGTCATCAGC
TGCGGTACGT
GTCTGCGCGT
CTTGGACGAT
TCAGGGTCAA
TGTTGAAAAG
ATCGGTCTTG
CGGTTTGCGT
ATGGAAGCCA
GCCTTGCGTA
CAGTTGACAT
AACTGGTCCA
GCTTGTTATG
TACGCCGTGG
TGTTACGCAG
TCGCACATGT
CGGTCGTGAG
CGGAACTTG
TGTTGGCGCT
TATCTATGAT
CAATCTCCTC
TTACGGTGAC
GCACTTTGAT
CCCAATTGGC
ACGAGGCTCA
GAGCATCGAT
CACGTGCCCC
CGTCGTAGCC
GCGGACGCCG
AGCGAACGCC
TTCACCGCCG
CTGGCGGCGG
ATTCCGTCCA
CTCGGTATTG
TTGGCATGCG
CTGATTGGTC
AATGCCCCGC
CGAGTGAACT
CATCTGAAGC
GCCGACCTCG
ATCTGCATGT
CTGGCGAGGA
GGTTCGTTGA
ATTGGGCGCA
TCACAAACGG
TAATATTTGC
AAGCCTGTTC
GAACCTTGAC
ACTGTTTTTT
GTCCATGTTT
CAGGGCAGTC
AGGCTCGGCC
TTCGGAGACG
CTCCGTAGTA
CTCGCGGCTT
CTCGCAGTCT
AAGCATGAGG
GATCCCGCAG
ATCGACCCAA
CCAGCGCGTC
GATGCCATTC
TGAGCACTTT
TGCTTATTGG
CGCTGACCGG
CAGTGGCCGC
GTGCCCGACT
CAGCGAGTGA
GCATGTTGCG
ATGTGCCCGG
CGCCTTGGAA
GCAATACCGT
AGGTGTTGCA
AAGACGCTCC
TCACCGGTTC
CTGCTGTGCT
ATGCGGCGGT
CCACTGAGCG
AGGTCGCCAC
TTGATGCCAA
TGCATAAAAA
CATGATGAAC
CCATGGACGC
GGTTCGTAAA
CGAACGCAGC
TCTACAGTCT
GATGTTATGG
GCCCTAAAAC
CTGACCAAGT
TAGCCACCTA
AGACATTCAT
ACGTTCTGCC
CCGGCGAGCA
CCAACGCGCT
TCGCTCTCTA
GTACCGCCAC
GATTCGGGCA
GGTGGGGTGA
ACCCAGCTGC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160
GCTGGCTGAC
CGCGGGCATC
CTGTTCTCCG
GTGAACGCCG
CGCAGTGTGT
CGGATGCGTT
CATTCAGAAT
ATGCCCGCGG
GTCTTGGTGG
AGTCCACATT
CGATTGGTCA
CTCTCGAGGC
GGCCCGCGGC
CGCTCGCCTC
ATCAAGGCCA
GCAACCGCAG
TCCTCCGGTT
GCTTCTTCCC
ACTATCCAAT
ATTTCAATCT
GTCGCGGAGA
GCATCATCAT
GAGGTTACGC
TTCCCGGGTG
CTAAATCGAT
CTAACTTGAT
GTCTCGAAGA
GCTCTGCTCA
AAGACGCTGG
GAATTC
CTTCGGGCGC
AAAAACAGAG
GGAGAGTACA
GCCACGCTAC
AGGTATTGTC
2220 2280 2340 2400 2460 2506
Sequence 15
GAATTCCAAT
TGGTAAGCCT
AGAAGTGGTA
TCCACAGGCT
GCTGCGAGCG
AACTGGCGCA
GGAAGCCGCG
TAGCTTTGCC
TGCTCCGGTA
GGTGTTGAAA
TGATGCTGGT
TGCGGTGGTG
GACCCACGTT
GGAATTAGGT
CGAAGCGGCG
TCTGATTGTG
ACTGCGTGCT
TGCAGGTCAA
TTGGCCCAGC
GCTCAGATGC
TCGATTGAGC
CCAATCTAAA
AATCTCTAAC
GGAGAGTCTC
ATCATGCTCT
TACGCAAGAC
GGGTGGAATT
AATGACAATA
TGTTCAGCAT
TCGCGCGTCG
GCGTTTCCTG
GCGGATCTTC
GCGGGAAACT
GCCATGACCA
ATGGCGGTTC
ATCCTTGGCG
AGCTCTGAGC
CTGGGGGATG
GAGCGACTGA
GGACGGATCA
GGTAAGGCTC
GCCTTTGGTG
ACAGCAGTCG
GGCGATCCTA
CGCATCCAGG
GCGTCGATTC
CATTCGGTGG
ACTTTACCCA
TCGATCTTCG
TTGATAAAAA
GAAGAGGAGA
GCTCACCCAC
GCTGGAGGTA
C
ATGAGGAGTG
CTGATGAGCG
CTGCTGCCAG
AATGGGCGGC
TAGAGGACCG
GGTATGGGTT
CACAGATTCA
GACAGCCATG
TACGGGCTGT
TGAGTCCCTT
GCGTGGTGAA
TTGCAAATCC
TTGGTGAGCT
CGTTCTTGGT
CCTACTTCAA
CAGACGCCTT
ATGATCCGCA
TTCTGGTCGA
GGGCATTTGC
GGTGAAGTCC
GCTGCGCTGG
GCCGCCGCGG
CAGAGCTGTT
GTACAGTGAA
GCTACCGCAG
TTGTCCGGAT
CCCAATGTTT
CACCTTCGAG
TTTGGAACAT
GCTTGCTCCG
TTCTTCCGAG
TAACGTTTAC
GGGCGATGTC
TGGCGTGGTG
TGCGATGCCG
TACCCATCGC
TGTCATCAGC
TGCGGTACGT
GTCTGCGCGT
CTTGGACGAT
TCAGGGTCAA
TGTTGAAAAG
ATCGGTCTTG
TGATGCGCTC
CATATCAATG
AGCGGCTACG
CTGACCATTC
GCATCATGCC
CTCCGGTCTT
CGCCGAGTCC
TGTGTCGATT
GCGTTCTCTC
CACGTGCCCC
CGTCGTAGCC
GCGGACGCCG
AGCGAACGCC
TTCACCGCCG
CTGGCGGCGG
ATTCCGTCCA
CTCGGTATTG
TTGGCATGCG
CTGATTGGTC
AATGCCCCGC
CGAGTGAACT
CATCTGAAGC
GCCGACCTCG
ATCTGCATGT
CTGGCGAGGA
GGTTCGTTGA
GCAAAAGGCG
GACCGACTGT
GCAGCTTCGG
AGAATGGCCC
CGCGGCGCTC
GGTGGATCAA
ACATTGCAAC
GGTCATCCTC
GAGGCGCTTC
TGCTTATTGG
CGCTGACCGG
CACTGGCCGC
GTGCCCGACT
CAGCGAGTGA
GCATGTTGCG
ATGTGCCCGG
CGCCTTGGAA
GCAATACCGT
AGGTGTTGCA
AAGACGCTCC
TCACCGGTTC
CTGCTGTGCT
ATGCGGCGGT
CCACTGAGCG
AGGTCGCCAC
TTGATGCCAA
CGCAATGGAA
GCATGACGAG
CAGTCGAGCA
GCGGCACTAT
GCCTCATTTC
GGCCACTCGC
CGCAGGCATC
CGGTTGAGCT
TTCCCTTCCC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1571
Sequence 16
GAATTCCGCG
GTGACGAGGC
GTTGGTTGCG
GCCTATCGAC
ACAAATGGTC
GCTCCCGCGG
GCAGCGCATT
AGGCGCTGAT
GTATACACAC
GGAGGCATTG
GACAGCAAGC
CAAAGTAAAC
TGATCAAGAC
TTCTCCGGCC
CTGCTCTGAT
GACCGACCTG
GGCCACGACG
CTGGCTGCTA
CGAGAAAGTA
CTGCCCATTC
CGGTCTTGTC
GTTCGCCAGG
TGCCTGCTTG
CCGGCTGGGT
AGAGCTTGGC
TTCGCAGCC
TTCGAAATGA
AGAGATCGTG
TCAACTGCCT
TGAAGCCCTT
CTGTGCCGTA
GCCGGTCTTG
GCTCGGCCCT
TATCGACCTC
ATTGGGTATT
GCTTGCCGCG
GTCGGCGAAA
CACACTGTGA
GCAGTGCGCA
TTAGGGGTAA
GATAGCGTAC
CACATTGGCT
TGCGGCACAG
CACGTGCTGT
CGGGGCGGGT
TTTGATCCTG
GAACCGGAAT
TGGATGGCTT
ACAGGATGAG
CCTTGGGTGG
GCCGCCGTGT
TCCGGTGCCC
GGCGTTCCTT
TTGGGCGAAG
TCCATCATGG
GACCACCAAG
GATCAGGATG
CTCAAGGCGC
CCGAATATCA
GTGGCGGACC
GGCGAATGGG
ATCGCCTTCT
CCGACCAAGC
GCTGTTACGG
CGGAAGGCAA
TCCCGATTGA
GTGGACGGCG
GCTAGGATAC
GCGCCCGCGA
TTTGAGATAA
GAGCACTCAA
ACCGGATTGC
GTTGATGCGC
GTTGGTCAGG
CCCCCTGGAT
AGGTCGCTCG
TCGCAGGCTC
TGTACAGCGG
GCTTCGAACT
GTGTCGCGGC
TCCGCCTCGG
CTCCAGGACT
TGCCAGCTGG
TCTTGCCGCC
GATCGTTTCG
AGACGCTATT
TCCGGCTGTC
TGAATGAACT
GCGCAGCTGT
TGCCGGGGCA
CTGATGCAAT
CGAAACATCG
ATCTGGACGA
GCATGCCCGA
TGGTGGAAAA
GCTATCAGGA
CTGACCGCTT
ATCGCCTTCT
GACGCCCATT
ATGAACAGTT
AATTGTTGAT
AGCCTGTTCA
CCGCGGCGGC
TGGCTACCTC
TTCGCCTGCT
ACGAGGCGCA
AACTTAATAT
GTCTCTGCAT
TGTATCGTGG
GGGGGCTTAC
TGATTGCGGG
CGAAGTTCTG
TATGGCTCAA
TGTTCCCAAG
GCTTCGGCAG
AGAGTCCATG
TGCGCCCGTT
CGACATGATC
GGCGCCCTCT
AAGGATCTGA
CATGATTGAA
CGGCTATGAC
AGCGCAGGGG
GCAGGACGAG
GCTCGACGTT
GGATCTCCTG
GCGGCGGCTG
CATCGAGCGA
AGAGCATCAG
CGGCGAGCAT
TGGCCGCTTT
CATAGCGTTG
CCTCGTGCTT
TGACGAGTTC
GAGGGCGCAA
CGATTTAGAG
CGTGACAGTC
TTCTGGCGGG
TTTGGTGCCT
CGTAGTCGGG
GCTTGCGCGT
GGCCGCCCAA
TTGGGCGGG
GACCCTCGCT
TGAAGATCAA
TCGGCGTTTT
GGTGCCCTGT
ATGCGTGCGT
GCAAGCTTTG
TCGGTTCCGG
GCCCGCGAGC
TCGCGTAACC
GAGTTCAAGG
GCTACCGCAG
GGTAAGGTTG
TGGCGCAGGC
CAAGATGGAT
TGGGCACAAC
CGCCCGGTTC
GCAGCGCGGC
GTCACTGAAG
TCATCTCACC
CATACGCTTG
GCACGTACTC
GGGCTCGCGC
CTCGTCGTGA
TCTGGATTCA
GCTACCCGTG
TACGGTATCG
TTCTGAGCCG
GAGGAGAAAT
GGCTACAACA
ATCCGCGGCC
GTGCAGACTG
CGAGAGTCGT
ATCGAGCCCG
AGTGATCTTA
GTTCTAGCGG
GCCATTGCAC
CACCAATTGC
TCCATGCTC
CCGACACTGC
CGCTGGTGTC
CGCTTGAACC
ATGCTTACCT
CCTTGGGGGT
AGATTTCCCA
CCATCGCGTC
ATTTTTTGTG
AAAACCTGGG
GGAAGCCCTG
GATCAAGATC
TGCACGCAGG
AGACAATCGG
TTTTTGTCAA
TATCGTGGCT
CGGGAAGGGA
TTGCTCCTGC
ATCCGGCTAC
GGATGGAAGC
CAGCCGAACT
CCCATGGCGA
TCGACTGTGG
ATATTGCTGA
CCGCTCCCGA
GACTCTGGGG
GGATTGACCA
GTCGAGCAAT
TAGCAGTCTT
CGGGCAACAG
CTGCGACACA
AGCATATGGG
GTTTGAGGGA
TACAGCATGA
TTGGACACCC
AAGCTAATAA
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160
CTTTCGATAT
AGAGAATCCC
CTATCCACTG
TCGAAAATCT
TGGCAGAAAG
TCGGGCTGAT
GAATTC
GGAATTGCCT CGGCATGCAT CACTTCGGTT CGTCCTCTGC AGCTAACGGG CATCTCCTTT CTGCTAAAAA CAAGAAGAAG TTTAGGAGTC CAGGCTAAAC TGTTACCGGC ACGGGTTTCT
TGGTGGGGGA
ACGAAGTTCG
GTTGCTTTGA
GAACAGGGAA
TTGCCCTTGC
ACAGTGTACA
CAGGGGATGG
ATGATTAACA
GGTGGCGCAC
CATGATTAGT
CTTCGCACTC
TACCTTGTCA
CGGTTCTTTT
GAGTTGACCA
GAAGGAGGGC
TTCGCTCGTA
GTATTATGTG
GGGTTGGTGG
2220 2280 2340 2400 2460 2520 2526
Sequence 17
GAATTCCGCG
GTGACGAGGC
GTTGGTTGCG
GCCTATCGAC
ACAAATGGTC
GCTCCCGCGG
GCAGCGCATT
AGGCGCTGAT
CTATACACAC
GGAGGCATTG
GGAGAGGCGG
TGCCGACATG
CCTTGTCGCC
ACGAACCCAG
CTCACGCAAC
TTTCATGGCT
AGCGCGTTAC
GCAACGATGT
GCATCATTCG
ATCTTTTCG
ATTACCTCG
AAGCGGTTGT
AGATCTATAT
CGCTCATCAA
AAGCAGATTA
AAGTGATGCA
TCGGCTTCCC
CGGATGAACA
CAAAATTGTT
TGAAGCCTGT
GCGCCGCGGC
TACTGGCTAC
CGATTCGCCT
TAAACGAGC
CAAAACTTAA
TGCGTCTCTG
GTCGGCGAAA
CACACTGTGA
GCAGTGCGCA
TTAGGGGTAA
GATAGCGTAC
CACATTGGCT
TGCGGCACAG
CACGTGCTGT
CGGGGCGGGT
TTTGATCCTG
TTTGCGTATT
GAAGCCATCA
TTGCGTATAA
TTGACATAAG
TGGTCCAGAA
TGTTATGACT
GCCGTGGGTC
TACGCAGCAG
CACATGTAGG
TCGTGAGTTC
GAACTTGCTC
TGGCGCTCTC
CTATGATCTC
TCTCCTCAAG
CGGTGACGAT
CTTTGATATC
ATTGAGGGCG
GTTCGATTTA
GATCGTGACA
TCATTCTGC
GGCTTTGGTG
CTCCGTAGTC
GCTGCTTGCG
GCAGCCCC
TATTTGGGGC
CATGACCCTC
GTTGATCCC
GTTGGTCAGG
CCCCCTGGAT
AGGTCGCTCG
TCGCAGGCTC
TGTACAGCGG
GCTTCGAACT
GTGTCGCGGC
TCCGCCTCGG
CTCCAGGACT
GGGCGCATGC
CAAACGGCAT
TATTTGCCCA
CCTGTTCGGT
CCTTGACCGA
GTTTTTTTGT
GATGTTTGAT
GGCAGTCCC
CTCGGCCCTG
GGAGACGTAG
CGTAGTAAGA
GCGGCTTACG
GCAGTCTCCG
CATGAGGCCA
CCCGCAGTGG
GACCCAAGTA
CAAGAGGAGA
GAGGGCTACA
GTCATCCGCG
GGGGTGCAGA
GCTCGAGAGT
GGGATCGAGC
CGTAGTGATC
CAAGTTCTAG
GGGGCCATTG
GCTCACCAAT
TGTATCGTGG
GGGGGCTTAC
TGATTGCGGG
CGAAGTTCTG
TATGGCTCAA
TGTTCCCAAG
GCTTCGGCAG
AGAGTCCATG
TGCGCCCGTT
CGACATGATC
ATAAAAACTG
GATGAACCTG
TGGACGCACA
TCGTAAACTG
ACGCAGCGGT
ACAGTCTATG
GTTATGGAGC
CTAAAACAAA
ACCAAGTCAA
CCACCTACTC
CATTCATCGC
TTCTGCCCAG
GCGAGCACCG
ACGCGCTTGG
CTCTCTATAC
CCCCCACCTA
AATGGATTGA
ACAGTCGAGC
GCCTAGCACT
CTCCCGGCAA
CGTCTGCGAC
CCGAGCATAT
TTAGTTTGAG
CGGTACAGCA
CACTTGGACA
TGCAAGCTAA
TGAAGATCAA
TCGGCGTTTT
GGTGCCCTGT
ATGCGTGCGT
GCAAGCTTTG
TCGGTTCCCG
GCCGGCGAGC
TCGCGTAACC
GAGTTCAAGG
GCTACCGCAG
TTGTAATTCA
AATCGCCAGC
CCGTGGAAAC
TAATGCAAGT
GGTAACGGCG
CCTCGGGCAT
AGCAACGATG
GTTAGGTGGC
ATCCATGCGG
CCAACATCAG
GCTTGCTGCC
GTTTGAGCAG
GAGGCAGGGC
TGCTTATGTG
AAAGTTGGGC
ACAATTCGTT
CCAAGAGATC
AATTGAACTG
CTTTGAAGCC
CAGCTGTGCC
ACAGCCGGTC
GGGGCTCGGC
GGATATCGAC
TGAATTGGGT
CCCGCTTGCC
TAACTTTCGA
TCCATGCTGC
CCGACACTC
CGCTGGTGTC
CGCTTGAACC
ATGCTTACCT
CCTTGGGGGT
AGATTTCCCA
CCATCGCGTC
ATTTTTTGTG
AAAACCTGGG
TTAAGCATTC
GGCATCAGCA
GGATGAAGGC
AGCGTATGCG
CAGTGGCGGT
CCAAGCAGCA
TTACGCAGCA
TCAAGTATGG
GCTGCTCTTG
CCGGACTCCG
TTCGACCAAG
CCGCGTAGTG
ATTCCACCG
ATCTACGTGC
ATACGGGAAG
CAAGCCGAGA
GTGGCTGTTA
CCTCGGAAGG
CTTTCCCGAT
GTAGTGGACG
TTGGCTAGGA
CCTGCGCCCG
CTCTTTGAGA
ATTGAGCACT
GCGACCGGAT
TATGGAATTG
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160
CCTCGGCATG CATTGGTGG GTTCGTCCTC TGCACGAAGT GGGCATCTCC TTTGTTGCTT AAACAAGAAG AAGGAACAGG GTCCAGGCTA AACTTGCCCT GGCACGGGTT TCTACAGTGT
GGACAGGGGA
TCGATGATTA
TGAGGTGGCG
GAACATGATT
TGCCTTCGCA
ACATACCTTG
TGGCGGTTCT
ACAGAGTTGA
CACGAAGGAG
AGTTTCGCTC
CTCGTATTAT
TCAGGGTTGG
TTTAGAGAAT
CCACTATCCA
GGCTCGAAAA
GTATGGCAGA
GTGTCGGGCT
TGGGAATTC
CCCCACTTCG
CTGAGCTAAC
TCTCTGCTAA
AAGTTTAGGA
GATTGTTACC
2220 2280 2340 2400 2460 2509
Sequence 18
GAATTCCGCG
GTGACGAGGC
GTTGGTTGCG
GCCTATCGAC
ACAAATGGTC
GCTCCCGCGG
GCAGCGCATT
AGGCGCTGAT
GTATACACAC
GGAGGCATTG
GCGCATTGAG
AACAGTTCGA
TGTTGATCGT
CTGTTCATTC
CGGCGGCTTT
CTACCTCCGT
GCCTGCTGCT
AGGCGCAGGC
TTAATATTTG
TCTGCATGAC
CATGCATTGG
CCTCTGCACG
CTCCTTTGTT
GAAGAAGGAA
GCTAAACTTG
GGTTTCTACA
GTCGGCGAAA
CACACTGTGA
GCAGTGCGCA
TTAGGGGTAA
GATAGCGTAC
CACATTGGCT
TGCGGCACAG
CACGTGCTGT
CGGGGCGGGT
TTTGATCCTG
GGCGCAAGAG
TTTAGAGGC
GACAGTCATC
TGGCGGGGTG
GGTGGCTCGA
AGTCGGGATC
TGCGCGTAGT
CGCCCAAGTT
GGGCGGGGCC
CCTCGCTCAC
TGCGGGACAG
AAGTTCGATG
GCTTTGAGGT
CAGGGAACAT
CCCTTGCCTT
GTGTACATAC
GTTGATGCGC
GTTGGTCAGG
CCCCCTGGAT
AGGTCGCTCG
TCGCAGGCTC
TGTACAGCGG
GCTTCGAACT
GTGTCGCGGC
TCCGCCTCGG
CTCCAGGACT
GAGAAATGGA
TACAACAGTC
CGCGGCCTAG
CAGACTGCGG
GAGTCGTCTG
GAGCCCGAGC
GATCTTAGTT
CTAGCGGTAC
ATTGCACTTG
CAATTGCAAG
GGGATGGCGG
ATTAACAGAG
GGCGCACGAA
GATTAGTTTC
CGCACTCGTA
CTTGTCAGGG
TGTATCGTGG
GGGGGCTTAC
TGATTGCGGG
CGAAGTTCTG
TATGGCTCAA
TGTTCCCAAG
GCTTCGGCAG
AGAGTCCATG
TGCGCCCGTT
CGACATGATC
TTGACCAAGA
GAGCAATTGA
CAGTCTTTGA
GCAACAGCTG
CGACACAGCC
ATATGGGGCT
TGAGGGATAT
AGCATGAATT
GACACCCGCT
CTAATAACTT
TTCTTTTAGA
TTGACCACTA
GGAGGGCTCG
GCTCGTATG
TTATGTGTCG
TTGGTGGGAA
TGAAGATCAA
TCGGCGTTTT
GGTGCCCTGT
ATGCGTGCGT
GCAAGCTTTG
TCGGTTCCGG
GCCGGCGAGC
TCGCGTAACC
GAGTTCAAGG
GCTACCGCAG
GATCGTGGCT
ACTGCCTCGG
AGCCCTTTCC
TGCCGTAGTG
GGTCTTGGCT
CGGCCCTGCG
CGACCTCTTT
GGGTATTGAG
TGCCGCGACC
TCGATATGGA
GAATCCCCAC
TCCACTGAGC
AAAATCTCTG
CAGAAAGTTT
GGCTGATTGT
TTC
TCCATGCTGC
CCGACACTGC
CGCTGGTGTC
CGCTTGAACC
ATGCTTACCT
CCTTGGGGGT
AGATTTCCCA
CCATCGCGTC
ATTTTTTGTG
AAAACCTGGC
GTTACGGATG
AAGGCAAAAT
CGATTGAAGC
GACGGCGCCG
AGGATACTGG
CCCGCGATTC
GAGATAAACG
CACTCAAAAC
GGATTGCGTC
ATTCCCTCGG
TTCGGTTCGT
TAACGGGCAT
CTAAAAACAA
AGGAGTCCAG
TACCGGCACG
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1543
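
The listing above presents each sequence in blocks of ten nucleotides with running position counters. The following is a minimal sketch, not part of the patent, of how such blocks might be screened for extraction damage before any further use. It assumes that every block should contain only the letters A, C, G and T and be at most ten characters long; the function name `check_blocks` and the sample blocks are hypothetical.

```python
import re

# Assumption: a well-formed block is 1 to 10 unambiguous nucleotides.
VALID_BLOCK = re.compile(r"^[ACGT]{1,10}$")


def check_blocks(blocks):
    """Return (index, block) pairs that do not look like clean nucleotide blocks."""
    return [(i, b) for i, b in enumerate(blocks) if not VALID_BLOCK.match(b)]


def total_length(blocks):
    """Sum of block lengths, for comparison with the final position counter."""
    return sum(len(b) for b in blocks)


# Hypothetical sample: the second block carries a stray hyphen.
sample = ["GAATTCCGCG", "TCA-AGCATGA", "TTC"]
suspect = check_blocks(sample)
```

Blocks flagged this way still need to be checked against the original patent document; the sketch only locates candidates and does not decide what the correct nucleotides are.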

Claims (20)

1. Transformed and/or mutagenized unicellular or multicellular organism, wherein an enzyme of eugenol and/or ferulic acid catabolism is inactivated by inserting an omega element or introducing a deletion into a corresponding gene, such that the intermediates coniferyl alcohol, coniferyl aldehyde, ferulic acid, vanillin and/or vanillic acid accumulate.
2. Organism according to claim 1, characterized in that a gene encoding any of the enzymes coniferyl alcohol dehydrogenases, coniferyl aldehyde dehydrogenases, feruloyl-CoA synthetases, enoyl-CoA hydratase-aldolases, beta-ketothiolases, vanillin dehydrogenases or vanillic acid demethylases is inactivated.
3. Organism according to one of claims 1 or 2, characterized in that it is unicellular, preferably a microorganism or a plant or animal cell.
4. Organism according to one of claims 1 to 3, characterized in that it is a bacterium, preferably a Pseudomonas species.
5. Gene structure for preparing transformed organisms and mutants, comprising a nucleotide sequence encoding any of the enzymes coniferyl alcohol dehydrogenases, coniferyl aldehyde dehydrogenases, feruloyl-CoA synthetases, enoyl-CoA hydratase-aldolases, beta-ketothiolases, vanillin dehydrogenases or vanillic acid demethylases, said nucleotide sequence being inactivated.
6. Gene structure, comprising a nucleotide sequence encoding any of the enzymes coniferyl alcohol dehydrogenases, coniferyl aldehyde dehydrogenases, feruloyl-CoA synthetases, enoyl-CoA hydratase-aldolases, beta-ketothiolases, vanillin dehydrogenases or vanillic acid demethylases, said nucleotide sequence being inactivated by inserting an omega element or introducing a deletion.
7. Gene structure having any of the structures given in Figures 1a to 1r.
8. Gene structure having any of the sequences given in Sequence 1 to Sequence 18.
9. Vector, comprising at least one gene structure according to one of claims 5 to 8.
10. Organism according to one of claims 1 to 4, characterized in that it harbours a vector according to claim 9.
11. Organism according to one of claims 1 to 4, characterized in that it contains a gene structure according to one of claims 5 to 8, which is integrated into the genome instead of the respective intact gene.
12. Process for the biotechnological preparation of organic compounds, in particular alcohols, aldehydes and organic acids, characterized in that an organism according to one of claims 1 to 4, 10 or 11 is employed.
13. Use of an organism according to one of claims 1 to 4, 10 or 11 for preparing coniferyl alcohol, coniferyl aldehyde, ferulic acid, vanillin and/or vanillic acid.
14. Use of a gene structure according to one of claims 5 to 8 or a vector according to claim 9 for preparing transformed and/or mutagenized organisms.
15. Transformed organism according to any one of claims 1 to 4, substantially as described herein with reference to the examples.
16. A gene structure according to claim 5 or 6, substantially as described herein with reference to the examples.
17. A vector according to claim 9 substantially as described herein with reference to the examples.
18. Process according to claim 12 substantially as described herein with reference to the examples.
19. Use according to claim 13 or 14 substantially as described herein with reference to the examples.
DATED this 24th day of March, 2003
HAARMANN & REIMER GMBH
By its Patent Attorneys
DAVIES COLLISON CAVE
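
The claims describe inactivating a gene either by inserting an omega element or by introducing a deletion. As a purely illustrative, string-level model of those two operations (not the laboratory procedure of the patent), the sketch below treats a gene as a plain sequence string. The function names, the toy gene and the placeholder cassette `NNNN` are all hypothetical.

```python
def insert_cassette(gene: str, cassette: str, position: int) -> str:
    """Model insertional inactivation: place a cassette inside the gene sequence."""
    if not 0 <= position <= len(gene):
        raise ValueError("position lies outside the gene")
    return gene[:position] + cassette + gene[position:]


def delete_region(gene: str, start: int, end: int) -> str:
    """Model inactivation by deletion: remove gene[start:end] (end exclusive)."""
    if not 0 <= start < end <= len(gene):
        raise ValueError("invalid deletion coordinates")
    return gene[:start] + gene[end:]


# Toy example with a hypothetical 12-nt "gene" and a placeholder cassette.
toy_gene = "ATGGCTAAGTGA"
disrupted = insert_cassette(toy_gene, "NNNN", 6)
shortened = delete_region(toy_gene, 3, 9)
```

In this model, `disrupted` is longer than the intact gene by the length of the cassette and `shortened` is missing the deleted stretch; whether a real construct actually abolishes enzyme activity is an experimental question the sketch does not address.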
AU10413/00A 1998-10-31 1999-10-20 Construction of production strains for producing substituted phenols by specifically inactivating genes of the eugenol and ferulic acid catabolism Ceased AU761093B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
DE19850242 1998-10-31
DE19850242A DE19850242A1 (en) 1998-10-31 1998-10-31 Construction of production strains for the production of substituted phenols by targeted inactivation of genes of eugenol and ferulic acid catabolism
PCT/EP1999/007952 WO2000026355A2 (en) 1998-10-31 1999-10-20 Construction of production strains for producing substituted phenols by specifically inactivating genes of the eugenol and ferulic acid catabolism

Publications (2)

Publication Number Publication Date
AU1041300A AU1041300A (en) 2000-05-22
AU761093B2 true AU761093B2 (en) 2003-05-29

Family

ID=7886266

Family Applications (1)

Application Number Title Priority Date Filing Date
AU10413/00A Ceased AU761093B2 (en) 1998-10-31 1999-10-20 Construction of production strains for producing substituted phenols by specifically inactivating genes of the eugenol and ferulic acid catabolism

Country Status (14)

Country Link
EP (1) EP1124947A2 (en)
JP (1) JP2003533166A (en)
KR (1) KR20020022045A (en)
CN (1) CN1325444A (en)
AU (1) AU761093B2 (en)
BR (1) BR9914930A (en)
CA (1) CA2348962A1 (en)
DE (1) DE19850242A1 (en)
HK (1) HK1041902A1 (en)
HU (1) HUP0104772A3 (en)
IL (1) IL142272A0 (en)
PL (1) PL348647A1 (en)
SK (1) SK5742001A3 (en)
WO (1) WO2000026355A2 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100830691B1 (en) * 2006-11-21 2008-05-20 광주과학기술원 Novel bacterium able to biotransform isoeugenol and eugenol to natural vanillin or vanillic acid
WO2012172108A1 (en) 2011-06-17 2012-12-20 Symrise Ag Microorganisms and methods for producing substituted phenols
JP6509215B2 (en) 2013-07-22 2019-05-08 ビーエーエスエフ ソシエタス・ヨーロピアBasf Se Genetic engineering of Pseudomonas putida KT 2440 for rapid and high yield production of vanillin from ferulic acid
CN103805640B (en) * 2014-01-26 2016-04-06 东华大学 A kind of method utilizing bacterial oxidation pine uncle aldehyde to prepare forulic acid
EP3000888B1 (en) * 2014-09-29 2018-12-05 Symrise AG Process for converting ferulic acid into vanillin
FR3041655B1 (en) * 2015-09-29 2017-11-24 Lesaffre & Cie NEW BACTERIAL STRAINS FOR VANILLIN PRODUCTION
CN111019995B (en) 2019-12-31 2021-04-27 厦门欧米克生物科技有限公司 Method for producing vanillin by fermentation with eugenol as substrate

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997035999A2 (en) * 1996-03-23 1997-10-02 Institute Of Food Research Production of vanillin
EP0845532A2 (en) * 1996-11-29 1998-06-03 Haarmann & Reimer Gmbh Enzymes for the synthesis of coniferyl alcohol, coniferyl aldehyde, ferulic acid, vanillin, vanillic acid and their applications

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05227980A (en) * 1992-02-21 1993-09-07 Takasago Internatl Corp Production of vanillin and its related compound by fermentation
DE4227076A1 (en) * 1992-08-17 1994-02-24 Haarmann & Reimer Gmbh Process for the preparation of substituted methoxyphenols and microorganisms suitable therefor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GANNON M J ET AL, J. OF BIOL. CHEM, 1998, 273(7):4163-4170 *

Also Published As

Publication number Publication date
HUP0104772A2 (en) 2002-03-28
AU1041300A (en) 2000-05-22
WO2000026355A3 (en) 2000-11-09
PL348647A1 (en) 2002-06-03
HUP0104772A3 (en) 2003-10-28
EP1124947A2 (en) 2001-08-22
KR20020022045A (en) 2002-03-23
SK5742001A3 (en) 2001-12-03
WO2000026355A2 (en) 2000-05-11
JP2003533166A (en) 2003-11-11
CN1325444A (en) 2001-12-05
BR9914930A (en) 2001-07-10
DE19850242A1 (en) 2000-05-04
CA2348962A1 (en) 2000-05-11
HK1041902A1 (en) 2002-07-26
IL142272A0 (en) 2002-03-10

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)
PC Assignment registered

Owner name: SYMRISE GMBH AND CO. KG

Free format text: FORMER OWNER WAS: HAARMANN AND REIMER GMBH