AU728116B2 - Recombinant pinoresinol/lariciresinol reductase, recombinant dirigent protein, and methods of use - Google Patents

Publication number
AU728116B2
Authority
AU
Australia
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU51993/98A
Other versions
AU5199398A (en
Inventor
Laurence B. Davin
Albena T. Dinkova-Kostova
Masayuki Fujita
David R. Gang
Norman G. Lewis
Simo Sarkanen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Washington State University Research Foundation
Original Assignee
University of Washington
Washington State University Research Foundation
Application filed by University of Washington, Washington State University Research Foundation

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/0004 Oxidoreductases (1.)
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/415 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from plants
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P7/00 Preparation of oxygen-containing organic compounds
    • C12P7/02 Preparation of oxygen-containing organic compounds containing a hydroxy group
    • C12P7/22 Preparation of oxygen-containing organic compounds containing a hydroxy group aromatic

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Microbiology (AREA)
  • Medicinal Chemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Gastroenterology & Hepatology (AREA)
  • General Chemical & Material Sciences (AREA)
  • Botany (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biomedical Technology (AREA)
  • Preparation Of Compounds By Using Micro-Organisms (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Enzymes And Modification Thereof (AREA)
  • Peptides Or Proteins (AREA)

Description

WO 98/20113 PCT/US97/20391
RECOMBINANT PINORESINOL/LARICIRESINOL REDUCTASE, RECOMBINANT DIRIGENT PROTEIN, AND METHODS OF USE
Field of the Invention
The present invention relates to isolated dirigent proteins and pinoresinol/lariciresinol reductases from Forsythia intermedia, Tsuga heterophylla and Thuja plicata, to nucleic acid sequences which code for dirigent proteins and pinoresinol/lariciresinol reductases from Forsythia intermedia, Tsuga heterophylla and Thuja plicata, and to vectors containing the sequences, host cells containing the sequences and methods of producing recombinant pinoresinol/lariciresinol reductases, recombinant dirigent protein and their mutants.
Background of the Invention
Lignans are a large, structurally diverse class of vascular plant metabolites having a wide range of physiological functions and pharmacologically important properties (Ayres, and Loike, J.D. in Chemistry and Pharmacology of Natural Products. Lignans. Chemical, Biological and Clinical Properties, Cambridge University Press, Cambridge, England (1990); Lewis et al., in Chemistry of the Amazon, Biodiversity Natural Products, and Environmental Issues, 588, Seidl, O.R. Gottlieb and M.A.C. Kaplan) 135-167, ACS Symposium Series, Washington D.C. (1995)). Because of their pronounced antibiotic properties (Markkanen, T. et al., Drugs Exptl. Clin. Res. 7:711-718 (1981)), antioxidant properties (Fauré, M. et al., Phytochemistry 29:3773-3775 (1990); Osawa, T. et al., Agric. Biol. Chem. 49:3351-3352 (1985)) and antifeedant properties (Harmatha, J., and Nawrot, Biochem. Syst. Ecol. 12:95-98 (1984)), a major role of lignans in vascular plants is to help confer resistance against various opportunistic biological pathogens and predators. Lignans have also been proposed as cytokinins (Binns, A.N. et al., Proc. Natl. Acad. Sci. USA 84:980-984 (1987)) and as intermediates in lignification (Rahman, M.M.A. et al., Phytochemistry 29:1861-1866 (1990)), suggesting a critical role in plant growth and development. It is widely held that elaboration of biochemical pathways to lignins/lignans and related substances from phenylalanine (tyrosine) was essential for the successful transition of aquatic plants to their vascular dry-land counterparts (Lewis, and Davin, in Isoprenoids and Other Natural Products. Evolution and Function, 562 Nes, ed) 202-246, ACS Symposium Series: Washington, DC (1994)), some four hundred and eighty million years ago (Graham, Origin of Land Plants, John Wiley Sons, Inc., New York, NY (1993)).
Based on existing chemotaxonomic data, lignans are present in "primitive" plants, such as the fern Blechnum orientale (Wada, H. et al., Chem. Pharm. Bull. 40:2099-2101 (1992)) and the hornworts, Dendroceros japonicus and Megaceros flagellaris (Takeda, R. et al., in Bryophytes. Their Chemistry and Chemical Taxonomy, Vol. 29 (Zinsmeister, H.D. and Mues, R. eds) pp. 201-207, Oxford University Press: New York, NY (1990); Takeda, R. et al., Tetrahedron Lett. 31:4159-4162 (1990)), with the latter recently being classified as originating in the Silurian period (Graham, J. Plant Res. 109: 241-252 (1996)). Interestingly, evolution of both gymnosperms and angiosperms was accompanied by major changes in the structural complexity and oxidative modifications of the lignans (Lewis, and Davin, in Isoprenoids and Other Natural Products. Evolution and Function, 562 D. Nes, ed) 202-246, ACS Symposium Series: Washington, DC (1994); Gottlieb, and Yoshida, in Natural Products of Woody Plants. Chemicals Extraneous to the Lignocellulosic Cell Wall (Rowe, J.W. and Kirk, C.H. eds) pp. 439-511, Springer Verlag: Berlin (1989)). Indeed, in some species, such as Western Red Cedar (Thuja plicata), lignans can contribute extensively to heartwood formation/generation by enhancing the resulting heartwood color, quality, fragrance and durability.
In addition to their functions in plants, lignans also have important pharmacological roles. For example, podophyllotoxin, as its etoposide and teniposide derivatives, is a plant compound that has been successfully employed as an anticancer agent (Ayres, and Loike, J.D. in Chemistry and Pharmacology of Natural Products. Lignans. Chemical, Biological and Clinical Properties, Cambridge University Press, Cambridge, England (1990)). Antiviral properties have also been reported for selected lignans. For example, (-)-arctigenin (Schröder, H.C. et al., Z Naturforsch. 45c, 1215-1221 (1990)), (-)-trachelogenin (Schröder, H.C. et al., Z Naturforsch. 45c, 1215-1221 (1990)) and nordihydroguaiaretic acid (Gnabre, J.N. et al., Proc. Natl. Acad. Sci. USA 92:11239-11243 (1995)) are each effective against HIV due to their pronounced reverse transcriptase inhibitory activities. Some lignans, e.g., matairesinol (Nikaido, T. et al., Chem. Pharm. Bull. 29:3586-3592 (1981)), inhibit cAMP-phosphodiesterase, whereas others enhance cardiovascular activity, e.g., syringaresinol β-D-glucoside (Nishibe, S. et al., Chem. Pharm. Bull. 38:1763-1765 (1990)). There is also a high correlation between the presence, in the diet, of the "mammalian" lignans or "phytoestrogens", enterolactone and enterodiol, formed following digestion of high fiber diets, and reduced incidence rates of breast and prostate cancers (so-called chemoprevention) (Axelson, and Setchell, K.D.R., FEBS Lett. 123:337-342 (1981); Adlercreutz et al., J. Steroid Biochem. Molec. Biol. 41:3-8 (1992); Adlercreutz et al., J. Steroid Biochem. Molec. Biol. 52:97-103 (1995)). The "mammalian lignans," in turn, are considered to be derived from lignans such as matairesinol and secoisolariciresinol (Boriello et al., J. Applied Bacteriol., 58:37-43 (1985)).
The biosynthetic pathways to the lignans are only now being defined, although there are no prior art reports of the isolation of enzymes or genes involved in the lignan biosynthetic pathway. Based on radiolabeling experiments with crude enzyme extracts from Forsythia intermedia, it was first established that entry into the 8,8'-linked lignans, which represent the most prevalent dilignol linkage known (Davin, and Lewis, in Rec. Adv. Phytochemistry Vol. 26 (Stafford, H.A., and Ibrahim, eds), pp. 325-375, Plenum Press, New York, NY (1992)), occurs via stereoselective coupling of two achiral coniferyl alcohol molecules, in the form of oxygenated free radicals, to afford the furofuran lignan (+)-pinoresinol (Davin, L.B., Bedgar, Katayama, and Lewis, Phytochemistry 31:3869-3874 (1992); Paré, P.W. et al., Tetrahedron Lett. 35:4731-4734 (1994)) (FIGURE 1).
Bimolecular phenoxy radical coupling reactions, such as the stereoselective coupling of two achiral coniferyl alcohol molecules to afford the furofuran lignan (+)-pinoresinol, are involved in numerous biological processes. These are presumed to include lignin formation in vascular plants (Nose et al., Phytochemistry 39:71 (1995)), lignan formation in vascular plants (Lewis and L.B. Davin, ACS Symp. Ser. 562:202 (1994); P. W. Paré et al., Tetrahedron Lett. 35:4731 (1994)), suberin formation in vascular plants (Bernards et al., J. Biol. Chem. 270:7382 (1995)), fruiting body development in fungi (Bu'Lock et al., J. Chem. Soc. 2085 (1962)), insect cuticle melanization and sclerotization (Miessner et al., Helv. Chim. Acta 74:1205 (1991); V.J. Marmaras et al., Arch. Insect Biochem. Physiol. 31:119 (1996)), the formation of aphid pigments (Cameron and Lord Todd, in Organic Substances of Natural Origin. Oxidative Coupling of Phenols, W.I. Taylor and A.R. Battersby, Eds. (Dekker, New York, 1967), Vol. 1, p.203), and the formation of algal cell wall polymers (Ragan, Phytochemistry 23:2029 (1984)).
In contrast to the marked regiochemical and/or stereochemical specificities observed in the biosynthesis of the foregoing lignin and lignan substances in vivo, all previously described chemical (Iqbal et al., Chem. Rev. 94:519 (1994)) and enzymatic (Freudenberg, Science 148:595 (1965)) bimolecular phenoxy radical coupling reactions in vitro have lacked strict regio- and stereospecific control. That is, if chiral centers are introduced during coupling in vitro, the products are racemic, and different regiochemistries can result if more than one potential coupling site is present. Thus, the ability to generate a particular enantiomeric form or a specific coupling product in vitro is not under explicit control. Consequently, it is inferred that a mechanism exists in vivo to control the regiochemistry and stereochemistry of bimolecular phenoxy radical coupling reactions leading to the formation of, for example, lignans.
In Forsythia intermedia, and presumably other species, (+)-pinoresinol, the product of the stereospecific coupling of two E-coniferyl alcohol molecules, undergoes sequential reduction to generate (+)-lariciresinol and then (-)-secoisolariciresinol (Katayama, T. et al., Phytochemistry 32:581-591 (1993); Chu, A. et al., J. Biol. Chem. 268:27026-27033 (1993)) (FIGURE 1). While it has hitherto been unclear whether more than one reductase is required to catalyze the sequential steps, the reductions proceed via abstraction of the pro-R hydride of NADPH, resulting in an "inversion" of configuration at both the C-7 and C-7' positions of the products, (+)-lariciresinol and (-)-secoisolariciresinol (Chu, et al., J. Biol. Chem. 268:27026-27033 (1993)). (-)-Matairesinol is subsequently formed via dehydrogenation of (-)-secoisolariciresinol, further metabolism of which presumably affords lignans such as the antiviral (-)-trachelogenin in Ipomoea cairica and (-)-podophyllotoxin in Podophyllum peltatum.
Thus, the stereospecific formation of (+)-pinoresinol and the subsequent reductive steps giving (+)-lariciresinol and (-)-secoisolariciresinol are pivotal points in lignan metabolism, since they represent entry into the furano, dibenzylbutane, dibenzylbutyrolactone and aryltetrahydronaphthalene lignan subclasses.
Additionally, it should be noted that while lignans are normally optically active, the particular enantiomer present may differ between plant species. For example, (-)-pinoresinol occurs in Xanthoxylum ailanthoides (Ishii et al., Yakugaku Zasshi, 103:279-292 (1983)), and (-)-lariciresinol is present in Daphne tangutica (Lin-Gen, et al., Planta Medica, 45:172-176 (1982)). The optical activity of a particular lignan may have important ramifications regarding biological activity. For example, (-)-trachelogenin inhibits the in vitro replication of HIV-1, whereas its (+)-enantiomer is much less effective (Schröder et al., Z Naturforsch. 45c, 1215-1221 (1990)).
Summary of the Invention
In accordance with the foregoing, in one aspect of the invention it has now been discovered that a 78-kD dirigent protein is involved in conferring stereospecificity in 8,8'-linked lignan formation. This protein has no detectable catalytically active oxidative center and apparently serves only to bind and orient coniferyl alcohol-derived free radicals, which then undergo stereoselective coupling to form (+)-pinoresinol. The formation of free radicals, in the first instance, requires the oxidative capacity of either a nonspecific oxidase or even a non-enzymatic electron oxidant. In another aspect of the invention, it has been discovered that a single enzyme, designated pinoresinol/lariciresinol reductase, catalyzes the conversion of pinoresinol to lariciresinol and then to secoisolariciresinol. Thus, one aspect of the invention relates to isolated dirigent proteins and to isolated pinoresinol/lariciresinol reductases, such as, for example, those from Forsythia intermedia, Thuja plicata and Tsuga heterophylla.
In other aspects of the invention, cDNAs encoding dirigent protein from Forsythia intermedia (SEQ ID Nos:12 and 14), Thuja plicata (SEQ ID Nos:20, 22, 24, 26, 28, 30, 32 and 34) and Tsuga heterophylla (SEQ ID Nos:16 and 18) have been isolated and sequenced, and the corresponding amino acid sequences have been deduced. Also, cDNAs encoding pinoresinol/lariciresinol reductase from Forsythia intermedia (SEQ ID Nos:47, 49, 51, 53, 55 and 57), Thuja plicata (SEQ ID Nos:61, 63, 65 and 67) and Tsuga heterophylla (SEQ ID Nos:69 and 71) have been isolated and sequenced, and the corresponding amino acid sequences have been deduced.
Thus, the present invention relates to isolated proteins and to isolated DNA sequences which code for the expression of dirigent protein or pinoresinol/lariciresinol reductase. In other aspects, the present invention is directed to replicable recombinant cloning vehicles comprising a nucleic acid sequence which codes for a pinoresinol/lariciresinol reductase or for a dirigent protein. The present invention is also directed to a base sequence sufficiently complementary to at least a portion of a pinoresinol/lariciresinol reductase DNA or RNA, or to at least a portion of a dirigent protein DNA or RNA, to enable hybridization therewith. The aforesaid complementary base sequences include, but are not limited to: antisense pinoresinol/lariciresinol reductase RNA; antisense dirigent protein RNA; fragments of DNA that are complementary to a pinoresinol/lariciresinol reductase DNA, or to a dirigent protein DNA, and which are therefore useful as polymerase chain reaction primers, or as probes for pinoresinol/lariciresinol reductase genes, dirigent protein genes, or related genes.
In yet other aspects of the invention, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence of the invention. Thus, the present invention provides for the recombinant expression of pinoresinol/lariciresinol reductases and dirigent proteins in plants, animals, microbes and in cell cultures. The inventive concepts described herein may be used to facilitate the production, isolation and purification of significant quantities of recombinant pinoresinol/lariciresinol reductase or dirigent protein, or of their enzyme products, in plants, animals, microbes or cell cultures.
According to a first embodiment of the invention, there is provided an isolated protein from a lignan biosynthetic pathway selected from the group consisting of dirigent protein and pinoresinol/lariciresinol reductases, wherein when the isolated protein is a pinoresinol/lariciresinol reductase the isolated protein has an enzymatic activity of at least 51 nmol h⁻¹ mg⁻¹.
According to a second embodiment of the invention, there is provided an isolated nucleotide sequence encoding a dirigent protein.
According to a third embodiment of the invention, there is provided an isolated nucleotide sequence encoding a dirigent protein from a Forsythia species.
According to a fourth embodiment of the invention, there is provided an isolated nucleotide sequence encoding a protein having the biological activity of SEQ ID No:13 or SEQ
According to a fifth embodiment of the invention, there is provided an isolated nucleotide sequence encoding a dirigent protein from a Tsuga species.
According to a sixth embodiment of the invention, there is provided an isolated nucleotide sequence encoding a protein having the biological activity of SEQ ID No:17 or SEQ ID No:19.
According to a seventh embodiment of the invention, there is provided an isolated nucleotide sequence encoding a dirigent protein from a Thuja species.
According to an eighth embodiment of the invention, there is provided an isolated nucleotide sequence encoding a protein having the biological activity of any one of SEQ ID Nos:21, 23, 25, 27, 29, 31, 33 or
According to a ninth embodiment of the invention, there is provided an isolated nucleotide sequence encoding a pinoresinol/lariciresinol reductase from a Forsythia species.
According to a tenth embodiment of the invention, there is provided an isolated nucleotide sequence encoding a protein having the biological activity of any one of SEQ ID Nos:48, 50, 52, 54, 56 or 58.
According to an eleventh embodiment of the invention, there is provided an isolated nucleotide sequence encoding a pinoresinol/lariciresinol reductase from a Thuja species.
According to a twelfth embodiment of the invention, there is provided an isolated nucleotide sequence encoding a protein having the biological activity of any one of SEQ ID Nos:62, 64, 66 or 68.
According to a thirteenth embodiment of the invention, there is provided an isolated nucleotide sequence encoding a pinoresinol/lariciresinol reductase from a Tsuga species.
According to a fourteenth embodiment of the invention, there is provided an isolated nucleotide sequence encoding a protein having the biological activity of SEQ ID No:70 or SEQ ID No:72.
According to a fifteenth embodiment of the invention, there is provided a replicable expression vector comprising a nucleotide sequence encoding a protein having the biological activity of a dirigent protein selected from the group consisting of SEQ ID Nos:13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33 and
According to a sixteenth embodiment of the invention, there is provided a replicable expression vector comprising a nucleotide sequence encoding a protein having the biological activity of a pinoresinol/lariciresinol reductase selected from the group consisting of SEQ ID Nos:48, 50, 52, 54, 56, 58, 62, 64, 66, 68, 70 and 72.
According to a seventeenth embodiment of the invention, there is provided a host cell comprising a vector in accordance with the fifteenth embodiment of the present invention.
According to an eighteenth embodiment of the invention, there is provided a method of enhancing the expression of pinoresinol/lariciresinol reductase in a suitable host cell comprising introducing into the host cell an expression vector that comprises a nucleotide sequence encoding a protein having the biological activity of a protein selected from the group consisting of SEQ ID Nos:48, 50, 52, 54, 56, 58, 62, 64, 66, 68, and 72.
According to a nineteenth embodiment of the invention, there is provided a method of modifying the expression of pinoresinol/lariciresinol reductase in a suitable host cell comprising introducing into the host cell an expression vector that comprises a nucleotide sequence that expresses an RNA that is complementary to all or part of a nucleic acid molecule selected from the group consisting of SEQ ID Nos:47, 49, 51, 53, 55, 57, 61, 63, 65, 67, 69 and 71.
According to a twentieth embodiment of the invention, there is provided a method of enhancing the expression of dirigent protein in a suitable host cell comprising introducing into the host cell an expression vector that comprises a nucleotide sequence encoding a protein having the biological activity of a protein selected from the group consisting of SEQ ID Nos:13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33 and
According to a twenty-first embodiment of the invention, there is provided a method of modifying the expression of dirigent protein in a suitable host cell comprising introducing into the host cell an expression vector that comprises a nucleotide sequence that expresses an RNA that is complementary to all or part of a nucleic acid molecule selected from the group consisting of SEQ ID Nos:12, 14, 16, 18, 20, 22, 24, 26, 28, 32 and 34.
According to a twenty-second embodiment of the invention, there is provided a method of producing optically-pure lignans comprising introducing into a host cell an expression vector that comprises a nucleotide sequence encoding a dirigent protein capable of directing a bimolecular phenoxy coupling reaction to produce an optically pure lignan, and purifying the optically pure lignan from the host cell.
Brief Description of the Drawings
The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same becomes better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:
FIGURE 1 shows the stereospecific conversion of E-coniferyl alcohol to (+)-pinoresinol in Forsythia intermedia. The stereoselectivity of this reaction is controlled by dirigent protein. (+)-Pinoresinol is then sequentially converted to (+)-lariciresinol and (-)-secoisolariciresinol by (+)-pinoresinol/(+)-lariciresinol reductase. (+)-Pinoresinol, (+)-lariciresinol and (-)-secoisolariciresinol are the precursors of the furofuran, furano and dibenzylbutane families of lignans, respectively.
Detailed Description of the Preferred Embodiment
As used herein, the terms "amino acid" and "amino acids" refer to all naturally occurring L-α-amino acids or their residues. The amino acids are identified by either the single-letter or three-letter designations:
Asp  D  aspartic acid      Ile  I  isoleucine
Thr  T  threonine          Leu  L  leucine
Ser  S  serine             Tyr  Y  tyrosine
Glu  E  glutamic acid      Phe  F  phenylalanine
Pro  P  proline            His  H  histidine
Gly  G  glycine            Lys  K  lysine
Ala  A  alanine            Arg  R  arginine
Cys  C  cysteine           Trp  W  tryptophan
Val  V  valine             Gln  Q  glutamine
Met  M  methionine         Asn  N  asparagine
As used herein, the term "nucleotide" means a monomeric unit of DNA or RNA containing a sugar moiety (pentose), a phosphate and a nitrogenous heterocyclic base. The base is linked to the sugar moiety via the glycosidic carbon (1' carbon of pentose) and that combination of base and sugar is called a nucleoside.
The base characterizes the nucleotide, with the four bases of DNA being adenine (A), guanine (G), cytosine (C) and thymine (T). Inosine (I) is a synthetic base that can be used to substitute for any of the four naturally-occurring bases (A, C, G or T). The four RNA bases are A, G, C and uracil (U). The nucleotide sequences described herein comprise a linear array of nucleotides connected by phosphodiester bonds between the 3' and 5' carbons of adjacent pentoses.
The term "percent identity" means the percentage of amino acids or nucleotides that occupy the same relative position when two amino acid sequences, or two nucleic acid sequences, are aligned side by side.
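As an illustration of the definition above, the following minimal Python sketch computes percent identity for two sequences that have already been aligned. The function name `percent_identity`, the use of `-` as the gap character, and the choice to count every non-empty alignment column in the denominator are assumptions made for this example; the specification does not prescribe a particular convention.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns in which both sequences carry the same residue.

    Assumes the two sequences were aligned beforehand and padded with `gap`
    characters to equal length (a hypothetical convention for this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # a column that is empty in both sequences carries no information
        compared += 1
        if a == b:
            matches += 1  # a == b here implies neither is a gap
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


if __name__ == "__main__":
    print(percent_identity("MKTLL", "MKSLL"))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values for the same alignment, so the convention should be stated whenever a percent identity is reported.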
The term "percent similarity" is a statistical measure of the degree of relatedness of two compared protein sequences. The percent similarity is calculated by a computer program that assigns a numerical value to each compared pair of amino acids based on chemical similarity (e.g., whether the compared amino acids are acidic, basic, hydrophobic, aromatic, etc.) and/or evolutionary distance as measured by the minimum number of base pair changes that would be required to convert a codon encoding one member of a pair of compared amino acids to a codon encoding the other member of the pair. Calculations are made after a best fit alignment of the two sequences has been made empirically by iterative comparison of all possible alignments (Henikoff, S. and Henikoff, Proc. Nat'l Acad Sci USA 89:10915-10919 (1992)).
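The chemical-similarity part of this definition can be sketched as follows. This is a deliberately simplified stand-in, not the substitution-matrix scoring of the cited Henikoff reference: the residue classes in `RESIDUE_CLASS` are an illustrative grouping chosen for this example, and the function name `percent_similarity` is hypothetical. Two aligned residues count as similar when they fall in the same class.

```python
# Illustrative residue classes (an assumption for this sketch, not taken from the source).
_CLASSES = {
    "acidic": "DE",
    "basic": "KRH",
    "aromatic": "FWY",
    "hydrophobic": "AVLIM",
    "polar": "STNQ",
    "special": "CGP",
}
RESIDUE_CLASS = {aa: name for name, members in _CLASSES.items() for aa in members}


def percent_similarity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of gap-free alignment columns whose residues share a class."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # only score columns where both sequences have a residue
        compared += 1
        if RESIDUE_CLASS[a] == RESIDUE_CLASS[b]:
            similar += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * similar / compared


if __name__ == "__main__":
    print(percent_similarity("DKLF", "ERIY"))
```

In the example call, no residue is identical between the two sequences, yet every pair shares a class, which is the distinction between similarity and identity that the definitions draw.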
"Oligonucleotide" refers to short length single or double stranded sequences of deoxyribonucleotides linked via phosphodiester bonds. The oligonucleotides are chemically synthesized by known methods and purified, for example, on polyacrylamide gels.
The term "pinoresinol/lariciresinol reductase" is used herein to mean an enzyme capable of catalyzing two reduction reactions: the reduction of pinoresinol to lariciresinol, and the reduction of lariciresinol to secoisolariciresinol. The products of these reactions, lariciresinol and secoisolariciresinol, can be either the (+)- or (-)-enantiomers.
The term "dirigent protein" is used herein to mean a protein capable of guiding a bimolecular phenoxy radical coupling reaction thereby determining the stereochemistry and regiochemistry of the product of the reaction and/or its polymeric derivatives.
The terms "alteration", "amino acid sequence alteration", "variant" and "amino acid sequence variant" refer to dirigent protein or pinoresinol/lariciresinol reductase molecules with some differences in their amino acid sequences as compared to the corresponding native dirigent protein or pinoresinol/lariciresinol reductase. Ordinarily, the variants will possess at least about 70% homology with the corresponding, native dirigent protein or pinoresinol/lariciresinol reductase, and preferably they will be at least about 80% homologous with the corresponding, native dirigent protein or pinoresinol/lariciresinol reductase. The amino acid sequence variants of dirigent protein or pinoresinol/lariciresinol reductase falling within this invention possess substitutions, deletions, and/or insertions at certain positions.
Sequence variants of dirigent protein or pinoresinol/lariciresinol reductase may be used to attain desired enhanced or reduced enzymatic activity, modified regiochemistry or stereochemistry, or altered substrate utilization or product distribution.
Substitutional dirigent protein variants or pinoresinol/lariciresinol reductase variants are those that have at least one amino acid residue in the corresponding native dirigent protein sequence or pinoresinol/lariciresinol reductase sequence removed and a different amino acid inserted in its place at the same position. The substitutions may be single, where only one amino acid in the molecule has been substituted, or they may be multiple, where two or more amino acids have been substituted in the same molecule. Substantial changes in the activity of the dirigent protein or pinoresinol/lariciresinol reductase molecule may be obtained by substituting an amino acid with a side chain that is significantly different in charge and/or structure from that of the native amino acid. This type of substitution would be expected to affect the structure of the polypeptide backbone and/or the charge or hydrophobicity of the molecule in the area of the substitution.
Moderate changes in the activity of the dirigent protein or pinoresinol/lariciresinol reductase molecule would be expected by substituting an amino acid with a side chain that is similar in charge and/or structure to that of the native molecule. This type of substitution, referred to as a conservative substitution, would not be expected to substantially alter either the structure of the polypeptide backbone or the charge or hydrophobicity of the molecule in the area of the substitution.
Insertional dirigent protein variants or pinoresinol/lariciresinol reductase variants are those with one or more amino acids inserted immediately adjacent to an amino acid at a particular position in the native dirigent protein or pinoresinol/lariciresinol reductase molecule. Immediately adjacent to an amino acid means connected to either the α-carboxy or α-amino functional group of the amino acid. The insertion may be one or more amino acids. Ordinarily, the insertion will consist of one or two conservative amino acids. Amino acids similar in charge and/or structure to the amino acids adjacent to the site of insertion are defined as conservative. Alternatively, this invention includes insertion of an amino acid with a charge and/or structure that is substantially different from the amino acids adjacent to the site of insertion.
Deletional variants are those where one or more amino acids in the native dirigent protein or pinoresinol/lariciresinol reductase molecule have been removed. Ordinarily, deletional variants will have one or two amino acids deleted in a particular region of the dirigent protein or pinoresinol/lariciresinol reductase molecule.
The term "antisense" or "antisense RNA" or "antisense nucleic acid" is used herein to mean a nucleic acid molecule that is complementary to all or part of a messenger RNA molecule. Antisense nucleic acid molecules are typically used to inhibit the expression, in vivo, of complementary, expressed messenger RNA molecules.
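Complementarity to a messenger RNA can be made concrete with a short sketch. The function name `antisense_rna` and the example sequence are hypothetical and are not taken from any SEQ ID in this specification; the code simply reverses the mRNA and applies Watson-Crick pairing (A with U, G with C) so that both strands are written 5' to 3'.

```python
# Watson-Crick pairing for RNA bases.
_RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_rna(mrna: str) -> str:
    """Return the RNA strand complementary to `mrna`, written 5' to 3'.

    `mrna` is read 5' to 3'; the complementary strand pairs with it in the
    antiparallel orientation, hence the reversal before complementing.
    """
    try:
        return "".join(_RNA_COMPLEMENT[base] for base in reversed(mrna.upper()))
    except KeyError as exc:
        raise ValueError(f"not an RNA base: {exc.args[0]!r}") from None


if __name__ == "__main__":
    # Hypothetical six-base fragment, for illustration only.
    print(antisense_rna("AUGGCC"))
```

Applying the function twice returns the original sequence, which is a quick sanity check that the pairing table and the reversal are consistent.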
The terms "biological activity", "biologically active", "activity" and "active" when used with reference to a pinoresinol/lariciresinol reductase molecule refer to the ability of the pinoresinol/lariciresinol reductase molecule to reduce pinoresinol and lariciresinol to yield lariciresinol and secoisolariciresinol, respectively, as measured in an enzyme activity assay, such as the assay described in Example 8 below.
The terms "biological activity", "biologically active", "activity" and "active" when used with reference to a dirigent protein refer to the ability of the dirigent protein to guide a bimolecular phenoxy radical coupling reaction thereby determining the stereochemistry and regiochemistry of the product of the reaction and of its polymeric derivatives.
Amino acid sequence variants of dirigent protein or pinoresinol/lariciresinol reductase may have desirable altered biological activity including, for example, altered reaction kinetics, substrate utilization, product distribution or other characteristics such as regiochemistry and stereochemistry.
The terms "DNA sequence encoding", "DNA encoding" and "nucleic acid encoding" refer to the order or sequence of deoxyribonucleotides along a strand of deoxyribonucleic acid. The order of these deoxyribonucleotides determines the order of amino acids along the translated polypeptide chain. The DNA sequence thus codes for the amino acid sequence.
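How the order of deoxyribonucleotides determines the order of amino acids can be sketched with the standard genetic code. The DNA string below is hypothetical, and `translate` is a name introduced here for illustration; the sketch reads the coding strand from its first base and stops at the first stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(coding_strand):
    """Translate a coding-strand DNA sequence up to the first stop codon."""
    dna = coding_strand.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCTAAATAA"))  # hypothetical open reading frame
```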
The terms "replicable expression vector" and "expression vector" refer to a piece of DNA, usually double-stranded, which may have inserted into it a piece of foreign DNA. Foreign DNA is defined as heterologous DNA, which is DNA not naturally found in the host. The vector is used to transport the foreign or heterologous DNA into a suitable host cell. Once in the host cell, the vector can replicate independently of or coincidentally with the host chromosomal DNA, and several copies of the vector and its inserted (foreign) DNA may be generated. In addition, the vector contains the necessary elements that permit translating the foreign DNA into a polypeptide. Many molecules of the polypeptide encoded by the foreign DNA can thus be rapidly synthesized.
The terms "transformed host cell," "transformed" and "transformation" refer to the introduction of DNA into a cell. The cell is termed a "host cell", and it may be a prokaryotic or a eukaryotic cell. Typical prokaryotic host cells include various strains of E. coli. Typical eukaryotic host cells include plant cells (such as maize cells), yeast cells, insect cells and animal cells. The introduced DNA is usually in the form of a vector containing an inserted piece of DNA. The introduced DNA sequence may be from the same species as the host cell or from a different species from the host cell, or it may be a hybrid DNA sequence, containing some foreign DNA and some DNA derived from the host species.
In accordance with the present invention, cDNAs encoding dirigent protein and pinoresinol/lariciresinol reductase from Forsythia intermedia, Thuja plicata and Tsuga heterophylla were isolated, sequenced and expressed in the following manner.
With respect to the cDNAs encoding dirigent protein from Forsythia intermedia, an empirically-determined purification protocol was developed to isolate the Forsythia dirigent protein. This procedure yielded at least six isoforms of the dirigent protein. Amino acid sequencing of the amino terminus of each of these isoforms revealed that the sequence of each isoform was identical. Sequencing of the N-terminus of a mixture of these isoforms yielded a 28 amino acid sequence (SEQ ID No:1). Tryptic digestion of a mixture of these isoforms yielded six peptide fragments which were purified in sufficient quantity to permit sequencing (SEQ ID Nos:2-7).
A primer designated PSINT1 (SEQ ID No:8) was synthesized based on the sequence of amino acids 9 to 15 of the N-terminal peptide (SEQ ID No:1). A primer designated PSI1R (SEQ ID No:9) was synthesized based on the sequence of amino acids 3 to 9 of the internal peptide sequence set forth in SEQ ID No:2. A primer designated PSI2R (SEQ ID No:10) was synthesized based on the sequence of amino acids 13 to 20 of the internal peptide sequence set forth in SEQ ID No:2. A primer designated PSI7R (SEQ ID No:11) was synthesized based on the sequence of amino acids 6 to 12 of the internal peptide sequence set forth in SEQ ID No:3.
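Designing a primer from a stretch of peptide sequence involves choosing a window of residues and accounting for codon degeneracy. The following minimal sketch counts how many distinct nucleotide sequences could encode a peptide window under the standard genetic code. The peptides used are hypothetical (the sequences of the SEQ ID entries are not reproduced here), and `window` and `degeneracy` are names introduced for illustration.

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Group codons by the amino acid they encode.
CODONS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(codon))


def window(peptide, first, last):
    """Return residues first..last of a peptide (1-based, inclusive)."""
    return peptide[first - 1:last]


def degeneracy(peptide):
    """Number of distinct nucleotide sequences that encode the peptide."""
    return prod(len(CODONS[aa]) for aa in peptide)


# Hypothetical seven-residue window, e.g. "amino acids 9 to 15" of a peptide.
print(degeneracy("MDWFEKN"))
```

A window rich in Met and Trp (one codon each) gives a low degeneracy, whereas Leu, Ser and Arg (six codons each) raise it sharply, which is one common reason for preferring one window of a peptide over another.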
Forsythia total RNA was isolated by means of a protocol adapted from a method specifically designed for woody tissues which contain a large concentration of polyphenols. Poly A+ RNA was isolated and a cDNA library constructed using standard means. PCR reactions utilizing primer PSINT1 (SEQ ID No:8) and one of PSI7R (SEQ ID No:11), PSI2R (SEQ ID No:10) or PSI1R (SEQ ID No:9), together with an aliquot of Forsythia cDNA as substrate, each yielded a single cDNA band, of ~370 bp, ~155 bp and ~125 bp, respectively. The ~370 bp product of the PSINT1 (SEQ ID No:8)-PSI7R (SEQ ID No:11) reaction was amplified by PCR and utilized as a probe to screen approximately 600,000 PFU of a Forsythia intermedia cDNA library. Two distinct cDNAs were identified, called pPSDFi1 (SEQ ID No:12) and pPSDFi2 (SEQ ID No:14). The cDNA insert encoding dirigent protein was excised from plasmid pPSDFi1 and cloned into the baculovirus transfer vector pBlueBac4. The resulting construct was used to transform Spodoptera frugiperda from which functional dirigent protein was purified.
With respect to the cloning of dirigent protein from Thuja plicata and Tsuga heterophylla, the Forsythia cDNAs were used as probes to isolate two dirigent protein clones from Tsuga heterophylla (SEQ ID Nos:16, 18), and eight dirigent protein cDNA clones from Thuja plicata (SEQ ID Nos:20, 22, 24, 26, 28, 30, 32, 34).
With respect to the cDNAs encoding (+)-pinoresinol/(+)-lariciresinol reductase from Forsythia intermedia, an empirically-determined purification protocol, consisting of eight chromatographic steps, was developed to isolate the Forsythia (+)-pinoresinol/(+)-lariciresinol reductase protein. This procedure yielded two isoforms of (+)-pinoresinol/(+)-lariciresinol reductase which were both capable of catalyzing the reduction of (+)-pinoresinol and (+)-lariciresinol. Sequencing of the N-terminus of each of these isoforms yielded an identical 30 amino acid sequence (SEQ ID No:36). Tryptic digestion of a mixture of both of these isoforms yielded four peptide fragments which were purified in sufficient quantity to permit sequencing (SEQ ID Nos:37-40). Additionally, cyanogen bromide cleavage of a mixture of both of these isoforms yielded three peptide fragments which were purified in sufficient quantity to permit sequencing (SEQ ID Nos:41-43).
A primer designated PLRN5 (SEQ ID No:44) was synthesized based on the sequence of amino acids 7 to 13 of the N-terminal peptide (SEQ ID No:36). A primer designated PLR14R (SEQ ID No:45) was synthesized based on the sequence of amino acids 2 to 8 of the internal peptide sequence set forth in SEQ ID No:37. A primer designated PLR15R (SEQ ID No:46) was synthesized based on the sequence of amino acids 9 to 15 of the internal peptide sequence set forth in SEQ ID No:37.
The sequence of amino acids 9 to 15 of the internal peptide sequence set forth in SEQ ID No:37, upon which the sequence of primer PLR15R (SEQ ID No:46) was based, also corresponded to the sequence of amino acids 4 to 10 of the cyanogen bromide-generated, internal fragment set forth in SEQ ID No:41.
Forsythia total RNA was isolated by means of a protocol adapted from a method specifically designed for woody tissues which contain a large concentration of polyphenols. Poly A+ RNA was isolated and a cDNA library constructed using standard means. A PCR reaction utilizing primers PLRN5 (SEQ ID No:44) and either PLR14R (SEQ ID No:45) or PLR15R (SEQ ID No:46), together with an aliquot of Forsythia cDNA as substrate, yielded two amplified bands of 380 bp and 400 bp. One 400 bp cDNA insert was utilized as a probe with which to screen the Forsythia cDNA library. The 400 bp probe corresponded to bases 22 to 423 of SEQ ID No:47. Six cDNA clones were isolated and sequenced (SEQ ID Nos:47, 49, 51, 53, 55, 57). The clones shared a common coding region, many had a different region and the 3'-untranslated region of each terminated at a different point. One of these cDNAs (SEQ ID No:47), expressed as a β-galactosidase fusion protein in E. coli, catalyzed the same enantiomer-specific reactions as the native plant protein.
With respect to the cloning of (+)-pinoresinol/(+)-lariciresinol reductase and (-)-pinoresinol/(-)-lariciresinol reductase from Thuja plicata, cDNA was synthesized and utilized as a template in a PCR reaction in which the primers were a 3' linker-primer (SEQ ID No:59) and a 5' primer, designated CR6-NT (SEQ ID No:60). At least two bands of the expected length (1.2 kb) were generated and cloned into a plasmid vector. One clone, designated plr-Tp1 (SEQ ID No:61), was completely sequenced and expressed as a β-galactosidase fusion protein in E. coli. plr-Tp1 encodes a (-)-pinoresinol/(-)-lariciresinol reductase.
The cDNA insert of clone plr-Tp1 was used to screen the T. plicata cDNA library and identified an additional, unique clone, designated plr-Tp2 (SEQ ID No:63). plr-Tp2 has high homology to plr-Tp1 but encodes a (+)-pinoresinol/(+)-lariciresinol reductase. The cDNA insert of clone plr-Tp1 was used to screen the T. plicata cDNA library and identify an additional two pinoresinol/lariciresinol reductase cDNAs (SEQ ID Nos:65, 67).
Two cDNAs encoding pinoresinol/lariciresinol reductases from Tsuga heterophylla (SEQ ID Nos:69, 71) were isolated by screening a Tsuga heterophylla cDNA library with the plr-Tp1 cDNA insert.
The isolation of cDNAs encoding dirigent proteins, (+)-pinoresinol/(+)-lariciresinol reductase and (-)-pinoresinol/(-)-lariciresinol reductase permits the development of an efficient expression system for these functional enzymes, provides useful tools for examining the developmental regulation of lignan biosynthesis, and permits the isolation of other dirigent proteins and pinoresinol/lariciresinol reductases. The isolation of the dirigent protein and pinoresinol/lariciresinol reductase cDNAs also permits the transformation of a wide range of organisms in order to enhance or modify lignan biosynthesis.
The proteins and nucleic acids of the present invention can be utilized to predetermine the stereochemistry, regiochemistry, or both, of the products of bimolecular phenoxy coupling reactions, such as the furofuran, furano and dibenzylbutane lignans. By way of non-limiting examples, the proteins and nucleic acids of the present invention can be utilized to: elevate or otherwise alter the levels of health-protecting lignans, such as podophyllotoxin, in plant species, including but not limited to vegetables, grains and fruits, and in food items incorporating material derived from such genetically altered plants; genetically alter plant species to provide an abundant, natural supply of lignans useful for a variety of purposes, for example as nutraceuticals and dietary supplements; and genetically alter living organisms to produce an abundant supply of optically pure lignans having desirable biological properties, for example (-)-arctigenin, which possesses antiviral properties. In particular, characterization of the dirigent protein binding site and mechanism of action permits the development of synthetic proteins consisting of an array of dirigent protein binding sites which serve as templates for stereochemically controlled polymeric assembly.
N-terminal transport sequences well known in the art (see, e.g., von Heijne, G. et al., Eur. J. Biochem. 180:535-545 (1989); Stryer, Biochemistry, W.H. Freeman and Company, New York, NY, p. 769 (1988)) may be employed to direct the dirigent protein or pinoresinol/lariciresinol reductase to a variety of cellular or extracellular locations.
Sequence variants of wild-type dirigent protein clones and pinoresinol/lariciresinol clones that can be produced by deletions, substitutions, mutations and/or insertions are intended to be within the scope of the invention except insofar as limited by the prior art. Dirigent protein or pinoresinol/lariciresinol reductase amino acid sequence variants may be constructed by mutating the DNA sequence that encodes wild-type dirigent protein or wild-type pinoresinol/lariciresinol reductase, such as by using techniques commonly referred to as site-directed mutagenesis.
Various polymerase chain reaction (PCR) methods now well known in the field, such as a two primer system like the Transformer Site-Directed Mutagenesis kit from Clontech, may be employed for this purpose.
Following denaturation of the target plasmid in this system, two primers are simultaneously annealed to the plasmid; one of these primers contains the desired site-directed mutation, the other contains a mutation at another point in the plasmid resulting in elimination of a restriction site. Second strand synthesis is then carried out, tightly linking these two mutations, and the resulting plasmids are transformed into a mutS strain of E. coli. Plasmid DNA is isolated from the transformed bacteria, restricted with the relevant restriction enzyme (thereby linearizing the unmutated plasmids), and then retransformed into E. coli. This system allows for generation of mutations directly in an expression plasmid, without the necessity of subcloning or generation of single-stranded phagemids. The tight linkage of the two mutations and the subsequent linearization of unmutated plasmids results in high mutation efficiency and allows minimal screening. Following synthesis of the initial restriction site primer, this method requires the use of only one new primer type per mutation site. Rather than prepare each positional mutant separately, a set of "designed degenerate" oligonucleotide primers can be synthesized in order to introduce all of the desired mutations at a given site simultaneously. Transformants can be screened by sequencing the plasmid DNA through the mutagenized region to identify and sort mutant clones. Each mutant DNA can then be restricted and analyzed by electrophoresis on Mutation Detection Enhancement gel (J.T. Baker) to confirm that no other alterations in the sequence have occurred (by band shift comparison to the unmutagenized control).
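The selection step of the two-primer system, in which only plasmids that have lost the restriction site escape linearization, can be sketched as follows. This is a simplified illustration and not the kit's protocol: the plasmid sequences are hypothetical ten-base strings, the EcoRI recognition sequence GAATTC is used only as an example site, and the function names are assumptions.

```python
def has_site(circular_seq, site):
    """Check a circular sequence for a site, including across the junction."""
    s = circular_seq.upper()
    # Append the first len(site) - 1 bases so matches spanning the
    # start/end junction of the circular molecule are found.
    return site.upper() in s + s[:len(site) - 1]


def select_mutants(plasmids, site):
    """Return names of plasmids that lack the site and so stay circular."""
    return [name for name, seq in plasmids.items() if not has_site(seq, site)]


plasmids = {  # hypothetical sequences, for illustration only
    "parent": "AAGAATTCGG",    # carries the site, would be linearized
    "mutant": "AAGAGTTCGG",    # site eliminated by the selection primer
    "wrapped": "TTCGGAAGAA",   # site spans the circular junction
}
print(select_mutants(plasmids, "GAATTC"))
```

Because the selection mutation and the desired mutation are introduced on the same newly synthesized strand, a plasmid that passes this filter is expected, though not guaranteed, to carry the desired mutation as well, which is why sequencing of the mutagenized region is still described.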
The verified mutant duplexes can be cloned into a replicable expression vector, if not already cloned into a vector of this type, and the resulting expression construct used to transform E. coli, such as strain E. coli BL21(DE3)pLysS, for high level production of the mutant protein, and subsequent purification thereof. The method of FAB-MS mapping can be employed to rapidly check the fidelity of mutant expression. This technique provides for sequencing segments throughout the whole protein and provides the necessary confidence in the sequence assignment. In a mapping experiment of this type, protein is digested with a protease (the choice will depend on the specific region to be modified since this segment is of prime interest and the remaining map should be identical to the map of unmutagenized protein).
The set of cleavage fragments is fractionated by microbore HPLC (reversed phase or ion exchange, again depending on the specific region to be modified) to provide several peptides in each fraction, and the molecular weights of the peptides are determined by FAB-MS. The masses are then compared to the molecular weights of peptides expected from the digestion of the predicted sequence, and the correctness of the sequence quickly ascertained. Since this mutagenesis approach to protein modification is directed, sequencing of the altered peptide should not be necessary if the MS agrees with prediction. If necessary to verify a changed residue, CAD-tandem MS/MS can be employed to sequence the peptides of the mixture in question, or the target peptide purified for subtractive Edman degradation or carboxypeptidase Y digestion depending on the location of the modification.
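The comparison of measured masses with the masses expected from the predicted sequence can be sketched as an in-silico digest. This is a minimal illustration under stated assumptions: trypsin is chosen as the protease and modeled with the simplified rule of cleavage after Lys or Arg unless followed by Pro, masses are neutral monoisotopic masses rather than the ion masses an instrument reports, and the protein sequences and function names are hypothetical.

```python
import re

# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (simplified rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mass_map(protein):
    """List of (peptide, mass) pairs for the predicted digest."""
    return [(p, round(peptide_mass(p), 3)) for p in tryptic_digest(protein)]


def altered_peptides(native, mutant):
    """Entries of the mutant map that are absent from the native map."""
    native_map = set(mass_map(native))
    return [entry for entry in mass_map(mutant) if entry not in native_map]


# Hypothetical native and Glu-to-Ala mutant sequences.
print(altered_peptides("GAEKLLR", "GAAKLLR"))
```

Only the peptide containing the substituted residue should differ between the two maps; the remaining entries serve as the check that no other part of the sequence has changed.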
In the design of a particular site directed mutant, it is generally desirable to first make a non-conservative substitution (e.g., Ala for Cys, His or Glu) and determine if activity is greatly impaired as a consequence. The properties of the mutagenized protein are then examined with particular attention to the kinetic parameters of Km and kcat as sensitive indicators of altered function, from which changes in binding and/or catalysis per se may be deduced by comparison to the native enzyme. If the residue is by this means demonstrated to be important by activity impairment, or knockout, then conservative substitutions can be made, such as Asp for Glu to alter side chain length, Ser for Cys, or Arg for His. For hydrophobic segments, it is largely size that will be altered, although aromatics can also be substituted for alkyl side chains. Changes in the normal product distribution can indicate which step(s) of the reaction sequence have been altered by the mutation.
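The roles of Km and kcat as indicators of altered function can be made concrete with the Michaelis-Menten rate law, v = kcat[E][S]/(Km + [S]). The sketch below assumes that the enzyme follows simple Michaelis-Menten kinetics, which is not stated in the text, and all numerical values are hypothetical.

```python
def initial_rate(kcat, enzyme_conc, substrate_conc, km):
    """Michaelis-Menten initial rate: v = kcat*[E]*[S] / (Km + [S])."""
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


def catalytic_efficiency(kcat, km):
    """kcat/Km, a common single-number comparison of enzyme variants."""
    return kcat / km


# Hypothetical parameters for a native enzyme and a mutant.
native = {"kcat": 10.0, "km": 5.0}
mutant = {"kcat": 1.0, "km": 5.0}

ratio = (catalytic_efficiency(**mutant) / catalytic_efficiency(**native))
print(ratio)  # fraction of native catalytic efficiency retained
```

In this framing, a mutant with unchanged Km but reduced kcat suggests impaired catalysis rather than impaired binding, whereas a raised Km with unchanged kcat points toward binding, subject to the usual caveats about interpreting Km.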
Other site directed mutagenesis techniques may also be employed with the nucleotide sequences of the invention. For example, restriction endonuclease digestion of DNA followed by ligation may be used to generate dirigent protein or pinoresinol/lariciresinol reductase deletion variants, as described in Section 15.3 of Sambrook et al. (Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, New York, NY (1989)). A similar strategy may be used to construct insertion variants, as described in Section 15.3 of Sambrook et al., supra.
Oligonucleotide-directed mutagenesis may also be employed for preparing substitution variants of this invention. It may also be used to conveniently prepare the deletion and insertion variants of this invention. This technique is well known in the art as described by Adelman et al. (DNA 2:183 (1983)). Generally, oligonucleotides of at least 25 nucleotides in length are used to insert, delete or substitute two or more nucleotides in the dirigent protein gene or pinoresinol/lariciresinol reductase gene. An optimal oligonucleotide will have 12 to 15 perfectly matched nucleotides on either side of the nucleotides coding for the mutation. To mutagenize the wild-type dirigent protein or wild-type pinoresinol/lariciresinol reductase, the oligonucleotide is annealed to the single-stranded DNA template molecule under suitable hybridization conditions. A DNA polymerizing enzyme, usually the Klenow fragment of E. coli DNA polymerase I, is then added. This enzyme uses the oligonucleotide as a primer to complete the synthesis of the mutation-bearing strand of DNA. Thus, a heteroduplex molecule is formed such that one strand of DNA encodes the wild-type dirigent protein or pinoresinol/lariciresinol reductase inserted in the vector, and the second strand of DNA encodes the mutated form of dirigent protein or pinoresinol/lariciresinol reductase inserted into the same vector. This heteroduplex molecule is then transformed into a suitable host cell.
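The oligonucleotide layout described above, a mutated codon flanked on each side by 12 to 15 perfectly matched nucleotides, can be sketched as follows. This is an illustration under stated assumptions: the template is a hypothetical coding-strand sequence whose reading frame starts at its first base, the oligonucleotide is written in the sense of the coding strand (its reverse complement would be used if the single-stranded template carries the coding strand), and `design_mutagenic_oligo` is a name introduced here.

```python
def design_mutagenic_oligo(template, codon_number, new_codon, flank=12):
    """Build an oligo: flank + mutated codon + flank (coding-strand sense).

    codon_number is 1-based and counted from the first base of template.
    """
    if len(new_codon) != 3:
        raise ValueError("new_codon must be three nucleotides")
    start = (codon_number - 1) * 3
    end = start + 3
    if start < flank or end + flank > len(template):
        raise ValueError("not enough template sequence for the flanks")
    return template[start - flank:start] + new_codon + template[end:end + flank]


# Hypothetical template: codon 5 is GAA (Glu); replace it with GCT (Ala).
template = "A" * 12 + "GAA" + "C" * 12
oligo = design_mutagenic_oligo(template, 5, "GCT")
print(oligo, len(oligo))
```

With the default flank of 12, the oligonucleotide is 27 nucleotides long, which satisfies the stated minimum of 25; a flank of 15 gives 33.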
Mutants with more than one amino acid substituted may be generated in one of several ways. If the amino acids are located close together in the polypeptide chain, they may be mutated simultaneously using one oligonucleotide that codes for all of the desired amino acid substitutions. If however, the amino acids are located some distance from each other (separated by more than ten amino acids, for example) it is more difficult to generate a single oligonucleotide that encodes all of the desired changes. Instead, one of two alternative methods may be employed. In the first method, a separate oligonucleotide is generated for each amino acid to be substituted.
The oligonucleotides are then annealed to the single-stranded template DNA simultaneously, and the second strand of DNA that is synthesized from the template will encode all of the desired amino acid substitutions.
An alternative method involves two or more rounds of mutagenesis to produce the desired mutant. The first round is as described for the single mutants: wild-type dirigent protein or pinoresinol/lariciresinol reductase DNA is used for the template, an oligonucleotide encoding the first desired amino acid substitution(s) is annealed to this template, and the heteroduplex DNA molecule is then generated.
The second round of mutagenesis utilizes the mutated DNA produced in the first round of mutagenesis as the template. Thus, this template already contains one or more mutations. The oligonucleotide encoding the additional desired amino acid substitution(s) is then annealed to this template, and the resulting strand of DNA now encodes mutations from both the first and second rounds of mutagenesis. This resultant DNA can be used as a template in a third round of mutagenesis, and so on.
Eukaryotic expression systems may be utilized for dirigent protein or pinoresinol/lariciresinol reductase production since they are capable of carrying out any required posttranslational modifications and of directing the enzyme to the proper membrane location. A representative eukaryotic expression system for this purpose uses the recombinant baculovirus, Autographa californica nuclear polyhedrosis virus (AcNPV; M.D. Summers and G.E. Smith, A Manual of Methods for Baculovirus Vectors and Insect Cell Culture Procedures (1986); Luckow et al., Bio/Technology 6:47-55 (1987)) for expression of the dirigent protein or pinoresinol/lariciresinol reductases of the invention. Infection of insect cells (such as cells of the species Spodoptera frugiperda) with the recombinant baculoviruses allows for the production of large amounts of the dirigent protein or pinoresinol/lariciresinol reductase protein. In addition, the baculovirus system has other important advantages for the production of recombinant dirigent protein or pinoresinol/lariciresinol reductase. For example, baculoviruses do not infect humans and can therefore be safely handled in large quantities. In the baculovirus system, a DNA construct is prepared including a DNA segment encoding dirigent protein or pinoresinol/lariciresinol reductase and a vector. The vector may comprise the polyhedron gene promoter region of a baculovirus, the baculovirus flanking sequences necessary for proper cross-over during recombination (the flanking sequences comprise about 200-300 base pairs adjacent to the promoter sequence) and a bacterial origin of replication which permits the construct to replicate in bacteria.
The vector is constructed so that (i) the DNA segment is placed adjacent (or operably-linked or "downstream" or "under the control of") to the polyhedron gene promoter and (ii) the promoter/pinoresinol/lariciresinol reductase, or promoter/dirigent protein, combination is flanked on both sides by 200-300 base pairs of baculovirus DNA (the flanking sequences).
To produce a dirigent protein DNA construct, or a pinoresinol/lariciresinol reductase DNA construct, a cDNA clone encoding a full length dirigent protein or pinoresinol/lariciresinol reductase is obtained using methods such as those described herein. The DNA construct is contacted in a host cell with baculovirus DNA of an appropriate baculovirus (that is, of the same species of baculovirus as the promoter encoded in the construct) under conditions such that recombination is effected. The resulting recombinant baculoviruses encode the full dirigent protein or pinoresinol/lariciresinol reductase. For example, an insect host cell can be cotransfected or transfected separately with the DNA construct and a functional baculovirus. Resulting recombinant baculoviruses can then be isolated and used to infect cells to effect production of dirigent protein or pinoresinol/lariciresinol reductase. Host insect cells include, for example, Spodoptera frugiperda cells.
Insect host cells infected with a recombinant baculovirus of the present invention are then cultured under conditions allowing expression of the baculovirus-encoded dirigent protein or pinoresinol/lariciresinol reductase. Recombinant protein thus produced is then extracted from the cells using methods known in the art.
Other eukaryotic microbes such as yeasts may also be used to practice this invention. The baker's yeast Saccharomyces cerevisiae is a commonly used yeast, although several other strains are available. The plasmid YRp7 (Stinchcomb et al., Nature 282:39 (1979); Kingsman et al., Gene 7:141 (1979); Tschumper et al., Gene 10:157 (1980)) is commonly used as an expression vector in Saccharomyces. This plasmid contains the trp1 gene that provides a selection marker for a mutant strain of yeast lacking the ability to grow in the absence of tryptophan, such as strains ATCC No. 44,076 and PEP4-1 (Jones, Genetics 85:12 (1977)). The presence of the trp1 lesion as a characteristic of the yeast host cell genome then provides an effective environment for detecting transformation by growth in the absence of tryptophan. Yeast host cells are generally transformed using the polyethylene glycol method, as described by Hinnen (Proc. Natl. Acad. Sci. USA 75:1929 (1978)). Additional yeast transformation protocols are set forth in Gietz et al., N.A.R. 20(17):1425 (1992); Reeves et al., FEMS 99:193-197 (1992).
Suitable promoting sequences in yeast vectors include the promoters for 3-phosphoglycerate kinase (Hitzeman et al., J. Biol. Chem. 255:2073 (1980)) or other glycolytic enzymes (Hess et al., J. Adv. Enzyme Reg. 7:149 (1968); Holland et al., Biochemistry 17:4900 (1978)), such as enolase, glyceraldehyde-3-phosphate dehydrogenase, hexokinase, pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase, triosephosphate isomerase, phosphoglucose isomerase, and glucokinase. In the construction of suitable expression plasmids, the termination sequences associated with these genes are also ligated into the expression vector 3' of the sequence desired to be expressed to provide polyadenylation of the mRNA and termination. Other promoters that have the additional advantage of transcription controlled by growth conditions are the promoter region for alcohol dehydrogenase 2, isocytochrome C, acid phosphatase, degradative enzymes associated with nitrogen metabolism, and the aforementioned glyceraldehyde-3-phosphate dehydrogenase, and enzymes responsible for maltose and galactose utilization. Any plasmid vector containing yeast-compatible promoter, origin of replication and termination sequences is suitable.
Cell cultures derived from multicellular organisms, such as plants, may be used as hosts to practice this invention. Transgenic plants can be obtained, for example, by transferring plasmids that encode pinoresinol/lariciresinol reductase, and/or dirigent protein, and a selectable marker gene, such as the kan gene encoding resistance to kanamycin, into Agrobacterium tumefaciens containing a helper Ti plasmid as described in Hoekema et al., Nature 303:179-181 (1983) and culturing the Agrobacterium cells with leaf slices of the plant to be transformed as described by An et al., Plant Physiology 81:301-305 (1986). Transformation of cultured plant host cells is normally accomplished through Agrobacterium tumefaciens, as described above. Cultures of mammalian host cells and other host cells that do not have rigid cell membrane barriers are usually transformed using the calcium phosphate method as originally described by Graham and Van der Eb (Virology 52:546 (1978)) and modified as described in Sections 16.32-16.37 of Sambrook et al., supra. However, other methods for introducing DNA into cells such as Polybrene (Kawai and Nishizawa, Mol. Cell. Biol. 4:1172 (1984)), protoplast fusion (Schaffner, Proc. Natl. Acad. Sci. USA 77:2163 (1980)), electroporation (Neumann et al., EMBO J. 1:841 (1982)), and direct microinjection into nuclei (Capecchi, Cell 22:479 (1980)) may also be used. Additionally, animal transformation strategies are reviewed in Monastersky G.M. and Robl, Strategies in Transgenic Animal Science, ASM Press, Washington, D.C. (1995).
Transformed plant calli may be selected through the selectable marker by growing the cells on a medium containing kanamycin and appropriate amounts of phytohormone such as naphthalene acetic acid and benzyladenine for callus and shoot induction. The plant cells may then be regenerated and the resulting plants transferred to soil using techniques well known to those skilled in the art.
In addition, a gene regulating pinoresinol/lariciresinol reductase production, or dirigent protein production, can be incorporated into the plant along with a necessary promoter which is inducible. In the practice of this embodiment of the invention, a promoter that only responds to a specific external or internal stimulus is fused to the target cDNA. Thus, the gene will not be transcribed except in response to the specific stimulus. As long as the gene is not being transcribed, its gene product is not produced.
An illustrative example of a responsive promoter system that can be used in the practice of this invention is the glutathione-S-transferase (GST) system in maize.
GSTs are a family of enzymes that can detoxify a number of hydrophobic electrophilic compounds that often are used as pre-emergent herbicides (Wiegand et al., Plant Molecular Biology 7:235-243 (1986)). Studies have shown that the GSTs are directly involved in causing this enhanced herbicide tolerance.
This action is primarily mediated through a specific 1.1 kb mRNA transcription product. In short, maize has a naturally occurring quiescent gene already present that can respond to external stimuli and that can be induced to produce a gene product.
This gene has previously been identified and cloned. Thus, in one embodiment of this invention, the promoter is removed from the GST responsive gene and attached to a pinoresinol/lariciresinol reductase gene, or a dirigent protein gene, that previously has had its native promoter removed. This engineered gene is the combination of a promoter that responds to an external chemical stimulus and a gene responsible for successful production of pinoresinol/lariciresinol reductase or dirigent protein.
In addition to the methods described above, several methods are known in the art for transferring cloned DNA into a wide variety of plant species, including gymnosperms, angiosperms, monocots and dicots (see, Glick and Thompson, eds., Methods in Plant Molecular Biology, CRC Press, Boca Raton, Florida (1993)).
Representative examples include electroporation-facilitated DNA uptake by protoplasts (Rhodes et al., Science 240(4849):204-207 (1988)); treatment of protoplasts with polyethylene glycol (Lyznik et al., Plant Molecular Biology 13:151-161 (1989)); and bombardment of cells with DNA laden microprojectiles (Klein et al., Plant Physiol. 91:440-444 (1989) and Boynton et al., Science 240(4858):1534-1538 (1988)). Numerous methods now exist, for example, for the transformation of cereal crops (see, McKinnon, G.E. and Henry, J. Cereal Science, 22(3):203-210 (1995); Mendel, R.R. and Teeri, Plant and Microbial Biotechnology Research Series, 3:81-98, Cambridge University Press (1995); McElroy, D. and Brettell, Trends in Biotechnology, 12(2):62-68 (1994); Christou et al., Trends in Biotechnology, 10(7):239-246 (1992); Christou, P. and Ford, Annals of Botany, 75(5):449-454 (1995); Park et al., Plant Molecular Biology, 32(6):1135-1148 (1996); Altpeter et al., Plant Cell Reports, 16:12-17 (1996)). Additionally, plant transformation strategies and techniques are reviewed in Birch, Ann Rev Plant Phys Plant Mol Biol 48:297 (1997); Forester et al., Exp. Agric. 33:15-33 (1997). Minor variations make these technologies applicable to a broad range of plant species.
Each of these techniques has advantages and disadvantages. In each of the techniques, DNA from a plasmid is genetically engineered such that it contains not only the gene of interest, but also selectable and screenable marker genes. A selectable marker gene is used to select only those cells that have integrated copies of the plasmid (the construction is such that the gene of interest and the selectable and screenable genes are transferred as a unit). The screenable gene provides another check for the successful culturing of only those cells carrying the genes of interest. A commonly used selectable marker gene is neomycin phosphotransferase II (NPT II).
This gene conveys resistance to kanamycin, a compound that can be added directly to the growth media on which the cells grow. Plant cells are normally susceptible to kanamycin and, as a result, die. The presence of the NPT II gene overcomes the effects of the kanamycin and each cell with this gene remains viable. Another selectable marker gene which can be employed in the practice of this invention is the gene which confers resistance to the herbicide glufosinate (Basta). A screenable gene commonly used is the β-glucuronidase gene (GUS). The presence of this gene is characterized using a histochemical reaction in which a sample of putatively transformed cells is treated with a GUS assay solution. After an appropriate incubation, the cells containing the GUS gene turn blue. Preferably, the plasmid will contain both selectable and screenable marker genes.
The plasmid containing one or more of these genes is introduced into either plant protoplasts or callus cells by any of the previously mentioned techniques. If the marker gene is a selectable gene, only those cells that have incorporated the DNA package survive under selection with the appropriate phytotoxic agent. Once the appropriate cells are identified and propagated, plants are regenerated. Progeny from the transformed plants must be tested to ensure that the DNA package has been successfully integrated into the plant genome.
Mammalian host cells may also be used in the practice of the invention.
Examples of suitable mammalian cell lines include monkey kidney CV1 line transformed by SV40 (COS-7, ATCC CRL 1651); human embryonic kidney line 293S (Graham et al., J. Gen. Virol. 36:59 (1977)); baby hamster kidney cells (BHK, ATCC CCL 10); Chinese hamster ovary cells (Urlaub and Chasin, Proc. Natl. Acad.
Sci. USA 77:4216 (1980)); mouse sertoli cells (TM4, Mather, Biol. Reprod. 23:243 (1980)); monkey kidney cells (CV1-76, ATCC CCL 70); African green monkey kidney cells (VERO-76, ATCC CRL-1587); human cervical carcinoma cells (HELA, ATCC CCL); canine kidney cells (MDCK, ATCC CCL 34); buffalo rat liver cells (BRL 3A, ATCC CRL 1442); human lung cells (WI38, ATCC CCL 75); human liver cells (HepG2, HB 8065); mouse mammary tumor cells (MMT 060562, ATCC CCL 51); rat hepatoma cells (HTC, M1.54, Baumann et al., J. Cell Biol. 85:1 (1980)); and TRI cells (Mather et al., Annals N.Y. Acad. Sci. 383:44 (1982)).
Expression vectors for these cells ordinarily include (if necessary) DNA sequences for an origin of replication, a promoter located in front of the gene to be expressed, a ribosome binding site, an RNA splice site, a polyadenylation site, and a transcription terminator site.
Promoters used in mammalian expression vectors are often of viral origin.
These viral promoters are commonly derived from polyoma virus, Adenovirus 2, and most frequently Simian Virus 40 (SV40). The SV40 virus contains two promoters that are termed the early and late promoters. These promoters are particularly useful because they are both easily obtained from the virus as one DNA fragment that also contains the viral origin of replication (Fiers et al., Nature 273:113 (1978)). Smaller or larger SV40 DNA fragments may also be used, provided they contain the approximately 250-bp sequence extending from the HindIII site toward the BglI site located in the viral origin of replication.
Alternatively, promoters that are naturally associated with the foreign gene (homologous promoters) may be used provided that they are compatible with the host cell line selected for transformation.
An origin of replication may be obtained from an exogenous source, such as SV40 or another virus (for example, Polyoma, Adeno, VSV, BPV), and inserted into the cloning vector. Alternatively, the origin of replication may be provided by the host cell chromosomal replication mechanism. If the vector containing the foreign gene is integrated into the host cell chromosome, the latter is often sufficient.
The use of a secondary DNA coding sequence can enhance production levels of pinoresinol/lariciresinol reductase or dirigent protein in transformed cell lines.
The secondary coding sequence typically comprises the enzyme dihydrofolate reductase (DHFR). The wild-type form of DHFR is normally inhibited by the chemical methotrexate (MTX). The level of DHFR expression in a cell will vary depending on the amount of MTX added to the cultured host cells. An additional feature of DHFR that makes it particularly useful as a secondary sequence is that it can be used as a selection marker to identify transformed cells. Two forms of DHFR are available for use as secondary sequences, wild-type DHFR and MTX-resistant DHFR. The type of DHFR used in a particular host cell depends on whether the host cell is DHFR deficient (such that it either produces very low levels of DHFR endogenously, or it does not produce functional DHFR at all). DHFR-deficient cell lines such as the CHO cell line described by Urlaub and Chasin, supra, are transformed with wild-type DHFR coding sequences. After transformation, these DHFR-deficient cell lines express functional DHFR and are capable of growing in a culture medium lacking the nutrients hypoxanthine, glycine and thymidine.
Nontransformed cells will not survive in this medium.
The MTX-resistant form of DHFR can be used as a means of selecting for transformed host cells in those host cells that endogenously produce normal amounts of functional DHFR that is MTX sensitive. The CHO-K1 cell line (ATCC No. CL 61) possesses these characteristics, and is thus a useful cell line for this purpose. The addition of MTX to the cell culture medium will permit only those cells transformed with the DNA encoding the MTX-resistant DHFR to grow. The nontransformed cells will be unable to survive in this medium.
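The choice between the two DHFR forms described above amounts to a simple decision rule. The following is a minimal illustrative sketch only; the class and function names are hypothetical and are not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionPlan:
    dhfr_form: str         # DHFR coding sequence to introduce with the gene of interest
    selective_medium: str  # condition under which only transformed cells grow


def choose_dhfr_selection(host_is_dhfr_deficient: bool) -> SelectionPlan:
    """Return the DHFR form and selective condition described in the text."""
    if host_is_dhfr_deficient:
        # DHFR-deficient hosts are rescued by wild-type DHFR and selected on
        # medium lacking hypoxanthine, glycine and thymidine.
        return SelectionPlan(
            "wild-type",
            "medium lacking hypoxanthine, glycine and thymidine",
        )
    # Hosts with normal, MTX-sensitive DHFR receive the MTX-resistant form
    # and are selected by adding methotrexate to the medium.
    return SelectionPlan("MTX-resistant", "medium containing methotrexate")
```

For a DHFR-deficient CHO line the function returns the wild-type form; for a line such as CHO-K1 it returns the MTX-resistant form.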
Prokaryotes may also be used as host cells for the initial cloning steps of this invention. They are particularly useful for rapid production of large amounts of DNA, for production of single-stranded DNA templates used for site-directed mutagenesis, for screening many mutants simultaneously, and for DNA sequencing of the mutants generated. Suitable prokaryotic host cells include E. coli K12 strain 294 (ATCC No. 31,446), E. coli strain W3110 (ATCC No. 27,325), E. coli X1776 (ATCC No. 31,537), and E. coli B; however, many other strains of E. coli, such as HB101, JM101, NM522, NM538, NM539, and many other species and genera of prokaryotes including bacilli such as Bacillus subtilis, other Enterobacteriaceae such as Salmonella typhimurium or Serratia marcescens, and various Pseudomonas species may all be used as hosts. Prokaryotic host cells or other host cells with rigid cell walls are preferably transformed using the calcium chloride method as described in section 1.82 of Sambrook et al., supra. Alternatively, electroporation may be used for transformation of these cells. Prokaryote transformation techniques are set forth in Dower, W. in Genetic Engineering, Principles and Methods, 12:275-296, Plenum Publishing Corp. (1990); Hanahan et al., Meth. Enzymol., 204:63 (1991).
As a representative example, cDNA sequences encoding dirigent protein or pinoresinol/lariciresinol reductase may be transferred to the commercially available (His)6·Tag pET vector (Novagen) for overexpression in E. coli as heterologous host. This pET expression plasmid has several advantages in high level heterologous expression systems. The desired cDNA insert is ligated in frame to plasmid vector sequences encoding six histidines followed by a highly specific protease recognition site (thrombin) that are joined to the amino terminus codon of the target protein. The histidine "block" of the expressed fusion protein promotes very tight binding to immobilized metal ions and permits rapid purification of the recombinant protein by immobilized metal ion affinity chromatography. The histidine leader sequence is then cleaved at the specific proteolysis site by treatment of the purified protein with thrombin, and the dirigent protein or pinoresinol/lariciresinol reductase eluted. This overexpression-purification system has high capacity, excellent resolving power and is fast, and the chance of a contaminating E. coli protein exhibiting similar binding behavior (before and after thrombin proteolysis) is extremely small.
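The layout of such a fusion protein, and the effect of thrombin cleavage, can be pictured with a short sketch. It is illustrative only and rests on assumptions not stated in the text: the thrombin recognition sequence is taken to be LVPRGS with cleavage after the arginine, the exact leader of any particular pET vector will differ, and the target sequence shown is hypothetical.

```python
from typing import Tuple

HIS_TAG = "H" * 6         # six histidines, as described in the text
THROMBIN_SITE = "LVPRGS"  # assumed recognition sequence; the cut falls after R


def build_fusion(target: str) -> str:
    """Prepend the histidine block and thrombin site to the target sequence."""
    return HIS_TAG + THROMBIN_SITE + target


def thrombin_cleave(fusion: str) -> Tuple[str, str]:
    """Split the fusion at the thrombin site into (leader, released protein)."""
    start = fusion.find(THROMBIN_SITE)
    if start < 0:
        raise ValueError("no thrombin site found")
    cut = start + THROMBIN_SITE.index("R") + 1
    return fusion[:cut], fusion[cut:]


target = "MKTAYIAKQR"  # hypothetical target sequence, for illustration only
leader, released = thrombin_cleave(build_fusion(target))
```

Under these assumptions the released protein retains the two residues that follow the cleavage point, while the histidine-containing leader stays bound to the metal affinity matrix.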
As will be apparent to those skilled in the art, any plasmid vectors containing replicon and control sequences that are derived from species compatible with the host cell may also be used in the practice of the invention. The vector usually has a replication site, marker genes that provide phenotypic selection in transformed cells, one or more promoters, and a polylinker region containing several restriction sites for insertion of foreign DNA. Plasmids typically used for transformation of E. coli include pBR322, pUC18, pUC19, pUC118, pUC119, and Bluescript M13, all of which are described in Sections 1.12-1.20 of Sambrook et al., supra. However, many other suitable vectors are available as well. These vectors contain genes coding for ampicillin and/or tetracycline resistance which enables cells transformed with these vectors to grow in the presence of these antibiotics.
The promoters most commonly used in prokaryotic vectors include the β-lactamase (penicillinase) and lactose promoter systems (Chang et al., Nature 275:615 (1978); Itakura et al., Science 198:1056 (1977); Goeddel et al., Nature 281:544 (1979)), a tryptophan (trp) promoter system (Goeddel et al., Nucl. Acids Res. 8:4057 (1980); EPO Appl. Publ. No. 36,776), and the alkaline phosphatase systems. While these are the most commonly used, other microbial promoters have been utilized, and details concerning their nucleotide sequences have been published, enabling a skilled worker to ligate them functionally into plasmid vectors (see Siebenlist et al., Cell 20:269 (1980)).
Many eukaryotic proteins normally secreted from the cell contain an endogenous secretion signal sequence as part of the amino acid sequence. Thus, proteins normally found in the cytoplasm can be targeted for secretion by linking a signal sequence to the protein. This is readily accomplished by ligating DNA encoding a signal sequence to the 5' end of the DNA encoding the protein and then expressing this fusion protein in an appropriate host cell. The DNA encoding the signal sequence may be obtained as a restriction fragment from any gene encoding a protein with a signal sequence. Thus, prokaryotic, yeast, and eukaryotic signal sequences may be used herein, depending on the type of host cell utilized to practice the invention. The DNA and amino acid sequence encoding the signal sequence portion of several eukaryotic genes including, for example, human growth hormone, proinsulin, and proalbumin are known (see Stryer, Biochemistry, W.H. Freeman and Company, New York, NY, p. 769 (1988)), and can be used as signal sequences in appropriate eukaryotic host cells. Yeast signal sequences, as for example acid phosphatase (Arima et al., Nucleic Acids Res. 11:1657 (1983)), alpha-factor, alkaline phosphatase and invertase may be used to direct secretion from yeast host cells.
Prokaryotic signal sequences from genes encoding, for example, LamB or OmpF (Wong et al., Gene 68:193 (1988)), MalE, PhoA, or beta-lactamase, as well as other genes, may be used to target proteins from prokaryotic cells into the culture medium.
Trafficking sequences from plants, animals and microbes can be employed in the practice of the invention to direct the gene product to the cytoplasm, endoplasmic reticulum, mitochondria or other cellular components, or to target the protein for export to the medium. These considerations apply to the overexpression of pinoresinol/lariciresinol reductase or dirigent protein, and to direction of expression within cells or intact organisms to permit gene product function in any desired location.
The construction of suitable vectors containing DNA encoding replication sequences, regulatory sequences, phenotypic selection genes and the dirigent protein DNA or pinoresinol/lariciresinol reductase DNA of interest are prepared using standard recombinant DNA procedures. Isolated plasmids and DNA fragments are cleaved, tailored, and ligated together in a specific order to generate the desired vectors, as is well known in the art (see, for example, Sambrook et al., supra).
As discussed above, pinoresinol/lariciresinol reductase variants, or dirigent protein variants, are preferably produced by means of mutation(s) that are generated using the method of site-specific mutagenesis. This method requires the synthesis and use of specific oligonucleotides that encode both the sequence of the desired mutation and a sufficient number of adjacent nucleotides to allow the oligonucleotide to stably hybridize to the DNA template.
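The design requirement stated above, an oligonucleotide that carries the desired change flanked by enough matching nucleotides to hybridize stably, can be sketched as follows. This is a minimal illustration for a single-base substitution; the function name and the default of 12 flanking nucleotides per side are assumptions chosen for the example, not values taken from the text.

```python
def mutagenic_oligo(template: str, position: int, new_base: str, flank: int = 12) -> str:
    """Return an oligo with one substituted base at 0-based `position`,
    flanked on each side by `flank` nucleotides copied from the template."""
    template = template.upper()
    new_base = new_base.upper()
    if new_base not in "ACGT" or len(new_base) != 1:
        raise ValueError("new_base must be one of A, C, G, T")
    if position < flank or position + flank >= len(template):
        raise ValueError("not enough flanking sequence around the mutation site")
    if template[position] == new_base:
        raise ValueError("new_base is identical to the template base")
    upstream = template[position - flank:position]
    downstream = template[position + 1:position + 1 + flank]
    return upstream + new_base + downstream
```

The returned oligonucleotide differs from the corresponding stretch of template at exactly one position, so the flanking nucleotides provide the matching sequence needed for hybridization.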
A dirigent protein gene and/or pinoresinol/lariciresinol reductase gene, or an antisense nucleic acid fragment complementary to all or part of a dirigent protein gene or pinoresinol/lariciresinol reductase gene, may be introduced, as appropriate, into any plant species for a variety of purposes including, but not limited to: altering or improving the color, texture, durability and pest-resistance of wood tissue, especially heartwood tissue; reducing the formation of lignans and/or lignins in plant species, such as corn, which are useful as animal fodder, thereby enhancing the availability of the cellulose fraction of the plant material to the digestive system of animals ingesting the plant material; reducing the lignan/lignin content of plant species utilized in pulp and paper production, thereby making pulp and paper production easier and cheaper; improving the defensive capability of a plant against predators and pathogens by enhancing the production of defensive lignans or lignins; the alteration of other ecological interactions mediated by lignans or lignins; producing elevated levels of optically-pure lignan enantiomers as medicines or food additives; introducing, enhancing or inhibiting the production of dirigent proteins or pinoresinol/lariciresinol reductases, or the production of pinoresinol or lariciresinol and their derivatives. A dirigent protein and/or pinoresinol/lariciresinol reductase gene may be introduced into any organism for a variety of purposes including, but not limited to: introducing, enhancing or inhibiting the production of dirigent protein and/or pinoresinol/lariciresinol reductase, or the production of pinoresinol or lariciresinol and their derivatives.
The foregoing may be more fully understood in connection with the following representative examples, in which "Plasmids" are designated by a lower case p followed by an alphanumeric designation. The starting plasmids used in this invention are either commercially available, publicly available on an unrestricted basis, or can be constructed from such available plasmids using published procedures. In addition, other equivalent plasmids are known in the art and will be apparent to the ordinary artisan.
"Digestion", "cutting" or "cleaving" of DNA refers to catalytic cleavage of the DNA with an enzyme that acts only at particular locations in the DNA. These enzymes are called restriction endonucleases, and the site along the DNA sequence where each enzyme cleaves is called a restriction site. The restriction enzymes used in this invention are commercially available and are used according to the instructions supplied by the manufacturers. (See also Sections 1.60-1.61 and Sections 3.38-3.39 of Sambrook et al., supra.) "Recovery" or "isolation" of a given fragment of DNA from a restriction digest means separation of the resulting DNA fragment on a polyacrylamide or an agarose gel by electrophoresis, identification of the fragment of interest by comparison of its mobility versus that of marker DNA fragments of known molecular weight, removal of the gel section containing the desired fragment, and separation of the gel from DNA. This procedure is known generally. For example, see Lawn et al.
(Nucleic Acids Res. 9:6103-6114 (1982)), and Goeddel et al. (Nucleic Acids Res., supra).
The following examples merely illustrate the best mode now contemplated for practicing the invention, but should not be construed to limit the invention. All literature citations herein are expressly incorporated by reference.
EXAMPLE 1
Purification of Dirigent Protein from Forsythia intermedia
Plant Materials. Forsythia intermedia plants were either obtained from Bailey's Nursery (var. Lynwood Gold, St. Paul, MN), and maintained in Washington State University greenhouse facilities, or were gifts from the local community.
Initial Extraction and Ammonium Sulphate Precipitation. Solubilization of bound proteins was carried out at 4°C. Frozen Forsythia intermedia stems (2 kg) were pulverized in a Waring Blendor (Model CB6) in the presence of liquid nitrogen.
The resulting powder was homogenized with 0.1 M KH2PO4-K2HPO4 buffer (pH 7.0, 4 liters) containing 5 mM dithiothreitol, and filtered through four layers of cheesecloth. The insoluble residue was consecutively extracted, with continuous agitation at 250 rpm, as follows: with chilled (-20°C) re-distilled acetone (4 liters, 3 x 30 min); 0.1 M KH2PO4-K2HPO4 buffer (pH 6.5) containing 0.1% β-mercaptoethanol (solution A, 8 liters, 30 min); solution A containing 1% Triton X-100 (8 liters, 4 hours); and finally solution A (8 liters, 16 hours). Between each extraction, the residue was filtered through one layer of Miracloth (Calbiochem). Solubilization of the (+)-pinoresinol forming system was achieved by mechanically stirring the residue in solution A containing 1 M NaCl (8 liters, 4 hours). The homogenate was decanted and the resulting solution consecutively filtered through Miracloth (Calbiochem) and glass fiber (G6, Fisher Sci.). The filtrate was concentrated in an Amicon cell (Model 2000, YM 30 membrane) to a final volume of ~800 ml, and subjected to (NH4)2SO4 fractionation. Proteins precipitating between 40 and 80% saturation were recovered by centrifugation (15,000g, 30 min) and the (NH4)2SO4 pellet stored at -20°C until required.
Mono S Column Chromatography. Purification of 78-kD dirigent protein and partial purification of oxidase. The ammonium sulfate pellet (obtained from 2 kg of F. intermedia stems) was reconstituted in 40 mM MES [2-(N-morpholino)ethanesulfonic acid] buffer, adjusted to pH 5.0 with 6 M NaOH (solution B, 30 ml); the slurry was centrifuged (3,600g, 5 min), and the supernatant dialyzed overnight against solution B (4 liters). The dialyzed extract was filtered (0.22 µm) and the sample (35 to 40 mg proteins) was applied to a MonoS ( mm by 5 mm) column equilibrated in solution B at 4°C. After eluting (flow rate ml min-1 cm-2) with solution B (13 ml), proteins were desorbed with Na2SO4 in solution B, using a linear gradient from 0 to 100 mM in 8 ml and holding at this concentration for 32 ml, then implementing a series of step gradients at 133 mM for ml, 166 mM for 50 ml, 200 mM for 40 ml, 233 mM for 40 ml and finally 333 mM Na2SO4 for 40 ml. Fractions capable of forming (+)-pinoresinol from E-coniferyl alcohol were eluted with 333 mM Na2SO4, combined and stored (-80°C) until needed.
POROS SP-M Matrix Column Chromatography (First Column). Fractions from 15 individual elutions from the MonoS HR5/5 column (333 mM Na2SO4) were combined (18.5 mg proteins, 180 ml) and dialyzed overnight against solution C. The dialyzed enzyme solution (190 ml) was filtered (0.22 µm) and an aliquot (47 ml) was applied to the POROS SP-M column. All separations on a POROS SP-M matrix (100 mm by 4.6 mm), previously equilibrated in 25 mM MES-HEPES-sodium acetate buffer (pH 5.0, solution C), were performed at a flow rate of 60 ml min-1 cm-2 and at room temperature. After elution with solution C (12 ml), the proteins were desorbed with a linear Na2SO4 gradient (0 to 0.7 M in 66.5 ml) in solution C, whereupon the concentration established was held for an additional 16.6 ml. Under these conditions, separation of four fractions (I, II, III and IV) was achieved at 47, 55 and 61 mS, respectively. This purification step was repeated three times with the remaining dialyzed enzymatic extract, and fractions I, II, III, and IV from each experiment were separately combined. When protease inhibitors [that is, phenylmethanesulfonyl fluoride (0.1 mmol ml-1), EDTA (0.5 nmol ml-1), pepstatin A (1 µg ml-1) and antipain (1 µg ml-1)] were added during the solubilization and all subsequent purification stages, no differences were observed in the elution profiles of fractions I, II, III, and IV.
POROS SP-M Matrix Column Chromatography (Second Column). Fraction I from the first POROS SP-M Matrix column chromatography step (2.62 mg proteins, 40 ml, ~24.6 mS) was diluted in filtered, cold distilled water until the conductivity reached ~8 mS (final volume 150 ml). The diluted protein solution was then applied onto a POROS SP-M column (100 mm by 4.6 mm). After elution with solution C (12 ml), fraction I was desorbed using a linear Na2SO4 gradient from 0 to 0.25 M in 20 ml, whereupon the concentration established was held for another 25 ml. This was followed by another linear Na2SO4 gradient from 0.25 to 0.7 M in 26 ml, which was then held at 0.7 M for an additional 16.6 ml. Fractions eluted at mS (the ionic strength of the eluent was measured with a flow-through detector) were combined (15 ml, 1.3 mg), diluted with water and rechromatographed. The resulting protein (eluted at ~30 mS with the gradient described above) was stored (-80°C) until needed.
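The elution program for the second POROS SP-M column can be written down as a piecewise-linear function of eluent volume. The sketch below is illustrative only. It assumes that volume is counted from the start of the 12 ml wash with solution C (taken as 0 M Na2SO4) and that the second linear gradient runs over 26 ml; the names used are hypothetical.

```python
from bisect import bisect_right

# (cumulative eluent volume in ml, Na2SO4 concentration in M)
BREAKPOINTS = [
    (0.0, 0.00),   # start of wash with solution C
    (12.0, 0.00),  # end of wash, start of first linear gradient
    (32.0, 0.25),  # 0 to 0.25 M over 20 ml
    (57.0, 0.25),  # hold for 25 ml
    (83.0, 0.70),  # 0.25 to 0.7 M over 26 ml
    (99.6, 0.70),  # hold for 16.6 ml
]


def na2so4_molar(volume_ml: float) -> float:
    """Interpolate the programmed Na2SO4 concentration at a given volume."""
    volumes = [v for v, _ in BREAKPOINTS]
    if not volumes[0] <= volume_ml <= volumes[-1]:
        raise ValueError("volume lies outside the programmed gradient")
    i = bisect_right(volumes, volume_ml) - 1
    if i == len(BREAKPOINTS) - 1:
        return BREAKPOINTS[-1][1]
    (v0, c0), (v1, c1) = BREAKPOINTS[i], BREAKPOINTS[i + 1]
    return c0 + (c1 - c0) * (volume_ml - v0) / (v1 - v0)
```

Such a function gives only the programmed salt concentration; the conductivity values quoted in the text were measured at the column outlet and are not modeled here.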
Gel filtration. An aliquot from fraction I (595.5 µg proteins, 3 ml, eluted at mS), was concentrated to 0.6 ml (Centricon 10, Amicon) and loaded onto a S200 (73.2 cm by 1.6 cm, Pharmacia-LKB) gel chromatographic column equilibrated in 0.1 M MES-HEPES-sodium acetate buffer (pH 5.0) containing 50 mM Na2SO4 at 4°C. An apparently homogeneous 78-kD dirigent protein (242 µg) was eluted (flow rate 0.25 ml min-1 cm-2) as a single component at 133 ml (V0 = 105 ml). Molecular weights were estimated by comparison of their elution profiles with the standard proteins, β-amylase (200,000), alcohol dehydrogenase (150,000), bovine serum albumin (66,000), ovalbumin (45,000), carbonic anhydrase (29,000) and cytochrome c (12,400).
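Elution volumes from a gel filtration column are often normalized to a partition coefficient, Kav = (Ve - V0) / (Vt - V0), before comparison with standards. The text does not report this quantity, so the sketch below is only an illustration. It assumes that the total bed volume Vt equals the geometric volume of a 73.2 cm by 1.6 cm column, and the function names are hypothetical.

```python
import math


def bed_volume_ml(length_cm: float, diameter_cm: float) -> float:
    """Geometric volume of a cylindrical column bed (1 cm^3 = 1 ml)."""
    return math.pi * (diameter_cm / 2.0) ** 2 * length_cm


def k_av(elution_ml: float, void_ml: float, total_ml: float) -> float:
    """Partition coefficient Kav = (Ve - V0) / (Vt - V0)."""
    if not void_ml < total_ml:
        raise ValueError("void volume must be smaller than total volume")
    return (elution_ml - void_ml) / (total_ml - void_ml)


vt = bed_volume_ml(73.2, 1.6)                             # about 147 ml
kav_dirigent = k_av(133.0, 105.0, vt)                     # about 0.66
```

Converting such a value to a molecular weight would additionally require the elution volumes of the listed standard proteins, which are not given in the text.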
EXAMPLE 2
Characterization of the Purified Dirigent Protein
Molecular Weight and Isoelectric Point Determination. Polyacrylamide gel electrophoresis (PAGE) was performed in Laemmli's buffer system with gradient (4 to 15% acrylamide, Bio-Rad) gels under denaturing and reducing conditions.
Proteins were visualized by silver staining. Gel filtration (S200) chromatography of fraction I gave a protein of native molecular weight ~78 kD, whereas SDS-polyacrylamide gel electrophoresis showed a single band at ~27 kD, suggesting that the native protein exists as a trimer. Isoelectric focusing of the native protein on a polyacrylamide gel (pH 3 to 10 gradient) revealed the presence of six bands. After isoelectric focusing, each of these bands was electroblotted onto a polyvinylidene fluoride (PVDF) membrane and subjected to amino terminal sequencing, which established that all had similar sequences, indicating a series of isoforms. The ultraviolet-visible spectrum of the protein had only a characteristic protein absorbance at 280 nm with a barely perceptible shoulder at ~330 nm. Inductively coupled plasma (ICP) analysis gave no indication of any metal being present in the protein. Thus, the 78-kD dirigent protein lacks any detectable catalytically active oxidative center.
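The inference that the native protein is a trimer follows from the ratio of the native molecular weight to the subunit molecular weight. A minimal sketch of that arithmetic, using the approximate values of 78 kD and 27 kD given above (the function name is hypothetical):

```python
from typing import Tuple


def subunit_count(native_kd: float, subunit_kd: float) -> Tuple[float, int]:
    """Return the native/subunit mass ratio and the nearest whole number of subunits."""
    ratio = native_kd / subunit_kd
    return ratio, round(ratio)


ratio, n_subunits = subunit_count(78.0, 27.0)
```

The ratio is close to 2.9, and the nearest whole number of subunits is 3, consistent with a trimer.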
Assay of the Ability of the Purified Dirigent Protein to Form (+)-Pinoresinol from E-Coniferyl Alcohol. The four fractions (I to IV) from the first POROS SP-M chromatographic step (Example 1) were individually rechromatographed, with each fraction subsequently assayed for (+)-pinoresinol-forming activity with E-[9-3H]coniferyl alcohol as substrate for one hour. Fraction I (containing dirigent protein) had very little (+)-pinoresinol-forming activity ( of total activity loaded onto the POROS SP-M column), whereas fraction III catalyzed nonspecific oxidative coupling to give (±)-dehydrodiconiferyl alcohols, (±)-pinoresinols, and (±)-erythro/threo guaiacylglycerol 8-O-4'-coniferyl alcohol ethers. Thus, Fraction III appeared to contain an endogenous plant oxygenating protein.
Although the putative oxidase preparation (Fraction III) was not purified to electrophoretic homogeneity, the electron paramagnetic resonance (EPR) spectrum of this protein preparation resembled that of a typical plant laccase, a class of naturally-occurring plant oxygenase proteins. We then studied the fate of E-[9-3H]coniferyl alcohol (2 µmol ml-1, 14.7 kBq) in the presence of, respectively, the oxidase (fraction III), the 78-kD dirigent protein (fraction I), and both fraction III and the 78-kD protein together. With the fraction III preparation alone, only nonspecific bimolecular radical coupling occurs to give (±)-dehydrodiconiferyl alcohols, (±)-pinoresinols and (±)-erythro/threo guaiacylglycerol 8-O-4'-coniferyl alcohol ethers. With the 78-kD protein by itself, however, a small amount of (+)-pinoresinol formation ( over 10 hours) was observed, this being presumed to result from residual traces of oxidizing capacity in the preparation. When both fraction III and the 78-kD protein were combined, full catalytic activity and regio- and stereospecificity in the product was reestablished, whereby essentially only (+)-pinoresinol was formed. Additionally, with fraction III alone, and when fraction III was combined with the 78-kD protein, the rates of substrate depletion and dimeric product formation were nearly identical. Moreover, essentially no turnover of the dimeric lignan products occurred in either case in the presence of the oxidase, over the time-period (8 hours) examined: subsequent dimer oxidation does not occur when E-coniferyl alcohol, the preferred substrate, is still present in the assay mixture.
The 78-kD protein therefore appears to determine the specificity of the bimolecular phenoxy radical coupling reaction.
Gel filtration studies were also carried out with mixtures of the dirigent and fraction III proteins, in order to establish whether any detectable protein-protein interaction might account for the stereoselectivity. No evidence in support of complex formation (that is, to higher molecular size entities) was observed.
EXAMPLE 3
Effect of the 78-kD Dirigent Protein on Plant Laccase-Catalyzed Monolignol Coupling
E-coniferyl alcohol coupling assay. E-[9-3H]Coniferyl alcohol (4 µmol ml-1, 29.3 kBq) was incubated with a 120-kD laccase (previously purified from Forsythia intermedia stem tissue) over a 24-hour period, in the presence and absence of the dirigent protein, as follows. Each assay consisted of E-[9-3H]coniferyl alcohol (4 µmol ml-1, 29.3 kBq, 7.3 MBq mole liter-1; or 2 µmol ml-1, 14.7 kBq with fraction III), the 78-kD dirigent protein, an oxidase or oxidant, or both [final concentrations: 770 pmol ml-1 dirigent protein; 10.7 pmol protein ml-1 Forsythia laccase; 12 µg protein ml-1 fraction III; 0.5 pmol ml-1 FMN; 0.5 pmol ml-1 FAD; 1 and 10 µmol ml-1 ammonium peroxydisulfate] in buffer (0.1 M MES-HEPES-sodium acetate, pH 5.0) to a total volume of 250 µl. The enzymatic reaction was initiated by addition of E-[9-3H]coniferyl alcohol. Controls were performed in the presence of buffer alone.
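The final concentrations listed for the assay translate into absolute amounts per 250 µl reaction by simple multiplication. The sketch below is illustrative only and assumes the substrate concentration is 4 µmol per ml, with the dirigent protein and laccase at 770 and 10.7 pmol per ml respectively; variable names are hypothetical.

```python
ASSAY_VOLUME_ML = 0.250  # total assay volume of 250 microliters


def amount_in_assay(conc_per_ml: float, volume_ml: float = ASSAY_VOLUME_ML) -> float:
    """Amount present in the assay, in the same unit as the concentration numerator."""
    return conc_per_ml * volume_ml


substrate_umol = amount_in_assay(4.0)    # E-coniferyl alcohol, micromoles
dirigent_pmol = amount_in_assay(770.0)   # dirigent protein, picomoles
laccase_pmol = amount_in_assay(10.7)     # Forsythia laccase, picomoles

# Molar excess of substrate over dirigent protein (1 umol = 1e6 pmol).
substrate_per_dirigent = substrate_umol * 1e6 / dirigent_pmol
```

Under these assumptions each assay holds about 1 µmol of substrate and roughly 190 pmol of dirigent protein, so the substrate is present in an excess of several thousand-fold over the dirigent protein.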
After one hour incubation at 30°C while shaking, the assay mixture was extracted with ethyl acetate (EtOAc, 500 µl) containing (±)-pinoresinols (7.5 µg), (±)-dehydrodiconiferyl alcohols (3.5 µg) and erythro/threo (±)-guaiacylglycerol 8-O-4'-coniferyl alcohol ethers (7.5 µg) as radiochemical carriers and ferulic acid (15.0 µg) as an internal standard. After centrifugation (13,800g, 5 min), the EtOAc soluble components were removed and the extraction procedure repeated with EtOAc (500 µl). The EtOAc soluble components from each assay were combined, the solutions evaporated to dryness in vacuo and redissolved in methanol-water solution (100 µl), with an aliquot (50 µl) thereof subjected to reversed-phase column chromatography (Waters, Nova-Pak C18, 150 mm by 3.8 mm). The elution conditions were as follows: acetonitrile/3% acetic acid in H2O (5:95) from 0 to 5 min, then linear gradients to ratios of 10:90 between 5 and 20 min, then to 20:80 between 20 and 45 min and finally to 50:50 between 45 and 60 min, at a flow rate of 8.8 ml min-1 cm-2. Fractions corresponding to E-coniferyl alcohol, erythro/threo guaiacylglycerol 8-O-4'-coniferyl alcohol ethers, (±)-dehydrodiconiferyl alcohols and (±)-pinoresinols were individually collected, aliquots removed for liquid scintillation counting, and the remainder freeze-dried. Pinoresinol-containing fractions were redissolved in methanol (100 µl) and subjected to chiral column chromatography (Daicel, Chiralcel OD, 50 mm by 4.6 mm) with a solution of hexanes and ethanol as the mobile phase (flow rate 3 ml min-1 cm-2), whereas dehydrodiconiferyl alcohol fractions were subjected to Chiralcel OF (250 mm by 4.6 mm) column chromatography eluted with a solution of hexanes and isopropanol as the mobile phase (flow rate 2.4 ml min-1 cm-2), the radioactivity of the eluent being measured with a flow-through detector (Radiomatic, Model A120).
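The reversed-phase elution conditions describe a solvent program that can be expressed as percent acetonitrile against time. The sketch below is illustrative only. It assumes the initial isocratic step runs from 0 to 5 min (the gradient that follows begins at 5 min) and treats the stated ratios as volume percentages; the names are hypothetical.

```python
# (time in min, percent acetonitrile); the balance is aqueous acetic acid
GRADIENT = [(0.0, 5.0), (5.0, 5.0), (20.0, 10.0), (45.0, 20.0), (60.0, 50.0)]


def acetonitrile_percent(t_min: float) -> float:
    """Percent acetonitrile delivered at time t_min of the 60 min program."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time lies outside the 0 to 60 min program")
    for (t0, p0), (t1, p1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            # Linear interpolation within the current segment.
            return p0 + (p1 - p0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: segments cover the whole program")
```

For example, midway through the second linear segment (32.5 min) the program delivers 15 percent acetonitrile.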
Results of E-coniferyl alcohol coupling assay. Incubation with laccase alone gave only racemic dimeric products, with (±)-dehydrodiconiferyl alcohols predominating. In the presence of the dirigent protein, however, the process was now primarily stereoselective, affording (+)-pinoresinol, rather than being nonspecific as observed when only laccase was present. The rates of both E-coniferyl alcohol (substrate) depletion and the formation of the dimeric lignans were similar with and without the dirigent protein. A substantial difference was noted in the subsequent turnover of the lignan products observed after E-coniferyl alcohol depletion. With the laccase alone no turnover occurred, but when both proteins were present the disappearance of the products was significant. In order to understand the difference, assays were conducted where bovine serum albumin (BSA) and ovalbumin were individually added to the laccase-containing solutions at levels matching the weight concentrations of the dirigent protein. In this way, it was established that the differences in product turnover were simply due to stabilization of laccase activity at the higher protein concentrations, although interestingly the dirigent protein, BSA and ovalbumin afforded somewhat different degrees of protection. The findings were quite comparable when a fungal laccase (from Trametes versicolor) was used in place of the plant laccase. When the oxidizing capacity (that is, laccase concentration) was lowered five-fold, only (+)-pinoresinol formation was observed. Thus, complete stereoselectivity is preserved when the oxidative capacity does not exceed a point where the dirigent protein is saturated.
Stereoselective E-coniferyl alcohol coupling. Assays were also conducted with E-[9-²H₂, OC²H₃]coniferyl alcohol and the dirigent protein in the presence of laccase as follows. E-[9-²H₂, OC²H₃]coniferyl alcohol (2 µmol ml⁻¹) was incubated in the presence of dirigent protein (770 pmol ml⁻¹), the purified plant laccase (4.1 pmol ml⁻¹) and buffer (0.1 M MES-HEPES-sodium acetate, pH 5.0) in a total volume of 250 µl. After one hour incubation, the reaction mixture was extracted with EtOAc, but with the addition of an internal standard and radiochemical carriers omitted. After reversed-phase column chromatography, the enzymatically formed pinoresinol was collected, freeze-dried, redissolved in methanol (100 µl) and subjected to chiral column chromatography (Daicel, Chiralcel OD, 50 mm by 4.6 mm) with detection at 280 nm and analysis by mass spectral fragmentation in the EI mode (Waters, Integrity System). Liquid chromatography-mass spectrometry (LC-MS) analysis of the resulting (+)-pinoresinol (… enantiomeric excess) gave a molecular ion with a mass to charge ratio of 368, thus establishing the presence of ²H atoms and verifying that together the laccase and dirigent protein catalyzed stereoselective coupling of E-[9-²H₂, OC²H₃]coniferyl alcohol.
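The molecular ion at a mass to charge ratio of 368 can be checked with simple nominal-mass arithmetic. The sketch below assumes the molecular formula of unlabeled pinoresinol, C20H22O6, and assumes that all five deuterium labels of each monomer (two at C-9 and three in the methoxyl group) are retained in the dimer. The helper name `nominal_mass` is hypothetical.

```python
# Nominal (integer) masses; "D" stands for deuterium (2H).
NOMINAL = {"C": 12, "H": 1, "D": 2, "O": 16}


def nominal_mass(formula):
    """Sum nominal atomic masses for a formula given as {element: count}."""
    return sum(NOMINAL[element] * count for element, count in formula.items())


# Unlabeled pinoresinol, C20H22O6 (assumed molecular formula).
pinoresinol = {"C": 20, "H": 22, "O": 6}

# Assumption: each labeled monomer contributes 2 (C-9) + 3 (OCH3) deuterium
# atoms, and none are lost on coupling, so the dimer carries 10.
labels_per_monomer = 2 + 3
n_deuterium = 2 * labels_per_monomer

labeled_pinoresinol = {
    "C": 20,
    "H": 22 - n_deuterium,
    "D": n_deuterium,
    "O": 6,
}

if __name__ == "__main__":
    print("unlabeled:", nominal_mass(pinoresinol))
    print("labeled:  ", nominal_mass(labeled_pinoresinol))
```

Under these assumptions the labeled dimer is 10 mass units heavier than unlabeled pinoresinol, consistent with the reported molecular ion.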
Other auxiliary one-electron oxidants can also facilitate stereoselective coupling with the dirigent protein. Ammonium peroxydisulfate readily undergoes homolytic cleavage (Usaitis and R. Makuska, Polymer 35:4896 (1994)) and is routinely used as a one-electron oxidant in acrylamide polymerization. Ammonium peroxydisulfate was first incubated with E-[9-³H]coniferyl alcohol (4 µmol ml⁻¹, 29.3 kBq) for 6 hours using the E-coniferyl alcohol coupling assay procedure described above. Nonspecific bimolecular radical coupling was observed, affording predominantly (±)-dehydrodiconiferyl alcohols as well as the other racemic lignans (Table 1). However, when the dirigent protein was added, the stereoselectivity of coupling was dramatically altered to give primarily (+)-pinoresinol at both concentrations of oxidant, together with small amounts of racemic lignans. This established that even an inorganic oxidant, such as ammonium peroxydisulfate, could promote (+)-pinoresinol synthesis in the presence of the dirigent protein, even if it was not oxidatively as selective toward the monolignol as was the fraction III oxidase or laccase.
Table 1.
Effect of dirigent protein on product distribution from E-coniferyl alcohol oxidized by ammonium peroxydisulfate (6 hour assay).
Columns, in order: oxidant; dirigent protein (770 pmol ml⁻¹); E-coniferyl alcohol depleted, in dimer equivalents (nmol ml⁻¹); (±)-guaiacylglycerol 8-O-4'-coniferyl alcohol ethers (nmol ml⁻¹); (±)-dehydrodiconiferyl alcohols (nmol ml⁻¹); (±)-pinoresinols (nmol ml⁻¹); (+)-pinoresinol (nmol ml⁻¹); total dimers (nmol ml⁻¹).
Ammonium peroxydisulfate (1 µmol ml⁻¹); absent; 200 ± 4; 10 ± 1; 35 ± 2; 16 ± 0; 0; 61 ± 3
Ammonium peroxydisulfate (1 µmol ml⁻¹); present; 250 ± 55; 6 ± 0; 13 ± 1; 0; 130 ± 10; 149 ± 11
Ammonium peroxydisulfate (… µmol ml⁻¹); absent; 860 ± 30; 90 ± 4; 250 ± 10; 135 ± 4; 0; 475 ± 17
Ammonium peroxydisulfate (… µmol ml⁻¹); present; 1030 ± 25; 30 ± 1; 90 ± 3; 0; 450 ± 10; 570 ± 14
Dirigent protein alone; present; 61 ± 20; 5 ± 1; 8 ± 1; 0; 55 ± 1; 68 ± 3

Effect of Other Oxygenating Agents on the Stereospecific Conversion of E-Coniferyl Alcohol to (+)-Pinoresinol. The effects of incubating E-coniferyl alcohol (4 µmol ml⁻¹, 29.3 kBq) with flavin mononucleotide (FMN) and flavin adenine dinucleotide (FAD) were investigated since, in addition to their roles as enzyme cofactors, they can also oxidize various organic substrates (Bruice, Acc. Chem. Res. 13:256 (1980)). E-[9-³H]coniferyl alcohol was incubated with FMN and with FAD for 48 hours. To obtain the FMN, snake (Naja naja atra, Formosan cobra) venom was added to a solution of FAD (5 µmol ml⁻¹ in H₂O) and, after … min incubation at 30 °C, the enzymatically formed FMN was separated from the protein mixture by filtration through a Centricon 10 (Amicon) microconcentrator. In every instance, E-coniferyl alcohol oxidation was more rapid in the presence of FMN than FAD. Although these differences between the FMN and FAD catalyzed rates of E-coniferyl alcohol oxidation were not anticipated, a consistent pattern was sustained: racemic lignan products were obtained, with the (±)-dehydrodiconiferyl alcohols predominating as before. When the time courses were repeated in the presence of the dirigent protein, a dramatic change in stereoselectivity was observed, where essentially only (+)-pinoresinol formation occurred. Again, the rates of E-coniferyl alcohol depletion, when adjusted for the traces of residual oxidizing capacity (… over 10 hours) in the dirigent protein preparation, were dependent only upon [FMN] and [FAD], as were the total amounts of dimers formed. When full depletion of E-coniferyl alcohol occurs, the corresponding lignan dimers can begin to undergo oxidative changes as a function of time; specifically, FMN is able subsequently to oxidize pinoresinol, in open solution, after the E-coniferyl alcohol has been fully depleted.
Investigation of Substrate-Specific Stereoselectivity. The coupling stereoselectivity was substrate specific. Neither E-p-[9-³H]coumaryl (4 µmol ml⁻¹, 44.5 kBq) nor E-[8-¹⁴C]sinapyl alcohols (4 µmol ml⁻¹, 8.3 kBq), which differ from E-coniferyl alcohol only by a methoxyl group substituent on the aromatic ring, yielded stereoselective products when incubated for 6 hours with FMN and ammonium peroxydisulfate respectively, in the presence and absence of the dirigent protein. Incubations were carried out as described above with the following modifications: E-p-[9-³H]coumaryl (4 µmol ml⁻¹, 44.5 kBq) or E-[8-¹⁴C]sinapyl alcohols (4 µmol ml⁻¹, 8.3 kBq) were used as substrates and, after 6 hour incubation at 30 °C, the reaction mixture was extracted with EtOAc but without addition of radiochemical carriers. E-Sinapyl alcohol readily underwent coupling to afford syringaresinol, but chiral HPLC analysis revealed that the resulting products were, in every instance, racemic (Table 2). Interestingly, by itself, the 78-kD dirigent protein preparation catalyzed a low level of dimer formation, as previously noted, but only gave rise to racemic (±)-syringaresinol formation, which is presumably a consequence of the residual traces of contaminating oxidizing capacity present in the protein preparation.
In an analogous manner, no stereoselective coupling was observed with E-p-coumaryl alcohol as substrate. That is, only E-coniferyl alcohol undergoes stereoselective coupling in the presence of the dirigent protein. Given the marked substrate specificity of the dirigent protein for E-coniferyl alcohol, it will be of considerable interest to determine, in the future, how it differs from that affording (+)-syringaresinol in Eucommia ulmoides (Deyama, Chem. Pharm. Bull. 31, 2993 (1983)).
Table 2.
Effect of dirigent protein on coupling of E-sinapyl alcohol (6 hour assay).
Columns, in order: oxidant; dirigent protein (770 pmol ml⁻¹); E-sinapyl alcohol depleted, in dimer equivalents (nmol ml⁻¹); racemic (±)-syringaresinols (nmol ml⁻¹).
FMN (… µmol ml⁻¹); absent; 570 ± 100; 290
FMN (… µmol ml⁻¹); present; 610 ± 110; 340
Ammonium peroxydisulfate (… µmol ml⁻¹); absent; 1400 ± 120; 1020
Ammonium peroxydisulfate (… µmol ml⁻¹); present; 1520 ± 10; 1060
Dirigent protein alone; present; 110 ± 10; 50

Although the inventors do not intend to be bound by any particular mechanism for stereoselective coupling, three distinct possibilities can be envisaged. The most likely is that the oxidase or oxidant generates free-radical species from E-coniferyl alcohol, and that the latter are the true substrates that bind to the dirigent protein prior to coupling. The other two possibilities would require that E-coniferyl alcohol molecules are bound and oriented on the dirigent protein, thereby ensuring that only (+)-pinoresinol formation occurs upon subsequent oxidative coupling: this could occur either if both substrate phenolic hydroxyl groups were exposed so that they could readily be oxidized by an oxidase or oxidant, or if an electron transfer mechanism were operative between the oxidase or oxidant and an electron acceptor site or sites on the dirigent protein.
Among the three alternative mechanisms, three lines of evidence suggest "capture" of phenoxy radical intermediates by the dirigent protein. First, the rates of both substrate depletion and product formation are largely unaffected by the presence of the dirigent protein. If capture of the free-radical intermediates is the operative mechanism, the dirigent protein would only affect the specificity of coupling when single-electron oxidation of coniferyl alcohol is rate-determining. Second, an electron transfer mechanism is currently ruled out, since we observed no new ultraviolet-visible chromophores in either the presence or absence of an auxiliary oxidase or oxidant, under oxidizing conditions. Third, preliminary kinetic data (as disclosed in Example 4) support the concept of free-radical capture based on the formal values of Michaelis constant (Km) and maximum velocity (Vmax) characterizing the conversion of E-coniferyl alcohol into (+)-pinoresinol, with the dirigent protein alone and in the presence of the various oxidases or oxidants.
EXAMPLE 4 Kinetic Characterization of the Conversion of E-Coniferyl Alcohol to (+)-pinoresinol in the Presence of Dirigent Protein and an Oxygenating Agent.
Assays were carried out as described in Example 3 by incubating a series of E-[9-³H]coniferyl alcohol concentrations (between 8.00 and 0.13 µmol ml⁻¹, 7.3 MBq mole liter) with dirigent protein (770 pmol ml⁻¹) alone and in the presence of Forsythia laccase (2.1 pmol ml⁻¹), fraction III (12 µg protein ml⁻¹), or FMN (0.5 µmol ml⁻¹). Assays with dirigent protein, in the presence or absence of FMN, were incubated at 30 °C for 1 hour, whereas assays with Forsythia laccase or fraction III in the presence or absence of dirigent protein were incubated at 30 °C for 15 min. If free-radical capture by the dirigent protein is the operative mechanism, the Michaelis-Menten parameters obtained will only represent formal rather than true values, because the highest free-energy intermediate state during the conversion of E-coniferyl alcohol into (+)-pinoresinol is still unknown and the relation between the concentration of substrate and that of the corresponding intermediate free-radical in open solution has not been delineated.
Bearing these qualifications in mind, we estimated formal Km and Vmax values for the dirigent protein preparation. As noted earlier, it was capable of engendering formation of low levels of both (+)-pinoresinol from E-coniferyl alcohol, and racemic (±)-syringaresinols from E-sinapyl alcohol, because of traces of contaminating oxidizing capacity. With this preparation (Table 3), a formal Km of 10 ± 6 mM and Vmax of 0.02 ± 0.02 mol s⁻¹ mol⁻¹ were obtained. However, with addition of fraction III, laccase, and FMN, the formal Km values (mM) were reduced to 1.6 ± 0.3, 0.100 ± 0.003, and 0.10 ± 0.01, respectively, whereas the Vmax values were far less affected at these concentrations of auxiliary oxidase/oxidant.
Formal Km and Vmax values were calculated for the laccase and fraction III oxidase with respect to E-coniferyl alcohol conversion into the three racemic lignans. However, no direct comparisons can be made to the 78-kD protein, since the formal Km values involve only the corresponding oxidases. For completeness, the Km (mM) and Vmax (mol s⁻¹ mol⁻¹ enzyme) were as follows: with respect to the laccase, 0.200 ± 0.001 and 3.9 ± 0.2 for (±)-erythro/threo guaiacylglycerol 8-O-4'-coniferyl alcohol ethers, 0.3000 ± 0.0003 and 13.1 ± 0.6 for (±)-dehydrodiconiferyl alcohols, and 0.300 ± 0.002 and 7.54 ± 0.50 for (±)-pinoresinols; with respect to the fraction III oxidase (estimated to have a native molecular weight of 80 kDa), 2.2 ± 0.3 and 0.20 ± 0.03 for (±)-erythro/threo guaiacylglycerol 8-O-4'-coniferyl alcohol ethers, 2.2 ± 0.2 and 0.7 ± 0.1 for (±)-dehydrodiconiferyl alcohols, and 3.7 ± 0.7 and 0.6 ± 0.1 for (±)-pinoresinols.
These preliminary kinetic parameters are in harmony with the finding that dirigent protein does not substantially affect the rate of E-coniferyl alcohol depletion in the presence of fraction III, laccase and FMN. Both sets of results are together in accord with the working hypothesis that the dirigent protein functions by capturing free-radical intermediates which then undergo stereoselective coupling.
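For readers who wish to see how formal Km and Vmax values of this kind can be estimated from a substrate concentration series, the following is a minimal sketch using the Hanes-Woolf linearization (s/v plotted against s). It is not the fitting procedure used by the inventors, which is not stated. The data are synthetic, generated from assumed parameters, and the function names are hypothetical.

```python
def michaelis_menten(s, km, vmax):
    """Rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def fit_hanes_woolf(s_values, v_values):
    """Estimate (Km, Vmax) by least squares on s/v = s/Vmax + Km/Vmax."""
    xs = list(s_values)
    ys = [s / v for s, v in zip(s_values, v_values)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # equals 1 / Vmax
    intercept = mean_y - slope * mean_x  # equals Km / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax


# Synthetic example only: a two-fold dilution series (mM) and rates
# generated from assumed parameters, not measured values.
ASSUMED_KM = 0.10
ASSUMED_VMAX = 0.06
substrate = [8.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.125]
rates = [michaelis_menten(s, ASSUMED_KM, ASSUMED_VMAX) for s in substrate]

if __name__ == "__main__":
    km, vmax = fit_hanes_woolf(substrate, rates)
    print(f"Km = {km:.3f} mM, Vmax = {vmax:.3f}")
```

With real, noisy measurements a weighted or nonlinear fit would normally be preferred, since linearized plots distort the error structure.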
Table 3.
Effect of various oxidants on formal Km and Vmax values for the dirigent protein (770 pmol ml⁻¹) during (+)-pinoresinol formation from E-coniferyl alcohol.
Columns, in order: oxidase/oxidant; formal Km (mM); Vmax (mol s⁻¹ mol⁻¹ dirigent protein).
Dirigent protein alone; 10 ± 6; 0.02 ± 0.02
Fraction III (12 µg protein ml⁻¹); 1.6 ± 0.3; 0.10 ± 0.03
Laccase (2.07 pmol ml⁻¹); 0.100 ± 0.003; 0.0600 ± 0.0002
FMN (0.5 µmol ml⁻¹); 0.10 ± 0.01; 0.024 ± 0.001

EXAMPLE 5 Cloning of the Dirigent Protein cDNA From Forsythia intermedia
Plant Materials. Forsythia intermedia plants were either obtained from Bailey's Nursery (var. Lynwood Gold, St. Paul, MN), and maintained in Washington State University greenhouse facilities, or were gifts from the local community.
Materials. All solvents and chemicals used were reagent or HPLC grade. Taq thermostable DNA polymerase was obtained from Promega, whereas restriction enzymes were from Gibco BRL (HaeIII), Boehringer Mannheim (Sau3A) and Promega (TaqI). pT7Blue T-vector and competent NovaBlue cells were purchased from Novagen and radiolabeled nucleotide ([α-³²P]dCTP) was from DuPont NEN. Oligonucleotide primers for polymerase chain reaction (PCR) and sequencing were synthesized by Gibco BRL Life Technologies. GENECLEAN II® kits (BIO 101 Inc.) were used for purification of PCR fragments, with the gel-purified DNA concentrations determined by comparison to a low DNA mass ladder (Gibco BRL) in 1.5% agarose gels.
Instrumentation. UV (including RNA and DNA determinations at OD260) spectra were recorded on a Lambda 6 UV/VIS spectrophotometer. A Temptronic II thermocycler (Thermolyne) was used for all PCR amplifications. Purification of DNA for sequencing employed a QIAwell Plus plasmid purification system (QIAGEN) followed by PEG precipitation (Sambrook, J., Fritsch, E. and Maniatis, T. (1994) Molecular Cloning: A Laboratory Manual, 3 volumes, 3rd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, NY), with DNA sequences determined using an Applied Biosystems Model 373A automated sequencer. Amino acid sequences were obtained using an Applied Biosystems protein sequencer with on-line HPLC detection, according to the manufacturer's instructions.
Dirigent Protein Amino Acid Sequencing. The dirigent protein N-terminal amino acid sequence (SEQ ID No:1) was obtained from the purified protein using an Applied Biosystems protein sequencer with on-line HPLC detection. For trypsin digestion, the purified enzyme (150 pmol) was suspended in 0.1 M Tris-HCl (50 µl, pH 8.5, Boehringer Mannheim, sequencing grade), with urea added to give a final concentration of 8 M in 77.5 µl. The mixture was incubated for 15 min at 50 °C, following which 100 mM iodoacetamide (2.5 µl) was added, with the whole kept at room temperature for 15 min. Trypsin (1 µg in 20 µl) was then added, with the mixture digested for 24 h at 37 °C, following which TFA (4 µl) was added to stop the enzymatic reaction. The resulting mixture was subjected to reversed phase HPLC analysis (C-8 column, Applied Biosystems), this being eluted with a linear gradient over 2 h from 0 to 100% acetonitrile (in 0.1% TFA) at a flow rate of 0.2 ml/min with detection at 280 nm. Fractions containing individual oligopeptide peaks were collected manually and directly submitted to amino acid sequencing (SEQ ID Nos:2-7).
Forsythia intermedia stem cDNA Library Synthesis. Total RNA (~300 µg/g fresh weight) was obtained (Dong and Dunstan, D.I. (1996) Plant Cell Reports 15:516-521) from young green stems of greenhouse-grown Forsythia intermedia plants (var. Lynwood Gold). A Forsythia intermedia stem cDNA library was constructed using 5 µg of purified poly(A)+ mRNA (Oligotex-dT™ Suspension, QIAGEN) with the ZAP-cDNA® synthesis kit, the Uni-ZAP™ XR vector and the Gigapack® II Gold packaging extract (Stratagene), with a titer of 1.2 × 10⁶ PFU for the primary library. A portion (30 ml) of the amplified library (1.2 × 10¹⁰ PFU/ml; 158 ml total) (Sambrook, J. et al., supra) was used to obtain pure cDNA library DNA (Ausubel, Brent, Kingston, Moore, Seidman, Smith, J.A., and Struhl, K. (1991) Current Protocols in Molecular Biology, 2 volumes, Greene Publishing Associates and Wiley-Interscience, John Wiley & Sons, NY) for PCR.
Dirigent Protein DNA Probe Synthesis. The N-terminal and internal peptide amino acid sequences were used to construct the degenerate oligonucleotide primers. Purified F. intermedia cDNA library DNA (5 ng) was used as the template in 100 µl PCR reactions (10 mM Tris-HCl [pH …], 50 mM KCl, 0.1% Triton X-100, 2.5 mM MgCl₂, 0.2 mM each dNTP and 2.5 units Taq DNA polymerase) with primer PSINT1 (SEQ ID No:8) (100 pmol) and either primer PSI7R (SEQ ID No:11) (20 pmol), primer PSI2R (SEQ ID No:10) (20 pmol) or primer PSI1R (SEQ ID No:9) (20 pmol). PCR amplification was carried out in a thermocycler as follows: … cycles of 1 min at 94 °C, 2 min at 50 °C and 3 min at 72 °C; with 5 min at 72 °C and an indefinite hold at 4 °C after the final cycle. Single-primer, template-only and primer-only reactions were performed as controls. PCR products were resolved in agarose gels, where a single band (…, ~155 or ~125 bp, respectively) was observed for each reaction.
To determine the nucleotide sequence of the amplified bands, five 100 µl PCR reactions were performed as above with PSINT1 (SEQ ID No:8) + PSI7R (SEQ ID No:11), PSINT1 (SEQ ID No:8) + PSI2R (SEQ ID No:10) and PSINT1 (SEQ ID No:8) + PSI1R (SEQ ID No:9) primer pairs. The 5 reactions from each primer pair were concentrated (Microcon 30, Amicon Inc.) and washed with TE buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA; 2 × 200 µl), with the PCR products subsequently recovered in TE buffer (2 × 50 µl). These were resolved in preparative agarose gels. Each gel-purified PCR product (… pmol) was then ligated into the pT7Blue T-vector and transformed into competent NovaBlue cells, according to Novagen's instructions. Insert sizes were determined using the rapid boiling lysis and PCR technique (with R20mer and U19mer primers) according to the manufacturer's instructions. Restriction analyses were performed to determine whether all inserts from the reactions utilizing each of the foregoing primer pairs were the same, as follows: to 20 µl each of a 100 µl PCR reaction (insert of interest amplified with R20mer (SEQ ID No:74) and U19mer (SEQ ID No:75) primers) were added 4 units HaeIII, 1.5 units Sau3A or 5 units TaqI restriction enzyme. Restriction digestions were allowed to proceed for 60 min at 37 °C for HaeIII and Sau3A and at … °C for TaqI reactions. Restriction products were resolved in 1.5% agarose gels, giving one restriction group for each insert tested. Five recombinant plasmids from PSINT1 (SEQ ID No:8) + PSI7R (SEQ ID No:11) (called pT7PSI1-pT7PSI5) and 2 recombinant plasmids from PSINT1 (SEQ ID No:8) + PSI2R (SEQ ID No:10) (called pT7PSI6 and pT7PSI7) PCR products were selected for DNA sequencing; all contained the same open reading frame (ORF) (SEQ ID No:69).
The dirigent protein probe was next constructed as follows: five 100 µl PCR reactions were performed as above with 10 ng pT7PSI1 DNA (SEQ ID No:69) with primers PSINT1 (SEQ ID No:8) and PSI7R (SEQ ID No:11). Gel-purified pT7PSI1 insert (50 ng) was used with Pharmacia's T7 QuickPrime® kit and [α-³²P]dCTP, according to kit instructions, to produce a radiolabeled probe (in 0.1 ml), which was purified over BioSpin 6 columns (Bio-Rad) and added to carrier DNA (0.5 mg/ml sheared salmon sperm DNA [Sigma], 0.9 ml).
Library Screening. 600,000 PFU of F. intermedia amplified cDNA library were plated for primary screening, according to Stratagene's instructions. Plaques were blotted onto Magna Nylon membrane circles (Micron Separations Inc.), which were then allowed to air dry. The membranes were placed between two layers of Whatman® 3MM Chr paper. cDNA library phage DNA was fixed to the membranes and denatured in one step by autoclaving for 2 min at 100 °C with fast exhaust. The membranes were washed for 30 min at 37 °C in 6X standard saline citrate (SSC) and 0.1% SDS and prehybridized for 5 h with gentle shaking at 57-58 °C in preheated 6X SSC, 0.5% SDS and 5X Denhardt's reagent (hybridization solution, 300 ml) in a crystallization dish (190 x 75 mm). The [³²P]radiolabeled probe was denatured (boiling, 10 min), quickly cooled (ice, 15 min) and added to a preheated fresh hybridization solution (60 ml, 58 °C) in a crystallization dish (150 x 75 mm). The prehybridized membranes were next added to this dish, which was then covered with plastic wrap. Hybridization was performed for 18 h at 57-58 °C with gentle shaking.
The membranes were washed in 4X SSC and 0.5% SDS for 5 min at room temperature, transferred to 2X SSC and 0.5% SDS (at room temperature) and incubated at 57-58 °C for 20 min with gentle shaking, wrapped with plastic wrap to prevent drying and finally exposed to Kodak X-OMAT AR film for 24 h at -80 °C with intensifying screens. Twenty positive plaques were purified through two more rounds of screening with hybridization conditions as above.
In vivo Excision and Sequencing of Dirigent Protein cDNA-containing Phagemids. Purified cDNA clones were rescued from the phage following Stratagene's in vivo excision protocol. Both strands of several different cDNAs that coded for dirigent protein were completely sequenced using overlapping sequencing primers. Two distinct cDNAs were identified, called pPSD_Fi1 (SEQ ID No:12) and pPSD_Fi2 (SEQ ID No:14).
Sequence Analysis DNA and amino acid sequence analyses were performed using the Unix-based GCG Wisconsin Package (Program Manual for the Wisconsin Package, Version 8, September 1994, Genetics Computer Group, 575 Science Drive, Madison, Wisconsin, USA 53711; Rice, P. (1996) Program Manual for the EGCG Package, Peter Rice, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1Rq, England) and the ExPASy World Wide Web molecular biology server (Geneva University Hospital and University of Geneva, Geneva, Switzerland).
EXAMPLE 6 Expression of Functional Dirigent Protein in Spodoptera frugiperda
Attempts to express functional dirigent protein in Escherichia coli failed. Consequently, we expressed the dirigent protein in Spodoptera frugiperda utilizing a baculovirus expression system. The full-length 1.2 kb cDNA clone for the dirigent protein (PSD) in F. intermedia, containing both the 5' and 3' untranslated regions, was excised from the pBlueScript (Stratagene) derived plasmid pPSD_Fi1 (SEQ ID No:12) using the restriction endonucleases BamH I and Xho I. This 1.2 kb fragment was directionally subcloned into these same restriction sites in the multiple cloning site of the baculovirus transfer vector pBlueBac4 (Invitrogen, San Diego, CA). This produced the 6.0 kb construct pBB4/PSD, which generates a nonfusion dirigent protein with translation being initiated at the dirigent protein cDNA start codon. This construct was then co-transfected with linearized Bac-N-Blue DNA (Invitrogen) into Spodoptera frugiperda Sf9 cells by the technique of cationic liposome mediated transfection to produce, by means of homologous recombination, the recombinant Autographa californica nuclear polyhedrosis viral (AcMNPV) DNA Bac-N-Blue dirigent protein (BB/PSD), which was purified from plaques according to procedures described by Invitrogen. The final recombinant AcMNPV-BB/PSD contains the PSD gene under the polyhedrin promoter control and the essential sequence needed for replication of the recombinant virus. To verify that the dirigent protein was successfully expressed in the insect cell culture, log phase Sf9 cells infected with the AcMNPV-PSD recombinant viral high titer stock were used to obtain heterologous protein production. Maximal dirigent protein yield occurred by 48-70 hours post-infection. As determined by SDS-PAGE and (+)-pinoresinol forming activity, the protein was found secreted into the medium and showed a molecular mass and activity which corresponded to the indigenous protein originally isolated from Forsythia intermedia.
EXAMPLE 7 Isolation of Dirigent Protein Clones from Thuja plicata and Tsuga heterophylla
The coding region of a Forsythia dirigent protein cDNA, psd-Fi1 (SEQ ID No:12), was used to screen cDNA libraries from Thuja plicata and Tsuga heterophylla. The conditions and methods were as disclosed in Example 5, except that hybridization was carried out at 45-50 °C. Two dirigent protein cDNAs were isolated from Tsuga heterophylla (SEQ ID Nos:16, 18), and eight dirigent protein cDNAs were isolated from Thuja plicata (SEQ ID Nos:20, 22, 24, 26, 28, 30, 32, 34).
EXAMPLE 8 Purification of Pinoresinol/lariciresinol Reductases from Forsythia intermedia
Plant Materials. Forsythia intermedia plants were either obtained from Bailey's Nursery (var. Lynwood Gold, St. Paul, MN), and maintained in Washington State University greenhouse facilities, or were gifts from the local community.
Materials. All solvents and chemicals used were reagent or HPLC grade. Unlabeled (±)-pinoresinols and (±)-lariciresinols were synthesized as described (Katayama, T. et al., Phytochemistry 32:581-591 (1993)). [4R-³H]NADPH was obtained as previously reported (Chu, A. et al., J. Biol. Chem. 268:27026-27033 (1993)) by modification of the procedure of Moran et al. (Moran, R.G. et al., Anal. Biochem. 138:196-204 (1984)), and [4R-²H]NADPH was prepared according to Anderson and Lin (Anderson and Lin, Phytochemistry 32:811-812 (1993)). Yeast glucose-6-phosphate dehydrogenase (Type IX, 22.32 mmol h⁻¹ mg⁻¹) and yeast hexokinase (Type F300, 15.12 mmol h⁻¹ mg⁻¹) were purchased from Sigma and dihydrofolate reductase (Lactobacillus casei, 33.48 mmol h⁻¹ mg⁻¹) was obtained from Biopure Co. Affi-Gel Blue Gel (100-200 mesh) and Bio-Gel HT Hydroxyapatite were purchased from Bio-Rad, whereas Phenyl Sepharose CL-4B, MonoQ HR 5/5, MonoP HR 5/20, Superose 6, Superose 12 and Superdex 75 columns, molecular weight standards and Polybuffer 74 were obtained from Pharmacia LKB Biotechnology, Inc. Adenosine 2',5'-diphosphate Sepharose and Reactive Yellow 3 Agarose were from Sigma Chemical Co.
Instrumentation. ¹H nuclear magnetic resonance spectra (300 and 500 MHz) were recorded on Bruker AMX300 and Varian VXR500S spectrometers, respectively, using CDCl₃ as solvent with chemical shifts (δ ppm) reported downfield from tetramethylsilane (internal standard). UV (including RNA and DNA determinations at OD260) and mass spectra were obtained on Lambda 6 UV/VIS and VG 7070E (ionizing voltage 70 eV) spectrometers, respectively. High performance liquid chromatography was carried out using either reversed-phase (Waters, Nova-Pak C18, 150 x 3.9 mm inner diameter) or chiral (Daicel, Chiralcel OD or Chiralcel OC, 240 x 4.6 mm inner diameter) columns, with detection at 280 nm (Chu, A. et al., J. Biol. Chem. 268:27026-27033 (1993)). Radioactive samples were analyzed in Ecolume (ICN) and measured using a liquid scintillation counter (Packard, Tri-Carb 2000 CA). Amino acid sequences were obtained using an Applied Biosystems protein sequencer with on-line HPLC detection, according to the manufacturer's instructions.
Enzyme Assays. Pinoresinol and lariciresinol reductase activities were assayed by monitoring the formation of [³H]lariciresinol and [³H]secoisolariciresinol (Chu, A. et al., J. Biol. Chem. 268:27026-27033 (1993)). Briefly, each assay for pinoresinol reductase activity consisted of (±)-pinoresinols (5 mM in MeOH, 20 µl), the enzyme preparation at the corresponding stage of purity (100 µl), and buffer (20 mM Tris-HCl, pH 8.0, 110 µl). The enzymatic reaction was initiated by addition of [4R-³H]NADPH (10 mM, 6.79 kBq/mmol in 20 µl of double-distilled H₂O). After 30 min incubation with shaking, the assay mixture was extracted with EtOAc (500 µl) containing (±)-lariciresinols (20 µg) and (±)-secoisolariciresinols (20 µg) as radiochemical carriers. After centrifugation (13,800 × g, 5 min), the EtOAc solubles were removed and the extraction procedure was repeated. For each assay, the EtOAc solubles were combined, with an aliquot (100 µl) removed for determination of its radioactivity using liquid scintillation counting. The remainder of the combined EtOAc solubles was evaporated to dryness in vacuo, reconstituted in MeOH/3% acetic acid in H₂O (30:70, 100 µl) and subjected to reversed phase and chiral column HPLC. Controls were performed using either denatured enzyme (boiled for 10 min) or in the absence of (±)-pinoresinols as substrate.
Lariciresinol reductase activity was assayed by monitoring the formation of [³H]secoisolariciresinol. These assays were carried out exactly as described above, except that (±)-lariciresinols (5 mM in MeOH, 20 µl) were used as substrates, with (±)-secoisolariciresinols (20 µg) added as radiochemical carriers.
General Procedures for Enzyme Purification. Protein purification procedures were carried out at 4 °C with chromatographic eluents monitored at 280 nm, unless otherwise stated. Protein concentrations were determined by the method of Bradford (Bradford, Anal. Biochem. 72:248-254 (1976)) using γ-globulin as standard. Polyacrylamide gel electrophoresis used gradient gels (Bio-Rad) under denaturing and reducing conditions, these being performed in Laemmli's buffer system (Laemmli, Nature 227:680-685 (1970)). Proteins were visualized by silver staining (Morrissey, Anal. Biochem. 117:307-310 (1981)).
Preparation of crude extracts. F. intermedia stems (20 kg) were harvested, cut into 3-6 cm sections, and stored at -20 °C until needed. Batches of stems (2 kg) were frozen in liquid nitrogen and pulverized in a Waring Blendor. The resulting powder was homogenized with potassium phosphate buffer (0.1 mM, pH 7.0, 4 L), containing 5 mM dithiothreitol. The homogenate was filtered through four layers of cheesecloth into a beaker containing 10% polyvinylpolypyrrolidone. The filtrate was centrifuged (12,000 × g, 15 min). The resulting supernatant was fractionated with (NH₄)₂SO₄, with proteins precipitating between 40 and …% saturation recovered by centrifugation (10,000 × g, 1 …). The pellet was next reconstituted in a minimum amount of Tris-HCl buffer (20 mM, pH …) containing 5 mM dithiothreitol (buffer A) and desalted using prepacked PD-10 columns (Sephadex G-25 medium) equilibrated with buffer A.
Affinity (Affi Blue Gel) Chromatography. The crude enzyme preparation (191 mg in buffer A, 5 nmol h⁻¹ mg⁻¹) was applied to an Affi Blue Gel column (2.6 x 70 cm) equilibrated in buffer A. After washing the column with 200 ml of buffer A, pinoresinol/lariciresinol reductase was eluted with a linear NaCl gradient (1.5-5 M in 300 ml) in buffer A at a flow rate of 1 ml min⁻¹. Active fractions were stored (-80 °C) until needed.
Hydrophobic Interaction Chromatography (Phenyl Sepharose). After thawing, ten preparations resulting from the Affi Blue chromatography step (150 mg, 51 nmol h⁻¹ mg⁻¹) were combined and applied to a Phenyl Sepharose column (1 x 10 cm) equilibrated in buffer A, containing 5 M NaCl. The column was washed with two bed volumes of the same buffer. Pinoresinol/lariciresinol reductase was eluted using a linear gradient of decreasing concentration of NaCl (5-0 M in 40 ml) in buffer A at a flow rate of 1 ml min⁻¹. Fractions catalyzing pinoresinol/lariciresinol reduction were pooled.
Hydroxyapatite I Chromatography. Active protein (31 mg, 91 nmol h⁻¹ mg⁻¹) from the Phenyl Sepharose purification step was applied to a hydroxyapatite column (1.6 x 70 cm) equilibrated in 10 mM potassium phosphate buffer, pH 7.0, containing … mM dithiothreitol (buffer B). Pinoresinol/lariciresinol reductase was eluted with a linear gradient of potassium phosphate buffer, pH 7.0 (0.01-0.4 M in 200 ml) at a flow rate of 1 ml min⁻¹. Active fractions were combined. The buffer was then exchanged with buffer A using PD-10 prepacked columns.
Affinity (2',5'-ADP Sepharose) Chromatography. The enzyme solution resulting from the hydroxyapatite purification step (6.5 mg, 463 nmol h⁻¹ mg⁻¹) was next loaded on a 2',5'-ADP Sepharose (1 x 10 cm) column, previously equilibrated in buffer A containing 2.5 mM EDTA (buffer A'), and then washed with 25 ml of buffer A'. Pinoresinol/lariciresinol reductase was eluted with a step gradient of NADP+ (0.3 mM in 10 ml) in buffer A' at a flow rate of 0.5 ml min⁻¹. [NAD+ (up to 3 mM) did not elute pinoresinol/lariciresinol reductase activity.] Because the absorbance of NADP+ interfered, it was not possible to monitor the eluent directly at 280 nm. Protein concentrations for each fraction were determined spectrophotometrically according to Bradford (Bradford, Anal. Biochem. 72:248-254 (1976)).
Hydroxyapatite II Chromatography. Fractions from the 2',5'-ADP Sepharose column that exhibited pinoresinol/lariciresinol reductase activity (0.85 mg, 1051 nmol h⁻¹ mg⁻¹) were combined and directly applied to a second hydroxyapatite column (1 x 3 cm), equilibrated in buffer B, with the enzyme eluted with a linear gradient of potassium phosphate buffer, pH 7.0 (0.01-0.4 M in 45 ml) at a flow rate of 1 ml min⁻¹.
Affinity (Affi Yellow) Chromatography. Active fractions (160 µg, 7960 nmol h⁻¹ mg⁻¹) from the second hydroxyapatite column purification step were next applied to a Reactive Yellow 3 Agarose column (1 x 3 cm), equilibrated in buffer A. Pinoresinol/lariciresinol reductase was eluted with a linear NaCl gradient (0-2.5 M in 100 ml) at a flow rate of 1 ml min⁻¹.
Fast Protein Liquid Chromatography (Superose 12 Chromatography). Combined fractions from the Affi Yellow purification step having the highest activity (50 µg, 10,940 nmol h⁻¹ mg⁻¹) were pooled and concentrated to 1 ml, using a Centricon 10 microconcentrator (Amicon, Inc.). The enzyme solution was then applied in portions of 200 µl to a fast protein liquid chromatography column (Superose 12, HR 10/30). Gel filtration was performed in a buffer containing 20 mM Tris-HCl, pH 8.0, 150 mM NaCl and 5 mM dithiothreitol at a flow rate of 0.4 ml min⁻¹. Pinoresinol/lariciresinol reductase was eluted with 12.8 ml of the mobile phase. The active fractions which coincided with the UV profile (absorbance at 280 nm) were pooled (20 µg, 15,300 nmol h⁻¹ mg⁻¹) and desalted (PD-10 prepacked columns).
The foregoing purification protocol resulted in a 3060-fold purification of (+)-pinoresinol/(+)-lariciresinol reductase. As for many of the enzymes involved in phenylpropanoid metabolism, the protein was in very low abundance, i.e., 20 kg of F. intermedia stems yielded only ~20 µg of the purified (+)-pinoresinol/(+)-lariciresinol reductase.
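The fold purification quoted above can be checked from the specific activities reported for each step. The following is a minimal sketch, assuming that the garbled activity units in the text all denote nmol h⁻¹ mg⁻¹; the names `STEPS` and `fold_purification` are illustrative only and are not part of the original disclosure.

```python
# Specific activities quoted for each purification step in Example 8.
# Assumption: all values are in nmol h^-1 mg^-1 (units are garbled in the source).
STEPS = [
    ("Crude extract", 5),
    ("Affi Blue Gel", 51),
    ("Phenyl Sepharose", 91),
    ("Hydroxyapatite I", 463),
    ("2',5'-ADP Sepharose", 1051),
    ("Hydroxyapatite II", 7960),
    ("Affi Yellow", 10940),
    ("Superose 12", 15300),
]


def fold_purification(steps):
    """Return (step name, fold) pairs relative to the first step's specific activity."""
    baseline = steps[0][1]
    return [(name, activity / baseline) for name, activity in steps]


if __name__ == "__main__":
    for name, fold in fold_purification(STEPS):
        print(f"{name:22s} {fold:8.1f}-fold")
```

Under this assumption, the final Superose 12 value divided by the crude-extract value reproduces the 3060-fold figure stated in the text.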
EXAMPLE 9
Characterization of Purified Pinoresinol/Lariciresinol Reductases from Forsythia intermedia
Isoelectric Focussing and pI Determination. In all stages of the purification protocol, the (+)-pinoresinol and (+)-lariciresinol reductase activities coeluted. Given this observation, it was essential to ascertain unambiguously whether more than one form of the protein existed, i.e., whether one form of the protein catalyzed the reduction of pinoresinol and another form catalyzed the reduction of lariciresinol.
To this end, the isoelectric point of pinoresinol/lariciresinol reductase was estimated by chromatofocussing on a MonoP HR 5/20 FPLC column.
Active fractions from the Superose 12 gel filtration column (Example 1) were pooled and the buffer exchanged with 25 mM Bis-Tris, pH 7.1, using prepacked PD-10 columns equilibrated in the same buffer. The preparation so obtained was loaded on the chromatofocussing column and a pH gradient between 7.1 and 3.9 was formed, using Polybuffer 74 as eluent at a flow rate of 0.5 ml min⁻¹. Aliquots (200 µl) of each fraction were assayed for pinoresinol/lariciresinol reductase activities. The remainder of the fractions was used to determine the pH gradient.
Molecular Weight Determination. Application of the MonoP HR 5/20 FPLC column preparation of pinoresinol/lariciresinol reductase to SDS-gradient gel electrophoresis (4-15% polyacrylamide) revealed the presence of two protein bands of similar apparent molecular weight, which were separated by anion-exchange chromatography on a MonoQ HR 5/5 FPLC matrix. Pooled fractions from the Superose 12 purification step (Example 1) were applied to a MonoQ HR 5/5 column (Pharmacia), equilibrated in buffer A. The column was washed with 10 ml of buffer A and pinoresinol/lariciresinol reductase activity eluted using a linear NaCl gradient (0-500 mM in 50 ml) in buffer A at a flow rate of 0.5 ml min⁻¹. Aliquots ( µl) of the collected fractions were analyzed by SDS polyacrylamide gel electrophoresis, using a gradient (4-15% acrylamide) gel. Proteins were visualized by silver staining. Active fractions 34 through 37 (27,760 nmol h⁻¹ mg⁻¹) and 38 through 41 (30,790 nmol h⁻¹ mg⁻¹) were pooled separately and immediately used for characterization.
The two protein bands thus resolved under denaturing conditions had apparent molecular masses of ~36 and ~35 kDa, respectively. Each of the two reductase forms had a pI of ~5.7.
Native molecular weights of each reductase isoform were estimated via comparison of their elution behavior on Superose 12, Superose 6 and Superdex gel filtration FPLC columns with the elution behavior of calibrated molecular weight standards. Gel filtration was carried out as set forth in Example 8. For each reductase, an apparent native molecular weight of 59,000 was calculated based on its elution volume, in contrast to that of ~36,000 and ~35,000 by SDS-polyacrylamide gel electrophoresis. While the reason for the discrepancy between molecular weights from gel filtration and SDS-PAGE remains unknown, it can tentatively be proposed that although the native protein likely exists as a dimer, it could also be a monomer of asymmetric shape, thereby altering its effective Stokes radius (Cantor, and Shimmel, Biophysical Chemistry, Part II, W.H. Freeman and Company, San Francisco, CA (1980); Stellwagen, Methods in Enzymology 182:317-328 (1990)), as reported for human thioredoxin reductase (Oblong, et al., Biochemistry 32:7271-7277 (1993)) and yeast metalloendopeptidase (Hrycyna, and Clarke, Biochemistry 32:11293-11301 (1993)).
pH and Temperature Optima. To determine the pH optimum of pinoresinol/lariciresinol reductase, the enzyme preparation from the Superose 12 gel filtration step (Example 8) was assayed utilizing standard assay conditions (Example 8), except that the buffer was replaced with 50 mM Bis-Tris Propane buffer in the pH range of 6.3 to 9.4. The pH optimum was found to be pH 7.4.
The temperature optimum of pinoresinol/lariciresinol reductase was examined in the range between 4 °C and 80 °C under standard assay conditions (Example 8) utilizing the enzyme preparation from the gel filtration step (Example 8). At optimum pH, the temperature optimum for the reductase activity was established to be ~30 °C.
Kinetic Parameters. Velocity studies were carried out to ascertain whether the two reductase isoforms catalyzed distinct reductions, i.e., the conversion of (+)-pinoresinol to (+)-lariciresinol and of (+)-lariciresinol to (-)-secoisolariciresinol, respectively, or whether either displayed a preference for (+)-pinoresinol or (+)-lariciresinol as substrate. The initial velocity studies were carried out individually utilizing the two isoforms of the enzyme, and individually employing both (+)-pinoresinol and (+)-lariciresinol as substrates. Initial velocity studies were performed in triplicate experiments, using 50 mM Bis-Tris Propane buffer, pH 7.4, containing 5 mM dithiothreitol, pure enzyme (after MonoQ anion-exchange chromatography), and ten different substrate concentrations (between 8.8 and 160 µM) at a constant NADPH concentration (80 µM). Incubations were carried out at 30 °C for min (within the linear kinetic range). Kinetic parameters were determined from Lineweaver-Burk plots.
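By way of illustration only, a Lineweaver-Burk determination can be sketched as an ordinary least-squares fit of 1/v against 1/[S], with Vmax taken from the reciprocal of the intercept and Km from the slope multiplied by Vmax. The function names and the noise-free test data below are hypothetical and are not the measurements of this Example.

```python
def michaelis_menten(s, km, vmax):
    """Michaelis-Menten rate for substrate concentration s."""
    return vmax * s / (km + s)


def lineweaver_burk(substrate, velocity):
    """Fit 1/v = (Km/Vmax) * (1/[S]) + 1/Vmax by least squares; return (Km, Vmax)."""
    x = [1.0 / s for s in substrate]
    y = [1.0 / v for v in velocity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax


if __name__ == "__main__":
    # Hypothetical data: ten concentrations spanning 8.8 to 160 uM, no noise.
    concs = [8.8 + i * (160.0 - 8.8) / 9 for i in range(10)]
    rates = [michaelis_menten(s, 27.0, 16.2) for s in concs]
    print(lineweaver_burk(concs, rates))
```

With real, noisy data the double-reciprocal transform weights low-concentration points heavily, so a direct nonlinear fit may be preferred; the sketch simply mirrors the plotting method named in the text.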
Importantly, the kinetic parameters were essentially the same for both the 35 kDa and the 36 kDa forms of the enzyme (Km for pinoresinol: 27 ± 1.5 µM for the 35 kDa form of the enzyme and 23 ± 1.3 µM for the 36 kDa form of the enzyme; Km for lariciresinol: 121 ± 5.0 µM for the 35 kDa form of the enzyme and 123 ± 6.0 µM for the 36 kDa form of the enzyme). In an analogous manner, apparent maximum velocities (expressed as µmol h⁻¹ mg⁻¹ of protein) were also essentially identical (Vmax for pinoresinol: 16.2 ± 0.4 for the 35 kDa form of the enzyme and 17.3 ± 0.5 for the 36 kDa form of the enzyme; for lariciresinol: 25.2 ± 0.7 for the 35 kDa form of the enzyme and 29.9 ± 0.7 for the 36 kDa form of the enzyme). Thus, all available evidence suggests that (+)-pinoresinol/(+)-lariciresinol reductase exists as two isoforms, with each capable of catalyzing the reduction of both substrates. How this reduction is carried out, and whether both reductions are done in tandem, in either quinone or furano ring form, awaits further study using a more abundant protein source.
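As a rough aid to comparing the two isoforms, the ratio Vmax/Km can be computed from the values quoted above. This ratio is not reported in the original text; it is offered here only as an assumed, commonly used indicator of relative efficiency at low substrate concentration, and the names `KINETICS` and `vmax_over_km` are illustrative.

```python
# (isoform, substrate) -> (Km in uM, apparent Vmax as quoted in the text)
KINETICS = {
    ("35 kDa", "pinoresinol"): (27.0, 16.2),
    ("36 kDa", "pinoresinol"): (23.0, 17.3),
    ("35 kDa", "lariciresinol"): (121.0, 25.2),
    ("36 kDa", "lariciresinol"): (123.0, 29.9),
}


def vmax_over_km(km, vmax):
    """Return Vmax/Km, a relative efficiency measure at low substrate concentration."""
    return vmax / km


if __name__ == "__main__":
    for (isoform, substrate), (km, vmax) in KINETICS.items():
        print(f"{isoform} / {substrate}: Vmax/Km = {vmax_over_km(km, vmax):.3f}")
```

On these figures both isoforms give a higher ratio with pinoresinol than with lariciresinol, and the two isoforms behave alike for each substrate, which is consistent with the conclusion drawn in the text.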
Enzymatic Formation of [2H]Lariciresinol. Since the two (+)-pinoresinol/(+)-lariciresinol reductase isoforms exhibited essentially identical catalytic characteristics, the Superose 12 enzyme preparation (Example 8), containing both isoforms, was used to examine the stereospecificity of the hydride transfer. The strategy adopted utilized selective deuterium labeling using NADP2H as cofactor for the reduction of (+)-pinoresinol, with the enzymatic product, lariciresinol, being analyzed by 1H NMR and mass spectroscopy. Thus, a solution of (±)-pinoresinols (5.2 mM in MeOH, 4 ml) was added to Tris-HCl buffer (20 mM, pH 8.0, containing 5 mM dithiothreitol, 22 ml) and stereospecifically deutero-labeled [4R-2H]NADPH (20 mM in H2O, 4 ml) prepared via the method of Anderson and Lin (Anderson, and Lin Phytochemistry 32:811-812 (1993)), with the whole added to the enzyme preparation (20 ml). After incubation at 30 °C for 1 h with shaking, the assay mixture was extracted with EtOAc (2 x 50 ml). The EtOAc-soluble fractions were combined, washed with saturated NaCl (50 ml), dried (Na2SO4) and evaporated to dryness in vacuo. The resulting extract was reconstituted in a minimum amount of EtOAc, applied to a silica gel column (0.5 x 7 cm), and eluted with EtOAc/hexanes. Fractions containing the enzymatic product were combined and evaporated to dryness.
The enzymatic product was established to be [2H]lariciresinol, as evidenced by the disappearance of the 7'-proR proton at δ 2.51 ppm due to its replacement by deuterium and by its molecular ion at m/z 361, corresponding to the presence of one deuterium atom at C-7'. 1H NMR (300 MHz) (CDCl3): δ 2.39 (C8H), 2.71 (C8'H), 2.88 (J7'S,8' = 5.0 Hz, C7'HS), 3.73 (J8',9'b = 7.0 Hz, J9'a,9'b = 8.5 Hz, C9'Hb), 3.76 (J8,9S = 6.5 Hz, J9R,9S = 8.5 Hz, C9HS), 3.86 (s, 3H, OCH3), 3.88 (s, 3H, OCH3), 3.92 (dd, 1H, J8,9R = 6.0 Hz, J9R,9S = 9.5 Hz, C9HR), 4.04 (J8',9'a = 7.0 Hz, J9'a,9'b = 8.5 Hz, C9'Ha), 4.77 (J7,8 = 6.6 Hz, C7H), 6.68-6.70 (m, 2H, ArH), 6.75-6.85 (m, 4H, ArH); MS m/z 361 (71.2), 360 (31.1), 237, 153, 152, 151, 138 (100), 137 (71.1).
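The molecular ion reported above can be rationalized with a simple nominal-mass calculation. The sketch below assumes the molecular formula C20H24O6 for lariciresinol (a formula not stated in the text) and treats deuterium as a separate element symbol "D"; all names are illustrative.

```python
# Nominal (integer) masses of the isotopes of interest.
NOMINAL_MASS = {"C": 12, "H": 1, "D": 2, "O": 16}

# Assumption: lariciresinol has the molecular formula C20H24O6.
LARICIRESINOL = {"C": 20, "H": 24, "O": 6}


def nominal_mass(formula):
    """Sum nominal isotope masses over an element-count dictionary."""
    return sum(NOMINAL_MASS[element] * count for element, count in formula.items())


def deuterate(formula, n=1):
    """Return a copy of the formula with n hydrogens replaced by deuterium."""
    labeled = dict(formula)
    labeled["H"] -= n
    labeled["D"] = labeled.get("D", 0) + n
    return labeled


if __name__ == "__main__":
    print(nominal_mass(LARICIRESINOL))             # unlabeled species
    print(nominal_mass(deuterate(LARICIRESINOL)))  # monodeuterated species
```

Under this assumption the unlabeled and monodeuterated species fall at m/z 360 and 361, matching the two highest-mass ions listed in the spectrum.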
Thus, hydride transfer in the conversion of (+)-pinoresinol to (+)-lariciresinol had occurred in a manner whereby only the 7'-proR hydrogen position of (+)-lariciresinol was deuterated. An analogous result was observed for the conversion of (+)-lariciresinol into (-)-secoisolariciresinol, thereby establishing that the overall hydride transfer was completely stereospecific.
EXAMPLE 10
Amino Acid Sequence Analysis of Purified Pinoresinol/Lariciresinol Reductase from Forsythia intermedia
Pinoresinol/Lariciresinol Reductase Amino Acid Sequencing. The (+)-pinoresinol/(+)-lariciresinol reductase N-terminal amino acid sequence was obtained from each of the purified proteins, and from a mixture of both, using an Applied Biosystems protein sequencer with on-line HPLC detection. The N-terminal sequence was the same for both isoforms (SEQ ID No:36).
For trypsin digestion, 150 pmol of the enzyme purified from the Superose 12 column (Example 8) was suspended in 0.1 M Tris-HCl (50 µl, pH with urea added to give a final concentration of 8 M in 77.5 µl. The mixture was incubated for min at 50 °C, then 100 mM iodoacetamide (2.5 µl) was added, with the whole kept at room temperature for 15 min. Trypsin (1 µg in 20 µl) was then added, with the mixture digested for 24 h at 37 °C, after which TFA (4 µl) was added to stop the enzymatic reaction.
The resulting mixture was subjected to reversed phase HPLC analysis (C-8 column, Applied Biosystems), this being eluted with a linear gradient over 2 h from 0 to 100% acetonitrile (in 0.1% TFA) at a flow rate of 0.2 ml/min with detection at 280 nm. Fractions containing individual oligopeptide peaks were collected manually and directly submitted to amino acid sequencing. Four tryptic fragments were resolved in sufficient quantity to permit amino acid sequence determination (SEQ ID Nos:37-40).
Cyanogen bromide digestion was performed by incubation of 150 pmol of the reductase purified from the Superose 12 column (Example 8) with 0.5 M cyanogen bromide in 70% formic acid for 40 h at 37 °C, following which the cyanogen bromide and formic acid were removed by centrifugation under reduced pressure (SpeedVac).
The resulting oligopeptide fragments were separated by HPLC and three were resolved in sufficient quantity to permit sequencing (SEQ ID Nos:41-43).
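The two digestions described above can be modeled in silico. The following is a minimal sketch under commonly assumed cleavage rules (trypsin cleaving C-terminal to Lys or Arg unless the next residue is Pro; cyanogen bromide cleaving C-terminal to Met); the function names and the example peptide are hypothetical and do not reproduce any sequence of the disclosure.

```python
import re


def trypsin_digest(sequence):
    """Cleave after K or R, except when the following residue is P."""
    return [frag for frag in re.split(r"(?<=[KR])(?!P)", sequence) if frag]


def cnbr_digest(sequence):
    """Cleave after every methionine residue."""
    return [frag for frag in re.split(r"(?<=M)", sequence) if frag]


if __name__ == "__main__":
    peptide = "AKRPGMKLLMSR"  # hypothetical sequence, for illustration only
    print(trypsin_digest(peptide))
    print(cnbr_digest(peptide))
```

Real digests are rarely complete, so fewer fragments may be recovered in practice than such a model predicts, as was the case for the fragments resolved here.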
EXAMPLE 11
Cloning of Pinoresinol/Lariciresinol Reductase from Forsythia intermedia
Plant Materials. Forsythia intermedia plants were either obtained from Bailey's Nursery (var. Lynwood Gold, St. Paul, MN) and maintained in Washington State University greenhouse facilities, or were gifts from the local community.
Materials. All solvents and chemicals used were reagent or HPLC grade. UV RNA and DNA determinations at OD260 were obtained on a Lambda 6 UV/VIS spectrophotometer. A Temptronic II thermocycler (Thermolyne) was used for all PCR amplifications. Taq thermostable DNA polymerase was obtained from Promega, whereas restriction enzymes were from Gibco BRL (HaeIII), Boehringer Mannheim (Sau3A) and Promega (TaqI). pT7Blue T-vector and competent NovaBlue cells were purchased from Novagen, and radiolabeled nucleotides ([α-32P]dCTP and [γ-32P]ATP) were from DuPont NEN.
Oligonucleotide primers for polymerase chain reaction (PCR) and sequencing were synthesized by Gibco BRL Life Technologies. GENECLEAN II® kits (BIO 101 Inc.) were used for purification of PCR fragments, with the gel-purified DNA concentrations determined by comparison to a low DNA mass ladder (Gibco BRL) in 1.5% agarose gels.
Forsythia RNA Isolation. Initial attempts to isolate functional F. intermedia RNA from fast-growing, green stem tissue were unsuccessful, owing to facile oxidation by the plant's phenolic constituents. This problem was, however, overcome by use of an RNA isolation procedure specifically designed for woody plant tissue, which uses low pH and reducing conditions in the extraction buffer to prevent oxidation (Dong, and Dunstan, Plant Cell Reports 15:516-521 (1996)).
Forsythia intermedia stem cDNA Library Synthesis. Total RNA (~300 µg/g fresh weight) was obtained from young green stems of greenhouse-grown Forsythia intermedia plants (var. Lynwood Gold) (Dong, and Dunstan, Plant Cell Reports 15:516-521 (1996)). A Forsythia intermedia stem cDNA library was constructed using 5 µg of purified poly A+ mRNA (Oligotex-dT™ Suspension, QIAGEN) with the ZAP-cDNA® synthesis kit, the Uni-ZAP™ XR vector and the Gigapack® II Gold packaging extract (Stratagene), with a titer of 1.2 x 10^6 PFU for the primary library. A portion (30 ml) of the amplified library (1.2 x 10^10 PFU/ml; 158 ml total) was used to obtain pure cDNA library DNA for PCR (Sambrook, J. et al., Molecular Cloning: A Laboratory Manual, 3 volumes, 3rd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1994); Ausubel, F.M. et al., Current Protocols in Molecular Biology, 2 volumes, Greene Publishing Associates and Wiley-Interscience, John Wiley & Sons, NY (1991)).
Pinoresinol/Lariciresinol Reductase DNA Probe Synthesis. The N-terminal and internal peptide amino acid sequences were used to construct the degenerate oligonucleotide primers. Specifically, the primer PLRN5 (SEQ ID No:44) was based on the sequence of amino acids 7 to 13 of the N-terminal peptide (SEQ ID No:36). The primer PLR14R (SEQ ID No:45) was based on the sequence of amino acids 2 to 8 of the internal peptide sequence set forth in SEQ ID No:37. The primer PLR15R (SEQ ID No:46) was based on the sequence of amino acids 9 to 15 of the internal peptide sequence set forth in SEQ ID No:37. The sequence of amino acids 9 to 15 of SEQ ID No:37, upon which primer PLR15R (SEQ ID No:46) was based, also corresponded to the sequence of amino acids 4 to 10 of the cyanogen bromide-generated internal fragment set forth in SEQ ID No:41.
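Each primer above is based on a window of seven amino acids. One consideration when choosing such a window is the degeneracy of the corresponding oligonucleotide pool, i.e., the product of the number of codons for each residue in the standard genetic code. The sketch below illustrates that calculation; the helper names and the example peptide are hypothetical, and the selection criterion shown (lowest degeneracy) is an assumption rather than the stated rationale of the disclosure.

```python
# Number of codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide):
    """Number of distinct coding sequences for the peptide."""
    total = 1
    for residue in peptide:
        total *= CODON_COUNT[residue]
    return total


def least_degenerate_window(peptide, width=7):
    """Return (1-based start, window, degeneracy) of the least degenerate window."""
    windows = [
        (start + 1, peptide[start:start + width], degeneracy(peptide[start:start + width]))
        for start in range(len(peptide) - width + 1)
    ]
    return min(windows, key=lambda item: item[2])


if __name__ == "__main__":
    print(least_degenerate_window("GKSLLRMWEKFDNYA"))  # hypothetical peptide
```

In practice the third base of the final codon is often omitted and inosine or codon-usage bias may be used to reduce degeneracy further; none of that is modeled here.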
Purified F. intermedia cDNA library DNA (5 ng) was used as the template in 100 µl PCR reactions (10 mM Tris-HCl [pH 50 mM KCl, 0.1% Triton X-100, mM MgCl2, 0.2 mM each dNTP and 2.5 units Taq DNA polymerase) with primer PLRN5 (SEQ ID No:44) (100 pmol) and either primer PLR15R (SEQ ID No:46) (20 pmol) or primer PLR14R (SEQ ID No:45) (20 pmol). PCR amplification was carried out in a thermocycler as follows: 35 cycles of 1 min at 94 °C, 2 min at and 3 min at 72 °C; with 5 min at 72 °C and an indefinite hold at 4 °C after the final cycle. Single-primer, template-only and primer-only reactions were performed as controls. PCR products were resolved in 1.5% agarose gels. The combination of primers PLRN5 (SEQ ID No:44) and PLR14R (SEQ ID No:45) yielded a single band of 380 bp corresponding to bases 22 to 393 of SEQ ID No:47. The combination of primers PLRN5 (SEQ ID No:44) and PLR15R (SEQ ID No:46) yielded a single band of 400 bp corresponding to bases 22 to 423 of SEQ ID No:47.
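The band sizes estimated from the gels can be compared with the lengths implied by the base coordinates given above. This is a small worked check, assuming the coordinates are inclusive; the function name is illustrative.

```python
def amplicon_length(first_base, last_base):
    """Length of a product spanning first_base..last_base, both inclusive."""
    return last_base - first_base + 1


if __name__ == "__main__":
    # PLRN5 + PLR14R: bases 22 to 393 of SEQ ID No:47 (gel estimate 380 bp)
    print(amplicon_length(22, 393))
    # PLRN5 + PLR15R: bases 22 to 423 of SEQ ID No:47 (gel estimate 400 bp)
    print(amplicon_length(22, 423))
```

The computed lengths (372 and 402 bases) agree with the gel estimates to within the resolution expected of a 1.5% agarose gel.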
To determine the nucleotide sequence of the two amplified bands, five 100 µl PCR reactions were performed as above with each of the following combinations of template and primers: 380 bp amplified product plus primers PLRN5 (SEQ ID No:44) and PLR14R (SEQ ID No:45); 400 bp amplified product plus primers PLRN5 (SEQ ID No:44) and PLR15R (SEQ ID No:46). The five reactions from each combination of primers and template were concentrated (Microcon Amicon Inc.) and washed with TE buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA; 2 x 200 µl), with the PCR products subsequently recovered in TE buffer (2 x 50 µl).
These were resolved in preparative 1.5% agarose gels. Each gel-purified PCR product pmol) was then ligated into the pT7Blue T-vector and transformed into competent NovaBlue cells, according to Novagen's instructions. Insert sizes were determined using the rapid boiling lysis and PCR technique (utilizing (SEQ ID No:74) and U19mer (SEQ ID No:75) primers) according to the manufacturer's (Novagen's) instructions.
Restriction analysis was performed to determine whether all inserts for each combination of primers and template were the same. Restriction analysis was carried out as follows: each of the inserts was amplified by PCR utilizing the (SEQ ID No:74) and U19 (SEQ ID No:75) primers. To 20 µl each of a 100 µl PCR reaction were added 4 units HaeIII, 1.5 units Sau3A or 5 units TaqI restriction enzyme. Restriction digestions were allowed to proceed for 60 min at 37 °C for HaeIII and Sau3A and at 65 °C for TaqI reactions. Restriction products were resolved in 1.5% agarose gels, giving one restriction group for all inserts tested.
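The logic of grouping inserts by restriction pattern can be illustrated with a simple in silico digest. The sketch below uses the well-known recognition sites of the three enzymes (HaeIII GG^CC, Sau3A ^GATC, TaqI T^CGA), scans only the top strand of a linear fragment, and compares fragment-length patterns; the function names and example sequences are hypothetical.

```python
# Enzyme -> (recognition site, cut offset within the site on the top strand)
SITES = {
    "HaeIII": ("GGCC", 2),
    "Sau3A": ("GATC", 0),
    "TaqI": ("TCGA", 1),
}


def fragment_lengths(sequence, enzyme):
    """Return fragment lengths, in order, for a complete digest of a linear sequence."""
    site, offset = SITES[enzyme]
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def same_restriction_group(seq_a, seq_b):
    """True if both sequences give identical sorted fragment sizes for all enzymes."""
    return all(
        sorted(fragment_lengths(seq_a, enzyme)) == sorted(fragment_lengths(seq_b, enzyme))
        for enzyme in SITES
    )


if __name__ == "__main__":
    insert = "AAAAGGCCTTTTGATCAATCGAAA"  # hypothetical insert
    for enzyme in SITES:
        print(enzyme, fragment_lengths(insert, enzyme))
```

Identical patterns do not prove identical sequences; they only fail to distinguish them, which is why the selected plasmids were subsequently sequenced.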
Five of the resulting recombinant plasmids were selected for DNA sequencing. The inserts from three of the recombinant plasmids (called pT7PLR1-pT7PLR3) were generated by a combination of primers PLRN5 (SEQ ID No:44) and PLR15R (SEQ ID No:46) with the 400 bp PCR product as substrate. The inserts from the remaining two recombinant plasmids (called pT7PLR4 and pT7PLR5) were generated from a combination of primers PLRN5 (SEQ ID No:44) and PLR14R (SEQ ID No:45) and the 380 bp PCR product as substrate. All five sequenced PCR products contained the same open reading frame.
The (+)-pinoresinol/(+)-lariciresinol reductase probe was constructed as follows: five 100 µl PCR reactions were performed as described above with 10 ng pT7PLR3 DNA with primers PLRN5 (SEQ ID No:44) and PLR15R (SEQ ID No:46).
Gel-purified pT7PLR3 cDNA insert (50 ng) was used with Pharmacia's T7QuickPrime® kit and [α-32P]dCTP, according to kit instructions, to produce a radiolabeled probe (in 0.1 ml), which was purified over BioSpin 6 columns (Bio-Rad) and added to carrier DNA (0.9 ml of 0.5 mg/ml sheared salmon sperm DNA obtained from Sigma).
Library Screening. 600,000 PFU of F. intermedia amplified cDNA library were plated for primary screening, according to Stratagene's instructions. Plaques were blotted onto Magna Nylon membrane circles (Micron Separations Inc.), which were then allowed to air dry. The membranes were placed between two layers of Whatman® 3MM Chr paper. cDNA library phage DNA was fixed to the membranes and denatured in one step by autoclaving for 2 min at 100 °C with fast exhaust. The membranes were washed for 30 min at 37 °C in 6X standard saline citrate (SSC) and 0.1% SDS and prehybridized for 5 h with gentle shaking at 57-58 °C in preheated 6X SSC, 0.5% SDS and 5X Denhardt's reagent (hybridization solution, 300 ml) in a crystallization dish (190 x 75 mm).
The [32P]radiolabeled probe was denatured (boiling, 10 min), quickly cooled (ice, 15 min) and added to a preheated fresh hybridization solution (60 ml, 58 °C) in a crystallization dish (150 x 75 mm). The prehybridized membranes were next added to this dish, which was then covered with plastic wrap. Hybridization was performed for 18 h at 57-58 °C with gentle shaking. The membranes were washed in 4X SSC and 0.5% SDS for 5 min at room temperature, transferred to 2X SSC and 0.5% SDS (at room temperature) and incubated at 57-58 °C for 20 min with gentle shaking, wrapped with plastic wrap to prevent drying and finally exposed to Kodak X-OMAT AR film for 24 h at -80 °C with intensifying screens.
This screening procedure resulted in more than 350 positive plaques, with twenty (of different signal intensities) being subjected to two additional rounds of screening. After final purification, six of the twenty cDNAs were subcloned by in vivo excision into pBluescript. These six cDNAs were called plr-Fi1 to plr-Fi6 (SEQ ID Nos:47, 49, 51, 53, 55, 57).
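The screening figures above lend themselves to two quick calculations: the volume of amplified library that contains the plated number of phage, and a lower bound on the frequency of positive plaques. The sketch below uses only the numbers quoted in this Example (titer 1.2 x 10^10 PFU/ml, 600,000 PFU plated, more than 350 positives); the function names are illustrative.

```python
def library_volume_ul(pfu_needed, titer_pfu_per_ml):
    """Volume of undiluted library (in microliters) containing pfu_needed phage."""
    return pfu_needed / titer_pfu_per_ml * 1000.0


def positive_fraction(positives, pfu_plated):
    """Fraction of plated plaques that hybridized with the probe."""
    return positives / pfu_plated


if __name__ == "__main__":
    volume = library_volume_ul(600_000, 1.2e10)
    fraction = positive_fraction(350, 600_000)
    print(f"Undiluted library required: {volume:.3f} ul")
    print(f"Positive plaques: at least {fraction * 100:.3f}% of those plated")
```

Because the required undiluted volume is far below what can be pipetted accurately, the library would in practice be diluted before plating; the dilution scheme itself is not given in the text.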
In vivo Excision and Sequencing of plr-Fi1 to plr-Fi6 Phagemids. The six purified cDNA clones were rescued from the phage following Stratagene's in vivo excision protocol. Both strands of the six different cDNAs (plr-Fi1 to plr-Fi6) that coded for (+)-pinoresinol/(+)-lariciresinol reductase were completely sequenced using overlapping sequencing primers.
Purification of DNA for sequencing employed a QIAwell Plus plasmid purification system (QIAGEN) followed by PEG precipitation (Sambrook, J., Molecular Cloning: A Laboratory Manual, 3 volumes, 3rd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1994)), with DNA sequences determined using an Applied Biosystems Model 373A automated sequencer. DNA and amino acid sequence analyses were performed using the Unix-based GCG Wisconsin Package (Program Manual for the Wisconsin Package, Version 8, September 1994, Genetics Computer Group, 575 Science Drive, Madison, Wisconsin, USA 53711; Rice, P., Program Manual for the EGCG Package, Peter Rice, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, England (1996)) and the ExPASy World Wide Web molecular biology server (Geneva University Hospital and University of Geneva, Geneva, Switzerland).
All six cDNAs had the same coding region but different 5'-untranslated regions. On the other hand, analysis of the 3'-untranslated region of each of the six cDNAs established that all were truncated versions of the longest cDNA's 3'-region.
Preliminary RNA gel blot analysis with total RNA from greenhouse-grown plant stem tips confirmed a single transcript with a length of approximately 1.2 kb.
RNA gel blot analysis. For RNA gel blot analysis, total RNA (30 µg per lane) from F. intermedia stem tips was separated by size by denaturing agarose gel electrophoresis. The RNA was transferred to charged nylon membranes (GeneScreen Plus®, DuPont NEN), cross-linked to the membrane (Stratalinker from Stratagene), prehybridized, hybridized with the same probe used to screen the cDNA library during cDNA cloning, and washed according to the manufacturer's instructions for aqueous hybridization conditions. The membrane was then exposed to Kodak X-OMAT film for 48 h at -80 °C with intensifying screens.
EXAMPLE 12
Expression of (+)-Pinoresinol/(+)-Lariciresinol Reductase cDNA plr-Fi1 in E. coli
Expression in Escherichia coli. In order to confirm that the putative (+)-pinoresinol/(+)-lariciresinol reductase cDNAs encoded functional (+)-pinoresinol/(+)-lariciresinol reductase, the cDNAs putatively encoding (+)-pinoresinol/(+)-lariciresinol reductase were heterologously expressed in E. coli.
Heterologous expression was also necessary in order to obtain sufficient protein to enable the systematic study of the precise biochemical mechanism of (+)-pinoresinol/(+)-lariciresinol reductase at a future date.
Examination of the six putative (+)-pinoresinol/(+)-lariciresinol reductase clones revealed that one, plr-Fi1 (SEQ ID No:47), was in frame with the α-complementation particle of β-galactosidase in pBluescript. This was fortuitous, since it potentially provided a facile means to express the fully functional fusion protein, and hence to provide proof that the cloned sequence was correct.
Purified plasmid DNA from plr-Fi1 (SEQ ID No:47) was transformed into NovaBlue cells according to Novagen's instructions. Transformed cells (5 ml cultures) were grown at 37 °C with shaking (225 rpm) to mid log phase (OD600 in LB medium (Sambrook, Molecular Cloning: A Laboratory Manual, 3 volumes, 3rd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1994)) supplemented with 12.5 µg ml⁻¹ tetracycline and 50 µg ml⁻¹ ampicillin. IPTG (isopropyl β-D-thiogalactopyranoside) was then added to a final concentration of mM, and the cells were allowed to grow for 2 h. Cells were collected by centrifugation and resuspended in 500 µl (per 5 ml culture tube) buffer (20 mM Tris-HCl, pH 8.0, 5 mM dithiothreitol). Lysozyme (5 µl of 0.1 mg ml⁻¹, Research Organics, Inc.) was next added and following incubation for 10 min, the cells were lysed by sonication (3 x 15 After centrifugation at 14,000 x g at 4 °C for 10 min, the supernatant was removed and assayed for (+)-pinoresinol/(+)-lariciresinol reductase activity (210 µl supernatant per assay) as described in Example 8.
Catalytic activity was established by incubating cell-free extracts for 2 h at 30 °C with (±)-pinoresinols (0.4 mM) and [4R-3H]NADPH (0.8 mM) under standard conditions. Following incubation, unlabeled (±)-lariciresinols and (±)-secoisolariciresinols were added as radiochemical carriers, with each lignan isolated by reversed-phase HPLC. Controls included assays of a pinoresinol/lariciresinol reductase cDNA containing an out-of-frame cDNA insert, with all assay components, as well as plr-Fi1 (SEQ ID No:47) and an out-of-frame pinoresinol/lariciresinol reductase cDNA with no substrate except [4R-3H]NADPH. Separation of products and chiral identification were performed by HPLC as previously described (Chu, et al., J. Biol. Chem. 268:27026-27033 (1993)).
Subsequent chiral HPLC analysis revealed that both (+)-lariciresinol and (-)-secoisolariciresinol, but not the corresponding antipodes, were radiolabeled (total activity: 54 nmol By contrast, no catalytic activity was detected either in the absence of (±)-pinoresinols, or when control cells were used which contained a plasmid in which the cDNA insert was not in frame with the β-galactosidase gene.
Thus, the heterologously expressed (+)-pinoresinol/(+)-lariciresinol reductase and the plant protein function in precisely the same enantiospecific manner.
EXAMPLE 13
Sequence and Homology Analysis of the cDNA Insert of Clone plr-Fi1 (SEQ ID No:47) Encoding (+)-Pinoresinol/(+)-Lariciresinol Reductase
Sequence Analysis. The full length sequence of the cloned (+)-pinoresinol/(+)-lariciresinol reductase plr-Fi1 (SEQ ID No:47) contained all of the peptide sequences determined by Edman degradation of digest fragments.
The single ORF predicts a polypeptide of 312 amino acids (SEQ ID No:48) with a calculated molecular mass of 34.9 kDa, in close agreement with the values (~35 or ~36 kDa) estimated previously by SDS-PAGE for the two isoforms of (+)-pinoresinol/(+)-lariciresinol reductase. An equal number of acidic and basic residues are also present, with a theoretical isoelectric point (pI) of 7.09, in contrast to that experimentally obtained by chromatofocussing (pI ~5.7). The amino acid composition reveals seven methionine residues.
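A calculated molecular mass of the kind quoted above is obtained by summing average residue masses and adding one water molecule for the free termini. The following is a minimal sketch using standard average residue masses (approximate values); because SEQ ID No:48 is not reproduced in this section, the example peptide is hypothetical, and the only source-derived check is the mean residue mass implied by 34.9 kDa over 312 residues.

```python
# Approximate average residue masses in daltons (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def protein_mass(sequence):
    """Average molecular mass (Da) of an unmodified linear polypeptide."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


def mean_residue_mass(total_mass_da, n_residues):
    """Mean mass per residue implied by a total mass and a chain length."""
    return total_mass_da / n_residues


if __name__ == "__main__":
    print(f"{protein_mass('MGKSKVLIIGG'):.1f} Da")  # hypothetical peptide
    print(f"{mean_residue_mass(34900, 312):.2f} Da per residue")
```

The implied mean of roughly 112 Da per residue is within the range typical of globular proteins, which is consistent with the stated calculated mass.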
Interestingly, the N-terminus of the plant-purified enzyme lacks the initial methionine, this being the most common post-translational protein modification known. Consequently, the first methionine in the cDNA can be considered to be the site of translational initiation. The sequence analysis also reveals a possible N-glycosylation site at residue 215 (although no secretory targeting signal is present), and seven possible protein phosphorylation sites at residues 50 and 228 (protein kinase C-type), residues 228, 250, 302 and 303 (casein kinase II-type) and residue 301 (tyrosine kinase type).
Regions of the pinoresinol/lariciresinol reductase polypeptide chain (SEQ ID No:48) were also identified that contained conserved sequences associated with NADPH binding (Jörnvall, in Dehydrogenases Requiring Nicotinamide Coenzymes (Jeffery, ed) pp. 126-148, Birkhäuser Verlag, Basel (1980); Branden, and Tooze, Introduction to Protein Structure, pp. 141-159, Garland Publishing, Inc., New York and London (1991); Wierenga, R.K. et al., J. Mol. Biol. 187:101-108 (1986)). There is a limited number of invariant amino acids in the sequences of different reductases which are viewed as indicative of NADPH binding sites. These include three conserved glycine residues with the sequence G-X-G-X-X-G (SEQ ID No:76), where X is any residue, and six conserved hydrophobic residues.
The glycine-rich region is considered to play a central role in positioning the NADPH in its correct conformation. In this regard, a comparison of the N-terminal region of (+)-pinoresinol/(+)-lariciresinol reductase with that of the conserved, NADPH-binding regions of Drosophila melanogaster alcohol dehydrogenase (Branden, and Tooze, Introduction to Protein Structure, pp. 141-159, Garland Publishing, Inc., New York and London (1991)), Pinus taeda cinnamyl alcohol dehydrogenase (MacKay J.J. et al., Mol. Gen. Genet. 247:537-545 (1995)), dogfish muscle lactate dehydrogenase (Branden, and Tooze, Introduction to Protein Structure, pp. 141-159, Garland Publishing, Inc., New York and London (1991)) and human erythrocyte glutathione reductase (Branden, and Tooze, Introduction to Protein Structure, pp. 141-159, Garland Publishing, Inc., New York and London (1991)), revealed some interesting parallels. The invariant glycine residues are aligned in every case, as are four of the six hydrophobic residues required for the correct packaging in the formation of the domain. Hence, the NADPH-binding site of the (+)-pinoresinol/(+)-lariciresinol reductase isoforms is localized close to the N-terminus.
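A scan for the G-X-G-X-X-G motif described above can be expressed as a regular-expression search. The sketch below reports the 1-based position of every occurrence, including overlapping ones; the function name and the example sequence are invented for illustration and are not SEQ ID No:48.

```python
import re


def find_gxgxxg(sequence):
    """Return (1-based position, matched text) for each G-X-G-X-X-G occurrence."""
    # A lookahead with a capture group lets overlapping matches be reported.
    return [(m.start() + 1, m.group(1)) for m in re.finditer(r"(?=(G.G..G))", sequence)]


if __name__ == "__main__":
    example = "MKILIIGGTGYIGK"  # hypothetical N-terminal sequence
    print(find_gxgxxg(example))
```

A motif hit near the N-terminus is only suggestive of a nucleotide-binding fold; the flanking hydrophobic residues discussed in the text would also need to be inspected.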
Homology Analysis: Comparison to Isoflavone Reductase. A BLAST search (Altschul, S.F, et al., J. Mol. Biol. 215:403-410 (1990)) was conducted with the translated amino acid sequence of (+)-pinoresinol/(+)-lariciresinol reductase (SEQ ID No:48) against the non-redundant peptide database at the National Center for Biotechnology Information. Significant homology was noted for (+)-pinoresinol/(+)-lariciresinol reductase with various isoflavone reductases from the legumes Cicer arietinum (Tiemann, et al., Eur. J. Biochem. 200:751-757 (1991)) (63.5% similarity, 44.4% identity), Medicago sativa (Paiva, et al., Plant Mol. Biol. 17:653-667 (1991)) (62.6% similarity, 42.0% identity) and Pisum sativum (Paiva, et al., Arch. Biochem. Biophys. 312:501-510 (1994)) (61.6% similarity, 41.3% identity). This observation is of considerable interest since isoflavonoids are formed via a related branch of phenylpropanoid-acetate pathway metabolism. Specifically, isoflavone reductases catalyze the reduction of α,β-unsaturated ketones during isoflavonoid formation. For example, the Medicago sativa L. isoflavone reductase catalyzes the stereospecific conversion of 2'-hydroxyformononetin to (3R)-vestitone in the biosynthesis of the phytoalexin (-)-medicarpin (Paiva, N.L. et al., Plant Mol. Biol. 17:653-667 (1991)). This sequence similarity may be significant given that both lignans and isoflavonoids are offshoots of general phenylpropanoid metabolism, with comparable plant defense functions and pharmacological roles as "phytoestrogens". Consequently, since both reductases catalyze very similar reactions, it is tempting to speculate that the isoflavone reductases may have evolved from (+)-pinoresinol/(+)-lariciresinol reductase.
This is considered likely since the lignans are present in the pteridophytes, hornworts, gymnosperms and angiosperms; hence their pathways apparently evolved prior to the isoflavonoids (Gang et al., In Phytochemicals for Pest Control, Hedin et al., eds, ACS Symposium Series, Washington 658:58-59 (1997)).
Comparable homology was also observed with putative isoflavone reductase "homologs" from Arabidopsis thaliana (Babiychuk, et al., Direct Submission (25-MAY-1995) to the EMBL/GenBank/DDBJ databases (1995)) (65.9% similarity, 50.8% identity), Nicotiana tabacum (Hibi, et al., Plant Cell 6:723-735 (1994)) (64.6% similarity, 47.2% identity), Solanum tuberosum (van Eldik, et al., (1995) Direct Submission (06-OCT-1995) to the EMBL/GenBank/DDBJ databases) (65.5% similarity, 47.7% identity), Zea mays (Petrucco, et al., Plant Cell 8:69-80 (1996)) (61.6% similarity, 44.9% identity) and especially Lupinus albus (Attuci, et al., Personal communication and direct submission (06/6/96) to the EMBL/GenBank/DDBJ databases (1996)) (85.9% similarity, 66.2% identity).
By contrast, homology with other NADPH-dependent reductases was significantly lower: for example, dihydroflavonol reductases from Petunia hybrida (Beld, M. et al., Plant Mol. Biol. 13:491-502 (1989)) (43.2% similarity, 21.5% identity) and Hordeum vulgare (Kristiansen and Rohde, Mol. Gen. Genet. 230:49-59 (1991)) (46.2% similarity, 21.1% identity), chalcone reductase from Medicago sativa (Ballance, G.M. and Dixon, Plant Physiol. 107:1027-1028 (1995)) (39.5% similarity, 15.8% identity), chalcone reductase "homolog" from Sesbania rostrata (Goormachtig, et al., (1995) Direct Submission (13-MAR-1995) to the EMBL/GenBank/DDBJ databases) (47.6% similarity, 24.1% identity), cholesterol dehydrogenase from Nocardia sp. (Horinouchi, et al., Appl. Environ. Microbiol. 57:1386-1393 (1991)) (46.6% similarity, 21.0% identity) and 3-β-steroid dehydrogenase from Rattus norvegicus (Zhao, et al., Journal Endocrinology 127:3237-3239 (1990)) (43.5% similarity, 20.6% identity).
Thus, sequence analysis establishes significant homology between (+)-pinoresinol/(+)-lariciresinol reductase, isoflavone reductases and putative isoflavone reductase "homologs" which do not possess isoflavone reductase activity.
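The similarity and identity percentages quoted above are of the kind obtained from pairwise alignments. As a hedged illustration of how such figures can be derived from two already-aligned sequences, the following Python sketch counts identical and conservatively substituted positions. The residue groupings, the exclusion of gapped columns from the denominator, the function name and the short example strings are assumptions for illustration; they do not reproduce the scoring scheme of the programs actually used for the comparisons reported here.

```python
# Assumed conservative-substitution groups (illustrative only).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def percent_identity_similarity(aligned_a, aligned_b, gap="-"):
    """Return (% identity, % similarity) over the ungapped columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


# Hypothetical aligned fragments, not taken from the sequences in this specification.
identity, similarity = percent_identity_similarity("MKV-LD", "MRVALE")
print(f"{identity:.1f}% identity, {similarity:.1f}% similarity")
```

Because similarity includes identical positions, the similarity figure is always at least as large as the identity figure, as in the values listed above.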
EXAMPLE 14
cDNA Cloning of Thuja plicata (-)-Pinoresinol/(-)-Lariciresinol Reductases
Plant Materials. Western red cedar plants (Thuja plicata) were maintained in Washington State University greenhouse facilities.
Materials. All solvents and chemicals used were reagent or HPLC grade.
Taq thermostable DNA polymerase and restriction enzymes (SacI and XbaI) were obtained from Promega. pT7Blue T-vector and competent NovaBlue cells were purchased from Novagen, and radiolabeled nucleotide ([α-32P]dCTP) was purchased from DuPont NEN.
Oligonucleotide primers for polymerase chain reaction (PCR) and sequencing were synthesized by Gibco BRL Life Technologies. GENECLEAN II® kits (BIO 101 Inc.) were used for purification of PCR fragments, with the gel-purified DNA concentrations determined by comparison to a low DNA mass ladder (Gibco BRL) in 1.3% agarose gels.
Instrumentation. UV spectra (including RNA and DNA determinations at OD260) were recorded on a Lambda 6 UV/VIS spectrophotometer. A Temptronic II thermocycler (Thermolyne) was used for all PCR amplifications. Purification of plasmid DNA for sequencing employed a QIAwell Plus plasmid purification system (Qiagen) followed by PEG precipitation (Sambrook, et al., Molecular Cloning: A Laboratory Manual, 3 volumes, 3rd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1994)) or Wizard® Plus SV Minipreps DNA Purification System (Promega), with DNA sequences determined using an Applied Biosystems Model 373A automated sequencer.
Thuja plicata cDNA Library Synthesis. Total RNA (6.7 µg/g fresh weight) was obtained from young green leaves (including stems) of greenhouse-grown western red cedar plants (Thuja plicata) according to the method of Lewinsohn et al. (Lewinsohn, et al., Plant Mol. Biol. Rep. 12:20-25 (1994)). A T. plicata cDNA library was constructed using 3 µg of purified poly(A)+ mRNA (Oligotex-dT™ Suspension, Qiagen) with the ZAP-cDNA® synthesis kit, the Uni-ZAP™ XR vector, and the Gigapack® II Gold packaging extract (Stratagene), with a titer of 1.2 × 10⁵ pfu for the primary library. The amplified library (7.1 × 10⁸ pfu/ml; 28 ml total) was used for screening (Sambrook, et al., Molecular Cloning: A Laboratory Manual, 3 volumes, 3rd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1994)).
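The library figures given above lend themselves to a short worked calculation. The following Python sketch, offered only as an illustration, computes the total number of plaque-forming units in the amplified library from its titer and volume, and the volume of library stock that would contain a chosen number of pfu. The plating target of 5 × 10⁴ pfu and the function names are hypothetical and do not appear in the procedure described here.

```python
def total_pfu(titer_pfu_per_ml, volume_ml):
    """Total plaque-forming units contained in a library stock."""
    return titer_pfu_per_ml * volume_ml


def volume_for_pfu_ul(target_pfu, titer_pfu_per_ml):
    """Volume of stock (in microliters) that contains `target_pfu` plaque-forming units."""
    return target_pfu / titer_pfu_per_ml * 1000.0


# Values stated for the amplified T. plicata library.
amplified_titer = 7.1e8   # pfu/ml
amplified_volume = 28.0   # ml

print(f"Amplified library: {total_pfu(amplified_titer, amplified_volume):.3g} pfu in total")

# Hypothetical plating target, chosen only to illustrate the arithmetic.
target = 5e4
print(f"Stock needed for {target:.0e} pfu: {volume_for_pfu_ul(target, amplified_titer):.3f} µl")
```

In practice a stock of this titer would be diluted before plating, since the undiluted volume required is far below what can be pipetted accurately.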
T. plicata (-)-Pinoresinol/(-)-Lariciresinol Reductase cDNA Synthesis.
T. plicata (-)-pinoresinol/(-)-lariciresinol reductase cDNA was obtained from mRNA by a reverse transcription-polymerase chain reaction (RT-PCR) strategy (Sambrook, et al., Molecular Cloning: A Laboratory Manual, 3 volumes, 3rd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1994)). First-strand cDNA was synthesized from the purified mRNA previously used for the synthesis of the T. plicata cDNA library, described above. Purified mRNA (150 ng) was mixed with linker-primer (1.4 µg) from the ZAP-cDNA® synthesis kit (Stratagene), heated to 0 C for 10 min, and quickly chilled on ice. The mixture of denatured mRNA template and linker-primer was then mixed with First Strand Buffer (Life Technologies), 10 mM DTT, 0.5 mM each dNTP, and 200 units of SuperScript™ II (Life Technologies) in a final volume of 20 µl. The reaction was carried out at 42 °C for 50 min and then stopped by heating (70 °C, 15 min). E. coli RNase H (1.5 units, 1 µl) was added to the solution, which was then incubated at 37 °C for 20 min.
The first-strand reaction (2 µl) was next used as the template in 100-µl PCR reactions (10 mM Tris-HCl, pH 9.0, 50 mM KCl, 0.1% Triton X-100, 1.5 mM MgCl2, 0.2 mM each dNTP, and 5 units of Taq DNA polymerase) with primer CR6-NT (5'GCACATAAGAGTATGGATAAG3') (SEQ ID No:60) (10 pmol) and primer XhoI-Poly(dT) (5'GTCTCGAGTTTTTTTTTTTTTTTTTT3') (SEQ ID No:59) (10 pmol). PCR amplification was carried out in a thermocycler as described (Dinkova-Kostova, et al., J. Biol. Chem. 271:29473-29482 (1996)), except that the annealing temperature was 52 °C. PCR products were resolved in 1.3% agarose gels, where at least two bands possessing the expected length (about 1,200 bp) were observed. The bands were extracted from the gel. The gel-purified PCR products (56 ng) were then ligated into the pT7Blue T-vector (50 ng) and transformed into competent NovaBlue cells, according to Novagen's instructions.
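To relate the 52 °C annealing temperature to the CR6-NT primer sequence given above, a rough melting temperature can be estimated from base composition. The following Python sketch applies the simple "2 + 4" rule (2 °C per A or T, 4 °C per G or C), which is an approximation commonly used for short oligonucleotides. The choice of this rule and the function name are assumptions for illustration; the specification does not state how the annealing temperature was selected.

```python
def wallace_tm(primer):
    """Approximate melting temperature (°C) using Tm = 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


# Primer CR6-NT (SEQ ID No:60) as given in the text.
CR6_NT = "GCACATAAGAGTATGGATAAG"
print(f"CR6-NT: {len(CR6_NT)} nt, estimated Tm about {wallace_tm(CR6_NT)} °C")
```

Under this approximation the estimate for CR6-NT is 58 °C, so an annealing temperature of 52 °C lies a few degrees below the estimated Tm, which is a conventional margin.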
The size and orientation of the inserted cDNAs were determined using the rapid boiling lysis and PCR technique, following the manufacturer's (Novagen's) instructions, with the following primer combinations: R20-mer (SEQ ID No:74) with U19-mer (SEQ ID No:75); R20-mer (SEQ ID No:74) with CR6-NT (SEQ ID No:60); and U19-mer (SEQ ID No:75) with CR6-NT (SEQ ID No:60). The CR6-NT primer end of the inserted DNAs was located next to the U19-mer primer site of the T-vector. The T-vectors containing the inserted cDNAs were purified with the Wizard® Plus SV Minipreps DNA Purification System. Five inserted cDNAs were completely sequenced using overlapping sequencing primers and were shown to be identical except that their polyadenylation sites differed. Therefore, the longest cDNA, designated plr-Tp1 (SEQ ID No:61), was used for detection of enzyme activity using the pBluescript expression system.
Sequence Analysis. DNA and amino acid sequence analyses were performed using the Unix-based GCG Wisconsin Package (Program Manual for the Wisconsin Package, Version 8, September 1994, Genetics Computer Group, 575 Science Drive, Madison, Wisconsin, USA 53711 (1996); Rice, Program Manual for the EGCG Package, Peter Rice, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, England) and the ExPASy World Wide Web molecular biology server (Geneva University Hospital and University of Geneva, Geneva, Switzerland).
EXAMPLE 15
cDNA Cloning and Expression of Thuja plicata (+)-Pinoresinol/(+)-Lariciresinol Reductase
T. plicata (+)-Pinoresinol/(+)-Lariciresinol Reductase cDNA Cloning. After plr-Tp1 was cloned and sequenced, the full-length clone was used to screen the T. plicata cDNA library as described in Example 11, except that the entire plr-Tp1 cDNA insert was used as a probe. Several positive clones were sequenced, revealing one new, unique cDNA which was called plr-Tp2. This cDNA encodes a reductase with high sequence similarity to plr-Tp1 at the amino acid level, but with substrate specificity properties identical to the original Forsythia intermedia reductase, as described below.
Enzyme Assays. Pinoresinol and lariciresinol reductase activities were assayed by monitoring the formation of [3H]lariciresinol and [3H]secoisolariciresinol as set forth in Example 8, with the following modifications. Briefly, each assay for pinoresinol reductase activity consisted of (±)-pinoresinols (5 mM in MeOH, 20 µl) and the enzyme preparation (total protein extract from E. coli, 210 µl). The enzymatic reaction was initiated by addition of [4R-3H]NADPH (10 mM, 6.79 kBq/mmol in distilled H2O, 20 µl). After a 3 hour incubation at 30 °C with shaking, the assay mixture was extracted with EtOAc (500 µl) containing (±)-lariciresinols (20 µg) and (±)-secoisolariciresinols (20 µg) as radiochemical carriers. After centrifugation (13,800 × g, 5 min), the EtOAc solubles were removed and the extraction procedure was repeated. For each assay, the EtOAc solubles were combined, with an aliquot (100 µl) removed for determination of its radioactivity using liquid scintillation counting. The remainder of the combined EtOAc solubles was evaporated to dryness in vacuo, reconstituted in MeOH/H2O (30:70, 100 µl) and subjected to reversed phase and chiral column HPLC.
Lariciresinol reductase activity was assayed by monitoring the formation of [3H]secoisolariciresinol. These assays were carried out exactly as described above, except that (±)-lariciresinols (5 mM in MeOH, 20 µl) were used as substrates, with (±)-secoisolariciresinols (20 µg) added as radiochemical carriers.
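The assay composition described above can be checked with a short calculation of the amounts and final concentrations of substrate and cofactor. The following Python sketch is only an illustration of that arithmetic; it assumes that the assay volume is simply the sum of the three listed components (20 µl substrate, 210 µl enzyme preparation, 20 µl NADPH), and the function names are hypothetical.

```python
def amount_nmol(conc_mM, volume_ul):
    """Amount in nmol: 1 mM x 1 µl = 1 nmol."""
    return conc_mM * volume_ul


def final_conc_mM(conc_mM, volume_ul, total_volume_ul):
    """Concentration after dilution of a stock into the total assay volume."""
    return conc_mM * volume_ul / total_volume_ul


# Components as listed in the assay description: (stock concentration in mM, volume in µl).
substrate = (5.0, 20.0)    # (±)-pinoresinols or (±)-lariciresinols in MeOH
nadph = (10.0, 20.0)       # [4R-3H]NADPH in distilled water
enzyme_ul = 210.0          # E. coli total protein extract

# Assumption: total volume is the sum of the three components.
total_ul = substrate[1] + nadph[1] + enzyme_ul

print(f"Total assay volume: {total_ul:.0f} µl")
print(f"Substrate: {amount_nmol(*substrate):.0f} nmol, "
      f"{final_conc_mM(*substrate, total_ul):.2f} mM final")
print(f"NADPH: {amount_nmol(*nadph):.0f} nmol, "
      f"{final_conc_mM(*nadph, total_ul):.2f} mM final")
```

On these assumptions the cofactor is present in twofold molar excess over the racemic substrate.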
Expression of plr-Tp1 in E. coli. In order for the open reading frame (ORF) of plr-Tp1 to be in frame with the β-galactosidase gene α-complementation particle in pBluescript, plr-Tp1 was excised out of pT7Blue T-vector with SacI and XbaI, gel-purified, and then ligated into the expression vector digested with these same enzymes. This plasmid, pPCR-Tp1, was transformed into NovaBlue cells according to Novagen's instructions. The transformed cells (5-ml cultures) were grown at 37 °C in LB medium (Sambrook, et al., Molecular Cloning: A Laboratory Manual, 3 volumes, 3rd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1994)) supplemented with 50 µg/ml carbenicillin with shaking (225 rpm) to mid-log phase (A600). The cells were next collected by centrifugation (1000 × g, 10 min) and resuspended in fresh LB medium supplemented with 10 mM IPTG (isopropyl β-D-thiogalactopyranoside) and 50 µg/ml carbenicillin to an absorbance of 0.6 (at 600 nm). The cells, allowed to grow overnight, were collected by centrifugation and resuspended in 500-700 µl (per culture tube) of buffer (50 mM Tris-HCl, pH 7.5, 2 mM EDTA, and 5 mM DTT). Next, the cells were lysed by sonication (5 × 45 s) and, after centrifugation (17500 × g, 4 °C, 10 min), the supernatant was removed and assayed for (-)-pinoresinol/(-)-lariciresinol reductase activity as described above. Controls included assays of pBluescript without insert DNA (as negative control) or with pPLR-Fi1 (cDNA of authentic F. intermedia (+)-pinoresinol/(+)-lariciresinol reductase in frame) as stereospecific control, as well as pPLR-Tp1 with no substrate except [(4R)-3H]NADPH.
The results showed that both (-)-lariciresinol and (+)-secoisolariciresinol were radiolabeled and that no incorporation of radioactivity was found in (-)-secoisolariciresinol. However, accumulation of radiolabel into (+)-lariciresinol was also observed, although at a much slower rate than that observed for (-)-lariciresinol. These results indicate that plr-Tp1 can use both (-)-pinoresinol and (+)-pinoresinol as substrates, with the former being converted via (-)-lariciresinol completely to (+)-secoisolariciresinol, and the latter being converted much more slowly to (+)-lariciresinol, but not further to (-)-secoisolariciresinol.
Expression of plr-Tp2 in E. coli. The plr-Tp2 cDNA was found to be in frame with the β-galactosidase gene α-complementation particle in pBluescript. When evaluated for activity and substrate specificity, as described above, plr-Tp2 was found to possess the same substrate specificity and product formation as the original Forsythia intermedia reductase (Dinkova-Kostova, et al., J. Biol. Chem. 271:29473-29482 (1996)) except that a small amount of (-)-lariciresinol was also detected. This is interesting, because plr-Tp2 has a higher sequence similarity to plr-Tp1 than it does to the Forsythia reductase.
All the above observations were confirmed using deuterolabeled substrates, [2H2, OC2H3]pinoresinols, with isolation of the corresponding lignans; each was then subjected to chiral column chromatography and HPLC-mass spectral analysis to confirm these findings.
EXAMPLE 16
Cloning of Additional Pinoresinol/Lariciresinol Reductases from Thuja plicata and Tsuga heterophylla
Two additional pinoresinol/lariciresinol reductases were cloned from a Thuja plicata young stem cDNA library as described in Example 15 for the cloning of plr-Tp2. The two additional pinoresinol/lariciresinol reductases were designated plr-Tp3 (SEQ ID No:65) and plr-Tp4 (SEQ ID No:67).
Two additional pinoresinol/lariciresinol reductases were cloned from a Tsuga heterophylla young stem cDNA library as described in Example 15 for the cloning of plr-Tp2. The two additional pinoresinol/lariciresinol reductases from Tsuga heterophylla were designated plr-Tp3 (SEQ ID No:69) and plr-Tp4 (SEQ ID No:71).
While the preferred embodiment of the invention has been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention.
SEQUENCE LISTING

GENERAL INFORMATION:
APPLICANT: Lewis, Norman G; Davin, Laurence B; Dinkova-Kostova, Albena T; Fujita, Masayuki; Gang, David R; Sarkanen, Simo
(ii) TITLE OF INVENTION: Recombinant Pinoresinol/Lariciresinol Reductases, Recombinant Dirigent Proteins and Methods of Use
(iii) NUMBER OF SEQUENCES: 76
(iv) CORRESPONDENCE ADDRESS:
ADDRESSEE: Christensen, O'Connor, Johnson Kindness
STREET: 1420 Fifth Avenue, Suite 2800
CITY: Seattle
STATE: Washington
COUNTRY: USA
ZIP: WA 98101-2347
COMPUTER READABLE FORM:
MEDIUM TYPE: Floppy disk
COMPUTER: IBM PC compatible
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release Version #1.30
(vi) CURRENT APPLICATION DATA:
APPLICATION NUMBER:
FILING DATE:
CLASSIFICATION:
(viii) ATTORNEY/AGENT INFORMATION:
NAME: Shelton, Dennis K
REGISTRATION NUMBER: 26,997
REFERENCE/DOCKET NUMBER: WSUR111351
(ix) TELECOMMUNICATION INFORMATION:
TELEPHONE: 206 682 8100
TELEFAX: 206 224 0779

INFORMATION FOR SEQ ID NO:1:
SEQUENCE CHARACTERISTICS:
LENGTH: 28 amino acids
TYPE: amino acid
STRANDEDNESS: single
TOPOLOGY: not relevant
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:
ORGANISM: Forsythia intermedia dirigent protein N-terminal sequence
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
Lys Pro Arg Pro Xaa Arg Xaa Xaa Lys Glu Leu Val Phe Tyr Phe Xaa
1               5                   10                  15
Asp Ile Leu Phe Lys Gly Xaa Asn Tyr Asn Xaa Ala
            20                  25

INFORMATION FOR SEQ ID NO:2:
SEQUENCE CHARACTERISTICS:
LENGTH: 24 amino acids
TYPE: amino acid
STRANDEDNESS: not relevant
TOPOLOGY: not relevant
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
FRAGMENT TYPE: Forsythia intermedia dirigent protein internal tryptic fragment
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
Thr Ala Met Ala Val Pro Phe Asn Tyr Gly Asp Leu Val Val Phe Asp
1               5                   10                  15
Asp Pro Ile Thr Leu Asp Asn Asn
            20

INFORMATION FOR SEQ ID NO:3:
SEQUENCE CHARACTERISTICS:
LENGTH: 16 amino acids
TYPE: amino acid
STRANDEDNESS: not relevant
TOPOLOGY: not relevant
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
FRAGMENT TYPE: Forsythia intermedia dirigent protein internal tryptic fragment
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
Tyr Val Gly Thr Leu Asn Phe Ala Gly Ala Asp Pro Leu Leu Xaa Lys
1               5                   10                  15

INFORMATION FOR SEQ ID NO:4:
SEQUENCE CHARACTERISTICS:
LENGTH: 15 amino acids
TYPE: amino acid
STRANDEDNESS: not relevant
TOPOLOGY: not relevant
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
FRAGMENT TYPE: Forsythia intermedia dirigent protein internal tryptic fragment
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
Asp Ile Ser Val Ile Gly Gly Thr Gly Asp Phe Phe Met Ala Arg
1               5                   10                  15

INFORMATION FOR SEQ ID NO:5:
SEQUENCE CHARACTERISTICS:
LENGTH: 15 amino acids
TYPE: amino acid
STRANDEDNESS: not relevant
TOPOLOGY: not relevant
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
FRAGMENT TYPE: Forsythia intermedia dirigent protein internal tryptic fragment
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
Gly Val Ala Thr Leu Met Thr Asp Ala Phe Glu Gly Asp Xaa Tyr
1               5                   10                  15

INFORMATION FOR SEQ ID NO:6:
SEQUENCE CHARACTERISTICS:
LENGTH: 10 amino acids
TYPE: amino acid
STRANDEDNESS: not relevant
TOPOLOGY: not relevant
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
FRAGMENT TYPE: Forsythia intermedia dirigent protein internal tryptic fragment
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
Ala Gln Gly Met Tyr Phe Tyr Asp Gln Lys
1               5                   10

INFORMATION FOR SEQ ID NO:7:
SEQUENCE CHARACTERISTICS:
LENGTH: 5 amino acids
TYPE: amino acid
STRANDEDNESS: not relevant
TOPOLOGY: not relevant
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
FRAGMENT TYPE: Forsythia intermedia dirigent protein internal tryptic fragment
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
Tyr Asn Ala Trp Leu
1               5

INFORMATION FOR SEQ ID NO:8:
SEQUENCE CHARACTERISTICS:
LENGTH: 21 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: "PCR primer PSINT1"
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
AARGARYTNG TNTTYTAYTT Y    21

INFORMATION FOR SEQ ID NO:9:
SEQUENCE CHARACTERISTICS:
LENGTH: 20 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: "PCR primer PSI1R"
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
TARTTRAANG GNACNGCCAT    20

INFORMATION FOR SEQ ID NO:10:
SEQUENCE CHARACTERISTICS:
LENGTH: 20 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: "PCR primer PSI2R"
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
GTNATNGGRT CRTCRAANAC    20

INFORMATION FOR SEQ ID NO:11:
SEQUENCE CHARACTERISTICS:
LENGTH: 19 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: "PCR primer PSI7R"
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
CCATRAARAA RTCNCCNGT    19

INFORMATION FOR SEQ ID NO:12:
SEQUENCE CHARACTERISTICS:
LENGTH: 901 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO (vi) ORIGINAL SOURCE: ORGANISM: Forsythia intermedia clone psd-fil (ix) FEATURE: NAME/KEY: CDS LOCATION: 26..583 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12: ATTTCGGCAC GAGATTAAAC CAAAC ATG GTT TCT AAA ACA CAA ATT GTA GCT Met Val Ser Lys Thr Gin Ile Val Ala
CTT
Leu TTC CTT TGC TTC Phe Leu Cys Phe ACT TCC ACC TCT Thr Ser Thr Ser
TCC
Ser GCC ACC TAC GGC Ala Thr Tyr Gly AAG CCA CGC CCT Lys Pro Arg Pro
CGC
Arg CGG CCC TGC AAA Arg Pro Cys Lys
GAA
Glu 35 TTG GTG TTC TAT Leu Val Phe Tyr TTC CAC Phe His 100 148 196 GAC GTA CTT Asp Val Leu GTC GGG TCC Val Gly Ser AAA GGA AAT AAT Lys Gly Asn Asn CAC AAT GCC ACT His Asn Ala Thr TCC GCC ATA Ser Ala Ile GTG CCA TTC Val Pro Phe CCC CAA TGG GGC Pro Gin Trp Gly
AAC
Asn AAG ACT GCC ATG Lys Thr Ala Met AAT TAT Asn Tyr GGT GAC CTA GTT Gly Asp Leu Val
GTG
Val 80 TTC GAC GAT CCC Phe Asp Asp Pro
ATT
Ile ACC TTA GAC AAC Thr Leu Asp Asn
AAT
Asn CTG CAT TCA CCC Leu His Ser Pro
CCA
Pro 95 GTG GGT CGG GCG Val Gly Arg Ala
CAA
Gin 100 GGG ATG TAC TTC Gly Met Tyr Phe
TAT
Tyr 105 GAT CAA AAA AAT Asp Gin Lys Asn
ACA
Thr 110 TAC AAT GCT TGG Tyr Asn Ala Trp GGG TTC TCA TTT Gly Phe Ser Phe TTG TTC Leu Phe 120 340 388 436 AAT TCA ACT Asn Ser Thr TTG TTG AAC Leu Leu Asn 140 TAT GTT GGA ACC Tyr Val Gly Thr
TTG
Leu 130 AAC TTT GCT GGG Asn Phe Ala Gly GCT GAT CCA Ala Asp Pro 135 ACT GGT GAC Thr Gly Asp AAG ACT AGA GAC Lys Thr Arg Asp TCA GTC ATT GGT Ser Val Ile Gly WO 98/20113 WO 9820113PCTIUS97/20391 -73- TTT TTC ATG GCG AGA GGG GTT GCC ACT TTG ATG ACC GAT GCC TTT GAA Phe Phe Met Ala Arg Gly Val Ala Thr Leu Met Thr Asp Ala Phe Glu 155 160 165 GGG GAT GTG TAT TTC CGC CTT CGT GTC GAT ATT AAT TTG TAT GAA TGT Gly Asp Val Tyr Phe Arg Leu Arg Val Asp Ile Asn Leu Tyr Glu Cys 170 175 180 185 TGG TAAACAATTT AGCCGTATAT ATATATATAT ATGGCTATAC ATATTTCATA Trp 532 580 633 693 753 813 873 901
GAATCCAGAT
TACACATTAT
TGTATTTATT
CAAGACGACA
TGTACTATTG
TTGCTGTTTC AAATGTGTGT TTCTTTAGTT TTAATAAATA TAATTATTTA ATGTGTTCAT TGATTATGTA TAAATTCTCT ATTAGTAAAA TATGTAACTT TATTTCATAT CTTCAACAAG AAAAAAAAAA AAAAA
GTGCCACCAA
TTTTGAAGTT
TAGTCAAAGT
TTCAATAATG
TAAAAAAATG
AAATTTAAGT
GACACATATT
TCATATATAT
INFORMATION FOR SEQ ID NO:13: SEQUENCE CHARACTERISTICS: LENGTH: 186 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein: Forsythia intermedia PSD-Fil protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: Met Val Ser Lys Thr Gin Ile Val Ala Phe Leu 1 Ser Thr Ser Cys Lys Glu Asn Tyr His Leu 10 Ala Thr Tyr Gly Lys Pro Arq Asp Val Leu Cys Phe Leu Thr Pro Arg Arg Pro Phe Lys Gly Asn Pro Gin Trp Gly Val Phe Tyr Phe 40 Ala Asn Ala Thr Ser Val Ile Val Gly Asn Phe Lys Thr Ala Met Pro Phe Asn Tyr Asp Leu Val Asp Asp Pro Ile Gly Thr Leu Asp Asn Asn 90 Leu His Ser Pro Pro Val Gly Arg Ala Ala Trp, Leu 115 Met Tyr Phe Asp Gin Lys Asn 110 Tyr Val Gly Phe Ser Phe Leu 120 Asn Ser Thr Thr Leo Asn Phe Ala Gly Ala Asp Pro Leu Leo Asn Lys Thr Arg Asp 130 135 140 WO 98/20113 PCTIUS97/20391 -74- Ile Ser Val Ile Gly Gly Thr Gly Asp Phe Phe Met Ala Arg 145 150 155 Ala Thr Leu Met Thr Asp Ala Phe Glu Gly Asp Val Tyr Phe 165 170 Gly Val 160 Arg Leu 175 Arg Val Asp Ile Asn Leu Tyr Glu Cys Trp 180 185 INFORMATION FOR SEQ ID NO:14: SEQUENCE CHARACTERISTICS: LENGTH: 858 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: Forsythia intermedia cDNA PSD-Fi2 (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (ix) FEATURE: NAME/KEY: CDS LOCATION: 19..573 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14: AATTCGGCAC GAGGAAAA ATG GCA GCT Met Ala Ala ACA CAA ACC ACA Thr Gin Thr Thr GCC CTT TTC Ala Leu Phe 195 AAA ACC AGG Lys Thr Arg CTC TGC CTC Leu Cys Leu 200 CTC ATC TGC ATC Leu Ile Cys Ile
TCC
Ser 205 GCC GTG TAC GGC Ala Val Tyr Gly
CAC
His 210 TCT CGA Ser Arg 215 CGC CCC TGT AAA Arg Pro Cys Lys
GAG
Glu 220 CTC GTT TTC TTC Leu Val Phe Phe
TTC
Phe 225 CAC GAC ATC CTC His Asp Ile Leu
TAC
Tyr 230 CTA GGA TAC AAT Leu Gly Tyr Asn AAC AAT GCC ACC Asn Asn Ala Thr
GCT
Ala 240 GTC ATA GTA GCC Val Ile Val Ala CCT CAA TGG GGA Pro Gin Trp Gly
AAC
Asn 250 AAG ACT GCC ATG Lys Thr Ala Met AAA CCT TTC AAT Lys Pro Phe Asn TTT GGT Phe Gly 260 GAT TTG GTT Asp Leu Val TCT CCT CCG Ser Pro Pro 280
GTG
Val 265 TTT GAT GAT CCC Phe Asp Asp Pro
ATT
Ile 270 ACC TTA GAC AAC Thr Leu Asp Asn AAC CTG CAT Asn Leu His 275 GAT CAA TGG Asp Gin Trp 243 291 339 GTC GGC CGG GCT Val Gly Arg Ala GGA ACT TAT TTC Gly Thr Tyr Phe I WO 98/20113 PCTIUS97/20391 AGT *ATT TAT GGT GCA TGG CTT GGA TTT Ser Ile 295 GAT TAT Asp Tyr Tyr Gly Ala Trp Leu Gly Phe TCA TTT TTG Ser Phe Leu 305 GGA GCT CAT Gly Ala Asp TTC AAT TCT ACT Phe Asn Ser Thr CCA TTG ATT AAC Pro Leu Ile Asn GTT GGA ACT Val Gly Thr 310
AAA
Lys
CTA
Leu 315
TCA
Ser TTT GCT Phe Ala ACT AGG GAC Thr Arg Asp GTA ATT GGA Val Ile Gly 320
ACT
Thr
GGA
Gly 335
GAT
Asp GGT GAT TTT Gly Asp Phe 325 TTC ATG Phe Met 340 GAT GTT Asp Val GCT AGA GGG Ala Arg Gly TAT TTC AGG Tyr Phe Arg 360
GTA
Val 345
CTT
Le u ACT GTG Thr Val TCG ACC Ser Thr 350 ATT AGG Ile Arg 365 GCT TTT GAA Ala Phe Glu
CG
Gly 355
TGG
Trp CGT GTT GAT Arq Val Asp TTG TAT GAG Leu Tyr Glu
TGT
Cys 370
TAAATTTACC
CTGTAATCCT
ATATTTTAAT
TTTTTCCGTT
AAAATTTGCT
TTATTTTTCC
TGTTTTTGAT
CTGTTAAAAA
AAGGGGAAAA
TTTCAATCAT
ATTTTCTTGA
CAATTTGTGG
AAATTGTGGT
AAAAGTATGT
CTTCTTCAA
GTTTGACTCG
CGATTTTATC
CAAAAGCCAX
CCATGTGTTA
AAA AAAA AA A GATTTGACTA
ATAATGTCTT
AATTAGTGAT
TGTTTGGTTC
TAACCACAAC CGTAGGGAGT CTACGTTTTC AATTTCATTC 633 693 753 813 858 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 185 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: Forsythia intermedia dirigent protein PSD-Fi2 (xi) SEQUENCE Ala Ala Lys Thr 5 Met 1 Cys DESCRIPTION: SEQ ID Gin Thr Thr Ala Leu Phe Leu Cys Leu Leu Ile 10 Ile Ser Ala Val Tyr Giy H-is Lys Thr 25 Lys Giu Leu Val Phe Phe Phe His Asp Ile 40 Arg Ser Arg Leu Tyr Leu Arg Pro Cys Gly Tyr Asn Trp Gly Asn Arg Asn Asn Ala Thr Lys Thr Ala Met Ala Ala Val Ile Val Ala Ser Pro 55 Lys Pro Phe Asn Phe Gly Asp 70 Gin Asp Leu Val Val Asp Pro Ile Thr Leu Asp Asn Asn Leu His Ser Pro Pro WO 98/20113 PCTIUS97/20391 -76- Arg Ala Gin Gly Thr Tyr Phe Tyr Asp Gin Trp Ser Ile Tyr Gly Ala 100 105 110 Trp Leu Gly Phe Ser Phe Leu Phe Asn Ser Thr Asp Tyr Val Gly Thr 115 120 125 Leu Asn Phe Ala Gly Ala Asp Pro Leu Ile Asn Lys Thr Arg Asp Ile 130 135 140 Ser Val Ile Gly Gly Thr Gly Asp Phe Phe Met Ala Arg Gly Val Ala 145 150 155 160 Thr Val Ser Thr Asp Ala Phe Glu Gly Asp Val Tyr Phe Arg Leu Arg 165 170 175 Val Asp Ile Arg Leu Tyr Glu Cys Trp 180 185 INFORMATION FOR SEQ ID NO:16: SEQUENCE CHARACTERISTICS: LENGTH: 948 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: Tsuga heterophylla dirigent protein cDNA PSD-Thl (iii) HYPOTHETICAL:
NO
(iv) ANTI-SENSE: NO (ix) FEATURE: NAME/KEY: CDS LOCATION: 104..688 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:16: GGGCACCCTC TCTTGTTAAT TGAGCCCTTC TCCTCCTACT TCTCTTGTTA GTTCTTTGAT CCCATATCTT CTTCTATAAT CACTTTAGTC TATAAGATTG TCA ATG GCA ATC AAG 115 Met Ala Ile Lys AAT CGT AAT AGA GCT GTG CAC TTG TGT TTT CTA TGG CTT CTA CTG TCC 163 Asn Arg Asn Arg Ala Val His Leu Cys Phe Leu Trp Leu Leu Leu Ser 190 195 200 205 TCT GTG TTG TTG CAA ACA AGT GAT GGG AAA AGC TGG AAG AAG CAC CGA 211 Ser Val Leu Leu Gin Thr Ser Asp Gly Lys Ser Trp Lys Lys His Arg 210 215 220 CTC CGA AAG CCT TGT AGG AAT CTG GTG TTG TAT TTC CAT GAT GTA ATC 259 Leu Arg Lys Pro Cys Arg Asn Leu Val Leu Tyr Phe His Asp Val Ile 225 230 235 TAC AAT GGC AGC AAC GCC AAG AAC GCT ACA TCC ACA CTT GTG GGT GCT 307 Tyr Asn Gly Ser Asn Ala Lys Asn Ala Thr Ser Thr Leu Val Gly Ala 240 245 250 WO 98/20113 WO 9820113PCTIUS97/20391
CCC
Pro
GGA
Gly 270
CAC
His GAC GGG TGT AAG GTC AGA TT CTC GCT GGA AAA GAG AAG GAG TTT Gly Ser Asn Leu Thr 260
GAG
Asp Leu Leu Ala Gly Lys 265
GTT
Leu Asp Asn His Phe GTG GGG GTG Leu Ala Val
TTT
Phe 275
GGG
Gly GAT GGG ATC Asp Pro Ile
ACT
Thr 280
TTG
Phe GAG AAC AAT Asp Asn Asn
TTC
Phe 285 TGT CGT CG Ser Pro Pro
GTG
Val 290
AGC
Ser AGA GGT GAG Arg Ala Gin
GGA
Gly 295
TTG
Phe TAG TTT TAT Tyr Phe Tyr GAG ATG Asp Met 300 AAG AAG AGG Lys Asn Thr AGA GAT TAG Thr Asp Tyr 320 AGT AAA TAG Thr Lys Tyr
TTG
Phe 305
AAA
Lys TGG TGG GTT Ser Trp Leu ACG TTT GTA Thr Phe Val GGG AGG ATC Gly Thr Ile
AG
Thr 325
GTG
Val1 TGT GGA GCG Ser Gly Ala GTG AAC TGT Leu Asn Ser 315 GGA ATC GTT Pro Ile Leu GAT TTG ATA Asp Phe Ile 499 AGA GAT ATA Arg Asp Ile 335 ATG GGA Met Ala
TGA
Ser 340
AGA
Thr GTG GGA GGA Val Gly Gly
ACT
Thr 345
GCG
Ala AGA GGA ATG Arq Gly Ile 350
GTT
Val1
GGG
Al a 355
TG
Gys ATG TGG AGG Ile Ser Thr
GAT
Asp 360 TAT GAA GGG Tyr Glu Gly
GAG
Asp 365 TAG TTG GGT Tyr Phe Arg
GTG
Leu 370 GTG AAT ATG Val Asn Ile AGA CTC Thr Leu 375 TAT GAG TGG Tyr Giu Gys
TAG
Tyr 380
TGAGTGCTAT
GTAGTGTTTT
GTTTAGCAGG
GAAAATGGAG
TAAAAAAAAA
AGGTGTATTT TGTGGTTGGA GTTGTGGGAG AGATATGGAG GGAATAATGT ATTTGGATTT AATTGTATGT GGGTGAAGGA AAAaAAA
GTATGGATTT
GAAGCTGTGA
TGTCGAAGGG
GTTTTATTTA
ATATGTTGAT TTTAGTTGAA GATATTGTAG GGTGAAGTTC GATATGTAAT ATTGTGAAGG AAAATAAAAG AAATATTGGT 748 808 868 928 948 INFORMATION FOR SEQ ID NO:17: SEQUENGE GHARAGTERISTIGS: LENGTH: 195 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: Tsuga. heterophylia dirigent protein PSD-Thi (xi) SEQUENGE DESCRIPTION: SEQ ID NO:17: Met Ala Ile Lys Asn Arg Asn Arg Ala Val His Leu Gys Phe Leu Trp 1 5 10 Leu Leu Leu Ser Ser. Val Leu Leu Gin Thr Ser Asp Gly Lys Ser Trp 25 WO 98/20113 PCT/US97/20391 -78- Lys Lys His Arg Leu Arg Lys Pro Cys Arg Asn Leu Val Leu Tyr Phe 40 His Asp Val Ile Tyr Asn Gly Ser Asn Ala Lys Asn Ala Thr Ser Thr 55 Leu Val Gly Ala Pro His Gly Ser Asn Leu Thr Leu Leu Ala Gly Lys 70 75 Asp Asn His Phe Gly Asp Leu Ala Val Phe Asp Asp.Pro Ile Thr Leu 90 Asp Asn Asn Phe His Ser Pro Pro Val Gly Arg Ala Gin Gly Phe Tyr 100 105 110 Phe Tyr Asp Met Lys Asn Thr Phe Ser Ser Trp Leu Gly Phe Thr Phe 115 120 125 Val Leu Asn Ser Thr Asp Tyr Lys Gly Thr Ile Thr Phe Ser Gly Ala 130 135 140 Asp Pro Ile Leu Thr Lys Tyr Arg Asp Ile Ser Val Val Gly Gly Thr 145 150 155 160 Gly Asp Phe Ile Met Ala Arg Gly Ile Ala Thr Ile Ser Thr Asp Ala 165 170 175 Tyr Glu Gly Asp Val Tyr Phe Arg Leu Cys Val Asn Ile Thr Leu Tyr 180 185 190 Glu Cys Tyr 195 INFORMATION FOR SEQ ID NO:18: SEQUENCE CHARACTERISTICS: LENGTH: 849 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: Tsuga heterophylla dirigent protein PSD-Th2 cDNA (iii) HYPOTHETICAL:
NO
(iv) ANTI-SENSE: NO (ix) FEATURE: NAME/KEY: CDS LOCATION: 71..625 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18: GTTCTGTTCC AAATTCTAAT TAGCCTTCCA TTCATTCCAG GATCCCACTC TTCTTCCTTC AAGATTGGCA ATG GCT ATC AAG AGT AAT AGG GCT GTG CGT TTC TGC TTT 109 Met Ala Ile Lys Ser Asn Arg Ala Val Arg Phe Cys Phe 200 205 WO 98/20113 PCTIUS97/20391 -79- GTA TGG Val Trp 210 CTT CTG TTG TTG Leu Leu Leu Leu AGT GGT TTT GTA Ser Gly Phe Val
TTT
Phe 220 CCA GTC GCA GAG Pro Leu Pro Gin
CCT
Pro 225 TGT AGG AAT GTG Cys Arg Asn Leu
GTT
Val 230 TTG TAT TTC CAC Leu Tyr Phe His
GAT
Asp 235 GTA CTC TAG AAT Val Leu Tyr Asn
GGG
Gly 240 TTC AAC GCC CAC Phe Asn Ala His GCT AGA TCT ACA Ala Thr Ser Thr GTG GGT GCT CCA Val Gly Ala Pro GAG GGG Gin Gly 255 GCT AAG CTG Ala Asn Leu GCG GTG TTC Ala Val Phe 275
ACA
Thr 260 CTT CTC GCT GGA Leu Leu Ala Gly
AAA
Lys 265 GAC AAC CAC TTT Asp Asn His Phe GGA GAT CTG Gly Asp Leu 270 CAG TCT CCT Gin Ser Pro GAG GAT CCC ATC Asp Asp Pro Ile CTT GAC AAG AAT Leu Asp Asn Asn CCG GTG Pro Val 290 GGC AGA GCT GAG Gly Arq Ala Gin
GGA
Gly 295 TTC TAG TTT TAT Phe Tyr Phe Tyr ATG AAG AAG ACC Met Lys Asn Thr
TTG
Phe 305 AGG TCC TGG GTT Ser Ser Trp Leu TTG AGG TTT GTA Phe Thr Phe Val
CTG
Leu 315 AAG TGT AGA GAT Asn Ser Thr Asp AAA GGG ACC ATG Lys Gly Thr Ile
ACG
Thr 325 TTG TGT GGA GCC Phe Ser Gly Ala
GAT
Asp 330 CCA ATC CTT ACT Pro Ile Leu Thr AAA TAG Lys Tyr 335 445 493 541 589 AGA GAT ATA Arq Asp Ile GGA ATG GCC Gly Ile Ala 355 GTG GTG GGA GGA Vai Val Gly Gly GGA GAT TTG ATA Gly Asp Phe Ile ATG GGA AGA Met Ala Arg 350 GTT TAG TTG Val Tyr Phe ACA ATG TCC ACC Thr Ile Ser Thr GCG TAT GAA GGA Ala Tyr Glu Gly GGT CTC Arg Leu 370 CGC GTG AAT ATG Arg Val Asn Ile
ACA
Thr 375 CTC TAT GAA TGG Leu Tyr Giu Cys
TAG
Tyr 380
TGATATTATT
AAGTAGGTAG TGTTTCTCGT GTGGTCTCGC CATTTCGATG GTCTTTTTAA GATTAGTGCT TTCCATAAAT TGTTGTAGCC TCTCAATAAA ACCCAGTAAA ATATTTGTTG TGTTTATTTA GCAGGTTCCA AATGATTGTA TTAGTATTTT ATATTATTTG GATTTTATAG AAGTCCATAA AATATTTCTT GAGGTAAAA AAAAAAAAAA AAAA INFORMATION FOR SEQ ID NO:19: SEQUENGE GHARAGTERISTIGS: LENGTH: 185 amino acids TYPE: amino acid TOPOLOGY: linear
WO 98/20113 PCTIU~S97/20391 (ii) MOLECULE TYPE: Tsuga heterophylla dirigent protein translated from PSD-Th2 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19: Met Ala Ile Lys Ser Asn Arg Ala Val Leu Asn His Thr Asp Arg Trp Ile Ser 145 Thr Val Leu Leu Asn Leu Asp Ala Leu Thr 130 Val Ile Asn Leu Val Ala Leu Pro Gin Gly 115 Phe Val Ser Ile Gin Leu Thr Ala Ile Gly 100 Phe Ser Gly Thr Thr 180 Ser Tyr Ser Gly Thr Phe Thr Gly Gly Asp 165 Leu Gly Phe Thr Lys 70 Leu Tyr Phe Ala Thr 150 Ala Tyr Phe His Leu 55 Asp Asp Phe Val Asp 135 Gly Tyr Glu Val Asp 40 Val Asn Asn Tyr Leu 120 Pro Asp Glu Cys Phe 25 Val Gly His Asn Asp 105 Asn Ile Phe Gly Tyr 185 Arg Phe Pro Leu Leu Tyr Ala Pro Phe Gly 75 Phe Gln 90 Met Lys Ser Thr Leu Thr Ile Met 155 Asp Val 170 Cys Pro Asn Gln Asp Ser Asn Asp Lys 140 Ala Tyr Phe Gin Gly Gly Leu Pro Thr Tyr 125 Tyr Arg Phe Val Pro Phe Ala Ala Pro Phe 110 Lys Arg Gly Arg Trp Cys Asn Asn Val Val Ser Gly Asp Ile Leu 175 Leu Arg Ala Leu Phe Gly Ser Thr Ile Ala 160 Arg INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 873 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: Thuja plicata dirigent protein PSD-Tpl cDNA (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (ix) FEATURE: NAME/KEY: CDS LOCATION: 25..591 WO 98/20113 PCTIUS97/20391 -81- (xi) SEQUENCE DESCRIPTION: SEQ ID GTTGCACOAG GGATTTCAAG AGAT ATG AGT AGA ATA GCA TTT CAT TTG TGC Met Ser Arg Ile Ala Phe His Leu Cys 190
TTC
Phe 195 ATG GGG CTT CTG Met Gly Leu Leu
CTC
Leu 200 TCT TCC ACO GTG Ser Ser Thr Val
CTC
Leu 205 AGA PAT GTA GAT Arg Asn Val Asp CAT GCA TGG AAG His Ala Trp Lys CAA CTT CCA ATG Gin Leu Pro Met TGT AAG PAT TTG Cys Lys Asn Leu GTG CTC Val Leu 225 TAC TTT CAT Tyr Phe His OCT GCG CTG Ala Ala Leu 245
GAT
Asp 230 ATA CTC TAC PAT Ile Leu Tyr Asn
GGC
Gly 235 AAA AAC ATT CAC Lys Asn Ile His PAT GCA ACT Asn Ala Thr 240 ACT TTC GCT Thr Phe Ala OTT GCA OCT CCT Val Ala Ala Pro
GCG
Ala 250 TGG GGC AAT CTC Trp Oly Asn Leu GAA CCT Glu Pro 260 TTC PAG TTT OGA Phe Lys Phe Gly
OAT
Asp 265 GTG GTT GTG TTT Val Val Val Phe
GAC
Asp 270 GAT CCC ATT ACT Asp Pro Ile Thr
CTC
Leu 275 GAC AAC PAT CTT Asp Asn Asn Leu TCT CCT CCT GTG Ser Pro Pro Val
OGA
Gly 285 AGA GCG CAG GGA Arq Ala Gin Oly 339 387 TAT TTG TAC PAC Tyr Leu Tyr Asn
ATG
Met 295 AAG ACT ACT TAC Lys Thr Thr Tyr GCT TOG TTG GGG Ala Trp Leu Gly TTC ACA Phe Thr 305 TTT GTG CTG Phe Val Leu 0CC GAC CCC Ala Asp Pro 325 TCG ACA OAT TAT Ser Thr Asp Tyr GGC ACA ATC ACC Oly Thr Ile Thr TTC PAT GGC Phe Asn Oly 320 OTT GOC GGT Val Gly Oly CCO CTG GTT AAG Pro Leu Val Lys
TAC
Tyr 330 AGA OAT ATA TCC Arg Asp Ile Ser
OTT
Va1 335 ACO GGT Thr Oly 340 OAT TTC TTG ATO Asp Phe Leu Met
GCG
Ala 345 AGA OGA ATT 0CC Arg Gly Ile Ala
ACC
Thr 350 CTT TCT ACT OAT Leu Ser Thr Asp
OCA
Ala 355 ATC GAG GGA PAT Ile Olu Oly Asn TAT TTC CGA CTC Tyr Phe Arg Leu
AGO
Arg 365 OTT PAC ATC ACA Val Asn Ile Thr TAC GAO TOT TAC Tyr Glu Cys Tyr TGATOATTAC TPACTAAATO GAGAGTCTTT GTTTAGAGAA TAGTGTGTTO GOCTOTTTAC TTAAAOTCGA COTTCTATOC AOTTGAAGTC TTTOTTTAGA TGAATGCAAT GGTGGGTTTT CTTTCCTCGT GAGGGTTAAC ATCACACTCT ACGAGTGTTA CTOATPATTO TTAAOTATTT GGAGAOTCTT GTAAGTTGAO AATPATOTAT TTTGGCTGTT WO 98/20113 PCT[US97/20391 -82- TATTTTGAGT CGAAAAAAAA AAAAAA AAAAAA AAAAA AAAAAA
AA
INFORMATION FOR SEQ ID NO:21: SEQUENCE CHARACTERISTICS: LENGTH: 189 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: Met Ser Arg Ile Ala Phe His Leu Cys Phe Met Gly Leu Leu Leu Ser Ser Pro Asn Ala Val1 Pro Thr T yr Tyr 145 Arq Phe Thr Met Gly Trp Val Pro Tyr Lys 130 Arg Gly Arg Val Pro Lys Gi y Val Val1 As n 115 Gi y Asp Ile Leu Leu Cys Asn As n Phe Gly 100 Al a Thr Ile Al a Arg 180 Arg Lys Ile Leu Asp Arg Trp Ile Ser Thr 165 Val1 Asn Asn His Thr 70 Asp Ala Le u Thr Val1 150 Leu Asn Val1 Leu Asn 55 Thr Pro Gln Gly Phe 135 Val Ser Ile Asp Val Ala Phe Ile Gly Phe 120 As n Gi y Thr Thr Gly Leu Thr Ala Thr Phe 105 Thr Gly Gly Asp Leu 185 Al a Ph e Al a Pro Asp Leu Val Asp Gly 155 Ile Glu Trp His Leu Phe As n Tyr Leu Pro 140 Asp Giu C ys Gin Leu Al a Gi y His Lys Thr Val1 Met Val 175 INFORMATION FOR SEQ ID NO:22: SEQUENCE CHARACTERISTIC S: LENGTH: 867 base pairs TYPE: nucleic acid STRANDEDNESS: single WO 98/20113 PCTIUS97/20391 -83- TOPOLOGY: linear (ii) MOLECULE TYPE: Thuja plicata dirigent protein PSD-Tp2 cDNA (iii) HYPOTHETICAL:
NO
(iv) ANTI-SENSE: NO (ix) FEATURE: NAME/KEY: CDS LOCATION: 80..655 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: GCAATATTGT GCTGGTTCAG TAATCTATGT CTTGTTGACC TGTAGTGTAT ACCCAAACAT TTCTCCTTCT TTTGCAAAA ATG GCA ATG AAG GCT GCA AAA TTT CTG CAT TTC Met Ala Met Lys Ala Ala Lys Phe Leu His Phe TTA TTT ATC TGG Leu Phe Ile Trp CTA GTC TGC ACT Leu Val Cys Thr TTG CTC AAA TCT Leu Leu Lys Ser GCA GAC Ala Asp 215 TGT CAT AGA Cys His Arg TTG TAC TTT Leu Tyr Phe 235
TGG
Trp 220 AAG AAG AAA ATT Lys Lys Lys Ile
CCA
Pro 225 GAG CCA TGT AAG Glu Pro Cys Lys AAT CTG GTA Asn Leu Val 230 CAC AAT GCA His Asn Ala CAT GAT ATC CTC His Asp Ile Leu AAT GGA TCC AAC Asn Gly Ser Asn
AAA
Lys 245 256 ACA TCT Thr Ser 250 GCA ATT GTT GGA Ala Ile Val Gly
GCA
Ala 255 CCC AAA GGA GCC Pro Lys Gly Ala
AAT
Asn 260 CTC ACT ATT TTG Leu Thr Ile Leu
ACT
Thr 265 GGT AAC AAC CAT Gly Asn Asn His GGA GAT GTG GTT Gly Asp Val Val
GTG
Val 275 TTT GAT GAT CCT Phe Asp Asp Pro
ATT
Ile 280 ACT CTT GAC AAC Thr Leu Asp Asn
AAT
Asn 285 CTT CAC TCT ACT Leu His Ser Thr
CCT
Pro 290 GTG GGA AGA GCT Val Gly Arg Ala CAG GGC Gin Gly 295 TTT TAT TTC Phe Tyr Phe GAC ATG AAG AAT Asp Met Lys Asn TTC AAT TCT TGG Phe Asn Ser Trp CTT GGG TTT Leu Gly Phe 310 ACC TTC AAT Thr Phe Asn ACA TTT GTG TTG Thr Phe Val Leu 315 AAT TCA ACA Asn Ser Thr TAT AAG GGC ACC Tyr Lys Gly Thr GGG GCT Gly Ala 330 GAC.CCA ATT CTG Asp Pro Ile Leu
ACT
Thr 335 AAG TAC AGA GAT Lys Tyr Arg Asp
ATA
Ile 340 TCT GTT GTG GGT Ser Val Val Gly
GGT
Gly 345 ACG GGT GAT TTC Thr Gly Asp Phe
TTG
Leu 350 ATG GCC AGA Met Ala Arg GGA ATC Gly Ile 355 GCC ACC ATT TCT Ala Thr Ile Ser 592 WO 98/20113 PCTfUS97/20391 -84- GAT GCA TAC GAG GGA GAT GTT TAT TTC CGT CTT AGG.GTG AAT ATC ACT Asp Ala Tyr Glu Gly Asp Val Tyr Phe Arg Leu Arg Val Asn Ile Thr 365 370 375 CTC TAT GAG TGT TAC TGATTCGAAT TTGATTTCCT GTTCTAATCT CTAATTTGAG Leu Tyr Glu Cys Tyr 380 AGGATGAACA TTCAATAAAC TTTATAGAAG CATATATAAA TAGGTGCAGG AAAATAAGAG GTAAGGGATG AGATTATTTC AGCCTCATAT CTTATTCTGC ATCAGTTTTG TATGCTCATT TGTTTAATAA AATTTGACCA GTTTCATCAT GTTGAAAAAA AAAAAAAAAA AA INFORMATION FOR SEQ ID NO:23: SEQUENCE CHARACTERISTICS: LENGTH: 192 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 640 695 755 815 867 Met Ala Met Lys Ala Ala Lys Phe Leu His Phe Leu'Phe Ile Trp Leu Le u Lys Ile Gi y Phe Leu Met Ser Le u 145 Val1 Lys Leu Ala Gly His Lys Thr 130 Thr Cys Ile Tyr Pro Asp Ser Asn 115 Asn Lys Thr Pro Asn Lys Val1 Thr 100 Thr T yr Tyr ValI Glu Gly Gi y Val1 Pro Phe Lys Arg Leu Pro Ser Al a 70 Val1 Val1 Asn Gly Asp 150 Leu Cys Asn 55 Asn Phe Gly Ser Thr 135 Ile Lys Lys Lys Leu Asp Arg Trp 120 Ile Ser Ser 25 As n His Thr Asp Al a 105 Leu Thr Val1 Asp Val Al a Leu 75 Ile Gly Phe As n Gly 155 Cys Leu Thr Thr Thr Phe Thr Gly 140 Gly Arg Phe Al a Asn Asp Phe 110 Val Asp Gi y Trp His Ile Asn Asn Tyr Leu Pro Asp Lys Asp Val His Asn Asp Asn Ile Phe 160 Leu Met Ala Arg Ile Ala Thr Ile Ser Thr Asp Ala Tyr Glu Gly .175 WO 98/20113 PCTIUS97/20391 Asp Val Tyr Phe Arg Leu Arg Val Asn Ile Thr Leu Tyr Glu Cys Tyr 180 185 190 INFORMATION FOR SEQ ID NO:24: SEQUENCE CHARACTERISTICS: LENGTH: 914 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: Thuja plicata dirigent protein PSD-Tp3 cDNA (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (ix) FEATURE: NAME/KEY: CDS LOCATION: 94..669 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: CGTAGGAAAT ATCTCAGAGG GAGCCGAAAA TTGAGATAAT TGTTGTACGA AATATATAAA AGATTAGATT CAGAGGAATT TGCAGATGTT GTT GTA TCT 
AAA ACA GCT GCT AGA Val Ser Lys Thr Ala Ala Arg 195 CTG CAT TTA TGC Leu His Leu Cys
TTT
Phe 205 CTA TGG CTT CTA Leu Trp Leu Leu
GTA
Val 210 TCT GCA ATC TTC Ser Ala Ile Phe
ATA
Ile 215 AAA TCT GCA GAT Lys Ser Ala Asp CGT AGC TGG AAA Arg Ser Trp Lys AAG CTT CCA AAG Lys Leu Pro Lys CCC TGT Pro Cys 230 AGA AAT CTT Arg Asn Leu GCA GAG AAT Ala Glu Asn 250
GTG
Val 235 TTA TAT TTT CAT Leu Tyr Phe His
GAT
Asp 240 ATA ATC TAC AAT Ile Ile Tyr Asn GGC AAA AAT Gly Lys Asn 245 GGA GCT AAT Gly Ala Asn GCA ACA TCT GCA Ala Thr Ser Ala GTT TCA GCC CCT Val Ser Ala Pro
CAA
Gin 260 CTC ACC Leu Thr 265 ATT ATG ACT GGT Ile Met Thr Gly
AAT
Asn 270 AAC CAT TTT GGG Asn His Phe Gly
AAT
Asn 275 CTT GCA GTG TTT Leu Ala Val Phe
GAT
Asp 280 GAT CCT ATT ACT Asp Pro Ile Thr GAC AAC AAT CTT Asp Asn Asn Leu
CAC
His 290 TCT CCT CCT GTT Ser Pro Pro Val
GGA
Gly 295 AGA GCT CAG GGC Arg Ala Gin Gly
TTT
Phe 300 TAC TTC TAT GAC Tyr Phe Tyr Asp
ATG
Met 305 AAG AAC ACC TTC Lys Asn Thr Phe AGT GCC Ser Ala 310 WO 98/20113 PCT/US97/20391 -86- TGG CTT GGC TTC ACA TTT GTG CTC AAT TCA ACT Trp Leu Gly Phe Thr Phe Val Leu Asn Ser Thr 315 320 ATT ACT TTC AAT GGA GCA GAT CCC ATC TTA ACA Ile Thr Phe Asn Gly Ala Asp Pro Ile Leu Thr 330 335 TCT GTT GTG GGT GGA ACA GGG GAT TTC TTG ATG Ser Val Val Gly Gly Thr Gly Asp Phe Leu Met 345 350 ACC ATT TCT ACT GAC TCA TAT GAG GGA GAT GTT Thr Ile Ser Thr Asp Ser Tyr Glu Gly Asp Val 360 365 370 GTC AAT ATC ACA CTC TAT GAG TGT TAC TGAACAA Val Asn Ile Thr Leu Tyr Glu Cys Tyr 380 TATTTCTAGT TTTTGGGACC TTTTAAAGAT AGTTGTTTAC TAACACTGTG TGAAGATTAT ATACGATGGA CTATAGAAAC AGCTAATTTA TGTATATGAT CCACTCATAT CTCTTAATAT CCAGATAAAG TATGTCATGT GCTTTGACAA AAAAAAAAAA INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 192 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO: Val Ser Lys Thr Ala Ala Arg Val Leu His Leu 1 5 10 Leu Val Ser Ala Ile Phe Ile Lys Ser Ala Asp 25 Lys Lys Leu Pro.Lys Pro Cys Arg Asn Leu Val Ile Ile Tyr Asn Gly Lys Asn Ala Glu Asn Ala 55 Ser Ala Pro Gin Gly Ala Asn Leu Thr Ile Met 70 Phe Gly Asn Leu Ala Val Phe Asp Asp Pro Ile 90 Leu His Ser Pro Pro Val Gly Arg Ala Gin Gly 100 105 GAT CAC AAG GGC Asp His Lys Gly 325 AAG TAC AGA GAC Lys Tyr Arg Asp 340 GCA AGA GGA ATT Ala Arg Gly Ile 355 TAT TTC AGG CTT Tyr Phe Arg Leu ATT CCTTGCTCTG
TCC
Ser
ATA
Ile
GCT
Ala
AGG
Arg 375 TTCAATGTCT CTATATGTAA TATGTTGAAT TCTGTTCTGT GATACCGATT TGTAATTATC
AAAAA
749 809 869 914 Cys Cys Leu Thr Thr Thr Phe Leu Ser Phe Ala Asn Asp Phe 110 Trp Trp His Leu Asn Asn Tyr WO 98/20113 PCT/US97/20391 -87- Met Lys Asn 115 Thr Phe Ser Ala Trp Leu 120 Gly Phe Thr Val Leu Asn Ser Thr 130 Asp His Lys Gly Ser 135 Ile Thr Phe Asn Gly 140 Ala Asp Pro Ile Leu 145 Thr Lys Tyr Arg Ile Ser Val Val Gly 155 Gly.Thr Gly Asp Leu Met Ala Arg Gly 165 Ile Ala Thr Ile Thr Asp Ser Tyr Glu Gly 175 Asp Val Tyr Arg Leu Arg Val Ile Thr Leu Tyr Glu Cys Tyr 190 INFORMATION FOR SEQ ID NO:26: SEQUENCE CHARACTERISTICS: LENGTH: 704 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: Thuja plicata dirigent protein PSD-Tp4 cDNA (iii) HYPOTHETICAL:
NO
(iv) ANTI-SENSE: NO (ix) FEATURE: NAME/KEY: CDS LOCATION: 3..416 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: AG AAT GCC CAC AAT GCA ACA TCT GCA CTT GTT GCA GCC CCT GAG GGA Asn Ala His Asn Ala Thr Ser Ala Leu Val Ala Ala Pro Glu Gly GCC AAT CTC Ala Asn Leu 210 ACC ATT ATG ACT Thr Ile Met Thr
GGT
Gly 215 AAT AAC CAT TTT Asn Asn His Phe AAT ATT GCT Asn Ile Ala GTG TTT Val Phe 225 GAT GAT CCT ATT Asp Asp Pro Ile CTT GAC AAC AAT Leu Asp Asn Asn
CTT
Leu 235 CAC TCT CCT TCT His Ser Pro Ser
GTT
Val 240 GGA AGA GCT CAG Gly Arg Ala Gin
GGC
Gly 245 TTT TAC TTC TAT Phe Tyr Phe Tyr
GAC
Asp 250 ATG AAG GAT ACC Met Lys Asp Thr
TTC
Phe 255 AAT GCT TGG CTT Asn Ala Trp Leu
GGT
Gly 260 TTT ACA TTT GTG Phe Thr Phe Val
CTG
Leu 265 AAT TCA ACT GAT Asn Ser Thr Asp CAC AAG His Lys 270 GGC ACC ATT Gly Thr Ile TTC AAT GGA GCA Phe Asn Gly Ala
GAT
Asp 280 CCA ATC CTG ACC Pro Ile Leu Thr AAG TAC AGA Lys Tyr Arg 285 WO 98/20113 PCT/US97/20391 -88- GAT ATA TCT GTT GTG GGT GGA ACA GGG GAT TTC Asp Ile Ser Val Val Gly Gly Thr Gly Asp Phe 290 295 ATT GCC ACC ATT TCT ACT GAT TCA TAT GAG GGA Ile Ala Thr Ile Ser Thr Asp Ser Tyr Glu Gly 305 310 CTT AGG GTC AAT ATC ACA CTC TAT GAG TGT TAC Leu Arg Val Asn Ile Thr Leu Tyr Glu Cys Tyr 320 325 330 ATTACTAGCT TATAGGAGTC ATTCCCTGGT TCAATGTCTA TGAAGATGGT TTTGAAATAT GGAGCATGTA TTCTAATTTG ATTTTACAGA GTTTAGTTTT GCCCTCTAGA ATATTATGTT TCATATGATG TATGGAGTAC CATTTGGAAT AATTAAAGCA AAAAAAAAAA AAAAAAAAAA AAAAAAAA INFORMATION FOR SEQ ID NO:27: SEQUENCE CHARACTERISTICS: LENGTH: 138 amino acids TYPE: amino acid TOPOLOGY: linear TTG ATG GCC AGA GGA Leu Met Ala Arg Gly 300 GAT GTT TAT TTC AGG Asp Val Tyr Phe Arg 315 TAAAAATGAA TTTCCTCTGT 335 383 436 496 556 616 676 704
GGGCATGGAA
AAGAGCCCTC
TTCAAAATGC
AGCATATTTT
TAAAAGAATT
AAGGAAGTGC
TCTATGAAAG
ATTAAAAAAA
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:
Asn Ala His Asn Ala Thr Ser Ala Leu Ala Ala Pro Glu Asn Leu Thr Phe Asp Asp Gly Arg Ala Met Thr Gly Asn Ile Thr Leu Asp Gly Ala Asn Phe Gly Asn Asn Asn Leu His Ser Asp Ile Ala Val Pro Ser Val Thr Phe Asn Gln Gly Phe Ala Trp Tyr Phe Tyr Asp Met Leu Gly Phe Val Leu Asn Thr Ser Leu Asp His Lys Ile Thr Phe Asn Gly Ala Asp Pro Thr Lys Tyr Ile Ser Val Ala Thr Ile 115 Gly Thr Gly Asp 105 Glu Leu Met Ala Arg Asp Gly Ile Arg Leu Thr Asp Ser Gly Asp Val Arg Val Asn Ile Thr Leu Tyr Glu Cys Tyr 130 135

INFORMATION FOR SEQ ID NO:28:
SEQUENCE CHARACTERISTICS:
LENGTH: 820 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: Thuja plicata dirigent protein PSD-Tp5 cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(ix) FEATURE: NAME/KEY: CDS LOCATION: 43..612
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:
GTCTAATTGA GAGAAAATTC CAATAATTTT TTACCAATAG CA ATG AAA GCC ATT Met Lys Ala Ile 140 AGA GTT CTG Arg Val Leu 145 CAT TTA TGC TTT His Leu Cys Phe
CTA
Leu 150 TGT CTT CTA Cys Leu Leu GTG TCT Val Ser 155 GCA ATC TTG Ala Ile Leu CTA AAA Leu Lys 160 TCT GCA GAT TGC Ser Ala Asp Cys AGC TGG AAA AAG Ser Trp Lys Lys CTT CCA AAG CCC Leu Pro Lys Pro
TGC
Cys 175 AAG AAT CTT GTG Lys Asn Leu Val
TTA
Leu 180 TAT TTC CAT GAT Tyr Phe His Asp
ATA
Ile 185 ATC TAC AAT GGC Ile Tyr Asn Gly
AAA
Lys 190 AAT GCA GAG AAT Asn Ala Glu Asn
GCA
Ala 195 ACA TCT GCA CTT Thr Ser Ala Leu GCA GCC CCT GAG Ala Ala Pro Glu GGA GCC Gly Ala 205 246 AAT CTC ACC Asn Leu Thr TTT GAT GAT Phe Asp Asp 225 ATG ACT GGT AAT Met Thr Gly Asn CAT TTT GGG AAT His Phe Gly Asn CTT GCT GTG Leu Ala Val 220 CCT ATT ACT CTT Pro Ile Thr Leu AAC AAT CTC CAC TCT CCT CCT GTG Asn Asn Leu His Ser Pro Pro Val 235 GGA AGA Gly Arg 240 GCT CAG GGA TTT Ala Gin Gly Phe TTC TAT GAC ATG Phe Tyr Asp Met
AAG
Lys 250 AAC ACC TTC AGT Asn Thr Phe Ser
GCT
Ala 255 TGG CTT GGC TTC Trp Leu Gly Phe
ACA
Thr 260 TTT GTG CTG AAT Phe Val Leu Asn
TCA
Ser 265 ACT GAT CAC AAG Thr Asp His Lys .1 I- WO 98/20113 PCT/US97/20391 ACC ATT ACT TTC AAT GGA GCA GAC CCA ATC CTG Thr Ile Thr Phe Asn Gly Ala Asp Pro Ile Leu 275 280 ATA TCT GTT GTG GGT GGA ACA GGG GAT TTC TTG Ile Ser Val Val Gly Gly Thr Gly Asp Phe Leu 290 295 GCC ACC ATT TCT ACT GAT TCA TAT GAG GGA GAA Ala Thr Ile Ser Thr Asp Ser Tyr Glu Gly Glu 305 310 AGG GTC AAT ATC ACA CTC TAT GAG TGT TAC TGA Arg Val Asn Ile Thr Leu Tyr Glu Cys Tyr 320 325 TCCTCTGTAG TTCTTGTTTT GGGTGCCTTT GAGGAATAGT TATGTAGTAA CATGGTCAAT GGAGTCTATT TTGAAGATTA ATATATATAT TGAAGAGAAT GAGATCTGTT TTAGGTAGCT
AAAAAAAA
INFORMATION FOR SEQ ID NO:29: SEQUENCE CHARACTERISTICS: LENGTH: 190 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:; Met Lys Ala Ile Arg Val Leu His Leu Cys Phe 1 5 10 Ser Ala Ile Leu Leu Lys Ser Ala Asp Cys His Leu Pro Lys Pro Cys Lys Asn Leu Val Leu Tyr 40 Tyr Asn Gly Lys Asn Ala Glu Asn Ala Thr Ser 55 Pro Glu Gly Ala Asn Leu Thr Ile Met Thr Gly 70 75 Asn Leu Ala Val Phe Asp Asp Pro Ile Thr Leu 90 Ser Pro Pro Val Gly Arg Ala Gin Gly Phe Tyr 100 105 Asn Thr Phe Ser Ala Trp Leu Gly Phe Thr Phe 115 120 ACC AAG TAC AGA GAC Thr Lys Tyr Arg Asp 285 ATG GCC AGA GGA ATT Met Ala Arg Gly Ile 300 GTT TAT TTC AGG CTT Val Tyr Phe Arg Leu 315 GCAAATG CCTGTCTTCT TCTTGGCTTC AATGTCTCTG TGAAGATATA GTCTCTATAT CTTTTCATTC AAAAAAAAAA 29: Leu Ser Phe Ala Asn Asp Phe Val Cys Trp His Leu Asn Asn Tyr Leu 125 Leu Lys Ile Ala Phe Leu Met Ser Val Lys Ile Ala Gly His Lys Thr WO 98/20113 PCT/US97/20391 -91- Asp His Lys Gly Thr Ile Thr Phe Asn Gly Ala Asp 130 135 140 Pro Ile Leu Thr Tyr Arg Asp Ile Ser 150 Val Val Gly Gly Gly Asp Phe Leu Ala Arg Gly Ile Thr Ile Ser Thr Ser Tyr Glu Gly Glu Val 175 Tyr Phe Arg Leu 180 Arg Val Asn Ile Thr 185 Leu Tyr Glu Cys INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 1013 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: Thuja plicata dirigent protein PSD-Tp6 cDNA (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (ix) FEATURE: NAME/KEY: CDS LOCATION: 47..616 (xi) SEQUENCE DESCRIPTION: SEQ ID CTCAGTCTAA TTGAGAGAAA ATTCCAATAA TTTTTTCCCA ATAGCA ATG AAA GCC Met Lys Ala ATT AGA Ile Arg 195 GTT CTG CAA TTA Val Leu Gin Leu
TGC
Cys 200 TTT CTA TGG CTT Phe Leu Trp Leu
CTA
Leu 205 GTA TCT GCA ATC Val Ser Ala Ile
TTG
Leu 210 CTA AAA TCT GCA Leu Lys Ser Ala
GAT
Asp 215 TGC CAT AGC TGG Cys His Ser Trp
AAA
Lys 220 AAG AAG CTT CCA Lys Lys Leu Pro
AAG
Lys 225 CCC TGC AAG AAT Pro Cys Lys Asn GTG TTA TAT TTC Val Leu Tyr Phe GAT ATA ATC TAC Asp Ile Ile Tyr AAT GGC Asn Gly 240 AAA AAT GCA Lys Asn Ala GCC AAT CTC Ala Asn Leu 260
GAG
Glu 245 AAT GCA ACA TCT Asn Ala Thr Ser
GCA
Ala 250 CTT GTT GCA GCC Leu Val Ala Ala CCT GAG GGA Pro Glu Gly 255 AAT CTT GCT Asn Leu Ala ACC ATT ATG ACT Thr Ile Met Thr
GGT
Gly 265 AAT AAC CAT TTT Asn Asn His Phe 295 WO 98/201 13 PCTIUS97/20391 -92- GTG TTT Val Phe 275 GTG GGA Val Gly GAT GAT CCT ATT Asp Asp Pro Ile
ACT
Thr 280
TTT
Phe CTT GAC AAC Leu Asp Asn TAC TTC TAT Tyr Phe Tyr AAT CTC CAC TCT CCT CCT Asn Leu His Ser Pro Pro AGA GCT CAG Arg Ala Gin 290
AGT
Ser
GGC
Gi y 295
TTC
Phe
GAC
Asp 300
AAT
As n 285
ATG
Met AAG AAC ACC Lys Asn Thr
TTC
Phe 305 GCT TGG CTT Ala Trp Leu ACA TTT GTG Thr Phe Val TCA ACT GAT Ser Thr Asp CAC AAG His Lys 320 439 GGC ACC ATT Gly Thr Ile GAT ATA TCT Asp Ile Ser 340 ATT GCC ACC Ile Ala Thr 355 CTT AGG GTC Leu Arg Val 370
ACT
Thr 325
GTT
Val AAT GGA GCA Asn Giy Ala
GAC
Asp 330
GGG
Gly ATC CTG ACC AAG TAC AGA Ile Leu Thr Lys Tyr Arg 335 TTC TTG ATG GCC AGA GSA Phe Leu Met Ala Arg Giy 350 GTG GGT GGA Val Gly Gly
ACA
Thr 345
TCA
Ser
GAT
Asp ATT TCT ACT GAT Ile Ser Thr Asp 360 TAT GAG GGA Tyr Giu Gly GAT GTT TAT TTC ASS Asp Val Tyr Phe Arg 365 TGASCAAATS CCTSTCTTCT AAT ATC ACA CTC TAT AAG TGT Asn Ile Thr Leu Tyr Lys Cys 375
TAC
Tyr 380
TCCTCTGTAS
TATGTASTAA
ATATATATAT
GTTAACTTG
TTTATSSSAT
ATATGAAACT
TTCTTGTTTT
CATGSTCAAT
TSAAGAGAAiT
ATTTCATGTT
TTTTSACATA
AATCATATAT
SSSTSCCTTT
GGAGTCTATT
SAGATCTSTT
TSSTTCAAAS
TTAGATTACT
GAGSAATAST
TTGAAGATTA
TTAGGTAGCT
ATCAGTTATS
TTCATCTCAA
TCTTGGCTTC
TGAAGATATA
CTTTTCATTC
GAGSATTTCC
ATATATSTTA
AATGTCTCTG
GTCTCTCTAT
ATATATATGS
TTTTASTGGT
AATCAGTTAT
583 636 696 756 816 876 936 996 1013 AAGTTCAGAA ATATCAGAAC AACCATTTTA TSGAAAAAAA INFORMATION FOR SEQ ID NO:31: SEQUENCE
CHARACTERISTICS:
LENGTH: 190 amino acids
TYPE: amino acid
TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:
Met Lys Ala Ile Arg Val Leu Gln Leu Cys Phe Leu Trp Leu Leu Val
1               5                   10
Ser Ala Ile Leu Leu Lys Ser Ala Asp Cys His Ser Trp Lys Lys Lys
                                25
Leu Pro Lys Pro Cys Lys Asn Leu Val Leu Tyr Phe His Asp Ile Ile
                            40
Tyr Asn Gly Lys Asn Ala Glu Asn Ala Thr Ser Ala Leu Val Ala Ala
                        55
Pro Glu Gly Ala Asn Leu Thr Ile Met Thr Gly Asn Asn His Phe Gly
                    70                  75
Asn Leu Ala Val Phe Asp Asp Pro Ile Thr Leu Asp Asn Asn Leu His
                                    90
Ser Pro Pro Val Gly Arg Ala Gln Gly Phe Tyr Phe Tyr Asp Met Lys
            100                 105                 110
Asn Thr Phe Ser Ala Trp Leu Gly Phe Thr Phe Val Leu Asn Ser Thr
        115                 120                 125
Asp His Lys Gly Thr Ile Thr Phe Asn Gly Ala Asp Pro Ile Leu Thr
    130                 135                 140
Lys Tyr Arg Asp Ile Ser Val Val Gly Gly Thr Gly Asp Phe Leu Met
145                 150                 155                 160
Ala Arg Gly Ile Ala Thr Ile Ser Thr Asp Ser Tyr Glu Gly Asp Val
                165                 170                 175
Tyr Phe Arg Leu Arg Val Asn Ile Thr Leu Tyr Lys Cys Tyr
            180                 185                 190

INFORMATION FOR SEQ ID NO:32:
SEQUENCE CHARACTERISTICS:
LENGTH: 913 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: Thuja plicata dirigent protein PSD-Tp7 cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(ix) FEATURE: NAME/KEY: CDS LOCATION: 77..652
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:
GCAAGCTCAA ATACCCGACT TCTTTCTCTA CTTCAGAGCT CTTCTTTCTT CAAACATTTT TGATATATTT TGCACA ATG GCA ATC TGG AAT GGA AGA GTT CTG AAT TTG 109 Met Ala Ile Trp Asn Gly Arg Val Leu Asn Leu 195 200 TGC ATT CTG TGG CTT CTG GTC TCC ATA GTT TTG CTG AAT GGT ATA GAT 157 Cys Ile Leu Trp Leu Leu Val Ser Ile Val Leu Leu Asn Gly Ile Asp 205 210 215 TGC CAT AGT Cys His Ser 220 AGA AAA AAG AAG Arg Lys Lys Lys CCA AAG CCA TGT Pro Lys Pro Cys
AGG
Arg 230 AAT CTT GTT Asn Leu Val TTG TAT Leu Tyr 235 TTT CAT GAT ATT Phe His Asp Ile
ATC
Ile 240 TAC AAT GGT AAA Tyr Asn Gly Lys GCA GGC AAT GCA Ala Gly Asn Ala
ACA
Thr 250 TCT ACG CTT GTT Ser Thr Leu Val GCC CCT CAA GGA Ala Pro Gin Gly AAT CTC ACC ATT Asn Leu Thr Ile 253 301 349 ACT GGC AAT TAC Thr Gly Asn Tyr
CAT
His 270 TTT GGA GAT CTG Phe Gly Asp Leu
TCT
Ser 275 GTG TTT GAT GAT Val Phe Asp Asp CCT ATT Pro Ile 280 ACT GTT GAC Thr Val Asp TTT TAC TTC Phe Tyr Phe 300
AAC
Asn 285 AAT CTT CAT TCT Asn Leu His Ser CCT GTG GGA AGA Pro Val Gly Arg GCT CAG GGC Ala Gin Gly 295 CTT GGG TTC Leu Gly Phe TAT GAC ATG AAG Tyr Asp Met Lys ACA TTC AGT GCT Thr Phe Ser Ala
TGG
Trp 310 ACA TTT Thr Phe 315 GTG CTG AAC TCA Val .Leu Asn Ser
ACA
Thr 320 GAT TAT AAA GGC Asp Tyr Lys Gly ATT ACT TTC GGT Ile Thr Phe Gly
GGA
Gly 330 GCA GAC CCA ATT Ala Asp Pro Ile
TTG
Leu 335 GCT AAG TAC AGA Ala Lys Tyr Arg ATA TCT GTT GTG Ile Ser Val Val
GGT
Gly 345 GGT ACT GGA GAT Gly Thr Gly Asp TTG ATG GCA AGA Leu Met Ala Arg
GGA
Gly 355 ATT GCT ACA ATC Ile Ala Thr Ile GAT ACT Asp Thr 360 GAT GCA TAT Asp Ala Tyr
GAG
Glu 365 GGA GAT GTT TAT Gly Asp Val Tyr
TTC
Phe 370 AGG CTA AGG GTG Arg Leu Arg Val AAT ATC ACA Asn Ile Thr 375 CTC TAT GAG TGT TAC Leu Tyr Glu Cys Tyr 380 TGATCCATGG GTATTCTATG TAGAATAGCT CAATCTGATA 692 TGGCTATATT ATTTTGAGAG CATAGGTAGT TAAGTTTTAT AACTAAGTAG TGAACCATGA GATCATTGAA AACTTGGGTG CTCATGCACA GTTTTCATAT TTTCTAAATA AGTCTGCTCG ACTATTACAT TTATGGATTG TTGAGAATTG TGTCGCTTAT TACTTTATGA ATAAGCTATT TTAAACAAAG TTTTCACAAG TTTAAAAAAA AAAAAAAAAA A INFORMATION FOR SEQ ID NO:33: SEQUENCE CHARACTERISTICS: LENGTH: 192 amino acids TYPE: amino acid TOPOLOGY: linear WO 98/20113 PCT/US97/20391 (ii) MOLECULE (xi) SEQUENCE TYPE: protein DESCRIPTION: SEQ ID NO:33: Met 1 Leu Lys Ile Ala Phe Leu Met Ser Leu 145 Leu Asp Ala Ile Trp Asn Gly Arg Val Leu Asn Leu Cys Ile Leu
I
Val Ser Lys Leu Ile Tyr Ala Pro Gly Asp His Ser Lys Asn 115 Thr Asp 130 Ala Lys Met Ala Val Tyr Ile Val Pro Lys Asn Gly Gin Gly Leu Ser Pro Pro 100 Thr Phe Tyr Lys Tyr Arg Arg Gly 165 Phe Arg 180 Leu Pro Lys Ala 70 Val Val Ser Gly Asp 150 Ile Leu Leu Cys Asn 55 Asn Phe Gly Ala Thr 135 Ile Ala Arg Asn Arg 40 Ala Leu Asp Arg Trp 120 Ile Ser Thr Val Gly 25 Asn Gly Thr Asp Ala 105 Leu Thr Val Ile Asn 185 Ile Leu Asn Ile Pro 90 Gin Gly Phe Val Asp 170 Ile Asp Val Ala Met 75 Ile Gly Phe Gly Gly 155 Thr Thr Cys Leu Thr Thr Thr Phe Thr Gly 140 Gly Asp Leu His Tyr Ser Gly Val Tyr Phe 125 Ala Thr Ala Tyr Ser Phe Thr Asn Asp Phe 110 Val Asp Gly Tyr Glu 190 Trp Leu Arg Lys His Asp Leu Val Tyr His Asn Asn Tyr Asp Leu Asn Pro Ile Asp Phe 160 Glu Gly 175 Cys Tyr INFORMATION FOR SEQ ID NO:34: SEQUENCE CHARACTERISTICS: LENGTH: 890 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: Thuja plicata dirigent protein PSD-Tp8 cDNA (iii) HYPOTHETICAL:
NO
(iv) ANTI-SENSE: NO (ix) FEATURE: NAME/KEY: CDS LOCATION: 44..619 WO 98/20113 PCTIUS97/20391 -96- (xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: CAGAGCTCTT CTTTCTTCAA ACATTTTTGA TATATTTTGC ACA ATG GCA ATC TGG Met Ala Ile Trp 195 AAT GGA AGA Asn Gly Arg GTT TTG CTG Val Leu Leu 215
GTT
Val 200 CTG AAT TTG TGC Leu Asn Leu Cys CTG TGG CTT CTG Leu Trp Leu Leu GTC TCC ATA Val Ser Ile 210 AAG CTT CCA Lys Leu Pro 103 151 AAT GGT ATA GAT Asn Gly Ile Asp
TGC
Cys 220 CAT AGT AGA AAA His Ser Arg Lys
AAG
Lys 225 AAG CCA Lys Pro 230 TGT AGG AAT CTT Cys Arg Asn Leu
GTT
Val 235 TTG TAT TTT CAT Leu Tyr Phe His ATT ATC TAC AAT Ile Ile Tyr Asn
GGT
Gly 245 AAA AAT GCA GGC Lys Asn Ala Gly
AAT
Asn 250 GCA ACA TCT ACG Ala Thr Ser Thr
CTT
Leu 255 GTT GCA GCC CCT Val Ala Ala Pro
CAA
Gin 260 247 295 GGA GCT AAT CTC Gly Ala Asn Leu ATT ATG ACT GGC Ile Met Thr Gly
AAT
Asn 270 TAC CAT TTT GGA Tyr His Phe Gly GAT CTG Asp Leu 275 GCT GTG TTT Ala Val Phe CCT GTG GGA Pro Val Gly 295
GAT
Asp 280 GAT CCT ATT ACT Asp Pro Ile Thr
GTT
Val 285 GAC AAC AAT CTT Asp Asn Asn Leu CAT TCT CCT His Ser Pro 290 AAG AAT ACA Lys Asn Thr AGA GCT CAG GGC Arg Ala Gin Gly TAC TTC TAT Tyr Phe Tyr GAC ATG Asp Met 305 TTC AGT Phe Ser 310 GCT TGG CTT GGG Ala Trp Leu Gly
TTC
Phe 315 ACA TTT GTG CTG Thr Phe Val Leu
AAC
Asn 320 TCA ACA GAT TAT Ser Thr Asp Tyr 439
AAA
Lys 325 GGC ACT ATT ACT Gly Thr Ile Thr
TTC
Phe 330 GGT GGA GCA GAC Gly Gly Ala Asp ATT TTG GCT AAG Ile Leu Ala Lys
TAC
Tyr 340 AGA GAT ATA TCT Arg Asp Ile Ser
GTT
Val 345 GTG GGT GGT ACT Val Gly Gly Thr
GGA
Gly 350 GAT TTC TTG ATG GCA AGA Asp Phe Leu Met Ala Arg GGA ATT GCT Gly Ile Ala AGG CTA AGG Arg Leu Arg 375
ACA
Thr 360 ATC GAT ACT GAT GCA TAT GAG GGA GAT Ile Asp Thr Asp Ala Tyr Glu Gly Asp 365 GTT TAT TTC Val Tyr Phe 370 GTG AAT ATC ACA Val Asn Ile Thr
CTC
Leu 380 TAT GAG TGT TAC TGATCCATGG Tyr Glu Cys Tyr GTATTCTATG TAGAATAGCT CAATCTGATA TGGCTATATT ATTTTGAGAG
CATAGGTAGT
TAAGTTTTAT AACTAAGTAG TGAACCATGA GATCATTGAA AACTTGGGTG
CTCATGCACA
GTTTTCATAT TTTCTAAATA AGTCTGCTCG ACTATTACAT TTATGGATTG
TTGAGAATTG
629 689 749 WO 98/20113 PCT/US97/20391 -97.
TGTCGCTTAT TACTTTATGA ATAAGCTATT TTAAACAAAG TTTTCACAAG TTTAAAAGTT GTCAAAAAAA AAAAAAAAAA A INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 192 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID 869 890 Met Leu Lys Ile Al a Phe Leu Met Ser Leu 145 Leu Asp Ala Ile Trp Asn Gly Arq Val Leu Asn Leu Cys Ile Leu Val1 Lys Ile Al a Gly His Lys Thr 130 Ala Met Val Ser Leu Tyr Pro Asp Ser Asn 115 Asp Lys Ala Tyr Ile Pro Asn Gin Leu Pro 100 Thr Tyr Tyr Arg Phe 180 Val1 Lys Gly Gly Al a Pro Phe Lys Arg Gi y 165 Arg Leu Pro Lys Ala 70 Val Val1 Ser Gly Asp 150 Ile Leu Leu Cys Asn 55 Asn Phe Gly Ala Thr 135 Ile Ala Arg Asn Ar g 40 Al a Leu Asp Arg T rp 120 Ile Ser Thr Val Gly 25 Asn Gly Thr Asp Ala 105 Leu Thr Val1 Ile Asn 185 Ile Leu As n Ile Pro 90 Gin Gi y Phe Val1 Asp 170 Ile Asp Val1 Ala Met 75 Ile Gly Phe Gly Gly 155 Thr Thr Cys Leu Thr Th r Thr Ph e Thr Gi y 140 Gi y Asp Leu His Tyr Ser Gly Val Tyr Phe 125 Aila Thr Al a Tyr Ser Phe Thr Asn Asp Phe 110 Val1 Asp Giy Tyr Giu 190 Trp Leu Arg Lys His Asp Leu Val Tyr His Asn Asn Tyr Asp Leu Asn Pro Ile Asp Phe 160 Glu Gly 175 Cys Tyr INFORMATION FOR SEQ ID NO:36: SEQUENCE CHARACTERISTICS: LENGTH: 30 amino acids TYPE: amino acid STRANDEDNESS: not relevant TOPOLOGY: not relevant WO 98/20113 PCTIS97/20391 -98- (ii) MOLECULE TYPE: peptide (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO FRAGMENT TYPE: N-terminal sequence from Forsythia intermedia (+)-pinoresinol/(+)-lariciresinol reductase (xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: Gly Lys Ser Lys Val Leu Ile Ile Gly Gly Thr Gly Tyr Leu Gly Arg 1 5 10 Arg Leu Val Lys Ala Ser Leu Ala Gin Gly His Glu Thr Tyr 25 INFORMATION FOR SEQ ID NO:37: SEQUENCE CHARACTERISTICS: LENGTH: 16 amino acids TYPE: amino acid STRANDEDNESS: not relevant TOPOLOGY: not relevant (ii) MOLECULE TYPE: peptide (iii) HYPOTHETICAL:
NO
(iv) ANTI-SENSE: NO
FRAGMENT TYPE: internal tryptic fragment from Forsythia intermedia (+)-pinoresinol/(+)-lariciresinol reductase
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:
Phe Met Asp Ile Ala Met Xaa Pro Gly Lys Val Thr Leu Asp Glu Lys
1               5                   10

INFORMATION FOR SEQ ID NO:38:
SEQUENCE CHARACTERISTICS:
LENGTH: 13 amino acids
TYPE: amino acid
STRANDEDNESS: not relevant
TOPOLOGY: not relevant
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
FRAGMENT TYPE: internal tryptic fragment from Forsythia intermedia (+)-pinoresinol/(+)-lariciresinol reductase
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:
Leu Pro Xaa Glu Phe Gly Met Asp Pro Ala Lys Phe Met
1               5

INFORMATION FOR SEQ ID NO:39:
SEQUENCE CHARACTERISTICS:
LENGTH: 8 amino acids
TYPE: amino acid
STRANDEDNESS: not relevant
TOPOLOGY: not relevant
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
FRAGMENT TYPE: internal tryptic fragment from Forsythia intermedia (+)-pinoresinol/(+)-lariciresinol reductase
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39:
Glu Val Val Gln Xaa Xaa Glu Lys
1

INFORMATION FOR SEQ ID
SEQUENCE CHARACTERISTICS:
LENGTH: 10 amino acids
TYPE: amino acid
STRANDEDNESS: not relevant
TOPOLOGY: not relevant
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
FRAGMENT TYPE: internal tryptic fragment from Forsythia intermedia (+)-pinoresinol/(+)-lariciresinol reductase
(xi) SEQUENCE DESCRIPTION: SEQ ID
Tyr Xaa Ser Val Glu Glu Tyr Leu Lys Arg
1               5

INFORMATION FOR SEQ ID NO:41:
SEQUENCE CHARACTERISTICS:
LENGTH: 12 amino acids
TYPE: amino acid
STRANDEDNESS: not relevant
TOPOLOGY: not relevant
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
FRAGMENT TYPE: internal cyanogen bromide fragment from Forsythia intermedia (+)-pinoresinol/(+)-lariciresinol reductase
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41:
Met Glu Pro Gly Lys Val Thr Leu Asp Glu Lys Met
1               5

INFORMATION FOR SEQ ID NO:42:
SEQUENCE CHARACTERISTICS:
LENGTH: 7 amino acids
TYPE: amino acid
STRANDEDNESS: not relevant
TOPOLOGY: not relevant
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
FRAGMENT TYPE: internal cyanogen bromide fragment from Forsythia intermedia (+)-pinoresinol/(+)-lariciresinol reductase
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42:
Met Asp Pro Ala Lys Phe Met
1

INFORMATION FOR SEQ ID NO:43:
SEQUENCE CHARACTERISTICS:
LENGTH: 7 amino acids
TYPE: amino acid
STRANDEDNESS: not relevant
TOPOLOGY: not relevant
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
FRAGMENT TYPE: internal cyanogen bromide fragment from Forsythia intermedia (+)-pinoresinol/(+)-lariciresinol reductase
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43:
Met Leu Ile Ser Phe Lys Met
1

INFORMATION FOR SEQ ID NO:44:
SEQUENCE CHARACTERISTICS:
LENGTH: 20 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION:"PCR primer
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44:
ATHATHGGNG GNACNGGNTA

INFORMATION FOR SEQ ID
SEQUENCE CHARACTERISTICS:
LENGTH: 19 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION:"PCR primer PLR14R"
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID
GYTCCATNGC NATRTCCAT 19

INFORMATION FOR SEQ ID NO:46:
SEQUENCE CHARACTERISTICS:
LENGTH: 20 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION:"PCR primer
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:
TCYTCNARNG TNACYTTNCC

INFORMATION FOR SEQ ID NO:47:
SEQUENCE CHARACTERISTICS:
LENGTH: 1060 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: Forsythia intermedia cDNA PLR-Fi1
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO (ix) FEATURE: NAME/KEY: CDS LOCATION: 28..963 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: AATTCGGCAC GAGAAAAACA GAGAGAG ATG GGA AAA AGC AAA GTT TTG ATC Met Gly Lys Ser Lys 195 Val Leu Ile 200 ATT GGG GGT ACA Ile Gly Gly Thr TAC TTA GGG AGG Tyr Leu Gly Arg
AGA
Arg 210 TTG GTT AAG GCA Leu Val Lys Ala AGT TTA Ser Leu 215 GCT CAA GGT Ala Gin Gly GAT ATT GAT Asp Ile Asp 235
CAT
His 220 GAA ACA TAC ATT Glu Thr Tyr Ile
CTG
Leu 225 CAT AGG CCT GAA His Arg Pro Glu ATT GGT GTT Ile Gly Val 230 CAA GGA GCT Gin Gly Ala AAA GTT GAA ATG Lys Val Glu Met
CTA
Leu 240 ATA TCA TTT AAA Ile Ser Phe Lys CAT CTT His Leu 250 GTA TCT GGT TCT Val Ser Gly Ser
TTC
Phe 255 AAG GAT TTC AAC Lys Asp Phe Asn
AGT
Ser 260 CTG GTC GAG GCT Leu Val Glu Ala
GTC
Val 265
CGA
Arg AAG CTC GTA GAC Lys Leu Val Asp AGC CAT CAA ATT Ser His Gin Ile 285 GTA ATC AGC GCC Val Ile Ser Ala TCT GGT GTT CAT Ser Gly Val His 291 339 CTT CTT CAA CTC Leu Leu Gin Leu CTT GTT-GAA GCT Leu Val Glu Ala ATT AAA Ile Lys 295 GAG GCT GGA Glu Ala Gly CCT GCA AAA Pro Ala Lys 315 GTC AAG AGA TTT Val Lys Arg Phe
TTA
Leu 305 CCA TCT GAG TTT Pro Ser Glu Phe GGA ATG GAT Gly Met Asp 310 GTA ACA CTT Val Thr Leu TTT ATG GAT ACG GCC ATG GAA CCC GGA Phe Met Asp Thr Ala Met Glu Pro Gly 320
AAG
Lys 325 GAT GAG Asp Glu 330 AAG ATG GTG GTA Lys Met Val Val AAA GCA ATT GAA Lys Ala Ile Glu GCT GGG ATT CCT Ala Gly Ile Pro ACA TAT GTC TCT Thr Tyr Val Ser
GCA
Ala 350 AAT TGC TTT GCT Asn Cys Phe Ala TAT TTC TTG GGA Tyr Phe Leu Gly
GGT
Gly 360 CTC TGT CAA TTT Leu Cys Gln Phe AAA ATT CTT CCT Lys Ile Leu Pro AGA GAT TTT GTC Arg Asp Phe Val ATT ATA Ile Ile 375 CAT GGA GAT His Gly Asp GCA ACT TAT Ala Thr Tyr 395
GGT
Gly 380 AAC AAA AAA GCA Asn Lys Lys Ala
ATA
Ile 385 TAT AAC AAT GAA Tyr Asn Asn Glu GAT GAT ATA Asp Asp Ile 390 CTC AAC AAG Leu Asn Lys GCC ATC AAA ACA Ala Ile Lys Thr
ATT
Ile 400 AAT GAT CCA AGA Asn Asp Pro Arg ACA ATC Thr Ile 410 TAC ATT AGT CCT Tyr Ile Ser Pro
CCA
Pro 415 AAA AAC ATC CTT Lys Asn Ile Leu
TCA
Ser 420 CAA AGA GAA GTT Gln Arg Glu Val
GTT
Val 425 CAG ACA TGG GAG Gin Thr Trp Glu CTT ATT GGG AAA GAA CTG CAG AAA ATT Leu Ile Gly Lys Glu Leu Gln Lys Ile 435 CTC TCG AAG GAA Leu Ser Lys Glu
GAT
Asp 445 TTT TTA GCC TCC Phe Leu Ala Ser AAA GAG CTC GAG Lys Glu Leu Glu TAT GCT Tyr Ala 455 819 CAG CAA GTG Gin Gln Val CTT ACG AGT Leu Thr Ser 475
GGA
Gly 460 TTA AGC CAT TAT Leu Ser His Tyr
CAT
His 465 GAT GTC AAC TAT Asp Val Asn.Tyr CAG GGA TGC Gin Gly Cys 470 TTT GAG ATA GGA Phe Glu Ile Gly GAA GAA GAG GCA TCT AAA CTT TAT Glu Glu Glu Ala Ser Lys Leu Tyr 485 CCA GAG Pro Glu 490 GTT AAG TAT ACC Val Lys Tyr Thr
AGT
Ser 495 GTG GAA GAG TAC Val Glu Glu Tyr
CTC
Leu 500 AAG CGT TAC GTG Lys Arg Tyr Val TAGTTGAAAG CTTTCCATTA TTATTGTAAT AATATTTAAA TCAGTATGTA GTTTTAAATT TCGTTAAATA ATATGTGTTG AATTTTGCTT CCAAAAA INFORMATION FOR SEQ ID NO:48: SEQUENCE CHARACTERISTICS: LENGTH: 312 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:48: Met Gly Lys Ser Lys Val Leu Ile Ile Gly Gly Thr Gly Tyr Leu Gly 1 5 10 1023 1060 WO 98/20113 WO 9820113PCT/IJS97/20391 -104- Arg Leu Ile Asp Ser Leu Leu Met Ala 145 Phe Pro Ile Asn Asn 225 Gly Ser His Giu Giu 305 Arg His Ser Phe Ala Lys Pro Giu 130 Ile Ala Ser Tyr Asp 210 Ile Lys Val1 Asp Glu 290 Giu Leu Arg Phe Asn Ile Leu Ser 115 Pro Glu Gly Arg Asn 195 Pro Leu Glu Lys Val 275 Giu Tyr Val Pro Lys Ser Ser Val 100 Giu Gly Lys T yr Asp 180 Asn Arg Ser Leu Glu 260 Asri Ala Leu Lys Gi u Met Leu Gly Giu Phe Lys Al a Phe 165 Phe Giu Thr Gin Gin 245 Leu Tyr Ser Lys Ala Ser Ile Giy Gln Giy 55 Val Giu 70 Val His Ala Ile Gly Met Val Thr 135 Gly Ile 150 Leu Gly Val Ile Asp Asp Leu Asn 215 Arg Giu 230 Lys Ile Glu Tyr Gin Giy Lys Leu .295 Arg Tyr 310 Leu Val1 40 Aia Al a Ile Lys Asp 120 Leu Pro Gi y Ile Ile 200 Lys Val Thr Al a Cys 280 Tyr Val Al a 25 Asp His Val1 Arg Giu 105 Pro Asp Phe Leu His 185 Al a Thr Val Leu Gin 265 Le u Pro Gin Ile Le u Lys Ser 90 Ala Ala Glu Thr Cys 170 Gly Thr Ile Gin Ser 250 Gin Thr Glu Gi y Asp Val Leu 75 His Giy Lys Lys T yr 155 Gin Asp Tyr Tyr Thr 235 Lys Val Ser Val1 His Lys Ser Val1 Gin Asn Phe Met 140 Val Phe Gly Al a Ile 220 Trp Giu Gly Phe Lys 300 Glu Val1 Giy Asp Ile Val1 Met 125 Val1 Ser Giy Asn Ile 205 Ser Giu Asp Leu Giu 285 Tyr Thr Glu Ser Val Leu Lys 110 Asp Val1 Ala Lys Lys 190 Lys Pro Lys Ph e Ser 270 Ile Thr Tyr Met Phe Val Leu Arg Thr Arg As n Ile 175 Lys Thr Pro Leu Leu 255 His Gly Ser Ile Lys Ile Gin Phe Al a Lys Cys 160 Leu Ala Ile Lys Ile 240 Al a Tyr Asp Val1 INFORMATION FOR SEQ ID NO:49: WO 98/20113 PCT/US97/20391 -105- SEQUENCE CHARACTERISTICS: LENGTH: 1112 base pairs TYPE: nucleic acid STRANDEDNESS: 
single TOPOLOGY: linear (ii) MOLECULE TYPE: Forsythia intermedia cDNA PLR-Fi2 (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (ix) FEATURE: NAME/KEY: CDS LOCATION: 44..979 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:49: AATTCGGCAC GAGCTCGTGC CGCACAGAGA AAAACAGAGA GAG ATG GGA AAA AGC Met Gly Lys Ser 315 AAA GTT TTG Lys Val Leu AAG GCA AGT Lys Ala Ser 335 ATT GGG GGT ACA Ile Gly Gly Thr
GGG
Gly 325 TAC TTA GGG AGG Tyr Leu Gly Arg AGA TTG GTT Arg Leu Val 330 CAT AGG CCT His Arg Pro TTA GCT CAA GGT Leu Ala Gin Gly
CAT
His 340 GAA ACA TAC ATT Glu Thr Tyr Ile
CTG
Leu 345 GAA ATT Glu Ile 350 GGT GTT GAT ATT Gly Val Asp Ile AAA GTT GAA ATG Lys Val Glu Met
CTA
Leu 360 ATA TCA TTT AAA Ile Ser Phe Lys
ATG
Met 365 CAA GGA GCT CAT Gin Gly Ala His
CTT
Leu 370 GTA TCT GGT TCT Val Ser Gly Ser AAG GAT TTC AAC Lys Asp Phe Asn
AGT
Ser 380 CTG GTC GAG GCT Leu Val Glu Ala AAG CTC GTA GAC Lys Leu Val Asp
GTA
Val 390 GTA ATC AGC GCC Val Ile Ser Ala ATT TCT Ile Ser 395 GGT GTT CAT Gly Val His GAA GCT ATT Glu Ala Ile 415
ATT
Ile 400 CGA AGC CAT CAA Arg Ser His Gin CTT CTT CAA CTC Leu Leu Gin Leu AAG CTT GTT Lys Leu Val 410 CCA TCT GAG Pro Ser Glu AAA GAG GCT GGA Lys Glu Ala Gly
AAT
Asn 420 GTC AAG AGA TTT Val Lys Arg Phe TTT GGA Phe Gly 430 ATG GAT CCT GCA Met Asp Pro Ala TTT ATG GAT ACG Phe Met Asp Thr
GCC
Ala 440 ATG GAA CCC GGA Met Glu Pro Gly
AAG
Lys 445 GTA ACA CTT GAT Val Thr Leu Asp
GAG
Glu 450 AAG ATG GTG GTA Lys Met Val Val AAA GCA ATT GAA Lys Ala Ile Glu GCT GGG ATT CCT Ala Gly Ile Pro
TTC
Phe 465 ACA TAT GTC TCT Thr Tyr Val Ser
GCA
Ala 470 AAT TGC TTT GCT Asn Cys Phe Ala GGT TAT Gly Tyr 475 TTC TTG GGA Phe Leu Gly TTT GTC ATT Phe Val Ile 495 CTC TGT CAA TTT Leu Cys Gin Phe
GGC
Gly 485 AAA ATT CTT CCT Lys Ile Leu Pro TCT AGA GAT Ser Arg Asp 490 TAT AAC AAT Tyr Asn Asn ATA CAT GGA GAT Ile His Gly Asp
GGT
Gly 500 AAC AAA AAA GCA Asn Lys Lys Ala GAA GAT Glu Asp 510 GAT ATA GCA ACT Asp Ile Ala Thr GCC ATC AAA ACA Ala Ile Lys Thr ATT AAT GAT CCA Ile Asn Asp Pro 520 AAA AAC ATC CTT Lys Asn Ile Leu
AGA
Arg
TCA
Ser 540 679
ACC
Thr 525 CTC AAC AAG ACA Leu Asn Lys Thr
ATC
Ile 530 TAC ATT AGT CCT Tyr Ile Ser Pro CAA AGA GAA GTT Gin Arg Glu Val
GTT
Val 545 CAG ACA TGG GAG Gin Thr Trp Glu
AAG
Lys 550 CTT ATT GGG AAA Leu Ile Gly Lys GAA CTG Glu Leu 555 CAG AAA ATT Gin Lys Ile CTC GAG TAT Leu Glu Tyr 575
ACA
Thr 560 CTC TCG AAG GAA Leu Ser Lys Glu
GAT
Asp 565 TTT TTA GCC TCC Phe Leu Ala Ser GTG AAA GAG Val Lys Glu 570 GAT GTC AAC Asp Val Asn GCT CAG CAA GTG Ala Gin Gin Val TTA AGC CAT TAT Leu Ser His Tyr 823 871 919 TAT CAG Tyr Gin 590 GGA TGC CTT ACG Gly Cys Leu Thr AGT TTT GAG ATA GGA GAT Ser Phe Glu Ile Gly Asp 595 600 GAA GAA GAG GCA Glu Glu Glu Ala AAA CTT TAT CCA Lys Leu Tyr Pro GTT AAG TAT ACC Val Lys Tyr Thr GTG GAA GAG TAC Val Glu Glu Tyr
CTC
Leu 620 AAG CGT TAC GTG Lys Arg Tyr Val TAGTTGAAAG CTTTCCATTA TTATTGTAAT AATATTTAAA 1019 1079 1112 TCAGTATGTA GTTTTAAATT TCGTTAAATA ATATGTGTTG AATTTTGCTT CAAACGAGTG GTCGATTGAA ATGGAATTTT GAAGTCAAAA AAA INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 312 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID WO 98/20113 WO 9820113PCTIUS97/20391 -107- Met Giy Lys Ser Lys Val Leu Ile Arg Leu Ile Asp Ser Leu Leu Met Al a 145 Phe Pro Ile Asn Asn 225 Gly Ser His Giu Arg His Ser Phe Ala Lys Pro Giu 130 Ile Ala Ser T yr Asp 210 Ile Lys Val Asp Giu 290 Leu Arg Phe As n Ile Leu Ser 115 Pro Giu Gly Arq Asn 195 Pro Leu Giu Lys Val 275 Giu Val1 Pro Lys Ser Ser Val1 100 Giu Gly Lys Tyr Asp 180 As n Arg Ser Leu Giu 260 As n Ala Lys Giu Met Leu Giy Giu Phe Lys Ala Phe 165 Phe Glu Thr Gin Gin 245 Leu Tyr Ser Ala Ser Ile Giy Gin Gly 55 Val Giu 70 Val His Ala Ile Giy Met Val Thr 135 Gly Ile 150 Leu Gly Val Ile Asp Asp Leu Asn 215 Arg Giu 230 Lys Ile Giu Tyr Gin Giy Lys Leu 295 Leu Val 40 Ala Aia Ile Lys Asp 120 Leu Pro Gly Ile Ile 200 Lys Val Thr Aia Cys 280 Tyr Ile Ala 25 Asp His Val1 Arg Giu 105 Pro Asp Phe Leu His 185 Al a Thr Val Leu Gin 265 Leu Gi y Gin Ile Leu Lys Ser 90 Al a Al a Giu Thr Cys 170 Gly Thr Ile Gin Ser 250 Gin Thr Gly Gly Asp Val Leu 75 His Gly Lys Lys T yr 155 Gin Asp Tyr Tyr Thr 235 Lys Val1 Ser Thr His Lys Ser Val1 Gin As n Phe Met 140 Val1 Phe Gly Al a Ile 220 Trp Giu Gly' Phe Lys 300 Gi y Giu Val1 Gi y Asp Ile Val1 Met 125 Val1 Ser Gi y Asn Ile 205 Ser Giu Asp Leu Giu Tyr Thr Giu Ser Val1 Leu Lys 110 Asp Val1 Aila Lys Lys 190 Lys Pro Lys Phe Se r 270 Ile Leu Tyr Met Phe Vai Leu Arg Thr Arg As n Ile 175 Lys Thr Pro Leu Le u 255 His Giy Gly Ile Leu Lys Ile Gin Phe Aila Lys Cys 160 Leu Ala Ile Lys Ile 240 Aia Tyr Asp Pro Giu Val Tyr Thr Ser Vai Giu 305 Giu Tyr Leu Lys Arg Tyr Val 310 WO 98/20113 -108- INFORMATION FOR SEQ ID NO:51: PCTIUS97/20391 SEQUENCE CHARACTERISTICS: LENGTH: 1124 base pairs TYPE: 
nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: Forsythia intermedia cDNA PLR-Fi3 (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (ix) FEATURE: NAME/KEY: CDS LOCATION: 29..964 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:51: AATTCGGCAC GAGGAAAAAC AGAGAGAG ATG GGA AAA AGC AAA GTT TTG ATC Met Gly Lys 315 Ser Lys Val Leu Ile 320 ATT GGG GGT ACA Ile Gly Gly Thr TAC TTA GGG AGG Tyr Leu Gly Arg
AGA
Arg 330 TTG GTT AAG GCA Leu Val Lys Ala AGT TTA Ser Leu 335 GCT CAA GGT Ala Gin Gly GAT ATT GAT Asp Ile Asp 355
CAT
His 340 GAA ACA TAC ATT Glu Thr Tyr Ile CAT AGG CCT GAA His Arg Pro Glu ATT GGT GTT Ile Gly Val 350 CAA GGA GCT Gin Gly Ala AAA GTT GAA ATG Lys Val Glu Met
CTA
Leu 360 ATA TCA TTT AAA Ile Ser Phe Lys
ATG
Met 365 CAT CTT His Leu 370 GTA TCT GGT TCT Val Ser Gly Ser
TTC
Phe 375 AAG GAT TTC AAC Lys Asp Phe Asn CTG GTC GAG GCT Leu Val Glu Ala
GTC
Val 385 AAG CTC GTA GAC Lys Leu Val Asp
GTA
Val 390 GTA ATC AGC GCC Val Ile Ser Ala TCT GGT GTT CAT Ser Gly Val His 292 CGA AGC CAT CAA Arg Ser His Gin
ATT
Ile 405 CTT CTT CAA CTC AAG CTT GTT GAA GCT Leu Leu Gin Leu Lys Leu Val Glu Ala 410 ATT AAA Ile Lys 415 GAG GCT GGA Glu Ala Gly CCT GCA AAA Pro Ala Lys 435 GTC AAG AGA TTT TTA CCA TCT GAG TTT Val Lys Arg Phe Leu Pro Ser Glu Phe 425 GGA ATG GAT Gly Met Asp 430 GTA ACA CTT Val Thr Leu TTT ATG GAT ACG Phe Met Asp Thr
GCC
Ala 440 ATG GAA CCC GGA Met Glu Pro Gly
AAG
Lys 445 GAT GAG Asp Glu 450 AAG ATG GTG GTA Lys Met Val Val
AGG
Arg 455 AAA GCA ATT GAA Lys Ala Ile Glu GCT GGG ATT CCT Ala Gly Ile Pro ACA TAT GTC TCT Thr Tyr Val Ser
GCA
Ala 470 AAT TGC TTT GCT Asn Cys Phe Ala TAT TTC TTG GGA Tyr Phe Leu Gly
GGT
Gly 480 532 580 CTC TGT CAA TTT Leu Cys Gin Phe AAA ATT CTT CCT Lys Ile Leu Pro
TCT
Ser 490 AGA GAT TTT GTC Arg Asp Phe Val ATT ATA Ile Ile 495 CAT GGA GAT His Gly Asp GCA ACT TAT Ala Thr Tyr 515
GGT
Gly 500 AAC AAA AAA GCA Asn Lys Lys Ala TAT AAC AAT GAA Tyr Asn Asn Glu GAT GAT ATA Asp Asp Ile 510 CTC AAC AAG Leu Asn Lys GCC ATC AAA ACA Ala Ile Lys Thr
ATT
Ile 520 AAT GAT CCA AGA Asn Asp Pro Arg ACA ATC Thr Ile 530 TAC ATT AGT CCT Tyr Ile Ser Pro
CCA
Pro 535 AAA AAC ATC CTT Lys Asn Ile Leu
TCA
Ser 540 CAA AGA GAA GTT Gin Arg Glu Val CAG ACA TGG GAG Gin Thr Trp Glu
AAG
Lys 550 CTT ATT GGG AAA Leu Ile Gly Lys CTG CAG AAA ATT Leu Gin Lys Ile
ACA
Thr 560 CTC TCG AAG GAA Leu Ser Lys Glu TTT TTA GCC TCC Phe Leu Ala Ser
GTG
Val 570 AAA GAG CTC GAG Lys Glu Leu Glu TAT GCT Tyr Ala 575 CAG CAA GTG Gin Gin Val CTT ACG AGT Leu Thr Ser 595
GGA
Gly 580 TTA AGC CAT TAT Leu Ser His Tyr GAT GTC AAC TAT Asp Val Asn Tyr CAG GGA TGC Gin Gly Cys 590 AAA CTT TAT Lys Leu Tyr TTT GAG ATA GGA Phe Glu Ile Gly
GAT
Asp 600 GAA GAA GAG GCA Glu Glu Glu Ala CCA GAG Pro Glu 610 GTT AAG TAT ACC Val Lys Tyr Thr
AGT
Ser 615 GTG GAA GAG TAC Val Glu Glu Tyr
CTC
Leu 620 AAG CGT TAC GTG Lys Arg Tyr Val TAGTTGAAAG CTTTCCATTA TTATTGTAAT AATATTTAAA TCAGTATGTA GTTTTAAATT TCGTTAAATA ATATGTGTTG AATTTTGCTT CAAACGAGTG GTCGATTGAA ATGGAATTTT GAAGTCATCT TCTCCACAAT ATTAGTCCAA ATAAAAAAAA INFORMATION FOR SEQ ID NO:52: SEQUENCE CHARACTERISTICS: LENGTH: 312 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:52: 1024 1084 1124 WO 98/20113 WO 9820113PCT/US97/20391 -110- Met Gly Lys Ser Lys Val Leu Ile Arg Len Ile Asp Ser Leu Leu Met Al a 145 Phe Pro Ile Asn As n 225 Gly Se r His Gin *Arg His Ser Phe Al a Lys Pro Gin 130 Ile Al a Ser Tyr Asp 210 Ile Lys Val1 Asp Glu 290 Leu Arg Phe Asn Ile Len Ser 115 Pro Giu Giy Arg Asn 195 Pro Leu Giu Lys Val 275 Giu Val Pro Lys Ser Ser Val1 100 Gin Gly Lys Tyr Asp 180 Asn Arg Ser Len Gin 260 Asn Al a Lys Giu Met Len Giy Glu Phe Lys Al a Phe 165 Phe Giu Thr Gin Gin 245 Leu Tyr Ser Ala Ile Gin Val1 70 Val1 Al a Gly Val Gly 150 Leu Val1 Asp Leu Arg 230 Lys Glu Gin Lys Ser Gly Gly 55 Gin His Ile Met Thr 135 Ile Gly Ile Asp Asn 215 Gin Ile Tyr Gly Leu 295 Leu Val 40 Ala Al a Ile Lys Asp 120 Leu Pro Gly Ile Ile 200 Lys Val Thr Al a Cys 280 Tyr Ile Al a 25 Asp His Val1 Arg Gin 105 Pro Asp Phe Leu His 185 Ala Thr Val Len Gin 265 Len Gly Gin Ile Len Lys Ser 90 Ala Al a Gin Thr Cys 170 Gly Thr Ile Gin Ser 250 Gin Thr Gly Gly Asp Val Len 75 His Gi y Lys Lys Tyr 155 Gin Asp Tyr Tyr Thr 235 Lys Val Se r *Thr Gly His -Gin Lys Val Ser Gly Val Asp Gin Ile Asn Val The Met 125 Met Val 140 Val Ser Phe Gly Gly Asn Ala Ile 205 Ile Ser 220 Trp Gin Gin Asp Gly Len Phe Gin 285 Tyr Thr Gin Ser Val1 Len Lys 110 Asp Val1 Ala Lys Lys 190 Lys Pro Lys Phe Ser 270 Ile Len Tyr Met Phe Val1 Len Arg Thr Arg Asn Ile 175 Lys Thr Pro Leu Len 255 His Gly Gly Ile Len Lys Ile Gin Phe Ala Lys Cys 160 Leu Ala Ile Lys Ile 240 Al a Tyr Asp Pro Gin Val -Tyr Thr Ser Val Gin Glu Tyr Len Lys 305 Arg Tyr Val 310 WO 98/20113 PCT/US97/20391 -111- INFORMATION FOR SEQ ID NO:53: SEQUENCE 
CHARACTERISTICS: LENGTH: 1097 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: Forsythia intermedia cDNA PLR-Fi4 (iii) HYPOTHETICAL:
NO
(iv) ANTI-SENSE: NO (ix) FEATURE: NAME/KEY: CDS LOCATION: 29..964 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:53: AATTCGGCAC GAGGAAAAAC AGAGAGAG ATG GGA AAA AGC AAA GTT TTG Met Gly Lys Ser Lys Val Leu 315
ATC
Ile 320 ATT GGG GGT ACA Ile Gly Gly Thr
GGG
Gly 325 TAC TTA GGG AGG Tyr Leu Gly Arg
AGA
Arg 330 TTG GTT AAG GCA Leu Val Lys Ala AGT TTA Ser Leu 335 GCT CAA GGT Ala Gin Gly GAT ATT GAT Asp Ile Asp 355 GAA ACA TAC ATT Glu Thr Tyr Ile CAT AGG CCT GAA His Arg Pro Glu ATT GGT GTT Ile Gly Val 350 CAA GGA GCT Gin Gly Ala AAA GTT GAA ATG Lys Val Glu Met
CTA
Leu 360 ATA TCA TTT AAA Ile Ser Phe Lys
ATG
Met 365 196 CAT CTT His Leu 370 GTA TCT GGT TCT Val Ser Gly Ser AAG GAT TTC AAC Lys Asp Phe Asn CTG GTC GAG GCT Leu Val Glu Ala
GTC
Val 385 AAG CTC GTA GAC Lys Leu Val Asp
GTA
Val 390 GTA ATC AGC GCC ATT TCT GGT GTT CAT Val Ile Ser Ala Ile Ser Gly Val His 395
ATT
Ile 400 CGA AGC CAT CAA Arg Ser His Gin CTT CTT CAA CTC Leu Leu Gin Leu
AAG
Lys 410 CTT GTT GAA GCT Leu Val Glu Ala ATT AAA Ile Lys 415 GAG GCT GGA Glu Ala Gly CCT GCA AAA Pro Ala Lys 435 GTC AAG AGA TTT Val Lys Arg Phe CCA TCT GAG TTT Pro Ser Glu Phe GGA ATG GAT Gly Met Asp 430 GTA ACA CTT Val Thr Leu TTT ATG GAT ACG Phe Met Asp Thr
GCC
Ala 440 ATG GAA CCC GGA Met Glu Pro Gly 388 436 484 GAT GAG Asp Glu 450 AAG ATG GTG GTA Lys Met Val Val
AGG
Arg 455 AAA GCA ATT GAA Lys Ala Ile Glu
AAG
Lys 460 GCT GGG ATT CCT Ala Gly Ile Pro ACA TAT GTC TCT Thr Tyr Val Ser
GCA
Ala 470 AAT TGC TTT GCT Asn Cys Phe Ala TAT TTC TTG GGA Tyr Phe Leu Gly
GGT
Gly 480 CTC TGT CAA TTT Leu Cys Gin Phe AAA ATT CTT CCT Lys Ile Leu Pro
TCT
Ser 490 AGA GAT TTT GTC Arg Asp Phe Val ATT ATA Ile Ile 495 580 CAT GGA GAT His Gly Asp GCA ACT TAT Ala Thr Tyr 515
GGT
Gly 500 AAC AAA AAA GCA Asn Lys Lys Ala
ATA
Ile 505 TAT AAC AAT GAA Tyr Asn Asn Glu GAT GAT ATA Asp Asp Ile 510 CTC AAC AAG Leu Asn Lys GCC ATC AAA ACA Ala Ile Lys Thr AAT GAT CCA AGA Asn Asp Pro Arg ACA ATC Thr Ile 530 TAC ATT AGT CCT Tyr Ile Ser Pro
CCA
Pro 535 AAA AAC ATC CTT Lys Asn Ile Leu
TCA
Ser 540 CAA AGA GAA GTT Gin Arg Glu Val CAG ACA TGG GAG Gin Thr Trp Glu
AAG
Lys 550 CTT ATT GGG AAA Leu Ile Gly Lys
GAA
Glu 555 CTG CAG AAA ATT Leu Gin Lys Ile 772 820 CTC TCG AAG GAA Leu Ser Lys Glu TTT TTA GCC TCC Phe Leu Ala Ser AAA GAG.CTC GAG Lys Glu Leu Glu TAT GCT Tyr Ala 575 CAG CAA GTG Gin Gin Val CTT ACG AGT Leu Thr Ser 595
GGA
Gly 580 TTA AGC CAT TAT Leu Ser His Tyr
CAT
His 585 GAT GTC AAC TAT Asp Val Asn Tyr CAG GGA TGC Gin Gly Cys 590 AAA CTT TAT Lys Leu Tyr TTT GAG ATA GGA Phe Glu Ile Gly
GAT
Asp 600 GAA GAA GAG GCA Glu Glu Glu Ala CCA GAG Pro Glu 610 GTT AAG TAT ACC Val Lys Tyr Thr GTG GAA GAG TAC Val Glu Glu Tyr AAG CGT TAC GTG Lys Arg Tyr Val TAGTTGAAAG CTTTCCATTA TTATTGTAAT AATATTTAAA TCAGTATGTA GTTTTAAATT TCGTTAAATA ATATGTGTTG AATTTTGCTT CAAACGAGTG GTCGATTGAA ATGGAATTTT GAAAAAAAA AAA INFORMATION FOR SEQ ID NO:54: SEQUENCE CHARACTERISTICS: LENGTH: 312 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein 1024 1084 1097 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:54: WO 98/20113 PCTIUS97/20391 -113- Met Gly Lys Ser Lys Val Leu Ile Ile Arq Leu Ile Asp Ser Leu Leu Met Ala 145 Phe Pro Ile Asn Asn 225 Gly Ser His Glu C Glu C 305 Arc His Sex Phe Ala Lys Pro Glu 130 Ile Al a Ser Tyr Asp 210 Ile Lys la 1 ksp ;iu Leu *Arg *Phe Asn Ile Leu Ser 115 Pro Glu Gi y Arg Asn 195 Pro Leu Glu Lys Val 275 Glu Val Pro Lys Sex Sex Val1 100 Glu Gly Lys Tyr Asp 180 As n PArg Sex Leu 260 1 la *Lys Glu Met Leu Gly Glu Phe Lys Ala Phe 165 Phe Glu Thr Gin Gln 245 Leu Tyr Sex Ala IlTe Gin Val 70 Val Al a Gly Val1 Gly 150 Leu Val Asp Leu Arg 230 Lys Giu Gin Lys Sex *Gly Gly 55 Giu His Ile Met Thx 135 Ile Gly Ile Asp Asn 215 Giu Ile Tyr GlyC Leu 295 Leu Val 40 Al a Al a Ile Lys Asp 120 Leu Pro Gly Ile Ile 200 Lays Val Thr ys ~80 ~yr Ala 25 Asp His Val1 Arg Giu 105 Pro Asp Phe Leu His 185 Al a Thr Val1 Leu 265 i eu Gly Gly Gin Gly Ile Asp Leu Val Lys Leu 75 Sex His 90 Ala Gly Ala Lys Giu Lys Thx Tyr 155 Cys Gin 170 Gly Asp Thr Tyr Ile Tyr Gin Thr 235 Sex Lys 250 Gin Val Thr Sex Thr Gly Tyr His Giu Thr Lys Val Glu Sex Gly Ser Val Asp Val Gin Ile Leu Asn Vai Lys 110 Phe Met Asp 125 Met Val Val 140 Val Sex Ala Phe Gly Lys Gly Asn Lys 190 Ala Ile Lys 205 Ile Sex ProI 220 Trp Giu Lys Giu Asp Phe I Gly Leu Sex F- 270 Phe Giu Ile C 285 Leu Tyr Met Phe Val1 Leu Arg Thr Arg As n Ile 175 Lys Thr Pro ,e u *eu is ;i Gly Ile Leu Lys Ile Gin Phe Al a Lys Cys 160 Leu Al a Ile Lys Ile 240 Al a Tyr Asp ?ro Giu Val Lys 300 Tyr Thr Sex Val ;iu Tyr Leu Lys Arg Tyr Val 310 WO 98/20113 
INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 1109 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: Forsythia intermedia cDNA (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (ix) FEATURE: NAME/KEY: CDS LOCATION: 31..966 (xi) SEQUENCE DESCRIPTION: SEQ ID AATTCGGCAC GAGGAGAAAA ACAGAGAGAG ATG GGA AAA Met Gly Lys 315 AGC AAA GTT TTG ATC Ser Lys Val Leu Ile 320 ATT GGG GGT ACA Ile Gly Gly Thr
GGG
Gly 325 TAC TTA GGG AGG Tyr Leu Gly Arg TTG GTT AAG GCA Leu Val Lys Ala AGT TTA Ser Leu 335 102 GCT CAA GGT Ala Gin Gly GAT ATT GAT Asp Ile Asp 355
CAT
His 340 GAA ACA TAC ATT Glu Thr Tyr Ile
CTG
Leu 345 CAT AGG CCT GAA His Arg Pro Glu ATT GGT GTT Ile Gly Val 350 CAA GGA GCT Gin Gly Ala AAA GTT GAA ATG Lys Val Glu Met
CTA
Leu 360 ATA TCA TTT AAA Ile Ser Phe Lys
ATG
Met 365 CAT CTT His Leu 370 GTA TCT GGT TCT Val Ser Gly Ser
TTC
Phe 375 AAG GAT TTC AAC Lys Asp Phe Asn CTG GTC GAG GCT Leu Val Glu Ala
GTC
Val 385
CGA
Arg AAG CTC GTA GAC Lys Leu Val Asp AGC CAT CAA ATT Ser His Gin Ile 405
GTA
Val 390 GTA ATC AGC GCC Val Ile Ser Ala
ATT
Ile 395 TCT GGT GTT CAT Ser Gly Val His CTT CTT CAA CTC Leu Leu Gin Leu
AAG
Lys 410 CTT GTT GAA GCT Leu Val Glu Ala ATT AAA Ile Lys 415 GAG GCT GGA Glu Ala Gly CCT GCA AAA Pro Ala Lys 435
AAT
Asn 420 GTC AAG AGA TTT Val Lys Arg Phe
TTA
Leu 425 CCA TCT GAG TTT Pro Ser Glu Phe GGA ATG GAT Gly Met Asp 430 GTA ACA CTT Val Thr Leu 390 438 TTT ATG GAT ACG Phe Met Asp Thr ATG GAA CCC GGA Met Glu Pro Gly GAT GAG Asp Glu 450 AAG ATG GTG GTA Lys Met Val Val
AGG
Arg 455 AAA GCA ATT GAA Lys Ala Ile Glu GCT GGG ATT CCT Ala Gly Ile Pro
TTC
Phe 465 ACA TAT GTC TCT Thr Tyr Val Ser AAT TGC TTT GCT Asn Cys Phe Ala
GGT
Gly 475 TAT TTC TTG GGA Tyr Phe Leu Gly CTC TGT CAA TTT Leu Cys Gin Phe
GGC
Gly 485 AAA ATT CTT CCT Lys Ile Leu Pro AGA GAT TTT GTC Arg Asp Phe Val ATT ATA Ile Ile 495 CAT GGA GAT His Gly Asp GCA ACT TAT Ala Thr Tyr 515 AAC AAA AAA GCA Asn Lys Lys Ala
ATA
Ile 505 TAT AAC AAT GAA Tyr Asn Asn Glu GAT GAT ATA Asp Asp Ile 510 CTC AAC AAG Leu Asn Lys GCC ATC AAA ACA Ala Ile Lys Thr
ATT
Ile 520 AAT GAT CCA AGA Asn Asp Pro Arg ACA ATC Thr Ile 530 TAC ATT AGT CCT Tyr Ile Ser Pro
CCA
Pro 535 AAA AAC ATC CTT Lys Asn Ile Leu CAA AGA GAA GTT Gin Arg Glu Val
GTT
Val 545 CAG ACA TGG GAG Gin Thr Trp Glu CTT ATT GGG AAA Leu Ile Gly Lys
GAA
Glu 555 CTG CAG AAA ATT Leu Gin Lys Ile CTC TCG AAG GAA Leu Ser Lys Glu
GAT
Asp 565 TTT TTA GCC TCC Phe Leu Ala Ser AAA GAG CTC GAG Lys Glu Leu Glu TAT GCT Tyr Ala 575 822 CAG CAA GTG Gin Gin Val CTT ACG AGT Leu Thr Ser 595
GGA
Gly 580 TTA AGC CAT TAT Leu Ser His Tyr
CAT
His 585 GAT GTC AAC TAT Asp Val Asn Tyr CAG GGA TGC Gin Gly Cys 590 AAA CTT TAT Lys Leu Tyr TTT GAG ATA GGA Phe Glu Ile Gly
GAT
Asp 600 GAA GAA GAG GCA Glu Glu Glu Ala CCA GAG Pro Glu 610 GTT AAG TAT ACC Val Lys Tyr Thr GTG GAA GAG TAC Val Glu Glu Tyr
CTC
Leu 620 AAG CGT TAC GTG Lys Arg Tyr Val TAGTTGAAAG CTTTCCATTA TTATTGTAAT AATATTTAAA TCAGTATGTA GTTTTAAATT TCGTTAAATA ATATGTGTTG AATTTTGCTT CAAACGAGTG GTCGATTGAA ATGGAATTTT GAAGTCATCT TCTCCAAAAA AAA INFORMATION FOR SEQ ID NO:56: SEQUENCE CHARACTERISTICS: LENGTH: 312 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein 1026 1086 1109 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:56: WO 98/20113 WO 9820113PCTIUS97/20391 116- Met Arg Leu Ile Asp Ser Leu Leu Met Ala 145 Phe Pro Ile Asn Asn 225 Gly Ser His Glu Giu 305 Gly Arq His Ser Phe Al a Lys Pro Giu 130 Ile Ala Ser T yr Asp 210 Ile Lys Val Asp Giu 290 Lys Leu Arg Phe As n Ile Leu Ser 115 Pro Giu Gly Arg Asn 195 Pro Leu Glu.
Lys Val 275 Giu Ser Val1 Pro Lys Ser Ser Val1 100 Glu Gly Lys Tyr Asp 180 Asn Arg Ser Leu Giu 260 Asn Al a Lys Lys Giu Met Leu Gly Giu Phe Lys Al a Phe 165 Phe Giu Thr Gin Gin 245 Leu Tyr Ser Val Leu Aia Ser Ile Giy Gin Gly 55 Vai Giu 70 Val His Ala Ile Gly Met Vai Thr 135 Gly Ile 150 Leu Gly Vai Ile Asp Asp Leu Asn 215 Arg Giu 230 Lys Ile Giu Tyr Gin Gly Lys Leu 295 Ile Leu Val 40 Ala Ala Ile Lys Asp 120 Leu Pro Gly Ile Ile 200 Lys Val1 Thr Ala Cys 280 Tyr Ile Al a 25 Asp His Val Arg Giu 105 Pro Asp Phe Leu His 185 Ala Thr Val Leu Gin 265 Leu Gly 10 Gin Ile Leu Lys Ser 90 Al a Al a Glu Th r Cys 170 Gly Thr Ile Gin Ser 250 Gin Thr Gly Thr Gly Tyr Leu Giy Gly Asp Val Leu 75 His Gly Lys Lys Tyr 155 Gin Asp Tyr Tyr Thr 235 Lys Val1 Ser His Lys Ser Val Gin As n Phe Met 140 Val Ph e Gi y Al a Ile 220 Trp Giu Gly Phe Lys 300 Giu Thr Vai Giu Gly Ser Asp Val Ile Leu Val Lys 110 Met Asp 125 Val Val Ser Ala Gly Lys Asn Lys 190 Ile Lys 205 Ser Pro Giu Lys Asp Phe Leu Ser 270 Glu Ile 285 Tyr Ile Met Leu Phe Lys Val Ile Leu Gin Arg Phe Thr Ala Arg Lys Asn Cys 160 Ile Leu 175 Lys Ala Thr Ile Pro Lys Leu Ile 240 Leu Ala 255 His Tyr Gly Asp Pro Giu Val Tyr Thr Ser Val Giu Tyr Leu Lys Arg Tyr Val 310
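As a consistency check on the Forsythia intermedia cDNA entries above, each CDS LOCATION span can be compared with the 312 amino acid length stated for the paired protein entry. The following is a minimal sketch and is not part of the original listing: the function name `cds_codon_count` and the constant names are hypothetical, and it assumes that each span is 1-based and inclusive and that the terminating TAG codon lies outside the span.

```python
def cds_codon_count(start: int, end: int) -> int:
    """Return the number of complete codons in a 1-based, inclusive CDS span."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"CDS span {start}..{end} is not a whole number of codons")
    return length // 3


# CDS LOCATION values copied from the Forsythia intermedia cDNA entries above.
FORSYTHIA_CDS_SPANS = [(28, 963), (44, 979), (29, 964), (31, 966)]

# Protein length stated in the paired amino acid entries.
STATED_PROTEIN_LENGTH = 312

# Assumption: the stop codon is not counted inside the CDS span.
codon_counts = [cds_codon_count(start, end) for start, end in FORSYTHIA_CDS_SPANS]
all_consistent = all(count == STATED_PROTEIN_LENGTH for count in codon_counts)
print(codon_counts, all_consistent)
```

Each span covers 936 nucleotides, so the codon count agrees with the stated protein length under the assumptions above.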
INFORMATION FOR SEQ ID NO:57: SEQUENCE CHARACTERISTICS: LENGTH: 1107 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: Forsythia intermedia cDNA PLR-Fi6 (iii) HYPOTHETICAL:
NO
(iv) ANTI-SENSE: NO (ix) FEATURE: NAME/KEY: CDS LOCATION: 27..962 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:57: AATTCGGCAC GAGAAAACAG AGAGAG ATG GGA AAA AGC AAA GTT TTG ATC ATT Met Gly Lys Ser Lys Val Leu Ile Ile 315 320 GGG GGT ACA Gly Gly Thr CAA GGT CAT Gin Gly His 340
GGG
Gly 325 TAC TTA GGG AGG Tyr Leu Gly Arg
AGA
Arg 330 TTG GTT AAG GCA Leu Val Lys Ala AGT TTA GCT Ser Leu Ala 335 GGT GTT GAT Gly Val Asp GAA ACA TAC ATT Glu Thr Tyr Ile
CTG
Leu 345 CAT AGG CCT His Arg Pro GAA ATT Glu Ile 350 ATT GAT Ile Asp 355 AAA GTT GAA ATG Lys Val Glu Met
CTA
Leu 360 ATA TCA TTT AAA Ile Ser Phe Lys
ATG
Met 365 CAA GGA GCT CAT Gin Gly Ala His GTA TCT GGT TCT Val Ser Gly Ser AAG GAT TTC AAC Lys Asp Phe Asn
AGT
Ser 380 CTG GTC GAG GCT Leu Val Glu Ala AAG CTC GTA GAC Lys Leu Val Asp
GTA
Val 390 GTA ATC AGC GCC Val Ile Ser Ala
ATT
Ile 395 TCT GGT GTT CAT Ser Gly Val His ATT CGA Ile Arg 400 245 293 341 AGC CAT CAA Ser His Gin GCT GGA AAT Ala Gly Asn 420
ATT
Ile 405 CTT CTT CAA CTC Leu Leu Gin Leu
AAG
Lys 410 CTT GTT GAA GCT ATT AAA GAG Leu Val Glu Ala Ile Lys Glu 415 GTC AAG AGA TTT Val Lys Arg Phe
TTA
Leu 425 CCA TCT GAG TTT Pro Ser Glu Phe
GGA
Gly 430 ATG GAT CCT Met Asp Pro GCA AAA Ala Lys 435 TTT ATG GAT ACG Phe Met Asp Thr
GCC
Ala 440 ATG GAA CCC GGA AAG GTA ACA CTT GAT Met Glu Pro Gly Lys Val Thr Leu Asp 445
GAG
Glu 450 AAG ATG GTG GTA Lys Met Val Val
AGG
Arg 455 AAA GCA ATT GAA Lys Ala Ile Glu
AAG
Lys 460 GCT GGG ATT CCT Ala Gly Ile Pro ACA TAT GTC TCT Thr Tyr Val Ser
GCA
Ala 470 AAT TGC TTT GCT Asn Cys Phe Ala
GGT
Gly 475 TAT TTC TTG GGA Tyr Phe Leu Gly GGT CTC Gly Leu 480 TGT CAA TTT Cys Gin Phe GGA GAT GGT Gly Asp Gly 500 AAA ATT CTT CCT Lys Ile Leu Pro
TCT
Ser 490 AGA GAT TTT GTC Arg Asp Phe Val ATT ATA CAT Ile Ile His 495 AAC AAA AAA GCA Asn Lys Lys Ala TAT AAC AAT GAA GAT GAT ATA GCA Tyr Asn Asn Glu Asp Asp Ile Ala 510 629 ACT TAT Thr Tyr 515 GCC ATC AAA ACA Ala Ile Lys Thr
ATT
Ile 520 AAT GAT CCA AGA Asn Asp Pro Arg ACC CTC Thr Leu 525 AAC AAG ACA Asn Lys Thr
ATC
Ile 530 TAC ATT AGT CCT Tyr Ile Ser Pro
CCA
Pro 535 AAA AAC ATC CTT TCA CAA AGA GAA GTT Lys Asn Ile Leu Ser Gin Arg Glu Val 540
GTT
Val 545 CAG ACA TGG GAG Gin Thr Trp Glu
AAG
Lys 550 CTT ATT GGG AAA Leu Ile Gly Lys
GAA
Glu 555 CTG CAG AAA ATT Leu Gin Lys Ile ACA CTC Thr Leu 560 TCG AAG GAA Ser Lys Glu CAA GTG GGA Gin Val Gly 580
GAT
Asp 565 TTT TTA GCC TCC Phe Leu Ala Ser AAA GAG CTC GAG Lys Glu Leu Glu TAT GCT CAG Tyr Ala Gin 575 GGA TGC CTT Gly Cys Leu TTA AGC CAT TAT Leu Ser His Tyr
CAT
His 585 GAT GTC AAC TAT Asp Val Asn Tyr ACG AGT Thr Ser 595 TTT GAG ATA GGA Phe Glu Ile Gly
GAT
Asp 600 GAA GAA GAG GCA Glu Glu Glu Ala AAA CTT TAT CCA Lys Leu Tyr Pro
GAG
Glu 610 GTT AAG TAT ACC Val Lys Tyr Thr GTG GAA GAG TAC Val Glu Glu Tyr
CTC
Leu 620 AAG CGT TAC GTG Lys Arg Tyr Val TAGTTGAAAG CTTTCCATTA TTATTGTAAT AATATTTAAA TCAGTATGTA GTTTTAAATT TCGTTAAATA ATATGTGTTG AATTTTGCTT CAAACGAGTG GTCGATTGAA ATGGAATTTT GAAGTCATCT TCTCCACAAA AAAAA INFORMATION FOR SEQ ID NO:58: SEQUENCE CHARACTERISTICS: LENGTH: 312 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein 1022 1082 1107 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:58: WO 98/20113 PCTIUS97/20391 -119- Met Gly Lys Ser Lys Val Leu Ile Ile Gly Gly Thr Gly Tyr Leu Gly 1 5 10 Arg Arg Leu Val Lys Ala Ser Leu Ala Gin Gly His Glu Thr Tyr Ile 25 Leu His Arg Pro Glu Ile Gly Val Asp Ile Asp Lys Val Glu Met Leu 40 Ile Ser Phe Lys Met Gin Gly Ala His Leu Val Ser Gly Ser Phe Lys 55 Asp Phe Asn Ser Leu Val Glu Ala Val Lys Leu Val Asp Val Val Ile 70 75 Ser Ala Ile Ser Gly Val His Ile Arg Ser His Gin Ile Leu Leu Gin 90 Leu Lys Leu Val Glu Ala Ile Lys Glu Ala Gly Asn Val Lys Arg Phe 100 105 110 Leu Pro Ser Glu Phe Gly Met Asp Pro Ala Lys Phe Met Asp Thr Ala 115 120 125 Met Glu Pro Gly Lys Val Thr Leu Asp Glu Lys Met Val Val Arg Lys 130 135 140 Ala Ile Glu Lys Ala Gly Ile Pro Phe Thr Tyr Val Ser Ala Asn Cys 145 150 155 160 Phe Ala Gly Tyr Phe Leu Gly Gly Leu Cys Gin Phe Gly Lys Ile Leu 165 170 175 Pro Ser Arg Asp Phe Val Ile Ile His Gly Asp Gly Asn Lys Lys Ala 180 185 190 Ile Tyr Asn Asn Glu Asp Asp Ile Ala Thr Tyr Ala Ile Lys Thr Ile 195 200 205 Asn Asp Pro Arg Thr Leu Asn Lys Thr Ile Tyr Ile Ser Pro Pro Lys 210 215 220 Asn Ile Leu Ser Gin Arg Glu Val Val Gln Thr Trp Glu Lys Leu Ile 225 230 235 240 Gly Lys Glu Leu Gin Lys Ile Thr Leu Ser Lys Glu Asp Phe Leu Ala 245 250 255 Ser Val Lys Glu Leu Glu Tyr Ala Gin Gin Val Gly Leu Ser His Tyr 260 265 270 His Asp Val Asn Tyr Gin Gly Cys Leu Thr Ser Phe Glu Ile Gly Asp 275 280 285 Glu Glu Glu Ala Ser Lys Leu Tyr Pro Glu Val Lys Tyr Thr Ser Val 290 295 300 Glu Glu Tyr Leu Lys Arg Tyr Val 305 310 WO 98/20113 PCTITS97/20391 -120- INFORMATION FOR SEQ ID NO:59: SEQUENCE CHARACTERISTICS: LENGTH: 26 base pairs TYPE: 
nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: other nucleic acid DESCRIPTION:"cDNA synthesis linker primer" (iii) HYPOTHETICAL:
NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:59: GTCTCGAGTT TTTTTTTTTT TTTTTT 26
INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: "cDNA synthesis primer" (iii) HYPOTHETICAL:
NO
(xi) SEQUENCE DESCRIPTION: SEQ ID GCACATAAGA GTATGGATAA G 21
INFORMATION FOR SEQ ID NO:61: SEQUENCE CHARACTERISTICS: LENGTH: 1190 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: Thuja plicata cDNA PLR-Tp1 (iii) HYPOTHETICAL:
NO
(iv) ANTI-SENSE:
NO
(ix) FEATURE: NAME/KEY: CDS LOCATION: 13..951 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:61: GCACATAAGA GT ATG GAT AAG AAG AGC AGA GTT CTG ATA GTG GGG GGC 48 Met Asp Lys Lys Ser Arg Val Leu Ile Val Gly Gly 315 320
ACT
Thr 325 GGT TAT ATA GGC Gly Tyr Ile Gly
AAA
Lys 330 AGA ATT GTG AAT Arg Ile Val Asn AGT ATA TCT CTT Ser Ile Ser Leu 96 144 CAT CCC ACT TAT His Pro Thr Tyr
GTT
Val 345 TTG TTC AGA CCA Leu Phe Arg Pro
GAA
Glu 350 GTG GTC TCT AAC Val Val Ser Asn ATT GAC Ile Asp 355 AAA GTG CAG Lys Val Gin GAG GCT TCA Glu Ala Ser 375 CTG TTA TAC TTC Leu Leu Tyr Phe
AAA
Lys 365 CAG CTT GGT GCC Gin Leu Gly Ala AAA CTT ATT Lys Leu Ile 370 CTG AAA CAA Leu Lys Gin TTG GAT GAC CAC Leu Asp Asp His AGG CTT GTG GAT Arg Leu Val Asp
GCT
Ala 385 240 GTG GAT Val Asp 390 GTT GTC ATA AGT Val Val Ile Ser
GCT
Ala 395 TTG GCA GGA GGT Leu Ala Gly Gly
GTT
Val 400 CTA AGC CAC CAT Leu Ser His His
ATA
Ile 405 CTT GAA CAG CTC Leu Glu Gin Leu
AAA
Lys 410 CTA GTG GAA GCC Leu Val Glu Ala AAA GAA GCT GGA Lys Glu Ala Gly
AAT
Asn 420 336 384 ATT AAG AGA TTT Ile Lys Arg Phe
CTT
Leu 425 CCA TCT GAG TTT Pro Ser Glu Phe ATG GAT CCA GAT Met Asp Pro Asp ATT ATG Ile Met 435 GAG CAT GCA Glu His Ala GTT CGG CGT Val Arg Arg 455
TTG
Leu 440 CAA CCT GGT AGC Gin Pro Gly Ser
ATT
Ile 445 ACA TTC ATC GAT Thr Phe Ile Asp AAG AGA AAG Lys Arg Lys 450 TAT GTG TCT Tyr Val Ser GCC ATT GAA GCA Ala Ile Glu Ala
GCA
Ala 460 TCC ATT CCT TAC Ser Ile Pro Tyr TCA AAT Ser Asn 470 ATG TTT GCT GGT Met Phe Ala Gly
TAC
Tyr 475 TTT GCT GGA AGT Phe Ala Gly Ser
TTA
Leu 480 GCT CAA CTT GAT Ala Gin Leu Asp
GGT
Gly 485 CAT ATG ATG CCT His Met Met Pro
CCT
Pro 490 CGA GAC AAG GTC Arg Asp Lys Val
CTC
Leu 495 ATC TAT GGA GAT Ile Tyr Gly Asp
GGA
Gly 500 576 AAT GTT AAA GGT Asn Val Lys Gly
ATT
Ile 505 TGG GTG GAT GAA Trp Val Asp Glu GAT GTT GGA ACA Asp Val Gly Thr TAC ACA Tyr Thr 515 ATC AAA TCA Ile Lys Ser AGG CCA CCT Arg Pro Pro 535 GAT GAT CCA CAA Asp Asp Pro Gin
ACC
Thr 525 CTT AAC AAG ACT Leu Asn Lys Thr ATG TAT ATT Met Tyr Ile 530 CAA ATA TGG Gin Ile Trp ATG AAT ATC CTT Met Asn Ile Leu
TCA
Ser 540 CAG AAG GAA GTT Gin Lys Glu Val
ATA
Ile 545 GAG AGA Glu Arg 550 TTA TCA GAA CAA Leu Ser Glu Gin
AAC
Asn 555 CTG GAT AAA ATA Leu Asp Lys Ile ATT TCT TCT CAA Ile Ser Ser Gln GAC TTT CTT GCA GAT ATG AAA GAT AAA TCA Asp Phe Leu Ala Asp Met Lys Asp Lys Ser 565 570 CGA TGT CAT CTC TAC CAA ATT TTC TTT AGA Arg Cys His Leu Tyr Gln Ile Phe Phe Arg 585 590 GAA ATT GGC CCC AAT GCT ATT GAA GCT ACC Glu Ile Gly Pro Asn Ala Ile Glu Ala Thr 600 605 AAA TAC GTA ACC ATG GAT TCA TAT TTA GAG Lys Tyr Val Thr Met Asp Ser Tyr Leu Glu 615 620 TCTAGTTTTG TATATTGTTT TTCTACATGA
TAATGTG
AGACTTATGG CTCAATTTTA AAACTAGAGT
ACACTTT
TTTACTTCAT ATTGTACTCA ATATAGACTT
GGTATAA
TATAATTATT TATAGATCTT ATTTTAAATA
AAAAAAA
INFORMATION FOR SEQ ID NO:62: SEQUENCE CHARACTERISTICS: LENGTH: 313 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID Met Asp Lys Lys Ser Arg Val Leu Ile Val 1 5 10 Gly Lys Arg Ile Val Asn Ala Ser Ile Ser I 25 Val Leu Phe Arg Pro Glu Val Val Ser Asn 40 Leu Leu Tyr Phe Lys Gin Leu Gly Ala Lys I 55 Asp Asp His Gin Arg Leu Val Asp Ala Leu I 70 Ile Ser Ala Leu Ala Gly Gly Val Leu Ser H 90 Leu Lys Leu Val Glu Ala Ile Lys Glu Ala G 100 105 Leu Pro Ser Glu Phe Gly Met Asp Pro Asp I 115 120 TAT GAA GAG AAG ATT GTA Tyr Glu Glu Lys Ile Val 575 580 GGA GAT CTT TAC AAC TTT Gly Asp Leu Tyr Asn Phe 595 AAA CTT TAT CCA GAA GTG Lys Leu Tyr Pro Glu Val 610 CGC TAT GTT TGAATATCTT Arg Tyr Val 625 AGA GGTACTATTT
CAAATAATTT
ATT CCAAATTACT
TACACTATTT
AGA ATATGGAATC
ATAATGATAT
AAA AAAAAAAAA 816 864 912 961 1021 1081 1141 1190 NO:62: Gly Gly Leu Gly Ile Asp Leu Ile lys Gin 75 lis His ly Asn le Met Thr His Lys Glu Val Ile Ile Glu 125 Gly Pro Val Ala Asp Leu Lys 110 His Tyr Thr Gin Ser Val Glu Arg Ala Ile Tyr Met Leu Val Gin Phe Leu WO 98/20113 PCT/US97/20391 -123- Gln Pro Gly Ser 130 Ile Thr Phe Ile Asp Lys Arg Lys Val Arg Arg Ala 135 140 Ile Glu 145 Ala Gly Pro Pro Ile Trp Asp Asp 210 Asn Ile 225 Glu Gin Asp Met Tyr .Gln Asn Ala 290 Met Asp 305 Ala Tyr Arg Val 195 Pro Leu Asn Lys Ile 275 Ile Ser Ala Phe Asp 180 Asp Gin Ser Leu Asp 260 Phe Glu Tyr Ser Ile 150 Ala Gly 165 Lys Val Glu Asp Thr Leu Gin Lys 230 Asp Lys 245 Lys Ser Phe Arg Ala Thr Leu Glu 310 Pro Ser Leu Asp Asn 215 Glu Ile Tyr Gly Lys 295 Arg Tyr Thr Leu Ala Ile Tyr 185 Val Gly 200 Lys Thr Val Ile Tyr Ile Glu Glu 265 Asp Leu 280 Leu Tyr Tyr Val Tyr Val 155 Gin Leu 170 Gly Asp Thr Tyr Met Tyr Gin Ile 235 Ser Ser 250 Lys Ile Tyr Asn Ser Ser Asp Gly Gly Asn Thr Ile 205 Ile Arg 220 Trp Glu Gin Asp Val Arg Phe Glu 285 Asn His Val 190 Lys Pro Arg Phe Cys 270 Ile Met Met 175 Lys Ser Pro Leu Leu 255 His Gly Phe 160 Met Gly Ile Met Ser 240 Ala Leu Pro Pro Glu Val Lys Tyr Val Thr 300 INFORMATION FOR SEQ ID NOi63: SEQUENCE CHARACTERISTICS: LENGTH: 1151 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: Thuja plicata cDNA PLR-Tp2 (iii) HYPOTHETICAL:
NO
(iv) ANTI-SENSE: NO (ix) FEATURE: NAME/KEY: CDS LOCATION: 61..996 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:63: GATAAGCAGC ATTTCTTCAC CAAAGTGGTC CGCCATTAAA GGAATAGTTT
GAAAGCAGAG
WO 98/20113 PCT/US97/20391 -124- ATG GAA Met Glu 315 GAG AGT AGC AGG Glu Ser Ser Arg TTG ATA GTG Leu Ile Val GGA GGC Gly Gly 325 ACA GGA TAC ATA Thr Gly Tyr Ile
GGC
Gly 330 AGA AGG ATT GTG Arg Arg Ile Val
AAA
Lys 335 GCC AGC ATT GCT Ala Ser Ile Ala GGC CAT CCT ACT Gly His Pro Thr ATT TTG TTT AGG Ile Leu Phe Arg
AAA
Lys 350 GAA GTT GTT TCT Glu Val Val Ser
GAT
Asp 355 GTA GAG AAA GTG Val Glu Lys Val GAG ATG Glu Met 360 TTA TTG TCC Leu Leu Ser GAT GAT CAC Asp Asp His 380
TTC
Phe 365 AAA AAG AAT GGT Lys Lys Asn Gly
GCC
Ala 370 AAA TTA CTG GAG Lys Leu Leu Glu GCT TCA TTT Ala Ser Phe 375 GAT GTT GTG Asp Val Val GAA AGC CTT GTA Glu Ser Leu Val GCT GTG AAG CAG Ala Val Lys Gin ATA AGT Ile Ser 395 GCA GTT GCA GGA Ala Val Ala Gly
AAC
Asn 400 CAC ATG CGG CAT His Met Arg His
CAC
His 405 ATC CTT CAA CAG Ile Leu Gin Gin
CTC
Leu 410 AAA TTA GTG GAG Lys Leu Val Glu ATT AAA GAA GCT Ile Lys Glu Ala
GGA
Gly 420 AAT ATT AAG AGG Asn Ile Lys Arg GTT CCT TCA GAA Val Pro Ser Glu
TTT
Phe 430 GGG ATG GAT CCA Gly Met Asp Pro TTA ATG GAG CAT Leu Met Glu His GCA ATG Ala Met 440 GCA CCT GGC Ala Pro Gly
AAC
Asn 445 ATT GTA TTT ATT Ile Val Phe Ile
GAT
Asp 450 AAA ATA AAA GTT Lys Ile Lys Val CGA GAG GCC Arg Glu Ala 455 AAC ATA TTT Asn Ile Phe ATA GAA GCT GCA TCC ATT CCT CAC Ile Glu Ala Ala Ser Ile Pro His 460 465 ACT TAT ATC TCT Thr Tyr Ile Ser 540 588 GCT GGC Ala Gly 475 TAC TTG GTT GGT Tyr Leu Val Gly TTA GCT CAA CTT Leu Ala Gin Leu
GGT
Gly 485 CGT GTG ATG CCT Arg Val Met Pro
CCT
Pro 490 TCA GAA AAA GTA Ser Glu Lys Val CTC TAT GGA GAT Leu Tyr Gly Asp
GGA
Gly 500 AAT GTC AAA GCT Asn Val Lys Ala
GTT
Val 505 TGG GTA GAT GAA Trp Val Asp Glu
GAT
Asp 510 GAT GTT GGA ATA Asp Val Gly Ile ACA ATC AAA GCA Thr Ile Lys Ala ATT GAT Ile Asp 520 GAC CCT CAC Asp Pro His ATT CTT TCT Ile Leu Ser 540
ACC
Thr 525 CTA AAT AAG ACT Leu Asn Lys Thr
ATG
Met 530 TAC ATC AGG CCA Tyr Ile Arg Pro CCT TTG AAT Pro Leu Asn 535 CAG AAG GAA GTG GTT GAA AAA TGG GAA AAG TTA TCA GGA Gin Lys Glu Val Val Glu Lys Trp Glu Lys Leu Ser Gly WO 98/20113 PCTIUS97/20391 -125- AAG AGC TTA AAT AAA ATA AAT ATT TCT GTT GAA GAT TTT CTT GCA GGC 828 Lys Ser Leu Asn Lys Ile Asn Ile Ser Val Glu Asp Phe Leu Ala Gly 555 560 565 ATG GAA GGT CAA TCA TAT GGA GAG CAG ATT GGA ATA TCA CAT TTC TAC 876 Met Glu Gly Gin Ser Tyr Gly Glu Gin Ile Gly Ile Ser His Phe Tyr 570 575 580 585 CAA ATG TTC TAT AGG GGT GAT CTT TAT AAT TTT GAA ATT GGA CCT AAT 924 Gin Met Phe Tyr Arg Gly Asp Leu Tyr Asn Phe Glu Ile Gly Pro Asn 590 595 600 GGA GTA GAA GCT TCC CAA CTT TAT CCA GAA GTA AAA TAT ACA ACA GTG 972 Gly Val Glu Ala Ser Gin Leu Tyr Pro Glu Val Lys Tyr Thr Thr Val 605 610 615 GAT TCA TAC ATG GAA CGC TAC CTA TGAAAATCTT CTTCACGAAG ATATCTAAAT 1026 Asp Ser Tyr Met Glu Arg Tyr Leu 620 625 TTAATTTAAG CTTTCTAAAA GTTTTTATAT TTTGACATTA TGCTAAATAA AAATGGAGAG 1086 TATCTAGATA ATAATATTGA CCAATCATAT TAAAAATTAT TGGGATTAAA AAAAAAAAAA 1146 AAAAA 1151 INFORMATION FOR SEQ ID NO:64: SEQUENCE CHARACTERISTICS: LENGTH: 312 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:64: Met Glu Glu Ser Ser Arg Val Leu Ile Val Gly Gly Thr Gly Tyr Ile 1 5 10 Gly Arg Arg Ile Val Lys Ala Ser Ile Ala Leu Gly His Pro Thr Phe 25 Ile Leu Phe Arg Lys Glu Val Val Ser Asp Val Glu Lys Val Glu Met 40 Leu Leu Ser Phe Lys Lys Asn Gly Ala Lys Leu Leu Glu Ala Ser Phe 55 Asp Asp His Glu Ser Leu Val Asp Ala Val Lys Gin Val Asp Val Val 70 75 Ile Ser Ala Val Ala Gly Asn His Met Arg His His Ile Leu Gin Gin 90 Leu Lys Leu Val Glu Ala Ile Lys Glu Ala Gly Asn Ile Lys Arg Phe 100 105 110 WO 98/20113 PCT/US97/20391 -126- Val Pro Ser Glu Phe Gly Met 115 Ala Ile 145 Ala Pro Trp Asp Ile 225 Lys Met Gin Gly Asp 305 Pro 130 Glu Gly Ser Val Pro 210 Leu Ser Glu Met Val 290 Ser Gly Ala Tyr Glu Asp 195 His Ser Leu Gly Phe 275 Glu Tyr Asn Ala Leu Lys 180 Glu Thr Gin Asn Gin 
260 Tyr Ala Met Ile Val Ser Ile 150 Val Gly 165 Val Ile Asp Asp Leu Asn Lys Glu 230 Lys Ile 245 Ser Tyr Arg Gly Ser Gin Glu Arg 310 Phe 135 Pro Gly Leu Val Lys 215 Val Asn Gly Asp Leu 295 Tyr Asp Pro 120 Ile Asp His Thr Leu Ala Tyr Gly 185 Gly Ile 200 Thr Met Val Glu Ile Ser Glu Gin 265 Leu Tyr 280 Tyr Pro Leu Gly Lys Tyr Gin 170 Asp Tyr Tyr Lys Val 250 Ile Asn Glu Leu Ile Ile 155 Leu Gly Thr Ile Trp 235 Glu Gly Phe Val Met Lys 140 Ser Gly Asn Ile Arg 220 Glu Asp Ile Glu Lys 300 Glu His 125 Val Arg Ala Asn Arg Val Val Lys 190 Lys Ala 205 Pro Pro Lys Leu Phe Leu Ser His 270 Ile Gly 285 Tyr Thr Ala Glu Ile Met 175 Ala Ile Leu Ser Ala 255 Phe Pro Thr Met Ala Phe 160 Pro Val Asp Asn Gly 240 Gly Tyr Asn Val INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 1308 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: Thuja plicata cDNA PLR-Tp3 (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (ix) FEATURE: NAME/KEY: CDS LOCATION: 164..1105 (xi) SEQUENCE DESCRIPTION: SEQ ID ii WO 98/20113 PCTIUS97/20391 -127- AAAAACTCTT AGACTTATTT TCATTTTTAC CCAGTTCATA AGTGTTTGTT
GGGTCTCTTC
AAAAAAAGCC CCCTCTCGTT AGAGGCAAAG AACAGCATGC TCAGATATAT
GTAAGAAGCA
AAATGCCCAA AATTTGACTG TGAAAGTGGA TGCACATAAG AAT ATO GAT AAG AAG Met Asp Lys Lys 315 AGC AGA GTT Ser Arg Val GTG AAG 0CC Val Lys Ala 335 CTA ATA Leu Ile 320 GTG GGG GGT Val Gly Gly
ACT
Thr 325 GGT TTT ATA GGC Gly Phe Ile Gly AAA AGA ATT Lys Arg Ile 330 TTG TTC AGG Leu Phe Arg 223 271 AGT TTG GCT CTT Ser Leu Ala Leu CAT CCT ACT TAT His Pro Thr Tyr CCA GAA Pro Glu 350 GCC CTC TCT TAC Ala Leu Ser Tyr
ATT
Ile 355 GAC AAA OTG GAG Asp Lys Val Gin
ATO
Met 360 TTG ATA TCC TTC Leu Ile Ser Phe
AAA
Lys 365 CAG CTT GGG GCC Gin Leu Gly Ala
AAA
Lys 370 CTT CTT GAG OCT Leu Leu Glu Ala
TCA
Ser 375 TTG OAT GAC CAC Leu Asp Asp His GGG CTT GTG GAT Gly Leu Vai Asp GTG AAA CAA OTA Val Lys Gin Val OTT GTG ATC AGT Vai Val Ile Ser OCT OTT Ala Val 395 415 463 TCA OGA GGT Ser Gly Oly GAG OCA ATT Glu Ala Ile 415
CTG
Leu 400 GTG COC CAC CAT Vai Arg His His
ATA
Ile 405 CTT GAG CAG CTC Leu Asp Gin Leu AAO CTA GTG Lys Leu Val 410 CCT TCA GAA Pro Ser Glu AAA GAA OCT GGC Lys Olu Aia Gly ATT AAG AGA TTT Ile Lys Arg Phe
CTT
Leu 425 TTT GGG Phe Gly 430 ATO GAG CCA OAT Met Asp Pro Asp
OTT
Val 435 OTA GAA OAT CCA Val Glu Asp Pro
TTO
Leu 440 OAA CCT GOT AAC Glu Pro Giy Asn
ATT
Ile 445 ACA TTC ATT OAT Thr Phe Ile Asp
AAA
Lys 450 AGA AAA OTT AGA Arg Lys Val Arg 0CC ATT GAA OCA Ala Ile Olu Ala 607 655 ACC ATT CCT TAC Thr Ile Pro Tyr TAT GTG TCT TCA Tyr Val Ser Ser ATO TTT OCT GGG Met Phe Aia Oly TTC TTT Phe Phe 475 OCT OGA AGC Ala Gly Ser CGA OAT AAA Arg Asp Lys 495
TTA
Leu 480 GCA CAA CTG CAA Ala Gin Leu Gin
OAT
Asp 485 OCT CCC COC ATO Ala Pro Arq Met ATO CCT OCT Met Pro Ala 490 GGT OTT TAT Oly Val Tyr OTT CTC ATA TAT Val Leu Ile Tyr OAT OGA AAT OTT Asp Oly Asn Val
AAA
Lys 505 OTA OAT Val Asp 510 OAA OAT OAT OCT Olu Asp Asp Ala OGA ATA Gly Ile 515 TAG ATA GTC Tyr Ile Val
AAA
Lys 520 TCA ATT OAT OAT Ser Ile Asp Asp WO 98/20113 PCT/US97/20391 -128- CCT CGC ACA CTC AAC AAG ACT GTG TAT ATC AGG Pro Arg Thr Leu Asn Lys Thr Val Tyr Ile Arg 525 530 535 CTT TCA CAG AAA GAA GTA GTT GAA ATA TGG GAG Leu Ser Gin Lys Glu Val Val Glu Ile Trp Glu 545 550 AGC CTA GAA AAA ATC TAC GTT TCT GAG GAC CAA Ser Leu Glu Lys Ile Tyr Val Ser Glu Asp Gin 560 565 GAT AAA TCT TAT GTG GAG AAG ATG GCA CGA TGT Asp Lys Ser Tyr Val Glu Lys Met Ala Arg Cys 575 580 TTT ATC AAA GGG GAT CTT TAC AAT TTT GAA ATT Phe Ile Lys Gly Asp Leu Tyr Asn Phe Glu Ile 590 595 GAA GGC ACA AAA CTT TAT CCA GAA GTC AAA TAC Glu Gly Thr Lys Leu Tyr Pro Glu Val Lys Tyr 605 610 615 TAT ATG GAG CGT TAT CTA TAGCTAATAG ATTTTTCT Tyr Met Glu Arg Tyr Leu 625 TGAAATATTC TATACTCAAT AAGAGTGTAT TCATAAATAA ATAGATTACT TTTTTAATAG GTGGCTTTTA TAAAACATGT TATTTTTAAA TTAGCAATAA TAACCACCTT
TAAATAAAA
INFORMATION FOR SEQ ID NO:66: SEQUENCE CHARACTERISTICS: LENGTH: 314 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6 Met Asp Lys Lys Ser Arg Val Leu Ile Val Gly 1 5 10 Gly Lys Arg Ile Val Lys Ala Ser Leu Ala Leu 25 Val Leu Phe Arg Pro Glu Ala Leu Ser Tyr Ile 40 Leu Ile Ser Phe Lys Gin Leu Gly Ala Lys Leu 55 Asp Asp His Gin Gly Leu Val Asp Val Val Lys 70 75 CCA CCA ATG AAT Pro Pro Met Asn AGA CTA TCA GGT Arg Leu Ser Gly 555 CTT CTT AAT ATG Leu Leu Asn Met 570 CAT CTC TAT CAT His Leu Tyr His 585 GGA CCC AAT GCT Gly Pro Asn Ala 600 ACA ACC ATG GAT Thr Thr Met Asp TA AATAATAGCT
ATA
Ile 540
TTG
Leu
AAA
Lys
TTT
Phe
ACT
Thr
TCA
Ser 620 847 895 943 991 1039 1087 1135 1195 1255 1308 TACACAACAC TTGCTCTTTT ATAAAAAAAA TTGCAAACAA AAAAAAAAAA AAA 6: Gly-Thr Gly Phe Ile Gly His Pro Thr Tyr Asp Lys Val Gin Met Leu Glu Ala Ser Leu Gin Val Asp Val Val WO 98/20113 PCTIUS97/20391 -129- Ile Ser Ala Val Ser Gly Gly Leu Val Leu Leu Glu Ile 145 Ala Met Lys Ser Pro 225 Leu Leu Leu Pro Thr 305 Lys Pro Pro 130 Glu Gly Met Gly Ile 210 Met Ser Asn Tyr Asn 290 Met Leu Ser 115 Gly Ala Phe Pro Val 195 Asp Asn Gly Met His 275 Ala Asp Val 100 Glu Asn Ala Phe Ala 180 Tyr Asp Ile Leu Lys 260 Phe Thr Ser Glu Phe Ile Thr Ala 165 Arg Val Pro Leu Ser 245 Asp Phe Glu Tyr Ala Gly Thr Ile 150 Gly Asp Asp Arg Ser 230 Leu Lys Ile Gly Met 310 Ile Met Phe 135 Pro Ser Lys Glu Thr 215 Gin Glu Ser Lys Thr 295 Glu Lys Asp 120 Ile Tyr Leu Val Asp 200 Leu Lys Lys Tyr Gly 280 Lys Arg Glu 105 Pro Asp Thr Ala Leu 185 Asp Asn Glu Ile Val 265 Asp Leu Tyr Arg 90 Ala Asp Lys Tyr Gin 170 Ile Ala Lys Val Tyr 250 Glu Leu Tyr Leu His His Ile Leu Asp Gly Val Arg Val 155 Leu Tyr Gly Thr Val 235 Val Lys Tyr Pro Asn Val Lys 140 Ser Gin Gly Ile Val 220 Glu Ser Met Asn Glu 300 Ile Glu 125 Val Ser Asp Asp Tyr 205 Tyr Ile Glu Ala Phe 285 Val Lys 110 Asp Arg Asn Ala Gly 190 Ile Ile Trp Asp Arg 270 Glu Lys Arg Pro Arg Met Pro 175 Asn Val Arg Glu Gin 255 Cys Ile Tyr Gin Phe Leu Ala Phe 160 Arg Val Lys Pro Arg 240 Leu His Gly Thr INFORMATION FOR SEQ ID NO:67: SEQUENCE CHARACTERISTICS: LENGTH: 1287 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: Thuja plicata cDNA PLR-Tp4 (iii) HYPOTHETICAL:
NO
(iv) ANTI-SENSE: NO WO 98/20113 -130- (ix) FEATURE: NAME/KEY:
CDS
LOCATION: 11..946 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:67: PCTIUS97/20391 GAAAGCAGAG ATG GAA GAG AGT AGC AGG ATT TTG Met Glu Glu Ser Ser Arg Ile Leu 315 320 GTA GTG GGA GGC ACA Val Val Gly Gly Thr 325 GGA TAC ATA Gly Tyr Ile 330 GGC AGA AGG ATT Gly Arg Arg Ile AAA GCC AGC ATT Lys Ala Ser Ile CTG GGC CAT Leu Gly His GTA GAG AAA Val Glu Lys CCT ACT Pro Thr 345 TTC ATT TTG TTT Phe Ile Leu Phe
AGG
Arg 350 AAA GAA GTT GTT Lys Glu Val Val TCT GAT Ser Asp 355
GTG
Val 360 GAG ATG TTA TTG Glu Met Leu Leu
TCC
Ser 365 TTC AAA AAG AAT Phe Lys Lys Asn
GGT
Gly 370 GCC AAA TTA CTG Ala Lys Leu Leu
GAG
Glu 375 145 193 241 GCT TCA TTT GAT Ala Ser Phe Asp CAC GAA AGC CTT His Glu Ser Leu GTA GAT Val Asp 385 GCT GTG AAG Ala Val Lys CAG GTT Gin Val 390 GAT GTT GTC Asp Val Val CTT CAA CAG Leu Gin Gin 410
ATA
Ile 395 AGT GCA GTT GCA Ser Ala Val Ala
GGA
Gly 400 AAC CAC ATG CGG Asn His Met Arg CAT CAC ATC His His Ile 405 GGA AAT ATT Gly Asn Ile CTC AAA TTA GTG Leu Lys Leu Val
GAG
Glu 415 GCC ATT AAA GAA Ala Ile Lys Glu 337 AAG AGG Lys Arg 425 TTT GTC CCT TCA Phe Val Pro Ser
GAA
Glu 430 TTT GGG ATG GAT Phe Gly Met Asp GGG TTA ATG GAC Gly Leu Met Asp
CAT
His 440 GCA ATG GCA CCA Ala Met Ala Pro AAC ATT GTA TTT Asn Ile Val Phe GAT AAA ATA AAA Asp Lys Ile Lys CGA GAG GCC ATT Arg Glu Ala Ile
GAA
Glu 460 GCT GCA GCT ATT Ala Ala Ala Ile
CCT
Pro 465 CAC ACT TAT ATT His Thr Tyr Ile TCT GCC Ser Ala 470 AAT ATA TTT Asn Ile Phe GTG ATG CCT Val Met Pro 490
GCT
Ala 475 GGC TAC TTG GTT Gly Tyr Leu Val
GGT
Gly 480 GGA TTA GCT CAA Gly Leu Ala Gin CTT GGT CGT Leu Gly Arg 485 GGA AAT GTC Gly Asn Val 481 529 577 625 CCT TCA GAC AAA Pro Ser Asp Lys
GTA
Val 495 TTT CTC TAT GGA Phe Leu Tyr Gly AAA GCT Lys Ala 505 GTT TGG ATA GAT Val Trp Ile Asp
GAA
Glu 510 GAA GAT GTT GGA Glu Asp Val Gly
ATA
Ile 515 TAC ACA ATC AAA Tyr Thr Ile Lys WO 98/20113 PCT/US97/20391 -131-
GCA
Ala 520
CCT
Pro ATT GAT GAC CCT Ile Asp Asp Pro
CGC
Arg 525
TCC
Ser ACC CTA AAT AAG Thr Leu Asn Lys
ACT
Thr 530
GTT
Val GTG TAC ATC AGG Val Tyr Ile Arg
CCA
Pro 535 TTG AAT GTT Leu Asn Val CAG AAG GAA Gin Lys Glu GAA AAA TGG Glu Lys Trp TTA TCA AGA Leu Ser Arg CTC GCA GGC Leu Ala Gly 570 CAT TTC TAT His Phe Tyr 585
AAG
Lys 555
ATG
Met AGC TTG GAT AAA Ser Leu Asp Lys ATA TAT ATG Ile Tyr Met 560 TCT GTT Ser Val
GAG
Glu 565 GAA GGT CAA Glu Gly Gin
TAT
Tyr
GGG
Gly CAG ATG TTC Gin Met Phe
TAT
Tyr 590
GCT
Ala GGA GAG AAG ATT GGA Gly Glu Lys Ile Gly 580 GAT CTT TAT AAT TTT Asp Leu Tyr Asn Phe 595 CTT TAC CCA GGA GTA Leu Tyr Pro Gly Val 610 GAA AAA Glu Lys 550 GAT TTT Asp Phe ATA TCA Ile Ser GAA ATT Glu Ile AAA TAC Lys Tyr 615 673 721 769 817 865 913 966
GGA
Gly 600
ACA
Thr CCT AAT GGA GTA Pro Asn Gly Val TCC CAA Ser Gin ACA GTG GAC Thr Val Asp
TCA
Ser 620 ATG GAG CGC Met Glu Arg
TAC
Tyr 625 TGAAAATCTT CTTCATGAAG
ATATTTAAAT
TAGATGTAGA
GACTTTTTCC
AGGTTGAGAA
ATTGTATTTA
TTTAAAAAA
TCAATTTAAT GCTTTCTAAA GTATCTAGAT AATAATATTC CTTTAACTGC ATGCTCAACA ACTAAATATG GTTTTGTATT TTTTGAATGT TATGATTTTG AAAAAAAAAA A
AGTTTTTATA
AATTGATAAT
TATTTTATAC
ACATGGAAAA
ATAAAATTTG
TTTTGACATA
ATTCAACAAT
AAACAAGCTA
ACCATATTTT
AAATTGATTA
ATGCTAAATA
CAGTTGAGAT
ATGTCTTTTA
GATATTTGAG
TGAACATTGT
1026 1086 1146 1206 1266 1287
INFORMATION FOR SEQ ID NO:68:
SEQUENCE CHARACTERISTICS:
LENGTH: 312 amino acids
TYPE: amino acid
TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:68:
Met Glu Glu Ser Ser Arg Ile Leu Val Val Gly Gly Thr Gly Tyr Ile 16
Gly Arg Arg Ile Val Lys Ala Ser Ile Ala Leu Gly His Pro Thr Phe 32
Ile Leu Phe Arg Lys Glu Val Val Ser Asp Val Glu Lys Val Glu Met 48
Leu Leu Ser Phe Lys Lys Asn Gly Ala Lys Leu Leu Glu Ala Ser Phe 64
Asp Asp His Glu Ser Leu Val Asp Ala Val Lys Gln Val Asp Val Val 80
Ile Ser Ala Val Ala Gly Asn His Met Arg His His Ile Leu Gln Gln 96
Leu Lys Leu Val Glu Ala Ile Lys Glu Ala Gly Asn Ile Lys Arg Phe 112
Val Pro Ser Glu Phe Gly Met Asp Pro Gly Leu Met Asp His Ala Met 128
Ala Pro Gly Asn Ile Val Phe Ile Asp Lys Ile Lys Val Arg Glu Ala 144
Ile Glu Ala Ala Ala Ile Pro His Thr Tyr Ile Ser Ala Asn Ile Phe 160
Ala Gly Tyr Leu Val Gly Gly Leu Ala Gln Leu Gly Arg Val Met Pro 176
Pro Ser Asp Lys Val Phe Leu Tyr Gly Asp Gly Asn Val Lys Ala Val 192
Trp Ile Asp Glu Glu Asp Val Gly Ile Tyr Thr Ile Lys Ala Ile Asp 208
Asp Pro Arg Thr Leu Asn Lys Thr Val Tyr Ile Arg Pro Pro Leu Asn 224
Val Leu Ser Gln Lys Glu Val Val Glu Lys Trp Glu Lys Leu Ser Arg 240
Lys Ser Leu Asp Lys Ile Tyr Met Ser Val Glu Asp Phe Leu Ala Gly 256
Met Glu Gly Gln Ser Tyr Gly Glu Lys Ile Gly Ile Ser His Phe Tyr 272
Gln Met Phe Tyr Lys Gly Asp Leu Tyr Asn Phe Glu Ile Gly Pro Asn 288
Gly Val Glu Ala Ser Gln Leu Tyr Pro Gly Val Lys Tyr Thr Thr Val 304
Asp Ser Tyr Met Glu Arg Tyr Leu 312

INFORMATION FOR SEQ ID NO:69:
SEQUENCE CHARACTERISTICS:
LENGTH: 1282 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: Tsuga heterophylla cDNA PLR-Th1
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(ix) FEATURE:
NAME/KEY: CDS
LOCATION: 2..922
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:69: C AGA GTT CTA ATA GTG GGT GGC ACA GGA TAC ATA GGT AGA AAA TTT Arg Val Leu Ile Val Gly Gly Thr Gly Tyr Ile Gly Arg Lys Phe 315 320 325 GTA AAA GCT Val Lys Ala 330 AGC TTA GCT CTA Ser Leu Ala Leu
GGC
Gly 335 CAC CCA ACA TTC His Pro Thr Phe TTG TCC AGG Leu Ser Arg CCA GAA Pro Glu 345 GTA GGG TTT GAC Val Gly Phe Asp
ATT
Ile 350 GAG AAG GTG CAC Glu Lys Val His
ATG
Met 355 TTG CTC TCC TTC Leu Leu Ser Phe
AAA
Lys 360 CAA GCG GGT GCC Gin Ala Gly Ala
AGA
Arg 365 CTT TTG GAG GGT Leu Leu Glu Gly
TCA
Ser 370 TTT GAG GAT TTC Phe Glu Asp Phe
CAA
Gin 375 190 AGC CTT GTG GCA Ser Leu Val Ala
GCC
Ala 380 TTG AAG CAG GTT Leu Lys Gin Val
GAT
Asp 385 GTT GTG ATA AGT Val Val Ile Ser GCA GTG Ala Val 390 GCA GGA AAC Ala Gly Asn GAA GCC ATA Glu Ala Ile 410
CAT
His 395 TTC AGA AAC CTT Phe Arg Asn Leu
ATA
Ile 400 CTT CAA CAG CTT Leu Gin Gin Leu AAA TTG GTG Lys Leu Val 405 CCT TCT GAA Pro Ser Glu AAA GAA GCT GGC Lys Glu Ala Gly ATT AAG AGA TTT Ile Lys Arg Phe TTT GGA Phe Gly 425 ATG GAA CCA GAC Met Glu Pro Asp
CTC
Leu 430 ATG GAG CAC GCT Met Glu His Ala
TTG
Leu 435 GAA CCT GGT AAC Glu Pro Gly Asn
GCT
Ala 440 GTC TTC ATT GAT Val Phe Ile Asp AGA AAG GTT CGG Arg Lys Val Arg
CGC
Arg 450 GCC ATT GAA GCA Ala Ile Glu Ala
GCA
Ala 455 GGC ATT CCT TAC Gly Ile Pro Tyr
ACG
Thr 460 TAT GTC TCT TCA Tyr Val Ser Ser
AAT
Asn 465 ATA TTT GCT GGG Ile Phe Ala Gly TAT TTA Tyr Leu 470 478 526 GCA GGA GGG Ala Gly Gly GTA GTT ATC Val Val Ile 490 GCA CAA ATT GGC Ala Gin Ile Gly CTT ATG CCT CCT Leu Met Pro Pro CGT GAT GAA Arg Asp Glu 485 TAT GGA GAT GGT Tyr Gly Asp Gly GTT AAA GCT GTT TGG GTG GAC GAA Val Lys Ala Val Trp Val Asp Glu WO 98/20113 WO 9820113PCTIUS97/20391 -134- GAT GAT Asp Asp 505 CTG AAC Leu Asn GTG GGA ATA TAC ACA GTG AAA ACA ATC Val Gly Ile Tyr Thr Leu Lys Thr Ile 510
ATC
Ile GAT GAT Asp Asp 515 AAT ATT Asn Ile AAG ACT GTA Lys Thr Val AGG CCA CTC Arg Pro Leu
AAA
Lys 530 CCA CGC ACT Pro Arg Thr CTC TCT GAG Leu Ser Gin 535 TGT TTG AAG Cys Leu Lys 520
AAG
Lys GAG GTT GTG Glu Leu Val
GCA
Ala 540
TCT
Ser AAG TGG GM MA Lys Trp Giu Lys
CTC
Leu 545
GTT
Leu TGA GGA MAG Ser Gly Lys GCA GGC ATC Ala Gly Ile MAA AGA TAG Lys Thr Tyr CGT TAG GMA Pro Tyr Giu 570 AGT GGA GAT Ser Gly Asp
ATT
Ile 555
CAT
His GCT GAG GAT Ala Giu Asp
TTT
Phe 560
TGT
Ser
GMA
Giu 565
ATG
Met GAG GTC GGA Gin Val Gly 550 GAT CMA Asp Gin TTT TAC Phe Tyr
ATA
Ile 575
GAG
Glu CAC TTC TAT His Phe Tyr
CMA
Gin 580
GGT
Gly CTC TAT MAT Leu Tyr Asn 585 ACA GTG Thr Val
TTT
Phe 590
GTT
Val1 ATT GGG GCA Ile Sly Pro
GAG
Asp 595
ATG
Met AGA GMA SCA Arg Giu Aia CTA TAC CCT Leu Tyr Pro GAA TAG ACT ACC Gin Tyr Thr Thr GAT TCT TAT Asp Ser Tyr 600
MAG
Lys 718 766 814 862 910 962 1022 1082 1142 1202 1262 1282 CSC TAG TTA Arq Tyr Leu TMSGCAGSAT GAAGGTTMAT GTTGTAGGAC ATGMATCGGA
CGASMAATAC
TCCATCAGMA
TTAAGTTTTA
GGGGCAGGTT
TATCGATATA
TGACAAMM
CASAAATCTT CATTCMSGAT CAAATAATGS ATACCAGAAA TTTCTMATCG
AGTTCAAATA
TTTATGGMJ TAGGGGTGSA
CGMATTSMAT
GTAAAATTGC MGCTGTACA
GTMACTAGGT
AGTGATGTGA ASATTACCA
TTTCGTMATA
ATMAATMTT
ATGGATMAAT
ATATATTCAT
GTTGTCGCGA
ACTATGCTTG
GMGCATTAST
AATTCATTAT
CTSATATGGA
MAAGGTACTA
AATTTATTTT
INFORMATION FOR SEQ ID Wi SEQUENCE CHARACTERISTICS: LENGTH: 307 amino acids TYPE: amino acid TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID Arq Val Leu Ile Val Gly Sly Thr Giy Tyr Ile Sly Arg Lys Phe Vai 1 5 10 WO 98/20113 WO 9820113PCT/US97/20391 -135- Lys Giu Gin Leu Gly Al a Gly Val Ile 145 Gly Val1 Asp Asn Giu 225 Thr Tyr Giy Ala Val1 Ala Val As n Ile Met Phe 130 Pro Giy Ile Val1 Lys 210 Leu Tyr Giu Asp Ser Giy Gly Ala His Lys Glu 115 Ile Tyr Leu Tyr Gly 195 Thr Val Ile His Leu 275 Leu Phe Al a Al a Phe Giu 100 Pro Asp Thr Ala Giy 180 Ile Val Al a Ser Gin 260 Tyr Ala Asp Arg Leu Arg Al a Asp Lys Tyr Gin 165 Asp Tyr Tyr Lys Aia 245 Val1 Asn Leu Ile Leu Lys 70 Asn Gly Leu Arq Val 150 Ile Gly Thr Ile T rp 230 Giu Gly Phe Giy His Glu Lys 40 Leu Glu 55 Gin Val Leu Ile Asn Ile Met Glu 120 Lys Val 135 Ser Ser Gly Arg Asn Val Leu Lys 200 Arg Pro 215 Glu Lys Asp Phe Ile Ser Giu Ile 280 Pro 25 Val Gly Asp Leu Lys 105 His Arg As n Leu Lys 185 Thr Le u Leu Leu His 265 Gly Thr His Ser Val Gin 90 Arg Al a Arg Ile Met 170 Al a Ile Lys Ser Ala 250 Phe Pro Phe Met Phe Val 75 Gin Phe Leu Al a Phe 155 Pro Val Asp Asn Gi y 235 Gly Tyr Asp Val Leu Giu Ile Leu Leu Gi u Ile 140 Al a Pro Trp Asp Ile 220 Lys le Gin Gi y Asp 300 Leu Leu Asp Ser Lys Pro Pro 125 Giu Gly Arg Val1 Pro 205 Leu Cys Glu Met Arg 285 Ser Ser Phe Al a Le u Ser 110 Gi y Ala Tyr Asp Asp 190 Arg Ser Leu Asp Phe 270 Glu Arg Pro Phe Lys Gin Ser Val Ala Val Glu Giu Phe Asn Ala Ala Gly Leu Ala 160 Glu Val 175 Giu Asp Thr Leu Gin Lys Lys Lys 240 Gin Pro 255 Tyr Ser Ala Thr Val Leu Tyr Pro Giu Val Gin Tyr Thr Thr Met Ser Tyr Leu Lys Arg 305 (2) 290 Tyr Leu INFORMATION FOR SEQ ID NO:71: WO 98/20113 PCTfUS97/20391 -136- SEQUENCE CHARACTERISTICS: LENGTH: 1328 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: Tsuga heterophylla cDNA'PLR-Th2 (iii) HYPOTHETICAL:
NO
(iv) ANTI-SENSE: NO (ix) FEATURE: NAME/KEY: CDS LOCATION: 20..946 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:71: GAATTCGGCA CGAGCTAAC ATG AGC AGA GTT CTA ATA GTG GGT GGC ACA GGA Met Ser Arg Val Leu Ile Val Gly Gly Thr Gly 310 315 TAC ATA Tyr Ile 320 GGT AGA AAA TTT Gly Arg Lys Phe
GTA
Val 325 AAA GCT AGC TTA Lys Ala Ser Leu CTA GGC CAC CCA Leu Gly His Pro
ACA
Thr 335 TTC GTT TTG TCC Phe Val Leu Ser
AGG
Arg 340 CCA GAA GTA GGG Pro Glu Val Gly GAC ATT GAG AAG Asp Ile Glu Lys CAC ATG TTG CTC His Met Leu Leu TTC AAA CAA GCG Phe Lys Gin Ala
GGT
Gly 360 GCC AGA CTT TTG Ala Arg Leu Leu GAG GGT Glu Gly 365 196 TCA TTT GAG Ser Phe Glu GTT GTG ATA Val Val Ile 385
GAT
Asp 370 TTC CAA AGC CTT Phe Gin Ser Leu GCA GCC TTG AAG Ala Ala Leu Lys CAG GTT GAT Gin Val Asp 380 AGT GCA GTG GCA Ser Ala Val Ala
GGA
Gly 390 AAC CAT TTC Asn His Phe AGA AAC CTT ATA CTT Arg Asn Leu Ile Leu 395 CAA CAG Gin Gin 400 CTT AAA TTG GTG Leu Lys Leu Val
GAA
Glu 405 GCC ATA AAA GAG Ala Ile Lys Glu CGC AAC ATT.AAG Arg Asn Ile Lys
AGA
Arg 415 TTT CTT CCT TCT Phe Leu Pro Ser TTT GGA ATG GAC Phe Gly Met Asp
CCA
Pro 425 GAC CTC ATG GAG Asp Leu Met Glu 388 436 GCT TTG GAA CCT Ala Leu Glu Pro
GGT
Gly 435 AAC GCT GTC TTC ATT GAT AAG AGA AAG Asn Ala Val Phe Ile Asp Lys Arg Lys 440 GTT CGG Val Arg 445 CGC GCC ATT Arg Ala Ile GAA GCA Glu Ala 450 GCA GGC ATT Ala Gly Ile TAC ACG TAT GTC Tyr Thr Tyr Val TCT TCA AAT Ser Ser Asn 460 WO 98/201 13 PCTIUS97/20391 -137- ATA TTT Ile Phe ATG CCT Met Pro 480 GCT OTT Ala Val GCT GOG TAT TTA OCA GGA GOG TTG GCA CAA ATT GGC CGG CTT Ala 465
CCT
Pro Gliy Tyr Leu Ala Gly Leu Ala Gin Ile Oly Arg Leu COT GAT GAA Arg Asp Giu
GTA
Val1 485
GAT
Asp ATC TAT OGA Ile Tyr Gly
OAT
Asp 490 GOT AAC OTT AAA Gly Asn Val Lys 580 TOO OTO GAC Trp Val Asp
GAA
Glu 500 OAT OTC OGA Asp Val Oly 495
ATC
Ile
ATA
Ile 505
OTA
Val1 TAG ACA CTG AAA Tyr Thr Leu Lys
ACA
Thr 510.
OAT OAT CCA Asp Asp Pro ACT CTG AAG AAO Thr Leu Asn Lys
ACT
Thr 520 TAT ATC AGO Tyr Ile Arg CGA CTG Pro Leu 525 AAA AAT ATA Lys Asn Ile TGA OGA AAG Ser Gly Lys 545 OCA GCG ATG Ala Giv Ile
CTC
Leu 530
TTT
Phe CG AAG GAO Gin Lys Glu GTT OTG GA Leu Val Ala 535 AAG TG Lys Trp TTG AAG AAA Leu Lys Lys
TAG
Tyr
GAA
Giu GAA GAT CAA Giu Asp Gin 560 TTC TAT Phe Tyr
GCT
Pro 565
AOT
Ser ATT TGT OCT GAG Ile Ser Ala Glu 555 CAT GAG GTCGOGA His Gin Val Gly 570 CTG TAT AAT TTT Leu Tyr Asn Phe GAA AAA GTC Giu Lys Leu 540 GAT TTT CTT Asp Phe Leu ATA TGT GAG Ile Ser His GAG ATT GO Giu Ile Gly 590 CAA TAG ACT Gin Tyr Thr 605 CAA ATG TTT Gin Met Phe 575
GGA
Pro
TAG
Tyr 580
OCA
Ala OGA OAT Gly Asp GAG GGT AGA Asp Gly Arg 585
CGT
Pro 820 868 916 966 ACA ATG GTA Thr Met Leu
TAG
T yr 600 GAA OTT Giu Val ACC ATG OAT Thr Met Asp
TCT
Ser 610 TTG AAG CGG Leu Lys Arg TAG TTA Tyr Leu 615 TAAGCAGGAT GAAGGTTAAT
GTTGTACOAC
ATAAATAATT
ATGOATAAAT
AATCAOTATT
TACAGTAACT
TAAAGCTACC
ATGAATGGGA
CAACATTAOT
AATTCATTAT
GAATATATAT
AGGTCTTGTG
ATATCGATAT
GGAGAAATAG
TCGATCAGAA
TTAAGTTTTA
TGATCTGATA
GGGAAAAGCT
AACTGATGTG
CGAAATGTT
ATATCAGAAA
TTTATTGAAA
TGGAGGGGGA
AGGATATCGA
ACCATTTCGT
CATTCAAGAT
TTTCTAATCA
TAGGOCTOGA
GGTTGTAAAA
TATAACTAAG
AATAACTATG
CAAATAATGG
AGTTCAAATA
CGAAGCCTTT
TTGCAAGCCG
TCTTOTGG
CTTGTGCAGG
1026 1086 1146 1206 1266 1326 1328 INFORMATION FOR SEQ ID NO:72: SEQUENCE CHARACTERISTICS: LENGTH: 309 amino acids TYPE: amino acid
A
TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:72:
Met Ser Arg Val Leu Ile Val Gly Gly Thr Gly Tyr Ile Gly Arg Lys Phe Arg Phe Gln Val Val Glu Asn Ala 145 Leu Glu Glu Thr Gln 225 Lys Gln Val Pro Lys Ser Ala Glu Phe Ala 130 Gly Ala Val Asp Leu 210 Lys Lys Pro Lys Glu Gln Leu Gly Ala Gly 115 Val Ile Gly Val Asp 195 Asn Glu Thr Tyr Ala Val Ala Val Asn Ile 100 Met Phe Pro Gly Ile 180 Val Lys Leu Tyr Glu 260 Ser Gly Gly Ala His Lys Asp Ile Tyr Leu 165 Tyr Gly Thr Val Ile 245 His Leu Phe Ala Ala 70 Phe Glu Pro Asp Thr 150 Ala Gly Ile Val Ala 230 Ser Gln Ala Asp Arg 55 Leu Arg Ala Asp Lys 135 Tyr Gln Asp Tyr Tyr 215 Lys Ala Val Leu Ile Leu Lys Asn Arg Leu 120 Arg Val Ile Gly Thr 200 Ile Trp Glu Gly Gly 25 Glu Leu Gln Leu Asn 105 Met Lys Ser Gly Asn 185 Leu Arg Glu Asp Ile 265 His Lys Glu Val Ile 90 Ile Glu Val Ser Arg 170 Val Lys Pro Lys Phe 250 Ser Pro Val Gly Asp 75 Leu Lys His Arg Asn 155 Leu Lys Thr Leu Leu 235 Leu His Thr His Ser Val Gln Arg Ala Arg 140 Ile Met Ala Ile Lys 220 Ser Ala Phe Val Leu Glu Ile Leu Leu 110 Glu Ile Ala Pro Trp 190 Asp Ile Lys Ile Gln 270 Leu Leu Asp Ser Lys Pro Pro Glu Gly Arg 175 Val Pro Leu Phe Glu 255 Met Ser Ser Phe Ala Leu Ser Gly Ala Tyr 160 Asp Asp Arg Ser Leu 240 Asp Phe Tyr Ser Gly 275 Asp Leu Tyr Asn Phe Glu Ile Gly Pro 280 Asp Gly Arg Glu 285 Ala Thr Met Leu Tyr Pro Glu Val Gln Tyr Thr Thr Met Asp Ser Tyr 290 295 300 Lys Arg Tyr Leu

INFORMATION FOR SEQ ID NO:73:
SEQUENCE CHARACTERISTICS:
LENGTH: 355 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA probe used to isolate Forsythia intermedia dirigent protein cDNA clone
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:73:
AAGGAGCTGG TGTTCTACTT CCACGACATA CTTTTCAAAG GGGATAATTA CAACAATGCC 60
ACTGCCACCA TAGTCGGGTC CCCCCAATGG GGCAACAAGA CTGCCATGGC CGTGCCATTC 120
AATTTTGGTG ACCTAATGGT GTTCGACGAT CCCATTACCT TAGACAACAA TCTGCATTCA 180
CCCCCAGTGG GTCGGGCACA AGGGATGTAC TTCTATGATC AAAAAAGTAC ATACAATGCT 240
TGGCTCGGGT TCTCATTTTT GTTCAATTCA ACTAAGTATG TTGGAACCTT GAACTTTGCT 300
GGGGCTGATC CATTGTTGAA CAAGACTAGG GACGTATCAG TCATTGGTGG AACCA 355

INFORMATION FOR SEQ ID NO:74:
SEQUENCE CHARACTERISTICS:
LENGTH: 20 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: "PCR primer
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:74:
CAGCTATGAC CATGATTACG 20

INFORMATION FOR SEQ ID
SEQUENCE CHARACTERISTICS:
LENGTH: 19 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
DESCRIPTION: "PCR primer U19"
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID
GTTTTCCCAG TCACGACGT 19

INFORMATION FOR SEQ ID NO:76:
SEQUENCE CHARACTERISTICS:
LENGTH: 6 amino acids
TYPE: amino acid
STRANDEDNESS: not relevant
TOPOLOGY: not relevant
(ii) MOLECULE TYPE: peptide (NADPH) binding motif
(iii) HYPOTHETICAL: NO
FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:76:
Gly Xaa Gly Xaa Xaa Gly
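
The NADPH binding motif of SEQ ID NO:76 (Gly Xaa Gly Xaa Xaa Gly) can be located in a reductase sequence with a simple pattern search. The following is a minimal illustrative sketch, not part of the patent disclosure: the function names `to_one_letter` and `find_motif` are hypothetical, and the regular expression `G.G..G` is an assumed one-letter rendering of the motif in which Xaa is read as any residue. The example input is the first 17 residues of SEQ ID NO:68 as given in the listing above.

```python
import re

# Standard IUPAC three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_seq):
    """Convert a space-separated three-letter sequence to one-letter codes."""
    return "".join(THREE_TO_ONE[res] for res in three_letter_seq.split())


def find_motif(protein, pattern="G.G..G"):
    """Return 1-based start positions of all matches, including overlapping ones.

    The default pattern is an assumed regex form of Gly Xaa Gly Xaa Xaa Gly.
    """
    # A lookahead matches zero characters, so overlapping hits are all reported.
    return [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", protein)]


# First 17 residues of SEQ ID NO:68 (Thuja plicata PLR-Tp4 protein).
n_term = "Met Glu Glu Ser Ser Arg Ile Leu Val Val Gly Gly Thr Gly Tyr Ile Gly"
protein = to_one_letter(n_term)
hits = find_motif(protein)
print(protein, hits)
```

For this input the motif is found once, starting at residue 12 (Gly Thr Gly Tyr Ile Gly), near the N-terminus of the reductase.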

Claims (54)

1. An isolated protein from a lignan biosynthetic pathway selected from the group consisting of dirigent protein and pinoresinol/lariciresinol reductases, wherein when the isolated protein is a pinoresinol/lariciresinol reductase the isolated protein has an enzymatic activity of at least 51 nmol h⁻¹ mg⁻¹.
2. An isolated protein of claim 1 having the biological activity of dirigent protein.
3. An isolated protein of claim 2 having the biological activity of dirigent protein from Forsythia.
4. An isolated protein of claim 3 having the biological activity of dirigent protein from Forsythia intermedia.
5. An isolated protein of claim 2 having the biological activity of dirigent protein from Tsuga.
6. An isolated protein of claim 5 having the biological activity of dirigent protein from Tsuga heterophylla.
7. An isolated protein of claim 2 having the biological activity of dirigent protein from Thuja.
8. An isolated protein of claim 7 having the biological activity of dirigent protein from Thuja plicata.
9. An isolated protein of claim 1 having the biological activity of dirigent protein selected from the group consisting of SEQ ID Nos: 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33 and
10. An isolated protein of claim 1 having the biological activity of pinoresinol/lariciresinol reductase.
11. An isolated protein of claim 10 having the biological activity of pinoresinol/lariciresinol reductase from Forsythia.
12. An isolated protein of claim 10 having the biological activity of pinoresinol/lariciresinol reductase from Forsythia intermedia.
13. An isolated protein of Claim 10 having the biological activity of pinoresinol/lariciresinol reductase from Tsuga.
14. An isolated protein of Claim 13 having the biological activity of pinoresinol/lariciresinol reductase from Tsuga heterophylla.
15. An isolated protein of Claim 10 having the biological activity of pinoresinol/lariciresinol reductase from Thuja.
16. An isolated protein of Claim 15 having the biological activity of pinoresinol/lariciresinol reductase from Thuja plicata.
17. An isolated protein of Claim 1 having the biological activity of pinoresinol/lariciresinol reductase selected from the group consisting of SEQ ID Nos:48, 50, 52, 54, 56, 58, 62, 64, 66, 68, 70 and 72.
18. An isolated nucleotide sequence encoding a dirigent protein.
19. An isolated nucleotide sequence encoding a dirigent protein from a Forsythia species.
20. A nucleotide sequence of Claim 19 encoding a dirigent protein from Forsythia intermedia.
21. An isolated nucleotide sequence encoding a protein having the biological activity of SEQ ID No: 13 or SEQ ID No:
22. An isolated nucleotide sequence of Claim 19 which encodes the amino acid sequence of SEQ ID No:13 or SEQ ID
23. An isolated nucleotide sequence of Claim 19 having the sequence of SEQ ID No:12 or SEQ ID No:14.
24. An isolated nucleotide sequence encoding a dirigent protein from a Tsuga species.
25. A nucleotide sequence of Claim 24 encoding a dirigent protein from Tsuga heterophylla.
26. An isolated nucleotide sequence encoding a protein having the biological activity of SEQ ID No: 17 or SEQ ID No:19.
27. An isolated nucleotide sequence of Claim 24 which encodes the amino acid sequence of SEQ ID No:17 or SEQ ID No:19.
28. An isolated nucleotide sequence of Claim 24 having the sequence of SEQ ID No:16 or SEQ ID No:18.
29. An isolated nucleotide sequence encoding a dirigent protein from a Thuja species.
30. A nucleotide sequence of Claim 29 encoding a dirigent protein from Thuja plicata.
31. An isolated nucleotide sequence encoding a protein having the biological activity of any one of SEQ ID Nos:21, 23, 25, 27, 29, 31, 33 or
32. An isolated nucleotide sequence of Claim 29 which encodes the amino acid sequence of any one of SEQ ID Nos:21, 23, 25, 27, 29, 31, 33 or
33. An isolated nucleotide sequence of Claim 29 having the sequence of any one of SEQ ID Nos:20, 22, 24, 26, 28, 30, 32 or 34.
34. An isolated nucleotide sequence encoding a pinoresinol/lariciresinol reductase from a Forsythia species.
35. A nucleotide sequence of Claim 34 encoding a pinoresinol/lariciresinol reductase from Forsythia intermedia.
36. An isolated nucleotide sequence encoding a protein having the biological activity of any one of SEQ ID Nos:48, 50, 52, 54, 56 or 58.
37. An isolated nucleotide sequence of Claim 34 which encodes the amino acid sequence of any one of SEQ ID Nos:48, 50, 52, 54, 56 or 58.
38. An isolated nucleotide sequence of Claim 34 having the sequence of any one of SEQ ID Nos:47, 49, 51, 53, 55 or 57.
39. An isolated nucleotide sequence encoding a pinoresinol/lariciresinol reductase from a Thuja species.
40. A nucleotide sequence of Claim 39 encoding a pinoresinol/lariciresinol reductase from Thuja plicata.
41. An isolated nucleotide sequence encoding a protein having the biological activity of any one of SEQ ID Nos:62, 64, 66 or 68.
42. An isolated nucleotide sequence of Claim 39 which encodes the amino acid sequence of any one of SEQ ID Nos:62, 64, 66 or 68.
43. An isolated nucleotide sequence of Claim 39 having the sequence of any one of SEQ ID Nos:61, 63, 65 or 67.
44. An isolated nucleotide sequence encoding a pinoresinol/lariciresinol reductase from a Tsuga species.
45. A nucleotide sequence of Claim 44 encoding a pinoresinol/lariciresinol reductase from Tsuga heterophylla.
46. An isolated nucleotide sequence encoding a protein having the biological activity of SEQ ID No:70 or SEQ ID No:72.
47. An isolated nucleotide sequence of Claim 44 which encodes the amino acid sequence of SEQ ID No:70 or SEQ ID No:72.
48. An isolated nucleotide sequence of Claim 44 having the sequence of SEQ ID No:69 or SEQ ID No:71.
49. A replicable expression vector comprising a nucleotide sequence encoding a protein having the biological activity of a dirigent protein selected from the group consisting of SEQ ID Nos:13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33 and
50. A replicable expression vector comprising a nucleotide sequence encoding a protein having the biological activity of a pinoresinol/lariciresinol reductase selected from the group consisting of SEQ ID Nos:48, 50, 52, 54, 56, 58, 62, 64, 66, 68, 70 and 72.
51. A host cell comprising a vector of Claim 49.
52. A host cell comprising a vector of Claim
53. A method of enhancing the expression of pinoresinol/lariciresinol reductase in a suitable host cell comprising introducing into the host cell an expression vector that comprises a nucleotide sequence encoding a protein having the biological activity of a protein selected from the group consisting of SEQ ID Nos:48, 52, 54, 56, 58, 62, 64, 66, 68, 70 and 72.
54. A method of modifying the expression of pinoresinol/lariciresinol reductase in a suitable host cell comprising introducing into the host cell an expression vector that comprises a nucleotide sequence that expresses an RNA that is complementary to all or part of a nucleic acid molecule selected from the group consisting of SEQ ID Nos:47, 49, 51, 53, 55, 57, 61, 63, 65, 67, 69 and 71.
55. A method of enhancing the expression of dirigent protein in a suitable host cell comprising introducing into the host cell an expression vector that comprises a nucleotide sequence encoding a protein having the biological activity of a protein selected from the group consisting of SEQ ID Nos:13, 15, 17, 19, 21, 23, 27, 29, 31, 33 and
56. A method of modifying the expression of dirigent protein in a suitable host cell comprising introducing into the host cell an expression vector that comprises a nucleotide sequence that expresses an RNA that is complementary to all or part of a nucleic acid molecule selected from the group consisting of SEQ ID Nos:12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32 and 34.
57. A method of producing optically-pure lignans comprising introducing into a host cell an expression vector that comprises a nucleotide sequence encoding a dirigent protein capable of directing a bimolecular phenoxy coupling reaction to produce an optically pure lignan, and purifying the optically pure lignan from the host cell.
58. The method of Claim 57 wherein the nucleotide sequence is selected from the group consisting of SEQ ID Nos:12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32 and 34.
59. An isolated protein from a lignan biosynthetic pathway, said protein being substantially as hereinbefore described with reference to any one of the examples.
60. An isolated nucleotide sequence encoding a dirigent protein, said sequence being substantially as hereinbefore described with reference to any one of the examples.
61. A replicable expression vector comprising a nucleotide sequence encoding a protein having the biological activity of a dirigent protein, said vector being substantially as hereinbefore described with reference to any one of the examples.
62. A replicable expression vector comprising a nucleotide sequence encoding a protein having the biological activity of a pinoresinol/lariciresinol reductase, said vector being substantially as hereinbefore described with reference to any one of the examples.
63. A method of enhancing the expression of pinoresinol/lariciresinol reductase in a suitable host cell, said method being substantially as hereinbefore described with reference to any one of the examples.
64. A method of modifying the expression of pinoresinol/lariciresinol reductase in a suitable host cell, said method being substantially as hereinbefore described with reference to any one of the examples.
65. A method of enhancing the expression of dirigent protein in a suitable host cell, said method being substantially as hereinbefore described with reference to any one of the examples.
66. A method of modifying the expression of dirigent protein in a suitable host cell, said method being substantially as hereinbefore described with reference to any one of the examples.
67. A method of producing optically-pure lignans, said method being substantially as hereinbefore described with reference to any one of the examples.
68. Pinoresinol/lariciresinol reductase expressed in accordance with the method of any one of claims 54, 63 or 64.
69. A dirigent protein expressed in accordance with the method of any one of claims 55, 56, 65 or 66.
70. An optically-pure lignan produced in accordance with the method of any one of claims 57, 58 or 67.
Dated 31 October, 2000
Washington State University Research Foundation
Patent Attorneys for the Applicant/Nominated Person
SPRUSON & FERGUSON
AU51993/98A 1996-11-08 1997-11-07 Recombinant pinoresinol/lariciresinol reductase, recombinant dirigent protein, and methods of use Ceased AU728116B2 (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US3052296P 1996-11-08 1996-11-08
US60/030522 1996-11-08
US5438097P 1997-07-31 1997-07-31
US60/054380 1997-07-31
PCT/US1997/020391 WO1998020113A1 (en) 1996-11-08 1997-11-07 Recombinant pinoresinol/lariciresinol reductase, recombinant dirigent protein, and methods of use

Publications (2)

Publication Number Publication Date
AU5199398A AU5199398A (en) 1998-05-29
AU728116B2 AU728116B2 (en) 2001-01-04

Family

ID=26706133

Family Applications (1)

Application Number Title Priority Date Filing Date
AU51993/98A Ceased AU728116B2 (en) 1996-11-08 1997-11-07 Recombinant pinoresinol/lariciresinol reductase, recombinant dirigent protein, and methods of use

Country Status (5)

Country Link
EP (1) EP0948602A1 (en)
JP (1) JP2001507931A (en)
AU (1) AU728116B2 (en)
CA (1) CA2270905A1 (en)
WO (1) WO1998020113A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6210942B1 (en) * 1996-11-08 2001-04-03 Washington State University Research Foundation Recombinant pinoresinol/lariciresinol reductase, recombinant dirigent protein, and methods of use
US20020174452A1 (en) * 2000-09-07 2002-11-21 Lewis Norman G. Monocot seeds with increased lignan content
WO2002061039A2 (en) * 2000-10-25 2002-08-08 Washington State University Research Foundation Thuja plicata dirigent protein promotors
WO2005030944A1 (en) 2003-09-30 2005-04-07 Suntory Limited A gene encoding an enzyme for catalyzing biosynthesis of lignan, and use thereof
JP4667007B2 (en) 2004-11-02 2011-04-06 サントリーホールディングス株式会社 Lignan glycosylation enzyme and its use
EP2246418A4 (en) 2007-12-28 2013-07-31 Suntory Holdings Ltd Lignan hydroxylase
CN112322621B (en) * 2020-11-10 2022-07-22 贵州大学 Eucommia DIR1 gene MeJA response promoter and application thereof
CN113603757B (en) * 2021-08-20 2023-05-26 昆明理工大学 Lily regale Dirigent similar protein gene LrDIR1 and application thereof

Also Published As

Publication number Publication date
EP0948602A1 (en) 1999-10-13
CA2270905A1 (en) 1998-05-14
WO1998020113A1 (en) 1998-05-14
JP2001507931A (en) 2001-06-19
AU5199398A (en) 1998-05-29

Similar Documents

Publication Publication Date Title
US6635459B1 (en) Nucleotide sequences encoding pinoresinol/lariciresinol reductase proteins and their methods of use
EP1023436A1 GERANYL DIPHOSPHATE SYNTHASE FROM MINT (MENTHA PIPERITA)
EP1151079B1 (en) 1-deoxy-d-xylulose-5-phosphate reductoisomerases, and methods of use
AU728116B2 (en) Recombinant pinoresinol/lariciresinol reductase, recombinant dirigent protein, and methods of use
CA2353306A1 (en) Nucleic acid sequences encoding isoflavone synthase
US6703229B2 (en) Aryl propenal double bond reductase
EP1100881A1 (en) Recombinant dehydrodiconiferyl alcohol benzylic ether reductase, and methods of use
JP2002512790A (en) Recombinant secoisolariciresinol dehydrogenase and methods of use
US8426684B2 (en) Isolated menthone reductase and nucleic acid molecules encoding same
EP1268788A2 (en) Aryl propenal double bond reductase
MXPA00010446A (en) Recombinant secoisolariciresinol dehydrogenase, and methods of use

Legal Events

Date Code Title Description
MK14 Patent ceased section 143(a) (annual fees not paid) or expired