WO2022150380A1 - Alkaloid producing oxidases and carboxy-lyases and methods of use - Google Patents

Alkaloid producing oxidases and carboxy-lyases and methods of use

Info

Publication number
WO2022150380A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
sequences
tyrosine
modified cell
decarboxylase
Prior art date
Application number
PCT/US2022/011302
Other languages
French (fr)
Inventor
Christopher John VAVRICKA, Jr.
Tomohisa Hasunuma
Michihiro Araki
Akihiko Kondo
Original Assignee
National University Corporation Kobe University
JUDD, Paul K.
Priority date
Filing date
Publication date
Application filed by National University Corporation Kobe University, JUDD, Paul K.
Publication of WO2022150380A1

Classifications

    • C12N15/70 Vectors or expression systems specially adapted for E. coli
    • C12N15/1089 Design, preparation, screening or analysis of libraries using computer algorithms
    • C12N15/52 Genes encoding for enzymes or proenzymes
    • C12N9/0042 NADPH-cytochrome P450 reductase (1.6.2.4)
    • C12N9/0071 Oxidoreductases (1.) acting on paired donors with incorporation of molecular oxygen (1.14)
    • C12N9/1007 Methyltransferases (general) (2.1.1.)
    • C12N9/1096 Transferases (2.) transferring nitrogenous groups (2.6)
    • C12N9/88 Lyases (4.)
    • C12P17/12 Nitrogen as only ring hetero atom containing a six-membered hetero ring
    • C12Y106/02004 NADPH-hemoprotein reductase (1.6.2.4), i.e. NADP-cytochrome P450-reductase
    • C12Y114/14 Oxidoreductases acting on paired donors, with incorporation or reduction of molecular oxygen (1.14) with reduced flavin or flavoprotein as one donor, and incorporation of one atom of oxygen (1.14.14)
    • C12Y114/14009 4-Hydroxyphenylacetate 3-monooxygenase (1.14.14.9)
    • C12Y201/01116 3'-Hydroxy-N-methyl-(S)-coclaurine 4'-O-methyltransferase (2.1.1.116)
    • C12Y201/01128 (RS)-Norcoclaurine 6-O-methyltransferase (2.1.1.128)
    • C12Y201/0114 (S)-Coclaurine-N-methyltransferase (2.1.1.140)
    • C12Y206/01005 Tyrosine transaminase (2.6.1.5)
    • C12Y401/01 Carboxy-lyases (4.1.1)
    • C12Y401/01001 Pyruvate decarboxylase (4.1.1.1)
    • C12Y401/01025 Tyrosine decarboxylase (4.1.1.25)
    • C12Y401/01028 Aromatic-L-amino-acid decarboxylase (4.1.1.28), i.e. tryptophane-decarboxylase
    • C12Y401/01043 Phenylpyruvate decarboxylase (4.1.1.43)
    • C12Y401/01074 Indolepyruvate decarboxylase (4.1.1.74)
    • C12Y401/02 Aldehyde-lyases (4.1.2)
    • C12Y402/01078 (S)-norcoclaurine synthase (4.2.1.78)
    • C07K2319/21 Fusion polypeptide containing a tag with affinity for a non-protein ligand containing a His-tag
    • C12R2001/01 Bacteria or Actinomycetales ; using bacteria or Actinomycetales
    • C12R2001/19 Escherichia coli
    • C12R2001/40 Pseudomonas putida
    • C12R2001/46 Streptococcus ; Enterococcus; Lactococcus

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Biomedical Technology (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Medicinal Chemistry (AREA)
  • Plant Pathology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

The resulting discovery of new types of plant aromatic acetaldehyde synthase (AAS) and phenylpyruvate decarboxylase (PPDC), with novel active site features, demonstrates that alternative branches of the natural Papaver somniferum alkaloid pathway can operate simultaneously. Synergistic applications of AAS and PPDC lead to enhanced BIA production in Escherichia coli through hybrid norcoclaurine and norlaudanosoline pathways. Transplantation of features into homologous microbial enzyme templates leads to the highest bacterial titers of norcoclaurine and N-methylcoclaurine (NMC). Tradeoffs between high specificity and high activity are observed when comparing the natural plant enzymes with alternative microbial templates, respectively. Mechanism-directed isotope tracing patterns confirm the alternative pathway variations mediated by the identified AAS and PPDC and provide insight into optimal combinations of these enzymes for alkaloid production. This machine learning-driven enzyme discovery workflow can be applied to the discovery of other types of enzymes, including cytochrome P450 (CYP450), and to the metabolic engineering of bioproduction pathways.

Description

ALKALOID PRODUCING OXIDASES AND CARBOXY-LYASES AND METHODS OF USE
RELATED APPLICATIONS
This application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Patent Application No. 63/133,984 filed January 5, 2021, which is incorporated herein by reference in its entirety.
BACKGROUND OF THE INVENTION
Strain engineering is now a reliable approach to scale up production of target metabolites by integrating known genes and applying simple yet effective metabolic engineering strategies. But engineering the microbial production of secondary metabolites reaches the limitation of the characterized enzymes present in sequence databases, where many annotations are incorrect. In reality, there are millions of enzyme variants to choose from for each desired reaction, and a great abundance of variations are still hidden in nature with unknown sequence and function. In this way the evolution of nature over millions of years can be viewed as a highly diverse screening resource for synthetic biologists. Accordingly, the rational discovery of homologous enzyme sequences with useful functions is a powerful and inevitable approach to improve microbial bioproduction pathways.
Microbial production of benzylisoquinoline alkaloids (BIAs) is a model application for synthetic biology, with recent improvements in the genetic engineering of yeast and bacteria. The bacterial studies have relied on a rapid norlaudanosoline (also referred to as tetrahydropapaveroline or THP) containing pathway, whereas plants are reported to center around a more stable norcoclaurine intermediate, which has been well-exploited in yeast. Yet, in all of these current approaches, the committed substrates are produced by a few key microbial enzymes that have not been revealed in plants, and there remain many unexplored routes and undiscovered enzyme variants with great potential to improve related bioproduction pathways. Therefore, it is critical to discover new enzymes that can stabilize, for example, metabolic flux from tyrosine to downstream alkaloids, and to develop enzyme selection systems that can be applied to all metabolic engineering applications. The present disclosure satisfies these needs.
SUMMARY
Engineering the microbial production of secondary metabolites is limited by the known reactions of correctly annotated enzymes in sequence databases. To expand the range of biosynthesis pathways, machine learning is herein demonstrated for the discovery of missing link enzymes, using benzylisoquinoline alkaloid (BIA) production as a model application with potential to revolutionize the paradigm of sustainable biomanufacturing. A synthetic norlaudanosoline pathway has been utilized in bacteria, whereas plants are reported to contain a more stable norcoclaurine pathway, which is exploited in yeast. However, committed aromatic precursors are currently produced by microbial enzymes that remain elusive in plants, and additional downstream missing links are hidden within highly duplicated plant gene families. Accordingly, a machine learning enzyme selection algorithm may be used in the process of engineering a host cell to facilitate production of benzylisoquinoline compounds by predicting missing link enzymes and useful homologous enzymes from diverse candidate sequences. As described herein, metabolomics-based characterization of selected sequences reveals distinct oxidases and carboxy-lyases in reconstructed benzylisoquinoline alkaloid pathways from tyrosine. Synergistic application of aryl acetaldehyde producing enzymes results in enhanced benzylisoquinoline alkaloid production through hybrid norcoclaurine and norlaudanosoline pathways.
Accordingly, the disclosure provides various embodiments of methods for producing a benzylisoquinoline compound in a host cell by modifying a host cell to form a biosynthetic pathway comprising the steps of using machine learning to identify heterologous enzymes or uncharacterized heterologous enzymes that function in a biosynthetic pathway and inserting the identified enzymes or uncharacterized enzyme into the host cell to function in the biosynthetic pathway, or a heterologous enzymatic template is engineered such that an active site of an enzymatic sequence is changed to include one or more amino acid substitutions corresponding to an active site of the identified uncharacterized heterologous enzyme.
The disclosure also provides for a modified cell comprising one or more polynucleotides encoding one or more heterologous enzymes for the production of a benzylisoquinoline compound, wherein the polynucleotides encode one or more heterologous enzymes operably linked to a polynucleotide sequence controlling expression of the one or more heterologous enzymes, wherein expression of the one or more heterologous enzymes causes the modified cell to produce the benzylisoquinoline compound when provided with a substrate, and wherein the modified cell is not a plant cell.
In some embodiments, the modified cell is configured such that the production of the benzylisoquinoline compound comprises the formation of an intermediate compound selected from 4-hydroxyphenylacetaldehyde (4HPAA), norcoclaurine, or both.
In some embodiments, the modified cell includes one or more heterologous enzymes comprising one or more enzymes having thiamine pyrophosphate (TPP)-dependent phenylpyruvate decarboxylase activity and/or pyridoxal 5'-phosphate (PLP)-dependent aromatic acetaldehyde synthase activity.
In some embodiments, the heterologous enzymes having phenylpyruvate decarboxylase activity convert 4-hydroxyphenylpyruvate (4HPP) to 4-hydroxyphenylacetaldehyde (4HPAA), or convert 3,4-dihydroxyphenylpyruvate to 3,4-dihydroxyphenylacetaldehyde (DHPAA), or both.
In other embodiments, the heterologous enzymes having aromatic acetaldehyde synthase activity convert tyrosine to 4-hydroxyphenylacetaldehyde (4HPAA), or convert L-3,4-dihydroxyphenylalanine (L-DOPA) to 3,4-dihydroxyphenylacetaldehyde (DHPAA), or both.
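As an illustration only, the four conversions named in the two preceding paragraphs can be written as a small lookup table, which makes it easy to see that each aryl acetaldehyde can be reached through two different enzyme activities. The names `REACTIONS` and `routes_to` are hypothetical and are not part of the disclosure.

```python
# Each entry: (enzyme activity, substrate, product), as stated in the text above.
REACTIONS = [
    ("PPDC", "4HPP", "4HPAA"),
    ("PPDC", "3,4-dihydroxyphenylpyruvate", "DHPAA"),
    ("AAS", "tyrosine", "4HPAA"),
    ("AAS", "L-DOPA", "DHPAA"),
]


def routes_to(product):
    """Return sorted (activity, substrate) pairs that yield the given product."""
    return sorted(
        (activity, substrate)
        for activity, substrate, prod in REACTIONS
        if prod == product
    )


if __name__ == "__main__":
    for aldehyde in ("4HPAA", "DHPAA"):
        print(aldehyde, routes_to(aldehyde))
```

Listing both routes side by side mirrors the idea of applying AAS and PPDC activities together to supply the same intermediate.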
In some embodiments, the one or more heterologous enzymes having phenylpyruvate decarboxylase activity comprise one or more of Saccharomyces cerevisiae transaminated amino acid decarboxylase (ARO10), pyruvate decarboxylase (PsPDC1) from P. somniferum, and pyruvate decarboxylase (PsPDC1) isoform X1 from P. somniferum.
In some embodiments, the heterologous enzymes having aromatic acetaldehyde synthase (AAS) activity comprise one or more of tyrosine decarboxylase of P. somniferum (PsTyDC1); a PsTyDC1 variant having one or more substitutions selected from Leu205His or Leu205Asn, Tyr98Phe, Phe99Tyr; a P. somniferum tyrosine decarboxylase 3 (PsTyDC3); a PsTyDC3 variant having one or more substitutions selected from Ile370Ser, Tyr100Phe, Phe101Tyr, and His203Asn; a P. somniferum tyrosine decarboxylase 6 (PsTyDC6); a Pseudomonas putida L-DOPA decarboxylase (PpDDC1); and a PpDDC1 variant having one or more substitutions selected from Tyr79Phe, Phe80Tyr, His181Asn and His181Leu.
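Substitution labels such as Leu205His name the wild-type residue, its 1-based position, and the replacement residue. The following is a minimal sketch of how such labels could be parsed and applied to a protein sequence. The function names and the five-residue toy sequence are assumptions for illustration; no real enzyme sequence from the disclosure is used.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

SUB_PATTERN = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_substitution(label):
    """Split a label like 'Leu205His' into ('L', 205, 'H')."""
    match = SUB_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label}")
    wild_type, position, replacement = match.groups()
    if wild_type not in THREE_TO_ONE or replacement not in THREE_TO_ONE:
        raise ValueError(f"unknown residue code in: {label}")
    return THREE_TO_ONE[wild_type], int(position), THREE_TO_ONE[replacement]


def apply_substitutions(sequence, labels):
    """Apply substitutions to a one-letter sequence, checking each wild-type residue."""
    residues = list(sequence)
    for label in labels:
        wild_type, position, replacement = parse_substitution(label)
        if position < 1 or position > len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: sequence has {residues[position - 1]} at {position}")
        residues[position - 1] = replacement
    return "".join(residues)


if __name__ == "__main__":
    toy = "MYFAL"  # hypothetical sequence, not a real enzyme
    print(apply_substitutions(toy, ["Tyr2Phe", "Phe3Tyr", "Leu5Asn"]))
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which are easy to introduce when a construct carries a tag or a truncation.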
In other embodiments, the disclosure provides an expression vector comprising one or more promoter sequences operably linked to at least one polynucleotide sequence encoding an enzyme as described herein.
The present disclosure illustrates that P. somniferum PDC1 has phenylpyruvate decarboxylase activity, even though the active site appears as a typical pyruvate decarboxylase, and that machine learning can assist in predicting the specialized phenylpyruvate decarboxylase function. The present disclosure also illustrates that P. somniferum TyDC6 has aromatic acetaldehyde synthase (AAS) activity but has an active site indicative of a typical aromatic amino acid decarboxylase, and that machine learning can assist in predicting the specialized AAS function.
Advantageously, these enzymes are adapted to create a pathway for synthesizing benzylisoquinoline compounds using a dual pathway containing both norlaudanosoline (tetrahydropapaveroline) and norcoclaurine intermediates to achieve high titers of downstream BIAs.
Advantageously, the application of multiple aryl acetaldehyde producing enzymes described by the present disclosure enables increased production of aryl acetaldehydes 4HPAA and/or DHPAA, in order to overcome potential loss of these unstable intermediates, for increased production of downstream BIAs.
Advantageously, the present disclosure provides a machine learning method for designing amino acid substitutions to engineer enzymes for improved production of intermediates in benzylisoquinoline alkaloid production, as demonstrated by the machine learning prediction and subsequent application of engineered PpDDC and N-methylcoclaurine hydroxylase (NMCH) variants.
Advantageously, the present disclosure provides for the selective production of norcoclaurine and/or N-methylcoclaurine over norlaudanosoline and/or reticuline.
Advantageously, the present disclosure provides for the rapid increase in microbial production amounts of benzylisoquinoline alkaloids by applying newly discovered and engineered enzymes to mediate key steps in the benzylisoquinoline alkaloid pathway, with levels of reticuline about 50-fold higher than prior methods.
These and other features and advantages of this invention will be more fully understood from the following detailed description of the invention taken together with the accompanying claims. It is noted that the scope of the claims is defined by the recitations therein and not by the specific discussion of features and advantages set forth in the present description.
BRIEF DESCRIPTION OF THE DRAWINGS
The following drawings form part of the specification and are included to further demonstrate certain embodiments or various aspects of the invention. In some instances, embodiments of the invention can be best understood by referring to the accompanying drawings in combination with the detailed description presented herein. The description and accompanying drawings may highlight a certain specific example, or a certain aspect of the invention. However, one skilled in the art will understand that portions of the example or aspect may be used in combination with other examples or aspects of the invention.
FIG. 1. Uncovering missing links in Papaver somniferum as alternative branches to benzylisoquinoline alkaloids. a, An embodiment of a support vector machine (SVM) enzyme selection and discovery workflow for use with the disclosure. b, Missing link candidate sequences are predicted and ranked based on high-dimensional SVM models. Structure-based rules are first determined to curate training sequences as those of the target function and to differentiate related sequences with a different function. Structural and chemical features are then extracted from classified training and test sequences, resulting in enzyme feature vectors. SVM models are built with the training vectors, and vectors from missing link candidate sequences are then tested against the models to predict their novel function. c, Four quadrants emerge when classifying enzyme prediction based on EC group identification versus sequence selection, and demonstration level versus discovery level. Previous studies used the iterative random algorithm-based prediction software M-path to identify DHPAAS as a shortcut enzyme in a demonstration of pathway improvement. In a parallel study, a novel CYP450 for pathway improvement was discovered by M-path on its own. But M-path can only output the EC number and does not select specific sequences. Therefore, the current study demonstrates that machine learning can assist the selection of specific PPDC, CYP450 and CPR candidate sequences. Most importantly, the SVM machine learning algorithm described herein is able to direct the discovery of novel aromatic acetaldehyde synthase (AAS) and phenylpyruvate decarboxylase (PPDC) enzymes that cannot be distinguished by structure alone.
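The scoring and ranking step of panel b can be sketched as follows. This is a simplified illustration under stated assumptions, not the disclosed model: the three composition features, the weight vector `W`, the offset `B`, and the candidate sequences are all hypothetical, and the weights are taken as given rather than trained. The score is the signed geometric distance from a linear decision boundary, which is one common reading of an SVM decision score.

```python
import math

# Hypothetical residue classes used as simple chemical features.
AROMATIC = set("FWY")
CHARGED = set("DEKR")
SMALL = set("AGS")


def feature_vector(sequence):
    """Return the fractions of aromatic, charged and small residues."""
    n = len(sequence)
    return [
        sum(residue in AROMATIC for residue in sequence) / n,
        sum(residue in CHARGED for residue in sequence) / n,
        sum(residue in SMALL for residue in sequence) / n,
    ]


def decision_score(x, w, b):
    """Signed distance of feature vector x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def rank_candidates(candidates, w, b):
    """Score every candidate and sort from most positive to most negative."""
    scored = [
        (name, decision_score(feature_vector(seq), w, b))
        for name, seq in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical "already trained" linear model and toy candidates.
W = [3.0, -4.0, 0.0]
B = 0.0
CANDIDATES = {"cand_A": "FWYFAGSA", "cand_B": "DEKRDEAG"}

if __name__ == "__main__":
    for name, score in rank_candidates(CANDIDATES, W, B):
        label = "positive" if score > 0 else "negative"
        print(f"{name}: {score:+.2f} ({label})")
```

In practice the feature vectors would have many more dimensions and the weights would come from fitting on the curated training sequences; the ranking logic stays the same.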
FIG. 2. Prediction of AAS branch pathway enzymes to produce 4HPAA for norcoclaurine production. Structure-based curation of typical aromatic amino acid decarboxylase (AAAD), insect-type AAS and plant-type AAS is represented by the active site configurations of Pseudomonas putida DDC (PpDDC) having active site residues His181 and Tyr324, Bombyx mori DHPAAS (DHPAAS) having active site residues Asn192 and Tyr332, and Petroselinum crispum 4HPAAS (Pc4HPAAS) having active site residues His201 and Phe346. AAS candidate PsTyDC1 has a unique active site having residues Leu205 and Tyr350, while AAS candidate PsTyDC6 has an AAAD-like active site having His205 and Tyr350, and could not be predicted by a homology or structure-based approach alone. a, Cross validation for correct assignment of AAAD and AAS training sequences is performed using SVM models, Random Forests models and by comparing sequence homology of each training sequence to a consensus sequence of AAS training sequences and a consensus sequence of AAAD training sequences (Homology), as described in the methods section. b, For visual representation, a two-dimensional plot of AAS SVM-based prediction is shown, with positive and negative prediction spaces. Principal component analysis (PCA) is used to compress multi-dimensional data into two dimensions (PC1 and PC2) for a visual representation. Corresponding high-dimensional SVM decision scores from Table 1 are shown on the right. Decision scores represent the distance from the SVM prediction boundary. c, LC-MS detection of products from Thalictrum flavum norcoclaurine synthase (TfNCS) containing strains T1-01-DE3 (wild-type PsTyDC1 + TfNCS), T1-02-DE3 (PsTyDC1-Leu205His + TfNCS) and T1-03-DE3 (PsTyDC1-Tyr98Phe-Phe99Tyr-Leu205Asn + TfNCS) (Table 2), grown in LB supplemented with 1 mM tyrosine and 0.5 mM dopamine, at 28 °C with 180 rpm shaking for 51 hours. Single letter amino acid abbreviations are used in this panel.
Selective in vivo production of the downstream AAS product norcoclaurine accompanies the expression of wild-type PsTyDC1, as well as the triple variant of PsTyDC1 with an engineered active site based on that of insect DHPAAS. Tyramine is the major product of PsTyDC1-Leu205His, which contains an engineered active site based on AAAD. Similar results are replicated in FIG. 9 and FIG. 10.
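The homology baseline of panel a (assigning each training sequence to whichever class consensus it matches best) can be sketched with leave-one-out cross validation. The sketch assumes pre-aligned, equal-length sequences and uses plain percent identity; the toy alignments below are hypothetical, and the method actually used is the one described in the methods section.

```python
from collections import Counter


def consensus(aligned):
    """Most common residue in each column of equal-length aligned sequences."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def identity(a, b):
    """Fraction of aligned positions at which two sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def classify(sequence, aas_train, aaad_train):
    """Assign the class whose consensus is more similar to the sequence."""
    aas_score = identity(sequence, consensus(aas_train))
    aaad_score = identity(sequence, consensus(aaad_train))
    if aas_score == aaad_score:
        return "undetermined"
    return "AAS" if aas_score > aaad_score else "AAAD"


def leave_one_out_accuracy(aas_train, aaad_train):
    """Hold out each training sequence in turn and check its assignment."""
    correct = 0
    for i, seq in enumerate(aas_train):
        rest = aas_train[:i] + aas_train[i + 1:]
        correct += classify(seq, rest, aaad_train) == "AAS"
    for i, seq in enumerate(aaad_train):
        rest = aaad_train[:i] + aaad_train[i + 1:]
        correct += classify(seq, aas_train, rest) == "AAAD"
    return correct / (len(aas_train) + len(aaad_train))


# Hypothetical aligned toy sequences, not real enzyme data.
AAS_TRAIN = ["MNLFA", "MNLFA", "MNIFA"]
AAAD_TRAIN = ["MHLYA", "MHLYA", "MHVYA"]

if __name__ == "__main__":
    print(classify("MNLFG", AAS_TRAIN, AAAD_TRAIN))
    print(leave_one_out_accuracy(AAS_TRAIN, AAAD_TRAIN))
```

A baseline of this kind weighs every aligned position equally, which is one reason a candidate with an AAAD-like active site, such as PsTyDC6, may not be separated by homology alone.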
FIG. 3. PsTyDC6 exhibits in vitro AAS activity. a, Derivatized 4HPAA is detected with GC-MS. Lyophilized enzyme reactions were derivatized with methoxyamine in pyridine and MSTFA before analysis with GC-MS. b, H2O2 production accompanies AAS activity. PsTyDC6 produced H2O2 in the presence of tyrosine as indicated by a peroxidase-based fluorescent assay. Error bars represent standard deviations from three independent tests for each condition. c, DHPAA, as well as downstream products norlaudanosoline and norcoclaurine, are detected with LC-MS.
FIG. 4. PsPDC1 promotes an alternative branch to 4HPAA for norcoclaurine production.
Structural comparison of classified phenylpyruvate decarboxylase (PPDC) enzymes ARO10 having a key active site residue Ile335, Azospirillum brasilense PPDC (AbPPDC, PDB ID: 2Q50) having active site residue Tyr283 in comparison to typical Zymomonas mobilis pyruvate decarboxylase (PDC) (ZmPDC, PDB ID: 2WVA) having active site residue Tyr290 and candidate PPDC sequence PsPDC1 having active site residue Tyr332. The modeled PsPDC1 active site contains Tyr332, which is also present in typical PDC enzymes which decarboxylate pyruvate. In this respect, the PsPDC1 active site is distinct from microbial PPDCs, which all contain smaller residues in place of Tyr332 (Lactococcus lactis KdcA contains Ser286 corresponding to Tyr332). Yet, the presence of Tyr332 in PsPDC1 does not interfere with docking of tyrosine into the PsPDC1 active site. a, Cross validation for correct assignment of PPDC model training sequences is performed using SVM models, Random Forests models and by comparing sequence homology of each training sequence to a consensus sequence of PPDC training sequences and a consensus sequence of PDC training sequences (Homology), as described in the methods section. b, SVM-based prediction of putative PPDC sequences visualized in three dimensions by compressing high-dimensional data (Table 3) into two dimensions (PC1 and PC2) and plotting them together with two-dimensional decision scores. Prediction score trends for truncated PsPDC1 isoform X1 (TrcPsPDC1-IX1), PsPDC1, PsPDC2 and Ps2HCLL (2-hydroxyacyl-CoA ligase-like) are similar in high-dimensional models (Table 3). c, PsPDC1 mediates in vivo production of 4-hydroxyphenylethanol (tyrosol) through a 4HPAA intermediate (Scheme 2), in M9 medium supplemented with 1.2 mM 4HPP at 25 °C with 180 rpm shaking.
Strain P1-01-AI, which contains PsPDC1, mediates higher tyrosol production than strains P2-01-AI and P3-01-AI, which contain PsPDC2 and Ps2HCLL, respectively. PsPDC1 mediates downstream production of norcoclaurine (NC) from LB supplemented with 5 mM tyrosine and 3.77 mM dopamine in strain P1-02-AI, at 20-25 °C with 180 rpm shaking. Here, tyrosol is detected after 71 hours, and norcoclaurine is detected after 41 hours, from filtered and dried culture medium as trimethylsilyl (TMS) derivatives using GC-MS. Detection of PsPDC1 products is replicated in FIG. 11.
FIG. 5. Demonstration-level prediction and tuning of P. somniferum NMCH and CPR for improved reticuline production from norcoclaurine. EcNMCH contains binding pocket residue Tyr202 and PsNMCH-I1 contains binding pocket residue His203. Single letter amino acid abbreviations are used in this panel. a, For visual representation, two-dimensional SVM-based prediction of NMCH sequences is shown, with positive and negative prediction spaces. With the exception of EcNMCH and EcNMCH-Y202H, all points represent P. somniferum sequences. PCA is used to compress multi-dimensional data into two dimensions (PC1 and PC2) for the visual representation. Corresponding high-dimensional SVM results are detailed in Table 4, and high-dimensional SVM decision scores are listed. b, Two-dimensional SVM-based prediction of CPR sequences, with positive and negative prediction spaces. With the exception of positive training sequence AtATR2, all points represent tested P. somniferum sequences. PCA is used to compress multi-dimensional data into two dimensions (PC1 and PC2) for the visual representation.
Corresponding high-dimensional SVM results are detailed in Table 5, and high-dimensional SVM decision scores are listed. c, Conversion of 1.2 mM norcoclaurine to reticuline, mediated by various combinations of NMCH and CPR, together with Ps6OMT, PsCNMT and Ps4'OMT, in strains N1-01-DE3, N1-02-DE3, N1-03-DE3, N1-04-DE3, N2-01-DE3, N2-02-DE3, N2-03-DE3, N2-04-DE3 (Table 2) (Scheme 3). Here, individual samples were analyzed 3 times to generate bar graphs in Prism 7 with error bars representing standard deviation (n = 3).
FIG. 6. PsPDC1 and PsTyDC1 promote the norlaudanosoline pathway from L-DOPA to reticuline. a, Pathway expansion of the P. somniferum 4HPAA pathway to a dual norcoclaurine (NC) and norlaudanosoline (NL) pathway. b, Strains T1-10-DE3, P1-02-AI and P1-04-AI contain PpDDC, PsONCS3, Cj6OMT, CjCNMT and Cj4'OMT in addition to PsTyDC1, PsPDC1, and PsTyDC1 + PsPDC1, respectively. Cultures were grown to high density in TB before addition of inducing agent, L-DOPA and ascorbate according to the methods section. At 61 hours after addition of L-DOPA substrate, PsPDC1-mediated reticuline titers decline slightly, likely due to oxidative degradation. Replicate samples of filtered culture medium were analyzed with CE-MS. Here, 3 samples from individual cultures were analyzed to generate bar graphs in Prism 7 with error bars representing standard deviation.
FIG. 7. Replicating norlaudanosoline pathways using homologous enzyme templates. a, PsTyDC1 is exchanged with an engineered PpDDC template having modified active site residues Asn181, Phe79 and Tyr80 via three active site gain-of-function substitutions, Tyr79Phe, Phe80Tyr and His181Asn, to promote DHPAAS activity. In accordance with the resulting increase in AAS activity, these three substitutions result in increased SVM probability scores for AAS prediction, and reduced SVM probability scores for AAAD prediction. Norlaudanosoline (NL) production from L-DOPA was compared using PsTyDC1 in strain T1-10-DE3 (t = 44 hours) and PpDDC-Tyr79Phe-Phe80Tyr-His181Asn (DDC-T) in strain DT-02-DE3 (t = 40.5 hours). Culture conditions for each strain are described in the methods section. Single letter amino acid abbreviation H181L is used in this panel to refer to His181Leu. b, PsPDC1, having active site residue Tyr332, is exchanged with S. cerevisiae ARO10, having the corresponding active site residue Ile335, for higher PPDC activity in E. coli. Production of norlaudanosoline from L-DOPA by PsPDC1 in strain P1-02-AI is shown (t = 44 hours). Production of norlaudanosoline from L-DOPA and dopamine by ARO10 in strain A1-01-DE3 is compared (t = 44 hours). Culture conditions are described in the methods section. For panels a and b, samples from individual cultures were analyzed to generate bar graphs in Prism 7, with error bars representing standard deviation. c, Strain A1-01-DE3 containing ARO10 metabolizes tryptophan in TB medium to produce an indole-3-acetaldehyde-derived indole alkaloid as a non-targeted byproduct during BIA production (t = 61 hours). Strain P1-02-AI containing PsPDC1 did not readily convert indole-3-pyruvate to indole-3-acetaldehyde, as indicated by no detectable indole alkaloid byproduct (t = 61 hours).
FIG. 8. Optimization of norcoclaurine, reticuline, and N-methylcoclaurine production for analysis of flux through hybrid pathways. a, Strain P1-07-AI, which contains PpDDC-Tyr79Phe-Phe80Tyr-His181Asn (PpDDC-T) and PsPDC1, prefers the norlaudanosoline-containing pathway. Combination of PpDDC-T, ARO10 and PsTyDC1 in strain A1-06-AI promotes both norcoclaurine- and norlaudanosoline-containing pathways. ARO10-expressing strain A1-01-DE3 converts tyrosine and dopamine to norcoclaurine and N-methylcoclaurine (NMC). N-Methylcoclaurine and reticuline were extracted with EtOAc from cultures 40 hours after addition of tyrosine together with L-DOPA or dopamine. Tested strains P1-06-DE3, P1-07-AI, A1-06-AI and A1-01-DE3 each contain Cj6OMT, CjCNMT, Cj4'OMT and NCS, plus the indicated genes of the bottom 4 rows. P1-06-DE3 and P1-07-AI contain the same genes, but P1-06-DE3 was induced with only IPTG, without including arabinose for PsPDC1 expression. Cultures containing PpDDC-T and L-DOPA were supplemented with additional sodium ascorbate. The BL21(AI)-derived strain P1-07-AI was induced with IPTG and arabinose. For improved N-methylcoclaurine production, A1-01-DE3 was supplemented with the aldehyde reductase/dehydrogenase inhibitor gossypol. Additional culture conditions are described in the methods section. Extracted N-methylcoclaurine and reticuline were TMS-derivatized and analyzed with GC-MS (t = 40 hours). After extraction, cultures were stored at 4 °C and stable norcoclaurine titers from culture medium were analyzed with LC-MS. b, Isotope profiling of strains P1-02-AI (expressing PsPDC1) and P1-04-AI (expressing PsPDC1 and PsTyDC1), which produce N-methylcoclaurine-d6 (P1-02-AI: 62 nM; P1-04-AI: 160 nM) and reticuline-d5 from tyrosine-d4 and L-DOPA-d3 (t = 61 hours). There is a synergistic improvement in BIA production when combining PsPDC1 and PsTyDC1. Here, NCS catalyzes the loss of a deuterium from in vivo-generated dopamine-d3.
c, For tracing aromatic isotope flux from tyrosine to norcoclaurine and N-methylcoclaurine, alkaloids were extracted with EtOAc from the A1-01-DE3 culture 40 hours after addition of tyrosine-d3 and dopamine, according to the methods section. Extracted alkaloids were TMS-derivatized and analyzed with GC-MS (n = 3). After extraction, cultures were stored at 4 °C and stable BIA titers from culture medium were analyzed with CE-MS; the fraction of labeled BIA-d4 and unlabeled BIA from natural tyrosine in the rich TB broth can be quantified. With the exception of unlabeled norcoclaurine, all individual samples were analyzed 3 times to generate bar graphs in Prism 7 with error bars representing standard deviations.
FIG. 9. LC-MS detection of AAS and AAAD products from PsONCS3-containing strains. Strains T1-04-DE3 (wild-type PsTyDC1 and PsONCS3), T1-05-DE3 (PsTyDC1-Leu205His and PsONCS3) and T1-06-DE3 (PsTyDC1-Tyr79Phe-Phe80Tyr-Leu205Asn and PsONCS3) were grown in LB supplemented with 1 mM tyrosine and 0.5 mM dopamine, at 28 °C with 180 rpm shaking for 51 hours. Here, norcoclaurine production is generally lower with codon-optimized PsONCS3 than with codon-optimized TfNCS, shown in FIG. 2c.
FIG. 10. LC-MS analysis of hybrid pathway intermediates resulting from PsTyDC1, PsNMCH, TfNCS and three Coptis japonica BIA methyltransferases (6OMT, CNMT and 4'OMT).
Here, it is assumed that PsNMCH is not functioning without expression of CYP450 reductase (CPR). Strains T1-07-DE3 (wild-type PsTyDC1), T1-09-DE3 (PsTyDC1-Tyr98Phe-Phe99Tyr-Leu205Asn, labeled as "Triple") and T1-08-DE3 (PsTyDC1-Leu205His) were grown in M9. Recombinant protein expression was induced during log phase with 0.8 mM IPTG, and 0.5 hours later 5 mM tyrosine and 2.5 mM dopamine were added. BIA production at 20-25 °C with 180 rpm shaking was monitored over 94 hours.
FIG. 11. Replicated PsPDC1-mediated production of tyrosol and norcoclaurine. In vivo tyrosol is detected from 4HPP conversion by strains P1-01-AI (PsPDC1), P2-01-AI (PsPDC2) and P3-01-AI (Ps2HCLL). In vivo norcoclaurine (NC) is produced from tyrosine (Tyr) and dopamine (DA) by strain P1-02-AI (PsPDC1, PpDDC, PsONCS3, Cj6OMT, CjCNMT and Cj4'OMT).
FIG. 12. Isotope profiling of synthetic BIA pathways. NCS-mediated production of 13C2-labeled (a), d6-labeled (b) and d5-labeled (c, d) BIAs (Scheme 6). a, Metabolism of tyrosine-13C by strain A1-03-DE3 (Table 2) in M9 medium. Production of norcoclaurine (NC)-13C2 (day 7) confirms the PPDC bypass through 4HPP-13C (day 3). b, Strain A1-01-DE3 produced NMC-d6 from tyrosine-d4 and dopamine-d2 in TB (t = 61 hours). c, A1-01-DE3 also produced reticuline-d5 from L-DOPA-d3 and dopamine-d2 in TB (t = 61 hours), through a DHPP branch pathway. d, THP-d5 (norlaudanosoline-d5) and reticuline-d5 were produced by the PpDDC-Tyr79Phe-Phe80Tyr-His181Asn (PpDDC-T) containing strain DT-03-DE3 in M9 supplemented with L-DOPA-d3, where the titers on day 7 are presented. Reticuline-d5 production by strains T1-10-DE3 and P1-02-AI in TB is compared. Here, NCS catalyzes the loss of a deuterium from dopamine-d3. SAM-dependent methyltransferase bottlenecks are indicated by b-d. Additional culture conditions are described in the methods section.
DETAILED DESCRIPTION
Definitions
The following definitions are included to provide a clear and consistent understanding of the specification and claims. As used herein, the recited terms have the following meanings. All other terms and phrases used in this specification have their ordinary meanings as one of skill in the art would understand. Such ordinary meanings may be obtained by reference to technical dictionaries, such as Singleton, et al., Dictionary of Microbiology and Molecular Biology, 2d ed., John Wiley and Sons, New York (1994), and Hale & Markham, The Harper Collins Dictionary of Biology, Harper Perennial, N.Y. (1991). References in the specification to "one embodiment", "an embodiment", etc., indicate that the embodiment described may include a particular aspect, feature, structure, moiety, or characteristic, but not every embodiment necessarily includes that aspect, feature, structure, moiety, or characteristic. Moreover, such phrases may, but do not necessarily, refer to the same embodiment referred to in other portions of the specification. Further, when a particular aspect, feature, structure, moiety, or characteristic is described in connection with an embodiment, it is within the knowledge of one skilled in the art to affect or connect such aspect, feature, structure, moiety, or characteristic with other embodiments, whether or not explicitly described.
The singular forms "a," "an," and "the" include plural reference unless the context clearly dictates otherwise. Thus, for example, a reference to "a compound" includes a plurality of such compounds, so that a compound X includes a plurality of compounds X. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for the use of exclusive terminology, such as "solely," "only," and the like, in connection with any element described herein, and/or the recitation of claim elements or use of "negative" limitations.
The term "and/or" means any one of the items, any combination of the items, or all of the items with which this term is associated. The phrases "one or more" and "at least one" are readily understood by one of skill in the art, particularly when read in context of its usage. For example, the phrase can mean one, two, three, four, five, six, ten, 100, or any upper limit approximately 10, 100, or 1000 times higher than a recited lower limit. For example, one or more substituents on a phenyl ring refers to one to five substituents on the ring.
As will be understood by the skilled artisan, all numbers, including those expressing quantities of ingredients, properties such as molecular weight, reaction conditions, and so forth, are approximations and are understood as being optionally modified in all instances by the term "about." These values can vary depending upon the desired properties sought to be obtained by those skilled in the art utilizing the teachings of the descriptions herein. It is also understood that such values inherently contain variability necessarily resulting from the standard deviations found in their respective testing measurements. When values are expressed as approximations, by use of the antecedent "about," it will be understood that the particular value without the modifier "about" also forms a further aspect.
The terms "about" and "approximately" are used interchangeably. Both terms can refer to a variation of ± 5%, ± 10%, ± 20%, or ± 25% of the value specified. For example, "about 50" percent can in some embodiments carry a variation from 45 to 55 percent, or as otherwise defined by a particular claim. For integer ranges, the term "about" can include one or two integers greater than and/or less than a recited integer at each end of the range. Unless indicated otherwise herein, the terms "about" and "approximately" are intended to include values, e.g., weight percentages, proximate to the recited range that are equivalent in terms of the functionality of the individual ingredient, composition, or embodiment. The terms "about" and "approximately" can also modify the endpoints of a recited range as discussed above in this paragraph.
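The tolerance arithmetic described above can be illustrated with a minimal sketch. The function name `about_range` and its parameters are hypothetical and chosen only for illustration; the sketch assumes the tolerance is applied symmetrically as a percentage of the recited value, which is one reading of the paragraph and not a claim construction.

```python
def about_range(value, tolerance_percent=10.0):
    """Return the (low, high) interval covered by 'about value'.

    Assumption: the tolerance is a symmetric percentage of the value.
    """
    delta = abs(value) * tolerance_percent / 100.0
    return (value - delta, value + delta)


# The four tolerances named in the text: 5%, 10%, 20% and 25%.
for pct in (5, 10, 20, 25):
    low, high = about_range(50, pct)
    print(f"about 50 at +/-{pct}%: {low} to {high}")
```

With a 10% tolerance, "about 50" spans 45 to 55, matching the example given in the paragraph.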
As will be understood by one skilled in the art, for any and all purposes, particularly in terms of providing a written description, all ranges recited herein also encompass any and all possible sub-ranges and combinations of sub-ranges thereof, as well as the individual values making up the range, particularly integer values. It is therefore understood that each unit between two particular units are also disclosed. For example, if 10 to 15 is disclosed, then 11, 12, 13, and 14 are also disclosed, individually, and as part of a range. A recited range (e.g., weight percentages or carbon groups) includes each specific value, integer, decimal, or identity within the range. Any listed range can be easily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, or tenths. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, middle third and upper third, etc. As will also be understood by one skilled in the art, all language such as "up to", "at least", "greater than", "less than", "more than", "or more", and the like, include the number recited and such terms refer to ranges that can be subsequently broken down into sub-ranges as discussed above. In the same manner, all ratios recited herein also include all sub-ratios falling within the broader ratio. Accordingly, specific values recited for radicals, substituents, and ranges, are for illustration only; they do not exclude other defined values or other values within defined ranges for radicals and substituents. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint.
This disclosure provides ranges, limits, and deviations to variables such as volume, mass, percentages, ratios, etc. It is understood by an ordinary person skilled in the art that a range, such as “number 1” to “number 2”, implies a continuous range of numbers that includes the whole numbers and fractional numbers. For example, 1 to 10 means 1, 2, 3, 4, 5, ... 9, 10. It also means 1.0, 1.1, 1.2, 1.3, ..., 9.8, 9.9, 10.0, and also means 1.01, 1.02, 1.03, and so on. If the variable disclosed is a number less than “number 10”, it implies a continuous range that includes whole numbers and fractional numbers less than number 10, as discussed above. Similarly, if the variable disclosed is a number greater than “number 10”, it implies a continuous range that includes whole numbers and fractional numbers greater than number 10. These ranges can be modified by the term “about”, whose meaning has been described above.
One skilled in the art will also readily recognize that where members are grouped together in a common manner, such as in a Markush group, the invention encompasses not only the entire group listed as a whole, but each member of the group individually and all possible subgroups of the main group. Additionally, for all purposes, the invention encompasses not only the main group, but also the main group absent one or more of the group members. The invention therefore envisages the explicit exclusion of any one or more of members of a recited group. Accordingly, provisos may apply to any of the disclosed categories or embodiments whereby any one or more of the recited elements, species, or embodiments, may be excluded from such categories or embodiments, for example, for use in an explicit negative limitation.
The term “substantially” as used herein, is a broad term and is used in its ordinary sense, including, without limitation, being largely but not necessarily wholly that which is specified. For example, the term could refer to a numerical value that may not be 100% the full numerical value. The full numerical value may be less by about 1%, about 2%, about 3%, about 4%, about 5%, about 6%, about 7%, about 8%, about 9%, about 10%, about 15%, or about 20%.
Wherever the term “comprising” is used herein, options are contemplated wherein the terms “consisting of” or “consisting essentially of” are used instead. As used herein, “comprising” is synonymous with "including," "containing," or "characterized by," and is inclusive or open-ended and does not exclude additional, unrecited elements or method steps. As used herein, "consisting of" excludes any element, step, or ingredient not specified in the aspect element. As used herein, "consisting essentially of" does not exclude materials or steps that do not materially affect the basic and novel characteristics of the aspect. In each instance herein any of the terms "comprising", "consisting essentially of" and "consisting of" may be replaced with either of the other two terms. The disclosure illustratively described herein may be practiced in the absence of any element or elements, limitation, or limitations not specifically disclosed herein.
The term "genome" or "genomic DNA" refers to the heritable genetic information of a host organism. Said genomic DNA comprises the entire genetic material of a cell or an organism, including, for prokaryotic organisms, the DNA of the bacterial chromosome and plasmids and, for eukaryotic organisms, the DNA of the nucleus (chromosomal DNA), extrachromosomal DNA, and organellar DNA (e.g., of mitochondria). Preferably, the terms genome or genomic DNA refer to the chromosomal DNA of the nucleus.
The term "chromosomal DNA" or "chromosomal DNA sequence" in the context of eukaryotic cells is to be understood as the genomic DNA of the cellular nucleus independent from the cell cycle status. Chromosomal DNA might therefore be organized in chromosomes or chromatids, and might be condensed or uncoiled. An insertion into the chromosomal DNA can be demonstrated and analyzed by various methods known in the art, such as polymerase chain reaction (PCR) analysis, Southern blot analysis, fluorescence in situ hybridization (FISH), in situ PCR and next generation sequencing (NGS).
The term "promoter" refers to a polynucleotide which directs the transcription of a structural gene to produce mRNA. Typically, a promoter is located in the 5' region of a gene, proximal to the start codon of a structural gene. If a promoter is an inducible promoter, then the rate of transcription increases in response to an inducing agent. In contrast, the rate of transcription is not regulated by an inducing agent if the promoter is a constitutive promoter. The term "enhancer" refers to a polynucleotide. An enhancer can increase the efficiency with which a particular gene is transcribed into mRNA irrespective of the distance or orientation of the enhancer relative to the start site of transcription. Usually, an enhancer is located close to a promoter, a 5'-untranslated sequence or in an intron.
A polynucleotide is "heterologous to" an organism or a second polynucleotide if it originates from a foreign species, or, if from the same species, is modified from its original form. For example, a promoter operably linked to a heterologous coding sequence refers to a coding sequence from a species different from that from which the promoter was derived, or, if from the same species, a coding sequence which is not naturally associated with the promoter (e.g., a genetically engineered coding sequence or an allele from a different ecotype or variety).
"Transgene", "transgenic" or "recombinant" refers to a polynucleotide manipulated by man or a copy or complement of a polynucleotide manipulated by man. For instance, a transgenic expression cassette comprising a promoter operably linked to a second polynucleotide may include a promoter that is heterologous to the second polynucleotide as the result of manipulation by man (e.g., by methods described in Sambrook et al., Molecular Cloning-A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, (1989) or Current Protocols in Molecular Biology Volumes 1-3, John Wiley & Sons, Inc. (1994-1998)) of an isolated nucleic acid comprising the expression cassette. In another example, a recombinant expression cassette may comprise polynucleotides combined in such a way that the polynucleotides are extremely unlikely to be found in nature. For instance, restriction sites or plasmid vector sequences manipulated by man may flank or separate the promoter from the second polynucleotide. One of skill will recognize that polynucleotides can be manipulated in many ways and are not limited to the examples above.
In case the term "recombinant" is used to specify an organism or cell, e.g., a microorganism, it is used to express that the organism or cell comprises at least one "transgene", "transgenic" or "recombinant" polynucleotide, which is usually specified later on.
A polynucleotide "exogenous to" an individual organism is a polynucleotide which is introduced into the organism by any means other than by a sexual cross.
The terms "operable linkage" or "operably linked" are generally understood as meaning an arrangement in which a genetic control sequence, e.g., a promoter, enhancer or terminator, is capable of exerting its function with regard to a polynucleotide being operably linked to it, for example a polynucleotide encoding a polypeptide. Function, in this context, may mean for example control of the expression, i.e., transcription and/or translation, of the nucleic acid sequence. Control, in this context, encompasses for example initiating, increasing, governing, or suppressing the expression, i.e., transcription and, if appropriate, translation. Controlling, in turn, may be, for example, tissue- and/or time-specific. It may also be inducible, for example by certain chemicals, stress, pathogens, and the like. Preferably, operable linkage is understood as meaning for example the sequential arrangement of a promoter, of the nucleic acid sequence to be expressed and, if appropriate, further regulatory elements such as, for example, a terminator, in such a way that each of the regulatory elements can fulfill its function when the nucleic acid sequence is expressed. An operable linkage does not necessarily require a direct linkage in the chemical sense. For example, genetic control sequences like enhancer sequences are also capable of exerting their function on the target sequence from positions located at a distance from the polynucleotide to which they are operably linked. Preferred arrangements are those in which the nucleic acid sequence to be expressed is positioned after a sequence acting as promoter so that the two sequences are linked covalently to one another. The distance between the promoter and the amino acid sequence-encoding polynucleotide in an expression cassette is preferably less than 200 base pairs, especially preferably less than 100 base pairs, very especially preferably less than 50 base pairs.
The skilled worker is familiar with a variety of ways in order to obtain such an expression cassette. However, an expression cassette may also be constructed in such a way that the nucleic acid sequence to be expressed is brought under the control of an endogenous genetic control element, for example an endogenous promoter, for example by means of homologous recombination or else by random insertion. Such constructs are likewise understood as being expression cassettes for the purposes of the invention.
The term "expression cassette" means those constructs in which the nucleic acid sequence encoding an amino acid sequence to be expressed is linked operably to at least one genetic control element which enables or regulates its expression (i.e., transcription and/or translation). The expression may be, for example, stable or transient, constitutive, or inducible.
The terms "express," "expressing," "expressed" and "expression" refer to expression of a gene product (e.g., a biosynthetic enzyme of a gene of a pathway or reaction defined and described in this application) at a level such that the resulting enzyme activity of the encoded protein, or of the pathway or reaction that it refers to, allows metabolic flux through this pathway or reaction in the organism in which this gene/pathway is expressed. The expression can be done by genetic alteration of the microorganism that is used as a starting organism. In some embodiments, a microorganism can be genetically altered (e.g., genetically engineered) to express a gene product at an increased level relative to that produced by the starting microorganism or in a comparable microorganism which has not been altered. Genetic alteration includes, but is not limited to, altering or modifying regulatory sequences or sites associated with expression of a particular gene (e.g., by adding strong promoters, inducible promoters or multiple promoters or by removing regulatory sequences such that expression is constitutive), modifying the chromosomal location of a particular gene, altering nucleic acid sequences adjacent to a particular gene such as a ribosome binding site or transcription terminator, increasing the copy number of a particular gene, modifying proteins (e.g., regulatory proteins, suppressors, enhancers, transcriptional activators and the like) involved in transcription of a particular gene and/or translation of a particular gene product, or any other conventional means of deregulating expression of a particular gene using methods routine in the art (including but not limited to use of antisense nucleic acid molecules, for example, to block expression of repressor proteins).
In some embodiments, a microorganism can be physically or environmentally altered to express a gene product at an increased or lower level relative to the level of expression of the gene product in the unaltered microorganism. For example, a microorganism can be treated with, or cultured in the presence of, an agent known or suspected to increase transcription of a particular gene and/or translation of a particular gene product such that transcription and/or translation are enhanced or increased. Alternatively, a microorganism can be cultured at a temperature selected to increase transcription of a particular gene and/or translation of a particular gene product such that transcription and/or translation are enhanced or increased.
The term "domain" refers to a set of amino acids conserved at specific positions along an alignment of sequences of evolutionarily related proteins. While amino acids at other positions can vary between homologues, amino acids that are highly conserved at specific positions indicate amino acids that are likely essential in the structure, stability, or function of a protein. Identified by their high degree of conservation in aligned sequences of a family of protein homologues, they can be used as identifiers to determine if any polypeptide in question belongs to a previously identified polypeptide family.
The term "motif" or "consensus sequence" or "signature" refers to a short, conserved region in the sequence of evolutionarily related proteins. Motifs are frequently highly conserved parts of domains, but may also include only part of the domain, or be located outside of a conserved domain (if all of the amino acids of the motif fall outside of a defined domain).
Specialist databases exist for the identification of domains, for example, SMART (Schultz et al. (1998) Proc. Natl. Acad. Sci. USA 95, 5857-5864; Letunic et al. (2002) Nucleic Acids Res 30, 242-244), InterPro (Mulder et al., (2003) Nucl. Acids. Res. 31, 315-318), Prosite (Bucher and Bairoch (1994), A generalized profile syntax for biomolecular sequences motifs and its function in automatic sequence interpretation. (In) ISMB-94; Proceedings 2nd International Conference on Intelligent Systems for Molecular Biology. Altman R. et al., Eds., pp. 53-61, AAAI Press, Menlo Park; Hulo et al., Nucl. Acids. Res. 32:D134-D137, (2004)), or Pfam (Bateman et al., Nucleic Acids Research 30(1): 276-280 (2002) & The Pfam protein families database: Finn et al., Nucleic Acids Research (2010) Database Issue 38:D211-222). A set of tools for in silico analysis of protein sequences is available on the ExPASy proteomics server (Swiss Institute of Bioinformatics (Gasteiger et al., ExPASy: the proteomics server for in-depth protein knowledge and analysis, Nucleic Acids Res. 31:3784-3788 (2003))). Domains or motifs may also be identified using routine techniques, such as by sequence alignment.
Methods for the alignment of sequences for comparison are well known in the art; such methods include GAP, BESTFIT, BLAST, FASTA and TFASTA. GAP uses the algorithm of Needleman and Wunsch ((1970) J Mol Biol 48: 443-453) to find the global (i.e., spanning the complete sequences) alignment of two sequences that maximizes the number of matches and minimizes the number of gaps. The BLAST algorithm (Altschul et al. (1990) J Mol Biol 215: 403-10) calculates percent sequence identity and performs a statistical analysis of the similarity between the two sequences. The software for performing BLAST analysis is publicly available through the National Center for Biotechnology Information (NCBI). Homologues may readily be identified using, for example, the ClustalW multiple sequence alignment algorithm (version 1.83), with the default pairwise alignment parameters, and a scoring method in percentage. Global percentages of similarity and identity may also be determined using one of the methods available in the MatGAT software package (Campanella et al., BMC Bioinformatics. 2003 Jul 10;4:29. MatGAT: an application that generates similarity/identity matrices using protein or DNA sequences). Minor manual editing may be performed to optimize alignment between conserved motifs, as would be apparent to a person skilled in the art. Furthermore, instead of using full-length sequences for the identification of homologues, specific domains may also be used. The sequence identity values may be determined over the entire nucleic acid or amino acid sequence or over selected domains or conserved motif(s), using the programs mentioned above with the default parameters. For local alignments, the Smith-Waterman algorithm is particularly useful (Smith TF, Waterman MS (1981) J. Mol. Biol 147(1): 195-7).
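As an illustration of the global alignment idea attributed above to Needleman and Wunsch, the following is a minimal sketch of the dynamic-programming recurrence and traceback. It is not the GAP program: the function name `needleman_wunsch` and the flat scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for readability, whereas real tools use substitution matrices and separate gap-opening and gap-extension penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative
    assumptions, not the defaults of any particular alignment program.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the matrix with the best of diagonal, up and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```

Because every position of both sequences must appear in the output, this is a global alignment; a local method such as Smith-Waterman would instead clamp cell scores at zero and trace back from the highest-scoring cell.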
Typically, identifying orthologues and paralogues involves a first BLAST, in which a query sequence is BLASTed against any sequence database, such as the publicly available NCBI database. BLASTN or TBLASTX (using standard default values) are generally used when starting from a nucleotide sequence, and BLASTP or TBLASTN (using standard default values) when starting from a protein sequence. The BLAST results may optionally be filtered. The full-length sequences of either the filtered results or non-filtered results are then BLASTed back (second BLAST) against sequences from the organism from which the query sequence is derived. The results of the first and second BLASTs are then compared. A paralogue is identified if a high-ranking hit from the first BLAST is from the same species as that from which the query sequence is derived; a BLAST back then ideally results in the query sequence being amongst the highest hits. An orthologue is identified if a high-ranking hit in the first BLAST is not from the same species as that from which the query sequence is derived, and preferably results upon BLAST back in the query sequence being among the highest hits.
High-ranking hits are those having a low E-value. The lower the E-value, the more significant the score (or in other words, the lower the probability that the hit was found by chance).
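The comparison of first and second BLAST results described above can be sketched in a few lines. This is only an illustrative sketch operating on already-parsed hit lists: the `Hit` record, the gene and species names, the E-value cutoff of 1e-10, and the "top 3" rule for the BLAST back are all hypothetical choices, and the sketch requires the query to reappear in the BLAST back for both classes, which is stricter than the "ideally" and "preferably" wording of the text.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject_id: str   # identifier of the database sequence that was hit
    species: str      # source organism of that sequence
    evalue: float     # BLAST E-value (lower = more significant)


def classify_hits(query_id, query_species, first_blast, back_blast,
                  top_n=3, max_evalue=1e-10):
    """Label high-ranking first-BLAST hits as paralogue or orthologue.

    first_blast: list of Hit from BLASTing the query against a database.
    back_blast:  dict mapping each hit's subject_id to the list of Hit obtained
                 by BLASTing that hit back against the query organism.
    """
    labels = {}
    for hit in sorted(first_blast, key=lambda h: h.evalue):
        if hit.evalue > max_evalue or hit.subject_id == query_id:
            continue  # not high-ranking, or the query hitting itself
        back = sorted(back_blast.get(hit.subject_id, []), key=lambda h: h.evalue)
        if query_id not in [h.subject_id for h in back[:top_n]]:
            continue  # query not among the highest hits of the BLAST back
        same_species = hit.species == query_species
        labels[hit.subject_id] = "paralogue" if same_species else "orthologue"
    return labels


# Hypothetical toy data: geneA from species_1 is the query.
first = [Hit("geneB", "species_1", 1e-80),
         Hit("geneC", "species_2", 1e-60),
         Hit("geneD", "species_2", 1e-3)]
back = {"geneB": [Hit("geneA", "species_1", 1e-80)],
        "geneC": [Hit("geneA", "species_1", 1e-55)],
        "geneD": [Hit("geneA", "species_1", 1e-2)]}
labels = classify_hits("geneA", "species_1", first, back)
print(labels)
```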
Computation of the E-value is well known in the art. In addition to E-values, comparisons are also scored by percentage identity. Percentage identity refers to the number of identical nucleotides (or amino acids) between the two compared nucleic acid (or polypeptide) sequences over a particular length. In the case of large families, ClustalW may be used, followed by a neighbor joining tree, to help visualize clustering of related genes and to identify orthologues and paralogues.
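As a small illustration of the percentage identity score, the sketch below counts identical positions in two sequences that have already been aligned (gaps written as "-") and divides by the alignment length. The choice of denominator is an assumption; programs differ in whether they divide by alignment length, by the shorter sequence, or by a selected domain.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the aligned length of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap (possible in slices of
    # a multiple alignment).
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


print(percent_identity("ACGT", "A-GT"))
```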
The term "sequence identity" between two nucleic acid sequences is understood as meaning the percent identity of the nucleic acid sequences over, in each case, the entire sequence length, calculated by alignment with the aid of the program algorithm GAP (Wisconsin Package Version 10.0, University of Wisconsin, Genetics Computer Group (GCG), Madison, USA), setting, for example, the following parameters: Gap Weight: 12; Length Weight: 4; Average Match: 2.912; Average Mismatch: -2.003.
The term "sequence identity" between two amino acid sequences is understood as meaning the percent identity of the amino acid sequences over, in each case, the entire sequence length, calculated by alignment with the aid of the program algorithm GAP (Wisconsin Package Version 10.0, University of Wisconsin, Genetics Computer Group (GCG), Madison, USA), setting, for example, the following parameters: Gap Weight: 8; Length Weight: 2; Average Match: 2.912; Average Mismatch: -2.003.
The term "hybridization" as defined herein is a process wherein substantially homologous complementary nucleotide sequences anneal to each other. The hybridization process can occur entirely in solution, i.e., both complementary nucleic acids are in solution. The hybridization process can also occur with one of the complementary nucleic acids immobilized to a matrix such as magnetic beads, Sepharose beads or any other resin. The hybridization process can furthermore occur with one of the complementary nucleic acids immobilized to a solid support such as a nitro-cellulose or nylon membrane or immobilized by e.g., photolithography to, for example, a siliceous glass support (the latter known as nucleic acid arrays or microarrays or as nucleic acid chips). In order to allow hybridization to occur, the nucleic acid molecules are generally thermally or chemically denatured to melt a double strand into two single strands and/or to remove hairpins or other secondary structures from single stranded nucleic acids.
The term "stringency" refers to the conditions under which a hybridization takes place. The stringency of hybridization is influenced by conditions such as temperature, salt concentration, ionic strength, and hybridization buffer composition. Generally, low stringency conditions are selected to be about 30°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. Medium stringency conditions are when the temperature is 20°C below Tm, and high stringency conditions are when the temperature is 10°C below Tm. High stringency hybridization conditions are typically used for isolating hybridizing sequences that have high sequence similarity to the target nucleic acid sequence. However, nucleic acids may deviate in sequence and still encode a substantially identical polypeptide, due to the degeneracy of the genetic code. Therefore, medium stringency hybridization conditions may sometimes be needed to identify such nucleic acid molecules.
The Tm is the temperature, under defined ionic strength and pH, at which 50% of the target sequence hybridizes to a perfectly matched probe. The Tm is dependent upon the solution conditions and the base composition and length of the probe. For example, longer sequences hybridize specifically at higher temperatures. The maximum rate of hybridization is obtained from about 16°C up to 32°C below Tm. The presence of monovalent cations in the hybridization solution reduces the electrostatic repulsion between the two nucleic acid strands, thereby promoting hybrid formation; this effect is visible for sodium concentrations of up to 0.4M (for higher concentrations, this effect may be ignored). Formamide reduces the melting temperature of DNA-DNA and DNA-RNA duplexes by 0.6 to 0.7°C for each percent formamide, and addition of 50% formamide allows hybridization to be performed at 30 to 45°C, though the rate of hybridization will be lowered. Base pair mismatches reduce the hybridization rate and the thermal stability of the duplexes. On average and for large probes, the Tm decreases about 1°C per % base mismatch. The Tm may be calculated using the following equations, depending on the types of hybrids:
1) DNA-DNA hybrids (Meinkoth and Wahl, Anal. Biochem., 138: 267-284, 1984):
$$T_m = 81.5\,^{\circ}\mathrm{C} + 16.6 \times \log_{10}[\mathrm{Na}^{+}]^{\text{a}} + 0.41 \times \%[\mathrm{G/C}^{\text{b}}] - 500 \times [L^{\text{c}}]^{-1} - 0.61 \times \%\,\text{formamide}$$
2) DNA-RNA or RNA-RNA hybrids:
$$T_m = 79.8\,^{\circ}\mathrm{C} + 18.5\,(\log_{10}[\mathrm{Na}^{+}]^{\text{a}}) + 0.58\,(\%\mathrm{G/C}^{\text{b}}) + 11.8\,(\%\mathrm{G/C}^{\text{b}})^{2} - 820/L^{\text{c}}$$
3) oligo-DNA or oligo-RNA$^{\text{d}}$ hybrids:
For <20 nucleotides: $T_m = 2\,(l_n)$
For 20-35 nucleotides: $T_m = 22 + 1.46\,(l_n)$
a: or for other monovalent cation, but only accurate in the 0.01-0.4 M range.
b: only accurate for %GC in the 30% to 75% range.
c: L = length of duplex in base pairs.
d: oligo, oligonucleotide; $l_n$ = effective length of primer = 2×(no. of G/C) + (no. of A/T).
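The DNA-DNA and oligonucleotide equations above, together with the stringency offsets given earlier (about 30°C, 20°C and 10°C below Tm for low, medium and high stringency), can be evaluated with a short script. This is a sketch for illustration: the function names and example inputs are hypothetical, and the validity ranges in the footnotes (0.01-0.4 M sodium, 30-75% GC) are not enforced.

```python
import math


def tm_dna_dna(na_molar, gc_percent, length_bp, formamide_percent=0.0):
    """Tm (deg C) of a DNA-DNA hybrid per the Meinkoth and Wahl equation."""
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 500.0 / length_bp
            - 0.61 * formamide_percent)


def tm_oligo(seq):
    """Tm (deg C) of an oligonucleotide hybrid from its effective length l_n."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    l_n = 2 * gc + at                      # effective length of primer
    if len(seq) < 20:
        return 2.0 * l_n
    if len(seq) <= 35:
        return 22.0 + 1.46 * l_n
    raise ValueError("oligo equations cover at most 35 nucleotides")


# Offsets below Tm, in deg C, for each stringency level named in the text.
STRINGENCY_OFFSET_C = {"low": 30.0, "medium": 20.0, "high": 10.0}


def hybridization_temperature(tm_c, stringency):
    """Approximate hybridization temperature for a given stringency level."""
    return tm_c - STRINGENCY_OFFSET_C[stringency]


# Hypothetical example: 500 bp duplex, 50% GC, 0.1 M Na+, no formamide.
tm = tm_dna_dna(0.1, 50.0, 500)
print(round(tm, 1), round(hybridization_temperature(tm, "high"), 1))
```

Adding 50% formamide to the same example lowers the computed Tm by 0.61 × 50 = 30.5°C, which is consistent with the statement that formamide permits hybridization at lower temperatures.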
Non-specific binding may be controlled using any one of a number of known techniques such as, for example, blocking the membrane with protein containing solutions, additions of heterologous RNA, DNA, and SDS to the hybridization buffer, and treatment with RNAse. For non-homologous probes, a series of hybridizations may be performed by varying one of (i) progressively lowering the annealing temperature (for example from 68°C to 42 °C) or (ii) progressively lowering the formamide concentration (for example from 50% to 0%). The skilled artisan is aware of various parameters which may be altered during hybridization and which will either maintain or change the stringency conditions.
Besides the hybridization conditions, specificity of hybridization typically also depends on the function of post-hybridization washes. To remove background resulting from non-specific hybridization, samples are washed with dilute salt solutions. Critical factors of such washes include the ionic strength and temperature of the final wash solution: the lower the salt concentration and the higher the wash temperature, the higher the stringency of the wash. Wash conditions are typically performed at or below hybridization stringency. A positive hybridization gives a signal that is at least twice of that of the background. Generally, suitable stringent conditions for nucleic acid hybridization assays or gene amplification detection procedures are as set forth above. More or less stringent conditions may also be selected. The skilled artisan is aware of various parameters which may be altered during washing and which will either maintain or change the stringency conditions.
For example, typical high stringency hybridization conditions for DNA hybrids longer than 50 nucleotides encompass hybridization at 65°C in 1x SSC or at 42°C in 1x SSC and 50% formamide, followed by washing at 65°C in 0.3x SSC. Examples of medium stringency hybridization conditions for DNA hybrids longer than 50 nucleotides encompass hybridization at 50°C in 4x SSC or at 40°C in 6x SSC and 50% formamide, followed by washing at 50°C in 2x SSC. The length of the hybrid is the anticipated length for the hybridizing nucleic acid. When nucleic acids of known sequence are hybridized, the hybrid length may be determined by aligning the sequences and identifying the conserved regions described herein. 1x SSC is 0.15M NaCl and 15mM sodium citrate; the hybridization solution and wash solutions may additionally include 5x Denhardt's reagent, 0.5-1.0% SDS, 100 µg/ml denatured, fragmented salmon sperm DNA, and 0.5% sodium pyrophosphate.
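Because the conditions above are stated as multiples of SSC, it can be convenient to convert an "n x SSC" value into molar concentrations. The sketch below scales the stated 1x composition; the sodium estimate additionally assumes that the citrate salt is trisodium citrate (three sodium ions per citrate), which is an assumption not stated in the text.

```python
def ssc_composition(fold):
    """Molar NaCl and sodium citrate in `fold` x SSC (1x = 0.15 M / 15 mM)."""
    return {"NaCl_M": 0.15 * fold, "sodium_citrate_M": 0.015 * fold}


def sodium_molarity(fold):
    """Estimated total Na+ (M), assuming trisodium citrate (3 Na+ per citrate)."""
    comp = ssc_composition(fold)
    return comp["NaCl_M"] + 3 * comp["sodium_citrate_M"]


for fold in (0.3, 1, 2, 4, 6):
    print(fold, ssc_composition(fold), round(sodium_molarity(fold), 4))
```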
For the purposes of defining the level of stringency, reference can be made to Sambrook et al. (2001) Molecular Cloning: a laboratory manual, 3rd Edition, Cold Spring Harbor Laboratory Press, CSH, New York or to Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989 and yearly updates).
"Homologues" of a protein encompass peptides, oligopeptides, polypeptides, proteins, and enzymes having amino acid substitutions, deletions and/or insertions relative to the unmodified protein in question and having similar biological and functional activity as the unmodified protein from which they are derived. Protein homologues are also referred to as "homologous sequences".
A deletion refers to removal of one or more amino acids from a protein.
An insertion refers to one or more amino acid residues being introduced into a predetermined site in a protein. Insertions may comprise N-terminal and/or C-terminal fusions as well as intra-sequence insertions of single or multiple amino acids. Generally, insertions within the amino acid sequence will be smaller than N- or C-terminal fusions, of the order of about 1 to 10 residues. Examples of N- or C-terminal fusion proteins or peptides include the binding domain or activation domain of a transcriptional activator as used in the yeast two-hybrid system, phage coat proteins, (histidine)-6-tag, glutathione S-transferase-tag, protein A, maltose-binding protein, dihydrofolate reductase, Tag 100 epitope, c-myc epitope, FLAG®-epitope, lacZ, CMP (calmodulin-binding peptide), HA epitope, protein C epitope and VSV epitope.
A substitution refers to replacement of amino acids of the protein with other amino acids having similar properties (such as similar hydrophobicity, hydrophilicity, antigenicity, propensity to form or break α-helical structures or β-sheet structures). Amino acid substitutions are typically of single residues but may be clustered depending upon functional constraints placed upon the polypeptide and may range from 1 to 10 amino acids; insertions will usually be of the order of about 1 to 10 amino acid residues. The amino acid substitutions are preferably conservative amino acid substitutions. Conservative substitution tables are well known in the art (see for example Creighton (1984) Proteins. W.H. Freeman and Company (Eds) and as shown below).
[Table: conservative amino acid substitutions; original image imgf000020_0001 not reproduced]
Reference herein to an "endogenous" gene not only refers to the gene in question as found in an organism in its natural form (i.e., without there being any human intervention), but also refers to that same gene (or a substantially homologous nucleic acid/gene) in an isolated form subsequently (re)introduced into a microorganism (a transgene). For example, a transgenic microorganism containing such a transgene may encounter a substantial reduction of the transgene expression and/or substantial reduction of expression of the endogenous gene. The isolated gene may be isolated from an organism or may be manmade, for example by chemical synthesis.
The terms "orthologues" and "paralogues" encompass evolutionary concepts used to describe the ancestral relationships of genes. Paralogues are genes within the same species that have originated through duplication of an ancestral gene; orthologues are genes from different organisms that have originated through speciation and are also derived from a common ancestral gene.
The term "vector", preferably, encompasses phage, plasmid, fosmid, viral vectors as well as artificial chromosomes, such as bacterial or yeast artificial chromosomes. Moreover, the term also relates to targeting constructs which allow for random or site-directed integration of the targeting construct into genomic DNA. Such targeting constructs, preferably, comprise DNA of sufficient length for either homologous or heterologous recombination as described in detail below. The vector encompassing the polynucleotide of the present invention, preferably, further comprises selectable markers for propagation and/or selection in a recombinant microorganism. The vector may be incorporated into a recombinant microorganism by various techniques well known in the art. If introduced into a recombinant microorganism, the vector may reside in the cytoplasm or may be incorporated into the genome. In the latter case, it is to be understood that the vector may further comprise nucleic acid sequences which allow for homologous recombination or heterologous insertion. Vectors can be introduced into prokaryotic or eukaryotic cells via conventional transformation or transfection techniques.
The terms "transformation" and "transfection", conjugation and transduction, as used in the present context, are intended to comprise a multiplicity of prior-art processes for introducing foreign nucleic acid (for example DNA) into a recombinant microorganism, including calcium phosphate, rubidium chloride or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, natural competence, carbon-based clusters, chemically mediated transfer, electroporation or particle bombardment. Methods for many species of microorganisms are readily available in the literature.
Embodiments of the Invention
The present disclosure provides certain embodiments of methods of making benzylisoquinoline compounds or benzylisoquinoline precursor compounds in a modified host cell comprising one or more heterologous polynucleotides encoding one or more enzymes for benzylisoquinoline biosynthesis. The benzylisoquinoline compound or benzylisoquinoline precursor compound may be produced from a substrate benzylisoquinoline compound, or a substrate benzylisoquinoline precursor compound including, but not limited to, tyrosine and phenylalanine.
Also disclosed herein are embodiments of a modified host cell and methods of making the same in which the modified host cell expresses one or more heterologous polynucleotides that encode one or more enzymes for producing a benzylisoquinoline compound or benzylisoquinoline precursor compound. In some embodiments, a method of making a host cell may comprise the steps of using machine learning to select heterologous enzymatic sequences with a metabolic function that produces the compound or an intermediate of the compound and engineering the host cell to express the selected heterologous enzymatic sequences to produce the compound or the intermediate of the compound.
In some embodiments, the machine learning comprises a support vector machine (SVM) algorithm such as described in Watanabe et al., J. Chem. Inf. Model. 60, 1833-1843 (2020). FIGS. 1a and 1b illustrate an embodiment of an SVM that may be used in the methods to produce a modified host cell as described herein. To build highly accurate machine learning models for prediction of specialized enzyme functions that may be incorrectly annotated in current biological databases, the protein sequences used for training are first curated based on sequence and structure analysis. Well-known examples of enzyme sequences with the desired specialized function and closely related sequences with a different function are first collected. Then the structures of these exemplar enzymes are analyzed, either from available structures in public structure databases like the Protein Data Bank or from structure models built using homology modeling. Key structural amino acid residues or sequence motifs, which may differentiate the desired function group from the different function group, are then identified from the structures and/or structure models. These key residues and/or motifs are then used to curate all of the training sequences. First, training sequences with high homology to the desired function group are collected from sequence databases, including but not limited to NCBI and UniProt. The homologous sequences are then classified as positive or negative training examples based on the key structural amino acids. Homologous sequences are generally considered to be sequences with around 40% or higher sequence identity to the sequences with the known desired function, but can also comprise sequences with about 30% or higher sequence identity. The well-known positive and negative exemplar sequences are also included in the positive and negative training groups, respectively.
Generally, around 100 or more positive sequence examples and around 100 or more closely related negative sequence examples are used to build highly accurate prediction models. However, the machine learning can still be successfully attempted using a lower number of sequences when there are limited sequences available. Negative training examples also include all other known enzyme classes apart from the desired enzyme function, so that the total negative training examples can easily be in the thousands.
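The curation step described above, in which homologous sequences are sorted into positive and negative training examples according to key residues, might be sketched as follows. Everything specific here is hypothetical: the record layout, the key residue positions, the toy sequences, and the assumption that each homolog has already been aligned to the exemplar so that positions share one numbering.

```python
def curate_training_set(homologs, key_residues, min_identity=30.0):
    """Split homologs into positive and negative training examples.

    homologs:     list of dicts with keys "id", "aligned_seq" (sequence aligned
                  to the exemplar numbering) and "identity" (percent identity
                  to a sequence with the known desired function).
    key_residues: dict mapping 1-based position -> residue expected in the
                  desired-function group.
    """
    positives, negatives = [], []
    for h in homologs:
        if h["identity"] < min_identity:
            continue  # too distant to count as a homologous sequence here
        has_key_residues = all(h["aligned_seq"][pos - 1] == aa
                               for pos, aa in key_residues.items())
        (positives if has_key_residues else negatives).append(h["id"])
    return positives, negatives


# Hypothetical toy data: positions 3 and 5 distinguish the two function groups.
toy_key_residues = {3: "F", 5: "N"}
toy_homologs = [
    {"id": "seq1", "aligned_seq": "MAFKNL", "identity": 62.0},
    {"id": "seq2", "aligned_seq": "MAYKHL", "identity": 55.0},
    {"id": "seq3", "aligned_seq": "MAFKNL", "identity": 12.0},
]
positives, negatives = curate_training_set(toy_homologs, toy_key_residues)
print(positives, negatives)
```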
After building the database of positive and negative sequence examples, the protein sequences are converted to enzyme feature vectors using systems including, but not limited to, PROFEAT or ProtVec. Here, chemical and structural information are extracted from the amino acid sequences in this process, to build the enzyme feature vectors for training the machine learning models. The machine learning models are then trained with the positive example enzyme feature vectors and the negative example enzyme feature vectors. After training, candidate sequences that might contain the desired function are collected as test sequences. The test sequences are also converted to enzyme feature vectors in the same process as that of the training vectors, and the test vectors are scored against the model. The candidate sequences are then ranked based on positive and negative prediction scores and the top-ranking sequences are prioritized for laboratory testing. In the case of SVM models, a decision score that represents the distance from the prediction border is also used to rank sequences. The training data can then be updated based on the laboratory test results to improve the curation criteria and further upgrade the models in multiple learning cycles.
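The train, score and rank cycle described above can be illustrated with a deliberately small, dependency-free sketch. It is not the model of the disclosure: amino acid composition stands in for PROFEAT or ProtVec features, the linear SVM is fitted by plain subgradient descent on the hinge loss, and all sequences, names and hyperparameters are invented toy values.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Toy enzyme feature vector: fraction of each standard amino acid."""
    seq = seq.upper()
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.5, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:          # example violates the margin
                for j, xij in enumerate(xi):
                    grad_w[j] -= yi * xij / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision_score(model, seq):
    """Signed decision value; larger means further on the positive side."""
    w, b = model
    return dot(w, composition(seq)) + b


def rank_candidates(model, candidates):
    """Candidate names sorted from highest to lowest decision score."""
    return sorted(candidates,
                  key=lambda name: decision_score(model, candidates[name]),
                  reverse=True)


# Invented toy training data (+1 = desired function, -1 = other function).
positives = ["WWWWAAGG", "WWWAAGGG", "WWWWWAGG"]
negatives = ["KKKKAAGG", "KKKAAGGG", "KKKKKAGG"]
X = [composition(s) for s in positives + negatives]
y = [1] * len(positives) + [-1] * len(negatives)
model = train_linear_svm(X, y)

candidates = {"cand_A": "WWWWAGGA", "cand_B": "KKKKAGGA", "cand_C": "WWKKAAGG"}
ranking = rank_candidates(model, candidates)
print(ranking)
```

In a further learning cycle, laboratory results for the top-ranked candidates would be appended to the positive or negative lists and the model retrained.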
Accordingly, in various embodiments, the SVM is trained using homologous template sequences of enzymes known to have a similar metabolic function to the selected heterologous enzymatic sequences. Preferably, the homologous template sequences are from a same enzymatic class as the heterologous enzymatic sequences.
In some embodiments, the selected heterologous enzymatic sequences are from enzymes with no known characterized function.
In some embodiments, a method of producing a compound in a host cell comprises the steps of using machine learning (e.g., a support vector machine algorithm) to select heterologous enzymatic sequences with a metabolic function that produces the compound or an intermediate of the compound, and engineering the host cell to express the selected heterologous enzymatic sequences to produce the compound or the intermediate of the compound.
In some embodiments, a method of producing a compound in a host cell comprises the steps of identifying a candidate enzyme template, comparing an amino acid sequence and/or 3-dimensional structure of the heterologous enzymatic sequences to an amino acid sequence and/or 3 -dimensional structure of the candidate enzyme template, finding one or more differences between an active site of the heterologous enzymatic sequences and an active site of the candidate enzyme template, modifying the active site of the candidate enzyme template to include the one or more differences between the active site of the enzymatic sequences and the active site of the candidate enzyme template to produce a modified enzyme template, and expressing the modified enzyme template in the host cell to produce the compound or the intermediate.
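The active-site comparison and transfer described above can be sketched for the simple case in which the two sequences are already aligned with a shared numbering. The sequences, positions and function names below are hypothetical toy values; real use would first require a sequence or structure alignment to establish which positions correspond.

```python
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def active_site_differences(template, reference, positions):
    """Return (position, template_residue, reference_residue) where they differ.

    template and reference are pre-aligned, equal-length sequences;
    positions are 1-based active-site positions in that shared numbering.
    """
    return [(pos, template[pos - 1], reference[pos - 1])
            for pos in positions
            if template[pos - 1] != reference[pos - 1]]


def substitution_label(diff):
    """Format a difference in the usual notation, e.g. Tyr5Phe."""
    pos, old, new = diff
    return f"{THREE_LETTER[old]}{pos}{THREE_LETTER[new]}"


def apply_substitutions(template, diffs):
    """Introduce the listed substitutions into the template sequence."""
    seq = list(template)
    for pos, old, new in diffs:
        if seq[pos - 1] != old:
            raise ValueError(f"expected {old} at position {pos}")
        seq[pos - 1] = new
    return "".join(seq)


# Hypothetical toy sequences with an "active site" at positions 5-7.
template_seq = "MKTAYFL"
reference_seq = "MKTAFYL"
diffs = active_site_differences(template_seq, reference_seq, [5, 6, 7])
labels = [substitution_label(d) for d in diffs]
modified_seq = apply_substitutions(template_seq, diffs)
print(labels, modified_seq)
```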
In some embodiments, the enzyme template sequence has aromatic acetaldehyde synthase (AAS) activity, phenylpyruvate decarboxylase (PPDC) activity, or both. In some embodiments, the enzyme template sequences are modified to improve the aromatic acetaldehyde synthase activity or phenylpyruvate decarboxylase activity, and/or to add another functionality. An enzyme template sequence may include any enzyme as described herein.
In some embodiments, a method of producing a compound in a host cell further comprises the steps of confirming the expression of the selected candidate enzyme sequence, causing or enabling production of the compound or intermediate compound by inserting the selected candidate enzyme sequence (e.g., as a polynucleotide) into a test cell, and detecting the production of the compound or intermediate compound. Preferably, the test cell is a bacterial, yeast, plant, or fungal cell. More preferably, the test cell is a species of bacteria.
In some embodiments, after testing in a test cell, the enzyme or enzymes comprising the selected candidate sequences confirmed as producing the compound or intermediate compound are expressed in the host cell to produce the compound or intermediate compound.
In some embodiments, the test cell is the host cell. Suitable test cells and host cells are well known to those skilled in the art and include prokaryotic cells, such as bacterial cells, and eukaryotic cells, such as fungal cells, yeast cells, mammalian cells, insect cells or plant cells. Exemplary bacterial cells include cells of the genera Escherichia, Salmonella, Streptomyces, Pseudomonas, Staphylococcus, or Bacillus. Examples of the bacteria include Escherichia coli, Lactococcus lactis, Bacillus subtilis, Bacillus cereus, and Salmonella fulum. Preferably, the test cell and host cell are a species of bacteria, and more preferably the test cell and the host cell are Escherichia coli.
In some embodiments, machine learning is used to identify uncharacterized enzymes that function in a biosynthetic pathway, wherein the identified uncharacterized enzyme is inserted into a host cell to function in the biosynthetic pathway, or an enzymatic template is engineered such that an active site of the enzymatic template sequence is changed to include one or more amino acid substitutions corresponding to an active site of the identified uncharacterized enzyme.
The disclosure provides for a modified cell comprising a synthetic or engineered biosynthetic pathway that may be formed according to various methods described herein. For example, in preferred embodiments, the methods disclosed herein may be used to produce a recombinant host cell having a biosynthetic pathway for production of a benzylisoquinoline alkaloid (BIA), using pathways comprising one or more enzymes modified or substituted for another enzyme as described herein, and as described, for example, in International Pat. Pub. No. WO/2020/090940, incorporated herein by reference in its entirety.
In some embodiments, a modified cell comprises one or more polynucleotides encoding one or more heterologous enzymes for the production of a benzylisoquinoline compound, wherein the polynucleotides encode one or more heterologous enzymes operably linked to a polynucleotide sequence controlling expression of the one or more heterologous enzymes, wherein expression of the one or more heterologous enzymes permits the modified cell to produce the benzylisoquinoline compound when provided with a substrate, and wherein the modified cell is not a plant cell.
Heterologous coding sequences of interest include but are not limited to sequences that encode enzymes, either wild-type or equivalent sequences, which are normally responsible for the production of BIAs and precursors in plants or yeast. In some cases, the enzymes for which the heterologous sequences code can be any of the enzymes in the BIA pathway and can be from any convenient source. The choice and number of enzymes encoded by the heterologous coding sequences for the particular synthetic pathway may be selected based upon the desired product. In certain embodiments, the host cells of the present invention may include 1 or more, 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 11 or more, 12 or more, 13 or more, 14 or more, or even 15 or more heterologous coding sequences.
In some embodiments, the host cell or modified cell includes multiple copies of a heterologous coding sequence for an enzyme, such as 2 or more, 3 or more, 4 or more, 5 or more, or even 10 or more copies. In certain embodiments, the host cell includes multiple copies of heterologous coding sequences for one or more enzymes, such as multiple copies of two or more, three or more, four or more, etc. In some cases, the multiple copies of the heterologous coding sequence for an enzyme are derived from two or more different source organisms as compared to the host cell. For example, the host cell may include multiple copies of one heterologous coding sequence, where each of the copies is derived from a different source organism. As such, each copy may include some variations in explicit sequences based on interspecies differences of the enzyme of interest that is encoded by the heterologous coding sequence.
In some embodiments, a modified cell comprises, consists essentially of, or consists of one or more heterologous enzymes or heterologous polynucleotides encoding an enzyme having phenylpyruvate decarboxylase (PPDC) activity for the conversion of 4-hydroxyphenylpyruvate (4HPP) to 4-hydroxyphenylacetaldehyde (4HPAA) and/or the conversion of 3,4-dihydroxyphenylpyruvate to 3,4-dihydroxyphenylacetaldehyde (DHPAA), where 4HPAA and DHPAA are key intermediates for the production of benzylisoquinoline alkaloid (BIA). In some embodiments, the PPDC comprises one or more of S. cerevisiae ARO10, P. somniferum pyruvate decarboxylase 1 (PsPDC1), and P. somniferum pyruvate decarboxylase isoform XI. In some embodiments, the one or more heterologous enzymes or polynucleotides encoding said enzymes having phenylpyruvate decarboxylase activity comprise, consist essentially of, or consist of one or more sequences selected from SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 61, SEQ ID NO: 62, SEQ ID NO: 80, and SEQ ID NO: 81.
In some embodiments, the one or more heterologous enzymes having phenylpyruvate decarboxylase activity are about 80%, about 85%, about 90%, about 95%, about 96%, about 97%, about 98%, about 99%, or about 100% identical to a sequence selected from SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 61, SEQ ID NO: 62, SEQ ID NO: 80, and SEQ ID NO: 81.
In some embodiments, the one or more heterologous enzymes may have truncated or modified N-terminal domains to improve their expression and/or solubility in a bacterial host, and in particular, E. coli.
In some embodiments, a modified cell may include, in addition to a PPDC, other heterologous enzymes or heterologous genes encoding said enzymes for BIA production including tyrosine 3-monooxygenase, L-DOPA decarboxylase (DDC), tyrosine aminotransferase (TAT), norcoclaurine synthase (NCS), norcoclaurine 6-O-methyltransferase (6OMT), coclaurine N-methyltransferase (CNMT), CYP450 reductase (CPR), N-methylcoclaurine 3-hydroxylase (NMCH), and 3-hydroxy-N-methylcoclaurine 4-O-methyltransferase (4OMT). In some embodiments, the tyrosine 3-monooxygenase includes the sequences SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 72, and SEQ ID NO: 73; L-DOPA decarboxylase (DDC) includes the sequences SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 70, and SEQ ID NO: 71; tyrosine aminotransferase (TAT) includes the sequences SEQ ID NO: 74 and SEQ ID NO: 75; norcoclaurine synthase (NCS) includes the sequences SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 68, and SEQ ID NO: 69; norcoclaurine 6-O-methyltransferase (6OMT) includes the sequences SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, and SEQ ID NO: 34; coclaurine N-methyltransferase (CNMT) includes the sequences SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 57, and SEQ ID NO: 58; CYP450 reductase (CPR) includes the sequences SEQ ID NO: 41, SEQ ID NO: 42, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 78, and SEQ ID NO: 79; N-methylcoclaurine 3-hydroxylase (NMCH) includes the sequences SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 76, and SEQ ID NO: 77; and 3-hydroxy-N-methylcoclaurine 4-O-methyltransferase (4OMT) includes the sequences SEQ ID NO: 55 and SEQ ID NO: 56. In some embodiments, the enzymes or polynucleotides encoding the enzymes are homologs or functional homologs of the above identified sequences.
A functional homolog is a polypeptide, i.e., an enzyme, that has sequence similarity to a reference enzyme and that carries out one or more of the biochemical or physiological function(s) of the reference enzyme. A functional homolog and the reference enzyme can be naturally occurring enzymes, and the sequence similarity can be due to convergent or divergent evolutionary events. As such, functional homologs are sometimes designated in the literature as homologs, or orthologs, or paralogs. Variants of a naturally occurring functional homolog, such as polypeptides encoded by mutants of a wild type coding sequence, can themselves be functional homologs. Functional homologs can also be created via site-directed mutagenesis of the coding sequence for a polypeptide, or by combining domains from the coding sequences for different naturally occurring polypeptides (“domain swapping”). Techniques for modifying genes encoding functional polypeptides described herein are known and include, inter alia, directed evolution techniques, site-directed mutagenesis techniques and random mutagenesis techniques, and can be useful to increase specific activity of a polypeptide, alter substrate specificity, alter expression levels, alter subcellular location, or modify polypeptide-polypeptide interactions in a desired manner. Such modified polypeptides are considered functional homologs. The term “functional homolog” also may be applied to the nucleic acid that encodes a functionally homologous polypeptide.
Functional homologs can be identified by analysis of nucleotide and polypeptide sequence alignments. For example, performing a query on a database of nucleotide or polypeptide sequences can identify homologs of benzylisoquinoline alkaloid biosynthesis polypeptides. Sequence analysis can involve BLAST, Reciprocal BLAST, or PSI-BLAST analysis of non-redundant databases using, for example, the amino acid sequence of an NCS polypeptide as the reference sequence. An amino acid sequence is, in some instances, deduced from a corresponding nucleotide sequence. Those polypeptides in the database that have greater than 40% sequence identity are candidates for further evaluation for suitability as a benzylisoquinoline alkaloid biosynthesis polypeptide. An amino acid sequence similarity allows for conservative amino acid substitutions, such as substitution of one hydrophobic residue for another or substitution of one polar residue for another. If desired, manual inspection of such candidates can be carried out in order to narrow the number of candidates to be further evaluated. Manual inspection can be performed by selecting those candidates that appear to have domains present in benzylisoquinoline alkaloid biosynthesis polypeptides, e.g., conserved functional domains. In some embodiments, nucleic acids and polypeptides are identified from transcriptome data based on expression levels rather than by using BLAST analysis.
In some embodiments, the host cell already may include one or more homologs or functional homologs of these enzymes.
In some embodiments, a modified cell comprises, consists essentially of, or consists of a heterologous aromatic acetaldehyde synthase (AAS) enzyme or gene encoding said enzyme for the conversion of tyrosine to 4HPAA and/or the conversion of L-DOPA to DHPAA, where 4HPAA and DHPAA are key intermediates for the production of benzylisoquinoline alkaloid (BIA). In some embodiments, the AAS comprises one or more of P. somniferum tyrosine/L-DOPA decarboxylase 6 (TyDC6), P. somniferum tyrosine/L-DOPA decarboxylase 1 (TyDC1), P. somniferum tyrosine/L-DOPA decarboxylase 3 (TyDC3), Pseudomonas putida L-DOPA decarboxylase (PpDDC1), and/or other AAS genes from plants, insects or bacteria. The AAS enzyme may be bifunctional with a mixture of DDC, tyrosine decarboxylase, and AAS functions.
In some embodiments, the heterologous enzymes having AAS activity and/or heterologous genes encoding said enzymes comprise one or more of a tyrosine decarboxylase of P. somniferum (PsTyDC1); a PsTyDC1 variant having one or more substitutions selected from Leu205His, Tyr98Phe, Phe99Tyr, and Leu205Asn; a P. somniferum tyrosine decarboxylase 3 (PsTyDC3); a PsTyDC3 variant having one or more substitutions selected from Ile370Ser, Tyr100Phe, Phe101Tyr, and His203Asn; a P. somniferum tyrosine decarboxylase 6 (PsTyDC6); a Pseudomonas putida L-DOPA decarboxylase (PpDDC1); and a PpDDC1 variant having one or more substitutions selected from Tyr79Phe, Phe80Tyr, and His181Asn or His181Leu.
In some embodiments, the PsTyDC1, PsTyDC3, PsTyDC6, PpDDC1, and variants thereof comprise or consist of one or more sequences selected from SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 70, and SEQ ID NO: 71.
In some embodiments, the PsTyDC1, PsTyDC3, PsTyDC6, PpDDC1, and variants thereof are about 80%, about 85%, about 90%, about 95%, about 96%, about 97%, about 98%, about 99%, or about 100% identical to a sequence selected from SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 70, and SEQ ID NO: 71.
In other embodiments, a modified cell comprises a heterologous PsTyDC1 enzyme variant or polynucleotide encoding said enzyme comprising a Leu205His substitution (SEQ ID NO: 7); or the PsTyDC1 variant comprises a Tyr98Phe, a Phe99Tyr, and a Leu205Asn substitution (SEQ ID NO: 8). In some embodiments, the heterologous PsTyDC3 variant comprises a sequence selected from SEQ ID NO: 11 and SEQ ID NO: 12. In some embodiments, the heterologous PpDDC1 enzyme variant or polynucleotide encoding the same comprises a sequence selected from SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 70, and SEQ ID NO: 71.
In other embodiments, a modified cell comprises a PsTyDC1 variant consisting of the Leu205His substitution (SEQ ID NO: 7); or the PsTyDC1 variant consists of the Tyr98Phe, the Phe99Tyr, and the Leu205Asn substitution (SEQ ID NO: 8). In some embodiments, the heterologous PsTyDC3 variant consists of a sequence selected from SEQ ID NO: 11 and SEQ ID NO: 12. In some embodiments, the heterologous PpDDC1 enzyme variant or polynucleotide encoding the same consists of a sequence selected from SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 70, and SEQ ID NO: 71.
In some embodiments, the AAS function may also be engineered into a DDC or tyrosine decarboxylase sequence through Tyr98Phe, Phe99Tyr, and X205Asn substitutions (PsTyDC1 numbering), wherein X is a histidine or leucine amino acid.
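As a hedged illustration of how substitutions written in this notation map onto a sequence, the following sketch parses strings such as Tyr98Phe and applies them to a one-letter amino acid sequence using 1-based positions. The helper name and the four-residue toy sequence are hypothetical; the toy sequence is not PsTyDC1, and real use would require the numbering of the actual reference sequence.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def apply_substitutions(sequence: str, substitutions: list) -> str:
    """Apply substitutions such as 'Tyr98Phe' (1-based positions) to a sequence."""
    residues = list(sequence)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub}")
        old = THREE_TO_ONE[match.group(1)]
        position = int(match.group(2))
        new = THREE_TO_ONE[match.group(3)]
        # Check against the original sequence so the order of edits does not matter.
        if sequence[position - 1] != old:
            raise ValueError(f"{sub}: expected {old} at position {position}")
        residues[position - 1] = new
    return "".join(residues)


# Hypothetical toy sequence in which positions 2-4 stand in for 98, 99 and 205.
print(apply_substitutions("MYFL", ["Tyr2Phe", "Phe3Tyr", "Leu4Asn"]))
```

Checking the expected wild-type residue before each edit guards against applying a substitution to a sequence with different numbering.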
In addition to AAS, the modified cell also may include certain genes for BIA production including tyrosine 3-monooxygenase, L-DOPA decarboxylase (DDC), tyrosine aminotransferase (TAT), norcoclaurine synthase (NCS), norcoclaurine 6-O-methyltransferase (6OMT), coclaurine N-methyltransferase (CNMT), CYP450 reductase (CPR), N-methylcoclaurine 3-hydroxylase (NMCH) and 3-hydroxy-N-methylcoclaurine 4-O-methyltransferase (4OMT). In some embodiments, the tyrosine 3-monooxygenase includes the sequences SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 72, and SEQ ID NO: 73; L-DOPA decarboxylase (DDC) includes the sequences SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 70, and SEQ ID NO: 71; tyrosine aminotransferase includes the sequences SEQ ID NO: 74 and SEQ ID NO: 75; norcoclaurine synthase (NCS) includes the sequences SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 68, and SEQ ID NO: 69; norcoclaurine 6-O-methyltransferase (6OMT) includes the sequences SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, and SEQ ID NO: 34; coclaurine N-methyltransferase (CNMT) includes the sequences SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 57, and SEQ ID NO: 58; CYP450 reductase (CPR) includes the sequences SEQ ID NO: 41, SEQ ID NO: 42, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 78, and SEQ ID NO: 79; N-methylcoclaurine 3-hydroxylase (NMCH) includes the sequences SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 76, and SEQ ID NO: 77; and 3-hydroxy-N-methylcoclaurine 4-O-methyltransferase (4OMT) includes SEQ ID NO: 55 and SEQ ID NO: 56. In some embodiments, the enzymes or polynucleotides encoding the enzymes are homologs or functional homologs of the above identified sequences.
In some embodiments, a modified cell comprises, consists essentially of, or consists of a combination of heterologous PPDC, heterologous AAS enzymes, and/or heterologous genes encoding said enzymes as discussed herein.
In some embodiments, a modified cell comprises a biosynthetic pathway for the production of a benzylisoquinoline compound, wherein the production of the benzylisoquinoline compound comprises the formation of an intermediate compound (i.e., a precursor to a BIA compound that is not a starting substrate) selected from 4HPAA, norcoclaurine, or both. In some embodiments, the production of the benzylisoquinoline compound comprises the formation of norcoclaurine as an intermediate compound.
For example, a modified cell also may comprise one or more of 3,4-dihydroxyphenylacetaldehyde synthase (DHPAAS) (e.g., from Bombyx mori), tyrosine 3-monooxygenase (e.g., from E. coli, Beta vulgaris, P. somniferum or C. japonica), tyrosine aminotransferase (TAT) (e.g., from E. coli), L-DOPA decarboxylase (DDC) (e.g., from P. putida), cytochrome P450 (CYP450) (e.g., from P. somniferum or A. thaliana), CYP450 reductase (CPR) (e.g., from P. somniferum or A. thaliana), norcoclaurine synthase (NCS) (e.g., from P. somniferum, T. flavum, or C. japonica), norcoclaurine 6-O-methyltransferase (6OMT) (e.g., from P. somniferum, T. flavum, or C. japonica), coclaurine N-methyltransferase (CNMT) (e.g., from P. somniferum, T. flavum, or C. japonica), N-methylcoclaurine 3-hydroxylase (NMCH) (e.g., from P. somniferum or E. californica), and 3-hydroxy-N-methylcoclaurine 4-O-methyltransferase (4OMT) (e.g., from P. somniferum, T. flavum, E. californica, or C. japonica), L-DOPA decarboxylase (DDC or DODC) (e.g., from P. somniferum or C. japonica), salutaridine synthase (SAS) (e.g., from P. somniferum or C. japonica), salutaridine reductase (SalR) (e.g., from P. somniferum or C. japonica), salutaridinol-7-O-acetyltransferase (SalAT) (e.g., from P. somniferum or C. japonica), purine permease (PUP) (e.g., from P. somniferum, A. thaliana, or C. japonica), or any combination thereof, to facilitate conversion of a substrate to a target benzylisoquinoline compound. The host cell also may include functional homologs of the above listed enzymes.
In some embodiments, a modified cell also may comprise one or more enzymes found in Tables 2-6. The heterologous enzyme and polynucleotide sequences encoding the heterologous enzymes are as reported in various databases that are well known to those of ordinary skill in the art, such as GenBank, UniProt, and the NCBI database, using the provided accession numbers.
In other embodiments, certain enzyme combinations may be used to produce the benzylisoquinoline compound reticuline and/or N-methylcoclaurine using tyrosine or L-DOPA as a substrate. For example, a modified host cell may comprise any one of the following combinations of heterologous enzymes or polynucleotides encoding said heterologous enzymes: Cj6OMT (SEQ ID NO: 31; SEQ ID NO: 32), CjCNMT (SEQ ID NO: 57; SEQ ID NO: 58), Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), PpDDC (SEQ ID NO: 27; SEQ ID NO: 28), PsONCS3 (SEQ ID NO: 19; SEQ ID NO: 20), PsPDC1 (SEQ ID NO: 1; SEQ ID NO: 2); or Cj6OMT (SEQ ID NO: 31; SEQ ID NO: 32), CjCNMT (SEQ ID NO: 57; SEQ ID NO: 58), Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), PsNMCH (SEQ ID NO: 37; SEQ ID NO: 38), PsONCS3 (SEQ ID NO: 19; SEQ ID NO: 20), PsTyDC1 (SEQ ID NO: 5; SEQ ID NO: 6), PsPDC1 (SEQ ID NO: 1; SEQ ID NO: 2); or Cj6OMT (SEQ ID NO: 31; SEQ ID NO: 32), CjCNMT (SEQ ID NO: 57; SEQ ID NO: 58), Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), PpDDC (SEQ ID NO: 27; SEQ ID NO: 28), PsONCS3 (SEQ ID NO: 19; SEQ ID NO: 20), PsTyDC1 (SEQ ID NO: 5; SEQ ID NO: 6), PsPDC1 (SEQ ID NO: 1; SEQ ID NO: 2); or Cj6OMT (SEQ ID NO: 31; SEQ ID NO: 32), CjCNMT (SEQ ID NO: 57; SEQ ID NO: 58), Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), PpDDC-Tyr79Phe-Phe80Tyr-His181Asn (SEQ ID NO: 29; SEQ ID NO: 30), TfNCS (SEQ ID NO: 68; SEQ ID NO: 69); or Cj6OMT (SEQ ID NO: 31; SEQ ID NO: 32), CjCNMT (SEQ ID NO: 57; SEQ ID NO: 58), Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), PpDDC-Tyr79Phe-Phe80Tyr-His181Asn (SEQ ID NO: 29; SEQ ID NO: 30), PsONCS3 (SEQ ID NO: 19; SEQ ID NO: 20); or Cj6OMT (SEQ ID NO: 31; SEQ ID NO: 32), CjCNMT (SEQ ID NO: 57; SEQ ID NO: 58), Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), PsNMCH (SEQ ID NO: 37; SEQ ID NO: 38), CjNCS (SEQ ID NO: 21; SEQ ID NO: 22), PpDDC-His181Leu; or Cj6OMT (SEQ ID NO: 31; SEQ ID NO: 32), CjCNMT (SEQ ID NO: 57; SEQ ID NO: 58), Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), PsNMCH (SEQ ID NO: 37; SEQ ID NO: 38), CjNCS (SEQ ID NO: 21; SEQ ID NO: 22), PpDDC-Tyr79Phe-Phe80Tyr-His181Asn (SEQ ID NO: 29; SEQ ID NO: 30); or Cj6OMT (SEQ ID NO: 31; SEQ ID NO: 32), CjCNMT (SEQ ID NO: 57; SEQ ID NO: 58), Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), PpDDC (SEQ ID NO: 27; SEQ ID NO: 28), CjNCS (SEQ ID NO: 21; SEQ ID NO: 22), PpDDC-His181Leu, EcHpaB (SEQ ID NO: 49; SEQ ID NO: 50), EcHpaC (SEQ ID NO: 51; SEQ ID NO: 52); or Cj6OMT (SEQ ID NO: 31; SEQ ID NO: 32), CjCNMT (SEQ ID NO: 57; SEQ ID NO: 58), Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), PsNMCH (SEQ ID NO: 37; SEQ ID NO: 38), CjNCS (SEQ ID NO: 21; SEQ ID NO: 22), PpDDC-Tyr79Phe-Phe80Tyr-His181Asn (SEQ ID NO: 29; SEQ ID NO: 30), EcHpaB (SEQ ID NO: 49; SEQ ID NO: 50), EcHpaC (SEQ ID NO: 51; SEQ ID NO: 52); or Cj6OMT (SEQ ID NO: 31; SEQ ID NO: 32), CjCNMT (SEQ ID NO: 57; SEQ ID NO: 58), Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), PsNMCH (SEQ ID NO: 37; SEQ ID NO: 38), CjNCS (SEQ ID NO: 21; SEQ ID NO: 22), ARO10 (SEQ ID NO: 15; SEQ ID NO: 16); or Cj6OMT (SEQ ID NO: 31; SEQ ID NO: 32), CjCNMT (SEQ ID NO: 57; SEQ ID NO: 58), Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), PpDDC (SEQ ID NO: 27; SEQ ID NO: 28), CjNCS (SEQ ID NO: 21; SEQ ID NO: 22), ARO10 (SEQ ID NO: 15; SEQ ID NO: 16), EcHpaB (SEQ ID NO: 49; SEQ ID NO: 50), EcHpaC (SEQ ID NO: 51; SEQ ID NO: 52); or Cj6OMT (SEQ ID NO: 31; SEQ ID NO: 32), CjCNMT (SEQ ID NO: 57; SEQ ID NO: 58), Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), PpDDC-Tyr79Phe-Phe80Tyr-His181Asn (SEQ ID NO: 29; SEQ ID NO: 30), CjNCS (SEQ ID NO: 21; SEQ ID NO: 22), ARO10 (SEQ ID NO: 15; SEQ ID NO: 16), PsTyDC1 (SEQ ID NO: 5; SEQ ID NO: 6); or Cj6OMT (SEQ ID NO: 31; SEQ ID NO: 32), CjCNMT (SEQ ID NO: 57; SEQ ID NO: 58), Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), PpDDC-Tyr79Phe-Phe80Tyr-His181Asn (SEQ ID NO: 29; SEQ ID NO: 30), PsONCS3 (SEQ ID NO: 19; SEQ ID NO: 20), and PsPDC1 (SEQ ID NO: 1; SEQ ID NO: 2).
In other embodiments, certain enzyme combinations may be used to produce the benzylisoquinoline compound reticuline using norcoclaurine as a substrate. For example, a modified host cell may comprise Ps6OMT (SEQ ID NO: 33; SEQ ID NO: 34), PsCNMT (SEQ ID NO: 35; SEQ ID NO: 36), Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), PsNMCH (SEQ ID NO: 37; SEQ ID NO: 38), PsCPR-L (SEQ ID NO: 41; SEQ ID NO: 42; SEQ ID NO: 78; SEQ ID NO: 79); or Ps6OMT (SEQ ID NO: 33; SEQ ID NO: 34), PsCNMT (SEQ ID NO: 35; SEQ ID NO: 36), Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), PsNMCH-H203Y (SEQ ID NO: 39; SEQ ID NO: 40), PsCPR-L (SEQ ID NO: 41; SEQ ID NO: 42; SEQ ID NO: 78; SEQ ID NO: 79); or Ps6OMT (SEQ ID NO: 33; SEQ ID NO: 34), PsCNMT (SEQ ID NO: 35; SEQ ID NO: 36), Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), PsNMCH (SEQ ID NO: 37; SEQ ID NO: 38), AtATR2 (SEQ ID NO: 47; SEQ ID NO: 48); or Ps6OMT (SEQ ID NO: 33; SEQ ID NO: 34), PsCNMT (SEQ ID NO: 35; SEQ ID NO: 36), Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), PsNMCH-H203Y (SEQ ID NO: 39; SEQ ID NO: 40), AtATR2 (SEQ ID NO: 47; SEQ ID NO: 48); or Ps6OMT (SEQ ID NO: 33; SEQ ID NO: 34), PsCNMT, Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), EcNMCH (SEQ ID NO: 43; SEQ ID NO: 44), AtATR2 (SEQ ID NO: 47; SEQ ID NO: 48); or Ps6OMT (SEQ ID NO: 33; SEQ ID NO: 34), PsCNMT, Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), EcNMCH-Y202H (SEQ ID NO: 45; SEQ ID NO: 46), AtATR2 (SEQ ID NO: 47; SEQ ID NO: 48); or Ps6OMT (SEQ ID NO: 33; SEQ ID NO: 34), PsCNMT, Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), EcNMCH (SEQ ID NO: 43; SEQ ID NO: 44), PsCPR-L (SEQ ID NO: 41; SEQ ID NO: 42; SEQ ID NO: 78; SEQ ID NO: 79); or Ps6OMT (SEQ ID NO: 33; SEQ ID NO: 34), PsCNMT, Cj4OMT (SEQ ID NO: 55; SEQ ID NO: 56), EcNMCH-Y202H (SEQ ID NO: 45; SEQ ID NO: 46), PsCPR-L (SEQ ID NO: 41; SEQ ID NO: 42; SEQ ID NO: 78; SEQ ID NO: 79).
Exemplary benzylisoquinoline compounds that may be produced according to the methods and modified cells described herein may comprise one or more of norcoclaurine, coclaurine, norlaudanosoline, laudanosoline, N-methylnorcoclaurine, 3'-hydroxy-N-methylcoclaurine, reticuline, norreticuline, papaverine, laudanine, laudanosine, tetrahydropapaverine, 1,2-dihydropapaverine, and orientaline.
In some embodiments, the modified cell may use one or more of the following substrates to produce the benzylisoquinoline compound: L-tyrosine, L-dihydroxyphenylalanine (L-DOPA), dopamine, tyramine, 4-hydroxyphenylpyruvic acid (4HPP), 3,4-dihydroxyphenylacetaldehyde (3,4-HPAA), 4-hydroxyphenylacetaldehyde (4HPAA), or a combination thereof.
Expression of a wild-type or variant heterologous enzyme in a host cell is achieved by, for example, expressing a polynucleotide encoding the wild-type or variant heterologous enzyme. For example, cells may be transformed with an expression vector containing the polynucleotide encoding the heterologous enzyme. The expression vector is not particularly limited as long as it contains the gene of the present invention in an expressible state, and a vector suitable for each host can be used.
The expression vector of the present invention can be produced by constructing an expression cassette by inserting a transcription promoter upstream of the above-mentioned heterologous polynucleotide and, in some cases, a terminator downstream, and inserting this cassette into the expression vector. Alternatively, when a transcription promoter and/or terminator is already present in the expression vector, the promoter and/or terminator in the vector can be used to insert the heterologous polynucleotide between them without constructing an expression cassette.
To insert the above-mentioned heterologous polynucleotide into a vector, a method using a restriction enzyme, a method using topoisomerase, etc. can be used. Further, if necessary, at the time of insertion, an appropriate linker may be added. Ribosome binding sequences such as SD sequences and Kozak sequences are known as base sequences important for translation into amino acids, and these sequences can be inserted upstream of the gene. A part of the amino acid sequence encoded by the gene may be replaced with the insertion.
The vector used in the present invention is not particularly limited as long as it carries the gene of the present invention, and a vector suitable for each host can be used. Examples of the vector include plasmid DNA, bacteriophage DNA, retrotransposon DNA, artificial chromosome DNA and the like. The method of introducing the expression vector into the host is not particularly limited as long as it is a method suitable for the host. Examples of applicable methods include an electroporation method, a method using calcium ions, a spheroplast method, a lithium acetate method, a calcium phosphate method, and a lipofection method. Expression of the polynucleotide of interest in recombinant host cells can be quantified according to methods known to those skilled in the art. For example, it can be represented by the percentage of total cellular protein accounted for by the polypeptide encoded by the polynucleotide. In addition, expression can be confirmed, using a cell extract of the transformed cells, by Western blotting with an antibody capable of detecting the polypeptide encoded by the polynucleotide, or by real-time PCR with a primer that specifically detects the polynucleotide.
The host cell used in the present invention may be any host cell well known to those skilled in the art, and includes prokaryotic and eukaryotic cells such as bacterial cells, fungal cells, yeast cells, mammalian cells, insect cells or plant cells. Exemplary bacterial cells include Escherichia, Salmonella, Streptomyces, Pseudomonas, Staphylococcus, or any species of Bacillus. Examples include Escherichia coli, Lactococcus lactis, Bacillus subtilis, Bacillus cereus, Salmonella, and Pseudomonas. Preferably, the host cell is Escherichia coli.
In some embodiments, one or more heterologous enzymes are plant enzymes. For example, the heterologous enzyme can be from a plant that is from the genus Papaver. More specifically, Papaver plants that can be used include, but are not limited to, Papaver bracteatum, Papaver somniferum, Papaver cylindricum, Papaver decaisnei, Papaver fugax, Papaver nudicaule, Papaver oreophyllum, Papaver orientale, Papaver paeonifolium, Papaver persicum, Papaver pseudo-orientale, Papaver rhoeas, Papaver rhopalothece, Papaver armeniacum, Papaver setigerum, Papaver tauricolum, and Papaver triniaefolium. In some embodiments, the heterologous enzyme is from Papaver somniferum. In other embodiments, a heterologous enzyme may be from another plant such as Coptis japonica, Arabidopsis thaliana, Beta vulgaris, Nelumbo nucifera, Nelumbo lutea, Hydrastis canadensis, Berberis aquifolium, Berberis vulgaris, Coptis chinensis, Berberis aristata and Thalictrum flavum.
In some embodiments, the one or more heterologous enzymes are from one or more insect species such as Bombyx mori, Camponotus floridanus, Apis mellifera, Aedes aegypti, Drosophila melanogaster and the like. In other embodiments, the heterologous enzyme may be from yeast, fungi, or bacteria such as Saccharomyces sp. (Saccharomyces cerevisiae, Saccharomyces pombe), Pseudomonas putida, and Escherichia coli.
In other embodiments, the disclosure provides for an expression vector comprising one or more promoter sequences operably linked to at least one polynucleotide sequence encoding an enzyme as described herein. Any suitable promoters may be utilized in the subject host cells and methods. The promoters driving expression of the heterologous coding sequences may be constitutive promoters or inducible promoters, provided that the promoters are active in the host cells. The heterologous coding sequences may be expressed from their native promoters, or non-native promoters may be used. Such promoters may be low to high strength in the host in which they are used. Promoters may be regulated or constitutive. Promoters of interest include, but are not limited to, promoters of glycolytic genes such as the promoter of the B. subtilis tsr gene (encoding the promoter region of the fructose bisphosphate aldolase gene) or the promoter from yeast S. cerevisiae gene coding for glyceraldehyde 3-phosphate dehydrogenase (GPD, GAPDH, or TDH3), the ADH1 promoter of baker's yeast, the phosphate-starvation induced promoters such as the PHO5 promoter of yeast, the alkaline phosphatase promoter from B. licheniformis, yeast inducible promoters such as Gal1-10, Gal1, GalL, GalS, repressible promoter Met25, tetO, and constitutive promoters such as glyceraldehyde 3-phosphate dehydrogenase promoter (GPD), alcohol dehydrogenase promoter (ADH), translation-elongation factor-1-α promoter (TEF), cytochrome c-oxidase promoter (CYC1), MRP7 promoter, etc. In some embodiments, the promoter element comprises one or more of the T7 promoter sequence taatacgactcactatagggaga (SEQ ID NO: 63), lac promoter sequence tttacactttatgcttccggctcgtatgttg (SEQ ID NO: 64), araBAD promoter sequence gacgctttttatcgcaactctctactgt (SEQ ID NO: 65), Ptac promoter sequence ttgacaattaatcatcggctcgtataatg (SEQ ID NO: 66), and trc promoter sequence ttgacaattaatcatccggctcgtataat (SEQ ID NO: 67).
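As an illustrative sketch only, the promoter sequences listed above can be used to check in silico whether a construct carries one of them upstream of a coding sequence. The function names and the toy construct are hypothetical, only the forward strand is searched, and sequence position alone does not establish that a promoter is functionally operably linked.

```python
# Promoter sequences as listed above (SEQ ID NO: 63 to SEQ ID NO: 67).
PROMOTERS = {
    "T7": "taatacgactcactatagggaga",
    "lac": "tttacactttatgcttccggctcgtatgttg",
    "araBAD": "gacgctttttatcgcaactctctactgt",
    "Ptac": "ttgacaattaatcatcggctcgtataatg",
    "trc": "ttgacaattaatcatccggctcgtataat",
}


def find_promoters(construct: str) -> dict:
    """Map each promoter found on the forward strand to its 0-based start index."""
    seq = construct.lower()
    found = {}
    for name, motif in PROMOTERS.items():
        index = seq.find(motif)
        if index != -1:
            found[name] = index
    return found


def is_upstream(construct: str, promoter_name: str, cds: str) -> bool:
    """True if the named promoter ends at or before the start of the coding sequence."""
    seq = construct.lower()
    promoter_start = seq.find(PROMOTERS[promoter_name])
    cds_start = seq.find(cds.lower())
    if promoter_start == -1 or cds_start == -1:
        return False
    return promoter_start + len(PROMOTERS[promoter_name]) <= cds_start


# Hypothetical toy construct: spacer, T7 promoter, short linker, start of a CDS.
toy_cds = "atggctagcaaa"
toy_construct = "gg" + PROMOTERS["T7"] + "ccccgggg" + toy_cds
print(find_promoters(toy_construct), is_upstream(toy_construct, "T7", toy_cds))
```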
Additionally, any promoter/enhancer combination (as per the Eukaryotic Promoter Data Base EPDB) could also be used to drive expression of genes. Any convenient appropriate promoters may be selected for the host cell, e.g., E. coli. A person of ordinary skill in the art can also use promoter selection to optimize transcript, and hence, enzyme levels to maximize production while minimizing energy resources.
In some embodiments, the modified host cell medium may be sampled and monitored for the production of BIA precursors of interest. The BIA precursors may be observed and measured using any convenient methods. Methods of interest include, but are not limited to, LC-MS methods (e.g., as described herein) where a sample of interest is analyzed by comparison with a known amount of a standard compound. Identity may be confirmed, e.g., by m/z and MS/MS fragmentation patterns, and quantitation or measurement of the compound may be achieved via LC trace peaks of known retention time and/or EIC MS peak analysis by reference to corresponding LC-MS analysis of a known amount of a standard of the compound.
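Quantitation by reference to a known amount of a standard can be sketched as a single-point external calibration. The sketch assumes a linear detector response passing through the origin, which is an assumption not stated in the disclosure; the function name and the numerical values are hypothetical.

```python
def quantify_by_standard(sample_area: float, standard_area: float,
                         standard_conc: float) -> float:
    """Estimate analyte concentration from peak areas by single-point calibration.

    Assumes peak area is proportional to concentration (linear, zero intercept).
    The result has the same units as `standard_conc`.
    """
    if standard_area <= 0:
        raise ValueError("standard peak area must be positive")
    return sample_area / standard_area * standard_conc


# Hypothetical EIC peak areas; standard injected at 20.0 (arbitrary units).
print(quantify_by_standard(sample_area=5000.0, standard_area=10000.0,
                           standard_conc=20.0))
```

In practice a multi-point calibration curve would usually be preferred when the response is not known to be linear over the measured range.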
In some cases, the host cell is capable of producing an increased amount of BIA relative to a control host cell that lacks the one or more modifications (e.g., as described herein). In certain instances, the increased amount of BIA is about 10% or more relative to the control host cell, such as about 20% or more, about 30% or more, about 40% or more, about 50% or more, about 60% or more, about 80% or more, about 100% or more, 2-fold or more, 5-fold or more, or even 10-fold or more relative to the control host cell.
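The relationship between the percentage and fold thresholds recited above can be made explicit with a short sketch. The function names and titers are hypothetical; the only relationship relied on is that a fold change of f corresponds to a percent increase of 100 × (f − 1), so a 100% increase equals a 2-fold amount.

```python
def fold_change(control: float, modified: float) -> float:
    """Ratio of the modified-cell titer to the control titer."""
    if control <= 0:
        raise ValueError("control titer must be positive")
    return modified / control


def percent_increase(control: float, modified: float) -> float:
    """Increase of the modified-cell titer over the control, in percent."""
    if control <= 0:
        raise ValueError("control titer must be positive")
    return 100.0 * (modified - control) / control


# Hypothetical titers in the same units for both cells.
print(fold_change(10.0, 20.0), percent_increase(10.0, 20.0))
```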
Results and Discussion
Previous studies have successfully produced BIA in yeast and bacteria (Scheme 1). The bacterial studies have utilized bacterial monoamine oxidase (MAO) and insect DHPAAS for production of toxic 3,4-dihydroxyphenylacetaldehyde (DHPAA). However, the DHPAA-containing pathways have resulted in the loss of unstable catechol-containing intermediates. Other reports show that plants favor the 4-hydroxyphenylacetaldehyde (4HPAA) pathway to norcoclaurine (Scheme 1), which may be more stable due to lack of a catechol group in the early intermediates. Therefore, the application of a plant 4HPAA-containing pathway offers potential to address the loss of key intermediates in our previous BIA productions in Escherichia coli, and also to increase utilization of tyrosine. Furthermore, the combination of 4HPAA and DHPAA pathways may also improve the utilization of tyrosine and aryl acetaldehydes. Despite success with the 4HPAA pathway in yeast, and many discussions on the expected phenylpyruvate decarboxylase (PPDC) and aromatic acetaldehyde synthase (AAS) activities, no aromatic acetaldehyde producing enzyme has yet been characterized from the high alkaloid producer Papaver somniferum. Moreover, no specific tyrosine 3-monooxygenase sequence has been reported from P. somniferum and no characterized PPDC sequence has been reported from any higher plant.
To address this lack of available key enzymes, SVM-based machine learning is first applied to predict the missing link enzymes in plant alkaloid pathways shown in Scheme 1.
To prove the concept of rapid characterization of predicted enzymes, selected sequences are secondly cloned into E. coli expression systems and screened using LC-MS-, CE-MS- and GC-MS-based metabolomics approaches. Following this strategy, PsTyDC1 and PsPDC1 are confirmed to mediate AAS and PPDC activities: two key missing links in plant alkaloid pathways. These novel enzymes mediate new branches in the natural P. somniferum alkaloid pathway, and their application in E. coli results in the first complete plant BIA pathway within a microbial production system. Moreover, newly discovered PsTyDC1 and PsPDC1 contain unique active site features that have not been described before for AAS and PPDC activities, respectively. This helps establish new structure-based rules for selection and engineering of plant pyridoxal 5'-phosphate (PLP)- and thiamine pyrophosphate (TPP)-dependent decarboxylases, as well as P450 monooxygenases.
Scheme 1: Key enzymatic steps from tyrosine to benzylisoquinoline alkaloids are shown in this scheme. Steps that are unclear in P. somniferum include conversion of tyrosine to L-DOPA by tyrosine 3-monooxygenase, which may be a CYP450, conversion of tyrosine to 4HPAA by aromatic acetaldehyde synthase (AAS), and conversion of 4HPP to 4HPAA by phenylpyruvate decarboxylase (PPDC). These unclear P. somniferum enzymes are indicated by larger font. Reconstructed pathways to reticuline in yeast and E. coli are shown with a superscript 1 or a superscript 2, respectively. Metabolite abbreviations: 4HPP - 4-hydroxyphenylpyruvic acid, 4HPAA - 4-hydroxyphenylacetaldehyde, L-DOPA - 3,4-dihydroxy-L-phenylalanine, DHPAA - 3,4-dihydroxyphenylacetaldehyde, NMC - N-methylcoclaurine, 3HC - 3-hydroxycoclaurine, 3HNMC - 3-hydroxy-N-methylcoclaurine.
[Scheme 1: image, not reproduced]
Enzyme abbreviations: AAS - aromatic acetaldehyde synthase, DHPAAS - 3,4-dihydroxyphenylacetaldehyde synthase, PPDC - phenylpyruvate decarboxylase, TAT - L-tyrosine aminotransferase, TyDC - L-tyrosine decarboxylase, DDC - L-DOPA decarboxylase, CYP450 - cytochrome P450, CPR - CYP450 reductase, ARO10 - Saccharomyces cerevisiae transaminated amino acid decarboxylase, NCS - norcoclaurine synthase, 6OMT - norcoclaurine 6-O-methyltransferase, CNMT - coclaurine N-methyltransferase, NMCH - N-methylcoclaurine 3-hydroxylase, 4OMT - 3-hydroxy-N-methylcoclaurine 4-O-methyltransferase.
BIA production varies among approximately 50 combinations of newly discovered natural enzymes, as well as with homologous templates from bacteria and yeast. Switching of natural enzyme features back into microbial templates results in improved reticuline titers and the highest reported production of norcoclaurine and N-methylcoclaurine in E. coli. Synergistic combination of predicted missing link sequences together with homologous microbial templates affords 96.7 mg/L norcoclaurine, 71.8 mg/L N-methylcoclaurine (NMC) and 24.6 mg/L reticuline, without using any strain engineering. Dynamic metabolic profiling, with new mechanism-directed deuterium labeling patterns for aromatic compounds in addition to 13C-labeled alkaloid tracing methods, further confirms the various branches of flux from tyrosine to downstream alkaloids.
The current disclosure clarifies how machine learning can reveal homologous enzyme sequences with specialized functions. Previously, aromatic acetaldehyde synthase (AAS) was predicted with enzyme selection software such as M-path to improve production of valuable alkaloids. However, only the enzyme commission (EC) number could be predicted with M-path, and the actual selection of candidate sequences had to be performed by human intuition. This issue is addressed by developing a support vector machine (SVM) algorithm to automatically select specific enzyme sequences: an upgrade that enables computer-automated Design, Build, Test and Learn (DBTL) cycles.
To prove the concept of machine learning enzyme selection of specialized enzymes, conversion of tyrosine to benzylisoquinoline alkaloid (BIA) is selected as the target pathway for optimization (Scheme 1). BIAs are precursors to opioid analgesic medications that are currently mass-produced by industrially grown Papaver somniferum plants, which are a historical target for human-directed evolution of natural product production. While opioid misuse is a global problem, natural and semi-synthetic opioids derived from the BIA reticuline actually result in fewer deaths than less expensive and overly potent synthetic opioids (CDC Opioid Data Analysis and Resources). With diverse potential, natural BIAs have been shown to inhibit coronavirus, and the BIA norcoclaurine is a β2-adrenergic receptor agonist that is present in edible plants, medicinal herbs and sports supplements.
BIA production in Escherichia coli has utilized bacterial monoamine oxidase and insect 3,4-dihydroxyphenylacetaldehyde synthase (DHPAAS) to generate toxic 3,4-dihydroxyphenylacetaldehyde (DHPAA). However, the DHPAA-containing pathways result in rapid loss of unstable catechol-containing intermediates. Other reports show that plants use a 4-hydroxyphenylacetaldehyde (4HPAA) pathway to norcoclaurine (Scheme 1), which may be more stable due to lack of a catechol group in early intermediates. Therefore, plant 4HPAA pathways offer potential to prevent loss of BIA intermediates in E. coli. Furthermore, the combination of 4HPAA and DHPAA pathways may also improve utilization of tyrosine and aryl acetaldehydes. Despite success with the 4HPAA pathway in yeast and many discussions on the expected phenylpyruvate decarboxylase (PPDC, EC 4.1.1.43) and AAS (EC 4.1.1.107-109) activities in plants, no enzymes to produce the aryl acetaldehydes 4HPAA or DHPAA have been characterized from high alkaloid producing poppy plants. Moreover, no plant sequence annotated as phenylpyruvate decarboxylase can be found in public databases, and numerous P. somniferum cytochrome P450 (CYP450) monooxygenases (EC 1.14.14) require complex clarification. This serious limitation in known enzymes is addressed by applying machine learning to predict the essential missing links in plant alkaloid pathways shown in Scheme 1.
To automate the selection of sequences from over 100 candidates present throughout highly duplicated carboxy-lyase and oxidase gene families, refined SVM models are built from training sequences classified using structure-based rules. Then, to verify the machine learning prediction, approximately 50 strains expressing various combinations of candidate sequences and analogous templates are screened using liquid chromatography-mass spectrometry (LC-MS)-, capillary electrophoresis-MS (CE-MS)- and gas chromatography-MS (GC-MS)-based metabolomics. As a result, AAS and PPDC are identified as missing links that mediate uncharacterized branches of the Papaver somniferum alkaloid pathway. Synergistic combination of predicted enzymes together with homologous enzyme templates affords 356 µM norcoclaurine, 240 µM N-methylcoclaurine and 74.9 µM reticuline, without using any genome engineering. The alternative branches of flux from tyrosine to downstream alkaloids are confirmed using dynamic metabolic profiling with mechanism-directed deuterium labeling patterns.
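As a rough cross-check between mass-based and molar titers, the sketch below converts mg/L to micromolar. The molar masses are assumptions computed from the molecular formulas of norcoclaurine (C16H17NO3), N-methylcoclaurine (C18H21NO3) and reticuline (C19H23NO4) and are not given in the disclosure; the function name is hypothetical.

```python
# Assumed molar masses in g/mol, computed from molecular formulas.
MOLAR_MASS = {
    "norcoclaurine": 271.31,        # C16H17NO3
    "N-methylcoclaurine": 299.36,   # C18H21NO3
    "reticuline": 329.39,           # C19H23NO4
}


def mg_per_l_to_micromolar(mg_per_l: float, molar_mass: float) -> float:
    """Convert a mass concentration (mg/L) to a molar concentration (µM)."""
    # mg/L divided by g/mol gives mmol/L; multiply by 1000 for µmol/L.
    return mg_per_l / molar_mass * 1000.0


# Mass titers as reported in the disclosure (mg/L).
titers = {"norcoclaurine": 96.7, "N-methylcoclaurine": 71.8, "reticuline": 24.6}
for compound, mg_per_l in titers.items():
    micromolar = mg_per_l_to_micromolar(mg_per_l, MOLAR_MASS[compound])
    print(f"{compound}: {micromolar:.1f} µM")
```

Under these assumed molar masses the conversion gives roughly 356, 240 and 75 µM, so small differences from reported figures would reflect rounding or the exact molar mass used.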
Prediction and discovery of P. somniferum aromatic acetaldehyde synthase.
DHPAA and norlaudanosoline (also referred to as tetrahydropapaveroline or THP) are more easily oxidized and more toxic than their corresponding 4-hydroxyphenyl analogues. Therefore, missing link enzymes to 4HPAA and norcoclaurine are explored to test our machine learning enzyme selection models (FIG. 2). Our previous M-path analysis identified 4-hydroxyphenylacetaldehyde synthase (4HPAAS, EC 4.1.1.108) to mediate 4HPAA production from tyrosine; however, specific 4HPAAS sequences are incompletely annotated throughout most databases. In this study, the term AAS is used to cover the plant-type AAS enzymes 4HPAAS and phenylacetaldehyde synthase (PAAS, EC 4.1.1.109) as well as insect 3,4-dihydroxyphenylacetaldehyde synthase (DHPAAS, EC 4.1.1.107), because substrate specificities are often mixed throughout these groups.
Unclear variations within the plant-type AAS group, which may act upon a wide range of substrates including phenylalanine, tyrosine, 3,4-dihydroxy-L-phenylalanine (L-DOPA), tryptophan and histidine, further complicate the selection of a correct sequence based on phylogenetic and structural analyses alone. Accordingly, no AAS enzyme from P. somniferum has been clearly established. To overcome this challenge in prediction, our SVM-based algorithm is first applied to select AAS from P. somniferum homologs annotated as tyrosine/DOPA decarboxylase (TyDC) (FIG. 2a and b, and Table 1).
Table 1 | High-dimensional SVM-based selection of P. somniferum sequences from AAS and AAAD prediction models
Figure imgf000036_0001
Figure imgf000037_0001
P. somniferum TyDC4 has a premature stop codon. "Positive P" is positive probability and "Negative P" is negative probability. Decision scores represent the distance from the SVM prediction boundary.
Separate SVM models for aromatic amino acid decarboxylase (AAAD) and AAS were trained using sequences classified as described in the methods (Table 1). According to database annotations and previous reports, P. somniferum TyDC (PsTyDC) proteins are expected to catalyze the decarboxylation of tyrosine to form tyramine, and possibly the conversion of L-DOPA to dopamine. In contrast, SVM decision scores show that while most of the 8 full-length PsTyDC sequences have high potential for AAAD activity, PsTyDC1-8 also appear in AAS prediction space. Higher positive SVM decision scores indicate sequences that are further from the SVM prediction boundary, deeper within the positive prediction group.
PsTyDC1 contains the unique active site residue Leu205, further suggesting atypical activity of this test sequence, and PsTyDC1 is therefore selected first to explore demonstration-level (FIG. 1c) prediction of AAS. In accordance with the SVM prediction, expression of wild-type PsTyDC1 in E. coli promotes in vivo production of norcoclaurine from tyrosine and dopamine (FIG. 2c). As a positive AAS control, PsTyDC1-Tyr98Phe-Phe99Tyr-Leu205Asn, with engineered active site residues transplanted from insect DHPAAS, produces results similar to those of wild-type PsTyDC1. After substitution of PsTyDC1-L205 with a histidine residue found in typical AAAD, the decarboxylation product tyramine increases dramatically (FIG. 2c and FIG. 9). Production of norcoclaurine is further confirmed in strains expressing PsTyDC1 with additional variations in the alkaloid pathway (FIG. 9, FIG. 10, and Table 2).
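Variant names such as Tyr98Phe-Phe99Tyr-Leu205Asn encode a list of point substitutions (wild-type residue, 1-based position, new residue). The following is a minimal sketch of how such a label could be parsed and applied to a sequence while checking that the expected wild-type residue is present. The helper names and the five-residue toy peptide are hypothetical; no real PsTyDC sequence is used.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_substitutions(label):
    """Split 'Tyr98Phe-Phe99Tyr' into (wild_type, position, new) tuples."""
    substitutions = []
    for part in label.split("-"):
        match = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", part)
        if match is None:
            raise ValueError(f"unrecognised substitution: {part!r}")
        wild_type, position, new = match.groups()
        substitutions.append(
            (THREE_TO_ONE[wild_type], int(position), THREE_TO_ONE[new])
        )
    return substitutions


def apply_substitutions(sequence, label):
    """Return the variant sequence; positions in the label are 1-based."""
    residues = list(sequence)
    for wild_type, position, new in parse_substitutions(label):
        found = residues[position - 1]
        if found != wild_type:
            raise ValueError(
                f"expected {wild_type} at position {position}, found {found}"
            )
        residues[position - 1] = new
    return "".join(residues)


# Hypothetical toy peptide used only to exercise the helpers.
toy_peptide = "MYFKL"
toy_variant = apply_substitutions(toy_peptide, "Tyr2Phe-Phe3Tyr-Leu5Asn")
```

The wild-type check is the useful part in practice: it catches numbering offsets (for example, a tagged or truncated construct) before a wrong residue is silently replaced.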
While norcoclaurine is detected in vivo with expression of PsTyDC1, in vitro production of unstable 4HPAA by PsTyDC1 could not be detected, possibly due to low activity of PsTyDC1. Therefore, the SVM models are investigated further to select a better AAS candidate that might not be suggested by structural analysis. Despite containing the AAAD-like active site residues Tyr98, Phe99, His205, Tyr350 and Ser372, PsTyDC6 scores highest in the AAS prediction model (FIG. 2b and Table 1). Therefore, PsTyDC6 is further selected for in vitro characterization. Interestingly, PsTyDC6 and PsTyDC1 share over 98% sequence identity, which is the highest sequence identity among the entire PsTyDC family, and PsTyDC6 is accordingly annotated as "tyrosine/DOPA decarboxylase 1-like".
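Sequence identity figures such as the "over 98%" quoted above depend on the alignment and on which positions are counted. The sketch below shows one common convention, identical residues divided by aligned non-gap columns, for two sequences that have already been aligned. The alignment method and identity definition used in the source are not stated, so this convention is an assumption, and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical pre-aligned toy sequences ('-' marks a gap).
toy_a = "MKTAYHL-GS"
toy_b = "MKTAFHLQGS"
toy_identity = percent_identity(toy_a, toy_b)
```

Other conventions divide by the full alignment length or by the shorter sequence length, which gives slightly different percentages for the same pair.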
In agreement with the high AAS decision score, PsTyDC6 exhibits AAS activity in the presence of tyrosine and L-DOPA (FIG. 3 and Scheme 4), thereby demonstrating discovery-level (FIG. 1c) prediction of a novel plant AAS enzyme. Here, the in vitro AAS activity of PsTyDC6 is indicated by detection of 4HPAA by GC-MS and DHPAA by LC-MS, as well as by production of H2O2 in a peroxidase-based fluorescent assay (FIG. 3). Detection of H2O2 production was abolished by heat inactivation of PsTyDC6 (data not shown). PsTyDC6 also exhibits bifunctional AAAD activity, which is indicated by the LC-MS detection of tyramine and dopamine as products of tyrosine and L-DOPA, respectively, and also by production of downstream norlaudanosoline from L-DOPA and norcoclaurine from L-DOPA and tyrosine (FIG. 3b).
Table 2 | Aromatic producing strains of this study
Strain | Genotype | Condition | Products
BL21(DE3) | F- ompT gal dcm lon hsdSB(rB-mB-) λ(DE3 [lacI lacUV5-T7p07 ind1 sam7 nin5]) [malB+]K-12(λS) | |
BL21-AI | F- ompT gal dcm lon hsdSB(rB-mB-) [malB+]K-12(λS) araB::T7RNAP-tetA | |
Rosetta-gami 2(DE3) | Δ(ara-leu)7697 ΔlacX74 ΔphoA PvuII phoR araD139 ahpC galE galK rpsL (DE3) F'[lac+ lacIq pro] gor522::Tn10 trxB pRARE2 (CamR, StrR, TetR) | |
T1-01-DE3 | pCDFD-TfNCS-PsTyDC1 | LB-Tyr+DA | NC
T1-02-DE3 | pCDFD-TfNCS-PsTyDC1-S | LB-Tyr+DA | Tyramine
T1-03-DE3 | pCDFD-TfNCS-PsTyDC1-T | LB-Tyr+DA | NC
T1-04-DE3 | pCDFD-PsONCS3-PsTyDC1 | LB-Tyr+DA | trace NC
T1-05-DE3 | pCDFD-PsONCS3-PsTyDC1-S | LB-Tyr+DA | Tyramine
T1-06-DE3 | pCDFD-PsONCS3-PsTyDC1-T | LB-Tyr+DA | trace NC
T1-07-DE3 | pACYC-3CjMTs-PsNMCH, pCDFD-TfNCS-PsTyDC1 | M9-Tyr+DA | Reticuline
T1-08-DE3 | pACYC-3CjMTs-PsNMCH, pCDFD-TfNCS-PsTyDC1-S | M9-Tyr+DA | Reticuline
T1-09-DE3 | pACYC-3CjMTs-PsNMCH, pCDFD-TfNCS-PsTyDC1-T | M9-Tyr+DA | Reticuline
T1-10-DE3 | pACYC-3CjMTs-PpDDC, pCDFD-PsONCS3-PsTyDC1 | TB-DOPA*; TB-Tyr+DOPA | 1 µM Reticuline; NC, Reticuline
Figure imgf000038_0001
Reticuline
N1-01-DE3 | pET23a-3PsMTs, pCOLAD-PsNMCH-PsCPR-L | TB-NC | 27.8 µM Reticuline
N1-02-DE3 | pET23a-3PsMTs, pCOLAD-PsNMCH-H203Y-PsCPR-L | TB-NC | 15.8 µM Reticuline
N1-03-DE3 | pET23a-3PsMTs, pCOLAD-PsNMCH-AtATR2 | TB-NC | 15.9 µM Reticuline
N1-04-DE3 | pET23a-3PsMTs, pCOLAD-PsNMCH-H203Y-AtATR2 | TB-NC | 8.0 µM Reticuline
N2-01-DE3 | pET23a-3PsMTs, pCOLAD-EcNMCH-AtATR2 | TB-NC | 4.9 µM Reticuline
N2-02-DE3 | pET23a-3PsMTs, pCOLAD-EcNMCH-Y202H-AtATR2 | TB-NC | 8.4 µM Reticuline
N2-03-DE3 | pET23a-3PsMTs, pCOLAD-EcNMCH-PsCPR-L | TB-NC | 3.7 µM Reticuline
N2-04-DE3 | pET23a-3PsMTs, pCOLAD-EcNMCH-Y202H-PsCPR-L | TB-NC | 3.6 µM Reticuline
DS-01-DE3 | pACYC-3CjMTs-PpDDC-S, pCDFD-PsONCS3 | M9-DOPA |
DT-01-DE3 | pACYC-3CjMTs-PpDDC-T, pCDFD-TfNCS | M9-DOPA | 16.8 µM NL, 2.5 µM Reticuline
DT-02-DE3 | pACYC-3CjMTs-PpDDC-T, pCDFD-PsONCS3 | M9-DOPA | 34 µM NL, 6.4 µM Reticuline
Figure imgf000039_0001
Reticuline
A1-02-DE3 | pACYC-3CjMTs-PpDDC, pCDFD-CjNCS-ARO10 | |
A1-03-DE3 | pACYC-3CjMTs-PpDDC, pCDFD-CjNCS-ARO10, pET23a-EcHpaBC | M9-Tyr*; TB-Tyr* | 4HPP, DA, NC; NL, NMC
DS-04-DE3 | pACYC-3CjMTs-PpDDC, pCDFD-CjNCS-PpDDC-S, pET23a-EcHpaBC, pE-DHPAAS | |
A1-05-DE3 | pACYC-3CjMTs-PsNMCH, pCDFD-CjNCS-ARO10, pTrc-DHPAAS-T | TB-Tyr+DOPA | DA
A1-06-AI | pACYC-3CjMTs-PpDDC-T, pCDFD-CjNCS-ARO10, pTXB1-PsTyDC1 | TB-Tyr+DOPA; TB-Tyr+DA | 74.9 µM Reticuline; 112 µM NMC
P1-05-AI | pACYC-3CjMTs-PsNMCH, pCDFD-CjNCS-PpDDC-T, pBAD-PsPDC1 | TB-Tyr+DOPA |
P1-06-DE3 | pACYC-3CjMTs-PpDDC-T, pCDFD-PsONCS3, pBAD-PsPDC1 | TB-Tyr+DOPA | 3.7 µM Reticuline
P1-07-AI | pACYC-3CjMTs-PpDDC-T, pCDFD-PsONCS3, pBAD-PsPDC1 | TB-Tyr+DOPA | 61.8 µM Reticuline
Strains with names ending with "DE3" are derived from BL21(DE3), strains ending with "AI" are derived from BL21-AI, and strains ending with "ROS" are derived from Rosetta-gami 2. Plasmid details are given in Table 7. The last two columns list successfully tested in vivo conditions (growth medium and added substrate) and produced BIAs or BIA precursors. Concentrations of extracted N-methylcoclaurine (NMC) and reticuline per culture volume are listed for A1-01-DE3, A1-06-AI, P1-06-DE3 and P1-07-AI; all other listed concentrations represent titers in filtered culture medium. Only product titers quantified at or above 1 µM are listed. Matched substrates and corresponding products are indicated by bold font. Substrates marked with * include isotopes (tyrosine-13C, tyrosine-d4, L-DOPA-d3 or dopamine-d2). Abbreviations: S - single variant, D - double variant, T - triple variant, Q - quadruple variant, PsONCS3 - P. somniferum multi-domain NCS, CjNCS - Coptis japonica NCS, 3CjMTs - C. japonica 6OMT, CNMT and 4'OMT, 3PsMTs - P. somniferum 6OMT, CNMT and 4'OMT, PsPDC - P. somniferum pyruvate decarboxylase, Ps2HCLL - P. somniferum 2-hydroxyacyl-CoA ligase-like, PpDDC - P. putida L-DOPA decarboxylase, PsNMCH - P. somniferum N-methylcoclaurine 3'-hydroxylase, PsCPR-L - P. somniferum CYP450 reductase-like, AtATR2 - A. thaliana CYP450 reductase 2, EcHpaBC - E. coli 4-hydroxyphenylacetate 3-monooxygenase complex, ARO10 - S. cerevisiae phenylpyruvate decarboxylase, DHPAAS - Bombyx mori 3,4-dihydroxyphenylacetaldehyde synthase, Tyr - tyrosine, 4HPP - 4-hydroxyphenylpyruvate, DA - dopamine, DOPA - 3,4-dihydroxy-L-phenylalanine, NC - norcoclaurine, NL - norlaudanosoline, NMC - N-methylcoclaurine.
P. somniferum PDC1 decarboxylates 4-hydroxyphenylpyruvate in an alternative 4HPAA bypass pathway.
Phenylpyruvate decarboxylase (PPDC) is an alternative to AAS for production of the aryl acetaldehyde intermediates 4HPAA and DHPAA (FIG. 4). Previous reports hypothesize that P. somniferum should contain PPDC with specificity towards 4-hydroxyphenylpyruvate (4HPP); however, no plant protein accessions are found with the annotation of phenylpyruvate decarboxylase. In comparison to the known enzymes with PPDC activity, including Azospirillum brasilense ipdC, Lactococcus lactis KdcA, and yeast ARO10, the PsPDC1 active site more closely resembles that of typical pyruvate decarboxylases. Yet, in SVM prediction models constructed according to the methods section, P. somniferum PDC1 (PsPDC1) appears in PPDC prediction space (FIG. 4b and Table 3). Two additional test candidates, PsPDC2 and a 2-hydroxyacyl-CoA ligase-like protein, score lower for PPDC prediction and result in lower in vivo production of downstream 4HPP decarboxylase products in comparison to that of PsPDC1 (FIG. 4b-c, and FIG. 11). The PPDC prediction model also suggests that truncated PsPDC1 isoform X1 (TrcPsPDC1-IX1) is a strong PPDC candidate sequence (FIG. 4b, Table 7).
Table 3 | High-dimensional SVM-based prediction of P. somniferum sequences with potential PPDC activity
Figure imgf000041_0001
Figure imgf000041_0002
Figure imgf000042_0002
The combined PPDC and PDC model is trained with positive sequences that include typical PDC sequences. The PPDC prediction model is trained with positive training sequences annotated as phenylpyruvate decarboxylase and indolepyruvate decarboxylase, plus rose PPDC (BAU70033.1) and 19 sequences phylogenetically related to rose PPDC. All "C5167" sequences are annotated as "hypothetical protein"; **PsPDC1 = XP_026414621.1/RZC79432.1; ***PsPDC1-like = XP_026411619.1/RZC72691.1, XP_026392057.1, XP_026379022.1, XP_026379014.1, XP_026458198.1, XP_026425556.1/RZC84511.1, XP_026412580.1, XP_026441200.1; C5167_015702 - 91.09% to PsPDC1, C5167_004675 - 90.28% to PsPDC1; C5167_008924 - 97.84% to PsPDC1-like (XP_026392057.1); C5167_039607 - 98.17% to PsPDC1-like (XP_026412580.1). RZC88315.1 is annotated as C5167_016119, is 100% identical to PsPDC1 isoform X1 with a 27 residue N-terminal truncation, and is herein referred to as truncated PsPDC1 isoform X1 (TrcPsPDC1-IX1).
In vivo screenings with PsPDC1 reveal the alternative alkaloid route through 4HPP, and this PPDC bypass is distinct from the direct aromatic amino acid branch mediated by PsTyDC1 (Scheme 2). Application of PsPDC1 for conversion of tyrosine through the 4HPP- and 4HPAA-containing pathway improves norcoclaurine titers to the >10 µM range (FIG. 4e) as estimated by GC-MS, compared with the 100-200 nM range of PsTyDC1 as estimated by LC-MS (FIG. 2c and FIG. 11).
Scheme 2. In vivo PPDC pathway with PsPDC1:
Figure imgf000042_0001
Tyrosol 4HPAA Norcoclaurine
Automatic selection of paired NMCH and CPR sequences extends the 4HPAA pathway
After constructing the 4HPAA pathway to norcoclaurine, P. somniferum cytochrome P450 (CYP450) homologs of NMCH are next considered to extend this pathway from N-methylcoclaurine to reticuline (FIG. 5). Currently, Saccharomyces cerevisiae benzylisoquinoline alkaloid productions utilize characterized Eschscholzia californica NMCH (EcNMCH) for conversion of N-methylcoclaurine to 3'-hydroxy-N-methylcoclaurine (3HNMC). There are several promising P. somniferum CYP450 sequences annotated as NMCH based on characterizations in plants. However, the presence of many additional CYP450 homologs in the P. somniferum genome complicates the selection of the best candidate sequence. To automate the selection of optimal NMCH and CPR sequences, an SVM model was trained using positive training vectors derived from plant CYP80B sequences annotated as "N-methylcoclaurine hydroxylase". A total of 100 P. somniferum CYP450 sequences were then tested against this model to assist the selection of an optimal candidate (FIG. 5a, Table 4). As a result of this prediction (FIG. 1c), PsNMCH isoform 1 (PsNMCH-I1) scored high against the model and was selected.
Table 4 | High-dimensional SVM-based prediction of plant CYP450 sequences with potential NMCH activity
Figure imgf000043_0001
G8HL XP_026411431.1 0 0.05814 0.94186 -0.50609
F3ML XP_026427767.1 0 0.08272 0.91728 -0.42565
F3ML XP_026418653.1 0 0.00994 0.99006 -0.84420
CYP736A12-like XP_026430405.1 1 0.80286 0.19714 0.29765
CYP736A12-like XP_026437612.1 1 0.85694 0.14306 0.37074
CYP76A2-like XP_026437513.1 0 0.00554 0.99446 -0.95524
CYP76A1-like XP_026453156.1 1 0.85751 0.14249 0.37162
CYP76A1-like XP_026451658.1 0 0.44409 0.55591 -0.01159
G8HL XP_026380438.1 0 0.03817 0.96183 -0.58759
G8HL XP_026380437.1 0 0.03139 0.96861 -0.62511
G8HL XP_026380436.1 0 0.02108 0.97892 -0.70119
C5167_048523, partial RZC73046.1 0 0.35366 0.64634 -0.08523
CYP736A12-like XP_026459019.1 0 0.10446 0.89554 -0.37680
ECODL XP_026408386.1 0 0.06783 0.93217 -0.46638
G8HL XP_026396971.1 0 0.20264 0.79736 -0.22896
CYP736A12-like XP_026388372.1 1 0.63926 0.36074 0.13721
CYP76A2-like XP_026451160.1 0 0.02247 0.97753 -0.68904
FSL isoform X1 XP_026380444.1 0 0.01065 0.98935 -0.83109
CYP76A1-like XP_026399590.1 0 0.18332 0.81668 -0.25253
C5167_033939 RZC70799.1 0 0.12354 0.87646 -0.34086
CYP450 AFK73714.1 0 0.19020 0.80980 -0.24393
G8HL XP_026393993.1 0 0.25886 0.74114 -0.16860
C5167_013475 RZC54568.1 0 0.05053 0.94947 -0.53339
CYP76C4-like XP_026430487.1 0 0.03773 0.96227 -0.58982
CYP76C4-like XP_026430486.1 0 0.02641 0.97359 -0.65815
CYP76A2-like XP_026429624.1 0 0.20841 0.79159 -0.22226
CYP736A12-like XP_026460423.1 0 0.33671 0.66329 -0.10030
C5167_044785 RZC90154.1 1 0.77720 0.22280 0.26828
CYP71A9-like XP_026400228.1 0 0.10221 0.89779 -0.38141
CYP76C4-like XP_026458250.1 0 0.01934 0.98066 -0.71757
C5167_015732 RZC56884.1 0 0.02417 0.97583 -0.67510
C5167_019753 RZC51326.1 1 0.80985 0.19015 0.30613
CYP736A12-like XP_026437453.1 1 0.78432 0.21568 0.27617
CYP71A9-like XP_026444276.1 0 0.02361 0.97639 -0.67957
C5167_027648 RZC91585.1 0 0.04574 0.95426 -0.55271
CYP71A9-like XP_026378190.1 0 0.04165 0.95835 -0.57079
CYP736A12-like XP_026388374.1 0 0.03185 0.96815 -0.62235
CYP71D8-like XP_026451198.1 1 0.58274 0.41726 0.09419
CYP71A1-like XP_026447646.1 0 0.08269 0.91731 -0.42572
CYP71A1-like isoform X1 XP_026380212.1 0 0.13900 0.86100 -0.31510
CYP736A12-like XP_026460422.1 0 0.02623 0.97377 -0.65945
C5167_026541 RZC85882.1 0 0.18104 0.81896 -0.25543
C5167_005175 RZC57870.1 0 0.00000 1.00000 -4.11315
CYP71D8-like XP_026392887.1 1 0.88125 0.11875 0.41133
C5167_009556 RZC65863.1 1 0.57069 0.42931 0.08497
CYP450 AFK73720.1 0 0.11183 0.88817 -0.36229
CYP71A1-like XP_026377587.1 0 0.27175 0.72825 -0.15604
CYP76A2-like XP_026443290.1 0 0.03203 0.96797 -0.62125
Figure imgf000045_0001
The upper 4 highlighted sequences are selected and tested NMCH sequences from Eschscholzia californica and P. somniferum. All other sequences are from P. somniferum. XP_026418925.1 is annotated as "(S)-N-methylcoclaurine 3'-hydroxylase isozyme 1", CYP80B1 and CYP80B3 in public databases. EcNMCH-Y202H contains an artificial binding pocket substitution to more closely resemble PsNMCH. PsNMCH-H203Y contains an artificial binding pocket substitution to more closely resemble EcNMCH.
Abbreviations: NMCH - N-methylcoclaurine hydroxylase, NMCH-I1 - NMCH isozyme 1, G8HL - geraniol 8-hydroxylase-like, F3ML - flavonoid 3'-monooxygenase-like, DHP6AML - 3,9-dihydroxypterocarpan 6A-monooxygenase-like, ECODL - 7-ethoxycoumarin O-deethylase-like, FSL - ferruginol synthase-like, PSOL - premnaspirodiene oxygenase-like, F35H1L - flavonoid 3',5'-hydroxylase 1-like. All "C5167" sequences are annotated as "hypothetical protein".
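The relationship between the decision score and the positive probability columns of Table 4 can be checked numerically. The sketch below fits a logistic curve through two rows of the table and uses it to predict the positive probability of a third row from its decision score alone. That the probabilities follow a logistic (Platt-type) function of the score is an observation about the tabulated numbers, not a statement from the source about how they were computed; the variable names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


# (decision score, positive P) pairs transcribed from Table 4.
row_a = (0.29765, 0.80286)   # CYP736A12-like XP_026430405.1
row_b = (0.37074, 0.85694)   # CYP736A12-like XP_026437612.1
row_c = (-0.01159, 0.44409)  # XP_026451658.1, held out as a check

# Assume logit(P) is linear in the decision score and solve for the line.
slope = (logit(row_b[1]) - logit(row_a[1])) / (row_b[0] - row_a[0])
intercept = logit(row_a[1]) - slope * row_a[0]


def positive_probability(score):
    """Positive probability implied by the fitted logistic curve."""
    return 1.0 / (1.0 + math.exp(-(slope * score + intercept)))


predicted_c = positive_probability(row_c[0])
```

Under this fit the held-out row is reproduced to within about 0.001 of its tabulated value, and the curve crosses 0.5 slightly above a score of zero, which is why a few rows can have a small positive score but a positive probability below 0.5.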
A CPR redox partner for PsNMCH was also selected using the same workflow. While a CPR sequence has been characterized from P. somniferum, the referenced sequence AAC05021.1 is annotated as "NADPH:ferrihemoprotein oxidoreductase", which may confuse the selection of this sequence as CPR. Moreover, there are at least 8 other unique P. somniferum sequences with high CPR homology that have not been characterized. After testing the 8 additional P. somniferum candidates against the CPR SVM model, XP_026404029.1 is selected as a high-scoring sequence (FIG. 5b and Table 5) and observed to exhibit CPR activity (FIG. 5c). This CPR sequence is annotated as "NADPH--cytochrome P450 reductase-like", and accordingly it is referred to as PsCPR-L in this manuscript.
Table 5 | High-dimensional SVM-based prediction of P. somniferum CPR sequences
Figure imgf000046_0001
AAC05021.1 (NADPH:ferrihemoprotein oxidoreductase) was characterized as a cytochrome P450 reductase (CPR). XP_026404029.1 (NADPH--cytochrome P450 reductase-like) was selected and is referred to as PsCPR-L in the manuscript. High-scoring sequence XP_026457702 is 99% identical to selected sequence XP_026404029.1.
NMCH activity is evaluated by converting norcoclaurine to stable reticuline using NMCH and CPR variants expressed together with norcoclaurine 6-O-methyltransferase (6OMT), coclaurine N-methyltransferase (CNMT) and 3'-hydroxy-N-methylcoclaurine 4'-O-methyltransferase (4'OMT) (FIG. 5c and Table 2). N-methylcoclaurine accumulates much more than other intermediates in this system, and therefore stable reticuline titers should reflect the activity of the NMCH bottleneck. In this system, PsNMCH-I1 affords higher amounts of reticuline than EcNMCH when paired with either PsCPR-L or AtATR2 (FIG. 5c). PsNMCH-I1 pairs best with PsCPR-L from the same species, resulting in the highest amount of reticuline. On the other hand, reticuline production with EcNMCH is best with AtATR2 pairing, with no improvement from PsCPR-L pairing.
Just one residue difference is observed when comparing the binding pockets of PsNMCH and EcNMCH: PsNMCH-His203 versus EcNMCH-Tyr202. SVM prediction of the PsNMCH-His203Tyr and EcNMCH-Tyr202His sequences results in lower and higher SVM scores, respectively (Table 4), indicating that the SVM model is able to identify this key residue as an important feature. Consistent with this prediction, transplantation of EcNMCH-Tyr202 into engineered PsNMCH-His203Tyr results in lower reticuline, and transplantation of PsNMCH-His203 into engineered EcNMCH-Tyr202His results in higher conversion of norcoclaurine to reticuline when paired with AtATR2. The improvement in reticuline with EcNMCH-Tyr202His could be replicated in a second independent test of the same strains (data not shown).
Scheme 3:
Figure imgf000047_0001
To further clarify the important tyrosine 3-monooxygenase missing link in P. somniferum, the candidate CYP450 monooxygenase sequences are also explored as potential tyrosine 3-monooxygenase templates (Table 6). Here, the candidate sequences are tested against a plant CYP76AD SVM model and a combined SVM model trained with plant CYP76AD, CYP98A3 and CYP199A2 sequences that hydroxylate tyrosine and the structurally similar compound coumaric acid. CYP98A2-like (XP_026403623.1), geraniol 8-hydroxylase-like (XP_026409442.1) and flavonoid 3',5'-hydroxylase 1-like (XP_026378021.1) sequences appear as prime targets with relatively high scores in the positive prediction space of both high-dimensional models of Table 6.
Table 6 | High-dimensional SVM-based prediction of P. somniferum CYP450 sequences with potential tyrosine 3-monooxygenase activity
Figure imgf000047_0002
G8HL XP_026380179.1 0.91344 0.08656 0.72276 0.49424 0.50576 -0.06468
C5167_004855 RZC57552.1 0.96449 0.03551 0.88480 0.00090 0.99910 -1.18485
G8HL XP_026409440.1 0.87491 0.12509 0.65243 0.81581 0.18419 0.17735
G8HL XP_026409437.1 0.78802 0.21198 0.54443 0.61490 0.38510 0.01298
G8HL XP_026379135.1 0.95001 0.04999 0.82356 0.88381 0.11619 0.26380
G8HL XP_026377890.1 0.56616 0.43384 0.36514 0.72476 0.27524 0.09411
CYP71A1-like XP_026430038.1 0.05401 0.94599 -0.17769 0.41167 0.58833 -0.11844
C5167_026704 RZC86034.1 0.10892 0.89108 -0.04013 0.35503 0.64497 -0.15814
F3ML XP_026441287.1 0.09569 0.90431 -0.06482 0.53842 0.46158 -0.03633
C5167_027831 RZC91763.1 0.04840 0.95160 -0.19688 0.20276 0.79724 -0.28031
CYP76A2-like XP_026447672.1 0.07126 0.92874 -0.11985 0.84778 0.15222 0.21398
CYP76A2-like XP_026447278.1 0.22551 0.77449 0.10847 0.68926 0.31074 0.06663
F3ML XP_026381164.1 0.11624 0.88376 -0.02757 0.14318 0.85682 -0.34759
C5167_005813 RZC58514.1 0.02224 0.97776 -0.33146 0.01744 0.98256 -0.70935
CYP450 AFK73718.1 0.03506 0.96494 -0.25296 0.08685 0.91315 -0.43788
CYP76A2-like XP_026450500.1 0.19700 0.80300 0.07913 0.85102 0.14898 0.21802
C5167_002759 RZC76181.1 0.02624 0.97376 -0.30296 0.04881 0.95119 -0.54300
DHP6AML XP_026442746.1 0.16589 0.83411 0.04321 0.10784 0.89216 -0.39948
G8HL XP_026411431.1 0.02026 0.97974 -0.34744 0.22728 0.77272 -0.25702
F3ML XP_026427767.1 0.05762 0.94238 -0.16630 0.04139 0.95861 -0.56990
F3ML XP_026418653.1 0.00702 0.99298 -0.52926 0.02579 0.97421 -0.64640
CYP736A12-like XP_026430405.1 0.14287 0.85713 0.01297 0.08246 0.91754 -0.44695
CYP736A12-like XP_026437612.1 0.33746 0.66254 0.20181 0.11411 0.88589 -0.38928
CYP76A2-like XP_026437513.1 0.07915 0.92085 -0.10041 0.66892 0.33108 0.05168
CYP76A1-like XP_026453156.1 0.36775 0.63225 0.22574 0.90913 0.09087 0.30766
CYP76A1-like XP_026451658.1 0.13538 0.86462 0.00227 0.61347 0.38653 0.01206
G8HL XP_026380438.1 0.00793 0.99207 -0.50840 0.21455 0.78545 -0.26887
G8HL XP_026380437.1 0.00957 0.99043 -0.47618 0.28162 0.71838 -0.21099
G8HL XP_026380436.1 0.00570 0.99430 -0.56473 0.11483 0.88517 -0.38814
C5167_048523, partial RZC73046.1 0.69655 0.30345 0.46193 0.82713 0.17287 0.18970
CYP736A12-like XP_026459019.1 0.00287 0.99713 -0.68233 0.00176 0.99824 -1.07710
ECODL XP_026408386.1 0.00881 0.99119 -0.49036 0.15940 0.84060 -0.32734
G8HL XP_026396971.1 0.23408 0.76592 0.11675 0.34545 0.65455 -0.16525
CYP736A12-like XP_026388372.1 0.14401 0.85599 0.01455 0.03709 0.96291 -0.58770
CYP76A2-like XP_026451160.1 0.11039 0.88961 -0.03755 0.76789 0.23211 0.13067
FSL isoform X1 XP_026380444.1 0.00344 0.99656 -0.65121 0.02518 0.97482 -0.65026
CYP76A1-like XP_026399590.1 0.00757 0.99243 -0.51627 0.42563 0.57437 -0.10913
C5167_033939 RZC70799.1 0.00441 0.99559 -0.60872 0.25109 0.74891 -0.23604
CYP450 AFK73714.1 0.19894 0.80106 0.08123 0.54130 0.45870 -0.03448
G8HL XP_026393993.1 0.18141 0.81859 0.06173 0.19336 0.80664 -0.28979
C5167_013475 RZC54568.1 0.00096 0.99904 -0.87047 0.00065 0.99935 -1.23745
CYP76C4-like XP_026430487.1 0.00112 0.99888 -0.84331 0.03096 0.96904 -0.61692
CYP76C4-like XP_026430486.1 0.00110 0.99890 -0.84627 0.02139 0.97861 -0.67653
CYP76A2-like XP_026429624.1 0.57974 0.42026 0.37451 0.52533 0.47467 -0.04474
CYP736A12-like XP_026460423.1 0.00848 0.99152 -0.49691 0.00211 0.99789 -1.04827
C5167_044785 RZC90154.1 0.03451 0.96549 -0.25566 0.09578 0.90422 -0.42063
CYP71A9-like XP_026400228.1 0.03709 0.96291 -0.24318 0.05424 0.94576 -0.52569
CYP76C4-like XP_026458250.1 0.00088 0.99912 -0.88525 0.01859 0.98141 -0.69913
C5167_015732 RZC56884.1 0.00075 0.99925 -0.91245 0.01943 0.98057 -0.69198
C5167_019753 RZC51326.1 0.53999 0.46001 0.34711 0.80717 0.19283 0.16832
CYP736A12-like XP_026437453.1 0.14329 0.85671 0.01356 0.19329 0.80671 -0.28986
CYP71A9-like XP_026444276.1 0.01737 0.98263 -0.37387 0.04853 0.95147 -0.54394
C5167_027648 RZC91585.1 0.20970 0.79030 0.09257 0.39769 0.60231 -0.12791
CYP71A9-like XP_026378190.1 0.00183 0.99817 -0.75893 0.00754 0.99246 -0.84397
CYP736A12-like XP_026388374.1 0.01802 0.98198 -0.36763 0.01057 0.98943 -0.78970
CYP71D8-like XP_026451198.1 0.16519 0.83481 0.04234 0.53959 0.46041 -0.03557
CYP71A1-like XP_026447646.1 0.08686 0.91314 -0.08305 0.08434 0.91566 -0.44301
CYP71A1-like isoform X1 XP_026380212.1 0.07484 0.92516 -0.11079 0.01135 0.98865 -0.77828
CYP736A12-like XP_026460422.1 0.00120 0.99880 -0.83100 0.00092 0.99908 -1.18133
C5167_026541 RZC85882.1 0.28197 0.71803 0.15967 0.66588 0.33412 0.04949
C5167_005175 RZC57870.1 0.00000 1.00000 -1.92678 0.00022 0.99978 -1.40738
CYP71D8-like XP_026392887.1 0.25363 0.74637 0.13491 0.56755 0.43245 -0.01755
C5167_009556 RZC65863.1 0.84695 0.15305 0.61242 0.01363 0.98637 -0.74897
CYP450 AFK73720.1 0.11097 0.88903 -0.03653 0.99336 0.00664 0.74116
CYP71A1-like XP_026377587.1 0.18209 0.81791 0.06251 0.00616 0.99384 -0.87640
Figure imgf000050_0001
P. somniferum sequence abbreviations: NMCH - N-methylcoclaurine hydroxylase, NMCH-I1 - NMCH isozyme 1, G8HL - geraniol 8-hydroxylase-like, F3ML - flavonoid 3'-monooxygenase-like, DHP6AML - 3,9-dihydroxypterocarpan 6A-monooxygenase-like, ECODL - 7-ethoxycoumarin O-deethylase-like, FSL - ferruginol synthase-like, PSOL - premnaspirodiene oxygenase-like, F35H1L - flavonoid 3',5'-hydroxylase 1-like. All C5167 sequences are annotated as "hypothetical protein". High-scoring sequence RZC73039.1 (C5167_048514) shares homology with selected sequence XP_026409442.1 (G8HL). High-scoring sequences AFK73720.1 (CYP450) and XP_026458081.1 (CYP98A2-like) share homology with selected sequence XP_026403623.1 (CYP98A2-like).
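Selecting candidates that fall in the positive prediction space of both models in Table 6 amounts to a filter on the two decision-score columns. The sketch below applies that filter to a few complete rows copied from the table. Which model corresponds to the first and which to the second triplet of numbers is not recoverable here because the column header sits in the figure, so the code simply calls them score_1 and score_2. Using a decision score above zero as the criterion and ranking by the weaker score are both assumptions made for illustration, and these rows are not the sequences selected in the text.

```python
ROWS = """\
C5167_004855 RZC57552.1 0.96449 0.03551 0.88480 0.00090 0.99910 -1.18485
C5167_019753 RZC51326.1 0.53999 0.46001 0.34711 0.80717 0.19283 0.16832
C5167_026541 RZC85882.1 0.28197 0.71803 0.15967 0.66588 0.33412 0.04949
C5167_009556 RZC65863.1 0.84695 0.15305 0.61242 0.01363 0.98637 -0.74897
CYP450 AFK73720.1 0.11097 0.88903 -0.03653 0.99336 0.00664 0.74116
"""


def parse_row(line):
    """Parse one row: name, accession, then two
    (positive P, negative P, decision score) triplets."""
    name, accession, *numbers = line.split()
    values = [float(n) for n in numbers]
    return {
        "name": name,
        "accession": accession,
        "score_1": values[2],
        "score_2": values[5],
    }


def positive_in_both(rows):
    """Keep rows with both decision scores above zero, strongest first."""
    hits = [r for r in rows if r["score_1"] > 0 and r["score_2"] > 0]
    # Rank by the weaker of the two scores so both models must agree.
    return sorted(
        hits, key=lambda r: min(r["score_1"], r["score_2"]), reverse=True
    )


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
shortlist = [r["accession"] for r in positive_in_both(rows)]
```

Rows such as RZC57552.1 or AFK73720.1 score very high in one model but negative in the other, which is exactly the case a two-model filter is meant to flag for closer inspection rather than automatic selection.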
Emergence of dual norcoclaurine and norlaudanosoline pathways via expression of P. somniferum decarboxylases.
PsTyDC6 is able to convert tyrosine and L-DOPA to norcoclaurine and norlaudanosoline (FIG. 3; Scheme 4). Similarly, co-expression of PsTyDC1 with PsNMCH-I1, 6OMT, CNMT and 4'OMT results in a plant-gene-only dual pathway through 4HPAA and DHPAA to norcoclaurine and reticuline (FIG. 11). Moreover, in vitro DHPAAS activity is confirmed using PsTyDC6 (FIG. 3c), which shares over 98% sequence identity with PsTyDC1. Accordingly, the potential DHPAAS activity of PsTyDC1 is further explored to construct combined norcoclaurine and norlaudanosoline pathways (FIG. 6a). At the same time, PsPDC1 is also explored as a mediator of DHPAA production via decarboxylation of transaminated L-DOPA.
Scheme 4
Figure imgf000051_0001
After incorporating L-DOPA decarboxylase (DDC) from Pseudomonas putida (PpDDC) for in vivo dopamine production and optimization in Terrific Broth (TB), PsPDC1- and PsTyDC1-containing strains produce reticuline from L-DOPA via the DHPAA pathway, with titers reaching the µM range (FIG. 6b). Previously, a single strain containing DHPAAS, 6OMT, CNMT and 4'OMT only produced reticuline titers of 0.2 µM from L-DOPA (ref. 7). This result suggests that PsPDC1 can produce DHPAA from 3,4-dihydroxyphenylpyruvic acid (DHPP) that is supplied by L-DOPA transamination, and that PsPDC1 works synergistically with PsTyDC1 at later production times to promote high reticuline production in E. coli. Accordingly, combinations of PPDC and AAS are next explored to improve BIA titers.
Expanding the prediction models towards template enzyme engineering
The characterizations of PsTyDC1, PsTyDC6 and PsPDC1 indicate that these enzymes promote dual pathways in E. coli. However, the activities of PsTyDC1, PsPDC1 and TrcPsPDC1-IX1 are low under the conditions tested. Therefore, in order to quickly achieve in vivo titers high enough for dynamic metabolomic profiling, dual norcoclaurine and norlaudanosoline pathways are re-explored using homologous enzyme templates with stable expression in E. coli (FIG. 7). The concept of template enzyme engineering refers to the approach where useful features are identified from a specialized enzyme and those features are transplanted into a related template to confer some advantages. This is illustrated with the above EcNMCH-Tyr202His substitution, where the corresponding His203 residue from PsNMCH is substituted to improve EcNMCH as the template enzyme. To further develop this methodology, the SVM enzyme selection algorithm is applied to evaluate multiple enzyme engineering substitutions for highly active template sequences, using PpDDC as a specific example (FIG. 7). AAS activity analogous to that of PsTyDC1 could be engineered into the bacterial PpDDC template by transplanting the DHPAAS-specific catalytic residues F79, Y80 and N181. Rationally engineered PpDDC-Tyr79Phe-Phe80Tyr-His181Asn mediates improved norlaudanosoline production in E. coli (FIG. 7). Switching from PsPDC1 to a S. cerevisiae ARO10 template is observed to improve in vivo turnover of both DHPP (FIG. 7) and 4HPP (FIG. 8), in comparison to corresponding strains containing PsPDC1. However, the high activity of ARO10 may come at a specificity tradeoff, as production of additional aromatic keto acid derived alkaloids results from ARO10 expression (FIG. 7c).
Combinations of natural and analogous enzyme templates result in improved E. coli BIA production (FIG. 8a and Table 2). Expression of PpDDC-Tyr79Phe-Phe80Tyr-His181Asn together with PsPDC1 in strain P1-07-AI selectively promotes the DHPAA pathway in the presence of tyrosine and L-DOPA to produce 61.8 µM reticuline, while the application of ARO10 in strain A1-01-DE3 selectively favors the 4HPAA pathway in the presence of tyrosine and dopamine to produce 356 µM norcoclaurine and 240 µM N-methylcoclaurine. Dual pathway production of 112 µM N-methylcoclaurine and 74.9 µM reticuline is promoted through the combination of PpDDC-Tyr79Phe-Phe80Tyr-His181Asn, ARO10 and PsTyDC1 in strain A1-06-AI.
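The residue transplantation described above can be expressed as a small bookkeeping step: compare the template residues with the donor residues at equivalent positions and emit a substitution label for each difference. The sketch below reproduces the PpDDC variant name from the residues given in the text (template Tyr79, Phe80, His181; DHPAAS-type Phe, Tyr, Asn). The function name is hypothetical, and the donor residues are keyed by template numbering, which assumes the equivalent positions have already been identified by alignment.

```python
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def transplant_label(template_residues, donor_residues):
    """Build a substitution label for positions where the donor differs.

    Both arguments map template position -> one-letter residue.
    """
    parts = []
    for position in sorted(template_residues):
        old = template_residues[position]
        new = donor_residues[position]
        if old != new:
            parts.append(f"{ONE_TO_THREE[old]}{position}{ONE_TO_THREE[new]}")
    return "-".join(parts)


# Residues taken from the text; positions use PpDDC numbering.
ppddc_sites = {79: "Y", 80: "F", 181: "H"}
dhpaas_type_sites = {79: "F", 80: "Y", 181: "N"}
variant_label = transplant_label(ppddc_sites, dhpaas_type_sites)
```

Positions where template and donor already agree are skipped, so the label lists only the substitutions that actually need to be made.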
Dynamic metabolomic profiling of AAS and PPDC branch pathways
By tracing the turnover of isotope-labeled precursors and measuring the resulting fractions of isotope-labeled intermediates, metabolic flux can be observed, and this approach is referred to as dynamic metabolic profiling. While multiple reaction monitoring (MRM) with LC-MS is sensitive, this method does not readily detect isotope-labeled intermediates. After improving BIA titers to µM levels suitable for quantification with high-resolution CE-MS, isotope tracing experiments could be performed. Combinations of PsPDC1, ARO10, PsTyDC1 and PpDDC produce various labeling patterns: tyrosine-13C to BIA-13C2, L-DOPA-d3 with tyrosine-d4 to d6-labeled BIA, L-DOPA-d3 to d5-labeled BIA, L-DOPA-d3 with dopamine-d2 to d5-labeled BIA, tyrosine-d4 with dopamine-d2 to d6-labeled BIA, and tyrosine-d4 with dopamine to d4-labeled BIA (FIG. 8 and FIG. 12, Scheme 6). The loss of a ring deuterium atom during NCS-mediated condensation of aryl acetaldehydes with ring-labeled dopamine is consistent with the reported NCS mechanism (FIG. 8b, Scheme 5, and FIG. 12d); this kind of mechanism-directed deuterium labeling pattern has not been reported for the tracing of BIA. Isotope tracing from L-DOPA-d3 to d5-labeled BIA supports the bifunctional decarboxylase and oxidative deamination activities of PpDDC-Tyr79Phe-Phe80Tyr-His181Asn (FIG. 12d). Improvement of N-methylcoclaurine-d6 and reticuline-d5 production via PsTyDC1 in addition to PsPDC1 again demonstrates the synergistic combination of these distinct aryl acetaldehyde producing enzymes (FIG. 8b, Scheme 5). Moreover, the amounts of N-methylcoclaurine-d6 and reticuline-d5 relative to their respective precursors norcoclaurine-d6 and norlaudanosoline-d5 (FIG. 12b and c) show the bottleneck of the S-adenosylmethionine (SAM)-dependent methylation of deuterium-labeled BIA. Furthermore, isotope tracing from tyrosine-13C supports that PsPDC1 and ARO10 are converting isotope-labeled 4-hydroxyphenylpyruvate (4HPP) to downstream BIA (FIG. 12a).
Scheme 5:
Figure imgf000053_0001
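The deuterium labeling patterns listed above follow a simple count: the label carried by the aryl acetaldehyde plus the label carried by dopamine, minus one deuterium when dopamine is ring-labeled, because one ring position is lost on condensation. The sketch below checks this arithmetic against the reported patterns. It assumes that L-DOPA-d3 and tyrosine-d4 are ring-labeled, that dopamine-d2 is labeled on the side chain, and that in each pairing the aldehyde and amine halves derive from the precursors indicated in the comments; these assignments are inferred from the reported totals rather than stated in this section.

```python
def bia_deuterium_count(aldehyde_d, amine_d, amine_ring_labeled):
    """Expected deuterium atoms in the condensed BIA scaffold.

    One deuterium is lost only when the dopamine ring carries label,
    reflecting the ring position that forms the new C-C bond.
    """
    lost = 1 if (amine_ring_labeled and amine_d > 0) else 0
    return aldehyde_d + amine_d - lost


# (pairing, aldehyde D, dopamine D, dopamine ring-labeled?, reported label)
CASES = [
    # 4HPAA-d4 from tyrosine-d4, dopamine-d3 from L-DOPA-d3
    ("L-DOPA-d3 with tyrosine-d4", 4, 3, True, 6),
    # DHPAA-d3 and dopamine-d3 both from L-DOPA-d3
    ("L-DOPA-d3 alone", 3, 3, True, 5),
    # DHPAA-d3 from L-DOPA-d3, side-chain dopamine-d2 supplied
    ("L-DOPA-d3 with dopamine-d2", 3, 2, False, 5),
    # 4HPAA-d4 from tyrosine-d4, side-chain dopamine-d2 supplied
    ("tyrosine-d4 with dopamine-d2", 4, 2, False, 6),
    # 4HPAA-d4 from tyrosine-d4, unlabeled dopamine supplied
    ("tyrosine-d4 with dopamine", 4, 0, False, 4),
]

predicted = {
    name: bia_deuterium_count(ald, amine, ring)
    for name, ald, amine, ring, _ in CASES
}
```

Under these assumptions every reported pattern is reproduced, and the one-deuterium shortfall appears only in the two pairings where dopamine is generated from ring-labeled L-DOPA-d3.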
Dynamic metabolomic profiling of mixed fractions of unlabeled and labeled BIA could be performed with high-titer norcoclaurine-d4 and N-methylcoclaurine-d4 production (FIG. 8c). In this case, a higher fraction of d4-labeled norcoclaurine relative to d4-labeled N-methylcoclaurine is consistent with the SAM-dependent methyltransferase bottleneck observed previously.
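The labeled fraction compared above is the labeled signal divided by the sum of labeled and unlabeled signal for the same compound. The sketch below computes it for a precursor and its methylated product; the peak areas are hypothetical values in arbitrary units, not measurements from the source, and the function name is illustrative.

```python
def labeled_fraction(labeled, unlabeled):
    """Fraction of a compound's signal that carries the isotope label."""
    total = labeled + unlabeled
    if total <= 0:
        raise ValueError("no signal to compute a fraction from")
    return labeled / total


# Hypothetical peak areas (arbitrary units), for illustration only.
norcoclaurine_fraction = labeled_fraction(labeled=600.0, unlabeled=400.0)
nmc_fraction = labeled_fraction(labeled=200.0, unlabeled=300.0)

# A precursor that is more heavily labeled than its product suggests
# that the connecting step is turning over slowly.
methylation_lags = norcoclaurine_fraction > nmc_fraction
```

Tracking how these fractions change over time, rather than at a single time point, is what distinguishes dynamic profiling from a static labeling measurement.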
Figure imgf000054_0001
Scheme 6c:
Figure imgf000055_0001
Scheme 6d:
Figure imgf000055_0002
Discussion. This report demonstrates that machine learning can uncover missing link enzymes with direct applications to biomanufacturing. While previous studies have also reported machine learning for enzyme prediction, these examples were never applied to the discovery of uncharacterized enzymes. Predictions of active glutaminase and aurora kinase B were used as examples to verify an algorithm; however, these test data were obtained from previous publications. The current study demonstrates the paired prediction and experimental verification of four kinds of plant enzymes. Furthermore, the possibility to engineer artificial enzymes is demonstrated by prediction of PsNMCH-His203Tyr, EcNMCH-Tyr202His (FIG. 5) and PpDDC-Tyr79Phe-Phe80Tyr-His181Asn (FIG. 7), where scores are in agreement with in vivo test results. Therefore, the SVM prediction models of this study (Tables 2-6) can readily enable the discovery and engineering of specialized carboxy-lyases (EC 4.1.1.X), CYP450s (EC 1.14.X.X), and CPRs (EC 1.6.2.4). While the current machine learning method is shown to be superior to homology-based selection of AAS and PPDC sequences (FIG. 2a and FIG. 4a), additional studies should be pursued to demonstrate improved selection for other classes of enzymes.
PsPDC1 shows potential for in vivo PPDC activity and contains active site residue Tyr332, which is also present in ZmPDC that is known to only convert small non-aromatic substrates. This active site tyrosine is substituted with smaller residues in characterized yeast and bacterial PPDC enzymes, and therefore the structural basis of plant PPDC substrate recognition appears to be determined by other factors. Species-by-species variation in functional residues is also seen with the evolution of AAS variants throughout insects and plants. Insects have evolved a histidine to asparagine active site switch, corresponding to residue 192 of DHPAAS, to promote AAS activity essential for their survival. In the plant homologues, tyrosine is commonly substituted with a more hydrophobic phenylalanine (residue 346 of Petroselinum crispum 4HPAAS) to switch from AAAD to AAS activity. Yet the active site of PsTyDC6 resembles that of typical AAAD and it still possesses AAS activity. These results with PsPDC1 and PsTyDC6 indicate that specialized PPDC and AAS activities may exist in other plant sequences that resemble typical carboxy-lyases. This new insight also suggests that combinations of subtle structural features or emergent properties may be underlying the specialized activities of some plant carboxy-lyases. Accordingly, machine learning offers advantages over structural analysis to identify elusive emergent features in enzymes with novel functions that cannot be predicted from structure or homology alone.
Transplantation of discovered functional residues into high-activity microbial templates is an effective strategy for improving bioproduction, as demonstrated by PpDDC-Tyr79Phe-Phe80Tyr-His181Asn with transplanted DHPAAS active site residues. In this example, the design of three amino acid substitutions, including the most critical His181 substitution that corresponds to PsTyDC1-Leu205, could be guided with the SVM prediction algorithm. Improved protein stability, removal of regulation/inhibition, and higher bacterial expression are additional factors that might contribute to improved templates. While PpDDC-Tyr79Phe-Phe80Tyr-His181Asn favors the DHPAA pathway, PsTyDC6 is capable of mediating both DHPAA and 4HPAA containing pathways. Similarly, expression of PsPDC1 and ARO10 is observed to promote conversion of 4HPP to 4HPAA and DHPP to DHPAA, but strains expressing PsPDC1 favor the DHPAA containing pathway under the conditions tested (FIG. 8a and b). This illustrates that flux through the norcoclaurine route versus the norlaudanosoline route may be controlled by the selection of specific PPDC and AAS templates.
For high-level production of the key benzylisoquinoline alkaloid intermediate reticuline, current studies require NCS and plant OMTs, but their selection has generally been limited to a few sequences from P. somniferum, C. japonica and T. flavum. Yeast studies have focused on the natural norcoclaurine route and require the CYP450 enzyme NMCH to complete the pathway to reticuline. Yet in the recent S. cerevisiae reports, the first well-characterized NMCH from E. californica continues to be selected. Considering this, machine learning-based enzyme prediction offers great potential to expand the choice of additional homologous sequences that might further boost product titers. For example, the predicted pathways with PsNMCH-I1 paired with PsCPR-L and EcNMCH-Tyr202His paired with AtATR2 produce higher reticuline titers than that of the conventional EcNMCH and AtATR2 combination. For production in E. coli, Matsumura et al. were able to achieve 160 mg/L (S)-reticuline using a MAO-dependent and NMCH-independent pathway through norlaudanosoline. In contrast, establishing the natural norcoclaurine pathway in E. coli produces high levels of extracted norcoclaurine (96.7 mg/L) and N-methylcoclaurine (71.8 mg/L) (FIG. 8a). Although yeast is better suited than E. coli for the expression of CYP450s including NMCH, production of expensive N-methylcoclaurine does not require NMCH. In addition, the dual pathway strain of the current study produces 24.6 mg/L reticuline and 33.6 mg/L N-methylcoclaurine (FIG. 8a).
The current characterizations of PsTyDC1, PsTyDC6 and PsPDC1 show that these enzymes promote production of both norlaudanosoline and norcoclaurine in E. coli.
A dual pathway to norcoclaurine and norlaudanosoline in E. coli offers advantages for utilization of tyrosine, and for improving amounts of unstable aryl acetaldehydes relative to dopamine. Accordingly, increased aryl acetaldehyde production by synergistic expression of PPDC together with AAS results in increased reticuline through an enhanced norlaudanosoline pathway (FIG. 6a and b). These newly constructed routes were further characterized by dynamic metabolomic profiling, an approach that can readily identify new bottleneck targets for increasing metabolic flux to target compounds. In conclusion, machine learning discovery of missing links and homologous enzyme templates is now a realistic approach for assembling alternative routes and relieving bottlenecks in improved metabolic pathways.
The following Examples are intended to illustrate the above invention and should not be construed as to narrow its scope. One skilled in the art will readily recognize that the Examples suggest many other ways in which the invention could be practiced. It should be understood that numerous variations and modifications may be made while remaining within the scope of the invention.
EXAMPLES
Example 1. Material and Methods.
SVM machine learning prediction. Support vector machine (SVM) Enzyme-models were built based on the methods of our previous study with modifications. Aromatic amino acid decarboxylase (AAAD), aromatic acetaldehyde synthase (AAS, previously referred to as aromatic aldehyde synthase) and phenylpyruvate decarboxylase (PPDC) prediction models were trained with vectors generated by PROFEAT as described in Li et al., Nucleic Acids Res. 34, W32-W37 (2006). AAAD positive training sequences include DDC and other typical PLP-dependent carboxylases that decarboxylate aromatic amino acids. The AAAD model is trained with positive examples based on only typical AAAD sequences that contain a catalytic histidine, corresponding to His181 of PpDDC. Characterized PsTyDC9 is included as a positive AAAD training sequence to ensure there is no bias towards AAS prediction. For AAS models, the positive training examples consist of sequences with homology to known plant-type and insect-type AAS enzymes, including Petroselinum crispum 4HPAAS (Pc4HPAAS) and insect DHPAAS. Insect-type AAS sequences are classified based on the presence of Asn192 (insect DHPAAS numbering), and plant-type AAS enzymes are classified based on the presence of Phe346 or Val346 (Pc4HPAAS numbering).
For PPDC prediction models, positive training vectors included sequences annotated as PPDC and indolepyruvate decarboxylase. Since all current database sequences annotated as phenylpyruvate decarboxylase are from bacteria and fungi (plus 1 from Archaea), PDC sequences also had to be included in the first prediction model (Table 3). After discovering PsPDC1, a rose PPDC sequence was found through continued literature searches, although its protein accession (BAU70033.1) is annotated as "pyruvate decarboxylase". A second PPDC-specific SVM model was therefore built using 19 homologous plant sequences in the same phylogenetic clade as rose PPDC as positive training sequences and 3 sequences curated as plant PDC as negative training sequences.
Positive training sequences from AAS, PPDC and PDC models were included as negative training sequences for the AAAD model; positive training sequences from AAAD, PPDC and PDC models were included as negative training sequences for the AAS model; and positive training sequences from AAS and AAAD models were included as negative training sequences for PPDC models. For all models, general negative training sequences included E. coli, S. cerevisiae and A. thaliana enzymes, excluding sequences classified in the positive training group.
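The assignment of positive and negative training sequences described above can be sketched as follows. This is an illustrative sketch only: the function name, the dictionary layout and the sequence identifiers are hypothetical, and it assumes that each model's positive set is available as a collection of sequence identifiers.

```python
# Which other models contribute their positives as negatives, per the text above.
CROSS_NEGATIVES = {
    "AAAD": ("AAS", "PPDC", "PDC"),
    "AAS": ("AAAD", "PPDC", "PDC"),
    "PPDC": ("AAS", "AAAD"),
}


def build_training_sets(positives, general_negatives):
    """Assemble positive and negative training sets for each model.

    positives: dict mapping a model name to an iterable of sequence IDs.
    general_negatives: iterable of sequence IDs used as general negatives
    (for example, E. coli, S. cerevisiae and A. thaliana enzymes).
    """
    training_sets = {}
    for model, other_models in CROSS_NEGATIVES.items():
        positive_ids = set(positives[model])
        negative_ids = set(general_negatives)
        for other in other_models:
            negative_ids |= set(positives[other])
        # Sequences in the positive group are excluded from the negatives.
        negative_ids -= positive_ids
        training_sets[model] = {"positive": positive_ids, "negative": negative_ids}
    return training_sets


# Hypothetical identifiers for illustration only.
example_positives = {
    "AAAD": {"aaad_1", "aaad_2"},
    "AAS": {"aas_1"},
    "PPDC": {"ppdc_1"},
    "PDC": {"pdc_1"},
}
example_general = {"eco_1", "sce_1", "aaad_1"}
example_sets = build_training_sets(example_positives, example_general)
```

The set subtraction at the end of the loop reflects the stated exclusion of any sequence classified in the positive training group from the negatives of the same model.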
CYP450 prediction models were trained with vectors generated by ProtVec as described in Asgari et al., PLOS One 10, e0141287 (2015). To clarify potential NMCH activities of CYP450 monooxygenases, SVM models were trained with CYP80B positive sequences. To clarify potential tyrosine 3-monooxygenase activities, SVM models were trained with positive sequences related to CYP76AD, CYP98A3 and CYP199A2, which are reported to mediate aromatic hydroxylation of tyrosine as well as the similarly sized substrate coumaric acid.
Prediction models were first built with high-dimensional vectors. Cross validation of all high-dimensional models resulted in F-scores above 0.96. Candidate sequences were selected based on high-dimensional scores. Two-dimensional and three-dimensional plots were used for visual representation of data in Figures. For two-dimensional plots, high-dimensional vectors were compressed to two dimensions using principal component analysis (PCA). Two-dimensional SVM models were then built from the PCA-compressed vectors. SVM and PCA from the scikit-learn library were used. Compressed two-dimensional decision scores from the combined model (Table 3).
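The workflow of this paragraph may be sketched with the scikit-learn classes named above. The following is a hedged illustration and not the models of this study: the feature vectors are synthetic random data standing in for PROFEAT or ProtVec vectors, and the RBF kernel, five-fold cross validation and all variable names are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for high-dimensional sequence feature vectors (assumption).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(1.0, 1.0, size=(40, 50)),   # positive training vectors
    rng.normal(-1.0, 1.0, size=(40, 50)),  # negative training vectors
])
y = np.array([1] * 40 + [0] * 40)

# High-dimensional model: cross-validated F-scores, then decision scores.
high_dim_model = SVC(kernel="rbf")
f_scores = cross_val_score(high_dim_model, X, y, cv=5, scoring="f1")
high_dim_model.fit(X, y)
high_dim_scores = high_dim_model.decision_function(X)

# Two-dimensional model for plotting: compress with PCA, then train a second SVM.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
model_2d = SVC(kernel="rbf").fit(X_2d, y)
scores_2d = model_2d.decision_function(X_2d)
```

Keeping the high-dimensional model for candidate selection and the PCA-compressed model only for visualization follows the separation of roles described in the text.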
Random Forests E-models were built using the same datasets and feature extractions as those of the corresponding SVM models. As a benchmark, machine learning differentiation of AAS versus AAAD sequences, and PPDC versus PDC sequences, was compared to differentiation based on homology to consensus sequences. To do this, consensus sequences were generated for each group of training sequences (AAS, AAAD, PPDC and PDC) by selecting the amino acid of maximum frequency at each position. If a training sequence had higher sequence identity to the consensus sequence of its correct group than to that of its related group, it was counted as a correct prediction by homology.
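The homology benchmark can be sketched in a few lines. This is a minimal illustration under stated assumptions: the sequences are taken to be pre-aligned to equal length, identity is computed as the fraction of matching positions, ties in column frequency are broken by first occurrence, and the toy sequences and function names are hypothetical.

```python
from collections import Counter


def consensus(aligned_seqs):
    """Select the most frequent residue at each aligned position."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_seqs)
    )


def identity(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences agree."""
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def correct_by_homology(seq, own_group, related_group):
    """True if seq is closer to its own group's consensus than to the related one."""
    own_identity = identity(seq, consensus(own_group))
    related_identity = identity(seq, consensus(related_group))
    return own_identity > related_identity


# Toy aligned sequences for illustration only.
group_a = ["ACDE", "ACDF", "ACGE"]
group_b = ["WYDE", "WYDF", "WYGE"]
is_correct = correct_by_homology("ACDF", group_a, group_b)
```

The fraction of training sequences for which `correct_by_homology` returns true can then be compared with the accuracy of the machine learning models on the same sequences.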
Protein structural modeling and docking analysis. Homology models were built using Modeller. Multimeric structures and ligands were first prepared in Pymol. Structures were refined and prepared for docking analysis using Molecular Operating Environment.
Materials and reagents. KOD -Plus- and Ex-Taq HS DNA polymerases were purchased from Toyobo (Tokyo, Japan) and Takara (Tokyo, Japan), respectively. A DNA ligation kit and JM109 chemical competent cells were purchased from Takara (Tokyo, Japan). BL21-AI competent cells were purchased from Thermo Fisher Scientific (Waltham, MA, USA). Rosetta-gami 2 cells were purchased from Sigma-Aldrich (St. Louis, MO, USA). All restriction endonucleases were purchased from New England Biolabs (Ipswich, MA, USA). Antibiotics were purchased from Nacalai Tesque (Kyoto, Japan), Sigma-Aldrich (St. Louis, MO, USA) and FUJIFILM Wako Pure Chemical (Osaka, Japan). Growth medium components were purchased from BD (Franklin Lakes, NJ, USA) and Nacalai Tesque. The IMPACT system, with pTXB1 and pTYB21 vectors, and chitin resin, was obtained from New England Biolabs (Ipswich, MA, USA). Amicon Ultra centrifugal filters were obtained from Merck-Millipore (Darmstadt, Germany). The Fluorimetric Hydrogen Peroxide Assay Kit was from Sigma-Aldrich (St. Louis, MO, USA). Amplex Red (10-acetyl-3,7-dihydroxyphenoxazine) peroxidase substrate was from Thermo Fisher (Waltham, MA, USA). 3-(3,4-dihydroxyphenyl)-L-alanine (L-DOPA) and 3-hydroxytyramine hydrochloride (dopamine) were purchased from TCI (Tokyo, Japan). 4-hydroxyphenylpyruvic acid was from Sigma-Aldrich. L-Tyrosine and L-ascorbic acid sodium salt were obtained from Nacalai Tesque. Analytical standards and isotopes were purchased from Santa Cruz Biotechnology (Dallas, TX, USA), Toronto Research Chemicals (North York, ON, Canada), ALB Technology (Kuala Lumpur, Malaysia), Sigma-Aldrich and Cambridge Isotope Laboratories (Tewksbury, MA, USA). N,O-Bis(trimethylsilyl)trifluoroacetamide (BSTFA) and N-methyl-N-(trimethylsilyl)trifluoroacetamide (MSTFA) were obtained from GL Sciences (Tokyo, Japan). 1,4-dithiothreitol (DTT) and pyridine were obtained from FUJIFILM Wako Pure Chemical (Osaka, Japan). Chlorotrimethylsilane (TMS-Cl) was from Alfa Aesar (Haverhill, MA, USA) and methoxyamine hydrochloride was from MP Biomedicals (Irvine, CA, USA).
Preparation of plasmids. Constructed plasmids (Table 7) were transformed into JM109 chemically competent E. coli (Takara). Transformants were grown on LB-agar plates supplemented with the appropriate antibiotics at 30-37°C. Positive clones were screened using colony PCR and target plasmids were purified using a QIAprep Miniprep Kit (Qiagen). Plasmids were then sequenced using primers listed in Table 8, a BigDye Terminator v3.1 cycle-sequencing kit, and an Applied Biosystems 3500xL Genetic Analyzer (Foster City, CA, USA).
Preparation of predicted candidate genes. Full length P. somniferum PsTyDC1 native coding sequence was synthesized by Integrated DNA Technologies (IDT). Codon optimization of PsONCS3 and TfNCS nucleotide sequences for expression in E. coli was assisted by Codon Optimization OnLine (COOL), and the selected sequences were synthesized by IDT. The native sequence of full-length P. somniferum NMCH isoform 1 (PsNMCH-I1) was also synthesized by IDT.
Native coding sequences of full length PsPDC1, full length Ps2HCLL, and N-terminal truncated PsPDC2 were synthesized and cloned into pBAD-DEST49 (LifeSensors Inc.) via the Gateway cloning system by GeneArt (Invitrogen). Native coding sequences of full length EcNMCH, AtATR2, and P. somniferum CPR-like (PsCPR-L) were synthesized and subcloned into pMA vector (Invitrogen) by GeneArt (Invitrogen). Native coding sequences of full length PsTyDC6 and N-terminal truncated PsPDC1-IX1 were synthesized and cloned into pTYB21 (NEB) by GenScript.
Construction of pACYC-3CjMTs-DDC vectors. The pACYC184-derived vectors containing Coptis japonica 4'OMT, CNMT, 6OMT (pACYC184-Cj4'OMT-CjCNMT-Cj6OMT), and PpDDC (pACYC184-Cj4'OMT-CjCNMT-PpDDC-Cj6OMT) were constructed in previous reports. Active site mutations were introduced into PpDDC in pACYC184 by site-directed mutagenesis using PCR with primers shown in Table 8.
Construction of subcloning vectors and mutations. To construct subcloning vectors for synthetic genes (PsONCS3, TfNCS, PsTyDC1, CjNCS, PsNMCH-I1, EcNMCH, PsCPR-L, AtATR2), and PCR amplified PpDDC and ARO10 (amplified from pGK424-ARO10), 3' end A-protrusions were added to each DNA fragment using A-attachment Mix (Toyobo).
Gene mutations were generated using site-directed mutagenesis by PCR with primers listed in Table 8. PsTyDC1 mutations (Leu205His and Tyr98Phe-Phe99Tyr-His205Asn) were generated in subcloning vectors by PCR. PpDDC mutations (His181Leu, His181Leu-Gly344Ser, Tyr79Phe-Phe80Tyr-His181Asn, Tyr79Phe-Phe80Tyr-His181Asn-Gly344Ser) were generated by PCR. EcNMCH mutation (Tyr202His) and PsNMCH-I1 mutation (Tyr203His) were generated in subcloning vectors by PCR. pBad-PsPDC1-His, pBad-PsPDC2-His and pBad-Ps2HCLL-His were generated by removal of a stop codon with PCR.
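The mutagenesis primers of Table 8 are listed as forward and reverse pairs, and a simple consistency check is to confirm that each reverse primer is the reverse complement of its forward partner. The sketch below is illustrative only; the helper names are hypothetical, and the two sequences are the PsTyDC1-L205H primers (SEQ ID NO: 86 and SEQ ID NO: 87) copied from Table 8.

```python
# Translation table for the four unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'-3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_complementary_pair(forward, reverse):
    """True if the reverse primer is the exact reverse complement of the forward primer."""
    return reverse_complement(forward) == reverse.upper()


# PsTyDC1-L205H primers from Table 8 (SEQ ID NO: 86 and SEQ ID NO: 87).
l205h_fw = "GCTTCTGATCAAACCCACAGTGCACTACAGAAAGCTGCTC"
l205h_rv = "GAGCAGCTTTCTGTAGTGCACTGTGGGTTTGATCAGAAGC"
l205h_pair_ok = is_complementary_pair(l205h_fw, l205h_rv)
```

A check of this kind is also a convenient way to detect transcription errors when primer sequences are copied between documents.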
Construction of alkaloid production vectors. A PsONCS3 containing DNA fragment was obtained from NcoI and BamHI digestion of the PsONCS3 subcloning vector, and then cloned into pCDFDuet-1 via the NcoI and BamHI restriction sites to produce pCDFD-PsONCS3. A TfNCS containing DNA fragment was obtained from NcoI and BamHI digestion of the TfNCS subcloning vector, and then cloned into pCDFDuet-1 via the NcoI and BamHI restriction sites to produce pCDFD-TfNCS.
DNA fragments of PsTyDC1 were obtained from NdeI and XhoI digestion of PsTyDC1 subcloning vectors, and then cloned into pCDFDuet-1-PsONCS3 via NdeI and XhoI restriction sites to produce pCDFD-PsONCS3-PsTyDC1. The PsTyDC1 containing gene fragments were also cloned into pCDFDuet-1-TfNCS via NdeI and XhoI sites to produce pCDFD-TfNCS-PsTyDC1. Digestion of the PsTyDC1 subcloning vector with NdeI and SapI was used to clone into pTXB1 via NdeI and SapI, resulting in pTXB1-TyDC1. To produce pTYB21-PsTyDC1, pTYB21-PsPDC1, pTYB21-PsPDC2 and pTYB21-Ps2HCLL, PsTyDC1, PsPDC1, Ps2HCLL, and N-terminal truncated PsPDC2 were PCR amplified and cloned into pTYB21 digested with SapI and BamHI via Gibson assembly (NEB).
EcNMCH and EcNMCH-Tyr202His gene fragments were digested with SalI and NotI in subcloning vectors and then cloned into pCOLADuet-1 via the SalI and NotI restriction sites. AtATR2 and PsCPR-L fragments were next digested from the subcloning vectors using NdeI and XhoI, and then cloned into pCOLAD-EcNMCH and pCOLAD-EcNMCH-Y202H via the NdeI and XhoI restriction sites to produce pCOLAD-EcNMCH-AtATR2, pCOLAD-EcNMCH-Y202H-AtATR2, pCOLAD-EcNMCH-PsCPR-L and pCOLAD-EcNMCH-Y202H-PsCPR-L.
The DNA fragment encoding PsNMCH-I1 with a truncated N-terminal was digested by NotI and XhoI from the subcloning vector and then cloned into a pACYC184-derived vector containing C. japonica 4'OMT, CNMT, and 6OMT via NotI and XhoI restriction sites to produce pACYC-3CjMTs-PsNMCH. Truncated PsNMCH-I1 and truncated PsNMCH-Y203H gene fragments were PCR amplified from subcloning vectors and then cloned into pCOLAD-EcNMCH-PsCPR-L digested with BamHI and NotI via Gibson assembly to produce pCOLAD-PsNMCH-PsCPR-L and pCOLAD-PsNMCH-Y203H-PsCPR-L. Truncated PsNMCH-I1 and truncated PsNMCH-Tyr203His gene fragments were also digested with NcoI and NotI and cloned into pCOLAD-EcNMCH-AtATR2 digested with NcoI and NotI to produce pCOLAD-PsNMCH-AtATR2 and pCOLAD-PsNMCH-Y203H-AtATR2.
DNA fragments of PpDDC-His181Leu, PpDDC-His181Leu-Gly344Ser, PpDDC-Tyr79Phe-Phe80Tyr-His181Asn and PpDDC-Tyr79Phe-Phe80Tyr-His181Asn-Gly344Ser were PCR amplified from subcloning vectors and then cloned into pCDFDuet-1 digested with NcoI and BamHI via Gibson assembly. To produce pTYB21-PpDDC-S, PpDDC-His181Leu was PCR amplified and cloned into pTYB21 digested with SapI and BamHI via Gibson assembly. A CjNCS DNA fragment was obtained from NdeI and XhoI digestion of the CjNCS subcloning vector, and then cloned into pCDFDuet-1-PpDDC vectors via NdeI and XhoI sites to produce pCDFD-CjNCS-PpDDC. A S. cerevisiae ARO10 gene fragment was digested with NcoI and NotI in the ARO10 subcloning vector and then cloned into pCDFDuet-1 via NcoI and NotI restriction sites. A CjNCS gene fragment was next digested from the subcloning vector using NdeI and XhoI, and then cloned into pCDFDuet-1-ARO10 via the NdeI and XhoI restriction sites to produce pCDFD-CjNCS-ARO10. E. coli HpaBC containing gene fragments were PCR amplified from E. coli using the Gibson assembly primers shown in Table 8. The PCR product was cleaned using a conventional column-based kit, and then cloned into XhoI-digested pET23 via Gibson assembly (pET23-EcHpaBC). Any substitutions of amino acids in a particular enzyme may be made using the primers listed in Table 8 and methods that are well known in the art.
In vivo production of BIA. BL21(DE3) and BL21-AI competent E. coli cells were transformed with various combinations of plasmids from Table 9, resulting in the strains shown in Table 2. Strains were tested in M9, LB or TB, supplemented with various substrates according to Table 2. Expression of recombinant genes in expression vectors containing the T7 promoter system was induced by addition of 0.5 - 1.5 mM IPTG (isopropyl β-D-1-thiogalactopyranoside) to BL21(DE3) cultures. When using BL21-AI cells, 0.08 - 0.4% arabinose was included. Expression of PsPDC1, PsPDC2 and Ps2HCLL in pBAD-DEST49 was also induced by addition of 0.08 - 0.4% arabinose.
For quantification of aromatic products, A1-01-DE3 (3CjMTs, PsNMCH, CjNCS and ARO10), P1-02-AI (3CjMTs, PpDDC, PsONCS3 and PsPDC1), P1-04-AI (3CjMTs, PpDDC, PsONCS3, PsTyDC1 and PsPDC1), P1-06-DE3 (3CjMTs, PpDDC-Tyr79Phe-Phe80Tyr-His181Asn, PsONCS3 and PsPDC1), P1-07-AI (3CjMTs, PpDDC-Tyr79Phe-Phe80Tyr-His181Asn, PsONCS3 and PsPDC1), A1-06-AI (3CjMTs, PpDDC-Tyr79Phe-Phe80Tyr-His181Asn, CjNCS, ARO10 and PsTyDC1) and T1-10-DE3 (3CjMTs, PpDDC, PsONCS3 and PsTyDC1) (FIG. 6 - 8, Table 2, and FIG. 12b-d) were grown using 3.5 mL Terrific Broth (TB) supplemented with sodium ascorbate and appropriate antibiotics, in plastic culture tubes at 34-37°C with shaking at 180-190 rpm. After reaching late log phase, inducing agent (IPTG or arabinose) and substrates (>8 mM tyrosine, >8 mM L-DOPA, >9 mM tyrosine-13C, >3 mM tyrosine-d4, >11 mM L-DOPA-d3, >7 mM dopamine-d2) were added. When tyrosine was used as a substrate, dopamine was sometimes included as indicated in Table 2 and FIG. 12b (4.7 - 7.5 mM dopamine, 10.3 mM dopamine-d2). Addition of dopamine together with L-DOPA was also tested with strain A1-01-DE3 as indicated in Table 2 and FIG. 12c (17.3 mM dopamine, 7.9 mM dopamine-d2). Cultures were then incubated at 25°C with shaking at 180-200 rpm.
DT-01-DE3 (3CjMTs, PpDDC-Tyr79Phe-Phe80Tyr-His181Asn and TfNCS), DS-02-DE3 (3CjMTs, PsNMCH, CjNCS and PpDDC-His181Leu), DD-01-DE3 (3CjMTs, PsNMCH, CjNCS and PpDDC-His181Leu-Gly344Ser), DQ-01-DE3 (3CjMTs, PsNMCH, CjNCS and PpDDC-Tyr79Phe-Phe80Tyr-His181Asn-Gly344Ser), DT-02-DE3 (3CjMTs, PpDDC-Tyr79Phe-Phe80Tyr-His181Asn and PsONCS3), DT-03-DE3 (3CjMTs, PsNMCH, CjNCS and PpDDC-Tyr79Phe-Phe80Tyr-His181Asn) and A1-03-DE3 (3CjMTs, PpDDC, CjNCS, ARO10 and EcHpaBC) (Table 2, FIG. 7a, and FIG. 12a and d) were tested in 3 - 4.8 mL M9 supplemented with ascorbate and appropriate antibiotics. After reaching log phase in plastic culture tubes at 36-37°C, IPTG and substrates (>4.5 mM tyrosine, >2 mM L-DOPA, >5 mM tyrosine-13C, >4 mM L-DOPA-d3) were added. When tyrosine was used as a substrate, dopamine was sometimes included as indicated in Table 2 (1.2 - 1.4 mM dopamine). Cultures were then incubated at 20-25°C with shaking at 180 rpm. Additional ascorbate was added as needed to prevent oxidative degradation of target compounds and melanization.
Conversion of norcoclaurine to reticuline was mediated by NMCH and CPR containing strains N1-01-DE3, N1-02-DE3, N1-03-DE3, N1-04-DE3, N2-01-DE3, N2-02-DE3, N2-03-DE3 and N2-04-DE3 (Table 2). Here, strains first grown in LB medium were used to inoculate TB medium to a starting OD600 of 0.02 in 3 mL, with appropriate antibiotics. After four hours at 37°C with shaking at 200 rpm, recombinant protein expression was induced with 0.68 mM IPTG and the temperature was lowered to 20°C. After 5.5 hours, cells were spun down and re-suspended in 1.5 mL TB supplemented with 1.2 mM norcoclaurine, 5.1 mM sodium ascorbate and 0.2 mM IPTG. After 1.5 days at 25°C with shaking at 200 rpm, BIA titers were measured with LC-MS.
Additional bioproduction conditions are given in the legends of FIG. 2c, FIG. 4c, FIG. 8a, as well as FIGs. 9, 10 and 12. Bioproduction times are defined based on the addition of substrate.
Quantitative analysis of BIA pathway intermediates with LC-MS, CE-MS and GC-MS. Culture medium was filtered with Amicon Ultra 0.5 mL centrifugal filters with a molecular weight cut-off of 3,000 Da. Filtrates were kept on ice and immediately processed for analysis or stored at -30°C or -80°C before use.
For LC-MS analysis with multiple-reaction monitoring (MRM), filtered culture medium was diluted in a solution of camphor sulfonic acid, and then loaded onto a Shimadzu LCMS-8050 system. DHPAA [151.30>123.15(-)], tyramine [138.00>121.15(+)], dopamine [154.10>91.05(+)], norcoclaurine [272.00>106.95(+)], norlaudanosoline [288.05>164.15(+)] and reticuline [330.10>192.00(+)] were identified using the MRM transitions listed in brackets and confirmed by running authentic standards. Over 100 metabolites could be monitored with MRM detection.
For CE-MS analysis, filtered samples were diluted in a methionine sulfone solution when using positive ion mode, or in a PIPES solution for negative ion mode. CE-MS analysis was performed. Quantification of isotopes in FIG. 8 and FIG. 12 was based on standard curves of non-labeled compounds. CE-MS peak areas in relation to internal standard peak areas were used to quantify all compounds except for 4HPP (FIG. 12a), which was quantified based on its own peak intensity.
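Quantification from peak areas relative to an internal standard, using a standard curve of the non-labeled compound, can be sketched as follows. This is a hedged illustration: a linear standard curve fitted by ordinary least squares is assumed, the function names are hypothetical, and the numerical values are invented for the example rather than taken from FIG. 8 or FIG. 12.

```python
def fit_standard_curve(concentrations, area_ratios):
    """Least-squares line: area_ratio = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(area_ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(concentrations, area_ratios)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(peak_area, internal_standard_area, slope, intercept):
    """Convert an analyte peak area to a concentration via the standard curve."""
    ratio = peak_area / internal_standard_area
    return (ratio - intercept) / slope


# Hypothetical standard curve of a non-labeled compound (arbitrary units).
std_slope, std_intercept = fit_standard_curve([0.0, 10.0, 20.0], [0.0, 0.5, 1.0])

# Hypothetical labeled analyte quantified against the non-labeled curve.
estimated_concentration = quantify(150.0, 200.0, std_slope, std_intercept)
```

Applying the non-labeled curve to a labeled analyte, as in this sketch, presumes that the labeled and non-labeled forms give comparable detector responses.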
For GC-MS analysis of in vivo products, filtered samples were dried under vacuum and then derivatized with BSTFA and TMS-Cl. The derivatized aromatic compounds were analyzed on a GCMS-QP2010 Plus (Shimadzu) with a DB-5 capillary column (Agilent). TMS-derivatized tyrosol and norcoclaurine were identified using the most intense product ions 179.1 and 308.1, respectively, and confirmed by running authentic standards.
In vitro characterizations of PsTyDC6. PsTyDC6 was expressed in Rosetta-gami 2 cells transformed with pTYB21-PsTyDC6 (Table 9). After reaching log phase, the cells were induced with 0.15 mM IPTG and grown overnight at 15.5°C. PsTyDC6 was purified on a chitin column followed by on-column cleavage of the chitin binding domain and intein fusion via addition of 50 mM DTT to the column. PsTyDC6 was then eluted into Amicon Ultra centrifugal filters and the buffer was changed to PBS (pH 7.0).
For detection of in vitro produced 4HPAA, purified PsTyDC6 and digested PsTyDC6 cell extract were mixed with 5 mM and 4 mM tyrosine, respectively. PsTyDC6 reactions containing 100 mM PLP were started together with control reactions containing 100 mM PLP and 4 mM tyrosine, followed by incubation at 30°C for 3.5 hours. Samples were lyophilized and then derivatized by treatment with a pyridine and methoxyamine solution followed by treatment with MSTFA. Derivatized compounds were analyzed by GC-MS. TMS- and methoxyamine-derivatized 4HPAA was identified based on product ions 190.1 and 205.1 and confirmed by running an authentic 4HPAA standard after derivatization using the same method. To detect in vitro produced H2O2, a horseradish peroxidase-based fluorescent assay was performed, with replacement of the stock fluorescent substrate with Amplex Red (10-acetyl-3,7-dihydroxyphenoxazine). For the peroxidase-based assay, PsTyDC6 was prepared in PBS (pH 7.0) with 1 mM PLP. Baseline fluorescence from the control with PsTyDC6 and 1 mM PLP, but with no tyrosine, was subtracted from each tested condition containing tyrosine. Initial rates of fluorescence production were plotted against final tyrosine concentration using the Michaelis-Menten function of Prism 7.
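The relationship between initial rate and substrate concentration that underlies the Michaelis-Menten analysis can be illustrated with a short calculation. The sketch below does not reproduce the nonlinear regression of Prism 7; it substitutes a Hanes-Woolf linearization ([S]/v plotted against [S]) fitted by least squares, and all names and data are hypothetical, with the rates generated from assumed parameters rather than measured.

```python
def michaelis_menten(substrate, vmax, km):
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)


def fit_hanes_woolf(substrates, rates):
    """Estimate Vmax and Km from the line [S]/v = [S]/Vmax + Km/Vmax."""
    xs = list(substrates)
    ys = [s / v for s, v in zip(substrates, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Synthetic, noise-free rates generated from assumed parameters (illustration only).
assumed_vmax, assumed_km = 2.0, 0.5
tyrosine_conc = [0.1, 0.25, 0.5, 1.0, 2.0, 4.0]
initial_rates = [michaelis_menten(s, assumed_vmax, assumed_km) for s in tyrosine_conc]

fitted_vmax, fitted_km = fit_hanes_woolf(tyrosine_conc, initial_rates)
```

With noisy measurements a direct nonlinear fit of the Michaelis-Menten function is generally preferred, because linearized forms weight the data points unevenly.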
LC-MS operated in MRM mode was applied to detect in vitro produced DHPAA, dopamine, tyramine, norcoclaurine and norlaudanosoline. For DHPAA, dopamine and norlaudanosoline production, purified PsTyDC6 was mixed with 5 mM L-DOPA. For tyramine and norcoclaurine production, purified PsTyDC6 was mixed with 1.25 mM tyrosine and 2.5 mM L-DOPA. In vitro samples were incubated at 30°C for 80 min. to analyze DHPAA, and for 8 hours to analyze norcoclaurine and norlaudanosoline.
Extraction of aromatic compounds for GC-MS quantification. A solution of ammonium carbonate was added to culture samples, followed by addition of EtOAc. After vortexing, the organic layer was removed and evaporated under vacuum. The dried extracts were then derivatized in a mixture of BSTFA (Wako), TMS-Cl (Alfa Aesar) and EtOAc. Quantitative standard curves were produced by extracting alkaloid standards from equivalent TB solutions, followed by TMS-derivatization in equivalent volumes. The TMS-derivatized samples were analyzed with GC-MS.
Table 8 | List of primers used for molecular cloning
Name Nucleotide sequence (5’-3’)
PsTyDC 1 -L205H Fw GCTTCTGATCAAACCCACAGTGCACTACAGAAAGCTGCTC (SEQ ID NO: 86)
PsTyDCl-L205H Rv GAGCAGCTTTCTGTAGTGCACTGTGGGTTTGATCAGAAGC (SEQ ID NO: 87)
PsTyDCl-Y98F-F99Y Fw CTTTGCTTTTTATCCTTCTAGTGGTTCTATCGCTGGTTTCC (SEQ ID NO: 88)
PsTyDCl-Y98F-F99Y Rv GGAAACCAGCGATAGAACCACTAGAAGGATAAAAAGCAA AG (SEQ ID NO: 89)
PsTyDC 1-L205N Fw GCTTCTGATCAAACCAACAGTGCACTACAGAAAGCTGCTC (SEQ ID NO: 90)
PsTyDC 1-L205N Rv GAGCAGCTTTCTGTAGTGCACTGTTGGTTTGATCAGAAGC (SEQ ID NO: 91)
PsTyDCl Sapl Gibson Fw TGATCATCAGTTTTTGCTTGGATCTCAGGTTGTTGTACAGA ACATGGGAAGTCTTCCAGCTAATAACT (SEQ ID NO: 92)
PsTyDC 1 BamHI Gibson Rv GCCGGATCAACTAGTTATTTAATTACCTGCAGGGAATTCGT TAACAAACATCCTCACCTAGTGCACC (SEQ ID NO: 93)
PsTyDCl SEQ1 GCCAGATGCTTACGCTTCCC (SEQ ID NO: 94)
PsTyDCl SEQ2 ACCAGTGAATCAGAGTCTTTCACCC (SEQ ID NO: 95) pTYB21-Fw GCACGTGAGTGCCGCGG (SEQ ID NO: 96)
PsONCS3 SEQ1 GCGCGCGTAATATTCTGT (SEQ ID NO: 97)
PsONCS3 SEQ2 GTAATCAATTTGCTCATCGCTT (SEQ ID NO: 98)
PsPDCl Sapl Gibson Fw TGATCATCAGTTTTTGCTTGGATCTCAGGTTGTTGTACAGA ACATGGATTTCAAAGTTGGTTCTCTTGATAC (SEQ ID NO: 99)
PsPDCl BamHI Gibson Rv GCCGGATCAACTAGTTATTTAATTACCTGCAGGGAATTCGT TACTGAGGATTTGGTGGTCGACTGT (SEQ ID NO: 100)
PsPDC2 Sapl Gibson Fw TGATCATCAGTTTTTGCTTGGATCTCAGGTTGTTGTACAGA ACATGAAAAAAGCAGGCTTCCTATTGG (SEQ ID NO: 101)
PsPDC2 BamHI Gibson Rv GCCGGATCAACTAGTTATTTAATTACCTGCAGGGAATTCGC TACTGAGGATTGGGTGGGCG (SEQ ID NO: 102)
Ps2HCLL Sapl Gibson Fw
TGATCATCAGTTTTTGCTTGGATCTCAGGTTGTTGTACAGA ACATGGCAGAAATCGATAACCTAACC (SEQ ID NO: 103)
Ps2HCLL BamHI Gibson Rv GCCGGATCAACTAGTTATTTAATTACCTGCAGGGAATTCGT TAGTTCTTGTGCTGCATTCTCCCACT (SEQ ID NO: 104)
TrcPsPDCl-IXl Sapl Fw ATGATGATGATGGCTCTTCCAACATGGGAAGTTCTCAAATT GAACTTGGAACC (SEQ ID NO: 105)
TrcPsPDCl-IXl BamHI Rv CATCATCATCTGGATCCCTACTGAGGATTGGGTGGGCG (SEQ ID NO: 106)
PsPDCl SEQ1 GGAAGATGCACATGAGCAAATCG (SEQ ID NO: 107) PsPDCl SEQ2 CACCCGTTTCGGCAATCAC (SEQ ID NO: 108) PsPDC2 SEQ1 AGGTCTGATACCTGAACATCAC (SEQ ID NO: 109) PsPDC2 SEQ2 ACAGCTGTGTCTCCGCTC (SEQ ID NO: 110) Ps2HCLL SEQ1 CGGTCGTCCTGGGGC (SEQ ID NO: 111) Ps2HCLL SEQ2 GCAGGGCTTCCCTGTGCTA (SEQ ID NO: 112) TrcPsPDCl-IXl SEQ1 CACTAGGGAGCCTGTTCCG (SEQ ID NO: 113) M13 Fw GTAAAACGACGGCCAGT (SEQ ID NO: 114) M13 Rv CAGGAAACAGCTATGAC (SEQ ID NO: 115) ARO10 SEQ1 TTGCACATTGTTGGTGTGGC (SEQ ID NO: 116) ARO10 SEQ2 TCATGATATACTTTATGATTTGGCCC (SEQ ID NO: 117) ARO10 SEQ3 GAC ATC A AT GA A ATT A AT AATGGGC (SEQ ID NO: 118) ARO10 SEQ4 CTCACATCAATGGTGGCAACG (SEQ ID NO: 119) ARO10 Ncol Fw TTTTGTTTAACTTTAAGAAGGAGATATACCATGGCACCTGT TACAATTGAAAAGTT (SEQ ID NO: 120)
ARO10 Notl Rv TTCGACTTAAGCATTATGCGGCCGCCTATTTTTTATTTCTTT TAAGTGCCGCTGC (SEQ ID NO: 121)
ACYCDuetUPl Primer GGATCTCGACGCTCTCCCT (SEQ ID NO: 122) DuetDOWNl Primer GATTATGCGGCCGTGTACAA (SEQ ID NO: 123) DuetUP2 Primer TTGT AC ACGGCCGC AT A ATC (SEQ ID NO: 124) T7 Terminator Primer GCTAGTTATTGCTCAGCGG (SEQ ID NO: 125) PpDDC Gibson Ncol Fw TGTAGAAATAATTTTGTTTAACTTTAATAAGGAGATATACA TGACCCCCGAACAATTCCG (SEQ ID NO: 126)
PpDDC Gibson Ba HI Rv GCAAGCTTGTCGACCTGCAGGCGCGCCGAGCTCGAATTCG GATCCTCAGCCCTTGATCACGT (SEQ ID NO: 127)
PpDDC Gibson Sapl Fw TGATCATCAGTTTTTGCTTGGATCTCAGGTTGTTGTACAGA ACATGACCCCCGAACAATTCC (SEQ ID NO: 128) PpDDC Gibson BamHI Rv GCCGGATCAACTAGTTATTTAATTACCTGCAGGGAATTCGT CAGCCCTTGATCACGTCCTGC (SEQ ID NO: 129)
PpDDC-Y79F-F80Y Fw GCACCCGGACTTCTATGGCTTTTACCCTTCCAATGGCACCCTGTCC (SEQ ID NO: 130)
PpDDC-Y79F-F80Y Rv GGACAGGGTGCCATTGGAAGGGTAAAAGCCATAGAAGTCCGGGTGC (SEQ ID NO: 131)
PpDDC-H181L Fw CGTGTATGTCAGCGCCCACGCCCTCAGCTCGGTGGACAAGGCTGCAC (SEQ ID NO: 132)
PpDDC-H181L Rv GTGCAGCCTTGTCCACCGAGCTGAGGGCGTGGGCGCTGACATACACG (SEQ ID NO: 133)
PpDDC-G344S Fw TGCGCGACTGGGGGATACCGCTGAGCCGTCGGTTCCGTGCGTTGAAG (SEQ ID NO: 134)
PpDDC-G344S Rv CTTCAACGCACGGAACCGACGGCTCAGCGGTATCCCCCAGTCGCGCA (SEQ ID NO: 135)
PpDDC-G344I Fw TGCGCGACTGGGGGATACCGCTGATCCGTCGGTTCCGTGCGTTGAAGC (SEQ ID NO: 136)
PpDDC-G344I Rv GCTTCAACGCACGGAACCGACGGATCAGCGGTATCCCCCAGTCGCGCA (SEQ ID NO: 137)
PpDDC SEQ1 TGCCCGTGAACGCGCCA (SEQ ID NO: 138)
PpDDC SEQ2 GTCGCGCAGGTTCTTCACCT (SEQ ID NO: 139)
PsNMCH SEQ1 ACGCAGAAGATGATTGAAAGTCAAGC (SEQ ID NO: 140)
PsNMCH SEQ2 GCTTGACTTTCAATCATCTTCTGCGT (SEQ ID NO: 141)
PsNMCH SEQ3 CCAACCGCATATTGGCTGTTACT (SEQ ID NO: 142)
PsNMCH NcoI Fw ATGATGGCGGCCGC (SEQ ID NO: 143)
PsNMCH XhoI Rv CATCATCTCGAGTTAATCCCGAGTTTTAGGA (SEQ ID NO: 144)
PsNMCH Overlap Fw TTTTGTTTAACTTTAAGAAGGAGATATACCATGGATTCAAGTCCTAAAGGTTTGCCACCA (SEQ ID NO: 145)
T7 promoter element Overlap Rv GGCAAACCTTTAGGACTTGAATCCATGGTATATCTCCTTCTTAAAGTTAAACAAAATTATTTCT (SEQ ID NO: 146)
PsNMCH H203Y Fw GTGGAAGTGTAGAAATGAAAGAATATCTATGGAGAATGCTGGAATTGGGG (SEQ ID NO: 147)
PsNMCH H203Y Rv CCCCAATTCCAGCATTCTCCATAGATATTCTTTCATTTCTACACTTCCAC (SEQ ID NO: 148)
EcNMCH Y202H Fw GTTCAGAGTTCAAGGAACATCTATGGAGGATGTTGGAATTGGGGAATTC (SEQ ID NO: 149)
EcNMCH Y202H Rv GAATTCCCCAATTCCAACATCCTCCATAGATGTTCCTTGAACTCTGAAC (SEQ ID NO: 150)
EcNMCH SEQ1 TCCGCTCGTTACGTTTTTCAAAGTTTTCGAGTAAAAGGGCATGTAG (SEQ ID NO: 151)
EcNMCH SEQ2 CGTGGGAGTAGCAATGGAGTTGGTGGGTGAAGTCTTAGGG (SEQ ID NO: 152)
PsNMCH NcoI Fw TTTTGTTTAACTTTAAGAAGGAGATATACCATGGATTCAAGTCCTAAAG (SEQ ID NO: 153)
PsNMCH NotI Rv CATCATCATGCGGCCGCTTAATCCCGAGTTTTAGGAACAATATACAGAGGTGG (SEQ ID NO: 154)
PsNMCH Gibson Fw TACCATGGGCAGCAGCCATCACCATCATCACCACAGCCAGATGGATTCAAGTCCTAAAGGTTTGCC (SEQ ID NO: 155)
PsNMCH Gibson Rv GTACAATACGATTACTTTCTGTTCGACTTAAGCATTATGCTTAATCCCGAGTTTTAGGAACAATATACAGAG (SEQ ID NO: 156)
Table 9 | List of plasmids used in this study
Plasmids | Description
pTrcHis2B | AmpR, empty expression plasmid, trc promoter system, pBR322 origin (4.4 kb)
pTrc-DHPAAS | pTrcHis2B derivative, containing DHPAAS under trc promoter (5.9 kb)
pTrc-DHPAAS-S | pTrcHis2B derivative, containing DHPAAS-N192H under trc promoter (5.9 kb)
pTrc-DHPAAS-D | pTrcHis2B derivative, containing DHPAAS-F79Y-Y80F under trc promoter (5.9 kb)
pTrc-DHPAAS-T | pTrcHis2B derivative, containing DHPAAS-F79Y-Y80F-N192H under trc promoter (5.9 kb)
pBAD-DEST49 | AmpR, empty expression plasmid, araBAD promoter system, pUC origin (6.2 kb)
pBAD-PsPDC1 | pBAD-DEST49 derivative, containing PsPDC1 under araBAD promoter (6.3 kb)
pBAD-PsPDC2 | pBAD-DEST49 derivative, containing PsPDC2 under araBAD promoter (5.9 kb)
pBAD-Ps2CHLL | pBAD-DEST49 derivative, containing Ps2CHLL under araBAD promoter (6.3 kb)
pBAD-PsPDC1-His | pBAD-DEST49 derivative, containing His-tag fused PsPDC1 under araBAD promoter (6.3 kb)
pBAD-PsPDC2-His | pBAD-DEST49 derivative, containing His-tag fused PsPDC1 under araBAD promoter (5.9 kb)
pBAD-Ps2CHLL-His | pBAD-DEST49 derivative, containing His-tag fused Ps2CHLL under araBAD promoter (6.3 kb)
pTXB1 | AmpR, empty expression plasmid, T7 promoter system, pBR322 origin (6.7 kb)
pTXB1-PsTyDC1 | pTXB1 derivative, containing PsTyDC1 under T7 promoter (8.2 kb)
pTYB21 | AmpR, empty expression plasmid, T7 promoter system, pBR322 origin (7.5 kb)
pTYB21-PsTyDC1 | pTYB21 derivative, containing PsTyDC1 under T7 promoter (9.0 kb)
pTYB21-PsTyDC6 | pTYB21 derivative, containing PsTyDC6 under T7 promoter (9.0 kb)
pTYB21-TrcPsPDC1-IX1 | pTYB21 derivative, containing N-terminal membrane domain truncated PsPDC1-IX1 under T7 promoter (9.0 kb)
pTYB21-TrcPsPDC1-IX1 | pTYB21 derivative, containing N-terminal membrane domain truncated PsPDC1-IX1 under T7 promoter with tandem lac repressor binding sites (9.3 kb)
pTYB21-PsPDC1 | pTYB21 derivative, containing PsPDC1 under T7 promoter (9.3 kb)
pTYB21-PsPDC2 | pTYB21 derivative, containing PsPDC2 under T7 promoter (8.9 kb)
pTYB21-Ps2CHLL | pTYB21 derivative, containing Ps2CHLL under T7 promoter (9.2 kb)
pTYB21-PpDDC-S | pTYB21 derivative, containing PpDDC-H181L under T7 promoter (8.9 kb)
pET23a | AmpR, empty expression plasmid, T7 promoter system, pBR322 origin (3.7 kb)
pET23a-EcHpaBC | pET23a derivative, containing EcHpaB and EcHpaC under T7 promoters (5.8 kb)
pET23a-3PsMTs | pET23a derivative, containing optimized Ps6OMT, PsCNMT and Q40MT under T7 promoters (9.2 kb)
pE-SUMO Kan | KmR, empty expression plasmid, T7 promoter system, pBR322 origin (5.6 kb)
pE-DHPAAS | pE-SUMO derivative, containing DHPAAS under T7 promoter (7.1 kb)
pE-DHPAAS-S | pE-SUMO derivative, containing DHPAAS-N192H under T7 promoter (7.1 kb)
pE-DHPAAS-D | pE-SUMO derivative, containing DHPAAS-F79Y-Y80F under T7 promoter (7.1 kb)
pE-DHPAAS-T | pE-SUMO derivative, containing DHPAAS-F79Y-Y80F-N192H under T7 promoter (7.1 kb)
pACYC184 | CmR, empty expression plasmid, p15A origin (4.3 kb)
pACYC-3CjMTs | pACYC184 derivative, containing Cj6OMT, CjCNMT and Cj4'OMT under T7 promoters (7.4 kb)
pACYC-3CjMTs-PpDDC | pACYC184 derivative, containing Cj6OMT, CjCNMT, Cj4'OMT and PpDDC under T7 promoters (9.0 kb)
pACYC-3CjMTs-PpDDC-S | pACYC184 derivative, containing Cj6OMT, CjCNMT, Cj4'OMT and PpDDC-H181L under T7 promoters (9.0 kb)
pACYC-3CjMTs-PpDDC-G344S | pACYC184 derivative, containing Cj6OMT, CjCNMT, Cj4'OMT and PpDDC-G344S under T7 promoters (9.0 kb)
pACYC-3CjMTs-PpDDC-G344I | pACYC184 derivative, containing Cj6OMT, CjCNMT, Cj4'OMT and PpDDC-G344I under T7 promoters (9.0 kb)
pACYC-3CjMTs-PpDDC-T | pACYC184 derivative, containing Cj6OMT, CjCNMT, Cj4'OMT and PpDDC-F79Y-Y80F-N192H under T7 promoters (9.0 kb)
pACYC-3CjMTs-PsNMCH | pACYC184 derivative, containing Cj6OMT, CjCNMT, Cj4'OMT and truncated PsNMCH under T7 promoters (9.0 kb)
pCDFDuet-1 | SmR, empty expression plasmid, T7 promoter system, CloDF13 origin (3.8 kb)
pCDFD-TfNCS | pCDFDuet-1 derivative, containing TfNCS under T7 promoter (4.3 kb)
pCDFD-TfNCS-PsTyDC1 | pCDFDuet-1 derivative, containing TfNCS and PsTyDC1 under T7 promoters (5.8 kb)
pCDFD-TfNCS-PsTyDC1-S | pCDFDuet-1 derivative, containing TfNCS and PsTyDC1-L205H under T7 promoters (5.8 kb)
pCDFD-TfNCS-PsTyDC1-T | pCDFDuet-1 derivative, containing TfNCS and PsTyDC1-Y79F-F80Y-L205N under T7 promoters (5.8 kb)
pCDFD-PsONCS3 | pCDFDuet-1 derivative, containing PsONCS3 under T7 promoter (5.8 kb)
pCDFD-PsONCS3-PsTyDC1 | pCDFDuet-1 derivative, containing PsONCS3 and PsTyDC1 under T7 promoters (7.2 kb)
pCDFD-PsONCS3-PsTyDC1-S | pCDFDuet-1 derivative, containing PsONCS3 and PsTyDC1-L205H under T7 promoters (7.2 kb)
pCDFD-PsONCS3-PsTyDC1-T | pCDFDuet-1 derivative, containing PsONCS3 and PsTyDC1-Y79F-F80Y-L205N under T7 promoters (7.2 kb)
pCDFD-CjNCS-PpDDC-S | pCDFDuet-1 derivative, containing CjNCS and PpDDC-H181L under T7 promoters (6.2 kb)
pCDFD-CjNCS-PpDDC-D | pCDFDuet-1 derivative, containing CjNCS and PpDDC-H181L-G344S under T7 promoters (6.2 kb)
pCDFD-CjNCS-PpDDC-T | pCDFDuet-1 derivative, containing CjNCS and PpDDC-Y79F-F80Y-H181N under T7 promoters (6.2 kb)
pCDFD-CjNCS-PpDDC-Q | pCDFDuet-1 derivative, containing CjNCS and PpDDC-Y79F-F80Y-H181N-G344S under T7 promoters (6.2 kb)
pCDFD-CjNCS-ARO10 | pCDFDuet-1 derivative, containing CjNCS and ARO10 under T7 promoters (6.6 kb)
pCOLAD-PsNMCH-AtATR2 | pCOLADuet-1 derivative, containing PsNMCH and AtATR2 under T7 promoters (6.9 kb)
pCOLAD-PsNMCH-H203Y-AtATR2 | pCOLADuet-1 derivative, containing PsNMCH-H203Y and AtATR2 under T7 promoters (6.9 kb)
pCOLAD-PsNMCH-PsCPR-L | pCOLADuet-1 derivative, containing PsNMCH and PsCPR-L under T7 promoters (6.9 kb)
pCOLAD-PsNMCH-H203Y-PsCPR-L | pCOLADuet-1 derivative, containing PsNMCH-H203Y and PsCPR-L under T7 promoters (6.9 kb)
pCOLAD-EcNMCH-AtATR2 | pCOLADuet-1 derivative, containing EcNMCH and AtATR2 under T7 promoters (7.0 kb)
pCOLAD-EcNMCH-Y177H-AtATR2 | pCOLADuet-1 derivative, containing EcNMCH-Y202H and AtATR2 under T7 promoters (7.0 kb)
pCOLAD-EcNMCH-PsCPR-L | pCOLADuet-1 derivative, containing EcNMCH and PsCPR-L under T7 promoters (7.0 kb)
pCOLAD-EcNMCH-Y177H-PsCPR-L | pCOLADuet-1 derivative, containing EcNMCH-Y202H and PsCPR-L under T7 promoters (7.0 kb)
The following species abbreviations are used in gene names: Bm - Bombyx mori, Ps - Papaver somniferum, Ec - Escherichia coli, Cj - Coptis japonica, Tf - Thalictrum flavum, Pp - Pseudomonas putida, At - Arabidopsis thaliana, Ec - Eschscholzia californica.
Example 2. Machine Learning Model
Data Collection.
Training Datasets. EC numbers and reaction information for 38,320 enzymatic reactions were collected from the BRENDA and KEGG databases to build three types of prediction models: E-models, SE-models, and SEP-models. EC numbers whose substrates or products are proteins, DNA, or metal complexes, and those with very specific reactions, were removed from the training datasets. Enzyme sequences and simplified molecular-input line-entry system (SMILES) strings for substrates and products were collected from Swiss-Prot and PubChem, respectively. For each gene, enzyme amino acid sequences from various species were aligned using MAFFT (Katoh et al., Nucleic Acids Res. 2002, 30, 3059-3066). Consensus sequences were derived from the alignments using EMBOSS (Rice et al., Trends Genet. 2000, 16, 276-277) to decrease the size of the training datasets. After removing duplicate information, 2882 enzyme, 25,320 substrate-enzyme, and 33,263 substrate-enzyme-product datasets were used to build E-models, SE-models, and SEP-models, respectively.
Test Datasets. Sequences, annotations, and EC numbers for 838 Escherichia coli K-12 enzymes were collected from Swiss-Prot to train and evaluate the three models. For this study, only E. coli K-12 was selected, based on the availability of detailed annotations. Substrate and product datasets for all known EC numbers were collected according to KEGG. A total of 838 enzyme, 275 substrate-enzyme, and 299 substrate-enzyme-product datasets were used to test E-models, SE-models, and SEP-models, respectively. A total of 210 enzyme sequences were included in the 275 substrate-enzyme and 299 substrate-enzyme-product datasets. Vectors used in the test datasets were not included among the training dataset vectors.
Databases for comparing the sequence similarity of test sequences with training sequences were built using BLAST+ 2.7.1 (Altschul et al., Nucleic Acids Res. 1997, 25, 3389-3402). BLAST results were also used to infer the function of the 210 test sequences, for comparison with machine learning prediction using the E-, SE-, and SEP-models.
Feature Extractions. E-vectors of 1437 dimensions were constructed from enzyme amino acid sequence features using PROFEAT (Li et al., Nucleic Acids Res. 2006, 34, W32-W37) with seven descriptors. PROFEAT is a reliable system that can extract various enzyme sequence features and select multiple descriptors. These descriptors have been established in many protein sequence analysis studies. Similarly, S- and P-vectors of 1387 dimensions each were derived from their respective chemical structural features using DRAGON (version 7.0.4) (Kode Chemoinformatics srl. DRAGON 7.0. chm.kodesolutions.net/products_dragon.php) with 13 descriptors. The DRAGON descriptors have been used to express various compound features by calculating quantitative structure-property relationships and quantitative structure-activity relationships. The descriptors were applied to extract compound chemical structure in two-dimensional (2D) space. Several studies have used both PROFEAT and DRAGON descriptors to predict drug-target interactions. E-vectors, SE-vectors, and SEP-vectors have 1437, 2824 (1437 + 1387), and 4211 (1437 + 2 × 1387) dimensions, respectively. The three types of vectors were extracted from individual enzymatic reactions. Test vectors were normalized based on training vectors. Moreover, these descriptors were evaluated for enzymatic reaction prediction.
Machine Learning. Machine learning can rapidly learn various types of data, including vectors derived from substrates, products, and enzyme sequences. Multiple machine learning algorithms were employed to build enzymatic reaction prediction models for critical comparison. Support Vector Machine (SVM), Random Forests (RF), k-Nearest Neighbor (kNN), and Multilayer Perceptron (MLP), which have been demonstrated in various biological annotation predictions, were used in this study. Explanations of each method are given in the Supporting Information.
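The vector construction described under Feature Extractions can be sketched as follows. This is a minimal illustration, not the procedure actually used: the dimensions (1437 and 1387) come from the text, while the function names and the choice of z-score scaling are assumptions, since the text states only that test vectors were normalized based on training vectors.

```python
from statistics import mean, pstdev

# Dimensions stated in the text: E-vector 1437, S- and P-vectors 1387 each.
E_DIM, S_DIM, P_DIM = 1437, 1387, 1387


def build_sep_vector(e_vec, s_vec, p_vec):
    """Concatenate enzyme, substrate and product descriptors (hypothetical helper)."""
    return list(e_vec) + list(s_vec) + list(p_vec)


def fit_scaler(train_rows):
    """Per-dimension mean and standard deviation, computed from training vectors only."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    # Assumption: constant columns get a divisor of 1.0 so scaling stays defined.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_scaler(row, means, stds):
    """Scale any vector (training or test) with the training statistics."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]
```

Fitting the scaler on training vectors alone and reusing it for test vectors keeps information from the E. coli K-12 test set out of the training statistics.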
The E-model, SE-model, and SEP-model were built by learning E-vectors, SE-vectors, and SEP-vectors, respectively, in combination with the corresponding EC number first digits. Six types of SVM-OvR-models (e.g., EC1 or Rest, EC2 or Rest, ...) and an SVM-Multi-model that merges 15 types of pairwise classifiers (EC1 or EC2, EC1 or EC3, ..., EC2 or EC3, ..., EC5 or EC6) were built to predict the six EC number first digits. RF, kNN, and MLP are normal multi-models (e.g., EC1 or EC2 or ... or EC6). In OvR-models, the posterior probability for test samples was calculated by Platt's method, which is based on the distance from the decision boundary. OvR test sample prediction classes were determined via probability thresholds, which were selected in advance. In Multi-models, on the other hand, the prediction class was determined as the class with the highest score. All machine learning models were evaluated using the E. coli K-12 test. Cross-validation was also included for SVM-based models using One-versus-Rest (OvR) and Multiclass One-versus-One (Multi) methods. Each model was optimized by tuning hyperparameters. In SVM E-models, the hyperparameters with the highest Accuracy were used for cross-validation, and in the SE- and SEP-models, the same parameters were used because Accuracy was consistent throughout all cross-validation results. On the other hand, in multi-models, the hyperparameters with the highest Macro F1 scores were selected for the E. coli K-12 test. Each machine learning algorithm was used as implemented in the scikit-learn library. Various parameters were used for cross-validation and the E. coli K-12 test because Accuracy is not always the best metric for prediction when training and test datasets are imbalanced. Macro Precision, Recall, F1 score, and AUC were used as metrics because the datasets are imbalanced across the six EC number first digits.
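The two decision rules and the macro-averaged metrics described above can be illustrated with a short standard-library sketch. The function names, the probability values, and the 0.5 thresholds used in the usage note are hypothetical. The sketch takes class probabilities or scores as given and does not reproduce the SVM training or Platt scaling itself.

```python
def predict_ovr(probabilities, thresholds):
    """One-versus-Rest rule: keep every EC class whose probability reaches its threshold."""
    return [ec for ec, p in sorted(probabilities.items()) if p >= thresholds[ec]]


def predict_multi(scores):
    """Multi-class rule: return the single class with the highest score."""
    return max(scores, key=scores.get)


def macro_metrics(y_true, y_pred, classes):
    """Unweighted mean over classes of precision, recall and F1."""
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

For example, with true labels `[1, 1, 1, 1, 2, 2]` and predictions `[1, 1, 1, 1, 1, 2]`, Accuracy is 5/6 while macro Recall is only 0.75, because the error falls on the minority class. This is the kind of imbalance effect that motivates macro-averaged metrics.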
Principal Component Analysis (PCA). PCA was used to dimensionally compress and extract features for the SE- and SEP-models. PCA orthogonally projects data onto a lower dimensional linear space known as the principal subspace, resulting in maximization of projected data variance. SE- and SEP-vector dimensions were compressed using PCA to decrease model building time, especially for the SVM-OvR models. Furthermore, PCA was used to identify important features of training vectors because the number of dimensions increases when adding the substrate and product information. SE- and SEP-vectors were compressed into six types and seven types of dimensions, respectively, and the resulting models were then compared. In addition, 30 dimensions were determined in descending order of factor loadings, up to the 10th principal component. The effect of origin vector dimensions on each principal component dimension was evaluated. Important variables for enzymatic reaction prediction were evaluated by removing descriptor dimensions throughout the 30 dimensions and 10 principal components followed by comparing prediction results with all other machine learning SEP-models, not including SVM-Multi. PCA from the scikit-learn library was used.
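As a rough illustration of the PCA step, the sketch below finds the first principal component of a small dataset by power iteration and ranks the original dimensions by the absolute size of their coefficients in that component. It is a toy stand-in written with the standard library only. The study itself used PCA from scikit-learn, and all function names here are hypothetical. Within a single component, ranking by eigenvector coefficient gives the same order as ranking by factor loading, since the two differ only by a common scale factor.

```python
import math


def center(rows):
    """Subtract the per-dimension mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered_rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered_rows), len(centered_rows[0])
    return [[sum(r[i] * r[j] for r in centered_rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration.

    Assumes the covariance is non-zero and the start vector is not
    orthogonal to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def rank_features(component, top=3):
    """Indices of original dimensions, largest absolute coefficient first."""
    order = sorted(range(len(component)), key=lambda i: abs(component[i]), reverse=True)
    return order[:top]
```

Projecting each centered vector onto the leading components gives the compressed representation, and the ranking indicates which original descriptor dimensions dominate each component.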
Sequence Listing.
P. somniferum pyruvate decarboxylase 1 (PsPDC1)
MDFKVGSLDTPKPANGDVGSLPANHVSTLKTSTSSTQLCSAEATLGRHLARRLVQVGVSDVFAV
PGDFNLTLLDDLIAEPGLKLVGCCNELNAGYAADGYARSRGVGACAVTFTVGGLSILNAIAGAY
SENLPIICIVGGPNSNDYGTNRILHHTIGLPDFSQELRCFQTVTCYQAIVNNLEDAHEQIDTAISTAL
KESKPVYISVSCNLSAIPHPTFSREPVPFCLAPKLSNSLGLEAAVDAAAEFLNKAVKPVMVAGPKL
RVAKACDAFLELADACGYPVAVMPSAKGLMKETHPHFIGTYWGAVSTAFCAEIVESADAYIFAG
PIFNDYSSVGYSLLLKKEKAIIVQPDRVVIANGAAFGCVLMKEFLPALAKKLQRNTTAYENYHRI
YVPEGLPPRCDPKEPLRVNILFKHIQKMLSGDSAVIAETGDSWFNCQKLKLPEGCGYEFQMQYGS IGWSVGATLGYAQAAKDKRVIACIGDGSFQVTAQDISTMLRCEQNTIIFLINNGGYTIEVEIHDGP YNVIKNWNYTALVDAIHNGDGKCWTTKVQCEEELVEAIETATEVKKDCLCFIEIVVHKDDTSKE LLEWGSRVSSANSRPPNPQ (SEQ ID NO: 1)
P. somniferum pyruvate decarboxylase 1 (PsPDCl) DNA atggatttcaaagttggttctcttgatacacccaaacccgctaatggagacgtgggttctctacctgcaaaccatgtatcaaccctcaagacttcgacatcgtc gacacaattatgctcagctgaagcaacattagggaggcatttggcgaggcgtttagtacaagttggtgtcagcgatgtctttgcggtgcccggtgattttaat ctcacattgcttgatgatcttattgctgaacctgggttgaaattagttggttgttgtaatgagcttaatgctggttatgctgctgatggttatgcgaggtctcgtgg tgttggtgcttgtgctgttacttttactgttggtggtttgagtattttgaatgctattgctggtgcttacagtgagaatttgccaatcatctgtattgttggtggtccta attctaatgattatgggacgaatcgcatcttgcatcatactattggtttgccggattttagtcaagaactccgttgtttccagaccgtcacttgttatcaggcaatt gttaataacttggaagatgcacatgagcaaatcgatactgcgatttcgacagctttgaaggaaagcaagcctgtttatatcagtgtcagttgcaatttgtctgc gataccacatcctacgtttagtcgagagcctgttccattctgcttagcacctaagttgagtaatagtttgggtttggaagcagcagtggatgctgcagcagag ttcttgaataaggcagtaaagccagtgatggtggcagggccaaaactaagggttgccaaggcttgcgatgcgtttcttgaactggccgatgcctgtggttat ccagttgcggtgatgccatcagccaaaggactaatgaaggagactcatccacatttcatcggaacttattggggtgcagtaagcacggctttctgtgctga gattgttgaatcagctgatgcttacatatttgcaggaccaatctttaatgactacagctctgtggggtactctctacttctcaagaaggagaaggcgattatcgt gcagcctgaccgggttgtgattgctaatggagctgcatttggatgtgttttgatgaaggaattcttaccagcattggctaagaaacttcagcgaaacacaact gcttatgagaattaccacaggatttatgttcccgaagggcttcctcctcgatgtgacccaaaagagccattgagggttaacatattgttcaaacacattcaaaa gatgctatcaggtgacagtgctgtgattgccgaaacgggtgattcctggtttaactgccagaaattgaaattacccgaagggtgcgggtatgaattccaaat gcagtatgggtctattggttggtcagttggggcaacacttgggtatgctcaggctgcaaaggataagcgagtgattgcttgtattggcgatggaagtttcca ggtaactgcacaagatatatcaacaatgttaaggtgtgagcagaataccatcatcttcctgataaacaatggtgggtacaccattgaagttgagatccatgat ggaccttacaatgtgattaagaactggaactacactgccttagttgacgcgattcataacggtgatggcaaatgctggaccaccaaggttcaatgtgaaga agagctggtggaagcaattgagactgcgactgaagttaagaaggattgcttgtgtttcattgagattgtagttcacaaggatgacacaagcaaagagttgct ggaatggggttcaagggtttcttctgcaaacagtcgaccaccaaatcctcagtaa (SEQ ID NO: 2)
27 N-terminal residue truncated P. somniferum PDC1 isoform X1
MSSQIELGTSLHPTNSSPVPLTNASNSATLGRHLARRLVQAGVKDVFSVPGDFNLCLLDHLIAEPE
LNLVGCCNELNAGYAADGYARANGVGACVVTFTVGGLSILNAIAGAYSENLPVICIVGGPNSND
YGTNRILHHTIGLPDFTQELRCFQTVTCFQAVVNNLDDAHELIDTAISTALKESKPVYISIGCNLPA
VPHPTFTREPVPFYLAPRISNQMGLEAAVEAAAAFLNKAVKPVIVGGPRLRVCKAQQAFVELAD
ASGYPIAVMPSGKGLIPEHHPHFIGTYWGAVSSSFCGEIVESADAYVFVGPIFNDYSSVGYSLLIKK
EKAIIIQPNRVTIGDGPSFGWVFMADFLTALASKLKRNTTAMENHRRIFVPPGIALKREANEPLRV
NILFKHIQEMLSGDTAVIAETGDSWFNCQKLHLPENCGYEFQMQYGSIGWSVGATLGYAQAVK
HKRVIACIGDGSFQVTAQDVSTMIRCGQKSIIFLINNGGYTIEVEIHDGPYNVIKNWNYTKFVDAI
HNGEGKCWTTKVKTEEELIEAIAKATGDEKDSLCFIEVLVHKDDTSKELLEWGSRVSAANSRPPN
PQ (SEQ ID NO: 3)
27 N-terminal residue truncated P. somniferum PDC1 isoform XI DNA atgggaagttctcaaattgaacttggaaccagtctacatcctacaaactcatcacccgtaccactaactaatgcttcaaattctgcaacacttggtagacactt agcacgtcgtctagttcaagctggtgtaaaagatgtgttctcagtacctggtgattttaacttgtgtttattagatcatctaatagctgaaccggagctcaactta gttggttgctgtaatgaacttaatgctggttatgctgccgatggttatgcaagagcaaatggtgtcggtgcttgtgttgttacttttactgttggtggacttagtatt cttaatgcaattgctggtgcttatagtgaaaatctacctgttatttgtattgtcggtggtcctaattctaatgattatggtactaatcgtattcttcatcatactattgg attacctgattttactcaagaacttcgatgctttcaaactgttacttgtttccaggctgtagttaacaacttggatgatgcacatgagctgattgacactgccatct ccactgctttgaaagaaagcaagcctgtttatatcagcattggctgtaacttacctgcagttcctcacccaaccttcactagggagcctgttccgttctatcttg ctccaaggattagcaatcaaatggggctagaggctgcagtggaggcagcagcagcatttttgaacaaggctgtaaagcctgtgattgtgggagggccta ggttaagggtgtgcaaggctcaacaagcatttgttgagctagcagatgccagcgggtatcccatagctgttatgccatcaggcaaaggtctgatacctgaa catcaccctcacttcataggaacatactggggtgccgtcagttccagcttctgtggtgaaattgtggagtcagcggatgcctatgtttttgttggtccaattttta atgactacagttctgtgggatactcgttgcttatcaagaaggagaaagccataattatacagcctaaccgggttaccatcggtgatggcccttcttttggatgg gtctttatggctgacttcttgactgctttagcctcaaaactgaagaggaacactacagctatggaaaatcatcgcagaatctttgtcccgcccggtatcgctct gaagcgtgaggctaatgaaccgttgagagtcaacatcctcttcaaacatattcaggaaatgctgagcggagacacagctgttattgcagaaacaggagatt catggttcaattgtcagaaattacatctcccagaaaattgcggatatgagttccagatgcagtacggatctattggatggtcagtaggtgcaacccttggatat gcacaggctgtcaaacataagcgtgtcattgcctgcattggtgatggcagtttccaggtaacagctcaggatgtatccacaatgatccgctgtggccagaa gagtatcatattcctcatcaacaacggaggatacacaattgaagttgagatccatgacgggccatacaatgtaatcaaaaactggaattacaccaagttcgtt gatgccatccataatggtgaaggaaaatgttggaccaccaaggtgaaaacagaggaggaactaattgaagcgattgcaaaagcaacaggagatgaaaa ggatagcttatgctttatagaagtcttggtgcacaaagatgatacgagcaaagaactgttagagtggggatcaagggtctctgctgccaatagccgcccac ccaatcctcagtag (SEQ ID NO: 4) P. somnife im tyrosine decarboxylase 1 (PsTyDCl)
MGSLPANNFESMSLCSQNPLDPDEFRRQGHMIIDFLADYYKNVEKYPVRTQVDPGYLKKRLPES
APYNPESIETILEDVTNDIIPGLTHWQSPNYFAYFPSSGSIAGFLGEMLSTGFNVVGFNWMSSPAAT
ELESIVMNWLGQMLTLPKSFLFSSDGSSGGGGVLQGTTCEAILCTLTAARDKMLNKIGRENINKL
VVYASDQTLSALQKAAQIAGINPKNFLAIATSKATNFGLSPNSLQSTILADIESGLVPLFLCATVGT
TSSTAVDPIGPLCAVAKLHGIWVHIDAAYAGSACICPEFRHFIDGVEDADSFSLNAHKWFFTTLDC
CCLWVKDSDSLVKALSTSPEYLKNKATDSKQVIDYKDWQIALSRRFRSMKLWLVLRSYGIANLR
TFLRSHVKMAKHFQLIGMDNRFEIVVPRTFAMVCFRLKPAAIFRKKIVEDDHIEAQTNEVNAKLL
ESVNASGKIYMTHAVVGGVYMIRFAVGATLTEERHVTGAWKVVQEHTDAILGALGEDVC (SEQ
ID NO: 5)
P. somnifenim tyrosine decarboxylase 1 (PsTyDCl) DNA atgggaagtcttccagctaataactttgaaagcatgtcgctgtgttcgcaaaatccacttgatccagatgaattcagaaggcaaggtcacatgattattgattt ccttgctgattactacaaaaatgttgagaaatatccagttagaacccaagtcgatcccggttatttgaagaaaaggttacccgaatcagctccgtacaatcct gaatccattgaaaccattcttgaagatgtgacaaatgatatcatccctggtctaactcactggcaaagtccaaattactttgcttattttccttctagtggttctatc gctggtttcctaggggaaatgctaagtaccggatttaatgttgtcgggtttaattggatgtcatctccggccgcaactgagttggagagtattgttatgaattgg cttggccagatgcttacgcttcccaaatcatttctcttttcatcagacggaagttcgggaggtggaggagttttgcaagggactacttgtgaagccattttatgt actctaactgcggcaagagataaaatgctgaacaaaattggtagagaaaatattaacaagttggttgtttatgcttctgatcaaaccctaagtgcactacaga aagctgctcaaattgctgggattaatcctaagaatttccttgctatcgcaacctccaaggctacaaattttggtctctctccaaattcacttcaatcgacaattctt gctgatatcgaatccgggttagttccattgtttctctgtgccactgtcggaacaacttcttcaacagccgtagatcctattggcccactttgcgcggtggcaaa attgcacggtatttgggttcacattgatgctgcatacgctggaagtgcatgtatctgcccagagttcaggcacttcatcgatggtgtggaagatgcagactca tttagtctaaatgcacacaagtggttctttactactttggattgttgctgtttatgggtgaaagactctgattcactggtcaaggcattatcaacaagtccagaata tttgaagaacaaagcaactgattccaaacaagttatcgattacaaagattggcaaatagcgctcagcagaagattccgatccatgaaactctggttagtactt cgcagctatggaattgctaacttaagaaccttccttaggagccatgttaaaatggctaagcactttcaggggctcattggtatggacaacaggtttgagattgt agttcctagaacatttgccatggtgtgctttcgccttaaaccagctgccatttttaggaaaaaaatagttgaagatgatcacattgaagctcaaacaaatgagg taaatgcgaaattgcttgaatcagtcaatgcgtccgggaagatatacatgactcatgctgttgttggaggggtgtacatgattcggtttgccgtcggggcaac actgacagaggaaagacatgtcactggggcttggaaggtggtacaggagcatacagatgccatacttggtgcactaggtgaggatgtttgttaa (SEQ
ID NO: 6)
PsTyDC1-Leu205His
MGSLPANNFESMSLCSQNPLDPDEFRRQGHMIIDFLADYYKNVEKYPVRTQVDPGYLKKRLPES
APYNPESIETILEDVTNDIIPGLTHWQSPNYFAYFPSSGSIAGFLGEMLSTGFNVVGFNWMSSPAAT
ELESIVMNWLGQMLTLPKSFLFSSDGSSGGGGVLQGTTCEAILCTLTAARDKMLNKIGRENINKL
VVYASDQTHSALQKAAQIAGINPKNFLAIATSKATNFGLSPNSLQSTILADIESGLVPLFLCATVGT
TSSTAVDPIGPLCAVAKLHGIWVHIDAAYAGSACICPEFRHFIDGVEDADSFSLNAHKWFFTTLDC
CCLWVKDSDSLVKALSTSPEYLKNKATDSKQVIDYKDWQIALSRRFRSMKLWLVLRSYGIANLR
TFLRSHVKMAKHFQGLIGMDNRFEIVVPRTFAMVCFRLKPAAIFRKKIVEDDHIEAQTNEVNAKL
LESVNASGKIYMTHAVVGGVYMIRFAVGATLTEERHVTGAWKVVQEHTDAILGALGEDVC
(SEQ ID NO: 7)
PsTyDCl-Tyr98Phe-Phe99Tyr-Leu205Asn
MGSLPANNFESMSLCSQNPLDPDEFRRQGHMIIDFLADYYKNVEKYPVRTQVDPGYLKKRLPES
APYNPESIETILEDVTNDIIPGLTHWQSPNYFAFYPSSGSIAGFLGEMLSTGFNVVGFNWMSSPAAT
ELESIVMNWLGQMLTLPKSFLFSSDGSSGGGGVLQGTTCEAILCTLTAARDKMLNKIGRENINKL
VVYASDQTNSALQKAAQIAGINPKNFLAIATSKATNFGLSPNSLQSTILADIESGLVPLFLCATVGT
TSSTAVDPIGPLCAVAKLHGIWVHIDAAYAGSACICPEFRHFIDGVEDADSFSLNAHKWFFTTLDC
CCLWVKDSDSLVKALSTSPEYLKNKATDSKQVIDYKDWQIALSRRFRSMKLWLVLRSYGIANLR
TFLRSHVKMAKHFQGLIGMDNRFEIVVPRTFAMVCFRLKPAAIFRKKIVEDDHIEAQTNEVNAKL
LESVNASGKIYMTHAVVGGVYMIRFAVGATLTEERHVTGAWKVVQEHTDAILGALGEDVC
(SEQ ID NO: 8)
P. somniferum tyrosine decarboxylase 3 (PsTyDC3)
MGSLNTEDVLEHSSAFGATNPLDPEEFRRQGHMIIDFLADYYRDVEKYPVRSQVEPGYLRKRLPE
TAPYNPESIETILQDVTSEIIPGLTHWQSPNYYAYFPSSGSVAGFLGEMLSTGFNVVGFNWMSSPA
ATELEGIVMDWFGKMLNLPKSYLFSGTGGGVLQGTTCEAILCTLTAARDRKLNKIGREHIGRLVV
YGSDQTHCALQKAAQIAGINPKNFRAVKTFKANSFGLAASTLREVILEDIEAGLIPLFVCPTVGTT
SSTAVDPIGPICEVAKEYEMWVHIDAAYAGSACICPEFRHFIDGVEEADSFSLNAHKWFFTTLDCC CLWVKDPSSLVKALSTNPEYLRNKATESRQVVDYKDWQIALIRRFRSMKLWMVLRSYGVTNLR NFLRSHVRMAKTFEGLVGADRRFEITVPRTFAMVCFRLLPPTTVKVCGENGVHQNGNGVIAVLR NENEELVLANKLNQVYLRQVKATGSVYMTHAVVGGVYMIRFAVGSTLTEERHVIHAWEVLQE HADLILSKFDEANFSS (SEQ ID NO: 9)
P. somniferum tyrosine decarboxylase 3 (PsTyDC3) DNA atgggtagtcttaacacagaagatgttcttgaacacagttcagctttcggtgcaacaaacccattagacccagaagaattcagaagacaaggtcacatgata atcgacttcttagctgattattacagagatgtcgagaaatatccagttcgaagtcaagtagaacccggttatctacgtaaaagattaccagaaacagctccat acaatccagaatctatcgaaacgattcttcaagatgtgacgagtgagattattccagggttaacacattggcaaagtcctaattactatgcttatttcccttcca gtggttccgttgctggattcctcggtgaaatgcttagtactggttttaatgtcgttggttttaactggatgtcttcacctgctgctacagaactcgagggtatgtt atggattggttcggcaaaatgcttaaccttccaaaatcatacttgttctctggtaccggtggtggagttttacagggaactacttgtgaagctatcttatgtacat taacagctgcaagagacagaaagttgaacaaaatcggtcgtgaacatatcggaagattagttgtttatggatctgatcagactcactgtgcactacagaaag ctgctcagattgcaggaatcaaccccaagaacttccgtgctgttaagacgtttaaagctaattcattcggattagcagcttcaactctaagagaagttattcttg aagatattgaagccgggttgatccctctgtttgtatgtccaacggtcggaactacatcatcgactgcagtggatccaatcggtcctatctgtgaagtggcgaa agaatacgaaatgtgggttcacatcgacgcagcttacgctggaagtgcatgtatctgtcccgagtttagacactttatcgacggagtggaggaagcagatt cattcagtctcaatgcgcataaatggtttttcacaactttggattgttgttgtctttgggttaaagatccaagttccctggttaaagctctttccacaaatcctgagt acttgagaaacaaagctacagagtcaagacaggtcgttgactacaaagactggcagatcgcactcattcgccgattccgatccatgaagctttggatggttt tacgtagctatggtgtgactaatctgagaaatttcttgaggagtcatgttagaatggcaaagacatttgagggtctcgttggtgcggataggagattcgaaatt actgtgcctaggacgtttgctatggtctgcttccgccttttacccccaacaaccgtaaaggtatgcggtgaaaatggagtacaccagaatggaaacggggt cattgcagtactacgcaatgaaaatgaagaattagtccttgctaataagctgaatcaagtgtatttgagacaggtcaaggcaacaggtagtgtttatatgaca catgcggtcgtcggaggtgtctacatgattcggttcgcagtcggttcgaccttgacagaggaacgccatgttattcatgcttgggaggttttgcaagagcat gcagatctgattcttagtaagttcgatgaagcaaattttagtagttaa (SEQ ID NO: 10)
PsTyDC3-Ile370Ser
MGSLNTEDVLEHSSAFGATNPLDPEEFRRQGHMIIDFLADYYRDVEKYPVRSQVEPGYLRKRLPE
TAPYNPESIETILQDVTSEIIPGLTHWQSPNYYAYFPSSGSVAGFLGEMLSTGFNVVGFNWMSSPA
ATELEGIVMDWFGKMLNLPKSYLFSGTGGGVLQGTTCEAILCTLTAARDRKLNKIGREHIGRLVV
YGSDQTHCALQKAAQIAGINPKNFRAVKTFKANSFGLAASTLREVILEDIEAGLIPLFVCPTVGTT
SSTAVDPIGPICEVAKEYEMWVHIDAAYAGSACICPEFRHFIDGVEEADSFSLNAHKWFFTTLDCC
CLWVKDPSSLVKALSTNPEYLRNKATESRQVVDYKDWQIALSRRFRSMKLWMVLRSYGVTNLR
NFLRSHVRMAKTFEGLVGADRRFEITVPRTFAMVCFRLLPPTTVKVCGENGVHQNGNGVIAVLR
NENEELVLANKLNQVYLRQVKATGSVYMTHAVVGGVYMIRFAVGSTLTEERHVIHAWEVLQE
HADLILSKFDEANFSS (SEQ ID NO: 11)
PsTyDC3-Tyr100Phe-Phe101Tyr-His203Asn
MGSLNTEDVLEHSSAFGATNPLDPEEFRRQGHMIIDFLADYYRDVEKYPVRSQVEPGYLRKRLPE
TAPYNPESIETILQDVTSEIIPGLTHWQSPNYYAFYPSSGSVAGFLGEMLSTGFNVVGFNWMSSPA
ATELEGIVMDWFGKMLNLPKSYLFSGTGGGVLQGTTCEAILCTLTAARDRKLNKIGREHIGRLVV
YGSDQTNCALQKAAQIAGINPKNFRAVKTFKANSFGLAASTLREVILEDIEAGLIPLFVCPTVGTT
SSTAVDPIGPICEVAKEYEMWVHIDAAYAGSACICPEFRHFIDGVEEADSFSLNAHKWFFTTLDCC
CLWVKDPSSLVKALSTNPEYLRNKATESRQVVDYKDWQIALIRRFRSMKLWMVLRSYGVTNLR
NFLRSHVRMAKTFEGLVGADRRFEITVPRTFAMVCFRLLPPTTVKVCGENGVHQNGNGVIAVLR
NENEELVLANKLNQVYLRQVKATGSVYMTHAVVGGVYMIRFAVGSTLTEERHVIHAWEVLQE
HADLILSKFDEANFSS (SEQ ID NO: 12)
P. somniferum tyrosine decarboxylase 6 (PsTyDC6)
MGSLPANNFESMSLCSQNPLDPDEFRRQGHMIIDFLADYYKNVEKYPVRSQVEPGYLKKRLPES
APYNPESIETILEDVTNDIIPGLTHWQSPNYFAYFPSSGSIAGFLGEMLSTGFNVVGFNWMSSPAAT
ELESIVMNWLGQMLTLPKSFLFSSDGSSGGGGVLQGTTCEAILCTLTAARDKMLNKIGRENINKL
VVYASDQTHCALQKAAQIAGINPKNFRAIATSKATNFGLSPNSLQSTILADIESGLVPLFLCATVG
TTSSTAVDPIGPLCAVAKLHGIWVHIDAAYAGSACICPEFRHFIDGVEDADSFSLNAHKWFFTTLD
CCCLWVKDSDSLVKALSTSPEYLKNKATDSKQVIDYKDWQIALSRRFRSMKLWLVLRSYGIANL
RTFLRSHVKMAKHFQGLIGMDNRFEIVVPRTFAMVCFRLKPAAIFRKKIVEDDHIEAQTNEVNAK
LLESVNASGKIYMTHAVVGGVYMIRFAVGATLTEERHVTGAWKVVQEHTDAILGALDGKTTTI
HEILD (SEQ ID NO: 13) P. somniferum tyrosine decarboxylase 6 (PsTyDC6) DNA atgggaagtcttccagctaataactttgaaagcatgtcgctgtgttcgcaaaatccacttgatccagatgaattcagaaggcaaggtcacatgattattgattt ccttgctgattactacaaaaatgttgagaaatatccagttagaagccaagtcgagcccggttatttgaagaaaaggttacccgaatcagctccgtacaatcct gaatccattgaaaccattcttgaagatgtgacaaatgatatcatccctggtctaactcactggcaaagtccaaattactttgcttattttccttctagtggttctatc gctggtttcctaggggaaatgctaagtaccggatttaatgttgtcgggtttaattggatgtcatctccggccgcaactgagttggagagtattgttatgaattgg cttggccagatgcttacgcttcccaaatcatttctcttttcatcagacggaagttcgggaggtggaggagttttgcaagggactacttgtgaagccattttatgt actctaactgcggcaagagataaaatgctgaacaaaattggtagagaaaatattaacaagttggttgtttatgcttctgatcaaacccattgtgcactacagaa agctgctcaaattgctgggattaatcctaagaatttccgtgctatcgcaacctccaaggctacaaattttggtctctctccaaattcacttcaatcgacaattctt gctgatatcgaatccgggttagttccattgtttctctgtgccactgtcggaacaacttcttcaacagccgtagatcctattggcccactttgcgcggtggcaaa attgcacggtatttgggttcacattgatgctgcatatgctggaagtgcatgtatctgcccagagttcaggcacttcatcgatggtgtggaagatgcagactcat ttagtctaaatgcacacaagtggttctttactactttggattgttgctgtttatgggtgaaagactctgattcactggtcaaggcattatcaacaagtccagaatat ttgaagaacaaagcaactgattccaaacaagttatcgattacaaagattggcaaatagcgctcagcagaagattccgatccatgaaactctggttagtacttc gcagctatggaattgctaacttaagaaccttccttaggagtcatgttaaaatggctaagcactttcaggggctcattggtatggacaacaggtttgagattgta gttcctagaacatttgccatggtgtgctttcgccttaaaccagctgccatttttaggaaaaaaatagttgaagatgatcacattgaagctcaaacaaatgaggt aaatgcgaaattgcttgaatcagtcaatgcgtccgggaagatatacatgactcatgctgttgttggaggggtgtacatgattcggtttgccgtcggggcaac actgacagaggaaagacatgtcactggggcttggaaggtggtacaggagcatacagatgccatactcggtgccttggatggtaaaactactaccattcat gaaattctcgattaa (SEQ ID NO: 14)
S. cerevisiae phenylpyruvate decarboxylase ARO10
MAPVTIEKFVNQEERHLVSNRSATIPFGEYIFKRLLSIDTKSVFGVPGDFNLSLLEYLYSPSVESAG
LRWVGTCNELNAAYAADGYSRYSNKIGCLITTYGVGELSALNGIAGSFAENVKVLHIVGVAKSI
DSRSSNFSDRNLHHLVPQLHDSNFKGPNHKVYHDMVKDRVACSVAYLEDIETACDQVDNVIRDI
YKYSKPGYIFVPADFADMSVTCDNLVNVPRISQQDCIVYPSENQLSDIINKITSWIYSSKTPAILGD
VLTDRYGVSNFLNKLICKTGIWNFSTVMGKSVIDESNPTYMGQYNGKEGLKQVYEHFELCDLVL
HFGVDINEINNGHYTFTYKPNAKIIQFHPNYIRLVDTRQGNEQMFKGINFAPILKELYKRIDVSKLS
LQYDSNVTQYTNETMRLEDPTNGQSSIITQVHLQKTMPKFLNPGDVVVCETGSFQFSVRDFAFPS
QLKYISQGFFLSIGMALPAALGVGIAMQDHSNAHINGGNVKEDYKPRLILFEGDGAAQMTIQELS
TILKCNIPLEVIIWNNNGYTIERAIMGPTRSYNDVMSWKWTKLFEAFGDFDGKYTNSTLIQCPSKL
ALKLEELKNSNKRSGIELLEVKLGELDFPEQLKCMVEAAALKRNKK (SEQ ID NO: 15)
S. cerevisiae phenylpyruvate decarboxylase AROIO DNA atggcacctgttacaattgaaaagttcgtaaatcaagaagaacgacaccttgtttccaaccgatcagcaacaattccgtttggtgaatacatatttaaaagatt gttgtccatcgatacgaaatcagttttcggtgttcctggtgacttcaacttatctctattagaatatctctattcacctagtgttgaatcagctggcctaagatggg tcggcacgtgtaatgaactgaacgccgcttatgcggccgacggatattcccgttactctaataagattggctgtttaataaccacgtatggcgttggtgaatta agcgccttgaacggtatagccggttcgttcgctgaaaatgtcaaagttttgcacattgttggtgtggccaagtccatagattcgcgttcaagtaactttagtgat cggaacctacatcatttggtcccacagctacatgattcaaattttaaagggccaaatcataaagtatatcatgatatggtaaaagatagagtcgcttgctcggt agcctacttggaggatattgaaactgcatgtgaccaagtcgataatgttatccgcgatatttacaagtattctaaacctggttatatttttgttcctgcagattttgc ggatatgtctgttacatgtgataatttggttaatgttccacgtatatctcaacaagattgtatagtatacccttctgaaaaccaattgtctgacataatcaacaaga ttactagttggatatattccagtaaaacacctgcgatccttggagacgtactgactgataggtatggtgtgagtaactttttgaacaagcttatctgcaaaactg ggatttggaatttttccactgttatgggaaaatctgtaattgatgagtcaaacccaacttatatgggtcaatataatggtaaagaaggtttaaaacaagtctatga acattttgaactgtgcgacttggtcttgcattttggagtcgacatcaatgaaattaataatgggcattatacttttacttataaaccaaatgctaaaatcattcaattt catccgaattatattcgccttgtggacactaggcagggcaatgagcaaatgttcaaaggaatcaattttgcccctattttaaaagaactatacaagcgcattga cgtttctaaactttctttgcaatatgattcaaatgtaactcaatatacgaacgaaacaatgcggttagaagatcctaccaatggacaatcaagcattattacaca agttcacttacaaaagacgatgcctaaatttttgaaccctggtgatgttgtcgtttgtgaaacaggctcttttcaattctctgttcgtgatttcgcgtttccttcgca attaaaatatatatcgcaaggatttttcctttccattggcatggcccttcctgccgccctaggtgttggaattgccatgcaagaccactcaaacgctcacatcaa tggtggcaacgtaaaagaggactataagccaagattaattttgtttgaaggtgacggtgcagcacagatgacaatccaagaactgagcaccattctgaagt gcaatattccactagaagttatcatttggaacaataacggctacactattgaaagagccatcatgggccctaccaggtcgtataacgacgttatgtcttggaa atggaccaaactatttgaagcattcggagacttcgacggaaagtatactaatagcactctcattcaatgtccctctaaattagcactgaaattggaggagctta agaattcaaacaaaagaagcgggatagaacttttagaagtcaaattaggcgaattggatttccccgaacagctaaagtgcatggttgaagcagcggcactt aaaagaaataaaaaatag (SEQ ID NO: 16)
Truncated Thalictrum flavum norcoclaurine synthase (TfNCS)
MKLILTGRPFLHHQGIINQVSTVTKVIHHELEVAASADDIWTVYSWPGLAKHLPDLLPGAFEKLEI IGDGGVGTILDMTFVPGEFPHEYKEKFILVDNEHRLKKVQMIEGGYLDLGVTYYMDTIHVVPTG KDSCVIKSSTEYHVKPEFVKIVEPLITTGPLAAMADAISKLVLEHKSKSNSDEIEAAIITV (SEQ ID NO: 17) Truncated Thalictrum flavum norcoclaurine synthase (TfNCS) DNA atgaaattaatcctgaccggtcgcccgtttttacatcatcagggcatcatcaaccaggtgagcaccgtcaccaaagtcattcaccacgaactggaagtggc ggccagcgccgatgatatctggacggtttacagctggccgggtctggcgaaacatctgccggatctgctgcctggcgcgtttgaaaagctggaaattatc ggcgatggcggcgttggcaccatcctcgatatgacctttgttccgggcgaattcccgcatgaatacaaagaaaaatttattctggttgataacgaacatcgc ctgaaaaaagtgcagatgattgaaggcggttatctggatctcggcgtgacctattatatggataccattcatgtggtgccgacgggtaaagatagctgcgtg attaaatcgtccaccgaatatcacgttaaaccggaatttgtgaaaatcgttgaaccgctgatcaccaccggcccgctggcagcgatggcggatgccatca gcaaactggtgctggagcataaaagtaaaagcaacagcgatgaaatcgaagcggcgattattaccgttaa (SEQ ID NO: 18)
P. somniferum multidomain NCS PsOCNCS3
MRKVIKYDMEVAVSADSVWAVYSSPDIPRLLRDVLLPGVFEKLDVIEGNGGVGTVLDIVFPPGA
VPRSYKEKFVNIDREKRLKEVIMIEGGYLDMGCTFYLDRIHVVEKTKSSCVIESSIVYDAKEECAD
AMSKLITTEPLKSMAEVISNYVIQKESFSARNILSKQSVVKKEIRYDLEVPISADSIWSVYSCPDIPR
LLRDVLLPGVFEKLDVIEGDGGVGTVLDIVFPPGAVPRSYKEKFVNIDREKRLKEVIMIEGGYLD
MGCTFYLDRIHVVEKSLSSCVIESSIVYEVKEEYVDAMSKLITTEPLKSMAEVISNYVIQRESFSAR
NILNKNSLVKKEIRYDLEVPTSADSIWSVYSCPDIPRLLRDVLLPGVFQKLDVIEGNGGVGTVLDI
VFPPGAVPRSYKEKFVNINHEKRLKEVIMIEGGYLDMGCTSYLDRIHVVEKTSKSCIIKSSVVYEV
KQECVEAMSKLITTEPLKSMAEVISNYAMKQQSVSERNIPKKQSLLRKEITYETEVQTSADSIWN
VYSSPDIPRLLRDVLLPGVFEKLDVIAGNGGVGTVLDIAFPLGAVRRRYKEKFVKINHEKRLKEV
VMIEGGYLDMGCTFYMDRIHVFEKTPNSCVIESSIITKLKKSMLVKWLS (SEQ ID NO: 19)
P. somniferum multidomain NCS PsOCNCS3 DNA atgcgcaaagtgattaaatacgatatggaagtggcggtttctgctgacagtgtctgggcggtctattcatcgccagatattccacgcttgttgcgtgacgtctt actgcccggcgtgtttgaaaaattagacgtgatcgaaggcaatggcggcgtcggcaccgtgctggatattgttttcccgccaggcgcggttccgcgttctt ataaagagaagttcgtgaatattgatcgcgaaaaacgcctgaaagaggtcattatgattgaagggggttatcttgatatgggctgcaccttctatctggaccg tattcacgttgtcgaaaaaaccaaaagcagttgcgtcattgaatccagtattgtgtacgacgcgaaagaagaatgcgccgatgcgatgagtaaactgattac taccgagccgttgaaatcgatggcagaagtgattagcaattatgtcattcaaaaagagagttttagcgcgcgtaatattctgtcaaaacaaagcgtggtaaa aaaggaaattcgctatgacctggaggtgccgatttctgcggattctatctggagcgtatatagctgtccggacatcccgcgtctgttacgtgatgtgctgttg cctggcgtgttcgaaaagctggatgtgatcgagggcgatggcggcgttggtaccgttctggacattgtcttcccgccgggggcggtgccgcgcagctac aaagagaagtttgtcaatatcgaccgcgagaaacgtctcaaagaagttatcatgatcgaaggtggctacctggatatgggctgtaccttttacctcgaccgc attcatgtggttgaaaaatctctgtcttcctgcgtgattgagagctccatcgtttatgaagtgaaagaagagtatgttgatgccatgtcgaaactgatcaccac ggaaccgctgaaaagcatggctgaagttatcagcaactatgtgattcagcgtgaaagcttcagcgcccgcaatatcctgaataaaaatagcctggttaaga aagagattcgttatgatctggaagtcccgacctctgccgacagcatctggtcggtttacagctgcccggatatcccacgtttattacgcgacgttttgctgcc gggcgtattccagaaactcgatgttatagaaggcaacggcggtgtcggtacggtactggatatcgtctttccaccgggtgcggtaccgcgcagttataaag agaaattcgttaacatcaaccatgaaaagcgtctgaaggaggttattatgattgaaggcggttaccttgatatggggtgtaccagctatctcgatcgcatcca tgtcgttgagaaaacaagcaaatcctgtattattaagagcagcgttgtttatgaggtgaagcaggaatgtgtggaagcgatgagcaaattgattaccaccga accactgaaatcaatggcggaagtcatcagtaactacgcaatgaaacagcagagcgtcagcgaacgtaacattccgaaaaaacagtcgctgctgcgtaa agaaatcacctacgaaaccgaagtgcagaccagtgctgatagcatttggaacgtgtattccagcccggatattcctcgcctgctgcgcgatgtcctgcttcc cggtgtttttgaaaaactggatgtaattgccggtaacggtggtgtgggcacggtgttggatatcgcctttccgctgggcgcagtgcgtcgccgctacaaag aaaaatttgttaaaattaaccacgaaaagcgcttaaaagaagtggtgatgatcgaaggtggctatctggacatgggttgtacgttttatatggatcgtatccac gtatttgagaaaacgccgaacagctgcgttattgaaagttcgattatcactaaacttaaaaaaagtatgctggtgaaatggctgagttaa (SEQ ID
NO: 20)
Coptis japonica Norcoclaurine Synthase (CjNCS)
MSKNLTGVGGSLPVENVQVLAGKELKNLPNRYVRPELEHDDVVPIDNSLEIPVIDLSRLLDQQYA
CDELAKFHSACLDWGFFQLINHGVREEVIEKMKVDTEDFFRLPFKEKNAYRQLPNGMEGYGQAF
VTSEEQKLDWADMHFLITKPVQERNMRFWPTSPTSFRETMEKYSMELQKVAMCLTGMMAKNL
GLESEILTKPLRTVFNREDELLPSMSSCGEGLGLSPHSDATGLTLLIQVNEVNGLHIKKDEKWVPI
KPILGAFVVNIGDVIEIMSNGIYKSIEHRAVINTDKERLSIAAFHDPEYGTKIGPLPDLVKENGVKY
KTIDYEDYLIRSSNIKLDGKSLLDQMKL (SEQ ID NO: 21)
Coptis japonica Norcoclaurine Synthase (CjNCS) DNA atgagcaagaatcttactggtgtgggaggttcattacctgttgagaacgtccaagttcttgctggtaaagagttgaaaaacttacccaatcgatacgttcgac ctgagttagagcatgatgatgtagttcctattgataactctttggaaatcccagttatcgatctatccagactccttgatcaacaatatgcttgtgatgaacttgca aagttccattcagcctgccttgactggggcttcttccagttgatcaaccatggagtacgagaagaagtgattgagaaaatgaaggttgacactgaggatttct tccgacttcctttcaaggagaagaacgcttataggcagctacccaacggtatggaaggttatggccaagcctttgttacgtctgaagagcaaaagctggatt gggcggacatgcatttcctaattaccaaaccagttcaggaaagaaacatgagattttggcctactagtcccacttcattcagagaaacaatggagaaatact caatggagctgcagaaagttgcaatgtgtctaacaggaatgatggctaagaacttgggacttgaatcagaaatattgacgaagcctttgagaactgtattca accgtgaggatgaattactaccctccatgtcctcatgcggagaaggtttgggactttctccacattctgatgctactggcttaactcttttgattcaagtcaatga agtaaatggactgcacatcaagaaagatgaaaaatgggttccaattaaacccattttaggggctttcgtcgttaacatcggtgatgtaattgagataatgagc aatgggatatacaagagcattgagcatagggcggtcattaacacggacaaggaacgcctctcaatagcggcgttccatgacccagaatatggcactaaa attgggcctctacctgatcttgtgaaagaaaatggtgtgaagtacaagaccatcgattatgaggattacttgatacgttcaagtaatataagcttgatggtaa gagcttattggatcagatgaaactataa (SEQ ID NO: 22)
P. somniferum 2-hydroxyacyl-CoA lyase-like protein (2HCLL)
MAEIDNLTLLEESSNKTPETQIDGNLLIAQSLFRAGVRVMFGVVGIPVTSLANRAVSLGIRFIAFHN
EQSAGYAASAYGYLTGSPGILLTVSGPGCVHGLAGLSNSTINCWPMVMISGSCDQKDIGKGDFQ
ELDQIQAVKPFVKYSIKVTDIKHIPEDVKTVLNSSVFGRPGACYLDVPTDVLHQTVTESVANELLT
VSDQNQILEGTPLANLEVEKAVSLLRNAKRPLIVFGKGAAFARVEDSLKRLVERTGIPFLPTPMG
KGFLPDTHEFAATAARSLAIGFCDVAVVVGARFNWFFHFGEPPKWSEDVKFIFIDVAKEEVELR
KPCVGFVGDAKRVLDRINLEIKDDPFCFGRTHPWIQAIMKKAKENVLRMEVQFAKDVVPFNFFT
PMRIIRDAIFAQGSPAPILVSEGANTMDVGRAVFVQAEPRTRFDAGTWGTMGVGFGYCIAAAVA
SPDRLVVAVEGDSGFGFSAMEVETFVRYQFSVVVIVFNNGGVYGGDRRNPEEIGGPYKDDPAPT
SFVPGAAYHTLIEAFGGKGYLVGTPEELKSALAESFFARKPTVINVTIDPYAGSESGRMQHKN
(SEQ ID NO: 23)
P. somniferum 2HCLL DNA atggcagaaatcgataacctaaccctccttgaagaatcatcaaacaaaaccccagaaacccaaatcgatggcaaccttctcatagctcaatctctgttccgt gcaggcgtcagagttatgtttggtgtagtaggtataccagtaacatcactagcaaacagagcagtatctttaggtataagattcatagcttttcacaatgaaca atctgctggttatgctgcatctgcttatggttatctaactggttcaccaggtatacttcttactgtttctggtcctggttgtgtacatggtttagctggtttgtctaatt ctactattaattgttggcctatggttatgatctctggatcttgtgatcagaaagatattggtaaaggcgattttcaagaacttgatcagattcaagctgtaaaacc ctttgtcaagtattcaatcaaggttactgatattaaacatatccctgaggatgttaaaactgttttgaattcgtctgtattcggtcgtcctggggcttgttatcttgat gttcctactgatgttcttcaccagactgttactgagtctgtggctaatgagcttttgactgtgtctgaccaaaatcaaattctagaaggaacacctttagcaaac ctagaggttgagaaagcagtgtcgttacttaggaatgcgaagagacccttgattgtgtttggtaaaggtgcggcgtttgctagagtggaggattcactgaaa agacttgttgagaggacagggattccgttcttaccgactcctatggggaaagggttgttacctgatacacatgaactagctgcaacagctgcaaggagcct cgcgattggtctatgcgatgttgctgttgttgttggtgcgaggctaaactggttgttgcattttggtgaacccccaaagtggtcagaggatgtgaagtttattttg attgatgtggccaaagaagaggttgagcttagaaaaccttgtgttggtttggttggtgatgccaagcgggttttggataggattaatttggaaattaaagacg atcctttctgtttagggagaacacatccttggatccaagcgattatgaagaaggctaaggagaatgttttgaggatggaagtgcagttggctaaggatgttgt gccatttaatttccttacgccaatgagaatcataagggatgctattctagcacagggaagccctgctccaattttggtttccgagggtgcaaatacgatggat gttggccgagctgttttggttcaggcagagcctaggacaaggctagatgcagggacttgggggacgatgggggtcggtttaggttattgcattgccgctg cagttgcttcgcctgatcgtcttgtggttgctgttgaaggagattctggatttgggttcagcgctatggaagtcgagacattagtgcgataccagttatctgttg ttgtaattgttttcaacaatggtggtgtatatggtggtgatcgcaggaaccctgaagaaattggaggcccttacaaagatgatccagctcccacttcatttgttc egggtgeageatatcacaccctaattgaagettttggagggaaaggttatcttgttggaactecxgaggaaetgaaatetgeaettgcggaatcattttttgct cggaagccaactgtaataaacgttacaatagatccttatgctggctccgagagtgggagaatgcagcacaagaactaa (SEQ ID NO: 24)
P. somniferum PDC2
MNLMLVMLPMVMQEQMVSVLVLLLLLLVDLVFLMQLLVLIVKIYLLFVLSVVLILMIMVLIVFFI
ILLDYLILLKNFDAFKLLLVSRINVVDKMTQAVVNNLDDAHELIDTAISTALKESKPVYISIGCNLP
AVPHPTFTREPVPFYLAPRISNQMGLEAAVEAAAAFLNKAVKPVIVGGPRLRVCKAQQAFVELA
DASGYPIAVMPSGKGLIPEHHPHFIGTYWGAVSSSFCGEIVESADAYVFVGPIFNDYSSVGYSLLIK
KEKAIIIQPNRVTIGDGPSFGWVFMADFLTALASKLKRNTTAMENHRRIFVPPGIALKREANEPLR
VNILFKHIQEMLSGDTAVIAETGDSWFNCQKLHLPENCGYEFQMQYGSIGWSVGATLGYAQAV
KHKRVIACIGDGSFQVTAQDVSTMIRCGQKSIIFLINNGGYTIEVEIHDGPYNVIKNWNYTKFVDA
IHNGEGKCWTTKVKTEEELIEAIAKATGDEKDSLCFIEVLVHKDDTSKELLEWGSRVSAANSRPP
NPQ (SEQ ID NO: 25) P. somniferum PDC2 DNA atgaacttaatgctggttatgctgccgatggttatgcaagagcaaatggtgtcggtgcttgtgttgttacttttactgttggtggacttagtattcttaatgcaattg ctggtgcttatagtgaaaatctacctgttatttgtattgtcggtggtcctaattctaatgattatggtactaatcgtattcttcatcatactattggattacctgatttta ctcaagaacttcgatgctttcaaactgttacttgtttccaggataaatgtcgttgataaaatgacacaggctgtagttaacaacttggatgatgcacatgagctg attgacactgccatctccactgctttgaaagaaagcaagcctgtttatatcagcattggctgtaacttacctgcagttcctcacccaaccttcactagggagcc tgttccgttctatcttgctccaaggattagcaatcaaatggggctagaggctgcagtggaggcagcagcagcatttttgaacaaggctgtaaagcctgtgat tStgggagggcctaggttaagggtgtgcaaggctcaacaagcatttgttgagctagcagatgccagcgggtatcccatagctgttatgccatcaggcaaa ggtctgatacctgaacatcaccctcacttcataggaacatactggggtgccgtcagttccagcttctgtggtgaaattgtggagtcagcggatgcctatgtttt tgttggtccaatttttaatgactacagttctgtgggatactcgttgcttatcaagaaggagaaagccataattatacagcctaaccgggttaccatcggtgatg gcccttcttttggatgggtctttatggctgacttcttgactgctttagcctcaaaactgaagaggaacactacagctatggaaaatcatcgcagaatctttgtcc cgcccggtatcgctctgaagcgtgaggctaatgaaccgttgagagtcaacatcctcttcaaacatattcaggaaatgctgagcggagacacagctgttatt gcagaaacaggagattcatggttcaattgtcagaaattacatctcccagaaaattgcggatatgagttccagatgcagtacggatctattggatggtcagta ggtgcaacccttggatatgcacaggctgtcaaacataagcgtgtcattgcctgcattggtgatggcagtttccaggtaacagctcaggatgtatccacaatg atccgctgtggccagaagagtatcatattcctcatcaacaacggaggatacacaattgaagttgagatccatgacgggccatacaatgtaatcaaaaactg gaattacaccaagttcgttgatgccatccataatggtgaaggaaaatgttggaccaccaaggtgaaaacagaggaggaactaattgaagcgattgcaaaa gcaacaggagatgaaaaggatagcttatgctttatagaagtcttggtgcacaaagatgatacgagcaaagaactgttagagtggggatcaagggtctctg ctgccaatagccgcccacccaatcctcagtag (SEQ ID NO: 26)
P. putida L-DOPA decarboxylase (PpDDC)
MTPEQFRQYGHQLIDLIADYRQTVGERPVMAQVEPGYLKAALPATAPQQGEPFAAILDDVNNLV MPGLSHWQHPDFYGYFPSNGTLSSVLGDFLSTGLGVLGLSWQSSPALSELEETTLDWLRQLLGLS GQWSGVIQDTASTSTLVALISARERATDYALVRGGLQAEPKPLIVYVSAHAHSSVDKAALLAGF GRDNIRLIPTDERYALRPEALQAAIEQDLAAGNQPCAVVATTGTTTTTALDPLRPVGEIAQANGL WLHVDSAMAGSAMILPECRWMWDGIELADSVVVNAHKWLGVAFDCSIYYVRDPQHLIRVMST NPSYLQSAVDGEVKNLRDWGIPLGRRFRALKLWFMLRSEGVDALQARLRRDLDNAQWLAGQV EAAAEWEVLAPVQLQTLCIRHRPAGLEGEALDAHTKGWAERLNASGAAYVTPATLDGRWMVR VSIGALPTERGDVQRLWARLQDVIKG (SEQ ID NO: 27)
P. putida L-DOPA decarboxylase (PpDDC) DNA atgacccccgaacaattccgccagtacggccaccaactgatcgacctgattgccgactaccgccagaccgtgggcgaacgcccggtcatggcccaggt cgaacctggctatctcaaggccgccttgcccgcaactgcccctcaacaaggcgaacctttcgcggccattctcgacgacgtcaataacctggtcatgccc ggcctgtcccattggcagcacccggacttctatggctatttcccttccaatggcaccctgtcctcggtgctgggggacttcctcagtaccggtctgggcgtg ctgggcctgtcctggcaatccagcccggccctgagcgaactggaagaaaccaccctcgactggctgcgccagttgcttggcctgtctggccagtggagt ggggtgatccaggacactgcctcgaccagcaccctggtggcgctgatcagtgcccgtgaacgcgccactgactacgccctggtacgtggtggcctgca ggccgagcccaagcctttgatcgtgtatgtcagcgcccacgcccacagctcggtggacaaggctgcactgctggcaggttttggccgcgacaatatccg cctgattcccaccgacgaacgctacgccctgcgcccagaggcactgcaggcggcgatcgaacaggacctggctgccggcaaccagccgtgcgccgt ggttgccaccaccggcaccacgacgaccactgccctcgacccgctgcgcccggtcggtgaaatcgcccaggccaatgggctgtggttgcacgttgact cggccatggccggttcggcgatgatcctgcccgagtgccgctggatgtgggacggcatcgagctggccgattcggtggtggtcaacgcgcacaaatgg ctgggtgtggccttcgattgctcgatctactacgtgcgcgatccgcaacacctgatccgggtgatgagcaccaatcccagctacctgcagtcggcggtgg atggcgaggtgaagaacctgcgcgactgggggataccgctgggccgtcggttccgtgcgttgaagctgtggttcatgttgcgcagcgagggtgtcgac gcattgcaggcgcggctgcggcgtgacctggacaatgcccagtggctggcggggcaggtcgaggcggcggcggagtgggaagtgttggcgccagt acagctgcaaaccttgtgcattcgccatcgaccggcggggcttgaaggggaggcgctggatgcgcataccaagggctgggccgagcggctgaatgca tccggcgctgcttatgtgacgccggctacactggacgggcggtggatggtgcgggtttcgattggtgcgctgccgaccgagcggggggatgtgcagcg gctgtgggcacgtctgcaggacgtgatcaagggctga (SEQ ID NO: 28)
PpDDC-Tyr79Phe-Phe80Tyr-His181Asn
MTPEQFRQYGHQLIDLIADYRQTVGERPVMAQVEPGYLKAALPATAPQQGEPFAAILDDVNNLV
MPGLSHWQHPDFYGFYPSNGTLSSVLGDFLSTGLGVLGLSWQSSPALSELEETTLDWLRQLLGLS
GQWSGVIQDTASTSTLVALISARERATDYALVRGGLQAEPKPLIVYVSAHANSSVDKAALLAGF
GRDNIRLIPTDERYALRPEALQAAIEQDLAAGNQPCAVVATTGTTTTTALDPLRPVGEIAQANGL WLHVDSAMAGSAMILPECRWMWDGIELADSVVVNAHKWLGVAFDCSIYYVRDPQHLIRVMST NPSYLQSAVDGEVKNLRDWGIPLGRRFRALKLWFMLRSEGVDALQARLRRDLDNAQWLAGQV EAAAEWEVLAPVQLQTLCIRHRPAGLEGEALDAHTKGWAERLNASGAAYVTPATLDGRWMVR VSIGALPTERGDVQRLWARLQDVIKG (SEQ ID NO: 29)
PpDDC-Tyr79Phe-Phe80Tyr-Hisl81Asn-DNA atgacccccgaacaattccgccagtacggccaccaactgatcgacctgattgccgactaccgccagaccgtgggcgaacgcccggtcatggcccaggt cgaacctggctatctcaaggccgccttgcccgcaactgcccctcaacaaggcgaacctttcgcggccattctcgacgacgtcaataacctggtcatgccc ggcctgtcccattggcagcacccggacttctatggcttttacccttccaatggcaccctgtcctcggtgctgggggacttcctcagtaccggtctgggcgtg ctgggcctgtcctggcaatccagcccggccctgagcgaactggaagaaaccaccctcgactggctgcgccagttgcttggcctgtctggccagtggagt ggggtgatccaggacactgcctcgaccagcaccctggtggcgctgatcagtgcccgtgaacgcgccactgactacgccctggtacgtggtggcctgca ggccgagcccaagcctttgatcgtgtatgtcagcgcccacgccaacagctcggtggacaaggctgcactgctggcaggttttggccgcgacaatatccg cctgattcccaccgacgaacgctacgccctgcgcccagaggcactgcaggcggcgatcgaacaggacctggctgccggcaaccagccgtgcgccgt ggttgccaccaccggcaccacgacgaccactgccctcgacccgctgcgcccggtcggtgaaatcgcccaggccaatgggctgtggttgcacgttgact cggccatggccggttcggcgatgatcctgcccgagtgccgctggatgtgggacggcatcgagctggccgattcggtggtggtcaacgcgcacaaatgg ctgggtgtggccttcgattgctcgatctactacgtgcgcgatccgcaacacctgatccgggtgatgagcaccaatcccagctacctgcagtcggcggtgg atggcgaggtgaagaacctgcgcgactgggggataccgctgggccgtcggttccgtgcgttgaagctgtggttcatgttgcgcagcgagggtgtcgac gcattgcaggcgcggctgcggcgtgacctggacaatgcccagtggctggcggggcaggtcgaggcggcggcggagtgggaagtgttggcgccagt acagctgcaaaccttgtgcattcgccatcgaccggcggggcttgaaggggaggcgctggatgcgcataccaagggctgggccgagcggctgaatgca tccggcgctgcttatgtgacgccggctacactggacgggcggtggatggtgcgggtttcgattggtgcgctgccgaccgagcggggggatgtgcagcg gctgtgggcacgtctgcaggacgtgatcaagggctga (SEQ ID NO: 30)
C. japonica norcoclaurine 6-O-methyltransferase (Cj6OMT)
MLVKKKDNLSSQAKLWNFIYGFAESLVLKCAVQLDLANIIHNSGTSMTLSELSSRLPSQPVNEDA
LYRVMRYLVHMKLFTKASIDGELRYGLAPPAKYLVKGWDKCMVGSILAITDKDFMAPWHYLK
DGLSGESGTAFEKALGTNIWGYMAEHPEKNQLFNEAMANDSRLIMSALVKECGNIFNGITTLVD
VGGGTGTAVRNIANAFPHIKCTVYDLPHVIADSPGYSEVHCVAGDMFKFIPKADAIMMKCILHD
WDDKECIEILKRCKEAVPVKGGKVIIVDIVLNVQSEHPYTKMRLTLDLDMMLNTGGKERTEEEW
KKLIHDAGYKGHKITQITAVQSVIEAYPY (SEQ ID NO: 31)
C. japonica norcoclaurine 6-O-methyltransferase (Cj6OMT) DNA atgttagtgaagaagaaggacaatctctcatctcaagctaaactgtggaacttcatttatggttttgctgaatcactagtcctcaaatgtgcagtgcaacttgat ctagccaacataattcacaacagtggcacgtccatgactctttccgagttatcttcgcgtcttccaagtcaacctgtcaatgaagacgccttgtatcgagtcat gcgttacttggttcacatgaagctattcacaaaagcatcaatagatggagaactaagatatggacttgcaccaccagctaagtatcttgttaaaggttgggat aaatgtatggttggctcaattttagcaatcactgataaagatttcatggcaccatggcattaccttaaggatggattatcaggcgaaagtggtacagcgtttga gaaggccttggggacgaatatatgggggtacatggcagagcaccctgagaaaaaccagctatttaatgaagcaatggctaatgattcaaggcttattatgt ctgcattggtgaaagaatgtggaaatatttttaatggtataactacacttgtggatgttggtggtggtactggaactgctgtgaggaatattgccaatgcatttc cacatataaagtgtactgtttatgatcttcctcatgtcattgctgattctcctgggtactccgaagttcattgcgtggcaggtgatatgttcaagttcataccaaag gctgatgctatcatgatgaagtgcatccttcacgactgggatgacaaagaatgcattgaaattctaaagcgatgcaaggaggcagtaccagtcaaaggcg ggaaagtgattatagtcgacattgtcttaaatgtgcaatcagaacatccttataccaagatgagactgactttggatttggacatgatgctcaacactggagg aaaagagaggactgaagaggaatggaagaagctcatccatgatgcagggtacaaagggcataagataacacaaattactgctgtacaatctgtgattga ggcttatccatattag (SEQ ID NO: 32)
P. somniferum norcoclaurine 6-O-methyltransferase (Ps6OMT)
METVSKIDQQNQAKIWKQIYGFAESLVLKCAVQLEIAETLHNNVKPMSLSELASKLPVAQPVNE
DRLFRIMRYLVHMELFKIDATTQKYSLAPPAKYLLRGWEKSMVDSILCINDKDFLAPWHHLGDG
LTGNCDAFEKALGKSIWVYMSVNPEKNQLFNAAMACDTRLVTSALANECKSIFSDGISTLVDVG
GGTGTAVKAISKAFPDIKCTIYDLPHVIADSPEIPNITKISGDMFKSIPSADAIFMKCILHDWNDDEC
IQILKRCKEALPKGGKVIIVDVVIDMDSTHPYAKIRLTLDLDMMLNTGGKERTKEEWKTLFDAAG
FASHKVTQISAVQSVIEAYPY (SEQ ID NO: 33)
P. somniferum norcoclaurine 6-O-methyltransferase (Ps6OMT) DNA atggaaacggtgtctaaaatcgatcagcagaatcaggcaaaaatctggaaacagatttatggttttgcggaaagtctggtgctgaaatgcgcagttcagct ggaaattgcggaaaccctgcataacaatgtgaaaccgatgagcctgtctgaactggccagcaaactgccggtggcacagccggttaacgaagatcgtct gtttcgcatcatgcgttacctggttcacatggaactgttcaaaattgatgcaaccacgcagaaatattctctggcaccgccggcaaaatacctgctgcgtggt tgggaaaaaagcatggtggattctatcctgtgcatcaacgataaagattttctggccccgtggcatcacctgggtgatggcctgaccggtaactgtgatgcg ttcgaaaaagccctgggcaaatctatttgggtgtatatgagtgttaacccggagaaaaaccagctgtttaacgcggccatggcgtgcgatacccgtctggtt acgagtgccctggcaaatgaatgtaaaagtatcttcagcgatggtattagcaccctggtggatgttggcggtggcaccggtacggcagtgaaagcaatca gcaaagccttcccggatatcaaatgcacgatttacgatctgccgcatgttattgcagatagcccggaaattccgaacatcaccaaaatctctggtgatatgtt caaatctatcccgagtgcggatgccatcttcatgaaatgtattctgcacgattggaacgatgatgaatgcattcagatcctgaaacgttgtaaagaagcgctg ccgaaaggtggcaaagtgattatcgttgatgtggttatcgatatggatagcacgcatccgtatgccaaaattcgcctgaccctggatctggatatgatgctga acaccggtggcaaagaacgtacgaaagaagaatggaaaaccctgtttgatgcagcgggcttcgcgagccacaaagtgacccagatcagtgcggtgca gagcgttattgaagcctatccgtactaa (SEQ ID NO: 34)
P. somniferum CNMT (PsCNMT)
MQLKAKEELLRNMELGLIPDQEIRQLIRVELEKRLQWGYKETHEEQLSQLLDLVHSLKGMKMAT
EMENLDLKLYEAPMEFLKIQHGSNMKQSAGYYTDESTTLDEAEIAMLDLYMERAQIKDGQSVL
DLGCGLGAVALFGANKFKKCQFTGVTSSVEQKDYIEGKCKELKLTNVKVLLADITTYETEERFD
RIFAVELIEHMKNYQLLLKKISEWMKDDGLLFVEHVCHKTLAYHYEPVDAEDWYTNYIFPAGTL
TLSSASMLLYFQDDVSVVNQWTLSGKHYSRSHEEWLKNMDKNIVEFKEIMRSITKTEKEAIKLL
NFWRIFCMCGAELFGYKNGEEWMLTHLLFKKK (SEQ ID NO: 35)
P. somniferum CNMT (PsCNMT) DNA atgcagctgaaagccaaagaagaactgctgcgtaacatggaactgggcctgattccggatcaggaaattcgtcagctgatccgcgtggaactggaaaaa cgcctgcagtggggttataaagaaacccatgaagaacagctgagtcagctgctggatctggttcacagcctgaaaggcatgaaaatggcaacggaaatg gaaaacctggatctgaaactgtacgaagcgccgatggaatttctgaaaatccagcatggtagcaatatgaaacagtctgcgggctattacaccgatgaaa gcaccacgctggatgaagcagaaattgcgatgctggatctgtatatggaacgtgcacagatcaaagatggccagtctgtgctggatctgggttgcggtct gggtgcagttgcactgtttggtgcgaacaaattcaaaaaatgtcagtttaccggcgtgacgagctctgttgaacagaaagattacattgaaggcaaatgcaa agaactgaaactgaccaatgtgaaagttctgctggcggatatcaccacgtatgaaacggaagaacgttttgatcgcattttcgccgtggaactgatcgaac acatgaaaaactaccagctgctgctgaagaaaattagcgaatggatgaaagatgatggtctgctgttcgtggaacatgtttgtcacaaaaccctggcctatc actacgaaccggttgatgcagaagattggtatacgaactacatcttcccggcaggtaccctgacgctgagtagcgccagcatgctgctgtattttcaggatg atgtgtctgtggttaatcagtggaccctgagtggcaaacattactctcgcagtcacgaagaatggctgaaaaacatggataaaaacatcgttgaattcaaag aaatcatgcgtagcatcaccaaaacggaaaaagaagccattaaactgctgaacttttggcgcatcttctgcatgtgtggcgcagaactgttcggttataaaa atggcgaagaatggatgctgacccacctgctgtttaagaaaaaataa (SEQ ID NO: 36)
Truncated P. somniferum N-methylcoclaurine 3-hydroxylase isoform X1 (PsNMCH-Il)
MDSSPKGLPPGPKPWPIVGNLLQLGEKPHSQFAQLAETYGDLFSLKLGSETVVVASTPLAASEILK THDRVLSGRYVFQSFRVKEHVENSIVWSECNETWKKLRKVCRTELFTQKMIESQAEVRESKAME MVEYLKKNVGNEVKIAEVVFGTLVNIFGNLIFSQNIFKLGDESSGSVEMKEHLWRMLELGNSTNP ADYFPFLGKFDLFGQRKDVADCLQGIYSVWGAMLKERKIAKQHNNSKKNDFVEILLDSGLDDQ QINALLMEIFGAGTETSASTIEWALSELTKNPQVTANMRLELLSVVGKRPVKESDIPNMPYLQAF VKETLRLHPATPLLLPRRALETCKVLNYTIPKECQIMVNAWGIGRDPKRWTDPLKFSPERFLNSSI DFKGNDFELIPFGAGRRICPGVPLATQFISLIVSSLVQNFDWGLPKGMDPSQLIMEEKFGLTLQKEP PLYIVPKTRD (SEQ ID NO: 37)
Truncated P. somniferum /V-methylcoclaurine 3-hydroxylase isoform XI (PsNMCH-Il) DNA atggattcaagtcctaaaggtttgccaccaggtccaaaaccctggccaatagttggaaaccttcttcaacttggtgagaaacctcattctcagtttgctcagct tgctgaaacctatggtgatctcttttcactgaaactaggaagtgaaacggttgttgtagcttcaactccattagcagctagcgagattctaaagacgcatgatc gtgttctctctggtcgatacgtgtttcaaagtttccgggtaaaagaacatgtggagaactctattgtgtggtctgaatgtaatgaaacatggaagaaactgcgg aaagtttgtagaacggaactttttacgcagaagatgattgaaagtcaagctgaagttagagaaagtaaggctatggaaatggtggagtatttgaagaaaaat gtaggaaatgaagtgaaaattgctgaagttgtatttgggacgttggtgaatatattcggtaacttgatattttcacaaaatattttcaagttgggtgatgaaagta gtggaagtgtagaaatgaaagaacatctatggagaatgctggaattggggaactcgacaaatccagctgattattttccatttttgggtaaattcgatttgtttg gacaaagaaaagatgttgctgattgtctgcaagggatttatagtgtttggggtgctatgctcaaagaaagaaaaatagccaagcagcataacaacagcaag aagaatgattttgttgagattttgctcgattccggactcgatgaccagcagattaatgccttgctcatggaaatatttggtgcgggaacagagacaagtgcat ctacaatagaatgggcgttgtctgagctcacaaaaaaccctcaagtaacagccaatatgcggttggaattgttatctgtggtagggaagaggccggttaag gaatccgacataccaaacatgccttatcttcaagcttttgttaaagaaactctacggcttcatccagcaactcctctgctgcttccacgtcgagcacttgagac ctgcaaagttttgaactatacgatcccgaaagagtgtcagattatggtgaacgcctggggcattggtcgggatccaaaaaggtggactgatccattgaagtt ttcaccagagaggttcttgaattcgagcattgatttcaaagggaacgacttcgagttgataccatttggtgcagggagaaggatatgtcctggtgtgcccttg gcaactcaatttattagtcttattgtgtctagtttggtacagaattttgattggggattaccgaagggaatggatcctagccaactgatcatggaagagaaattt gggttgacactgcaaaaggaaccacctctgtatattgttcctaaaactcgggattaa (SEQ ID NO: 38)
PsNMCH-IX1-His203Tyr
MDSSPKGLPPGPKPWPIVGNLLQLGEKPHSQFAQLAETYGDLFSLKLGSETVVVASTPLAASEILK THDRVLSGRYVFQSFRVKEHVENSIVWSECNETWKKLRKVCRTELFTQKMIESQAEVRESKAME MVEYLKKNVGNEVKIAEVVFGTLVNIFGNLIFSQNIFKLGDESSGSVEMKEYLWRMLELGNSTNP ADYFPFLGKFDLFGQRKDVADCLQGIYSVWGAMLKERKIAKQHNNSKKNDFVEILLDSGLDDQ QINALLMEIFGAGTETSASTIEWALSELTKNPQVTANMRLELLSVVGKRPVKESDIPNMPYLQAF VKETLRLHPATPLLLPRRALETCKVLNYTIPKECQIMVNAWGIGRDPKRWTDPLKFSPERFLNSSI DFKGNDFELIPFGAGRRICPGVPLATQFISLIVSSLVQNFDWGLPKGMDPSQLIMEEKFGLTLQKEP PLYIVPKTRD (SEQ ID NO: 39)
PsNMCH-IXl-His203Tyr-DNA atggattcaagtcctaaaggtttgccaccaggtccaaaaccctggccaatagttggaaaccttcttcaacttggtgagaaacctcattctcagtttgctcagct tgctgaaacctatggtgatctcttttcactgaaactaggaagtgaaacggttgttgtagcttcaactccattagcagctagcgagattctaaagacgcatgatc gtgttctctctggtcgatacgtgtttcaaagtttccgggtaaaagaacatgtggagaactctattgtgtggtctgaatgtaatgaaacatggaagaaactgcgg aaagtttgtagaacggaactttttacgcagaagatgattgaaagtcaagctgaagttagagaaagtaaggctatggaaatggtggagtatttgaagaaaaat gtaggaaatgaagtgaaaattgctgaagttgtatttgggacgttggtgaatatattcggtaacttgatattttcacaaaatattttcaagttgggtgatgaaagta gtggaagtgtagaaatgaaagaatatctatggagaatgctggaattggggaactcgacaaatccagctgattattttccatttttgggtaaattcgatttgtttg gacaaagaaaagatgttgctgattgtctgcaagggatttatagtgtttggggtgctatgctcaaagaaagaaaaatagccaagcagcataacaacagcaag aagaatgattttgttgagattttgctcgattccggactcgatgaccagcagattaatgccttgctcatggaaatatttggtgcgggaacagagacaagtgcat ctacaatagaatgggcgttgtctgagctcacaaaaaaccctcaagtaacagccaatatgcggttggaattgttatctgtggtagggaagaggccggttaag gaatccgacataccaaacatgccttatcttcaagcttttgttaaagaaactctacggcttcatccagcaactcctctgctgcttccacgtcgagcacttgagac ctgcaaagttttgaactatacgatcccgaaagagtgtcagattatggtgaacgcctggggcattggtcgggatccaaaaaggtggactgatccattgaagtt ttcaccagagaggttcttgaattcgagcattgatttcaaagggaacgacttcgagttgataccatttggtgcagggagaaggatatgtcctggtgtgcccttg gcaactcaatttattagtcttattgtgtctagtttggtacagaattttgattggggattaccgaagggaatggatcctagccaactgatcatggaagagaaattt gggttgacactgcaaaaggaaccacctctgtatattgttcctaaaactcgggattaa (SEQ ID NO: 40)
Truncated P. somniferum CYP450 reductase-like (TrcPsCPR-L)
MSFRKSSKIVEVPKTGFTKEPEPEIDDGKKKVTIFFGTQTGTAEGFAKALSEEAKARYDKAVFKV
VDLDDYAADDDEFEEKLKKENLALFFLATYGDGEPTDNAARFYKWFTEVAKEKEPWLPNLNFG
VFGLGNRQYEHFNKVAKVVDEIIVELGGKRLVPVGLGDDDQCIEDDFTAWRELVWPELDQLLLD
ENDSTSVSTPYAAAVAEYRVVFHDSADASLQDKNWSNANGYAVYDALHPCRANVAVRRELHT
PASDRSCIHLEFDISGTGLTYETGDHVGVYSENCMETVEEAERLLGLSSDTVFSIHVDNEDGTPIA
GSALPPPFPSPSTLRTALTKYADLLNFPKKAALHALAAHASDPKEAERLRFLASPAGKDEYAQW
VVASQRSLLEVMAEFPSAKPPLGVFFAAIAPRLQPRFYSISSSNRMAPSRIHVTCALVNERTPAGRI
HKGVCSTWMKNSVPSEESRHCSWAPVFVRQSNFKLPADSTVPIIMIGPGTGLAPFRGFMQERLAL
KEAGVELGAAVLFFGCRNRSMDFIYEDELNNFVESGAISELVVAFSREGPTKEYVQHKMTEKAS
DIWNMISQGAYLYVCGDAKGMAKDVHRTLHTIVQEQGSLDSSKTEMLVKNLQMDGRYLRDVW
(SEQ ID NO: 41) Truncated P. somniferum CYP450 reductase-like (TrcPsCPR-L) DNA atgtcttttaggaaatccagtaaaattgttgaggtacctaaaactggttttactaaagaacctgaacctgaaattgacgatggtaaaaagaaagttactatcttct ttggtactcaaactggtactgctgagggtttcgctaaagcactttctgaagaagcaaaagcaagatatgacaaagctgtctttaaagtggttgatctggatgat tacgcagcagatgatgatgagtttgaggagaaactaaaaaaagaaaatttagcgcttttctttttagctacctacggagacggtgaaccaacagataatgct gccagattttataaatggtttacggaagtggctaaagagaaggaaccatggcttccgaatcttaactttggtgtgtttggattgggaaatagacagtatgagc atttcaataaggttgcaaaggttgttgatgagattattgttgaactgggtgggaaacgtcttgttcctgtgggtcttggagacgacgaccaatgtatagaagat gactttacagcatggcgagagttggtatggcctgaattggatcagttgctccttgatgaaaatgattcaacgagtgtttcaaccccttacgctgctgctgtagc agaatatagggtggtattccatgattctgctgatgcatccctacaagacaagaactggagtaatgccaatggctatgctgtctatgatgctctgcacccatgc agagccaatgtggctgtaagaagggagcttcacactccagcttctgatcgttcttgtattcatctggaatttgacatatcaggcactgggcttacgtatgaaac tggagatcatgttggtgtctactctgaaaactgcatggaaactgtggaggaagcggaaagattgttgggtctttcatcggacactgtattttctattcacgtcg ataacgaggatgggacaccgatcgccggcagcgcattacctcccccttttccctctcccagcactttaagaactgcacttaccaaatatgctgatctattgaa tttccccaagaaggctgctctacatgctctagctgctcatgcatctgatccaaaggaagctgagcgattaagatttcttgcatctcctgctggaaaggatgaat atgcacagtgggtagttgcaagtcagagaagtctgctagaagtcatggctgaatttccatcagctaaacctccacttggggtgttctttgcagcaatagcac ctcggctgcagcctagattctattcgatttcgtcctccaacaggatggcaccctctagaattcatgtcacatgtgcgctagtgaatgagagaacaccagctg gtcgaattcataaaggagtctgttcaacctggatgaagaattctgttccttcggaagaaagccgtcactgcagctgggcaccagtttttgtgagacaatctaa cttcaaactgcctgctgattctacagtaccaattatcatgattggccctggtactgggttggctcctttcagaggattcatgcaggaacgacttgctcttaagga agccggtgtagaattgggagctgcggtcctgttctttggatgcagaaacagaagcatggatttcatttatgaagacgagctgaataactttgtcgagtcaggt gctatctctgagttggtggtcgctttctcacgtgagggtcctaccaaagaatacgtacaacataagatgacagagaaggcttccgacatctggaatatgatct ctcagggtgcttatctttacgtctgtggtgatgccaaaggcatggccaaggatgtgcatcgaactcttcacacaattgttcaagagcagggatctttagacag 
ctccaagactgaaatgttggtgaagaatctgcagatggatgggaggtatctacgtgatgtctggtga (SEQ ID NO: 42)
Truncated Eschscholzia californica N-methylcoclaurine hydroxylase (EcNMCH)
MNLPPGPKPWPIVGNLLQLGEKPHAQFAELAQTYGDIFTLKMGTETVVVASTSSAASEILKTHDR
ILSARYVFQSFRVKGHVENSIVWSDCTETWKNLRKVCRTELFTQKMIESQAHVREKKCEEMVEY
LMKKQGEEVKIVEVIFGTLVNIFGNLIFSQNIFELGDPNSGSSEFKEYLWRMLELGNSTNPADYFP
MLGKFDLFGQRKEVAECLKGIYAIWGAMLQERKLAKKVDGYKSKNDFVDVCLDSGLNDYQIN
ALLMELFGAGTETSASTIEWAMTELTKNPKITAKIRSEIQTVVGERSVKESDFPNLPYLEATVKET
LRLHPPTPLLLPRRALETCTILNYTIPKDCQIMVNAWGIGRDPKTWTDPLTFSPERFLNSSVDFRG
NDFSLIPFGAGRRICPGLPIANQFIALLVATFVQNLDWCLPNGMSVDHLIVEEKFGLTLQKEPPLFI
VPKSRV (SEQ ID NO: 43)
EcNMCH-DNA atgaatctcccaccaggaccaaaaccgtggccaatagttggaaatctcctccaacttggtgagaaaccacacgctcaattcgccgaactagctcaaaccta tggtgacattttcactcttaaaatgggtactgaaactgtagttgttgcatcaacatcttcagcagcttccgaaatactaaaaacccatgatcgaattctatccgct cgttacgtttttcaaagttttcgagtaaaagggcatgtagaaaattcaatagtttggtcagattgtactgaaacttggaagaatttaagaaaagtttgtaggaca gaacttttcacacagaagatgatagaaagtcaagctcatgttagagagaaaaaatgtgaagaaatggttgaatacttgatgaaaaaacaaggggaagaag tgaaaattgtggaagtaatatttggaacattagtgaatatatttggaaatttgatattttcacagaatatatttgaattgggtgatccaaatagtggaagttcagag ttcaaggaatatctatggaggatgttggaattggggaattcaacaaatccagctgattattttccaatgttaggtaaatttgatttgtttggacagaggaaagaa gttgcagagtgtttaaaagggatttatgcaatatggggagctatgcttcaagaaagaaaattagctaaaaaagttgatggatataaaagcaagaatgattttgt tgatgtttgtcttgattctggacttaatgattatcaaatcaatgccttgcttatggaattatttggggcaggcacagaaacgagcgcatcgacaattgagtgggc catgactgaactaacaaagaatccaaagataacagctaagattagatcagaaattcaaacagtggtaggcgagagatcggtaaaagaatccgacttcccc aatcttccataccttgaagctactgttaaagaaaccctaagacttcacccaccaactccattgctactcccacgccgagcacttgaaacctgtacaatcctca actataccatcccaaaagattgtcaaattatggtcaacgcttggggaatcggtcgtgatcccaagacttggaccgatccgttgactttctcaccagagagatt cttgaattctagtgttgactttagggggaatgatttcagtttgataccatttggtgcaggaagaaggatatgccccggtctgccaatagcaaatcagtttattgc attgctagtggcaacatttgtgcaaaatttggattggtgtctaccaaatgggatgagtgttgaccatttgatagtggaggagaagtttgggttgactcttcaaaa agaaccacctctattcattgttcctaaatcaagggtttga (SEQ ID NO: 44)
EcNMCH-Tyr202His
MNLPPGPKPWPIVGNLLQLGEKPHAQFAELAQTYGDIFTLKMGTETVVVASTSSAASEILKTHDR
ILSARYVFQSHRVKGHVENSIVWSDCTETWKNLRKVCRTELFTQKMIESQAHVREKKCEEMVEY
LMKKQGEEVKIVEVIFGTLVNIFGNLIFSQNIFELGDPNSGSSEFKEHLWRMLELGNSTNPADYFP
MLGKFDLFGQRKEVAECLKGIYAIWGAMLQERKLAKKVDGYKSKNDFVDVCLDSGLNDYQIN
HLLMDLFGAGTETSASTIEWAMTELTKNPKITAKIRSEIQTVVGERSVKESDFPNLPYLEATVKET
LRLHPPEPLLLPRRALETCTILNYTIPKDCQIMVNAWGIGRDPKTWTDPLTFSPERFLNSSVDFRG
NDFSLIPFGAGRRICPGLPIANQFIALLVATFVQNLDWCLPNGMSVDHLIVEEKFGDTLQKEPPLFI
VPKSRV (SEQ ID NO: 45)
EcNMCH-Tyr202His DNA atgaatctcccaccaggaccaaaaccgtggccaatagttggaaatctcctccaacttggtgagaaaccacacgctcaattcgccgaactagctcaaaccta tggtgacattttcactcttaaaatgggtactgaaactgtagttgttgcatcaacatcttcagcagcttccgaaatactaaaaacccatgatcgaattctatccgct cgttacgtttttcaaagtcatcgagtaaaagggcatgtagaaaattcaatagtttggtcagattgtactgaaacttggaagaatttaagaaaagtttgtaggaca gaacttttcacacagaagatgatagaaagtcaagctcatgttagagagaaaaaatgtgaagaaatggttgaatacttgatgaaaaaacaaggggaagaag tgaaaattgtggaagtaatatttggaacattagtgaatatatttggaaatttgatattttcacagaatatatttgaattgggtgatccaaatagtggaagttcagag ttcaaggaacatctatggaggatgttggaattggggaattcaacaaatccagctgattattttccaatgttaggtaaatttgatttgtttggacagaggaaaga agttgcagagtgtttaaaagggatttatgcaatatggggagctatgcttcaagaaagaaaattagctaaaaaagttgatggatataaaagcaagaatgatttt gttgatgtttgtcttgattctggacttaatgattatcaaatcaatcacttgcttatggatttatttggggcaggcacagaaacgagcgcatcgacaattgagtgg gccatgactgaactaacaaagaatccaaagataacagctaagattagatcagaaattcaaacagtggtaggcgagagatcggtaaaagaatccgacttcc ccaatcttccataccttgaagctactgttaaagaaaccctaagacttcacccaccagaaccattgctactcccacgccgagcacttgaaacctgtacaatcct caactataccatcccaaaagattgtcaaattatggtcaacgcttggggaatcggtcgtgatcccaagacttggaccgatccgttgactttctcaccagagag attcttgaattctagtgttgactttagggggaatgatttcagtttgataccatttggtgcaggaagaaggatatgccccggtctgccaatagcaaatcagtttatt gcattgctagtggcaacatttgtgcaaaatttggattggtgtctaccaaatgggatgagtgttgaccatttgatagtggaggagaagtttggggacactcttca aaaagaaccacctctattcattgttcctaaatcaagggtttga (SEQ ID NO: 46)
Truncated Arabidopsis thaliana CYP450 reductase 2 (AtATR2)
MSGSGNSKRVEPLKPLVIKPREEEIDDGRKKVTIFFGTQTGTAEGFAKALGEEAKARYEKTRFKIV
DLDDYAADDDEYEEKLKKEDVAFFFLATYGDGEPTDNAARFYKWFTEGNDRGEWLKNLKYGV
FGLGNRQYEHFNKVAKVVDDILVEQGAQRLVQVGLGDDDQCIEDDFTAWREALWPELDTILRE
EGDTAVATPYTAAVLEYRVSIHDSEDAKFNDINMANGNGYTVFDAQHPYKANVAVKRELHTPE
SDRSCIHLEFDIAGSGLTYETGDHVGVLCDNLSETVDEALRLLDMSPDTYFSLHAEKEDGTPISSS
LPPPFPPCNLRTALTRYACLLSSPKKSALVALAAHASDPTEAERLKHLASPAGKDEYSKWVVESQ
RSLLEVMAEFPSAKPPLGVFFAGVAPRLQPRFYSISSSPKIAETRIHVTCALVYEKMPTGRIHKGVC
STWMKNAVPYEKSENCSSAPIFVRQSNFKLPSDSKVPIIMIGPGTGLAPFRGFLQERLALVESGVE
LGPSVLFFGCRNRRMDFIYEEELQRFVESGALAELSVAFSREGPTKEYVQHKMMDKASDIWNMI
SQGAYLYVCGDAKGMARDVHRSLHTIAQEQGSMDSTKAEGFVKNLQTSGRYLRDVW (SEQ ID NO: 47)
Truncated Arabidopsis thaliana CYP450 reductase 2 (AtATR2) DNA atgtccggttctgggaattcaaaacgtgtcgagcctcttaagcctttggttattaagcctcgtgaggaagagattgatgatgggcgtaagaaagttaccatctt tttcggtacacaaactggtactgctgaaggttttgcaaaggctttaggagaagaagctaaagcaagatatgaaaagaccagattcaaaatcgttgatttggat gattacgcggctgatgatgatgagtatgaggagaaattgaagaaagaggatgtggctttcttcttcttagccacgtatggagatggtgagcctaccgacaat gcagcgagattctacaaatggttcaccgaggggaatgacagaggagaatggcttaagaacttgaagtatggagtgtttggattaggaaacagacaatatg agcattttaataaggttgccaaagttgtagatgacattcttgtcgaacaaggtgcacagcgtcttgtacaagttggtcttggagatgatgaccagtgtattgaa gatgactttaccgcttggcgagaagcattgtggcccgagcttgatacaatactgagggaagaaggggatacagctgttgccacaccatacactgcagctg tgttagaatacagagtttctattcacgactctgaagatgccaaattcaatgatataaacatggcaaatgggaatggttacactgtgtttgatgctcaacatcctt acaaagcaaatgtcgctgttaaaagggagcttcatactcccgagtctgatcgttcttgtatccatttggaatttgacattgctggaagtggacttacgtatgaaa ctggagatcatgttggtgtactttgtgataacttaagtgaaactgtagatgaagctcttagattgctggatatgtcacctgatacttatttctcacttcacgctgaa aaagaagacggcacaccaatcagcagctcactgcctcctcccttcccaccttgcaacttgagaacagcgcttacacgatatgcatgtcttttgagttctcca aagaagtctgctttagttgcgttggctgctcatgcatctgatcctaccgaagcagaacgattaaaacaccttgcttcacctgctggaaaggatgaatattcaa agtgggtagtagagagteaaagaagtetacttgaggtgatggccgagtttccttcagccaagccaccacttggtgtcttcttcgctggagttgctccaaggtt gcagcctaggttctattcgatatcatcatcgcccaagattgctgaaactagaattcacgtcacatgtgcactggtttatgagaaaatgccaactggcaggatt cataagggagtgtgttccacttggatgaagaatgctgtgccttacgagaagagtgaaaactgttcctcggcgccgatatttgttaggcaatccaacttcaag ctgccttctgattctaaggtaccgatcatcatgatcggtccagggactggattagctccattcagaggattccttcaggaaagactagcgttggtagaatctg gtgttgaacttgggccatcagttttgttctttggatgcagaaaccgtagaatggatttcatctacgaggaagagctccagcgatttgttgagagtggtgctctc gcagagctaagtgtcgccttctctcgtgaaggacccaccaaagaatacgtacagcacaagatgatggacaaggcttctgatatctggaatatgatctctca aggagcttatttatatgtttgtggtgacgccaaaggcatggcaagagatgttcacagatctctccacacaatagctcaagaacaggggtcaatggattcaac taaagcagagggcttcgtgaagaatctgcaaacgagtggaagatatcttagagatgtatggtaa 
(SEQ ID NO: 48)
E. coli 4-hydroxyphenylacetate 3-monooxygenase complex (EcHpaBC)
MKPEDFRASTQRPFTGEEYLKSLQDGREIYIYGERVKDVTTHPAFRNAAASVAQLYDALHKPEM
QDSLCWNTDTGSGGYTHKFFRVAKSADDLRQQRDAIAEWSRLSYGWMGRTPDYKAAFGCALG
ANPGFYGQFEQNARNWYTRIQETGLYFNHAIVNPPIDRHLPTDKVKDVYIKLEKETDAGIIVSGA
KVVATNSALTHYNMIGFGSAQVMGENPDFALMFVAPMDADGVKLISRASYEMVAGATGSPYD
YPLSSRFDENDAILVMDNVLIPWENVLIYRDFDRCRRWTMEGGFARMYPLQACVRLAVKLDFIT
ALLKKSLECTGTLEFRGVQADLGEVVAWRNTFWALSDSMCSEATPWVNGAYLPDHAALQTYR
VLAPMAYAKIKNIIERNVTSGLIYLPSSARDLNNPQIDQYLAKYVRGSNGMDHVQRIKILKLMWD
AIGSEFGGRHELYEINYSGSQDEIRLQCLRQAQNSGNMDKMMAMVDRCLSEYDQDGWTVPHLH
NNDDINMLDKLLK (SEQ ID NO: 49)
EcHpaB-DNA atgaaaccagaagatttccgcgccagtacccaacgtcctttcaccggggaagagtatctgaaaagcctgcaggatggtcgcgagatctatatctatggcg agcgagtgaaagacgtcaccactcatccggcatttcgtaatgcggcagcgtctgttgcccagctgtacgacgcactgcacaaaccggagatgcaggact ctctgtgttggaacaccgacaccggcagcggcggctatacccataaattcttccgcgtggcgaaaagtgccgacgacctgcgccagcaacgcgacgcc atcgctgagtggtcacgcctgagctatggctggatgggccgtaccccagactacaaagccgctttcggttgcgcactgggcgcgaatccgggcttttacg gtcagttcgagcagaacgcccgtaactggtacacccgtattcaggaaactggcctctactttaaccacgcgattgttaacccaccgatcgatcgtcatttgc cgaccgataaagtgaaagacgtttacatcaagctggaaaaagagactgacgccgggattatcgtcagcggtgcgaaagtggttgccaccaactcggcg ctgactcactacaacatgattggcttcggctcggcacaagtgatgggcgaaaacccggacttcgcactgatgttcgttgcgccaatggatgccgatggcgt gaaattaatctcccgcgcctcttatgagatggtcgcgggtgctaccggctcgccatacgactacccgctctccagccgcttcgatgagaacgatgcgattc tggtgatggataacgtgctgattccatgggaaaacgtgctgatctaccgcgattttgatcgctgccgtcgctggacgatggaaggcggttttgcccgtatgt atccgctgcaagcctgtgtgcgcctggcagtgaaattagacttcattacggcactgctgaaaaaatcactcgaatgtaccggcaccctggagttccgtggt gtgcaggccgatctcggtgaagtggtagcgtggcgcaacaccttctgggcattgagtgactcgatgtgttcagaagcaacgccgtgggtcaacggggct tatttaccggatcatgccgcactgcaaacctatcgcgtactggcaccaatggcctacgcgaagatcaaaaacattatcgaacgcaacgttaccagtggcct gatctatctcccttccagtgcccgtgacctgaataatccgcagatcgaccagtatctggcgaagtatgtgcgcggttcgaacggtatggatcacgtccagc gcatcaagatcctcaaactgatgtgggatgctattggcagcgaatttggtggtcgtcacgaactgtatgaaatcaactactccggtagccaggatgagattc gcctgcagtgtctgcgccaggcacaaaactccggcaatatggacaagatgatggcgatggttgatcgctgcctgtcggaatacgaccaggacggctgg actgtgccgcacctgcacaacaacgacgatatcaacatgctggataagctgctgaaataa (SEQ ID NO: 50)
EcHpaC
MQLDEQRLRFRDAMASLSAAVNIITTEGDAGQCGITATAVCSVTDTPPSLMVCINANSAMNPVF QGNGKLCVNVLNHEQELMARHFAGMTGMAMEERFSLSCWQKGPLAQPVLKGSLASLEGEIRD VQAIGTHLVYLVEIKNIILSAEGHGLIYFKRRFHPVMLEMEAAI (SEQ ID NO: 51)
EcHpaC-DNA atgcaattagatgaacaacgcctgcgctttcgtgacgcgatggccagcctgtcggcagcggtaaatattatcaccaccgagggcgacgccggacaatgc gggattacggcaacggccgtctgctcggtcacggatacaccaccgtcgctgatggtgtgcattaacgccaacagtgcgatgaacccggtttttcagggca acggcaagttgtgcgtcaacgtcctcaaccatgagcaggaactgatggcacgccacttcgcgggcatgacaggcatggcgatggaagagcgttttagc ctctcatgctggcaaaaaggtccgctggcgcagccggtgctaaaaggttcgctggccagtcttgaaggtgagatccgcgatgtgcaggcaattggcaca catctggtgtatctggtggagattaaaaacatcatcctcagtgcagaaggtcatggacttatctactttaaacgccgtttccatccggtgatgctggaaatgga agctgcgatttaa (SEQ ID NO: 52) Azospirillum brasilense indolepyruvate/phenylpyruvate decarboxylase (ipdC)
MKLAEALLRALKDRGAQAMFGIPGDFALPFFKVAEETQILPLHTLSHEPAVGFAADAAARYSSTL
GVAAVTYGAGAFNMVNAVAGAYAEKSPVVVISGAPGTTEGNAGLLLHHQGRTLDTQFQVFKEI
TVAQARLDDPAKAPAEIARVLGAARALSRPVYLEIPRNMVNAEAEPVGDDPAWPVDRDALAAC
ADEVLAAMRAATSPVLMVCVEVRRYGLEAKVAELAQRLGVPVVTTFMGRGLLADAPTPPLGTY
IGVAGDAEITRLVEESDGLFLLGAILSDTNFAVSQRKIDLRKTIHAFDRAVTLGYHTYADIPLGGL
VDALLERLPPSDRTTRGKEPHAYPTGLQADGEPIAPMDIARAVNDRVRAGQEPLLIAADMGDCL
FTAMDMIDAGLMAPGYYAGMGFGVPAGIGAQCVSGGKRILTVVGDGAFQMTGWELGNCRRLG
IDPIVILFNNASWEMLRTFQPESAFNDLDDWRFADMAAGMGGDGVRVRTRAELKAALDKAFAT
RGRFQLIEAMIPRGVLSDTLARFVQGQKRLHAAPRE (SEQ ID NO: 53)
Azospirillum brasilense indolepyruvate/phenylpyruvate decarboxylase (ipdC) DNA atgaagctggccgaagccctgctgcgcgcgctgaaggatcgcggcgcacaggccatgttcgggattccgggcgatttcgccttgcccttcttcaaggtg gcggaggaaacgcagatcctgcccctccacacactgagccacgagccggcggtgggcttcgcggcggacgcggcggcgcgctacagctccacgct gggggtggcggcggtcacctacggggcgggcgccttcaacatggtgaacgcggtggccggcgcctacgccgagaagtcgccggtcgtcgtcatctc cggcgcgccgggcacgacggaaggcaacgccggcctgctgctgcaccaccagggccgcacgctggacacgcagttccaggtgttcaaggagatca ccgtcgcccaggcgcggctggacgacccggccaaggccccggcggagatcgcccgcgtgctgggcgccgcccgcgccctgtcgcgccccgtctat ctggaaatcccccgcaacatggtcaacgccgaggccgagccggtgggcgacgaccccgcttggccggtggaccgcgacgcgctggccgcctgcgc ggacgaggtgctggcggccatgcgcgcggccacgtcgccggtgctgatggtctgcgtcgaggtccgccgctacgggctggaggccaaggtggcgg agttggcgcagcggctgggcgtgccggtggtgaccaccttcatggggcgcggcctgctggccgacgcgccgaccccgccgctcggcacctacatcg gcgtcgccggcgacgcggagatcacccggctggtcgaggagtcggacgggctgttcctgctcggcgccatcctcagcgacaccaacttcgcggtgtc ccagcgcaagatcgacctgcgcaagaccatccacgccttcgaccgggcggtgaccctgggctatcacacctacgccgacatcccgctgggcgggctg gtggacgcgctgctggagcggctgccgccgtccgaccggacgacgcgcggcaaggagccccacgcctatccgaccggcctgcaggcggacggcg aaccgatcgccccgatggacatcgcccgcgccgtcaacgaccgcgtccgcgccgggcaggagccgctgctgatcgccgcggacatgggcgactgc ctgttcaccgccatggacatgatcgacgccgggctgatggcgccgggctactatgcgggcatgggcttcggcgtgccggcgggcatcggggcgcagt gcgtgtcgggcggcaagcgcatcctgaccgtggtcggcgacggcgccttccagatgaccgggtgggagcttggcaactgccgacggctgggcatcg accccatcgtgatcctgttcaacaacgccagctgggagatgctgcgcaccttccagcccgagtccgccttcaacgacctggacgactggcgcttcgccg acatggcggcgggcatgggcggcgacggcgtccgcgtgcgcacacgggcggagctgaaggcggcgctggacaaggccttcgccacgcgcgggc gcttccagctgatcgaggcgatgatcccgcgcggcgtgctgtccgacacgctggcccgcttcgtccaagggcagaagcgcctgcacgccgcaccccg ggagtaa (SEQ ID NO: 54)
C. japonica 3-hydroxy-N-methylcoclaurine 4-O-methyltransferase (Cj4OMT)
MSFHGKDDVLDIKAQAHVWKIIYGFADSLVLRCAVELGIVDIIDNNNQPMALADLASKLPVSDV
NCDNLYRILRYLVKMEILRVEKSDDGQKKYALEPIATLLSRNAKRSMVPMILGMTQKDFMTPW
HSMKDGLSDNGTAFEKAMGMTIWEYLEGHPDQSQLFNEGMAGETRLLTSSLISGSRDMFQGIDS
LVDVGGGNGTTVKAISDAFPHIKCTLFDLPHVIANSYDLPNIERIGGDMFKSVPSAQAIILKLILHD
WNDEDSIKILKQCRNAVPKDGGKVIIVDVALDEESDHELSSTRLILDIDMLVNTGGKERTKEVWE
KIVKSAGFSGCKIRHIAAIQSVIEVFP (SEQ ID NO: 55)
C.japonica 3-hydroxy -N-methylcoclaurine 4-0 methyltransferase (Cj40MT) DNA atgtctttccatgggaaagatgatgttctggacatcaaagctcaagctcatgtgtggaaaatcatctatggttttgcagattccctagtcctccgatgtgcagtg gaacttggaatcgtcgacatcattgataacaacaaccaacccatggcacttgccgatctggcatctaagcttcctgtttccgatgtgaattgcgataatttgtat cggatattacgatacttggtgaaaatggaaatactgagagtggaaaaatctgatgatggtcagaagaagtacgcgcttgaacctattgcaacattgctttcaa ggaatgcgaagaggagtatggttccaatgattcttggaatgactcaaaaagattttatgactccttggcattcaatgaaggatggcttaagtgacaatggtac tgcttttgagaaggccatgggaatgactatatgggagtacttggaaggacaccctgatcaaagccaattattcaatgaaggcatggccggtgaaacaagg cttctcacttcttcactcatatctggaagtagagatatgtttcaaggtattgactcacttgttgatgttggtggaggaaatggtactactgtcaaggccatttctga cgcatttccacatatcaagtgcaccctctttgatctccctcatgtcattgccaattcctatgaccttcctaatattgaacgaattggtggcgacatgtttaaatccg tgcccagtgcccaagctatcatactcaagctaattttgcacgattggaatgacgaagactcgatcaagattttaaagcaatgcagaaatgcagtgccaaaag atggaggaaaagtgattatagtggatgtggcattagatgaggagtcagaccatgagcttagcagcacacgattgatccttgatatcgatatgttggtgaaca ctggtggtaaagagcggactaaagaggtttgggagaaaattgtgaaaagtgcaggatttagtggttgcaaaatcaggcacatagcggctatacaatcagt cattgaggtttttccatag (SEQ ID NO: 56) C.japonica coclaurine /V-methyltransferase (CNMT)
MAVEAKQTKKAAIVELLKQLELGLVPYDDIKQLIRRELARRLQWGYKPTYEEQIAEIQNLTHSLR QMKIATEVETLDSQLYEIPIEFLKIMNGSNLKGSCCYFKEDSTTLDEAEIAMLDLYCERAQIQDGQ SVLDLGCGQGALTLHVAQKYKNCRVTAVTNSVSQKEYIEEESRRRNLLNVEVKLADITTHEMAE TYDRILVIELFEHMKNYELLLRKISEWISKDGLLFLEHICHKTFAYHYEPLDDDDWFTEYVFPAGT MIIPSASFFLYFQDDVSVVNHWTLSGKHFSRTNEEWLKRLDANLDVIKPMFETLMGNEEEAVKLI NYWRGFCLSGMEMFGYNNGEEWMASHVLFKKK (SEQ ID NO: 57)
C.japonica coclaurine /V-methyltransferase (CNMT) DNA atggctgtggaagcaaagcaaacaaagaaggcagccatagtagagttgttaaaacagttggagctgggcttggttccatatgatgatattaagcagctcat aaggagggaactggcaaggcgcctgcaatggggttataaacctacttatgaagaacaaatagctgaaatccaaaacttaactcattctctgcgacaaatga aaattgcaacagaggttgagaccttggattcacaattgtacgagattcctattgagtttctaaagattatgaatggaagtaacttaaaaggaagttgttgctact tcaaagaagattcaacaacattagatgaagctgagatagcgatgctggatttatactgcgagagagctcaaatccaagatggacagagtgttcttgatcttg gatgtgggcaaggagctcttacattacatgttgcacagaaatataaaaactgtcgcgtaacagcagtaacaaattcagtttcacaaaaagagtacattgaag aagaatcaaggagacgtaatttgttgaatgtggaagtcaaattggcagacataaccacacatgagatggctgagacatacgatcgtattttggtaatagagt tgtttgagcacatgaagaactatgaacttctcctgaggaaaatctcagagtggatatcgaaagatgggcttctctttctagagcacatatgccacaagaccttt gcttaccactatgagcctctagacgacgacgattggtttacagagtacgtgtttcctgctgggactatgatcataccatctgcatcgttctttttgtatttccagg atgacgtttcggttgtgaaccattggactcttagtgggaagcacttttcgcgtaccaatgaggaatggttgaagagattggacgcaaaccttgatgttattaaa ccaatgtttgagactttaatgggaaatgaggaagaggcagtgaagttgattaactattggagaggattttgtttatctggaatggaaatgtttggatataacaat ggtgaagaatggatggcaagtcatgttctgttcaagaaaaaatga (SEQ ID NO: 58)
Bombyx mori 3,4-dihydroxyphenylacetaldehyde synthase (DHPAAS)
MDANQFREFGRAVIDMLASYAENIRDYDVLPSVEPGYLLRALPESAPEQPEDWKDIMKDFNQSI MPGVTHWQSPQFHAFYPSGSSFASIIGNMLSDGLAVVGFSWMASPACTELEVVTMNWLGKLLD LPEEFLNCSSGPGGGVIQGSASEATLVGLLVAKDKTVRRFMNNNPDLDENEIKAKLVAYTSDQC NSSVEKAGLLGSMKMKLLKADADGCLRGETLKRAIEEDKSQGLIPCYVVANLGTTGTCAFDPLH ELGPICSEEDIWLHVDAAYAGAAFLCPEYRHLMKGIEHSQSFVTNAHKWLPVNFDCSAMWVKN GYDITRAFDVQRIYLDDVKTTIKIPDYRHWQMPLGRRFRALKLWTVMRIYGAEGLKTHIRQQIEL AQYFAKLVRADERFVIGPEPTMALVCFRLKDGDTITRQLLENITQKKKVFMVAGTHRDRYVIRF VICSRLTKKEDVDYSWSQIKKETDLIYSDKIHNKAQIPALEQFTSRELCEKSK (SEQ ID NO: 59)
Bombyx mori 3,4-dihydroxyphenylacetaldehyde synthase (DHPAAS) DNA atggacgcgaaccagtttcgggaattcggcagagcagtcatcgatatgttggcgagctacgctgaaaacataagggattatgatgttctgccgtctgttgaa cctggttacttgttaagagctctgcctgaaagtgcacccgaacaaccggaggattggaaagacataatgaaagatttcaatcaatcaataatgccgggtgta acacactggcaatctccgcagttccatgcatactttccttctggctcctcattcgcgagtattataggaaatatgcttagtgacggcttggcggttgtgggattt agttggatggctagccctgcgtgtacagaactggaggtggtoacgatgaattggcttggtaagctattagatttacctgaagaatttotaaattgctcttcggg tcccggaggcggagttattcaaggatcagcgagcgaagccaccttagttggattgcttgtagccaaagataaaactgttcgccgtttcatgaataataatcc agatctcgatgaaaacgagataaaagcaaaacttgtagcttatacatccgaccaatgtcactcgtcagtagaaaaagccggcttacttggttctatgaaaat gaaactgttgaaagcggatgctgatggatgcctacgcggagaaacattgaaaagggcaatcgaagaagataagtcgcaaggacttataccctgctatgtc gtcgctaatctgggaacaactggaacttgtgcattcgatcctttacatgaattagggccaatatgtagtgaggaagatatatggcttcacgtagatgcagcct atgcgggagcagcatttttgtgtcctgaatacagacacctaatgaaaggtattgaacattctcaatcgttcgtaacgaatgcacataagtggctaccggttaat ttcgattgctctgctatgtgggttaaaaatggttatgatataacgagagcattcgatgtacaaagaatttatttggatgatgtaaaaacgacaatcaagatccca gattacagacactggcaaatgccactaggccgtcgctttagagccttaaaattgtggacagttatgagaatttatggtgctgaaggtttgaaaacgcatatca gacagcaaatagaattagcacagtattttgcaaaactggtacgtgcggacgaacgttttgtgattgggcccgaaccgactatggcattggtctgttttagact gaaagacggtgacacaattacgcgacaattgttggaaaatataacgcaaaaaaagaaagtgtttatggtggccggaacgcacagggatagatacgtcatt agattcgtgatctgctcccgattgactaagaaggaagacgtcgattacagctggagccaaataaagaaagaaaccgacctcatctattcagataaaataca caataaagcacaaataccagctcttgaacaattcacttcaagagaactatgcgaaaaatctaagtaa (SEQ ID NO: 60)
Lactococcus lactis phenylpyruvate decarboxylase (KdcA)
MYTVGDYLLDRLHELGIEEIFGVPGDYNLQFLDQIISREDMKWIGNANELNASYMADGYARTKK
AAAFLTTFGVGELSAINGLAGSYAENLPVVEIVGSPTSKVQNDGKFVHHTLADGDFKHFMKMHE
PVTAARTLLTAENATYEIDRVLSQLLKERKPVYINLPVDVAAAKAEKPALSLEKESSTTNTTEQVI LSKIEESLKNAQKPVVIAGHEVISFGLEKTVTQFVSETKLPITTLNFGKSAVDESLPSFLGIYNGKLS EISLKNFVESADFILMLGVKLTDSSTGAFTHHLDENKMISLNIDEGIIFNKVVEDFDFRAVVSSLSE LKGIEYEGQYIDKQYEEFIPSSAPLSQDRLWQAVESLTQSNETIVAEQGTSFFGASTIFLKSNSRFIG QPLWGSIGYTFPAALGSQIADKESRHLLFIGDGSLQLTVQELGLSIREKLNPICFIINNDGYTVEREI HGPTQSYNDIPMWNYSKLPETFGATEDRVVSKIVRTENEFVSVMKEAQADVNRMYWIELVLEK EDAPKLLKKMGKLFAEQNK (SEQ ID NO: 61)
Lactococcus lactis phenylpyruvate decarboxylase (KdcA) DNA atgtatacagtaggagattacctgttagaccgattacacgagttgggaattgaagaaatttttggagttcctggtgactataacttacaatttttagatcaaattat ttcacgcgaagatatgaaatggattggaaatgctaatgaattaaatgcttcttatatggctgatggttatgctcgtactaaaaaagctgccgcatttctcaccac atttggagtcggcgaattgagtgcgatcaatggactggcaggaagttatgccgaaaatttaccagtagtagaaattgttggttcaccaacttcaaaagtaca aaatgacggaaaatttgtccatcatacactagcagatggtgattttaaacactttatgaagatgcatgaacctgttacagcagcgcggactttactgacagca gaaaatgccacatatgaaattgaccgagtactttctcaattactaaaagaaagaaaaccagtctatattaacttaccagtcgatgttgctgcagcaaaagcag agaagcctgcattatctttagaaaaagaaagctctacaacaaatacaactgaacaagtgattttgagtaagattgaagaaagtttgaaaaatgcccaaaaac cagtagtgattgcaggacacgaagtaattagttttggtttagaaaaaacggtaactcagtttgtttcagaaacaaaactaccgattacgacactaaattttggta aaagtgctgttgatgaatctttgccctcatttttaggaatatataacgggaaactttcagaaatcagtcttaaaaattttgtggagtccgcagactttatcctaatg cttggagtgaagcttacggactcctcaacaggtgcattcacacatcatttagatgaaaataaaatgatttcactaaacatagatgaaggaataattttcaataa agtggtagaagattttgattttagagcagtggtttcttctttatcagaattaaaaggaatagaatatgaaggacaatatattgataagcaatatgaagaatttattc catcaagtgctcccttatcacaagaccgtctatggcaggcagttgaaagtttgactcaaagcaatgaaacaatcgttgctgaacaaggaacctcattttttgg agcttcaacaattttcttaaaatcaaatagtcgttttattggacaacctttatggggttctattggatatacttttccagcggctttaggaagccaaattgcggata aagagagcagacaccttttatttattggtgatggttcacttcaacttaccgtacaagaattaggactatcaatcagagaaaaactcaatccaatttgttttatcat aaataatgatggttatacagttgaaagagaaatccacggacctactcaaagttataacgacattccaatgtggaattactcgaaattaccagaaacatttgga gcaacagaagatcgtgtagtatcaaaaattgttagaacagagaatgaatttgtgtctgtcatgaaagaagcccaagcagatgtcaatagaatgtattggata gaactagttttggaaaaagaagatgcgccaaaattactgaaaaaaatgggtaaattatttgctgagcaaaataaatag (SEQ ID NO: 62)
Thalictrum flavum norcoclaurine synthase (TfNCS)
MMKMEVVFVFLMLLGTINCQKLILTGRPFLHHQGIINQVSTVTKVIHHELEVAASADDIWTVYS WPGLAKHLPDLLPGAFEKLEIIGDGGVGTILDMTFVPGEFPHEYKEKFILVDNEHRLKKVQMIEG GYLDLGVTYYMDTIHVVPTGKDSCVIKSSTEYHVKPEFVKIVEPLITTGPLAAMADAISKLVLEH KSKSNSDEIEAAIITV (SEQ ID NO: 68)
Thalictrum flavum norcoclaurine synthase (TfNCS) DNA atgatgaagatggaagttgtatttgttttcttaatgttgttaggaacaataaattgccagaaactgattctgacaggtaggccgtttctgcaccacc agggcataataaaccaggtgtctacagtcacaaaagtgattcatcatgagttggaagttgctgcttcagctgatgatatatggactgtttatagct ggcctggcttggccaagcatcttcctgacttgctccctggcgcttttgaaaagctagaaatcattggtgatggaggtgttggtaccatcctaga catgacatttgtaccaggtgaatttcctcatgaatacaaggagaagtttatattagtcgataatgagcatcgtttaaagaaggtgcaaatgattga gggaggttatctggacttgggagtaacatactacatggacacaatccatgttgttccaactggtaaagattcatgtgttattaaatcctcaactga gtaccatgtgaaacctgagtttgtcaaaatcgttgaaccacttatcaccaccggtccattagctgccatggcagacgccatctcaaaacttgttc tagaacacaaatccaaaagcaactcagatgaaattgaggccgcaataataacagtctga (SEQ ID NO: 69)
P. putida L-DOPA decarboxylase (PpDDC) His181Leu
MTPEQFRQYGHQLIDLIADYRQTVGERPVMAQVEPGYLKAALPATAPQQGEPFAAILDDVNNLV
MPGLSHWQHPDFYGYFPSNGTLSSVLGDFLSTGLGVLGLSWQSSPALSELEETTLDWLRQLLGLS
GQWSGVIQDTASTSTLVALISARERATDYALVRGGLQAEPKPLIVYVSAHALSSVDKAALLAGFG
RDNIRLIPTDERYALRPEALQAAIEQDLAAGNQPCAVVATTGTTTTTALDPLRPVGEIAQANGLW
LHVDSAMAGSAMILPECRWMWDGIELADSVVVNAHKWLGVAFDCSIYYVRDPQHLIRVMSTNP
SYLQSAVDGEVKNLRDWGIPLGRRFRALKLWFMLRSEGVDALQARLRRDLDNAQWLAGQVEA
AAEWEVLAPVQLQTLCIRHRPAGLEGEALDAHTKGWAERLNASGAAYVTPATLDGRWMVRVSI
GALPTERGDVQRLWARLQDVIKG (SEQ ID NO: 70)
P. putida L-DOPA decarboxylase (PpDDC) Hisl81Leu DNA atgacccccgaacaattccgccagtacggccaccaactgatcgacctgattgccgactaccgccagaccgtgggcgaacgcccggtcatggcccaggt cgaacctggctatctcaaggccgccttgcccgcaactgcccctcaacaaggcgaacctttcgcggccattctcgacgacgtcaataacctggtcatgccc ggcctgtcccattggcagcacccggacttctatggctatttcccttccaatggcaccctgtcctcggtgctgggggacttcctcagtaccggtctgggcgtg ctgggcctgtcctggcaatccagcccggccctgagcgaactggaagaaaccaccctcgactggctgcgccagttgcttggcctgtctggccagtggagt ggggtgatccaggacactgcctcgaccagcaccctggtggcgctgatcagtgcccgtgaacgcgccactgactacgccctggtacgtggtggcctgca ggccgagcccaagcctttgatcgtgtatgtcagcgcccacgcccttagctcggtggacaaggctgcactgctggcaggttttggccgcgacaatatccgc ctgattcccaccgacgaacgctacgccctgcgcccagaggcactgcaggcggcgatcgaacaggacctggctgccggcaaccagccgtgcgccgtg gttgccaccaccggcaccacgacgaccactgccctcgacccgctgcgcccggtcggtgaaatcgcccaggccaatgggctgtggttgcacgttgactc ggccatggccggttcggcgatgatcctgcccgagtgccgctggatgtgggacggcatcgagctggccgattcggtggtggtcaacgcgcacaaatggc tgggtgtggccttcgattgctcgatctactacgtgcgcgatccgcaacacctgatccgggtgatgagcaccaatcccagctacctgcagtcggcggtgga tggcgaggtgaagaacctgcgcgactgggggataccgctgggccgtcggttccgtgcgttgaagctgtggttcatgttgcgcagcgagggtgtcgacg cattgcaggcgcggctgcggcgtgacctggacaatgcccagtggctggcggggcaggtcgaggcggcggcggagtgggaagtgttggcgccagta cagctgcaaaccttgtgcattcgccatcgaccggcggggcttgaaggggaggcgctggatgcgcataccaagggctgggccgagcggctgaatgcat ccggcgctgcttatgtgacgccggctacactggacgggcggtggatggtgcgggtttcgattggtgcgctgccgaccgagcggggggatgtgcagcg gctgtgggcacgtctgcaggacgtgatcaagggctga (SEQ ID NO: 71)
Beta vulgaris CYP76AD5 (BvCYP76AD5)
MDNTTLALILSSLFVCFQLIRSFINHAKKSNKLPPGPKRMPIFGNIFDLGEKPHRSFANLAKIHGPL VSLQLGSVTTVVVSSADVAKEMFLKNDQALANRTIPDSVRAGDHDKLSMSWLPVSAKWRNLRK ISAVQLLSTQRFDASQAHRQSKVQQFLEYVHDCSKKGQPVDIGRAAFTTSLNLFSNTFFSVELAS HESSASQEFKQLMWNIMEEIGRPNYADFFPILGYLDPFGIRRRLAGYFDQLIAVFQDIIGERQKIRS ANLSGGKQTTNDILDTLLNLYDEKELSMGEINHLLVDIFDAGTDTTASTLEWAMAELVKNPDMM VKVQDEIEQAIGKGCSMVQESDISKLPYLQAIIKETLRLHPPTVFLLPRKADADVELYGYVVPKN AQVLVNLWAIGRDPKVWKNPEVFSPERFLESNIDYKGRDFELLPFGAGRRICPGLTLAYRMLNL MMNFLHSYDWKLEDGMHPKDLDMDEKFGITLQKVKPLQVIPVPRK (SEQ ID NO: 72)
Beta vulgaris CYP76AD5 (BvCYP76AD5) DNA atggataacactacacttgcattgatactttcttctttatttgtatgttttcaacttattcgatctttcattaaccatgctaaaaaatccaacaaacttccaccagggc caaaaagaatgccgatttttggcaatatttttgatcttggtgaaaaacctcatcgctcatttgcgaatcttgctaagattcacggccctttggtgagcctacagtt aggaagtgttacaactgttgtagtatcatcagcagatgtggctaaagaaatgttccttaaaaatgatcaagcacttgctaacagaactatccctgattcagtta gagcaggtgatcatgataagctgtctatgtcatggttgcctgtatcggctaaatggcggaacctaagaaaaatctccgctgtgcaattgctttcgacgcaac gacttgatgctagtcaagcgcatagacaatccaaggtgcaacaacttcttgaatatgtgcatgattgttctaaaaaaggacaacctgttgacattggaaggg cagcatttactacttcactcaatttattatcaaacacatttttctcagttgaattagctagccatgaatctagtgcttcacaagagtttaagcaactcatgtggaac attatggaggaaattggtaggcctaattatgctgatttcttccccattcttggctaccttgatccttttggcataaggcgtcgtttggctggttactttgatcaactt attgctgtttttcaagacattattggtgaaaggcaaaagattcgatctgctaatctttctggtgggaaacaaacaacaaatgacattcttgacactcttctc aacctctatgatgagaaagagttgagtatgggtgaaatcaatcatctcctagtggatatctttgacgctggaacagacactacagctagcacattggaatgg gcaatggcagagctagttaaaaatccggatatgatggtcaaagttcaagacgaaatcgagcaagcgattggaaaaggttgttcaatggttcaagaatccg atatctcaaaactcccatacttgcaagctattatcaaggaaacattgcgtctacatcctccaactgtatttctcttacctcgaaaggcagacgctgacgtggag ttatatggttatgttgtacccaaaaatgcacaagttctagtcaatctatgggcaattggtcgtgatccaaaggtatggaaaaatccagaagtattttctcctgaa aggtttttagagagtaatattgattacaagggacgagattttgagcttttaccatttggtgctggaagaaggatatgtcctggactcactctagcttatagaatgt taaatttgatgatggccaattttcttcattcctatgattggaagcttgaagatggtatgcatccaaaagatttggacatggatgagaaatttggtataactttgca gaaggttaagcctctccaagttattcccgtacctaggaaataa (SEQ ID NO: 73)
EcTAT - Tyrosine aminotransferase (TAT) of E. coli
MFQKVDAYAGDPILTLMERFKEDPRSDKVNLSIGLYYNEDGIIPQLQAVAEAEARLNAQPHGASL
YLPMEGLNCYRHAIAPLLFGADHPVLKQQRVATIQTLGGSGALKVGADFLKRYFPESGVWVSDP
TWENHVAIFAGAGFEVSTYPWYDEATNGVRFNDLLATLKTLPARSIVLLHPCCHNPTGADLTND
QWDAVIEILKARELIPFLDIAYQGFGAGMEEDAYAIRAIASAGLPALVSNSFSKIFSLYGERVGGLS
VMCEDAEAAGRVLGQLKATVRRNYSSPPNFGAQVVAAVLNDEALKASWLAEVEEMRTRILAM
RQELVKVLSTEMPERNFDYLLNQRGMFSYTGLSAAQVDRLREEFGVYLIASGRMCVAGLNTAN
VQRVAKAFAAVM (SEQ ID NO: 74)
EcTAT - Tyrosine aminotransferase (TAT) of E. coli DNA atgtttcaaaaagttgacgcctacgctggcgacccgattcttacgcttatggagcgttttaaagaagaccctcgcagcgacaaagtgaatttaagtatcggtc tgtactacaacgaagacggaattattccacaactgcaagccgtggcggaggcggaagcgcgcctgaatgcgcagcctcatggcgcttcgctttatttacc gatggaagggcttaactgctatcgccatgccattgcgccgctgctgtttggtgcggaccatccggtactgaaacaacagcgcgtagcaaccattcaaacc cttggcggctccggggcattgaaagtgggcgcggatttcctgaaacgctacttcccggaatcaggcgtctgggtcagcgatcctacctgggaaaaccac gtagcaatattcgccggggctggattcgaagtgagtacttacccctggtatgacgaagcgactaacggcgtgcgctttaatgacctgttggcgacgctgaa aacattacctgcccgcagtattgtgttgctgcatccatgttgccacaacccaacgggtgccgatctcactaatgatcagtgggatgcggtgattgaaattctc aaagcccgcgagcttattccattcctcgatattgcctatcaaggatttggtgccggtatggaagaggatgcctacgctattcgcgccattgccagcgctgga ttacccgctctggtgagcaattcgttctcgaaaattttctccctttacggcgagcgcgtcggcggactttctgttatgtgtgaagatgccgaagccgctggcc gcgtactggggcaattgaaagcaacagttcgccgcaactactccagcccgccgaattttggtgcgcaggtggtggctgcagtgctgaatgacgaggcat tgaaagccagctggctggcggaagtagaagagatgcgtactcgcattctggcaatgcgtcaggaattggtgaaggtattaagcacagagatgccagaac gcaatttcgattatctgcttaatcagcgcggcatgttcagttataccggtttaagtgccgctcaggttgaccgactacgtgaagaatttggtgtctatctcatcg ccagcggtcgcatgtgtgtcgccgggttaaatacggcaaatgtacaacgtgtggcaaaggcgtttgctgcggtgatgtaa (SEQ ID NO: 75)
Full length PsNMCH-Il
MEIVTVSLVAVVITTFLYLIFRDSSPKGLPPGPKPWPIVGNLLQLGEKPHSQFAQLAETYGDLFSLK LGSETVVVASTPLAASEILKTHDRVLSGRYVFQSFRVKEHVENSIVWSECNETWKKLRKVCRTEL FTQKMIESQAEVRESKAMEMVEYLKKNVGNEVKIAEVVFGTLVNIFGNLIFSQNIFKLGDESSGS VEMKEHLWRMLELGNSTNPADYFPFLGKFDLFGQRKDVADCLQGIYSVWGAMLKERKIAKQH NNSKKNDFVEILLDSGLDDQQINALLMEIFGAGTETSASTIEWALSELTKNPQVTANMRLELLSV VGKRPVKESDIPNMPYLQAFVKETLRLHPATPLLLPRRALETCKVLNYTIPKECQIMVNAWGIGR DPKRWTDPLKFSPERFLNSSIDFKGNDFELIPFGAGRRICPGVPLATQFISLIVSSLVQNFDWGLPK GMDPSQLIMEEKFGLTLQKEPPLYIVPKTRD (SEQ ID NO: 76)
Full length PsNMCH-Il DNA atggagatcgtcacagtatcacttgtagcagttgtgatcactactttcttatacttaatcttcagagattcaagtcctaaaggtttgccaccaggtccaaaaccct ggccaatagttggaaaccttcttcaacttggtgagaaacctcattctcagtttgctcagcttgctgaaacctatggtgatctcttttcactgaaactaggaagtg aaacggttgttgtagcttcaactccattagcagctagcgagattctaaagacgcatgatcgtgttctctctggtcgatacgtgtttcaaagtttccgggtaaaa gaacatgtggagaactctattgtgtggtctgaatgtaatgaaacatggaagaaactgcggaaagtttgtagaacggaactttttacgcagaagatgattgaa agtcaagctgaagttagagaaagtaaggctatggaaatggtggagtatttgaagaaaaatgtaggaaatgaagtgaaaattgctgaagttgtatttgggac gttggtgaatatattcggtaacttgatattttcacaaaatattttcaagttgggtgatgaaagtagtggaagtgtagaaatgaaagaacatctatggagaatgct ggaattggggaactcgacaaatccagctgattattttccatttttgggtaaattcgatttgtttggacaaagaaaagatgttgctgattgtctgcaagggatttat agtgtttggggtgctatgctcaaagaaagaaaaatagccaagcagcataacaacagcaagaagaatgattttgttgagattttgctcgattccggactcgat gaccagcagattaatgccttgctcatggaaatatttggtgcgggaacagagacaagtgcatctacaatagaatgggcgttgtctgagctcacaaaaaaccc tcaagtaacagccaatatgcggttggaattgttatctgtggtagggaagaggccggttaaggaatccgacataccaaacatgccttatcttcaagcttttgtta aagaaactctacggcttcatccagcaactcctctgctgcttccacgtcgagcacttgagacctgcaaagttttgaactatacgatcccgaaagagtgtcaga ttatggtgaacgcctggggcattggtcgggatccaaaaaggtggactgatccattgaagttttcaccagagaggttcttgaattcgagcattgatttcaaagg gaacgacttcgagttgataccatttggtgcagggagaaggatatgtcctggtgtgcccttggcaactcaatttattagtcttattgtgtctagtttggtacagaat tttgattggggattaccgaagggaatggatcctagccaactgatcatggaagagaaatttgggttgacactgcaaaaggaaccacctctgtatattgttccta aaactcgggattaa (SEQ ID NO: 77)
Full length PsCPR-L
MESNSMKLSIVDLMSAILNGKLDQADSILIENREILMILTTAIAVFIGCGFLYIWRRSFRKSSKIVEV
PKTGFTKEPEPEIDDGKKKVTIFFGTQTGTAEGFAKALSEEAKARYDKAVFKVVDLDDYAADDD
EFEEKLKKENLALFFLATYGDGEPTDNAARFYKWFTEVAKEKEPWLPNLNFGVFGLGNRQYEHF
NKVAKVVDEIIVEFGGKRFVPVGLGDDDQCIEDDFTAWREFVWPEFDQFFFDENDSTSVSTPYA
AAVAEYRVVFHDSADASLQDKNWSNANGYAVYDALHPCRANVAVRRELHTPASDRSCIHLEFD
ISGTGLTYETGDHVGVYSENCMETVEEAERLLGLSSDTVFSIHVDNEDGTPIAGSALPPPFPSPSTL
RTALTKYADLLNFPKKAALHALAAHASDPKEAERLRFLASPAGKDEYAQWVVASQRSLLEVMA
EFPSAKPPLGVFFAAIAPRLQPRFYSISSSNRMAPSRIHVTCALVNERTPAGRIHKGVCSTWMKNS
VPSEESRHCSWAPVFVRQSNFKLPADSTVPIIMIGPGTGLAPFRGFMQERLALKEAGVELGAAVL
FFGCRNRSMDFIYEDELNNFVESGAISELVVAFSREGPTKEYVQHKMTEKASDIWNMISQGAYLY
VCGDAKGMAKDVHRTLHTIVQEQGSLDSSKTEMLVKNLQMDGRYLRDVW (SEQ ID NO: 78)
Full length PsCPR-L atggagtcaaattcgatgaaactatcgatagttgatttaatgtctgcaattttaaatgggaaattagatcaagcagattcaattttgatagagaatcgtgagatttt gatgatattgactacagctatagccgtttttattggttgtggtttcctttatatttggagaagatcttttaggaaatccagtaaaattgttgaggtacctaaaactgg ttttactaaagaacctgaacctgaaattgacgatggtaaaaagaaagttactatcttctttggtactcaaactggtactgctgagggtttcgctaaagcactttct gaagaagcaaaagcaagatatgacaaagctgtctttaaagtggttgatctggatgattacgcagcagatgatgatgagtttgaggagaaactaaaaaaag aaaatttagcgcttttctttttagctacctacggagacggtgaaccaacagataatgctgccagattttataaatggtttacggaagtggctaaagagaagga accatggcttccgaatcttaactttggtgtgtttggattgggaaatagacagtatgagcatttcaataaggttgcaaaggttgttgatgagattattgttgaactg ggtgggaaacgtcttgttcctgtgggtcttggagacgacgaccaatgtatagaagatgactttacagcatggcgagagttggtatggcctgaattggatca gttgctccttgatgaaaatgattcaacgagtgtttcaaccccttacgctgctgctgtagcagaatatagggtggtattccatgattctgctgatgcatccctaca agacaagaactggagtaatgccaatggctatgctgtctatgatgctctgcacccatgcagagccaatgtggctgtaagaagggagcttcacactccagctt ctgatcgttcttgtattcatctggaatttgacatatcaggcactgggcttacgtatgaaactggagatcatgttggtgtctactctgaaaactgcatggaaactgt ggaggaagcggaaagattgttgggtctttcatcggacactgtattttctattcacgtcgataacgaggatgggacaccgatcgccggcagcgcattacctc ccccttttccctctcccagcactttaagaactgcacttaccaaatatgctgatctattgaatttccccaagaaggctgctctacatgctctagctgctcatgcatc tgatccaaaggaagctgagcgattaagatttcttgcatctcctgctggaaaggatgaatatgcacagtgggtagttgcaagtcagagaagtctgctagaagt catggctgaatttccatcagctaaacctccacttggggtgttctttgcagcaatagcacctcggctgcagcctagattctattcgatttcgtcctccaacaggat ggcaccctctagaattcatgtcacatgtgcgctagtgaatgagagaacaccagctggtcgaattcataaaggagtctgttcaacctggatgaagaattctgtt ccttcggaagaaagccgtcactgcagctgggcaccagtttttgtgagacaatctaacttcaaactgcctgctgattctacagtaccaattatcatgattggcc ctggtactgggttggctcctttcagaggattcatgcaggaacgacttgctcttaaggaagccggtgtagaattgggagctgcggtcctgttctttggatgcag aaacagaagcatggatttcattatgaagacgagctgaataactttgtcgagtcaggtgctatctctgagttggtggtcgctttctcacgtgagggtcctacca 
aagaatacgtacaacataagatgacagagaaggcttccgacatctggaatatgatctctcagggtgcttatctttacgtctgtggtgatgccaaaggcatgg ccaaggatgtgcatcgaactcttcacacaattgttcaagagcagggatctttagacagctccaagactgaaatgttggtgaagaatctgcagatggatggg aggtatctacgtgatgtctggtga (SEQ ID NO: 79)
Full length P. somniferum PDC1 isoform XI
MVFINLINTPALKSLTIIIINLGEEKKMSSQIELGTSLHPTNSSPVPLTNASNSATLGRHLARRLVQA
GVKDVFSVPGDFNLCLLDHLIAEPELNLVGCCNELNAGYAADGYARANGVGACVVTFTVGGLSI
LNAIAGAYSENLPVICIVGGPNSNDYGTNRILHHTIGLPDFTQELRCFQTVTCFQAVVNNLDDAHE
LIDTAISTALKESKPVYISIGCNLPAVPHPTFTREPVPFYLAPRISNQMGLEAAVEAAAAFLNKAVK
PVIVGGPRLRVCKAQQAFVELADASGYPIAVMPSGKGLIPEHHPHFIGTYWGAVSSSFCGEIVESA
DAYVFVGPIFNDYSSVGYSLLIKKEKAIIIQPNRVTIGDGPSFGWVFMADFLTALASKLKRNTTAM
ENHRRIFVPPGIALKREANEPLRVNILFKHIQEMLSGDTAVIAETGDSWFNCQKLHLPENCGYEFQ
MQYGSIGWSVGATLGYAQAVKHKRVIACIGDGSFQVTAQDVSTMIRCGQKSIIFLINNGGYTIEV
EIHDGPYNVIKNWNYTKFVDAIHNGEGKCWTTKVKTEEELIEAIAKATGDEKDSLCFIEVLVHKD
DTSKELLEWGSRVSAANSRPPNPQ (SEQ ID NO: 80)
Full length P. somniferum PDC1 isoform XI DNA atggttttcatcaaccttattaataccccggccctcaagtcacttaccatcatcattatcaacctaggagaagaaaaaaaaatgagttctcaaattgaacttgga accagtctacatcctacaaactcatcacccgtaccactaactaatgcttcaaattctgcaacacttggtagacacttagcacgtcgtctagttcaagctggtgt aaaagatgtgttctcagtacctggtgattttaacttgtgtttattagatcatctaatagctgaaccggagctcaacttagttggttgctgtaatgaacttaatgctg gttatgctgccgatggttatgcaagagcaaatggtgtcggtgcttgtgttgttacttttactgttggtggacttagtattcttaatgcaattgctggtgcttatagtg aaaatctacctgttatttgtattgtcggtggtcctaattctaatgattatggtactaatcgtattcttcatcatactattggattacctgattttactcaagaacttcgat gctttcaaactgttacttgtttccaggctgtagttaacaacttggatgatgcacatgagctgattgacactgccatctccactgctttgaaagaaagcaagcct gtttatatcagcattggctgtaacttacctgcagttcctcacccaaccttcactagggagcctgttccgttctatcttgctccaaggattagcaatcaaatgggg ctagaggctgcagtggaggcagcagcagcatttttgaacaaggctgtaaagcctgtgattgtgggagggcctaggttaagggtgtgcaaggctcaacaa gcatttgttgagctagcagatgccagcgggtatcccatagctgttatgccatcaggcaaaggtctgatacctgaacatcaccctcacttcataggaacatact ggggtgeegteagttcxagcttetgtggtgaaattgtggagtcageggatgeetatgtttttgttggteeaatttttaatgaetaeagttetgtgggataetcgtt gcttatcaagaaggagaaagccataattatacagcctaaccgggttaccatcggtgatggcccttcttttggatgggtctttatggctgacttcttgactgcttt agcctcaaaactgaagaggaacactacagctatggaaaatcatcgcagaatctttgtcccgcccggtatcgctctgaagcgtgaggctaatgaaccgttg agagtcaacatcctcttcaaacatattcaggaaatgctgagcggagacacagctgttattgcagaaacaggagattcatggttcaattgtcagaaattacatc tcccagaaaattgcggatatgagttccagatgcagtacggatctattggatggtcagtaggtgcaacccttggatatgcacaggctgtcaaacataagcgt gtcattgcctgcattggtgatggcagtttccaggtaacagctcaggatgtatccacaatgatccgctgtggccagaagagtatcatattcctcatcaacaacg gaggatacacaattgaagttgagatccatgacgggccatacaatgtaatcaaaaactggaattacaccaagttcgttgatgccatccataatggtgaagga aaatgttggaccaccaaggtgaaaacagaggaggaactaattgaagcgattgcaaaagcaacaggagatgaaaaggatagcttatgctttatagaagtct tggtgcacaaagatgatacgagcaaagaactgttagagtggggatcaagggtctctgctgccaatagccgcccacccaatcctcagtag (SEQ ID
NO: 81)
Petroselinum crispum 4HPAAS (Pc4HPAAS)
MGSIDNLTEKLASQFPMNTLEPEEFRRQGHMMIDFLADYYRKVENYPVRSQVSPGYLREILPESA
PYNPESLETILQDVQTKIIPGITHWQSPNFFAYFPSSGSTAGFLGEMLSTGFNVVGFNWMVSPAAT
ELENVVTDWFGKMLQLPKSFLFSGGGGGVLQGTTCEAILCTLVAARDKNLRQHGMDNIGKLVV
YCSDQTHSALQKAAKIAGIDPKNFRAIETTKSSNFQLCPKRLESAILHDLQNGLIPLYLCATVGTTS
STTVDPLPALTEVAKKYDLWVHVDAAYAGSACICPEFRQYLDGVENADSFSLNAHKWFLTTLD
CCCLWVRNPSALIKSLSTYPEFLKNNASETNKVVDYKDWQIMLSRRFRALKLWFVLRSYGVGQL
REFIRGHVGMAKYFEGLVNMDKRFEVVAPRLFSMVCFRIKPSAMIGKNDEDEVNEINRKLLESV
NDSGRIYVSHTVLGGIYVIRFAIGGTLTDINHVSAAWKVLQDHAGALLDDTFTSNKLVEVLS
(SEQ ID NO: 82) Pc4HPAAS DNA atgggctccatcgataatcttactgaaaaattagcatcccaattcccaatgaatacacttgagcctgaagagttccgaaggcaaggccacatgatgatagat tttcttgctgattactatcgtaaagttgaaaattatccagttagaagtcaggtctcacctggttatcttcgcgaaattttaccagaatctgccccatacaaccccg aatctcttgaaacaattcttcaagatgtacaaaccaaaataatccctggcatcacacattggcaaagtcccaacttctttgcttattttccttccagtggtagtact gctggttttcttggtgaaatgcttagtactggtttcaatgttgttggctttaactggatggtttcacctgctgctaccgagctcgaaaacgttgtcaccgattggtt cggaaagatgcttcaacttcccaaatcctttcttttctctggtggtggcggaggtgtcctgcaagggactacttgcgaggccatactgtgtacacttgtggca gcaagagacaaaaacctgaggcaacatggcatggataatattggcaagttggtcgtttattgttctgaccaaactcattctgccctgcaaaaggctgccaaa attgctgggattgatcccaagaacttccgtgcaatcgaaacaactaaatcctcaaatttccagctctgtcctaagcgactggaatcggctattttgcatgatttg caaaatgggcttattccattgtacttgtgtgctactgttgggacaacgtcatcaacaactgttgatcctttgccagctcttacagaggtggcgaaaaagtacga tttatgggtgcatgtggatgctgcatatgctggaagtgcttgtatatgccccgaatttcgacagtatcttgacggtgtggaaaatgcagattcttttagtttgaat gcacacaagtggtttttgacaacattagattgttgttgtctttgggtgaggaatccgagtgctcttataaagtctctttccacatatcctgagttcttgaagaataa tgctagtgaaacaaacaaggtggtggattacaaagactggcaaataatgttgagcaggcgatttcgagcattgaaattatggtttgtattgagaagctacgg agttggtcagctgagggagtttattagagggcatgtaggcatggccaagtatttcgaagggctagtgaacatggacaagaggttcgaagttgtagctccta gactattttctatggtctgttttaggattaagccatctgcgatgatcgggaaaaatgatgaagatgaagtaaacgagatcaaccggaagttgttggagtcggt gaatgattcgggtcggatatatgtgagtcacacggtgttaggagggatttacgtaatccggtttgccataggagggaccctaacagatattaaccatgtgag tgcggcttggaaggtgttacaggaccatgcaggcgccttgcttgatgatactttcacgtccaataagctcgtggaagtattatcataa (SEQ ID NO:
83)
Zymomonas mobilis pyruvate decarboxylase (ZmPDC)
MSYTVGTYLAERLVQIGLKHHFAVAGDYNLVLLDNLLLNKNMEQVYCCNELNCGFSAEGYAR
AKGAAAAVVTYSVGALSAFDAIGGAYAENLPVILISGAPNNNDHAAGHVLHHALGKTDYHYQL
EMAKNITAAAEAIYTPEEAPAKIDHVIKTALREKKPVYLEIACNIASMPCAAPGPASALFNDEASD
EASLNAAVEETLKFIANRDKVAVLVGSKLRAAGAEEAAVKFADALGGAVATMAAAKSFFPEEN
PHYIGTSWGEVSYPGVEKTMKEADAVIALAPVFNDYSTTGWTDIPDPKKLVLAEPRSVVVNGIRF
PSVHLKDYLTRLAQKVSKKTGALDFFKSLNAGELKKAAPADPSAPLVNAEIARQVEALLTPNTT
VIAETGDSWFNAQRMKLPNGARVEYEMQWGHIGWSVPAAFGYAVGAPERRNILMVGDGSFQL
TAQEVAQMVRLKLPVIIFLINNYGYTIEVMIHDGPYNNIKNWDYAGLMEVFNGNGGYDSGAGK
GLKAKTGGELAEAIKVALANTDGPTLIECFIGREDCTEELVKWGKRVAAANSRKPVNKLL (SEQ ID NO: 84)
ZmPDC DNA atgagttatactgtcggtacctatttagcggagcggcttgtccagattggtctcaagcatcacttcgcagtcgcgggcgactacaacctcgtccttcttgaca acctgcttttgaacaaaaacatggagcaggtttattgctgtaacgaactgaactgcggtttcagtgcagaaggttatgctcgtgccaaaggcgcagcagca gccgtcgttacctacagcgtcggtgcgctttccgcatttgatgctatcggtggcgcctatgcagaaaaccttccggttatcctgatctccggtgctccgaaca acaatgatcacgctgctggtcacgtgttgcatcacgctcttggcaaaaccgactatcactatcagttggaaatggccaagaacatcacggccgccgctgaa gc gatttacaccccggaagaagctccggctaaaatcgatcacgtgattaaaactgctcttcgtgagaagaagccggtttatctcgaaatcgcttgcaacatt gcttccatgccctgcgccgctcctggaccggcaagcgcattgttcaatgacgaagccagcgacgaagcttctttgaatgcagcggttgaagaaaccctga aattcatcgccaaccgcgacaaagttgccgtcctcgtcggcagcaagctgcgcgcagctggtgctgaagaagctgctgtcaaatttgctgatgctctcggt ggcgcagttgctaccatggctgctgcaaaaagcttcttcccagaagaaaacccgcattacatcggcacctcatggggtgaagtcagctatccgggcgttg aaaagacgatgaaagaagccgatgcggttatcgctctggctcctgtcttcaacgactactccaccactggttggacggatattcctgatcctaagaaactgg ttctcgctgaaccgcgttctgtcgtcgttaacggcattcgcttccccagcgtccatctgaaagactatctgacccgtttggctcagaaagtttccaagaaaac cggtgcattggacttcttcaaatccctcaatgcaggtgaactgaagaaagccgctccggctgatccgagtgctccgttggtcaacgcagaaatcgcccgtc aggtcgaagctcttctgaccccgaacacgacggttattgctgaaaccggtgactcttggttcaatgctcagcgcatgaagctcccgaacggtgctcgcgtt gaatatgaaatgcagtggggtcacattggttggtccgttcctgccgccttcggttatgccgtcggtgctccggaacgtcgcaacatcctcatggttggtgat ggttccttccagctgacggctcaggaagtcgctcagatggttcgcctgaaactgccggttatcatcttcttgatcaataactatggttacaccatcgaagttat gatccatgatggtccgtacaacaacatcaagaactgggattatgccggtctgatggaagtgttcaacggtaacggtggttatgacagcggtgctggtaaag gcctgaaggctaaaaccggtggcgaactggcagaagctatcaaggttgctctggcaaacaccgacggcccaaccctgatcgaatgcttcatcggtcgtg aagactgcactgaagaattggtcaaatggggtaagcgcgttgctgccgccaacagccgtaagcctgttaacaagctcctctag (SEQ ID NO: 85)
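Each protein listing above is paired with a DNA listing, so one simple consistency check is to translate the DNA with the standard genetic code and compare the result with the listed amino acid sequence. The following is a minimal sketch of such a check, assuming the standard genetic code (NCBI translation table 1) applies to every listing. The names `CODON_TABLE`, `strip_layout`, and `translate` are illustrative only and are not part of the disclosure. The example uses the first 13 codons of EcHpaC-DNA (SEQ ID NO: 52) and the first 13 residues of EcHpaC (SEQ ID NO: 51).

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def strip_layout(seq: str) -> str:
    """Drop spaces and line breaks left over from page layout; upper-case the rest."""
    return "".join(ch for ch in seq.upper() if not ch.isspace())


def translate(dna: str) -> str:
    """Translate a coding sequence from its first codon up to the first stop codon.

    Codons containing anything other than A, C, G, or T are reported as 'X'.
    """
    dna = strip_layout(dna)
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":  # stop codon ends the open reading frame
            break
        protein.append(aa)
    return "".join(protein)


# First 13 codons of the EcHpaC-DNA listing (SEQ ID NO: 52).
hpac_dna_start = "atgcaattagatgaacaacgcctgcgctttcgtgacgcg"
# First 13 residues of the EcHpaC protein listing (SEQ ID NO: 51).
hpac_protein_start = "MQLDEQRLRFRDA"

print(translate(hpac_dna_start) == hpac_protein_start)  # prints True
```

An 'X' in the output, or a mismatch against the paired protein listing, marks a position in the text that should be checked against the official sequence listing before the sequence is used.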
While specific embodiments have been described above with reference to the disclosed embodiments and examples, such embodiments are only illustrative and do not limit the scope of the invention. Changes and modifications can be made in accordance with ordinary skill in the art without departing from the invention in its broader aspects as defined in the following claims. All publications, patents, and patent documents are incorporated by reference herein, as though individually incorporated by reference. No limitations inconsistent with this disclosure are to be understood therefrom. The invention has been described with reference to various specific and preferred embodiments and techniques. However, it should be understood that many variations and modifications may be made while remaining within the spirit and scope of the invention.

Claims

What is claimed is:
1. A modified cell comprising one or more polynucleotides encoding one or more heterologous enzymes for the production of a benzylisoquinoline compound, wherein the polynucleotides encode one or more heterologous enzymes operably linked to a polynucleotide sequence controlling expression of the one or more heterologous enzymes, wherein expression of the one or more heterologous enzymes permits the modified cell to produce the benzylisoquinoline compound when provided with a substrate, and wherein the modified cell is not a plant cell.
2. The modified cell of claim 1, wherein the production of the benzylisoquinoline compound comprises the formation of an intermediate compound selected from 4HPAA, norcoclaurine, or both.
3. The modified cell of claim 1, wherein the one or more heterologous enzymes comprise phenylpyruvate decarboxylase activity, aromatic acetaldehyde synthase activity, or both.
4. The modified cell of claim 3, wherein the one or more heterologous enzymes having phenylpyruvate decarboxylase activity convert 4-hydroxyphenylpyruvate (4HPP) to 4-hydroxyphenylacetaldehyde (4HPAA), or convert 3,4-dihydroxyphenylpyruvate to 3,4-dihydroxyphenylacetaldehyde (DHPAA), or both.
5. The modified cell of claim 4, wherein the one or more heterologous enzymes having phenylpyruvate decarboxylase activity comprise one or more of Saccharomyces cerevisiae transaminated amino acid decarboxylase (ARO10), pyruvate decarboxylase 1 from P. somniferum (PsPDC1), and pyruvate decarboxylase 1 isoform X1 (PsPDC1-X1) from P. somniferum.
6. The modified cell of claim 3, wherein the one or more heterologous enzymes having aromatic acetaldehyde synthase activity convert tyrosine to 4-hydroxyphenylacetaldehyde (4HPAA), or convert L-3,4-dihydroxyphenylalanine (L-DOPA) to 3,4-dihydroxyphenylacetaldehyde (DHPAA), or both.
7. The modified cell of claim 6, wherein the one or more heterologous enzymes having aromatic acetaldehyde synthase activity comprise one or more of: a tyrosine decarboxylase of P. somniferum (PsTyDC1); a PsTyDC1 variant having one or more amino acid substitutions selected from Leu205His or Leu205Asn, Tyr98Phe, Phe99Tyr; a P. somniferum tyrosine decarboxylase 3 (PsTyDC3); a PsTyDC3 variant having one or more substitutions selected from Ile370Ser, Tyr100Phe, Phe101Tyr, and His203Asn; a P. somniferum tyrosine decarboxylase 6 (PsTyDC6); a Pseudomonas putida L-DOPA decarboxylase (PpDDC1); and a PpDDC1 variant having one or more substitutions selected from Tyr79Phe, Phe80Tyr, His181Asn, and His181Leu.
8. The modified cell of claim 7, wherein the PsTyDC1 variant comprises the Leu205His substitution; or wherein the PsTyDC1 variant comprises the Tyr98Phe, the Phe99Tyr, and the Leu205Asn substitutions.
9. The modified cell of claim 8, wherein the PsTyDC1 variant consists of the Leu205His substitution; or wherein the PsTyDC1 variant consists of the Tyr98Phe, the Phe99Tyr, and the Leu205Asn substitutions.
10. The modified cell of claim 7, wherein the PsTyDC3 variant comprises the Ile370Ser substitution; or wherein the PsTyDC3 variant comprises the Tyr100Phe, the Phe101Tyr, and the His203Asn substitutions.
11. The modified cell of claim 10, wherein the PsTyDC3 variant consists of the Ile370Ser substitution; or wherein the PsTyDC3 variant consists of the Tyr100Phe, the Phe101Tyr, and the His203Asn substitutions.
12. The modified cell of claim 7, wherein the PpDDC1 variant comprises the Tyr79Phe, the Phe80Tyr, and the His181Asn substitutions.
13. The modified cell of claim 12, wherein the PpDDC1 variant consists of the Tyr79Phe, the Phe80Tyr, and the His181Asn substitutions.
14. The modified cell of claim 1, comprising one or more of L-DOPA decarboxylase (DDC), cytochrome P450 (CYP450), CYP450 reductase (CPR), norcoclaurine synthase (NCS), norcoclaurine 6-O-methyltransferase (6OMT), coclaurine N-methyltransferase (CNMT), N-methylcoclaurine 3'-hydroxylase (NMCH), 3'-hydroxy-N-methylcoclaurine 4'-O-methyltransferase (4'OMT), tyrosine 3-monooxygenase, and tyrosine aminotransferase.
15. The modified cell of claim 1, wherein the benzylisoquinoline compound comprises one or more of norcoclaurine, coclaurine, norlaudanosoline, laudanosoline, N-methylnorcoclaurine, N-methylcoclaurine, 3'-hydroxy-N-methylcoclaurine, reticuline, norreticuline, papaverine, laudanine, laudanosine, tetrahydropapaverine, 1,2-dihydropapaverine, and orientaline.
16. The modified cell of claim 1, wherein the substrate is L-tyrosine, L-3,4-dihydroxyphenylalanine (L-DOPA), dopamine, tyramine, 4-hydroxyphenylpyruvic acid (4HPP), 3,4-dihydroxyphenylacetaldehyde (DHPAA), or 4-hydroxyphenylacetaldehyde (4HPAA).
17. The modified cell of claim 1, wherein the modified cell is Escherichia coli.
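
The amino acid substitutions recited in the claims use the conventional notation of wild-type residue, 1-indexed position, and replacement residue (for example, Leu205His). As a minimal sketch of how such a notation may be read and applied to a sequence, the following Python example parses substitution strings and checks the wild-type residue before replacing it. The function names `parse_substitution` and `apply_substitutions` and the four-residue toy sequence are hypothetical and chosen for illustration; they are not taken from the disclosure.

```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

SUBSTITUTION = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_substitution(text: str) -> tuple[str, int, str]:
    """Parse e.g. 'Leu205His' into ('L', 205, 'H'); position is 1-indexed."""
    match = SUBSTITUTION.match(text)
    if match is None:
        raise ValueError(f"not a substitution: {text!r}")
    wild, position, new = match.groups()
    try:
        return THREE_TO_ONE[wild], int(position), THREE_TO_ONE[new]
    except KeyError as exc:
        raise ValueError(f"unknown residue in {text!r}") from exc


def apply_substitutions(protein: str, substitutions: list[str]) -> str:
    """Return a variant sequence, verifying each wild-type residue first."""
    residues = list(protein)
    for text in substitutions:
        wild, position, new = parse_substitution(text)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{text}: position outside the sequence")
        if protein[position - 1] != wild:
            raise ValueError(
                f"{text}: expected {wild} at position {position}, "
                f"found {protein[position - 1]}"
            )
        residues[position - 1] = new
    return "".join(residues)


# Hypothetical toy sequence (not a sequence from the disclosure).
toy = "MYFL"
print(apply_substitutions(toy, ["Tyr2Phe", "Phe3Tyr", "Leu4Asn"]))
```

Checking the wild-type residue against the unmodified sequence guards against off-by-one numbering errors, which are a common source of mistakes when a substitution list is applied to a sequence numbered from a different start position.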

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163133984P 2021-01-05 2021-01-05
US63/133,984 2021-01-05

Publications (1)

Publication Number Publication Date
WO2022150380A1 (en) 2022-07-14

Family

ID=82358102

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/011302 WO2022150380A1 (en) 2021-01-05 2022-01-05 Alkaloid producing oxidases and carboxy-lyases and methods of use

Country Status (1)

Country Link
WO (1) WO2022150380A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190127770A1 (en) * 2013-11-04 2019-05-02 The Board Of Trustees Of The Leland Stanford Junior University Benzylisoquinoline Alkaloid (BIA) Precursor Producing Microbes, and Methods of Making and Using the Same
WO2020198373A1 (en) * 2019-03-26 2020-10-01 Antheia, Inc. Methods of improving production of morphinan alkaloids and derivatives

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22737044

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22737044

Country of ref document: EP

Kind code of ref document: A1