CN118048329A

CN118048329A - CYP98A20 protein participating in acteoside biosynthesis, and coding gene and application thereof

Info

Publication number: CN118048329A
Application number: CN202211436961.8A
Authority: CN
Inventors: Liu Tao (刘涛); Xi Daoyi (席道义); Yang Yihan (杨一涵); Ma Yanhe (马延和)
Original assignee: Tianjin Institute of Industrial Biotechnology of CAS
Current assignee: Tianjin Institute of Industrial Biotechnology of CAS
Priority date: 2022-11-16
Filing date: 2022-11-16
Publication date: 2024-05-17

Abstract

The invention belongs to the technical field of biology, and particularly provides the cytochrome P450 monooxygenase CYP98A20, which participates in acteoside biosynthesis, together with its coding gene and applications thereof. The CYP98A20 protein catalyzes hydroxylation at the C-3 position of the coumaroyl moiety or at the C-3 position of the tyrosol moiety of osmanthuside B, and then hydroxylation at the remaining C-3 position, thereby producing acteoside. The invention is of theoretical and practical significance for regulating and producing plant phenylethanoid glycosides and for using biotechnology to increase the content of the phenylethanoid glycoside verbascoside (acteoside) and its derivatives in sesame.

Description

CYP98A20 protein participating in acteoside biosynthesis, and coding gene and application thereof

Technical Field

The invention belongs to the technical field of biology, and particularly relates to CYP98A20 protein participating in acteoside biosynthesis, and a coding gene and application thereof.

Background

Phenylethanoid glycosides (PhGs) are an important class of natural products in the plant kingdom. PhGs have various biological properties, such as neuroprotective, anti-inflammatory, antioxidant, antibacterial, and antiviral activities. Their core structure consists of a hydroxyphenylethyl group linked to β-glucopyranose, and the glucose moiety is typically decorated with phenolic acids and various sugar substituents through ester or glycosidic linkages. At least 572 PhGs have been identified, most of them isolated from medicinal plants. Acteoside, a hydroxytyrosol glucoside bearing caffeoyl and rhamnosyl substituents, is one of the best-known PhGs owing to its potential benefits to human health, and the Chinese Pharmacopoeia specifies acteoside as the quality-control marker for Rehmannia and Cistanche deserticola. Acteoside has been found in more than 200 species worldwide, most of which belong to the order Lamiales. Numerous pharmacological studies have shown that acteoside has a wide range of biological activities, including antioxidant, antibacterial, anti-inflammatory, cytoprotective, and anticancer effects. Sesame (scientific name: Sesamum indicum) is an annual erect herb of the genus Sesamum in the family Pedaliaceae.

With the development of molecular biology, plant physiology and biochemistry, and related disciplines, research on plant secondary metabolites has advanced steadily. However, plant secondary metabolites are numerous and structurally diverse, and their biosynthetic pathways are varied and complex; many of these pathways are still unknown, or only their general route is known (Yazaki K. 2017, Plant Biotechnol, 34(3): 131-142). No biosynthetic enzyme that catalyzes the conversion of osmanthuside B into acteoside has been reported. There is therefore an urgent need to clone the relevant enzymes, which could be used to increase the content of a target ingredient or to produce active ingredients or intermediates directly.

Disclosure of Invention

The invention aims to provide CYP98A20 protein participating in acteoside biosynthesis, and a coding gene and application thereof.

In a first aspect of the invention, there is provided an isolated cytochrome P450 monooxygenase CYP98A20 polypeptide selected from the group consisting of:

(a) A polypeptide having the amino acid sequence shown in SEQ ID NO. 1;

(b) A derivative protein formed from the amino acid sequence shown in SEQ ID NO. 1 by substitution, deletion, or addition of one or more amino acid residues (preferably 1 to 50, more preferably 1 to 30, still more preferably 1 to 10, most preferably 1 to 6, e.g., 1, 2, or 3), which has the activity of catalyzing osmanthuside B;

(c) A derivative protein whose sequence comprises the protein sequence of (a) or (b);

(d) A derivative protein whose amino acid sequence has ≥65% homology (preferably ≥70%, ≥75%, ≥80%, or ≥85%, more preferably ≥90%, and preferably ≥95%, ≥98%, or ≥99%) to the amino acid sequence shown in SEQ ID NO. 1, and which has the activity of catalyzing osmanthuside B.
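Item (d) turns on a percent-identity ("homology") threshold, but the text does not say how identity is computed. The following Python sketch assumes one common convention: identical residues divided by aligned columns of two sequences that have already been aligned (for example with BLAST or a Needleman-Wunsch aligner). The function names and toy sequences are hypothetical; the toy sequences are not SEQ ID NO. 1.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are skipped; columns with a gap in
    only one sequence count as mismatches. Other conventions exist (for example,
    dividing by the length of the shorter or of the reference sequence).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_query: str, aligned_ref: str,
                             threshold: float = 65.0) -> bool:
    """True if identity is at least `threshold` percent (65% is the broadest level in item (d))."""
    return percent_identity(aligned_query, aligned_ref) >= threshold


# Hypothetical toy alignment (not SEQ ID NO. 1): 7 of 8 columns are identical.
print(percent_identity("MKTAYIAK", "MKTAYLAK"))  # 87.5
```

Because the result depends on the alignment and on how gaps are counted, a real comparison against SEQ ID NO. 1 should state which aligner and identity definition were used.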

In another preferred embodiment, the sequence (c) is a fusion protein formed by adding a tag sequence, a signal sequence or a secretion signal sequence to the sequence (a) or (b).

In another preferred embodiment, the CYP98A20 polypeptide is derived from a plant of the family Pedaliaceae, preferably sesame.

In another preferred embodiment, the amino acid sequence of the CYP98A20 polypeptide is shown in SEQ ID NO. 1.

In a second aspect, the invention provides an isolated polynucleotide selected from the group consisting of:

(a) A nucleotide sequence encoding the CYP98A20 polypeptide shown in SEQ ID NO. 1;

(b) A nucleotide sequence as set forth in SEQ ID NO. 2;

(c) A nucleotide sequence having ≥75% (preferably ≥80%, more preferably ≥90%) homology to the sequence shown in SEQ ID NO. 2;

(d) A nucleotide sequence obtained by truncating or adding 1 to 60 (preferably 1 to 30, more preferably 1 to 10) nucleotides at the 5' and/or 3' end of the nucleotide sequence shown in SEQ ID NO. 2;

(e) A nucleotide sequence complementary (preferably fully complementary) to the nucleotide sequence of any one of (a) - (d).

In another preferred embodiment, the nucleotide sequence is shown in SEQ ID NO. 2.

In another preferred embodiment, the polynucleotide having the sequence shown in SEQ ID NO. 2 encodes the polypeptide having the amino acid sequence shown in SEQ ID NO. 1.

In a third aspect, the invention provides a vector comprising a polynucleotide according to the second aspect of the invention.

In another preferred embodiment, the vector is selected from the group consisting of expression vectors, shuttle vectors, integration vectors, and combinations thereof.

In another preferred embodiment, the vector is selected from the group consisting of bacterial plasmids, phages, yeast plasmids, plant cell viruses, animal cell viruses, retroviruses, and combinations thereof.

In another preferred embodiment, the vector is a yeast expression vector, such as a pESC-, pYES-, pUG-, pSH-, or pRS-series vector.

In a fourth aspect, the invention provides a genetically engineered host cell comprising a vector according to the third aspect of the invention, or having integrated into its genome a polynucleotide according to the second aspect of the invention.

In another preferred embodiment, the host cell is a prokaryotic or eukaryotic cell. Further preferably, the host cell is a lower eukaryotic cell, such as a yeast cell; alternatively, it is a higher eukaryotic cell, such as a mammalian cell, or a prokaryotic cell, such as a bacterial cell, preferably E. coli. The host cell may also be a higher plant cell or an insect cell.

Particularly preferably, the host cell is selected from Saccharomyces cerevisiae and E. coli, and most preferably it is a Saccharomyces cerevisiae cell.

In a fifth aspect, the invention provides a method of producing a CYP98A20 polypeptide, the method comprising:

(a) Culturing the host cell of the fourth aspect of the invention under conditions suitable for expression;

(b) Isolating the CYP98A20 polypeptide from the culture.

In a sixth aspect, the present invention provides the use of the CYP98A20 polypeptide of the first aspect of the invention or a derivative thereof, the vector of the third aspect, or the host cell of the fourth aspect for catalyzing, or for preparing a catalytic formulation that catalyzes, the following reaction: hydroxylation of the C-3 atom of the coumaroyl moiety or of the tyrosol moiety of osmanthuside B, followed by hydroxylation of the remaining C-3 atom, to produce acteoside.

In a seventh aspect, the present invention provides a method for preparing acteoside, comprising:

in the presence of the polypeptide of the first aspect of the invention or a derivative polypeptide thereof, catalyzing hydroxylation of the C-3 atom of the coumaroyl moiety or of the tyrosol moiety of osmanthuside B, and then hydroxylation of the remaining C-3 atom, thereby obtaining acteoside; the catalytic reaction process is shown in FIG. 1.

Specifically, the polypeptide and its derivative polypeptide are added to the catalytic reaction at the same time. In another preferred embodiment, the method is performed in the presence of a cofactor, preferably NADPH and/or NADH.

In a further preferred embodiment, the cofactor is used in an amount of 0.5-6.0 mM, preferably 0.5-5.0 mM, more preferably 1.0-3.0 mM.

In a particularly preferred mode, the process is carried out in the presence of oxygen.
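The cofactor and oxygen requirements follow from P450 monooxygenase stoichiometry: each hydroxylation ideally consumes one NADPH and one O₂ (RH + O₂ + NADPH + H⁺ → ROH + H₂O + NADP⁺), and converting osmanthuside B into acteoside requires two hydroxylations. The Python sketch below estimates the minimum demand under the assumption of fully coupled turnover with all substrate ending up as acteoside; real P450 systems are often partially uncoupled, so actual consumption is usually higher. The function name is hypothetical.

```python
# Two P450 hydroxylations convert osmanthuside B into acteoside (one on each aromatic ring).
HYDROXYLATIONS_PER_PRODUCT = 2


def min_cofactor_demand_mM(substrate_uM: float, conversion: float = 1.0) -> dict:
    """Minimum NADPH and O2 consumed (mM), assuming one NADPH and one O2 per
    hydroxylation and fully coupled turnover (no uncoupling, no stalled intermediates)."""
    if substrate_uM < 0:
        raise ValueError("substrate concentration must be non-negative")
    if not 0.0 <= conversion <= 1.0:
        raise ValueError("conversion must be a fraction between 0 and 1")
    hydroxylations_uM = substrate_uM * conversion * HYDROXYLATIONS_PER_PRODUCT
    demand_mM = hydroxylations_uM / 1000.0  # convert uM to mM
    return {"NADPH_mM": demand_mM, "O2_mM": demand_mM}


print(min_cofactor_demand_mM(100.0))  # {'NADPH_mM': 0.2, 'O2_mM': 0.2}
```

On this estimate, 100 μM substrate needs at least 0.2 mM NADPH, so the 1 mM NADPH used in Example 2 would be roughly a five-fold excess, if that value is the final concentration in the reaction.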

In a further preferred embodiment, the method further comprises adding an additive that regulates enzyme activity to the reaction system. Further preferably, the additive increases or inhibits the enzyme activity and is selected from the group consisting of Ca²⁺, Co²⁺, Mn²⁺, Ba²⁺, Al³⁺, Ni²⁺, Zn²⁺, and Fe²⁺.

In a particularly preferred embodiment, the pH of the reaction system is 6.5-8.5, preferably 7.4-7.6. In a specific embodiment, the temperature of the reaction system is 25 °C to 35 °C, preferably 28 °C to 30 °C.

In a particularly preferred embodiment, the reaction time is 0.5 h to 24 h, preferably 1 h to 10 h, more preferably 3 h to 6 h.
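The preferred reaction conditions above can be encoded as a small configuration check. This Python sketch is only an illustration: `RANGES`, `ReactionSetup`, and `check_setup` are hypothetical names, and each parameter is classified against the broadest range and the narrowest preferred range stated in the text (cofactor amount, pH, temperature, and reaction time).

```python
from dataclasses import dataclass

# (broadest range, narrowest preferred range) for each parameter, as stated in the text.
RANGES = {
    "cofactor_mM": ((0.5, 6.0), (1.0, 3.0)),
    "pH": ((6.5, 8.5), (7.4, 7.6)),
    "temperature_C": ((25.0, 35.0), (28.0, 30.0)),
    "time_h": ((0.5, 24.0), (3.0, 6.0)),
}


@dataclass
class ReactionSetup:
    cofactor_mM: float
    pH: float
    temperature_C: float
    time_h: float


def check_setup(setup: ReactionSetup) -> dict:
    """Label each parameter 'preferred', 'allowed' or 'outside' the stated ranges."""
    labels = {}
    for name, ((low, high), (pref_low, pref_high)) in RANGES.items():
        value = getattr(setup, name)
        if pref_low <= value <= pref_high:
            labels[name] = "preferred"
        elif low <= value <= high:
            labels[name] = "allowed"
        else:
            labels[name] = "outside"
    return labels


# All four parameters of this example setup fall in the narrowest preferred ranges.
print(check_setup(ReactionSetup(cofactor_mM=1.0, pH=7.5, temperature_C=30.0, time_h=6.0)))
```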

The invention mined and isolated, for the first time, the osmanthuside B hydroxylase CYP98A20, a cytochrome P450 monooxygenase, from the sesame (Sesamum indicum) transcriptome, and verified that it is a key enzyme in acteoside biosynthesis: CYP98A20 catalyzes the conversion of osmanthuside B into acteoside. The invention is therefore of theoretical and practical significance for regulating and producing plant phenylethanoid glycosides and for using biotechnology to increase the content of the phenylethanoid glycoside verbascoside (acteoside) and its derivatives in sesame.

Drawings

FIG. 1 shows the chemical structures of osmanthuside B, the monohydroxylated intermediates, and acteoside, and the catalytic hydroxylation process.

FIG. 2 shows the results of the CYP98A20 crude enzyme reaction assay.

FIG. 3 shows the results of CYP98A20 microsomal assay.

FIG. 4 shows the LC-MS detection results of CYP98A20 catalyzed reactions.

Detailed Description

Through intensive research on acteoside biosynthesis, the inventors isolated for the first time a cytochrome P450 monooxygenase, CYP98A20, involved in acteoside biosynthesis. Specifically, the CYP98A20 protein catalyzes the conversion of osmanthuside B into acteoside. Cloning and functional characterization of this cytochrome P450 monooxygenase gene are key to elucidating the biosynthetic pathway of acteoside and its derivatives in sesame, and open broad application prospects for using biotechnology to increase the content of target components or to produce active ingredients or intermediates directly. The present invention was completed on the basis of this finding.

Definitions

As used herein, the terms "active polypeptide", "polypeptide of the invention and its derivatives", "enzyme of the invention", and "CYP98A20 of the invention" all refer to the CYP98A20 polypeptide (SEQ ID NO. 1) and its derivatives.

As used herein, an "isolated polypeptide" is substantially free of other proteins, lipids, carbohydrates, or other substances with which it is naturally associated. A person skilled in the art can purify the polypeptide using standard protein purification techniques. A substantially pure polypeptide produces a single main band on a non-reducing polyacrylamide gel, and its purity can be further analyzed by amino acid sequencing.

The active polypeptide of the present invention may be a recombinant, natural, or synthetic polypeptide. It may be a naturally purified product or a chemically synthesized product, or it may be produced from prokaryotic or eukaryotic hosts (e.g., bacteria, yeast, or plants) using recombinant techniques.

The invention also includes fragments, derivatives and analogues of the polypeptides. As used herein, the terms "fragment," "derivative," and "analog" refer to a polypeptide that retains substantially the same biological function or activity as the polypeptide.

The polypeptide fragments, derivatives, or analogues of the invention may be (i) polypeptides in which one or more conserved or non-conserved amino acid residues (preferably conserved residues) are substituted, where the substituting residue may or may not be encoded by the genetic code; (ii) polypeptides having a substituent group on one or more amino acid residues; (iii) polypeptides formed by fusing the mature polypeptide with another compound, such as a compound that extends the half-life of the polypeptide (for example, polyethylene glycol); or (iv) polypeptides formed by fusing an additional amino acid sequence to the polypeptide, such as a leader or secretory sequence, a sequence used to purify the polypeptide, a proprotein sequence, or a fusion with an antigenic IgG fragment. Such fragments, derivatives, and analogues are within the purview of those skilled in the art in light of the teachings herein.

The polynucleotides of the invention may be in the form of DNA or RNA. DNA forms include cDNA, genomic DNA, and synthetic DNA. The DNA may be single-stranded or double-stranded, and may be the coding strand or the non-coding strand. The coding region sequence encoding the mature polypeptide may be identical to the coding region sequence shown in SEQ ID NO. 2 or may be a degenerate variant thereof.

The term "polynucleotide encoding a polypeptide" may refer to a polynucleotide that encodes only the polypeptide, or to one that also includes additional coding and/or non-coding sequences.

The invention also relates to variants of the above polynucleotides that encode polypeptides having the same amino acid sequence as the polypeptide of the invention, or fragments, analogues, and derivatives thereof. Variants of the polynucleotide may be naturally occurring allelic variants or non-naturally occurring variants, including substitution, deletion, and insertion variants. As known in the art, an allelic variant is an alternative form of a polynucleotide, which may carry a substitution, deletion, or insertion of one or more nucleotides without substantially altering the function of the encoded polypeptide.

The invention also relates to polynucleotides that hybridize to the sequences described above and share at least 50%, preferably at least 70%, more preferably at least 80% identity with them. In particular, the invention relates to polynucleotides that hybridize to the polynucleotides of the invention under stringent conditions.

The invention also relates to nucleic acid fragments that hybridize to the sequences described above. As used herein, a "nucleic acid fragment" is at least 15 nucleotides, preferably at least 30 nucleotides, more preferably at least 50 nucleotides, and most preferably at least 100 nucleotides in length. The nucleic acid fragments may be used in nucleic acid amplification techniques (e.g., PCR) to identify and/or isolate polynucleotides encoding the CYP98A20 protein.

The polypeptides and polynucleotides of the invention are preferably provided in isolated form, and more preferably purified to homogeneity.

The full-length nucleotide sequence or a fragment thereof of the present invention can be usually obtained by a PCR amplification method, a recombinant method or an artificial synthesis method. For the PCR amplification method, primers can be designed according to the nucleotide sequences disclosed in the present invention, particularly the open reading frame sequences, and amplified to obtain the relevant sequences using a commercially available cDNA library or a cDNA library prepared according to a conventional method known to those skilled in the art as a template. When the sequence is longer, it is often necessary to perform two or more PCR amplifications, and then splice the amplified fragments together in the correct order.

Once the relevant sequences are obtained, recombinant methods can be used to obtain the relevant sequences in large quantities. This is usually done by cloning it into a vector, transferring it into a cell, and isolating the relevant sequence from the propagated host cell by conventional methods.

Furthermore, the sequences concerned, particularly short fragments, can also be obtained by artificial synthesis. In general, very long sequences are obtained by first synthesizing multiple small fragments and then ligating them.

At present, it is already possible to obtain the DNA sequences encoding the proteins of the invention (or fragments or derivatives thereof) entirely by chemical synthesis. The DNA sequence can then be introduced into a variety of existing DNA molecules (or vectors, for example) and cells known in the art. In addition, mutations can be introduced into the protein sequences of the invention by chemical synthesis.

PCR amplification of DNA/RNA is the preferred method for obtaining the genes of the present invention. In particular, when it is difficult to obtain a full-length cDNA from a library, the RACE method (rapid amplification of cDNA ends) is preferably used. PCR primers can be selected appropriately from the sequence information disclosed herein and synthesized by conventional methods. The amplified DNA/RNA fragments can be isolated and purified by conventional methods, such as gel electrophoresis.

The term "recombinant expression vector" refers to bacterial plasmids, phages, yeast plasmids, plant cell viruses, mammalian cell viruses such as adenoviruses, retroviruses or other vectors well known in the art. Any plasmid or vector may be used as long as it is replicable and stable in the host. An important feature of expression vectors is that they generally contain an origin of replication, a promoter, a marker gene and translational control elements.

Methods well known to those skilled in the art can be used to construct expression vectors containing a DNA sequence encoding the CYP98A20 polypeptide and appropriate transcriptional/translational control signals. These methods include in vitro recombinant DNA techniques, DNA synthesis techniques, and in vivo recombination techniques. The DNA sequence may be operably linked to an appropriate promoter in the expression vector to direct mRNA synthesis. Representative examples of such promoters are the E. coli lac or trp promoter and the lambda phage PL promoter; eukaryotic promoters include the CMV immediate early promoter, the HSV thymidine kinase promoter, the early and late SV40 promoters, retroviral LTRs, and other known promoters that control gene expression in prokaryotic or eukaryotic cells or their viruses. The expression vector also includes a ribosome binding site for translation initiation and a transcription terminator.

In addition, the expression vector preferably comprises one or more selectable marker genes to provide phenotypic traits for selection of transformed host cells, such as dihydrofolate reductase, neomycin resistance and Green Fluorescent Protein (GFP) for eukaryotic cell culture, or tetracycline or ampicillin resistance for E.coli.

Vectors comprising the appropriate DNA sequences as described above, as well as appropriate promoter or control sequences, may be used to transform appropriate host cells to enable expression of the protein.

The host cell may be a prokaryotic cell, such as a bacterial cell; a lower eukaryotic cell, such as a yeast cell; or a higher eukaryotic cell, such as a mammalian cell. Representative examples are bacterial cells such as E. coli, Streptomyces, and Salmonella typhimurium; fungal cells such as yeast; plant cells; insect cells such as Drosophila S2 or Sf9; and animal cells such as CHO, COS, and 293 cells or Bowes melanoma cells.

When the polynucleotide of the present invention is expressed in higher eukaryotic cells, transcription will be enhanced if an enhancer sequence is inserted into the vector. Enhancers are cis-acting elements of DNA, usually about 10 to 300 base pairs, that act on a promoter to increase the transcription of a gene. Examples include the SV40 enhancer 100 to 270 base pairs on the late side of the origin of replication, the polyoma enhancer on the late side of the origin of replication, and adenovirus enhancers.

It will be clear to a person of ordinary skill in the art how to select appropriate vectors, promoters, enhancers and host cells.

Transformation of host cells with recombinant DNA can be performed using conventional techniques well known to those skilled in the art. When the host is a prokaryote such as E. coli, competent cells capable of taking up DNA can be harvested after the exponential growth phase and treated by the CaCl₂ method using procedures well known in the art; alternatively, MgCl₂ can be used. Transformation can also be performed by electroporation if desired. When the host is a eukaryote, the following DNA transfection methods may be used: calcium phosphate co-precipitation, conventional mechanical methods such as microinjection, electroporation, liposome encapsulation, etc.

The transformant obtained can be cultured by a conventional method to express the polypeptide encoded by the gene of the present invention. The medium used in the culture may be selected from various conventional media depending on the host cell used. The culture is carried out under conditions suitable for the growth of the host cell. After the host cells have grown to the appropriate cell density, the selected promoters are induced by suitable means (e.g., temperature switching or chemical induction) and the cells are cultured for an additional period of time.

The recombinant polypeptide in the above method may be expressed inside the cell or on the cell membrane, or secreted outside the cell. If desired, the recombinant protein can be isolated and purified by various separation methods that exploit its physical, chemical, and other properties. Such methods are well known to those skilled in the art. Examples include, but are not limited to, conventional renaturation treatment, treatment with a protein precipitant (salting out), centrifugation, osmotic lysis, sonication, ultracentrifugation, molecular sieve chromatography (gel filtration), adsorption chromatography, ion exchange chromatography, high-performance liquid chromatography (HPLC), other liquid chromatography techniques, and combinations of these methods.

The invention will be further illustrated with reference to specific examples. These examples are only for illustrating the present invention and are not intended to limit the scope of the present invention.

For experimental procedures in the following examples for which specific conditions are not given, conventional conditions were generally followed, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the conditions recommended by the manufacturer. Percentages and parts are by weight unless otherwise indicated.

Materials and reagents

Lysis buffer: 50 mM NaH₂PO₄ and 300 mM NaCl, pH 7.4.

TEK wash buffer: 50 mM Tris-HCl (pH 7.5) containing 0.1 M KCl and 1 mM EDTA.

TEG preservation buffer: TE buffer containing 20% (v/v) glycerol.

YPD medium: 10 g/L yeast extract, 20 g/L peptone, and 20 g/L glucose; for solid medium, 20 g/L agar powder is added;

SC-Ura medium: 6.7 g/L yeast nitrogen base without amino acids, 2 g/L [component not named], 20 g/L glucose, and 0.9 g/L SC Dropout Mix -Ura;

Saccharomyces cerevisiae BY4742 was used as the expression host for the candidate P450 enzymes. The expression vector pESC-URA3 was purchased from Invitrogen, and the expression vector pCf302 has been described in: Jingjie Jiang, et al. "Metabolic Engineering of Saccharomyces cerevisiae for High-Level Production of Salidroside from Glucose." J. Agric. Food Chem. 2018, 66, 4431-4438.

Osmanthuside B was purchased from Chengdu Plant-Labeled Pure Biotechnology Co., Ltd.; NADPH was purchased from Solarbio Biotechnology Co., Ltd.; acteoside was purchased from Beijing Merida Technologies Co., Ltd.; the reverse transcription kit TransScript One-Step gDNA Removal and cDNA Synthesis SuperMix and the blunt-end rapid cloning kit pEASY-Blunt Cloning Kit were purchased from TransGen Biotech Co., Ltd. (Beijing).

Codon-optimized AtCPR1 from Arabidopsis thaliana was synthesized by Shanghai JieRui Bioengineering Co., Ltd.

EXAMPLE 1 Acquisition of the CYP98A20 protein and its encoding gene

Transcriptome data from sesame (Sesamum indicum) roots, stems, leaves, and petioles were analyzed, and gene expression levels were compared to identify candidate genes whose expression follows the trend leaf > petiole > stem > root. Based on second-generation RNA-seq data, differential gene expression analysis was performed to find candidate cytochrome P450 monooxygenase genes. Using |log2(ratio)| ≥ 1 and FDR < 0.001 as thresholds, together with Max(FPKM) ≥ 10, genes annotated as P450 enzymes were retrieved, and 23 P450 candidate genes with complete open reading frames and the expression trend leaf > petiole > stem > root were finally identified. Primers were designed from the full-length ORF sequences of the candidates, and the 23 full-length candidate genes were amplified by PCR using sesame cDNA as the template. Each was cloned into a Saccharomyces cerevisiae expression vector carrying the Arabidopsis thaliana cytochrome P450 reductase gene ATR1, yielding yeast recombinant expression vectors pCf302-ATR1-CYP carrying the different candidate P450 genes. Subsequent activity screening identified CYP98A20.

The CYP98A20 primers were CYP98A20-5F: 5'-ATGGCTTTACCTCTACTCATCC-3' and CYP98A20-3R: 5'-TCACACATTTCCGGAAGCCACA-3'. Using reverse-transcribed cDNA as the template, the target gene was amplified by PCR and cloned into the pEASY-Blunt vector with the blunt-end rapid cloning kit. Three single clones were selected and sequenced by Jinweizhi Biotechnology Co., Ltd. The sequence of the PCR product was identical to that obtained from the transcriptome; its DNA coding sequence and encoded amino acid sequence are shown in SEQ ID NO. 2 and SEQ ID NO. 1, respectively.
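The screening criteria above can be expressed as a filter over per-gene expression records. This Python sketch assumes a hypothetical `Transcript` record; the text does not say which tissue comparison the log2 ratio refers to, so `log2_ratio` is taken as a precomputed value, and the expression trend is interpreted as strictly decreasing FPKM from leaf to petiole to stem to root. The gene IDs and values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

TISSUE_ORDER = ("leaf", "petiole", "stem", "root")  # required order of decreasing expression


@dataclass
class Transcript:
    gene_id: str
    annotation: str           # functional annotation text
    fpkm: Dict[str, float]    # tissue -> FPKM
    log2_ratio: float         # precomputed differential-expression log2 fold change
    fdr: float                # false discovery rate of that comparison
    complete_orf: bool        # full-length open reading frame present


def is_p450_candidate(t: Transcript) -> bool:
    """Apply the stated criteria: P450 annotation, complete ORF, |log2 ratio| >= 1,
    FDR < 0.001, max FPKM >= 10 and FPKM leaf > petiole > stem > root."""
    if "P450" not in t.annotation or not t.complete_orf:
        return False
    if abs(t.log2_ratio) < 1 or t.fdr >= 0.001:
        return False
    if max(t.fpkm.values()) < 10:
        return False
    levels = [t.fpkm[tissue] for tissue in TISSUE_ORDER]
    return all(a > b for a, b in zip(levels, levels[1:]))


def screen(transcripts: List[Transcript]) -> List[str]:
    return [t.gene_id for t in transcripts if is_p450_candidate(t)]


# Hypothetical records for illustration only.
records = [
    Transcript("gene_A", "cytochrome P450 98A", {"leaf": 120, "petiole": 60, "stem": 20, "root": 5}, 4.6, 1e-6, True),
    Transcript("gene_B", "cytochrome P450 71D", {"leaf": 50, "petiole": 60, "stem": 20, "root": 5}, 3.3, 1e-6, True),
    Transcript("gene_C", "UDP-glycosyltransferase", {"leaf": 90, "petiole": 40, "stem": 10, "root": 2}, 5.5, 1e-8, True),
]
print(screen(records))  # ['gene_A']
```

Here gene_B fails the trend (petiole exceeds leaf) and gene_C lacks a P450 annotation.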
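The primer pair can be sanity-checked in a few lines: the forward primer should begin with the ATG start codon, and the reverse complement of the reverse primer should end with a stop codon. The Python sketch below also reports GC content and a rough Tm from the Wallace rule (2 °C per A/T, 4 °C per G/C). The Wallace rule is only an approximation for primers of this length, and the source does not state which Tm model or annealing temperature was used.

```python
FWD = "ATGGCTTTACCTCTACTCATCC"  # CYP98A20-5F, 5'->3'
REV = "TCACACATTTCCGGAAGCCACA"  # CYP98A20-3R, 5'->3'

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C): 2 per A/T plus 4 per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# The forward primer should start at the ATG start codon, and the reverse primer
# should anneal over the stop codon, so its reverse complement ends with one.
assert FWD.startswith("ATG")
assert reverse_complement(REV)[-3:] in STOP_CODONS

for name, primer in (("5F", FWD), ("3R", REV)):
    print(name, f"GC={gc_percent(primer):.1f}%", f"Tm~{wallace_tm(primer)} C")
```

For these sequences the forward primer has 45.5% GC and a Wallace Tm of 64 °C, the reverse primer has 50.0% GC and 66 °C, and the reverse complement of the reverse primer ends in TGA.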

EXAMPLE 2 Functional analysis of the CYP98A20 protein

1. Construction of recombinant strains

The recombinant plasmid pCf302-ATR1-CYP98A20 was constructed. Using pCf302 as the starting vector, the codon-optimized Arabidopsis thaliana cytochrome P450 reductase gene ATR1 was placed under the control of the TDH3 promoter (P_TDH3) to obtain the recombinant plasmid pCf302-ATR1. Using pCf302-ATR1 as the starting vector, the DNA sequence shown in SEQ ID NO. 2 was then inserted under the control of the PGK1 promoter (P_PGK1) to obtain the recombinant plasmid pCf302-ATR1-CYP98A20. For detailed procedures, see Jingjie Jiang, et al., J. Agric. Food Chem. 2018, 66, 4431-4438.

The recombinant plasmid pCf302-ATR1-CYP98A20 constructed above was transformed into Saccharomyces cerevisiae BY4742 to obtain the recombinant strain BY4742-ATR1-CYP98A20. The plasmid pCf302-ATR1, which carries only the ATR1 gene, was transformed into Saccharomyces cerevisiae BY4742 to obtain the recombinant strain BY4742-pCf302-ATR1, which served as the blank control strain.

2. Crude enzyme and microsomal reaction experiments

Single colonies of the recombinant strain BY4742-ATR1-CYP98A20 and the blank control strain BY4742-pCf302-ATR1 were inoculated into 3 mL of uracil-deficient SC-Ura liquid medium containing 20 g/L glucose and cultured at 30 °C and 220 rpm for 24 h. The cultures were then transferred at 1:100 into 50 mL of the same medium and cultured at 30 °C and 220 rpm for 36-48 h, and the cells were collected. The cells were disrupted with glass beads (0.5 g of 0.4-0.6 μm glass beads per 1 mL of cell suspension; vortexing for 1 min and then standing on ice for 1 min, repeated 5 times), and the supernatant was used as the crude enzyme. The in vitro crude enzyme reaction (100 μL crude enzyme, 50 μL of 50 mM Tris-HCl, 1 mM NADPH, 100 μM osmanthuside B) was carried out at 30 °C for 6 h. The reaction was then stopped with an equal volume of methanol, and the mixture was filtered through a 0.22 μm microporous membrane for HPLC-MS analysis.

For the microsomal reaction, the recombinant strain BY4742-ATR1-CYP98A20 and the blank control strain BY4742-pCf302-ATR1 were cultured and their cells disrupted with glass beads as described above, and the supernatant was collected. Yeast microsomes were isolated by ultracentrifugation (150,000 × g at 4 °C for 2 h); the supernatant was discarded and the pellet was resuspended in TEG buffer. The in vitro microsomal reaction (100 μL microsomes, 50 μL of 50 mM Tris-HCl, 1 mM NADPH, 100 μM osmanthuside B) was performed at 30 °C for 6 h, stopped with an equal volume of methanol, and filtered through a 0.22 μm microporous membrane for HPLC-MS analysis. HPLC-MS was performed on an Agilent 1200 series HPLC system coupled to a Bruker micrOTOF-II mass spectrometer (Bruker, Germany).

The HPLC parameters were as follows: column, YMC-Pack ODS-A (4.6 × 250 mm, 5 μm); mobile phase A, 0.1% (v/v) formic acid in water; mobile phase B, acetonitrile; injection volume, 50 μL; detection wavelength, 312 nm; flow rate, 1 mL/min; column temperature, 40 °C. Gradient elution: 0 min, 5% (v/v) B; 5-18 min, 5%-13.2% (v/v) B; 18-43 min, 18%-21.6% (v/v) B; 43-50 min, 100% (v/v) B; 50-57 min, 5% (v/v) B. The mass spectrometry conditions were as follows: electrospray ionization (ESI) in negative-ion mode; capillary voltage, -4500 V; nebulizer pressure, 1 bar; drying gas, nitrogen at 6.0 L/min; drying gas temperature and ion source temperature, 180 °C; scan range, m/z 50-1000. LC-MS data were collected with MassLynx 4.0 (Waters, USA), and sodium trifluoroacetate was used as the calibrant for accurate mass measurement.

As shown in FIG. 2, the absorption peak in the reaction with the blank strain BY4742-pCf has the same retention time as the osmanthuside B standard (I) (t_R = 39.27 min). In the crude enzyme and microsomal reaction systems of the recombinant strain BY4742-ATR1-CYP98A20, this substrate peak is weakened and three new, more polar absorption peaks appear. The most polar of these has the same retention time as the acteoside standard (IV) (t_R = 27.66 min), indicating that CYP98A20 efficiently converts the substrate osmanthuside B to acteoside. HPLC-MS confirmed that the accurate mass of compound (IV) ([M-H]⁻ = 623.2054) is consistent with that of acteoside, and the conversion rate of the microsomal reaction was calculated to be about 88.9%. New peaks (II) and (III) also appear in the CYP98A20-catalyzed reaction of osmanthuside B. Peak (III) has the same retention time as syringalide A 3'-α-L-rhamnopyranoside (t_R = 32.12 min), and its accurate mass ([M-H]⁻ = 607.2037), determined by HPLC-MS, identifies it as syringalide A 3'-α-L-rhamnopyranoside (FIG. 4). The accurate mass of (II) ([M-H]⁻ = 607.2039) indicates that it is also a monohydroxylated product. CYP98A20 therefore generates two monohydroxylated intermediates while converting osmanthuside B: (II), hydroxylated at C-3 of the tyrosol moiety, and (III), syringalide A 3'-α-L-rhamnopyranoside, hydroxylated at C-3 of the p-coumaroyl moiety.
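Accurate-mass assignments of this kind compare the measured [M-H]⁻ ion with the value calculated from a molecular formula. The Python sketch below performs that calculation for acteoside (C29H36O15) and for a monohydroxylated intermediate. The intermediate's formula, C29H36O14, is assumed here as acteoside minus one oxygen. The function names are illustrative, and the patent does not state which mass-error tolerance was applied.

```python
# Monoisotopic masses (u) of the most abundant isotopes of C, H and O
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.007276467  # [M-H]- = M - m(H atom) + m(electron) = M - m(proton)

ACTEOSIDE = {"C": 29, "H": 36, "O": 15}
MONOHYDROXY = {"C": 29, "H": 36, "O": 14}  # assumed formula for intermediates (II)/(III)


def monoisotopic_mass(formula: dict[str, int]) -> float:
    """Sum of monoisotopic atomic masses for a formula given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def deprotonated_mz(formula: dict[str, int]) -> float:
    """Theoretical m/z of the singly charged [M-H]- ion."""
    return monoisotopic_mass(formula) - PROTON


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass deviation in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


theo = deprotonated_mz(MONOHYDROXY)
print(round(theo, 4), round(ppm_error(607.2037, theo), 2))
```

The same `ppm_error` call can be applied to each measured value reported above. The acceptable deviation depends on the instrument's calibration and is left to the analyst.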

3. Crude enzyme and microsomal reaction experiments with syringalide A 3'-α-L-rhamnopyranoside

The CYP98A20 crude enzyme and microsomes prepared in Section 2 were used for in vitro catalytic reactions with syringalide A 3'-α-L-rhamnopyranoside as the substrate. As shown in FIG. 3, the absorption peak of syringalide A 3'-α-L-rhamnopyranoside (III) in the reaction with the blank strain BY4742-pCf302 has the same retention time as the standard (t_R = 32.12 min). In the crude enzyme and microsomal reaction systems of the recombinant strain BY4742-ATR1-CYP98A20, the peak of (III) is weakened and a new, more polar absorption peak appears with the same retention time as the verbascoside (acteoside) standard (t_R = 27.66 min). This indicates that CYP98A20 efficiently converts the monohydroxylated intermediate syringalide A 3'-α-L-rhamnopyranoside to verbascoside; the conversion rate of the microsomal reaction was calculated to be about 93.6%.
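The conversion rates above are reported without the formula used to obtain them. A common approach, assumed here rather than taken from the patent, estimates conversion from depletion of the substrate peak area at 312 nm relative to the blank-strain control. A minimal Python sketch, with a hypothetical function name:

```python
def conversion_percent(substrate_area_reaction: float, substrate_area_control: float) -> float:
    """Conversion (%) estimated from substrate peak depletion versus a blank-strain control.

    Assumes equal injection volumes and a linear detector response for the substrate.
    """
    if substrate_area_control <= 0:
        raise ValueError("control peak area must be positive")
    remaining = substrate_area_reaction / substrate_area_control
    return (1.0 - remaining) * 100.0


# Hypothetical peak areas, for illustration only
print(conversion_percent(25.0, 100.0))  # 75.0
```

An alternative is to quantify the product against a calibration curve of the acteoside standard. The two estimates agree only when no other products accumulate. That caveat applies to the osmanthuside B reaction, where intermediates (II) and (III) are also formed.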

All documents mentioned in this disclosure are incorporated by reference in this disclosure as if each were individually incorporated by reference. Further, it will be appreciated that various changes and modifications may be made by those skilled in the art after reading the above teachings, and such equivalents are intended to fall within the scope of the application as defined in the appended claims.

Claims

1. An isolated CYP98A20 polypeptide, wherein said polypeptide is selected from the group consisting of:

a) A polypeptide having an amino acid sequence as shown in SEQ ID NO. 1;

2. The polypeptide of claim 1, wherein the CYP98A20 polypeptide is derived from a plant of the family Pedaliaceae, preferably from sesame (Sesamum indicum).

3. A derivative peptide of the polypeptide according to claim 1 or 2, wherein a tag sequence, a signal sequence or a secretion signal sequence is added to the polypeptide according to claim 1 or 2 to form a fusion protein.

4. An isolated polynucleotide encoding the isolated CYP98A20 polypeptide of any one of claims 1 to 3, or the complement thereof.

5. A gene vector comprising the polynucleotide of claim 4, preferably an expression vector, a shuttle vector or an integration vector, or a bacterial plasmid, phage, yeast plasmid, plant cell virus, animal cell virus or retrovirus.

6. A genetically engineered host cell comprising the vector of claim 5, or having the polynucleotide of claim 4 integrated into its genome; preferably, the host cell is selected from the group consisting of bacterial cells, fungal cells, higher plant cells, insect cells and mammalian cells, more preferably yeast cells or Escherichia coli.

7. A method of producing a CYP98A20 polypeptide, said method comprising:

(a) Culturing the host cell of claim 6 under conditions suitable for expression;

(b) Isolating the CYP98A20 polypeptide from the culture.

8. Use of the CYP98A20 polypeptide according to claim 1 or 2, the derivative peptide according to claim 3, or a gene encoding the same, for catalyzing hydroxylation of the C-3 atom of the p-coumaroyl moiety or the C-3 atom of the tyrosol moiety of osmanthuside B, followed by hydroxylation of the other C-3 atom, to produce acteoside.

9. A process for preparing acteoside, characterized in that it comprises a catalytic reaction using osmanthuside B as the substrate in the presence of the CYP98A20 polypeptide according to claim 1 or 2 or the derivative peptide according to claim 3, thereby obtaining acteoside.

10. The preparation method according to claim 9, wherein the method is carried out in the presence of a cofactor, preferably NADPH and/or NADH, the cofactor further preferably being used in an amount of 0.5-6.0 mM, preferably 0.5-5.0 mM, more preferably 1.0-3.0 mM; preferably, the process is carried out in the presence of oxygen; preferably, an additive that increases or inhibits enzyme activity is also added to the reaction system, more specifically the additive for modulating enzyme activity is selected from the group consisting of Ca²⁺, Co²⁺, Mn²⁺, Ba²⁺, Al³⁺, Ni²⁺, Zn²⁺ and Fe²⁺; further preferably, the pH of the reaction system is 6.5 to 8.5, preferably 7.4 to 7.6, and the temperature of the reaction system is 25 ℃ to 35 ℃, preferably 28 ℃ to 30 ℃; preferably, the reaction time is 0.5 h to 24 h, preferably 1 h to 10 h, more preferably 3 h to 6 h.
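Claim 10 nests several numeric ranges. As an illustration only, the Python sketch below encodes those ranges and reports the narrowest tier that contains a given value. The tier labels ("broad", "preferred", "more preferred") and the names `CLAIM10_RANGES` and `narrowest_tier` are hypothetical. Checking the conditions of the microsomal example (1 mM NADPH, 30 ℃, 6 h) places each value within the preferred or more preferred range.

```python
from typing import Optional

# Nested ranges from claim 10, listed from broadest to narrowest (inclusive bounds)
CLAIM10_RANGES = {
    "cofactor_mM": [("broad", 0.5, 6.0), ("preferred", 0.5, 5.0), ("more preferred", 1.0, 3.0)],
    "pH": [("broad", 6.5, 8.5), ("preferred", 7.4, 7.6)],
    "temperature_C": [("broad", 25.0, 35.0), ("preferred", 28.0, 30.0)],
    "time_h": [("broad", 0.5, 24.0), ("preferred", 1.0, 10.0), ("more preferred", 3.0, 6.0)],
}


def narrowest_tier(parameter: str, value: float) -> Optional[str]:
    """Return the narrowest claimed tier containing value, or None if outside all ranges."""
    tier = None
    for name, low, high in CLAIM10_RANGES[parameter]:
        if low <= value <= high:
            tier = name  # entries are ordered broad to narrow, so keep the last match
    return tier


example = {"cofactor_mM": 1.0, "temperature_C": 30.0, "time_h": 6.0}
for parameter, value in example.items():
    print(parameter, value, narrowest_tier(parameter, value))
```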