CN117897480A

CN117897480A - Rhamnose highly specific glycosyltransferase and application thereof

Info

Publication number: CN117897480A
Application number: CN202280053383.1A
Authority: CN
Inventors: 严兴; 王平平; 李超静
Original assignee: Shenghe Everything Shanghai Biotechnology Co ltd
Current assignee: Shenghe Everything Shanghai Biotechnology Co ltd
Priority date: 2021-07-30
Filing date: 2022-08-01
Publication date: 2024-04-16
Also published as: CN115678952A; KR20240032944A; WO2023006109A1

Abstract

The invention provides a rhamnose highly specific glycosyltransferase and application thereof. The invention discloses for the first time a specific glycosyltransferase that catalyzes rhamnosylation of a substrate at a specific position with high catalytic activity. In particular, the specific glycosyltransferases of the invention specifically and efficiently catalyze glycosylation of the first glycosyl group at the C-6 position of a tetracyclic triterpene compound substrate, extending it with a rhamnosyl group. The invention also provides mutants of the specific glycosyltransferases. The specific glycosyltransferases have good specificity and high efficiency, can be applied to the construction of artificially synthesized ginsenosides, various new ginsenosides and derivatives thereof, and have good application value in pharmacy and related fields.

Description

Rhamnose highly specific glycosyltransferase and application thereof

The present application claims priority from Chinese application No. 202110871374.0, filed on July 30, 2021.

Technical Field

The invention relates to the fields of biotechnology and plant biology, in particular to a glycosyltransferase highly specific for rhamnose and application thereof.

Background

Ginsenoside is the general name for the saponins isolated from plants of the genus Panax (such as ginseng, notoginseng and American ginseng) and from Gynostemma pentaphyllum; they are triterpene compounds. Depending on the source from which they are isolated, they may also be called ginsenosides, notoginsenosides or gypenosides. Ginsenosides are the main bioactive components of these medicinal plants. About 150 saponins have been isolated to date. Structurally, ginsenosides are bioactive small molecules formed mainly by glycosylation of sapogenins. Ginsenosides have only a limited number of sapogenins, mainly the dammarane-type tetracyclic triterpenes protopanaxadiol and protopanaxatriol, and oleanolic acid. After glycosylation, sapogenins become more water-soluble, change their subcellular localization and exhibit different biological activities. Most protopanaxadiol-type saponins are glycosylated at the C3 and/or C20 hydroxyl groups, while protopanaxatriol-type saponins are glycosylated at the C6 and/or C20 hydroxyl groups. Different glycosyl types and different degrees of glycosylation give rise to ginsenosides with a wide variety of molecular structures.

Rhamnosylated ginsenosides have rich biological activity. For example, ginsenoside Rg2 is Rh1 extended by one molecule of rhamnose at C6-O-Glc, and Rg2 is effective in treating depression, improving heart function, improving learning and memory, and counteracting senile dementia; ginsenoside Re is Rg1 extended by one molecule of rhamnose at C6-O-Glc, and can lower blood sugar and treat diabetes by promoting secretion of glucagon-like peptide-1 in intestinal tissue.

Ginsenosides are prepared from the total saponins or abundant saponins of ginseng or notoginseng by chemical, enzymatic or microbial-fermentation hydrolysis. Because wild ginseng resources are essentially exhausted, ginsenosides are currently obtained from artificially cultivated ginseng or notoginseng. Cultivation has a long growth cycle (generally more than 5-7 years), is restricted by region, and is frequently affected by diseases and insect pests, so that large amounts of pesticides must be applied. Cultivation of ginseng or notoginseng also suffers from serious continuous-cropping obstacles (a plot on which ginseng or notoginseng has been grown must lie fallow for more than 5-15 years to overcome them). The yield, quality and safety of ginsenosides are therefore all under challenge.

The development of synthetic biology offers a new opportunity for the heterologous synthesis of plant-derived natural products. With yeast as the chassis, assembly and optimization of metabolic pathways has made it possible to synthesize artemisinic acid or dihydroartemisinic acid by fermentation from inexpensive monosaccharides, after which artemisinin is produced by a one-step chemical conversion; this shows that synthetic biology has great potential for synthesizing natural-product drugs. When ginsenoside monomers are synthesized heterologously in yeast chassis cells by synthetic biology, the raw materials are inexpensive monosaccharides and the preparation process is a fermentation process whose safety can be controlled, which avoids any external contamination (such as pesticides used when the source plants are cultivated). Preparing ginsenoside monomers by synthetic biology therefore has a cost advantage and also ensures the quality and safety of the finished product. Synthetic biology can be used to prepare sufficient quantities of various high-purity natural and non-natural ginsenoside monomers for activity assays and clinical trials, promoting the research and development of innovative drugs based on rare ginsenosides.

In recent years, research on the transcriptomes and functional genomes of ginseng, notoginseng and American ginseng has greatly advanced the elucidation of the ginsenoside synthesis pathway. In 2006, Japanese and Korean scientists independently identified the terpene cyclase element dammarenediol synthase (PgDDS), which converts epoxysqualene to dammarenediol. From 2011 to 2012, Korean scientists identified the cytochrome P450 elements CYP716A47 and CYP716A53v2, which oxidize dammarenediol to protopanaxadiol and protopanaxadiol to protopanaxatriol, respectively.

To synthesize pharmacologically active ginsenosides artificially by synthetic biology, it is necessary not only to construct the metabolic pathway that synthesizes the sapogenin but also to identify the UDP-glycosyltransferases that catalyze ginsenoside glycosylation. UDP-glycosyltransferases transfer the glycosyl group of a glycosyl donor (a nucleoside diphosphate sugar such as UDP-glucose, UDP-rhamnose, UDP-xylose or UDP-arabinose) to various glycosyl acceptors. Analysis of the plant genomes sequenced so far shows that a plant genome often encodes more than a hundred different glycosyltransferases. In 2015, Chinese scholars identified a UDP-glycosyltransferase element (UGTPg100) that can transfer a glucosyl group to the C6 position of protopanaxatriol. Chinese scholars have also disclosed glycosyltransferases (gGT29-7, etc.) that can extend the sugar chain at the C6 position of protopanaxatriol-type saponins (PCT/CN2015/081111). For example, gGT29-7 can use UDP-Xyl to extend the C6 position of Rh1 by one molecule of xylosyl to generate notoginsenoside R2, and can use UDP-Glc to extend the C6 position of Rh1 by one molecule of glucosyl to generate Rf, but is essentially unable to use UDP-Rha. The mutant gGT29-7 (N343G, A359P) disclosed in the same patent application (PCT/CN2015/081111) can use UDP-Rha to extend the C6 position of Rh1 by one molecule of rhamnosyl to form Rg2, but its activity is very low, with a conversion of only about 9%. In addition to UDP-Rha, gGT29-7 (N343G, A359P) can also use UDP-Glc as the donor for the transglycosylation reaction, and does so with higher catalytic efficiency than with UDP-Rha. The UDP-Rha activity of gGT29-7 (N343G, A359P) is therefore both low and non-specific, so that large amounts of by-products are synthesized and application requirements cannot be met.

Disclosure of Invention

Against this background, the inventors screened from ginseng the glycosyltransferases URT94-1 and URT94-2, which extend a rhamnosyl group at the C6 position using UDP-rhamnose. They specifically use UDP-Rha as the glycosyl donor and efficiently catalyze ginsenoside Rh1, ginsenoside Rg1 or notoginsenoside R3 to extend one molecule of rhamnose on the first glycosyl at the C6 position, yielding ginsenoside Rg2, ginsenoside Re or yesanchinoside E, respectively. URT94-1 and URT94-2 cannot, however, catalyze these saponin substrates using UDP-glucose as the glycosyl donor. These enzymes thus provide highly specific glycosyltransferases for the efficient preparation of saponins such as ginsenoside Rg2, ginsenoside Re and yesanchinoside E.

In a first aspect of the present invention, there is provided a method of linking a rhamnosyl group to the first glycosyl group at the C-6 position of a tetracyclic triterpene compound, comprising: performing the transfer with a specific glycosyltransferase that is a polypeptide having the amino acid sequence shown in SEQ ID NO: 2, SEQ ID NO: 4 or SEQ ID NO: 14, or a conservative variant polypeptide thereof.

In another aspect of the invention, there is provided the use of a specific glycosyltransferase for linking a rhamnosyl group to the first glycosyl group at the C-6 position of a tetracyclic triterpene compound (including use as a catalyst for the reaction), wherein the specific glycosyltransferase is a polypeptide having the amino acid sequence shown in SEQ ID NO: 2, SEQ ID NO: 4 or SEQ ID NO: 14, or a conservative variant polypeptide thereof.

In one or more embodiments, the rhamnosyl is provided by a glycosyl donor; preferably, the glycosyl donor is a glycosyl donor carrying a rhamnose group; more preferably, the glycosyl donor includes (but is not limited to): uridine diphosphate (UDP)-rhamnose, guanosine diphosphate (GDP)-rhamnose, adenosine diphosphate (ADP)-rhamnose, cytidine diphosphate (CDP)-rhamnose, thymidine diphosphate (TDP)-rhamnose.

In one or more embodiments, the tetracyclic triterpene compound is a compound of formula (I), and the compound having a glycosyl group attached to the glycosyl group at the C-6 position is a compound of formula (II);

wherein R1 and R2 are H or glycosyl, R3 is monosaccharide glycosyl, and R4 is rhamnosyl; preferably, the glycosyl or monosaccharide glycosyl (R3) is selected from: a glucosyl, xylosyl, arabinosyl or rhamnosyl group;

preferably, when R1 is H and R2 and R3 are glucosyl, the compound of formula (I) is ginsenoside Rg1 and the compound of formula (II) is ginsenoside Re; when R1 and R2 are H and R3 is glucosyl, the compound of formula (I) is ginsenoside Rh1 and the compound of formula (II) is ginsenoside Rg2.

In one or more embodiments, the tetracyclic triterpene compound is a compound of formula (III), and the compound having a glycosyl group attached to the glycosyl group at the C-6 position is a compound of formula (IV);

wherein R1 is H or glycosyl, R2, R3 and R4 are monosaccharide glycosyl, and R5 is rhamnosyl; preferably, the glycosyl (R1) or monosaccharide glycosyl (R2, R3, R4) is selected from: a glucosyl, xylosyl, arabinosyl or rhamnosyl group;

preferably, when R1 is H, R2, R3 and R4 are glucosyl, and R5 is rhamnosyl, the compound of formula (III) is notoginsenoside R3 and the compound of formula (IV) is yesanchinoside E.

In one or more embodiments, the substituent groups, substrates and products are as follows:

Substrate	R1	R2	R3	R4	Product
Ginsenoside Rg1	H	Glc	Glc	Rha	Ginsenoside Re
Ginsenoside Rh1	H	H	Glc	Rha	Ginsenoside Rg2

Substrate	R1	R2	R3	R4	R5	Product
Notoginsenoside R3	H	Glc	Glc	Glc	Rha	Yesanchinoside E
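The substrate and product assignments in the two tables above can be expressed as a small lookup structure. The following Python sketch is illustrative only: the names REACTIONS and rhamnosylate are hypothetical, and the entries simply transcribe the tables (the added rhamnosyl is R4 for formulas (I)/(II) and R5 for formulas (III)/(IV)).

```python
# Substrate/product pairs transcribed from the two tables in the text.
# "groups" holds the substituents already present on the substrate;
# "added" names the position that receives the rhamnosyl (Rha) group.
REACTIONS = {
    "ginsenoside Rg1": {
        "groups": {"R1": "H", "R2": "Glc", "R3": "Glc"},
        "added": ("R4", "Rha"),
        "product": "ginsenoside Re",
    },
    "ginsenoside Rh1": {
        "groups": {"R1": "H", "R2": "H", "R3": "Glc"},
        "added": ("R4", "Rha"),
        "product": "ginsenoside Rg2",
    },
    "notoginsenoside R3": {
        "groups": {"R1": "H", "R2": "Glc", "R3": "Glc", "R4": "Glc"},
        "added": ("R5", "Rha"),
        "product": "yesanchinoside E",
    },
}


def rhamnosylate(substrate):
    """Return (product name, substituent table of the product) for a substrate."""
    entry = REACTIONS[substrate]
    position, sugar = entry["added"]
    product_groups = dict(entry["groups"])  # copy so the table is not mutated
    product_groups[position] = sugar
    return entry["product"], product_groups


name, groups = rhamnosylate("ginsenoside Rh1")
print(name, groups)
```

A lookup keyed by substrate name keeps the formula (I)/(II) and formula (III)/(IV) cases in one structure even though they differ in the number of R groups.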

In one or more embodiments, the compounds of formula (I), (III) include, but are not limited to: dammarane-type tetracyclic triterpene compounds of S configuration or R configuration, lanostane-type tetracyclic triterpene compounds, apotirucallane-type tetracyclic triterpene compounds, tirucallane-type tetracyclic triterpene compounds, cycloartane-type tetracyclic triterpene compounds, cucurbitane-type tetracyclic triterpene compounds, or meliacane-type tetracyclic triterpene compounds.

In one or more embodiments, the compound of formula (II) or (IV) comprises ginsenoside Rg2, ginsenoside Re, yesanchinoside E.

In another aspect of the invention there is provided a method of intracellular attachment of a rhamnosyl group at the first glycosyl group at the C-6 position of a tetracyclic triterpene(s) compound comprising:

(a) introducing into a host cell a tetracyclic triterpene compound reaction precursor or a construct expressing/forming the same, and a specific glycosyltransferase or a construct expressing the same, to obtain a recombinant host cell; the specific glycosyltransferase is a polypeptide having the amino acid sequence shown in SEQ ID NO: 2, SEQ ID NO: 4 or SEQ ID NO: 14, or a conservative variant polypeptide thereof; a glycosyl donor bearing a rhamnose group is present in the host cell or is introduced into it (including constructs/precursors capable of forming the glycosyl donor);

(b) Culturing the recombinant host cell of (a) to obtain a tetracyclic triterpene compound having a rhamnosyl group attached to the first glycosyl group at position C-6;

preferably, the tetracyclic triterpene compound reaction precursor comprises: ginsenoside Rg1, ginsenoside Rh1, and notoginsenoside R3; the corresponding products include: ginsenoside Re, ginsenoside Rg2, yesanchinoside E;

preferably, the glycosyl donor includes (but is not limited to): uridine diphosphate (UDP)-rhamnose, guanosine diphosphate (GDP)-rhamnose, adenosine diphosphate (ADP)-rhamnose, cytidine diphosphate (CDP)-rhamnose, thymidine diphosphate (TDP)-rhamnose.

In one or more embodiments, the method further comprises: an additive for regulating the activity of the enzyme is provided to the reaction system.

In one or more embodiments, the additive for modulating enzymatic activity is: additives for increasing or inhibiting the enzymatic activity.

In one or more embodiments, the additive for modulating enzymatic activity is selected from the group consisting of: Ca²⁺, Co²⁺, Mn²⁺, Ba²⁺, Al³⁺, Ni²⁺, Zn²⁺ or Fe²⁺.

In one or more embodiments, the additive for modulating enzymatic activity is: a substance capable of generating Ca²⁺, Co²⁺, Mn²⁺, Ba²⁺, Al³⁺, Ni²⁺, Zn²⁺ or Fe²⁺.

In one or more embodiments, the pH of the reaction system is 4.0-10.0, preferably 5.5-9.0.

In one or more embodiments, the temperature of the reaction system is 10 ℃ to 105 ℃, preferably 20 ℃ to 50 ℃.
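As a minimal sketch of how the stated pH and temperature windows might be checked when planning a reaction, the following Python function classifies a condition pair against the broad and preferred ranges given above. The function name classify_conditions and its return labels are hypothetical; only the numeric bounds come from the text.

```python
# Broad and preferred windows quoted in the text (inclusive bounds assumed).
PH_RANGE = (4.0, 10.0)
PH_PREFERRED = (5.5, 9.0)
TEMP_RANGE_C = (10.0, 105.0)
TEMP_PREFERRED_C = (20.0, 50.0)


def _within(value, bounds):
    low, high = bounds
    return low <= value <= high


def classify_conditions(ph, temp_c):
    """Return 'preferred', 'allowed' or 'outside' for a pH / temperature pair."""
    if not (_within(ph, PH_RANGE) and _within(temp_c, TEMP_RANGE_C)):
        return "outside"
    if _within(ph, PH_PREFERRED) and _within(temp_c, TEMP_PREFERRED_C):
        return "preferred"
    return "allowed"


print(classify_conditions(7.5, 35.0))
```

Both parameters must fall in the preferred window for the pair to be labeled preferred; a condition that is preferred for pH but only allowed for temperature is reported as allowed.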

In another aspect of the invention, there is provided a specific glycosyltransferase which is a polypeptide having the amino acid sequence shown in SEQ ID NO: 2, SEQ ID NO: 4 or SEQ ID NO: 14, or a conservative variant polypeptide thereof; preferably, the conservative variant polypeptide comprises:

(1) a polypeptide derived from the polypeptide having the sequence shown in SEQ ID NO: 2, SEQ ID NO: 4 or SEQ ID NO: 14 by substitution, deletion or addition of one or more (e.g., 1-20, preferably 1-10, more preferably 1-5, more preferably 1-3) amino acid residues, and having the function of linking a rhamnosyl group to the first glycosyl group at the C-6 position of a tetracyclic triterpene compound;

(2) a polypeptide whose amino acid sequence is identical to the sequence shown in SEQ ID NO: 2, SEQ ID NO: 4 or SEQ ID NO: 14, and which has the function of linking a rhamnosyl group to the first glycosyl group at the C-6 position of a tetracyclic triterpene compound; or

(3) a polypeptide in which a signal peptide sequence is added to the N-terminus of the polypeptide having the sequence shown in SEQ ID NO: 2, SEQ ID NO: 4 or SEQ ID NO: 14.

In another aspect of the invention, an isolated polynucleotide encoding the specific glycosyltransferase is provided.

In one or more embodiments, the polynucleotide encoding the specific glycosyltransferase comprises a polynucleotide selected from the group consisting of: (A) the nucleotide sequence shown in SEQ ID NO: 1, 3 or 13; (B) a nucleotide sequence having at least 95% identity to the sequence shown in SEQ ID NO: 1, 3 or 13; (E) a nucleotide sequence formed by truncating or adding 1 to 60 (preferably 1 to 30, more preferably 1 to 10) nucleotides at the 5' and/or 3' end of the sequence shown in SEQ ID NO: 1, 3 or 13; (F) a complement of the nucleotide sequence of any one of (A)-(E); (G) a fragment, 20-50 bases in length, of the sequence described in any one of (A)-(F).

In one or more embodiments, the polynucleotide sequence is selected from the group consisting of SEQ ID NO: 1, 3 or 13, or a complement thereof.
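The complement and fragment categories listed above can be illustrated with a short Python sketch. It assumes that "complement" means the reverse complement of a DNA strand and uses a made-up toy sequence rather than SEQ ID NO: 1, 3 or 13; the names reverse_complement, is_allowed_fragment and TOY_REFERENCE are hypothetical.

```python
# Translation table for the four DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_allowed_fragment(fragment, reference, min_len=20, max_len=50):
    """True if the fragment is 20-50 bases long and occurs in the reference
    sequence or in its reverse complement."""
    if not min_len <= len(fragment) <= max_len:
        return False
    return fragment in reference or fragment in reverse_complement(reference)


# Toy sequence invented for illustration only; it is not a sequence of the invention.
TOY_REFERENCE = "ATGGCTAGCAAGGATCCGTTCGAACTGACCGGTAAGCTTTAA"

print(reverse_complement("ATGC"))
print(is_allowed_fragment(TOY_REFERENCE[5:30], TOY_REFERENCE))
```

The length bounds are parameters so that the same check can be reused if a different fragment window is of interest.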

In another aspect of the invention, a nucleic acid construct (construct) is provided comprising said polynucleotide, or expressing said specific glycosyltransferase; preferably, the nucleic acid construct is an expression vector or a homologous recombination vector.

In another aspect of the invention, a recombinant host cell is provided which expresses said specific glycosyltransferase, or comprises said polynucleotide, or comprises said nucleic acid construct; preferably, the recombinant host cell further comprises a tetracyclic triterpene compound reaction precursor or a construct expressing/forming the same; preferably, a glycosyl donor bearing a rhamnose group is also present in the recombinant host cell or is introduced into it (including constructs/precursors capable of forming the glycosyl donor).

In one or more embodiments, the tetracyclic triterpene compound reaction precursors comprise: ginsenoside Rg1, ginsenoside Rh1, and notoginsenoside R3; the corresponding products include: ginsenoside Re, ginsenoside Rg2, yesanchinoside E.

In one or more embodiments, the glycosyl donor includes (but is not limited to): uridine diphosphate (UDP)-rhamnose, guanosine diphosphate (GDP)-rhamnose, adenosine diphosphate (ADP)-rhamnose, cytidine diphosphate (CDP)-rhamnose, thymidine diphosphate (TDP)-rhamnose.

In one or more embodiments, the host cell is a prokaryotic cell or a eukaryotic cell.

In one or more embodiments, the host cell is a eukaryotic cell, such as a yeast cell or a plant cell. In one or more embodiments, the host cell is a Saccharomyces cerevisiae cell. In one or more embodiments, the host cell is a ginseng cell or a notoginseng cell.

In one or more embodiments, the host cell is a prokaryotic cell, such as E. coli.

In one or more embodiments, the host cell is not a cell that naturally produces the product formed upon treatment with the specific glycosyltransferases of the invention; for example, it is not a cell that naturally produces a compound of formula (II), (IV).

In one or more embodiments, the host cell is not a cell that naturally produces one or more of the following: ginsenoside Rh1, ginsenoside Rg1, notoginsenoside R3, ginsenoside Rg2, ginsenoside Re, and Yesanchinoside E.

In one or more embodiments, the host cell further has a feature selected from the group consisting of:

(a) expressing a key enzyme in the anabolic pathway of dammarenediol and/or protopanaxadiol-type saponins and/or protopanaxatriol-type saponins, or a mutant having 50% sequence identity to the enzyme;

(b) expressing a polypeptide comprising a functional fragment of the enzyme of (a), or a mutant having 50% sequence identity to the fragment;

(c) comprising a polynucleotide encoding the enzyme of (a) or the polypeptide of (b), or a complement thereof; and/or

(d) comprising a nucleic acid construct comprising the coding sequence of (c).

In one or more embodiments, the protopanaxatriol-type saponins comprise ginsenoside Rh1, ginsenoside Rg1, notoginsenoside R3, ginsenoside Rg2, ginsenoside Re, yesanchinoside E.

In one or more embodiments, key genes in the ginsenoside Rh1 anabolic pathway include (but are not limited to): the dammarenediol synthase gene, the cytochrome P450 CYP716A47 gene, the P450 CYP716A47 reductase gene, and the tetracyclic triterpene C6 glycosyltransferase UGTPg100 (GenBank accession number AKQ76388.1), or combinations thereof.

In one or more embodiments, key genes in the ginsenoside Rg1 anabolic pathway include (but are not limited to): the dammarenediol synthase gene, the cytochrome P450 CYP716A47 gene, the P450 CYP716A47 reductase gene, and the tetracyclic triterpene C20 and C6 glycosyltransferases UGTPg1 and UGTPg100 (GenBank accession number AKQ76388.1), or combinations thereof.

In one or more embodiments, key genes in the ginsenoside Rg2 anabolic pathway include (but are not limited to): the dammarenediol synthase gene, the cytochrome P450 CYP716A47 gene, the P450 CYP716A47 reductase gene, the tetracyclic triterpene C6 glycosyltransferase UGTPg100 (GenBank accession number AKQ76388.1), and the glycosyltransferases URT94-1 and URT94-2 of the invention, which catalyze glycosyl extension at the C6 position, or combinations thereof.

In one or more embodiments, key genes in the ginsenoside Re anabolic pathway include (but are not limited to): the dammarenediol synthase gene, the cytochrome P450 CYP716A47 gene, the P450 CYP716A47 reductase gene, the tetracyclic triterpene C20 and C6 glycosyltransferases UGTPg1 and UGTPg100 (GenBank accession number AKQ76388.1), and the glycosyltransferases URT94-1 and URT94-2 of the invention, which catalyze glycosyl extension at the C6 position, or combinations thereof.
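The four gene lists above differ only in their glycosyltransferases, which a set-based sketch makes explicit. The Python below is illustrative: the names COMMON_GENES, PATHWAY_GENES and extra_genes are hypothetical, the label "URT94-1/URT94-2" stands for either enzyme of the invention, and the lists are the non-exhaustive examples given in the text.

```python
# Genes shared by all four example pathways, as listed in the text.
COMMON_GENES = frozenset({
    "dammarenediol synthase",
    "cytochrome P450 CYP716A47",
    "P450 CYP716A47 reductase",
})

# Pathway-specific glycosyltransferases added on top of the shared genes.
PATHWAY_GENES = {
    "ginsenoside Rh1": COMMON_GENES | {"UGTPg100"},
    "ginsenoside Rg1": COMMON_GENES | {"UGTPg1", "UGTPg100"},
    "ginsenoside Rg2": COMMON_GENES | {"UGTPg100", "URT94-1/URT94-2"},
    "ginsenoside Re": COMMON_GENES | {"UGTPg1", "UGTPg100", "URT94-1/URT94-2"},
}


def extra_genes(product, precursor):
    """Genes listed for the product pathway but not for the precursor pathway."""
    return set(PATHWAY_GENES[product] - PATHWAY_GENES[precursor])


print(extra_genes("ginsenoside Rg2", "ginsenoside Rh1"))
print(extra_genes("ginsenoside Re", "ginsenoside Rg1"))
```

Under these lists, moving from an Rh1-producing strain to an Rg2-producing one, or from Rg1 to Re, adds only the rhamnosyltransferase step.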

In a further aspect of the invention there is also provided the use of a host cell according to the invention for the preparation of a glycosyltransferase, a catalytic agent, or a compound of formula (II), (IV).

In another aspect of the invention there is also provided a method of producing a glycosyltransferase or a compound of formula (II) or (IV), comprising incubating a host cell according to the invention.

In a further aspect of the invention there is also provided the use of a host cell according to the invention for the preparation of an enzyme catalytic agent, or for the production of a glycosyltransferase, or as a catalytic cell, or for the production of a compound of formula (II), (IV).

In another aspect of the invention, there is provided a method of producing a transgenic plant comprising the steps of: regenerating the host cell of the invention into a plant, wherein the host cell is a plant cell. In one or more embodiments, the host cell is a ginseng cell. In one or more embodiments, the host cell is a notoginseng cell.

In another aspect of the invention, there is provided a kit for glycosyl transfer comprising: a specific glycosyltransferase capable of linking a rhamnosyl group to the first glycosyl group at the C-6 position of a tetracyclic triterpene compound, the specific glycosyltransferase being a polypeptide having the amino acid sequence shown in SEQ ID NO: 2, SEQ ID NO: 4 or SEQ ID NO: 14, or a conservative variant polypeptide thereof.

In another aspect of the invention, there is provided a kit for glycosyl transfer comprising: the isolated polynucleotide.

In another aspect of the invention, there is provided a kit for glycosyl transfer comprising: the nucleic acid construct (construct).

In another aspect of the invention, there is provided a kit for glycosyl transfer comprising: the recombinant host cell.

In one or more embodiments, the kit further comprises: a glycosyl donor carrying a rhamnose group; more preferably, the glycosyl donor includes (but is not limited to): uridine diphosphate (UDP)-rhamnose, guanosine diphosphate (GDP)-rhamnose, adenosine diphosphate (ADP)-rhamnose, cytidine diphosphate (CDP)-rhamnose, thymidine diphosphate (TDP)-rhamnose.

In one or more embodiments, the kit further comprises: a tetracyclic triterpene compound reaction precursor.

It is understood that within the scope of the present invention, the above-described technical features of the present invention and the technical features specifically described below (e.g., in the examples) may be combined with each other to constitute new or preferred technical solutions. For reasons of space, they are not described one by one here.

Drawings

FIG. 1 shows the DNA agarose gel electrophoresis of the products obtained by amplifying the 2 glycosyltransferase target bands from a single ginseng plant.

FIG. 2 shows Western blot detection of the expression of glycosyltransferases URT94-1 and URT94-2 in E. coli. "1" represents the lysate supernatant of the empty vector pET28a E. coli recombinant; "Marker" represents the protein molecular weight standard; "2" represents the lysate supernatant of the glycosyltransferase BL21-URT94-1 E. coli recombinant; "3" represents the lysate supernatant of the glycosyltransferase BL21-URT94-2 E. coli recombinant; "4" represents the lysate supernatant of the glycosyltransferase BL21-gGT29-7 E. coli recombinant; "5" represents the lysate supernatant of the glycosyltransferase BL21-gGT29-7 (N343G, A359P) E. coli recombinant.

FIG. 3, panel a, shows TLC patterns of the transglycosylation reactions catalyzed by glycosyltransferases URT94-1 and URT94-2 with protopanaxatriol-type ginsenoside Rh1 as the glycosyl acceptor and UDP-Rha as the glycosyl donor. "1" represents the lysate supernatant of the pET28a empty vector recombinant used as the enzyme solution; "2", "3", "4", "5" represent the lysate supernatants of BL21-URT94-1, BL21-URT94-2, BL21-gGT29-7 (N343G, A359P) and BL21-gGT29-7, respectively, used as enzyme solutions. The arrow indicates the migration position of the saponin standard. Panel b shows HPLC patterns of the same transglycosylation reactions catalyzed by glycosyltransferases URT94-1 and URT94-2 with ginsenoside Rh1 as the glycosyl acceptor and UDP-Rha as the glycosyl donor.

FIG. 4, panel a, shows TLC patterns of the transglycosylation reactions catalyzed by glycosyltransferases URT94-1 and URT94-2 with protopanaxatriol-type ginsenoside Rg1 as the glycosyl acceptor and UDP-Rha as the glycosyl donor. "1" represents the lysate supernatant of the pET28a empty vector recombinant used as the enzyme solution; "2", "3", "4", "5" represent the lysate supernatants of BL21-gGT29-7, BL21-gGT29-7 (N343G, A359P), BL21-URT94-1 and BL21-URT94-2, respectively, used as enzyme solutions. The arrow indicates the migration position of the saponin standard. Panel b shows HPLC patterns of the same transglycosylation reactions catalyzed by glycosyltransferases URT94-1 and URT94-2 with ginsenoside Rg1 as the glycosyl acceptor and UDP-Rha as the glycosyl donor.

FIG. 5 shows TLC patterns of the transglycosylation reactions catalyzed by glycosyltransferases URT94-1 and URT94-2 with protopanaxatriol-type ginsenoside Rh1 as the glycosyl acceptor and UDP-Glc as the glycosyl donor. "1" represents the lysate supernatant of the pET28a empty vector recombinant used as the enzyme solution; "2", "3", "4", "5" represent the lysate supernatants of BL21-gGT29-7, BL21-gGT29-7 (N343G, A359P), BL21-URT94-1 and BL21-URT94-2, respectively, used as enzyme solutions. The arrow indicates the migration position of the saponin standard.

FIG. 6 shows TLC patterns of the transglycosylation reactions catalyzed by glycosyltransferases URT94-1 and URT94-2 with protopanaxatriol-type ginsenoside Rg1 as the glycosyl acceptor and UDP-Glc as the glycosyl donor. "1" represents the lysate supernatant of the pET28a empty vector recombinant used as the enzyme solution; "2", "3", "4", "5" represent the lysate supernatants of BL21-gGT29-7, BL21-gGT29-7 (N343G, A359P), BL21-URT94-1 and BL21-URT94-2, respectively, used as enzyme solutions. The arrow indicates the migration position of the saponin standard.

FIG. 7 shows a comparison of the catalytic activity of the glycosyltransferase mutant URT94-1m with that of the wild type.

FIG. 8 shows Western blot detection of the expression of the glycosyltransferase mutant URT94-1m and the wild type.

Detailed Description

Through intensive research and screening, the inventors provide for the first time a specific glycosyltransferase that catalyzes rhamnosylation of a substrate at a specific position with high catalytic activity. In particular, the specific glycosyltransferases of the invention specifically and efficiently catalyze hydroxyl glycosylation of the first glycosyl group at the C-6 position of a tetracyclic triterpene compound substrate, extending it with a rhamnosyl group.

Definitions

As used herein, an "isolated polypeptide" or "active polypeptide" means that the polypeptide is substantially free of other proteins, lipids, carbohydrates, or other substances with which it is naturally associated. A person skilled in the art can purify the polypeptide using standard protein purification techniques. A substantially pure polypeptide produces a single main band on a non-reducing polyacrylamide gel. The purity of the polypeptide can be further analyzed by amino acid sequence analysis.

As used herein, the terms "active polypeptide", "polypeptide of the invention and its derivatives", "enzyme of the invention", "glycosyltransferase" are used interchangeably and include URT94-1 (SEQ ID NO: 2), URT94-2 (SEQ ID NO: 4) polypeptide or its derivatives; also, they may refer to mutants of glycosyltransferases, including URT94-1m (SEQ ID NO: 14).

As used herein, the term "conservatively modified polypeptide" refers to a polypeptide that retains essentially the same biological function or activity as the polypeptide. The "conservative variant polypeptide" may be (i) a polypeptide having one or more conservative or non-conservative amino acid residues (preferably conservative amino acid residues) substituted, which may or may not be encoded by the genetic code, or (ii) a polypeptide having a substituent in one or more amino acid residues, or (iii) a polypeptide formed by fusion of a mature polypeptide with another compound (such as a compound that increases the half-life of the polypeptide, e.g., polyethylene glycol), or (iv) a polypeptide formed by fusion of an additional amino acid sequence to the polypeptide sequence (such as a leader or secretory sequence or a sequence used to purify the polypeptide or a proteolytic sequence, or a fusion protein with the formation of an antigen IgG fragment). Such fragments, derivatives and analogs are within the purview of one skilled in the art and would be well known in light of the teachings herein.

As used herein, the term "variant" or "mutant" refers to a peptide or polypeptide that has an amino acid sequence that is altered by the insertion, deletion or substitution of one or more amino acids as compared to a reference sequence, but retains at least one biological activity. The mutants described in any of the embodiments herein comprise an amino acid sequence having at least 50%, 60% or 70%, preferably at least 80%, preferably at least 85%, preferably at least 90%, preferably at least 95%, preferably at least 97% sequence identity with a reference sequence (SEQ ID NO:2, 4 or 14 as described herein) and retaining the biological activity of the reference sequence (e.g.as a glycosyltransferase). Sequence identity between two aligned sequences can be calculated using BLASTp, e.g., NCBI. Mutants also include amino acid sequences that have one or more mutations (insertions, deletions, or substitutions) in the amino acid sequence of the reference sequence, while still retaining the biological activity of the reference sequence. The plurality of mutations generally refers to within 1-20, such as 1-15, 1-10, 1-8, 1-5, or 1-3. The substitution is preferably a conservative substitution. For example, conservative substitutions with amino acids that are similar or analogous in nature typically do not alter the function of the protein or polypeptide. "similar or analogous amino acids" include, for example, families of amino acid residues with similar side chains, including amino acids with basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). 
Thus, substitution of one or several sites with another amino acid residue from the same side chain class in a polypeptide of the invention will not substantially affect its activity.
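As an illustration only, the percent-identity criterion and the side-chain families listed above can be sketched in Python. This is a simplified sketch and not BLASTp itself: it assumes the two sequences have already been aligned (gaps written as "-"), and the function names and toy sequences are hypothetical.

```python
# Side-chain families as listed in the definition above (one-letter codes).
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "uncharged_polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta_branched": set("TVI"),
    "aromatic": set("YFWH"),
}


def is_conservative(a: str, b: str) -> bool:
    """Return True if residues a and b share at least one side-chain family."""
    return any(a in fam and b in fam for fam in SIDE_CHAIN_FAMILIES.values())


def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Percent of identical residues over the aligned columns.

    Assumption: both strings come from an existing pairwise alignment and
    have equal length; '-' marks a gap. Columns that are gaps in both
    sequences are ignored.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (r, q)
        for r, q in zip(aligned_ref, aligned_query)
        if not (r == "-" and q == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for r, q in columns if r == q and r != "-")
    return 100.0 * matches / len(columns)


# Toy sequences (hypothetical, not taken from the sequence listing).
identity = percent_identity("MKTAYIAK", "MKTAYLAK")
```

With these toy inputs one of eight aligned positions differs, and the I-to-L change falls within the nonpolar family, so it would count as a conservative substitution under the families above.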

URT94-1m (SEQ ID NO: 14) is a mutant of URT94-1. The present invention may also include conservative variant polypeptides of URT94-1m, provided that the residue corresponding to amino acid position 55 of SEQ ID NO: 14 is conserved in the variant.

Active polypeptide, coding gene, vector and host thereof

By mining genome and transcriptome information, combined with extensive research and experimental work, the inventors disclose a novel specific glycosyltransferase that can specifically and efficiently transfer a glycosyl group onto the first glycosyl group at C-6 of a tetracyclic triterpene compound substrate to extend the sugar chain. Its reaction products have good application value in pharmacy and other fields.

The sequence of the specific glycosyltransferase is preferably as shown in SEQ ID NO: 2, 4 or 14. The polypeptide also includes a polypeptide having the same function as the polypeptide shown in SEQ ID NO: 2, 4 or 14. The invention also includes fragments, derivatives and analogs of the polypeptides. As used herein, the terms "fragment," "derivative," and "analog" refer to a polypeptide that retains substantially the same biological function or activity as the polypeptide.

In the present invention, the term "conservative variant polypeptide" refers to a polypeptide which retains substantially the same biological function or activity as the reference polypeptide. The "conservative variant polypeptide" may be (i) a polypeptide in which one or more conservative or non-conservative amino acid residues (preferably conservative amino acid residues) are substituted, where the substituted residues may or may not be encoded by the genetic code, or (ii) a polypeptide having a substituent group in one or more amino acid residues, or (iii) a polypeptide formed by fusion of the mature polypeptide with another compound (such as a compound that increases the half-life of the polypeptide, e.g., polyethylene glycol), or (iv) a polypeptide formed by fusion of an additional amino acid sequence to the polypeptide sequence (such as a leader or secretory sequence, a sequence used to purify the polypeptide, a proprotein sequence, or a fusion protein formed with an antigen IgG fragment). Such fragments, derivatives and analogs are within the purview of one skilled in the art in light of the teachings herein.

The "conservative variant polypeptide" may include (but is not limited to): deletions, insertions and/or substitutions of one or more (usually 1-50, preferably 1-30, more preferably 1-20, most preferably 1-10) amino acids, and additions or deletions of one or more (e.g., within 50, more preferably within 20 or 10, more preferably within 5) amino acids at the C-terminus and/or N-terminus. For example, in the art, substitution with amino acids of similar or analogous properties does not generally alter the function of the protein. As another example, the addition of one or more amino acids at the C-terminus and/or N-terminus typically does not alter the function of the protein. The invention also provides analogs of the polypeptides. These analogs may differ from the native polypeptide in amino acid sequence, in modifications that do not affect the sequence, or in both. These polypeptides include natural or induced genetic variants. Induced variants can be obtained by various techniques, such as random mutagenesis by irradiation or exposure to mutagens, site-directed mutagenesis, or other known techniques of molecular biology. Analogs also include analogs having residues other than natural L-amino acids (e.g., D-amino acids), as well as analogs having non-naturally occurring or synthetic amino acids (e.g., β- or γ-amino acids). It is to be understood that the polypeptides of the present invention are not limited to the representative polypeptides exemplified above.

The amino or carboxy terminus of URT94-1 (SEQ ID NO: 2), URT94-2 (SEQ ID NO: 4) or URT94-1m (SEQ ID NO: 14) of the present invention, or of a conservative variant polypeptide thereof, may further comprise one or more polypeptide fragments as protein tags. Any suitable tag may be used in the present invention. For example, the tag may be FLAG, HA, c-Myc, poly-His, poly-Arg, Strep-TagII, AU1, EE, T7, 4A6, ε, B, gE or Ty1. These tags can be used to purify proteins.

For the purpose of producing a specific glycosyltransferase of the invention or another enzyme (e.g., an enzyme in a host cell that forms a substrate for a specific glycosyltransferase of the invention, or an enzyme involved in any step of the product synthesis pathway of the invention), a signal peptide sequence may also be added to the amino terminus of a polypeptide of the invention so that the translated protein is secreted (e.g., to the outside of the cell). The signal peptide may be cleaved off during secretion of the polypeptide from the cell.

The active polypeptide of the present invention may be a recombinant polypeptide, a natural polypeptide, or a synthetic polypeptide. The polypeptides of the invention may be naturally purified products, chemically synthesized products, or products produced from prokaryotic or eukaryotic hosts (e.g., bacteria, yeast, higher plants) using recombinant techniques. Depending on the host used in the recombinant production protocol, the polypeptides of the invention may be glycosylated or non-glycosylated. The polypeptides of the invention may or may not include an initial methionine residue.

Polynucleotides encoding specific glycosyltransferases and other enzymes of the invention may be in the form of DNA or RNA. DNA forms include cDNA, genomic DNA, or synthetic DNA. The DNA may be single-stranded or double-stranded. The DNA may be a coding strand or a non-coding strand. The term "polynucleotide encoding a polypeptide" may include polynucleotides encoding the polypeptide, or may include additional coding and/or non-coding sequences.

The invention also relates to vectors comprising the polynucleotides of the invention, as well as host cells genetically engineered with the vectors or polypeptide coding sequences of the invention, and methods for producing the polypeptides of the invention by recombinant techniques.

The present invention relates to nucleic acid constructs comprising a polynucleotide as described herein, and one or more regulatory sequences operably linked to these sequences or sequences required for genomic homologous recombination. The polynucleotides of the invention may be manipulated in a variety of ways to ensure expression of the polypeptides or proteins. The nucleic acid construct may be manipulated according to the expression vector or requirements prior to insertion into the vector. Techniques for altering polynucleotide sequences using recombinant DNA methods are known in the art.

In certain embodiments, the nucleic acid construct is a vector. The vector may be a cloning vector, an expression vector, or a gene knock-in vector. Polynucleotides of the invention may be cloned into many types of vectors, e.g., plasmids, phagemids, phage derivatives, animal viruses and cosmids. Cloning vectors may be used to provide the coding sequence for a protein or polypeptide of the invention. The expression vector may be provided to the cell as a bacterial vector or a viral vector. Expression of the polynucleotides of the invention is typically achieved by operably linking the polynucleotides of the invention to a promoter and incorporating the construct into an expression vector. The vector may be suitable for replication and integration in eukaryotic cells. Typical expression vectors contain expression control sequences that can be used to regulate the expression of a desired nucleic acid sequence.

The knock-in vector is used to integrate the polynucleotide sequences described herein into a region of interest of the genome. Typically, the knock-in vector contains, in addition to the polynucleotide sequence, a 5' homology arm and a 3' homology arm required for homologous recombination with the genome. In some embodiments, the nucleic acid constructs herein contain a 5' homology arm, a polynucleotide sequence described herein, and a 3' homology arm. When a knock-in vector is used, the CRISPR/Cas9 technique can be used at the same time to homologously recombine the polynucleotide sequence into the location of interest. In the CRISPR/Cas9 technique, a guide RNA designed for the target gene guides the Cas9 nuclease to modify the genome at the insertion position, which increases the homologous recombination efficiency in the modified region and allows the target fragment contained in the knock-in vector to be homologously recombined into the target site. Procedures for the CRISPR/Cas9 technique, as well as the reagents used, such as Cas9 nuclease, are well known in the art.

Methods well known to those skilled in the art can be used to construct the nucleic acid construct. These methods include in vitro recombinant DNA techniques, DNA synthesis techniques, in vivo recombinant techniques, and the like. The DNA sequence may be operably linked to an appropriate promoter in an expression vector to direct mRNA synthesis. Representative examples of these promoters are: the lac or trp promoter of E.coli; a lambda phage PL promoter; eukaryotic promoters include the CMV immediate early promoter, the HSV thymidine kinase promoter, the early and late SV40 promoters, LTRs from retroviruses, and other known promoters that control the expression of genes in prokaryotic or eukaryotic cells or viruses thereof. The expression vector also includes a ribosome binding site for translation initiation and a transcription terminator. In addition, the expression vector preferably comprises one or more selectable marker genes to provide phenotypic traits for selection of transformed host cells, such as dihydrofolate reductase, neomycin resistance and Green Fluorescent Protein (GFP) for eukaryotic cell culture, or tetracycline or ampicillin resistance for E.coli.

When the polynucleotide of the present invention is expressed in higher eukaryotic cells, transcription will be enhanced if an enhancer sequence is inserted into the vector. Enhancers are cis-acting elements of DNA, usually about 10 to 300 base pairs, that act on a promoter to increase the transcription of a gene. Examples include the SV40 enhancer 100 to 270 base pairs on the late side of the origin of replication, the polyoma enhancer on the late side of the origin of replication, and adenovirus enhancers.

The invention also provides host cells for biosynthesis of a product of interest. The host cell may be, for example but not limited to, E. coli, yeast or Streptomyces; more preferably it is an E. coli cell. The host cell is a production tool, and a person skilled in the art can modify various host cells by known technical means so that biosynthesis according to the invention is also achieved; such host cells and production methods are also intended to be encompassed by the invention.

The polynucleotide sequences of the present invention may be used to express or produce the polypeptides described herein by conventional recombinant DNA techniques. Generally, the steps are as follows: (1) transforming or transducing a suitable host cell with a polynucleotide (or variant) encoding the specific glycosyltransferase of the invention, or with an expression vector comprising the polynucleotide; (2) culturing the host cell in a suitable medium; (3) isolating and purifying the protein from the culture medium or the cells.

Vectors comprising the appropriate DNA sequences as described above, together with appropriate promoter or control sequences, may be used to transform appropriate host cells to enable expression of the protein. The host cell may be a prokaryotic cell, such as a bacterial cell; a lower eukaryotic cell, such as a yeast cell; or a higher eukaryotic cell, such as a mammalian cell. Representative examples are: bacterial cells such as E. coli, Streptomyces and Salmonella typhimurium; fungal cells such as yeast; plant cells; insect cells such as Drosophila S2 or Sf9; and CHO, COS, 293 cells or Bowes melanoma cells. It will be clear to a person of ordinary skill in the art how to select appropriate vectors, promoters, enhancers and host cells.

Transformation of host cells with recombinant DNA can be performed using conventional techniques well known to those skilled in the art. The recombinant polypeptide in the above method may be expressed in the cell, on the cell membrane, or secreted outside the cell. If desired, the recombinant proteins can be isolated and purified by various separation methods using their physical, chemical and other properties. Such methods are well known to those skilled in the art. Examples of such methods include, but are not limited to: conventional renaturation treatment, treatment with a protein precipitant (salting-out), centrifugation, osmotic lysis, sonication, ultracentrifugation, molecular sieve chromatography (gel filtration), adsorption chromatography, ion exchange chromatography, high performance liquid chromatography (HPLC), various other liquid chromatography techniques, and combinations of these methods.

Application

The present inventors have long worked on glycosyltransferases; however, previous work had not yielded an enzyme that can efficiently use a rhamnosyl donor to specifically link a rhamnosyl group to the first glycosyl group at the C-6 position of a tetracyclic triterpene(-type) compound. Among existing enzymes, some are essentially unable to use rhamnosyl donors (e.g., UDP-Rha), and others have very low activity and cannot fully meet application requirements.

Against this background, the present inventors screened ginseng and obtained specific glycosyltransferases (URT94s) capable of extending rhamnose at the C6 position. They efficiently catalyze the extension of one molecule of rhamnose on the first glycosyl group at the C-6 position of protopanaxatriol saponins (ginsenoside Rh1, ginsenoside Rg1 and notoginsenoside R3), thereby yielding ginsenoside Rg2, ginsenoside Re or Yesanchinoside E, respectively. The glycosyltransferases are highly specific glycosyltransferases for the efficient preparation of ginsenoside Rg2, ginsenoside Re or Yesanchinoside E. Preferably, the protopanaxatriol saponin comprises ginsenoside Rh1 and ginsenoside Rg1.

As a specific embodiment of the invention, the active polypeptides of the invention have glycosyltransferase activity and are capable of catalyzing one or more of the following reactions:

wherein R1 and R2 are H or glycosyl, and R3 and R4 are monosaccharide glycosyl.

In one or more embodiments, the compounds with substituents R1-R4 are as follows:

substrate(s)	R1	R2	R3	R4	Product(s)
Rg1	H	Glc	Glc	Rha	Ginsenoside Re
Rh1	H	H	Glc	Rha	Ginsenoside Rg2

That is, when R1 is H and R2 and R3 are glucosyl, the compound of formula (I) is ginsenoside Rg1, and when R4 is rhamnosyl, the compound of formula (II) is ginsenoside Re; or when R1 and R2 are H and R3 is glucosyl, the compound of formula (I) is ginsenoside Rh1, and when R4 is rhamnosyl, the compound of formula (II) is ginsenoside Rg2.
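For readers who prefer a machine-readable form, the substituent table above can be written as a small lookup, shown here as a minimal Python sketch. The class and function names are hypothetical; the substituent and product values are copied from the table.

```python
from typing import NamedTuple, Optional


class Rhamnosylation(NamedTuple):
    """One row of the R1-R4 substituent table (formula (I) -> formula (II))."""
    substrate: str
    r1: str
    r2: str
    r3: str
    r4: str
    product: str


# Rows copied from the table: R4 is the rhamnosyl group added to the
# first glycosyl group (R3) at C-6.
REACTIONS = {
    "Rg1": Rhamnosylation("Rg1", "H", "Glc", "Glc", "Rha", "Ginsenoside Re"),
    "Rh1": Rhamnosylation("Rh1", "H", "H", "Glc", "Rha", "Ginsenoside Rg2"),
}


def product_of(substrate: str) -> Optional[str]:
    """Return the product named in the table, or None if the substrate is not listed."""
    reaction = REACTIONS.get(substrate)
    return reaction.product if reaction else None
```

A call such as `product_of("Rh1")` simply returns the product named in the corresponding table row; substrates outside the table return `None`.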

As a further embodiment of the present invention,

wherein R1 is H or glycosyl, and R2, R3, R4 and R5 are monosaccharide glycosyl; the polypeptide is selected from SEQ ID NO: 2, 4 or 14 or a polypeptide derived therefrom.

In one or more embodiments, the compounds with substituents R1-R5 are as follows:

That is, when R1 is H, R2, R3 and R4 are glucosyl groups, the compound of formula (III) is notoginsenoside R3, and when R5 is rhamnosyl, the compound of formula (IV) is Yesanchinoside E.

The invention also provides a method of constructing a transgenic plant comprising regenerating a host cell containing a polypeptide or polynucleotide described herein into a plant, said host cell being a plant cell. Methods and reagents for regenerating plant cells are well known in the art.

The glycosyltransferase of the present invention is, in particular, capable of converting ginsenoside Rh1 into ginsenoside Rg2, which has other activities, and of converting ginsenoside Rg1 into ginsenoside Re, which has other activities.

The active polypeptide or glycosyltransferase of the invention can be used for the artificial synthesis of known ginsenosides, new ginsenosides and derivatives thereof, and can convert Rh1 into the active ginsenoside Rg2 and Rg1 into the active ginsenoside Re.

The invention also provides a method of constructing a transgenic plant, comprising transforming a plant with a polynucleotide or nucleic acid construct as described herein and, by crossing and screening, obtaining from the progeny of the plant a transgenic positive plant that expresses a polypeptide as described herein, or comprises said polynucleotide or said nucleic acid construct. Methods for transforming plants with nucleic acids, crossing plants, and screening for transgenic positive plants are well known in the art.

The invention also provides a kit for biosynthesis of a target product or an intermediate thereof, comprising: the polypeptide of SEQ ID NO: 2, 4 or 14 or a conservative variant polypeptide thereof; preferably also a glycosyl donor; and preferably also a host cell. More preferably, the kit further comprises instructions for performing the method of biosynthesis.

The main advantages of the invention are:

(1) The specific glycosyltransferase of the invention can specifically and efficiently transfer a glycosyl group onto the first glycosyl group at C-6 of a tetracyclic triterpene compound substrate to extend the sugar chain.

(2) The glycosyltransferase can efficiently convert Rh1 into the active ginsenoside Rg2 and Rg1 into the active ginsenoside Re. Rg2 has activity in preventing and treating neurodegenerative diseases; Re has activity in reducing blood sugar and treating diabetes. Therefore, the glycosyltransferase of the invention has wide application value.

(3) The catalytic efficiency is high. The activity of URT94-1 and URT94-2 in catalyzing extension of the sugar chain at the C6 position of Rh1 with UDP-rhamnose as the glycosyl donor is improved by at least 5 times compared with the glycosyltransferase disclosed in patent PCT/CN2015/081111.

The invention will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present invention and are not intended to limit its scope. Experimental procedures for which specific conditions are not noted in the examples below are generally carried out under conventional conditions, such as those described in J. Sambrook et al., Molecular Cloning: A Laboratory Manual, third edition, Science Press, or according to the manufacturer's recommendations.

Sequence information

SEQ ID NO: 1 (URT94-1 nucleic acid)

SEQ ID NO: 2 (URT94-1 protein)

SEQ ID NO: 3 (URT94-2 nucleic acid)

SEQ ID NO: 4 (URT94-2 protein)

SEQ ID NO: 5 (primer set 1-F)

SEQ ID NO: 6 (primer set 1-R)

SEQ ID NO: 7 (primer set 2-F)

SEQ ID NO: 8 (primer set 2-R)

SEQ ID NO: 9 (URT94-1_Pet28a-F)

SEQ ID NO: 10 (URT94-1_Pet28a-R)

SEQ ID NO: 11 (URT94-2_Pet28a-F)

SEQ ID NO: 12 (URT94-2_Pet28a-R)

SEQ ID NO: 13 (URT94-1m1 nucleic acid)

SEQ ID NO: 14 (URT94-1m1 protein)

Example 1 Cloning of the ginseng-derived glycosyltransferases URT94s

Through intensive research and screening, the inventors cloned two glycosyltransferases from a ginseng plant, named URT94-1 and URT94-2 (collectively URT94s).

Cloning of URT94s: ginseng RNA was extracted and reverse-transcribed to obtain ginseng cDNA. Two pairs of primers were designed (SEQ ID NO: 5 and SEQ ID NO: 6 to amplify URT94-1; SEQ ID NO: 7 and SEQ ID NO: 8 to amplify URT94-2), and PCR amplification was performed using the cDNA as a template. The DNA polymerase was PrimeSTAR, a high-fidelity DNA polymerase from Takara. The PCR products were detected by agarose gel electrophoresis (FIG. 1). The target DNA bands were excised under ultraviolet light, and the amplified DNA fragments were recovered from the agarose gel using the AxyPrep DNA Gel Extraction Kit (AXYGEN). The ends of each DNA fragment were treated with rTaq DNA polymerase (Takara), and the fragment was ligated into the commercial cloning vector pMD18T to obtain the recombinant plasmids URT94-1-pMD18T and URT94-2-pMD18T. The ligation products were transformed into competent E. coli Top10 cells, the transformed E. coli culture was plated on LB plates supplemented with 100 μg/mL ampicillin, and recombinant clones were further verified by PCR and restriction enzyme digestion. One clone of each was selected, and the recombinant plasmid was extracted and sequenced. URT94-1 and URT94-2 were found to be glycosyltransferase genes whose ORFs encode the PSPG box, the conserved domain of glycosyltransferase family 1.

The present inventors performed expression and transglycosylation reaction analyses on URT94-1 and URT94-2, respectively. The glycosyltransferases encoded by the 2 nucleic acid sequences (SEQ ID NO: 1 and 3, respectively) can catalyze extension of 1 rhamnosyl group at the C6 position of Rh1 to generate Rg2, with catalytic activity improved by at least 5 times compared with gGT29-7 (N343G, A359P), the mutant of gGT29-7 disclosed in the prior patent (PCT/CN2015/081111), and neither can catalyze extension of 1 glucosyl group at the C6 position of Rh1 to generate Rf.

Experimental results show that ginseng-derived URT94-1 and URT94-2 catalyze the extension of one rhamnosyl group at the C6 position of Rh1 to generate Rg2 with a conversion rate of more than 50%, and catalyze the extension of one rhamnosyl group at the C6 position of Rg1 to generate Re with a conversion rate of more than 50%, whereas neither can catalyze the extension of 1 glucosyl group at the C6 position of Rg1 to generate C20-O-Glc-Rf. The ginseng-derived URT94-1 and URT94-2 are therefore highly UDP-rhamnose-specific glycosyltransferases.

Example 2 Construction of recombinant expression plasmids for the ginseng glycosyltransferase URT94s genes

The pMD18T plasmids containing the URT94-1 and URT94-2 genes constructed in Example 1 were used as templates. Taking URT94-1 as an example, the forward primer consists of two parts, in 5' to 3' order: a 20 bp pET28a homology arm sequence followed by the start sequence of the URT94-1 coding region. The reverse primer likewise consists of two parts, in 5' to 3' order: a 20 bp pET28a homology arm sequence followed by a 20 bp sequence from the end of the URT94-1 coding region (SEQ ID NO: 9 and SEQ ID NO: 10, see Table 1). The gene encoding URT94-1 (including the pET28a homology arms) was amplified by PCR using these primers. The DNA polymerase was the high-fidelity DNA polymerase PrimeSTAR from Takara, and the PCR program was set with reference to the instruction manual: 94 °C for 2 min; 33 cycles of 94 °C for 15 s, 57 °C for 30 s and 68 °C for 1.5 min; 68 °C for 10 min; hold at 16 °C. The PCR product was detected by agarose gel electrophoresis, and the band matching the size of the target DNA was excised under ultraviolet light. The DNA fragment was then recovered from the agarose gel using the AxyPrep DNA Gel Extraction Kit (AXYGEN).
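The thermocycling program described above can be written down as data, which makes the programmed run time easy to check. The following is a minimal Python sketch; the class and function names are hypothetical, and the calculation counts only programmed hold times (ramp times and the final 16 °C hold are ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # block temperature in degrees Celsius
    seconds: int    # programmed hold time in seconds


# Values taken from the PCR program in the text.
INITIAL_DENATURATION = Step(94, 120)
CYCLE = (Step(94, 15), Step(57, 30), Step(68, 90))  # denature, anneal, extend
CYCLES = 33
FINAL_EXTENSION = Step(68, 600)


def programmed_seconds() -> int:
    """Sum of programmed hold times, excluding ramping and the 16 C hold."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return (
        INITIAL_DENATURATION.seconds
        + CYCLES * per_cycle
        + FINAL_EXTENSION.seconds
    )


total_minutes = programmed_seconds() / 60
```

Under these assumptions the programmed hold time is 120 + 33 × 135 + 600 = 5175 s, i.e. a little under 90 minutes before ramping is taken into account.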

Plasmid pET28a was digested with the FD restriction enzymes NcoI and SalI from Thermo at 37 °C for 50 min, and the linearized plasmid pET28a was then recovered from agarose gel using the AxyPrep DNA Gel Extraction Kit (AXYGEN). The digested plasmid was subjected to homologous recombination with each of the 2 UGT gene fragments obtained above (URT94-1 and URT94-2), using a recombinase from Yeasen Biotechnology (Shanghai) Co., Ltd. The ligation products were transformed into E. coli BL21 (DE3) competent cells, which were plated on LB plates supplemented with 50 μg/mL kanamycin (Kana). Positive transformants were verified by colony PCR and sequenced to confirm that the recombinant expression plasmids were constructed successfully. Positive transformants were designated E. coli BL21-URT94-1 and BL21-URT94-2.

Table 1 Primers for construction of gene expression plasmids

Example 3 Expression of the ginseng glycosyltransferases URT94s in E. coli

The two sequence-verified E. coli strains BL21-URT94-1 and BL21-URT94-2 were each inoculated into 50 mL of LB medium and cultured at 37 °C and 200 rpm until the OD600 reached about 0.6-0.8. The culture was cooled to 4 °C, IPTG was added to a final concentration of 200 μM, and expression was induced at 18 °C and 120 rpm for 16 h. The cells were collected by centrifugation at 4 °C and disrupted by sonication, and the supernatant of the cell lysate was collected by centrifugation at 12000 g for 10 min at 4 °C to obtain the crude enzyme solution. The 6×His tag sequence on pET28a gives the proteins URT94-1 and URT94-2 a C-terminal 6×His tag. Western blotting was therefore performed on the two crude enzyme solutions to detect protein expression. The anti-6×His tag Western blot (FIG. 2) shows a distinct band between 45 and 55 kD, indicating that glycosyltransferases URT94-1 and URT94-2 are both expressed in soluble form in E. coli.
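As a worked example of the induction step, the volume of IPTG stock needed to reach the stated final concentration follows from C1·V1 = C2·V2. The sketch below is illustrative only: the 1 M stock concentration is an assumption (the text does not state the stock used), the function name is hypothetical, and the small volume change caused by adding the stock is ignored.

```python
def stock_volume_ul(final_um: float, culture_ml: float, stock_mm: float) -> float:
    """Volume of stock (in microliters) giving `final_um` (micromolar)
    in `culture_ml` (milliliters) of culture.

    Unit check: uM * mL / mM = (1e-6 * 1e-3) / 1e-3 L = 1e-6 L = 1 uL.
    """
    if stock_mm <= 0:
        raise ValueError("stock concentration must be positive")
    return final_um * culture_ml / stock_mm


# 200 uM final IPTG in a 50 mL culture, assuming a 1 M (1000 mM) stock.
iptg_ul = stock_volume_ul(final_um=200, culture_ml=50, stock_mm=1000)
```

With the assumed 1 M stock this works out to 10 μL per 50 mL culture; a different stock concentration scales the volume inversely.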

Example 4 In vitro transglycosylation activity of the glycosyltransferases URT94s and product identification, using protopanaxatriol-type saponin Rh1 as substrate

The cell lysates of recombinant E. coli BL21-URT94-1 and BL21-URT94-2 from Example 3 were used as crude enzyme solutions for the transglycosylation reaction, and the cell lysate of recombinant E. coli transformed with the empty vector pET28a was used as a control. The ginseng glycosyltransferases gGT29-7 and gGT29-7 (N343G, A359P) from patent PCT/CN2015/081111 were selected as positive controls. In vitro transglycosylation assays were performed according to the reaction system presented in Table 2, with reaction overnight at 35 °C.

The reaction results were analyzed by thin layer chromatography (TLC) and high performance liquid chromatography (HPLC).

Table 2 Reaction system for enzyme activity assay

As shown in FIGS. 3a-b, with protopanaxatriol-type ginsenoside Rh1 as the glycosyl acceptor and UDP-Rha as the glycosyl donor, BL21-URT94-1 and BL21-URT94-2 catalyze the conversion of Rh1 to Rg2, and the catalytic efficiency of both is markedly higher than that of the previously disclosed glycosyltransferase gGT29-7 (N343G, A359P). The HPLC results were consistent with the TLC results.

Therefore, URT94-1 and URT94-2, like gGT29-7 (N343G, A359P), are able to catalyze the extension of the C6-O-Glc of Rh1 by one molecule of rhamnose to generate ginsenoside Rg2.

Example 5 In vitro transglycosylation activity of the glycosyltransferases URT94s and product identification, using protopanaxatriol-type saponin Rg1 as substrate

The cell lysates of recombinant E. coli BL21-URT94-1 and BL21-URT94-2 from Example 3 were used as crude enzyme solutions for the transglycosylation reaction, and the cell lysate of recombinant E. coli transformed with the empty vector pET28a was used as a control. The ginseng glycosyltransferases gGT29-7 and gGT29-7 (N343G, A359P) from patent PCT/CN2015/081111 were selected as positive controls. In vitro transglycosylation assays were performed according to the reaction system presented in Table 3, with reaction overnight at 35 °C.

As shown in FIGS. 4a-b, with protopanaxatriol-type ginsenoside Rg1 as the glycosyl acceptor and UDP-Rha as the glycosyl donor, URT94-1 and URT94-2 catalyze the conversion of Rg1 to Re, and the catalytic efficiency is markedly higher than that of the previously disclosed glycosyltransferase gGT29-7 (N343G, A359P) (PCT/CN2015/081111). The HPLC results were consistent with the TLC results.

Therefore, URT94-1 and URT94-2, like gGT29-7 (N343G, A359P), are able to catalyze the extension of the C6-O-Glc of Rg1 by one molecule of rhamnose to generate ginsenoside Re.

Example 6 In vitro transglycosylation activity of the glycosyltransferases URT94s and product identification, using protopanaxatriol-type saponin Rh1/Rg1 as substrate and UDP-Glc as sugar donor

The cell lysates of recombinant E. coli BL21-URT94-1 and BL21-URT94-2 from Example 3 were used as crude enzyme solutions for the transglycosylation reaction, and the cell lysate of recombinant E. coli transformed with the empty vector pET28a was used as a control. The ginseng glycosyltransferases gGT29-7 and gGT29-7 (N343G, A359P) from patent PCT/CN2015/081111 were selected as positive controls. In vitro transglycosylation assays were performed according to the reaction system presented in Table 3, with reaction overnight at 35 °C. The reaction results were analyzed by thin layer chromatography (TLC) and high performance liquid chromatography (HPLC), respectively.

Table 3 Reaction system for enzyme activity assay

With protopanaxatriol-type ginsenoside Rh1 as the glycosyl acceptor and UDP-Glc as the glycosyl donor, URT94-1 and URT94-2 cannot catalyze the conversion of Rh1 to Rf, and the HPLC results are consistent with the TLC results. Thus, unlike gGT29-7 and gGT29-7 (N343G, A359P), glycosyltransferases URT94-1 and URT94-2 of the present invention are not capable of catalyzing the extension of the C6-O-Glc of Rh1 by one molecule of glucose to generate ginsenoside Rf, as shown in FIG. 5.

With protopanaxatriol-type ginsenoside Rg1 as the glycosyl acceptor and UDP-Glc as the glycosyl donor, URT94-1 and URT94-2 cannot catalyze the conversion of Rg1 to C20-O-Glc-Rf, and the HPLC results are consistent with the TLC results. Thus, unlike gGT29-7 and gGT29-7 (N343G, A359P), glycosyltransferases URT94-1 and URT94-2 of the present invention are not capable of catalyzing the extension of the C6-O-Glc of Rg1 by one molecule of glucose to generate ginsenoside C20-O-Glc-Rf, as shown in FIG. 6. This shows that URT94-1 and URT94-2 are highly UDP-rhamnose-specific glycosyltransferases.

Example 7 Comparison of the efficiency of URT94s in catalyzing extension at C6 by one molecule of rhamnose

Glycosyltransferase gGT29-7 from patent PCT/CN2015/081111 can extend the sugar at C6 by one molecule of glucose, while its mutant gGT29-7 (N343G, A359P) can extend the sugar at C6 by one molecule of either glucose or rhamnose. Glycosyltransferases gGT29-7 and gGT29-7 (N343G, A359P) and glycosyltransferases URT94-1 and URT94-2 of the present invention were expressed, and crude enzyme solutions were prepared, as described in Example 3. The enzyme-catalyzed reaction was carried out as in Example 5, using UDP-Rha as the glycosyl donor and Rh1 and/or Rg1 as the glycosyl acceptor, at 35 °C for 1 hour, and the product was quantified by HPLC. The catalytic efficiency was calculated according to the following formula:

Conversion efficiency (%) = product amount / (substrate amount + product amount) × 100
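The conversion efficiency formula above can be expressed as a small helper, shown here as a minimal Python sketch. The function name is hypothetical and the numbers in the usage line are made-up illustrative amounts, not measured data; substrate and product amounts are assumed to be in the same units (for example, molar amounts derived from HPLC quantification).

```python
def conversion_efficiency(substrate_amount: float, product_amount: float) -> float:
    """Conversion efficiency in percent:
    product / (substrate + product) * 100.

    Both amounts must be in the same units and refer to what remains
    (substrate) and what was formed (product) at the end of the reaction.
    """
    total = substrate_amount + product_amount
    if substrate_amount < 0 or product_amount < 0 or total <= 0:
        raise ValueError("amounts must be non-negative and not both zero")
    return 100.0 * product_amount / total


# Illustrative amounts only (hypothetical, arbitrary units).
example = conversion_efficiency(substrate_amount=25.0, product_amount=75.0)
```

Because the denominator is the sum of residual substrate and product, the value stays between 0 and 100% regardless of the absolute amounts loaded.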

Table 4 compares the glycosyltransferases gGT29-7, gGT29-7 (N343G, A359P), URT94-1 and URT94-2. The activity of both URT94-1 and URT94-2 in catalyzing extension of the sugar chain at the C6 position of Rh1 and/or Rg1, with UDP-rhamnose as the glycosyl donor, was improved compared with the glycosyltransferases disclosed in PCT/CN2015/081111.

Table 4 Comparison of the catalytic efficiency of glycosyltransferases in extending Rha at the C6 position

Thus, unlike prior glycosyltransferases, URT94-1 and URT94-2 of the present invention can specifically and efficiently add a rhamnosyl group further to the first glycosyl group of C-6 of the tetracyclic triterpene compound substrate to extend the sugar chain.

Example 8 Highly efficient rhamnosyltransferase URT94-1 mutant protein

To further increase the catalytic activity of the rhamnosyltransferase, the inventors constructed a mutant library of URT94-1 by random mutagenesis.

(1) Error-prone PCR

Error-prone PCR was performed using the rhamnosyltransferase URT94-1 gene sequence (SEQ ID NO: 1) as a template, with primers URT94-1_Pet28a-F (5'-ctttaagaaggagatataccatggataccaatgaaaaaacca-3' (SEQ ID NO: 9)) and URT94-1_Pet28a-R (5'-ctcgagtgcggccgcaagcttggggcatcgcttcccctggcctg-3' (SEQ ID NO: 10)). The Stratagene GeneMorph II Random Mutagenesis Kit was used for the error-prone PCR. The PCR program was: 95 ℃ for 2 min; 28 cycles of 95 ℃ for 10 s, 55 ℃ for 15 s and 72 ℃ for 2 min; 72 ℃ for 10 min, followed by cooling to 10 ℃. The amount of template used was 50 ng. The PCR product was recovered after agarose gel electrophoresis to obtain the error-prone PCR product of the rhamnosyltransferase URT94-1.
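As a quick sanity check on the thermal program, the total hold time can be added up as sketched below. The sketch assumes the final step is read as 72 ℃ for 10 min followed by cooling to 10 ℃; the class and function names are hypothetical, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Step:
    temp_c: int   # block temperature in degrees Celsius
    seconds: int  # hold time at that temperature


def total_hold_seconds(initial: Step, cycle: Sequence[Step],
                       cycles: int, final: Step) -> int:
    """Sum of programmed hold times, excluding temperature ramps."""
    per_cycle = sum(step.seconds for step in cycle)
    return initial.seconds + cycles * per_cycle + final.seconds


# Program as described in the text (final extension is an assumed reading).
initial = Step(95, 120)
cycle = (Step(95, 10), Step(55, 15), Step(72, 120))
final = Step(72, 600)

seconds = total_hold_seconds(initial, cycle, cycles=28, final=final)
print(f"{seconds} s, about {seconds / 60:.0f} min")  # 4780 s, about 80 min
```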

(2) Expression of enzymes

The PCR product described above was ligated into the pET28a plasmid (one-step cloning kit, purchased from Yeasen, Shanghai), and the ligation product was transformed into laboratory-prepared competent E. coli BL21 cells. The transformed E. coli culture was plated on LB plates supplemented with 100 μg/mL kanamycin, and recombinant clones were further verified by PCR. Several clones were then selected, and their recombinant plasmids were extracted and sequenced.

(3) Enzyme activity assay and screening

Several E. coli expression strains of the ginseng glycosyltransferase URT94-1 mutants obtained in step (2) were selected, inoculated separately into 50 mL of LB liquid medium, and cultured at 37 ℃ and 200 rpm until the OD600 reached 0.6-0.8; expression was then induced with 0.2 mM IPTG, followed by culture at 16 ℃ and 110 rpm for 18 h. The cells were collected at low temperature, resuspended in 2 mL of 50 mM Tris-HCl (pH 8.0), and disrupted with a cell disruptor to obtain a crude enzyme solution.
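The induction step implies a simple dilution calculation (C1 × V1 = C2 × V2). The sketch below assumes a 1 M IPTG stock, which is not stated in this application, and ignores the small volume added to the culture; the function name is hypothetical.

```python
def stock_volume_ul(final_mm: float, culture_ml: float, stock_mm: float) -> float:
    """Volume of stock solution (microlitres) needed to reach final_mm.

    Uses C1 * V1 = C2 * V2 and neglects the added volume itself.
    """
    if final_mm <= 0 or culture_ml <= 0 or stock_mm <= 0:
        raise ValueError("all quantities must be positive")
    if final_mm >= stock_mm:
        raise ValueError("stock must be more concentrated than the target")
    return final_mm * culture_ml / stock_mm * 1000.0  # convert mL to uL


ASSUMED_STOCK_MM = 1000.0  # hypothetical 1 M IPTG stock (assumption)

volume = stock_volume_ul(final_mm=0.2, culture_ml=50.0, stock_mm=ASSUMED_STOCK_MM)
print(f"{volume:.1f} uL")  # 10.0 uL
```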

Enzyme activities were compared using the enzyme activity assay reaction system of Table 3. After analyzing and screening a large number of mutants, the inventors obtained one mutant with high enzyme activity (nucleic acid sequence shown in SEQ ID NO: 13; protein sequence shown in SEQ ID NO: 14), named URT94-1m1, in which position 55 is mutated from L to M (L55M) relative to wild-type URT94-1. The comparison of the activity of the rhamnosyltransferase URT94-1 and its mutant URT94-1m1 is shown in Table 5.

TABLE 5

Sequence number	Sequence name	Mutation site	Rg2 conversion (%)	Re conversion (%)
SEQ ID NO: 2	PgURT94	-	75%	62%
SEQ ID NO: 14	PgURT94m1	L55M	92%	99%
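The mutation notation used in Table 5 (wild-type residue, 1-based position, mutant residue) can be applied to a sequence programmatically, as sketched below. The toy sequence is a hypothetical placeholder and is not SEQ ID NO: 2; the function name is likewise an assumption for illustration.

```python
import re


def apply_point_mutation(protein: str, mutation: str) -> str:
    """Apply a substitution such as 'L55M' (1-based position) to a protein.

    Raises ValueError if the notation is malformed, the position is out of
    range, or the wild-type residue does not match the sequence.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation notation: {mutation!r}")
    wild_type, mutant = match.group(1), match.group(3)
    index = int(match.group(2)) - 1  # convert to 0-based index
    if not 0 <= index < len(protein):
        raise ValueError("mutation position is outside the sequence")
    if protein[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {index + 1}, found {protein[index]}"
        )
    return protein[:index] + mutant + protein[index + 1:]


# Hypothetical toy sequence with leucine at position 55 (not the real enzyme).
toy_wild_type = "M" + "A" * 53 + "L" + "G" * 10
toy_mutant = apply_point_mutation(toy_wild_type, "L55M")
print(toy_wild_type[54], "->", toy_mutant[54])  # L -> M
```

Checking the wild-type residue before substituting guards against off-by-one errors in position numbering, which is a common source of mistakes when a mutation is described relative to a reference sequence.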

According to Table 5, the catalytic activity of the mutant is significantly improved compared with URT94-1: the efficiency of converting Rh1 to Rg2 increased to 92%, and the efficiency of converting Rg1 to Re increased to 99%. The specific results are shown in Table 5 and FIG. 7.
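The size of the improvement can be expressed as a percentage-point gain and a fold change, using only the conversion values reported in Table 5. The dictionary layout and function name below are illustrative assumptions.

```python
# Conversion values (%) taken from Table 5: (wild type, L55M mutant).
TABLE_5 = {
    "Rh1 -> Rg2": (75.0, 92.0),
    "Rg1 -> Re": (62.0, 99.0),
}


def improvement(wild_type: float, mutant: float) -> tuple[float, float]:
    """Return (percentage-point gain, fold change) of mutant over wild type."""
    if wild_type <= 0:
        raise ValueError("wild-type conversion must be positive")
    return mutant - wild_type, mutant / wild_type


for reaction, (wt, mut) in TABLE_5.items():
    gain, fold = improvement(wt, mut)
    print(f"{reaction}: +{gain:.0f} points, {fold:.2f}-fold")
# Rh1 -> Rg2: +17 points, 1.23-fold
# Rg1 -> Re: +37 points, 1.60-fold
```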

The inventors detected protein expression by Western blot; as shown in FIG. 8, the mutant URT94-1m1 was expressed efficiently.

All documents mentioned in this application are incorporated by reference as if each were individually incorporated by reference. Further, it will be appreciated that various changes and modifications may be made by those skilled in the art after reading the above teachings, and such equivalents are intended to fall within the scope of the claims appended hereto.

Claims

A method of linking a rhamnosyl group to the first glycosyl group at the C-6 position of a tetracyclic triterpene compound, comprising: performing the transfer with a specific glycosyltransferase, the specific glycosyltransferase being a polypeptide having the amino acid sequence of SEQ ID NO: 2, SEQ ID NO: 4 or SEQ ID NO: 14, or a conservatively variant polypeptide thereof.
The method of claim 1, wherein said rhamnosyl is provided by a glycosyl donor; preferably, the glycosyl donor is a glycosyl donor carrying a rhamnose group; more preferably, the glycosyl donor comprises a glycosyl donor selected from the group consisting of: uridine diphosphate-rhamnose, guanosine diphosphate-rhamnose, adenosine diphosphate-rhamnose, cytidine diphosphate-rhamnose, thymidine diphosphate-rhamnose, or a combination thereof.
The method of claim 1, wherein the tetracyclic triterpene compound is of formula (I), and the compound having a glycosyl group attached to the glycosyl group at the C-6 position is of formula (II);

wherein R1 and R2 are H or glycosyl, R3 is monosaccharide glycosyl, and R4 is rhamnosyl; preferably, the glycosyl or monosaccharide glycosyl is selected from: a glucosyl, xylosyl, arabinosyl or rhamnosyl group;

preferably, when R1 is H and R2 and R3 are glucosyl, the compound of formula (I) is ginsenoside Rg1 and the compound of formula (II) is ginsenoside Re; when R1 and R2 are H and R3 is glucosyl, the compound of formula (I) is ginsenoside Rh1 and the compound of formula (II) is ginsenoside Rg2.
The method of claim 1, wherein the tetracyclic triterpene compound is of formula (III), and the compound having a glycosyl group attached to the glycosyl group at the C-6 position is of formula (IV);

wherein R1 is H or glycosyl, R2, R3 and R4 are monosaccharide glycosyl, and R5 is rhamnosyl; preferably, the glycosyl or monosaccharide glycosyl is selected from: a glucosyl, xylosyl, arabinosyl or rhamnosyl group;

preferably, when R1 is H, R2, R3 and R4 are glucosyl, and R5 is rhamnosyl, the compound of formula (III) is notoginsenoside R3, and the compound of formula (IV) is Yesanchinoside E.
Use of a specific glycosyltransferase for linking a rhamnosyl group to the first glycosyl group at the C-6 position of a tetracyclic triterpene compound, said specific glycosyltransferase being a polypeptide having the amino acid sequence of SEQ ID NO: 2, SEQ ID NO: 4 or SEQ ID NO: 14, or a conservatively variant polypeptide thereof.
The use according to claim 5, wherein the rhamnosyl group is provided by a glycosyl donor; preferably, the glycosyl donor is a glycosyl donor carrying a rhamnose group; more preferably, the glycosyl donor comprises a glycosyl donor selected from the group consisting of: uridine diphosphate-rhamnose, guanosine diphosphate-rhamnose, adenosine diphosphate-rhamnose, cytidine diphosphate-rhamnose, thymidine diphosphate-rhamnose, or a combination thereof.
The use according to claim 5, wherein the tetracyclic triterpene compound is of formula (I), and the compound having a glycosyl group attached to the glycosyl group at the C-6 position is of formula (II);

wherein R1 and R2 are H or glycosyl, R3 is monosaccharide glycosyl, and R4 is rhamnosyl; preferably, the glycosyl or monosaccharide glycosyl is selected from: a glucosyl, xylosyl, arabinosyl or rhamnosyl group;

preferably, when R1 is H and R2 and R3 are glucosyl, the compound of formula (I) is ginsenoside Rg1 and the compound of formula (II) is ginsenoside Re; when R1 and R2 are H and R3 is glucosyl, the compound of formula (I) is ginsenoside Rh1 and the compound of formula (II) is ginsenoside Rg2.
The use according to claim 5, wherein the tetracyclic triterpene compound is of formula (III), and the compound having a glycosyl group attached to the glycosyl group at the C-6 position is of formula (IV);

wherein R1 is H or glycosyl, R2, R3 and R4 are monosaccharide glycosyl, and R5 is rhamnosyl; preferably, the glycosyl or monosaccharide glycosyl is selected from: a glucosyl, xylosyl, arabinosyl or rhamnosyl group;

preferably, when R1 is H, R2, R3 and R4 are glucosyl, and R5 is rhamnosyl, the compound of formula (III) is notoginsenoside R3, and the compound of formula (IV) is Yesanchinoside E.
A method of intracellular attachment of a rhamnosyl group on the first glycosyl group at the C-6 position of a tetracyclic triterpene compound comprising:

(a) introducing a tetracyclic triterpene compound reaction precursor or a construct expressing/forming the same, and introducing a specific glycosyltransferase or a construct expressing the same, into a host cell to obtain a recombinant host cell; the specific glycosyltransferase is a polypeptide having the sequence of SEQ ID NO: 2, SEQ ID NO: 4 or SEQ ID NO: 14, or a conservatively variant polypeptide thereof; a glycosyl donor carrying a rhamnose group is present in or introduced into the host cell;

(b) culturing the recombinant host cell of (a) to obtain a tetracyclic triterpene compound having a rhamnosyl group attached to the first glycosyl group at the C-6 position.
The method of claim 9, wherein the tetracyclic triterpene compound reaction precursor comprises: ginsenoside Rg1, ginsenoside Rh1, and notoginsenoside R3; the corresponding products include: ginsenoside Re, ginsenoside Rg2, yesanchinoside E;

preferably, the glycosyl donor comprises a glycosyl donor selected from the group consisting of: uridine diphosphate-rhamnose, guanosine diphosphate-rhamnose, adenosine diphosphate-rhamnose, cytidine diphosphate-rhamnose, thymidine diphosphate-rhamnose, or a combination thereof.
A specific glycosyltransferase, being a polypeptide having the sequence of SEQ ID NO: 2, SEQ ID NO: 4 or SEQ ID NO: 14, or a conservatively variant polypeptide thereof; preferably, the conservatively variant polypeptide comprises:

(1) a polypeptide formed from the polypeptide having the sequence shown in SEQ ID NO: 2, SEQ ID NO: 4 or SEQ ID NO: 14, and having the function of linking a rhamnosyl group to the first glycosyl group at the C-6 position of the tetracyclic triterpene compound;

(2) a polypeptide whose amino acid sequence has more than 50% identity to the polypeptide having the sequence shown in SEQ ID NO: 2, SEQ ID NO: 4 or SEQ ID NO: 14, and which has the function of linking a rhamnosyl group to the first glycosyl group at the C-6 position of the tetracyclic triterpene compound; or

(3) a polypeptide formed by adding a signal peptide sequence to the N-terminus of the polypeptide having the sequence shown in SEQ ID NO: 2, SEQ ID NO: 4 or SEQ ID NO: 14.
An isolated polynucleotide encoding the specific glycosyltransferase of claim 11.
A nucleic acid construct comprising the polynucleotide of claim 12, or expressing the specific glycosyltransferase of claim 11; preferably, the nucleic acid construct is an expression vector or a homologous recombination vector.
A recombinant host cell expressing the specific glycosyltransferase of claim 11, or comprising the polynucleotide of claim 12, or comprising the nucleic acid construct of claim 13; preferably, the recombinant host cell further comprises a tetracyclic triterpene compound reaction precursor or a construct expressing/forming the same; preferably, a glycosyl donor carrying a rhamnose group is also present in the recombinant host cell;

preferably, the tetracyclic triterpene compound reaction precursor comprises: ginsenoside Rg1, ginsenoside Rh1, and notoginsenoside R3; the corresponding products include: ginsenoside Re, ginsenoside Rg2, yesanchinoside E;

preferably, the glycosyl donor comprises a glycosyl donor selected from the group consisting of: uridine diphosphate-rhamnose, guanosine diphosphate-rhamnose, adenosine diphosphate-rhamnose, cytidine diphosphate-rhamnose, thymidine diphosphate-rhamnose, or a combination thereof.
A kit for glycosyl transfer comprising:

the specific glycosyltransferase of claim 7, which is capable of linking a rhamnosyl group to the first glycosyl group at the C-6 position of a tetracyclic triterpene compound, said specific glycosyltransferase being a polypeptide having the amino acid sequence of SEQ ID NO: 2, SEQ ID NO: 4 or SEQ ID NO: 14, or a conservatively variant polypeptide thereof; or

the isolated polynucleotide of claim 12; or

the nucleic acid construct of claim 13; or

the recombinant host cell of claim 14;

preferably, the kit further comprises: a glycosyl donor carrying a rhamnose group; more preferably, the glycosyl donor comprises: uridine diphosphate-rhamnose, guanosine diphosphate-rhamnose, adenosine diphosphate-rhamnose, cytidine diphosphate-rhamnose, thymidine diphosphate-rhamnose;

preferably, the kit further comprises: a tetracyclic triterpene compound reaction precursor.