CN112708602B

CN112708602B - Dioscorea zingiberensis-derived diosgenin synthesis related protein, coding gene and application

Info

Publication number: CN112708602B
Application number: CN201911021712.0A
Authority: CN
Inventors: 张学礼; 陈晶; 程健; 江会锋; 樊飞宇; 戴住波
Original assignee: Tianjin Institute of Industrial Biotechnology of CAS
Current assignee: Tianjin Institute of Industrial Biotechnology of CAS
Priority date: 2019-10-25
Filing date: 2019-10-25
Publication date: 2022-04-05
Anticipated expiration: 2039-10-25
Also published as: CN112708602A

Abstract

The invention discloses Dioscorea zingiberensis-derived diosgenin synthesis-related proteins, their coding genes and applications thereof. The Dioscorea zingiberensis-derived diosgenin synthesis-related proteins provided by the invention are the two proteins shown in SEQ ID No.9 and SEQ ID No.7. By introducing the Dioscorea zingiberensis-derived diosgenin synthesis-related genes provided by the invention, together with a cytochrome P450 reductase coding gene, into Saccharomyces cerevisiae capable of synthesizing cholesterol, an engineered Saccharomyces cerevisiae strain capable of synthesizing diosgenin can be obtained. The invention thereby realizes the biosynthesis of diosgenin in Saccharomyces cerevisiae.

Description

Dioscorea zingiberensis-derived diosgenin synthesis related protein, coding gene and application

Technical Field

The invention relates to the technical field of biology, in particular to a dioscorea zingiberensis-derived diosgenin synthesis-related protein, a coding gene and application thereof.

Background

Plants of the genus Dioscorea are rich in starch or steroidal saponins and are widely used in the food and pharmaceutical industries worldwide (Piperno, D.R., Ranere, A.J., Holst, I., Hansell, P., 2000. Starch grains reveal early root crop horticulture in the Panamanian tropical forest. Nature. 407, 894-7.). Among them, peltate yam (Dioscorea zingiberensis) is an important traditional Chinese medicine widely used in the treatment of rheumatoid arthritis, anthrax, cough, heart disease, etc. (Jesus, M., Martins, A.P., Gallardo, E., Silvestre, S., 2016. Diosgenin: Recent Highlights on Pharmacology and Analytical Methodology. J. Anal. Methods Chem. 2016, 4156293.). Diosgenin is a steroidal sapogenin that accumulates in the rhizome of Dioscorea zingiberensis and is an important precursor for synthesizing various steroid hormone drugs, including antioxidants, anti-inflammatory drugs, sex hormones, steroids, cortisones, contraceptives, birth control compounds, anabolic drugs, and the like (Willaman, J.J., Fenske, C.S., Correll, D.S., 1953. Occurrence of alkaloids in Dioscorea. Science. 118, 329-30.).

At present, diosgenin is produced in China mainly by direct extraction from Dioscoreaceae plants. The source plant, Dioscorea zingiberensis, is grown mainly in the middle and lower reaches of the Yangtze River and in the South-to-North Water Diversion area of China. However, cultivation is unstable, the growth period is long (more than 2 years) and the extraction process is complicated (Bai, Y., Zhang, L., Jin, W., Wei, M., Zhou, P., Zheng, G., Niu, L., Nie, L., Zhang, Y., Wang, H., Yu, L., 2015. In situ high-valued utilization and transformation of sugars from Dioscorea zingiberensis C.H. Wright for clean production of diosgenin. Bioresour. Technol. 196, 642-7.). As a result, the supply and price of diosgenin fluctuate, which affects the downstream steroid hormone industry. Steroid hormones are currently the second-largest class of drugs after antibiotics, and China is a major exporter of steroid hormone bulk drugs and intermediates. A stable and efficient supply of diosgenin and related substitute raw materials is the basis for the sustained and healthy development of the steroid hormone industry in China. Shifting the upstream raw materials of steroid drugs from plant extraction to microbial synthesis would therefore be a transformative change that could drive further rapid development of the industry. Combining biological conversion with chemical synthesis to produce steroid hormones, in place of highly polluting and costly plant raw materials, has remarkable economic and social benefits.

In plants, diosgenin is biosynthesized from the precursor cholesterol in ten steps (see Fig. 1): at least three oxidation reactions at C-22, C-26 and C-16 of cholesterol, followed by two cyclization reactions, yield diosgenin (Sonawane, P.D., Pollier, J., Panda, S., Szymanski, J., Massalha, H., et al., 2016.). The diosgenin synthesis pathway has been successfully identified in Paris polyphylla and fenugreek (Trigonella foenum-graecum). In both plants, diosgenin synthesis requires two key P450 proteins (PpCYP90G4/TfCYP90B50 and PpCYP94D108/TfCYP82J17): the former catalyzes the double oxidation of cholesterol at C-16 and C-22, and the latter carries out the hydroxylation at C-26, which completes the biosynthesis from cholesterol to diosgenin. Although Dioscorea zingiberensis is the main source plant of diosgenin in China, its diosgenin synthesis pathway has not yet been resolved and the relevant P450 proteins have not been identified. Identifying the P450 genes involved in diosgenin synthesis in Dioscorea zingiberensis is therefore of great significance for the biosynthesis of diosgenin in Saccharomyces cerevisiae.

Disclosure of Invention

The invention aims to provide a dioscorea zingiberensis-derived diosgenin synthesis related protein, a coding gene and application.

In a first aspect, the invention claims a protein or a protein set.

The protein claimed in the present invention is protein A or protein B.

The protein set claimed by the invention is protein set A or protein set B or protein set C.

The protein set A is composed of the protein A and the protein B.

The protein set B consists of the protein A, the protein B and the protein C.

The protein set C is composed of the protein A, the protein B, the protein C, the protein D and the protein E.

The protein A (a cholesterol 16-position and 22-position double oxidase derived from Dioscorea zingiberensis, named DGCYP033) is a protein shown in any one of the following (A1) - (A4):

(A1) protein with amino acid sequence shown as SEQ ID No. 9;

(A2) a protein obtained by substituting and/or deleting and/or adding one or more amino acid residues to the amino acid sequence defined in (A1) and having the same function;

(A3) a protein having 99% or more, 95% or more, 90% or more, 85% or more, or 80% or more homology with the amino acid sequence defined in (A1) or (A2) and having the same function;

(A4) a fusion protein obtained by attaching a tag to the N-terminus and/or C-terminus of the protein defined in any one of (A1) to (A3).

The protein B (a cholesterol 26-position oxidase derived from Dioscorea zingiberensis, named DGCYP029) is a protein shown in any one of the following (B1) - (B4):

(B1) protein with amino acid sequence shown as SEQ ID No. 7;

(B2) a protein obtained by substituting and/or deleting and/or adding one or more amino acid residues to the amino acid sequence defined in (B1) and having the same function;

(B3) a protein having a homology of 99% or more, 95% or more, 90% or more, 85% or more, or 80% or more with the amino acid sequence defined in (B1) or (B2) and having the same function;

(B4) a fusion protein obtained by attaching a tag to the N-terminus and/or C-terminus of the protein defined in any one of (B1) to (B3).

The protein C (grape-derived cytochrome P450 reductase, designated VvCPR) is a protein represented by any one of the following (C1) to (C4):

(C1) protein with amino acid sequence shown as SEQ ID No. 11;

(C2) a protein obtained by substituting and/or deleting and/or adding one or more amino acid residues to the amino acid sequence defined in (C1) and having the same function;

(C3) a protein having a homology of 99% or more, 95% or more, 90% or more, 85% or more, or 80% or more with the amino acid sequence defined in (C1) or (C2) and having the same function;

(C4) a fusion protein obtained by attaching a tag to the N-terminus and/or C-terminus of the protein defined in any one of (C1) to (C3).

The protein D (a sterol 7-position reductase derived from zebrafish, named DrDHCR7) is a protein shown in any one of the following (D1) - (D4):

(D1) protein with amino acid sequence shown as SEQ ID No. 1;

(D2) a protein obtained by substituting and/or deleting and/or adding one or more amino acid residues to the amino acid sequence defined in (D1) and having the same function;

(D3) a protein having a homology of 99% or more, 95% or more, 90% or more, 85% or more, or 80% or more with the amino acid sequence defined in (D1) or (D2) and having the same function;

(D4) a fusion protein obtained by attaching a tag to the N-terminus and/or C-terminus of the protein defined in any one of (D1) to (D3).

The protein E (a sterol 24-position reductase derived from zebrafish, named DrDHCR24) is a protein shown in any one of the following (E1) - (E4):

(E1) protein with amino acid sequence shown as SEQ ID No. 2;

(E2) a protein obtained by substituting and/or deleting and/or adding one or more amino acid residues to the amino acid sequence defined in (E1) and having the same function;

(E3) a protein having a homology of 99% or more, 95% or more, 90% or more, 85% or more, or 80% or more with the amino acid sequence defined in (E1) or (E2) and having the same function;

(E4) a fusion protein obtained by attaching a tag to the N-terminus and/or C-terminus of the protein defined in any one of (E1) to (E3).

In the above protein, the tag is a polypeptide or protein that is expressed by fusion with a target protein using in vitro recombinant DNA technology, so as to facilitate expression, detection, tracking and/or purification of the target protein. The protein tag may be a His tag, a Flag tag, an MBP tag, an HA tag, a myc tag, a GST tag, and/or a SUMO tag, among others.
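The homology thresholds in (A3) to (E3) can be illustrated with a minimal Python sketch. It assumes that "homology" is measured as percent identity over the ungapped columns of a pairwise alignment; the text does not fix a specific metric or alignment tool, so this convention, the function names and the toy sequences are all hypothetical and are not taken from the SEQ ID entries.

```python
from typing import Optional

# Thresholds listed in (A3)-(E3), highest first.
THRESHOLDS = (99, 95, 90, 85, 80)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs are already aligned (equal length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def highest_threshold_met(identity: float) -> Optional[int]:
    """Return the highest listed threshold that the identity reaches, or None."""
    for threshold in THRESHOLDS:
        if identity >= threshold:
            return threshold
    return None


# Toy, hypothetical 10-residue peptides differing at one position.
identity = percent_identity("MKTAYIAKQR", "MKTAYIAKQK")
print(identity, highest_threshold_met(identity))
```

A sequence threshold alone is not sufficient under the definitions above: the variant protein must also retain the same function.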

In a second aspect, the invention claims a nucleic acid molecule or a set of nucleic acid molecules.

The nucleic acid molecules claimed in the present invention are nucleic acid molecules encoding the proteins described hereinbefore.

The nucleic acid molecule is a nucleic acid molecule A or a nucleic acid molecule B.

The set of nucleic acid molecules claimed by the invention is set of nucleic acid molecules A or set of nucleic acid molecules B or set of nucleic acid molecules C.

The set of nucleic acid molecules A consists of the nucleic acid molecule A and the nucleic acid molecule B.

The nucleic acid molecule set B consists of the nucleic acid molecule A, the nucleic acid molecule B and the nucleic acid molecule C.

The set of nucleic acid molecules C consists of the nucleic acid molecule A, the nucleic acid molecule B, the nucleic acid molecule C, the nucleic acid molecule D and the nucleic acid molecule E.

The nucleic acid molecule A is a nucleic acid molecule encoding the protein A described above.

The nucleic acid molecule B is a nucleic acid molecule encoding the protein B described above.

The nucleic acid molecule C is a nucleic acid molecule encoding the protein C described above.

The nucleic acid molecule D is a nucleic acid molecule encoding the protein D as described above.

The nucleic acid molecule E is a nucleic acid molecule which codes for the protein E described above.

Further, the nucleic acid molecule A encodes DGCYP033 and is a DNA molecule shown in any one of the following (a1) to (a3):

(a1) a DNA molecule with the nucleotide sequence shown at positions 11-1477 of SEQ ID No. 10;

(a2) a DNA molecule which hybridizes under stringent conditions to the DNA molecule defined in (a1) and which encodes a protein represented by any one of (A1) to (A4) as described hereinbefore;

(a3) a DNA molecule which has 99% or more, 95% or more, 90% or more, 85% or more or 80% or more homology to the DNA sequence defined in (a1) or (a2) and which encodes a protein represented by any one of (A1) to (A4) described above.

The nucleic acid molecule B encodes DGCYP029 and is a DNA molecule shown in any one of the following (b1) to (b3):

(b1) a DNA molecule with the nucleotide sequence shown at positions 11-1501 of SEQ ID No. 8;

(b2) a DNA molecule which hybridizes under stringent conditions with the DNA molecule defined in (b1) and which encodes the protein B as described hereinbefore;

(b3) a DNA molecule having 99% or more, 95% or more, 90% or more, 85% or more or 80% or more homology to the DNA sequence defined in (b1) or (b2) and encoding the protein B as described above.

The nucleic acid molecule C encodes VvCPR and is a DNA molecule shown in any one of the following (c1) to (c3):

(c1) a DNA molecule with the nucleotide sequence shown at positions 11-2125 of SEQ ID No. 12;

(c2) a DNA molecule which hybridizes under stringent conditions with the DNA molecule defined in (c1) and which encodes the protein C as described hereinbefore;

(c3) a DNA molecule having 99% or more, 95% or more, 90% or more, 85% or more or 80% or more homology to the DNA sequence defined in (c1) or (c2) and encoding the protein C as described above.

The nucleic acid molecule D encodes DrDHCR7 and is a DNA molecule shown in any one of the following (d1) to (d3):

(d1) a DNA molecule with the nucleotide sequence shown at positions 813-2249 of SEQ ID No. 3;

(d2) a DNA molecule which hybridizes under stringent conditions with the DNA molecule defined in (d1) and which encodes the protein D as described hereinbefore;

(d3) a DNA molecule having 99% or more, 95% or more, 90% or more, 85% or more or 80% or more homology to the DNA sequence defined in (d1) or (d2) and encoding the protein D as described above.

The nucleic acid molecule E encodes DrDHCR24 and is a DNA molecule shown in any one of the following (e1) to (e3):

(e1) a DNA molecule with the nucleotide sequence shown at positions 493-2043 of SEQ ID No. 4;

(e2) a DNA molecule which hybridizes under stringent conditions with the DNA molecule defined in (e1) and which encodes the protein E as described hereinbefore;

(e3) a DNA molecule having 99% or more, 95% or more, 90% or more, 85% or more or 80% or more homology to the DNA sequence defined in (e1) or (e2) and encoding the protein E as described above.
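The coordinate ranges quoted in (a1) to (e1) are 1-based and inclusive. The following Python sketch shows how such a range maps to a sequence slice and checks that each quoted range has a length divisible by three, as expected for a coding region. The helper names are illustrative; whether each range ends in a stop codon is not stated in the text, so the sketch reports codon counts rather than protein lengths.

```python
# gene name -> (SEQ ID No., first position, last position), 1-based inclusive,
# as quoted in (a1)-(e1)
CDS_RANGES = {
    "DGCYP033": (10, 11, 1477),
    "DGCYP029": (8, 11, 1501),
    "VvCPR": (12, 11, 2125),
    "DrDHCR7": (3, 813, 2249),
    "DrDHCR24": (4, 493, 2043),
}


def cds_length(first: int, last: int) -> int:
    """Length in nucleotides of a 1-based inclusive range."""
    if first < 1 or last < first:
        raise ValueError("invalid 1-based inclusive range")
    return last - first + 1


def extract(sequence: str, first: int, last: int) -> str:
    """Slice a 1-based inclusive range out of a sequence string."""
    return sequence[first - 1:last]


def summarize() -> dict:
    """Map each gene to (length in nt, in-frame flag, number of codons)."""
    summary = {}
    for name, (_seq_id, first, last) in CDS_RANGES.items():
        n = cds_length(first, last)
        summary[name] = (n, n % 3 == 0, n // 3)
    return summary


for gene, (length, in_frame, codons) in summarize().items():
    print(f"{gene}: {length} nt, in frame: {in_frame}, codons: {codons}")
```

All five ranges come out as whole numbers of codons (489, 497, 705, 479 and 517 respectively), which is consistent with their being complete open reading frames.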

The stringent conditions may be as follows: hybridization at 50 ℃ in a mixed solution of 7% sodium dodecyl sulfate (SDS), 0.5 M Na₃PO₄ and 1 mM EDTA, and rinsing at 50 ℃ in 2×SSC, 0.1% SDS; or: hybridization at 50 ℃ in a mixed solution of 7% SDS, 0.5 M Na₃PO₄ and 1 mM EDTA, and rinsing at 50 ℃ in 1×SSC, 0.1% SDS; or: hybridization at 50 ℃ in a mixed solution of 7% SDS, 0.5 M Na₃PO₄ and 1 mM EDTA, and rinsing at 50 ℃ in 0.5×SSC, 0.1% SDS; or: hybridization at 50 ℃ in a mixed solution of 7% SDS, 0.5 M Na₃PO₄ and 1 mM EDTA, and rinsing at 50 ℃ in 0.1×SSC, 0.1% SDS; or: hybridization at 50 ℃ in a mixed solution of 7% SDS, 0.5 M Na₃PO₄ and 1 mM EDTA, and rinsing at 65 ℃ in 0.1×SSC, 0.1% SDS; or: hybridization at 65 ℃ in a solution of 6×SSC, 0.5% SDS, followed by washing once each with 2×SSC, 0.1% SDS and 1×SSC, 0.1% SDS.
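The first five alternatives above differ only in the rinse step. As a rough illustration, the Python sketch below converts each SSC strength to salt molarity, using the standard definition of 1×SSC as 0.15 M NaCl plus 0.015 M sodium citrate, and orders the rinses by the usual rule of thumb that lower salt and higher temperature mean higher stringency. The ranking function is a simplification introduced here and is not part of the claimed conditions.

```python
# Standard definition: 1x SSC is 0.15 M NaCl plus 0.015 M sodium citrate.
NACL_M_AT_1X = 0.15
CITRATE_M_AT_1X = 0.015

# (SSC strength, rinse temperature in deg C) for the five alternatives that use
# the SDS / Na3PO4 / EDTA hybridization solution, in the order given in the text.
RINSES = [(2.0, 50), (1.0, 50), (0.5, 50), (0.1, 50), (0.1, 65)]


def salt_molarity(ssc_x: float) -> tuple:
    """Return (NaCl M, sodium citrate M) for a given SSC strength."""
    return ssc_x * NACL_M_AT_1X, ssc_x * CITRATE_M_AT_1X


def order_by_stringency(rinses: list) -> list:
    """Sort from least to most stringent: higher salt first, then lower temperature."""
    return sorted(rinses, key=lambda r: (-r[0], r[1]))


for ssc_x, temp_c in order_by_stringency(RINSES):
    nacl, citrate = salt_molarity(ssc_x)
    print(f"{ssc_x}x SSC at {temp_c} C: {nacl:.3f} M NaCl, {citrate:.4f} M citrate")
```

Under this simplification the alternatives are already listed from least to most stringent rinse.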

In a third aspect, the invention claims any of the following biomaterials:

(c1) A recombinant vector comprising a nucleic acid molecule as described above.

(c2) An expression cassette comprising a nucleic acid molecule as described above.

(c3) A transgenic cell line comprising a nucleic acid molecule as described above.

(c4) A recombinant bacterium comprising a nucleic acid molecule as described above.

(c5) A complete set of recombinant vectors, which is the complete set of vector A, the complete set of vector B or the complete set of vector C.

The complete set of vector A consists of a recombinant vector A and a recombinant vector B.

The complete set of vector B consists of the recombinant vector A, the recombinant vector B and the recombinant vector C.

The complete set of vector C consists of the recombinant vector A, the recombinant vector B, the recombinant vector C, the recombinant vector D and the recombinant vector E.

The recombinant vector A is a recombinant vector containing the nucleic acid molecule A; the recombinant vector B is a recombinant vector containing the nucleic acid molecule B; the recombinant vector C is a recombinant vector containing the nucleic acid molecule C; the recombinant vector D is a recombinant vector containing the nucleic acid molecule D; the recombinant vector E is a recombinant vector containing the nucleic acid molecule E described above.

(c6) A complete set of expression cassettes, which is the complete set of expression cassette A, the complete set of expression cassette B or the complete set of expression cassette C.

The complete set of expression cassette A consists of an expression cassette A and an expression cassette B.

The complete set of expression cassette B consists of the expression cassette A, the expression cassette B and the expression cassette C.

The complete set of expression cassette C consists of the expression cassette A, the expression cassette B, the expression cassette C, the expression cassette D and the expression cassette E.

The expression cassette A is an expression cassette comprising the nucleic acid molecule A as described above; the expression cassette B is an expression cassette comprising the nucleic acid molecule B as described above; the expression cassette C is an expression cassette comprising a nucleic acid molecule C as described hereinbefore; the expression cassette D is an expression cassette comprising a nucleic acid molecule D as described hereinbefore; the expression cassette E is an expression cassette which contains the nucleic acid molecule E described above.

(c7) A complete set of transgenic cell lines, which is the complete set of transgenic cell line A, the complete set of transgenic cell line B or the complete set of transgenic cell line C.

The complete set of transgenic cell line A consists of a transgenic cell line A and a transgenic cell line B.

The complete set of transgenic cell line B consists of the transgenic cell line A, the transgenic cell line B and the transgenic cell line C.

The complete set of transgenic cell line C consists of the transgenic cell line A, the transgenic cell line B, the transgenic cell line C, the transgenic cell line D and the transgenic cell line E.

The transgenic cell line A is a transgenic cell line containing the nucleic acid molecule A as described above; the transgenic cell line B is a transgenic cell line containing the nucleic acid molecule B as described above; the transgenic cell line C is a transgenic cell line comprising a nucleic acid molecule C as described above; the transgenic cell line D is a transgenic cell line containing the nucleic acid molecule D; the transgenic cell line E is a transgenic cell line which contains the nucleic acid molecule E described above.

(c8) A complete set of recombinant bacteria, which is the complete set of recombinant bacterium A, the complete set of recombinant bacterium B or the complete set of recombinant bacterium C.

The complete set of recombinant bacterium A consists of a recombinant bacterium A and a recombinant bacterium B.

The complete set of recombinant bacterium B consists of the recombinant bacterium A, the recombinant bacterium B and the recombinant bacterium C.

The complete set of recombinant bacterium C consists of the recombinant bacterium A, the recombinant bacterium B, the recombinant bacterium C, the recombinant bacterium D and the recombinant bacterium E.

The recombinant bacterium A is a recombinant bacterium containing the nucleic acid molecule A; the recombinant bacterium B is a recombinant bacterium containing the nucleic acid molecule B; the recombinant bacterium C is a recombinant bacterium containing the nucleic acid molecule C; the recombinant bacterium D is a recombinant bacterium containing the nucleic acid molecule D; the recombinant bacterium E is a recombinant bacterium containing the nucleic acid molecule E.

In a fourth aspect, the invention claims a method for constructing engineering bacteria.

The method claimed by the invention for constructing a yeast engineering bacterium for synthesizing diosgenin may comprise the following step: modifying a yeast capable of synthesizing cholesterol so that it expresses the protein A, the protein B and a cytochrome P450 reductase; the modified yeast is the target engineering bacterium.

Further, the yeast capable of synthesizing cholesterol may be prepared according to a method comprising the steps of: modifying the starting yeast to express sterol 7-position reductase and sterol 24-position reductase, wherein the modified yeast is the yeast capable of synthesizing cholesterol.

At the gene level, the method may comprise the following step: introducing the nucleic acid molecule A, the nucleic acid molecule B and the coding gene of the cytochrome P450 reductase into the yeast capable of synthesizing cholesterol to obtain a recombinant yeast expressing the protein A, the protein B and the cytochrome P450 reductase, which is the target engineering bacterium.

Further, the yeast capable of synthesizing cholesterol may be prepared according to a method comprising the following step: introducing the coding gene of the sterol 7-position reductase and the coding gene of the sterol 24-position reductase into the starting yeast to obtain a recombinant yeast expressing the sterol 7-position reductase and the sterol 24-position reductase, which is the yeast capable of synthesizing cholesterol.

The cytochrome P450 reductase may be the protein C as described above; the sterol 7-position reductase may be the protein D as described above; the sterol 24-position reductase may be the protein E as described above.

At the gene level, the coding gene of the cytochrome P450 reductase may be the nucleic acid molecule C as described above; the coding gene of the sterol 7-position reductase may be the nucleic acid molecule D as described above; the coding gene of the sterol 24-position reductase may be the nucleic acid molecule E as described above.

Further, each of the nucleic acid molecules or encoding genes may be introduced into the corresponding recipient yeast in the form of a recombinant vector or an expression cassette.

In a particular embodiment of the invention, the nucleic acid molecule A, the nucleic acid molecule B and the coding gene of the cytochrome P450 reductase (the nucleic acid molecule C described above) are integrated into the genome of the yeast capable of synthesizing cholesterol at the Gal80 site. The coding gene of the sterol 7-position reductase (the nucleic acid molecule D described above) and the coding gene of the sterol 24-position reductase (the nucleic acid molecule E described above) are integrated into the genome of the starting yeast at the Gal7 site.

Further, the yeast may be saccharomyces cerevisiae, etc.

In a specific embodiment of the invention, the starting yeast is specifically Saccharomyces cerevisiae BY-T3.

In a specific embodiment of the present invention, the nucleic acid molecule D and the nucleic acid molecule E are introduced into the starting yeast in the form of expression cassettes. The expression cassettes are the expression cassette Ppgk-DrDHCR7-ADH1t and the expression cassette pGal1-DrDHCR24-CYC1t. The sequence of the expression cassette Ppgk-DrDHCR7-ADH1t is SEQ ID No. 3; the sequence of the expression cassette pGal1-DrDHCR24-CYC1t is SEQ ID No. 4. When the expression cassette Ppgk-DrDHCR7-ADH1t and the expression cassette pGal1-DrDHCR24-CYC1t are introduced into the starting yeast, a homologous arm marker fragment Gal7-URA3-up and a homologous arm marker fragment Gal7-URA3-down are also introduced (to achieve integration at the Gal7 site of Saccharomyces cerevisiae); the sequence of the homologous arm marker fragment gal7-URA3-up is shown in SEQ ID No. 5; the sequence of the homologous arm marker fragment gal7-URA3-down is shown in SEQ ID No. 6.

In a specific embodiment of the present invention, the nucleic acid molecule A, the nucleic acid molecule B and the nucleic acid molecule C are introduced into the yeast capable of synthesizing cholesterol in the form of expression cassettes. The expression cassettes are the expression cassette pPgk-DGCYP029-ADH1t, the expression cassette pTDH3-VvCPR-TPI1t and the expression cassette pTEF-DGCYP033-CYC1t.

The expression cassette pPgk-DGCYP029-ADH1t is a fragment obtained by PCR amplification using the plasmid pM2-DGCYP029-sy-Sc as a template with the primer 1-M-pEASY-PGK1-F and the primer 3G-1-M-ADHt-TDH3-R. The sequence of the primer 1-M-pEASY-PGK1-F is shown in SEQ ID No. 15; the sequence of the primer 3G-1-M-ADHt-TDH3-R is shown in SEQ ID No. 16. The plasmid pM2-DGCYP029-sy-Sc is a recombinant plasmid obtained by replacing the small fragment between the SexA1 and Asc1 restriction sites of the pM2 plasmid with the DNA fragment (DGCYP029-sy-Sc) shown at positions 11-1501 of SEQ ID No. 8.

The expression cassette pTDH3-VvCPR-TPI1t is a fragment obtained by PCR amplification using the plasmid pM4-VvCPR-sy-Sc as a template with the primer 3G-3-M-ADHt-TDH3-F and the primer 3G-3-M-TPI1t-TEF1-R. The sequence of the primer 3G-3-M-ADHt-TDH3-F is shown in SEQ ID No. 17; the sequence of the primer 3G-3-M-TPI1t-TEF1-R is shown in SEQ ID No. 18. The plasmid pM4-VvCPR-sy-Sc is a recombinant plasmid obtained by replacing the small fragment between the SexA1 and Asc1 restriction sites of the pM4 plasmid with the DNA fragment (VvCPR-sy-Sc) shown at positions 11-2125 of SEQ ID No. 12.

The expression cassette pTEF-DGCYP033-CYC1t is a fragment obtained by PCR amplification using the plasmid pM3-DGCYP033-sy-Sc as a template with the primer 3G-2-M-TPI1t-TEF1-F and the primer M-CYC1t-pEASY-R. The sequence of the primer 3G-2-M-TPI1t-TEF1-F is shown in SEQ ID No. 19; the sequence of the primer M-CYC1t-pEASY-R is shown in SEQ ID No. 20. The plasmid pM3-DGCYP033-sy-Sc is a recombinant plasmid obtained by replacing the small fragment between the SexA1 and Asc1 restriction sites of the pM3 plasmid with the DNA fragment (DGCYP033-sy-Sc) shown at positions 11-1477 of SEQ ID No. 10.

When the expression cassette pPgk-DGCYP029-ADH1t, the expression cassette pTDH3-VvCPR-TPI1t and the expression cassette pTEF-DGCYP033-CYC1t are introduced into the yeast capable of synthesizing cholesterol, a homologous arm marker fragment Gal80-Leu-up and a homologous arm marker fragment Gal80-Leu-down are also introduced (to achieve integration at the Gal80 site of Saccharomyces cerevisiae); the sequence of the homologous arm marker fragment gal80-Leu-up is shown in SEQ ID No. 13; the sequence of the homologous arm marker fragment gal80-Leu-down is shown in SEQ ID No. 14.
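The Gal80 integration described above can be pictured as a linear series of fragments flanked by the two homologous arm marker fragments. The Python sketch below models that layout. The fragment order is an assumption inferred from the primer names (the ADHt-TDH3 and TPI1t-TEF1 junction primers), not something the text states explicitly, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cassette:
    promoter: str
    gene: str
    terminator: str

    @property
    def name(self) -> str:
        return f"{self.promoter}-{self.gene}-{self.terminator}"


# Assumed left-to-right order, inferred from the junction primer names.
GAL80_CASSETTES = [
    Cassette("pPgk", "DGCYP029", "ADH1t"),
    Cassette("pTDH3", "VvCPR", "TPI1t"),
    Cassette("pTEF", "DGCYP033", "CYC1t"),
]


def integration_layout(cassettes, up_arm: str, down_arm: str) -> list:
    """Fragment names from the upstream arm to the downstream arm."""
    return [up_arm] + [c.name for c in cassettes] + [down_arm]


def junctions(layout: list) -> list:
    """Adjacent fragment pairs that would need to recombine with each other."""
    return list(zip(layout, layout[1:]))


def has_repeated_parts(cassettes) -> bool:
    """True if any promoter or terminator is used more than once."""
    parts = [c.promoter for c in cassettes] + [c.terminator for c in cassettes]
    return len(parts) != len(set(parts))


layout = integration_layout(GAL80_CASSETTES, "Gal80-Leu-up", "Gal80-Leu-down")
for left, right in junctions(layout):
    print(f"{left} | {right}")
print("repeated promoter/terminator parts:", has_repeated_parts(GAL80_CASSETTES))
```

Each of the three cassettes uses a distinct promoter and terminator, which the check on repeated parts makes explicit.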

In a fifth aspect, the invention claims an engineered bacterium prepared by the method described in the fourth aspect.

In a specific embodiment of the invention, the engineering bacterium is the engineering bacterium DG001, which is prepared according to the following steps:

(d1) introducing the expression cassette Ppgk-DrDHCR7-ADH1t (SEQ ID No.3), the expression cassette pGal1-DrDHCR24-CYC1t (SEQ ID No.4), the homologous arm marker fragment gal7-URA3-up (SEQ ID No.5) and the homologous arm marker fragment gal7-URA3-down (SEQ ID No.6) into Saccharomyces cerevisiae BY-T3; the resulting recombinant strain is the yeast capable of synthesizing cholesterol;

(d2) introducing the expression cassette pPgk-DGCYP029-ADH1t, the expression cassette pTDH3-VvCPR-TPI1t, the expression cassette pTEF-DGCYP033-CYC1t, the homologous arm marker fragment gal80-Leu-up (SEQ ID No.13) and the homologous arm marker fragment gal80-Leu-down (SEQ ID No.14) into the yeast capable of synthesizing cholesterol; the resulting recombinant strain is the engineering bacterium DG001.

In a sixth aspect, the invention claims the use of the protein or the set of proteins or the nucleic acid molecule or the set of nucleic acid molecules or the biological material or the engineered bacteria in the preparation of diosgenin.

In a seventh aspect, the invention claims a method for preparing diosgenin.

The method for preparing diosgenin claimed by the invention can comprise the following steps: carrying out fermentation culture on the engineering bacteria in the fifth aspect, and collecting fermentation products; the fermentation product contains diosgenin.

Further, the temperature of the culture was 30 ℃.

Further, the culture conditions are: 1) selective culture with 2% glucose (% indicates g/100 mL) as the carbon source at 30 ℃ and 250 rpm for 30 hours; 2) transfer to a selection medium with 2% galactose (% indicates g/100 mL) as the carbon source and culture at 30 ℃ and 250 rpm for 90 hours.
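The two-stage culture above can be written down as a simple schedule. The Python sketch below encodes the two stages as data and derives the start and end time of each stage; the data layout and function names are illustrative only.

```python
# (carbon source, % w/v, temperature in deg C, shaking rpm, duration in hours)
STAGES = [
    ("glucose", 2.0, 30, 250, 30),
    ("galactose", 2.0, 30, 250, 90),
]


def timeline(stages: list) -> list:
    """Return (carbon source, start hour, end hour) for each stage in order."""
    elapsed = 0
    schedule = []
    for carbon, _pct, _temp_c, _rpm, hours in stages:
        schedule.append((carbon, elapsed, elapsed + hours))
        elapsed += hours
    return schedule


def total_hours(stages: list) -> int:
    """Total culture time across all stages."""
    return sum(stage[4] for stage in stages)


for carbon, start, end in timeline(STAGES):
    print(f"{carbon}: {start} h to {end} h")
print("total:", total_hours(STAGES), "h")
```

The switch to galactose at 30 h coincides with the point at which a Gal1 promoter, such as the one driving DrDHCR24 in the pGal1-DrDHCR24-CYC1t cassette, would be expected to be induced.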

Experiments prove that an engineered Saccharomyces cerevisiae strain capable of synthesizing diosgenin can be obtained by introducing the Dioscorea zingiberensis-derived diosgenin synthesis-related genes provided by the invention, together with a cytochrome P450 reductase coding gene, into Saccharomyces cerevisiae capable of synthesizing cholesterol. The invention thereby realizes the biosynthesis of diosgenin in Saccharomyces cerevisiae.

Drawings

Fig. 1 shows the synthesis of diosgenin starting from cholesterol.

FIG. 2 shows the GC-MS identification of strain DG-Cho fermentation products.

FIG. 3 shows the GC-MS identification of fermentation products of strain DG001.

Detailed Description

The experimental procedures used in the following examples are all conventional procedures unless otherwise specified.

Materials, reagents and the like used in the following examples are commercially available unless otherwise specified.

Plasmid pM2: a SexA1 cleavage site, the pPGK1 promoter, a green fluorescent protein (GFP) gene, the terminator ADHt and an Asc1 cleavage site were inserted in this order into the multiple cloning site of the pEASY-Blunt-Simple vector (Beijing TransGen Biotech Co., Ltd.).

Plasmid pM3: a SexA1 cleavage site, the pTEF1 promoter, a green fluorescent protein (GFP) gene, the terminator CYC1t and an Asc1 cleavage site were inserted in this order into the multiple cloning site of the pEASY-Blunt-Simple vector (Beijing TransGen Biotech Co., Ltd.).

Plasmid pM4: a SexA1 cleavage site, the pTDH3 promoter, a green fluorescent protein (GFP) gene, the terminator TPI1t and an Asc1 cleavage site were inserted in this order into the multiple cloning site of the pEASY-Blunt-Simple vector (Beijing TransGen Biotech Co., Ltd.).

Plasmid pYES2.0: a product of Addgene.

Saccharomyces cerevisiae BY-T3: described in Dai, Z. et al. Identification of a novel cytochrome P450 enzyme that catalyzes the C-2α hydroxylation of pentacyclic triterpenoids and its application in yeast cell factories. Metab. Eng. 51, 70-78 (2019). The public can obtain the strain from the applicant; it may be used only for repeating the experiments of the invention and not for any other purpose.

Example 1 construction of Saccharomyces cerevisiae cholesterol Synthesis Chassis Strain DG-Cho

1. Construction of Yeast PolyGene integration fragments

Sterol C-7 reductase (DrDHCR7) and sterol C-24 reductase (DrDHCR24) from zebrafish (Danio rerio) were selected, and the genes encoding the two proteins were codon-optimized. Optimization and gene synthesis were carried out by Nanjing GenScript Biotech Co., Ltd., and the optimized genes were named DrDHCR7-sy-Sc and DrDHCR24-sy-Sc. The sequence of the DrDHCR7-sy-Sc gene is shown at positions 813-2249 of SEQ ID No.3 and encodes the protein shown in SEQ ID No.1; the sequence of the DrDHCR24-sy-Sc gene is shown at positions 493-2043 of SEQ ID No.4 and encodes the protein shown in SEQ ID No.2. The cloning vector plasmids pM2 and pYES2.0 and the gene fragments DrDHCR7-sy-Sc and DrDHCR24-sy-Sc were digested with SexA1 and Asc1 (Thermo), and the digestion products were recovered with a PCR product gel recovery kit (Sangon Biotech (Shanghai) Co., Ltd.) for later use. For ligation, 50 ng each of the digested vector pM2 and the digested DrDHCR7-sy-Sc gene fragment were added to the ligation system, which contained 2× Quick Ligation Buffer (NEB) and 0.5 μL of Quick ligase (NEB, 400,000 cohesive end units/mL) and was made up to 10 μL with distilled water. The mixture was reacted at room temperature for 10 min to give the ligation product. The ligation product was transformed into Trans1-T1 competent cells: the cells were kept on ice for 30 min, heat-shocked at 42 ℃ for 30 s and immediately placed on ice for 2 min. After addition of 800 μL of LB medium, the cells were incubated at 250 rpm and 37 ℃ for 1 h, and the culture was spread on an LB plate containing ampicillin. After overnight culture, 5 positive single colonies were identified by PCR screening and grown in liquid culture, and plasmids were extracted from the positive clones for sequencing. Sequencing confirmed that the target fragment had been inserted into vector pM2, giving plasmid pM2-DrDHCR7-sy-Sc.
Plasmid pYes-DrDHCR24-sy-Sc was constructed in the same way from the digested pYES2.0 vector and the digested DrDHCR24-sy-Sc gene fragment.
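The coordinates quoted above can be cross-checked against the protein lengths given in the sequence listing. The following Python sketch assumes that each quoted interval is 1-based and inclusive and that its last codon is a stop codon; the function name `encoded_residues` and the table `CDS` are illustrative and not part of the patent.

```python
def encoded_residues(start: int, end: int) -> int:
    """Residues encoded by a 1-based, inclusive interval whose last codon is a stop codon."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"interval {start}-{end} is {length} nt, not a multiple of 3")
    return length // 3 - 1  # subtract the stop codon


# (gene, start, end, protein length stated under <211> in the sequence listing)
CDS = [
    ("DrDHCR7-sy-Sc", 813, 2249, 478),   # SEQ ID No.3 -> SEQ ID No.1
    ("DrDHCR24-sy-Sc", 493, 2043, 516),  # SEQ ID No.4 -> SEQ ID No.2
]

for gene, start, end, listed in CDS:
    residues = encoded_residues(start, end)
    status = "consistent" if residues == listed else "MISMATCH"
    print(f"{gene}: {end - start + 1} nt -> {residues} residues ({status})")
```

Under these assumptions both intervals agree with the lengths listed for SEQ ID No.1 (478 residues) and SEQ ID No.2 (516 residues).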

PCR amplification was performed with the constructed plasmid pM2-DrDHCR7-sy-Sc as template and the primers 1-M-pEASY-PGK1-F and 1-M-ADHt-Gal1-R (see Table 1). The amplification system was: 10 μL of TAKARA PrimeSTAR HS DNA polymerase 5× buffer, 4 μL of dNTP mix, 1 μL of each primer, 0.5 μL of plasmid pM2-DrDHCR7-sy-Sc template and 0.5 μL of PrimeSTAR HS polymerase (2.5 U/μL), made up to a total volume of 50 μL with distilled water. The amplification conditions were: 98 ℃ for 3 min (1 cycle); denaturation at 98 ℃ for 10 s, annealing at 56 ℃ for 15 s and extension at 72 ℃ for 2 min (30 cycles); extension at 72 ℃ for 10 min (1 cycle). The amplification product was named Ppgk-DrDHCR7-ADH1t (SEQ ID No.3); it contains the Pgk promoter (positions 63-812 of SEQ ID No.3), the zebrafish-derived DrDHCR7 gene (positions 813-2249 of SEQ ID No.3) and the ADH1 terminator (positions 2250-2407 of SEQ ID No.3). In the same way, PCR with the constructed plasmid pYes-DrDHCR24-sy-Sc as template and the primers 2-M-ADHt-Gal1-F and M-CYC1t-pEASY-R (see Table 1) gave the fragment pGal1-DrDHCR24-CYC1t (SEQ ID No.4), which contains the Gal1 promoter (positions 51-492 of SEQ ID No.4), the zebrafish-derived DrDHCR24 gene (positions 493-2043 of SEQ ID No.4) and the CYC1 terminator (positions 2044-2243 of SEQ ID No.4). The amplified target fragments were gel-purified for later use.
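As an arithmetic check of the reaction set-up described above, the sketch below derives the water volume needed to reach 50 μL and the summed hold time of the cycling program. It assumes that the two primers are one forward and one reverse primer at 1 μL each, and it ignores the time spent ramping between temperatures; all variable names are illustrative.

```python
REACTION_UL = 50.0

# Component volumes in microlitres, as stated in the text.
components_ul = {
    "5x PrimeSTAR HS buffer": 10.0,
    "dNTP mix": 4.0,
    "forward primer": 1.0,   # assumption: one forward primer
    "reverse primer": 1.0,   # assumption: one reverse primer
    "plasmid template": 0.5,
    "PrimeSTAR HS polymerase": 0.5,
}
water_ul = REACTION_UL - sum(components_ul.values())

# (step, hold time in seconds, number of cycles)
program = [
    ("98 C initial step", 180, 1),
    ("98 C denaturation", 10, 30),
    ("56 C annealing", 15, 30),
    ("72 C extension", 120, 30),
    ("72 C final extension", 600, 1),
]
hold_seconds = sum(seconds * cycles for _, seconds, cycles in program)

polymerase_units = components_ul["PrimeSTAR HS polymerase"] * 2.5  # 2.5 U per microlitre

print(f"water: {water_ul} uL")
print(f"hold time: {hold_seconds} s ({hold_seconds / 60} min)")
print(f"polymerase: {polymerase_units} U per reaction")
```

Under these assumptions the reaction needs 33 μL of water, the program holds for 5130 s (85.5 min) in total, and each reaction contains 1.25 U of polymerase.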

Table 1 Amplification primers for the DrDHCR7-sy-Sc and DrDHCR24-sy-Sc gene integration fragments

2. Construction of Saccharomyces cerevisiae Strain DG-Cho

The starting strain Saccharomyces cerevisiae BY-T3 was inoculated into liquid screening medium (formulation: 0.8% SD-His (Beijing FunGenome Technology Co., Ltd.), 2% glucose; percentages are g/100 mL) and cultured overnight. 1 mL of culture (OD600 of about 0.6-1.0) was transferred to a 1.5 mL EP tube and centrifuged at 4 ℃ and 10000 g for 1 min; the supernatant was discarded, the pellet was washed with sterile water (4 ℃) and centrifuged under the same conditions, and the supernatant was discarded again. The cells were resuspended in 1 mL of treatment solution (formulation: 10 mM LiAc, 10 mM DTT, 0.6 M sorbitol, 10 mM Tris-HCl (pH 7.5); DTT is added just before use) and kept at 25 ℃ for 20 min. After centrifugation the supernatant was discarded, and the cells were washed twice by resuspension in 1 mL of 1 M sorbitol (sterilized by filtration through a 0.22 μm aqueous membrane) followed by centrifugation, leaving a final volume of about 90 μL. The fragments Ppgk-DrDHCR7-ADH1t and pGal1-DrDHCR24-CYC1t obtained in step 1 and the homologous-arm marker fragments Gal7-URA3-up (SEQ ID No.5; comprising a 400 bp homologous region upstream of the Gal7 site, the URA3 marker gene and a 400 bp region homologous to the Pgk promoter) and Gal7-URA3-down (SEQ ID No.6; comprising a 200 bp region homologous to the CYC1 terminator and a 300 bp homologous region downstream of the Gal7 site) were added, 1.5 μL each, so that the DrDHCR7-sy-Sc and DrDHCR24-sy-Sc genes are integrated at the Gal7 site of Saccharomyces cerevisiae BY-T3. After mixing, the suspension was transferred to an electroporation cuvette and electroporated at 2.7 kV for 5.7 ms; 1 mL of 1 M sorbitol was added, and the cells were spread on solid screening medium (formulation: 0.8% SD dropout medium lacking Ura and His, 2% glucose, 0.01% Trp, 0.01% Leu, 1.5% agar; percentages are g/100 mL). Screening culture conditions: 30 ℃ for at least 36 h.
Correct positive clones were identified by PCR, and the resulting strain was named DG-Cho.
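Co-transformed fragments can only be joined by yeast homologous recombination if neighbouring ends share sequence. As an illustration, the sketch below measures the overlap between the 3' end of Ppgk-DrDHCR7-ADH1t (SEQ ID No.3) and the 5' end of pGal1-DrDHCR24-CYC1t (SEQ ID No.4), using the terminal 120 nt of each as printed in the sequence listing. The helper `suffix_prefix_overlap` is a hypothetical name, and reading this overlap as the recombination junction is an interpretation rather than a statement made in the text.

```python
def suffix_prefix_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


# Positions 2341-2460 of SEQ ID No.3, copied from the sequence listing.
SEQ3_TAIL = (
    "gtcaggttgc tttctcaggt atagcatgag gtcgctctta ttgaccacac ctctaccggc"
    "atgccgacgg attagaagcc gccgagcggg tgacagccct ccgaaggaag actctcctcc"
).replace(" ", "")

# Positions 1-120 of SEQ ID No.4, copied from the sequence listing.
SEQ4_HEAD = (
    "ggtatagcat gaggtcgctc ttattgacca cacctctacc ggcatgccga cggattagaa"
    "gccgccgagc gggtgacagc cctccgaagg aagactctcc tccgtgcgtc ctcgtcttca"
).replace(" ", "")

overlap = suffix_prefix_overlap(SEQ3_TAIL, SEQ4_HEAD)
print(f"shared junction: {overlap} nt")
```

For these two ends the function reports a 103 nt overlap, that is, positions 2358-2460 of SEQ ID No.3 match positions 1-103 of SEQ ID No.4.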

Example 2 Production of cholesterol by fermentation of Saccharomyces cerevisiae strain DG-Cho

Saccharomyces cerevisiae strain DG-Cho and the control strain BY-T3 were activated on solid selection medium (as in step 2 of Example 1), and seed cultures were prepared in the corresponding liquid selection medium (as in step 2 of Example 1; 30 ℃, 250 rpm, 12 h). Each seed culture was inoculated at 1% into a 100 mL Erlenmeyer flask containing 15 mL of the corresponding liquid selection medium and cultured at 30 ℃ and 250 rpm for 30 h. The cells were then collected by centrifugation at 5000 rpm, resuspended in 15 mL of the corresponding liquid selection medium with 2% (g/100 mL) galactose as the carbon source, and cultured for a further 90 h at 30 ℃ and 250 rpm to obtain the fermentation broth. Cell growth (OD600) and product formation were then measured.

6 mL of fermentation broth was transferred to a bead-beating tube and centrifuged at 13000 rpm for 1 min, and the supernatant was discarded; the pellet was washed with sterile water and centrifuged under the same conditions, and the supernatant was discarded. An appropriate amount of glass beads (0.5 mm diameter) and 1 mL of extraction solvent (methanol:acetone = 1:1) were added; the cells were disrupted by shaking for 5 min and then by sonication for 30 min, and the mixture was centrifuged at 13000 rpm for 2 min. The supernatant was filtered through a 0.22 μm organic filter membrane into a sample vial for GC-MS detection. GC-MS instrument: Agilent Technologies 5975C gas chromatograph-mass spectrometer with a triple-axis inert XL MSD detector, fitted with an HP-5ms column (30 m × 0.25 mm × 0.5 μm). GC-MS conditions: injection port temperature 300 ℃, injection volume 1 μL, splitless injection, solvent delay 5 min. Chromatographic conditions: hold at 240 ℃ for 5 min, heat at 10 ℃/min to 300 ℃, hold at 300 ℃ for 25 min; total run time 36 min. MS conditions: SIM at m/z 69, 139, 282 and 414.
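The stated total run time can be reproduced from the oven program. The sketch below is a minimal calculation that assumes linear ramps and no equilibration time; the `Ramp` class and `run_time_min` function are illustrative names, not instrument software.

```python
from dataclasses import dataclass


@dataclass
class Ramp:
    rate_c_per_min: float  # heating rate
    target_c: float        # temperature reached at the end of the ramp
    hold_min: float        # hold time at the target temperature


def run_time_min(start_c: float, start_hold_min: float, ramps: list[Ramp]) -> float:
    """Total oven program time in minutes: initial hold plus every ramp and hold."""
    total = start_hold_min
    current = start_c
    for ramp in ramps:
        total += abs(ramp.target_c - current) / ramp.rate_c_per_min + ramp.hold_min
        current = ramp.target_c
    return total


# 240 C for 5 min, then 10 C/min to 300 C, then 300 C for 25 min.
total_min = run_time_min(240.0, 5.0, [Ramp(10.0, 300.0, 25.0)])
print(f"total run time: {total_min} min")
```

The 6 min ramp plus the two holds gives 36 min, in agreement with the total quoted in the method.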

Results: GC-MS analysis of the fermentation product of strain DG-Cho, obtained by integrating the zebrafish-derived DrDHCR7 and DrDHCR24 into Saccharomyces cerevisiae BY-T3, qualitatively detected cholesterol (FIG. 2). A Saccharomyces cerevisiae chassis strain, DG-Cho, capable of synthesizing cholesterol in vivo was therefore successfully constructed.

Example 3 Construction of an engineered strain for de novo synthesis of diosgenin

1. Construction of yeast multi-gene integration fragments

The cholesterol C-16 and C-22 dioxygenase (DGCYP033) and the cholesterol C-26 oxidase (DGCYP029) from Dioscorea zingiberensis (D. zingiberensis) and the cytochrome P450 reductase (VvCPR) from grape were selected, and the genes encoding these 3 proteins were codon-optimized. Optimization and gene synthesis were entrusted to Nanjing GenScript Biotech Co., Ltd., and the optimized genes were named DGCYP029-sy-Sc, DGCYP033-sy-Sc and VvCPR-sy-Sc. The sequence of the DGCYP029-sy-Sc gene carrying the recognition sequences of the restriction sites SexA1 and Asc1 at its two ends is shown in SEQ ID No.8, and positions 11-1501 of SEQ ID No.8 encode the protein shown in SEQ ID No.7. The sequence of the DGCYP033-sy-Sc gene carrying the SexA1 and Asc1 recognition sequences at its two ends is shown in SEQ ID No.10, and positions 11-1477 of SEQ ID No.10 encode the protein shown in SEQ ID No.9. The sequence of the VvCPR-sy-Sc gene carrying the SexA1 and Asc1 recognition sequences at its two ends is shown in SEQ ID No.12, and positions 11-2125 of SEQ ID No.12 encode the protein shown in SEQ ID No.11. The cloning vector plasmids pM2, pM3 and pM4 and the gene fragments DGCYP029-sy-Sc, DGCYP033-sy-Sc and VvCPR-sy-Sc carrying the SexA1 and Asc1 recognition sequences at both ends were digested with SexA1 and Asc1 (Thermo), and the digestion products were recovered with a PCR product gel recovery kit (Sangon Biotech (Shanghai) Co., Ltd.) for later use.
The digested vector pM2 was ligated with the DGCYP029-sy-Sc gene fragment, the digested vector pM3 with the DGCYP033-sy-Sc gene fragment, and the digested vector pM4 with the VvCPR-sy-Sc gene fragment (by the same method as in step 1 of Example 1). This gave the plasmids pM2-DGCYP029-sy-Sc (the recombinant plasmid obtained by replacing the small fragment between the SexA1 and Asc1 restriction sites of plasmid pM2 with the DNA fragment shown at positions 11-1501 of SEQ ID No.8), pM3-DGCYP033-sy-Sc (the recombinant plasmid obtained by replacing the small fragment between the SexA1 and Asc1 restriction sites of plasmid pM3 with the DNA fragment shown at positions 11-1477 of SEQ ID No.10) and pM4-VvCPR-sy-Sc (the recombinant plasmid obtained by replacing the small fragment between the SexA1 and Asc1 restriction sites of plasmid pM4 with the DNA fragment shown at positions 11-2125 of SEQ ID No.12).
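The quoted insert coordinates can be related to the flanking restriction sites. The sketch below locates the two recognition sequences in the first 20 nt and last 19 nt of SEQ ID No.8, as printed in the sequence listing, and derives the interval that lies between them. It assumes the standard recognition sequences ACCWGGT (SexAI) and GGCGCGCC (AscI) and that the interval ends with a stop codon; the variable names are illustrative.

```python
import re

SEXA1_SITE = re.compile("acc[at]ggt")  # ACCWGGT, W = A or T
ASC1_SITE = "ggcgcgcc"                 # GGCGCGCC

TOTAL_LEN = 1509                 # length of SEQ ID No.8 as given under <211>
HEAD = "gcgaccaggtatgggtttga"    # positions 1-20 of SEQ ID No.8
TAIL = "aggaattttgaggcgcgcc"     # positions 1491-1509 of SEQ ID No.8

# Match.end() is 0-based exclusive, which equals the 1-based position of the site's last base.
sexa1_last_base = SEXA1_SITE.search(HEAD).end()
cds_start = sexa1_last_base + 1

# Number of bases before TAIL plus the 0-based index of the AscI site
# gives the 1-based position of the base just upstream of the site.
cds_end = (TOTAL_LEN - len(TAIL)) + TAIL.index(ASC1_SITE)

residues = (cds_end - cds_start + 1) // 3 - 1  # subtract the stop codon

print(f"coding interval: {cds_start}-{cds_end}, {residues} residues")
print("start codon:", HEAD[cds_start - 1:cds_start + 2])
```

Under these assumptions the interval between the two sites is 11-1501 and corresponds to 496 residues, matching the coordinates given for DGCYP029-sy-Sc and the length listed for SEQ ID No.7.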

PCR amplification was performed with plasmid pM2-DGCYP029-sy-Sc as template and the primers 1-M-pEASY-PGK1-F and 3G-1-M-ADHt-TDH3-R, with plasmid pM4-VvCPR-sy-Sc as template and the primers 3G-3-M-ADHt-TDH3-F and 3G-3-M-TPI1t-TEF1-R, and with plasmid pM3-DGCYP033-sy-Sc as template and the primers 3G-2-M-TPI1t-TEF1-F and M-CYC1t-pEASY-R (primers are listed in Table 2), giving the fragments pPgk-DGCYP029-ADH1t, pTDH3-VvCPR-TPI1t and pTEF-DGCYP033-CYC1t, respectively. The fragments were gel-purified for later use.

Table 2 Amplification primers for the VcCYP94N-sy-Sc, DGCYP033-sy-Sc and VvCPR-sy-Sc gene integration fragments

2. Construction of Saccharomyces cerevisiae strain DG001

The starting strain Saccharomyces cerevisiae DG-Cho was inoculated into liquid screening medium (formulation: 0.8% SD-His-Ura (Beijing FunGenome Technology Co., Ltd.), 2% glucose; percentages are g/100 mL) and cultured overnight. 1 mL of culture (OD600 of about 0.6-1.0) was transferred to a 1.5 mL EP tube and centrifuged at 4 ℃ and 10000 g for 1 min; the supernatant was discarded, the pellet was washed with sterile water (4 ℃) and centrifuged under the same conditions, and the supernatant was discarded again. The cells were resuspended in 1 mL of treatment solution (10 mM LiAc, 10 mM DTT, 0.6 M sorbitol, 10 mM Tris-HCl (pH 7.5); DTT is added just before use) and kept at 25 ℃ for 20 min. After centrifugation the supernatant was discarded, and the cells were washed twice by resuspension in 1 mL of 1 M sorbitol (sterilized by filtration through a 0.22 μm aqueous membrane) followed by centrifugation, leaving a final volume of about 90 μL. The fragments pPgk-DGCYP029-ADH1t, pTDH3-VvCPR-TPI1t and pTEF-DGCYP033-CYC1t obtained in step 1 and the homologous-arm marker fragments Gal80-Leu-up (SEQ ID No.13; comprising a 400 bp homologous region upstream of the Gal80 site, the Leu2 marker gene and a 400 bp region homologous to the Pgk promoter) and Gal80-Leu-down (SEQ ID No.14; comprising a 200 bp region homologous to the CYC1 terminator and a homologous region downstream of the Gal80 site) were added, so that the VcCYP94N-sy-Sc, VvCPR-sy-Sc and DGCYP033-sy-Sc gene fragments are integrated at the Gal80 site of Saccharomyces cerevisiae DG-Cho. After mixing, the suspension was transferred to an electroporation cuvette and electroporated at 2.7 kV for 5.7 ms; 1 mL of 1 M sorbitol was added, and the cells were spread on solid screening medium (formulation: 0.8% SD dropout medium lacking Ura, His and Leu, 2% glucose, 0.01% Trp, 1.5% agar; percentages are g/100 mL).
Screening culture conditions: 30 ℃ for at least 36 h. Correct positive clones were identified by PCR, and the resulting strain was named DG001.
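The intended order of the five co-transformed fragments follows from which ends are homologous to each other. The sketch below chains the fragments by matching junction labels. The labels are assumptions: `Pgk` and `CYC1t` come from the homologous regions described for the two arm fragments, while `ADHt-TDH3` and `TPI1t-TEF1` are inferred from the primer names; the `Part` type and `order_parts` function are hypothetical helpers.

```python
from typing import NamedTuple, Optional


class Part(NamedTuple):
    name: str
    left: Optional[str]   # junction label at the 5' end (None for the first part)
    right: Optional[str]  # junction label at the 3' end (None for the last part)


def order_parts(parts: list[Part]) -> list[Part]:
    """Chain parts so that each right junction matches the next part's left junction."""
    starts = [p for p in parts if p.left is None]
    if len(starts) != 1:
        raise ValueError("expected exactly one part without a left junction")
    by_left = {p.left: p for p in parts if p.left is not None}
    chain = [starts[0]]
    while chain[-1].right is not None:
        nxt = by_left.get(chain[-1].right)
        if nxt is None or len(chain) >= len(parts):
            raise ValueError(f"no unique partner for junction {chain[-1].right!r}")
        chain.append(nxt)
    if len(chain) != len(parts):
        raise ValueError("some parts are not connected to the chain")
    return chain


# Deliberately listed out of order; junction labels are assumptions (see above).
PARTS = [
    Part("pTEF-DGCYP033-CYC1t", "TPI1t-TEF1", "CYC1t"),
    Part("Gal80-Leu-down", "CYC1t", None),
    Part("pPgk-DGCYP029-ADH1t", "Pgk", "ADHt-TDH3"),
    Part("Gal80-Leu-up", None, "Pgk"),
    Part("pTDH3-VvCPR-TPI1t", "ADHt-TDH3", "TPI1t-TEF1"),
]

ORDER = [p.name for p in order_parts(PARTS)]
print(" -> ".join(ORDER))
```

With these labels the chain runs from Gal80-Leu-up through the DGCYP029, VvCPR and DGCYP033 cassettes to Gal80-Leu-down, which is the order in which the fragments are named in the text.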

Example 4 Production of diosgenin by fermentation of Saccharomyces cerevisiae strain DG001

Saccharomyces cerevisiae strain DG001 and the control strain DG-Cho were activated on solid selection medium (as in step 2 of Example 1), and seed cultures were prepared in the corresponding liquid selection medium (as in step 2 of Example 1; 30 ℃, 250 rpm, 12 h). Each seed culture was inoculated at 1% into a 100 mL Erlenmeyer flask containing 15 mL of the corresponding liquid selection medium and cultured at 30 ℃ and 250 rpm for 30 h. The cells were then collected by centrifugation at 5000 rpm, resuspended in a 100 mL Erlenmeyer flask in 15 mL of the corresponding liquid selection medium with 2% (g/100 mL) galactose as the carbon source, and cultured with shaking for a further 90 h at 30 ℃ and 250 rpm to obtain the fermentation broth. Cell growth (OD600) and product formation were then measured (product extraction and detection as in Example 2).

Results: GC-MS analysis of the fermentation product of strain DG001, obtained by integrating the DGCYP029-sy-Sc, VvCPR-sy-Sc and DGCYP033-sy-Sc gene fragments into Saccharomyces cerevisiae DG-Cho, qualitatively detected diosgenin (FIG. 3). A Saccharomyces cerevisiae strain, DG001, capable of de novo synthesis of diosgenin in vivo was therefore successfully constructed, with a diosgenin yield of 3.2 mg/L.
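To put the reported yield in molar terms, the sketch below converts 3.2 mg/L into μmol/L and into the mass of product in one 15 mL shake-flask culture. It assumes the molecular formula of diosgenin, C27H42O3, and standard atomic weights; neither value is given in the text.

```python
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999}  # g/mol, standard atomic weights
DIOSGENIN = {"C": 27, "H": 42, "O": 3}                  # assumed formula C27H42O3

molar_mass = sum(ATOMIC_WEIGHT[el] * n for el, n in DIOSGENIN.items())

titer_mg_per_l = 3.2     # yield reported for strain DG001
culture_ml = 15.0        # culture volume per flask in Example 4

titer_umol_per_l = titer_mg_per_l / molar_mass * 1000.0
per_flask_ug = titer_mg_per_l * culture_ml  # mg/L times mL gives micrograms

print(f"molar mass: {molar_mass:.2f} g/mol")
print(f"yield: {titer_umol_per_l:.2f} umol/L, {per_flask_ug:.0f} ug per flask")
```

Under these assumptions 3.2 mg/L corresponds to roughly 7.7 μmol/L, or about 48 μg of diosgenin in a 15 mL culture.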

<110> Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences

<120> Dioscorea zingiberensis-derived diosgenin synthesis-related protein, coding gene and application

<130> GNCLN192228

<160> 20

<170> PatentIn version 3.5

<210> 1

<211> 478

<212> PRT

<213> Artificial sequence

<400> 1

Met Met Ala Ser Asp Arg Val Arg Lys Arg His Lys Gly Ser Ala Asn

1 5 10 15

Gly Ala Gln Thr Val Glu Lys Glu Pro Ser Lys Glu Pro Ala Gln Trp

20 25 30

Gly Arg Ala Trp Glu Val Asp Trp Phe Ser Leu Ser Gly Val Ile Leu

35 40 45

Leu Leu Cys Phe Ala Pro Phe Leu Val Ser Phe Phe Ile Met Ala Cys

50 55 60

Asp Gln Tyr Gln Cys Ser Ile Ser His Pro Leu Leu Asp Leu Tyr Asn

65 70 75 80

Gly Asp Ala Thr Leu Phe Thr Ile Trp Asn Arg Ala Pro Ser Phe Thr

85 90 95

Trp Ala Ala Ala Lys Ile Tyr Ala Ile Trp Val Thr Phe Gln Val Val

100 105 110

Leu Tyr Met Cys Val Pro Asp Phe Leu His Lys Ile Leu Pro Gly Tyr

115 120 125

Val Gly Gly Val Gln Asp Gly Ala Arg Thr Pro Ala Gly Leu Ile Asn

130 135 140

Lys Tyr Glu Val Asn Gly Leu Gln Cys Trp Leu Ile Thr His Val Leu

145 150 155 160

Trp Val Leu Asn Ala Gln His Phe His Trp Phe Ser Pro Thr Ile Ile

165 170 175

Ile Asp Asn Trp Ile Pro Leu Leu Trp Cys Thr Asn Ile Leu Gly Tyr

180 185 190

Ala Val Ser Thr Phe Ala Phe Ile Lys Ala Tyr Leu Phe Pro Thr Asn

195 200 205

Pro Glu Asp Cys Lys Phe Thr Gly Asn Met Phe Tyr Asn Tyr Met Met

210 215 220

Gly Ile Glu Phe Asn Pro Arg Ile Gly Lys Trp Phe Asp Phe Lys Leu

225 230 235 240

Phe Phe Asn Gly Arg Pro Gly Ile Val Ala Trp Thr Leu Ile Asn Leu

245 250 255

Ser Tyr Ala Ala Lys Gln Gln Glu Leu Tyr Gly Tyr Val Thr Asn Ser

260 265 270

Met Ile Leu Val Asn Val Leu Gln Ala Val Tyr Val Val Asp Phe Phe

275 280 285

Trp Asn Glu Ala Trp Tyr Leu Lys Thr Ile Asp Ile Cys His Asp His

290 295 300

Phe Gly Trp Tyr Leu Gly Trp Gly Asp Cys Val Trp Leu Pro Phe Leu

305 310 315 320

Tyr Thr Leu Gln Gly Leu Tyr Leu Val Tyr Asn Pro Ile Gln Leu Ser

325 330 335

Thr Pro His Ala Ala Gly Val Leu Ile Leu Gly Leu Val Gly Tyr Tyr

340 345 350

Ile Phe Arg Val Thr Asn His Gln Lys Asp Leu Phe Arg Arg Thr Glu

355 360 365

Gly Asn Cys Ser Ile Trp Gly Lys Lys Pro Thr Phe Ile Glu Cys Ser

370 375 380

Tyr Gln Ser Ala Asp Gly Ala Ile His Lys Ser Lys Leu Met Thr Ser

385 390 395 400

Gly Phe Trp Gly Val Ala Arg His Met Asn Tyr Thr Gly Asp Leu Met

405 410 415

Gly Ser Leu Ala Tyr Cys Leu Ala Cys Gly Gly Asn His Leu Leu Pro

420 425 430

Tyr Phe Tyr Ile Ile Tyr Met Thr Ile Leu Leu Val His Arg Cys Ile

435 440 445

Arg Asp Glu His Arg Cys Ser Asn Lys Tyr Gly Lys Asp Trp Glu Arg

450 455 460

Tyr Thr Ala Ala Val Ser Tyr Arg Leu Leu Pro Asn Ile Phe

465 470 475

<210> 2

<211> 516

<212> PRT

<213> Artificial sequence

<400> 2

Met Asp Pro Leu Leu Tyr Leu Gly Gly Leu Ala Val Leu Phe Leu Ile

1 5 10 15

Trp Ile Lys Val Lys Gly Leu Glu Tyr Val Ile Ile His Gln Arg Trp

20 25 30

Ile Phe Val Cys Leu Phe Leu Leu Pro Leu Ser Val Val Phe Asp Val

35 40 45

Tyr Tyr His Leu Arg Ala Trp Ile Ile Phe Lys Met Cys Ser Ala Pro

50 55 60

Lys Gln His Asp Gln Arg Val Arg Asp Ile Gln Arg Gln Val Arg Glu

65 70 75 80

Trp Arg Lys Asp Gly Gly Lys Lys Tyr Met Cys Thr Gly Arg Pro Gly

85 90 95

Trp Leu Thr Val Ser Leu Arg Val Gly Lys Tyr Lys Lys Thr His Lys

100 105 110

Asn Ile Met Ile Asn Met Met Asp Ile Leu Glu Val Asp Thr Lys Arg

115 120 125

Lys Val Val Arg Val Glu Pro Leu Ala Asn Met Gly Gln Val Thr Ala

130 135 140

Leu Leu Asn Ser Ile Gly Trp Thr Leu Pro Val Leu Pro Glu Leu Asp

145 150 155 160

Asp Leu Thr Val Gly Gly Leu Val Met Gly Thr Gly Ile Glu Ser Ser

165 170 175

Ser His Ile Tyr Gly Leu Phe Gln His Ile Cys Val Ala Phe Glu Leu

180 185 190

Val Leu Ala Asp Gly Ser Leu Val Arg Cys Thr Glu Lys Glu Asn Ser

195 200 205

Asp Leu Phe Tyr Ala Val Pro Trp Ser Cys Gly Thr Leu Gly Phe Leu

210 215 220

Val Ala Ala Glu Ile Arg Ile Ile Pro Ala Gln Lys Trp Val Lys Leu

225 230 235 240

His Tyr Glu Pro Val Arg Gly Leu Asp Ala Ile Cys Lys Lys Phe Ala

245 250 255

Glu Glu Ser Ala Asn Lys Glu Asn Gln Phe Val Glu Gly Leu Gln Tyr

260 265 270

Ser Arg Asp Glu Ala Val Ile Met Thr Gly Val Met Thr Asp His Ala

275 280 285

Glu Pro Asp Lys Thr Asn Cys Ile Gly Tyr Tyr Tyr Lys Pro Trp Phe

290 295 300

Phe Arg His Val Glu Ser Phe Leu Lys Gln Asn Arg Val Ala Val Glu

305 310 315 320

Tyr Ile Pro Leu Arg His Tyr Tyr His Arg His Thr Arg Ser Ile Phe

325 330 335

Trp Glu Leu Gln Asp Ile Ile Pro Phe Gly Asn Asn Pro Leu Phe Arg

340 345 350

Tyr Val Phe Gly Trp Met Val Pro Pro Lys Ile Ser Leu Leu Lys Leu

355 360 365

Thr Gln Gly Glu Thr Ile Arg Lys Leu Tyr Glu Gln His His Val Val

370 375 380

Gln Asp Met Leu Val Pro Met Lys Asp Ile Lys Ala Ala Ile Gln Arg

385 390 395 400

Phe His Glu Asp Ile His Val Tyr Pro Leu Trp Leu Cys Pro Phe Leu

405 410 415

Leu Pro Asn Gln Pro Gly Met Val His Pro Lys Gly Asp Glu Asp Glu

420 425 430

Leu Tyr Val Asp Ile Gly Ala Tyr Gly Glu Pro Lys Val Lys His Phe

435 440 445

Glu Ala Thr Ser Ser Thr Arg Gln Leu Glu Lys Phe Val Arg Asp Val

450 455 460

His Gly Phe Gln Met Leu Tyr Ala Asp Val Tyr Met Glu Arg Lys Glu

465 470 475 480

Phe Trp Glu Met Phe Asp Gly Thr Leu Tyr His Lys Leu Arg Glu Glu

485 490 495

Leu Gly Cys Lys Asp Ala Phe Pro Glu Val Phe Asp Lys Ile Cys Lys

500 505 510

Ser Ala Arg His

515

<210> 3

<211> 2460

<212> DNA

<213> Artificial sequence

<400> 3

ctgtttcctg tgtgaaattg ttatccgctc acaattccac acaacatacg agccttaatt 60

aaacgcacag atattataac atctgcacaa taggcatttg caagaattac tcgtgagtaa 120

ggaaagagtg aggaactatc gcatacctgc atttaaagat gccgatttgg gcgcgaatcc 180

tttattttgg cttcaccctc atactattat cagggccaga aaaaggaagt gtttccctcc 240

ttcttgaatt gatgttaccc tcataaagca cgtggcctct tatcgagaaa gaaattaccg 300

tcgctcgtga tttgtttgca aaaagaacaa aactgaaaaa acccagacac gctcgacttc 360

ctgtcttcct attgattgca gcttccaatt tcgtcacaca acaaggtcct agcgacggct 420

cacaggtttt gtaacaagca atcgaaggtt ctggaatggc gggaaagggt ttagtaccac 480

atgctatgat gcccactgtg atctccagag caaagttcgt tcgatcgtac tgttactctc 540

tctctttcaa acagaattgt ccgaatcgtg tgacaacaac agcctgttct cacacactct 600

tttcttctaa ccaagggggt ggtttagttt agtagaacct cgtgaaactt acatttacat 660

atatataaac ttgcataaat tggtcaatgc aagaaataca tatttggtct tttctaattc 720

gtagtttttc aagttcttag atgctttctt tttctctttt ttacagatca tcaaggaagt 780

aattatctac tttttacaac aaatataaaa caatgatggc atctgataga gttagaaaaa 840

gacataaagg ttcagcaaat ggtgctcaaa ctgttgaaaa agaaccatct aaagaaccag 900

cacaatgggg tagagcttgg gaagttgatt ggttctcttt gtcaggtgtt attttgttgt 960

tgtgtttcgc accatttttg gtttctttct ttatcatggc ttgtgatcaa taccaatgtt 1020

ctatctcaca tccattgttg gatttgtata atggtgacgc aactttgttt actatttgga 1080

atagagctcc atcttttact tgggctgcag ctaagatcta tgctatctgg gttacattcc 1140

aagttgtttt gtacatgtgt gttccagatt tcttgcataa aattttgcca ggttatgttg 1200

gtggtgttca agatggtgca agaacaccag ctggtttgat taataagtac gaagttaacg 1260

gtttgcaatg ttggttgatc actcatgttt tgtgggtttt gaacgcacaa catttccatt 1320

ggttctctcc aacaatcatc atcgataact ggatcccatt gttgtggtgt actaacatct 1380

tgggttacgc tgtttcaaca ttcgctttta ttaaggctta cttatttcca actaatccag 1440

aagattgtaa gtttactggt aacatgttct acaactacat gatgggtatc gaattcaatc 1500

caagaatcgg taaatggttc gatttcaagt tgtttttcaa tggtagacca ggaattgttg 1560

cttggacttt gattaatttg tcttacgcag ctaagcaaca agaattgtac ggttacgtta 1620

caaactcaat gatcttggtt aacgttttgc aagcagttta cgttgttgat ttcttttgga 1680

acgaagcttg gtacttgaag actatcgata tctgtcatga tcatttcggt tggtatttgg 1740

gttggggtga ctgtgtttgg ttgccatttt tatacacttt gcaaggtttg tatttggttt 1800

acaacccaat ccaattgtct acaccacatg cagctggtgt tttgatcttg ggtttggttg 1860

gttactacat ttttagagtt actaaccatc aaaaggattt gtttagaaga acagagggta 1920

actgttcaat ctggggtaaa aagccaactt ttattgaatg ttcttaccaa tcagcagatg 1980

gtgctatcca taagtctaag ttgatgactt caggtttctg gggtgttgct agacatatga 2040

attatactgg tgacttgatg ggttctttgg catactgttt agcttgtggt ggtaatcatt 2100

tgttgccata cttctacatc atctatatga ctatcttatt ggttcataga tgtatcagag 2160

atgaacatag atgttctaat aagtacggta aagattggga aagatataca gcagctgttt 2220

catacagatt attgccaaat attttctaaa gttataaaaa aaataagtgt atacaaattt 2280

taaagtgact cttaggtttt aaaacgaaaa ttcttattct tgagtaactc tttcctgtag 2340

gtcaggttgc tttctcaggt atagcatgag gtcgctctta ttgaccacac ctctaccggc 2400

atgccgacgg attagaagcc gccgagcggg tgacagccct ccgaaggaag actctcctcc 2460

<210> 4

<211> 2297

<212> DNA

<213> Artificial sequence

<400> 4

ggtatagcat gaggtcgctc ttattgacca cacctctacc ggcatgccga cggattagaa 60

gccgccgagc gggtgacagc cctccgaagg aagactctcc tccgtgcgtc ctcgtcttca 120

ccggtcgcgt tcctgaaacg cagatgtgcc tcgcgccgca ctgctccgaa caataaagat 180

tctacaatac tagcttttat ggttatgaag aggaaaaatt ggcagtaacc tggccccaca 240

aaccttcaaa tgaacgaatc aaattaacaa ccataggatg ataatgcgat tagtttttta 300

gccttatttc tggggtaatt aatcagcgaa gcgatgattt ttgatctatt aacagatata 360

taaatgcaaa aactgcataa ccactttaac taatactttc aacattttcg gtttgtatta 420

cttcttattc aaatgtaata aaagtatcaa caaaaaattg ttaatatacc tctatacttt 480

aacgtcaagg agatggatcc attgttatac ttgggtggtt tagctgtttt gtttttaatc 540

tggatcaagg ttaaaggttt agaatatgtt attattcatc aaagatggat ttttgtttgt 600

ttatttttgt tgccattgtc agttgttttc gatgtttact accatttgag agcttggatc 660

atttttaaga tgtgttctgc accaaagcaa catgatcaaa gagttagaga tatccaaaga 720

caagttagag aatggagaaa agatggtggt aaaaagtaca tgtgtactgg tagaccagga 780

tggttgacag tttcattaag agttggtaaa tacaagaaaa ctcataagaa catcatgatt 840

aatatgatgg atattttaga agttgataca aagagaaagg ttgttagagt tgaaccattg 900

gctaatatgg gtcaagttac tgcattgttg aactctatcg gttggacatt gccagtttta 960

ccagaattgg atgatttgac tgttggtggt ttagttatgg gtacaggtat cgaatcttca 1020

tctcatatct atggtttgtt ccaacatatc tgtgttgctt tcgaattggt tttagcagat 1080

ggttctttag ttagatgtac tgaaaaggaa aactcagatt tgttttacgc tgttccatgg 1140

tcttgtggta cattgggttt cttggttgct gcagaaatca gaatcatccc agctcaaaag 1200

tgggttaagt tgcattacga accagttaga ggtttggatg caatctgtaa gaaattcgct 1260

gaagaatcag caaataagga aaaccaattc gttgaaggtt tacaatactc tagagatgaa 1320

gctgttatta tgactggtgt tatgacagat catgcagaac cagataagac taactgtatc 1380

ggttactact acaagccatg gtttttcaga catgttgaat catttttaaa gcaaaacaga 1440

gttgcagttg aatacatccc attgagacat tactaccata gacatacaag atctattttc 1500

tgggaattgc aagatatcat cccattcggt aacaacccat tgtttagata cgtttttggt 1560

tggatggttc caccaaagat ctcattgttg aagttgactc aaggtgaaac aatcagaaag 1620

ttgtacgaac aacatcatgt tgttcaagat atgttggttc caatgaagga tatcaaggct 1680

gcaatccaaa gattccatga agatatccat gtttacccat tgtggttgtg tccatttttg 1740

ttaccaaatc aaccaggaat ggttcatcca aaaggtgacg aagatgaatt gtacgttgat 1800

attggtgctt acggtgaacc aaaggttaag catttcgaag caacttcatc tacaagacaa 1860

ttggaaaagt ttgttagaga tgttcatggt ttccaaatgt tgtacgctga tgtttacatg 1920

gaaagaaagg aattctggga aatgttcgat ggtactttgt accataagtt gagagaagaa 1980

ttgggttgta aggatgcttt tccagaagtt tttgataaaa tttgtaaatc tgcaagacat 2040

taaagtctag gtccctattt atttttttat agttatgtta gtattaagaa cgttatttat 2100

atttcaaatt tttctttttt ttctgtacag acgcgtgtac gcatgtaaca ttatactgaa 2160

aaccttgctt gagaaggttt tgggacgctc gaaggcttta atttgcaagc tgcggccctg 2220

cattaatgaa tcggccaacg cgccagggtt ttcccagtca cgacgttgta aaacgacggc 2280

cagtgaattg taatacg 2297

<210> 5

<211> 1604

<212> DNA

<213> Artificial sequence

<400> 5

ggaaaagttg taaatattat tggtagtatt cgtttggtaa agtagagggg gtaatttttc 60

ccctttattt tgttcataca ttcttaaatt gctttgcctc tccttttgga aagctatact 120

tcggagcact gttgagcgaa ggctcattag atatattttc tgtcattttc cttaacccaa 180

aaataaggga aagggtccaa aaagcgctcg gacaactgtt gaccgtgatc cgaaggactg 240

gctatacagt gttcacaaaa tagccaagct gaaaataatg tgtagctatg ttcagttagt 300

ttggctagca aagatataaa agcaggtcgg aaatatttat gggcattatt atgcagagca 360

tcaacatgat aaaaaaaaac agttgaatat tccctcaaaa atgtcgaaag ctacatataa 420

ggaacgtgct gctactcatc ctagtcctgt tgctgccaag ctatttaata tcatgcacga 480

aaagcaaaca aacttgtgtg cttcattgga tgttcgtacc accaaggaat tactggagtt 540

agttgaagca ttaggtccca aaatttgttt actaaaaaca catgtggata tcttgactga 600

tttttccatg gagggcacag ttaagccgct aaaggcatta tccgccaagt acaatttttt 660

actcttcgaa gacagaaaat ttgctgacat tggtaataca gtcaaattgc agtactctgc 720

gggtgtatac agaatagcag aatgggcaga cattacgaat gcacacggtg tggtgggccc 780

aggtattgtt agcggtttga agcaggcggc agaagaagta acaaaggaac ctagaggcct 840

tttgatgtta gcagaattgt catgcaaggg ctccctatct actggagaat atactaaggg 900

tactgttgac attgcgaaga gcgacaaaga ttttgttatc ggctttattg ctcaaagaga 960

catgggtgga agagatgaag gttacgattg gttgattatg acacccggtg tgggtttaga 1020

tgacaaggga gacgcattgg gtcaacagta tagaaccgtg gatgatgtgg tctctacagg 1080

atctgacatt attattgttg gaagaggact atttgcaaag ggaagggatg ctaaggtaga 1140

gggtgaacgt tacagaaaag caggctggga agcatatttg agaagatgcg gccagcaaaa 1200

ctaaacgcac agatattata acatctgcac aataggcatt tgcaagaatt actcgtgagt 1260

aaggaaagag tgaggaacta tcgcatacct gcatttaaag atgccgattt gggcgcgaat 1320

cctttatttt ggcttcaccc tcatactatt atcagggcca gaaaaaggaa gtgtttccct 1380

ccttcttgaa ttgatgttac cctcataaag cacgtggcct cttatcgaga aagaaattac 1440

cgtcgctcgt gatttgtttg caaaaagaac aaaactgaaa aaacccagac acgctcgact 1500

tcctgtcttc ctattgattg cagcttccaa tttcgtcaca caacaaggtc ctagcgacgg 1560

ctcacaggtt ttgtaacaag caatcgaagg ttctggaatg gcgg 1604

<210> 6

<211> 499

<212> DNA

<213> Artificial sequence

<400> 6

agtctaggtc cctatttatt tttttatagt tatgttagta ttaagaacgt tatttatatt 60

tcaaattttt cttttttttc tgtacagacg cgtgtacgca tgtaacatta tactgaaaac 120

cttgcttgag aaggttttgg gacgctcgaa ggctttaatt tgcaagctgc ggccctgcat 180

taatgaatcg gccaacgcgc aaagaaagtg gaatattcat tcatatcata ttttttctat 240

taactgcctg gtttctttta aattttttat tggttgtcga cttgaacgga gtgacaatat 300

atatatatat atatttaata atgacatcat tatctgtaaa tctgattctt aatgctattc 360

tagttatgta agagtggtcc tttccataaa aaaaaaaaaa aagaaaaaag aattttagga 420

atacaatgca gcttgtaagt aaaatctgga atattcatat cgccacaact tcttatgctt 480

ataaaagcac taatgcctg 499

<210> 7

<211> 496

<212> PRT

<213> Artificial sequence

<400> 7

Met Gly Leu Met Gly Pro Leu Leu Leu Thr Leu Ala Ala Leu Ala Val

1 5 10 15

Thr Val Phe Leu Leu Arg Arg Arg Arg Gln Pro Ser Ser Lys Thr Ser

20 25 30

Lys Pro Leu Ala Ser Ser Gly Thr Leu Ser Glu Leu Met Lys Asn Gly

35 40 45

His Arg Ile Leu Asp Trp Thr Thr Glu Leu Leu Ser Ser Ser Gln Thr

50 55 60

Gly Thr Val Thr Thr Phe Met Gly Val Val Thr Ala Asn Pro Ser Asn

65 70 75 80

Val Glu His Ile Leu Lys Ser His Phe Pro Asn Tyr Pro Lys Gly Ser

85 90 95

His Ser Thr Thr Ile Leu Ser Asp Phe Leu Gly Ala Gly Ile Phe Asn

100 105 110

Ser Asp Gly Glu His Trp Arg Leu Gln Arg Lys Thr Ala Ser Leu Glu

115 120 125

Phe Thr Thr Lys Ser Ile Arg Ser Phe Val Ser Ser Asn Val Arg Leu

130 135 140

Glu Thr Ser Ser Arg Leu Leu Pro Val Leu His Ser Phe Ala Arg Ser

145 150 155 160

Gly Gln Ile Val Asp Leu Gln Asp Leu Phe Asp Cys Leu Ala Phe Asp

165 170 175

Asn Val Cys Gln Val Thr Phe Gly Tyr Asp Pro Ala Arg Leu Asp Ser

180 185 190

Ser Gly Asp Pro Asp Ser Val Ala Phe Ser Arg Ala Phe Asp Arg Ala

195 200 205

Thr Ala Leu Ser Val Arg Arg Phe Ser His Pro Phe Pro Phe Thr Trp

210 215 220

Lys Leu Leu Arg Phe Leu Asn Ala Gly Tyr Glu Arg Glu Leu Lys Ala

225 230 235 240

Glu Val Ala Lys Val His Arg Phe Ala Met Gln Val Val Arg Arg Arg

245 250 255

Lys Lys Asp Gly Asp Leu Gly Asp Asp Leu Leu Ser Arg Phe Ile Ala

260 265 270

Glu Ala Asp Tyr Ser Asp Glu Phe Leu Arg Asp Ile Ile Ile Ser Phe

275 280 285

Val Leu Ala Gly Arg Asp Thr Thr Ser Ala Thr Leu Thr Trp Phe Phe

290 295 300

Trp Leu Ile Ala Ser Arg Pro Glu Val Lys Ala Arg Val Leu Asp Glu

305 310 315 320

Ile Arg Ala Ala Arg Glu Gln Glu Arg Glu Arg Thr Gly Thr Ala Thr

325 330 335

Ser Glu Ala Val Leu Thr Leu Asp Gln Val Arg Gly Met Asp Tyr Leu

340 345 350

His Ala Ala Leu Ser Glu Thr Leu Arg Leu Tyr Pro Pro Val Pro Leu

355 360 365

Gln Thr Arg Ala Cys Ala Glu Asp Asp Leu Leu Pro Asp Gly Thr Pro

370 375 380

Val Lys Lys Gly Ser Thr Val Met Tyr Ser Ala Tyr Ala Met Gly Arg

385 390 395 400

Ser Glu Ser Ile Trp Gly Glu Asp Trp Lys Glu Phe Arg Pro Glu Arg

405 410 415

Trp Leu Glu Asn Gly Val Phe Arg Pro Ala Ser Ser Phe Arg Phe Pro

420 425 430

Val Phe His Ala Gly Pro Arg Met Cys Leu Gly Lys Asp Met Ala Tyr

435 440 445

Ile Gln Met Lys Ala Val Ala Ala Ala Val Met Glu Arg Phe Glu Leu

450 455 460

Glu Val Val Asp Glu Glu Lys Pro Arg Glu Pro Glu Phe Thr Met Ile

465 470 475 480

Leu Arg Met Lys Gly Gly Leu Pro Val Arg Ile Arg Glu Lys Glu Phe

485 490 495

<210> 8

<211> 1509

<212> DNA

<213> Artificial sequence

<400> 8

gcgaccaggt atgggtttga tgggtccttt attattgact ttagccgctt tagccgttac 60

agtattttta ttgagaagaa gaagacaacc tagtagtaaa acatcaaaac cattggcttc 120

ttcaggtact ttgtctgaat tgatgaagaa cggtcataga atcttggatt ggactacaga 180

attgttatct tcatctcaaa ctggtacagt tactactttt atgggtgttg ttacagctaa 240

tccatcaaac gttgaacata tcttgaagtc tcatttccca aactacccaa agggttcaca 300

ttctactaca atcttgtcag atttcttggg tgcaggtatt ttcaattctg atggtgaaca 360

ttggagattg caaagaaaga cagcttcttt ggaattcact acaaagtcaa tcagatcttt 420

cgtttcatct aacgttagat tggaaacttc atctagattg ttgccagttt tgcattcatt 480

tgctagatct ggtcaaatcg ttgatttgca agatttgttc gattgtttgg cattcgataa 540

cgtttgtcaa gttactttcg gttacgatcc agctagatta gattcatctg gtgacccaga 600

ttcagttgca ttttctagag cttttgatag agcaacagct ttgtcagtta gaagattttc 660

tcatccattc ccttttactt ggaagttgtt gagatttttg aacgcaggtt acgaaagaga 720

attgaaggca gaagttgcta aggttcatag atttgctatg caagttgtta gaagaagaaa 780

gaaagatggt gacttgggtg acgatttgtt atcaagattc attgcagaag ctgattactc 840

tgatgaattc ttgagagata tcatcatctc atttgtttta gcaggtagag atactacatc 900

tgctactttg acttggtttt tctggttaat tgcatctaga ccagaagtta aggctagagt 960

tttggatgaa atcagagctg caagagaaca agaaagagaa agaactggta cagcaacttc 1020

agaagctgtt ttgacattgg atcaagttag aggcatggat tatttgcatg ctgcattatc 1080

tgaaacattg agattatacc caccagttcc attacaaact agagcatgtg ctgaagatga 1140

tttgttacca gatggtacac cagttaagaa aggttcaact gttatgtatt ctgcatacgc 1200

tatgggtaga tcagaatcta tttggggtga agactggaaa gaattcagac cagaaagatg 1260

gttggaaaat ggtgttttta gaccagcatc atcttttaga tttccagttt ttcatgctgg 1320

tccaagaatg tgtttgggta aagatatggc atacatccaa atgaaggctg ttgctgcagc 1380

tgttatggaa agattcgaat tggaagttgt tgatgaagaa aagccaagag aaccagaatt 1440

tactatgatt ttgagaatga aaggtggttt gccagttaga ataagagaaa aggaattttg 1500

aggcgcgcc 1509

<210> 9

<211> 488

<212> PRT

<213> Artificial sequence

<400> 9

Met Phe Pro Leu Ala Ile Ile Val Leu Leu Phe Pro Thr Leu Leu Leu

1 5 10 15

Leu Phe Ile Gly Val Ala Leu Gly Leu Arg Ser Gly Ala Asn Glu Ser

20 25 30

Trp Lys Lys Arg Gly Leu Asn Ile Pro Pro Gly Ser Met Gly Trp Pro

35 40 45

Leu Leu Gly Glu Thr Ile Ala Phe Arg Lys Leu His Pro Cys Thr Ser

50 55 60

Leu Gly Glu Tyr Met Glu Asp Arg Leu Gln Arg Tyr Gly Lys Ile Tyr

65 70 75 80

Arg Ser Asn Leu Phe Gly Ala Pro Thr Val Val Ser Ala Asp Ala Glu

85 90 95

Leu Asn Arg Phe Val Leu Met Asn Asp Gly Lys Leu Phe Glu Pro Ser

100 105 110

Trp Pro Lys Ser Val Ala Asp Ile Leu Gly Lys Thr Ser Met Leu Val

115 120 125

Leu Thr Gly Glu Met His Arg Tyr Met Lys Ser Leu Ser Val Asn Phe

130 135 140

Met Gly Ile Ala Arg Leu Arg Asn His Phe Leu Gly Asp Ser Glu Arg

145 150 155 160

Tyr Ile Leu Glu Asn Leu Ala Thr Trp Lys Glu Gly Val Pro Phe Pro

165 170 175

Ala Lys Glu Glu Ala Cys Lys Ile Thr Phe Asn Leu Met Val Lys Asn

180 185 190

Ile Leu Ser Met Asn Pro Gly Glu Pro Glu Thr Glu Arg Leu Arg Ile

195 200 205

Leu Tyr Met Ser Phe Met Lys Gly Val Ile Ala Met Pro Leu Asn Phe

210 215 220

Pro Gly Thr Ala Tyr Arg Lys Ala Ile Gln Ser Arg Ala Thr Ile Leu

225 230 235 240

Lys Thr Ile Glu His Leu Met Glu Asp Arg Leu Glu Lys Lys Lys Ala

245 250 255

Gly Thr Asp Asn Ile Gly Glu Ala Asp Leu Leu Gly Phe Ile Leu Glu

260 265 270

Gln Ser Asn Leu Asp Ala Glu Gln Phe Gly Asp Leu Leu Leu Gly Leu

275 280 285

Leu Phe Gly Gly His Glu Thr Ser Ser Thr Ala Ile Thr Leu Ala Ile

290 295 300

Tyr Phe Leu Glu Gly Cys Pro Lys Ala Val Gln Glu Leu Arg Glu Glu

305 310 315 320

His Leu Asn Leu Val Arg Met Lys Lys Gln Arg Gly Glu Ser Lys Ala

325 330 335

Leu Thr Trp Glu Asp Tyr Lys Ser Met Asp Phe Ala Gln Cys Val Val

340 345 350

Ser Glu Thr Leu Arg Leu Gly Asn Ile Ile Lys Phe Val His Arg Lys

355 360 365

Ala Asn Thr Asp Val Gln Phe Lys Gly Tyr Asp Ile Pro Ser Gly Trp

370 375 380

Ser Val Ile Pro Val Phe Ala Ala Ala His Leu Asp Pro Thr Val Tyr

385 390 395 400

Asp Asn Pro Gln Lys Phe Asp Pro Trp Arg Trp Gln Thr Ile Ser Ser

405 410 415

Ser Thr Ala Arg Ile Asp Asn Tyr Met Pro Phe Gly Gln Gly Leu Arg

420 425 430

Asn Cys Ala Gly Leu Glu Leu Ala Lys Met Glu Ile Ala Val Phe Leu

435 440 445

His His Leu Val Leu Asn Phe Asp Trp Glu Leu Ala Glu Pro Asp His

450 455 460

Pro Leu Ala Tyr Ala Phe Pro Glu Phe Glu Lys Gly Leu Pro Ile Lys

465 470 475 480

Val Arg Lys Leu Ser Ile Leu Glu

485

<210> 10

<211> 1485

<212> DNA

<213> Artificial sequence

<400> 10

gcgaccaggt atgtttcctc tagctatcat cgtcttgcta tttcccacac tgctgctcct 60

cttcatagga gtggccctgg gtttgagaag tggagccaat gagagctgga agaagagggg 120

gctcaacatt cccccaggaa gcatgggctg gccgctcctc ggcgagacca tcgccttccg 180

gaagctccat ccctgcacct ctctcggcga gtacatggag gatcgtctcc agaggtatgg 240

aaagatctac cgctcgaact tgttcggcgc gccgacggtg gtttcggcgg atgcagagct 300

gaaccggttc gtgctgatga acgacgggaa gctgttcgag ccgagctggc cgaagagcgt 360

ggcggacata ctgggaaaga cgtcgatgct ggtgctcaca ggggagatgc atcgctacat 420

gaagtccttg tccgtcaact tcatggggat cgctaggctt cggaatcact tccttgggga 480

ctctgagcgc tatatcttgg agaaccttgc gacctggaag gagggcgttc ctttccctgc 540

taaagaagaa gcttgcaaga taaccttcaa tttaatggtg aagaacatac tgagtatgaa 600

tcctggggag ccagagaccg agaggttgcg cattctctac atgtccttca tgaagggagt 660

gatagctatg cctctcaatt tccctggaac tgcatacagg aaagccattc agtctagagc 720

tacaatcctg aaaaccattg agcatttgat ggaggatagg ctggagaaga agaaggcagg 780

cactgataat atcggagaag ctgatcttct aggtttcatt ctagagcagt cgaacttgga 840

tgctgaacaa ttcggagact tgctgttagg tttgcttttt ggtggccatg agacctcctc 900

cactgccatc actctggcta tctacttcct tgaaggatgc cctaaagctg tacaagaact 960

aagggaagag catttgaacc tggtgaggat gaagaagcag agaggagagt ccaaagcact 1020

cacttgggaa gactacaaat ccatggactt tgcacagtgt gtggtgagtg agactctaag 1080

gctgggaaac atcatcaagt ttgtgcacag gaaggctaac actgatgtcc aatttaaagg 1140

atatgacata ccgagtggct ggagtgtgat tccggtgttc gccgcagctc atttagatcc 1200

tactgtctat gacaatcctc agaaatttga tccttggaga tggcagacaa tctcctccag 1260

cactgctagg attgacaatt acatgccatt cggtcagggg ctgcgcaact gtgctggcct 1320

tgagctcgcc aagatggaga tcgccgtgtt ccttcaccac cttgtcctta acttcgactg 1380

ggagcttgct gagccagatc accccctcgc ctacgccttc cctgaattcg aaaagggcct 1440

tcctatcaaa gttcgcaagc tatccatcct agaatgaggc gcgcc 1485
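As a consistency check between SEQ ID No. 10 and SEQ ID No. 9, the following minimal Python sketch translates the first line of SEQ ID No. 10 with the standard genetic code. The helper names `clean` and `translate` are illustrative and not part of the filing. Taking the first ATG as the start codon is an assumption inferred from the listing, not something the listing states.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def clean(listing_line: str) -> str:
    """Drop spaces and the trailing position counter from a listing line."""
    return "".join(ch for ch in listing_line.lower() if ch in BASES)


def translate(dna: str) -> str:
    """Translate complete codons until a stop codon or the end of the input."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First line of SEQ ID No. 10 as printed in the listing.
first_line = "gcgaccaggt atgtttcctc tagctatcat cgtcttgcta tttcccacac tgctgctcct 60"
dna = clean(first_line)
orf_start = dna.find("atg")  # assumed start codon (hypothetical choice)
n_terminus = translate(dna[orf_start:])
print(orf_start, n_terminus)
```

The translated fragment, MFPLAIIVLLFPTLLL, corresponds to the first 16 residues listed for SEQ ID No. 9 (Met Phe Pro Leu Ala Ile Ile Val Leu Leu Phe Pro Thr Leu Leu Leu).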

<210> 11

<211> 704

<212> PRT

<213> Artificial sequence

<400> 11

Met Gln Ser Ser Ser Val Lys Val Ser Pro Phe Asp Leu Met Ser Ala

1 5 10 15

Ile Ile Lys Gly Ser Met Asp Gln Ser Asn Val Ser Ser Glu Ser Gly

20 25 30

Gly Ala Ala Ala Met Val Leu Glu Asn Arg Glu Phe Ile Met Ile Leu

35 40 45

Thr Thr Ser Ile Ala Val Leu Ile Gly Cys Val Val Val Leu Ile Trp

50 55 60

Arg Arg Ser Gly Gln Lys Gln Ser Lys Thr Pro Glu Pro Pro Lys Pro

65 70 75 80

Leu Ile Val Lys Asp Leu Glu Val Glu Val Asp Asp Gly Lys Gln Lys

85 90 95

Val Thr Ile Phe Phe Gly Thr Gln Thr Gly Thr Ala Glu Gly Phe Ala

100 105 110

Lys Ala Leu Ala Glu Glu Ala Lys Ala Arg Tyr Glu Lys Ala Ile Phe

115 120 125

Lys Val Val Asp Leu Asp Asp Tyr Ala Gly Asp Asp Asp Glu Tyr Glu

130 135 140

Glu Lys Leu Lys Lys Glu Thr Leu Ala Phe Phe Phe Leu Ala Thr Tyr

145 150 155 160

Gly Asp Gly Glu Pro Thr Asp Asn Ala Ala Arg Phe Tyr Lys Trp Phe

165 170 175

Ala Glu Gly Lys Glu Arg Gly Glu Trp Leu Gln Asn Leu Lys Tyr Gly

180 185 190

Val Phe Gly Leu Gly Asn Arg Gln Tyr Glu His Phe Asn Lys Val Ala

195 200 205

Lys Val Val Asp Asp Ile Ile Thr Glu Gln Gly Gly Lys Arg Ile Val

210 215 220

Pro Val Gly Leu Gly Asp Asp Asp Gln Cys Ile Glu Asp Asp Phe Ala

225 230 235 240

Ala Trp Arg Glu Leu Leu Trp Pro Glu Leu Asp Gln Leu Leu Arg Asp

245 250 255

Glu Asp Asp Ala Thr Thr Val Ser Thr Pro Tyr Thr Ala Ala Val Leu

260 265 270

Glu Tyr Arg Val Val Phe His Asp Pro Glu Gly Ala Ser Leu Gln Asp

275 280 285

Lys Ser Trp Gly Ser Ala Asn Gly His Thr Val His Asp Ala Gln His

290 295 300

Pro Cys Arg Ala Asn Val Ala Val Arg Lys Glu Leu His Thr Pro Ala

305 310 315 320

Ser Asp Arg Ser Cys Thr His Leu Glu Phe Asp Ile Ser Gly Thr Gly

325 330 335

Leu Thr Tyr Glu Thr Gly Asp His Val Gly Val Tyr Cys Glu Asn Leu

340 345 350

Pro Glu Thr Val Glu Glu Ala Glu Arg Leu Leu Gly Phe Ser Pro Asp

355 360 365

Val Tyr Phe Ser Ile His Thr Glu Arg Glu Asp Gly Thr Pro Leu Ser

370 375 380

Gly Ser Ser Leu Ser Pro Pro Phe Pro Pro Cys Thr Leu Arg Thr Ala

385 390 395 400

Leu Thr Arg Tyr Ala Asp Val Leu Ser Ser Pro Lys Lys Ser Ala Leu

405 410 415

Val Ala Leu Ala Ala His Ala Ser Asp Pro Ser Glu Ala Asp Arg Leu

420 425 430

Lys Tyr Leu Ala Ser Pro Ser Gly Lys Asp Glu Tyr Ala Gln Trp Val

435 440 445

Val Ala Ser Gln Arg Ser Leu Leu Glu Ile Met Ala Glu Phe Pro Ser

450 455 460

Ala Lys Pro Pro Leu Gly Val Phe Phe Ala Ala Val Ala Pro Arg Leu

465 470 475 480

Gln Pro Arg Tyr Tyr Ser Ile Ser Ser Ser Pro Lys Met Val Pro Ser

485 490 495

Arg Ile His Val Thr Cys Ala Leu Val Cys Asp Lys Met Pro Thr Gly

500 505 510

Arg Ile His Lys Gly Ile Cys Ser Thr Trp Met Lys Tyr Ala Val Pro

515 520 525

Leu Glu Glu Ser Gln Asp Cys Ser Trp Ala Pro Ile Phe Val Arg Gln

530 535 540

Ser Asn Phe Lys Leu Pro Ala Asp Thr Ser Val Pro Ile Ile Met Ile

545 550 555 560

Gly Pro Gly Thr Gly Leu Ala Pro Phe Arg Gly Phe Leu Gln Glu Arg

565 570 575

Phe Ala Leu Lys Glu Ala Gly Ala Glu Leu Gly Ser Ser Ile Leu Phe

580 585 590

Phe Gly Cys Arg Asn Arg Lys Met Asp Tyr Ile Tyr Glu Asp Glu Leu

595 600 605

Asn Gly Phe Val Glu Ser Gly Ala Leu Ser Glu Leu Ile Val Ala Phe

610 615 620

Ser Arg Glu Gly Pro Thr Lys Glu Tyr Val Gln His Lys Met Met Glu

625 630 635 640

Lys Ala Ser Asp Ile Trp Asn Val Ile Ser Gln Gly Gly Tyr Ile Tyr

645 650 655

Val Cys Gly Asp Ala Lys Gly Met Ala Arg Asp Val His Arg Thr Leu

660 665 670

His Thr Ile Leu Gln Glu Gln Gly Ser Leu Asp Ser Ser Lys Ala Glu

675 680 685

Ser Met Val Lys Asn Leu Gln Met Thr Gly Arg Tyr Leu Arg Asp Val

690 695 700

<210> 12

<211> 2133

<212> DNA

<213> Artificial sequence

<400> 12

gcgaccaggt atgcaatcat cctccgtaaa ggtatcccca ttcgacttaa tgtcagcaat 60

catcaagggt tctatggacc aatcaaacgt atcatcagaa tcaggtggtg ctgcagccat 120

ggttttggaa aacagagaat tcattatgat cttgactaca tccattgctg ttttgatcgg 180

ttgtgttgtc gtattgatat ggagaagatc aggtcaaaaa caatccaaga ctccagaacc 240

acctaaacct ttgattgtta aggatttgga agtagaagtt gatgacggta aacaaaaggt 300

tacaatattt ttcggtacac aaaccggtac tgctgaaggt ttcgcaaaag ccttggctga 360

agaagcaaag gccagatacg aaaaggcaat ttttaaggtt gtcgatttgg atgactatgc 420

cggtgacgac gatgaatacg aagaaaaatt gaaaaaggaa actttggcct ttttcttttt 480

ggctacatat ggtgacggtg aaccaaccga caatgctgca agattctaca aatggtttgc 540

tgagggtaaa gaacgtggtg aatggttgca aaacttaaag tatggtgttt tcggtttggg 600

taacagacaa tacgaacatt tcaacaaagt tgcaaaggta gttgacgata taatcacaga 660

acaaggtggt aaaagaatcg tcccagtagg tttgggtgac gatgaccaat gtattgaaga 720

tgacttcgcc gcttggagag aattattatg gcctgaatta gatcaattgt taagagacga 780

agatgacgct accactgtat ctacaccata taccgcagcc gttttggaat acagagtcgt 840

atttcatgat cctgaaggtg catcattaca agacaagtca tggggttccg ccaatggtca 900

tactgttcac gatgctcaac acccatgtag agccaacgtt gctgtcagaa aagaattgca 960

tactcctgct agtgatagat cttgcacaca cttggaattc gacatttctg gtactggttt 1020

aacatatgaa accggtgacc atgtaggtgt ttactgtgaa aatttgccag aaacagtcga 1080

agaagcagaa agattgttag gtttctcacc tgatgtatat ttttccatac acaccgaaag 1140

agaagacggt actccattaa gtggttcttc attgtctcca ccttttccac cttgcacttt 1200

gagaacagca ttaaccagat acgccgatgt tttgtccagt cctaaaaagt ctgcattggt 1260

cgccttagct gcacatgcat cagatccatc cgaagccgac agattgaaat atttggctag 1320

tccttctggt aaagatgaat acgctcaatg ggttgtcgca agtcaaagat ctttgttaga 1380

aattatggcc gaatttccat ctgctaagcc acctttgggt gtcttctttg ccgctgtagc 1440

tccaagattg caacctagat actacagtat ctcttcatcc ccaaagatgg ttccttctag 1500

aatacatgtt acctgtgcat tggtctgcga taaaatgcca actggtagaa tccacaaggg 1560

tatttgttca acatggatga aatatgccgt tccattagaa gaatcacaag attgctcctg 1620

ggcacctatc ttcgttagac aatcaaactt caaattgcca gctgatacct ccgtccctat 1680

cattatgatt ggtccaggta caggtttagc tcctttcaga ggtttcttgc aagaaagatt 1740

tgcattgaag gaagctggtg cagaattggg tagttctatc ttgttctttg gttgtagaaa 1800

cagaaagatg gattacatct acgaagacga attgaacggt ttcgtagaaa gtggtgcttt 1860

gtctgaattg atcgttgcat tttcaagaga aggtccaact aaggaatacg ttcaacataa 1920

gatgatggaa aaggctagtg atatctggaa cgtcatctct caaggtggtt atatatacgt 1980

atgcggtgac gctaagggta tggcaagaga cgttcataga actttgcaca caatcttaca 2040

agaacaaggt tctttagatt catccaaggc tgaatcaatg gtaaagaact tacaaatgac 2100

tggtagatac ttgagagatg tctaaggcgc gcc 2133

<210> 13

<211> 1775

<212> DNA

<213> Artificial sequence

<400> 13

gcgcaagttt tccgctttgt aatatatatt tatacccctt tcttctctcc cctgcaatat 60

aatagtttaa ttctaatatt aataatatcc tatattttct tcatttaccg gcgcactctc 120

gcccgaacga cctcaaaatg tctgctacat tcataataac caaaagctca taactttttt 180

ttttgaacct gaatatatat acatcacata tcactgctgg tccttgccga ccagcgtata 240

caatctcgat agttggtttc ccgttctttc cactcccgtc atgtctgccc ctaagaagat 300

cgtcgttttg ccaggtgacc acgttggtca agaaatcaca gccgaagcca ttaaggttct 360

taaagctatt tctgatgttc gttccaatgt caagttcgat ttcgaaaatc atttaattgg 420

tggtgctgct atcgatgcta caggtgttcc acttccagat gaggcgctgg aagcctccaa 480

gaaggctgat gccgttttgt taggtgctgt gggtggtcct aaatggggta ccggtagtgt 540

tagacctgaa caaggtttac taaaaatccg taaagaactt caattgtacg ccaacttaag 600

accatgtaac tttgcatccg actctctttt agacttatct ccaatcaagc cacaatttgc 660

taaaggtact gacttcgttg ttgtcagaga attagtggga ggtatttact ttggtaagag 720

aaaggaagac gatggtgatg gtgtcgcttg ggatagtgaa caatacaccg ttccagaagt 780

gcaaagaatc acaagaatgg ccgctttcat ggccctacaa catgagccac cattgcctat 840

ttggtccttg gataaagcta atgttttggc ctcttcaaga ttatggagaa aaactgtgga 900

ggaaaccatc aagaacgaat tccctacatt gaaggttcaa catcaattga ttgattctgc 960

cgccatgatc ctagttaaga acccaaccca cctaaatggt attataatca ccagcaacat 1020

gtttggtgat atcatctccg atgaagcctc cgttatccca ggttccttgg gtttgttgcc 1080

atctgcgtcc ttggcctctt tgccagacaa gaacaccgca tttggtttgt acgaaccatg 1140

ccacggttct gctccagatt tgccaaagaa taaggtcaac cctatcgcca ctatcttgtc 1200

tgctgcaatg atgttgaaat tgtcattgaa cttgcctgaa gaaggtaagg ccattgaaga 1260

tgcagttaaa aaggttttgg atgcaggtat cagaactggt gatttaggtg gttccaacag 1320

taccaccgaa gtcggtgatg ctgtcgccga agaagttaag aaaatccttg cttaaacgca 1380

cagatattat aacatctgca caataggcat ttgcaagaat tactcgtgag taaggaaaga 1440

gtgaggaact atcgcatacc tgcatttaaa gatgccgatt tgggcgcgaa tcctttattt 1500

tggcttcacc ctcatactat tatcagggcc agaaaaagga agtgtttccc tccttcttga 1560

attgatgtta ccctcataaa gcacgtggcc tcttatcgag aaagaaatta ccgtcgctcg 1620

tgatttgttt gcaaaaagaa caaaactgaa aaaacccaga cacgctcgac ttcctgtctt 1680

cctattgatt gcagcttcca atttcgtcac acaacaaggt cctagcgacg gctcacaggt 1740

tttgtaacaa gcaatcgaag gttctggaat ggcgg 1775

<210> 14

<211> 532

<212> DNA

<213> Artificial sequence

<400> 14

agtctaggtc cctatttatt tttttatagt tatgttagta ttaagaacgt tatttatatt 60

tcaaattttt cttttttttc tgtacagacg cgtgtacgca tgtaacatta tactgaaaac 120

cttgcttgag aaggttttgg gacgctcgaa ggctttaatt tgcaagctgc ggccctgcat 180

taatgaatcg gccaacgcgc aagcatcttg ccctgtgctt ggcccccagt gcagcgaacg 240

ttataaaaac gaatactgag tatatatcta tgtaaaacaa ccatatcatt tcttgttctg 300

aactttgttt acctaactag ttttaaattt ccctttttcg tgcatgcggg tgttcttatt 360

tattagcata ctacatttga aatatcaaat ttccttagta gaaaagtgag agaaggtgca 420

ctgacacaaa aaataaaatg ctacgtataa ctgtcaaaac tttgcagcag cgggcatcct 480

tccatcatag cttcaaacat attagcgttc ctgatcttca tacccgtgct ca 532

<210> 15

<211> 80

<212> DNA

<213> Artificial sequence

<400> 15

ctgtttcctg tgtgaaattg ttatccgctc acaattccac acaacatacg agccttaatt 60

aaacgcacag atattataac 80

<210> 16

<211> 74

<212> DNA

<213> Artificial sequence

<400> 16

cctccgcgtc attaaacttc ttgttgttga cgctaacatc aacgctagta ttcggcatgc 60

cggtagaggt gtgg 74

<210> 17

<211> 76

<212> DNA

<213> Artificial sequence

<400> 17

caggtatagc atgaggtcgc tcttattgac cacacctcta ccggcatgcc gaatactagc 60

gttgaatgtt agcgtc 76

<210> 18

<211> 75

<212> DNA

<213> Artificial sequence

<400> 18

aggagtagaa acattttgaa gctatggtgt gtgggggatc actttaatta atctatataa 60

cagttgaaat ttgga 75

<210> 19

<211> 76

<212> DNA

<213> Artificial sequence

<400> 19

gtcattttcg cgttgagaag atgttcttat ccaaatttca actgttatat agattaatta 60

aagtgatccc ccacac 76

<210> 20

<211> 78

<212> DNA

<213> Artificial sequence

<400> 20

cgtattacaa ttcactggcc gtcgttttac aacgtcgtga ctgggaaaac cctggcgcgt 60

tggccgattc attaatgc 78
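The nucleotide records above all follow the same layout: numeric field tags (`<210>` sequence number, `<211>` length, `<212>` type, `<213>` organism, `<400>` sequence), then blocks of ten bases with a running position counter at the end of each line. A minimal sketch of a reader for that layout is given below, using SEQ ID No. 15 as input. The function name `parse_record` and the returned dictionary keys are hypothetical, and the sketch handles nucleotide records only.

```python
import re


def parse_record(text: str) -> dict:
    """Parse one nucleotide record into its id, declared length and sequence."""
    # Field tags such as <210> 15; only the first token after each tag is kept.
    fields = dict(re.findall(r"<(\d{3})>\s*(\S+)", text))
    body = text.split("<400>", 1)[1]
    # Skip the sequence number repeated after <400>, then keep base letters only,
    # which also discards the position counters at the end of each line.
    body = body.split(None, 1)[1]
    sequence = "".join(re.findall(r"[acgt]+", body))
    return {
        "seq_id": int(fields["210"]),
        "declared": int(fields["211"]),
        "sequence": sequence,
    }


record = """<210> 15
<211> 80
<212> DNA
<213> Artificial sequence
<400> 15
ctgtttcctg tgtgaaattg ttatccgctc acaattccac acaacatacg agccttaatt 60
aaacgcacag atattataac 80
"""

parsed = parse_record(record)
print(parsed["seq_id"], parsed["declared"], len(parsed["sequence"]))
```

Comparing `len(parsed["sequence"])` with `parsed["declared"]` is a quick way to spot bases lost or duplicated during text extraction.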

Claims

1. A protein, which is protein A or protein B;

the protein A is a protein shown in any one of the following (A1) - (A3):

(A1) protein with amino acid sequence shown as SEQ ID No. 9;

(A2) a protein which has more than 99% homology with the amino acid sequence defined in (A1), is derived from Dioscorea zingiberensis, and has the function of a cholesterol 16-position and 22-position dual oxidase;

(A3) a fusion protein obtained by attaching a tag to the N-terminus and/or C-terminus of a protein defined in any one of (A1) to (A2);

the protein B is a protein shown in any one of the following (B1) to (B3):

(B1) protein with amino acid sequence shown as SEQ ID No. 7;

(B2) a protein which has more than 99% homology with the amino acid sequence defined in (B1), is derived from Dioscorea zingiberensis, and has the function of a cholesterol 26-position oxidase;

(B3) a fusion protein obtained by attaching a tag to the N-terminus and/or C-terminus of the protein defined in (B1).
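To give a sense of how narrow the "more than 99%" threshold in (A2) is, the sketch below counts how many residues may differ from a reference sequence. It rests on assumptions the claim does not spell out: "homology" is read as percent identity, and the comparison is ungapped and full-length (substitutions only). The function name `max_substitutions` is illustrative.

```python
def max_substitutions(length: int, min_percent: int, strict: bool = True) -> int:
    """Largest number of mismatched positions that still meets the threshold.

    Assumes an ungapped, full-length comparison (substitutions only).
    """
    # identity = (length - m) / length; work in integers to avoid rounding.
    budget = length * (100 - min_percent)  # allowed mismatches, scaled by 100
    if strict:  # identity must be strictly greater than min_percent
        return (budget - 1) // 100
    return budget // 100  # identity may equal min_percent


# SEQ ID No. 9 is declared as 488 residues (<211> 488).
print(max_substitutions(488, 99))
# SEQ ID No. 11 is declared as 704 residues (<211> 704).
print(max_substitutions(704, 99, strict=False))
```

Under these assumptions a variant of the 488-residue SEQ ID No. 9 may differ at no more than 4 positions, since 5 mismatches would give 483/488, or about 98.98% identity.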

2. A set of proteins, characterized in that:

the protein set is protein set A, protein set B or protein set C;

the protein set A consists of protein A and protein B;

the protein set B consists of protein A, protein B and protein C;

the protein set C consists of protein A, protein B, protein C, protein D and protein E;

the protein A is a protein shown in any one of the following (A1) - (A3):

(A1) protein with amino acid sequence shown as SEQ ID No. 9;

the protein B is a protein shown in any one of the following (B1) to (B3):

(B1) protein with amino acid sequence shown as SEQ ID No. 7;

(B3) a fusion protein obtained by attaching a tag to the N-terminus and/or C-terminus of a protein defined in any one of (B1) to (B2);

the protein C is a protein shown in any one of the following (C1) to (C3):

(C1) protein with amino acid sequence shown as SEQ ID No. 11;

(C2) a protein which has 99% or more homology with the amino acid sequence defined in (C1), is derived from grape, and has cytochrome P450 reductase function;

(C3) a fusion protein obtained by attaching a tag to the N-terminus and/or C-terminus of a protein defined in any one of (C1) to (C2);

the protein D is a protein shown in any one of the following (D1) to (D3):

(D1) protein with amino acid sequence shown as SEQ ID No. 1;

(D2) a protein which has more than 99% homology with the amino acid sequence defined in (D1), is derived from zebrafish, and has the function of a sterol 7-position reductase;

(D3) a fusion protein obtained by attaching a tag to the N-terminus and/or C-terminus of a protein defined in any one of (D1) to (D2);

the protein E is a protein shown in any one of the following (E1) to (E3):

(E1) protein with amino acid sequence shown as SEQ ID No. 2;

(E2) a protein which has more than 99% homology with the amino acid sequence defined in (E1), is derived from zebrafish, and has the function of a sterol 24-position reductase;

(E3) a fusion protein obtained by attaching a tag to the N-terminus and/or C-terminus of the protein defined in any one of (E1) to (E2).
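The three protein sets of claim 2 are nested: each larger set adds proteins to the previous one. The following sketch records that structure as plain Python data. The dictionary names are illustrative, the SEQ ID numbers are those given in (A1) to (E1), and the role labels are paraphrased from claims 1 and 2.

```python
# SEQ ID No. of the reference sequence for each protein, per (A1)-(E1).
PROTEIN_SEQ_IDS = {"A": 9, "B": 7, "C": 11, "D": 1, "E": 2}

# Role labels paraphrased from the claims (not verbatim claim language).
PROTEIN_ROLES = {
    "A": "cholesterol 16-/22-position dual oxidase",
    "B": "cholesterol 26-position oxidase",
    "C": "cytochrome P450 reductase",
    "D": "sterol 7-position reductase",
    "E": "sterol 24-position reductase",
}

PROTEIN_SETS = {
    "set A": frozenset("AB"),
    "set B": frozenset("ABC"),
    "set C": frozenset("ABCDE"),
}


def seq_ids(set_name: str) -> list:
    """Return the sorted SEQ ID numbers covered by a protein set."""
    return sorted(PROTEIN_SEQ_IDS[p] for p in PROTEIN_SETS[set_name])


# Each set is a proper subset of the next one.
nested = PROTEIN_SETS["set A"] < PROTEIN_SETS["set B"] < PROTEIN_SETS["set C"]
print(nested, seq_ids("set B"), seq_ids("set C"))
```

The nucleic acid molecule sets, vector sets, expression cassette sets, cell line sets and recombinant bacterium sets in the later claims follow the same A, A+B+C, A+B+C+D+E pattern.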

3. A nucleic acid molecule, which is a nucleic acid molecule A or a nucleic acid molecule B;

the nucleic acid molecule A is a nucleic acid molecule encoding the protein A of claim 1;

the nucleic acid molecule B is a nucleic acid molecule encoding the protein B according to claim 1.

4. The nucleic acid molecule of claim 3, wherein:

the nucleic acid molecule A is a DNA molecule shown as any one of (a1) to (a3) as follows:

(a2) a DNA molecule which hybridizes under stringent conditions with the DNA molecule defined in (a1) and which encodes the protein A as claimed in claim 1;

(a3) a DNA molecule having 80% or more homology with the DNA sequence defined in (a1) or (a2) and encoding the protein A as defined in claim 1;

the nucleic acid molecule B is a DNA molecule shown as any one of (b1) to (b3) as follows:

(b2) a DNA molecule which hybridizes under stringent conditions with the DNA molecule defined in (b1) and which encodes the protein B of claim 1;

(b3) a DNA molecule having 80% or more homology with the DNA sequence defined in (b1) or (b2) and encoding the protein B of claim 1.

5. A set of nucleic acid molecules, characterized in that: the nucleic acid molecule set is a nucleic acid molecule set A or a nucleic acid molecule set B or a nucleic acid molecule set C;

the nucleic acid molecule set A consists of a nucleic acid molecule A and a nucleic acid molecule B;

the nucleic acid molecule set B consists of a nucleic acid molecule A, a nucleic acid molecule B and a nucleic acid molecule C;

the nucleic acid molecule set C consists of a nucleic acid molecule A, a nucleic acid molecule B, a nucleic acid molecule C, a nucleic acid molecule D and a nucleic acid molecule E;

the nucleic acid molecule A is a nucleic acid molecule encoding the protein A of claim 2;

the nucleic acid molecule B is a nucleic acid molecule encoding the protein B of claim 2;

the nucleic acid molecule C is a nucleic acid molecule encoding the protein C according to claim 2;

the nucleic acid molecule D is a nucleic acid molecule encoding the protein D according to claim 2;

the nucleic acid molecule E is a nucleic acid molecule which codes for a protein E as claimed in claim 2.

6. The set of nucleic acid molecules according to claim 5, wherein:

(a2) a DNA molecule which hybridizes under stringent conditions with the DNA molecule defined in (a1) and which encodes the protein A of claim 2;

(a3) a DNA molecule having 80% or more homology with the DNA sequence defined in (a1) or (a2) and encoding the protein A as defined in claim 2;

(b2) a DNA molecule which hybridizes under stringent conditions with the DNA molecule defined in (b1) and which encodes the protein B of claim 2;

(b3) a DNA molecule having 80% or more homology with the DNA sequence defined in (b1) or (b2) and encoding the protein B of claim 2;

the nucleic acid molecule C is a DNA molecule shown as any one of (c1) to (c3) as follows:

(c2) a DNA molecule which hybridizes under stringent conditions with the DNA molecule defined in (c1) and which encodes the protein C of claim 2;

(c3) a DNA molecule having 80% or more homology with the DNA sequence defined in (c1) or (c2) and encoding the protein C of claim 2;

the nucleic acid molecule D is a DNA molecule shown as any one of (d1) to (d3) as follows:

(d2) a DNA molecule which hybridizes under stringent conditions with the DNA molecule defined in (d1) and which encodes the protein D of claim 2;

(d3) a DNA molecule having 80% or more homology with the DNA sequence defined in (d1) or (d2) and encoding the protein D of claim 2;

the nucleic acid molecule E is a DNA molecule shown as any one of (e1) to (e3) as follows:

(e2) a DNA molecule which hybridizes under stringent conditions with the DNA molecule defined in (e1) and which encodes the protein E of claim 2;

(e3) a DNA molecule having 80% or more homology with the DNA sequence defined in (e1) or (e2) and encoding the protein E of claim 2.

7. Any of the following biological materials:

(c1) a recombinant vector comprising the nucleic acid molecule of claim 3 or 4;

(c2) an expression cassette comprising the nucleic acid molecule of claim 3 or 4;

(c3) a transgenic cell line comprising the nucleic acid molecule of claim 3 or 4;

(c4) a recombinant bacterium comprising the nucleic acid molecule according to claim 3 or 4;

(c5) a set of recombinant vectors, which is vector set A, vector set B or vector set C;

the vector set A consists of a recombinant vector A and a recombinant vector B;

the vector set B consists of the recombinant vector A, the recombinant vector B and a recombinant vector C;

the vector set C consists of the recombinant vector A, the recombinant vector B, the recombinant vector C, a recombinant vector D and a recombinant vector E;

the recombinant vector A is a recombinant vector containing the nucleic acid molecule A of claim 5 or 6; the recombinant vector B is a recombinant vector containing the nucleic acid molecule B of claim 5 or 6; the recombinant vector C is a recombinant vector containing the nucleic acid molecule C of claim 5 or 6; the recombinant vector D is a recombinant vector containing the nucleic acid molecule D of claim 5 or 6; the recombinant vector E is a recombinant vector containing the nucleic acid molecule E of claim 5 or 6;

(c6) a set of expression cassettes, which is expression cassette set A, expression cassette set B or expression cassette set C;

the expression cassette set A consists of an expression cassette A and an expression cassette B;

the expression cassette set B consists of the expression cassette A, the expression cassette B and an expression cassette C;

the expression cassette set C consists of the expression cassette A, the expression cassette B, the expression cassette C, an expression cassette D and an expression cassette E;

the expression cassette A is an expression cassette comprising the nucleic acid molecule A of claim 5 or 6; the expression cassette B is an expression cassette comprising the nucleic acid molecule B according to claim 5 or 6; the expression cassette C is an expression cassette comprising the nucleic acid molecule C according to claim 5 or 6; the expression cassette D is an expression cassette comprising the nucleic acid molecule D according to claim 5 or 6; the expression cassette E is an expression cassette which comprises the nucleic acid molecule E according to claim 5 or 6;

(c7) a set of transgenic cell lines, which is transgenic cell line set A, transgenic cell line set B or transgenic cell line set C;

the transgenic cell line set A consists of a transgenic cell line A and a transgenic cell line B;

the transgenic cell line set B consists of the transgenic cell line A, the transgenic cell line B and a transgenic cell line C;

the transgenic cell line set C consists of the transgenic cell line A, the transgenic cell line B, the transgenic cell line C, a transgenic cell line D and a transgenic cell line E;

the transgenic cell line A is a transgenic cell line containing the nucleic acid molecule A according to claim 5 or 6; the transgenic cell line B is a transgenic cell line containing the nucleic acid molecule B according to claim 5 or 6; the transgenic cell line C is a transgenic cell line comprising the nucleic acid molecule C according to claim 5 or 6; the transgenic cell line D is a transgenic cell line comprising the nucleic acid molecule D according to claim 5 or 6; the transgenic cell line E is a transgenic cell line containing the nucleic acid molecule E according to claim 5 or 6;

(c8) a set of recombinant bacteria, which is recombinant bacterium set A, recombinant bacterium set B or recombinant bacterium set C;

the recombinant bacterium set A consists of a recombinant bacterium A and a recombinant bacterium B;

the recombinant bacterium set B consists of the recombinant bacterium A, the recombinant bacterium B and a recombinant bacterium C;

the recombinant bacterium set C consists of the recombinant bacterium A, the recombinant bacterium B, the recombinant bacterium C, a recombinant bacterium D and a recombinant bacterium E;

the recombinant bacterium A is a recombinant bacterium containing the nucleic acid molecule A of claim 5 or 6; the recombinant bacterium B is a recombinant bacterium containing the nucleic acid molecule B of claim 5 or 6; the recombinant bacterium C is a recombinant bacterium containing the nucleic acid molecule C of claim 5 or 6; the recombinant bacterium D is a recombinant bacterium containing the nucleic acid molecule D of claim 5 or 6; the recombinant bacterium E is a recombinant bacterium containing the nucleic acid molecule E of claim 5 or 6.

8. A method for constructing an engineered yeast strain for synthesizing diosgenin, comprising the following step: modifying a yeast capable of synthesizing cholesterol so that it expresses the protein A of claim 1, the protein B of claim 1 and a cytochrome P450 reductase, wherein the modified yeast is the target engineered strain.

9. The method of claim 8, wherein: the yeast capable of synthesizing cholesterol is prepared according to a method comprising the following steps: modifying the starting yeast to express sterol 7-position reductase and sterol 24-position reductase, wherein the modified yeast is the yeast capable of synthesizing cholesterol.

10. The method of claim 8, wherein: the method comprises the following step: introducing the nucleic acid molecule A of claim 5 or 6, the nucleic acid molecule B of claim 5 or 6 and a gene encoding the cytochrome P450 reductase into the yeast capable of synthesizing cholesterol to obtain a recombinant yeast expressing the protein A, the protein B and the cytochrome P450 reductase, which is the target engineered strain.

11. The method of claim 9, wherein: the yeast capable of synthesizing cholesterol is prepared according to a method comprising the following step: introducing the gene encoding the sterol 7-position reductase and the gene encoding the sterol 24-position reductase into the starting yeast to obtain a recombinant yeast expressing the sterol 7-position reductase and the sterol 24-position reductase, which is the yeast capable of synthesizing cholesterol.

12. The method of claim 8, wherein: the cytochrome P450 reductase is the protein C according to claim 2.

13. The method of claim 9, wherein: the sterol 7-position reductase is the protein D according to claim 2.

14. The method of claim 9, wherein: the sterol 24-position reductase is the protein E according to claim 2.

15. The method of claim 10, wherein: the cytochrome P450 reductase encoding gene is the nucleic acid molecule C according to claim 5 or 6.

16. The method of claim 11, wherein: the gene encoding sterol 7-position reductase is the nucleic acid molecule D according to claim 5 or 6.

17. The method of claim 11, wherein: the gene encoding sterol 24-position reductase is the nucleic acid molecule E according to claim 5 or 6.

18. The method of claim 10, wherein: the nucleic acid molecule A, the nucleic acid molecule B and the gene encoding the cytochrome P450 reductase are integrated into the Gal80 locus of the genome of the yeast capable of synthesizing cholesterol.

19. The method of claim 11, wherein: the coding gene of sterol 7-position reductase and the coding gene of sterol 24-position reductase are integrated into the Gal7 locus of the genome of the starting yeast.

20. An engineered strain produced by the method of any one of claims 8 to 19.

21. Use of the protein of claim 1, the set of proteins of claim 2, the nucleic acid molecule of claim 3 or 4, the set of nucleic acid molecules of claim 5 or 6, the biological material of claim 7, or the engineered strain of claim 20 in the preparation of diosgenin.

22. A method for preparing diosgenin, comprising the following steps: fermenting the engineered strain of claim 20 and collecting the fermentation product, wherein the fermentation product contains diosgenin.
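The two-stage construction route of claims 8 to 19 can be summarized as a small bookkeeping sketch: sterol 7-position and 24-position reductases are first integrated at the Gal7 locus of the starting yeast, then protein A, protein B and a cytochrome P450 reductase are integrated at the Gal80 locus. The `Strain` class and `integrate` function are hypothetical names used only for illustration, and the labels C, D and E assume the specific proteins of the dependent claims 12 to 14. This is a record-keeping model, not a laboratory protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Strain:
    """A yeast strain described by what has been integrated at each locus."""
    name: str
    integrations: dict = field(default_factory=dict)  # locus -> tuple of proteins

    def expressed(self) -> set:
        """Return every heterologous protein label carried by the strain."""
        return {p for proteins in self.integrations.values() for p in proteins}


def integrate(parent: Strain, name: str, locus: str, proteins: list) -> Strain:
    """Return a new strain with the given proteins added at an unused locus."""
    if locus in parent.integrations:
        raise ValueError(f"locus {locus} is already occupied in {parent.name}")
    updated = dict(parent.integrations)
    updated[locus] = tuple(proteins)
    return Strain(name, updated)


starting = Strain("starting yeast")
# Claims 9, 11 and 19: proteins D and E at Gal7 give a cholesterol-producing yeast.
cholesterol = integrate(starting, "cholesterol-producing yeast", "Gal7", ["D", "E"])
# Claims 8, 10 and 18: proteins A, B and C at Gal80 give the engineered yeast.
engineered = integrate(cholesterol, "engineered yeast", "Gal80", ["A", "B", "C"])

print(sorted(engineered.expressed()))
```

The final strain carries all five proteins of protein set C in claim 2, split across the two loci named in claims 18 and 19.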