CN113801233A

CN113801233A - Preparation method of somaglutide

Info

Publication number: CN113801233A
Application number: CN202010530625.4A
Authority: CN
Inventors: 陈卫; 刘慧玲; 骆莉
Original assignee: Ningbo Kunpeng Biotech Co Ltd
Current assignee: Ningbo Kunpeng Biotech Co Ltd
Priority date: 2020-06-11
Filing date: 2020-06-11
Publication date: 2021-12-17
Anticipated expiration: 2040-06-11
Also published as: CN113801233B

Abstract

The invention provides a somaglutide derivative and a preparation method thereof. Specifically, the method uses a somaglutide fusion protein containing a green fluorescent protein folding unit to prepare somaglutide, performs Fmoc modification on a Boc-modified somaglutide backbone, and adds the somaglutide side chain under orthogonal protection. The invention also provides the Fmoc- and Boc-modified somaglutide backbone involved in the preparation process and a fusion protein comprising the somaglutide backbone.

Description

Preparation method of somaglutide

Technical Field

The invention relates to the field of biological medicines, and particularly relates to a preparation method of somaglutide.

Background

Diabetes is a major disease threatening human health worldwide. In China, the prevalence of diabetes is rising rapidly as lifestyles change and the population ages. The acute and chronic complications of diabetes, especially the chronic ones, involve multiple organs and carry high rates of disability and death. They seriously affect the physical and psychological health of patients and place a heavy burden on individuals, families and society.

Somaglutide is a hypoglycemic drug developed by Novo Nordisk. It markedly lowers glycosylated hemoglobin (HbA1c) in patients with type 2 diabetes and reduces body weight, while greatly reducing the risk of hypoglycemia. Semaglutide is obtained by structural modification of GLP-1 (7-37). Compared with liraglutide, semaglutide has a longer fatty chain and therefore higher hydrophobicity, but it is also modified with short-chain PEG, which greatly enhances its hydrophilicity. After PEG modification, the molecule not only binds tightly to albumin and masks the DPP-4 cleavage site, but also shows reduced renal excretion, a prolonged biological half-life and a long-circulating effect.

The CAS number of somaglutide is 910463-68-2 and its English name is Semaglutide. Its sequence is as follows: H-His1-Aib2-Glu3-Gly4-Thr5-Phe6-Thr7-Ser8-Asp9-Val10-Ser11-Ser12-Tyr13-Leu14-Glu15-Gly16-Gln17-Ala18-Ala19-Lys20(PEG2-PEG2-γ-Glu-Octadecanedioic acid)-Glu21-Phe22-Ile23-Ala24-Trp25-Leu26-Val27-Arg28-Gly29-Arg30-Gly31-OH.

Patent application CN201611095162 uses a fragment condensation method to synthesize fully protected somaglutide, and the crude somaglutide peptide is obtained after cleavage. Because the method relies on fragment condensation, the raw materials are not readily available and the cost is high. In addition, the side chain is condensed by first extending the main chain to Thr at position 5 and then removing the Alloc protecting group from the side chain of Lys at position 20. This approach tends to cause polycondensation of the fragment 2 resin during synthesis, greatly reduces the coupling efficiency between the amino acids after Lys20 and fragment 1, readily generates racemization impurities, and is not suitable for industrial production.

In patent application CN201511027176, fully protected somaglutide resin is obtained by solid-phase synthesis, crude somaglutide peptide is obtained after cleavage, and refined somaglutide is obtained after purification. The method first condenses the main chain, then removes the Alloc protecting group from the Lys side chain, and finally condenses the side chain. This approach tends to cause polycondensation of the resin during synthesis, greatly reduces the coupling efficiency, and readily generates racemization impurities, in particular racemization of the last amino acid, His, which greatly reduces the product yield and increases the production cost.

Therefore, those skilled in the art are working to develop new methods for producing somaglutide.

Disclosure of Invention

The invention aims to provide a preparation method of somaglutide.

In a first aspect of the invention, there is provided a somaglutide precursor fusion protein having, from N-terminus to C-terminus, the structure of formula I:

A-FP-TEV-EK-G (I)

In formula (I):

"-" represents a peptide bond;

A is absent or is a leader peptide sequence;

FP is a green fluorescent protein folding unit;

TEV is a first enzyme cleavage site, preferably a TEV protease cleavage site (sequence ENLYFQG, SEQ ID NO: 8);

EK is a second enzyme cleavage site, preferably an enterokinase cleavage site (sequence DDDDDDK, SEQ ID NO: 9);

G is a somaglutide precursor or a fragment thereof;

wherein the green fluorescent protein folding unit comprises 2-6 β-sheet units selected from the group consisting of u1 to u11 (SEQ ID NOs: 11-21).

In another preferred embodiment, the green fluorescent protein folding unit is u2-u3, u4-u5, u1-u2-u3, u3-u4-u5 or u4-u5-u6.

In another preferred example, G is a Boc-modified somaglutide precursor, the somaglutide precursor lacks 2-5 amino acids from the N-terminus of the somaglutide backbone, and lysine contained in the somaglutide precursor is Boc-modified.

In another preferred embodiment, the epsilon amino group of the Boc-modified lysine is modified with a tert-butoxycarbonyl group.

In another preferred embodiment, the amino acid sequence of the backbone of the somaglutide is shown in SEQ ID NO. 3.

In another preferred embodiment, the somaglutide precursor comprises:

a first somaglutide precursor in which the lysine at position 18 is Boc-modified, the amino acid sequence of which is shown in SEQ ID NO: 1;

or a second somaglutide precursor in which the lysine at position 17 is Boc-modified, the amino acid sequence of which is shown in SEQ ID NO: 2.

SEQ ID NO: 1: EGTFTSDVSSYLEGQAAKEFIAWLVRGRG

SEQ ID NO: 2: GTFTSDVSSYLEGQAAKEFIAWLVRGRG (K is Boc-modified lysine)

In the present application, the complete somaglutide sequence (H-(Aib)-EGTFTSDVSSYLEGQAAKEFIAWLVRGRG, SEQ ID NO: 3) is defined as the somaglutide backbone, and the somaglutide backbone with N-terminal amino acids deleted is defined as the somaglutide precursor. For the Fmoc-modified somaglutide backbone, the N-terminal His is Fmoc-modified; for the Boc-modified somaglutide backbone, the lysine at position 20 is Nε-(tert-butoxycarbonyl)-lysine.
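The position numbering used for the backbone and its precursors can be checked with a short sketch. It only assumes the sequences given above; the function names `precursor` and `lysine_position` are illustrative and do not come from the application, and Aib is kept as its own token because it has no one-letter code.

```python
# Somaglutide backbone per SEQ ID NO: 3. Aib (2-aminoisobutyric acid) is a
# separate token because it has no standard one-letter code.
BACKBONE = ["H", "Aib"] + list("EGTFTSDVSSYLEGQAAKEFIAWLVRGRG")


def precursor(n_deleted):
    """Backbone with n_deleted N-terminal residues removed (hypothetical helper)."""
    return BACKBONE[n_deleted:]


def lysine_position(residues):
    """1-based position of the single lysine (K) in a list of residues."""
    return residues.index("K") + 1


if __name__ == "__main__":
    for n in (0, 2, 3):
        seq = precursor(n)
        print(n, len(seq), lysine_position(seq))
```

Deleting two residues (His-Aib) reproduces SEQ ID NO: 1 with the lysine at position 18, and deleting three (His-Aib-Glu) reproduces SEQ ID NO: 2 with the lysine at position 17, which is why the same Boc-protected lysine is numbered 20, 18 or 17 depending on the molecule being described.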

In another preferred embodiment, the green fluorescent protein folding unit is u3-u4-u 5.

In another preferred embodiment, the amino acid sequence of the leader peptide is as shown in SEQ ID NO 7.

In another preferred embodiment, position 17 or 18 of the somaglutide precursor is Nε-(tert-butoxycarbonyl)-lysine.

In a second aspect of the present invention, there is provided an Fmoc- and Boc-modified somaglutide backbone, wherein position 20 of the somaglutide backbone is a protected lysine, the protected lysine is Nε-(tert-butoxycarbonyl)-lysine, and the N-terminus of the somaglutide backbone is an Fmoc-modified histidine.

In another preferred embodiment, Fmoc is fluorenylmethyloxycarbonyl.

In a third aspect of the present invention, there is provided a Boc-modified somaglutide precursor, comprising:

In a fourth aspect of the present invention, an Fmoc-modified somaglutide backbone is provided, wherein the N-terminus of the somaglutide backbone is Fmoc-modified histidine, and the amino acid sequence of the somaglutide backbone is shown in SEQ ID No. 3.

In a fifth aspect of the invention, there is provided a method of preparing somaglutide, the method comprising the steps of:

(A) fermenting by using recombinant bacteria to prepare the somaglutide precursor fusion protein,

(B) the somaglutide precursor fusion protein is utilized to prepare the somaglutide,

wherein the somaglutide precursor fusion protein is as described in the first aspect of the invention.

In another preferred embodiment, the step (B) further includes the steps of:

(i) subjecting the somaglutide precursor fusion protein to enzyme digestion to obtain a Boc-modified somaglutide precursor;

(ii) attaching an Fmoc complex to the N-terminus of the Boc-modified somaglutide precursor to prepare an Fmoc- and Boc-modified somaglutide backbone,

wherein the Fmoc complex comprises the X N-terminal amino acids of the somaglutide backbone, and the N-terminal amino acid of the Fmoc complex is Fmoc-modified;

(iii) removing Boc from the Fmoc- and Boc-modified somaglutide backbone and reacting the product with the somaglutide side chain to prepare Fmoc-modified somaglutide;

(iv) removing Fmoc from the Fmoc-modified somaglutide to obtain Fmoc-free somaglutide; and

(v) removing the side-chain tBu protecting groups from the Fmoc-free somaglutide to prepare somaglutide.
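The order of protection and deprotection in steps (i) to (v) can be followed with a small bookkeeping sketch. It is a simplified model rather than a description of the chemistry: the class `PeptideState` and the step functions are hypothetical names, and it assumes that the side chain carries its tBu protection when it is attached in step (iii), which the text implies but does not state outright.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PeptideState:
    missing_n_terminal: int    # backbone residues still absent at the N-terminus
    fmoc_on_n_terminus: bool
    boc_on_lysine: bool
    side_chain_attached: bool
    tbu_on_side_chain: bool


def step_i_digest(x):
    # Enzyme digestion releases the Boc-modified precursor lacking x residues.
    return PeptideState(x, False, True, False, False)


def step_ii_attach_fmoc_complex(state, x):
    if state.missing_n_terminal != x:
        raise ValueError("Fmoc complex must supply exactly the missing residues")
    return replace(state, missing_n_terminal=0, fmoc_on_n_terminus=True)


def step_iii_remove_boc_and_couple(state):
    if not (state.fmoc_on_n_terminus and state.boc_on_lysine):
        raise ValueError("expects an Fmoc- and Boc-modified backbone")
    # Assumption: the side chain arrives tBu-protected.
    return replace(state, boc_on_lysine=False,
                   side_chain_attached=True, tbu_on_side_chain=True)


def step_iv_remove_fmoc(state):
    if not state.side_chain_attached:
        raise ValueError("side chain must be attached before Fmoc removal")
    return replace(state, fmoc_on_n_terminus=False)


def step_v_remove_tbu(state):
    if state.fmoc_on_n_terminus:
        raise ValueError("Fmoc must be removed before tBu removal")
    return replace(state, tbu_on_side_chain=False)


def run(x):
    state = step_i_digest(x)
    state = step_ii_attach_fmoc_complex(state, x)
    state = step_iii_remove_boc_and_couple(state)
    state = step_iv_remove_fmoc(state)
    return step_v_remove_tbu(state)


if __name__ == "__main__":
    print(run(2))
```

The point of the model is that the N-terminal amine stays blocked by Fmoc while the lysine amine is freed and acylated, so the side chain can only react at the lysine; the final state has a complete backbone, an attached side chain and no remaining protecting groups.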

In another preferred example, in step (i), the enzyme digestion treatment is performed using enterokinase.

In another preferred embodiment, the Boc-modified somaglutide precursor comprises:

a first somaglutide precursor in which the lysine at position 18 is Boc-modified, the amino acid sequence of which is shown in SEQ ID NO: 1;

or a second somaglutide precursor in which the lysine at position 17 is Boc-modified, the amino acid sequence of which is shown in SEQ ID NO: 2.

In another preferred embodiment, the Fmoc complex is Fmoc-H-Aib or Fmoc-H-Aib-E.

In another preferred embodiment, in step (i) and step (ii), the value of X is the same.

In another preferred embodiment, the Fmoc and Boc modified somaglutide backbone is as described in the second aspect of the invention.

In another preferred embodiment, the reaction of step (ii) is as follows:

In another preferred embodiment, the side chain of the somaglutide is as follows:

In another preferred example, in step (ii), the Fmoc complex, DIPEA (N,N-diisopropylethylamine) and DMF (N,N-dimethylformamide) are added, thereby attaching the Fmoc complex to the N-terminus of the Boc-modified somaglutide precursor.

In another preferred embodiment, the molar ratio of Fmoc complex, DIPEA and Boc-modified somaglutide precursor added is (1.0-3.0): (10-14): (0.8-1.2), preferably (2-2.8): (11-13): (0.8-1.2).
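Because a molar ratio is independent of batch size, checking a planned charge against the stated ranges amounts to asking whether some common scale factor places every component inside its range. The sketch below does this by intersecting the allowed scale intervals; the function name `within_ratio` and the example amounts are illustrative assumptions, not values from the application.

```python
# Ratio ranges from the text, as (low, high) parts per component.
BROAD = {"fmoc_complex": (1.0, 3.0), "dipea": (10.0, 14.0), "precursor": (0.8, 1.2)}
PREFERRED = {"fmoc_complex": (2.0, 2.8), "dipea": (11.0, 13.0), "precursor": (0.8, 1.2)}


def within_ratio(amounts, ranges):
    """True if amounts (in any common molar unit) fit the ratio ranges.

    A charge fits when a single scale factor k exists such that k * amount
    lies inside the range of every component, i.e. when the intervals
    [low / amount, high / amount] share a common point.
    """
    if any(amounts[name] <= 0 for name in ranges):
        raise ValueError("all molar amounts must be positive")
    lowest_scale = max(ranges[name][0] / amounts[name] for name in ranges)
    highest_scale = min(ranges[name][1] / amounts[name] for name in ranges)
    return lowest_scale <= highest_scale


if __name__ == "__main__":
    # Hypothetical charge in mmol.
    charge = {"fmoc_complex": 2.5, "dipea": 12.0, "precursor": 1.0}
    print(within_ratio(charge, BROAD), within_ratio(charge, PREFERRED))
```

The same function accepts amounts in millimoles or moles, since only the proportions matter.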

In another preferred embodiment, between the step (ii) and the step (iii), a step of purifying the prepared Fmoc and Boc modified somaglutide backbone is further included.

In another preferred embodiment, the purification treatment consists of adding an organic solvent to the reaction solution to precipitate a solid product; more preferably, the organic solvent is a methyl tert-butyl ether/petroleum ether mixture.

In another preferred embodiment, in step (iii), the method further comprises the steps of:

(a) adding a TFA solution and stirring at low temperature to remove Boc, giving a de-Boc product;

(b) adding an organic solvent to the reaction solution of step (a) to precipitate a solid de-Boc product, preferably the organic solvent being a methyl tert-butyl ether/petroleum ether mixture; and

(c) mixing the de-Boc product with the somaglutide side chain to prepare the Fmoc-modified somaglutide.

In another preferred embodiment, in step (c), the solid Boc-removed product is mixed with the side chain of somaglutide in DMF and reacted at room temperature.

In another preferred embodiment, in step (c), the reaction system further comprises DIPEA.

In another preferred embodiment, in step (iv), a piperidine-containing DMF solution is added to remove Fmoc, thereby preparing Fmoc-free somaglutide.

In another preferred example, in step (v), a mixed solution of TFA, TIS and DCM is added to remove the side-chain tBu protecting groups, thereby obtaining the somaglutide.

In another preferred embodiment, step (v) includes a step of purifying the produced somaglutide.

In another preferred embodiment, said Boc-modified somaglutide precursor is prepared using genetic recombination techniques.

In another preferred example, in step (A), inclusion bodies of the somaglutide precursor fusion protein are separated from the fermentation broth of the recombinant bacteria, and the somaglutide precursor fusion protein is obtained after the inclusion bodies are subjected to renaturation and enzyme digestion.

In another preferred embodiment, before and after step (i), a purification step, preferably reverse-phase chromatography, is further included.

In another preferred embodiment, the recombinant bacterium comprises or integrates an expression cassette for expressing the somaglutide precursor fusion protein.

In another preferred embodiment, the method comprises the following steps:

In another preferred example, the method comprises the steps of:

(i) providing the somaglutide precursor fusion protein of the first aspect of the invention, carrying out enzyme digestion to obtain a compound 1,

(ii) attaching Compound 1 to the Fmoc-H-Aib complex to produce Compound 2,

(iii) carrying out Boc removal treatment on the compound 2, and reacting the compound with a side chain of the somaglutide to obtain a compound 4; and

(iv) subjecting compound 4 to Fmoc removal treatment to obtain compound 5;

(v) removing the side-chain tBu protecting groups from compound 5 to prepare the somaglutide shown as compound 6.

In another preferred embodiment, the method comprises the following steps:

In another preferred example, the method comprises the steps of:

(i) providing the somaglutide precursor fusion protein of the first aspect of the invention, carrying out enzyme digestion to obtain a compound 7,

(ii) compound 7 is attached to the Fmoc-H-Aib-E complex to produce compound 2,

(iv) subjecting compound 4 to Fmoc removal treatment to obtain compound 5;

In a sixth aspect of the invention, there is provided an isolated polynucleotide encoding a somaglutide precursor fusion protein according to the first aspect of the invention, an Fmoc- and Boc-modified somaglutide backbone according to the second aspect of the invention, a Boc-modified somaglutide precursor according to the third aspect of the invention or an Fmoc-modified somaglutide backbone according to the fourth aspect of the invention.

In a seventh aspect of the invention, there is provided a vector comprising a polynucleotide according to the sixth aspect of the invention.

In another preferred embodiment, the vector is selected from the group consisting of: DNA, RNA, plasmids, lentiviral vectors, adenoviral vectors, retroviral vectors, transposons, or combinations thereof.

In an eighth aspect of the present invention, there is provided a host cell comprising the vector of the seventh aspect of the present invention or having the polynucleotide of the sixth aspect of the present invention integrated exogenously into the chromosome.

In another preferred embodiment, the host cell is Escherichia coli, Bacillus subtilis, a yeast cell, an insect cell, a mammalian cell, or a combination thereof.

In a ninth aspect of the invention, there is provided a formulation comprising a somaglutide precursor fusion protein according to the first aspect of the invention, an Fmoc and Boc modified somaglutide backbone according to the second aspect of the invention, a Boc modified somaglutide precursor according to the third aspect of the invention or an Fmoc modified somaglutide backbone according to the fourth aspect of the invention.

In a tenth aspect of the invention there is provided a preparation of somaglutide prepared using the method of the fifth aspect of the invention.

Drawings

FIG. 1 shows a map of plasmid pBAD-FP-TEV-EK-GLP-1 (18).

FIG. 2 shows a map of plasmid pBAD-FP-TEV-EK-GLP-1 (17).

FIG. 3 shows a map of the plasmid pEvol-pylRs-pylT.

FIG. 4 shows an SDS-PAGE electropherogram of the Boc-somaglutide precursor fusion protein after renaturation of inclusion bodies.

FIG. 5 shows HPLC detection profile of Boc-somaglutide precursor.

Detailed Description

The inventor of the invention has extensively and intensively studied and found a somaglutide derivative and a preparation method thereof. Specifically, the method provided by the invention utilizes a somaglutide fusion protein containing a green fluorescent protein folding unit to prepare the somaglutide, performs Fmoc modification on a Boc-modified somaglutide main chain, and performs side chain addition of the somaglutide by orthogonal protection. The invention also provides the Fmoc and Boc modified somaglutide backbone involved in the preparation process and a fusion protein comprising the somaglutide backbone. The method of the invention does not need expensive solid phase synthesis instruments, shortens the production period, has simple production process and improves the purity and the yield of the product.

Somaglutide

Somaglutide (English name: Semaglutide; CAS number: 910463-68-2) was developed by Novo Nordisk. It is a human glucagon-like peptide-1 (GLP-1) analogue with the sequence: H-His1-Aib2-Glu3-Gly4-Thr5-Phe6-Thr7-Ser8-Asp9-Val10-Ser11-Ser12-Tyr13-Leu14-Glu15-Gly16-Gln17-Ala18-Ala19-Lys20(PEG2-PEG2-γ-Glu-Octadecanedioic acid)-Glu21-Phe22-Ile23-Ala24-Trp25-Leu26-Val27-Arg28-Gly29-Arg30-Gly31-OH. Its sequence homology with natural human GLP-1 reaches 97%.

Somaglutide is a hypoglycemic drug that markedly lowers glycosylated hemoglobin (HbA1c) in patients with type 2 diabetes and reduces body weight, while greatly reducing the risk of hypoglycemia. Semaglutide is obtained by structural modification of GLP-1 (7-37). Compared with liraglutide, semaglutide has a longer fatty chain and therefore higher hydrophobicity, but it is also modified with short-chain PEG, which greatly enhances its hydrophilicity. After PEG modification, the molecule not only binds tightly to albumin and masks the DPP-4 cleavage site, but also shows reduced renal excretion, a prolonged biological half-life and a long-circulating effect. It can significantly lower fasting and postprandial blood glucose in patients with type 2 diabetes, thereby regulating blood glucose levels in vivo, and can also reduce body weight and the risk of death in patients with cardiovascular disease.

Fusion proteins

The present invention constructs a somaglutide precursor fusion protein using a green fluorescent protein folding unit, as described in the first aspect of the invention.

The green fluorescent protein fold unit FP comprised in the fusion protein of the invention comprises 2 to 6, preferably 2 to 3 β -sheet units selected from the group consisting of:

Unit	Amino acid sequence
u1	VPILVELDGDVNG (SEQ ID NO: 11)
u2	HKFSVRGEGEGDAT (SEQ ID NO: 12)
u3	KLTLKFICTT (SEQ ID NO: 13)
u4	YVQERTISFKD (SEQ ID NO: 14)
u5	TYKTRAEVKFEGD (SEQ ID NO: 15)
u6	TLVNRIELKGIDF (SEQ ID NO: 16)
u7	HNVYITADKQ (SEQ ID NO: 17)
u8	GIKANFKIRHNVED (SEQ ID NO: 18)
u9	VQLADHYQQNTPIG (SEQ ID NO: 19)
u10	HYLSTQSVLSKD (SEQ ID NO: 20)
u11	HMVLLEFVTAAGI (SEQ ID NO: 21)

In another preferred embodiment, the green fluorescent protein folding unit FP can be selected from: u8, u9, u2-u3, u4-u5, u8-u9, u1-u2-u3, u2-u3-u4, u3-u4-u5, u5-u6-u7, u8-u9-u10, [...], u1-I-u5, u2-I-u4, u3-I-u8, u5-I-u6, or u10-I-u11.

In another preferred embodiment, the green fluorescent protein folding unit is u3-u4-u5 or u4-u5-u 6.
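The layout A-FP-TEV-EK-G of formula I can be made concrete by concatenating the listed sequences. The sketch below is only an illustration under stated assumptions: the leader peptide A is left empty because SEQ ID NO: 7 is not reproduced here, the helper names are hypothetical, the release step simply assumes cleavage immediately after the enterokinase site, and a plain one-letter string cannot show that the lysine in G is Boc-protected.

```python
# β-sheet units u1 to u11 (SEQ ID NOs: 11-21) as listed in the table.
UNITS = {
    "u1": "VPILVELDGDVNG", "u2": "HKFSVRGEGEGDAT", "u3": "KLTLKFICTT",
    "u4": "YVQERTISFKD", "u5": "TYKTRAEVKFEGD", "u6": "TLVNRIELKGIDF",
    "u7": "HNVYITADKQ", "u8": "GIKANFKIRHNVED", "u9": "VQLADHYQQNTPIG",
    "u10": "HYLSTQSVLSKD", "u11": "HMVLLEFVTAAGI",
}
TEV_SITE = "ENLYFQG"   # SEQ ID NO: 8
EK_SITE = "DDDDDDK"    # SEQ ID NO: 9, as given in the text
PRECURSOR_1 = "EGTFTSDVSSYLEGQAAKEFIAWLVRGRG"  # SEQ ID NO: 1


def folding_unit(spec):
    """Concatenate units named in a spec such as 'u3-u4-u5'."""
    return "".join(UNITS[name] for name in spec.split("-"))


def fusion_protein(fp_spec, precursor, leader=""):
    """Assemble A-FP-TEV-EK-G; the leader A defaults to empty (absent)."""
    return leader + folding_unit(fp_spec) + TEV_SITE + EK_SITE + precursor


def enterokinase_release(fusion):
    """Return the part after the EK site (assumes cleavage right after it)."""
    index = fusion.find(TEV_SITE + EK_SITE)
    if index < 0:
        raise ValueError("no TEV-EK junction found")
    return fusion[index + len(TEV_SITE) + len(EK_SITE):]


if __name__ == "__main__":
    fusion = fusion_protein("u3-u4-u5", PRECURSOR_1)
    print(len(fusion), enterokinase_release(fusion))
```

Placing the enterokinase site directly in front of G means that digestion leaves no extra residues on the N-terminus of the precursor, which is what allows the Fmoc complex to be attached there afterwards.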

The term "fusion protein" as used herein also includes variants having the above-described activities. These variants include (but are not limited to): deletion, insertion and/or substitution of 1 to 3 (usually 1 to 2, more preferably 1) amino acids, and addition or deletion of one or several (usually up to 3, preferably up to 2, more preferably up to 1) amino acids at the C-terminus and/or N-terminus. For example, in the art, substitution with amino acids of similar or analogous properties does not generally alter the function of a protein. Likewise, the addition or deletion of one or several amino acids at the C-terminus and/or N-terminus does not generally alter the structure and function of a protein. In addition, the term also includes monomeric and multimeric forms of the polypeptides of the invention. The term also includes linear as well as non-linear polypeptides (e.g., cyclic peptides).

The invention also includes active fragments, derivatives and analogs of the above fusion proteins. As used herein, the terms "fragment," "derivative," and "analog" refer to a polypeptide that substantially retains the function or activity of a fusion protein of the invention. The polypeptide fragment, derivative or analogue of the present invention may be (i) a polypeptide in which one or more conserved or non-conserved amino acid residues (preferably conserved amino acid residues) are substituted, or (ii) a polypeptide having a substituent group in one or more amino acid residues, or (iii) a polypeptide in which a polypeptide is fused with another compound (such as a compound for increasing the half-life of the polypeptide, e.g., polyethylene glycol), or (iv) a polypeptide in which an additional amino acid sequence is fused with the polypeptide sequence (a fusion protein in which a tag sequence such as a leader sequence, a secretory sequence or 6His is fused). Such fragments, derivatives and analogs are within the purview of those skilled in the art in view of the teachings herein.

A preferred class of active derivatives refers to polypeptides formed by replacing up to 3, preferably up to 2, more preferably up to 1 amino acid with an amino acid of similar or analogous nature, compared to the amino acid sequence of the present invention. These conservative variants are preferably produced by amino acid substitutions according to Table A.

TABLE A

Initial residue	Representative substitutions	Preferred substitution
Ala (A)	Val; Leu; Ile	Val
Arg (R)	Lys; Gln; Asn	Lys
Asn (N)	Gln; His; Lys; Arg	Gln
Asp (D)	Glu	Glu
Cys (C)	Ser	Ser
Gln (Q)	Asn	Asn
Glu (E)	Asp	Asp
Gly (G)	Pro; Ala	Ala
His (H)	Asn; Gln; Lys; Arg	Arg
Ile (I)	Leu; Val; Met; Ala; Phe	Leu
Leu (L)	Ile; Val; Met; Ala; Phe	Ile
Lys (K)	Arg; Gln; Asn	Arg
Met (M)	Leu; Phe; Ile	Leu
Phe (F)	Leu; Val; Ile; Ala; Tyr	Leu
Pro (P)	Ala	Ala
Ser (S)	Thr	Thr
Thr (T)	Ser	Ser
Trp (W)	Tyr; Phe	Tyr
Tyr (Y)	Trp; Phe; Thr; Ser	Phe
Val (V)	Ile; Leu; Met; Phe; Ala	Leu
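Table A can be read as a lookup from each initial residue to its representative substitutions. As a minimal sketch, the function below tests whether a sequence differs from a reference only by substitutions listed in Table A and by no more than three of them; the name `is_conservative_variant` is hypothetical, and the check covers substitutions only, not the insertions, deletions or terminal additions also mentioned in the text.

```python
# Representative substitutions from Table A, in one-letter code.
REPRESENTATIVE = {
    "A": "VLI", "R": "KQN", "N": "QHKR", "D": "E", "C": "S",
    "Q": "N", "E": "D", "G": "PA", "H": "NQKR", "I": "LVMAF",
    "L": "IVMAF", "K": "RQN", "M": "LFI", "F": "LVIAY", "P": "A",
    "S": "T", "T": "S", "W": "YF", "Y": "WFTS", "V": "ILMFA",
}


def is_conservative_variant(reference, variant, max_substitutions=3):
    """True if variant differs from reference only by Table A substitutions.

    Sequences of different length are rejected because this sketch does not
    model insertions or deletions.
    """
    if len(reference) != len(variant):
        return False
    changes = [(a, b) for a, b in zip(reference, variant) if a != b]
    if len(changes) > max_substitutions:
        return False
    return all(b in REPRESENTATIVE.get(a, "") for a, b in changes)


if __name__ == "__main__":
    u3 = "KLTLKFICTT"  # SEQ ID NO: 13
    print(is_conservative_variant(u3, "RLTLKFICTT"))  # Lys -> Arg is listed
    print(is_conservative_variant(u3, "KLTLKFICTW"))  # Thr -> Trp is not listed
```

Passing this check says nothing about whether the variant keeps its activity; that remains an experimental question.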

The invention also provides analogs of the fusion proteins of the invention. These analogs may differ from the polypeptides of the invention by amino acid sequence differences, by modifications that do not affect the sequence, or by both. Analogs also include analogs having residues other than the natural L-amino acids (e.g., D-amino acids), as well as analogs having non-naturally occurring or synthetic amino acids (e.g., β- and γ-amino acids). It is to be understood that the polypeptides of the present invention are not limited to the representative polypeptides exemplified above.

In addition, modifications may be made to the fusion proteins of the invention. Modified (generally without altering primary structure) forms include: chemically derivatized forms of the polypeptide, such as acetylation or carboxylation, in vivo or in vitro. Modifications also include glycosylation, such as those resulting from glycosylation modifications in the synthesis and processing of the polypeptide or in further processing steps. Such modification may be accomplished by exposing the polypeptide to an enzyme that performs glycosylation, such as a mammalian glycosylase or deglycosylase. Modified forms also include sequences having phosphorylated amino acid residues (e.g., phosphotyrosine, phosphoserine, phosphothreonine). Also included are polypeptides modified to increase their resistance to proteolysis or to optimize solubility.

The term "polynucleotide encoding a fusion protein of the present invention" may include a polynucleotide encoding a fusion protein of the present invention, and may also include polynucleotides that additionally include coding and/or non-coding sequences.

The invention also relates to variants of the above polynucleotides, which encode fusion proteins having the same amino acid sequence as in the present invention, or fragments, analogs and derivatives thereof. These nucleotide variants include substitution variants, deletion variants and insertion variants. As is known in the art, an allelic variant is an alternative form of a polynucleotide, which may involve a substitution, deletion, or insertion of one or more nucleotides, without substantially altering the function of the fusion protein encoded thereby.

The present invention also relates to polynucleotides which hybridize to the sequences described above and which have at least 50%, preferably at least 70%, and more preferably at least 80% identity between the two sequences. The present invention particularly relates to polynucleotides hybridizable under stringent conditions with the polynucleotides of the present invention. In the present invention, "stringent conditions" mean: (1) hybridization and elution at lower ionic strength and higher temperature, such as 0.2× SSC, 0.1% SDS, 60 °C; or (2) addition of a denaturant during hybridization, such as 50% (v/v) formamide, 0.1% calf serum/0.1% Ficoll, 42 °C; or (3) hybridization occurring only when the identity between the two sequences is at least 90%, preferably at least 95%.
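The identity thresholds above can be illustrated with a simple calculation. The sketch assumes two sequences that are already aligned, of equal length and without gaps, which is a deliberate simplification; the text does not specify an alignment method, and the function name and example sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of matching positions between two pre-aligned sequences.

    Assumes equal length and no gaps; a real comparison would first align
    the sequences with a dedicated tool.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # Hypothetical 10-nucleotide sequences differing at one position.
    identity = percent_identity("ATGGCCAAGT", "ATGGCCAAGA")
    print(identity, identity >= 80.0, identity >= 95.0)
```

With one mismatch in ten positions the identity is 90%, which meets the 50%, 70% and 80% thresholds but not the preferred 95% level of condition (3).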

The fusion proteins and polynucleotides of the invention are preferably provided in isolated form, and more preferably, purified to homogeneity.

The full-length sequence of the polynucleotide of the present invention can be obtained by PCR amplification, recombination, or artificial synthesis. For PCR amplification, primers can be designed based on the nucleotide sequences disclosed herein, particularly open reading frame sequences, and the sequences can be amplified using commercially available cDNA libraries or cDNA libraries prepared by conventional methods known to those skilled in the art as templates. When the sequence is long, two or more PCR amplifications are often required, and then the amplified fragments are spliced together in the correct order.

Once the sequence of interest has been obtained, it can be obtained in large quantities by recombinant methods. This is usually done by cloning it into a vector, transferring it into a cell, and isolating the relevant sequence from the propagated host cell by conventional methods.

In addition, the sequence can be synthesized by artificial synthesis, especially when the fragment length is short. Generally, fragments with long sequences are obtained by first synthesizing a plurality of small fragments and then ligating them.

At present, DNA sequences encoding the proteins of the present invention (or fragments or derivatives thereof) have been obtained completely by chemical synthesis. The DNA sequence may then be introduced into various existing DNA molecules (or vectors, for example) and cells known in the art.

Methods for amplifying DNA/RNA using PCR techniques are preferably used to obtain the polynucleotides of the invention. In particular, when it is difficult to obtain a full-length cDNA from a library, it is preferable to use the RACE method (rapid amplification of cDNA ends). Primers used for PCR can be appropriately selected based on the sequence information of the present invention disclosed herein and synthesized by a conventional method. The amplified DNA/RNA fragments can be isolated and purified by conventional methods, such as gel electrophoresis.

Expression vector

The invention also relates to vectors comprising the polynucleotides of the invention, as well as genetically engineered host cells transformed with the vectors of the invention or the coding sequences of the fusion proteins of the invention, and methods for producing the polypeptides of the invention by recombinant techniques.

The polynucleotide sequences of the present invention may be used to express or produce recombinant fusion proteins by conventional recombinant DNA techniques. Generally, the following steps are performed:

(1) transforming or transducing a suitable host cell with a polynucleotide (or variant) of the invention encoding a fusion protein of the invention, or with a recombinant expression vector comprising the polynucleotide;

(2) culturing the host cell in a suitable medium;

(3) isolating and purifying the protein from the culture medium or the cells.

In the present invention, the polynucleotide sequence encoding the fusion protein may be inserted into a recombinant expression vector. The term "recombinant expression vector" refers to a bacterial plasmid, bacteriophage, yeast plasmid, plant cell virus, mammalian cell virus such as adenovirus, retrovirus, or other vectors well known in the art. Any plasmid or vector may be used as long as it can replicate and is stable in the host. An important feature of expression vectors is that they generally contain an origin of replication, a promoter, a marker gene and translation control elements.

Methods well known to those skilled in the art can be used to construct expression vectors containing the DNA sequences encoding the fusion proteins of the present invention and appropriate transcription/translation control signals. These methods include in vitro recombinant DNA techniques, DNA synthesis techniques, in vivo recombinant techniques, and the like. The DNA sequence may be operably linked to a suitable promoter in an expression vector to direct mRNA synthesis. Representative examples of such promoters are: lac or trp promoter of E.coli; a lambda phage PL promoter; eukaryotic promoters include CMV immediate early promoter, HSV thymidine kinase promoter, early and late SV40 promoter, LTRs of retrovirus, and other known promoters capable of controlling gene expression in prokaryotic or eukaryotic cells or viruses. The expression vector also includes a ribosome binding site for translation initiation and a transcription terminator.

Furthermore, the expression vector preferably comprises one or more selectable marker genes to provide phenotypic traits for selection of transformed host cells, such as dihydrofolate reductase, neomycin resistance and Green Fluorescent Protein (GFP) for eukaryotic cell culture, or tetracycline or ampicillin resistance for E.coli.

Vectors comprising the appropriate DNA sequences described above, together with appropriate promoter or control sequences, may be used to transform appropriate host cells to enable expression of the protein.

The host cell may be a prokaryotic cell, such as a bacterial cell; or a lower eukaryotic cell, such as a yeast cell; or a higher eukaryotic cell, such as a mammalian cell. Representative examples are: bacterial cells such as Escherichia coli, Streptomyces and Salmonella typhimurium; fungal cells such as yeast; and plant cells (e.g., ginseng cells).

When the polynucleotide of the present invention is expressed in higher eukaryotic cells, transcription will be enhanced if an enhancer sequence is inserted into the vector. Enhancers are cis-acting elements of DNA, usually about 10 to 300 base pairs long, that act on a promoter to increase transcription of a gene. Examples include the SV40 enhancer of 100 to 270 base pairs on the late side of the replication origin, the polyoma enhancer on the late side of the replication origin, and adenovirus enhancers.

It will be clear to one of ordinary skill in the art how to select appropriate vectors, promoters, enhancers and host cells.

Transformation of a host cell with recombinant DNA can be carried out using conventional techniques well known to those skilled in the art. When the host is a prokaryote, e.g., E. coli, competent cells capable of DNA uptake can be harvested after the exponential growth phase and treated by the CaCl₂ method, using procedures well known in the art. Another method is to use MgCl₂. If desired, transformation can also be carried out by electroporation. When the host is a eukaryote, the following DNA transfection methods may be used: calcium phosphate coprecipitation, conventional mechanical methods such as microinjection, electroporation, liposome encapsulation, etc.

The obtained transformant can be cultured by a conventional method to express the polypeptide encoded by the gene of the present invention. The medium used in the culture may be selected from various conventional media depending on the host cell used. The culturing is performed under conditions suitable for growth of the host cell. After the host cells have been grown to an appropriate cell density, the selected promoter is induced by suitable means (e.g., temperature shift or chemical induction) and the cells are cultured for an additional period of time.

The recombinant polypeptide in the above method may be expressed intracellularly or on the cell membrane, or secreted extracellularly. If necessary, the recombinant protein can be isolated and purified by various separation methods using its physical, chemical and other properties. These methods are well known to those skilled in the art. Examples of such methods include, but are not limited to: conventional renaturation treatment, treatment with a protein precipitant (such as salt precipitation), centrifugation, cell lysis by osmosis, sonication, ultracentrifugation, molecular sieve chromatography (gel filtration), adsorption chromatography, ion exchange chromatography, High Performance Liquid Chromatography (HPLC), and other various liquid chromatography techniques, and combinations thereof.

Construction of a somaglutide expression vector

FP-TEV-EK-GLP1(18 or 17) fragments containing the target gene were synthesized, with recognition sites for the restriction enzymes Nco I and Xho I placed at the two ends of each fragment. The sequence was codon-optimized to allow high-level expression of functional protein in Escherichia coli. The expression vector "pBAD/His A (Kana^R)" and a plasmid containing the target gene "FP-TEV-EK-GLP1(18 or 17)" were digested with the restriction enzymes Nco I and Xho I; the digestion products were separated by agarose gel electrophoresis and recovered with an agarose gel DNA recovery kit, and the two DNA fragments were then ligated with T4 DNA ligase. The ligation products were chemically transformed into E. coli Top10 cells, and the transformed cells were cultured overnight on LB agar medium (10 g/L yeast peptone, 5 g/L yeast extract powder, 10 g/L NaCl, 1.5% agar) containing 50 μg/mL kanamycin. Three viable colonies were picked and cultured overnight in 5 mL of liquid LB medium (10 g/L yeast peptone, 5 g/L yeast extract powder, 10 g/L NaCl) containing 50 μg/mL kanamycin, and the plasmid was extracted using a plasmid miniprep kit. The extracted plasmid was then sequenced using the sequencing oligonucleotide primer 5'-ATGCCATAGCATTTTTATCC-3' (SEQ ID NO:15) to confirm correct insertion. The finally obtained plasmid was designated "pBAD-FP-TEV-EK-GLP1(18 or 17)".

Fmoc modification

Polypeptides are increasingly used in biomedicine, and amino acids are the basic raw materials of polypeptide synthesis. All amino acids contain an α-amino group and a carboxyl group, and some also carry reactive side-chain groups such as hydroxyl, amino, guanidino, or heterocyclic groups. The amino group and the reactive side-chain groups therefore need to be protected during peptide coupling, and the protecting groups removed after the polypeptide has been synthesized; otherwise, amino acid misconnection and numerous side reactions can occur.

Fmoc is a base-sensitive protecting group. It can be removed in concentrated aqueous ammonia or dioxane-methanol-4N NaOH (30:9:1), or in 50% dichloromethane solutions of reagents such as piperidine, ethanolamine, cyclohexylamine, 1,4-dioxane, and pyrrolidone.

Fmoc-protecting groups are generally introduced by Fmoc-Cl or Fmoc-OSu under weakly basic conditions such as sodium carbonate or sodium bicarbonate. Fmoc-OSu allows easier control of reaction conditions and fewer side reactions than Fmoc-Cl.

Fmoc absorbs strongly in the ultraviolet, with absorption maxima at 267 nm (ε 18950), 290 nm (ε 5280) and 301 nm (ε 6200), so reactions can be monitored by UV absorption, which is very convenient for automated instrumental polypeptide synthesis. Moreover, Fmoc chemistry is compatible with a wide range of solvents and reagents, has high mechanical stability, and can be used with a variety of carriers and activation modes. The Fmoc protecting group is therefore the most commonly used in polypeptide synthesis today.
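As an illustration of how the absorption coefficients quoted above could be used, the following minimal sketch estimates an Fmoc concentration from a UV reading with the Beer-Lambert law (A = ε·c·l). It assumes that ε is expressed in L·mol⁻¹·cm⁻¹ (the text does not state the unit) and that a 1 cm cuvette is used; the function name and the absorbance reading are hypothetical and are not part of the described method.

```python
# Molar absorption coefficients of Fmoc quoted in the text (wavelength in nm -> epsilon).
# Assumption: units are L/(mol*cm); the text does not state them explicitly.
FMOC_EPSILON = {267: 18950.0, 290: 5280.0, 301: 6200.0}


def fmoc_concentration(absorbance, wavelength_nm, path_cm=1.0):
    """Estimate the Fmoc concentration (mol/L) from A = epsilon * c * l."""
    if wavelength_nm not in FMOC_EPSILON:
        raise ValueError(f"no coefficient listed for {wavelength_nm} nm")
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    return absorbance / (FMOC_EPSILON[wavelength_nm] * path_cm)


if __name__ == "__main__":
    # Hypothetical reading: A = 0.62 at 301 nm in a 1 cm cuvette.
    conc = fmoc_concentration(0.62, 301)
    print(f"estimated Fmoc concentration: {conc * 1e6:.1f} umol/L")
```

The 301 nm band is used in the usage line only as an example; any of the three listed wavelengths can be passed to the function.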

Fmoc-OSu (N-(9-fluorenylmethoxycarbonyloxy)succinimide)

Side chain of somaglutide

tBuO-Ste-Glu(AEEA-AEEA-OSu)-OtBu is the side chain of somaglutide.

The preparation method of somaglutide comprises first obtaining, by genetic recombination, a somaglutide precursor with a Boc-protected lysine at position 17 or 18, and then attaching the somaglutide side chain tBuO-Ste-Glu(AEEA-AEEA-OSu)-OtBu to obtain somaglutide.

Preparation of somaglutide

The invention provides two synthetic routes for somaglutide, which proceed as follows. An Fmoc-modified compound 2 is prepared by reacting the Boc-somaglutide precursor (compound 1) with an Fmoc compound; compound 3 is obtained by removing the Boc protecting group from compound 2; compound 3 is reacted with the activated somaglutide side chain tBuO-Ste-Glu(AEEA-AEEA-OSu)-OtBu to obtain compound 4; the Fmoc group is then removed to obtain compound 5; and finally the tBu protecting groups are removed from the side chain to obtain somaglutide, compound 6.
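The order of protection and deprotection in the route described above can be followed with a small bookkeeping sketch. It tracks which groups are present on the peptide after each step and raises an error if a step tries to remove a group that is not there. The group labels and the function name are illustrative assumptions; the sketch models only the sequence of steps, not the chemistry.

```python
# Each step of the route as (description, groups removed, groups added).
# Group labels are illustrative names chosen for this sketch.
ROUTE = [
    ("compound 2: Fmoc modification", set(), {"Fmoc"}),
    ("compound 3: Boc removal from lysine", {"Boc(Lys)"}, set()),
    ("compound 4: side chain coupling", set(), {"side chain", "tBu"}),
    ("compound 5: Fmoc removal", {"Fmoc"}, set()),
    ("compound 6: side chain tBu removal", {"tBu"}, set()),
]


def run_route(start_groups):
    """Return a list of (description, groups present) after each step."""
    groups = set(start_groups)
    history = []
    for description, removed, added in ROUTE:
        missing = removed - groups
        if missing:
            raise ValueError(f"{description}: cannot remove absent groups {sorted(missing)}")
        groups = (groups - removed) | added
        history.append((description, frozenset(groups)))
    return history


if __name__ == "__main__":
    # Compound 1 carries only the Boc group on the lysine.
    for description, groups in run_route({"Boc(Lys)"}):
        print(f"{description:40s} -> {sorted(groups)}")
```

In this model, compound 3 carries Fmoc while the lysine amine is free, which is the point of the orthogonal protection: the activated side chain is directed to the lysine, and the tBu groups are only introduced after the Boc removal step.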

Specifically, the present invention provides a method for preparing somaglutide, comprising the steps of:

(i) providing a Boc-modified somaglutide precursor;

(ii) reacting the Boc-modified somaglutide precursor with an Fmoc compound to prepare an Fmoc- and Boc-modified somaglutide backbone;

(iv) subjecting the Fmoc-modified somaglutide to Fmoc removal and side chain tBu removal to prepare somaglutide.

The main advantages of the invention include:

(1) The invention produces the Boc-modified somaglutide precursor directly by biosynthesis, without using dilution, ultrafiltration buffer exchange, or similar methods to remove excess inorganic salts from the fermentation supernatant. In the method of the invention, the Boc-somaglutide precursor is separated on a chromatographic column; the yield of this single step is more than 70%, which is 3 times higher than that of the conventional method, and the yield of the Boc-somaglutide precursor is about 800-1000 mg/L. Moreover, the method removes most of the pigments, shortens the original multi-step process, and reduces process time and equipment investment;

(2) Because the lysine at position 20 is protected by Boc, the invention can synthesize somaglutide directly through an orthogonal reaction with Fmoc protection;

(3) The somaglutide synthesized by the method of the invention is free of N-terminal fatty-acid-acylated impurities, which facilitates downstream purification and reduces cost;

(4) Compared with solid-phase synthesis, the method of the invention does not produce racemized impurity polypeptides, does not require large amounts of modified amino acids or organic reagents, causes little environmental pollution, and costs less;

(5) The somaglutide backbone accounts for a high proportion of the fusion protein of the invention (increased fusion ratio); the green fluorescent protein in the fusion protein contains arginine and lysine, so it can be digested by protease into small fragments that differ greatly in molecular weight from the target protein and are easy to separate.

The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. The experimental procedures, in which specific conditions are not noted in the following examples, are generally carried out under conventional conditions or conditions recommended by the manufacturers. Unless otherwise indicated, percentages and parts are by weight.

Example 1 Construction of a somaglutide expression strain

Construction of the somaglutide expression plasmid is described in the examples in patent application No. 201910210102.9. The DNA fragment of fusion protein FP1-TEV-EK-GLP-1(18) or FP2-TEV-EK-GLP-1(17) was cloned into the NcoI-XhoI site downstream of the araBAD promoter of the expression vector plasmid pBAD/His A (available from NTCC, kanamycin resistance) to give plasmid pBAD-FP1-TEV-EK-GLP-1(18) or pBAD-FP2-TEV-EK-GLP-1 (17). The plasmid maps are shown in FIGS. 1 and 2.

Fusion protein 1 and fusion protein 2 were constructed on the basis of the somaglutide precursors with 2-5 amino acids deleted from the N-terminus, shown in SEQ ID NO:1 and SEQ ID NO:2.

The amino acid sequence of the fusion protein 1 is shown as SEQ ID NO: 4:

MVSKGEELFTGVKLTLKFICTTYVQERTISFKDTYKTRAEVKFEGDENLYFQGDDDDKEGTFTSDVSSYLEGQAAKEFIAWLVRGRG

The amino acid sequence of the fusion protein 2 is shown as SEQ ID NO:5:

MVSKGEELFTGVYVQERTISFKDTYKTRAEVKFEGDTLVNRIELKGIDFENLYFQGDDDDKGTFTSDVSSYLEGQAAKEFIAWLVRGRG

Wherein the leader peptide sequence is MVSKGEELFTGV (SEQ ID NO:7)

The sequence of the green fluorescent protein folding unit (FP) is

FP1:KLTLKFICTTYVQERTISFKDTYKTRAEVKFEGD(SEQ ID NO:6,U3-U4-U5)

FP2:YVQERTISFKDTYKTRAEVKFEGDTLVNRIELKGIDF(SEQ ID NO:10,U4-U5-U6)

The TEV enzyme cleavage site is ENLYFQG (SEQ ID NO: 8);

the enzyme cutting site of the enterokinase is DDDDK (SEQ ID NO:9)

The somaglutide precursor with 2-5 amino acids deleted from the N-terminal is shown in SEQ ID NO. 1 or SEQ ID NO. 2.

SEQ ID NO:1: EGTFTSDVSSYLEGQAAKEFIAWLVRGRG

SEQ ID NO:2: GTFTSDVSSYLEGQAAKEFIAWLVRGRG (K is Boc-modified lysine)
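The modular layout of the two fusion proteins can be checked by reassembling them from their parts. The sketch below concatenates the leader peptide, the folding unit (built from the beta-sheet units u3 to u6 of SEQ ID NO:13 to 16), the TEV site, the enterokinase site, and the precursor, then reports the length, the position of the single lysine in each precursor, and the fraction of the fusion protein taken up by the precursor. All sequences are copied from this document; the function names are hypothetical and chosen only for illustration.

```python
LEADER = "MVSKGEELFTGV"  # SEQ ID NO:7
UNITS = {                # beta-sheet units, SEQ ID NO:13 to 16
    "u3": "KLTLKFICTT",
    "u4": "YVQERTISFKD",
    "u5": "TYKTRAEVKFEGD",
    "u6": "TLVNRIELKGIDF",
}
TEV_SITE = "ENLYFQG"     # SEQ ID NO:8
EK_SITE = "DDDDK"        # SEQ ID NO:9
PRECURSOR_1 = "EGTFTSDVSSYLEGQAAKEFIAWLVRGRG"  # SEQ ID NO:1
PRECURSOR_2 = "GTFTSDVSSYLEGQAAKEFIAWLVRGRG"   # SEQ ID NO:2


def folding_unit(*names):
    """Join the named beta-sheet units into one folding unit (FP)."""
    return "".join(UNITS[name] for name in names)


def assemble(fp, precursor):
    """Build leader-FP-TEV-EK-precursor from N-terminus to C-terminus."""
    return LEADER + fp + TEV_SITE + EK_SITE + precursor


def lysine_position(precursor):
    """Return the 1-based position of the single lysine in a precursor."""
    if precursor.count("K") != 1:
        raise ValueError("expected exactly one lysine in the precursor")
    return precursor.index("K") + 1


def precursor_fraction(fusion, precursor):
    """Fraction of the fusion protein length contributed by the precursor."""
    return len(precursor) / len(fusion)


FUSION_1 = assemble(folding_unit("u3", "u4", "u5"), PRECURSOR_1)  # FP1
FUSION_2 = assemble(folding_unit("u4", "u5", "u6"), PRECURSOR_2)  # FP2

if __name__ == "__main__":
    for name, fusion, precursor in (
        ("fusion protein 1", FUSION_1, PRECURSOR_1),
        ("fusion protein 2", FUSION_2, PRECURSOR_2),
    ):
        print(name, len(fusion), lysine_position(precursor),
              f"{precursor_fraction(fusion, precursor):.1%}")
```

The lysine positions found this way correspond to the "(18)" and "(17)" in the plasmid names; counted on the full-length 31-residue backbone of SEQ ID NO:3, the same lysine is residue 20.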

The DNA sequence of pylRs was then cloned into the SpeI-SalI site downstream of the araBAD promoter of the expression vector plasmid pEvol-pBpF (available from NTCC, chloramphenicol resistance), while the DNA sequence of the tRNA (pylTcua) of the lysyl-tRNA synthetase was inserted by PCR downstream of the proK promoter. This plasmid was designated pEvol-pylRs-pylT. The plasmid map is shown in FIG. 3.

The constructed plasmids pBAD-FP1-TEV-EK-GLP-1(18) and pEvol-pylRs-pylT were co-transformed into the Escherichia coli TOP10 strain, and a recombinant strain expressing the somaglutide precursor fusion protein FP1-TEV-EK-GLP-1(18) was obtained by screening.

The constructed plasmids pBAD-FP2-TEV-EK-GLP-1(17) and pEvol-pylRs-pylT were co-transformed into the Escherichia coli TOP10 strain, and a recombinant strain expressing the somaglutide precursor fusion protein FP2-TEV-EK-GLP-1(17) was obtained by screening.

Example 2 Expression of Boc-somaglutide precursor

The seed cultures of the two recombinant Escherichia coli strains were each inoculated into fermentation medium at an inoculum of 5% and cultured in batch mode at 37 °C and pH 7.0. When the pH rose to 7.05, feeding of the carbon and nitrogen sources was started separately, according to the constant-pH method. After feeding began, 7.5 M aqueous ammonia was added automatically to keep the pH at 7.0-7.2. After 4-6 h of culture, L-arabinose was added and induction was continued for 14 ± 2 h. Two fermentation broths containing the somaglutide precursor fusion proteins were obtained.

Example 3 Preparation of Boc-somaglutide precursor inclusion bodies

The two fermentation broths obtained in Example 2 were centrifuged, and the wet cells were mixed with cell lysis buffer at a ratio of 1:1 and suspended for 3 h. The cells were then disrupted with a high-pressure homogenizer, and the inclusion bodies were collected by centrifugation, washed with buffer, and weighed. The inclusion body yields of the two fusion proteins were 39-43 g/L and 41-45 g/L, respectively.

The results of SDS-PAGE are shown in FIG. 4.

Example 4 Denaturation and enzymatic cleavage of Boc-somaglutide precursor inclusion bodies

To the inclusion bodies obtained in Example 3, a dissolution buffer containing 8 mol/L urea was added at a weight/volume ratio of 1:15, and the mixture was stirred at room temperature until dissolved. The protein concentration was measured by the Bradford method, the total protein concentration of the inclusion body solution was adjusted to about 20 mg/mL, and the pH was adjusted to 9.0 ± 1.0 with NaOH. The inclusion body solution was added dropwise to renaturation buffer to dilute it 5-10 fold; the pH of the fusion protein renaturation solution was maintained at 9.0-10.0, the temperature was controlled at 4-8 °C, and renaturation proceeded for 10-20 h.
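The dilution arithmetic of this step can be sketched as follows. The example assumes that "dilute 5-10 fold" means the final volume is 5 to 10 times the volume of the inclusion body solution; the function names and the 100 mL batch volume are hypothetical and serve only to show the calculation.

```python
def diluted_concentration(stock_mg_per_ml, fold):
    """Protein concentration (mg/mL) after a `fold`-fold dilution."""
    if fold <= 0:
        raise ValueError("dilution factor must be positive")
    return stock_mg_per_ml / fold


def buffer_volume(solution_ml, fold):
    """Renaturation buffer (mL) needed so that final volume = fold * solution volume."""
    if fold < 1:
        raise ValueError("dilution factor must be at least 1")
    return solution_ml * (fold - 1)


if __name__ == "__main__":
    stock = 20.0  # mg/mL, total protein of the inclusion body solution (from the text)
    for fold in (5, 10):
        print(f"{fold}-fold: {diluted_concentration(stock, fold)} mg/mL, "
              f"{buffer_volume(100.0, fold)} mL buffer per 100 mL solution")
```

Under this assumption, a 20 mg/mL solution ends up at 2 to 4 mg/mL total protein during renaturation.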

The results showed that, after solubilization, the proportions of fusion protein 1 and fusion protein 2 were about 30% and 33%, respectively.

Example 5 Primary purification of Boc-somaglutide precursor fusion protein

The fusion protein renaturation solution obtained in Example 4 was filtered through a 0.45 μm membrane to remove undissolved material; the fusion protein was then preliminarily purified on an anion exchange column, exploiting the differences in protein isoelectric points.

The experimental results show that, after anion exchange chromatography, the purities of the Boc-somaglutide precursor fusion protein 1 and fusion protein 2 both exceed 65%, the loading capacity is about 18 mg/mL, and the yield is more than 80%.

Example 6 Enzymatic cleavage of Boc-somaglutide precursor fusion protein

The Boc-somaglutide precursor fusion protein preliminarily purified in Example 5 was desalted, the pH was adjusted to 7.5-8.5, the temperature was controlled at 18-25 °C, and enterokinase was added for 8-24 h of digestion to obtain the Boc-somaglutide precursor. The yields of Boc-somaglutide precursor 1 and precursor 2 were about 0.9 g/L and 1.2 g/L, respectively, and the cleavage efficiency was ≥95%.
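The outcome of the cleavage step can be illustrated on the sequence level. The sketch below is a simplified model that assumes enterokinase cuts immediately after the lysine of every DDDDK motif; real enzyme specificity is more nuanced, and the function name is hypothetical. The fusion protein sequence is the one given for fusion protein 1 (SEQ ID NO:4).

```python
EK_SITE = "DDDDK"  # enterokinase recognition sequence (SEQ ID NO:9)

# Fusion protein 1 as given in the text (SEQ ID NO:4).
FUSION_1 = (
    "MVSKGEELFTGVKLTLKFICTTYVQERTISFKDTYKTRAEVKFEGD"
    "ENLYFQGDDDDKEGTFTSDVSSYLEGQAAKEFIAWLVRGRG"
)


def enterokinase_cleave(protein):
    """Split a sequence after every DDDDK motif and return the fragments in order."""
    fragments = []
    start = 0
    pos = protein.find(EK_SITE, start)
    while pos != -1:
        cut = pos + len(EK_SITE)          # cut right after the lysine of the motif
        fragments.append(protein[start:cut])
        start = cut
        pos = protein.find(EK_SITE, start)
    if start < len(protein):              # keep the C-terminal remainder, if any
        fragments.append(protein[start:])
    return fragments


if __name__ == "__main__":
    tag, precursor = enterokinase_cleave(FUSION_1)
    print("released precursor:", precursor, f"({len(precursor)} residues)")
```

In this model the C-terminal fragment is the 29-residue precursor of SEQ ID NO:1, and the N-terminal fragment carries the leader peptide, the folding unit, and both protease sites.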

Example 7 Reverse phase chromatography of Boc-somaglutide precursor

The Boc-somaglutide precursor was purified by reverse phase chromatography, exploiting the difference in hydrophobicity between the polypeptide and the proteins, to remove most of the foreign protein.

Dilute hydrochloric acid was added to the enzyme-cleaved solutions of Boc-somaglutide precursor 1 and precursor 2 obtained in Example 6 to adjust the pH to 2.0-3.0. After the solution was clarified by filtration through a 0.45 μm membrane, an appropriate amount of acetonitrile was added and the solution was separated and purified by reverse phase chromatography.

An aqueous solution containing trifluoroacetic acid was used as mobile phase A, and an acetonitrile solution containing trifluoroacetic acid as mobile phase B. The Boc-somaglutide precursor was bound to the packing at a loading of not more than 10 mg/mL, eluted with a gradient, and collected. The experimental results show that the purity of the Boc-somaglutide precursor 1 and precursor 2 collected by reverse phase chromatography is ≥90% and the yield is more than 80%. An HPLC chromatogram of the purified Boc-somaglutide precursor is shown in FIG. 5.

Example 8 Preparation of somaglutide using Boc-somaglutide precursor 1 (Fmoc-H-Aib, route 1)

Boc-somaglutide precursor 1 (compound 1) obtained in Example 7 was taken (30 mg as an example of the feed amount), Fmoc-H-Aib, DIPEA and DMF were added in the molar ratios given in Table 1, and the reaction was run for 8-12 hours to prepare the Fmoc- and Boc-protected somaglutide backbone. A mixed solution of methyl tert-butyl ether and petroleum ether was added to the reaction solution, and the precipitate was collected by centrifugation and washed 2-3 times with methyl tert-butyl ether to obtain the Fmoc-protected compound 2: Fmoc-GLP-1(Lys²⁰Boc).

TABLE 1 Molar ratio of the feeds

	Boc-somaglutide precursor	Fmoc-H-Aib	DIPEA	DMF
Equivalents or volume	1.0 eq	2.5 eq	12 eq	1 V
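The equivalents in Table 1 translate into reagent amounts once the molar mass of the precursor is known. The sketch below shows the conversion only; the function name is hypothetical, and the molar mass in the usage line is a round placeholder chosen for easy arithmetic, not the actual molar mass of the Boc-somaglutide precursor, which is not stated in the text.

```python
# Molar equivalents from Table 1 (DMF is given as a volume and is not included).
TABLE_1_EQUIVALENTS = {
    "Boc-somaglutide precursor": 1.0,
    "Fmoc-H-Aib": 2.5,
    "DIPEA": 12.0,
}


def reagent_micromoles(substrate_mg, substrate_mw, equivalents):
    """Micromoles of each reagent for a substrate mass (mg) and molar mass (g/mol)."""
    if substrate_mw <= 0:
        raise ValueError("molar mass must be positive")
    substrate_umol = substrate_mg / substrate_mw * 1000.0  # mg / (g/mol) = mmol
    return {name: eq * substrate_umol for name, eq in equivalents.items()}


if __name__ == "__main__":
    # 30 mg feed as in the example; 3000 g/mol is a PLACEHOLDER molar mass.
    for name, umol in reagent_micromoles(30.0, 3000.0, TABLE_1_EQUIVALENTS).items():
        print(f"{name}: {umol:.1f} umol")
```

With the real molar mass substituted for the placeholder, the same function gives the amounts of Fmoc-H-Aib and DIPEA to weigh out for any feed mass.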

TFA solution was added to the crudely purified compound 2 and the mixture was stirred at low temperature for 0.5-2.0 h. A mixed solution of methyl tert-butyl ether and petroleum ether (15-20 volumes) was added to the reaction solution, and the precipitate was collected by centrifugation and washed 2-3 times with the mixed solution to give the Boc-deprotected solid compound 3: Fmoc-GLP-1(Lys²⁰NH₂).

DMF and 12 eq of DIPEA were added to the Boc-deprotected compound 3, and the mixture was stirred gently at room temperature for 5 min. 2.5 eq of tBuO-Ste-Glu(AEEA-AEEA-OSu)-OtBu dissolved in DMF was added to the resulting mixture, and the reaction mixture was shaken gently at room temperature for 2-3 h. A mixed solution of methyl tert-butyl ether and petroleum ether (15-20 volumes) was added to the reaction system, and the precipitate was collected by centrifugation, washed 2-3 times with the mixed solution, and dried in vacuum to obtain compound 4: Fmoc-GLP-1-(tBuO-Ste-Glu(AEEA-AEEA)-OtBu)(20).

A DMF solution containing 20% piperidine was added to compound 4, and the reaction was run at room temperature for 0.5-2.0 h. A mixed solvent of methyl tert-butyl ether and petroleum ether was then added to the reaction system, and the precipitate was collected by centrifugation and washed 3-5 times with the mixed solvent of methyl tert-butyl ether and petroleum ether to obtain the Fmoc-deprotected compound 5: GLP-1-(tBuO-Ste-Glu(AEEA-AEEA)-OtBu)(20).

A mixed solution of TFA (trifluoroacetic acid), TIS (triisopropylsilane) and DCM (dichloromethane) was added to compound 5, and the mixture was shaken at room temperature for 2-4 hours to remove the side chain tBu protecting groups. A mixed solvent of methyl tert-butyl ether and petroleum ether (10-20 volumes) was added to the reaction system, and the precipitate was collected by centrifugation and washed 3 times with the mixed solvent of methyl tert-butyl ether and petroleum ether to obtain the final product. After HPLC purification, the somaglutide obtained has a purity of more than 98%.

Example 9 Preparation of somaglutide using Boc-somaglutide precursor 2 (Fmoc-H-Aib-E, route 2)

Boc-somaglutide precursor 2 (compound 7) obtained in Example 7 was taken (30 mg as an example of the feed amount), Fmoc-H-Aib-E, DIPEA and DMF were added in the molar ratios given in Table 2, and the reaction was run for 8-12 hours to prepare the Fmoc- and Boc-protected somaglutide backbone. A mixed solution of methyl tert-butyl ether and petroleum ether was added to the reaction solution, and the precipitate was collected by centrifugation and washed 2-3 times with methyl tert-butyl ether to obtain the Fmoc-protected compound 2: Fmoc-GLP-1(Lys²⁰Boc).

TABLE 2 molar ratio of feeds

TFA solution was added to the crudely purified compound 2 and the mixture was stirred at low temperature for 0.5-2.0 h. A mixed solution of methyl tert-butyl ether and petroleum ether (15-20 volumes relative to the reaction system) was added to the reaction solution, and the precipitate was collected by centrifugation and washed 2-3 times with the mixed solution to give the Boc-deprotected solid compound 3: Fmoc-GLP-1(Lys²⁰NH₂).

Comparative example

Construction and expression of the fusion protein expression strain were carried out in a manner similar to that in Examples 1 to 3, except that the amino acid sequence of the fusion protein used for expression was as shown in SEQ ID NO:22.

MKKLLFAIPLVVPFYSHSTMELEICSWYHMGIRSFLEQKLISEEDLNSAVDDDDDKEGTFTSDVSSYLEGQAAKEFIAWLVRGRG (SEQ ID NO:22)

The above fusion protein contains the gIII signal peptide. The results showed that the inclusion body yield was 30 g (wet weight).

The results show that, compared with the fusion protein of conventional structure, the expression level of the fusion protein of the present invention is significantly improved.

All documents referred to herein are incorporated by reference into this application as if each were individually incorporated by reference. Furthermore, it should be understood that various changes and modifications of the present invention can be made by those skilled in the art after reading the above teachings of the present invention, and these equivalents also fall within the scope of the present invention as defined by the appended claims.

Sequence listing

<110> Ningbo Kunpeng Biotech Co Ltd

<120> preparation method of somaglutide

<130> P2020-0743

<160> 22

<170> SIPOSequenceListing 1.0

<210> 1

<211> 29

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 1

Glu Gly Thr Phe Thr Ser Asp Val Ser Ser Tyr Leu Glu Gly Gln Ala

1 5 10 15

Ala Lys Glu Phe Ile Ala Trp Leu Val Arg Gly Arg Gly

20 25

<210> 2

<211> 28

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 2

Gly Thr Phe Thr Ser Asp Val Ser Ser Tyr Leu Glu Gly Gln Ala Ala

1 5 10 15

Lys Glu Phe Ile Ala Trp Leu Val Arg Gly Arg Gly

20 25

<210> 3

<211> 31

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 3

His Xaa Glu Gly Thr Phe Thr Ser Asp Val Ser Ser Tyr Leu Glu Gly

1 5 10 15

Gln Ala Ala Lys Glu Phe Ile Ala Trp Leu Val Arg Gly Arg Gly

20 25 30

<210> 4

<211> 87

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 4

Met Val Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Lys Leu Thr Leu

1 5 10 15

Lys Phe Ile Cys Thr Thr Tyr Val Gln Glu Arg Thr Ile Ser Phe Lys

20 25 30

Asp Thr Tyr Lys Thr Arg Ala Glu Val Lys Phe Glu Gly Asp Glu Asn

35 40 45

Leu Tyr Phe Gln Gly Asp Asp Asp Asp Lys Glu Gly Thr Phe Thr Ser

50 55 60

Asp Val Ser Ser Tyr Leu Glu Gly Gln Ala Ala Lys Glu Phe Ile Ala

65 70 75 80

Trp Leu Val Arg Gly Arg Gly

85

<210> 5

<211> 89

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 5

Met Val Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Tyr Val Gln Glu

1 5 10 15

Arg Thr Ile Ser Phe Lys Asp Thr Tyr Lys Thr Arg Ala Glu Val Lys

20 25 30

Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly Ile Asp

35 40 45

Phe Glu Asn Leu Tyr Phe Gln Gly Asp Asp Asp Asp Lys Gly Thr Phe

50 55 60

Thr Ser Asp Val Ser Ser Tyr Leu Glu Gly Gln Ala Ala Lys Glu Phe

65 70 75 80

Ile Ala Trp Leu Val Arg Gly Arg Gly

85

<210> 6

<211> 34

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 6

Lys Leu Thr Leu Lys Phe Ile Cys Thr Thr Tyr Val Gln Glu Arg Thr

1 5 10 15

Ile Ser Phe Lys Asp Thr Tyr Lys Thr Arg Ala Glu Val Lys Phe Glu

20 25 30

Gly Asp

<210> 7

<211> 12

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 7

Met Val Ser Lys Gly Glu Glu Leu Phe Thr Gly Val

1 5 10

<210> 8

<211> 7

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 8

Glu Asn Leu Tyr Phe Gln Gly

1 5

<210> 9

<211> 5

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 9

Asp Asp Asp Asp Lys

1 5

<210> 10

<211> 37

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 10

Tyr Val Gln Glu Arg Thr Ile Ser Phe Lys Asp Thr Tyr Lys Thr Arg

1 5 10 15

Ala Glu Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu

20 25 30

Lys Gly Ile Asp Phe

35

<210> 11

<211> 13

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 11

Val Pro Ile Leu Val Glu Leu Asp Gly Asp Val Asn Gly

1 5 10

<210> 12

<211> 14

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 12

His Lys Phe Ser Val Arg Gly Glu Gly Glu Gly Asp Ala Thr

1 5 10

<210> 13

<211> 10

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 13

Lys Leu Thr Leu Lys Phe Ile Cys Thr Thr

1 5 10

<210> 14

<211> 11

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 14

Tyr Val Gln Glu Arg Thr Ile Ser Phe Lys Asp

1 5 10

<210> 15

<211> 13

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 15

Thr Tyr Lys Thr Arg Ala Glu Val Lys Phe Glu Gly Asp

1 5 10

<210> 16

<211> 13

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 16

Thr Leu Val Asn Arg Ile Glu Leu Lys Gly Ile Asp Phe

1 5 10

<210> 17

<211> 10

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 17

His Asn Val Tyr Ile Thr Ala Asp Lys Gln

1 5 10

<210> 18

<211> 14

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 18

Gly Ile Lys Ala Asn Phe Lys Ile Arg His Asn Val Glu Asp

1 5 10

<210> 19

<211> 14

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 19

Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly

1 5 10

<210> 20

<211> 12

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 20

His Tyr Leu Ser Thr Gln Ser Val Leu Ser Lys Asp

1 5 10

<210> 21

<211> 13

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 21

His Met Val Leu Leu Glu Phe Val Thr Ala Ala Gly Ile

1 5 10

<210> 22

<211> 85

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 22

Met Lys Lys Leu Leu Phe Ala Ile Pro Leu Val Val Pro Phe Tyr Ser

1 5 10 15

His Ser Thr Met Glu Leu Glu Ile Cys Ser Trp Tyr His Met Gly Ile

20 25 30

Arg Ser Phe Leu Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Asn Ser

35 40 45

Ala Val Asp Asp Asp Asp Asp Lys Glu Gly Thr Phe Thr Ser Asp Val

50 55 60

Ser Ser Tyr Leu Glu Gly Gln Ala Ala Lys Glu Phe Ile Ala Trp Leu

65 70 75 80

Val Arg Gly Arg Gly

85

Claims

1. A method of preparing somaglutide, comprising the steps of:

wherein the somaglutide fusion protein has a structure shown in formula I from N end to C end:

A-FP-TEV-EK-G (I)

in formula (I),

"-" represents a peptide bond;

A is absent or a leader peptide sequence;

FP is a green fluorescent protein folding unit;

G is a somaglutide precursor or a fragment thereof;

beta-sheet unit	Amino acid sequence
u1	VPILVELDGDVNG (SEQ ID NO:11)
u2	HKFSVRGEGEGDAT (SEQ ID NO:12)
u3	KLTLKFICTT (SEQ ID NO:13)
u4	YVQERTISFKD (SEQ ID NO:14)
u5	TYKTRAEVKFEGD (SEQ ID NO:15)
u6	TLVNRIELKGIDF (SEQ ID NO:16)
u7	HNVYITADKQ (SEQ ID NO:17)
u8	GIKANFKIRHNVED (SEQ ID NO:18)
u9	VQLADHYQQNTPIG (SEQ ID NO:19)
u10	HYLSTQSVLSKD (SEQ ID NO:20)
u11	HMVLLEFVTAAGI (SEQ ID NO:21).

2. The method of claim 1, wherein said step (B) further comprises the steps of:

3. The method of claim 2, wherein in step (i), the enzymatic cleavage is performed using enterokinase.

4. The method of claim 2, wherein the Boc-modified somaglutide precursor comprises:

a first somaglutide precursor with a Boc-modified lysine at position 18, the amino acid sequence of which is shown as SEQ ID NO:1;

5. The method of claim 2, wherein said Fmoc compound is Fmoc-H-Aib or Fmoc-H-Aib-E.

6. The method of claim 2, wherein the reaction of step (ii) is as follows:

7. The method of claim 2, wherein the side chain of the somaglutide is as follows:

8. The method of claim 2, wherein step (iii) further comprises the steps of:

9. The method of claim 1, wherein the green fluorescent protein folding unit is u2-u3, u4-u5, u1-u2-u3, u3-u4-u5, or u4-u5-u6.

10. A somaglutide formulation prepared using the method of claim 1.