WO2022094146A1

WO2022094146A1 - Methods of converting precursor proteins to mature proteins using kex2 proteases

Info

Publication number: WO2022094146A1
Application number: PCT/US2021/057142
Authority: WO
Inventors: Michael Hecht
Original assignee: The Trustees Of Princeton University; ZARZHITSKY, Shlomo
Priority date: 2020-10-28
Filing date: 2021-10-28
Publication date: 2022-05-05

Abstract

The disclosure provides, in various embodiments, methods of converting a precursor protein (e.g., a prohormone) to a mature protein (e.g., a hormone), and methods of obtaining a target protein (e.g., from a fusion protein), using a KEX2 protease. In various embodiments, the disclosure further provides fusion proteins, polynucleotides, vectors, host cells and kits that are useful for performing the methods of the disclosure.

Description

METHODS OF CONVERTING PRECURSOR PROTEINS TO MATURE PROTEINS USING KEX2 PROTEASES

RELATED APPLICATION

[0001] This application claims the benefit of U.S. Provisional Application No. 63/106,676, filed on October 28, 2020. The entire teachings of this application are incorporated herein by reference.

INCORPORATION BY REFERENCE OF MATERIAL IN ASCII TEXT FILE

[0002] This application incorporates by reference the Sequence Listing contained in the following ASCII text file being submitted concurrently herewith: a) File name: 53911030001SequenceListing.txt; created October 28, 2021, 33,000 Bytes in size.

BACKGROUND

[0003] It was estimated in 2018 that approximately 7.4 million Americans with diabetes required the administration of insulin. Insulin is a protein hormone that enables the regulation of blood sugar in the body. Impaired function or lack of insulin, as observed with patients suffering from type I or type II diabetes, could lead to lethal consequences.

[0004] The insulin molecule is composed of two peptide chains, A and B, held together by several covalent bonds formed by cysteine residues. During the process of production, the insulin molecule begins as a prohormone, containing a connecting chain C, that holds the A and B chains together to enable the folding of the protein and the formation of correct disulfide bonds. The A and B chains of insulin joined by the C chain are referred to as the insulin prohormone, or proinsulin, which requires a conversion step to release the mature insulin from the connecting C chain. This process happens in the pancreas in vivo.

[0005] To accommodate specific pharmacokinetic and pharmacodynamic needs, several insulin analogs have been developed. One of these analogs is the dibasic insulin glargine, which is sold under the trade name of LANTUS® insulin, among others. Two modifications to the wildtype insulin were introduced to make the insulin glargine molecule. One modification is the substitution of Asn with Gly at position A21 at the end of the A chain, and the other is the addition of two Arg residues at the end of the B chain.

[0006] The most utilized enzyme in the in vitro conversion of proinsulin to insulin is trypsin. Trypsin recognizes and cleaves at the C-terminus of Lys and Arg residues. The current state-of-the-art process for producing glargine involves chemical blockage of certain amino acids, enzymatic processing using trypsin, and finally de-blocking. However, since the end of the B chain of glargine contains the sequence KTRR-COOH (i.e., Lys-Thr-Arg-Arg-COOH), in vitro conversion of glargine with trypsin is inefficient and yields several byproducts, such as desB32R and desB30TRR. Thus, the overall yields achieved by this tedious process are low, and purification is required to eliminate byproducts. To increase the efficiency of trypsin cleavage and improve yields of mature glargine, it has been proposed to chemically block B29K. Although yields of insulin glargine can be increased by doing so, this chemical blockage does not prevent the formation of undesirable byproducts.
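By way of non-limiting illustration, the selectivity problem described above can be sketched in a few lines of Python. The function name `cleavage_positions` and the simplified rule (a cut is possible after every Lys or Arg, ignoring sequence context) are assumptions made for this example and are not part of the disclosure.

```python
def cleavage_positions(seq, residues="KR"):
    """Return 1-based positions after which a protease that cuts C-terminal
    to any residue in `residues` could cleave (simplified, context-free rule)."""
    return [i + 1 for i, aa in enumerate(seq) if aa in residues]

# C-terminal tetrapeptide of the glargine B chain, B29-B32 (from the text).
tail = "KTRR"
first_residue = 29  # B29 is the Lys of KTRR

# Convert positions within the tetrapeptide to B-chain numbering.
sites = [first_residue + p - 1 for p in cleavage_positions(tail)]

# A cut after the last residue (B32) leaves the B chain intact;
# cuts after earlier residues truncate it.
last_residue = first_residue + len(tail) - 1
truncating = [s for s in sites if s < last_residue]
print(truncating)
```

Under this simplified rule, cuts after B29 and B31 both fall inside the B chain, which corresponds to the truncated byproducts named above (loss of B30-B32 and loss of B32, respectively).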

[0007] Accordingly, there is a need for efficient methods for processing proinsulin, glargine, and other prohormones.

SUMMARY

[0008] The present disclosure generally relates to methods, compositions, and kits that are useful for processing precursor proteins (e.g., prohormones) to obtain mature proteins (e.g., mature hormones) using KEX2 proteases.

[0009] Accordingly, the disclosure provides, in various embodiments, a method of converting a precursor protein to a mature protein. The method comprises providing a precursor protein that is to be converted to a mature protein, wherein the precursor protein comprises at least one cleavage site for a KEX2 protease, and contacting the precursor protein with a KEX2 protease comprising an amino acid sequence having at least 70% sequence identity to SEQ ID NO: 1 under conditions in which cleavage of the precursor protein by the KEX2 protease occurs to provide a mature protein.

[0010] Also provided herein, in various embodiments, is a method of obtaining a target protein, comprising providing a precursor protein that comprises a target protein and at least one cleavage site for a KEX2 protease; and contacting the precursor protein with a KEX2 protease comprising an amino acid sequence having at least 70% sequence identity to SEQ ID NO: 1 under conditions in which cleavage of the precursor protein by the KEX2 protease occurs and releases the target protein. In some embodiments, the method further comprises isolating the target protein.

[0011] In various embodiments, the present disclosure further provides a fusion protein comprising a DEEP fusion tag, a target protein, and at least one KEX2 protease cleavage site, as well as polynucleotides, vectors, and host cells encoding and/or expressing the fusion protein.

[0012] Also provided herein, in various embodiments, is a target protein obtained by a method of the disclosure.

[0013] The present disclosure further provides, in various embodiments, a kit comprising a polynucleotide that comprises a nucleotide sequence encoding a DEEP fusion tag and a cloning site for introducing a nucleotide sequence encoding a target protein to form a fusion protein. The kit also comprises a KEX2 protease that comprises an amino acid sequence having at least 70% sequence identity to SEQ ID NO: 1.

BRIEF DESCRIPTION OF THE DRAWINGS

[0014] The foregoing will be apparent from the following more particular description of example embodiments.

[0015] FIG. 1 depicts an alignment of example DEEP-glargine precursor proteins disclosed herein. Box 1 indicates linker amino acids. Boxes 2, 4, and 5 indicate KEX2 protease cleavage sites. Box 3 indicates glargine B chain polypeptide (SEQ ID NO:24). Box 6 indicates glargine A chain polypeptide (SEQ ID NO:23). The unbounded regions of SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28 and SEQ ID NO:29, located between boxes 4 and 5, correspond to DEEP fusion tags having amino acid sequences of SEQ ID NO:5, SEQ ID NO:9, SEQ ID NO: 13 and SEQ ID NO: 17, respectively.

[0016] FIG. 2 is a graph showing the concentration of processed (mature active) insulin glargine after treatment of a DEEP-glargine precursor (SEQ ID NO:27) with various molar ratios of KEX2 enzyme at varying CaCl₂ concentrations. Concentration was calculated based on injection of 0.03 mg/ml LANTUS® insulin (AUC=386).
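As a non-limiting sketch of how a concentration might be derived from the reference injection mentioned for FIG. 2, the following Python snippet applies a single-point calibration. The assumption that the detector response is linear and passes through the origin, the function name `concentration_from_auc`, and the sample AUC value are all hypothetical and are not stated in the disclosure.

```python
STANDARD_CONC_MG_PER_ML = 0.03  # reference injection (from the figure description)
STANDARD_AUC = 386.0            # peak area of the reference (from the figure description)

def concentration_from_auc(sample_auc):
    """Estimate concentration (mg/ml) by single-point calibration,
    assuming a linear response through the origin (an assumption)."""
    if sample_auc < 0:
        raise ValueError("AUC must be non-negative")
    return sample_auc * STANDARD_CONC_MG_PER_ML / STANDARD_AUC

# Hypothetical sample peak area, chosen only to show the arithmetic.
example = concentration_from_auc(193.0)
print(f"{example:.3f} mg/ml")
```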

DETAILED DESCRIPTION

[0017] A description of example embodiments follows.

[0018] When introducing elements disclosed herein, the articles “a,” “an,” “the,” and “said” are intended to mean that there are one or more of the elements. The terms “comprising,” “having” and “including” are intended to be open-ended and mean that there may be additional elements other than the listed elements.

[0019] The term “about,” when referring to a measurable value, such as an amount, refers to variations of ± 20%, e.g., in some embodiments, ± 10%, ± 5%, ± 1% or ± 0.1% from the specified value.

[0020] “Protein,” “peptide” and “polypeptide” are used interchangeably herein to denote a polymer of at least two amino acids covalently linked by an amide bond, regardless of length or post-translational modification (e.g., glycosylation or phosphorylation). A protein, peptide or polypeptide can comprise any suitable L- and/or D-amino acid, for example, common α-amino acids (e.g., alanine, glycine, valine), non-α-amino acids (e.g., β-alanine, 4-aminobutyric acid, 6-aminocaproic acid, sarcosine, statine), and unusual amino acids (e.g., citrulline, homocitrulline, homoserine, norleucine, norvaline, ornithine). The amino, carboxyl and/or other functional groups on a peptide can be free (e.g., unmodified) or protected with a suitable protecting group. Suitable protecting groups for amino and carboxyl groups, and methods for adding or removing protecting groups are known in the art and are disclosed in, for example, Green and Wuts, “Protecting Groups in Organic Synthesis,” John Wiley and Sons, 1991. The functional groups of a protein, peptide or polypeptide can also be derivatized (e.g., alkylated) or labeled (e.g., with a detectable label, such as a fluorogen or a hapten) using methods known in the art. A protein, peptide or polypeptide can comprise one or more modifications (e.g., amino acid linkers, acylation, acetylation, amidation, methylation, terminal modifiers (e.g., cyclizing modifications), N-methyl-α-amino group substitution), if desired. In addition, a protein, peptide or polypeptide can be an analog of a known and/or naturally-occurring peptide, for example, a peptide analog having conservative amino acid residue substitution(s).

[0021] The term “nucleic acid” is used herein to refer to a polymer comprising multiple nucleotide monomers (e.g., ribonucleotide monomers or deoxyribonucleotide monomers). “Nucleic acid” includes, for example, DNA (e.g., cDNA), RNA, and DNA-RNA hybrid molecules. Nucleic acid molecules can be naturally occurring, recombinant, or synthetic. In addition, nucleic acid molecules can be single-stranded, double-stranded or triple-stranded. In some aspects, nucleic acid molecules can be modified. Nucleic acid modifications include, for example, methylation, substitution of one or more of the naturally occurring nucleotides with a nucleotide analog, internucleotide modifications such as uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoamidates, carbamates, and the like), charged linkages (e.g., phosphorothioates, phosphorodithioates, and the like), pendent moieties (e.g., polypeptides), intercalators (e.g., acridine, psoralen, and the like), chelators, alkylators, and modified linkages (e.g., alpha anomeric nucleic acids, and the like). In the case of a double-stranded polymer, “nucleic acid” can refer to either or both strands of the molecule.

[0022] Methods of the Disclosure

[0023] The present disclosure is based, at least in part, on the discovery that cleavage of a prohormone of an insulin analog by KEX2 protease can facilitate production of the mature form of the insulin analog, with high efficiency and without producing undesired byproducts that result from alternative known methods.

[0024] Accordingly, in various embodiments, the present disclosure provides a method of converting a precursor protein into a mature protein by contacting the precursor protein with a KEX2 protease. The method comprises providing a precursor protein that is to be converted to a mature protein, wherein the precursor protein comprises at least one cleavage site for a KEX2 protease, and contacting (e.g., incubating in vitro) the precursor protein with a KEX2 protease under conditions in which cleavage of the precursor protein by the KEX2 protease occurs to provide a mature protein. In particular embodiments, the methods of the disclosure are useful for converting a prohormone to a mature hormone using KEX2 protease (e.g., a yeast KEX2 protease). In some embodiments, the prohormone is insulin prohormone, also known as proinsulin. In some embodiments, the prohormone is insulin glargine prohormone, also referred to as proinsulin glargine or glargine proinsulin or proglargine.

[0025] In various other embodiments, the present disclosure provides a method of obtaining a target protein by contacting a precursor protein with a KEX2 protease. The method comprises providing a precursor protein that comprises a target protein and at least one cleavage site for a KEX2 protease, and contacting the precursor protein with a KEX2 protease under conditions in which cleavage of the precursor protein by the KEX2 protease occurs and releases the target protein. In some embodiments, the method further comprises isolating the target protein. In some embodiments, the precursor protein is a fusion protein comprising the target protein. As used herein, the term “target protein” refers to a peptide or polypeptide whose expression in a host system is desired. Such proteins are also referred to herein as proteins of interest, or POIs. Examples of such proteins that can be included in precursor or fusion proteins of the disclosure are green fluorescent protein (GFP), amyloid beta (Aβ) polypeptide, Trp Cage protein, LS3 polypeptide, insulin A chain polypeptide, and insulin B chain polypeptide. A further example of a target protein that can be included in precursor or fusion proteins of the disclosure is insulin, or a fragment thereof, such as an insulin A chain polypeptide, an insulin B chain polypeptide or an insulin C chain polypeptide, or a combination thereof. Another example of a target protein that can be included in the precursor or fusion proteins of the disclosure is glargine, or a fragment thereof.

[0026] In some embodiments, the disclosure provides a target protein obtained by a method disclosed herein.

[0027] Suitable KEX2 proteases for use in the methods disclosed herein include the KEX2 protease from S. cerevisiae, NCBI Reference Sequence: NP_014161.1 (SEQ ID NO: 1), and KEX2 proteases comprising the amino acid sequence of SEQ ID NO: 1, or a variant amino acid sequence thereof (e.g., a variant amino acid sequence having at least about 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater amino acid sequence identity to wild type S. cerevisiae KEX2 protease). KEX2 proteases can be naturally occurring (e.g., isolated, purified, extracted from a natural source, such as S. cerevisiae or another organism that expresses a KEX2 protease endogenously), recombinant, or synthetic. Recombinant yeast KEX2 proteases, including recombinant S. cerevisiae KEX2 protease, can be obtained commercially from a variety of sources. In various embodiments, a KEX2 protease comprises, consists essentially of, or consists of the amino acid sequence of SEQ ID NO: 1.

[0028] As used herein, the term “sequence identity” means that two nucleotide or amino acid sequences, when optimally aligned, such as by the programs GAP or BESTFIT using default gap weights, share at least, e.g., 70% sequence identity, or at least 80% sequence identity, or at least 85% sequence identity, or at least 90% sequence identity, or at least 95% sequence identity or more.

[0029] For sequence comparison, one sequence acts as a reference sequence (e.g., parent sequence) to which one or more test sequences are compared. The sequence identity comparison can be examined throughout the entire length of a given protein, or within a desired fragment of a given protein. When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters.
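By way of non-limiting illustration, the percent sequence identity calculation described above can be sketched in Python for two sequences that have already been aligned. The function names and the choice of denominator (all aligned columns that are not gap-only) are assumptions made for this example; other conventions, such as dividing by the length of the reference sequence, give different values.

```python
def percent_identity(aligned_ref, aligned_test):
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumption: identity = identical residue columns / aligned columns,
    where columns that are gaps in both sequences are ignored.
    """
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_ref, aligned_test)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)

def meets_threshold(aligned_ref, aligned_test, threshold=70.0):
    """True if the test sequence has at least `threshold` percent identity."""
    return percent_identity(aligned_ref, aligned_test) >= threshold

# Hypothetical toy alignments, not sequences from the Sequence Listing.
print(percent_identity("KTRR", "KTRK"))
print(meets_threshold("ACDEF", "ACD-F"))
```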

[0030] Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by visual inspection (see generally Ausubel et al., Current Protocols in Molecular Biology). One example of an algorithm that is suitable for determining percent sequence identity and sequence similarity is the BLAST algorithm, which is described in Altschul et al., J. Mol. Biol. 215:403 (1990). Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (publicly accessible through the National Institutes of Health NCBI internet server). Typically, default program parameters can be used to perform the sequence comparison, although customized parameters can also be used. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1989)).
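As a non-limiting sketch of the kind of global alignment attributed above to Needleman & Wunsch, the following Python function aligns two short sequences by dynamic programming. The scoring values (match +1, mismatch -1, linear gap penalty -1) are hypothetical simplifications chosen for readability; they are not the BLOSUM62 matrix or the default gap weights of GAP, BESTFIT or BLASTP.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment by dynamic programming.

    Returns (aligned_a, aligned_b, score). Scoring values are illustrative.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

# Hypothetical toy sequences, not sequences from the Sequence Listing.
aligned_a, aligned_b, total = needleman_wunsch("ACDEF", "ACDF")
print(aligned_a, aligned_b, total)
```

The two returned strings have equal length, so percent identity can then be computed column by column over the alignment.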

[0031] In the methods of the disclosure, a precursor protein is contacted with a KEX2 protease under conditions suitable for converting the prohormone to a mature hormone. Conditions suitable for converting the prohormone to a mature hormone using a KEX2 protease can be readily ascertained by a person of ordinary skill in the art to which the present disclosure pertains, and include those described herein, as well as those described in product sheets supplied with commercially available KEX2 proteases.

[0032] In certain embodiments, the conversion is performed in vitro. In other embodiments, the conversion is performed in vivo (e.g., in eukaryotic cells, such as yeast cells, that express a KEX2 protease, either endogenously or exogenously).

[0033] The term “precursor protein” refers to any protein that can be processed to yield a mature (e.g., active) protein, including preproteins and prohormones. Precursor proteins encompass both naturally occurring proteins and artificial proteins, such as fusion proteins. A precursor protein can be synthetic, semi-synthetic or recombinant. In some embodiments, the precursor protein includes the sequence of a mature protein. In some embodiments, the precursor protein is a preprotein. In some embodiments, the precursor protein is a prohormone. In certain embodiments, the precursor protein is a prohormone of insulin, also referred to as insulin prohormone or proinsulin, or an analog thereof. In particular embodiments, the precursor protein is a prohormone of glargine, also referred to as proinsulin glargine or glargine proinsulin or proglargine. When the precursor protein is a prohormone, the corresponding mature protein is generally the mature hormone form of the prohormone.

[0034] Precursor proteins can be selected from the following non-limiting example classes of proteins: transcription factors, ligands for cellular receptors, hormones and extracellular binding peptides. Examples of precursor proteins include enkephalin, LHRH, neuropeptides, glycoincretins, integrin, glucagons and glucagon-like peptides, antithrombotic peptides, cytokines and interleukins, transferrins, interferons, endothelins, natriuretic hormones, extracellular kinase ligands, angiotensin enzyme inhibitors, peptide antiviral compounds, thrombin, substance P, substance G, somatotropin, somatostatin, GnRH, bradykinin, vasopressin, insulin, and growth factors.

[0035] Examples of mature proteins that can be generated from a precursor protein (e.g., by the methods disclosed herein) include, without limitation, growth hormones (GH), particularly human and bovine growth hormone, growth hormone-releasing hormones; interferon including α-, β-, or γ-interferons, interleukin-I, interleukin-II, erythropoietin, including α- and β-erythropoietin (EPO), granulocyte colony stimulating factor (GCSF), granulocyte macrophage colony stimulating factor (GM-CSF), anti-angiogenic proteins (e.g., angiostatin, endostatin), PACAP polypeptide (pituitary adenylate cyclase activating polypeptide), vasoactive intestinal peptide (VIP), thyrotrophin releasing hormone (TRH), corticotropin releasing hormone (CRH), vasopressin, arginine vasopressin (AVP), angiotensin, calcitonin, atrial natriuretic factor, somatostatin, adrenocorticotropin, gonadotropin releasing hormone, oxytocin, insulin, somatotropin, plasminogen tissue activator, coagulation factors including coagulation factors VIII and IX, glucosylceramidase, sargramostim, lenograstim, filgrastim, dornase-α, molgramostim, PEG-L-asparaginase, PEG-adenosine deaminase, hirudin, eptacog-α (human blood coagulation factor VIIa), nerve growth factors, transforming growth factor, epidermal growth factor, basic fibroblast growth factor, VEGF, heparin including low molecular weight heparin, calcitonin, antigens, monoclonal antibodies, vancomycin, desferrioxamine (DFO), parathyroid hormone, an immunogen or antigen, and an antibody such as a monoclonal antibody.

[0036] KEX2 cleavage sites are known in the art. Examples of KEX2 cleavage sites include -Lys-Arg-|-Xaa- and -Arg-Arg-|-Xaa-, where Xaa can be any amino acid. The KEX2 cleavage site can be naturally occurring in a precursor protein, or can be added to the precursor protein (e.g., in a peptide linker or other heterologous peptide sequence), for example, using molecular cloning and recombinant protein expression.
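By way of non-limiting illustration, the two example cleavage motifs given above can be located in a one-letter amino acid sequence with a short Python sketch. The function names, the regular expression, and the toy sequence are hypothetical; the sketch treats every Lys-Arg or Arg-Arg followed by another residue as a site and does not model any further sequence or structural preferences of the enzyme.

```python
import re

# Dibasic motifs named in the text: Lys-Arg|Xaa and Arg-Arg|Xaa.
# The lookahead requires a following residue (Xaa) and allows overlapping
# motifs (e.g., within "RRR") to be reported individually.
KEX2_SITE = re.compile(r"(?=([KR]R)[A-Z])")

def kex2_cleavage_sites(seq):
    """Return 1-based positions of the residue after which cleavage is predicted."""
    return [m.start() + 2 for m in KEX2_SITE.finditer(seq.upper())]

def digest(seq):
    """Split `seq` at every predicted site and return the fragments in order."""
    cuts = [0] + kex2_cleavage_sites(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]

# Hypothetical toy sequence, not a sequence from the Sequence Listing.
toy = "AAKRGGGRRTT"
print(kex2_cleavage_sites(toy))
print(digest(toy))
```

Because cleavage occurs on the C-terminal side of the dibasic pair, each upstream fragment in this sketch retains the Lys-Arg or Arg-Arg residues at its C-terminus.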


[0037] Without wishing to be bound by theory, it is believed that the methods disclosed herein can be applied to any hormone (e.g., any insulin hormone or insulin analog) that requires in vitro processing to convert a prohormone to a mature hormone, provided that the prohormone comprises at least one (e.g., 2, 3, 4, 5 or more) KEX2 cleavage site.

[0038] Insulin is translated as a 110-amino acid chain, sometimes referred to as preproinsulin. The amino acid sequence of human preproinsulin has been assigned UniProt Accession No. P01308 (SEQ ID NO: 18). Removal of the signal peptide of preproinsulin, consisting of amino acid residues 1-24 (SEQ ID NO: 19), produces proinsulin. Biologically active insulin results from removal of amino acid residues 57-87 of SEQ ID NO: 18, corresponding to the amino acid sequence of the insulin C chain (SEQ ID NO:21), and formation of disulfide bonds between the A and B chains. Thus, biologically active insulin comprises just 51 amino acids of the original translation product. Unless otherwise indicated, “insulin,” as used herein, encompasses preproinsulin, proinsulin and biologically active insulin. In some embodiments, the insulin is biologically active insulin. In some embodiments, the insulin is proinsulin. In some embodiments, the insulin is preproinsulin.

[0039] As used herein, an “insulin A chain polypeptide” can be naturally occurring or non-naturally occurring (e.g., engineered). The insulin A chain polypeptide can be recombinant or synthetic, and unmodified or modified (e.g., post-translationally modified, as by glycosylation or phosphorylation, for example). Examples of insulin A chain polypeptides that are suitable for use in the fusion proteins and methods described herein are known in the art and include variants of naturally occurring insulin A chain polypeptides (e.g., variants having at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, about 96%, about 97%, about 98% or about 99% identity to a naturally occurring insulin A chain polypeptide), such as an insulin A chain polypeptide from humans. In some embodiments, an insulin A chain polypeptide is a polypeptide having the amino acid sequence of amino acid residues 90-110 of human insulin assigned UniProt Accession No. P01308 (SEQ ID NO: 18), or a variant thereof having at least about 70% (e.g., about 75%, about 80%, about 85%, about 90%, about 95%, about 96%, about 97%, about 98% or about 99%) identity to amino acid residues 90-110 of the amino acid sequence of SEQ ID NO: 18. The amino acid sequence of SEQ ID NO:22 corresponds to amino acid residues 90-110 of the amino acid sequence of SEQ ID NO: 18. Accordingly, in some embodiments, an insulin A chain polypeptide is a polypeptide having the amino acid sequence of SEQ ID NO:22, or a variant thereof having at least about 70% (e.g., about 75%, about 80%, about 85%, about 90%, about 95%, about 96%, about 97%, about 98% or about 99%) identity to the amino acid sequence of SEQ ID NO:22.

[0040] As used herein, an “insulin B chain polypeptide” can be naturally occurring or non-naturally occurring (e.g., engineered). The insulin B chain polypeptide can be recombinant or synthetic, and unmodified or modified (e.g., post-translationally modified, as by glycosylation or phosphorylation, for example). Examples of insulin B chain polypeptides that are suitable for use in the fusion proteins and methods described herein are known in the art and include variants of naturally occurring insulin B chain polypeptides (e.g., variants having at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, about 96%, about 97%, about 98% or about 99% identity to a naturally occurring insulin B chain polypeptide), such as an insulin B chain polypeptide from humans. In some embodiments, an insulin B chain polypeptide is a polypeptide having the amino acid sequence of amino acid residues 25-54 of human insulin assigned UniProt Accession No. P01308 (SEQ ID NO: 18), or a variant thereof having at least about 70% (e.g., about 75%, about 80%, about 85%, about 90%, about 95%, about 96%, about 97%, about 98% or about 99%) identity to amino acid residues 25-54 of the amino acid sequence of SEQ ID NO: 18. The amino acid sequence of SEQ ID NO:20 corresponds to amino acid residues 25-54 of the amino acid sequence of SEQ ID NO: 18. Accordingly, in some embodiments, an insulin B chain polypeptide is a polypeptide having the amino acid sequence of SEQ ID NO:20, or a variant thereof having at least about 70% (e.g., about 75%, about 80%, about 85%, about 90%, about 95%, about 96%, about 97%, about 98% or about 99%) identity to the amino acid sequence of SEQ ID NO:20.

[0041] As used herein, an “insulin C chain polypeptide” can be naturally occurring or non-naturally occurring (e.g., engineered). The insulin C chain polypeptide can be recombinant or synthetic, and unmodified or modified (e.g., post-translationally modified, as by glycosylation or phosphorylation, for example). Examples of insulin C chain polypeptides that are suitable for use in the fusion proteins and methods described herein are known in the art and include variants of naturally occurring insulin C chain polypeptides (e.g., variants having at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, about 96%, about 97%, about 98% or about 99% identity to a naturally occurring insulin C chain polypeptide), such as an insulin C chain polypeptide from humans. In some embodiments, an insulin C chain polypeptide is a polypeptide having the amino acid sequence of amino acid residues 57-87 of human insulin assigned UniProt Accession No. P01308 (SEQ ID NO: 18), or a variant thereof having at least about 70% (e.g., about 75%, about 80%, about 85%, about 90%, about 95%, about 96%, about 97%, about 98% or about 99%) identity to amino acid residues 57-87 of the amino acid sequence of SEQ ID NO: 18. The amino acid sequence of SEQ ID NO:21 corresponds to amino acid residues 57-87 of the amino acid sequence of SEQ ID NO: 18.
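As a non-limiting check of the residue numbering recited above for human preproinsulin, the following Python sketch computes the length of each region from its 1-based, inclusive residue range. The dictionary name `REGIONS` and the helper functions are hypothetical; only the residue ranges and the total length of 110 residues are taken from the text.

```python
PREPROINSULIN_LENGTH = 110  # from the text

# 1-based, inclusive residue ranges as recited in the text.
REGIONS = {
    "signal peptide": (1, 24),
    "B chain": (25, 54),
    "C chain": (57, 87),
    "A chain": (90, 110),
}

def region_length(start, end):
    """Number of residues in a 1-based, inclusive range."""
    return end - start + 1

def slice_region(seq, start, end):
    """Extract residues start..end (1-based, inclusive) from a sequence string."""
    return seq[start - 1:end]

lengths = {name: region_length(*rng) for name, rng in REGIONS.items()}
mature_length = lengths["A chain"] + lengths["B chain"]

# Residues of the 110-residue chain that fall in none of the recited ranges.
covered = {i for start, end in REGIONS.values() for i in range(start, end + 1)}
unassigned = sorted(set(range(1, PREPROINSULIN_LENGTH + 1)) - covered)

print(lengths, mature_length, unassigned)
```

The A and B chain lengths sum to the 51 amino acids stated above for biologically active insulin, and the recited ranges leave two residue pairs, flanking the C chain, unassigned.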

[0042] U.S. Application Publication No. US 2018/0194827 describes insulin peptides and single chain insulin peptide agonists that include insulin A chains and insulin B chains containing a variety of substitutions, additions and/or modifications compared to native or naturally-occurring insulin A and B chains. The teachings of US 2018/0194827 relevant to insulin derivatives and analogs, for example, insulin A chain polypeptides and insulin B chain polypeptides, are incorporated herein by reference in their entirety. Thus, examples of insulin A chain polypeptides and insulin B chain polypeptides include the insulin A chains and insulin B chains disclosed in US 2018/0194827.

[0043] Mathieu, C., Gillard, P. and Benhalima, K., Nature Reviews Endocrinology 13, 385-399 (2017) (Mathieu et al.) and Zaykov, A.N., Mayer, J.P. and DiMarchi, R.D., Nature Reviews Drug Discovery 15, 425-439 (2016) (Zaykov et al.) describe insulin analogues. The teachings of Mathieu et al. and Zaykov et al. relevant to insulin derivatives and analogues, for example, insulin A chain polypeptides and insulin B chain polypeptides, are incorporated herein by reference in their entirety. Thus, examples of insulin A chain polypeptides and insulin B chain polypeptides include the insulin A chains and insulin B chains disclosed in Mathieu et al. and Zaykov et al.

[0044] Glargine is an insulin analog. Two modifications to the wildtype insulin were introduced to make the insulin glargine molecule. One modification is the substitution of Asn with Gly at position A21 at the end of the insulin A chain, and the other is the addition of two Arg residues at the end of the insulin B chain.
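By way of non-limiting illustration, the two glargine modifications described above can be applied to insulin chain sequences in Python. The one-letter sequences below are the commonly published human insulin A and B chains and are assumed, without verification against the Sequence Listing, to correspond to SEQ ID NO:22 and SEQ ID NO:20; the helper name `substitute` is hypothetical.

```python
# Assumed human insulin chains (not copied from the Sequence Listing).
HUMAN_A_CHAIN = "GIVEQCCTSICSLYQLENYCN"           # 21 residues
HUMAN_B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"  # 30 residues

def substitute(seq, position, expected, replacement):
    """Replace the residue at a 1-based position, checking the original residue."""
    if seq[position - 1] != expected:
        raise ValueError(
            f"expected {expected} at position {position}, found {seq[position - 1]}"
        )
    return seq[:position - 1] + replacement + seq[position:]

# Modification 1: Asn -> Gly at position A21 (end of the A chain).
glargine_a = substitute(HUMAN_A_CHAIN, 21, "N", "G")

# Modification 2: two Arg residues added at the end of the B chain.
glargine_b = HUMAN_B_CHAIN + "RR"

print(glargine_a)
print(glargine_b)
```

With these assumed sequences, the modified B chain ends in KTRR, consistent with the C-terminal sequence described in the Background.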

[0045] As used herein, a “glargine A chain polypeptide” is the sequence of insulin A chain with a substitution of Asn with Gly at position A21 at the end of the insulin A chain. In some embodiments, a glargine A chain polypeptide is a polypeptide having the amino acid sequence of SEQ ID NO:23, or a variant thereof having at least about 70% (e.g., about 75%, about 80%, about 85%, about 90%, about 95%, about 96%, about 97%, about 98% or about 99%) identity to the amino acid sequence of SEQ ID NO:23.

[0046] As used herein, a “glargine B chain polypeptide” is the sequence of insulin B chain with the addition of two Arg residues at the end of the insulin B chain. In some embodiments, a glargine B chain polypeptide is a polypeptide having the amino acid sequence of SEQ ID NO:24, or a variant thereof having at least about 70% (e.g., about 75%, about 80%, about 85%, about 90%, about 95%, about 96%, about 97%, about 98% or about 99%) identity to the amino acid sequence of SEQ ID NO:24.

[0047] In some embodiments, the precursor protein is a fusion protein (e.g., a fusion protein comprising a target protein). In certain embodiments, the precursor protein is a fusion protein that comprises a tag (e.g., a His-tag) to enhance expression, folding and/or purification. In some embodiments, the precursor protein comprises a De novo Expression Enhancer Protein (DEEP) fusion tag. Examples of DEEP fusion tags that are suitable for inclusion in prohormones capable of being processed by the methods disclosed herein are described herein below, and in International Application No. PCT/US2018/044156, published as International Publication No. WO 2019/023616 A1, the contents of which are incorporated herein by reference in their entirety.

[0048] Fusion Proteins of the Disclosure

[0049] The present disclosure further provides, in various embodiments, a fusion protein comprising a DEEP fusion tag, a target protein, and at least one KEX2 protease cleavage site.

[0050] The term “fusion protein” refers to a synthetic, semi-synthetic or recombinant single protein molecule that comprises all or a portion of two or more different proteins and/or peptides.

[0051] As used herein, the term “De novo Expression Enhancer Protein fusion tag” or “DEEP fusion tag” refers to a polypeptide having at least two (e.g., 2, 3, 4, 5 or 6) α-helices, wherein each α-helix comprises a binary patterned sequence of seven amino acid residues, or heptad sequence, defined by [PNPPNNP]n, where each “P” is independently selected from the polar amino acid residues Lys (K), His (H), Glu (E), Gln (Q), Asp (D), Asn (N), Thr (T) and Ser (S), each “N” is independently selected from the nonpolar amino acid residues Phe (F), Leu (L), Ile (I), Met (M), Val (V) and Trp (W), and n is an integer from 2 to 10 (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10). In particular embodiments of a DEEP fusion tag, n = 3.
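As a non-limiting sketch, the binary pattern [PNPPNNP]n defined above can be expressed as a regular expression and used to test whether a candidate helix sequence conforms to it. The function name `is_binary_patterned` and the example heptad are hypothetical; the residue sets and the range of n (2 to 10) are taken from the definition above.

```python
import re

POLAR = "KHEQDNTS"   # residues allowed at "P" positions (from the definition)
NONPOLAR = "FLIMVW"  # residues allowed at "N" positions (from the definition)
HEPTAD_PATTERN = "PNPPNNP"

# Translate each pattern letter into a character class, e.g. "P" -> "[KHEQDNTS]".
_heptad = "".join(
    f"[{POLAR}]" if c == "P" else f"[{NONPOLAR}]" for c in HEPTAD_PATTERN
)
# n consecutive heptads, with n from 2 to 10.
HELIX_RE = re.compile(f"(?:{_heptad}){{2,10}}")

def is_binary_patterned(helix):
    """True if `helix` consists solely of 2 to 10 consecutive PNPPNNP heptads."""
    return HELIX_RE.fullmatch(helix.upper()) is not None

# Hypothetical heptad chosen only to satisfy the pattern; not from a SEQ ID NO.
heptad = "KLEEFLK"
print(is_binary_patterned(heptad * 3))  # n = 3
print(is_binary_patterned(heptad))      # n = 1, below the minimum
```

The check covers only the patterned helical segment; interhelical turns and terminal residues of a full tag would need to be handled separately.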

[0052] The heptad sequences in an α-helix containing more than one heptad sequence can be identical (i.e., repeats of the same heptad sequence) or they can be different (i.e., each PNPPNNP heptad sequence within the same α-helix can have a different amino acid sequence). Furthermore, the amino acid composition of the α-helices in a DEEP fusion tag can vary from helix to helix such that, for example, each of the α-helices in the tag will have a different amino acid sequence.
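The binary pattern described above can be checked mechanically. The following is a minimal sketch, assuming a helix is supplied as a one-letter amino acid string that consists only of n consecutive heptads; the function names and the example sequences are hypothetical and are not taken from the Sequence Listing.

```python
import re

# Residue classes as defined in the paragraph on the DEEP fusion tag.
POLAR = "KHEQDNTS"     # allowed at "P" positions
NONPOLAR = "FLIMVW"    # allowed at "N" positions
HEPTAD = "PNPPNNP"     # binary pattern of one heptad


def heptad_regex(n: int) -> re.Pattern:
    """Build a regular expression for n consecutive PNPPNNP heptads."""
    classes = {"P": f"[{POLAR}]", "N": f"[{NONPOLAR}]"}
    one_heptad = "".join(classes[symbol] for symbol in HEPTAD)
    return re.compile(f"(?:{one_heptad}){{{n}}}")


def matches_binary_pattern(helix: str, n: int) -> bool:
    """Return True if the whole string is exactly n binary-patterned heptads."""
    return heptad_regex(n).fullmatch(helix) is not None


# Hypothetical two-heptad helix: the heptads differ but both follow PNPPNNP.
print(matches_binary_pattern("KLEEFLKEVHKLMQ", 2))
```

Because each position is matched against a residue class rather than a fixed residue, the check accepts both identical and non-identical heptad repeats, in line with the description.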

[0053] A DEEP fusion tag can also include additional amino acid residues, for example, N-terminal to the first α-helix and/or C-terminal to the last α-helix in the tag. Typically, a DEEP fusion tag will also include interhelical turns between the α-helices, wherein each interhelical turn includes, for example, 4, 5 or 6 amino acid residues encoded by the degenerate DNA codon VAN (V: A, G, or C; N: A, G, C, or T) (e.g., Gly (G), His (H), Gln (Q), Asn (N), Asp (D), Glu (E) and Lys (K)).
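To make the degenerate codon concrete, the sketch below expands VAN into its twelve codons and translates them with the standard genetic code. The helper names are illustrative assumptions. Under the standard code the expansion yields His, Gln, Asn, Lys, Asp and Glu; a residue such as Gly, which is encoded by GGN, would not arise from VAN itself.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Degenerate-base symbols used in the paragraph above.
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T",
              "V": "ACG", "N": "ACGT"}


def expand(degenerate_codon: str) -> list:
    """List every concrete codon covered by a degenerate codon."""
    choices = (DEGENERATE[base] for base in degenerate_codon)
    return ["".join(codon) for codon in product(*choices)]


def encoded_residues(degenerate_codon: str) -> set:
    """Return the set of one-letter residues encoded by a degenerate codon."""
    return {CODON_TABLE[codon] for codon in expand(degenerate_codon)}


print(sorted(encoded_residues("VAN")))
```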

[0054] A DEEP fusion tag is generally at least about 70 amino acid residues in length (e.g., 74 amino acid residues). In a particular embodiment, a DEEP fusion tag is at least about 100 amino acid residues in length (e.g., 102 amino acid residues). Typically, a DEEP fusion tag is less than about 500 amino acid residues in length, for example, less than about 450 amino acid residues in length or less than about 420 amino acid residues in length.

[0055] In particular embodiments, a DEEP fusion tag includes a plurality of histidine residues that are exposed at the surface in a properly folded fusion protein. In a further embodiment, each α-helix in a DEEP fusion tag includes at least one (e.g., 1, 2, 3, 4, 5, 6, or more, for example, 12) histidine residue(s).

[0056] The fusion can be an N-terminal fusion (with respect to the DEEP fusion tag), a C-terminal fusion (with respect to the DEEP fusion tag) or an internal fusion (with respect to the DEEP fusion tag and/or the target protein).

[0057] Examples of polypeptides that are suitable for use as DEEP fusion tags in the present disclosure, as well as methods of designing and making such polypeptides, are described in the following publications, each of which is incorporated by reference herein in its entirety:

• Zarzhitsky, Shlomo, et al. (2020), Harnessing synthetic biology to enhance heterologous protein expression. Protein Science 29, 1698-1706;

• Wei Y, Liu T, Sazinsky SL, Moffet DA, Pelczer I, and Hecht MH (2003), Stably folded de novo proteins from a designed combinatorial library. Protein Science 12, 92-102 (see, e.g., Figure 2, proteins designated 86, n86, S-23, S-213, S-285, S-824 and S-836);

• Kamtekar S, Schiffer JM, Xiong H, Babik JM & Hecht MH (1993), Protein Design by Binary Patterning of Polar and Non-Polar Amino Acids. Science 262, 1680-1685;

• Wei Y, Kim S, Fela D, Baum J, & Hecht MH (2003), Solution Structure of a De Novo Protein From a Designed Combinatorial Library. Proc. Natl. Acad. Sci. (USA) 100, 13270-13273;

• Hecht MH, Das A, Go A, Bradley LH & Wei Y (2004), De Novo Proteins from Designed Combinatorial Libraries. Protein Science 13, 1711-1723;

• Go A, Kim S, Baum J, & Hecht MH (2008), Structure and Dynamics of De novo Proteins from a Designed Superfamily of 4-Helix Bundles. Protein Science 17, 821-832; and

• Bradley LH, Kleiner RE, Wang AF, Hecht MH & Wood DW (2005), An Intein-Based Genetic Selection Enables Construction of a High-Quality Library of Binary Patterned De Novo Sequences. Protein Engineering, Design & Selection (PEDS) 18, 201-207.

[0058] In particular embodiments, the DEEP fusion tag comprises, consists essentially of, or consists of (e.g., comprises) the amino acid sequence of SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16 or SEQ ID NO:17, or a functional fragment thereof, such as a fragment lacking the N-terminal methionine residue.

In other embodiments, the DEEP fusion tag comprises a variant amino acid sequence of any of SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16 or SEQ ID NO:17 having, for example, at least about 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more identity to the amino acid sequence of SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16 or SEQ ID NO:17, respectively.
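As an illustration of how a percent identity threshold such as 70% might be evaluated, the sketch below computes identity over two sequences that have already been aligned (gaps written as "-"). It assumes one common convention, identical columns divided by the number of aligned columns; other conventions exist, and the alignment step itself is outside this sketch. The example sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Convention assumed here: identical residue columns divided by the
    number of alignment columns (columns that are gaps in both sequences
    are ignored).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no usable columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, minimum: float = 70.0) -> bool:
    """Return True if the aligned pair reaches the given identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= minimum


# Hypothetical aligned pair: one substitution and one gap in ten columns.
print(percent_identity("KLEEFLKE-H", "KLDEFLKEVH"))
```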

[0059] In some embodiments, the fusion protein further comprises at least one linker sequence. A variety of linker amino acid sequences are known in the art and can be used in the invention. In some embodiments, a linker sequence includes one or more amino acid residues selected from Gly, Ser, Thr, His, Asp, Glu, Asn, Gln, Lys and Arg. In some embodiments, a linker sequence includes a polyglycine sequence (e.g., a 6X glycine sequence). Other examples of linkers include GSAGSAAGSG (SEQ ID NO:12), GGGGGGSR (SEQ ID NO:13), KR and RR. In certain embodiments, a linker sequence includes a cleavage site.

[0060] In some embodiments, the fusion protein comprises at least one KEX2 cleavage site in the target protein. In some embodiments, the fusion protein additionally or alternatively comprises at least one KEX2 cleavage site in a linker sequence.
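To illustrate where cleavage sites of this kind might fall in a fusion protein, the sketch below scans a sequence for the dibasic pairs KR and RR, which appear among the linker examples above and which KEX2-type proteases are generally understood to cleave on the carboxyl side. Treating only these two motifs as sites is a simplifying assumption, and the toy sequence is hypothetical rather than any SEQ ID NO from the disclosure.

```python
import re

# Dibasic motifs assumed to be recognized; a lookahead is used so that
# overlapping motifs (e.g., "KRR") are all reported.
DIBASIC = re.compile(r"(?=(KR|RR))")


def kex2_cut_positions(seq: str) -> list:
    """Indices i such that cleavage falls between seq[i-1] and seq[i]."""
    return sorted({m.start() + 2 for m in DIBASIC.finditer(seq)})


def digest(seq: str) -> list:
    """Split a sequence at every predicted cut position."""
    cuts = [0] + [c for c in kex2_cut_positions(seq) if c < len(seq)] + [len(seq)]
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]


# Hypothetical layout: segment, KR site, linker-like segment, RR site, segment.
print(digest("AAKRGGGRRTT"))
```

A scan like this only predicts candidate sites from sequence; whether a given site is actually cleaved depends on the reaction conditions and on the accessibility of the site in the folded protein.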

[0061] In certain embodiments, the fusion protein comprises, consists essentially of or consists of SEQ ID NO:27.

[0062] In some embodiments, the target protein comprises or consists of a hormone and the fusion protein comprises or consists of a prohormone.

[0063] In some embodiments, the target protein comprises insulin or an insulin analog.

[0064] In some embodiments, the fusion protein comprises proinsulin.

[0065] In particular embodiments, the target protein comprises glargine.

[0066] In certain embodiments, the fusion protein comprises proglargine.

[0067] In some embodiments, the fusion protein comprises or consists of SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, or SEQ ID NO:29, or a variant of any of the foregoing having at least about 70% amino acid sequence identity to one or more of SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, or SEQ ID NO:29.

[0068] In some embodiments, the fusion protein comprises or consists of SEQ ID NO:27.

[0069] The fusion proteins of the disclosure can be produced recombinantly or synthetically, using routine methods and reagents that are well known in the art. For example, a fusion protein of the disclosure can be produced recombinantly in a suitable host cell (e.g., bacteria, yeast, insect cells, mammalian cells) according to methods known in the art. See, e.g., Current Protocols in Molecular Biology, Second Edition, Ausubel et al. eds., John Wiley & Sons, 1992; and Molecular Cloning: a Laboratory Manual, 2nd edition, Sambrook et al., 1989, Cold Spring Harbor Laboratory Press. For example, a nucleic acid molecule comprising a nucleotide sequence encoding a fusion protein described herein can be introduced and expressed in suitable host cells (e.g., E. coli), and the expressed fusion protein can be isolated/purified from the host cells (e.g., in inclusion bodies) using routine methods and readily available reagents.

[0070] Methods for introducing DNA constructs encoding fusion proteins into host cells are well known in the art and include, for example, standard transformation and transfection techniques (e.g., electroporation, chemical transformation). A person of ordinary skill in the field of the disclosure can readily select an appropriate method for introducing a DNA construct into host cells.

[0071] A variety of methods for expressing proteins in host cells are well known in the art (e.g., IPTG-induced expression in E. coli). A person of ordinary skill in the field of the disclosure can readily select an appropriate method for expressing a fusion protein of the disclosure in host cells.

[0072] An expressed fusion protein can be isolated from host cells using known methods and reagents including, e.g., lysozyme treatment, sonication, filtration, salting-out, ultracentrifugation, and chromatography. A recombinantly-expressed fusion protein can be recovered from host cells and/or the host cell culture medium. Once released from the cells, the fusion protein can be purified from cell lysates by binding to an affinity resin using standard techniques and reagents. In a particular embodiment, the fusion protein is isolated by binding of the DEEP fusion tag in the fusion protein to an affinity resin (e.g., on a solid support). In some embodiments, the DEEP fusion tag comprises a plurality of surface-exposed histidine residues, enabling purification by methods typically used for His-tagged proteins. For example, a fusion protein of the disclosure can be isolated using immobilized metal ion affinity chromatography (IMAC). Suitable IMAC resins containing immobilized transition metals for IMAC applications are known in the art and are commercially available (e.g., TALON® Superflow™ resins, HisTrap™ High Performance resins, GE Healthcare Life Sciences), and include, e.g., immobilized nickel resins, immobilized cobalt resins, immobilized copper resins, and immobilized zinc resins. In a particular embodiment, a fusion protein of the disclosure is purified using an affinity resin comprising immobilized nickel ions.

[0073] Nucleic Acids, Vectors, Host Cells, and Kits of the Disclosure

[0074] The present disclosure further provides, in various embodiments, a polynucleotide encoding a fusion protein disclosed herein. In some embodiments, the polynucleotide is a DNA polynucleotide. In some embodiments, the polynucleotide is an RNA polynucleotide. The polynucleotide can be in the form of an insert (e.g., for cloning into a vector). The polynucleotide can be linear or circular. In some embodiments, the polynucleotide comprises one or more of a non-canonical nucleotide and a modified nucleotide (e.g., a nucleotide comprising a chemical modification). The polynucleotide can be isolated, recombinant, synthetic or semi-synthetic.

[0075] Although the genetic code is degenerate in that most amino acids are represented by several codons (called “synonyms” or “synonymous” codons), it is understood in the art that codon usage by particular organisms is nonrandom and biased towards particular codon triplets. Accordingly, in a particular embodiment, a nucleic acid encoding a fusion protein of the disclosure includes a nucleotide sequence that has been optimized for expression in a particular type of host cell (e.g., through codon optimization). Codon optimization refers to a process in which a polynucleotide encoding a protein of interest is modified to replace particular codons in that polynucleotide with codons that encode the same amino acid(s), but are more commonly used/recognized in the host cell in which the nucleic acid is being expressed. In some embodiments, the polynucleotides encoding a fusion protein of the disclosure are codon optimized for expression in E. coli.
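The synonymous replacement described above can be sketched as follows. The PREFERRED mapping is a hypothetical, deliberately small stand-in for a host codon usage table and is included only for illustration; a real workflow would derive preferences from usage data for the chosen host and would typically weigh additional factors. The key property shown is that the encoded protein is unchanged.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical preferred codons (illustration only, not real usage data).
PREFERRED = {"L": "CTG", "R": "CGT", "K": "AAA", "E": "GAA", "G": "GGC"}
assert all(CODON_TABLE[codon] == aa for aa, codon in PREFERRED.items())


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence into one-letter residues."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def optimize(dna: str) -> str:
    """Swap each codon for the preferred synonym, when one is listed."""
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    return "".join(PREFERRED.get(CODON_TABLE[c], c) for c in codons)


original = "TTAAGAAAGGAGGGG"  # made-up fragment encoding L-R-K-E-G
print(optimize(original), translate(optimize(original)))
```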

[0076] The disclosure also provides, in various embodiments, a vector comprising a polynucleotide of the disclosure. In some embodiments, the vector is an expression vector. In certain embodiments, the vector is a viral vector (e.g., lentiviral vector, adenoviral vector, AAV). In other embodiments, the vector is a non-viral vector (e.g., plasmid, cloning vector). A variety of vectors, including expression vectors, viral vectors and non-viral vectors are known in the art and are commercially available.

[0077] The disclosure further provides, in various embodiments, a host cell comprising a polynucleotide disclosed herein or a vector disclosed herein. As used herein, the term “host cell” refers to a suitable host for expressing a nucleic acid encoding a fusion protein comprising a DEEP fusion tag. In some embodiments, the host cells are cells that have been transformed or transfected with vectors constructed using recombinant DNA techniques known in the art. Examples of suitable host cells include yeast cells (e.g., Pichia pastoris and Saccharomyces cerevisiae), insect cells (e.g., Spodoptera frugiperda Sf9 cells), mammalian cells (e.g., CHO cells), and bacterial cells (e.g., E. coli, B. subtilis and Agrobacterium tumefaciens). Further examples of suitable host cells include plant cells (e.g., Nicotiana benthamiana). In a particular aspect, the host cell is E. coli.

[0078] In various embodiments, the disclosure additionally provides a kit comprising one or more polynucleotides comprising a nucleotide sequence encoding a DEEP fusion tag and a cloning site for introducing a nucleotide sequence encoding a target protein to form a fusion protein, and a KEX2 protease comprising an amino acid sequence having at least 70% sequence identity to SEQ ID NO:1 (e.g., in separate containers). In some embodiments, the KEX2 protease comprises or consists of SEQ ID NO:1.

[0079] In some embodiments, the polynucleotide further comprises a sequence encoding a peptide linker, a sequence providing a KEX2 protease cleavage site, or a combination thereof.

[0080] In certain embodiments, the kit further comprises instructions for use.

Exemplification

[0081] The following materials and methods were used in the experiments described in FIGs. 1 and 2 herein.

[0082] Construction of plasmids and strains

[0083] A synthetic gene encoding a DEEP-insulin glargine fusion protein (SEQ ID NO:27) was constructed using E. coli codon optimized gBlocks (Integrated DNA Technologies), and amplified with forward and reverse primers containing XbaI and HindIII restriction sites.

[0084] Using standard genetic cloning techniques, the sequence encoding DEEP-glargine was introduced into a pET30 vector carrying kanamycin resistance to yield pET30DEEP-glargine. For overexpression, the plasmids were transformed into competent BL21(DE3) cells and plated on LB agar supplemented with 30 mg/L kanamycin. The next day, a fresh colony was picked, inoculated into 5mL of LB kanamycin, and incubated overnight at 37°C with continuous shaking at 200 rpm. 4 mL of the overnight culture was inoculated into 1L of LB supplemented with kanamycin. After a sample of the culture measured an OD600 of 0.8, the culture was induced with 0.5mM IPTG and incubated for a further 3h. Cells were collected by centrifugation at 4,500xg. Cell pellets were kept at -80°C until further use.

[0085] Lysis and Extraction of Inclusion Bodies (IBs)

[0086] Cell pellets were resuspended in lysis buffer containing 50mM Tris, 300mM NaCl (TBS), 4mM EDTA and lysed using an Emulsiflex. Immediately after lysis the samples were spun down at 15,000xg for 30 min. The supernatant was discarded, and the pellet containing inclusion bodies was washed 3 times with: 1. TBS, 1% Triton, 2M urea, 4mM EDTA; 2. TBS; and 3. Milli-Q water. The washed inclusion bodies were then resuspended at 4°C in a buffer containing 8M urea, 50mM glycine pH=10.5. After overnight incubation, the resuspended inclusion bodies were spun down at 35,000xg for 30min. Lysis and protein extraction from inclusion bodies were performed with solutions kept either on ice or at 4°C. To determine protein concentration, an aliquot of resuspended IBs was fully reduced and denatured by 10x dilution into 8M GdnHCl, 100mM DTT. After 20min of incubation at room temperature the reactions were quenched with acetonitrile and TFA, at final concentrations of 10% and 0.5% (v/v), respectively. This fully reduced and denatured protein sample was loaded on the HPLC equipped with a Zorbax C-18 analytical column (Agilent). The concentration of the protein was calculated based on the area under the curve (AUC) using the following equation:

[0088] where n is the moles of protein in the peak, F is the flow rate, ε is the molar extinction coefficient, and d is the pathlength.
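The equation itself is not reproduced in this text. A relation consistent with the quantities defined above follows from integrating the Beer-Lambert law over the peak, n = (AUC x F) / (ε x d), where ε denotes the molar extinction coefficient; this form is an assumption and may differ from the exact expression used. The sketch below applies it with hypothetical numbers and explicit units.

```python
def moles_in_peak(auc_au_min: float, flow_l_per_min: float,
                  epsilon_per_m_per_cm: float, path_cm: float) -> float:
    """Moles of protein in a chromatographic peak.

    Assumed relation (Beer-Lambert law integrated over the peak):
        n = AUC * F / (epsilon * d)
    Units: AUC in absorbance-units x min, F in L/min,
    epsilon in 1/(M x cm), d in cm; the result is in mol.
    """
    if epsilon_per_m_per_cm <= 0 or path_cm <= 0:
        raise ValueError("epsilon and pathlength must be positive")
    return auc_au_min * flow_l_per_min / (epsilon_per_m_per_cm * path_cm)


def concentration_molar(moles: float, injected_volume_l: float) -> float:
    """Convert moles in the peak to molar concentration of the injected sample."""
    return moles / injected_volume_l


# Hypothetical values, chosen only to show the arithmetic.
n = moles_in_peak(auc_au_min=2.0, flow_l_per_min=0.001,
                  epsilon_per_m_per_cm=10000.0, path_cm=1.0)
print(n, concentration_molar(n, injected_volume_l=20e-6))
```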

[0089] Refolding

[0090] Refolding was performed by the rapid dilution method. The concentration of the protein in resuspended inclusion bodies was adjusted to ~5-10 mg/ml, and further diluted 10-20 times into the refolding buffer. The refolding buffer was 50mM glycine pH=10.5 supplemented with different ratios of β-mercaptoethanol to the protein’s Cys residues. The refolding was performed with solutions pre-chilled on ice or at 4°C. Analysis of the refolding process was performed using an HPLC. Before loading the samples, the reactions were quenched with acetonitrile and TFA, at final concentrations of 10% and 0.5% (v/v), respectively. Refolding yields were calculated based on AUC of the refolded protein divided by the AUC of the fully reduced and denatured one.
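The arithmetic behind the refolding step can be sketched as below. The function names are illustrative, and the molecular weight, cysteine count, thiol ratio and AUC values in the example are hypothetical placeholders rather than values from the experiments.

```python
def diluted_concentration(stock_mg_per_ml: float, fold_dilution: float) -> float:
    """Protein concentration after rapid dilution into refolding buffer."""
    return stock_mg_per_ml / fold_dilution


def thiol_concentration_molar(protein_mg_per_ml: float, mw_g_per_mol: float,
                              n_cys: int, thiol_to_cys_ratio: float) -> float:
    """Reducing-agent concentration for a chosen molar ratio to Cys residues."""
    protein_molar = protein_mg_per_ml / mw_g_per_mol  # mg/ml equals g/L
    return thiol_to_cys_ratio * n_cys * protein_molar


def refolding_yield(auc_refolded: float, auc_reduced_denatured: float) -> float:
    """Refolded-peak AUC divided by the fully reduced and denatured AUC."""
    return auc_refolded / auc_reduced_denatured


# 10 mg/ml stock diluted 20-fold, using the ranges given in the text.
print(diluted_concentration(10.0, 20))
# Hypothetical protein: 20 kDa, 6 Cys residues, 10:1 thiol-to-Cys ratio.
print(thiol_concentration_molar(0.5, 20000.0, 6, 10.0))
# Hypothetical AUC values.
print(refolding_yield(30.0, 60.0))
```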

[0091] Purification and KEX2 digest

[0092] After 48h of refolding at 4°C, the solution was supplemented with 50mM Tris and 300mM NaCl (final concentration) and 1mM oxidized glutathione, and the pH was adjusted to 8. The protein solution was purified on the FPLC (AKTA pure) equipped with a HisTrap column, using buffer A (50mM Tris, 300mM NaCl, pH8) and buffer B (same as A, supplemented with 500mM imidazole at pH8). Fractions containing the elution peak were combined and immediately desalted using a PD-10 column into 100mM Tris pH8. Aliquots of protein solution were supplemented with various concentrations of CaCl2 and KEX2 (PeproTech). The cleavage reaction was incubated at 37°C, and its progress was monitored using RP-HPLC equipped with a C18 column (Zorbax, Agilent). Cleaved insulin glargine was precipitated by adding 10mM ZnSO4 and adjusting the pH to 6. The solution instantly became cloudy, and was further incubated for 12h at 4°C with gentle stirring. The precipitated protein was spun down at 5,000xg for 15min and the pellet was dissolved in 6M GdnHCl, 20% MeCN, 1% TFA. The protein sample was further purified on the HPLC equipped with a semi-prep C18 column. The eluting fractions containing insulin were collected, lyophilized, and stored at -80°C until further use.

[0093] Purification of protein

[0094] DEEP-glargine protein was expressed, extracted and purified as described above. The combined fractions of the Ni-IMAC peak (~7mg/ml in 7.5ml) were further desalted using PD10, into 100mM Tris pH = 8 (~5mg/ml in 10.5 ml). 100µl samples were supplemented with CaCl2 and KEX2 (PeproTech, US) (FIG. 2). The samples were further incubated at 37°C overnight.
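As a quick consistency check on the figures quoted above, the sketch below compares the protein mass before and after desalting, using the approximate concentrations and volumes given in the text; the function names are illustrative. Both steps correspond to roughly 52.5 mg, so the drop from about 7 mg/ml to about 5 mg/ml is what dilution from 7.5 ml to 10.5 ml alone would produce.

```python
def total_mass_mg(conc_mg_per_ml: float, volume_ml: float) -> float:
    """Total protein mass in a pool of the given concentration and volume."""
    return conc_mg_per_ml * volume_ml


def concentration_after_volume_change(conc_mg_per_ml: float,
                                      volume_before_ml: float,
                                      volume_after_ml: float) -> float:
    """Expected concentration if all protein is recovered in the new volume."""
    return total_mass_mg(conc_mg_per_ml, volume_before_ml) / volume_after_ml


imac_pool = total_mass_mg(7.0, 7.5)      # Ni-IMAC pool
desalted = total_mass_mg(5.0, 10.5)      # after PD10 desalting
expected = concentration_after_volume_change(7.0, 7.5, 10.5)
print(imac_pool, desalted, expected)
```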

[0095] Results

[0096] To verify experimentally the methods disclosed herein, insulin glargine was expressed as a prohormone comprising a DEEP fusion tag, referred to herein as “DEEP-Glargine,” and having the amino acid sequence of SEQ ID NO:27, and converted in vitro to mature glargine. Specifically, DEEP-Glargine, in which the C-chain of proglargine was replaced with a DEEP protein, was expressed in E. coli, extracted, refolded, and converted to mature glargine in vitro by incubation with a recombinant KEX2 protease as described above. The conversion of DEEP-Glargine using KEX2 protease was observed to proceed to near completion without generating any detectable cleavage byproducts.

[0097] The teachings of all patents, published applications and references cited herein are incorporated by reference in their entirety.

[0098] While example embodiments have been particularly shown and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the embodiments encompassed by the disclosure herein.

Claims

CLAIMS What is claimed is:

1. A method of converting a precursor protein to a mature protein comprising: providing a precursor protein that is to be converted to a mature protein, wherein the precursor protein comprises at least one cleavage site for a KEX2 protease; and contacting the precursor protein with a KEX2 protease comprising an amino acid sequence having at least 70% sequence identity to SEQ ID NO:1 under conditions in which cleavage of the precursor protein by the KEX2 protease occurs, thereby converting the precursor protein to a mature protein.

2. A method of obtaining a target protein, comprising: providing a precursor protein that comprises a target protein and at least one cleavage site for a KEX2 protease; contacting the precursor protein with a KEX2 protease comprising an amino acid sequence having at least 70% sequence identity to SEQ ID NO:1 under conditions in which cleavage of the precursor protein by the KEX2 protease occurs and releases the target protein; and isolating the target protein, thereby obtaining the target protein.

3. The method of claim 1 or 2, wherein the precursor protein is a prohormone.

4. The method of claim 3, wherein the prohormone is a prohormone of insulin or an analog thereof.

5. The method of claim 3, wherein the prohormone is a prohormone of glargine.

6. The method of any one of claims 1-5, wherein the precursor protein is a fusion protein.

7. The method of claim 6, wherein the fusion protein comprises a DEEP fusion tag.

8. The method of claim 7, wherein the DEEP fusion tag comprises an amino acid sequence having at least 70% sequence identity to one or more of SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16 or SEQ ID NO:17.

9. The method of any one of claims 6-8, wherein the fusion protein further comprises at least one linker sequence.

10. The method of claim 9, wherein at least one KEX2 cleavage site is in the linker sequence.

11. The method of any one of claims 2-9, wherein at least one KEX2 cleavage site is in the target protein.

12. The method of any one of claims 1-11, wherein the precursor protein comprises SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, or SEQ ID NO:29.

13. The method of any one of claims 1-11, wherein the precursor protein comprises SEQ ID NO:27.

14. The method of any one of claims 1-13, wherein the method is performed in vitro.

15. A fusion protein comprising a DEEP fusion tag, a target protein, and at least one KEX2 protease cleavage site.

16. The fusion protein of claim 15, wherein the fusion protein further comprises at least one linker sequence.

17. The fusion protein of claim 15 or 16, wherein at least one KEX2 cleavage site is in the target protein.

18. The fusion protein of any one of claims 15-17, wherein at least one KEX2 cleavage site is in the linker sequence.

19. The fusion protein of claim 15, 16 or 17, comprising SEQ ID NO:27.

20. The fusion protein of any one of claims 15-19, wherein the target protein comprises a hormone and wherein the fusion protein comprises a prohormone.

21. The fusion protein of any one of claims 15-20, wherein the target protein comprises insulin or an insulin analog.

22. The fusion protein of any one of claims 15-21, wherein the fusion protein comprises proinsulin.

23. The fusion protein of any one of claims 15-20, wherein the target protein comprises glargine.

24. The fusion protein of any one of claims 15-20 and 23, wherein the fusion protein comprises proglargine.

25. A polynucleotide encoding the fusion protein of any one of claims 15-24.

26. The polynucleotide of claim 25, wherein the polynucleotide is a DNA polynucleotide.

27. A vector comprising the polynucleotide of claim 25 or 26.

28. A host cell comprising the polynucleotide of claim 25 or 26, or the vector of claim 27.

29. A KEX2 protease for use in the method of any one of claims 1-14.

30. A target protein obtained by the method of any one of claims 2-14.

31. A kit comprising: a polynucleotide comprising a nucleotide sequence encoding a DEEP fusion tag and a cloning site for introducing a nucleotide sequence encoding a target protein to form a fusion protein; and a KEX2 protease comprising an amino acid sequence having at least 70% sequence identity to SEQ ID NO:1.

32. The kit of claim 31, wherein the polynucleotide further comprises a sequence encoding a peptide linker.

33. The kit of claim 31 or 32, wherein the polynucleotide further comprises a sequence encoding a KEX2 protease cleavage site.

34. The method of any one of claims 1-14, or the kit of any one of claims 31-33, wherein the KEX2 protease comprises SEQ ID NO:1.