EP3818072A1

EP3818072A1 - Means and methods for site-specific protein modification using transpeptidases

Info

Publication number: EP3818072A1
Application number: EP19745053.9A
Authority: EP
Inventors: Kathrin Lang; Maximilian FOTTNER; Andreas-David BRUNNER
Original assignee: Technische Universitaet Muenchen
Current assignee: Eidgenoessische Technische Hochschule Zurich ETHZ
Priority date: 2018-07-03
Filing date: 2019-07-03
Publication date: 2021-05-12
Also published as: WO2020007899A1

Abstract

The present invention is in the field of biochemistry. The present invention relates to a method for modifying a polypeptide employing a transpeptidase and polypeptides obtainable by the method of the invention. The present invention further relates to polypeptides employed in the method of the invention and a host cell comprising said polypeptides. Finally, the present invention relates to a host cell employed in the method of the present invention.

Description

Means and methods for site-specific protein modification using transpeptidases

Cross-reference to related applications

[0001] The present application claims the benefit of priority of European Patent Application No. 18181497.1 filed 3 July 2018, the content of which is hereby incorporated by reference in its entirety for all purposes.

I. Background

[0002] The covalent attachment of ubiquitin (Ub) to target proteins represents one of the most versatile and common posttranslational modifications (PTMs) in eukaryotic cells, and many fundamental cellular processes are regulated by this modification (Hershko et al., Annu Rev Biochem 67, 425-79 (1998); Komander et al., Annu Rev Biochem 81, 203-29 (2012)). Ubiquitylation, in which the C-terminal carboxylate of Ub is attached to the ε-amino group of a lysine in a substrate protein to form an isopeptide bond, is naturally mediated by E1/E2/E3 enzymes. Once attached, further ubiquitins can be added either to additional lysine residues within the substrate protein or to an already attached ubiquitin via one of the seven lysines of ubiquitin itself. Thereby, diverse ubiquitin topologies are formed on substrate proteins (Kulathu et al., Nat Rev Mol Cell Biol 13, 508-23 (2012)). Like many PTMs, ubiquitylation is a reversible process and is tightly regulated by a family of enzymes called deubiquitinases (DUBs) (Komander et al., Nat Rev Mol Cell Biol 10, 550-63 (2009)). In a similar fashion, target proteins can also be covalently modified by ubiquitin-like proteins (Ubls) that share the common β-grasp fold, including SUMO and NEDD8 (van der Veen et al., Annu Rev Biochem 81, 323-57 (2012)). Ubiquitylation and modification of target proteins with Ubls play crucial roles in a variety of cellular processes, such as protein degradation, DNA repair, nuclear transport, endocytosis, and chromosomal organization. Hence, many human diseases, including different types of cancer and neurodegenerative diseases, have been linked to dysfunction of ubiquitylation pathways (Flotho et al., Annu Rev Biochem 82, 357-85 (2013)). The specific combinations of enzymes that ubiquitylate or deubiquitylate target proteins are often unknown, which makes it challenging to understand the roles of Ub and Ubl modifications and to decipher enzyme specificity.
A major obstacle is the generation of defined protein-Ub and protein-Ubl conjugates for subsequent biochemical analysis; consequently, only a small fraction of the regulatory events triggered by ubiquitylation has been studied in detail. Over the past 15 years, chemical approaches including thiol-ene coupling (Trang et al., Angew Chem Int Ed Engl 51, 13085-8 (2012)), disulphide exchange (Chen et al., Nat Chem Biol 6, 270-2 (2010)), thioether ligation (Jung et al., Bioconjug Chem 20, 1152-62 (2009)), Cu(I)-catalysed azide-alkyne cycloaddition (Weikart et al., Chembiochem 11, 774-7 (2010); Eger et al., J Am Chem Soc 132, 16337-9 (2010)) and oxime ligation (Stanley et al., Chembiochem 17, 1472-80 (2016)) have been developed to produce Ub-protein conjugates with non-native isopeptide linkages. Furthermore, silver-mediated chemical condensation (Virdee et al., Nat Chem Biol 6, 750-7 (2010)) and native chemical ligation (Li et al., Angew Chem Int Ed Engl 48, 9184-7 (2009); Virdee et al., J Am Chem Soc 133, 10708-11 (2011)) in conjunction with site-specific incorporation of unnatural amino acids (UAAs) have been used to generate ubiquitylated proteins linked via a native isopeptide bond. Although these approaches are proven and established tools for studying ubiquitylation, many of them depend on harsh deprotection and desulfurization protocols and are therefore not applicable to the ubiquitylation of complex, multi-domain proteins or to studying ubiquitylation in living cells (Mali et al., J Am Chem Soc (2017); Stanley et al., Biochem J 473, 1297-314 (2016)). As a consequence, the ability to study the effects of ubiquitylation is limited by the difficulty of preparing homogeneously modified proteins in vitro and by the impossibility of selectively triggering specific ubiquitylation events in living cells.

[0003] Accordingly, the present inventors developed a new approach to site-specifically modify target proteins, both in vitro and in cellulo. This new approach can be employed to ubiquitylate and SUMOylate a target protein or to conjugate any other ubiquitin-like protein, polypeptide or dye-polypeptide site-specifically to the target protein. This new approach, which the present inventors term sortylation, overcomes current limitations for generating site-specifically modified complex, non-refoldable protein targets, e.g. via ubiquitylation or SUMOylation, and for studying such ubiquitylation/SUMOylation events under physiological conditions in living cells.

Definitions

[0004] The following list defines terms, phrases, and abbreviations used throughout the instant specification. All terms listed and defined herein are intended to encompass all grammatical forms.

[0005] It must be noted that as used herein, the singular forms “a”, “an”, and “the” include plural references unless the context clearly indicates otherwise. Thus, for example, reference to “an expression cassette” includes one or more of the expression cassettes disclosed herein and reference to “the method” includes reference to equivalent steps and methods known to those of ordinary skill in the art that could be modified or substituted for the methods described herein.

[0006] Unless otherwise indicated, the term "at least" preceding a series of elements is to be understood to refer to every element in the series. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the present invention.

[0007] The term "and/or" wherever used herein includes the meaning of "and", "or" and "all or any other combination of the elements connected by said term". For example, A, B and/or C means A, B, C, A+B, A+C, B+C and A+B+C.

[0008] The term "about" or "approximately" as used herein means within 20%, preferably within 10%, and more preferably within 5% of a given value or range. It includes also the concrete number, e.g., about 20 includes 20.

[0009] The terms “less than”, “more than” and “larger than” include the concrete number. For example, less than 20 means ≤20 and more than 20 means ≥20.

[0010] Further, in describing representative embodiments of the present invention, the specification may have presented the method and/or process of the present invention as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. As one of ordinary skill in the art would appreciate, other sequences of steps may be possible. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. In addition, the claims directed to the method and/or process of the present invention should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the present invention.

[0011] It should be understood that this invention is not limited to the particular methodology, protocols, material, reagents, and substances, etc., described herein. The terminologies used herein are for the purpose of describing particular embodiments only and are not intended to limit the scope of the present invention, which is defined solely by the claims/items.

[0012] All publications and patents cited in this disclosure are incorporated by reference in their entirety. To the extent the material incorporated by reference contradicts or is inconsistent with this specification, the specification will supersede any such material.


[0013] Throughout this specification and the claims which follow, unless the context requires otherwise, the word “comprise”, and variations such as “comprises” and “comprising”, will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps. When used herein, the term “comprising” can be substituted with the term “containing” or sometimes with the term “having”. When used herein, “consisting of” excludes any element, step, or ingredient not specified in the claim element. When used herein, "consisting essentially of" does not exclude materials or steps that do not materially affect the basic and novel characteristics of the claim. In each instance herein, any of the terms "comprising", "consisting essentially of" and "consisting of" may be replaced with either of the other two terms.

[0014] Unless otherwise defined herein, scientific and technical terms used in connection with the present invention shall have the meanings that are commonly understood by those of ordinary skill in the art. Further, unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular. The methods and techniques of the present invention are generally performed according to conventional methods well-known in the art. Generally, nomenclatures used in connection with techniques of biochemistry, enzymology, molecular and cellular biology, microbiology, genetics and protein and nucleic acid chemistry and hybridization described herein are those well-known and commonly used in the art.

[0015] The methods and techniques of the present invention are generally performed according to conventional methods well-known in the art and as described in various general and more specific references that are cited and discussed throughout the present specification unless otherwise indicated. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2001); Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing Associates (1992, and Supplements to 2002); Handbook of Biochemistry: Section A Proteins, Vol I, 1976, CRC Press; Handbook of Biochemistry: Section A Proteins, Vol II, 1976, CRC Press. The nomenclatures used in connection with, and the laboratory procedures and techniques of, molecular and cellular biology, protein biochemistry, enzymology and medicinal and pharmaceutical chemistry described herein are those well-known and commonly used in the art.

[0016] “Identity” is a property of sequences that measures their similarity or relationship. The term “sequence identity” or “identity” as used in the present disclosure means the percentage of pairwise identical residues, following (homologous) alignment of a sequence of a polypeptide of the disclosure with a sequence in question, with respect to the number of residues in the longer of these two sequences. Sequence identity is calculated by dividing the number of identical amino acid residues by the total number of residues and multiplying the result by 100.
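The identity calculation of paragraph [0016] can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned (gaps written as "-"), counts identical aligned residues and divides by the length of the longer ungapped sequence, as the definition specifies. The function name and the example peptides are hypothetical, and the alignment step itself (e.g. by BLASTP) is not reproduced.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Identical aligned residues are counted and divided by the number of
    residues in the longer of the two ungapped sequences, times 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    longer = max(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    return 100.0 * identical / longer


# Hypothetical alignment of a 10-residue and a 9-residue peptide.
print(percent_identity("MQIFVKTLTG", "MQIFV-TLSG"))  # 80.0
```

Here 8 aligned positions are identical and the longer sequence has 10 residues, giving 80%.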

[0017] The term “homology” is used herein in its usual meaning and includes identical amino acids as well as amino acids which are regarded to be conservative substitutions (for example, exchange of a glutamate residue by an aspartate residue) at equivalent positions in the linear amino acid sequence of a polypeptide of the disclosure (e.g., any lipocalin muteins of the disclosure).

[0018] The percentage of sequence homology or sequence identity can, for example, be determined herein using the program BLASTP, version blastp 2.2.5 (November 16, 2002) (cf. Altschul et al., Nucleic Acids Res, 1997). In this embodiment, the percentage of homology is based on the alignment of the entire polypeptide sequence (matrix: BLOSUM 62; gap costs: 11.1; cut-off value set to 10⁻³) including the propeptide sequences, preferably using the wild-type protein scaffold as reference in a pairwise comparison. It is calculated as the number of “positives” (homologous amino acids) indicated in the BLASTP program output divided by the total number of amino acids selected by the program for the alignment, expressed as a percentage.
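The positives-based homology percentage of paragraph [0018] is sketched below under a simplifying assumption. BLASTP counts as a positive every aligned pair with a positive BLOSUM62 score; here, a small hand-picked set of conservative pairs stands in for the full matrix, including the glutamate/aspartate exchange named in paragraph [0017]. The denominator is taken to be the alignment length. Names and sequences are hypothetical.

```python
# Hypothetical, deliberately incomplete set of conservative pairs; a real
# BLASTP run counts every aligned pair with a positive BLOSUM62 score.
CONSERVATIVE_PAIRS = {
    frozenset("ED"),  # glutamate/aspartate, the example given in [0017]
    frozenset("KR"),
    frozenset("IV"),
    frozenset("IL"),
    frozenset("ST"),
    frozenset("FY"),
}


def is_positive(x: str, y: str) -> bool:
    """Identical residues and listed conservative pairs count as positives."""
    if x == "-" or y == "-":
        return False
    return x == y or frozenset((x, y)) in CONSERVATIVE_PAIRS


def percent_positives(aligned_a: str, aligned_b: str) -> float:
    """Positives divided by the number of aligned columns, times 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    positives = sum(is_positive(x, y) for x, y in zip(aligned_a, aligned_b))
    return 100.0 * positives / len(aligned_a)


# I/L, K/R and T/S count as positives here; only G/W does not.
print(percent_positives("MQIFVKTLTG", "MQLFVRSLTW"))  # 90.0
```

Note that the same pair of sequences has only 70% identity, which illustrates why homology (positives) is always at least as high as identity.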

[0019] The term "isolated" means a substance in a form or environment that does not occur in nature. Non-limiting examples of isolated substances include (1) any non-naturally occurring substance, (2) any substance including, but not limited to, any enzyme, variant, nucleic acid, protein, peptide or cofactor, that is at least partially removed from one or more or all of the naturally occurring constituents with which it is associated in nature; (3) any substance modified by the hand of man relative to that substance found in nature, e.g. cDNA made from mRNA; or (4) any substance modified by increasing the amount of the substance relative to other components with which it is naturally associated (e.g., recombinant production in a host cell; multiple copies of a gene encoding the substance; and use of a stronger promoter than the promoter naturally associated with the gene encoding the substance).

[0020] The term "nucleotide sequence" or “nucleic acid sequence” as used herein refers to either DNA or RNA. "Nucleic acid sequence", "polynucleotide sequence" or simply “polynucleotide” refers to a single- or double-stranded polymer of deoxyribonucleotide or ribonucleotide bases read from the 5' to the 3' end. It includes self-replicating plasmids, infectious polymers of DNA or RNA, and non-functional DNA or RNA.

[0021] The term “expressing” as used herein refers to the synthesis of a gene product encoded by a polynucleotide. In the context of a polypeptide, “expressing” means that a polynucleotide is transcribed into mRNA and the mRNA is translated into a polypeptide. In the context of an RNA, “expressing” means that DNA is transcribed into RNA, e.g. a tRNA.

[0022] The terms "amino acid" and “natural amino acid” are used interchangeably herein and refer to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids. Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine. Amino acid analogs are compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α-carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid. However, such an analog is not to be confused with an unnatural amino acid, which comprises one or more amino acid residues fused to the R group of an amino acid. Amino acid mimetics are chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that function in a manner similar to a naturally occurring amino acid.

[0023] The terms "polypeptide" and "protein" are used interchangeably. The term "polypeptide" refers to a protein or peptide that contains two or more amino acids, typically at least 3, preferably at least 20, more preferably at least 30, such as at least 50 amino acids. Accordingly, a polypeptide comprises an amino acid sequence, and a polypeptide comprising an amino acid sequence is therefore sometimes referred to herein as a “polypeptide comprising a polypeptide sequence”. Thus, herein the term “polypeptide sequence” is used interchangeably with the term “amino acid sequence”.

[0024] The term "vector" as used herein refers to a nucleic acid sequence into which an expression cassette comprising a gene of the present invention or a gene encoding the protein of interest may be inserted or cloned. Furthermore, the vector may encode an antibiotic resistance gene allowing selection of the host cell. Preferably, the vector is an expression vector. The vector may be capable of autonomous replication in a host cell (e.g., vectors having an origin of replication which functions in the host cell). The vector may have a linear, circular, or supercoiled configuration and may be complexed with other vectors or other material for certain purposes. Vectors used herein for expressing an expression cassette comprising a gene of the present invention or a gene encoding the protein of interest usually contain transcriptional control elements suitable to drive transcription, such as promoters, enhancers, polyadenylation signals, and transcription pausing or termination signals, as elements of an expression cassette. For proper expression of the polypeptides, suitable translational control elements are preferably included in the vector, such as 5' untranslated regions leading to 5' cap structures suitable for recruiting ribosomes, and stop codons to terminate the translation process. In particular, the nucleotide sequences serving as selectable marker genes as well as the nucleotide sequence encoding the protein of interest can be transcribed under the control of transcription elements present in appropriate promoters. The resulting transcripts of the selectable marker genes and of the protein of interest harbour functional translation elements that facilitate substantial levels of protein expression (i.e. translation) and proper translation termination. The vector may comprise a polylinker (multiple cloning site, MCS), i.e. a short segment of DNA that contains many restriction sites, a standard feature of many plasmids used for molecular cloning.
Multiple cloning sites typically contain more than 5, 10, 15, 20 or 25 restriction sites. Restriction sites within an MCS are typically unique (i.e., they occur only once within that particular plasmid). MCSs are commonly used during procedures involving molecular cloning or subcloning. One type of vector is a plasmid, which refers to a circular double-stranded DNA loop into which additional DNA segments may be introduced via ligation or by means of restriction-free cloning. Other vectors include cosmids, bacterial artificial chromosomes (BAC), yeast artificial chromosomes (YAC) or mini-chromosomes. Another type of vector is a viral vector, wherein additional DNA segments may be ligated into the viral genome. The invention further relates to a vector that can be integrated into the host cell's genome and thereby replicates along with the host cell's genome. The expression vector may comprise a predefined restriction site, which can be used for linearization of the vector nucleic acid prior to transfection. The skilled person knows how to integrate a vector into the genome. For example, the placement of the linearization restriction site is important, because this site determines where the vector nucleic acid is opened/linearized and thus determines the order/arrangement of the expression cassettes when the construct is integrated into the genome of the host cell.
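The uniqueness of MCS restriction sites and the role of the linearization site can be checked programmatically. The sketch below scans a circular plasmid for three common palindromic recognition sequences, including sites that span the junction between the last and first base. It reports which enzymes cut exactly once and opens the circle at a chosen site. For simplicity, it opens the circle at the first base of the recognition sequence rather than at each enzyme's actual cleavage position. The toy plasmid and the function names are hypothetical.

```python
# Recognition sequences of three common type II restriction enzymes. All are
# palindromic, so searching one strand also covers the complementary strand.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def site_positions(plasmid: str, site: str) -> list[int]:
    """Start positions of `site` in a circular plasmid, including matches
    that span the junction between the last and the first base."""
    n = len(plasmid)
    wrapped = plasmid + plasmid[: len(site) - 1]
    return [i for i in range(n) if wrapped.startswith(site, i)]


def unique_sites(plasmid: str) -> dict[str, int]:
    """Enzymes that cut exactly once, mapped to the position of their site."""
    result = {}
    for enzyme, site in SITES.items():
        hits = site_positions(plasmid, site)
        if len(hits) == 1:
            result[enzyme] = hits[0]
    return result


def linearize(plasmid: str, cut_index: int) -> str:
    """Open the circle at `cut_index`. The cut position decides which
    elements end up at the two ends of the linear molecule."""
    return plasmid[cut_index:] + plasmid[:cut_index]


# Hypothetical 32-bp "plasmid": one EcoRI site, two BamHI sites and one
# HindIII site that spans the circular junction.
plasmid = "AGCTTGAATTCCCGGATCCAAAGGATCCTTTA"
print(unique_sites(plasmid))      # {'EcoRI': 5, 'HindIII': 31}
print(linearize(plasmid, 5)[:6])  # GAATTC
```

BamHI is excluded because it cuts twice, which would fragment the vector rather than open it. The HindIII site is found only because the search treats the sequence as circular.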

[0025] An antibiotic resistance gene, in accordance with the invention, means a gene which provides the transformed cells with a selection advantage (e.g. resistance against an antibiotic) by expressing the corresponding gene product. The gene product confers a characteristic to the cell expressing the antibiotic resistance gene that allows it to be distinguished from cells that do not express the antibiotic resistance gene (i.e. selection of cells) if the antibiotic to which the gene product confers resistance is applied to the cell culture medium. Resistance by the gene product to the cell may be conferred via different molecular mechanisms (e.g. inactivation of the drug, increased efflux).

[0026] The expression cassette comprising a gene of the present invention or gene encoding the protein of interest is inserted into the expression vector as a DNA construct. This DNA construct can be recombinantly made from a synthetic DNA molecule, a genomic DNA molecule, a cDNA molecule or a combination thereof. The DNA construct is preferably made by ligating the different fragments to one another according to standard techniques known in the art.

[0027] The expression cassette or vector according to the invention which is present in the host may either be integrated into the genome of the host or it may be maintained in some form extrachromosomally.

[0028] Furthermore, the expression cassettes may comprise an appropriate transcription termination site. This is advisable because continued transcription from an upstream promoter through a second transcription unit may inhibit the function of the downstream promoter, a phenomenon known as promoter occlusion or transcriptional interference. This event has been described in both prokaryotes and eukaryotes. The proper placement of transcriptional termination signals between two transcription units can prevent promoter occlusion. Transcription termination sites are well characterized, and their incorporation into expression vectors has been shown to have multiple beneficial effects on gene expression.

III. Description of the Figures

[0029] Figure 1. Site-specific incorporation of AzGGK into proteins in E. coli. a) Structural formulas of AzGGK and GGK. b) SDS-PAGE analysis of AzGGK-dependent expression of full-length sfGFP bearing an amber codon at position 150. c) Purified sfGFP-N150AzGGK can be reduced to sfGFP-N150GGK by treatment with 2DPBA (2-(diphenylphosphino)benzoic acid). d) MS analysis confirming the integrity of sfGFP-N150AzGGK and sfGFP-N150GGK.
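The amber-codon strategy behind Figure 1b can be sketched as a codon-by-codon translation. In this sketch, TAG is a stop signal unless the orthogonal synthetase/tRNACUA pair and the unnatural amino acid are present, in which case it is read as a sense codon. The model is idealised: it ignores competition with release factor 1, so it never yields a mixture of truncated and full-length protein. "X" is a placeholder for AzGGK, and the ORF is hypothetical, not the sfGFP gene.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(orf: str, amber_suppression: bool, ncaa: str = "X") -> str:
    """Translate an open reading frame codon by codon.

    With amber_suppression=True, TAG is read as a sense codon and the
    placeholder `ncaa` is inserted (idealised: no competition with release
    factor 1). Otherwise TAG, like TAA and TGA, terminates translation.
    """
    protein = []
    for start in range(0, len(orf) - 2, 3):
        codon = orf[start:start + 3].upper()
        if codon == "TAG" and amber_suppression:
            protein.append(ncaa)
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical ORF: Met-Ser-Lys-[amber]-Gly-His-stop.
orf = "ATGAGCAAATAGGGCCATTAA"
print(translate(orf, amber_suppression=False))  # MSK (truncated)
print(translate(orf, amber_suppression=True))   # MSKXGH (full-length)
```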

[0030] Figure 2. Srt5M-mediated ubiquitylation of GGK-bearing proteins. a) SDS-PAGE analysis of Srt5M-mediated formation of sfGFP-Ub conjugates confirms the specificity of the approach for GGK-bearing sfGFP. b) MS analysis of sfGFP-Ub(PT) and sfGFP-Ub(LPT) conjugates. c) SDS-PAGE analysis of Srt5M-mediated diUb formation between Ub-KxGGK and Ub(LPT) at 37 °C for 1 hour. Ubiquitylation assays were carried out as described in Supplementary Methods.

[0031] Figure 3. Srt2A-mediated ubiquitylation of GGK-bearing proteins. a) Srt2A-mediated formation of diUbs analysed by SDS-PAGE. * denotes an impurity from Srt2A. b) Incubation of diUbs with USP2_CD shows that Srt2A-generated K6-DiUb(AT) and K6-DiUb(LAT) are stable towards DUB cleavage, while natively linked K6-DiUb is quantitatively cleaved within an hour at 37 °C. Left: SDS-PAGE analysis; right: MS analysis of DUB cleavage. ** denotes bands and MS peaks that correspond to diUbs in which the C-terminal His6-tag of the acceptor ubiquitin has been cleaved off. c) PCNA (PDB: 1axc) is a homotrimeric DNA-repair protein that is ubiquitylated at position K164. d) SDS-PAGE analysis of Srt2A-mediated formation of the PCNA-Ub(LAT) conjugate. * denotes the thioester intermediate generated between Srt2A and Ub(AT). e) Incubation of natively ubiquitylated PCNA-Ub(nat) and Srt2A-generated PCNA-Ub(AT) with Usp1/UAF1 shows that Srt2A-generated mono-ubiquitylated PCNA is resistant to DUB cleavage. Ubiquitylation reactions and deubiquitylation assays were carried out as described in Supplementary Methods.

[0032] Figure 4. Site-specific SUMOylation of GGK-bearing proteins. a) Ubiquitin (PDB: 1ubq) and SUMO (PDB: 1y8r) show a similar globular β-grasp fold with an unstructured C-terminus. The wild-type C-terminal sequence of SUMO1 is QEQTGG. For recognition by Srt2A, two point mutations are introduced: Q92L and E93A. b) SDS-PAGE analysis of Srt2A-catalysed SUMOylation shows specificity for sfGFP-GGK. c) SUMOylation in living E. coli using mSrt2A is specific for sfGFP-GGK, as analysed by anti-His6 Western blots. Time points refer to times after 2DPBA addition. Srt2A- and mSrt2A-mediated SUMOylation on purified proteins and in living E. coli were carried out as described in Supplementary Methods.
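The two point mutations described for SUMO1 can be applied and checked with a short script. It parses mutations in the usual <wild type><position><mutant> notation, applies them to the C-terminal segment QEQTGG numbered from Q92, and tests for an LAXTG-type motif. Treating LAXTG as the Srt2A recognition pattern is an assumption of this sketch, chosen to match the LALTG motif used with Srt2A for ubiquitin. The helper names are hypothetical.

```python
import re


def apply_mutations(segment: str, first_residue: int, mutations: list[str]) -> str:
    """Apply point mutations written as <wt><position><mutant> (e.g. "Q92L")
    to a segment whose first residue carries the number `first_residue`."""
    residues = list(segment)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"cannot parse mutation {mutation!r}")
        wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
        index = position - first_residue
        # Guard against wrong numbering: the stated wild-type residue must match.
        if not 0 <= index < len(residues) or residues[index] != wild_type:
            raise ValueError(f"{mutation}: wild-type residue does not match")
        residues[index] = mutant
    return "".join(residues)


# Assumed Srt2A recognition pattern (LAXTG, X = any residue).
LAXTG = re.compile(r"LA.TG")

sumo1_tail = "QEQTGG"  # SUMO1 C-terminus, first residue Q92
mutant_tail = apply_mutations(sumo1_tail, 92, ["Q92L", "E93A"])
print(mutant_tail)                      # LAQTGG
print(bool(LAXTG.search(sumo1_tail)))   # False
print(bool(LAXTG.search(mutant_tail)))  # True
```

The wild-type check catches numbering errors: for example, "E92L" is rejected because residue 92 is glutamine.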

[0033] Figure 5. Incorporation of AzGGK into proteins in mammalian cells and sortase-mediated ubiquitylation and SUMOylation of proteins in living HEK293T cells. a) Fluorescence microscopy and anti-His6 Western blots of HEK293T cells expressing sfGFP-150TAG-His6 in the absence and presence of 2 mM AzGGK. b) MS analysis of sfGFP-N150AzGGK-His6 purified from HEK293T cells. c) mSrt2A-mediated ubiquitylation and SUMOylation in live HEK293T cells shows specificity for Ub(LAT) and SUMO(LAT), as analysed by anti-His6 Western blots. Overexpression of SUMO(wt) and Ub(wt) did not lead to sfGFP-Ub or sfGFP-SUMO conjugates. Likewise, in the absence of 2DPBA (i.e. when sfGFP-AzGGK-His6 is not reduced to sfGFP-GGK-His6), mSrt2A-mediated ubiquitylation and SUMOylation did not take place, proving the specificity and inducibility of our approach. d) AzGGK-dependent expression of PCNA-164TAG-His6 in HEK293T cells as analysed by anti-His6 Western blots. Srt2A-mediated ubiquitylation and SUMOylation of PCNA-K164AzGGK-His6 in the presence of 2DPBA is dependent on overexpression of Ub and SUMO variants bearing a sortagging motif. Srt2A variants were tagged with a C-terminal Myc-tag.

[0034] Figure 6: General scheme showing sortase-mediated ubiquitylation. The unnatural amino acid AzGGK is site-specifically incorporated into proteins via genetic code expansion. In vivo Staudinger reduction converts AzGGK-bearing proteins to GGK-bearing proteins, which in turn undergo transpeptidation via SrtA with a modified ubiquitin bearing a sortase recognition motif (LPLTG or LALTG). The resulting ubiquitylated proteins display a native isopeptide bond and two point mutations (R72P or R72A, and R74T) in the linker region.
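The transpeptidation in Figure 6 can be modelled on sequence strings. The model also shows why the product carries a native isopeptide linkage with only two point mutations. In this minimal sketch, the sortase cleaves the donor between the Thr and Gly of its motif and joins the Thr to the N-terminal glycine of the GGK acceptor, so the donor's G75-G76 are replaced by the Gly-Gly of GGK. "GGK" here is shorthand for the Gly-Gly unit on the lysine side chain, not a backbone sequence. The regular expression L[PA].TG (covering LPXTG and LAXTG) and the function names are assumptions.

```python
import re
from typing import Optional

# Ubiquitin residues 71-76 (wild type LRLRGG) and the two donor variants.
UB_TAIL_WT = "LRLRGG"
UB_TAIL_PT = "LPLTGG"  # R72P/R74T, Ub(PT)
UB_TAIL_AT = "LALTGG"  # R72A/R74T, Ub(AT)

# Assumed recognition patterns: LPXTG (Srt5M) or LAXTG (Srt2A).
MOTIF = re.compile(r"L[PA].TG")


def transpeptidate(donor: str, acceptor: str) -> Optional[str]:
    """String model of a sortase reaction.

    The donor is cleaved between Thr and Gly of its motif (in the enzyme this
    proceeds via a thioester intermediate), and the Thr is joined to the
    N-terminal Gly of the acceptor. Returns None if the donor has no motif or
    the acceptor does not begin with Gly-Gly.
    """
    match = MOTIF.search(donor)
    if match is None or not acceptor.startswith("GG"):
        return None
    cut = match.end() - 1  # index of the motif Gly; keep everything before it
    return donor[:cut] + acceptor


print(transpeptidate(UB_TAIL_PT, "GGK"))  # LPLTGGK
print(transpeptidate(UB_TAIL_AT, "GGK"))  # LALTGGK
print(transpeptidate(UB_TAIL_WT, "GGK"))  # None (no motif in wild-type Ub)
```

In both products, the ubiquitin tail reads LPLTGG or LALTGG up to the lysine. This is the wild-type LRLRGG with the two substitutions, joined through the glycine at position 76. The product string still contains the motif, which is the string-level counterpart of the reversibility discussed for Figure 13.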

[0035] Figure 7: Sortase-mediated transpeptidation between GGK and a peptide resembling the Srt5M-compatible C-terminus. a) Structural formula of the unnatural amino acid GGK. b) Structural formula of the sortase-mediated transpeptidation product formed between peptide Fmoc-VLPLTGG and GGK. c) LC-MS analysis of Srt5M-mediated transpeptidation between Fmoc-VLPLTGG and GGK shows close to quantitative formation of product within 30 minutes (red box), while incubation of Fmoc-VLPLTGG with AzGGK in the presence of Srt5M did not yield the transpeptidation product, as expected (blue box). In the absence of GGK, hydrolysis of the thioester formed between Srt5M and Fmoc-VLPLTGG led to small amounts of hydrolysed peptide Fmoc-VLPLT (grey box). Fmoc = fluorenylmethoxycarbonyl. d) Overnight incubation of a ubiquitin mutant displaying two mutations in its C-terminus (Ub(PT)) with GGK and Srt5M leads to formation of the expected product.

[0036] Figure 8: Structural analysis of wild-type PylRS. Crystal structure of MmPylRS (PDB: 2Q7H). The PylRS C-lobe is shown as a cartoon model in red, blue and green. Key amino acid positions in the active site are displayed as sticks in orange. The blue and green parts together (723 bp) represent the region subjected to DNA shuffling; the green part (495 bp) was subjected to error-prone PCR. The PylRS variants used for the shuffling approach contained amino acid mutations at the positions highlighted in orange.
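The two diversification methods named for Figure 8 can be illustrated with a toy Python model. DNA shuffling is approximated as random crossover between equal-length parental sequences, and error-prone PCR as independent base substitutions at a fixed per-base rate. The parental sequences, the number of crossovers and the 0.5% substitution rate are hypothetical. Only the region lengths (723 bp and 495 bp, i.e. 241 and 165 codons) come from the caption.

```python
import random


def shuffle(parents: list[str], n_crossovers: int, rng: random.Random) -> str:
    """Toy DNA shuffling: stitch segments of equal-length parents together at
    random crossover points, keeping every base at its original position."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must have equal length")
    points = sorted(rng.sample(range(1, length), n_crossovers))
    segments, start = [], 0
    for end in points + [length]:
        segments.append(rng.choice(parents)[start:end])
        start = end
    return "".join(segments)


def error_prone_pcr(seq: str, rate: float, rng: random.Random) -> tuple[str, int]:
    """Toy error-prone PCR: substitute each base with probability `rate`.
    Returns the mutated sequence and the number of substitutions."""
    bases, n_mut = [], 0
    for base in seq:
        if rng.random() < rate:
            bases.append(rng.choice([b for b in "ACGT" if b != base]))
            n_mut += 1
        else:
            bases.append(base)
    return "".join(bases), n_mut


rng = random.Random(0)  # fixed seed for reproducibility
# Hypothetical parental variants of the 723-bp shuffled region.
parents = ["".join(rng.choice("ACGT") for _ in range(723)) for _ in range(3)]
library_member = shuffle(parents, n_crossovers=4, rng=rng)
# Hypothetical stand-in for the 495-bp region diversified by error-prone PCR.
green_region = "".join(rng.choice("ACGT") for _ in range(495))
mutant, n_mut = error_prone_pcr(green_region, rate=0.005, rng=rng)
print(len(library_member) // 3, "codons shuffled")
print(len(mutant) // 3, "codons mutagenised with", n_mut, "substitutions")
```

Shuffling only recombines mutations already present in the parents, whereas error-prone PCR introduces new ones. This is one plausible reason to apply the two methods to overlapping regions.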

[0037] Figure 9: Site-specific incorporation of AzGGK into sfGFP in E. coli. Expression of sfGFP-150TAG-His6 in the presence of AzGGKRS/tRNACUA and AzGGK (4 mM) under auto-induction conditions led to AzGGK-dependent synthesis of full-length sfGFP-His6, as confirmed by LC-MS. Apart from the mass peak corresponding to sfGFP-AzGGK, a further small peak (green) was observed in the ESI-MS analysis, which could be assigned to misincorporation of phenylalanine. Omitting phenylalanine from the amino acid mix led to clean expression of sfGFP-AzGGK.
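The mass arithmetic behind Figures 1d and 9 can be sketched from elemental formulas. Because the variants differ only at the amber-encoded position, the intact-protein mass difference equals the residue mass difference. The sketch assumes that AzGGK is GGK with its terminal amine replaced by an azide, consistent with its reduction to GGK in Figure 1c. It also assumes that the phenylalanine of Figure 9 occupies the amber site. It uses monoisotopic masses; deconvoluted intact-protein spectra are often reported as average masses, which shift the values slightly.

```python
# Monoisotopic atomic masses (Da).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Residue formulas (free amino acid minus H2O). The AzGGK formula assumes
# GGK with its terminal -NH2 replaced by -N3 (an assumption of this sketch).
RESIDUES = {
    "Lys": {"C": 6, "H": 12, "N": 2, "O": 1},
    "Phe": {"C": 9, "H": 9, "N": 1, "O": 1},
    "GGK": {"C": 10, "H": 18, "N": 4, "O": 3},
    "AzGGK": {"C": 10, "H": 16, "N": 6, "O": 3},
}


def residue_mass(formula: dict[str, int]) -> float:
    """Sum of monoisotopic atomic masses for an elemental formula."""
    return sum(MONO[element] * count for element, count in formula.items())


masses = {name: residue_mass(formula) for name, formula in RESIDUES.items()}
# Staudinger reduction with 2DPBA: -N3 -> -NH2 (net loss of N2, gain of H2).
reduction_shift = masses["GGK"] - masses["AzGGK"]
# Phenylalanine instead of AzGGK at the amber site.
phe_shift = masses["Phe"] - masses["AzGGK"]
print(f"AzGGK -> GGK: {reduction_shift:+.4f} Da")  # AzGGK -> GGK: -25.9905 Da
print(f"AzGGK -> Phe: {phe_shift:+.4f} Da")        # AzGGK -> Phe: -121.0600 Da
```

The lysine entry serves as a sanity check: it reproduces the familiar monoisotopic lysine residue mass of about 128.095 Da.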

[0038] Figure 10. Srt5M-mediated ubiquitylation of sfGFP-GGK. Time course of Srt5M-mediated ubiquitylation of sfGFP-GGK with Ub(PT) or Ub(LPT). sfGFP-GGK (20 µM) was incubated at 37 °C with 100 µM Ub(PT) or Ub(LPT) in the presence of Srt5M (20 µM); reactions were quenched at the indicated time points with 4x Laemmli buffer (and heated to 95 °C) and analysed via SDS-PAGE. Incubation of sfGFP-BocK under the same conditions did not lead to ubiquitylated product.
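Setting up the reaction of Figure 10 is a C1V1 = C2V2 calculation. The sketch below takes the caption's concentrations as micromolar: 20 µM sfGFP-GGK, 100 µM donor ubiquitin and 20 µM Srt5M, i.e. a five-fold excess of donor. The stock concentrations and the 50 µL reaction volume are hypothetical.

```python
def stock_volume_ul(final_um: float, total_ul: float, stock_um: float) -> float:
    """C1*V1 = C2*V2 solved for the stock volume V1 (in uL)."""
    if final_um > stock_um:
        raise ValueError("stock is more dilute than the target concentration")
    return final_um * total_ul / stock_um


# Final concentrations from the caption (read as uM); stocks are hypothetical.
reaction = {
    "sfGFP-GGK": (20.0, 200.0),  # (final uM, stock uM)
    "Ub(LPT)": (100.0, 1000.0),
    "Srt5M": (20.0, 400.0),
}
total_ul = 50.0
volumes = {
    name: stock_volume_ul(final, total_ul, stock)
    for name, (final, stock) in reaction.items()
}
buffer_ul = total_ul - sum(volumes.values())  # remainder made up with buffer
print(volumes, buffer_ul)  # {'sfGFP-GGK': 5.0, 'Ub(LPT)': 5.0, 'Srt5M': 2.5} 37.5
print("donor excess:", reaction["Ub(LPT)"][0] / reaction["sfGFP-GGK"][0])  # 5.0
```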

[0039] Figure 11: Srt5M-catalysed formation of diubiquitins. a) SDS-PAGE analysis of purified C-terminally His6-tagged ubiquitins bearing GGK at the respective lysine positions (Ub-K6GGK-His6, Ub-K11GGK-His6, Ub-K27GGK-His6, Ub-K29GGK-His6, Ub-K33GGK-His6, Ub-K48GGK-His6, Ub-K63GGK-His6). b) ESI-MS analysis of Ub-K6GGK-His6. c) SDS-PAGE analysis of differently linked diubiquitins formed by Srt5M-catalysed transpeptidation between Ub-KxGGK and Ub(LPT).
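The seven acceptor variants listed in Figure 11a correspond to the seven lysines of ubiquitin mentioned in paragraph [0002]. A short script can locate them in the 76-residue human ubiquitin sequence and generate construct names in the caption's Ub-K<n>GGK-His6 style. The helper names are hypothetical.

```python
# Human ubiquitin (76 residues), written in blocks of ten for easy checking.
UBIQUITIN = (
    "MQIFVKTLTG" "KTITLEVEPS" "DTIENVKAKI" "QDKEGIPPDQ"
    "QRLIFAGKQL" "EDGRTLSDYN" "IQKESTLHLV" "LRLRGG"
)


def lysine_positions(seq: str) -> list[int]:
    """1-based positions of all lysine (K) residues."""
    return [i for i, aa in enumerate(seq, start=1) if aa == "K"]


def acceptor_constructs(seq: str) -> list[str]:
    """One GGK acceptor name per lysine, following the caption's naming."""
    return [f"Ub-K{n}GGK-His6" for n in lysine_positions(seq)]


print(len(UBIQUITIN), lysine_positions(UBIQUITIN))  # 76 [6, 11, 27, 29, 33, 48, 63]
print(acceptor_constructs(UBIQUITIN)[0])            # Ub-K6GGK-His6
```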

[0040] Figure 12: Srt2A-mediated ubiquitylation of GGK-bearing proteins. a) Incubation of sfGFP-GGK with Ub(AT) and Ub(LAT) in the presence of Srt2A leads to specific formation of sfGFP-Ub(AT) and sfGFP-Ub(LAT) conjugates, while sfGFP-BocK is unreactive under the same reaction conditions. b) MS analysis of sfGFP-Ub(AT) and sfGFP-Ub(LAT) conjugates. c) While incubation of Ub-K6GGK with Ub(AT) in the presence of Srt2A leads to efficient formation of the corresponding K6-linked diUb, addition of Ub(wt) instead of Ub(AT) did not lead to K6-diUb formation, showing that Srt2A cannot recognize Ub(wt). d) Srt2A-catalysed formation of differently linked diubiquitins by incubation of Ub-KxGGK-His6 with Ub(AT) or Ub(LAT) in the presence of Srt2A. K27- and K29-linked diubiquitins were not accessible through Srt2A-mediated transpeptidation; the K33-diubiquitin was only obtained in good yields when using the Ub(LAT) donor ubiquitin, which displays the more exposed sortagging motif.

[0041] Figure 13: Optimizing conditions for Srt2A-mediated K6-diUb formation. a) Time course of K6-diUb formation between Ub-K6GGK-His6 and Ub(AT) analysed by SDS-PAGE. Upon sortase-mediated transpeptidation between GGK-bearing acceptor ubiquitin and donor ubiquitin, the sortagging motif LALTG is re-installed in the generated diubiquitin molecule, making sortylation in principle a reversible approach. Using the Ub(AT) variant with the less accessible sortagging motif, however, the formed diubiquitins were stable over a period of 16 hours in the presence of high concentrations of Srt2A. b) Time course of K6-diUb formation between Ub-K6GGK-His6 and Ub(LAT) analysed by SDS-PAGE. The donor ubiquitin with a leucine spacer introduced into its C-terminus (Ub(LAT)), which displays a more accessible sortagging motif, showed diubiquitin formation already within 10 minutes, but the formed ubiquitin dimers hydrolysed over a time course of 16 hours, especially in the presence of high Srt2A concentrations.
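The reversibility argument of Figure 13 can be illustrated with a toy kinetic model integrated by the forward Euler method. The model lumps the enzyme into three rate constants: donor plus acceptor forming product (kf), product re-entering the reaction through its re-installed motif (kr), and donor hydrolysis (kh). All constants and starting concentrations are hypothetical and were not fitted to the data. The only point is qualitative: when kr is negligible (Ub(AT)-like), the diubiquitin persists, whereas a larger kr (Ub(LAT)-like) lets slow hydrolysis drain the product over 16 hours.

```python
def simulate(kf: float, kr: float, kh: float, donor0: float, acceptor0: float,
             t_end: float = 960.0, dt: float = 0.1) -> list[float]:
    """Forward-Euler integration of a lumped scheme (uM, minutes; the enzyme
    concentration is absorbed into the rate constants):

        D + A -> P   (kf)  donor Ub + GGK acceptor -> diubiquitin
        P -> D + A   (kr)  product re-recognized via its re-installed motif
        D -> H       (kh)  hydrolysis of the donor (via the thioester)

    Returns the product concentration at every time step.
    """
    d, a, p = donor0, acceptor0, 0.0
    trace = []
    for _ in range(int(round(t_end / dt)) + 1):
        trace.append(p)
        forward = kf * d * a
        reverse = kr * p
        hydrolysis = kh * d
        d += (reverse - forward - hydrolysis) * dt
        a += (reverse - forward) * dt
        p += (forward - reverse) * dt
    return trace


# Hypothetical constants contrasting a hidden product motif (tiny kr) with an
# exposed one (larger kr); kf and kh are kept identical for simplicity.
at_like = simulate(kf=0.01, kr=1e-5, kh=0.005, donor0=100.0, acceptor0=20.0)
lat_like = simulate(kf=0.01, kr=0.01, kh=0.005, donor0=100.0, acceptor0=20.0)
for name, trace in [("AT-like", at_like), ("LAT-like", lat_like)]:
    print(f"{name}: peak {max(trace):.1f} uM, after 16 h {trace[-1]:.1f} uM")
```

If all three steps are enzyme-catalysed, raising the sortase concentration corresponds to scaling all constants by the same factor. This is equivalent to running the clock faster, which fits the stronger product loss reported at high Srt2A concentrations.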

[0042] Figure 14: Preparative formation of Srt2A-generated diubiquitins. a) Overview of expression and purification of components needed for assembly of diubiquitins. Sortase-mediated reactions were typically stopped by adding the cysteine protease inhibitor phenylvinylsulfone. Diubiquitins (with either Ub(AT) or Ub(LAT) as donor ubiquitin) were purified via Ni-NTA chromatography, followed by size-exclusion chromatography. b) All diubiquitin mutants were obtained on a multi-milligram scale and were characterized by LC-MS.

[0043] Figure 15: Deubiquitylation assays. Incubation of Srt2A-generated diubiquitins and native diubiquitins with the promiscuous DUB USP2_CD showed that Srt2A-generated diubiquitins are stable against the isopeptidase activity of the DUB, while all the native diubiquitins were cleaved within an hour at 37 °C. ** denotes bands corresponding to C-terminal His6-tag cleavage of the donor ubiquitin. Deubiquitylation assays were carried out as described in Supplementary Methods.

[0044] Figure 16: Ubiquitylation of PCNA-K164GGK. a) PCNA-K164GGK was purified from PCNA-K164AzGGK-CPD-His6 and treated with Ub(LAT) in the presence of Srt2A at 37 °C or 25 °C. PCNA-Ub(LAT) conjugate formation was observed within 30 minutes. * denotes the thioester formed between Srt2A and Ub(LAT). The formation of the PCNA-Ub(LAT) conjugate is specific for GGK-bearing PCNA: no ubiquitylated PCNA is detected when using PCNA-K164BocK.

[0045] Figure 17: Srt2A-mediated SUMOylation of GGK-bearing proteins. a) Time course for formation of SUMO-sfGFP conjugates. sfGFP-GGK (20 µM) was incubated in the presence of Srt2A (20 µM) with an excess of SUMO(AT) or SUMO(LAT) (100 µM) at 37 °C, and product formation was analysed by SDS-PAGE at different time points. As seen for ubiquitylation, SUMOylation proceeded more rapidly when using the SUMO(LAT) variant, which bears a leucine spacer preceding the sortagging motif. Incubation of sfGFP-BocK under the same reaction conditions confirmed the specificity of Srt2A-mediated SUMOylation. b) Time course for Srt2A-catalysed formation of a Ub-SUMO conjugate analysed by SDS-PAGE.

[0046] Figure 18: Comparison of different sortase A enzymes. a) Comparison of S. aureus wt SrtA (PDB: 2kid) and S. pyogenes wt SrtA (PDB: 3fn5). S. aureus SrtA is strongly Ca²⁺-dependent. Binding of Ca²⁺ to glutamate residues in the β3/β4 loop, distal to the active site, enhances substrate binding by stabilizing a closed conformation of the active-site β6/β7 loop. S. pyogenes SrtA is Ca²⁺-independent, and the β3/β4 loop and β6/β7 loop are kept in a closed conformation through hydrogen bonding between K126 and D196. b) Sequence alignment of the SrtA enzymes used in this study. In analogy to the Ca²⁺-independent Srt7M mutant, the present inventors introduced K47 and Q50 mutations into the β3/β4 loop of Srt2A, generating mSrt2A.

[0047] Figure 19: Comparison between Srt2A and mSrt2A. a) Purified Ub-K6GGK-His6 was incubated with an excess of Ub(LAT) in the presence of Srt2A or mSrt2A. The reactions were supplemented with either 5 mM CaCl₂ or 5 mM EGTA (no CaCl₂ added). mSrt2A performed equally well in the presence and absence of Ca²⁺, whereas Srt2A showed reduced efficiency under Ca²⁺-free conditions. * denotes an impurity from Srt2A/mSrt2A. b) In vivo ubiquitylation of sfGFP-GGK in E. coli. sfGFP-AzGGK was co-expressed together with SUMO(LAT) and either Srt2A or mSrt2A for 24 hours. After washing of cells to remove AzGGK, cells were treated with 2DPBA for the indicated times, washed again and analysed by anti-His6 Western blots. Consistent with the results of in vitro experiments conducted in the absence of Ca²⁺ (a), in vivo SUMOylation was much more effective with mSrt2A. Srt2A- and mSrt2A-mediated ubiquitylation/SUMOylation on purified proteins and in living E. coli was carried out as described in Supplementary Methods.

[0048] Figure 20: In vivo ubiquitylation and SUMOylation of PCNA. a) PCNA-K164AzGGK-CPD-His6 was co-expressed together with mSrt2A and Ub(LAT) for 24 hours. After washing of cells to remove AzGGK, cells were treated with 2DPBA for the indicated times, washed again and analysed by anti-His6 Western blots. Expression of PCNA-K164BocK-CPD-His6 shows that in vivo ubiquitylation is dependent on GGK-bearing proteins. b) Same as (a) for SUMOylation. In vivo ubiquitylation/SUMOylation of PCNA-CPD-His6 in living E. coli was carried out as described in Supplementary Methods.

[0049] Figure 21: Sortase-mediated ubiquitylation and SUMOylation in mammalian cells. a) Ubiquitylation and SUMOylation in HEK293T cell lysates: sfGFP-N150AzGGK-His6 was expressed for 48 hours in HEK293T cells. Cells were washed with AzGGK-free medium, incubated with 0.5 mM 2DPBA overnight, washed again and lysed by consecutive freeze-thaw cycles. Lysates were treated with 20 µM SrtA variant (Srt5M, Srt7M, Srt2A or mSrt2A) and 100 µM Ub(LPT) or Ub(LAT) for one hour at 37 °C and analysed by anti-His6 Western blotting. In the absence of 2DPBA no ubiquitylation of sfGFP could be observed. All four tested SrtA variants were active in forming sfGFP-Ub conjugates in buffer containing 5 mM CaCl₂, while only the Ca²⁺-independent mutants Srt7M and mSrt2A were able to ubiquitylate sfGFP-GGK within one hour in the presence of the Ca²⁺-chelating agent EGTA. b) mSrt2A- and Srt2A-mediated ubiquitylation and SUMOylation in live HEK293T cells. See Figure 4c. Experiments were carried out as described in Supplementary Methods.

[0050] Figure 22: Examples of unnatural amino acids. Structural formulas of the unnatural amino acids discussed herein: AzGGK, AzGK, GGK and GK. Furthermore, structural formulas of unnatural amino acids in which the N-terminal glycine moiety of GGK is protected with a photocaging group (either 2-nitrobenzyl or coumarin) are shown.

[0051] Figure 23: Generation of ubiquitin chains conjugated to a polypeptide of interest. a) Using orthogonal sortases: A bifunctional ubiquitin bearing a Srt2A-compatible C-terminus and a protected GGK moiety (AzGGK or photocaged GGK) at a specific position is reacted with a protein of interest (POI) bearing GGK in the presence of Srt2A to give a site-specifically ubiquitylated POI. The protected GGK amino acid in the Ub molecule is then deprotected using light or a phosphine to yield GGK, which is reacted with a ubiquitin bearing a Srt5M-compatible C-terminal motif in the presence of Srt5M to give the product shown.

b) Using subtiligase: Ubiquitin (1-74) bearing a protected GGK amino acid (AzGGK or photocaged GGK) is expressed as a thioester via intein technology and reacted with a POI bearing GGK at a specific position using subtiligase. In a second step the protecting group is removed and GGK can react with a second Ub(1-74) thioester in the presence of subtiligase.

[0052] Figure 24: Transpeptidase reaction employing a subtiligase.

a) A recombinant protein is expressed as an intein fusion and eluted from a chitin column as a thioester using a thiol (e.g. MESNA, sodium 2-mercaptoethanesulfonate). The thioester can be ligated to the N-terminus of a peptide using subtiligase.

b) Subtiligase is known to interact with four residues (P₄ to P₁) N-terminal to and two residues (P₁' to P₂') C-terminal to the ligation site.
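The P/P' labels in panel b) follow the usual Schechter-Berger convention, in which the bond being formed lies between P1 and P1'. As a minimal sketch, assuming a linear acyl donor (the thioester) and a linear acyl acceptor, the helper below labels the residues that subtiligase is described as interacting with. The function name `subsite_labels` and the example peptides are hypothetical and only illustrate the numbering.

```python
def subsite_labels(acyl_donor: str, acceptor: str, n_side: int = 4, c_side: int = 2) -> dict:
    """Label residues around a subtiligase ligation site (Schechter-Berger notation).

    acyl_donor: one-letter sequence of the (thio)ester donor; its C-terminal
                residue becomes P1.
    acceptor:   one-letter sequence of the peptide whose free N-terminal amine
                attacks the donor; its N-terminal residue becomes P1'.
    """
    if len(acyl_donor) < n_side or len(acceptor) < c_side:
        raise ValueError("peptides are shorter than the requested P/P' window")
    labels = {}
    # Count P1, P2, ... backwards from the donor C-terminus.
    for i in range(1, n_side + 1):
        labels[f"P{i}"] = acyl_donor[-i]
    # Count P1', P2', ... forwards from the acceptor N-terminus.
    for j in range(1, c_side + 1):
        labels[f"P{j}'"] = acceptor[j - 1]
    return labels


# Hypothetical example peptides, for illustration only.
print(subsite_labels("ACDEFGHIKL", "SFAAA"))
```

Only the six positions named in panel b) are labelled; residues further away are simply not reported.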

[0053] Figure 25: Hydrolysis assays of differently linked DiUbs reveal orthogonal Sortases.

K6-DiUbs linked via different sortase motifs (S2A = LALTG; S5M = LPLTG; S4S = LPLSG) were incubated with the three different sortase variants. a) Incubation of the three differently linked K6-DiUbs (40 µM) with S2A (10 µM) in sortase buffer at 37 °C shows fast hydrolysis for the LALTG-linked DiUb, slower hydrolysis for the LPLSG-linked DiUb and no hydrolysis for the LPLTG-linked DiUb. b) Incubation of the three differently linked K6-DiUbs (40 µM) with S5M (10 µM) in sortase buffer at 37 °C shows fast hydrolysis for the LPLTG- and LPLSG-linked DiUbs but no hydrolysis for the LALTG-linked DiUb. c) Incubation of the three differently linked K6-DiUbs (40 µM) with S4S (10 µM) in sortase buffer at 37 °C shows fast hydrolysis for the LPLTG- and LPLSG-linked DiUbs but no hydrolysis for the LALTG-linked DiUb.
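Panels a) to c) amount to a small stability matrix. One practical reading, consistent with the S2A-then-S5M order used in Figure 26, is that a linkage made by one sortase should survive the sortase used in the next step. The sketch below encodes the qualitative outcomes above and lists ordered enzyme pairs that satisfy this rule. Treating only "no hydrolysis" as compatible (so "slow" counts as incompatible) is an assumption, and the model deliberately ignores whether the second enzyme also acts on the new donor motif; the names `HYDROLYSIS`, `INSTALLS` and `compatible_orders` are hypothetical.

```python
# Qualitative hydrolysis outcomes from Figure 25: sortase -> {linkage motif: rate}.
HYDROLYSIS = {
    "S2A": {"LALTG": "fast", "LPLSG": "slow", "LPLTG": "none"},
    "S5M": {"LALTG": "none", "LPLSG": "fast", "LPLTG": "fast"},
    "S4S": {"LALTG": "none", "LPLSG": "fast", "LPLTG": "fast"},
}

# Linkage motif that each sortase leaves in its product (Figure 25 legend).
INSTALLS = {"S2A": "LALTG", "S5M": "LPLTG", "S4S": "LPLSG"}


def compatible_orders(stable_if: str = "none") -> list:
    """Return (first, second) pairs whose first linkage is not hydrolysed by the second enzyme."""
    pairs = []
    for first, motif in INSTALLS.items():
        for second, rates in HYDROLYSIS.items():
            # Assumption: a linkage counts as stable only if no hydrolysis was observed.
            if first != second and rates[motif] == stable_if:
                pairs.append((first, second))
    return pairs


print(compatible_orders())
```

Under these assumptions the S2A-then-S5M order of Figure 26 appears among the compatible pairs, while any order involving an LPLSG linkage does not.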

[0054] Figure 26: Orthogonal Sortases allow assembly of isopeptide-linked triubiquitin. a) Scheme illustrating the iterative use of orthogonal sortases to assemble a triubiquitin (TriUb) linked via two isopeptide bonds (K48 and K6). In the first step, S2A forms an isopeptide bond between Ub(LAT) and UbK48GGK-LPT, yielding Ub-isoK48(LAT)-Ub-LPT. Subsequent reaction of Ub-isoK48(LAT)-Ub-LPT with S5M and UbK6GGK results in a TriUb linked via two isopeptide bonds at K48 and K6. b) SDS-PAGE analysis of the transpeptidation reaction between Ub-isoK48(LAT)-Ub-LPT and UbK6GGK mediated by S5M. TriUb assembly is already visible after 5 min incubation at 37 °C and reaches a maximum yield at approx. 1 h. c) Left: Purification of TriUb via size-exclusion chromatography (SEC) yields pure protein. Right: LC-MS analysis of purified TriUb.

[0055] Figure 27: Orthogonal Sortases allow assembly of DiUb-SUMO2 hybrid chains.

a) Scheme illustrating the iterative use of orthogonal sortases to assemble different DiUb-SUMO2 hybrid chains. In the first step, S2A forms an isopeptide bond between Ub(LAT) and UbK63GGK-LPT, yielding Ub-isoK63(LAT)-Ub-LPT. In the second step, Ub-isoK63(LAT)-Ub-LPT serves as a platform for S5M-mediated transpeptidation, leading to attachment of SUMO2-KXXGGK via an isopeptide bond. In theory, nine different hybrid DiUb-SUMO2 chains are possible, either via one of the eight lysine residues of SUMO2 or via the N-terminus, giving a linear linkage. b) SDS-PAGE analysis of the formation of linear DiUb-SUMO2 hybrid chains. Incubation of Ub-isoK63(LAT)-Ub-LPT with GG-SUMO2 (a SUMO2 variant with a C-terminal diglycine motif) and S5M leads to the formation of a Ub-isoK63(LAT)-Ub-LPT-SUMO2 hybrid chain within 1 hour. Incubation of Ub-isoK63(LAT)-Ub-LPT without GG-SUMO2 leads to S5M-dependent cleavage of the C-terminal His-tag of Ub-isoK63(LAT)-Ub-LPT. c) SEC yields pure hybrid chain. d) SDS-PAGE analysis of the formation of isopeptide-linked DiUb-SUMO2 hybrid chains. Ub-isoK63(LAT)-Ub-LPT was incubated with S5M either in the absence or in the presence of SUMO2 bearing GGK or BocK at the depicted positions. GGK-specific hybrid chain formation was observed for all tested SUMO2 sites.

IV. Detailed description of the invention

[0056] The present invention is based on the surprising finding that unnatural amino acids can serve as a platform for a transpeptidase conjugation when integrated in a polypeptide of interest. More specifically, the present inventors developed a method for site-specifically conjugating two polypeptides via a transpeptidation reaction. In this context the first polypeptide is modified such that a certain amino acid is substituted with an unnatural amino acid, thereby defining the position in the first polypeptide where the second polypeptide is to be conjugated. The unnatural amino acid comprises a first amino acid and at least one further amino acid conjugated to the side chain of the first amino acid. The second polypeptide comprises a recognition motif for the transpeptidase. Accordingly, when the first polypeptide, the second polypeptide and the transpeptidase are brought into close proximity, the transpeptidase will recognize the recognition motif in the second polypeptide and subsequently catalyse a transpeptidation reaction. In this transpeptidation reaction the second polypeptide is conjugated via its recognition motif site-specifically to the unnatural amino acid of the first polypeptide.

[0057] More precisely, in the first step the transpeptidase recognizes the recognition motif in the second polypeptide, usually a sequence of five amino acids (e.g. Sortase A recognizes the amino acid sequence LPXTG), and forms a covalent thioester bond to the recognition motif of the second polypeptide (Sortase A cleaves between Threonine and Glycine in the recognition motif and forms a covalent bond between a cysteine residue in the Sortase A catalytic site and the carboxy group of the Threonine residue of the recognition motif). As a consequence, a thioester intermediate product is generated in which the transpeptidase is covalently linked to the second polypeptide. In the second step this intermediate product encounters the first polypeptide comprising a free N-terminal amino acid residue (e.g. Sortase A requires an N-terminal Glycine residue), wherein the transpeptidase catalyses the formation of a covalent peptide bond between said free N-terminal amino acid residue of the first polypeptide and the recognition motif of the second polypeptide (the Threonine residue), thereby generating a conjugate comprising the first and the second polypeptide.
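The two steps described above can be followed at the level of one-letter sequences. The sketch below is a deliberately simplified model, assuming a single donor motif used at its most C-terminal occurrence and ignoring hydrolysis and the reverse reaction: step 1 cuts the donor after the Thr of LPXTG, and step 2 joins that acyl part to an acceptor with a free N-terminal Gly. The function name and the sequences are hypothetical.

```python
import re


def sortase_a_transpeptidation(donor: str, acceptor: str) -> str:
    """Toy sequence-level model of the two-step Sortase A reaction.

    Step 1: LPXTG (X = any residue) in the donor is cleaved between Thr and
            Gly; the acyl-enzyme thioester keeps everything up to Thr.
    Step 2: the acyl group is transferred to the free N-terminal Gly of the
            acceptor, forming a new Thr-Gly peptide bond.
    Hydrolysis and the reverse reaction are deliberately ignored.
    """
    matches = list(re.finditer(r"LP[A-Z]TG", donor))
    if not matches:
        raise ValueError("donor lacks an LPXTG recognition motif")
    motif = matches[-1]  # use the most C-terminal motif
    acyl_part = donor[: motif.start() + 4]  # residues up to and including Thr
    if not acceptor.startswith("G"):
        raise ValueError("acceptor has no free N-terminal glycine")
    return acyl_part + acceptor


# Hypothetical sequences: a donor ending in LPETG plus a His tag, and a GG-acceptor.
print(sortase_a_transpeptidation("MKTAYLPETGGHHHHHH", "GGSAK"))  # MKTAYLPETGGSAK
```

The released leaving group (here GGHHHHHH) is simply discarded in this model; in practice it can compete as an acceptor, which is one reason sortase reactions are often run with an excess of one partner.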

[0058] Before the present inventors developed the site-specific method of the present invention, such a transpeptidation reaction was restricted to conjugates of the second polypeptide (comprising the transpeptidase recognition motif) to the N-terminus of the first polypeptide (comprising the free N-terminal amino acid residue).

[0059] Now, the present inventors surprisingly discovered that the described transpeptidation reaction can be conducted not only between the recognition motif of the second polypeptide and a free amino acid residue at the N-terminus of the first polypeptide, but also between the recognition motif of the second polypeptide and an unnatural amino acid integrated in the first polypeptide, provided that said unnatural amino acid carries, fused to its side chain, a free amino acid with a free N-terminus as required by the transpeptidase, i.e. an N-terminus amenable to the formation of a covalent peptide bond (e.g. a Glycine residue conjugated to the side chain of a Lysine residue integrated in the first polypeptide can be used by Sortase A in the second step of the reaction). By way of example, Figure 6 shows such a transpeptidation reaction. In Figure 6, a Ubiquitin polypeptide (second polypeptide) was modified to comprise a Sortase A recognition motif at its C-terminus, and a polypeptide of interest (first polypeptide) was modified to comprise an unnatural amino acid in which a glycylglycine moiety is fused via an isopeptide bond to a Lysine residue integrated in the first polypeptide (termed GGK-bearing POI in Figure 6). In this example, Sortase A recognizes the “LPLTGG” motif of the Ubiquitin polypeptide (second polypeptide) and forms an intermediate product, comprising the Sortase A and the modified Ubiquitin, by forming a thioester between its catalytic cysteine and the Threonine, thereby cleaving the recognition motif between the Threonine and the first Glycine. Next, the Sortase A conjugates the Threonine to the N-terminal Glycine residue of the unnatural amino acid (GGK) integrated in the first polypeptide, thereby generating a conjugate in which a Ubiquitin is fused to a predetermined specific site in the polypeptide of interest.

[0060] In sum, the present inventors developed a new method that allows a first polypeptide to be modified by conjugating a second polypeptide to a predetermined specific site in the first polypeptide via a transpeptidation reaction. Importantly, the method of the present inventors is applicable to proteins under native conditions, allowing the modification of large multi-domain and non-refoldable proteins, an endeavour that is challenging with present chemical methods. This is the first approach in which a site-specifically introduced unnatural amino acid serves as a platform for a chemoenzymatic reaction, namely the transpeptidation reaction between a first polypeptide comprising an unnatural amino acid and a second polypeptide comprising a recognition motif for the transpeptidase.

[0061] Accordingly, the present invention relates to a method for modifying a polypeptide, comprising:

(i) providing a transpeptidase;

(ii) providing a first polypeptide comprising one or more unnatural amino acids, wherein the unnatural amino acid comprises a first amino acid and at least one further amino acid conjugated to the side chain of the first amino acid;

(iii) providing a second polypeptide, comprising a recognition motif for the transpeptidase; and

(iv) obtaining the modified polypeptide.

[0062] The term “transpeptidase” as used herein refers to an enzyme catalyzing a transpeptidation reaction, i.e. the transfer of an amino or peptide group from one molecule to another. A transpeptidase as used herein recognizes a recognition motif in a polypeptide and forms a covalent bond with an amino acid residue of the recognition motif of the polypeptide, thereby generating an intermediate product in which the transpeptidase is covalently linked to the polypeptide. The intermediate product, comprising the transpeptidase and the polypeptide, reacts with a free N-terminal amino acid residue of another polypeptide, wherein the transpeptidase catalyses the formation of a covalent bond between said free N-terminal amino acid residue and the recognition motif, thereby generating a conjugate comprising both polypeptides. Thus, a transpeptidase as used herein forms a peptide bond between a first polypeptide comprising a free N-terminal amino acid and a second polypeptide comprising a recognition motif for the transpeptidase. By way of example, transpeptidases known in the art are Sortase A from Staphylococcus aureus, or Sortase B from Staphylococcus aureus or Streptococcus pneumoniae.

[0063] The term “recognition motif” as used herein refers to a motif of amino acids recognized by the transpeptidase in the first step of the transpeptidation reaction as described herein. A person skilled in the art knows the recognition motif of a given transpeptidase. A recognition motif comprises a sequence of a few amino acids, e.g. five amino acids. By way of example, recognition motifs and transpeptidases that can be used to put the present invention into practice are “LPXTG” for Sortase A of Staphylococcus aureus, “NPQTN” for Sortase B of Staphylococcus aureus, “YPRTG” for Sortase B of Streptococcus pneumoniae, “LPXTA” or “LPXTG” for Sortase A of Streptococcus pyogenes, “LAXTG” for Sortase2A as disclosed herein, or “LPXSG” for Sortase4S as disclosed herein, X denoting any amino acid. Further transpeptidases with other recognition motifs that can be used to put the present invention into practice are known in the art. The recognition motif for the transpeptidase can be located at the N-terminus or at the C-terminus of a polypeptide, or be incorporated in its amino acid sequence, e.g. in the second polypeptide according to the method of the present invention. In case the recognition motif is incorporated in the amino acid sequence of a polypeptide, the stretch of the amino acid sequence comprising the recognition motif preferably forms a loop. However, the recognition motif is preferably located at the N-terminus or at the C-terminus of the polypeptide, and even more preferably at the C-terminus. The recognition motif can be conjugated to the N-terminus or the C-terminus of, or integrated in, any polypeptide; the person skilled in the art knows how to do so.
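When designing a second polypeptide it can help to check which of the listed transpeptidases would recognise a given sequence. The sketch below translates the motifs named in this paragraph into regular expressions, reading "X" as any one-letter residue, and reports 0-based start positions. It is a plain text search under that assumption; it says nothing about accessibility or actual turnover, and the dictionary keys, function names and test sequence are hypothetical.

```python
import re

# Recognition motifs listed in paragraph [0063]; "X" stands for any amino acid.
MOTIFS = {
    "Sortase A (S. aureus)": ["LPXTG"],
    "Sortase B (S. aureus)": ["NPQTN"],
    "Sortase B (S. pneumoniae)": ["YPRTG"],
    "Sortase A (S. pyogenes)": ["LPXTA", "LPXTG"],
    "Sortase2A": ["LAXTG"],
    "Sortase4S": ["LPXSG"],
}


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    # Replace the X wildcard with a class matching any one-letter residue.
    return re.compile(motif.replace("X", "[A-Z]"))


def find_recognition_motifs(seq: str) -> dict:
    """Map each transpeptidase to the 0-based start positions of its motifs in seq."""
    hits = {}
    for enzyme, motifs in MOTIFS.items():
        starts = sorted({m.start() for motif in motifs
                         for m in motif_to_regex(motif).finditer(seq)})
        if starts:
            hits[enzyme] = starts
    return hits


# Hypothetical C-terminal donor tail carrying an LALTG motif.
print(find_recognition_motifs("HLVLRLALTGG"))
```

Because the C-terminal position is preferred, a practical check is whether the last hit lies within a few residues of the end of the construct.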

[0064] The present inventors further discovered that a Leucine introduced at the N-terminus of the recognition motif of the second polypeptide may act as a spacer, making the recognition motif more accessible and thus increasing the conjugation rate in the transpeptidase reaction. Accordingly, in a preferred embodiment the second polypeptide and the recognition motif are separated by a spacer, preferably a Leucine.

[0065] The term “providing” as used herein refers in its broadest sense to providing a polypeptide such that the polypeptide can fulfill its function in the transpeptidation reaction of the present invention. The person skilled in the art knows how to provide a polypeptide. Examples are solid-phase peptide synthesis, in vitro transcription and translation, or expressing the polypeptide in a host cell. Preferably, the polypeptide is purified or isolated. The person skilled in the art knows how to isolate or purify a polypeptide. By way of example, a tag may be used for purification of the polypeptide. A nucleotide sequence encoding a polypeptide may also encode a tag which is advantageously genetically fused in frame to the nucleotide sequence encoding said polypeptide. Said tag may be at the C- or N-terminus of said polypeptide. Examples of tags that may be used include, but are not limited to, HAT, FLAG, c-myc, hemagglutinin antigen, His (e.g., 6xHis) tags, flag-tag, strep-tag, strepII-tag, TAP-tag, chitin binding domain (CBD), maltose-binding protein, immunoglobulin A (IgA), His-6-tag, glutathione-S-transferase (GST) tag, intein and streptavidin binding protein (SBP) tag. Preferably, the first polypeptide comprises a tag, which can be used for detecting, purifying or isolating the first polypeptide. More preferably, the first polypeptide comprises a tag at the C-terminus, which can be used for detecting, purifying or isolating the first polypeptide.

[0066] The term “modified polypeptide” as used herein refers to the conjugate of the first polypeptide and the second polypeptide obtained by the method of the present invention. More precisely, the “modified polypeptide” comprises the second polypeptide that is conjugated via its recognition motif to the unnatural amino acid, i.e. to the N-terminus of the at least one further amino acid, which is conjugated to the side chain of the first amino acid, incorporated site-specifically into the first polypeptide.

[0067] The term “obtaining” as used herein means in its broadest sense incubating the components used in the method of the invention under conditions allowing the transpeptidase reaction such that the modified polypeptide is generated. A person skilled in the art knows how to provide conditions suitable for an enzymatic reaction such as a transpeptidase reaction and will adjust the important parameters (e.g. temperature, pH, salt concentration, etc.) accordingly. Examples of conditions that allow the present invention to be put into practice are shown in the examples. In case the method of the invention is performed in a host cell, the term “obtaining” means culturing the host cell under conditions allowing expression of the polypeptides and/or RNAs required for the transpeptidation reaction of the invention and allowing the transpeptidation reaction in the host cell.

[0068] The term “unnatural amino acid” as used herein refers to a first amino acid (X), wherein at least one further amino acid (Z) has been conjugated to the side chain (also known in the art as the “R” group of an amino acid) of the first amino acid (X), such that the first amino acid and the one or more further amino acids form a linear amino acid side chain (e.g. X-Z or X-Z-Z). In case such an unnatural amino acid is incorporated in a polypeptide, the first amino acid (X) is integrated in the amino acid sequence of the polypeptide, wherein the one or more further amino acids (Z) protrude from the linear amino acid sequence of the polypeptide as a “side chain” such that the N-terminus of the “side chain” can form an isopeptide-bond in the transpeptidation reaction. Thus, the unnatural amino acid comprises at least one amino acid (Z) conjugated to the first amino acid (X), wherein the at least one amino acid (Z) is required for the second step of the transpeptidation reaction, i.e. conjugating an amino acid of the recognition motif via a peptide-bond to the amino acid (Z). In case two or more amino acids (Z) are fused to the first amino acid (X-Z-Z, X-Z-Z-Z, etc., wherein Z can be any amino acid), the terminal amino acid (Z) is conjugated to the amino acid of the recognition motif. The amino acids (Z) in the unnatural amino acid have to be selected depending on the transpeptidase that is used in the method of the invention in order to allow the second step of the transpeptidation reaction. Thus, in case Sortase A is used as a transpeptidase, the terminal amino acid is a Glycine (e.g. X-G, X-Z-G, or X-Z-Z-G etc.). By way of example, Figure 22 shows two unnatural amino acids of the present invention (N⁶-glycylglycyl-L-lysine (GGK) and N⁶-glycyl-L-lysine (GK)) having Lysine as the first amino acid (X=Lysine) and one or two Glycines as the one or more further amino acids (Z=Glycine). 
An unnatural amino acid of the present invention serves as a motif required for the transpeptidase reaction, more precisely the second step of the transpeptidase reaction as described herein. An unnatural amino acid can be integrated at any position of the first polypeptide according to the method of the invention, i.e. at the N-terminus, at the C-terminus or integrated in the amino acid sequence of the first polypeptide. However, in a preferred embodiment, the unnatural amino acid is integrated in the amino acid sequence of the first polypeptide.
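The X-Z...Z definition above maps naturally onto a small data structure: X stays in the backbone, the Z residues form a branch, and only the outermost Z reacts in step 2. The sketch below, assuming one-letter codes and Sortase A's requirement for a terminal Gly, checks whether a branch can act as an acceptor and renders the branched product as text. The class, the `masked` flag (a stand-in for a blocked N-terminus such as in caged or azido-masked variants) and all sequences are hypothetical illustrations, not part of the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchedResidue:
    """An unnatural amino acid X-Z...Z as defined in [0068].

    backbone: one-letter code of the first amino acid X (sits in the chain)
    branch:   further amino acids Z, written from X outwards; the last one
              carries the free N-terminus used in step 2 of the reaction
    masked:   True if that N-terminus is blocked (illustrative flag only)
    """
    backbone: str
    branch: str
    masked: bool = False

    def is_acceptor(self, required_terminal: str = "G") -> bool:
        # Sortase A needs an unmasked terminal glycine on the branch.
        return bool(self.branch) and not self.masked and self.branch[-1] == required_terminal


def describe_conjugate(poi: str, site: int, residue: BranchedResidue, donor_acyl: str) -> str:
    """Render the POI with the donor attached to the branch at 1-based position `site`.

    Bracketed text reads N->C from the donor through the branch to X's side chain.
    """
    i = site - 1
    if poi[i] != residue.backbone:
        raise ValueError("position does not carry the expected backbone residue")
    if not residue.is_acceptor():
        raise ValueError("branch cannot act as a Sortase A acceptor")
    branch_n_to_c = residue.branch[::-1]  # outermost Z is bonded to the donor
    return f"{poi[:i]}{residue.backbone}[{donor_acyl}{branch_n_to_c}]{poi[i + 1:]}"


GGK = BranchedResidue("K", "GG")
print(describe_conjugate("MSKGE", 3, GGK, "LRLALT"))  # hypothetical sequences
```

Running the example prints `MSK[LRLALTGG]GE`: the donor's Thr is linked to the outer Gly, and the inner Gly is attached to the Lys side chain at position 3.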

[0069] In a preferred embodiment of the invention, the unnatural amino acid comprises two or more amino acid residues, wherein the first amino acid residue is integrated in the amino acid chain of the first polypeptide, wherein one or more further amino acid residues are attached to the side chain of the first amino acid residue, and wherein the one or more further amino acid residues react with the transpeptidase in the transpeptidation reaction. The first amino acid preferably comprises an amino group, a thiol, a hydroxyl group or a carboxyl group in its side chain, and the one or more further amino acids are linked to the amino group, thiol, hydroxyl group or carboxyl group in the side chain of the first amino acid.

[0070] In a preferred embodiment of the invention, the first amino acid of the unnatural amino acid is Lysine, wherein the one or more amino acids are conjugated via an isopeptide-bond to the ε-amino group of the Lysine.

[0071] In a further preferred embodiment the unnatural amino acid is N⁶-glycylglycyl-L-lysine (GGK) or N⁶-glycyl-L-lysine (GK), which are shown in Figure 22.

[0072] In another preferred embodiment the unnatural amino acid is N⁶-((2-azidoacetyl)glycyl)-L-lysine (AzGGK) or N⁶-(2-azidoacetyl)-L-lysine (AzGK) as shown in Figure 22, wherein transpeptidation is induced by providing a phosphine, preferably 2-(diphenylphosphino)benzoic acid (2DPBA). In this embodiment the terminal Glycine residue of the unnatural amino acid, which is required for the second step of the transpeptidation reaction, is masked with an azido group that prevents the second step of the transpeptidation reaction. The azido group of AzGGK or AzGK can be reduced quantitatively via Staudinger reduction with a phosphine to restore GGK or GK, respectively. The phosphine is preferably water-soluble. In case the method of the present invention is performed in a host cell, the phosphine has to be cell-permeable and non-cytotoxic. Examples of phosphines are described in Luo et al. (Nat. Chem. 2016, 8, 1027-1034; doi: 10.1038/nchem.2573. Epub 2016 Jul 25), which is incorporated herein by reference. A particularly preferred phosphine is 2DPBA (2-(diphenylphosphino)benzoic acid). In this embodiment, the second step of the transpeptidation reaction is induced by providing a suitable phosphine, allowing temporal control of the reaction.
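Because Staudinger reduction turns the terminal azide of AzGGK into the free amine of GGK, deprotection can in principle be followed by mass spectrometry, which this document already uses (ESI-MS, LC-MS). The sketch below is illustrative arithmetic rather than a procedure from the source: assuming neutral free amino acids and standard monoisotopic element masses, it computes the expected mass change per reduced site. The formulas follow from the structures (GGK = Lys C6H14N2O2 plus two glycyl units C2H3NO; AzGGK has N3 in place of the terminal NH2); the variable names are hypothetical.

```python
# Monoisotopic masses of the most abundant isotopes (Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def monoisotopic_mass(formula: dict) -> float:
    """Sum element masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


# Neutral free amino acids. GGK = Lys (C6H14N2O2) + 2 glycyl units (C2H3NO each).
# AzGGK differs only in its terminal NH2 being replaced by N3 (-2 H, +2 N).
GGK_FORMULA = {"C": 10, "H": 20, "N": 4, "O": 4}
AZGGK_FORMULA = {"C": 10, "H": 18, "N": 6, "O": 4}

shift = monoisotopic_mass(GGK_FORMULA) - monoisotopic_mass(AZGGK_FORMULA)
print(f"GGK   {monoisotopic_mass(GGK_FORMULA):.4f} Da")
print(f"AzGGK {monoisotopic_mass(AZGGK_FORMULA):.4f} Da")
print(f"Staudinger reduction shift per site: {shift:+.4f} Da")  # loss of N2, gain of 2 H
```

The shift is the same whether the residue is free or incorporated in a protein (about -25.99 Da per reduced AzGGK, i.e. loss of N₂ and gain of two hydrogens), so a protein carrying one AzGGK would be expected to lose roughly 26 Da on complete reduction.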

[0073] In a further preferred embodiment the unnatural amino acid is GK or GGK whose N-terminal glycine residue is protected with a photoremovable protecting group. The terms “photoremovable protecting group” and “photoprotective group” are used interchangeably herein and relate to a chemical moiety that is attached to the N-terminus of the amino acid (Z) fused to the side chain of the first amino acid (X) in the unnatural amino acid, wherein said chemical moiety prevents the transpeptidation reaction and can be removed with a light pulse, thereby inducing the transpeptidation reaction. In such unnatural amino acids the primary amino group of the N-terminal glycine is masked with a photoremovable protecting group that prevents the second step of the transpeptidation reaction. In order to allow the second step of the transpeptidation reaction, the photoremovable protecting group has to be removed with a light pulse to restore GK or GGK. The wavelength of the light pulse has to be adjusted to the photoremovable protecting group such that the group is removed from GK or GGK. Preferably, the photoprotective group is 2-nitrobenzyl or coumarin, wherein the light pulse for removing the photoprotective group preferably has a wavelength of about 365 nm. Further photoprotective groups are known in the art and can be used to put the present invention into practice. In this embodiment, the second step of the transpeptidation reaction is induced by providing a suitable light pulse, allowing temporal and spatial control of the reaction. Examples of a photoprotected GGK are shown in Figure 22. The terms “photoprotected” and “photocaged” are used interchangeably herein.

[0074] In a further preferred embodiment the transpeptidase is Sortase A, preferably Sortase A (SrtA) from Staphylococcus aureus, which is exemplarily shown in SEQ ID NO: 1. 
Thus, a Sortase A that can be used in accordance with the present invention has an amino acid sequence which has an identity of at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% to the amino acid sequence shown in SEQ ID NO: 1. Further preferred are mutants of Sortase A capable of conjugating the second polypeptide to the first polypeptide. A preferred mutant of the invention is Srt2A having an amino acid sequence which has an identity of at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% to the amino acid sequence shown in SEQ ID NO: 2, wherein the position corresponding to position 36 of SEQ ID NO: 2 is Arginine, the position corresponding to position 44 of SEQ ID NO: 2 is Cysteine, the position corresponding to position 46 of SEQ ID NO: 2 is Histidine, the position corresponding to position 47 of SEQ ID NO: 2 is Aspartic acid, the position corresponding to position 80 of SEQ ID NO: 2 is Proline, the position corresponding to position 94 of SEQ ID NO: 2 is Isoleucine, the position corresponding to position 102 of SEQ ID NO: 2 is Lysine, the position corresponding to position 104 of SEQ ID NO: 2 is Histidine, the position corresponding to position 106 of SEQ ID NO: 2 is Asparagine, the position corresponding to position 107 of SEQ ID NO: 2 is Alanine, the position corresponding to position 115 of SEQ ID NO: 2 is Glutamic acid, the position corresponding to position 124 of SEQ ID NO: 2 is Valine, the position corresponding to position 132 of SEQ ID NO: 2 is Glutamic acid and the position corresponding to position 138 of SEQ ID NO: 2 is Serine, i.e. positions 36, 44, 46, 47, 80, 94, 102, 104, 106, 107, 115, 124, 132 and 138 of SEQ ID NO: 2 are not altered. 
A further preferred mutant of the invention is Srt5M having an amino acid sequence which has an identity of at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% to the amino acid sequence shown in SEQ ID NO: 3, wherein the position corresponding to position 36 of SEQ ID NO: 3 is Arginine, the position corresponding to position 102 of SEQ ID NO: 3 is Asparagine, the position corresponding to position 107 of SEQ ID NO: 3 is Alanine, the position corresponding to position 132 of SEQ ID NO: 3 is Glutamic acid and the position corresponding to position 138 of SEQ ID NO: 3 is Threonine, i.e. positions 36, 102, 107, 132 and 138 of SEQ ID NO: 3 are not altered. A further preferred mutant of the invention is Srt7M having an amino acid sequence which has an identity of at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% to the amino acid sequence shown in SEQ ID NO: 4, wherein the position corresponding to position 36 of SEQ ID NO: 4 is Arginine, the position corresponding to position 47 of SEQ ID NO: 4 is Lysine, the position corresponding to position 50 of SEQ ID NO: 4 is Glutamine, the position corresponding to position 102 of SEQ ID NO: 4 is Asparagine, the position corresponding to position 107 of SEQ ID NO: 4 is Alanine, the position corresponding to position 132 of SEQ ID NO: 4 is Glutamic acid and the position corresponding to position 138 of SEQ ID NO: 4 is Threonine, i.e. positions 36, 47, 50, 102, 107, 132 and 138 of SEQ ID NO: 4 are not altered. 
A further preferred mutant of the invention is mSrt2A having an amino acid sequence which has an identity of at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% to the amino acid sequence shown in SEQ ID NO: 5, wherein the position corresponding to position 36 of SEQ ID NO: 5 is Arginine, the position corresponding to position 44 of SEQ ID NO: 5 is Cysteine, the position corresponding to position 46 of SEQ ID NO: 5 is Histidine, the position corresponding to position 47 of SEQ ID NO: 5 is Lysine, the position corresponding to position 50 of SEQ ID NO: 5 is Glutamine, the position corresponding to position 80 of SEQ ID NO: 5 is Proline, the position corresponding to position 94 of SEQ ID NO: 5 is Isoleucine, the position corresponding to position 102 of SEQ ID NO: 5 is Lysine, the position corresponding to position 104 of SEQ ID NO: 5 is Histidine, the position corresponding to position 106 of SEQ ID NO: 5 is Asparagine, the position corresponding to position 107 of SEQ ID NO: 5 is Alanine, the position corresponding to position 115 of SEQ ID NO: 5 is Glutamic acid, the position corresponding to position 124 of SEQ ID NO: 5 is Valine, the position corresponding to position 132 of SEQ ID NO: 5 is Glutamic acid and the position corresponding to position 138 of SEQ ID NO: 5 is Serine, i.e. positions 36, 44, 46, 47, 50, 80, 94, 102, 104, 106, 107, 115, 124, 132 and 138 of SEQ ID NO: 5 are not altered. 
A further preferred mutant of the invention is Srt4S having an amino acid sequence which has an identity of at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% to the amino acid sequence shown in SEQ ID NO: 6, wherein the position corresponding to position 36 of SEQ ID NO: 6 is Arginine, the position corresponding to position 40 of SEQ ID NO: 6 is Aspartic acid, the position corresponding to position 44 of SEQ ID NO: 6 is Cysteine, the position corresponding to position 46 of SEQ ID NO: 6 is Valine, the position corresponding to position 60 of SEQ ID NO: 6 is Threonine, the position corresponding to position 64 of SEQ ID NO: 6 is Alanine, the position corresponding to position 76 of SEQ ID NO: 6 is Arginine, the position corresponding to position 86 of SEQ ID NO: 6 is Leucine, the position corresponding to position 102 of SEQ ID NO: 6 is Asparagine, the position corresponding to position 107 of SEQ ID NO: 6 is Alanine, the position corresponding to position 124 of SEQ ID NO: 6 is Valine, the position corresponding to position 131 of SEQ ID NO: 6 is Phenylalanine, the position corresponding to position 132 of SEQ ID NO: 6 is Glutamic acid and the position corresponding to position 138 of SEQ ID NO: 6 is Threonine, i.e. positions 36, 40, 44, 46, 60, 64, 76, 86, 102, 107, 124, 131, 132 and 138 of SEQ ID NO: 6 are not altered. Thus, in variants of said mutants, the amino acid mutations relative to SEQ ID NO: 1 are not altered. By way of example, a derivative of mSrt2A can have an amino acid sequence which has an identity of at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% to the amino acid sequence shown in SEQ ID NO: 5, wherein the derivative comprises the described mutations of mSrt2A. 
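The two conditions used above for each mutant (a minimum sequence identity to the reference SEQ ID NO and retention of the defining residues) can be sketched as a small check. This is a minimal sketch under stated assumptions: the helper names are hypothetical, identity is computed position by position on equal-length sequences (a real determination of identity and of "corresponding positions" would use a pairwise alignment), and the 148-residue reference is a toy placeholder, not SEQ ID NO: 4.

```python
# Hypothetical helper names; the required residues are those listed above for
# Srt5M (SEQ ID NO: 3) and Srt7M (SEQ ID NO: 4), using 1-based positions.
SRT5M_REQUIRED = {36: "R", 102: "N", 107: "A", 132: "E", 138: "T"}
SRT7M_REQUIRED = {**SRT5M_REQUIRED, 47: "K", 50: "Q"}


def ungapped_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length sequences compared position by position.

    Simplification: a real identity calculation would use a pairwise alignment.
    """
    if len(a) != len(b):
        raise ValueError("ungapped comparison needs sequences of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def retains_required(seq: str, required: dict[int, str]) -> bool:
    """True if every required 1-based position carries the required residue."""
    return all(seq[pos - 1] == aa for pos, aa in required.items())


def qualifies(candidate: str, reference: str, required: dict[int, str],
              min_identity: float = 60.0) -> bool:
    """Identity threshold met AND all defining mutations retained."""
    return (ungapped_identity(candidate, reference) >= min_identity
            and retains_required(candidate, required))


def toy_reference(length: int, required: dict[int, str]) -> str:
    """Placeholder poly-Gly sequence carrying only the required residues."""
    seq = ["G"] * length
    for pos, aa in required.items():
        seq[pos - 1] = aa
    return "".join(seq)


reference = toy_reference(148, SRT7M_REQUIRED)
variant = reference[:9] + "A" + reference[10:]    # change at position 10 only
reverted = reference[:46] + "E" + reference[47:]  # position 47 changed from K to E
print(qualifies(variant, reference, SRT7M_REQUIRED))   # True
print(qualifies(reverted, reference, SRT7M_REQUIRED))  # False
```

Separating the identity check from the residue check mirrors the claim wording: a sequence can be 99% identical to the reference and still fall outside the definition if one defining mutation is lost.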
Srt5M contains five mutations that confer a 140-fold increase in activity compared to wild-type (wt) SrtA. Srt7M is based on Srt5M but works Ca²⁺-independently. Srt2A recognizes the motif LAXTG, whereas SrtA, Srt5M and Srt7M recognize LPXTG. mSrt2A is based on Srt2A but works Ca²⁺-independently. Srt4S recognizes the motif LPXSG. SEQ ID NO: 1-6 correspond to the catalytic domains of the sortases, i.e. they lack amino acids 1-58 of wild-type Sortase A from Staphylococcus aureus. Accordingly, in a further preferred embodiment of the present invention the Sortase A is the full length Sortase A from Staphylococcus aureus as shown in SEQ ID NO: 42 or a mutant thereof which has an identity of at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% to the amino acid sequence shown in SEQ ID NO: 42. Preferably, a mutant of the full length Sortase A as shown in SEQ ID NO: 42 comprises in its catalytic domain the mutations described above for SEQ ID NO: 2-6, i.e. SEQ ID NO: 42 is mutated as described above for SEQ ID NO: 2-6 at the corresponding positions in SEQ ID NO: 42. The corresponding positions in SEQ ID NO: 42 can, for example, be determined by adding 58 to the position numbers given above for SEQ ID NO: 1-6. 
By way of example, a full length mutant of Sortase A corresponding to the Srt2A mutant shown in SEQ ID NO: 2 comprises the following mutations: the position corresponding to position 94 of SEQ ID NO: 42 is Arginine, the position corresponding to position 102 of SEQ ID NO: 42 is Cysteine, the position corresponding to position 104 of SEQ ID NO: 42 is Histidine, the position corresponding to position 105 of SEQ ID NO: 42 is Aspartic acid, the position corresponding to position 138 of SEQ ID NO: 42 is Proline, the position corresponding to position 152 of SEQ ID NO: 42 is Isoleucine, the position corresponding to position 160 of SEQ ID NO: 42 is Lysine, the position corresponding to position 162 of SEQ ID NO: 42 is Histidine, the position corresponding to position 164 of SEQ ID NO: 42 is Asparagine, the position corresponding to position 165 of SEQ ID NO: 42 is Alanine, the position corresponding to position 173 of SEQ ID NO: 42 is Glutamic acid, the position corresponding to position 182 of SEQ ID NO: 42 is Valine, the position corresponding to position 190 of SEQ ID NO: 42 is Glutamic acid and the position corresponding to position 196 of SEQ ID NO: 42 is Serine.
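The renumbering rule stated above (catalytic-domain position plus 58 gives the full-length position) can be checked mechanically against the Srt2A example. This is an illustrative sketch; the constant and function names are hypothetical, and the residue list is copied from the Srt2A definition for SEQ ID NO: 2.

```python
# Offset stated in the text: catalytic-domain position + 58 = full-length position.
DOMAIN_OFFSET = 58

# Srt2A residues in catalytic-domain numbering (SEQ ID NO: 2), as listed above.
SRT2A_DOMAIN = {36: "R", 44: "C", 46: "H", 47: "D", 80: "P", 94: "I", 102: "K",
                104: "H", 106: "N", 107: "A", 115: "E", 124: "V", 132: "E", 138: "S"}


def to_full_length(domain: dict[int, str], offset: int = DOMAIN_OFFSET) -> dict[int, str]:
    """Renumber catalytic-domain positions into full-length Sortase A positions."""
    return {pos + offset: aa for pos, aa in domain.items()}


full_length = to_full_length(SRT2A_DOMAIN)
for pos in sorted(full_length):
    print(pos, full_length[pos])
# first line printed: 94 R; last line printed: 196 S
```

The resulting positions (94, 102, 104, 105, 138, 152, 160, 162, 164, 165, 173, 182, 190, 196) match the full-length Srt2A positions listed for SEQ ID NO: 42.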

[0075] In a further preferred embodiment the second polypeptide is a ubiquitin (e.g. as shown in SEQ ID NO: 30) or ubiquitin-like-protein, wherein the C-terminus of the ubiquitin or ubiquitin-like-protein has been modified such that it comprises a recognition motif for the sortase A enzyme as described herein. Preferably, the C-terminus of the ubiquitin or ubiquitin-like-protein has been modified such that it comprises a recognition motif for the sortase A enzyme but remains as close as possible to the native C-terminus of the ubiquitin or ubiquitin-like-protein, i.e. only the amino acid substitutions required for generating the recognition motif are introduced. By way of example, a wild type ubiquitin as shown in SEQ ID NO: 30 comprises the C-terminus “LRLRGG”, which can be modified by two amino acid substitutions to “LPLTGG” (containing the recognition motif of Srt5M) or “LALTGG” (containing the recognition motif of Srt2A). A person skilled in the art knows how to introduce amino acid substitutions into a polypeptide. Ubiquitin-like proteins can likewise be used as the second polypeptide by modifying their C-terminus to comprise a sortase A recognition motif, since they all display a highly conserved C-terminal glycylglycine or glycine motif. Ubiquitin-like proteins are known in the art and share the common β-grasp fold of ubiquitin; they include e.g. SUMO (e.g. SUMO1 and SUMO2), NEDD8, URM1 and Ufm1, as described by van der Veen et al. (Annu Rev Biochem 81, 323-57 (2012)). Thus, preferred ubiquitin-like-proteins of the present invention comprise SUMO1, SUMO2, NEDD8, URM1, ATG8, ATG12, FAT10, ISG15 or Ufm1. In case a ubiquitin or ubiquitin-like protein is used as the second polypeptide, a Lysine residue in the first polypeptide is preferably substituted with the unnatural amino acid. 
In a preferred embodiment the second polypeptide and the recognition motif are separated by a spacer, preferably a Leucine.
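The claim that the ubiquitin C-terminus reaches a sortase recognition motif with only two substitutions can be made explicit by listing point substitutions between the native and the modified tail. A minimal sketch: the function name is hypothetical, and it only compares equal-length tails, so the optional Leucine spacer (an insertion, not a substitution) is outside its scope.

```python
def substitutions(native: str, modified: str, first_residue: int) -> list[str]:
    """List point substitutions such as 'R72P' between two equal-length C-terminal tails.

    `first_residue` is the residue number of the first character of the tails.
    """
    if len(native) != len(modified):
        raise ValueError("a spacer insertion changes the length; compare equal-length tails only")
    return [f"{n}{first_residue + i}{m}"
            for i, (n, m) in enumerate(zip(native, modified)) if n != m]


UB_TAIL = "LRLRGG"  # residues 71-76 of wild type ubiquitin
print(substitutions(UB_TAIL, "LPLTGG", 71))  # ['R72P', 'R74T']
print(substitutions(UB_TAIL, "LALTGG", 71))  # ['R72A', 'R74T']
```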

[0076] In general, the unnatural amino acid can be integrated in the first polypeptide using any method known in the art, e.g. solid phase polypeptide synthesis. However, in a preferred embodiment, the method is performed in a host cell, wherein the first polypeptide is expressed in said host cell using genetic code expansion comprising:

a) providing the host cell with an orthogonal heterologous tRNA synthetase/tRNA pair, wherein the heterologous tRNA synthetase specifically aminoacylates the heterologous tRNA with the unnatural amino acid and wherein the heterologous tRNA recognizes a codon that is not recognized by an endogenous tRNA; and

b) providing the host cell with a polynucleotide encoding the first polypeptide, comprising one or more codons recognized by the heterologous tRNA; and

c) culturing the host cell under conditions allowing expression of the first polypeptide.

[0077] Genetic code expansion allows the site-specific incorporation of an unnatural amino acid at virtually any chosen position of any polypeptide and is thus useful for generating the first polypeptide of the invention. Importantly, the approach is applicable to proteins under native conditions, which in the context of the present invention allows the modification (e.g. ubiquitylation) of large multi-domain proteins and of proteins that cannot be refolded. A person skilled in the art knows how to employ genetic code expansion, which is disclosed by way of example in Neumann et al. (Nat Chem Biol 2008, 4, 232) and Lang et al. (Nat Chem 2012, 4, 298). Briefly, unnatural amino acids are genetically encoded in response to an amber codon (UAG; or another codon not assigned to an amino acid) introduced into a gene of interest, using engineered pyrrolysyl tRNA synthetase (PylRS)/tRNA_CUA pairs from Methanosarcina species, including Methanosarcina barkeri (Mb) and Methanosarcina mazei (Mm), in which the pyrrolysyl tRNA synthetase has been mutated such that it accepts the unnatural amino acid.
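Introducing the amber codon at a chosen lysine can be sketched as a codon replacement in the coding sequence. This is an illustrative sketch only: the function names and the toy open reading frame are hypothetical (the ORF encodes Met-Gln-Ile-Phe-Val-Lys-Thr, mirroring the first seven residues of ubiquitin, so residue 6 is a Lys), and the ORF ends in TAA so that TAG occurs only at the intended site.

```python
LYS_CODONS = {"AAA", "AAG"}


def introduce_amber(cds: str, residue: int) -> str:
    """Replace the codon of a 1-based Lys residue in a DNA coding sequence with TAG."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    start = 3 * (residue - 1)
    codon = cds[start:start + 3]
    if codon not in LYS_CODONS:
        raise ValueError(f"residue {residue} is encoded by {codon!r}, not by a Lys codon")
    return cds[:start] + "TAG" + cds[start + 3:]


def in_frame_amber_residues(cds: str) -> list[int]:
    """1-based residue numbers whose in-frame codon is TAG."""
    return [i // 3 + 1 for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] == "TAG"]


# Toy ORF: Met-Gln-Ile-Phe-Val-Lys-Thr followed by a TAA stop codon.
orf = "ATGCAGATCTTCGTGAAGACCTAA"
edited = introduce_amber(orf, 6)
print(edited)                           # ATGCAGATCTTCGTGTAGACCTAA
print(in_frame_amber_residues(edited))  # [6]
```

Checking for other in-frame TAG codons is a useful sanity step, because each one would also become a site for unnatural amino acid incorporation (or truncation).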

[0078] Preferably, the orthogonal heterologous tRNA synthetase/tRNA pair is provided to the host cell by providing one or more polynucleotides comprising one or more expression cassettes encoding the heterologous tRNA synthetase and the heterologous tRNA, wherein the one or more polynucleotides can be provided as a vector or stably integrated into the host cell’s genome. Likewise, the transpeptidase is provided to the host cell by providing one or more polynucleotides comprising an expression cassette encoding the transpeptidase, wherein the one or more polynucleotides can be provided as a vector or stably integrated into the host cell’s genome. Preferably, the orthogonal heterologous tRNA synthetase/tRNA pair and the transpeptidase are provided to the host cell by one or more polynucleotides comprising expression cassettes encoding them, preferably in the form of a vector. Even more preferred is a host cell which comprises the expression cassettes coding for the orthogonal heterologous tRNA synthetase/tRNA pair and the transpeptidase integrated into its genome. Still more preferred is a host cell which comprises expression cassettes coding for the orthogonal heterologous tRNA synthetase/tRNA pair, the transpeptidase and the modified ubiquitin or ubiquitin-like protein integrated into its genome.

[0079] The term “culturing” as used herein relates to culturing the host cell under conditions allowing expression of the polypeptides required to put the method of the present invention into practice. In particular, the term “culturing” as used herein relates to culturing the host cell under conditions allowing expression of the first polypeptide comprising one or more unnatural amino acids using genetic code expansion, comprising the addition of the unnatural amino acid to the growth medium of the host cell in an amount sufficient to express the first polypeptide comprising one or more unnatural amino acids.

[0080] As used herein, a “host cell” refers to a cell which is capable of protein expression and optionally protein secretion, wherein the host cell can be comprised by a living organism. Such a host cell is applied in the methods of the present invention. Host cells provided by the present invention can be eukaryotic or prokaryotic host cells. As will be appreciated by one of skill in the art, a prokaryotic cell lacks a membrane-bound nucleus, while a eukaryotic cell has a membrane-bound nucleus. Examples of eukaryotic cells include, but are not limited to, vertebrate cells, mammalian cells, human cells, animal cells, invertebrate cells, plant cells, nematodal cells, insect cells, stem cells, fungal cells or yeast cells.

[0081] A “heterologous tRNA” that can be used according to the present invention is orthogonal to the tRNA synthetase used in the method of the invention and recognizes a codon (i.e. comprises the respective anticodon) that is not recognized by an endogenous tRNA of the host cell, e.g. UAG, UGA or UAA. Preferably, the heterologous tRNA recognizes the codon UAG, i.e. comprises the anticodon CUA.
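The relationship between a codon and the anticodon of the tRNA that reads it (UAG is read by anticodon CUA) is a reverse complement. A minimal sketch, assuming strict Watson-Crick pairing; wobble pairing at the first anticodon position is deliberately ignored, and the function name is hypothetical.

```python
# Strict Watson-Crick pairing for RNA; wobble pairing is deliberately ignored.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Anticodon (written 5'->3') that pairs with an mRNA codon (written 5'->3')."""
    return "".join(PAIR[base] for base in reversed(codon.upper()))


for stop in ("UAG", "UGA", "UAA"):
    print(stop, "->", anticodon(stop))
# UAG -> CUA
# UGA -> UCA
# UAA -> UUA
```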

[0082] In a preferred embodiment the orthogonal heterologous tRNA synthetase/tRNA pair is a pyrrolysyl tRNA synthetase and a tRNA from a Methanosarcina species, preferably from Methanosarcina barkeri or Methanosarcina mazei. In a further preferred embodiment of the method of the present invention the orthogonal heterologous tRNA synthetase/tRNA pair is a pyrrolysyl tRNA synthetase having an amino acid sequence which has an identity of at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% to the amino acid sequence shown in SEQ ID NO: 7, or a pyrrolysyl tRNA synthetase as shown in SEQ ID NO: 7, and a tRNA from a Methanosarcina species having a nucleotide sequence which has an identity of at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% to the nucleotide sequence shown in SEQ ID NO: 8 or SEQ ID NO: 9, or a tRNA from a Methanosarcina species having a nucleotide sequence as shown in SEQ ID NO: 8 (Methanosarcina barkeri) or SEQ ID NO: 9 (Methanosarcina mazei). Preferred is a tRNA from a Methanosarcina species as shown in SEQ ID NO: 8 or SEQ ID NO: 9 comprising the anticodon CUA, as shown for the Methanosarcina barkeri tRNA_CUA in SEQ ID NO: 14.

[0083] Preferably the pyrrolysyl tRNA synthetase is a mutant pyrrolysyl tRNA synthetase capable of aminoacylating the orthogonal tRNA with the unnatural amino acids AzGGK, AzGK, GK or GGK.

[0084] A mutant pyrrolysyl tRNA synthetase for AzGGK has an amino acid sequence which has an identity of at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% to the amino acid sequence shown in SEQ ID NO: 10, wherein the position corresponding to position 274 of SEQ ID NO: 10 is Alanine, the position corresponding to position 311 of SEQ ID NO: 10 is Glutamine and the position corresponding to position 313 of SEQ ID NO: 10 is Serine. Preferably, a pyrrolysyl tRNA synthetase for AzGGK has an amino acid sequence as shown in SEQ ID NO: 10.

[0085] A mutant pyrrolysyl tRNA synthetase for AzGK has an amino acid sequence which has an identity of at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% to the amino acid sequence shown in SEQ ID NO: 11, wherein the position corresponding to position 271 of SEQ ID NO: 11 is Leucine, the position corresponding to position 274 of SEQ ID NO: 11 is Alanine and the position corresponding to position 313 of SEQ ID NO: 11 is Phenylalanine. Preferably, a pyrrolysyl tRNA synthetase for AzGK has an amino acid sequence as shown in SEQ ID NO: 11.

[0086] A mutant pyrrolysyl tRNA synthetase for GK has an amino acid sequence which has an identity of at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% to the amino acid sequence shown in SEQ ID NO: 12, wherein the position corresponding to position 266 of SEQ ID NO: 12 is Methionine, the position corresponding to position 270 of SEQ ID NO: 12 is Isoleucine, the position corresponding to position 271 of SEQ ID NO: 12 is Phenylalanine, the position corresponding to position 274 of SEQ ID NO: 12 is Alanine and the position corresponding to position 313 of SEQ ID NO: 12 is Phenylalanine. Preferably, a pyrrolysyl tRNA synthetase for GK has an amino acid sequence as shown in SEQ ID NO: 12.

[0087] A mutant pyrrolysyl tRNA synthetase for GGK has an amino acid sequence which has an identity of at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% to the amino acid sequence shown in SEQ ID NO: 13, wherein the position corresponding to position 266 of SEQ ID NO: 12 is Methionine, the position corresponding to position 270 of SEQ ID NO: 12 is Isoleucine, the position corresponding to position 271 of SEQ ID NO: 12 is Phenylalanine, the position corresponding to position 274 of SEQ ID NO: 12 is Alanine and the position corresponding to position 313 of SEQ ID NO: 12 is Phenylalanine. Preferably, a pyrrolysyl tRNA synthetase for GGK has an amino acid sequence as shown in SEQ ID NO: 13.

[0088] A preferred orthogonal heterologous tRNA synthetase/tRNA pair of the present invention combines the tRNA_CUA as shown in SEQ ID NO: 14 with a mutant pyrrolysyl tRNA synthetase having an amino acid sequence selected from the group consisting of SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, and SEQ ID NO: 13.

[0089] In a further preferred embodiment the host cell is a vertebrate cell, mammalian cell, human cell, animal cell, invertebrate cell, plant cell, nematodal cell, insect cell, stem cell, fungal cell, yeast cell or bacterial cell, or a multicellular organism comprising a host cell, wherein the multicellular organism is preferably a mammal, an insect or a nematode and more preferably Caenorhabditis elegans, Drosophila or mouse.

[0090] In a further preferred embodiment the method is performed in vitro, wherein the modified polypeptide is obtained by admixing and incubating the provided components under conditions allowing a transpeptidase-mediated transpeptidation reaction. In this embodiment the required polypeptides can be obtained by solid phase peptide synthesis, by in vitro transcription and translation or by expression in a host cell as described herein, wherein the expressed polypeptides are optionally purified. An in vitro method according to the present invention is disclosed by way of example in the examples herein.

[0091] In a further preferred embodiment of the method of the present invention, the second polypeptide comprises a recognition motif selected from the group consisting of SEQ ID NO: 15 (LPXTG), SEQ ID NO: 16 (LLPXTG), SEQ ID NO: 17 (LAXTG), SEQ ID NO: 18 (LLAXTG), SEQ ID NO: 19 (LPXSG) and SEQ ID NO: 20 (LLPXSG), wherein preferably a further Glycine residue is attached to the C-terminus of the recognition motif. These recognition motifs are particularly preferred if the second polypeptide is a ubiquitin or a ubiquitin-like protein. The recognition motifs shown in SEQ ID NO: 15 and 16 work with SrtA, Srt5M and Srt7M. The recognition motifs shown in SEQ ID NO: 17 and 18 work with Srt2A and mSrt2A. The recognition motifs shown in SEQ ID NO: 19 and 20 work with Srt4S. In the context of the described recognition motifs, X can be any amino acid. However, in case the second polypeptide is ubiquitin or a ubiquitin-like protein, X is preferably the amino acid found at the corresponding position of the native C-terminus of the ubiquitin or ubiquitin-like protein. By way of example, X is Leucine in the case of Ubiquitin and Glutamine in the case of SUMO1. In other words, the native Ubiquitin C-terminus (LRLRGG; the native Ubiquitin is shown in SEQ ID NO: 30) is modified to LPLTGG (Ub(PT) as shown in SEQ ID NO: 31) in case SrtA or a mutant thereof such as Srt5M or Srt7M is used, or to LALTGG (Ub(AT) as shown in SEQ ID NO: 33) in case Srt2A or mSrt2A is used. In either case a Leucine can be incorporated at the N-terminus of the recognition motif, resulting in LLPLTGG (Ub(LPT) as shown in SEQ ID NO: 32) or LLALTGG (Ub(LAT) as shown in SEQ ID NO: 33). Likewise, the C-termini of other ubiquitin-like proteins can be modified to comprise a recognition motif. By way of example, SUMO1 comprises the native C-terminus QEQTGG, which can be modified to LAQTGG for use with Srt2A or mSrt2A. A further Leucine spacer can be introduced, resulting in the C-terminus LLAQTGG of a modified SUMO1 protein. 
Ubiquitin- and SUMO-conjugates obtained by the method of the invention thus preferably display a native isopeptide bond between their C-terminal glycine and a chosen lysine in the first polypeptide, wherein the C-terminus comprises two amino acid substitutions and optionally one additional Leucine incorporated as a spacer.
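The pairing of C-terminal motifs with sortase variants described in [0091] can be expressed as a lookup that translates each motif (X = any amino acid) into a regular expression. This is an illustrative sketch: the dictionary and function names are hypothetical, and it only checks for the presence of the motif, not for the additional C-terminal Glycine or for catalytic efficiency.

```python
import re

# Variants and motifs as listed in the text; X stands for any amino acid.
SORTASE_MOTIFS = {
    "SrtA": "LPXTG", "Srt5M": "LPXTG", "Srt7M": "LPXTG",
    "Srt2A": "LAXTG", "mSrt2A": "LAXTG",
    "Srt4S": "LPXSG",
}


def motif_pattern(motif: str) -> re.Pattern:
    """Translate a motif such as 'LPXTG' into a regular expression ('LP.TG')."""
    return re.compile(motif.replace("X", "."))


def compatible_sortases(c_terminus: str) -> list[str]:
    """Variants whose recognition motif occurs in the given C-terminal sequence."""
    return [name for name, motif in SORTASE_MOTIFS.items()
            if motif_pattern(motif).search(c_terminus)]


print(compatible_sortases("LRLRGG"))   # [] (native ubiquitin)
print(compatible_sortases("LLPLTGG"))  # ['SrtA', 'Srt5M', 'Srt7M'] (Ub(LPT))
print(compatible_sortases("LLAQTGG"))  # ['Srt2A', 'mSrt2A'] (modified SUMO1)
```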

[0092] Importantly, ubiquitylation is a reversible process that is tightly regulated by a family of enzymes called deubiquitinases (DUBs), as disclosed by Komander et al. (Nat Rev Mol Cell Biol 10, 550-63 (2009)). Modifying the natural C-terminus of ubiquitin from “LRLRGG” to “LALTGG” (corresponding to R72A and R74T mutations in SEQ ID NO: 30) confers resistance to cleavage by various DUB families.

[0093] Accordingly, in a preferred embodiment, the present invention relates to a method for modifying a polypeptide, comprising:

(i) expressing in a host cell a first polypeptide wherein one or more Lysine residues have been substituted with AzGGK, AzGK, GK, GGK, photoprotected GK, or photoprotected GGK and removing the azido group of AzGGK or AzGK by providing a phosphine, preferably 2-(diphenylphosphino)benzoic acid (2DPBA) or removing the photoprotective group by providing a light pulse, when required;

(ii) expressing in the host cell a sortase A enzyme;

(iii) expressing in the host cell a ubiquitin or ubiquitin-like-protein, wherein the C-terminus of the ubiquitin or ubiquitin-like-protein has been modified such that it comprises a recognition motif for the sortase A enzyme; and

(iv) culturing the host cell under conditions allowing expression of the first polypeptide, the Sortase A and the ubiquitin or ubiquitin-like protein; and

(v) obtaining the modified polypeptide.

[0094] In a further preferred embodiment, the present invention relates to a method for modifying a polypeptide, comprising:

(i) expressing in a host cell a first polypeptide wherein one or more Lysine residues have been substituted with AzGGK, AzGK, GK, GGK, photoprotected GK, or photoprotected GGK and removing the azido group of AzGGK or AzGK by providing a phosphine, preferably 2-(diphenylphosphino)benzoic acid (2DPBA) or removing the photoprotective group by providing a light pulse, when required, wherein expressing the first polypeptide comprises:

a) expressing in the host cell a heterologous tRNA synthetase for AzGGK, AzGK, GK, GGK, photoprotected GK, or photoprotected GGK disclosed herein and an orthogonal heterologous tRNA that recognizes a codon that is not recognized by an endogenous tRNA; and

b) providing the host cell with a polynucleotide encoding the first polypeptide, comprising one or more codons recognized by the heterologous tRNA;

(ii) expressing in the host cell a sortase A enzyme;

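(iii) expressing in the host cell a ubiquitin or ubiquitin-like-protein, wherein the C-terminus of the ubiquitin or ubiquitin-like-protein has been modified such that it comprises a recognition motif for the sortase A enzyme;

(iv) culturing the host cell under conditions allowing expression of the first polypeptide, the Sortase A and the ubiquitin or ubiquitin-like protein; and

(v) obtaining the modified polypeptide.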

[0095] In a further preferred embodiment, the present invention relates to an in vitro method for modifying a polypeptide of interest, comprising:

(i) providing a first polypeptide in which one or more Lysine residues have been substituted with AzGGK, AzGK, GK, GGK, photoprotected GK, or photoprotected GGK and removing the azido group of AzGGK or AzGK by providing a phosphine, preferably 2-(diphenylphosphino)benzoic acid (2DPBA) or removing the photoprotective group by providing a light pulse, when required;

(ii) providing a sortase A;

(iii) providing a ubiquitin or ubiquitin-like-protein, wherein the C-terminus of the ubiquitin or ubiquitin-like-protein has been modified such that it comprises a recognition motif for the sortase A enzyme; and

(iv) admixing and incubating the provided components under conditions allowing conjugation of the first polypeptide to the ubiquitin or ubiquitin-like-protein.

[0096] In a preferred embodiment, the method of the present invention can be put into practice by providing a subtiligase as the transpeptidase. In this embodiment the first polypeptide comprises one or more unnatural amino acids selected from the group consisting of AzGGK, AzGK, GK, GGK, photoprotected GK, and photoprotected GGK, and the second polypeptide is a ubiquitin or ubiquitin-like-protein comprising a thioester at the C-terminus, wherein the subtiligase preferably has an amino acid sequence which has an identity of at least 70% to the amino acid sequence shown in SEQ ID NO: 41. Accordingly, the present invention relates to a method for modifying a polypeptide, comprising:

(i) providing a subtiligase;

(ii) providing a first polypeptide comprising one or more unnatural amino acids selected from the group consisting of AzGGK, AzGK, GK, GGK, photoprotected GK, and photoprotected GGK and removing the azido group of AzGGK or AzGK by providing a phosphine, preferably 2-(diphenylphosphino)benzoic acid (2DPBA) or removing the photoprotective group by providing a light pulse, when required;

(iii) providing a ubiquitin or ubiquitin-like-protein comprising a thioester at the C-terminus; and

(iv) obtaining the modified polypeptide.

[0097] The term “subtiligase” as used herein relates to an engineered peptide ligase derived from the protease subtilisin that catalyzes the ligation of a donor polypeptide containing a C-terminal thioester (in this case a ubiquitin or ubiquitin-like protein) to an acceptor peptide containing a free α-amine (in this case a first polypeptide comprising AzGGK, AzGK, GK, GGK, photoprotected GK or photoprotected GGK). Recently, subtiligase has been used for the efficient ligation of cysteine-free peptides to protein thioesters, an approach dubbed enzyme-catalyzed expressed protein ligation (EPL). A subtiligase that can be used in the method of the present invention is shown by way of example in SEQ ID NO: 41. Thus, a subtiligase that can be used in accordance with the present invention has an amino acid sequence which has an identity of at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% to the amino acid sequence shown in SEQ ID NO: 41.

[0098] A ubiquitin or ubiquitin-like protein comprising a thioester can, for example, be generated using intein technology. To this end, the ubiquitin or ubiquitin-like protein (lacking the two C-terminal Glycine residues) is expressed as an intein fusion with a C-terminal chitin binding domain (CBD) tag. After expression, the cell homogenate is passed through a column containing chitin, to which the CBD of the chimeric protein binds. Elution with a thiol, e.g. MESNA (sodium 2-mercaptoethanesulfonate), cleaves the ubiquitin or ubiquitin-like protein from the intein and affords the C-terminal thioester (see Figure 24a). Relevant for the subtiligase-mediated transpeptidation reaction are the four C-terminal amino acids of the ubiquitin or ubiquitin-like protein carrying the thioester, i.e. P4-P3-P2-P1-thioester, as shown in Figure 24 B, wherein the important positions are P1 and P4. Regarding the first polypeptide, the two N-terminal amino acids P1′-P2′ are important, as shown in Figure 24 B. In case the first polypeptide comprises e.g. GGK, P1′ and P2′ are both G. The present inventors performed a subtiligase-mediated reaction using as the second polypeptide a wild type Ubiquitin lacking the two C-terminal Glycine residues, i.e. amino acids 1-74 of SEQ ID NO: 30, wherein a thioester has been conjugated to the C-terminal amino acid R (amino acid 74 of SEQ ID NO: 30). In this case P1 is R (amino acid 74 of SEQ ID NO: 30) and P4 is L (amino acid 71 of SEQ ID NO: 30). As the first polypeptide, the present inventors used GGK (the α-amine of lysine was protected with a tert-butyloxycarbonyl group). In this reaction, the present inventors could efficiently conjugate GGK to the C-terminus of Ubiquitin. 
Thus, a subtiligase-mediated transpeptidation reaction can be employed to conjugate Ubiquitin comprising a thioester at the C-terminus to a first polypeptide comprising one or more unnatural amino acids selected from the group consisting of GGK, AzGGK and photoprotected GGK. Furthermore, it is known in the art that a range of residues at the P4 (e.g. L, A, E) and P1 (e.g. Y, L, I, V, F, A, W, H, G) positions and a range of residues at the P1′ (e.g. G, A, R, F, M, S, Y, P, E) and P2′ (e.g. L, T, I, G) positions result in high conversion efficiencies, whereas Asp or Glu at P4 or P1, or Pro at P1′, result in poor efficiencies. Thus, the subtiligase-mediated transpeptidation reaction can be transferred to other ubiquitins (e.g. amino acids 1-75 of SEQ ID NO: 30), to ubiquitin-like proteins, or to first polypeptides comprising other unnatural amino acids (e.g. GK).
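The P4-P3-P2-P1 and P1′-P2′ nomenclature can be illustrated by reading the positions off the donor and acceptor sequences. This is a sketch under stated assumptions: the function names are hypothetical, the constant is the 76-residue human ubiquitin sequence assumed here to correspond to SEQ ID NO: 30, and the GGK acceptor is simplified to its diglycine branch read from the free α-amine.

```python
# Human ubiquitin (76 aa); assumed here to correspond to SEQ ID NO: 30.
UBIQUITIN = ("MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ"
             "QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG")


def donor_sites(donor: str) -> dict[str, str]:
    """P4..P1 of a C-terminal thioester donor; P1 is the residue carrying the thioester."""
    return {f"P{k}": donor[-k] for k in range(1, 5)}


def acceptor_sites(acceptor: str) -> dict[str, str]:
    """P1' and P2': the first two residues counted from the acceptor's free alpha-amine."""
    return {"P1'": acceptor[0], "P2'": acceptor[1]}


donor = UBIQUITIN[:74]       # Ub(1-74), thioester on R74
print(donor_sites(donor))    # {'P1': 'R', 'P2': 'L', 'P3': 'R', 'P4': 'L'}
print(acceptor_sites("GG"))  # {"P1'": 'G', "P2'": 'G'}
```

Counting back from the thioester puts R74 at P1 and L71 at P4, with R72 at P3.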

[0099] Ubiquitin and many ubiquitin-like proteins have a conserved C-terminus ending in one or two Glycine residues. The method of the present invention can be put into practice using a subtiligase and a ubiquitin or ubiquitin-like protein as the second polypeptide in the transpeptidation reaction. In case the first polypeptide comprises AzGGK, GGK or photoprotected GGK, the ubiquitin or ubiquitin-like protein preferably lacks the two C-terminal Glycines, i.e. the ubiquitin preferably comprises amino acids 1-74 of the amino acid sequence shown in SEQ ID NO: 30. In case the first polypeptide comprises AzGK, GK or photoprotected GK, the ubiquitin or ubiquitin-like protein preferably lacks the C-terminal Glycine, i.e. the ubiquitin preferably comprises amino acids 1-75 of the amino acid sequence shown in SEQ ID NO: 30. In both cases the Glycine residue(s) supplied by the unnatural amino acid restore the native C-terminus, generating a native conjugate comprising a wild type ubiquitin or ubiquitin-like protein and a first polypeptide without any mutations.
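The choice of donor truncation can be sketched as simple bookkeeping: remove from the ubiquitin C-terminus as many Glycines as the lysine side chain of the unnatural amino acid will supply, so that donor plus side-chain Glycines reconstitute the native sequence. A minimal sketch with hypothetical names; the Glycine counts are inferred from the amino acid names (AzGGK and AzGK become GGK and GK once the azide is reduced), and the constant is human ubiquitin, assumed to correspond to SEQ ID NO: 30.

```python
UBIQUITIN = ("MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ"
             "QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG")

# Glycines carried on the lysine side chain after deprotection (inferred from the names).
SIDE_CHAIN_GLY = {"GGK": 2, "AzGGK": 2, "photoprotected GGK": 2,
                  "GK": 1, "AzGK": 1, "photoprotected GK": 1}


def donor_for(ncaa: str, ubl: str = UBIQUITIN) -> str:
    """Thioester donor: the ubiquitin(-like) sequence minus the Glycines the ncAA supplies."""
    n = SIDE_CHAIN_GLY[ncaa]
    if not ubl.endswith("G" * n):
        raise ValueError("protein does not end in the Glycines the ncAA would replace")
    return ubl[:-n]


def reconstituted(ncaa: str, ubl: str = UBIQUITIN) -> str:
    """Sequence of the ubiquitin moiety after ligation: donor + side-chain Glycines."""
    return donor_for(ncaa, ubl) + "G" * SIDE_CHAIN_GLY[ncaa]


for ncaa in ("GGK", "GK"):
    print(ncaa, len(donor_for(ncaa)), reconstituted(ncaa) == UBIQUITIN)
# GGK 74 True
# GK 75 True
```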

[0100] In a further preferred embodiment of the method of the present invention the second polypeptide is a ubiquitin or ubiquitin-like-protein comprising a recognition motif for a first Sortase A enzyme in the C-terminus, and one or more Lysine residues in the ubiquitin or ubiquitin-like-protein have been substituted with AzGGK, AzGK, photoprotected GK, or photoprotected GGK, the method further comprising

(v) removing the azido group of AzGGK or AzGK in the ubiquitin or ubiquitin-like-protein by providing a phosphine, preferably 2-(diphenylphosphino)benzoic acid (2DPBA) or removing the photoprotective group by providing a light pulse;

(vi) incubating the polypeptide obtained in step (v) with

a ubiquitin or ubiquitin-like protein comprising a C-terminus comprising a recognition motif for a second Sortase A enzyme that is different from the recognition motif of the first Sortase A enzyme, wherein one or more Lysine residues in the ubiquitin or ubiquitin-like-protein have been substituted with AzGGK, AzGK, photoprotected GK, or photoprotected GGK, and said second Sortase A;

(vii) optionally repeating steps (v) and (vi), wherein the ubiquitin or ubiquitin-like protein comprises a C-terminus comprising a recognition motif for a sortase A enzyme that is different from the recognition motifs of the ubiquitin or ubiquitin-like protein of the preceding steps; and

(viii) obtaining a modified polypeptide comprising a chain of two or more ubiquitins or ubiquitin-like-proteins.

[0101] Accordingly, in a preferred embodiment, the present invention relates to a method for modifying a polypeptide, comprising:

(i) providing a first Sortase A;

(ii) providing a first polypeptide comprising AzGGK, AzGK, photoprotected GK, photoprotected GGK, GGK or GK and removing the azido group of AzGGK or AzGK by providing a phosphine, preferably 2-(diphenylphosphino)benzoic acid (2DPBA) or removing the photoprotective group by providing a light pulse, if required;

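(iii) providing a ubiquitin or ubiquitin-like-protein comprising a recognition motif for a first Sortase A enzyme, wherein one or more Lysine residues in the ubiquitin or ubiquitin-like-protein have been substituted with AzGGK, AzGK, photoprotected GK, or photoprotected GGK;

(iv) obtaining the modified polypeptide comprising the first polypeptide conjugated to the ubiquitin or ubiquitin-like-protein;

(v) removing the azido group of AzGGK or AzGK in the ubiquitin or ubiquitin-like-protein by providing a phosphine, preferably 2-(diphenylphosphino)benzoic acid (2DPBA), or removing the photoprotective group by providing a light pulse;

(vi) incubating the polypeptide obtained in step (v) with

a ubiquitin or ubiquitin-like protein comprising a C-terminus comprising a recognition motif for a second Sortase A enzyme that is different from the recognition motif of the first Sortase A enzyme, wherein one or more Lysine residues in the ubiquitin or ubiquitin-like-protein have been substituted with AzGGK, AzGK, photoprotected GK, or photoprotected GGK, and said second Sortase A;

(vii) optionally repeating step (v) and (vi), wherein the ubiquitin or ubiquitin-like protein comprises a C-terminus comprising a recognition motif for a Sortase A enzyme that is different from the recognition motifs for the Sortase A enzymes of the ubiquitin or ubiquitin-like protein of the preceding steps; and

(viii) obtaining a modified polypeptide comprising a chain of two or more ubiquitins or ubiquitin-like-proteins.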

In this embodiment, the first and the second Sortase A recognize different recognition motifs, i.e. they are orthogonal. By way of example, the first Sortase A is selected from Srt2A, Srt5M and Srt4S, and the second Sortase A is one of the two remaining variants. Such a transpeptidation reaction employing two orthogonal Sortase A variants is shown by way of example in Figure 23 A and Figure 26 A. Here, two ubiquitins were coupled to a first ubiquitin (serving as the first polypeptide within the context of this embodiment), yielding a triubiquitin (see Figure 26 B,C). If a further (i.e. third) ubiquitin or ubiquitin-like protein is added, the third Sortase A is the remaining variant that has not been selected as the first or second Sortase A. Preferably, the last ubiquitin or ubiquitin-like protein added comprises no unnatural amino acids.
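Orthogonality matters because every linkage formed by a sortase is expected to still contain that sortase's recognition motif (for example LALTG in a Srt2A-formed linkage, with the final G contributed by the lysine side chain), so a sortase used in a later step must not recognize it; the DiUb hydrolysis assays in [0123] probe exactly this. The Python sketch below assigns one distinct sortase per chain-extension step, using the motifs given in [0123] (S2A = LALTG, S5M = LPLTG, S4S = LPLSG, generalized here with X at the variable position). It assumes perfect orthogonality between the three variants, and the function name `plan_chain` is hypothetical.

```python
# Sketch: one mutually orthogonal sortase per chain-extension step (see [0101], [0123]).
# Assumption: the three variants show no cross-reactivity with each other's motifs.
SORTASE_MOTIF = {"Srt2A": "LAXTG", "Srt5M": "LPXTG", "Srt4S": "LPXSG"}


def plan_chain(n_additions, pool=("Srt2A", "Srt5M", "Srt4S")):
    """Return (step, sortase, donor C-terminal motif) for each ubiquitin that is added."""
    if n_additions > len(pool):
        raise ValueError("not enough mutually orthogonal sortases for this chain length")
    chosen = pool[:n_additions]
    if len({SORTASE_MOTIF[name] for name in chosen}) != len(chosen):
        raise ValueError("two steps would share a recognition motif")
    return [(step, name, SORTASE_MOTIF[name]) for step, name in enumerate(chosen, start=1)]


if __name__ == "__main__":
    # Two additions give a triubiquitin; the enzyme order mirrors [0119]-[0121].
    for step, name, motif in plan_chain(2):
        print(f"step {step}: donor ending in {motif}G, ligated with {name}")
```

The order in which the pool is consumed is arbitrary; any permutation of the three variants satisfies the same constraint.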

[0102] In a further preferred embodiment of the method of the invention employing a subtiligase, one or more Lysine residues in the ubiquitin or ubiquitin-like-protein have been substituted with AzGGK, AzGK, photoprotected GK, or photoprotected GGK, the method further comprising

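(v) removing the azido group of AzGGK or AzGK in the ubiquitin or ubiquitin-like-protein by providing a phosphine, preferably 2-(diphenylphosphino)benzoic acid (2DPBA), or removing the photoprotective group by providing a light pulse;

(vi) incubating the polypeptide obtained in step (v) with

a ubiquitin or ubiquitin-like protein comprising a thioester at the C-terminus, wherein one or more Lysine residues in the ubiquitin or ubiquitin-like-protein have been substituted with AzGGK, AzGK, photoprotected GK, or photoprotected GGK, and a subtiligase;

(vii) optionally repeating step (v) and (vi); and

(viii) obtaining a modified polypeptide comprising a chain of two or more ubiquitins or ubiquitin-like-proteins.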

[0103] Accordingly, in a preferred embodiment, the present invention relates to a method for modifying a polypeptide, comprising:

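(i) providing a subtiligase;

(ii) providing a first polypeptide comprising AzGGK, AzGK, photoprotected GK, photoprotected GGK, GGK or GK and removing the azido group of AzGGK or AzGK by providing a phosphine, preferably 2-(diphenylphosphino)benzoic acid (2DPBA), or removing the photoprotective group by providing a light pulse, if required;

(iii) providing a ubiquitin or ubiquitin-like-protein comprising a thioester at the C-terminus, wherein one or more Lysine residues in the ubiquitin or ubiquitin-like-protein have been substituted with AzGGK, AzGK, photoprotected GK, or photoprotected GGK;

(iv) obtaining the modified polypeptide comprising the first polypeptide conjugated to the ubiquitin or ubiquitin-like-protein;

(v) removing the azido group of AzGGK or AzGK in the ubiquitin or ubiquitin-like-protein by providing a phosphine, preferably 2-(diphenylphosphino)benzoic acid (2DPBA), or removing the photoprotective group by providing a light pulse;

(vi) incubating the polypeptide obtained in step (v) with

a ubiquitin or ubiquitin-like protein comprising a thioester at the C-terminus, wherein one or more Lysine residues in the ubiquitin or ubiquitin-like-protein have been substituted with AzGGK, AzGK, photoprotected GK, or photoprotected GGK, and a subtiligase;

(vii) optionally repeating step (v) and (vi); and

(viii) obtaining a modified polypeptide comprising a chain of two or more ubiquitins or ubiquitin-like-proteins.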

[0104] The method of the invention employing a subtiligase for producing a polypeptide comprising a chain of two or more ubiquitins is shown in Figure 23B.

[0105] Introducing protected GGK or GK unnatural amino acids (AzGGK/AzGK or photoprotected GGK/GK) into both the acceptor and the donor ubiquitin allows the generation of ubiquitin chains with defined length and topology. Because a protected side chain cannot act as a nucleophile until it is deprotected, each newly added ubiquitin can only be extended after a deliberate deprotection step.

[0106] In a further preferred embodiment, the present invention combines the sortase A based transpeptidation reaction disclosed herein with the subtiligase based transpeptidation reaction disclosed herein in order to generate a modified polypeptide comprising a chain of two or more ubiquitins or ubiquitin-like-proteins. By way of example, the first ubiquitin is conjugated to the first polypeptide employing the sortase A based transpeptidation reaction and the second ubiquitin is conjugated to the first ubiquitin employing the subtiligase based transpeptidation reaction, or vice versa. Further ubiquitins can be conjugated to the second ubiquitin employing the transpeptidation reaction that was used to conjugate the second ubiquitin to the first ubiquitin.

[0107] In a further preferred embodiment the present invention relates to a polypeptide obtainable by the method of the present invention. Even more preferred is a polypeptide obtained by the method of the present invention.

[0108] In a further preferred embodiment, the present invention relates to a multidomain or non-refoldable polypeptide conjugated to one or more ubiquitins or ubiquitin-like-proteins, wherein the one or more ubiquitins or ubiquitin-like-proteins comprise a C-terminal amino acid sequence selected from the group consisting of SEQ ID NO: 15 (LPXTGG), SEQ ID NO: 16 (LLPXTGG), SEQ ID NO: 17 (LAXTGG), SEQ ID NO: 18 (LLAXTGG), SEQ ID NO: 19 (LPXSGG) and SEQ ID NO: 20 (LLPXSGG). The disclosed C-terminal amino acid sequences are the six or seven terminal amino acids of the ubiquitin or ubiquitin-like-protein; in the case of wild-type ubiquitin, the disclosed C-terminal amino acid sequences replace amino acids 71 to 76 of the wild-type ubiquitin amino acid sequence shown in SEQ ID NO: 30. The method of the present invention allows, for the first time, the site-specific attachment of ubiquitins or ubiquitin-like proteins to non-refoldable and/or multi-domain proteins. Accordingly, before the present invention was made, it was not possible in the art to prepare non-refoldable and/or multi-domain proteins site-specifically conjugated to one or more ubiquitins or ubiquitin-like-proteins, in particular to one or more ubiquitins or ubiquitin-like-proteins comprising a C-terminal amino acid sequence selected from SEQ ID NOs: 15 to 20 listed above.
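Because X in SEQ ID NOs: 15 to 20 stands for a variable residue, checking whether a given ubiquitin variant ends in one of the listed C-terminal sequences is a small pattern-matching exercise. The Python sketch below is illustrative only: it assumes X may be any of the 20 standard one-letter amino acid codes, builds its test variants from the human ubiquitin sequence (assumed to match SEQ ID NO: 30), and uses a hypothetical helper `matching_seq_ids`. A seven-residue sequence such as LLPLTGG also ends in the six-residue LPLTGG, so both SEQ ID NOs are reported for it.

```python
from __future__ import annotations

import re

# C-terminal sequences from [0108]; "X" is treated as any standard amino acid (assumption).
CTERM_BY_SEQ_ID = {
    15: "LPXTGG", 16: "LLPXTGG", 17: "LAXTGG",
    18: "LLAXTGG", 19: "LPXSGG", 20: "LLPXSGG",
}
STANDARD_AA = "[ACDEFGHIKLMNPQRSTVWY]"

# Residues 1-70 of human ubiquitin (assumed to match SEQ ID NO: 30).
UB_1_70 = ("MQIFVKTLTG" "KTITLEVEPS" "DTIENVKAKI" "QDKEGIPPDQ"
           "QRLIFAGKQL" "EDGRTLSDYN" "IQKESTLHLV")


def matching_seq_ids(protein: str) -> list[int]:
    """Return the SEQ ID NOs whose motif matches the very C-terminus of `protein`."""
    hits = []
    for seq_id, motif in CTERM_BY_SEQ_ID.items():
        pattern = motif.replace("X", STANDARD_AA) + "$"  # "$" anchors at the C-terminus
        if re.search(pattern, protein):
            hits.append(seq_id)
    return hits


if __name__ == "__main__":
    for tail in ("LRLRGG", "LPLTGG", "LALTGG"):  # native tail, then two sortase-tag tails
        print(tail, matching_seq_ids(UB_1_70 + tail))
```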

[0109] In a further preferred embodiment, the present invention relates to a pyrrolysyl tRNA synthetase comprising an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence set forth in

(i) SEQ ID NO: 10, wherein amino acid residue 274 is Alanine, amino acid residue 311 is Glutamine and amino acid residue 313 is Serine;

(ii) SEQ ID NO: 11, wherein amino acid residue 271 is Leucine, amino acid residue 274 is Alanine and amino acid residue 313 is Phenylalanine; and

(iii) SEQ ID NO: 12, wherein amino acid residue 266 is Methionine, amino acid residue 270 is Isoleucine, amino acid residue 271 is Phenylalanine, amino acid residue 274 is Alanine and amino acid residue 313 is Phenylalanine.

The present invention further relates to a polynucleotide encoding the pyrrolysyl tRNA synthetase of the invention, to a vector comprising said polynucleotide, and to the use of the pyrrolysyl tRNA synthetase of the invention in the method of the invention or for genetic code expansion.

[0110] In a further preferred embodiment, the present invention relates to a sortase A mutant comprising an amino acid sequence having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence set forth in SEQ ID NO: 5, wherein amino acid residue 36 is Arginine, amino acid residue 44 is Cysteine, amino acid residue 46 is Histidine, amino acid residue 47 is Lysine, amino acid residue 50 is Glutamine, amino acid residue 80 is Proline, amino acid residue 94 is Isoleucine, amino acid residue 102 is Lysine, amino acid residue 104 is Histidine, amino acid residue 106 is Asparagine, amino acid residue 107 is Alanine, amino acid residue 109 is Glutamic acid, amino acid residue 115 is Glutamic acid, amino acid residue 124 is Valine, amino acid residue 132 is Glutamic acid, and amino acid residue 138 is Serine. The sortase A mutant of the invention (mSrt2A) is based on Srt2A (SEQ ID NO: 4) and additionally comprises the D47K and E50Q amino acid mutations. Advantageously, the mSrt2A of the invention allows working under conditions with low Ca²⁺ concentrations, which is difficult with the Ca²⁺-dependent Srt2A. The present invention further relates to a polynucleotide encoding the sortase A mutant of the invention, to a vector comprising such a polynucleotide, and to the use of the sortase A mutant of the invention in the method of the invention or for catalyzing a transpeptidation reaction.
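Both the pyrrolysyl tRNA synthetase of [0109] and the sortase A mutant of [0110] are defined by two conditions at once: a minimum percent identity to a reference SEQ ID NO and a set of fixed residues at numbered positions. The Python sketch below checks both conditions under a deliberate simplification: query and reference are assumed to be already aligned without gaps, whereas a real identity determination would normally rely on a proper pairwise alignment. The function names are hypothetical, and the toy sequences are placeholders, not SEQ ID NOs: 5 or 10 to 12.

```python
from __future__ import annotations

# Sketch: "at least N% identity AND fixed residues" check, as used in [0109]/[0110].
# Simplification (assumption): sequences are pre-aligned, of equal length, without gaps.

# Fixed positions of the mSrt2A definition in [0110] (1-based numbering of SEQ ID NO: 5).
MSRT2A_FIXED = {36: "R", 44: "C", 46: "H", 47: "K", 50: "Q", 80: "P", 94: "I", 102: "K",
                104: "H", 106: "N", 107: "A", 109: "E", 115: "E", 124: "V", 132: "E", 138: "S"}
# Fixed positions of variant (i) in [0109] (1-based numbering of SEQ ID NO: 10).
PYLRS_I_FIXED = {274: "A", 311: "Q", 313: "S"}


def percent_identity(query: str, reference: str) -> float:
    """Share of identical positions between two pre-aligned sequences, in percent."""
    if len(query) != len(reference):
        raise ValueError("sequences must be pre-aligned to the same length")
    identical = sum(a == b for a, b in zip(query, reference))
    return 100.0 * identical / len(reference)


def meets_definition(query: str, reference: str, min_identity: float,
                     fixed: dict[int, str]) -> bool:
    """True if identity >= min_identity and every fixed (1-based) position matches."""
    if percent_identity(query, reference) < min_identity:
        return False
    return all(pos <= len(query) and query[pos - 1] == aa for pos, aa in fixed.items())


if __name__ == "__main__":
    reference = "MKTAYIAKQR"  # toy placeholder, not a SEQ ID NO from this document
    query = "MKTAYIAKQL"      # one substitution at position 10
    print(percent_identity(query, reference))
    print(meets_definition(query, reference, 70, {10: "L"}))
```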

[0111] In a further preferred embodiment the present invention relates to a kit comprising one or more polynucleotides encoding

(i) a sortase A or a subtiligase as disclosed herein;

(ii) a ubiquitin or ubiquitin-like-protein, wherein the C-terminus of the ubiquitin or ubiquitin-like-protein comprises a recognition motif for the Sortase A or a thioester as disclosed herein; and

(iii) an orthogonal heterologous tRNA synthetase/tRNA pair as disclosed herein; and an unnatural amino acid selected from the group consisting of AzGGK, AzGK, photoprotected GK, photoprotected GGK, GGK and GK.

[0112] In a further preferred embodiment, the present invention relates to a host cell comprising:

(i) a sortase A or a subtiligase as disclosed herein;

(ii) a ubiquitin or ubiquitin-like-protein, wherein the C-terminus of the ubiquitin or ubiquitin-like-protein comprises a recognition motif for the sortase A or a thioester as disclosed herein;

(iii) an orthogonal heterologous tRNA synthetase/tRNA pair as disclosed herein; and

(iv) optionally a polypeptide comprising one or more unnatural amino acids selected from the group consisting of AzGGK, AzGK, photoprotected GK, photoprotected GGK, GGK and GK.

[0113] In sum, the method of the present inventors allows attachment of ubiquitin or any ubiquitin-like protein to multi-domain proteins, as exemplified by the preparation of mono-ubiquitylated proliferating cell nuclear antigen (PCNA), a key DNA replication/repair protein, as disclosed in the examples. The present inventors show that sortase-mediated transpeptidation enables the site-specific, inducible and E1/E2/E3-enzyme-independent ubiquitylation and SUMOylation of proteins in living mammalian cells, opening up many new opportunities for the targeted analysis of ubiquitylation and SUMOylation regulatory processes. To this end, the present inventors combine genetic code expansion, bioorthogonal Staudinger reduction and sortase-mediated transpeptidation into a novel and generally applicable tool for ubiquitylating target proteins, or attaching ubiquitin-like proteins to them, in an inducible fashion. The resulting ubiquitin or ubiquitin-like protein conjugates display a native isopeptide-bond connecting the C-terminal glycine of ubiquitin with a chosen lysine in a target protein. Introducing two point mutations into the ubiquitin C-terminus makes the conjugates resistant to isopeptide cleavage by deubiquitinases. The method allows the site-specific attachment of ubiquitins and ubiquitin-like proteins to non-refoldable, multi-domain proteins and enables, for the first time, the site-specific, inducible and ubiquitin-ligase-independent ubiquitylation of proteins in mammalian cells, providing a powerful tool to dissect the biological functions of ubiquitylation with temporal control. Finally, the method can readily be transferred from ubiquitin to any other polypeptide, as described herein.

V. Items of the Invention

The invention is further characterized by the following items.

1. A method for modifying a polypeptide, comprising:

(i) providing a transpeptidase;

(ii) providing a first polypeptide comprising one or more unnatural amino acids, wherein the unnatural amino acid comprises a first amino acid and at least one further amino acid conjugated to the side chain of the first amino acid;

(iii) providing a second polypeptide comprising a recognition motif for the transpeptidase; and

(iv) obtaining the modified polypeptide.

2. The method according to item 1, wherein the unnatural amino acid comprises two or more amino acid residues, wherein the first amino acid residue is integrated in the amino acid chain of the first polypeptide,

wherein one or more further amino acid residues are attached to the side chain of the first amino acid residue, and

wherein the one or more further amino acid residues react with the transpeptidase in the transpeptidation reaction,

wherein the first amino acid is preferably a Lysine residue and wherein the two or more further amino acids are linked via an isopeptide-bond to the amino group in the side chain of the first amino acid.

3. The method according to item 1 or 2, wherein the transpeptidase is sortase A comprising an amino acid sequence which has an identity of at least 60% to the amino acid sequence shown in SEQ ID NO: 1, preferably sortase A (SrtA) from Staphylococcus aureus or a mutant thereof capable of conjugating the second polypeptide to the first polypeptide, such as Srt2A having an amino acid sequence as shown in SEQ ID NO: 2, Srt5M having an amino acid sequence as shown in SEQ ID NO: 3, Srt7M having an amino acid sequence as shown in SEQ ID NO: 4, mSrt2A having an amino acid sequence as shown in SEQ ID NO: 5, or Srt4S having an amino acid sequence as shown in SEQ ID NO: 6.

4. The method according to any one of items 1 to 3, wherein the unnatural amino acid is N⁶-glycylglycyl-L-lysine (GGK) or N⁶-glycyl-L-lysine (GK).

5. The method according to any one of items 1 to 3, wherein the unnatural amino acid is N⁶-((2-azidoacetyl)glycyl)-L-lysine (AzGGK) or N⁶-(2-azidoacetyl)-L-lysine (AzGK) and wherein transpeptidation is induced by providing a phosphine, preferably 2-(diphenylphosphino)benzoic acid (2DPBA); or

wherein the unnatural amino acid is GK or GGK whose N-terminal glycine residue is protected with a photoremovable protecting group, preferably 2-nitrobenzyl or coumarin, and wherein deprotection to GK or GGK is induced with a light pulse.

6. The method according to any one of items 3 to 5, wherein the second polypeptide is a ubiquitin or ubiquitin-like-protein, wherein the C-terminus of the ubiquitin or ubiquitin-like-protein has been modified such that it comprises a recognition motif for the sortase A enzyme,

wherein the ubiquitin-like-protein is preferably SUMO1, SUMO2, NEDD8, URM1, Ufm1, ATG8, ATG12, FAT10 or ISG15.

7. The method according to any one of items 1 to 6, wherein the method is performed in a host cell, wherein the first polypeptide is expressed in said host cell, the method comprising:

a) providing the host cell with an orthogonal heterologous tRNA synthetase/tRNA pair, wherein the heterologous tRNA synthetase specifically aminoacylates the heterologous tRNA with the unnatural amino acid and

wherein the heterologous tRNA recognizes a codon that is not recognized by an endogenous tRNA; and

b) providing the host cell with a polynucleotide encoding the first polypeptide, comprising one or more codons recognized by the heterologous tRNA; and culturing the host cell under conditions allowing expression of the first polypeptide of interest.

8. The method according to item 7, wherein the orthogonal heterologous tRNA synthetase/tRNA pair is a pyrrolysyl tRNA synthetase and a tRNA from a Methanosarcina species, preferably wherein the pyrrolysyl tRNA synthetase is a mutant pyrrolysyl tRNA synthetase having an amino acid sequence selected from the group consisting of SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, and SEQ ID NO: 13 and/or the tRNA is the Methanosarcina barkeri tRNA_CUA as shown in SEQ ID NO: 14.

9. The method according to item 7 or 8, wherein the host cell is a vertebrate cell, mammalian cell, human cell, animal cell, invertebrate cell, plant cell, nematodal cell, insect cell, stem cell, fungal cell, yeast cell, a bacterial cell, or a multicellular organism comprising a host cell, wherein the multicellular organism is preferably a mammal, an insect or a nematode and more preferably Caenorhabditis elegans, Drosophila or mouse.

10. The method according to any one of items 1 to 6, wherein the method is performed in vitro, wherein the modified polypeptide is obtained by admixing and incubating the provided components under conditions allowing a transpeptidase-mediated transpeptidation reaction.

11. The method according to any one of items 3 to 10, wherein the second polypeptide comprises a recognition motif selected from the group consisting of SEQ ID NO: 15 (LPXTG), SEQ ID NO: 16 (LLPXTG), SEQ ID NO: 17 (LAXTG), SEQ ID NO: 18 (LLAXTG), SEQ ID NO: 19 (LPXSG) and SEQ ID NO: 20 (LLPXSG).

12. The method according to any one of items 1 and 6-10, wherein

(i) the transpeptidase is a subtiligase;

(ii) the first polypeptide comprises an unnatural amino acid as defined in item 4 or 5; and

(iii) the second polypeptide is a ubiquitin or ubiquitin-like-protein comprising a thioester at the C-terminus,

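wherein the subtiligase preferably has an amino acid sequence which has an identity of at least 70% to the amino acid sequence shown in SEQ ID NO: 41.

13. The method according to any one of items 6 to 11, wherein the ubiquitin or ubiquitin-like-protein comprises a recognition motif for a first sortase A enzyme and one or more Lysine residues in the ubiquitin or ubiquitin-like-protein have been substituted with an unnatural amino acid as defined in item 5, the method further comprising

(v) removing the azido group of AzGGK or AzGK of the unnatural amino acid in the ubiquitin or ubiquitin-like-protein by providing a phosphine, preferably 2-(diphenylphosphino)benzoic acid (2DPBA), or removing the photoprotective group by providing a light pulse;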

(vi) incubating the polypeptide obtained in step (v) with

a ubiquitin or ubiquitin-like protein comprising a C-terminus comprising a recognition motif for a second sortase A enzyme that is different from the recognition motif of the first sortase A enzyme, wherein one or more Lysine residues in the ubiquitin or ubiquitin-like-protein have been substituted with an unnatural amino acid as defined in item 5, and

said second sortase A;

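(vii) optionally repeating step (v) and (vi), wherein the ubiquitin or ubiquitin-like protein comprises a C-terminus comprising a recognition motif for a sortase A enzyme that is different from the recognition motifs of the ubiquitin or ubiquitin-like protein of the preceding steps; and

(viii) obtaining a modified polypeptide comprising a chain of two or more ubiquitins or ubiquitin-like-proteins.

14. The method according to item 12, wherein one or more Lysine residues in the ubiquitin or ubiquitin-like-protein have been substituted with an unnatural amino acid as defined in item 5, the method further comprising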

(v) removing the azido group of AzGGK or AzGK of the unnatural amino acid in the ubiquitin or ubiquitin-like-protein by providing a phosphine, preferably 2-(diphenylphosphino)benzoic acid (2DPBA), or removing the photoprotective group by providing a light pulse;

(vi) incubating the polypeptide obtained in step (v) with

a ubiquitin or ubiquitin-like protein comprising a thioester at the C-terminus, wherein one or more Lysine residues in the ubiquitin or ubiquitin-like-protein have been substituted with an unnatural amino acid as defined in item 5, and

a subtiligase;

(vii) optionally repeating step (v) and (vi); and

(viii) obtaining a modified polypeptide comprising a chain of two or more ubiquitins or ubiquitin-like-proteins.

15. A modified polypeptide obtainable by the method according to any one of items 1-14.

16. A multidomain or non-refoldable polypeptide conjugated to one or more ubiquitins or ubiquitin-like-proteins, wherein the one or more ubiquitins or ubiquitin-like-proteins comprise a C-terminal amino acid sequence selected from the group consisting of SEQ ID NO: 15 (LPXTGG), SEQ ID NO: 16 (LLPXTGG), SEQ ID NO: 17 (LAXTGG), SEQ ID NO: 18 (LLAXTGG), SEQ ID NO: 19 (LPXSGG) and SEQ ID NO: 20 (LLPXSGG).

17. A pyrrolysyl tRNA synthetase comprising an amino acid sequence having at least 70% sequence identity to the amino acid sequence set forth in

(i) SEQ ID NO: 10, wherein amino acid residue 274 is Alanine, amino acid residue 311 is Glutamine and amino acid residue 313 is Serine;

(ii) SEQ ID NO: 11, wherein amino acid residue 271 is Leucine, amino acid residue 274 is Alanine and amino acid residue 313 is Phenylalanine; and

(iii) SEQ ID NO: 12, wherein amino acid residue 266 is Methionine, amino acid residue 270 is Isoleucine, amino acid residue 271 is Phenylalanine, amino acid residue 274 is Alanine and amino acid residue 313 is Phenylalanine.

18. A polynucleotide encoding a pyrrolysyl tRNA synthetase as defined in item 17.

19. Use of the pyrrolysyl tRNA synthetase according to item 17 in the method according to any one of items 7 to 14 or for genetic code expansion.

20. A sortase A mutant comprising an amino acid sequence having at least 60% sequence identity to the amino acid sequence set forth in SEQ ID NO: 5, wherein amino acid residue 36 is Arginine, amino acid residue 44 is Cysteine, amino acid residue 46 is Histidine, amino acid residue 47 is Lysine, amino acid residue 50 is Glutamine, amino acid residue 80 is Proline, amino acid residue 94 is Isoleucine, amino acid residue 102 is Lysine, amino acid residue 104 is Histidine, amino acid residue 106 is Asparagine, amino acid residue 107 is Alanine, amino acid residue 109 is Glutamic acid, amino acid residue 115 is Glutamic acid, amino acid residue 124 is Valine, amino acid residue 132 is Glutamic acid, and amino acid residue 138 is Serine.

21. A polynucleotide encoding a sortase A mutant as defined in item 20.

22. Use of the sortase A mutant according to item 20 in the method according to any one of items 3 to 11 and 13 or for catalyzing a transpeptidation reaction.

23. A kit comprising one or more polynucleotides encoding

(i) a sortase A as defined in item 2 or a subtiligase as defined in item 12;

(ii) a ubiquitin or ubiquitin-like-protein, wherein the C-terminus of the ubiquitin or ubiquitin-like-protein comprises a recognition motif for the sortase A or a thioester; and

(iii) an orthogonal heterologous tRNA synthetase/tRNA pair as defined in items 7 and 8; and an unnatural amino acid as defined in item 4 or 5.

24. A host cell comprising:

(i) a sortase A as defined in item 2 or a subtiligase as defined in item 12;

(ii) a ubiquitin or ubiquitin-like-protein, wherein the C-terminus of the ubiquitin or ubiquitin-like-protein comprises a recognition motif for the sortase A or a thioester;

(iii) an orthogonal heterologous tRNA synthetase/tRNA pair as defined in items 7 and 8; and

(iv) optionally a polypeptide comprising one or more unnatural amino acids as defined in item 4 or 5.

VI. Examples

The following Examples illustrate the invention, but are not to be construed as limiting the scope of the invention.

Materials and Methods

Expression of AzGGK-bearing proteins

[0114] Chemically competent E. coli K12 cells were co-transformed with the plasmids pPylT_POI (which encodes MbtRNA_CUA and a C-terminally His6-tagged POI gene with a TAG codon at the desired position under an arabinose-inducible promoter) and pBK_AzGGKRS (which encodes AzGGKRS). After recovery in 1 mL SOC medium for one hour at 37 °C, cells were cultured overnight in 50 mL of non-inducing medium containing full-strength antibiotics (tetracycline and ampicillin) at 37 °C, 200 rpm. The overnight culture was diluted to an OD₆₀₀ of 0.05 in auto-induction medium (17 amino acid mix, no phenylalanine added; see Supplementary Methods) supplemented with full-strength antibiotics (tetracycline and ampicillin) and 4 mM AzGGK. After overnight incubation at 37 °C, cells were harvested by centrifugation (4000 × g, 15 min, 4 °C) and flash frozen in liquid nitrogen, and the cell pellet was stored at -80 °C. Compositions of non-inducing and auto-inducing media can be found in Supplementary Tables S3-S7. For protein purification, cell pellets were thawed on ice and resuspended in 20 mL lysis buffer (20 mM Tris pH 8.0, 30 mM imidazole, 300 mM NaCl, 0.175 mg/mL PMSF, 0.1 mg/mL DNase I and one complete™ protease inhibitor tablet (Roche)). The cell suspension was incubated on ice for 30 minutes and sonicated with cooling in an ice-water bath. The lysed cells were centrifuged (15,000 × g, 40 min, 4 °C), the cleared lysate was added to Ni²⁺-NTA slurry (Jena Bioscience) (0.2 mL of slurry per 100 mL of culture), and the mixture was incubated with agitation for one hour at 4 °C. After incubation, the mixture was transferred to an empty plastic column and washed with 10 CV wash buffer (20 mM Tris pH 8.0, 30 mM imidazole pH 8.0 and 300 mM NaCl). The protein was eluted in 1 mL fractions with wash buffer supplemented with 300 mM imidazole pH 8.0. The fractions containing the protein were pooled, concentrated and re-buffered into sortase buffer (20 mM Tris pH 7.5, 150 mM NaCl, 5 mM CaCl₂) using Amicon® centrifugal filter units (Millipore) with the corresponding MWCO. Purified proteins were analyzed by 15 % SDS-PAGE and LC-MS.

Reduction of AzGGK-bearing proteins via Staudinger reduction

[0115] Reduction of the azide moiety of AzGGK to the amine moiety (GGK) on the POI was performed either directly in the cell lysate or on purified proteins. For in-lysate reduction, 2DPBA (100 mM stock solution in EtOH) was added to the lysate to a final concentration of 1 mM, and the lysate was incubated for 3 h at room temperature (250 rpm). Afterwards, the lysate was centrifuged (15,000 × g, 30 min, 4 °C) and purification was performed as described in the Online or Supplementary Methods. To reduce the azide moiety on purified proteins, 2 equivalents of 2DPBA were added to the purified, re-buffered protein, followed by incubation for 2 h at room temperature and re-buffering to remove excess 2DPBA.
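The two reduction regimes in [0115] are specified differently: a fixed final concentration (1 mM) in lysate, and a molar ratio (2 equivalents) for purified protein, which requires converting the protein amount into a stock volume. The Python sketch below does this arithmetic. The protein concentration and volumes in the example are illustrative values not taken from the examples, and the function names are hypothetical.

```python
# Sketch: 2DPBA stock volumes for the Staudinger reduction step in [0115].
# Units: µM = pmol/µL and mM = nmol/µL, so µM × µL = pmol.


def dpba_volume_for_equivalents(protein_um: float, protein_ul: float,
                                equivalents: float = 2.0, stock_mm: float = 100.0) -> float:
    """µL of 2DPBA stock giving `equivalents` mol of 2DPBA per mol of protein."""
    protein_nmol = protein_um * protein_ul / 1000.0   # pmol -> nmol
    return equivalents * protein_nmol / stock_mm       # nmol / (nmol/µL) -> µL


def dpba_volume_for_final_conc(final_mm: float, final_ul: float,
                               stock_mm: float = 100.0) -> float:
    """µL of stock for a target final concentration (C1·V1 = C2·V2, V2 = final volume)."""
    return final_mm * final_ul / stock_mm


if __name__ == "__main__":
    # Illustrative: 1 mL of 50 µM purified protein, 2 equivalents from a 100 mM stock.
    print(dpba_volume_for_equivalents(50.0, 1000.0))
    # Illustrative: 1 mM final in 20 mL of lysate from the 100 mM ethanolic stock.
    print(dpba_volume_for_final_conc(1.0, 20000.0))
```

With a 100 mM stock, the 1 mM in-lysate condition corresponds to a 1:100 dilution, i.e. about 1 % (v/v) ethanol in the lysate.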

In vitro ubiquitylation/SUMOylation of purified proteins

[0116] Unless stated otherwise, in vitro ubiquitylation and SUMOylation of purified proteins was performed with 20 µM GGK-bearing protein (e.g. sfGFP-N150GGK), 100 µM ubiquitin/SUMO mutant (Ub(PT), Ub(LPT), Ub(AT), Ub(LAT), SUMO(AT) or SUMO(LAT)) and 20 µM sortase variant in sortase buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 5 mM CaCl₂). Reactions were incubated at 37 °C with shaking at 600 rpm. Samples of 6 µL were taken at the denoted time points and quenched by the addition of 4x SDS loading buffer and boiling at 95 °C for 10 minutes. Samples were loaded on SDS-PAGE gels and visualized by Coomassie staining.
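Setting up the in vitro reaction of [0116] is a C1·V1 = C2·V2 calculation for each of the three components (GGK-bearing protein, ubiquitin/SUMO mutant, sortase), with sortase buffer making up the rest. The Python sketch below treats the three final concentrations as 20, 100 and 20 µM, in line with the micromolar amounts used in [0119]; the stock concentrations and the 60 µL final volume (room for ten 6 µL time-point samples) are assumptions for the example, and the helper name is hypothetical.

```python
from __future__ import annotations

# Sketch: pipetting scheme for the in vitro sortase reaction in [0116].
# Assumed stock concentrations; the final concentrations are read as micromolar.


def pipetting_scheme(final_ul: float, targets_um: dict[str, float],
                     stocks_um: dict[str, float]) -> dict[str, float]:
    """µL of each stock (C1·V1 = C2·V2) plus sortase buffer to reach final_ul."""
    volumes = {name: targets_um[name] * final_ul / stocks_um[name] for name in targets_um}
    buffer_ul = final_ul - sum(volumes.values())
    if buffer_ul < 0:
        raise ValueError("stocks are too dilute for the requested final volume")
    volumes["sortase buffer"] = buffer_ul
    return volumes


if __name__ == "__main__":
    targets = {"GGK protein": 20.0, "Ub/SUMO mutant": 100.0, "sortase": 20.0}
    stocks = {"GGK protein": 200.0, "Ub/SUMO mutant": 1000.0, "sortase": 400.0}
    for component, ul in pipetting_scheme(60.0, targets, stocks).items():
        print(f"{component:>16}: {ul:5.1f} µL")
```

The order of addition is not modelled; in [0119], for example, the sortase is added after acceptor and donor.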

In vivo SUMOylation/ubiquitylation of proteins in live E. coli

[0117] Chemically competent E. coli K12 cells were co-transformed with the plasmids pPylT_POI, pBK_AzGGKRS and pBAD-Duett (which encodes mSrt2A or SrtA and Ub(LAT)/SUMO(AT)). After recovery in 1 mL of SOC medium for one hour at 37 °C, cells were cultured overnight in 50 mL of non-inducing medium containing half-strength antibiotics (tetracycline, ampicillin and kanamycin) at 37 °C, 200 rpm. The overnight culture was diluted to an OD₆₀₀ of 0.05 in auto-induction medium supplemented with half-strength antibiotics and 4 mM AzGGK. After incubation for 24 hours at 37 °C, the cells were gently washed twice with PBS (2000 × g, 15 min, 4 °C) and resuspended in auto-induction medium supplemented with half-strength antibiotics. After incubation at 37 °C (200 rpm) for one hour, a 1 mL sample was taken, centrifuged (2000 × g, 2 min, 4 °C), washed twice with PBS (2000 × g, 2 min, 4 °C), flash frozen and stored at -20 °C. Afterwards, 2 mM 2DPBA was added to the culture and further 1 mL samples were taken at the denoted time points. Samples were denatured by addition of 4x Laemmli buffer and heating to 95 °C for 10 minutes, followed by centrifugation (16,000 × g, 10 min, 4 °C). 10 µL of supernatant were subjected to SDS-PAGE followed by Western blotting using an α-His6 antibody to detect in vivo ubiquitylation and SUMOylation of target proteins.

In vivo ubiquitylation and SUMOylation of proteins in mammalian cells

[0118] Human embryonic kidney 293T cells (HEK293T) were cultured in Dulbecco's modified Eagle's medium (DMEM, Sigma Aldrich) supplemented with 10% (v/v) fetal bovine serum (FBS, Biochrom) and 1 % antibiotic-antimycotic solution (25 µg/mL amphotericin B, 10 mg/mL streptomycin, and 10,000 units of penicillin, Sigma-Aldrich) at 37 °C in a humidified chamber with 5 % CO₂. On the day prior to transfection, HEK293T cells were seeded in poly-L-lysine-coated 6-well plates (Greiner) at 6x10⁵ cells/well to reach 60-80 % confluence for transfection. Fresh DMEM supplemented with 2 mM AzGGK was added, followed by transfection with 1 µg of the respective sortase plasmid (pIRES_mSrt2A or pIRES_Srt2A), 1.5 µg of the corresponding Ub(LAT) or SUMO(LAT) plasmid (pcDNA_Ub(LAT) or pcDNA_SUMO(LAT)), 0.56 µg of pEF1-sfGFPN150TAG/PCNAK164TAG plasmid and 0.19 µg of the pEF1-AzGGK-PylRS plasmid. All transfections were performed using Lipofectamine2000 (Invitrogen) according to the manufacturer's instructions. Cells were grown for 35 hours in the presence of AzGGK (2 mM). To remove AzGGK, cells were washed twice with PBS and fresh, complete DMEM was added. Cells were then incubated for another 2-3 hours to eliminate residual traces of AzGGK. After incubation, another washing step with PBS was carried out and complete DMEM containing 400 µM 2DPBA was added. Cells were then incubated at 37 °C, 5 % CO₂ for 16-18 hours. Two additional washing steps with PBS were performed prior to harvesting. Cells were lysed with RIPA buffer, followed by Western blotting using an α-His antibody to detect ubiquitylation and SUMOylation of target proteins. Expression of Srt2A or mSrt2A was detected via α-Myc Western blotting.

Preparation of a triubiquitin linked via K48 and K6 isopeptide bonds

[0119] Bifunctional Ub-K48GGK bearing the LPT C-terminus (dubbed UbK48GGK-LPT) was diluted to 20 µM in sortase buffer (50 mM Tris pH 7.5, 150 mM NaCl, 5 mM CaCl₂). Afterwards, 100 µM of the donor ubiquitin (Ub(LAT)) was added, followed by 20 µM Srt2A (without His-tag). Incubation was performed at 37 °C, 600 rpm, for one hour. Sortase-mediated transpeptidation was stopped by adding 200 µM phenyl vinyl sulfone and incubating for a further 10 minutes at 37 °C, 600 rpm.

[0120] Afterwards, Ni²⁺-NTA slurry (Jena Bioscience) (0.1 mL/mg of bifunctional ubiquitin) was added to the reaction mixture and the mixture was incubated with agitation for 1 h at 4 °C. After incubation, the mixture was transferred to an empty plastic column and washed with 40 column volumes of wash buffer (50 mM Tris pH 7.5, 150 mM NaCl, 5 mM CaCl₂, 30 mM imidazole) to remove Srt2A and the excess donor ubiquitin. The protein was eluted in 0.2 mL fractions with wash buffer supplemented with 300 mM imidazole pH 8.0. The fractions containing the mixture of diubiquitin and unreacted bifunctional ubiquitin were pooled and concentrated using Amicon® centrifugal filter units (Millipore) with the corresponding MWCO. To remove unreacted bifunctional ubiquitin, size-exclusion chromatography (SEC) was performed on a Superdex S75 16/600 column (GE Healthcare) in sortase buffer. Fractions containing Ub-isoK48(LAT)-Ub-LPT were pooled and concentrated using Amicon® centrifugal filter units (Millipore) with the corresponding MWCO. Diubiquitin was stored at -80 °C until further use.

[0121] In a second sortase reaction, Ub-isoK48(LAT)-Ub-LPT was diluted to 20 µM in sortase buffer (50 mM Tris pH 7.5, 150 mM NaCl, 5 mM CaCl₂). Afterwards, 100 µM of the acceptor ubiquitin UbK6GGK was added, followed by 20 µM Srt5M (without His-tag). Incubation was performed at 37 °C, 600 rpm, for one hour. Sortase-mediated transpeptidation was stopped by adding 200 µM phenyl vinyl sulfone and incubating for a further 10 minutes at 37 °C, 600 rpm. Afterwards, Ni²⁺-NTA slurry (Jena Bioscience) (0.1 mL/mg of diubiquitin) was added to the reaction mixture and the mixture was incubated with agitation for 1 h at 4 °C. After incubation, the mixture was transferred to an empty plastic column and washed with 40 CV of wash buffer (50 mM Tris pH 7.5, 150 mM NaCl, 5 mM CaCl₂, 30 mM imidazole) to remove Srt5M, unreacted bifunctional ubiquitin as well as UbK6GGK. The protein was eluted in 0.2 mL fractions with wash buffer supplemented with 300 mM imidazole pH 8.0. The fractions containing the mixture of triubiquitin, unreacted diubiquitin as well as UbK6GGK were pooled and concentrated using Amicon® centrifugal filter units (Millipore) with the corresponding MWCO. To remove unreacted bifunctional ubiquitin as well as UbK6GGK, size-exclusion chromatography (SEC) was performed on a Superdex S75 16/600 column (GE Healthcare) in sortase buffer. Fractions containing Ub-isoK48(LAT)-Ub-isoK6(LPT)-Ub (TriUb) were pooled and concentrated using Amicon® centrifugal filter units (Millipore) with the corresponding MWCO. Triubiquitin was stored at -80 °C until further use.

Preparation of DiUb-SUMO2 hybrid chains

[0122] DiUb-SUMO2 hybrid chains were prepared analogously to TriUb, except that SUMO2 variants bearing GGK at the respective lysine position (SUMO2-KXXGGK) were used instead of UbK6GGK.

Hydrolysis assays of DiUbs to investigate sortase orthogonality

[0123] K6-DiUbs linked via different sortase motifs (S2A = LALTG; S5M = LPLTG; S4S = LPLSG) were diluted to 40 µM in sortase buffer (50 mM Tris pH 7.5, 150 mM NaCl, 5 mM CaCl₂). Then 10 µM of the respective sortase variant was added, and the mixture was incubated at 37 °C and 600 rpm. Samples of 6 µL were taken at the indicated time points and quenched by adding 4x SDS loading buffer and boiling at 95 °C for 10 minutes. Samples were loaded on SDS-PAGE gels and visualized by Coomassie staining.

Results

A strategy for sortase-mediated peptide ligation employing a novel UAA

[0124] The sortase-mediated formation of a peptide bond between a protein of interest (POI) and a modified ubiquitin bearing a sortase recognition tag involves the synthesis and site-specific incorporation of a UAA in which two glycine residues are coupled via an isopeptide bond to the ε-amino group of lysine (glycylglycyl-L-lysine, GGK; Fig. 7a). In parallel, ubiquitin is expressed with a modified C-terminal sequence, bearing two amino acid mutations, that serves as the recognition tag for the enzyme sortase A (SrtA). SrtA is an enzyme from Staphylococcus aureus that catalyzes the covalent attachment of proteins to the bacterial cell wall. SrtA cleaves the peptide bond between T and G in the recognition motif LPXTG, yielding an activated thioester intermediate that subsequently undergoes specific transpeptidation with a peptide containing N-terminal glycine residues. The present inventors envisioned that, by changing the unstructured ubiquitin C-terminus from its native sequence LRLRGG to LPLTGG, ubiquitin would be recognized by SrtA and form a thioester-linked Ub-SrtA intermediate that could be nucleophilically attacked by the free amino group of the terminal glycine residue of GGK incorporated site-specifically into a POI. Ubiquitin should thereby be conjugated to a target protein via a native isopeptide bond between a chosen lysine in the POI and G76 of ubiquitin (Fig. 6). The sortase-generated Ub-POI conjugates carry two point mutations in the Ub C-terminus (R72P and R74T), yet all surface patches (the I36, I44 and F4 patches and the TEK box) that are essential for recognition by ubiquitin-binding domains (UBDs) remain untouched. Since genetic code expansion allows the site-specific incorporation of a UAA at virtually any chosen position in any POI, this method will be useful for generating essentially any ubiquitylated protein.
Importantly, the approach is applicable to proteins under native conditions, allowing the ubiquitylation of large multi-domain and non-refoldable proteins, an endeavour that is challenging with present chemical methods. Furthermore, the approach will be extendable to the attachment of other Ubls via native isopeptide linkages, since they all display the highly conserved C-terminal glycylglycine motif.
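
The recognition and cleavage logic of [0124] can be illustrated with a toy sequence model. The sketch below is a hypothetical string-level simplification, not a structural or kinetic model: it finds an LPXTG motif in a donor C-terminus, cuts between T and G, and appends a glycine-initiated nucleophile. The GGK side chain is written as "GG-K" purely as notation for the isopeptide-linked acceptor lysine, and the donor strings are residues 64-76 of ubiquitin.

```python
import re

# LPXTG recognition motif of S. aureus SrtA; X = any residue.
SRTA_MOTIF = re.compile(r"LP.TG")


def sortase_ligate(donor: str, nucleophile: str, motif: re.Pattern = SRTA_MOTIF) -> str:
    """Toy transpeptidation: cut donor between T and G of its last motif, join nucleophile.

    Residues after T (here the C-terminal GG) are released; the acyl
    intermediate is resolved by the N-terminal glycine of the nucleophile.
    """
    matches = list(motif.finditer(donor))
    if not matches:
        raise ValueError("donor lacks a sortase recognition motif")
    if not nucleophile.startswith("G"):
        raise ValueError("nucleophile must start with glycine")
    cut = matches[-1].start() + 4  # keep L-P-X-T, drop everything after T
    return donor[:cut] + nucleophile


if __name__ == "__main__":
    ub_pt = "ESTLHLVLPLTGG"  # Ub(PT) residues 64-76 (LRLRGG -> LPLTGG)
    product = sortase_ligate(ub_pt, "GG-K")
    print(product)
    print(bool(SRTA_MOTIF.search(product)))  # motif is re-formed in the product
    try:
        sortase_ligate("ESTLHLVLRLRGG", "GG-K")  # native Ub C-terminus
    except ValueError as err:
        print("wt Ub:", err)
```

The last call shows why ubiquitin with its native LRLRGG terminus is not a substrate, while the re-formed motif in the product is consistent with sortylation being, in principle, reversible.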

[0125] To investigate whether GGK can serve as the nucleophilic acceptor in a sortase-mediated transpeptidation reaction, the present inventors synthesized a seven-amino-acid peptide resembling the sortase-compatible C-terminus of ubiquitin (Fmoc-VLPLTGG) by solid-phase peptide synthesis (SPPS). GGK was synthesized by coupling tert-butyloxycarbonyl-protected diglycine to the ε-amino group of lysine (Supplementary Methods). In parallel, the present inventors expressed and purified Srt5M, a mutant version of wt SrtA. This engineered sortase variant contains five mutations that confer 140-fold increased activity compared to wt SrtA. Incubation of GGK with the Fmoc-VLPLTGG peptide in the presence of Srt5M led to nearly quantitative formation of the transpeptidation product within 30 minutes, as observed by LC-MS (Fig. 7b and 7c). Similarly, ubiquitin bearing the sortase-compatible C-terminus (dubbed Ub(PT)) reacted to the expected conjugate when incubated with GGK in the presence of Srt5M (Fig. 7d).
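
The LC-MS readout of the peptide reaction can be compared against an expected mass. The sketch below computes neutral monoisotopic masses from standard residue masses, assuming the donor Fmoc-VLPLTGG loses its C-terminal Gly-Gly (released as H-Gly-Gly-OH) and that free GGK (C10H20N4O4) is acylated on its terminal amine. Observed LC-MS species are typically protonated ions, so these values are a reference point rather than a prediction of the exact peaks in Fig. 7.

```python
# Monoisotopic residue masses (Da) of the amino acids used here.
RESIDUE = {"G": 57.02146, "P": 97.05276, "V": 99.06841,
           "T": 101.04768, "L": 113.08406, "K": 128.09496}
WATER = 18.01056
FMOC_NET = 222.06808  # C15H10O2: Fmoc (C15H11O2) replacing one N-terminal H


def peptide_mass(seq: str, n_cap: float = 0.0) -> float:
    """Neutral monoisotopic mass of a linear peptide with an optional N-terminal cap."""
    return sum(RESIDUE[aa] for aa in seq) + WATER + n_cap


GGK = RESIDUE["K"] + 2 * RESIDUE["G"] + WATER  # free glycylglycyl-lysine
GLY_GLY = peptide_mass("GG")                   # leaving group after cleavage at T|G

donor = peptide_mass("VLPLTGG", n_cap=FMOC_NET)
product = donor + GGK - GLY_GLY                # Fmoc-VLPLT joined to the GG of GGK

if __name__ == "__main__":
    print(f"Fmoc-VLPLTGG: {donor:.4f} Da")
    print(f"GGK:          {GGK:.4f} Da")
    print(f"product:      {product:.4f} Da")
```

With these inputs the donor comes out at about 877.46 Da and the transpeptidation product at about 1005.55 Da (neutral, monoisotopic).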

[0126] To allow time-resolved ubiquitylation and SUMOylation of target proteins in living cells, the present inventors sought to make their sortylation approach inducible via a bioorthogonal reaction. To this end, the present inventors masked the primary amino group of GGK as an azide, creating AzGGK (Fig. 6). AzGGK was synthesized by SPPS by sequentially coupling glycine and azidoacetic acid onto resin-bound Nα-protected lysine (Supplementary Methods). The azido group of AzGGK can be reduced quantitatively via Staudinger reduction with the water-soluble and cell-permeable phosphine 2DPBA (2-(diphenylphosphino)benzoic acid) to restore GGK. As expected, incubation of AzGGK with the peptide Fmoc-VLPLTGG in the presence of Srt5M did not lead to conjugate formation (Fig. 7c). After addition of 2DPBA, however, sortase-mediated transpeptidation proceeded smoothly. As the azide group in AzGGK blocks its activity as a nucleophilic substrate in sortase-mediated transpeptidation, site-specific incorporation of AzGGK and its in vivo reduction with 2DPBA will provide an approach for site-specific ubiquitylation and SUMOylation in living cells with temporal control.
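
The Staudinger reduction converts the terminal azide of AzGGK (R-N3) into the primary amine of GGK (R-NH2). The resulting mass change, which is what ESI-MS registers when an AzGGK-bearing protein is reduced, follows from the elemental difference: loss of N2 and gain of two hydrogens. The sketch below computes it from standard monoisotopic atomic masses; the 27,000 Da input in the example is a made-up placeholder, not a measured value.

```python
# Monoisotopic atomic masses (Da).
H = 1.00782503
N = 14.00307401


def staudinger_shift() -> float:
    """Mass change for R-N3 -> R-NH2: two N atoms lost, two H atoms gained."""
    return 2 * H - 2 * N


def expected_reduced_mass(azide_mass: float, n_azides: int = 1) -> float:
    """Expected mass after reducing every azide in a species of known mass."""
    return azide_mass + n_azides * staudinger_shift()


if __name__ == "__main__":
    print(f"shift per azide: {staudinger_shift():.4f} Da")
    # Hypothetical protein carrying one AzGGK residue (placeholder mass).
    print(f"reduced: {expected_reduced_mass(27_000.0):.2f} Da")
```

A loss of about 26 Da per AzGGK residue is therefore the expected signature of complete reduction to GGK.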

Site-specific incorporation of AzGGK

[0127] Encouraged by the peptide-based results of sortase-mediated ligation, the present inventors set out to site-specifically incorporate AzGGK into proteins in vivo. A large variety of structurally diverse UAAs have been genetically encoded in response to an amber codon introduced into a gene of interest by employing engineered pyrrolysyl-tRNA synthetase (PylRS)/tRNACUA pairs from Methanosarcina species, including Methanosarcina barkeri (Mb) and Methanosarcina mazei (Mm). Genetic code expansion using the PylRS/tRNACUA pair and evolved variants thereof now allows the site-specific incorporation of UAAs in bacteria, Saccharomyces cerevisiae, mammalian cells, Arabidopsis thaliana and living organisms. Many of the lysine-based UAAs that can be incorporated via PylRS mutants carry an alkyl side chain with a bulky, often hydrophobic moiety attached to lysine via a carbamate linkage. In a few lysine derivatives for which PylRS mutants have been engineered, the side chain is attached via an isopeptide bond to the ε-amino group of lysine, as in the wt substrate pyrrolysine. However, none of the lysine derivatives incorporated so far carries a planar and polar peptide bond in its side chain, as is the case for AzGGK (Fig. 1a). Guided by structural analyses of the C-terminal catalytic centre of wt PylRS and its mutants, the present inventors screened a panel of > 25 different MbPylRS mutants that accept lysine derivatives with long and bulky side chains for their ability to direct the selective and site-specific incorporation of AzGGK. As none of the tested PylRS mutants showed the desired activity, the present inventors created a new PylRS library by DNA shuffling of the C-terminal domains of 17 known PylRS synthetases, spiked with an unbiased error-prone PCR product of the wt catalytic PylRS C-lobe (Fig. 8). This PylRS library was subjected to alternating rounds of positive and negative selection in E. coli.
In the last round, the present inventors combined the positive selection step with a fluorescence readout by co-transforming surviving clones from the negative selection with a reporter plasmid bearing both a chloramphenicol acetyltransferase gene and a superfolder green fluorescent protein (sfGFP) gene, each interrupted by an amber codon. sfGFP-expressing colonies that grew in the presence of chloramphenicol were picked, and synthetase efficiency and selectivity were evaluated by fluorescence intensity in the presence and absence of AzGGK. Mutants that showed at least a 5-fold increase in fluorescence intensity in the presence of AzGGK were used for preparative-scale expression of C-terminally His6-tagged sfGFP bearing an amber codon at position 150 (sfGFP-N150TAG-His6). E. coli containing a PylRS mutant with the three active-site mutations L274A, N311Q and C313S (dubbed AzGGKRS) and a plasmid encoding MbtRNACUA and sfGFP-N150TAG-His6 produced full-length sfGFP in an amino acid-dependent manner (Fig. 1b). ESI-MS analysis confirmed incorporation of AzGGK. Apart from the mass peak corresponding to sfGFP-AzGGK, a further low-intensity peak (approximately 10-20%) was observed in the ESI-MS analysis, which could be assigned to misincorporation of phenylalanine (Fig. 9). sfGFP expression under auto-induction conditions omitting phenylalanine abolished this misincorporation and led to clean expression of full-length sfGFP-N150AzGGK-His6. Up to 50 mg of purified sfGFP per litre of cell culture were obtained after Ni-NTA affinity chromatography. sfGFP-AzGGK was reduced quantitatively to sfGFP-GGK with 2DPBA within 10 minutes at room temperature, as indicated by ESI-MS analysis (Fig. 1c and 1d).
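
The final screening criterion, keeping only mutants with at least a 5-fold fluorescence increase in the presence of AzGGK, amounts to a simple ratio filter. The sketch below assumes one background-corrected fluorescence reading per clone with and without the UAA; the clone names and readings are hypothetical.

```python
from typing import Dict, Tuple


def select_hits(readings: Dict[str, Tuple[float, float]],
                min_fold: float = 5.0) -> Dict[str, float]:
    """Return clones whose (+UAA / -UAA) fluorescence ratio meets `min_fold`.

    readings maps clone -> (fluorescence with AzGGK, fluorescence without AzGGK).
    A small floor on the denominator avoids division by zero for dark controls.
    """
    hits = {}
    for clone, (plus_uaa, minus_uaa) in readings.items():
        fold = plus_uaa / max(minus_uaa, 1e-9)
        if fold >= min_fold:
            hits[clone] = fold
    return hits


if __name__ == "__main__":
    # Hypothetical plate-reader values (arbitrary units).
    plate = {"clone_A": (500.0, 50.0), "clone_B": (300.0, 100.0), "clone_C": (250.0, 50.0)}
    for clone, fold in sorted(select_hits(plate).items()):
        print(f"{clone}: {fold:.1f}-fold")
```

Using `>=` keeps clones that sit exactly at the threshold; averaging replicates before applying the cut-off would be an equally reasonable design choice.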

Ubiquitylation of GGK-bearing proteins via sortase-mediated transpeptidation

[0128] With AzGGK- and GGK-containing proteins in hand, the present inventors investigated whether such modified proteins could be site-specifically ubiquitylated using Srt5M. The present inventors incubated sfGFP-GGK with an excess of Ub(PT) in the presence of Srt5M in aqueous buffer at 37 °C. SDS-PAGE and MS analysis revealed formation of an sfGFP-Ub(PT) conjugate (Fig. 2a and 2b). Nonspecific ubiquitylation was not detected with control sfGFP that contained BocK in place of GGK at the same site, or with sfGFP-GGK and Ub(PT) in the absence of Srt5M (Fig. 2a and Fig. 10). Although ubiquitylation was selective for GGK-containing proteins and LC-MS confirmed the correct isopeptide-linked conjugate, only 10-20% of sfGFP-GGK had been converted to the ubiquitylated product after three hours of incubation. The present inventors reasoned that the low conversion may stem from restricted access of Srt5M to the unstructured C-terminus of Ub(PT), as the first leucine of the sortase motif LPLTGG (L71 in Ub) is engaged in the C-terminal β-sheet of ubiquitin. To give Srt5M better access to its recognition motif, the present inventors introduced a leucine spacer into the LPLTGG sequence between L71 and P72, generating Ub(LPT) with an LLPLTGG C-terminal sequence; this shifts the sortagging motif one amino acid away from the compact ubiquitin β-grasp fold, making it more accessible. Indeed, incubation of Ub(LPT) with sfGFP-GGK in the presence of Srt5M afforded > 60% ubiquitylated sfGFP within five minutes (Fig. 2a). The identity of the sfGFP-Ub(LPT) conjugate was confirmed by LC-MS (Fig. 2b), and control experiments with sfGFP-BocK confirmed the selectivity of the transpeptidation reaction (Fig. 2a and Fig. 10). Encouraged by these results, the present inventors next set out to generate diubiquitins with different linkages using sortylation.
For this, the present inventors individually expressed C-terminally His6-tagged ubiquitins with AzGGK at position K6, K11, K27, K29, K33, K48 or K63. The Ub-AzGGK variants were reduced to their Ub-GGK counterparts using 2DPBA and were characterized by SDS-PAGE and LC-MS (Fig. 11a and 11b). Incubation of five of the Ub-GGK variants (Ub-K6GGK, Ub-K11GGK, Ub-K33GGK, Ub-K48GGK and Ub-K63GGK) with a five-fold excess of Ub(LPT) in the presence of Srt5M resulted in efficient and selective formation of site-specifically isopeptide-linked diubiquitins (diUbs) within an hour (Fig. 2c). K27-diUb and K29-diUb could not be generated using sortylation (Fig. 11c). Positions K27 and K29 are less exposed than the other lysine residues in the ubiquitin fold, and the present inventors speculate that, owing to steric clashes, Ub-K27GGK and Ub-K29GGK cannot nucleophilically attack the thioester formed between Ub(LPT) and Srt5M. To be recognized by Srt5M, the wt C-terminus of ubiquitin had been modified to LPLTGG. Although the present inventors were able to produce mono-ubiquitylated proteins and diUbs, they reasoned that the proline residue introduced at position 72 of ubiquitin in particular might be a poor mimic of the native C-terminus, as it confers unusual conformational rigidity and the proportion of the cis-isomer of the L-P peptide bond might be elevated relative to the native L-R peptide bond. The present inventors therefore turned their attention to a recently evolved sortase (Srt2A) with reprogrammed substrate specificity (Dorr et al., Proc Natl Acad Sci U S A 111, 13343-8 (2014)). Srt2A recognizes an LAXTG motif rather than an LPXTG motif, obviating the need to introduce a proline residue into the ubiquitin C-terminus. The present inventors expressed Srt2A and ubiquitin bearing an LALTGG (dubbed Ub(AT)) or an LLALTGG (dubbed Ub(LAT)) C-terminus.
Incubation of both Ub(AT) and Ub(LAT) with sfGFP-GGK in the presence of Srt2A yielded the corresponding sfGFP-Ub conjugates, which were characterized by LC-MS (Fig. 12a and 12b). The sfGFP-Ub(LAT) conjugate formed with > 50% conversion within five minutes of incubation at 37 °C. Control experiments with sfGFP-BocK demonstrated the specificity and selectivity of Srt2A-mediated ubiquitylation using either Ub(LAT) or Ub(AT).
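
Conversion values such as "> 60%" or "> 50%" in [0128] are read from SDS-PAGE gels. One common way to estimate them, assumed here rather than taken from the source, is densitometry of the product band and the remaining acceptor band. Because Coomassie staining scales roughly with protein mass rather than with moles, the sketch optionally divides each band intensity by an assumed molecular weight to approximate a molar conversion; the intensities and molecular weights in the example are hypothetical.

```python
from typing import Optional


def percent_conversion(product_int: float, acceptor_int: float,
                       product_mw: Optional[float] = None,
                       acceptor_mw: Optional[float] = None) -> float:
    """Percent of acceptor converted to conjugate, from band intensities.

    Without MWs, intensities are compared directly (mass-based estimate).
    With MWs, intensities are divided by MW to approximate molar amounts.
    """
    if product_mw and acceptor_mw:
        product_int /= product_mw
        acceptor_int /= acceptor_mw
    total = product_int + acceptor_int
    if total <= 0:
        raise ValueError("no signal in either band")
    return 100.0 * product_int / total


if __name__ == "__main__":
    # Hypothetical band intensities (arbitrary units) and approximate MWs (Da).
    print(f"{percent_conversion(60.0, 40.0):.1f} %")
    print(f"{percent_conversion(60.0, 40.0, product_mw=36_000, acceptor_mw=27_500):.1f} %")
```

The molar correction lowers the apparent conversion because the heavier conjugate band stains more strongly per mole than the acceptor band.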

Srt2A-mediated generation of non-hydrolysable diubiquitins

[0129] The present inventors incubated all seven Ub-GGK-His6 acceptor ubiquitins with the donor ubiquitins Ub(AT) and Ub(LAT) in the presence of Srt2A. K6-, K11-, K33-, K48- and K63-linked diubiquitins (diUbs) were formed with very good conversion (> 60%) using either Ub(AT) or Ub(LAT) (Fig. 3a). Importantly, incubation of Ub-K6GGK with Ub(wt), which displays the native C-terminal sequence LRLRGG, did not lead to diUb formation (Fig. 12c). Upon sortase-mediated transpeptidation between a GGK-bearing acceptor ubiquitin and a donor ubiquitin, the sortagging motif LALTG is re-installed in the generated diUb, making sortylation in principle reversible (Fig. 13). For preparative diUb formation, the sortase-mediated reactions were therefore stopped at maximum product formation by adding the cysteine protease inhibitor phenylvinylsulfone. Srt2A-generated diUbs were purified by Ni-NTA chromatography followed by size-exclusion chromatography. All diUbs were obtained on a multi-milligram scale and were characterized by LC-MS (Fig. 14). As sortylation works under native conditions in aqueous buffers, no refolding of the diUbs is needed, an advantage over described chemical methods that often rely on acidic deprotection conditions and desulfurization protocols. This is especially important for the ubiquitylation of multi-domain proteins and other complex, non-refoldable protein targets, as shown herein.
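
Why the reactions are quenched at maximum product formation can be illustrated with a deliberately simple kinetic model. It is a hypothetical sketch, not fitted to any data in this document: donor (D) and acceptor (A) form product (P), sortase re-cleaves P because the motif is re-installed, and the donor is assumed to be slowly lost to hydrolysis. All rate constants, the five-fold donor excess and the units (µM, hours) are arbitrary assumptions, integrated with a plain forward-Euler scheme.

```python
def simulate(d0=5.0, a0=1.0, k_f=1.0, k_r=0.5, k_h=0.3, t_end=24.0, dt=1e-3):
    """Toy model: D + A -> P (k_f), P -> D + A (k_r), D -> hydrolysed (k_h).

    Returns (times, product) lists; units are arbitrary (µM, h).
    """
    d, a, p = d0, a0, 0.0
    times, product = [], []
    for i in range(int(round(t_end / dt)) + 1):
        times.append(i * dt)
        product.append(p)
        forward = k_f * d * a      # transpeptidation
        reverse = k_r * p          # re-cleavage of the re-installed motif
        hydrolysis = k_h * d       # irreversible loss of activated donor
        d += (-forward + reverse - hydrolysis) * dt
        a += (-forward + reverse) * dt
        p += (forward - reverse) * dt
    return times, product


if __name__ == "__main__":
    t, p = simulate()
    i_max = max(range(len(p)), key=p.__getitem__)
    print(f"peak {p[i_max]:.2f} µM at t = {t[i_max]:.2f} h; "
          f"{p[-1]:.2f} µM at t = {t[-1]:.0f} h")
```

In this toy model the product passes through a maximum and then declines as the donor pool is drained, so stopping the reaction near the peak, as done here with phenylvinylsulfone, preserves the highest yield.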

[0130] The present inventors next determined whether their isopeptide-linked diUbs were resistant to DUBs. It has been shown previously that the mutation L73P in the ubiquitin C-terminus confers resistance to various DUB families. The present inventors examined whether sortase-generated diubiquitins, which carry the R72A and R74T mutations, would also be refractory to cleavage by DUBs. They incubated the purified diUbs with the catalytic domain of ubiquitin carboxyl-terminal hydrolase 2 (USP2CD). While all native diUbs were efficiently cleaved to the corresponding monoubiquitins within one hour, all of the sortase-generated diUbs were resistant to USP2CD cleavage of the GG-isopeptide bond (Fig. 3b). On incubating the sortase-generated diUbs with USP2CD, the present inventors observed a faint band that migrated slightly further than the diUbs in SDS-PAGE gels. LC-MS showed that this band corresponds to cleavage of the C-terminal His6-tag of the acceptor ubiquitin. Overnight incubation of sortase-generated diUbs with USP2CD led to quantitative cleavage of the C-terminal His6-tag. The isopeptide bond linking the two ubiquitins was, however, resistant to cleavage even upon extended incubation with USP2CD (Fig. 15).

Generating site-specifically ubiquitylated PCNA via sortylation

[0131] One of the advantages of sortylation is its applicability to complex and non-refoldable proteins, including multi-domain proteins. The present inventors therefore set out to generate site-specifically ubiquitylated PCNA via sortylation. PCNA is a ring-shaped, homotrimeric protein that functions as a sliding clamp during DNA replication and enhances the processivity of DNA polymerase delta (pol δ). DNA lesions in the template strand lead to stalling of the replication fork, which triggers mono-ubiquitylation of PCNA at K164 (PCNA-Ub) via specific E2/E3 enzymes (Fig. 3c). PCNA-Ub in turn leads to recruitment of specialized translesion synthesis (TLS) polymerases to the DNA damage site in order to traverse the damage. Most TLS polymerases contain conserved UBDs for recognition of PCNA-Ub. Ubiquitylated PCNA with non-native disulphide or triazole linkages has been obtained via different chemical approaches. To generate PCNA-Ub conjugates displaying an isopeptide linkage via their sortase approach, the present inventors introduced an amber codon at position 164 into the gene coding for PCNA and expressed it in the presence of AzGGK and its specific tRNA/synthetase pair. PCNA-K164GGK was purified after reduction with 2DPBA on a multi-milligram scale. Incubation of purified PCNA-K164GGK with Ub(LAT) or Ub(AT) in the presence of Srt2A gave conversion yields of > 50% and was specific for GGK-bearing PCNA (Fig. 3d and Fig. 16). The present inventors purified PCNA-Ub(AT) by Ni-NTA affinity chromatography followed by size-exclusion chromatography. Endogenous PCNA ubiquitylation is reversible, with Usp1 providing the deconjugation activity and UAF1 serving as activator. As seen for sortase-generated diUbs, the isopeptide bond in PCNA-Ub(AT) was resistant to Usp1/UAF1, whereas natively ubiquitylated PCNA was completely cleaved within one hour under the same conditions (Fig. 3e).

Sortase-mediated SUMOylation

SUMO (small ubiquitin-like modifier) proteins display the common β-grasp fold with a flexible six-residue C-terminal tail and the characteristic GG motif, which is exposed after proteolytic maturation and enzymatically attached via an isopeptide bond to a lysine in the target protein. The present inventors envisaged that their sortylation approach might also be applicable to the site-specific formation of SUMO conjugates. Four SUMO isoforms exist in human cells. The present inventors cloned and expressed a SUMO1 protein bearing the Srt2A-compatible C-terminus LAQTGG (dubbed SUMO(AT)). The native C-terminal sequence of mature, proteolytically processed SUMO1 is QEQTGG. Introduction of the sortagging motif into the SUMO1 C-terminus results in two point mutations: Q92L and E93A (Fig. 4a). The present inventors incubated sfGFP-GGK with SUMO(AT) in the presence of Srt2A. SDS-PAGE analysis confirmed formation of an sfGFP-SUMO(AT) conjugate with conversion yields of > 50% after six hours. As seen for sortase-mediated ubiquitylation, SUMO(LAT), with an extra leucine spacer in the C-terminus, led to more rapid SUMOylation with > 50% product formation within 15 minutes. Control experiments with sfGFP-BocK confirmed the specificity of SUMOylation (Fig. 4b and Fig. 17a). Sortase-mediated formation of a Ub-SUMO(AT) conjugate in > 70% yield demonstrates the generality of the approach (Fig. 17b).
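
The point mutations quoted for the sortase-compatible C-termini follow from a position-by-position comparison with the native sequences. The sketch below assumes equal-length, already aligned segments, so spacer variants such as LLALTGG or LLPLTGG, which insert an extra leucine, are outside its scope. The start positions (L71 for ubiquitin, Q92 for mature SUMO1) are taken from the text; the function name is illustrative.

```python
from typing import List


def point_mutations(native: str, variant: str, first_pos: int) -> List[str]:
    """List substitutions (e.g. 'R72P') between two aligned, equal-length segments."""
    if len(native) != len(variant):
        raise ValueError("segments must be aligned and of equal length")
    return [f"{n}{first_pos + i}{v}"
            for i, (n, v) in enumerate(zip(native, variant)) if n != v]


if __name__ == "__main__":
    print("Ub(PT):  ", point_mutations("LRLRGG", "LPLTGG", 71))
    print("Ub(AT):  ", point_mutations("LRLRGG", "LALTGG", 71))
    print("SUMO(AT):", point_mutations("QEQTGG", "LAQTGG", 92))
```

The three calls reproduce R72P/R74T for Ub(PT), R72A/R74T for Ub(AT) and Q92L/E93A for SUMO(AT).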

Rational design of a Ca²⁺-independent Srt2A mutant for sortylation in living E. coli

[0132] The activity of sortase mutants derived from S. aureus SrtA, including Srt5M and Srt2A, is strongly dependent on Ca²⁺ ions. Binding of Ca²⁺ to glutamate residues in the β3/β4 loop enhances substrate binding by stabilizing a closed conformation of the active-site β6/β7 loop in S. aureus SrtA (Figure 18). This strong Ca²⁺ dependency may make it difficult to use Srt2A under conditions of low Ca²⁺ concentration, including in the cytosol of living cells or in the presence of Ca²⁺-binding compounds. While the SrtA superfamily shows a conserved active site among different Gram-positive bacteria, the amino acids in the β3/β4 loop that bind Ca²⁺ are not conserved. In fact, Streptococcus pyogenes SrtA and Bacillus anthracis SrtA show Ca²⁺-independent catalytic activity. Furthermore, a Ca²⁺-independent Srt5M mutant (dubbed Srt7M) was recently obtained by substituting two glutamate residues within the β3/β4 loop with neutral or positively charged amino acids. Srt7M was used for efficient in vitro peptide ligation both in the presence and absence of CaCl₂ and was shown to be functional in living C. elegans. The Srt2A variant is derived from Srt5M and carries 11 mutations in total relative to its parental Srt5M enzyme. However, the negatively charged amino acids in the β3/β4 loop (D47, E50 and D54) that are not conserved in Ca²⁺-independent SrtA variants are present in Srt2A. In analogy to the rational design of Srt7M, the present inventors substituted D47 in Srt2A with a lysine residue, speculating that it might form a salt bridge with E113 and thereby balance the electrostatic repulsion that might destabilize the closed conformation of the β6/β7 loop in the absence of Ca²⁺. Furthermore, the present inventors introduced an E108Q mutation to reduce the negative charge within this pocket (Fig. 18b).
The present inventors expressed the Srt2A variant bearing the two point mutations D47K and E50Q (dubbed mSrt2A) and tested its catalytic efficiency in forming diUbs from Ub-K6GGK and Ub(LAT) in the absence and presence of 5 mM CaCl₂. To guarantee Ca²⁺-free conditions, the present inventors supplemented the samples lacking CaCl₂ with 5 mM of the Ca²⁺-chelating agent ethylene glycol tetraacetic acid (EGTA). Indeed, mSrt2A was active under Ca²⁺-free conditions, whereas Srt2A showed limited activity in the absence of Ca²⁺ (Figure 19a). The motivation for creating a Ca²⁺-independent Srt2A mutant stemmed from the present inventors' aim to establish a sortase-mediated approach that would enable site-specific ubiquitylation and SUMOylation of proteins in living cells. The present inventors first set out to build ubiquitin and SUMO conjugates in living bacteria. This would provide a method with great potential for producing specifically ubiquitylated/SUMOylated eukaryotic proteins in large quantities in the established workhorse E. coli. The present inventors co-expressed mSrt2A, SUMO(AT) and sfGFP-AzGGK-His6 in E. coli. After 24 hours, cells were washed to remove residual AzGGK and treated with 2DPBA to induce reduction to sfGFP-GGK-His6 and thereby trigger SUMOylation. Formation of the sfGFP-SUMO(AT) conjugate was analysed by anti-His6 Western blotting and was visible as early as 30 minutes after 2DPBA addition. SUMOylation did not take place when the present inventors expressed sfGFP-BocK instead of sfGFP-AzGGK or omitted 2DPBA, demonstrating the specificity and inducibility of their approach (Fig. 4c). In agreement with in vitro experiments on purified proteins conducted in the absence of Ca²⁺, SUMOylation in living E. coli was more effective with mSrt2A than with Srt2A (Figure 19b). Sortylation using mSrt2A furthermore allowed 2DPBA-triggered attachment of Ub(LAT) and SUMO(LAT) to PCNA-K164AzGGK in live E. coli, showing the generality of their approach (Fig. 20).

Inducible, site-specific ubiquitylation and SUMOylation of proteins in mammalian cells

[0133] Encouraged by the successful sortase-mediated generation of Ub and SUMO conjugates in live E. coli, the present inventors set out to incorporate AzGGK into proteins in mammalian cells in order to test whether proteins could be ubiquitylated and SUMOylated in living HEK293T cells in an inducible fashion. As sortase-generated Ub conjugates are resistant to the isopeptidase activity of various DUBs, such an approach would provide an attractive tool for studying the effect of stable site-specific mono-ubiquitylation in physiological settings. To demonstrate the incorporation of AzGGK into proteins in HEK293T cells, the present inventors transferred the mutations of AzGGKRS into a mammalian-optimized MmPylRS. Western blots and fluorescence imaging demonstrated highly efficient incorporation of AzGGK into sfGFP-N150TAG-His6 in HEK293T cells using the AzGGKRS/tRNACUA pair (Fig. 5a). The incorporation was confirmed by LC-MS analysis of purified sfGFP-N150AzGGK-His6 (Fig. 5b). The present inventors next tested whether AzGGK-bearing proteins could be reduced with 2DPBA in vivo. 2DPBA and other benign triarylphosphine reagents have been used extensively for bioorthogonal Staudinger ligation/reduction in mammalian cells, and 2DPBA shows good cell permeability. The present inventors expressed sfGFP-N150AzGGK-His6 in HEK293T cells, washed the cells to remove AzGGK and treated them with 500 µM 2DPBA. Cells were lysed and treated with sortase and Ub(LAT) for one hour. Western blot analysis revealed specific formation of sfGFP-Ub(LAT) conjugates in HEK293T lysates that had been treated with 2DPBA, demonstrating the inducibility of sortylation (Fig. 21a). The present inventors next addressed whether the sortase-mediated ubiquitylation and SUMOylation approach could be used in the cytosol of living HEK293T cells.
The present inventors co-expressed Ub(LAT) or SUMO(LAT) together with a codon-optimized version of C-terminally Myc-tagged mSrt2A and sfGFP-N150AzGGK-His6 in HEK293T cells for 36 hours, washed the cells with AzGGK-free medium and treated them with 400 µM 2DPBA overnight. After washing, cells were lysed and analysed by anti-His6 Western blotting. For both Ub(LAT) and SUMO(LAT), distinct bands corresponding to ubiquitylated and SUMOylated sfGFP were detected (Fig. 5c). Importantly, samples that were not treated with 2DPBA did not yield the corresponding Ub or SUMO conjugates. Likewise, in cells overexpressing Ub(wt) and SUMO(wt) instead of the Ub/SUMO mutants with sortase-compatible C-termini, no ubiquitylated or SUMOylated protein could be detected, demonstrating the specificity and inducibility of the approach (Fig. 5c). Interestingly, and in contrast to sortase-mediated transpeptidation in living E. coli, mSrt2A and Srt2A led to similar ubiquitylation and SUMOylation yields in HEK293T cells 16 hours after the reaction was triggered by addition of 2DPBA (Fig. 21b). Finally, the present inventors set out to ubiquitylate/SUMOylate PCNA in live mammalian cells (Fig. 5d). Sortylation depends on the co-expression and co-localization of all three proteins (PCNA, Ub(LAT) or SUMO(LAT), and sortase), which have to come into proximity to yield ubiquitylated or SUMOylated PCNA. The present inventors reasoned that co-localization might be enhanced by fusing a nuclear localization sequence to the Ub/SUMO constructs and the sortase. The present inventors co-expressed all components in HEK293T cells for 48 hours and triggered ubiquitylation/SUMOylation via Staudinger reduction of PCNA-K164AzGGK with 2DPBA. Western blot analysis 16 hours after addition of 2DPBA showed specific ubiquitylation/SUMOylation of PCNA in the presence of Ub(LAT) or SUMO(LAT), but not when Ub(wt) or SUMO(wt) was over-expressed (Fig. 5d).
Collectively, these data establish the sortylation approach as a powerful tool to generate site-specific isopeptide-linked Ub/SUMO-POI conjugates under physiological conditions in living mammalian cells. Importantly, the approach is triggerable through addition of a small molecule via a bioorthogonal Staudinger reaction conferring temporal resolution over the ubiquitylation/SUMOylation event and yields hydrolysis-resistant mono-ubiquitylated proteins.

Proof of principle of orthogonal sortases

[0134] Since different S. aureus sortase A mutants recognize different recognition motifs, the inventors were interested in whether these enzymes are orthogonal with respect to their recognition motifs. Orthogonal sortases would allow sortylation to be used iteratively to build chains. Hydrolysis assays of DiUbs linked via the different motifs were used to study orthogonality.

[0135] Sortase 2A hydrolyzes the LPLSG-linked DiUb (albeit with low activity compared to the positive control, the LALTG-linked DiUb) and is therefore not orthogonal to the LPLSG motif (Sortase 4S). Hydrolysis of the LPLTG-linked DiUb was not observed. Thus, Sortase 2A is orthogonal to the LPLTG motif (see Fig. 25A).

[0136] Sortase 5M hydrolyzes the LPLSG-linked DiUb and is therefore not orthogonal to the LPLSG motif (Sortase 4S). Hydrolysis of the LALTG-linked DiUb was not observed. Sortase 5M is thus orthogonal to the LALTG motif (see Fig. 25B).

[0137] Sortase 4S hydrolyzes the LPLTG-linked DiUb and is therefore not orthogonal to the LPLTG motif (Sortase 5M). However, hydrolysis of the LALTG-linked DiUb was not observed (see Fig. 25C). Thus, Sortase 4S is orthogonal to the LALTG motif (S2A), but not vice versa.
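
The conclusions of [0135] to [0137] can be expressed as a small lookup over the hydrolysis observations. In the sketch below, two sortases count as a mutually orthogonal pair only if each was observed not to hydrolyse a DiUb linked via the other's motif; combinations that were not tested are treated as unknown and therefore never counted as orthogonal. The data structure and function names are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Recognition motif of each sortase variant.
MOTIF: Dict[str, str] = {"S2A": "LALTG", "S5M": "LPLTG", "S4S": "LPLSG"}

# (sortase, motif of the DiUb linkage) -> was hydrolysis observed?
CLEAVES: Dict[Tuple[str, str], bool] = {
    ("S2A", "LPLSG"): True,   # low activity, but still hydrolysed
    ("S2A", "LPLTG"): False,
    ("S5M", "LPLSG"): True,
    ("S5M", "LALTG"): False,
    ("S4S", "LPLTG"): True,
    ("S4S", "LALTG"): False,
}


def orthogonal_pairs(motif: Dict[str, str],
                     cleaves: Dict[Tuple[str, str], bool]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where a spares b's motif AND b spares a's motif."""
    pairs = []
    for a, b in combinations(sorted(motif), 2):
        a_on_b: Optional[bool] = cleaves.get((a, motif[b]))
        b_on_a: Optional[bool] = cleaves.get((b, motif[a]))
        if a_on_b is False and b_on_a is False:  # untested (None) never qualifies
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    print(orthogonal_pairs(MOTIF, CLEAVES))
```

The single surviving pair is S2A/S5M; S4S drops out because its LPLSG linkage is hydrolysed by both of the other enzymes.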

[0138] The inventors therefore identified the following orthogonal pair: S5M + S2A. Using the orthogonal sortase pair S5M/S2A, the inventors generated a triubiquitin (TriUb) linked via two different isopeptide bonds. In the first step, a "bifunctional" ubiquitin with K48GGK and the LALTG C-terminus is converted into a diubiquitin using Ub-LPLTG and S5M (Fig. 26A). In the second step, the purified diubiquitin is converted into a triubiquitin using S2A and UbK6GGK, producing a triubiquitin linked via isopeptide bonds at positions K48 and K6.

[0139] The assay shows successful formation of TriUb, reaching maximum yield after 1 h reaction time (Fig. 26B). The band emerging below the DiUb corresponds to cleavage of the His-tag, as can be seen in the negative control without UbK6GGK. Purification of TriUb from the reaction mixture yielded pure TriUb. The measured mass of TriUb generated by orthogonal sortases corresponds to the calculated mass (Fig. 26C).

[0140] After this successful proof-of-principle TriUb formation, the inventors set out to extend orthogonal sortylation to SUMO-Ub hybrid chains, which play an important role in DNA damage repair. Analogously to the formation of the K48/K6-linked TriUb, the inventors generated the different hybrid chains via two iterative sortase reactions using the orthogonal sortase pair S5M/S2A (Fig. 27A). To produce these hybrid chains, the inventors placed GGK at different lysine positions in SUMO2 (K11, K21, K33, K35, K42 and K45). The LALTG motif was introduced between the ubiquitins and the LPLTG motif between the DiUb and SUMO2.

[0141] The hybrid chain formation assay showed excellent conversion of the DiUb to the hybrid chain within 1 h (Fig. 27B). The hybrid chain could be purified by SEC (Fig. 27C). The inventors also demonstrated hybrid chain formation for multiple SUMO2 sites (Fig. 27D).

DISCUSSION

[0142] The present inventors have shown that site-specific incorporation of AzGGK into proteins in bacteria and mammalian cells via genetic code expansion, and its reduction to GGK through a bioorthogonal Staudinger reaction, allows sortase-mediated ubiquitylation and SUMOylation. This represents the first approach in which a site-specifically introduced UAA serves as a platform for a chemoenzymatic reaction, namely sortase-mediated transpeptidation with Ub and SUMO mutants. Sortase-generated Ub conjugates display a native isopeptide bond linking the C-terminal G76 to the ε-amino group of a chosen lysine in a target protein. To be recognized by sortase, two point mutations (R72A and R74T) are introduced into the unstructured C-terminus of ubiquitin. Importantly, amino acids L71 and L73 in the C-terminal sequence, which are essential for formation of the hydrophobic patch centred around I36 (I36 patch) that is important for binding to various UBDs, are thereby preserved. Other important surface areas within ubiquitin involved in interactions with ubiquitin-binding proteins (the I44, F4 and D58 patches and the TEK box) also remain intact. The present inventors demonstrate the generality of their sortase-mediated approach by producing differently linked diUbs. Even though sortase-generated diUbs are linked via a native isopeptide bond, the introduced mutations (R72A and R74T) confer resistance to DUBs, providing a valuable tool to interrogate cell-signalling pathways and to assign ubiquitin-specific proteins.

[0143] Sortase-mediated ubiquitylation works under native, aqueous conditions and allows modification of complex, non-refoldable, multi-domain protein targets, an endeavour that is challenging with present chemical ubiquitylation approaches, as well as of protein targets whose corresponding E2/E3 enzymes are unknown or show reduced activity/specificity in vitro. The present inventors utilize sortylation to site-specifically ubiquitylate the homotrimeric DNA repair protein PCNA. As observed for sortase-generated diUbs, the PCNA-Ub(AT) conjugate is refractory to cleavage by its specific DUB complex.

[0144] Likewise, for sortase-mediated SUMOylation, two point mutations, Q92L and E93A, are introduced into the unstructured C-terminus of SUMO1. Sortase-mediated transpeptidation yields SUMO-protein conjugates in which a specific lysine in the target protein is linked to G97 of SUMO1 via a native isopeptide bond, providing a much sought-after tool for creating site-specific, well-defined SUMO conjugates.

[0145] Besides ubiquitylation and SUMOylation, sortylation is expected to be transferable to the covalent modification of proteins with other Ubls, such as NEDD8, URM1 or Ufm1, processes that are much less well understood than ubiquitylation and SUMOylation.

[0146] The technology described herein both extends and complements existing methods for studying ubiquitylation and SUMOylation networks. Besides providing a general and easily applicable method for studying the effects of stable mono-ubiquitylation on multi-domain and non-refoldable protein targets, the sortase-mediated approach can be developed into a tool for identifying ubiquitin-binding proteins. In such a scenario, the donor ubiquitin (Ub(AT)) would be expressed with a site-specifically incorporated UAA bearing either a photo-crosslinking or a chemical crosslinking moiety. Sortase-mediated reaction with a POI containing GGK at a specific site would create a well-defined POI-Ub conjugate that can be used for the proteomic identification of Ub-binding proteins in mammalian cell lysates (using photo-crosslinking UAAs) or for the chemical stabilization of transient E3/DUB-substrate complexes (using, e.g., bromoalkyl-bearing UAAs).

[0147] Importantly, the present inventors show that sortase-mediated transpeptidation can be extended to site-specific mono-ubiquitylation and mono-SUMOylation under physiological conditions in living cells, creating a system that is orthogonal to endogenous E1/E2/E3 enzymes. The availability of a bacterial system for the specific production of ubiquitylated and SUMOylated eukaryotic proteins in the workhorse E. coli may facilitate future crystallographic, biophysical and biochemical analyses of such proteins. In vivo ubiquitylation and SUMOylation in E. coli is currently carried out by co-transforming E. coli with three different plasmids; the present inventors envision that E. coli strains stably expressing sortase and/or Ub/SUMO variants may lead to a more modular and efficient expression system.

[0148] Similarly, the present inventors' in vivo mammalian cell ubiquitylation and SUMOylation system may benefit from engineered cell lines stably expressing sortase and/or modified Ub/SUMO. Mono-ubiquitylation of target proteins in mammalian cells regulates processes ranging from membrane transport to transcriptional activation. Since the present inventors' approach relies on site-specific incorporation of AzGGK rather than GGK, ubiquitylation can be triggered by a small molecule, in principle enabling the study of temporal aspects of mono-ubiquitylation in live cells. Incorporation of a photocaged version of GGK would allow mono-ubiquitylation to be triggered in cells with high temporal and spatial resolution and would enable real-time study of ubiquitylation and its effect on substrate localization. Furthermore, ubiquitin-regulated cell-signalling pathways and networks could be dissected without the need to activate upstream signalling components.

[0149] In conclusion, the present inventors describe a chemoenzymatic approach to ubiquitylate and SUMOylate proteins in vitro and in live cells. The approach creates DUB-resistant Ub/SUMO conjugates with native isopeptide linkages and is easily implementable in typical biology research labs, as the required amino acid AzGGK can be synthesized on a multi-gram scale within a day. This technology is the first to show sortase-based ubiquitylation and SUMOylation in living cells and thereby provides an enzymatic approach that is orthogonal to the highly specialized E1/E2/E3 enzymes; the present inventors anticipate that it has the potential to make an immediate impact for many ubiquitin researchers.

Supplementary Methods

General methods.

[0150] All solvents and chemical reagents were purchased from Sigma, Carbolution, Acros Organics or Fisher Scientific and were used without further purification unless otherwise stated. NMR spectra were recorded on a Bruker 500 UltraShield™ spectrometer (500 MHz for ¹H-NMR, 125 MHz for ¹³C-NMR). Chemical shifts (δ) are reported in ppm and referenced to the residual proton signals of the solvent (DMSO-d6: 2.50 ppm for ¹H-NMR and 39.5 ppm for ¹³C-NMR spectra). Coupling constants (J) are reported in hertz (Hz), and peak multiplicities are described as follows: s (singlet), d (doublet), t (triplet), q (quartet), quint (quintet), dt (doublet of triplets), ddd (doublet of doublet of doublets), m (multiplet), br (broad signal). Small-molecule LC-MS was carried out on an Agilent Technologies 1260 Infinity LC-MS system with a 6310 Quadrupole spectrometer, using 0.1 % formic acid in water as buffer A and 0.1 % formic acid in ACN as buffer B, on a Phenomenex Aeris™ Peptide XB-C18 column (100 x 2.1 mm, 3.6 µm). Samples were analysed in both positive and negative ion mode, with UV detection at 193, 254 and/or 280 nm.

[0151] Oligonucleotide primers were designed with NEBuilder and purchased from Sigma. 15 % SDS-PAGE gels (110 V for 15 min, then 200 V for 45 min) were run on a Bolt™ Mini Gel Tank (Invitrogen) system and stained with Quick Coomassie Stain (Generon). Color Prestained Protein Standard, Broad Range (11-245 kDa; NEB) was used as protein marker. Western blots were carried out on an iBlot® 2 Dry Blotting System (Life Technologies) using Method P0 (20 V for 1 min, 23 V for 4 min, 25 V for 2 min). After blotting, the nitrocellulose membrane was blocked with 5 % skim milk powder in 1x TBST (1 h at room temperature) and stained with Anti-His6-Peroxidase antibody (Roche; 1:5000) in 1 % skim milk powder in 1x TBST (1 h at room temperature). After washing, the membrane was treated with WesternBright™ ECL spray (Advansta) and the proteins were visualized using an ImageQuant™ LAS 4000 (GE Life Sciences). Protein and DNA concentrations were measured on a NanoPhotometer® NP60 (Implen).

Chemical Synthesis

General synthetic procedures for solid phase peptide synthesis (SPPS)

[0152] N⁶-glycylglycyl-L-lysine (GGK), N⁶-((2-azidoacetyl)glycyl)-L-lysine (Azido-GGK) and the heptapeptide Fmoc-VLPLTGG were synthesized via solid phase peptide synthesis (SPPS). SPPS was performed in a custom-made glass apparatus with a frit for larger amounts of resin or in plastic syringes with a frit for small amounts (< 1 g). Shaking was performed manually or using a rotary unit. The equivalents (eq.) used were based on the maximal loading capacity of the CTC resin given by the supplier.

Loading and capping of CTC resin

[0153] SPPS was performed according to the Fmoc strategy using CTC resin. To this end, 1.2 eq. of Fmoc-protected amino acid and 2.5 eq. of DIPEA were dissolved in anhydrous DCM (10 mL/g resin) and added to 1.0 eq. of CTC resin (100-200 mesh, 1.0-1.6 mmol/g maximal loading capacity) in a syringe equipped with a frit. The mixture was shaken for 1 h at RT. Remaining chlorotrityl groups were capped by adding 3 eq. of MeOH and 2.5 eq. of DIPEA to the syringe, followed by 15 minutes of shaking at RT. Subsequently, the resin was washed five times with DCM and five times with DMF. The actual loading of the resin was determined by mass gain after washing the resin with MeOH and drying under high vacuum.
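Because all equivalents in this protocol are referenced to the maximal loading capacity of the CTC resin, the amounts for the loading and capping steps follow from simple arithmetic. The Python sketch below assumes average molar masses and liquid densities taken from common reference values (Boc-Lys(Fmoc)-OH 468.55 g/mol; DIPEA 129.25 g/mol, 0.742 g/mL; MeOH 32.04 g/mol, 0.792 g/mL); the `Reagent` class and `loading_amounts` function are illustrative names only. For 10.0 g of resin at 1 mmol/g it gives values close to those used in the GGK synthesis of paragraph [0161] (about 5.62 g Boc-Lys(Fmoc)-OH, 4.35 mL DIPEA and 1.21 mL MeOH).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reagent:
    name: str
    equivalents: float               # relative to the resin's maximal loading (1.0 eq.)
    molar_mass: float                # g/mol (assumed average value)
    density: Optional[float] = None  # g/mL for liquids, None for solids


def loading_amounts(resin_g, loading_mmol_per_g, reagents):
    """Return {name: (mmol, grams, mL or None)} for each reagent."""
    resin_mmol = resin_g * loading_mmol_per_g  # this amount defines 1.0 eq.
    amounts = {}
    for reagent in reagents:
        mmol = reagent.equivalents * resin_mmol
        grams = mmol * reagent.molar_mass / 1000.0
        millilitres = grams / reagent.density if reagent.density is not None else None
        amounts[reagent.name] = (mmol, grams, millilitres)
    return amounts


reagents = [
    Reagent("Boc-Lys(Fmoc)-OH", 1.2, 468.55),
    Reagent("DIPEA", 2.5, 129.25, density=0.742),
    Reagent("MeOH", 3.0, 32.04, density=0.792),
]
for name, (mmol, grams, ml) in loading_amounts(10.0, 1.0, reagents).items():
    volume = f", {ml:.2f} mL" if ml is not None else ""
    print(f"{name}: {mmol:.1f} mmol, {grams:.3f} g{volume}")
```

Small deviations from the quantities in the worked examples reflect the molar masses and densities assumed here rather than a different stoichiometry.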

On-resin Fmoc-deprotection

[0154] The Fmoc protecting group was removed by adding a solution of 20 % piperidine in DMF to the resin in the syringe, followed by five minutes of shaking at RT. This step was repeated once with 10 minutes of shaking, and the resin was then washed five times with DMF.

On-resin coupling of amino acids

[0155] For the coupling of amino acids, a 0.1-0.2 M solution of the Fmoc-protected amino acid (2 eq.), HATU (2 eq.), HOAt (2 eq.) and DIPEA (5 eq.) was prepared in DMF and stirred for 5 min at RT. The syringe/reactor containing the Fmoc-deprotected, resin-bound amino acid was then incubated with this solution at RT for 45 minutes, and the resin was washed five times with DMF.
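For the coupling step, the amounts of amino acid, HATU, HOAt and DIPEA scale with the resin loading, and the 0.1-0.2 M concentration window sets the DMF volume. The Python sketch below uses a hypothetical `coupling_cocktail` helper whose default equivalents are taken from paragraph [0155]; for the 0.166 mmol scale of the Fmoc-VLPLTGG synthesis it suggests 0.332 mmol of each Fmoc-amino acid in roughly 1.7-3.3 mL DMF.

```python
def coupling_cocktail(resin_mmol, aa_eq=2.0, hatu_eq=2.0, hoat_eq=2.0,
                      dipea_eq=5.0, conc_range_M=(0.1, 0.2)):
    """Reagent amounts (mmol) for one coupling and the DMF volume window (mL).

    resin_mmol: maximal loading of the resin portion (defines 1.0 eq.)
    conc_range_M: allowed concentration of the Fmoc-amino acid in DMF (mol/L)
    """
    aa_mmol = aa_eq * resin_mmol
    low_M, high_M = conc_range_M
    return {
        "amino_acid_mmol": aa_mmol,
        "hatu_mmol": hatu_eq * resin_mmol,
        "hoat_mmol": hoat_eq * resin_mmol,
        "dipea_mmol": dipea_eq * resin_mmol,
        # mmol / (mol/L) = mL; the higher concentration gives the smaller volume
        "dmf_mL": (aa_mmol / high_M, aa_mmol / low_M),
    }


# 0.166 g CTC resin at 1.0 mmol/g, as in the Fmoc-VLPLTGG synthesis
cocktail = coupling_cocktail(0.166 * 1.0)
v_min, v_max = cocktail["dmf_mL"]
print(f"Fmoc-amino acid, HATU, HOAt: {cocktail['amino_acid_mmol']:.3f} mmol each")
print(f"DIPEA: {cocktail['dipea_mmol']:.3f} mmol; DMF: {v_min:.2f}-{v_max:.2f} mL")
```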

Cleavage from the CTC resin

[0156] The resin was washed five times with DMF and five times with DCM and then treated with a solution of 20 % HFIP in DCM (v/v) at RT for 10 min. The filtrate was collected and the procedure was repeated two more times. Afterwards, the resin was washed three times with DCM. The filtrates were combined and the solvent was evaporated under reduced pressure.

Removal of acid-labile protection groups

[0157] For the removal of acid-labile protection groups the product was dissolved in a mixture of TFA/DCM/H2O (90/5/5, v/v/v) and stirred at RT for 1 h. Subsequently, the solvents were removed under reduced pressure followed by co-evaporation with toluene.

Precipitation with Et₂O

[0158] After removal of the solvents, the product was added dropwise to a centrifuge tube containing ice-cold Et₂O. After centrifugation, the precipitate was washed twice with ice-cold Et₂O and centrifuged again. The resulting precipitate was dissolved in H₂O and lyophilized.

Synthesis of Boc-glycylglycine (Boc-Gly-Gly-OH)

[0159] Diglycine (4 g, 0.030 mol) and Boc anhydride (9.81 g, 0.045 mol) were dissolved in a mixture of dioxane:H₂O (5:1). Triethylamine (8.30 ml, 0.060 mol) was then added and the mixture was stirred for four hours in a melting ice bath. Afterwards, 500 ml of H₂O was added and the mixture was acidified to pH 2 with 1 M HCl. The resulting solution was extracted three times with EtOAc. The organic layers were pooled, washed with brine, dried over Na₂SO₄ and filtered. Evaporation of the solvent gave Boc-diglycine as a white powder in 94.4 % yield (6.57 g). Boc-diglycine was stored at -20 °C until further use.


Scheme S1: Synthetic route to Boc-glycylglycine

¹H-NMR (500 MHz, DMSO-d6): δ (ppm) 8.04 (t, J = 5.83 Hz, 1H, NH), 6.97 (t, J = 6.16 Hz, 1H, NH), 3.76 (d, J = 5.87 Hz, 2H, CH₂), 3.56 (d, 2H, CH₂), 1.38 (s, 9H, 3 CH₃);

MS (ESI+) m/z 233.4 [M+H]⁺

Calculated for C₉H₁₆N₂O₅: 233.1 [M+H]⁺

Synthesis of 2-azidoacetic acid

[0160] Bromoacetic acid (17.25 g, 0.124 mol) was dissolved in 200 ml of H₂O and cooled to 0 °C in an ice bath with stirring. NaN₃ (16.12 g, 0.248 mol) was then added and the solution was stirred overnight, allowing the reaction mixture to reach RT. The reaction mixture was acidified to pH 1 with 3 M HCl and extracted three times with Et₂O. The organic layers were pooled, washed with brine, dried over Na₂SO₄ and filtered. Evaporation of the solvent gave azidoacetic acid as a yellow liquid in 95.2 % yield (11.80 g). Azidoacetic acid was stored at RT until further use.

Scheme S2: Synthetic route to 2-azidoacetic acid

¹H-NMR (500 MHz, CDCl₃): δ (ppm) 7.53 (bs, 1H, COOH), 3.96 (s, 2H, CH₂)

Synthesis of N⁶-glycylglycyl-L-lysine (GGK) via SPPS

[0161] CTC resin (10.0 g, 1 mmol/g maximal loading capacity) and Boc-Lys(Fmoc)-OH (0.012 mol, 5.616 g) were weighed into a customized glass apparatus with a frit. DIPEA (0.025 mol, 4.345 ml) in anhydrous DCM (100 ml) was then added to the glass reactor, followed by shaking at RT for 45 min. Afterwards, a solution of DIPEA (0.025 mol, 4.345 ml) and MeOH (0.030 mol, 1.215 ml) in anhydrous DCM (50 ml) was added to the glass reactor, followed by shaking at RT for 15 min. Subsequently, the resin was washed five times with DCM (10 ml/g resin) and five times with DMF (10 ml/g resin). For Fmoc deprotection, 20 % piperidine in DMF was added to the glass reactor, followed by shaking at RT for 10 min; this step was repeated once for 15 min. The glass reactor was then washed five times with DMF (10 ml/g resin), and a solution of Boc-Gly-Gly-OH (0.020 mol, 4.640 g), HATU (0.020 mol, 7.600 g) and DIPEA (0.050 mol, 8.692 ml) in DMF (10 ml/g resin) was added. This mixture was shaken at RT for 60 min, followed by five washes with DMF (10 ml/g resin) and five washes with DCM (10 ml/g resin). For cleavage, 100 ml of 20 % HFIP in DCM was added to the glass reactor, followed by shaking for 10 min at RT. This procedure was repeated and the filtrates were combined in a round-bottom flask. The solvent was evaporated under reduced pressure and the product was dissolved in a mixture of TFA/DCM/H₂O (90/5/5, v/v/v, 50 ml) for Boc deprotection. This solution was stirred at RT for 1 h, and the solvent was then evaporated under reduced pressure to a small volume. The product, still dissolved in this small volume of solvent, was precipitated in 300 ml ice-cold Et₂O in 50 ml centrifuge tubes.
After centrifugation and washing of the precipitate with ice-cold Et₂O, the product was dissolved in H₂O and lyophilized, giving N⁶-glycylglycyl-L-lysine (GGK) as a yellowish solid in 74.8 % yield (3.65 g, bis-TFA salt, calculated using the maximal loading capacity as 100 %). GGK was stored at -20 °C. Stock solutions were prepared by dissolving GGK in H₂O followed by neutralization with NaOH.

Scheme S3: Synthetic route to amino acid GGK

¹H-NMR (500 MHz, DMSO-d6): δ (ppm) 8.65 (t, J = 5.75 Hz, 1H, NH), 8.32 (s, 3H, NH₃), 8.14 (s, 3H, NH₃), 7.99 (t, J = 5.67 Hz, 1H, NH), 3.84 (dt, J = 4.8 Hz, 1H, CH), 3.74 (d, J = 5.73 Hz, 2H, CH₂), 3.61 (s, J = 4.58 Hz, 2H, CH₂), 3.04 (dt, J = 6.47 Hz, 2H, CH₂), 1.85 - 1.66 (m, 2H, CH₂), 1.48 - 1.34 (m, 2H, CH₂), 1.34 - 1.22 (m, 2H, CH₂);

¹³C-NMR (125 MHz, DMSO-d6): δ (ppm) 171.5, 168.3, 166.5, 52.3, 42.4, 38.6, 30.1, 29.0, 22.2

MS (ESI+) m/z 261.3 [M+H]⁺; Calculated for C₁₀H₂₁N₄O₄⁺: 261.2 [M+H]⁺
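The calculated [M+H]⁺ values quoted with the MS data are monoisotopic masses of the protonated species. The Python sketch below reproduces them from the cation formulas, assuming standard monoisotopic masses for C, H, N and O and subtracting one electron mass for the positive charge; `cation_mz` is an illustrative helper that only handles simple formulas without brackets or isotope labels.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes; electron mass for the charge
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
ELECTRON = 0.00054858


def cation_mz(formula, charge=1):
    """Monoisotopic m/z of a cation given its full formula, e.g. 'C10H21N4O4' for [M+H]+.

    Only elements listed in MONOISOTOPIC are supported (KeyError otherwise).
    """
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return (mass - charge * ELECTRON) / charge


for label, formula in [
    ("Boc-Gly-Gly-OH", "C9H17N2O5"),
    ("GGK", "C10H21N4O4"),
    ("Azido-GGK", "C10H19N6O4"),
    ("Fmoc-VLPLTGG", "C45H64N7O11"),
]:
    print(f"{label} [M+H]+: {cation_mz(formula):.1f}")
```

Note that the formula passed in is that of the protonated ion (one H more than the neutral molecule), so no proton mass is added separately.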

Synthesis of N⁶-((2-azidoacetyl)glycyl)-L-lysine (Azido-GGK) via SPPS

[0162] CTC resin (10.0 g, 1 mmol/g maximal loading capacity) and Boc-Lys(Fmoc)-OH (0.012 mol, 5.616 g) were weighed into a customized glass apparatus with a frit. DIPEA (0.025 mol, 4.345 ml) in anhydrous DCM (100 ml) was then added to the glass reactor, followed by shaking at RT for 45 min. Afterwards, a solution of DIPEA (0.025 mol, 4.345 ml) and MeOH (0.030 mol, 1.215 ml) in anhydrous DCM (50 ml) was added to the glass reactor, followed by shaking at RT for 15 min. Subsequently, the resin was washed five times with DCM (10 ml/g resin) and five times with DMF (10 ml/g resin). For Fmoc deprotection, 20 % piperidine in DMF (10 ml/g resin) was added to the glass reactor, followed by shaking at RT for 10 min; this step was repeated once for 15 min. The glass reactor was washed five times with DMF (10 ml/g resin), and a solution of Fmoc-Gly-OH (0.020 mol, 5.94 g), HATU (0.020 mol, 7.60 g) and DIPEA (0.050 mol, 8.69 ml) in DMF (10 ml/g resin) was added. This mixture was shaken at RT for 1 h, followed by five washes with DMF (10 ml/g resin). After Fmoc deprotection, the glass reactor was washed five times with DMF (10 ml/g resin), and azidoacetic acid (0.030 mol, 2.24 ml), HOBt (0.030 mol, 4.08 g) and DIC (0.030 mol, 4.64 ml) in DMF (10 ml/g resin) were added. This mixture was shaken at RT for 2 h, followed by five washes with DMF (10 ml/g resin) and five with DCM (10 ml/g resin). For cleavage, 100 ml of 20 % HFIP in DCM was added to the glass reactor, followed by shaking for 10 min at RT. This procedure was repeated and the filtrates were combined in a round-bottom flask. The solvent was evaporated under reduced pressure and the product was dissolved in a mixture of TFA/DCM/H₂O (90/5/5, v/v/v, 50 ml) for Boc deprotection. This solution was stirred at RT for 1 h, and the solvent was then evaporated under reduced pressure to a small volume. The product was precipitated in 300 ml ice-cold Et₂O in 50 ml centrifuge tubes.
After centrifugation and washing of the precipitate with ice-cold Et₂O, the product was dissolved in H₂O and lyophilized, giving N⁶-((2-azidoacetyl)glycyl)-L-lysine (Azido-GGK) as a yellowish solid in 86.3 % yield (3.45 g, TFA salt, calculated using the maximal loading capacity as 100 %). Azido-GGK was stored at -20 °C. Stock solutions were prepared by dissolving Azido-GGK (100 mM) in H₂O followed by neutralization with NaOH.


Scheme S4: Synthetic route to amino acid azido-GGK

¹H-NMR (500 MHz, DMSO-d6): δ (ppm) 8.35 (t, J = 5.73 Hz, 1H, NH), 8.27 (s, 3H, NH₃), 7.94 (t, J = 5.63 Hz, 1H, NH), 3.88 (s, 2H, CH₂), 3.86 (bs, J = 4.8 Hz, 1H, CH), 3.70 (d, J = 5.75 Hz, 2H, CH₂), 3.05 (q, J = 6.40 Hz, 2H, CH₂), 1.84 - 1.66 (m, 2H, CH₂), 1.47 - 1.35 (m, 2H, CH₂), 1.34 - 1.22 (m, 2H, CH₂);

MS (ESI+) m/z 287.1 [M+H]⁺; Calculated for C₁₀H₁₉N₆O₄⁺: 287.1 [M+H]⁺
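Preparing the 100 mM Azido-GGK stock of paragraph [0162] and dosing it into cultures (2 mM during library selection, 4 mM for PCNA expression) amounts to a mass calculation and C₁V₁ = C₂V₂. The Python sketch below assumes the mono-TFA salt stated in that paragraph, with a molar mass taken as 286.29 + 114.02 g/mol, and ignores the volume of NaOH used for neutralization, so the final volume would need to be adjusted afterwards; the helper names are hypothetical.

```python
# Assumed molar mass of Azido-GGK as the mono-TFA salt (free amino acid + TFA), g/mol
AZIDO_GGK_TFA_SALT = 286.29 + 114.02


def stock_mass_mg(conc_mM, volume_mL, molar_mass):
    """Mass (mg) needed for volume_mL of a conc_mM solution."""
    # mM * mL = umol; umol * g/mol = ug; divide by 1000 to get mg
    return conc_mM * volume_mL * molar_mass / 1000.0


def stock_volume_uL(stock_mM, final_mM, final_volume_mL):
    """Volume of stock (uL) giving final_mM in final_volume_mL (C1V1 = C2V2)."""
    return final_mM * final_volume_mL / stock_mM * 1000.0


print(f"{stock_mass_mg(100, 10, AZIDO_GGK_TFA_SALT):.1f} mg for 10 mL of a 100 mM stock")
print(f"{stock_volume_uL(100, 2, 10):.0f} uL of stock per 10 mL culture at 2 mM")
print(f"{stock_volume_uL(100, 4, 1000) / 1000:.0f} mL of stock per 1 L culture at 4 mM")
```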

Synthesis of (((9H-fluoren-9-yl)methoxy)carbonyl)-L-valyl-L-leucyl-L-prolyl-L-leucyl-L-threonylglycylglycine (Fmoc-VLPLTGG) via SPPS

[0163] For the synthesis of the heptapeptide VLPLTGG, 0.166 g of CTC resin (1.0 mmol/g maximal loading capacity) was used. Loading of the resin with Fmoc-Gly-OH and all coupling steps with the corresponding Fmoc-protected amino acids were performed as described in the standard Fmoc SPPS protocol above.

[0164] After coupling of Fmoc-Val-OH, the product was cleaved from the resin by charging the syringe with a solution of 20 % HFIP in DCM (v/v), followed by shaking at RT for 20 min. This procedure was repeated and the filtrates were combined in a round-bottom flask. The solvent was evaporated under reduced pressure and the product was dissolved in a mixture of TFA/DCM/H₂O (90/5/5, v/v/v) for tert-butyl deprotection. This solution was stirred at RT for 1 h, and the solvent was then evaporated under reduced pressure to a small volume. The residue was precipitated in ice-cold Et₂O in a centrifuge tube. After centrifugation and washing of the precipitate with ice-cold Et₂O, the product was dissolved in H₂O and lyophilized, giving Fmoc-VLPLTGG as a white powder in 69.3 % yield (0.101 g, calculated using the maximal loading capacity as 100 %). Fmoc-VLPLTGG was stored at -20 °C and dissolved in DMSO prior to use.

MS (ESI+) m/z 878.6 [M+H]⁺; Calculated for C₄₅H₆₄N₇O₁₁⁺: 878.5 [M+H]⁺


Scheme S5: The heptapeptide Fmoc-VLPLTGG
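The SPPS yields in paragraphs [0161], [0162] and [0164] are reported relative to the maximal resin loading and refer to the isolated salt form. The Python sketch below shows that calculation with a hypothetical helper `resin_yield_percent` and assumed average molar masses (GGK 260.29 g/mol, Azido-GGK 286.29 g/mol, Fmoc-VLPLTGG 878.0 g/mol, TFA 114.02 g/mol); with these values it reproduces the reported yields of 74.8 %, 86.3 % and 69.3 % to within 0.2 percentage points.

```python
TFA_MOLAR_MASS = 114.02  # g/mol, trifluoroacetic acid (assumed average value)


def resin_yield_percent(product_g, resin_g, loading_mmol_per_g, free_molar_mass, n_tfa=0):
    """Isolated yield (%) relative to the resin's maximal loading (taken as 100 %).

    free_molar_mass: molar mass of the free amino acid/peptide in g/mol
    n_tfa: number of TFA counter-ions in the isolated salt
    """
    salt_molar_mass = free_molar_mass + n_tfa * TFA_MOLAR_MASS
    theoretical_g = resin_g * loading_mmol_per_g * salt_molar_mass / 1000.0
    return 100.0 * product_g / theoretical_g


examples = {
    "GGK (bis-TFA salt)": resin_yield_percent(3.65, 10.0, 1.0, 260.29, n_tfa=2),
    "Azido-GGK (TFA salt)": resin_yield_percent(3.45, 10.0, 1.0, 286.29, n_tfa=1),
    "Fmoc-VLPLTGG": resin_yield_percent(0.101, 0.166, 1.0, 878.0),
}
for name, value in examples.items():
    print(f"{name}: {value:.1f} %")
```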

Library construction and directed evolution for AzGGKRS

Construction of a tailor-made library for AzGGK (lib-shuffle)

[0165] To create a new tailor-made PylRS library for AzGGK, a two-step process was used: first, error-prone PCR was performed to introduce random mutations into the C-lobe of PylRS; second, the present inventors shuffled 17 available PylRS variants that are known to accept bulky unnatural amino acids.

[0166] To introduce random mutations by error-prone PCR into the 495 bp C-lobe of wild-type Methanosarcina barkeri (Mb) PylRS, the present inventors used the GeneMorph II kit (Agilent Technologies) with the Random_C-lobe_Megaprimer primer pair (Supplementary Table S1). The resulting megaprimer amplicons were used for Mb wt PylRS amplification. Non-randomized plasmid copies were removed by DpnI digestion at 37 °C and 1,500 rpm for 2 hours, followed by chemical transformation into E. coli XL-1 gold cells as described in the GeneMorph II kit (Agilent Technologies). Transformants were grown overnight in 1 L of LB-kanamycin medium at 37 °C with shaking at 200 rpm. Cells were harvested by centrifugation at 4,000 g and 20 °C for 10 minutes. Plasmids were isolated with a Plasmid Midi-prep kit (Qiagen) following the manufacturer's instructions and stored at -20 °C.

[0167] Thereafter, seventeen carefully selected MbPylRS derivatives from our laboratory (Supplementary Table S2) known to incorporate large and bulky unnatural amino acids were subjected to DNA shuffling in combination with the MbPylRS error-prone PCR product. The 723 bp Mb PylRS C-lobe fragment was amplified with the Shuffle_C-lobe primer pair (Supplementary Table S1) and the high-fidelity Q5 polymerase (New England Biolabs). 4 µg of the 18 equally mixed amplicon variants were digested with 0.25 U of DNase I (Roche) for 5 minutes at 16 °C in 1x DNase I reaction buffer (100 mM Tris-HCl, 10 mM MnCl₂, pH 7.4). The reaction was stopped with 1x DNase I stop solution (250 mM EDTA, 50 % glycerol, pH 8.2), and DNA fragments of 50-500 bp were gel-purified with the QIAquick Gel Extraction kit (Qiagen). The fragments were reassembled by self-priming PCR, followed by amplification with the Shuffle_PylRS_Clobe primer pair (Supplementary Table S1) and the high-fidelity Q5 polymerase (NEB Technologies). The PCR product was cloned into the wt PylRS backbone by restriction digest with PstI and BstEII (NEB Technologies) for 2 hours at 37 °C, overnight ligation at 16 °C with T4 ligase (NEB Technologies) and electroporation into electrocompetent E. coli DH10B cells at 2.0 kV, 200 Ohm and 25 µF (BioRad MicroPulser). Cells were recovered in 1 mL SOB medium for one hour at 37 °C and 200 rpm, then grown overnight in 1 L LB medium with full-strength kanamycin at 37 °C and 200 rpm. Cells were harvested by centrifugation at 4,000 g and 20 °C, and plasmids were isolated with a plasmid midi-prep kit (Qiagen) and stored at -20 °C. Library depth was calculated by dilution experiments and verified by sequencing using the PylRS_seq primer pair (Supplementary Table S1).
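Estimating library depth from dilution experiments amounts to scaling colony counts back to the whole transformation. The Python sketch below assumes that serial dilutions of the 1 mL SOB recovery culture were plated (the plating scheme is not specified in paragraph [0167]) and averages the estimate over plates with countable colonies; `library_depth` and the colony counts in the example are hypothetical.

```python
def library_depth(plates, recovery_volume_uL=1000.0):
    """Estimate the total number of transformants from dilution plates.

    plates: iterable of (colonies, dilution_factor, plated_volume_uL), e.g.
            (152, 1e4, 100.0) for 100 uL of a 1:10,000 dilution giving 152 colonies.
    Plates without colonies are ignored; the remaining estimates are averaged.
    """
    estimates = [
        colonies * dilution * recovery_volume_uL / plated_uL
        for colonies, dilution, plated_uL in plates
        if colonies > 0
    ]
    if not estimates:
        raise ValueError("no countable plates")
    return sum(estimates) / len(estimates)


# Hypothetical colony counts, for illustration only
depth = library_depth([(152, 1e4, 100.0), (16, 1e5, 100.0)])
print(f"estimated library depth: {depth:.2e} transformants")
```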

Directed Evolution for AzGGKRS

[0168] AzGGKRS evolution was performed by successive positive and negative selection,^11,12 followed by a second positive selection coupled to an sfGFP-150TAG reporter read-out.^3,4 The positive selection plasmid pRep_PylT encodes a tetracycline resistance cassette, a constitutively expressed chloramphenicol acetyltransferase gene bearing an amber codon at position 111, and a constitutively expressed Mb pyrrolysyl-tRNA_CUA (PylT). The dual-reporter plasmid psfGFP_150TAG_CAT_111TAG_PylT additionally encodes sfGFP-150TAG. The negative selection plasmid pYOBB_PylT encodes a chloramphenicol resistance cassette, a constitutively expressed PylT and an L-arabinose-inducible barnase gene interrupted by amber codons at positions 3 and 45. In the first positive selection step, 3 µg of the lib-shuffle library were transformed by electroporation into 100 µL of freshly prepared electrocompetent E. coli DH10B cells containing the positive selection plasmid pRep_PylT, using an electroporator (BioRad MicroPulser) and a 0.2 cm MicroPulser electroporation cuvette (BioRad) at 2.0 kV, 200 Ohm and 25 µF, and the cells were rescued with 1 mL of SOC medium for 1 hour at 37 °C and 200 rpm. The cell suspension was incubated overnight at 37 °C and 200 rpm in 1 L 2xYT medium with full-strength kanamycin and tetracycline. The next morning, the cell suspension was diluted in 500 mL 2xYT medium with half-strength kanamycin and tetracycline to an OD₆₀₀ < 0.1 and incubated at 200 rpm and 37 °C until an OD₆₀₀ of < 0.3 was reached. 10 mL of the cell suspension were transferred into a 50 mL centrifugation tube, AzGGK was added to 2 mM, and the cells were incubated for 4 hours at 37 °C. 600 µL of the AzGGK-containing culture were plated on 24 cm x 24 cm plates containing 200 mL of GMML agar with 240 µL chloramphenicol (1.2x strength), 200 µL tetracycline (full strength) and 200 µL kanamycin (full strength), as well as 2 mM AzGGK.
Plates were incubated for 36-48 h, aiming for a single-clone distribution on the plates. Surviving clones were scraped off the plates in 25 mL LB medium with half-strength kanamycin and tetracycline, followed by a 2 hour incubation at 200 rpm and 37 °C to remove agar residues and amplify low-abundance clones. The bacterial suspension was pelleted for 20 minutes at 3,000 g and 4 °C and subjected to plasmid isolation by HiSpeed midiprep (Qiagen). 6 µg of the isolated plasmid DNA were separated by agarose gel electrophoresis, and the PylRS-encoding plasmid was cut out and purified by gel extraction (Qiagen). For negative selection, 50 ng of the lib-shuffle DNA obtained from the surviving positive-selection clones were transformed by electroporation into freshly prepared electrocompetent E. coli DH10B cells containing the negative selection plasmid pYOBB_PylT. 2 x 500 µL of the SOC-rescued culture were plated on two square 24 cm x 24 cm plates containing 200 mL of LB agar with full-strength kanamycin and chloramphenicol, as well as 0.2 % L-arabinose. Plates were incubated for 36-48 h at 37 °C, aiming for a single-clone distribution on the plates. Surviving clones were scraped off the plates in 25 mL LB medium with half-strength kanamycin and chloramphenicol, followed by a 2 hour incubation at 200 rpm and 37 °C to remove agar residues and amplify low-abundance clones. The bacterial suspension was pelleted for 20 minutes at 3,000 g and 4 °C and subjected to plasmid isolation by HiSpeed midiprep (Qiagen). 6 µg of the isolated plasmid DNA were separated by agarose gel electrophoresis, and the PylRS-encoding plasmid band was cut out and purified by gel extraction (Qiagen). For dual-positive selection and reporter read-out, 50 ng of isolated lib-shuffle DNA from the surviving negative-selection clones were transformed into freshly prepared electrocompetent E. coli DH10B containing the dual-reporter plasmid psfGFP_150TAG_CAT_111TAG_PylT.
All co-transformants were plated on a single 24 cm x 24 cm autoinduction agar plate containing 0.2 % L-arabinose, full-strength kanamycin and tetracycline, 50 % of full-strength chloramphenicol and 2 mM AzGGK. Plates were incubated for 48 hours at 37 °C. sfGFP-expressing colonies were picked into 96-deep-well plates containing 1 mL of non-inducing medium with full-strength kanamycin and tetracycline, followed by 48 hours of incubation at 200 rpm and 37 °C. 50 µL of each cell suspension were transferred into two new 96-deep-well plates containing 1 mL of auto-inducing medium with full-strength kanamycin and tetracycline, one of which also contained 2 mM AzGGK, followed by 48 hours of incubation at 37 °C (200 rpm). The original 96-deep-well plate with non-inducing medium was centrifuged at 4,000 g and 4 °C and stored at -20 °C. 20 µL of each cell suspension were transferred into clear-bottom Corning 96-well plates and mixed with 180 µL of 1x PBS. The sfGFP-150TAG fluorescence intensity was measured for both plates (grown with and without 2 mM AzGGK) in a Tecan plate reader (excitation 480 nm, emission 527 nm), and the optical density at 600 nm was used to normalize sfGFP-150TAG expression within each well. PylRS variants from cells showing at least a 5-fold increase in fluorescence intensity (compared with cells grown in the absence of AzGGK) were isolated by agarose gel electrophoresis, purified by gel extraction and sequenced. Test expressions were performed with sfGFP-150TAG-His6. For further expressions, AzGGKRS was recloned into a pBK backbone with ampicillin resistance.
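The reporter read-out in paragraph [0168] reduces to an OD₆₀₀-normalised fluorescence ratio between the plates grown with and without AzGGK, keeping clones at a 5-fold increase or more. The Python sketch below uses a hypothetical `select_hits` helper and made-up well readings; it does not subtract blank (medium-only) fluorescence, which the text does not mention.

```python
def select_hits(wells, min_fold=5.0):
    """Return {well: fold_change} for clones passing the AzGGK-dependence filter.

    wells maps a well ID to (fluorescence_plus, od600_plus, fluorescence_minus, od600_minus),
    i.e. readings from the plates grown with and without 2 mM AzGGK.
    """
    hits = {}
    for well, (fl_plus, od_plus, fl_minus, od_minus) in wells.items():
        normalised_plus = fl_plus / od_plus     # OD600-normalised sfGFP signal, +AzGGK
        normalised_minus = fl_minus / od_minus  # same clone, -AzGGK
        if normalised_minus > 0:
            fold = normalised_plus / normalised_minus
        else:
            fold = float("inf")
        if fold >= min_fold:
            hits[well] = fold
    return hits


# Hypothetical plate-reader values for three wells
readings = {
    "A1": (12000.0, 0.80, 1000.0, 0.80),
    "A2": (3000.0, 0.60, 1000.0, 0.60),
    "B1": (9000.0, 0.90, 600.0, 0.40),
}
for well, fold in sorted(select_hits(readings).items()):
    print(f"{well}: {fold:.1f}-fold")
```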

Protein expression and purification

Expression and purification of sortase mutants

[0169] Chemically competent E. coli BL21 (DE3) were transformed with the pET29-sortase-His6 plasmid (Addgene). After recovery with 1 mL of SOC medium for one hour at 37 °C, the cells were cultured overnight in 50 mL of LB medium containing kanamycin (50 µg/mL) at 37 °C, 200 rpm. The overnight culture was diluted to an OD₆₀₀ of 0.05 in 1 L of fresh 2xYT medium supplemented with kanamycin (25 µg/mL) and cultured at 37 °C with shaking (200 rpm) until an OD₆₀₀ of 0.5-0.8 was reached. IPTG was added to a final concentration of 0.4 mM and protein expression was induced for three hours at 30 °C. The cells were harvested by centrifugation (4,000 xg, 15 min, 4 °C) and resuspended in lysis buffer (50 mM Tris pH 8.0, 300 mM NaCl supplemented with 1 mM MgCl₂, 0.1 mg/mL DNase I, one complete™ protease inhibitor tablet (Roche) and 0.175 mg/mL PMSF). Cells were lysed by sonication and centrifuged (15,000 xg, 40 min, 4 °C), and the cleared lysate was added to 2 mL Ni²⁺-NTA slurry (Jena Bioscience) and incubated with agitation for one hour at 4 °C. The mixture was then transferred to an empty plastic column and washed with 10 CV of wash buffer (20 mM Tris pH 8.0, 30 mM imidazole pH 8.0 and 300 mM NaCl). The protein was eluted in 1 mL fractions with wash buffer supplemented with 300 mM imidazole pH 8.0. The protein-containing fractions were pooled, concentrated and rebuffered (20 mM Tris pH 7.5, 150 mM NaCl, 5 mM CaCl₂) with Amicon® Ultra-4 10K MWCO centrifugal filter units (Millipore). Enzyme concentration was calculated from the measured A280 absorption (extinction coefficients were calculated with ProtParam (https://web.expasy.org/protparam/)). All sortase variants were stored at 4 °C for further use.
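Diluting an overnight culture to a starting OD₆₀₀ of 0.05, as in paragraph [0169] and the subsequent expression protocols, is a C₁V₁ = C₂V₂ calculation on optical density. The Python sketch below assumes OD readings within the photometer's linear range (dense overnight cultures are usually measured after dilution) and treats the final volume as the total culture volume; `inoculum_volume_mL` is a hypothetical helper and the overnight OD of 4.0 is an example value.

```python
def inoculum_volume_mL(od_overnight, od_target=0.05, final_volume_mL=1000.0):
    """Volume of overnight culture giving od_target in final_volume_mL (C1V1 = C2V2).

    Assumes OD600 values in the photometer's linear range and treats
    final_volume_mL as the total culture volume after inoculation.
    """
    if od_overnight <= od_target:
        raise ValueError("the overnight culture must be denser than the target OD600")
    return od_target * final_volume_mL / od_overnight


volume = inoculum_volume_mL(4.0)  # hypothetical overnight OD600 of 4.0
print(f"{volume:.1f} mL overnight culture + {1000.0 - volume:.1f} mL fresh 2xYT")
```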

[0170] Sortase variants harbouring a C-terminal TEV-His-tag were expressed and purified identically. To cleave off the His-tag, the protein-containing fractions were pooled and 200 µL of TEV protease (1.8 mg/mL) was added. The mixture was transferred to dialysis tubing (Roth), and the dialysis bag was immersed in 2 L of cold dialysis buffer (25 mM Tris pH 8.0, 150 mM NaCl, 2 mM DTT) and stirred at 4 °C overnight. The protein mixture was recovered from the dialysis tubing and centrifuged (15,000 xg, 10 min, 4 °C) to pellet precipitated TEV protease. 2 mL of Ni²⁺-NTA slurry (Jena Bioscience) were added to the supernatant and the mixture was incubated with agitation for one hour at 4 °C. The mixture was then poured into an empty plastic column and the flow-through was collected. The Ni²⁺-NTA beads were washed twice with 15 mL of wash buffer (20 mM Tris pH 8.0, 150 mM NaCl and 5 mM CaCl₂). Flow-through and wash fractions containing pure sortase without His-tag were pooled, concentrated and rebuffered (20 mM Tris pH 7.5, 150 mM NaCl, 5 mM CaCl₂) with Amicon® Ultra-4 10K MWCO centrifugal filter units (Millipore). Enzyme concentration was calculated from the measured A280 absorption (extinction coefficients were calculated with ProtParam (https://web.expasy.org/protparam/)). All sortase variants were stored at 4 °C for further use.

Expression and purification of wt ubiquitin and ubiquitin mutants

[0171] Chemically competent E. coli Rosetta2 (DE3) were transformed with the pET17-ubiquitin plasmid. After recovery with 1 mL of SOC medium for one hour at 37 °C, the cells were cultured overnight in 50 mL of LB medium containing ampicillin (100 µg/mL) and chloramphenicol (50 µg/mL) at 37 °C, 200 rpm. The overnight culture was diluted to an OD₆₀₀ of 0.05 in 1 L of fresh 2xYT medium supplemented with ampicillin (50 µg/mL) and chloramphenicol (25 µg/mL) and cultured at 37 °C with shaking (200 rpm) until an OD₆₀₀ of 0.8-1.0 was reached. IPTG was added to a final concentration of 1 mM and protein expression was induced for 4 hours at 37 °C. The cells were harvested by centrifugation (4,000 xg, 15 min, 4 °C) and resuspended in lysis buffer (50 mM Tris pH 7.6, supplemented with 10 mM MgCl₂, 1 mM EDTA, 0.1 % NP-40, 0.1 mg/mL DNase I, one complete™ protease inhibitor tablet and 0.175 mg/mL PMSF). Cells were lysed by sonication and centrifuged (15,000 xg, 40 min, 4 °C).

[0172] The cleared lysate was transferred into a glass beaker in an ice bath placed on a magnetic stirrer. Precipitation was performed by adding 35 % perchloric acid until pH 4.0-4.5 was reached. After 5 min incubation at 4 °C with stirring, the milky suspension was centrifuged (15,000 ×g, 40 min, 4 °C) and the supernatant was transferred into dialysis tubing with a MWCO of 2 kDa. Dialysis was performed overnight at 4 °C against 50 mM ammonium acetate buffer pH 4.5. The dialyzed solution was centrifuged (15,000 ×g, 40 min, 4 °C), filtered and purified by cation exchange chromatography on a HiTrap SP FF 5 mL column (GE, gradient 0-1 M NaCl). Fractions of > 95 % purity, as judged by SDS-PAGE, were pooled and rebuffered (20 mM Tris pH 8.0, 150 mM NaCl, 5 mM CaCl₂) with Amicon® Ultra-15 3 kDa MWCO centrifugal filter units (Millipore). Protein concentration was calculated from the measured A280 absorbance (extinction coefficients were calculated with ProtParam (https://web.expasy.org/protparam/)). All purified ubiquitin variants were stored at 4 °C for further use.

PCNA expression and purification

[0173] Chemically competent E. coli K12 cells were co-transformed with the plasmids pPylT_PCNA-K164TAG-CPD-His6 (which encodes MbtRNA_CUA and a C-terminally His6-tagged PCNA-K164TAG-CPD fusion protein; CPD is the cysteine protease domain of the Vibrio cholerae MARTX toxin)¹³ and pBK_AzGGKRS (which encodes AzGGKRS). After recovery in 1 mL of SOC medium for 1 h at 37 °C, the cells were cultured overnight in 50 mL of non-inducing medium containing full-strength antibiotics (tetracycline and ampicillin) at 37 °C, 200 rpm. The overnight culture was diluted to an OD₆₀₀ of 0.05 in auto-induction medium supplemented with full-strength antibiotics and 4 mM AzGGK. After incubation overnight at 37 °C, the cells were harvested by centrifugation (4,000 ×g, 15 min, 4 °C), flash-frozen in liquid nitrogen and stored at -80 °C. The compositions of the non-inducing and auto-induction media are given in Supplementary Tables S3-S7.
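Supplementing the auto-induction medium with 4 mM AzGGK means weighing the amino acid according to its molar mass. The sketch below computes that mass from the molecular formula C₁₀H₁₈N₆O₄, which we infer from the name N⁶-((2-azidoacetyl)glycyl)-L-lysine for the free amino acid; if AzGGK is supplied as a salt (e.g. a hydrochloride), the molar mass must be adjusted. The 500 mL culture volume is hypothetical.

```python
"""Mass of non-canonical amino acid needed for a target medium concentration."""

# Standard atomic weights (g/mol), rounded.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Assumption: AzGGK as the free amino acid, C10H18N6O4 (not a salt form).
AZGGK_FORMULA = {"C": 10, "H": 18, "N": 6, "O": 4}


def molar_mass(formula: dict[str, int]) -> float:
    """Sum atomic weights over an element -> count mapping (g/mol)."""
    return sum(ATOMIC_WEIGHT[element] * count for element, count in formula.items())


def mass_to_weigh_mg(conc_mM: float, volume_ml: float, mw_g_per_mol: float) -> float:
    """mM x mL = umol; umol x g/mol = ug; divide by 1000 for mg."""
    return conc_mM * volume_ml * mw_g_per_mol / 1000.0


if __name__ == "__main__":
    mw = molar_mass(AZGGK_FORMULA)
    # Hypothetical 500 mL auto-induction culture at 4 mM AzGGK.
    print(round(mw, 2), round(mass_to_weigh_mg(4.0, 500.0, mw), 1))
```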

[0174] The pellet was thawed on ice and resuspended in lysis buffer (20 mM Tris pH 7.5, 0.5 mM PMSF, 300 mM NaCl, 0.2 % NP-40, 0.2 mg/mL lysozyme). The cell suspension was incubated on ice for 30 min and sonicated at 4 °C on ice. The lysed cells were centrifuged (15,000 ×g, 30 min, 4 °C), the cleared lysate was added to Ni²⁺-NTA slurry (Jena Bioscience) (0.2 mL of slurry per 100 mL of culture) and the mixture was incubated with agitation for 1 h at 4 °C. The mixture was then transferred to an empty plastic column and washed with 10 CV of wash buffer (20 mM Tris pH 7.5, 150 mM NaCl, 5 mM CaCl₂). To induce autocleavage of the CPD, the resin was incubated with 1 CV of elution buffer 1 (20 mM Tris pH 7.5, 150 mM NaCl, 5 mM CaCl₂ and 1 mM IP6 (inositol hexakisphosphate)) for 1 h at 4 °C. PCNA was then eluted in 1 mL fractions with elution buffer 1. The column-bound CPD-His6 was subsequently eluted with elution buffer 2 (20 mM Tris pH 7.5, 150 mM NaCl, 5 mM CaCl₂ and 300 mM imidazole). The fractions containing PCNA-K164AzGGK (identified by 15 % SDS-PAGE) were pooled, concentrated and rebuffered (20 mM Tris pH 7.5, 150 mM NaCl, 5 mM CaCl₂) with Amicon® Ultra-4 10K MWCO centrifugal filter units (Millipore). Protein concentration was calculated from the measured A280 absorbance (extinction coefficients were calculated with ProtParam (https://web.expasy.org/protparam/)). PCNA was stored at 4 °C for further use.

Expression and Purification of His6-TEV-SUMO1 variants and His6-TEV-Ub variants

[0175] Chemically competent E. coli Rosetta2 (DE3) cells were transformed with the pET17-His6-TEV-Ub/SUMO1 plasmid. After recovery in 1 mL of SOC medium for 1 h at 37 °C, the cells were cultured overnight in 50 mL of LB medium containing ampicillin (100 µg/mL) and chloramphenicol (50 µg/mL) at 37 °C, 200 rpm. The overnight culture was diluted to an OD₆₀₀ of 0.05 in 1 L of fresh 2xYT medium supplemented with ampicillin (50 µg/mL) and chloramphenicol (25 µg/mL) and cultured at 37 °C, 200 rpm, until OD₆₀₀ = 0.8-1.0. IPTG was added to a final concentration of 1 mM and protein expression was induced for 4 h at 37 °C. The cells were harvested by centrifugation (4,000 ×g, 15 min, 4 °C) and resuspended in lysis buffer (50 mM Tris pH 7.6, 10 mM MgCl₂, 1 mM EDTA, 0.1 % NP-40, 0.1 mg/mL DNase I, one complete™ protease inhibitor tablet and 0.175 mg/mL PMSF). Cells were lysed by sonication and centrifuged (15,000 ×g, 40 min, 4 °C).

[0176] The obtained cell pellets were resuspended in 20 mL of lysis buffer (20 mM Tris pH 8.0, 30 mM imidazole, 300 mM NaCl, 0.175 mg/mL PMSF, 0.1 mg/mL DNase I and one complete™ protease inhibitor tablet (Roche)). The cell suspension was incubated on ice for 30 min and sonicated at 4 °C on ice. The lysed cells were centrifuged (15,000 ×g, 40 min, 4 °C), the cleared lysate was added to Ni²⁺-NTA slurry (Jena Bioscience) (0.2 mL of slurry per 100 mL of culture) and the mixture was incubated with agitation for 1 h at 4 °C. The mixture was then transferred to an empty plastic column and washed with 10 CV of wash buffer (20 mM Tris pH 8.0, 30 mM imidazole pH 8.0 and 300 mM NaCl). The protein was eluted in 1 mL fractions with wash buffer supplemented with 300 mM imidazole pH 8.0. The fractions containing the protein were pooled, concentrated and rebuffered (20 mM Tris pH 7.5, 150 mM NaCl, 5 mM CaCl₂) with Amicon® centrifugal filter units (Millipore) of the appropriate MWCO. Purified proteins were analyzed by 15 % SDS-PAGE and mass spectrometry. Ubiquitin and SUMO1 variants were stored at 4 °C until further use.

Ubiquitylation and SUMOylation of GGK-bearing proteins

Preparation of diubiquitins

[0177] Ub-GGK acceptor ubiquitins (GGK at position K6, K11, K33, K48 or K63) were diluted to 20 µM in sortase buffer (50 mM Tris pH 7.5, 150 mM NaCl, 5 mM CaCl₂). Then 100 µM of the donor ubiquitin (either Ub(AT) or Ub(LAT)) was added, followed by 20 µM Srt2A (without His-tag). Reactions were incubated at 37 °C, 600 rpm, for 1 h when Ub(LAT) was used and for 18 h when Ub(AT) was used. Sortase-mediated transpeptidation was stopped by adding 200 µM phenyl vinyl sulfone and incubating for a further 10 min at 37 °C, 600 rpm.
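Assembling the diubiquitin reaction is one C₁V₁ = C₂V₂ step per component, with sortase buffer making up the remaining volume. The sketch below assumes micromolar final concentrations (20 µM acceptor, 100 µM donor, 20 µM Srt2A); the stock concentrations, the 100 µL reaction volume and the helper name are hypothetical.

```python
"""Pipetting volumes for a sortase transpeptidation reaction (C1*V1 = C2*V2)."""


def pipetting_volumes(final_volume_ul: float,
                      components: dict[str, tuple[float, float]],
                      filler: str = "sortase buffer") -> dict[str, float]:
    """components maps name -> (stock_conc, final_conc), both in the same unit.

    Returns the volume (uL) of each stock plus the filler buffer volume.
    """
    volumes: dict[str, float] = {}
    for name, (stock, final) in components.items():
        if final > stock:
            raise ValueError(f"{name}: stock ({stock}) below final ({final})")
        volumes[name] = final * final_volume_ul / stock
    filler_ul = final_volume_ul - sum(volumes.values())
    if filler_ul < 0:
        raise ValueError("stocks too dilute for this reaction volume")
    volumes[filler] = filler_ul
    return volumes


if __name__ == "__main__":
    # Hypothetical stocks (uM) -> final concentrations (uM) in a 100 uL reaction.
    plan = pipetting_volumes(100.0, {
        "Ub-GGK acceptor": (200.0, 20.0),
        "Ub(LAT) donor": (1000.0, 100.0),
        "Srt2A": (400.0, 20.0),
    })
    for name, vol in plan.items():
        print(f"{name}: {vol:.1f} uL")
```

The 5-fold donor excess keeps the acceptor as the limiting species; the excess donor is washed away during the subsequent Ni²⁺-NTA step.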

[0178] Ni²⁺-NTA slurry (Jena Bioscience) (0.1 mL/mg of acceptor ubiquitin) was then added to the reaction mixture and the mixture was incubated with agitation for 1 h at 4 °C. The mixture was then transferred to an empty plastic column and washed with 40 CV of wash buffer (50 mM Tris pH 7.5, 150 mM NaCl, 5 mM CaCl₂, 30 mM imidazole) to remove Srt2A and excess donor ubiquitin. The protein was eluted in 0.2 mL fractions with wash buffer supplemented with 300 mM imidazole pH 8.0. The fractions containing the mixture of diubiquitin and unreacted acceptor ubiquitin were pooled and concentrated with Amicon® centrifugal filter units (Millipore) of the appropriate MWCO. To remove unreacted acceptor ubiquitin, size-exclusion chromatography (SEC) was performed on a Superdex S75 16/600 column (GE Healthcare) in sortase buffer. Fractions containing diubiquitin were pooled and concentrated with Amicon® centrifugal filter units (Millipore) of the appropriate MWCO. Diubiquitins were stored at 4 °C until further use.
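The Ni²⁺-NTA slurry volume scales with the mass of acceptor ubiquitin (0.1 mL per mg), so the molar amount in the reaction must first be converted into a mass. The sketch assumes a molecular weight of about 8.6 kDa, close to wild-type ubiquitin; the His-tagged, GGK-bearing acceptor will be somewhat heavier, so substitute the construct's actual mass. The reaction volume and function names are hypothetical.

```python
"""Ni2+-NTA slurry needed for the diubiquitin clean-up (0.1 mL per mg acceptor)."""

SLURRY_ML_PER_MG = 0.1


def protein_mass_mg(conc_uM: float, volume_ul: float, mw_da: float) -> float:
    """uM x uL = pmol; pmol x g/mol = pg; 1 mg = 1e9 pg."""
    return conc_uM * volume_ul * mw_da * 1e-9


def slurry_volume_ml(conc_uM: float, volume_ul: float, mw_da: float = 8600.0) -> float:
    """Slurry volume scaled to the acceptor mass; the default MW is an assumption."""
    return protein_mass_mg(conc_uM, volume_ul, mw_da) * SLURRY_ML_PER_MG


if __name__ == "__main__":
    # Hypothetical 1 mL reaction containing 20 uM acceptor ubiquitin.
    print(round(protein_mass_mg(20.0, 1000.0, 8600.0), 3), "mg acceptor")
    print(round(slurry_volume_ml(20.0, 1000.0) * 1000, 1), "uL slurry")
```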

Preparation of ubiquitylated PCNA

[0179] PCNA-K164GGK was diluted to 5 µM in sortase buffer (50 mM Tris pH 7.5, 150 mM NaCl, 5 mM CaCl₂). Then 100 µM His-TEV-Ub(AT) and 20 µM Srt2A (without His-tag) were added and the mixture was incubated at 25 °C (600 rpm) for 44 h. Sortase-mediated transpeptidation was stopped by adding 200 µM phenyl vinyl sulfone and incubating for a further 10 min at 25 °C, 600 rpm.

[0180] After incubation, the mixture was subjected to Ni-NTA affinity chromatography on a HisTrap FF column (GE Life Sciences) to remove Srt2A and unreacted PCNA. The fractions containing ubiquitylated PCNA were pooled and concentrated with Amicon® centrifugal filter units (Millipore) of the appropriate MWCO.

[0181] To remove excess His-TEV-Ub(AT), size-exclusion chromatography (SEC) was performed on a Superdex S75 16/600 column (GE Healthcare) in SEC buffer (20 mM HEPES pH 7.5, 150 mM KCl, 0.5 mM TCEP, 5 % glycerol (w/v)). Fractions containing pure ubiquitylated PCNA were pooled and concentrated with Amicon® centrifugal filter units (Millipore) of the appropriate MWCO. Protein concentration was calculated from the measured A280 absorbance (extinction coefficients were calculated with ProtParam (https://web.expasy.org/protparam/)). Ubiquitylated PCNA was stored at 4 °C until further use.

Deubiquitylation assay of diubiquitins

[0182] USP2_CD (100 ng, Boston Biochem) was diluted into DUB dilution buffer (25 mM Tris pH 7.5, 150 mM NaCl, 10 mM DTT) and incubated at room temperature for 10 min to activate the enzyme. 2 µg of native diubiquitin (UbiQBio) or sortase-generated diubiquitin were added to 3 µL of 10x DUB buffer (500 mM Tris pH 7.5, 500 mM NaCl, 50 mM DTT) and made up to 20 µL with H₂O. Then 10 µL of the activated DUB was added to the diubiquitin samples, followed by incubation at 37 °C. Samples of 6 µL were taken at the denoted time points and quenched by adding 4x SDS loading buffer and boiling at 95 °C for 10 min. Samples were separated by SDS-PAGE and visualized by Coomassie staining.
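In the diubiquitin DUB assay, 3 µL of 10x DUB buffer (in a 20 µL substrate mix) and 10 µL of enzyme in DUB dilution buffer both contribute to the final 30 µL. The sketch below tallies the resulting Tris, NaCl and DTT concentrations by volume-weighted mixing, assuming the diubiquitin and water add no buffer components, and counts how many 6 µL time points a 30 µL reaction supports. The helper name is illustrative.

```python
"""Final buffer composition of the diubiquitin DUB assay."""


def mix(parts: list[tuple[float, dict[str, float]]], total_ul: float) -> dict[str, float]:
    """Volume-weighted final concentrations; parts = [(volume_uL, {solute: conc_mM})]."""
    final: dict[str, float] = {}
    for volume_ul, solutes in parts:
        for solute, conc_mM in solutes.items():
            final[solute] = final.get(solute, 0.0) + conc_mM * volume_ul / total_ul
    return final


TEN_X_DUB_BUFFER = {"Tris": 500.0, "NaCl": 500.0, "DTT": 50.0}     # mM
DUB_DILUTION_BUFFER = {"Tris": 25.0, "NaCl": 150.0, "DTT": 10.0}  # mM

if __name__ == "__main__":
    total_ul = 20.0 + 10.0  # substrate mix made up to 20 uL, plus 10 uL activated DUB
    final = mix([(3.0, TEN_X_DUB_BUFFER), (10.0, DUB_DILUTION_BUFFER)], total_ul)
    print({k: round(v, 2) for k, v in final.items()})
    print("6 uL time points available:", int(total_ul // 6.0))
```

The 10x buffer alone brings the reaction to 1x (50 mM Tris, 50 mM NaCl, 5 mM DTT); the enzyme's dilution buffer raises these to roughly 58 mM Tris, 100 mM NaCl and 8.3 mM DTT.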

Deubiquitylation assay of ubiquitylated PCNA

[0183] Natively ubiquitylated PCNA (a kind gift from Christian Biertümpfel, MPI Martinsried) or the sortase-generated PCNA-Ub(AT) conjugate was diluted to 1 µM in DUB buffer (50 mM HEPES pH 7.5, 150 mM NaCl, 0.5 mM TCEP, 1 mM EDTA). UAF1 and USP1 (Boston Biochem) were added in an equimolar ratio to a final concentration of 100 nM to the ubiquitylated PCNA conjugates in DUB buffer. The mixture was incubated at 37 °C. Samples of 10 µL were taken at the denoted time points and quenched by adding 4x SDS loading buffer and boiling at 95 °C for 5 min. Samples were separated by SDS-PAGE and visualized by Coomassie staining.

Peptide-based sortase Assays

[0184] Fmoc-VLPLTGG (20 mM stock solution in DMSO) was diluted in sortase buffer (50 mM Tris pH 7.5, 150 mM NaCl, 5 mM CaCl₂) to a final concentration of 1 mM, followed by the addition of 10 mM AzGGK (50 mM stock in H₂O) or GGK (50 mM stock in H₂O). Srt5M (300-800 µM stock in sortase buffer) was then added to a final concentration of 20 µM. Reactions were incubated at 37 °C (600 rpm). Samples were taken at the denoted time points and quenched with 10 volumes of 0.5 % formic acid prior to HPLC-MS analysis. Typical reaction volumes were 50 µL.
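One common way to turn the HPLC traces of such a time course into a conversion is to compare integrated UV peak areas of product and remaining substrate. The sketch below assumes that Fmoc-VLPLTGG, the transpeptidation product and a possible hydrolysis side product (Fmoc-VLPLT-OH) share the Fmoc chromophore and therefore have similar molar response factors; this evaluation scheme and the peak areas in the example are illustrative assumptions, not the procedure described here.

```python
def percent_conversion(area_product: float, area_substrate: float,
                       area_hydrolysis: float = 0.0) -> float:
    """Transpeptidation product as % of all Fmoc-containing peptide species.

    Assumes equal molar UV response for every species (shared Fmoc group).
    """
    total = area_product + area_substrate + area_hydrolysis
    if total <= 0:
        raise ValueError("no integrated peak area")
    return 100.0 * area_product / total


if __name__ == "__main__":
    # Hypothetical peak areas (product, substrate, hydrolysis) per time point (min).
    timecourse = {0: (0.0, 1000.0, 0.0), 30: (250.0, 700.0, 50.0), 120: (600.0, 300.0, 100.0)}
    for minutes, areas in timecourse.items():
        print(minutes, round(percent_conversion(*areas), 1))
```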

Mammalian cell methods

Expression and purification of proteins containing AzGGK in mammalian cells

[0185] Human embryonic kidney 293T cells (HEK293T) were cultured in Dulbecco’s modified Eagle's medium (DMEM, Sigma-Aldrich) supplemented with 10 % (v/v) fetal bovine serum (FBS, Biochrom) and 1 % antibiotic-antimycotic solution (25 µg/mL amphotericin B, 10 mg/mL streptomycin and 10,000 units of penicillin, Sigma-Aldrich) at 37 °C in a humidified incubator with 5 % CO₂. HEK293T cells were seeded in poly-L-lysine-coated 6-well plates (Greiner) at 6x10⁵ cells/well for Western blotting, and in a 100 mm dish at 3x10⁶ cells/dish for protein purification followed by ESI-MS analysis. Fresh DMEM supplemented with 2 mM AzGGK was added to the cells prior to transfection. Transfection was performed using PEI transfection reagent (Sigma). pEF1-sfGFPN150TAG and pEF1-AzGGKRS were co-transfected at a plasmid ratio of 3:1. Cells were incubated for 24 to 48 h prior to optional in vivo reduction. For purification of sfGFP-N150AzGGK-His6, cells were lysed with 500 µL of lysis buffer (50 mM Tris pH 8.0, 150 mM NaCl, 10 mM imidazole, 1 % Triton X-100, 1 mM PMSF, 1x protease inhibitor cocktail) for 30 min on ice. The lysate was clarified by centrifugation for 15 min at 14,000 ×g and sfGFP was purified by Ni²⁺-NTA affinity chromatography following the manufacturer’s (Jena Bioscience) instructions. The purified protein was characterized by SDS-PAGE and ESI-MS analysis.
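The 3:1 co-transfection ratio splits a total DNA mass between pEF1-sfGFPN150TAG and pEF1-AzGGKRS. The sketch below performs that split and scales a per-well amount to the 100 mm dish by the ratio of seeded cells (3x10⁶ vs 6x10⁵, i.e. 5-fold). Scaling DNA linearly with cell number and the 2 µg per well starting amount are assumptions for illustration, not values given in the protocol.

```python
"""Split transfection DNA by plasmid ratio and scale from 6-well to 100 mm dish."""

CELLS_PER_WELL = 6e5   # 6-well plate, as seeded in the protocol
CELLS_PER_DISH = 3e6   # 100 mm dish, as seeded in the protocol


def split_by_ratio(total_ug: float, ratio: tuple[int, ...] = (3, 1)) -> tuple[float, ...]:
    """Divide a total DNA mass in proportion to the plasmid ratio."""
    parts = sum(ratio)
    return tuple(total_ug * r / parts for r in ratio)


def scale_to_dish(per_well_ug: float) -> float:
    """Assumption: DNA amount scales linearly with the number of seeded cells."""
    return per_well_ug * CELLS_PER_DISH / CELLS_PER_WELL


if __name__ == "__main__":
    per_well = 2.0  # hypothetical total DNA per well (ug)
    print("well (sfGFP, AzGGKRS):", split_by_ratio(per_well))
    print("dish (sfGFP, AzGGKRS):", split_by_ratio(scale_to_dish(per_well)))
```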

In-lysate sortase-mediated ubiquitylation

[0186] Adding the cell-permeable reducing agent 2DPBA to the medium of cells expressing AzGGK-bearing proteins reduces these to GGK-bearing proteins. The reduction was performed 24 to 48 h post-transfection as follows: cells were washed twice with PBS and fresh complete DMEM was added. The cells were then incubated for another 2-3 h to eliminate residual traces of AzGGK. After a further washing step with PBS, complete DMEM containing 400 µM 2DPBA was added. Cells were incubated at 37 °C, 5 % CO₂ for another 4-18 h and washed twice more with PBS before harvesting. For in-lysate sortase-mediated ubiquitylation, 2DPBA-treated cells (grown in 6-well plates) were scraped into 500 µL PBS, transferred into reaction tubes and pelleted by centrifugation (500 ×g, 5 min, 4 °C). The supernatant was discarded and the cells were resuspended in 50-100 µL of the buffer of choice. The cells were then flash-frozen in liquid nitrogen and thawed on ice; this freeze-thaw cycle was repeated three times to disrupt the cell membranes. Cell debris and nuclei were pelleted by centrifugation for 15 min at maximum speed and 4 °C, and the supernatant was used for in-lysate sortase-mediated ubiquitylation experiments. To work under calcium-free conditions, freeze-thaw lysis was performed in a calcium-depleting buffer containing the calcium chelator ethylene glycol tetraacetic acid (EGTA) (50 mM Tris pH 7.5, 150 mM NaCl, 5 mM EGTA). For Ca²⁺-containing conditions, sortase buffer (50 mM Tris pH 7.5, 150 mM NaCl, 5 mM CaCl₂) was used for freeze-thaw lysis. 25 µL of supernatant were supplemented with 20 µM of the corresponding sortase variant, followed by 10 min incubation at 37 °C. Then 100 µM Ub(LAT) or Ub(LPT) was added and the mixture was incubated for 1 h at 37 °C, 200 rpm. 10 µL of the reaction mixture were diluted and boiled with 4x SDS loading buffer and separated by SDS-PAGE, followed by anti-His6 Western blot analysis.
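The 2DPBA treatment is a Staudinger reduction that converts the azide of AzGGK into the free amine of GGK (R-N₃ → R-NH₂, releasing N₂). For ESI-MS verification, each reduced site should therefore lower the protein mass by two nitrogens while adding two hydrogens, about 26 Da. The sketch computes this shift from monoisotopic and average atomic masses; the observed mass in the example is hypothetical.

```python
"""Expected ESI-MS mass change when 2DPBA reduces AzGGK (azide) to GGK (amine)."""

MONOISOTOPIC = {"H": 1.00782503, "N": 14.00307401}
AVERAGE = {"H": 1.008, "N": 14.007}


def reduction_shift_da(n_sites: int = 1, masses: dict[str, float] = AVERAGE) -> float:
    """Per site R-N3 -> R-NH2: lose 3 N, gain 1 N + 2 H, i.e. net -2 N + 2 H."""
    return n_sites * (2 * masses["H"] - 2 * masses["N"])


if __name__ == "__main__":
    print(round(reduction_shift_da(masses=MONOISOTOPIC), 4))  # -25.9905
    print(round(reduction_shift_da(), 3))                      # -25.998
    observed_azggk = 27950.0  # hypothetical deconvoluted mass of an AzGGK protein (Da)
    print("expected GGK form:", round(observed_azggk + reduction_shift_da(), 1))
```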

Cloning of Constructs

[0187] For mammalian cell experiments, human codon-optimized ubiquitin, SUMO, PCNA, sortase (mSrt2A and Srt2A) and AzGGKRS genes were purchased as DNA Strings (GeneArt, Thermo Fisher) and cloned into pcDNA3.1, pIRES and pEF1 vectors by standard restriction cloning. Point mutations, insertions and deletions were introduced using site-directed, ligase-independent mutagenesis (SLIM). Srt5M and Srt2A in pET29b vectors were purchased (Addgene plasmids #75144 and #75145) and mutations were introduced by SLIM cloning. pBAD Duett vectors were created by Gibson Assembly (New England Biolabs). pET17 vectors containing modified ubiquitin and SUMO were created by restriction cloning. Point mutations and insertions were introduced using SLIM. An overview of bacterial and mammalian cell constructs is given in Supplementary Tables S8 and S9.

Supplementary Tables S1-S9

[0188] Supplementary Table S1. Primers used for PylRS library (lib-shuffle) construction

Primer Sequence (5’-3’)

Random C-lobe Megaprimer rev CTTTGAGCAGACGTTCCAGGCC

Shuffle C-lobe rev CGTTTUAAACTGCAGTTACAGGTTCGTGC

Shuffle PylRS C-lobe rev GGTTTGAAACTGCAGTTACAGGTTCGTGG

PylRS seq

PylRS seq rev CCTACAA

[0189] Supplementary Table S2. Overview of PylRS variants used for lib-shuffle generation

PylRS Variant # AA266 AA270 AA271 AA274 AA311 AA313 AA349 AA366 AA382

2 L270I Y271F L274A N311A C313V

18 Error-Prone PCR results of 495 base-pair PylRS C-lobe (G223 to F387)

[0190] Supplementary Table S3. Composition of non-inducing media

Components Media concentration (1x)

10% Glycerol

50x M 1x

20% Arabinose

[0191] Supplementary Table S4. Composition of auto-induction media

Components Media concentration (1x)

10% Glycerol 0.5%

50x M 1x

20% Arabinose 0.05%

1M Nicotinamide

[0192] Supplementary Table S5. Composition of 50x M mix. Store at RT.

Components 50x Stock concentration

KH₂PO₄ 1.25 M

Na₂SO₄ 0.25 M

[0193] Supplementary Table S6. Components of the 25x 17 AA mix. Store at 4 °C.

Components 25x Stock concentration

Aspartic acid 5 mg/mL

Arginine-HCl 3 mg/mL

Alanine 5 mg/mL

Glycine 5 mg/mL

Serine 5 mg/mL

Asparagine-H₂O 3 mg/mL

Leucine 5 mg/mL

Tryptophan 5 mg/mL

[0194] Supplementary Table S7. Components of the 5000x trace metal stock solution. All 30 mL stock solutions were prepared by dissolving the denoted salts in ddH₂O, except FeCl₃, which was dissolved in 0.1 M HCl. The mix was sterile-filtered and stored at 4 °C.

Components for 30 mL stock solution Media concentration 1x
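The media tables give stocks at 50x (M mix), 25x (17 AA mix) and 5000x (trace metals), plus glycerol and arabinose as 10 % and 20 % stocks used at 0.5 % and 0.05 % in auto-induction medium. The volume of each stock per litre of final medium follows from C₁V₁ = C₂V₂. The sketch assumes the 17 AA mix is used at 1x (its working concentration is not stated in the tables above); the names are illustrative.

```python
"""Stock volumes per litre of final auto-induction medium (C1*V1 = C2*V2)."""

# name -> (stock concentration, working concentration), same unit within a row.
STOCKS = {
    "50x M mix": (50.0, 1.0),
    "25x 17 AA mix": (25.0, 1.0),          # assumed to be used at 1x
    "5000x trace metals": (5000.0, 1.0),
    "10% glycerol": (10.0, 0.5),           # percent -> percent
    "20% arabinose": (20.0, 0.05),         # percent -> percent
}


def stock_volume_ml(stock: float, working: float, medium_ml: float = 1000.0) -> float:
    """Volume of stock that gives the working concentration in medium_ml of final medium."""
    return working * medium_ml / stock


if __name__ == "__main__":
    for name, (stock, working) in STOCKS.items():
        print(f"{name}: {stock_volume_ml(stock, working):g} mL per L")
```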

[0195] Supplementary Table S8. Bacterial constructs.

Plasmid Purpose

pPylT_sfGFP-N150TAG-His6 sfGFP-N150TAG-His6 under arabinose promoter with a C-terminal His6-tag and a PylT copy under constitutive promoter

pPylT_PCNA-K164TAG-CPD-His6 PCNA-K164TAG-CPD under arabinose promoter with a C-terminal His6-tag and a PylT copy under constitutive promoter

pET17b_His6-TEV-SUMO1(AT) SUMO1 with N-terminal His6-tag, TEV site and C-terminus LAQTG under an IPTG-inducible T7 promoter

pBad_Duett Srt2A Ub(LAT) Polycistronic Duett vector harbouring both Srt2A and Ub(LAT) under an arabinose promoter

pBad_Duett Srt2A SUMO1(LAT) Polycistronic Duett vector harbouring both Srt2A and SUMO(LAT) under an arabinose promoter

pBad_Duett mSrt2A SUMO1(AT) Polycistronic Duett vector harbouring both mSrt2A and SUMO(AT) under an arabinose promoter

[0196] Supplementary Table S9. Mammalian cell constructs.

Plasmid Purpose

pEF1-PCNA-K164TAG-His6-4xPylT PCNA-K164TAG with C-terminal His-tag under an EF1 promoter and 4 copies of PylT

Claims

1. A method for modifying a polypeptide, comprising:

(i) providing a transpeptidase;

(iv) obtaining the modified polypeptide.

2. The method according to claim 1, wherein the unnatural amino acid comprises two or more amino acid residues, wherein the first amino acid residue is integrated in the amino acid chain of the first polypeptide,

wherein the first amino acid is preferably a Lysine residue and wherein the two or more further amino acids are linked via an isopeptide-bond to the amino group in the side chain of the first amino acid.

3. The method according to claim 1 or 2, wherein the transpeptidase is sortase A comprising an amino acid sequence which has an identity of at least 60% to the amino acid sequence shown in SEQ ID NO: 1, preferably sortase A (SrtA) from Staphylococcus aureus or a mutant thereof capable of conjugating the second polypeptide to the first polypeptide, such as Srt2A having an amino acid sequence as shown in SEQ ID NO: 2, Srt5M having an amino acid sequence as shown in SEQ ID NO: 3, Srt7M having an amino acid sequence as shown in SEQ ID NO: 4, mSrt2A having an amino acid sequence as shown in SEQ ID NO: 5, or Srt4S having an amino acid sequence as shown in SEQ ID NO: 6.

4. The method according to any one of claims 1 to 3, wherein the unnatural amino acid is N⁶-((2-azidoacetyl)glycyl)-L-lysine (AzGGK) or N⁶-(2-azidoacetyl)-L-lysine (AzGK) and wherein transpeptidation is induced by providing a phosphine, preferably 2-(diphenylphosphino)benzoic acid (2DPBA); or wherein the unnatural amino acid is GK or GGK whose N-terminal glycine residue is protected with a photoremovable protecting group, preferably 2-nitrobenzyl or coumarin, and wherein deprotection to GK or GGK is induced with a light pulse.

5. The method according to any one of claims 1 to 3, wherein the unnatural amino acid is N⁶-glycylglycyl-L-lysine (GGK) or N⁶-glycyl-L-lysine (GK).

6. The method according to any one of claims 3 to 5, wherein the second polypeptide is a ubiquitin or ubiquitin-like-protein, wherein the C-terminus of the ubiquitin or ubiquitin-like-protein has been modified such that it comprises a recognition motif for the sortase A enzyme,

wherein the ubiquitin-like-protein is preferably SUMO1, SUMO2, NEDD8, URM1, Ufm1, ATG8, ATG12, FAT10 or ISG15.

7. The method according to any one of claims 1 to 6, wherein the method is performed in a host cell, wherein the first polypeptide is expressed in said host cell comprising:

b) providing the host cell with a polynucleotide encoding the first polypeptide, comprising one or more codons recognized by the heterologous tRNA; and culturing the host cell under conditions allowing expression of the first polypeptide of interest.

8. The method according to claim 7, wherein the orthogonal heterologous tRNA synthetase/tRNA pair is a pyrrolysyl tRNA synthetase and a tRNA from a Methanosarcina species, preferably the pyrrolysyl tRNA synthetase is a mutant pyrrolysyl tRNA synthetase having an amino acid sequence selected from the group consisting of SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, and SEQ ID NO: 13 and/or the tRNA is the Methanosarcina barkeri tRNA_CUA as shown in SEQ ID NO: 14.

9. The method according to claim 7 or 8, wherein the host cell is a vertebrate cell, mammalian cell, human cell, animal cell, invertebrate cell, plant cell, nematodal cell, insect cell, stem cell, fungal cell, yeast cell, a bacterial cell, or a multicellular organism comprising a host cell, wherein the multicellular organism is preferably a mammal, an insect or a nematode and more preferably Caenorhabditis elegans, Drosophila or mouse.

10. The method according to any one of claims 1 to 6, wherein the method is performed in vitro, wherein the modified polypeptide is obtained by admixing and incubating the provided components under conditions allowing transpeptidase mediated transpeptidation reaction.

11. The method according to any one of claims 3 to 10, wherein the second polypeptide comprises a recognition motif selected from the group consisting of SEQ ID NO: 15 (LPXTG), SEQ ID NO: 16 (LLPXTG), SEQ ID NO: 17 (LAXTG), SEQ ID NO: 18 (LLAXTG), SEQ ID NO: 19 (LPXSG) and SEQ ID NO: 20 (LLPXSG).

12. The method according to any one of claims 1 and 6-10, wherein

(i) the transpeptidase is a subtiligase;

(ii) the first polypeptide comprises an unnatural amino acid as defined in claim 4 or 5; and

wherein the subtiligase has preferably an amino acid sequence which has an identity of at least 70% to the amino acid sequence shown in SEQ ID NO: 41.

13. The method according to any one of claims 6 to 11, wherein the ubiquitin or ubiquitin-like-protein comprises a recognition motif for a first sortase A enzyme and one or more Lysine residues in the ubiquitin or ubiquitin-like-protein have been substituted with an unnatural amino acid as defined in claim 5, the method further comprising

(vi) incubating the polypeptide obtained in step (v) with

a ubiquitin or ubiquitin-like protein comprising a C-terminus comprising a recognition motif for a second sortase A enzyme that is different from the recognition motif of the first sortase A enzyme, wherein one or more Lysine residues in the ubiquitin or ubiquitin-like-protein have been substituted with an unnatural amino acid as defined in claim 5, and

said second sortase A;

(viii) obtaining a modified polypeptide comprising a chain of two or more ubiquitins or ubiquitin-like-proteins.

14. The method according to claim 12, wherein one or more Lysine residues in the ubiquitin or ubiquitin-like-protein have been substituted with an unnatural amino acid as defined in claim 5, the method further comprising

(v) removing the azido group of AzGGK or AzGK of the unnatural amino acid in the ubiquitin or ubiquitin-like-protein by providing a phosphine, preferably 2-(diphenylphosphino)benzoic acid (2DPBA), or removing the photoremovable protecting group by providing a light pulse;

(vi) incubating the polypeptide obtained in step (v) with

a ubiquitin or ubiquitin-like protein comprising a thioester at the C-terminus, wherein one or more Lysine residues in the ubiquitin or ubiquitin-like-protein have been substituted with an unnatural amino acid as defined in claim 5, and

a subtiligase;

(vii) optionally repeating step (v) and (vi); and

15. The method according to any one of the preceding claims, wherein the unnatural amino acid comprises a free amino group.

16. A modified polypeptide obtainable by the method according to any one of claims 1-15.

17. A multidomain or non-refoldable polypeptide conjugated to one or more ubiquitins or ubiquitin-like-proteins, wherein the one or more ubiquitins or ubiquitin-like-proteins comprise a C-terminal amino acid sequence selected from the group consisting of SEQ ID NO: 15 (LPXTGG), SEQ ID NO: 16 (LLPXTGG), SEQ ID NO: 17 (LAXTGG), SEQ ID NO: 18 (LLAXTGG), SEQ ID NO: 19 (LPXSGG) and SEQ ID NO: 20 (LLPXSGG).

18. A pyrrolysyl tRNA synthetase comprising an amino acid sequence having at least 70% sequence identity to the amino acid sequence set forth in

(i) SEQ ID NO: 10, wherein amino acid residue 274 is Alanine, amino acid residue 311 is Glutamine and amino acid residue 313 is Serine;

(ii) SEQ ID NO: 11, wherein amino acid residue 271 is Leucine, amino acid residue 274 is Alanine and amino acid residue 313 is Phenylalanine; and

(iii) SEQ ID NO: 12, wherein amino acid residue 266 is Methionine, amino acid residue 270 is Isoleucine, amino acid residue 271 is Phenylalanine, amino acid residue 274 is Alanine and amino acid residue 313 is Phenylalanine.

19. A polynucleotide encoding a pyrrolysyl tRNA synthetase as defined in claim 18.

20. Use of the pyrrolysyl tRNA synthetase according to claim 18 in the method according to any one of claims 7 to 15 or for genetic code expansion.

21. A sortase A mutant comprising an amino acid sequence having at least 60% sequence identity to the amino acid sequence set forth in SEQ ID NO: 5, wherein amino acid residue 36 is Arginine, amino acid residue 44 is Cysteine, amino acid residue 46 is Histidine, amino acid residue 47 is Lysine, amino acid residue 50 is Glutamine, amino acid residue 80 is Proline, amino acid residue 94 is Isoleucine, amino acid residue 102 is Lysine, amino acid residue 104 is Histidine, amino acid residue 106 is Asparagine, amino acid residue 107 is Alanine, amino acid residue 109 is Glutamic acid, amino acid residue 115 is Glutamic acid, amino acid residue 124 is Valine, amino acid residue 132 is Glutamic acid, and amino acid residue 138 is Serine.

22. The sortase A mutant according to claim 21, wherein the sortase A mutant is calcium-independent.

23. A polynucleotide encoding a sortase A mutant as defined in claim 21 or 22.

24. Use of the sortase A mutant according to claim 21 or 22 in the method according to any one of claims 3 to 11, 13 and 15 or for catalyzing a transpeptidation reaction.

25. A kit comprising one or more polynucleotides encoding

(i) a sortase A as defined in claim 2 or a subtiligase as defined in claim 12;

(iii) an orthogonal heterologous tRNA synthetase/tRNA pair as defined in claim 7 and 8; and

an unnatural amino acid as defined in claim 4 or 5.

26. A host cell comprising:

(i) a sortase A as defined in claim 2 or a subtiligase as defined in claim 12;

(iv) optionally a polypeptide comprising one or more unnatural amino acids as defined in claim 4 or 5.