WO2024118635A2

WO2024118635A2 - Thermostable binding scaffolds

Info

Publication number: WO2024118635A2
Application number: PCT/US2023/081399
Authority: WO
Inventors: Richard J. SUDERMAN; David M. Chao
Original assignee: Nectagen, Inc.
Priority date: 2022-11-28
Filing date: 2023-11-28
Publication date: 2024-06-06
Also published as: WO2024118635A3

Abstract

The present invention features thermostable protein binding scaffolds containing framework regions and variable loop regions that can be mutagenized to bind a desired target. The scaffolds are derived from the Carbohydrate Binding Module Family 32 (CBM32) protein domain of Clostridium perfringens hyaluronidase (NagH). The robust framework of the scaffolds described herein allows for the development of custom, high performance affinity chromatography resins compatible with the harsh conditions of process-scale applications that can be adaptable to a wide diversity of target substrates.

Description

THERMOSTABLE BINDING SCAFFOLDS

Sequence Listing

This application contains a Sequence Listing which has been filed electronically in Extensible Markup Language (XML) format and is hereby incorporated by reference in its entirety. Said XML copy, created on November 22, 2023, is named 51027-005W02_Sequence_Listing_11_22_23.XML and is 526,678 bytes in size.

Statement Regarding Federally Sponsored Research

This invention was made with government support under Grant No. 1 R43 GM143942-01, awarded by the National Institutes of Health. The government has certain rights in the invention.

Background of the Invention

Affinity chromatography (AC) with target-specific, immobilized capture agents is an established method of protein purification. In this technique, a capture agent, such as a protein, nucleic acid, or small molecule, is coupled to a solid support, which can then be used to isolate a protein of interest from a complex mixture. The technique has been widely used at the laboratory scale for single-step purifications of diverse target proteins, including enzymes, transcription factors, growth factors and antibodies.

Use of protein-based capture agents for AC in industrial applications has been less widespread because currently available approaches are incompatible with the temperatures, pH extremes, and solvents often needed for process-scale purification or are useful for only a limited number of targets. An exception is the purification of kg-quantities of antibodies with AC resins based on Staphylococcal Protein A. The development of Protein A resins highlights the use of protein engineering to improve the robustness of AC resins as well as some remaining limitations. Early versions of resins with wild-type Protein A captured antibodies with high selectivity and capacity from cell culture media feedstock but lost activity gradually after multiple cycles of cleaning in place with sodium hydroxide. Mutagenesis of Protein A yielded variants with increased resistance to sodium hydroxide treatment and higher binding capacity. Despite the widespread use of Protein A resins, they are nonetheless limited to the purification of antibodies.

For the process-scale purification of non-antibody targets, the use of AC is much less widespread than the use of Protein A to purify antibodies. For instance, non-protein, ligand-based approaches, such as small molecule substrate mimetics, are effective but are limited to specific enzyme classes and are difficult to use with a general protein of interest. Alternatively, specialized affinity resins, such as glutathione or nickel, require the addition of non-native tags, which cause downstream complications for proteins intended for therapeutic use. Immuno-AC with antibody- or nanobody-based capture agents is the most generally applicable approach and has been widely used to purify a diverse range of proteins at laboratory scale. However, immuno-AC has some limitations. The conjugation of the antibody to the resin often results in heterogeneous coupling because of a lack of precise control over the sites of conjugation. In addition, chromatography must be performed under oxidizing conditions in order to preserve the disulfide bonds essential for maintaining antibody structure. Elution of the target also usually requires low or high pH conditions that are incompatible with some target proteins. For process-scale applications, the chief limitation of immuno-AC resins is their sensitivity to the sodium hydroxide solutions that are preferred for cleaning-in-place procedures. Because of these limitations, new capture agents are needed that can perform under the variety of extreme conditions necessary for robust target purification.

Summary of the Invention

In one aspect, the invention features a protein scaffold that includes framework regions and loop regions. The protein scaffold has the structure:

A-F1-L1-F2-L2-F3-L3-F4-L4-F5-L5-F6-L6-F7-L7-F8-L8-F9-B, wherein F1-F9 correspond to framework regions 1-9, respectively; L1-L8 correspond to loop regions 1-8, respectively;

A and B are each, independently, absent or include at least one amino acid;

F1 includes the sequence of: (T/S)-LI-(H/R)-(T/S)-(P/E)-(G/S)-W (SEQ ID NO: 4) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 4;

L1 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F2 includes the sequence of: G-(S/N/T)-E-(A/S)-(D/N/S/A)-LLDGDD-(S/N/T)-TGV-(E/W/A)-Y (SEQ ID NO: 5) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 5;

L2 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F3 includes the sequence of: S-(L/V)-AGEFIGLDLG (SEQ ID NO: 6) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 6;

L3 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F4 includes the sequence of: G-(I/V)-(H/R/Y/N)-FVIG-(A/K/R)-(D/N) (SEQ ID NO: 7) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 7;

L4 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F5 includes the sequence of: DKW-(T/N/S)-(R/K)-F-(R/K)-LEYS (SEQ ID NO: 8) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 8;

L5 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F6 includes the sequence of: WTTI-(R/K/H/Q)-EYD-(H/K/R/Q) (SEQ ID NO: 9) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 9;

L6 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F7 includes the sequence of: (Q/K)-DVI-(D/E)-E-(D/S)-F (SEQ ID NO: 10) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 10;

L7 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F8 includes the sequence of: (Q/K/R)-YIRLTNLE (SEQ ID NO: 11) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 11;

L8 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids); and

F9 includes the sequence of: LTFSEFA-(I/V)-VS (SEQ ID NO: 12) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 12.

As described herein, a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof includes a sequence having, for example, one insertion, two insertions, one deletion, two deletions, one substitution mutation, two substitution mutations, one insertion and one deletion, one insertion and one substitution mutation, or one deletion and one substitution mutation.
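The one-or-two-change criterion described above can be illustrated computationally. The following is a minimal sketch, not part of the described subject matter, that counts each insertion, deletion, or substitution as one edit and computes the Levenshtein distance between a candidate and a fully specified reference sequence. The function names are hypothetical. The reference used in the usage note, TLIHTPGW, is one concrete instance of the F1 consensus of SEQ ID NO: 4; comparing against a consensus with alternative residues would require checking each permitted instance, which this sketch does not attempt.

```python
def edit_distance(candidate: str, reference: str) -> int:
    """Minimum number of single-residue insertions, deletions, or
    substitutions separating two amino acid sequences (Levenshtein)."""
    # previous[j] holds the distance between the first i-1 residues of
    # `candidate` and the first j residues of `reference`.
    previous = list(range(len(reference) + 1))
    for i, c in enumerate(candidate, start=1):
        current = [i]
        for j, r in enumerate(reference, start=1):
            substitution = previous[j - 1] + (c != r)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def within_two_edits(candidate: str, reference: str) -> bool:
    """True if `candidate` equals `reference` or differs by one or two edits."""
    return edit_distance(candidate, reference) <= 2
```

For example, under these assumptions, within_two_edits("TLIRTPGW", "TLIHTPGW") is true (one substitution), whereas a candidate carrying three substitutions, such as "SLIRTEGW", is rejected.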

In some embodiments, the protein scaffold includes at least one mutation selected from the group consisting of N807X, S809X, R812X, S813X, E814X, S815X1, D818X, N822X, N825X, N832X, W836X, K857X, E858X, I859X, K860X2, L861X, D862X, R865X, K870X, N871X, N880X, K881X, K883X, N890X, K897X, K901X, K908X, E912X, S914X, and K922X3 relative to SEQ ID NO: 1, wherein:

X is any amino acid except the amino acid in the equivalent position in SEQ ID NO: 1;

X1 is any amino acid except R or S;

X2 is any amino acid except P or K; and

X3 is any amino acid except R or K.
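The mutation notation used above (wild-type residue, position in SEQ ID NO: 1, replacement residue) and its exclusion rules can be checked mechanically. The sketch below is illustrative only: the names parse_mutation, is_permitted, and EXCLUDED are hypothetical, and the code assumes one-letter residue codes and encodes only the three position-specific exclusions stated for positions 815, 860, and 922. It does not verify that a position is one of the listed positions or that the wild-type letter matches SEQ ID NO: 1.

```python
import re

# One-letter wild-type residue, numeric position, one-letter replacement.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

# Replacement residues excluded at specific positions, per the rules above.
EXCLUDED = {
    815: {"R", "S"},  # S815: any amino acid except R or S
    860: {"P", "K"},  # K860: any amino acid except P or K
    922: {"R", "K"},  # K922: any amino acid except R or K
}


def parse_mutation(code: str) -> tuple[str, int, str]:
    """Split a code such as 'K870A' into ('K', 870, 'A')."""
    match = MUTATION.match(code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def is_permitted(code: str) -> bool:
    """True if the replacement differs from the wild-type residue and is
    not excluded at that position."""
    wild_type, position, replacement = parse_mutation(code)
    if replacement == wild_type:
        return False
    return replacement not in EXCLUDED.get(position, set())
```

Under these assumptions, is_permitted("S815G") and is_permitted("K922Q") are true, while is_permitted("K860P") is false.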

In some embodiments,

F1 includes the sequence of: (T/S)-LI-(H/R)-(T/S)-(P/E)-(G/S)-W (SEQ ID NO: 4) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 4;

F2 includes the sequence of: G-(S/N/T)-E-(A/S)-(D/N/S/A)-LLDGDD-(S/N/T)-TGV-(E/W/A)-Y (SEQ ID NO: 5) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 5;

F3 includes the sequence of: S-(L/V)-AGEFIGLDLG (SEQ ID NO: 6) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 6;

F4 includes the sequence of: G-(I/V)-(H/R/Y/N)-FVIG-(A/K/R)-(D/N) (SEQ ID NO: 7) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 7;

F5 includes the sequence of: DKW-(T/N/S)-(R/K)-F-(R/K)-LEYS (SEQ ID NO: 8) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 8;

F6 includes the sequence of: WTTI-(R/K/H/Q)-EYD-(H/K/R/Q) (SEQ ID NO: 9) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 9;

F7 includes the sequence of: (Q/K)-DVI-(D/E)-E-(D/S)-F (SEQ ID NO: 10) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 10;

F8 includes the sequence of: (Q/K/R)-YIRLTNLE (SEQ ID NO: 11) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 11; and

F9 includes the sequence of: LTFSEFA-(I/V)-VS (SEQ ID NO: 12) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 12.

In some embodiments, F1 includes the sequence of: (T/S)-LI-(H/R)-(T/S)-(P/E)-(G/S)-W (SEQ ID NO: 4);

F2 includes the sequence of: G-(S/N/T)-E-(A/S)-(D/N/S/A)-LLDGDD-(S/N/T)-TGV-(E/W/A)-Y (SEQ ID NO: 5);

F3 includes the sequence of: S-(L/V)-AGEFIGLDLG (SEQ ID NO: 6);

F4 includes the sequence of: G-(I/V)-(H/R/Y/N)-FVIG-(A/K/R)-(D/N) (SEQ ID NO: 7);

F5 includes the sequence of: DKW-(T/N/S)-(R/K)-F-(R/K)-LEYS (SEQ ID NO: 8);

F6 includes the sequence of: WTTI-(R/K/H/Q)-EYD-(H/K/R/Q) (SEQ ID NO: 9);

F7 includes the sequence of: (Q/K)-DVI-(D/E)-E-(D/S)-F (SEQ ID NO: 10);

F8 includes the sequence of: (Q/K/R)-YIRLTNLE (SEQ ID NO: 11); and

F9 includes the sequence of: LTFSEFA-(I/V)-VS (SEQ ID NO: 12).
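The consensus notation used for the framework regions, in which alternatives at one position are written as (T/S) and positions are joined by hyphens, maps directly onto a regular expression. The following is a minimal sketch under that reading; the function names are hypothetical, and the example sequences are the concrete framework sequences of SEQ ID NOs: 22-30 recited elsewhere in this description. It tests exact conformity only and does not model permitted insertions, deletions, or substitutions.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Convert e.g. '(T/S)-LI-(H/R)' into the pattern '[TS]LI[HR]'."""
    parts = []
    for token in motif.split("-"):
        if token.startswith("(") and token.endswith(")"):
            # A parenthesized token lists the alternatives at one position.
            parts.append("[" + "".join(token[1:-1].split("/")) + "]")
        else:
            # A bare token is a run of invariant residues.
            parts.append(token)
    return "".join(parts)


def conforms(sequence: str, motif: str) -> bool:
    """True if the whole sequence matches the consensus motif exactly."""
    return re.fullmatch(motif_to_regex(motif), sequence) is not None


# Consensus motifs F1-F9 (SEQ ID NOs: 4-12) as written in the text.
CONSENSUS = {
    "F1": "(T/S)-LI-(H/R)-(T/S)-(P/E)-(G/S)-W",
    "F2": "G-(S/N/T)-E-(A/S)-(D/N/S/A)-LLDGDD-(S/N/T)-TGV-(E/W/A)-Y",
    "F3": "S-(L/V)-AGEFIGLDLG",
    "F4": "G-(I/V)-(H/R/Y/N)-FVIG-(A/K/R)-(D/N)",
    "F5": "DKW-(T/N/S)-(R/K)-F-(R/K)-LEYS",
    "F6": "WTTI-(R/K/H/Q)-EYD-(H/K/R/Q)",
    "F7": "(Q/K)-DVI-(D/E)-E-(D/S)-F",
    "F8": "(Q/K/R)-YIRLTNLE",
    "F9": "LTFSEFA-(I/V)-VS",
}

# Concrete framework sequences (SEQ ID NOs: 22-30) from the description.
EXAMPLES = {
    "F1": "TLIHTPGW", "F2": "GSEADLLDGDDSTGVEY", "F3": "SLAGEFIGLDLG",
    "F4": "GIHFVIGAD", "F5": "DKWTRFRLEYS", "F6": "WTTIREYDH",
    "F7": "QDVIDEDF", "F8": "QYIRLTNLE", "F9": "LTFSEFAIVS",
}
```

Under this reading, each sequence in EXAMPLES conforms to the consensus of the same name, and a sequence such as "ALIHTPGW" does not conform to F1 because A is not among the alternatives at the first position.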

In some embodiments, the protein scaffold includes at least one mutation selected from the group consisting of N807X, S809X, R812X, S813X, E814X, S815X, D818X, N822X, N825X, N832X, W836X, K857X, E858X, I859X, K860X, L861X, D862X, R865X, K870X, N871X, N880X, K881X, K883X, N890X, K897X, K901X, K908X, E912X, S914X, and K922X relative to SEQ ID NO: 1, wherein X is any amino acid.

In some embodiments, the protein scaffold includes at least one mutation selected from the group consisting of N807D, S809T, R812H, S813T, E814P, S815G, D818V, N822S, N825D, N832S, W836E, K857E, E858V, I859V, K860E, L861V, D862G, R865H, K870A, N871D, N880T, K881R, K883R, N890G, K897R, K901H, K908Q, E912D, S914D, and K922Q relative to SEQ ID NO: 1.

In some embodiments, the at least one mutation is K870X and/or N890X. In some embodiments, the at least one mutation is K870A and/or N890G. In some embodiments, the at least one mutation is K870A. In some embodiments, the at least one mutation is N890G.

In some embodiments, the protein scaffold includes at least 3 fewer lysines relative to SEQ ID NO: 1. For example, in some embodiments, the protein scaffold includes at least 3, 4, 5, 6, 7, 8, 9, or 10 fewer lysines relative to SEQ ID NO: 1. In some embodiments, the protein scaffold includes at least 6 fewer lysines relative to SEQ ID NO: 1. In some embodiments, the protein scaffold includes 9 fewer lysines relative to SEQ ID NO: 1. In some embodiments, the protein scaffold does not include any lysines. In some embodiments, the protein scaffold includes at least 3 fewer asparagines relative to SEQ ID NO: 1. For example, in some embodiments, the protein scaffold includes at least 3, 4, 5, 6, 7, or 8 fewer asparagines relative to SEQ ID NO: 1. In some embodiments, the protein scaffold includes at least 5 fewer asparagines relative to SEQ ID NO: 1. In some embodiments, the protein scaffold includes 7 fewer asparagines relative to SEQ ID NO: 1. In some embodiments, the protein scaffold does not include any asparagines.

In some embodiments, A and B are each, independently, absent or at least one amino acid. For example, A and B may each, independently, be absent. In some embodiments, A and B are each, independently, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000 or more amino acids. In some embodiments, A and B are each, independently, from 0 to 1,000 amino acids, e.g., from 1 to 10 amino acids (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids), from 10 to 100 amino acids (e.g., 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 amino acids), or from 100 to 1,000 amino acids (e.g., 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1,000 amino acids).

In some embodiments, A and B are each, independently, absent or from 1 amino acid to 20 amino acids (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids).

In some embodiments, each of L1-L8 is, independently, from 1 amino acid to 20 amino acids (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids).

In some embodiments, each of L1-L8 is, independently, from 1 amino acid to 10 amino acids (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids). In some embodiments, each of L1-L8 is, independently, from 3 amino acids to 10 amino acids. In some embodiments, each of L1-L8 is, independently, from 3 amino acids to 8 amino acids.

In some embodiments, L1 is from 1 amino acid to 20 amino acids (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids). In some embodiments, L1 is from 0 to 5 amino acids (e.g., from 1 to 5 amino acids, e.g., 0, 1, 2, 3, 4, or 5 amino acids).

In some embodiments, L2 is from 1 amino acid to 20 amino acids (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids). In some embodiments, L2 is from 1 amino acid to 16 amino acids (e.g., from 4 to 16 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 amino acids).

In some embodiments, L3 is from 1 amino acid to 20 amino acids (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids). In some embodiments, L3 is 6 amino acids.

In some embodiments, L4 is from 1 amino acid to 20 amino acids (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids). In some embodiments, L4 is from 0 to 5 amino acids (e.g., from 1 to 5 amino acids, e.g., 0, 1, 2, 3, 4, or 5 amino acids).

In some embodiments, L5 is from 1 amino acid to 20 amino acids (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids). In some embodiments, L5 is 5 amino acids.

In some embodiments, L6 is from 1 amino acid to 20 amino acids (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids). In some embodiments, L6 is from 3 to 6 amino acids (e.g., 3, 4, 5, or 6 amino acids).

In some embodiments, L7 is from 1 amino acid to 20 amino acids (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids). In some embodiments, L7 is 4 or 5 amino acids.

In some embodiments, L8 is from 1 amino acid to 20 amino acids (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids). In some embodiments, L8 is from 4 to 6 amino acids (e.g., 4, 5, or 6 amino acids).

In some embodiments, L1 is 4 amino acids. In some embodiments, L2 is 7 amino acids. In some embodiments, L8 is 5 amino acids. In some embodiments, L1 is 4 amino acids, L2 is 7 amino acids, and/or L8 is 5 amino acids. In some embodiments, L1 is 4 amino acids, L2 is 7 amino acids, and L8 is 5 amino acids.

In some embodiments, L1 includes the sequence of: X1X2X3X4 (SEQ ID NO: 13), wherein each of X1-X4 is, independently, any amino acid. In some embodiments, X2 is V.

In some embodiments, L2 includes the sequence of: X1X2X3X4X5X6X7 (SEQ ID NO: 14), wherein each of X1-X7 is, independently, any amino acid.

In some embodiments, L8 includes the sequence of: X1X2X3X4X5 (SEQ ID NO: 15), wherein each of X1-X5 is, independently, any amino acid.
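As a rough illustration of the sequence space implied by fully variable loops, the following sketch counts the combinations available when L1, L2, and L8 are 4, 7, and 5 amino acids long. It is an assumption-laden estimate rather than a figure stated in this description: it assumes each variable position may hold any of the 20 canonical amino acids and that positions vary independently. The names LOOP_LENGTHS and theoretical_diversity are hypothetical.

```python
# Loop lengths for the embodiment in which L1, L2, and L8 are
# 4, 7, and 5 amino acids, respectively.
LOOP_LENGTHS = {"L1": 4, "L2": 7, "L8": 5}


def theoretical_diversity(loop_lengths: dict[str, int],
                          fixed_positions: int = 0,
                          alphabet_size: int = 20) -> int:
    """Number of distinct loop combinations if every non-fixed position
    can independently hold any of `alphabet_size` residues (assumed 20)."""
    variable = sum(loop_lengths.values()) - fixed_positions
    if variable < 0:
        raise ValueError("more fixed positions than loop positions")
    return alphabet_size ** variable
```

With all 16 positions variable this gives 20**16, roughly 6.6 x 10^20 combinations; fixing one position (for example, a single invariant residue in L1) reduces it to 20**15. Practical library sizes are typically far smaller than such theoretical bounds.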

In some embodiments, L4 includes the sequence of: X1X2X3X4X5X6X7 (SEQ ID NO: 14), wherein each of X1-X7 is, independently, any amino acid.

In some embodiments, L6 includes the sequence of: X1X2X3X4X5X6 (SEQ ID NO: 16), wherein each of X1-X6 is, independently, any amino acid.

In some embodiments, L8 includes at least two amino acids. In some embodiments, L8 includes at least one amino acid.

In some embodiments, L4 includes the sequence of: (G/D)-GGSS (SEQ ID NO: 17) or GDT or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 17 or GDT.

In some embodiments, L6 includes the sequence of TGAPAG (SEQ ID NO: 18) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 18.

In some embodiments, L4 includes the sequence of: (G/D)-GGSS (SEQ ID NO: 17) or GDT; and L6 includes the sequence of TGAPAG (SEQ ID NO: 18).

In some embodiments, L3 includes the sequence of: (E/K/S)-(V/E)-(V/I/T)-(E/K/P/S)-(V/L)-(G/D) (SEQ ID NO: 19) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 19.

In some embodiments, L5 includes the sequence of: LD-(G/N)-(E/S)-S (SEQ ID NO: 20) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 20.

In some embodiments, L7 includes at least one amino acid.

In some embodiments, L7 includes the sequence of ETPI-(S/E)-A (SEQ ID NO: 21) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 21.

In some embodiments, L3 includes the sequence of: (E/K/S)-(V/E)-(V/I/T)-(E/K/P/S)-(V/L)-(G/D) (SEQ ID NO: 19) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 19; L5 includes the sequence of: LD-(G/N)-(E/S)-S (SEQ ID NO: 20) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 20; and L7 includes the sequence of ETPI-(S/E)-A (SEQ ID NO: 21) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 21.

In some embodiments, A includes the sequence of (D/N/H)-P. In some embodiments, A includes the sequence of DP.

In some embodiments, B includes the sequence of DELE (SEQ ID NO: 35).

In some embodiments, F1 includes the sequence of: TLIHTPGW (SEQ ID NO: 22) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 22;

F2 includes the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 23;

F3 includes the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 24;

F4 includes the sequence of: GIHFVIGAD (SEQ ID NO: 25) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 25;

F5 includes the sequence of: DKWTRFRLEYS (SEQ ID NO: 26) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 26;

F6 includes the sequence of: WTTIREYDH (SEQ ID NO: 27) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 27;

L6 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F7 includes the sequence of: QDVIDEDF (SEQ ID NO: 28) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 28;

L7 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F8 includes the sequence of: QYIRLTNLE (SEQ ID NO: 29) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 29;

L8 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids); and

F9 includes the sequence of: LTFSEFAIVS (SEQ ID NO: 30) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 30.
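The A-F1-L1-...-L8-F9-B architecture can be sketched as a simple concatenation of the nine framework sequences above (SEQ ID NOs: 22-30) with caller-supplied loops and termini. This is an illustrative sketch only; the names FRAMEWORKS and assemble_scaffold are hypothetical, and the usage note uses empty loops (each loop may be absent) together with the terminal sequences DP and DELE described for A and B. It does not represent any particular scaffold sequence from the Sequence Listing.

```python
# Framework regions F1-F9 (SEQ ID NOs: 22-30), in order.
FRAMEWORKS = [
    "TLIHTPGW", "GSEADLLDGDDSTGVEY", "SLAGEFIGLDLG", "GIHFVIGAD",
    "DKWTRFRLEYS", "WTTIREYDH", "QDVIDEDF", "QYIRLTNLE", "LTFSEFAIVS",
]


def assemble_scaffold(loops: list[str], a: str = "", b: str = "") -> str:
    """Concatenate A-F1-L1-F2-...-L8-F9-B.

    `loops` holds L1-L8 in order; an empty string means the loop is absent.
    """
    if len(loops) != 8:
        raise ValueError("exactly eight loop sequences (L1-L8) are required")
    parts = [a]
    # Pair each framework with the loop that follows it; F9 has none.
    for framework, loop in zip(FRAMEWORKS, list(loops) + [""]):
        parts.append(framework)
        parts.append(loop)
    parts.append(b)
    return "".join(parts)


def residue_count(sequence: str, residue: str) -> int:
    """Count occurrences of a one-letter residue, e.g. 'K' for lysine."""
    return sequence.count(residue)
```

For example, assemble_scaffold([""] * 8, a="DP", b="DELE") yields a 99-residue string that begins DPTLIHTPGW and ends LTFSEFAIVSDELE. The frameworks alone contribute one lysine (in F5) and one asparagine (in F8), which residue_count can confirm.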

In some embodiments, F1 includes the sequence of: TLIHTPGW (SEQ ID NO: 22);

L1 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F2 includes the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23);

L2 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F3 includes the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24);

L3 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F4 includes the sequence of: GIHFVIGAD (SEQ ID NO: 25);

L4 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F5 includes the sequence of: DKWTRFRLEYS (SEQ ID NO: 26);

L5 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F6 includes the sequence of: WTTIREYDH (SEQ ID NO: 27);

L6 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F7 includes the sequence of: QDVIDEDF (SEQ ID NO: 28);

L7 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F8 includes the sequence of: QYIRLTNLE (SEQ ID NO: 29);

F9 includes the sequence of: LTFSEFAIVS (SEQ ID NO: 30).

In some embodiments, F1 comprises the sequence of: TLIHTPGW (SEQ ID NO: 22) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 22;

L1 is absent or comprises at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F2 comprises the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 23;

L2 is absent or comprises at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F3 comprises the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 24;

L3 comprises the sequence of: EVVEVG (SEQ ID NO: 31) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 31;

F4 comprises the sequence of: GIHFVIGAD (SEQ ID NO: 25) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 25;

L4 comprises the sequence of: GGGSS (SEQ ID NO: 32) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 32;

F5 comprises the sequence of: DKWTRFRLEYS (SEQ ID NO: 26) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 26;

L5 comprises the sequence of: LDGES (SEQ ID NO: 33) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 33;

F6 comprises the sequence of: WTTIREYDH (SEQ ID NO: 27) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 27;

L6 comprises the sequence of: TGAPAG (SEQ ID NO: 18) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 18;

F7 comprises the sequence of: QDVIDEDF (SEQ ID NO: 28) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 28;

L7 comprises the sequence of: ETPISA (SEQ ID NO: 34) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 34;

F8 comprises the sequence of: QYIRLTNLE (SEQ ID NO: 29) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 29;

L8 is absent or comprises at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids); and

F9 comprises the sequence of: LTFSEFAIVS (SEQ ID NO: 30) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 30.

In some embodiments, F1 comprises the sequence of: TLIHTPGW (SEQ ID NO: 22) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 22;

L1 is absent or comprises at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F2 comprises the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 23;

L2 is absent or comprises at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F3 comprises the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 24;

L3 comprises the sequence of: EVVEVG (SEQ ID NO: 31) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 31;

F4 comprises the sequence of: GIHFVIGAD (SEQ ID NO: 25) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 25;

L4 comprises the sequence of: GGGSS (SEQ ID NO: 32) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 32;

F5 comprises the sequence of: DKWTRFRLEYS (SEQ ID NO: 26) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 26;

L5 comprises the sequence of: LDGES (SEQ ID NO: 33) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 33;

F6 comprises the sequence of: WTTIREYDH (SEQ ID NO: 27) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 27;

L6 comprises the sequence of: TGAPAG (SEQ ID NO: 18) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 18;

F7 comprises the sequence of: QDVIDEDF (SEQ ID NO: 28) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 28;

L7 comprises the sequence of: ETPISA (SEQ ID NO: 34) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 34;

F8 comprises the sequence of: QYIRLTNLE (SEQ ID NO: 29) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 29;

L8 is absent or comprises at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids); and

F9 comprises the sequence of: LTFSEFAIVS (SEQ ID NO: 30) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 30.

In some embodiments, F1 includes the sequence of: TLIHTPGW (SEQ ID NO: 22);

F2 includes the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23);

F3 includes the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24);

L3 includes the sequence of: EVVEVG (SEQ ID NO: 31);

F4 includes the sequence of: GIHFVIGAD (SEQ ID NO: 25);

L4 includes the sequence of: GGGSS (SEQ ID NO: 32);

F5 includes the sequence of: DKWTRFRLEYS (SEQ ID NO: 26);

L5 includes the sequence of: LDGES (SEQ ID NO: 33);

F6 includes the sequence of: WTTIREYDH (SEQ ID NO: 27);

L6 includes the sequence of: TGAPAG (SEQ ID NO: 18);

F7 includes the sequence of: QDVIDEDF (SEQ ID NO: 28);

L7 includes the sequence of: ETPISA (SEQ ID NO: 34);

F8 includes the sequence of: QYIRLTNLE (SEQ ID NO: 29);

F9 includes the sequence of: LTFSEFAIVS (SEQ ID NO: 30).

In some embodiments, F1 includes the sequence of: TLIHTPGW (SEQ ID NO: 22);

L1 includes the sequence of: X1X2X3X4 (SEQ ID NO: 13), wherein each of X1-X4 is, independently, any amino acid;

F2 includes the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23);

L2 includes the sequence of: X1X2X3X4X5X6X7 (SEQ ID NO: 14), wherein each of X1-X7 is, independently, any amino acid;

F3 includes the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24);

L3 includes the sequence of: EVVEVG (SEQ ID NO: 31);

F4 includes the sequence of: GIHFVIGAD (SEQ ID NO: 25);

L4 includes the sequence of: GGGSS (SEQ ID NO: 32);

F5 includes the sequence of: DKWTRFRLEYS (SEQ ID NO: 26);

L5 includes the sequence of: LDGES (SEQ ID NO: 33);

F6 includes the sequence of: WTTIREYDH (SEQ ID NO: 27);

L6 includes the sequence of: TGAPAG (SEQ ID NO: 18);

F7 includes the sequence of: QDVIDEDF (SEQ ID NO: 28);

L7 includes the sequence of: ETPISA (SEQ ID NO: 34);

F8 includes the sequence of: QYIRLTNLE (SEQ ID NO: 29);

L8 includes the sequence of: X1X2X3X4X5 (SEQ ID NO: 15), wherein each of X1-X5 is, independently, any amino acid; and

F9 includes the sequence of: LTFSEFAIVS (SEQ ID NO: 30).
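As an illustration of how the regions recited above fit together, the following Python sketch concatenates the framework and loop sequences into a single polypeptide string. The ordering F1-L1-F2-L2-...-F8-L8-F9 is an assumption inferred from the region names, and the function name `assemble_scaffold` and the example loop sequences are hypothetical; the sketch does not describe any particular clone.

```python
# Framework and constant-loop sequences as recited in the embodiment above.
REGIONS = {
    "F1": "TLIHTPGW",           # SEQ ID NO: 22
    "F2": "GSEADLLDGDDSTGVEY",  # SEQ ID NO: 23
    "F3": "SLAGEFIGLDLG",       # SEQ ID NO: 24
    "L3": "EVVEVG",             # SEQ ID NO: 31
    "F4": "GIHFVIGAD",          # SEQ ID NO: 25
    "L4": "GGGSS",              # SEQ ID NO: 32
    "F5": "DKWTRFRLEYS",        # SEQ ID NO: 26
    "L5": "LDGES",              # SEQ ID NO: 33
    "F6": "WTTIREYDH",          # SEQ ID NO: 27
    "L6": "TGAPAG",             # SEQ ID NO: 18
    "F7": "QDVIDEDF",           # SEQ ID NO: 28
    "L7": "ETPISA",             # SEQ ID NO: 34
    "F8": "QYIRLTNLE",          # SEQ ID NO: 29
    "F9": "LTFSEFAIVS",         # SEQ ID NO: 30
}

# Variable loop lengths in this embodiment (SEQ ID NOs: 13, 14, and 15).
VARIABLE_LOOP_LENGTHS = {"L1": 4, "L2": 7, "L8": 5}

# ASSUMPTION: regions alternate F1-L1-F2-L2-...-F8-L8-F9 from N- to C-terminus.
ORDER = ["F1", "L1", "F2", "L2", "F3", "L3", "F4", "L4", "F5",
         "L5", "F6", "L6", "F7", "L7", "F8", "L8", "F9"]


def assemble_scaffold(variable_loops):
    """Concatenate framework and loop regions into one amino acid sequence."""
    parts = []
    for name in ORDER:
        if name in VARIABLE_LOOP_LENGTHS:
            loop = variable_loops[name]
            if len(loop) != VARIABLE_LOOP_LENGTHS[name]:
                raise ValueError(
                    f"{name} must have {VARIABLE_LOOP_LENGTHS[name]} residues"
                )
            parts.append(loop)
        else:
            parts.append(REGIONS[name])
    return "".join(parts)


# Hypothetical loop sequences chosen only to exercise the function.
example = assemble_scaffold({"L1": "AVAA", "L2": "GSGSGSG", "L8": "AAAAA"})
print(example)
```

Keeping the constant regions in one table and the variable loops as arguments mirrors the idea that only L1, L2, and L8 change between binders in this embodiment.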

In some embodiments, A includes the sequence of: DP;

F1 includes the sequence of: TLIHTPGW (SEQ ID NO: 22);

F2 includes the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23);

F3 includes the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24);

L3 includes the sequence of: EVVEVG (SEQ ID NO: 31);

F4 includes the sequence of: GIHFVIGAD (SEQ ID NO: 25);

L4 includes the sequence of: GGGSS (SEQ ID NO: 32);

F5 includes the sequence of: DKWTRFRLEYS (SEQ ID NO: 26);

L5 includes the sequence of: LDGES (SEQ ID NO: 33);

F6 includes the sequence of: WTTIREYDH (SEQ ID NO: 27);

L6 includes the sequence of: TGAPAG (SEQ ID NO: 18);

F7 includes the sequence of: QDVIDEDF (SEQ ID NO: 28);

L7 includes the sequence of: ETPISA (SEQ ID NO: 34);

F8 includes the sequence of: QYIRLTNLE (SEQ ID NO: 29);

L8 includes the sequence of: X1X2X3X4X5 (SEQ ID NO: 15), wherein each of X1-X5 is, independently, any amino acid;

F9 includes the sequence of: LTFSEFAIVS (SEQ ID NO: 30); and

B includes the sequence of: DELE (SEQ ID NO: 35).

In some embodiments, L1 includes the sequence of X1X2X3X4 (SEQ ID NO: 13), wherein each of X1, X3, and X4 is, independently, any amino acid, and X2 is V.

In another aspect, featured is a protein scaffold that includes a polypeptide having at least 80% (e.g., at least 85%, 90%, 95%, 97%, or 99%) sequence identity to SEQ ID NO: 3. In some embodiments, the polypeptide includes the sequence of SEQ ID NO: 3. In some embodiments, the polypeptide does not include the sequence of SEQ ID NO: 1. In some embodiments, the polypeptide does not include the sequence of SEQ ID NO: 2.

In some embodiments, the polypeptide includes at least one mutation selected from the group consisting of N807X, S809X, R812X, S813X, E814X, S815X1, D818X, N822X, N825X, N832X, W836X, K857X, E858X, I859X, K860X2, L861X, D862X, R865X, K870X, N871X, N880X, K881X, K883X, N890X, K897X, K901X, K908X, E912X, S914X, and K922X3 relative to SEQ ID NO: 1, wherein:

X1 is any amino acid except R or S;

X2 is any amino acid except P or K; and

X3 is any amino acid except R or K.
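The exclusions on X1, X2, and X3 can be checked mechanically. The sketch below is a minimal illustration, assuming single-letter mutation strings of the form wild-type residue, NagH position, new residue (e.g., S815A); the helper names `parse_mutation` and `is_permitted` are hypothetical. Only the three positions with stated exclusions are constrained here; any other position is treated as unconstrained, which is an assumption of the sketch.

```python
import re

# Residues excluded at the three constrained positions, as recited above:
# X1 (position 815) is not R or S; X2 (860) is not P or K; X3 (922) is not R or K.
EXCLUDED = {
    815: {"R", "S"},
    860: {"P", "K"},
    922: {"R", "K"},
}

MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(mutation):
    """Split a string such as 'S815A' into ('S', 815, 'A')."""
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation string: {mutation!r}")
    wild_type, position, new_residue = match.groups()
    return wild_type, int(position), new_residue


def is_permitted(mutation):
    """Return False if the new residue is excluded at that position."""
    _, position, new_residue = parse_mutation(mutation)
    return new_residue not in EXCLUDED.get(position, set())


for candidate in ("S815A", "S815R", "K860P", "K922E"):
    print(candidate, is_permitted(candidate))
```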

In some embodiments of any of the above aspects, the protein scaffold further includes a mutation that adds a cysteine residue. In some embodiments, the protein scaffold includes a first mutation that adds a first cysteine residue and a second mutation that adds a second cysteine residue. In some embodiments, the first cysteine residue and the second cysteine residue form a disulfide bond under oxidizing conditions.

In some embodiments, the protein scaffold comprises at least one mutation selected from the group consisting of F806C, P808C, S845C, L855C, V858C, V861C, K878C, W879C, L884C, L888C, A904C, P905C, A906GC, G907C, I924C, L926C, N928C, L936C, I943C, and L948C.

In some embodiments, the protein scaffold comprises at least two or more mutations selected from the group consisting of F806C, P808C, S845C, L855C, V858C, V861C, K878C, W879C, L884C, L888C, A904C, P905C, A906GC, G907C, I924C, L926C, N928C, L936C, I943C, and L948C.

In some embodiments, the protein scaffold comprises a pair of cysteine mutations selected from the group consisting of K878C and G907C, K878C and A904C, V861C and I943C, P905C and L855C, S845C and L936C, W879C and N928C, L884C and L926C, F806C and L948C, V858C and L888C, K878C and G907C, K878C and A906GC, S845C and N928C, K878C and A904C, P808C and I943C, V861C and I924C, P808C and V861C, and I943C and L855C.

In some embodiments, the pair of cysteine mutations is selected from the group consisting of K878C and G907C, K878C and A904C, S845C and L936C, W879C and N928C, W879C and N928C, L884C and L926C, V858C and L888C, K878C and G907C, and K878C and A906GC (i.e., the substitution of alanine 906 with glycine and cysteine).

In some embodiments of any of the above aspects, the protein scaffold further includes a tag covalently attached to the scaffold.

In some embodiments, the tag is an affinity tag (e.g., a polyhistidine tag, e.g., 4, 5, 6, 7, 8, 9, or 10 histidines), an epitope tag, a covalent tag, or a protein tag.

In some embodiments, the tag is attached to the N-terminus or the C-terminus of the scaffold.

In some embodiments, the scaffold is conjugated to a functional group. In some embodiments, the functional group includes biotin, streptavidin or a derivative of streptavidin, a polyethylene glycol moiety, a fluorescent dye, an enzyme, a radioactive moiety, a lanthanide, or a lanthanide binding motif.

In some embodiments, the scaffold is conjugated to a lanthanide or a lanthanide binding motif. In some embodiments, the lanthanide is terbium.

In some embodiments, the scaffold is conjugated to a radioactive moiety. In some embodiments, the radioactive moiety is an α or β emitter.

In some embodiments, the functional group is conjugated to a sulfhydryl group or a primary amine.

In another aspect, featured is a polynucleotide encoding a protein scaffold as described herein, e.g., of any of the above embodiments. In some embodiments, the polynucleotide is a ribonucleotide. In some embodiments, the polynucleotide is a deoxyribonucleotide.

In another aspect, featured is a vector that includes a polynucleotide as described herein.

In another aspect, featured is a cell that includes a polynucleotide encoding the protein scaffold or a vector that includes the polynucleotide.

In another aspect, featured is a method of producing a protein scaffold as described herein, e.g., of any of the above embodiments. The method includes the steps of (a) providing a cell transformed with a polynucleotide encoding the protein scaffold or a vector that includes the polynucleotide; and (b) culturing the transformed cell under conditions for expressing the polynucleotide, wherein the culturing results in expression of the protein scaffold. The method may further include (c) isolating the protein scaffold or using the protein scaffold to bind a target.

In another aspect, featured is a particle that includes the protein scaffold of any of the above embodiments. In some embodiments, the particle is a magnetic particle.

In another aspect, featured is a resin that includes a plurality of the particles, e.g., containing the protein scaffold.

In another aspect, featured is a column (e.g., a chromatography column) containing the particles or the resin, e.g., conjugated to the scaffold.

In another aspect, featured is a method of purifying a target molecule from a plurality of molecules. The method includes (a) providing a sample that includes a mixture of the target molecule and the plurality of molecules; (b) contacting the sample with the protein scaffold of any one of the above embodiments, wherein the scaffold specifically binds to the target molecule; and (c) separating the target molecule bound to the protein scaffold from the plurality of molecules.

In some embodiments, the step of separating includes immobilizing the protein scaffold.

In some embodiments, the protein scaffold is conjugated to a particle. In some embodiments, the particle includes a magnetic bead. In some embodiments, the protein scaffold is conjugated to a resin or monolith including a plurality of the particles.

Definitions

The Carbohydrate Binding Module Family 32 (CBM32) scaffold of SEQ ID NO: 1 is derived from a single protein domain of Clostridium perfringens hyaluronidase (NagH), a multi-domain enzyme consisting of 1627 amino acids. Amino acid residue 1 of SEQ ID NO: 1 corresponds to amino acid residue 807 of NagH, and amino acid residue 140 of SEQ ID NO: 1 corresponds to amino acid residue 946 of NagH. Amino acid positions and mutations described herein generally relate to the position on the corresponding full length NagH unless otherwise specified.
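The numbering relationship stated above (residue 1 of SEQ ID NO: 1 corresponds to NagH residue 807, and residue 140 to NagH residue 946) amounts to a fixed offset of 806. The following sketch shows the conversion in both directions; the function names are hypothetical and are introduced only for illustration.

```python
# Offset implied by the text: SEQ ID NO: 1 residue 1 == NagH residue 807.
NAGH_OFFSET = 806
SEQ1_LENGTH = 140  # residues 807-946 of NagH


def nagh_to_seq1(nagh_position):
    """Convert a full-length NagH position to a 1-based SEQ ID NO: 1 index."""
    index = nagh_position - NAGH_OFFSET
    if not 1 <= index <= SEQ1_LENGTH:
        raise ValueError(f"NagH position {nagh_position} is outside SEQ ID NO: 1")
    return index


def seq1_to_nagh(seq1_index):
    """Convert a 1-based SEQ ID NO: 1 index to a full-length NagH position."""
    if not 1 <= seq1_index <= SEQ1_LENGTH:
        raise ValueError(f"index {seq1_index} is outside SEQ ID NO: 1")
    return seq1_index + NAGH_OFFSET


print(nagh_to_seq1(807), nagh_to_seq1(946))  # first and last residues
print(nagh_to_seq1(878))                     # position named in K878C
```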

The term “constant region,” as used herein, generally refers to a region of a binding scaffold that does not include the variable loop regions involved in target binding. For example, a constant region may include a framework region (e.g., F1-F9) or a loop region (e.g., L3-L7) that is not one of the three loops mutagenized for target binding (e.g., L1, L2, and L8). A constant region may have sequence variability.

The term “non-naturally occurring amino acid,” as used herein, means non-proteinogenic amino acids. Examples of non-naturally occurring amino acids include D-amino acids; an amino acid having an acetylaminomethyl group attached to a sulfur atom of a cysteine; a pegylated amino acid; the omega amino acids of the formula NH2(CH2)nCOOH where n is 2-6, neutral nonpolar amino acids, such as sarcosine, t-butyl alanine, t-butyl glycine, N-methyl isoleucine, and norleucine; oxymethionine; phenylglycine; citrulline; methionine sulfoxide; cysteic acid; ornithine; diaminobutyric acid; 3-aminoalanine; 3-hydroxy-D-proline; 2,4-diaminobutyric acid; 2-aminopentanoic acid; 2-aminooctanoic acid, 2-carboxy piperazine; piperazine-2-carboxylic acid, 2-amino-4-phenylbutanoic acid; 3-(2-naphthyl)alanine, and hydroxyproline. Other amino acids are α-aminobutyric acid, α-amino-α-methylbutyrate, aminocyclopropane-carboxylate, aminoisobutyric acid, aminonorbornyl-carboxylate, L-cyclohexylalanine, cyclopentylalanine, L-N-methylleucine, L-N-methylmethionine, L-N-methylnorvaline, L-N-methylphenylalanine, L-N-methylproline, L-N-methylserine, L-N-methyltryptophan, D-ornithine, L-N-methylethylglycine, L-norleucine, α-methyl-aminoisobutyrate, α-methylcyclohexylalanine, D-α-methylalanine, D-α-methylarginine, D-α-methylasparagine, D-α-methylaspartate, D-α-methylcysteine, D-α-methylglutamine, D-α-methylhistidine, D-α-methylisoleucine, D-α-methylleucine, D-α-methyllysine, D-α-methylmethionine, D-α-methylornithine, D-α-methylphenylalanine, D-α-methylproline, D-α-methylserine, D-N-methylserine, D-α-methylthreonine, D-α-methyltryptophan, D-α-methyltyrosine, D-α-methylvaline, D-N-methylalanine, D-N-methylarginine, D-N-methylasparagine, D-N-methylaspartate, D-N-methylcysteine, D-N-methylglutamine, D-N-methylglutamate, D-N-methylhistidine, D-N-methylisoleucine, D-N-methylleucine, D-N-methyllysine, N-methylcyclohexylalanine, D-N-methylornithine, N-methylglycine, N-methylaminoisobutyrate, N-(1-methylpropyl)glycine, N-(2-methylpropyl)glycine, D-N-methyltryptophan, D-N-methyltyrosine, D-N-methylvaline, γ-aminobutyric acid, L-t-butylglycine, L-ethylglycine, L-homophenylalanine, L-α-methylarginine, L-α-methylaspartate, L-α-methylcysteine, L-α-methylglutamine, L-α-methylhistidine, L-α-methylisoleucine, L-α-methylleucine, L-α-methylmethionine, L-α-methylnorvaline, L-α-methylphenylalanine, L-α-methylserine, L-α-methyltryptophan, L-α-methylvaline, N-(N-(2,2-diphenylethyl) carbamylmethylglycine, 1-carboxy-1-(2,2-diphenyl-ethylamino) cyclopropane, 4-hydroxyproline, ornithine, 2-aminobenzoyl (anthraniloyl), D-cyclohexylalanine, 4-phenyl-phenylalanine, L-citrulline, α-cyclohexylglycine, L-1,2,3,4-tetrahydroisoquinoline-3-carboxylic acid, L-thiazolidine-4-carboxylic acid, L-homotyrosine, L-2-furylalanine, L-histidine (3-methyl), N-(3-guanidinopropyl)glycine, O-methyl-L-tyrosine, O-glycan-serine, meta-tyrosine, nor-tyrosine, L-N,N',N"-trimethyllysine, homolysine, norlysine, N-glycan asparagine, 7-hydroxy-1,2,3,4-tetrahydro-4-fluorophenylalanine, 4-methylphenylalanine, bis-(2-picolyl)amine, pentafluorophenylalanine, indoline-2-carboxylic acid, 2-aminobenzoic acid, 3-amino-2-naphthoic acid, asymmetric dimethylarginine, L-tetrahydroisoquinoline-1-carboxylic acid, D-tetrahydroisoquinoline-1-carboxylic acid, 1-amino-cyclohexane acetic acid, D/L-allylglycine, 4-aminobenzoic acid, 1-amino-cyclobutane carboxylic acid, 2 or 3 or 4-aminocyclohexane carboxylic acid, 1-amino-1-cyclopentane carboxylic acid, 1-aminoindane-1-carboxylic acid, 4-amino-pyrrolidine-2-carboxylic acid, 2-aminotetraline-2-carboxylic acid, azetidine-3-carboxylic acid, 4-benzyl-pyrrolidine-2-carboxylic acid, tert-butylglycine, β-(benzothiazolyl-2-yl)-alanine, β-cyclopropyl alanine, 5,5-dimethyl-1,3-thiazolidine-4-carboxylic acid, (2R,4S)4-hydroxypiperidine-2-carboxylic acid, (2S,4S) and (2S,4R)-4-(2-naphthylmethoxy)-pyrrolidine-2-carboxylic acid, (2S,4S) and (2S,4R)4-phenoxy-pyrrolidine-2-carboxylic acid, (2R,5S) and (2S,5R)-5-phenyl-pyrrolidine-2-carboxylic acid, (2S,4S)-4-amino-1-benzoyl-pyrrolidine-2-carboxylic acid, t-butylalanine, (2S,5R)-5-phenyl-pyrrolidine-2-carboxylic acid, 1-aminomethyl-cyclohexane-acetic acid, 3,5-bis-(2-amino)ethoxy-benzoic acid, 3,5-diamino-benzoic acid, 2-methylamino-benzoic acid, N-methylanthranilic acid, L-N-methylalanine, L-N-methylarginine, L-N-methylasparagine, L-N-methylaspartic acid, L-N-methylcysteine, L-N-methylglutamine, L-N-methylglutamic acid, L-N-methylhistidine, L-N-methylisoleucine, L-N-methyllysine, L-N-methylnorleucine, L-N-methylornithine, L-N-methylthreonine, L-N-methyltyrosine, L-N-methylvaline, L-N-methyl-t-butylglycine, L-norvaline, α-methyl-γ-aminobutyrate, 4,4'-biphenylalanine, α-methylcyclopentylalanine, α-methyl-α-naphthylalanine, α-methylpenicillamine, N-(4-aminobutyl)glycine, N-(2-aminoethyl)glycine, N-(3-aminopropyl)glycine, N-amino-α-methylbutyrate, α-naphthylalanine, N-benzylglycine, N-(2-carbamylethyl)glycine, N-(carbamylmethyl)glycine, N-(2-carboxyethyl)glycine, N-(carboxymethyl)glycine, N-cyclobutylglycine, N-cyclodecylglycine, N-cycloheptylglycine, N-cyclohexylglycine, N-cyclodecylglycine, N-cyclododecylglycine, N-cyclooctylglycine, N-cyclopropylglycine, N-cycloundecylglycine, N-(2,2-diphenylethyl)glycine, N-(3,3-diphenylpropyl)glycine, N-(3-guanidinopropyl)glycine, N-(1-hydroxyethyl)glycine, N-(hydroxyethyl)glycine, N-(imidazolylethyl)glycine, N-(3-indolylethyl)glycine, N-methyl-γ-aminobutyrate, D-N-methylmethionine, N-methylcyclopentylalanine, D-N-methylphenylalanine, D-N-methylproline, D-N-methylthreonine, N-(1-methylethyl)glycine, N-methyl-naphthylalanine, N-methylpenicillamine, N-(p-hydroxyphenyl)glycine, N-(thiomethyl)glycine, penicillamine, L-α-methylalanine, L-α-methylasparagine, L-α-methyl-t-butylglycine, L-methylethylglycine, L-α-methylglutamate, L-α-methylhomophenylalanine, N-(2-methylthioethyl)glycine, L-α-methyllysine, L-α-methylnorleucine, L-α-methylornithine, L-α-methylproline, L-α-methylthreonine, L-α-methyltyrosine, L-N-methyl-homophenylalanine, N-(N-(3,3-diphenylpropyl) carbamylmethylglycine, L-pyroglutamic acid, D-pyroglutamic acid, O-methyl-L-serine, O-methyl-L-homoserine, 5-hydroxylysine, α-carboxyglutamate, phenylglycine, L-pipecolic acid (homoproline), L-homoleucine, L-lysine (dimethyl), L-2-naphthylalanine, L-dimethyldopa or L-dimethoxy-phenylalanine, L-3-pyridylalanine, L-histidine (benzoyloxymethyl), N-cycloheptylglycine, L-diphenylalanine, O-methyl-L-homotyrosine, L-β-homolysine, O-glycan-threonine, Ortho-tyrosine, L-N,N'-dimethyllysine, L-homoarginine, neotryptophan, 3-benzothienylalanine, isoquinoline-3-carboxylic acid, diaminopropionic acid, homocysteine, 3,4-dimethoxyphenylalanine, 4-chlorophenylalanine, L-1,2,3,4-tetrahydronorharman-3-carboxylic acid, adamantylalanine, symmetrical dimethylarginine, 3-carboxythiomorpholine, D-1,2,3,4-tetrahydronorharman-3-carboxylic acid, 3-aminobenzoic acid, 3-amino-1-carboxymethyl-pyridin-2-one, 1-amino-1-cyclohexane carboxylic acid, 2-aminocyclopentane carboxylic acid, 1-amino-1-cyclopropane carboxylic acid, 2-aminoindane-2-carboxylic acid, 4-amino-tetrahydrothiopyran-4-carboxylic acid, azetidine-2-carboxylic acid, β-(benzothiazol-2-yl)-alanine, neopentylglycine, 2-carboxymethyl piperidine, β-cyclobutyl alanine, allylglycine, diaminopropionic acid, homo-cyclohexyl alanine, (2S,4R)-4-hydroxypiperidine-2-carboxylic acid, octahydroindole-2-carboxylic acid, (2S,4R) and (2S,4R)-4-(2-naphthyl), pyrrolidine-2-carboxylic acid, nipecotic acid, (2S,4R) and (2S,4S)-4-(4-phenylbenzyl) pyrrolidine-2-carboxylic acid, (3S)-1-pyrrolidine-3-carboxylic acid, (2S,4S)-4-tritylmercapto-pyrrolidine-2-carboxylic acid, (2S,4S)-4-mercaptoproline, t-butylglycine, N,N-bis(3-aminopropyl)glycine, 1-amino-cyclohexane-1-carboxylic acid, N-mercaptoethylglycine, and selenocysteine.
In some embodiments, amino acid residues may be charged or polar. Charged amino acids include arginine, lysine, aspartic acid, or glutamic acid, or non-naturally occurring analogs thereof. Polar amino acids include glutamine, asparagine, histidine, serine, threonine, tyrosine, methionine, or tryptophan, or non-naturally occurring analogs thereof. It is specifically contemplated that in some embodiments, a terminal amino group in the amino acid may be an amido group or a carbamate group.

As used herein, the term “percent (%) identity” refers to the percentage of amino acid residues of a candidate sequence, e.g., a protein scaffold, that are identical to the amino acid residues of a reference sequence, e.g., a wild-type CBM32 polypeptide, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent identity (i.e., gaps can be introduced in one or both of the candidate and reference sequences for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). Alignment for purposes of determining percent identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, ALIGN, or Megalign (DNASTAR) software. Those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared. In some embodiments, the percent amino acid sequence identity of a given candidate sequence to, with, or against a given reference sequence (which can alternatively be phrased as a given candidate sequence that has or includes a certain percent amino acid sequence identity to, with, or against a given reference sequence) is calculated as follows:

100 x (fraction of A/B)

where A is the number of amino acid residues scored as identical in the alignment of the candidate sequence and the reference sequence, and where B is the total number of amino acid residues in the reference sequence. In some embodiments where the length of the candidate sequence does not equal the length of the reference sequence, the percent amino acid sequence identity of the candidate sequence to the reference sequence would not equal the percent amino acid sequence identity of the reference sequence to the candidate sequence.
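The calculation above can be sketched in a few lines of Python. The sketch assumes the two sequences have already been aligned by an external tool and padded with a gap character to equal length; the function name `percent_identity` and the example sequences (the F1 sequence with two hypothetical extra residues) are illustrative only.

```python
def percent_identity(candidate_aligned, reference_aligned, gap="-"):
    """Return 100 x A/B for two pre-aligned, equal-length sequences.

    A = aligned positions where candidate and reference carry the same residue.
    B = total number of residues (non-gap characters) in the reference.
    """
    if len(candidate_aligned) != len(reference_aligned):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1
        for c, r in zip(candidate_aligned, reference_aligned)
        if c == r and c != gap
    )
    reference_length = sum(1 for r in reference_aligned if r != gap)
    if reference_length == 0:
        raise ValueError("reference sequence contains no residues")
    return 100.0 * identical / reference_length


# Hypothetical example: an 8-residue sequence aligned to a 10-residue one.
short = "TLIHTPGW--"
long_ = "TLIHTPGWAV"

print(percent_identity(short, long_))  # long sequence as reference
print(percent_identity(long_, short))  # short sequence as reference
```

Because B is taken from the reference, swapping the roles of the two sequences changes the result whenever their lengths differ, which is the asymmetry noted in the text.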

Brief Description of the Drawings

FIG. 1 is a schematic drawing showing an outline of the protein engineering campaign to produce a member of the nC-B class of nanoCLAMPs. The starting nanoCLAMP was anti-SUMO clone SMT3-A1, a member of the nC-A class of nanoCLAMPs. SMT3-A1 was mutated over 7 rounds. At the conclusion of each round, the performance of clone(s) combining different mutations from the round was assessed by DSF and SEC. The end product is clone P2788, whose constant regions served as the basis for the nC-B class of nanoCLAMPs.

FIG. 2 is a space-filling model of P2788, an example of the nC-B class of nanoCLAMP, with mutations in P2788 mapped onto the CBM-32-3 crystal structure, and the sequence of the constant region of the nC-A class of nanoCLAMPs. The constant regions of clone P2788 are the basis for the nC-B class of nanoCLAMPs. Side chains of mutated positions are labeled in green; side chains of variable loops are shown in red. Other side chains are shown in light gray. Backbone residues are shown in dark gray. The alignment compares the constant regions of the nC-A and nC-B classes of nanoCLAMPs. Residues denoted with black, bolded text represent nC-A positions where at least one mutation was tested (top row). Mutations were tested for 58% of positions in the constant regions (72 out of 124). Residues denoted with bolded text represent mutations in P2788 (bottom row). In P2788, 24% of positions in the constant regions are mutated relative to nC-A (30 out of 124).

FIG. 3 is a model showing the superimposition of the crystal structure of CBM32-2 (basis for the nC-A class of nanoCLAMPs) and the AlphaFold model of P2788 (basis for the nC-B class). CBM32-2 (PDB accession 2W1Q) and P2788 were aligned with jFATCAT (rigid) on the RCSB server, resulting in a high TM-score (0.95). Backbone deviations are apparent and expected for the loops, which have different amino acid sequences.

FIG. 4 is a graph showing differential scanning fluorimetry analysis of SMT3-A1 (nC-A class) and P2788 (nC-B class). Both clones show classically shaped melting curves with low initial fluorescence. The 30 mutations in P2788 increase its T_m by 24 °C relative to SMT3-A1.

FIGS. 5A-5F are gels and a graph showing protease resistance of nanoCLAMPs of the nC-A and nC-B classes. FIG. 5A shows SDS-PAGE analysis of SMT3-A1 (nC-A class) and P2788 (same variable loops as SMT3-A1 but with constant regions of the nC-B class) after 16 h incubation with trypsin or chymotrypsin. FIGS. 5B-5D show SDS-PAGE analysis of time course tryptic digestions of SMT3-A1, P2788, and P2808. P2788 and P2808 have the same constant regions (nC-B class), but different loops. P2788 and P2808 were resistant to tryptic digest for over 20 hours. FIG. 5E shows quantitative densitometry analysis of the time course-stained gels. FIG. 5F shows SDS-PAGE analysis of members of the nC-A class of nanoCLAMPs (SMT3-A1) and nC-B class (P2788, P2808, P2809, and P2811) following 16 h tryptic digestions.

FIG. 6 is a set of size exclusion chromatograms showing monodispersity and melting temperature analysis of anti-SUMO nanoCLAMPs of the nC-B class. Size exclusion chromatography (left panel) and differential scanning fluorescence (right panel) of nanoCLAMPs P2808, P2809, and P2811.

FIG. 7 is a graph showing dynamic binding capacity of SMT3-A1 resin (nC-A class) and P2808 resin (nC-B class). Breakthrough curves were generated by loading a solution of 0.2 mg/ml Sumo-GFP in PBS onto 0.6 ml of packed resin in a column (3 cm height x 5 mm ID) at a flow rate of 0.5 ml/min and measuring the fluorescence of the eluate. The percent fluorescence of the load was calculated by dividing the eluate fluorescence by the load fluorescence. The dynamic binding capacity (DBC) was calculated with the following formula: DBC = (V_x - V_delay) x c / V_resin, where V_x is the volume of eluate collected, V_delay is the elution volume of the load under non-binding conditions, c is the concentration of target in the load, and V_resin is the volume of the packed resin in the column. The P2808 resin has a dynamic binding capacity of 10 mg/ml resin (240 nmol/ml resin).
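The dynamic binding capacity formula in the FIG. 7 description can be written as a short calculation. In the sketch below, the load concentration (0.2 mg/ml) and resin volume (0.6 ml) come from the text; the eluate volume V_x and delay volume V_delay are hypothetical values chosen only so that the example lands near the reported 10 mg/ml, and the 42 kD molecular weight is the SUMO-GFP size given in the FIG. 13 description. The function names are illustrative.

```python
def dynamic_binding_capacity(v_x_ml, v_delay_ml, conc_mg_per_ml, v_resin_ml):
    """DBC = (V_x - V_delay) x c / V_resin, in mg of target per ml of resin."""
    if v_resin_ml <= 0:
        raise ValueError("resin volume must be positive")
    return (v_x_ml - v_delay_ml) * conc_mg_per_ml / v_resin_ml


def mg_to_nmol(mass_mg, mol_weight_kda):
    """Convert a mass in mg to nmol for a protein of the given size in kDa."""
    return mass_mg / mol_weight_kda * 1000.0


# From the text: c = 0.2 mg/ml, V_resin = 0.6 ml.
# HYPOTHETICAL: V_x = 31.0 ml and V_delay = 1.0 ml (not reported in the text).
dbc = dynamic_binding_capacity(
    v_x_ml=31.0, v_delay_ml=1.0, conc_mg_per_ml=0.2, v_resin_ml=0.6
)
print(f"DBC = {dbc:.1f} mg/ml resin")
print(f"    = {mg_to_nmol(dbc, 42.0):.0f} nmol/ml resin")
```

With a 42 kD target, 10 mg/ml corresponds to roughly 238 nmol/ml, consistent with the rounded 240 nmol/ml figure in the text.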

FIGS. 8A and 8B are gels showing performance of P2808 resin in single-step affinity chromatography purification of GFP-SUMO from spiked lysates in resin-limiting and protein-limiting scenarios. Identical columns were loaded under conditions that were 33% above (FIG. 8A) or 58% below (FIG. 8B) the column's dynamic binding capacity. E. coli lysates with spiked-in target protein (SUMO-GFP) were used as the load. After loading and washing, bound proteins were eluted with 3 M imidazole pH 8. Total protein loaded on SDS-PAGE in FIG. 8A: Lysate = 32 µg, Spiked Lysate = 34 µg, FT = 47 µg, Eluate = 6 µg; FIG. 8B: Lysate = 17 µg, Spiked Lysate = 17 µg, FT = 21 µg, Wash = NA, Eluate = 3 µg. Metrics of purifications in FIGS. 8A and 8B are tabulated in Table 6.

FIG. 9 is a graph showing the effect of sodium hydroxide treatment on binding capacity of nanoCLAMP capture agents of the nC-A and nC-B class. The binding capacities of resins with capture agents of the nC-A class (SMT3-A1, P1519, P1533) and the nC-B class (P2808, P2809, P2811) were determined after each of 22 cycles of purification of GFP-SUMO from a spiked E. coli lysate, followed by washing, eluting, and cleaning in place with 0.1 M NaOH (10 min contact time). The % of starting binding capacity was determined by dividing the eluate fluorescence by the load fluorescence. The selectivity was determined by analysis of the eluates on SDS-PAGE (FIG. 13).

FIGS. 10A-10D are graphs and gels showing the effect of organic solvent and autoclaving on the binding capacity of resins made with nanoCLAMPs of the nC-B class. Resins P2808, P2809, and P2811 (nC-B class) and Resin SMT3-A1 (nC-A class) were incubated in 100% DMF for 2 h (FIGS. 10A and 10B) or autoclaved (105-minute liquid steam cycle, including 30 min exposure to 120 °C and 20 p.s.i.) (FIGS. 10C and 10D) and then re-equilibrated in fresh buffer and tested in affinity chromatography purification of SUMO-GFP from spiked E. coli lysate. Binding capacity % of untreated was determined by dividing the eluate fluorescence by the control (non-treated) eluate fluorescence. Specificity was determined by Coomassie staining of SDS-PAGE (FIGS. 10B and 10D).

FIG. 11 is a graph showing kinetic thermal stability of nC-A (SMT3-A1) and nC-B (P2808, P2809, and P2811) nanoCLAMPs. nanoCLAMPs were heat treated, cooled, and centrifuged. The supernatant was tested for binding activity by biolayer interferometry. The percent of starting response was measured as the amplitude of binding divided by that obtained by the control sample (held at 20 °C during heat treatments).

FIG. 12 is a gel showing static binding capacity of P2808 resin. Affinity resin prepared with P2808 (nC-B class) was incubated with a spiked E. coli lysate; washed; eluted with 3 M imidazole, pH 8; buffer exchanged; and quantified by A280.

FIG. 13 is a set of gels showing the effect of sodium hydroxide treatment on specificity of nanoCLAMP capture agents of the nC-A and nC-B class. nC-A (SMT3-A1, P1519, P1533) and nC-B (P2808, P2809, P2811) Sumo-binding nanoCLAMPs were covalently conjugated to 6% cross-linked agarose resin and then used to purify a Sumo-GFP fusion from a crude E. coli lysate. Each cycle consisted of a crude Sumo-GFP-spiked sample load, wash, elution with 3 M imidazole (collected), wash, 0.1 M NaOH regeneration (10 min contact time per cycle), and a 5 min refolding wash. The target protein in the eluate was quantified by fluorescence spectroscopy (FIG. 9), and the percent yield calculated by dividing the fluorescence by the initial eluate fluorescence. The purity of the eluted target from each cycle was assessed by SDS-PAGE and stained with Coomassie. Cycle number is shown for each lane, L = Load, M = marker. The prominent band in the eluates of each gel is SUMO-GFP (42 kD).

FIGS. 14A and 14B are a graph and a gel testing the stability of Resin P2808 (nC-B class) through >20 low pH elution cycles. SUMO-GFP was spiked into crude E. coli lysate, loaded onto P2808 resin, washed, and eluted with 0.1 M Citrate, pH 2.5, followed by regeneration with 0.1 N NaOH with 1 min contact time per cycle, and a re-equilibration wash for 5 min. FIG. 14A shows the target protein in the eluate, which was quantified by densitometry of the Coomassie stained SDS-PAGE gel of FIG. 14B because the fluorescence of the eluate was destroyed by the low pH. Percent yield was calculated by dividing the band density by the initial eluate band density. The purity of the eluted target from each cycle was assessed by SDS-PAGE of FIG. 14B.

FIG. 15 is a graph showing nanoCLAMPs stably binding terbium (Tb). SMT3-A1 (nC-A class), P2808 (nC-B class), and a negative control protein (recombinant SMT3) were incubated with CaCl2 or TbCl3 overnight, and then buffer exchanged to remove unbound metals. The buffer exchanged proteins were analyzed by time resolved fluorescence 24 h post buffer exchange (Ex/Em: 350 nm/544 nm), 200 µsec delay.

FIG. 16 is a model showing the front, back, top, and bottom faces of the nC-B class of nanoCLAMP. A, B, F1-F9, and L1-L8 are mapped on clone P2808 sequence and 3D-modeled to illustrate the locations of each region of the scaffold.

FIG. 17 is a model showing an alignment of P2808, an example of the nC-B class, in which Loop 1 is replaced with a (G4S)3 sequence or is removed.

FIG. 18 is a model showing an alignment of the nC-B nanoCLAMP P2808 in which Loop 2 is replaced with a (G4S)3 sequence or is removed.

FIG. 19 is a model showing an alignment of the nC-B nanoCLAMP P2808 in which Loop 4 is replaced with a (G4S)3 sequence or is removed.

FIG. 20 is a model showing an alignment of the nC-B nanoCLAMP P2808 in which Loop 6 is replaced with a (G4S)3 sequence or GG.

FIG. 21 is a model showing the nC-B nanoCLAMP P2808 in which Loop 8 is replaced with a GGGGG (SEQ ID NO: 36), GGGG (SEQ ID NO: 37), GGG, GG, or G, or is removed.

FIG. 22 is a model showing an alignment of the nC-B nanoCLAMP P2808 in which Loop 8 is replaced with a (G4S)3 sequence or is removed.

FIG. 23 is a model showing an alignment of the nC-B nanoCLAMP P2808 in which Loop 3 is replaced with a (G4S)3 sequence or is removed.

FIG. 24 is a model showing an alignment of the nC-B nanoCLAMP P2808 in which Loop 5 is replaced with a (648)3 sequence or is removed.

FIG. 25 is a model showing the nC-B nanoCLAMP P2808 in which Loop 7 is replaced with a GGGGG (SEQ ID NO: 36), GGGG (SEQ ID NO: 37), GGG, GG, or G, or is removed.

FIG. 26 is a model showing an alignment of the nC-B nanoCLAMP P2808 in which Loop 8 is replaced with a (G4S)3 sequence or G.

FIG. 27 is a gel showing introduction of artificial disulfide bonds into clones P2808 and P2960. SDS-PAGE analysis of P2808 and P2960 variants, mutated to contain pairs of adjacent Cysteines, under oxidizing and reducing conditions. Purified proteins were treated with SDS sample buffer containing (first lane of each set) or lacking (second lane of each set) reducing agent (DTT). The presence of faster migrating species indicates disulfide bonding in samples lacking DTT, likely due to more compact folding and a smaller hydrodynamic radius. P2808 and P2960 contain no Cys residues and migrate at the same rate in oxidizing and reducing sample buffer, as expected. BSA, which contains 17 disulfide bonds, is included as a control for the activity of the reducing agent (DTT).

FIG. 28 is a graph showing that an artificial disulfide improves the thermal stability of P3015 by 9 °C. The graph shows differential scanning fluorimetry (DSF) analysis of the melting temperature of reduced and oxidized P3015.

Detailed Description

Immunoaffinity chromatography is an established laboratory-scale technique for the isolation of target proteins with high yield and purity. However, properties of antibodies and nanobodies often make immunoaffinity chromatography incompatible with conditions typical of many industrial scale processes. To overcome these limitations, the present invention features an antibody-mimetic scaffold, called a nanoCLAMP, that can be used in process-scale affinity chromatography. The 16 kD antibody mimetic is based on a bacterial, cysteine-free, β-sandwich protein with a structure analogous to immunoglobulin variable domains. Like antibodies and other antibody mimetics, the first generation of nanoCLAMPs generally showed high selectivity and affinity but also suffered from sensitivity to high temperature, digestion by proteases, and inactivation by alkali. The present invention solves this problem by engineering a plurality of mutations in the nanoCLAMP scaffold to improve the general robustness of nanoCLAMPs and their resistance to extreme conditions.

This mutated scaffold serves as the basis for an improved nanoCLAMP class, called the nC-B class. Phage display was used to generate hundreds of nC-B capture agents recognizing diverse targets. The resulting immunoaffinity capture agents typically had a Kd of < 80 nM, a T_m of > 70 °C, and a t1/2 in 0.1 mg/ml trypsin of > 20 hours. The nC-B capture agents also maintained their binding capacity and selectivity over 20 purification cycles, each including 10 minutes of cleaning in place with 0.1 M NaOH. Affinity chromatography resins made with nC-B capture agents supported efficient single-step purifications from crude mixtures. Target proteins could be eluted with either 3 M imidazole, pH 8 or 0.1 M sodium citrate, pH 2.5. Furthermore, affinity chromatography resins with nC-B capture agents remained functional after exposure to 100% DMF and autoclaving. The robust nanoCLAMP scaffold described herein allows for the development of custom, high performance affinity chromatography resins compatible with the harsh conditions of process-scale applications that can be adaptable to a wide diversity of target substrates.

Protein Scaffolds

The scaffolds described herein are derived from the Carbohydrate Binding Module Family 32 (CBM32) protein domain of Clostridium perfringens hyaluronidase (NagH), a multi-domain enzyme consisting of 1627 amino acids. Amino acid residue 1 of SEQ ID NO: 1 corresponds to amino acid residue 807 of NagH, and amino acid residue 140 of SEQ ID NO: 1 corresponds to amino acid residue 946 of NagH. Amino acid positions and mutations described herein generally relate to the position on the corresponding full length NagH unless otherwise specified. The WT sequence of CBM32 is shown below:

CBM32 SEQ (SEQ ID NO: 1)

NPSLIRSESWQVYEGNEANLLDGDDNTGVWYKTLNGDTSLAGEFIGLDLGKEIKLDGIRFVIGKNGGGSSDKWNKFKLEYSLDNESWTTIKEYDKTGAPAGKDVIEESFETPISAKYIRLTNMENINKWLTFSEFAIVSD

Previous work identified a scaffold in which three or five loop regions (L1, L2, and L8; or L1, L2, L4, L6, and L8) were mutagenized to form binders to diverse protein targets instead of carbohydrates, a property not expected for a carbohydrate binding module. In some embodiments, the protein scaffold does not retain carbohydrate binding activity, e.g., of the native CBM scaffold. Loop L1 corresponds to residues 817-820, loop L2 corresponds to residues 838-844, and loop L8 corresponds to residues 931-935. The original scaffold (nC-A) is shown below and contains a single M929L mutation relative to SEQ ID NO: 1. X denotes a variable loop residue, and each X may independently be any residue.

nC-A Scaffold sequence (SEQ ID NO: 2)

NPSLIRSESWXXXXGNEANLLDGDDNTGVWYXXXXXXXSLAGEFIGLDLGKEIKLDGIRFVIGKNGGGSSDKWNKFKLEYSLDNESWTTIKEYDKTGAPAGKDVIEESFETPISAKYIRLTNLEXXXXXLTFSEFAIVSD
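The numbering convention used above can be checked with a short script. The following Python sketch is illustrative only: the constant NAGH_OFFSET and the helper names nagh_slice and substitutions are assumptions introduced here, and the two sequences are SEQ ID NO: 1 and SEQ ID NO: 2 with whitespace removed. It converts NagH residue numbers to positions in SEQ ID NO: 1, extracts loops L1, L2, and L8 from the WT sequence, and lists the fixed-position differences between the nC-A scaffold and the WT sequence.

```python
# Residue 1 of SEQ ID NO: 1 corresponds to NagH residue 807, so the offset is 806.
NAGH_OFFSET = 806  # hypothetical constant name, introduced for illustration

# SEQ ID NO: 1 (WT CBM32) and SEQ ID NO: 2 (nC-A scaffold), whitespace removed.
CBM32_WT = (
    "NPSLIRSESWQVYEGNEANLLDGDDNTGVWYKTLNGDTSLAGEFIGLDLGKEIKLDGIRFVIGKNGGGS"
    "SDKWNKFKLEYSLDNESWTTIKEYDKTGAPAGKDVIEESFETPISAKYIRLTNMENINKWLTFSEFAIVSD"
)
NC_A = (
    "NPSLIRSESWXXXXGNEANLLDGDDNTGVWYXXXXXXXSLAGEFIGLDLGKEIKLDGIRFVIGKNGGGS"
    "SDKWNKFKLEYSLDNESWTTIKEYDKTGAPAGKDVIEESFETPISAKYIRLTNLEXXXXXLTFSEFAIVSD"
)


def nagh_slice(seq: str, first: int, last: int) -> str:
    """Return residues first..last (inclusive, NagH numbering) of seq."""
    return seq[first - NAGH_OFFSET - 1:last - NAGH_OFFSET]


def substitutions(reference: str, variant: str) -> list[str]:
    """List substitutions in NagH numbering, skipping variable (X) positions."""
    return [
        f"{ref}{index + NAGH_OFFSET}{var}"
        for index, (ref, var) in enumerate(zip(reference, variant), start=1)
        if var != "X" and ref != var
    ]


loops = {
    "L1": nagh_slice(CBM32_WT, 817, 820),
    "L2": nagh_slice(CBM32_WT, 838, 844),
    "L8": nagh_slice(CBM32_WT, 931, 935),
}
print(loops)
print(substitutions(CBM32_WT, NC_A))
```

Under these assumptions, the WT loops come out as QVYE, KTLNGDT, and NINKW, and the only fixed-position difference between the two sequences is M929L, consistent with the description above.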

The current scaffold (nC-B) described herein is based on the exemplary scaffold of SEQ ID NO: 3 shown below. X denotes a variable loop residue, and each X may independently be any residue.

nC-B Scaffold sequence full length (SEQ ID NO: 3)

DPTLIHTPGWXXXXGSEADLLDGDDSTGVEYXXXXXXXSLAGEFIGLDLGEVVEVGGIHFVIGADGGGSSDKWTRFRLEYSLDGESWTTIREYDHTGAPAGQDVIDEDFETPISAQYIRLTNLEXXXXXLTFSEFAIVSDELE

The protein scaffolds described herein include (e.g., consist of) framework regions (F) and loop regions (L). The scaffolds generally have the structure of:

A-F1 -L1 -F2-L2-F3-L3-F4-L4-F5-L5-F6-L6-F7-L7-F8-L8-F9-B.

F1-F9 correspond to framework regions 1-9, and L1-L8 correspond to loop regions 1-8. Framework regions and loop regions were selected based on where beta strands turn to loops or where beta strands show a sharp turn from the plane of the strand’s beta sheet (see FIGS. 2 and 16). The N- and C-termini of the scaffold, A and B, may each independently be present (e.g., contain one or more amino acids) or absent.

F1 includes the sequence of: (T/S)-LI-(H/R)-(T/S)-(P/E)-(G/S)-W (SEQ ID NO: 4) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 4;

F7 includes the sequence of: (Q/K)-DVI-(D/E)-E-(D/S)-F (SEQ ID NO: 10) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 10;

F8 includes the sequence of: (Q/K/R)-YIRLTNLE (SEQ ID NO: 11) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 11;

L8 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids); and

F9 includes the sequence of: LTFSEFA-(I/V)-VS (SEQ ID NO: 12) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 12.

X1 is any amino acid except R or S;

X2 is any amino acid except P or K; and

X3 is any amino acid except R or K.

In some embodiments,

F5 includes the sequence of: DKW-(T/N/S)-(R/K)-F-(R/K)-LEYS (SEQ ID NO: 8) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 8;

F6 includes the sequence of: WTTI-(R/K/H/Q)-EYD-(H/K/R/Q) (SEQ ID NO: 9) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 9;

F7 includes the sequence of: (Q/K)-DVI-(D/E)-E-(D/S)-F (SEQ ID NO: 10) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 10;

In some embodiments,

F1 includes the sequence of: (T/S)-LI-(H/R)-(T/S)-(P/E)-(G/S)-W (SEQ ID NO: 4);

F3 includes the sequence of: S-(L/V)-AGEFIGLDLG (SEQ ID NO: 6);

F5 includes the sequence of: DKW-(T/N/S)-(R/K)-F-(R/K)-LEYS (SEQ ID NO: 8);

F6 includes the sequence of: WTTI-(R/K/H/Q)-EYD-(H/K/R/Q) (SEQ ID NO: 9);

F7 includes the sequence of: (Q/K)-DVI-(D/E)-E-(D/S)-F (SEQ ID NO: 10);

F8 includes the sequence of: (Q/K/R)-YIRLTNLE (SEQ ID NO: 11 ); and

F9 includes the sequence of: LTFSEFA-(I/V)-VS (SEQ ID NO: 12).
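The consensus notation used for the framework regions, in which alternatives at a position are given in parentheses separated by slashes, maps directly onto a regular expression. The following Python sketch is illustrative only: the function names consensus_to_regex and matches_consensus are assumptions introduced here. It converts the consensus sequences listed above into patterns and tests the exemplary nC-B framework sequences (SEQ ID NOs: 22, 24, and 26-30) against them.

```python
import re

# Consensus framework sequences as written in the text (SEQ ID NOs: 4, 6, 8-12).
CONSENSUS = {
    "F1": "(T/S)-LI-(H/R)-(T/S)-(P/E)-(G/S)-W",
    "F3": "S-(L/V)-AGEFIGLDLG",
    "F5": "DKW-(T/N/S)-(R/K)-F-(R/K)-LEYS",
    "F6": "WTTI-(R/K/H/Q)-EYD-(H/K/R/Q)",
    "F7": "(Q/K)-DVI-(D/E)-E-(D/S)-F",
    "F8": "(Q/K/R)-YIRLTNLE",
    "F9": "LTFSEFA-(I/V)-VS",
}

# Exemplary nC-B framework sequences (SEQ ID NOs: 22, 24, 26-30).
NC_B_FRAMEWORKS = {
    "F1": "TLIHTPGW",
    "F3": "SLAGEFIGLDLG",
    "F5": "DKWTRFRLEYS",
    "F6": "WTTIREYDH",
    "F7": "QDVIDEDF",
    "F8": "QYIRLTNLE",
    "F9": "LTFSEFAIVS",
}


def consensus_to_regex(consensus: str) -> str:
    """Turn '(T/S)-LI-(H/R)' style notation into a regex such as '[TS]LI[HR]'."""
    parts = []
    for token in consensus.split("-"):
        if token.startswith("(") and token.endswith(")"):
            # A parenthesized token lists the residues allowed at one position.
            parts.append("[" + token[1:-1].replace("/", "") + "]")
        else:
            # Any other token is a literal run of residues.
            parts.append(token)
    return "".join(parts)


def matches_consensus(region: str, sequence: str) -> bool:
    """True if sequence matches the consensus for the region exactly (no mutations)."""
    return re.fullmatch(consensus_to_regex(CONSENSUS[region]), sequence) is not None


for region, sequence in NC_B_FRAMEWORKS.items():
    print(region, sequence, matches_consensus(region, sequence))
```

The sketch tests exact matches only. Embodiments that permit one or two insertions, deletions, or substitutions relative to a consensus sequence would require an edit-distance comparison instead, which is not shown here.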

In some embodiments, the protein scaffold includes at least 3 fewer lysines relative to SEQ ID NO: 1 . For example, in some embodiments, the protein scaffold includes at least 3, 4, 5, 6, 7, 8, 9, or 10 fewer lysines relative to SEQ ID NO: 1 . In some embodiments, the protein scaffold includes at least 6 fewer lysines relative to SEQ ID NO: 1 . In some embodiments, the protein scaffold includes 9 fewer lysines relative to SEQ ID NO: 1 . In some embodiments, the protein scaffold does not include any lysines.

In some embodiments, the protein scaffold includes at least 3 fewer asparagines relative to SEQ ID NO: 1 . For example, in some embodiments, the protein scaffold includes at least 3, 4, 5, 6, 7, or 8 fewer asparagines relative to SEQ ID NO: 1 . In some embodiments, the protein scaffold includes at least 5 fewer asparagines relative to SEQ ID NO: 1 . In some embodiments, the protein scaffold includes 7 fewer asparagines relative to SEQ ID NO: 1 . In some embodiments, the protein scaffold does not include any asparagines.
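The lysine and asparagine reductions can be tallied directly from the sequences. The following Python sketch is illustrative only and rests on two assumptions made here, not stated in the text: the comparison is restricted to the constant (non-X) positions, with the three variable loops of SEQ ID NO: 1 masked at positions 11-14, 32-38, and 125-129, and loop L3 of SEQ ID NO: 3 is read as EVVEVG (SEQ ID NO: 31). The names VARIABLE_LOOPS, constant_residues, and count_difference are hypothetical.

```python
# SEQ ID NO: 1 (WT CBM32) and SEQ ID NO: 3 (nC-B scaffold), whitespace removed.
CBM32_WT = (
    "NPSLIRSESWQVYEGNEANLLDGDDNTGVWYKTLNGDTSLAGEFIGLDLGKEIKLDGIRFVIGKNGGGS"
    "SDKWNKFKLEYSLDNESWTTIKEYDKTGAPAGKDVIEESFETPISAKYIRLTNMENINKWLTFSEFAIVSD"
)
# Assumption: L3 is read as EVVEVG, matching SEQ ID NO: 31.
NC_B = (
    "DPTLIHTPGWXXXXGSEADLLDGDDSTGVEYXXXXXXXSLAGEFIGLDLGEVVEVGGIHFVIGADGGGS"
    "SDKWTRFRLEYSLDGESWTTIREYDHTGAPAGQDVIDEDFETPISAQYIRLTNLEXXXXXLTFSEFAIVSDELE"
)

# Variable loops L1, L2, and L8 as 1-based positions in SEQ ID NO: 1
# (NagH residues 817-820, 838-844, and 931-935).
VARIABLE_LOOPS = [(11, 14), (32, 38), (125, 129)]


def constant_residues(seq: str) -> str:
    """Drop the variable-loop positions and any X placeholders."""
    variable = {i for first, last in VARIABLE_LOOPS for i in range(first, last + 1)}
    return "".join(
        aa for i, aa in enumerate(seq, start=1) if i not in variable and aa != "X"
    )


def count_difference(residue: str) -> int:
    """Number of fewer occurrences of residue in nC-B than in WT (constant positions)."""
    return constant_residues(CBM32_WT).count(residue) - constant_residues(NC_B).count(residue)


print("fewer lysines:", count_difference("K"))
print("fewer asparagines:", count_difference("N"))
```

Under these assumptions, the script reports 9 fewer lysines and 7 fewer asparagines for SEQ ID NO: 3 relative to SEQ ID NO: 1. Residues later placed in the variable loops of a particular binder would change these totals.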

In some embodiments, A and B are each independently absent or at least one amino acid. For example, each of A and B may independently be absent. In some embodiments, A and B are each independently at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000 or more amino acids. In some embodiments, A and B are each independently from 0 to 1,000 amino acids, e.g., from 1 to 10 amino acids (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids), from 10 to 100 amino acids (e.g., 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 amino acids), or from 100 to 1,000 amino acids (e.g., 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1,000 amino acids).

In some embodiments, A and B are each independently, absent or from 1 amino acid to 20 amino acids (e.g., 1 , 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 , 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids).

In some embodiments, each of L1 -L8 is independently, from 1 amino acid to 20 amino acids (e.g., 1 , 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 , 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids).

In some embodiments, L2 is from 1 amino acid to 20 amino acids (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids). In some embodiments, L2 is from 1 amino acid to 16 amino acids (e.g., from 4 to 16 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 amino acids).

In some embodiments, L4 is from 1 amino acid to 20 amino acids (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids). In some embodiments, L4 is from 0 to 5 amino acids (e.g., from 1 to 5 amino acids, e.g., 0, 1, 2, 3, 4, or 5 amino acids).

In some embodiments, L7 is from 1 amino acid to 20 amino acids (e.g., 1 , 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 , 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids). In some embodiments, L7 is 4 or 5 amino acids.

In some embodiments, L8 is from 1 amino acid to 20 amino acids (e.g., 1 , 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 , 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids). In some embodiments, L8 is from 4 to 6 amino acids (e.g., 4, 5, or 6 amino acids).

In some embodiments, L6 includes the sequence of: X1X2X3X4X5X6 (SEQ ID NO: 16), wherein each of X1-X6 is, independently, any amino acid. In some embodiments, L8 includes at least two amino acids. In some embodiments, L8 includes at least one amino acid.

In some embodiments, L7 includes at least one amino acid.

In some embodiments, B includes the sequence of DELE (SEQ ID NO: 35).

In some embodiments, F1 includes the sequence of: TLIHTPGW (SEQ ID NO: 22) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 22;

F8 includes the sequence of: QYIRLTNLE (SEQ ID NO: 29) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 29;

L8 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids); and

F9 includes the sequence of: LTFSEFAIVS (SEQ ID NO: 30) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 30.

In some embodiments, F1 includes the sequence of: TLIHTPGW (SEQ ID NO: 22);

F2 includes the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23);

F3 includes the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24);

F4 includes the sequence of: GIHFVIGAD (SEQ ID NO: 25);

F5 includes the sequence of: DKWTRFRLEYS (SEQ ID NO: 26);

F6 includes the sequence of: WTTIREYDH (SEQ ID NO: 27);

F7 includes the sequence of: QDVIDEDF (SEQ ID NO: 28);

F8 includes the sequence of: QYIRLTNLE (SEQ ID NO: 29);

F9 includes the sequence of: LTFSEFAIVS (SEQ ID NO: 30).

F2 comprises the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 23;

L2 is absent or comprises at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F3 comprises the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 24;

L3 comprises the sequence of: EVVEVG (SEQ ID NO: 31) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 31;

F4 comprises the sequence of: GIHFVIGAD (SEQ ID NO: 25) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 25;

F9 comprises the sequence of: LTFSEFAIVS (SEQ ID NO: 30) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 30.

In some embodiments, F1 comprises the sequence of: TLIHTPGW (SEQ ID NO: 22) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 22;

L1 is absent or comprises at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F2 comprises the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 23;

L8 is absent or comprises at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids); and

F9 comprises the sequence of: LTFSEFAIVS (SEQ ID NO: 30) or a sequence having one amino acid insertion, deletion, or substitution mutation (e.g., one substitution mutation) relative to SEQ ID NO: 30.

In some embodiments, F1 includes the sequence of: TLIHTPGW (SEQ ID NO: 22);

L1 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F2 includes the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23);

L2 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids);

F3 includes the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24);

L3 includes the sequence of: EVVEVG (SEQ ID NO: 31 );

F4 includes the sequence of: GIHFVIGAD (SEQ ID NO: 25);

L4 includes the sequence of: GGGSS (SEQ ID NO: 32);

F5 includes the sequence of: DKWTRFRLEYS (SEQ ID NO: 26);

L5 includes the sequence of: LDGES (SEQ ID NO: 33);

F6 includes the sequence of: WTTIREYDH (SEQ ID NO: 27);

L6 includes the sequence of: TGAPAG (SEQ ID NO: 18);

F7 includes the sequence of: QDVIDEDF (SEQ ID NO: 28);

L7 includes the sequence of: ETPISA (SEQ ID NO: 34);

F8 includes the sequence of: QYIRLTNLE (SEQ ID NO: 29);

L8 is absent or includes at least one amino acid (e.g., 1 to 20 amino acids, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids); and

F9 includes the sequence of: LTFSEFAIVS (SEQ ID NO: 30).

In some embodiments, F1 includes the sequence of: TLIHTPGW (SEQ ID NO: 22);

F2 includes the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23);

F3 includes the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24);

L3 includes the sequence of: EVVEVG (SEQ ID NO: 31 );

F4 includes the sequence of: GIHFVIGAD (SEQ ID NO: 25);

L4 includes the sequence of: GGGSS (SEQ ID NO: 32);

F5 includes the sequence of: DKWTRFRLEYS (SEQ ID NO: 26);

L5 includes the sequence of: LDGES (SEQ ID NO: 33);

F6 includes the sequence of: WTTIREYDH (SEQ ID NO: 27);

L6 includes the sequence of: TGAPAG (SEQ ID NO: 18);

F7 includes the sequence of: QDVIDEDF (SEQ ID NO: 28);

L7 includes the sequence of: ETPISA (SEQ ID NO: 34);

F8 includes the sequence of: QYIRLTNLE (SEQ ID NO: 29);

L8 includes the sequence of: X1X2X3X4X5 (SEQ ID NO: 15), wherein each of X1-X5 is, independently, any amino acid; and

F9 includes the sequence of: LTFSEFAIVS (SEQ ID NO: 30).

In some embodiments, A includes the sequence of: DP;

F1 includes the sequence of: TLIHTPGW (SEQ ID NO: 22);

F2 includes the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23);

L2 includes the sequence of: X1X2X3X4X5X6X7 (SEQ ID NO: 14), wherein each of X1-X7 is, independently, any amino acid;

F3 includes the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24);

L3 includes the sequence of: EVVEVG (SEQ ID NO: 31 );

F4 includes the sequence of: GIHFVIGAD (SEQ ID NO: 25);

L4 includes the sequence of: GGGSS (SEQ ID NO: 32);

F5 includes the sequence of: DKWTRFRLEYS (SEQ ID NO: 26);

L5 includes the sequence of: LDGES (SEQ ID NO: 33);

F6 includes the sequence of: WTTIREYDH (SEQ ID NO: 27);

L6 includes the sequence of: TGAPAG (SEQ ID NO: 18);

F7 includes the sequence of: QDVIDEDF (SEQ ID NO: 28);

L7 includes the sequence of: ETPISA (SEQ ID NO: 34);

F8 includes the sequence of: QYIRLTNLE (SEQ ID NO: 29);

F9 includes the sequence of: LTFSEFAIVS (SEQ ID NO: 30); and

B includes the sequence of: DELE (SEQ ID NO: 35).
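The region-by-region definition above can be assembled into a full-length scaffold following the A-F1-L1-F2-L2-F3-L3-F4-L4-F5-L5-F6-L6-F7-L7-F8-L8-F9-B structure. The following Python sketch is illustrative only. The variable loops are written as X placeholders, with the lengths of L1 (4 residues) and L8 (5 residues) taken from SEQ ID NO: 3 as an assumption, since they are not listed in this embodiment. The names SEGMENTS, assemble, and region_positions are hypothetical.

```python
# Regions in N- to C-terminal order, with the sequences listed in the embodiment above.
# X marks a variable loop residue. L1 and L8 lengths are assumed from SEQ ID NO: 3.
SEGMENTS = [
    ("A", "DP"),
    ("F1", "TLIHTPGW"),
    ("L1", "XXXX"),
    ("F2", "GSEADLLDGDDSTGVEY"),
    ("L2", "XXXXXXX"),
    ("F3", "SLAGEFIGLDLG"),
    ("L3", "EVVEVG"),
    ("F4", "GIHFVIGAD"),
    ("L4", "GGGSS"),
    ("F5", "DKWTRFRLEYS"),
    ("L5", "LDGES"),
    ("F6", "WTTIREYDH"),
    ("L6", "TGAPAG"),
    ("F7", "QDVIDEDF"),
    ("L7", "ETPISA"),
    ("F8", "QYIRLTNLE"),
    ("L8", "XXXXX"),
    ("F9", "LTFSEFAIVS"),
    ("B", "DELE"),
]


def assemble(segments: list[tuple[str, str]]) -> str:
    """Concatenate the regions in order to give the full-length scaffold."""
    return "".join(sequence for _, sequence in segments)


def region_positions(segments: list[tuple[str, str]]) -> dict[str, tuple[int, int]]:
    """Return the 1-based (first, last) position of each region in the assembly."""
    positions = {}
    start = 1
    for name, sequence in segments:
        positions[name] = (start, start + len(sequence) - 1)
        start += len(sequence)
    return positions


scaffold = assemble(SEGMENTS)
positions = region_positions(SEGMENTS)
print(scaffold)
print(len(scaffold), positions["L1"], positions["L2"], positions["L8"])
```

Under these assumptions, the assembly is 143 residues long, and loops L1, L2, and L8 fall at positions 11-14, 32-38, and 125-129, which correspond to NagH residues 817-820, 838-844, and 931-935.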

In another aspect, featured is a polypeptide having at least 85% (e.g., at least 90%, 95%, 97%, 99%, or 100%) sequence identity to a polypeptide of Table 9 or Table 10. In some embodiments, the polypeptide includes a sequence as set forth in Table 9 or Table 10.

X1 is any amino acid except R or S;

X2 is any amino acid except P or K; and

X3 is any amino acid except R or K.

One of skill in the art would understand that the protein scaffolds described herein, which contain 9 framework regions (i.e., F1-F9), may be optimized or swapped according to established biophysical techniques. Accordingly, the invention also features protein scaffolds containing 7 out of 9 or 8 out of 9 framework regions described herein. Based on the detailed structural analysis of the scaffold known in the art (see, e.g., Ficko-Blean et al. J. Mol. Biol. 390: 208-220, 2009, and PDB ID 2W1Q), one of skill in the art could swap out one or more beta strands or a portion thereof (e.g., more than 2 residues of a given framework region, e.g., any one of F1-F9) of the core protein fold, while still maintaining structural integrity of the overall scaffold. One could generate a phage library where each member of the library expresses a protein scaffold with loops that confer binding to a specific target and with randomized amino acids at each position in a specific beta strand. The phage library could then be selected for library members that are thermostable and maintain binding to a target by selecting the library for those members that can withstand an incubation at >55 °C without aggregation and for those members that can bind to an immobilized target. The isolated clones with these properties would represent scaffolds with a swapped out beta strand or portion thereof.

Accordingly, in some embodiments, the invention also contemplates protein scaffolds having at least 7, e.g., at least 8, of the following framework regions, wherein 7 of the 9 or 8 of the 9 framework regions have the following sequence:

F8 includes the sequence of: (Q/K/R)-YIRLTNLE (SEQ ID NO: 11 ) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 11 ; and

In other embodiments, the invention also contemplates protein scaffolds having at least 7, e.g., at least 8, of the following framework regions, wherein 7 of the 9 or 8 of the 9 framework regions have the following sequence:

F1 includes the sequence of: TLIHTPGW (SEQ ID NO: 22) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 22;

F8 includes the sequence of: QYIRLTNLE (SEQ ID NO: 29) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof (e.g., one or two substitution mutations) relative to SEQ ID NO: 29; and

In other embodiments, the invention also contemplates protein scaffolds having at least 7, e.g., at least 8, of the following framework regions, wherein 7 of the 9 or 8 of the 9 framework regions have the following sequence:

F1 includes the sequence of: TLIHTPGW (SEQ ID NO: 22);

F2 includes the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23);

F3 includes the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24);

F4 includes the sequence of: GIHFVIGAD (SEQ ID NO: 25);

F5 includes the sequence of: DKWTRFRLEYS (SEQ ID NO: 26);

F6 includes the sequence of: WTTIREYDH (SEQ ID NO: 27);

F7 includes the sequence of: QDVIDEDF (SEQ ID NO: 28);

F8 includes the sequence of: QYIRLTNLE (SEQ ID NO: 29); and

F9 includes the sequence of: LTFSEFAIVS (SEQ ID NO: 30).

In another aspect, featured is a protein scaffold that includes a polypeptide having at least 80% (e.g., at least 85%, 90%, 95%, 97%, or 99%) sequence identity to the framework regions (F1 -F9) over the region of alignment corresponding to F1 -F9 of the reference sequence (e.g., SEQ ID NO: 3). In some embodiments, the protein scaffold includes one or more non-natural amino acids. In some embodiments, one or more of the framework regions includes a non-natural amino acid. In some embodiments, one or more of the loop regions includes a non-natural amino acid.
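The framework identity threshold can be illustrated with a simple calculation. The following Python sketch is illustrative only and makes a simplifying assumption not stated in the text: each candidate framework region has the same length as the corresponding reference region, so that identity is computed position by position without gaps. The names REFERENCE_FRAMEWORKS, WT_FRAMEWORKS, and framework_identity are hypothetical, and the WT framework segments were obtained here by aligning SEQ ID NO: 1 position by position with SEQ ID NO: 3.

```python
# Framework regions F1-F9 of the reference scaffold (SEQ ID NOs: 22-30).
REFERENCE_FRAMEWORKS = {
    "F1": "TLIHTPGW",
    "F2": "GSEADLLDGDDSTGVEY",
    "F3": "SLAGEFIGLDLG",
    "F4": "GIHFVIGAD",
    "F5": "DKWTRFRLEYS",
    "F6": "WTTIREYDH",
    "F7": "QDVIDEDF",
    "F8": "QYIRLTNLE",
    "F9": "LTFSEFAIVS",
}

# Corresponding segments of WT CBM32 (SEQ ID NO: 1), assumed ungapped alignment.
WT_FRAMEWORKS = {
    "F1": "SLIRSESW",
    "F2": "GNEANLLDGDDNTGVWY",
    "F3": "SLAGEFIGLDLG",
    "F4": "GIRFVIGKN",
    "F5": "DKWNKFKLEYS",
    "F6": "WTTIKEYDK",
    "F7": "KDVIEESF",
    "F8": "KYIRLTNME",
    "F9": "LTFSEFAIVS",
}


def framework_identity(candidate: dict[str, str]) -> float:
    """Percent identity to the reference over F1-F9 (ungapped, equal lengths)."""
    matches = 0
    total = 0
    for name, reference in REFERENCE_FRAMEWORKS.items():
        region = candidate[name]
        if len(region) != len(reference):
            raise ValueError(f"{name}: gapped alignments are not handled in this sketch")
        matches += sum(a == b for a, b in zip(reference, region))
        total += len(reference)
    return 100.0 * matches / total


# A candidate with a single conservative substitution in F1 (T to S at the first position).
one_substitution = dict(REFERENCE_FRAMEWORKS, F1="SLIHTPGW")

print(round(framework_identity(one_substitution), 2))
print(round(framework_identity(WT_FRAMEWORKS), 1))
```

Under these assumptions, the nine framework regions total 93 residues, so a single substitution leaves about 98.9% identity, whereas the WT CBM32 framework segments are about 76.3% identical to the reference frameworks. Sequences with insertions or deletions would need a proper alignment tool, which this sketch does not attempt.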

Cysteine Mutations and Disulfide Bridges

The protein scaffolds described herein may lack native cysteine residues. Accordingly, the scaffold may be mutagenized to introduce one or more cysteine residues into the scaffold (e.g., in one or more loop or framework regions). When two or more cysteine residues are introduced into the scaffold at nearby sites, the two cysteine residues may form a disulfide bridge, e.g., under oxidizing conditions. In some embodiments, the disulfide bridge enhances thermal stability of the protein scaffold.

In some embodiments, the protein scaffold includes a mutation that adds a cysteine residue. In some embodiments, the protein scaffold includes a first mutation that adds a first cysteine residue and a second mutation that adds a second cysteine residue. In some embodiments, the first cysteine residue and the second cysteine residue form a disulfide bond under oxidizing conditions.

In some embodiments, the pair of cysteine mutations is selected from the group consisting of K878C and G907C, K878C and A904C, S845C and L936C, W879C and N928C, W879C and N928C, L884C and L926C, V858C and L888C, K878C and G907C, and K878C and A906C.
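The cysteine pairs above are written in NagH numbering, so each can be checked against the scaffold sequence before being introduced. The following Python sketch is illustrative only. It assumes that the offset of 806 between SEQ ID NO: 1 positions and NagH residue numbers also applies to SEQ ID NO: 3, and that loop L3 of SEQ ID NO: 3 reads EVVEVG (SEQ ID NO: 31). The names apply_mutation and introduce_pair are hypothetical, and only the distinct, unambiguous pairs from the list are included.

```python
import re

NAGH_OFFSET = 806  # assumed to apply to SEQ ID NO: 3 as well as SEQ ID NO: 1

# SEQ ID NO: 3 (nC-B scaffold), whitespace removed; X marks variable loop residues.
NC_B = (
    "DPTLIHTPGWXXXXGSEADLLDGDDSTGVEYXXXXXXXSLAGEFIGLDLGEVVEVGGIHFVIGADGGGS"
    "SDKWTRFRLEYSLDGESWTTIREYDHTGAPAGQDVIDEDFETPISAQYIRLTNLEXXXXXLTFSEFAIVSDELE"
)

# Distinct cysteine pairs taken from the list above.
CYS_PAIRS = [
    ("K878C", "G907C"),
    ("K878C", "A904C"),
    ("S845C", "L936C"),
    ("W879C", "N928C"),
    ("L884C", "L926C"),
    ("V858C", "L888C"),
]


def apply_mutation(seq: str, mutation: str) -> str:
    """Apply a substitution such as 'K878C' (NagH numbering) after checking the original residue."""
    parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if parsed is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    original, number, replacement = parsed.group(1), int(parsed.group(2)), parsed.group(3)
    index = number - NAGH_OFFSET - 1  # 0-based index into the scaffold
    if not 0 <= index < len(seq) or seq[index] != original:
        raise ValueError(f"{mutation}: scaffold does not have {original} at that position")
    return seq[:index] + replacement + seq[index + 1:]


def introduce_pair(seq: str, first: str, second: str) -> str:
    """Introduce both cysteines of a pair into the scaffold."""
    return apply_mutation(apply_mutation(seq, first), second)


for first, second in CYS_PAIRS:
    variant = introduce_pair(NC_B, first, second)
    print(first, second, variant.count("C"))
```

Under these assumptions, every listed original residue agrees with SEQ ID NO: 3, and each variant contains exactly two cysteines. Whether a given pair actually forms a disulfide bond depends on the three-dimensional geometry and oxidizing conditions, which the script does not model.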

Tags and Functional Groups

The protein scaffolds described herein may further include a tag. A tag may provide for ease of purification, detection or attachment of the protein scaffold. The tag may be covalently attached to the scaffold. In some embodiments, A and/or B of the scaffold is or includes a tag.

In some embodiments, the tag is an affinity tag (e.g., a polyhistidine tag (e.g., 4, 5, 6, 7, 8, 9, or 10 histidines), a Gly-His tag, an AviTag, a Calmodulin-tag, a polyglutamate tag, a polyarginine tag, or an SBP-tag).

In some embodiments, the tag is an epitope tag (e.g., ALFA-tag, C-tag, iCapTag, E-tag, FLAG-tag, HA-tag, Myc-tag, NE-tag, Rho1D4-tag, S-tag, Softag 1, Softag 3, Spot-tag, T7-tag, TC tag, Ty tag, V5 tag, VSV-tag, or Xpress tag).

In some embodiments, the tag is a covalent protein tag (e.g., Isopeptag, SpyTag, SnoopTag, DogTag, or SdyTag). In some embodiments, the tag is a protein tag (e.g., biotin carboxyl carrier protein tag, glutathione-S-transferase (GST) tag, green fluorescent protein (GFP) tag, HaloTag, SNAP-tag, CLIP-tag, HUH-tag, maltose binding protein tag, Nus tag, thioredoxin tag, Fc tag, Designed Intrinsically Disordered tag, CRDSAT tag, SpyCatcher, SnoopCatcher, DogCatcher, SdyCatcher, or SUMO-tag).

In some embodiments, A and/or B of the scaffold includes an affinity tag, epitope tag, covalent peptide tag, or protein tag.

In some embodiments, the scaffold is conjugated to a radioactive moiety. In some embodiments, the radioactive moiety is an α or β emitter.

In some embodiments, the functional group is conjugated to a sulfhydryl group or a primary amine (e.g., on a cysteine residue or a lysine).

Polynucleotides, Vectors, and Cells

The protein scaffolds described herein may be encoded by a polynucleotide. In some embodiments, the polynucleotide is a ribonucleotide. In some embodiments, the polynucleotide is a deoxyribonucleotide. Also contemplated herein is a vector that includes a polynucleotide encoding the protein scaffold.

In other embodiments, featured is a cell that includes a polynucleotide encoding the protein scaffold or a vector that includes the polynucleotide. The polynucleotide or vector may include an expression element configured to drive expression of the protein scaffold. The cell may be a prokaryotic cell (e.g., E. coli). The cell may be a eukaryotic cell. In some embodiments, the eukaryotic cell is a yeast cell (e.g., S. cerevisiae) or a mammalian cell (e.g., a Chinese hamster ovary (CHO) cell). In some embodiments, the protein scaffold is secreted by the cell. In some embodiments, the protein scaffold is expressed within the cell. Such a cell (e.g., E. coli) may be lysed to provide a lysate that includes the protein scaffold.

Also featured herein is a method of producing a protein scaffold as described herein. The method includes the steps of providing a cell transformed with a polynucleotide encoding the protein scaffold or a vector that includes the polynucleotide and culturing the transformed cell under conditions for expressing the polynucleotide. The culturing step results in expression of the protein scaffold. The method may further include isolating the protein scaffold or using the protein scaffold to bind a target.

Particles, Resins, and Columns

The protein scaffolds described herein may be conjugated to a particle. In some embodiments, the particle is a magnetic particle. Also featured is a resin or monolith that includes a plurality of the particles, e.g., containing the protein scaffold. Also contemplated herein is a column (e.g., a chromatography column) containing the particles or the resin, e.g., conjugated to the scaffold. The scaffolds and methods of use thereof can use a surface linked to the protein scaffold, which is configured to bind its target. The surface of the resin refers to a part of a support structure (e.g., a substrate) that is accessible to contact with one or more target molecules. The shape, form, materials, and modifications of the surface of the resin can be selected from a range of options depending on the application. In one embodiment, the surface of the resin is SEPHAROSE®. In one embodiment, the surface of the resin is agarose.

The surface of the resin can be substantially flat or planar. Alternatively, the surface of the resin can be rounded or contoured. Exemplary contours that can be included on a surface of the resin are wells, depressions, pillars, ridges, channels or the like.

In one embodiment, the surface of the resin is modified to contain channels, patterns, layers, or other configurations (e.g., a patterned surface). The surface can be in the form of a bead, box, column, cylinder, disc, dish (e.g., glass dish, PETRI dish), fiber, film, filter, microtiter plate (e.g., 96-well microtiter plate), multi-bladed stick, net, pellet, plate, ring, rod, roll, sheet, slide, stick, tray, tube, or vial. The surface can be a singular discrete body (e.g., a single tube, a single bead), any number of a plurality of surface bodies (e.g., a rack of 10 tubes, several beads), or combinations thereof (e.g., a tray that includes a plurality of microtiter plates, a column filled with beads, a microtiter plate filled with beads).

In some embodiments, a surface can include a membrane-based resin matrix. In some embodiments, the surface of the resin includes a porous resin or a non-porous resin. Examples of porous resins can include additional agarose-based resins (e.g., cyanogen bromide activated SEPHAROSE® (GE); WORKBEADS™ 40 ACT and WORKBEADS™ 40/10000 ACT (Bioworks)), methacrylate (Tosoh 650M derivatives, etc.), polystyrene divinylbenzene (Life Tech Poros media/GE Source media), fractogel, polyacrylamide, silica, controlled pore glass, dextran derivatives, acrylamide derivatives, convective-interaction media (Sartorius), additional polymers, and combinations thereof.

In some embodiments, a surface can include one or more pores. In some embodiments, pore sizes can be from 300 to 8,000 Angstroms, e.g., 500 to 4,000 Angstroms in size.

A resin as described herein includes a plurality of particles. Examples of particle sizes are 5 µm-500 µm, 20 µm-300 µm, and 50 µm-200 µm. In some embodiments, particle size can be 50 µm, 60 µm, 70 µm, 80 µm, 90 µm, 100 µm, 110 µm, 120 µm, 130 µm, 140 µm, 150 µm, 160 µm, 170 µm, 180 µm, 190 µm, or 200 µm.

A protein scaffold can be immobilized, coated on, bound to, stuck, adhered, or attached to any of the forms of surfaces described herein (e.g., bead, box, column, cylinder, disc, dish (e.g., glass dish, PETRI dish), fiber, film, filter, microtiter plate (e.g., 96-well microtiter plate), multi-bladed stick, net, pellet, plate, ring, rod, roll, sheet, slide, stick, tray, tube, or vial).

Methods of Purification

Featured herein is a method of purifying a target molecule, e.g., from a plurality of molecules, e.g., from a crude lysate. The method includes providing a sample that includes a mixture of the target molecule and the plurality of molecules and contacting the sample with a protein scaffold as described herein. The protein scaffold may have previously been generated with loop regions that are specific to the desired target. The scaffold (e.g., the loop regions of the scaffold) specifically binds to the target molecule. The method further includes separating the target molecule bound to the protein scaffold from the plurality of molecules. In some embodiments, the step of separating includes immobilizing the protein scaffold. In some embodiments, the protein scaffold is conjugated to a particle. In some embodiments, the particle includes a magnetic bead. In some embodiments, the protein scaffold is conjugated to a resin or monolith as described herein.

The scaffolds, particles, resins, and columns described herein are amenable to single-step purifications from crude mixtures. For example, target proteins may be eluted with polyol, imidazole (e.g., 3 M imidazole, e.g., pH 8), or sodium citrate (e.g., 0.1 M sodium citrate, e.g., pH 2.5). The scaffolds, particles, resins, and columns may be cleaned, e.g., with an alkaline substance, e.g., NaOH, e.g., 0.1 M NaOH. Furthermore, resins or columns made with a protein scaffold as described herein may remain functional after exposure to dimethylformamide (DMF), e.g., 100% DMF, and/or autoclaving. These features allow reuse of the scaffold without loss of target binding capabilities.

Examples

Example 1. Protein engineering of nanoCLAMP antibody-mimetics for use as affinity chromatography capture agents resistant to high temperature, trypsin, low pH, organic solvent, and sodium hydroxide

Immunoaffinity chromatography is an established laboratory-scale technique for the isolation of target proteins with high yield and purity. However, properties of antibodies and nanobodies often make immunoaffinity chromatography incompatible with conditions typical of many industrial scale processes. To overcome these limitations, we have optimized an antibody-mimetic, called a nanoCLAMP, for use in process-scale affinity chromatography. The 16 kD antibody mimetic is based on a bacterial, cysteine-free, β-sandwich protein with a structure analogous to immunoglobulin variable domains. Like antibodies and other antibody mimetics, the first generation of nanoCLAMPs generally showed high selectivity and affinity but also suffered from sensitivity to high temperature, digestion by proteases, and inactivation by alkali. In this work, we address these limitations with a protein engineering campaign to improve the general robustness of nanoCLAMPs. Over 7 rounds of mutagenesis and screening, we tested 185 protein variants, with at least one mutation made in 58% of the positions in the nanoCLAMP’s constant regions (72 of 124). The campaign yielded a protein with mutations in 30 of 124 positions in the constant regions and dramatically improved resistance to extreme conditions. The mutant protein served as the basis for an improved nanoCLAMP class, called the nC-B class. Phage display was then used to generate several nC-B capture agents recognizing diverse targets. The resulting immunoaffinity capture agents typically had a Kd of < 80 nM, a T_m of > 70 °C and a t1/2 in 0.1 mg/ml trypsin of > 20 hours. The nC-B capture agents also maintained their binding capacity and selectivity over 20 purification cycles, each including 10 minutes of cleaning in place with 0.1 M NaOH. Affinity chromatography resins made with nC-B capture agents supported efficient single-step purifications from crude mixtures. Target proteins could be eluted with either 3 M imidazole, pH 8 or 0.1 M sodium citrate, pH 2.5. Furthermore, affinity chromatography resins with nC-B capture agents remained functional after exposure to 100% DMF and autoclaving. The robust nC-B scaffold developed in this work enables the development of custom, high performance affinity chromatography resins compatible with the harsh conditions of process-scale applications.

Introduction

For the process-scale purification of non-antibody targets, the use of AC is much less widespread than the use of Protein A to purify antibodies. For instance, non-protein, ligand-based approaches, such as small molecule substrate mimetics, are effective but are limited to specific enzyme classes and are difficult to use with a general protein of interest. Alternatively, specialized affinity resins such as glutathione or nickel require the addition of non-native tags, which cause downstream complications for proteins intended for therapeutic use. Immuno-AC with antibody- or nanobody-based capture agents is the most generally applicable approach and has widely been used to purify a diverse range of proteins at laboratory-scale. However, immuno-AC has some limitations. In general, the conjugation of the antibody to the resin often results in heterogeneous coupling because of a lack of precise control over the sites of conjugation. In addition, chromatography must be performed under oxidizing conditions in order to preserve the disulfide bonds essential for maintaining antibody structure. Elution of the target also usually requires low or high pH conditions that are incompatible with some target proteins. For process-scale applications, the chief limitation of immuno-AC resins is their sensitivity to the sodium hydroxide solutions which are preferred for cleaning-in-place procedures.

These limitations have motivated the development of AC resins based on antibody mimetics - proteins that, like antibodies, can be produced to bind specific antigens with high affinity and specificity, but are not directly derived from the immune system of animals. Examples of antibody mimetics include those based on Protein A, gamma-b crystallin, ubiquitin, cystatin, lipocalins, ankyrin repeat motifs, SH3 domains, fibronectin, OB fold domains, lamprey variable lymphocyte receptors, minibodies, miniproteins, and Kunitz domains. Most of these antibody mimetics use animal-sparing phage display for their isolation and can be produced by microbial cells. Many have a unique cysteine that supports homogeneous, site-specific coupling to sulfhydryl-reactive supports. However, few of the current antibody mimetics have been shown to enable elution near neutral pH or to be compatible with harsh conditions sometimes needed for process-scale procedures. The development of custom peptide and protein-based affinity chromatography platforms for the purification of non-antibody protein therapeutics is supported by some proprietary platforms, but the availability and technical details of these platforms are limited (e.g., Avitide, LigaTrap Technologies, Astrea Bioseparations, and Navigo Proteins).

We set out to develop a broadly available and generally applicable class of protein-based affinity capture reagents useful for industrial protein purification. Specifically, we sought to develop an AC capture agent technology with the potential to address a broad range of targets; enable single-step purifications from crude mixtures; elute targets at near neutral pH; and maintain function after exposure to high temperatures, organic solvents, proteases and pH extremes.

We previously developed an antibody mimetic based on the 16 kD, second Type 32 carbohydrate binding module of the hyaluronoglucosaminidase nagH from Clostridium perfringens (NagH CpCBM32-2) (Suderman et al. Protein Expr. Purif. 134:114-124, 2017). This binding module is a monomeric β-sandwich domain with variable loops comparable to the complementarity-determining regions of the immunoglobulin variable domain. We named these antibody mimetics nanoCLAMPs (nano Clostridial Antibody Mimetic Proteins). nanoCLAMPs have the unusual and advantageous general property of releasing bound target protein in solutions of non-denaturing polyols and ammonium sulfate at neutral pH. We have isolated nanoCLAMPs recognizing a variety of target proteins with Kd’s ranging from 1 to 100 nM before affinity maturation and from 10 to 1000 pM after affinity maturation (Suderman et al. 2017). Affinity chromatography media produced with nanoCLAMPs support single-step purifications to near homogeneity as assessed by Coomassie staining. The working binding capacity ranges from 5 to 200 nmol target protein per ml of packed beads. While these first generation nanoCLAMP resins have adequate selectivity and capacity for laboratory-scale purifications, they are suboptimal for process-scale purifications because of their moderate thermostability (T_m ranging from 45 to 60 °C), sensitivity to protease digestion (t1/2 < 1 h in 0.1 mg/ml trypsin), and moderate alkaline resistance (50% loss of activity after 12 cycles of incubation with 0.1 M NaOH).

We set out to improve upon the performance of the first generation of nanoCLAMPs in order to improve their general utility for the process-scale purification of targets with AC. Toward this end, we undertook a multi-round protein engineering campaign to improve upon first generation nanoCLAMPs. In each round, we made site-directed mutations at specific positions, evaluated the mutations’ impact on thermostability and monodispersity, and then combined beneficial mutations to generate the basis for the next round. The end-product of the 7-round, >180 mutation campaign was a clone with significantly improved performance. We used this clone as the basis for the new “nC-B” class of nanoCLAMPs. Here we report on the campaign to develop the improved nC-B class, the generation and characterization of nC-B nanoCLAMPs against the exemplary protein yeast SUMO (SMT3), and the performance of nC-B nanoCLAMPs after repeated exposure to extreme conditions.

Results

Approach to improve nanoCLAMP performance with a multi-round campaign of site-directed mutagenesis.

We aimed to improve the thermal, proteolytic, and alkaline stability of the first generation of nanoCLAMPs by using consensus protein design, an approach for improving the thermostability of proteins. Our initial attempts of directly synthesizing several versions of the consensus sequence were unsuccessful and yielded only aggregated or multimeric proteins. Therefore, we decided to take an incremental approach of making single mutations, determining their effect alone and in combination, and working towards an improved protein in several rounds. We focused the effort on surface residues and loops and generally sought to remove lysine, arginine and asparagine where possible. The intent of making substitutions for lysine and arginine was to reduce the number of potential sites for cleavage by trypsin. The intent of making substitutions for asparagine was to increase the protein’s resistance to the alkaline solutions commonly used to sanitize industrial chromatography columns. Asparagine in certain contexts is susceptible to deamidation, and its removal has been shown to reduce the loss of protein binding activity in sodium hydroxide.

We refer to an individual nanoCLAMP as a “clone,” i.e., a specific isolate with unique binding loops. Our starting clone was an anti-SMT3 nanoCLAMP, SMT3-A1 (Suderman et al. supra), which is capable of purifying SUMO-fusion proteins in a single step from complex lysates. The target of SMT3-A1 is a SUMO tag, which is widely used to improve the solubility and yield of proteins produced in E. coli, and can be cleaved to leave behind a native sequence. SMT3-A1 consists of residues 807 to 946 of the second Type 32 carbohydrate binding module of NagH from Clostridium perfringens, with loops mutated in positions 817-820, 838-844, and 931-935 and selected for binding to yeast SUMO. Throughout the paper we number nanoCLAMP amino acids based on the sequence of NagH. To identify evolutionarily conserved amino acids, we generated a multiple sequence alignment for 20 non-redundant BLAST hits selected to cover a range of similarity. The percent identity of the orthologs ranged from 58% for Clostridium nigeriense to 43% for Coprobacillus sp. AF21-8LB (Table 1).

Table 1. Orthologous sequences used for consensus-based design

Our workflow for the protein engineering campaign is summarized in FIG. 1. We made site-specific mutations and then purified each individual mutant protein for further biophysical assessment. For the initial assessment, we analyzed the resulting mutants using differential scanning fluorimetry (DSF) and assessed two parameters, melting temperature and initial fluorescence. The rationale for including low initial fluorescence as a criterion for progression is that we previously observed a rough correlation between high initial fluorescence and the presence of soluble aggregates and multimers detected by size exclusion chromatography. Initial fluorescence in DSF may be caused by the binding of fluorophore to hydrophobic patches exposed prior to unfolding. After identifying beneficial mutations, we made several constructs with different combinations of the singly beneficial mutations, whose effects were usually, but not always, additive. As a secondary screen, we confirmed that the starting clone for each subsequent round of mutagenesis was monodisperse by size exclusion chromatography.

Optimized protein resulting from the mutagenesis campaign.

The results of 7 rounds of mutagenesis and evaluation are summarized in Table 2. The full list of mutations is listed in Table 3.

Table 2. Summary of mutations to improve nanoCLAMP performance

Table 3. Summary of mutations and resulting biophysical properties in each round of mutagenesis

N = no usable data

*IF = Initial fluorescence in DSF assay. Criteria for L, M, H in IF assay: L = IF below 30% of amplitude of unfolding peak, M = 30% to 50% of amplitude of unfolding peak, H = above 50% of amplitude of unfolding peak

The volumes listed for Aggregated, Dimer, and Monomer SEC % correspond to elution volumes from a Superdex 75 SEC column.
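The L/M/H criteria for initial fluorescence given in the footnote above can be expressed as a simple threshold rule. The following is a minimal sketch only; the function name and its arguments are hypothetical and are not taken from the assay software, and the handling of values falling exactly on the 30% and 50% boundaries (assigned here to M) is an assumption.

```python
def classify_initial_fluorescence(initial_fluorescence: float,
                                  unfolding_peak_amplitude: float) -> str:
    """Return 'L', 'M', or 'H' for a DSF trace.

    Thresholds follow the Table 3 footnote: L is below 30% of the
    amplitude of the unfolding peak, M is 30% to 50%, H is above 50%.
    Boundary values (exactly 30% or 50%) are assigned to M, which is
    an assumption of this sketch.
    """
    if unfolding_peak_amplitude <= 0:
        raise ValueError("unfolding peak amplitude must be positive")
    ratio = initial_fluorescence / unfolding_peak_amplitude
    if ratio < 0.30:
        return "L"
    if ratio <= 0.50:
        return "M"
    return "H"


if __name__ == "__main__":
    # Hypothetical readings in arbitrary fluorescence units.
    for reading in (10.0, 40.0, 75.0):
        print(reading, classify_initial_fluorescence(reading, 100.0))
```

A variant with no unfolding peak (flat-line trace) has no amplitude to normalize against, so the sketch raises an error rather than guessing a category.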

Rounds 1 through 4 focused on increasing T_m. Rounds 5 and 6 focused on removing potential protease cleavage sites while maintaining T_m and included many reversions to past rounds. Round 7 focused on removing remaining asparagines while maintaining T_m. Overall, mutations were tested for 58% of the amino acids in the constant regions (72 of 124 positions). The clone resulting from 7 rounds of mutagenesis is designated P2788. In all, P2788 contains 30 mutations, representing approximately 24% of the positions in the constant regions, and includes a three-residue, C-terminal extension of residues from CBM32-2. The resulting mutations are broadly distributed throughout the primary sequence as shown in an alignment with the original sequence (FIG. 2) and throughout the 3-D structure when mapped to the CBM32-2 crystal structure (PDB accession 2W1Q). Although the mutants all possessed the same binding loops as the initial clone, we expected and observed a gradual decline in target binding with an increasing number of mutations, many of which were adjacent to the binding loops. We speculate that the loss of binding was caused by shifts in the conformation of the binding loops (data not shown). Because our intent was to improve the stability of the nanoCLAMP and then isolate new binders, our workflow did not include a screen for SUMO binding.

The number of lysines and arginines representing potential trypsin cleavage sites was reduced from 11 in the constant regions of the starting protein (clone SMT3-A1) to 5 in the constant regions of the resulting protein (clone P2788). Three of the remaining arginines (R881, R897 and R925) are expected to be involved in salt bridges as identified by the ESBRI algorithm (Costantini et al. ESBRI: a web server for evaluating salt bridges in proteins. Bioinformation 3:137-138, 2008). For these residues, we were unable to identify any substitution mutations that did not destabilize the proteins. For another position, K883, substitution with arginine was beneficial, but several additional substitutions either resulted in a greater than 10 °C decrease in melting temperature or high initial fluorescence in DSF. For one remaining position, K878, which is universally conserved in the alignment, 10 of 10 substitution mutations resulted in proteins with high initial fluorescence in DSF. In the wild-type NagH CpCBM32-2 structure, K878 Nζ forms hydrogen bonds with the carbonyl oxygens of P905 and G907 as determined by the RING 2.0 algorithm (Piovesan, et al. Nucleic Acids Res 44:W367-374, 2016). For asparagine, the number was reduced from 8 in the original clone (SMT3-A1) to 1 in the mutated clone. The remaining asparagine N928 is universally conserved in the consensus alignment and buried in the 3D structure. N928 Nδ2 forms hydrogen bonds with the carbonyl oxygens of S845 and L846 as determined by the RING 2.0 algorithm. We chose not to attempt substitution mutations with N928 because of the low likelihood of deamidation based on its sequence context and the likely challenge of finding a substitution with a beneficial effect.

We next used AlphaFold to predict the 3-D structure of P2788 in order to assess the likelihood of gross changes in 3D-structure (Jumper et al. Nature 596:583-589, 2021). A 3D-alignment of the crystal structure of CBM32-2 and the predicted structure of P2788 was performed with the jFATCAT(rigid) algorithm (FIG. 3). As expected given the conservative substitutions, the high degree of sequence similarity, and the use of templates by the AlphaFold algorithm, the predicted structure of the constant regions of P2788 does not show gross deviations from the solved crystal structure of NagH CpCBM32-2. As expected, because of the differences in amino acid sequence, the variable loops show deviations, especially for the longest 838-844 loop. The overall similarity of the solved CBM32-2 structure versus the predicted structure of P2788 is high, even with the differences in loop sequences (TM-score = 0.95).

Compared with the starting protein, the T_m increased by 24 °C from 52 °C to 76 °C (FIG. 4). We next tested P2788’s resistance to digestion by trypsin. Following a 16-hour digestion in 0.1 mg/ml trypsin, no full length SMT3-A1 remained whereas no apparent digestion of P2788 had occurred as assessed by SDS-PAGE (FIG. 5A). A time course of trypsin digestion determined that the t1/2 increased from 3 hours for SMT3-A1 to > 16 hours for P2788 (FIGS. 5C-5E). While our protein engineering campaign deleted only a few surface-exposed chymotrypsin-cleavage sites, we also tested resistance to chymotrypsin as a reflection of general stability. Following a 16-hour digestion with 0.1 mg/ml chymotrypsin, about a fifth of P2788 appeared to remain full-length while SMT3-A1 was completely digested (FIG. 5A).

We next sought to determine whether a new class of nanoCLAMPs based on the constant regions of P2788 could confer these properties to newly isolated clones. We call the P2788-derived class the “nC-B class” (nanoCLAMP-B with the identifier “B” referring to the next variant of the original class of nanoCLAMPs). The first generation of nanoCLAMPs represented by SMT3-A1 and others is referred to as the “nC-A class” (nanoCLAMP-A with the identifier “A” referring to the first class of nanoCLAMPs). For clarity, the nC-A class of nanoCLAMPs encompasses the first published nanoCLAMPs (Suderman et al. supra). Relative to NagH CpCBM32-2, nanoCLAMPs of the nC-A class have a M929L mutation that removes a methionine as well as amino acid differences in the variable loops.

Phage display library panning for improved SUMO capture agents of the nC-B class of nanoCLAMPs.

To confirm that the optimized constant regions of the nC-B class can support the general isolation of high affinity binders with improved thermal, proteolytic, and alkaline stability, we sought to isolate new nanoCLAMP binders containing the nC-B constant regions and a diversity of variable loops. We first constructed a phage display library with randomized binding loops in the context of the nC-B constant regions and panned the library for binders to yeast SUMO (SMT3). This library has the same three variable loops with randomized residues as the previous library from which SMT3-A1 was isolated but uses the nC-B constant regions instead of the nC-A constant regions. Degenerate oligonucleotides constructed with phosphoramidite trimers were designed so that the variable regions encoded all amino acids except cysteine (omitted to avoid heterogeneous coupling to multiple cysteines), methionine (omitted to avoid the risk of inactivating oxidation), and lysine and arginine (omitted to avoid the addition of trypsin-cleavage sites). Position 818 was held constant with a valine, the wild-type amino acid, because valine and another small hydrophobic amino acid, isoleucine, appeared in one quarter of nanoCLAMPs from previous screens. The resulting library contained over 10¹⁰ variants of nC-B nanoCLAMPs.
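For orientation, the size of the sequence space addressed by such a library can be estimated from the design rules stated above. The sketch below assumes, for illustration only, that every position in the three loop ranges given earlier for SMT3-A1 (817-820, 838-844, and 931-935) is randomized except the fixed position 818, and that loop lengths do not vary; neither assumption is stated explicitly for the nC-B library, and all variable names are hypothetical.

```python
# Standard 20 amino acids, minus those excluded from the variable regions.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
EXCLUDED = set("CMKR")  # cysteine, methionine, lysine, arginine
ALLOWED = AMINO_ACIDS - EXCLUDED

# Loop ranges in NagH numbering (assumed fully randomized, fixed length).
LOOPS = {
    "817-820": range(817, 821),
    "838-844": range(838, 845),
    "931-935": range(931, 936),
}
FIXED_POSITIONS = {818}  # held constant as valine

randomized_positions = [
    pos for loop in LOOPS.values() for pos in loop
    if pos not in FIXED_POSITIONS
]

# Number of distinct loop-sequence combinations under these assumptions.
theoretical_diversity = len(ALLOWED) ** len(randomized_positions)

# The text reports "over 10^10" variants; use that as a lower bound.
library_size_lower_bound = 10 ** 10
sampled_fraction = library_size_lower_bound / theoretical_diversity

if __name__ == "__main__":
    print(f"allowed residues per position: {len(ALLOWED)}")
    print(f"randomized positions: {len(randomized_positions)}")
    print(f"theoretical diversity: {theoretical_diversity:.2e}")
    print(f"fraction sampled at 1e10 variants: {sampled_fraction:.1e}")
```

Under these assumptions the physical library samples only a small fraction of the theoretical sequence space, which is typical for phage display libraries with this many randomized positions.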

Following the third round of panning of this library, we randomly selected 96 clones and screened them for target binding by semELISA, which yielded 93 confirmed positives. Of these, 40 were sequenced to identify 18 unique nanoCLAMP SUMO binders. These binders were subcloned into a bacterial expression vector, expressed, purified by immobilized metal affinity chromatography (IMAC), and confirmed to be over 90% pure as estimated by SDS-PAGE (data not shown). We then evaluated the purified nanoCLAMPs’ ability to function as affinity capture agents in a medium-throughput, small-scale depletion assay. In this assay, the nanoCLAMPs were conjugated to cross-linked agarose under denaturing conditions, refolded on the resin, and incubated with SUMO, and the quantity of unbound SUMO was measured by A280.

The purified nanoCLAMPs were then screened for monodispersity by size exclusion chromatography and T_m by DSF. Of the 18, seven (38%) were over 90% monomer. Of these, five had a melting temperature of greater than 73 °C, with four having melting temperatures greater than 99 °C (Table 4). The initial results with the SUMO test case suggest that the nC-B constant regions generally support the isolation of clones with high monodispersity and thermostability.

Table 4. Yield and proportion of monomer for anti-SUMO nanoCLAMPs

Characterization of binding affinity, thermostability and protease-resistance of nanoCLAMP capture agents of the nC-B class.

We chose three nC-B nanoCLAMPs (P2808, P2809, and P2811) for further characterization. These were selected to provide a diverse sample of binding loops. For the three clones, Loop 817-820 and Loop 931-935 did not have any apparent similarity. Except for V818, whose identity was fixed in the library, there were no identities in any positions of these loops. For Loop 838-844, clones P2808 and P2809 are identical in 5 of 7 positions while clone P2811 shows no identities with either.

All three nanoCLAMPs were produced in E. coli with yields > 150 mg/liter of shaken E. coli culture and used for biophysical characterization experiments.

We first checked the quaternary structure of nanoCLAMPs by size exclusion chromatography to confirm that subsequent results could be interpreted without confounding avidity effects from higher order multimers or aggregates. All three clones migrated as monodisperse monomers (FIG. 6). To rank these nanoCLAMPs by their affinity for SUMO, we used biolayer interferometry to measure their dissociation constants, which ranged from 5 to 80 nM (Table 5). We then measured the melting temperature of the nanoCLAMPs using DSF. P2808 had an apparent T_m of 73 °C. P2809 and P2811 had flat-line DSF curves with no melting transition apparent between 25 °C and 99 °C (FIG. 6). This observation suggests that the T_m for these nanoCLAMPs exceeds the quantifiable range for this assay. We corroborated this observation with a functional binding assay to assess kinetic thermostability. In this assay, we incubated samples of each nanoCLAMP at different temperatures, cooled and centrifuged the solutions, and then measured the binding activity remaining in the supernatant by biolayer interferometry. In this test, we define T50 as the temperature of a 5-minute heat challenge after which 50% of binding activity is irreversibly lost. The T50 ranged from ~85 °C for clone P2808 to > 100 °C for clones P2809 and P2811. The rank order and values are consistent with the T_m measured by DSF (FIG. 11). Clones P2809 and P2811, both with T_m values > 100 °C, maintained greater than 90% activity after incubation at boiling temperatures for 5 minutes. The maintenance of binding activity indicates that the nanoCLAMPs remained in solution after heat treatment and did not irreversibly aggregate or precipitate. Taken together with the DSF data, the kinetic thermostability measurements suggest that these two nanoCLAMPs may remain folded up to, and possibly above, 99 °C.

Table 5. T_m, T50, K_d, trypsin resistance of selected anti-SUMO nanoCLAMPs

We next characterized the trypsin-resistance of the three clones. Clones P2808 and P2809 were highly resistant to digestion with 0.1 mg/ml trypsin, while clone P2811 was less resistant (FIG. 5F). FIGS. 5B-5D show a time course of a tryptic digest comparing nanoCLAMPs with different combinations of constant regions and variable loops to understand the contribution of each component to trypsin resistance. We tested the original clone SMT3-A1 (nC-A constant regions and original variable loops), P2788 (nC-B constant regions with the original variable loops from SMT3-A1), and P2808 (nC-B constant regions with newly isolated variable loops). Both P2788 and P2808 show a t1/2 of > 20 hours in 0.1 mg/ml trypsin, compared with ~4 hours for the original SMT3-A1 clone. Together with the observations of P2809 and P2811, these results indicate that trypsin-resistance depends upon the sequences of both the variable loops and constant regions. The observation of trypsin-resistance for 3 of 4 clones isolated with diverse loop sequences and identical constant regions indicates that the nC-B constant regions can be generally used to isolate trypsin-resistant clones, in at least a first test-case.
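The way a half-life can be extracted from a digestion time course may be illustrated as follows. This is a sketch under stated assumptions, not the procedure used in this Example: it assumes that the loss of full-length protein follows pseudo-first-order kinetics, fits ln(fraction remaining) against time with a line through the origin, and uses hypothetical densitometry values. The function name is likewise hypothetical.

```python
import math


def half_life_hours(times_h, fraction_remaining):
    """Estimate t1/2 assuming first-order decay, f(t) = exp(-k * t).

    Performs a least-squares fit of ln(f) = -k * t constrained through
    the origin (f = 1 at t = 0). All fractions must be greater than 0.
    Returns math.inf when no net decay is detected.
    """
    numerator = sum(t * math.log(f)
                    for t, f in zip(times_h, fraction_remaining))
    denominator = sum(t * t for t in times_h)
    if denominator == 0:
        raise ValueError("at least one time point must be later than t = 0")
    k = -numerator / denominator  # apparent rate constant, per hour
    if k <= 0:
        return math.inf
    return math.log(2) / k


if __name__ == "__main__":
    # Hypothetical band intensities normalized to the t = 0 sample.
    times = [0.0, 4.0, 8.0]
    labile_clone = [1.0, 0.5, 0.25]
    resistant_clone = [1.0, 1.0, 1.0]
    print(half_life_hours(times, labile_clone))
    print(half_life_hours(times, resistant_clone))
```

When little or no digestion is observed within the time course, the fit cannot resolve a finite half-life, which is consistent with reporting such clones only with a lower bound (e.g., > 20 hours).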

Measurement of performance parameters with affinity chromatography using capture agents of the nC-B class of nanoCLAMPs.

For brevity, we will subsequently refer to affinity chromatography resins made with capture agents of the nC-B class of nanoCLAMPs as “nC-B affinity resins.” As a test case for the utility of nC-B affinity resins, we used P2808 as a capture agent for more detailed studies. To generate the affinity resin, the P2808 protein was expressed, purified by IMAC under denaturing conditions, conjugated to sulfhydryl-reactive 6% cross-linked agarose resin, and then refolded by rinsing with buffered saline. We prepared a test mixture of crude E. coli lysates spiked with a SUMO-GFP fusion protein, optimized purification conditions and assessed the resins’ selectivity and binding capacity.

In pilot experiments, we observed that polyol-elution at neutral pH was still possible, as with the original nanoCLAMP (Suderman et al. supra), but with qualitatively lower speed and yield (data not shown). As an alternative, we tested elution of the target protein with molar concentrations of imidazole, which has been used successfully for disrupting protein-peptide interactions and antibody-Protein A interactions.

A buffer with 3 M imidazole at pH 8 quickly and completely eluted SUMO from P2808 resin. As shown in FIG. 12, after incubation for 1 hour with an excess of target protein in a spiked E. coli lysate, the bound target was washed and eluted with a purity of over 90% as estimated by densitometric analysis of Coomassie stained SDS-PAGE. The static binding capacity under these conditions was 11.5 mg/ml resin (277 nmol/ml). In subsequent experiments, we found that imidazole elution worked consistently with P2808, P2809, and P2811, as well as three different AC resins targeting SUMO, mCherry, and GFP with capture agents of the nC-A class. The observations suggest that imidazole elution is a general property of both classes of nanoCLAMPs.

With elution conditions established, we next measured the dynamic binding capacity of P2808 resin using a 0.6 ml packed column (0.5 cm ID x 3.06 cm) at a constant flowrate of 0.5 ml/min. We utilized the fluorescence of the SUMO-GFP target to determine the QB10, or the volume at which the eluate’s fluorescence equaled 10% of the load’s fluorescence (see Materials and Methods for calculations). The dynamic binding capacity of P2808 resin under these conditions was 10 mg/ml resin (240 nmol/ml resin) (FIG. 7). This capacity represents an approximate 70% increase over our original SMT3-A1 resin.
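A 10% breakthrough measurement of this kind can be sketched as follows. The actual calculation is given in the Materials and Methods and is not reproduced here; the sketch assumes a common frontal-analysis convention in which the breakthrough volume is found by linear interpolation of the flow-through fluorescence and the capacity is the mass loaded up to that volume, corrected for dead volume, per ml of resin. The trace values, dead volume, and load concentration below are hypothetical.

```python
def breakthrough_volume(volumes_ml, fluorescence, load_fluorescence,
                        fraction=0.10):
    """Volume at which flow-through fluorescence first reaches
    `fraction` of the load fluorescence (linear interpolation between
    neighboring points). Raises if the threshold is never crossed."""
    threshold = fraction * load_fluorescence
    points = list(zip(volumes_ml, fluorescence))
    for (v0, f0), (v1, f1) in zip(points, points[1:]):
        if f0 < threshold <= f1:
            return v0 + (threshold - f0) * (v1 - v0) / (f1 - f0)
    raise ValueError("trace never crosses the breakthrough threshold")


def dynamic_binding_capacity(v_breakthrough_ml, dead_volume_ml,
                             load_conc_mg_per_ml, column_volume_ml):
    """Capacity in mg target per ml resin at the breakthrough point."""
    loaded_mg = (v_breakthrough_ml - dead_volume_ml) * load_conc_mg_per_ml
    return loaded_mg / column_volume_ml


if __name__ == "__main__":
    # Hypothetical trace (arbitrary fluorescence units).
    volumes = [0.0, 5.0, 10.0, 15.0, 20.0]
    signal = [0.0, 0.0, 0.0, 20.0, 60.0]
    v10 = breakthrough_volume(volumes, signal, load_fluorescence=100.0)
    dbc = dynamic_binding_capacity(v10, dead_volume_ml=0.5,
                                   load_conc_mg_per_ml=0.4,
                                   column_volume_ml=0.6)
    print(f"V(10% breakthrough) = {v10:.2f} ml, DBC = {dbc:.1f} mg/ml")
```

Using fluorescence rather than A280 as the readout allows the target to be tracked selectively in the presence of lysate proteins, which is why the SUMO-GFP fusion is convenient for this measurement.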

We next tested the efficiency of the P2808 SUMO affinity resin under flow conditions. A 0.6 ml packed column and a flowrate of 0.5 ml/min (linear flowrate = 153 cm/h) were used to test two regimes.

In the capacity-excess regime, SUMO-GFP was spiked into an E. coli lysate at a low concentration (SUMO-GFP = 0.76% of total protein by weight) and loaded at 42% of the column’s dynamic binding capacity (36% of its static binding capacity). As assessed by densitometry of a Coomassie-stained SDS-PAGE gel, the purity was > 90%, and the yield was 90% (FIG. 8A).

In the capacity-limited regime, SUMO-GFP was spiked into an E. coli lysate at a high concentration (SUMO-GFP = 5.7% of total protein by weight) and loaded at 133% of the column’s dynamic binding capacity (116% of its static binding capacity) (FIG. 8B). The purity and yield were comparable to the capacity-excess regime, and both were estimated to be > 90%. The performance metrics for both purifications are summarized in Table 6.

Table 6. Metrics for AC purification of SUMO-GFP with P2808 resin loaded at different capacities

1: DBC = 10 mg/ml resin (see FIG. 7)

2: SBC = 11.5 mg/ml resin (see FIG. 12)
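The column and loading figures quoted above are mutually consistent, as the following arithmetic check illustrates. It uses only values stated in the text (column dimensions, flowrate, and the dynamic and static binding capacities in mass and molar units); the molecular weight computed at the end is merely implied by the paired mg/ml and nmol/ml values and is not stated in this section. Variable and function names are hypothetical.

```python
import math

# Values quoted in the text.
inner_diameter_cm = 0.5
bed_height_cm = 3.06
flow_ml_per_min = 0.5
dbc_mg_per_ml, dbc_nmol_per_ml = 10.0, 240.0   # dynamic binding capacity
sbc_mg_per_ml, sbc_nmol_per_ml = 11.5, 277.0   # static binding capacity

# Column geometry: 1 ml = 1 cm^3.
cross_section_cm2 = math.pi * (inner_diameter_cm / 2) ** 2
column_volume_ml = cross_section_cm2 * bed_height_cm
linear_flow_cm_per_h = flow_ml_per_min * 60 / cross_section_cm2


def load_as_fraction_of_sbc(fraction_of_dbc: float) -> float:
    """Convert a load expressed relative to DBC into one relative to SBC."""
    return fraction_of_dbc * dbc_mg_per_ml / sbc_mg_per_ml


def implied_mw_kda(mg_per_ml: float, nmol_per_ml: float) -> float:
    """Molecular weight implied by paired mass and molar capacities.
    mg/nmol equals 10^6 g/mol, i.e. 1000 kDa."""
    return mg_per_ml / nmol_per_ml * 1000.0


if __name__ == "__main__":
    print(f"column volume: {column_volume_ml:.2f} ml")
    print(f"linear flowrate: {linear_flow_cm_per_h:.0f} cm/h")
    for frac in (0.42, 1.33):
        print(f"{frac:.0%} of DBC = {load_as_fraction_of_sbc(frac):.1%} of SBC")
    print(f"implied MW (SBC): {implied_mw_kda(sbc_mg_per_ml, sbc_nmol_per_ml):.1f} kDa")
    print(f"implied MW (DBC): {implied_mw_kda(dbc_mg_per_ml, dbc_nmol_per_ml):.1f} kDa")
```

The small differences between the computed and quoted percentages are within the rounding of the reported values.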

Compatibility of nC-B affinity resins with repeated cycles of NaOH cleaning.

For large-scale production of industrial proteins or biologics, the re-use and cleaning of columns in place reduces manufacturing costs and maintains consistent performance. Sodium hydroxide solutions are commonly used for cleaning-in-place procedures, so we tested the general compatibility of the new nanoCLAMP resins with sodium hydroxide treatment. We tested 3 resins made with nC-A nanoCLAMPs and 3 resins made with nC-B nanoCLAMPs. Each cycle consisted of loading a SUMO-GFP target spiked into an E. coli lysate; a wash step; elution with 3 M imidazole pH 8; 10 minutes of contact time with 0.1 M NaOH; and then a 5-minute re-equilibration step. The eluates were collected and analyzed for purity by SDS-PAGE, and the purified target was quantified by fluorescence spectroscopy (FIGS. 9 and 13). In the first few cycles, we were surprised to observe an improvement in binding capacity of 5 to 20% with nC-B resins. It is possible that the increase in binding results from the removal of inhibitory material by NaOH. For all three nC-B resins, the binding capacity plateaued over 20 cycles and remained at or above 100% of starting capacity. In contrast, we observed a steady reduction in binding capacity by 25% to 50% with all three nC-A resins. Taken together, these data indicate that nanoCLAMPs of the nC-B class can generally serve as capture agents for resins capable of single-step affinity purification of targets to homogeneity. These resins are also compatible with cleaning-in-place protocols using 0.1 M sodium hydroxide for over 20 cycles, without loss of binding capacity or specificity. Further, in practice, cleaning-in-place cycles are not usually performed after each run, so the expected lifetime likely exceeds 100 cycles with the assumption of sanitation every fifth run.

Compatibility of nC-B affinity resins with repeated cycles of low pH elution.

Because elution with 3 M imidazole might be suboptimal for some applications, we also tested elution with a citrate buffer at pH 2.5. In these experiments, the resin was also cleaned with NaOH in between each cycle for 1 minute before re-equilibrating the resin with buffered saline. The nanoCLAMP maintained 100% of its binding capacity and specificity over more than 20 cycles of loading, elution, and regeneration (FIGS. 14A and 14B).

Compatibility of nC-B affinity resins with organic solvent and autoclaving.

We next tested the ability of the nC-B resins and the original SMT3-A1 resin to resist more extreme conditions. First, we tested the resins’ ability to recover selective binding capacity after exposure to 100% DMF. We incubated nanoCLAMP resin in 100% DMF for 2 hours, re-equilibrated with buffered saline, and then measured binding capacity. Of the four resins tested (the original SMT3-A1 resin and P2808 resin, P2809 resin, and P2811 resin), all retained over 85% binding capacity after treatment with DMF (FIG. 10A) and maintained their apparent selectivity as assessed qualitatively by SDS-PAGE (FIG. 10B).

Because the nC-B resins were robust to the broad range of conditions tested so far, we decided to determine whether the resins could also retain binding and specificity after autoclaving. We autoclaved the resin with a 105-minute liquid steam cycle, including 30-minute exposure to 120 °C and 20 p.s.i., re-equilibrated at room temperature, and then measured binding capacity. The resin made with the nC-A nanoCLAMP (SMT3-A1) did not bind any detectable target protein after autoclaving. In contrast, the resins made with the nC-B nanoCLAMPs (P2808, P2809, and P2811) retained 25% to 45% binding capacity with specificity comparable to controls (FIGS. 10C and 10D). The SMT3-A1 eluates were not analyzed in FIG. 10D because there was not enough protein in the autoclaved sample to prepare a normalized aliquot to compare to the control. To our knowledge, affinity chromatography resins based on nC-B nanoCLAMPs are the first protein-based affinity resins shown to retain significant binding capacity and specificity after autoclaving.

Discussion

We report on a protein engineering campaign yielding an improved class of nanoCLAMPs suitable as capture agents for process-scale affinity chromatography. Similar to Protein A resins or antibody-based immunoaffinity resins, resins made with the new nC-B class of nanoCLAMPs support single-step purifications from complex mixtures with high yield and fold-purification. Similar to Protein A resins, but unlike antibody-based resins, nC-B resins are compatible with NaOH cleaning-in-place, can be produced by bacterial expression of the capture reagent, and lack cysteine residues. Unlike either Protein A or antibody-based resins, nC-B resins are distinct in having been shown to exhibit resistance to boiling temperatures, trypsin, and organic solvent. Key performance parameters of nC-B resins and supporting results are summarized in Table 7.

Table 7. Attributes of affinity chromatography resins made with the nC-B class of nanoCLAMPs

Affinity resins based on Sulfolink crosslinked beaded agarose (Thermo)

This work provides a test case supporting the potential of nC-B nanoCLAMP resins to extend Protein A-like levels of performance to a broad range of proteins beyond antibodies. nC-B resins’ efficiency, low cost-of-manufacture and reusability have the potential to reduce the total cost of manufacture for process-scale purifications. In addition, nanoCLAMPs’ general compatibility with high temperature, organic solvent and pH extremes may enable industrial applications where extreme conditions are required.

The improved stability and performance of nC-B nanoCLAMPs also support their use in applications beyond immunoaffinity chromatography. nanoCLAMPs have been used successfully in bioelectric and electrochemical sensors. The conditions at the surface of a biosensor represent a challenging environment where the improved stability of nC-B nanoCLAMPs may be enabling. For example, conjugation of nanoCLAMPs to surfaces in DMF allows the use of reaction conditions compatible with reagents that have low solubility in aqueous buffers.

The performance characteristics demonstrated for the exemplary nanoCLAMPs in this work support further exploration of nanoCLAMPs’ potential to substitute for other capture agents in a broad range of applications, especially those procedures requiring tight, selective or reversible binding and exposure to extreme conditions.

Materials and Methods

Cloning of SMT3-A1 mutants

The plasmid pET(SMT3-A1) containing nanoCLAMP SMT3-A1 was mutated by inverse PCR (Ochman et al. Genetics 120:621-623, 1988) by amplifying the plasmid with forward and reverse primers containing the mutation(s) of interest with 15-bp overlapping 5’ ends, purifying the amplicon, In-Fusion cloning the ends back together (In-Fusion HD Cloning Kit, Takara), and transforming chemically competent NEc1 E. coli (BL21(DE3) derivative with slyDΔ(His151-His196) from Nectagen, Inc.). Plasmids were purified by Qiagen miniprep kit (Qiagen), and mutations were verified by Sanger sequencing of the purified plasmids (Genewiz). Glycerol stocks of the plasmids in NEc1 cells were prepared for seeding expression cultures. Constructs for conjugating to Sulfolink resin (Thermo) coded for the nanoCLAMP with an N-terminal 6-His tag and a 13-amino acid C-terminal GS-linker followed by a Cys. Constructs for expressing nanoCLAMPs for biophysical characterization lacked the GS-linker and the C-terminal Cys to avoid dimerization through disulfide formation.

pET(SMT3-A1) Sequence (SEQ ID NO: 38)

TTCTTGAAGACGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTCATGATAATAATGGTTTCTTAGACGT CAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTTTATTTTTCTAAATACATTCAAATATGTATCCG CTCATGAGACAATAACCCTGATAAATGCTTCAATAATATTGAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGT CGCCCTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTAAAAGATG CTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGC CCCGAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTGTTGACGCCGG GC AAGAGC AAC T C GGT C GC C GC AT AC AC T AT T C T C AGAAT GAC T T GGT T GAGT AC T C AC C AGT C AC AGAAAAGC AT C T T AC GGAT GGC AT GAC AGT AAGAGAAT T AT GC AGT GC T GC C AT AAC C AT GAGT GAT AAC AC T GC GGC C AAC T T AC T T C TGAC AAC GAT C GGAGGAC C GAAGGAGC T AAC CGCTTTTTT GC AC AAC AT GGGGGAT CAT GT AAC TCGCCTTGATCG T TGGGAAC C GGAGC T GAAT GAAGC CAT AC C AAAC GAC GAGC GT GAC AC CACGATGCCTGCAGCAAT GGC AAC AAC GT T GC GC AAAC TAT T AAC T GGC GAAC TACTTACTCTAGCTTCCC GGC AAC AAT T AAT AGAC T GGAT GGAGGC GGAT AAA GTTGCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAGCGTGG GTCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCTACACGACGGGGAGTC AGGC AAC TAT GGAT GAAC GAAAT AGAC AGAT C GC T GAGAT AGGT GC C T C AC TGAT T AAGC AT T GGT AAC T GT C AGAC CAAGTTTACTCATATATACTTTAGATTGATTTAAAACTTCATTTTTAATTTAAAAGGATCTAGGTGAAGATCCTTTT TGATAATCTCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAAG GATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTT T GT T T GC C GGAT C AAGAGC T AC C AAC T C T T T T T C C GAAGGT AAC T GGC T T C AGC AGAGC GC AGAT AC C AAAT AC T GT CCTTCTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCC TGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAG GCGCAGCGGTCGGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATA C C T AC AGC GT GAGC TAT GAGAAAGC GCCACGCTTCCC GAAGGGAGAAAGGC GGAC AGGT AT C C GGT AAGC GGC AGGG 
TCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTTTCGCCAC CTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTT TTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACC GTATTACCGCCTTT GAGT GAGC TGATACCGCTCGCCGCAGCC GAAC GAC C GAGC GC AGC GAGT C AGT GAGC GAGGAA GCGGAAGAGCGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGCATATATGGTGCACTCTC AGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGTATACACTCCGCTATCGCTACGTGACTGGGTCATGGCTGCG CCCCGACACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGACAAGCTGT GACCGTCTCCGGGAGCTGCATGTGTCAGAGGTTTTCACCGTCATCACCGAAACGCGCGAGGCAGCTGCGGTAAAGCT CATCAGCGTGGTCGTGAAGCGATTCACAGATGTCTGCCTGTTCATCCGCGTCCAGCTCGTTGAGTTTCTCCAGAAGC GTTAATGTCTGGCTTCTGATAAAGCGGGCCATGTTAAGGGCGGTTTTTTCCTGTTTGGTCACTGATGCCTCCGTGTA AGGGGGAT TTCTGTTCATGGGGGTAATGATACCGAT GAAAC GAGAGAGGAT GC T C AC GAT AC GGGT T AC T GAT GAT G AACATGCCCGGTTACTGGAACGTTGTGAGGGTAAACAACTGGCGGTATGGATGCGGCGGGACCAGAGAAAAATCACT CAGGGTCAATGCCAGCGCTTCGTTAATACAGATGTAGGTGTTCCACAGGGTAGCCAGCAGCATCCTGCGATGCAGAT C CGGAAC AT AAT GGTGCAGGGCGCT GAC TTCCGCGTTTC C AGAC T T T AC GAAAC AC GGAAAC C GAAGAC CATTCATG TTGTTGCTCAGGTCGCAGACGTTTTGCAGCAGCAGTCGCTTCACGTTCGCTCGCGTATCGGTGATTCATTCTGCTAA CCAGTAAGGCAACCCCGCCAGCCTAGCCGGGTCCTCAACGACAGGAGCACGATCATGCGCACCCGTGGCCAGGACCC AACGCTGCCCGAGATGCGCCGCGTGCGGCTGCTGGAGATGGCGGACGCGATGGATATGTTCTGCCAAGGGTTGGTTT GCGCATTCACAGTTCTCCGCAAGAATTGATTGGCTCCAATTCTTGGAGTGGTGAATCCGTTAGCGAGGTGCCGCCGG CTTCCATTCAGGTCGAGGTGGCCCGGCTCCATGCACCGCGACGCAACGCGGGGAGGCAGACAAGGTATAGGGCGGCG CCTACAATCCATGCCAACCCGTTCCATGTGCTCGCCGAGGCGGCATAAATCGCCGTGACGATCAGCGGTCCAGTGAT CGAAGTTAGGCTGGTAAGAGCCGCGAGCGATCCTTGAAGCTGTCCCTGATGGTCGTCATCTACCTGCCTGGACAGCA T GGC C T GC AAC GC GGGC AT C C C GAT GC C GC C GGAAGC GAGAAGAAT C AT AATGGGGAAGGC C AT C C AGC C T C GC GT C GCGAACGCCAGCAAGACGTAGCCCAGCGCGTCGGCCGCCATGCCGGCGATAATGGCCTGCTTCTCGCCGAAACGTTT GGT GGC GGGAC C AGT GAC GAAGGC T T GAGC GAGGGC GT GC AAGAT T C C GAAT AC C GC AAGC GAC AGGC CGATCATCG 
TCGCGCTCCAGCGAAAGCGGTCCTCGCCGAAAATGACCCAGAGCGCTGCCGGCACCTGTCCTACGAGTTGCATGATA AAGAAGAC AGT C AT AAGT GC GGC GAC GATAGTCATGCCCCGCGCCCACC GGAAGGAGC T GAC T GGGT T GAAGGC T C T CAAGGGCATCGGTCGAGATCCCGGTGCCTAATGAGTGAGCTAACTTACATTAATTGCGTTGCGCTCACTGCCCGCTT TCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCGGCCAACGCGCGGGGAGAGGCGGTTTGCGTATTGGG CGCCAGGGTGGTTTTTCTTTTCACCAGTGAGACGGGCAACAGCTGATTGCCCTTCACCGCCTGGCCCTGAGAGAGTT GCAGCAAGCGGTCCACGCTGGTTTGCCCCAGCAGGCGAAAATCCTGTTTGATGGTGGTTAACGGCGGGATATAACAT GAGCTGTCTTCGGTATCGTCGTATCCCACTACCGAGATATCCGCACCAACGCGCAGCCCGGACTCGGTAATGGCGCG CATTGCGCCCAGCGCCATCTGATCGTTGGCAACCAGCATCGCAGTGGGAACGATGCCCTCATTCAGCATTTGCATGG TTTGTTGAAAACCGGACATGGCACTCCAGTCGCCTTCCCGTTCCGCTATCGGCTGAATTTGATTGCGAGTGAGATAT T T AT GC C AGC C AGC C AGAC GC AGAC GC GC C GAGAC AGAAC TTAATGGGCCCGC T AAC AGC GCGATTTGCT GGT GAC C CAATGCGACCAGATGCTCCACGCCCAGTCGCGTACCGTCTTCATGGGAGAAAATAATACTGTTGATGGGTGTCTGGT C AGAGAC AT C AAGAAAT AAC GC C GGAAC AT T AGT GC AGGC AGC T T C C AC AGC AAT GGC AT C C T GGT C AT C C AGC GGA TAGTTAATGATCAGCCCACT GAC GCGTTGCGC GAGAAGAT TGTGCACCGCCGCTTTACAGGCTTC GAC GCCGCTTCG TTCTACCATCGACACCACCACGCTGGCACCCAGTTGATCGGCGCGAGATTTAATCGCCGCGACAATTTGCGACGGCG CGTGCAGGGCCAGACTGGAGGTGGCAACGCCAATCAGCAACGACTGTTTGCCCGCCAGTTGTTGTGCCACGCGGTTG GGAATGTAATTCAGCTCCGCCATCGCCGCTTCCACTTTTTCCCGCGTTTTCGCAGAAACGTGGCTGGCCTGGTTCAC CACGCGGGAAACGGTCTGATAAGAGACACCGGCATACTCTGCGACATCGTATAACGTTACTGGTTTCACATTCACCA CCCTGAATTGACTCTCTTCCGGGCGCTATCATGCCATACCGCGAAAGGTTTTGCGCCATTCGATGGTGTCCGGGATC TCGACGCTCTCCCTTATGCGACTCCTGCATTAGGAAGCAGCCCAGTAGTAGGTTGAGGCCGTTGAGCACCGCCGCCG C AAGGAAT GGT GC AT GC AAGGAGAT GGC GC C C AAC AGT C C C C C GGC C AC GGGGC C T GC C AC C AT AC C C AC GC C GAAA CAAGCGCTCATGAGCCCGAAGTGGCGAGCCCGATCTTCCCCATCGGTGATGTCGGCGATATAGGCGCCAGCAACCGC ACCTGTGGCGCCGGTGATGCCGGCCACGATGCGTCCGGCGTAGAGGATCGAGATCTCGATCCCGCGAAATTAATACG AC T C AC T AT AGGGGAAT T GT GAGC GGAT AAC AAT T C C C C T C T AGAAAT AAT T T T GT T T AAC T T T AAGAAGGAGAT AT 
ACCATGGGCAGCAGCCATCATCATCATCATCACAACCCTTCTTTAATTCGTTCTGAATCCTGGGAAGACATCAAAGG GAAT GAAGC C AAT T TAT T AGAT GGAGAC GAT AAC AC CGGTGTTTGGTATTT C AAC GAAGT T T T C T AC GAAT C T C T T G CAGGAGAATTCATTGGATTGGACTTAGGTAAGGAAATTAAATTGGATGGTATTCGTTTTGTTATTGGTAAGAATGGA GGCGGTAGTTCCGACAAATGGAACAAATTCAAGTTGGAGTACTCCCTGGATAACGAAAGTTGGACTACTATCAAAGA AT AC GAC AAGAC AGGGGC T C C T GC AGGGAAAGAT GT T AT T GAAGAAT C C T T CGAGAC TCCCATTTCCGC T AAGT AC A TTCGTCTGACTAATCTGGAAGACAAAATCCTGTTCCTGACTTTTAGTGAGTTTGCAATTGTGTCTGACGGTGGAGGT GGCAGCGGCGGTGGTGGCTCGGGTGGAGGGTGCTGAGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGCTGC CACCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGCTGAAAGGAGGAA CTATATCCGGATATCCCGCAAGAGGCCCGGCAGTACCGGCATAACCAAGCCTATGCCTACAGCATCCAGGGTGACGG TGCCGAGGATGACGATGAGCGCATTGTTAGATTTCATACACGGTGCCTGACTGCGTTAGCAATTTAACTGTGATAAA C T AC C GC AT T AAAGC T TAT C GAT GAT AAGC T GT C AAAC AT GAGAA

Expression and purification of nanoCLAMPs under denaturing conditions for conjugation to affinity chromatography resin (1 L scale)

Glycerol stocks of NEc1 cells harboring nanoCLAMP expression vectors (described above) were used to inoculate 3 ml starter cultures of 2xYT/2% glucose (Glu)/100 µg/ml carbenicillin (CB) and grown overnight at 37 °C, 250 rpm. The overnight cultures were diluted 1:100 into 300 ml of Novagen Overnight Express Instant TB Medium/1% glycerol/CB and incubated 24 h, 30 °C, 250 rpm. Cells were pelleted at 10k x g, 10 min, 4 °C, and lysed with 30 ml 100 mM NaH2PO4, 10 mM Tris, 6 M GuHCl (QAB), pH 8.5, plus 1 mM TCEP (QAB-TCEP, pH 8.5) using a Polytron to homogenize. The insoluble material was pelleted at 15k x g, 20 min, 15 °C, and the cleared supernatant was applied to Ni Sepharose 6 Fast Flow (Cytiva) and incubated with rotation for 1 h to overnight. The beads were transferred to a column and washed with 3 CV QAB-TCEP, pH 8.5, then 3 CV QAB, pH 8.5. The protein was eluted with QAB, pH 8.5 + 250 mM imidazole and quantified by A280. The purity of the eluted protein was measured by SDS-PAGE on 12% NuPAGE Bis-Tris gels and Coomassie staining with GelCode Blue (after removing the GuHCl by cold ethanol precipitation). Yields for nanoCLAMPs were typically 150-300 mg/L culture, and purity was typically greater than 90%.

The purified, denatured nanoCLAMPs in QAB, pH 8.5 were reduced with 2 mM TCEP if used after storage and conjugated to Sulfolink cross-linked, 6% beaded agarose (Thermo). Briefly, the resin was equilibrated with QAB, pH 8.5 + 5 mM EDTA and transferred to a column. The nanoCLAMP was adjusted to 8 mg/ml in a volume 2X the volume of the Sulfolink resin, and then incubated with the resin with rotation for 30 min at room temperature. The resin was allowed to settle for 15 min, and the column drained to the top of the resin bed. The column was washed with QAB, pH 8.5 and then incubated with 50 mM L-Cys in QAB, pH 8.5 to quench for 15 min with rotation. The column was allowed to settle, drained, and washed again with 6 M GuHCl, 20 mM Tris (QCB), pH 8. Finally, the nanoCLAMP was refolded on the resin by rinsing with 6 CV of 20 mM MOPS, 150 mM NaCl (MBS), pH 6.5 + 1 mM CaCl2.

Determination of static binding capacity of nanoCLAMP resins

A spiked E. coli lysate was prepared by pelleting the equivalent of OD600 = 8 culture, discarding the supernatant, and lysing the cells with BPER at 4 ml per g pellet, 20 min at room temperature with rotation. The insoluble material was removed by centrifugation, and the cleared lysate adjusted to a total protein concentration of approximately 1.87 mg/ml, 20% BPER in PBS, pH 7.4. The target protein, SUMO-GFP (described above), was spiked into the lysate to a final concentration of 0.025 to 0.2 mg/ml, depending on the application, using a highly concentrated stock so that the total protein concentration remained unchanged. The spiked lysate was then incubated with 10 µl of the nanoCLAMP resin (packed volume) in a total volume of 1.4 ml, rotating at 4 °C for 1 h. The resin was pelleted by centrifugation and transferred to a small column. The resin was washed 4 times with 400 µl PBS, pH 7.4, and then eluted with 3 M imidazole, pH 8. The eluates were buffer exchanged twice with Zeba columns (7 kD MWCO, Thermo), and quantified by A280 or fluorescence using an iD5 plate reader.

Expression and purification of nanoCLAMPs for biophysical characterization

Glycerol stocks of NEc1 cells harboring nanoCLAMP expression vectors (described above) were used to inoculate 3 ml starter cultures of 2xYT/2% glucose (Glu)/100 µg/ml carbenicillin (CB) and grown overnight at 37 °C, 250 rpm. The overnight cultures were diluted 1:100 into 35 ml of Novagen Overnight Express Instant TB Medium/1% glycerol/CB and incubated 24 h, 30 °C, 250 rpm. Cells were pelleted and lysed with QCB, pH 8, and insoluble material was removed by centrifugation at 15k x g, 20 min, 15 °C. The cleared lysate was incubated with Ni Sepharose 6 Fast Flow (Cytiva) for > 1 h with rotation at room temperature, then transferred to 2 ml columns. The columns were washed with 6 x 1 ml QCB, pH 8, and the bound protein was then refolded with 11 ml of 20 mM MOPS, 150 mM NaCl (MBS), 1 mM CaCl2, pH 8. The nanoCLAMPs were eluted with MBS, 1 mM CaCl2, 250 mM imidazole, pH 8, buffer exchanged to remove the imidazole using Zeba 7 kD MWCO desalting columns, and normalized to 1 mg/ml in MBS, 1 mM CaCl2, pH 6.5.

Expression and purification of target proteins for panning and affinity chromatography

To prepare a target protein for panning the library NL-26, we prepared a biotinylated yeast SUMO construct (B-SUMO; P1068) in a pET expression vector and transformed it into BL21(DE3) E. coli harboring a constitutively expressed biotin ligase, BirA. An overnight starter culture was diluted 1:100 into 500 ml Novagen Overnight Express Instant TB Medium/1% glycerol/CB/CAM including 5 mM Biotin and incubated 24 h, 30 °C, 250 rpm. Following the induction, the cells were pelleted and the media discarded. The pellet was frozen at -80 °C, thawed on ice, resuspended at 5 ml/g pellet in MBS, pH 7.4 + Pierce Protease Inhibitor Tablet Mini, and sonicated on ice for 10 min at 50% duty cycle. Biotin was added to 100 µM and the lysed cells incubated at 37 °C, 30 min, 250 rpm to drive biotinylation to completion. The lysate was cleared by centrifugation at 30k x g, 20 min, 4 °C and the supernatant loaded onto a 2.25 ml SMT3-A1 resin (Nectagen, Inc) packed column at 1 ml/min. The resin was washed with 25 ml MBS, pH 7.4 and the protein eluted with polyol elution buffer (PEB): 10 mM Tris, 1 mM EDTA, 0.75 M ammonium sulfate, 40% propylene glycol, pH 7.9. The protein was desalted 2X into 50 mM Tris, pH 8 and stored as a 50% glycerol stock at -20 °C.

B-SUMO sequence (P1068) (SEQ ID NO: 39)

MSDSEVNQEAKPEVKPEVKPETHINLKVSDGSSEIFFKIKKTTPLRRLMEAFAKRQGKEMDSLRFLYDGIR IQADQTPEDLDMEDNDIIEAHREQIGGGGGLNDIFEAQKIEWHE

To facilitate quantification of affinity chromatography target protein in eluates, we prepared a SUMO-GFP fusion (P10126RDG-1) that we could track by fluorescence spectroscopy. Briefly, the pET expression construct was transformed into NEc1 E. coli (Nectagen, Inc) and an overnight culture diluted 1:100 into 1 L Novagen Overnight Express Instant TB Medium/1% glycerol/CB and grown 24 h at 30 °C, 250 rpm. The cells were pelleted, the media removed, and the cells lysed with BPER with Universal Nuclease (Thermo). The insoluble material was removed by centrifugation and the cleared supernatant loaded onto a 5 ml Ni Sepharose 6 Fast Flow column (Cytiva) at 1.5 ml/min, and the resin washed with 100 ml 50 mM NaH2PO4, 300 mM NaCl, 20 mM imidazole, pH 8, and then eluted with the same buffer with 250 mM imidazole. Protein purity of both B-SUMO and SUMO-GFP was assessed by SDS-PAGE using 12% NuPAGE, Bis-Tris, MES running buffer under reducing conditions and stained with GelCode Blue (Thermo).

SUMO-GFP fusion (P10126RDG-1) (SEQ ID NO: 40)

MGSSHHHHHHSDSEVNQEAKPEVKPEVKPETHINLKVSDGSSEIFFKIKKTTPLRRLMEAFAKRQGKEMD SLRFLYDGIRIQADQTPEDLDMEDNDIIEAHREQIGGLYFQGSKGEELFTGVVPILVELDGDVNGHKFSVS GEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFAYGLQCFARYPDHMKQHDFFKSAMPEGYVQERTIFF KDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIE DGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYKIEGR GGKPIPNPLLGLDST

Analysis of monodispersity by size exclusion chromatography

Purified nanoCLAMPs were diluted to a final concentration of 0.18 mg/ml in MBS, 1 mM CaCl2, pH 6.5, centrifuged at 20k x g, 2 min, 4 °C, and the supernatants transferred to a clean tube. The samples were loaded into a 125 µl sample loop and injected onto a Superdex 75 10/300 GL column (GE Healthcare Life Sciences, Pittsburgh, PA) equilibrated in MBS, 1 mM CaCl2, pH 6.5 at a flow rate of 0.65 ml/min. The column was calibrated with Bio-Rad Gel Filtration Standard per manufacturer’s instructions.

Determination of melting temperature by differential scanning fluorimetry

The melting temperature of purified nanoCLAMPs was determined using the GloMelt Thermal Shift Protein Stability Kit (Biotium) per manufacturer’s instructions. Briefly, purified nanoCLAMPs were adjusted to 1 mg/ml in MBS, 1 mM CaCl2, pH 6.5, diluted in half with 2X GloMelt (Biotium), aliquoted into a 384-well plate, and sealed with optical film. The plate was then heated in a QuantStudio 5 qPCR machine using the SYBR Green reporter with no passive reference. The heating profile was 25 °C for 2 min; ramp at 0.05 °C/sec to 99 °C; 99 °C for 2 min. T_m is defined as the inflection point in the unfolding curve.
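
As an illustration of the T_m definition above, the inflection point of an unfolding curve can be approximated as the temperature at which the first derivative of fluorescence with respect to temperature is largest. The following is a minimal sketch under that assumption and is not the instrument software's algorithm; the function name `estimate_tm` and the synthetic sigmoidal curve are hypothetical and introduced only for illustration.

```python
import math


def estimate_tm(temps, fluorescence):
    """Return the temperature at which dF/dT is largest.

    Hypothetical helper: uses central differences on readings taken at
    strictly increasing temperatures to approximate the inflection point.
    """
    if len(temps) != len(fluorescence) or len(temps) < 3:
        raise ValueError("need at least three paired readings")
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


# Synthetic two-state curve with its midpoint at 80 °C (illustrative data only,
# not measurements from this work). Readings span 25 °C to 99 °C in 0.5 °C steps.
temps = [25 + 0.5 * i for i in range(149)]
signal = [1.0 / (1.0 + math.exp(-(t - 80.0) / 2.0)) for t in temps]
tm = estimate_tm(temps, signal)
```

Real melt curves are noisy, so in practice some smoothing before differentiation would likely be needed; that step is omitted here.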

Determination of protease stability by digestion with trypsin and chymotrypsin

Digestions were performed by incubating the nanoCLAMPs at 0.25 mg/ml in a 20 µl reaction containing 0.1 mg/ml trypsin (Roche Cat 11418475001) or chymotrypsin (Roche Cat 11418467001) diluted in 1 mM HCl, such that the final HCl concentration in the reaction was 0.1 mM. CaCl2 was added to the reaction to 10 mM. The protein and remaining diluent buffer was MBS, pH 6.5. The reaction was incubated at 37 °C for the indicated times and stopped by adding 2 µl 10X Protease Arrest (G-Biosciences), analyzed by SDS-PAGE (12% NuPAGE, Bis-Tris, in MES buffer) in SDS sample buffer with reducing agent, and stained with GelCode Blue (Thermo). Densitometry was carried out using GelAnalyzer software to measure relative staining intensities of the full-length band.

Phage display library NL-26 construction for nC-B class

The pCombX phagemid template p2799 (Table 8) contained the N-terminal and C-terminal constant regions of the nC-B class separated by a stuffer region containing HindIII and SpeI cut sites. This template was digested with HindIII and SpeI, gel purified, and the plasmid region amplified with degenerate primers 1957T R and 1960T F, which added the N- and C-terminal parts of nC-B as well as the randomized loops L1 and L8, respectively. Primers designated with a T are degenerate oligonucleotides (IDT) constructed using phosphoramidite trimer mixes (Glen Research) encoding all amino acids except Cys, Met, Lys, and Arg. The short internal region of the P2788 was amplified using primers 1958T F and 1959 R, which added and randomized Loop L2. PCR was carried out with ClonAmp HiFi PCR Mix, according to manufacturer’s instructions (Takara Bio, Mountain View, CA). The reaction cycle was 98 °C for 10 sec, 65 °C for 10 sec, and 72 °C for 30 sec, repeated 30 times. These two amplicons, which contained overlapping ends, were gel purified and cloned together by Gibson Assembly (described below), creating the nC-B construct with 3 variable loops: Loop L1 (3 residues: 817, 819, 820), Loop L2 (7 residues: 838-844), and Loop L8 (5 residues: 931-935), for a total of 15 variable residues in 3 loops.
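
The library design described above implies a theoretical sequence diversity that can be worked out directly from the number of permitted amino acids and the number of variable positions. The short sketch below performs that arithmetic; it assumes each randomized position is independent and that the trimer mix supplies every permitted residue, which are simplifying assumptions for illustration rather than statements about the actual library composition.

```python
# Amino acids permitted at randomized positions: the 20 standard residues
# minus Cys, Met, Lys, and Arg, as stated in the text.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")
EXCLUDED = set("CMKR")
allowed = sorted(STANDARD - EXCLUDED)

# Number of variable residues per loop, taken from the text.
loops = {"L1": 3, "L2": 7, "L8": 5}
variable_positions = sum(loops.values())

# Theoretical diversity under the independence assumption: 16 ** 15.
theoretical_diversity = len(allowed) ** variable_positions
print(len(allowed), variable_positions, f"{theoretical_diversity:.2e}")
# prints: 16 15 1.15e+18
```

The physical library can contain no more distinct clones than the number of transformants recovered after electroporation, so the value computed here is an upper bound on sequence space rather than an estimate of library size.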

Table 8. Primer Sequences

NNN = randomized codon

To clone the library components, 10 µg of the large amplicon and 7.86 µg of the short amplicon were combined in a 2 ml reaction containing 1000 µl of Gibson Assembly Master Mix (2X) (NEB) and incubated at 50 °C for 30 min and then put on ice. The ligated DNA was then purified and concentrated in one Nucleospin Gel and PCR Cleanup Kit (Macherey-Nagel) and eluted in 45 µl EB. The DNA was then desalted on a VSWP 0.025 µm membrane (EMD Millipore) on ddH2O for 40 min with a water change at 20 min. The desalted DNA was then adjusted to 100 ng/µl with ddH2O and used to electroporate electrocompetent TG1 cells (Lucigen). Approximately 50 µl of DNA was added to 1.25 ml ice cold TG1 cells and pipetted up and down 4 times to mix on ice, after which 25 µl aliquots were transferred to 50 electroporation cuvettes (with 1 mm gaps) on ice. The cells were electroporated, and immediately quenched with 975 µl recovery media (Lucigen), pooled, and incubated at 37 °C, 250 rpm for 1 h. To titer the library, 10 µl of recovered culture was serially diluted in 2xYT and 10 µl of each dilution spotted on 2xYT/glu/carb and incubated at 30 °C overnight. The remaining library was expanded to 3 L 2xYT/glu/carb and amplified overnight at 30 °C, 250 rpm. The next day, the library was pelleted at 10k x g, 10 min, 4 °C and the media discarded. The pellet was re-suspended to an OD600 of 75 in 2xYT/2% glucose/18% glycerol, aliquoted and stored at -80 °C.
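
The titering step above can be turned into an estimate of the number of transformants with a short calculation. The sketch below assumes the pooled recovery volume is 50 cuvettes x (25 µl cells + 975 µl recovery media) = 50 ml, as implied by the protocol; the function name `library_size` and the colony count and dilution in the example are hypothetical values chosen only to show the arithmetic.

```python
def library_size(colonies, dilution_factor, spot_volume_ul=10.0, culture_volume_ml=50.0):
    """Estimate total transformants from colonies counted in one spotted dilution.

    colonies          -- colonies counted in the spot
    dilution_factor   -- fold dilution of the recovered culture in that spot
    spot_volume_ul    -- volume spotted on the plate (10 µl in the text)
    culture_volume_ml -- pooled recovery volume at the time of sampling
    """
    if colonies < 0 or dilution_factor <= 0 or spot_volume_ul <= 0:
        raise ValueError("inputs must be positive (colonies may be zero)")
    cfu_per_ml = colonies * dilution_factor / (spot_volume_ul / 1000.0)
    return cfu_per_ml * culture_volume_ml


# Pooled recovery volume implied by the protocol, in ml.
pooled_ml = 50 * (25 + 975) / 1000.0

# Hypothetical count: 20 colonies in the 10 µl spot of a 1e5-fold dilution.
estimate = library_size(20, 1e5, culture_volume_ml=pooled_ml)
```

With these made-up inputs the estimate is on the order of 1e10 transformants; actual values depend entirely on the colony counts observed.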

Panning of nanoCLAMP library NL-26 (nC-B library)

For the first round of panning, 2.7 L of 2xYT medium with 2% glucose and 100 µg/ml carbenicillin (2xYT/Glu/CB) was inoculated with 3.6 ml of the NL-26 library glycerol stock (OD600 = 75), to an OD600 of approximately 0.1 and grown at 37 °C, 250 rpm until the OD600 reached 0.52. The library was infected by adding helper phage VCSM13 (Stratagene, Cat#200251) to 750 ml of culture at an MOI of 20 phage/cell, and incubating at 37 °C, 100 rpm for 30 min, then 250 rpm for an additional 30 min. The cells were pelleted at 7500 x g for 10 min, and the media discarded. The cells were resuspended in 1.2 L 2xYT/CB, 70 µg/ml kanamycin (KAN), and incubated 15 h at 30 °C, 250 rpm. The cells were combined, and 100 ml was centrifuged at 10k x g for 10 min. The phage-containing supernatant was transferred to clean tubes and precipitated by adding 37.5 ml of 5X PEG/NaCl (20% polyethylene glycol 6000/2.5 M NaCl), and incubated on ice for 25 min. The phage was pelleted at 13k x g, 25 min and the supernatant discarded. The phage was resuspended in 10 ml 20 mM NaH2PO4, 150 mM NaCl, pH 7.4 (PBS), then centrifuged at 15k x g for 15 min to remove insoluble material. The phage was precipitated a second time by adding 1/4 volume 5X PEG/NaCl, incubated on ice for 5 min, and pelleted at 13k x g, 10 min at 4 °C. The phage pellet was resuspended in 3 ml PBS and quantified by absorbance at 268 nm (A268 = 1 for a solution of 5 x 10¹² phage/ml).
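
The A268 conversion stated above (A268 = 1 corresponds to 5 x 10¹² phage/ml) can be applied as a simple proportionality, for example to work out how much stock is needed to prepare phage at a chosen concentration such as the 2 x 10¹³ phage/ml used for preclearing. The sketch below assumes a linear relationship over the measured range; the helper names and the example absorbance reading are hypothetical.

```python
PHAGE_PER_ML_PER_A268 = 5e12  # conversion factor stated in the text


def phage_titer(a268, dilution_factor=1.0):
    """Phage particles per ml of undiluted stock from an A268 reading.

    dilution_factor is the fold dilution applied before the reading.
    """
    if a268 < 0 or dilution_factor <= 0:
        raise ValueError("absorbance must be non-negative and dilution positive")
    return a268 * dilution_factor * PHAGE_PER_ML_PER_A268


def stock_volume_ml(target_per_ml, final_volume_ml, stock_per_ml):
    """Volume of stock (ml) to bring to final_volume_ml at target_per_ml."""
    if stock_per_ml < target_per_ml:
        raise ValueError("stock is more dilute than the requested target")
    return target_per_ml * final_volume_ml / stock_per_ml


# Hypothetical reading: A268 = 0.6 measured on a 100-fold dilution of the stock.
stock = phage_titer(0.6, dilution_factor=100)
# Stock volume needed for 1 ml at 2e13 phage/ml; the remainder would be diluent.
needed = stock_volume_ml(2e13, 1.0, stock)
```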

Two sets of 100 µl of Dynabeads MyOne Streptavidin T1 (ThermoFisher Scientific) magnetic bead slurry were washed 2 x 1 ml with PBS-T (PBS with 0.05% Tween 20), applying the magnet in between washes to remove the supernatant, and then blocked in 1 ml of 2% dry milk solution in PBS with 0.05% Tween 20 (2% M-PBS-T) for 1 h, rotating, at room temperature. To preclear the phage against beads alone, 1 ml of phage was prepared at a concentration of 2 x 10¹³ phage/ml in 2% M-PBS-T, the block removed from the first set of beads, and the phage added to the beads and incubated 1 h, rotating. The magnet was applied, and the precleared phage removed and transferred to a clean tube. The magnet was applied, and this step repeated two times to ensure no carryover of beads bound to phage to the next step. Biotinylated target (B-SUMO) was added to the precleared phage to 100 nM final concentration and incubated rotating 1 h. Block was removed from the second set of beads, and the phage/B-SUMO mix was added to the beads to precipitate the biotinylated target and bound phage. The beads were washed 8X with PBS-T, 1 ml each, vortexing between each step and applying the magnet. The washed beads were eluted with 800 µl 0.1 M glycine, pH 2.0, 10 min rotating, the magnet applied, and the eluate transferred to 72 µl 2 M Tris base to neutralize. The neutralized phage was then added to 9 ml XL1-Blue E. coli, which had been grown to OD600 = 0.435 and placed on ice. The cells were infected at 37 °C, 45 min, 175 rpm, and then expanded to 100 ml 2xYT/Glu/CB and incubated overnight at 30 °C, 250 rpm.

The overnight cultures were harvested by measuring the OD600, centrifuging the cells at 10k x g for 10 min and then resuspending the cells to an OD600 of 75 in 2xYT/18% glycerol. To prepare phage for the next round of panning, 5 ml of 2xYT/Glu/CB was inoculated with 5 µl of the 75 OD600 glycerol stock and incubated at 37 °C, 250 rpm until the OD600 reached 0.5. The cells were superinfected at 20:1 phage:cell, mixed well, and incubated at 37 °C, 30 min, 150 rpm and then 30 min at 250 rpm. The cells were pelleted at 5500 x g, 10 min, the glucose-containing media discarded, and the cells resuspended in 10 ml 2xYT/CB/KAN and incubated overnight at 30 °C, 250 rpm.

The overnight phage prep was processed as described above. The phage was then prepared at A268 = 0.8 in 2% M-PBS-T, and the panning and pre-clearing continued as described, except that in the second and third rounds the biotinylated target concentration was reduced 10X per round. The number of washes after phage capture was also increased in the third round, to 12 washes. In round 2, neutravidin-coated magnetic beads (Spherotech) were used in place of streptavidin beads to reduce enrichment for streptavidin binders.

Qualitative semELISA of individual clones following panning.

At the end of the last panning round, individual colonies were plated on 2xYT/Glu/CB agar plates following the 45 min, 150 rpm recovery at 37 °C of the XL1-Blue cells infected with the eluted phage. The next day, 95 colonies were inoculated into 400 µl 2xYT/Glu/CB in a 96-deep-well culture plate and grown overnight at 37 °C, 300 rpm to generate a master plate, to which glycerol was added to 18% for storage at -80 °C. To prepare an induction plate for the ELISA, 5 µl of each master-plate culture was inoculated into 400 µl fresh 2xYT/0.1% glucose/CB medium and incubated for 2.75 h at 37 °C, 300 rpm. IPTG was then added to 0.5 mM and the plates incubated at 30 °C with 300 rpm shaking overnight. Because the phagemid contains an amber stop codon, some nanoCLAMP protein is produced without the pIII domain, even though XL1-Blue is a suppressor strain, resulting in the periplasmic localization of some nanoCLAMP, of which some percentage is ultimately secreted to the media. The media can then be used directly in an ELISA (soluble expression-based monoclonal enzyme-linked immunosorbent assay: semELISA). After the overnight induction, the plates were centrifuged at 1200 x g for 10 min to pellet the cells. Streptavidin-coated microtiter plates (ThermoFisher) were rinsed 3 times with 200 µl PBS, and then coated with biotinylated target proteins at 2 µg/ml with 100 µl/well and incubated 1 h. For blank controls, a plate was incubated with 100 µl/well PBS. The coating solution was removed, and the plates blocked with 2% M-PBS-T. The block was removed and 50 µl of 4% M-PBS-T added to each well. At this point 50 µl of each induction plate supernatant was transferred to the blank and protein-coated wells, pipetted 10 times to mix, and incubated 1 h. The plates were washed 4 times with 200 µl PBS-T and the plates dumped and slapped on paper towels in between washes. After the washes, 75 µl of 1:2000 dilution anti-FLAG-HRP (Sigma A8592) in 4% M-PBS-T was added to each well and incubated 1 h. The anti-FLAG-HRP was discarded, and the plates washed as before. The plates were developed by adding 75 µl TMB Ultra substrate (ThermoFisher) and analyzed for positive signals compared to controls. Positive clones were then grown up from the master plate by inoculating 1 ml 2xYT/2% glucose/100 µg/ml CB with 3 µl glycerol stock and incubating for at least 6 h at 37 °C, 250 rpm. The cells were then pelleted, and the media was discarded. Plasmid DNA was prepared from the pellets using the Qiaprep Spin Miniprep Kit, and the sequences determined by Sanger sequencing at Genewiz (South Plainfield, NJ). The nanoCLAMP inserts from unique positive clones were amplified and cloned into the pET expression vector, described above.

Biolayer Interferometry of nanoCLAMPs

Kinetic analysis of interactions between nanoCLAMPs and biotinylated SUMO was carried out on an OctetRed96 using SAX streptavidin-coated sensor tips. The tips were transferred first to buffer (MBS, 1 mM CaCl2, pH 6.5 + 1% BSA) for 300 sec, then to B-SUMO at 2 mg/ml in buffer for 180 sec, then to buffer for 300 sec, then to at least 4 dilutions of nanoCLAMPs in buffer (association) for 200 sec, then to buffer (dissociation) for 500 sec. The samples were constantly shaken at 1000 rpm at room temperature. The kinetics were fit to a 1:1 model and Kd calculated using global fit analysis.
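
For readers unfamiliar with the 1:1 model mentioned above, the sketch below writes out the standard 1:1 (Langmuir) association and dissociation equations and the relation Kd = koff/kon. It is a forward simulation only, not the instrument's global fitting routine; the function names and the rate constants and analyte concentration in the example are hypothetical values chosen for illustration, while the 200 sec and 500 sec step lengths come from the text.

```python
import math


def association(t, conc, kon, koff, rmax):
    """Sensor response at time t (s) during association.

    conc is the analyte concentration (M), kon in 1/(M*s), koff in 1/s.
    """
    kobs = kon * conc + koff
    req = rmax * kon * conc / kobs          # equilibrium response at this conc
    return req * (1.0 - math.exp(-kobs * t))


def dissociation(t, r0, koff):
    """Sensor response t seconds after moving the tip back to buffer."""
    return r0 * math.exp(-koff * t)


def equilibrium_kd(kon, koff):
    """Equilibrium dissociation constant (M) for a 1:1 interaction."""
    return koff / kon


# Hypothetical rate constants and concentration, for illustration only.
kon, koff, rmax = 1e5, 1e-3, 1.0
r_end = association(200.0, 100e-9, kon, koff, rmax)   # end of 200 s association
r_final = dissociation(500.0, r_end, koff)            # end of 500 s dissociation
kd = equilibrium_kd(kon, koff)
```

In a global fit, a single kon and koff pair is fitted simultaneously to the curves from all analyte dilutions, which is why at least four concentrations are measured.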

Dynamic Binding Capacity of P2808 resin and SMT3-A1 resin

A packed volume of 0.6 ml of P2808 resin or SMT3-A1 resin was packed into a Tricorn 5/50 column (5 mm ID x 3.06 cm height) and equilibrated in 20 mM NaH2PO4, 150 mM NaCl, pH 7.4 (PBS) at 0.5 ml/min for 5 CV. The load, a Sumo-GFP fusion protein (MW = 41,559 g/mol) diluted to a concentration, c, of 0.2 mg/ml in PBS, was pumped through the system with the column on bypass and the eluate fluorescence measured to determine the total load fluorescence at Ex/Em 485/535 nm. The delay volume, V_delay, was measured for the configuration at 0.5 ml. The load was then directed to the column and the volume V_x measured, where V_x is the volume where the fluorescence of the eluate = 10% of that of the total load. The dynamic binding capacity, in units of mg/(ml resin), was then calculated as follows: DBC = (V_x - V_delay)*c/(Vol Resin).
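
The dynamic binding capacity formula above can be sketched in a few lines. The example below interpolates the 10% breakthrough volume V_x from a fluorescence trace and then applies DBC = (V_x - V_delay)*c/(Vol Resin). The delay volume (0.5 ml), load concentration (0.2 mg/ml), and resin volume (0.6 ml) are taken from the text; the function names and the breakthrough trace are hypothetical, and linear interpolation between readings is an assumption made for illustration.

```python
def breakthrough_volume(volumes, fluorescence, load_fluorescence, fraction=0.10):
    """Volume at which eluate fluorescence first reaches fraction * load.

    Linearly interpolates between the two readings that bracket the threshold.
    """
    threshold = fraction * load_fluorescence
    for i in range(1, len(volumes)):
        f0, f1 = fluorescence[i - 1], fluorescence[i]
        if f0 < threshold <= f1:
            span = volumes[i] - volumes[i - 1]
            return volumes[i - 1] + (threshold - f0) * span / (f1 - f0)
    raise ValueError("breakthrough threshold not crossed in this trace")


def dynamic_binding_capacity(v_x, v_delay, conc, resin_volume):
    """DBC in mg per ml resin: (V_x - V_delay) * c / resin volume."""
    if resin_volume <= 0:
        raise ValueError("resin volume must be positive")
    return (v_x - v_delay) * conc / resin_volume


# Made-up breakthrough trace (volume in ml, arbitrary fluorescence units);
# these numbers are not data from this work.
volumes = [0.0, 10.0, 20.0, 30.0, 40.0]
trace = [0.0, 0.0, 2.0, 18.0, 60.0]
v_x = breakthrough_volume(volumes, trace, load_fluorescence=100.0)
dbc = dynamic_binding_capacity(v_x, v_delay=0.5, conc=0.2, resin_volume=0.6)
```

Subtracting the delay volume matters most for small columns, where the system volume is a meaningful fraction of the loaded volume.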

Purification of Sumo-GFP from a spiked E. coli lysate by affinity chromatography with P2808 resin

A cleared E. coli lysate was prepared by lysing a pellet of NEc1 E. coli (a derivative of BL21 (DE3) with the C-terminal region of SlyD knocked out by recombineering, Nectagen, Inc) with BPER (Thermo) and removing insoluble material by centrifugation at 15 k x g, 20 min, 4 °C. The cleared supernatant was diluted to a total protein concentration of roughly 3.3 mg/ml with PBS, pH 7.4 such that the BPER reagent was present at 20% vol/vol. The target protein SUMO-GFP (MW = 41,559 g/mol) was spiked in to a final concentration of either 0.2 mg/ml or 0.025 mg/ml. The spiked lysate was loaded onto the column at 0.5 ml/min for the indicated times, washed with 20 CV PBS, pH 7.4, then eluted with 3 M imidazole, pH 8. Fractions containing eluted target were pooled and desalted 2X on Zeba 7 MWCO columns and the protein quantified by A280. Imidazole removal was verified by testing the A280 of elution buffer alone following 2X desalting. Spiked lysate, early wash fractions, and pooled elutions (post buffer exchange) were analyzed by NuPAGE SDS PAGE under reducing conditions, 12%, Bis-Tris in MES running buffer and stained with Gel Code Blue (Thermo).

Repeated AC purification cycles including cleaning in place (CIP) of resins using 0.1 M NaOH

Repeated affinity chromatography purifications were carried out on an FPLC with a small 50 µl (packed) column using running buffer 20 mM MOPS, 150 mM NaCl, 1 mM CaCl2, pH 7.2. The load consisted of Sumo-GFP spiked into a cleared E. coli lysate (described above in Purification of Sumo-GFP from a spiked E. coli lysate) at 0.1 mg/ml. The cycle consisted of a 2 ml equilibration in running buffer at 1 ml/min, a 0.5 ml load of spiked lysate at 0.5 ml/min, a 3 ml wash with running buffer at 0.5 ml/min, a 2 ml elution with 3 M imidazole, pH 8 (collected) at 0.5 ml/min, a 0.5 ml wash with running buffer at 0.5 ml/min, a cleaning in place cycle of 1.5 ml NaOH at 1 ml/min and then 2 ml at 0.2 ml/min (total contact time 10 min), and finally a refolding step with 5 ml running buffer at 1 ml/min. The target concentration in the eluates was measured by fluorescence spectroscopy in duplicate on an iD5 plate reader (Molecular Devices) at Ex/Em 485/535 nm. The eluates were analyzed by SDS-PAGE using NuPAGE gels as described above.

Repeated AC purification cycles with low pH elution and short cleaning in place with NaOH

Repeated affinity chromatography purifications were carried out as described above, except the column was eluted with 0.1 M citrate, pH 2.5 instead of 3 M imidazole, pH 8. Also, the cleaning in place step with 0.1 M NaOH was shortened to 1 ml, at 1 ml/min (1 min contact time per cycle). Since the eluted SUMO-GFP was denatured by the low pH elution, the relative elution concentrations were compared using densitometry of the target bands on SDS PAGE.

Determination of effect of autoclaving or DMF incubation on SUMO binding resin binding capacity and specificity

For each resin tested, three 10 µl aliquots (packed vol) of resin were loaded into 1.5 ml screw cap tubes. To one, 1 ml DMF was added, and the tube incubated at room temperature for 2 h. To the other two, 100 µl MBS, 1 mM CaCl2, pH 7.2 was added. One of these was autoclaved with its cap left slightly loose, on a 30 min liquid cycle, which sterilizes at around 120 °C and 20 psi for 30 min, and then slowly drops the pressure and the temperature over the next 90 min. The other set of resin was left on ice as a control. After two hours, all of the resins were cooled to room temperature and centrifuged at 1 k x g, 1 min. The control and autoclaved resin were stored overnight at 4 °C. The DMF treated resin was rinsed 3X with fresh MBS, 1 mM CaCl2, pH 7.2 and then stored overnight at 4 °C. The next day all three of the sets of beads were rinsed with fresh buffer, and then incubated with 1.3 ml of E. coli lysate (prepared as described above) spiked with SUMO-GFP at 0.2 mg/ml, for 1 h, 4 °C, rotating. The resin was loaded into a small, tared column, rinsed 4 x 400 ml PBS, pH 7.4, then eluted 3 x 25 ml 3 M imidazole, pH 8. The fluorescence of the eluates was read on an iD5 plate reader in duplicate as described and the concentration determined by comparison with a standard curve of the target and compared to controls. The concentrations were normalized and analyzed by SDS PAGE as described above to assess purity.

Example 2. Using the protein scaffold to target diverse antigens.

We selected protein scaffolds that bound to diverse protein targets. A summary of the protein scaffolds and their cognate targets is shown in Table 9 below. Table 9 contains a subset of a much larger set of target-specific nanoCLAMPs, the majority of which possess the loop lengths of 4, 7, and 5 residues for loops 1, 2, and 8, respectively, as designed in library NL-26 (see above). To demonstrate the protein scaffold’s ability to tolerate various loop lengths, we only included those nanoCLAMPs in Table 9 that possess at least one loop with a different length than the designed length. In Table 10 we demonstrate the scaffold’s ability to support vast loop diversity by tabulating the amino acid sequences of nanoCLAMPs specific to several targets and show the diversity of loop sequences to a single target in several cases.

Table 9. Sequences of binding scaffolds (nanoCLAMPs) with variable loop lengths

*denotes a stop codon

Table 10. Loop sequence variety of nC-B nanoCLAMPs binding diverse targets

Example 3. Terbium binding

Lanthanide binding to the protein scaffold was demonstrated by incubating the proteins with terbium, removing unbound terbium by buffer exchange, and measuring time-resolved fluorescence. Proteins were prepared at 30 µM in 20 mM MOPS, 150 mM NaCl (MBS), pH 6.5 and buffer exchanged to remove any unbound Ca. SMT3-A1 (nC-A), P2808 (nC-B), and a negative control protein (recombinant SMT3) were added to a 140 µl reaction in the same buffer so their final concentrations were 8.57 µM, and either CaCl2 or TbCl3 added to 300 µM. The reactions were incubated at 4 °C for 16 h, and then buffer exchanged with Zeba 7 MWCO desalting columns into MBS, pH 6.5. The proteins were diluted to 0.5 µM in MBS, pH 6.5 and then 200 µl analyzed in duplicate on an iD5 (Molecular Devices) plate reader using time-resolved fluorescence with Ex/Em: 350/544 nm, 200 µs delay. As shown in FIG. 15, P972 (nC-A) exhibited 9X greater fluorescence when incubated with terbium instead of calcium, and P2808 (nC-B) exhibited 23X greater fluorescence when incubated with terbium instead of calcium.

Example 4. Loop length modelling

We next undertook modelling experiments to ascertain the ability of the protein scaffold to maintain its secondary and tertiary structure while varying loop lengths of each of L1-L8. For each loop, the plasticity was modeled with P2808 as follows. To explore long loop lengths, each loop was replaced by a flexible 15-amino acid (G4S)3 linker (sequence: GGGGSGGGGSGGGGS (SEQ ID NO: 41)). The protein fold was modeled in AlphaFold mmseq without relaxation. The top result was aligned with P2808 in Swiss PDB Viewer with the MagicFit function.

The loop, the flanking N-terminal, and the flanking C-terminal amino acid are shown in different shades. The structure was assessed qualitatively for the maintenance of the overall beta-sheet structures. Structures that maintained the overall beta-sheet structure were considered to maintain the overall fold. To explore short loop lengths, each loop was completely deleted and modeled as above. If the complete deletion did not impact the overall fold, no additional constructs were modeled. The complete deletion was aligned with Swiss PDB Viewer with the MagicFit function and then assessed qualitatively.

A deletion was considered to result in disruption of the overall beta-sheet structure if a beta-strand secondary structure assignment was converted to a coil assignment or one or more beta strands lost association with an adjacent beta-strand. If the complete deletion resulted in a disruption of the overall beta-sheet structure, a deletion series was made starting with each wild-type amino acid replaced by G and then removing one G at a time. The construct in the deletion series with the shortest loop length that maintained the fold qualitatively was aligned and assessed. The results from these modelling experiments are shown in FIGS. 17-26.
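The construct design used for the loop length modelling can be sketched as follows. This is a minimal illustration of how the long-loop, complete-deletion, and glycine deletion-series input sequences might be generated before structure prediction; it is not the tooling actually used. The toy fragment is assembled from the F3, L3, and F4 sequences recited in claim 39 as a short stand-in for the full scaffold, and the loop coordinates are positions within that fragment, not residue numbers of SEQ ID NO: 1.

```python
LINKER = "GGGGS" * 3  # (G4S)3 linker, GGGGSGGGGSGGGGS (SEQ ID NO: 41)

def replace_loop(sequence, start, end, replacement):
    """Return `sequence` with residues in the half-open range [start, end) replaced."""
    return sequence[:start] + replacement + sequence[end:]

def long_loop_construct(sequence, start, end):
    """Replace the loop with the 15-residue flexible linker."""
    return replace_loop(sequence, start, end, LINKER)

def complete_deletion(sequence, start, end):
    """Remove the loop entirely."""
    return replace_loop(sequence, start, end, "")

def deletion_series(sequence, start, end):
    """All-glycine loop of the wild-type length, then one G removed at a time."""
    loop_len = end - start
    return [replace_loop(sequence, start, end, "G" * n) for n in range(loop_len, 0, -1)]

# Toy fragment (illustrative only): F3 + L3 + F4 from claim 39
F3, L3, F4 = "SLAGEFIGLDLG", "EVVEVG", "GIHFVIGAD"
fragment = F3 + L3 + F4
loop_start, loop_end = len(F3), len(F3) + len(L3)

print("long loop:       ", long_loop_construct(fragment, loop_start, loop_end))
print("complete deletion:", complete_deletion(fragment, loop_start, loop_end))
for construct in deletion_series(fragment, loop_start, loop_end):
    print("deletion series: ", construct)
```

Each generated sequence would then be submitted to structure prediction and compared with the parent model, as described above.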

Our database of nanoCLAMP binders was searched for any clones whose loops deviated from the standard lengths of Loop 1: 4 amino acids, Loop 2: 7 amino acids and Loop 8: 5 amino acids. The resulting clones are listed in Table 9 above. Tables 11 and 12 below summarize the lengths for clones observed with the nC-B and nC-A scaffolds. Table 11 shows top loops and Table 12 shows bottom loops.

The third column indicates the length diversity observed across orthologs. The observation of variation correlates roughly with the modeling data.

Table 11. Lengths observed for top loops in isolated nanoCLAMP clones or in alignment with NagH CBM32-2 in different species

Table 12. Lengths observed for bottom loops in alignment with NagH CBM32-2 in different species

Taken together, the modeling results, the observed loop lengths in isolated nanoCLAMPs, and natural variation in loop length across species indicate that the length and sequence of each of L1-L8 can be independently varied without disrupting the core fold of the protein.

Example 5. Introduction of artificial disulfides into the nC-B scaffold to improve stability

The well-characterized SUMO binder P2808 was modeled with AlphaFold to rationally select adjacent residues on neighboring beta strands for substitution with Cys residues, with the aim of further stabilizing the protein by introducing a disulfide bond. Substitution mutations were chosen by visual inspection of the structure to identify residues whose side chains were located in the core of the protein, whose side chains were oriented towards each other, and whose alpha carbons were approximately the same distance apart as observed with natural disulfide bonds. AlphaFold modeling predicted that several of the selected substitution mutations would form disulfide bonds and that a few would not (Table 13). We cloned, expressed, and purified 14 mutants predicted to form disulfides (12 mutants of P2808 and 2 mutants of a P2808 mutant, P2960, which has DGGGSS871-876GDT and DHTGAP900-905SST X and Y loops from C. celatum). The proteins were immobilized on Ni Sepharose 6 Fast Flow beads under denaturing and reducing conditions and refolded by reducing the concentration of denaturant over time, in the presence of reduced and oxidized glutathione to aid in formation of disulfide bonds. The refolded proteins were then eluted from the resin and tested for the presence of disulfide bonds by mobility shift on SDS PAGE under reducing vs oxidizing conditions. Because proteins possessing intramolecular disulfides remain more compact than those that do not, proteins with disulfides typically run faster in SDS-PAGE due to their smaller hydrodynamic radius. Thirteen of the 14 proteins predicted by AlphaFold to form disulfide bonds migrated faster on SDS PAGE in sample buffer lacking reducing agent than in sample buffer containing reducing agent. This observation is consistent with the agent reducing disulfide bonds in those proteins (FIG. 27). The proteins that appeared to possess disulfide bonds also had a band that ran similarly to the reduced form.
This observation may indicate a mixed population of proteins with and without disulfides. These preparations also contained a small percentage of higher molecular weight species, likely intermolecular disulfide bonded multimers, which largely disappeared upon reduction. The control proteins, P2808 and P2960, which contain no cysteines, migrated at the same rate in both the reducing and non-reducing buffer. Further, an unrelated disulfide-containing control protein, bovine serum albumin (BSA), showed the expected decrease in electrophoretic mobility in reducing buffers. Taken together, these results indicate that disulfide bonds can be successfully introduced into the nC-B scaffold in the twelve positive constructs in Table 13.

Clones P3007, P3008, and P3009 contain no lysines. The absence of lysines is expected to reduce susceptibility to trypsin and increase the specificity of labeling with amine-reactive reagents. These scaffolds have only a single primary amine, located at the N-terminal alpha amino group, and are expected to be modified specifically at this position by amine-reactive reagents. Clones P3013 and P3014 contain no asparagines, common sites of deamidation, so are expected to be less susceptible to deamidation.

Table 13. nanoCLAMPs with introduced pairs of Cysteines for disulfide formation

Neutral or better Tm is defined by Tm of basis, +/- 4 °C.

¹ Clones with disulfide bonds and good thermal stability

To ensure that the thermal stability was not negatively affected by the mutations, we measured the melting temperature of the mutants and the parents using differential scanning fluorimetry (DSF). We defined neutral or better results as a T_m not more than 4 °C below that of the parent. Clones P3010, P3011, P3017 and P3021 had deleterious effects on thermal stability and were not pursued further. Of the 14 proteins tested, 10 had neutral or better effect on the melting temperature and were at least as thermally stable as the parent constructs (Table 13). One clone, P3015 with a disulfide bond between residues 884 and 926, showed an 8 °C improvement in thermal stability in oxidized vs. reduced conditions (FIG. 27).

Materials and Methods

AlphaFold modeling of introduced disulfides

The P2808 sequence was modeled in AlphaFold with pairs of cysteine substitutions. The version was mmseq, and the modeling was performed with relaxation.

Analysis of disulfides by SDS PAGE

nanoCLAMPs were purified as described above, under denaturing conditions, except the refolding step was modified. Briefly, nanoCLAMPs were bound to Ni Sepharose 6 Fast Flow (Cytiva) in 6 M GuHCl, 20 mM Tris, pH 8 (QCB) + 5 mM TCEP. The resins were washed with 5 column volumes (CV) QCB + 1 mM TCEP, then 5 CV QCB (no TCEP). The proteins were then gradually refolded by washing with 10 CV QCB + 2 mM GSH/1 mM GSSG, then steps of 5 CVs each stepping the GuHCl down through 4, 3, 2, 1, and finally 0 M GuHCl by diluting with 20 mM MOPS, 150 mM NaCl (MBS), 1 mM CaCl2, 2 mM GSH/1 mM GSSG, pH 8. Each refolding step of 5 CVs was incubated for 30 min. The refolded protein was then washed in 10 CV MBS, pH 8, 1 mM CaCl2, and finally eluted with MBS, pH 8, 1 mM CaCl2, 250 mM imidazole. Proteins were normalized to 1 mg/ml in MBS, 1 mM CaCl2, pH 6.5. The proteins were diluted 10X into SDS sample buffer containing 50 mM DTT (reducing) and SDS sample buffer lacking reducing agent. The proteins were heated to 95 °C for 5 min, cooled and 1 µg separated on 12% NuPAGE BisTris gel (Thermo) with MES running buffer. Gels were stained with GelCode Blue (Thermo).

Differential scanning fluorimetry (DSF)

DSF was performed as described previously, except in the reducing case, TCEP was included in the DSF cocktail at 50 mM (final). The DSF program was performed as described, and the Tm measured at the inflection point of the curve.
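A minimal sketch of the Tm readout and the neutral-or-better classification follows. It assumes that fluorescence rises as the protein unfolds, so the inflection point is taken as the temperature interval with the steepest positive slope; the melt curve is synthetic, and the function names and the 70.5 °C midpoint are hypothetical. The 4 °C tolerance is the threshold stated in Example 5.

```python
import math

def melting_temperature(temps_c, fluorescence):
    """Estimate Tm as the midpoint of the interval where dF/dT is largest."""
    best_slope = None
    best_tm = None
    for i in range(1, len(temps_c)):
        slope = (fluorescence[i] - fluorescence[i - 1]) / (temps_c[i] - temps_c[i - 1])
        if best_slope is None or slope > best_slope:
            best_slope = slope
            best_tm = 0.5 * (temps_c[i] + temps_c[i - 1])
    return best_tm

def is_neutral_or_better(tm_mutant_c, tm_parent_c, tolerance_c=4.0):
    """True if the mutant Tm is not more than `tolerance_c` below the parent Tm."""
    return tm_mutant_c >= tm_parent_c - tolerance_c

# Synthetic sigmoidal melt curve (illustrative, not measured data)
TRUE_MIDPOINT_C = 70.5
temps = [float(t) for t in range(25, 96)]
signal = [1.0 / (1.0 + math.exp(-(t - TRUE_MIDPOINT_C) / 2.0)) for t in temps]

tm = melting_temperature(temps, signal)
print(f"Estimated Tm: {tm:.1f} C")
print("Mutant at 66.0 C vs parent at 70.0 C neutral or better:", is_neutral_or_better(66.0, 70.0))
print("Mutant at 65.0 C vs parent at 70.0 C neutral or better:", is_neutral_or_better(65.0, 70.0))
```

Real melt curves are noisier than this synthetic example, so smoothing or curve fitting before taking the derivative would usually be needed.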

Example 6. Identification of Framework Variants that Maintain Binding Activity

Phage library NL-26 contains clones of nanoCLAMPs with the wild-type nC-B framework as well as mutations resulting from errors in gene synthesis, PCR and phage propagation. The library was screened and analyzed to identify nanoCLAMPs that maintain the ability to bind their intended target and that contain one or more mutations in the framework regions. The analysis resulted in the identification of 105 nanoCLAMP variants, each recognizing one of four target antigens and each containing one or more mutations in the framework regions. The number of mutations identified in each framework and their position are summarized in Tables 14 and 15. A listing of individual enriched clones, targets and framework mutations is shown in Table 16. The identification of these variants in this non-exhaustive analysis indicates that each framework region can tolerate one or more mutations while maintaining the ability to be displayed on the phage surface and mediate binding to its target.

Methods

The NL-26 phage library was panned against recombinant GFPMut2, Human Serum Albumin, mCherry, and TEV protease, as described above. Phage were enriched in two rounds, and approximately 200,000 clones from each round were DNA sequenced by next generation DNA sequencing with an Illumina MiSeq system. The sequencing reads were processed with PipeBio software to cluster and count like-sequences. Clones were identified that met the criteria of 1) showing greater than two-fold normalized enrichment from Round 1 to Round 2 and 2) having one or more mutations in the framework region.

Table 14. Tolerance of Frameworks 1-9 to Mutations

Table 15. Positions of Mutations in Frameworks 1-9

Table 16: Listing of individual enriched clones, target and framework mutations
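The two selection criteria stated in the Methods of Example 6 can be sketched as a simple filter. This is an illustration only, not the PipeBio workflow: the clone names, read counts, and mutation assignments are hypothetical, and normalizing each count by the total reads of its round is an assumption about how the enrichment was normalized. Clones with no Round 1 reads are skipped because their fold enrichment is undefined under this definition.

```python
def fold_enrichment(round1_count, round2_count, round1_total, round2_total):
    """Ratio of Round 2 to Round 1 read frequency (assumed normalization)."""
    return (round2_count / round2_total) / (round1_count / round1_total)

def enriched_framework_variants(clones, round1_total, round2_total, min_fold=2.0):
    """Return (name, fold) for clones enriched more than `min_fold` with a framework mutation."""
    hits = []
    for clone in clones:
        if clone["round1_count"] == 0:
            continue  # fold enrichment undefined; skipped in this sketch
        fold = fold_enrichment(
            clone["round1_count"], clone["round2_count"], round1_total, round2_total
        )
        if fold > min_fold and clone["framework_mutations"]:
            hits.append((clone["name"], fold))
    return hits

# Hypothetical clone table (names, counts, and mutations are illustrative)
clones = [
    {"name": "cloneA", "round1_count": 50, "round2_count": 400, "framework_mutations": ["K870A"]},
    {"name": "cloneB", "round1_count": 100, "round2_count": 150, "framework_mutations": ["N890G"]},
    {"name": "cloneC", "round1_count": 20, "round2_count": 200, "framework_mutations": []},
]

hits = enriched_framework_variants(clones, round1_total=200_000, round2_total=200_000)
for name, fold in hits:
    print(f"{name}: {fold:.1f}-fold enriched")
```

Only the first hypothetical clone passes: the second is enriched less than two-fold and the third has no framework mutation.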

Other Embodiments

While the invention has been described in connection with specific embodiments thereof, it will be understood that it is capable of further modifications and this application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the invention that come within known or customary practice within the art to which the invention pertains and may be applied to the essential features hereinbefore set forth, and follows in the scope of the claims.

Other embodiments are within the claims.

Claims

1. A protein scaffold comprising the structure:

A-F1-L1-F2-L2-F3-L3-F4-L4-F5-L5-F6-L6-F7-L7-F8-L8-F9-B; wherein:

F1-F9 correspond to framework regions 1-9;

L1-L8 correspond to loop regions 1-8;

A and B are each independently, absent or comprise at least one amino acid;

F1 comprises the sequence of: (T/S)-LI-(H/R)-(T/S)-(P/E)-(G/S)-W (SEQ ID NO: 4) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 4;

L1 is absent or comprises at least one amino acid;

F2 comprises the sequence of: G-(S/N/T)-E-(A/S)-(D/N/S/A)-LLDGDD-(S/N/T)-TGV-(E/W/A)-Y (SEQ ID NO: 5) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 5;

L2 is absent or comprises at least one amino acid;

F3 comprises the sequence of: S-(L/V)-AGEFIGLDLG (SEQ ID NO: 6) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 6;

L3 is absent or comprises at least one amino acid;

F4 comprises the sequence of: G-(I/V)-(H/R/Y/N)-FVIG-(A/K/R)-(D/N) (SEQ ID NO: 7) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 7;

L4 is absent or comprises at least one amino acid;

F5 comprises the sequence of: DKW-(T/N/S)-(R/K)-F-(R/K)-LEYS (SEQ ID NO: 8) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 8;

L5 is absent or comprises at least one amino acid;

F6 comprises the sequence of: WTTI-(R/K/H/Q)-EYD-(H/K/R/Q) (SEQ ID NO: 9) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 9;

L6 is absent or comprises at least one amino acid;

F7 comprises the sequence of: (Q/K)-DVI-(D/E)-E-(D/S)-F (SEQ ID NO: 10) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 10;

L7 is absent or comprises at least one amino acid;

F8 comprises the sequence of: (Q/K/R)-YIRLTNLE (SEQ ID NO: 11) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 11;

L8 is absent or comprises at least one amino acid; and

F9 comprises the sequence of: LTFSEFA-(I/V)-VS (SEQ ID NO: 12) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 12; and wherein the protein scaffold comprises at least one mutation selected from the group consisting of N807X, S809X, R812X, S813X, E814X, S815X1, D818X, N822X, N825X, N832X, W836X, K857X, E858X, I859X, K860X2, L861X, D862X, R865X, K870X, N871X, N880X, K881X, K883X, N890X, K897X, K901X, K908X, E912X, S914X, and K922X3 relative to SEQ ID NO: 1, wherein:

X1 is any amino acid except R or S;

X2 is any amino acid except P or K; and

X3 is any amino acid except R or K.

2. The protein scaffold of claim 1, wherein:

F1 comprises the sequence of: (T/S)-LI-(H/R)-(T/S)-(P/E)-(G/S)-W (SEQ ID NO: 4) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 4;

F2 comprises the sequence of: G-(S/N/T)-E-(A/S)-(D/N/S/A)-LLDGDD-(S/N/T)-TGV-(E/W/A)-Y (SEQ ID NO: 5) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 5;

F3 comprises the sequence of: S-(L/V)-AGEFIGLDLG (SEQ ID NO: 6) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 6;

F4 comprises the sequence of: G-(I/V)-(H/R/Y/N)-FVIG-(A/K/R)-(D/N) (SEQ ID NO: 7) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 7;

F5 comprises the sequence of: DKW-(T/N/S)-(R/K)-F-(R/K)-LEYS (SEQ ID NO: 8) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 8;

F6 comprises the sequence of: WTTI-(R/K/H/Q)-EYD-(H/K/R/Q) (SEQ ID NO: 9) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 9;

F7 comprises the sequence of: (Q/K)-DVI-(D/E)-E-(D/S)-F (SEQ ID NO: 10) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 10;

F8 comprises the sequence of: (Q/K/R)-YIRLTNLE (SEQ ID NO: 11) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 11; and

F9 comprises the sequence of: LTFSEFA-(I/V)-VS (SEQ ID NO: 12) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 12.

3. The protein scaffold of claim 2, wherein:

F1 comprises the sequence of: (T/S)-LI-(H/R)-(T/S)-(P/E)-(G/S)-W (SEQ ID NO: 4);

F2 comprises the sequence of: G-(S/N/T)-E-(A/S)-(D/N/S/A)-LLDGDD-(S/N/T)-TGV-(E/W/A)-Y (SEQ ID NO: 5);

F3 comprises the sequence of: S-(L/V)-AGEFIGLDLG (SEQ ID NO: 6);

F4 comprises the sequence of: G-(I/V)-(H/R/Y/N)-FVIG-(A/K/R)-(D/N) (SEQ ID NO: 7);

F5 comprises the sequence of: DKW-(T/N/S)-(R/K)-F-(R/K)-LEYS (SEQ ID NO: 8);

F6 comprises the sequence of: WTTI-(R/K/H/Q)-EYD-(H/K/R/Q) (SEQ ID NO: 9);

F7 comprises the sequence of: (Q/K)-DVI-(D/E)-E-(D/S)-F (SEQ ID NO: 10);

F8 comprises the sequence of: (Q/K/R)-YIRLTNLE (SEQ ID NO: 11); and

F9 comprises the sequence of: LTFSEFA-(I/V)-VS (SEQ ID NO: 12).

4. The protein scaffold of any one of claims 1-3, wherein the protein scaffold comprises at least one mutation selected from the group consisting of N807X, S809X, R812X, S813X, E814X, S815X, D818X, N822X, N825X, N832X, W836X, K857X, E858X, I859X, K860X, L861X, D862X, R865X, K870X, N871X, N880X, K881X, K883X, N890X, K897X, K901X, K908X, E912X, S914X, and K922X relative to SEQ ID NO: 1, wherein X is any amino acid.

5. The protein scaffold of claim 4, wherein the protein scaffold comprises at least one mutation selected from the group consisting of N807D, S809T, R812H, S813T, E814P, S815G, D818V, N822S, N825D, N832S, W836E, K857E, E858V, I859V, K860E, L861V, D862G, R865H, K870A, N871D, N880T, K881R, K883R, N890G, K897R, K901H, K908Q, E912D, S914D, and K922Q relative to SEQ ID NO: 1.

6. The protein scaffold of claim 4, wherein at least one mutation is K870X and/or N890X.

7. The protein scaffold of claim 6, wherein at least one mutation is K870A and/or N890G.

8. The protein scaffold of any one of claims 1-7, comprising at least 3 fewer lysines relative to SEQ ID NO: 1.

9. The protein scaffold of claim 8, comprising at least 6 fewer lysines relative to SEQ ID NO: 1.

10. The protein scaffold of claim 9, comprising 9 fewer lysines relative to SEQ ID NO: 1.

11. The protein scaffold of claim 9, comprising no lysines.

12. The protein scaffold of any one of claims 1-11, comprising at least 3 fewer asparagines relative to SEQ ID NO: 1.

13. The protein scaffold of claim 12, comprising at least 5 fewer asparagines relative to SEQ ID NO: 1.

14. The protein scaffold of claim 13, comprising 7 fewer asparagines relative to SEQ ID NO: 1.

15. The protein scaffold of claim 13, comprising no asparagines.

16. The protein scaffold of any one of claims 1-15, wherein A and B are each independently, absent or from 1 amino acid to 20 amino acids.

17. The protein scaffold of any one of claims 1-16, wherein each of L1-L8 is, independently, from 1 amino acid to 20 amino acids.

18. The protein scaffold of claim 17, wherein each of L1-L8 is, independently, from 1 amino acid to 10 amino acids.

19. The protein scaffold of claim 18, wherein each of L1-L8 is, independently, from 3 amino acids to 10 amino acids.

20. The protein scaffold of claim 19, wherein each of L1-L8 is, independently, from 3 amino acids to 8 amino acids.

21. The protein scaffold of claim 20, wherein L1 is 4 amino acids, L2 is 7 amino acids, and/or L8 is 5 amino acids.

22. The protein scaffold of claim 21, wherein L1 comprises the sequence of: X1X2X3X4 (SEQ ID NO: 13), wherein each of X1-X4 is, independently, any amino acid.

23. The protein scaffold of claim 21 or 22, wherein L2 comprises the sequence of: X1X2X3X4X5X6X7 (SEQ ID NO: 14), wherein each of X1-X7 is, independently, any amino acid.

24. The protein scaffold of any one of claims 21-23, wherein L8 comprises the sequence of: X1X2X3X4X5 (SEQ ID NO: 15), wherein each of X1-X5 is, independently, any amino acid.

25. The protein scaffold of any one of claims 1-23, wherein L4 comprises the sequence of: X1X2X3X4X5X6X7 (SEQ ID NO: 14), wherein each of X1-X7 is, independently, any amino acid.

26. The protein scaffold of any one of claims 1-25, wherein L6 comprises the sequence of: X1X2X3X4X5X6 (SEQ ID NO: 16), wherein each of X1-X6 is, independently, any amino acid.

27. The protein scaffold of any one of claims 1-26, wherein L8 comprises at least two amino acids.

28. The protein scaffold of any one of claims 1-27, wherein L4 comprises the sequence of: (G/D)-GGSS (SEQ ID NO: 17) or GDT or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 17 or GDT.

29. The protein scaffold of any one of claims 1-28, wherein L6 comprises the sequence of TGAPAG (SEQ ID NO: 18) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 18.

30. The protein scaffold of any one of claims 1-29, wherein:

L4 comprises the sequence of: (G/D)-GGSS (SEQ ID NO: 17) or GDT; and L6 comprises the sequence of TGAPAG (SEQ ID NO: 18).

31. The protein scaffold of any one of claims 1-30, wherein L3 comprises the sequence of: (E/K/S)-(V/E)-(V/I/T)-(E/K/P/S)-(V/L)-(G/D) (SEQ ID NO: 19) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 19.

32. The protein scaffold of any one of claims 1-31, wherein L5 comprises the sequence of: LD-(G/N)-(E/S)-S (SEQ ID NO: 20) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 20.

33. The protein scaffold of any one of claims 1-32, wherein L7 comprises at least one amino acid.

34. The protein scaffold of any one of claims 1-33, wherein L7 comprises the sequence of ETPI-(S/E)-A (SEQ ID NO: 21) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 21.

35. The protein scaffold of any one of claims 32-34, wherein:

L3 comprises the sequence of: (E/K/S)-(V/E)-(V/I/T)-(E/K/P/S)-(V/L)-(G/D) (SEQ ID NO: 19) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 19;

L5 comprises the sequence of: LD-(G/N)-(E/S)-S (SEQ ID NO: 20) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 20; and

L7 comprises the sequence of ETPI-(S/E)-A (SEQ ID NO: 21 ) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 21.

36. The protein scaffold of any one of claims 1-35, wherein A comprises the sequence of (D/N/H)-P.

37. The protein scaffold of claim 36, wherein A comprises the sequence of DP.

38. The protein scaffold of any one of claims 1-37, wherein B comprises the sequence of DELE (SEQ ID NO: 35).

39. The protein scaffold of any one of claims 1-38, wherein:

F1 comprises the sequence of: TLIHTPGW (SEQ ID NO: 22) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 22;

L1 is absent or comprises at least one amino acid;

F2 comprises the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 23;

L2 is absent or comprises at least one amino acid;

F3 comprises the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 24;

L3 comprises the sequence of: EVVEVG (SEQ ID NO: 31) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 31;

F4 comprises the sequence of: GIHFVIGAD (SEQ ID NO: 25) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 25;

L4 comprises the sequence of: GGGSS (SEQ ID NO: 32) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 32;

F5 comprises the sequence of: DKWTRFRLEYS (SEQ ID NO: 26) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 26;

L5 comprises the sequence of: LDGES (SEQ ID NO: 33) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 33;

F6 comprises the sequence of: WTTIREYDH (SEQ ID NO: 27) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 27;

L6 comprises the sequence of: TGAPAG (SEQ ID NO: 18) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 18;

F7 comprises the sequence of: QDVIDEDF (SEQ ID NO: 28) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 28;

L7 comprises the sequence of: ETPISA (SEQ ID NO: 34) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 34;

F8 comprises the sequence of: QYIRLTNLE (SEQ ID NO: 29) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 29;

L8 is absent or comprises at least one amino acid; and

F9 comprises the sequence of: LTFSEFAIVS (SEQ ID NO: 30) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 30.

40. The protein scaffold of claim 39, wherein:

F1 comprises the sequence of: TLIHTPGW (SEQ ID NO: 22) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 22;

L1 is absent or comprises at least one amino acid;

F2 comprises the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 23;

L2 is absent or comprises at least one amino acid;

F3 comprises the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 24;

L3 comprises the sequence of: EVVEVG (SEQ ID NO: 31) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 31;

F4 comprises the sequence of: GIHFVIGAD (SEQ ID NO: 25) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 25;

L4 comprises the sequence of: GGGSS (SEQ ID NO: 32) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 32;

F5 comprises the sequence of: DKWTRFRLEYS (SEQ ID NO: 26) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 26;

L5 comprises the sequence of: LDGES (SEQ ID NO: 33) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 33;

F6 comprises the sequence of: WTTIREYDH (SEQ ID NO: 27) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 27;

L6 comprises the sequence of: TGAPAG (SEQ ID NO: 18) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 18;

F7 comprises the sequence of: QDVIDEDF (SEQ ID NO: 28) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 28;

L7 comprises the sequence of: ETPISA (SEQ ID NO: 34) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 34;

F8 comprises the sequence of: QYIRLTNLE (SEQ ID NO: 29) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 29;

L8 is absent or comprises at least one amino acid; and

F9 comprises the sequence of: LTFSEFAIVS (SEQ ID NO: 30) or a sequence having one amino acid insertion, deletion, or substitution mutation relative to SEQ ID NO: 30.
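The "one amino acid insertion, deletion, or substitution mutation" language of claim 40, and the "one or two" language of claim 39, can be read as a bound on the edit distance between a candidate region and its reference sequence. The following is a minimal sketch under that reading, not a construction taken from the specification. The function names `edit_distance` and `within_edits` are hypothetical, and the comparison treats the whole candidate region as the sequence under test, which is narrower than the open-ended "comprises" wording.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-residue
    insertions, deletions, or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, res_a in enumerate(a, start=1):
        curr = [i]
        for j, res_b in enumerate(b, start=1):
            cost = 0 if res_a == res_b else 1
            curr.append(min(prev[j] + 1,          # delete res_a
                            curr[j - 1] + 1,      # insert res_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def within_edits(reference: str, candidate: str, max_edits: int) -> bool:
    """True if candidate differs from reference by at most max_edits edits.

    Assumption: max_edits=1 models claim 40 and max_edits=2 models claim 39.
    """
    return edit_distance(reference, candidate) <= max_edits


if __name__ == "__main__":
    L4_REFERENCE = "GGGSS"  # SEQ ID NO: 32 as recited in the claims
    print(within_edits(L4_REFERENCE, "GGGS", 1))   # one deletion
    print(within_edits(L4_REFERENCE, "GAGSA", 1))  # two substitutions
```

Under this sketch, a region with a single deletion passes the one-edit test, while a region with two substitutions passes only the two-edit test.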

41. The protein scaffold of claim 40, wherein:

F1 comprises the sequence of: TLIHTPGW (SEQ ID NO: 22);

L1 is absent or comprises at least one amino acid;

F2 comprises the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23);

L2 is absent or comprises at least one amino acid;

F3 comprises the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24);

L3 comprises the sequence of: EVVEVG (SEQ ID NO: 31);

F4 comprises the sequence of: GIHFVIGAD (SEQ ID NO: 25);

L4 comprises the sequence of: GGGSS (SEQ ID NO: 32);

F5 comprises the sequence of: DKWTRFRLEYS (SEQ ID NO: 26);

L5 comprises the sequence of: LDGES (SEQ ID NO: 33);

F6 comprises the sequence of: WTTIREYDH (SEQ ID NO: 27);

L6 comprises the sequence of: TGAPAG (SEQ ID NO: 18);

F7 comprises the sequence of: QDVIDEDF (SEQ ID NO: 28);

L7 comprises the sequence of: ETPISA (SEQ ID NO: 34);

F8 comprises the sequence of: QYIRLTNLE (SEQ ID NO: 29);

L8 is absent or comprises at least one amino acid; and

F9 comprises the sequence of: LTFSEFAIVS (SEQ ID NO: 30).

42. The protein scaffold of claim 41, wherein:

F1 comprises the sequence of: TLIHTPGW (SEQ ID NO: 22);

L1 comprises the sequence of: X1X2X3X4 (SEQ ID NO: 13), wherein each of X1-X4 is, independently, any amino acid;

F2 comprises the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23);

L2 comprises the sequence of: X1X2X3X4X5X6X7 (SEQ ID NO: 14), wherein each of X1-X7 is, independently, any amino acid;

F3 comprises the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24);

L3 comprises the sequence of: EVVEVG (SEQ ID NO: 31);

F4 comprises the sequence of: GIHFVIGAD (SEQ ID NO: 25);

L4 comprises the sequence of: GGGSS (SEQ ID NO: 32);

F5 comprises the sequence of: DKWTRFRLEYS (SEQ ID NO: 26);

L5 comprises the sequence of: LDGES (SEQ ID NO: 33);

F6 comprises the sequence of: WTTIREYDH (SEQ ID NO: 27);

L6 comprises the sequence of: TGAPAG (SEQ ID NO: 18);

F7 comprises the sequence of: QDVIDEDF (SEQ ID NO: 28);

L7 comprises the sequence of: ETPISA (SEQ ID NO: 34);

F8 comprises the sequence of: QYIRLTNLE (SEQ ID NO: 29);

L8 comprises the sequence of: X1X2X3X4X5 (SEQ ID NO: 15), wherein each of X1-X5 is, independently, any amino acid; and

F9 comprises the sequence of: LTFSEFAIVS (SEQ ID NO: 30).
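The A-F1-L1-...-F9-B layout recited above can be illustrated by concatenating the fixed framework and loop sequences of claim 42 with user-supplied variable loops. This is a minimal sketch, not the applicants' method. The function name `assemble_scaffold`, the restriction of "any amino acid" to the 20 standard residues, and the example loop sequences are all assumptions made for illustration.

```python
# Fixed regions as recited in claim 42 (SEQ ID NOs: 22-34).
FRAMEWORKS = {
    "F1": "TLIHTPGW", "F2": "GSEADLLDGDDSTGVEY", "F3": "SLAGEFIGLDLG",
    "F4": "GIHFVIGAD", "F5": "DKWTRFRLEYS", "F6": "WTTIREYDH",
    "F7": "QDVIDEDF", "F8": "QYIRLTNLE", "F9": "LTFSEFAIVS",
}
FIXED_LOOPS = {"L3": "EVVEVG", "L4": "GGGSS", "L5": "LDGES",
               "L6": "TGAPAG", "L7": "ETPISA"}
# Variable loops and their lengths per SEQ ID NOs: 13, 14, and 15.
VARIABLE_LOOP_LENGTHS = {"L1": 4, "L2": 7, "L8": 5}
# Assumption: "any amino acid" is limited here to the 20 standard residues.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def assemble_scaffold(l1: str, l2: str, l8: str, a: str = "", b: str = "") -> str:
    """Concatenate A-F1-L1-F2-L2-F3-L3-F4-L4-F5-L5-F6-L6-F7-L7-F8-L8-F9-B."""
    loops = dict(FIXED_LOOPS, L1=l1, L2=l2, L8=l8)
    for name, length in VARIABLE_LOOP_LENGTHS.items():
        seq = loops[name]
        if len(seq) != length or not set(seq) <= STANDARD_RESIDUES:
            raise ValueError(f"{name} must be {length} standard residues, got {seq!r}")
    parts = [a]
    for i in range(1, 10):
        parts.append(FRAMEWORKS[f"F{i}"])
        if i < 9:  # eight loops sit between the nine framework regions
            parts.append(loops[f"L{i}"])
    parts.append(b)
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical loop sequences, chosen only to show the call.
    print(assemble_scaffold("AVAA", "GSGSGSG", "TTTTT", a="DP", b="DELE"))
```

Passing `a="DP"` and `b="DELE"` gives the flanking sequences recited for A and B in claim 43; leaving them empty models the case where A and B are absent.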

43. The protein scaffold of claim 42, wherein:

A comprises the sequence of: DP;

F1 comprises the sequence of: TLIHTPGW (SEQ ID NO: 22);

F2 comprises the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23);

F3 comprises the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24);

L3 comprises the sequence of: EVVEVG (SEQ ID NO: 31);

F4 comprises the sequence of: GIHFVIGAD (SEQ ID NO: 25);

L4 comprises the sequence of: GGGSS (SEQ ID NO: 32);

F5 comprises the sequence of: DKWTRFRLEYS (SEQ ID NO: 26);

L5 comprises the sequence of: LDGES (SEQ ID NO: 33);

F6 comprises the sequence of: WTTIREYDH (SEQ ID NO: 27);

L6 comprises the sequence of: TGAPAG (SEQ ID NO: 18);

F7 comprises the sequence of: QDVIDEDF (SEQ ID NO: 28);

L7 comprises the sequence of: ETPISA (SEQ ID NO: 34);

F8 comprises the sequence of: QYIRLTNLE (SEQ ID NO: 29);

L8 comprises the sequence of: X1X2X3X4X5 (SEQ ID NO: 15), wherein each of X1-X5 is, independently, any amino acid;

F9 comprises the sequence of: LTFSEFAIVS (SEQ ID NO: 30); and

B comprises the sequence of: DELE (SEQ ID NO: 35).

44. The protein scaffold of claim 42 or 43, wherein L1 comprises the sequence of X1X2X3X4 (SEQ ID NO: 13), wherein each of X1, X3, and X4 is, independently, any amino acid, and X2 is V.

45. A protein scaffold comprising a polypeptide having at least 80% sequence identity to SEQ ID NO: 3.

46. The protein scaffold of claim 45, wherein the polypeptide has at least 85%, 90%, 95%, 97%, or 99% sequence identity to SEQ ID NO: 3.
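The percent-identity thresholds in claims 45 and 46 can be illustrated with a simple calculation over two pre-aligned sequences. This is a sketch under stated assumptions. The section does not say how sequence identity is computed, so the choice of denominator (all alignment columns) and the helper names `percent_identity` and `thresholds_met` are hypothetical. A real comparison against SEQ ID NO: 3 would first need an alignment produced by whatever method the specification defines.

```python
THRESHOLDS = (80, 85, 90, 95, 97, 99)  # percentages recited in claims 45 and 46


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Assumptions: the inputs are already aligned to equal length, gaps are
    marked with `gap`, and every alignment column counts in the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * matches / len(aligned_a)


def thresholds_met(identity: float) -> list[int]:
    """Return the recited thresholds that the given identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


if __name__ == "__main__":
    # Toy sequences, not taken from the Sequence Listing.
    identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
    print(identity, thresholds_met(identity))
```

Other common conventions divide by the length of the shorter sequence or exclude gapped columns; the result can differ by several percentage points, so the denominator matters near a threshold.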

47. The protein scaffold of claim 45 or 46, wherein the polypeptide comprises at least one mutation selected from the group consisting of N807X, S809X, R812X, S813X, E814X, S815X1, D818X, N822X, N825X, N832X, W836X, K857X, E858X, I859X, K860X2, L861X, D862X, R865X, K870X, N871X, N880X, K881X, K883X, N890X, K897X, K901X, K908X, E912X, S914X, and K922X3 relative to SEQ ID NO: 1, wherein:

X1 is any amino acid except R or S;

X2 is any amino acid except P or K; and

X3 is any amino acid except R or K.

48. The protein scaffold of claim 47, wherein the protein scaffold comprises at least one mutation selected from the group consisting of N807X, S809X, R812X, S813X, E814X, S815X, D818X, N822X, N825X, N832X, W836X, K857X, E858X, I859X, K860X, L861X, D862X, R865X, K870X, N871X, N880X, K881X, K883X, N890X, K897X, K901X, K908X, E912X, S914X, and K922X relative to SEQ ID NO: 1, wherein X is any amino acid.

49. The protein scaffold of claim 48, wherein the protein scaffold comprises at least one mutation selected from the group consisting of N807D, S809T, R812H, S813T, E814P, S815G, D818V, N822S, N825D, N832S, W836E, K857E, E858V, I859V, K860E, L861V, D862G, R865H, K870A, N871D, N880T, K881R, K883R, N890G, K897R, K901H, K908Q, E912D, S914D, and K922Q relative to SEQ ID NO: 1.
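The substitution labels used in claims 47-49 follow the usual pattern of wild-type residue, position, and replacement residue (for example, N807D replaces N at position 807 with D). The sketch below parses such labels and applies them to a reference sequence. It is illustrative only: the helper names and the `first_residue_number` parameter are hypothetical, and the reference used in the example is a made-up toy sequence numbered from 1, not SEQ ID NO: 1.

```python
import re

# Wild-type residue, 1-based position, replacement residue (e.g. "N807D").
MUTATION_PATTERN = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_mutation(label: str) -> tuple[str, int, str]:
    """Split a substitution label into (wild_type, position, replacement)."""
    match = MUTATION_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutations(reference: str, labels: list[str],
                    first_residue_number: int = 1) -> str:
    """Apply substitutions to reference, checking each wild-type residue.

    first_residue_number is the position number assigned to reference[0];
    it is an assumption needed whenever the numbering does not start at 1.
    """
    residues = list(reference)
    for label in labels:
        wild_type, position, replacement = parse_mutation(label)
        index = position - first_residue_number
        if not 0 <= index < len(residues):
            raise ValueError(f"{label}: position outside the reference")
        if reference[index] != wild_type:
            raise ValueError(f"{label}: reference has {reference[index]} at {position}")
        residues[index] = replacement
    return "".join(residues)


if __name__ == "__main__":
    toy_reference = "ANSRK"  # hypothetical sequence for demonstration only
    print(apply_mutations(toy_reference, ["N2D", "R4H"]))  # prints ADSHK
```

Checking the wild-type residue before substituting catches numbering mismatches between the label and the reference sequence.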

50. A protein scaffold comprising framework regions and loop regions, the protein scaffold comprising at least 7 framework regions from the following structure:

A-F1-L1-F2-L2-F3-L3-F4-L4-F5-L5-F6-L6-F7-L7-F8-L8-F9-B; wherein:

F1-F9 correspond to framework regions 1-9;

L1-L8 correspond to loop regions 1-8 that are each, independently, absent or comprise one or more amino acids;

A and B are each, independently, absent or comprise at least one amino acid;

F1 comprises the sequence of: (T/S)-LI-(H/R)-(T/S)-(P/E)-(G/S)-W (SEQ ID NO: 4) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 4;

F8 comprises the sequence of: (Q/K/R)-YIRLTNLE (SEQ ID NO: 11) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 11; and

F9 comprises the sequence of: LTFSEFA-(I/V)-VS (SEQ ID NO: 12) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 12.

51. The protein scaffold of claim 50, comprising at least 7 of the framework regions, wherein

F3 comprises the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 24;

F4 comprises the sequence of: GIHFVIGAD (SEQ ID NO: 25) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 25;

F5 comprises the sequence of: DKWTRFRLEYS (SEQ ID NO: 26) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 26;

F8 comprises the sequence of: QYIRLTNLE (SEQ ID NO: 29) or a sequence having one or two amino acid insertions, deletions, substitution mutations, or a combination thereof relative to SEQ ID NO: 29; and

52. The protein scaffold of claim 51, comprising at least 7 of the framework regions, wherein

F1 comprises the sequence of: TLIHTPGW (SEQ ID NO: 22);

F2 comprises the sequence of: GSEADLLDGDDSTGVEY (SEQ ID NO: 23);

F3 comprises the sequence of: SLAGEFIGLDLG (SEQ ID NO: 24);

F4 comprises the sequence of: GIHFVIGAD (SEQ ID NO: 25);

F5 comprises the sequence of: DKWTRFRLEYS (SEQ ID NO: 26);

F6 comprises the sequence of: WTTIREYDH (SEQ ID NO: 27);

F7 comprises the sequence of: QDVIDEDF (SEQ ID NO: 28);

F8 comprises the sequence of: QYIRLTNLE (SEQ ID NO: 29); and

F9 comprises the sequence of: LTFSEFAIVS (SEQ ID NO: 30).

53. The protein scaffold of any one of claims 50-52, wherein the protein scaffold comprises at least 8 of the framework regions F1-F9.

54. The protein scaffold of any one of claims 50-53, wherein the protein scaffold comprises at least 80%, 85%, 90%, 95%, 97%, or 99% sequence identity to the framework regions F1-F9 over one or more regions of alignment.

55. The protein scaffold of any one of claims 1-54, further comprising a substitution mutation that adds a cysteine residue.

56. The protein scaffold of claim 55, wherein the protein scaffold comprises a first substitution mutation that adds a first cysteine residue and a second substitution mutation that adds a second cysteine residue.

57. The protein scaffold of claim 56, wherein the first cysteine residue and the second cysteine residue form a disulfide bond under oxidizing conditions.

58. The protein scaffold of any one of claims 55-57, wherein the protein scaffold comprises at least one mutation selected from the group consisting of F806C, P808C, S845C, L855C, V858C, V861C, K878C, W879C, L884C, L888C, A904C, P905C, A906GC, G907C, I924C, L926C, N928C, L936C, I943C, and L948C.

59. The protein scaffold of claim 58, wherein the protein scaffold comprises at least two mutations selected from the group consisting of F806C, P808C, S845C, L855C, V858C, V861C, K878C, W879C, L884C, L888C, A904C, P905C, A906GC, G907C, I924C, L926C, N928C, L936C, I943C, and L948C.

60. The protein scaffold of claim 59, wherein the protein scaffold comprises a pair of cysteine mutations selected from the group consisting of K878C and G907C, K878C and A904C, V861C and I943C, P905C and L855C, S845C and L936C, W879C and N928C, L884C and L926C, F806C and L948C, V858C and L888C, K878C and G907C, K878C and A906GC, S845C and N928C, K878C and A904C, P808C and I943C, V861C and I924C, P808C and V861C, and I943C and L855C.

61. The protein scaffold of claim 60, wherein the pair of cysteine mutations is selected from the group consisting of K878C and G907C, K878C and A904C, S845C and L936C, W879C and N928C, W879C and N928C, L884C and L926C, V858C and L888C, K878C and G907C, and K878C and A906GC.

62. The protein scaffold of any one of claims 1-61, further comprising a tag covalently attached to the scaffold.

63. The protein scaffold of claim 62, wherein the tag is an affinity tag.

64. The protein scaffold of claim 62 or 63, wherein the tag is attached to the N-terminus or the C-terminus of the scaffold.

65. The protein scaffold of any one of claims 1-64, wherein the scaffold is conjugated to a functional group.

66. The protein scaffold of claim 65, wherein the functional group comprises biotin, streptavidin or a derivative of streptavidin, a polyethylene glycol moiety, a fluorescent dye, an enzyme, a radioactive moiety, a lanthanide, or a lanthanide binding motif.

67. The protein scaffold of claim 66, wherein the lanthanide is terbium.

68. The protein scaffold of claim 66, wherein the radioactive moiety is an α or β emitter.

69. The protein scaffold of any one of claims 65-68, wherein the functional group is conjugated to a sulfhydryl group or a primary amine.

70. A polynucleotide encoding the protein scaffold of any one of claims 1-69.

71. The polynucleotide of claim 70, wherein the polynucleotide is a ribonucleotide.

72. The polynucleotide of claim 70, wherein the polynucleotide is a deoxyribonucleotide.

73. A vector comprising the polynucleotide of claim 71 or 72.

74. A cell comprising the polynucleotide of any one of claims 70-72 or the vector of claim 73.

75. A method of producing the protein scaffold of any one of claims 1-69 comprising:

(a) providing a cell transformed with the polynucleotide of any one of claims 70-72 or the vector of claim 73;

(b) culturing the transformed cell under conditions for expressing the polynucleotide, wherein the culturing results in expression of the protein scaffold; and

(c) isolating the protein scaffold.

76. A particle comprising the protein scaffold of any one of claims 1-69.

77. The particle of claim 76, wherein the particle is a magnetic particle.

78. A resin comprising a plurality of the particles of claim 76 or 77.

79. A column comprising the resin of claim 78.

80. A method of purifying a target molecule from a plurality of molecules, the method comprising:

(a) providing a sample comprising a mixture of the target molecule and the plurality of molecules;

(b) contacting the sample with the protein scaffold of any one of claims 1-69, wherein the scaffold specifically binds to the target molecule; and

(c) separating the target molecule bound to the protein scaffold from the plurality of molecules.

81. The method of claim 80, wherein the step of separating comprises immobilizing the protein scaffold.

82. The method of claim 81, wherein the protein scaffold is conjugated to a particle.

83. The method of claim 82, wherein the particle comprises a magnetic bead.

84. The method of claim 82 or 83, wherein the protein scaffold is conjugated to a resin or monolith comprising a plurality of the particles.